TOP ACTIVATIONS
MAX = 0.054

that said " No w agons near the pond ". She
a door , but no one is there .
you already have nice w agons . You don 't need
The cave is the best shelter ." So , it drove
lock the door so no one can come in ," said
Sara says . " What is inside ?" "
grows . They water it every day with a can .
's go find a new battery ." They went
sign . It says ' Post Office '. Can you read
. Lily grabs the scissors and runs to the box .
lock the door so no one can come in ." Rose
. " And see what is inside ." Ben
! It 's the best fort ever !" The
of a hill and no one knew why it was there
successful . He worked hard every day , but he had
course ! I own this cooler but my friends can use
!" M ama and Papa come to the room
" Ben , I have a rope . Grab it and
so beautiful inside and the fringe from the cut shells made
believe it 's a reliable ship ." Tim was sad ,

MIN ACTIVATIONS
MIN = -0.057

to look at the or nam ents on the tree .
take one of the or nam ents outside for a walk
wanted to see the or nam ents up close . They
also saw some new or nam ents that Santa had hung
time playing with her or nam ents and making her special
who liked to collect or nam ents . Everywhere she went
We have our own or nam ents at home , remember
many colors . Some or nam ents were round , some
Be careful , these or nam ents are very special .
All of these lovely or nam ents reminded him of the
big tree with many or nam ents on it . The
They reached for the or nam ents and touched them softly
on it . The or nam ents were shiny and colorful
They reached for the or nam ents and tried to take
branches . But the or nam ents were slippery and some
said . " The or nam ents are so pretty !
never played with the or nam ents again . They still
had not touched the or nam ents . They hoped Mom
at all the different or nam ents and adm iring the
she had with her or nam ents . <|endoftext|> Once upon

INTERVAL 0.026 to 0.054
CONTAINS 0.340%

I see something that looks like it could be luxurious and
picks up the axe and puts it back where they found
play together until it was time for them to go home
room looking for a good place to hide . Sue saw
. He is sleepy . He says , " I love

INTERVAL -0.002 to 0.026
CONTAINS 56.441%

not be jealous of what others have , but offer to
slide - bat he and she was so proud of her
then he replied , " Because it helps me learn and
with their toys in the living room . They had many
Once upon a time , there was a man named Tom

INTERVAL -0.029 to -0.002
CONTAINS 43.119%

the slide together and have so much fun . Lily realized
happy and excited because he saw impressive airplanes . He wanted
he polished the wheels every day so they stayed shiny .
You are my best friend too . You are very loyal
a me ow . She smiled . She saw a picture

INTERVAL -0.057 to -0.029
CONTAINS 0.100%

a gray cat named Spike . Spike loved to run and
and she never sh ivered again . She lived with the
you promise to never speed again , you can have some
nurses and doctors when they try to help you . They
didn 't want to play anymore . But then
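
The four INTERVAL blocks above look like equal-width bins between MIN (-0.057) and MAX (0.054), each about 0.028 wide, and their CONTAINS percentages sum to 100%. A minimal sketch of how such shares could be computed, assuming equal-width binning; the function name `interval_shares` and the sample values are hypothetical and do not come from the dashboard's own code:

```python
from typing import List, Sequence, Tuple


def interval_shares(
    activations: Sequence[float], n_bins: int = 4
) -> List[Tuple[float, float, float]]:
    """Split [min, max] into n_bins equal-width intervals.

    Returns one (low_edge, high_edge, percent_of_values) tuple per interval,
    ordered from the lowest interval to the highest.
    """
    if not activations:
        raise ValueError("activations must not be empty")

    lo, hi = min(activations), max(activations)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins

    for a in activations:
        if width > 0:
            # The maximum value would index one past the end, so clamp it
            # into the last interval.
            idx = min(int((a - lo) / width), n_bins - 1)
        else:
            # All values identical: everything falls in the first interval.
            idx = 0
        counts[idx] += 1

    total = len(activations)
    return [
        (lo + i * width, lo + (i + 1) * width, 100.0 * counts[i] / total)
        for i in range(n_bins)
    ]


# Hypothetical sample values spanning the MIN and MAX reported above.
sample = [-0.057, -0.01, 0.0, 0.01, 0.054]
for low, high, pct in reversed(interval_shares(sample)):
    print(f"INTERVAL {low:.3f} to {high:.3f}  CONTAINS {pct:.3f}%")
```

The loop prints the highest interval first, matching the order in which the intervals are listed above.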
Token that
Token ID326
Feature activation+1.65e-02

Pos logit contributions
ags+0.006
ishes+0.006
 ours+0.006
 unnecessary+0.006
ports+0.005
Neg logit contributions
ad-0.007
 Cr-0.006
 jugg-0.006
Sus-0.006
all-0.006
Token said
Token ID531
Feature activation+1.21e-02

Pos logit contributions
ags+0.026
 ours+0.024
ishes+0.023
 trash+0.022
 attention+0.022
Neg logit contributions
 Cr-0.029
ad-0.028
 jugg-0.027
Sus-0.026
 Sus-0.025
Token "
Token ID366
Feature activation-9.37e-03

Pos logit contributions
ags+0.019
 ours+0.017
 trash+0.016
 attention+0.016
ports+0.016
Neg logit contributions
 Cr-0.021
ad-0.020
 jugg-0.020
Sus-0.019
Johnny-0.019
TokenNo
Token ID2949
Feature activation+1.41e-02

Pos logit contributions
 Cr+0.014
ad+0.013
 jugg+0.013
Sus+0.013
 Sus+0.012
Neg logit contributions
ags-0.017
 ours-0.016
ishes-0.015
 trash-0.015
 attention-0.014
Token w
Token ID266
Feature activation+9.01e-03

Pos logit contributions
ags+0.031
ishes+0.028
 ours+0.027
irt+0.026
 trash+0.026
Neg logit contributions
ad-0.016
 Cr-0.016
 jugg-0.016
 Sus-0.014
all-0.013
Tokenagons
Token ID34765
Feature activation+5.43e-02

Pos logit contributions
ags+0.014
 ours+0.014
ishes+0.013
 trash+0.013
 attention+0.013
Neg logit contributions
ad-0.015
 Cr-0.015
 jugg-0.014
Sus-0.013
all-0.013
Token near
Token ID1474
Feature activation+1.34e-02

Pos logit contributions
ags+0.084
ishes+0.070
 unnecessary+0.070
 ours+0.070
 trash+0.068
Neg logit contributions
 Cr-0.093
ad-0.092
 jugg-0.089
all-0.086
 Sus-0.081
Token the
Token ID262
Feature activation+3.01e-03

Pos logit contributions
ags+0.023
ishes+0.020
 ours+0.019
ses+0.019
 attention+0.019
Neg logit contributions
 Cr-0.022
ad-0.021
 jugg-0.020
all-0.019
 Sus-0.019
Token pond
Token ID16723
Feature activation+2.03e-02

Pos logit contributions
ags+0.005
ishes+0.005
 ours+0.005
ows+0.005
hes+0.004
Neg logit contributions
 Cr-0.005
 jugg-0.004
ad-0.004
 Freddy-0.004
 Bub-0.004
Token".
Token ID1911
Feature activation+1.07e-02

Pos logit contributions
ags+0.030
 ours+0.026
ishes+0.025
 unnecessary+0.025
 attention+0.024
Neg logit contributions
 Cr-0.037
ad-0.036
 jugg-0.035
all-0.034
Sus-0.033
Token She
Token ID1375
Feature activation+8.20e-04

Pos logit contributions
ags+0.021
 ours+0.020
ishes+0.020
 attention+0.019
 unnecessary+0.018
Neg logit contributions
 Cr-0.015
ad-0.013
 Sus-0.013
Sus-0.012
 jugg-0.012
Token a
Token ID257
Feature activation+9.64e-03

Pos logit contributions
ags+0.011
ishes+0.009
 ours+0.009
 attention+0.008
irt+0.008
Neg logit contributions
 Cr-0.016
ad-0.016
Sus-0.015
 jugg-0.015
 Sus-0.015
Token door
Token ID3420
Feature activation-2.73e-03

Pos logit contributions
ags+0.012
 ours+0.011
ishes+0.010
 trash+0.009
 attention+0.009
Neg logit contributions
 Cr-0.020
ad-0.019
 jugg-0.019
Sus-0.019
 Sus-0.018
Token,
Token ID11
Feature activation+2.23e-02

Pos logit contributions
 Cr+0.005
ad+0.005
 jugg+0.004
Sus+0.004
 Sus+0.004
Neg logit contributions
ags-0.004
 ours-0.004
ishes-0.004
 trash-0.004
 attention-0.004
Token but
Token ID475
Feature activation+1.13e-02

Pos logit contributions
ags+0.035
ishes+0.031
 ours+0.030
 unnecessary+0.029
 attention+0.028
Neg logit contributions
 Cr-0.039
ad-0.038
 jugg-0.037
 Sus-0.036
Sus-0.036
Token no
Token ID645
Feature activation+1.45e-02

Pos logit contributions
ags+0.020
ishes+0.018
 ours+0.018
 unnecessary+0.017
 attention+0.017
Neg logit contributions
 Cr-0.018
ad-0.016
 jugg-0.016
 Sus-0.016
Sus-0.015
Token one
Token ID530
Feature activation+5.15e-02

Pos logit contributions
ags+0.020
ishes+0.016
 ours+0.015
ports+0.015
irt+0.015
Neg logit contributions
 Sus-0.028
ad-0.028
 Cr-0.028
 jugg-0.027
Sus-0.027
Token is
Token ID318
Feature activation+1.77e-02

Pos logit contributions
ags+0.103
 unnecessary+0.086
ishes+0.079
agons+0.079
irt+0.077
Neg logit contributions
ad-0.077
 jugg-0.075
bell-0.069
 Sus-0.066
ill-0.066
Token there
Token ID612
Feature activation+2.09e-02

Pos logit contributions
ags+0.031
ishes+0.026
ports+0.025
irt+0.024
 unnecessary+0.024
Neg logit contributions
 jugg-0.027
 Cr-0.027
ad-0.027
 Sus-0.027
Sus-0.027
Token.
Token ID13
Feature activation+2.70e-03

Pos logit contributions
ags+0.032
 ours+0.028
ishes+0.027
 trash+0.026
ports+0.025
Neg logit contributions
 Cr-0.037
ad-0.037
 jugg-0.035
Sus-0.034
 Sus-0.034
Token
Token ID198
Feature activation-6.27e-03

Pos logit contributions
ags+0.005
 ours+0.005
 attention+0.005
ishes+0.005
 unnecessary+0.005
Neg logit contributions
 Cr-0.004
 Sus-0.003
Sus-0.003
 Freddy-0.003
ad-0.003
Token
Token ID198
Feature activation-7.24e-03

Pos logit contributions
 Cr+0.010
ad+0.010
 jugg+0.010
Sus+0.009
all+0.009
Neg logit contributions
ags-0.010
 ours-0.009
 trash-0.009
ishes-0.009
ports-0.009
Token you
Token ID345
Feature activation-1.61e-02

Pos logit contributions
 Cr+0.004
ad+0.004
 jugg+0.004
 Sus+0.004
Sus+0.004
Neg logit contributions
ags-0.004
ishes-0.003
 ours-0.003
ports-0.003
 trash-0.003
Token already
Token ID1541
Feature activation+5.03e-03

Pos logit contributions
 Cr+0.029
 jugg+0.028
ad+0.028
 Sus+0.025
all+0.025
Neg logit contributions
ags-0.026
ishes-0.021
 ours-0.021
 attention-0.020
ports-0.019
Token have
Token ID423
Feature activation+1.25e-02

Pos logit contributions
ags+0.008
 ours+0.007
ishes+0.007
 attention+0.006
 trash+0.006
Neg logit contributions
 Cr-0.009
ad-0.009
 jugg-0.008
Sus-0.008
 Sus-0.008
Token nice
Token ID3621
Feature activation+5.31e-03

Pos logit contributions
ags+0.022
ows+0.020
ishes+0.019
ses+0.019
 slot+0.019
Neg logit contributions
 Cr-0.019
ad-0.018
 Sus-0.016
 jugg-0.016
all-0.016
Token w
Token ID266
Feature activation+8.26e-03

Pos logit contributions
ags+0.005
ses+0.005
 ours+0.004
 slot+0.004
ows+0.004
Neg logit contributions
ad-0.011
 Cr-0.011
 Sus-0.011
all-0.011
Sus-0.010
Tokenagons
Token ID34765
Feature activation+5.06e-02

Pos logit contributions
 ours+0.007
 trash+0.006
ishes+0.006
 waits+0.006
 hopes+0.006
Neg logit contributions
ad-0.020
 Cr-0.018
all-0.018
Sus-0.018
 jugg-0.017
Token.
Token ID13
Feature activation+1.84e-03

Pos logit contributions
ags+0.076
 unnecessary+0.066
ishes+0.065
 ours+0.064
 trash+0.063
Neg logit contributions
 Cr-0.090
ad-0.086
all-0.083
 jugg-0.083
 Sus-0.081
Token You
Token ID921
Feature activation-6.50e-03

Pos logit contributions
ags+0.003
ishes+0.003
 ours+0.003
 attention+0.003
 trash+0.002
Neg logit contributions
 Cr-0.003
ad-0.003
 Sus-0.003
Sus-0.003
Johnny-0.003
Token don
Token ID836
Feature activation+3.25e-03

Pos logit contributions
 Cr+0.011
ad+0.011
 jugg+0.011
 Sus+0.010
all+0.010
Neg logit contributions
ags-0.011
 ours-0.009
 attention-0.009
ishes-0.009
ports-0.008
Token't
Token ID470
Feature activation+6.30e-03

Pos logit contributions
ags+0.008
 ours+0.007
ports+0.007
issors+0.007
hes+0.007
Neg logit contributions
 Cr-0.003
ad-0.003
 Sus-0.003
Sus-0.003
 jugg-0.003
Token need
Token ID761
Feature activation+6.90e-03

Pos logit contributions
ags+0.009
 ours+0.008
ishes+0.008
ports+0.008
 trash+0.008
Neg logit contributions
 Cr-0.012
ad-0.011
 jugg-0.011
Sus-0.011
 Sus-0.010
TokenThe
Token ID464
Feature activation-2.72e-03

Pos logit contributions
ags+0.003
 ours+0.003
ishes+0.003
 attention+0.002
 trash+0.002
Neg logit contributions
 Cr-0.004
ad-0.004
 jugg-0.004
Sus-0.004
all-0.003
Token cave
Token ID11527
Feature activation+3.26e-02

Pos logit contributions
ad+0.004
 Cr+0.004
Sus+0.004
 jugg+0.004
 Sus+0.004
Neg logit contributions
ags-0.004
ishes-0.004
 ours-0.004
ports-0.004
irt-0.004
Token is
Token ID318
Feature activation+1.67e-02

Pos logit contributions
ags+0.051
 ours+0.039
 unnecessary+0.038
issors+0.038
 dads+0.037
Neg logit contributions
ad-0.056
 Cr-0.055
 jugg-0.054
all-0.053
 Sus-0.052
Token the
Token ID262
Feature activation+1.29e-02

Pos logit contributions
ags+0.027
ishes+0.024
ports+0.023
irt+0.022
 heads+0.021
Neg logit contributions
 Cr-0.028
ad-0.027
 Sus-0.026
all-0.026
 jugg-0.026
Token best
Token ID1266
Feature activation+1.80e-02

Pos logit contributions
ags+0.018
ishes+0.017
ports+0.017
 ours+0.015
irt+0.015
Neg logit contributions
 Cr-0.024
ad-0.023
 jugg-0.022
 Sus-0.022
all-0.021
Token shelter
Token ID11772
Feature activation+4.88e-02

Pos logit contributions
ags+0.029
ports+0.027
ishes+0.026
issors+0.025
 ours+0.023
Neg logit contributions
 Sus-0.030
 Cr-0.030
ad-0.029
Sus-0.028
all-0.028
Token."
Token ID526
Feature activation+1.85e-02

Pos logit contributions
ags+0.099
 unnecessary+0.085
ishes+0.083
irt+0.081
ports+0.081
Neg logit contributions
 Cr-0.062
 Sus-0.060
all-0.060
ad-0.058
t-0.056
Token So
Token ID1406
Feature activation+1.70e-02

Pos logit contributions
 ours+0.039
ags+0.038
 unnecessary+0.037
 attention+0.037
ishes+0.036
Neg logit contributions
 Cr-0.026
ad-0.023
 Sus-0.022
 Bub-0.021
 Suz-0.021
Token,
Token ID11
Feature activation+1.93e-02

Pos logit contributions
ags+0.047
 ours+0.043
ishes+0.042
ws+0.041
irt+0.041
Neg logit contributions
 Cr-0.013
ad-0.011
 Sus-0.010
 jugg-0.009
 Suz-0.009
Token it
Token ID340
Feature activation+2.70e-03

Pos logit contributions
ags+0.039
 ours+0.035
ishes+0.034
 attention+0.032
 unnecessary+0.032
Neg logit contributions
 Cr-0.029
 jugg-0.026
ad-0.025
 Sus-0.025
Sus-0.024
Token drove
Token ID10357
Feature activation+1.90e-02

Pos logit contributions
ags+0.005
 ours+0.004
ishes+0.004
 attention+0.004
ports+0.004
Neg logit contributions
 Cr-0.004
ad-0.004
 jugg-0.004
Sus-0.004
all-0.004
Token lock
Token ID5793
Feature activation+1.03e-02

Pos logit contributions
ags+0.011
 ours+0.010
ishes+0.009
 attention+0.009
 unnecessary+0.009
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.010
 Sus-0.009
Sus-0.009
Token the
Token ID262
Feature activation-6.32e-04

Pos logit contributions
ags+0.020
ishes+0.017
ses+0.017
 unnecessary+0.017
 ours+0.017
Neg logit contributions
 Cr-0.014
 Sus-0.013
ad-0.013
Sus-0.012
all-0.012
Token door
Token ID3420
Feature activation+7.91e-03

Pos logit contributions
ad+0.001
 gn+0.001
 grizz+0.001
Ros+0.001
 Cr+0.001
Neg logit contributions
ags-0.001
ishes-0.001
 ours-0.001
irt-0.001
 dads-0.001
Token so
Token ID523
Feature activation+6.15e-03

Pos logit contributions
ags+0.013
 unnecessary+0.012
ishes+0.012
 trash+0.011
ports+0.011
Neg logit contributions
 Sus-0.012
ad-0.012
Sus-0.012
all-0.012
 Cr-0.012
Token no
Token ID645
Feature activation+2.19e-03

Pos logit contributions
ags+0.013
ishes+0.011
ws+0.010
aked+0.010
irt+0.010
Neg logit contributions
 Sus-0.008
 Cr-0.008
Sus-0.008
ad-0.008
 Suz-0.007
Token one
Token ID530
Feature activation+4.86e-02

Pos logit contributions
ags+0.004
ows+0.004
ishes+0.003
aked+0.003
ses+0.003
Neg logit contributions
 Cr-0.003
 Sus-0.003
 jugg-0.003
ad-0.003
Sus-0.003
Token can
Token ID460
Feature activation+3.81e-03

Pos logit contributions
ags+0.113
 unnecessary+0.086
agons+0.086
arl+0.083
ports+0.082
Neg logit contributions
 Sus-0.056
bell-0.055
 ever-0.055
�-0.054
 jugg-0.054
Token come
Token ID1282
Feature activation+1.01e-02

Pos logit contributions
ags+0.005
 ours+0.005
 trash+0.004
 attention+0.004
ishes+0.004
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.007
Sus-0.007
Johnny-0.007
Token in
Token ID287
Feature activation-3.71e-04

Pos logit contributions
ags+0.013
ishes+0.011
 trash+0.011
 ours+0.011
ports+0.011
Neg logit contributions
 Cr-0.020
ad-0.020
Sus-0.019
 Sus-0.019
all-0.019
Token,"
Token ID553
Feature activation-3.25e-03

Pos logit contributions
 Cr+0.001
ad+0.001
Sus+0.001
 Sus+0.001
 jugg+0.001
Neg logit contributions
ags-0.001
ishes-0.000
 ours-0.000
 trash-0.000
 attention-0.000
Token said
Token ID531
Feature activation+1.05e-02

Pos logit contributions
 Cr+0.005
 Sus+0.005
 jugg+0.005
ad+0.004
Sus+0.004
Neg logit contributions
ags-0.006
 ours-0.006
ishes-0.005
 unnecessary-0.005
 attention-0.005
Token Sara
Token ID24799
Feature activation+2.94e-02

Pos logit contributions
 Cr+0.016
ad+0.015
 jugg+0.015
 Sus+0.014
Sus+0.014
Neg logit contributions
ags-0.012
 ours-0.011
ishes-0.010
 unnecessary-0.010
 attention-0.010
Token says
Token ID1139
Feature activation+1.16e-02

Pos logit contributions
ags+0.027
 ours+0.025
 unnecessary+0.022
ishes+0.022
ports+0.020
Neg logit contributions
ad-0.069
 Cr-0.069
 jugg-0.067
all-0.064
Sus-0.063
Token.
Token ID13
Feature activation+1.96e-02

Pos logit contributions
ags+0.016
 ours+0.015
ishes+0.014
 attention+0.013
 trash+0.013
Neg logit contributions
 Cr-0.022
ad-0.021
 jugg-0.021
Sus-0.020
 Sus-0.020
Token "
Token ID366
Feature activation+1.85e-02

Pos logit contributions
ags+0.043
 ours+0.042
ishes+0.041
 unnecessary+0.040
 attention+0.039
Neg logit contributions
 Cr-0.023
ad-0.021
 jugg-0.019
 Sus-0.019
 Nice-0.019
TokenWhat
Token ID2061
Feature activation+1.91e-02

Pos logit contributions
ags+0.031
ishes+0.027
 ours+0.027
 attention+0.026
 unnecessary+0.025
Neg logit contributions
 Cr-0.031
ad-0.030
 jugg-0.030
Sus-0.030
Johnny-0.028
Token is
Token ID318
Feature activation+4.83e-02

Pos logit contributions
ags+0.032
ishes+0.027
 heads+0.025
 unnecessary+0.024
 attention+0.024
Neg logit contributions
 jugg-0.034
ad-0.033
 Cr-0.032
on-0.030
 Nice-0.029
Token inside
Token ID2641
Feature activation+1.22e-02

Pos logit contributions
ags+0.108
ishes+0.093
 unnecessary+0.088
ows+0.086
arl+0.086
Neg logit contributions
 Cr-0.061
 Sus-0.057
 Bub-0.050
ad-0.049
 Ros-0.049
Token?"
Token ID1701
Feature activation+1.22e-02

Pos logit contributions
ags+0.022
ishes+0.018
ses+0.018
 unnecessary+0.018
 ours+0.018
Neg logit contributions
 Cr-0.018
ad-0.018
 Sus-0.017
all-0.017
 jugg-0.017
Token
Token ID198
Feature activation+1.29e-02

Pos logit contributions
ags+0.027
 ours+0.027
ishes+0.026
 unnecessary+0.024
 dollars+0.024
Neg logit contributions
 Cr-0.014
ad-0.013
 Sus-0.012
 jugg-0.011
all-0.011
Token
Token ID198
Feature activation+1.53e-02

Pos logit contributions
ags+0.024
 ours+0.022
ishes+0.021
 trash+0.021
 attention+0.020
Neg logit contributions
 Cr-0.019
ad-0.018
 jugg-0.017
Sus-0.017
 Sus-0.016
Token"
Token ID1
Feature activation+1.59e-02

Pos logit contributions
ags+0.027
 ours+0.025
ishes+0.023
 trash+0.023
 attention+0.023
Neg logit contributions
 Cr-0.024
ad-0.023
 jugg-0.022
Sus-0.022
 Sus-0.021
Token grows
Token ID13676
Feature activation+3.31e-03

Pos logit contributions
ags+0.011
 ours+0.010
ishes+0.010
 trash+0.009
 attention+0.009
Neg logit contributions
 Cr-0.014
ad-0.013
 jugg-0.013
Sus-0.012
 Sus-0.012
Token.
Token ID13
Feature activation-4.88e-03

Pos logit contributions
ags+0.005
ses+0.005
 ours+0.005
 slot+0.005
 sons+0.005
Neg logit contributions
ad-0.005
 Cr-0.005
all-0.005
Sus-0.005
 Sus-0.005
Token They
Token ID1119
Feature activation+9.63e-03

Pos logit contributions
 Cr+0.008
 Sus+0.007
 jugg+0.007
ad+0.007
Sus+0.007
Neg logit contributions
ags-0.008
 ours-0.008
 attention-0.007
ishes-0.007
 unnecessary-0.007
Token water
Token ID1660
Feature activation+1.39e-02

Pos logit contributions
ags+0.016
 ours+0.014
 unnecessary+0.014
 attention+0.013
ports+0.013
Neg logit contributions
 jugg-0.016
ad-0.016
 Cr-0.016
 Sus-0.015
Sus-0.014
Token it
Token ID340
Feature activation+7.49e-04

Pos logit contributions
ags+0.021
 ours+0.019
ishes+0.018
 unnecessary+0.018
 trash+0.018
Neg logit contributions
 Cr-0.025
ad-0.024
 jugg-0.023
Sus-0.022
 Sus-0.022
Token every
Token ID790
Feature activation+4.78e-02

Pos logit contributions
ags+0.001
 ours+0.001
ishes+0.001
 trash+0.001
 attention+0.001
Neg logit contributions
 Cr-0.001
ad-0.001
 jugg-0.001
Sus-0.001
 Sus-0.001
Token day
Token ID1110
Feature activation+3.80e-03

Pos logit contributions
ags+0.090
 ours+0.083
ports+0.083
ows+0.082
ishes+0.081
Neg logit contributions
ad-0.066
 Cr-0.066
 jugg-0.066
all-0.062
bird-0.061
Token with
Token ID351
Feature activation-1.66e-03

Pos logit contributions
ags+0.007
 ours+0.006
ishes+0.006
 trash+0.005
ports+0.005
Neg logit contributions
 Cr-0.006
ad-0.006
 jugg-0.006
all-0.006
 Sus-0.005
Token a
Token ID257
Feature activation+1.02e-02

Pos logit contributions
 Cr+0.004
ad+0.003
Sus+0.003
 jugg+0.003
all+0.003
Neg logit contributions
ags-0.002
 ours-0.002
 trash-0.002
 hands-0.002
ishes-0.002
Token can
Token ID460
Feature activation-1.13e-02

Pos logit contributions
ags+0.016
 ours+0.014
ishes+0.014
ports+0.013
 attention+0.012
Neg logit contributions
 jugg-0.018
 Cr-0.017
ad-0.017
Sus-0.016
 Sus-0.016
Token.
Token ID13
Feature activation-4.21e-03

Pos logit contributions
 Cr+0.022
Johnny+0.021
Sus+0.021
ad+0.020
 jugg+0.020
Neg logit contributions
ags-0.015
 ours-0.013
 attention-0.013
 hands-0.013
 trash-0.012
Token's
Token ID338
Feature activation+9.10e-03

Pos logit contributions
ags+0.004
 ours+0.003
ses+0.003
ishes+0.003
 trailer+0.003
Neg logit contributions
 Cr-0.002
ad-0.002
Sus-0.002
 Sus-0.002
 jugg-0.002
Token go
Token ID467
Feature activation-2.96e-03

Pos logit contributions
ags+0.015
 ours+0.013
ishes+0.013
ports+0.012
 unnecessary+0.012
Neg logit contributions
ad-0.016
 Cr-0.015
 jugg-0.015
 Sus-0.014
Sus-0.014
Token find
Token ID1064
Feature activation+1.45e-02

Pos logit contributions
 Cr+0.005
ad+0.005
 jugg+0.005
Sus+0.005
 Sus+0.005
Neg logit contributions
ags-0.005
 ours-0.004
ishes-0.004
 trash-0.004
 unnecessary-0.004
Token a
Token ID257
Feature activation+7.53e-03

Pos logit contributions
ags+0.025
ses+0.023
 heads+0.022
ishes+0.022
ows+0.022
Neg logit contributions
 Cr-0.023
 Sus-0.021
ad-0.021
Sus-0.020
Johnny-0.019
Token new
Token ID649
Feature activation+8.41e-03

Pos logit contributions
ags+0.010
 ours+0.009
ishes+0.008
 trash+0.008
 attention+0.008
Neg logit contributions
 Cr-0.015
ad-0.015
 jugg-0.014
Sus-0.014
 Sus-0.014
Token battery
Token ID6555
Feature activation+4.75e-02

Pos logit contributions
ags+0.010
 ours+0.010
 trash+0.009
 attention+0.009
 hands+0.009
Neg logit contributions
 Cr-0.017
ad-0.017
Sus-0.016
 jugg-0.016
all-0.015
Token."
Token ID526
Feature activation+9.76e-04

Pos logit contributions
ags+0.079
 unnecessary+0.069
 trash+0.065
ishes+0.062
ports+0.062
Neg logit contributions
 Cr-0.078
 Sus-0.074
all-0.074
ad-0.072
 jugg-0.071
Token
Token ID198
Feature activation+4.17e-03

Pos logit contributions
ags+0.002
 ours+0.002
 unnecessary+0.002
 attention+0.002
 pests+0.002
Neg logit contributions
 Cr-0.001
 Sus-0.001
Sus-0.001
ad-0.001
Ros-0.001
Token
Token ID198
Feature activation-9.17e-04

Pos logit contributions
ags+0.008
 ours+0.008
ishes+0.008
 attention+0.007
 trash+0.007
Neg logit contributions
 Cr-0.005
ad-0.005
Sus-0.005
 jugg-0.005
Johnny-0.005
TokenThey
Token ID2990
Feature activation-7.10e-03

Pos logit contributions
 Cr+0.001
ad+0.001
Sus+0.001
 jugg+0.001
Johnny+0.001
Neg logit contributions
ags-0.002
 ours-0.002
ishes-0.002
 attention-0.002
 trash-0.002
Token went
Token ID1816
Feature activation-7.78e-03

Pos logit contributions
 Cr+0.011
ad+0.011
 jugg+0.011
Sus+0.010
 Sus+0.010
Neg logit contributions
ags-0.012
 ours-0.011
ishes-0.011
 trash-0.010
 attention-0.010
Token sign
Token ID1051
Feature activation+6.47e-03

Pos logit contributions
ags+0.025
ishes+0.021
aked+0.020
ports+0.020
 ours+0.019
Neg logit contributions
 jugg-0.024
 Cr-0.023
Sus-0.023
ad-0.023
 Sus-0.023
Token.
Token ID13
Feature activation+2.20e-02

Pos logit contributions
ags+0.012
ishes+0.010
 unnecessary+0.010
ports+0.010
 heads+0.009
Neg logit contributions
ad-0.010
 Cr-0.010
 jugg-0.010
Sus-0.009
 Sus-0.009
Token It
Token ID632
Feature activation+1.62e-02

Pos logit contributions
ags+0.043
ishes+0.039
 unnecessary+0.039
 attention+0.038
 ours+0.036
Neg logit contributions
 Cr-0.033
 Sus-0.029
ad-0.028
 jugg-0.027
 Nice-0.027
Token says
Token ID1139
Feature activation+2.27e-02

Pos logit contributions
ags+0.025
 unnecessary+0.020
 attention+0.019
 ours+0.019
ishes+0.019
Neg logit contributions
 Cr-0.030
ad-0.030
 jugg-0.029
 Sus-0.028
bell-0.027
Token '
Token ID705
Feature activation+1.34e-02

Pos logit contributions
ags+0.040
 ours+0.036
ishes+0.036
 trash+0.034
 attention+0.034
Neg logit contributions
 Cr-0.035
ad-0.034
 jugg-0.032
Sus-0.032
 Sus-0.031
TokenPost
Token ID6307
Feature activation+4.75e-02

Pos logit contributions
ags+0.025
 ours+0.022
ishes+0.021
 attention+0.021
 trash+0.021
Neg logit contributions
 Cr-0.020
ad-0.020
 jugg-0.019
Sus-0.019
all-0.018
Token Office
Token ID4452
Feature activation+2.74e-02

Pos logit contributions
ags+0.081
 unnecessary+0.070
 ours+0.069
ishes+0.068
 trash+0.068
Neg logit contributions
ad-0.076
 Cr-0.070
all-0.070
 Sus-0.070
ous-0.066
Token'.
Token ID4458
Feature activation+1.10e-02

Pos logit contributions
ags+0.048
 ours+0.042
ishes+0.042
 unnecessary+0.041
 trash+0.041
Neg logit contributions
ad-0.043
 Cr-0.041
 jugg-0.040
all-0.040
Sus-0.038
Token Can
Token ID1680
Feature activation+6.16e-03

Pos logit contributions
ags+0.022
ishes+0.021
 attention+0.020
 unnecessary+0.019
 heads+0.019
Neg logit contributions
 Cr-0.016
 Sus-0.014
ad-0.013
Sus-0.013
all-0.013
Token you
Token ID345
Feature activation-2.07e-03

Pos logit contributions
ags+0.006
 ours+0.006
ports+0.005
 trash+0.005
ishes+0.005
Neg logit contributions
ad-0.013
 Cr-0.013
 jugg-0.013
Sus-0.013
Jo-0.012
Token read
Token ID1100
Feature activation-1.70e-04

Pos logit contributions
 Cr+0.004
 Nice+0.004
Sus+0.004
Jo+0.004
ad+0.004
Neg logit contributions
ags-0.002
 ours-0.002
 hands-0.002
 attention-0.002
 slot-0.002
Token.
Token ID13
Feature activation+7.40e-03

Pos logit contributions
ags+0.025
 ours+0.022
ishes+0.021
 trash+0.021
ports+0.020
Neg logit contributions
 Cr-0.031
ad-0.031
 jugg-0.030
Sus-0.029
all-0.028
Token Lily
Token ID20037
Feature activation+1.69e-02

Pos logit contributions
ags+0.012
 ours+0.011
ishes+0.011
 attention+0.011
 unnecessary+0.010
Neg logit contributions
 Cr-0.013
 Sus-0.012
Sus-0.012
Johnny-0.011
ad-0.011
Token grabs
Token ID22378
Feature activation+3.98e-02

Pos logit contributions
ags+0.021
 ours+0.018
 attention+0.016
 unnecessary+0.016
ishes+0.016
Neg logit contributions
ad-0.035
 jugg-0.035
 Cr-0.035
 chuckled-0.033
all-0.032
Token the
Token ID262
Feature activation+1.37e-02

Pos logit contributions
ags+0.073
 ours+0.067
ishes+0.064
 unnecessary+0.063
ses+0.061
Neg logit contributions
 Cr-0.059
ad-0.056
 Sus-0.053
 jugg-0.052
Sus-0.051
Token scissors
Token ID41786
Feature activation+6.60e-03

Pos logit contributions
ags+0.019
ishes+0.018
 ours+0.016
ans+0.016
irt+0.015
Neg logit contributions
ad-0.023
 jugg-0.023
on-0.021
 marble-0.021
 gn-0.021
Token and
Token ID290
Feature activation+4.72e-02

Pos logit contributions
ags+0.008
 ours+0.008
 attention+0.007
ishes+0.007
 hands+0.007
Neg logit contributions
 Cr-0.013
ad-0.013
 jugg-0.013
Sus-0.012
Johnny-0.012
Token runs
Token ID4539
Feature activation+2.31e-02

Pos logit contributions
ags+0.069
 unnecessary+0.056
 ours+0.054
ishes+0.051
 attention+0.050
Neg logit contributions
 jugg-0.090
 Cr-0.089
ad-0.088
 Sus-0.081
 chuckled-0.080
Token to
Token ID284
Feature activation+1.12e-02

Pos logit contributions
ags+0.039
 ours+0.035
ishes+0.035
ports+0.033
irt+0.032
Neg logit contributions
 Cr-0.037
ad-0.036
Sus-0.034
 Sus-0.034
all-0.034
Token the
Token ID262
Feature activation+8.06e-03

Pos logit contributions
ags+0.020
ishes+0.018
 ours+0.018
ports+0.017
 trash+0.017
Neg logit contributions
 Cr-0.017
ad-0.017
 jugg-0.016
 Sus-0.016
Sus-0.016
Token box
Token ID3091
Feature activation+8.95e-03

Pos logit contributions
ags+0.011
ishes+0.010
irt+0.009
 ours+0.009
 unnecessary+0.009
Neg logit contributions
ad-0.015
 jugg-0.015
 gn-0.014
Ros-0.014
Johnny-0.014
Token.
Token ID13
Feature activation+1.63e-02

Pos logit contributions
ags+0.013
 ours+0.012
ishes+0.011
 trash+0.011
ports+0.011
Neg logit contributions
 Cr-0.016
ad-0.016
 jugg-0.015
Sus-0.015
 Sus-0.015
Token lock
Token ID5793
Feature activation+1.40e-02

Pos logit contributions
ags+0.006
 ours+0.005
ishes+0.005
 trash+0.005
 attention+0.005
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.009
Sus-0.009
 Sus-0.009
Token the
Token ID262
Feature activation+4.56e-03

Pos logit contributions
ags+0.024
ishes+0.022
irt+0.020
 dads+0.020
 unnecessary+0.020
Neg logit contributions
ad-0.021
 Cr-0.021
 Sus-0.019
all-0.019
 jugg-0.018
Token door
Token ID3420
Feature activation+7.24e-03

Pos logit contributions
ags+0.008
ishes+0.008
hes+0.007
 ours+0.007
irt+0.006
Neg logit contributions
ad-0.007
 Cr-0.007
 jugg-0.006
 grizz-0.006
 Bub-0.006
Token so
Token ID523
Feature activation+5.90e-03

Pos logit contributions
ags+0.012
ishes+0.011
 unnecessary+0.011
ports+0.010
ses+0.010
Neg logit contributions
all-0.011
ad-0.011
 Cr-0.011
 Sus-0.011
bell-0.010
Token no
Token ID645
Feature activation+2.03e-03

Pos logit contributions
ags+0.010
ishes+0.009
ports+0.008
irt+0.008
ws+0.008
Neg logit contributions
 Cr-0.010
 Sus-0.009
ad-0.009
Sus-0.009
 jugg-0.008
Token one
Token ID530
Feature activation+4.68e-02

Pos logit contributions
ags+0.003
ishes+0.003
 ours+0.003
ports+0.003
 trash+0.002
Neg logit contributions
 Cr-0.004
ad-0.004
 jugg-0.004
Sus-0.003
 Sus-0.003
Token can
Token ID460
Feature activation+1.08e-02

Pos logit contributions
ags+0.112
ports+0.095
agons+0.092
ishes+0.091
issors+0.090
Neg logit contributions
 ever-0.046
 jugg-0.044
all-0.044
t-0.043
ad-0.040
Token come
Token ID1282
Feature activation+7.30e-03

Pos logit contributions
ags+0.014
 ours+0.013
 attention+0.012
 trash+0.012
 hopes+0.012
Neg logit contributions
 Cr-0.022
ad-0.020
 jugg-0.020
Sus-0.020
Johnny-0.019
Token in
Token ID287
Feature activation+1.23e-04

Pos logit contributions
ags+0.009
ishes+0.008
 ours+0.008
 trash+0.008
ports+0.007
Neg logit contributions
 Cr-0.015
ad-0.014
Sus-0.014
all-0.014
 jugg-0.014
Token."
Token ID526
Feature activation-3.24e-03

Pos logit contributions
ags+0.000
ishes+0.000
 ours+0.000
 trash+0.000
 attention+0.000
Neg logit contributions
 Cr-0.000
ad-0.000
 jugg-0.000
Sus-0.000
all-0.000
Token Rose
Token ID8049
Feature activation+4.76e-03

Pos logit contributions
 Cr+0.005
ad+0.005
 jugg+0.005
Sus+0.005
 Sus+0.004
Neg logit contributions
ags-0.006
 ours-0.005
ishes-0.005
 trash-0.005
 attention-0.005
Token.
Token ID13
Feature activation+1.51e-02

Pos logit contributions
 ours+0.019
ags+0.018
ishes+0.018
 trash+0.017
ports+0.017
Neg logit contributions
ad-0.016
 Cr-0.016
Sus-0.014
 jugg-0.014
all-0.014
Token "
Token ID366
Feature activation+4.98e-04

Pos logit contributions
ags+0.033
 ours+0.033
ishes+0.031
 unnecessary+0.030
irt+0.029
Neg logit contributions
 Cr-0.017
 Nice-0.015
Sus-0.015
ad-0.014
all-0.014
TokenAnd
Token ID1870
Feature activation+2.14e-02

Pos logit contributions
ags+0.001
 ours+0.001
ishes+0.001
 attention+0.001
 trash+0.001
Neg logit contributions
 Cr-0.001
ad-0.001
 jugg-0.001
Sus-0.001
all-0.001
Token see
Token ID766
Feature activation+1.49e-02

Pos logit contributions
ags+0.046
ishes+0.041
aked+0.039
inters+0.039
 trash+0.038
Neg logit contributions
 Cr-0.028
 Sus-0.026
 jugg-0.025
all-0.023
ad-0.023
Token what
Token ID644
Feature activation+1.35e-02

Pos logit contributions
ags+0.029
aked+0.026
inters+0.025
issors+0.025
suits+0.025
Neg logit contributions
 Cr-0.020
ad-0.018
 jugg-0.018
all-0.018
 Sus-0.017
Token is
Token ID318
Feature activation+4.65e-02

Pos logit contributions
ags+0.026
ishes+0.021
issors+0.020
irt+0.020
 reins+0.020
Neg logit contributions
 jugg-0.021
 Cr-0.020
ad-0.019
 Sus-0.018
on-0.017
Token inside
Token ID2641
Feature activation+6.54e-03

Pos logit contributions
ags+0.089
ishes+0.087
 heads+0.079
arl+0.078
irt+0.078
Neg logit contributions
 Cr-0.054
 Sus-0.051
 Bub-0.047
 --0.047
all-0.046
Token."
Token ID526
Feature activation+1.21e-02

Pos logit contributions
ags+0.011
ses+0.009
ishes+0.009
 unnecessary+0.009
 trailer+0.008
Neg logit contributions
ad-0.011
 Cr-0.010
 Sus-0.010
all-0.010
 jugg-0.010
Token
Token ID198
Feature activation+8.03e-03

Pos logit contributions
 ours+0.027
ags+0.027
ishes+0.027
 heads+0.025
 unnecessary+0.025
Neg logit contributions
 Cr-0.013
ad-0.011
Sus-0.010
 Nice-0.010
 Sus-0.010
Token
Token ID198
Feature activation+1.02e-02

Pos logit contributions
ags+0.016
 ours+0.015
ishes+0.015
 trash+0.014
 attention+0.014
Neg logit contributions
 Cr-0.010
ad-0.010
Sus-0.009
 jugg-0.009
Johnny-0.009
TokenBen
Token ID11696
Feature activation+2.26e-02

Pos logit contributions
ags+0.018
 ours+0.017
ishes+0.017
 attention+0.016
 unnecessary+0.015
Neg logit contributions
ad-0.015
 Cr-0.015
Sus-0.015
 jugg-0.014
Johnny-0.014
Token!
Token ID0
Feature activation+2.31e-02

Pos logit contributions
 Cr+0.007
ad+0.007
 jugg+0.007
Sus+0.007
Johnny+0.006
Neg logit contributions
ags-0.006
 ours-0.005
ishes-0.005
 attention-0.005
 trash-0.005
Token It
Token ID632
Feature activation+4.42e-03

Pos logit contributions
ags+0.037
 ours+0.034
ishes+0.033
 trash+0.032
 attention+0.032
Neg logit contributions
 Cr-0.041
ad-0.039
 jugg-0.037
 Sus-0.037
Sus-0.037
Token's
Token ID338
Feature activation+1.20e-02

Pos logit contributions
ags+0.007
 ours+0.007
ishes+0.006
 attention+0.006
 trash+0.006
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.007
Sus-0.007
 Sus-0.007
Token the
Token ID262
Feature activation+1.50e-02

Pos logit contributions
ags+0.020
ishes+0.018
ports+0.017
 ours+0.017
 heads+0.017
Neg logit contributions
 Cr-0.020
ad-0.019
 jugg-0.018
 Sus-0.018
all-0.018
Token best
Token ID1266
Feature activation+2.02e-02

Pos logit contributions
ags+0.018
ishes+0.016
ports+0.015
 ours+0.015
irt+0.014
Neg logit contributions
ad-0.031
 Cr-0.030
 jugg-0.030
all-0.028
 gn-0.028
Token fort
Token ID6285
Feature activation+4.63e-02

Pos logit contributions
ags+0.034
ports+0.031
ishes+0.030
 ours+0.029
issors+0.028
Neg logit contributions
ad-0.033
 jugg-0.032
 Sus-0.032
all-0.032
 Suz-0.031
Token ever
Token ID1683
Feature activation-5.12e-03

Pos logit contributions
ags+0.111
ishes+0.100
 unnecessary+0.099
 trash+0.099
ports+0.098
Neg logit contributions
ad-0.041
 Cr-0.040
 jugg-0.039
all-0.038
 Sus-0.037
Token!"
Token ID2474
Feature activation+1.40e-02

Pos logit contributions
 Cr+0.010
ad+0.010
 jugg+0.009
Sus+0.009
 Sus+0.009
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 trash-0.006
 attention-0.006
Token\n
Token ID198
Feature activation+1.65e-02

Pos logit contributions
ags+0.028
 ours+0.027
ishes+0.026
 trash+0.026
 attention+0.025
Neg logit contributions
 Cr-0.019
ad-0.018
Sus-0.016
 jugg-0.016
 Sus-0.016
Token\n
Token ID198
Feature activation+1.67e-02

Pos logit contributions
ags+0.027
 ours+0.024
ishes+0.023
 trash+0.023
ports+0.023
Neg logit contributions
 Cr-0.027
ad-0.026
 jugg-0.025
Sus-0.024
 Sus-0.024
TokenThe
Token ID464
Feature activation-7.48e-04

Pos logit contributions
ags+0.026
 ours+0.023
ports+0.023
 dads+0.022
OK+0.022
Neg logit contributions
 Cr-0.028
ad-0.026
 Sus-0.026
 Nice-0.024
 grizz-0.024
Token of
Token ID286
Feature activation+2.17e-02

Pos logit contributions
ags+0.021
 ours+0.020
ishes+0.019
 unnecessary+0.019
ports+0.019
Neg logit contributions
ad-0.010
 jugg-0.009
 Cr-0.009
all-0.009
Sus-0.008
Token a
Token ID257
Feature activation+7.77e-03

Pos logit contributions
ags+0.034
 ours+0.031
ishes+0.030
 trash+0.029
 attention+0.029
Neg logit contributions
 Cr-0.037
ad-0.036
 jugg-0.035
Sus-0.034
 Sus-0.033
Token hill
Token ID12788
Feature activation+1.12e-02

Pos logit contributions
ags+0.012
 ours+0.010
ishes+0.010
 attention+0.010
ports+0.009
Neg logit contributions
 Cr-0.014
 jugg-0.014
ad-0.014
Sus-0.013
Johnny-0.013
Token and
Token ID290
Feature activation+1.96e-02

Pos logit contributions
ags+0.016
 ours+0.015
ishes+0.014
 trash+0.014
 attention+0.013
Neg logit contributions
 Cr-0.021
ad-0.020
 jugg-0.019
Sus-0.019
 Sus-0.019
Token no
Token ID645
Feature activation+1.11e-02

Pos logit contributions
ags+0.034
 ours+0.031
ishes+0.030
 trash+0.029
 attention+0.028
Neg logit contributions
 Cr-0.031
ad-0.030
 jugg-0.029
Sus-0.029
 Sus-0.028
Token one
Token ID530
Feature activation+4.63e-02

Pos logit contributions
ags+0.023
ishes+0.020
 ours+0.020
ows+0.020
ports+0.020
Neg logit contributions
 jugg-0.016
ad-0.014
 Cr-0.014
 Sus-0.013
 chuckled-0.013
Token knew
Token ID2993
Feature activation+1.44e-02

Pos logit contributions
ags+0.099
 ours+0.088
ishes+0.084
ports+0.082
 unnecessary+0.081
Neg logit contributions
 jugg-0.061
ad-0.060
 Cr-0.060
 Sus-0.055
all-0.055
Token why
Token ID1521
Feature activation+4.37e-03

Pos logit contributions
ags+0.026
ishes+0.022
 ours+0.021
irt+0.021
 trash+0.021
Neg logit contributions
 Cr-0.022
ad-0.022
 jugg-0.021
 Sus-0.021
all-0.021
Token it
Token ID340
Feature activation-3.60e-04

Pos logit contributions
ags+0.008
 ours+0.007
ishes+0.007
 trash+0.006
ports+0.006
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.006
Sus-0.006
 Sus-0.006
Token was
Token ID373
Feature activation+1.64e-02

Pos logit contributions
 Cr+0.001
Sus+0.001
ad+0.001
Johnny+0.001
 Sus+0.000
Neg logit contributions
ags-0.001
 trash-0.001
 ours-0.001
ishes-0.001
 hopes-0.001
Token there
Token ID612
Feature activation+1.26e-02

Pos logit contributions
ags+0.026
 ours+0.022
ishes+0.022
ports+0.021
 trash+0.021
Neg logit contributions
 Cr-0.029
ad-0.028
 jugg-0.027
Sus-0.026
 Sus-0.026
Token successful
Token ID4388
Feature activation+9.76e-03

Pos logit contributions
ags+0.010
ses+0.010
 ours+0.010
ishes+0.010
ows+0.010
Neg logit contributions
Sus-0.012
ad-0.011
 Cr-0.011
 Sus-0.011
 jugg-0.011
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
ags+0.018
 ours+0.016
 trash+0.015
ishes+0.015
ses+0.015
Neg logit contributions
ad-0.014
all-0.014
 Cr-0.014
 jugg-0.014
 Sus-0.014
Token He
Token ID679
Feature activation+1.04e-02

Pos logit contributions
 Cr+0.013
ad+0.013
 jugg+0.012
Sus+0.012
 Sus+0.011
Neg logit contributions
ags-0.020
 ours-0.019
ishes-0.018
 trash-0.018
 attention-0.018
Token worked
Token ID3111
Feature activation+1.18e-02

Pos logit contributions
ags+0.017
 ours+0.015
ishes+0.015
 trash+0.014
 attention+0.014
Neg logit contributions
 Cr-0.017
ad-0.017
 jugg-0.016
Sus-0.016
 Sus-0.015
Token hard
Token ID1327
Feature activation+1.10e-02

Pos logit contributions
ags+0.016
 ours+0.016
ishes+0.015
ses+0.014
 trash+0.014
Neg logit contributions
ad-0.021
 Cr-0.021
all-0.020
 jugg-0.020
Sus-0.020
Token every
Token ID790
Feature activation+4.62e-02

Pos logit contributions
ags+0.019
 ours+0.018
ishes+0.017
 trash+0.016
ports+0.016
Neg logit contributions
ad-0.017
 Cr-0.017
all-0.016
 jugg-0.016
Sus-0.015
Token day
Token ID1110
Feature activation+1.02e-02

Pos logit contributions
ags+0.077
ishes+0.071
 ours+0.071
ows+0.065
ports+0.065
Neg logit contributions
ad-0.078
 jugg-0.078
 Cr-0.075
all-0.072
 Sus-0.071
Token,
Token ID11
Feature activation+1.43e-03

Pos logit contributions
ags+0.016
 ours+0.015
ishes+0.014
 trash+0.014
 attention+0.014
Neg logit contributions
 Cr-0.017
ad-0.017
 jugg-0.016
Sus-0.016
 Sus-0.015
Token but
Token ID475
Feature activation-1.32e-02

Pos logit contributions
ags+0.002
 ours+0.002
ishes+0.002
 trash+0.002
 attention+0.002
Neg logit contributions
 Cr-0.002
ad-0.002
 jugg-0.002
Sus-0.002
 Sus-0.002
Token he
Token ID339
Feature activation-2.61e-03

Pos logit contributions
 Cr+0.018
ad+0.018
Sus+0.017
 jugg+0.016
Ros+0.016
Neg logit contributions
 ours-0.023
ags-0.023
 trash-0.022
 attention-0.022
 best-0.022
Token had
Token ID550
Feature activation+3.71e-03

Pos logit contributions
 Cr+0.004
ad+0.004
Sus+0.004
 Nice+0.004
 jugg+0.004
Neg logit contributions
ags-0.004
ishes-0.004
 ours-0.004
 trash-0.004
 hopes-0.004
Token course
Token ID1781
Feature activation+8.69e-03

Pos logit contributions
ags+0.000
 ours+0.000
ishes+0.000
 trash+0.000
 attention+0.000
Neg logit contributions
 Cr-0.000
 jugg-0.000
ad-0.000
Sus-0.000
 Sus-0.000
Token!
Token ID0
Feature activation+1.74e-02

Pos logit contributions
ags+0.019
 ours+0.017
ishes+0.017
 trash+0.017
ports+0.016
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.009
Sus-0.009
 Sus-0.009
Token I
Token ID314
Feature activation+1.05e-02

Pos logit contributions
ags+0.027
 ours+0.025
 trash+0.024
ishes+0.024
 unnecessary+0.023
Neg logit contributions
 Cr-0.032
ad-0.029
 jugg-0.028
 Sus-0.028
 Nice-0.028
Token own
Token ID898
Feature activation+9.26e-03

Pos logit contributions
ags+0.012
 ours+0.010
ishes+0.009
 trash+0.009
ports+0.009
Neg logit contributions
 Cr-0.023
ad-0.023
 jugg-0.022
Sus-0.022
 Sus-0.021
Token this
Token ID428
Feature activation+3.57e-05

Pos logit contributions
ags+0.015
 ours+0.013
ishes+0.013
 attention+0.012
ports+0.012
Neg logit contributions
 Cr-0.016
ad-0.016
 jugg-0.015
 Sus-0.015
Sus-0.015
Token cooler
Token ID19346
Feature activation+4.60e-02

Pos logit contributions
ags+0.000
ishes+0.000
 unnecessary+0.000
 ours+0.000
ports+0.000
Neg logit contributions
 Cr-0.000
ad-0.000
 jugg-0.000
 Sus-0.000
Sus-0.000
Token but
Token ID475
Feature activation+5.10e-03

Pos logit contributions
ags+0.082
 unnecessary+0.073
ishes+0.068
 ours+0.067
irt+0.066
Neg logit contributions
 Cr-0.072
ad-0.068
all-0.066
 jugg-0.066
 Sus-0.066
Token my
Token ID616
Feature activation+1.86e-02

Pos logit contributions
ags+0.008
 ours+0.007
ishes+0.006
 trash+0.006
ports+0.006
Neg logit contributions
 Cr-0.009
ad-0.009
 jugg-0.009
 Sus-0.009
Sus-0.009
Token friends
Token ID2460
Feature activation+2.20e-02

Pos logit contributions
ags+0.027
 ours+0.024
ports+0.024
issors+0.023
ishes+0.023
Neg logit contributions
 Cr-0.033
ad-0.031
 jugg-0.030
 Sus-0.030
 gn-0.030
Token can
Token ID460
Feature activation-4.61e-03

Pos logit contributions
ags+0.031
 ours+0.026
ishes+0.026
ports+0.025
 unnecessary+0.025
Neg logit contributions
ad-0.041
 jugg-0.040
 Cr-0.040
 Sus-0.038
all-0.038
Token use
Token ID779
Feature activation+3.02e-04

Pos logit contributions
 Cr+0.009
igsaw+0.008
Sus+0.008
 Nice+0.008
 chuckled+0.008
Neg logit contributions
 ours-0.006
ags-0.006
 will-0.005
 head-0.005
 best-0.005
Token!"
Token ID2474
Feature activation+1.60e-03

Pos logit contributions
ags+0.006
 ours+0.005
ishes+0.005
 trash+0.005
 attention+0.005
Neg logit contributions
 Cr-0.008
ad-0.008
 jugg-0.008
Sus-0.008
 Sus-0.007
Token\n
Token ID198
Feature activation+1.33e-02

Pos logit contributions
ags+0.003
 ours+0.003
ishes+0.003
 attention+0.003
 unnecessary+0.003
Neg logit contributions
 Cr-0.002
ad-0.002
Sus-0.002
 jugg-0.002
 Sus-0.002
Token\n
Token ID198
Feature activation+1.54e-02

Pos logit contributions
ags+0.026
 ours+0.025
ishes+0.024
 attention+0.023
 trash+0.023
Neg logit contributions
 Cr-0.017
ad-0.017
Sus-0.016
 jugg-0.016
Johnny-0.015
TokenM
Token ID44
Feature activation-1.06e-02

Pos logit contributions
ags+0.031
 ours+0.029
ishes+0.028
 attention+0.027
 trash+0.026
Neg logit contributions
 Cr-0.021
ad-0.020
Sus-0.020
 jugg-0.019
Johnny-0.018
Tokenama
Token ID1689
Feature activation+6.56e-03

Pos logit contributions
ad+0.016
 Cr+0.015
Sus+0.014
all+0.014
 jugg+0.013
Neg logit contributions
ags-0.019
 ours-0.019
ishes-0.017
 trash-0.017
 unnecessary-0.017
Token and
Token ID290
Feature activation+4.59e-02

Pos logit contributions
ags+0.010
 ours+0.010
ishes+0.009
 attention+0.009
 unnecessary+0.009
Neg logit contributions
ad-0.011
 Cr-0.011
 jugg-0.011
Sus-0.010
 Sus-0.010
Token Papa
Token ID42328
Feature activation+1.64e-02

Pos logit contributions
ags+0.077
 ours+0.069
ows+0.069
ports+0.066
 slot+0.066
Neg logit contributions
 Cr-0.079
ad-0.072
 Sus-0.070
 jugg-0.070
 Suz-0.068
Token come
Token ID1282
Feature activation+4.06e-03

Pos logit contributions
 ours+0.030
ags+0.030
ishes+0.028
 unnecessary+0.027
ports+0.026
Neg logit contributions
ad-0.024
 Cr-0.022
 jugg-0.022
Sus-0.020
 Sus-0.020
Token to
Token ID284
Feature activation+1.08e-02

Pos logit contributions
ags+0.006
 ours+0.006
ishes+0.005
ports+0.005
 trash+0.005
Neg logit contributions
 Cr-0.007
ad-0.007
Sus-0.007
 Sus-0.007
all-0.006
Token the
Token ID262
Feature activation+4.35e-03

Pos logit contributions
ags+0.017
 ours+0.016
ishes+0.015
 trash+0.015
 attention+0.015
Neg logit contributions
 Cr-0.018
ad-0.018
 jugg-0.017
Sus-0.017
 Sus-0.016
Token room
Token ID2119
Feature activation+7.08e-03

Pos logit contributions
ags+0.006
ishes+0.005
 ours+0.005
 attention+0.004
 unnecessary+0.004
Neg logit contributions
 Cr-0.009
ad-0.008
 jugg-0.008
Sus-0.008
 Sus-0.008
Token "
Token ID366
Feature activation+3.49e-03

Pos logit contributions
ags+0.014
 ours+0.013
ishes+0.012
 trash+0.012
 attention+0.012
Neg logit contributions
 Cr-0.016
ad-0.015
 jugg-0.015
Sus-0.015
 Sus-0.014
TokenBen
Token ID11696
Feature activation+6.35e-03

Pos logit contributions
ags+0.006
 ours+0.006
ishes+0.006
ows+0.005
 attention+0.005
Neg logit contributions
 Cr-0.006
ad-0.005
 jugg-0.005
Sus-0.005
Johnny-0.005
Token,
Token ID11
Feature activation+1.67e-02

Pos logit contributions
ags+0.017
 ours+0.015
 unnecessary+0.015
ishes+0.015
 attention+0.015
Neg logit contributions
 Cr-0.005
ad-0.004
 jugg-0.003
 Sus-0.003
all-0.003
Token I
Token ID314
Feature activation-2.36e-03

Pos logit contributions
ags+0.029
ishes+0.025
 ours+0.024
ports+0.023
irt+0.023
Neg logit contributions
 Cr-0.028
ad-0.027
 jugg-0.027
 Sus-0.025
Sus-0.025
Token have
Token ID423
Feature activation+8.80e-03

Pos logit contributions
 Cr+0.005
ad+0.005
Sus+0.004
 jugg+0.004
odic+0.004
Neg logit contributions
ags-0.003
 hopes-0.003
 trash-0.003
 ours-0.003
 will-0.003
Token a
Token ID257
Feature activation+4.58e-02

Pos logit contributions
ags+0.014
 ours+0.013
ishes+0.012
 attention+0.012
 trash+0.012
Neg logit contributions
 Cr-0.015
ad-0.015
 jugg-0.014
Sus-0.014
 Sus-0.014
Token rope
Token ID17182
Feature activation+1.43e-02

Pos logit contributions
ags+0.069
ishes+0.063
ports+0.061
 yourselves+0.058
 ours+0.056
Neg logit contributions
 jugg-0.081
 Cr-0.077
 Sus-0.075
ad-0.074
Sus-0.072
Token.
Token ID13
Feature activation+1.60e-02

Pos logit contributions
ags+0.022
 ours+0.019
ishes+0.019
ports+0.018
 unnecessary+0.018
Neg logit contributions
 Cr-0.025
ad-0.025
 jugg-0.024
Sus-0.023
all-0.023
Token Grab
Token ID25339
Feature activation+2.11e-02

Pos logit contributions
ags+0.028
ishes+0.026
 ours+0.025
 attention+0.024
 trash+0.024
Neg logit contributions
 Cr-0.027
ad-0.024
 Sus-0.023
 Nice-0.023
 jugg-0.022
Token it
Token ID340
Feature activation+2.04e-03

Pos logit contributions
ags+0.040
ishes+0.036
 ours+0.036
ows+0.035
ses+0.035
Neg logit contributions
ad-0.028
 Cr-0.028
 jugg-0.026
all-0.026
Johnny-0.024
Token and
Token ID290
Feature activation+1.85e-02

Pos logit contributions
ags+0.003
ishes+0.003
 ours+0.003
ports+0.003
 unnecessary+0.003
Neg logit contributions
 Cr-0.003
ad-0.003
all-0.003
 jugg-0.003
Sus-0.003
Token so
Token ID523
Feature activation-5.52e-03

Pos logit contributions
 Cr+0.008
ad+0.007
 jugg+0.007
Sus+0.007
Johnny+0.007
Neg logit contributions
ags-0.005
 ours-0.005
ishes-0.005
 trash-0.005
 attention-0.005
Token beautiful
Token ID4950
Feature activation+5.50e-03

Pos logit contributions
 Cr+0.012
ad+0.012
 jugg+0.011
Sus+0.011
Johnny+0.011
Neg logit contributions
 ours-0.006
ags-0.006
 trash-0.005
 hands-0.005
 attention-0.005
Token inside
Token ID2641
Feature activation+1.42e-02

Pos logit contributions
ags+0.009
 ours+0.008
ishes+0.008
 trash+0.008
 attention+0.008
Neg logit contributions
 Cr-0.009
ad-0.009
 jugg-0.008
Sus-0.008
Johnny-0.008
Token and
Token ID290
Feature activation+1.35e-02

Pos logit contributions
ags+0.023
 ours+0.022
ishes+0.021
 trash+0.020
 attention+0.020
Neg logit contributions
 Cr-0.024
ad-0.022
 jugg-0.022
Sus-0.021
Johnny-0.020
Token the
Token ID262
Feature activation+1.04e-03

Pos logit contributions
ags+0.031
 ours+0.029
ishes+0.028
 trash+0.028
 attention+0.027
Neg logit contributions
 Cr-0.013
ad-0.013
 jugg-0.012
Sus-0.012
 Sus-0.011
Token fringe
Token ID26821
Feature activation+4.58e-02

Pos logit contributions
ags+0.002
 ours+0.001
ishes+0.001
 unnecessary+0.001
ws+0.001
Neg logit contributions
 Cr-0.002
ad-0.002
 jugg-0.002
Sus-0.002
 Sus-0.002
Token from
Token ID422
Feature activation+1.28e-02

Pos logit contributions
ags+0.074
 ours+0.067
ishes+0.063
 trash+0.061
ports+0.060
Neg logit contributions
 Cr-0.078
ad-0.077
 jugg-0.073
Sus-0.071
 Sus-0.071
Token the
Token ID262
Feature activation+7.43e-03

Pos logit contributions
ags+0.024
ows+0.022
issors+0.021
ars+0.021
 slot+0.020
Neg logit contributions
 Cr-0.018
 Sus-0.017
ad-0.017
 jugg-0.017
all-0.016
Token cut
Token ID2005
Feature activation-1.61e-03

Pos logit contributions
ags+0.013
 ours+0.011
ishes+0.011
ows+0.010
 unnecessary+0.010
Neg logit contributions
ad-0.012
 jugg-0.012
 Cr-0.012
Sus-0.011
 Sus-0.011
Token shells
Token ID19679
Feature activation+2.06e-02

Pos logit contributions
 Cr+0.003
ad+0.002
 jugg+0.002
Sus+0.002
 Sus+0.002
Neg logit contributions
ags-0.003
 ours-0.003
ishes-0.002
 trash-0.002
 attention-0.002
Token made
Token ID925
Feature activation+2.03e-02

Pos logit contributions
ags+0.030
 ours+0.027
ishes+0.025
 trash+0.025
 attention+0.024
Neg logit contributions
 Cr-0.038
ad-0.037
 jugg-0.036
Sus-0.035
 Sus-0.034
Token believe
Token ID1975
Feature activation+1.28e-02

Pos logit contributions
ags+0.034
ishes+0.033
ports+0.031
 ours+0.031
 heads+0.030
Neg logit contributions
ad-0.039
 Cr-0.039
 jugg-0.036
Sus-0.035
all-0.035
Token it
Token ID340
Feature activation+1.10e-02

Pos logit contributions
ags+0.022
ishes+0.020
 ours+0.020
 slot+0.020
 trash+0.019
Neg logit contributions
 Cr-0.020
ad-0.020
 jugg-0.018
 Sus-0.018
all-0.018
Token's
Token ID338
Feature activation+5.11e-03

Pos logit contributions
ags+0.021
 ours+0.019
ishes+0.018
 unnecessary+0.018
 slot+0.018
Neg logit contributions
 Cr-0.014
ad-0.014
all-0.014
 jugg-0.014
 Nice-0.013
Token a
Token ID257
Feature activation+9.79e-03

Pos logit contributions
ags+0.008
ishes+0.007
ports+0.006
 slot+0.006
 reins+0.006
Neg logit contributions
 Cr-0.009
ad-0.009
 Sus-0.008
all-0.008
 jugg-0.008
Token reliable
Token ID9314
Feature activation+9.76e-03

Pos logit contributions
ags+0.017
ishes+0.015
 ours+0.014
ports+0.013
ows+0.013
Neg logit contributions
 Cr-0.016
 jugg-0.016
ad-0.015
Sus-0.014
 Sus-0.014
Token ship
Token ID4074
Feature activation+4.57e-02

Pos logit contributions
ags+0.020
 ours+0.018
ishes+0.017
 unnecessary+0.016
ports+0.016
Neg logit contributions
 Cr-0.013
ad-0.013
 jugg-0.013
 Sus-0.012
Sus-0.011
Token."
Token ID526
Feature activation+2.45e-04

Pos logit contributions
ags+0.072
 unnecessary+0.061
ishes+0.058
 ours+0.057
 trash+0.057
Neg logit contributions
 Cr-0.081
 jugg-0.081
ad-0.079
all-0.077
 Nice-0.072
Token Tim
Token ID5045
Feature activation+9.16e-03

Pos logit contributions
ags+0.000
 ours+0.000
ishes+0.000
 trash+0.000
 attention+0.000
Neg logit contributions
 Cr-0.000
ad-0.000
 jugg-0.000
Sus-0.000
 Sus-0.000
Token was
Token ID373
Feature activation+1.02e-02

Pos logit contributions
ags+0.015
 ours+0.014
ishes+0.013
 trash+0.013
ports+0.012
Neg logit contributions
 Cr-0.015
ad-0.015
 jugg-0.014
Sus-0.014
 Sus-0.014
Token sad
Token ID6507
Feature activation-6.67e-03

Pos logit contributions
ags+0.015
 ours+0.014
ishes+0.013
 trash+0.013
 attention+0.013
Neg logit contributions
 Cr-0.018
ad-0.018
 jugg-0.017
Sus-0.017
 Sus-0.016
Token,
Token ID11
Feature activation-3.43e-03

Pos logit contributions
 Cr+0.008
ad+0.008
all+0.008
 jugg+0.007
 Sus+0.007
Neg logit contributions
ags-0.014
 ours-0.013
ishes-0.013
ports-0.012
 slot-0.012
Token to
Token ID284
Feature activation-2.77e-03

Pos logit contributions
 Cr+0.009
ad+0.008
 jugg+0.008
Sus+0.008
 Sus+0.008
Neg logit contributions
ags-0.008
 ours-0.007
ishes-0.007
 trash-0.007
 attention-0.007
Token look
Token ID804
Feature activation-1.34e-02

Pos logit contributions
 Cr+0.005
ad+0.005
Sus+0.005
all+0.004
 jugg+0.004
Neg logit contributions
ags-0.004
 ours-0.004
 hands-0.004
 trash-0.004
ishes-0.004
Token at
Token ID379
Feature activation-2.05e-03

Pos logit contributions
 Cr+0.023
 Sus+0.022
all+0.022
ad+0.022
Sus+0.021
Neg logit contributions
ags-0.019
ses-0.019
 ours-0.017
 slot-0.017
 unnecessary-0.016
Token the
Token ID262
Feature activation-1.93e-02

Pos logit contributions
 Cr+0.003
ad+0.003
 jugg+0.003
Sus+0.003
 Sus+0.003
Neg logit contributions
ags-0.003
 ours-0.003
ishes-0.003
 trash-0.003
 attention-0.003
Token or
Token ID393
Feature activation-1.07e-02

Pos logit contributions
 Cr+0.037
ad+0.036
 jugg+0.035
Sus+0.034
 Sus+0.033
Neg logit contributions
ags-0.027
 ours-0.024
ishes-0.022
 attention-0.021
 hopes-0.021
Tokennam
Token ID7402
Feature activation-5.73e-02

Pos logit contributions
 Cr+0.020
ad+0.020
 jugg+0.019
Sus+0.019
all+0.018
Neg logit contributions
ags-0.015
 ours-0.014
ishes-0.013
 trash-0.013
 attention-0.012
Tokenents
Token ID658
Feature activation-2.16e-03

Pos logit contributions
 Cr+0.102
 Nice+0.095
ad+0.093
 jugg+0.093
 grizz+0.091
Neg logit contributions
 ours-0.086
ags-0.086
ishes-0.078
 hands-0.074
issors-0.072
Token on
Token ID319
Feature activation-3.49e-03

Pos logit contributions
 Cr+0.004
ad+0.004
 jugg+0.004
Sus+0.004
 Sus+0.004
Neg logit contributions
ags-0.003
 ours-0.002
ishes-0.002
 trash-0.002
 attention-0.002
Token the
Token ID262
Feature activation-4.93e-03

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.005
Sus+0.005
 Sus+0.005
Neg logit contributions
ags-0.006
 ours-0.005
ishes-0.005
 trash-0.005
 attention-0.005
Token tree
Token ID5509
Feature activation-1.45e-02

Pos logit contributions
ad+0.010
 Cr+0.009
 jugg+0.009
Sus+0.009
 Sus+0.009
Neg logit contributions
ags-0.006
 ours-0.006
ishes-0.006
irt-0.005
ports-0.005
Token.
Token ID13
Feature activation-7.28e-03

Pos logit contributions
 Cr+0.026
ad+0.026
 jugg+0.025
Sus+0.024
all+0.024
Neg logit contributions
ags-0.021
 ours-0.019
ishes-0.018
 trash-0.018
 attention-0.018
Token take
Token ID1011
Feature activation-2.44e-03

Pos logit contributions
ags+0.002
 ours+0.002
ishes+0.002
 trash+0.002
 attention+0.002
Neg logit contributions
 Cr-0.003
ad-0.002
 jugg-0.002
Sus-0.002
 Sus-0.002
Token one
Token ID530
Feature activation+2.86e-03

Pos logit contributions
 Cr+0.004
ad+0.004
 jugg+0.004
Sus+0.004
 Sus+0.004
Neg logit contributions
ags-0.004
 ours-0.003
ishes-0.003
 trash-0.003
 heads-0.003
Token of
Token ID286
Feature activation-9.06e-03

Pos logit contributions
ags+0.005
 ours+0.005
ishes+0.005
ports+0.005
ses+0.004
Neg logit contributions
ad-0.004
 Cr-0.004
 jugg-0.004
all-0.004
Sus-0.004
Token the
Token ID262
Feature activation-8.13e-03

Pos logit contributions
 Cr+0.012
ad+0.012
 jugg+0.011
Sus+0.010
 Sus+0.010
Neg logit contributions
ags-0.018
 ours-0.017
ishes-0.016
 trash-0.016
ports-0.016
Token or
Token ID393
Feature activation-4.41e-03

Pos logit contributions
ad+0.015
 Cr+0.015
 Sus+0.014
Sus+0.014
 jugg+0.014
Neg logit contributions
ags-0.011
 ours-0.010
irt-0.009
ports-0.009
ows-0.009
Tokennam
Token ID7402
Feature activation-5.72e-02

Pos logit contributions
 Cr+0.008
ad+0.007
 jugg+0.007
Sus+0.007
 Sus+0.007
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 trash-0.006
 attention-0.005
Tokenents
Token ID658
Feature activation-4.77e-03

Pos logit contributions
 Cr+0.107
 Nice+0.100
 jugg+0.098
ad+0.098
 grizz+0.095
Neg logit contributions
 ours-0.082
ags-0.079
ishes-0.077
 hands-0.072
 dollars-0.067
Token outside
Token ID2354
Feature activation-4.24e-03

Pos logit contributions
 Cr+0.009
ad+0.009
 jugg+0.008
Sus+0.008
Johnny+0.008
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 attention-0.006
 trash-0.006
Token for
Token ID329
Feature activation-9.32e-03

Pos logit contributions
 Cr+0.008
ad+0.008
 jugg+0.008
Sus+0.008
Ros+0.007
Neg logit contributions
ags-0.006
 ours-0.005
 attention-0.005
 hands-0.005
ishes-0.005
Token a
Token ID257
Feature activation+1.13e-02

Pos logit contributions
 Cr+0.018
ad+0.017
 jugg+0.017
Sus+0.017
 Nice+0.016
Neg logit contributions
ags-0.013
 ours-0.011
 attention-0.011
 trash-0.011
ishes-0.010
Token walk
Token ID2513
Feature activation+3.50e-03

Pos logit contributions
ags+0.017
 ours+0.016
ishes+0.015
 trash+0.015
 attention+0.015
Neg logit contributions
 Cr-0.020
ad-0.019
 jugg-0.018
Sus-0.018
all-0.018
Token wanted
Token ID2227
Feature activation+7.97e-03

Pos logit contributions
ags+0.026
 ours+0.023
ishes+0.022
 attention+0.021
irt+0.021
Neg logit contributions
 Cr-0.022
 jugg-0.021
ad-0.021
 Sus-0.020
Sus-0.019
Token to
Token ID284
Feature activation+1.76e-02

Pos logit contributions
ags+0.011
 ours+0.011
 trash+0.010
 attention+0.010
ishes+0.010
Neg logit contributions
 Cr-0.015
ad-0.014
 jugg-0.014
Sus-0.014
Johnny-0.013
Token see
Token ID766
Feature activation+9.63e-03

Pos logit contributions
ags+0.026
 ours+0.024
ishes+0.022
 trash+0.022
 attention+0.021
Neg logit contributions
 Cr-0.032
ad-0.031
 jugg-0.030
Sus-0.030
 Sus-0.029
Token the
Token ID262
Feature activation-1.29e-02

Pos logit contributions
ags+0.017
ses+0.014
aked+0.014
 ours+0.014
 heads+0.014
Neg logit contributions
 Cr-0.015
 Sus-0.015
ad-0.015
 jugg-0.014
Sus-0.013
Token or
Token ID393
Feature activation-5.12e-05

Pos logit contributions
ad+0.023
 Cr+0.023
 jugg+0.022
Sus+0.021
 Sus+0.021
Neg logit contributions
ags-0.019
 ours-0.016
irt-0.016
ishes-0.016
ports-0.016
Tokennam
Token ID7402
Feature activation-5.65e-02

Pos logit contributions
 Cr+0.000
ad+0.000
Sus+0.000
 jugg+0.000
all+0.000
Neg logit contributions
ags-0.000
 ours-0.000
 trash-0.000
ishes-0.000
 attention-0.000
Tokenents
Token ID658
Feature activation-3.35e-03

Pos logit contributions
 Cr+0.101
ad+0.093
 Nice+0.092
 jugg+0.092
 chuckled+0.089
Neg logit contributions
 ours-0.086
ags-0.084
ishes-0.080
 hands-0.074
 twins-0.072
Token up
Token ID510
Feature activation+4.89e-03

Pos logit contributions
 Cr+0.007
ad+0.006
Sus+0.006
 jugg+0.006
Johnny+0.006
Neg logit contributions
ags-0.004
 ours-0.004
ishes-0.004
 attention-0.004
 trash-0.004
Token close
Token ID1969
Feature activation+4.44e-03

Pos logit contributions
ags+0.006
 ours+0.005
 attention+0.005
 trash+0.005
ishes+0.005
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.009
Sus-0.009
Johnny-0.009
Token.
Token ID13
Feature activation+4.90e-03

Pos logit contributions
ags+0.007
 ours+0.007
ishes+0.006
 unnecessary+0.006
 trash+0.006
Neg logit contributions
ad-0.007
 Cr-0.007
 Sus-0.007
 jugg-0.007
Sus-0.006
Token They
Token ID1119
Feature activation+4.55e-03

Pos logit contributions
ags+0.008
 unnecessary+0.007
 ours+0.007
ishes+0.007
 attention+0.007
Neg logit contributions
 Cr-0.008
 Sus-0.008
 jugg-0.007
 Nice-0.007
ad-0.007
Token also
Token ID635
Feature activation-5.89e-03

Pos logit contributions
 Cr+0.004
 Nice+0.004
ad+0.004
all+0.004
igsaw+0.004
Neg logit contributions
 heads-0.002
 trash-0.002
ags-0.002
 hopes-0.002
 hands-0.002
Token saw
Token ID2497
Feature activation+4.39e-03

Pos logit contributions
 Cr+0.012
 Nice+0.012
all+0.012
Sus+0.011
igsaw+0.011
Neg logit contributions
 hopes-0.007
 trash-0.006
ags-0.006
ishes-0.006
 heads-0.006
Token some
Token ID617
Feature activation-1.49e-02

Pos logit contributions
ags+0.006
 ours+0.005
ishes+0.005
 attention+0.005
 trash+0.005
Neg logit contributions
 Cr-0.009
ad-0.008
 jugg-0.008
Sus-0.008
 Sus-0.008
Token new
Token ID649
Feature activation-9.14e-03

Pos logit contributions
ad+0.031
 Cr+0.031
 jugg+0.030
Sus+0.030
 Sus+0.029
Neg logit contributions
ags-0.017
 ours-0.015
ishes-0.014
ports-0.013
 trash-0.013
Token or
Token ID393
Feature activation-5.93e-03

Pos logit contributions
ad+0.022
 Cr+0.021
 jugg+0.021
Sus+0.021
 Sus+0.021
Neg logit contributions
ags-0.008
 ours-0.007
ishes-0.006
ports-0.005
 unnecessary-0.005
Tokennam
Token ID7402
Feature activation-5.63e-02

Pos logit contributions
 Cr+0.012
ad+0.011
 jugg+0.011
Sus+0.011
 Sus+0.011
Neg logit contributions
ags-0.008
 ours-0.007
ishes-0.007
ports-0.006
 trash-0.006
Tokenents
Token ID658
Feature activation-9.42e-03

Pos logit contributions
 Cr+0.103
ad+0.093
 jugg+0.093
 Nice+0.093
 chuckled+0.089
Neg logit contributions
 ours-0.085
ags-0.084
ishes-0.080
 hands-0.076
 twins-0.072
Token that
Token ID326
Feature activation-5.62e-03

Pos logit contributions
 Cr+0.019
ad+0.018
 jugg+0.018
Sus+0.018
Johnny+0.017
Neg logit contributions
ags-0.012
 ours-0.011
ishes-0.010
 attention-0.010
 trash-0.010
Token Santa
Token ID8909
Feature activation+7.43e-03

Pos logit contributions
 Cr+0.009
ad+0.008
 jugg+0.008
Sus+0.008
 Sus+0.008
Neg logit contributions
ags-0.010
 ours-0.009
ishes-0.009
 attention-0.008
 trash-0.008
Token had
Token ID550
Feature activation-4.30e-03

Pos logit contributions
ags+0.014
 ours+0.012
ishes+0.012
 trash+0.011
 attention+0.011
Neg logit contributions
 Cr-0.011
ad-0.011
 jugg-0.010
Sus-0.010
 Sus-0.010
Token hung
Token ID9174
Feature activation-1.19e-02

Pos logit contributions
ad+0.008
 Cr+0.008
 jugg+0.008
Sus+0.007
 Sus+0.007
Neg logit contributions
ags-0.006
 ours-0.006
ishes-0.005
 trash-0.005
 attention-0.005
Token time
Token ID640
Feature activation-6.88e-03

Pos logit contributions
ags+0.012
 ours+0.011
ishes+0.011
ports+0.010
 trash+0.010
Neg logit contributions
 Cr-0.016
ad-0.016
 jugg-0.016
Sus-0.015
 Sus-0.015
Token playing
Token ID2712
Feature activation-6.39e-03

Pos logit contributions
 Cr+0.013
ad+0.013
 jugg+0.012
Sus+0.012
Johnny+0.012
Neg logit contributions
ags-0.009
 ours-0.008
 attention-0.008
 hands-0.008
ishes-0.008
Token with
Token ID351
Feature activation-5.93e-03

Pos logit contributions
 Cr+0.012
ad+0.011
 jugg+0.011
Sus+0.011
Johnny+0.010
Neg logit contributions
ags-0.009
 ours-0.008
 trash-0.008
ishes-0.008
 hands-0.008
Token her
Token ID607
Feature activation+4.59e-03

Pos logit contributions
 Cr+0.011
 jugg+0.010
ad+0.010
all+0.009
Sus+0.009
Neg logit contributions
ags-0.009
 ours-0.008
 trash-0.008
 hands-0.008
 twins-0.008
Token or
Token ID393
Feature activation-1.03e-02

Pos logit contributions
ags+0.005
 ours+0.005
ishes+0.005
 trash+0.004
 attention+0.004
Neg logit contributions
 Cr-0.010
ad-0.009
 jugg-0.009
Sus-0.009
 Sus-0.009
Tokennam
Token ID7402
Feature activation-5.55e-02

Pos logit contributions
 Cr+0.016
ad+0.016
Sus+0.015
 jugg+0.015
all+0.014
Neg logit contributions
ags-0.017
 ours-0.017
ishes-0.016
 trash-0.016
 attention-0.015
Tokenents
Token ID658
Feature activation-3.26e-03

Pos logit contributions
 Cr+0.099
 Nice+0.091
 jugg+0.090
ad+0.089
 grizz+0.089
Neg logit contributions
 ours-0.085
ags-0.084
ishes-0.077
 hands-0.075
issors-0.071
Token and
Token ID290
Feature activation-8.47e-03

Pos logit contributions
 Cr+0.007
ad+0.006
 jugg+0.006
Sus+0.006
Johnny+0.006
Neg logit contributions
ags-0.004
 ours-0.004
ishes-0.003
 attention-0.003
 trash-0.003
Token making
Token ID1642
Feature activation-6.38e-03

Pos logit contributions
 Cr+0.014
ad+0.013
 jugg+0.013
Sus+0.012
all+0.012
Neg logit contributions
ags-0.014
 ours-0.013
ishes-0.013
 trash-0.012
 attention-0.012
Token her
Token ID607
Feature activation-2.33e-03

Pos logit contributions
 Cr+0.011
ad+0.011
 jugg+0.010
Sus+0.010
 Sus+0.010
Neg logit contributions
ags-0.010
 ours-0.009
ishes-0.009
 trash-0.008
ports-0.008
Token special
Token ID2041
Feature activation+1.62e-04

Pos logit contributions
 Cr+0.004
ad+0.004
 jugg+0.004
 Sus+0.004
Sus+0.004
Neg logit contributions
ags-0.004
 ours-0.003
ishes-0.003
ports-0.003
 trash-0.003
Token who
Token ID508
Feature activation+6.19e-03

Pos logit contributions
 Cr+0.026
ad+0.025
 jugg+0.024
Sus+0.024
 Sus+0.023
Neg logit contributions
ags-0.031
 ours-0.028
ishes-0.027
 trash-0.027
 attention-0.026
Token liked
Token ID8288
Feature activation+4.16e-03

Pos logit contributions
ags+0.011
 ours+0.010
ishes+0.009
ports+0.009
 attention+0.009
Neg logit contributions
ad-0.010
 Cr-0.010
 jugg-0.009
Sus-0.009
 Sus-0.009
Token to
Token ID284
Feature activation-6.91e-04

Pos logit contributions
ags+0.007
 ours+0.007
ishes+0.006
 trash+0.006
ports+0.006
Neg logit contributions
 Cr-0.006
ad-0.006
 jugg-0.006
Sus-0.006
 Sus-0.006
Token collect
Token ID2824
Feature activation-9.91e-03

Pos logit contributions
 Cr+0.001
Sus+0.001
ad+0.001
Johnny+0.001
all+0.001
Neg logit contributions
ags-0.001
 hands-0.001
 trash-0.001
 ours-0.001
 attention-0.001
Token or
Token ID393
Feature activation-9.04e-03

Pos logit contributions
 Cr+0.023
ad+0.023
 jugg+0.022
Sus+0.022
 Sus+0.021
Neg logit contributions
ags-0.009
 ours-0.008
ishes-0.007
 trash-0.007
 attention-0.007
Tokennam
Token ID7402
Feature activation-5.51e-02

Pos logit contributions
 Cr+0.019
ad+0.018
 jugg+0.018
Sus+0.018
 Sus+0.017
Neg logit contributions
ags-0.011
 ours-0.010
ishes-0.009
 trash-0.009
 attention-0.009
Tokenents
Token ID658
Feature activation+5.45e-03

Pos logit contributions
 Cr+0.093
 jugg+0.087
 Nice+0.087
ad+0.087
 grizz+0.084
Neg logit contributions
 ours-0.087
ags-0.086
ishes-0.082
 hands-0.075
 twins-0.072
Token.
Token ID13
Feature activation-5.98e-03

Pos logit contributions
ags+0.007
 ours+0.007
ishes+0.006
 trash+0.006
 attention+0.006
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.010
Sus-0.010
 Sus-0.009
Token Everywhere
Token ID48201
Feature activation+1.45e-03

Pos logit contributions
 Cr+0.008
ad+0.008
 jugg+0.008
Sus+0.007
 Sus+0.007
Neg logit contributions
ags-0.011
 ours-0.011
ishes-0.010
 trash-0.010
 attention-0.010
Token she
Token ID673
Feature activation+3.24e-03

Pos logit contributions
ags+0.004
 ours+0.003
ishes+0.003
 attention+0.003
irt+0.003
Neg logit contributions
 Cr-0.001
ad-0.001
 jugg-0.001
 Sus-0.001
Sus-0.001
Token went
Token ID1816
Feature activation+6.18e-04

Pos logit contributions
ags+0.006
 ours+0.006
 attention+0.005
 unnecessary+0.005
ses+0.005
Neg logit contributions
ad-0.005
 Cr-0.005
 jugg-0.005
Sus-0.004
 Sus-0.004
Token We
Token ID775
Feature activation-7.69e-03

Pos logit contributions
 Cr+0.022
ad+0.020
Sus+0.019
 jugg+0.019
 Sus+0.019
Neg logit contributions
ags-0.017
ishes-0.015
 ours-0.015
 attention-0.014
 trash-0.014
Token have
Token ID423
Feature activation-4.24e-03

Pos logit contributions
 Cr+0.014
ad+0.013
 jugg+0.013
 Sus+0.012
Sus+0.012
Neg logit contributions
ags-0.012
 ours-0.011
ishes-0.010
 attention-0.010
ports-0.010
Token our
Token ID674
Feature activation+1.10e-02

Pos logit contributions
 Cr+0.006
ad+0.006
 Sus+0.005
 jugg+0.005
Sus+0.005
Neg logit contributions
ags-0.008
ishes-0.007
ows-0.007
 ours-0.007
irt-0.007
Token own
Token ID898
Feature activation+1.20e-02

Pos logit contributions
ags+0.014
ows+0.012
ports+0.012
hes+0.011
 ours+0.011
Neg logit contributions
 Cr-0.022
ad-0.021
 jugg-0.020
 Sus-0.020
all-0.019
Token or
Token ID393
Feature activation-6.73e-03

Pos logit contributions
ags+0.016
 ours+0.014
ishes+0.013
ports+0.013
 unnecessary+0.013
Neg logit contributions
 Cr-0.023
ad-0.023
 jugg-0.022
 Sus-0.021
Sus-0.021
Tokennam
Token ID7402
Feature activation-5.47e-02

Pos logit contributions
 Cr+0.012
ad+0.012
 jugg+0.012
 Sus+0.011
Sus+0.011
Neg logit contributions
ags-0.010
ishes-0.008
 ours-0.008
ports-0.008
 attention-0.008
Tokenents
Token ID658
Feature activation-2.83e-03

Pos logit contributions
 Cr+0.097
ad+0.090
 jugg+0.089
 Nice+0.088
 chuckled+0.087
Neg logit contributions
 ours-0.085
ags-0.083
ishes-0.078
 hands-0.072
issors-0.070
Token at
Token ID379
Feature activation+7.72e-03

Pos logit contributions
 Cr+0.005
ad+0.005
 jugg+0.005
Sus+0.005
all+0.005
Neg logit contributions
ags-0.004
 ours-0.004
ishes-0.003
 trash-0.003
ports-0.003
Token home
Token ID1363
Feature activation-1.04e-02

Pos logit contributions
ags+0.010
 ours+0.009
ishes+0.009
 trash+0.009
 attention+0.008
Neg logit contributions
 Cr-0.015
ad-0.015
 jugg-0.014
Sus-0.014
 Sus-0.014
Token,
Token ID11
Feature activation-9.10e-03

Pos logit contributions
 Cr+0.019
ad+0.018
all+0.018
 jugg+0.018
Sus+0.017
Neg logit contributions
ags-0.015
ishes-0.013
 ours-0.013
 trash-0.012
 unnecessary-0.012
Token remember
Token ID3505
Feature activation-1.46e-02

Pos logit contributions
 Cr+0.016
ad+0.014
 jugg+0.014
 Sus+0.014
Sus+0.014
Neg logit contributions
ags-0.015
ishes-0.013
 ours-0.013
ports-0.012
 attention-0.012
Token many
Token ID867
Feature activation-1.68e-02

Pos logit contributions
 Cr+0.016
ad+0.016
 jugg+0.015
Sus+0.015
 Sus+0.015
Neg logit contributions
ags-0.012
 ours-0.010
ishes-0.010
irt-0.009
 attention-0.009
Token colors
Token ID7577
Feature activation+2.53e-03

Pos logit contributions
 Cr+0.035
ad+0.035
 jugg+0.034
Sus+0.033
 Sus+0.032
Neg logit contributions
ags-0.020
 ours-0.017
ishes-0.016
 trash-0.015
ports-0.015
Token.
Token ID13
Feature activation-7.43e-03

Pos logit contributions
ags+0.003
 ours+0.003
ishes+0.003
 trash+0.003
 attention+0.003
Neg logit contributions
 Cr-0.005
ad-0.005
 jugg-0.005
Sus-0.004
 Sus-0.004
Token Some
Token ID2773
Feature activation-2.07e-02

Pos logit contributions
 Cr+0.010
ad+0.009
 jugg+0.009
Sus+0.009
 Sus+0.008
Neg logit contributions
ags-0.015
 ours-0.014
ishes-0.013
 trash-0.013
 attention-0.013
Token or
Token ID393
Feature activation-2.38e-03

Pos logit contributions
 Cr+0.031
ad+0.030
 jugg+0.029
Sus+0.028
 Sus+0.027
Neg logit contributions
ags-0.037
 ours-0.035
ishes-0.032
 attention-0.032
 trash-0.031
Tokennam
Token ID7402
Feature activation-5.47e-02

Pos logit contributions
 Cr+0.005
ad+0.005
 jugg+0.005
Sus+0.005
 Sus+0.004
Neg logit contributions
ags-0.003
 ours-0.003
ishes-0.002
 trash-0.002
 attention-0.002
Tokenents
Token ID658
Feature activation-6.34e-03

Pos logit contributions
 Cr+0.093
 jugg+0.086
ad+0.086
 Nice+0.084
 chuckled+0.081
Neg logit contributions
ags-0.088
 ours-0.086
ishes-0.082
 hands-0.076
 heads-0.074
Token were
Token ID547
Feature activation-1.66e-02

Pos logit contributions
 Cr+0.011
ad+0.010
Sus+0.010
 jugg+0.010
Johnny+0.010
Neg logit contributions
ags-0.010
 ours-0.009
ishes-0.009
 attention-0.009
 trash-0.009
Token round
Token ID2835
Feature activation-1.31e-02

Pos logit contributions
 Cr+0.030
ad+0.030
 jugg+0.029
Sus+0.028
 Sus+0.028
Neg logit contributions
ags-0.024
 ours-0.021
ishes-0.020
ports-0.019
 trash-0.019
Token,
Token ID11
Feature activation-9.68e-03

Pos logit contributions
 Cr+0.022
ad+0.022
 jugg+0.021
Sus+0.020
 Sus+0.020
Neg logit contributions
ags-0.021
 ours-0.019
ishes-0.018
 trash-0.018
 attention-0.017
Token some
Token ID617
Feature activation-1.52e-02

Pos logit contributions
 Cr+0.018
ad+0.018
 jugg+0.017
Sus+0.017
Johnny+0.016
Neg logit contributions
ags-0.014
 ours-0.013
 trash-0.012
ishes-0.012
ports-0.011
TokenBe
Token ID3856
Feature activation+2.86e-03

Pos logit contributions
 Cr+0.020
ad+0.020
 jugg+0.019
Sus+0.019
Johnny+0.018
Neg logit contributions
ags-0.028
 ours-0.026
ishes-0.025
 trash-0.024
 attention-0.024
Token careful
Token ID8161
Feature activation-1.10e-02

Pos logit contributions
ishes+0.004
ses+0.004
 ours+0.003
ags+0.003
 attic+0.003
Neg logit contributions
ad-0.006
Sus-0.005
all-0.005
 Cr-0.005
Johnny-0.005
Token,
Token ID11
Feature activation-2.02e-02

Pos logit contributions
 Cr+0.014
ad+0.013
 jugg+0.013
Sus+0.012
 Sus+0.012
Neg logit contributions
ags-0.022
 ours-0.021
ishes-0.020
 trash-0.020
 attention-0.020
Token these
Token ID777
Feature activation-2.19e-02

Pos logit contributions
 Cr+0.038
ad+0.036
 jugg+0.036
Sus+0.034
 Sus+0.034
Neg logit contributions
ags-0.031
 ours-0.026
ishes-0.026
 attention-0.024
 trash-0.024
Token or
Token ID393
Feature activation-1.02e-02

Pos logit contributions
ad+0.040
 Cr+0.037
 Sus+0.036
 jugg+0.036
Sus+0.035
Neg logit contributions
ags-0.031
 ours-0.026
irt-0.026
ses-0.025
 unnecessary-0.024
Tokennam
Token ID7402
Feature activation-5.44e-02

Pos logit contributions
 Cr+0.019
 jugg+0.018
ad+0.018
 Sus+0.017
Sus+0.016
Neg logit contributions
ags-0.016
 ours-0.013
ishes-0.013
irt-0.012
ports-0.012
Tokenents
Token ID658
Feature activation-5.08e-03

Pos logit contributions
 Cr+0.097
ad+0.089
 jugg+0.089
 Nice+0.087
 Sus+0.085
Neg logit contributions
 ours-0.084
ags-0.083
ishes-0.080
 hands-0.072
 dollars-0.071
Token are
Token ID389
Feature activation-7.13e-03

Pos logit contributions
 Cr+0.009
ad+0.009
 jugg+0.009
Sus+0.008
 Sus+0.008
Neg logit contributions
ags-0.008
 ours-0.007
ishes-0.006
 trash-0.006
 attention-0.006
Token very
Token ID845
Feature activation-1.66e-02

Pos logit contributions
 Cr+0.012
ad+0.012
 jugg+0.011
all+0.011
 Sus+0.011
Neg logit contributions
ags-0.011
ishes-0.009
ports-0.009
 ours-0.009
irt-0.009
Token special
Token ID2041
Feature activation-9.31e-03

Pos logit contributions
 Cr+0.032
ad+0.031
 jugg+0.030
Sus+0.030
 Sus+0.029
Neg logit contributions
ags-0.023
ishes-0.019
 ours-0.019
ports-0.018
ses-0.017
Token.
Token ID13
Feature activation-7.02e-03

Pos logit contributions
ad+0.014
 Cr+0.014
 jugg+0.014
 Sus+0.013
all+0.013
Neg logit contributions
ags-0.016
 ours-0.014
ishes-0.013
ports-0.013
irt-0.013
Token All
Token ID1439
Feature activation+3.51e-03

Pos logit contributions
 Cr+0.002
ad+0.002
 jugg+0.002
Sus+0.002
 Sus+0.002
Neg logit contributions
ags-0.003
 ours-0.003
ishes-0.003
 trash-0.003
 attention-0.003
Token of
Token ID286
Feature activation-9.00e-03

Pos logit contributions
ags+0.006
 ours+0.006
ishes+0.006
 trash+0.006
 attention+0.006
Neg logit contributions
 Cr-0.005
ad-0.005
 jugg-0.005
Sus-0.005
 Sus-0.004
Token these
Token ID777
Feature activation-1.17e-02

Pos logit contributions
 Cr+0.017
 jugg+0.016
ad+0.015
Sus+0.015
 Nice+0.015
Neg logit contributions
ags-0.013
 ours-0.011
 trash-0.011
 twins-0.011
ishes-0.011
Token lovely
Token ID14081
Feature activation-1.06e-02

Pos logit contributions
ad+0.019
 Cr+0.019
 jugg+0.018
Sus+0.018
 Sus+0.018
Neg logit contributions
ags-0.018
 ours-0.017
ishes-0.015
irt-0.015
 attention-0.015
Token or
Token ID393
Feature activation-1.17e-02

Pos logit contributions
ad+0.025
 Cr+0.025
 jugg+0.024
Sus+0.024
 Sus+0.024
Neg logit contributions
ags-0.009
 ours-0.008
ports-0.007
ishes-0.007
ses-0.006
Tokennam
Token ID7402
Feature activation-5.38e-02

Pos logit contributions
 Cr+0.025
ad+0.024
 jugg+0.023
Sus+0.023
all+0.023
Neg logit contributions
ags-0.013
 ours-0.013
ishes-0.012
 trash-0.012
 attention-0.011
Tokenents
Token ID658
Feature activation-4.90e-03

Pos logit contributions
 Cr+0.098
 Nice+0.091
 jugg+0.090
ad+0.089
 chuckled+0.085
Neg logit contributions
ags-0.080
 ours-0.080
ishes-0.077
 hands-0.072
 twins-0.068
Token reminded
Token ID14516
Feature activation-4.56e-03

Pos logit contributions
 Cr+0.008
ad+0.007
 jugg+0.007
Sus+0.007
Johnny+0.007
Neg logit contributions
ags-0.008
 ours-0.008
ishes-0.007
 attention-0.007
 trash-0.007
Token him
Token ID683
Feature activation-7.12e-03

Pos logit contributions
 Cr+0.005
 jugg+0.005
ad+0.005
Sus+0.005
Johnny+0.004
Neg logit contributions
ags-0.010
 ours-0.009
ishes-0.009
 attention-0.009
 trash-0.008
Token of
Token ID286
Feature activation-9.57e-03

Pos logit contributions
 Cr+0.012
ad+0.011
 jugg+0.011
Sus+0.011
Johnny+0.010
Neg logit contributions
ags-0.011
 ours-0.011
ishes-0.010
 attention-0.010
 trash-0.010
Token the
Token ID262
Feature activation+6.34e-03

Pos logit contributions
 Cr+0.016
ad+0.015
 jugg+0.015
Sus+0.015
 Sus+0.014
Neg logit contributions
ags-0.016
 ours-0.014
ishes-0.014
 trash-0.013
 attention-0.013
Token big
Token ID1263
Feature activation+2.75e-03

Pos logit contributions
 Cr+0.009
 jugg+0.009
ad+0.008
Sus+0.008
Johnny+0.008
Neg logit contributions
ags-0.007
ishes-0.006
 ours-0.005
 attention-0.005
ports-0.005
Token tree
Token ID5509
Feature activation-1.63e-02

Pos logit contributions
ags+0.004
 ours+0.003
ishes+0.003
 attention+0.003
ports+0.003
Neg logit contributions
 Cr-0.005
ad-0.005
 jugg-0.005
Sus-0.005
 Sus-0.005
Token with
Token ID351
Feature activation-1.43e-03

Pos logit contributions
 Cr+0.026
ad+0.025
 jugg+0.024
Sus+0.024
 Sus+0.023
Neg logit contributions
ags-0.028
 ours-0.026
ishes-0.024
 trash-0.024
 attention-0.024
Token many
Token ID867
Feature activation-1.15e-02

Pos logit contributions
 Cr+0.003
ad+0.003
 jugg+0.003
Sus+0.003
all+0.003
Neg logit contributions
ags-0.001
 ours-0.001
 trash-0.001
ishes-0.001
 hands-0.001
Token or
Token ID393
Feature activation+5.22e-03

Pos logit contributions
 Cr+0.025
ad+0.025
 jugg+0.024
Sus+0.024
 Sus+0.023
Neg logit contributions
ags-0.013
 ours-0.011
ishes-0.010
 attention-0.009
ports-0.009
Tokennam
Token ID7402
Feature activation-5.37e-02

Pos logit contributions
ags+0.007
 ours+0.006
ishes+0.006
ports+0.006
 attention+0.006
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.010
Sus-0.009
 Sus-0.009
Tokenents
Token ID658
Feature activation+3.49e-03

Pos logit contributions
 Cr+0.092
 jugg+0.086
ad+0.085
 Nice+0.085
 chuckled+0.083
Neg logit contributions
 ours-0.084
ags-0.083
ishes-0.078
 hands-0.074
 twins-0.070
Token on
Token ID319
Feature activation-4.89e-04

Pos logit contributions
ags+0.004
 ours+0.004
ishes+0.004
 trash+0.003
 attention+0.003
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.007
Sus-0.007
 Sus-0.007
Token it
Token ID340
Feature activation-3.08e-03

Pos logit contributions
 Cr+0.001
ad+0.001
 jugg+0.001
Sus+0.001
Johnny+0.001
Neg logit contributions
ags-0.001
 ours-0.001
 trash-0.001
ishes-0.001
 hands-0.001
Token.
Token ID13
Feature activation-2.90e-04

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.006
Sus+0.006
 Sus+0.005
Neg logit contributions
ags-0.004
 ours-0.004
ishes-0.003
 trash-0.003
 attention-0.003
Token The
Token ID383
Feature activation-2.67e-03

Pos logit contributions
 Cr+0.000
 Sus+0.000
Sus+0.000
ad+0.000
 jugg+0.000
Neg logit contributions
ags-0.001
ishes-0.001
 ours-0.001
 attention-0.001
 unnecessary-0.000
Token They
Token ID1119
Feature activation+1.03e-02

Pos logit contributions
 Cr+0.019
 Sus+0.018
 Nice+0.016
 Bub+0.016
Sus+0.016
Neg logit contributions
ags-0.022
 ours-0.020
 unnecessary-0.020
ishes-0.019
 attention-0.019
Token reached
Token ID4251
Feature activation+1.25e-02

Pos logit contributions
ags+0.016
 ours+0.014
ishes+0.014
 trash+0.013
ports+0.013
Neg logit contributions
 Cr-0.018
ad-0.018
 jugg-0.017
Sus-0.017
all-0.016
Token for
Token ID329
Feature activation-5.38e-03

Pos logit contributions
Deal+0.022
OK+0.020
ags+0.020
ports+0.020
 ours+0.019
Neg logit contributions
 Sus-0.018
ad-0.017
 Suz-0.016
 Cr-0.016
 M-0.016
Token the
Token ID262
Feature activation-2.52e-03

Pos logit contributions
 Cr+0.010
ad+0.010
 jugg+0.009
Sus+0.009
all+0.009
Neg logit contributions
ags-0.008
 ours-0.007
ishes-0.007
 trash-0.007
 attention-0.007
Token or
Token ID393
Feature activation-1.15e-02

Pos logit contributions
ad+0.004
 Cr+0.004
 jugg+0.004
 Sus+0.004
Sus+0.004
Neg logit contributions
ags-0.004
irt-0.003
ishes-0.003
ports-0.003
 ours-0.003
Tokennam
Token ID7402
Feature activation-5.35e-02

Pos logit contributions
 Cr+0.022
 jugg+0.021
ad+0.021
Sus+0.020
 Sus+0.020
Neg logit contributions
ags-0.016
 ours-0.014
ishes-0.014
ports-0.013
 unnecessary-0.013
Tokenents
Token ID658
Feature activation-6.19e-03

Pos logit contributions
 Cr+0.099
 Nice+0.091
 jugg+0.091
ad+0.090
 chuckled+0.087
Neg logit contributions
 ours-0.079
ags-0.077
ishes-0.074
 hands-0.069
 dollars-0.067
Token and
Token ID290
Feature activation+2.42e-03

Pos logit contributions
 Cr+0.012
ad+0.011
 jugg+0.011
Sus+0.011
Johnny+0.010
Neg logit contributions
ags-0.009
 ours-0.008
ishes-0.008
 attention-0.008
 trash-0.007
Token touched
Token ID12615
Feature activation-3.50e-03

Pos logit contributions
ags+0.004
 ours+0.004
ishes+0.003
 trash+0.003
 attention+0.003
Neg logit contributions
 Cr-0.004
ad-0.004
 jugg-0.004
Sus-0.004
 Sus-0.004
Token them
Token ID606
Feature activation-4.96e-03

Pos logit contributions
 Cr+0.005
ad+0.005
 Sus+0.005
 jugg+0.005
Sus+0.005
Neg logit contributions
ags-0.006
 ours-0.006
ishes-0.005
ses-0.005
 attention-0.005
Token softly
Token ID26625
Feature activation+8.02e-03

Pos logit contributions
 Cr+0.010
ad+0.009
Sus+0.009
 jugg+0.009
Johnny+0.009
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 trash-0.006
 attention-0.006
Token on
Token ID319
Feature activation-4.89e-04

Pos logit contributions
ags+0.004
 ours+0.004
ishes+0.004
 trash+0.003
 attention+0.003
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.007
Sus-0.007
 Sus-0.007
Token it
Token ID340
Feature activation-3.08e-03

Pos logit contributions
 Cr+0.001
ad+0.001
 jugg+0.001
Sus+0.001
Johnny+0.001
Neg logit contributions
ags-0.001
 ours-0.001
 trash-0.001
ishes-0.001
 hands-0.001
Token.
Token ID13
Feature activation-2.90e-04

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.006
Sus+0.006
 Sus+0.005
Neg logit contributions
ags-0.004
 ours-0.004
ishes-0.003
 trash-0.003
 attention-0.003
Token The
Token ID383
Feature activation-2.67e-03

Pos logit contributions
 Cr+0.000
 Sus+0.000
Sus+0.000
ad+0.000
 jugg+0.000
Neg logit contributions
ags-0.001
ishes-0.001
 ours-0.001
 attention-0.001
 unnecessary-0.000
Token or
Token ID393
Feature activation-1.46e-03

Pos logit contributions
 Cr+0.004
ad+0.004
 jugg+0.003
 gn+0.003
bird+0.003
Neg logit contributions
ags-0.005
irt-0.004
 unnecessary-0.004
ows-0.004
ishes-0.004
Tokennam
Token ID7402
Feature activation-5.29e-02

Pos logit contributions
 Cr+0.003
ad+0.003
 jugg+0.003
Sus+0.003
 Sus+0.003
Neg logit contributions
ags-0.002
 ours-0.002
ishes-0.002
 trash-0.002
 attention-0.002
Tokenents
Token ID658
Feature activation+1.18e-03

Pos logit contributions
 Cr+0.089
 jugg+0.083
ad+0.082
 Nice+0.080
 chuckled+0.078
Neg logit contributions
ags-0.086
 ours-0.084
ishes-0.078
 hands-0.075
 heads-0.072
Token were
Token ID547
Feature activation-7.04e-03

Pos logit contributions
ags+0.002
 ours+0.002
ishes+0.002
 trash+0.002
 attention+0.002
Neg logit contributions
 Cr-0.002
ad-0.002
Sus-0.002
 jugg-0.002
Johnny-0.002
Token shiny
Token ID22441
Feature activation+2.35e-03

Pos logit contributions
 Cr+0.012
ad+0.012
 jugg+0.012
Sus+0.011
 Sus+0.011
Neg logit contributions
ags-0.011
ishes-0.009
 ours-0.009
ports-0.009
irt-0.009
Token and
Token ID290
Feature activation-5.26e-03

Pos logit contributions
ags+0.004
 ours+0.004
 unnecessary+0.004
 attention+0.004
ishes+0.004
Neg logit contributions
ad-0.003
 Cr-0.003
 jugg-0.003
 Sus-0.003
Sus-0.003
Token colorful
Token ID20239
Feature activation-2.25e-03

Pos logit contributions
 Cr+0.010
ad+0.010
Sus+0.009
all+0.009
 jugg+0.009
Neg logit contributions
 ours-0.007
ags-0.007
 trash-0.007
ishes-0.006
 attention-0.006
Token They
Token ID1119
Feature activation+4.61e-03

Pos logit contributions
ags+0.021
 ours+0.019
ishes+0.019
 attention+0.018
 unnecessary+0.017
Neg logit contributions
 Cr-0.020
ad-0.019
 jugg-0.019
Sus-0.019
 Sus-0.019
Token reached
Token ID4251
Feature activation+2.32e-02

Pos logit contributions
ags+0.007
 ours+0.007
ishes+0.007
 trash+0.007
 hands+0.006
Neg logit contributions
 Cr-0.008
ad-0.007
Sus-0.007
 jugg-0.007
all-0.007
Token for
Token ID329
Feature activation+7.61e-03

Pos logit contributions
ags+0.039
Deal+0.038
 unnecessary+0.037
ishes+0.037
ports+0.036
Neg logit contributions
 Sus-0.037
 Suz-0.034
ad-0.034
ill-0.031
 Cr-0.031
Token the
Token ID262
Feature activation+2.78e-03

Pos logit contributions
ags+0.010
 ours+0.009
 trash+0.009
 hands+0.009
 attention+0.009
Neg logit contributions
 Cr-0.015
ad-0.015
 jugg-0.014
Sus-0.014
all-0.014
Token or
Token ID393
Feature activation-3.76e-03

Pos logit contributions
ags+0.004
irt+0.004
ports+0.004
ishes+0.004
 ours+0.003
Neg logit contributions
ad-0.005
 Cr-0.005
 Sus-0.004
 jugg-0.004
Sus-0.004
Tokennam
Token ID7402
Feature activation-5.28e-02

Pos logit contributions
 Cr+0.007
ad+0.007
 jugg+0.007
Sus+0.007
 Sus+0.007
Neg logit contributions
ags-0.005
 ours-0.005
ishes-0.005
 trash-0.004
 attention-0.004
Tokenents
Token ID658
Feature activation+9.25e-05

Pos logit contributions
 Cr+0.097
 jugg+0.090
 Nice+0.089
ad+0.089
 chuckled+0.086
Neg logit contributions
 ours-0.078
ags-0.077
ishes-0.072
 hands-0.069
issors-0.065
Token and
Token ID290
Feature activation+8.03e-03

Pos logit contributions
ags+0.000
 ours+0.000
 attention+0.000
ishes+0.000
 trash+0.000
Neg logit contributions
 Cr-0.000
ad-0.000
 jugg-0.000
Sus-0.000
Johnny-0.000
Token tried
Token ID3088
Feature activation-8.82e-03

Pos logit contributions
ags+0.012
 ours+0.012
ishes+0.011
 trash+0.011
 attention+0.011
Neg logit contributions
 Cr-0.014
ad-0.014
 jugg-0.013
Sus-0.013
all-0.012
Token to
Token ID284
Feature activation+4.19e-03

Pos logit contributions
 Cr+0.017
ad+0.017
 jugg+0.017
Sus+0.016
Johnny+0.016
Neg logit contributions
ags-0.011
 attention-0.011
 ours-0.010
 trash-0.010
 best-0.010
Token take
Token ID1011
Feature activation+9.83e-03

Pos logit contributions
ags+0.006
 ours+0.005
 trash+0.005
 attention+0.005
 heads+0.005
Neg logit contributions
 Cr-0.008
ad-0.008
Sus-0.008
 Nice-0.008
Johnny-0.007
Token branches
Token ID13737
Feature activation+1.07e-03

Pos logit contributions
ags+0.007
ishes+0.007
irt+0.007
ports+0.007
aked+0.007
Neg logit contributions
ad-0.006
 Cr-0.006
bell-0.006
 Sus-0.006
Ell-0.006
Token.
Token ID13
Feature activation+1.65e-02

Pos logit contributions
ags+0.001
 ours+0.001
 attention+0.001
ishes+0.001
 hands+0.001
Neg logit contributions
 Cr-0.002
ad-0.002
 jugg-0.002
Sus-0.002
Johnny-0.002
Token But
Token ID887
Feature activation+4.78e-03

Pos logit contributions
ags+0.030
ishes+0.029
 ours+0.028
 unnecessary+0.027
 attention+0.027
Neg logit contributions
 Cr-0.024
 Sus-0.023
Sus-0.022
ad-0.022
 jugg-0.022
Token the
Token ID262
Feature activation-6.99e-03

Pos logit contributions
ags+0.008
 ours+0.007
ishes+0.007
 trash+0.007
 attention+0.007
Neg logit contributions
 Cr-0.008
ad-0.008
 jugg-0.007
Sus-0.007
 Sus-0.007
Token or
Token ID393
Feature activation-8.47e-03

Pos logit contributions
ad+0.011
 Cr+0.011
 jugg+0.010
 Sus+0.010
 gn+0.010
Neg logit contributions
ags-0.011
irt-0.010
ports-0.010
 ours-0.009
ows-0.009
Tokennam
Token ID7402
Feature activation-5.27e-02

Pos logit contributions
 Cr+0.017
ad+0.016
 jugg+0.016
Sus+0.015
 Sus+0.015
Neg logit contributions
ags-0.011
 ours-0.010
ishes-0.010
 trash-0.009
 attention-0.009
Tokenents
Token ID658
Feature activation-4.01e-04

Pos logit contributions
 Cr+0.093
 jugg+0.088
ad+0.086
 Nice+0.085
 chuckled+0.082
Neg logit contributions
ags-0.081
 ours-0.081
ishes-0.076
 hands-0.073
 heads-0.068
Token were
Token ID547
Feature activation+1.45e-03

Pos logit contributions
 Cr+0.001
ad+0.001
Sus+0.001
 jugg+0.001
Johnny+0.001
Neg logit contributions
ags-0.001
 ours-0.001
ishes-0.001
 trash-0.001
 attention-0.001
Token slippery
Token ID32911
Feature activation+6.05e-03

Pos logit contributions
ags+0.002
 ours+0.002
ishes+0.002
ports+0.002
 trash+0.002
Neg logit contributions
 Cr-0.002
ad-0.002
 jugg-0.002
Sus-0.002
all-0.002
Token and
Token ID290
Feature activation-5.61e-03

Pos logit contributions
ags+0.010
 ours+0.009
ishes+0.009
ports+0.008
 trash+0.008
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.009
Sus-0.009
 Sus-0.009
Token some
Token ID617
Feature activation-1.01e-02

Pos logit contributions
 Cr+0.010
ad+0.010
 jugg+0.010
Sus+0.010
Johnny+0.009
Neg logit contributions
ags-0.008
 ours-0.008
ishes-0.007
 trash-0.007
 attention-0.007
Token said
Token ID531
Feature activation+9.27e-03

Pos logit contributions
ags+0.004
 ours+0.004
ishes+0.004
 trash+0.004
 hands+0.004
Neg logit contributions
 Cr-0.006
ad-0.006
 jugg-0.006
Sus-0.006
all-0.005
Token.
Token ID13
Feature activation+1.46e-02

Pos logit contributions
ags+0.016
 ours+0.015
ishes+0.014
 trash+0.014
ports+0.014
Neg logit contributions
 Cr-0.015
ad-0.014
 jugg-0.014
Sus-0.013
 Sus-0.013
Token "
Token ID366
Feature activation-1.77e-03

Pos logit contributions
ags+0.028
 ours+0.027
ishes+0.027
 unnecessary+0.025
 attention+0.025
Neg logit contributions
 Cr-0.020
ad-0.019
 jugg-0.018
Sus-0.017
 Sus-0.017
TokenThe
Token ID464
Feature activation-1.51e-02

Pos logit contributions
 Cr+0.003
ad+0.003
 jugg+0.003
Sus+0.003
Johnny+0.003
Neg logit contributions
ags-0.003
 ours-0.003
ishes-0.002
 attention-0.002
 trash-0.002
Token or
Token ID393
Feature activation+3.52e-03

Pos logit contributions
 Cr+0.025
ad+0.024
 jugg+0.023
Sus+0.022
Ell+0.022
Neg logit contributions
ags-0.022
irt-0.021
ows-0.020
ports-0.020
ishes-0.020
Tokennam
Token ID7402
Feature activation-5.24e-02

Pos logit contributions
ags+0.005
 ours+0.004
ishes+0.004
 attention+0.004
 trash+0.004
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.006
Sus-0.006
 Sus-0.006
Tokenents
Token ID658
Feature activation+3.40e-03

Pos logit contributions
 Cr+0.087
 jugg+0.082
ad+0.081
 Nice+0.079
 chuckled+0.078
Neg logit contributions
ags-0.084
 ours-0.084
ishes-0.077
 hands-0.073
 twins-0.071
Token are
Token ID389
Feature activation-5.56e-03

Pos logit contributions
ags+0.005
 ours+0.004
ishes+0.004
 trash+0.004
 attention+0.004
Neg logit contributions
 Cr-0.006
ad-0.006
 jugg-0.006
Sus-0.006
 Sus-0.006
Token so
Token ID523
Feature activation-1.11e-02

Pos logit contributions
 Cr+0.009
ad+0.009
 jugg+0.009
 Sus+0.008
all+0.008
Neg logit contributions
ags-0.009
ishes-0.008
irt-0.008
ports-0.007
 unnecessary-0.007
Token pretty
Token ID2495
Feature activation+1.09e-02

Pos logit contributions
 Cr+0.022
ad+0.021
 jugg+0.021
Sus+0.020
 Sus+0.020
Neg logit contributions
ags-0.015
 ours-0.013
ishes-0.013
 attention-0.012
 trash-0.012
Token!
Token ID0
Feature activation+8.84e-03

Pos logit contributions
ags+0.019
 unnecessary+0.016
 ours+0.016
ses+0.015
ishes+0.015
Neg logit contributions
 Cr-0.017
ad-0.017
 jugg-0.016
all-0.016
 Sus-0.016
Token never
Token ID1239
Feature activation-1.65e-02

Pos logit contributions
ags+0.004
 ours+0.003
ishes+0.003
 trash+0.003
 hopes+0.003
Neg logit contributions
 Cr-0.004
ad-0.004
 jugg-0.004
Sus-0.004
Johnny-0.003
Token played
Token ID2826
Feature activation-1.11e-02

Pos logit contributions
 Cr+0.028
ad+0.027
Sus+0.027
Johnny+0.026
 jugg+0.026
Neg logit contributions
ags-0.025
 ours-0.024
ishes-0.023
 trash-0.023
 attention-0.022
Token with
Token ID351
Feature activation-1.05e-02

Pos logit contributions
 Cr+0.019
ad+0.019
 jugg+0.018
Sus+0.018
 Sus+0.017
Neg logit contributions
ags-0.017
 ours-0.016
ishes-0.015
 trash-0.015
 attention-0.014
Token the
Token ID262
Feature activation+4.32e-03

Pos logit contributions
 Cr+0.020
ad+0.019
 jugg+0.018
 Sus+0.018
Sus+0.018
Neg logit contributions
ags-0.015
 ours-0.013
ishes-0.012
 attention-0.012
 slot-0.012
Token or
Token ID393
Feature activation-1.53e-02

Pos logit contributions
ags+0.006
 ours+0.005
ports+0.005
ishes+0.005
ws+0.004
Neg logit contributions
 jugg-0.008
ad-0.008
 Cr-0.008
 Sus-0.008
Sus-0.007
Tokennam
Token ID7402
Feature activation-5.22e-02

Pos logit contributions
ad+0.028
 Cr+0.027
Sus+0.026
 jugg+0.026
Johnny+0.025
Neg logit contributions
ags-0.022
 ours-0.021
 trash-0.020
ishes-0.020
 attention-0.019
Tokenents
Token ID658
Feature activation-7.03e-03

Pos logit contributions
 Cr+0.094
ad+0.088
 Nice+0.087
 jugg+0.087
 grizz+0.084
Neg logit contributions
 ours-0.079
ags-0.077
ishes-0.073
 hands-0.068
issors-0.066
Token again
Token ID757
Feature activation-3.46e-02

Pos logit contributions
 Cr+0.008
ad+0.008
Sus+0.008
Johnny+0.007
 jugg+0.007
Neg logit contributions
ags-0.014
 ours-0.014
 attention-0.013
ishes-0.013
 trash-0.013
Token.
Token ID13
Feature activation-4.13e-03

Pos logit contributions
 Cr+0.077
ad+0.075
Sus+0.074
 jugg+0.072
Johnny+0.070
Neg logit contributions
ags-0.031
 ours-0.029
 attention-0.027
 best-0.027
ishes-0.027
Token They
Token ID1119
Feature activation-4.62e-03

Pos logit contributions
 Cr+0.007
ad+0.007
 jugg+0.006
Sus+0.006
 Sus+0.006
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 trash-0.006
 attention-0.006
Token still
Token ID991
Feature activation-3.81e-03

Pos logit contributions
 Cr+0.008
Sus+0.008
ad+0.008
 Nice+0.008
Johnny+0.008
Neg logit contributions
ags-0.006
 hopes-0.006
ishes-0.006
 trash-0.006
 ours-0.006
Token had
Token ID550
Feature activation+7.21e-03

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.006
Sus+0.006
 Sus+0.005
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 trash-0.006
 attention-0.006
Token not
Token ID407
Feature activation-5.37e-03

Pos logit contributions
ags+0.014
ishes+0.013
 ours+0.013
irt+0.012
 slot+0.012
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.009
Sus-0.009
all-0.008
Token touched
Token ID12615
Feature activation-3.07e-03

Pos logit contributions
 Cr+0.009
ad+0.009
 jugg+0.008
Sus+0.008
 Sus+0.008
Neg logit contributions
ags-0.009
 ours-0.008
ishes-0.008
 trash-0.008
 attention-0.008
Token the
Token ID262
Feature activation-3.37e-03

Pos logit contributions
 Cr+0.005
ad+0.005
 Sus+0.004
 jugg+0.004
all+0.004
Neg logit contributions
ags-0.005
 ours-0.005
ses-0.004
ishes-0.004
 heads-0.004
Token or
Token ID393
Feature activation-5.81e-03

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.005
 Sus+0.005
Sus+0.005
Neg logit contributions
ags-0.005
 ours-0.004
irt-0.004
ports-0.004
ishes-0.004
Tokennam
Token ID7402
Feature activation-5.22e-02

Pos logit contributions
 Cr+0.010
ad+0.010
 jugg+0.010
Sus+0.010
all+0.009
Neg logit contributions
ags-0.009
 ours-0.008
ishes-0.008
 trash-0.007
 attention-0.007
Tokenents
Token ID658
Feature activation-3.86e-03

Pos logit contributions
 Cr+0.095
ad+0.089
 Nice+0.087
 jugg+0.087
 Sus+0.084
Neg logit contributions
 ours-0.078
ags-0.075
ishes-0.072
 hands-0.066
 twins-0.065
Token.
Token ID13
Feature activation-5.74e-03

Pos logit contributions
 Cr+0.007
ad+0.007
Sus+0.006
 jugg+0.006
Johnny+0.006
Neg logit contributions
ags-0.006
 ours-0.005
ishes-0.005
 attention-0.005
 trash-0.005
Token They
Token ID1119
Feature activation-1.58e-03

Pos logit contributions
 Cr+0.010
 Sus+0.009
 Nice+0.009
Sus+0.008
ad+0.008
Neg logit contributions
ags-0.010
ishes-0.009
 attention-0.009
 ours-0.009
 unnecessary-0.008
Token hoped
Token ID10719
Feature activation+9.94e-04

Pos logit contributions
 Cr+0.003
ad+0.003
 jugg+0.003
Sus+0.003
all+0.003
Neg logit contributions
ags-0.002
 ours-0.002
 trash-0.002
ishes-0.002
 hopes-0.002
Token Mom
Token ID11254
Feature activation-8.87e-03

Pos logit contributions
ags+0.002
 ours+0.001
ishes+0.001
 trash+0.001
 attention+0.001
Neg logit contributions
 Cr-0.002
ad-0.002
 jugg-0.002
Sus-0.002
 Sus-0.002
Token at
Token ID379
Feature activation+1.16e-02

Pos logit contributions
 ours+0.004
ags+0.004
ports+0.003
ses+0.003
 yourselves+0.003
Neg logit contributions
 Cr-0.005
ad-0.004
 Sus-0.004
all-0.004
Sus-0.004
Token all
Token ID477
Feature activation-2.19e-03

Pos logit contributions
ags+0.020
 ours+0.018
ishes+0.018
ports+0.017
irt+0.017
Neg logit contributions
 Cr-0.018
ad-0.017
 Sus-0.017
 jugg-0.016
Sus-0.015
Token the
Token ID262
Feature activation-1.90e-03

Pos logit contributions
 Cr+0.003
ad+0.003
 jugg+0.003
Sus+0.003
 Sus+0.003
Neg logit contributions
ags-0.004
 ours-0.003
ishes-0.003
 trash-0.003
ports-0.003
Token different
Token ID1180
Feature activation-7.41e-03

Pos logit contributions
ad+0.004
 jugg+0.004
 Cr+0.004
Sus+0.004
 Sus+0.004
Neg logit contributions
ags-0.002
 ours-0.002
ishes-0.002
ses-0.002
ports-0.002
Token or
Token ID393
Feature activation-1.33e-02

Pos logit contributions
 Cr+0.018
ad+0.018
 jugg+0.017
Sus+0.017
 Sus+0.017
Neg logit contributions
ags-0.006
 ours-0.005
ishes-0.005
 trash-0.004
ports-0.004
Tokennam
Token ID7402
Feature activation-5.18e-02

Pos logit contributions
ad+0.027
 Cr+0.027
Sus+0.025
 jugg+0.025
all+0.024
Neg logit contributions
 ours-0.016
ags-0.015
 trash-0.014
ishes-0.014
 attention-0.014
Tokenents
Token ID658
Feature activation-5.00e-03

Pos logit contributions
 Cr+0.096
 Nice+0.092
 jugg+0.089
ad+0.088
 grizz+0.087
Neg logit contributions
 ours-0.074
ags-0.072
ishes-0.070
 hands-0.066
 twins-0.062
Token and
Token ID290
Feature activation-1.00e-03

Pos logit contributions
 Cr+0.010
ad+0.009
 jugg+0.009
Sus+0.009
Johnny+0.009
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 attention-0.006
 trash-0.005
Token adm
Token ID6178
Feature activation-1.04e-02

Pos logit contributions
 Cr+0.002
ad+0.002
Sus+0.001
all+0.001
 jugg+0.001
Neg logit contributions
 ours-0.002
ags-0.002
ishes-0.001
 attention-0.001
 trash-0.001
Tokeniring
Token ID3428
Feature activation-1.62e-04

Pos logit contributions
 Cr+0.020
ad+0.020
 jugg+0.019
Sus+0.019
 Sus+0.019
Neg logit contributions
ags-0.014
 ours-0.012
ishes-0.012
 trash-0.011
 attention-0.011
Token the
Token ID262
Feature activation-1.44e-02

Pos logit contributions
 Cr+0.000
ad+0.000
 jugg+0.000
 Sus+0.000
Sus+0.000
Neg logit contributions
ags-0.000
 ours-0.000
ishes-0.000
 trash-0.000
 attention-0.000
Token she
Token ID673
Feature activation-5.04e-04

Pos logit contributions
 Cr+0.004
ad+0.003
 jugg+0.003
Sus+0.003
 Sus+0.003
Neg logit contributions
ags-0.004
 ours-0.004
ishes-0.004
 trash-0.003
 attention-0.003
Token had
Token ID550
Feature activation-8.71e-04

Pos logit contributions
 Cr+0.001
ad+0.001
Sus+0.001
 jugg+0.001
 Sus+0.001
Neg logit contributions
ags-0.001
 ours-0.001
ishes-0.001
 trash-0.001
ports-0.001
Token with
Token ID351
Feature activation-1.72e-02

Pos logit contributions
 Cr+0.001
ad+0.001
 jugg+0.001
Sus+0.001
 Sus+0.001
Neg logit contributions
ags-0.001
 ours-0.001
ishes-0.001
 trash-0.001
 attention-0.001
Token her
Token ID607
Feature activation+6.07e-03

Pos logit contributions
 Cr+0.029
 jugg+0.027
ad+0.027
Sus+0.027
Johnny+0.026
Neg logit contributions
ags-0.028
 ours-0.026
 attention-0.024
 trash-0.024
 hands-0.024
Token or
Token ID393
Feature activation-4.72e-03

Pos logit contributions
ags+0.009
 ours+0.008
ishes+0.007
ports+0.007
 trash+0.007
Neg logit contributions
 Cr-0.011
ad-0.011
 jugg-0.011
 Sus-0.010
Sus-0.010
Tokennam
Token ID7402
Feature activation-5.15e-02

Pos logit contributions
 Cr+0.008
ad+0.008
 jugg+0.007
Sus+0.007
 Sus+0.007
Neg logit contributions
ags-0.008
 ours-0.007
ishes-0.007
 trash-0.007
 attention-0.007
Tokenents
Token ID658
Feature activation-8.06e-03

Pos logit contributions
 Cr+0.098
 Nice+0.090
 jugg+0.089
ad+0.088
 Freddy+0.086
Neg logit contributions
 ours-0.074
ags-0.071
ishes-0.067
 hands-0.066
 twins-0.060
Token.
Token ID13
Feature activation+2.86e-03

Pos logit contributions
 Cr+0.015
ad+0.015
 jugg+0.014
Sus+0.014
Johnny+0.014
Neg logit contributions
ags-0.011
 ours-0.010
ishes-0.010
 attention-0.010
 trash-0.009
Token<|endoftext|>
Token ID50256
Feature activation+8.15e-03

Pos logit contributions
ags+0.005
 ours+0.005
ishes+0.005
 trash+0.005
 attention+0.005
Neg logit contributions
 Cr-0.004
ad-0.004
 jugg-0.004
Sus-0.004
 Sus-0.004
TokenOnce
Token ID7454
Feature activation-4.12e-03

Pos logit contributions
ags+0.014
 ours+0.013
ishes+0.013
 attention+0.012
 trash+0.012
Neg logit contributions
 Cr-0.013
ad-0.012
 jugg-0.012
Sus-0.012
 Sus-0.011
Token upon
Token ID2402
Feature activation+4.18e-04

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.006
Sus+0.006
Johnny+0.005
Neg logit contributions
ags-0.007
 ours-0.007
ishes-0.006
 attention-0.006
 trash-0.006
Token I
Token ID314
Feature activation+5.26e-03

Pos logit contributions
ags+0.008
 ours+0.008
ishes+0.007
 trash+0.007
 attention+0.007
Neg logit contributions
 Cr-0.008
ad-0.008
 jugg-0.008
Sus-0.007
 Sus-0.007
Token see
Token ID766
Feature activation+8.55e-03

Pos logit contributions
ags+0.007
 ours+0.007
 hopes+0.007
 trash+0.007
ishes+0.007
Neg logit contributions
 Cr-0.010
ad-0.009
Sus-0.009
 jugg-0.009
Johnny-0.008
Token something
Token ID1223
Feature activation+6.35e-03

Pos logit contributions
ags+0.015
ishes+0.013
 ours+0.012
irt+0.012
ses+0.012
Neg logit contributions
 Cr-0.014
ad-0.014
 Sus-0.013
Sus-0.013
 jugg-0.013
Token that
Token ID326
Feature activation+1.91e-02

Pos logit contributions
ags+0.010
ishes+0.009
 ours+0.009
ports+0.008
 trash+0.008
Neg logit contributions
 Cr-0.011
ad-0.011
 jugg-0.010
Sus-0.010
 Sus-0.010
Token looks
Token ID3073
Feature activation+1.67e-02

Pos logit contributions
ags+0.027
ishes+0.022
 ours+0.021
 unnecessary+0.020
 attention+0.020
Neg logit contributions
 Cr-0.037
ad-0.036
 jugg-0.036
 Sus-0.034
Sus-0.033
Token like
Token ID588
Feature activation+2.76e-02

Pos logit contributions
ags+0.025
 ours+0.022
ishes+0.021
ports+0.020
 trash+0.020
Neg logit contributions
 Cr-0.030
ad-0.029
 jugg-0.029
Sus-0.028
 Sus-0.028
Token it
Token ID340
Feature activation+8.49e-03

Pos logit contributions
ags+0.050
ishes+0.044
irt+0.041
 heads+0.041
ses+0.040
Neg logit contributions
ad-0.042
 Cr-0.041
 jugg-0.039
 Sus-0.039
all-0.038
Token could
Token ID714
Feature activation+1.24e-02

Pos logit contributions
ags+0.012
 ours+0.010
ishes+0.010
 attention+0.009
 trash+0.009
Neg logit contributions
 Cr-0.016
ad-0.016
 jugg-0.016
Sus-0.015
 Sus-0.015
Token be
Token ID307
Feature activation+3.39e-03

Pos logit contributions
ags+0.015
 ours+0.015
 trash+0.014
 attention+0.014
 heads+0.013
Neg logit contributions
 Cr-0.025
Sus-0.023
Johnny-0.023
ad-0.022
 Sus-0.022
Token luxurious
Token ID35985
Feature activation+1.57e-02

Pos logit contributions
ags+0.005
 ours+0.004
ishes+0.004
ports+0.004
 attention+0.004
Neg logit contributions
 Cr-0.006
ad-0.006
 jugg-0.006
Sus-0.006
 Sus-0.006
Token and
Token ID290
Feature activation+9.44e-04

Pos logit contributions
ags+0.023
 ours+0.020
ishes+0.020
 trash+0.019
ports+0.019
Neg logit contributions
 Cr-0.028
ad-0.028
 jugg-0.027
all-0.026
 Sus-0.026
Token picks
Token ID11103
Feature activation+1.63e-02

Pos logit contributions
ags+0.022
 ours+0.019
 attention+0.017
ishes+0.017
 unnecessary+0.016
Neg logit contributions
 Cr-0.042
ad-0.041
 jugg-0.041
 chuckled-0.038
all-0.038
Token up
Token ID510
Feature activation-1.88e-04

Pos logit contributions
ags+0.023
 ours+0.021
ishes+0.020
 unnecessary+0.019
ses+0.019
Neg logit contributions
 Cr-0.031
ad-0.030
all-0.027
 Sus-0.027
 jugg-0.027
Token the
Token ID262
Feature activation+6.35e-03

Pos logit contributions
 Cr+0.000
ad+0.000
 jugg+0.000
Sus+0.000
 Sus+0.000
Neg logit contributions
ags-0.000
 ours-0.000
ishes-0.000
 trash-0.000
 attention-0.000
Token axe
Token ID23272
Feature activation-6.75e-04

Pos logit contributions
ags+0.009
ishes+0.008
ports+0.008
ows+0.008
 ours+0.008
Neg logit contributions
ad-0.010
 Cr-0.010
 jugg-0.009
His-0.009
 grizz-0.009
Token and
Token ID290
Feature activation+2.63e-02

Pos logit contributions
 Cr+0.001
ad+0.001
Sus+0.001
 jugg+0.001
Johnny+0.001
Neg logit contributions
ags-0.001
 ours-0.001
ishes-0.001
 trash-0.001
 attention-0.001
Token puts
Token ID7584
Feature activation+3.01e-02

Pos logit contributions
ags+0.037
 ours+0.030
 unnecessary+0.029
issors+0.028
ports+0.027
Neg logit contributions
 Cr-0.051
 jugg-0.051
ad-0.050
 chuckled-0.046
 gn-0.046
Token it
Token ID340
Feature activation-1.63e-03

Pos logit contributions
ags+0.070
ishes+0.064
 ours+0.064
 unnecessary+0.064
ows+0.064
Neg logit contributions
 Cr-0.028
ad-0.026
 Sus-0.024
on-0.021
all-0.020
Token back
Token ID736
Feature activation-6.75e-04

Pos logit contributions
 Cr+0.003
ad+0.003
all+0.003
Sus+0.003
 jugg+0.003
Neg logit contributions
ags-0.002
 ours-0.002
ports-0.002
ishes-0.002
 unnecessary-0.002
Token where
Token ID810
Feature activation+7.48e-03

Pos logit contributions
 Cr+0.001
ad+0.001
all+0.001
 Sus+0.001
Sus+0.001
Neg logit contributions
ags-0.001
 ours-0.001
 unnecessary-0.001
ports-0.001
ishes-0.001
Token they
Token ID484
Feature activation+6.70e-03

Pos logit contributions
ags+0.013
 ours+0.013
 trash+0.012
ishes+0.012
 attention+0.012
Neg logit contributions
 Cr-0.011
ad-0.011
 jugg-0.010
Sus-0.010
all-0.010
Token found
Token ID1043
Feature activation+3.09e-02

Pos logit contributions
ags+0.012
 ours+0.011
 attention+0.010
ishes+0.010
ports+0.010
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.010
Sus-0.009
 Sus-0.009
Token play
Token ID711
Feature activation-8.53e-03

Pos logit contributions
 Cr+0.021
ad+0.020
Sus+0.020
Johnny+0.019
all+0.019
Neg logit contributions
ags-0.020
 hands-0.019
 attention-0.019
 ours-0.018
 heads-0.018
Token together
Token ID1978
Feature activation-1.04e-02

Pos logit contributions
 Cr+0.014
ad+0.014
 jugg+0.014
Sus+0.013
 Sus+0.013
Neg logit contributions
ags-0.013
 ours-0.013
ishes-0.012
 attention-0.012
 trash-0.012
Token until
Token ID1566
Feature activation-5.74e-03

Pos logit contributions
 Cr+0.018
ad+0.018
Sus+0.017
 jugg+0.017
Johnny+0.017
Neg logit contributions
ags-0.016
 ours-0.014
ishes-0.014
 attention-0.013
 trash-0.013
Token it
Token ID340
Feature activation-9.67e-03

Pos logit contributions
 Cr+0.010
ad+0.010
 jugg+0.009
Sus+0.009
all+0.009
Neg logit contributions
ags-0.009
 ours-0.008
 trash-0.008
ishes-0.008
 attention-0.008
Token was
Token ID373
Feature activation+8.06e-03

Pos logit contributions
 Cr+0.015
ad+0.014
Sus+0.013
 jugg+0.013
Johnny+0.013
Neg logit contributions
ags-0.016
 trash-0.015
 best-0.015
 waits-0.015
 hopes-0.015
Token time
Token ID640
Feature activation+3.09e-02

Pos logit contributions
ags+0.009
ishes+0.008
 ours+0.008
ports+0.008
 trash+0.007
Neg logit contributions
 Cr-0.017
ad-0.017
 jugg-0.016
Sus-0.016
 Sus-0.016
Token for
Token ID329
Feature activation-9.28e-03

Pos logit contributions
ishes+0.059
ags+0.059
ports+0.054
 trash+0.054
 ours+0.053
Neg logit contributions
 Sus-0.039
 Cr-0.037
all-0.037
ad-0.036
Sus-0.035
Token them
Token ID606
Feature activation-3.81e-03

Pos logit contributions
 Cr+0.017
ad+0.017
 jugg+0.017
Sus+0.017
 Nice+0.016
Neg logit contributions
ags-0.013
 ours-0.011
 trash-0.011
 twins-0.011
 hands-0.011
Token to
Token ID284
Feature activation-6.85e-03

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.005
 Sus+0.005
Sus+0.005
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
ports-0.006
 trash-0.006
Token go
Token ID467
Feature activation-3.17e-03

Pos logit contributions
 Cr+0.015
Sus+0.014
ad+0.014
 jugg+0.013
Johnny+0.013
Neg logit contributions
ags-0.008
 ours-0.007
 trash-0.006
 attention-0.006
 hopes-0.006
Token home
Token ID1363
Feature activation-1.55e-02

Pos logit contributions
 Cr+0.006
ad+0.006
Sus+0.006
 jugg+0.006
 Sus+0.006
Neg logit contributions
ags-0.004
 ours-0.004
ishes-0.004
 trash-0.004
ports-0.004
Token room
Token ID2119
Feature activation-3.11e-03

Pos logit contributions
 Cr+0.019
ad+0.019
 jugg+0.019
Sus+0.018
 Sus+0.018
Neg logit contributions
ags-0.010
 ours-0.009
ishes-0.008
 attention-0.007
ports-0.007
Token looking
Token ID2045
Feature activation+8.03e-03

Pos logit contributions
 Cr+0.006
ad+0.005
 jugg+0.005
Johnny+0.005
Sus+0.005
Neg logit contributions
ags-0.004
 hands-0.004
 ours-0.004
 attention-0.004
 twins-0.004
Token for
Token ID329
Feature activation+1.69e-02

Pos logit contributions
ags+0.010
 ours+0.009
ishes+0.009
ports+0.009
 unnecessary+0.008
Neg logit contributions
 Cr-0.015
ad-0.015
 Sus-0.015
 jugg-0.014
Sus-0.014
Token a
Token ID257
Feature activation+1.24e-02

Pos logit contributions
ags+0.023
 ours+0.021
ishes+0.020
 trash+0.019
 attention+0.019
Neg logit contributions
 Cr-0.033
ad-0.032
 jugg-0.031
Sus-0.030
 Sus-0.030
Token good
Token ID922
Feature activation+9.77e-03

Pos logit contributions
ags+0.017
 ours+0.015
ishes+0.014
ports+0.013
 attention+0.013
Neg logit contributions
 Cr-0.024
ad-0.024
 jugg-0.023
Sus-0.023
 Sus-0.022
Token place
Token ID1295
Feature activation+2.66e-02

Pos logit contributions
ags+0.015
 ours+0.013
ishes+0.013
ports+0.013
issors+0.012
Neg logit contributions
ad-0.017
 jugg-0.017
 Cr-0.017
 Sus-0.016
Sus-0.016
Token to
Token ID284
Feature activation-2.39e-03

Pos logit contributions
ags+0.043
 ours+0.039
ishes+0.038
 trash+0.037
ports+0.036
Neg logit contributions
 Cr-0.044
ad-0.043
 jugg-0.041
Sus-0.041
 Sus-0.040
Token hide
Token ID7808
Feature activation+1.99e-02

Pos logit contributions
 Cr+0.005
Sus+0.005
Johnny+0.005
ad+0.005
all+0.005
Neg logit contributions
ags-0.003
 ours-0.002
 attention-0.002
 trash-0.002
 hopes-0.002
Token.
Token ID13
Feature activation+1.16e-02

Pos logit contributions
ags+0.030
 ours+0.027
ishes+0.026
 trash+0.025
 attention+0.025
Neg logit contributions
 Cr-0.035
ad-0.034
Sus-0.033
 jugg-0.032
 Sus-0.032
Token Sue
Token ID26089
Feature activation+3.19e-03

Pos logit contributions
ags+0.021
 ours+0.019
ishes+0.019
 trash+0.018
 unnecessary+0.018
Neg logit contributions
 Cr-0.018
ad-0.017
 Sus-0.016
Sus-0.016
 jugg-0.016
Token saw
Token ID2497
Feature activation+1.42e-02

Pos logit contributions
ags+0.004
ishes+0.004
 ours+0.004
 hands+0.004
 trash+0.004
Neg logit contributions
 Cr-0.006
ad-0.006
Sus-0.006
 jugg-0.005
Johnny-0.005
Token.
Token ID13
Feature activation+3.55e-03

Pos logit contributions
 Cr+0.005
ad+0.005
 jugg+0.005
Sus+0.005
 Sus+0.005
Neg logit contributions
ags-0.004
 ours-0.003
ishes-0.003
 attention-0.003
 trash-0.003
Token He
Token ID679
Feature activation+2.89e-02

Pos logit contributions
 ours+0.008
ags+0.008
ishes+0.008
 unnecessary+0.008
 pests+0.008
Neg logit contributions
 Cr-0.003
ad-0.003
 Nice-0.003
 jugg-0.003
 Sus-0.003
Token is
Token ID318
Feature activation+1.18e-02

Pos logit contributions
ags+0.039
 ours+0.036
issors+0.034
 unnecessary+0.032
 attention+0.031
Neg logit contributions
 jugg-0.056
ad-0.055
 Cr-0.054
 chuckled-0.051
 gn-0.051
Token sleepy
Token ID42023
Feature activation+7.09e-03

Pos logit contributions
ags+0.021
ishes+0.020
ports+0.020
 ours+0.019
issors+0.019
Neg logit contributions
 Cr-0.017
ad-0.016
all-0.015
 jugg-0.015
Sus-0.015
Token.
Token ID13
Feature activation-1.89e-03

Pos logit contributions
ags+0.012
 ours+0.011
ishes+0.011
ports+0.010
 trash+0.010
Neg logit contributions
ad-0.011
 Cr-0.011
 jugg-0.011
all-0.010
Sus-0.010
Token He
Token ID679
Feature activation+2.84e-02

Pos logit contributions
 Cr+0.002
ad+0.002
 jugg+0.002
 Nice+0.002
Sus+0.002
Neg logit contributions
ags-0.004
ishes-0.004
 ours-0.004
 unnecessary-0.004
 heads-0.004
Token says
Token ID1139
Feature activation-3.08e-03

Pos logit contributions
ags+0.039
 ours+0.035
issors+0.033
ishes+0.031
ports+0.031
Neg logit contributions
 jugg-0.054
ad-0.053
 Cr-0.052
 chuckled-0.050
 gn-0.049
Token,
Token ID11
Feature activation+7.26e-03

Pos logit contributions
 Cr+0.004
ad+0.004
Sus+0.004
 jugg+0.004
Johnny+0.004
Neg logit contributions
ags-0.006
 ours-0.005
 attention-0.005
 hands-0.005
 best-0.005
Token "
Token ID366
Feature activation-6.70e-03

Pos logit contributions
ags+0.013
 ours+0.012
ishes+0.012
 trash+0.011
 attention+0.011
Neg logit contributions
 Cr-0.011
ad-0.011
 jugg-0.010
Sus-0.010
all-0.010
TokenI
Token ID40
Feature activation-1.47e-03

Pos logit contributions
 Cr+0.013
ad+0.013
 jugg+0.013
Sus+0.013
Johnny+0.012
Neg logit contributions
ags-0.009
 ours-0.008
ishes-0.008
 attention-0.007
 trash-0.007
Token love
Token ID1842
Feature activation-8.02e-03

Pos logit contributions
 Cr+0.003
ad+0.003
Sus+0.003
 jugg+0.003
Johnny+0.003
Neg logit contributions
ags-0.001
 ours-0.001
 trash-0.001
 hopes-0.001
ishes-0.001
Token not
Token ID407
Feature activation-3.77e-03

Pos logit contributions
 Cr+0.024
Sus+0.022
ad+0.022
 jugg+0.022
Johnny+0.022
Neg logit contributions
ags-0.018
 ours-0.017
 hopes-0.015
 trash-0.015
 heads-0.015
Token be
Token ID307
Feature activation-3.50e-03

Pos logit contributions
 Cr+0.007
ad+0.006
Sus+0.006
Johnny+0.006
 Sus+0.006
Neg logit contributions
ags-0.005
 ours-0.005
 trash-0.005
 attention-0.005
ishes-0.005
Token jealous
Token ID19354
Feature activation-3.21e-03

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.006
Sus+0.006
 Sus+0.005
Neg logit contributions
ags-0.005
 ours-0.005
ishes-0.005
ports-0.005
 attention-0.005
Token of
Token ID286
Feature activation-5.98e-03

Pos logit contributions
 Cr+0.004
ad+0.004
 jugg+0.004
Sus+0.004
all+0.004
Neg logit contributions
ags-0.006
 ours-0.006
ishes-0.005
 trash-0.005
ports-0.005
Token what
Token ID644
Feature activation-3.17e-03

Pos logit contributions
 Cr+0.011
ad+0.010
 jugg+0.010
 Sus+0.010
Sus+0.010
Neg logit contributions
ags-0.009
 ours-0.008
ishes-0.008
 trash-0.007
 slot-0.007
Token others
Token ID1854
Feature activation+1.09e-02

Pos logit contributions
 Cr+0.006
 jugg+0.005
ad+0.005
 Sus+0.005
Sus+0.005
Neg logit contributions
ags-0.005
 ours-0.004
ishes-0.004
 trailer-0.004
irt-0.004
Token have
Token ID423
Feature activation+3.17e-03

Pos logit contributions
ags+0.015
 ours+0.013
ishes+0.013
 trailer+0.012
 unnecessary+0.012
Neg logit contributions
 jugg-0.020
ad-0.020
 Cr-0.020
all-0.019
 Sus-0.018
Token,
Token ID11
Feature activation-1.12e-02

Pos logit contributions
ags+0.005
 ours+0.004
ishes+0.004
 trash+0.004
 attention+0.004
Neg logit contributions
 Cr-0.006
ad-0.005
 jugg-0.005
Sus-0.005
 Sus-0.005
Token but
Token ID475
Feature activation-1.44e-02

Pos logit contributions
 Cr+0.019
ad+0.019
 jugg+0.018
Sus+0.018
 Sus+0.017
Neg logit contributions
ags-0.017
 ours-0.016
ishes-0.015
 trash-0.015
 attention-0.015
Token offer
Token ID2897
Feature activation+3.78e-03

Pos logit contributions
ad+0.026
 Cr+0.026
Sus+0.025
 jugg+0.025
Johnny+0.024
Neg logit contributions
ags-0.019
 ours-0.019
ishes-0.017
 attention-0.017
 trash-0.017
Token to
Token ID284
Feature activation-5.54e-04

Pos logit contributions
ags+0.006
 ours+0.005
ishes+0.005
irt+0.005
 slot+0.005
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.006
 Sus-0.006
Sus-0.006
Token slide
Token ID10649
Feature activation-2.62e-03

Pos logit contributions
ags+0.007
 ours+0.007
ishes+0.006
 trash+0.006
 attention+0.006
Neg logit contributions
 Cr-0.009
ad-0.009
 jugg-0.009
Sus-0.009
 Sus-0.008
Token-
Token ID12
Feature activation-1.11e-02

Pos logit contributions
 Cr+0.005
ad+0.004
 jugg+0.004
Sus+0.004
 Sus+0.004
Neg logit contributions
ags-0.004
 ours-0.004
ishes-0.003
 trash-0.003
 attention-0.003
Tokenbat
Token ID8664
Feature activation+9.88e-03

Pos logit contributions
 Cr+0.015
ad+0.015
 jugg+0.015
Sus+0.014
 Nice+0.013
Neg logit contributions
ags-0.020
 ours-0.019
ishes-0.018
 attention-0.018
ports-0.018
Tokenhe
Token ID258
Feature activation-1.64e-03

Pos logit contributions
ags+0.018
 ours+0.016
ishes+0.015
ows+0.015
ports+0.015
Neg logit contributions
 Cr-0.015
 jugg-0.014
ad-0.013
Sus-0.013
 Sus-0.013
Token and
Token ID290
Feature activation+1.01e-02

Pos logit contributions
 Cr+0.003
ad+0.003
 jugg+0.003
Sus+0.003
 Sus+0.003
Neg logit contributions
ags-0.002
 ours-0.002
ishes-0.002
 trash-0.002
 attention-0.002
Token she
Token ID673
Feature activation+7.44e-04

Pos logit contributions
ags+0.019
 ours+0.018
ishes+0.017
 trash+0.017
ports+0.017
Neg logit contributions
 Cr-0.014
ad-0.014
 jugg-0.013
Sus-0.013
 Sus-0.013
Token was
Token ID373
Feature activation+4.62e-03

Pos logit contributions
ags+0.001
 hopes+0.001
 hands+0.001
 best+0.001
ishes+0.001
Neg logit contributions
 Cr-0.001
ad-0.001
Sus-0.001
 Nice-0.001
 Sus-0.001
Token so
Token ID523
Feature activation+5.00e-04

Pos logit contributions
ags+0.006
 ours+0.006
 best+0.005
 attention+0.005
ishes+0.005
Neg logit contributions
 Cr-0.009
ad-0.008
 jugg-0.008
Sus-0.008
Johnny-0.008
Token proud
Token ID6613
Feature activation-3.05e-04

Pos logit contributions
ags+0.001
 ours+0.001
 best+0.001
 attention+0.000
 hopes+0.000
Neg logit contributions
 Cr-0.001
ad-0.001
 jugg-0.001
 Nice-0.001
Johnny-0.001
Token of
Token ID286
Feature activation-9.01e-03

Pos logit contributions
 Cr+0.000
ad+0.000
 jugg+0.000
Sus+0.000
 Sus+0.000
Neg logit contributions
ags-0.001
 ours-0.001
ishes-0.001
 trash-0.001
ports-0.001
Token her
Token ID607
Feature activation+1.17e-02

Pos logit contributions
 Cr+0.016
 jugg+0.015
ad+0.015
Sus+0.014
Johnny+0.014
Neg logit contributions
ags-0.014
 ours-0.013
 attention-0.012
 hopes-0.012
 trash-0.011
Token then
Token ID788
Feature activation-1.36e-02

Pos logit contributions
 Cr+0.002
ad+0.002
 jugg+0.002
Sus+0.002
 Nice+0.002
Neg logit contributions
ags-0.001
 ours-0.001
ishes-0.001
 trash-0.001
ports-0.001
Token he
Token ID339
Feature activation-5.67e-03

Pos logit contributions
 Cr+0.024
ad+0.023
 jugg+0.022
Sus+0.022
all+0.021
Neg logit contributions
ags-0.019
 ours-0.018
 trash-0.018
 hands-0.017
ishes-0.017
Token replied
Token ID8712
Feature activation-8.48e-03

Pos logit contributions
 Cr+0.011
ad+0.011
 Nice+0.010
igsaw+0.010
Sus+0.010
Neg logit contributions
ports-0.006
 trash-0.006
 hands-0.006
 hopes-0.006
ishes-0.006
Token,
Token ID11
Feature activation-5.85e-04

Pos logit contributions
ad+0.015
 Cr+0.015
 jugg+0.015
|+0.014
Sus+0.014
Neg logit contributions
ags-0.012
 best-0.011
 hands-0.011
 attention-0.011
 things-0.010
Token "
Token ID366
Feature activation-1.15e-02

Pos logit contributions
ad+0.001
 Cr+0.001
 jugg+0.001
Sus+0.001
igsaw+0.001
Neg logit contributions
ags-0.001
 hands-0.001
 ours-0.001
 twins-0.001
ports-0.001
TokenBecause
Token ID8128
Feature activation+7.25e-03

Pos logit contributions
 Cr+0.024
ad+0.024
 jugg+0.022
 Sus+0.022
 grizz+0.022
Neg logit contributions
ags-0.012
OK-0.011
 ours-0.010
ports-0.010
 trash-0.010
Token it
Token ID340
Feature activation-4.58e-03

Pos logit contributions
ags+0.012
ishes+0.011
 ours+0.010
 trash+0.010
ports+0.010
Neg logit contributions
 Cr-0.012
 jugg-0.012
ad-0.012
all-0.011
 Sus-0.011
Token helps
Token ID5419
Feature activation-6.04e-03

Pos logit contributions
 Cr+0.008
ad+0.008
 jugg+0.008
Sus+0.007
all+0.007
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 attention-0.006
 trash-0.006
Token me
Token ID502
Feature activation-4.38e-03

Pos logit contributions
 Cr+0.011
ad+0.011
 jugg+0.010
Sus+0.010
 Sus+0.010
Neg logit contributions
ags-0.009
 ours-0.008
ishes-0.008
 trash-0.008
 attention-0.007
Token learn
Token ID2193
Feature activation-1.33e-02

Pos logit contributions
 Cr+0.008
ad+0.007
 jugg+0.007
Sus+0.007
 Sus+0.007
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 trash-0.006
ports-0.006
Token and
Token ID290
Feature activation+3.08e-03

Pos logit contributions
 Cr+0.024
ad+0.023
 jugg+0.022
Sus+0.022
 Sus+0.021
Neg logit contributions
ags-0.020
 ours-0.018
ishes-0.017
 trash-0.017
 attention-0.016
Token with
Token ID351
Feature activation-6.05e-04

Pos logit contributions
ags+0.005
 ours+0.005
ishes+0.004
 attention+0.004
 trash+0.004
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.007
Sus-0.007
 Sus-0.007
Token their
Token ID511
Feature activation-1.02e-03

Pos logit contributions
 Cr+0.001
ad+0.001
all+0.001
 jugg+0.001
Sus+0.001
Neg logit contributions
ags-0.001
 ours-0.001
 trash-0.001
ishes-0.001
 hands-0.001
Token toys
Token ID14958
Feature activation+2.01e-03

Pos logit contributions
 Cr+0.002
ad+0.002
 jugg+0.002
Sus+0.002
 Sus+0.002
Neg logit contributions
ags-0.001
 ours-0.001
ishes-0.001
ports-0.001
 unnecessary-0.001
Token in
Token ID287
Feature activation+8.99e-04

Pos logit contributions
ags+0.003
 ours+0.003
 unnecessary+0.003
 attention+0.003
ports+0.003
Neg logit contributions
 jugg-0.003
 Cr-0.003
ad-0.003
all-0.003
ill-0.003
Token the
Token ID262
Feature activation-6.76e-03

Pos logit contributions
ags+0.001
 ours+0.001
ishes+0.001
 trash+0.001
 attention+0.001
Neg logit contributions
 Cr-0.002
ad-0.002
 jugg-0.001
Sus-0.001
all-0.001
Token living
Token ID2877
Feature activation+9.12e-03

Pos logit contributions
 Cr+0.013
ad+0.013
 jugg+0.012
Johnny+0.012
Sus+0.012
Neg logit contributions
ags-0.008
 ours-0.008
ishes-0.008
ows-0.007
 attention-0.007
Token room
Token ID2119
Feature activation-1.18e-02

Pos logit contributions
ags+0.009
 ours+0.009
 unnecessary+0.008
 attention+0.008
ishes+0.008
Neg logit contributions
ad-0.020
 Cr-0.020
 jugg-0.019
Sus-0.019
 Sus-0.019
Token.
Token ID13
Feature activation-5.87e-03

Pos logit contributions
 Cr+0.022
ad+0.021
 jugg+0.020
Sus+0.020
 Sus+0.019
Neg logit contributions
ags-0.017
 ours-0.016
ishes-0.015
 trash-0.014
 attention-0.014
Token They
Token ID1119
Feature activation-2.33e-03

Pos logit contributions
 Cr+0.011
ad+0.011
 jugg+0.011
Sus+0.010
 Sus+0.010
Neg logit contributions
ags-0.008
 ours-0.007
ishes-0.007
 attention-0.007
 unnecessary-0.007
Token had
Token ID550
Feature activation-2.36e-03

Pos logit contributions
 Cr+0.004
ad+0.004
all+0.004
Sus+0.004
 jugg+0.004
Neg logit contributions
ags-0.003
 ours-0.003
 trash-0.003
ishes-0.003
 hands-0.003
Token many
Token ID867
Feature activation-2.21e-02

Pos logit contributions
ad+0.004
 Cr+0.004
Sus+0.004
 jugg+0.004
 Sus+0.004
Neg logit contributions
ags-0.004
ishes-0.004
 ours-0.004
irt-0.003
 attention-0.003
TokenOnce
Token ID7454
Feature activation+1.14e-04

Pos logit contributions
ags+0.008
 ours+0.008
ishes+0.008
 trash+0.007
 attention+0.007
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.006
Sus-0.006
 Sus-0.006
Token upon
Token ID2402
Feature activation+7.47e-03

Pos logit contributions
ags+0.000
 ours+0.000
ishes+0.000
 attention+0.000
 trash+0.000
Neg logit contributions
 Cr-0.000
ad-0.000
 jugg-0.000
Sus-0.000
Johnny-0.000
Token a
Token ID257
Feature activation+1.28e-02

Pos logit contributions
ags+0.010
 ours+0.009
ishes+0.008
 trash+0.008
 attention+0.008
Neg logit contributions
 Cr-0.015
ad-0.014
 jugg-0.014
Sus-0.014
 Sus-0.013
Token time
Token ID640
Feature activation-7.06e-03

Pos logit contributions
ags+0.017
ishes+0.017
 ours+0.016
ports+0.015
 attention+0.015
Neg logit contributions
ad-0.023
 jugg-0.023
 Cr-0.023
Sus-0.022
all-0.022
Token,
Token ID11
Feature activation+1.81e-03

Pos logit contributions
 jugg+0.013
ad+0.012
 Cr+0.012
Sus+0.012
odic+0.012
Neg logit contributions
 best-0.010
ags-0.009
 ours-0.009
 hands-0.009
 twins-0.009
Token there
Token ID612
Feature activation+8.22e-03

Pos logit contributions
ags+0.003
ishes+0.003
 ours+0.003
 attention+0.002
 unnecessary+0.002
Neg logit contributions
 Cr-0.003
 Sus-0.003
ad-0.003
Sus-0.003
 Suz-0.003
Token was
Token ID373
Feature activation+5.52e-03

Pos logit contributions
 best+0.011
 hopes+0.011
ags+0.011
 trash+0.011
 hands+0.011
Neg logit contributions
 Cr-0.015
Sus-0.014
ad-0.013
 jugg-0.013
Ros-0.013
Token a
Token ID257
Feature activation+9.89e-03

Pos logit contributions
ags+0.008
ishes+0.008
 ours+0.008
 attention+0.007
 heads+0.007
Neg logit contributions
 Cr-0.010
ad-0.009
Sus-0.009
 jugg-0.009
 Sus-0.009
Token man
Token ID582
Feature activation-1.35e-02

Pos logit contributions
ags+0.016
 ours+0.013
ishes+0.013
 attention+0.013
ws+0.012
Neg logit contributions
 Cr-0.018
 jugg-0.018
ad-0.017
Sus-0.017
Johnny-0.016
Token named
Token ID3706
Feature activation-1.21e-02

Pos logit contributions
 Cr+0.024
ad+0.023
 jugg+0.022
Sus+0.022
 Sus+0.021
Neg logit contributions
ags-0.020
 ours-0.019
ishes-0.018
 trash-0.017
 attention-0.017
Token Tom
Token ID4186
Feature activation+3.50e-03

Pos logit contributions
 Cr+0.019
ad+0.018
 jugg+0.018
Sus+0.017
all+0.016
Neg logit contributions
ags-0.021
 ours-0.020
ishes-0.018
 trash-0.018
 attention-0.018
Token the
Token ID262
Feature activation-3.16e-03

Pos logit contributions
ags+0.003
 ours+0.003
 trash+0.003
ishes+0.003
 attention+0.003
Neg logit contributions
 Cr-0.004
ad-0.004
 jugg-0.004
Sus-0.004
Johnny-0.004
Token slide
Token ID10649
Feature activation-7.06e-03

Pos logit contributions
 Cr+0.005
ad+0.005
 jugg+0.005
Sus+0.005
all+0.005
Neg logit contributions
ags-0.005
ishes-0.005
 ours-0.005
ports-0.004
 unnecessary-0.004
Token together
Token ID1978
Feature activation-6.00e-03

Pos logit contributions
 Cr+0.012
ad+0.012
Sus+0.012
 jugg+0.012
Johnny+0.011
Neg logit contributions
ags-0.011
 ours-0.010
ishes-0.010
 trash-0.009
 attention-0.009
Token and
Token ID290
Feature activation+1.37e-03

Pos logit contributions
 Cr+0.011
ad+0.010
 jugg+0.010
Sus+0.010
 Sus+0.009
Neg logit contributions
ags-0.009
 ours-0.008
ishes-0.008
 trash-0.008
 attention-0.008
Token have
Token ID423
Feature activation-7.94e-03

Pos logit contributions
ags+0.002
 ours+0.002
 attention+0.002
ishes+0.002
 hopes+0.002
Neg logit contributions
ad-0.002
 Cr-0.002
Sus-0.002
Johnny-0.002
all-0.002
Token so
Token ID523
Feature activation-1.18e-02

Pos logit contributions
 Cr+0.015
ad+0.015
 jugg+0.014
Sus+0.014
 Sus+0.014
Neg logit contributions
ags-0.011
 ours-0.010
ishes-0.010
 attention-0.009
 trash-0.009
Token much
Token ID881
Feature activation-5.02e-03

Pos logit contributions
 Cr+0.020
ad+0.019
 jugg+0.019
Sus+0.018
Johnny+0.018
Neg logit contributions
ags-0.018
 ours-0.018
 trash-0.016
 attention-0.016
ishes-0.016
Token fun
Token ID1257
Feature activation+8.70e-03

Pos logit contributions
 Cr+0.011
ad+0.010
 jugg+0.010
Sus+0.010
all+0.010
Neg logit contributions
ags-0.006
 ours-0.005
ishes-0.005
ports-0.005
 trash-0.005
Token.
Token ID13
Feature activation+2.64e-03

Pos logit contributions
ags+0.014
 ours+0.013
ishes+0.013
 trash+0.012
ports+0.012
Neg logit contributions
 Cr-0.014
ad-0.014
 jugg-0.014
Sus-0.013
all-0.013
Token Lily
Token ID20037
Feature activation-5.53e-03

Pos logit contributions
ags+0.004
 ours+0.004
ishes+0.004
 trash+0.003
 attention+0.003
Neg logit contributions
 Cr-0.005
ad-0.004
 jugg-0.004
Sus-0.004
 Sus-0.004
Token realized
Token ID6939
Feature activation-2.00e-02

Pos logit contributions
 Cr+0.010
Sus+0.010
ad+0.010
 Sus+0.009
 jugg+0.009
Neg logit contributions
ags-0.007
 hopes-0.007
ishes-0.007
 best-0.007
 waits-0.006
Token happy
Token ID3772
Feature activation-3.96e-03

Pos logit contributions
ags+0.003
 ours+0.003
ishes+0.003
 trash+0.003
 attention+0.003
Neg logit contributions
 Cr-0.003
ad-0.003
 jugg-0.003
Sus-0.003
 Sus-0.003
Token and
Token ID290
Feature activation+4.07e-03

Pos logit contributions
 Cr+0.007
ad+0.007
 jugg+0.007
Sus+0.006
Johnny+0.006
Neg logit contributions
ags-0.006
 ours-0.005
ishes-0.005
 attention-0.005
 trash-0.005
Token excited
Token ID6568
Feature activation+2.00e-03

Pos logit contributions
ags+0.006
 ours+0.006
ishes+0.005
 trash+0.005
 attention+0.005
Neg logit contributions
ad-0.007
 Cr-0.007
Sus-0.007
 jugg-0.007
all-0.006
Token because
Token ID780
Feature activation-1.23e-02

Pos logit contributions
ags+0.003
 ours+0.003
ishes+0.003
 trash+0.003
 attention+0.003
Neg logit contributions
 Cr-0.003
ad-0.003
 jugg-0.003
Sus-0.003
 Sus-0.003
Token he
Token ID339
Feature activation+1.17e-02

Pos logit contributions
ad+0.017
 Cr+0.017
 jugg+0.016
Sus+0.015
 chuckled+0.014
Neg logit contributions
 ours-0.023
ags-0.022
 trash-0.020
 attention-0.020
 twins-0.020
Token saw
Token ID2497
Feature activation-2.41e-03

Pos logit contributions
ags+0.015
 ours+0.015
 hopes+0.015
ishes+0.014
ports+0.014
Neg logit contributions
 Cr-0.021
ad-0.021
Sus-0.019
 Nice-0.019
 jugg-0.018
Token impressive
Token ID8036
Feature activation-8.54e-03

Pos logit contributions
 Cr+0.004
ad+0.004
 jugg+0.004
Sus+0.004
 Sus+0.004
Neg logit contributions
ags-0.004
 ours-0.003
ishes-0.003
 attention-0.003
 trash-0.003
Token airplanes
Token ID32808
Feature activation-3.63e-03

Pos logit contributions
 Cr+0.016
ad+0.016
 jugg+0.016
Sus+0.015
 Sus+0.015
Neg logit contributions
ags-0.012
 ours-0.010
ishes-0.010
 trash-0.009
 attention-0.009
Token.
Token ID13
Feature activation-1.30e-04

Pos logit contributions
 Cr+0.007
ad+0.007
 jugg+0.006
Sus+0.006
 Sus+0.006
Neg logit contributions
ags-0.005
 ours-0.005
ishes-0.004
 attention-0.004
 trash-0.004
Token He
Token ID679
Feature activation+1.92e-03

Pos logit contributions
 Cr+0.000
ad+0.000
 jugg+0.000
Sus+0.000
all+0.000
Neg logit contributions
ags-0.000
 ours-0.000
ishes-0.000
 trash-0.000
ports-0.000
Token wanted
Token ID2227
Feature activation-3.15e-03

Pos logit contributions
ags+0.003
 ours+0.003
ishes+0.003
 trash+0.002
ports+0.002
Neg logit contributions
 Cr-0.003
ad-0.003
Sus-0.003
 jugg-0.003
 Sus-0.003
Token he
Token ID339
Feature activation+8.81e-03

Pos logit contributions
ags+0.012
 ours+0.011
 unnecessary+0.010
 attention+0.010
ishes+0.010
Neg logit contributions
ad-0.012
 Cr-0.012
 jugg-0.011
Sus-0.011
 Sus-0.011
Token polished
Token ID23895
Feature activation-9.80e-03

Pos logit contributions
ags+0.016
 ours+0.015
ishes+0.014
 attention+0.014
 trash+0.014
Neg logit contributions
 Cr-0.013
ad-0.013
 jugg-0.012
Sus-0.012
 Sus-0.011
Token the
Token ID262
Feature activation-7.59e-03

Pos logit contributions
 Cr+0.015
ad+0.015
 jugg+0.014
Sus+0.014
 Sus+0.013
Neg logit contributions
ags-0.017
 ours-0.016
ishes-0.015
 trash-0.015
 attention-0.015
Token wheels
Token ID13666
Feature activation+4.78e-03

Pos logit contributions
ad+0.014
 Cr+0.013
 jugg+0.013
Sus+0.013
 Sus+0.012
Neg logit contributions
ags-0.011
 ours-0.010
ishes-0.010
ports-0.009
 unnecessary-0.009
Token every
Token ID790
Feature activation+2.07e-02

Pos logit contributions
ags+0.007
 ours+0.006
ishes+0.006
 attention+0.006
 trash+0.006
Neg logit contributions
 Cr-0.009
ad-0.009
 jugg-0.008
Sus-0.008
 Sus-0.008
Token day
Token ID1110
Feature activation-4.72e-03

Pos logit contributions
ags+0.028
 ours+0.025
ishes+0.024
 trash+0.022
 attention+0.022
Neg logit contributions
 Cr-0.041
ad-0.040
 jugg-0.039
Sus-0.038
 Sus-0.037
Token so
Token ID523
Feature activation-1.63e-02

Pos logit contributions
 Cr+0.009
ad+0.009
 jugg+0.008
Sus+0.008
Johnny+0.008
Neg logit contributions
ags-0.006
 ours-0.006
 attention-0.006
 hands-0.005
ishes-0.005
Token they
Token ID484
Feature activation-7.83e-03

Pos logit contributions
 Cr+0.025
ad+0.023
 jugg+0.022
 Nice+0.021
Sus+0.021
Neg logit contributions
ags-0.026
 best-0.026
 ours-0.025
 attention-0.025
 hands-0.025
Token stayed
Token ID9658
Feature activation+8.39e-03

Pos logit contributions
 Cr+0.013
ad+0.013
Sus+0.012
 jugg+0.012
 Sus+0.012
Neg logit contributions
ags-0.012
 ours-0.011
ishes-0.011
 trash-0.011
 attention-0.010
Token shiny
Token ID22441
Feature activation+3.44e-04

Pos logit contributions
ags+0.012
 ours+0.011
ishes+0.011
ports+0.010
 trash+0.010
Neg logit contributions
ad-0.015
 Cr-0.015
Sus-0.014
 jugg-0.014
 Sus-0.014
Token.
Token ID13
Feature activation-3.21e-03

Pos logit contributions
ags+0.001
 ours+0.001
ishes+0.000
 trash+0.000
ports+0.000
Neg logit contributions
ad-0.001
 Cr-0.001
 jugg-0.001
 Sus-0.000
Sus-0.000
TokenYou
Token ID1639
Feature activation-5.43e-03

Pos logit contributions
 Cr+0.019
ad+0.019
 jugg+0.018
Sus+0.018
 Sus+0.017
Neg logit contributions
ags-0.008
 ours-0.007
ishes-0.007
 trash-0.006
 attention-0.006
Token are
Token ID389
Feature activation+1.57e-02

Pos logit contributions
 Cr+0.010
ad+0.010
 jugg+0.010
Sus+0.009
 Sus+0.009
Neg logit contributions
ags-0.008
 ours-0.007
ishes-0.007
 trash-0.006
 attention-0.006
Token my
Token ID616
Feature activation+1.61e-02

Pos logit contributions
ags+0.022
ishes+0.020
 ours+0.020
ports+0.019
 trash+0.018
Neg logit contributions
 Cr-0.030
ad-0.029
 jugg-0.027
Sus-0.027
 Sus-0.027
Token best
Token ID1266
Feature activation+2.49e-02

Pos logit contributions
ags+0.020
 ours+0.016
ishes+0.016
ports+0.015
hes+0.015
Neg logit contributions
 Cr-0.035
ad-0.033
 jugg-0.032
 Sus-0.032
Sus-0.031
Token friend
Token ID1545
Feature activation+1.12e-02

Pos logit contributions
ags+0.041
 ours+0.037
ishes+0.036
ports+0.035
 trash+0.035
Neg logit contributions
 Cr-0.042
ad-0.041
 jugg-0.041
 Sus-0.039
Sus-0.039
Token too
Token ID1165
Feature activation-1.33e-02

Pos logit contributions
ags+0.022
ishes+0.020
 ours+0.019
irt+0.019
 unnecessary+0.019
Neg logit contributions
ad-0.016
 Cr-0.016
 jugg-0.014
 Sus-0.014
all-0.014
Token.
Token ID13
Feature activation+2.05e-03

Pos logit contributions
 Cr+0.024
ad+0.023
 jugg+0.023
Sus+0.022
 Sus+0.022
Neg logit contributions
ags-0.020
 ours-0.018
ishes-0.017
 trash-0.017
 attention-0.016
Token You
Token ID921
Feature activation-4.64e-03

Pos logit contributions
ags+0.003
 ours+0.003
ishes+0.002
 trash+0.002
 attention+0.002
Neg logit contributions
 Cr-0.004
ad-0.004
 jugg-0.004
Sus-0.004
 Sus-0.004
Token are
Token ID389
Feature activation+1.35e-02

Pos logit contributions
 Cr+0.009
ad+0.009
 jugg+0.009
Sus+0.008
 Sus+0.008
Neg logit contributions
ags-0.006
 ours-0.006
ishes-0.005
 attention-0.005
 trash-0.005
Token very
Token ID845
Feature activation-5.05e-03

Pos logit contributions
ags+0.020
ishes+0.018
 ours+0.018
ports+0.017
 trash+0.017
Neg logit contributions
 Cr-0.025
ad-0.024
 jugg-0.023
Sus-0.022
 Sus-0.022
Token loyal
Token ID9112
Feature activation+2.86e-02

Pos logit contributions
 Cr+0.011
ad+0.011
 jugg+0.010
Sus+0.010
Johnny+0.010
Neg logit contributions
ags-0.006
 ours-0.005
 trash-0.005
 attention-0.004
ishes-0.004
Token a
Token ID257
Feature activation-2.00e-03

Pos logit contributions
ags+0.007
 ours+0.006
 trash+0.006
 attention+0.006
ishes+0.006
Neg logit contributions
 Cr-0.010
ad-0.010
 jugg-0.009
Sus-0.009
Johnny-0.009
Token me
Token ID502
Feature activation+8.93e-04

Pos logit contributions
 Cr+0.004
ad+0.004
all+0.004
Sus+0.004
 jugg+0.004
Neg logit contributions
ags-0.003
 trash-0.002
 ours-0.002
 attention-0.002
 twins-0.002
Tokenow
Token ID322
Feature activation-7.99e-04

Pos logit contributions
ags+0.001
 attention+0.001
ows+0.001
 ours+0.001
irt+0.001
Neg logit contributions
 Cr-0.002
 jugg-0.002
Sus-0.002
 Sus-0.002
Johnny-0.001
Token.
Token ID13
Feature activation-9.03e-04

Pos logit contributions
 Cr+0.002
Sus+0.002
 jugg+0.002
ad+0.002
Johnny+0.002
Neg logit contributions
ags-0.001
 ours-0.001
ishes-0.001
 attention-0.001
 trash-0.001
Token She
Token ID1375
Feature activation-7.84e-03

Pos logit contributions
 Cr+0.001
ad+0.001
 Sus+0.001
 jugg+0.001
 Nice+0.001
Neg logit contributions
ags-0.002
 ours-0.002
ishes-0.002
 attention-0.002
 unnecessary-0.002
Token smiled
Token ID13541
Feature activation-5.07e-03

Pos logit contributions
 Cr+0.014
ad+0.013
all+0.013
igsaw+0.013
Sus+0.012
Neg logit contributions
 trash-0.011
ags-0.010
 hopes-0.010
 heads-0.010
 hands-0.010
Token.
Token ID13
Feature activation+1.76e-03

Pos logit contributions
 Cr+0.010
ad+0.010
 jugg+0.010
Sus+0.009
 Sus+0.009
Neg logit contributions
ags-0.007
 ours-0.006
ishes-0.006
 trash-0.005
 attention-0.005
Token She
Token ID1375
Feature activation-2.07e-03

Pos logit contributions
ags+0.004
 ours+0.004
ishes+0.004
 attention+0.003
 unnecessary+0.003
Neg logit contributions
 Cr-0.002
ad-0.002
 Nice-0.002
 Sus-0.002
Sus-0.002
Token saw
Token ID2497
Feature activation+1.45e-02

Pos logit contributions
 Cr+0.004
ad+0.004
igsaw+0.004
all+0.003
Sus+0.003
Neg logit contributions
ags-0.003
 trash-0.003
 hopes-0.003
 ours-0.002
 twins-0.002
Token a
Token ID257
Feature activation+8.98e-04

Pos logit contributions
ags+0.023
 ours+0.021
ishes+0.020
 attention+0.019
ports+0.019
Neg logit contributions
 Cr-0.025
ad-0.024
 jugg-0.023
Sus-0.022
 Sus-0.022
Token picture
Token ID4286
Feature activation-1.68e-03

Pos logit contributions
ags+0.001
 ours+0.001
 trash+0.001
ishes+0.001
 attention+0.001
Neg logit contributions
 Cr-0.002
ad-0.002
 jugg-0.002
Sus-0.002
all-0.002
Token a
Token ID257
Feature activation+4.78e-03

Pos logit contributions
 Cr+0.008
ad+0.008
 jugg+0.008
Sus+0.008
 chuckled+0.008
Neg logit contributions
ags-0.003
 ours-0.003
 trash-0.002
 best-0.002
 twins-0.002
Token gray
Token ID12768
Feature activation+8.00e-03

Pos logit contributions
ags+0.007
 ours+0.006
ishes+0.006
 attention+0.005
ports+0.005
Neg logit contributions
 Cr-0.009
 jugg-0.009
ad-0.009
Sus-0.009
 Sus-0.009
Token cat
Token ID3797
Feature activation-1.74e-02

Pos logit contributions
ags+0.011
 ours+0.010
ishes+0.009
 trash+0.009
 attention+0.009
Neg logit contributions
 Cr-0.015
ad-0.015
 jugg-0.014
Sus-0.014
 Sus-0.014
Token named
Token ID3706
Feature activation-4.95e-03

Pos logit contributions
 Cr+0.033
ad+0.031
Sus+0.030
 jugg+0.030
all+0.029
Neg logit contributions
ags-0.023
 ours-0.021
ishes-0.021
 trash-0.020
 twins-0.020
Token Spike
Token ID26309
Feature activation+8.47e-03

Pos logit contributions
 Cr+0.007
ad+0.006
 jugg+0.006
Sus+0.006
 Sus+0.006
Neg logit contributions
ags-0.010
 ours-0.009
ishes-0.009
 attention-0.008
 trash-0.008
Token.
Token ID13
Feature activation-2.96e-02

Pos logit contributions
ags+0.013
 ours+0.012
ishes+0.011
 attention+0.011
 trash+0.011
Neg logit contributions
 Cr-0.015
ad-0.015
 jugg-0.014
Sus-0.014
 Sus-0.014
Token Spike
Token ID26309
Feature activation-1.67e-03

Pos logit contributions
Sus+0.038
ad+0.037
 Cr+0.035
 jugg+0.035
 Sus+0.034
Neg logit contributions
ags-0.058
 ours-0.052
ishes-0.050
 trash-0.050
 hopes-0.050
Token loved
Token ID6151
Feature activation-3.10e-03

Pos logit contributions
 Cr+0.003
ad+0.003
Sus+0.003
 jugg+0.003
 Sus+0.003
Neg logit contributions
ags-0.002
 ours-0.002
ishes-0.002
 trash-0.002
ports-0.002
Token to
Token ID284
Feature activation-7.11e-03

Pos logit contributions
 Cr+0.005
ad+0.005
Sus+0.005
 jugg+0.005
 Sus+0.005
Neg logit contributions
ags-0.005
 ours-0.005
ishes-0.004
 trash-0.004
 attention-0.004
Token run
Token ID1057
Feature activation-8.72e-03

Pos logit contributions
 Cr+0.013
Sus+0.013
Johnny+0.012
all+0.012
enny+0.012
Neg logit contributions
ags-0.010
 trash-0.008
 hay-0.008
 hands-0.008
 ours-0.008
Token and
Token ID290
Feature activation-1.41e-02

Pos logit contributions
 Cr+0.017
ad+0.016
 jugg+0.016
Sus+0.016
 Sus+0.015
Neg logit contributions
ags-0.012
 ours-0.011
 trash-0.010
ishes-0.010
 attention-0.010
Token and
Token ID290
Feature activation-3.57e-03

Pos logit contributions
 Cr+0.025
 jugg+0.023
ad+0.023
Sus+0.022
Johnny+0.022
Neg logit contributions
ags-0.014
 attention-0.013
 hands-0.012
 ours-0.012
 best-0.012
Token she
Token ID673
Feature activation+1.59e-03

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.005
Sus+0.005
all+0.005
Neg logit contributions
ags-0.006
 ours-0.005
ishes-0.005
 trash-0.005
 hands-0.005
Token never
Token ID1239
Feature activation-1.14e-02

Pos logit contributions
ags+0.003
 ours+0.002
ishes+0.002
 hopes+0.002
 trash+0.002
Neg logit contributions
 Cr-0.003
ad-0.002
Sus-0.002
 jugg-0.002
Johnny-0.002
Token sh
Token ID427
Feature activation-1.42e-02

Pos logit contributions
 Cr+0.020
ad+0.019
Sus+0.019
Johnny+0.018
 jugg+0.018
Neg logit contributions
ags-0.016
 ours-0.015
 trash-0.015
ishes-0.015
 attention-0.015
Tokenivered
Token ID6396
Feature activation-5.67e-03

Pos logit contributions
 Cr+0.025
 jugg+0.024
 Nice+0.021
ad+0.021
 Sus+0.021
Neg logit contributions
ags-0.027
ishes-0.021
ows-0.021
irt-0.020
aked-0.020
Token again
Token ID757
Feature activation-3.12e-02

Pos logit contributions
 Cr+0.009
 jugg+0.009
Johnny+0.009
ad+0.009
Ros+0.008
Neg logit contributions
ags-0.009
 attention-0.009
 best-0.008
 ours-0.008
 trash-0.008
Token.
Token ID13
Feature activation-1.72e-03

Pos logit contributions
 Cr+0.070
ad+0.067
Sus+0.065
 jugg+0.064
Johnny+0.064
Neg logit contributions
ags-0.030
 ours-0.027
 attention-0.027
 hands-0.026
 best-0.025
Token She
Token ID1375
Feature activation-4.71e-03

Pos logit contributions
 Cr+0.002
ad+0.002
 jugg+0.002
Sus+0.002
 Sus+0.002
Neg logit contributions
ags-0.003
 ours-0.003
ishes-0.003
 trash-0.003
 attention-0.003
Token lived
Token ID5615
Feature activation+1.16e-03

Pos logit contributions
 Cr+0.008
ad+0.007
Sus+0.007
 jugg+0.007
Johnny+0.007
Neg logit contributions
ags-0.008
 hopes-0.007
 ours-0.007
ishes-0.007
 trash-0.007
Token with
Token ID351
Feature activation-1.81e-02

Pos logit contributions
ags+0.002
 ours+0.002
 hopes+0.002
 best+0.002
ishes+0.001
Neg logit contributions
 Cr-0.002
ad-0.002
 jugg-0.002
Sus-0.002
Johnny-0.002
Token the
Token ID262
Feature activation-3.58e-03

Pos logit contributions
 Cr+0.033
 jugg+0.032
Sus+0.032
ad+0.031
all+0.030
Neg logit contributions
ags-0.025
 ours-0.023
 hands-0.023
 attention-0.023
 trash-0.023
Token you
Token ID345
Feature activation-1.18e-02

Pos logit contributions
 Cr+0.019
ad+0.018
 jugg+0.018
Sus+0.018
Johnny+0.017
Neg logit contributions
ags-0.011
 ours-0.010
 attention-0.009
 trash-0.009
ishes-0.009
Token promise
Token ID6991
Feature activation-1.03e-02

Pos logit contributions
 Cr+0.022
ad+0.022
 jugg+0.021
Sus+0.021
 Sus+0.020
Neg logit contributions
ags-0.016
 ours-0.015
ishes-0.014
 trash-0.014
 attention-0.013
Token to
Token ID284
Feature activation-3.90e-03

Pos logit contributions
 Cr+0.017
ad+0.016
 jugg+0.015
Sus+0.015
 Sus+0.015
Neg logit contributions
ags-0.017
 ours-0.016
ishes-0.015
 trash-0.015
 attention-0.014
Token never
Token ID1239
Feature activation-2.07e-02

Pos logit contributions
 Cr+0.007
ad+0.007
Sus+0.007
 jugg+0.007
 Sus+0.006
Neg logit contributions
ags-0.006
 ours-0.005
ishes-0.005
 attention-0.005
 trash-0.005
Token speed
Token ID2866
Feature activation-1.16e-02

Pos logit contributions
 Cr+0.040
Sus+0.039
ad+0.037
Johnny+0.037
 Sus+0.036
Neg logit contributions
ags-0.026
 ours-0.024
 attention-0.023
 trash-0.023
ishes-0.022
Token again
Token ID757
Feature activation-3.54e-02

Pos logit contributions
 Cr+0.021
ad+0.021
 jugg+0.021
Sus+0.020
Johnny+0.019
Neg logit contributions
ags-0.017
 ours-0.015
ishes-0.014
 attention-0.014
 trash-0.014
Token,
Token ID11
Feature activation-9.56e-03

Pos logit contributions
 Cr+0.051
ad+0.049
 jugg+0.047
Sus+0.046
Johnny+0.044
Neg logit contributions
ags-0.065
 ours-0.061
ishes-0.058
 trash-0.057
 attention-0.056
Token you
Token ID345
Feature activation-1.13e-02

Pos logit contributions
 Cr+0.019
ad+0.018
 jugg+0.017
Sus+0.017
 Sus+0.017
Neg logit contributions
ags-0.013
 ours-0.012
ishes-0.011
 trash-0.011
 attention-0.010
Token can
Token ID460
Feature activation-7.18e-03

Pos logit contributions
 Cr+0.024
ad+0.024
 jugg+0.023
all+0.022
Sus+0.022
Neg logit contributions
ags-0.014
 ours-0.012
ishes-0.011
 trash-0.011
ports-0.010
Token have
Token ID423
Feature activation-8.93e-03

Pos logit contributions
 Cr+0.013
Sus+0.012
 Nice+0.012
 Sus+0.012
ad+0.012
Neg logit contributions
ags-0.010
 best-0.009
 ours-0.009
 hopes-0.009
 will-0.009
Token some
Token ID617
Feature activation-2.25e-03

Pos logit contributions
 Cr+0.017
ad+0.017
 jugg+0.016
Sus+0.016
 Sus+0.015
Neg logit contributions
ags-0.012
 ours-0.011
ishes-0.011
 trash-0.010
 attention-0.010
Token nurses
Token ID20669
Feature activation-3.00e-03

Pos logit contributions
ags+0.003
suits+0.003
 slot+0.003
ishes+0.003
ows+0.003
Neg logit contributions
 Cr-0.003
ad-0.003
 jugg-0.003
all-0.003
 Sus-0.003
Token and
Token ID290
Feature activation+4.25e-03

Pos logit contributions
 Cr+0.005
ad+0.004
 jugg+0.004
all+0.004
Sus+0.004
Neg logit contributions
ags-0.005
 ours-0.005
ishes-0.005
 trash-0.005
ports-0.004
Token doctors
Token ID7519
Feature activation-5.18e-04

Pos logit contributions
ags+0.007
 ours+0.006
ishes+0.006
ports+0.006
 trash+0.006
Neg logit contributions
 Cr-0.007
ad-0.007
 jugg-0.007
Sus-0.006
 Sus-0.006
Token when
Token ID618
Feature activation-1.18e-02

Pos logit contributions
 Cr+0.001
ad+0.001
 jugg+0.001
Sus+0.001
 Sus+0.001
Neg logit contributions
ags-0.001
 ours-0.001
ishes-0.001
 trash-0.001
ports-0.001
Token they
Token ID484
Feature activation-2.65e-03

Pos logit contributions
 Cr+0.020
ad+0.019
 jugg+0.019
 Sus+0.018
Sus+0.018
Neg logit contributions
ags-0.020
 ours-0.017
ishes-0.016
irt-0.015
ows-0.015
Token try
Token ID1949
Feature activation-3.25e-02

Pos logit contributions
 Cr+0.005
ad+0.005
 jugg+0.005
Sus+0.005
 Sus+0.005
Neg logit contributions
ags-0.004
 ours-0.003
ishes-0.003
 trash-0.003
 attention-0.003
Token to
Token ID284
Feature activation-1.42e-02

Pos logit contributions
 Cr+0.060
ad+0.060
 jugg+0.058
Sus+0.057
Johnny+0.055
Neg logit contributions
ags-0.046
 ours-0.043
 attention-0.041
 trash-0.040
ishes-0.040
Token help
Token ID1037
Feature activation-2.12e-02

Pos logit contributions
 Cr+0.027
Sus+0.025
ad+0.025
 Sus+0.024
 jugg+0.024
Neg logit contributions
ags-0.020
 ours-0.018
 attention-0.017
 trash-0.017
 heads-0.016
Token you
Token ID345
Feature activation-1.88e-02

Pos logit contributions
 Cr+0.039
ad+0.039
 jugg+0.037
Sus+0.036
 Sus+0.036
Neg logit contributions
ags-0.030
 ours-0.027
ishes-0.026
ports-0.024
 trash-0.024
Token.
Token ID13
Feature activation-1.02e-02

Pos logit contributions
 Cr+0.034
ad+0.033
 jugg+0.032
Sus+0.031
 Sus+0.030
Neg logit contributions
ags-0.028
 ours-0.026
ishes-0.024
 trash-0.024
 attention-0.023
Token They
Token ID1119
Feature activation+1.98e-03

Pos logit contributions
 Cr+0.018
ad+0.018
 jugg+0.018
Sus+0.017
 Sus+0.017
Neg logit contributions
ags-0.015
 ours-0.013
ishes-0.012
 trash-0.012
 hopes-0.012
Token didn
Token ID1422
Feature activation+1.40e-03

Pos logit contributions
 Cr+0.007
ad+0.007
 jugg+0.007
Sus+0.007
 Sus+0.006
Neg logit contributions
ags-0.010
 ours-0.009
ishes-0.009
 trash-0.009
 attention-0.008
Token't
Token ID470
Feature activation-5.72e-03

Pos logit contributions
ags+0.003
 ours+0.002
hes+0.002
issors+0.002
 best+0.002
Neg logit contributions
 chuckled-0.002
 jugg-0.002
 Cr-0.002
Sus-0.002
 Sus-0.002
Token want
Token ID765
Feature activation-3.54e-03

Pos logit contributions
 Cr+0.012
ad+0.012
 jugg+0.012
Sus+0.011
 Sus+0.011
Neg logit contributions
ags-0.007
 ours-0.006
 trash-0.005
ishes-0.005
 attention-0.005
Token to
Token ID284
Feature activation-5.49e-03

Pos logit contributions
 jugg+0.006
Sus+0.006
 Cr+0.006
ad+0.006
Jo+0.006
Neg logit contributions
ags-0.005
 ours-0.005
 attention-0.005
 trash-0.005
 hands-0.004
Token play
Token ID711
Feature activation-2.05e-02

Pos logit contributions
 Cr+0.010
Sus+0.010
odic+0.010
ad+0.010
Jo+0.010
Neg logit contributions
ags-0.007
 attention-0.006
 ours-0.006
 trash-0.006
 heads-0.006
Token anymore
Token ID7471
Feature activation-3.14e-02

Pos logit contributions
ad+0.034
 Cr+0.033
Sus+0.032
 jugg+0.032
Jo+0.031
Neg logit contributions
ags-0.034
 ours-0.030
 trash-0.029
 best-0.027
 twins-0.027
Token.
Token ID13
Feature activation-8.10e-03

Pos logit contributions
ad+0.072
 Cr+0.072
Sus+0.069
 jugg+0.069
 chuckled+0.067
Neg logit contributions
ags-0.023
 twins-0.023
 ours-0.023
 things-0.022
 hands-0.022
Token
Token ID198
Feature activation-3.33e-03

Pos logit contributions
ad+0.012
 Cr+0.012
 jugg+0.012
Sus+0.011
all+0.011
Neg logit contributions
ags-0.014
 ours-0.013
ishes-0.013
 trash-0.012
ports-0.012
Token
Token ID198
Feature activation-7.02e-03

Pos logit contributions
 Cr+0.006
ad+0.006
 jugg+0.006
 Sus+0.005
 chuckled+0.005
Neg logit contributions
ags-0.005
 ours-0.004
ports-0.004
 trash-0.004
 twins-0.004
TokenBut
Token ID1537
Feature activation-9.94e-03

Pos logit contributions
 Cr+0.010
ad+0.010
 jugg+0.010
 Sus+0.009
ill+0.009
Neg logit contributions
ags-0.012
 ours-0.012
 trash-0.011
ports-0.011
issors-0.011
Token then
Token ID788
Feature activation-1.11e-02

Pos logit contributions
ad+0.016
 jugg+0.015
 Cr+0.015
Sus+0.015
Jo+0.014
Neg logit contributions
ags-0.016
 ours-0.015
ishes-0.015
 trash-0.014
 hands-0.014