TOP ACTIVATIONS
MAX = 6.246

and the lightning struck the hyd rant . Suddenly water was
She turned on the f auc et , but the water
his hands under the f auc et . He watched as
and turned on the f auc et . L
and turned on the f auc et . Soon , the
to turn off the f auc et when he was done
and turned on the f auc et . The water was
She turned off the f auc et and dried herself .
to turn off the f auc et when she was done
She turned on the f auc et and icy water came
and turned off the f auc et for her . Lily
him and checked the f auc et before leaving him alone
shooting out of the f auc et ! The
. They opened the f auc et and put their letters
to turn off the f auc et after washing her hands
you turn on the f auc et , water will come
to turn off the f auc et . <|endoftext|> Once upon
she saw a beautiful f auc et .
and play with the f auc et . One day ,
would turn on the f auc et and use it to

MIN ACTIVATIONS
MIN = -4.946

was a little girl named Beat rice . She was so
day ever ! <|endoftext|> Mor ris was very excited . Today
had saved him . <|endoftext|> Charl otte and her mom were
One day , Beat rice ’ s
weak and they shouldn ’ t have faced the
the biggest frog hadn ’ t brought any l
She was so cute ! Beat rice had five fingers on
so excited he couldn ’ t contain himself !
worry because he hadn ’ t seen her pick
!" Mor ris smiled and replied , "
a strong lizard named T az . He lived happily among
old . One day , Lay la asked her mum for
and asked : " Why hasn 't the wealthy flower bloom
And so Beat rice had lots of fun
were two friends , Ab ig ail and Mat ty .
but she still hadn ’ t seen a purple
smiled , knowing she had mastered a new way to have
his mom my hadn ’ t ignored him .
so he promised to be Beat rice 's friend . From
do things he couldn ’ t do without it

INTERVAL 3.448 - 6.246
CONTAINS 0.023%

today . She has a fever and a sore throat .
. It bar ks and grow ls . Lily and Ben
good at j ogging . How do you do it ?"
they will make the earth taste delicious !" Bird disagreed ,
his garden and watched it grow each day .

INTERVAL 0.650 - 3.448
CONTAINS 22.783%

. They saw a big bird with many feathers . The
knew how to make real soup . " Can
replied , " We don 't know what that thing is
rainbow was a sign that the day was going to be
. The slide was a lie . It was empty .

INTERVAL -2.148 - 0.650
CONTAINS 76.529%

anything is possible . <|endoftext|> Once upon a time there was
Amy held the shiny star and said , " I wish
Once upon a time , there was a kind king .
He picked up Kitty and lifted her up high . Now
He ran around with his friends and had so much fun

INTERVAL -4.946 - -2.148
CONTAINS 0.665%

ion on Rachel ’ s arms and hands .
decision . He was so pleased with himself and couldn ’
." He showed them how the truck worked and
. You are the best doll ever ." She pulled the
and princess were the most attractive couple there . Everyone stared
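
The four INTERVAL ranges above look like equal-width bins spanning MIN to MAX. The sketch below is an illustration only, assuming that binning scheme; the helper name `equal_width_edges` is hypothetical and not taken from the tool that produced this dump. It recomputes the bin edges from the reported MIN and MAX and checks that the CONTAINS percentages sum to 100.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced bin edges from lo to hi (hypothetical helper)."""
    width = (hi - lo) / n_bins
    # Round to 3 decimals to match the precision used in the dump.
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


# MIN and MAX as reported in the dump.
edges = equal_width_edges(-4.946, 6.246, 4)

# CONTAINS percentages, listed from the lowest interval to the highest.
shares = [0.665, 76.529, 22.783, 0.023]
total_share = sum(shares)

print(edges)
print(total_share)
```

Under this assumption, each bin is (6.246 - (-4.946)) / 4 = 2.798 wide, which reproduces the boundaries -2.148, 0.650 and 3.448 shown above.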
Token and
Token ID290
Feature activation-3.80e-01

Pos logit contributions
rated+0.009
aus+0.008
 gag+0.008
urg+0.008
icious+0.008
Neg logit contributions
aroo-0.003
 nowhere-0.003
aro-0.003
't-0.002
t-0.002
Token the
Token ID262
Feature activation+8.45e-01

Pos logit contributions
aroo+0.006
aro+0.006
 nowhere+0.005
t+0.005
't+0.004
Neg logit contributions
rated-0.012
urg-0.012
icious-0.012
 gag-0.012
aus-0.012
Token lightning
Token ID14357
Feature activation+1.10e+00

Pos logit contributions
aus+0.020
 gag+0.020
urg+0.020
rated+0.020
icious+0.020
Neg logit contributions
aroo-0.021
aro-0.019
 nowhere-0.018
t-0.017
hog-0.017
Token struck
Token ID7425
Feature activation+2.37e-01

Pos logit contributions
aus+0.031
 gag+0.031
urg+0.030
rated+0.030
 decisions+0.030
Neg logit contributions
aroo-0.023
aro-0.021
t-0.020
 nowhere-0.019
hog-0.017
Token the
Token ID262
Feature activation+8.96e-01

Pos logit contributions
rated+0.008
 gag+0.008
aus+0.008
icious+0.008
urg+0.008
Neg logit contributions
aroo-0.004
 nowhere-0.003
aro-0.003
t-0.003
't-0.003
Token hyd
Token ID7409
Feature activation+6.20e+00

Pos logit contributions
rated+0.022
aus+0.021
urg+0.021
 gag+0.021
icious+0.021
Neg logit contributions
aroo-0.022
aro-0.020
 nowhere-0.019
t-0.018
hog-0.018
Tokenrant
Token ID5250
Feature activation-8.57e-01

Pos logit contributions
 gag+0.146
aus+0.144
 french+0.140
 mischief+0.139
 decisions+0.138
Neg logit contributions
t-0.142
aroo-0.142
aro-0.136
hog-0.133
oe-0.126
Token.
Token ID13
Feature activation+5.98e-01

Pos logit contributions
aroo+0.015
aro+0.014
t+0.012
 nowhere+0.012
hog+0.012
Neg logit contributions
aus-0.026
 gag-0.026
icious-0.026
rated-0.026
urg-0.026
Token Suddenly
Token ID24975
Feature activation+6.07e-01

Pos logit contributions
aus+0.017
 gag+0.017
rated+0.017
ksh+0.017
urg+0.017
Neg logit contributions
aroo-0.012
aro-0.011
 nowhere-0.009
t-0.009
hog-0.008
Token water
Token ID1660
Feature activation-3.47e-01

Pos logit contributions
rated+0.019
 gag+0.019
urg+0.019
aus+0.019
icious+0.019
Neg logit contributions
aroo-0.010
aro-0.009
 nowhere-0.008
t-0.008
hog-0.007
Token was
Token ID373
Feature activation+6.19e-01

Pos logit contributions
aroo+0.007
aro+0.007
 nowhere+0.006
t+0.006
hog+0.006
Neg logit contributions
rated-0.009
 gag-0.009
urg-0.009
aus-0.009
 decisions-0.009
Token She
Token ID1375
Feature activation+1.03e+00

Pos logit contributions
aus+0.019
ksh+0.019
 gag+0.019
 burgers+0.018
 french+0.018
Neg logit contributions
aroo-0.016
aro-0.015
 nowhere-0.014
t-0.014
hog-0.012
Token turned
Token ID2900
Feature activation+8.09e-01

Pos logit contributions
rated+0.027
urg+0.026
 gag+0.026
aus+0.026
 decisions+0.026
Neg logit contributions
aroo-0.023
aro-0.022
 nowhere-0.020
t-0.019
hog-0.018
Token on
Token ID319
Feature activation-9.65e-02

Pos logit contributions
rated+0.024
 gag+0.024
urg+0.024
aus+0.023
 decisions+0.023
Neg logit contributions
aroo-0.015
aro-0.014
 nowhere-0.012
t-0.012
hog-0.011
Token the
Token ID262
Feature activation+1.11e+00

Pos logit contributions
aroo+0.002
aro+0.002
t+0.002
 nowhere+0.002
ite+0.001
Neg logit contributions
rated-0.003
urg-0.003
 gag-0.003
 decisions-0.003
aus-0.003
Token f
Token ID277
Feature activation+2.42e+00

Pos logit contributions
rated+0.028
urg+0.027
 gag+0.027
aus+0.027
icious+0.026
Neg logit contributions
aroo-0.027
aro-0.025
 nowhere-0.023
t-0.022
hog-0.021
Tokenauc
Token ID14272
Feature activation+5.89e+00

Pos logit contributions
rated+0.036
 gag+0.035
urg+0.034
aus+0.034
 decisions+0.032
Neg logit contributions
aroo-0.082
aro-0.079
t-0.074
 nowhere-0.073
hog-0.071
Tokenet
Token ID316
Feature activation-1.66e-01

Pos logit contributions
 difference+0.138
 mental+0.137
icious+0.134
 gag+0.133
 Esk+0.130
Neg logit contributions
aroo-0.139
aro-0.136
hog-0.126
t-0.124
uzz-0.114
Token,
Token ID11
Feature activation+2.73e+00

Pos logit contributions
aroo+0.003
aro+0.003
t+0.002
 nowhere+0.002
hog+0.002
Neg logit contributions
rated-0.005
 gag-0.005
urg-0.005
aus-0.005
icious-0.005
Token but
Token ID475
Feature activation-1.57e-01

Pos logit contributions
rated+0.083
aus+0.082
urg+0.081
icious+0.081
 gag+0.080
Neg logit contributions
aroo-0.050
aro-0.046
 nowhere-0.044
t-0.039
hog-0.036
Token the
Token ID262
Feature activation+6.85e-01

Pos logit contributions
aroo+0.002
aro+0.002
t+0.002
 nowhere+0.002
hog+0.002
Neg logit contributions
rated-0.005
 gag-0.005
urg-0.005
aus-0.005
 decisions-0.005
Token water
Token ID1660
Feature activation+1.70e-01

Pos logit contributions
rated+0.016
urg+0.015
aus+0.015
 gag+0.015
icious+0.015
Neg logit contributions
aroo-0.018
aro-0.017
 nowhere-0.015
t-0.015
hog-0.014
Token his
Token ID465
Feature activation+1.22e+00

Pos logit contributions
rated+0.034
urg+0.033
 gag+0.033
aus+0.033
 decisions+0.032
Neg logit contributions
aroo-0.021
aro-0.019
 nowhere-0.017
t-0.017
hog-0.015
Token hands
Token ID2832
Feature activation+7.70e-01

Pos logit contributions
 gag+0.029
rated+0.028
urg+0.028
aus+0.027
 decisions+0.027
Neg logit contributions
aroo-0.030
aro-0.029
t-0.026
 nowhere-0.026
hog-0.024
Token under
Token ID739
Feature activation-6.53e-01

Pos logit contributions
rated+0.021
 colder+0.020
ksh+0.020
 hotter+0.020
 gag+0.020
Neg logit contributions
aroo-0.016
aro-0.015
hog-0.013
 nowhere-0.012
t-0.012
Token the
Token ID262
Feature activation+1.13e+00

Pos logit contributions
aroo+0.016
aro+0.015
t+0.014
hog+0.013
ite+0.013
Neg logit contributions
 gag-0.016
urg-0.016
 burgers-0.015
rated-0.015
 hamb-0.015
Token f
Token ID277
Feature activation+2.49e+00

Pos logit contributions
rated+0.023
aus+0.022
urg+0.022
icious+0.022
 gag+0.021
Neg logit contributions
aroo-0.033
aro-0.031
 nowhere-0.028
t-0.028
hog-0.027
Tokenauc
Token ID14272
Feature activation+5.86e+00

Pos logit contributions
rated+0.038
 gag+0.036
aus+0.036
urg+0.036
 decisions+0.035
Neg logit contributions
aroo-0.083
aro-0.080
t-0.075
 nowhere-0.074
hog-0.072
Tokenet
Token ID316
Feature activation-5.00e-01

Pos logit contributions
icious+0.133
 decisions+0.132
 mental+0.130
 gag+0.130
 difference+0.128
Neg logit contributions
aroo-0.138
aro-0.135
hog-0.131
t-0.120
uzz-0.113
Token.
Token ID13
Feature activation+7.78e-01

Pos logit contributions
aroo+0.009
aro+0.008
 nowhere+0.008
t+0.007
hog+0.007
Neg logit contributions
rated-0.015
urg-0.015
 gag-0.015
aus-0.015
 decisions-0.014
Token He
Token ID679
Feature activation+4.38e-01

Pos logit contributions
ksh+0.019
aus+0.019
rated+0.019
urg+0.019
 gag+0.019
Neg logit contributions
aroo-0.019
aro-0.018
 nowhere-0.016
t-0.015
ite-0.015
Token watched
Token ID7342
Feature activation+9.22e-01

Pos logit contributions
rated+0.012
icious+0.012
 decisions+0.012
urg+0.012
aus+0.012
Neg logit contributions
aroo-0.009
aro-0.009
 nowhere-0.008
t-0.008
hog-0.007
Token as
Token ID355
Feature activation+2.22e+00

Pos logit contributions
rated+0.030
 gag+0.030
aus+0.029
urg+0.029
 decisions+0.029
Neg logit contributions
aroo-0.015
aro-0.014
 nowhere-0.012
t-0.012
hog-0.010
Token and
Token ID290
Feature activation-4.54e-01

Pos logit contributions
rated+0.021
urg+0.020
 gag+0.020
aus+0.020
 burgers+0.020
Neg logit contributions
aroo-0.014
aro-0.014
 nowhere-0.012
t-0.011
't-0.010
Token turned
Token ID2900
Feature activation+1.23e+00

Pos logit contributions
aroo+0.010
aro+0.010
 nowhere+0.009
t+0.008
hog+0.008
Neg logit contributions
rated-0.012
urg-0.012
aus-0.012
icious-0.011
 gag-0.011
Token on
Token ID319
Feature activation+2.50e-01

Pos logit contributions
rated+0.037
 gag+0.037
urg+0.037
aus+0.036
 decisions+0.036
Neg logit contributions
aroo-0.022
aro-0.020
 nowhere-0.018
t-0.017
hog-0.016
Token the
Token ID262
Feature activation+1.11e+00

Pos logit contributions
 gag+0.007
urg+0.007
rated+0.007
aus+0.007
 decisions+0.007
Neg logit contributions
aroo-0.005
aro-0.005
t-0.004
 nowhere-0.004
ite-0.004
Token f
Token ID277
Feature activation+2.56e+00

Pos logit contributions
rated+0.025
urg+0.025
aus+0.024
 gag+0.024
icious+0.023
Neg logit contributions
aroo-0.029
aro-0.027
 nowhere-0.026
t-0.025
hog-0.024
Tokenauc
Token ID14272
Feature activation+5.84e+00

Pos logit contributions
rated+0.039
 gag+0.038
urg+0.037
aus+0.037
 decisions+0.036
Neg logit contributions
aroo-0.085
aro-0.082
t-0.077
 nowhere-0.076
hog-0.074
Tokenet
Token ID316
Feature activation-4.06e-01

Pos logit contributions
icious+0.140
 mental+0.136
 decisions+0.134
urg+0.134
 difference+0.132
Neg logit contributions
aroo-0.131
aro-0.125
hog-0.124
t-0.115
oe-0.107
Token.
Token ID13
Feature activation+7.28e-01

Pos logit contributions
aroo+0.007
aro+0.006
 nowhere+0.006
t+0.006
hog+0.005
Neg logit contributions
rated-0.013
urg-0.012
 gag-0.012
aus-0.012
icious-0.012
Token\n
Token ID198
Feature activation+8.06e-01

Pos logit contributions
 gag+0.017
aus+0.017
urg+0.016
 microwave+0.016
 hotter+0.016
Neg logit contributions
aroo-0.019
aro-0.018
 nowhere-0.015
ite-0.015
t-0.014
Token\n
Token ID198
Feature activation-2.67e-01

Pos logit contributions
 mental+0.023
iver+0.023
aus+0.023
urg+0.022
rated+0.022
Neg logit contributions
aroo-0.015
aro-0.014
 nowhere-0.012
hog-0.011
ite-0.011
TokenL
Token ID43
Feature activation-4.84e-01

Pos logit contributions
aroo+0.006
aro+0.006
 nowhere+0.005
hog+0.004
uttered+0.004
Neg logit contributions
have-0.007
aus-0.007
icious-0.007
iver-0.007
rated-0.007
Token and
Token ID290
Feature activation+1.08e-01

Pos logit contributions
rated+0.033
 gag+0.033
urg+0.033
aus+0.033
icious+0.032
Neg logit contributions
aroo-0.014
aro-0.012
t-0.011
 nowhere-0.011
hog-0.010
Token turned
Token ID2900
Feature activation+1.30e+00

Pos logit contributions
rated+0.003
icious+0.003
urg+0.003
aus+0.003
 decisions+0.003
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.002
t-0.002
 she-0.002
Token on
Token ID319
Feature activation-8.73e-02

Pos logit contributions
rated+0.045
urg+0.045
aus+0.045
icious+0.044
 gag+0.044
Neg logit contributions
aroo-0.018
aro-0.016
 nowhere-0.014
t-0.014
't-0.011
Token the
Token ID262
Feature activation+1.30e+00

Pos logit contributions
aroo+0.001
aro+0.001
 nowhere+0.001
t+0.001
hog+0.001
Neg logit contributions
rated-0.003
urg-0.003
aus-0.003
 gag-0.003
icious-0.003
Token f
Token ID277
Feature activation+2.74e+00

Pos logit contributions
rated+0.032
urg+0.031
aus+0.031
 gag+0.030
icious+0.030
Neg logit contributions
aroo-0.031
aro-0.029
 nowhere-0.028
t-0.027
hog-0.025
Tokenauc
Token ID14272
Feature activation+5.82e+00

Pos logit contributions
rated+0.049
 gag+0.047
 difference+0.047
aus+0.046
 decisions+0.046
Neg logit contributions
aroo-0.082
aro-0.080
t-0.078
 nowhere-0.071
hog-0.071
Tokenet
Token ID316
Feature activation-3.81e-01

Pos logit contributions
icious+0.145
 mental+0.143
 difference+0.141
 decisions+0.140
 alright+0.139
Neg logit contributions
aroo-0.121
aro-0.115
hog-0.108
t-0.107
oe-0.101
Token.
Token ID13
Feature activation+8.22e-01

Pos logit contributions
aroo+0.006
aro+0.005
t+0.005
 nowhere+0.004
hog+0.004
Neg logit contributions
rated-0.013
 gag-0.013
icious-0.013
aus-0.012
urg-0.012
Token Soon
Token ID15894
Feature activation+1.48e-01

Pos logit contributions
 gag+0.022
urg+0.022
rated+0.022
ksh+0.022
aus+0.022
Neg logit contributions
aroo-0.018
aro-0.017
 nowhere-0.015
t-0.014
hog-0.014
Token,
Token ID11
Feature activation+5.55e-01

Pos logit contributions
rated+0.005
 gag+0.005
urg+0.005
aus+0.005
icious+0.005
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.002
t-0.002
hog-0.001
Token the
Token ID262
Feature activation+1.38e+00

Pos logit contributions
rated+0.020
urg+0.020
 gag+0.020
aus+0.020
icious+0.019
Neg logit contributions
aroo-0.007
aro-0.006
 nowhere-0.005
t-0.005
hog-0.004
Token to
Token ID284
Feature activation+5.83e-01

Pos logit contributions
aroo+0.005
aro+0.005
 nowhere+0.004
ite+0.004
hog+0.004
Neg logit contributions
urg-0.011
 decisions-0.011
 gag-0.010
aus-0.010
 burgers-0.010
Token turn
Token ID1210
Feature activation-2.31e-02

Pos logit contributions
rated+0.014
urg+0.013
ksh+0.013
aus+0.013
 gag+0.013
Neg logit contributions
aroo-0.014
aro-0.013
t-0.013
 nowhere-0.012
't-0.012
Token off
Token ID572
Feature activation+1.38e-01

Pos logit contributions
aroo+0.000
aro+0.000
 nowhere+0.000
t+0.000
hog+0.000
Neg logit contributions
rated-0.001
urg-0.001
 gag-0.001
aus-0.001
icious-0.001
Token the
Token ID262
Feature activation+1.01e+00

Pos logit contributions
rated+0.004
urg+0.004
 gag+0.004
aus+0.004
 decisions+0.004
Neg logit contributions
aroo-0.003
aro-0.002
 nowhere-0.002
t-0.002
hog-0.002
Token f
Token ID277
Feature activation+2.21e+00

Pos logit contributions
rated+0.024
urg+0.023
aus+0.023
 gag+0.023
icious+0.022
Neg logit contributions
aroo-0.026
aro-0.024
 nowhere-0.023
t-0.023
hog-0.021
Tokenauc
Token ID14272
Feature activation+5.78e+00

Pos logit contributions
rated+0.034
 gag+0.032
urg+0.031
aus+0.031
 decisions+0.031
Neg logit contributions
aroo-0.074
aro-0.072
t-0.067
 nowhere-0.066
hog-0.064
Tokenet
Token ID316
Feature activation-7.44e-01

Pos logit contributions
 gag+0.132
 mental+0.130
icious+0.129
 decisions+0.129
urg+0.126
Neg logit contributions
aroo-0.134
aro-0.131
hog-0.121
t-0.120
oe-0.108
Token when
Token ID618
Feature activation+1.32e+00

Pos logit contributions
aroo+0.013
aro+0.012
 nowhere+0.011
t+0.010
hog+0.009
Neg logit contributions
rated-0.023
urg-0.023
 gag-0.023
aus-0.023
 decisions-0.022
Token he
Token ID339
Feature activation-6.65e-02

Pos logit contributions
rated+0.043
urg+0.043
 gag+0.043
aus+0.043
 decisions+0.042
Neg logit contributions
aroo-0.020
aro-0.018
 nowhere-0.015
t-0.015
hog-0.014
Token was
Token ID373
Feature activation-2.19e-01

Pos logit contributions
aroo+0.001
aro+0.001
t+0.001
 nowhere+0.001
't+0.001
Neg logit contributions
rated-0.002
 decisions-0.002
urg-0.002
icious-0.002
 gag-0.002
Token done
Token ID1760
Feature activation+3.77e-01

Pos logit contributions
aroo+0.005
 nowhere+0.005
aro+0.004
t+0.004
't+0.004
Neg logit contributions
rated-0.006
aus-0.006
 decisions-0.006
ksh-0.006
urg-0.006
Token and
Token ID290
Feature activation-1.36e+00

Pos logit contributions
rated+0.021
urg+0.020
 gag+0.020
aus+0.020
icious+0.020
Neg logit contributions
aroo-0.010
aro-0.009
 nowhere-0.009
t-0.008
hog-0.007
Token turned
Token ID2900
Feature activation+7.16e-01

Pos logit contributions
aroo+0.036
aro+0.034
t+0.030
 nowhere+0.030
hog+0.029
Neg logit contributions
 gag-0.030
rated-0.030
aus-0.029
urg-0.029
 hamb-0.029
Token on
Token ID319
Feature activation-5.19e-01

Pos logit contributions
 gag+0.020
rated+0.020
urg+0.020
aus+0.020
 hotter+0.020
Neg logit contributions
aroo-0.014
aro-0.012
 nowhere-0.011
t-0.011
hog-0.010
Token the
Token ID262
Feature activation+1.03e+00

Pos logit contributions
aroo+0.012
aro+0.010
t+0.009
 nowhere+0.009
ite+0.009
Neg logit contributions
urg-0.014
 gag-0.014
 burgers-0.013
aus-0.013
 decisions-0.013
Token f
Token ID277
Feature activation+2.39e+00

Pos logit contributions
rated+0.024
urg+0.024
aus+0.024
 gag+0.024
icious+0.023
Neg logit contributions
aroo-0.025
aro-0.024
 nowhere-0.023
t-0.022
hog-0.021
Tokenauc
Token ID14272
Feature activation+5.75e+00

Pos logit contributions
rated+0.037
 gag+0.035
urg+0.035
aus+0.034
 decisions+0.033
Neg logit contributions
aroo-0.079
aro-0.076
t-0.072
 nowhere-0.071
hog-0.069
Tokenet
Token ID316
Feature activation-3.67e-01

Pos logit contributions
 mental+0.139
icious+0.138
 gag+0.137
 difference+0.137
 decisions+0.135
Neg logit contributions
aroo-0.122
aro-0.118
hog-0.117
t-0.111
oe-0.101
Token.
Token ID13
Feature activation+2.13e-01

Pos logit contributions
aroo+0.006
aro+0.005
t+0.005
 nowhere+0.005
hog+0.004
Neg logit contributions
rated-0.012
 gag-0.012
urg-0.012
aus-0.012
 decisions-0.012
Token The
Token ID383
Feature activation+1.03e+00

Pos logit contributions
ksh+0.005
 gag+0.005
aus+0.005
 microwave+0.005
 mental+0.005
Neg logit contributions
aroo-0.005
aro-0.005
 nowhere-0.004
ite-0.004
t-0.004
Token water
Token ID1660
Feature activation+2.39e-01

Pos logit contributions
rated+0.023
urg+0.023
aus+0.023
 gag+0.022
icious+0.022
Neg logit contributions
aroo-0.027
aro-0.026
 nowhere-0.024
t-0.023
hog-0.022
Token was
Token ID373
Feature activation+1.02e+00

Pos logit contributions
rated+0.007
urg+0.007
 gag+0.007
aus+0.007
icious+0.006
Neg logit contributions
aroo-0.005
aro-0.005
 nowhere-0.004
t-0.004
hog-0.004
Token She
Token ID1375
Feature activation+6.82e-01

Pos logit contributions
 gag+0.024
 difference+0.024
 insists+0.024
 mental+0.024
 swe+0.023
Neg logit contributions
aroo-0.039
aro-0.038
 nowhere-0.030
t-0.029
worm-0.028
Token turned
Token ID2900
Feature activation+8.85e-01

Pos logit contributions
rated+0.017
urg+0.017
 gag+0.017
aus+0.017
icious+0.016
Neg logit contributions
aroo-0.016
aro-0.015
 nowhere-0.013
t-0.013
hog-0.012
Token off
Token ID572
Feature activation+5.68e-01

Pos logit contributions
rated+0.027
urg+0.027
 gag+0.027
aus+0.027
 decisions+0.026
Neg logit contributions
aroo-0.016
aro-0.014
 nowhere-0.013
t-0.012
hog-0.011
Token the
Token ID262
Feature activation+1.42e+00

Pos logit contributions
rated+0.016
urg+0.016
 gag+0.016
aus+0.016
 decisions+0.015
Neg logit contributions
aroo-0.012
aro-0.011
 nowhere-0.009
t-0.009
hog-0.009
Token f
Token ID277
Feature activation+2.44e+00

Pos logit contributions
rated+0.035
urg+0.034
 gag+0.034
aus+0.034
icious+0.032
Neg logit contributions
aroo-0.034
aro-0.032
 nowhere-0.030
t-0.029
hog-0.027
Tokenauc
Token ID14272
Feature activation+5.71e+00

Pos logit contributions
rated+0.034
urg+0.032
 gag+0.032
aus+0.032
 decisions+0.030
Neg logit contributions
aroo-0.086
aro-0.082
 nowhere-0.078
t-0.076
hog-0.073
Tokenet
Token ID316
Feature activation-3.12e-01

Pos logit contributions
urg+0.125
 gag+0.125
 mental+0.124
icious+0.121
aus+0.119
Neg logit contributions
aroo-0.143
aro-0.133
hog-0.129
t-0.124
oe-0.116
Token and
Token ID290
Feature activation+3.82e-01

Pos logit contributions
aroo+0.006
aro+0.005
 nowhere+0.005
t+0.004
hog+0.004
Neg logit contributions
rated-0.009
urg-0.009
 gag-0.009
aus-0.009
 decisions-0.009
Token dried
Token ID16577
Feature activation+6.32e-01

Pos logit contributions
rated+0.010
urg+0.010
aus+0.010
 gag+0.010
icious+0.010
Neg logit contributions
aroo-0.008
aro-0.008
 nowhere-0.007
t-0.007
hog-0.006
Token herself
Token ID5223
Feature activation-6.42e-01

Pos logit contributions
rated+0.018
aus+0.018
 gag+0.018
icious+0.018
urg+0.018
Neg logit contributions
aroo-0.012
aro-0.012
 nowhere-0.010
t-0.010
hog-0.009
Token.
Token ID13
Feature activation+1.00e+00

Pos logit contributions
aroo+0.010
aro+0.009
 nowhere+0.008
t+0.007
hog+0.006
Neg logit contributions
rated-0.022
aus-0.021
urg-0.021
 gag-0.021
icious-0.021
Token to
Token ID284
Feature activation+8.38e-01

Pos logit contributions
 gag+0.004
 decisions+0.004
urg+0.004
rated+0.004
 burgers+0.004
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.002
ite-0.002
hog-0.002
Token turn
Token ID1210
Feature activation-1.88e-01

Pos logit contributions
rated+0.019
ksh+0.019
 degrees+0.018
 gag+0.018
urg+0.018
Neg logit contributions
aroo-0.022
t-0.020
aro-0.020
't-0.020
 nowhere-0.019
Token off
Token ID572
Feature activation+3.62e-01

Pos logit contributions
aroo+0.004
aro+0.003
 nowhere+0.003
t+0.003
hog+0.003
Neg logit contributions
rated-0.006
urg-0.005
 gag-0.005
aus-0.005
 decisions-0.005
Token the
Token ID262
Feature activation+7.28e-01

Pos logit contributions
rated+0.011
urg+0.011
 gag+0.011
aus+0.011
 decisions+0.010
Neg logit contributions
aroo-0.007
aro-0.006
 nowhere-0.006
t-0.005
hog-0.005
Token f
Token ID277
Feature activation+2.29e+00

Pos logit contributions
rated+0.016
urg+0.016
 gag+0.016
aus+0.015
 decisions+0.015
Neg logit contributions
aroo-0.019
aro-0.018
 nowhere-0.017
t-0.017
hog-0.016
Tokenauc
Token ID14272
Feature activation+5.71e+00

Pos logit contributions
rated+0.032
 gag+0.031
urg+0.031
aus+0.030
 decisions+0.029
Neg logit contributions
aroo-0.079
aro-0.076
 nowhere-0.071
t-0.071
hog-0.068
Tokenet
Token ID316
Feature activation-6.05e-01

Pos logit contributions
icious+0.126
 gag+0.126
 mental+0.126
 decisions+0.125
 difference+0.123
Neg logit contributions
aroo-0.137
aro-0.133
hog-0.123
t-0.123
oe-0.110
Token when
Token ID618
Feature activation+1.49e+00

Pos logit contributions
aroo+0.011
aro+0.010
t+0.008
 nowhere+0.008
hog+0.007
Neg logit contributions
rated-0.019
 gag-0.019
urg-0.019
aus-0.018
icious-0.018
Token she
Token ID673
Feature activation-3.27e-01

Pos logit contributions
 gag+0.050
urg+0.050
rated+0.049
 decisions+0.049
aus+0.049
Neg logit contributions
aroo-0.022
aro-0.019
hog-0.016
 nowhere-0.015
t-0.015
Token was
Token ID373
Feature activation-3.24e-01

Pos logit contributions
aroo+0.006
aro+0.006
 nowhere+0.005
t+0.005
hog+0.005
Neg logit contributions
rated-0.010
urg-0.010
 gag-0.009
aus-0.009
 decisions-0.009
Token done
Token ID1760
Feature activation+5.30e-01

Pos logit contributions
aroo+0.007
 nowhere+0.007
aro+0.006
t+0.006
't+0.006
Neg logit contributions
rated-0.009
aus-0.009
ksh-0.009
 decisions-0.009
 difference-0.009
Token She
Token ID1375
Feature activation-2.78e-01

Pos logit contributions
aroo+0.005
aro+0.005
 nowhere+0.004
hog+0.003
ite+0.003
Neg logit contributions
rated-0.004
 gag-0.004
aus-0.004
 frying-0.004
iver-0.004
Token turned
Token ID2900
Feature activation+6.55e-01

Pos logit contributions
aroo+0.005
aro+0.005
 nowhere+0.005
t+0.005
hog+0.004
Neg logit contributions
 decisions-0.008
rated-0.008
urg-0.008
icious-0.008
 difference-0.008
Token on
Token ID319
Feature activation-6.00e-01

Pos logit contributions
rated+0.022
urg+0.021
 gag+0.021
aus+0.021
icious+0.021
Neg logit contributions
aroo-0.010
aro-0.009
 nowhere-0.009
t-0.008
't-0.007
Token the
Token ID262
Feature activation+7.17e-01

Pos logit contributions
aroo+0.012
aro+0.011
t+0.010
 nowhere+0.010
hog+0.009
Neg logit contributions
rated-0.017
urg-0.017
 gag-0.017
aus-0.017
 decisions-0.016
Token f
Token ID277
Feature activation+2.50e+00

Pos logit contributions
rated+0.016
urg+0.016
aus+0.016
icious+0.015
 gag+0.015
Neg logit contributions
aroo-0.019
aro-0.018
t-0.017
 nowhere-0.016
't-0.015
Tokenauc
Token ID14272
Feature activation+5.71e+00

Pos logit contributions
rated+0.038
 gag+0.036
urg+0.036
aus+0.036
 decisions+0.035
Neg logit contributions
aroo-0.083
aro-0.080
t-0.076
 nowhere-0.075
hog-0.072
Tokenet
Token ID316
Feature activation-2.12e-01

Pos logit contributions
 difference+0.133
icious+0.132
 mental+0.131
 decisions+0.129
 gag+0.128
Neg logit contributions
aroo-0.130
aro-0.124
hog-0.118
t-0.117
haired-0.106
Token and
Token ID290
Feature activation-2.07e-01

Pos logit contributions
aroo+0.003
aro+0.003
t+0.003
 nowhere+0.002
't+0.002
Neg logit contributions
rated-0.007
 gag-0.007
urg-0.007
icious-0.007
 difference-0.007
Token icy
Token ID30284
Feature activation+1.92e+00

Pos logit contributions
aroo+0.004
aro+0.004
 nowhere+0.004
t+0.003
hog+0.003
Neg logit contributions
rated-0.006
urg-0.006
 decisions-0.006
 gag-0.006
icious-0.006
Token water
Token ID1660
Feature activation+3.98e-01

Pos logit contributions
icious+0.053
aus+0.053
rated+0.051
 mental+0.050
 alright+0.049
Neg logit contributions
aroo-0.041
aro-0.039
t-0.036
 nowhere-0.033
 she-0.032
Token came
Token ID1625
Feature activation+1.54e+00

Pos logit contributions
rated+0.012
aus+0.012
icious+0.012
 decisions+0.012
urg+0.012
Neg logit contributions
aroo-0.007
aro-0.006
t-0.006
 nowhere-0.006
't-0.005
Token and
Token ID290
Feature activation-5.00e-01

Pos logit contributions
rated+0.034
urg+0.034
 gag+0.033
aus+0.033
 decisions+0.033
Neg logit contributions
aroo-0.019
aro-0.017
t-0.014
 nowhere-0.014
hog-0.013
Token turned
Token ID2900
Feature activation+8.84e-01

Pos logit contributions
aroo+0.012
aro+0.011
 nowhere+0.010
t+0.010
hog+0.009
Neg logit contributions
rated-0.012
aus-0.012
urg-0.012
 gag-0.012
ksh-0.012
Token off
Token ID572
Feature activation+6.60e-01

Pos logit contributions
rated+0.029
 gag+0.028
urg+0.028
aus+0.028
 decisions+0.027
Neg logit contributions
aroo-0.014
aro-0.013
 nowhere-0.011
t-0.011
hog-0.010
Token the
Token ID262
Feature activation+1.31e+00

Pos logit contributions
rated+0.022
urg+0.022
 gag+0.021
aus+0.021
icious+0.021
Neg logit contributions
aroo-0.010
aro-0.009
 nowhere-0.009
t-0.008
hog-0.007
Token f
Token ID277
Feature activation+2.34e+00

Pos logit contributions
rated+0.032
aus+0.032
icious+0.031
urg+0.031
 gag+0.031
Neg logit contributions
aroo-0.031
aro-0.030
t-0.028
 nowhere-0.027
't-0.026
Tokenauc
Token ID14272
Feature activation+5.69e+00

Pos logit contributions
rated+0.038
 gag+0.037
aus+0.036
urg+0.035
 decisions+0.035
Neg logit contributions
aroo-0.076
aro-0.074
t-0.069
 nowhere-0.066
hog-0.066
Tokenet
Token ID316
Feature activation-3.07e-01

Pos logit contributions
icious+0.132
 gag+0.129
 mental+0.127
 mischief+0.125
 difference+0.124
Neg logit contributions
aroo-0.132
aro-0.127
hog-0.122
t-0.117
oe-0.108
Token for
Token ID329
Feature activation+1.41e+00

Pos logit contributions
aroo+0.005
aro+0.005
t+0.004
 nowhere+0.004
hog+0.004
Neg logit contributions
rated-0.010
 gag-0.010
urg-0.010
aus-0.010
icious-0.009
Token her
Token ID607
Feature activation-5.97e-01

Pos logit contributions
rated+0.047
 gag+0.045
aus+0.044
urg+0.044
ksh+0.044
Neg logit contributions
aroo-0.022
aro-0.020
 nowhere-0.019
t-0.017
't-0.016
Token.
Token ID13
Feature activation+1.22e+00

Pos logit contributions
aroo+0.011
aro+0.010
 nowhere+0.009
t+0.008
hog+0.008
Neg logit contributions
rated-0.018
urg-0.018
 gag-0.018
aus-0.018
icious-0.018
Token Lily
Token ID20037
Feature activation-4.61e-01

Pos logit contributions
rated+0.040
 gag+0.039
urg+0.039
aus+0.039
 decisions+0.038
Neg logit contributions
aroo-0.020
aro-0.018
 nowhere-0.015
t-0.015
hog-0.013
Token him
Token ID683
Feature activation+1.80e-01

Pos logit contributions
rated+0.024
 gag+0.022
urg+0.022
aus+0.022
ksh+0.022
Neg logit contributions
aroo-0.017
aro-0.016
 nowhere-0.015
t-0.014
't-0.013
Token and
Token ID290
Feature activation+2.99e-01

Pos logit contributions
rated+0.006
urg+0.006
 gag+0.006
aus+0.006
icious+0.006
Neg logit contributions
aroo-0.003
aro-0.003
 nowhere-0.002
t-0.002
hog-0.002
Token checked
Token ID10667
Feature activation+1.35e+00

Pos logit contributions
rated+0.009
urg+0.009
 gag+0.009
aus+0.009
icious+0.009
Neg logit contributions
aroo-0.006
aro-0.005
 nowhere-0.005
t-0.004
hog-0.004
Token the
Token ID262
Feature activation+4.87e-01

Pos logit contributions
rated+0.034
urg+0.033
aus+0.033
 decisions+0.033
 burgers+0.033
Neg logit contributions
aroo-0.030
aro-0.028
 nowhere-0.026
t-0.026
hog-0.025
Token f
Token ID277
Feature activation+2.24e+00

Pos logit contributions
rated+0.012
urg+0.012
 gag+0.012
aus+0.012
icious+0.011
Neg logit contributions
aroo-0.011
aro-0.011
 nowhere-0.010
t-0.010
hog-0.009
Tokenauc
Token ID14272
Feature activation+5.67e+00

Pos logit contributions
rated+0.034
 gag+0.034
urg+0.033
aus+0.033
 decisions+0.032
Neg logit contributions
aroo-0.074
aro-0.072
t-0.067
 nowhere-0.066
hog-0.064
Tokenet
Token ID316
Feature activation-6.88e-01

Pos logit contributions
 mental+0.133
 gag+0.131
urg+0.129
icious+0.128
 difference+0.128
Neg logit contributions
aroo-0.130
aro-0.127
t-0.120
hog-0.117
oe-0.111
Token before
Token ID878
Feature activation+1.02e+00

Pos logit contributions
aroo+0.012
aro+0.011
 nowhere+0.010
t+0.010
hog+0.009
Neg logit contributions
rated-0.021
 gag-0.021
urg-0.021
aus-0.021
icious-0.020
Token leaving
Token ID4305
Feature activation+8.89e-01

Pos logit contributions
rated+0.033
urg+0.033
 gag+0.033
aus+0.033
 decisions+0.032
Neg logit contributions
aroo-0.016
aro-0.014
 nowhere-0.013
t-0.012
hog-0.011
Token him
Token ID683
Feature activation+2.83e-01

Pos logit contributions
rated+0.028
 gag+0.028
urg+0.028
aus+0.028
icious+0.027
Neg logit contributions
aroo-0.015
aro-0.014
 nowhere-0.013
t-0.011
hog-0.010
Token alone
Token ID3436
Feature activation-1.49e-01

Pos logit contributions
rated+0.009
aus+0.009
urg+0.009
 gag+0.009
icious+0.009
Neg logit contributions
aroo-0.005
aro-0.004
 nowhere-0.004
t-0.004
hog-0.003
Token shooting
Token ID4395
Feature activation+3.28e-01

Pos logit contributions
aroo+0.014
aro+0.013
 nowhere+0.011
t+0.011
hog+0.009
Neg logit contributions
rated-0.034
 gag-0.033
aus-0.033
urg-0.033
icious-0.032
Token out
Token ID503
Feature activation+2.38e-01

Pos logit contributions
rated+0.010
 gag+0.010
aus+0.010
urg+0.010
icious+0.010
Neg logit contributions
aroo-0.006
aro-0.005
 nowhere-0.005
t-0.004
hog-0.004
Token of
Token ID286
Feature activation+3.25e-02

Pos logit contributions
rated+0.008
urg+0.008
 gag+0.007
aus+0.007
 decisions+0.007
Neg logit contributions
aroo-0.004
aro-0.004
 nowhere-0.003
t-0.003
hog-0.003
Token the
Token ID262
Feature activation+6.63e-01

Pos logit contributions
urg+0.001
 gag+0.001
 decisions+0.001
rated+0.001
 difference+0.001
Neg logit contributions
aroo-0.001
aro-0.001
t-0.001
ite-0.001
hog-0.001
Token f
Token ID277
Feature activation+2.22e+00

Pos logit contributions
rated+0.016
urg+0.016
 gag+0.016
aus+0.016
icious+0.016
Neg logit contributions
aroo-0.016
aro-0.015
 nowhere-0.014
t-0.014
hog-0.013
Tokenauc
Token ID14272
Feature activation+5.66e+00

Pos logit contributions
rated+0.031
 gag+0.030
urg+0.029
aus+0.029
 decisions+0.027
Neg logit contributions
aroo-0.077
aro-0.074
 nowhere-0.070
t-0.069
hog-0.066
Tokenet
Token ID316
Feature activation-9.11e-01

Pos logit contributions
 gag+0.129
 mental+0.125
icious+0.124
urg+0.120
 decisions+0.120
Neg logit contributions
aroo-0.142
aro-0.140
hog-0.134
t-0.120
 nowhere-0.115
Token!
Token ID0
Feature activation+1.45e-01

Pos logit contributions
aroo+0.018
aro+0.016
 nowhere+0.015
t+0.014
hog+0.013
Neg logit contributions
rated-0.026
urg-0.026
aus-0.025
 gag-0.025
 decisions-0.025
Token
Token ID198
Feature activation+3.48e-01

Pos logit contributions
ksh+0.004
aus+0.004
 mental+0.004
 gag+0.004
 mischief+0.004
Neg logit contributions
aroo-0.003
aro-0.003
 nowhere-0.003
t-0.002
ite-0.002
Token
Token ID198
Feature activation-6.89e-01

Pos logit contributions
 mental+0.010
iver+0.010
aus+0.010
 alright+0.010
 gag+0.009
Neg logit contributions
aroo-0.006
aro-0.006
 nowhere-0.005
ite-0.005
hog-0.005
TokenThe
Token ID464
Feature activation+9.91e-01

Pos logit contributions
aroo+0.014
aro+0.014
 nowhere+0.013
t+0.011
ite+0.011
Neg logit contributions
icious-0.019
iver-0.018
aus-0.018
have-0.018
 gag-0.018
Token.
Token ID13
Feature activation+3.92e-01

Pos logit contributions
rated+0.031
aus+0.030
urg+0.030
icious+0.030
 gag+0.030
Neg logit contributions
aroo-0.014
aro-0.013
t-0.012
 nowhere-0.012
hog-0.011
Token They
Token ID1119
Feature activation-9.87e-01

Pos logit contributions
 gag+0.011
urg+0.010
rated+0.010
aus+0.010
ksh+0.010
Neg logit contributions
aroo-0.009
aro-0.008
 nowhere-0.007
t-0.007
hog-0.007
Token opened
Token ID4721
Feature activation-1.18e-01

Pos logit contributions
aroo+0.023
aro+0.022
 nowhere+0.020
t+0.019
hog+0.018
Neg logit contributions
 gag-0.024
rated-0.024
aus-0.024
urg-0.024
 hamb-0.023
Token the
Token ID262
Feature activation+1.92e-02

Pos logit contributions
aroo+0.003
aro+0.003
t+0.002
hog+0.002
 nowhere+0.002
Neg logit contributions
urg-0.003
 gag-0.003
 hotter-0.003
 burgers-0.003
 decisions-0.003
Token f
Token ID277
Feature activation+2.71e+00

Pos logit contributions
rated+0.001
urg+0.001
aus+0.001
 gag+0.001
iver+0.001
Neg logit contributions
aroo-0.000
aro-0.000
 nowhere-0.000
t-0.000
hog-0.000
Tokenauc
Token ID14272
Feature activation+5.66e+00

Pos logit contributions
rated+0.043
 gag+0.042
urg+0.041
aus+0.041
 decisions+0.039
Neg logit contributions
aroo-0.089
aro-0.085
 nowhere-0.080
t-0.080
hog-0.076
Tokenet
Token ID316
Feature activation-4.63e-01

Pos logit contributions
icious+0.129
 gag+0.127
urg+0.126
 mental+0.126
 decisions+0.124
Neg logit contributions
aroo-0.138
aro-0.130
hog-0.128
t-0.112
 nowhere-0.109
Token and
Token ID290
Feature activation-2.50e-01

Pos logit contributions
aroo+0.008
aro+0.007
 nowhere+0.006
t+0.006
hog+0.006
Neg logit contributions
rated-0.015
urg-0.014
 gag-0.014
aus-0.014
 decisions-0.014
Token put
Token ID1234
Feature activation+9.85e-01

Pos logit contributions
aroo+0.006
aro+0.006
 nowhere+0.005
t+0.005
hog+0.005
Neg logit contributions
rated-0.006
 gag-0.006
urg-0.006
aus-0.006
 hamb-0.006
Token their
Token ID511
Feature activation+7.12e-01

Pos logit contributions
rated+0.026
urg+0.026
 gag+0.026
aus+0.026
 decisions+0.025
Neg logit contributions
aroo-0.021
aro-0.019
 nowhere-0.018
t-0.017
hog-0.016
Token letters
Token ID7475
Feature activation+3.64e-01

Pos logit contributions
rated+0.015
urg+0.015
 gag+0.015
aus+0.014
 decisions+0.014
Neg logit contributions
aroo-0.020
aro-0.018
 nowhere-0.017
t-0.017
hog-0.016
Token to
Token ID284
Feature activation+6.53e-01

Pos logit contributions
aroo+0.031
aro+0.028
t+0.025
hog+0.025
ite+0.024
Neg logit contributions
urg-0.039
 decisions-0.038
 gag-0.038
 alright-0.038
 burgers-0.038
Token turn
Token ID1210
Feature activation-3.35e-01

Pos logit contributions
rated+0.015
ksh+0.015
urg+0.015
 gag+0.014
aus+0.014
Neg logit contributions
aroo-0.016
aro-0.015
 nowhere-0.015
t-0.014
't-0.014
Token off
Token ID572
Feature activation+1.51e-01

Pos logit contributions
aroo+0.007
aro+0.006
 nowhere+0.005
t+0.005
hog+0.005
Neg logit contributions
rated-0.010
urg-0.009
 gag-0.009
aus-0.009
 decisions-0.009
Token the
Token ID262
Feature activation+1.02e+00

Pos logit contributions
rated+0.004
urg+0.004
 gag+0.004
aus+0.004
 decisions+0.004
Neg logit contributions
aroo-0.003
aro-0.003
 nowhere-0.002
t-0.002
hog-0.002
Token f
Token ID277
Feature activation+2.29e+00

Pos logit contributions
rated+0.023
 gag+0.022
 difference+0.021
 decisions+0.021
urg+0.021
Neg logit contributions
aroo-0.028
aro-0.026
 nowhere-0.023
t-0.022
hog-0.022
Tokenauc
Token ID14272
Feature activation+5.62e+00

Pos logit contributions
rated+0.035
urg+0.034
 gag+0.034
aus+0.034
 decisions+0.032
Neg logit contributions
aroo-0.076
aro-0.073
 nowhere-0.069
t-0.067
hog-0.065
Tokenet
Token ID316
Feature activation-1.07e+00

Pos logit contributions
 gag+0.135
urg+0.126
 mental+0.125
icious+0.123
 decisions+0.121
Neg logit contributions
aroo-0.129
aro-0.124
hog-0.123
t-0.120
oe-0.110
Token after
Token ID706
Feature activation-1.93e-01

Pos logit contributions
aroo+0.020
aro+0.019
 nowhere+0.017
t+0.015
hog+0.015
Neg logit contributions
rated-0.031
urg-0.031
aus-0.031
 gag-0.030
 decisions-0.030
Token washing
Token ID20518
Feature activation+8.32e-01

Pos logit contributions
aroo+0.004
aro+0.004
hog+0.003
 nowhere+0.003
t+0.003
Neg logit contributions
urg-0.006
 gag-0.005
rated-0.005
 decisions-0.005
aus-0.005
Token her
Token ID607
Feature activation+7.12e-01

Pos logit contributions
icious+0.028
 gag+0.028
rated+0.028
aus+0.028
urg+0.027
Neg logit contributions
aroo-0.011
aro-0.011
 nowhere-0.010
t-0.010
 she-0.010
Token hands
Token ID2832
Feature activation+8.37e-02

Pos logit contributions
rated+0.018
urg+0.018
 gag+0.018
aus+0.018
 decisions+0.017
Neg logit contributions
aroo-0.017
aro-0.016
 nowhere-0.014
t-0.014
hog-0.013
Token you
Token ID345
Feature activation+3.04e-01

Pos logit contributions
rated+0.023
 gag+0.023
urg+0.023
aus+0.023
 decisions+0.022
Neg logit contributions
aroo-0.011
aro-0.010
 nowhere-0.009
t-0.009
hog-0.008
Token turn
Token ID1210
Feature activation+4.29e-01

Pos logit contributions
rated+0.006
urg+0.006
 gag+0.006
aus+0.006
 decisions+0.006
Neg logit contributions
aroo-0.009
aro-0.008
 nowhere-0.008
t-0.007
hog-0.007
Token on
Token ID319
Feature activation-3.22e-01

Pos logit contributions
rated+0.011
 decisions+0.011
 hotter+0.011
urg+0.011
 gag+0.011
Neg logit contributions
aroo-0.009
aro-0.008
 nowhere-0.007
t-0.007
hog-0.007
Token the
Token ID262
Feature activation+2.91e-01

Pos logit contributions
aroo+0.007
aro+0.006
t+0.006
hog+0.006
 nowhere+0.006
Neg logit contributions
 decisions-0.008
 burgers-0.008
 degrees-0.008
urg-0.008
aus-0.008
Token f
Token ID277
Feature activation+2.01e+00

Pos logit contributions
rated+0.007
 gag+0.007
urg+0.007
aus+0.007
 decisions+0.007
Neg logit contributions
aroo-0.007
aro-0.007
 nowhere-0.006
t-0.006
hog-0.005
Tokenauc
Token ID14272
Feature activation+5.61e+00

Pos logit contributions
rated+0.030
 gag+0.029
urg+0.028
aus+0.028
icious+0.026
Neg logit contributions
aroo-0.069
aro-0.066
t-0.062
 nowhere-0.061
hog-0.059
Tokenet
Token ID316
Feature activation-6.34e-01

Pos logit contributions
 gag+0.136
 mental+0.133
icious+0.130
urg+0.128
 Esk+0.128
Neg logit contributions
aroo-0.126
aro-0.122
hog-0.117
t-0.109
uzz-0.101
Token,
Token ID11
Feature activation+5.98e-01

Pos logit contributions
aroo+0.011
aro+0.010
t+0.009
 nowhere+0.009
hog+0.008
Neg logit contributions
rated-0.019
 gag-0.019
urg-0.019
aus-0.019
icious-0.019
Token water
Token ID1660
Feature activation+7.51e-01

Pos logit contributions
rated+0.021
urg+0.021
 gag+0.021
aus+0.020
ksh+0.020
Neg logit contributions
aroo-0.008
aro-0.007
 nowhere-0.006
t-0.006
't-0.005
Token will
Token ID481
Feature activation+6.16e-01

Pos logit contributions
rated+0.017
 gag+0.017
urg+0.017
aus+0.017
icious+0.017
Neg logit contributions
aroo-0.019
aro-0.018
t-0.016
 nowhere-0.016
hog-0.015
Token come
Token ID1282
Feature activation+2.22e+00

Pos logit contributions
rated+0.015
ksh+0.014
 gag+0.014
urg+0.014
aus+0.014
Neg logit contributions
aroo-0.015
aro-0.015
t-0.014
 nowhere-0.014
't-0.013
Token to
Token ID284
Feature activation+5.99e-01

Pos logit contributions
aroo+0.016
aro+0.014
 nowhere+0.012
hog+0.011
t+0.011
Neg logit contributions
 decisions-0.027
 burgers-0.026
 gag-0.026
urg-0.026
 difference-0.026
Token turn
Token ID1210
Feature activation+2.54e-01

Pos logit contributions
rated+0.014
ksh+0.014
aus+0.013
urg+0.013
 gag+0.013
Neg logit contributions
aroo-0.015
aro-0.014
 nowhere-0.013
t-0.013
't-0.013
Token off
Token ID572
Feature activation+1.17e-01

Pos logit contributions
rated+0.008
aus+0.008
 gag+0.008
urg+0.008
icious+0.008
Neg logit contributions
aroo-0.004
aro-0.004
 nowhere-0.004
t-0.003
 she-0.003
Token the
Token ID262
Feature activation+9.17e-01

Pos logit contributions
rated+0.004
 gag+0.004
urg+0.004
aus+0.004
icious+0.004
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.002
t-0.002
hog-0.001
Token f
Token ID277
Feature activation+2.22e+00

Pos logit contributions
urg+0.021
rated+0.021
aus+0.020
 gag+0.020
icious+0.019
Neg logit contributions
aroo-0.024
aro-0.022
 nowhere-0.021
t-0.021
hog-0.019
Tokenauc
Token ID14272
Feature activation+5.60e+00

Pos logit contributions
 gag+0.042
rated+0.042
aus+0.040
 difference+0.040
 mental+0.040
Neg logit contributions
aroo-0.064
aro-0.062
t-0.062
hog-0.057
ite-0.056
Tokenet
Token ID316
Feature activation-9.84e-01

Pos logit contributions
 gag+0.137
 mental+0.134
urg+0.132
 Esk+0.129
 ste+0.127
Neg logit contributions
aroo-0.116
hog-0.114
aro-0.112
t-0.109
oe-0.101
Token.
Token ID13
Feature activation+3.27e-02

Pos logit contributions
aroo+0.017
aro+0.015
t+0.014
 nowhere+0.013
hog+0.012
Neg logit contributions
rated-0.031
 gag-0.031
urg-0.030
aus-0.030
icious-0.030
Token<|endoftext|>
Token ID50256
Feature activation+1.04e+00

Pos logit contributions
ksh+0.001
aus+0.001
 difference+0.001
icious+0.001
 mental+0.001
Neg logit contributions
aroo-0.001
aro-0.001
 nowhere-0.000
hog-0.000
t-0.000
TokenOnce
Token ID7454
Feature activation-1.37e+00

Pos logit contributions
rated+0.034
aus+0.034
iver+0.033
urg+0.033
 difference+0.033
Neg logit contributions
aroo-0.016
aro-0.014
 nowhere-0.011
hog-0.010
t-0.010
Token upon
Token ID2402
Feature activation+1.07e+00

Pos logit contributions
aroo+0.037
aro+0.036
hog+0.031
overed+0.027
opard+0.027
Neg logit contributions
 idea-0.028
rated-0.027
 choice-0.027
 difference-0.027
 alright-0.027
Token she
Token ID673
Feature activation-4.82e-01

Pos logit contributions
 gag+0.013
 hotter+0.012
 colder+0.012
 sne+0.012
 worse+0.012
Neg logit contributions
aro-0.010
aroo-0.009
ite-0.008
hog-0.008
uzz-0.008
Token saw
Token ID2497
Feature activation+1.12e+00

Pos logit contributions
aroo+0.010
 nowhere+0.009
aro+0.009
t+0.008
't+0.007
Neg logit contributions
rated-0.014
urg-0.013
 gag-0.013
 decisions-0.013
icious-0.013
Token a
Token ID257
Feature activation+5.55e-01

Pos logit contributions
 gag+0.024
 hotter+0.023
 burgers+0.023
aus+0.023
 colder+0.022
Neg logit contributions
aroo-0.030
aro-0.029
t-0.027
ite-0.026
hog-0.025
Token beautiful
Token ID4950
Feature activation-7.95e-02

Pos logit contributions
rated+0.014
urg+0.014
 gag+0.014
aus+0.014
 decisions+0.013
Neg logit contributions
aroo-0.013
aro-0.012
 nowhere-0.011
t-0.011
hog-0.010
Token f
Token ID277
Feature activation+2.31e+00

Pos logit contributions
aroo+0.002
aro+0.002
t+0.002
 nowhere+0.001
hog+0.001
Neg logit contributions
urg-0.002
rated-0.002
 gag-0.002
 difference-0.002
aus-0.002
Tokenauc
Token ID14272
Feature activation+5.60e+00

Pos logit contributions
rated+0.025
urg+0.025
 gag+0.025
aus+0.024
icious+0.023
Neg logit contributions
aroo-0.086
aro-0.081
 nowhere-0.081
t-0.075
hog-0.073
Tokenet
Token ID316
Feature activation-7.45e-01

Pos logit contributions
 decisions+0.099
urg+0.098
 gag+0.097
aus+0.095
 degrees+0.094
Neg logit contributions
aroo-0.171
aro-0.161
t-0.149
hog-0.148
 nowhere-0.147
Token.
Token ID13
Feature activation+4.67e-02

Pos logit contributions
aroo+0.014
aro+0.013
 nowhere+0.012
hog+0.011
t+0.011
Neg logit contributions
rated-0.021
urg-0.021
 gag-0.021
aus-0.020
 hotter-0.020
Token
Token ID220
Feature activation-9.08e-01

Pos logit contributions
 gag+0.001
aus+0.001
 microwave+0.001
 frying+0.001
ksh+0.001
Neg logit contributions
aroo-0.001
aro-0.001
ite-0.001
 nowhere-0.001
uzz-0.001
Token
Token ID198
Feature activation+1.28e-01

Pos logit contributions
aroo+0.017
aro+0.016
hog+0.012
 nowhere+0.012
ite+0.011
Neg logit contributions
iver-0.027
rated-0.027
aus-0.026
 alright-0.026
urg-0.026
Token
Token ID198
Feature activation-7.22e-01

Pos logit contributions
iver+0.004
 mental+0.004
aus+0.004
rated+0.004
have+0.003
Neg logit contributions
aroo-0.003
aro-0.002
hog-0.002
 nowhere-0.002
ite-0.002
Token and
Token ID290
Feature activation-7.30e-01

Pos logit contributions
rated+0.060
 gag+0.060
urg+0.059
aus+0.059
icious+0.058
Neg logit contributions
aroo-0.024
aro-0.021
t-0.019
 nowhere-0.019
hog-0.016
Token play
Token ID711
Feature activation-7.66e-01

Pos logit contributions
aroo+0.019
aro+0.018
 nowhere+0.016
t+0.016
hog+0.015
Neg logit contributions
aus-0.017
urg-0.016
rated-0.016
icious-0.016
 decisions-0.016
Token with
Token ID351
Feature activation-1.61e+00

Pos logit contributions
aroo+0.009
aro+0.007
 nowhere+0.006
t+0.006
hog+0.005
Neg logit contributions
rated-0.029
 gag-0.028
urg-0.028
aus-0.028
 decisions-0.028
Token the
Token ID262
Feature activation+2.84e-01

Pos logit contributions
aroo+0.048
aro+0.044
hog+0.044
t+0.043
ite+0.041
Neg logit contributions
 burgers-0.030
 gag-0.029
urg-0.029
 colder-0.028
 hamb-0.028
Token f
Token ID277
Feature activation+2.25e+00

Pos logit contributions
rated+0.007
urg+0.007
 gag+0.007
aus+0.007
 decisions+0.006
Neg logit contributions
aroo-0.007
aro-0.007
 nowhere-0.006
t-0.006
hog-0.006
Tokenauc
Token ID14272
Feature activation+5.60e+00

Pos logit contributions
rated+0.030
urg+0.029
 gag+0.028
aus+0.028
icious+0.026
Neg logit contributions
aroo-0.080
aro-0.076
 nowhere-0.073
t-0.071
hog-0.069
Tokenet
Token ID316
Feature activation-6.03e-01

Pos logit contributions
 gag+0.113
 decisions+0.113
icious+0.112
 mental+0.112
urg+0.112
Neg logit contributions
aroo-0.146
aro-0.143
t-0.134
hog-0.129
uzz-0.119
Token.
Token ID13
Feature activation+4.89e-01

Pos logit contributions
aroo+0.011
aro+0.010
 nowhere+0.010
t+0.009
hog+0.009
Neg logit contributions
rated-0.018
urg-0.017
 gag-0.017
aus-0.017
ksh-0.017
Token One
Token ID1881
Feature activation+1.62e+00

Pos logit contributions
aus+0.016
rated+0.016
urg+0.015
ksh+0.015
 gag+0.015
Neg logit contributions
aroo-0.008
aro-0.008
 nowhere-0.007
t-0.006
hog-0.006
Token day
Token ID1110
Feature activation+5.38e-01

Pos logit contributions
rated+0.051
urg+0.051
 gag+0.051
aus+0.050
icious+0.050
Neg logit contributions
aroo-0.027
aro-0.024
 nowhere-0.022
t-0.022
hog-0.019
Token,
Token ID11
Feature activation+3.38e-01

Pos logit contributions
 hamb+0.011
 burgers+0.011
aus+0.011
urg+0.010
 hotter+0.010
Neg logit contributions
aroo-0.014
aro-0.014
hog-0.013
 nowhere-0.013
uzz-0.012
Token would
Token ID561
Feature activation+1.90e-01

Pos logit contributions
aroo+0.001
t+0.001
 nowhere+0.001
 she+0.001
aro+0.001
Neg logit contributions
icious-0.003
 decisions-0.003
urg-0.003
 difference-0.003
 mental-0.002
Token turn
Token ID1210
Feature activation+2.99e-01

Pos logit contributions
rated+0.004
ksh+0.004
aus+0.004
 decisions+0.004
icious+0.004
Neg logit contributions
aroo-0.005
't-0.005
aro-0.005
t-0.004
 she-0.004
Token on
Token ID319
Feature activation-3.67e-01

Pos logit contributions
rated+0.010
icious+0.009
 gag+0.009
urg+0.009
aus+0.009
Neg logit contributions
aroo-0.005
aro-0.005
 nowhere-0.004
t-0.004
't-0.003
Token the
Token ID262
Feature activation+3.17e-01

Pos logit contributions
aroo+0.005
aro+0.005
 nowhere+0.005
t+0.004
 that+0.004
Neg logit contributions
rated-0.013
icious-0.012
 gag-0.012
iver-0.012
aus-0.012
Token f
Token ID277
Feature activation+2.14e+00

Pos logit contributions
rated+0.008
aus+0.007
urg+0.007
 gag+0.007
icious+0.007
Neg logit contributions
aroo-0.008
aro-0.007
t-0.007
 nowhere-0.007
't-0.006
Tokenauc
Token ID14272
Feature activation+5.60e+00

Pos logit contributions
rated+0.033
 gag+0.032
aus+0.031
urg+0.031
 decisions+0.030
Neg logit contributions
aroo-0.071
aro-0.069
t-0.065
 nowhere-0.063
hog-0.061
Tokenet
Token ID316
Feature activation-5.12e-01

Pos logit contributions
 mental+0.128
icious+0.128
 gag+0.126
 difference+0.125
 decisions+0.125
Neg logit contributions
aroo-0.131
aro-0.128
hog-0.120
t-0.113
haired-0.103
Token and
Token ID290
Feature activation-9.45e-01

Pos logit contributions
aroo+0.008
aro+0.007
t+0.006
 nowhere+0.006
hog+0.005
Neg logit contributions
rated-0.017
 gag-0.017
icious-0.017
urg-0.017
 decisions-0.017
Token use
Token ID779
Feature activation-9.12e-01

Pos logit contributions
aroo+0.024
aro+0.023
 nowhere+0.021
t+0.020
't+0.020
Neg logit contributions
aus-0.022
icious-0.022
rated-0.021
urg-0.021
ksh-0.021
Token it
Token ID340
Feature activation-2.39e-01

Pos logit contributions
aroo+0.018
aro+0.017
 nowhere+0.015
t+0.014
hog+0.013
Neg logit contributions
rated-0.027
urg-0.026
 gag-0.026
aus-0.026
icious-0.025
Token to
Token ID284
Feature activation+1.46e+00

Pos logit contributions
aroo+0.004
aro+0.004
 nowhere+0.003
ite+0.003
hog+0.003
Neg logit contributions
rated-0.007
urg-0.007
 gag-0.007
aus-0.007
 decisions-0.007
Token was
Token ID373
Feature activation+5.13e-01

Pos logit contributions
aroo+0.014
aro+0.013
 nowhere+0.011
t+0.011
hog+0.010
Neg logit contributions
rated-0.022
urg-0.021
 gag-0.021
aus-0.021
 decisions-0.021
Token a
Token ID257
Feature activation+1.16e+00

Pos logit contributions
aus+0.013
 gag+0.013
urg+0.012
rated+0.012
 burgers+0.012
Neg logit contributions
aroo-0.012
aro-0.012
t-0.010
hog-0.010
 nowhere-0.010
Token little
Token ID1310
Feature activation+4.52e-01

Pos logit contributions
rated+0.033
urg+0.032
 gag+0.032
aus+0.032
 decisions+0.031
Neg logit contributions
aroo-0.023
aro-0.021
 nowhere-0.020
t-0.019
hog-0.018
Token girl
Token ID2576
Feature activation+2.03e-01

Pos logit contributions
rated+0.011
 gag+0.011
aus+0.011
urg+0.011
 decisions+0.010
Neg logit contributions
aroo-0.011
aro-0.010
 nowhere-0.010
t-0.010
hog-0.009
Token named
Token ID3706
Feature activation-3.82e-01

Pos logit contributions
 sne+0.005
 por+0.005
iver+0.005
 frying+0.005
 gag+0.005
Neg logit contributions
aroo-0.004
aro-0.004
t-0.004
overed-0.004
 nowhere-0.004
Token Beat
Token ID12568
Feature activation-4.95e+00

Pos logit contributions
aroo+0.009
aro+0.009
ite+0.007
 nowhere+0.006
hog+0.006
Neg logit contributions
rated-0.010
 alright-0.010
 gag-0.010
ksh-0.009
aus-0.009
Tokenrice
Token ID20970
Feature activation-9.16e-01

Pos logit contributions
itt+0.112
aroo+0.109
aro+0.108
 nowhere+0.097
ite+0.092
Neg logit contributions
rated-0.114
ksh-0.107
 announcements-0.100
have-0.097
 insist-0.097
Token.
Token ID13
Feature activation-1.30e+00

Pos logit contributions
aro+0.026
aroo+0.026
t+0.024
ite+0.023
't+0.022
Neg logit contributions
rated-0.015
 sne-0.015
 gag-0.015
 insist-0.015
 difference-0.014
Token She
Token ID1375
Feature activation-5.35e-01

Pos logit contributions
aroo+0.028
aro+0.026
 nowhere+0.021
t+0.019
itt+0.019
Neg logit contributions
rated-0.037
aus-0.035
 difference-0.034
 decisions-0.034
urg-0.034
Token was
Token ID373
Feature activation-7.08e-01

Pos logit contributions
aroo+0.013
aro+0.013
ite+0.010
 nowhere+0.010
hog+0.010
Neg logit contributions
rated-0.013
 gag-0.013
aus-0.013
iver-0.013
 sne-0.012
Token so
Token ID523
Feature activation-6.52e-01

Pos logit contributions
aroo+0.015
aro+0.014
 nowhere+0.013
t+0.013
hog+0.012
Neg logit contributions
rated-0.019
urg-0.019
 gag-0.019
aus-0.019
 decisions-0.018
Token day
Token ID1110
Feature activation-1.42e+00

Pos logit contributions
aroo+0.020
aro+0.019
 nowhere+0.017
t+0.017
ite+0.017
Neg logit contributions
 difference-0.012
 choice-0.011
 decisions-0.011
 hamb-0.011
 deal-0.011
Token ever
Token ID1683
Feature activation+5.14e-01

Pos logit contributions
aroo+0.019
aro+0.017
 nowhere+0.014
hog+0.014
t+0.013
Neg logit contributions
 hamb-0.047
aus-0.047
urg-0.046
 burgers-0.046
 french-0.046
Token!
Token ID0
Feature activation+3.35e-01

Pos logit contributions
rated+0.014
 gag+0.014
urg+0.014
aus+0.014
icious+0.014
Neg logit contributions
aroo-0.011
aro-0.010
 nowhere-0.009
t-0.009
hog-0.008
Token<|endoftext|>
Token ID50256
Feature activation+8.26e-01

Pos logit contributions
 mental+0.009
 difference+0.009
aus+0.009
ksh+0.008
 decisions+0.008
Neg logit contributions
aroo-0.007
aro-0.007
t-0.006
 nowhere-0.005
hog-0.005
TokenMor
Token ID20044
Feature activation-1.57e+00

Pos logit contributions
aus+0.023
rated+0.023
urg+0.022
ksh+0.022
 difference+0.022
Neg logit contributions
aroo-0.016
aro-0.015
 nowhere-0.012
t-0.012
hog-0.012
Tokenris
Token ID2442
Feature activation-4.82e+00

Pos logit contributions
aroo+0.040
aro+0.038
 nowhere+0.035
t+0.034
hog+0.032
Neg logit contributions
rated-0.037
aus-0.036
urg-0.036
 gag-0.035
 decisions-0.034
Token was
Token ID373
Feature activation-4.07e-01

Pos logit contributions
aroo+0.109
aro+0.104
 nowhere+0.095
t+0.090
hog+0.087
Neg logit contributions
rated-0.115
urg-0.114
aus-0.113
 hamb-0.113
 french-0.110
Token very
Token ID845
Feature activation-1.30e+00

Pos logit contributions
aroo+0.009
aro+0.009
t+0.008
 nowhere+0.008
hog+0.007
Neg logit contributions
rated-0.011
urg-0.010
aus-0.010
 gag-0.010
iver-0.010
Token excited
Token ID6568
Feature activation-9.51e-02

Pos logit contributions
aroo+0.037
aro+0.036
 nowhere+0.035
hog+0.033
t+0.033
Neg logit contributions
 gag-0.023
rated-0.022
urg-0.022
 hamb-0.022
 french-0.022
Token.
Token ID13
Feature activation-9.57e-01

Pos logit contributions
aroo+0.002
aro+0.002
 nowhere+0.001
t+0.001
hog+0.001
Neg logit contributions
urg-0.003
rated-0.003
aus-0.003
 gag-0.003
 decisions-0.003
Token Today
Token ID6288
Feature activation-1.87e-01

Pos logit contributions
aro+0.023
aroo+0.023
 nowhere+0.021
t+0.018
ite+0.018
Neg logit contributions
aus-0.023
rated-0.022
 hamb-0.021
 decisions-0.021
 french-0.021
Token had
Token ID550
Feature activation-1.40e+00

Pos logit contributions
aroo+0.007
aro+0.006
 nowhere+0.005
t+0.005
hog+0.005
Neg logit contributions
rated-0.009
aus-0.009
urg-0.009
 gag-0.009
 decisions-0.009
Token saved
Token ID7448
Feature activation+2.70e-01

Pos logit contributions
aroo+0.034
aro+0.032
 nowhere+0.030
t+0.029
hog+0.027
Neg logit contributions
rated-0.034
 gag-0.033
urg-0.033
aus-0.033
icious-0.032
Token him
Token ID683
Feature activation-4.54e-01

Pos logit contributions
 gag+0.008
rated+0.008
icious+0.008
urg+0.008
ksh+0.008
Neg logit contributions
aroo-0.005
aro-0.005
 nowhere-0.005
t-0.004
hog-0.004
Token.
Token ID13
Feature activation+3.98e-02

Pos logit contributions
aroo+0.006
aro+0.005
t+0.005
 nowhere+0.005
hog+0.005
Neg logit contributions
 gag-0.015
rated-0.015
aus-0.015
urg-0.015
ksh-0.015
Token<|endoftext|>
Token ID50256
Feature activation+7.64e-01

Pos logit contributions
aus+0.001
 difference+0.001
 mental+0.001
ksh+0.001
urg+0.001
Neg logit contributions
aroo-0.001
aro-0.001
t-0.001
 nowhere-0.001
oe-0.001
TokenCharl
Token ID24453
Feature activation-4.56e+00

Pos logit contributions
aus+0.021
rated+0.021
 difference+0.021
iver+0.021
urg+0.020
Neg logit contributions
aroo-0.014
aro-0.014
 nowhere-0.011
t-0.011
hog-0.011
Tokenotte
Token ID11404
Feature activation-2.14e+00

Pos logit contributions
aroo+0.084
aro+0.077
hog+0.074
 Yes+0.074
t+0.074
Neg logit contributions
eling-0.125
urg-0.115
have-0.110
icious-0.110
 gag-0.110
Token and
Token ID290
Feature activation-4.01e-01

Pos logit contributions
aroo+0.037
aro+0.033
 nowhere+0.031
t+0.029
hog+0.027
Neg logit contributions
rated-0.067
 gag-0.067
urg-0.066
aus-0.065
icious-0.065
Token her
Token ID607
Feature activation-1.06e+00

Pos logit contributions
aro+0.009
aroo+0.009
t+0.009
 nowhere+0.008
hog+0.008
Neg logit contributions
aus-0.009
 insist-0.009
rated-0.009
 gag-0.009
 french-0.008
Token mom
Token ID1995
Feature activation+2.76e+00

Pos logit contributions
aro+0.025
aroo+0.025
t+0.022
 nowhere+0.021
hog+0.021
Neg logit contributions
 mental-0.025
urg-0.025
aus-0.025
 gag-0.024
 french-0.024
Token were
Token ID547
Feature activation+1.53e-01

Pos logit contributions
 gag+0.082
rated+0.079
 decisions+0.078
 degrees+0.078
ksh+0.076
Neg logit contributions
aroo-0.052
aro-0.046
 nowhere-0.046
ite-0.043
t-0.042
Token
Token ID198
Feature activation+2.17e-01

Pos logit contributions
aroo+0.002
aro+0.002
 nowhere+0.001
hog+0.001
itt+0.001
Neg logit contributions
 difference-0.002
rated-0.002
 idea-0.002
 mental-0.002
 swe-0.002
Token
Token ID198
Feature activation-1.04e+00

Pos logit contributions
 mental+0.005
have+0.005
rated+0.005
iver+0.005
 idea+0.005
Neg logit contributions
aroo-0.005
aro-0.004
hog-0.004
 nowhere-0.004
itt-0.003
TokenOne
Token ID3198
Feature activation+1.24e+00

Pos logit contributions
aroo+0.027
aro+0.026
itt+0.022
 nowhere+0.022
hog+0.021
Neg logit contributions
rated-0.024
have-0.023
aus-0.022
 alright-0.022
iver-0.022
Token day
Token ID1110
Feature activation-3.41e-01

Pos logit contributions
rated+0.031
aus+0.030
 gag+0.029
iver+0.029
urg+0.029
Neg logit contributions
aro-0.029
aroo-0.029
 nowhere-0.026
t-0.024
hog-0.024
Token,
Token ID11
Feature activation-3.11e-01

Pos logit contributions
aroo+0.009
aro+0.009
hog+0.008
 nowhere+0.008
uzz+0.007
Neg logit contributions
rated-0.007
 alright-0.007
have-0.006
 colder-0.006
 idea-0.006
Token Beat
Token ID12568
Feature activation-4.51e+00

Pos logit contributions
aroo+0.007
aro+0.007
hog+0.006
itt+0.006
uzz+0.005
Neg logit contributions
rated-0.008
 alright-0.008
have-0.008
 idea-0.008
 colder-0.008
Tokenrice
Token ID20970
Feature activation-1.85e+00

Pos logit contributions
aroo+0.100
itt+0.099
aro+0.093
 nowhere+0.085
ite+0.082
Neg logit contributions
rated-0.113
 announcements-0.102
have-0.098
ksh-0.097
 mental-0.093
Tokenâ
Token ID22940
Feature activation+8.79e-01

Pos logit contributions
aroo+0.037
aro+0.034
 nowhere+0.032
t+0.031
hog+0.029
Neg logit contributions
rated-0.052
urg-0.052
 gag-0.051
aus-0.050
icious-0.050
Token
Token ID26391
Feature activation-3.37e+00

Pos logit contributions
rated+0.018
aus+0.016
iver+0.016
icious+0.016
urg+0.016
Neg logit contributions
aroo-0.026
aro-0.025
 nowhere-0.023
t-0.022
hog-0.021
Token
Token ID8151
Feature activation-2.81e+00

Pos logit contributions
aroo+0.099
aro+0.093
 nowhere+0.082
hog+0.080
t+0.077
Neg logit contributions
iver-0.064
urg-0.063
rated-0.062
 gag-0.062
icious-0.062
Tokens
Token ID82
Feature activation-1.08e+00

Pos logit contributions
aroo+0.059
aro+0.054
 nowhere+0.046
 Spirit+0.046
 Yes+0.045
Neg logit contributions
aus-0.069
iver-0.063
 alright-0.062
urg-0.061
rated-0.060
Token weak
Token ID4939
Feature activation-4.37e-01

Pos logit contributions
rated+0.005
urg+0.005
 gag+0.005
aus+0.005
icious+0.005
Neg logit contributions
aroo-0.007
aro-0.007
 nowhere-0.006
t-0.006
hog-0.006
Token and
Token ID290
Feature activation-1.82e+00

Pos logit contributions
aroo+0.007
aro+0.006
 nowhere+0.006
t+0.006
hog+0.005
Neg logit contributions
rated-0.014
urg-0.014
 gag-0.014
aus-0.014
icious-0.014
Token they
Token ID484
Feature activation-4.20e-01

Pos logit contributions
aroo+0.035
aro+0.032
 nowhere+0.030
t+0.029
hog+0.026
Neg logit contributions
rated-0.053
urg-0.053
aus-0.053
 gag-0.052
icious-0.052
Token shouldn
Token ID6584
Feature activation-8.78e-01

Pos logit contributions
aroo+0.009
aro+0.009
 nowhere+0.008
t+0.007
hog+0.007
Neg logit contributions
rated-0.011
 gag-0.011
aus-0.010
urg-0.010
 alright-0.010
Tokenâ
Token ID22940
Feature activation+1.95e-01

Pos logit contributions
aroo+0.017
aro+0.015
 nowhere+0.014
hog+0.012
t+0.012
Neg logit contributions
rated-0.026
aus-0.025
urg-0.025
 gag-0.025
 decisions-0.025
Token
Token ID26391
Feature activation-4.49e+00

Pos logit contributions
urg+0.004
 gag+0.004
 burgers+0.004
 french+0.004
 hamb+0.004
Neg logit contributions
aroo-0.005
t-0.005
aro-0.005
 nowhere-0.004
't-0.004
Token
Token ID8151
Feature activation-3.94e+00

Pos logit contributions
aroo+0.109
aro+0.098
 nowhere+0.089
hog+0.085
t+0.081
Neg logit contributions
rated-0.108
urg-0.107
 gag-0.106
iver-0.105
 decisions-0.105
Tokent
Token ID83
Feature activation-5.35e-01

Pos logit contributions
aroo+0.050
aro+0.039
 nowhere+0.039
 Spirit+0.027
hog+0.026
Neg logit contributions
aus-0.136
rated-0.132
 alright-0.132
ksh-0.129
icious-0.128
Token have
Token ID423
Feature activation-1.59e+00

Pos logit contributions
aroo+0.014
aro+0.013
 nowhere+0.012
t+0.012
hog+0.011
Neg logit contributions
rated-0.012
 gag-0.012
 decisions-0.012
urg-0.012
 difference-0.012
Token faced
Token ID7452
Feature activation-1.09e+00

Pos logit contributions
aroo+0.030
aro+0.028
 nowhere+0.026
t+0.025
hog+0.022
Neg logit contributions
rated-0.048
urg-0.047
 gag-0.046
aus-0.046
icious-0.045
Token the
Token ID262
Feature activation-6.77e-01

Pos logit contributions
aroo+0.020
aro+0.018
 nowhere+0.017
t+0.015
hog+0.014
Neg logit contributions
rated-0.034
 gag-0.033
urg-0.033
aus-0.033
icious-0.032
Token the
Token ID262
Feature activation+1.28e-01

Pos logit contributions
aroo+0.002
aro+0.002
t+0.002
 nowhere+0.002
ite+0.002
Neg logit contributions
 gag-0.005
rated-0.005
aus-0.005
urg-0.005
 burgers-0.005
Token biggest
Token ID4094
Feature activation-1.26e+00

Pos logit contributions
rated+0.004
 gag+0.004
urg+0.004
aus+0.004
 decisions+0.004
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.002
t-0.002
hog-0.002
Token frog
Token ID21264
Feature activation-1.05e+00

Pos logit contributions
aroo+0.022
aro+0.020
 nowhere+0.018
t+0.017
hog+0.015
Neg logit contributions
rated-0.039
 gag-0.039
urg-0.038
aus-0.038
 decisions-0.037
Token hadn
Token ID8020
Feature activation-3.03e+00

Pos logit contributions
aroo+0.023
aro+0.022
t+0.019
 nowhere+0.019
uzz+0.019
Neg logit contributions
rated-0.026
aus-0.025
urg-0.025
 gag-0.025
 hamb-0.025
Tokenâ
Token ID22940
Feature activation-3.68e-01

Pos logit contributions
aroo+0.040
aro+0.035
hog+0.027
 nowhere+0.025
ite+0.023
Neg logit contributions
aus-0.106
urg-0.101
iver-0.100
 alright-0.099
icious-0.098
Token
Token ID26391
Feature activation-4.43e+00

Pos logit contributions
aroo+0.010
aro+0.010
 nowhere+0.009
t+0.009
hog+0.009
Neg logit contributions
rated-0.008
aus-0.007
urg-0.007
 gag-0.007
icious-0.007
Token
Token ID8151
Feature activation-3.58e+00

Pos logit contributions
aroo+0.081
aro+0.068
 nowhere+0.056
hog+0.054
ite+0.046
Neg logit contributions
urg-0.132
iver-0.132
icious-0.131
 gag-0.129
 degrees-0.128
Tokent
Token ID83
Feature activation-3.39e-01

Pos logit contributions
aroo+0.041
aro+0.036
 nowhere+0.030
 Spirit+0.023
overed+0.023
Neg logit contributions
aus-0.126
 alright-0.121
rated-0.119
 hamb-0.116
 ok-0.114
Token brought
Token ID3181
Feature activation-1.75e-01

Pos logit contributions
aroo+0.007
aro+0.007
 nowhere+0.006
t+0.006
hog+0.005
Neg logit contributions
rated-0.009
urg-0.009
 gag-0.009
aus-0.009
 decisions-0.009
Token any
Token ID597
Feature activation+1.15e-01

Pos logit contributions
aroo+0.004
aro+0.003
 nowhere+0.003
t+0.003
hog+0.003
Neg logit contributions
 decisions-0.005
rated-0.005
urg-0.005
aus-0.005
 gag-0.004
Token l
Token ID300
Feature activation-1.62e+00

Pos logit contributions
rated+0.003
urg+0.003
 gag+0.003
aus+0.002
icious+0.002
Neg logit contributions
aroo-0.003
aro-0.003
 nowhere-0.003
t-0.003
hog-0.002
Token She
Token ID1375
Feature activation-5.35e-01

Pos logit contributions
aroo+0.028
aro+0.026
 nowhere+0.021
t+0.019
itt+0.019
Neg logit contributions
rated-0.037
aus-0.035
 difference-0.034
 decisions-0.034
urg-0.034
Token was
Token ID373
Feature activation-7.08e-01

Pos logit contributions
aroo+0.013
aro+0.013
ite+0.010
 nowhere+0.010
hog+0.010
Neg logit contributions
rated-0.013
 gag-0.013
aus-0.013
iver-0.013
 sne-0.012
Token so
Token ID523
Feature activation-6.52e-01

Pos logit contributions
aroo+0.015
aro+0.014
 nowhere+0.013
t+0.013
hog+0.012
Neg logit contributions
rated-0.019
urg-0.019
 gag-0.019
aus-0.019
 decisions-0.018
Token cute
Token ID13779
Feature activation-8.65e-01

Pos logit contributions
aroo+0.019
aro+0.018
 nowhere+0.017
t+0.016
hog+0.016
Neg logit contributions
 gag-0.013
rated-0.013
urg-0.013
aus-0.012
iver-0.012
Token!
Token ID0
Feature activation-5.08e-01

Pos logit contributions
aro+0.021
aroo+0.020
t+0.017
 nowhere+0.017
ite+0.016
Neg logit contributions
 awake-0.019
 hamb-0.019
 burgers-0.019
 french-0.018
aus-0.018
Token Beat
Token ID12568
Feature activation-4.42e+00

Pos logit contributions
aroo+0.013
aro+0.012
itt+0.010
 nowhere+0.010
hog+0.010
Neg logit contributions
rated-0.012
 difference-0.011
 idea-0.011
aus-0.011
 swe-0.011
Tokenrice
Token ID20970
Feature activation-1.33e+00

Pos logit contributions
aroo+0.098
itt+0.098
aro+0.092
ite+0.085
 nowhere+0.085
Neg logit contributions
rated-0.106
 announcements-0.095
ksh-0.092
have-0.091
 mental-0.090
Token had
Token ID550
Feature activation-1.06e+00

Pos logit contributions
aroo+0.025
aro+0.023
 nowhere+0.021
t+0.020
hog+0.018
Neg logit contributions
rated-0.040
urg-0.039
 gag-0.039
aus-0.039
 decisions-0.038
Token five
Token ID1936
Feature activation-9.69e-01

Pos logit contributions
aroo+0.026
aro+0.024
t+0.021
hog+0.020
 nowhere+0.020
Neg logit contributions
 gag-0.026
aus-0.025
urg-0.025
rated-0.025
 hamb-0.025
Token fingers
Token ID9353
Feature activation-1.49e+00

Pos logit contributions
aroo+0.022
aro+0.021
 nowhere+0.019
t+0.019
hog+0.018
Neg logit contributions
rated-0.025
urg-0.024
aus-0.024
 gag-0.024
icious-0.023
Token on
Token ID319
Feature activation-2.96e+00

Pos logit contributions
aroo+0.031
aro+0.030
 nowhere+0.025
hog+0.023
 Spirit+0.022
Neg logit contributions
 gag-0.038
urg-0.037
 sne-0.036
 awake-0.036
 degrees-0.036
Token so
Token ID523
Feature activation-8.69e-01

Pos logit contributions
aroo+0.019
aro+0.018
 nowhere+0.017
t+0.016
hog+0.015
Neg logit contributions
rated-0.023
urg-0.022
aus-0.022
 gag-0.022
icious-0.022
Token excited
Token ID6568
Feature activation-1.19e+00

Pos logit contributions
aroo+0.021
aro+0.019
 nowhere+0.018
t+0.018
hog+0.017
Neg logit contributions
rated-0.022
urg-0.021
 gag-0.021
aus-0.021
icious-0.021
Token he
Token ID339
Feature activation-9.17e-02

Pos logit contributions
aroo+0.021
aro+0.019
 nowhere+0.016
t+0.015
ite+0.015
Neg logit contributions
rated-0.037
urg-0.036
aus-0.036
 decisions-0.036
 burgers-0.035
Token couldn
Token ID3521
Feature activation-1.59e+00

Pos logit contributions
aroo+0.002
aro+0.002
 nowhere+0.002
ite+0.002
t+0.002
Neg logit contributions
rated-0.002
urg-0.002
aus-0.002
 hamb-0.002
 gag-0.002
Tokenâ
Token ID22940
Feature activation-2.13e-01

Pos logit contributions
aroo+0.013
aro+0.011
 nowhere+0.009
hog+0.005
t+0.005
Neg logit contributions
aus-0.063
rated-0.063
urg-0.062
 gag-0.062
iver-0.061
Token
Token ID26391
Feature activation-4.42e+00

Pos logit contributions
aroo+0.006
t+0.006
aro+0.005
't+0.005
 nowhere+0.005
Neg logit contributions
urg-0.004
 gag-0.004
 degrees-0.004
rated-0.004
 burgers-0.004
Token
Token ID8151
Feature activation-3.71e+00

Pos logit contributions
aroo+0.080
aro+0.072
 nowhere+0.062
hog+0.056
t+0.054
Neg logit contributions
urg-0.135
rated-0.134
 gag-0.133
aus-0.132
iver-0.132
Tokent
Token ID83
Feature activation-7.83e-01

Pos logit contributions
aroo+0.040
aro+0.033
 nowhere+0.032
hog+0.021
 Spirit+0.020
Neg logit contributions
aus-0.136
iver-0.132
rated-0.131
 alright-0.130
icious-0.129
Token contain
Token ID3994
Feature activation-1.64e+00

Pos logit contributions
aroo+0.025
aro+0.024
 nowhere+0.023
ite+0.022
hog+0.021
Neg logit contributions
rated-0.013
 gag-0.013
iver-0.012
 deal-0.012
aus-0.012
Token himself
Token ID2241
Feature activation-3.64e-01

Pos logit contributions
aroo+0.040
aro+0.037
t+0.034
 nowhere+0.033
hog+0.032
Neg logit contributions
urg-0.041
rated-0.041
aus-0.040
 decisions-0.040
 gag-0.039
Token!
Token ID0
Feature activation-3.28e-01

Pos logit contributions
aroo+0.006
aro+0.006
 nowhere+0.005
t+0.005
hog+0.005
Neg logit contributions
rated-0.011
aus-0.011
urg-0.011
 gag-0.011
ksh-0.011
Token worry
Token ID5490
Feature activation-1.18e+00

Pos logit contributions
urg+0.013
rated+0.013
aus+0.013
ksh+0.013
 gag+0.013
Neg logit contributions
aroo-0.018
aro-0.017
t-0.016
 nowhere-0.016
't-0.015
Token because
Token ID780
Feature activation+3.96e-01

Pos logit contributions
aroo+0.019
aro+0.018
 nowhere+0.016
t+0.016
hog+0.014
Neg logit contributions
rated-0.038
 gag-0.037
urg-0.037
aus-0.037
ksh-0.036
Token he
Token ID339
Feature activation+1.08e-02

Pos logit contributions
aus+0.012
 decisions+0.012
 burgers+0.012
 gag+0.012
 hamb+0.012
Neg logit contributions
aroo-0.006
aro-0.005
hog-0.005
 nowhere-0.005
t-0.005
Token hadn
Token ID8020
Feature activation-1.95e+00

Pos logit contributions
rated+0.000
urg+0.000
 gag+0.000
aus+0.000
 decisions+0.000
Neg logit contributions
aroo-0.000
aro-0.000
 nowhere-0.000
t-0.000
hog-0.000
Tokenâ
Token ID22940
Feature activation-2.72e-01

Pos logit contributions
aroo+0.025
aro+0.020
 nowhere+0.017
hog+0.017
uzz+0.013
Neg logit contributions
aus-0.068
 alright-0.067
iver-0.066
urg-0.065
icious-0.065
Token
Token ID26391
Feature activation-4.38e+00

Pos logit contributions
aroo+0.008
t+0.008
aro+0.007
 nowhere+0.007
't+0.007
Neg logit contributions
urg-0.005
 gag-0.005
 burgers-0.005
 hamb-0.005
 french-0.005
Token
Token ID8151
Feature activation-4.02e+00

Pos logit contributions
aroo+0.086
aro+0.076
 nowhere+0.068
hog+0.062
t+0.058
Neg logit contributions
urg-0.125
 gag-0.125
iver-0.124
rated-0.124
aus-0.124
Tokent
Token ID83
Feature activation+4.54e-01

Pos logit contributions
aroo+0.048
aro+0.039
 nowhere+0.037
 Spirit+0.027
 globe+0.026
Neg logit contributions
aus-0.144
 alright-0.138
rated-0.135
icious-0.133
ksh-0.132
Token seen
Token ID1775
Feature activation-6.63e-01

Pos logit contributions
urg+0.012
rated+0.011
ksh+0.011
aus+0.011
icious+0.011
Neg logit contributions
aroo-0.011
aro-0.010
 nowhere-0.009
t-0.009
't-0.008
Token her
Token ID607
Feature activation-1.81e+00

Pos logit contributions
aroo+0.016
aro+0.015
t+0.014
 nowhere+0.014
hog+0.013
Neg logit contributions
urg-0.015
aus-0.015
 gag-0.015
rated-0.015
 decisions-0.015
Token pick
Token ID2298
Feature activation+1.58e-02

Pos logit contributions
aroo+0.033
aro+0.030
 nowhere+0.028
t+0.026
hog+0.024
Neg logit contributions
rated-0.055
aus-0.054
urg-0.054
 gag-0.054
icious-0.053
Token!"
Token ID2474
Feature activation+2.55e-01

Pos logit contributions
aroo+0.034
aro+0.032
 nowhere+0.030
t+0.028
hog+0.026
Neg logit contributions
rated-0.034
urg-0.032
aus-0.032
 decisions-0.032
 difference-0.032
Token
Token ID220
Feature activation-7.44e-01

Pos logit contributions
 french+0.006
 hamb+0.006
aus+0.006
 burgers+0.006
 gra+0.006
Neg logit contributions
 nowhere-0.005
aroo-0.005
t-0.005
aro-0.005
ite-0.005
Token
Token ID198
Feature activation-5.46e-01

Pos logit contributions
aroo+0.010
aro+0.009
 nowhere+0.008
t+0.007
hog+0.006
Neg logit contributions
urg-0.026
aus-0.025
rated-0.025
 decisions-0.025
 gag-0.025
Token
Token ID198
Feature activation-1.11e+00

Pos logit contributions
 nowhere+0.009
aroo+0.009
ite+0.009
aro+0.009
t+0.008
Neg logit contributions
 mental-0.014
urg-0.014
aus-0.014
 gra-0.014
 french-0.014
TokenMor
Token ID20044
Feature activation-2.65e-01

Pos logit contributions
aroo+0.023
 nowhere+0.023
aro+0.023
t+0.022
ite+0.021
Neg logit contributions
icious-0.026
aus-0.025
have-0.025
urg-0.025
 french-0.024
Tokenris
Token ID2442
Feature activation-4.35e+00

Pos logit contributions
aroo+0.007
aro+0.006
 nowhere+0.006
t+0.006
hog+0.005
Neg logit contributions
rated-0.006
urg-0.006
aus-0.006
 gag-0.006
icious-0.006
Token smiled
Token ID13541
Feature activation-3.59e-01

Pos logit contributions
aroo+0.101
aro+0.093
 nowhere+0.092
t+0.084
 Yes+0.081
Neg logit contributions
rated-0.099
urg-0.099
aus-0.098
eling-0.095
 degrees-0.093
Token and
Token ID290
Feature activation-5.92e-02

Pos logit contributions
aroo+0.006
aro+0.006
 nowhere+0.005
t+0.005
hog+0.004
Neg logit contributions
rated-0.011
urg-0.011
 gag-0.011
aus-0.011
 decisions-0.011
Token replied
Token ID8712
Feature activation-7.67e-01

Pos logit contributions
aroo+0.001
 nowhere+0.001
aro+0.001
t+0.001
ite+0.001
Neg logit contributions
urg-0.001
aus-0.001
rated-0.001
 hamb-0.001
 french-0.001
Token,
Token ID11
Feature activation+9.55e-01

Pos logit contributions
aroo+0.015
 nowhere+0.014
aro+0.014
t+0.013
ite+0.012
Neg logit contributions
urg-0.021
rated-0.020
 gag-0.020
 decisions-0.020
aus-0.020
Token "
Token ID366
Feature activation-8.60e-01

Pos logit contributions
urg+0.024
aus+0.023
 hamb+0.023
 alright+0.023
 ste+0.023
Neg logit contributions
aroo-0.020
 nowhere-0.019
aro-0.019
t-0.018
ite-0.018
Token a
Token ID257
Feature activation+1.24e+00

Pos logit contributions
aus+0.017
 gag+0.017
urg+0.017
 burgers+0.016
 hamb+0.016
Neg logit contributions
aroo-0.017
aro-0.017
t-0.015
hog-0.014
ite-0.014
Token strong
Token ID1913
Feature activation+5.39e-01

Pos logit contributions
rated+0.036
urg+0.035
 gag+0.035
aus+0.034
 decisions+0.034
Neg logit contributions
aroo-0.025
aro-0.022
 nowhere-0.021
t-0.020
hog-0.019
Token lizard
Token ID42406
Feature activation-7.12e-02

Pos logit contributions
rated+0.015
urg+0.015
 gag+0.015
aus+0.014
icious+0.014
Neg logit contributions
aroo-0.012
aro-0.010
 nowhere-0.010
t-0.009
hog-0.009
Token named
Token ID3706
Feature activation-1.07e-01

Pos logit contributions
aro+0.002
aroo+0.002
t+0.002
overed+0.001
 nowhere+0.001
Neg logit contributions
 sne-0.001
 gag-0.001
 hamb-0.001
 por-0.001
 burgers-0.001
Token T
Token ID309
Feature activation-1.26e+00

Pos logit contributions
aroo+0.002
aro+0.002
t+0.002
 nowhere+0.002
ite+0.002
Neg logit contributions
aus-0.003
urg-0.003
 hamb-0.003
 gag-0.003
rated-0.003
Tokenaz
Token ID1031
Feature activation-4.33e+00

Pos logit contributions
aroo+0.036
aro+0.035
 nowhere+0.032
t+0.031
hog+0.030
Neg logit contributions
urg-0.025
aus-0.025
rated-0.025
 gag-0.024
iver-0.023
Token.
Token ID13
Feature activation-1.01e+00

Pos logit contributions
aroo+0.106
aro+0.104
 nowhere+0.099
ite+0.098
hog+0.094
Neg logit contributions
rated-0.091
aus-0.090
urg-0.086
eling-0.083
iver-0.083
Token He
Token ID679
Feature activation-4.45e-01

Pos logit contributions
aroo+0.019
t+0.017
aro+0.017
 nowhere+0.017
't+0.016
Neg logit contributions
rated-0.029
 gag-0.028
icious-0.028
urg-0.027
 decisions-0.027
Token lived
Token ID5615
Feature activation-5.07e-01

Pos logit contributions
aroo+0.010
aro+0.010
t+0.008
ite+0.008
 nowhere+0.008
Neg logit contributions
 gag-0.011
rated-0.011
aus-0.011
urg-0.010
 hamb-0.010
Token happily
Token ID18177
Feature activation-1.94e+00

Pos logit contributions
aro+0.011
aroo+0.010
t+0.010
hog+0.009
ite+0.009
Neg logit contributions
 awake-0.013
 Esk-0.013
 tast-0.012
 ste-0.012
 ang-0.012
Token among
Token ID1871
Feature activation-2.31e+00

Pos logit contributions
aroo+0.034
aro+0.032
 nowhere+0.029
t+0.027
hog+0.027
Neg logit contributions
 gag-0.054
 hamb-0.054
rated-0.053
urg-0.053
 burgers-0.053
Token old
Token ID1468
Feature activation-1.84e+00

Pos logit contributions
urg+0.019
aus+0.018
 decisions+0.018
 burgers+0.017
rated+0.017
Neg logit contributions
aroo-0.035
aro-0.034
t-0.032
 nowhere-0.031
't-0.030
Token.
Token ID13
Feature activation-1.12e+00

Pos logit contributions
aroo+0.039
aro+0.038
t+0.033
 nowhere+0.032
't+0.029
Neg logit contributions
rated-0.049
 gag-0.049
 decisions-0.049
aus-0.049
 difference-0.048
Token One
Token ID1881
Feature activation+1.33e+00

Pos logit contributions
aroo+0.027
aro+0.024
 nowhere+0.022
itt+0.020
hog+0.020
Neg logit contributions
 difference-0.027
 idea-0.027
rated-0.027
 decisions-0.027
 mental-0.026
Token day
Token ID1110
Feature activation-2.54e-01

Pos logit contributions
rated+0.038
 gag+0.037
aus+0.037
urg+0.036
iver+0.035
Neg logit contributions
aroo-0.027
aro-0.026
 nowhere-0.023
t-0.022
hog-0.021
Token,
Token ID11
Feature activation-3.03e-01

Pos logit contributions
aroo+0.007
hog+0.007
aro+0.007
 nowhere+0.006
ite+0.006
Neg logit contributions
 alright-0.004
 Esk-0.004
 rum-0.004
 announcements-0.004
urg-0.004
Token Lay
Token ID18881
Feature activation-4.32e+00

Pos logit contributions
aroo+0.006
aro+0.006
hog+0.005
itt+0.005
 nowhere+0.005
Neg logit contributions
 Esk-0.008
 idea-0.007
 announcements-0.007
 alright-0.007
rated-0.007
Tokenla
Token ID5031
Feature activation-1.91e+00

Pos logit contributions
aroo+0.129
+0.124
aro+0.121
 nowhere+0.120
itt+0.118
Neg logit contributions
rated-0.075
urg-0.063
have-0.063
eling-0.059
 Esk-0.059
Token asked
Token ID1965
Feature activation+2.57e-01

Pos logit contributions
aroo+0.040
aro+0.037
 nowhere+0.033
t+0.032
hog+0.030
Neg logit contributions
rated-0.053
urg-0.052
 gag-0.051
aus-0.051
 decisions-0.050
Token her
Token ID607
Feature activation-1.50e+00

Pos logit contributions
rated+0.007
urg+0.007
aus+0.007
 gag+0.007
 decisions+0.007
Neg logit contributions
aroo-0.005
aro-0.005
 nowhere-0.004
t-0.004
hog-0.004
Token mum
Token ID25682
Feature activation+2.12e+00

Pos logit contributions
aroo+0.033
aro+0.031
t+0.029
 nowhere+0.028
ite+0.025
Neg logit contributions
 difference-0.039
 gag-0.039
urg-0.039
 hamb-0.038
 mental-0.038
Token for
Token ID329
Feature activation+8.89e-04

Pos logit contributions
rated+0.061
 gag+0.060
urg+0.059
icious+0.059
ksh+0.058
Neg logit contributions
aroo-0.042
aro-0.038
 nowhere-0.037
t-0.035
hog-0.032
Token and
Token ID290
Feature activation-1.10e+00

Pos logit contributions
aroo+0.002
aro+0.002
 nowhere+0.001
t+0.001
hog+0.001
Neg logit contributions
rated-0.003
urg-0.003
 gag-0.003
 decisions-0.003
aus-0.003
Token asked
Token ID1965
Feature activation+8.93e-01

Pos logit contributions
aroo+0.026
aro+0.024
 nowhere+0.023
t+0.022
hog+0.021
Neg logit contributions
rated-0.028
urg-0.027
aus-0.027
 gag-0.027
icious-0.027
Token:
Token ID25
Feature activation+1.45e+00

Pos logit contributions
rated+0.028
 gag+0.027
urg+0.027
aus+0.027
icious+0.027
Neg logit contributions
aroo-0.016
aro-0.015
 nowhere-0.013
t-0.012
hog-0.012
Token "
Token ID366
Feature activation-1.04e+00

Pos logit contributions
 alright+0.032
 hamb+0.032
 hotter+0.032
urg+0.031
 gag+0.031
Neg logit contributions
aroo-0.035
aro-0.034
ite-0.030
 nowhere-0.029
t-0.029
TokenWhy
Token ID5195
Feature activation+1.47e+00

Pos logit contributions
aroo+0.024
aro+0.023
 nowhere+0.021
t+0.021
hog+0.019
Neg logit contributions
rated-0.026
urg-0.026
 gag-0.026
aus-0.025
icious-0.025
Token hasn
Token ID5818
Feature activation-4.32e+00

Pos logit contributions
rated+0.034
ksh+0.034
urg+0.034
 gag+0.033
aus+0.033
Neg logit contributions
aroo-0.037
aro-0.035
t-0.034
 nowhere-0.033
't-0.032
Token't
Token ID470
Feature activation-3.49e-01

Pos logit contributions
aroo+0.048
aro+0.046
hog+0.039
uzz+0.038
 nowhere+0.034
Neg logit contributions
 alright-0.152
rated-0.150
aus-0.150
urg-0.148
iver-0.148
Token the
Token ID262
Feature activation-3.83e-01

Pos logit contributions
aroo+0.009
aro+0.008
hog+0.007
t+0.007
 nowhere+0.007
Neg logit contributions
rated-0.008
urg-0.008
 decisions-0.008
aus-0.008
 difference-0.008
Token wealthy
Token ID11574
Feature activation-1.12e+00

Pos logit contributions
aroo+0.008
aro+0.008
 nowhere+0.007
t+0.007
hog+0.006
Neg logit contributions
rated-0.011
 difference-0.010
 gag-0.010
 decisions-0.010
urg-0.010
Token flower
Token ID15061
Feature activation-1.54e+00

Pos logit contributions
aroo+0.026
aro+0.025
 nowhere+0.022
t+0.022
uzz+0.021
Neg logit contributions
rated-0.028
 difference-0.028
 decisions-0.028
 aroma-0.027
 deal-0.027
Token bloom
Token ID29955
Feature activation+7.14e-01

Pos logit contributions
aroo+0.035
aro+0.033
 nowhere+0.030
t+0.029
hog+0.028
Neg logit contributions
rated-0.039
urg-0.038
 gag-0.038
aus-0.038
 decisions-0.037
Token
Token ID26391
Feature activation-2.22e+00

Pos logit contributions
 gag+0.035
urg+0.034
 burgers+0.032
 french+0.032
 hamb+0.032
Neg logit contributions
aroo-0.043
aro-0.038
t-0.038
 nowhere-0.035
't-0.035
Token
Token ID198
Feature activation+8.33e-01

Pos logit contributions
aroo+0.049
aro+0.045
 nowhere+0.042
t+0.040
hog+0.038
Neg logit contributions
rated-0.058
urg-0.057
 gag-0.057
aus-0.057
 decisions-0.056
Token
Token ID198
Feature activation-4.46e-01

Pos logit contributions
aus+0.028
urg+0.028
rated+0.028
 gag+0.028
icious+0.027
Neg logit contributions
aroo-0.012
aro-0.010
 nowhere-0.009
t-0.008
hog-0.008
TokenAnd
Token ID1870
Feature activation-4.38e-01

Pos logit contributions
aroo+0.009
aro+0.008
 nowhere+0.008
t+0.007
hog+0.007
Neg logit contributions
aus-0.012
icious-0.012
rated-0.012
urg-0.012
 gag-0.012
Token so
Token ID523
Feature activation+7.91e-01

Pos logit contributions
aroo+0.008
aro+0.008
 nowhere+0.007
t+0.007
hog+0.007
Neg logit contributions
rated-0.012
 decisions-0.012
aus-0.012
urg-0.012
 gag-0.012
Token Beat
Token ID12568
Feature activation-4.31e+00

Pos logit contributions
rated+0.025
 gag+0.025
urg+0.025
aus+0.024
ksh+0.024
Neg logit contributions
aroo-0.014
aro-0.013
 nowhere-0.011
t-0.011
hog-0.010
Tokenrice
Token ID20970
Feature activation-1.38e+00

Pos logit contributions
 nowhere+0.086
itt+0.084
aroo+0.082
aro+0.075
ite+0.073
Neg logit contributions
rated-0.111
ksh-0.099
have-0.097
urg-0.097
 mental-0.097
Token had
Token ID550
Feature activation-8.08e-01

Pos logit contributions
aroo+0.028
aro+0.025
 nowhere+0.023
t+0.023
hog+0.021
Neg logit contributions
rated-0.039
 gag-0.039
urg-0.039
aus-0.038
icious-0.038
Token lots
Token ID6041
Feature activation-3.89e-01

Pos logit contributions
 nowhere+0.014
aroo+0.012
t+0.011
 she+0.010
aro+0.010
Neg logit contributions
rated-0.026
 gag-0.025
urg-0.025
ksh-0.024
 hotter-0.024
Token of
Token ID286
Feature activation-3.25e-01

Pos logit contributions
aroo+0.006
aro+0.005
 nowhere+0.005
t+0.004
hog+0.004
Neg logit contributions
rated-0.013
aus-0.013
urg-0.013
 gag-0.013
 decisions-0.013
Token fun
Token ID1257
Feature activation-1.49e+00

Pos logit contributions
aroo+0.008
aro+0.008
 nowhere+0.007
t+0.007
hog+0.007
Neg logit contributions
rated-0.008
urg-0.007
 gag-0.007
ksh-0.007
aus-0.007
Token were
Token ID547
Feature activation-2.54e-01

Pos logit contributions
aroo+0.014
aro+0.013
 nowhere+0.011
t+0.011
hog+0.010
Neg logit contributions
rated-0.022
urg-0.022
 gag-0.021
aus-0.021
 decisions-0.021
Token two
Token ID734
Feature activation+3.76e-01

Pos logit contributions
aroo+0.005
aro+0.005
 nowhere+0.004
t+0.004
hog+0.004
Neg logit contributions
urg-0.007
aus-0.007
rated-0.007
 gag-0.007
 burgers-0.006
Token friends
Token ID2460
Feature activation+2.99e-01

Pos logit contributions
rated+0.013
 gag+0.012
ksh+0.012
urg+0.012
aus+0.012
Neg logit contributions
aroo-0.006
aro-0.005
 nowhere-0.004
hog-0.004
t-0.004
Token,
Token ID11
Feature activation-2.19e-02

Pos logit contributions
urg+0.008
 burgers+0.007
 sne+0.007
 alright+0.007
iver+0.007
Neg logit contributions
aroo-0.006
aro-0.006
t-0.006
ite-0.005
overed-0.005
Token Ab
Token ID2275
Feature activation-3.31e+00

Pos logit contributions
aro+0.001
aroo+0.001
t+0.001
overed+0.000
ite+0.000
Neg logit contributions
 hamb-0.000
 burgers-0.000
 sne-0.000
aus-0.000
 french-0.000
Tokenig
Token ID328
Feature activation-4.31e+00

Pos logit contributions
aroo+0.086
aro+0.081
 nowhere+0.075
t+0.074
hog+0.071
Neg logit contributions
rated-0.074
urg-0.073
aus-0.072
 gag-0.071
icious-0.069
Tokenail
Token ID603
Feature activation-8.49e-01

Pos logit contributions
 Yes+0.104
 nowhere+0.103
 Spirit+0.094
 Inside+0.094
aroo+0.094
Neg logit contributions
urg-0.098
urger-0.085
eping-0.080
iver-0.080
 sne-0.079
Token and
Token ID290
Feature activation-1.15e+00

Pos logit contributions
aroo+0.018
aro+0.017
 nowhere+0.016
t+0.015
ite+0.014
Neg logit contributions
urg-0.022
aus-0.021
rated-0.021
 burgers-0.021
 hamb-0.021
Token Mat
Token ID6550
Feature activation-6.96e-01

Pos logit contributions
aroo+0.024
aro+0.023
t+0.021
 nowhere+0.021
hog+0.019
Neg logit contributions
aus-0.031
urg-0.030
 gag-0.030
rated-0.029
 hamb-0.029
Tokenty
Token ID774
Feature activation-5.93e-01

Pos logit contributions
aroo+0.020
aro+0.019
 nowhere+0.018
t+0.017
hog+0.016
Neg logit contributions
aus-0.014
urg-0.014
rated-0.014
iver-0.013
icious-0.013
Token.
Token ID13
Feature activation-1.05e+00

Pos logit contributions
aroo+0.013
aro+0.012
 nowhere+0.011
t+0.011
ite+0.010
Neg logit contributions
urg-0.016
rated-0.015
aus-0.015
 gag-0.015
 burgers-0.015
Token but
Token ID475
Feature activation-1.05e+00

Pos logit contributions
rated+0.010
urg+0.010
 gag+0.010
aus+0.010
icious+0.010
Neg logit contributions
aroo-0.007
aro-0.006
 nowhere-0.006
t-0.005
hog-0.005
Token she
Token ID673
Feature activation-1.72e+00

Pos logit contributions
aroo+0.017
aro+0.016
t+0.013
hog+0.013
 nowhere+0.013
Neg logit contributions
rated-0.033
aus-0.033
 gag-0.032
urg-0.032
 decisions-0.032
Token still
Token ID991
Feature activation-1.22e+00

Pos logit contributions
aroo+0.038
aro+0.035
 nowhere+0.032
t+0.031
hog+0.029
Neg logit contributions
rated-0.046
 gag-0.045
urg-0.045
aus-0.045
icious-0.043
Token hadn
Token ID8020
Feature activation-3.06e+00

Pos logit contributions
aroo+0.024
aro+0.022
 nowhere+0.021
t+0.020
hog+0.018
Neg logit contributions
rated-0.034
urg-0.034
aus-0.034
 gag-0.034
icious-0.033
Tokenâ
Token ID22940
Feature activation-3.40e-01

Pos logit contributions
aroo+0.043
aro+0.036
hog+0.029
 nowhere+0.029
uzz+0.023
Neg logit contributions
aus-0.105
 alright-0.103
iver-0.100
urg-0.100
icious-0.100
Token
Token ID26391
Feature activation-4.30e+00

Pos logit contributions
aroo+0.010
aro+0.010
t+0.009
 nowhere+0.009
't+0.009
Neg logit contributions
urg-0.006
 gag-0.006
 burgers-0.006
rated-0.006
 decisions-0.006
Token
Token ID8151
Feature activation-3.90e+00

Pos logit contributions
aroo+0.092
aro+0.082
 nowhere+0.073
hog+0.068
t+0.064
Neg logit contributions
urg-0.115
icious-0.115
iver-0.114
 gag-0.114
rated-0.114
Tokent
Token ID83
Feature activation-7.34e-01

Pos logit contributions
aroo+0.044
aro+0.037
 nowhere+0.036
 Spirit+0.024
hog+0.023
Neg logit contributions
aus-0.140
 alright-0.135
rated-0.131
icious-0.129
ksh-0.128
Token seen
Token ID1775
Feature activation-1.11e+00

Pos logit contributions
aroo+0.016
aro+0.015
 nowhere+0.015
t+0.014
hog+0.013
Neg logit contributions
urg-0.019
rated-0.019
aus-0.019
 gag-0.019
ksh-0.019
Token a
Token ID257
Feature activation-6.22e-01

Pos logit contributions
aroo+0.025
aro+0.023
t+0.021
 nowhere+0.021
hog+0.020
Neg logit contributions
rated-0.028
urg-0.028
aus-0.027
 gag-0.027
 decisions-0.027
Token purple
Token ID14032
Feature activation-1.80e+00

Pos logit contributions
aroo+0.016
aro+0.015
t+0.013
 nowhere+0.013
uzz+0.013
Neg logit contributions
rated-0.014
 difference-0.014
 hamb-0.014
 gag-0.013
 deal-0.013
Token smiled
Token ID13541
Feature activation-8.08e-01

Pos logit contributions
aroo+0.003
aro+0.003
 nowhere+0.002
t+0.002
hog+0.002
Neg logit contributions
rated-0.003
aus-0.003
urg-0.003
 gag-0.003
 difference-0.003
Token,
Token ID11
Feature activation+2.25e+00

Pos logit contributions
aroo+0.015
aro+0.014
 nowhere+0.012
t+0.012
hog+0.011
Neg logit contributions
rated-0.024
urg-0.024
 gag-0.024
aus-0.024
 decisions-0.023
Token knowing
Token ID6970
Feature activation-9.92e-01

Pos logit contributions
rated+0.046
aus+0.046
 tast+0.046
urg+0.045
 ste+0.045
Neg logit contributions
aroo-0.062
aro-0.059
t-0.054
hog-0.050
ite-0.050
Token she
Token ID673
Feature activation+8.90e-01

Pos logit contributions
aroo+0.016
aro+0.015
t+0.013
ite+0.013
hog+0.013
Neg logit contributions
 alright-0.030
 decisions-0.030
 difference-0.030
 healthier-0.030
 tast-0.029
Token had
Token ID550
Feature activation-8.25e-01

Pos logit contributions
rated+0.029
urg+0.028
 gag+0.028
aus+0.028
 decisions+0.027
Neg logit contributions
aroo-0.015
aro-0.013
 nowhere-0.012
t-0.012
hog-0.010
Token mastered
Token ID33626
Feature activation-4.29e+00

Pos logit contributions
aroo+0.021
 nowhere+0.020
aro+0.019
t+0.019
't+0.017
Neg logit contributions
rated-0.019
 gag-0.019
urg-0.019
ksh-0.018
aus-0.018
Token a
Token ID257
Feature activation-1.10e+00

Pos logit contributions
aroo+0.085
aro+0.076
t+0.069
 nowhere+0.068
hog+0.065
Neg logit contributions
urg-0.121
 decisions-0.120
 gag-0.119
rated-0.119
aus-0.118
Token new
Token ID649
Feature activation-3.02e+00

Pos logit contributions
aroo+0.027
aro+0.025
t+0.024
 nowhere+0.023
hog+0.022
Neg logit contributions
urg-0.026
rated-0.026
 gag-0.026
aus-0.025
ksh-0.025
Token way
Token ID835
Feature activation-3.03e+00

Pos logit contributions
aroo+0.079
aro+0.074
 nowhere+0.069
t+0.067
hog+0.064
Neg logit contributions
rated-0.068
urg-0.066
 gag-0.066
aus-0.065
 decisions-0.064
Token to
Token ID284
Feature activation+3.25e-01

Pos logit contributions
aroo+0.063
aro+0.057
 nowhere+0.053
hog+0.047
t+0.046
Neg logit contributions
 decisions-0.082
rated-0.079
 alright-0.079
 difference-0.078
urg-0.076
Token have
Token ID423
Feature activation-1.91e+00

Pos logit contributions
ksh+0.008
rated+0.008
urg+0.008
 gag+0.008
aus+0.008
Neg logit contributions
aroo-0.008
aro-0.007
 nowhere-0.007
t-0.007
't-0.006
Token his
Token ID465
Feature activation+6.51e-01

Pos logit contributions
rated+0.026
urg+0.026
 gag+0.026
aus+0.026
icious+0.025
Neg logit contributions
aroo-0.011
aro-0.009
 nowhere-0.008
t-0.008
hog-0.007
Token mom
Token ID1995
Feature activation+1.21e+00

Pos logit contributions
rated+0.019
urg+0.019
 gag+0.019
 decisions+0.018
aus+0.018
Neg logit contributions
aroo-0.013
aro-0.012
 nowhere-0.011
t-0.010
hog-0.010
Tokenmy
Token ID1820
Feature activation+3.99e-01

Pos logit contributions
rated+0.023
aus+0.022
urg+0.022
 gag+0.021
 decisions+0.021
Neg logit contributions
aroo-0.035
aro-0.034
 nowhere-0.031
t-0.030
hog-0.029
Token hadn
Token ID8020
Feature activation-1.96e+00

Pos logit contributions
rated+0.012
 gag+0.012
urg+0.012
aus+0.011
icious+0.011
Neg logit contributions
aroo-0.008
aro-0.007
 nowhere-0.007
t-0.006
hog-0.006
Tokenâ
Token ID22940
Feature activation+2.49e-01

Pos logit contributions
aroo+0.021
aro+0.017
 nowhere+0.014
hog+0.012
uzz+0.009
Neg logit contributions
aus-0.073
iver-0.072
urg-0.072
rated-0.071
 alright-0.071
Token
Token ID26391
Feature activation-4.27e+00

Pos logit contributions
urg+0.006
 gag+0.006
 burgers+0.005
 decisions+0.005
 degrees+0.005
Neg logit contributions
aroo-0.007
t-0.006
aro-0.006
 nowhere-0.006
't-0.006
Token
Token ID8151
Feature activation-3.40e+00

Pos logit contributions
aroo+0.089
aro+0.080
 nowhere+0.071
hog+0.065
t+0.063
Neg logit contributions
urg-0.117
rated-0.117
 gag-0.116
aus-0.115
icious-0.115
Tokent
Token ID83
Feature activation+8.23e-01

Pos logit contributions
aroo+0.035
aro+0.030
 nowhere+0.024
 Spirit+0.016
hog+0.016
Neg logit contributions
aus-0.127
 alright-0.123
rated-0.121
ksh-0.119
icious-0.119
Token ignored
Token ID9514
Feature activation-1.38e+00

Pos logit contributions
urg+0.021
rated+0.021
 gag+0.021
aus+0.021
icious+0.020
Neg logit contributions
aroo-0.018
aro-0.017
 nowhere-0.017
t-0.017
't-0.015
Token him
Token ID683
Feature activation-4.54e-01

Pos logit contributions
aroo+0.027
aro+0.025
 nowhere+0.024
t+0.022
't+0.021
Neg logit contributions
rated-0.041
 gag-0.039
urg-0.039
ksh-0.039
icious-0.038
Token.
Token ID13
Feature activation+1.14e-02

Pos logit contributions
aroo+0.006
aro+0.006
t+0.005
 nowhere+0.005
hog+0.005
Neg logit contributions
rated-0.015
urg-0.015
 gag-0.015
aus-0.015
ksh-0.015
Token so
Token ID523
Feature activation+2.21e-02

Pos logit contributions
rated+0.037
 gag+0.037
aus+0.037
urg+0.036
 decisions+0.036
Neg logit contributions
aroo-0.029
aro-0.027
 nowhere-0.024
t-0.024
hog-0.022
Token he
Token ID339
Feature activation+2.25e-01

Pos logit contributions
rated+0.001
 gag+0.001
 decisions+0.001
aus+0.001
urg+0.001
Neg logit contributions
aroo-0.000
aro-0.000
 nowhere-0.000
t-0.000
hog-0.000
Token promised
Token ID8072
Feature activation-1.06e+00

Pos logit contributions
rated+0.005
aus+0.005
urg+0.005
 gag+0.005
 hamb+0.005
Neg logit contributions
aroo-0.005
aro-0.005
 nowhere-0.005
t-0.005
hog-0.004
Token to
Token ID284
Feature activation+5.19e-01

Pos logit contributions
aroo+0.020
aro+0.018
 nowhere+0.017
ite+0.016
hog+0.015
Neg logit contributions
 decisions-0.030
aus-0.030
 alright-0.029
urg-0.029
rated-0.029
Token be
Token ID307
Feature activation+5.69e-01

Pos logit contributions
rated+0.011
urg+0.011
 gag+0.011
aus+0.011
 decisions+0.010
Neg logit contributions
aroo-0.014
aro-0.013
 nowhere-0.013
t-0.012
hog-0.012
Token Beat
Token ID12568
Feature activation-4.26e+00

Pos logit contributions
rated+0.013
urg+0.013
 gag+0.013
aus+0.012
ksh+0.012
Neg logit contributions
aroo-0.015
aro-0.014
 nowhere-0.014
t-0.013
hog-0.012
Tokenrice
Token ID20970
Feature activation-2.20e+00

Pos logit contributions
aroo+0.074
aro+0.071
 nowhere+0.070
ite+0.057
t+0.055
Neg logit contributions
rated-0.126
 gag-0.120
ksh-0.119
 decisions-0.118
 deal-0.117
Token's
Token ID338
Feature activation-7.93e-01

Pos logit contributions
aroo+0.041
aro+0.038
 nowhere+0.034
t+0.032
hog+0.030
Neg logit contributions
rated-0.065
 gag-0.064
urg-0.064
aus-0.064
 decisions-0.063
Token friend
Token ID1545
Feature activation-1.01e+00

Pos logit contributions
aroo+0.019
aro+0.017
 nowhere+0.016
t+0.016
hog+0.015
Neg logit contributions
rated-0.020
urg-0.020
 gag-0.020
aus-0.019
ksh-0.019
Token.
Token ID13
Feature activation-5.73e-01

Pos logit contributions
aroo+0.018
aro+0.017
 nowhere+0.015
t+0.015
hog+0.013
Neg logit contributions
rated-0.031
urg-0.030
 gag-0.030
aus-0.030
 decisions-0.029
Token From
Token ID3574
Feature activation-1.86e+00

Pos logit contributions
aroo+0.014
aro+0.013
 nowhere+0.012
t+0.011
itt+0.010
Neg logit contributions
 mental-0.014
ksh-0.013
aus-0.013
 difference-0.013
 decisions-0.013
Token do
Token ID466
Feature activation-1.85e+00

Pos logit contributions
rated+0.005
urg+0.005
ksh+0.005
 gag+0.005
aus+0.005
Neg logit contributions
aroo-0.006
aro-0.006
 nowhere-0.005
t-0.005
't-0.005
Token things
Token ID1243
Feature activation-1.02e+00

Pos logit contributions
aroo+0.026
aro+0.026
 nowhere+0.023
t+0.021
 that+0.020
Neg logit contributions
rated-0.063
ksh-0.063
 gag-0.060
icious-0.060
aus-0.060
Token he
Token ID339
Feature activation+1.22e+00

Pos logit contributions
aroo+0.017
aro+0.015
t+0.014
 nowhere+0.013
't+0.012
Neg logit contributions
rated-0.033
 gag-0.033
urg-0.033
ksh-0.033
aus-0.032
Token couldn
Token ID3521
Feature activation-1.19e+00

Pos logit contributions
urg+0.039
 gag+0.038
 hotter+0.037
rated+0.037
ksh+0.037
Neg logit contributions
aroo-0.021
t-0.018
aro-0.018
 nowhere-0.018
hog-0.016
Tokenâ
Token ID22940
Feature activation+4.97e-02

Pos logit contributions
aroo+0.007
aro+0.006
t+0.004
 nowhere+0.003
't+0.002
Neg logit contributions
rated-0.051
 gag-0.050
urg-0.050
aus-0.049
ksh-0.049
Token
Token ID26391
Feature activation-4.25e+00

Pos logit contributions
urg+0.001
 gag+0.001
 hotter+0.001
 degrees+0.001
 burgers+0.001
Neg logit contributions
aroo-0.001
t-0.001
't-0.001
aro-0.001
 nowhere-0.001
Token
Token ID8151
Feature activation-3.71e+00

Pos logit contributions
aroo+0.088
aro+0.082
 nowhere+0.075
t+0.074
hog+0.067
Neg logit contributions
rated-0.118
urg-0.115
 gag-0.115
aus-0.114
 decisions-0.111
Tokent
Token ID83
Feature activation+5.69e-02

Pos logit contributions
aroo+0.031
aro+0.025
 nowhere+0.024
hog+0.013
uzz+0.010
Neg logit contributions
aus-0.146
rated-0.145
iver-0.142
urg-0.142
 gag-0.141
Token do
Token ID466
Feature activation-1.24e+00

Pos logit contributions
rated+0.001
urg+0.001
 gag+0.001
aus+0.001
ksh+0.001
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.001
t-0.001
hog-0.001
Token without
Token ID1231
Feature activation-1.06e-01

Pos logit contributions
aroo+0.024
aro+0.022
 nowhere+0.020
t+0.019
hog+0.018
Neg logit contributions
rated-0.036
urg-0.036
 gag-0.035
aus-0.035
 decisions-0.035
Token it
Token ID340
Feature activation-9.19e-01

Pos logit contributions
aroo+0.003
aro+0.002
 nowhere+0.002
t+0.002
't+0.002
Neg logit contributions
rated-0.003
ksh-0.003
urg-0.003
icious-0.003
aus-0.003
Token today
Token ID1909
Feature activation+5.77e-01

Pos logit contributions
urg+0.069
 gag+0.069
rated+0.069
aus+0.068
icious+0.067
Neg logit contributions
aroo-0.027
t-0.024
aro-0.023
 nowhere-0.022
't-0.020
Token.
Token ID13
Feature activation+1.75e-01

Pos logit contributions
urg+0.020
 gag+0.020
rated+0.020
aus+0.020
ksh+0.019
Neg logit contributions
aroo-0.007
t-0.007
 nowhere-0.007
aro-0.007
't-0.006
Token She
Token ID1375
Feature activation+6.55e-01

Pos logit contributions
rated+0.005
 gag+0.005
aus+0.005
urg+0.005
ksh+0.005
Neg logit contributions
aroo-0.003
aro-0.003
 nowhere-0.003
t-0.003
hog-0.003
Token has
Token ID468
Feature activation-2.62e-01

Pos logit contributions
urg+0.016
rated+0.015
 gag+0.015
icious+0.015
ksh+0.015
Neg logit contributions
aroo-0.016
t-0.014
aro-0.014
 nowhere-0.014
't-0.013
Token a
Token ID257
Feature activation-3.63e-01

Pos logit contributions
aroo+0.005
aro+0.005
 nowhere+0.005
t+0.004
hog+0.004
Neg logit contributions
rated-0.008
urg-0.007
 gag-0.007
aus-0.007
icious-0.007
Token fever
Token ID17372
Feature activation+3.59e+00

Pos logit contributions
aroo+0.010
aro+0.009
 nowhere+0.009
t+0.009
hog+0.008
Neg logit contributions
rated-0.008
urg-0.008
aus-0.008
 gag-0.007
ksh-0.007
Token and
Token ID290
Feature activation-1.02e+00

Pos logit contributions
urg+0.130
icious+0.128
 gag+0.126
rated+0.126
aus+0.125
Neg logit contributions
aroo-0.043
t-0.042
 nowhere-0.037
't-0.036
 that-0.034
Token a
Token ID257
Feature activation+5.82e-01

Pos logit contributions
aroo+0.018
aro+0.016
 nowhere+0.016
t+0.015
't+0.013
Neg logit contributions
rated-0.031
urg-0.031
icious-0.030
aus-0.030
ksh-0.030
Token sore
Token ID19597
Feature activation+1.93e+00

Pos logit contributions
rated+0.014
urg+0.014
aus+0.013
ksh+0.013
icious+0.013
Neg logit contributions
aroo-0.015
 nowhere-0.014
t-0.013
 she-0.013
aro-0.013
Token throat
Token ID13589
Feature activation+2.46e+00

Pos logit contributions
 gag+0.037
rated+0.037
urg+0.035
aus+0.035
iver+0.034
Neg logit contributions
aroo-0.057
aro-0.055
 nowhere-0.051
t-0.049
hog-0.048
Token.
Token ID13
Feature activation+7.75e-01

Pos logit contributions
aus+0.085
urg+0.085
icious+0.084
ksh+0.083
rated+0.083
Neg logit contributions
aroo-0.033
t-0.032
aro-0.028
 nowhere-0.028
't-0.027
Token.
Token ID13
Feature activation+4.24e-01

Pos logit contributions
rated+0.005
 gag+0.005
urg+0.005
aus+0.004
 decisions+0.004
Neg logit contributions
aroo-0.003
aro-0.003
 nowhere-0.003
t-0.003
hog-0.003
Token It
Token ID632
Feature activation-5.88e-01

Pos logit contributions
 gag+0.012
urg+0.012
aus+0.012
rated+0.012
ksh+0.012
Neg logit contributions
aroo-0.008
aro-0.008
 nowhere-0.007
t-0.007
ite-0.006
Token bar
Token ID2318
Feature activation+3.95e-01

Pos logit contributions
aroo+0.016
aro+0.016
 nowhere+0.015
t+0.014
hog+0.014
Neg logit contributions
rated-0.012
urg-0.012
aus-0.012
 gag-0.012
icious-0.011
Tokenks
Token ID591
Feature activation+1.65e+00

Pos logit contributions
rated+0.010
urg+0.010
aus+0.010
 gag+0.010
icious+0.010
Neg logit contributions
aroo-0.009
aro-0.009
 nowhere-0.008
t-0.008
hog-0.007
Token and
Token ID290
Feature activation+4.91e-01

Pos logit contributions
rated+0.048
urg+0.047
 gag+0.047
aus+0.046
 decisions+0.045
Neg logit contributions
aroo-0.032
aro-0.030
 nowhere-0.027
t-0.026
hog-0.024
Token grow
Token ID1663
Feature activation+3.80e+00

Pos logit contributions
rated+0.010
urg+0.010
 gag+0.010
aus+0.010
icious+0.009
Neg logit contributions
aroo-0.014
aro-0.013
 nowhere-0.012
t-0.012
hog-0.011
Tokenls
Token ID7278
Feature activation+1.55e+00

Pos logit contributions
icious+0.148
aus+0.148
urger+0.146
 hamb+0.146
 french+0.146
Neg logit contributions
aro-0.034
aroo-0.033
hog-0.025
t-0.024
 nowhere-0.022
Token.
Token ID13
Feature activation+1.01e+00

Pos logit contributions
rated+0.052
aus+0.051
urg+0.051
 gag+0.050
icious+0.050
Neg logit contributions
aroo-0.024
aro-0.021
 nowhere-0.020
t-0.019
hog-0.017
Token Lily
Token ID20037
Feature activation-3.43e-01

Pos logit contributions
aus+0.028
 gag+0.028
urg+0.028
 mental+0.027
 microwave+0.027
Neg logit contributions
aroo-0.020
aro-0.019
 nowhere-0.016
t-0.016
ite-0.015
Token and
Token ID290
Feature activation+1.54e+00

Pos logit contributions
aroo+0.008
aro+0.007
 nowhere+0.007
t+0.007
uzz+0.006
Neg logit contributions
 gag-0.009
rated-0.009
urg-0.008
aus-0.008
 difference-0.008
Token Ben
Token ID3932
Feature activation+1.15e+00

Pos logit contributions
rated+0.058
icious+0.055
urg+0.054
iver+0.054
ksh+0.054
Neg logit contributions
aroo-0.019
aro-0.015
 nowhere-0.015
 she-0.014
 Spike-0.013
Token good
Token ID922
Feature activation-6.17e-01

Pos logit contributions
rated+0.001
 gag+0.001
urg+0.001
aus+0.001
 decisions+0.001
Neg logit contributions
aroo-0.001
aro-0.001
 nowhere-0.001
t-0.001
't-0.001
Token at
Token ID379
Feature activation-1.74e-01

Pos logit contributions
aroo+0.011
aro+0.010
 nowhere+0.009
t+0.009
't+0.008
Neg logit contributions
rated-0.019
 gag-0.019
aus-0.018
urg-0.018
icious-0.018
Token j
Token ID474
Feature activation-1.80e-01

Pos logit contributions
aroo+0.004
aro+0.003
 nowhere+0.003
t+0.003
hog+0.003
Neg logit contributions
rated-0.005
 gag-0.005
urg-0.005
aus-0.005
icious-0.005
Tokenogging
Token ID30853
Feature activation-1.13e+00

Pos logit contributions
aroo+0.005
aro+0.004
 nowhere+0.004
t+0.004
hog+0.004
Neg logit contributions
rated-0.004
urg-0.004
aus-0.004
 gag-0.004
 decisions-0.004
Token.
Token ID13
Feature activation+2.06e-01

Pos logit contributions
aroo+0.020
aro+0.019
 nowhere+0.017
t+0.016
hog+0.015
Neg logit contributions
rated-0.035
urg-0.035
 gag-0.034
aus-0.034
icious-0.034
Token How
Token ID1374
Feature activation+3.66e+00

Pos logit contributions
rated+0.005
aus+0.005
urg+0.005
 gag+0.005
ksh+0.005
Neg logit contributions
aroo-0.005
aro-0.004
 nowhere-0.004
t-0.004
hog-0.004
Token do
Token ID466
Feature activation-5.41e-01

Pos logit contributions
urg+0.097
 hotter+0.096
ksh+0.094
eling+0.093
rated+0.093
Neg logit contributions
t-0.081
 Yes-0.070
aroo-0.068
't-0.068
 that-0.067
Token you
Token ID345
Feature activation-1.79e-01

Pos logit contributions
aroo+0.008
aro+0.007
 nowhere+0.006
t+0.006
hog+0.005
Neg logit contributions
rated-0.019
urg-0.018
 gag-0.018
aus-0.018
icious-0.018
Token do
Token ID466
Feature activation+1.01e-01

Pos logit contributions
't+0.004
t+0.004
aroo+0.004
aro+0.003
+0.003
Neg logit contributions
urg-0.005
 gag-0.005
 decisions-0.005
ksh-0.005
 degrees-0.005
Token it
Token ID340
Feature activation+5.23e-01

Pos logit contributions
rated+0.003
urg+0.003
 gag+0.003
aus+0.003
ksh+0.003
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.001
t-0.001
't-0.001
Token?"
Token ID1701
Feature activation+7.11e-01

Pos logit contributions
 gag+0.017
 aroma+0.017
 decisions+0.017
rated+0.016
urger+0.016
Neg logit contributions
aroo-0.008
't-0.008
t-0.008
aro-0.007
 nowhere-0.007
Token they
Token ID484
Feature activation+1.12e+00

Pos logit contributions
rated+0.044
 gag+0.043
urg+0.043
aus+0.042
iver+0.042
Neg logit contributions
aroo-0.019
aro-0.017
 nowhere-0.014
t-0.014
't-0.013
Token will
Token ID481
Feature activation+7.46e-01

Pos logit contributions
urg+0.028
ksh+0.027
 gag+0.027
aus+0.027
rated+0.027
Neg logit contributions
aroo-0.026
t-0.024
aro-0.024
't-0.023
 nowhere-0.022
Token make
Token ID787
Feature activation+1.29e+00

Pos logit contributions
ksh+0.017
rated+0.017
aus+0.017
 gag+0.016
urg+0.016
Neg logit contributions
aroo-0.018
aro-0.017
't-0.017
t-0.017
 nowhere-0.016
Token the
Token ID262
Feature activation+6.79e-01

Pos logit contributions
rated+0.045
 gag+0.041
ksh+0.041
iver+0.040
aus+0.040
Neg logit contributions
aroo-0.020
aro-0.019
 that-0.017
't-0.016
 nowhere-0.016
Token earth
Token ID4534
Feature activation+4.14e-01

Pos logit contributions
aus+0.019
urg+0.019
 gag+0.019
ksh+0.018
rated+0.018
Neg logit contributions
aroo-0.014
 nowhere-0.012
aro-0.012
hog-0.011
t-0.011
Token taste
Token ID6938
Feature activation+3.46e+00

Pos logit contributions
aus+0.011
rated+0.010
 gag+0.010
ksh+0.010
icious+0.010
Neg logit contributions
aroo-0.009
aro-0.008
t-0.008
hog-0.008
't-0.008
Token delicious
Token ID12625
Feature activation+1.41e+00

Pos logit contributions
 gag+0.093
ksh+0.088
urg+0.087
rated+0.087
aus+0.087
Neg logit contributions
t-0.072
aroo-0.071
 that-0.068
aro-0.067
't-0.064
Token!"
Token ID2474
Feature activation+5.62e-01

Pos logit contributions
 gag+0.045
rated+0.044
 ang+0.044
aus+0.043
iver+0.043
Neg logit contributions
aroo-0.023
aro-0.021
t-0.019
 that-0.018
 nowhere-0.017
Token Bird
Token ID14506
Feature activation+3.07e-01

Pos logit contributions
 gag+0.017
urg+0.017
aus+0.016
rated+0.016
 decisions+0.016
Neg logit contributions
aroo-0.009
 nowhere-0.009
aro-0.009
t-0.008
ite-0.007
Token disagreed
Token ID27884
Feature activation-1.29e+00

Pos logit contributions
rated+0.009
urg+0.008
 gag+0.008
aus+0.008
 decisions+0.008
Neg logit contributions
aroo-0.006
aro-0.006
 nowhere-0.005
t-0.005
hog-0.005
Token,
Token ID11
Feature activation+1.28e+00

Pos logit contributions
aroo+0.028
aro+0.027
 nowhere+0.023
ite+0.023
hog+0.020
Neg logit contributions
rated-0.033
urg-0.033
 alright-0.032
aus-0.032
 gag-0.032
Token his
Token ID465
Feature activation+5.20e-01

Pos logit contributions
 nowhere+0.015
aroo+0.014
aro+0.012
't+0.011
t+0.010
Neg logit contributions
rated-0.042
ksh-0.041
icious-0.040
urg-0.039
aus-0.039
Token garden
Token ID11376
Feature activation-6.29e-01

Pos logit contributions
rated+0.015
urg+0.014
aus+0.014
 gag+0.014
icious+0.014
Neg logit contributions
aroo-0.011
aro-0.010
 nowhere-0.009
t-0.009
hog-0.008
Token and
Token ID290
Feature activation-1.09e+00

Pos logit contributions
aroo+0.007
 nowhere+0.006
aro+0.006
t+0.005
hog+0.005
Neg logit contributions
rated-0.023
 gag-0.023
icious-0.023
ksh-0.023
urg-0.023
Token watched
Token ID7342
Feature activation+7.05e-01

Pos logit contributions
aroo+0.028
aro+0.027
 nowhere+0.025
t+0.024
't+0.023
Neg logit contributions
rated-0.024
urg-0.024
aus-0.024
icious-0.023
ksh-0.023
Token it
Token ID340
Feature activation+5.86e-01

Pos logit contributions
rated+0.024
aus+0.023
icious+0.022
iver+0.022
ksh+0.022
Neg logit contributions
aroo-0.011
aro-0.010
 nowhere-0.009
 that-0.008
't-0.008
Token grow
Token ID1663
Feature activation+4.54e+00

Pos logit contributions
aus+0.017
ksh+0.017
rated+0.016
ior+0.016
urger+0.016
Neg logit contributions
 she-0.011
 with-0.010
aroo-0.010
 that-0.009
 in-0.009
Token each
Token ID1123
Feature activation+6.15e-01

Pos logit contributions
 gag+0.173
icious+0.169
 appointment+0.165
aus+0.163
 deal+0.163
Neg logit contributions
 in-0.046
 with-0.037
 she-0.033
aroo-0.032
 that-0.031
Token day
Token ID1110
Feature activation+1.61e+00

Pos logit contributions
urg+0.020
ksh+0.020
 gag+0.020
rated+0.019
aus+0.019
Neg logit contributions
aroo-0.009
aro-0.008
 nowhere-0.007
t-0.007
 she-0.007
Token.
Token ID13
Feature activation+1.04e-01

Pos logit contributions
rated+0.062
aus+0.060
 gag+0.060
icious+0.060
iver+0.059
Neg logit contributions
t-0.014
aroo-0.012
 nowhere-0.012
 she-0.011
 that-0.010
Token
Token ID220
Feature activation-4.34e-01

Pos logit contributions
aus+0.003
 microwave+0.003
 burgers+0.003
ksh+0.003
 hamb+0.003
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.002
t-0.002
hog-0.002
Token
Token ID198
Feature activation+2.21e-01

Pos logit contributions
aroo+0.007
aro+0.006
 nowhere+0.005
hog+0.005
t+0.005
Neg logit contributions
aus-0.014
urg-0.014
iver-0.013
rated-0.013
 decisions-0.013
Token.
Token ID13
Feature activation-8.14e-01

Pos logit contributions
aroo+0.031
aro+0.030
 nowhere+0.025
t+0.024
ite+0.023
Neg logit contributions
 burgers-0.032
rated-0.032
 gag-0.031
 hamb-0.031
 decisions-0.030
Token They
Token ID1119
Feature activation-2.97e-01

Pos logit contributions
aroo+0.020
aro+0.019
 nowhere+0.017
t+0.016
ite+0.015
Neg logit contributions
rated-0.020
 gag-0.020
aus-0.019
 burgers-0.019
ksh-0.019
Token saw
Token ID2497
Feature activation+1.18e+00

Pos logit contributions
aroo+0.008
aro+0.007
 nowhere+0.007
t+0.006
hog+0.006
Neg logit contributions
rated-0.006
 sne-0.006
aus-0.006
 hamb-0.006
 gag-0.006
Token a
Token ID257
Feature activation+5.00e-01

Pos logit contributions
rated+0.031
 gag+0.031
urg+0.031
aus+0.031
 decisions+0.030
Neg logit contributions
aroo-0.025
aro-0.024
t-0.021
 nowhere-0.021
hog-0.020
Token big
Token ID1263
Feature activation+1.03e+00

Pos logit contributions
rated+0.013
urg+0.012
 gag+0.012
aus+0.012
 decisions+0.012
Neg logit contributions
aroo-0.012
aro-0.011
 nowhere-0.010
t-0.010
hog-0.009
Token bird
Token ID6512
Feature activation+1.49e+00

Pos logit contributions
rated+0.032
 gag+0.032
urg+0.032
aus+0.031
ksh+0.031
Neg logit contributions
aroo-0.019
aro-0.016
 nowhere-0.015
hog-0.014
t-0.013
Token with
Token ID351
Feature activation-1.43e+00

Pos logit contributions
iver+0.041
rated+0.039
urg+0.039
 gag+0.039
 sne+0.039
Neg logit contributions
aroo-0.028
aro-0.027
t-0.024
 nowhere-0.023
uzz-0.022
Token many
Token ID867
Feature activation-2.31e-01

Pos logit contributions
aroo+0.035
aro+0.033
t+0.030
hog+0.029
 nowhere+0.029
Neg logit contributions
 gag-0.034
urg-0.033
aus-0.032
 hamb-0.032
 french-0.032
Token feathers
Token ID27400
Feature activation+7.81e-01

Pos logit contributions
aroo+0.006
aro+0.005
t+0.005
 nowhere+0.005
hog+0.005
Neg logit contributions
 gag-0.005
rated-0.005
 decisions-0.005
urg-0.005
aus-0.005
Token.
Token ID13
Feature activation+4.76e-02

Pos logit contributions
urg+0.020
rated+0.020
 gag+0.020
aus+0.020
iver+0.020
Neg logit contributions
aroo-0.017
aro-0.015
 nowhere-0.015
t-0.013
hog-0.013
Token The
Token ID383
Feature activation-3.62e-01

Pos logit contributions
aus+0.001
 burgers+0.001
 gag+0.001
ksh+0.001
 hamb+0.001
Neg logit contributions
aroo-0.001
aro-0.001
 nowhere-0.001
t-0.001
hog-0.001
Token knew
Token ID2993
Feature activation-4.53e-01

Pos logit contributions
rated+0.016
 gag+0.016
aus+0.016
urg+0.016
ksh+0.015
Neg logit contributions
aroo-0.009
aro-0.009
 nowhere-0.007
t-0.007
hog-0.007
Token how
Token ID703
Feature activation+1.06e+00

Pos logit contributions
aroo+0.009
aro+0.008
t+0.007
 nowhere+0.007
hog+0.007
Neg logit contributions
aus-0.012
urg-0.012
 burgers-0.012
 gag-0.012
 decisions-0.012
Token to
Token ID284
Feature activation+6.23e-02

Pos logit contributions
rated+0.037
urg+0.037
 gag+0.036
aus+0.036
icious+0.036
Neg logit contributions
aroo-0.015
aro-0.013
 nowhere-0.011
t-0.011
hog-0.009
Token make
Token ID787
Feature activation-5.43e-02

Pos logit contributions
rated+0.001
ksh+0.001
 decisions+0.001
aus+0.001
 unnecessary+0.001
Neg logit contributions
aroo-0.002
't-0.002
t-0.002
aro-0.002
 that-0.002
Token real
Token ID1103
Feature activation-1.00e+00

Pos logit contributions
 that+0.001
 it+0.001
 she+0.001
 nowhere+0.001
aroo+0.001
Neg logit contributions
rated-0.002
ksh-0.002
icious-0.002
aus-0.002
iver-0.002
Token soup
Token ID17141
Feature activation+1.28e+00

Pos logit contributions
aroo+0.021
 that+0.020
 with+0.020
 she+0.019
 they+0.019
Neg logit contributions
rated-0.028
aus-0.027
icious-0.026
ksh-0.026
 mental-0.025
Token.
Token ID13
Feature activation+4.52e-01

Pos logit contributions
rated+0.050
aus+0.050
ksh+0.048
 gag+0.047
 mischief+0.047
Neg logit contributions
 that-0.010
 she-0.010
aroo-0.009
 with-0.009
t-0.009
Token
Token ID198
Feature activation+6.21e-01

Pos logit contributions
 gag+0.010
ksh+0.010
 mental+0.009
aus+0.009
 microwave+0.009
Neg logit contributions
aroo-0.013
aro-0.013
t-0.010
ite-0.010
 nowhere-0.010
Token
Token ID198
Feature activation-3.71e-01

Pos logit contributions
aus+0.019
 mental+0.019
urg+0.019
iver+0.019
 gag+0.019
Neg logit contributions
aroo-0.011
aro-0.010
 nowhere-0.008
t-0.007
hog-0.007
Token"
Token ID1
Feature activation-2.92e-01

Pos logit contributions
aroo+0.008
aro+0.008
 nowhere+0.007
t+0.006
hog+0.006
Neg logit contributions
rated-0.010
icious-0.010
aus-0.010
have-0.010
iver-0.009
TokenCan
Token ID6090
Feature activation+3.30e-01

Pos logit contributions
aroo+0.006
aro+0.006
 nowhere+0.005
t+0.005
hog+0.005
Neg logit contributions
rated-0.008
aus-0.008
 gag-0.008
icious-0.008
ksh-0.008
Token replied
Token ID8712
Feature activation-3.97e-01

Pos logit contributions
rated+0.014
urg+0.014
aus+0.014
 gag+0.014
icious+0.013
Neg logit contributions
aroo-0.012
aro-0.011
 nowhere-0.010
t-0.010
hog-0.009
Token,
Token ID11
Feature activation+1.34e+00

Pos logit contributions
aroo+0.007
aro+0.007
 nowhere+0.006
t+0.006
hog+0.006
Neg logit contributions
rated-0.012
urg-0.012
 gag-0.011
aus-0.011
 decisions-0.011
Token "
Token ID366
Feature activation-1.06e+00

Pos logit contributions
urg+0.038
aus+0.038
rated+0.038
 gag+0.038
 hamb+0.037
Neg logit contributions
aroo-0.026
aro-0.024
 nowhere-0.021
ite-0.021
t-0.020
TokenWe
Token ID1135
Feature activation+6.30e-01

Pos logit contributions
aroo+0.022
aro+0.022
 nowhere+0.017
hog+0.017
t+0.016
Neg logit contributions
rated-0.029
icious-0.029
aus-0.029
ksh-0.028
have-0.028
Token don
Token ID836
Feature activation-5.59e-01

Pos logit contributions
rated+0.012
aus+0.011
 gag+0.011
urg+0.011
 decisions+0.011
Neg logit contributions
aroo-0.019
aro-0.018
 nowhere-0.016
t-0.016
hog-0.016
Token't
Token ID470
Feature activation+8.29e-01

Pos logit contributions
aroo+0.004
aro+0.003
 nowhere+0.002
hog+0.002
t+0.002
Neg logit contributions
urg-0.023
rated-0.023
aus-0.022
 gag-0.022
icious-0.022
Token know
Token ID760
Feature activation-4.13e-03

Pos logit contributions
urg+0.016
rated+0.016
aus+0.016
 gag+0.016
ksh+0.015
Neg logit contributions
aroo-0.024
aro-0.022
 nowhere-0.021
t-0.021
't-0.020
Token what
Token ID644
Feature activation-2.45e-01

Pos logit contributions
aroo+0.000
aro+0.000
 nowhere+0.000
t+0.000
hog+0.000
Neg logit contributions
rated-0.000
urg-0.000
 gag-0.000
aus-0.000
 decisions-0.000
Token that
Token ID326
Feature activation+7.27e-01

Pos logit contributions
aroo+0.005
aro+0.005
 nowhere+0.004
t+0.004
hog+0.004
Neg logit contributions
rated-0.007
 gag-0.007
urg-0.007
aus-0.007
 decisions-0.007
Token thing
Token ID1517
Feature activation-2.50e-01

Pos logit contributions
ksh+0.017
urg+0.017
icious+0.017
 gag+0.017
aus+0.016
Neg logit contributions
aroo-0.019
aro-0.017
t-0.017
 she-0.016
't-0.016
Token is
Token ID318
Feature activation+2.10e-01

Pos logit contributions
aroo+0.007
aro+0.007
 nowhere+0.006
t+0.006
hog+0.006
Neg logit contributions
rated-0.005
urg-0.005
aus-0.005
 decisions-0.005
 gag-0.005
Token rainbow
Token ID27223
Feature activation-9.94e-01

Pos logit contributions
rated+0.026
urg+0.026
 gag+0.025
aus+0.025
icious+0.024
Neg logit contributions
aroo-0.023
aro-0.022
 nowhere-0.020
t-0.019
hog-0.018
Token was
Token ID373
Feature activation+8.59e-01

Pos logit contributions
aroo+0.021
aro+0.020
 nowhere+0.019
t+0.018
't+0.017
Neg logit contributions
rated-0.026
urg-0.025
aus-0.024
 difference-0.024
 decisions-0.023
Token a
Token ID257
Feature activation+3.11e-01

Pos logit contributions
 gag+0.023
aus+0.023
rated+0.023
urg+0.023
ksh+0.022
Neg logit contributions
aroo-0.018
 nowhere-0.017
aro-0.017
t-0.015
't-0.014
Token sign
Token ID1051
Feature activation+5.15e-01

Pos logit contributions
rated+0.007
urg+0.006
 gag+0.006
aus+0.006
icious+0.006
Neg logit contributions
aroo-0.008
aro-0.008
 nowhere-0.007
t-0.007
hog-0.007
Token that
Token ID326
Feature activation+1.65e+00

Pos logit contributions
rated+0.017
aus+0.017
 difference+0.017
urg+0.017
 decisions+0.017
Neg logit contributions
aroo-0.007
aro-0.007
 nowhere-0.005
t-0.005
uzz-0.004
Token the
Token ID262
Feature activation+7.12e-01

Pos logit contributions
rated+0.047
 gag+0.046
urg+0.046
aus+0.046
icious+0.045
Neg logit contributions
aroo-0.034
aro-0.031
 nowhere-0.029
t-0.027
hog-0.025
Token day
Token ID1110
Feature activation-9.82e-01

Pos logit contributions
rated+0.018
 gag+0.017
urg+0.017
 decisions+0.017
 difference+0.017
Neg logit contributions
aroo-0.017
aro-0.016
t-0.014
 nowhere-0.014
hog-0.013
Token was
Token ID373
Feature activation+5.50e-01

Pos logit contributions
aroo+0.018
aro+0.016
 nowhere+0.015
t+0.014
hog+0.013
Neg logit contributions
rated-0.030
 gag-0.030
urg-0.029
aus-0.029
icious-0.029
Token going
Token ID1016
Feature activation+1.26e-01

Pos logit contributions
 gag+0.015
aus+0.015
rated+0.015
urg+0.014
ksh+0.014
Neg logit contributions
aroo-0.012
 nowhere-0.011
aro-0.011
t-0.010
't-0.009
Token to
Token ID284
Feature activation-5.71e-03

Pos logit contributions
 gag+0.005
urg+0.005
rated+0.005
aus+0.005
icious+0.005
Neg logit contributions
aroo-0.001
aro-0.001
 nowhere-0.001
t-0.001
hog-0.001
Token be
Token ID307
Feature activation+1.39e+00

Pos logit contributions
aroo+0.000
aro+0.000
 nowhere+0.000
't+0.000
t+0.000
Neg logit contributions
ksh-0.000
 gag-0.000
rated-0.000
 degrees-0.000
urger-0.000
Token.
Token ID13
Feature activation-8.53e-01

Pos logit contributions
aroo+0.025
aro+0.023
 nowhere+0.022
t+0.020
hog+0.019
Neg logit contributions
 decisions-0.026
rated-0.026
 deal-0.025
 difference-0.025
urg-0.025
Token The
Token ID383
Feature activation-3.46e-01

Pos logit contributions
aroo+0.020
 nowhere+0.019
aro+0.019
t+0.017
hog+0.016
Neg logit contributions
ksh-0.021
icious-0.020
aus-0.020
urg-0.019
rated-0.019
Token slide
Token ID10649
Feature activation-3.00e-01

Pos logit contributions
aroo+0.009
aro+0.008
 nowhere+0.007
t+0.007
hog+0.007
Neg logit contributions
 difference-0.008
 decisions-0.008
 gag-0.008
 deal-0.008
rated-0.008
Token was
Token ID373
Feature activation+1.56e+00

Pos logit contributions
aroo+0.007
aro+0.007
t+0.006
 nowhere+0.006
hog+0.006
Neg logit contributions
 gag-0.007
urg-0.007
rated-0.007
aus-0.007
 decisions-0.007
Token a
Token ID257
Feature activation+2.56e-02

Pos logit contributions
 gag+0.040
rated+0.039
urg+0.038
aus+0.038
ksh+0.037
Neg logit contributions
aroo-0.037
aro-0.035
 nowhere-0.034
t-0.030
't-0.029
Token lie
Token ID6486
Feature activation+1.29e+00

Pos logit contributions
rated+0.000
 gag+0.000
urg+0.000
aus+0.000
 difference+0.000
Neg logit contributions
aroo-0.001
aro-0.001
 nowhere-0.001
t-0.001
hog-0.001
Token.
Token ID13
Feature activation-2.12e-01

Pos logit contributions
rated+0.039
urg+0.039
 gag+0.039
aus+0.038
icious+0.037
Neg logit contributions
aroo-0.024
aro-0.022
 nowhere-0.019
t-0.019
hog-0.017
Token It
Token ID632
Feature activation-7.14e-01

Pos logit contributions
aroo+0.005
aro+0.004
 nowhere+0.004
t+0.004
hog+0.004
Neg logit contributions
aus-0.006
rated-0.006
urg-0.006
ksh-0.006
icious-0.005
Token was
Token ID373
Feature activation+1.38e+00

Pos logit contributions
aroo+0.013
aro+0.012
 nowhere+0.011
t+0.011
hog+0.010
Neg logit contributions
urg-0.022
rated-0.022
 gag-0.021
aus-0.021
 decisions-0.021
Token empty
Token ID6565
Feature activation+8.56e-01

Pos logit contributions
rated+0.040
 gag+0.039
urg+0.039
aus+0.038
iver+0.037
Neg logit contributions
aroo-0.027
 nowhere-0.025
aro-0.025
t-0.021
't-0.020
Token.
Token ID13
Feature activation+2.93e-01

Pos logit contributions
rated+0.025
 gag+0.025
urg+0.025
aus+0.025
icious+0.024
Neg logit contributions
aroo-0.016
aro-0.015
 nowhere-0.014
t-0.013
hog-0.012
Token anything
Token ID1997
Feature activation-2.08e-01

Pos logit contributions
rated+0.017
urg+0.016
 gag+0.016
aus+0.016
icious+0.016
Neg logit contributions
aroo-0.006
aro-0.005
 nowhere-0.005
t-0.004
hog-0.004
Token is
Token ID318
Feature activation-4.79e-01

Pos logit contributions
aroo+0.004
aro+0.004
 nowhere+0.004
t+0.004
hog+0.003
Neg logit contributions
rated-0.006
aus-0.005
 gag-0.005
 decisions-0.005
urg-0.005
Token possible
Token ID1744
Feature activation-1.30e+00

Pos logit contributions
aroo+0.013
aro+0.012
t+0.011
 nowhere+0.011
hog+0.011
Neg logit contributions
rated-0.010
aus-0.010
urg-0.010
 decisions-0.010
 gag-0.010
Token.
Token ID13
Feature activation+3.68e-01

Pos logit contributions
aroo+0.027
aro+0.026
 nowhere+0.022
t+0.021
hog+0.021
Neg logit contributions
rated-0.036
 decisions-0.035
 difference-0.034
 alright-0.034
aus-0.034
Token<|endoftext|>
Token ID50256
Feature activation+7.27e-01

Pos logit contributions
 difference+0.009
 mental+0.009
 decisions+0.009
aus+0.008
 alright+0.008
Neg logit contributions
aroo-0.009
aro-0.009
 nowhere-0.007
t-0.007
hog-0.007
TokenOnce
Token ID7454
Feature activation-1.16e+00

Pos logit contributions
aus+0.019
rated+0.018
 difference+0.018
ksh+0.018
urg+0.018
Neg logit contributions
aroo-0.016
aro-0.015
 nowhere-0.013
t-0.013
hog-0.012
Token upon
Token ID2402
Feature activation+1.19e+00

Pos logit contributions
aroo+0.031
aro+0.030
hog+0.026
opard+0.024
 nowhere+0.023
Neg logit contributions
 idea-0.023
 choice-0.023
rated-0.023
 difference-0.023
 alright-0.022
Token a
Token ID257
Feature activation+2.13e+00

Pos logit contributions
rated+0.034
 gag+0.033
urg+0.033
aus+0.033
 decisions+0.032
Neg logit contributions
aroo-0.024
aro-0.022
 nowhere-0.020
t-0.020
hog-0.018
Token time
Token ID640
Feature activation+3.93e-02

Pos logit contributions
rated+0.046
urg+0.046
 gag+0.045
aus+0.045
 decisions+0.044
Neg logit contributions
aroo-0.056
aro-0.052
 nowhere-0.051
t-0.049
hog-0.046
Token there
Token ID612
Feature activation-4.08e-01

Pos logit contributions
 alright+0.001
 rum+0.001
 ok+0.001
urg+0.001
 Esk+0.001
Neg logit contributions
aroo-0.001
aro-0.001
hog-0.001
 nowhere-0.001
ite-0.001
Token was
Token ID373
Feature activation+8.25e-01

Pos logit contributions
aroo+0.008
aro+0.007
 nowhere+0.006
t+0.006
hog+0.006
Neg logit contributions
rated-0.012
urg-0.012
 gag-0.012
aus-0.012
 decisions-0.011
TokenAmy
Token ID40797
Feature activation-9.87e-01

Pos logit contributions
aro+0.020
aroo+0.020
 nowhere+0.016
ite+0.015
uzz+0.015
Neg logit contributions
icious-0.020
iver-0.019
have-0.019
rated-0.018
aus-0.018
Token held
Token ID2714
Feature activation+1.39e-01

Pos logit contributions
aroo+0.022
aro+0.021
 nowhere+0.019
t+0.018
hog+0.017
Neg logit contributions
rated-0.026
urg-0.025
 gag-0.025
aus-0.025
 decisions-0.024
Token the
Token ID262
Feature activation+3.51e-01

Pos logit contributions
 decisions+0.003
urg+0.003
 mental+0.003
 ste+0.003
aus+0.003
Neg logit contributions
aroo-0.003
aro-0.003
t-0.003
hog-0.003
 nowhere-0.003
Token shiny
Token ID22441
Feature activation+6.21e-01

Pos logit contributions
 difference+0.008
 decisions+0.008
 deal+0.008
rated+0.008
 idea+0.008
Neg logit contributions
aro-0.009
aroo-0.009
hog-0.008
t-0.008
uzz-0.007
Token star
Token ID3491
Feature activation-4.28e-02

Pos logit contributions
 decisions+0.015
rated+0.015
 difference+0.014
 deal+0.014
 choice+0.014
Neg logit contributions
aro-0.015
aroo-0.014
t-0.013
 nowhere-0.013
hog-0.013
Token and
Token ID290
Feature activation+3.35e-01

Pos logit contributions
aroo+0.001
aro+0.001
 nowhere+0.001
t+0.001
't+0.001
Neg logit contributions
rated-0.001
 decisions-0.001
 gag-0.001
urg-0.001
 awake-0.001
Token said
Token ID531
Feature activation-6.67e-01

Pos logit contributions
rated+0.009
urg+0.009
 gag+0.009
aus+0.009
icious+0.009
Neg logit contributions
aroo-0.007
aro-0.006
 nowhere-0.006
t-0.006
hog-0.005
Token,
Token ID11
Feature activation+1.39e+00

Pos logit contributions
aroo+0.012
aro+0.011
 nowhere+0.010
t+0.009
hog+0.009
Neg logit contributions
rated-0.020
urg-0.020
 gag-0.020
aus-0.020
 decisions-0.019
Token "
Token ID366
Feature activation-1.28e+00

Pos logit contributions
urg+0.038
rated+0.038
 gag+0.038
aus+0.037
ksh+0.037
Neg logit contributions
aroo-0.028
aro-0.027
 nowhere-0.023
ite-0.022
t-0.022
TokenI
Token ID40
Feature activation-4.24e-01

Pos logit contributions
aroo+0.027
aro+0.027
 nowhere+0.023
hog+0.022
t+0.021
Neg logit contributions
rated-0.035
icious-0.035
aus-0.034
ksh-0.034
iver-0.034
Token wish
Token ID4601
Feature activation-2.13e-01

Pos logit contributions
aroo+0.011
t+0.011
aro+0.010
't+0.010
 nowhere+0.009
Neg logit contributions
urg-0.010
ksh-0.009
rated-0.009
 burgers-0.009
 hotter-0.009
TokenOnce
Token ID7454
Feature activation-1.65e+00

Pos logit contributions
aus+0.014
 Esk+0.014
iver+0.014
 difference+0.014
rated+0.014
Neg logit contributions
aroo-0.010
aro-0.010
t-0.008
hog-0.008
 nowhere-0.007
Token upon
Token ID2402
Feature activation+8.74e-01

Pos logit contributions
aroo+0.040
aro+0.038
hog+0.035
 nowhere+0.031
overed+0.030
Neg logit contributions
 idea-0.035
rated-0.035
aus-0.035
urg-0.035
 difference-0.035
Token a
Token ID257
Feature activation+1.67e+00

Pos logit contributions
rated+0.024
 gag+0.024
urg+0.024
aus+0.024
 decisions+0.023
Neg logit contributions
aroo-0.018
aro-0.017
 nowhere-0.015
t-0.014
hog-0.014
Token time
Token ID640
Feature activation-5.34e-01

Pos logit contributions
urg+0.038
rated+0.038
 gag+0.037
aus+0.037
 decisions+0.037
Neg logit contributions
aroo-0.042
 nowhere-0.038
aro-0.038
t-0.037
't-0.034
Token,
Token ID11
Feature activation-1.03e+00

Pos logit contributions
aroo+0.015
aro+0.014
hog+0.013
 nowhere+0.013
tie+0.013
Neg logit contributions
 alright-0.009
 Esk-0.009
urg-0.008
 idea-0.008
iver-0.008
Token there
Token ID612
Feature activation-1.05e+00

Pos logit contributions
aro+0.020
aroo+0.018
hog+0.016
t+0.016
oot+0.014
Neg logit contributions
rated-0.029
 Esk-0.027
 hotter-0.027
 alright-0.027
 idea-0.027
Token was
Token ID373
Feature activation+5.76e-01

Pos logit contributions
aroo+0.023
aro+0.022
 nowhere+0.019
t+0.019
hog+0.018
Neg logit contributions
rated-0.027
urg-0.027
aus-0.026
 gag-0.026
iver-0.026
Token a
Token ID257
Feature activation+9.46e-01

Pos logit contributions
aus+0.013
 gag+0.013
 burgers+0.013
urg+0.013
 hamb+0.012
Neg logit contributions
aro-0.015
aroo-0.015
t-0.013
hog-0.012
ite-0.012
Token kind
Token ID1611
Feature activation+5.97e-01

Pos logit contributions
rated+0.024
 gag+0.024
urg+0.024
aus+0.023
icious+0.023
Neg logit contributions
aroo-0.022
aro-0.020
 nowhere-0.019
t-0.018
hog-0.017
Token king
Token ID5822
Feature activation+1.04e+00

Pos logit contributions
aus+0.015
urg+0.015
 difference+0.014
 hamb+0.014
 gag+0.014
Neg logit contributions
aro-0.013
aroo-0.013
 nowhere-0.012
t-0.012
't-0.011
Token.
Token ID13
Feature activation-1.09e+00

Pos logit contributions
iver+0.021
 sne+0.020
eping+0.020
 por+0.020
urger+0.020
Neg logit contributions
aro-0.024
aroo-0.023
overed-0.022
 nowhere-0.022
uttered-0.022
Token He
Token ID679
Feature activation+1.01e-01

Pos logit contributions
aus+0.006
ksh+0.006
rated+0.005
urg+0.005
 burgers+0.005
Neg logit contributions
aroo-0.006
aro-0.006
 nowhere-0.006
t-0.005
ite-0.005
Token picked
Token ID6497
Feature activation+1.01e-01

Pos logit contributions
aus+0.002
urg+0.002
rated+0.002
 hamb+0.002
 gag+0.002
Neg logit contributions
aroo-0.003
aro-0.002
t-0.002
 nowhere-0.002
ite-0.002
Token up
Token ID510
Feature activation-8.96e-01

Pos logit contributions
rated+0.002
urg+0.002
aus+0.002
 gag+0.002
 decisions+0.002
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.002
t-0.002
hog-0.002
Token Kitty
Token ID21168
Feature activation-5.01e-01

Pos logit contributions
aroo+0.019
aro+0.018
t+0.017
 nowhere+0.016
hog+0.015
Neg logit contributions
aus-0.023
 gag-0.023
 burgers-0.023
urg-0.023
 decisions-0.022
Token and
Token ID290
Feature activation+1.56e-01

Pos logit contributions
aroo+0.008
aro+0.007
 nowhere+0.007
t+0.006
hog+0.006
Neg logit contributions
rated-0.016
urg-0.016
 gag-0.016
aus-0.016
icious-0.016
Token lifted
Token ID13663
Feature activation+2.12e-01

Pos logit contributions
rated+0.004
urg+0.004
aus+0.004
 gag+0.004
 decisions+0.004
Neg logit contributions
aroo-0.004
aro-0.004
 nowhere-0.003
t-0.003
hog-0.003
Token her
Token ID607
Feature activation-3.10e-01

Pos logit contributions
rated+0.006
 gag+0.006
 decisions+0.005
aus+0.005
urg+0.005
Neg logit contributions
aroo-0.004
aro-0.004
 nowhere-0.004
t-0.004
ite-0.003
Token up
Token ID510
Feature activation-3.33e-01

Pos logit contributions
aroo+0.008
aro+0.008
t+0.007
 nowhere+0.007
hog+0.007
Neg logit contributions
rated-0.007
 decisions-0.007
 gag-0.007
 difference-0.006
 ang-0.006
Token high
Token ID1029
Feature activation+6.38e-01

Pos logit contributions
aroo+0.006
aro+0.006
 nowhere+0.005
t+0.005
hog+0.004
Neg logit contributions
rated-0.010
 gag-0.010
urg-0.010
aus-0.010
 decisions-0.010
Token.
Token ID13
Feature activation+5.15e-01

Pos logit contributions
rated+0.017
urg+0.017
 gag+0.017
aus+0.017
icious+0.017
Neg logit contributions
aroo-0.014
aro-0.013
 nowhere-0.012
t-0.011
hog-0.010
Token Now
Token ID2735
Feature activation-1.24e+00

Pos logit contributions
 burgers+0.012
ksh+0.012
aus+0.012
 hamb+0.012
 microwave+0.012
Neg logit contributions
aroo-0.012
 nowhere-0.011
aro-0.011
ite-0.010
t-0.009
Token He
Token ID679
Feature activation-6.53e-02

Pos logit contributions
rated+0.003
 gag+0.003
urg+0.003
aus+0.003
 decisions+0.003
Neg logit contributions
aroo-0.002
aro-0.002
 nowhere-0.002
t-0.002
hog-0.002
Token ran
Token ID4966
Feature activation-8.87e-03

Pos logit contributions
aroo+0.001
aro+0.001
 nowhere+0.001
t+0.001
hog+0.001
Neg logit contributions
 decisions-0.002
urg-0.002
rated-0.002
 gag-0.002
icious-0.002
Token around
Token ID1088
Feature activation-1.76e+00

Pos logit contributions
aroo+0.000
aro+0.000
ite+0.000
 nowhere+0.000
itt+0.000
Neg logit contributions
 awake-0.000
 colder-0.000
rated-0.000
 burgers-0.000
 healthier-0.000
Token with
Token ID351
Feature activation-1.52e+00

Pos logit contributions
aroo+0.038
aro+0.036
t+0.033
ite+0.032
 nowhere+0.029
Neg logit contributions
urg-0.044
 hamb-0.044
 gag-0.044
rated-0.044
 ste-0.043
Token his
Token ID465
Feature activation-7.33e-01

Pos logit contributions
aroo+0.042
aro+0.040
t+0.038
hog+0.038
ite+0.036
Neg logit contributions
 gag-0.030
 colder-0.030
 burgers-0.030
 hotter-0.029
aus-0.029
Token friends
Token ID2460
Feature activation+1.99e-01

Pos logit contributions
aroo+0.018
aro+0.017
t+0.016
uzz+0.015
hog+0.015
Neg logit contributions
 gag-0.017
 sne-0.017
 difference-0.017
 choice-0.016
 chores-0.016
Token and
Token ID290
Feature activation-8.84e-01

Pos logit contributions
aus+0.005
rated+0.005
 burgers+0.005
 hamb+0.005
ksh+0.005
Neg logit contributions
aroo-0.004
aro-0.004
t-0.003
 nowhere-0.003
hog-0.003
Token had
Token ID550
Feature activation-7.34e-01

Pos logit contributions
aroo+0.019
aro+0.018
 nowhere+0.016
t+0.016
hog+0.015
Neg logit contributions
rated-0.024
 gag-0.023
urg-0.023
aus-0.023
 hamb-0.023
Token so
Token ID523
Feature activation-8.06e-02

Pos logit contributions
 nowhere+0.011
aroo+0.008
 that+0.008
 she+0.007
t+0.007
Neg logit contributions
rated-0.027
urg-0.025
 gag-0.025
eling-0.025
iver-0.025
Token much
Token ID881
Feature activation+1.48e+00

Pos logit contributions
aroo+0.002
aro+0.002
 nowhere+0.002
t+0.002
hog+0.001
Neg logit contributions
rated-0.002
urg-0.002
 gag-0.002
aus-0.002
 decisions-0.002
Token fun
Token ID1257
Feature activation-1.18e+00

Pos logit contributions
rated+0.027
urg+0.026
 gag+0.026
aus+0.025
ksh+0.025
Neg logit contributions
aroo-0.045
aro-0.043
 nowhere-0.041
t-0.040
hog-0.038
Tokenion
Token ID295
Feature activation-4.91e-01

Pos logit contributions
 degrees+0.045
 colder+0.045
aus+0.044
 hotter+0.044
rated+0.044
Neg logit contributions
aroo-0.033
 nowhere-0.030
aro-0.029
hog-0.025
t-0.024
Token on
Token ID319
Feature activation-1.35e+00

Pos logit contributions
aroo+0.009
aro+0.009
 nowhere+0.008
t+0.007
hog+0.007
Neg logit contributions
rated-0.014
 gag-0.014
aus-0.014
urg-0.014
 decisions-0.014
Token Rachel
Token ID15984
Feature activation-1.21e+00

Pos logit contributions
aroo+0.027
aro+0.025
 nowhere+0.022
t+0.022
hog+0.020
Neg logit contributions
rated-0.039
urg-0.038
 gag-0.038
aus-0.038
 decisions-0.037
Tokenâ
Token ID22940
Feature activation+1.22e+00

Pos logit contributions
aroo+0.032
aro+0.030
 nowhere+0.029
t+0.028
hog+0.027
Neg logit contributions
rated-0.026
urg-0.026
aus-0.025
 gag-0.025
icious-0.025
Token
Token ID26391
Feature activation-3.36e+00

Pos logit contributions
 gag+0.024
urg+0.024
 hotter+0.022
 hamb+0.022
 burgers+0.022
Neg logit contributions
aroo-0.035
aro-0.031
t-0.031
 nowhere-0.030
hog-0.030
Token
Token ID8151
Feature activation-3.17e+00

Pos logit contributions
aroo+0.085
aro+0.079
 nowhere+0.072
t+0.069
hog+0.068
Neg logit contributions
rated-0.078
urg-0.078
 gag-0.077
aus-0.076
iver-0.075
Tokens
Token ID82
Feature activation-3.49e-01

Pos logit contributions
aroo+0.071
aro+0.066
 nowhere+0.063
hog+0.050
 Spirit+0.050
Neg logit contributions
aus-0.080
 alright-0.075
icious-0.074
iver-0.073
 burgers-0.073
Token arms
Token ID5101
Feature activation-3.65e-01

Pos logit contributions
aroo+0.008
aro+0.007
 nowhere+0.007
t+0.006
ite+0.006
Neg logit contributions
 gag-0.009
rated-0.009
urg-0.009
 decisions-0.009
 difference-0.009
Token and
Token ID290
Feature activation-7.82e-02

Pos logit contributions
aroo+0.007
aro+0.006
 nowhere+0.005
t+0.005
hog+0.005
Neg logit contributions
rated-0.011
urg-0.011
 gag-0.011
aus-0.011
 decisions-0.011
Token hands
Token ID2832
Feature activation-2.84e-01

Pos logit contributions
aroo+0.002
aro+0.002
 nowhere+0.001
t+0.001
hog+0.001
Neg logit contributions
 gag-0.002
rated-0.002
urg-0.002
aus-0.002
 decisions-0.002
Token.
Token ID13
Feature activation+4.24e-01

Pos logit contributions
aroo+0.004
aro+0.004
t+0.004
 nowhere+0.003
hog+0.003
Neg logit contributions
rated-0.010
aus-0.009
urg-0.009
 gag-0.009
icious-0.009
Token decision
Token ID2551
Feature activation-3.84e-01

Pos logit contributions
aroo+0.002
aro+0.002
 nowhere+0.002
t+0.002
hog+0.001
Neg logit contributions
rated-0.002
aus-0.002
urg-0.002
icious-0.002
ksh-0.002
Token.
Token ID13
Feature activation+1.32e-01

Pos logit contributions
aroo+0.007
aro+0.006
 nowhere+0.006
t+0.005
hog+0.005
Neg logit contributions
rated-0.012
urg-0.012
 gag-0.012
aus-0.011
 decisions-0.011
Token He
Token ID679
Feature activation-2.35e-01

Pos logit contributions
aus+0.004
rated+0.004
 gag+0.004
urg+0.004
ksh+0.004
Neg logit contributions
aroo-0.003
aro-0.002
 nowhere-0.002
t-0.002
hog-0.002
Token was
Token ID373
Feature activation-6.42e-01

Pos logit contributions
aroo+0.005
aro+0.005
 nowhere+0.004
t+0.004
hog+0.004
Neg logit contributions
rated-0.006
urg-0.006
 gag-0.006
aus-0.006
 decisions-0.006
Token so
Token ID523
Feature activation-4.48e-01

Pos logit contributions
aroo+0.014
aro+0.013
 nowhere+0.013
t+0.012
't+0.011
Neg logit contributions
rated-0.017
urg-0.017
 gag-0.017
aus-0.017
ksh-0.016
Token pleased
Token ID10607
Feature activation-2.56e+00

Pos logit contributions
aroo+0.010
aro+0.009
 nowhere+0.009
t+0.009
't+0.008
Neg logit contributions
rated-0.012
urg-0.011
aus-0.011
ksh-0.011
 gag-0.011
Token with
Token ID351
Feature activation-1.24e+00

Pos logit contributions
aroo+0.048
aro+0.045
 nowhere+0.040
t+0.038
ite+0.037
Neg logit contributions
 decisions-0.071
urg-0.071
 burgers-0.069
 hamb-0.069
 french-0.068
Token himself
Token ID2241
Feature activation-9.55e-01

Pos logit contributions
aroo+0.027
aro+0.025
t+0.023
 nowhere+0.023
hog+0.021
Neg logit contributions
urg-0.033
rated-0.032
 gag-0.032
aus-0.032
 decisions-0.032
Token and
Token ID290
Feature activation-6.75e-01

Pos logit contributions
aroo+0.018
aro+0.016
 nowhere+0.015
t+0.014
hog+0.013
Neg logit contributions
rated-0.029
urg-0.028
 gag-0.028
aus-0.028
 decisions-0.027
Token couldn
Token ID3521
Feature activation-1.45e+00

Pos logit contributions
aroo+0.014
aro+0.013
 nowhere+0.011
t+0.011
hog+0.010
Neg logit contributions
rated-0.019
 gag-0.019
urg-0.019
aus-0.019
icious-0.019
Tokenâ
Token ID22940
Feature activation+5.29e-01

Pos logit contributions
aroo+0.012
aro+0.010
 nowhere+0.008
t+0.006
hog+0.005
Neg logit contributions
rated-0.058
aus-0.057
urg-0.057
 gag-0.057
 decisions-0.056
Token."
Token ID526
Feature activation+8.52e-01

Pos logit contributions
aroo+0.027
aro+0.025
hog+0.024
 nowhere+0.024
ite+0.022
Neg logit contributions
 alright-0.025
 decisions-0.024
urg-0.024
 deal-0.023
 difference-0.023
Token
Token ID198
Feature activation+2.94e-01

Pos logit contributions
 gag+0.022
 mental+0.021
aus+0.021
 microwave+0.021
 ang+0.020
Neg logit contributions
aroo-0.017
aro-0.016
t-0.015
 nowhere-0.015
ite-0.014
Token
Token ID198
Feature activation-7.96e-01

Pos logit contributions
 mental+0.008
aus+0.008
iver+0.008
 gag+0.008
urg+0.008
Neg logit contributions
aroo-0.005
aro-0.005
t-0.004
 nowhere-0.004
ite-0.004
TokenHe
Token ID1544
Feature activation+6.52e-01

Pos logit contributions
aroo+0.017
aro+0.017
t+0.016
 nowhere+0.016
uttered+0.016
Neg logit contributions
iver-0.018
have-0.018
 mental-0.018
icious-0.018
aus-0.018
Token showed
Token ID3751
Feature activation-8.59e-01

Pos logit contributions
rated+0.016
urg+0.016
 gag+0.016
aus+0.015
icious+0.015
Neg logit contributions
aroo-0.016
aro-0.015
 nowhere-0.014
t-0.013
hog-0.013
Token them
Token ID606
Feature activation-2.16e+00

Pos logit contributions
aroo+0.021
t+0.021
aro+0.020
 nowhere+0.020
ite+0.019
Neg logit contributions
urg-0.018
 decisions-0.018
aus-0.018
 choice-0.018
 ang-0.017
Token how
Token ID703
Feature activation+7.19e-01

Pos logit contributions
aroo+0.052
aro+0.048
 nowhere+0.045
t+0.045
ite+0.043
Neg logit contributions
 decisions-0.052
 burgers-0.052
 gag-0.052
urg-0.051
 chores-0.050
Token the
Token ID262
Feature activation+1.79e-01

Pos logit contributions
 gag+0.022
rated+0.022
 burgers+0.021
aus+0.021
 decisions+0.021
Neg logit contributions
aroo-0.012
aro-0.011
 nowhere-0.010
hog-0.009
t-0.009
Token truck
Token ID7779
Feature activation-5.31e-01

Pos logit contributions
rated+0.005
urg+0.005
 gag+0.005
aus+0.005
icious+0.004
Neg logit contributions
aroo-0.004
aro-0.004
 nowhere-0.004
t-0.003
hog-0.003
Token worked
Token ID3111
Feature activation+4.43e-01

Pos logit contributions
aroo+0.011
aro+0.010
 nowhere+0.009
t+0.008
hog+0.008
Neg logit contributions
rated-0.015
urg-0.015
 gag-0.015
aus-0.015
 decisions-0.014
Token and
Token ID290
Feature activation-1.14e+00

Pos logit contributions
 gag+0.015
rated+0.015
urg+0.015
aus+0.014
icious+0.014
Neg logit contributions
aroo-0.007
aro-0.006
 nowhere-0.006
t-0.006
hog-0.005
Token.
Token ID13
Feature activation-7.74e-01

Pos logit contributions
aroo+0.014
aro+0.013
 nowhere+0.012
t+0.011
hog+0.011
Neg logit contributions
rated-0.019
urg-0.019
 gag-0.018
aus-0.018
 decisions-0.018
Token You
Token ID921
Feature activation-1.18e-01

Pos logit contributions
aroo+0.016
aro+0.015
 nowhere+0.014
t+0.013
hog+0.012
Neg logit contributions
ksh-0.021
rated-0.021
 gag-0.021
aus-0.021
urg-0.021
Token are
Token ID389
Feature activation-6.14e-01

Pos logit contributions
aroo+0.004
aro+0.003
 nowhere+0.003
t+0.003
hog+0.003
Neg logit contributions
rated-0.002
 gag-0.002
aus-0.002
urg-0.002
 decisions-0.002
Token the
Token ID262
Feature activation-1.41e+00

Pos logit contributions
aroo+0.015
aro+0.014
t+0.013
 nowhere+0.013
ite+0.013
Neg logit contributions
aus-0.014
 alright-0.014
urg-0.014
 gag-0.014
rated-0.014
Token best
Token ID1266
Feature activation-8.49e-01

Pos logit contributions
aroo+0.041
aro+0.039
 nowhere+0.036
t+0.035
hog+0.034
Neg logit contributions
rated-0.028
urg-0.027
 gag-0.027
aus-0.027
icious-0.026
Token doll
Token ID3654
Feature activation-2.26e+00

Pos logit contributions
aroo+0.019
 nowhere+0.018
aro+0.017
t+0.016
hog+0.015
Neg logit contributions
rated-0.021
 decisions-0.021
urg-0.021
 difference-0.021
 gag-0.021
Token ever
Token ID1683
Feature activation+4.29e-01

Pos logit contributions
aroo+0.039
aro+0.038
 nowhere+0.035
t+0.028
uzz+0.028
Neg logit contributions
rated-0.064
 decisions-0.063
aus-0.062
 difference-0.062
 burgers-0.062
Token."
Token ID526
Feature activation+5.86e-01

Pos logit contributions
rated+0.011
urg+0.011
 gag+0.011
aus+0.011
 decisions+0.011
Neg logit contributions
aroo-0.010
aro-0.009
 nowhere-0.008
t-0.008
hog-0.008
Token She
Token ID1375
Feature activation-5.93e-02

Pos logit contributions
 gag+0.014
 difference+0.013
 microwave+0.013
 mental+0.013
 Dr+0.013
Neg logit contributions
aroo-0.015
aro-0.015
 nowhere-0.012
t-0.011
ite-0.011
Token pulled
Token ID5954
Feature activation+1.08e+00

Pos logit contributions
aroo+0.001
aro+0.001
t+0.001
 nowhere+0.001
ite+0.001
Neg logit contributions
aus-0.001
urg-0.001
 gag-0.001
rated-0.001
 hamb-0.001
Token the
Token ID262
Feature activation+1.94e-01

Pos logit contributions
rated+0.030
urg+0.029
 gag+0.029
aus+0.029
 decisions+0.028
Neg logit contributions
aroo-0.023
aro-0.021
 nowhere-0.019
t-0.019
hog-0.017
Token and
Token ID290
Feature activation+4.27e-01

Pos logit contributions
rated+0.007
urg+0.007
aus+0.007
 gag+0.007
 hamb+0.006
Neg logit contributions
aroo-0.006
aro-0.005
 nowhere-0.005
t-0.005
hog-0.004
Token princess
Token ID21752
Feature activation-1.07e+00

Pos logit contributions
aus+0.011
rated+0.011
 gag+0.011
urg+0.011
 hamb+0.011
Neg logit contributions
aroo-0.009
aro-0.008
t-0.008
 nowhere-0.007
ite-0.007
Token were
Token ID547
Feature activation-9.10e-01

Pos logit contributions
aroo+0.024
aro+0.022
 nowhere+0.021
t+0.020
hog+0.019
Neg logit contributions
urg-0.027
rated-0.027
aus-0.026
 gag-0.026
 hamb-0.025
Token the
Token ID262
Feature activation-1.65e+00

Pos logit contributions
aroo+0.020
aro+0.018
 nowhere+0.017
t+0.016
hog+0.015
Neg logit contributions
rated-0.025
 gag-0.024
aus-0.024
urg-0.024
icious-0.023
Token most
Token ID749
Feature activation-6.57e-01

Pos logit contributions
aroo+0.049
aro+0.047
 nowhere+0.044
t+0.043
hog+0.041
Neg logit contributions
rated-0.031
 gag-0.030
urg-0.030
aus-0.030
 decisions-0.029
Token attractive
Token ID10966
Feature activation-2.17e+00

Pos logit contributions
aroo+0.019
aro+0.019
 nowhere+0.017
t+0.017
hog+0.016
Neg logit contributions
rated-0.012
urg-0.012
aus-0.012
 gag-0.012
 decisions-0.012
Token couple
Token ID3155
Feature activation-1.20e+00

Pos logit contributions
aro+0.047
aroo+0.046
t+0.043
 nowhere+0.042
't+0.041
Neg logit contributions
 difference-0.054
 choice-0.054
 deal-0.053
 decisions-0.053
 idea-0.052
Token there
Token ID612
Feature activation-9.69e-01

Pos logit contributions
aroo+0.021
aro+0.020
t+0.018
 nowhere+0.018
't+0.017
Neg logit contributions
 burgers-0.032
 decisions-0.032
aus-0.032
 difference-0.031
 hamb-0.031
Token.
Token ID13
Feature activation+1.06e-01

Pos logit contributions
aroo+0.020
aro+0.018
 nowhere+0.016
t+0.015
hog+0.015
Neg logit contributions
rated-0.027
urg-0.026
aus-0.026
 gag-0.026
icious-0.026
Token Everyone
Token ID11075
Feature activation-6.07e-01

Pos logit contributions
ksh+0.002
aus+0.002
 difference+0.002
 burgers+0.002
 microwave+0.002
Neg logit contributions
aroo-0.003
aro-0.003
t-0.003
 nowhere-0.002
ite-0.002
Token stared
Token ID18484
Feature activation+7.31e-01

Pos logit contributions
aroo+0.014
aro+0.014
t+0.013
 nowhere+0.013
't+0.012
Neg logit contributions
rated-0.013
 ste-0.013
 hamb-0.013
 mental-0.013
eling-0.013