TOP ACTIVATIONS
MAX = 4.135

" Can I lend your mustache for a little bit ?
you want to teach your tank how to say ' s
should be compassionate . Comp assion ate means you care about
Whenever Lily went to the post office , she always added
Where did you get your gun ?" The other
Why did you say ' ob edient ' ?"
judge who lived in a beautiful palace . Whenever anyone had
" Can I have your heart , Anna ?" Tom asks
then on , the me adow was known as the surprise
They make sounds like " v room " and " be
fox named Fox y . Fox y was very intelligent and
that day on , Pet al was never careless and always
that day on , the village was even more peaceful and
that day on , the shop keeper was extra careful to
Can you lend me your colors so I can draw one
, " Ar rr , mate y !" Anna
did you take my kay ak ?" L ila
to bow . Whenever she bowed , the town spe ople
Why are you qu arre ling ?" The two
day on , Wh isk ers was no longer a lazy

MIN ACTIVATIONS
MIN = -4.209

fun . After playing for a while , they decided to
to the slide . They wait for their turn .
?" Lu cy thought that sounded like fun ,
go find a seat and wait for a turn ." They
Let 's go inside and wait for the rain to stop
, they sat down for a picnic .
Finally , after climbing for a long time , Mary reached
to the slide . They wait for their turn . L
. Let 's go for a different walk instead ." But
on top of it for a while . As
After swinging for a while , Lily 's tum
" Let 's go for a row !", her mom said
in the front yard and wait . Anna did as her
" Let 's go for a walk !" said Mom my
run to the slide and wait for their turn .
" We can go on a road trip ."
After they played for a while , it was time
around . After playing for a long time , they went
The old lady thought and thought but she couldn ’
's go back home and wait for the sun to come

INTERVAL 2.049 - 4.135
CONTAINS 0.377%

As he walked through the wood , he saw a large
brother . You have to share the mattress and be kind
a big box in the living room . It has a
One day , Tim met a talking cat . The
they heard Bob 's special song they all began to sing

INTERVAL -0.037 - 2.049
CONTAINS 51.729%

. A little bird lived in the tree . The bird
each other and cried . A nice lady saw
Billy said goodbye and started walking back home . He kept
you will think before you act ." L ily
was back in her nest . She looked around and saw

INTERVAL -2.123 - -0.037
CONTAINS 47.403%

to go to the hospital for an operation .
. Albert was upset . He didn 't want to take
think it will be more fun . They take the ball
kept on staring at the clouds . He was happy and
juice , and some cookies . They sat on a bench

INTERVAL -4.209 - -2.123
CONTAINS 0.491%

my ." <|endoftext|> Once upon a time , there was a
day . <|endoftext|> Once upon a time , there was a
end . <|endoftext|> Once upon a time , there was a
and Ben are scared . They run away . The dog
bit more until it was time to go back inside .
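
The four INTERVAL ranges above look like equal-width bins between MIN and MAX. The sketch below reproduces the edges under that assumption. The equal-width rule and the name `interval_edges` are inferred from the reported numbers; they are not stated by the dashboard itself.

```python
def interval_edges(lowest, highest, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lowest to highest.

    Assumption: the dashboard uses equal-width bins; edges are rounded
    to 3 decimals to match the precision shown in the INTERVAL headers.
    """
    width = (highest - lowest) / n_bins
    return [round(lowest + i * width, 3) for i in range(n_bins + 1)]


# MIN and MAX as reported in the MIN ACTIVATIONS / TOP ACTIVATIONS headers.
edges = interval_edges(-4.209, 4.135)

# Shares listed under CONTAINS, highest interval first.
shares = [0.377, 51.729, 47.403, 0.491]
total_share = sum(shares)

print(edges)
print(total_share)
```

With MIN = -4.209 and MAX = 4.135 the bin width is 2.086, which gives the edges -2.123, -0.037 and 2.049 used in the headers, and the four CONTAINS shares add up to 100%.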
Token "
Token ID366
Feature activation+9.19e-01

Pos logit contributions
 minutes+0.011
ches+0.010
fy+0.010
 mun+0.009
 degrees+0.009
Neg logit contributions
roc-0.010
ate-0.009
ulture-0.008
ateful-0.008
rot-0.008
TokenCan
Token ID6090
Feature activation+3.52e-01

Pos logit contributions
roc+0.042
ate+0.039
ateful+0.036
ulture+0.035
rot+0.034
Neg logit contributions
 minutes-0.037
 Whoever-0.033
ches-0.033
 mun-0.032
fy-0.032
Token I
Token ID314
Feature activation-3.11e-01

Pos logit contributions
roc+0.021
ate+0.020
ulture+0.019
ateful+0.019
rot+0.018
Neg logit contributions
 minutes-0.009
ches-0.008
fy-0.008
 mun-0.007
 drift-0.007
Token lend
Token ID22096
Feature activation+1.09e+00

Pos logit contributions
 minutes+0.012
ches+0.011
 degrees+0.010
fy+0.010
 mun+0.010
Neg logit contributions
roc-0.015
ate-0.014
ulture-0.013
rot-0.013
flower-0.012
Token your
Token ID534
Feature activation+7.52e-01

Pos logit contributions
roc+0.060
ate+0.058
ateful+0.056
ulture+0.056
amer+0.054
Neg logit contributions
ches-0.031
 minutes-0.030
 y-0.028
fy-0.027
 Whoever-0.025
Token mustache
Token ID49303
Feature activation+4.03e+00

Pos logit contributions
roc+0.032
ate+0.031
ateful+0.029
ulture+0.028
amer+0.027
Neg logit contributions
 minutes-0.031
ches-0.029
 y-0.029
fy-0.028
 drift-0.028
Token for
Token ID329
Feature activation+3.62e-01

Pos logit contributions
ateful+0.218
roc+0.214
apper+0.208
ulture+0.203
amer+0.203
Neg logit contributions
ches-0.107
 y-0.106
 for-0.093
fy-0.090
 wo-0.090
Token a
Token ID257
Feature activation-1.54e+00

Pos logit contributions
roc+0.016
ate+0.015
ulture+0.014
ateful+0.014
rot+0.014
Neg logit contributions
 minutes-0.015
ches-0.014
fy-0.013
 y-0.013
 mun-0.013
Token little
Token ID1310
Feature activation-9.02e-01

Pos logit contributions
 minutes+0.071
 degrees+0.068
ches+0.066
 drift+0.062
fy+0.061
Neg logit contributions
roc-0.055
rot-0.051
man-0.051
 judge-0.049
ate-0.048
Token bit
Token ID1643
Feature activation+2.74e-01

Pos logit contributions
 minutes+0.034
ches+0.034
 degrees+0.032
 drift+0.031
fy+0.030
Neg logit contributions
roc-0.040
ate-0.036
rot-0.035
man-0.035
ulture-0.035
Token?
Token ID30
Feature activation-2.81e-01

Pos logit contributions
roc+0.011
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.012
 mun-0.011
 y-0.011
Token you
Token ID345
Feature activation-1.62e-02

Pos logit contributions
 minutes+0.016
ches+0.015
fy+0.014
 drift+0.014
 mun+0.013
Neg logit contributions
roc-0.022
ate-0.021
ateful-0.019
ulture-0.019
anish-0.019
Token want
Token ID765
Feature activation+7.87e-01

Pos logit contributions
 minutes+0.001
ches+0.000
fy+0.000
 mun+0.000
 drift+0.000
Neg logit contributions
roc-0.001
ate-0.001
ulture-0.001
ateful-0.001
rot-0.001
Token to
Token ID284
Feature activation+1.27e+00

Pos logit contributions
roc+0.039
ate+0.037
ulture+0.035
ateful+0.035
rot+0.034
Neg logit contributions
 minutes-0.029
 y-0.026
ches-0.026
 mun-0.024
fy-0.024
Token teach
Token ID4545
Feature activation+1.42e+00

Pos logit contributions
roc+0.075
amed+0.070
ulture+0.070
ateful+0.070
ate+0.069
Neg logit contributions
 dip-0.030
 minutes-0.030
 y-0.030
 drift-0.029
 mun-0.028
Token your
Token ID534
Feature activation-8.53e-02

Pos logit contributions
roc+0.077
ulture+0.075
ate+0.073
flower+0.072
apper+0.071
Neg logit contributions
 minutes-0.040
 y-0.038
ches-0.038
 syrup-0.032
 Whoever-0.032
Token tank
Token ID6873
Feature activation+3.94e+00

Pos logit contributions
 minutes+0.004
ches+0.004
fy+0.004
 drift+0.004
 mun+0.004
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
rot-0.003
Token how
Token ID703
Feature activation+2.89e-01

Pos logit contributions
ivered+0.205
roc+0.203
ateful+0.201
 Fighting+0.196
ums+0.195
Neg logit contributions
 y-0.106
 minutes-0.087
ches-0.085
 for-0.083
 honey-0.079
Token to
Token ID284
Feature activation+1.27e+00

Pos logit contributions
roc+0.013
ate+0.013
ulture+0.012
ateful+0.011
rot+0.011
Neg logit contributions
 minutes-0.012
ches-0.011
fy-0.010
 y-0.010
 mun-0.010
Token say
Token ID910
Feature activation+1.43e+00

Pos logit contributions
roc+0.071
ulture+0.064
amed+0.064
ateful+0.064
ate+0.064
Neg logit contributions
 minutes-0.035
ches-0.034
 drift-0.034
 dip-0.034
 y-0.032
Token '
Token ID705
Feature activation+6.49e-01

Pos logit contributions
roc+0.073
ate+0.066
ulture+0.066
ateful+0.064
flower+0.062
Neg logit contributions
 minutes-0.051
 y-0.046
ches-0.045
 Whoever-0.041
 wo-0.041
Tokens
Token ID82
Feature activation+3.35e+00

Pos logit contributions
roc+0.027
ate+0.026
ulture+0.023
ateful+0.023
rot+0.022
Neg logit contributions
 minutes-0.028
ches-0.026
fy-0.025
 mun-0.025
 y-0.024
Token should
Token ID815
Feature activation+5.19e-01

Pos logit contributions
 minutes+0.002
ches+0.002
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.002
ate-0.002
ulture-0.002
ateful-0.002
rot-0.002
Token be
Token ID307
Feature activation+1.55e+00

Pos logit contributions
roc+0.025
ate+0.024
ulture+0.022
ateful+0.022
rot+0.022
Neg logit contributions
 minutes-0.019
ches-0.018
 drift-0.017
 y-0.017
fy-0.016
Token compassionate
Token ID32533
Feature activation+2.15e+00

Pos logit contributions
ulture+0.081
roc+0.080
rot+0.076
ate+0.076
flower+0.076
Neg logit contributions
 minutes-0.055
 y-0.051
 tug-0.043
 slippery-0.041
ches-0.041
Token.
Token ID13
Feature activation+1.65e+00

Pos logit contributions
roc+0.138
ate+0.138
ulture+0.129
amed+0.126
ateful+0.126
Neg logit contributions
 minutes-0.044
ches-0.029
 y-0.026
fy-0.026
 wo-0.024
Token Comp
Token ID3082
Feature activation+1.67e+00

Pos logit contributions
roc+0.086
ate+0.082
ulture+0.078
rot+0.077
amed+0.077
Neg logit contributions
 minutes-0.050
 Whoever-0.043
ches-0.042
fy-0.041
 wo-0.040
Tokenassion
Token ID11857
Feature activation+3.91e+00

Pos logit contributions
roc+0.056
anish+0.056
ate+0.055
 neighborhood+0.053
 self+0.053
Neg logit contributions
ches-0.078
fy-0.068
 Whoever-0.064
 degrees-0.059
 minutes-0.058
Tokenate
Token ID378
Feature activation+1.32e+00

Pos logit contributions
 messenger+0.118
roc+0.115
 courtroom+0.109
flower+0.108
date+0.107
Neg logit contributions
 minutes-0.189
ches-0.185
 y-0.170
 pounds-0.162
sc-0.160
Token means
Token ID1724
Feature activation-5.75e-01

Pos logit contributions
roc+0.062
ate+0.058
ulture+0.054
ateful+0.054
amed+0.053
Neg logit contributions
ches-0.048
 minutes-0.048
 y-0.044
fy-0.043
 degrees-0.041
Token you
Token ID345
Feature activation+3.97e-01

Pos logit contributions
 minutes+0.022
ches+0.021
fy+0.021
 mun+0.020
 drift+0.020
Neg logit contributions
roc-0.026
ate-0.025
ateful-0.023
ulture-0.022
 Fighting-0.022
Token care
Token ID1337
Feature activation+1.10e+00

Pos logit contributions
roc+0.020
ate+0.019
ulture+0.018
ateful+0.018
rot+0.017
Neg logit contributions
 minutes-0.014
ches-0.013
fy-0.012
 y-0.012
 mun-0.012
Token about
Token ID546
Feature activation-1.70e-02

Pos logit contributions
roc+0.068
ate+0.064
ulture+0.062
rot+0.061
flower+0.060
Neg logit contributions
 minutes-0.026
ches-0.025
 y-0.022
fy-0.021
 degrees-0.020
Token Whenever
Token ID21326
Feature activation+9.78e-01

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.002
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token Lily
Token ID20037
Feature activation+1.79e+00

Pos logit contributions
roc+0.066
ate+0.063
ulture+0.061
amer+0.060
ateful+0.059
Neg logit contributions
 minutes-0.021
 y-0.015
 drift-0.014
ches-0.013
fy-0.013
Token went
Token ID1816
Feature activation+2.58e+00

Pos logit contributions
roc+0.093
ate+0.086
ateful+0.084
ulture+0.083
anish+0.082
Neg logit contributions
 minutes-0.059
ches-0.059
 mun-0.052
 y-0.052
fy-0.051
Token to
Token ID284
Feature activation+1.15e+00

Pos logit contributions
roc+0.162
ulture+0.148
ateful+0.147
ate+0.146
anish+0.143
Neg logit contributions
 minutes-0.064
ches-0.058
 y-0.053
 drift-0.051
 wo-0.049
Token the
Token ID262
Feature activation+2.54e+00

Pos logit contributions
roc+0.047
ulture+0.043
rot+0.042
ate+0.042
amed+0.041
Neg logit contributions
 minutes-0.050
 drift-0.047
 y-0.047
 dip-0.045
 Whoever-0.044
Token post
Token ID1281
Feature activation+3.90e+00

Pos logit contributions
roc+0.129
ivered+0.126
ated+0.119
ateful+0.117
ums+0.117
Neg logit contributions
 y-0.084
 timer-0.069
 dip-0.068
 convey-0.066
 minutes-0.066
Token office
Token ID2607
Feature activation+2.75e+00

Pos logit contributions
roc+0.165
ateful+0.150
ivered+0.137
amed+0.131
arre+0.131
Neg logit contributions
 maybe-0.142
 minutes-0.141
 might-0.138
 until-0.129
 for-0.129
Token,
Token ID11
Feature activation-5.45e-01

Pos logit contributions
roc+0.142
ate+0.129
ulture+0.128
amer+0.125
ateful+0.124
Neg logit contributions
 minutes-0.104
 y-0.084
 maybe-0.083
 wo-0.081
ches-0.080
Token she
Token ID673
Feature activation+1.35e+00

Pos logit contributions
 minutes+0.017
ches+0.017
fy+0.016
 mun+0.015
 degrees+0.015
Neg logit contributions
roc-0.028
ate-0.027
ateful-0.025
ulture-0.025
rot-0.024
Token always
Token ID1464
Feature activation+1.19e+00

Pos logit contributions
roc+0.088
ate+0.079
ulture+0.078
anish+0.077
ateful+0.077
Neg logit contributions
 minutes-0.029
ches-0.028
 mun-0.023
 y-0.023
fy-0.022
Token added
Token ID2087
Feature activation+5.48e-01

Pos logit contributions
roc+0.065
ulture+0.059
ate+0.058
rot+0.056
amer+0.055
Neg logit contributions
 minutes-0.039
 y-0.034
 mun-0.033
ches-0.033
 drift-0.033
TokenWhere
Token ID8496
Feature activation+6.29e-01

Pos logit contributions
roc+0.033
ate+0.031
ulture+0.029
ateful+0.028
ums+0.027
Neg logit contributions
 minutes-0.030
ches-0.027
 y-0.027
 Whoever-0.026
 mun-0.026
Token did
Token ID750
Feature activation-4.22e-01

Pos logit contributions
roc+0.032
ate+0.030
ulture+0.029
ateful+0.028
amer+0.028
Neg logit contributions
 minutes-0.021
ches-0.020
 y-0.020
fy-0.019
 mun-0.018
Token you
Token ID345
Feature activation+9.76e-01

Pos logit contributions
 minutes+0.015
ches+0.014
fy+0.014
 mun+0.013
 drift+0.013
Neg logit contributions
roc-0.021
ate-0.020
ateful-0.018
ulture-0.018
rot-0.017
Token get
Token ID651
Feature activation+1.58e+00

Pos logit contributions
roc+0.056
ate+0.053
ulture+0.052
rot+0.050
ateful+0.050
Neg logit contributions
 minutes-0.025
 drift-0.024
ches-0.024
 y-0.024
 dip-0.022
Token your
Token ID534
Feature activation+1.19e+00

Pos logit contributions
roc+0.088
ulture+0.083
ate+0.082
anish+0.080
ums+0.078
Neg logit contributions
 minutes-0.046
ches-0.044
 y-0.044
 dip-0.039
 tug-0.038
Token gun
Token ID2485
Feature activation+3.85e+00

Pos logit contributions
roc+0.045
ate+0.042
ateful+0.040
ulture+0.038
amer+0.037
Neg logit contributions
 minutes-0.056
ches-0.055
 y-0.054
 drift-0.051
fy-0.051
Token?"
Token ID1701
Feature activation+1.10e+00

Pos logit contributions
roc+0.200
ateful+0.185
ate+0.178
ulture+0.178
ums+0.176
Neg logit contributions
ches-0.125
 y-0.117
fy-0.105
 drift-0.105
 minutes-0.102
Token
Token ID198
Feature activation+1.24e+00

Pos logit contributions
roc+0.052
ate+0.050
ulture+0.047
ateful+0.046
rot+0.045
Neg logit contributions
 minutes-0.043
 mun-0.038
ches-0.038
 Whoever-0.037
 drift-0.037
Token
Token ID198
Feature activation+1.82e+00

Pos logit contributions
roc+0.054
ateful+0.049
ate+0.048
ulture+0.047
flower+0.046
Neg logit contributions
 minutes-0.050
ches-0.048
 Whoever-0.046
 tug-0.044
fy-0.044
TokenThe
Token ID464
Feature activation+4.76e-01

Pos logit contributions
roc+0.083
ateful+0.079
ate+0.078
ulture+0.077
flower+0.074
Neg logit contributions
 minutes-0.064
 Whoever-0.063
 tug-0.062
ches-0.060
fy-0.057
Token other
Token ID584
Feature activation+6.22e-01

Pos logit contributions
roc+0.018
ate+0.017
ulture+0.016
ateful+0.015
rot+0.015
Neg logit contributions
 minutes-0.023
 y-0.021
ches-0.021
fy-0.020
 mun-0.020
TokenWhy
Token ID5195
Feature activation+1.10e+00

Pos logit contributions
roc+0.042
ate+0.040
ateful+0.037
ulture+0.036
ums+0.035
Neg logit contributions
 minutes-0.039
ches-0.037
 mun-0.035
 y-0.035
 drift-0.034
Token did
Token ID750
Feature activation-8.82e-02

Pos logit contributions
roc+0.060
ate+0.058
amer+0.056
amed+0.055
ateful+0.055
Neg logit contributions
 y-0.031
ches-0.030
 minutes-0.030
 wo-0.027
 drift-0.027
Token you
Token ID345
Feature activation+1.37e+00

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.004
ate-0.004
ulture-0.004
ateful-0.004
rot-0.004
Token say
Token ID910
Feature activation+1.09e+00

Pos logit contributions
roc+0.080
ate+0.077
ateful+0.074
ulture+0.073
amed+0.072
Neg logit contributions
ches-0.036
 drift-0.034
 y-0.032
 dip-0.032
 minutes-0.031
Token '
Token ID705
Feature activation+7.31e-01

Pos logit contributions
roc+0.057
ate+0.056
ulture+0.052
ateful+0.051
amed+0.050
Neg logit contributions
 minutes-0.035
ches-0.034
 y-0.033
 wo-0.031
fy-0.030
Tokenob
Token ID672
Feature activation+3.84e+00

Pos logit contributions
roc+0.031
ate+0.030
ulture+0.027
ateful+0.027
rot+0.026
Neg logit contributions
 minutes-0.031
ches-0.029
fy-0.028
 mun-0.027
 y-0.027
Tokenedient
Token ID35279
Feature activation+3.34e+00

Pos logit contributions
roc+0.154
ateful+0.141
ulture+0.137
ate+0.135
amer+0.130
Neg logit contributions
ches-0.157
 minutes-0.135
 degrees-0.132
 y-0.131
 Whoever-0.128
Token'
Token ID6
Feature activation+2.13e+00

Pos logit contributions
ate+0.140
roc+0.138
ateful+0.132
amer+0.132
ulture+0.128
Neg logit contributions
ches-0.125
 minutes-0.119
 y-0.112
 wo-0.111
fy-0.108
Token?"
Token ID1701
Feature activation+1.31e+00

Pos logit contributions
roc+0.092
ate+0.089
ulture+0.086
amed+0.082
ateful+0.082
Neg logit contributions
 minutes-0.078
 y-0.077
ches-0.076
 wo-0.070
 Whoever-0.067
Token
Token ID220
Feature activation+1.68e-01

Pos logit contributions
roc+0.058
ate+0.055
amer+0.051
ateful+0.051
ulture+0.050
Neg logit contributions
 minutes-0.055
ches-0.051
 mun-0.051
 y-0.050
 drift-0.050
Token
Token ID198
Feature activation+1.23e+00

Pos logit contributions
roc+0.007
ate+0.006
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.008
ches-0.007
fy-0.007
 mun-0.007
 drift-0.007
Token judge
Token ID5052
Feature activation+1.04e+00

Pos logit contributions
roc+0.046
ate+0.043
ateful+0.039
ulture+0.039
amed+0.039
Neg logit contributions
 minutes-0.042
 y-0.039
ches-0.038
fy-0.038
 mun-0.038
Token who
Token ID508
Feature activation+1.07e+00

Pos logit contributions
roc+0.059
ate+0.056
ulture+0.052
ateful+0.052
amed+0.050
Neg logit contributions
 minutes-0.031
ches-0.028
 mun-0.025
 y-0.025
fy-0.025
Token lived
Token ID5615
Feature activation+1.18e+00

Pos logit contributions
roc+0.045
ate+0.041
ulture+0.039
ateful+0.038
rot+0.038
Neg logit contributions
 minutes-0.044
ches-0.043
 y-0.043
 mun-0.042
fy-0.042
Token in
Token ID287
Feature activation+1.11e+00

Pos logit contributions
roc+0.062
ulture+0.055
ate+0.055
ateful+0.052
amer+0.051
Neg logit contributions
 minutes-0.041
 y-0.038
ches-0.037
 mun-0.035
 tug-0.034
Token a
Token ID257
Feature activation+1.80e+00

Pos logit contributions
roc+0.057
ate+0.055
ulture+0.053
rot+0.052
flower+0.050
Neg logit contributions
 minutes-0.038
 y-0.034
 drift-0.032
ches-0.031
 tug-0.031
Token beautiful
Token ID4950
Feature activation+3.82e+00

Pos logit contributions
roc+0.096
ate+0.096
ulture+0.093
anish+0.091
amer+0.089
Neg logit contributions
 minutes-0.057
 y-0.054
 tug-0.046
fy-0.046
 slippery-0.044
Token palace
Token ID20562
Feature activation+9.16e-01

Pos logit contributions
amed+0.178
roc+0.174
amer+0.170
ulture+0.169
ateful+0.169
Neg logit contributions
 y-0.133
 minutes-0.132
 dip-0.122
 sn-0.118
 scaven-0.117
Token.
Token ID13
Feature activation+7.06e-01

Pos logit contributions
roc+0.053
ate+0.049
ulture+0.047
ateful+0.047
anish+0.045
Neg logit contributions
 minutes-0.026
ches-0.023
fy-0.022
 y-0.022
 mun-0.022
Token Whenever
Token ID21326
Feature activation+1.60e+00

Pos logit contributions
roc+0.030
ate+0.029
ulture+0.027
rot+0.027
ateful+0.026
Neg logit contributions
 minutes-0.030
fy-0.026
ches-0.026
 y-0.026
 Whoever-0.026
Token anyone
Token ID2687
Feature activation+1.10e+00

Pos logit contributions
roc+0.101
ate+0.098
amed+0.097
ulture+0.097
amer+0.093
Neg logit contributions
 minutes-0.038
 y-0.032
 drift-0.027
 maybe-0.025
 tug-0.024
Token had
Token ID550
Feature activation+1.58e+00

Pos logit contributions
roc+0.057
ate+0.053
ulture+0.050
ateful+0.049
amer+0.048
Neg logit contributions
 minutes-0.038
ches-0.036
 y-0.034
fy-0.033
 mun-0.033
Token"
Token ID1
Feature activation-2.73e-02

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.004
ches-0.004
fy-0.003
 mun-0.003
 drift-0.003
TokenCan
Token ID6090
Feature activation+1.57e-01

Pos logit contributions
 minutes+0.001
ches+0.001
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.001
ate-0.001
ulture-0.001
ateful-0.001
rot-0.001
Token I
Token ID314
Feature activation+6.13e-01

Pos logit contributions
roc+0.010
ate+0.009
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.004
ches-0.003
fy-0.003
 mun-0.003
 drift-0.003
Token have
Token ID423
Feature activation+7.38e-01

Pos logit contributions
roc+0.032
ate+0.030
ulture+0.028
ateful+0.028
rot+0.027
Neg logit contributions
 minutes-0.020
ches-0.019
fy-0.018
 drift-0.018
 y-0.018
Token your
Token ID534
Feature activation+2.17e-01

Pos logit contributions
roc+0.036
ate+0.033
ulture+0.033
rot+0.031
ateful+0.031
Neg logit contributions
 minutes-0.029
 y-0.025
ches-0.025
fy-0.024
 drift-0.023
Token heart
Token ID2612
Feature activation+3.78e+00

Pos logit contributions
roc+0.010
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.008
ches-0.008
fy-0.007
 drift-0.007
 mun-0.007
Token,
Token ID11
Feature activation-9.68e-01

Pos logit contributions
roc+0.212
ulture+0.192
ateful+0.180
amer+0.178
 Fighting+0.177
Neg logit contributions
 y-0.116
 minutes-0.112
ches-0.106
fy-0.106
 dip-0.101
Token Anna
Token ID11735
Feature activation+1.17e+00

Pos logit contributions
 minutes+0.032
ches+0.031
 mun+0.031
fy+0.030
 drift+0.029
Neg logit contributions
roc-0.048
ate-0.046
ateful-0.042
ulture-0.041
 Respect-0.041
Token?"
Token ID1701
Feature activation-5.06e-01

Pos logit contributions
roc+0.050
ate+0.044
ulture+0.043
ateful+0.042
amed+0.041
Neg logit contributions
 minutes-0.051
ches-0.048
 y-0.045
fy-0.045
 drift-0.042
Token Tom
Token ID4186
Feature activation-4.81e-01

Pos logit contributions
ches+0.017
 minutes+0.017
fy+0.016
 mun+0.016
 degrees+0.015
Neg logit contributions
roc-0.025
ate-0.024
ateful-0.022
 Fighting-0.021
 Respect-0.021
Token asks
Token ID7893
Feature activation+2.31e-02

Pos logit contributions
 minutes+0.022
fy+0.021
 mun+0.021
ches+0.020
 degrees+0.020
Neg logit contributions
roc-0.018
ate-0.018
ateful-0.016
ulture-0.015
rot-0.015
Token then
Token ID788
Feature activation-8.85e-02

Pos logit contributions
roc+0.054
ate+0.049
anish+0.049
ulture+0.049
ateful+0.047
Neg logit contributions
 minutes-0.025
ches-0.021
 y-0.021
fy-0.020
 mun-0.020
Token on
Token ID319
Feature activation+6.86e-01

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.004
ate-0.004
ulture-0.004
ateful-0.003
rot-0.003
Token,
Token ID11
Feature activation+6.17e-01

Pos logit contributions
roc+0.029
ate+0.027
ulture+0.025
ateful+0.025
amer+0.024
Neg logit contributions
 minutes-0.032
ches-0.029
 y-0.028
 drift-0.028
fy-0.027
Token the
Token ID262
Feature activation+5.40e-01

Pos logit contributions
roc+0.032
ate+0.031
ulture+0.029
rot+0.029
ateful+0.028
Neg logit contributions
 minutes-0.020
 y-0.018
ches-0.018
 drift-0.017
 mun-0.017
Token me
Token ID502
Feature activation+1.02e+00

Pos logit contributions
roc+0.024
ate+0.022
ulture+0.021
ateful+0.020
amer+0.020
Neg logit contributions
 minutes-0.023
 y-0.020
ches-0.020
 mun-0.019
fy-0.019
Tokenadow
Token ID4584
Feature activation+3.71e+00

Pos logit contributions
roc+0.047
ate+0.043
ateful+0.041
ulture+0.040
 Fighting+0.039
Neg logit contributions
 minutes-0.041
ches-0.038
 mun-0.036
fy-0.036
 drift-0.035
Token was
Token ID373
Feature activation+1.11e+00

Pos logit contributions
roc+0.219
ateful+0.200
ate+0.191
 Fighting+0.190
ulture+0.188
Neg logit contributions
 minutes-0.093
 mun-0.087
 jumped-0.085
ches-0.084
 wouldn-0.084
Token known
Token ID1900
Feature activation-6.39e-01

Pos logit contributions
roc+0.056
ulture+0.050
ate+0.049
ateful+0.048
anish+0.048
Neg logit contributions
 minutes-0.044
 y-0.038
ches-0.037
 drift-0.036
 mun-0.036
Token as
Token ID355
Feature activation+1.01e+00

Pos logit contributions
 minutes+0.020
ches+0.019
fy+0.017
 mun+0.017
 drift+0.017
Neg logit contributions
roc-0.034
ate-0.033
ateful-0.030
rot-0.030
ulture-0.029
Token the
Token ID262
Feature activation+1.60e-01

Pos logit contributions
roc+0.051
ate+0.047
ulture+0.047
rot+0.045
anish+0.043
Neg logit contributions
 minutes-0.036
 y-0.032
 drift-0.030
ches-0.030
 mun-0.029
Token surprise
Token ID5975
Feature activation-6.33e-01

Pos logit contributions
roc+0.006
ate+0.006
ulture+0.005
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.007
ches-0.007
fy-0.007
 mun-0.006
 drift-0.006
Token They
Token ID1119
Feature activation-6.22e-01

Pos logit contributions
 minutes+0.022
ches+0.021
fy+0.020
 degrees+0.020
 mun+0.020
Neg logit contributions
roc-0.021
ate-0.020
ulture-0.018
ateful-0.017
 Fighting-0.017
Token make
Token ID787
Feature activation+5.86e-01

Pos logit contributions
 minutes+0.025
 degrees+0.022
ches+0.022
fy+0.022
 mun+0.022
Neg logit contributions
roc-0.028
ate-0.027
ateful-0.024
ulture-0.024
man-0.023
Token sounds
Token ID5238
Feature activation-5.33e-01

Pos logit contributions
roc+0.027
ate+0.025
ulture+0.024
ateful+0.024
rot+0.023
Neg logit contributions
 minutes-0.023
ches-0.021
 y-0.020
 drift-0.020
fy-0.020
Token like
Token ID588
Feature activation-1.21e+00

Pos logit contributions
 minutes+0.023
ches+0.023
fy+0.022
 mun+0.022
 degrees+0.021
Neg logit contributions
roc-0.020
ate-0.020
ateful-0.018
man-0.018
rot-0.018
Token "
Token ID366
Feature activation+1.28e+00

Pos logit contributions
 minutes+0.054
fy+0.054
ches+0.053
 mun+0.051
 drift+0.049
Neg logit contributions
roc-0.045
ate-0.044
man-0.039
ateful-0.039
room-0.038
Tokenv
Token ID85
Feature activation+3.71e+00

Pos logit contributions
roc+0.056
ate+0.052
ateful+0.048
ulture+0.048
rot+0.046
Neg logit contributions
 minutes-0.052
ches-0.048
fy-0.047
 Whoever-0.046
 mun-0.045
Tokenroom
Token ID3823
Feature activation+5.22e-01

Pos logit contributions
 messenger+0.089
roc+0.086
amer+0.081
handed+0.080
 courtroom+0.079
Neg logit contributions
ches-0.189
 minutes-0.187
 y-0.173
 degrees-0.168
 Whoever-0.167
Token"
Token ID1
Feature activation+7.75e-01

Pos logit contributions
roc+0.022
ate+0.021
ateful+0.020
ulture+0.020
rot+0.019
Neg logit contributions
 minutes-0.022
ches-0.021
fy-0.020
 y-0.019
 mun-0.019
Token and
Token ID290
Feature activation-4.87e-01

Pos logit contributions
roc+0.036
ate+0.033
ulture+0.032
ateful+0.031
rot+0.030
Neg logit contributions
 minutes-0.030
ches-0.028
fy-0.026
 y-0.026
 drift-0.025
Token "
Token ID366
Feature activation+6.37e-01

Pos logit contributions
 minutes+0.021
ches+0.020
fy+0.020
 mun+0.019
 drift+0.019
Neg logit contributions
roc-0.020
ate-0.019
ateful-0.017
ulture-0.017
 Fighting-0.016
Tokenbe
Token ID1350
Feature activation+1.27e+00

Pos logit contributions
roc+0.027
ate+0.025
ateful+0.023
ulture+0.023
rot+0.022
Neg logit contributions
 minutes-0.027
ches-0.025
fy-0.024
 mun-0.024
 Whoever-0.024
Token fox
Token ID21831
Feature activation+3.51e-01

Pos logit contributions
 minutes+0.014
ches+0.013
fy+0.013
 drift+0.013
 degrees+0.013
Neg logit contributions
roc-0.012
ate-0.011
ulture-0.010
rot-0.010
ateful-0.010
Token named
Token ID3706
Feature activation-1.43e+00

Pos logit contributions
roc+0.016
ate+0.015
ulture+0.014
ateful+0.014
rot+0.013
Neg logit contributions
 minutes-0.014
ches-0.013
fy-0.012
 mun-0.012
 y-0.012
Token Fox
Token ID5426
Feature activation+2.17e+00

Pos logit contributions
 minutes+0.066
fy+0.057
 degrees+0.055
 drift+0.054
 seconds+0.052
Neg logit contributions
roc-0.062
ate-0.055
gy-0.051
man-0.049
rot-0.048
Tokeny
Token ID88
Feature activation+1.17e+00

Pos logit contributions
roc+0.097
amed+0.086
ate+0.086
ulture+0.085
anish+0.084
Neg logit contributions
ches-0.086
 minutes-0.079
 y-0.074
 wo-0.073
 mun-0.071
Token.
Token ID13
Feature activation+1.44e+00

Pos logit contributions
roc+0.069
ate+0.065
ulture+0.062
ateful+0.062
amed+0.061
Neg logit contributions
 minutes-0.029
ches-0.027
 y-0.025
 wo-0.024
 mun-0.024
Token Fox
Token ID5426
Feature activation+3.71e+00

Pos logit contributions
roc+0.063
ate+0.058
ateful+0.057
amed+0.056
ulture+0.056
Neg logit contributions
 drift-0.055
ches-0.053
 minutes-0.052
 y-0.052
 mun-0.052
Tokeny
Token ID88
Feature activation+2.70e+00

Pos logit contributions
roc+0.153
ulture+0.133
anish+0.132
 Fighting+0.131
 courtroom+0.128
Neg logit contributions
ches-0.159
 y-0.135
 might-0.133
 wo-0.130
 mun-0.126
Token was
Token ID373
Feature activation+1.11e+00

Pos logit contributions
roc+0.138
ateful+0.126
ulture+0.123
ate+0.123
amer+0.120
Neg logit contributions
ches-0.086
 minutes-0.084
 y-0.082
 wo-0.081
 mun-0.079
Token very
Token ID845
Feature activation+6.29e-01

Pos logit contributions
roc+0.060
ate+0.057
ulture+0.056
anish+0.055
ateful+0.054
Neg logit contributions
 minutes-0.034
 y-0.033
ches-0.031
 mun-0.029
 wo-0.028
Token intelligent
Token ID12661
Feature activation-2.80e-01

Pos logit contributions
roc+0.030
ate+0.028
ulture+0.027
anish+0.026
rot+0.026
Neg logit contributions
 minutes-0.024
ches-0.021
 y-0.021
fy-0.021
 mun-0.020
Token and
Token ID290
Feature activation+7.00e-01

Pos logit contributions
 minutes+0.013
ches+0.012
fy+0.012
 drift+0.012
 mun+0.011
Neg logit contributions
roc-0.011
ate-0.010
ulture-0.009
rot-0.009
ateful-0.009
Token that
Token ID326
Feature activation+6.85e-01

Pos logit contributions
roc+0.042
ate+0.039
ulture+0.038
ateful+0.037
anish+0.037
Neg logit contributions
 minutes-0.021
ches-0.018
fy-0.017
 y-0.017
 mun-0.017
Token day
Token ID1110
Feature activation-6.37e-02

Pos logit contributions
roc+0.035
ate+0.033
ateful+0.031
ulture+0.030
amer+0.030
Neg logit contributions
 minutes-0.025
ches-0.023
 y-0.022
fy-0.022
 drift-0.021
Token on
Token ID319
Feature activation+8.59e-01

Pos logit contributions
 minutes+0.002
ches+0.002
fy+0.002
 mun+0.002
 drift+0.002
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
rot-0.002
Token,
Token ID11
Feature activation-3.41e-01

Pos logit contributions
roc+0.036
ate+0.033
ulture+0.031
ateful+0.031
amer+0.029
Neg logit contributions
 minutes-0.040
ches-0.037
 drift-0.036
 y-0.035
fy-0.035
Token Pet
Token ID4767
Feature activation+9.88e-01

Pos logit contributions
 minutes+0.014
ches+0.012
fy+0.012
 degrees+0.012
 drift+0.011
Neg logit contributions
roc-0.016
ate-0.015
ulture-0.013
ateful-0.013
rot-0.013
Tokenal
Token ID282
Feature activation+3.71e+00

Pos logit contributions
roc+0.025
ate+0.021
ateful+0.020
ulture+0.020
anish+0.019
Neg logit contributions
ches-0.058
 minutes-0.057
 mun-0.053
fy-0.053
 degrees-0.052
Token was
Token ID373
Feature activation+9.01e-01

Pos logit contributions
roc+0.187
ateful+0.173
ate+0.168
 Fighting+0.166
anish+0.157
Neg logit contributions
ches-0.137
 minutes-0.117
 mun-0.115
 drift-0.105
 jumped-0.104
Token never
Token ID1239
Feature activation+7.02e-01

Pos logit contributions
roc+0.043
ate+0.040
ulture+0.038
ateful+0.037
anish+0.036
Neg logit contributions
 minutes-0.037
ches-0.033
 y-0.031
 mun-0.031
fy-0.031
Token careless
Token ID36138
Feature activation+8.48e-01

Pos logit contributions
roc+0.032
ate+0.031
ulture+0.029
rot+0.028
ateful+0.027
Neg logit contributions
 minutes-0.029
ches-0.026
fy-0.025
 y-0.025
 mun-0.025
Token and
Token ID290
Feature activation+7.73e-01

Pos logit contributions
roc+0.042
ate+0.040
ulture+0.037
ateful+0.036
rot+0.036
Neg logit contributions
 minutes-0.032
ches-0.028
fy-0.027
 y-0.027
 drift-0.027
Token always
Token ID1464
Feature activation+9.70e-01

Pos logit contributions
roc+0.039
ate+0.038
ulture+0.035
rot+0.034
ateful+0.034
Neg logit contributions
 minutes-0.027
ches-0.025
fy-0.024
 y-0.024
 mun-0.024
Token that
Token ID326
Feature activation+1.23e+00

Pos logit contributions
roc+0.054
ate+0.050
ulture+0.049
anish+0.049
ateful+0.047
Neg logit contributions
 minutes-0.025
ches-0.022
 y-0.021
fy-0.020
 mun-0.019
Token day
Token ID1110
Feature activation+4.56e-01

Pos logit contributions
roc+0.073
ate+0.069
ateful+0.066
ulture+0.064
amer+0.064
Neg logit contributions
 minutes-0.036
 y-0.032
ches-0.031
 drift-0.030
fy-0.027
Token on
Token ID319
Feature activation+9.78e-01

Pos logit contributions
roc+0.023
ate+0.022
ulture+0.021
ateful+0.020
rot+0.020
Neg logit contributions
 minutes-0.016
ches-0.015
fy-0.014
 drift-0.014
 y-0.013
Token,
Token ID11
Feature activation+1.11e+00

Pos logit contributions
roc+0.044
ate+0.041
ulture+0.040
ateful+0.039
amer+0.038
Neg logit contributions
 minutes-0.042
ches-0.037
 y-0.036
 drift-0.036
 marsh-0.034
Token the
Token ID262
Feature activation+8.76e-01

Pos logit contributions
roc+0.062
ate+0.059
ulture+0.057
rot+0.057
amed+0.056
Neg logit contributions
 minutes-0.033
 y-0.030
 drift-0.027
 Whoever-0.027
 mun-0.026
Token village
Token ID7404
Feature activation+3.69e+00

Pos logit contributions
roc+0.042
ate+0.042
ulture+0.039
rot+0.038
amer+0.037
Neg logit contributions
 minutes-0.034
 y-0.031
 tug-0.027
 drift-0.026
fy-0.026
Token was
Token ID373
Feature activation+1.06e+00

Pos logit contributions
roc+0.198
ate+0.179
ateful+0.174
ulture+0.173
anish+0.173
Neg logit contributions
 minutes-0.093
ches-0.093
 y-0.087
 wouldn-0.087
 carefully-0.086
Token even
Token ID772
Feature activation+5.19e-01

Pos logit contributions
roc+0.056
ate+0.052
ulture+0.051
anish+0.049
rot+0.049
Neg logit contributions
 minutes-0.037
 y-0.034
ches-0.032
 mun-0.030
fy-0.029
Token more
Token ID517
Feature activation+9.04e-01

Pos logit contributions
roc+0.029
ate+0.028
ulture+0.026
rot+0.026
ateful+0.025
Neg logit contributions
 minutes-0.016
ches-0.014
 y-0.013
fy-0.013
 mun-0.013
Token peaceful
Token ID12309
Feature activation+1.70e+00

Pos logit contributions
roc+0.048
ulture+0.043
ate+0.043
anish+0.042
rot+0.041
Neg logit contributions
 minutes-0.034
 y-0.030
fy-0.028
ches-0.026
 drift-0.025
Token and
Token ID290
Feature activation+9.14e-01

Pos logit contributions
roc+0.092
ate+0.081
ulture+0.081
ateful+0.078
rot+0.077
Neg logit contributions
 minutes-0.065
 y-0.051
 mun-0.047
 drift-0.047
ches-0.046
Token that
Token ID326
Feature activation+7.32e-01

Pos logit contributions
roc+0.039
ate+0.036
ulture+0.035
anish+0.035
ateful+0.034
Neg logit contributions
 minutes-0.019
ches-0.016
 y-0.015
fy-0.015
 mun-0.014
Token day
Token ID1110
Feature activation+2.00e-01

Pos logit contributions
roc+0.038
ate+0.037
ateful+0.035
ulture+0.034
amer+0.033
Neg logit contributions
 minutes-0.026
ches-0.023
 y-0.022
 drift-0.022
fy-0.022
Token on
Token ID319
Feature activation+9.14e-01

Pos logit contributions
roc+0.010
ate+0.009
ulture+0.009
ateful+0.008
rot+0.008
Neg logit contributions
 minutes-0.007
ches-0.007
fy-0.007
 drift-0.006
 mun-0.006
Token,
Token ID11
Feature activation+4.34e-01

Pos logit contributions
roc+0.040
ate+0.037
ulture+0.035
ateful+0.035
amer+0.034
Neg logit contributions
 minutes-0.040
ches-0.036
 drift-0.035
 y-0.035
 mun-0.034
Token the
Token ID262
Feature activation+4.33e-01

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.020
ateful+0.019
rot+0.019
Neg logit contributions
 minutes-0.015
ches-0.013
 y-0.013
 mun-0.013
 drift-0.013
Token shop
Token ID6128
Feature activation+3.67e+00

Pos logit contributions
roc+0.017
ate+0.016
ulture+0.015
ateful+0.014
rot+0.014
Neg logit contributions
 minutes-0.021
 y-0.019
ches-0.018
fy-0.018
 drift-0.018
Tokenkeeper
Token ID13884
Feature activation+2.63e+00

Pos logit contributions
roc+0.150
anish+0.146
ateful+0.144
 Fighting+0.141
ate+0.137
Neg logit contributions
 y-0.127
 mun-0.126
 minutes-0.122
 carefully-0.122
 wouldn-0.119
Token was
Token ID373
Feature activation+1.06e+00

Pos logit contributions
roc+0.143
ateful+0.130
ate+0.128
 Fighting+0.127
ulture+0.126
Neg logit contributions
ches-0.078
 mun-0.074
 minutes-0.074
 y-0.067
 jumped-0.066
Token extra
Token ID3131
Feature activation+6.85e-02

Pos logit contributions
roc+0.053
ate+0.048
ulture+0.047
anish+0.045
ateful+0.045
Neg logit contributions
 minutes-0.041
 y-0.036
ches-0.035
 mun-0.034
 drift-0.033
Token careful
Token ID8161
Feature activation+1.37e+00

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.003
ches-0.002
fy-0.002
 drift-0.002
 mun-0.002
Token to
Token ID284
Feature activation+1.58e-01

Pos logit contributions
roc+0.071
ate+0.064
ulture+0.062
ateful+0.061
rot+0.059
Neg logit contributions
 minutes-0.050
ches-0.045
fy-0.042
 y-0.041
 drift-0.041
Token Can
Token ID1680
Feature activation-1.60e-01

Pos logit contributions
 minutes+0.007
ches+0.007
fy+0.006
 mun+0.006
 drift+0.006
Neg logit contributions
roc-0.007
ate-0.007
ulture-0.006
ateful-0.006
rot-0.006
Token you
Token ID345
Feature activation+9.55e-01

Pos logit contributions
 minutes+0.005
ches+0.005
fy+0.004
 mun+0.004
 drift+0.004
Neg logit contributions
roc-0.009
ate-0.008
ulture-0.008
ateful-0.008
rot-0.007
Token lend
Token ID22096
Feature activation+1.74e+00

Pos logit contributions
roc+0.048
ate+0.044
ulture+0.043
ateful+0.042
amer+0.041
Neg logit contributions
 minutes-0.033
ches-0.031
 y-0.030
 drift-0.030
fy-0.030
Token me
Token ID502
Feature activation+2.65e+00

Pos logit contributions
roc+0.105
ulture+0.102
ateful+0.102
amer+0.099
flower+0.099
Neg logit contributions
ches-0.039
 minutes-0.038
 y-0.036
 honey-0.029
fy-0.029
Token your
Token ID534
Feature activation+9.65e-01

Pos logit contributions
roc+0.143
ateful+0.133
amer+0.131
ulture+0.130
ums+0.128
Neg logit contributions
ches-0.082
 minutes-0.079
 y-0.075
 honey-0.068
 dip-0.067
Token colors
Token ID7577
Feature activation+3.66e+00

Pos logit contributions
roc+0.043
ate+0.040
ulture+0.039
amer+0.038
ateful+0.038
Neg logit contributions
 minutes-0.040
 y-0.037
ches-0.035
fy-0.033
 wo-0.033
Token so
Token ID523
Feature activation+6.73e-01

Pos logit contributions
roc+0.212
ateful+0.182
ulture+0.181
handed+0.178
 courtroom+0.176
Neg logit contributions
 y-0.102
ches-0.097
 minutes-0.094
fy-0.092
 dip-0.091
Token I
Token ID314
Feature activation+1.75e-01

Pos logit contributions
roc+0.038
ate+0.035
ulture+0.034
amer+0.033
ateful+0.033
Neg logit contributions
 minutes-0.020
ches-0.017
fy-0.017
 y-0.017
 mun-0.015
Token can
Token ID460
Feature activation+1.24e+00

Pos logit contributions
roc+0.010
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.005
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token draw
Token ID3197
Feature activation+1.61e+00

Pos logit contributions
roc+0.064
ateful+0.056
ulture+0.056
ate+0.055
rot+0.053
Neg logit contributions
ches-0.041
 minutes-0.039
 drift-0.038
 y-0.038
 dip-0.037
Token one
Token ID530
Feature activation+3.70e-01

Pos logit contributions
roc+0.082
ulture+0.072
ateful+0.070
ate+0.067
apper+0.066
Neg logit contributions
 minutes-0.059
ches-0.056
 y-0.054
fy-0.049
 Whoever-0.048
Token,
Token ID11
Feature activation-2.29e-01

Pos logit contributions
roc+0.053
ate+0.050
ulture+0.047
ateful+0.045
amer+0.045
Neg logit contributions
 minutes-0.045
 y-0.041
 wo-0.040
ches-0.039
 drift-0.039
Token "
Token ID366
Feature activation+2.97e-01

Pos logit contributions
 minutes+0.010
ches+0.009
fy+0.009
 mun+0.009
 drift+0.009
Neg logit contributions
roc-0.009
ate-0.009
ulture-0.008
ateful-0.008
rot-0.008
TokenAr
Token ID3163
Feature activation+2.71e-01

Pos logit contributions
roc+0.012
ate+0.012
ateful+0.011
ulture+0.010
rot+0.010
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.012
 mun-0.012
 drift-0.012
Tokenrr
Token ID21062
Feature activation+1.05e+00

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.005
Neg logit contributions
 minutes-0.016
ches-0.015
fy-0.015
 mun-0.014
 drift-0.014
Token,
Token ID11
Feature activation-1.28e+00

Pos logit contributions
roc+0.041
ate+0.039
ulture+0.037
ateful+0.036
amed+0.033
Neg logit contributions
 minutes-0.049
ches-0.047
fy-0.044
 wo-0.043
 degrees-0.043
Token mate
Token ID16133
Feature activation+3.65e+00

Pos logit contributions
 minutes+0.052
fy+0.049
ches+0.049
 degrees+0.048
 mun+0.045
Neg logit contributions
roc-0.056
ate-0.053
rot-0.050
 judge-0.049
man-0.048
Tokeny
Token ID88
Feature activation+6.00e-01

Pos logit contributions
roc+0.132
 courtroom+0.130
 Fighting+0.126
amed+0.122
ate+0.119
Neg logit contributions
ches-0.148
 minutes-0.144
 might-0.138
 wo-0.137
 y-0.132
Token!"
Token ID2474
Feature activation-3.11e-01

Pos logit contributions
roc+0.024
ate+0.023
ateful+0.021
ulture+0.021
rot+0.020
Neg logit contributions
 minutes-0.027
ches-0.025
fy-0.024
 mun-0.024
 drift-0.024
Token (newline)
Token ID198
Feature activation+6.80e-01

Pos logit contributions
 minutes+0.015
ches+0.014
fy+0.014
 degrees+0.013
 mun+0.013
Neg logit contributions
roc-0.011
ate-0.011
ulture-0.010
ateful-0.010
rot-0.010
Token (newline)
Token ID198
Feature activation+1.83e+00

Pos logit contributions
roc+0.031
ate+0.029
ulture+0.027
ateful+0.027
flower+0.026
Neg logit contributions
 minutes-0.027
ches-0.025
 drift-0.023
 Whoever-0.023
 y-0.023
TokenAnna
Token ID31160
Feature activation+5.48e-01

Pos logit contributions
roc+0.094
ulture+0.086
ate+0.086
 zone+0.084
flower+0.084
Neg logit contributions
 minutes-0.057
 Whoever-0.052
 tug-0.051
ches-0.048
 carefully-0.047
Token did
Token ID750
Feature activation-3.99e-01

Pos logit contributions
roc+0.052
ate+0.049
ulture+0.047
amer+0.047
rot+0.046
Neg logit contributions
 minutes-0.030
 y-0.029
ches-0.029
 drift-0.026
 wo-0.026
Token you
Token ID345
Feature activation+6.26e-01

Pos logit contributions
 minutes+0.014
ches+0.012
fy+0.012
 mun+0.012
 drift+0.011
Neg logit contributions
roc-0.020
ate-0.019
ateful-0.018
ulture-0.018
rot-0.017
Token take
Token ID1011
Feature activation+4.95e-01

Pos logit contributions
roc+0.034
ate+0.032
ateful+0.030
ulture+0.030
rot+0.029
Neg logit contributions
 minutes-0.019
ches-0.019
 drift-0.017
fy-0.017
 mun-0.017
Token my
Token ID616
Feature activation-3.39e-01

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.020
ateful+0.019
rot+0.019
Neg logit contributions
 minutes-0.021
ches-0.019
 y-0.018
fy-0.018
 drift-0.018
Token kay
Token ID34681
Feature activation+1.76e+00

Pos logit contributions
 minutes+0.015
ches+0.015
fy+0.014
 mun+0.013
 drift+0.013
Neg logit contributions
roc-0.013
ate-0.013
ulture-0.012
ateful-0.011
rot-0.011
Tokenak
Token ID461
Feature activation+3.65e+00

Pos logit contributions
roc+0.075
ateful+0.069
flower+0.067
 Fighting+0.067
ate+0.067
Neg logit contributions
 minutes-0.067
fy-0.065
ches-0.065
 mun-0.058
 y-0.057
Token?"
Token ID1701
Feature activation+1.76e-01

Pos logit contributions
roc+0.191
ateful+0.179
ums+0.170
ulture+0.166
ate+0.166
Neg logit contributions
ches-0.117
 y-0.101
 minutes-0.100
fy-0.099
 slippery-0.096
Token (newline)
Token ID198
Feature activation+2.94e-01

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.008
ches-0.007
fy-0.007
 mun-0.007
 drift-0.007
Token (newline)
Token ID198
Feature activation+1.50e+00

Pos logit contributions
roc+0.012
ate+0.011
ulture+0.011
ateful+0.010
rot+0.010
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.011
 drift-0.011
 mun-0.011
TokenL
Token ID43
Feature activation-7.85e-02

Pos logit contributions
roc+0.071
flower+0.069
ulture+0.067
ate+0.066
ums+0.064
Neg logit contributions
 minutes-0.056
 tug-0.046
 Whoever-0.045
ches-0.044
 y-0.042
Tokenila
Token ID10102
Feature activation+1.24e+00

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 drift+0.003
 mun+0.003
Neg logit contributions
roc-0.004
ate-0.004
ulture-0.003
ateful-0.003
rot-0.003
Token to
Token ID284
Feature activation+3.57e-02

Pos logit contributions
 minutes+0.007
ches+0.006
fy+0.006
 mun+0.006
 drift+0.006
Neg logit contributions
roc-0.006
ate-0.006
ateful-0.005
ulture-0.005
rot-0.005
Token bow
Token ID9563
Feature activation+1.47e+00

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.001
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token.
Token ID13
Feature activation-9.12e-01

Pos logit contributions
roc+0.086
ate+0.077
ateful+0.075
ulture+0.075
anish+0.073
Neg logit contributions
ches-0.044
 minutes-0.041
 y-0.039
fy-0.038
 wo-0.036
Token Whenever
Token ID21326
Feature activation+9.68e-01

Pos logit contributions
 minutes+0.039
ches+0.036
fy+0.036
 degrees+0.035
 mun+0.034
Neg logit contributions
roc-0.038
ate-0.037
ulture-0.033
 Respect-0.033
 Fighting-0.033
Token she
Token ID673
Feature activation+1.52e-01

Pos logit contributions
roc+0.056
ate+0.054
ulture+0.052
amed+0.050
ateful+0.050
Neg logit contributions
 minutes-0.028
ches-0.022
 drift-0.022
 y-0.022
 mun-0.021
Token bowed
Token ID35737
Feature activation+3.64e+00

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.006
ches-0.005
fy-0.005
 mun-0.005
 drift-0.005
Token,
Token ID11
Feature activation+4.21e-02

Pos logit contributions
roc+0.213
ulture+0.198
ate+0.188
ateful+0.185
anish+0.180
Neg logit contributions
ches-0.098
 minutes-0.094
 y-0.094
 wo-0.093
 carefully-0.093
Token the
Token ID262
Feature activation+5.58e-01

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token town
Token ID3240
Feature activation+2.11e+00

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.019
ateful+0.019
amer+0.018
Neg logit contributions
 minutes-0.026
ches-0.023
fy-0.023
 y-0.023
 drift-0.022
Tokenspe
Token ID4125
Feature activation+2.28e+00

Pos logit contributions
roc+0.117
ate+0.108
ulture+0.103
ateful+0.101
anish+0.098
Neg logit contributions
 minutes-0.063
ches-0.058
fy-0.056
 y-0.053
 drift-0.053
Tokenople
Token ID643
Feature activation+1.49e+00

Pos logit contributions
roc+0.088
ate+0.085
ulture+0.081
 Respect+0.080
ums+0.073
Neg logit contributions
 minutes-0.102
fy-0.089
ches-0.085
 Whoever-0.084
 mun-0.083
Token Why
Token ID4162
Feature activation+5.97e-01

Pos logit contributions
roc+0.032
ate+0.030
ulture+0.028
ateful+0.027
rot+0.027
Neg logit contributions
 minutes-0.028
ches-0.026
fy-0.025
 Whoever-0.025
 drift-0.024
Token are
Token ID389
Feature activation-3.12e-01

Pos logit contributions
roc+0.031
ate+0.030
ulture+0.028
ateful+0.027
amer+0.027
Neg logit contributions
 minutes-0.020
ches-0.018
 y-0.017
fy-0.017
 drift-0.016
Token you
Token ID345
Feature activation+1.12e+00

Pos logit contributions
 minutes+0.010
ches+0.009
fy+0.009
 mun+0.009
 drift+0.008
Neg logit contributions
roc-0.017
ate-0.016
ateful-0.015
ulture-0.014
rot-0.014
Token qu
Token ID627
Feature activation+2.35e+00

Pos logit contributions
roc+0.058
ate+0.055
ulture+0.053
ateful+0.050
rot+0.049
Neg logit contributions
 minutes-0.038
ches-0.037
 y-0.035
 drift-0.034
 Whoever-0.034
Tokenarre
Token ID9624
Feature activation+1.84e+00

Pos logit contributions
man+0.065
roc+0.065
 messenger+0.065
ate+0.064
ateful+0.062
Neg logit contributions
ches-0.125
fy-0.109
 minutes-0.108
 Whoever-0.105
 mun-0.103
Tokenling
Token ID1359
Feature activation+3.64e+00

Pos logit contributions
roc+0.094
ulture+0.088
rot+0.086
ate+0.084
 Respect+0.083
Neg logit contributions
ches-0.063
 minutes-0.063
 y-0.054
fy-0.052
 drift-0.051
Token?"
Token ID1701
Feature activation+6.21e-01

Pos logit contributions
roc+0.185
ulture+0.169
ate+0.168
ateful+0.165
ums+0.163
Neg logit contributions
ches-0.117
 minutes-0.116
 y-0.111
fy-0.094
 marsh-0.093
Token (newline)
Token ID198
Feature activation+1.17e+00

Pos logit contributions
roc+0.030
ate+0.028
ulture+0.026
ateful+0.026
rot+0.026
Neg logit contributions
 minutes-0.024
ches-0.021
 drift-0.021
 y-0.020
 Whoever-0.020
Token (newline)
Token ID198
Feature activation+1.63e+00

Pos logit contributions
roc+0.058
ate+0.053
ulture+0.053
ateful+0.052
rot+0.049
Neg logit contributions
 minutes-0.042
ches-0.038
 Whoever-0.038
 y-0.036
 drift-0.035
TokenThe
Token ID464
Feature activation+9.72e-01

Pos logit contributions
roc+0.073
ulture+0.068
ate+0.066
ateful+0.064
rot+0.061
Neg logit contributions
 minutes-0.066
 Whoever-0.061
 tug-0.059
fy-0.057
 y-0.056
Token two
Token ID734
Feature activation+6.67e-01

Pos logit contributions
roc+0.040
ate+0.040
ulture+0.036
rot+0.035
ateful+0.034
Neg logit contributions
 minutes-0.043
 y-0.039
ches-0.036
fy-0.036
 wo-0.036
Token day
Token ID1110
Feature activation-7.39e-02

Pos logit contributions
roc+0.036
ate+0.035
ateful+0.033
ulture+0.032
amer+0.031
Neg logit contributions
 minutes-0.025
ches-0.023
fy-0.021
 y-0.021
 drift-0.021
Token on
Token ID319
Feature activation+6.57e-01

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.002
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
rot-0.003
Token,
Token ID11
Feature activation-2.91e-01

Pos logit contributions
roc+0.026
ate+0.025
ulture+0.023
ateful+0.023
amer+0.022
Neg logit contributions
 minutes-0.031
ches-0.029
fy-0.027
 y-0.027
 drift-0.027
Token Wh
Token ID854
Feature activation+6.38e-01

Pos logit contributions
 minutes+0.012
ches+0.011
fy+0.010
 drift+0.010
 degrees+0.010
Neg logit contributions
roc-0.013
ate-0.012
ulture-0.011
ateful-0.011
rot-0.011
Tokenisk
Token ID1984
Feature activation+1.39e+00

Pos logit contributions
roc+0.027
ate+0.027
ateful+0.025
ulture+0.024
amer+0.024
Neg logit contributions
 minutes-0.025
ches-0.025
fy-0.023
 mun-0.023
 degrees-0.022
Tokeners
Token ID364
Feature activation+3.63e+00

Pos logit contributions
roc+0.058
ateful+0.053
ate+0.053
ulture+0.051
 Respect+0.049
Neg logit contributions
ches-0.057
 minutes-0.057
 mun-0.050
 y-0.049
 degrees-0.049
Token was
Token ID373
Feature activation+8.42e-01

Pos logit contributions
roc+0.206
ateful+0.185
 Fighting+0.183
ate+0.182
ulture+0.175
Neg logit contributions
ches-0.104
 mun-0.098
 jumped-0.094
 minutes-0.091
 gob-0.089
Token no
Token ID645
Feature activation-7.43e-01

Pos logit contributions
roc+0.040
ate+0.036
ulture+0.035
ateful+0.034
anish+0.033
Neg logit contributions
 minutes-0.035
ches-0.031
 y-0.030
 mun-0.029
fy-0.029
Token longer
Token ID2392
Feature activation+1.67e+00

Pos logit contributions
ches+0.021
 minutes+0.021
 degrees+0.020
 mun+0.020
 drift+0.020
Neg logit contributions
roc-0.039
ate-0.038
ateful-0.036
ulture-0.034
rot-0.033
Token a
Token ID257
Feature activation-2.64e-01

Pos logit contributions
roc+0.081
ate+0.078
ulture+0.077
rot+0.071
man+0.070
Neg logit contributions
 minutes-0.070
 y-0.059
fy-0.056
ches-0.054
 mun-0.054
Token lazy
Token ID16931
Feature activation+1.04e-02

Pos logit contributions
 minutes+0.010
ches+0.010
fy+0.009
 drift+0.009
 mun+0.009
Neg logit contributions
roc-0.012
ate-0.011
ulture-0.010
ateful-0.010
rot-0.010
Token fun
Token ID1257
Feature activation-2.14e+00

Pos logit contributions
ches+0.082
 degrees+0.069
 pounds+0.065
fy+0.065
 mun+0.064
Neg logit contributions
ate-0.086
 neighborhood-0.078
 town-0.075
roc-0.074
 Respect-0.073
Token.
Token ID13
Feature activation-8.29e-01

Pos logit contributions
ches+0.107
fy+0.106
 degrees+0.096
 mun+0.095
 syrup+0.093
Neg logit contributions
ate-0.072
ating-0.062
 neighborhood-0.058
 town-0.058
room-0.057
Token After
Token ID2293
Feature activation-1.30e+00

Pos logit contributions
 minutes+0.039
 degrees+0.037
 pounds+0.036
ches+0.035
 mun+0.035
Neg logit contributions
ate-0.030
roc-0.029
 Fighting-0.026
 neighborhood-0.025
ulture-0.024
Token playing
Token ID2712
Feature activation+3.44e-02

Pos logit contributions
ches+0.065
fy+0.061
 degrees+0.057
 minutes+0.055
 pounds+0.054
Neg logit contributions
ate-0.043
 town-0.041
roc-0.040
 Fighting-0.040
ateful-0.038
Token for
Token ID329
Feature activation-1.43e+00

Pos logit contributions
roc+0.001
ate+0.001
ulture+0.001
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.002
ches-0.002
fy-0.001
 mun-0.001
 drift-0.001
Token a
Token ID257
Feature activation-4.21e+00

Pos logit contributions
ches+0.073
fy+0.071
 degrees+0.071
 minutes+0.069
 mun+0.069
Neg logit contributions
ate-0.039
roc-0.038
 town-0.033
ateful-0.032
amed-0.031
Token while
Token ID981
Feature activation-9.66e-01

Pos logit contributions
 degrees+0.185
 pounds+0.164
ches+0.156
oops+0.153
oll+0.153
Neg logit contributions
 judge-0.146
 town-0.131
 bully-0.125
roc-0.124
 pupil-0.121
Token,
Token ID11
Feature activation-9.64e-01

Pos logit contributions
ches+0.058
 degrees+0.057
 mun+0.057
fy+0.056
 minutes+0.056
Neg logit contributions
roc-0.016
ate-0.015
 town-0.012
ateful-0.012
rot-0.012
Token they
Token ID484
Feature activation-3.89e-01

Pos logit contributions
 minutes+0.032
fy+0.031
ches+0.031
 degrees+0.029
 pounds+0.028
Neg logit contributions
ate-0.045
roc-0.045
ateful-0.041
 Fighting-0.041
ulture-0.040
Token decided
Token ID3066
Feature activation-3.98e-01

Pos logit contributions
 minutes+0.016
ches+0.014
 degrees+0.014
fy+0.014
 drift+0.013
Neg logit contributions
roc-0.017
ate-0.016
ulture-0.015
ateful-0.015
rot-0.015
Token to
Token ID284
Feature activation-6.21e-02

Pos logit contributions
 minutes+0.019
ches+0.018
fy+0.018
 degrees+0.017
 mun+0.017
Neg logit contributions
roc-0.014
ate-0.014
ateful-0.012
ulture-0.012
 Fighting-0.012
Token to
Token ID284
Feature activation-1.67e+00

Pos logit contributions
 degrees+0.084
 minutes+0.081
ches+0.081
 mun+0.080
 pounds+0.079
Neg logit contributions
ate-0.043
roc-0.042
amed-0.035
room-0.034
man-0.034
Token the
Token ID262
Feature activation-7.41e-01

Pos logit contributions
fy+0.106
ches+0.103
 degrees+0.102
 minutes+0.102
 pounds+0.098
Neg logit contributions
ate-0.041
 town-0.035
roc-0.034
room-0.029
 Respect-0.027
Token slide
Token ID10649
Feature activation-1.46e+00

Pos logit contributions
ches+0.031
 minutes+0.031
fy+0.029
 mun+0.028
 degrees+0.027
Neg logit contributions
roc-0.032
ate-0.031
 neighborhood-0.028
ulture-0.028
ateful-0.028
Token.
Token ID13
Feature activation-9.66e-01

Pos logit contributions
 minutes+0.068
ches+0.067
 degrees+0.066
 mun+0.065
fy+0.064
Neg logit contributions
ate-0.051
man-0.047
roc-0.046
room-0.045
flower-0.043
Token They
Token ID1119
Feature activation-2.29e+00

Pos logit contributions
 minutes+0.052
fy+0.051
ches+0.050
 degrees+0.049
 pounds+0.048
Neg logit contributions
ate-0.028
roc-0.027
 Fighting-0.024
 Respect-0.023
ulture-0.023
Token wait
Token ID4043
Feature activation-4.18e+00

Pos logit contributions
 minutes+0.107
 degrees+0.105
ches+0.094
 pounds+0.092
fy+0.090
Neg logit contributions
roc-0.081
ate-0.080
 town-0.078
 judge-0.076
flower-0.072
Token for
Token ID329
Feature activation-1.93e+00

Pos logit contributions
ches+0.216
fy+0.209
 mun+0.199
ience+0.196
cone+0.195
Neg logit contributions
ate-0.113
ap-0.095
,-0.091
 with-0.088
 town-0.084
Token their
Token ID511
Feature activation-1.53e+00

Pos logit contributions
ches+0.110
fy+0.105
 mun+0.102
 pounds+0.096
 wo+0.095
Neg logit contributions
ate-0.054
roc-0.047
 town-0.043
ateful-0.041
room-0.039
Token turn
Token ID1210
Feature activation-1.91e+00

Pos logit contributions
ches+0.077
 minutes+0.068
fy+0.067
 mun+0.066
 degrees+0.064
Neg logit contributions
roc-0.053
 neighborhood-0.049
ate-0.049
 town-0.047
 judge-0.046
Token.
Token ID13
Feature activation-9.13e-01

Pos logit contributions
 mun+0.084
 minutes+0.084
ches+0.082
 degrees+0.080
 slur+0.079
Neg logit contributions
ate-0.065
man-0.057
roc-0.056
flower-0.055
amer-0.054
Token (newline)
Token ID198
Feature activation-1.14e+00

Pos logit contributions
 minutes+0.050
fy+0.049
ches+0.048
 degrees+0.048
 mun+0.047
Neg logit contributions
ate-0.025
roc-0.024
 Fighting-0.021
 Respect-0.020
ulture-0.020
Token?"
Token ID1701
Feature activation-9.92e-01

Pos logit contributions
roc+0.010
ate+0.009
ulture+0.008
ateful+0.008
rot+0.008
Neg logit contributions
 minutes-0.012
ches-0.011
fy-0.011
 drift-0.010
 mun-0.010
Token (newline)
Token ID198
Feature activation-1.00e+00

Pos logit contributions
 minutes+0.051
fy+0.051
ches+0.050
 degrees+0.048
 mun+0.048
Neg logit contributions
ate-0.029
roc-0.027
 Fighting-0.024
 Respect-0.024
 neighborhood-0.024
Token (newline)
Token ID198
Feature activation-4.16e-01

Pos logit contributions
 minutes+0.055
fy+0.053
 mun+0.053
ches+0.052
 degrees+0.052
Neg logit contributions
ate-0.027
roc-0.025
 Respect-0.021
rot-0.021
 neighborhood-0.021
TokenLu
Token ID25596
Feature activation+1.21e-01

Pos logit contributions
 minutes+0.014
ches+0.013
 mun+0.012
fy+0.012
 drift+0.012
Neg logit contributions
roc-0.021
ate-0.020
ulture-0.018
rot-0.018
ateful-0.018
Tokency
Token ID948
Feature activation+4.65e-01

Pos logit contributions
roc+0.004
ate+0.004
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.006
ches-0.006
fy-0.006
 mun-0.006
 drift-0.006
Token thought
Token ID1807
Feature activation-4.09e+00

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.020
ateful+0.020
amer+0.019
Neg logit contributions
 minutes-0.018
ches-0.016
 mun-0.015
fy-0.015
 y-0.015
Token that
Token ID326
Feature activation-8.92e-01

Pos logit contributions
ksh+0.216
 mun+0.214
ns+0.200
fy+0.199
 refreshing+0.190
Neg logit contributions
,-0.094
ate-0.088
room-0.081
 office-0.077
 heart-0.075
Token sounded
Token ID14846
Feature activation-1.92e+00

Pos logit contributions
 minutes+0.047
fy+0.045
ches+0.044
 mun+0.044
 degrees+0.042
Neg logit contributions
ate-0.028
roc-0.027
anish-0.023
man-0.023
 Fighting-0.022
Token like
Token ID588
Feature activation-1.46e+00

Pos logit contributions
ches+0.091
 mun+0.089
 degrees+0.085
 minutes+0.085
ait+0.084
Neg logit contributions
ate-0.060
roc-0.059
ateful-0.050
 sincere-0.049
 Respect-0.049
Token fun
Token ID1257
Feature activation-1.02e+00

Pos logit contributions
ches+0.071
fy+0.065
 minutes+0.063
 drift+0.059
 Whoever+0.059
Neg logit contributions
ate-0.052
roc-0.051
ateful-0.045
ulture-0.043
 Fighting-0.043
Token,
Token ID11
Feature activation-1.52e+00

Pos logit contributions
ches+0.048
 mun+0.047
fy+0.046
 minutes+0.045
 drift+0.043
Neg logit contributions
ate-0.035
roc-0.034
anish-0.029
 neighborhood-0.029
ateful-0.029
Token go
Token ID467
Feature activation-1.40e+00

Pos logit contributions
 minutes+0.006
ches+0.005
fy+0.005
 mun+0.005
 degrees+0.005
Neg logit contributions
roc-0.009
ate-0.009
ulture-0.008
ateful-0.008
rot-0.008
Token find
Token ID1064
Feature activation-4.00e-01

Pos logit contributions
 minutes+0.071
 degrees+0.070
 pounds+0.066
 mun+0.062
fy+0.062
Neg logit contributions
ate-0.048
roc-0.042
man-0.038
gy-0.038
 neighborhood-0.038
Token a
Token ID257
Feature activation-1.26e+00

Pos logit contributions
 minutes+0.018
ches+0.017
fy+0.017
 mun+0.016
 drift+0.016
Neg logit contributions
roc-0.016
ate-0.015
ulture-0.013
ateful-0.013
rot-0.013
Token seat
Token ID5852
Feature activation-2.33e-01

Pos logit contributions
 minutes+0.053
ches+0.052
 degrees+0.050
 mun+0.048
fy+0.048
Neg logit contributions
roc-0.050
ate-0.048
rot-0.045
 neighborhood-0.045
man-0.044
Token and
Token ID290
Feature activation-8.39e-01

Pos logit contributions
 minutes+0.008
ches+0.008
 drift+0.007
fy+0.007
 mun+0.007
Neg logit contributions
roc-0.011
ate-0.011
ulture-0.010
ateful-0.010
man-0.010
Token wait
Token ID4043
Feature activation-4.08e+00

Pos logit contributions
 minutes+0.033
ches+0.031
 degrees+0.029
fy+0.029
 mun+0.027
Neg logit contributions
ate-0.038
roc-0.037
ateful-0.032
ulture-0.032
 Respect-0.032
Token for
Token ID329
Feature activation-1.13e+00

Pos logit contributions
ches+0.196
 mun+0.189
fy+0.184
ns+0.180
cone+0.176
Neg logit contributions
ate-0.126
ap-0.097
 yourself-0.092
 town-0.091
 home-0.091
Token a
Token ID257
Feature activation-2.87e+00

Pos logit contributions
 minutes+0.051
ches+0.050
fy+0.049
 mun+0.048
 degrees+0.047
Neg logit contributions
ate-0.042
roc-0.041
ateful-0.035
 Fighting-0.033
 Respect-0.033
Token turn
Token ID1210
Feature activation-1.93e+00

Pos logit contributions
 degrees+0.119
ches+0.118
 minutes+0.109
 pounds+0.106
 mun+0.104
Neg logit contributions
 judge-0.101
man-0.096
 symbol-0.094
rot-0.094
 town-0.092
Token."
Token ID526
Feature activation-1.09e+00

Pos logit contributions
 mun+0.084
 minutes+0.081
 tug+0.080
 drift+0.079
ches+0.079
Neg logit contributions
ate-0.069
man-0.063
amer-0.060
rot-0.058
roc-0.058
Token They
Token ID1119
Feature activation-2.11e+00

Pos logit contributions
 minutes+0.047
 pounds+0.046
 degrees+0.045
ches+0.043
 kil+0.043
Neg logit contributions
roc-0.044
ate-0.042
 Fighting-0.039
room-0.037
 neighborhood-0.036
Token Let
Token ID3914
Feature activation-1.44e-01

Pos logit contributions
 minutes+0.027
ches+0.026
fy+0.024
 mun+0.024
 y+0.024
Neg logit contributions
roc-0.025
ate-0.024
ulture-0.022
 Fighting-0.021
 Respect-0.021
Token's
Token ID338
Feature activation-1.18e-01

Pos logit contributions
 minutes+0.007
ches+0.006
fy+0.006
 mun+0.006
 degrees+0.006
Neg logit contributions
roc-0.005
ate-0.005
ulture-0.004
rot-0.004
ateful-0.004
Token go
Token ID467
Feature activation-9.46e-01

Pos logit contributions
 minutes+0.004
ches+0.003
fy+0.003
 mun+0.003
 degrees+0.003
Neg logit contributions
roc-0.006
ate-0.006
ulture-0.006
ateful-0.006
rot-0.005
Token inside
Token ID2641
Feature activation-6.57e-01

Pos logit contributions
 minutes+0.046
 degrees+0.043
 pounds+0.042
fy+0.041
 mun+0.041
Neg logit contributions
ate-0.034
roc-0.032
man-0.028
ulture-0.028
flower-0.027
Token and
Token ID290
Feature activation-1.83e-01

Pos logit contributions
 minutes+0.028
ches+0.027
fy+0.026
 mun+0.026
 drift+0.025
Neg logit contributions
ate-0.026
roc-0.026
ateful-0.023
ulture-0.022
amer-0.022
Token wait
Token ID4043
Feature activation-4.05e+00

Pos logit contributions
 minutes+0.007
ches+0.007
fy+0.006
 mun+0.006
 degrees+0.006
Neg logit contributions
roc-0.008
ate-0.008
ulture-0.007
ateful-0.007
rot-0.007
Token for
Token ID329
Feature activation-1.36e+00

Pos logit contributions
ches+0.173
 mun+0.170
fy+0.164
cone+0.160
 Whoever+0.150
Neg logit contributions
ate-0.137
 town-0.109
ap-0.109
 yourself-0.107
room-0.102
Token the
Token ID262
Feature activation-1.00e+00

Pos logit contributions
ches+0.066
fy+0.064
 mun+0.061
 minutes+0.059
 Whoever+0.059
Neg logit contributions
ate-0.047
roc-0.043
ateful-0.038
 town-0.038
 yourself-0.037
Token rain
Token ID6290
Feature activation-1.15e+00

Pos logit contributions
ches+0.042
 minutes+0.042
fy+0.040
 mun+0.039
 Whoever+0.038
Neg logit contributions
roc-0.040
ate-0.038
ulture-0.036
man-0.035
 neighborhood-0.035
Token to
Token ID284
Feature activation-9.20e-01

Pos logit contributions
 minutes+0.048
 mun+0.044
ches+0.044
fy+0.042
 degrees+0.042
Neg logit contributions
ate-0.046
roc-0.045
man-0.042
ulture-0.041
flower-0.041
Token stop
Token ID2245
Feature activation-8.26e-01

Pos logit contributions
 minutes+0.034
ches+0.032
fy+0.032
 mun+0.030
 degrees+0.029
Neg logit contributions
ate-0.043
roc-0.042
ulture-0.037
ateful-0.036
flower-0.036
Token,
Token ID11
Feature activation-2.35e-01

Pos logit contributions
roc+0.005
ate+0.005
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.006
ches-0.006
fy-0.006
 mun-0.005
 drift-0.005
Token they
Token ID484
Feature activation-5.46e-01

Pos logit contributions
 minutes+0.007
ches+0.007
fy+0.006
 degrees+0.006
 mun+0.006
Neg logit contributions
roc-0.013
ate-0.012
ateful-0.011
ulture-0.011
rot-0.011
Token sat
Token ID3332
Feature activation+1.16e+00

Pos logit contributions
 minutes+0.025
 degrees+0.023
ches+0.023
fy+0.022
 drift+0.021
Neg logit contributions
roc-0.021
ate-0.020
ateful-0.018
ulture-0.018
flower-0.018
Token down
Token ID866
Feature activation-6.22e-01

Pos logit contributions
roc+0.064
ate+0.057
ulture+0.057
anish+0.055
ateful+0.055
Neg logit contributions
 minutes-0.037
 y-0.032
ches-0.029
fy-0.028
 mun-0.028
Token for
Token ID329
Feature activation-1.94e+00

Pos logit contributions
 minutes+0.028
ches+0.028
fy+0.027
 degrees+0.026
 mun+0.026
Neg logit contributions
ate-0.023
roc-0.022
ateful-0.020
man-0.020
ulture-0.020
Token a
Token ID257
Feature activation-4.02e+00

Pos logit contributions
ches+0.109
fy+0.105
 drift+0.099
 degrees+0.098
 minutes+0.098
Neg logit contributions
roc-0.048
ate-0.046
 town-0.039
ateful-0.036
amed-0.036
Token picnic
Token ID35715
Feature activation-6.54e-01

Pos logit contributions
 degrees+0.170
 minutes+0.150
 pounds+0.149
 twice+0.147
uct+0.145
Neg logit contributions
 judge-0.140
 town-0.124
man-0.123
 painter-0.122
 musician-0.121
Token.
Token ID13
Feature activation-5.21e-01

Pos logit contributions
 minutes+0.026
ches+0.025
 degrees+0.024
fy+0.023
 Whoever+0.023
Neg logit contributions
ate-0.028
roc-0.027
ulture-0.025
man-0.025
flower-0.024
Token
Token ID220
Feature activation+1.07e-01

Pos logit contributions
 minutes+0.023
ches+0.021
 degrees+0.021
fy+0.021
 pounds+0.020
Neg logit contributions
roc-0.021
ate-0.020
 Fighting-0.018
ulture-0.018
ateful-0.017
Token
Token ID198
Feature activation-1.21e-01

Pos logit contributions
roc+0.004
ate+0.004
ulture+0.004
ateful+0.004
rot+0.003
Neg logit contributions
 minutes-0.005
ches-0.005
fy-0.004
 mun-0.004
 drift-0.004
Token
Token ID198
Feature activation+2.77e-02

Pos logit contributions
 minutes+0.006
ches+0.005
fy+0.005
 mun+0.005
 drift+0.005
Neg logit contributions
roc-0.005
ate-0.004
ulture-0.004
ateful-0.004
rot-0.004
Token Finally
Token ID9461
Feature activation-1.15e+00

Pos logit contributions
 minutes+0.026
ches+0.025
fy+0.024
 degrees+0.023
 mun+0.023
Neg logit contributions
roc-0.020
ate-0.019
ulture-0.017
ateful-0.016
 Respect-0.016
Token,
Token ID11
Feature activation-5.65e-01

Pos logit contributions
ches+0.069
fy+0.067
 mun+0.063
 degrees+0.062
 minutes+0.062
Neg logit contributions
ate-0.024
roc-0.023
anish-0.019
ateful-0.018
flower-0.017
Token after
Token ID706
Feature activation-1.80e+00

Pos logit contributions
 minutes+0.019
ches+0.019
fy+0.018
 degrees+0.017
 mun+0.017
Neg logit contributions
roc-0.027
ate-0.027
ateful-0.024
ulture-0.024
flower-0.023
Token climbing
Token ID14281
Feature activation-2.39e-01

Pos logit contributions
ches+0.090
fy+0.089
 degrees+0.076
 minutes+0.076
 mun+0.076
Neg logit contributions
ate-0.058
roc-0.054
 town-0.054
ateful-0.050
 neighborhood-0.048
Token for
Token ID329
Feature activation-2.22e+00

Pos logit contributions
 minutes+0.011
ches+0.010
fy+0.010
 mun+0.010
 degrees+0.010
Neg logit contributions
roc-0.009
ate-0.009
ulture-0.008
ateful-0.008
rot-0.008
Token a
Token ID257
Feature activation-4.02e+00

Pos logit contributions
ches+0.127
fy+0.124
ksh+0.119
 mun+0.117
 pounds+0.116
Neg logit contributions
 town-0.049
ate-0.046
roc-0.040
amed-0.038
 voice-0.037
Token long
Token ID890
Feature activation-2.47e+00

Pos logit contributions
 degrees+0.176
oll+0.168
oops+0.166
ows+0.159
ches+0.156
Neg logit contributions
 judge-0.121
 town-0.117
 symbol-0.114
 spirit-0.110
 purpose-0.107
Token time
Token ID640
Feature activation-1.58e+00

Pos logit contributions
 degrees+0.102
fy+0.090
ches+0.090
 pounds+0.089
 minutes+0.086
Neg logit contributions
roc-0.086
 town-0.085
gy-0.083
ate-0.082
man-0.080
Token,
Token ID11
Feature activation-9.26e-01

Pos logit contributions
ches+0.081
fy+0.079
 minutes+0.075
 mun+0.075
 syrup+0.073
Neg logit contributions
ate-0.044
roc-0.041
 neighborhood-0.037
flower-0.036
room-0.036
Token Mary
Token ID5335
Feature activation-4.88e-01

Pos logit contributions
ches+0.036
fy+0.034
 minutes+0.034
 mun+0.032
 degrees+0.031
Neg logit contributions
roc-0.041
ate-0.040
ateful-0.036
ulture-0.035
flower-0.034
Token reached
Token ID4251
Feature activation-1.59e+00

Pos logit contributions
 minutes+0.019
fy+0.017
ches+0.017
 drift+0.016
 degrees+0.016
Neg logit contributions
roc-0.022
ate-0.021
rot-0.019
ulture-0.019
man-0.019
Token to
Token ID284
Feature activation-1.56e+00

Pos logit contributions
 degrees+0.056
 minutes+0.056
 mun+0.055
ches+0.054
 pounds+0.053
Neg logit contributions
ate-0.032
roc-0.031
man-0.026
amed-0.026
room-0.024
Token the
Token ID262
Feature activation-3.96e-01

Pos logit contributions
fy+0.098
ches+0.096
 degrees+0.095
 minutes+0.095
 pounds+0.089
Neg logit contributions
ate-0.039
roc-0.035
 town-0.033
 Respect-0.026
room-0.026
Token slide
Token ID10649
Feature activation-1.05e+00

Pos logit contributions
 minutes+0.015
ches+0.015
fy+0.014
 mun+0.013
 degrees+0.013
Neg logit contributions
roc-0.018
ate-0.018
ulture-0.016
man-0.016
rot-0.016
Token.
Token ID13
Feature activation-8.36e-01

Pos logit contributions
 minutes+0.048
ches+0.047
 degrees+0.046
 mun+0.045
fy+0.045
Neg logit contributions
ate-0.038
roc-0.035
man-0.034
room-0.033
flower-0.032
Token They
Token ID1119
Feature activation-2.07e+00

Pos logit contributions
 minutes+0.041
ches+0.040
fy+0.039
 degrees+0.038
 mun+0.037
Neg logit contributions
roc-0.030
ate-0.027
ulture-0.024
 Fighting-0.024
room-0.024
Token wait
Token ID4043
Feature activation-4.01e+00

Pos logit contributions
 minutes+0.095
 degrees+0.095
ches+0.085
 pounds+0.085
fy+0.084
Neg logit contributions
roc-0.074
 town-0.071
ate-0.070
 judge-0.070
 neighborhood-0.064
Token for
Token ID329
Feature activation-1.92e+00

Pos logit contributions
ches+0.199
fy+0.196
 mun+0.185
cone+0.179
ience+0.179
Neg logit contributions
ate-0.114
ap-0.097
,-0.091
 town-0.088
 with-0.086
Token their
Token ID511
Feature activation-1.34e+00

Pos logit contributions
ches+0.116
fy+0.109
 mun+0.103
sc+0.099
 pounds+0.097
Neg logit contributions
ate-0.052
roc-0.047
 town-0.041
ateful-0.039
room-0.036
Token turn
Token ID1210
Feature activation-1.81e+00

Pos logit contributions
ches+0.069
 minutes+0.062
fy+0.061
 mun+0.059
 degrees+0.058
Neg logit contributions
roc-0.046
ate-0.042
 neighborhood-0.041
 town-0.038
man-0.038
Token.
Token ID13
Feature activation-9.49e-01

Pos logit contributions
 mun+0.079
 minutes+0.078
ches+0.077
 drift+0.075
 slur+0.074
Neg logit contributions
ate-0.063
man-0.056
roc-0.055
amer-0.053
flower-0.053
Token L
Token ID406
Feature activation-9.17e-01

Pos logit contributions
ches+0.046
 minutes+0.046
fy+0.045
 pounds+0.043
 mun+0.043
Neg logit contributions
roc-0.033
ate-0.032
 Fighting-0.028
room-0.027
ulture-0.026
Token.
Token ID13
Feature activation+1.24e-01

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token Let
Token ID3914
Feature activation-3.86e-01

Pos logit contributions
roc+0.006
ate+0.005
ulture+0.005
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.005
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token's
Token ID338
Feature activation-7.51e-01

Pos logit contributions
 minutes+0.020
fy+0.018
 mun+0.018
 degrees+0.018
ches+0.018
Neg logit contributions
roc-0.013
ate-0.012
ulture-0.011
rot-0.011
man-0.010
Token go
Token ID467
Feature activation-1.30e+00

Pos logit contributions
 minutes+0.028
ches+0.024
fy+0.023
 degrees+0.023
 mun+0.023
Neg logit contributions
roc-0.037
ate-0.037
ulture-0.032
flower-0.032
man-0.032
Token for
Token ID329
Feature activation-1.44e+00

Pos logit contributions
 minutes+0.067
 degrees+0.064
 mun+0.060
 pounds+0.060
fy+0.059
Neg logit contributions
ate-0.044
roc-0.040
gy-0.036
man-0.035
 neighborhood-0.034
Token a
Token ID257
Feature activation-3.96e+00

Pos logit contributions
ches+0.076
fy+0.074
 mun+0.071
 degrees+0.070
 minutes+0.070
Neg logit contributions
ate-0.041
roc-0.039
 town-0.032
 yourself-0.031
 wing-0.030
Token different
Token ID1180
Feature activation-1.14e+00

Pos logit contributions
 degrees+0.182
 pounds+0.164
 minutes+0.159
uct+0.157
 syrup+0.154
Neg logit contributions
 judge-0.132
 town-0.122
gy-0.120
 spirit-0.119
 friend-0.114
Token walk
Token ID2513
Feature activation+4.08e-02

Pos logit contributions
ches+0.048
 minutes+0.044
fy+0.044
 mun+0.044
 syrup+0.042
Neg logit contributions
ate-0.046
roc-0.046
 neighborhood-0.042
anish-0.040
 town-0.040
Token instead
Token ID2427
Feature activation-6.44e-01

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.002
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token."
Token ID526
Feature activation+2.37e-01

Pos logit contributions
 minutes+0.025
ches+0.022
 mun+0.022
fy+0.022
 drift+0.022
Neg logit contributions
ate-0.028
roc-0.028
ulture-0.025
rot-0.025
ateful-0.025
Token But
Token ID887
Feature activation+2.20e-01

Pos logit contributions
roc+0.011
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.010
ches-0.009
fy-0.009
 mun-0.009
 drift-0.008
Token on
Token ID319
Feature activation-1.42e+00

Pos logit contributions
roc+0.011
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.009
ches-0.008
fy-0.008
 mun-0.008
 y-0.008
Token top
Token ID1353
Feature activation-1.76e+00

Pos logit contributions
fy+0.073
ches+0.073
 minutes+0.072
 mun+0.067
 degrees+0.066
Neg logit contributions
ate-0.045
roc-0.043
 town-0.036
man-0.035
room-0.035
Token of
Token ID286
Feature activation-1.28e+00

Pos logit contributions
 degrees+0.084
 minutes+0.084
 mun+0.083
 wo+0.083
ches+0.083
Neg logit contributions
ate-0.054
roc-0.046
amer-0.045
man-0.045
room-0.045
Token it
Token ID340
Feature activation-2.16e+00

Pos logit contributions
ches+0.070
fy+0.070
 minutes+0.067
 degrees+0.062
 mun+0.060
Neg logit contributions
roc-0.040
ate-0.039
ateful-0.033
 town-0.032
gy-0.030
Token for
Token ID329
Feature activation-1.78e+00

Pos logit contributions
 minutes+0.107
 degrees+0.104
 mun+0.103
 syrup+0.102
 slur+0.099
Neg logit contributions
ate-0.064
man-0.054
room-0.051
stick-0.049
 neighborhood-0.048
Token a
Token ID257
Feature activation-3.96e+00

Pos logit contributions
ches+0.094
fy+0.092
 minutes+0.084
 drift+0.082
 degrees+0.081
Neg logit contributions
roc-0.054
ate-0.050
 town-0.044
ateful-0.043
room-0.042
Token while
Token ID981
Feature activation-1.58e+00

Pos logit contributions
 degrees+0.160
ches+0.152
 minutes+0.149
uct+0.147
oops+0.144
Neg logit contributions
 judge-0.130
 symbol-0.122
 town-0.121
 nation-0.116
man-0.115
Token.
Token ID13
Feature activation-9.22e-01

Pos logit contributions
ches+0.074
fy+0.071
 mun+0.069
 degrees+0.067
 minutes+0.067
Neg logit contributions
roc-0.047
ate-0.046
 town-0.046
amer-0.044
man-0.044
Token
Token ID198
Feature activation-8.60e-01

Pos logit contributions
 minutes+0.047
ches+0.044
 degrees+0.043
fy+0.043
 pounds+0.042
Neg logit contributions
roc-0.031
ate-0.031
 Fighting-0.028
 Respect-0.026
 neighborhood-0.025
Token
Token ID198
Feature activation-3.92e-01

Pos logit contributions
 minutes+0.046
fy+0.042
 degrees+0.042
ches+0.041
 mun+0.041
Neg logit contributions
ate-0.027
roc-0.027
ulture-0.022
man-0.021
rot-0.021
TokenAs
Token ID1722
Feature activation-1.14e+00

Pos logit contributions
 minutes+0.015
ches+0.014
 drift+0.013
fy+0.013
 degrees+0.013
Neg logit contributions
roc-0.018
ate-0.017
ulture-0.015
ateful-0.015
 Fighting-0.015
Token
Token ID198
Feature activation-8.35e-01

Pos logit contributions
 minutes+0.033
 degrees+0.031
ches+0.031
fy+0.031
 pounds+0.030
Neg logit contributions
roc-0.020
ate-0.020
ulture-0.017
 Fighting-0.017
 neighborhood-0.016
Token
Token ID198
Feature activation-5.30e-01

Pos logit contributions
 minutes+0.044
fy+0.041
 degrees+0.041
ches+0.040
 mun+0.040
Neg logit contributions
ate-0.026
roc-0.026
ulture-0.022
rot-0.021
ateful-0.021
TokenAfter
Token ID3260
Feature activation-1.28e+00

Pos logit contributions
 minutes+0.023
ches+0.022
 drift+0.021
 degrees+0.021
 mun+0.021
Neg logit contributions
ate-0.021
roc-0.021
ulture-0.018
rot-0.018
amed-0.017
Token swinging
Token ID25635
Feature activation+1.99e+00

Pos logit contributions
ches+0.068
fy+0.065
 degrees+0.062
 pounds+0.061
 minutes+0.060
Neg logit contributions
ate-0.035
roc-0.035
 Fighting-0.033
 town-0.032
room-0.032
Token for
Token ID329
Feature activation-1.59e+00

Pos logit contributions
roc+0.098
ate+0.097
ateful+0.091
amer+0.088
ulture+0.086
Neg logit contributions
 minutes-0.072
 y-0.065
ches-0.064
fy-0.062
 wo-0.061
Token a
Token ID257
Feature activation-3.92e+00

Pos logit contributions
ches+0.086
fy+0.084
 degrees+0.081
 drift+0.080
 pounds+0.079
Neg logit contributions
roc-0.041
ate-0.039
 town-0.033
amed-0.033
ateful-0.031
Token while
Token ID981
Feature activation-7.71e-01

Pos logit contributions
 degrees+0.174
 pounds+0.156
oll+0.150
oops+0.147
 minutes+0.145
Neg logit contributions
 judge-0.137
 town-0.121
roc-0.120
Queen-0.113
 pupil-0.112
Token,
Token ID11
Feature activation-7.39e-01

Pos logit contributions
ches+0.046
fy+0.045
 degrees+0.045
 minutes+0.044
 mun+0.044
Neg logit contributions
roc-0.015
ate-0.013
ateful-0.011
rot-0.011
ulture-0.011
Token Lily
Token ID20037
Feature activation-1.42e-01

Pos logit contributions
 minutes+0.026
ches+0.026
fy+0.026
 degrees+0.024
 pounds+0.023
Neg logit contributions
roc-0.033
ate-0.032
ateful-0.030
ulture-0.029
rot-0.029
Token's
Token ID338
Feature activation+2.10e-01

Pos logit contributions
 minutes+0.006
ches+0.006
fy+0.006
 drift+0.006
 degrees+0.006
Neg logit contributions
roc-0.006
ate-0.005
ulture-0.005
ateful-0.005
rot-0.005
Token tum
Token ID11814
Feature activation+1.56e+00

Pos logit contributions
roc+0.008
ate+0.008
ulture+0.007
ateful+0.007
rot+0.007
Neg logit contributions
 minutes-0.010
ches-0.009
fy-0.009
 mun-0.009
 drift-0.009
Token"
Token ID1
Feature activation-3.41e-01

Pos logit contributions
roc+0.030
ate+0.028
ulture+0.027
ateful+0.026
rot+0.026
Neg logit contributions
 minutes-0.022
ches-0.019
fy-0.019
 Whoever-0.018
 y-0.018
TokenLet
Token ID5756
Feature activation-6.82e-02

Pos logit contributions
 minutes+0.016
ches+0.016
fy+0.015
 drift+0.015
 degrees+0.015
Neg logit contributions
roc-0.012
ate-0.012
ulture-0.011
ateful-0.010
rot-0.010
Token's
Token ID338
Feature activation-4.92e-01

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.003
ate-0.002
ulture-0.002
ateful-0.002
rot-0.002
Token go
Token ID467
Feature activation-1.73e+00

Pos logit contributions
 minutes+0.018
ches+0.015
fy+0.015
 degrees+0.015
 mun+0.014
Neg logit contributions
roc-0.025
ate-0.025
ulture-0.022
ateful-0.021
man-0.021
Token for
Token ID329
Feature activation-1.66e+00

Pos logit contributions
 minutes+0.082
 degrees+0.078
 pounds+0.072
fy+0.072
 mun+0.070
Neg logit contributions
ate-0.065
roc-0.056
man-0.053
gy-0.052
flower-0.051
Token a
Token ID257
Feature activation-3.91e+00

Pos logit contributions
ches+0.090
fy+0.089
 mun+0.085
 degrees+0.084
 minutes+0.083
Neg logit contributions
ate-0.046
roc-0.039
 town-0.034
 yourself-0.032
amed-0.030
Token row
Token ID5752
Feature activation+3.23e-01

Pos logit contributions
 degrees+0.185
uct+0.167
oll+0.166
 twice+0.164
 pounds+0.163
Neg logit contributions
 town-0.120
 judge-0.120
 nation-0.117
gy-0.113
man-0.109
Token!",
Token ID40754
Feature activation-1.03e-01

Pos logit contributions
roc+0.016
ate+0.015
ulture+0.014
ateful+0.014
rot+0.013
Neg logit contributions
 minutes-0.012
ches-0.011
fy-0.011
 mun-0.010
 drift-0.010
Token her
Token ID607
Feature activation+1.97e-02

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 degrees+0.003
 drift+0.003
Neg logit contributions
roc-0.005
ate-0.005
ateful-0.005
ulture-0.005
rot-0.005
Token mom
Token ID1995
Feature activation-1.89e+00

Pos logit contributions
roc+0.001
ate+0.001
ulture+0.001
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token said
Token ID531
Feature activation-9.33e-01

Pos logit contributions
 minutes+0.076
 degrees+0.073
 refreshing+0.069
 marsh+0.069
 pounds+0.068
Neg logit contributions
ate-0.075
man-0.073
roc-0.070
amed-0.068
amer-0.066
Token in
Token ID287
Feature activation+4.56e-01

Pos logit contributions
roc+0.010
ate+0.009
ulture+0.008
ateful+0.008
rot+0.008
Neg logit contributions
 minutes-0.008
ches-0.008
fy-0.007
 mun-0.007
 drift-0.007
Token the
Token ID262
Feature activation+4.94e-01

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.020
rot+0.019
ateful+0.019
Neg logit contributions
 minutes-0.017
ches-0.015
 drift-0.015
 y-0.015
fy-0.014
Token front
Token ID2166
Feature activation-2.13e-01

Pos logit contributions
roc+0.020
ate+0.018
ateful+0.017
ulture+0.017
amer+0.016
Neg logit contributions
 minutes-0.023
ches-0.021
 y-0.021
fy-0.020
 drift-0.020
Token yard
Token ID12699
Feature activation-2.21e-01

Pos logit contributions
 minutes+0.010
ches+0.009
fy+0.009
 mun+0.009
 drift+0.009
Neg logit contributions
roc-0.008
ate-0.008
ulture-0.007
ateful-0.007
rot-0.007
Token and
Token ID290
Feature activation-4.25e-01

Pos logit contributions
 minutes+0.008
ches+0.007
fy+0.007
 drift+0.007
 mun+0.007
Neg logit contributions
roc-0.011
ate-0.010
ulture-0.009
ateful-0.009
man-0.009
Token wait
Token ID4043
Feature activation-3.91e+00

Pos logit contributions
 minutes+0.015
ches+0.014
fy+0.014
 degrees+0.013
 mun+0.013
Neg logit contributions
roc-0.021
ate-0.020
ateful-0.018
ulture-0.018
man-0.017
Token.
Token ID13
Feature activation-1.01e+00

Pos logit contributions
 mun+0.179
ches+0.178
fy+0.177
cone+0.176
 pounds+0.166
Neg logit contributions
ate-0.116
ap-0.094
 yourself-0.087
 town-0.085
 with-0.083
Token Anna
Token ID11735
Feature activation-5.58e-01

Pos logit contributions
 minutes+0.037
ches+0.037
 mun+0.036
fy+0.035
 degrees+0.035
Neg logit contributions
ate-0.043
roc-0.042
 Fighting-0.038
 Respect-0.038
 neighborhood-0.037
Token did
Token ID750
Feature activation-1.75e+00

Pos logit contributions
 minutes+0.023
ches+0.021
fy+0.021
 drift+0.020
 degrees+0.020
Neg logit contributions
roc-0.024
ate-0.023
ulture-0.020
rot-0.020
amed-0.020
Token as
Token ID355
Feature activation-8.70e-01

Pos logit contributions
fy+0.096
ches+0.095
 minutes+0.093
 mun+0.092
 pounds+0.089
Neg logit contributions
ate-0.047
roc-0.037
 yourself-0.031
 town-0.030
room-0.029
Token her
Token ID607
Feature activation-9.19e-01

Pos logit contributions
 minutes+0.037
ches+0.035
fy+0.034
 mun+0.033
 Whoever+0.033
Neg logit contributions
ate-0.037
roc-0.036
ateful-0.035
ulture-0.031
ums-0.030
Token"
Token ID1
Feature activation+6.13e-01

Pos logit contributions
roc+0.019
ate+0.018
ateful+0.016
ulture+0.016
rot+0.015
Neg logit contributions
 minutes-0.020
ches-0.019
fy-0.018
 drift-0.018
 mun-0.017
TokenLet
Token ID5756
Feature activation+8.79e-01

Pos logit contributions
roc+0.025
ate+0.023
ulture+0.022
ateful+0.021
rot+0.020
Neg logit contributions
 minutes-0.027
ches-0.025
fy-0.024
 mun-0.024
 y-0.024
Token's
Token ID338
Feature activation+9.49e-01

Pos logit contributions
roc+0.040
ate+0.038
ateful+0.037
ulture+0.036
amer+0.035
Neg logit contributions
 minutes-0.034
ches-0.033
 y-0.031
 drift-0.030
fy-0.030
Token go
Token ID467
Feature activation-1.05e+00

Pos logit contributions
roc+0.055
ate+0.051
ulture+0.050
ateful+0.050
rot+0.048
Neg logit contributions
 minutes-0.024
ches-0.023
 drift-0.023
 y-0.022
fy-0.022
Token for
Token ID329
Feature activation-1.60e+00

Pos logit contributions
 minutes+0.050
 degrees+0.047
fy+0.044
ches+0.043
 pounds+0.043
Neg logit contributions
ate-0.039
roc-0.037
rot-0.032
man-0.032
ulture-0.032
Token a
Token ID257
Feature activation-3.91e+00

Pos logit contributions
fy+0.082
ches+0.081
 minutes+0.079
 degrees+0.078
 mun+0.077
Neg logit contributions
ate-0.049
roc-0.043
 town-0.037
 neighborhood-0.033
 Respect-0.033
Token walk
Token ID2513
Feature activation+2.59e-01

Pos logit contributions
 degrees+0.180
 pounds+0.160
 minutes+0.159
 twice+0.147
lessly+0.139
Neg logit contributions
 judge-0.139
gy-0.133
 nation-0.132
 town-0.132
date-0.126
Token!"
Token ID2474
Feature activation-4.23e-01

Pos logit contributions
roc+0.012
ate+0.012
ulture+0.011
ateful+0.011
rot+0.011
Neg logit contributions
 minutes-0.010
ches-0.009
fy-0.008
 mun-0.008
 drift-0.008
Token said
Token ID531
Feature activation+9.40e-01

Pos logit contributions
 minutes+0.014
ches+0.013
 degrees+0.013
fy+0.013
 drift+0.012
Neg logit contributions
roc-0.021
ate-0.020
ulture-0.019
ateful-0.019
rot-0.018
Token Mom
Token ID11254
Feature activation-1.14e+00

Pos logit contributions
roc+0.050
ate+0.047
ulture+0.045
ateful+0.045
rot+0.043
Neg logit contributions
 minutes-0.030
ches-0.028
 drift-0.027
 mun-0.026
 y-0.026
Tokenmy
Token ID1820
Feature activation-1.13e+00

Pos logit contributions
 minutes+0.066
 degrees+0.059
ches+0.059
fy+0.058
 y+0.058
Neg logit contributions
ate-0.030
roc-0.029
man-0.028
flower-0.028
amer-0.025
Token run
Token ID1057
Feature activation-9.48e-01

Pos logit contributions
 degrees+0.117
 minutes+0.116
ches+0.107
 pounds+0.104
fy+0.104
Neg logit contributions
roc-0.085
 town-0.081
 judge-0.081
ate-0.077
 neighborhood-0.072
Token to
Token ID284
Feature activation-1.79e+00

Pos logit contributions
 minutes+0.047
 degrees+0.046
 mun+0.046
ches+0.045
 drift+0.044
Neg logit contributions
ate-0.030
roc-0.029
man-0.025
amed-0.024
ulture-0.023
Token the
Token ID262
Feature activation-7.30e-01

Pos logit contributions
fy+0.114
ches+0.111
 degrees+0.110
 minutes+0.110
oll+0.104
Neg logit contributions
ate-0.044
 town-0.037
roc-0.036
room-0.029
 Respect-0.028
Token slide
Token ID10649
Feature activation-9.60e-01

Pos logit contributions
 minutes+0.030
ches+0.030
fy+0.028
 mun+0.027
 degrees+0.026
Neg logit contributions
roc-0.032
ate-0.031
ulture-0.028
man-0.028
 neighborhood-0.027
Token and
Token ID290
Feature activation-1.81e+00

Pos logit contributions
 minutes+0.045
ches+0.044
 degrees+0.043
 mun+0.042
fy+0.042
Neg logit contributions
ate-0.034
roc-0.032
man-0.030
room-0.029
flower-0.028
Token wait
Token ID4043
Feature activation-3.91e+00

Pos logit contributions
 degrees+0.087
 minutes+0.084
ches+0.081
 pounds+0.079
fy+0.078
Neg logit contributions
roc-0.059
ate-0.059
 town-0.056
 judge-0.056
 bully-0.053
Token for
Token ID329
Feature activation-2.22e+00

Pos logit contributions
ches+0.208
fy+0.203
 mun+0.195
ns+0.188
cone+0.188
Neg logit contributions
ate-0.097
ap-0.084
 yourself-0.072
ings-0.071
aps-0.071
Token their
Token ID511
Feature activation-1.22e+00

Pos logit contributions
ches+0.132
fy+0.127
 mun+0.120
sc+0.116
rig+0.115
Neg logit contributions
ate-0.058
roc-0.048
 town-0.046
ateful-0.045
 others-0.044
Token turn
Token ID1210
Feature activation-1.90e+00

Pos logit contributions
ches+0.065
 minutes+0.060
fy+0.058
 mun+0.056
 degrees+0.055
Neg logit contributions
roc-0.040
ate-0.037
 neighborhood-0.035
ateful-0.033
man-0.032
Token.
Token ID13
Feature activation-4.68e-01

Pos logit contributions
 minutes+0.084
 mun+0.083
ches+0.080
 drift+0.076
 degrees+0.076
Neg logit contributions
ate-0.067
man-0.059
roc-0.058
ateful-0.056
amer-0.056
Token
Token ID198
Feature activation-1.09e+00

Pos logit contributions
 minutes+0.024
ches+0.023
fy+0.022
 degrees+0.022
 mun+0.022
Neg logit contributions
roc-0.015
ate-0.015
ulture-0.013
 Respect-0.012
ateful-0.012
Token "
Token ID366
Feature activation-6.43e-01

Pos logit contributions
 minutes+0.038
fy+0.038
ches+0.038
 drift+0.036
 mun+0.035
Neg logit contributions
roc-0.029
ate-0.029
ulture-0.024
man-0.024
 Respect-0.024
TokenWe
Token ID1135
Feature activation-3.90e-01

Pos logit contributions
 minutes+0.031
ches+0.030
fy+0.028
 degrees+0.028
 drift+0.028
Neg logit contributions
roc-0.023
ate-0.023
ulture-0.021
amer-0.020
rot-0.020
Token can
Token ID460
Feature activation-2.77e-01

Pos logit contributions
 minutes+0.013
ches+0.012
 degrees+0.011
fy+0.011
 mun+0.011
Neg logit contributions
roc-0.020
ate-0.019
ulture-0.018
ateful-0.018
rot-0.018
Token go
Token ID467
Feature activation-5.83e-01

Pos logit contributions
 minutes+0.011
ches+0.009
fy+0.009
 degrees+0.009
 mun+0.009
Neg logit contributions
roc-0.013
ate-0.013
ulture-0.012
ateful-0.012
rot-0.012
Token on
Token ID319
Feature activation-1.07e+00

Pos logit contributions
 minutes+0.025
 mun+0.022
 degrees+0.022
ches+0.022
fy+0.022
Neg logit contributions
roc-0.024
ate-0.024
man-0.021
amer-0.021
ulture-0.021
Token a
Token ID257
Feature activation-3.89e+00

Pos logit contributions
 minutes+0.049
fy+0.048
 mun+0.047
ches+0.047
 drift+0.044
Neg logit contributions
ate-0.038
roc-0.037
ateful-0.031
 town-0.030
man-0.029
Token road
Token ID2975
Feature activation+1.59e-01

Pos logit contributions
uct+0.165
 degrees+0.164
utt+0.163
 mun+0.161
oll+0.157
Neg logit contributions
man-0.123
 nation-0.116
 town-0.114
 army-0.107
 voice-0.106
Token trip
Token ID5296
Feature activation+3.71e-01

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.007
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.006
ches-0.006
fy-0.005
 mun-0.005
 drift-0.005
Token."
Token ID526
Feature activation-8.03e-01

Pos logit contributions
roc+0.019
ate+0.018
ulture+0.017
ateful+0.016
rot+0.016
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.012
 y-0.011
 drift-0.011
Token
Token ID198
Feature activation-5.46e-01

Pos logit contributions
ches+0.039
fy+0.039
 minutes+0.039
 degrees+0.037
 mun+0.036
Neg logit contributions
roc-0.028
ate-0.027
ulture-0.022
man-0.022
 Respect-0.022
Token
Token ID198
Feature activation+1.01e-01

Pos logit contributions
 minutes+0.028
fy+0.027
ches+0.027
 mun+0.026
 degrees+0.026
Neg logit contributions
roc-0.018
ate-0.018
ulture-0.015
man-0.014
ateful-0.014
Token
Token ID198
Feature activation+1.29e-01

Pos logit contributions
 minutes+0.030
fy+0.028
 degrees+0.027
ches+0.027
 mun+0.027
Neg logit contributions
ate-0.019
roc-0.019
ulture-0.016
man-0.015
ateful-0.015
TokenAfter
Token ID3260
Feature activation-6.67e-01

Pos logit contributions
roc+0.006
ate+0.006
ulture+0.006
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.005
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token they
Token ID484
Feature activation-1.83e+00

Pos logit contributions
ches+0.031
fy+0.028
 minutes+0.028
 degrees+0.026
 mun+0.025
Neg logit contributions
ate-0.025
roc-0.025
ateful-0.023
 Fighting-0.022
room-0.022
Token played
Token ID2826
Feature activation+3.61e-02

Pos logit contributions
 degrees+0.079
ches+0.074
 pounds+0.072
 minutes+0.071
 wo+0.064
Neg logit contributions
amed-0.065
roc-0.065
ate-0.064
room-0.061
 town-0.060
Token for
Token ID329
Feature activation-1.62e+00

Pos logit contributions
roc+0.001
ate+0.001
ulture+0.001
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.002
ches-0.002
fy-0.002
 mun-0.001
 drift-0.001
Token a
Token ID257
Feature activation-3.88e+00

Pos logit contributions
ches+0.092
fy+0.085
 degrees+0.083
 drift+0.082
 mun+0.081
Neg logit contributions
roc-0.041
ate-0.039
 town-0.034
ateful-0.031
room-0.031
Token while
Token ID981
Feature activation-5.62e-01

Pos logit contributions
 degrees+0.168
 pounds+0.148
ches+0.144
 minutes+0.142
 lets+0.140
Neg logit contributions
 judge-0.131
 town-0.118
roc-0.115
 nation-0.115
 symbol-0.112
Token,
Token ID11
Feature activation-4.20e-01

Pos logit contributions
ches+0.033
 minutes+0.032
fy+0.031
 degrees+0.031
 mun+0.030
Neg logit contributions
roc-0.013
ate-0.012
ateful-0.010
ulture-0.010
rot-0.010
Token it
Token ID340
Feature activation-5.25e-01

Pos logit contributions
 minutes+0.014
ches+0.014
fy+0.013
 degrees+0.012
 drift+0.012
Neg logit contributions
roc-0.021
ate-0.020
ateful-0.019
ulture-0.019
man-0.018
Token was
Token ID373
Feature activation-6.86e-01

Pos logit contributions
 minutes+0.024
fy+0.022
ches+0.022
 degrees+0.022
 mun+0.022
Neg logit contributions
roc-0.019
ate-0.019
ulture-0.017
man-0.017
rot-0.016
Token time
Token ID640
Feature activation-2.24e+00

Pos logit contributions
ches+0.021
 minutes+0.021
fy+0.020
 mun+0.020
 degrees+0.019
Neg logit contributions
ate-0.035
roc-0.033
ateful-0.031
amed-0.030
flower-0.030
Token around
Token ID1088
Feature activation-1.61e-01

Pos logit contributions
 minutes+0.024
ches+0.022
 mun+0.022
fy+0.021
 degrees+0.021
Neg logit contributions
roc-0.024
ate-0.024
ulture-0.021
rot-0.021
man-0.021
Token.
Token ID13
Feature activation-7.62e-01

Pos logit contributions
 minutes+0.007
ches+0.006
fy+0.006
 mun+0.006
 drift+0.006
Neg logit contributions
roc-0.007
ate-0.007
ulture-0.006
ateful-0.006
rot-0.006
Token After
Token ID2293
Feature activation-8.58e-01

Pos logit contributions
 minutes+0.037
ches+0.035
fy+0.033
 degrees+0.033
 drift+0.032
Neg logit contributions
roc-0.027
ate-0.026
 Fighting-0.023
ulture-0.023
 Respect-0.023
Token playing
Token ID2712
Feature activation+8.60e-02

Pos logit contributions
ches+0.040
fy+0.038
 minutes+0.037
 degrees+0.035
 drift+0.034
Neg logit contributions
roc-0.030
ate-0.029
ateful-0.026
ulture-0.025
rot-0.025
Token for
Token ID329
Feature activation-1.54e+00

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.004
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token a
Token ID257
Feature activation-3.88e+00

Pos logit contributions
ches+0.077
fy+0.076
 degrees+0.074
 mun+0.073
 drift+0.072
Neg logit contributions
roc-0.043
ate-0.042
 town-0.037
room-0.035
amed-0.035
Token long
Token ID890
Feature activation-2.24e+00

Pos logit contributions
 degrees+0.167
oops+0.155
oll+0.152
uct+0.150
utt+0.149
Neg logit contributions
 judge-0.123
 town-0.112
 nation-0.108
roc-0.107
 team-0.107
Token time
Token ID640
Feature activation-5.43e-01

Pos logit contributions
 degrees+0.086
 pounds+0.077
 minutes+0.075
fy+0.074
 y+0.073
Neg logit contributions
roc-0.088
gy-0.083
ate-0.081
man-0.081
 town-0.081
Token,
Token ID11
Feature activation+9.65e-02

Pos logit contributions
 minutes+0.029
ches+0.029
fy+0.028
 mun+0.027
 drift+0.027
Neg logit contributions
roc-0.015
ate-0.014
ateful-0.012
ulture-0.012
rot-0.012
Token they
Token ID484
Feature activation-5.38e-01

Pos logit contributions
roc+0.005
ate+0.005
ulture+0.005
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.003
 mun-0.002
 drift-0.002
Token went
Token ID1816
Feature activation+1.59e-01

Pos logit contributions
 minutes+0.022
 degrees+0.019
ches+0.019
fy+0.019
 drift+0.018
Neg logit contributions
roc-0.024
ate-0.023
ulture-0.021
rot-0.021
ateful-0.021
TokenThe
Token ID464
Feature activation+5.46e-01

Pos logit contributions
roc+0.004
ate+0.004
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.005
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token old
Token ID1468
Feature activation+1.87e+00

Pos logit contributions
roc+0.026
ate+0.024
ulture+0.023
ateful+0.023
anish+0.022
Neg logit contributions
 minutes-0.022
 y-0.019
ches-0.019
 drift-0.018
fy-0.018
Token lady
Token ID10846
Feature activation-6.30e-01

Pos logit contributions
roc+0.102
ateful+0.092
anish+0.091
ulture+0.091
amed+0.091
Neg logit contributions
 minutes-0.055
 y-0.052
 wo-0.049
 gob-0.046
 sn-0.045
Token thought
Token ID1807
Feature activation-3.04e+00

Pos logit contributions
 minutes+0.022
ches+0.019
fy+0.019
 degrees+0.019
 drift+0.019
Neg logit contributions
roc-0.031
ate-0.030
rot-0.027
ulture-0.027
man-0.027
Token and
Token ID290
Feature activation-3.99e-01

Pos logit contributions
 timer+0.156
ksh+0.154
 mun+0.146
 minutes+0.141
 degrees+0.141
Neg logit contributions
ate-0.072
 others-0.061
,-0.060
roc-0.060
room-0.058
Token thought
Token ID1807
Feature activation-3.88e+00

Pos logit contributions
 minutes+0.013
ches+0.012
fy+0.012
 degrees+0.012
 drift+0.011
Neg logit contributions
roc-0.020
ate-0.020
ateful-0.018
ulture-0.018
rot-0.018
Token but
Token ID475
Feature activation-2.82e-01

Pos logit contributions
 timer+0.221
ksh+0.216
 mun+0.214
 tast+0.205
 gob+0.205
Neg logit contributions
,-0.067
room-0.063
ate-0.053
ings-0.044
ums-0.044
Token she
Token ID673
Feature activation-5.41e-01

Pos logit contributions
 minutes+0.009
ches+0.008
fy+0.008
 degrees+0.008
 drift+0.008
Neg logit contributions
roc-0.015
ate-0.014
ulture-0.013
ateful-0.013
rot-0.013
Token couldn
Token ID3521
Feature activation-3.75e-01

Pos logit contributions
 minutes+0.021
 drift+0.018
ches+0.018
fy+0.018
 degrees+0.018
Neg logit contributions
roc-0.025
ate-0.024
ulture-0.022
rot-0.022
man-0.022
Tokenâ
Token ID22940
Feature activation+9.20e-02

Pos logit contributions
 minutes+0.018
fy+0.016
ches+0.016
 mun+0.016
 degrees+0.015
Neg logit contributions
roc-0.014
ate-0.014
ulture-0.012
rot-0.012
ateful-0.012
Token€
Token ID26391
Feature activation-2.63e-01

Pos logit contributions
roc+0.004
ate+0.004
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.004
ches-0.003
fy-0.003
 mun-0.003
 drift-0.003
Token's
Token ID338
Feature activation-7.16e-01

Pos logit contributions
 minutes+0.033
 mun+0.031
fy+0.031
 degrees+0.030
 pounds+0.030
Neg logit contributions
roc-0.020
ate-0.019
ulture-0.017
man-0.016
rot-0.016
Token go
Token ID467
Feature activation-1.12e+00

Pos logit contributions
 minutes+0.026
ches+0.024
fy+0.023
 degrees+0.022
 mun+0.022
Neg logit contributions
ate-0.035
roc-0.035
ulture-0.030
man-0.030
ateful-0.030
Token back
Token ID736
Feature activation-5.07e-01

Pos logit contributions
 minutes+0.060
 degrees+0.058
 pounds+0.056
 mun+0.055
ches+0.055
Neg logit contributions
ate-0.033
roc-0.030
man-0.028
gy-0.026
ulture-0.026
Token home
Token ID1363
Feature activation+6.21e-02

Pos logit contributions
 minutes+0.023
ches+0.022
 mun+0.021
fy+0.021
 degrees+0.021
Neg logit contributions
ate-0.019
roc-0.019
ateful-0.016
ulture-0.016
amer-0.016
Token and
Token ID290
Feature activation-5.83e-01

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.002
ches-0.002
fy-0.002
 mun-0.002
 drift-0.002
Token wait
Token ID4043
Feature activation-3.87e+00

Pos logit contributions
 minutes+0.023
ches+0.022
 degrees+0.021
fy+0.021
 mun+0.020
Neg logit contributions
roc-0.025
ate-0.025
ulture-0.022
man-0.022
ateful-0.022
Token for
Token ID329
Feature activation-1.51e+00

Pos logit contributions
ches+0.172
 mun+0.166
cone+0.163
fy+0.157
pill+0.152
Neg logit contributions
ate-0.130
ap-0.110
 town-0.103
room-0.098
 yourself-0.097
Token the
Token ID262
Feature activation-9.52e-01

Pos logit contributions
ches+0.078
fy+0.076
 mun+0.074
 minutes+0.069
rig+0.068
Neg logit contributions
ate-0.048
roc-0.044
ateful-0.039
 town-0.039
 yourself-0.037
Token sun
Token ID4252
Feature activation-6.71e-01

Pos logit contributions
ches+0.042
 minutes+0.041
fy+0.040
 mun+0.038
 degrees+0.037
Neg logit contributions
roc-0.037
ate-0.035
ulture-0.033
 neighborhood-0.033
man-0.032
Token to
Token ID284
Feature activation-9.77e-01

Pos logit contributions
 minutes+0.027
ches+0.025
fy+0.024
 mun+0.024
 degrees+0.023
Neg logit contributions
roc-0.029
ate-0.028
flower-0.026
ulture-0.026
man-0.026
Token come
Token ID1282
Feature activation-4.40e-01

Pos logit contributions
 minutes+0.038
ches+0.035
fy+0.035
 mun+0.033
 degrees+0.033
Neg logit contributions
ate-0.044
roc-0.043
ulture-0.038
room-0.038
flower-0.038
Token As
Token ID1081
Feature activation-8.00e-01

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.002
 mun+0.002
 drift+0.002
Neg logit contributions
roc-0.003
ate-0.002
ulture-0.002
ateful-0.002
rot-0.002
Token he
Token ID339
Feature activation-6.73e-01

Pos logit contributions
ches+0.027
fy+0.026
 minutes+0.026
 degrees+0.025
 mun+0.025
Neg logit contributions
roc-0.038
ate-0.037
man-0.035
amer-0.034
ateful-0.034
Token walked
Token ID6807
Feature activation+1.39e+00

Pos logit contributions
 minutes+0.026
ches+0.024
fy+0.024
 degrees+0.023
 y+0.022
Neg logit contributions
roc-0.029
ate-0.029
ulture-0.026
rot-0.025
ateful-0.025
Token through
Token ID832
Feature activation+5.11e-01

Pos logit contributions
roc+0.072
ateful+0.067
ate+0.067
ulture+0.066
anish+0.062
Neg logit contributions
 minutes-0.050
ches-0.046
fy-0.043
 drift-0.043
 y-0.040
Token the
Token ID262
Feature activation+6.84e-01

Pos logit contributions
roc+0.024
ate+0.023
ulture+0.022
ateful+0.021
rot+0.021
Neg logit contributions
 minutes-0.020
ches-0.018
 drift-0.017
fy-0.017
 y-0.017
Token wood
Token ID4898
Feature activation+2.12e+00

Pos logit contributions
roc+0.025
ate+0.024
ateful+0.023
ulture+0.023
rot+0.022
Neg logit contributions
 minutes-0.033
ches-0.031
fy-0.030
 drift-0.029
 y-0.029
Token,
Token ID11
Feature activation-4.06e-01

Pos logit contributions
roc+0.101
ate+0.094
ateful+0.089
ulture+0.087
rot+0.084
Neg logit contributions
 minutes-0.086
ches-0.078
 drift-0.075
 mun-0.072
 y-0.071
Token he
Token ID339
Feature activation+9.80e-01

Pos logit contributions
 minutes+0.013
ches+0.012
fy+0.012
 mun+0.011
 drift+0.011
Neg logit contributions
roc-0.021
ate-0.021
ateful-0.019
ulture-0.019
man-0.019
Token saw
Token ID2497
Feature activation+1.47e+00

Pos logit contributions
roc+0.045
ate+0.042
ulture+0.040
ateful+0.040
anish+0.038
Neg logit contributions
 minutes-0.040
ches-0.037
 mun-0.035
 drift-0.034
fy-0.034
Token a
Token ID257
Feature activation+4.87e-01

Pos logit contributions
roc+0.077
ate+0.072
ulture+0.070
apper+0.068
ateful+0.067
Neg logit contributions
 minutes-0.051
 y-0.047
ches-0.042
 mun-0.041
 marsh-0.040
Token large
Token ID1588
Feature activation+8.70e-01

Pos logit contributions
roc+0.023
ate+0.022
ulture+0.020
ateful+0.020
amer+0.019
Neg logit contributions
 minutes-0.018
ches-0.017
fy-0.016
 y-0.016
 mun-0.016
Token brother
Token ID3956
Feature activation+3.51e-01

Pos logit contributions
roc+0.010
ate+0.010
ulture+0.009
ateful+0.009
rot+0.008
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.010
 mun-0.010
 drift-0.010
Token.
Token ID13
Feature activation+6.43e-01

Pos logit contributions
roc+0.017
ate+0.016
ulture+0.015
ateful+0.015
rot+0.015
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.011
 mun-0.011
 drift-0.011
Token You
Token ID921
Feature activation+4.11e-01

Pos logit contributions
roc+0.030
ate+0.028
ulture+0.026
ateful+0.026
rot+0.025
Neg logit contributions
 minutes-0.025
ches-0.023
fy-0.022
 Whoever-0.022
 drift-0.022
Token have
Token ID423
Feature activation+8.78e-01

Pos logit contributions
roc+0.021
ate+0.020
ulture+0.019
ateful+0.019
rot+0.018
Neg logit contributions
 minutes-0.014
ches-0.013
fy-0.012
 y-0.012
 mun-0.012
Token to
Token ID284
Feature activation+7.48e-01

Pos logit contributions
roc+0.043
ate+0.041
ulture+0.038
amer+0.037
ateful+0.037
Neg logit contributions
 minutes-0.034
 y-0.030
ches-0.030
fy-0.029
 mun-0.028
Token share
Token ID2648
Feature activation+2.22e+00

Pos logit contributions
roc+0.037
ate+0.035
ulture+0.033
ateful+0.033
rot+0.033
Neg logit contributions
 minutes-0.026
ches-0.024
 drift-0.023
fy-0.023
 y-0.023
Token the
Token ID262
Feature activation+4.55e-01

Pos logit contributions
roc+0.115
ulture+0.111
ate+0.106
amed+0.106
amer+0.104
Neg logit contributions
 minutes-0.072
 y-0.070
ches-0.069
fy-0.058
 honey-0.058
Token mattress
Token ID33388
Feature activation+1.53e+00

Pos logit contributions
roc+0.020
ate+0.020
ulture+0.018
ateful+0.018
rot+0.017
Neg logit contributions
 minutes-0.018
ches-0.017
fy-0.016
 y-0.016
 drift-0.016
Token and
Token ID290
Feature activation+7.30e-01

Pos logit contributions
roc+0.071
ate+0.066
ulture+0.061
ateful+0.061
rot+0.059
Neg logit contributions
 minutes-0.061
ches-0.057
fy-0.056
 y-0.055
 mun-0.052
Token be
Token ID307
Feature activation+1.60e+00

Pos logit contributions
roc+0.037
ate+0.035
ulture+0.033
rot+0.032
ateful+0.032
Neg logit contributions
 minutes-0.026
ches-0.024
 drift-0.023
 y-0.023
fy-0.023
Token kind
Token ID1611
Feature activation+1.96e+00

Pos logit contributions
roc+0.082
ulture+0.080
ate+0.073
rot+0.072
anish+0.071
Neg logit contributions
 minutes-0.060
 y-0.054
fy-0.045
 tug-0.044
ches-0.044
Token a
Token ID257
Feature activation-1.36e-01

Pos logit contributions
roc+0.069
ate+0.064
ateful+0.062
ulture+0.062
flower+0.061
Neg logit contributions
 minutes-0.043
 y-0.037
ches-0.037
 marsh-0.034
fy-0.031
Token big
Token ID1263
Feature activation+1.05e+00

Pos logit contributions
 minutes+0.005
ches+0.005
fy+0.005
 mun+0.005
 drift+0.005
Neg logit contributions
roc-0.006
ate-0.006
ulture-0.005
ateful-0.005
rot-0.005
Token box
Token ID3091
Feature activation-7.37e-01

Pos logit contributions
roc+0.050
ate+0.048
ateful+0.045
ulture+0.044
amer+0.043
Neg logit contributions
 minutes-0.041
 y-0.037
ches-0.036
fy-0.035
 drift-0.035
Token in
Token ID287
Feature activation+1.17e-01

Pos logit contributions
 minutes+0.033
ches+0.031
 degrees+0.031
fy+0.030
 mun+0.030
Neg logit contributions
ate-0.028
roc-0.026
man-0.025
ulture-0.024
rot-0.024
Token the
Token ID262
Feature activation+1.21e+00

Pos logit contributions
roc+0.005
ate+0.005
ulture+0.005
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.005
ches-0.005
fy-0.004
 mun-0.004
 drift-0.004
Token living
Token ID2877
Feature activation+2.06e+00

Pos logit contributions
roc+0.052
ate+0.048
ateful+0.047
amer+0.045
ulture+0.045
Neg logit contributions
 minutes-0.051
 y-0.048
ches-0.044
fy-0.043
 drift-0.043
Token room
Token ID2119
Feature activation-4.47e-01

Pos logit contributions
roc+0.088
rot+0.082
ulture+0.080
amer+0.077
ums+0.076
Neg logit contributions
 minutes-0.093
 y-0.076
ches-0.067
 slippery-0.066
 seconds-0.065
Token.
Token ID13
Feature activation-3.53e-01

Pos logit contributions
 minutes+0.017
ches+0.016
fy+0.016
 mun+0.015
 drift+0.015
Neg logit contributions
ate-0.020
roc-0.020
ateful-0.018
ulture-0.018
man-0.018
Token It
Token ID632
Feature activation-1.26e+00

Pos logit contributions
 minutes+0.016
ches+0.015
fy+0.015
 mun+0.014
 degrees+0.014
Neg logit contributions
roc-0.014
ate-0.014
ulture-0.012
ateful-0.012
 Fighting-0.012
Token has
Token ID468
Feature activation+1.75e-01

Pos logit contributions
 minutes+0.071
fy+0.066
 degrees+0.066
 mun+0.065
 pounds+0.062
Neg logit contributions
ate-0.039
roc-0.035
man-0.032
ulture-0.027
rot-0.027
Token a
Token ID257
Feature activation+4.67e-02

Pos logit contributions
roc+0.008
ate+0.007
ulture+0.007
ateful+0.007
rot+0.006
Neg logit contributions
 minutes-0.007
ches-0.007
fy-0.006
 mun-0.006
 drift-0.006
Token
Token ID198
Feature activation-1.50e-01

Pos logit contributions
 minutes+0.019
ches+0.017
fy+0.017
 mun+0.017
 degrees+0.017
Neg logit contributions
roc-0.014
ate-0.013
ulture-0.012
ateful-0.011
rot-0.011
TokenOne
Token ID3198
Feature activation+1.58e-01

Pos logit contributions
 minutes+0.006
ches+0.005
fy+0.005
 drift+0.005
 mun+0.005
Neg logit contributions
roc-0.007
ate-0.007
ulture-0.006
ateful-0.006
man-0.006
Token day
Token ID1110
Feature activation+8.37e-01

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.006
ches-0.006
fy-0.006
 mun-0.006
 drift-0.006
Token,
Token ID11
Feature activation-4.01e-01

Pos logit contributions
roc+0.035
ate+0.034
ulture+0.031
ateful+0.030
amer+0.030
Neg logit contributions
 minutes-0.038
 y-0.034
ches-0.033
fy-0.033
 mun-0.032
Token Tim
Token ID5045
Feature activation+1.05e+00

Pos logit contributions
 minutes+0.015
ches+0.014
fy+0.013
 degrees+0.013
 drift+0.013
Neg logit contributions
roc-0.019
ate-0.018
ateful-0.017
ulture-0.017
man-0.017
Token met
Token ID1138
Feature activation+2.07e+00

Pos logit contributions
roc+0.053
ate+0.050
ulture+0.049
ateful+0.048
anish+0.046
Neg logit contributions
 minutes-0.037
ches-0.035
 mun-0.032
 y-0.031
fy-0.031
Token a
Token ID257
Feature activation+4.76e-01

Pos logit contributions
roc+0.127
ulture+0.125
apper+0.122
amer+0.120
rot+0.119
Neg logit contributions
 y-0.047
 minutes-0.043
ches-0.037
 snowball-0.032
 sn-0.032
Token talking
Token ID3375
Feature activation+1.32e+00

Pos logit contributions
roc+0.021
ate+0.020
ulture+0.018
ateful+0.018
amer+0.018
Neg logit contributions
 minutes-0.020
ches-0.018
 y-0.018
fy-0.017
 mun-0.017
Token cat
Token ID3797
Feature activation+1.48e-01

Pos logit contributions
roc+0.050
ate+0.049
ateful+0.045
amed+0.043
ulture+0.042
Neg logit contributions
 minutes-0.059
 y-0.055
ches-0.053
 wo-0.053
 mun-0.052
Token.
Token ID13
Feature activation+6.28e-01

Pos logit contributions
roc+0.006
ate+0.006
ulture+0.005
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.007
ches-0.006
fy-0.006
 mun-0.006
 drift-0.006
Token The
Token ID383
Feature activation+9.96e-01

Pos logit contributions
roc+0.024
ate+0.024
ulture+0.022
ateful+0.022
rot+0.021
Neg logit contributions
 minutes-0.028
ches-0.027
 y-0.026
fy-0.026
 mun-0.026
Token they
Token ID484
Feature activation+8.48e-01

Pos logit contributions
 minutes+0.002
ches+0.002
fy+0.002
 mun+0.002
 drift+0.001
Neg logit contributions
roc-0.003
ate-0.003
ateful-0.003
ulture-0.003
rot-0.003
Token heard
Token ID2982
Feature activation+1.68e+00

Pos logit contributions
roc+0.042
ate+0.040
ulture+0.037
ateful+0.036
rot+0.036
Neg logit contributions
 minutes-0.031
ches-0.029
 drift-0.028
 mun-0.027
fy-0.027
Token Bob
Token ID5811
Feature activation+1.44e+00

Pos logit contributions
roc+0.084
ate+0.081
ulture+0.077
ateful+0.074
rot+0.073
Neg logit contributions
 minutes-0.063
 y-0.056
ches-0.055
 drift-0.053
fy-0.051
Token's
Token ID338
Feature activation+1.05e+00

Pos logit contributions
roc+0.069
ate+0.065
ulture+0.060
ateful+0.060
anish+0.057
Neg logit contributions
ches-0.054
 minutes-0.053
 drift-0.050
 mun-0.048
fy-0.048
Token special
Token ID2041
Feature activation+1.97e+00

Pos logit contributions
roc+0.043
ate+0.042
ateful+0.037
ulture+0.037
anish+0.034
Neg logit contributions
 minutes-0.047
ches-0.044
 drift-0.043
 y-0.043
fy-0.042
Token song
Token ID3496
Feature activation+2.07e+00

Pos logit contributions
roc+0.092
ate+0.086
ateful+0.081
ulture+0.077
amer+0.073
Neg logit contributions
 minutes-0.084
 drift-0.075
 y-0.073
ches-0.072
 wo-0.072
Token they
Token ID484
Feature activation+2.69e-01

Pos logit contributions
roc+0.099
ate+0.089
ateful+0.084
ulture+0.083
rot+0.077
Neg logit contributions
 minutes-0.082
ches-0.080
 drift-0.077
 y-0.074
fy-0.073
Token all
Token ID477
Feature activation+7.27e-01

Pos logit contributions
roc+0.014
ate+0.013
ulture+0.012
ateful+0.012
rot+0.012
Neg logit contributions
 minutes-0.009
ches-0.008
fy-0.008
 drift-0.008
 mun-0.008
Token began
Token ID2540
Feature activation+1.88e+00

Pos logit contributions
roc+0.036
ate+0.034
ulture+0.032
ateful+0.031
rot+0.031
Neg logit contributions
 minutes-0.027
ches-0.024
 drift-0.023
fy-0.023
 y-0.023
Token to
Token ID284
Feature activation+1.71e+00

Pos logit contributions
roc+0.103
ate+0.095
ulture+0.093
ateful+0.089
rot+0.086
Neg logit contributions
 minutes-0.063
 drift-0.057
ches-0.055
 y-0.053
 wo-0.050
Token sing
Token ID1702
Feature activation+1.27e+00

Pos logit contributions
roc+0.089
ate+0.081
ulture+0.078
ateful+0.077
amer+0.074
Neg logit contributions
 drift-0.060
 minutes-0.058
ches-0.055
 dip-0.053
 y-0.053
Token.
Token ID13
Feature activation-6.21e-02

Pos logit contributions
 minutes+0.005
ches+0.004
fy+0.004
 mun+0.004
 drift+0.004
Neg logit contributions
roc-0.005
ate-0.005
ulture-0.005
ateful-0.004
rot-0.004
Token A
Token ID317
Feature activation-6.33e-01

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.002
ate-0.002
ulture-0.002
ateful-0.002
rot-0.002
Token little
Token ID1310
Feature activation+4.34e-01

Pos logit contributions
 minutes+0.028
ches+0.027
fy+0.025
 mun+0.025
 drift+0.025
Neg logit contributions
roc-0.026
ate-0.024
rot-0.022
ulture-0.022
man-0.022
Token bird
Token ID6512
Feature activation+6.73e-01

Pos logit contributions
roc+0.016
ate+0.016
ateful+0.014
ulture+0.014
amer+0.013
Neg logit contributions
 minutes-0.021
ches-0.019
fy-0.019
 y-0.019
 mun-0.018
Token lived
Token ID5615
Feature activation+5.91e-01

Pos logit contributions
roc+0.029
ate+0.026
ateful+0.024
ulture+0.024
rot+0.023
Neg logit contributions
 minutes-0.029
ches-0.028
 mun-0.026
 y-0.026
fy-0.026
Token in
Token ID287
Feature activation+3.95e-01

Pos logit contributions
roc+0.029
ate+0.026
ulture+0.025
ateful+0.024
rot+0.024
Neg logit contributions
 minutes-0.022
ches-0.020
 y-0.020
 mun-0.019
fy-0.019
Token the
Token ID262
Feature activation+1.07e+00

Pos logit contributions
roc+0.019
ate+0.018
ulture+0.017
ateful+0.016
rot+0.016
Neg logit contributions
 minutes-0.015
ches-0.013
 y-0.013
 drift-0.013
fy-0.013
Token tree
Token ID5509
Feature activation+4.63e-01

Pos logit contributions
roc+0.050
ate+0.049
ateful+0.047
amer+0.044
ulture+0.044
Neg logit contributions
 minutes-0.042
 y-0.039
fy-0.035
ches-0.035
 drift-0.035
Token.
Token ID13
Feature activation+1.97e-02

Pos logit contributions
roc+0.024
ate+0.023
ulture+0.022
ateful+0.021
rot+0.021
Neg logit contributions
 minutes-0.015
ches-0.014
fy-0.013
 mun-0.013
 y-0.013
Token The
Token ID383
Feature activation+1.56e-01

Pos logit contributions
roc+0.001
ate+0.001
ulture+0.001
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token bird
Token ID6512
Feature activation+4.58e-01

Pos logit contributions
roc+0.007
ate+0.006
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.007
ches-0.006
fy-0.006
 mun-0.006
 drift-0.006
Token each
Token ID1123
Feature activation-4.95e-01

Pos logit contributions
roc+0.036
ate+0.035
ulture+0.032
rot+0.032
ateful+0.031
Neg logit contributions
 minutes-0.028
ches-0.025
 y-0.024
 drift-0.024
fy-0.023
Token other
Token ID584
Feature activation+1.47e-01

Pos logit contributions
 minutes+0.021
fy+0.020
ches+0.020
 degrees+0.020
 mun+0.020
Neg logit contributions
roc-0.020
ate-0.019
ulture-0.017
ateful-0.017
rot-0.017
Token and
Token ID290
Feature activation-8.74e-01

Pos logit contributions
roc+0.006
ate+0.006
ulture+0.006
ateful+0.006
rot+0.005
Neg logit contributions
 minutes-0.006
ches-0.006
fy-0.005
 mun-0.005
 drift-0.005
Token cried
Token ID16896
Feature activation-2.33e-02

Pos logit contributions
ches+0.032
 minutes+0.032
 degrees+0.030
fy+0.029
 Whoever+0.029
Neg logit contributions
roc-0.039
ate-0.037
ateful-0.036
ulture-0.034
 Fighting-0.033
Token.
Token ID13
Feature activation+1.06e+00

Pos logit contributions
 minutes+0.001
ches+0.001
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.001
ate-0.001
ulture-0.001
ateful-0.001
rot-0.001
Token
Token ID198
Feature activation+4.89e-03

Pos logit contributions
roc+0.053
ate+0.048
ulture+0.047
amer+0.046
rot+0.046
Neg logit contributions
 minutes-0.040
 Whoever-0.034
 y-0.033
ches-0.032
 drift-0.031
Token
Token ID198
Feature activation+6.46e-01

Pos logit contributions
roc+0.000
ate+0.000
ulture+0.000
ateful+0.000
rot+0.000
Neg logit contributions
 minutes-0.000
ches-0.000
fy-0.000
 mun-0.000
 drift-0.000
TokenA
Token ID32
Feature activation+2.10e-01

Pos logit contributions
roc+0.026
ate+0.024
ulture+0.023
ateful+0.022
flower+0.022
Neg logit contributions
 minutes-0.030
ches-0.027
fy-0.026
 Whoever-0.025
 y-0.025
Token nice
Token ID3621
Feature activation+9.78e-01

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.010
 drift-0.009
 mun-0.009
Token lady
Token ID10846
Feature activation-3.46e-01

Pos logit contributions
roc+0.042
ate+0.039
ateful+0.035
ulture+0.034
amed+0.033
Neg logit contributions
 minutes-0.044
 y-0.040
 drift-0.039
fy-0.038
 mun-0.038
Token saw
Token ID2497
Feature activation-6.35e-01

Pos logit contributions
 minutes+0.015
ches+0.014
fy+0.014
 degrees+0.013
 drift+0.013
Neg logit contributions
roc-0.014
ate-0.013
rot-0.012
ulture-0.012
ateful-0.012
Token Billy
Token ID15890
Feature activation-3.39e-01

Pos logit contributions
 minutes+0.008
ches+0.008
fy+0.007
 mun+0.007
 degrees+0.007
Neg logit contributions
roc-0.012
ate-0.012
ateful-0.011
ulture-0.011
rot-0.010
Token said
Token ID531
Feature activation+1.02e+00

Pos logit contributions
 minutes+0.013
ches+0.011
fy+0.011
 degrees+0.011
 drift+0.011
Neg logit contributions
roc-0.016
ate-0.015
ulture-0.014
rot-0.014
ateful-0.013
Token goodbye
Token ID24829
Feature activation+5.89e-01

Pos logit contributions
roc+0.052
ate+0.048
ulture+0.047
ateful+0.046
amer+0.044
Neg logit contributions
 minutes-0.038
 y-0.034
ches-0.033
 wo-0.033
 drift-0.033
Token and
Token ID290
Feature activation-7.14e-01

Pos logit contributions
roc+0.028
ate+0.027
ateful+0.025
ulture+0.025
rot+0.023
Neg logit contributions
 minutes-0.023
ches-0.021
fy-0.020
 mun-0.020
 drift-0.020
Token started
Token ID2067
Feature activation+1.95e-01

Pos logit contributions
 minutes+0.026
ches+0.024
 degrees+0.023
fy+0.022
 Whoever+0.022
Neg logit contributions
roc-0.034
ate-0.033
man-0.030
ulture-0.030
ateful-0.029
Token walking
Token ID6155
Feature activation+4.39e-01

Pos logit contributions
roc+0.009
ate+0.008
ulture+0.008
ateful+0.008
rot+0.007
Neg logit contributions
 minutes-0.008
ches-0.007
fy-0.007
 mun-0.007
 drift-0.007
Token back
Token ID736
Feature activation+6.78e-01

Pos logit contributions
roc+0.018
ate+0.017
ulture+0.016
ateful+0.016
rot+0.015
Neg logit contributions
 minutes-0.020
ches-0.018
fy-0.017
 drift-0.017
 mun-0.017
Token home
Token ID1363
Feature activation+6.43e-01

Pos logit contributions
roc+0.023
ate+0.020
ulture+0.019
ateful+0.019
rot+0.018
Neg logit contributions
 minutes-0.036
ches-0.034
 drift-0.033
fy-0.033
 y-0.032
Token.
Token ID13
Feature activation+1.79e-01

Pos logit contributions
roc+0.035
ate+0.033
ulture+0.032
ateful+0.031
rot+0.030
Neg logit contributions
 minutes-0.021
ches-0.018
fy-0.018
 drift-0.017
 mun-0.017
Token He
Token ID679
Feature activation-1.09e+00

Pos logit contributions
roc+0.008
ate+0.007
ulture+0.007
ateful+0.007
rot+0.006
Neg logit contributions
 minutes-0.008
ches-0.007
fy-0.007
 mun-0.007
 drift-0.007
Token kept
Token ID4030
Feature activation-3.16e-02

Pos logit contributions
 minutes+0.048
 degrees+0.044
fy+0.042
ches+0.042
 pounds+0.041
Neg logit contributions
ate-0.041
roc-0.041
man-0.036
rot-0.036
 neighborhood-0.035
Token you
Token ID345
Feature activation-4.32e-01

Pos logit contributions
 minutes+0.057
ches+0.057
 mun+0.055
fy+0.054
 degrees+0.052
Neg logit contributions
ate-0.050
roc-0.048
ateful-0.045
 Respect-0.044
 Fighting-0.044
Token will
Token ID481
Feature activation+5.07e-01

Pos logit contributions
 minutes+0.015
ches+0.013
fy+0.013
 mun+0.012
 degrees+0.012
Neg logit contributions
roc-0.022
ate-0.021
ulture-0.020
ateful-0.019
rot-0.019
Token think
Token ID892
Feature activation-1.93e+00

Pos logit contributions
roc+0.026
ate+0.024
ulture+0.023
ateful+0.022
rot+0.022
Neg logit contributions
 minutes-0.018
ches-0.016
 drift-0.016
 y-0.016
fy-0.015
Token before
Token ID878
Feature activation-9.88e-01

Pos logit contributions
 mun+0.079
 minutes+0.075
ches+0.075
fy+0.074
 drift+0.071
Neg logit contributions
ate-0.076
ateful-0.066
roc-0.064
 Respect-0.062
man-0.062
Token you
Token ID345
Feature activation-5.05e-01

Pos logit contributions
fy+0.036
ches+0.036
 minutes+0.036
 mun+0.033
 Whoever+0.032
Neg logit contributions
ate-0.044
roc-0.043
ateful-0.040
 Respect-0.038
 Fighting-0.037
Token act
Token ID719
Feature activation+8.39e-01

Pos logit contributions
 minutes+0.018
ches+0.016
fy+0.015
 degrees+0.015
 mun+0.015
Neg logit contributions
roc-0.025
ate-0.024
ulture-0.022
ateful-0.022
rot-0.021
Token."
Token ID526
Feature activation+8.89e-01

Pos logit contributions
roc+0.045
ate+0.041
ulture+0.040
ateful+0.038
rot+0.038
Neg logit contributions
 minutes-0.029
ches-0.027
fy-0.025
 y-0.025
 drift-0.024
Token
Token ID198
Feature activation-2.58e-03

Pos logit contributions
roc+0.043
ate+0.041
ulture+0.038
ateful+0.038
amed+0.038
Neg logit contributions
 minutes-0.033
 y-0.028
 Whoever-0.028
 drift-0.027
ches-0.027
Token
Token ID198
Feature activation+5.51e-02

Pos logit contributions
 minutes+0.000
ches+0.000
fy+0.000
 mun+0.000
 drift+0.000
Neg logit contributions
roc-0.000
ate-0.000
ulture-0.000
ateful-0.000
rot-0.000
TokenL
Token ID43
Feature activation-5.21e-01

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.002
ches-0.002
fy-0.002
 mun-0.002
 drift-0.002
Tokenily
Token ID813
Feature activation+9.27e-01

Pos logit contributions
 minutes+0.022
 drift+0.021
fy+0.021
ches+0.021
 mun+0.020
Neg logit contributions
roc-0.022
ate-0.022
ateful-0.020
ulture-0.019
anish-0.019
Token was
Token ID373
Feature activation+4.96e-01

Pos logit contributions
 minutes+0.056
 Whoever+0.054
fy+0.053
 degrees+0.052
ches+0.052
Neg logit contributions
roc-0.041
ate-0.040
rot-0.038
flower-0.036
 neighborhood-0.035
Token back
Token ID736
Feature activation+8.71e-01

Pos logit contributions
roc+0.025
ate+0.023
ulture+0.022
ateful+0.022
rot+0.021
Neg logit contributions
 minutes-0.018
ches-0.017
fy-0.016
 drift-0.016
 y-0.016
Token in
Token ID287
Feature activation+8.82e-01

Pos logit contributions
roc+0.038
ate+0.034
ulture+0.033
ateful+0.032
anish+0.031
Neg logit contributions
 minutes-0.039
 drift-0.035
ches-0.035
fy-0.034
 mun-0.034
Token her
Token ID607
Feature activation-1.45e-01

Pos logit contributions
roc+0.046
ate+0.043
ulture+0.040
rot+0.039
ateful+0.038
Neg logit contributions
 minutes-0.034
 drift-0.029
 y-0.028
 mun-0.027
ches-0.027
Token nest
Token ID16343
Feature activation+1.68e+00

Pos logit contributions
 minutes+0.007
ches+0.006
fy+0.006
 Whoever+0.006
 mun+0.006
Neg logit contributions
roc-0.006
ate-0.005
ulture-0.005
ateful-0.005
rot-0.005
Token.
Token ID13
Feature activation+6.17e-02

Pos logit contributions
roc+0.085
ate+0.077
ateful+0.075
ulture+0.073
 Fighting+0.070
Neg logit contributions
 minutes-0.063
ches-0.056
 mun-0.055
fy-0.054
 y-0.054
Token She
Token ID1375
Feature activation-9.13e-01

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.003
 mun-0.003
 drift-0.003
Token looked
Token ID3114
Feature activation-1.90e-01

Pos logit contributions
 minutes+0.039
fy+0.036
 degrees+0.035
ches+0.035
 Whoever+0.034
Neg logit contributions
ate-0.036
roc-0.036
rot-0.032
amed-0.031
man-0.031
Token around
Token ID1088
Feature activation+2.69e-01

Pos logit contributions
 minutes+0.008
ches+0.008
fy+0.007
 mun+0.007
 degrees+0.007
Neg logit contributions
roc-0.008
ate-0.008
ateful-0.007
ulture-0.007
rot-0.007
Token and
Token ID290
Feature activation-1.39e-01

Pos logit contributions
roc+0.013
ate+0.012
ulture+0.011
ateful+0.011
rot+0.011
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.009
 drift-0.009
 mun-0.009
Token saw
Token ID2497
Feature activation+9.13e-01

Pos logit contributions
 minutes+0.006
ches+0.006
fy+0.006
 degrees+0.006
 drift+0.006
Neg logit contributions
roc-0.005
ate-0.005
ateful-0.005
ulture-0.005
rot-0.004
Token to
Token ID284
Feature activation-2.61e-01

Pos logit contributions
roc+0.045
ate+0.044
ulture+0.041
rot+0.040
amer+0.040
Neg logit contributions
 minutes-0.035
 y-0.031
ches-0.031
 drift-0.030
 mun-0.030
Token go
Token ID467
Feature activation+6.20e-02

Pos logit contributions
 minutes+0.009
ches+0.008
fy+0.008
 mun+0.008
 degrees+0.008
Neg logit contributions
roc-0.013
ate-0.013
ulture-0.011
ateful-0.011
anish-0.011
Token to
Token ID284
Feature activation-9.87e-02

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.003
 mun-0.003
 drift-0.003
Token the
Token ID262
Feature activation+6.32e-01

Pos logit contributions
 minutes+0.004
ches+0.004
fy+0.004
 mun+0.004
 degrees+0.004
Neg logit contributions
roc-0.004
ate-0.004
ulture-0.004
ateful-0.004
rot-0.003
Token hospital
Token ID4436
Feature activation-3.86e-01

Pos logit contributions
roc+0.026
ate+0.025
ulture+0.023
ateful+0.023
rot+0.022
Neg logit contributions
 minutes-0.027
ches-0.025
 y-0.025
fy-0.024
 drift-0.024
Token for
Token ID329
Feature activation-1.12e+00

Pos logit contributions
 minutes+0.014
ches+0.013
fy+0.013
 mun+0.013
 drift+0.012
Neg logit contributions
roc-0.018
ate-0.017
ulture-0.016
ateful-0.016
man-0.015
Token an
Token ID281
Feature activation-2.06e+00

Pos logit contributions
fy+0.056
ches+0.056
 minutes+0.053
 mun+0.053
 Whoever+0.051
Neg logit contributions
ate-0.033
roc-0.033
ateful-0.028
 town-0.028
 Respect-0.027
Token operation
Token ID4905
Feature activation-2.54e-01

Pos logit contributions
ches+0.098
 mun+0.097
 minutes+0.096
fy+0.095
 Whoever+0.094
Neg logit contributions
roc-0.061
ate-0.061
 wing-0.055
ateful-0.053
 office-0.052
Token.
Token ID13
Feature activation-9.18e-01

Pos logit contributions
 minutes+0.009
ches+0.009
fy+0.008
 mun+0.008
 drift+0.008
Neg logit contributions
roc-0.012
ate-0.012
ulture-0.011
ateful-0.010
rot-0.010
Token
Token ID198
Feature activation-9.06e-01

Pos logit contributions
 minutes+0.047
ches+0.045
fy+0.043
 degrees+0.042
 drift+0.042
Neg logit contributions
roc-0.031
ate-0.029
 Respect-0.025
 Fighting-0.025
ulture-0.025
Token
Token ID198
Feature activation+3.96e-02

Pos logit contributions
 minutes+0.048
fy+0.045
ches+0.044
 degrees+0.044
 drift+0.043
Neg logit contributions
roc-0.029
ate-0.028
ulture-0.024
 Respect-0.023
man-0.023
Token.
Token ID13
Feature activation+1.55e-01

Pos logit contributions
 minutes+0.012
ches+0.011
fy+0.011
 mun+0.010
 drift+0.010
Neg logit contributions
roc-0.015
ate-0.015
ulture-0.014
ateful-0.014
man-0.014
Token Albert
Token ID9966
Feature activation-1.97e-01

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.006
ches-0.005
fy-0.005
 mun-0.005
 drift-0.005
Token was
Token ID373
Feature activation+5.70e-01

Pos logit contributions
 minutes+0.008
ches+0.008
fy+0.007
 drift+0.007
 degrees+0.007
Neg logit contributions
roc-0.008
ate-0.008
ulture-0.007
ateful-0.007
rot-0.007
Token upset
Token ID9247
Feature activation-4.50e-01

Pos logit contributions
roc+0.030
ate+0.028
ulture+0.027
ateful+0.026
rot+0.026
Neg logit contributions
 minutes-0.020
 y-0.017
ches-0.017
 drift-0.017
fy-0.017
Token.
Token ID13
Feature activation+2.44e-01

Pos logit contributions
 minutes+0.020
ches+0.019
 mun+0.018
fy+0.018
 drift+0.018
Neg logit contributions
roc-0.017
ate-0.017
ateful-0.015
ulture-0.015
man-0.014
Token He
Token ID679
Feature activation-1.71e+00

Pos logit contributions
roc+0.010
ate+0.009
ulture+0.009
ateful+0.008
rot+0.008
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.010
 drift-0.010
 mun-0.010
Token didn
Token ID1422
Feature activation-4.24e-01

Pos logit contributions
 minutes+0.079
 pounds+0.076
 degrees+0.074
 marsh+0.072
 Whoever+0.070
Neg logit contributions
roc-0.056
ate-0.052
man-0.049
amed-0.048
 neighborhood-0.046
Token't
Token ID470
Feature activation-4.31e-01

Pos logit contributions
 minutes+0.021
fy+0.019
 mun+0.018
ches+0.018
 degrees+0.018
Neg logit contributions
ate-0.015
roc-0.015
ulture-0.013
ateful-0.013
rot-0.013
Token want
Token ID765
Feature activation-4.34e-01

Pos logit contributions
 minutes+0.016
ches+0.015
 degrees+0.014
 mun+0.014
fy+0.014
Neg logit contributions
roc-0.020
ate-0.019
ateful-0.017
ulture-0.017
man-0.017
Token to
Token ID284
Feature activation+3.33e-01

Pos logit contributions
 minutes+0.020
ches+0.019
fy+0.019
 mun+0.018
 degrees+0.018
Neg logit contributions
roc-0.016
ate-0.015
ateful-0.013
ulture-0.013
man-0.013
Token take
Token ID1011
Feature activation-6.81e-01

Pos logit contributions
roc+0.018
ate+0.017
ulture+0.016
ateful+0.016
rot+0.015
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.009
 drift-0.009
 mun-0.009
Token think
Token ID892
Feature activation-2.52e+00

Pos logit contributions
 minutes+0.096
 degrees+0.092
fy+0.080
 pounds+0.078
ches+0.076
Neg logit contributions
roc-0.088
ate-0.084
 judge-0.080
 town-0.079
ateful-0.074
Token it
Token ID340
Feature activation-1.62e+00

Pos logit contributions
fy+0.119
 mun+0.113
ches+0.113
ksh+0.110
 minutes+0.105
Neg logit contributions
ate-0.078
room-0.075
ap-0.068
apped-0.066
 wing-0.066
Token will
Token ID481
Feature activation-6.88e-01

Pos logit contributions
 minutes+0.093
 mun+0.085
fy+0.083
 degrees+0.082
 drift+0.079
Neg logit contributions
ate-0.049
roc-0.043
man-0.041
rot-0.036
ulture-0.036
Token be
Token ID307
Feature activation-2.07e+00

Pos logit contributions
 minutes+0.034
ches+0.032
fy+0.031
 degrees+0.030
 mun+0.030
Neg logit contributions
roc-0.024
ate-0.024
ulture-0.020
ateful-0.020
man-0.019
Token more
Token ID517
Feature activation-1.79e+00

Pos logit contributions
 mun+0.071
ches+0.071
fy+0.068
 degrees+0.067
 Whoever+0.067
Neg logit contributions
ate-0.097
roc-0.092
amed-0.080
ateful-0.079
 yourself-0.077
Token fun
Token ID1257
Feature activation-1.48e+00

Pos logit contributions
ches+0.065
 mun+0.064
 degrees+0.058
 marsh+0.056
 wo+0.056
Neg logit contributions
ate-0.086
roc-0.071
 neighborhood-0.068
 Respect-0.066
ateful-0.066
Token.
Token ID13
Feature activation-4.28e-01

Pos logit contributions
ches+0.066
 mun+0.066
 drift+0.064
fy+0.063
 syrup+0.061
Neg logit contributions
ate-0.053
roc-0.048
amer-0.043
ateful-0.043
 zone-0.042
Token They
Token ID1119
Feature activation-2.38e+00

Pos logit contributions
 minutes+0.020
ches+0.020
fy+0.019
 drift+0.019
 mun+0.019
Neg logit contributions
roc-0.016
ate-0.015
ulture-0.013
 Fighting-0.013
ateful-0.013
Token take
Token ID1011
Feature activation-4.21e-01

Pos logit contributions
 minutes+0.106
 degrees+0.103
 pounds+0.088
fy+0.085
ches+0.083
Neg logit contributions
roc-0.093
ate-0.090
 judge-0.088
 town-0.084
 nation-0.080
Token the
Token ID262
Feature activation-8.26e-01

Pos logit contributions
 minutes+0.016
ches+0.015
fy+0.015
 degrees+0.014
 mun+0.014
Neg logit contributions
roc-0.020
ate-0.019
ulture-0.017
ateful-0.017
rot-0.016
Token ball
Token ID2613
Feature activation-6.55e-02

Pos logit contributions
 minutes+0.044
ches+0.041
 degrees+0.040
 mun+0.040
fy+0.039
Neg logit contributions
roc-0.026
ate-0.024
ulture-0.023
rot-0.022
anish-0.022
Token kept
Token ID4030
Feature activation-5.55e-01

Pos logit contributions
 minutes+0.049
 Whoever+0.046
ches+0.045
fy+0.043
 degrees+0.043
Neg logit contributions
roc-0.052
ate-0.051
flower-0.049
ateful-0.048
amed-0.047
Token on
Token ID319
Feature activation-1.85e-01

Pos logit contributions
 minutes+0.023
fy+0.022
ches+0.022
 mun+0.020
 degrees+0.020
Neg logit contributions
ate-0.023
roc-0.022
rot-0.020
ateful-0.020
man-0.020
Token staring
Token ID16143
Feature activation-2.82e-01

Pos logit contributions
 minutes+0.008
ches+0.007
fy+0.007
 mun+0.007
 Whoever+0.007
Neg logit contributions
roc-0.008
ate-0.008
ateful-0.007
ulture-0.007
rot-0.007
Token at
Token ID379
Feature activation-3.94e-01

Pos logit contributions
 minutes+0.011
ches+0.010
fy+0.010
 mun+0.010
 Whoever+0.010
Neg logit contributions
roc-0.012
ate-0.012
ateful-0.011
ulture-0.011
rot-0.011
Token the
Token ID262
Feature activation-3.96e-01

Pos logit contributions
 minutes+0.018
fy+0.017
ches+0.017
 mun+0.016
 drift+0.016
Neg logit contributions
roc-0.015
ate-0.015
ateful-0.013
ulture-0.013
rot-0.013
Token clouds
Token ID15114
Feature activation-1.92e+00

Pos logit contributions
 minutes+0.018
ches+0.017
fy+0.017
 mun+0.017
 drift+0.016
Neg logit contributions
roc-0.015
ate-0.014
ulture-0.013
rot-0.013
ateful-0.012
Token.
Token ID13
Feature activation-5.71e-01

Pos logit contributions
 minutes+0.082
 degrees+0.072
ches+0.072
 mun+0.071
fy+0.071
Neg logit contributions
ate-0.069
flower-0.067
roc-0.066
man-0.066
rot-0.065
Token He
Token ID679
Feature activation-9.91e-01

Pos logit contributions
 minutes+0.028
ches+0.026
fy+0.025
 degrees+0.025
 drift+0.024
Neg logit contributions
roc-0.020
ate-0.020
ulture-0.017
 Respect-0.017
 Fighting-0.017
Token was
Token ID373
Feature activation+4.34e-01

Pos logit contributions
 minutes+0.042
fy+0.038
 degrees+0.038
ches+0.038
 Whoever+0.037
Neg logit contributions
roc-0.038
ate-0.037
rot-0.034
man-0.033
ulture-0.033
Token happy
Token ID3772
Feature activation+2.65e-01

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.019
ateful+0.019
rot+0.019
Neg logit contributions
 minutes-0.016
ches-0.014
fy-0.014
 drift-0.013
 y-0.013
Token and
Token ID290
Feature activation-3.06e-01

Pos logit contributions
roc+0.013
ate+0.012
ulture+0.011
ateful+0.011
rot+0.011
Neg logit contributions
 minutes-0.010
ches-0.009
fy-0.009
 mun-0.009
 drift-0.009
Token juice
Token ID13135
Feature activation+1.09e-01

Pos logit contributions
 minutes+0.032
ches+0.030
 Whoever+0.029
fy+0.028
 drift+0.027
Neg logit contributions
roc-0.048
ate-0.047
ulture-0.043
ateful-0.043
rot-0.042
Token,
Token ID11
Feature activation-1.10e+00

Pos logit contributions
roc+0.004
ate+0.004
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.005
ches-0.005
fy-0.004
 mun-0.004
 drift-0.004
Token and
Token ID290
Feature activation-1.55e+00

Pos logit contributions
 minutes+0.047
ches+0.044
fy+0.044
 degrees+0.042
 Whoever+0.042
Neg logit contributions
ate-0.042
roc-0.042
flower-0.036
ulture-0.036
ateful-0.035
Token some
Token ID617
Feature activation-1.04e+00

Pos logit contributions
 minutes+0.056
ches+0.054
 degrees+0.052
fy+0.052
 Whoever+0.050
Neg logit contributions
ate-0.066
roc-0.066
ateful-0.058
flower-0.058
man-0.058
Token cookies
Token ID14746
Feature activation-3.42e-01

Pos logit contributions
 minutes+0.035
ches+0.033
 Whoever+0.032
fy+0.031
 drift+0.030
Neg logit contributions
roc-0.050
ate-0.048
ateful-0.044
ulture-0.044
rot-0.043
Token.
Token ID13
Feature activation-9.07e-01

Pos logit contributions
 minutes+0.014
ches+0.013
fy+0.012
 drift+0.012
 mun+0.012
Neg logit contributions
roc-0.015
ate-0.015
ulture-0.013
ateful-0.013
rot-0.013
Token They
Token ID1119
Feature activation-1.28e+00

Pos logit contributions
 minutes+0.044
ches+0.042
fy+0.041
 degrees+0.040
 drift+0.039
Neg logit contributions
roc-0.032
ate-0.031
 Fighting-0.028
 Respect-0.027
ulture-0.027
Token sat
Token ID3332
Feature activation+1.15e+00

Pos logit contributions
 minutes+0.051
 degrees+0.045
ches+0.043
fy+0.043
 Whoever+0.043
Neg logit contributions
roc-0.057
ate-0.055
ateful-0.050
flower-0.050
ulture-0.049
Token on
Token ID319
Feature activation-2.34e-01

Pos logit contributions
roc+0.064
ulture+0.057
ate+0.056
anish+0.055
ateful+0.054
Neg logit contributions
 minutes-0.037
 y-0.032
ches-0.030
 mun-0.029
fy-0.028
Token a
Token ID257
Feature activation-2.18e+00

Pos logit contributions
 minutes+0.010
ches+0.010
fy+0.009
 drift+0.009
 mun+0.009
Neg logit contributions
roc-0.010
ate-0.009
ulture-0.008
ateful-0.008
rot-0.008
Token bench
Token ID7624
Feature activation+6.03e-01

Pos logit contributions
 minutes+0.090
 Whoever+0.083
 degrees+0.081
utt+0.079
 mun+0.076
Neg logit contributions
man-0.079
roc-0.075
ate-0.072
rot-0.071
ase-0.070
Tokenmy
Token ID1820
Feature activation+7.24e-01

Pos logit contributions
roc+0.065
ate+0.060
ulture+0.059
ateful+0.056
rot+0.054
Neg logit contributions
 minutes-0.059
ches-0.055
 y-0.054
fy-0.054
 mun-0.050
Token."
Token ID526
Feature activation+8.20e-02

Pos logit contributions
roc+0.041
ate+0.039
ulture+0.037
ateful+0.037
rot+0.036
Neg logit contributions
 minutes-0.020
ches-0.018
fy-0.017
 y-0.017
 mun-0.017
Token<|endoftext|>
Token ID50256
Feature activation-5.76e-01

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.004
ches-0.004
fy-0.003
 mun-0.003
 drift-0.003
TokenOnce
Token ID7454
Feature activation-6.37e-01

Pos logit contributions
 minutes+0.030
fy+0.028
ches+0.028
 degrees+0.027
 mun+0.027
Neg logit contributions
ate-0.019
roc-0.018
ulture-0.016
rot-0.015
man-0.015
Token upon
Token ID2402
Feature activation-1.78e-02

Pos logit contributions
 minutes+0.028
ches+0.026
fy+0.026
 mun+0.026
 degrees+0.026
Neg logit contributions
roc-0.025
ate-0.024
ateful-0.022
ulture-0.022
rot-0.021
Token a
Token ID257
Feature activation-2.38e+00

Pos logit contributions
 minutes+0.001
ches+0.001
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.001
ate-0.001
ulture-0.001
ateful-0.001
rot-0.001
Token time
Token ID640
Feature activation+7.33e-01

Pos logit contributions
 degrees+0.086
ches+0.083
 minutes+0.081
 Whoever+0.080
 pounds+0.079
Neg logit contributions
roc-0.097
ate-0.095
 neighborhood-0.089
room-0.086
anish-0.083
Token,
Token ID11
Feature activation+3.17e-02

Pos logit contributions
roc+0.029
ate+0.027
ulture+0.025
ateful+0.024
amer+0.024
Neg logit contributions
 minutes-0.035
ches-0.031
 y-0.031
 drift-0.030
fy-0.030
Token there
Token ID612
Feature activation+9.91e-01

Pos logit contributions
roc+0.001
ate+0.001
ulture+0.001
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token was
Token ID373
Feature activation-7.84e-02

Pos logit contributions
roc+0.048
ate+0.044
ulture+0.043
ateful+0.043
amer+0.042
Neg logit contributions
 minutes-0.036
 y-0.033
ches-0.033
 mun-0.031
 drift-0.031
Token a
Token ID257
Feature activation-9.83e-01

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
rot-0.003
Token day
Token ID1110
Feature activation-1.03e+00

Pos logit contributions
ches+0.047
 minutes+0.045
 degrees+0.044
fy+0.043
 drift+0.042
Neg logit contributions
ate-0.021
roc-0.021
man-0.018
ulture-0.018
flower-0.017
Token.
Token ID13
Feature activation-7.20e-01

Pos logit contributions
 minutes+0.040
ches+0.039
 degrees+0.037
 Whoever+0.036
 mun+0.036
Neg logit contributions
ate-0.042
roc-0.040
rot-0.037
amer-0.037
ulture-0.037
Token<|endoftext|>
Token ID50256
Feature activation-2.07e-01

Pos logit contributions
 minutes+0.038
ches+0.037
fy+0.036
 degrees+0.035
 drift+0.035
Neg logit contributions
roc-0.023
ate-0.022
ulture-0.019
 Fighting-0.018
 Respect-0.018
TokenOnce
Token ID7454
Feature activation-4.99e-01

Pos logit contributions
 minutes+0.010
ches+0.009
fy+0.009
 mun+0.009
 degrees+0.009
Neg logit contributions
roc-0.008
ate-0.007
ulture-0.007
rot-0.006
ateful-0.006
Token upon
Token ID2402
Feature activation-1.01e+00

Pos logit contributions
 minutes+0.022
ches+0.020
fy+0.020
 mun+0.020
 degrees+0.020
Neg logit contributions
roc-0.020
ate-0.019
ulture-0.017
ateful-0.017
rot-0.017
Token a
Token ID257
Feature activation-2.63e+00

Pos logit contributions
 minutes+0.051
ches+0.050
fy+0.049
 degrees+0.047
 mun+0.046
Neg logit contributions
ate-0.033
roc-0.032
ateful-0.027
ulture-0.026
rot-0.026
Token time
Token ID640
Feature activation+6.39e-01

Pos logit contributions
 degrees+0.104
ches+0.097
 syrup+0.097
 pounds+0.096
 Whoever+0.096
Neg logit contributions
roc-0.096
ate-0.094
 neighborhood-0.093
room-0.084
ap-0.083
Token,
Token ID11
Feature activation-8.67e-02

Pos logit contributions
roc+0.025
ate+0.023
ulture+0.021
ateful+0.021
amer+0.020
Neg logit contributions
 minutes-0.031
ches-0.028
 y-0.027
fy-0.027
 drift-0.027
Token there
Token ID612
Feature activation+9.26e-01

Pos logit contributions
 minutes+0.004
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.004
ate-0.004
ulture-0.003
ateful-0.003
rot-0.003
Token was
Token ID373
Feature activation-3.88e-01

Pos logit contributions
roc+0.044
ate+0.041
ateful+0.039
ulture+0.039
amer+0.039
Neg logit contributions
 minutes-0.035
ches-0.031
 y-0.031
 mun-0.030
 drift-0.030
Token a
Token ID257
Feature activation-1.09e+00

Pos logit contributions
 minutes+0.018
ches+0.017
fy+0.017
 degrees+0.016
 drift+0.016
Neg logit contributions
roc-0.015
ate-0.014
ulture-0.012
ateful-0.012
rot-0.012
Token end
Token ID886
Feature activation-1.05e+00

Pos logit contributions
 minutes+0.016
ches+0.015
fy+0.015
 Whoever+0.014
 degrees+0.014
Neg logit contributions
roc-0.009
ate-0.008
ulture-0.007
rot-0.007
ateful-0.007
Token.
Token ID13
Feature activation+3.42e-01

Pos logit contributions
 mun+0.047
 minutes+0.046
 Whoever+0.044
fy+0.043
 degrees+0.043
Neg logit contributions
ate-0.038
roc-0.037
man-0.035
 Fighting-0.033
rot-0.033
Token<|endoftext|>
Token ID50256
Feature activation-4.00e-01

Pos logit contributions
roc+0.014
ate+0.014
ateful+0.012
ulture+0.012
rot+0.012
Neg logit contributions
 minutes-0.015
ches-0.014
fy-0.013
 mun-0.013
 drift-0.013
TokenOnce
Token ID7454
Feature activation-3.50e-01

Pos logit contributions
 minutes+0.020
fy+0.019
ches+0.018
 degrees+0.018
 mun+0.018
Neg logit contributions
roc-0.014
ate-0.014
ulture-0.012
rot-0.012
man-0.011
Token upon
Token ID2402
Feature activation-5.32e-01

Pos logit contributions
 minutes+0.015
ches+0.014
fy+0.014
 mun+0.013
 degrees+0.013
Neg logit contributions
roc-0.014
ate-0.014
ulture-0.013
ateful-0.013
rot-0.012
Token a
Token ID257
Feature activation-2.49e+00

Pos logit contributions
 minutes+0.025
ches+0.024
fy+0.023
 degrees+0.022
 mun+0.022
Neg logit contributions
ate-0.020
roc-0.020
ulture-0.017
ateful-0.017
rot-0.016
Token time
Token ID640
Feature activation+6.43e-01

Pos logit contributions
 degrees+0.095
ches+0.090
 pounds+0.089
 Whoever+0.088
 minutes+0.088
Neg logit contributions
roc-0.096
ate-0.095
 neighborhood-0.089
room-0.084
rot-0.083
Token,
Token ID11
Feature activation-1.20e-01

Pos logit contributions
roc+0.025
ate+0.023
ulture+0.021
ateful+0.021
amer+0.020
Neg logit contributions
 minutes-0.031
ches-0.028
 y-0.027
 drift-0.027
fy-0.027
Token there
Token ID612
Feature activation+1.15e+00

Pos logit contributions
 minutes+0.005
ches+0.005
fy+0.005
 mun+0.005
 degrees+0.004
Neg logit contributions
roc-0.005
ate-0.005
ulture-0.004
ateful-0.004
rot-0.004
Token was
Token ID373
Feature activation-1.42e-01

Pos logit contributions
roc+0.058
ate+0.053
ateful+0.052
ulture+0.051
amer+0.051
Neg logit contributions
 minutes-0.040
 y-0.037
ches-0.037
 mun-0.035
 drift-0.035
Token a
Token ID257
Feature activation-1.12e+00

Pos logit contributions
 minutes+0.006
ches+0.006
fy+0.006
 mun+0.005
 drift+0.005
Neg logit contributions
roc-0.006
ate-0.006
ulture-0.005
ateful-0.005
rot-0.005
Token and
Token ID290
Feature activation+1.23e-01

Pos logit contributions
 minutes+0.046
fy+0.042
 degrees+0.041
 mun+0.038
 drift+0.038
Neg logit contributions
roc-0.050
ate-0.048
man-0.041
rot-0.041
 Respect-0.041
Token Ben
Token ID3932
Feature activation-1.75e+00

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.004
ches-0.003
fy-0.003
 mun-0.003
 drift-0.003
Token are
Token ID389
Feature activation+6.67e-01

Pos logit contributions
 minutes+0.080
 degrees+0.071
fy+0.068
 pounds+0.067
 mun+0.064
Neg logit contributions
roc-0.070
ate-0.064
 bully-0.059
 judge-0.058
 Respect-0.058
Token scared
Token ID12008
Feature activation-8.21e-01

Pos logit contributions
roc+0.034
ate+0.031
ulture+0.031
anish+0.029
rot+0.029
Neg logit contributions
 minutes-0.025
 y-0.022
ches-0.022
 drift-0.021
fy-0.020
Token.
Token ID13
Feature activation+5.34e-01

Pos logit contributions
 mun+0.036
ches+0.035
fy+0.035
 minutes+0.034
 degrees+0.033
Neg logit contributions
roc-0.030
ate-0.028
ateful-0.028
 Fighting-0.026
amer-0.026
Token They
Token ID1119
Feature activation-2.61e+00

Pos logit contributions
roc+0.024
ate+0.023
ulture+0.021
ateful+0.021
rot+0.021
Neg logit contributions
 minutes-0.022
ches-0.019
 Whoever-0.019
 y-0.019
fy-0.018
Token run
Token ID1057
Feature activation-3.57e-01

Pos logit contributions
 minutes+0.125
 degrees+0.113
 pounds+0.113
fy+0.105
 mun+0.100
Neg logit contributions
roc-0.099
 bully-0.085
 judge-0.084
 voice-0.083
 spirit-0.079
Token away
Token ID1497
Feature activation+2.17e-02

Pos logit contributions
 minutes+0.016
ches+0.015
 degrees+0.015
 mun+0.015
fy+0.015
Neg logit contributions
roc-0.014
ate-0.013
ulture-0.012
ateful-0.011
rot-0.011
Token.
Token ID13
Feature activation+4.63e-01

Pos logit contributions
roc+0.001
ate+0.001
ateful+0.001
ulture+0.001
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token The
Token ID383
Feature activation+3.42e-01

Pos logit contributions
roc+0.020
ate+0.019
ulture+0.018
ateful+0.018
rot+0.018
Neg logit contributions
 minutes-0.019
ches-0.017
 Whoever-0.017
 y-0.017
fy-0.017
Token dog
Token ID3290
Feature activation-1.57e+00

Pos logit contributions
roc+0.014
ate+0.013
ateful+0.011
ulture+0.011
rot+0.011
Neg logit contributions
 minutes-0.016
ches-0.015
fy-0.014
 y-0.014
 drift-0.014
Token bit
Token ID1643
Feature activation-6.36e-01

Pos logit contributions
 minutes+0.050
 degrees+0.046
ches+0.045
 drift+0.042
 mun+0.042
Neg logit contributions
roc-0.060
man-0.054
ate-0.053
 neighborhood-0.053
rot-0.053
Token more
Token ID517
Feature activation-9.67e-01

Pos logit contributions
ches+0.022
 minutes+0.021
 wo+0.021
 degrees+0.020
 mun+0.020
Neg logit contributions
roc-0.029
ate-0.028
ateful-0.027
amer-0.025
ulture-0.025
Token until
Token ID1566
Feature activation-8.50e-01

Pos logit contributions
ches+0.042
 Whoever+0.040
 minutes+0.039
 wo+0.037
 degrees+0.037
Neg logit contributions
roc-0.034
ate-0.034
ateful-0.031
 neighborhood-0.031
 self-0.029
Token it
Token ID340
Feature activation-7.34e-01

Pos logit contributions
ches+0.032
 minutes+0.030
 degrees+0.030
fy+0.029
 pounds+0.028
Neg logit contributions
ate-0.036
roc-0.036
ateful-0.034
 Fighting-0.032
ulture-0.032
Token was
Token ID373
Feature activation-6.96e-01

Pos logit contributions
 minutes+0.036
fy+0.033
 degrees+0.033
 Whoever+0.032
ches+0.032
Neg logit contributions
roc-0.025
ate-0.024
man-0.021
ulture-0.021
rot-0.021
Token time
Token ID640
Feature activation-2.17e+00

Pos logit contributions
 minutes+0.023
ches+0.023
fy+0.022
 degrees+0.021
 Whoever+0.021
Neg logit contributions
ate-0.033
roc-0.033
ateful-0.030
amed-0.030
flower-0.029
Token to
Token ID284
Feature activation-7.16e-01

Pos logit contributions
ches+0.116
 degrees+0.113
 minutes+0.112
 marsh+0.112
 mun+0.111
Neg logit contributions
ate-0.048
amed-0.044
man-0.041
roc-0.041
 Respect-0.040
Token go
Token ID467
Feature activation-3.86e-01

Pos logit contributions
 minutes+0.027
ches+0.023
fy+0.023
 degrees+0.023
 pounds+0.021
Neg logit contributions
ate-0.035
roc-0.035
ateful-0.031
ulture-0.031
flower-0.030
Token back
Token ID736
Feature activation+2.62e-02

Pos logit contributions
 minutes+0.024
ches+0.023
 degrees+0.023
fy+0.022
 mun+0.022
Neg logit contributions
ate-0.008
roc-0.008
ulture-0.006
ateful-0.006
rot-0.006
Token inside
Token ID2641
Feature activation-4.14e-01

Pos logit contributions
roc+0.001
ate+0.001
ulture+0.001
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token.
Token ID13
Feature activation-4.22e-01

Pos logit contributions
 minutes+0.016
ches+0.015
fy+0.015
 mun+0.014
 degrees+0.014
Neg logit contributions
roc-0.018
ate-0.018
ateful-0.016
ulture-0.016
 Respect-0.015