TOP ACTIVATIONS
MAX = 0.104

fun . After playing for a while , they decided to
to the slide . They wait for their turn .
?" Lu cy thought that sounded like fun ,
go find a seat and wait for a turn ." They
Let 's go inside and wait for the rain to stop
, they sat down for a picnic .
Finally , after climbing for a long time , Mary reached
to the slide . They wait for their turn . L
. Let 's go for a different walk instead ." But
on top of it for a while . As
After swinging for a while , Lily 's tum
" Let 's go for a row !", her mom said
" Let 's go for a walk !" said Mom my
run to the slide and wait for their turn .
in the front yard and wait . Anna did as her
After they played for a while , it was time
around . After playing for a long time , they went
" We can go on a road trip ."
The old lady thought and thought but she couldn ’
's go back home and wait for the sun to come

MIN ACTIVATIONS
MIN = -0.102

" Can I lend your mustache for a little bit ?
you want to teach your tank how to say ' s
Whenever Lily went to the post office , she always added
should be compassionate . Comp assion ate means you care about
Where did you get your gun ?" The other
Why did you say ' ob edient ' ?"
judge who lived in a beautiful palace . Whenever anyone had
" Can I have your heart , Anna ?" Tom asks
then on , the me adow was known as the surprise
fox named Fox y . Fox y was very intelligent and
They make sounds like " v room " and " be
that day on , Pet al was never careless and always
that day on , the village was even more peaceful and
that day on , the shop keeper was extra careful to
, " Ar rr , mate y !" Anna
to bow . Whenever she bowed , the town spe ople
did you take my kay ak ?" L ila
Why are you qu arre ling ?" The two
Can you lend me your colors so I can draw one
day on , Wh isk ers was no longer a lazy

INTERVAL 0.052 - 0.104
CONTAINS 0.481%

make a fence !" John thought it was a great idea
tried to catch the cake too . They pulled
raining . <|endoftext|> Once upon a time , there was a
top of the slide . He looks down and sees L
thing . <|endoftext|> Once upon a time , there was a

INTERVAL 0.001 - 0.052
CONTAINS 46.979%

and baked it . After it was brown and crispy ,
. Happy birthday , Tom ." She calls Anna
the mom told them to get ready . Everyone put on
on her coat and went outside . " Look
We can 't run on the track . It 's for

INTERVAL -0.050 - 0.001
CONTAINS 52.153%

Billy looked at each other in fear . " What 's
. But he does not share his ball . He says
that could print . Sammy was very excited and he immediately
on a boat and see where it takes us ." He
. They put on the gloves and boots . They feel

INTERVAL -0.102 - -0.050
CONTAINS 0.387%

. He lived in a big house with a big gate
to lose him . She said , " No , Max
. At the market , Pe ppa saw a
The people on the Kind Island wanted to help , so
" Is your name Red ?" the boy asked Sam
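
The four INTERVAL blocks above split the activation range from MIN to MAX into buckets, and the four CONTAINS percentages sum to 100.000%. How the dashboard assigns a value that lies exactly on a bucket edge is not stated in this dump. The sketch below assumes half-open buckets with the last one closed on the right, and it uses a hypothetical helper `interval_shares` and made-up sample activations purely for illustration.

```python
from bisect import bisect_right


def interval_shares(activations, edges):
    """Percentage of activations in each bucket [edges[i], edges[i+1]).

    Assumption: the last bucket is closed on the right so that the
    maximum activation is counted. Values outside the range are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for a in activations:
        if a < edges[0] or a > edges[-1]:
            continue
        # bisect_right finds the first edge strictly greater than a
        i = min(bisect_right(edges, a) - 1, len(counts) - 1)
        counts[i] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]


# Bucket edges copied from the INTERVAL headings in this dump.
EDGES = [-0.102, -0.050, 0.001, 0.052, 0.104]

# Percentages reported by the CONTAINS lines, lowest interval first.
REPORTED = [0.387, 52.153, 46.979, 0.481]

# Hypothetical activations, NOT taken from the model.
sample = [-0.08, -0.01, 0.0, 0.02, 0.03, 0.06, 0.104]
shares = interval_shares(sample, EDGES)
```

With real data, `activations` would be the feature's activation at every token of the evaluation set, and `shares` could then be compared against `REPORTED`.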
Token fun
Token ID1257
Feature activation+5.30e-02

Pos logit contributions
ches+0.083
 degrees+0.070
 pounds+0.066
fy+0.066
 mun+0.065
Neg logit contributions
ate-0.086
 neighborhood-0.078
 town-0.075
roc-0.074
 Respect-0.073
Token.
Token ID13
Feature activation+2.02e-02

Pos logit contributions
ches+0.107
fy+0.107
 degrees+0.096
 mun+0.096
 syrup+0.094
Neg logit contributions
ate-0.072
ating-0.062
 neighborhood-0.059
 town-0.058
room-0.058
Token After
Token ID2293
Feature activation+3.22e-02

Pos logit contributions
 minutes+0.039
 degrees+0.036
 pounds+0.036
ches+0.035
 mun+0.034
Neg logit contributions
ate-0.030
roc-0.029
 Fighting-0.026
 neighborhood-0.025
 Respect-0.024
Token playing
Token ID2712
Feature activation-1.44e-03

Pos logit contributions
ches+0.065
fy+0.061
 degrees+0.057
 minutes+0.055
 pounds+0.054
Neg logit contributions
ate-0.043
 town-0.041
roc-0.041
 Fighting-0.040
ateful-0.038
Token for
Token ID329
Feature activation+3.51e-02

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.002
 mun-0.002
 drift-0.002
Token a
Token ID257
Feature activation+1.04e-01

Pos logit contributions
ches+0.073
fy+0.071
 degrees+0.070
 minutes+0.069
 mun+0.068
Neg logit contributions
ate-0.039
roc-0.039
 town-0.033
ateful-0.032
amed-0.031
Token while
Token ID981
Feature activation+2.37e-02

Pos logit contributions
 degrees+0.186
 pounds+0.164
ches+0.156
oops+0.154
oll+0.154
Neg logit contributions
 judge-0.145
 town-0.131
 bully-0.124
roc-0.123
 pupil-0.121
Token,
Token ID11
Feature activation+2.37e-02

Pos logit contributions
ches+0.058
 degrees+0.056
 mun+0.056
fy+0.056
 minutes+0.056
Neg logit contributions
roc-0.016
ate-0.015
 town-0.012
ums-0.012
ateful-0.012
Token they
Token ID484
Feature activation+9.71e-03

Pos logit contributions
 minutes+0.032
fy+0.031
ches+0.031
 degrees+0.029
 pounds+0.028
Neg logit contributions
ate-0.045
roc-0.045
ateful-0.041
 Fighting-0.040
ulture-0.040
Token decided
Token ID3066
Feature activation+1.01e-02

Pos logit contributions
 minutes+0.016
ches+0.014
 degrees+0.014
fy+0.014
 drift+0.014
Neg logit contributions
roc-0.017
ate-0.017
ulture-0.015
ateful-0.015
rot-0.015
Token to
Token ID284
Feature activation+1.71e-03

Pos logit contributions
 minutes+0.020
ches+0.019
fy+0.018
 degrees+0.018
 mun+0.018
Neg logit contributions
roc-0.015
ate-0.014
ateful-0.012
ulture-0.012
man-0.012
Token to
Token ID284
Feature activation+4.13e-02

Pos logit contributions
 degrees+0.085
 minutes+0.081
ches+0.081
 mun+0.081
 pounds+0.080
Neg logit contributions
ate-0.043
roc-0.042
amed-0.035
room-0.035
man-0.035
Token the
Token ID262
Feature activation+1.85e-02

Pos logit contributions
fy+0.107
ches+0.104
 degrees+0.103
 minutes+0.103
 pounds+0.099
Neg logit contributions
ate-0.041
 town-0.036
roc-0.035
room-0.030
 Respect-0.027
Token slide
Token ID10649
Feature activation+3.64e-02

Pos logit contributions
ches+0.031
 minutes+0.031
fy+0.029
 mun+0.028
 degrees+0.028
Neg logit contributions
roc-0.032
ate-0.031
 neighborhood-0.028
ulture-0.028
man-0.028
Token.
Token ID13
Feature activation+2.38e-02

Pos logit contributions
 minutes+0.069
ches+0.068
 degrees+0.067
 mun+0.066
fy+0.064
Neg logit contributions
ate-0.051
man-0.048
roc-0.046
room-0.046
amer-0.043
Token They
Token ID1119
Feature activation+5.65e-02

Pos logit contributions
 minutes+0.052
fy+0.051
ches+0.050
 degrees+0.049
 pounds+0.048
Neg logit contributions
ate-0.028
roc-0.027
 Fighting-0.024
 Respect-0.023
room-0.023
Token wait
Token ID4043
Feature activation+1.03e-01

Pos logit contributions
 minutes+0.107
 degrees+0.105
ches+0.094
 pounds+0.093
fy+0.091
Neg logit contributions
roc-0.081
ate-0.079
 town-0.078
 judge-0.075
flower-0.071
Token for
Token ID329
Feature activation+4.78e-02

Pos logit contributions
ches+0.216
fy+0.209
 mun+0.199
ience+0.197
cone+0.196
Neg logit contributions
ate-0.112
ap-0.095
,-0.092
 with-0.088
 town-0.084
Token their
Token ID511
Feature activation+3.79e-02

Pos logit contributions
ches+0.111
fy+0.105
 mun+0.103
 pounds+0.096
 wo+0.096
Neg logit contributions
ate-0.054
roc-0.048
 town-0.043
ateful-0.041
room-0.040
Token turn
Token ID1210
Feature activation+4.76e-02

Pos logit contributions
ches+0.078
 minutes+0.068
fy+0.067
 mun+0.066
 degrees+0.064
Neg logit contributions
roc-0.054
 neighborhood-0.050
ate-0.049
 town-0.047
 judge-0.046
Token.
Token ID13
Feature activation+2.25e-02

Pos logit contributions
 mun+0.084
 minutes+0.084
ches+0.083
 degrees+0.081
 slur+0.080
Neg logit contributions
ate-0.065
man-0.058
roc-0.057
room-0.055
flower-0.055
Token\n
Token ID198
Feature activation+2.78e-02

Pos logit contributions
 minutes+0.050
fy+0.049
ches+0.048
 degrees+0.048
 mun+0.047
Neg logit contributions
ate-0.025
roc-0.024
 Fighting-0.021
 Respect-0.020
ulture-0.020
Token?"
Token ID1701
Feature activation+2.47e-02

Pos logit contributions
roc+0.010
ate+0.009
ulture+0.008
ateful+0.008
rot+0.008
Neg logit contributions
 minutes-0.012
ches-0.011
fy-0.011
 drift-0.011
 y-0.010
Token\n
Token ID198
Feature activation+2.47e-02

Pos logit contributions
 minutes+0.052
fy+0.051
ches+0.051
 degrees+0.049
 mun+0.048
Neg logit contributions
ate-0.029
roc-0.028
 Respect-0.025
 Fighting-0.024
 neighborhood-0.024
Token\n
Token ID198
Feature activation+1.02e-02

Pos logit contributions
 minutes+0.055
fy+0.053
 mun+0.053
 degrees+0.052
ches+0.052
Neg logit contributions
ate-0.027
roc-0.026
 Respect-0.021
room-0.021
man-0.021
TokenLu
Token ID25596
Feature activation-2.64e-03

Pos logit contributions
 minutes+0.014
ches+0.013
 mun+0.012
fy+0.012
 drift+0.012
Neg logit contributions
roc-0.021
ate-0.020
ulture-0.018
rot-0.018
ateful-0.018
Tokency
Token ID948
Feature activation-1.08e-02

Pos logit contributions
roc+0.004
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.006
ches-0.005
fy-0.005
 mun-0.005
 drift-0.005
Token thought
Token ID1807
Feature activation+1.01e-01

Pos logit contributions
roc+0.021
ate+0.020
ulture+0.019
ateful+0.018
amer+0.018
Neg logit contributions
 minutes-0.017
ches-0.015
fy-0.014
 mun-0.014
 y-0.014
Token that
Token ID326
Feature activation+2.19e-02

Pos logit contributions
ksh+0.216
 mun+0.214
ns+0.199
fy+0.199
Run+0.190
Neg logit contributions
,-0.094
ate-0.087
room-0.082
 office-0.078
 heart-0.075
Token sounded
Token ID14846
Feature activation+4.72e-02

Pos logit contributions
 minutes+0.047
fy+0.045
ches+0.044
 mun+0.044
 degrees+0.042
Neg logit contributions
ate-0.028
roc-0.027
man-0.023
anish-0.023
 Fighting-0.022
Token like
Token ID588
Feature activation+3.60e-02

Pos logit contributions
ches+0.090
 mun+0.089
 degrees+0.085
 minutes+0.085
ait+0.084
Neg logit contributions
roc-0.060
ate-0.059
ateful-0.050
 sincere-0.049
 Respect-0.049
Token fun
Token ID1257
Feature activation+2.49e-02

Pos logit contributions
ches+0.070
fy+0.065
 minutes+0.062
 drift+0.059
 Whoever+0.059
Neg logit contributions
roc-0.052
ate-0.052
ateful-0.045
ulture-0.043
 neighborhood-0.043
Token,
Token ID11
Feature activation+3.75e-02

Pos logit contributions
ches+0.047
 mun+0.046
fy+0.046
 minutes+0.045
 drift+0.042
Neg logit contributions
ate-0.035
roc-0.034
anish-0.029
 neighborhood-0.029
room-0.029
Token go
Token ID467
Feature activation+3.42e-02

Pos logit contributions
 minutes+0.006
ches+0.005
fy+0.005
 degrees+0.005
 mun+0.005
Neg logit contributions
roc-0.009
ate-0.009
ulture-0.008
ateful-0.008
rot-0.008
Token find
Token ID1064
Feature activation+1.01e-02

Pos logit contributions
 minutes+0.071
 degrees+0.069
 pounds+0.065
 mun+0.062
fy+0.062
Neg logit contributions
ate-0.048
roc-0.042
man-0.038
gy-0.038
 neighborhood-0.037
Token a
Token ID257
Feature activation+3.12e-02

Pos logit contributions
 minutes+0.019
ches+0.017
fy+0.017
 drift+0.017
 mun+0.017
Neg logit contributions
roc-0.016
ate-0.016
ulture-0.014
ateful-0.014
rot-0.013
Token seat
Token ID5852
Feature activation+5.64e-03

Pos logit contributions
 minutes+0.053
ches+0.052
 degrees+0.051
 mun+0.048
fy+0.048
Neg logit contributions
roc-0.050
ate-0.048
 neighborhood-0.045
rot-0.045
man-0.044
Token and
Token ID290
Feature activation+2.07e-02

Pos logit contributions
 minutes+0.008
ches+0.008
 drift+0.007
fy+0.007
 mun+0.007
Neg logit contributions
roc-0.011
ate-0.011
ulture-0.010
ateful-0.010
man-0.010
Token wait
Token ID4043
Feature activation+1.00e-01

Pos logit contributions
 minutes+0.033
ches+0.031
 degrees+0.029
fy+0.029
 mun+0.027
Neg logit contributions
ate-0.038
roc-0.037
ateful-0.032
ulture-0.032
 Respect-0.032
Token for
Token ID329
Feature activation+2.79e-02

Pos logit contributions
ches+0.195
 mun+0.189
fy+0.183
ns+0.178
cone+0.175
Neg logit contributions
ate-0.125
ap-0.097
 yourself-0.091
 town-0.091
 home-0.090
Token a
Token ID257
Feature activation+7.06e-02

Pos logit contributions
 minutes+0.051
ches+0.050
fy+0.050
 mun+0.048
 degrees+0.047
Neg logit contributions
ate-0.042
roc-0.041
ateful-0.035
 Fighting-0.033
 Respect-0.033
Token turn
Token ID1210
Feature activation+4.78e-02

Pos logit contributions
 degrees+0.119
ches+0.117
 minutes+0.109
 pounds+0.107
 mun+0.104
Neg logit contributions
 judge-0.100
man-0.097
 symbol-0.093
rot-0.093
 town-0.092
Token."
Token ID526
Feature activation+2.65e-02

Pos logit contributions
 mun+0.084
 minutes+0.081
 tug+0.080
 drift+0.079
 degrees+0.079
Neg logit contributions
ate-0.069
man-0.064
amer-0.060
roc-0.059
rot-0.058
Token They
Token ID1119
Feature activation+5.15e-02

Pos logit contributions
 minutes+0.046
 pounds+0.045
 degrees+0.045
 kil+0.042
ches+0.042
Neg logit contributions
roc-0.043
ate-0.041
 Fighting-0.038
room-0.036
 neighborhood-0.036
Token Let
Token ID3914
Feature activation+3.71e-03

Pos logit contributions
 minutes+0.027
ches+0.026
fy+0.024
 mun+0.024
 y+0.024
Neg logit contributions
roc-0.025
ate-0.024
ulture-0.022
 Fighting-0.021
 Respect-0.021
Token's
Token ID338
Feature activation+3.09e-03

Pos logit contributions
 minutes+0.007
ches+0.007
fy+0.007
 mun+0.007
 drift+0.006
Neg logit contributions
roc-0.006
ate-0.005
ulture-0.005
ateful-0.004
rot-0.004
Token go
Token ID467
Feature activation+2.31e-02

Pos logit contributions
 minutes+0.004
ches+0.004
fy+0.003
 mun+0.003
 degrees+0.003
Neg logit contributions
roc-0.007
ate-0.006
ulture-0.006
ateful-0.006
rot-0.006
Token inside
Token ID2641
Feature activation+1.61e-02

Pos logit contributions
 minutes+0.046
 degrees+0.043
 pounds+0.041
fy+0.041
 mun+0.041
Neg logit contributions
ate-0.033
roc-0.031
man-0.028
ulture-0.027
gy-0.027
Token and
Token ID290
Feature activation+4.30e-03

Pos logit contributions
 minutes+0.028
ches+0.026
fy+0.026
 mun+0.025
 drift+0.025
Neg logit contributions
ate-0.026
roc-0.026
ateful-0.023
ulture-0.022
man-0.022
Token wait
Token ID4043
Feature activation+9.93e-02

Pos logit contributions
 minutes+0.007
ches+0.006
fy+0.006
 mun+0.006
 degrees+0.006
Neg logit contributions
roc-0.008
ate-0.008
ulture-0.007
ateful-0.007
man-0.007
Token for
Token ID329
Feature activation+3.35e-02

Pos logit contributions
ches+0.171
 mun+0.168
fy+0.163
cone+0.159
 Whoever+0.151
Neg logit contributions
ate-0.136
 town-0.109
ap-0.108
 yourself-0.106
room-0.103
Token the
Token ID262
Feature activation+2.51e-02

Pos logit contributions
ches+0.065
fy+0.064
 mun+0.061
 minutes+0.059
 Whoever+0.059
Neg logit contributions
ate-0.047
roc-0.044
ateful-0.038
 town-0.038
 yourself-0.037
Token rain
Token ID6290
Feature activation+2.82e-02

Pos logit contributions
ches+0.042
 minutes+0.042
fy+0.041
 mun+0.040
 Whoever+0.039
Neg logit contributions
roc-0.041
ate-0.039
ulture-0.036
man-0.036
 neighborhood-0.035
Token to
Token ID284
Feature activation+2.24e-02

Pos logit contributions
 minutes+0.047
 mun+0.043
ches+0.043
fy+0.042
 degrees+0.042
Neg logit contributions
ate-0.045
roc-0.045
man-0.042
ulture-0.041
flower-0.040
Token stop
Token ID2245
Feature activation+2.05e-02

Pos logit contributions
 minutes+0.034
fy+0.031
ches+0.031
 mun+0.029
 Whoever+0.029
Neg logit contributions
ate-0.042
roc-0.042
ulture-0.036
ateful-0.036
man-0.036
Token,
Token ID11
Feature activation+5.76e-03

Pos logit contributions
roc+0.006
ate+0.005
ulture+0.005
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.007
ches-0.007
fy-0.006
 mun-0.006
 drift-0.006
Token they
Token ID484
Feature activation+1.34e-02

Pos logit contributions
 minutes+0.007
ches+0.007
fy+0.006
 degrees+0.006
 mun+0.006
Neg logit contributions
roc-0.013
ate-0.012
ateful-0.011
ulture-0.011
rot-0.011
Token sat
Token ID3332
Feature activation-2.82e-02

Pos logit contributions
 minutes+0.025
 degrees+0.023
ches+0.023
fy+0.022
 drift+0.021
Neg logit contributions
roc-0.021
ate-0.020
ulture-0.018
ateful-0.018
man-0.018
Token down
Token ID866
Feature activation+1.60e-02

Pos logit contributions
roc+0.064
ate+0.056
ulture+0.056
anish+0.054
ateful+0.054
Neg logit contributions
 minutes-0.037
 y-0.031
ches-0.028
fy-0.028
 mun-0.028
Token for
Token ID329
Feature activation+4.78e-02

Pos logit contributions
 minutes+0.030
ches+0.029
fy+0.028
 degrees+0.027
 mun+0.027
Neg logit contributions
ate-0.024
roc-0.023
ateful-0.021
man-0.021
ulture-0.020
Token a
Token ID257
Feature activation+9.90e-02

Pos logit contributions
ches+0.109
fy+0.105
 drift+0.099
 degrees+0.098
 minutes+0.098
Neg logit contributions
roc-0.049
ate-0.046
 town-0.039
room-0.037
amed-0.037
Token picnic
Token ID35715
Feature activation+1.62e-02

Pos logit contributions
 degrees+0.170
 minutes+0.149
 pounds+0.149
 twice+0.148
uct+0.147
Neg logit contributions
 judge-0.138
 town-0.124
man-0.123
 painter-0.121
 musician-0.120
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
 minutes+0.026
ches+0.025
 degrees+0.024
fy+0.023
 Whoever+0.023
Neg logit contributions
ate-0.028
roc-0.027
man-0.025
ulture-0.025
flower-0.024
Token (space)
Token ID220
Feature activation-2.79e-03

Pos logit contributions
 minutes+0.023
ches+0.021
 degrees+0.021
fy+0.021
 pounds+0.020
Neg logit contributions
roc-0.021
ate-0.020
 Fighting-0.018
ulture-0.018
ateful-0.017
Token\n
Token ID198
Feature activation+2.82e-03

Pos logit contributions
roc+0.005
ate+0.004
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.005
ches-0.005
fy-0.005
 mun-0.005
 drift-0.005
Token\n
Token ID198
Feature activation-8.02e-04

Pos logit contributions
 minutes+0.005
ches+0.005
fy+0.005
 mun+0.005
 drift+0.005
Neg logit contributions
roc-0.004
ate-0.004
ulture-0.004
ateful-0.004
rot-0.003
Token Finally
Token ID9461
Feature activation+2.85e-02

Pos logit contributions
 minutes+0.026
ches+0.025
fy+0.025
 degrees+0.024
 mun+0.024
Neg logit contributions
roc-0.020
ate-0.020
ulture-0.017
ateful-0.017
 Respect-0.016
Token,
Token ID11
Feature activation+1.44e-02

Pos logit contributions
ches+0.069
fy+0.067
 mun+0.063
 degrees+0.063
 minutes+0.062
Neg logit contributions
ate-0.024
roc-0.023
anish-0.019
ateful-0.018
man-0.017
Token after
Token ID706
Feature activation+4.47e-02

Pos logit contributions
 minutes+0.020
ches+0.020
fy+0.019
 degrees+0.018
 mun+0.017
Neg logit contributions
roc-0.028
ate-0.028
ateful-0.025
ulture-0.025
man-0.024
Token climbing
Token ID14281
Feature activation+6.22e-03

Pos logit contributions
ches+0.090
fy+0.090
 degrees+0.077
 minutes+0.077
 mun+0.076
Neg logit contributions
ate-0.058
roc-0.055
 town-0.054
ateful-0.050
 neighborhood-0.049
Token for
Token ID329
Feature activation+5.46e-02

Pos logit contributions
 minutes+0.011
ches+0.011
fy+0.011
 mun+0.010
 degrees+0.010
Neg logit contributions
roc-0.010
ate-0.010
ulture-0.008
ateful-0.008
man-0.008
Token a
Token ID257
Feature activation+9.89e-02

Pos logit contributions
ches+0.127
fy+0.124
ksh+0.119
 mun+0.117
 pounds+0.116
Neg logit contributions
 town-0.049
ate-0.046
roc-0.040
amed-0.038
 voice-0.037
Token long
Token ID890
Feature activation+6.07e-02

Pos logit contributions
 degrees+0.176
oll+0.168
oops+0.166
ows+0.158
uct+0.156
Neg logit contributions
 judge-0.120
 town-0.116
 symbol-0.112
 spirit-0.109
 purpose-0.107
Token time
Token ID640
Feature activation+3.91e-02

Pos logit contributions
 degrees+0.101
fy+0.090
ches+0.089
 pounds+0.088
 minutes+0.086
Neg logit contributions
roc-0.086
 town-0.085
gy-0.084
ate-0.082
man-0.081
Token,
Token ID11
Feature activation+2.32e-02

Pos logit contributions
ches+0.081
fy+0.079
 minutes+0.075
 mun+0.075
 syrup+0.073
Neg logit contributions
ate-0.044
roc-0.041
 neighborhood-0.038
room-0.037
ums-0.036
Token Mary
Token ID5335
Feature activation+1.21e-02

Pos logit contributions
ches+0.036
fy+0.035
 minutes+0.034
 mun+0.032
 drift+0.031
Neg logit contributions
roc-0.042
ate-0.041
ateful-0.037
ulture-0.035
flower-0.035
Token reached
Token ID4251
Feature activation+3.94e-02

Pos logit contributions
 minutes+0.019
fy+0.017
ches+0.017
 drift+0.016
 degrees+0.016
Neg logit contributions
roc-0.022
ate-0.021
ulture-0.019
rot-0.019
man-0.019
Token to
Token ID284
Feature activation+3.89e-02

Pos logit contributions
 degrees+0.056
 minutes+0.056
 mun+0.055
ches+0.055
 pounds+0.054
Neg logit contributions
ate-0.032
roc-0.032
man-0.027
amed-0.026
room-0.025
Token the
Token ID262
Feature activation+9.81e-03

Pos logit contributions
fy+0.100
ches+0.097
 degrees+0.096
 minutes+0.096
 pounds+0.091
Neg logit contributions
ate-0.039
roc-0.035
 town-0.033
 Respect-0.026
room-0.026
Token slide
Token ID10649
Feature activation+2.62e-02

Pos logit contributions
 minutes+0.016
ches+0.015
fy+0.014
 mun+0.013
 degrees+0.013
Neg logit contributions
roc-0.018
ate-0.018
ulture-0.016
man-0.016
ateful-0.016
Token.
Token ID13
Feature activation+2.05e-02

Pos logit contributions
 minutes+0.049
ches+0.047
 degrees+0.047
 mun+0.046
fy+0.046
Neg logit contributions
ate-0.039
roc-0.036
man-0.035
room-0.034
flower-0.033
Token They
Token ID1119
Feature activation+5.11e-02

Pos logit contributions
 minutes+0.041
ches+0.040
fy+0.039
 degrees+0.038
 mun+0.037
Neg logit contributions
roc-0.030
ate-0.027
ulture-0.024
room-0.024
 Fighting-0.023
Token wait
Token ID4043
Feature activation+9.86e-02

Pos logit contributions
 minutes+0.096
 degrees+0.095
 pounds+0.085
ches+0.085
fy+0.084
Neg logit contributions
roc-0.074
 town-0.071
ate-0.070
 judge-0.069
 neighborhood-0.064
Token for
Token ID329
Feature activation+4.74e-02

Pos logit contributions
ches+0.199
fy+0.195
 mun+0.185
ience+0.179
cone+0.179
Neg logit contributions
ate-0.113
ap-0.097
,-0.091
 town-0.088
room-0.086
Token their
Token ID511
Feature activation+3.32e-02

Pos logit contributions
ches+0.116
fy+0.110
 mun+0.104
sc+0.100
 pounds+0.097
Neg logit contributions
ate-0.052
roc-0.047
 town-0.041
ateful-0.039
room-0.037
Token turn
Token ID1210
Feature activation+4.49e-02

Pos logit contributions
ches+0.069
 minutes+0.062
fy+0.061
 mun+0.059
 Whoever+0.059
Neg logit contributions
roc-0.046
ate-0.042
 neighborhood-0.041
 town-0.039
man-0.038
Token.
Token ID13
Feature activation+2.34e-02

Pos logit contributions
 mun+0.079
 minutes+0.079
ches+0.077
 drift+0.075
 degrees+0.074
Neg logit contributions
ate-0.063
man-0.057
roc-0.055
room-0.053
amer-0.053
Token L
Token ID406
Feature activation+2.27e-02

Pos logit contributions
ches+0.046
 minutes+0.046
fy+0.045
 pounds+0.043
 mun+0.043
Neg logit contributions
roc-0.033
ate-0.032
 Fighting-0.027
room-0.027
ulture-0.026
Token.
Token ID13
Feature activation-3.00e-03

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token Let
Token ID3914
Feature activation+9.67e-03

Pos logit contributions
roc+0.006
ate+0.005
ulture+0.005
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.005
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token's
Token ID338
Feature activation+1.88e-02

Pos logit contributions
 minutes+0.020
fy+0.019
 mun+0.018
 degrees+0.018
ches+0.018
Neg logit contributions
roc-0.013
ate-0.013
ulture-0.011
rot-0.011
man-0.010
Token go
Token ID467
Feature activation+3.20e-02

Pos logit contributions
 minutes+0.028
ches+0.025
fy+0.024
 degrees+0.024
 mun+0.023
Neg logit contributions
roc-0.038
ate-0.037
ulture-0.033
man-0.032
flower-0.032
Token for
Token ID329
Feature activation+3.55e-02

Pos logit contributions
 minutes+0.066
 degrees+0.064
 pounds+0.060
 mun+0.060
fy+0.059
Neg logit contributions
ate-0.044
roc-0.040
gy-0.036
man-0.036
 neighborhood-0.034
Token a
Token ID257
Feature activation+9.75e-02

Pos logit contributions
ches+0.076
fy+0.075
 mun+0.071
 degrees+0.070
 minutes+0.070
Neg logit contributions
ate-0.041
roc-0.040
 town-0.032
 yourself-0.031
 wing-0.030
Token different
Token ID1180
Feature activation+2.82e-02

Pos logit contributions
 degrees+0.182
 pounds+0.164
uct+0.159
 minutes+0.159
 syrup+0.154
Neg logit contributions
 judge-0.131
 town-0.122
gy-0.121
 spirit-0.118
 friend-0.114
Token walk
Token ID2513
Feature activation-8.71e-04

Pos logit contributions
ches+0.048
 minutes+0.044
fy+0.044
 mun+0.044
 syrup+0.042
Neg logit contributions
roc-0.046
ate-0.046
 neighborhood-0.042
anish-0.041
 town-0.040
Token instead
Token ID2427
Feature activation+1.59e-02

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token."
Token ID526
Feature activation-5.65e-03

Pos logit contributions
 minutes+0.025
ches+0.022
 mun+0.022
fy+0.022
 drift+0.022
Neg logit contributions
ate-0.028
roc-0.028
ulture-0.026
man-0.025
ateful-0.025
Token But
Token ID887
Feature activation-4.91e-03

Pos logit contributions
roc+0.010
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.009
ches-0.009
fy-0.008
 mun-0.008
 drift-0.008
Token on
Token ID319
Feature activation+3.53e-02

Pos logit contributions
roc+0.010
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.009
ches-0.008
fy-0.008
 mun-0.008
 y-0.008
Token top
Token ID1353
Feature activation+4.37e-02

Pos logit contributions
fy+0.074
ches+0.073
 minutes+0.073
 mun+0.067
 degrees+0.067
Neg logit contributions
ate-0.046
roc-0.044
 town-0.037
man-0.036
room-0.036
Token of
Token ID286
Feature activation+3.18e-02

Pos logit contributions
 degrees+0.085
 minutes+0.085
 mun+0.084
 wo+0.084
ches+0.083
Neg logit contributions
ate-0.054
roc-0.046
man-0.046
room-0.046
amer-0.045
Token it
Token ID340
Feature activation+5.33e-02

Pos logit contributions
fy+0.070
ches+0.070
 minutes+0.067
 degrees+0.062
 mun+0.060
Neg logit contributions
roc-0.040
ate-0.039
ateful-0.033
 town-0.032
gy-0.031
Token for
Token ID329
Feature activation+4.37e-02

Pos logit contributions
 minutes+0.107
 degrees+0.104
 mun+0.103
 syrup+0.102
 slur+0.099
Neg logit contributions
ate-0.064
man-0.055
room-0.052
stick-0.050
gy-0.049
Token a
Token ID257
Feature activation+9.72e-02

Pos logit contributions
ches+0.094
fy+0.091
 minutes+0.084
 drift+0.082
 degrees+0.081
Neg logit contributions
roc-0.054
ate-0.050
 town-0.044
ateful-0.043
room-0.042
Token while
Token ID981
Feature activation+3.89e-02

Pos logit contributions
 degrees+0.159
ches+0.151
uct+0.149
 minutes+0.149
oops+0.144
Neg logit contributions
 judge-0.128
 symbol-0.121
 town-0.120
man-0.116
 nation-0.115
Token.
Token ID13
Feature activation+2.27e-02

Pos logit contributions
ches+0.074
fy+0.071
 mun+0.069
 degrees+0.067
 minutes+0.067
Neg logit contributions
roc-0.048
ate-0.046
 town-0.046
man-0.045
amer-0.044
Token\n
Token ID198
Feature activation+2.09e-02

Pos logit contributions
 minutes+0.047
ches+0.044
 degrees+0.043
fy+0.043
 pounds+0.042
Neg logit contributions
roc-0.031
ate-0.031
 Fighting-0.028
 Respect-0.026
 neighborhood-0.025
Token\n
Token ID198
Feature activation+9.56e-03

Pos logit contributions
 minutes+0.045
fy+0.042
 degrees+0.041
ches+0.041
 mun+0.041
Neg logit contributions
ate-0.026
roc-0.026
ulture-0.022
man-0.021
room-0.021
TokenAs
Token ID1722
Feature activation+2.83e-02

Pos logit contributions
 minutes+0.015
ches+0.014
 drift+0.013
fy+0.013
 degrees+0.013
Neg logit contributions
roc-0.018
ate-0.017
ulture-0.015
ateful-0.015
 Fighting-0.015
Token\n
Token ID198
Feature activation+2.04e-02

Pos logit contributions
 minutes+0.033
 degrees+0.031
ches+0.031
fy+0.030
 pounds+0.030
Neg logit contributions
roc-0.020
ate-0.020
ulture-0.017
 Fighting-0.017
 neighborhood-0.016
Token\n
Token ID198
Feature activation+1.29e-02

Pos logit contributions
 minutes+0.043
fy+0.041
 degrees+0.040
ches+0.040
 mun+0.040
Neg logit contributions
ate-0.026
roc-0.026
ulture-0.022
man-0.021
ateful-0.021
TokenAfter
Token ID3260
Feature activation+3.15e-02

Pos logit contributions
 minutes+0.023
ches+0.021
 drift+0.021
 degrees+0.021
 mun+0.021
Neg logit contributions
roc-0.021
ate-0.020
ulture-0.018
amed-0.017
rot-0.017
Token swinging
Token ID25635
Feature activation-4.92e-02

Pos logit contributions
ches+0.069
fy+0.065
 degrees+0.062
 pounds+0.061
 minutes+0.060
Neg logit contributions
ate-0.035
roc-0.035
 Fighting-0.033
 town-0.033
room-0.033
Token for
Token ID329
Feature activation+3.93e-02

Pos logit contributions
roc+0.099
ate+0.097
ateful+0.091
amer+0.089
ulture+0.087
Neg logit contributions
 minutes-0.072
 y-0.065
ches-0.064
fy-0.061
 wo-0.061
Token a
Token ID257
Feature activation+9.64e-02

Pos logit contributions
ches+0.086
fy+0.084
 degrees+0.081
 drift+0.080
 pounds+0.079
Neg logit contributions
roc-0.042
ate-0.039
 town-0.033
amed-0.033
ateful-0.031
Token while
Token ID981
Feature activation+1.91e-02

Pos logit contributions
 degrees+0.174
 pounds+0.156
oll+0.151
oops+0.148
uct+0.146
Neg logit contributions
 judge-0.136
 town-0.121
roc-0.119
Queen-0.113
 pupil-0.111
Token,
Token ID11
Feature activation+1.82e-02

Pos logit contributions
ches+0.046
fy+0.045
 degrees+0.045
 minutes+0.044
 mun+0.044
Neg logit contributions
roc-0.015
ate-0.013
ateful-0.011
rot-0.011
ulture-0.011
Token Lily
Token ID20037
Feature activation+3.80e-03

Pos logit contributions
 minutes+0.027
ches+0.026
fy+0.026
 degrees+0.024
 pounds+0.023
Neg logit contributions
roc-0.033
ate-0.032
ateful-0.030
ulture-0.029
 Fighting-0.028
Token's
Token ID338
Feature activation-4.82e-03

Pos logit contributions
 minutes+0.007
ches+0.006
fy+0.006
 drift+0.006
 degrees+0.006
Neg logit contributions
roc-0.006
ate-0.006
ulture-0.005
ateful-0.005
rot-0.005
Token tum
Token ID11814
Feature activation-3.80e-02

Pos logit contributions
roc+0.008
ate+0.007
ulture+0.007
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.009
ches-0.008
fy-0.008
 drift-0.008
 mun-0.008
Token"
Token ID1
Feature activation+8.70e-03

Pos logit contributions
roc+0.029
ate+0.027
ulture+0.026
ateful+0.026
rot+0.025
Neg logit contributions
 minutes-0.022
ches-0.019
fy-0.018
 Whoever-0.018
 y-0.018
TokenLet
Token ID5756
Feature activation+1.87e-03

Pos logit contributions
 minutes+0.017
ches+0.016
fy+0.016
 drift+0.015
 degrees+0.015
Neg logit contributions
roc-0.013
ate-0.012
ulture-0.011
ateful-0.011
rot-0.011
Token's
Token ID338
Feature activation+1.25e-02

Pos logit contributions
 minutes+0.004
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.002
ateful-0.002
rot-0.002
Token go
Token ID467
Feature activation+4.26e-02

Pos logit contributions
 minutes+0.018
ches+0.016
fy+0.015
 degrees+0.015
 mun+0.015
Neg logit contributions
roc-0.026
ate-0.025
ulture-0.022
ateful-0.022
man-0.022
Token for
Token ID329
Feature activation+4.10e-02

Pos logit contributions
 minutes+0.081
 degrees+0.078
 pounds+0.071
fy+0.071
 mun+0.070
Neg logit contributions
ate-0.065
roc-0.056
man-0.053
gy-0.052
flower-0.050
Token a
Token ID257
Feature activation+9.63e-02

Pos logit contributions
ches+0.090
fy+0.089
 mun+0.085
 degrees+0.084
 minutes+0.083
Neg logit contributions
ate-0.046
roc-0.039
 town-0.035
 yourself-0.032
amed-0.030
Token row
Token ID5752
Feature activation-7.85e-03

Pos logit contributions
 degrees+0.185
uct+0.169
oll+0.166
 twice+0.164
 pounds+0.164
Neg logit contributions
 town-0.120
 judge-0.119
 nation-0.116
gy-0.114
man-0.109
Token!",
Token ID40754
Feature activation+2.89e-03

Pos logit contributions
roc+0.015
ate+0.015
ulture+0.014
ateful+0.013
rot+0.013
Neg logit contributions
 minutes-0.012
ches-0.011
fy-0.010
 mun-0.010
 drift-0.010
Token her
Token ID607
Feature activation-2.22e-04

Pos logit contributions
 minutes+0.004
ches+0.004
fy+0.003
 degrees+0.003
 drift+0.003
Neg logit contributions
roc-0.006
ate-0.006
ateful-0.005
ulture-0.005
rot-0.005
Token mom
Token ID1995
Feature activation+4.66e-02

Pos logit contributions
roc+0.000
ate+0.000
ulture+0.000
ateful+0.000
rot+0.000
Neg logit contributions
 minutes-0.000
ches-0.000
fy-0.000
 mun-0.000
 drift-0.000
Token said
Token ID531
Feature activation+2.29e-02

Pos logit contributions
 minutes+0.076
 degrees+0.073
 refreshing+0.069
 marsh+0.068
 pounds+0.068
Neg logit contributions
ate-0.075
man-0.073
roc-0.070
amed-0.068
amer-0.066
Token"
Token ID1
Feature activation-1.47e-02

Pos logit contributions
roc+0.019
ate+0.017
ulture+0.016
ateful+0.016
rot+0.015
Neg logit contributions
 minutes-0.020
ches-0.018
fy-0.017
 drift-0.017
 mun-0.017
TokenLet
Token ID5756
Feature activation-2.15e-02

Pos logit contributions
roc+0.025
ate+0.023
ulture+0.021
ateful+0.021
rot+0.020
Neg logit contributions
 minutes-0.027
ches-0.024
fy-0.023
 mun-0.023
 y-0.023
Token's
Token ID338
Feature activation-2.28e-02

Pos logit contributions
roc+0.040
ate+0.038
ateful+0.037
ulture+0.036
amer+0.035
Neg logit contributions
 minutes-0.033
ches-0.033
 y-0.031
 drift-0.030
fy-0.030
Token go
Token ID467
Feature activation+2.58e-02

Pos logit contributions
roc+0.054
ate+0.050
ulture+0.048
ateful+0.048
rot+0.047
Neg logit contributions
 minutes-0.024
ches-0.023
 drift-0.023
 y-0.021
 Whoever-0.021
Token for
Token ID329
Feature activation+3.97e-02

Pos logit contributions
 minutes+0.050
 degrees+0.047
fy+0.044
 pounds+0.043
ches+0.043
Neg logit contributions
ate-0.039
roc-0.037
man-0.032
ulture-0.032
rot-0.032
Token a
Token ID257
Feature activation+9.63e-02

Pos logit contributions
fy+0.082
ches+0.081
 minutes+0.080
 degrees+0.078
 mun+0.077
Neg logit contributions
ate-0.049
roc-0.043
 town-0.037
 neighborhood-0.034
 Respect-0.033
Token walk
Token ID2513
Feature activation-5.88e-03

Pos logit contributions
 degrees+0.180
 pounds+0.160
 minutes+0.159
 twice+0.147
 kil+0.140
Neg logit contributions
 judge-0.138
gy-0.134
 nation-0.132
 town-0.132
date-0.127
Token!"
Token ID2474
Feature activation+1.06e-02

Pos logit contributions
roc+0.012
ate+0.011
ulture+0.010
ateful+0.010
rot+0.010
Neg logit contributions
 minutes-0.009
ches-0.008
fy-0.008
 mun-0.008
 drift-0.007
Token said
Token ID531
Feature activation-2.28e-02

Pos logit contributions
 minutes+0.014
ches+0.013
 degrees+0.013
fy+0.013
 drift+0.012
Neg logit contributions
roc-0.022
ate-0.021
ulture-0.019
ateful-0.019
man-0.018
Token Mom
Token ID11254
Feature activation+2.79e-02

Pos logit contributions
roc+0.049
ate+0.047
ulture+0.045
ateful+0.044
rot+0.043
Neg logit contributions
 minutes-0.030
ches-0.027
 drift-0.026
 y-0.026
 mun-0.026
Tokenmy
Token ID1820
Feature activation+2.77e-02

Pos logit contributions
 minutes+0.065
 degrees+0.059
ches+0.058
fy+0.058
 y+0.057
Neg logit contributions
ate-0.029
roc-0.029
man-0.028
flower-0.027
amer-0.025
Token run
Token ID1057
Feature activation+2.36e-02

Pos logit contributions
 degrees+0.118
 minutes+0.117
ches+0.107
 pounds+0.104
fy+0.104
Neg logit contributions
roc-0.086
 town-0.081
 judge-0.080
ate-0.076
 neighborhood-0.072
Token to
Token ID284
Feature activation+4.45e-02

Pos logit contributions
 minutes+0.048
 degrees+0.047
 mun+0.046
ches+0.046
 drift+0.045
Neg logit contributions
roc-0.030
ate-0.030
man-0.025
amed-0.024
ulture-0.023
Token the
Token ID262
Feature activation+1.83e-02

Pos logit contributions
fy+0.116
ches+0.112
 degrees+0.111
 minutes+0.111
oll+0.106
Neg logit contributions
ate-0.044
 town-0.037
roc-0.036
room-0.030
 Respect-0.028
Token slide
Token ID10649
Feature activation+2.40e-02

Pos logit contributions
 minutes+0.030
ches+0.030
fy+0.028
 mun+0.027
 degrees+0.027
Neg logit contributions
roc-0.032
ate-0.031
ulture-0.028
man-0.028
 neighborhood-0.028
Token and
Token ID290
Feature activation+4.48e-02

Pos logit contributions
 minutes+0.046
ches+0.044
 degrees+0.043
 mun+0.043
fy+0.042
Neg logit contributions
ate-0.034
roc-0.032
man-0.031
room-0.029
flower-0.028
Token wait
Token ID4043
Feature activation+9.62e-02

Pos logit contributions
 degrees+0.087
 minutes+0.085
ches+0.082
 pounds+0.080
fy+0.079
Neg logit contributions
roc-0.060
ate-0.059
 town-0.057
 judge-0.056
 bully-0.053
Token for
Token ID329
Feature activation+5.49e-02

Pos logit contributions
ches+0.208
fy+0.202
 mun+0.195
cone+0.188
ns+0.187
Neg logit contributions
ate-0.096
ap-0.084
 yourself-0.072
ings-0.071
room-0.071
Token their
Token ID511
Feature activation+3.03e-02

Pos logit contributions
ches+0.132
fy+0.128
 mun+0.120
sc+0.116
rig+0.116
Neg logit contributions
ate-0.058
roc-0.048
 town-0.046
ateful-0.044
 others-0.044
Token turn
Token ID1210
Feature activation+4.72e-02

Pos logit contributions
ches+0.065
 minutes+0.060
fy+0.058
 mun+0.057
 degrees+0.055
Neg logit contributions
roc-0.041
ate-0.037
 neighborhood-0.035
ateful-0.033
man-0.033
Token.
Token ID13
Feature activation+1.16e-02

Pos logit contributions
 minutes+0.084
 mun+0.083
ches+0.080
 drift+0.077
 degrees+0.077
Neg logit contributions
ate-0.067
man-0.060
roc-0.059
ateful-0.056
amer-0.056
Token
Token ID198
Feature activation+2.65e-02

Pos logit contributions
 minutes+0.024
ches+0.023
fy+0.022
 degrees+0.022
 mun+0.022
Neg logit contributions
roc-0.016
ate-0.015
ulture-0.013
 Respect-0.013
 Fighting-0.012
Token in
Token ID287
Feature activation-1.11e-02

Pos logit contributions
roc+0.010
ate+0.009
ulture+0.009
ateful+0.009
rot+0.008
Neg logit contributions
 minutes-0.009
ches-0.008
fy-0.008
 mun-0.008
 drift-0.008
Token the
Token ID262
Feature activation-1.23e-02

Pos logit contributions
roc+0.022
ate+0.020
ulture+0.020
rot+0.019
ateful+0.019
Neg logit contributions
 minutes-0.017
ches-0.015
 drift-0.015
 y-0.014
fy-0.014
Token front
Token ID2166
Feature activation+5.38e-03

Pos logit contributions
roc+0.020
ate+0.018
ateful+0.017
ulture+0.017
amer+0.017
Neg logit contributions
 minutes-0.023
 y-0.021
ches-0.021
fy-0.021
 drift-0.020
Token yard
Token ID12699
Feature activation+5.47e-03

Pos logit contributions
 minutes+0.010
ches+0.010
fy+0.009
 mun+0.009
 drift+0.009
Neg logit contributions
roc-0.008
ate-0.008
ulture-0.007
ateful-0.007
rot-0.007
Token and
Token ID290
Feature activation+1.02e-02

Pos logit contributions
 minutes+0.008
ches+0.007
fy+0.007
 drift+0.007
 mun+0.007
Neg logit contributions
roc-0.011
ate-0.010
ulture-0.010
ateful-0.009
man-0.009
Token wait
Token ID4043
Feature activation+9.60e-02

Pos logit contributions
 minutes+0.015
ches+0.014
fy+0.013
 degrees+0.013
 mun+0.012
Neg logit contributions
roc-0.020
ate-0.019
ateful-0.017
ulture-0.017
man-0.017
Token.
Token ID13
Feature activation+2.47e-02

Pos logit contributions
 mun+0.179
ches+0.177
fy+0.177
cone+0.175
 Whoever+0.165
Neg logit contributions
ate-0.115
ap-0.094
 yourself-0.086
 town-0.085
 with-0.083
Token Anna
Token ID11735
Feature activation+1.37e-02

Pos logit contributions
 minutes+0.037
ches+0.037
 mun+0.036
fy+0.035
 pounds+0.035
Neg logit contributions
ate-0.043
roc-0.042
 Fighting-0.038
 Respect-0.037
 neighborhood-0.037
Token did
Token ID750
Feature activation+4.33e-02

Pos logit contributions
 minutes+0.023
ches+0.021
fy+0.021
 drift+0.020
 degrees+0.020
Neg logit contributions
roc-0.024
ate-0.023
ulture-0.020
rot-0.020
man-0.020
Token as
Token ID355
Feature activation+2.17e-02

Pos logit contributions
fy+0.096
ches+0.095
 minutes+0.093
 mun+0.092
 pounds+0.089
Neg logit contributions
ate-0.047
roc-0.037
 yourself-0.030
 town-0.030
room-0.030
Token her
Token ID607
Feature activation+2.27e-02

Pos logit contributions
 minutes+0.038
ches+0.035
fy+0.034
 mun+0.033
 Whoever+0.033
Neg logit contributions
ate-0.037
roc-0.036
ateful-0.036
ulture-0.031
ums-0.031
Token
Token ID198
Feature activation-3.49e-03

Pos logit contributions
 minutes+0.029
fy+0.027
 degrees+0.026
ches+0.026
 mun+0.026
Neg logit contributions
ate-0.019
roc-0.019
ulture-0.016
man-0.015
 Respect-0.015
TokenAfter
Token ID3260
Feature activation+1.66e-02

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.005
ches-0.005
fy-0.005
 mun-0.004
 drift-0.004
Token they
Token ID484
Feature activation+4.54e-02

Pos logit contributions
ches+0.031
fy+0.028
 minutes+0.028
 degrees+0.027
 mun+0.025
Neg logit contributions
roc-0.025
ate-0.025
ateful-0.023
room-0.022
 Fighting-0.022
Token played
Token ID2826
Feature activation-1.36e-03

Pos logit contributions
 degrees+0.079
ches+0.074
 pounds+0.072
 minutes+0.071
 wo+0.065
Neg logit contributions
roc-0.065
amed-0.065
ate-0.064
room-0.062
 town-0.060
Token for
Token ID329
Feature activation+4.00e-02

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.003
ches-0.002
fy-0.002
 mun-0.002
 drift-0.002
Token a
Token ID257
Feature activation+9.56e-02

Pos logit contributions
ches+0.091
fy+0.085
 degrees+0.082
 drift+0.082
 mun+0.081
Neg logit contributions
roc-0.041
ate-0.039
 town-0.034
room-0.032
ateful-0.031
Token while
Token ID981
Feature activation+1.39e-02

Pos logit contributions
 degrees+0.168
 pounds+0.149
ches+0.143
 minutes+0.143
 lets+0.141
Neg logit contributions
 judge-0.130
 town-0.117
roc-0.115
 nation-0.115
 team-0.112
Token,
Token ID11
Feature activation+1.06e-02

Pos logit contributions
ches+0.033
 minutes+0.032
fy+0.032
 degrees+0.031
 mun+0.031
Neg logit contributions
roc-0.013
ate-0.012
ateful-0.010
ulture-0.010
man-0.010
Token it
Token ID340
Feature activation+1.31e-02

Pos logit contributions
 minutes+0.014
ches+0.014
fy+0.013
 degrees+0.013
 drift+0.012
Neg logit contributions
roc-0.022
ate-0.021
ateful-0.019
ulture-0.019
man-0.019
Token was
Token ID373
Feature activation+1.71e-02

Pos logit contributions
 minutes+0.024
fy+0.023
ches+0.022
 degrees+0.022
 mun+0.022
Neg logit contributions
roc-0.019
ate-0.019
man-0.017
ulture-0.017
amed-0.016
Token time
Token ID640
Feature activation+5.54e-02

Pos logit contributions
ches+0.022
 minutes+0.021
fy+0.020
 mun+0.020
 degrees+0.020
Neg logit contributions
ate-0.035
roc-0.034
ateful-0.032
amed-0.031
man-0.031
Token around
Token ID1088
Feature activation+3.76e-03

Pos logit contributions
 minutes+0.024
ches+0.022
 mun+0.021
fy+0.021
 drift+0.021
Neg logit contributions
roc-0.025
ate-0.024
ulture-0.021
man-0.021
rot-0.021
Token.
Token ID13
Feature activation+1.88e-02

Pos logit contributions
 minutes+0.006
ches+0.006
fy+0.006
 mun+0.005
 drift+0.005
Neg logit contributions
roc-0.007
ate-0.006
ulture-0.006
ateful-0.006
rot-0.006
Token After
Token ID2293
Feature activation+2.09e-02

Pos logit contributions
 minutes+0.037
ches+0.035
fy+0.033
 degrees+0.033
 drift+0.032
Neg logit contributions
roc-0.027
ate-0.026
 Fighting-0.023
ulture-0.023
 Respect-0.023
Token playing
Token ID2712
Feature activation-3.02e-03

Pos logit contributions
ches+0.039
fy+0.038
 minutes+0.037
 degrees+0.034
 drift+0.034
Neg logit contributions
roc-0.030
ate-0.029
ateful-0.026
ulture-0.025
 Fighting-0.025
Token for
Token ID329
Feature activation+3.77e-02

Pos logit contributions
roc+0.005
ate+0.004
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.006
ches-0.005
fy-0.005
 mun-0.005
 drift-0.005
Token a
Token ID257
Feature activation+9.53e-02

Pos logit contributions
ches+0.076
fy+0.075
 degrees+0.073
 mun+0.072
 drift+0.071
Neg logit contributions
roc-0.043
ate-0.041
 town-0.037
room-0.035
amed-0.035
Token long
Token ID890
Feature activation+5.49e-02

Pos logit contributions
 degrees+0.167
oops+0.155
oll+0.152
uct+0.151
utt+0.149
Neg logit contributions
 judge-0.122
 town-0.111
roc-0.107
 team-0.107
 nation-0.107
Token time
Token ID640
Feature activation+1.33e-02

Pos logit contributions
 degrees+0.085
 pounds+0.077
 minutes+0.074
fy+0.073
 y+0.072
Neg logit contributions
roc-0.088
gy-0.083
man-0.081
ate-0.081
 town-0.081
Token,
Token ID11
Feature activation-2.10e-03

Pos logit contributions
 minutes+0.029
ches+0.029
fy+0.028
 mun+0.027
 drift+0.027
Neg logit contributions
roc-0.015
ate-0.014
ateful-0.012
ulture-0.012
rot-0.011
Token they
Token ID484
Feature activation+1.31e-02

Pos logit contributions
roc+0.005
ate+0.004
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.003
ches-0.002
fy-0.002
 mun-0.002
 drift-0.002
Token went
Token ID1816
Feature activation-4.16e-03

Pos logit contributions
 minutes+0.021
 degrees+0.019
ches+0.019
fy+0.019
 drift+0.018
Neg logit contributions
roc-0.024
ate-0.023
ulture-0.021
ateful-0.021
rot-0.021
Token "
Token ID366
Feature activation+1.59e-02

Pos logit contributions
 minutes+0.037
fy+0.037
ches+0.037
 drift+0.035
 mun+0.034
Neg logit contributions
roc-0.029
ate-0.028
man-0.024
ulture-0.023
 Respect-0.023
TokenWe
Token ID1135
Feature activation+9.59e-03

Pos logit contributions
 minutes+0.031
ches+0.030
fy+0.028
 degrees+0.028
 drift+0.028
Neg logit contributions
roc-0.023
ate-0.023
ulture-0.021
amer-0.020
rot-0.020
Token can
Token ID460
Feature activation+7.02e-03

Pos logit contributions
 minutes+0.013
ches+0.011
 degrees+0.011
fy+0.011
 mun+0.011
Neg logit contributions
roc-0.020
ate-0.019
ulture-0.018
ateful-0.018
rot-0.017
Token go
Token ID467
Feature activation+1.40e-02

Pos logit contributions
 minutes+0.011
ches+0.009
fy+0.009
 degrees+0.009
 mun+0.009
Neg logit contributions
roc-0.014
ate-0.013
ulture-0.012
ateful-0.012
man-0.012
Token on
Token ID319
Feature activation+2.63e-02

Pos logit contributions
 minutes+0.024
 mun+0.021
 degrees+0.021
ches+0.021
fy+0.021
Neg logit contributions
roc-0.024
ate-0.024
man-0.021
amer-0.020
ulture-0.020
Token a
Token ID257
Feature activation+9.50e-02

Pos logit contributions
 minutes+0.049
fy+0.048
 mun+0.047
ches+0.046
 drift+0.044
Neg logit contributions
ate-0.038
roc-0.038
ateful-0.031
 town-0.030
man-0.030
Token road
Token ID2975
Feature activation-3.96e-03

Pos logit contributions
uct+0.165
 degrees+0.162
utt+0.161
 mun+0.160
oll+0.156
Neg logit contributions
man-0.123
 nation-0.115
 town-0.113
 army-0.108
 voice-0.105
Token trip
Token ID5296
Feature activation-9.15e-03

Pos logit contributions
roc+0.008
ate+0.007
ulture+0.007
ateful+0.007
rot+0.006
Neg logit contributions
 minutes-0.006
ches-0.006
fy-0.005
 mun-0.005
 drift-0.005
Token."
Token ID526
Feature activation+1.97e-02

Pos logit contributions
roc+0.019
ate+0.018
ulture+0.017
ateful+0.016
rot+0.016
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.011
 y-0.011
 drift-0.011
Token
Token ID198
Feature activation+1.31e-02

Pos logit contributions
fy+0.039
ches+0.039
 minutes+0.039
 degrees+0.037
 mun+0.036
Neg logit contributions
roc-0.028
ate-0.027
man-0.022
ulture-0.022
 Respect-0.022
Token
Token ID198
Feature activation-2.82e-03

Pos logit contributions
 minutes+0.028
fy+0.026
ches+0.026
 mun+0.025
 degrees+0.025
Neg logit contributions
roc-0.018
ate-0.017
ulture-0.015
man-0.014
ateful-0.014
TokenThe
Token ID464
Feature activation-1.34e-02

Pos logit contributions
roc+0.005
ate+0.004
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.005
ches-0.005
fy-0.005
 mun-0.004
 drift-0.004
Token old
Token ID1468
Feature activation-4.57e-02

Pos logit contributions
roc+0.026
ate+0.024
ulture+0.023
ateful+0.022
anish+0.022
Neg logit contributions
 minutes-0.021
 y-0.019
ches-0.018
 drift-0.018
fy-0.018
Token lady
Token ID10846
Feature activation+1.56e-02

Pos logit contributions
roc+0.101
anish+0.091
ateful+0.091
ulture+0.091
amed+0.090
Neg logit contributions
 minutes-0.055
 y-0.051
 wo-0.048
 gob-0.045
 sn-0.045
Token thought
Token ID1807
Feature activation+7.44e-02

Pos logit contributions
 minutes+0.022
fy+0.019
ches+0.019
 degrees+0.019
 drift+0.019
Neg logit contributions
roc-0.031
ate-0.030
man-0.028
ulture-0.028
rot-0.028
Token and
Token ID290
Feature activation+9.95e-03

Pos logit contributions
 timer+0.154
ksh+0.152
 mun+0.145
 minutes+0.140
 degrees+0.139
Neg logit contributions
ate-0.072
 others-0.061
roc-0.061
,-0.060
room-0.059
Token thought
Token ID1807
Feature activation+9.50e-02

Pos logit contributions
 minutes+0.013
ches+0.012
fy+0.012
 degrees+0.012
 drift+0.012
Neg logit contributions
roc-0.021
ate-0.020
ateful-0.018
ulture-0.018
man-0.018
Token but
Token ID475
Feature activation+6.89e-03

Pos logit contributions
 timer+0.219
ksh+0.214
 mun+0.212
 tast+0.204
 gob+0.203
Neg logit contributions
,-0.066
room-0.064
ate-0.053
ings-0.044
ums-0.044
Token she
Token ID673
Feature activation+1.36e-02

Pos logit contributions
 minutes+0.009
ches+0.008
fy+0.008
 degrees+0.008
 drift+0.007
Neg logit contributions
roc-0.015
ate-0.014
ulture-0.013
ateful-0.013
rot-0.013
Token couldn
Token ID3521
Feature activation+9.74e-03

Pos logit contributions
 minutes+0.021
 drift+0.019
fy+0.019
ches+0.019
 degrees+0.018
Neg logit contributions
roc-0.026
ate-0.024
ulture-0.022
man-0.022
rot-0.022
Tokenâ
Token ID22940
Feature activation-2.03e-03

Pos logit contributions
 minutes+0.019
fy+0.017
 mun+0.017
ches+0.017
 degrees+0.016
Neg logit contributions
roc-0.015
ate-0.015
ulture-0.013
rot-0.013
man-0.013
Token
Token ID26391
Feature activation+6.29e-03

Pos logit contributions
roc+0.004
ate+0.004
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.003
 mun-0.003
 drift-0.003
Token's
Token ID338
Feature activation+1.76e-02

Pos logit contributions
 minutes+0.033
 mun+0.031
fy+0.031
 degrees+0.030
 pounds+0.030
Neg logit contributions
roc-0.020
ate-0.019
man-0.017
ulture-0.017
rot-0.016
Token go
Token ID467
Feature activation+2.73e-02

Pos logit contributions
 minutes+0.026
ches+0.023
fy+0.022
 degrees+0.022
 mun+0.022
Neg logit contributions
ate-0.035
roc-0.035
ulture-0.030
man-0.030
ateful-0.030
Token back
Token ID736
Feature activation+1.26e-02

Pos logit contributions
 minutes+0.059
 degrees+0.058
 pounds+0.055
 mun+0.055
ches+0.055
Neg logit contributions
ate-0.033
roc-0.030
man-0.028
gy-0.026
ulture-0.026
Token home
Token ID1363
Feature activation-1.47e-03

Pos logit contributions
 minutes+0.023
ches+0.022
 mun+0.022
fy+0.021
 degrees+0.021
Neg logit contributions
ate-0.020
roc-0.019
ateful-0.016
ulture-0.016
man-0.016
Token and
Token ID290
Feature activation+1.40e-02

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.002
Neg logit contributions
 minutes-0.002
ches-0.002
fy-0.002
 mun-0.002
 drift-0.002
Token wait
Token ID4043
Feature activation+9.49e-02

Pos logit contributions
 minutes+0.023
ches+0.022
 degrees+0.020
fy+0.020
 mun+0.019
Neg logit contributions
roc-0.025
ate-0.024
man-0.021
ulture-0.021
ateful-0.021
Token for
Token ID329
Feature activation+3.71e-02

Pos logit contributions
ches+0.170
 mun+0.165
cone+0.162
fy+0.156
pill+0.151
Neg logit contributions
ate-0.130
ap-0.109
 town-0.103
room-0.099
 yourself-0.097
Token the
Token ID262
Feature activation+2.37e-02

Pos logit contributions
ches+0.078
fy+0.076
 mun+0.073
 minutes+0.069
rig+0.068
Neg logit contributions
ate-0.048
roc-0.044
ateful-0.039
 town-0.039
 yourself-0.037
Token sun
Token ID4252
Feature activation+1.67e-02

Pos logit contributions
ches+0.043
 minutes+0.041
fy+0.040
 mun+0.039
 Whoever+0.037
Neg logit contributions
roc-0.037
ate-0.035
ulture-0.033
 neighborhood-0.033
man-0.033
Token to
Token ID284
Feature activation+2.36e-02

Pos logit contributions
 minutes+0.027
ches+0.025
fy+0.024
 mun+0.024
 degrees+0.023
Neg logit contributions
roc-0.029
ate-0.029
man-0.027
ulture-0.027
flower-0.027
Token come
Token ID1282
Feature activation+1.09e-02

Pos logit contributions
 minutes+0.037
ches+0.035
fy+0.034
 mun+0.032
 degrees+0.032
Neg logit contributions
ate-0.043
roc-0.043
ulture-0.037
room-0.037
man-0.037
Token "
Token ID366
Feature activation-2.26e-02

Pos logit contributions
 minutes+0.011
ches+0.010
fy+0.010
 mun+0.009
 degrees+0.009
Neg logit contributions
roc-0.010
ate-0.009
ulture-0.008
ateful-0.008
 Fighting-0.008
TokenCan
Token ID6090
Feature activation-8.41e-03

Pos logit contributions
roc+0.042
ate+0.039
ateful+0.036
ulture+0.035
rot+0.034
Neg logit contributions
 minutes-0.036
 Whoever-0.033
ches-0.033
 mun-0.032
fy-0.032
Token I
Token ID314
Feature activation+7.75e-03

Pos logit contributions
roc+0.020
ate+0.020
ulture+0.018
ateful+0.018
amer+0.018
Neg logit contributions
 minutes-0.009
ches-0.008
fy-0.007
 mun-0.007
 drift-0.007
Token lend
Token ID22096
Feature activation-2.64e-02

Pos logit contributions
 minutes+0.012
ches+0.011
 degrees+0.011
fy+0.010
 mun+0.010
Neg logit contributions
roc-0.015
ate-0.014
ulture-0.013
rot-0.013
man-0.013
Token your
Token ID534
Feature activation-1.83e-02

Pos logit contributions
roc+0.059
ate+0.057
ulture+0.055
ateful+0.055
amer+0.053
Neg logit contributions
ches-0.030
 minutes-0.030
 y-0.027
fy-0.027
 Whoever-0.025
Token mustache
Token ID49303
Feature activation-9.92e-02

Pos logit contributions
roc+0.032
ate+0.030
ateful+0.028
ulture+0.028
amer+0.027
Neg logit contributions
 minutes-0.031
ches-0.029
 y-0.029
fy-0.028
 drift-0.027
Token for
Token ID329
Feature activation-8.70e-03

Pos logit contributions
ateful+0.218
roc+0.215
apper+0.208
ulture+0.203
amer+0.203
Neg logit contributions
ches-0.106
 y-0.105
 for-0.092
fy-0.089
 wo-0.089
Token a
Token ID257
Feature activation+3.78e-02

Pos logit contributions
roc+0.016
ate+0.015
ulture+0.014
ateful+0.014
rot+0.013
Neg logit contributions
 minutes-0.014
ches-0.013
fy-0.013
 y-0.013
 mun-0.012
Token little
Token ID1310
Feature activation+2.21e-02

Pos logit contributions
 minutes+0.071
 degrees+0.068
ches+0.065
 drift+0.062
fy+0.061
Neg logit contributions
roc-0.055
man-0.051
rot-0.051
 judge-0.049
ate-0.048
Token bit
Token ID1643
Feature activation-6.73e-03

Pos logit contributions
 minutes+0.034
ches+0.033
 degrees+0.032
 drift+0.031
fy+0.030
Neg logit contributions
roc-0.040
ate-0.036
man-0.035
rot-0.035
ulture-0.034
Token?
Token ID30
Feature activation+6.84e-03

Pos logit contributions
roc+0.011
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.012
 mun-0.011
 y-0.011
Token you
Token ID345
Feature activation+4.51e-04

Pos logit contributions
 minutes+0.016
ches+0.015
fy+0.014
 drift+0.014
 mun+0.013
Neg logit contributions
roc-0.023
ate-0.021
ateful-0.019
ulture-0.019
anish-0.019
Token want
Token ID765
Feature activation-1.90e-02

Pos logit contributions
 minutes+0.001
ches+0.001
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.001
ate-0.001
ulture-0.001
ateful-0.001
rot-0.001
Token to
Token ID284
Feature activation-3.10e-02

Pos logit contributions
roc+0.038
ate+0.036
ulture+0.035
ateful+0.034
rot+0.033
Neg logit contributions
 minutes-0.029
 y-0.026
ches-0.025
 mun-0.024
fy-0.024
Token teach
Token ID4545
Feature activation-3.46e-02

Pos logit contributions
roc+0.075
amed+0.069
ulture+0.069
ateful+0.069
ate+0.069
Neg logit contributions
 dip-0.030
 minutes-0.030
 y-0.030
 drift-0.029
 mun-0.027
Token your
Token ID534
Feature activation+2.11e-03

Pos logit contributions
roc+0.076
ulture+0.073
ate+0.072
flower+0.070
apper+0.070
Neg logit contributions
 minutes-0.040
 y-0.038
ches-0.037
 Whoever-0.032
 syrup-0.032
Token tank
Token ID6873
Feature activation-9.71e-02

Pos logit contributions
 minutes+0.004
ches+0.004
fy+0.004
 drift+0.004
 mun+0.004
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
rot-0.003
Token how
Token ID703
Feature activation-6.69e-03

Pos logit contributions
ivered+0.206
roc+0.204
ateful+0.201
 Fighting+0.196
ums+0.196
Neg logit contributions
 y-0.105
 minutes-0.087
ches-0.084
 for-0.083
 honey-0.078
Token to
Token ID284
Feature activation-3.14e-02

Pos logit contributions
roc+0.012
ate+0.012
ulture+0.011
ateful+0.011
rot+0.010
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.010
 y-0.009
 mun-0.009
Token say
Token ID910
Feature activation-3.51e-02

Pos logit contributions
roc+0.071
ulture+0.064
amed+0.064
ateful+0.064
ate+0.064
Neg logit contributions
 minutes-0.035
ches-0.034
 drift-0.034
 dip-0.034
 y-0.032
Token '
Token ID705
Feature activation-1.56e-02

Pos logit contributions
roc+0.073
ate+0.066
ulture+0.066
ateful+0.063
flower+0.062
Neg logit contributions
 minutes-0.050
 y-0.046
ches-0.044
 Whoever-0.041
 wo-0.041
Tokens
Token ID82
Feature activation-8.24e-02

Pos logit contributions
roc+0.027
ate+0.025
ulture+0.023
ateful+0.023
rot+0.022
Neg logit contributions
 minutes-0.027
ches-0.025
fy-0.024
 mun-0.024
 y-0.024
Token Whenever
Token ID21326
Feature activation-2.38e-02

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.002
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token Lily
Token ID20037
Feature activation-4.39e-02

Pos logit contributions
roc+0.066
ate+0.063
ulture+0.060
amer+0.060
man+0.058
Neg logit contributions
 minutes-0.021
 y-0.015
 drift-0.014
ches-0.013
fy-0.013
Token went
Token ID1816
Feature activation-6.39e-02

Pos logit contributions
roc+0.093
ate+0.085
ateful+0.083
ulture+0.082
anish+0.081
Neg logit contributions
 minutes-0.058
ches-0.058
 mun-0.052
 y-0.052
fy-0.051
Token to
Token ID284
Feature activation-2.84e-02

Pos logit contributions
roc+0.164
ulture+0.149
ateful+0.148
ate+0.147
anish+0.144
Neg logit contributions
 minutes-0.064
ches-0.058
 y-0.052
 drift-0.050
 wo-0.049
Token the
Token ID262
Feature activation-6.24e-02

Pos logit contributions
roc+0.047
ulture+0.043
rot+0.042
ate+0.041
amed+0.041
Neg logit contributions
 minutes-0.050
 drift-0.047
 y-0.047
 dip-0.045
 Whoever-0.044
Token post
Token ID1281
Feature activation-9.61e-02

Pos logit contributions
roc+0.129
ivered+0.125
ated+0.118
ums+0.117
amed+0.117
Neg logit contributions
 y-0.084
 timer-0.069
 dip-0.068
 convey-0.066
 minutes-0.066
Token office
Token ID2607
Feature activation-6.80e-02

Pos logit contributions
roc+0.166
ateful+0.150
ivered+0.137
amed+0.131
arre+0.131
Neg logit contributions
 maybe-0.143
 minutes-0.141
 might-0.139
 until-0.129
 for-0.129
Token,
Token ID11
Feature activation+1.32e-02

Pos logit contributions
roc+0.143
ate+0.130
ulture+0.129
amer+0.126
ateful+0.125
Neg logit contributions
 minutes-0.103
 y-0.084
 maybe-0.082
 wo-0.081
ches-0.079
Token she
Token ID673
Feature activation-3.32e-02

Pos logit contributions
 minutes+0.017
ches+0.016
fy+0.015
 mun+0.015
 degrees+0.014
Neg logit contributions
roc-0.028
ate-0.027
ateful-0.025
ulture-0.025
anish-0.024
Token always
Token ID1464
Feature activation-2.95e-02

Pos logit contributions
roc+0.088
ate+0.079
ulture+0.078
anish+0.077
ateful+0.076
Neg logit contributions
 minutes-0.029
ches-0.028
 mun-0.023
 y-0.023
fy-0.022
Token added
Token ID2087
Feature activation-1.34e-02

Pos logit contributions
roc+0.066
ulture+0.059
ate+0.058
rot+0.056
amer+0.055
Neg logit contributions
 minutes-0.039
 y-0.034
 drift-0.033
 mun-0.033
ches-0.033
Token should
Token ID815
Feature activation-1.28e-02

Pos logit contributions
 minutes+0.002
ches+0.002
fy+0.002
 mun+0.002
 drift+0.002
Neg logit contributions
roc-0.002
ate-0.002
ulture-0.002
ateful-0.002
rot-0.002
Token be
Token ID307
Feature activation-3.83e-02

Pos logit contributions
roc+0.025
ate+0.024
ulture+0.022
ateful+0.022
rot+0.022
Neg logit contributions
 minutes-0.019
ches-0.017
 drift-0.017
 y-0.017
fy-0.016
Token compassionate
Token ID32533
Feature activation-5.28e-02

Pos logit contributions
ulture+0.081
roc+0.080
ate+0.076
rot+0.076
ase+0.075
Neg logit contributions
 minutes-0.055
 y-0.051
 tug-0.043
 slippery-0.042
ches-0.041
Token.
Token ID13
Feature activation-4.05e-02

Pos logit contributions
roc+0.138
ate+0.137
ulture+0.129
amed+0.126
ateful+0.126
Neg logit contributions
 minutes-0.044
ches-0.028
 y-0.026
fy-0.026
 wo-0.024
Token Comp
Token ID3082
Feature activation-4.08e-02

Pos logit contributions
roc+0.086
ate+0.081
ulture+0.078
man+0.077
amed+0.077
Neg logit contributions
 minutes-0.049
 Whoever-0.043
ches-0.041
fy-0.040
 wo-0.040
Tokenassion
Token ID11857
Feature activation-9.60e-02

Pos logit contributions
roc+0.056
anish+0.055
ate+0.055
 neighborhood+0.053
 self+0.053
Neg logit contributions
ches-0.077
fy-0.067
 Whoever-0.064
 degrees-0.058
 minutes-0.058
Tokenate
Token ID378
Feature activation-3.24e-02

Pos logit contributions
 messenger+0.117
roc+0.115
 courtroom+0.109
date+0.108
flower+0.106
Neg logit contributions
 minutes-0.188
ches-0.183
 y-0.169
 pounds-0.161
 degrees-0.159
Token means
Token ID1724
Feature activation+1.45e-02

Pos logit contributions
roc+0.062
ate+0.058
ulture+0.054
ateful+0.053
amed+0.053
Neg logit contributions
ches-0.048
 minutes-0.048
 y-0.043
fy-0.043
 degrees-0.040
Token you
Token ID345
Feature activation-9.54e-03

Pos logit contributions
 minutes+0.023
ches+0.021
fy+0.021
 mun+0.020
 drift+0.020
Neg logit contributions
roc-0.027
ate-0.026
ateful-0.024
ulture-0.023
 Fighting-0.023
Token care
Token ID1337
Feature activation-2.73e-02

Pos logit contributions
roc+0.020
ate+0.019
ulture+0.018
ateful+0.017
rot+0.017
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.012
 y-0.011
 mun-0.011
Token about
Token ID546
Feature activation+5.27e-04

Pos logit contributions
roc+0.068
ate+0.064
ulture+0.063
rot+0.061
flower+0.061
Neg logit contributions
 minutes-0.026
ches-0.025
 y-0.022
fy-0.021
 degrees-0.020
TokenWhere
Token ID8496
Feature activation-1.55e-02

Pos logit contributions
roc+0.032
ate+0.030
ulture+0.028
ateful+0.028
ums+0.027
Neg logit contributions
 minutes-0.029
ches-0.027
 y-0.026
 Whoever-0.026
 mun-0.026
Token did
Token ID750
Feature activation+1.05e-02

Pos logit contributions
roc+0.032
ate+0.030
ulture+0.029
ateful+0.028
amer+0.028
Neg logit contributions
 minutes-0.021
 y-0.020
ches-0.019
fy-0.019
 mun-0.018
Token you
Token ID345
Feature activation-2.37e-02

Pos logit contributions
 minutes+0.015
ches+0.014
fy+0.014
 mun+0.013
 drift+0.013
Neg logit contributions
roc-0.021
ate-0.020
ateful-0.018
ulture-0.018
man-0.018
Token get
Token ID651
Feature activation-3.82e-02

Pos logit contributions
roc+0.055
ate+0.052
ulture+0.051
rot+0.049
ateful+0.049
Neg logit contributions
 minutes-0.025
 drift-0.024
ches-0.024
 y-0.023
 dip-0.022
Token your
Token ID534
Feature activation-2.93e-02

Pos logit contributions
roc+0.087
ulture+0.082
ate+0.081
anish+0.079
ums+0.077
Neg logit contributions
 minutes-0.045
ches-0.044
 y-0.043
 dip-0.039
 Whoever-0.038
Token gun
Token ID2485
Feature activation-9.46e-02

Pos logit contributions
roc+0.044
ate+0.042
ateful+0.039
ulture+0.037
amer+0.037
Neg logit contributions
 minutes-0.056
ches-0.055
 y-0.054
 drift-0.051
fy-0.051
Token?"
Token ID1701
Feature activation-2.67e-02

Pos logit contributions
roc+0.201
ateful+0.185
ulture+0.178
ate+0.178
ums+0.177
Neg logit contributions
ches-0.124
 y-0.116
fy-0.104
 drift-0.104
 nearly-0.102
Token
Token ID198
Feature activation-3.04e-02

Pos logit contributions
roc+0.051
ate+0.049
ulture+0.046
ateful+0.045
rot+0.044
Neg logit contributions
 minutes-0.042
 mun-0.037
 Whoever-0.037
ches-0.037
 drift-0.037
Token
Token ID198
Feature activation-4.46e-02

Pos logit contributions
roc+0.053
ateful+0.049
ate+0.048
ulture+0.047
flower+0.046
Neg logit contributions
 minutes-0.050
ches-0.048
 Whoever-0.046
 tug-0.044
fy-0.043
TokenThe
Token ID464
Feature activation-1.15e-02

Pos logit contributions
roc+0.083
ateful+0.079
ate+0.078
ulture+0.077
flower+0.074
Neg logit contributions
 minutes-0.064
 Whoever-0.064
 tug-0.061
ches-0.059
fy-0.057
Token other
Token ID584
Feature activation-1.50e-02

Pos logit contributions
roc+0.018
ate+0.017
ulture+0.015
ateful+0.015
amer+0.014
Neg logit contributions
 minutes-0.023
 y-0.020
ches-0.020
fy-0.020
 mun-0.019
TokenWhy
Token ID5195
Feature activation-2.72e-02

Pos logit contributions
roc+0.042
ate+0.040
ateful+0.037
ulture+0.036
ums+0.035
Neg logit contributions
 minutes-0.039
ches-0.037
 mun-0.034
 y-0.034
 Whoever-0.034
Token did
Token ID750
Feature activation+2.30e-03

Pos logit contributions
roc+0.061
ate+0.058
amer+0.056
amed+0.055
ateful+0.055
Neg logit contributions
 y-0.031
ches-0.030
 minutes-0.029
 wo-0.027
 drift-0.027
Token you
Token ID345
Feature activation-3.36e-02

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.005
ate-0.004
ulture-0.004
ateful-0.004
rot-0.004
Token say
Token ID910
Feature activation-2.69e-02

Pos logit contributions
roc+0.080
ate+0.076
ateful+0.073
ulture+0.073
amed+0.072
Neg logit contributions
ches-0.036
 drift-0.034
 dip-0.032
 y-0.032
 minutes-0.031
Token '
Token ID705
Feature activation-1.77e-02

Pos logit contributions
roc+0.057
ate+0.056
ulture+0.052
ateful+0.051
amed+0.050
Neg logit contributions
 minutes-0.035
ches-0.034
 y-0.033
 wo-0.031
fy-0.030
Tokenob
Token ID672
Feature activation-9.45e-02

Pos logit contributions
roc+0.031
ate+0.029
ulture+0.027
ateful+0.026
rot+0.025
Neg logit contributions
 minutes-0.030
ches-0.028
fy-0.027
 mun-0.026
 y-0.026
Tokenedient
Token ID35279
Feature activation-8.22e-02

Pos logit contributions
roc+0.154
ateful+0.141
ulture+0.137
ate+0.135
amer+0.130
Neg logit contributions
ches-0.156
 minutes-0.134
 degrees-0.131
 y-0.130
 Whoever-0.128
Token'
Token ID6
Feature activation-5.22e-02

Pos logit contributions
ate+0.140
roc+0.138
amer+0.132
ateful+0.131
ulture+0.129
Neg logit contributions
ches-0.124
 minutes-0.118
 y-0.111
 wo-0.110
fy-0.107
Token?"
Token ID1701
Feature activation-3.24e-02

Pos logit contributions
roc+0.092
ate+0.088
ulture+0.086
amed+0.082
ateful+0.082
Neg logit contributions
 minutes-0.078
 y-0.076
ches-0.075
 wo-0.069
 Whoever-0.067
Token
Token ID220
Feature activation-4.22e-03

Pos logit contributions
roc+0.058
ate+0.055
amer+0.051
ateful+0.051
ulture+0.050
Neg logit contributions
 minutes-0.055
ches-0.052
 mun-0.051
 y-0.051
 drift-0.051
Token
Token ID198
Feature activation-3.05e-02

Pos logit contributions
roc+0.007
ate+0.006
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.008
ches-0.007
fy-0.007
 drift-0.007
 mun-0.007
Token judge
Token ID5052
Feature activation-2.54e-02

Pos logit contributions
roc+0.045
ate+0.043
ateful+0.039
ulture+0.039
amed+0.039
Neg logit contributions
 minutes-0.042
 y-0.038
ches-0.038
fy-0.038
 mun-0.037
Token who
Token ID508
Feature activation-2.66e-02

Pos logit contributions
roc+0.059
ate+0.055
ulture+0.052
ateful+0.051
amed+0.050
Neg logit contributions
 minutes-0.031
ches-0.027
 mun-0.025
 y-0.025
fy-0.025
Token lived
Token ID5615
Feature activation-2.90e-02

Pos logit contributions
roc+0.045
ate+0.041
ulture+0.039
ateful+0.038
anish+0.038
Neg logit contributions
 minutes-0.045
ches-0.044
 y-0.043
 mun-0.042
fy-0.042
Token in
Token ID287
Feature activation-2.75e-02

Pos logit contributions
roc+0.062
ulture+0.055
ate+0.054
ateful+0.051
amer+0.050
Neg logit contributions
 minutes-0.041
 y-0.037
ches-0.036
 mun-0.035
 tug-0.034
Token a
Token ID257
Feature activation-4.48e-02

Pos logit contributions
roc+0.058
ate+0.055
ulture+0.054
rot+0.052
flower+0.050
Neg logit contributions
 minutes-0.038
 y-0.034
 drift-0.032
 tug-0.031
ches-0.031
Token beautiful
Token ID4950
Feature activation-9.40e-02

Pos logit contributions
roc+0.098
ate+0.097
ulture+0.094
anish+0.092
amer+0.090
Neg logit contributions
 minutes-0.057
 y-0.055
 tug-0.046
fy-0.046
 slippery-0.045
Token palace
Token ID20562
Feature activation-2.24e-02

Pos logit contributions
amed+0.177
roc+0.174
amer+0.169
ulture+0.169
ateful+0.168
Neg logit contributions
 y-0.133
 minutes-0.132
 dip-0.122
 sn-0.118
 scaven-0.116
Token.
Token ID13
Feature activation-1.74e-02

Pos logit contributions
roc+0.053
ate+0.049
ulture+0.047
ateful+0.046
anish+0.045
Neg logit contributions
 minutes-0.026
ches-0.023
fy-0.022
 y-0.021
 mun-0.021
Token Whenever
Token ID21326
Feature activation-3.89e-02

Pos logit contributions
roc+0.030
ate+0.029
ulture+0.027
rot+0.027
ateful+0.026
Neg logit contributions
 minutes-0.030
 Whoever-0.026
fy-0.026
 y-0.026
ches-0.026
Token anyone
Token ID2687
Feature activation-2.70e-02

Pos logit contributions
roc+0.101
ate+0.097
amed+0.096
ulture+0.096
amer+0.092
Neg logit contributions
 minutes-0.038
 y-0.031
 drift-0.027
 maybe-0.024
 tug-0.024
Token had
Token ID550
Feature activation-3.89e-02

Pos logit contributions
roc+0.057
ate+0.053
ulture+0.050
ateful+0.048
amer+0.048
Neg logit contributions
 minutes-0.038
ches-0.035
 y-0.033
fy-0.033
 mun-0.032
Token"
Token ID1
Feature activation+1.32e-03

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.003
 mun-0.003
 drift-0.003
TokenCan
Token ID6090
Feature activation-3.20e-03

Pos logit contributions
 minutes+0.002
ches+0.002
fy+0.002
 drift+0.002
 mun+0.002
Neg logit contributions
roc-0.002
ate-0.002
ulture-0.002
ateful-0.002
rot-0.002
Token I
Token ID314
Feature activation-1.50e-02

Pos logit contributions
roc+0.008
ate+0.008
ulture+0.007
ateful+0.007
rot+0.007
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.002
 mun-0.002
 drift-0.002
Token have
Token ID423
Feature activation-1.75e-02

Pos logit contributions
roc+0.032
ate+0.029
ulture+0.028
ateful+0.028
rot+0.027
Neg logit contributions
 minutes-0.020
ches-0.019
fy-0.018
 drift-0.018
 y-0.018
Token your
Token ID534
Feature activation-4.85e-03

Pos logit contributions
roc+0.035
ate+0.032
ulture+0.031
rot+0.030
man+0.030
Neg logit contributions
 minutes-0.028
 y-0.024
ches-0.024
fy-0.023
 drift-0.022
Token heart
Token ID2612
Feature activation-9.27e-02

Pos logit contributions
roc+0.009
ate+0.009
ulture+0.008
ateful+0.008
rot+0.008
Neg logit contributions
 minutes-0.008
ches-0.007
fy-0.007
 drift-0.007
 mun-0.006
Token,
Token ID11
Feature activation+2.40e-02

Pos logit contributions
roc+0.212
ulture+0.191
ateful+0.180
amer+0.177
 Fighting+0.176
Neg logit contributions
 y-0.115
 minutes-0.112
fy-0.105
ches-0.105
 dip-0.101
Token Anna
Token ID11735
Feature activation-2.88e-02

Pos logit contributions
 minutes+0.032
ches+0.031
 mun+0.031
fy+0.030
 drift+0.029
Neg logit contributions
roc-0.048
ate-0.046
ateful-0.042
man-0.041
ulture-0.041
Token?"
Token ID1701
Feature activation+1.28e-02

Pos logit contributions
roc+0.051
ate+0.045
ulture+0.043
ateful+0.042
amed+0.042
Neg logit contributions
 minutes-0.051
ches-0.048
 y-0.045
fy-0.044
 drift-0.042
Token Tom
Token ID4186
Feature activation+1.22e-02

Pos logit contributions
ches+0.018
 minutes+0.018
fy+0.017
 mun+0.017
 degrees+0.016
Neg logit contributions
roc-0.025
ate-0.024
ateful-0.023
 Fighting-0.022
 Respect-0.022
Token asks
Token ID7893
Feature activation-6.49e-04

Pos logit contributions
 minutes+0.023
fy+0.021
 mun+0.021
ches+0.021
 degrees+0.020
Neg logit contributions
roc-0.019
ate-0.018
ateful-0.016
man-0.015
ulture-0.015
Token then
Token ID788
Feature activation+2.17e-03

Pos logit contributions
roc+0.053
ate+0.048
anish+0.048
ulture+0.048
ateful+0.046
Neg logit contributions
 minutes-0.024
ches-0.021
 y-0.020
fy-0.019
 mun-0.019
Token on
Token ID319
Feature activation-1.69e-02

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.004
ate-0.004
ulture-0.004
ateful-0.003
rot-0.003
Token,
Token ID11
Feature activation-1.50e-02

Pos logit contributions
roc+0.029
ate+0.027
ulture+0.025
ateful+0.025
amer+0.024
Neg logit contributions
 minutes-0.032
ches-0.029
 y-0.028
 drift-0.027
fy-0.027
Token the
Token ID262
Feature activation-1.26e-02

Pos logit contributions
roc+0.032
ate+0.030
ulture+0.029
ateful+0.028
rot+0.028
Neg logit contributions
 minutes-0.020
 y-0.017
ches-0.017
 drift-0.017
 mun-0.017
Token me
Token ID502
Feature activation-2.56e-02

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.020
ateful+0.019
amer+0.019
Neg logit contributions
 minutes-0.022
 y-0.019
ches-0.019
 mun-0.018
fy-0.018
Tokenadow
Token ID4584
Feature activation-9.19e-02

Pos logit contributions
roc+0.048
ate+0.043
ateful+0.042
ulture+0.041
 Fighting+0.040
Neg logit contributions
 minutes-0.041
ches-0.039
 mun-0.037
fy-0.037
 drift-0.036
Token was
Token ID373
Feature activation-2.75e-02

Pos logit contributions
roc+0.221
ateful+0.201
ate+0.192
 Fighting+0.191
ulture+0.189
Neg logit contributions
 minutes-0.093
 mun-0.087
 jumped-0.085
 wouldn-0.084
ches-0.083
Token known
Token ID1900
Feature activation+1.59e-02

Pos logit contributions
roc+0.057
ulture+0.051
ate+0.049
ateful+0.048
anish+0.048
Neg logit contributions
 minutes-0.044
 y-0.038
ches-0.037
 drift-0.036
 mun-0.036
Token as
Token ID355
Feature activation-2.48e-02

Pos logit contributions
 minutes+0.021
ches+0.019
fy+0.017
 mun+0.017
 drift+0.017
Neg logit contributions
roc-0.035
ate-0.033
ateful-0.031
rot-0.030
ulture-0.030
Token the
Token ID262
Feature activation-3.57e-03

Pos logit contributions
roc+0.051
ulture+0.047
ate+0.047
rot+0.044
anish+0.043
Neg logit contributions
 minutes-0.036
 y-0.032
 drift-0.030
ches-0.030
 mun-0.029
Token surprise
Token ID5975
Feature activation+1.56e-02

Pos logit contributions
roc+0.006
ate+0.005
ulture+0.005
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.007
ches-0.006
fy-0.006
 mun-0.006
 drift-0.006
Token fox
Token ID21831
Feature activation-8.61e-03

Pos logit contributions
 minutes+0.014
ches+0.013
fy+0.013
 drift+0.013
 degrees+0.012
Neg logit contributions
roc-0.012
ate-0.011
ulture-0.010
rot-0.010
ateful-0.010
Token named
Token ID3706
Feature activation+3.49e-02

Pos logit contributions
roc+0.016
ate+0.015
ulture+0.014
ateful+0.014
rot+0.013
Neg logit contributions
 minutes-0.014
ches-0.013
fy-0.012
 mun-0.012
 y-0.012
Token Fox
Token ID5426
Feature activation-5.37e-02

Pos logit contributions
 minutes+0.066
fy+0.056
 degrees+0.054
 drift+0.054
 seconds+0.052
Neg logit contributions
roc-0.061
ate-0.054
gy-0.051
man-0.049
rot-0.047
Tokeny
Token ID88
Feature activation-2.86e-02

Pos logit contributions
roc+0.098
amed+0.087
ate+0.086
ulture+0.086
anish+0.085
Neg logit contributions
ches-0.085
 minutes-0.078
 y-0.074
 wo-0.073
 mun-0.071
Token.
Token ID13
Feature activation-3.57e-02

Pos logit contributions
roc+0.069
ate+0.065
ulture+0.062
ateful+0.062
amed+0.061
Neg logit contributions
 minutes-0.028
ches-0.027
 y-0.024
 wo-0.024
 mun-0.023
Token Fox
Token ID5426
Feature activation-9.17e-02

Pos logit contributions
roc+0.064
ate+0.059
ateful+0.057
amed+0.057
ulture+0.057
Neg logit contributions
 drift-0.055
ches-0.053
 minutes-0.053
 y-0.053
 mun-0.052
Tokeny
Token ID88
Feature activation-6.63e-02

Pos logit contributions
roc+0.154
ulture+0.134
anish+0.133
 Fighting+0.132
 courtroom+0.129
Neg logit contributions
ches-0.158
 y-0.135
 might-0.133
 wo-0.129
 mun-0.125
Token was
Token ID373
Feature activation-2.76e-02

Pos logit contributions
roc+0.138
ateful+0.125
ulture+0.122
ate+0.122
amer+0.119
Neg logit contributions
ches-0.086
 minutes-0.084
 y-0.082
 wo-0.080
 mun-0.079
Token very
Token ID845
Feature activation-1.58e-02

Pos logit contributions
roc+0.061
ate+0.058
ulture+0.057
anish+0.056
ateful+0.054
Neg logit contributions
 minutes-0.034
 y-0.033
ches-0.031
 mun-0.029
 wo-0.028
Token intelligent
Token ID12661
Feature activation+6.95e-03

Pos logit contributions
roc+0.031
ate+0.029
ulture+0.028
anish+0.027
rot+0.026
Neg logit contributions
 minutes-0.024
 y-0.022
ches-0.022
fy-0.021
 mun-0.021
Token and
Token ID290
Feature activation-1.77e-02

Pos logit contributions
 minutes+0.013
ches+0.012
fy+0.012
 drift+0.012
 mun+0.011
Neg logit contributions
roc-0.011
ate-0.010
ulture-0.009
ateful-0.009
rot-0.009
Token They
Token ID1119
Feature activation+1.53e-02

Pos logit contributions
 minutes+0.022
ches+0.021
fy+0.021
 degrees+0.020
 mun+0.020
Neg logit contributions
roc-0.021
ate-0.020
ulture-0.018
ateful-0.017
 Fighting-0.017
Token make
Token ID787
Feature activation-1.48e-02

Pos logit contributions
 minutes+0.025
 degrees+0.022
ches+0.022
fy+0.022
 mun+0.021
Neg logit contributions
roc-0.028
ate-0.027
ateful-0.024
ulture-0.024
man-0.024
Token sounds
Token ID5238
Feature activation+1.30e-02

Pos logit contributions
roc+0.027
ate+0.026
ulture+0.025
ateful+0.024
rot+0.024
Neg logit contributions
 minutes-0.023
ches-0.021
 y-0.021
 drift-0.020
 Whoever-0.020
Token like
Token ID588
Feature activation+2.94e-02

Pos logit contributions
 minutes+0.023
ches+0.022
fy+0.022
 mun+0.022
 degrees+0.021
Neg logit contributions
roc-0.020
ate-0.020
man-0.018
ateful-0.018
ulture-0.017
Token "
Token ID366
Feature activation-3.12e-02

Pos logit contributions
 minutes+0.054
fy+0.053
ches+0.052
 mun+0.050
 drift+0.048
Neg logit contributions
roc-0.045
ate-0.044
man-0.039
ateful-0.038
room-0.038
Tokenv
Token ID85
Feature activation-9.17e-02

Pos logit contributions
roc+0.055
ate+0.051
ateful+0.048
ulture+0.047
rot+0.045
Neg logit contributions
 minutes-0.051
ches-0.048
fy-0.046
 Whoever-0.046
 mun-0.044
Tokenroom
Token ID3823
Feature activation-1.32e-02

Pos logit contributions
 messenger+0.089
roc+0.086
amer+0.080
handed+0.080
 courtroom+0.079
Neg logit contributions
ches-0.189
 minutes-0.187
 y-0.174
 degrees-0.169
 Whoever-0.168
Token"
Token ID1
Feature activation-1.89e-02

Pos logit contributions
roc+0.023
ate+0.022
ateful+0.020
ulture+0.020
amer+0.019
Neg logit contributions
 minutes-0.022
ches-0.021
fy-0.020
 y-0.020
 mun-0.019
Token and
Token ID290
Feature activation+1.20e-02

Pos logit contributions
roc+0.036
ate+0.033
ulture+0.031
ateful+0.031
rot+0.030
Neg logit contributions
 minutes-0.030
ches-0.027
fy-0.026
 y-0.026
 Whoever-0.025
Token "
Token ID366
Feature activation-1.53e-02

Pos logit contributions
 minutes+0.021
ches+0.020
fy+0.020
 mun+0.019
 drift+0.019
Neg logit contributions
roc-0.020
ate-0.019
ateful-0.017
ulture-0.017
man-0.017
Tokenbe
Token ID1350
Feature activation-3.17e-02

Pos logit contributions
roc+0.026
ate+0.024
ulture+0.022
ateful+0.022
rot+0.021
Neg logit contributions
 minutes-0.027
ches-0.024
fy-0.024
 Whoever-0.023
 mun-0.023
Token that
Token ID326
Feature activation-1.69e-02

Pos logit contributions
roc+0.041
ate+0.038
ulture+0.037
anish+0.036
ateful+0.036
Neg logit contributions
 minutes-0.020
ches-0.018
fy-0.017
 y-0.017
 mun-0.016
Token day
Token ID1110
Feature activation+1.67e-03

Pos logit contributions
roc+0.035
ate+0.033
ateful+0.031
ulture+0.031
amer+0.030
Neg logit contributions
 minutes-0.025
ches-0.022
 y-0.022
fy-0.021
 drift-0.021
Token on
Token ID319
Feature activation-2.13e-02

Pos logit contributions
 minutes+0.003
ches+0.002
fy+0.002
 mun+0.002
 drift+0.002
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
rot-0.003
Token,
Token ID11
Feature activation+8.41e-03

Pos logit contributions
roc+0.036
ate+0.033
ulture+0.031
ateful+0.031
amer+0.030
Neg logit contributions
 minutes-0.040
ches-0.037
 drift-0.036
 y-0.035
fy-0.035
Token Pet
Token ID4767
Feature activation-2.39e-02

Pos logit contributions
 minutes+0.014
ches+0.012
fy+0.012
 degrees+0.012
 drift+0.011
Neg logit contributions
roc-0.016
ate-0.015
ulture-0.014
ateful-0.013
rot-0.013
Tokenal
Token ID282
Feature activation-9.17e-02

Pos logit contributions
roc+0.025
ate+0.021
ateful+0.020
ulture+0.019
anish+0.018
Neg logit contributions
ches-0.057
 minutes-0.056
 mun-0.052
fy-0.052
 degrees-0.051
Token was
Token ID373
Feature activation-2.25e-02

Pos logit contributions
roc+0.188
ateful+0.174
ate+0.168
 Fighting+0.167
anish+0.158
Neg logit contributions
ches-0.137
 minutes-0.117
 mun-0.115
 drift-0.104
 jumped-0.104
Token never
Token ID1239
Feature activation-1.75e-02

Pos logit contributions
roc+0.044
ate+0.040
ulture+0.038
ateful+0.038
anish+0.037
Neg logit contributions
 minutes-0.037
ches-0.033
 y-0.032
 mun-0.031
fy-0.031
Token careless
Token ID36138
Feature activation-2.10e-02

Pos logit contributions
roc+0.033
ate+0.032
ulture+0.030
rot+0.028
ateful+0.028
Neg logit contributions
 minutes-0.029
ches-0.026
fy-0.026
 y-0.025
 mun-0.025
Token and
Token ID290
Feature activation-1.94e-02

Pos logit contributions
roc+0.042
ate+0.040
ulture+0.038
ateful+0.036
rot+0.036
Neg logit contributions
 minutes-0.032
ches-0.028
fy-0.027
 y-0.027
 drift-0.027
Token always
Token ID1464
Feature activation-2.41e-02

Pos logit contributions
roc+0.040
ate+0.038
ulture+0.036
rot+0.035
ateful+0.035
Neg logit contributions
 minutes-0.028
ches-0.025
fy-0.024
 y-0.024
 mun-0.024
Token that
Token ID326
Feature activation-3.01e-02

Pos logit contributions
roc+0.052
ate+0.048
ulture+0.048
anish+0.047
ateful+0.045
Neg logit contributions
 minutes-0.024
ches-0.021
 y-0.020
fy-0.019
 mun-0.018
Token day
Token ID1110
Feature activation-1.11e-02

Pos logit contributions
roc+0.073
ate+0.068
ateful+0.066
ulture+0.064
amer+0.064
Neg logit contributions
 minutes-0.036
 y-0.031
ches-0.030
 drift-0.029
fy-0.027
Token on
Token ID319
Feature activation-2.40e-02

Pos logit contributions
roc+0.023
ate+0.022
ulture+0.020
ateful+0.020
rot+0.019
Neg logit contributions
 minutes-0.016
ches-0.014
fy-0.014
 drift-0.013
 y-0.013
Token,
Token ID11
Feature activation-2.72e-02

Pos logit contributions
roc+0.045
ate+0.041
ulture+0.040
ateful+0.039
amed+0.038
Neg logit contributions
 minutes-0.041
ches-0.037
 y-0.036
 drift-0.036
 marsh-0.034
Token the
Token ID262
Feature activation-2.17e-02

Pos logit contributions
roc+0.061
ate+0.059
ulture+0.057
rot+0.056
amed+0.056
Neg logit contributions
 minutes-0.033
 y-0.030
 Whoever-0.027
 drift-0.027
 mun-0.026
Token village
Token ID7404
Feature activation-9.07e-02

Pos logit contributions
roc+0.043
ate+0.042
ulture+0.039
rot+0.038
amer+0.037
Neg logit contributions
 minutes-0.034
 y-0.031
 tug-0.027
 drift-0.026
 slippery-0.026
Token was
Token ID373
Feature activation-2.64e-02

Pos logit contributions
roc+0.199
ate+0.179
ateful+0.173
ulture+0.173
anish+0.173
Neg logit contributions
 minutes-0.093
ches-0.092
 y-0.087
 wouldn-0.087
 carefully-0.087
Token even
Token ID772
Feature activation-1.32e-02

Pos logit contributions
roc+0.057
ate+0.052
ulture+0.052
anish+0.050
rot+0.050
Neg logit contributions
 minutes-0.037
 y-0.034
ches-0.032
 mun-0.030
fy-0.029
Token more
Token ID517
Feature activation-2.23e-02

Pos logit contributions
roc+0.030
ate+0.029
ulture+0.028
rot+0.027
ateful+0.026
Neg logit contributions
 minutes-0.017
ches-0.014
 y-0.014
fy-0.014
 mun-0.013
Token peaceful
Token ID12309
Feature activation-4.18e-02

Pos logit contributions
roc+0.048
ulture+0.043
ate+0.043
anish+0.042
rot+0.041
Neg logit contributions
 minutes-0.034
 y-0.030
fy-0.028
ches-0.026
 drift-0.025
Token and
Token ID290
Feature activation-2.28e-02

Pos logit contributions
roc+0.092
ulture+0.081
ate+0.081
ateful+0.077
rot+0.077
Neg logit contributions
 minutes-0.065
 y-0.050
 mun-0.046
 drift-0.046
ches-0.045
Token that
Token ID326
Feature activation-1.80e-02

Pos logit contributions
roc+0.038
ate+0.035
ulture+0.034
anish+0.034
ateful+0.033
Neg logit contributions
 minutes-0.018
ches-0.015
 y-0.015
fy-0.014
 mun-0.014
Token day
Token ID1110
Feature activation-4.73e-03

Pos logit contributions
roc+0.039
ate+0.037
ateful+0.035
ulture+0.034
amer+0.033
Neg logit contributions
 minutes-0.026
ches-0.023
 y-0.022
 drift-0.022
fy-0.021
Token on
Token ID319
Feature activation-2.25e-02

Pos logit contributions
roc+0.009
ate+0.009
ulture+0.008
ateful+0.008
rot+0.008
Neg logit contributions
 minutes-0.007
ches-0.006
fy-0.006
 drift-0.006
 mun-0.006
Token,
Token ID11
Feature activation-1.06e-02

Pos logit contributions
roc+0.040
ate+0.037
ulture+0.035
ateful+0.035
amer+0.034
Neg logit contributions
 minutes-0.040
ches-0.036
 drift-0.035
 y-0.035
 mun-0.034
Token the
Token ID262
Feature activation-1.06e-02

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.020
ateful+0.019
rot+0.019
Neg logit contributions
 minutes-0.015
ches-0.013
 y-0.013
 mun-0.013
 drift-0.013
Token shop
Token ID6128
Feature activation-9.05e-02

Pos logit contributions
roc+0.017
ate+0.016
ulture+0.015
ateful+0.014
amer+0.014
Neg logit contributions
 minutes-0.021
 y-0.019
ches-0.018
fy-0.018
 drift-0.018
Tokenkeeper
Token ID13884
Feature activation-6.49e-02

Pos logit contributions
roc+0.150
anish+0.147
ateful+0.144
 Fighting+0.141
ate+0.136
Neg logit contributions
 y-0.127
 mun-0.126
 carefully-0.123
 minutes-0.122
 wouldn-0.119
Token was
Token ID373
Feature activation-2.60e-02

Pos logit contributions
roc+0.144
ateful+0.131
ate+0.128
 Fighting+0.128
ulture+0.127
Neg logit contributions
ches-0.078
 mun-0.074
 minutes-0.074
 y-0.067
 jumped-0.066
Token extra
Token ID3131
Feature activation-2.07e-03

Pos logit contributions
roc+0.053
ate+0.048
ulture+0.047
anish+0.046
ateful+0.045
Neg logit contributions
 minutes-0.041
 y-0.036
ches-0.034
 mun-0.034
 drift-0.033
Token careful
Token ID8161
Feature activation-3.36e-02

Pos logit contributions
roc+0.004
ate+0.004
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.003
 drift-0.003
 mun-0.003
Token to
Token ID284
Feature activation-4.00e-03

Pos logit contributions
roc+0.071
ate+0.064
ulture+0.062
ateful+0.061
rot+0.059
Neg logit contributions
 minutes-0.050
ches-0.045
fy-0.041
 y-0.041
 drift-0.041
Token,
Token ID11
Feature activation+5.33e-03

Pos logit contributions
roc+0.054
ate+0.051
ulture+0.048
ateful+0.046
amer+0.046
Neg logit contributions
 minutes-0.045
 y-0.041
 wo-0.041
ches-0.039
 drift-0.039
Token "
Token ID366
Feature activation-7.20e-03

Pos logit contributions
 minutes+0.009
ches+0.009
fy+0.009
 mun+0.008
 drift+0.008
Neg logit contributions
roc-0.009
ate-0.008
ulture-0.008
ateful-0.008
man-0.008
TokenAr
Token ID3163
Feature activation-6.71e-03

Pos logit contributions
roc+0.012
ate+0.011
ateful+0.010
ulture+0.010
rot+0.010
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.012
 mun-0.011
 drift-0.011
Tokenrr
Token ID21062
Feature activation-2.63e-02

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.005
Neg logit contributions
 minutes-0.016
ches-0.015
fy-0.015
 mun-0.015
 drift-0.014
Token,
Token ID11
Feature activation+3.12e-02

Pos logit contributions
roc+0.042
ate+0.039
ulture+0.037
ateful+0.036
amed+0.034
Neg logit contributions
 minutes-0.050
ches-0.047
fy-0.045
 wo-0.043
 degrees-0.043
Token mate
Token ID16133
Feature activation-9.02e-02

Pos logit contributions
 minutes+0.052
fy+0.048
ches+0.048
 degrees+0.047
 mun+0.044
Neg logit contributions
roc-0.055
ate-0.052
rot-0.049
man-0.048
 judge-0.048
Tokeny
Token ID88
Feature activation-1.51e-02

Pos logit contributions
roc+0.133
 courtroom+0.132
 Fighting+0.127
amed+0.123
ate+0.120
Neg logit contributions
ches-0.147
 minutes-0.144
 might-0.139
 wo-0.137
 y-0.131
Token!"
Token ID2474
Feature activation+7.76e-03

Pos logit contributions
roc+0.025
ate+0.023
ateful+0.022
ulture+0.021
amer+0.020
Neg logit contributions
 minutes-0.027
ches-0.026
fy-0.025
 mun-0.024
 drift-0.024
Token
Token ID198
Feature activation-1.67e-02

Pos logit contributions
 minutes+0.015
ches+0.014
fy+0.014
 degrees+0.013
 mun+0.013
Neg logit contributions
roc-0.012
ate-0.011
ulture-0.010
ateful-0.010
rot-0.010
Token
Token ID198
Feature activation-4.49e-02

Pos logit contributions
roc+0.031
ate+0.029
ulture+0.027
ateful+0.027
flower+0.026
Neg logit contributions
 minutes-0.027
ches-0.025
 Whoever-0.023
 drift-0.023
 y-0.023
TokenAnna
Token ID31160
Feature activation-1.34e-02

Pos logit contributions
roc+0.094
ulture+0.086
ate+0.086
 zone+0.084
flower+0.083
Neg logit contributions
 minutes-0.057
 Whoever-0.052
 tug-0.051
ches-0.047
 carefully-0.047
Token to
Token ID284
Feature activation-1.37e-03

Pos logit contributions
 minutes+0.006
ches+0.006
fy+0.006
 mun+0.006
 drift+0.005
Neg logit contributions
roc-0.006
ate-0.006
ulture-0.005
ateful-0.005
man-0.005
Token bow
Token ID9563
Feature activation-3.61e-02

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.002
ches-0.002
fy-0.002
 mun-0.002
 drift-0.002
Token.
Token ID13
Feature activation+2.24e-02

Pos logit contributions
roc+0.087
ate+0.077
ateful+0.075
ulture+0.075
anish+0.073
Neg logit contributions
ches-0.043
 minutes-0.040
 y-0.038
fy-0.038
 wo-0.035
Token Whenever
Token ID21326
Feature activation-2.36e-02

Pos logit contributions
 minutes+0.039
ches+0.036
fy+0.035
 degrees+0.035
 mun+0.034
Neg logit contributions
roc-0.038
ate-0.037
ulture-0.033
 Respect-0.033
 Fighting-0.032
Token she
Token ID673
Feature activation-3.46e-03

Pos logit contributions
roc+0.056
ate+0.053
ulture+0.051
amed+0.050
ateful+0.049
Neg logit contributions
 minutes-0.027
 drift-0.022
 y-0.022
ches-0.022
 mun-0.021
Token bowed
Token ID35737
Feature activation-8.99e-02

Pos logit contributions
roc+0.007
ate+0.006
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.005
ches-0.005
fy-0.005
 mun-0.005
 drift-0.005
Token,
Token ID11
Feature activation-8.29e-04

Pos logit contributions
roc+0.214
ulture+0.198
ate+0.188
ateful+0.185
anish+0.181
Neg logit contributions
ches-0.097
 minutes-0.093
 carefully-0.093
 y-0.093
 wo-0.092
Token the
Token ID262
Feature activation-1.33e-02

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token town
Token ID3240
Feature activation-5.13e-02

Pos logit contributions
roc+0.022
ate+0.020
ulture+0.018
ateful+0.018
amer+0.017
Neg logit contributions
 minutes-0.025
ches-0.022
fy-0.022
 y-0.022
 drift-0.022
Tokenspe
Token ID4125
Feature activation-5.51e-02

Pos logit contributions
roc+0.116
ate+0.107
ulture+0.101
ateful+0.100
anish+0.097
Neg logit contributions
 minutes-0.062
ches-0.057
fy-0.055
 y-0.053
 drift-0.053
Tokenople
Token ID643
Feature activation-3.64e-02

Pos logit contributions
roc+0.087
ate+0.083
ulture+0.080
 Respect+0.078
ums+0.072
Neg logit contributions
 minutes-0.100
fy-0.087
ches-0.083
 Whoever-0.083
 mun-0.082
Token did
Token ID750
Feature activation+1.01e-02

Pos logit contributions
roc+0.052
ate+0.049
ulture+0.047
amer+0.047
amed+0.046
Neg logit contributions
 minutes-0.030
 y-0.029
ches-0.029
 drift-0.026
 wo-0.026
Token you
Token ID345
Feature activation-1.53e-02

Pos logit contributions
 minutes+0.014
ches+0.013
fy+0.012
 mun+0.012
 drift+0.012
Neg logit contributions
roc-0.021
ate-0.020
ateful-0.018
ulture-0.018
rot-0.018
Token take
Token ID1011
Feature activation-1.20e-02

Pos logit contributions
roc+0.033
ate+0.032
ulture+0.030
ateful+0.030
rot+0.029
Neg logit contributions
 minutes-0.019
ches-0.018
 drift-0.017
fy-0.017
 mun-0.017
Token my
Token ID616
Feature activation+8.73e-03

Pos logit contributions
roc+0.021
ate+0.020
ulture+0.019
ateful+0.019
rot+0.018
Neg logit contributions
 minutes-0.020
ches-0.019
 y-0.018
fy-0.018
 drift-0.017
Token kay
Token ID34681
Feature activation-4.34e-02

Pos logit contributions
 minutes+0.016
ches+0.015
fy+0.015
 mun+0.014
 Whoever+0.014
Neg logit contributions
roc-0.014
ate-0.013
ulture-0.012
ateful-0.012
rot-0.012
Tokenak
Token ID461
Feature activation-8.99e-02

Pos logit contributions
roc+0.076
ateful+0.069
flower+0.067
 Fighting+0.066
ate+0.066
Neg logit contributions
 minutes-0.067
fy-0.066
ches-0.065
 mun-0.058
 y-0.057
Token?"
Token ID1701
Feature activation-4.20e-03

Pos logit contributions
roc+0.192
ateful+0.179
ums+0.171
ulture+0.166
ate+0.166
Neg logit contributions
ches-0.116
 y-0.100
 minutes-0.099
fy-0.098
 slippery-0.096
Token
Token ID198
Feature activation-7.34e-03

Pos logit contributions
roc+0.007
ate+0.007
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.008
ches-0.007
fy-0.007
 mun-0.007
 drift-0.007
Token
Token ID198
Feature activation-3.72e-02

Pos logit contributions
roc+0.012
ate+0.012
ulture+0.011
ateful+0.011
rot+0.010
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.012
 drift-0.011
 mun-0.011
TokenL
Token ID43
Feature activation+1.77e-03

Pos logit contributions
roc+0.072
flower+0.069
ulture+0.068
ate+0.066
 zone+0.064
Neg logit contributions
 minutes-0.056
 tug-0.046
 Whoever-0.046
ches-0.044
 seconds-0.042
Tokenila
Token ID10102
Feature activation-3.00e-02

Pos logit contributions
 minutes+0.003
ches+0.002
fy+0.002
 drift+0.002
 mun+0.002
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
rot-0.003
Token Why
Token ID4162
Feature activation-1.47e-02

Pos logit contributions
roc+0.032
ate+0.030
ulture+0.028
ateful+0.028
rot+0.026
Neg logit contributions
 minutes-0.028
ches-0.026
 Whoever-0.025
fy-0.025
 drift-0.024
Token are
Token ID389
Feature activation+8.15e-03

Pos logit contributions
roc+0.031
ate+0.029
ulture+0.028
ateful+0.027
amer+0.027
Neg logit contributions
 minutes-0.019
ches-0.018
 y-0.017
fy-0.017
 drift-0.016
Token you
Token ID345
Feature activation-2.75e-02

Pos logit contributions
 minutes+0.011
ches+0.010
fy+0.009
 mun+0.009
 drift+0.009
Neg logit contributions
roc-0.018
ate-0.017
ateful-0.016
ulture-0.015
rot-0.015
Token qu
Token ID627
Feature activation-5.80e-02

Pos logit contributions
roc+0.058
ate+0.054
ulture+0.052
ateful+0.049
rot+0.048
Neg logit contributions
 minutes-0.038
ches-0.036
 y-0.034
 Whoever-0.034
 drift-0.033
Tokenarre
Token ID9624
Feature activation-4.56e-02

Pos logit contributions
man+0.066
roc+0.065
 messenger+0.065
ate+0.064
ateful+0.062
Neg logit contributions
ches-0.125
fy-0.109
 minutes-0.108
 Whoever-0.106
 mun-0.103
Tokenling
Token ID1359
Feature activation-8.97e-02

Pos logit contributions
roc+0.095
ulture+0.089
rot+0.086
ate+0.084
 Respect+0.084
Neg logit contributions
ches-0.063
 minutes-0.063
 y-0.054
fy-0.052
 drift-0.052
Token?"
Token ID1701
Feature activation-1.49e-02

Pos logit contributions
roc+0.186
ulture+0.170
ate+0.168
ateful+0.165
ums+0.164
Neg logit contributions
ches-0.116
 minutes-0.115
 y-0.110
fy-0.093
 twice-0.092
Token
Token ID198
Feature activation-2.86e-02

Pos logit contributions
roc+0.029
ate+0.027
ulture+0.026
ateful+0.025
rot+0.025
Neg logit contributions
 minutes-0.023
ches-0.021
 drift-0.020
 Whoever-0.020
 y-0.020
Token
Token ID198
Feature activation-4.00e-02

Pos logit contributions
roc+0.058
ate+0.052
ulture+0.052
ateful+0.052
amer+0.049
Neg logit contributions
 minutes-0.041
ches-0.038
 Whoever-0.038
 y-0.035
 drift-0.035
TokenThe
Token ID464
Feature activation-2.38e-02

Pos logit contributions
roc+0.073
ulture+0.068
ate+0.066
ateful+0.064
rot+0.061
Neg logit contributions
 minutes-0.066
 Whoever-0.061
 tug-0.059
fy-0.057
 y-0.056
Token two
Token ID734
Feature activation-1.62e-02

Pos logit contributions
roc+0.040
ate+0.040
ulture+0.036
rot+0.034
anish+0.034
Neg logit contributions
 minutes-0.043
 y-0.039
ches-0.036
fy-0.035
 wo-0.035
Token Can
Token ID1680
Feature activation+4.32e-03

Pos logit contributions
 minutes+0.007
ches+0.007
fy+0.006
 mun+0.006
 drift+0.006
Neg logit contributions
roc-0.007
ate-0.007
ulture-0.006
ateful-0.006
rot-0.006
Token you
Token ID345
Feature activation-2.32e-02

Pos logit contributions
 minutes+0.006
ches+0.005
fy+0.005
 mun+0.005
 drift+0.005
Neg logit contributions
roc-0.009
ate-0.009
ulture-0.008
ateful-0.008
rot-0.008
Token lend
Token ID22096
Feature activation-4.25e-02

Pos logit contributions
roc+0.047
ate+0.043
ulture+0.042
ateful+0.041
amer+0.041
Neg logit contributions
 minutes-0.033
ches-0.031
 y-0.030
 drift-0.030
fy-0.029
Token me
Token ID502
Feature activation-6.50e-02

Pos logit contributions
roc+0.104
ulture+0.102
ateful+0.101
amer+0.098
flower+0.098
Neg logit contributions
ches-0.038
 minutes-0.037
 y-0.035
 Whoever-0.029
 honey-0.029
Token your
Token ID534
Feature activation-2.34e-02

Pos logit contributions
roc+0.142
ateful+0.132
amer+0.130
ulture+0.130
ums+0.127
Neg logit contributions
ches-0.081
 minutes-0.078
 y-0.074
 honey-0.067
 dip-0.067
Token colors
Token ID7577
Feature activation-8.97e-02

Pos logit contributions
roc+0.042
ate+0.039
ulture+0.038
amer+0.037
ateful+0.037
Neg logit contributions
 minutes-0.039
 y-0.036
ches-0.034
fy-0.033
 wo-0.033
Token so
Token ID523
Feature activation-1.60e-02

Pos logit contributions
roc+0.212
ateful+0.180
ulture+0.180
handed+0.178
 courtroom+0.175
Neg logit contributions
 y-0.101
ches-0.096
 minutes-0.093
fy-0.091
 dip-0.091
Token I
Token ID314
Feature activation-4.28e-03

Pos logit contributions
roc+0.037
ate+0.034
ulture+0.033
amer+0.032
ateful+0.031
Neg logit contributions
 minutes-0.019
fy-0.017
ches-0.017
 y-0.017
 Whoever-0.015
Token can
Token ID460
Feature activation-3.01e-02

Pos logit contributions
roc+0.010
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.005
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token draw
Token ID3197
Feature activation-3.94e-02

Pos logit contributions
roc+0.063
ateful+0.055
ulture+0.055
ate+0.054
amer+0.052
Neg logit contributions
ches-0.040
 minutes-0.039
 drift-0.037
 y-0.037
 dip-0.037
Token one
Token ID530
Feature activation-8.46e-03

Pos logit contributions
roc+0.082
ulture+0.072
ateful+0.070
ate+0.066
apper+0.066
Neg logit contributions
 minutes-0.059
ches-0.055
 y-0.054
fy-0.048
 Whoever-0.048
Token day
Token ID1110
Feature activation+1.87e-03

Pos logit contributions
roc+0.036
ate+0.035
ateful+0.032
ulture+0.032
amer+0.031
Neg logit contributions
 minutes-0.025
ches-0.022
fy-0.021
 y-0.021
 drift-0.021
Token on
Token ID319
Feature activation-1.65e-02

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
rot-0.003
Token,
Token ID11
Feature activation+6.83e-03

Pos logit contributions
roc+0.027
ate+0.026
ulture+0.024
ateful+0.023
amer+0.022
Neg logit contributions
 minutes-0.031
ches-0.029
 y-0.027
fy-0.027
 drift-0.027
Token Wh
Token ID854
Feature activation-1.62e-02

Pos logit contributions
 minutes+0.011
ches+0.010
fy+0.010
 drift+0.010
 degrees+0.010
Neg logit contributions
roc-0.013
ate-0.012
ulture-0.011
ateful-0.011
man-0.010
Tokenisk
Token ID1984
Feature activation-3.47e-02

Pos logit contributions
roc+0.028
ate+0.027
ateful+0.025
ulture+0.025
amer+0.024
Neg logit contributions
 minutes-0.026
ches-0.026
fy-0.024
 mun-0.024
 degrees-0.023
Tokeners
Token ID364
Feature activation-8.95e-02

Pos logit contributions
roc+0.059
ateful+0.053
ate+0.053
ulture+0.051
 Respect+0.050
Neg logit contributions
ches-0.058
 minutes-0.058
 mun-0.050
 Whoever-0.050
 y-0.050
Token was
Token ID373
Feature activation-2.08e-02

Pos logit contributions
roc+0.208
ateful+0.185
 Fighting+0.183
ate+0.183
ulture+0.176
Neg logit contributions
ches-0.103
 mun-0.097
 jumped-0.094
 minutes-0.091
 carefully-0.088
Token no
Token ID645
Feature activation+1.84e-02

Pos logit contributions
roc+0.040
ate+0.037
ulture+0.036
ateful+0.034
anish+0.034
Neg logit contributions
 minutes-0.035
ches-0.031
 y-0.030
 mun-0.029
fy-0.029
Token longer
Token ID2392
Feature activation-4.10e-02

Pos logit contributions
ches+0.021
 minutes+0.020
 degrees+0.020
 mun+0.020
 drift+0.020
Neg logit contributions
roc-0.039
ate-0.038
ateful-0.036
ulture-0.034
apper-0.033
Token a
Token ID257
Feature activation+6.37e-03

Pos logit contributions
roc+0.081
ate+0.078
ulture+0.077
rot+0.071
man+0.070
Neg logit contributions
 minutes-0.069
 y-0.058
fy-0.056
ches-0.054
 mun-0.054
Token lazy
Token ID16931
Feature activation-6.43e-04

Pos logit contributions
 minutes+0.010
ches+0.010
fy+0.009
 drift+0.009
 mun+0.009
Neg logit contributions
roc-0.012
ate-0.011
ulture-0.010
ateful-0.010
rot-0.010
Token make
Token ID787
Feature activation+1.31e-02

Pos logit contributions
roc+0.009
ate+0.008
ulture+0.008
ateful+0.007
rot+0.007
Neg logit contributions
 minutes-0.007
ches-0.006
fy-0.006
 drift-0.006
 mun-0.006
Token a
Token ID257
Feature activation+2.73e-02

Pos logit contributions
 minutes+0.024
fy+0.023
ches+0.023
 mun+0.022
 drift+0.022
Neg logit contributions
roc-0.021
ate-0.021
ateful-0.017
ulture-0.017
rot-0.017
Token fence
Token ID13990
Feature activation-6.86e-03

Pos logit contributions
 minutes+0.052
ches+0.049
 degrees+0.048
fy+0.048
 mun+0.047
Neg logit contributions
roc-0.042
ate-0.039
man-0.034
ulture-0.034
rot-0.034
Token!"
Token ID2474
Feature activation+1.67e-03

Pos logit contributions
roc+0.014
ate+0.013
ulture+0.012
ateful+0.012
rot+0.012
Neg logit contributions
 minutes-0.010
ches-0.009
fy-0.009
 drift-0.009
 mun-0.008
Token John
Token ID1757
Feature activation-2.45e-04

Pos logit contributions
 minutes+0.003
ches+0.002
fy+0.002
 mun+0.002
 drift+0.002
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
man-0.003
Token thought
Token ID1807
Feature activation+7.69e-02

Pos logit contributions
roc+0.001
ate+0.001
ulture+0.000
ateful+0.000
rot+0.000
Neg logit contributions
 minutes-0.000
ches-0.000
fy-0.000
 drift-0.000
 mun-0.000
Token it
Token ID340
Feature activation+1.68e-02

Pos logit contributions
 mun+0.162
ksh+0.156
fy+0.154
ns+0.147
 syrup+0.145
Neg logit contributions
ate-0.068
room-0.066
,-0.065
ating-0.059
 language-0.057
Token was
Token ID373
Feature activation+2.11e-02

Pos logit contributions
 minutes+0.029
 mun+0.025
fy+0.025
 degrees+0.025
ches+0.025
Neg logit contributions
roc-0.029
ate-0.028
man-0.026
ulture-0.025
rot-0.024
Token a
Token ID257
Feature activation+2.53e-02

Pos logit contributions
 minutes+0.031
ches+0.031
fy+0.030
 mun+0.030
 drift+0.028
Neg logit contributions
roc-0.041
ate-0.039
ateful-0.034
man-0.034
 Fighting-0.034
Token great
Token ID1049
Feature activation+5.59e-03

Pos logit contributions
ches+0.040
 minutes+0.039
fy+0.037
 degrees+0.037
 mun+0.037
Neg logit contributions
roc-0.046
ate-0.042
man-0.039
rot-0.039
ulture-0.038
Token idea
Token ID2126
Feature activation+1.84e-02

Pos logit contributions
 minutes+0.010
ches+0.009
fy+0.009
 degrees+0.009
 mun+0.009
Neg logit contributions
roc-0.010
ate-0.009
ulture-0.008
ateful-0.008
man-0.008
Token tried
Token ID3088
Feature activation-1.08e-02

Pos logit contributions
 minutes+0.030
fy+0.029
ches+0.029
 degrees+0.028
 drift+0.027
Neg logit contributions
roc-0.043
ate-0.041
man-0.037
ateful-0.037
amed-0.037
Token to
Token ID284
Feature activation+1.59e-02

Pos logit contributions
roc+0.020
ate+0.019
ulture+0.018
ateful+0.018
rot+0.017
Neg logit contributions
 minutes-0.017
ches-0.016
 y-0.015
fy-0.015
 drift-0.015
Token catch
Token ID4929
Feature activation+9.18e-03

Pos logit contributions
 minutes+0.021
fy+0.019
ches+0.019
 degrees+0.019
 wo+0.017
Neg logit contributions
roc-0.034
ate-0.033
ateful-0.030
ulture-0.029
man-0.029
Token the
Token ID262
Feature activation+8.97e-03

Pos logit contributions
 minutes+0.014
fy+0.013
ches+0.013
 mun+0.013
 drift+0.012
Neg logit contributions
roc-0.018
ate-0.017
ateful-0.015
ulture-0.015
rot-0.014
Token cake
Token ID12187
Feature activation+3.43e-02

Pos logit contributions
 minutes+0.015
ches+0.015
fy+0.014
 mun+0.013
 drift+0.013
Neg logit contributions
roc-0.016
ate-0.015
ulture-0.014
man-0.014
rot-0.014
Token too
Token ID1165
Feature activation+5.49e-02

Pos logit contributions
 minutes+0.062
 mun+0.058
fy+0.057
ches+0.056
 degrees+0.054
Neg logit contributions
ate-0.052
roc-0.051
man-0.049
room-0.046
flower-0.045
Token.
Token ID13
Feature activation+1.02e-02

Pos logit contributions
 mun+0.113
ches+0.108
 minutes+0.108
 syrup+0.107
 freshly+0.106
Neg logit contributions
roc-0.058
ate-0.056
ings-0.051
ateful-0.051
man-0.050
Token
Token ID198
Feature activation-1.62e-03

Pos logit contributions
 minutes+0.018
ches+0.017
fy+0.017
 degrees+0.016
 drift+0.016
Neg logit contributions
roc-0.017
ate-0.016
ulture-0.014
ateful-0.014
man-0.013
Token
Token ID198
Feature activation-2.45e-02

Pos logit contributions
roc+0.003
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.003
 mun-0.003
 drift-0.003
TokenThey
Token ID2990
Feature activation+3.66e-02

Pos logit contributions
roc+0.047
ate+0.042
ulture+0.042
ateful+0.041
rot+0.040
Neg logit contributions
 minutes-0.040
ches-0.035
fy-0.035
 Whoever-0.034
 y-0.033
Token pulled
Token ID5954
Feature activation+1.18e-02

Pos logit contributions
 minutes+0.056
 degrees+0.056
fy+0.052
 drift+0.052
ches+0.051
Neg logit contributions
roc-0.064
amed-0.055
ate-0.055
ateful-0.054
room-0.052
Token raining
Token ID43079
Feature activation+1.77e-02

Pos logit contributions
 minutes+0.010
ches+0.009
fy+0.009
 mun+0.009
 Whoever+0.009
Neg logit contributions
roc-0.014
ate-0.014
ateful-0.013
ulture-0.012
man-0.012
Token.
Token ID13
Feature activation+1.44e-02

Pos logit contributions
 minutes+0.030
 mun+0.028
ches+0.028
fy+0.027
 degrees+0.027
Neg logit contributions
ate-0.029
roc-0.028
man-0.025
room-0.025
ulture-0.025
Token<|endoftext|>
Token ID50256
Feature activation-1.18e-03

Pos logit contributions
 minutes+0.031
ches+0.029
 degrees+0.028
fy+0.028
 drift+0.027
Neg logit contributions
roc-0.019
ate-0.018
 Fighting-0.016
ulture-0.016
 Respect-0.015
TokenOnce
Token ID7454
Feature activation+1.65e-02

Pos logit contributions
roc+0.002
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.002
ches-0.002
fy-0.002
 mun-0.002
 drift-0.002
Token upon
Token ID2402
Feature activation+1.45e-02

Pos logit contributions
 minutes+0.030
ches+0.028
fy+0.028
 mun+0.027
 degrees+0.027
Neg logit contributions
roc-0.026
ate-0.025
ateful-0.023
ulture-0.022
man-0.022
Token a
Token ID257
Feature activation+5.88e-02

Pos logit contributions
 minutes+0.028
ches+0.026
fy+0.026
 degrees+0.025
 mun+0.025
Neg logit contributions
ate-0.021
roc-0.021
ulture-0.018
ateful-0.018
rot-0.017
Token time
Token ID640
Feature activation-1.22e-02

Pos logit contributions
 degrees+0.088
ches+0.084
 Whoever+0.084
 inc+0.083
 pounds+0.082
Neg logit contributions
roc-0.094
ate-0.090
 neighborhood-0.087
room-0.083
anish-0.080
Token,
Token ID11
Feature activation+3.23e-03

Pos logit contributions
roc+0.018
ate+0.017
ulture+0.016
ateful+0.015
amer+0.015
Neg logit contributions
 minutes-0.024
ches-0.022
fy-0.022
 y-0.022
 drift-0.021
Token there
Token ID612
Feature activation-2.37e-02

Pos logit contributions
 minutes+0.006
ches+0.005
fy+0.005
 mun+0.005
 drift+0.005
Neg logit contributions
roc-0.006
ate-0.005
ulture-0.005
ateful-0.005
rot-0.005
Token was
Token ID373
Feature activation+6.19e-03

Pos logit contributions
roc+0.047
ate+0.043
ulture+0.042
ateful+0.041
amer+0.041
Neg logit contributions
 minutes-0.036
 y-0.032
ches-0.032
 drift-0.031
 mun-0.030
Token a
Token ID257
Feature activation+2.87e-02

Pos logit contributions
 minutes+0.011
ches+0.011
fy+0.010
 degrees+0.010
 drift+0.010
Neg logit contributions
roc-0.010
ate-0.010
ulture-0.009
ateful-0.008
rot-0.008
Token top
Token ID1353
Feature activation+4.39e-02

Pos logit contributions
ches+0.056
 minutes+0.055
fy+0.053
 mun+0.053
 degrees+0.049
Neg logit contributions
roc-0.038
ate-0.036
 neighborhood-0.032
ateful-0.032
ulture-0.031
Token of
Token ID286
Feature activation+3.10e-02

Pos logit contributions
 mun+0.086
 minutes+0.084
ches+0.083
 degrees+0.083
fy+0.082
Neg logit contributions
ate-0.055
roc-0.052
room-0.051
man-0.050
amer-0.047
Token the
Token ID262
Feature activation+3.75e-02

Pos logit contributions
fy+0.072
ches+0.071
 minutes+0.067
 degrees+0.064
 mun+0.063
Neg logit contributions
ate-0.036
roc-0.033
ateful-0.030
 town-0.028
room-0.027
Token slide
Token ID10649
Feature activation+2.03e-02

Pos logit contributions
ches+0.072
 mun+0.068
fy+0.066
 minutes+0.065
 degrees+0.063
Neg logit contributions
 neighborhood-0.054
roc-0.054
ate-0.053
 town-0.053
 world-0.050
Token.
Token ID13
Feature activation+2.71e-02

Pos logit contributions
 minutes+0.037
ches+0.035
fy+0.034
 mun+0.034
 degrees+0.034
Neg logit contributions
roc-0.030
ate-0.030
man-0.027
room-0.027
ulture-0.026
Token He
Token ID679
Feature activation+5.45e-02

Pos logit contributions
 minutes+0.062
ches+0.061
fy+0.060
 degrees+0.057
 mun+0.056
Neg logit contributions
ate-0.030
roc-0.029
 Respect-0.023
room-0.023
ums-0.022
Token looks
Token ID3073
Feature activation+4.56e-02

Pos logit contributions
 minutes+0.093
fy+0.090
 degrees+0.083
 mun+0.082
ches+0.081
Neg logit contributions
ate-0.084
roc-0.082
 neighborhood-0.071
ateful-0.069
 bully-0.066
Token down
Token ID866
Feature activation+2.79e-02

Pos logit contributions
 degrees+0.099
 mun+0.097
 pounds+0.094
ches+0.094
 Whoever+0.093
Neg logit contributions
ate-0.041
roc-0.040
amed-0.034
 neighborhood-0.032
ateful-0.032
Token and
Token ID290
Feature activation+4.12e-02

Pos logit contributions
 minutes+0.055
ches+0.054
 mun+0.053
fy+0.053
 degrees+0.051
Neg logit contributions
ate-0.036
roc-0.033
man-0.031
ateful-0.030
amer-0.030
Token sees
Token ID7224
Feature activation-4.45e-03

Pos logit contributions
fy+0.086
 minutes+0.085
ches+0.084
 degrees+0.082
 mun+0.082
Neg logit contributions
roc-0.046
ate-0.043
 Fighting-0.037
 neighborhood-0.037
 Respect-0.036
Token L
Token ID406
Feature activation+2.91e-02

Pos logit contributions
roc+0.008
ate+0.008
ulture+0.007
ateful+0.007
rot+0.007
Neg logit contributions
 minutes-0.007
ches-0.007
fy-0.007
 mun-0.006
 drift-0.006
Token thing
Token ID1517
Feature activation+1.37e-02

Pos logit contributions
roc+0.045
ate+0.044
ateful+0.040
ulture+0.040
amer+0.039
Neg logit contributions
 minutes-0.046
 y-0.041
fy-0.041
ches-0.040
 mun-0.040
Token.
Token ID13
Feature activation-5.61e-03

Pos logit contributions
 minutes+0.022
ches+0.020
fy+0.020
 mun+0.019
 drift+0.019
Neg logit contributions
roc-0.024
ate-0.024
ulture-0.022
ateful-0.021
rot-0.021
Token<|endoftext|>
Token ID50256
Feature activation+2.23e-03

Pos logit contributions
roc+0.010
ate+0.009
ulture+0.008
ateful+0.008
rot+0.008
Neg logit contributions
 minutes-0.010
ches-0.009
fy-0.009
 mun-0.009
 drift-0.009
TokenOnce
Token ID7454
Feature activation+1.15e-02

Pos logit contributions
 minutes+0.004
ches+0.004
fy+0.004
 mun+0.004
 drift+0.004
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.003
rot-0.003
Token upon
Token ID2402
Feature activation+3.08e-02

Pos logit contributions
 minutes+0.021
ches+0.019
fy+0.019
 mun+0.018
 degrees+0.018
Neg logit contributions
roc-0.019
ate-0.018
ateful-0.016
ulture-0.016
man-0.016
Token a
Token ID257
Feature activation+5.83e-02

Pos logit contributions
 minutes+0.065
ches+0.063
fy+0.063
 degrees+0.060
 mun+0.058
Neg logit contributions
ate-0.039
roc-0.036
room-0.030
ateful-0.030
 Respect-0.030
Token time
Token ID640
Feature activation-2.18e-02

Pos logit contributions
 degrees+0.089
ches+0.086
 Whoever+0.085
 inc+0.085
utt+0.084
Neg logit contributions
roc-0.090
ate-0.088
 neighborhood-0.086
room-0.079
ap-0.076
Token,
Token ID11
Feature activation+2.69e-04

Pos logit contributions
roc+0.036
ate+0.034
ulture+0.032
ateful+0.031
amer+0.030
Neg logit contributions
 minutes-0.041
ches-0.037
 y-0.036
 drift-0.036
 wo-0.035
Token there
Token ID612
Feature activation-3.28e-02

Pos logit contributions
 minutes+0.000
ches+0.000
fy+0.000
 mun+0.000
 drift+0.000
Neg logit contributions
roc-0.000
ate-0.000
ulture-0.000
ateful-0.000
rot-0.000
Token was
Token ID373
Feature activation+9.14e-03

Pos logit contributions
roc+0.069
ate+0.063
ateful+0.062
amer+0.062
ulture+0.062
Neg logit contributions
 minutes-0.044
 y-0.040
ches-0.039
 drift-0.038
 mun-0.038
Token a
Token ID257
Feature activation+2.39e-02

Pos logit contributions
 minutes+0.017
ches+0.016
fy+0.016
 degrees+0.015
 drift+0.015
Neg logit contributions
roc-0.014
ate-0.014
ulture-0.012
ateful-0.012
rot-0.012
Token and
Token ID290
Feature activation+1.65e-02

Pos logit contributions
roc+0.001
ate+0.001
ulture+0.001
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token baked
Token ID22979
Feature activation+2.41e-02

Pos logit contributions
 minutes+0.023
ches+0.022
fy+0.021
 degrees+0.020
 drift+0.019
Neg logit contributions
ate-0.032
roc-0.032
ateful-0.029
man-0.028
ulture-0.028
Token it
Token ID340
Feature activation+2.67e-02

Pos logit contributions
 minutes+0.038
fy+0.036
ches+0.035
 mun+0.034
 drift+0.033
Neg logit contributions
ate-0.045
roc-0.043
man-0.038
ateful-0.037
room-0.037
Token.
Token ID13
Feature activation+1.43e-02

Pos logit contributions
 minutes+0.047
fy+0.041
ches+0.040
 Whoever+0.040
 degrees+0.040
Neg logit contributions
ate-0.047
roc-0.042
man-0.041
room-0.037
flower-0.037
Token After
Token ID2293
Feature activation+1.25e-02

Pos logit contributions
 minutes+0.029
ches+0.028
fy+0.027
 degrees+0.026
 drift+0.026
Neg logit contributions
roc-0.020
ate-0.019
ulture-0.016
 Respect-0.016
ateful-0.016
Token it
Token ID340
Feature activation+9.27e-03

Pos logit contributions
 minutes+0.020
ches+0.020
fy+0.019
 drift+0.017
 degrees+0.017
Neg logit contributions
roc-0.022
ate-0.022
ateful-0.020
ulture-0.019
man-0.019
Token was
Token ID373
Feature activation+2.76e-02

Pos logit contributions
 minutes+0.016
ches+0.015
fy+0.015
 degrees+0.014
 Whoever+0.014
Neg logit contributions
roc-0.015
ate-0.015
ulture-0.013
man-0.013
ateful-0.013
Token brown
Token ID7586
Feature activation-7.58e-03

Pos logit contributions
ches+0.039
 minutes+0.038
 Whoever+0.037
fy+0.036
 mun+0.036
Neg logit contributions
ate-0.052
roc-0.048
amed-0.047
man-0.045
ateful-0.044
Token and
Token ID290
Feature activation+2.62e-02

Pos logit contributions
roc+0.012
ate+0.011
ulture+0.010
ateful+0.010
rot+0.010
Neg logit contributions
 minutes-0.014
ches-0.013
fy-0.013
 mun-0.013
 y-0.013
Token crispy
Token ID42807
Feature activation+2.75e-04

Pos logit contributions
ches+0.036
 minutes+0.035
fy+0.032
 Whoever+0.032
 mun+0.031
Neg logit contributions
roc-0.050
ate-0.049
man-0.046
ateful-0.045
flower-0.044
Token,
Token ID11
Feature activation+6.54e-03

Pos logit contributions
 minutes+0.001
ches+0.000
fy+0.000
 mun+0.000
 drift+0.000
Neg logit contributions
roc-0.000
ate-0.000
ulture-0.000
ateful-0.000
rot-0.000
Token.
Token ID13
Feature activation+1.19e-03

Pos logit contributions
 minutes+0.013
ches+0.013
fy+0.012
 mun+0.012
 drift+0.012
Neg logit contributions
roc-0.011
ate-0.010
ulture-0.009
ateful-0.009
man-0.009
Token Happy
Token ID14628
Feature activation-2.32e-02

Pos logit contributions
 minutes+0.002
ches+0.002
fy+0.002
 mun+0.002
 drift+0.002
Neg logit contributions
roc-0.002
ate-0.002
ulture-0.002
ateful-0.002
rot-0.002
Token birthday
Token ID10955
Feature activation+1.59e-03

Pos logit contributions
roc+0.036
ate+0.031
ulture+0.030
ateful+0.029
amed+0.028
Neg logit contributions
 minutes-0.047
 y-0.041
ches-0.041
fy-0.040
 mun-0.040
Token,
Token ID11
Feature activation+2.81e-02

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.002
ate-0.002
ulture-0.002
ateful-0.002
rot-0.002
Token Tom
Token ID4186
Feature activation-7.55e-03

Pos logit contributions
 minutes+0.039
ches+0.037
fy+0.036
 mun+0.035
 degrees+0.034
Neg logit contributions
ate-0.054
roc-0.053
man-0.048
 Respect-0.047
ateful-0.047
Token."
Token ID526
Feature activation+1.29e-02

Pos logit contributions
roc+0.015
ate+0.014
ulture+0.013
ateful+0.013
rot+0.012
Neg logit contributions
 minutes-0.011
ches-0.011
fy-0.010
 mun-0.010
 drift-0.010
Token
Token ID198
Feature activation-2.61e-03

Pos logit contributions
 minutes+0.025
ches+0.024
fy+0.023
 degrees+0.023
 mun+0.023
Neg logit contributions
roc-0.019
ate-0.018
 Respect-0.016
ulture-0.016
 Fighting-0.015
Token
Token ID198
Feature activation-1.48e-02

Pos logit contributions
roc+0.004
ate+0.004
ulture+0.004
ateful+0.004
rot+0.003
Neg logit contributions
 minutes-0.005
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
TokenShe
Token ID3347
Feature activation+2.46e-02

Pos logit contributions
roc+0.027
ate+0.025
ulture+0.024
ateful+0.024
rot+0.022
Neg logit contributions
 minutes-0.024
ches-0.022
fy-0.021
 Whoever-0.021
 y-0.020
Token calls
Token ID3848
Feature activation+3.93e-03

Pos logit contributions
 minutes+0.037
fy+0.034
 degrees+0.034
ches+0.032
 pounds+0.032
Neg logit contributions
ate-0.047
roc-0.045
ateful-0.040
rot-0.040
ulture-0.040
Token Anna
Token ID11735
Feature activation-4.11e-04

Pos logit contributions
 minutes+0.005
ches+0.005
fy+0.005
 mun+0.005
 degrees+0.005
Neg logit contributions
roc-0.008
ate-0.008
ulture-0.007
ateful-0.007
man-0.007
Token the
Token ID262
Feature activation-1.11e-02

Pos logit contributions
roc+0.038
ate+0.036
ulture+0.035
rot+0.034
ateful+0.033
Neg logit contributions
 minutes-0.022
 y-0.019
ches-0.019
fy-0.019
 mun-0.018
Token mom
Token ID1995
Feature activation-5.64e-03

Pos logit contributions
roc+0.016
ate+0.015
ulture+0.013
ateful+0.013
rot+0.013
Neg logit contributions
 minutes-0.023
 y-0.021
ches-0.020
fy-0.020
 drift-0.020
Token told
Token ID1297
Feature activation+1.47e-02

Pos logit contributions
roc+0.011
ate+0.011
ulture+0.010
ateful+0.010
rot+0.010
Neg logit contributions
 minutes-0.008
ches-0.008
fy-0.007
 mun-0.007
 drift-0.007
Token them
Token ID606
Feature activation+1.93e-03

Pos logit contributions
 minutes+0.026
fy+0.025
ches+0.024
 mun+0.024
 degrees+0.023
Neg logit contributions
roc-0.025
ate-0.023
ateful-0.021
ulture-0.020
room-0.020
Token to
Token ID284
Feature activation-1.97e-02

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.004
ate-0.003
ulture-0.003
ateful-0.003
rot-0.003
Token get
Token ID651
Feature activation+5.79e-03

Pos logit contributions
roc+0.043
ate+0.040
ulture+0.038
ateful+0.038
rot+0.038
Neg logit contributions
 minutes-0.025
 drift-0.022
ches-0.022
 y-0.021
fy-0.021
Token ready
Token ID3492
Feature activation+4.55e-02

Pos logit contributions
 minutes+0.009
ches+0.009
fy+0.008
 mun+0.008
 drift+0.008
Neg logit contributions
roc-0.010
ate-0.010
ateful-0.009
ulture-0.009
man-0.009
Token.
Token ID13
Feature activation-2.08e-03

Pos logit contributions
 minutes+0.069
 mun+0.069
ches+0.068
fy+0.065
 Whoever+0.064
Neg logit contributions
ate-0.077
roc-0.069
man-0.066
ateful-0.064
room-0.063
Token Everyone
Token ID11075
Feature activation+5.50e-03

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.004
ches-0.004
fy-0.003
 mun-0.003
 drift-0.003
Token put
Token ID1234
Feature activation+9.10e-04

Pos logit contributions
 minutes+0.010
ches+0.009
fy+0.009
 drift+0.009
 degrees+0.009
Neg logit contributions
roc-0.009
ate-0.008
ulture-0.008
ateful-0.008
man-0.007
Token on
Token ID319
Feature activation+5.24e-03

Pos logit contributions
 minutes+0.002
ches+0.001
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.002
ate-0.002
ulture-0.001
ateful-0.001
rot-0.001
Token on
Token ID319
Feature activation+2.19e-02

Pos logit contributions
 minutes+0.035
fy+0.035
ches+0.035
 Whoever+0.034
 degrees+0.034
Neg logit contributions
ate-0.029
roc-0.027
amed-0.025
flower-0.025
ulture-0.024
Token her
Token ID607
Feature activation+2.99e-02

Pos logit contributions
 minutes+0.046
ches+0.044
fy+0.044
 Whoever+0.043
 degrees+0.042
Neg logit contributions
ate-0.028
roc-0.026
amed-0.021
flower-0.020
ateful-0.020
Token coat
Token ID13209
Feature activation-1.35e-02

Pos logit contributions
ches+0.044
 Whoever+0.044
 minutes+0.044
fy+0.042
 mun+0.040
Neg logit contributions
ate-0.052
roc-0.050
man-0.047
flower-0.045
amed-0.045
Token and
Token ID290
Feature activation+3.87e-02

Pos logit contributions
roc+0.026
ate+0.024
ateful+0.023
ulture+0.023
rot+0.021
Neg logit contributions
 minutes-0.021
ches-0.020
fy-0.019
 drift-0.018
 y-0.018
Token went
Token ID1816
Feature activation+3.10e-02

Pos logit contributions
 minutes+0.058
ches+0.055
 degrees+0.055
 Whoever+0.055
fy+0.053
Neg logit contributions
ate-0.066
roc-0.062
man-0.059
flower-0.058
amed-0.058
Token outside
Token ID2354
Feature activation+1.68e-02

Pos logit contributions
 minutes+0.065
 degrees+0.064
 mun+0.064
 pounds+0.063
 Whoever+0.063
Neg logit contributions
ate-0.037
man-0.031
roc-0.030
gy-0.028
amed-0.028
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
 minutes+0.029
ches+0.029
fy+0.028
 degrees+0.027
 mun+0.027
Neg logit contributions
ate-0.026
roc-0.025
man-0.022
flower-0.022
amer-0.022
Token
Token ID198
Feature activation+1.74e-03

Pos logit contributions
 minutes+0.024
ches+0.023
fy+0.022
 degrees+0.022
 mun+0.022
Neg logit contributions
roc-0.016
ate-0.016
 Fighting-0.013
ulture-0.013
 Respect-0.013
Token
Token ID198
Feature activation-1.34e-02

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 mun+0.003
 drift+0.003
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.002
ateful-0.002
rot-0.002
Token"
Token ID1
Feature activation-4.17e-03

Pos logit contributions
roc+0.028
ate+0.026
ulture+0.024
ateful+0.024
rot+0.023
Neg logit contributions
 minutes-0.020
ches-0.018
fy-0.017
 drift-0.016
 Whoever-0.016
TokenLook
Token ID8567
Feature activation+3.03e-02

Pos logit contributions
roc+0.006
ate+0.006
ulture+0.005
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.008
ches-0.008
fy-0.008
 mun-0.007
 drift-0.007
TokenWe
Token ID1135
Feature activation+6.39e-03

Pos logit contributions
roc+0.057
ate+0.054
ateful+0.049
rot+0.049
ulture+0.048
Neg logit contributions
 minutes-0.043
 Whoever-0.040
ches-0.039
fy-0.038
 tug-0.037
Token can
Token ID460
Feature activation+3.23e-03

Pos logit contributions
 minutes+0.009
ches+0.007
fy+0.007
 degrees+0.007
 mun+0.007
Neg logit contributions
roc-0.014
ate-0.013
ulture-0.012
ateful-0.012
rot-0.012
Token't
Token ID470
Feature activation+1.89e-02

Pos logit contributions
 minutes+0.005
ches+0.005
fy+0.004
 mun+0.004
 degrees+0.004
Neg logit contributions
roc-0.006
ate-0.006
ulture-0.005
ateful-0.005
rot-0.005
Token run
Token ID1057
Feature activation+4.64e-03

Pos logit contributions
 minutes+0.032
 degrees+0.028
fy+0.027
ches+0.027
 mun+0.027
Neg logit contributions
roc-0.034
ate-0.032
ulture-0.030
 judge-0.029
room-0.029
Token on
Token ID319
Feature activation+4.61e-03

Pos logit contributions
 minutes+0.007
ches+0.006
 mun+0.006
fy+0.006
 drift+0.006
Neg logit contributions
roc-0.009
ate-0.008
ulture-0.008
ateful-0.008
rot-0.007
Token the
Token ID262
Feature activation+3.61e-02

Pos logit contributions
 minutes+0.008
ches+0.007
fy+0.007
 mun+0.007
 drift+0.007
Neg logit contributions
roc-0.008
ate-0.008
ateful-0.007
ulture-0.007
rot-0.006
Token track
Token ID2610
Feature activation-7.91e-03

Pos logit contributions
 mun+0.082
 minutes+0.080
fy+0.077
 degrees+0.075
 drift+0.074
Neg logit contributions
roc-0.042
ate-0.037
man-0.032
room-0.031
 neighborhood-0.031
Token.
Token ID13
Feature activation-4.63e-03

Pos logit contributions
roc+0.016
ate+0.015
ulture+0.014
ateful+0.014
rot+0.014
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.010
 mun-0.010
 drift-0.010
Token It
Token ID632
Feature activation+2.32e-02

Pos logit contributions
roc+0.009
ate+0.009
ulture+0.008
ateful+0.008
rot+0.008
Neg logit contributions
 minutes-0.007
ches-0.006
fy-0.006
 mun-0.006
 drift-0.006
Token's
Token ID338
Feature activation+4.17e-02

Pos logit contributions
 minutes+0.048
 mun+0.043
fy+0.042
 degrees+0.042
 syrup+0.041
Neg logit contributions
roc-0.034
ate-0.031
man-0.028
ulture-0.028
gy-0.026
Token for
Token ID329
Feature activation+4.68e-03

Pos logit contributions
 mun+0.073
 minutes+0.067
fy+0.066
 slur+0.065
 degrees+0.065
Neg logit contributions
roc-0.063
ate-0.063
 Fighting-0.058
gy-0.055
 wing-0.055
Token Billy
Token ID15890
Feature activation-1.59e-03

Pos logit contributions
roc+0.081
rot+0.080
ulture+0.080
ate+0.078
amer+0.076
Neg logit contributions
 y-0.052
 drift-0.051
 minutes-0.050
 wo-0.047
 tug-0.046
Token looked
Token ID3114
Feature activation+3.15e-03

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.003
ches-0.002
fy-0.002
 mun-0.002
 drift-0.002
Token at
Token ID379
Feature activation-2.01e-02

Pos logit contributions
 minutes+0.005
ches+0.005
 mun+0.005
fy+0.005
 drift+0.005
Neg logit contributions
roc-0.006
ate-0.005
ateful-0.005
ulture-0.005
rot-0.005
Token each
Token ID1123
Feature activation+1.14e-03

Pos logit contributions
roc+0.041
ate+0.039
rot+0.037
ulture+0.037
amer+0.035
Neg logit contributions
 minutes-0.028
ches-0.026
 y-0.026
 drift-0.024
fy-0.024
Token other
Token ID584
Feature activation-1.77e-03

Pos logit contributions
 minutes+0.002
ches+0.002
fy+0.002
 mun+0.002
 drift+0.002
Neg logit contributions
roc-0.002
ate-0.002
ulture-0.002
ateful-0.002
rot-0.002
Token in
Token ID287
Feature activation-2.56e-02

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.003
ches-0.003
fy-0.003
 mun-0.003
 drift-0.003
Token fear
Token ID3252
Feature activation+1.56e-02

Pos logit contributions
roc+0.053
ate+0.052
ulture+0.050
rot+0.048
amer+0.047
Neg logit contributions
 minutes-0.040
 y-0.034
 tug-0.033
 drift-0.032
 Whoever-0.030
Token.
Token ID13
Feature activation-2.41e-02

Pos logit contributions
 minutes+0.026
 mun+0.024
ches+0.024
fy+0.023
 degrees+0.023
Neg logit contributions
roc-0.025
ate-0.025
man-0.023
ateful-0.023
ulture-0.022
Token "
Token ID366
Feature activation-1.52e-02

Pos logit contributions
roc+0.049
ate+0.047
ulture+0.046
rot+0.045
amer+0.045
Neg logit contributions
 minutes-0.033
 Whoever-0.031
 y-0.030
ches-0.029
 drift-0.028
TokenWhat
Token ID2061
Feature activation-2.10e-02

Pos logit contributions
roc+0.027
ate+0.026
ulture+0.024
ateful+0.023
rot+0.023
Neg logit contributions
 minutes-0.025
fy-0.022
ches-0.022
 Whoever-0.022
 y-0.022
Token's
Token ID338
Feature activation+1.16e-02

Pos logit contributions
roc+0.040
ate+0.038
ulture+0.036
amer+0.035
ateful+0.034
Neg logit contributions
 minutes-0.032
 y-0.031
ches-0.029
fy-0.028
 drift-0.027
Token.
Token ID13
Feature activation+4.01e-03

Pos logit contributions
 minutes+0.014
ches+0.013
fy+0.013
 mun+0.012
 degrees+0.012
Neg logit contributions
roc-0.017
ate-0.017
ateful-0.015
ulture-0.015
rot-0.015
Token But
Token ID887
Feature activation-8.55e-03

Pos logit contributions
 minutes+0.006
ches+0.006
fy+0.006
 drift+0.005
 degrees+0.005
Neg logit contributions
roc-0.008
ate-0.007
ulture-0.007
ateful-0.007
man-0.006
Token he
Token ID339
Feature activation+1.22e-02

Pos logit contributions
roc+0.019
ate+0.018
ulture+0.017
ateful+0.017
rot+0.017
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.009
 mun-0.009
 drift-0.009
Token does
Token ID857
Feature activation+1.60e-02

Pos logit contributions
 minutes+0.021
fy+0.019
 drift+0.018
ches+0.018
 degrees+0.018
Neg logit contributions
roc-0.022
ate-0.021
ateful-0.019
ulture-0.019
rot-0.018
Token not
Token ID407
Feature activation+4.29e-03

Pos logit contributions
 minutes+0.031
fy+0.029
ches+0.028
 mun+0.028
 degrees+0.027
Neg logit contributions
ate-0.024
roc-0.024
ateful-0.021
 Fighting-0.019
 Respect-0.019
Token share
Token ID2648
Feature activation-1.55e-02

Pos logit contributions
 minutes+0.006
ches+0.006
fy+0.006
 degrees+0.005
 mun+0.005
Neg logit contributions
roc-0.009
ate-0.008
ulture-0.007
ateful-0.007
man-0.007
Token his
Token ID465
Feature activation-1.07e-02

Pos logit contributions
roc+0.032
ate+0.030
ulture+0.029
ateful+0.028
rot+0.027
Neg logit contributions
 minutes-0.022
ches-0.020
 y-0.019
fy-0.019
 drift-0.018
Token ball
Token ID2613
Feature activation-7.84e-03

Pos logit contributions
roc+0.019
ate+0.018
ulture+0.017
ateful+0.017
amer+0.016
Neg logit contributions
 minutes-0.018
ches-0.016
fy-0.016
 y-0.016
 drift-0.015
Token.
Token ID13
Feature activation+3.76e-03

Pos logit contributions
roc+0.014
ate+0.014
ulture+0.013
ateful+0.012
rot+0.012
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.011
 mun-0.011
 y-0.011
Token He
Token ID679
Feature activation+3.51e-02

Pos logit contributions
 minutes+0.007
ches+0.006
fy+0.006
 drift+0.006
 mun+0.006
Neg logit contributions
roc-0.006
ate-0.006
ulture-0.005
ateful-0.005
rot-0.005
Token says
Token ID1139
Feature activation-1.60e-02

Pos logit contributions
 minutes+0.058
 degrees+0.050
fy+0.050
 drift+0.046
 mun+0.046
Neg logit contributions
roc-0.065
ate-0.062
ateful-0.054
ulture-0.054
rot-0.053
Token that
Token ID326
Feature activation-8.43e-03

Pos logit contributions
 minutes+0.016
ches+0.014
fy+0.014
 drift+0.014
 mun+0.014
Neg logit contributions
roc-0.017
ate-0.017
ulture-0.015
man-0.015
rot-0.015
Token could
Token ID714
Feature activation-3.28e-03

Pos logit contributions
roc+0.018
ate+0.017
ulture+0.016
ateful+0.016
rot+0.015
Neg logit contributions
 minutes-0.012
ches-0.011
fy-0.010
 mun-0.010
 y-0.010
Token print
Token ID3601
Feature activation+4.47e-04

Pos logit contributions
roc+0.007
ate+0.006
ulture+0.006
ateful+0.006
rot+0.006
Neg logit contributions
 minutes-0.005
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token.
Token ID13
Feature activation-1.26e-02

Pos logit contributions
 minutes+0.001
ches+0.001
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.001
ate-0.001
ulture-0.001
ateful-0.001
rot-0.001
Token Sammy
Token ID44067
Feature activation-1.99e-02

Pos logit contributions
roc+0.024
ate+0.023
ateful+0.022
ulture+0.022
rot+0.021
Neg logit contributions
 minutes-0.019
ches-0.017
 Whoever-0.017
fy-0.017
 mun-0.016
Token was
Token ID373
Feature activation-1.68e-02

Pos logit contributions
roc+0.039
ate+0.036
ulture+0.035
ateful+0.034
amer+0.033
Neg logit contributions
 minutes-0.031
ches-0.029
 mun-0.027
fy-0.027
 y-0.027
Token very
Token ID845
Feature activation-1.56e-02

Pos logit contributions
roc+0.039
ate+0.036
ulture+0.036
ateful+0.033
rot+0.033
Neg logit contributions
 minutes-0.021
 y-0.018
ches-0.018
fy-0.017
 mun-0.017
Token excited
Token ID6568
Feature activation+1.51e-03

Pos logit contributions
roc+0.033
ate+0.030
ulture+0.030
rot+0.028
anish+0.028
Neg logit contributions
 minutes-0.023
 y-0.021
fy-0.020
 drift-0.019
 mun-0.019
Token and
Token ID290
Feature activation+1.54e-02

Pos logit contributions
 minutes+0.002
ches+0.002
fy+0.002
 mun+0.002
 drift+0.002
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.002
ateful-0.002
man-0.002
Token he
Token ID339
Feature activation+1.13e-02

Pos logit contributions
 minutes+0.022
ches+0.021
 degrees+0.019
fy+0.019
 drift+0.018
Neg logit contributions
roc-0.029
ate-0.029
ateful-0.026
man-0.026
ulture-0.025
Token immediately
Token ID3393
Feature activation+1.90e-02

Pos logit contributions
 minutes+0.018
ches+0.016
fy+0.015
 degrees+0.015
 drift+0.015
Neg logit contributions
roc-0.021
ate-0.020
ulture-0.018
ateful-0.018
man-0.018
Token on
Token ID319
Feature activation-6.94e-03

Pos logit contributions
roc+0.018
ate+0.016
ulture+0.015
ateful+0.015
rot+0.015
Neg logit contributions
 minutes-0.016
ches-0.015
fy-0.014
 drift-0.014
 mun-0.014
Token a
Token ID257
Feature activation+3.89e-02

Pos logit contributions
roc+0.013
ate+0.012
ulture+0.011
ateful+0.011
rot+0.011
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.010
 drift-0.010
 y-0.010
Token boat
Token ID8848
Feature activation-1.29e-02

Pos logit contributions
 minutes+0.068
ches+0.064
 mun+0.063
 degrees+0.063
fy+0.061
Neg logit contributions
roc-0.058
ate-0.053
rot-0.052
man-0.052
 nation-0.050
Token and
Token ID290
Feature activation+1.37e-02

Pos logit contributions
roc+0.026
ate+0.025
ateful+0.023
ulture+0.023
rot+0.022
Neg logit contributions
 minutes-0.019
ches-0.017
fy-0.017
 mun-0.016
 drift-0.016
Token see
Token ID766
Feature activation+2.38e-02

Pos logit contributions
 minutes+0.019
ches+0.017
fy+0.016
 degrees+0.016
 mun+0.015
Neg logit contributions
roc-0.028
ate-0.028
ulture-0.025
ateful-0.025
man-0.024
Token where
Token ID810
Feature activation-1.34e-03

Pos logit contributions
 minutes+0.040
fy+0.039
ches+0.039
 mun+0.037
 drift+0.036
Neg logit contributions
ate-0.043
roc-0.040
ateful-0.036
rot-0.034
 Respect-0.034
Token it
Token ID340
Feature activation+2.94e-02

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.001
ches-0.001
fy-0.001
 mun-0.001
 drift-0.001
Token takes
Token ID2753
Feature activation+1.36e-02

Pos logit contributions
 minutes+0.050
fy+0.045
 mun+0.043
 degrees+0.043
ches+0.042
Neg logit contributions
ate-0.052
roc-0.049
man-0.044
ulture-0.043
rot-0.043
Token us
Token ID514
Feature activation-1.05e-02

Pos logit contributions
 minutes+0.019
ches+0.018
fy+0.018
 mun+0.018
 degrees+0.017
Neg logit contributions
roc-0.026
ate-0.026
ulture-0.022
ateful-0.022
rot-0.022
Token."
Token ID526
Feature activation-2.65e-03

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.019
ateful+0.019
rot+0.019
Neg logit contributions
 minutes-0.015
ches-0.013
fy-0.013
 drift-0.012
 y-0.012
Token He
Token ID679
Feature activation+2.31e-02

Pos logit contributions
roc+0.005
ate+0.004
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.005
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token.
Token ID13
Feature activation-6.22e-03

Pos logit contributions
 minutes+0.015
ches+0.015
fy+0.015
 degrees+0.014
 drift+0.013
Neg logit contributions
roc-0.017
ate-0.017
ateful-0.015
ulture-0.015
man-0.015
Token They
Token ID1119
Feature activation+2.09e-02

Pos logit contributions
roc+0.011
ate+0.010
ulture+0.009
ateful+0.009
rot+0.009
Neg logit contributions
 minutes-0.011
ches-0.010
fy-0.010
 drift-0.010
 mun-0.010
Token put
Token ID1234
Feature activation+7.27e-03

Pos logit contributions
 minutes+0.040
 degrees+0.035
ches+0.034
 pounds+0.033
fy+0.033
Neg logit contributions
roc-0.034
ate-0.033
ulture-0.030
ateful-0.028
rot-0.028
Token on
Token ID319
Feature activation-3.32e-03

Pos logit contributions
 minutes+0.013
ches+0.012
fy+0.012
 mun+0.011
 degrees+0.011
Neg logit contributions
roc-0.012
ate-0.012
ulture-0.011
ateful-0.010
man-0.010
Token the
Token ID262
Feature activation+2.61e-02

Pos logit contributions
roc+0.005
ate+0.005
ulture+0.004
ateful+0.004
rot+0.004
Neg logit contributions
 minutes-0.006
ches-0.006
fy-0.006
 mun-0.006
 drift-0.006
Token gloves
Token ID18051
Feature activation-1.94e-02

Pos logit contributions
 minutes+0.046
 mun+0.043
ches+0.043
fy+0.042
 Whoever+0.042
Neg logit contributions
roc-0.039
ate-0.038
ulture-0.036
 neighborhood-0.036
rot-0.036
Token and
Token ID290
Feature activation+4.06e-02

Pos logit contributions
roc+0.040
ate+0.037
ateful+0.034
ulture+0.033
anish+0.032
Neg logit contributions
 minutes-0.028
ches-0.027
fy-0.025
 mun-0.024
 drift-0.024
Token boots
Token ID14412
Feature activation+6.72e-03

Pos logit contributions
 minutes+0.062
 degrees+0.058
 pounds+0.056
ches+0.056
 Whoever+0.055
Neg logit contributions
ate-0.070
roc-0.068
man-0.059
ulture-0.058
flower-0.058
Token.
Token ID13
Feature activation+8.28e-04

Pos logit contributions
 minutes+0.011
ches+0.010
fy+0.010
 mun+0.009
 degrees+0.009
Neg logit contributions
roc-0.012
ate-0.012
ulture-0.011
ateful-0.010
man-0.010
Token They
Token ID1119
Feature activation+2.91e-02

Pos logit contributions
 minutes+0.002
ches+0.001
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.001
ate-0.001
ulture-0.001
ateful-0.001
rot-0.001
Token feel
Token ID1254
Feature activation-1.24e-02

Pos logit contributions
 minutes+0.053
 degrees+0.046
ches+0.044
 pounds+0.044
fy+0.043
Neg logit contributions
roc-0.049
ate-0.048
ulture-0.044
ateful-0.041
 Respect-0.041
Token.
Token ID13
Feature activation-1.25e-02

Pos logit contributions
roc+0.000
ate+0.000
ulture+0.000
ateful+0.000
rot+0.000
Neg logit contributions
 minutes-0.000
ches-0.000
fy-0.000
 drift-0.000
 mun-0.000
Token He
Token ID679
Feature activation-3.99e-03

Pos logit contributions
roc+0.020
ate+0.019
ulture+0.017
ateful+0.017
rot+0.016
Neg logit contributions
 minutes-0.024
ches-0.022
fy-0.021
 drift-0.021
 mun-0.021
Token lived
Token ID5615
Feature activation-2.80e-02

Pos logit contributions
roc+0.006
ate+0.006
ulture+0.005
ateful+0.005
rot+0.005
Neg logit contributions
 minutes-0.008
ches-0.007
fy-0.007
 mun-0.007
 drift-0.007
Token in
Token ID287
Feature activation-7.75e-03

Pos logit contributions
roc+0.061
ulture+0.054
ate+0.053
ateful+0.052
rot+0.050
Neg logit contributions
 minutes-0.038
 y-0.035
ches-0.033
 mun-0.032
 tug-0.030
Token a
Token ID257
Feature activation-3.22e-02

Pos logit contributions
roc+0.014
ate+0.014
ulture+0.013
ateful+0.012
rot+0.012
Neg logit contributions
 minutes-0.013
ches-0.011
fy-0.011
 y-0.011
 drift-0.011
Token big
Token ID1263
Feature activation-5.62e-02

Pos logit contributions
roc+0.070
ate+0.067
ulture+0.065
ateful+0.065
anish+0.064
Neg logit contributions
 minutes-0.042
 y-0.039
ches-0.035
fy-0.035
 drift-0.033
Token house
Token ID2156
Feature activation-2.21e-02

Pos logit contributions
ate+0.104
roc+0.102
ateful+0.098
ulture+0.097
anish+0.097
Neg logit contributions
 minutes-0.092
 y-0.086
fy-0.078
 slippery-0.076
 hay-0.076
Token with
Token ID351
Feature activation-2.90e-02

Pos logit contributions
roc+0.034
ate+0.029
ulture+0.028
ateful+0.027
rot+0.027
Neg logit contributions
 minutes-0.045
ches-0.040
 y-0.039
 mun-0.039
fy-0.039
Token a
Token ID257
Feature activation-1.94e-02

Pos logit contributions
roc+0.056
ulture+0.053
ate+0.053
rot+0.051
apper+0.050
Neg logit contributions
 y-0.046
 minutes-0.044
 mun-0.038
 drift-0.038
 marsh-0.038
Token big
Token ID1263
Feature activation-3.57e-02

Pos logit contributions
roc+0.039
ate+0.038
ulture+0.035
ateful+0.034
amer+0.033
Neg logit contributions
 minutes-0.028
 y-0.026
fy-0.025
ches-0.025
 drift-0.024
Token gate
Token ID8946
Feature activation+6.74e-03

Pos logit contributions
ate+0.067
roc+0.066
ateful+0.061
ulture+0.060
amer+0.059
Neg logit contributions
 minutes-0.057
 y-0.054
 wo-0.052
 drift-0.051
fy-0.050
Token to
Token ID284
Feature activation-2.39e-02

Pos logit contributions
 minutes+0.004
ches+0.004
fy+0.004
 mun+0.004
 drift+0.004
Neg logit contributions
roc-0.004
ate-0.004
ulture-0.003
ateful-0.003
rot-0.003
Token lose
Token ID4425
Feature activation-1.49e-02

Pos logit contributions
roc+0.048
ate+0.046
ulture+0.044
amed+0.043
rot+0.043
Neg logit contributions
 minutes-0.032
ches-0.031
 drift-0.031
 y-0.030
fy-0.030
Token him
Token ID683
Feature activation+2.69e-03

Pos logit contributions
roc+0.027
ate+0.026
ulture+0.024
ateful+0.023
rot+0.023
Neg logit contributions
 minutes-0.025
ches-0.023
 y-0.022
fy-0.022
 drift-0.021
Token.
Token ID13
Feature activation-1.77e-02

Pos logit contributions
 minutes+0.004
ches+0.004
fy+0.004
 mun+0.004
 drift+0.004
Neg logit contributions
roc-0.005
ate-0.005
ulture-0.005
ateful-0.005
rot-0.004
Token She
Token ID1375
Feature activation+3.36e-02

Pos logit contributions
roc+0.032
ate+0.031
ulture+0.029
ateful+0.028
rot+0.028
Neg logit contributions
 minutes-0.029
ches-0.027
 Whoever-0.026
fy-0.026
 y-0.025
Token said
Token ID531
Feature activation-5.78e-02

Pos logit contributions
 minutes+0.059
 degrees+0.055
 pounds+0.053
 drift+0.053
fy+0.052
Neg logit contributions
roc-0.056
ate-0.051
ateful-0.046
man-0.045
 judge-0.045
Token,
Token ID11
Feature activation+6.28e-03

Pos logit contributions
ate+0.129
roc+0.120
ulture+0.115
amer+0.115
amed+0.111
Neg logit contributions
 wo-0.078
 y-0.077
 minutes-0.076
ches-0.069
 maybe-0.068
Token "
Token ID366
Feature activation-2.32e-02

Pos logit contributions
 minutes+0.011
fy+0.010
ches+0.010
 mun+0.010
 drift+0.010
Neg logit contributions
roc-0.010
ate-0.010
ulture-0.009
ateful-0.009
man-0.009
TokenNo
Token ID2949
Feature activation+7.00e-03

Pos logit contributions
roc+0.046
ate+0.044
ateful+0.041
ulture+0.039
ums+0.039
Neg logit contributions
 minutes-0.035
ches-0.033
 tug-0.031
 Whoever-0.030
fy-0.030
Token,
Token ID11
Feature activation+1.93e-02

Pos logit contributions
 minutes+0.016
ches+0.015
 mun+0.015
fy+0.015
 drift+0.015
Neg logit contributions
roc-0.008
ate-0.007
ulture-0.006
anish-0.006
man-0.006
Token Max
Token ID5436
Feature activation-5.17e-03

Pos logit contributions
 minutes+0.032
 mun+0.029
fy+0.029
ches+0.029
 degrees+0.029
Neg logit contributions
roc-0.033
ate-0.029
man-0.029
ulture-0.029
anish-0.029
Token.
Token ID13
Feature activation+1.88e-04

Pos logit contributions
 minutes+0.018
ches+0.016
fy+0.016
 mun+0.015
 degrees+0.015
Neg logit contributions
roc-0.019
ate-0.019
ulture-0.017
man-0.017
ateful-0.017
Token\n
Token ID198
Feature activation+1.91e-02

Pos logit contributions
 minutes+0.000
ches+0.000
fy+0.000
 mun+0.000
 drift+0.000
Neg logit contributions
roc-0.000
ate-0.000
ulture-0.000
ateful-0.000
rot-0.000
Token\n
Token ID198
Feature activation+6.54e-04

Pos logit contributions
 minutes+0.041
fy+0.038
 degrees+0.037
ches+0.037
 mun+0.037
Neg logit contributions
ate-0.025
roc-0.024
ulture-0.021
man-0.020
rot-0.019
TokenAt
Token ID2953
Feature activation+1.24e-02

Pos logit contributions
 minutes+0.001
ches+0.001
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.001
ate-0.001
ulture-0.001
ateful-0.001
rot-0.001
Token the
Token ID262
Feature activation-1.90e-03

Pos logit contributions
 minutes+0.023
ches+0.022
fy+0.022
 degrees+0.021
 mun+0.021
Neg logit contributions
roc-0.019
ate-0.019
ateful-0.016
man-0.016
ulture-0.016
Token market
Token ID1910
Feature activation-5.06e-02

Pos logit contributions
roc+0.003
ate+0.002
ulture+0.002
ateful+0.002
rot+0.002
Neg logit contributions
 minutes-0.004
ches-0.004
fy-0.004
 mun-0.004
 drift-0.004
Token,
Token ID11
Feature activation+7.60e-03

Pos logit contributions
roc+0.097
ulture+0.090
ateful+0.088
ate+0.087
amer+0.085
Neg logit contributions
 minutes-0.078
 y-0.071
 mun-0.065
ches-0.064
 until-0.064
Token Pe
Token ID2631
Feature activation+1.79e-03

Pos logit contributions
 minutes+0.011
ches+0.010
fy+0.010
 mun+0.009
 drift+0.009
Neg logit contributions
roc-0.015
ate-0.015
ateful-0.014
ulture-0.013
rot-0.013
Tokenppa
Token ID44989
Feature activation-3.34e-02

Pos logit contributions
 minutes+0.003
ches+0.003
fy+0.003
 drift+0.003
 mun+0.003
Neg logit contributions
roc-0.003
ate-0.003
ulture-0.003
ateful-0.002
rot-0.002
Token saw
Token ID2497
Feature activation-3.62e-02

Pos logit contributions
roc+0.066
ate+0.059
ateful+0.059
ulture+0.058
anish+0.056
Neg logit contributions
 minutes-0.050
ches-0.049
 mun-0.046
 y-0.044
 drift-0.044
Token a
Token ID257
Feature activation-1.30e-02

Pos logit contributions
roc+0.083
ate+0.076
ulture+0.076
anish+0.074
apper+0.073
Neg logit contributions
 y-0.045
 minutes-0.043
 marsh-0.036
 honey-0.036
 mun-0.035
TokenThe
Token ID464
Feature activation-2.04e-02

Pos logit contributions
roc+0.020
ate+0.019
ulture+0.017
ateful+0.017
man+0.017
Neg logit contributions
 minutes-0.024
ches-0.022
fy-0.022
 Whoever-0.021
 mun-0.021
Token people
Token ID661
Feature activation-2.04e-02

Pos logit contributions
roc+0.030
ate+0.028
ulture+0.025
ateful+0.024
anish+0.024
Neg logit contributions
 minutes-0.041
 y-0.038
ches-0.036
 drift-0.036
fy-0.036
Token on
Token ID319
Feature activation-1.49e-02

Pos logit contributions
roc+0.042
ate+0.040
ulture+0.037
ateful+0.036
rot+0.035
Neg logit contributions
 minutes-0.030
ches-0.027
fy-0.025
 drift-0.025
 y-0.025
Token the
Token ID262
Feature activation-1.39e-02

Pos logit contributions
roc+0.030
ate+0.028
ulture+0.026
ateful+0.026
rot+0.025
Neg logit contributions
 minutes-0.023
ches-0.021
 drift-0.019
 y-0.019
fy-0.019
Token Kind
Token ID14927
Feature activation-3.56e-02

Pos logit contributions
roc+0.022
ate+0.021
ulture+0.019
ateful+0.019
rot+0.018
Neg logit contributions
 minutes-0.027
ches-0.025
 y-0.024
 drift-0.024
fy-0.024
Token Island
Token ID5451
Feature activation-5.32e-02

Pos logit contributions
roc+0.065
ate+0.061
ulture+0.056
ateful+0.055
anish+0.054
Neg logit contributions
 minutes-0.058
ches-0.053
fy-0.052
 drift-0.051
 degrees-0.050
Token wanted
Token ID2227
Feature activation-7.68e-03

Pos logit contributions
roc+0.106
ate+0.100
ateful+0.092
ulture+0.089
amed+0.086
Neg logit contributions
 minutes-0.079
ches-0.071
 y-0.069
 mun-0.068
fy-0.066
Token to
Token ID284
Feature activation-3.24e-02

Pos logit contributions
roc+0.014
ate+0.013
ulture+0.012
ateful+0.012
rot+0.012
Neg logit contributions
 minutes-0.013
ches-0.012
fy-0.011
 drift-0.011
 y-0.011
Token help
Token ID1037
Feature activation+2.50e-03

Pos logit contributions
roc+0.071
ate+0.066
ateful+0.064
ulture+0.064
rot+0.062
Neg logit contributions
 minutes-0.038
 drift-0.037
ches-0.036
 y-0.035
 Whoever-0.034
Token,
Token ID11
Feature activation+2.30e-02

Pos logit contributions
 minutes+0.004
ches+0.004
fy+0.004
 mun+0.004
 drift+0.004
Neg logit contributions
roc-0.004
ate-0.004
ulture-0.004
ateful-0.004
rot-0.004
Token so
Token ID523
Feature activation+4.20e-03

Pos logit contributions
 minutes+0.032
ches+0.032
fy+0.030
 mun+0.029
 degrees+0.029
Neg logit contributions
roc-0.044
ate-0.041
ateful-0.038
ulture-0.038
 Respect-0.037
Token\n
Token ID198
Feature activation-1.37e-02

Pos logit contributions
 minutes+0.001
ches+0.001
fy+0.001
 mun+0.001
 drift+0.001
Neg logit contributions
roc-0.001
ate-0.001
ulture-0.001
ateful-0.001
rot-0.001
Token"
Token ID1
Feature activation-2.08e-03

Pos logit contributions
roc+0.022
ate+0.020
ulture+0.018
ateful+0.018
man+0.017
Neg logit contributions
 minutes-0.026
ches-0.024
fy-0.023
 Whoever-0.023
 drift-0.023
TokenIs
Token ID3792
Feature activation+3.08e-02

Pos logit contributions
roc+0.003
ate+0.003
ulture+0.003
ateful+0.003
rot+0.003
Neg logit contributions
 minutes-0.004
ches-0.004
fy-0.003
 mun-0.003
 drift-0.003
Token your
Token ID534
Feature activation-1.04e-02

Pos logit contributions
 minutes+0.049
fy+0.048
 drift+0.046
 mun+0.045
ches+0.045
Neg logit contributions
roc-0.055
ate-0.052
ateful-0.048
rot-0.046
ulture-0.044
Token name
Token ID1438
Feature activation-5.11e-02

Pos logit contributions
roc+0.017
ate+0.017
ulture+0.015
ateful+0.015
amer+0.014
Neg logit contributions
 minutes-0.019
ches-0.017
 y-0.017
fy-0.017
 mun-0.016
Token Red
Token ID2297
Feature activation-5.85e-02

Pos logit contributions
roc+0.103
ateful+0.092
ate+0.090
ulture+0.089
apper+0.087
Neg logit contributions
 y-0.073
ches-0.072
 minutes-0.071
 wo-0.066
fy-0.062
Token?"
Token ID1701
Feature activation-3.98e-04

Pos logit contributions
roc+0.106
ulture+0.098
ateful+0.097
amed+0.096
ate+0.096
Neg logit contributions
ches-0.093
 y-0.083
 minutes-0.078
 wo-0.078
fy-0.077
Token the
Token ID262
Feature activation-1.30e-02

Pos logit contributions
roc+0.001
ate+0.001
ulture+0.001
ateful+0.001
rot+0.001
Neg logit contributions
 minutes-0.001
ches-0.000
fy-0.000
 mun-0.000
 drift-0.000
Token boy
Token ID2933
Feature activation+1.55e-02

Pos logit contributions
roc+0.020
ate+0.019
ulture+0.017
ateful+0.017
amer+0.016
Neg logit contributions
 minutes-0.025
ches-0.023
 y-0.023
fy-0.022
 drift-0.022
Token asked
Token ID1965
Feature activation-1.20e-02

Pos logit contributions
 minutes+0.026
ches+0.024
fy+0.024
 degrees+0.023
 drift+0.023
Neg logit contributions
roc-0.027
ate-0.025
ateful-0.023
man-0.022
rot-0.022
Token Sam
Token ID3409
Feature activation+3.08e-03

Pos logit contributions
roc+0.027
ate+0.025
ulture+0.024
ateful+0.023
rot+0.023
Neg logit contributions
 minutes-0.015
ches-0.014
fy-0.013
 drift-0.013
 y-0.012