TOP ACTIVATIONS
MAX = 0.089

They both skipped and jumped mer rily as they made their
dolls and pretend they were mer maid s or pirates .
bed . She dreamed of mer maid s and pirates ,
He would jump in p udd les and make a big
She would jump in p udd les and catch rain drops
He would jump in p udd les and get very wet
loved to jump in p udd les and splash around .
S ki pping mer rily along , Hawk came
loved to jump in p udd les and make big spl
not to jump in p udd les with her clean bl
Let 's pretend we are mer maid s and swim in
He didn 't want to march with his friends anymore .
She said , " C udd les , come here !
man said , " What seems to be the problem ?"
outside and jumping in p udd les . Whenever Sally went
liked to jump in p udd les . She wore her
he can jump in p udd les without getting wet or
is that even if something seems impossible , it 's always
of the story is : Acc idents happen , but we
would jump in the p udd les and get all wet

MIN ACTIVATIONS
MIN = -0.092

as he opened the ch ocol ates and started eating them
men ." He gives them each a sticker and a candy
a big box of ch ocol ates ! <|endoftext|> Once upon
face and her special black tray in her hand . <|endoftext|>
hugged them and gave them each a balloon . He wink
He gave them each a hug and a kiss
cr ay on and a paper and helped him draw a
the necklace and the magn ifying glass and told him that
the plastic pizza and a plastic plate and fork . Ben
named Fl uffy and a ham ster named Nib bles .
table . Mom gave them each a bowl of grapes .
plastic pizza and a plastic plate and fork . Ben takes
- aid and a new teaspoon . This time Winn ie
. The dog bar ks . It sounds like
mom smiled and gave them each a kiss . She was
Mom gives them each a plate . They pour syrup and
She gave Tom and Lily each a slice . They took
was a box of ch ocol ates . Lily loved ch
a big hat and a cane too . "
a big box of ch ocol ates . It was so

INTERVAL 0.044 - 0.089
CONTAINS 0.586%

trying , he could do anything he wanted . <|endoftext|> Once
. When Rachel arrived , she felt the
, she saw a big , red truck . It was
. L ila loved to go on walks in the woods
he was never scared or worried . Instead , he welcomed

INTERVAL -0.001 - 0.044
CONTAINS 54.515%

away from it , Emily ." Emily backed away , but
, she used her twist one last time and the wall
break for it . He ran back to the rock and
the big house . They liked to explore the rooms and
started to cry . She wanted to go back inside but

INTERVAL -0.047 - -0.001
CONTAINS 44.453%

Lily and drove them home . They had a warm bath
on the fire and cracked an egg on the edge of
" No more sword and no more doll ," she said
, and news , and a pink t eddy bear .
auc ets . Some were old and rusty , but others

INTERVAL -0.092 - -0.047
CONTAINS 0.445%

them crying . " What happened , my loves
her the special , original gift . From then on ,
R ani smiled and hugged each other . They loved to
are sorry , teacher . We will learn , teacher .
!" Mia says . " I did not mean to break
TokenThey
Token ID2990
Feature activation-4.01e-03

Pos logit contributions
 slices+0.005
tr+0.005
 ig+0.005
 handcuffs+0.005
 bisc+0.005
Neg logit contributions
 closely-0.006
ivated-0.006
ize-0.006
uring-0.006
ya-0.006
Token both
Token ID1111
Feature activation-1.41e-02

Pos logit contributions
tr+0.008
aid+0.008
oc+0.008
ests+0.008
 slices+0.008
Neg logit contributions
 closely-0.005
ya-0.004
 Jackie-0.004
ivated-0.004
 up-0.004
Token skipped
Token ID26684
Feature activation+1.54e-02

Pos logit contributions
aid+0.028
tr+0.028
 slices+0.027
ests+0.027
 carriers+0.026
Neg logit contributions
 closely-0.019
ya-0.016
ivated-0.016
 up-0.015
 Jackie-0.015
Token and
Token ID290
Feature activation+3.93e-03

Pos logit contributions
 closely+0.021
ivated+0.019
uring+0.018
ize+0.018
itionally+0.018
Neg logit contributions
aid-0.030
 slices-0.029
tr-0.028
 half-0.027
 bisc-0.027
Token jumped
Token ID11687
Feature activation+3.62e-02

Pos logit contributions
 closely+0.005
 Jackie+0.004
 up+0.004
ya+0.004
ivated+0.004
Neg logit contributions
aid-0.008
tr-0.008
 slices-0.007
oc-0.007
ests-0.007
Token mer
Token ID4017
Feature activation+8.93e-02

Pos logit contributions
ivated+0.049
 closely+0.049
itionally+0.049
ize+0.047
ya+0.047
Neg logit contributions
aid-0.064
 slices-0.064
tr-0.061
 half-0.059
 bisc-0.058
Tokenrily
Token ID28904
Feature activation-1.57e-02

Pos logit contributions
 Jackie+0.161
 daunting+0.159
 impossible+0.155
 lookout+0.154
working+0.152
Neg logit contributions
 carriers-0.099
 slices-0.098
tr-0.096
pter-0.093
 slice-0.093
Token as
Token ID355
Feature activation-1.49e-03

Pos logit contributions
aid+0.031
 slices+0.029
tr+0.029
oc+0.027
 half+0.027
Neg logit contributions
 closely-0.021
ivated-0.019
uring-0.019
 up-0.018
ize-0.017
Token they
Token ID484
Feature activation-1.77e-03

Pos logit contributions
aid+0.003
 slices+0.003
tr+0.003
 half+0.003
 bruises+0.003
Neg logit contributions
 closely-0.002
ivated-0.002
itionally-0.002
uring-0.002
inating-0.002
Token made
Token ID925
Feature activation-1.47e-02

Pos logit contributions
tr+0.003
aid+0.003
 carriers+0.003
ests+0.003
 handcuffs+0.003
Neg logit contributions
 closely-0.003
ivated-0.002
ya-0.002
 up-0.002
 Jackie-0.002
Token their
Token ID511
Feature activation+1.73e-02

Pos logit contributions
tr+0.027
aid+0.026
oc+0.024
Rex+0.024
ests+0.024
Neg logit contributions
 closely-0.024
 up-0.022
 Jackie-0.020
ya-0.020
uring-0.019
Token dolls
Token ID36062
Feature activation-1.36e-02

Pos logit contributions
aid+0.017
 slices+0.015
tr+0.015
oc+0.015
Rex+0.014
Neg logit contributions
 closely-0.018
 up-0.016
ivated-0.016
uring-0.016
inating-0.015
Token and
Token ID290
Feature activation-4.93e-03

Pos logit contributions
aid+0.025
tr+0.024
 slices+0.023
ests+0.023
 bruises+0.022
Neg logit contributions
 closely-0.021
ivated-0.018
 up-0.018
uring-0.018
ize-0.017
Token pretend
Token ID16614
Feature activation+2.31e-02

Pos logit contributions
aid+0.009
 slices+0.009
tr+0.009
oc+0.008
ests+0.008
Neg logit contributions
 closely-0.007
ivated-0.006
uring-0.006
 up-0.006
ya-0.006
Token they
Token ID484
Feature activation+7.55e-04

Pos logit contributions
 closely+0.034
ivated+0.031
uring+0.030
ize+0.029
ya+0.029
Neg logit contributions
aid-0.042
 slices-0.040
tr-0.039
 half-0.037
 bisc-0.037
Token were
Token ID547
Feature activation-1.45e-02

Pos logit contributions
 closely+0.002
ivated+0.002
uring+0.002
ya+0.002
ize+0.002
Neg logit contributions
aid-0.001
 slices-0.001
tr-0.001
 half-0.001
 bisc-0.001
Token mer
Token ID4017
Feature activation+8.91e-02

Pos logit contributions
aid+0.026
tr+0.024
 slices+0.024
oc+0.024
ossible+0.023
Neg logit contributions
 closely-0.022
ivated-0.020
uring-0.020
 up-0.019
ize-0.019
Tokenmaid
Token ID23151
Feature activation+1.20e-02

Pos logit contributions
 Jackie+0.165
 lookout+0.156
appointed+0.156
 daunting+0.155
 impossible+0.149
Neg logit contributions
 carriers-0.103
tr-0.096
 slices-0.095
 slice-0.095
eper-0.094
Tokens
Token ID82
Feature activation-2.39e-02

Pos logit contributions
 closely+0.018
ivated+0.015
 up+0.015
ya+0.015
uring+0.015
Neg logit contributions
aid-0.022
 slices-0.021
 bruises-0.020
tr-0.020
 handcuffs-0.020
Token or
Token ID393
Feature activation-1.84e-02

Pos logit contributions
aid+0.043
 slices+0.038
tr+0.037
oc+0.037
 bruises+0.036
Neg logit contributions
 closely-0.039
 up-0.035
ivated-0.033
uring-0.033
inating-0.032
Token pirates
Token ID27516
Feature activation-8.02e-03

Pos logit contributions
aid+0.031
oc+0.029
tr+0.028
 slices+0.028
ests+0.026
Neg logit contributions
 closely-0.029
 Jackie-0.027
ivated-0.026
 up-0.026
ya-0.026
Token.
Token ID13
Feature activation+1.24e-02

Pos logit contributions
aid+0.014
tr+0.013
 slices+0.013
 bruises+0.012
ossible+0.012
Neg logit contributions
 closely-0.013
 up-0.012
ize-0.011
uring-0.011
ivated-0.011
Token bed
Token ID3996
Feature activation+3.61e-03

Pos logit contributions
aid+0.042
 slices+0.041
tr+0.040
 half+0.039
ossible+0.038
Neg logit contributions
 closely-0.050
ivated-0.049
uring-0.046
ya-0.046
ize-0.046
Token.
Token ID13
Feature activation-9.01e-03

Pos logit contributions
 closely+0.006
ivated+0.006
ya+0.006
 Jackie+0.006
itionally+0.006
Neg logit contributions
aid-0.005
 slices-0.005
tr-0.005
 half-0.004
oc-0.004
Token She
Token ID1375
Feature activation-1.57e-02

Pos logit contributions
tr+0.012
aid+0.012
 slices+0.011
 half+0.011
ossible+0.011
Neg logit contributions
 closely-0.017
ivated-0.016
ya-0.016
uring-0.016
ize-0.016
Token dreamed
Token ID27947
Feature activation+8.89e-03

Pos logit contributions
aid+0.028
 slices+0.027
tr+0.025
umbs+0.025
 bisc+0.024
Neg logit contributions
 closely-0.024
ivated-0.020
 up-0.020
ya-0.020
itionally-0.020
Token of
Token ID286
Feature activation-2.10e-02

Pos logit contributions
 closely+0.022
ivated+0.021
itionally+0.021
uring+0.020
inating+0.020
Neg logit contributions
aid-0.008
 slices-0.006
tr-0.006
 half-0.005
brush-0.005
Token mer
Token ID4017
Feature activation+8.81e-02

Pos logit contributions
aid+0.043
tr+0.042
 slices+0.041
oc+0.041
 bisc+0.040
Neg logit contributions
 closely-0.027
ivated-0.023
uring-0.022
 up-0.022
ize-0.021
Tokenmaid
Token ID23151
Feature activation+1.44e-02

Pos logit contributions
appointed+0.163
 Jackie+0.158
 daunting+0.156
 impossible+0.155
 lookout+0.152
Neg logit contributions
ests-0.099
 carriers-0.097
es-0.097
eper-0.094
aid-0.089
Tokens
Token ID82
Feature activation-1.54e-02

Pos logit contributions
 closely+0.024
ivated+0.023
uring+0.022
ize+0.021
ya+0.021
Neg logit contributions
aid-0.023
 slices-0.022
tr-0.021
 half-0.020
 bisc-0.020
Token and
Token ID290
Feature activation-9.72e-03

Pos logit contributions
aid+0.029
 slices+0.028
tr+0.027
 bisc+0.026
oc+0.026
Neg logit contributions
 closely-0.022
 up-0.019
ivated-0.018
uring-0.018
ya-0.017
Token pirates
Token ID27516
Feature activation+5.67e-04

Pos logit contributions
aid+0.017
 slices+0.016
tr+0.016
oc+0.015
 bisc+0.015
Neg logit contributions
 closely-0.015
ivated-0.013
 up-0.013
uring-0.013
 Jackie-0.013
Token,
Token ID11
Feature activation+1.56e-02

Pos logit contributions
 closely+0.001
ivated+0.001
 up+0.001
uring+0.001
ize+0.001
Neg logit contributions
aid-0.001
 slices-0.001
tr-0.001
 bisc-0.001
 half-0.001
Token He
Token ID679
Feature activation+3.08e-02

Pos logit contributions
 closely+0.039
ivated+0.037
uring+0.035
ize+0.034
ya+0.034
Neg logit contributions
aid-0.044
 slices-0.042
tr-0.042
 half-0.039
 bisc-0.039
Token would
Token ID561
Feature activation+1.13e-02

Pos logit contributions
 closely+0.053
ivated+0.052
uring+0.050
ize+0.049
itionally+0.048
Neg logit contributions
aid-0.046
tr-0.044
 slices-0.043
 half-0.041
ossible-0.040
Token jump
Token ID4391
Feature activation+5.95e-02

Pos logit contributions
 closely+0.017
ivated+0.016
ize+0.016
itionally+0.016
uring+0.016
Neg logit contributions
aid-0.019
tr-0.018
 slices-0.018
 half-0.017
ossible-0.017
Token in
Token ID287
Feature activation+6.06e-03

Pos logit contributions
itionally+0.106
annah+0.100
ya+0.099
ivated+0.097
anna+0.095
Neg logit contributions
tr-0.090
aid-0.084
 slices-0.081
 half-0.081
 scarf-0.079
Token p
Token ID279
Feature activation+4.47e-02

Pos logit contributions
 closely+0.009
 up+0.008
ivated+0.008
uring+0.007
ya+0.007
Neg logit contributions
aid-0.012
tr-0.011
 slices-0.010
 bisc-0.010
ests-0.010
Tokenudd
Token ID4185
Feature activation+8.59e-02

Pos logit contributions
 closely+0.077
 Jackie+0.077
uring+0.076
ize+0.076
ya+0.075
Neg logit contributions
aid-0.063
ossible-0.060
tr-0.056
 slices-0.054
ests-0.053
Tokenles
Token ID829
Feature activation+1.50e-02

Pos logit contributions
itionally+0.148
ivated+0.138
uring+0.132
 impossible+0.129
 Jackie+0.128
Neg logit contributions
aid-0.131
aw-0.129
 slices-0.120
ests-0.116
ossible-0.115
Token and
Token ID290
Feature activation+1.49e-02

Pos logit contributions
 closely+0.023
ivated+0.022
itionally+0.021
uring+0.021
ya+0.020
Neg logit contributions
aid-0.026
 slices-0.025
tr-0.024
 half-0.023
oc-0.023
Token make
Token ID787
Feature activation+3.23e-03

Pos logit contributions
 closely+0.020
ivated+0.019
ize+0.018
uring+0.018
itionally+0.018
Neg logit contributions
aid-0.028
tr-0.027
 slices-0.027
 half-0.026
 bisc-0.025
Token a
Token ID257
Feature activation+1.60e-02

Pos logit contributions
 closely+0.005
ivated+0.005
uring+0.004
 up+0.004
ize+0.004
Neg logit contributions
aid-0.006
tr-0.005
 slices-0.005
 half-0.005
 bisc-0.005
Token big
Token ID1263
Feature activation+1.16e-02

Pos logit contributions
 closely+0.022
ivated+0.019
 up+0.019
uring+0.019
ya+0.018
Neg logit contributions
aid-0.031
 slices-0.030
tr-0.029
ests-0.028
 handcuffs-0.027
Token She
Token ID1375
Feature activation+2.57e-02

Pos logit contributions
 closely+0.047
ivated+0.045
uring+0.044
ize+0.043
itionally+0.042
Neg logit contributions
aid-0.042
tr-0.042
 slices-0.041
 half-0.039
ossible-0.039
Token would
Token ID561
Feature activation+1.55e-02

Pos logit contributions
 closely+0.044
ivated+0.043
uring+0.042
ize+0.040
itionally+0.040
Neg logit contributions
aid-0.040
tr-0.038
 slices-0.037
 half-0.035
ossible-0.035
Token jump
Token ID4391
Feature activation+6.09e-02

Pos logit contributions
 closely+0.024
ivated+0.023
ize+0.023
itionally+0.023
uring+0.022
Neg logit contributions
aid-0.025
tr-0.024
 slices-0.023
 half-0.022
 scarf-0.022
Token in
Token ID287
Feature activation+9.86e-03

Pos logit contributions
itionally+0.112
ya+0.106
annah+0.105
ivated+0.103
anna+0.100
Neg logit contributions
tr-0.088
aid-0.081
 slices-0.080
 half-0.079
 scarf-0.076
Token p
Token ID279
Feature activation+4.72e-02

Pos logit contributions
 closely+0.015
 up+0.013
ivated+0.013
uring+0.012
ize+0.012
Neg logit contributions
aid-0.019
tr-0.018
 slices-0.017
oc-0.016
ests-0.016
Tokenudd
Token ID4185
Feature activation+8.55e-02

Pos logit contributions
 closely+0.081
uring+0.078
 Jackie+0.077
ize+0.077
ya+0.077
Neg logit contributions
aid-0.069
ossible-0.065
tr-0.063
 slices-0.062
ests-0.059
Tokenles
Token ID829
Feature activation+1.83e-02

Pos logit contributions
itionally+0.149
uring+0.136
ivated+0.135
 impossible+0.133
 Jackie+0.128
Neg logit contributions
aid-0.130
aw-0.128
 slices-0.116
brush-0.112
 scarf-0.108
Token and
Token ID290
Feature activation+1.80e-02

Pos logit contributions
 closely+0.028
ivated+0.026
itionally+0.025
uring+0.025
ya+0.024
Neg logit contributions
aid-0.032
 slices-0.031
tr-0.030
 half-0.029
 bisc-0.028
Token catch
Token ID4929
Feature activation+2.93e-02

Pos logit contributions
 closely+0.025
ivated+0.024
ize+0.022
uring+0.022
ya+0.022
Neg logit contributions
aid-0.033
tr-0.032
 slices-0.032
 half-0.031
 bisc-0.030
Token rain
Token ID6290
Feature activation+2.01e-02

Pos logit contributions
 closely+0.048
ivated+0.044
uring+0.042
 up+0.041
ize+0.041
Neg logit contributions
aid-0.049
tr-0.046
 slices-0.046
ossible-0.043
oc-0.042
Tokendrops
Token ID49253
Feature activation+6.34e-03

Pos logit contributions
 closely+0.035
ivated+0.033
ya+0.031
uring+0.031
ize+0.031
Neg logit contributions
aid-0.031
tr-0.030
 slices-0.029
 half-0.027
 bisc-0.027
Token He
Token ID679
Feature activation+3.02e-02

Pos logit contributions
 closely+0.049
ivated+0.047
uring+0.044
ize+0.044
ya+0.043
Neg logit contributions
aid-0.051
 slices-0.049
tr-0.049
 half-0.046
 bisc-0.045
Token would
Token ID561
Feature activation+1.44e-02

Pos logit contributions
 closely+0.051
ivated+0.050
uring+0.048
ize+0.046
itionally+0.046
Neg logit contributions
aid-0.047
tr-0.045
 slices-0.044
 half-0.042
ossible-0.041
Token jump
Token ID4391
Feature activation+6.05e-02

Pos logit contributions
 closely+0.022
ivated+0.021
ize+0.021
itionally+0.020
uring+0.020
Neg logit contributions
aid-0.024
tr-0.023
 slices-0.022
 half-0.021
 scarf-0.021
Token in
Token ID287
Feature activation+7.64e-03

Pos logit contributions
itionally+0.111
annah+0.107
ya+0.107
ivated+0.104
anna+0.102
Neg logit contributions
tr-0.087
 slices-0.080
aid-0.079
 half-0.078
 scarf-0.075
Token p
Token ID279
Feature activation+4.74e-02

Pos logit contributions
 closely+0.012
 up+0.010
ivated+0.010
uring+0.010
ize+0.009
Neg logit contributions
aid-0.014
tr-0.014
 slices-0.013
ests-0.012
 carriers-0.012
Tokenudd
Token ID4185
Feature activation+8.53e-02

Pos logit contributions
 closely+0.081
uring+0.080
 Jackie+0.079
ya+0.079
ize+0.079
Neg logit contributions
aid-0.068
ossible-0.065
tr-0.061
 slices-0.060
ests-0.058
Tokenles
Token ID829
Feature activation+1.74e-02

Pos logit contributions
itionally+0.149
ivated+0.140
uring+0.134
 impossible+0.129
 Jackie+0.128
Neg logit contributions
aid-0.130
aw-0.128
 slices-0.120
ests-0.116
ossible-0.112
Token and
Token ID290
Feature activation+1.81e-02

Pos logit contributions
 closely+0.026
ivated+0.025
uring+0.023
itionally+0.023
ya+0.023
Neg logit contributions
aid-0.030
 slices-0.030
tr-0.029
 half-0.027
oc-0.027
Token get
Token ID651
Feature activation+1.51e-03

Pos logit contributions
 closely+0.025
ivated+0.024
ize+0.023
uring+0.022
ya+0.022
Neg logit contributions
aid-0.033
tr-0.032
 slices-0.032
 half-0.031
 bisc-0.030
Token very
Token ID845
Feature activation+6.68e-03

Pos logit contributions
 closely+0.002
ivated+0.002
uring+0.002
ya+0.002
ize+0.002
Neg logit contributions
aid-0.003
 slices-0.003
tr-0.003
 half-0.002
 bisc-0.002
Token wet
Token ID9583
Feature activation+1.98e-02

Pos logit contributions
ya+0.011
itionally+0.011
ize+0.011
 closely+0.011
anna+0.011
Neg logit contributions
aid-0.011
 half-0.009
 slices-0.009
 bruises-0.009
ossible-0.009
Token loved
Token ID6151
Feature activation+1.45e-02

Pos logit contributions
 closely+0.043
ivated+0.041
uring+0.039
ize+0.038
itionally+0.038
Neg logit contributions
aid-0.041
tr-0.039
 slices-0.038
 half-0.036
ossible-0.035
Token to
Token ID284
Feature activation+4.09e-03

Pos logit contributions
 closely+0.022
ivated+0.021
uring+0.020
itionally+0.020
ize+0.020
Neg logit contributions
aid-0.025
 slices-0.024
tr-0.023
 half-0.022
 bisc-0.022
Token jump
Token ID4391
Feature activation+5.46e-02

Pos logit contributions
 closely+0.007
ivated+0.007
ize+0.006
itionally+0.006
ya+0.006
Neg logit contributions
aid-0.006
tr-0.006
 scarf-0.006
 slices-0.006
 half-0.006
Token in
Token ID287
Feature activation+9.80e-03

Pos logit contributions
itionally+0.101
ya+0.099
ivated+0.097
annah+0.095
anna+0.094
Neg logit contributions
tr-0.075
aid-0.071
 slices-0.070
 scarf-0.066
 half-0.066
Token p
Token ID279
Feature activation+4.62e-02

Pos logit contributions
 closely+0.014
 up+0.012
ivated+0.011
uring+0.011
ize+0.011
Neg logit contributions
aid-0.020
tr-0.019
 slices-0.018
ests-0.017
ulture-0.017
Tokenudd
Token ID4185
Feature activation+8.50e-02

Pos logit contributions
 closely+0.082
ya+0.081
uring+0.080
 Jackie+0.080
itionally+0.079
Neg logit contributions
aid-0.064
ossible-0.061
tr-0.056
 slices-0.055
ests-0.054
Tokenles
Token ID829
Feature activation+1.10e-02

Pos logit contributions
itionally+0.145
ivated+0.139
uring+0.134
 impossible+0.130
 Jackie+0.127
Neg logit contributions
aw-0.127
aid-0.127
 slices-0.119
ests-0.115
ossible-0.113
Token and
Token ID290
Feature activation+1.66e-02

Pos logit contributions
 closely+0.016
ivated+0.015
uring+0.014
ize+0.014
 up+0.014
Neg logit contributions
aid-0.020
tr-0.019
 slices-0.019
 half-0.018
 bisc-0.018
Token splash
Token ID22870
Feature activation+5.32e-02

Pos logit contributions
 closely+0.023
ivated+0.022
uring+0.021
ize+0.021
ya+0.020
Neg logit contributions
aid-0.030
 slices-0.029
tr-0.029
 half-0.028
 bisc-0.027
Token around
Token ID1088
Feature activation+3.23e-02

Pos logit contributions
ivated+0.093
itionally+0.091
annah+0.088
uring+0.087
 closely+0.086
Neg logit contributions
 slices-0.078
aid-0.075
oc-0.071
 half-0.071
ests-0.069
Token.
Token ID13
Feature activation+1.85e-02

Pos logit contributions
ivated+0.067
 closely+0.066
itionally+0.065
ya+0.064
annah+0.064
Neg logit contributions
 slices-0.035
oc-0.031
 half-0.031
aid-0.031
tr-0.030
Token\n
Token ID198
Feature activation-7.00e-03

Pos logit contributions
 closely+0.021
 up+0.019
uring+0.019
ivated+0.018
 Jackie+0.018
Neg logit contributions
aid-0.018
 slices-0.018
tr-0.017
 bisc-0.016
 slice-0.016
Token\n
Token ID198
Feature activation+1.28e-02

Pos logit contributions
aid+0.008
 slices+0.008
tr+0.008
 half+0.007
 bisc+0.007
Neg logit contributions
 closely-0.015
ivated-0.014
uring-0.013
ize-0.013
ya-0.013
TokenS
Token ID50
Feature activation+3.67e-02

Pos logit contributions
 closely+0.021
ivated+0.020
uring+0.019
ize+0.019
itionally+0.019
Neg logit contributions
aid-0.020
 slices-0.020
tr-0.020
 half-0.019
 bisc-0.018
Tokenki
Token ID4106
Feature activation+2.05e-02

Pos logit contributions
 closely+0.068
ivated+0.063
ize+0.062
ya+0.062
uring+0.061
Neg logit contributions
aid-0.051
 slices-0.051
tr-0.049
 half-0.046
 bruises-0.044
Tokenpping
Token ID2105
Feature activation+2.43e-02

Pos logit contributions
 closely+0.033
ivated+0.031
uring+0.029
itionally+0.029
ize+0.029
Neg logit contributions
aid-0.034
 slices-0.032
tr-0.032
 half-0.030
oc-0.030
Token mer
Token ID4017
Feature activation+8.48e-02

Pos logit contributions
 closely+0.036
ivated+0.032
uring+0.031
 up+0.030
ize+0.030
Neg logit contributions
aid-0.044
 slices-0.043
tr-0.043
 bisc-0.040
 half-0.040
Tokenrily
Token ID28904
Feature activation-1.10e-02

Pos logit contributions
 Jackie+0.132
 impossible+0.132
 lookout+0.132
 daunting+0.129
ivated+0.129
Neg logit contributions
es-0.116
eper-0.114
ests-0.114
oc-0.110
aid-0.110
Token along
Token ID1863
Feature activation+1.88e-02

Pos logit contributions
aid+0.018
 slices+0.017
tr+0.017
 half+0.016
oc+0.016
Neg logit contributions
 closely-0.018
ivated-0.017
uring-0.016
ya-0.016
itionally-0.016
Token,
Token ID11
Feature activation+4.00e-03

Pos logit contributions
ivated+0.030
 closely+0.030
itionally+0.029
ya+0.028
ize+0.028
Neg logit contributions
 slices-0.028
aid-0.028
 half-0.026
oc-0.026
tr-0.026
Token Hawk
Token ID26698
Feature activation-2.26e-03

Pos logit contributions
ivated+0.005
 closely+0.005
ize+0.005
itionally+0.005
ya+0.005
Neg logit contributions
 slices-0.007
aid-0.007
tr-0.007
 half-0.007
oc-0.007
Token came
Token ID1625
Feature activation-1.08e-02

Pos logit contributions
aid+0.004
 slices+0.004
tr+0.004
 half+0.003
 bisc+0.003
Neg logit contributions
 closely-0.003
ivated-0.003
uring-0.003
ize-0.003
ya-0.003
Token loved
Token ID6151
Feature activation+2.07e-02

Pos logit contributions
ivated+0.085
uring+0.082
 closely+0.082
ize+0.079
 avail+0.078
Neg logit contributions
tr-0.046
 scarf-0.044
aid-0.042
c-0.042
oc-0.042
Token to
Token ID284
Feature activation+7.41e-03

Pos logit contributions
 closely+0.033
ivated+0.031
uring+0.030
itionally+0.029
ize+0.028
Neg logit contributions
aid-0.035
 slices-0.033
tr-0.033
 half-0.031
 scarf-0.031
Token jump
Token ID4391
Feature activation+5.51e-02

Pos logit contributions
 closely+0.012
ivated+0.012
ize+0.012
ya+0.012
itionally+0.011
Neg logit contributions
aid-0.012
tr-0.011
 scarf-0.011
 bisc-0.010
ossible-0.010
Token in
Token ID287
Feature activation+1.30e-02

Pos logit contributions
itionally+0.092
ya+0.091
ivated+0.091
 closely+0.089
annah+0.085
Neg logit contributions
tr-0.080
aid-0.078
 slices-0.076
 half-0.073
 scarf-0.072
Token p
Token ID279
Feature activation+4.47e-02

Pos logit contributions
 closely+0.019
 up+0.016
ivated+0.016
uring+0.015
ize+0.015
Neg logit contributions
aid-0.026
tr-0.025
 slices-0.024
oc-0.023
ests-0.023
Tokenudd
Token ID4185
Feature activation+8.38e-02

Pos logit contributions
 closely+0.076
uring+0.073
ivated+0.071
ya+0.071
ize+0.070
Neg logit contributions
aid-0.067
ossible-0.064
tr-0.062
 slices-0.061
ests-0.059
Tokenles
Token ID829
Feature activation+1.39e-02

Pos logit contributions
itionally+0.138
ivated+0.132
uring+0.130
 impossible+0.129
 daunting+0.123
Neg logit contributions
aid-0.129
aw-0.122
 slices-0.117
brush-0.113
 scarf-0.113
Token and
Token ID290
Feature activation+2.28e-02

Pos logit contributions
 closely+0.021
ivated+0.019
uring+0.018
ize+0.018
ya+0.018
Neg logit contributions
aid-0.025
 slices-0.024
tr-0.023
 half-0.022
 bisc-0.022
Token make
Token ID787
Feature activation+4.69e-03

Pos logit contributions
 closely+0.033
ivated+0.032
ize+0.030
uring+0.030
ya+0.030
Neg logit contributions
aid-0.040
tr-0.039
 slices-0.039
 half-0.037
 bisc-0.036
Token big
Token ID1263
Feature activation+4.39e-03

Pos logit contributions
 closely+0.007
ivated+0.007
uring+0.007
 up+0.006
ize+0.006
Neg logit contributions
aid-0.008
tr-0.008
 slices-0.008
oc-0.007
ossible-0.007
Token spl
Token ID4328
Feature activation+4.84e-02

Pos logit contributions
 closely+0.008
ivated+0.008
uring+0.007
ize+0.007
ya+0.007
Neg logit contributions
aid-0.006
 slices-0.006
tr-0.006
 half-0.005
 bisc-0.005
Token not
Token ID407
Feature activation+9.08e-03

Pos logit contributions
aid+0.013
 slices+0.012
tr+0.012
 half+0.011
ossible+0.011
Neg logit contributions
 closely-0.015
ivated-0.014
uring-0.014
itionally-0.013
ya-0.013
Token to
Token ID284
Feature activation-4.25e-03

Pos logit contributions
 closely+0.016
ya+0.016
ivated+0.016
annah+0.016
uring+0.015
Neg logit contributions
aid-0.013
 half-0.013
tr-0.012
 slices-0.011
 bruises-0.011
Token jump
Token ID4391
Feature activation+6.28e-02

Pos logit contributions
 half+0.006
aid+0.006
ossible+0.005
 bruises+0.005
tr+0.005
Neg logit contributions
 closely-0.008
ya-0.008
ivated-0.008
uring-0.007
anna-0.007
Token in
Token ID287
Feature activation+8.87e-03

Pos logit contributions
ya+0.105
 closely+0.100
ivated+0.100
itionally+0.099
uring+0.094
Neg logit contributions
aid-0.096
tr-0.092
 slices-0.090
 half-0.089
brush-0.085
Token p
Token ID279
Feature activation+4.51e-02

Pos logit contributions
 closely+0.014
ivated+0.013
uring+0.013
ize+0.013
ya+0.012
Neg logit contributions
aid-0.015
 slices-0.014
tr-0.014
 half-0.013
 bisc-0.013
Tokenudd
Token ID4185
Feature activation+8.36e-02

Pos logit contributions
 closely+0.081
ya+0.079
uring+0.079
 lookout+0.078
ivated+0.077
Neg logit contributions
aid-0.064
ossible-0.056
 slices-0.056
ests-0.055
tr-0.053
Tokenles
Token ID829
Feature activation+9.67e-03

Pos logit contributions
itionally+0.142
ivated+0.136
uring+0.134
rolling+0.134
 lookout+0.129
Neg logit contributions
aid-0.126
aw-0.119
brush-0.107
 slices-0.106
 half-0.101
Token with
Token ID351
Feature activation-2.10e-02

Pos logit contributions
 closely+0.015
ivated+0.014
uring+0.013
ize+0.013
 up+0.013
Neg logit contributions
aid-0.017
 slices-0.016
tr-0.016
 half-0.015
 bisc-0.015
Token her
Token ID607
Feature activation-8.47e-03

Pos logit contributions
aid+0.036
tr+0.035
 slices+0.035
oc+0.033
ossible+0.032
Neg logit contributions
 closely-0.032
ivated-0.029
ize-0.028
uring-0.028
 up-0.028
Token clean
Token ID3424
Feature activation+1.47e-02

Pos logit contributions
aid+0.012
 slices+0.011
 half+0.011
tr+0.010
 bruises+0.010
Neg logit contributions
 closely-0.016
ivated-0.015
uring-0.015
ya-0.015
itionally-0.014
Token bl
Token ID698
Feature activation+6.45e-03

Pos logit contributions
 closely+0.028
ivated+0.028
uring+0.027
ya+0.026
itionally+0.026
Neg logit contributions
aid-0.018
 half-0.018
 slices-0.017
 scarf-0.017
ests-0.017
TokenLet
Token ID5756
Feature activation+4.53e-03

Pos logit contributions
aid+0.015
 slices+0.014
tr+0.014
 bruises+0.013
ossible+0.013
Neg logit contributions
 closely-0.021
ivated-0.021
uring-0.020
itionally-0.019
inating-0.019
Token's
Token ID338
Feature activation+1.04e-02

Pos logit contributions
 closely+0.010
ivated+0.010
uring+0.010
ways+0.010
inating+0.010
Neg logit contributions
 half-0.004
ossible-0.004
 slices-0.003
aid-0.003
 bisc-0.003
Token pretend
Token ID16614
Feature activation+3.92e-02

Pos logit contributions
 closely+0.016
ivated+0.016
ya+0.015
ize+0.015
uring+0.015
Neg logit contributions
aid-0.016
tr-0.016
 half-0.016
 slices-0.015
ossible-0.015
Token we
Token ID356
Feature activation+1.15e-02

Pos logit contributions
 closely+0.059
ivated+0.056
uring+0.052
ize+0.051
ya+0.051
Neg logit contributions
aid-0.068
 slices-0.067
tr-0.065
 half-0.062
 bisc-0.061
Token are
Token ID389
Feature activation+8.09e-03

Pos logit contributions
 closely+0.022
ivated+0.021
uring+0.020
ize+0.020
itionally+0.020
Neg logit contributions
aid-0.016
 slices-0.015
tr-0.014
 half-0.013
 bisc-0.013
Token mer
Token ID4017
Feature activation+8.26e-02

Pos logit contributions
 closely+0.013
ivated+0.013
ya+0.012
ize+0.012
itionally+0.012
Neg logit contributions
aid-0.013
 slices-0.012
tr-0.012
 half-0.012
 bisc-0.011
Tokenmaid
Token ID23151
Feature activation+2.53e-02

Pos logit contributions
 Jackie+0.145
appointed+0.140
 daunting+0.139
 lookout+0.136
 impossible+0.133
Neg logit contributions
 carriers-0.103
tr-0.099
aid-0.096
 slice-0.095
 slices-0.093
Tokens
Token ID82
Feature activation-8.90e-03

Pos logit contributions
 closely+0.040
ivated+0.037
uring+0.035
ya+0.035
 up+0.035
Neg logit contributions
aid-0.043
 slices-0.041
tr-0.040
 bruises-0.038
 half-0.038
Token and
Token ID290
Feature activation+1.63e-03

Pos logit contributions
aid+0.014
 slices+0.014
tr+0.014
 half+0.013
 bisc+0.013
Neg logit contributions
 closely-0.014
ivated-0.014
uring-0.013
ize-0.013
itionally-0.013
Token swim
Token ID9422
Feature activation+3.53e-02

Pos logit contributions
 closely+0.002
ivated+0.002
uring+0.002
 Jackie+0.002
 up+0.002
Neg logit contributions
aid-0.003
 slices-0.003
tr-0.003
oc-0.003
ests-0.003
Token in
Token ID287
Feature activation+5.95e-03

Pos logit contributions
ivated+0.056
 closely+0.056
itionally+0.051
uring+0.051
ize+0.051
Neg logit contributions
 slices-0.055
tr-0.054
aid-0.054
 half-0.052
oc-0.051
Token He
Token ID679
Feature activation+2.22e-02

Pos logit contributions
 closely+0.029
ivated+0.029
ya+0.027
ize+0.027
uring+0.026
Neg logit contributions
tr-0.020
aid-0.019
 slices-0.019
 half-0.018
ossible-0.018
Token didn
Token ID1422
Feature activation+2.67e-02

Pos logit contributions
ivated+0.039
 closely+0.037
uring+0.036
ize+0.036
inating+0.034
Neg logit contributions
tr-0.030
aid-0.030
oc-0.028
 half-0.028
ossible-0.028
Token't
Token ID470
Feature activation+1.91e-03

Pos logit contributions
 closely+0.031
ivated+0.028
ya+0.027
itionally+0.027
uring+0.026
Neg logit contributions
aid-0.055
tr-0.053
 slices-0.053
 half-0.051
ossible-0.049
Token want
Token ID765
Feature activation+2.93e-02

Pos logit contributions
 closely+0.003
ivated+0.003
ize+0.003
inating+0.003
itionally+0.003
Neg logit contributions
aid-0.003
tr-0.002
 half-0.002
brush-0.002
 slices-0.002
Token to
Token ID284
Feature activation+6.36e-03

Pos logit contributions
ivated+0.055
rated+0.052
 closely+0.051
uring+0.049
itionally+0.049
Neg logit contributions
aid-0.039
 half-0.039
 slices-0.038
tr-0.036
 bruises-0.035
Token march
Token ID9960
Feature activation+8.19e-02

Pos logit contributions
 closely+0.011
ivated+0.010
ya+0.010
inating+0.010
ize+0.010
Neg logit contributions
aid-0.009
 half-0.009
tr-0.008
ossible-0.008
 scarf-0.008
Token with
Token ID351
Feature activation-2.36e-03

Pos logit contributions
ya+0.138
itionally+0.132
inating+0.131
ivated+0.131
 closely+0.130
Neg logit contributions
aid-0.114
tr-0.113
 half-0.104
brush-0.104
ests-0.103
Token his
Token ID465
Feature activation-2.14e-03

Pos logit contributions
aid+0.003
 slices+0.003
 half+0.003
tr+0.003
 bruises+0.003
Neg logit contributions
 closely-0.004
ivated-0.004
uring-0.004
ya-0.004
inating-0.004
Token friends
Token ID2460
Feature activation+2.35e-02

Pos logit contributions
aid+0.002
 slices+0.002
tr+0.002
 half+0.002
oc+0.002
Neg logit contributions
 closely-0.005
ivated-0.004
uring-0.004
ize-0.004
ya-0.004
Token anymore
Token ID7471
Feature activation+1.32e-02

Pos logit contributions
 closely+0.031
ivated+0.030
uring+0.028
ize+0.028
inating+0.027
Neg logit contributions
aid-0.043
 slices-0.042
tr-0.042
 half-0.040
 bisc-0.039
Token.
Token ID13
Feature activation+1.29e-02

Pos logit contributions
 closely+0.022
ivated+0.020
uring+0.019
ize+0.019
ya+0.019
Neg logit contributions
aid-0.022
 slices-0.021
tr-0.021
 half-0.019
 bisc-0.019
Token She
Token ID1375
Feature activation+2.18e-02

Pos logit contributions
 closely+0.051
ivated+0.050
rated+0.048
ya+0.048
uring+0.047
Neg logit contributions
tr-0.029
 slices-0.028
aid-0.027
ossible-0.026
 bisc-0.026
Token said
Token ID531
Feature activation-5.37e-03

Pos logit contributions
 closely+0.033
ivated+0.032
uring+0.030
ize+0.029
itionally+0.029
Neg logit contributions
aid-0.037
 slices-0.036
tr-0.036
 half-0.034
 bisc-0.034
Token,
Token ID11
Feature activation+1.27e-02

Pos logit contributions
aid+0.010
 slices+0.010
tr+0.009
 half+0.009
brush+0.009
Neg logit contributions
 closely-0.008
ivated-0.007
uring-0.007
itionally-0.007
rated-0.007
Token "
Token ID366
Feature activation-1.74e-02

Pos logit contributions
 closely+0.023
ivated+0.023
uring+0.022
itionally+0.021
ize+0.021
Neg logit contributions
aid-0.017
tr-0.017
 slices-0.017
 half-0.015
 bisc-0.015
TokenC
Token ID34
Feature activation+3.74e-02

Pos logit contributions
aid+0.024
 slices+0.023
tr+0.023
 half+0.021
oc+0.021
Neg logit contributions
 closely-0.032
ivated-0.030
ya-0.029
ize-0.029
itionally-0.029
Tokenudd
Token ID4185
Feature activation+8.14e-02

Pos logit contributions
 closely+0.061
ivated+0.057
uring+0.054
ize+0.053
ya+0.053
Neg logit contributions
aid-0.062
 slices-0.059
tr-0.058
 half-0.055
 bisc-0.054
Tokenles
Token ID829
Feature activation-2.30e-03

Pos logit contributions
ivated+0.117
 closely+0.116
itionally+0.115
 impossible+0.113
 Jackie+0.111
Neg logit contributions
aid-0.139
ests-0.119
ossible-0.116
 slices-0.115
 bisc-0.114
Token,
Token ID11
Feature activation+1.75e-02

Pos logit contributions
aid+0.004
 slices+0.004
tr+0.004
 half+0.004
 bisc+0.004
Neg logit contributions
 closely-0.003
ivated-0.003
uring-0.003
ize-0.003
ya-0.003
Token come
Token ID1282
Feature activation+3.10e-02

Pos logit contributions
 closely+0.026
ivated+0.023
uring+0.022
 up+0.022
ya+0.022
Neg logit contributions
aid-0.032
 slices-0.031
tr-0.030
ests-0.029
oc-0.029
Token here
Token ID994
Feature activation+2.09e-02

Pos logit contributions
 closely+0.046
ivated+0.042
uring+0.040
ya+0.040
ize+0.040
Neg logit contributions
aid-0.055
 slices-0.054
tr-0.054
 half-0.051
oc-0.050
Token!
Token ID0
Feature activation-1.12e-02

Pos logit contributions
ivated+0.034
 closely+0.034
itionally+0.033
ize+0.032
uring+0.031
Neg logit contributions
 slices-0.032
 half-0.031
tr-0.030
aid-0.029
ossible-0.028
Tokenman
Token ID805
Feature activation-5.34e-03

Pos logit contributions
umbs+0.058
 bisc+0.055
 slices+0.055
aid+0.055
 bruises+0.053
Neg logit contributions
 up-0.063
 closely-0.060
ly-0.056
very-0.053
working-0.053
Token said
Token ID531
Feature activation-9.10e-03

Pos logit contributions
aid+0.010
tr+0.009
 slices+0.009
ests+0.009
 bisc+0.009
Neg logit contributions
 closely-0.008
ivated-0.007
 up-0.007
uring-0.007
ya-0.007
Token,
Token ID11
Feature activation+8.34e-03

Pos logit contributions
aid+0.019
 slices+0.018
tr+0.018
 half+0.017
oc+0.017
Neg logit contributions
 closely-0.011
ivated-0.010
uring-0.010
itionally-0.010
ya-0.009
Token "
Token ID366
Feature activation-2.42e-02

Pos logit contributions
 closely+0.014
ivated+0.013
uring+0.013
ize+0.012
itionally+0.012
Neg logit contributions
aid-0.013
 slices-0.013
tr-0.012
 half-0.012
oc-0.011
TokenWhat
Token ID2061
Feature activation-2.41e-02

Pos logit contributions
tr+0.039
 ig+0.039
icks+0.039
 slices+0.037
 bisc+0.036
Neg logit contributions
very-0.034
ivated-0.033
ize-0.033
ya-0.033
 closely-0.032
Token seems
Token ID2331
Feature activation+8.00e-02

Pos logit contributions
aid+0.040
tr+0.035
 slices+0.035
bar+0.034
 handcuffs+0.034
Neg logit contributions
 closely-0.038
 up-0.035
 Jackie-0.034
ize-0.034
ivated-0.034
Token to
Token ID284
Feature activation+3.31e-03

Pos logit contributions
ivated+0.138
annah+0.138
anna+0.138
ya+0.131
 closely+0.127
Neg logit contributions
 half-0.102
aid-0.102
 slices-0.096
tr-0.092
 Fire-0.092
Token be
Token ID307
Feature activation+9.55e-03

Pos logit contributions
 closely+0.005
ivated+0.004
uring+0.004
ize+0.004
 Jackie+0.004
Neg logit contributions
aid-0.006
tr-0.006
 slices-0.006
ests-0.006
oc-0.006
Token the
Token ID262
Feature activation+1.69e-03

Pos logit contributions
 closely+0.015
ivated+0.015
itionally+0.014
ya+0.014
uring+0.014
Neg logit contributions
aid-0.016
 slices-0.015
tr-0.015
 half-0.014
 slot-0.014
Token problem
Token ID1917
Feature activation+1.18e-02

Pos logit contributions
 closely+0.003
 up+0.002
ivated+0.002
 lookout+0.002
uring+0.002
Neg logit contributions
aid-0.003
 slices-0.003
ests-0.003
ossible-0.003
Rex-0.003
Token?"
Token ID1701
Feature activation+4.14e-03

Pos logit contributions
 closely+0.016
ivated+0.016
itionally+0.014
uring+0.014
ya+0.014
Neg logit contributions
aid-0.022
 slices-0.021
tr-0.021
 half-0.020
ossible-0.019
Token outside
Token ID2354
Feature activation+3.55e-02

Pos logit contributions
 closely+0.052
ivated+0.048
itionally+0.046
ya+0.046
uring+0.045
Neg logit contributions
aid-0.071
 slices-0.069
tr-0.068
 bisc-0.064
 half-0.064
Token and
Token ID290
Feature activation+3.02e-02

Pos logit contributions
 closely+0.056
itionally+0.055
ivated+0.055
uring+0.053
ya+0.052
Neg logit contributions
 slices-0.056
aid-0.053
tr-0.053
 half-0.052
oc-0.049
Token jumping
Token ID14284
Feature activation+4.39e-02

Pos logit contributions
annah+0.043
uring+0.042
ivated+0.041
ize+0.041
 closely+0.040
Neg logit contributions
tr-0.047
 half-0.046
 slices-0.045
 scarf-0.045
c-0.044
Token in
Token ID287
Feature activation+5.80e-03

Pos logit contributions
itionally+0.071
ivated+0.069
ya+0.067
annah+0.067
anna+0.064
Neg logit contributions
tr-0.070
 slices-0.068
aid-0.065
 scarf-0.061
brush-0.061
Token p
Token ID279
Feature activation+4.58e-02

Pos logit contributions
 closely+0.008
 up+0.007
ivated+0.006
 Jackie+0.006
ya+0.006
Neg logit contributions
aid-0.013
tr-0.012
ulture-0.011
oc-0.011
ests-0.011
Tokenudd
Token ID4185
Feature activation+8.00e-02

Pos logit contributions
 closely+0.082
uring+0.081
 Jackie+0.079
ya+0.078
ize+0.077
Neg logit contributions
aid-0.061
ossible-0.061
tr-0.056
 slices-0.055
c-0.054
Tokenles
Token ID829
Feature activation+1.03e-02

Pos logit contributions
itionally+0.136
uring+0.131
ivated+0.130
 impossible+0.127
ounded+0.124
Neg logit contributions
aw-0.120
aid-0.112
 slices-0.105
brush-0.099
ests-0.096
Token.
Token ID13
Feature activation+1.84e-02

Pos logit contributions
 closely+0.018
ivated+0.017
uring+0.016
 up+0.016
ize+0.016
Neg logit contributions
aid-0.016
tr-0.015
 slices-0.015
 bisc-0.014
 half-0.014
Token Whenever
Token ID21326
Feature activation+1.07e-02

Pos logit contributions
 closely+0.028
ivated+0.027
uring+0.025
ize+0.025
ya+0.025
Neg logit contributions
aid-0.031
 slices-0.030
tr-0.030
 half-0.028
 bisc-0.028
Token Sally
Token ID25737
Feature activation+4.71e-02

Pos logit contributions
 closely+0.016
ivated+0.015
uring+0.015
itionally+0.014
ize+0.014
Neg logit contributions
aid-0.018
 slices-0.018
 half-0.017
tr-0.017
ossible-0.017
Token went
Token ID1816
Feature activation+4.99e-02

Pos logit contributions
ivated+0.093
uring+0.092
 closely+0.090
ize+0.084
itionally+0.084
Neg logit contributions
tr-0.058
aid-0.054
oc-0.052
 slices-0.052
 scarf-0.051
Token liked
Token ID8288
Feature activation-1.22e-02

Pos logit contributions
ivated+0.010
 closely+0.010
uring+0.010
itionally+0.010
ize+0.009
Neg logit contributions
aid-0.006
tr-0.006
ossible-0.006
 slices-0.006
oc-0.006
Token to
Token ID284
Feature activation+8.77e-03

Pos logit contributions
aid+0.027
tr+0.026
oc+0.025
 slices+0.025
 handcuffs+0.024
Neg logit contributions
 closely-0.014
 up-0.012
ivated-0.012
ize-0.011
uring-0.011
Token jump
Token ID4391
Feature activation+4.48e-02

Pos logit contributions
 closely+0.012
ivated+0.011
uring+0.011
ize+0.010
itionally+0.010
Neg logit contributions
aid-0.016
 slices-0.016
tr-0.016
 half-0.015
 bisc-0.015
Token in
Token ID287
Feature activation+6.44e-03

Pos logit contributions
 closely+0.071
ivated+0.068
uring+0.065
itionally+0.064
ya+0.064
Neg logit contributions
aid-0.073
 slices-0.071
tr-0.070
 half-0.066
 bisc-0.064
Token p
Token ID279
Feature activation+3.80e-02

Pos logit contributions
 closely+0.008
 up+0.007
 Jackie+0.006
ivated+0.006
ize+0.006
Neg logit contributions
aid-0.015
tr-0.014
ulture-0.013
oc-0.013
ests-0.012
Tokenudd
Token ID4185
Feature activation+7.99e-02

Pos logit contributions
 closely+0.065
uring+0.062
ivated+0.061
 Jackie+0.061
 lookout+0.060
Neg logit contributions
aid-0.057
ossible-0.054
 slices-0.052
tr-0.050
ests-0.049
Tokenles
Token ID829
Feature activation-1.72e-03

Pos logit contributions
ivated+0.111
itionally+0.110
uring+0.109
 impossible+0.106
 lookout+0.105
Neg logit contributions
 slices-0.130
aid-0.129
aw-0.125
ossible-0.122
 scarf-0.119
Token.
Token ID13
Feature activation+3.39e-02

Pos logit contributions
aid+0.003
tr+0.003
 bisc+0.003
 slices+0.003
ossible+0.003
Neg logit contributions
 closely-0.003
ivated-0.003
 up-0.002
uring-0.002
ize-0.002
Token She
Token ID1375
Feature activation+2.01e-02

Pos logit contributions
 closely+0.061
ivated+0.060
ize+0.058
uring+0.058
ya+0.058
Neg logit contributions
tr-0.046
aid-0.045
 slices-0.044
 half-0.043
brush-0.041
Token wore
Token ID12408
Feature activation+2.56e-03

Pos logit contributions
ivated+0.039
 closely+0.039
uring+0.037
ize+0.037
itionally+0.036
Neg logit contributions
aid-0.026
tr-0.025
Rex-0.024
 slices-0.024
 half-0.023
Token her
Token ID607
Feature activation-1.42e-02

Pos logit contributions
 closely+0.005
ivated+0.004
uring+0.004
itionally+0.004
ize+0.004
Neg logit contributions
aid-0.004
 slices-0.004
tr-0.003
 half-0.003
 bisc-0.003
Token he
Token ID339
Feature activation+2.63e-02

Pos logit contributions
 closely+0.027
uring+0.024
 up+0.023
ivated+0.023
 Jackie+0.023
Neg logit contributions
aid-0.040
 slices-0.038
tr-0.037
ests-0.035
 bisc-0.035
Token can
Token ID460
Feature activation+5.34e-03

Pos logit contributions
 closely+0.047
ivated+0.046
uring+0.044
itionally+0.043
ize+0.043
Neg logit contributions
aid-0.038
tr-0.036
 slices-0.036
 half-0.034
oc-0.033
Token jump
Token ID4391
Feature activation+6.43e-02

Pos logit contributions
 closely+0.008
ivated+0.007
uring+0.007
ize+0.007
ya+0.007
Neg logit contributions
aid-0.010
 slices-0.009
tr-0.009
 half-0.009
 bisc-0.009
Token in
Token ID287
Feature activation+9.76e-03

Pos logit contributions
ivated+0.088
 closely+0.088
itionally+0.086
ya+0.084
uring+0.082
Neg logit contributions
aid-0.112
tr-0.109
 slices-0.107
 half-0.105
 scarf-0.100
Token p
Token ID279
Feature activation+4.05e-02

Pos logit contributions
 closely+0.014
 up+0.012
ivated+0.012
uring+0.012
 Jackie+0.011
Neg logit contributions
aid-0.019
tr-0.019
 slices-0.017
 carriers-0.017
ests-0.017
Tokenudd
Token ID4185
Feature activation+7.91e-02

Pos logit contributions
 closely+0.065
ivated+0.061
uring+0.060
ize+0.059
ya+0.059
Neg logit contributions
aid-0.067
 slices-0.062
tr-0.061
ossible-0.060
ests-0.058
Tokenles
Token ID829
Feature activation+1.79e-02

Pos logit contributions
itionally+0.133
ivated+0.133
uring+0.122
appointed+0.120
 impossible+0.120
Neg logit contributions
aid-0.117
aw-0.112
ests-0.105
 slices-0.103
brush-0.098
Token without
Token ID1231
Feature activation-2.55e-02

Pos logit contributions
 closely+0.026
ivated+0.023
uring+0.022
 up+0.022
ize+0.021
Neg logit contributions
aid-0.034
tr-0.032
 slices-0.032
 bisc-0.030
 half-0.030
Token getting
Token ID1972
Feature activation+2.36e-03

Pos logit contributions
aid+0.044
 slices+0.042
tr+0.041
 half+0.040
 bisc+0.038
Neg logit contributions
 closely-0.039
ivated-0.037
uring-0.035
ize-0.034
ya-0.034
Token wet
Token ID9583
Feature activation+1.79e-02

Pos logit contributions
 closely+0.003
ivated+0.003
ya+0.003
uring+0.003
ize+0.003
Neg logit contributions
aid-0.004
 slices-0.004
 half-0.004
tr-0.004
 bruises-0.004
Token or
Token ID393
Feature activation+8.79e-03

Pos logit contributions
 closely+0.033
ivated+0.031
uring+0.030
itionally+0.030
ya+0.030
Neg logit contributions
aid-0.025
 slices-0.024
tr-0.024
 half-0.022
 bisc-0.022
Token is
Token ID318
Feature activation+2.41e-02

Pos logit contributions
aid+0.011
 slices+0.010
tr+0.010
 half+0.009
 bisc+0.009
Neg logit contributions
 closely-0.016
ivated-0.015
uring-0.014
ize-0.014
ya-0.014
Token that
Token ID326
Feature activation-4.04e-03

Pos logit contributions
ivated+0.049
 closely+0.048
itionally+0.047
rated+0.047
uring+0.045
Neg logit contributions
aid-0.027
ossible-0.025
 half-0.024
tr-0.024
 slices-0.023
Token even
Token ID772
Feature activation+1.76e-02

Pos logit contributions
aid+0.007
 slices+0.006
 half+0.006
tr+0.006
oc+0.006
Neg logit contributions
 closely-0.006
ivated-0.006
uring-0.006
ya-0.006
itionally-0.006
Token if
Token ID611
Feature activation+2.18e-03

Pos logit contributions
ivated+0.032
 closely+0.030
uring+0.030
annah+0.030
itionally+0.029
Neg logit contributions
aid-0.026
 half-0.024
 slices-0.023
tr-0.021
oc-0.021
Token something
Token ID1223
Feature activation+3.87e-02

Pos logit contributions
 closely+0.004
ivated+0.004
uring+0.003
inating+0.003
itionally+0.003
Neg logit contributions
aid-0.003
 slices-0.003
 half-0.003
 bruises-0.003
tr-0.003
Token seems
Token ID2331
Feature activation+7.91e-02

Pos logit contributions
ivated+0.087
appointed+0.081
uring+0.081
itionally+0.081
ways+0.081
Neg logit contributions
 half-0.033
 are-0.029
aid-0.029
 bead-0.026
 slices-0.026
Token impossible
Token ID5340
Feature activation+4.91e-02

Pos logit contributions
ivated+0.133
annah+0.132
anna+0.125
ya+0.123
ize+0.123
Neg logit contributions
 half-0.105
aid-0.091
ful-0.091
 slices-0.088
cr-0.087
Token,
Token ID11
Feature activation+3.65e-02

Pos logit contributions
ivated+0.093
itionally+0.082
ya+0.080
 closely+0.079
inating+0.079
Neg logit contributions
 half-0.066
 slices-0.059
ossible-0.054
aid-0.054
ful-0.054
Token it
Token ID340
Feature activation+1.82e-02

Pos logit contributions
ivated+0.057
 closely+0.057
itionally+0.053
uring+0.052
ya+0.052
Neg logit contributions
 slices-0.056
aid-0.055
 half-0.054
tr-0.051
oc-0.050
Token's
Token ID338
Feature activation+1.88e-02

Pos logit contributions
ivated+0.038
 closely+0.036
ya+0.035
uring+0.035
ize+0.035
Neg logit contributions
aid-0.018
 half-0.018
tr-0.017
 slices-0.017
oc-0.017
Token always
Token ID1464
Feature activation+1.59e-02

Pos logit contributions
 closely+0.026
ivated+0.026
uring+0.024
ize+0.024
itionally+0.024
Neg logit contributions
aid-0.034
 slices-0.032
 half-0.032
tr-0.031
ossible-0.029
Token of
Token ID286
Feature activation-1.27e-02

Pos logit contributions
 slices+0.026
tr+0.025
aid+0.025
 bisc+0.023
oc+0.023
Neg logit contributions
 closely-0.045
 up-0.041
uring-0.040
ya-0.040
ize-0.040
Token the
Token ID262
Feature activation+1.68e-02

Pos logit contributions
tr+0.030
aid+0.030
brush+0.028
bridge+0.027
bar+0.027
Neg logit contributions
 closely-0.013
 up-0.012
ya-0.011
 Jackie-0.011
ize-0.011
Token story
Token ID1621
Feature activation-1.61e-02

Pos logit contributions
 closely+0.040
ivated+0.038
uring+0.037
ize+0.036
itionally+0.036
Neg logit contributions
aid-0.015
 slices-0.014
tr-0.014
 half-0.012
 bisc-0.012
Token is
Token ID318
Feature activation+1.14e-02

Pos logit contributions
aid+0.022
 slices+0.021
tr+0.020
 bisc+0.019
 half+0.019
Neg logit contributions
 closely-0.031
ivated-0.029
uring-0.028
ize-0.027
ya-0.027
Token:
Token ID25
Feature activation-4.60e-04

Pos logit contributions
ivated+0.019
 closely+0.019
itionally+0.018
rated+0.018
uring+0.018
Neg logit contributions
aid-0.017
tr-0.017
ossible-0.016
 half-0.016
 slices-0.016
Token Acc
Token ID6366
Feature activation+7.90e-02

Pos logit contributions
aid+0.001
 slices+0.001
tr+0.001
 bisc+0.001
 half+0.001
Neg logit contributions
 closely-0.001
ivated-0.001
uring-0.001
itionally-0.001
ize-0.001
Tokenidents
Token ID3231
Feature activation+4.41e-02

Pos logit contributions
ivated+0.098
ya+0.091
 lookout+0.090
 Jackie+0.088
 closely+0.087
Neg logit contributions
tr-0.147
oc-0.138
ossible-0.136
 slices-0.132
 carriers-0.132
Token happen
Token ID1645
Feature activation+3.63e-02

Pos logit contributions
ivated+0.061
ya+0.058
 closely+0.057
itionally+0.056
inating+0.054
Neg logit contributions
tr-0.079
aid-0.077
 half-0.074
 slices-0.070
oc-0.069
Token,
Token ID11
Feature activation+5.24e-02

Pos logit contributions
ivated+0.089
itionally+0.079
ya+0.079
 closely+0.078
uring+0.077
Neg logit contributions
 half-0.034
 slices-0.031
tr-0.029
 carriers-0.027
aid-0.026
Token but
Token ID475
Feature activation+8.87e-03

Pos logit contributions
 closely+0.061
ivated+0.059
uring+0.053
ize+0.052
itionally+0.052
Neg logit contributions
aid-0.106
 slices-0.106
tr-0.103
 half-0.100
 bisc-0.097
Token we
Token ID356
Feature activation-1.60e-02

Pos logit contributions
 closely+0.013
ivated+0.012
uring+0.012
ya+0.011
itionally+0.011
Neg logit contributions
aid-0.016
 slices-0.015
tr-0.015
 half-0.014
 bisc-0.014
Token would
Token ID561
Feature activation+2.02e-02

Pos logit contributions
 closely+0.050
ivated+0.049
uring+0.048
ize+0.046
itionally+0.045
Neg logit contributions
aid-0.044
tr-0.043
 slices-0.041
 half-0.039
ossible-0.039
Token jump
Token ID4391
Feature activation+6.40e-02

Pos logit contributions
 closely+0.030
ivated+0.028
uring+0.026
ize+0.026
itionally+0.026
Neg logit contributions
aid-0.036
tr-0.034
 slices-0.034
 half-0.032
 bisc-0.032
Token in
Token ID287
Feature activation+1.44e-02

Pos logit contributions
itionally+0.109
ya+0.106
ivated+0.105
annah+0.105
anna+0.099
Neg logit contributions
tr-0.096
aid-0.092
 slices-0.090
 half-0.088
 scarf-0.085
Token the
Token ID262
Feature activation+8.28e-03

Pos logit contributions
 closely+0.021
 up+0.019
ivated+0.018
uring+0.018
ize+0.017
Neg logit contributions
aid-0.027
tr-0.026
 slices-0.025
 bisc-0.024
ossible-0.024
Token p
Token ID279
Feature activation+3.42e-02

Pos logit contributions
 closely+0.013
ivated+0.012
 up+0.011
uring+0.011
inating+0.011
Neg logit contributions
aid-0.014
 slices-0.014
tr-0.013
oc-0.013
 carriers-0.013
Tokenudd
Token ID4185
Feature activation+7.88e-02

Pos logit contributions
 closely+0.059
ivated+0.057
uring+0.055
ize+0.055
 Jackie+0.054
Neg logit contributions
aid-0.052
tr-0.050
 slices-0.047
ossible-0.047
ests-0.045
Tokenles
Token ID829
Feature activation+1.67e-02

Pos logit contributions
itionally+0.144
ivated+0.139
uring+0.130
 Jackie+0.125
 impossible+0.123
Neg logit contributions
aid-0.117
aw-0.114
ests-0.104
 ig-0.099
 snow-0.097
Token and
Token ID290
Feature activation+2.99e-02

Pos logit contributions
 closely+0.024
ivated+0.022
uring+0.021
ize+0.020
ya+0.020
Neg logit contributions
aid-0.031
 slices-0.030
tr-0.029
 half-0.028
 bisc-0.028
Token get
Token ID651
Feature activation+5.03e-03

Pos logit contributions
 closely+0.043
ivated+0.042
ize+0.040
uring+0.039
ya+0.039
Neg logit contributions
aid-0.052
tr-0.051
 slices-0.050
 half-0.048
 bisc-0.048
Token all
Token ID477
Feature activation+9.82e-03

Pos logit contributions
 closely+0.008
ivated+0.007
uring+0.007
ya+0.007
ize+0.007
Neg logit contributions
aid-0.009
 slices-0.009
tr-0.008
 half-0.008
 bisc-0.008
Token wet
Token ID9583
Feature activation+9.49e-03

Pos logit contributions
 closely+0.014
 up+0.013
ivated+0.013
 Jackie+0.012
uring+0.012
Neg logit contributions
aid-0.018
tr-0.017
 slices-0.017
oc-0.016
 bisc-0.016
Token as
Token ID355
Feature activation+7.16e-03

Pos logit contributions
aid+0.010
 slices+0.009
tr+0.009
 half+0.008
 bisc+0.008
Neg logit contributions
 closely-0.011
uring-0.010
ivated-0.010
 up-0.009
ya-0.009
Token he
Token ID339
Feature activation+2.71e-03

Pos logit contributions
 closely+0.011
ivated+0.009
uring+0.009
ize+0.009
ya+0.009
Neg logit contributions
aid-0.013
 slices-0.012
tr-0.012
 bisc-0.012
 half-0.012
Token opened
Token ID4721
Feature activation+5.38e-03

Pos logit contributions
 closely+0.004
ivated+0.004
uring+0.004
ya+0.004
ize+0.004
Neg logit contributions
aid-0.005
 slices-0.005
tr-0.004
 half-0.004
 bisc-0.004
Token the
Token ID262
Feature activation-2.52e-03

Pos logit contributions
 closely+0.008
ivated+0.008
uring+0.007
ize+0.007
ya+0.007
Neg logit contributions
aid-0.009
 slices-0.009
tr-0.009
 half-0.008
 bisc-0.008
Token ch
Token ID442
Feature activation-4.22e-03

Pos logit contributions
aid+0.005
 slices+0.004
ests+0.004
oc+0.004
Rex+0.004
Neg logit contributions
 closely-0.004
 up-0.003
ya-0.003
 lookout-0.003
ivated-0.003
Tokenocol
Token ID4668
Feature activation-8.19e-02

Pos logit contributions
tr+0.006
aid+0.006
 slices+0.006
 bisc+0.006
 half+0.006
Neg logit contributions
 closely-0.007
ivated-0.007
uring-0.007
ya-0.006
ize-0.006
Tokenates
Token ID689
Feature activation-2.88e-02

Pos logit contributions
 bisc+0.107
aid+0.105
tr+0.105
 scarf+0.100
 handcuffs+0.097
Neg logit contributions
 closely-0.138
anna-0.135
ya-0.130
uring-0.129
inating-0.126
Token and
Token ID290
Feature activation-2.80e-03

Pos logit contributions
aid+0.055
tr+0.051
 bruises+0.047
 slices+0.047
ests+0.046
Neg logit contributions
 closely-0.046
 up-0.039
uring-0.039
ivated-0.039
ya-0.037
Token started
Token ID2067
Feature activation-3.20e-03

Pos logit contributions
aid+0.005
 slices+0.005
tr+0.005
 half+0.004
oc+0.004
Neg logit contributions
 closely-0.004
ivated-0.004
uring-0.004
ya-0.004
ize-0.004
Token eating
Token ID6600
Feature activation-5.43e-03

Pos logit contributions
aid+0.007
tr+0.006
 slices+0.006
oc+0.006
ests+0.006
Neg logit contributions
 closely-0.004
ivated-0.004
uring-0.003
 up-0.003
ya-0.003
Token them
Token ID606
Feature activation+1.64e-02

Pos logit contributions
aid+0.010
tr+0.010
ossible+0.009
ests+0.009
ulture+0.009
Neg logit contributions
 closely-0.009
 up-0.008
uring-0.007
ize-0.007
ivated-0.007
Tokenmen
Token ID3653
Feature activation-1.62e-02

Pos logit contributions
aid+0.044
tr+0.042
 slices+0.041
 bruises+0.041
 bisc+0.039
Neg logit contributions
 up-0.032
 closely-0.032
ly-0.030
uring-0.029
ize-0.029
Token."
Token ID526
Feature activation-6.10e-03

Pos logit contributions
aid+0.031
 slices+0.029
tr+0.029
 bruises+0.028
 bisc+0.027
Neg logit contributions
 closely-0.021
 up-0.020
ya-0.019
uring-0.018
ly-0.018
Token He
Token ID679
Feature activation-3.00e-02

Pos logit contributions
tr+0.010
aid+0.010
 slices+0.009
brush+0.009
 bisc+0.008
Neg logit contributions
 closely-0.010
ivated-0.009
 up-0.009
 Jackie-0.009
uring-0.009
Token gives
Token ID3607
Feature activation-3.13e-02

Pos logit contributions
aid+0.054
tr+0.051
 slices+0.051
oc+0.047
 bisc+0.047
Neg logit contributions
 closely-0.046
 up-0.040
ivated-0.039
uring-0.038
ya-0.037
Token them
Token ID606
Feature activation-3.41e-02

Pos logit contributions
tr+0.071
aid+0.069
oc+0.067
 sy+0.065
brush+0.064
Neg logit contributions
 up-0.035
 closely-0.033
ize-0.027
ya-0.026
ly-0.025
Token each
Token ID1123
Feature activation-8.07e-02

Pos logit contributions
tr+0.064
oc+0.062
 slot+0.058
aid+0.056
 sy+0.055
Neg logit contributions
 up-0.049
 closely-0.048
ya-0.042
 Jackie-0.041
ly-0.041
Token a
Token ID257
Feature activation-3.57e-02

Pos logit contributions
tr+0.174
lor+0.155
brush+0.154
bar+0.154
ests+0.153
Neg logit contributions
 up-0.095
ly-0.088
 closely-0.083
ya-0.073
 Jackie-0.070
Token sticker
Token ID27735
Feature activation-7.24e-02

Pos logit contributions
ests+0.053
pter+0.053
Rex+0.050
aid+0.049
 bruises+0.048
Neg logit contributions
 closely-0.057
 up-0.052
 Jackie-0.050
 lookout-0.050
 handsome-0.049
Token and
Token ID290
Feature activation-3.62e-02

Pos logit contributions
 bruises+0.176
 ig+0.168
Br+0.167
Ant+0.166
Pr+0.163
Neg logit contributions
 up-0.054
ly-0.053
 closely-0.051
 around-0.043
 before-0.030
Token a
Token ID257
Feature activation-5.67e-02

Pos logit contributions
ase+0.064
aid+0.061
ulture+0.061
tr+0.060
oc+0.059
Neg logit contributions
 closely-0.055
 up-0.052
 Jackie-0.044
ya-0.042
 handsome-0.042
Token candy
Token ID18550
Feature activation-6.76e-02

Pos logit contributions
ests+0.085
pter+0.081
umbs+0.077
Rex+0.077
ase+0.076
Neg logit contributions
 closely-0.087
 handsome-0.086
 up-0.081
 lookout-0.080
 Jackie-0.078
Token a
Token ID257
Feature activation-1.92e-02

Pos logit contributions
tr+0.107
aid+0.100
oc+0.099
ulture+0.094
ase+0.094
Neg logit contributions
 closely-0.083
 up-0.075
 Jackie-0.074
ivated-0.066
ya-0.066
Token big
Token ID1263
Feature activation-2.37e-02

Pos logit contributions
aid+0.036
ests+0.035
tr+0.034
 slices+0.034
 carriers+0.033
Neg logit contributions
 closely-0.026
ivated-0.023
ya-0.022
 Jackie-0.022
 lookout-0.022
Token box
Token ID3091
Feature activation-3.67e-02

Pos logit contributions
aid+0.043
tr+0.041
 slices+0.041
ests+0.038
 half+0.038
Neg logit contributions
 closely-0.035
ivated-0.031
ya-0.030
uring-0.029
 Jackie-0.029
Token of
Token ID286
Feature activation-4.37e-02

Pos logit contributions
aid+0.060
ests+0.054
tr+0.053
 slices+0.053
oc+0.052
Neg logit contributions
 closely-0.066
uring-0.058
 up-0.058
ly-0.058
ivated-0.056
Token ch
Token ID442
Feature activation-4.00e-03

Pos logit contributions
tr+0.070
aid+0.068
umbs+0.064
ulture+0.064
ase+0.063
Neg logit contributions
 closely-0.072
 up-0.065
 Jackie-0.063
ya-0.060
 handsome-0.060
Tokenocol
Token ID4668
Feature activation-7.96e-02

Pos logit contributions
tr+0.006
aid+0.006
 slices+0.005
 bisc+0.005
oc+0.005
Neg logit contributions
 closely-0.007
ivated-0.007
uring-0.007
ya-0.006
ize-0.006
Tokenates
Token ID689
Feature activation-4.02e-02

Pos logit contributions
tr+0.097
aid+0.096
oc+0.093
 bisc+0.090
 handcuffs+0.087
Neg logit contributions
 closely-0.134
anna-0.130
ya-0.129
inating-0.129
uring-0.128
Token!
Token ID0
Feature activation-2.30e-02

Pos logit contributions
aid+0.073
tr+0.069
oc+0.065
 slices+0.065
 bruises+0.064
Neg logit contributions
 closely-0.065
 up-0.057
uring-0.054
ivated-0.054
ya-0.053
Token<|endoftext|>
Token ID50256
Feature activation-7.33e-03

Pos logit contributions
aid+0.040
 slices+0.038
tr+0.038
 bisc+0.035
 carriers+0.035
Neg logit contributions
 closely-0.034
 Jackie-0.034
ivated-0.032
annah-0.032
 But-0.031
TokenOnce
Token ID7454
Feature activation+2.28e-02

Pos logit contributions
aid+0.014
tr+0.012
 slices+0.012
 bruises+0.012
 slice+0.012
Neg logit contributions
 closely-0.010
 up-0.010
 Jackie-0.010
inating-0.009
ivated-0.009
Token upon
Token ID2402
Feature activation-2.97e-02

Pos logit contributions
 closely+0.033
ivated+0.032
uring+0.031
itionally+0.030
ize+0.030
Neg logit contributions
aid-0.039
 slices-0.038
tr-0.037
ossible-0.037
 half-0.037
Token face
Token ID1986
Feature activation-3.41e-02

Pos logit contributions
ivated+0.001
 closely+0.001
ize+0.001
itionally+0.001
inating+0.001
Neg logit contributions
 slices-0.001
aid-0.001
tr-0.001
 half-0.001
 slice-0.001
Token and
Token ID290
Feature activation-1.31e-02

Pos logit contributions
aid+0.068
ests+0.063
 slices+0.062
tr+0.062
 bruises+0.061
Neg logit contributions
 closely-0.048
 up-0.040
uring-0.039
ivated-0.038
ly-0.038
Token her
Token ID607
Feature activation-2.05e-02

Pos logit contributions
aid+0.024
 slices+0.023
tr+0.023
ests+0.022
oc+0.021
Neg logit contributions
 closely-0.019
 up-0.017
 Jackie-0.017
ivated-0.017
ya-0.016
Token special
Token ID2041
Feature activation-1.94e-02

Pos logit contributions
 slices+0.034
aid+0.032
tr+0.032
 half+0.031
 bisc+0.031
Neg logit contributions
 closely-0.032
ivated-0.032
uring-0.030
ize-0.030
itionally-0.029
Token black
Token ID2042
Feature activation-4.19e-02

Pos logit contributions
aid+0.024
 slices+0.022
tr+0.022
oc+0.020
 half+0.020
Neg logit contributions
 closely-0.040
ivated-0.037
uring-0.036
ya-0.035
ize-0.035
Token tray
Token ID26473
Feature activation-7.94e-02

Pos logit contributions
aid+0.058
 slices+0.051
tr+0.051
oc+0.047
 half+0.047
Neg logit contributions
 closely-0.081
ivated-0.075
ya-0.072
uring-0.072
inating-0.071
Token in
Token ID287
Feature activation-3.02e-02

Pos logit contributions
aid+0.150
oc+0.145
Rex+0.144
ests+0.137
 slices+0.136
Neg logit contributions
 closely-0.115
 up-0.105
ly-0.096
 curiously-0.090
uring-0.088
Token her
Token ID607
Feature activation-2.22e-02

Pos logit contributions
aid+0.071
tr+0.068
ests+0.065
oc+0.062
ulture+0.061
Neg logit contributions
 closely-0.036
 Jackie-0.032
 up-0.029
ivated-0.027
ya-0.026
Token hand
Token ID1021
Feature activation-5.30e-02

Pos logit contributions
aid+0.037
 slices+0.032
tr+0.032
ests+0.031
oc+0.030
Neg logit contributions
 closely-0.039
ivated-0.034
 up-0.034
ya-0.034
uring-0.033
Token.
Token ID13
Feature activation-1.52e-02

Pos logit contributions
aid+0.097
 slices+0.090
ests+0.089
tr+0.088
ossible+0.088
Neg logit contributions
 closely-0.078
 up-0.071
ly-0.068
uring-0.065
ivated-0.062
Token<|endoftext|>
Token ID50256
Feature activation-1.18e-02

Pos logit contributions
aid+0.024
 slices+0.022
tr+0.021
 bisc+0.021
ests+0.020
Neg logit contributions
 closely-0.027
ivated-0.025
 Jackie-0.024
uring-0.024
 up-0.023
Token hugged
Token ID41130
Feature activation+5.59e-04

Pos logit contributions
aid+0.070
 slices+0.068
tr+0.068
ossible+0.065
ests+0.065
Neg logit contributions
 closely-0.054
ivated-0.047
 up-0.045
uring-0.044
ize-0.043
Token them
Token ID606
Feature activation-2.68e-02

Pos logit contributions
 closely+0.001
 up+0.001
ize+0.001
ly+0.001
ivated+0.001
Neg logit contributions
tr-0.001
 slices-0.001
aid-0.001
 bisc-0.001
oc-0.001
Token and
Token ID290
Feature activation-2.73e-02

Pos logit contributions
aid+0.051
tr+0.050
 slices+0.049
 bisc+0.047
oc+0.047
Neg logit contributions
 closely-0.038
 up-0.033
ivated-0.032
uring-0.030
ya-0.030
Token gave
Token ID2921
Feature activation-2.53e-02

Pos logit contributions
aid+0.047
tr+0.046
 slices+0.046
oc+0.044
ests+0.044
Neg logit contributions
 closely-0.041
 up-0.036
ivated-0.035
ya-0.035
ize-0.034
Token them
Token ID606
Feature activation-3.57e-02

Pos logit contributions
tr+0.053
aid+0.052
 slices+0.052
oc+0.051
 bisc+0.049
Neg logit contributions
 closely-0.031
 up-0.029
 Jackie-0.025
ly-0.024
ize-0.024
Token each
Token ID1123
Feature activation-7.94e-02

Pos logit contributions
oc+0.059
tr+0.058
 slot+0.055
aid+0.053
ase+0.052
Neg logit contributions
 closely-0.057
 up-0.056
ya-0.049
ly-0.048
 Jackie-0.047
Token a
Token ID257
Feature activation-3.60e-02

Pos logit contributions
tr+0.143
bar+0.130
brush+0.128
ase+0.127
box+0.126
Neg logit contributions
 closely-0.118
 up-0.116
ly-0.111
ya-0.104
 Jackie-0.098
Token balloon
Token ID21190
Feature activation-7.46e-02

Pos logit contributions
pter+0.059
 slices+0.058
ests+0.057
aid+0.057
 carriers+0.056
Neg logit contributions
 closely-0.053
 up-0.047
ya-0.045
 Jackie-0.045
 handsome-0.044
Token.
Token ID13
Feature activation-5.96e-03

Pos logit contributions
aid+0.151
ests+0.150
 bruises+0.147
oc+0.146
 handcuffs+0.144
Neg logit contributions
 closely-0.083
 up-0.081
ly-0.075
 around-0.061
uring-0.054
Token He
Token ID679
Feature activation-2.55e-02

Pos logit contributions
aid+0.011
 slices+0.010
tr+0.010
 half+0.010
oc+0.010
Neg logit contributions
 closely-0.009
ivated-0.008
uring-0.008
 Jackie-0.007
 up-0.007
Token wink
Token ID41955
Feature activation-2.47e-02

Pos logit contributions
aid+0.046
 slices+0.044
tr+0.044
pter+0.042
umbs+0.042
Neg logit contributions
 closely-0.037
 up-0.031
ya-0.030
ize-0.029
ivated-0.028
Token\n
Token ID198
Feature activation-2.31e-02

Pos logit contributions
aid+0.011
tr+0.011
 slices+0.011
 half+0.010
brush+0.010
Neg logit contributions
 closely-0.014
ivated-0.013
 up-0.012
uring-0.012
ize-0.012
Token\n
Token ID198
Feature activation-1.41e-02

Pos logit contributions
tr+0.044
aid+0.040
 slices+0.040
brush+0.040
bar+0.039
Neg logit contributions
 closely-0.035
 up-0.031
ize-0.027
uring-0.027
ivated-0.027
TokenHe
Token ID1544
Feature activation-2.63e-02

Pos logit contributions
tr+0.024
pter+0.021
box+0.021
 extingu+0.021
 slices+0.020
Neg logit contributions
 closely-0.022
ize-0.020
annah-0.019
 up-0.019
ivated-0.018
Token gave
Token ID2921
Feature activation-2.43e-02

Pos logit contributions
aid+0.048
tr+0.046
 slices+0.045
ossible+0.043
ests+0.043
Neg logit contributions
 closely-0.038
 up-0.032
ivated-0.032
uring-0.031
ya-0.031
Token them
Token ID606
Feature activation-3.02e-02

Pos logit contributions
aid+0.046
tr+0.046
 slices+0.044
oc+0.042
 bisc+0.041
Neg logit contributions
 closely-0.034
 up-0.032
ya-0.028
ize-0.028
ivated-0.028
Token each
Token ID1123
Feature activation-7.92e-02

Pos logit contributions
tr+0.051
oc+0.050
aid+0.048
 slot+0.047
 slices+0.045
Neg logit contributions
 closely-0.047
 up-0.046
ya-0.041
 Jackie-0.040
ly-0.040
Token a
Token ID257
Feature activation-3.79e-02

Pos logit contributions
tr+0.149
bar+0.137
brush+0.136
ase+0.134
bridge+0.134
Neg logit contributions
 up-0.110
 closely-0.106
ly-0.103
ya-0.091
 Jackie-0.088
Token hug
Token ID16225
Feature activation-4.07e-02

Pos logit contributions
pter+0.063
ests+0.060
 slices+0.060
eper+0.058
ase+0.057
Neg logit contributions
 closely-0.053
 up-0.050
 handsome-0.047
 lookout-0.046
 Jackie-0.045
Token and
Token ID290
Feature activation-3.92e-02

Pos logit contributions
aid+0.098
 bruises+0.095
 slices+0.093
ossible+0.091
 bisc+0.089
Neg logit contributions
ly-0.035
 up-0.034
 closely-0.032
 around-0.024
 before-0.023
Token a
Token ID257
Feature activation-5.17e-02

Pos logit contributions
ase+0.067
oc+0.064
ulture+0.064
tr+0.064
umbs+0.062
Neg logit contributions
 closely-0.056
 up-0.055
 Jackie-0.049
ly-0.048
ya-0.047
Token kiss
Token ID9245
Feature activation-4.70e-02

Pos logit contributions
pter+0.095
ests+0.091
ions+0.091
eper+0.090
ase+0.087
Neg logit contributions
 handsome-0.062
 closely-0.059
 early-0.056
 up-0.055
 lookout-0.052
Token cr
Token ID1067
Feature activation-4.96e-02

Pos logit contributions
aid+0.034
tr+0.032
ests+0.031
 slices+0.031
 carriers+0.031
Neg logit contributions
 closely-0.032
 Jackie-0.027
ivated-0.027
 up-0.027
ya-0.026
Tokenay
Token ID323
Feature activation-3.80e-02

Pos logit contributions
 bruises+0.062
tr+0.060
 handcuffs+0.060
 slot+0.060
 carriers+0.059
Neg logit contributions
 closely-0.089
ly-0.080
iggle-0.079
inating-0.078
 up-0.076
Tokenon
Token ID261
Feature activation-6.14e-02

Pos logit contributions
tr+0.043
 carriers+0.037
 ig+0.037
 bruises+0.034
 slot+0.034
Neg logit contributions
 closely-0.080
 up-0.073
ly-0.073
 Jackie-0.071
very-0.068
Token and
Token ID290
Feature activation-2.74e-02

Pos logit contributions
 bruises+0.147
Rex+0.146
 handcuffs+0.146
Did+0.142
aid+0.142
Neg logit contributions
 closely-0.050
 up-0.042
ly-0.037
 around-0.028
 into-0.023
Token a
Token ID257
Feature activation-4.06e-02

Pos logit contributions
aid+0.044
ests+0.043
tr+0.043
ulture+0.042
oc+0.042
Neg logit contributions
 closely-0.046
 up-0.040
 Jackie-0.039
ya-0.036
ize-0.036
Token paper
Token ID3348
Feature activation-7.88e-02

Pos logit contributions
ests+0.078
aid+0.077
 carriers+0.074
pter+0.073
Rex+0.072
Neg logit contributions
 closely-0.054
 Jackie-0.046
 up-0.044
 handsome-0.043
ivated-0.041
Token and
Token ID290
Feature activation-2.68e-02

Pos logit contributions
aid+0.162
 bruises+0.155
 handcuffs+0.154
ests+0.152
ird+0.152
Neg logit contributions
 closely-0.098
 up-0.085
ly-0.068
 into-0.067
 around-0.065
Token helped
Token ID4193
Feature activation-3.58e-03

Pos logit contributions
aid+0.043
ests+0.043
ulture+0.042
oc+0.041
ase+0.041
Neg logit contributions
 closely-0.045
 up-0.039
 Jackie-0.038
ly-0.035
ya-0.034
Token him
Token ID683
Feature activation+3.94e-03

Pos logit contributions
aid+0.007
tr+0.006
 slices+0.006
ests+0.006
ossible+0.006
Neg logit contributions
 closely-0.005
 up-0.004
ivated-0.004
ize-0.004
uring-0.004
Token draw
Token ID3197
Feature activation-8.80e-03

Pos logit contributions
 closely+0.006
ivated+0.005
uring+0.005
 up+0.005
ize+0.005
Neg logit contributions
aid-0.007
 slices-0.007
tr-0.007
 bisc-0.007
ests-0.007
Token a
Token ID257
Feature activation-6.55e-03

Pos logit contributions
aid+0.016
tr+0.016
 bruises+0.015
ossible+0.015
 paramed+0.015
Neg logit contributions
 closely-0.014
 up-0.013
ize-0.011
uring-0.011
inating-0.010
Token the
Token ID262
Feature activation-2.88e-03

Pos logit contributions
aid+0.039
tr+0.038
 slices+0.036
oc+0.036
 bisc+0.036
Neg logit contributions
 closely-0.032
 up-0.029
ivated-0.026
ya-0.025
ize-0.025
Token necklace
Token ID38987
Feature activation-4.03e-02

Pos logit contributions
oc+0.004
aid+0.004
Rex+0.004
umbs+0.004
tr+0.004
Neg logit contributions
 closely-0.005
 up-0.005
 lookout-0.004
 impossible-0.004
ivated-0.004
Token and
Token ID290
Feature activation-8.64e-03

Pos logit contributions
aid+0.084
ossible+0.077
 slices+0.077
ests+0.076
oc+0.075
Neg logit contributions
 closely-0.054
 up-0.048
uring-0.043
ly-0.041
ivated-0.039
Token the
Token ID262
Feature activation-1.95e-02

Pos logit contributions
aid+0.016
tr+0.015
 slices+0.014
oc+0.014
ests+0.014
Neg logit contributions
 closely-0.013
 up-0.011
ya-0.011
ivated-0.011
 Jackie-0.011
Token magn
Token ID7842
Feature activation-2.28e-02

Pos logit contributions
umbs+0.030
oc+0.030
aid+0.030
ests+0.028
tr+0.027
Neg logit contributions
 closely-0.032
 up-0.030
 lookout-0.029
 handsome-0.029
 impossible-0.027
Tokenifying
Token ID4035
Feature activation-7.88e-02

Pos logit contributions
aid+0.031
 bisc+0.027
 slices+0.027
 bruises+0.027
oc+0.027
Neg logit contributions
 closely-0.047
ivated-0.043
ly-0.042
 up-0.041
inating-0.041
Token glass
Token ID5405
Feature activation-5.81e-02

Pos logit contributions
aid+0.128
Rex+0.116
pter+0.115
ests+0.115
eper+0.108
Neg logit contributions
 up-0.120
 closely-0.117
inating-0.113
 Jackie-0.105
ivated-0.101
Token and
Token ID290
Feature activation-1.21e-02

Pos logit contributions
aid+0.107
oc+0.105
ossible+0.102
tr+0.101
Rex+0.098
Neg logit contributions
 closely-0.088
 up-0.080
ly-0.077
 around-0.066
uring-0.064
Token told
Token ID1297
Feature activation-1.10e-02

Pos logit contributions
aid+0.022
tr+0.022
oc+0.021
ests+0.020
ossible+0.020
Neg logit contributions
 closely-0.018
 up-0.015
ivated-0.015
 Jackie-0.015
ya-0.015
Token him
Token ID683
Feature activation-7.01e-03

Pos logit contributions
aid+0.017
 slices+0.016
 half+0.015
tr+0.015
ossible+0.014
Neg logit contributions
ivated-0.018
 closely-0.018
uring-0.017
itionally-0.017
inating-0.017
Token that
Token ID326
Feature activation-8.37e-03

Pos logit contributions
aid+0.012
tr+0.012
 slices+0.012
oc+0.011
 bisc+0.011
Neg logit contributions
 closely-0.011
ivated-0.010
uring-0.009
 up-0.009
ize-0.009
Token the
Token ID262
Feature activation-1.78e-02

Pos logit contributions
tr+0.062
oc+0.061
 slot+0.059
aid+0.056
 sy+0.055
Neg logit contributions
 up-0.052
 closely-0.048
 Jackie-0.042
ya-0.041
ly-0.040
Token plastic
Token ID7309
Feature activation-5.78e-02

Pos logit contributions
aid+0.023
tr+0.021
oc+0.021
Rex+0.020
ests+0.020
Neg logit contributions
 closely-0.035
 up-0.033
 lookout-0.031
 Jackie-0.029
 handsome-0.029
Token pizza
Token ID14256
Feature activation-4.49e-02

Pos logit contributions
aid+0.085
tr+0.084
Rex+0.076
 bruises+0.076
c+0.076
Neg logit contributions
 closely-0.095
 up-0.095
ly-0.089
ize-0.073
uring-0.072
Token and
Token ID290
Feature activation-1.98e-02

Pos logit contributions
aid+0.096
 bruises+0.093
tr+0.091
oc+0.091
Rex+0.088
Neg logit contributions
 closely-0.049
 up-0.048
ly-0.039
 around-0.034
uring-0.034
Token a
Token ID257
Feature activation-4.83e-02

Pos logit contributions
aid+0.035
tr+0.034
ests+0.033
ase+0.032
ulture+0.032
Neg logit contributions
 closely-0.032
 up-0.028
 Jackie-0.025
ya-0.024
ivated-0.024
Token plastic
Token ID7309
Feature activation-7.87e-02

Pos logit contributions
ests+0.069
pter+0.067
aid+0.064
Rex+0.064
ulture+0.064
Neg logit contributions
 closely-0.085
 up-0.077
 handsome-0.075
 Jackie-0.075
 early-0.071
Token plate
Token ID7480
Feature activation-7.84e-02

Pos logit contributions
aid+0.118
tr+0.109
Rex+0.109
c+0.105
Kim+0.104
Neg logit contributions
 closely-0.129
ly-0.123
 up-0.123
 early-0.100
 curiously-0.099
Token and
Token ID290
Feature activation-2.75e-02

Pos logit contributions
oc+0.171
tr+0.167
ulture+0.164
Ant+0.164
inea+0.163
Neg logit contributions
 closely-0.085
 up-0.079
ly-0.071
 around-0.066
 straight-0.055
Token fork
Token ID15563
Feature activation-6.23e-02

Pos logit contributions
aid+0.042
ase+0.041
ests+0.041
tr+0.041
ulture+0.041
Neg logit contributions
 closely-0.048
 up-0.044
 Jackie-0.040
ly-0.038
ya-0.038
Token.
Token ID13
Feature activation-1.19e-02

Pos logit contributions
oc+0.125
aid+0.124
tr+0.123
Rex+0.116
ulture+0.115
Neg logit contributions
 closely-0.079
 up-0.076
ly-0.065
uring-0.057
 around-0.056
Token Ben
Token ID3932
Feature activation+4.67e-03

Pos logit contributions
aid+0.019
 slices+0.018
tr+0.018
 half+0.017
 bisc+0.016
Neg logit contributions
 closely-0.020
ivated-0.019
uring-0.018
ya-0.018
ize-0.018
Token named
Token ID3706
Feature activation+4.13e-03

Pos logit contributions
aid+0.056
 slices+0.055
 half+0.050
 bruises+0.050
tr+0.049
Neg logit contributions
 closely-0.042
 up-0.034
ya-0.033
ivated-0.033
ize-0.033
Token Fl
Token ID1610
Feature activation+7.11e-03

Pos logit contributions
 closely+0.008
ivated+0.007
uring+0.007
ize+0.007
ya+0.007
Neg logit contributions
aid-0.006
tr-0.005
 slices-0.005
 half-0.005
 bisc-0.005
Tokenuffy
Token ID15352
Feature activation-1.29e-02

Pos logit contributions
 closely+0.013
ivated+0.012
uring+0.011
ize+0.011
ya+0.011
Neg logit contributions
aid-0.011
 slices-0.010
tr-0.010
 half-0.009
 bisc-0.009
Token and
Token ID290
Feature activation-3.86e-03

Pos logit contributions
aid+0.024
 slices+0.022
 bruises+0.020
tr+0.020
ests+0.020
Neg logit contributions
 closely-0.020
 up-0.018
itionally-0.017
uring-0.017
ya-0.017
Token a
Token ID257
Feature activation-2.02e-02

Pos logit contributions
 slices+0.006
aid+0.006
ase+0.006
ests+0.006
tr+0.006
Neg logit contributions
 closely-0.006
ya-0.005
 up-0.005
 Jackie-0.005
itionally-0.005
Token ham
Token ID8891
Feature activation-7.85e-02

Pos logit contributions
aid+0.034
 slices+0.033
ests+0.031
pter+0.030
tr+0.030
Neg logit contributions
 closely-0.031
 Jackie-0.027
 up-0.027
 lookout-0.027
ya-0.027
Tokenster
Token ID1706
Feature activation-6.01e-02

Pos logit contributions
aid+0.125
 bruises+0.118
 slices+0.109
 handcuffs+0.107
bar+0.105
Neg logit contributions
 closely-0.119
ly-0.115
inating-0.106
 up-0.105
 lookout-0.102
Token named
Token ID3706
Feature activation-8.99e-04

Pos logit contributions
aid+0.126
 slices+0.122
 bruises+0.121
 handcuffs+0.115
 half+0.111
Neg logit contributions
 closely-0.070
 up-0.058
ly-0.056
anna-0.052
 around-0.051
Token Nib
Token ID49532
Feature activation+1.26e-02

Pos logit contributions
aid+0.001
 slices+0.001
tr+0.001
 half+0.001
 bisc+0.001
Neg logit contributions
 closely-0.002
ivated-0.002
uring-0.002
ize-0.002
ya-0.001
Tokenbles
Token ID7689
Feature activation-2.82e-02

Pos logit contributions
 closely+0.024
ivated+0.022
uring+0.021
ize+0.021
ya+0.021
Neg logit contributions
aid-0.018
 slices-0.017
tr-0.016
 half-0.015
 bisc-0.015
Token.
Token ID13
Feature activation+1.91e-02

Pos logit contributions
aid+0.059
 bruises+0.056
 slices+0.054
tr+0.054
 slot+0.053
Neg logit contributions
 closely-0.037
 up-0.032
ly-0.029
 around-0.027
uring-0.027
Token table
Token ID3084
Feature activation-3.90e-02

Pos logit contributions
 closely+0.009
 up+0.008
 lookout+0.007
uring+0.007
inating+0.007
Neg logit contributions
aid-0.013
ests-0.011
oc-0.011
 slices-0.011
ossible-0.011
Token.
Token ID13
Feature activation-1.11e-02

Pos logit contributions
aid+0.071
tr+0.064
Rex+0.061
 bisc+0.061
ests+0.060
Neg logit contributions
 closely-0.064
 up-0.055
uring-0.054
ivated-0.053
ly-0.049
Token Mom
Token ID11254
Feature activation-2.05e-02

Pos logit contributions
aid+0.019
 slices+0.018
tr+0.017
oc+0.017
bar+0.016
Neg logit contributions
 closely-0.018
ivated-0.016
uring-0.016
 Jackie-0.016
inating-0.016
Token gave
Token ID2921
Feature activation-1.71e-02

Pos logit contributions
aid+0.036
 slices+0.035
tr+0.034
ests+0.034
ossible+0.034
Neg logit contributions
 closely-0.030
 up-0.027
ivated-0.025
ya-0.024
uring-0.024
Token them
Token ID606
Feature activation-2.97e-02

Pos logit contributions
oc+0.039
tr+0.038
aid+0.038
 slices+0.036
 bisc+0.036
Neg logit contributions
 up-0.019
 closely-0.019
 Jackie-0.015
ly-0.014
ize-0.013
Token each
Token ID1123
Feature activation-7.85e-02

Pos logit contributions
oc+0.047
tr+0.045
aid+0.043
 slot+0.043
bridge+0.041
Neg logit contributions
 closely-0.049
 up-0.048
 Jackie-0.043
ya-0.042
ly-0.042
Token a
Token ID257
Feature activation-3.30e-02

Pos logit contributions
tr+0.150
bar+0.140
lor+0.139
bridge+0.138
ase+0.136
Neg logit contributions
 up-0.111
 closely-0.107
ly-0.100
 Jackie-0.089
ya-0.086
Token bowl
Token ID9396
Feature activation-6.30e-02

Pos logit contributions
pter+0.054
ests+0.051
 carriers+0.051
eper+0.050
Rex+0.049
Neg logit contributions
 closely-0.049
 up-0.044
 Jackie-0.044
 handsome-0.042
 early-0.042
Token of
Token ID286
Feature activation-3.16e-02

Pos logit contributions
oc+0.125
Rex+0.124
aid+0.120
tr+0.119
 slot+0.117
Neg logit contributions
 up-0.071
 closely-0.071
ly-0.068
 around-0.058
uring-0.052
Token grapes
Token ID36656
Feature activation-5.06e-02

Pos logit contributions
tr+0.040
aid+0.039
 slices+0.035
oc+0.035
ests+0.034
Neg logit contributions
 closely-0.063
ivated-0.058
 Jackie-0.056
 up-0.056
ya-0.056
Token.
Token ID13
Feature activation-9.45e-03

Pos logit contributions
aid+0.098
oc+0.096
Rex+0.092
tr+0.092
 handcuffs+0.090
Neg logit contributions
 closely-0.069
 up-0.062
ly-0.056
uring-0.055
annah-0.051
Token plastic
Token ID7309
Feature activation-5.78e-02

Pos logit contributions
aid+0.023
tr+0.021
oc+0.021
Rex+0.020
ests+0.020
Neg logit contributions
 closely-0.035
 up-0.033
 lookout-0.031
 Jackie-0.029
 handsome-0.029
Token pizza
Token ID14256
Feature activation-4.49e-02

Pos logit contributions
aid+0.085
tr+0.084
Rex+0.076
 bruises+0.076
c+0.076
Neg logit contributions
 closely-0.095
 up-0.095
ly-0.089
ize-0.073
uring-0.072
Token and
Token ID290
Feature activation-1.98e-02

Pos logit contributions
aid+0.096
 bruises+0.093
tr+0.091
oc+0.091
Rex+0.088
Neg logit contributions
 closely-0.049
 up-0.048
ly-0.039
 around-0.034
uring-0.034
Token a
Token ID257
Feature activation-4.83e-02

Pos logit contributions
aid+0.035
tr+0.034
ests+0.033
ase+0.032
ulture+0.032
Neg logit contributions
 closely-0.032
 up-0.028
 Jackie-0.025
ya-0.024
ivated-0.024
Token plastic
Token ID7309
Feature activation-7.87e-02

Pos logit contributions
ests+0.069
pter+0.067
aid+0.064
Rex+0.064
ulture+0.064
Neg logit contributions
 closely-0.085
 up-0.077
 handsome-0.075
 Jackie-0.075
 early-0.071
Token plate
Token ID7480
Feature activation-7.84e-02

Pos logit contributions
aid+0.118
tr+0.109
Rex+0.109
c+0.105
Kim+0.104
Neg logit contributions
 closely-0.129
ly-0.123
 up-0.123
 early-0.100
 curiously-0.099
Token and
Token ID290
Feature activation-2.75e-02

Pos logit contributions
oc+0.171
tr+0.167
ulture+0.164
Ant+0.164
inea+0.163
Neg logit contributions
 closely-0.085
 up-0.079
ly-0.071
 around-0.066
 straight-0.055
Token fork
Token ID15563
Feature activation-6.23e-02

Pos logit contributions
aid+0.042
ase+0.041
ests+0.041
tr+0.041
ulture+0.041
Neg logit contributions
 closely-0.048
 up-0.044
 Jackie-0.040
ly-0.038
ya-0.038
Token.
Token ID13
Feature activation-1.19e-02

Pos logit contributions
oc+0.125
aid+0.124
tr+0.123
Rex+0.116
ulture+0.115
Neg logit contributions
 closely-0.079
 up-0.076
ly-0.065
uring-0.057
 around-0.056
Token Ben
Token ID3932
Feature activation+4.67e-03

Pos logit contributions
aid+0.019
 slices+0.018
tr+0.018
 half+0.017
 bisc+0.016
Neg logit contributions
 closely-0.020
ivated-0.019
uring-0.018
ya-0.018
ize-0.018
Token takes
Token ID2753
Feature activation-2.05e-02

Pos logit contributions
 closely+0.008
ivated+0.007
uring+0.007
ize+0.007
ya+0.007
Neg logit contributions
aid-0.008
 slices-0.007
tr-0.007
 half-0.007
 bisc-0.007
Token-
Token ID12
Feature activation-2.95e-02

Pos logit contributions
 slices+0.063
 narrative+0.061
ests+0.060
tr+0.059
 slot+0.058
Neg logit contributions
ly-0.061
 up-0.059
anna-0.054
 active-0.052
uring-0.052
Tokenaid
Token ID1698
Feature activation-3.88e-02

Pos logit contributions
 slot+0.018
 ig+0.018
tr+0.017
box+0.016
 slices+0.015
Neg logit contributions
ly-0.074
 up-0.068
ya-0.066
 closely-0.066
ivated-0.066
Token and
Token ID290
Feature activation-3.20e-02

Pos logit contributions
oc+0.081
ests+0.080
pter+0.079
tr+0.078
 slices+0.078
Neg logit contributions
 up-0.043
ly-0.042
 closely-0.039
 around-0.031
 early-0.031
Token a
Token ID257
Feature activation-3.94e-02

Pos logit contributions
tr+0.052
pter+0.050
ase+0.050
aid+0.050
ests+0.050
Neg logit contributions
 closely-0.049
 up-0.045
 Jackie-0.045
ya-0.042
ly-0.040
Token new
Token ID649
Feature activation-2.46e-02

Pos logit contributions
pter+0.073
ests+0.073
 carriers+0.070
Rex+0.068
aid+0.068
Neg logit contributions
 closely-0.051
 Jackie-0.047
 up-0.046
 handsome-0.046
ivated-0.043
Token teaspoon
Token ID22326
Feature activation-7.78e-02

Pos logit contributions
aid+0.036
tr+0.033
Rex+0.032
 slices+0.032
ests+0.031
Neg logit contributions
 closely-0.044
 up-0.041
 Jackie-0.040
ya-0.039
ivated-0.039
Token.
Token ID13
Feature activation-1.34e-02

Pos logit contributions
Rex+0.128
ests+0.126
aid+0.126
 handcuffs+0.124
 carriers+0.123
Neg logit contributions
 closely-0.118
ly-0.113
 up-0.110
 Jackie-0.099
uring-0.098
Token This
Token ID770
Feature activation-9.21e-03

Pos logit contributions
 slices+0.024
aid+0.023
bridge+0.023
 slice+0.022
 bisc+0.022
Neg logit contributions
 closely-0.019
ivated-0.017
 Jackie-0.017
inating-0.017
 up-0.016
Token time
Token ID640
Feature activation+2.61e-03

Pos logit contributions
aid+0.016
tr+0.015
 slices+0.015
oc+0.014
 bisc+0.014
Neg logit contributions
 closely-0.015
ivated-0.013
ya-0.013
uring-0.013
ize-0.013
Token Winn
Token ID19454
Feature activation-4.48e-03

Pos logit contributions
 closely+0.003
ivated+0.003
uring+0.003
ize+0.003
ya+0.002
Neg logit contributions
aid-0.005
 slices-0.005
tr-0.005
 half-0.005
 bisc-0.005
Tokenie
Token ID494
Feature activation-4.17e-03

Pos logit contributions
aid+0.010
 slices+0.009
tr+0.009
 half+0.009
oc+0.009
Neg logit contributions
 closely-0.005
ivated-0.004
uring-0.004
ize-0.004
itionally-0.004
Token.
Token ID13
Feature activation-1.97e-02

Pos logit contributions
aid+0.051
tr+0.049
 slices+0.046
oc+0.045
ests+0.044
Neg logit contributions
 closely-0.054
 up-0.048
ivated-0.048
uring-0.047
ize-0.045
Token\n
Token ID198
Feature activation-2.93e-02

Pos logit contributions
aid+0.029
 slices+0.027
tr+0.027
ests+0.025
oc+0.025
Neg logit contributions
 closely-0.037
ivated-0.034
 up-0.032
uring-0.032
 Jackie-0.032
Token\n
Token ID198
Feature activation-1.62e-02

Pos logit contributions
aid+0.043
tr+0.042
 slices+0.042
 bisc+0.038
ests+0.037
Neg logit contributions
 closely-0.054
ivated-0.048
uring-0.047
 up-0.047
ize-0.047
TokenThe
Token ID464
Feature activation-7.89e-03

Pos logit contributions
tr+0.023
aid+0.022
 slices+0.021
ests+0.021
 bisc+0.020
Neg logit contributions
 closely-0.030
ivated-0.029
ize-0.028
uring-0.028
itionally-0.027
Token dog
Token ID3290
Feature activation-3.38e-02

Pos logit contributions
aid+0.012
 slices+0.011
tr+0.011
bar+0.010
pter+0.010
Neg logit contributions
 closely-0.014
 up-0.013
ya-0.012
ivated-0.012
itionally-0.012
Token bar
Token ID2318
Feature activation-7.76e-02

Pos logit contributions
tr+0.047
aid+0.045
 slices+0.045
 half+0.044
 bisc+0.044
Neg logit contributions
ivated-0.060
uring-0.058
 Jackie-0.057
ize-0.056
 closely-0.055
Tokenks
Token ID591
Feature activation-1.45e-02

Pos logit contributions
tr+0.107
 slices+0.103
aid+0.103
 bisc+0.101
 scarf+0.098
Neg logit contributions
 closely-0.146
ivated-0.125
 up-0.124
ly-0.122
ya-0.121
Token.
Token ID13
Feature activation-2.51e-03

Pos logit contributions
aid+0.024
tr+0.024
 slices+0.023
 bisc+0.022
ests+0.022
Neg logit contributions
 closely-0.025
ivated-0.021
 up-0.021
uring-0.020
ly-0.020
Token It
Token ID632
Feature activation-1.17e-02

Pos logit contributions
aid+0.005
 slices+0.005
tr+0.005
ests+0.004
brush+0.004
Neg logit contributions
 closely-0.003
 But-0.003
ivated-0.003
 up-0.003
inating-0.003
Token sounds
Token ID5238
Feature activation+7.91e-03

Pos logit contributions
tr+0.018
aid+0.017
 slices+0.017
 half+0.017
oc+0.017
Neg logit contributions
ivated-0.019
 closely-0.018
uring-0.018
ize-0.018
 Jackie-0.018
Token like
Token ID588
Feature activation-3.33e-02

Pos logit contributions
ivated+0.013
 closely+0.013
ya+0.013
annah+0.012
ize+0.012
Neg logit contributions
aid-0.012
 slices-0.011
 half-0.011
tr-0.011
Rex-0.010
Token mom
Token ID1995
Feature activation-3.58e-02

Pos logit contributions
aid+0.016
 slices+0.015
tr+0.014
oc+0.014
 bisc+0.013
Neg logit contributions
 closely-0.023
ivated-0.021
uring-0.021
ya-0.021
 up-0.020
Token smiled
Token ID13541
Feature activation-1.02e-02

Pos logit contributions
tr+0.068
 slices+0.066
aid+0.065
ests+0.064
oc+0.063
Neg logit contributions
 closely-0.048
ivated-0.042
ly-0.041
 up-0.040
ya-0.039
Token and
Token ID290
Feature activation-3.09e-02

Pos logit contributions
aid+0.022
tr+0.021
 slices+0.020
 bisc+0.020
ests+0.020
Neg logit contributions
 closely-0.013
 up-0.012
ivated-0.011
ly-0.010
uring-0.010
Token gave
Token ID2921
Feature activation-1.76e-02

Pos logit contributions
aid+0.047
 slices+0.045
tr+0.044
 half+0.041
 bisc+0.040
Neg logit contributions
 closely-0.054
ivated-0.051
uring-0.049
ize-0.048
ya-0.048
Token them
Token ID606
Feature activation-3.28e-02

Pos logit contributions
tr+0.038
aid+0.037
oc+0.036
 slices+0.035
 bisc+0.034
Neg logit contributions
 closely-0.022
 up-0.020
 Jackie-0.018
ly-0.017
ize-0.017
Token each
Token ID1123
Feature activation-7.74e-02

Pos logit contributions
oc+0.054
tr+0.053
aid+0.050
 slot+0.050
bridge+0.048
Neg logit contributions
 closely-0.055
 up-0.052
ivated-0.046
 Jackie-0.046
ya-0.045
Token a
Token ID257
Feature activation-3.24e-02

Pos logit contributions
tr+0.149
bar+0.135
lor+0.134
bridge+0.132
ase+0.132
Neg logit contributions
 closely-0.113
 up-0.109
ly-0.101
 Jackie-0.092
ya-0.092
Token kiss
Token ID9245
Feature activation-4.98e-02

Pos logit contributions
pter+0.065
eper+0.059
Rex+0.059
bridge+0.059
ests+0.059
Neg logit contributions
 closely-0.037
 up-0.034
 handsome-0.033
 Jackie-0.032
 early-0.030
Token.
Token ID13
Feature activation-7.17e-03

Pos logit contributions
 ig+0.109
 bruises+0.106
aid+0.103
 slices+0.102
ossible+0.102
Neg logit contributions
ly-0.053
 up-0.051
 closely-0.044
 around-0.043
 before-0.038
Token She
Token ID1375
Feature activation-3.79e-02

Pos logit contributions
aid+0.010
 slices+0.010
tr+0.010
 half+0.009
ossible+0.009
Neg logit contributions
 closely-0.013
ivated-0.012
uring-0.012
ya-0.012
ize-0.012
Token was
Token ID373
Feature activation-3.27e-03

Pos logit contributions
 slices+0.069
aid+0.066
pter+0.066
tr+0.065
umbs+0.065
Neg logit contributions
 closely-0.053
 up-0.045
ya-0.044
ize-0.042
ivated-0.042
Token Mom
Token ID11254
Feature activation-1.37e-02

Pos logit contributions
aid+0.011
 slices+0.010
tr+0.010
 half+0.010
 bisc+0.010
Neg logit contributions
 closely-0.009
ivated-0.009
uring-0.008
ize-0.008
itionally-0.008
Token gives
Token ID3607
Feature activation-1.75e-02

Pos logit contributions
aid+0.023
 slices+0.022
tr+0.022
 half+0.021
oc+0.020
Neg logit contributions
 closely-0.022
ivated-0.020
uring-0.019
ya-0.019
ize-0.019
Token them
Token ID606
Feature activation-2.89e-02

Pos logit contributions
tr+0.038
oc+0.037
aid+0.037
 slices+0.034
ests+0.034
Neg logit contributions
 up-0.022
 closely-0.020
 Jackie-0.017
ize-0.016
ya-0.016
Token each
Token ID1123
Feature activation-7.62e-02

Pos logit contributions
oc+0.049
tr+0.046
aid+0.045
 slot+0.045
bridge+0.044
Neg logit contributions
 closely-0.046
 up-0.044
ya-0.038
 Jackie-0.037
ly-0.036
Token a
Token ID257
Feature activation-3.56e-02

Pos logit contributions
tr+0.145
bar+0.137
aid+0.132
ase+0.131
ulture+0.131
Neg logit contributions
 closely-0.109
 up-0.108
ly-0.094
ya-0.094
 Jackie-0.086
Token plate
Token ID7480
Feature activation-7.73e-02

Pos logit contributions
aid+0.056
tr+0.054
 slices+0.054
ests+0.053
 carriers+0.053
Neg logit contributions
 closely-0.057
ya-0.050
ivated-0.049
 up-0.049
 Jackie-0.047
Token.
Token ID13
Feature activation-7.71e-03

Pos logit contributions
oc+0.156
ulture+0.152
tr+0.150
ests+0.150
Rex+0.148
Neg logit contributions
 closely-0.097
 up-0.084
ly-0.083
 around-0.071
uring-0.067
Token They
Token ID1119
Feature activation-1.80e-02

Pos logit contributions
aid+0.012
 slices+0.011
tr+0.011
oc+0.010
 half+0.010
Neg logit contributions
 closely-0.014
ivated-0.012
uring-0.012
inating-0.012
itionally-0.012
Token pour
Token ID12797
Feature activation-8.44e-03

Pos logit contributions
aid+0.032
tr+0.031
 slices+0.030
ests+0.029
oc+0.029
Neg logit contributions
 closely-0.027
ivated-0.024
 up-0.023
uring-0.022
ya-0.022
Token syrup
Token ID24208
Feature activation-2.44e-02

Pos logit contributions
aid+0.012
tr+0.012
oc+0.011
ests+0.011
 slices+0.010
Neg logit contributions
 closely-0.016
 up-0.015
ivated-0.014
uring-0.013
ly-0.013
Token and
Token ID290
Feature activation-2.82e-02

Pos logit contributions
aid+0.039
tr+0.038
oc+0.037
 bruises+0.036
 handcuffs+0.036
Neg logit contributions
 closely-0.040
 up-0.037
ivated-0.035
uring-0.034
ize-0.033
Token She
Token ID1375
Feature activation-2.09e-02

Pos logit contributions
aid+0.003
tr+0.003
 slices+0.003
oc+0.002
ests+0.002
Neg logit contributions
 closely-0.002
ivated-0.002
uring-0.002
inating-0.002
ize-0.002
Token gave
Token ID2921
Feature activation-2.02e-02

Pos logit contributions
aid+0.037
ests+0.036
tr+0.036
ulture+0.035
 carriers+0.034
Neg logit contributions
 closely-0.030
 up-0.025
ya-0.023
ize-0.023
 Jackie-0.023
Token Tom
Token ID4186
Feature activation-6.05e-03

Pos logit contributions
tr+0.042
aid+0.041
oc+0.040
ests+0.039
 sy+0.038
Neg logit contributions
 up-0.026
 closely-0.025
ize-0.020
ya-0.019
 Jackie-0.019
Token and
Token ID290
Feature activation-9.75e-03

Pos logit contributions
tr+0.013
aid+0.012
 slot+0.012
bridge+0.012
ests+0.012
Neg logit contributions
 up-0.007
 closely-0.006
ly-0.005
ya-0.005
 early-0.004
Token Lily
Token ID20037
Feature activation-2.63e-02

Pos logit contributions
oc+0.023
ulture+0.023
aid+0.022
ests+0.022
tr+0.022
Neg logit contributions
 closely-0.008
 Jackie-0.008
 up-0.007
 Jack-0.006
ya-0.006
Token each
Token ID1123
Feature activation-7.72e-02

Pos logit contributions
oc+0.044
tr+0.044
ulture+0.043
Rex+0.042
bridge+0.041
Neg logit contributions
 up-0.043
 closely-0.042
 early-0.034
ly-0.032
ivated-0.032
Token a
Token ID257
Feature activation-2.95e-02

Pos logit contributions
tr+0.145
lor+0.144
ests+0.143
ulture+0.136
 carriers+0.136
Neg logit contributions
 closely-0.098
 up-0.098
ly-0.091
 Jackie-0.080
 early-0.078
Token slice
Token ID16416
Feature activation-6.49e-02

Pos logit contributions
ests+0.038
aid+0.035
pter+0.035
tr+0.035
 carriers+0.035
Neg logit contributions
 closely-0.056
 Jackie-0.050
 up-0.049
 handsome-0.049
very-0.048
Token.
Token ID13
Feature activation-6.11e-03

Pos logit contributions
oc+0.128
aid+0.125
 bruises+0.124
Rex+0.124
 handcuffs+0.122
Neg logit contributions
 up-0.076
 closely-0.074
ly-0.067
 around-0.058
 before-0.056
Token They
Token ID1119
Feature activation-2.39e-02

Pos logit contributions
aid+0.010
tr+0.010
 slices+0.010
oc+0.009
 half+0.009
Neg logit contributions
 closely-0.010
ivated-0.009
uring-0.008
inating-0.008
ize-0.008
Token took
Token ID1718
Feature activation-2.89e-02

Pos logit contributions
tr+0.049
ests+0.047
ulture+0.047
lor+0.046
 carriers+0.046
Neg logit contributions
 closely-0.027
 forgot-0.022
 up-0.022
 almost-0.021
 Jackie-0.021
Token was
Token ID373
Feature activation-6.61e-03

Pos logit contributions
 closely+0.008
 up+0.008
 Jackie+0.006
uring+0.006
 almost+0.006
Neg logit contributions
aid-0.014
 slices-0.013
ossible-0.013
umbs-0.012
ulture-0.012
Token a
Token ID257
Feature activation+2.78e-03

Pos logit contributions
aid+0.012
oc+0.012
ulture+0.011
ests+0.011
tr+0.011
Neg logit contributions
 closely-0.010
 up-0.009
uring-0.008
 Jackie-0.008
 impossible-0.008
Token box
Token ID3091
Feature activation-1.34e-02

Pos logit contributions
 closely+0.005
 lookout+0.004
 up+0.004
ya+0.004
 Jackie+0.004
Neg logit contributions
aid-0.005
ests-0.004
 slices-0.004
umbs-0.004
ossible-0.004
Token of
Token ID286
Feature activation-3.54e-02

Pos logit contributions
aid+0.026
 slices+0.023
ests+0.023
tr+0.023
 bruises+0.022
Neg logit contributions
 closely-0.020
uring-0.017
 up-0.017
ivated-0.017
ly-0.016
Token ch
Token ID442
Feature activation+6.50e-03

Pos logit contributions
aid+0.056
tr+0.055
ulture+0.053
oc+0.053
umbs+0.053
Neg logit contributions
 closely-0.056
 up-0.052
 Jackie-0.050
 handsome-0.048
ly-0.047
Tokenocol
Token ID4668
Feature activation-7.71e-02

Pos logit contributions
 closely+0.012
ivated+0.011
uring+0.010
inating+0.010
ize+0.010
Neg logit contributions
tr-0.010
aid-0.009
 bisc-0.008
 slices-0.008
 scarf-0.008
Tokenates
Token ID689
Feature activation-2.48e-02

Pos logit contributions
aid+0.098
tr+0.097
 bisc+0.095
oc+0.089
 scarf+0.089
Neg logit contributions
 closely-0.123
anna-0.120
inating-0.119
uring-0.119
ize-0.117
Token.
Token ID13
Feature activation+2.36e-03

Pos logit contributions
aid+0.046
tr+0.041
 slices+0.039
ests+0.039
ossible+0.039
Neg logit contributions
 closely-0.040
 up-0.034
uring-0.034
ivated-0.033
ize-0.032
Token Lily
Token ID20037
Feature activation+1.75e-02

Pos logit contributions
 closely+0.004
ivated+0.003
uring+0.003
 Jackie+0.003
inating+0.003
Neg logit contributions
aid-0.004
tr-0.004
 slices-0.004
 bruises-0.004
 bisc-0.004
Token loved
Token ID6151
Feature activation+1.15e-03

Pos logit contributions
ivated+0.028
 closely+0.028
uring+0.027
ize+0.026
itionally+0.026
Neg logit contributions
 slices-0.027
aid-0.027
tr-0.027
 half-0.025
oc-0.025
Token ch
Token ID442
Feature activation+2.27e-02

Pos logit contributions
 closely+0.002
ya+0.002
 up+0.002
ize+0.002
uring+0.002
Neg logit contributions
aid-0.002
tr-0.002
oc-0.002
 slices-0.002
ossible-0.002
Token a
Token ID257
Feature activation-3.15e-02

Pos logit contributions
oc+0.122
tr+0.122
ase+0.122
ests+0.120
aid+0.120
Neg logit contributions
 closely-0.077
 up-0.072
ivated-0.066
 Jackie-0.064
 handsome-0.062
Token big
Token ID1263
Feature activation-4.21e-02

Pos logit contributions
aid+0.058
 slices+0.055
tr+0.054
ests+0.052
 half+0.051
Neg logit contributions
 closely-0.045
ivated-0.041
ya-0.038
uring-0.038
 up-0.038
Token hat
Token ID6877
Feature activation-5.73e-02

Pos logit contributions
aid+0.072
 slices+0.069
tr+0.068
 half+0.064
 bisc+0.063
Neg logit contributions
 closely-0.066
ivated-0.061
uring-0.058
ize-0.057
ya-0.057
Token and
Token ID290
Feature activation-2.94e-02

Pos logit contributions
aid+0.117
 slices+0.108
ests+0.107
ossible+0.102
oc+0.101
Neg logit contributions
 closely-0.081
 up-0.077
ly-0.066
ivated-0.064
uring-0.063
Token a
Token ID257
Feature activation-4.22e-02

Pos logit contributions
aid+0.045
 slices+0.041
tr+0.041
ests+0.041
ase+0.040
Neg logit contributions
 closely-0.052
 up-0.046
 Jackie-0.045
ivated-0.045
ya-0.044
Token cane
Token ID33009
Feature activation-7.71e-02

Pos logit contributions
aid+0.075
ests+0.071
 slices+0.070
oc+0.067
tr+0.067
Neg logit contributions
 closely-0.063
ivated-0.053
 up-0.053
 lookout-0.053
 handsome-0.052
Token too
Token ID1165
Feature activation-1.89e-02

Pos logit contributions
aid+0.150
ests+0.140
 slices+0.138
ossible+0.133
umbs+0.128
Neg logit contributions
 closely-0.117
 up-0.104
ly-0.094
ivated-0.092
uring-0.089
Token.
Token ID13
Feature activation-5.51e-03

Pos logit contributions
aid+0.029
 slices+0.027
tr+0.027
ossible+0.025
ests+0.025
Neg logit contributions
 closely-0.034
ivated-0.030
uring-0.030
 up-0.029
ize-0.028
Token\n
Token ID198
Feature activation-1.57e-02

Pos logit contributions
aid+0.009
 slices+0.008
tr+0.008
 half+0.008
oc+0.007
Neg logit contributions
 closely-0.009
ivated-0.009
uring-0.009
ize-0.008
ya-0.008
Token
Token ID198
Feature activation-2.45e-03

Pos logit contributions
aid+0.019
 slices+0.018
tr+0.018
 half+0.016
 bisc+0.016
Neg logit contributions
 closely-0.032
ivated-0.030
uring-0.029
ize-0.029
ya-0.029
Token"
Token ID1
Feature activation-4.14e-02

Pos logit contributions
aid+0.003
 slices+0.003
tr+0.003
 half+0.002
 bisc+0.002
Neg logit contributions
 closely-0.005
ivated-0.005
uring-0.005
ize-0.005
ya-0.005
Token a
Token ID257
Feature activation+3.62e-03

Pos logit contributions
tr+0.038
aid+0.037
oc+0.035
 slices+0.035
ossible+0.035
Neg logit contributions
 closely-0.038
 up-0.035
ya-0.032
ize-0.032
 Jackie-0.032
Token big
Token ID1263
Feature activation-5.41e-04

Pos logit contributions
 closely+0.005
ivated+0.005
ya+0.005
ize+0.005
uring+0.005
Neg logit contributions
aid-0.007
 slices-0.006
tr-0.006
ests-0.006
ossible-0.006
Token box
Token ID3091
Feature activation-1.29e-02

Pos logit contributions
aid+0.001
 slices+0.001
tr+0.001
 half+0.001
 bisc+0.001
Neg logit contributions
 closely-0.001
ivated-0.001
uring-0.001
ize-0.001
ya-0.001
Token of
Token ID286
Feature activation-2.57e-02

Pos logit contributions
aid+0.024
ests+0.022
 slices+0.021
tr+0.021
oc+0.021
Neg logit contributions
 closely-0.021
 up-0.018
ivated-0.018
uring-0.018
ly-0.016
Token ch
Token ID442
Feature activation+5.71e-03

Pos logit contributions
aid+0.042
tr+0.042
ossible+0.039
umbs+0.039
ulture+0.038
Neg logit contributions
 closely-0.042
 up-0.038
 Jackie-0.036
ize-0.035
ivated-0.035
Tokenocol
Token ID4668
Feature activation-7.70e-02

Pos logit contributions
 closely+0.010
ivated+0.010
uring+0.009
ize+0.009
inating+0.009
Neg logit contributions
tr-0.009
aid-0.008
 bisc-0.008
 slices-0.008
oc-0.007
Tokenates
Token ID689
Feature activation-2.14e-02

Pos logit contributions
aid+0.099
 bisc+0.096
tr+0.094
oc+0.091
 scarf+0.090
Neg logit contributions
 closely-0.122
anna-0.120
inating-0.120
ize-0.117
ya-0.117
Token.
Token ID13
Feature activation+1.10e-02

Pos logit contributions
aid+0.038
tr+0.034
 slices+0.032
oc+0.032
ests+0.032
Neg logit contributions
 closely-0.036
 up-0.032
ivated-0.031
uring-0.031
ize-0.030
Token It
Token ID632
Feature activation+1.60e-02

Pos logit contributions
 closely+0.017
ivated+0.015
uring+0.015
 Jackie+0.014
ya+0.014
Neg logit contributions
aid-0.020
 slices-0.018
tr-0.018
 half-0.017
 bisc-0.017
Token was
Token ID373
Feature activation-2.69e-04

Pos logit contributions
 closely+0.026
ivated+0.023
 up+0.022
uring+0.022
ya+0.022
Neg logit contributions
aid-0.028
 slices-0.026
tr-0.025
ests-0.024
ossible-0.024
Token so
Token ID523
Feature activation+1.20e-02

Pos logit contributions
aid+0.000
oc+0.000
ests+0.000
tr+0.000
 slices+0.000
Neg logit contributions
 closely-0.000
 up-0.000
uring-0.000
ivated-0.000
 Jackie-0.000
Token trying
Token ID2111
Feature activation+4.07e-02

Pos logit contributions
 closely+0.031
ivated+0.030
itionally+0.029
uring+0.028
inating+0.028
Neg logit contributions
aid-0.035
 slices-0.034
 half-0.032
tr-0.032
 handcuffs-0.031
Token,
Token ID11
Feature activation+9.22e-03

Pos logit contributions
ivated+0.074
itionally+0.071
rated+0.068
inating+0.068
ya+0.066
Neg logit contributions
 half-0.057
aid-0.056
 slices-0.055
 scarf-0.051
 bruises-0.050
Token he
Token ID339
Feature activation+1.91e-02

Pos logit contributions
 closely+0.014
ivated+0.013
uring+0.012
ize+0.012
ya+0.012
Neg logit contributions
aid-0.016
 slices-0.016
tr-0.016
 half-0.015
 bisc-0.014
Token could
Token ID714
Feature activation+8.26e-04

Pos logit contributions
 closely+0.032
ivated+0.030
uring+0.029
ize+0.028
ya+0.028
Neg logit contributions
aid-0.030
 slices-0.029
tr-0.028
 half-0.027
 bisc-0.026
Token do
Token ID466
Feature activation+1.84e-02

Pos logit contributions
 closely+0.001
ivated+0.001
uring+0.001
ya+0.001
itionally+0.001
Neg logit contributions
aid-0.001
 slices-0.001
tr-0.001
 half-0.001
ossible-0.001
Token anything
Token ID1997
Feature activation+5.13e-02

Pos logit contributions
 closely+0.030
ivated+0.030
inating+0.028
uring+0.028
itionally+0.028
Neg logit contributions
aid-0.029
 slices-0.027
 half-0.026
tr-0.026
 bruises-0.025
Token he
Token ID339
Feature activation+2.98e-02

Pos logit contributions
inating+0.093
ivated+0.092
ya+0.087
itionally+0.086
 closely+0.085
Neg logit contributions
aid-0.064
 half-0.060
 slices-0.060
oc-0.057
tr-0.055
Token wanted
Token ID2227
Feature activation+1.77e-02

Pos logit contributions
 closely+0.048
ivated+0.047
uring+0.044
inating+0.043
ize+0.043
Neg logit contributions
aid-0.049
tr-0.045
 slices-0.045
 half-0.044
oc-0.043
Token.
Token ID13
Feature activation-2.12e-04

Pos logit contributions
ivated+0.033
 closely+0.033
itionally+0.031
inating+0.030
uring+0.030
Neg logit contributions
aid-0.023
 slices-0.023
 half-0.022
tr-0.021
ests-0.020
Token<|endoftext|>
Token ID50256
Feature activation-8.49e-03

Pos logit contributions
aid+0.000
 slices+0.000
tr+0.000
brush+0.000
 carriers+0.000
Neg logit contributions
 closely-0.000
 Jackie-0.000
 up-0.000
uring-0.000
itionally-0.000
TokenOnce
Token ID7454
Feature activation+1.83e-02

Pos logit contributions
aid+0.014
 slices+0.013
tr+0.013
 bruises+0.012
 slice+0.012
Neg logit contributions
 closely-0.014
 up-0.013
 Jackie-0.013
ivated-0.012
inating-0.012
Token.
Token ID13
Feature activation+7.33e-03

Pos logit contributions
 closely+0.043
ivated+0.040
uring+0.038
ize+0.038
ya+0.037
Neg logit contributions
aid-0.040
 slices-0.038
tr-0.038
 half-0.035
 bisc-0.034
Token
Token ID220
Feature activation+1.31e-02

Pos logit contributions
 closely+0.015
ivated+0.015
ize+0.015
uring+0.015
ya+0.014
Neg logit contributions
ossible-0.007
tr-0.007
aid-0.007
 slices-0.007
 half-0.007
Token
Token ID198
Feature activation-2.07e-03

Pos logit contributions
 closely+0.023
 up+0.021
ivated+0.020
uring+0.019
 Jackie+0.019
Neg logit contributions
aid-0.021
 slices-0.020
tr-0.019
 bisc-0.019
oc-0.019
Token
Token ID198
Feature activation+1.19e-02

Pos logit contributions
aid+0.002
 slices+0.002
tr+0.002
 half+0.002
 bisc+0.002
Neg logit contributions
 closely-0.004
ivated-0.004
uring-0.004
ize-0.004
ya-0.004
TokenWhen
Token ID2215
Feature activation+3.88e-03

Pos logit contributions
 closely+0.021
ivated+0.020
uring+0.019
ize+0.019
itionally+0.019
Neg logit contributions
aid-0.017
 slices-0.017
tr-0.016
 half-0.016
ossible-0.016
Token Rachel
Token ID15984
Feature activation+5.12e-02

Pos logit contributions
 closely+0.006
ivated+0.006
itionally+0.005
uring+0.005
anna+0.005
Neg logit contributions
 slices-0.007
 half-0.006
ossible-0.006
aid-0.006
tr-0.006
Token arrived
Token ID5284
Feature activation+1.94e-02

Pos logit contributions
ivated+0.093
 closely+0.092
ize+0.090
uring+0.089
itionally+0.089
Neg logit contributions
aid-0.065
oc-0.063
 slices-0.061
 half-0.060
tr-0.059
Token,
Token ID11
Feature activation+1.20e-02

Pos logit contributions
itionally+0.041
rated+0.039
uring+0.035
ivated+0.034
ulating+0.034
Neg logit contributions
ossible-0.023
 half-0.021
 slices-0.021
aid-0.019
eper-0.019
Token she
Token ID673
Feature activation+1.82e-02

Pos logit contributions
itionally+0.019
 closely+0.018
uring+0.018
anna+0.018
ivated+0.018
Neg logit contributions
 slices-0.018
 half-0.017
aid-0.017
ossible-0.016
tr-0.016
Token felt
Token ID2936
Feature activation+7.57e-03

Pos logit contributions
 closely+0.032
ivated+0.032
ize+0.031
uring+0.031
itionally+0.031
Neg logit contributions
ossible-0.023
 slices-0.023
aid-0.022
 half-0.022
tr-0.022
Token the
Token ID262
Feature activation+6.79e-03

Pos logit contributions
 closely+0.013
ya+0.013
itionally+0.013
ize+0.012
anna+0.012
Neg logit contributions
aid-0.011
 slices-0.010
 bruises-0.009
 half-0.009
 handcuffs-0.009
Token,
Token ID11
Feature activation+2.52e-02

Pos logit contributions
 closely+0.022
ivated+0.020
uring+0.020
itionally+0.019
ya+0.019
Neg logit contributions
aid-0.035
tr-0.034
 slices-0.034
 half-0.032
 bisc-0.032
Token she
Token ID673
Feature activation+2.70e-02

Pos logit contributions
 closely+0.041
ivated+0.038
 up+0.037
uring+0.037
ize+0.036
Neg logit contributions
aid-0.041
 slices-0.039
tr-0.038
oc-0.036
 half-0.036
Token saw
Token ID2497
Feature activation-1.70e-02

Pos logit contributions
 closely+0.047
ivated+0.046
uring+0.043
ize+0.043
itionally+0.042
Neg logit contributions
aid-0.039
tr-0.039
 slices-0.038
 half-0.037
oc-0.036
Token a
Token ID257
Feature activation+1.46e-02

Pos logit contributions
oc+0.028
 slices+0.027
aid+0.027
pter+0.025
tr+0.025
Neg logit contributions
 closely-0.028
 up-0.027
ivated-0.024
 Jackie-0.024
ize-0.024
Token big
Token ID1263
Feature activation+6.87e-03

Pos logit contributions
 closely+0.023
ivated+0.021
uring+0.020
ize+0.020
ya+0.020
Neg logit contributions
aid-0.025
 slices-0.024
tr-0.023
 half-0.022
 bisc-0.022
Token,
Token ID11
Feature activation+4.50e-02

Pos logit contributions
 closely+0.013
ivated+0.013
uring+0.012
itionally+0.012
ize+0.012
Neg logit contributions
aid-0.009
 slices-0.009
tr-0.008
 half-0.008
 bisc-0.008
Token red
Token ID2266
Feature activation-3.16e-02

Pos logit contributions
 closely+0.061
ivated+0.058
uring+0.056
ize+0.054
itionally+0.054
Neg logit contributions
aid-0.083
 slices-0.082
tr-0.081
 bisc-0.077
 half-0.077
Token truck
Token ID7779
Feature activation-2.28e-02

Pos logit contributions
aid+0.050
 slices+0.041
pter+0.041
 bruises+0.040
ests+0.040
Neg logit contributions
 closely-0.056
 up-0.053
inating-0.051
ly-0.050
ya-0.049
Token.
Token ID13
Feature activation+1.45e-02

Pos logit contributions
aid+0.048
 slices+0.045
ests+0.043
ossible+0.041
 bruises+0.040
Neg logit contributions
 closely-0.031
 up-0.030
ivated-0.027
uring-0.027
ly-0.024
Token It
Token ID632
Feature activation+2.53e-02

Pos logit contributions
 closely+0.019
ivated+0.017
 up+0.017
appointed+0.016
 But+0.016
Neg logit contributions
aid-0.028
 slices-0.027
ests-0.025
 slice-0.025
 half-0.025
Token was
Token ID373
Feature activation+3.02e-03

Pos logit contributions
 closely+0.045
ivated+0.041
uring+0.040
ize+0.039
ya+0.039
Neg logit contributions
aid-0.039
 slices-0.037
tr-0.036
 half-0.034
 bisc-0.033
Token.
Token ID13
Feature activation+2.52e-02

Pos logit contributions
 closely+0.002
 up+0.002
ya+0.002
anna+0.002
ivated+0.002
Neg logit contributions
aid-0.003
 slices-0.003
 handcuffs-0.002
 bisc-0.002
ests-0.002
Token L
Token ID406
Feature activation+3.36e-02

Pos logit contributions
 closely+0.037
ivated+0.034
uring+0.032
ize+0.031
itionally+0.031
Neg logit contributions
aid-0.046
 slices-0.044
tr-0.043
 half-0.041
oc-0.041
Tokenila
Token ID10102
Feature activation+2.31e-02

Pos logit contributions
 closely+0.077
ivated+0.073
uring+0.071
ize+0.070
ya+0.070
Neg logit contributions
aid-0.033
 slices-0.031
tr-0.030
 half-0.027
oc-0.026
Token loved
Token ID6151
Feature activation+8.87e-03

Pos logit contributions
 closely+0.038
ivated+0.037
uring+0.036
itionally+0.034
ize+0.034
Neg logit contributions
aid-0.036
tr-0.036
 slices-0.034
 half-0.033
oc-0.033
Token to
Token ID284
Feature activation+4.04e-03

Pos logit contributions
 closely+0.011
ivated+0.010
ize+0.010
ya+0.010
 up+0.010
Neg logit contributions
aid-0.018
tr-0.017
 slices-0.017
oc-0.016
 half-0.016
Token go
Token ID467
Feature activation+6.43e-02

Pos logit contributions
 closely+0.006
ivated+0.005
uring+0.005
ize+0.005
ya+0.005
Neg logit contributions
aid-0.008
 slices-0.007
tr-0.007
 half-0.007
oc-0.007
Token on
Token ID319
Feature activation+1.30e-02

Pos logit contributions
itionally+0.135
ivated+0.134
 closely+0.134
ya+0.131
rated+0.128
Neg logit contributions
aid-0.075
tr-0.072
ossible-0.068
 bisc-0.064
 slices-0.064
Token walks
Token ID11114
Feature activation+3.93e-02

Pos logit contributions
 closely+0.022
ivated+0.021
uring+0.021
rated+0.020
itionally+0.020
Neg logit contributions
 slices-0.020
aid-0.019
tr-0.019
 half-0.018
ossible-0.018
Token in
Token ID287
Feature activation-1.17e-04

Pos logit contributions
itionally+0.073
 closely+0.072
ivated+0.072
ya+0.069
rated+0.069
Neg logit contributions
tr-0.050
 slices-0.049
aid-0.048
ossible-0.046
 half-0.045
Token the
Token ID262
Feature activation+1.13e-02

Pos logit contributions
aid+0.000
tr+0.000
oc+0.000
ulture+0.000
bar+0.000
Neg logit contributions
 closely-0.000
 up-0.000
 Jackie-0.000
 explore-0.000
ivated-0.000
Token woods
Token ID16479
Feature activation+3.80e-03

Pos logit contributions
 closely+0.019
ivated+0.018
uring+0.017
ize+0.016
itionally+0.016
Neg logit contributions
aid-0.018
 slices-0.017
tr-0.017
 half-0.016
oc-0.016
Token he
Token ID339
Feature activation+2.24e-02

Pos logit contributions
 closely+0.015
ivated+0.014
uring+0.013
ize+0.013
 up+0.013
Neg logit contributions
aid-0.020
tr-0.019
 slices-0.019
 bisc-0.017
ests-0.017
Token was
Token ID373
Feature activation-2.31e-03

Pos logit contributions
 closely+0.039
ivated+0.038
uring+0.036
itionally+0.035
ize+0.034
Neg logit contributions
aid-0.034
tr-0.032
 slices-0.032
oc-0.031
 half-0.030
Token never
Token ID1239
Feature activation+4.41e-03

Pos logit contributions
aid+0.004
 slices+0.004
tr+0.004
 bisc+0.004
ests+0.004
Neg logit contributions
 closely-0.003
ivated-0.003
uring-0.003
 up-0.003
ya-0.003
Token scared
Token ID12008
Feature activation+5.28e-02

Pos logit contributions
 closely+0.006
ize+0.005
ivated+0.005
uring+0.005
inating+0.005
Neg logit contributions
aid-0.008
 slices-0.008
 half-0.008
tr-0.008
 bruises-0.008
Token or
Token ID393
Feature activation+2.66e-02

Pos logit contributions
itionally+0.111
annah+0.109
 lookout+0.105
ivated+0.105
ya+0.105
Neg logit contributions
 half-0.050
 of-0.048
aid-0.045
 slices-0.045
ful-0.043
Token worried
Token ID7960
Feature activation+4.40e-02

Pos logit contributions
 closely+0.034
ize+0.031
ya+0.031
annah+0.031
ivated+0.031
Neg logit contributions
aid-0.050
tr-0.046
 slices-0.046
 half-0.045
 bruises-0.044
Token.
Token ID13
Feature activation-7.10e-03

Pos logit contributions
ivated+0.069
itionally+0.068
annah+0.065
ize+0.065
ya+0.064
Neg logit contributions
 slices-0.065
aid-0.064
 half-0.064
tr-0.063
brush-0.059
Token Instead
Token ID5455
Feature activation-2.50e-02

Pos logit contributions
aid+0.014
 slices+0.014
tr+0.013
 bisc+0.013
 slice+0.013
Neg logit contributions
 Jackie-0.009
 But-0.009
 closely-0.008
very-0.007
 Jack-0.007
Token,
Token ID11
Feature activation+5.26e-03

Pos logit contributions
aid+0.050
tr+0.048
 slices+0.047
 bisc+0.044
Rex+0.043
Neg logit contributions
 closely-0.033
 up-0.028
ivated-0.028
uring-0.028
ya-0.027
Token he
Token ID339
Feature activation+1.58e-02

Pos logit contributions
 closely+0.007
 up+0.006
ivated+0.005
uring+0.005
 Jackie+0.005
Neg logit contributions
aid-0.011
tr-0.011
 slices-0.010
ests-0.010
ossible-0.010
Token welcomed
Token ID15846
Feature activation+1.08e-02

Pos logit contributions
 closely+0.026
ivated+0.025
uring+0.024
itionally+0.023
ize+0.023
Neg logit contributions
aid-0.025
 slices-0.024
tr-0.024
 half-0.022
oc-0.022
Token away
Token ID1497
Feature activation+2.55e-02

Pos logit contributions
ivated+0.058
 closely+0.056
itionally+0.055
ize+0.054
ya+0.054
Neg logit contributions
aid-0.055
 half-0.054
 slices-0.052
 scarf-0.052
tr-0.051
Token from
Token ID422
Feature activation+6.24e-03

Pos logit contributions
ivated+0.046
 closely+0.046
itionally+0.044
ya+0.042
ize+0.042
Neg logit contributions
 slices-0.032
tr-0.032
aid-0.032
 half-0.031
oc-0.030
Token it
Token ID340
Feature activation+8.28e-03

Pos logit contributions
ivated+0.010
 closely+0.010
itionally+0.010
uring+0.009
ya+0.009
Neg logit contributions
 slices-0.009
 half-0.009
aid-0.009
tr-0.009
 bruises-0.008
Token,
Token ID11
Feature activation+3.06e-02

Pos logit contributions
 closely+0.013
ivated+0.013
itionally+0.012
ize+0.012
uring+0.012
Neg logit contributions
aid-0.013
 slices-0.013
tr-0.013
 half-0.012
oc-0.012
Token Emily
Token ID17608
Feature activation+9.20e-03

Pos logit contributions
 closely+0.041
ivated+0.041
ize+0.038
ya+0.037
itionally+0.037
Neg logit contributions
aid-0.055
 slices-0.054
tr-0.053
 half-0.052
 bisc-0.050
Token."
Token ID526
Feature activation-8.23e-04

Pos logit contributions
 closely+0.015
ivated+0.014
uring+0.014
ize+0.013
ya+0.013
Neg logit contributions
aid-0.015
 slices-0.014
tr-0.014
 half-0.013
 bisc-0.013
Token Emily
Token ID17608
Feature activation-1.84e-03

Pos logit contributions
 slices+0.001
aid+0.001
tr+0.001
ossible+0.001
 half+0.001
Neg logit contributions
 closely-0.001
ivated-0.001
uring-0.001
ya-0.001
ize-0.001
Token backed
Token ID9763
Feature activation+2.44e-02

Pos logit contributions
aid+0.003
 slices+0.003
tr+0.003
 half+0.003
oc+0.003
Neg logit contributions
 closely-0.003
ivated-0.003
uring-0.003
ize-0.003
itionally-0.003
Token away
Token ID1497
Feature activation+1.35e-02

Pos logit contributions
ivated+0.036
itionally+0.036
 lookout+0.034
uring+0.034
 closely+0.034
Neg logit contributions
aid-0.039
 half-0.037
 slices-0.037
tr-0.035
 handcuffs-0.035
Token,
Token ID11
Feature activation+1.01e-02

Pos logit contributions
itionally+0.021
ivated+0.021
 closely+0.020
inating+0.020
ize+0.020
Neg logit contributions
 slices-0.021
aid-0.020
 half-0.020
tr-0.020
brush-0.018
Token but
Token ID475
Feature activation+1.83e-03

Pos logit contributions
 closely+0.012
ize+0.011
ivated+0.011
itionally+0.011
uring+0.011
Neg logit contributions
aid-0.020
 slices-0.019
tr-0.019
 half-0.019
 bruises-0.018
Token,
Token ID11
Feature activation+1.35e-02

Pos logit contributions
 closely+0.008
ivated+0.007
uring+0.007
 Jackie+0.007
 up+0.007
Neg logit contributions
aid-0.014
tr-0.013
 slices-0.013
ests-0.012
oc-0.012
Token she
Token ID673
Feature activation+7.80e-04

Pos logit contributions
 closely+0.019
ivated+0.017
 up+0.016
uring+0.016
 Jackie+0.016
Neg logit contributions
aid-0.026
tr-0.024
 slices-0.024
ests-0.023
oc-0.023
Token used
Token ID973
Feature activation-1.87e-02

Pos logit contributions
 closely+0.001
ivated+0.001
 up+0.001
ya+0.001
uring+0.001
Neg logit contributions
aid-0.001
tr-0.001
 slices-0.001
ests-0.001
 bisc-0.001
Token her
Token ID607
Feature activation-1.82e-02

Pos logit contributions
oc+0.038
tr+0.037
aid+0.037
Rex+0.035
ulture+0.035
Neg logit contributions
 closely-0.026
 up-0.023
 Jackie-0.022
ivated-0.019
uring-0.019
Token twist
Token ID14528
Feature activation+1.89e-03

Pos logit contributions
aid+0.031
tr+0.029
 slices+0.028
umbs+0.028
oc+0.028
Neg logit contributions
 closely-0.030
 up-0.026
 Jackie-0.024
ya-0.024
 lookout-0.024
Token one
Token ID530
Feature activation-6.19e-05

Pos logit contributions
 closely+0.003
ivated+0.002
uring+0.002
ize+0.002
ya+0.002
Neg logit contributions
aid-0.003
 slices-0.003
tr-0.003
 half-0.003
 bisc-0.003
Token last
Token ID938
Feature activation-3.76e-03

Pos logit contributions
aid+0.000
tr+0.000
 slices+0.000
ests+0.000
 carriers+0.000
Neg logit contributions
 closely-0.000
 up-0.000
ivated-0.000
uring-0.000
ya-0.000
Token time
Token ID640
Feature activation+9.63e-03

Pos logit contributions
aid+0.007
 slices+0.007
tr+0.007
ests+0.007
 carriers+0.007
Neg logit contributions
 closely-0.005
ivated-0.004
uring-0.004
 up-0.004
ya-0.004
Token and
Token ID290
Feature activation-1.82e-03

Pos logit contributions
 closely+0.014
ivated+0.013
uring+0.013
ize+0.013
itionally+0.013
Neg logit contributions
aid-0.017
 slices-0.016
tr-0.016
 half-0.015
 bisc-0.015
Token the
Token ID262
Feature activation-1.11e-02

Pos logit contributions
aid+0.003
 slices+0.003
tr+0.003
oc+0.003
 bisc+0.003
Neg logit contributions
 closely-0.003
ivated-0.002
uring-0.002
ize-0.002
 up-0.002
Token wall
Token ID3355
Feature activation-2.98e-02

Pos logit contributions
aid+0.018
oc+0.016
 slices+0.016
tr+0.016
ests+0.016
Neg logit contributions
 closely-0.018
 up-0.017
ya-0.017
 Jackie-0.016
 lookout-0.016
Token break
Token ID2270
Feature activation-1.22e-03

Pos logit contributions
 closely+0.028
ivated+0.024
uring+0.023
ya+0.023
 up+0.023
Neg logit contributions
aid-0.039
 slices-0.037
tr-0.036
ests-0.036
ossible-0.034
Token for
Token ID329
Feature activation-1.74e-02

Pos logit contributions
aid+0.002
 slices+0.002
tr+0.002
ossible+0.002
 bisc+0.002
Neg logit contributions
 closely-0.002
ivated-0.002
uring-0.002
 up-0.002
ize-0.002
Token it
Token ID340
Feature activation+5.38e-03

Pos logit contributions
aid+0.028
 slices+0.027
tr+0.026
 half+0.025
 bisc+0.024
Neg logit contributions
 closely-0.029
ivated-0.027
uring-0.026
ize-0.025
itionally-0.025
Token.
Token ID13
Feature activation-7.20e-03

Pos logit contributions
 closely+0.009
ivated+0.009
uring+0.008
ize+0.008
ya+0.008
Neg logit contributions
aid-0.008
 slices-0.008
tr-0.008
 half-0.007
 bisc-0.007
Token He
Token ID679
Feature activation+1.33e-02

Pos logit contributions
aid+0.010
 slices+0.010
tr+0.010
 half+0.009
ossible+0.009
Neg logit contributions
 closely-0.013
ivated-0.012
uring-0.012
ize-0.012
ya-0.012
Token ran
Token ID4966
Feature activation+3.62e-02

Pos logit contributions
 closely+0.021
ivated+0.020
uring+0.019
ize+0.019
itionally+0.019
Neg logit contributions
aid-0.022
 slices-0.021
tr-0.021
 half-0.020
oc-0.019
Token back
Token ID736
Feature activation+1.05e-02

Pos logit contributions
itionally+0.066
ivated+0.062
ize+0.060
rated+0.060
 insignificant+0.058
Neg logit contributions
aid-0.049
 half-0.049
 handcuffs-0.047
 slices-0.046
Rex-0.045
Token to
Token ID284
Feature activation-2.22e-02

Pos logit contributions
ivated+0.020
itionally+0.019
ize+0.018
rated+0.018
 closely+0.017
Neg logit contributions
 half-0.014
tr-0.013
aid-0.013
 slices-0.013
ossible-0.013
Token the
Token ID262
Feature activation+1.59e-02

Pos logit contributions
aid+0.038
 slices+0.036
tr+0.036
oc+0.033
 half+0.033
Neg logit contributions
 closely-0.036
ivated-0.033
uring-0.032
ya-0.031
ize-0.031
Token rock
Token ID3881
Feature activation-9.50e-03

Pos logit contributions
 closely+0.026
ivated+0.024
uring+0.023
ya+0.023
ize+0.023
Neg logit contributions
aid-0.026
 slices-0.025
tr-0.025
 half-0.023
oc-0.023
Token and
Token ID290
Feature activation+9.08e-03

Pos logit contributions
aid+0.019
 slices+0.018
tr+0.017
 bisc+0.017
ests+0.016
Neg logit contributions
 closely-0.013
ivated-0.011
uring-0.011
 up-0.011
ya-0.011
Token the
Token ID262
Feature activation+3.88e-03

Pos logit contributions
 closely+0.010
 up+0.009
ivated+0.009
 Jackie+0.008
uring+0.008
Neg logit contributions
aid-0.017
tr-0.016
oc-0.015
 bisc-0.015
 slices-0.015
Token big
Token ID1263
Feature activation-2.87e-04

Pos logit contributions
 closely+0.006
 up+0.005
ivated+0.005
 Jackie+0.005
 lookout+0.005
Neg logit contributions
aid-0.008
 slices-0.007
oc-0.007
umbs-0.007
tr-0.006
Token house
Token ID2156
Feature activation-7.39e-03

Pos logit contributions
aid+0.000
 slices+0.000
tr+0.000
 half+0.000
 bisc+0.000
Neg logit contributions
 closely-0.001
ivated-0.000
uring-0.000
ya-0.000
ize-0.000
Token.
Token ID13
Feature activation+1.90e-02

Pos logit contributions
aid+0.013
tr+0.012
 slices+0.012
 bisc+0.011
ests+0.011
Neg logit contributions
 closely-0.012
ivated-0.011
uring-0.011
 up-0.011
ize-0.010
Token They
Token ID1119
Feature activation+3.31e-02

Pos logit contributions
 closely+0.035
ivated+0.034
ize+0.033
uring+0.033
ya+0.033
Neg logit contributions
 slices-0.026
aid-0.025
tr-0.024
 half-0.024
ossible-0.023
Token liked
Token ID8288
Feature activation+1.35e-02

Pos logit contributions
 closely+0.061
uring+0.058
ize+0.057
ivated+0.057
itionally+0.057
Neg logit contributions
aid-0.044
 slices-0.044
tr-0.043
 half-0.042
ossible-0.041
Token to
Token ID284
Feature activation-1.88e-03

Pos logit contributions
 closely+0.018
ivated+0.017
uring+0.016
ize+0.016
ya+0.016
Neg logit contributions
aid-0.026
 slices-0.025
tr-0.025
 half-0.023
oc-0.023
Token explore
Token ID7301
Feature activation+3.16e-02

Pos logit contributions
aid+0.003
 slices+0.003
tr+0.003
 half+0.003
 slice+0.003
Neg logit contributions
 closely-0.003
ivated-0.003
ize-0.003
ya-0.003
itionally-0.003
Token the
Token ID262
Feature activation+9.69e-03

Pos logit contributions
 closely+0.052
ivated+0.051
uring+0.048
itionally+0.048
ize+0.046
Neg logit contributions
aid-0.049
 slices-0.048
tr-0.046
 half-0.044
oc-0.043
Token rooms
Token ID9519
Feature activation-4.62e-04

Pos logit contributions
 closely+0.016
ivated+0.015
uring+0.015
ize+0.014
itionally+0.014
Neg logit contributions
aid-0.015
 slices-0.015
tr-0.015
 half-0.014
 bisc-0.013
Token and
Token ID290
Feature activation+1.75e-02

Pos logit contributions
 slices+0.001
aid+0.001
tr+0.001
 half+0.001
oc+0.001
Neg logit contributions
 closely-0.001
ivated-0.001
itionally-0.001
ya-0.001
ize-0.001
Token started
Token ID2067
Feature activation+1.61e-03

Pos logit contributions
 closely+0.013
ivated+0.012
uring+0.012
itionally+0.011
ize+0.011
Neg logit contributions
aid-0.015
 slices-0.014
tr-0.014
 half-0.013
oc-0.013
Token to
Token ID284
Feature activation-2.49e-02

Pos logit contributions
 closely+0.002
ivated+0.002
ya+0.002
ize+0.002
itionally+0.002
Neg logit contributions
aid-0.003
 slices-0.003
tr-0.003
 half-0.003
 bruises-0.002
Token cry
Token ID3960
Feature activation+1.83e-02

Pos logit contributions
 scarf+0.035
aid+0.034
 half+0.034
tr+0.034
 bisc+0.033
Neg logit contributions
ya-0.040
 closely-0.040
ivated-0.039
ize-0.039
very-0.038
Token.
Token ID13
Feature activation-3.70e-03

Pos logit contributions
 closely+0.032
ivated+0.030
uring+0.029
ya+0.029
itionally+0.028
Neg logit contributions
aid-0.027
 slices-0.026
tr-0.026
 half-0.024
oc-0.024
Token She
Token ID1375
Feature activation+1.26e-03

Pos logit contributions
aid+0.006
tr+0.006
 slices+0.006
 half+0.006
ossible+0.005
Neg logit contributions
 closely-0.006
ivated-0.005
uring-0.005
itionally-0.005
ya-0.005
Token wanted
Token ID2227
Feature activation+1.85e-05

Pos logit contributions
 closely+0.002
ivated+0.002
uring+0.002
ize+0.002
itionally+0.002
Neg logit contributions
aid-0.002
tr-0.002
 slices-0.002
 half-0.002
oc-0.002
Token to
Token ID284
Feature activation-9.30e-03

Pos logit contributions
 closely+0.000
ivated+0.000
uring+0.000
ize+0.000
ya+0.000
Neg logit contributions
aid-0.000
 slices-0.000
tr-0.000
oc-0.000
 half-0.000
Token go
Token ID467
Feature activation+4.75e-02

Pos logit contributions
aid+0.015
tr+0.014
 half+0.014
 slices+0.014
 bisc+0.014
Neg logit contributions
 closely-0.015
ivated-0.014
ya-0.014
itionally-0.013
uring-0.013
Token back
Token ID736
Feature activation+1.55e-02

Pos logit contributions
 closely+0.080
ivated+0.077
itionally+0.075
ya+0.074
ize+0.074
Neg logit contributions
aid-0.077
tr-0.068
 slices-0.068
 half-0.068
 bisc-0.067
Token inside
Token ID2641
Feature activation+7.50e-03

Pos logit contributions
ivated+0.027
itionally+0.026
 closely+0.025
ya+0.024
uring+0.024
Neg logit contributions
 half-0.022
aid-0.022
tr-0.021
 slices-0.021
 scarf-0.020
Token but
Token ID475
Feature activation+2.40e-03

Pos logit contributions
 closely+0.011
ivated+0.011
uring+0.010
itionally+0.010
ize+0.010
Neg logit contributions
aid-0.013
 slices-0.013
tr-0.012
 half-0.012
oc-0.011
Token Lily
Token ID20037
Feature activation-1.48e-02

Pos logit contributions
aid+0.002
 slices+0.002
oc+0.001
ests+0.001
tr+0.001
Neg logit contributions
 closely-0.001
 up-0.001
 Jackie-0.001
ya-0.001
ivated-0.001
Token and
Token ID290
Feature activation-2.96e-04

Pos logit contributions
aid+0.029
 slices+0.028
tr+0.028
ests+0.028
oc+0.027
Neg logit contributions
 closely-0.021
 up-0.019
ivated-0.017
uring-0.016
ly-0.016
Token drove
Token ID10357
Feature activation+6.42e-03

Pos logit contributions
 slices+0.001
aid+0.001
ests+0.001
oc+0.001
umbs+0.001
Neg logit contributions
 closely-0.000
 up-0.000
 Jackie-0.000
ya-0.000
ly-0.000
Token them
Token ID606
Feature activation+1.08e-02

Pos logit contributions
 closely+0.008
ivated+0.008
uring+0.007
ize+0.007
ya+0.007
Neg logit contributions
aid-0.013
 slices-0.012
tr-0.012
oc-0.011
 bisc-0.011
Token home
Token ID1363
Feature activation-1.69e-02

Pos logit contributions
 closely+0.015
ivated+0.014
ya+0.014
ize+0.013
itionally+0.013
Neg logit contributions
aid-0.019
tr-0.019
 slices-0.019
 half-0.018
ossible-0.018
Token.
Token ID13
Feature activation-8.37e-03

Pos logit contributions
aid+0.027
 slices+0.025
tr+0.025
 bisc+0.024
ests+0.023
Neg logit contributions
 closely-0.029
ivated-0.026
uring-0.026
 up-0.026
ize-0.025
Token They
Token ID1119
Feature activation-1.10e-02

Pos logit contributions
aid+0.012
tr+0.011
 slices+0.011
ossible+0.010
 half+0.010
Neg logit contributions
 closely-0.015
ivated-0.015
uring-0.014
ize-0.014
ya-0.014
Token had
Token ID550
Feature activation-3.74e-02

Pos logit contributions
 slices+0.020
aid+0.019
tr+0.019
umbs+0.018
ests+0.018
Neg logit contributions
 closely-0.015
ivated-0.014
 up-0.013
 forgot-0.012
ya-0.012
Token a
Token ID257
Feature activation+5.07e-04

Pos logit contributions
oc+0.065
aid+0.065
tr+0.063
 slices+0.060
ulture+0.059
Neg logit contributions
 closely-0.054
ivated-0.049
 up-0.049
itionally-0.048
uring-0.047
Token warm
Token ID5814
Feature activation-4.07e-03

Pos logit contributions
 closely+0.001
ivated+0.001
ya+0.001
 lookout+0.001
itionally+0.001
Neg logit contributions
aid-0.001
 slices-0.001
tr-0.001
ests-0.001
oc-0.001
Token bath
Token ID7837
Feature activation-3.32e-03

Pos logit contributions
aid+0.008
 slices+0.007
tr+0.007
oc+0.007
 bruises+0.007
Neg logit contributions
 closely-0.005
 up-0.005
ivated-0.005
ya-0.005
 Jackie-0.005
Token on
Token ID319
Feature activation+7.84e-03

Pos logit contributions
 closely+0.044
ivated+0.042
uring+0.041
itionally+0.041
ya+0.040
Neg logit contributions
aid-0.044
 slices-0.043
tr-0.040
 half-0.039
 bisc-0.038
Token the
Token ID262
Feature activation+1.64e-03

Pos logit contributions
ivated+0.013
 closely+0.013
itionally+0.013
uring+0.013
ize+0.012
Neg logit contributions
 slices-0.012
aid-0.011
 half-0.011
tr-0.011
 bisc-0.010
Token fire
Token ID2046
Feature activation-2.12e-02

Pos logit contributions
 closely+0.003
ivated+0.002
uring+0.002
 up+0.002
ya+0.002
Neg logit contributions
aid-0.003
tr-0.002
 slices-0.002
oc-0.002
ests-0.002
Token and
Token ID290
Feature activation-5.10e-03

Pos logit contributions
aid+0.044
 slices+0.038
 bruises+0.038
ests+0.038
ossible+0.038
Neg logit contributions
 closely-0.028
 up-0.025
ly-0.023
uring-0.023
inating-0.023
Token cracked
Token ID21368
Feature activation-1.18e-02

Pos logit contributions
aid+0.009
 slices+0.009
tr+0.008
 half+0.008
 bisc+0.008
Neg logit contributions
 closely-0.008
ivated-0.007
uring-0.007
itionally-0.007
ize-0.007
Token an
Token ID281
Feature activation-2.50e-03

Pos logit contributions
aid+0.023
tr+0.023
ests+0.021
oc+0.021
ossible+0.021
Neg logit contributions
 closely-0.017
 up-0.015
ivated-0.014
uring-0.014
inating-0.013
Token egg
Token ID5935
Feature activation-5.74e-02

Pos logit contributions
aid+0.003
 slices+0.003
tr+0.003
 half+0.003
 bisc+0.002
Neg logit contributions
 closely-0.005
ivated-0.005
uring-0.005
ize-0.005
itionally-0.005
Token on
Token ID319
Feature activation-2.42e-02

Pos logit contributions
aid+0.121
ests+0.115
Rex+0.110
ossible+0.108
 handcuffs+0.105
Neg logit contributions
 closely-0.083
 up-0.078
ly-0.066
uring-0.064
inating-0.058
Token the
Token ID262
Feature activation-1.87e-03

Pos logit contributions
aid+0.053
ests+0.049
tr+0.048
ossible+0.047
ulture+0.046
Neg logit contributions
 closely-0.033
 up-0.027
ize-0.025
inating-0.025
 Jackie-0.025
Token edge
Token ID5743
Feature activation-1.68e-02

Pos logit contributions
aid+0.004
ests+0.003
 slices+0.003
ossible+0.003
oc+0.003
Neg logit contributions
 closely-0.003
 up-0.002
ya-0.002
uring-0.002
 lookout-0.002
Token of
Token ID286
Feature activation-5.29e-02

Pos logit contributions
aid+0.029
tr+0.026
oc+0.025
 sy+0.025
 handcuffs+0.024
Neg logit contributions
 closely-0.029
 up-0.026
ly-0.025
uring-0.024
 Jackie-0.023
Token"
Token ID1
Feature activation-5.55e-02

Pos logit contributions
tr+0.031
aid+0.030
 slices+0.030
 bisc+0.028
ests+0.028
Neg logit contributions
 closely-0.038
ivated-0.035
ize-0.034
uring-0.034
annah-0.033
TokenNo
Token ID2949
Feature activation-2.25e-02

Pos logit contributions
oc+0.091
 ig+0.086
tr+0.084
inea+0.082
ossible+0.082
Neg logit contributions
ivated-0.085
 closely-0.079
ize-0.078
itionally-0.073
very-0.072
Token more
Token ID517
Feature activation+1.10e-03

Pos logit contributions
aid+0.051
 slices+0.049
tr+0.049
 half+0.046
oc+0.046
Neg logit contributions
 closely-0.023
ivated-0.020
uring-0.018
ya-0.018
ize-0.018
Token sword
Token ID8429
Feature activation-1.52e-02

Pos logit contributions
 closely+0.002
ivated+0.002
uring+0.002
ize+0.002
ya+0.001
Neg logit contributions
aid-0.002
 slices-0.002
tr-0.002
 half-0.002
 bisc-0.002
Token and
Token ID290
Feature activation-1.48e-02

Pos logit contributions
aid+0.032
 slices+0.031
tr+0.030
ests+0.029
ossible+0.029
Neg logit contributions
 closely-0.017
uring-0.014
 up-0.014
ly-0.014
ivated-0.014
Token no
Token ID645
Feature activation-2.72e-02

Pos logit contributions
aid+0.026
tr+0.026
 slices+0.025
umbs+0.025
oc+0.025
Neg logit contributions
 closely-0.021
 up-0.018
uring-0.018
ize-0.018
ly-0.017
Token more
Token ID517
Feature activation-4.19e-03

Pos logit contributions
aid+0.045
tr+0.043
 slices+0.042
 bisc+0.039
ests+0.039
Neg logit contributions
 closely-0.044
ivated-0.040
ize-0.038
uring-0.038
 up-0.038
Token doll
Token ID3654
Feature activation-1.48e-02

Pos logit contributions
aid+0.007
 slices+0.007
tr+0.007
 half+0.006
 bisc+0.006
Neg logit contributions
 closely-0.007
ivated-0.006
uring-0.006
ize-0.006
ya-0.006
Token,"
Token ID553
Feature activation-1.07e-02

Pos logit contributions
aid+0.028
 slices+0.027
tr+0.026
ests+0.026
 bisc+0.025
Neg logit contributions
 closely-0.019
 up-0.017
ly-0.017
uring-0.016
ize-0.016
Token she
Token ID673
Feature activation-3.80e-02

Pos logit contributions
oc+0.024
tr+0.023
ests+0.023
aid+0.023
umbs+0.022
Neg logit contributions
 closely-0.012
 Jackie-0.010
ivated-0.009
 up-0.009
ya-0.008
Token said
Token ID531
Feature activation-5.37e-02

Pos logit contributions
aid+0.059
 slices+0.057
tr+0.057
oc+0.053
ests+0.053
Neg logit contributions
 closely-0.065
ivated-0.059
uring-0.057
itionally-0.056
ize-0.056
Token,
Token ID11
Feature activation+2.32e-02

Pos logit contributions
aid+0.010
tr+0.010
 slices+0.010
oc+0.009
 bisc+0.009
Neg logit contributions
 closely-0.007
ivated-0.006
 up-0.006
uring-0.006
ya-0.006
Token and
Token ID290
Feature activation+9.60e-03

Pos logit contributions
 closely+0.030
 Jackie+0.026
 up+0.026
ivated+0.025
ya+0.025
Neg logit contributions
aid-0.046
tr-0.044
oc-0.043
 slices-0.043
 slot-0.040
Token news
Token ID1705
Feature activation+1.37e-04

Pos logit contributions
 closely+0.013
ivated+0.011
 Jackie+0.011
 up+0.011
ya+0.011
Neg logit contributions
aid-0.019
tr-0.018
 slices-0.018
oc-0.017
 bisc-0.017
Token,
Token ID11
Feature activation+2.79e-02

Pos logit contributions
 closely+0.000
ivated+0.000
uring+0.000
ya+0.000
ize+0.000
Neg logit contributions
aid-0.000
 slices-0.000
tr-0.000
 half-0.000
oc-0.000
Token and
Token ID290
Feature activation+1.46e-02

Pos logit contributions
 closely+0.038
ivated+0.034
uring+0.033
 up+0.032
ya+0.032
Neg logit contributions
aid-0.054
 slices-0.051
tr-0.051
oc-0.048
 half-0.048
Token a
Token ID257
Feature activation-8.07e-03

Pos logit contributions
 closely+0.021
ivated+0.019
uring+0.018
ya+0.018
ize+0.017
Neg logit contributions
aid-0.027
 slices-0.026
tr-0.026
oc-0.024
 bisc-0.024
Token pink
Token ID11398
Feature activation-4.12e-02

Pos logit contributions
aid+0.016
 slices+0.014
ests+0.014
tr+0.014
umbs+0.013
Neg logit contributions
 closely-0.011
ya-0.010
 Jackie-0.009
ivated-0.009
 up-0.009
Token t
Token ID256
Feature activation-1.41e-02

Pos logit contributions
aid+0.066
 slices+0.059
 bruises+0.055
tr+0.055
Rex+0.055
Neg logit contributions
 closely-0.071
ya-0.065
 up-0.064
ivated-0.062
 Jackie-0.062
Tokeneddy
Token ID21874
Feature activation-3.63e-02

Pos logit contributions
 slices+0.017
aid+0.016
 bruises+0.015
 bisc+0.015
 half+0.015
Neg logit contributions
 closely-0.028
ly-0.028
ya-0.027
ize-0.027
 up-0.026
Token bear
Token ID6842
Feature activation-1.86e-02

Pos logit contributions
aid+0.059
 slices+0.056
tr+0.052
 bruises+0.051
umbs+0.050
Neg logit contributions
 closely-0.058
 up-0.057
ly-0.055
ya-0.054
 Jackie-0.051
Token.
Token ID13
Feature activation-5.39e-03

Pos logit contributions
aid+0.032
 slices+0.031
tr+0.029
 bruises+0.028
 half+0.028
Neg logit contributions
 closely-0.028
 up-0.025
uring-0.025
ya-0.025
ivated-0.024
Tokenauc
Token ID14272
Feature activation+1.50e-02

Pos logit contributions
 closely+0.056
ivated+0.052
itionally+0.051
ya+0.050
uring+0.050
Neg logit contributions
aid-0.032
 slices-0.029
tr-0.028
ests-0.026
brush-0.025
Tokenets
Token ID1039
Feature activation+8.18e-03

Pos logit contributions
 closely+0.032
ivated+0.031
uring+0.028
itionally+0.028
ize+0.028
Neg logit contributions
aid-0.020
tr-0.018
 slices-0.017
ests-0.017
 bisc-0.016
Token.
Token ID13
Feature activation+9.83e-03

Pos logit contributions
 closely+0.013
ivated+0.012
uring+0.012
ize+0.012
ya+0.012
Neg logit contributions
aid-0.013
 slices-0.013
tr-0.013
 half-0.012
 bisc-0.012
Token Some
Token ID2773
Feature activation-1.13e-02

Pos logit contributions
 closely+0.017
ivated+0.016
uring+0.015
ya+0.015
ize+0.015
Neg logit contributions
tr-0.015
aid-0.014
 slices-0.014
 half-0.013
 bisc-0.013
Token were
Token ID547
Feature activation+1.75e-03

Pos logit contributions
aid+0.013
 slices+0.013
tr+0.012
 half+0.011
 bisc+0.011
Neg logit contributions
 closely-0.023
ivated-0.022
uring-0.021
ize-0.021
ya-0.021
Token old
Token ID1468
Feature activation-1.20e-02

Pos logit contributions
 closely+0.002
ivated+0.002
uring+0.002
ize+0.002
itionally+0.002
Neg logit contributions
aid-0.003
 slices-0.003
tr-0.003
 half-0.003
 bisc-0.003
Token and
Token ID290
Feature activation-4.52e-03

Pos logit contributions
aid+0.022
 slices+0.021
tr+0.021
 half+0.020
 bisc+0.020
Neg logit contributions
 closely-0.017
ivated-0.016
uring-0.015
ize-0.015
itionally-0.015
Token rusty
Token ID47855
Feature activation-1.33e-02

Pos logit contributions
aid+0.008
tr+0.007
 slices+0.007
ests+0.007
oc+0.007
Neg logit contributions
 closely-0.007
ivated-0.006
ize-0.006
 Jackie-0.006
 up-0.006
Token,
Token ID11
Feature activation+1.67e-02

Pos logit contributions
aid+0.023
 slices+0.023
tr+0.022
 half+0.021
 bisc+0.021
Neg logit contributions
 closely-0.020
ivated-0.019
uring-0.018
itionally-0.017
ya-0.017
Token but
Token ID475
Feature activation+9.93e-03

Pos logit contributions
 closely+0.024
ivated+0.021
uring+0.020
ize+0.020
ya+0.019
Neg logit contributions
aid-0.031
 slices-0.030
tr-0.030
oc-0.028
 bisc-0.028
Token others
Token ID1854
Feature activation+7.97e-03

Pos logit contributions
 closely+0.015
ivated+0.014
uring+0.013
itionally+0.013
ya+0.013
Neg logit contributions
aid-0.017
 slices-0.017
tr-0.016
 half-0.016
 bisc-0.015
Token them
Token ID606
Feature activation-3.21e-03

Pos logit contributions
tr+0.049
aid+0.048
oc+0.046
 slices+0.045
 bisc+0.044
Neg logit contributions
 closely-0.037
 up-0.033
ivated-0.032
ize-0.031
 Jackie-0.031
Token crying
Token ID13774
Feature activation-9.32e-03

Pos logit contributions
aid+0.006
 slices+0.006
tr+0.006
ests+0.005
 bisc+0.005
Neg logit contributions
 closely-0.005
ivated-0.004
 up-0.004
uring-0.004
inating-0.004
Token.
Token ID13
Feature activation-8.43e-03

Pos logit contributions
aid+0.015
 slices+0.014
tr+0.014
 bisc+0.013
ossible+0.013
Neg logit contributions
 closely-0.016
ivated-0.014
 up-0.014
uring-0.014
ize-0.013
Token\n
Token ID198
Feature activation-2.62e-02

Pos logit contributions
aid+0.013
 slices+0.012
tr+0.012
 half+0.011
oc+0.011
Neg logit contributions
 closely-0.015
ivated-0.014
uring-0.013
ize-0.013
itionally-0.013
Token\n
Token ID198
Feature activation-1.77e-02

Pos logit contributions
aid+0.036
tr+0.035
 slices+0.034
 bisc+0.031
ests+0.031
Neg logit contributions
 closely-0.050
ivated-0.046
uring-0.045
ize-0.044
 up-0.044
Token"
Token ID1
Feature activation-6.23e-02

Pos logit contributions
tr+0.024
aid+0.023
 slices+0.022
 bisc+0.021
ests+0.021
Neg logit contributions
 closely-0.034
ivated-0.032
uring-0.031
ize-0.031
 Jackie-0.030
TokenWhat
Token ID2061
Feature activation-3.51e-02

Pos logit contributions
tr+0.117
oc+0.112
bar+0.107
ests+0.106
age+0.104
Neg logit contributions
ivated-0.082
 closely-0.079
ize-0.076
 Jackie-0.075
itionally-0.075
Token happened
Token ID3022
Feature activation+2.44e-02

Pos logit contributions
aid+0.064
bar+0.059
tr+0.059
 slices+0.058
brush+0.056
Neg logit contributions
 closely-0.048
 up-0.045
ivated-0.044
 Jackie-0.042
ize-0.041
Token,
Token ID11
Feature activation+1.66e-02

Pos logit contributions
ivated+0.042
 closely+0.041
appointed+0.041
itionally+0.040
anna+0.040
Neg logit contributions
 slices-0.032
 half-0.031
ossible-0.031
 bisc-0.030
oc-0.030
Token my
Token ID616
Feature activation-1.94e-02

Pos logit contributions
 closely+0.029
ivated+0.025
 Jackie+0.025
ya+0.025
 up+0.025
Neg logit contributions
aid-0.027
tr-0.027
 slices-0.025
oc-0.024
ests-0.024
Token loves
Token ID10408
Feature activation-1.80e-02

Pos logit contributions
aid+0.033
 slices+0.033
tr+0.031
brush+0.030
 half+0.029
Neg logit contributions
 closely-0.029
 up-0.027
ya-0.026
ly-0.025
rated-0.024
Token her
Token ID607
Feature activation-1.17e-02

Pos logit contributions
aid+0.002
 slices+0.002
tr+0.002
 half+0.002
 bisc+0.002
Neg logit contributions
 closely-0.002
ivated-0.002
uring-0.002
itionally-0.002
ize-0.002
Token the
Token ID262
Feature activation+3.73e-05

Pos logit contributions
tr+0.022
aid+0.022
 slices+0.021
 bisc+0.021
umbs+0.021
Neg logit contributions
 closely-0.016
 up-0.015
 Jackie-0.014
ya-0.013
uring-0.013
Token special
Token ID2041
Feature activation-1.84e-02

Pos logit contributions
 closely+0.000
ivated+0.000
 up+0.000
 Jackie+0.000
uring+0.000
Neg logit contributions
aid-0.000
 slices-0.000
tr-0.000
oc-0.000
ests-0.000
Token,
Token ID11
Feature activation+1.19e-02

Pos logit contributions
aid+0.030
 slices+0.030
tr+0.029
Rex+0.027
oc+0.027
Neg logit contributions
 closely-0.030
 Jackie-0.026
 up-0.026
ivated-0.026
uring-0.025
Token original
Token ID2656
Feature activation-1.34e-02

Pos logit contributions
 closely+0.016
ivated+0.013
uring+0.013
ya+0.013
 Jackie+0.013
Neg logit contributions
aid-0.024
tr-0.023
 slices-0.023
ests-0.022
oc-0.022
Token gift
Token ID6979
Feature activation-4.96e-02

Pos logit contributions
aid+0.022
 slices+0.021
tr+0.021
 half+0.019
 bisc+0.019
Neg logit contributions
 closely-0.022
ivated-0.020
uring-0.020
ize-0.019
ya-0.019
Token.
Token ID13
Feature activation-2.02e-02

Pos logit contributions
aid+0.093
 slices+0.089
oc+0.088
ossible+0.084
Rex+0.084
Neg logit contributions
 closely-0.071
 up-0.064
 Jackie-0.060
ly-0.059
uring-0.059
Token From
Token ID3574
Feature activation+3.84e-03

Pos logit contributions
aid+0.037
tr+0.037
 slices+0.036
 bisc+0.036
 scarf+0.035
Neg logit contributions
 closely-0.027
 Jackie-0.027
annah-0.026
 But-0.025
ivated-0.024
Token then
Token ID788
Feature activation+1.56e-02

Pos logit contributions
 closely+0.007
ivated+0.007
itionally+0.006
uring+0.006
inating+0.006
Neg logit contributions
aid-0.005
 slices-0.005
 half-0.005
tr-0.005
ossible-0.005
Token on
Token ID319
Feature activation+1.03e-02

Pos logit contributions
 closely+0.030
ivated+0.028
uring+0.027
ize+0.026
 up+0.026
Neg logit contributions
aid-0.021
 slices-0.020
tr-0.020
 bisc-0.018
 half-0.018
Token,
Token ID11
Feature activation+9.82e-03

Pos logit contributions
 closely+0.013
ivated+0.012
uring+0.012
itionally+0.012
inating+0.011
Neg logit contributions
 slices-0.020
aid-0.020
tr-0.019
 half-0.019
ossible-0.018
Token R
Token ID371
Feature activation+1.62e-02

Pos logit contributions
 closely+0.027
ya+0.024
 Jackie+0.024
ivated+0.024
 up+0.023
Neg logit contributions
aid-0.039
tr-0.038
 slices-0.036
ests-0.035
oc-0.034
Tokenani
Token ID3216
Feature activation-2.05e-02

Pos logit contributions
 closely+0.029
ivated+0.027
uring+0.026
ize+0.026
ya+0.026
Neg logit contributions
aid-0.024
 slices-0.023
tr-0.023
 half-0.021
 bisc-0.021
Token smiled
Token ID13541
Feature activation+5.77e-04

Pos logit contributions
aid+0.035
tr+0.034
 slices+0.034
 bisc+0.031
ests+0.031
Neg logit contributions
 closely-0.032
ivated-0.029
ya-0.028
 up-0.028
uring-0.027
Token and
Token ID290
Feature activation-3.96e-03

Pos logit contributions
 closely+0.001
 up+0.001
ivated+0.001
uring+0.001
ly+0.001
Neg logit contributions
aid-0.001
 slices-0.001
tr-0.001
 bisc-0.001
 bruises-0.001
Token hugged
Token ID41130
Feature activation-2.46e-03

Pos logit contributions
tr+0.007
 slices+0.007
pter+0.007
ests+0.007
oc+0.007
Neg logit contributions
 closely-0.005
 up-0.004
 Jackie-0.004
ivated-0.004
ya-0.004
Token each
Token ID1123
Feature activation-5.45e-02

Pos logit contributions
aid+0.004
tr+0.004
 slices+0.004
 bisc+0.004
bar+0.004
Neg logit contributions
 closely-0.004
 up-0.003
ya-0.003
ize-0.003
ivated-0.003
Token other
Token ID584
Feature activation-1.76e-02

Pos logit contributions
tr+0.097
aid+0.095
 slices+0.091
bar+0.087
brush+0.087
Neg logit contributions
 closely-0.086
ya-0.075
 up-0.074
ly-0.070
ivated-0.070
Token.
Token ID13
Feature activation-8.21e-03

Pos logit contributions
aid+0.032
tr+0.031
 slices+0.029
ests+0.029
 bisc+0.029
Neg logit contributions
 closely-0.028
 up-0.025
ya-0.023
ly-0.022
uring-0.022
Token They
Token ID1119
Feature activation-2.62e-02

Pos logit contributions
aid+0.015
 slices+0.014
tr+0.013
 half+0.013
 slice+0.013
Neg logit contributions
 closely-0.013
inating-0.011
ivated-0.011
uring-0.011
itionally-0.011
Token loved
Token ID6151
Feature activation-5.31e-03

Pos logit contributions
tr+0.048
 slices+0.047
pter+0.046
aid+0.045
 slot+0.045
Neg logit contributions
 closely-0.034
ya-0.031
 forgot-0.030
 Jackie-0.030
ivated-0.029
Token to
Token ID284
Feature activation+5.54e-03

Pos logit contributions
aid+0.009
tr+0.009
 slices+0.008
oc+0.008
brush+0.008
Neg logit contributions
 closely-0.009
ya-0.007
ize-0.007
uring-0.007
ivated-0.007
Token are
Token ID389
Feature activation-8.09e-03

Pos logit contributions
 slices+0.094
tr+0.082
aid+0.081
 carriers+0.081
oc+0.080
Neg logit contributions
 closely-0.090
 up-0.087
ivated-0.086
itionally-0.083
 lookout-0.083
Token sorry
Token ID7926
Feature activation+8.74e-03

Pos logit contributions
aid+0.016
 slices+0.015
tr+0.015
oc+0.014
 bisc+0.014
Neg logit contributions
 closely-0.011
ivated-0.010
uring-0.009
 up-0.009
ize-0.009
Token,
Token ID11
Feature activation+2.17e-02

Pos logit contributions
 closely+0.014
ivated+0.014
uring+0.013
itionally+0.012
ya+0.012
Neg logit contributions
aid-0.014
 slices-0.014
 half-0.013
tr-0.013
ossible-0.013
Token teacher
Token ID4701
Feature activation-2.53e-02

Pos logit contributions
 closely+0.033
ivated+0.029
ize+0.028
ya+0.027
 up+0.027
Neg logit contributions
aid-0.040
oc-0.037
tr-0.037
 slices-0.036
ests-0.036
Token.
Token ID13
Feature activation-1.76e-02

Pos logit contributions
aid+0.049
tr+0.049
 slices+0.045
 bisc+0.044
ests+0.044
Neg logit contributions
 closely-0.033
ly-0.030
 up-0.029
ya-0.028
ize-0.027
Token We
Token ID775
Feature activation-5.70e-02

Pos logit contributions
aid+0.031
 slices+0.030
tr+0.029
 half+0.027
 bisc+0.027
Neg logit contributions
 closely-0.026
ivated-0.025
uring-0.024
ize-0.023
itionally-0.023
Token will
Token ID481
Feature activation-9.65e-03

Pos logit contributions
 slices+0.086
aid+0.076
tr+0.073
 carriers+0.072
pter+0.071
Neg logit contributions
 closely-0.098
ivated-0.093
 up-0.092
itionally-0.090
ize-0.087
Token learn
Token ID2193
Feature activation-6.25e-03

Pos logit contributions
aid+0.017
 slices+0.017
tr+0.016
 half+0.016
 bisc+0.015
Neg logit contributions
 closely-0.014
ivated-0.013
uring-0.013
ya-0.012
ize-0.012
Token,
Token ID11
Feature activation+3.05e-02

Pos logit contributions
aid+0.012
pter+0.011
tr+0.011
 slices+0.011
oc+0.011
Neg logit contributions
 closely-0.009
 up-0.008
ize-0.007
uring-0.007
ivated-0.007
Token teacher
Token ID4701
Feature activation-1.77e-02

Pos logit contributions
 closely+0.043
ivated+0.037
uring+0.036
 up+0.034
ya+0.034
Neg logit contributions
aid-0.060
tr-0.057
 slices-0.056
oc-0.055
ests-0.055
Token.
Token ID13
Feature activation-1.39e-02

Pos logit contributions
tr+0.033
aid+0.033
 slices+0.030
 bisc+0.030
bar+0.030
Neg logit contributions
 closely-0.025
ly-0.022
 up-0.021
ya-0.021
uring-0.020
Token!"
Token ID2474
Feature activation+1.93e-03

Pos logit contributions
 closely+0.017
ivated+0.016
uring+0.015
ize+0.015
ya+0.015
Neg logit contributions
aid-0.020
 slices-0.019
tr-0.019
 half-0.018
 bisc-0.018
Token Mia
Token ID32189
Feature activation-2.12e-02

Pos logit contributions
 closely+0.003
ivated+0.003
uring+0.003
ya+0.003
ize+0.003
Neg logit contributions
aid-0.003
tr-0.003
 slices-0.003
oc-0.003
 half-0.003
Token says
Token ID1139
Feature activation-3.73e-02

Pos logit contributions
aid+0.030
 slices+0.029
tr+0.028
 half+0.026
oc+0.026
Neg logit contributions
 closely-0.040
ivated-0.037
uring-0.035
ya-0.035
itionally-0.035
Token.
Token ID13
Feature activation-1.67e-02

Pos logit contributions
aid+0.047
 slices+0.045
tr+0.043
 half+0.041
 bruises+0.039
Neg logit contributions
 closely-0.073
ivated-0.071
ya-0.068
itionally-0.068
ize-0.068
Token "
Token ID366
Feature activation-1.64e-02

Pos logit contributions
aid+0.023
 slices+0.023
tr+0.022
 half+0.021
ossible+0.020
Neg logit contributions
 closely-0.030
ivated-0.029
uring-0.029
itionally-0.028
ya-0.028
TokenI
Token ID40
Feature activation-5.00e-02

Pos logit contributions
aid+0.019
 slices+0.018
tr+0.018
 bisc+0.017
ossible+0.017
Neg logit contributions
 closely-0.033
ivated-0.033
uring-0.031
ize-0.031
itionally-0.030
Token did
Token ID750
Feature activation-7.45e-04

Pos logit contributions
aid+0.081
 slices+0.080
tr+0.076
 half+0.072
 bisc+0.071
Neg logit contributions
 closely-0.081
ivated-0.076
uring-0.072
ize-0.071
itionally-0.071
Token not
Token ID407
Feature activation-4.22e-02

Pos logit contributions
aid+0.001
 half+0.001
 slices+0.001
tr+0.001
 bruises+0.001
Neg logit contributions
 closely-0.001
ivated-0.001
ya-0.001
inating-0.001
uring-0.001
Token mean
Token ID1612
Feature activation+1.84e-02

Pos logit contributions
aid+0.083
 slices+0.082
tr+0.080
oc+0.075
 half+0.075
Neg logit contributions
 closely-0.054
ivated-0.049
uring-0.047
itionally-0.045
ize-0.045
Token to
Token ID284
Feature activation+5.80e-03

Pos logit contributions
ivated+0.033
 closely+0.032
uring+0.031
itionally+0.030
 lookout+0.029
Neg logit contributions
aid-0.027
 slices-0.026
 half-0.025
tr-0.024
ossible-0.024
Token break
Token ID2270
Feature activation+2.73e-02

Pos logit contributions
 closely+0.009
ivated+0.008
uring+0.008
ya+0.008
inating+0.008
Neg logit contributions
aid-0.010
tr-0.010
 slices-0.009
 half-0.009
oc-0.009