TOP ACTIVATIONS
MAX = 4.084

he said it was only $ 10 .
The le opard toy costs $ 10 ."
careful ; we won 't cause a mess ."
the price - it was $ 5 . Sally was so
I promise I won 't cause any trouble . Can I
the shiny things were or nam ents and they were very
high they were just tiny spec ks in the sky .
to see how much it costs ." They looked at the
foot . It was 15 centimeters long . Lucy said ,
ongo didn 't like wet av oc ados , so he
complete . Height : 90 centimeters . Weight : 15 kilograms
He quickly hung colorful or nam ents and tw ink ling
ask for things that could cause harm or trouble . <|endoftext|>
a tree with shiny or nam ents hanging from its branches
you did not mean to cause trouble . But you have
We did not mean to cause trouble ." The
price tag . It said $ 1 . 99 per pound
o was not like other gor illas . He was bald
hill and put her or nam ents all around . She
the cloth onto the ash tr ay . He worked hard

MIN ACTIVATIONS
MIN = -3.429

listen . She threw a tant rum in the store .
. " All erg ies !" shouted Anne .
. He kept throwing a tant rum . His mom was
surprise . He threw a tant rum . He kicked and
case ! All of a sudden , the old
stubborn . She throws a tant rum . She kicks and
them and never threw a tant rum again . <|endoftext|> Once
mom m ies and d add ies said , " Today
bed wearing his favourite py j amas , feeling so proud
. All of his friends were happy because
his feet and threw a tant rum . "
my . What was your nightmare about ?" his mom asked
ebra , or a peac ock today ?" L ila was
sad . She threw a tant rum and cried loudly .
. All of the roads and cars continued
anyone . Where was the voice coming from ?
mom m ies and d add ies . They were all
All of his classmates came around to
butterfly ! All of Lily 's family and friends
her feet and threw a tant rum . She grabbed the

INTERVAL 2.206 - 4.084
CONTAINS 0.010%

noise and comes to the living room . "
was a generous woman called Mrs . M illy . She
scared , but he was also excited to meet him .
" Here , you can each have a half of the
she came back to the living room , the popcorn was

INTERVAL 0.328 - 2.206
CONTAINS 27.392%

folding her paper , Sam and Emma smiled at each other
Once upon a time , there was a little girl named
but he did not notice that it was w obb ly
arrow . One day , he went to the park to
. From that day on , the sheep

INTERVAL -1.551 - 0.328
CONTAINS 72.252%

started to make trouble ! He pushed the furry bear off
hardly sit still . On the big screen
the trains . " Ch oo - ch oo
shook his head . He knew Lily was kind , but
He was three years old . One day Jack was playing

INTERVAL -3.429 - -1.551
CONTAINS 0.346%

a hairy bear who lived in the forest . He wanted
. Fin liked to swim in the big blue sea and
." They searched everywhere , but they couldn 't
her . " What 's wrong ?" he asked .
When Lou got home , she told her mom
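The four INTERVAL buckets above appear to be equal-width bins running from MIN (-3.429) to MAX (4.084). Each bucket is about 1.878 wide, and the CONTAINS percentages add up to 100%. The sketch below reproduces the bucket edges. It assumes plain equal-width binning, which is inferred from the numbers; the dashboard's own binning code is not shown here, and the helper name `equal_width_edges` is hypothetical.

```python
# Sketch: reproduce the INTERVAL bucket edges shown above.
# Assumption (inferred from the numbers, not confirmed by the source):
# the dashboard splits [MIN, MAX] into equal-width bins.

def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return n_bins + 1 evenly spaced bin edges from lo to hi (hypothetical helper)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]

MIN_ACT = -3.429
MAX_ACT = 4.084

edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)
print([f"{e:.3f}" for e in edges])

# Share of all activations in each bucket, as reported above (percent).
contains = {
    (-3.429, -1.551): 0.346,
    (-1.551, 0.328): 72.252,
    (0.328, 2.206): 27.392,
    (2.206, 4.084): 0.010,
}
print(f"total = {sum(contains.values()):.3f}%")
```

The reported edges are the computed ones rounded to three decimals, which is consistent with the equal-width assumption. About 72% of activations sit in the bucket just below zero, and only 0.010% reach the top bucket.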
TOKEN-LEVEL BREAKDOWN

Token he
Token ID339
Feature activation-4.90e-01

Pos logit contributions
ours+0.005
 supervision+0.005
 salon+0.005
ses+0.005
 studio+0.005
Neg logit contributions
ty-0.004
ba-0.003
ve-0.003
 himself-0.003
Mo-0.003
Token said
Token ID531
Feature activation+5.77e-02

Pos logit contributions
ba+0.012
ty+0.011
ve+0.011
Mo+0.010
B+0.010
Neg logit contributions
 salon-0.015
ours-0.015
 supervision-0.015
 studio-0.015
ses-0.014
Token it
Token ID340
Feature activation-3.28e-01

Pos logit contributions
 supervision+0.002
 salon+0.002
ses+0.002
ours+0.002
 studio+0.002
Neg logit contributions
ve-0.002
ba-0.002
 himself-0.002
ty-0.002
Mo-0.001
Token was
Token ID373
Feature activation-2.24e-02

Pos logit contributions
ba+0.008
ty+0.008
ve+0.008
Mo+0.007
B+0.007
Neg logit contributions
 salon-0.010
ours-0.010
 supervision-0.010
 studio-0.009
ses-0.009
Token only
Token ID691
Feature activation+2.01e-01

Pos logit contributions
ba+0.001
ve+0.001
ty+0.001
Mo+0.001
B+0.000
Neg logit contributions
ours-0.001
 salon-0.001
 supervision-0.001
 studio-0.001
ses-0.001
Token $
Token ID720
Feature activation+4.08e+00

Pos logit contributions
 salon+0.007
ours+0.007
 studio+0.006
 supervision+0.006
anged+0.006
Neg logit contributions
ty-0.004
ba-0.004
ve-0.004
 himself-0.004
Mo-0.004
Token10
Token ID940
Feature activation+9.42e-01

Pos logit contributions
ses+0.098
ours+0.092
 supervision+0.092
 salon+0.092
 studio+0.091
Neg logit contributions
ve-0.130
ba-0.126
B-0.121
ty-0.117
Mo-0.115
Token.
Token ID13
Feature activation+3.42e-01

Pos logit contributions
ours+0.033
 salon+0.033
 studio+0.033
ses+0.032
 supervision+0.031
Neg logit contributions
ba-0.018
ve-0.018
ty-0.017
Mo-0.014
ke-0.014
Token (space character)
Token ID220
Feature activation+3.67e-01

Pos logit contributions
 salon+0.011
 supervision+0.011
ours+0.010
ses+0.010
 studio+0.010
Neg logit contributions
ty-0.008
ba-0.008
Mo-0.008
ve-0.008
 himself-0.008
Token (newline character)
Token ID198
Feature activation+2.45e-01

Pos logit contributions
ours+0.012
 salon+0.011
 supervision+0.011
 studio+0.011
ses+0.010
Neg logit contributions
ba-0.009
ty-0.008
ve-0.008
Mo-0.008
Ar-0.007
Token (newline character)
Token ID198
Feature activation+3.66e-01

Pos logit contributions
ours+0.006
 studio+0.006
 salon+0.006
 �+0.006
 Sharing+0.005
Neg logit contributions
ba-0.008
ty-0.007
ve-0.007
vell-0.006
Mo-0.006
TokenThe
Token ID464
Feature activation-4.23e-02

Pos logit contributions
ba+0.006
ty+0.006
ve+0.006
Mo+0.006
B+0.005
Neg logit contributions
 salon-0.006
ours-0.006
 supervision-0.006
 studio-0.006
ses-0.005
Token le
Token ID443
Feature activation+7.91e-01

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Mo+0.001
B+0.001
Neg logit contributions
 salon-0.001
ours-0.001
 supervision-0.001
 studio-0.001
ses-0.001
Tokenopard
Token ID15478
Feature activation+1.01e-01

Pos logit contributions
 salon+0.015
 supervision+0.014
ours+0.014
 studio+0.014
 mouths+0.012
Neg logit contributions
ba-0.029
ve-0.028
ty-0.028
Mo-0.027
B-0.026
Token toy
Token ID13373
Feature activation-5.65e-01

Pos logit contributions
 salon+0.003
 supervision+0.003
 studio+0.003
ours+0.003
 gust+0.002
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.002
 himself-0.002
Token costs
Token ID3484
Feature activation+2.19e+00

Pos logit contributions
ba+0.018
ty+0.017
ve+0.017
Mo+0.016
B+0.016
Neg logit contributions
 salon-0.013
ours-0.013
 supervision-0.013
 studio-0.012
ses-0.012
Token $
Token ID720
Feature activation+3.87e+00

Pos logit contributions
 supervision+0.062
ours+0.062
 salon+0.062
 studio+0.058
 mouths+0.054
Neg logit contributions
ba-0.059
ty-0.056
ve-0.054
Mo-0.054
B-0.049
Token10
Token ID940
Feature activation+8.48e-01

Pos logit contributions
ses+0.086
ours+0.085
 studio+0.085
 salon+0.084
 supervision+0.083
Neg logit contributions
ve-0.125
ba-0.124
ty-0.116
B-0.116
Mo-0.111
Token."
Token ID526
Feature activation+9.23e-01

Pos logit contributions
 salon+0.029
ours+0.028
 supervision+0.028
 studio+0.028
ses+0.026
Neg logit contributions
ba-0.018
ve-0.017
ty-0.017
Mo-0.015
Ar-0.014
Token (space character)
Token ID220
Feature activation+6.36e-01

Pos logit contributions
 salon+0.027
 supervision+0.026
ours+0.026
 studio+0.026
ses+0.024
Neg logit contributions
ty-0.024
ba-0.024
ve-0.023
Mo-0.022
B-0.021
Token (newline character)
Token ID198
Feature activation+5.32e-01

Pos logit contributions
ours+0.018
 supervision+0.016
 salon+0.016
 Sharing+0.015
 studio+0.014
Neg logit contributions
ba-0.018
ty-0.018
ve-0.017
Mo-0.016
Ar-0.015
Token (newline character)
Token ID198
Feature activation+4.69e-01

Pos logit contributions
ours+0.013
 salon+0.012
 Sharing+0.011
 supervision+0.011
 �+0.011
Neg logit contributions
ba-0.017
ve-0.015
vell-0.015
ty-0.015
Mo-0.015
Token careful
Token ID8161
Feature activation-1.26e-01

Pos logit contributions
ours+0.006
 salon+0.006
 studio+0.005
 supervision+0.005
ses+0.005
Neg logit contributions
ba-0.004
ty-0.003
ve-0.003
 himself-0.003
Mo-0.003
Token;
Token ID26
Feature activation+4.25e-01

Pos logit contributions
ba+0.002
ty+0.002
ve+0.002
 himself+0.001
Mo+0.001
Neg logit contributions
 salon-0.005
ours-0.005
 studio-0.005
 supervision-0.005
ses-0.005
Token we
Token ID356
Feature activation+3.14e-01

Pos logit contributions
 salon+0.013
 studio+0.013
ours+0.013
anged+0.012
 supervision+0.012
Neg logit contributions
ba-0.009
ty-0.009
ve-0.009
 himself-0.008
Mo-0.008
Token won
Token ID1839
Feature activation+8.00e-01

Pos logit contributions
 salon+0.007
ours+0.007
 supervision+0.006
 studio+0.006
ses+0.006
Neg logit contributions
ba-0.011
ty-0.010
ve-0.010
Mo-0.010
B-0.009
Token't
Token ID470
Feature activation+3.88e-01

Pos logit contributions
 salon+0.025
ours+0.024
 supervision+0.024
 studio+0.023
 Sharing+0.023
Neg logit contributions
ba-0.018
ty-0.016
ve-0.016
Mo-0.015
Ar-0.015
Token cause
Token ID2728
Feature activation+3.61e+00

Pos logit contributions
 salon+0.010
 supervision+0.009
ours+0.009
 studio+0.009
ap+0.008
Neg logit contributions
ba-0.011
ve-0.011
ty-0.011
Mo-0.011
Ar-0.010
Token a
Token ID257
Feature activation+9.27e-02

Pos logit contributions
 salon+0.111
ours+0.109
 studio+0.107
 supervision+0.102
ses+0.100
Neg logit contributions
ba-0.089
ty-0.086
ve-0.085
Mo-0.079
B-0.078
Token mess
Token ID2085
Feature activation+4.46e-01

Pos logit contributions
ours+0.003
 supervision+0.003
 salon+0.003
 studio+0.003
 mouths+0.003
Neg logit contributions
ba-0.002
ve-0.002
ty-0.002
B-0.002
Ar-0.002
Token."
Token ID526
Feature activation+4.04e-01

Pos logit contributions
ours+0.016
 salon+0.016
 supervision+0.016
 studio+0.015
ses+0.014
Neg logit contributions
ba-0.008
ty-0.008
ve-0.008
Mo-0.008
Ar-0.007
Token (newline character)
Token ID198
Feature activation+3.49e-01

Pos logit contributions
 salon+0.012
ours+0.012
 supervision+0.012
 studio+0.011
ses+0.010
Neg logit contributions
 himself-0.010
ty-0.009
ba-0.009
Mo-0.009
ve-0.009
Token (newline character)
Token ID198
Feature activation+2.50e-01

Pos logit contributions
 salon+0.011
ours+0.011
 supervision+0.011
 studio+0.011
ses+0.010
Neg logit contributions
ba-0.008
ty-0.008
ve-0.007
Mo-0.007
B-0.006
Token the
Token ID262
Feature activation+8.85e-01

Pos logit contributions
ours+0.004
ses+0.004
 supervision+0.004
 studio+0.004
 salon+0.004
Neg logit contributions
ty-0.003
ba-0.003
 himself-0.003
ve-0.003
Mo-0.003
Token price
Token ID2756
Feature activation+7.45e-01

Pos logit contributions
ours+0.030
 supervision+0.028
 salon+0.028
 studio+0.027
anged+0.027
Neg logit contributions
ve-0.018
ba-0.018
ty-0.018
 himself-0.017
Mo-0.016
Token -
Token ID532
Feature activation+4.16e-01

Pos logit contributions
ours+0.025
ses+0.022
 salon+0.022
 studio+0.022
 supervision+0.022
Neg logit contributions
ty-0.018
ve-0.017
 himself-0.016
ba-0.016
Mo-0.014
Token it
Token ID340
Feature activation-1.90e-01

Pos logit contributions
ours+0.014
ses+0.014
anged+0.014
 studio+0.014
 supervision+0.014
Neg logit contributions
ba-0.008
ve-0.008
 himself-0.008
ty-0.008
Mo-0.007
Token was
Token ID373
Feature activation+3.77e-01

Pos logit contributions
ba+0.005
ty+0.005
ve+0.005
Mo+0.004
B+0.004
Neg logit contributions
 salon-0.006
ours-0.006
 supervision-0.006
 studio-0.005
ses-0.005
Token $
Token ID720
Feature activation+3.51e+00

Pos logit contributions
 salon+0.012
ours+0.012
 supervision+0.012
 studio+0.012
ses+0.011
Neg logit contributions
ba-0.008
ty-0.008
ve-0.008
Mo-0.007
Ar-0.007
Token5
Token ID20
Feature activation+1.55e+00

Pos logit contributions
ses+0.082
ours+0.078
 salon+0.078
 supervision+0.077
 studio+0.077
Neg logit contributions
ve-0.115
ba-0.111
B-0.104
ty-0.101
Mo-0.100
Token.
Token ID13
Feature activation-5.63e-02

Pos logit contributions
ours+0.054
 salon+0.054
 supervision+0.052
 studio+0.052
ses+0.052
Neg logit contributions
ba-0.032
ve-0.032
ty-0.031
Mo-0.027
 himself-0.025
Token Sally
Token ID25737
Feature activation+3.49e-01

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Mo+0.001
 himself+0.001
Neg logit contributions
 salon-0.002
ours-0.002
 supervision-0.002
 studio-0.002
ses-0.001
Token was
Token ID373
Feature activation+4.34e-01

Pos logit contributions
ours+0.011
 supervision+0.011
 salon+0.011
 studio+0.011
ses+0.010
Neg logit contributions
ba-0.008
ty-0.008
ve-0.008
Mo-0.007
B-0.007
Token so
Token ID523
Feature activation+7.09e-01

Pos logit contributions
ours+0.017
 supervision+0.017
 salon+0.016
 studio+0.016
ses+0.016
Neg logit contributions
ba-0.007
ve-0.006
ty-0.006
 himself-0.006
Mo-0.005
Token I
Token ID314
Feature activation+5.09e-01

Pos logit contributions
ours+0.011
 salon+0.011
 supervision+0.011
 studio+0.010
ses+0.010
Neg logit contributions
ba-0.012
ty-0.012
ve-0.012
Mo-0.011
B-0.011
Token promise
Token ID6991
Feature activation+3.37e-02

Pos logit contributions
 salon+0.010
 studio+0.009
ours+0.009
ap+0.009
bug+0.009
Neg logit contributions
ty-0.017
Mo-0.016
ba-0.016
rob-0.016
 himself-0.016
Token I
Token ID314
Feature activation-7.29e-02

Pos logit contributions
ours+0.001
 salon+0.001
 supervision+0.001
 studio+0.001
ses+0.001
Neg logit contributions
ba-0.001
ty-0.001
ve-0.001
Mo-0.001
B-0.001
Token won
Token ID1839
Feature activation+9.03e-01

Pos logit contributions
ba+0.002
ty+0.002
Mo+0.002
ve+0.002
B+0.002
Neg logit contributions
ours-0.001
 salon-0.001
 supervision-0.001
 studio-0.001
ses-0.001
Token't
Token ID470
Feature activation+6.03e-01

Pos logit contributions
 salon+0.025
 supervision+0.025
 studio+0.024
ours+0.023
ap+0.023
Neg logit contributions
ba-0.021
Ar-0.019
rim-0.019
Mo-0.019
Ho-0.018
Token cause
Token ID2728
Feature activation+3.43e+00

Pos logit contributions
 salon+0.015
 supervision+0.014
 studio+0.013
ours+0.012
ap+0.012
Neg logit contributions
ba-0.018
ve-0.017
Mo-0.017
rob-0.016
Ar-0.016
Token any
Token ID597
Feature activation-1.23e-01

Pos logit contributions
 salon+0.099
ours+0.098
 studio+0.094
 supervision+0.093
ses+0.089
Neg logit contributions
ba-0.092
ty-0.089
ve-0.088
Mo-0.083
B-0.079
Token trouble
Token ID5876
Feature activation+9.45e-02

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
Mo+0.003
B+0.003
Neg logit contributions
 salon-0.004
ours-0.004
 supervision-0.004
 studio-0.003
ses-0.003
Token.
Token ID13
Feature activation+4.52e-01

Pos logit contributions
 salon+0.003
 supervision+0.003
ours+0.003
 studio+0.003
ses+0.003
Neg logit contributions
ba-0.002
ty-0.002
ve-0.002
Mo-0.002
B-0.002
Token Can
Token ID1680
Feature activation+1.42e+00

Pos logit contributions
 salon+0.009
ours+0.009
 supervision+0.009
 studio+0.008
ses+0.007
Neg logit contributions
ba-0.016
ty-0.015
ve-0.015
Mo-0.015
Ar-0.014
Token I
Token ID314
Feature activation+1.10e-01

Pos logit contributions
 salon+0.036
ours+0.035
 supervision+0.035
 studio+0.033
ses+0.031
Neg logit contributions
ba-0.043
ty-0.041
ve-0.041
Mo-0.039
Ar-0.037
Token the
Token ID262
Feature activation-1.14e-01

Pos logit contributions
 supervision+0.023
 studio+0.022
 salon+0.021
anged+0.021
ours+0.021
Neg logit contributions
ba-0.015
ve-0.014
ty-0.014
 himself-0.013
Mo-0.013
Token shiny
Token ID22441
Feature activation-1.83e-01

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
Mo+0.003
 cool+0.002
Neg logit contributions
ours-0.004
 supervision-0.003
anged-0.003
apping-0.003
ap-0.003
Token things
Token ID1243
Feature activation+6.72e-01

Pos logit contributions
ba+0.005
ty+0.005
ve+0.004
Mo+0.004
 himself+0.004
Neg logit contributions
 supervision-0.006
ours-0.005
 salon-0.005
 studio-0.005
anged-0.005
Token were
Token ID547
Feature activation-2.79e-01

Pos logit contributions
 salon+0.019
ours+0.019
 supervision+0.018
 studio+0.018
ses+0.017
Neg logit contributions
ty-0.018
ba-0.017
ve-0.017
 himself-0.015
Mo-0.015
Token or
Token ID393
Feature activation+7.51e-01

Pos logit contributions
ba+0.007
ve+0.007
ty+0.007
Mo+0.007
B+0.006
Neg logit contributions
 supervision-0.008
ours-0.008
 salon-0.008
 studio-0.007
 mouths-0.007
Tokennam
Token ID7402
Feature activation+3.30e+00

Pos logit contributions
ours+0.026
 salon+0.026
 studio+0.025
 supervision+0.025
ses+0.024
Neg logit contributions
ba-0.014
ty-0.013
 himself-0.013
ve-0.012
 cool-0.012
Tokenents
Token ID658
Feature activation+1.21e+00

Pos logit contributions
 salon+0.052
ours+0.051
 supervision+0.048
 studio+0.044
 mouths+0.041
Neg logit contributions
ba-0.129
ty-0.129
Mo-0.123
ve-0.122
Ar-0.117
Token and
Token ID290
Feature activation-1.44e-01

Pos logit contributions
ours+0.040
 salon+0.040
 supervision+0.039
 studio+0.039
ses+0.036
Neg logit contributions
ba-0.028
ty-0.027
ve-0.026
Mo-0.023
 himself-0.022
Token they
Token ID484
Feature activation-6.72e-01

Pos logit contributions
ba+0.003
ty+0.003
 himself+0.003
ve+0.003
Mo+0.003
Neg logit contributions
ours-0.004
 supervision-0.004
 studio-0.004
 salon-0.004
ses-0.004
Token were
Token ID547
Feature activation-1.70e-01

Pos logit contributions
ba+0.018
ty+0.017
ve+0.017
Mo+0.016
B+0.015
Neg logit contributions
ours-0.019
 salon-0.019
 supervision-0.019
 studio-0.018
ses-0.017
Token very
Token ID845
Feature activation+5.47e-01

Pos logit contributions
ba+0.004
ty+0.004
ve+0.003
Mo+0.003
 himself+0.003
Neg logit contributions
 salon-0.006
ours-0.006
 supervision-0.006
 studio-0.005
ses-0.005
Token high
Token ID1029
Feature activation+1.65e-01

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
Mo+0.003
B+0.002
Neg logit contributions
ours-0.004
 salon-0.004
 supervision-0.004
 studio-0.004
ses-0.003
Token they
Token ID484
Feature activation-6.01e-01

Pos logit contributions
ours+0.005
 salon+0.005
 supervision+0.005
 studio+0.004
ses+0.004
Neg logit contributions
ba-0.004
ve-0.004
ty-0.004
Mo-0.004
Ar-0.004
Token were
Token ID547
Feature activation-4.56e-02

Pos logit contributions
ba+0.018
ty+0.018
ve+0.017
Mo+0.017
B+0.016
Neg logit contributions
 salon-0.015
ours-0.015
 supervision-0.015
 studio-0.014
ses-0.013
Token just
Token ID655
Feature activation+4.81e-01

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Mo+0.001
 himself+0.001
Neg logit contributions
ours-0.002
 salon-0.002
 supervision-0.001
 studio-0.001
ses-0.001
Token tiny
Token ID7009
Feature activation+5.58e-01

Pos logit contributions
 salon+0.017
ours+0.017
 studio+0.016
 supervision+0.016
 slot+0.015
Neg logit contributions
ba-0.009
ty-0.009
ve-0.009
 himself-0.009
Mo-0.008
Token spec
Token ID1020
Feature activation+3.14e+00

Pos logit contributions
ours+0.022
 salon+0.022
 supervision+0.021
 studio+0.020
 slot+0.020
Neg logit contributions
ba-0.009
ty-0.008
ve-0.008
B-0.007
Mo-0.007
Tokenks
Token ID591
Feature activation+2.13e-01

Pos logit contributions
 salon+0.060
ours+0.058
 supervision+0.058
 studio+0.056
 mouths+0.048
Neg logit contributions
ba-0.116
ve-0.111
ty-0.110
Mo-0.105
Ar-0.101
Token in
Token ID287
Feature activation-3.14e-01

Pos logit contributions
 salon+0.007
ours+0.007
 supervision+0.007
 studio+0.007
ses+0.006
Neg logit contributions
ba-0.005
ty-0.005
ve-0.005
Mo-0.004
B-0.004
Token the
Token ID262
Feature activation+7.86e-01

Pos logit contributions
ba+0.009
ty+0.009
ve+0.009
Mo+0.008
B+0.008
Neg logit contributions
 salon-0.008
ours-0.008
 supervision-0.008
 studio-0.008
 mouths-0.007
Token sky
Token ID6766
Feature activation+5.93e-01

Pos logit contributions
ours+0.029
 supervision+0.026
 salon+0.025
anged+0.025
ses+0.023
Neg logit contributions
ty-0.017
 whole-0.016
ba-0.016
 cool-0.015
B-0.015
Token.
Token ID13
Feature activation+4.61e-01

Pos logit contributions
ours+0.022
 supervision+0.022
 salon+0.022
 studio+0.021
ses+0.020
Neg logit contributions
ba-0.011
ty-0.011
ve-0.010
Mo-0.009
 himself-0.008
Token to
Token ID284
Feature activation+9.06e-01

Pos logit contributions
 salon+0.019
ours+0.018
 supervision+0.018
 studio+0.018
ses+0.017
Neg logit contributions
ba-0.016
ty-0.015
ve-0.015
Mo-0.014
B-0.013
Token see
Token ID766
Feature activation+3.21e-01

Pos logit contributions
 salon+0.020
 supervision+0.020
ours+0.018
aster+0.017
 frame+0.017
Neg logit contributions
ba-0.029
ve-0.029
Mo-0.028
rob-0.027
B-0.026
Token how
Token ID703
Feature activation+1.65e+00

Pos logit contributions
ours+0.008
 salon+0.007
 supervision+0.007
 studio+0.006
 oven+0.006
Neg logit contributions
ba-0.010
ve-0.010
Mo-0.010
Ar-0.010
ty-0.010
Token much
Token ID881
Feature activation+3.70e-01

Pos logit contributions
ses+0.062
 supervision+0.061
 studio+0.060
 salon+0.059
ours+0.056
Neg logit contributions
ve-0.033
ba-0.031
E-0.027
B-0.026
ty-0.026
Token it
Token ID340
Feature activation+4.35e-01

Pos logit contributions
 salon+0.010
ours+0.010
 supervision+0.009
 studio+0.009
ses+0.009
Neg logit contributions
ba-0.011
ty-0.010
ve-0.010
Mo-0.009
B-0.009
Token costs
Token ID3484
Feature activation+3.06e+00

Pos logit contributions
ours+0.008
 salon+0.008
 supervision+0.007
 studio+0.006
 mouths+0.006
Neg logit contributions
ba-0.016
Mo-0.016
Mill-0.016
ve-0.016
ty-0.016
Token."
Token ID526
Feature activation+8.92e-01

Pos logit contributions
 supervision+0.086
 salon+0.085
ours+0.083
 studio+0.078
 slot+0.073
Neg logit contributions
ba-0.081
Mo-0.078
ty-0.074
ve-0.072
Mill-0.072
Token They
Token ID1119
Feature activation-5.07e-01

Pos logit contributions
 supervision+0.027
 salon+0.027
ours+0.026
ses+0.025
 studio+0.025
Neg logit contributions
ty-0.022
ve-0.021
B-0.021
ba-0.020
E-0.019
Token looked
Token ID3114
Feature activation-3.04e-01

Pos logit contributions
ba+0.014
ty+0.013
ve+0.013
Mo+0.012
B+0.011
Neg logit contributions
 salon-0.015
ours-0.014
 supervision-0.014
 studio-0.014
ses-0.013
Token at
Token ID379
Feature activation+4.69e-01

Pos logit contributions
ba+0.009
ty+0.008
ve+0.008
Mo+0.008
Ar+0.008
Neg logit contributions
ours-0.008
 salon-0.008
 supervision-0.008
 studio-0.008
 mouths-0.007
Token the
Token ID262
Feature activation+4.32e-01

Pos logit contributions
 salon+0.014
ours+0.014
 supervision+0.013
 studio+0.013
ses+0.012
Neg logit contributions
ba-0.012
ty-0.012
ve-0.011
Mo-0.011
B-0.010
Token foot
Token ID2366
Feature activation-4.32e-01

Pos logit contributions
ours+0.042
 salon+0.042
 studio+0.037
 supervision+0.036
bug+0.035
Neg logit contributions
 himself-0.031
ve-0.031
ty-0.030
ba-0.030
Ar-0.027
Token.
Token ID13
Feature activation+1.18e-01

Pos logit contributions
ty+0.007
ba+0.007
 himself+0.007
ve+0.007
ke+0.005
Neg logit contributions
ours-0.017
 supervision-0.016
 salon-0.016
ses-0.016
 studio-0.016
Token It
Token ID632
Feature activation-9.07e-01

Pos logit contributions
 supervision+0.003
 salon+0.003
ours+0.003
 studio+0.003
ses+0.003
Neg logit contributions
ba-0.003
ve-0.003
Mo-0.003
ty-0.003
Mill-0.003
Token was
Token ID373
Feature activation+1.96e-01

Pos logit contributions
ba+0.024
ve+0.023
ty+0.023
Mo+0.022
B+0.020
Neg logit contributions
 salon-0.026
ours-0.026
 supervision-0.025
 studio-0.025
ses-0.023
Token 15
Token ID1315
Feature activation+1.15e+00

Pos logit contributions
 salon+0.007
ours+0.007
 supervision+0.007
 studio+0.006
ses+0.006
Neg logit contributions
ba-0.004
ty-0.004
ve-0.004
Mo-0.004
Ar-0.003
Token centimeters
Token ID48829
Feature activation+3.03e+00

Pos logit contributions
ours+0.035
 salon+0.034
 supervision+0.033
ses+0.032
 studio+0.032
Neg logit contributions
ty-0.028
ve-0.028
ba-0.028
Mo-0.025
 himself-0.024
Token long
Token ID890
Feature activation+3.69e-02

Pos logit contributions
 salon+0.108
 supervision+0.108
ours+0.108
 studio+0.103
ses+0.101
Neg logit contributions
ve-0.058
ba-0.057
ty-0.055
Mo-0.050
Ar-0.049
Token.
Token ID13
Feature activation+1.90e-02

Pos logit contributions
ours+0.001
 salon+0.001
 studio+0.001
 supervision+0.001
ses+0.001
Neg logit contributions
ty-0.001
ve-0.001
 himself-0.001
ba-0.001
ke-0.001
Token Lucy
Token ID22162
Feature activation+6.68e-01

Pos logit contributions
 supervision+0.001
 salon+0.001
ours+0.001
 studio+0.001
ses+0.000
Neg logit contributions
ba-0.001
ve-0.000
ty-0.000
Mo-0.000
B-0.000
Token said
Token ID531
Feature activation+4.65e-01

Pos logit contributions
 supervision+0.024
 salon+0.023
ours+0.023
 studio+0.022
ses+0.021
Neg logit contributions
ba-0.014
ve-0.014
ty-0.014
Mo-0.012
 himself-0.012
Token,
Token ID11
Feature activation-2.71e-01

Pos logit contributions
 supervision+0.019
 salon+0.019
ses+0.019
 studio+0.018
anged+0.018
Neg logit contributions
 himself-0.007
ve-0.006
ba-0.005
 almost-0.005
!"-0.005
Tokenongo
Token ID25162
Feature activation+5.51e-01

Pos logit contributions
ba+0.036
ty+0.034
Ar+0.034
Mo+0.033
ve+0.032
Neg logit contributions
 supervision-0.038
 salon-0.036
anged-0.036
ses-0.036
ours-0.035
Token didn
Token ID1422
Feature activation-1.35e+00

Pos logit contributions
 supervision+0.017
ours+0.017
 salon+0.017
 studio+0.016
ses+0.015
Neg logit contributions
ba-0.014
ty-0.014
ve-0.013
Mo-0.012
B-0.012
Token't
Token ID470
Feature activation-6.62e-01

Pos logit contributions
ba+0.035
ty+0.031
ve+0.030
Mo+0.030
Ar+0.030
Neg logit contributions
 salon-0.039
ours-0.038
 �-0.035
 studio-0.034
 frame-0.033
Token like
Token ID588
Feature activation+9.56e-01

Pos logit contributions
ve+0.019
ba+0.018
ty+0.018
Mo+0.018
Ar+0.017
Neg logit contributions
ours-0.016
 salon-0.016
 supervision-0.016
 studio-0.015
ses-0.015
Token wet
Token ID9583
Feature activation-3.42e-01

Pos logit contributions
ours+0.026
 salon+0.026
 supervision+0.026
 studio+0.026
ses+0.024
Neg logit contributions
ty-0.026
ba-0.026
ve-0.024
Mo-0.024
 himself-0.023
Token av
Token ID1196
Feature activation+3.03e+00

Pos logit contributions
ba+0.010
ve+0.010
ty+0.009
Mo+0.009
Ar+0.009
Neg logit contributions
ours-0.009
 salon-0.009
 supervision-0.009
 mouths-0.008
 studio-0.008
Tokenoc
Token ID420
Feature activation+1.89e+00

Pos logit contributions
 supervision+0.077
ours+0.075
 salon+0.075
 mouths+0.071
anged+0.068
Neg logit contributions
ba-0.087
ty-0.078
ve-0.078
Mo-0.075
Mill-0.072
Tokenados
Token ID22484
Feature activation+6.30e-01

Pos logit contributions
ours+0.041
 salon+0.041
 supervision+0.040
 studio+0.038
ses+0.035
Neg logit contributions
ba-0.064
ty-0.061
ve-0.060
Mo-0.058
Ar-0.056
Token,
Token ID11
Feature activation-8.57e-01

Pos logit contributions
ours+0.019
 salon+0.019
 supervision+0.019
 studio+0.018
ses+0.017
Neg logit contributions
ba-0.016
ty-0.015
ve-0.015
Mo-0.014
B-0.013
Token so
Token ID523
Feature activation-5.97e-01

Pos logit contributions
ba+0.015
ty+0.015
ve+0.013
Mo+0.013
 himself+0.012
Neg logit contributions
ours-0.032
 salon-0.032
 supervision-0.032
 studio-0.031
ses-0.030
Token he
Token ID339
Feature activation+3.44e-01

Pos logit contributions
ve+0.019
ba+0.019
ty+0.018
Ar+0.017
vell+0.017
Neg logit contributions
 salon-0.013
ours-0.013
 supervision-0.012
 mouths-0.012
 gust-0.012
Token complete
Token ID1844
Feature activation+6.48e-01

Pos logit contributions
 salon+0.014
ours+0.014
 supervision+0.014
 studio+0.014
ses+0.013
Neg logit contributions
ba-0.012
ty-0.012
ve-0.012
Mo-0.011
B-0.010
Token.
Token ID13
Feature activation+7.36e-01

Pos logit contributions
 salon+0.019
ours+0.019
 supervision+0.019
 studio+0.018
ses+0.017
Neg logit contributions
ba-0.017
ty-0.016
ve-0.016
Mo-0.015
B-0.014
Token Height
Token ID27280
Feature activation+5.13e-01

Pos logit contributions
 salon+0.016
 supervision+0.016
ours+0.015
 studio+0.015
ses+0.014
Neg logit contributions
ba-0.024
ve-0.023
ty-0.023
Mo-0.022
 himself-0.021
Token:
Token ID25
Feature activation+4.38e-01

Pos logit contributions
ours+0.015
 salon+0.015
 supervision+0.014
 studio+0.014
ses+0.013
Neg logit contributions
ba-0.014
ty-0.013
ve-0.013
Mo-0.012
B-0.012
Token 90
Token ID4101
Feature activation+4.80e-01

Pos logit contributions
 salon+0.011
ours+0.011
 studio+0.011
 supervision+0.011
ses+0.010
Neg logit contributions
ba-0.013
ve-0.012
ty-0.012
Mo-0.011
 himself-0.011
Token centimeters
Token ID48829
Feature activation+3.02e+00

Pos logit contributions
 salon+0.013
ours+0.013
 supervision+0.013
 studio+0.012
ses+0.012
Neg logit contributions
ba-0.013
ve-0.013
ty-0.013
Mo-0.012
B-0.011
Token.
Token ID13
Feature activation+6.69e-01

Pos logit contributions
 salon+0.101
 supervision+0.099
ours+0.099
 studio+0.096
ses+0.092
Neg logit contributions
ba-0.066
ve-0.064
ty-0.063
Mo-0.058
Ar-0.054
Token Weight
Token ID14331
Feature activation-5.38e-02

Pos logit contributions
 salon+0.014
 supervision+0.014
ours+0.014
 studio+0.013
ses+0.012
Neg logit contributions
ba-0.023
ty-0.022
ve-0.022
Mo-0.021
B-0.020
Token:
Token ID25
Feature activation-1.37e-01

Pos logit contributions
ba+0.002
ty+0.002
ve+0.001
Mo+0.001
 himself+0.001
Neg logit contributions
 salon-0.001
 supervision-0.001
ours-0.001
 studio-0.001
ses-0.001
Token 15
Token ID1315
Feature activation+6.05e-01

Pos logit contributions
ba+0.004
ty+0.004
ve+0.004
Mo+0.004
 himself+0.004
Neg logit contributions
 salon-0.003
 supervision-0.003
 studio-0.003
ours-0.003
ses-0.003
Token kilograms
Token ID37075
Feature activation+2.13e-01

Pos logit contributions
 salon+0.019
ours+0.018
 supervision+0.018
 studio+0.018
ses+0.017
Neg logit contributions
ba-0.015
ve-0.014
ty-0.014
Mo-0.013
B-0.012
Token He
Token ID679
Feature activation-2.21e-01

Pos logit contributions
 salon+0.005
 supervision+0.005
ours+0.004
 studio+0.004
ses+0.004
Neg logit contributions
 himself-0.003
ba-0.003
ve-0.003
Mo-0.003
B-0.003
Token quickly
Token ID2952
Feature activation-3.21e-01

Pos logit contributions
ba+0.005
ve+0.005
ty+0.005
Mo+0.005
Ar+0.004
Neg logit contributions
 supervision-0.007
ours-0.007
 salon-0.007
 studio-0.007
ses-0.006
Token hung
Token ID9174
Feature activation+7.05e-02

Pos logit contributions
ba+0.007
ty+0.007
ve+0.007
Mo+0.006
B+0.006
Neg logit contributions
ours-0.011
 salon-0.010
 supervision-0.010
 studio-0.010
ses-0.010
Token colorful
Token ID20239
Feature activation+1.31e-01

Pos logit contributions
 salon+0.002
ours+0.002
 supervision+0.002
 studio+0.002
ses+0.002
Neg logit contributions
ba-0.002
ty-0.002
ve-0.002
Mo-0.002
B-0.002
Token or
Token ID393
Feature activation-1.45e-01

Pos logit contributions
 salon+0.004
ours+0.004
 supervision+0.004
 studio+0.004
ses+0.004
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.003
B-0.003
Tokennam
Token ID7402
Feature activation+3.02e+00

Pos logit contributions
ba+0.004
ty+0.004
ve+0.004
Mo+0.004
Ar+0.004
Neg logit contributions
ours-0.004
 salon-0.004
 supervision-0.004
 studio-0.003
ses-0.003
Tokenents
Token ID658
Feature activation+1.48e+00

Pos logit contributions
 salon+0.046
ours+0.046
 supervision+0.042
 studio+0.040
 mouths+0.036
Neg logit contributions
ty-0.118
ba-0.118
Mo-0.112
ve-0.111
B-0.106
Token and
Token ID290
Feature activation-4.74e-01

Pos logit contributions
ours+0.053
 supervision+0.052
 salon+0.052
 studio+0.051
ses+0.050
Neg logit contributions
ba-0.031
ty-0.030
ve-0.028
 himself-0.024
Mo-0.024
Token tw
Token ID665
Feature activation-9.75e-01

Pos logit contributions
ba+0.011
ty+0.011
ve+0.010
 himself+0.010
Ar+0.009
Neg logit contributions
ours-0.015
 supervision-0.015
 salon-0.015
 studio-0.014
ses-0.014
Tokenink
Token ID676
Feature activation-8.60e-01

Pos logit contributions
ba+0.027
ty+0.027
Mo+0.025
B+0.025
Ar+0.024
Neg logit contributions
ours-0.029
 salon-0.026
 studio-0.025
 supervision-0.025
anged-0.023
Tokenling
Token ID1359
Feature activation-4.78e-01

Pos logit contributions
ba+0.017
ty+0.016
ve+0.016
Mo+0.015
Ar+0.014
Neg logit contributions
 salon-0.030
ours-0.030
 supervision-0.029
 studio-0.029
ses-0.027
Token ask
Token ID1265
Feature activation+9.85e-01

Pos logit contributions
 salon+0.005
ours+0.005
 supervision+0.005
 studio+0.005
ses+0.005
Neg logit contributions
ba-0.005
ty-0.005
ve-0.005
Mo-0.005
Ar-0.004
Token for
Token ID329
Feature activation+6.13e-01

Pos logit contributions
 salon+0.026
ours+0.026
 supervision+0.025
 studio+0.025
ses+0.023
Neg logit contributions
ba-0.028
ty-0.027
ve-0.026
Mo-0.025
 himself-0.024
Token things
Token ID1243
Feature activation-1.79e-01

Pos logit contributions
 salon+0.018
ours+0.018
 supervision+0.018
 studio+0.018
ses+0.017
Neg logit contributions
ba-0.015
ty-0.015
ve-0.015
Mo-0.014
B-0.013
Token that
Token ID326
Feature activation+1.09e+00

Pos logit contributions
ba+0.006
ty+0.005
ve+0.005
Mo+0.005
B+0.005
Neg logit contributions
 salon-0.004
ours-0.004
 supervision-0.004
 studio-0.004
ses-0.004
Token could
Token ID714
Feature activation-2.32e-01

Pos logit contributions
ours+0.033
 salon+0.032
 studio+0.032
 supervision+0.030
ses+0.029
Neg logit contributions
ba-0.028
ty-0.026
ve-0.026
Mo-0.024
B-0.023
Token cause
Token ID2728
Feature activation+3.02e+00

Pos logit contributions
ve+0.007
ba+0.007
Mo+0.007
Ar+0.007
rob+0.006
Neg logit contributions
 salon-0.006
 supervision-0.005
ours-0.005
 mouths-0.005
 tower-0.005
Token harm
Token ID4419
Feature activation+9.61e-01

Pos logit contributions
 salon+0.097
ours+0.097
 studio+0.094
 supervision+0.089
ses+0.088
Neg logit contributions
ba-0.070
ty-0.067
ve-0.066
B-0.063
Mo-0.061
Token or
Token ID393
Feature activation+1.99e-01

Pos logit contributions
 salon+0.032
ours+0.032
 studio+0.031
 supervision+0.031
ses+0.029
Neg logit contributions
ba-0.021
ty-0.020
ve-0.020
Mo-0.018
B-0.018
Token trouble
Token ID5876
Feature activation+2.27e-01

Pos logit contributions
ours+0.007
 salon+0.006
 studio+0.006
ses+0.006
 supervision+0.006
Neg logit contributions
ba-0.004
ty-0.004
ve-0.004
 himself-0.004
Mo-0.004
Token.
Token ID13
Feature activation+4.40e-01

Pos logit contributions
 supervision+0.007
ours+0.007
 salon+0.006
 studio+0.006
ses+0.006
Neg logit contributions
ba-0.006
ty-0.006
ve-0.006
Mo-0.005
Ar-0.005
Token<|endoftext|>
Token ID50256
Feature activation-7.82e-01

Pos logit contributions
 supervision+0.014
ours+0.014
 salon+0.013
 studio+0.013
ses+0.013
Neg logit contributions
ba-0.010
B-0.010
ty-0.010
Mo-0.009
Mill-0.009
Token a
Token ID257
Feature activation-7.95e-01

Pos logit contributions
ba+0.017
ve+0.016
ty+0.016
Mo+0.015
Ar+0.014
Neg logit contributions
 salon-0.024
ours-0.024
 supervision-0.023
 studio-0.023
ses-0.022
Token tree
Token ID5509
Feature activation+6.79e-01

Pos logit contributions
ba+0.016
ve+0.014
ty+0.014
Mo+0.013
B+0.012
Neg logit contributions
 supervision-0.030
ours-0.029
 salon-0.028
 studio-0.027
anged-0.027
Token with
Token ID351
Feature activation-4.13e-01

Pos logit contributions
 supervision+0.032
ours+0.029
 mouths+0.028
 studio+0.028
ses+0.028
Neg logit contributions
ty-0.009
ba-0.008
ve-0.007
 himself-0.006
Mo-0.006
Token shiny
Token ID22441
Feature activation-6.88e-01

Pos logit contributions
ba+0.011
ty+0.010
ve+0.010
Mo+0.009
Ar+0.009
Neg logit contributions
 salon-0.012
ours-0.012
 supervision-0.012
 studio-0.011
 mouths-0.011
Token or
Token ID393
Feature activation+1.82e-01

Pos logit contributions
ba+0.020
ty+0.019
ve+0.018
Mo+0.018
Ar+0.017
Neg logit contributions
ours-0.019
 salon-0.019
 supervision-0.018
 studio-0.017
 mouths-0.017
Tokennam
Token ID7402
Feature activation+3.01e+00

Pos logit contributions
 salon+0.005
ours+0.005
 supervision+0.005
 studio+0.004
ses+0.004
Neg logit contributions
ba-0.005
ty-0.005
ve-0.005
Mo-0.005
Ar-0.005
Tokenents
Token ID658
Feature activation+1.14e+00

Pos logit contributions
 salon+0.047
ours+0.046
 supervision+0.042
 studio+0.040
 mouths+0.036
Neg logit contributions
ba-0.119
ty-0.118
Mo-0.112
ve-0.111
Ar-0.107
Token hanging
Token ID10938
Feature activation+1.13e+00

Pos logit contributions
 supervision+0.044
ours+0.044
 studio+0.044
 salon+0.043
ses+0.041
Neg logit contributions
ba-0.020
ty-0.020
ve-0.019
 himself-0.015
Mo-0.014
Token from
Token ID422
Feature activation-1.46e-01

Pos logit contributions
 salon+0.037
ours+0.037
 supervision+0.035
 studio+0.034
ses+0.033
Neg logit contributions
ba-0.026
ty-0.024
ve-0.024
Mo-0.023
B-0.021
Token its
Token ID663
Feature activation+6.97e-02

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
Mo+0.003
 himself+0.003
Neg logit contributions
ours-0.005
 supervision-0.005
 salon-0.005
 studio-0.004
ses-0.004
Token branches
Token ID13737
Feature activation+7.56e-01

Pos logit contributions
ours+0.002
 supervision+0.002
 salon+0.002
 studio+0.002
anged+0.002
Neg logit contributions
ba-0.002
ve-0.002
ty-0.002
Mo-0.002
 himself-0.002
Token you
Token ID345
Feature activation+1.83e-01

Pos logit contributions
ses+0.006
 studio+0.006
 salon+0.006
ours+0.006
anged+0.006
Neg logit contributions
ty-0.002
 himself-0.002
ba-0.002
ve-0.002
B-0.002
Token did
Token ID750
Feature activation+3.39e-01

Pos logit contributions
ours+0.004
 supervision+0.004
 salon+0.004
 studio+0.004
 mouths+0.004
Neg logit contributions
ba-0.006
ty-0.005
Mo-0.005
ve-0.005
Ar-0.005
Token not
Token ID407
Feature activation-2.43e-01

Pos logit contributions
ours+0.009
 salon+0.009
 supervision+0.009
 studio+0.009
ses+0.008
Neg logit contributions
ba-0.009
ty-0.009
ve-0.009
Mo-0.008
Ar-0.008
Token mean
Token ID1612
Feature activation+2.19e+00

Pos logit contributions
ba+0.006
ve+0.006
ty+0.006
Mo+0.006
Ar+0.005
Neg logit contributions
 supervision-0.007
 salon-0.007
ours-0.007
 studio-0.007
ses-0.006
Token to
Token ID284
Feature activation+1.49e+00

Pos logit contributions
 studio+0.070
 salon+0.070
ours+0.070
ses+0.069
 supervision+0.067
Neg logit contributions
ty-0.048
ba-0.048
 himself-0.046
ve-0.045
Mo-0.041
Token cause
Token ID2728
Feature activation+2.99e+00

Pos logit contributions
 salon+0.045
 supervision+0.044
ours+0.043
 studio+0.042
 mouths+0.039
Neg logit contributions
ba-0.038
ve-0.037
ty-0.036
Mo-0.035
Ar-0.033
Token trouble
Token ID5876
Feature activation-1.56e-01

Pos logit contributions
 salon+0.093
ours+0.092
 studio+0.089
 supervision+0.085
ses+0.085
Neg logit contributions
ba-0.071
ve-0.069
ty-0.069
Mo-0.064
B-0.064
Token.
Token ID13
Feature activation+1.49e-01

Pos logit contributions
ba+0.004
Mo+0.004
ty+0.004
ve+0.004
Ar+0.004
Neg logit contributions
 supervision-0.004
ours-0.004
 salon-0.004
 studio-0.004
ses-0.004
Token But
Token ID887
Feature activation-1.03e-01

Pos logit contributions
 salon+0.004
ours+0.003
 supervision+0.003
 studio+0.003
anged+0.003
Neg logit contributions
ba-0.005
Mo-0.004
 himself-0.004
ty-0.004
B-0.004
Token you
Token ID345
Feature activation+7.93e-01

Pos logit contributions
ba+0.002
ty+0.002
 himself+0.002
ve+0.002
Mo+0.002
Neg logit contributions
ours-0.003
 salon-0.003
 studio-0.003
 supervision-0.003
ses-0.003
Token have
Token ID423
Feature activation+1.54e+00

Pos logit contributions
ours+0.015
 salon+0.015
 supervision+0.015
 studio+0.014
 mouths+0.014
Neg logit contributions
ba-0.027
Mo-0.026
ty-0.026
ve-0.026
Mill-0.024
Token We
Token ID775
Feature activation+5.30e-02

Pos logit contributions
ba+0.010
Mo+0.009
ve+0.009
Ar+0.009
ty+0.009
Neg logit contributions
 salon-0.005
 studio-0.004
 supervision-0.004
ours-0.004
anged-0.003
Token did
Token ID750
Feature activation+4.54e-01

Pos logit contributions
ours+0.001
 salon+0.001
 supervision+0.001
ses+0.001
 studio+0.001
Neg logit contributions
ba-0.002
ty-0.002
Mo-0.002
ve-0.002
Mill-0.001
Token not
Token ID407
Feature activation-2.40e-01

Pos logit contributions
 salon+0.016
ours+0.015
 supervision+0.015
 studio+0.015
ses+0.014
Neg logit contributions
ba-0.009
ty-0.009
ve-0.009
Mo-0.008
 himself-0.008
Token mean
Token ID1612
Feature activation+1.84e+00

Pos logit contributions
ba+0.007
ve+0.007
ty+0.006
Mo+0.006
Ar+0.006
Neg logit contributions
 supervision-0.006
 salon-0.006
ours-0.006
 studio-0.006
ap-0.005
Token to
Token ID284
Feature activation+1.40e+00

Pos logit contributions
 studio+0.060
 salon+0.059
ours+0.058
ses+0.056
 supervision+0.056
Neg logit contributions
ba-0.042
ty-0.041
ve-0.039
 himself-0.038
Mo-0.036
Token cause
Token ID2728
Feature activation+2.99e+00

Pos logit contributions
 salon+0.041
 supervision+0.040
ours+0.040
 studio+0.038
 mouths+0.036
Neg logit contributions
ba-0.036
ve-0.036
ty-0.035
Mo-0.033
Ar-0.031
Token trouble
Token ID5876
Feature activation-9.78e-02

Pos logit contributions
 salon+0.090
ours+0.089
 studio+0.087
 supervision+0.085
ses+0.081
Neg logit contributions
ba-0.076
ty-0.073
ve-0.072
Mo-0.068
B-0.065
Token."
Token ID526
Feature activation-5.19e-02

Pos logit contributions
Mo+0.002
ba+0.002
ty+0.002
Ar+0.002
ve+0.002
Neg logit contributions
 supervision-0.003
 salon-0.003
ours-0.003
 studio-0.003
 Sharing-0.003
Token\n
Token ID198
Feature activation+4.96e-02

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Mo+0.001
B+0.001
Neg logit contributions
ours-0.002
 salon-0.002
 supervision-0.002
 studio-0.002
 mouths-0.001
Token\n
Token ID198
Feature activation-2.71e-01

Pos logit contributions
ours+0.001
 salon+0.001
 supervision+0.001
 studio+0.001
 slot+0.001
Neg logit contributions
ty-0.001
ba-0.001
ve-0.001
vell-0.001
Mo-0.001
TokenThe
Token ID464
Feature activation-8.71e-01

Pos logit contributions
ty+0.006
ba+0.006
ve+0.006
Mo+0.005
B+0.005
Neg logit contributions
 salon-0.009
ours-0.009
 supervision-0.009
 studio-0.008
 mouths-0.008
Token price
Token ID2756
Feature activation+7.51e-01

Pos logit contributions
 salon+0.002
ours+0.002
 supervision+0.002
 studio+0.002
anged+0.002
Neg logit contributions
ve-0.002
ba-0.002
ty-0.002
Mo-0.002
 himself-0.002
Token tag
Token ID7621
Feature activation+5.67e-01

Pos logit contributions
ours+0.029
 salon+0.028
 studio+0.028
 supervision+0.028
 Acc+0.028
Neg logit contributions
ty-0.012
ba-0.012
ve-0.011
 himself-0.011
ke-0.008
Token.
Token ID13
Feature activation-2.23e-02

Pos logit contributions
 supervision+0.023
ours+0.023
 studio+0.023
 salon+0.022
ses+0.022
Neg logit contributions
ty-0.008
ba-0.008
 himself-0.007
ve-0.007
ke-0.005
Token It
Token ID632
Feature activation-6.49e-01

Pos logit contributions
ba+0.000
B+0.000
Mo+0.000
ve+0.000
E+0.000
Neg logit contributions
 supervision-0.001
 salon-0.001
 studio-0.001
ours-0.001
anged-0.001
Token said
Token ID531
Feature activation-2.75e-01

Pos logit contributions
ba+0.017
ty+0.016
ve+0.016
Mo+0.015
B+0.014
Neg logit contributions
 salon-0.019
ours-0.019
 supervision-0.019
 studio-0.018
ses-0.017
Token $
Token ID720
Feature activation+2.98e+00

Pos logit contributions
ba+0.007
 himself+0.007
ve+0.007
ty+0.007
Mo+0.006
Neg logit contributions
 supervision-0.008
 salon-0.008
 studio-0.008
anged-0.007
ses-0.007
Token1
Token ID16
Feature activation+1.13e+00

Pos logit contributions
ses+0.074
 studio+0.072
 salon+0.072
 supervision+0.070
ours+0.065
Neg logit contributions
ve-0.088
ba-0.085
B-0.085
E-0.083
Mo-0.079
Token.
Token ID13
Feature activation+5.33e-01

Pos logit contributions
 salon+0.035
 studio+0.035
ours+0.034
 supervision+0.034
ses+0.032
Neg logit contributions
ba-0.027
ty-0.026
ve-0.026
Mo-0.023
 himself-0.022
Token99
Token ID2079
Feature activation-5.43e-01

Pos logit contributions
 supervision+0.020
 salon+0.019
ours+0.019
 studio+0.018
ses+0.018
Neg logit contributions
ba-0.009
ve-0.009
Mo-0.008
B-0.008
ty-0.008
Token per
Token ID583
Feature activation-7.62e-01

Pos logit contributions
ty+0.012
ve+0.012
ba+0.012
ke+0.010
 himself+0.010
Neg logit contributions
 salon-0.019
ours-0.019
 studio-0.018
 supervision-0.017
 mouths-0.016
Token pound
Token ID14896
Feature activation+3.71e-02

Pos logit contributions
ba+0.020
ve+0.019
ty+0.018
Mo+0.017
Ar+0.016
Neg logit contributions
 salon-0.024
 studio-0.022
ours-0.022
 supervision-0.022
 mouths-0.020
Tokeno
Token ID78
Feature activation+4.75e-01

Pos logit contributions
ours+0.004
bug+0.004
 salon+0.003
ses+0.003
 mouths+0.003
Neg logit contributions
ba-0.008
ve-0.008
ty-0.008
Mo-0.008
vell-0.008
Token was
Token ID373
Feature activation-8.61e-02

Pos logit contributions
 salon+0.013
ours+0.013
 supervision+0.012
 studio+0.012
ses+0.011
Neg logit contributions
ba-0.014
ty-0.013
ve-0.013
Mo-0.012
B-0.011
Token not
Token ID407
Feature activation-9.45e-02

Pos logit contributions
ba+0.002
ve+0.002
ty+0.002
Mo+0.002
B+0.002
Neg logit contributions
 salon-0.002
ours-0.002
 supervision-0.002
 studio-0.002
ses-0.002
Token like
Token ID588
Feature activation+8.27e-01

Pos logit contributions
ve+0.003
Mo+0.003
ba+0.003
ty+0.003
E+0.003
Neg logit contributions
 supervision-0.002
 salon-0.002
ses-0.002
ours-0.002
 square-0.002
Token other
Token ID584
Feature activation-5.02e-01

Pos logit contributions
ours+0.023
 salon+0.023
 supervision+0.023
 studio+0.022
ses+0.021
Neg logit contributions
ba-0.023
ty-0.022
ve-0.021
Mo-0.020
B-0.019
Token gor
Token ID30344
Feature activation+2.97e+00

Pos logit contributions
ba+0.016
ty+0.015
ve+0.015
Mo+0.015
B+0.015
Neg logit contributions
 salon-0.011
 mouths-0.011
ours-0.011
 tower-0.011
 supervision-0.010
Tokenillas
Token ID25314
Feature activation+3.08e-01

Pos logit contributions
 salon+0.069
ours+0.069
 supervision+0.069
 studio+0.068
ses+0.062
Neg logit contributions
ba-0.096
ve-0.095
ty-0.088
Mo-0.084
ke-0.078
Token.
Token ID13
Feature activation+1.10e-01

Pos logit contributions
ours+0.011
 studio+0.010
 salon+0.010
 supervision+0.010
ses+0.010
Neg logit contributions
ba-0.007
ty-0.007
ve-0.006
 himself-0.005
Mo-0.005
Token He
Token ID679
Feature activation-8.92e-01

Pos logit contributions
 salon+0.003
 supervision+0.003
ours+0.003
 studio+0.003
ses+0.003
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.003
Ar-0.002
Token was
Token ID373
Feature activation-1.41e-01

Pos logit contributions
ba+0.026
ty+0.025
ve+0.025
Mo+0.024
B+0.022
Neg logit contributions
 salon-0.023
ours-0.023
 supervision-0.022
 studio-0.022
ses-0.020
Token bald
Token ID28273
Feature activation-5.89e-02

Pos logit contributions
ba+0.004
ty+0.003
ve+0.003
Mo+0.003
Ar+0.003
Neg logit contributions
 salon-0.004
ours-0.004
 supervision-0.004
 studio-0.004
ses-0.004
Token hill
Token ID12788
Feature activation+1.76e+00

Pos logit contributions
ty+0.005
ve+0.005
ba+0.005
B+0.004
Ar+0.004
Neg logit contributions
ours-0.009
 supervision-0.009
anged-0.008
ses-0.008
 mouths-0.008
Token and
Token ID290
Feature activation+2.16e-01

Pos logit contributions
 supervision+0.082
ours+0.082
ses+0.077
 mouths+0.075
 salon+0.074
Neg logit contributions
ty-0.021
ba-0.015
ve-0.014
 himself-0.013
 almost-0.010
Token put
Token ID1234
Feature activation-4.45e-01

Pos logit contributions
 supervision+0.007
ours+0.007
 salon+0.007
ses+0.007
 mouths+0.007
Neg logit contributions
ba-0.005
ty-0.005
ve-0.004
Mo-0.004
 himself-0.004
Token her
Token ID607
Feature activation+1.53e-02

Pos logit contributions
ba+0.012
ty+0.011
ve+0.011
Mo+0.011
Ar+0.010
Neg logit contributions
ours-0.013
 salon-0.012
 supervision-0.012
 studio-0.012
 mouths-0.011
Token or
Token ID393
Feature activation-1.10e-01

Pos logit contributions
ours+0.000
 salon+0.000
 supervision+0.000
 studio+0.000
ses+0.000
Neg logit contributions
ba-0.000
ty-0.000
ve-0.000
Mo-0.000
B-0.000
Tokennam
Token ID7402
Feature activation+2.93e+00

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
 himself+0.003
Mo+0.003
Neg logit contributions
ours-0.003
 salon-0.003
 supervision-0.003
 studio-0.002
ses-0.002
Tokenents
Token ID658
Feature activation+1.46e+00

Pos logit contributions
 salon+0.050
ours+0.049
 supervision+0.047
 studio+0.044
 mouths+0.039
Neg logit contributions
ba-0.112
ty-0.110
ve-0.106
Mo-0.105
B-0.100
Token all
Token ID477
Feature activation-7.42e-01

Pos logit contributions
ours+0.054
 supervision+0.053
 salon+0.053
 studio+0.052
ses+0.051
Neg logit contributions
ba-0.028
ty-0.027
ve-0.026
 himself-0.022
Mo-0.022
Token around
Token ID1088
Feature activation-3.34e-01

Pos logit contributions
ty+0.014
ba+0.014
 himself+0.013
ve+0.012
Mo+0.011
Neg logit contributions
ses-0.027
 supervision-0.027
ours-0.026
 salon-0.026
 studio-0.025
Token.
Token ID13
Feature activation+3.21e-01

Pos logit contributions
ba+0.007
ty+0.007
ve+0.006
Mo+0.006
 himself+0.006
Neg logit contributions
ours-0.012
 supervision-0.012
 salon-0.012
ses-0.011
 studio-0.011
Token She
Token ID1375
Feature activation+3.39e-01

Pos logit contributions
 supervision+0.012
 salon+0.011
ours+0.010
 wasteful+0.010
 mouths+0.010
Neg logit contributions
E-0.006
B-0.006
Mo-0.006
 himself-0.006
Mill-0.006
Token the
Token ID262
Feature activation+4.33e-01

Pos logit contributions
ba+0.001
ve+0.001
ty+0.001
Mo+0.001
Ar+0.001
Neg logit contributions
ours-0.000
 supervision-0.000
 salon-0.000
 mouths-0.000
 �-0.000
Token cloth
Token ID16270
Feature activation+2.53e-01

Pos logit contributions
 salon+0.012
ours+0.012
 supervision+0.012
 studio+0.011
ses+0.011
Neg logit contributions
ba-0.012
ty-0.012
ve-0.012
Mo-0.011
B-0.010
Token onto
Token ID4291
Feature activation+2.65e-01

Pos logit contributions
ours+0.009
 salon+0.009
 supervision+0.009
 studio+0.009
ses+0.008
Neg logit contributions
ba-0.005
ty-0.005
ve-0.005
Mo-0.004
 himself-0.004
Token the
Token ID262
Feature activation+7.47e-01

Pos logit contributions
 salon+0.008
ours+0.008
 supervision+0.008
 studio+0.008
ses+0.008
Neg logit contributions
ba-0.006
ty-0.006
ve-0.006
Mo-0.006
B-0.005
Token ash
Token ID12530
Feature activation+9.80e-01

Pos logit contributions
ours+0.022
 salon+0.022
 supervision+0.022
 studio+0.021
ses+0.020
Neg logit contributions
ba-0.019
ty-0.019
ve-0.018
Mo-0.017
B-0.016
Tokentr
Token ID2213
Feature activation+2.93e+00

Pos logit contributions
ours+0.023
 supervision+0.022
 salon+0.022
 studio+0.021
ses+0.019
Neg logit contributions
ty-0.032
ba-0.032
ve-0.030
Mo-0.029
 himself-0.028
Tokenay
Token ID323
Feature activation-2.11e-01

Pos logit contributions
 salon+0.083
ours+0.083
 supervision+0.080
 studio+0.078
ses+0.073
Neg logit contributions
ba-0.080
ty-0.076
ve-0.075
Mo-0.072
Ar-0.067
Token.
Token ID13
Feature activation+4.98e-01

Pos logit contributions
ba+0.005
ty+0.005
ve+0.005
Mo+0.005
Ar+0.005
Neg logit contributions
 salon-0.006
ours-0.006
 supervision-0.006
 studio-0.006
ses-0.006
Token He
Token ID679
Feature activation+3.47e-01

Pos logit contributions
 supervision+0.015
 salon+0.015
ours+0.014
 studio+0.014
ses+0.013
Neg logit contributions
 himself-0.012
ba-0.012
ty-0.011
ve-0.011
Mo-0.011
Token worked
Token ID3111
Feature activation+3.27e-01

Pos logit contributions
 salon+0.010
ours+0.010
 supervision+0.010
 studio+0.010
ses+0.009
Neg logit contributions
ba-0.009
ty-0.009
ve-0.009
Mo-0.008
Ar-0.008
Token hard
Token ID1327
Feature activation+6.31e-01

Pos logit contributions
ours+0.006
 salon+0.006
 �+0.006
 supervision+0.006
 studio+0.006
Neg logit contributions
ba-0.011
Mo-0.010
ty-0.010
ve-0.010
rim-0.010
Token listen
Token ID6004
Feature activation+2.90e-01

Pos logit contributions
ba+0.029
ve+0.027
ty+0.027
Mo+0.026
Ar+0.024
Neg logit contributions
 salon-0.029
 supervision-0.029
ours-0.029
 studio-0.028
ses-0.026
Token.
Token ID13
Feature activation-8.30e-01

Pos logit contributions
ours+0.008
 salon+0.008
 supervision+0.008
 studio+0.007
ses+0.007
Neg logit contributions
ba-0.008
ty-0.008
ve-0.008
Mo-0.007
B-0.007
Token She
Token ID1375
Feature activation-2.92e-01

Pos logit contributions
 himself+0.016
E+0.015
B+0.015
Mo+0.014
ba+0.014
Neg logit contributions
 supervision-0.027
 salon-0.027
ours-0.025
 slot-0.025
 studio-0.024
Token threw
Token ID9617
Feature activation+4.61e-01

Pos logit contributions
ba+0.009
ve+0.009
Mo+0.008
ty+0.008
Mill+0.008
Neg logit contributions
 salon-0.007
ours-0.007
 studio-0.006
 supervision-0.006
ses-0.006
Token a
Token ID257
Feature activation-2.26e-01

Pos logit contributions
ours+0.016
 salon+0.016
 supervision+0.016
 studio+0.016
ses+0.015
Neg logit contributions
ty-0.009
ba-0.009
 himself-0.008
ve-0.008
Mo-0.008
Token tant
Token ID24246
Feature activation-3.43e+00

Pos logit contributions
ba+0.006
ve+0.005
ty+0.005
Mo+0.005
Ar+0.005
Neg logit contributions
ours-0.007
 supervision-0.007
 salon-0.007
 studio-0.007
ses-0.006
Tokenrum
Token ID6582
Feature activation-7.10e-01

Pos logit contributions
ba+0.095
Ar+0.094
ty+0.092
Mo+0.092
ve+0.092
Neg logit contributions
ours-0.084
 supervision-0.084
anged-0.080
 salon-0.075
apping-0.074
Token in
Token ID287
Feature activation-6.57e-01

Pos logit contributions
ba+0.009
ty+0.009
 himself+0.009
ve+0.009
B+0.006
Neg logit contributions
 supervision-0.031
ours-0.030
 salon-0.029
 studio-0.029
 Salad-0.028
Token the
Token ID262
Feature activation-2.99e-01

Pos logit contributions
ba+0.015
ty+0.015
ve+0.015
Mo+0.014
B+0.014
Neg logit contributions
ours-0.020
 salon-0.020
 supervision-0.019
 studio-0.019
ses-0.018
Token store
Token ID3650
Feature activation+7.88e-01

Pos logit contributions
ba+0.007
ty+0.006
ve+0.006
Mo+0.006
Ar+0.006
Neg logit contributions
 salon-0.010
ours-0.010
 supervision-0.010
 studio-0.009
ses-0.009
Token.
Token ID13
Feature activation-1.84e-01

Pos logit contributions
ours+0.036
 Salad+0.033
 Acc+0.032
ses+0.032
 supervision+0.031
Neg logit contributions
ty-0.008
ve-0.006
 himself-0.006
ba-0.006
 w-0.004
Token.
Token ID13
Feature activation+2.43e-01

Pos logit contributions
ba+0.027
ve+0.026
Mo+0.024
ty+0.024
Ar+0.023
Neg logit contributions
 salon-0.030
ours-0.029
 supervision-0.029
 studio-0.029
ses-0.026
Token\n
Token ID198
Feature activation-2.43e-01

Pos logit contributions
 salon+0.008
 supervision+0.008
ours+0.008
 studio+0.008
ses+0.008
Neg logit contributions
ba-0.005
ty-0.005
ve-0.005
Mo-0.005
B-0.004
Token\n
Token ID198
Feature activation-1.55e-01

Pos logit contributions
ba+0.008
ty+0.007
vell+0.007
ve+0.007
rim+0.007
Neg logit contributions
 salon-0.005
ours-0.005
 studio-0.005
 �-0.005
 slot-0.005
Token"
Token ID1
Feature activation-3.46e-01

Pos logit contributions
ba+0.004
ty+0.004
ve+0.004
Mo+0.003
Ar+0.003
Neg logit contributions
 salon-0.005
ours-0.005
 supervision-0.005
 studio-0.005
ses-0.004
TokenAll
Token ID3237
Feature activation+6.30e-01

Pos logit contributions
ve+0.010
ty+0.009
ba+0.009
cast+0.009
 himself+0.009
Neg logit contributions
 salon-0.009
 studio-0.008
 supervision-0.008
ours-0.008
bug-0.008
Tokenerg
Token ID6422
Feature activation-3.34e+00

Pos logit contributions
ses+0.020
 salon+0.020
 studio+0.020
ours+0.020
 supervision+0.019
Neg logit contributions
ba-0.015
ty-0.015
ve-0.013
Mo-0.012
 himself-0.012
Tokenies
Token ID444
Feature activation-5.83e-01

Pos logit contributions
ba+0.079
Ar+0.077
Mo+0.073
ve+0.072
B+0.070
Neg logit contributions
 supervision-0.102
ours-0.101
 salon-0.094
 mouths-0.092
 studio-0.091
Token!"
Token ID2474
Feature activation-8.56e-02

Pos logit contributions
ba+0.013
ty+0.012
ve+0.012
Mo+0.011
B+0.009
Neg logit contributions
ours-0.020
 salon-0.020
 supervision-0.020
 studio-0.019
ses-0.019
Token shouted
Token ID17293
Feature activation+2.68e-01

Pos logit contributions
ba+0.002
ty+0.002
ve+0.002
Mo+0.002
B+0.002
Neg logit contributions
 salon-0.002
ours-0.002
 supervision-0.002
 studio-0.002
ses-0.002
Token Anne
Token ID15397
Feature activation+1.09e+00

Pos logit contributions
ours+0.008
 supervision+0.008
 salon+0.008
 studio+0.007
ses+0.007
Neg logit contributions
ba-0.007
ty-0.007
ve-0.006
Mo-0.006
B-0.006
Token.
Token ID13
Feature activation+3.79e-01

Pos logit contributions
ours+0.040
 salon+0.038
 studio+0.037
 supervision+0.036
ses+0.036
Neg logit contributions
ba-0.025
ty-0.023
ve-0.022
Mo-0.019
 himself-0.019
Token.
Token ID13
Feature activation-8.19e-01

Pos logit contributions
 salon+0.005
ours+0.005
 supervision+0.005
 studio+0.005
ses+0.005
Neg logit contributions
ba-0.004
ty-0.004
ve-0.004
Mo-0.004
B-0.004
Token He
Token ID679
Feature activation-5.31e-01

Pos logit contributions
 himself+0.013
ba+0.008
Mo+0.008
E+0.008
Ar+0.007
Neg logit contributions
 salon-0.034
 supervision-0.034
ours-0.033
 studio-0.033
bug-0.031
Token kept
Token ID4030
Feature activation-3.05e-01

Pos logit contributions
ba+0.017
ve+0.016
ty+0.016
Mo+0.015
Mill+0.014
Neg logit contributions
 salon-0.013
ours-0.012
 supervision-0.012
 studio-0.012
ses-0.011
Token throwing
Token ID9644
Feature activation+6.64e-01

Pos logit contributions
ba+0.008
ty+0.007
ve+0.007
Mo+0.007
 himself+0.007
Neg logit contributions
ours-0.009
 salon-0.009
 studio-0.009
 supervision-0.009
ses-0.008
Token a
Token ID257
Feature activation-1.23e-01

Pos logit contributions
 salon+0.021
ours+0.021
 supervision+0.021
 studio+0.021
ses+0.019
Neg logit contributions
ba-0.015
ty-0.015
ve-0.014
Mo-0.014
 himself-0.013
Token tant
Token ID24246
Feature activation-3.30e+00

Pos logit contributions
ba+0.003
ve+0.003
ty+0.003
Mo+0.003
B+0.003
Neg logit contributions
ours-0.003
 supervision-0.003
 salon-0.003
 studio-0.003
ses-0.003
Tokenrum
Token ID6582
Feature activation-4.66e-01

Pos logit contributions
ba+0.094
Mo+0.091
Ar+0.089
ty+0.089
ve+0.089
Neg logit contributions
 supervision-0.079
ours-0.076
anged-0.072
apping-0.068
 salon-0.068
Token.
Token ID13
Feature activation-1.84e-01

Pos logit contributions
ba+0.008
 himself+0.008
ty+0.007
ve+0.007
Mo+0.006
Neg logit contributions
ours-0.018
 salon-0.018
 supervision-0.018
 studio-0.018
ses-0.017
Token His
Token ID2399
Feature activation-1.89e-02

Pos logit contributions
 himself+0.003
ba+0.003
Mo+0.003
Ar+0.003
ty+0.003
Neg logit contributions
 supervision-0.007
 salon-0.007
ours-0.007
 studio-0.007
bug-0.006
Token mom
Token ID1995
Feature activation+3.63e-01

Pos logit contributions
ba+0.000
 himself+0.000
ve+0.000
 cool+0.000
 t+0.000
Neg logit contributions
ours-0.001
 studio-0.001
 supervision-0.001
 Sharing-0.001
bug-0.001
Token was
Token ID373
Feature activation+3.96e-01

Pos logit contributions
ours+0.012
 supervision+0.012
 studio+0.011
 salon+0.011
ses+0.011
Neg logit contributions
ty-0.009
ba-0.009
ve-0.008
B-0.007
Mo-0.007
Token surprise
Token ID5975
Feature activation-2.78e-01

Pos logit contributions
ours+0.004
 supervision+0.004
 salon+0.004
 studio+0.003
anged+0.003
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.003
Ar-0.003
Token.
Token ID13
Feature activation-4.51e-01

Pos logit contributions
ty+0.005
ba+0.005
ve+0.005
 himself+0.005
Mo+0.004
Neg logit contributions
ours-0.010
 salon-0.010
 supervision-0.010
 studio-0.010
ses-0.010
Token He
Token ID679
Feature activation-3.78e-01

Pos logit contributions
ba+0.010
ty+0.010
ve+0.010
Mo+0.009
 himself+0.009
Neg logit contributions
 salon-0.015
ours-0.014
 supervision-0.014
 studio-0.014
ses-0.013
Token threw
Token ID9617
Feature activation+3.85e-01

Pos logit contributions
ba+0.011
ve+0.011
ty+0.011
Mo+0.010
B+0.010
Neg logit contributions
 salon-0.010
ours-0.010
 supervision-0.009
 studio-0.009
ses-0.009
Token a
Token ID257
Feature activation-2.55e-02

Pos logit contributions
 salon+0.013
 studio+0.013
ours+0.013
 supervision+0.013
ses+0.012
Neg logit contributions
ty-0.008
ba-0.008
 himself-0.007
ve-0.007
Mo-0.007
Token tant
Token ID24246
Feature activation-3.30e+00

Pos logit contributions
ba+0.001
ve+0.001
ty+0.001
Ar+0.001
Mo+0.001
Neg logit contributions
ours-0.001
 supervision-0.001
 salon-0.001
 studio-0.001
ses-0.001
Tokenrum
Token ID6582
Feature activation-5.45e-01

Pos logit contributions
ba+0.093
Mo+0.090
Ar+0.090
ty+0.089
ve+0.089
Neg logit contributions
 supervision-0.080
ours-0.076
anged-0.074
 salon-0.069
apping-0.067
Token.
Token ID13
Feature activation-5.65e-02

Pos logit contributions
ba+0.010
 himself+0.009
ty+0.009
ve+0.008
Mo+0.007
Neg logit contributions
 salon-0.022
 supervision-0.021
 studio-0.021
ours-0.021
 gust-0.019
Token He
Token ID679
Feature activation-2.47e-01

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
 himself+0.001
Mo+0.001
Neg logit contributions
 salon-0.002
 supervision-0.002
ours-0.002
 studio-0.002
ses-0.002
Token kicked
Token ID12165
Feature activation+3.64e-02

Pos logit contributions
ba+0.007
ty+0.007
ve+0.007
Mo+0.006
B+0.006
Neg logit contributions
 salon-0.007
ours-0.007
 supervision-0.006
 studio-0.006
ses-0.006
Token and
Token ID290
Feature activation+5.14e-02

Pos logit contributions
 salon+0.001
ours+0.001
 supervision+0.001
 studio+0.001
ses+0.001
Neg logit contributions
ba-0.001
ty-0.001
ve-0.001
Mo-0.001
Ar-0.001
Token case
Token ID1339
Feature activation+1.65e-01

Pos logit contributions
 salon+0.005
ours+0.005
 supervision+0.005
 studio+0.005
ses+0.004
Neg logit contributions
ba-0.008
ty-0.008
ve-0.008
Mo-0.007
B-0.007
Token!
Token ID0
Feature activation-1.48e-01

Pos logit contributions
ours+0.006
 supervision+0.005
 salon+0.005
ses+0.005
 studio+0.005
Neg logit contributions
ty-0.004
ba-0.004
ve-0.004
 himself-0.003
Mo-0.003
Token\n
Token ID198
Feature activation-4.70e-01

Pos logit contributions
ba+0.003
ve+0.003
ty+0.003
Mo+0.003
 himself+0.003
Neg logit contributions
 salon-0.005
ours-0.005
 supervision-0.005
 studio-0.005
ses-0.005
Token\n
Token ID198
Feature activation-3.79e-01

Pos logit contributions
ba+0.013
ty+0.012
ve+0.012
vell+0.012
Mo+0.012
Neg logit contributions
ours-0.012
 salon-0.012
 supervision-0.011
 studio-0.011
 �-0.011
TokenAll
Token ID3237
Feature activation-5.60e-01

Pos logit contributions
ba+0.009
ty+0.008
ve+0.008
Mo+0.008
Ar+0.007
Neg logit contributions
 salon-0.012
ours-0.012
 supervision-0.012
 studio-0.012
 slot-0.011
Token of
Token ID286
Feature activation-3.29e+00

Pos logit contributions
ba+0.013
ty+0.013
ve+0.012
Mo+0.011
 himself+0.011
Neg logit contributions
ses-0.018
 salon-0.018
 supervision-0.018
 studio-0.017
ours-0.017
Token a
Token ID257
Feature activation-1.54e+00

Pos logit contributions
ve+0.103
ba+0.103
vell+0.097
ty+0.092
Mo+0.088
Neg logit contributions
 salon-0.070
ours-0.065
 studio-0.064
 supervision-0.064
 �-0.061
Token sudden
Token ID4802
Feature activation-1.10e+00

Pos logit contributions
ve+0.024
ba+0.023
B+0.022
ty+0.021
Mo+0.021
Neg logit contributions
ours-0.067
 supervision-0.064
ses-0.063
 salon-0.061
 studio-0.057
Token,
Token ID11
Feature activation-3.74e-01

Pos logit contributions
ba+0.030
ty+0.029
ve+0.029
Mo+0.027
B+0.026
Neg logit contributions
 salon-0.031
ours-0.031
 supervision-0.030
 studio-0.029
ses-0.028
Token the
Token ID262
Feature activation-9.47e-01

Pos logit contributions
ba+0.010
ty+0.010
ve+0.009
Mo+0.009
B+0.008
Neg logit contributions
 salon-0.011
ours-0.011
 supervision-0.011
 studio-0.010
ses-0.010
Token old
Token ID1468
Feature activation-7.13e-01

Pos logit contributions
ba+0.024
ty+0.023
ve+0.023
Mo+0.021
 himself+0.021
Neg logit contributions
ours-0.029
 supervision-0.028
 salon-0.028
 studio-0.027
 mouths-0.026
Token stubborn
Token ID24444
Feature activation+1.04e-01

Pos logit contributions
 salon+0.007
 supervision+0.007
ours+0.007
 studio+0.007
anged+0.007
Neg logit contributions
ba-0.006
ty-0.005
ve-0.005
 himself-0.005
Mo-0.005
Token.
Token ID13
Feature activation-7.22e-01

Pos logit contributions
ours+0.004
 studio+0.004
 salon+0.004
 supervision+0.004
ses+0.004
Neg logit contributions
ty-0.001
ba-0.001
ve-0.001
 himself-0.001
ke-0.001
Token She
Token ID1375
Feature activation-1.14e-01

Pos logit contributions
ba+0.016
E+0.016
Mo+0.015
 himself+0.015
B+0.015
Neg logit contributions
 supervision-0.022
 salon-0.021
ours-0.021
anged-0.020
 studio-0.020
Token throws
Token ID12542
Feature activation+6.29e-01

Pos logit contributions
ba+0.004
ty+0.004
ve+0.004
Mo+0.004
Ar+0.003
Neg logit contributions
 supervision-0.002
 salon-0.002
ours-0.002
 studio-0.002
ses-0.002
Token a
Token ID257
Feature activation-2.35e-01

Pos logit contributions
ours+0.021
 salon+0.021
 supervision+0.021
 studio+0.021
ses+0.020
Neg logit contributions
ba-0.013
ty-0.013
ve-0.012
Mo-0.011
 himself-0.011
Token tant
Token ID24246
Feature activation-3.27e+00

Pos logit contributions
ba+0.006
ve+0.005
ty+0.005
B+0.005
Ar+0.005
Neg logit contributions
 supervision-0.007
ours-0.007
 salon-0.007
ses-0.007
 studio-0.007
Tokenrum
Token ID6582
Feature activation-4.34e-01

Pos logit contributions
ba+0.091
Ar+0.088
Mo+0.087
ve+0.086
ty+0.086
Neg logit contributions
 supervision-0.084
ours-0.080
 salon-0.075
anged-0.075
ses-0.070
Token.
Token ID13
Feature activation-3.17e-01

Pos logit contributions
ba+0.007
ty+0.007
 himself+0.006
ve+0.006
Mo+0.005
Neg logit contributions
 supervision-0.017
 salon-0.017
 studio-0.017
ours-0.017
 gust-0.016
Token She
Token ID1375
Feature activation+1.46e-01

Pos logit contributions
 himself+0.007
E+0.007
ba+0.007
Mo+0.007
B+0.007
Neg logit contributions
 supervision-0.010
 salon-0.009
ours-0.009
 studio-0.009
anged-0.008
Token kicks
Token ID17364
Feature activation+1.80e-01

Pos logit contributions
 supervision+0.003
 salon+0.003
 studio+0.003
ours+0.003
ses+0.003
Neg logit contributions
ba-0.005
ty-0.005
ve-0.005
Mo-0.004
Ar-0.004
Token and
Token ID290
Feature activation+1.51e-01

Pos logit contributions
 salon+0.005
ours+0.005
 supervision+0.005
 studio+0.005
ses+0.005
Neg logit contributions
ba-0.005
ty-0.004
ve-0.004
Mo-0.004
B-0.004
Token them
Token ID606
Feature activation-8.84e-01

Pos logit contributions
 supervision+0.022
 salon+0.022
ours+0.021
 studio+0.021
 mouths+0.020
Neg logit contributions
ba-0.018
ve-0.017
ty-0.017
Mo-0.016
Ar-0.015
Token and
Token ID290
Feature activation+4.22e-01

Pos logit contributions
ba+0.023
ty+0.021
ve+0.021
Mo+0.020
Ar+0.019
Neg logit contributions
 salon-0.026
ours-0.026
 supervision-0.026
 studio-0.025
ses-0.023
Token never
Token ID1239
Feature activation-9.15e-01

Pos logit contributions
ours+0.013
 salon+0.013
 studio+0.012
 supervision+0.012
ses+0.012
Neg logit contributions
ba-0.010
ty-0.010
 himself-0.009
ve-0.009
Mo-0.008
Token threw
Token ID9617
Feature activation+1.12e+00

Pos logit contributions
ba+0.019
ty+0.018
 himself+0.016
ve+0.016
 w+0.015
Neg logit contributions
ours-0.032
 studio-0.031
 salon-0.031
 slot-0.029
 supervision-0.029
Token a
Token ID257
Feature activation-2.00e-01

Pos logit contributions
 salon+0.040
ours+0.039
 supervision+0.039
 studio+0.038
ses+0.036
Neg logit contributions
ba-0.022
ty-0.021
ve-0.020
Mo-0.019
B-0.017
Token tant
Token ID24246
Feature activation-3.23e+00

Pos logit contributions
ba+0.005
ve+0.005
ty+0.005
Mo+0.005
B+0.005
Neg logit contributions
ours-0.006
 supervision-0.006
 salon-0.006
 studio-0.005
ses-0.005
Tokenrum
Token ID6582
Feature activation-8.97e-01

Pos logit contributions
ba+0.090
Ar+0.085
Mo+0.085
ty+0.084
ve+0.082
Neg logit contributions
 supervision-0.089
ours-0.082
 salon-0.080
anged-0.078
 studio-0.073
Token again
Token ID757
Feature activation-6.68e-01

Pos logit contributions
ba+0.018
ty+0.018
ve+0.016
 himself+0.015
Mo+0.014
Neg logit contributions
ours-0.032
 salon-0.032
 supervision-0.032
 studio-0.031
ses-0.030
Token.
Token ID13
Feature activation+2.29e-01

Pos logit contributions
ba+0.017
ty+0.016
ve+0.016
Mo+0.015
Ar+0.014
Neg logit contributions
 salon-0.020
ours-0.020
 supervision-0.020
 studio-0.019
ses-0.018
Token<|endoftext|>
Token ID50256
Feature activation-8.01e-01

Pos logit contributions
 supervision+0.008
 salon+0.008
ours+0.008
 studio+0.008
ses+0.007
Neg logit contributions
ba-0.004
ty-0.004
Mo-0.004
ve-0.004
B-0.004
TokenOnce
Token ID7454
Feature activation-6.50e-01

Pos logit contributions
ba+0.023
ve+0.021
ty+0.021
Mo+0.020
vell+0.019
Neg logit contributions
ours-0.021
 salon-0.020
 studio-0.020
 supervision-0.019
 Sharing-0.019
Token mom
Token ID1995
Feature activation-8.16e-01

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Ar+0.001
Mo+0.001
Neg logit contributions
 supervision-0.002
ours-0.001
 salon-0.001
 slot-0.001
 studio-0.001
Tokenm
Token ID76
Feature activation-1.86e+00

Pos logit contributions
ba+0.021
ve+0.020
ty+0.019
Mo+0.019
Ar+0.018
Neg logit contributions
 salon-0.024
ours-0.024
 supervision-0.023
 studio-0.023
 mouths-0.021
Tokenies
Token ID444
Feature activation-4.60e-01

Pos logit contributions
ba+0.053
ve+0.051
ty+0.050
Mo+0.047
B+0.044
Neg logit contributions
 salon-0.050
 supervision-0.050
ours-0.049
 studio-0.047
ses-0.045
Token and
Token ID290
Feature activation+4.17e-01

Pos logit contributions
ba+0.012
ty+0.011
ve+0.011
Mo+0.011
B+0.010
Neg logit contributions
 salon-0.013
ours-0.013
 supervision-0.013
 studio-0.013
ses-0.012
Token d
Token ID288
Feature activation-3.58e-01

Pos logit contributions
ours+0.011
 salon+0.011
 slot+0.010
 Sharing+0.010
 studio+0.010
Neg logit contributions
ba-0.012
ty-0.011
Mo-0.010
 himself-0.010
ve-0.010
Tokenadd
Token ID2860
Feature activation-3.21e+00

Pos logit contributions
ba+0.011
ty+0.010
ve+0.010
Mo+0.010
Ar+0.010
Neg logit contributions
ours-0.009
 salon-0.009
 supervision-0.009
 studio-0.008
ses-0.008
Tokenies
Token ID444
Feature activation-4.79e-01

Pos logit contributions
ba+0.059
ve+0.055
ty+0.055
Mo+0.052
Ar+0.047
Neg logit contributions
ours-0.116
 salon-0.116
 supervision-0.115
 studio-0.111
ses-0.106
Token said
Token ID531
Feature activation+4.84e-01

Pos logit contributions
ba+0.011
ty+0.011
ve+0.010
Mo+0.009
B+0.008
Neg logit contributions
ours-0.016
 salon-0.016
 supervision-0.016
 studio-0.015
ses-0.015
Token,
Token ID11
Feature activation-5.47e-01

Pos logit contributions
 salon+0.015
 supervision+0.015
ours+0.015
 studio+0.015
ses+0.015
Neg logit contributions
ba-0.011
ve-0.011
ty-0.010
 himself-0.010
Mo-0.009
Token "
Token ID366
Feature activation+6.63e-02

Pos logit contributions
ba+0.019
ve+0.019
ty+0.019
Mo+0.018
B+0.017
Neg logit contributions
 salon-0.011
ours-0.011
 supervision-0.011
 studio-0.010
ses-0.010
TokenToday
Token ID8888
Feature activation-1.35e-01

Pos logit contributions
 salon+0.002
ours+0.002
 supervision+0.002
 studio+0.002
ses+0.002
Neg logit contributions
ba-0.002
ty-0.002
ve-0.002
Mo-0.002
 himself-0.002
Token bed
Token ID3996
Feature activation-1.10e+00

Pos logit contributions
 supervision+0.003
 salon+0.003
ours+0.002
 studio+0.002
 mouths+0.002
Neg logit contributions
ba-0.002
ve-0.002
ty-0.002
Mo-0.002
Ar-0.002
Token wearing
Token ID5762
Feature activation-9.22e-01

Pos logit contributions
ba+0.027
ty+0.026
ve+0.025
Mo+0.023
B+0.022
Neg logit contributions
ours-0.034
 salon-0.034
 supervision-0.033
 studio-0.032
ses-0.031
Token his
Token ID465
Feature activation-1.23e+00

Pos logit contributions
ba+0.022
ve+0.020
Mo+0.018
ty+0.018
Ar+0.017
Neg logit contributions
 studio-0.030
 salon-0.029
 supervision-0.029
ours-0.029
 mouths-0.028
Token favourite
Token ID12507
Feature activation-2.13e+00

Pos logit contributions
ba+0.037
Mo+0.034
ty+0.034
ve+0.033
Mill+0.032
Neg logit contributions
 salon-0.032
 studio-0.032
 supervision-0.031
 mouths-0.031
ours-0.030
Token py
Token ID12972
Feature activation-1.01e+00

Pos logit contributions
ba+0.065
Mo+0.061
ve+0.061
ty+0.058
Ar+0.056
Neg logit contributions
 salon-0.054
 studio-0.054
ours-0.051
 mouths-0.050
 supervision-0.048
Tokenj
Token ID73
Feature activation-3.18e+00

Pos logit contributions
ba+0.025
ve+0.023
Mo+0.022
ty+0.022
 himself+0.021
Neg logit contributions
 salon-0.029
ours-0.028
 studio-0.027
 �-0.025
 square-0.025
Tokenamas
Token ID17485
Feature activation-5.58e-01

Pos logit contributions
ba+0.087
Mo+0.082
ve+0.080
Ar+0.080
ty+0.080
Neg logit contributions
 salon-0.090
ours-0.088
 studio-0.085
 supervision-0.082
 mouths-0.075
Token,
Token ID11
Feature activation-1.79e-02

Pos logit contributions
ba+0.014
ty+0.013
ve+0.013
Mo+0.012
Ar+0.011
Neg logit contributions
 salon-0.017
ours-0.017
 supervision-0.017
 studio-0.016
ses-0.015
Token feeling
Token ID4203
Feature activation+8.69e-01

Pos logit contributions
ba+0.000
ve+0.000
ty+0.000
Mo+0.000
Ar+0.000
Neg logit contributions
 salon-0.001
 supervision-0.001
ours-0.001
 studio-0.001
 mouths-0.000
Token so
Token ID523
Feature activation-7.02e-02

Pos logit contributions
ours+0.026
 salon+0.026
 supervision+0.026
 studio+0.025
 mouths+0.023
Neg logit contributions
ba-0.022
ty-0.021
ve-0.021
Mo-0.020
Ar-0.018
Token proud
Token ID6613
Feature activation+1.17e+00

Pos logit contributions
ba+0.002
ty+0.002
ve+0.002
Mo+0.002
B+0.002
Neg logit contributions
ours-0.002
 supervision-0.002
 salon-0.002
 studio-0.002
 mouths-0.002
Token.
Token ID13
Feature activation+2.09e-01

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Mo+0.001
Ar+0.001
Neg logit contributions
 salon-0.001
ours-0.001
 supervision-0.001
 studio-0.001
ses-0.001
Token
Token ID220
Feature activation+1.40e-01

Pos logit contributions
 salon+0.007
 supervision+0.007
ses+0.006
ours+0.006
 studio+0.006
Neg logit contributions
 himself-0.004
B-0.004
ty-0.004
ve-0.004
Mo-0.004
Token
Token ID198
Feature activation+9.36e-02

Pos logit contributions
ours+0.004
 supervision+0.004
 salon+0.004
 studio+0.004
 Sharing+0.004
Neg logit contributions
ba-0.004
ty-0.003
ve-0.003
Mo-0.003
Ar-0.003
Token
Token ID198
Feature activation+6.69e-02

Pos logit contributions
ours+0.002
 studio+0.002
 salon+0.002
 Sharing+0.002
 �+0.002
Neg logit contributions
ba-0.003
ty-0.003
ve-0.002
Mo-0.002
vell-0.002
TokenAll
Token ID3237
Feature activation-3.36e-01

Pos logit contributions
 salon+0.002
ours+0.002
 supervision+0.002
 studio+0.002
 mouths+0.002
Neg logit contributions
ba-0.002
ty-0.001
ve-0.001
Mo-0.001
Ar-0.001
Token of
Token ID286
Feature activation-3.12e+00

Pos logit contributions
ty+0.008
ba+0.008
ve+0.007
 himself+0.007
B+0.006
Neg logit contributions
 salon-0.012
 supervision-0.011
ses-0.011
 studio-0.011
ours-0.010
Token his
Token ID465
Feature activation-1.51e+00

Pos logit contributions
ba+0.107
ve+0.103
vell+0.097
ty+0.091
Mo+0.090
Neg logit contributions
 studio-0.058
 Lisa-0.057
ours-0.057
 teamwork-0.057
 square-0.056
Token friends
Token ID2460
Feature activation-5.49e-01

Pos logit contributions
ba+0.025
ve+0.024
ty+0.024
Mo+0.021
B+0.020
Neg logit contributions
ours-0.058
 salon-0.058
 supervision-0.057
 studio-0.055
 slot-0.053
Token were
Token ID547
Feature activation-6.23e-01

Pos logit contributions
ba+0.012
ty+0.012
ve+0.012
Mo+0.010
B+0.010
Neg logit contributions
ours-0.019
 supervision-0.019
ses-0.018
 salon-0.018
 studio-0.017
Token happy
Token ID3772
Feature activation+4.84e-01

Pos logit contributions
ba+0.014
ty+0.014
ve+0.013
Mo+0.013
 himself+0.012
Neg logit contributions
ours-0.020
 salon-0.020
 supervision-0.020
 studio-0.019
ses-0.019
Token because
Token ID780
Feature activation+1.94e-01

Pos logit contributions
 supervision+0.016
 salon+0.016
ours+0.016
 studio+0.015
 slot+0.014
Neg logit contributions
ba-0.012
ty-0.011
ve-0.011
 himself-0.010
Mo-0.010
Token his
Token ID465
Feature activation-7.77e-02

Pos logit contributions
ours+0.009
 supervision+0.008
 salon+0.008
 studio+0.008
 mouths+0.008
Neg logit contributions
ve-0.005
ba-0.005
ty-0.004
Mo-0.004
Mill-0.004
Token feet
Token ID3625
Feature activation-5.15e-01

Pos logit contributions
ba+0.002
ve+0.002
ty+0.002
 himself+0.002
Mo+0.002
Neg logit contributions
ours-0.002
 salon-0.002
 studio-0.002
 supervision-0.002
 Sharing-0.002
Token and
Token ID290
Feature activation+2.10e-01

Pos logit contributions
ba+0.011
ty+0.010
ve+0.010
Mo+0.009
B+0.008
Neg logit contributions
 salon-0.018
ours-0.018
 supervision-0.018
 studio-0.017
ses-0.016
Token threw
Token ID9617
Feature activation+3.09e-01

Pos logit contributions
ours+0.007
 salon+0.007
 studio+0.007
 supervision+0.007
 Salad+0.006
Neg logit contributions
ba-0.005
ty-0.004
ve-0.004
Mo-0.004
 himself-0.004
Token a
Token ID257
Feature activation-1.31e-01

Pos logit contributions
 salon+0.012
ours+0.011
 supervision+0.011
 studio+0.011
ses+0.011
Neg logit contributions
ba-0.006
ty-0.005
ve-0.005
Mo-0.005
 himself-0.004
Token tant
Token ID24246
Feature activation-3.11e+00

Pos logit contributions
ba+0.003
ve+0.003
ty+0.003
Mo+0.003
B+0.003
Neg logit contributions
ours-0.004
 salon-0.004
 supervision-0.004
 studio-0.004
ses-0.003
Tokenrum
Token ID6582
Feature activation-5.25e-01

Pos logit contributions
ba+0.090
Ar+0.089
Mo+0.086
 cool+0.085
ve+0.084
Neg logit contributions
 supervision-0.077
anged-0.068
ours-0.067
 salon-0.064
ses-0.062
Token.
Token ID13
Feature activation+4.68e-02

Pos logit contributions
ba+0.010
ty+0.009
 himself+0.009
ve+0.009
Mo+0.008
Neg logit contributions
 salon-0.020
 studio-0.020
 supervision-0.020
ours-0.019
ses-0.018
Token
Token ID198
Feature activation-7.99e-02

Pos logit contributions
 supervision+0.002
 salon+0.002
 studio+0.002
ours+0.002
ses+0.001
Neg logit contributions
ba-0.001
 himself-0.001
Mo-0.001
Ar-0.001
B-0.001
Token
Token ID198
Feature activation-1.75e-01

Pos logit contributions
ba+0.002
ty+0.002
ve+0.002
Mo+0.002
vell+0.002
Neg logit contributions
ours-0.002
 salon-0.002
 supervision-0.002
 studio-0.002
 slot-0.002
Token"
Token ID1
Feature activation-4.48e-01

Pos logit contributions
ba+0.004
ty+0.004
ve+0.004
Mo+0.004
B+0.004
Neg logit contributions
 salon-0.005
ours-0.005
 supervision-0.005
 studio-0.005
ses-0.005
Tokenmy
Token ID1820
Feature activation+5.78e-01

Pos logit contributions
 salon+0.005
 supervision+0.005
ours+0.005
 studio+0.005
ses+0.005
Neg logit contributions
ba-0.004
ve-0.004
ty-0.004
Mo-0.004
Ar-0.003
Token.
Token ID13
Feature activation+6.66e-01

Pos logit contributions
 salon+0.024
ours+0.023
 studio+0.023
 supervision+0.022
ses+0.021
Neg logit contributions
ty-0.010
ba-0.010
ve-0.009
Mo-0.007
B-0.007
Token What
Token ID1867
Feature activation+1.09e+00

Pos logit contributions
 salon+0.017
 supervision+0.015
ours+0.015
 studio+0.015
anged+0.013
Neg logit contributions
ba-0.021
Mo-0.020
 himself-0.019
ty-0.019
B-0.019
Token was
Token ID373
Feature activation-6.14e-01

Pos logit contributions
 salon+0.029
ours+0.028
 supervision+0.028
 studio+0.027
ses+0.026
Neg logit contributions
ba-0.032
ty-0.031
ve-0.030
Mo-0.028
B-0.028
Token your
Token ID534
Feature activation+2.47e-01

Pos logit contributions
ba+0.019
ty+0.018
ve+0.018
Mo+0.017
Ar+0.016
Neg logit contributions
 salon-0.015
ours-0.015
 supervision-0.015
 studio-0.014
ses-0.014
Token nightmare
Token ID17123
Feature activation-3.07e+00

Pos logit contributions
 salon+0.006
 supervision+0.006
 studio+0.006
ses+0.005
ours+0.005
Neg logit contributions
ba-0.008
ve-0.008
ty-0.008
Mo-0.007
B-0.007
Token about
Token ID546
Feature activation+2.46e-01

Pos logit contributions
ba+0.065
Mo+0.063
ve+0.061
Ar+0.059
ty+0.058
Neg logit contributions
 salon-0.104
 supervision-0.102
ours-0.101
 studio-0.100
ses-0.092
Token?"
Token ID1701
Feature activation+3.07e-01

Pos logit contributions
 salon+0.008
ours+0.008
 studio+0.008
 supervision+0.008
ses+0.007
Neg logit contributions
ba-0.005
ve-0.005
Mo-0.005
ty-0.005
Ar-0.005
Token his
Token ID465
Feature activation+2.80e-01

Pos logit contributions
 salon+0.009
ours+0.009
 supervision+0.009
 studio+0.009
ses+0.008
Neg logit contributions
ba-0.008
ty-0.007
ve-0.007
Mo-0.007
B-0.006
Token mom
Token ID1995
Feature activation-8.52e-01

Pos logit contributions
 salon+0.005
ours+0.004
 supervision+0.004
 studio+0.004
ses+0.004
Neg logit contributions
ba-0.011
ve-0.011
ty-0.011
Mo-0.010
Ar-0.010
Token asked
Token ID1965
Feature activation-8.85e-02

Pos logit contributions
ba+0.019
ve+0.019
Ar+0.018
Mo+0.018
ty+0.018
Neg logit contributions
 salon-0.026
 supervision-0.026
ours-0.026
 studio-0.025
bug-0.023
Tokenebra
Token ID37052
Feature activation-1.60e+00

Pos logit contributions
 supervision+0.001
ours+0.001
 salon+0.001
 studio+0.001
 mouths+0.001
Neg logit contributions
ba-0.001
ve-0.001
ty-0.001
Mo-0.001
ke-0.001
Token,
Token ID11
Feature activation-2.17e-01

Pos logit contributions
ba+0.032
ty+0.030
ve+0.028
Mo+0.025
B+0.023
Neg logit contributions
 supervision-0.058
ours-0.058
 salon-0.056
 studio-0.054
 mouths-0.053
Token or
Token ID393
Feature activation+8.75e-01

Pos logit contributions
ba+0.005
ty+0.005
ve+0.005
Mo+0.005
Ar+0.004
Neg logit contributions
 salon-0.007
ours-0.007
 supervision-0.007
 studio-0.007
ses-0.006
Token a
Token ID257
Feature activation-1.73e-02

Pos logit contributions
 salon+0.030
ours+0.030
 supervision+0.030
 studio+0.029
ses+0.028
Neg logit contributions
ba-0.018
ty-0.017
ve-0.016
Mo-0.016
B-0.014
Token peac
Token ID41961
Feature activation-1.22e+00

Pos logit contributions
ba+0.000
ty+0.000
ve+0.000
Mo+0.000
Ar+0.000
Neg logit contributions
ours-0.001
 supervision-0.001
 salon-0.000
 studio-0.000
anged-0.000
Tokenock
Token ID735
Feature activation-3.06e+00

Pos logit contributions
ba+0.022
Mo+0.021
ve+0.021
Ar+0.021
 himself+0.019
Neg logit contributions
 salon-0.045
 studio-0.042
ours-0.042
 supervision-0.042
bug-0.040
Token today
Token ID1909
Feature activation-6.31e-01

Pos logit contributions
ba+0.057
ve+0.055
ty+0.054
Mo+0.053
Ar+0.048
Neg logit contributions
 salon-0.110
ours-0.108
 supervision-0.106
 studio-0.105
ses-0.099
Token?"
Token ID1701
Feature activation+7.85e-01

Pos logit contributions
ba+0.010
ty+0.009
ve+0.008
Mo+0.007
 himself+0.007
Neg logit contributions
 supervision-0.025
ours-0.025
 salon-0.025
ses-0.024
 studio-0.024
Token L
Token ID406
Feature activation-1.17e+00

Pos logit contributions
 supervision+0.024
ours+0.024
 salon+0.024
 studio+0.023
ses+0.022
Neg logit contributions
ba-0.020
ty-0.018
ve-0.018
Mo-0.017
B-0.017
Tokenila
Token ID10102
Feature activation-1.41e-01

Pos logit contributions
ba+0.038
ty+0.038
ve+0.037
Mo+0.037
Ar+0.036
Neg logit contributions
 salon-0.027
ours-0.027
 studio-0.024
 supervision-0.024
bug-0.021
Token was
Token ID373
Feature activation+2.25e-01

Pos logit contributions
ba+0.004
ty+0.004
ve+0.004
Mo+0.004
Ar+0.004
Neg logit contributions
 salon-0.004
ours-0.004
 supervision-0.004
 studio-0.004
ses-0.003
Token sad
Token ID6507
Feature activation-3.02e-01

Pos logit contributions
ours+0.014
 salon+0.014
 studio+0.014
 supervision+0.014
 slot+0.013
Neg logit contributions
ty-0.008
ve-0.008
ba-0.008
 himself-0.008
 almost-0.007
Token.
Token ID13
Feature activation+1.84e-01

Pos logit contributions
ty+0.006
ba+0.006
ve+0.006
 himself+0.005
Mo+0.005
Neg logit contributions
ours-0.011
 salon-0.011
 supervision-0.011
 studio-0.010
ses-0.010
Token She
Token ID1375
Feature activation-4.15e-02

Pos logit contributions
 supervision+0.006
ours+0.006
 salon+0.006
 studio+0.006
ses+0.006
Neg logit contributions
ty-0.003
 himself-0.003
ve-0.003
Mo-0.003
ba-0.003
Token threw
Token ID9617
Feature activation+3.56e-01

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Mo+0.001
Ar+0.001
Neg logit contributions
 salon-0.001
ours-0.001
 supervision-0.001
 studio-0.001
ses-0.001
Token a
Token ID257
Feature activation-2.38e-01

Pos logit contributions
ours+0.012
 salon+0.012
 supervision+0.011
 studio+0.011
ses+0.011
Neg logit contributions
ty-0.008
ba-0.008
ve-0.007
Mo-0.007
 himself-0.007
Token tant
Token ID24246
Feature activation-3.04e+00

Pos logit contributions
ba+0.006
ve+0.006
ty+0.005
Mo+0.005
B+0.005
Neg logit contributions
ours-0.008
 supervision-0.007
 salon-0.007
 studio-0.007
ses-0.007
Tokenrum
Token ID6582
Feature activation-3.72e-01

Pos logit contributions
ba+0.088
Ar+0.086
ve+0.083
ty+0.082
Mo+0.081
Neg logit contributions
 supervision-0.072
ours-0.070
anged-0.069
 salon-0.063
apping-0.062
Token and
Token ID290
Feature activation-1.50e-01

Pos logit contributions
ba+0.005
 himself+0.004
ve+0.004
ty+0.004
Mo+0.003
Neg logit contributions
 supervision-0.016
ours-0.016
 salon-0.016
 studio-0.016
 Salad-0.015
Token cried
Token ID16896
Feature activation+5.76e-02

Pos logit contributions
ty+0.003
ba+0.003
 himself+0.003
ve+0.003
B+0.003
Neg logit contributions
ours-0.005
 studio-0.005
 salon-0.005
 supervision-0.005
ses-0.005
Token loudly
Token ID23112
Feature activation-2.51e-01

Pos logit contributions
 salon+0.002
ours+0.002
 supervision+0.002
 studio+0.002
ses+0.002
Neg logit contributions
ba-0.001
ty-0.001
ve-0.001
Mo-0.001
B-0.001
Token.
Token ID13
Feature activation+1.76e-01

Pos logit contributions
ty+0.005
ve+0.005
ba+0.005
 himself+0.004
Mo+0.004
Neg logit contributions
ours-0.009
 salon-0.009
 supervision-0.008
 studio-0.008
 mouths-0.008
Token.
Token ID13
Feature activation+6.36e-01

Pos logit contributions
 salon+0.004
ours+0.004
 supervision+0.004
 studio+0.004
ses+0.003
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.003
B-0.003
Token
Token ID220
Feature activation+6.30e-01

Pos logit contributions
 salon+0.022
 supervision+0.021
 studio+0.021
ours+0.020
ses+0.019
Neg logit contributions
 himself-0.013
ba-0.012
ty-0.012
Mo-0.012
ve-0.012
Token
Token ID198
Feature activation+3.89e-01

Pos logit contributions
ours+0.019
 supervision+0.019
 salon+0.018
 studio+0.018
ses+0.017
Neg logit contributions
ba-0.016
ty-0.015
ve-0.015
Mo-0.014
B-0.013
Token
Token ID198
Feature activation+3.45e-01

Pos logit contributions
ours+0.009
 �+0.008
 salon+0.008
 Sharing+0.008
 studio+0.008
Neg logit contributions
ba-0.012
ty-0.011
ve-0.011
vell-0.011
Mo-0.011
TokenAll
Token ID3237
Feature activation-3.00e-01

Pos logit contributions
 salon+0.010
ours+0.010
 studio+0.010
 supervision+0.010
 Sharing+0.009
Neg logit contributions
ba-0.008
ty-0.008
ve-0.008
Mo-0.007
 himself-0.006
Token of
Token ID286
Feature activation-3.03e+00

Pos logit contributions
ba+0.007
ty+0.006
Mo+0.006
 himself+0.006
ve+0.005
Neg logit contributions
 salon-0.011
 supervision-0.010
ses-0.010
 studio-0.010
ours-0.010
Token the
Token ID262
Feature activation-1.05e+00

Pos logit contributions
ba+0.097
ve+0.094
ty+0.089
vell+0.087
Mo+0.083
Neg logit contributions
 studio-0.060
 salon-0.059
 gust-0.059
 teamwork-0.058
ours-0.057
Token roads
Token ID9725
Feature activation-5.30e-01

Pos logit contributions
ba+0.025
ty+0.024
ve+0.024
Mo+0.023
Ar+0.021
Neg logit contributions
ours-0.033
 salon-0.033
 supervision-0.032
 studio-0.031
ses-0.030
Token and
Token ID290
Feature activation-1.39e+00

Pos logit contributions
ba+0.012
ty+0.012
ve+0.012
 himself+0.010
Mo+0.010
Neg logit contributions
ours-0.017
 supervision-0.017
 salon-0.017
 studio-0.017
ses-0.016
Token cars
Token ID5006
Feature activation-6.45e-01

Pos logit contributions
ba+0.033
ty+0.031
ve+0.030
Mo+0.029
Ar+0.028
Neg logit contributions
ours-0.045
 salon-0.045
 supervision-0.044
 studio-0.042
ses-0.042
Token continued
Token ID3767
Feature activation-1.35e-01

Pos logit contributions
ba+0.017
ty+0.016
ve+0.016
Mo+0.014
Ar+0.014
Neg logit contributions
ours-0.019
 salon-0.019
 supervision-0.019
 studio-0.018
ses-0.018
Token anyone
Token ID2687
Feature activation-8.86e-01

Pos logit contributions
ours+0.033
 salon+0.033
 supervision+0.032
 studio+0.031
ses+0.030
Neg logit contributions
ba-0.027
ty-0.026
ve-0.024
Mo-0.024
 himself-0.022
Token.
Token ID13
Feature activation-2.22e-01

Pos logit contributions
ba+0.019
ty+0.019
ve+0.018
Mo+0.016
 himself+0.015
Neg logit contributions
ours-0.030
 salon-0.030
 supervision-0.029
 studio-0.029
ses-0.029
Token Where
Token ID6350
Feature activation+9.56e-02

Pos logit contributions
ve+0.005
Mo+0.005
 himself+0.005
Mill+0.005
ba+0.005
Neg logit contributions
 salon-0.007
 supervision-0.007
ours-0.007
 pitcher-0.006
anged-0.006
Token was
Token ID373
Feature activation-2.52e-01

Pos logit contributions
ours+0.002
 salon+0.002
 supervision+0.002
 studio+0.002
ses+0.002
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.003
Ar-0.003
Token the
Token ID262
Feature activation+4.72e-01

Pos logit contributions
ba+0.006
ty+0.006
ve+0.006
Mo+0.005
B+0.005
Neg logit contributions
ours-0.008
 salon-0.008
 supervision-0.008
 studio-0.008
ses-0.007
Token voice
Token ID3809
Feature activation-3.03e+00

Pos logit contributions
 salon+0.015
ours+0.014
 supervision+0.014
 studio+0.014
ses+0.013
Neg logit contributions
ba-0.012
ty-0.011
ve-0.011
Mo-0.010
B-0.010
Token coming
Token ID2406
Feature activation+5.70e-01

Pos logit contributions
ba+0.077
ve+0.073
ty+0.072
Mo+0.071
Ar+0.067
Neg logit contributions
 salon-0.090
ours-0.089
 supervision-0.088
 studio-0.086
 mouths-0.078
Token from
Token ID422
Feature activation-7.34e-01

Pos logit contributions
 salon+0.016
 studio+0.016
ours+0.015
ap+0.015
 �+0.015
Neg logit contributions
ve-0.014
rob-0.013
ba-0.013
Ar-0.012
ty-0.012
Token?
Token ID30
Feature activation+4.83e-02

Pos logit contributions
ba+0.020
ve+0.019
ty+0.018
Mo+0.017
B+0.017
Neg logit contributions
 salon-0.021
 studio-0.020
ours-0.020
 supervision-0.020
ses-0.019
Token
Token ID198
Feature activation-1.01e-01

Pos logit contributions
ours+0.002
 salon+0.002
 supervision+0.002
 studio+0.001
ses+0.001
Neg logit contributions
ba-0.001
ve-0.001
Mo-0.001
ty-0.001
B-0.001
Token
Token ID198
Feature activation-5.28e-02

Pos logit contributions
ba+0.002
ty+0.002
ve+0.002
Mo+0.002
Ar+0.002
Neg logit contributions
 salon-0.003
ours-0.003
 supervision-0.003
 studio-0.003
ses-0.003
Token mom
Token ID1995
Feature activation-1.23e+00

Pos logit contributions
ours+0.001
 salon+0.001
 studio+0.001
 supervision+0.001
 Sharing+0.001
Neg logit contributions
ba-0.001
ve-0.001
ty-0.001
Mo-0.001
 himself-0.001
Tokenm
Token ID76
Feature activation-1.53e+00

Pos logit contributions
ba+0.027
ty+0.027
ve+0.025
Mo+0.023
 himself+0.021
Neg logit contributions
ours-0.041
 salon-0.041
 studio-0.040
 supervision-0.040
ses-0.039
Tokenies
Token ID444
Feature activation-7.21e-01

Pos logit contributions
ba+0.044
ty+0.042
ve+0.041
Mo+0.040
Ar+0.038
Neg logit contributions
ours-0.041
 salon-0.041
 supervision-0.040
 studio-0.039
ses-0.036
Token and
Token ID290
Feature activation+5.39e-02

Pos logit contributions
ba+0.016
ty+0.016
ve+0.015
Mo+0.014
 himself+0.013
Neg logit contributions
ours-0.024
 salon-0.024
 supervision-0.023
 studio-0.022
ses-0.022
Token d
Token ID288
Feature activation-3.87e-01

Pos logit contributions
ours+0.002
 salon+0.002
 studio+0.001
 supervision+0.001
ses+0.001
Neg logit contributions
ba-0.001
ty-0.001
ve-0.001
Mo-0.001
 himself-0.001
Tokenadd
Token ID2860
Feature activation-3.01e+00

Pos logit contributions
ba+0.011
ty+0.011
ve+0.010
Mo+0.010
Ar+0.010
Neg logit contributions
ours-0.010
 salon-0.010
 supervision-0.009
 studio-0.009
anged-0.009
Tokenies
Token ID444
Feature activation-4.61e-01

Pos logit contributions
ba+0.054
ve+0.050
ty+0.049
Mo+0.048
Ar+0.044
Neg logit contributions
ours-0.109
 supervision-0.109
 salon-0.108
 studio-0.104
 mouths-0.100
Token.
Token ID13
Feature activation-1.40e-01

Pos logit contributions
ba+0.010
ty+0.010
ve+0.009
Mo+0.008
B+0.008
Neg logit contributions
ours-0.016
 salon-0.016
 supervision-0.015
 studio-0.015
ses-0.015
Token They
Token ID1119
Feature activation-9.35e-01

Pos logit contributions
ba+0.004
ve+0.003
ty+0.003
Mo+0.003
Ar+0.003
Neg logit contributions
 supervision-0.004
 salon-0.004
ours-0.004
 studio-0.004
ses-0.004
Token were
Token ID547
Feature activation-5.94e-01

Pos logit contributions
ba+0.024
ty+0.023
ve+0.023
Mo+0.022
B+0.021
Neg logit contributions
 salon-0.027
ours-0.027
 supervision-0.026
 studio-0.026
ses-0.024
Token all
Token ID477
Feature activation-4.29e-01

Pos logit contributions
ba+0.014
ty+0.013
ve+0.013
Mo+0.012
B+0.011
Neg logit contributions
 salon-0.019
ours-0.019
 supervision-0.019
 studio-0.018
ses-0.017
Token
Token ID26391
Feature activation-1.33e-01

Pos logit contributions
ba+0.019
ty+0.019
ve+0.018
Mo+0.017
B+0.016
Neg logit contributions
ours-0.024
 salon-0.023
 supervision-0.023
 studio-0.022
 mouths-0.022
Token
Token ID220
Feature activation+4.17e-01

Pos logit contributions
ve+0.003
ba+0.003
ty+0.003
Mo+0.003
Ar+0.002
Neg logit contributions
ours-0.004
 salon-0.004
 supervision-0.004
 studio-0.004
ses-0.004
Token
Token ID198
Feature activation-3.69e-02

Pos logit contributions
ours+0.011
 salon+0.011
 supervision+0.011
 studio+0.010
 Sharing+0.010
Neg logit contributions
ba-0.012
ty-0.012
ve-0.011
Mo-0.011
Ar-0.010
Token
Token ID198
Feature activation+1.74e-02

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
vell+0.001
Mo+0.001
Neg logit contributions
 salon-0.001
ours-0.001
 �-0.001
 studio-0.001
 Sharing-0.001
TokenAll
Token ID3237
Feature activation-2.44e-01

Pos logit contributions
 salon+0.001
 studio+0.000
ours+0.000
 Sharing+0.000
 supervision+0.000
Neg logit contributions
ba-0.000
ve-0.000
ty-0.000
Mo-0.000
vell-0.000
Token of
Token ID286
Feature activation-3.01e+00

Pos logit contributions
ba+0.006
ty+0.005
ve+0.005
 himself+0.005
Mo+0.004
Neg logit contributions
ses-0.008
 supervision-0.008
 salon-0.008
 studio-0.008
sing-0.008
Token his
Token ID465
Feature activation-1.30e+00

Pos logit contributions
ve+0.095
ba+0.092
vell+0.086
ty+0.086
Mo+0.081
Neg logit contributions
 salon-0.070
 studio-0.065
ours-0.064
 supervision-0.063
aster-0.060
Token classmates
Token ID28999
Feature activation-4.87e-01

Pos logit contributions
ba+0.028
ve+0.026
ty+0.024
Mo+0.023
Ar+0.023
Neg logit contributions
ours-0.045
 supervision-0.043
 salon-0.043
 studio-0.040
 slot-0.040
Token came
Token ID1625
Feature activation-2.62e-01

Pos logit contributions
ba+0.012
ve+0.012
ty+0.011
Mo+0.010
B+0.009
Neg logit contributions
ours-0.015
 supervision-0.015
ses-0.015
 salon-0.015
 studio-0.014
Token around
Token ID1088
Feature activation-6.66e-02

Pos logit contributions
ba+0.007
ty+0.007
ve+0.007
Mo+0.006
B+0.006
Neg logit contributions
 salon-0.008
ours-0.008
 supervision-0.007
 studio-0.007
ses-0.007
Token to
Token ID284
Feature activation-9.95e-01

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
 himself+0.001
Mo+0.001
Neg logit contributions
ours-0.002
 supervision-0.002
 salon-0.002
ses-0.002
 studio-0.002
Token butterfly
Token ID35113
Feature activation-2.59e-01

Pos logit contributions
ty+0.003
ba+0.003
ve+0.003
B+0.002
Mo+0.002
Neg logit contributions
 supervision-0.004
ours-0.004
 salon-0.004
anged-0.004
ses-0.004
Token!
Token ID0
Feature activation+6.33e-01

Pos logit contributions
ba+0.005
ty+0.005
ve+0.005
Mo+0.004
B+0.004
Neg logit contributions
ours-0.010
 supervision-0.010
 salon-0.009
 studio-0.009
ses-0.009
Token
Token ID198
Feature activation+2.06e-01

Pos logit contributions
 supervision+0.020
ours+0.020
 salon+0.020
 studio+0.019
ses+0.018
Neg logit contributions
ba-0.014
ty-0.014
ve-0.014
B-0.013
Mo-0.013
Token
Token ID198
Feature activation+4.36e-01

Pos logit contributions
 salon+0.006
ours+0.006
 supervision+0.005
 studio+0.005
ses+0.005
Neg logit contributions
ba-0.006
ty-0.005
ve-0.005
Mo-0.005
Ar-0.005
TokenAll
Token ID3237
Feature activation-1.48e-01

Pos logit contributions
 salon+0.015
ours+0.015
 supervision+0.015
 studio+0.014
ses+0.013
Neg logit contributions
ba-0.009
ty-0.009
ve-0.009
Mo-0.008
Ar-0.007
Token of
Token ID286
Feature activation-3.01e+00

Pos logit contributions
ba+0.003
ty+0.003
B+0.003
ve+0.003
Mo+0.003
Neg logit contributions
 supervision-0.005
 salon-0.005
ses-0.005
ours-0.005
 studio-0.005
Token Lily
Token ID20037
Feature activation-5.74e-02

Pos logit contributions
ba+0.106
ve+0.106
vell+0.099
ty+0.098
Mo+0.095
Neg logit contributions
 salon-0.054
 studio-0.053
 gust-0.051
ours-0.048
 square-0.047
Token's
Token ID338
Feature activation-5.22e-01

Pos logit contributions
ba+0.002
ty+0.001
ve+0.001
Mo+0.001
B+0.001
Neg logit contributions
ours-0.002
 salon-0.002
 supervision-0.002
 studio-0.002
ses-0.001
Token family
Token ID1641
Feature activation-2.59e-01

Pos logit contributions
ve+0.006
ba+0.006
ty+0.005
 himself+0.005
 w+0.005
Neg logit contributions
ours-0.022
 salon-0.021
 supervision-0.021
 slot-0.020
 Salad-0.019
Token and
Token ID290
Feature activation-2.13e-01

Pos logit contributions
ba+0.006
ty+0.005
ve+0.005
Mo+0.004
B+0.004
Neg logit contributions
 supervision-0.009
ours-0.009
ses-0.009
 salon-0.009
 slot-0.008
Token friends
Token ID2460
Feature activation-7.73e-01

Pos logit contributions
ba+0.004
ty+0.004
ve+0.004
Mo+0.004
B+0.003
Neg logit contributions
ours-0.008
 supervision-0.008
 salon-0.007
 studio-0.007
 slot-0.007
Token her
Token ID607
Feature activation+1.12e-01

Pos logit contributions
ba+0.015
ty+0.015
ve+0.013
Mo+0.013
 himself+0.012
Neg logit contributions
ours-0.028
 salon-0.028
 supervision-0.027
 studio-0.027
ses-0.026
Token feet
Token ID3625
Feature activation-5.04e-01

Pos logit contributions
ours+0.003
 Salad+0.003
 studio+0.003
 supervision+0.003
 Sharing+0.003
Neg logit contributions
ve-0.003
ba-0.003
ty-0.003
 himself-0.003
 w-0.002
Token and
Token ID290
Feature activation+1.00e-02

Pos logit contributions
ty+0.008
ba+0.008
ve+0.008
Mo+0.006
 himself+0.006
Neg logit contributions
ours-0.020
 supervision-0.019
 studio-0.019
 salon-0.019
ses-0.019
Token threw
Token ID9617
Feature activation+2.86e-01

Pos logit contributions
ours+0.000
 salon+0.000
 studio+0.000
 supervision+0.000
apping+0.000
Neg logit contributions
ba-0.000
ty-0.000
ve-0.000
Mo-0.000
 himself-0.000
Token a
Token ID257
Feature activation-2.35e-01

Pos logit contributions
ours+0.010
 salon+0.010
 supervision+0.010
 studio+0.010
ses+0.009
Neg logit contributions
ba-0.006
ty-0.006
ve-0.005
Mo-0.005
B-0.005
Token tant
Token ID24246
Feature activation-3.01e+00

Pos logit contributions
ba+0.005
ve+0.005
ty+0.005
Ar+0.005
Mo+0.005
Neg logit contributions
ours-0.008
 supervision-0.008
 salon-0.007
 studio-0.007
ses-0.007
Tokenrum
Token ID6582
Feature activation-4.07e-01

Pos logit contributions
Ar+0.087
ba+0.086
ve+0.084
ty+0.083
Mo+0.082
Neg logit contributions
 supervision-0.068
anged-0.066
ours-0.065
apping-0.059
 salon-0.059
Token.
Token ID13
Feature activation+9.23e-02

Pos logit contributions
ba+0.006
ty+0.006
 himself+0.006
ve+0.006
Mo+0.005
Neg logit contributions
ours-0.016
 salon-0.016
 supervision-0.016
 studio-0.016
 Salad-0.015
Token She
Token ID1375
Feature activation+3.49e-01

Pos logit contributions
 supervision+0.003
 salon+0.003
ours+0.002
 slot+0.002
 studio+0.002
Neg logit contributions
 himself-0.002
ba-0.002
Mo-0.002
E-0.002
ve-0.002
Token grabbed
Token ID13176
Feature activation-5.36e-01

Pos logit contributions
 salon+0.010
ours+0.010
 supervision+0.010
 studio+0.010
ses+0.009
Neg logit contributions
ba-0.009
ty-0.009
ve-0.009
Mo-0.008
B-0.008
Token the
Token ID262
Feature activation+2.82e-01

Pos logit contributions
ba+0.014
ve+0.013
ty+0.013
Mo+0.012
Ar+0.012
Neg logit contributions
ours-0.016
 salon-0.016
 supervision-0.016
 studio-0.015
 mouths-0.014
Token noise
Token ID7838
Feature activation-6.20e-01

Pos logit contributions
ours+0.013
 salon+0.013
 supervision+0.013
 studio+0.012
ses+0.012
Neg logit contributions
ba-0.007
ty-0.007
ve-0.007
Mo-0.006
B-0.006
Token and
Token ID290
Feature activation+5.89e-01

Pos logit contributions
ba+0.013
ty+0.012
ve+0.012
Mo+0.011
Ar+0.010
Neg logit contributions
 salon-0.022
ours-0.022
 supervision-0.021
 studio-0.021
ses-0.020
Token comes
Token ID2058
Feature activation+9.18e-01

Pos logit contributions
 salon+0.015
 supervision+0.015
ours+0.014
 studio+0.014
ses+0.013
Neg logit contributions
ba-0.018
ty-0.017
ve-0.016
Mo-0.016
 himself-0.015
Token to
Token ID284
Feature activation+4.30e-01

Pos logit contributions
 supervision+0.020
ours+0.020
 salon+0.020
 studio+0.019
ses+0.017
Neg logit contributions
ba-0.030
ty-0.029
ve-0.028
Mo-0.027
Ar-0.026
Token the
Token ID262
Feature activation+9.89e-01

Pos logit contributions
 supervision+0.010
 salon+0.009
 studio+0.009
ours+0.009
ses+0.008
Neg logit contributions
ba-0.014
ve-0.014
ty-0.013
Mo-0.013
vell-0.012
Token living
Token ID2877
Feature activation+2.44e+00

Pos logit contributions
ours+0.027
anged+0.026
 supervision+0.026
 gust+0.023
apping+0.023
Neg logit contributions
ba-0.026
ty-0.026
ve-0.025
Ar-0.025
 himself-0.024
Token room
Token ID2119
Feature activation+4.10e-01

Pos logit contributions
 supervision+0.058
ours+0.054
 salon+0.053
ses+0.052
anged+0.048
Neg logit contributions
ba-0.078
ty-0.073
ve-0.072
 himself-0.068
Ar-0.067
Token.
Token ID13
Feature activation-8.59e-01

Pos logit contributions
ours+0.017
 supervision+0.017
ses+0.017
 Salad+0.016
 salon+0.016
Neg logit contributions
ty-0.005
ve-0.005
ba-0.004
 himself-0.004
ke-0.003
Token
Token ID198
Feature activation-4.43e-01

Pos logit contributions
Mo+0.018
B+0.017
ve+0.017
ba+0.016
Mill+0.016
Neg logit contributions
 supervision-0.029
 salon-0.028
 studio-0.024
ours-0.024
bars-0.023
Token
Token ID198
Feature activation-4.61e-01

Pos logit contributions
ba+0.012
ty+0.011
ve+0.011
Mo+0.010
vell+0.009
Neg logit contributions
ours-0.013
 salon-0.013
 supervision-0.012
 studio-0.012
ses-0.011
Token"
Token ID1
Feature activation-2.09e-01

Pos logit contributions
ba+0.010
ve+0.010
ty+0.010
Mo+0.010
B+0.009
Neg logit contributions
 salon-0.015
 supervision-0.015
ours-0.015
 studio-0.014
ses-0.014
Token was
Token ID373
Feature activation-1.24e+00

Pos logit contributions
 salon+0.004
ours+0.004
 supervision+0.004
 studio+0.004
ses+0.004
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.002
B-0.002
Token a
Token ID257
Feature activation-2.31e-01

Pos logit contributions
ba+0.032
ve+0.031
ty+0.030
Mo+0.028
Ar+0.026
Neg logit contributions
ours-0.036
 salon-0.035
 supervision-0.034
 studio-0.034
ses-0.033
Token generous
Token ID14431
Feature activation+7.47e-01

Pos logit contributions
ba+0.004
ve+0.003
ty+0.003
B+0.003
Mo+0.003
Neg logit contributions
 supervision-0.010
ours-0.009
 salon-0.009
 studio-0.009
anged-0.009
Token woman
Token ID2415
Feature activation+1.64e+00

Pos logit contributions
 supervision+0.025
ours+0.024
 salon+0.022
anged+0.022
 mouths+0.022
Neg logit contributions
ba-0.018
ty-0.017
ve-0.016
Mo-0.016
B-0.015
Token called
Token ID1444
Feature activation+1.07e+00

Pos logit contributions
 supervision+0.062
ours+0.060
ses+0.059
 salon+0.058
 mouths+0.057
Neg logit contributions
ba-0.034
ty-0.033
B-0.027
ve-0.027
Mo-0.026
Token Mrs
Token ID9074
Feature activation+2.45e+00

Pos logit contributions
ours+0.030
 salon+0.030
 supervision+0.029
 studio+0.028
 mouths+0.027
Neg logit contributions
ba-0.029
ty-0.028
ve-0.027
Mo-0.027
B-0.025
Token.
Token ID13
Feature activation+8.40e-01

Pos logit contributions
ours+0.092
 salon+0.090
 supervision+0.088
 studio+0.086
 mouths+0.083
Neg logit contributions
ty-0.048
ba-0.048
ve-0.040
Mo-0.040
B-0.036
Token M
Token ID337
Feature activation-1.42e+00

Pos logit contributions
 salon+0.020
ours+0.019
 supervision+0.019
 studio+0.018
ses+0.017
Neg logit contributions
ba-0.027
ty-0.026
ve-0.026
Mo-0.024
B-0.023
Tokenilly
Token ID6548
Feature activation+2.65e-01

Pos logit contributions
ba+0.040
ty+0.037
ve+0.037
Mo+0.036
Ar+0.035
Neg logit contributions
 salon-0.040
ours-0.039
 supervision-0.038
 studio-0.038
ses-0.035
Token.
Token ID13
Feature activation+2.26e-01

Pos logit contributions
ours+0.011
 supervision+0.011
 salon+0.011
 gust+0.010
 studio+0.010
Neg logit contributions
ty-0.005
ba-0.004
ve-0.003
Mo-0.003
 himself-0.003
Token She
Token ID1375
Feature activation+2.33e-01

Pos logit contributions
 supervision+0.005
ours+0.005
 salon+0.005
 studio+0.005
ses+0.005
Neg logit contributions
ba-0.007
ty-0.007
ve-0.007
Mo-0.007
B-0.006
Token scared
Token ID12008
Feature activation+2.76e-01

Pos logit contributions
 salon+0.049
ours+0.049
 supervision+0.048
 studio+0.047
ses+0.044
Neg logit contributions
ba-0.044
ty-0.042
ve-0.041
Mo-0.039
B-0.036
Token,
Token ID11
Feature activation+2.41e-01

Pos logit contributions
 salon+0.008
ours+0.008
 supervision+0.008
 studio+0.008
ses+0.008
Neg logit contributions
ba-0.007
ve-0.006
ty-0.006
Mo-0.006
Ar-0.006
Token but
Token ID475
Feature activation+8.26e-02

Pos logit contributions
ours+0.008
 salon+0.008
 supervision+0.008
 studio+0.008
anged+0.007
Neg logit contributions
ba-0.006
ty-0.005
ve-0.005
Mo-0.005
 himself-0.005
Token he
Token ID339
Feature activation+6.59e-01

Pos logit contributions
ours+0.002
 salon+0.002
 studio+0.002
 supervision+0.002
ses+0.002
Neg logit contributions
ba-0.002
ty-0.002
 himself-0.002
Mo-0.002
ve-0.002
Token was
Token ID373
Feature activation+5.29e-01

Pos logit contributions
 salon+0.020
ours+0.020
 supervision+0.020
 studio+0.019
ses+0.018
Neg logit contributions
ba-0.017
ty-0.016
ve-0.016
Mo-0.015
Ar-0.014
Token also
Token ID635
Feature activation+2.28e+00

Pos logit contributions
 salon+0.017
ours+0.017
 supervision+0.017
 studio+0.016
ses+0.015
Neg logit contributions
ba-0.012
ty-0.012
ve-0.012
Mo-0.011
B-0.010
Token excited
Token ID6568
Feature activation+3.01e-01

Pos logit contributions
 salon+0.081
ours+0.081
 supervision+0.077
 studio+0.076
ses+0.074
Neg logit contributions
ba-0.046
ty-0.043
ve-0.043
 himself-0.039
Mo-0.037
Token to
Token ID284
Feature activation-2.28e-01

Pos logit contributions
ours+0.009
 salon+0.009
 supervision+0.008
 studio+0.008
ses+0.008
Neg logit contributions
ty-0.008
ba-0.008
ve-0.008
Mo-0.007
 himself-0.007
Token meet
Token ID1826
Feature activation-1.09e-01

Pos logit contributions
ve+0.007
ba+0.007
ty+0.007
rob+0.007
B+0.007
Neg logit contributions
 studio-0.005
 supervision-0.005
 salon-0.005
ap-0.005
ours-0.005
Token him
Token ID683
Feature activation-7.66e-01

Pos logit contributions
ba+0.003
ve+0.003
Mo+0.003
ty+0.002
Ar+0.002
Neg logit contributions
 salon-0.003
ours-0.003
 supervision-0.003
 studio-0.003
 mouths-0.003
Token.
Token ID13
Feature activation-1.62e-01

Pos logit contributions
ba+0.019
ty+0.018
ve+0.018
Mo+0.017
Ar+0.016
Neg logit contributions
 salon-0.023
ours-0.023
 supervision-0.023
 studio-0.022
ses-0.021
Token"
Token ID1
Feature activation-1.11e-01

Pos logit contributions
ba+0.007
ty+0.006
ve+0.006
Mo+0.006
B+0.005
Neg logit contributions
ours-0.008
 salon-0.008
 supervision-0.008
 studio-0.008
ses-0.007
TokenHere
Token ID4342
Feature activation-2.39e-01

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
Mo+0.003
 himself+0.003
Neg logit contributions
 salon-0.003
ours-0.003
 supervision-0.003
 studio-0.003
ses-0.003
Token,
Token ID11
Feature activation+3.02e-01

Pos logit contributions
ty+0.005
ba+0.005
ve+0.005
Mo+0.005
 himself+0.005
Neg logit contributions
 salon-0.008
ours-0.008
 studio-0.008
 supervision-0.007
ses-0.007
Token you
Token ID345
Feature activation+1.48e+00

Pos logit contributions
ours+0.009
 salon+0.009
 supervision+0.009
 studio+0.009
ses+0.008
Neg logit contributions
ba-0.008
ty-0.007
ve-0.007
Mo-0.007
 himself-0.006
Token can
Token ID460
Feature activation+3.94e-01

Pos logit contributions
 salon+0.034
ours+0.033
 supervision+0.033
 studio+0.032
ses+0.029
Neg logit contributions
ba-0.049
ty-0.047
ve-0.047
Mo-0.044
B-0.043
Token each
Token ID1123
Feature activation+2.32e+00

Pos logit contributions
 salon+0.010
ours+0.010
 supervision+0.010
 studio+0.010
ses+0.009
Neg logit contributions
ba-0.011
ty-0.011
ve-0.011
Mo-0.011
Ar-0.010
Token have
Token ID423
Feature activation+1.09e+00

Pos logit contributions
ours+0.061
 salon+0.061
 supervision+0.060
 studio+0.058
ses+0.055
Neg logit contributions
ba-0.068
ty-0.065
ve-0.064
Mo-0.061
B-0.058
Token a
Token ID257
Feature activation+4.60e-01

Pos logit contributions
 salon+0.034
ours+0.034
 supervision+0.033
 studio+0.032
ses+0.031
Neg logit contributions
ba-0.027
ty-0.025
ve-0.025
Mo-0.024
B-0.022
Token half
Token ID2063
Feature activation+1.26e+00

Pos logit contributions
 supervision+0.014
ours+0.014
 salon+0.014
anged+0.013
 studio+0.013
Neg logit contributions
ba-0.011
ve-0.011
ty-0.010
Ar-0.010
Mo-0.010
Token of
Token ID286
Feature activation+1.34e+00

Pos logit contributions
 salon+0.042
ours+0.041
 supervision+0.041
 studio+0.040
ses+0.038
Neg logit contributions
ba-0.028
ty-0.027
ve-0.027
Mo-0.024
 himself-0.022
Token the
Token ID262
Feature activation+4.84e-01

Pos logit contributions
 salon+0.041
ours+0.041
 studio+0.040
 supervision+0.040
ses+0.037
Neg logit contributions
ty-0.031
ba-0.030
 himself-0.029
ve-0.029
Mo-0.027
Token she
Token ID673
Feature activation-2.53e-01

Pos logit contributions
ty+0.018
ba+0.018
Mo+0.017
ve+0.017
 himself+0.016
Neg logit contributions
ours-0.022
 supervision-0.022
ses-0.021
 salon-0.020
anged-0.020
Token came
Token ID1625
Feature activation-3.55e-01

Pos logit contributions
ba+0.008
ty+0.007
ve+0.007
Mo+0.007
Ar+0.007
Neg logit contributions
 salon-0.006
ours-0.006
 studio-0.006
 supervision-0.006
 mouths-0.005
Token back
Token ID736
Feature activation-9.08e-01

Pos logit contributions
ba+0.009
ve+0.008
ty+0.008
Mo+0.008
 himself+0.007
Neg logit contributions
 salon-0.011
ours-0.011
 supervision-0.011
 studio-0.011
ses-0.010
Token to
Token ID284
Feature activation-8.73e-01

Pos logit contributions
ba+0.024
ty+0.023
ve+0.023
Mo+0.022
B+0.020
Neg logit contributions
ours-0.026
 salon-0.026
 supervision-0.026
 studio-0.024
ses-0.024
Token the
Token ID262
Feature activation+8.04e-01

Pos logit contributions
ba+0.019
ty+0.019
ve+0.017
Mo+0.017
 himself+0.017
Neg logit contributions
ours-0.029
 supervision-0.028
 salon-0.028
ses-0.027
 studio-0.027
Token living
Token ID2877
Feature activation+2.31e+00

Pos logit contributions
ours+0.026
 supervision+0.024
anged+0.024
ses+0.022
apping+0.022
Neg logit contributions
ba-0.019
ty-0.019
ve-0.018
Ar-0.017
 cool-0.017
Token room
Token ID2119
Feature activation-3.66e-01

Pos logit contributions
 supervision+0.062
ses+0.056
 salon+0.055
ours+0.055
 gust+0.052
Neg logit contributions
ba-0.068
ve-0.062
ty-0.061
 himself-0.059
Ar-0.058
Token,
Token ID11
Feature activation-6.95e-01

Pos logit contributions
ty+0.007
ve+0.007
ba+0.006
 himself+0.006
ke+0.005
Neg logit contributions
ours-0.013
 supervision-0.013
ses-0.013
 goalie-0.012
 salon-0.012
Token the
Token ID262
Feature activation+2.62e-01

Pos logit contributions
ba+0.018
ty+0.017
ve+0.016
Mo+0.016
 himself+0.016
Neg logit contributions
ours-0.021
 supervision-0.021
ses-0.020
 salon-0.019
 studio-0.019
Token popcorn
Token ID38914
Feature activation+3.46e-01

Pos logit contributions
ours+0.009
 supervision+0.008
anged+0.008
 salon+0.007
ses+0.007
Neg logit contributions
ba-0.006
ve-0.006
ty-0.006
B-0.005
 himself-0.005
Token was
Token ID373
Feature activation-3.96e-02

Pos logit contributions
 supervision+0.012
ours+0.012
 salon+0.012
 studio+0.011
ses+0.011
Neg logit contributions
ty-0.007
ba-0.007
ve-0.007
 himself-0.006
Mo-0.006
Token folding
Token ID29909
Feature activation-5.87e-01

Pos logit contributions
ba+0.029
ty+0.028
ve+0.028
Mo+0.026
B+0.025
Neg logit contributions
 salon-0.023
ours-0.023
 supervision-0.023
 studio-0.022
ses-0.021
Token her
Token ID607
Feature activation-5.60e-01

Pos logit contributions
ty+0.012
ba+0.012
ve+0.011
 himself+0.010
Mo+0.010
Neg logit contributions
ours-0.020
 supervision-0.020
 salon-0.020
ses-0.019
 studio-0.018
Token paper
Token ID3348
Feature activation-3.51e-01

Pos logit contributions
ve+0.014
ty+0.013
ba+0.012
 himself+0.012
 cool+0.012
Neg logit contributions
 salon-0.016
ours-0.016
 supervision-0.015
 Sharing-0.014
anged-0.014
Token,
Token ID11
Feature activation-5.71e-01

Pos logit contributions
ba+0.008
ty+0.008
ve+0.007
Mo+0.007
 himself+0.007
Neg logit contributions
ours-0.012
 salon-0.011
 supervision-0.011
ses-0.011
 studio-0.011
Token Sam
Token ID3409
Feature activation+4.22e-01

Pos logit contributions
ty+0.014
ba+0.014
ve+0.013
 himself+0.013
B+0.013
Neg logit contributions
ours-0.018
 supervision-0.017
ses-0.017
 salon-0.017
anged-0.016
Token and
Token ID290
Feature activation+5.22e-01

Pos logit contributions
ours+0.013
 supervision+0.013
 salon+0.013
ses+0.012
 studio+0.012
Neg logit contributions
ba-0.011
ty-0.010
ve-0.010
Mo-0.009
B-0.009
Token Emma
Token ID18966
Feature activation+7.61e-01

Pos logit contributions
ours+0.019
 salon+0.017
 supervision+0.017
ses+0.017
 studio+0.016
Neg logit contributions
ty-0.012
ba-0.011
 himself-0.010
Mo-0.009
B-0.009
Token smiled
Token ID13541
Feature activation+1.04e-01

Pos logit contributions
 supervision+0.025
ours+0.025
 salon+0.024
ses+0.024
 studio+0.023
Neg logit contributions
ba-0.019
ve-0.017
ty-0.017
Mo-0.015
B-0.015
Token at
Token ID379
Feature activation+1.17e-01

Pos logit contributions
 salon+0.003
ours+0.003
 supervision+0.003
 studio+0.003
ses+0.003
Neg logit contributions
ba-0.003
ty-0.002
ve-0.002
Mo-0.002
B-0.002
Token each
Token ID1123
Feature activation+1.49e+00

Pos logit contributions
 salon+0.004
ours+0.004
 supervision+0.003
 studio+0.003
ses+0.003
Neg logit contributions
ty-0.003
ba-0.003
ve-0.003
Mo-0.002
 himself-0.002
Token other
Token ID584
Feature activation-1.52e-01

Pos logit contributions
 salon+0.050
 supervision+0.049
ours+0.049
 studio+0.049
 mouths+0.045
Neg logit contributions
ba-0.033
ty-0.030
ve-0.030
Mo-0.029
Ar-0.026
TokenOnce
Token ID7454
Feature activation-6.21e-01

Pos logit contributions
ba+0.024
ve+0.022
ty+0.022
Mo+0.021
rim+0.020
Neg logit contributions
ours-0.020
 salon-0.020
 studio-0.019
 slot-0.018
 supervision-0.018
Token upon
Token ID2402
Feature activation-2.65e-01

Pos logit contributions
ba+0.012
ve+0.011
ty+0.011
Mo+0.010
B+0.009
Neg logit contributions
 supervision-0.023
 salon-0.023
ours-0.022
 studio-0.022
ses-0.021
Token a
Token ID257
Feature activation+5.12e-01

Pos logit contributions
ba+0.008
ve+0.008
ty+0.007
Mo+0.007
rim+0.007
Neg logit contributions
 salon-0.006
ours-0.006
 supervision-0.006
 studio-0.006
 tower-0.005
Token time
Token ID640
Feature activation-9.73e-02

Pos logit contributions
 salon+0.015
ours+0.015
 supervision+0.015
 studio+0.015
ses+0.014
Neg logit contributions
ba-0.013
ve-0.012
ty-0.012
Mo-0.011
 himself-0.010
Token,
Token ID11
Feature activation+4.61e-01

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
Mo+0.003
Ar+0.003
Neg logit contributions
ours-0.002
 salon-0.002
 supervision-0.002
 studio-0.002
ses-0.002
Token there
Token ID612
Feature activation+4.18e-01

Pos logit contributions
ours+0.014
 salon+0.014
 supervision+0.014
 studio+0.013
ses+0.013
Neg logit contributions
ba-0.011
ty-0.011
ve-0.011
Mo-0.010
Ar-0.009
Token was
Token ID373
Feature activation-9.38e-01

Pos logit contributions
 salon+0.013
ours+0.013
 supervision+0.013
 studio+0.012
ses+0.012
Neg logit contributions
ba-0.010
ty-0.010
ve-0.010
Mo-0.009
B-0.008
Token a
Token ID257
Feature activation-2.48e-01

Pos logit contributions
ba+0.026
ve+0.024
ty+0.024
Mo+0.022
vell+0.021
Neg logit contributions
 salon-0.026
ours-0.026
 studio-0.025
 supervision-0.024
ses-0.024
Token little
Token ID1310
Feature activation+4.32e-01

Pos logit contributions
ba+0.005
ve+0.004
ty+0.004
Mo+0.004
B+0.004
Neg logit contributions
 supervision-0.010
ours-0.009
 salon-0.009
 studio-0.009
ses-0.009
Token girl
Token ID2576
Feature activation+8.96e-01

Pos logit contributions
 supervision+0.013
ours+0.012
 salon+0.012
 studio+0.011
anged+0.011
Neg logit contributions
ba-0.012
ty-0.011
ve-0.011
B-0.011
Mo-0.011
Token named
Token ID3706
Feature activation+1.12e+00

Pos logit contributions
 supervision+0.029
ours+0.029
 salon+0.029
ses+0.028
 studio+0.027
Neg logit contributions
ba-0.021
ty-0.021
B-0.019
Mo-0.019
ve-0.018
Token but
Token ID475
Feature activation+5.55e-01

Pos logit contributions
ours+0.005
 supervision+0.004
 studio+0.004
 salon+0.004
anged+0.004
Neg logit contributions
ba-0.002
ty-0.002
 himself-0.002
ve-0.001
Mo-0.001
Token he
Token ID339
Feature activation+1.08e-01

Pos logit contributions
 studio+0.020
ours+0.020
 supervision+0.019
 salon+0.019
ses+0.018
Neg logit contributions
ba-0.011
 himself-0.011
ty-0.011
ve-0.010
Mo-0.009
Token did
Token ID750
Feature activation-1.77e-01

Pos logit contributions
 supervision+0.003
ours+0.003
 salon+0.003
 studio+0.003
ses+0.003
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.002
B-0.002
Token not
Token ID407
Feature activation-5.08e-01

Pos logit contributions
ba+0.004
ty+0.004
ve+0.004
 himself+0.003
Mo+0.003
Neg logit contributions
 salon-0.006
ours-0.006
 supervision-0.006
 studio-0.006
ses-0.006
Token notice
Token ID4003
Feature activation+8.75e-01

Pos logit contributions
ba+0.014
ve+0.014
ty+0.014
Mo+0.013
B+0.012
Neg logit contributions
 salon-0.014
 supervision-0.014
ours-0.013
 studio-0.013
ses-0.012
Token that
Token ID326
Feature activation+7.82e-01

Pos logit contributions
 salon+0.028
 studio+0.028
ours+0.028
 supervision+0.028
ses+0.027
Neg logit contributions
ba-0.020
ty-0.019
 himself-0.018
ve-0.017
Mo-0.017
Token it
Token ID340
Feature activation-1.68e-02

Pos logit contributions
ours+0.031
 supervision+0.029
 studio+0.029
ses+0.029
apping+0.028
Neg logit contributions
 himself-0.014
ty-0.012
ba-0.011
 his-0.011
 w-0.009
Token was
Token ID373
Feature activation+5.56e-01

Pos logit contributions
ba+0.000
ty+0.000
ve+0.000
Mo+0.000
B+0.000
Neg logit contributions
 salon-0.001
ours-0.001
 supervision-0.001
 studio-0.000
ses-0.000
Token w
Token ID266
Feature activation-5.86e-01

Pos logit contributions
 salon+0.018
ours+0.018
 supervision+0.018
 studio+0.017
ses+0.016
Neg logit contributions
ba-0.013
ty-0.012
ve-0.012
Mo-0.011
 himself-0.011
Tokenobb
Token ID21963
Feature activation-9.87e-01

Pos logit contributions
ba+0.018
Mo+0.017
ty+0.017
ve+0.017
Ar+0.016
Neg logit contributions
ours-0.015
 salon-0.014
 supervision-0.014
 studio-0.013
ses-0.012
Tokenly
Token ID306
Feature activation-1.16e-01

Pos logit contributions
ba+0.025
Mo+0.025
ve+0.024
ty+0.023
Mill+0.023
Neg logit contributions
ours-0.028
 supervision-0.028
 salon-0.028
anged-0.026
 studio-0.025
Token arrow
Token ID15452
Feature activation-4.42e-01

Pos logit contributions
ba+0.022
ve+0.022
ty+0.022
Mo+0.020
rim+0.019
Neg logit contributions
 supervision-0.020
 salon-0.019
 studio-0.018
 mouths-0.018
ours-0.018
Token.
Token ID13
Feature activation+4.18e-02

Pos logit contributions
ba+0.009
ty+0.009
ve+0.008
 himself+0.007
Mo+0.007
Neg logit contributions
ours-0.016
 salon-0.016
 supervision-0.016
 studio-0.015
ses-0.014
Token One
Token ID1881
Feature activation+6.99e-01

Pos logit contributions
 supervision+0.001
 salon+0.001
ours+0.001
 studio+0.001
ses+0.001
Neg logit contributions
ba-0.001
ty-0.001
ve-0.001
Mo-0.001
 himself-0.001
Token day
Token ID1110
Feature activation-2.16e-01

Pos logit contributions
 supervision+0.018
ours+0.018
 salon+0.018
 studio+0.017
ses+0.016
Neg logit contributions
ba-0.021
ty-0.021
Mo-0.019
ve-0.019
B-0.019
Token,
Token ID11
Feature activation+7.60e-01

Pos logit contributions
ba+0.006
ty+0.006
ve+0.006
Mo+0.006
B+0.005
Neg logit contributions
 salon-0.006
ours-0.006
 supervision-0.006
 studio-0.006
ses-0.005
Token he
Token ID339
Feature activation+5.30e-01

Pos logit contributions
 salon+0.021
ours+0.021
 supervision+0.021
 studio+0.020
ses+0.019
Neg logit contributions
ba-0.021
ty-0.020
ve-0.020
Mo-0.019
B-0.018
Token went
Token ID1816
Feature activation-2.90e-01

Pos logit contributions
 salon+0.013
ours+0.012
 studio+0.012
 supervision+0.011
 mouths+0.011
Neg logit contributions
ve-0.016
ba-0.016
ty-0.016
Mo-0.016
Mill-0.014
Token to
Token ID284
Feature activation-7.94e-01

Pos logit contributions
ba+0.010
ve+0.010
ty+0.009
Mo+0.009
Ar+0.009
Neg logit contributions
 salon-0.006
ours-0.006
 studio-0.006
 supervision-0.006
ses-0.005
Token the
Token ID262
Feature activation+8.37e-01

Pos logit contributions
ve+0.026
ba+0.026
ty+0.024
Mo+0.023
vell+0.022
Neg logit contributions
 salon-0.019
 studio-0.018
 supervision-0.018
ours-0.018
ses-0.016
Token park
Token ID3952
Feature activation+9.24e-01

Pos logit contributions
 supervision+0.025
ours+0.025
 salon+0.023
ses+0.022
anged+0.022
Neg logit contributions
ba-0.022
ty-0.021
Mo-0.020
ve-0.019
B-0.019
Token to
Token ID284
Feature activation-1.82e-01

Pos logit contributions
 supervision+0.037
ours+0.035
 mouths+0.033
ses+0.033
 salon+0.032
Neg logit contributions
ba-0.017
ty-0.016
ve-0.014
 himself-0.014
B-0.012
Token.
Token ID13
Feature activation+5.23e-02

Pos logit contributions
 salon+0.005
 supervision+0.005
ours+0.005
 studio+0.005
 mouths+0.005
Neg logit contributions
ba-0.005
ty-0.005
ve-0.005
Mo-0.004
Ar-0.004
Token
Token ID220
Feature activation+2.36e-01

Pos logit contributions
 salon+0.002
 supervision+0.002
ours+0.002
 studio+0.002
ses+0.001
Neg logit contributions
ba-0.001
Mo-0.001
ty-0.001
ve-0.001
B-0.001
Token
Token ID198
Feature activation+3.02e-01

Pos logit contributions
ours+0.008
 salon+0.008
 supervision+0.008
 studio+0.007
ses+0.007
Neg logit contributions
ba-0.005
ty-0.005
ve-0.005
Mo-0.005
Ar-0.004
Token
Token ID198
Feature activation+3.78e-02

Pos logit contributions
 salon+0.007
 �+0.007
 studio+0.007
ours+0.006
 Sharing+0.006
Neg logit contributions
ba-0.009
ty-0.008
ve-0.008
vell-0.008
Mo-0.007
TokenFrom
Token ID4863
Feature activation-1.42e-01

Pos logit contributions
 salon+0.001
 studio+0.001
 supervision+0.001
ours+0.001
 Sharing+0.001
Neg logit contributions
ba-0.001
ty-0.001
ve-0.001
Mo-0.001
vell-0.001
Token that
Token ID326
Feature activation+1.20e+00

Pos logit contributions
ba+0.006
ve+0.006
Mo+0.005
Ar+0.005
ty+0.005
Neg logit contributions
 salon-0.002
ours-0.002
 supervision-0.002
 studio-0.002
ses-0.002
Token day
Token ID1110
Feature activation-4.55e-01

Pos logit contributions
ours+0.036
 supervision+0.034
 salon+0.034
 mouths+0.031
 studio+0.031
Neg logit contributions
ty-0.034
ba-0.032
ve-0.032
Mo-0.032
B-0.029
Token on
Token ID319
Feature activation-7.02e-01

Pos logit contributions
ba+0.009
ve+0.009
ty+0.009
Mo+0.008
B+0.008
Neg logit contributions
 salon-0.016
ours-0.016
 supervision-0.016
 studio-0.015
ses-0.015
Token,
Token ID11
Feature activation+4.52e-01

Pos logit contributions
ty+0.017
ve+0.016
ba+0.016
Mo+0.016
 himself+0.015
Neg logit contributions
ours-0.022
 salon-0.022
 supervision-0.022
 studio-0.020
ses-0.020
Token the
Token ID262
Feature activation-5.09e-01

Pos logit contributions
ours+0.014
 salon+0.014
 supervision+0.014
 studio+0.013
ses+0.013
Neg logit contributions
ba-0.011
ty-0.011
ve-0.011
Mo-0.011
B-0.010
Token sheep
Token ID15900
Feature activation+1.06e-01

Pos logit contributions
ba+0.014
ve+0.013
 himself+0.012
ty+0.012
vell+0.012
Neg logit contributions
 studio-0.013
 supervision-0.013
 salon-0.012
ses-0.012
ours-0.012
Token started
Token ID2067
Feature activation+6.15e-01

Pos logit contributions
ba+0.013
ty+0.012
ve+0.012
Mo+0.012
B+0.011
Neg logit contributions
 salon-0.014
ours-0.014
 supervision-0.014
 studio-0.013
ses-0.012
Token to
Token ID284
Feature activation+5.94e-01

Pos logit contributions
ours+0.019
 supervision+0.018
 studio+0.018
 salon+0.018
ses+0.017
Neg logit contributions
ba-0.015
ty-0.015
Mo-0.014
ve-0.014
 himself-0.014
Token make
Token ID787
Feature activation+5.91e-01

Pos logit contributions
ours+0.017
 salon+0.016
 supervision+0.016
 studio+0.016
ses+0.015
Neg logit contributions
ba-0.016
ty-0.016
ve-0.015
Mo-0.015
B-0.014
Token trouble
Token ID5876
Feature activation-1.24e-01

Pos logit contributions
ours+0.019
 salon+0.019
 supervision+0.019
 studio+0.019
 mouths+0.018
Neg logit contributions
ba-0.013
ve-0.012
ty-0.012
Mo-0.011
Ar-0.011
Token!
Token ID0
Feature activation+5.56e-01

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
 himself+0.002
Mo+0.002
Neg logit contributions
ours-0.004
 salon-0.004
 supervision-0.004
 studio-0.004
ses-0.004
Token He
Token ID679
Feature activation-3.19e-01

Pos logit contributions
 supervision+0.018
 salon+0.018
ours+0.018
 studio+0.017
ses+0.016
Neg logit contributions
ba-0.013
Mo-0.012
ty-0.012
ve-0.012
 himself-0.012
Token pushed
Token ID7121
Feature activation+3.23e-01

Pos logit contributions
ba+0.009
ty+0.008
ve+0.008
Mo+0.008
B+0.007
Neg logit contributions
ours-0.009
 salon-0.009
 supervision-0.009
 studio-0.009
ses-0.008
Token the
Token ID262
Feature activation+1.40e-01

Pos logit contributions
 supervision+0.008
ours+0.008
 gust+0.008
anged+0.007
 mouths+0.007
Neg logit contributions
ba-0.009
ve-0.009
ty-0.008
rim-0.008
vell-0.008
Token furry
Token ID40857
Feature activation-3.01e-01

Pos logit contributions
ours+0.004
 supervision+0.004
 salon+0.004
 studio+0.004
ses+0.004
Neg logit contributions
ba-0.004
ty-0.003
ve-0.003
Mo-0.003
B-0.003
Token bear
Token ID6842
Feature activation-1.18e-01

Pos logit contributions
ba+0.007
ve+0.006
ty+0.006
Mo+0.006
Mill+0.006
Neg logit contributions
 salon-0.010
ours-0.010
 supervision-0.009
 studio-0.009
ses-0.009
Token off
Token ID572
Feature activation-1.96e-01

Pos logit contributions
ty+0.002
ba+0.002
ve+0.002
Mo+0.002
 himself+0.002
Neg logit contributions
ours-0.004
 salon-0.004
 supervision-0.004
 studio-0.004
ses-0.004
Token hardly
Token ID8941
Feature activation-3.02e-01

Pos logit contributions
ba+0.002
ve+0.002
ty+0.002
B+0.002
Mo+0.002
Neg logit contributions
 salon-0.002
 studio-0.002
 supervision-0.002
ours-0.002
ap-0.002
Token sit
Token ID1650
Feature activation+7.96e-01

Pos logit contributions
ba+0.010
ve+0.010
ty+0.010
Mo+0.010
B+0.009
Neg logit contributions
 salon-0.006
 supervision-0.006
ours-0.006
 studio-0.006
ses-0.005
Token still
Token ID991
Feature activation-5.78e-01

Pos logit contributions
 salon+0.024
ours+0.022
 studio+0.021
 supervision+0.021
 slot+0.020
Neg logit contributions
Ar-0.020
ba-0.020
Mo-0.019
ty-0.019
ve-0.019
Token.
Token ID13
Feature activation-3.38e-02

Pos logit contributions
ba+0.015
ve+0.014
Mo+0.014
ty+0.014
Ar+0.013
Neg logit contributions
 salon-0.017
 supervision-0.016
ours-0.016
 studio-0.016
 mouths-0.014
Token
Token ID220
Feature activation+3.25e-01

Pos logit contributions
ba+0.001
Mo+0.001
ve+0.001
ty+0.001
Mill+0.001
Neg logit contributions
 salon-0.001
 supervision-0.001
ours-0.001
 studio-0.001
ses-0.001
Token
Token ID198
Feature activation+5.16e-02

Pos logit contributions
 salon+0.011
ours+0.011
 supervision+0.011
 studio+0.011
ses+0.010
Neg logit contributions
ba-0.007
ty-0.006
ve-0.006
Mo-0.006
B-0.005
Token
Token ID198
Feature activation-5.22e-02

Pos logit contributions
 salon+0.001
ours+0.001
 studio+0.001
 supervision+0.001
 �+0.001
Neg logit contributions
ba-0.001
ty-0.001
ve-0.001
vell-0.001
Mo-0.001
TokenOn
Token ID2202
Feature activation-1.18e-01

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Mo+0.001
B+0.001
Neg logit contributions
ours-0.002
 salon-0.002
 supervision-0.002
 studio-0.002
ses-0.002
Token the
Token ID262
Feature activation+3.07e-01

Pos logit contributions
ty+0.002
ba+0.002
Mo+0.002
 himself+0.002
ve+0.002
Neg logit contributions
ours-0.004
 supervision-0.004
ses-0.004
 salon-0.004
 Sharing-0.004
Token big
Token ID1263
Feature activation-2.10e-01

Pos logit contributions
ours+0.009
 supervision+0.008
 salon+0.008
 studio+0.008
ses+0.008
Neg logit contributions
ba-0.008
ty-0.008
ve-0.008
Mo-0.008
Ar-0.007
Token screen
Token ID3159
Feature activation-7.40e-01

Pos logit contributions
ty+0.005
ba+0.005
ve+0.005
Ar+0.004
Mo+0.004
Neg logit contributions
ours-0.007
 supervision-0.006
ses-0.006
 salon-0.006
 Sharing-0.006
Token the
Token ID262
Feature activation-2.55e-01

Pos logit contributions
 salon+0.001
ours+0.001
 supervision+0.001
 studio+0.001
ses+0.001
Neg logit contributions
ba-0.001
ty-0.001
Mo-0.001
ve-0.001
B-0.001
Token trains
Token ID13404
Feature activation-1.90e-01

Pos logit contributions
ba+0.007
ty+0.007
ve+0.007
Mo+0.007
Ar+0.006
Neg logit contributions
ours-0.007
 salon-0.007
 supervision-0.007
 studio-0.006
ses-0.006
Token.
Token ID13
Feature activation-3.61e-01

Pos logit contributions
ty+0.003
ba+0.003
ve+0.003
 himself+0.002
ke+0.002
Neg logit contributions
ours-0.008
 supervision-0.007
 salon-0.007
 studio-0.007
ses-0.007
Token
Token ID198
Feature activation-6.78e-01

Pos logit contributions
ba+0.009
ty+0.009
ve+0.009
Mo+0.008
Ar+0.008
Neg logit contributions
 salon-0.011
 supervision-0.011
ours-0.011
 studio-0.010
ses-0.010
Token
Token ID198
Feature activation-6.34e-01

Pos logit contributions
ba+0.017
ty+0.016
ve+0.016
Mo+0.015
B+0.014
Neg logit contributions
ours-0.021
 salon-0.020
 supervision-0.020
 studio-0.019
ses-0.018
Token"
Token ID1
Feature activation-3.35e-01

Pos logit contributions
ba+0.014
ty+0.014
ve+0.013
Mo+0.013
B+0.012
Neg logit contributions
 salon-0.021
ours-0.021
 supervision-0.020
 studio-0.020
ses-0.019
TokenCh
Token ID1925
Feature activation-1.17e+00

Pos logit contributions
ve+0.009
ty+0.009
ba+0.009
Mo+0.008
 himself+0.008
Neg logit contributions
 salon-0.009
ours-0.009
 studio-0.009
 supervision-0.009
 slot-0.008
Tokenoo
Token ID2238
Feature activation-1.39e+00

Pos logit contributions
ba+0.029
ty+0.027
ve+0.026
Mo+0.026
Ar+0.024
Neg logit contributions
ours-0.037
 salon-0.036
 supervision-0.036
 studio-0.035
ses-0.033
Token-
Token ID12
Feature activation-7.66e-02

Pos logit contributions
ba+0.034
ty+0.033
ve+0.032
Mo+0.030
B+0.028
Neg logit contributions
 salon-0.044
ours-0.044
 supervision-0.043
 studio-0.042
ses-0.039
Tokench
Token ID354
Feature activation-1.60e+00

Pos logit contributions
ba+0.002
ty+0.002
ve+0.002
Mo+0.002
B+0.001
Neg logit contributions
 salon-0.002
ours-0.002
 supervision-0.002
 studio-0.002
ses-0.002
Tokenoo
Token ID2238
Feature activation-1.25e+00

Pos logit contributions
ba+0.035
ty+0.033
ve+0.033
Mo+0.031
B+0.028
Neg logit contributions
 salon-0.053
ours-0.053
 supervision-0.052
 studio-0.051
ses-0.048
Token shook
Token ID14682
Feature activation-8.82e-02

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
Mo+0.003
B+0.002
Neg logit contributions
 supervision-0.003
 salon-0.003
ours-0.003
 studio-0.003
ses-0.002
Token his
Token ID465
Feature activation+2.12e-01

Pos logit contributions
ve+0.002
ba+0.002
ty+0.001
Mo+0.001
Ar+0.001
Neg logit contributions
ours-0.003
 supervision-0.003
 salon-0.003
 studio-0.003
 mouths-0.003
Token head
Token ID1182
Feature activation-4.09e-01

Pos logit contributions
ours+0.006
 salon+0.006
 supervision+0.006
 studio+0.006
anged+0.005
Neg logit contributions
ba-0.006
ty-0.006
ve-0.006
Mo-0.005
 himself-0.005
Token.
Token ID13
Feature activation-4.09e-01

Pos logit contributions
ba+0.010
ty+0.009
ve+0.009
Mo+0.009
B+0.008
Neg logit contributions
 salon-0.013
ours-0.013
 supervision-0.013
 studio-0.012
ses-0.012
Token He
Token ID679
Feature activation-3.35e-01

Pos logit contributions
 himself+0.007
ba+0.007
ty+0.006
Mo+0.006
B+0.006
Neg logit contributions
 supervision-0.016
 salon-0.015
ours-0.015
 studio-0.014
anged-0.014
Token knew
Token ID2993
Feature activation-8.90e-01

Pos logit contributions
ba+0.011
ve+0.011
Mo+0.010
ty+0.010
Mill+0.010
Neg logit contributions
ours-0.007
 salon-0.007
ses-0.006
 supervision-0.006
 mouths-0.006
Token Lily
Token ID20037
Feature activation-4.96e-01

Pos logit contributions
ba+0.023
ty+0.022
 himself+0.021
Mo+0.020
ve+0.019
Neg logit contributions
 salon-0.027
 studio-0.027
ours-0.027
 supervision-0.026
ses-0.025
Token was
Token ID373
Feature activation-6.15e-01

Pos logit contributions
ty+0.010
ba+0.010
ve+0.009
 himself+0.008
Mo+0.008
Neg logit contributions
ours-0.018
 studio-0.018
 salon-0.018
 supervision-0.017
ses-0.016
Token kind
Token ID1611
Feature activation+2.61e-01

Pos logit contributions
ba+0.019
ty+0.018
ve+0.018
Mo+0.018
Ar+0.017
Neg logit contributions
 salon-0.015
ours-0.015
 supervision-0.015
 studio-0.014
ses-0.013
Token,
Token ID11
Feature activation-2.03e-01

Pos logit contributions
ours+0.012
 salon+0.011
 studio+0.011
 supervision+0.011
ses+0.010
Neg logit contributions
ty-0.003
 himself-0.003
ba-0.003
B-0.002
ve-0.002
Token but
Token ID475
Feature activation+5.31e-01

Pos logit contributions
ba+0.004
ty+0.003
ve+0.003
Mo+0.003
 himself+0.003
Neg logit contributions
ours-0.008
 salon-0.008
 studio-0.007
 supervision-0.007
ses-0.007
Token He
Token ID679
Feature activation+1.07e-01

Pos logit contributions
 salon+0.002
 supervision+0.002
ours+0.002
 studio+0.002
ses+0.002
Neg logit contributions
 himself-0.001
ty-0.001
ba-0.001
B-0.001
Mo-0.001
Token was
Token ID373
Feature activation-4.25e-01

Pos logit contributions
 salon+0.003
ours+0.003
 supervision+0.003
 studio+0.003
ses+0.003
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.003
B-0.003
Token three
Token ID1115
Feature activation+3.70e-01

Pos logit contributions
ba+0.010
ty+0.009
ve+0.009
Mo+0.009
B+0.008
Neg logit contributions
 supervision-0.014
ours-0.014
 salon-0.014
 studio-0.013
ses-0.012
Token years
Token ID812
Feature activation+1.69e+00

Pos logit contributions
ours+0.011
 salon+0.011
 supervision+0.011
 studio+0.010
 mouths+0.010
Neg logit contributions
ba-0.010
ve-0.009
ty-0.009
Mo-0.009
Ar-0.008
Token old
Token ID1468
Feature activation+6.37e-01

Pos logit contributions
 supervision+0.055
ours+0.054
 salon+0.053
ses+0.052
 studio+0.051
Neg logit contributions
ba-0.037
ty-0.036
ve-0.035
 himself-0.031
Mo-0.031
Token.
Token ID13
Feature activation-2.75e-01

Pos logit contributions
ours+0.025
 supervision+0.025
 salon+0.025
 studio+0.024
ses+0.023
Neg logit contributions
ba-0.011
ty-0.010
ve-0.009
Mo-0.009
B-0.008
Token One
Token ID1881
Feature activation+5.64e-01

Pos logit contributions
ba+0.006
ty+0.006
ve+0.005
 himself+0.005
Mo+0.005
Neg logit contributions
 salon-0.009
 supervision-0.009
ours-0.009
 studio-0.008
ses-0.008
Token day
Token ID1110
Feature activation-2.90e-01

Pos logit contributions
 supervision+0.017
 salon+0.016
ours+0.016
 studio+0.015
ses+0.015
Neg logit contributions
ba-0.015
ty-0.015
B-0.014
Mo-0.014
ve-0.014
Token Jack
Token ID3619
Feature activation+5.10e-01

Pos logit contributions
ty+0.006
ba+0.006
ve+0.006
B+0.006
 himself+0.006
Neg logit contributions
 salon-0.010
 supervision-0.010
ours-0.009
 studio-0.009
ses-0.009
Token was
Token ID373
Feature activation-7.16e-02

Pos logit contributions
 salon+0.014
ours+0.014
 supervision+0.014
 studio+0.013
ses+0.012
Neg logit contributions
ba-0.014
ty-0.014
ve-0.014
Mo-0.013
B-0.012
Token playing
Token ID2712
Feature activation-4.56e-01

Pos logit contributions
ba+0.002
ve+0.002
ty+0.002
Mo+0.002
B+0.002
Neg logit contributions
 supervision-0.002
ours-0.002
 salon-0.002
ses-0.002
 studio-0.002
Token a
Token ID257
Feature activation-2.82e-01

Pos logit contributions
ba+0.035
ve+0.033
ty+0.032
Mo+0.030
vell+0.028
Neg logit contributions
ours-0.032
 salon-0.031
 studio-0.031
 supervision-0.030
ses-0.030
Token hairy
Token ID43873
Feature activation-9.18e-01

Pos logit contributions
ba+0.005
ve+0.004
ty+0.004
Mo+0.004
B+0.004
Neg logit contributions
 supervision-0.012
ours-0.011
 salon-0.011
anged-0.011
 studio-0.011
Token bear
Token ID6842
Feature activation-3.65e-01

Pos logit contributions
ba+0.025
ty+0.024
ve+0.023
Mo+0.022
B+0.021
Neg logit contributions
 supervision-0.028
ours-0.027
 salon-0.026
 studio-0.024
 mouths-0.024
Token who
Token ID508
Feature activation+1.44e+00

Pos logit contributions
ba+0.010
ty+0.010
ve+0.009
Mo+0.009
B+0.008
Neg logit contributions
 supervision-0.011
ours-0.011
 salon-0.011
 studio-0.010
 mouths-0.010
Token lived
Token ID5615
Feature activation-9.70e-01

Pos logit contributions
 salon+0.036
ours+0.036
 supervision+0.035
 studio+0.034
ses+0.032
Neg logit contributions
ba-0.044
ty-0.042
ve-0.042
Mo-0.040
B-0.038
Token in
Token ID287
Feature activation-1.56e+00

Pos logit contributions
ba+0.024
ty+0.024
ve+0.023
Mo+0.022
Ar+0.021
Neg logit contributions
ours-0.027
 salon-0.027
 studio-0.026
apping-0.025
 supervision-0.025
Token the
Token ID262
Feature activation-3.66e-01

Pos logit contributions
ba+0.039
ve+0.038
ty+0.036
Mo+0.033
Ar+0.031
Neg logit contributions
 salon-0.047
 supervision-0.047
ours-0.046
 studio-0.045
ses-0.043
Token forest
Token ID8222
Feature activation+2.62e-02

Pos logit contributions
ba+0.010
ve+0.010
ty+0.009
Mo+0.009
Ar+0.008
Neg logit contributions
 salon-0.010
ours-0.010
 supervision-0.010
 studio-0.010
ses-0.009
Token.
Token ID13
Feature activation+3.54e-01

Pos logit contributions
 supervision+0.001
 salon+0.001
ours+0.001
 studio+0.001
ses+0.001
Neg logit contributions
ba-0.000
 himself-0.000
ty-0.000
ve-0.000
Mo-0.000
Token He
Token ID679
Feature activation-4.15e-02

Pos logit contributions
 supervision+0.013
ours+0.012
 salon+0.012
 studio+0.012
anged+0.011
Neg logit contributions
 himself-0.007
ty-0.007
Mo-0.007
ba-0.006
Ar-0.006
Token wanted
Token ID2227
Feature activation-3.59e-01

Pos logit contributions
ba+0.001
ve+0.001
ty+0.001
Mo+0.001
Mill+0.001
Neg logit contributions
 salon-0.001
ours-0.001
 studio-0.001
 supervision-0.001
ses-0.001
Token.
Token ID13
Feature activation+2.57e-01

Pos logit contributions
ours+0.013
 supervision+0.013
 salon+0.012
bug+0.012
ap+0.012
Neg logit contributions
ve-0.017
ba-0.016
Mo-0.015
ty-0.015
vell-0.015
Token Fin
Token ID4463
Feature activation+1.12e-01

Pos logit contributions
ours+0.007
ses+0.007
 supervision+0.007
 mouths+0.007
 Sharing+0.006
Neg logit contributions
ba-0.007
ve-0.007
ty-0.007
Mo-0.006
Mill-0.006
Token liked
Token ID8288
Feature activation-1.80e-01

Pos logit contributions
ours+0.002
 Sharing+0.002
ses+0.002
 supervision+0.002
 salon+0.002
Neg logit contributions
ve-0.004
ba-0.004
Mo-0.004
vell-0.004
ty-0.004
Token to
Token ID284
Feature activation-6.33e-02

Pos logit contributions
ba+0.005
ty+0.005
ve+0.005
Mo+0.005
B+0.005
Neg logit contributions
 salon-0.005
 supervision-0.005
ours-0.005
 studio-0.005
ses-0.004
Token swim
Token ID9422
Feature activation-5.95e-01

Pos logit contributions
ve+0.002
ba+0.002
ty+0.002
Mo+0.002
Mill+0.002
Neg logit contributions
 salon-0.002
ours-0.002
 studio-0.002
 supervision-0.002
ap-0.002
Token in
Token ID287
Feature activation-1.67e+00

Pos logit contributions
Mill+0.020
Mo+0.020
ve+0.019
ba+0.019
ty+0.019
Neg logit contributions
ap-0.012
 square-0.010
apped-0.010
apping-0.010
ours-0.009
Token the
Token ID262
Feature activation-8.55e-01

Pos logit contributions
ba+0.056
ve+0.055
ty+0.052
Mo+0.050
vell+0.047
Neg logit contributions
ours-0.037
 salon-0.036
 studio-0.035
 supervision-0.035
 mouths-0.033
Token big
Token ID1263
Feature activation-9.81e-01

Pos logit contributions
ba+0.025
ve+0.024
ty+0.024
Mo+0.022
Mill+0.021
Neg logit contributions
 salon-0.022
ours-0.022
 studio-0.022
 supervision-0.021
 mouths-0.020
Token blue
Token ID4171
Feature activation-9.03e-01

Pos logit contributions
ba+0.023
ty+0.022
Mo+0.020
ve+0.020
Ar+0.019
Neg logit contributions
 supervision-0.031
 salon-0.031
ours-0.031
ses-0.029
 studio-0.029
Token sea
Token ID5417
Feature activation-6.92e-01

Pos logit contributions
ba+0.026
ve+0.025
ty+0.025
Mo+0.024
B+0.022
Neg logit contributions
 salon-0.024
ours-0.024
 studio-0.023
 supervision-0.023
 mouths-0.021
Token and
Token ID290
Feature activation+3.10e-01

Pos logit contributions
ba+0.016
ty+0.015
ve+0.015
Mo+0.014
B+0.013
Neg logit contributions
 salon-0.023
ours-0.023
 supervision-0.023
 studio-0.022
ses-0.021
Token."
Token ID526
Feature activation+9.23e-03

Pos logit contributions
 salon+0.025
ours+0.025
 supervision+0.025
 studio+0.024
ses+0.023
Neg logit contributions
ba-0.013
ty-0.012
ve-0.012
Mo-0.011
B-0.010
Token\n
Token ID198
Feature activation-1.04e-01

Pos logit contributions
 salon+0.000
ours+0.000
 supervision+0.000
 studio+0.000
ses+0.000
Neg logit contributions
ba-0.000
 himself-0.000
ty-0.000
Mo-0.000
ve-0.000
Token\n
Token ID198
Feature activation-3.05e-01

Pos logit contributions
ba+0.002
ty+0.002
ve+0.002
Mo+0.002
Ar+0.002
Neg logit contributions
 salon-0.003
ours-0.003
 supervision-0.003
 studio-0.003
ses-0.003
TokenThey
Token ID2990
Feature activation-3.33e-02

Pos logit contributions
ba+0.007
ty+0.006
ve+0.006
Mo+0.006
B+0.005
Neg logit contributions
ours-0.011
 salon-0.010
 supervision-0.010
 studio-0.010
ses-0.010
Token searched
Token ID16499
Feature activation-1.14e+00

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Mo+0.001
Ar+0.001
Neg logit contributions
ours-0.001
 supervision-0.001
 salon-0.001
 studio-0.001
ses-0.001
Token everywhere
Token ID8347
Feature activation-1.66e+00

Pos logit contributions
ba+0.029
ty+0.028
ve+0.027
Mo+0.026
Ar+0.024
Neg logit contributions
 salon-0.034
ours-0.034
 supervision-0.034
 studio-0.032
ses-0.031
Token,
Token ID11
Feature activation+9.35e-01

Pos logit contributions
ba+0.036
ty+0.036
ve+0.035
Mo+0.031
 himself+0.031
Neg logit contributions
 salon-0.058
ours-0.057
 supervision-0.055
 studio-0.053
ses-0.051
Token but
Token ID475
Feature activation+4.84e-01

Pos logit contributions
ours+0.034
 salon+0.034
 supervision+0.034
 studio+0.033
ses+0.031
Neg logit contributions
ba-0.018
ty-0.017
ve-0.016
Mo-0.015
 himself-0.013
Token they
Token ID484
Feature activation-1.17e-01

Pos logit contributions
ours+0.014
 supervision+0.014
 salon+0.013
 studio+0.013
apping+0.012
Neg logit contributions
ba-0.013
ty-0.012
ve-0.012
Mo-0.012
 himself-0.011
Token couldn
Token ID3521
Feature activation-1.05e+00

Pos logit contributions
ba+0.003
ty+0.003
ve+0.003
Mo+0.003
B+0.003
Neg logit contributions
ours-0.003
 salon-0.003
 supervision-0.003
 studio-0.003
ses-0.003
Token't
Token ID470
Feature activation-6.47e-02

Pos logit contributions
ba+0.015
ty+0.014
ve+0.013
Mo+0.012
B+0.011
Neg logit contributions
 salon-0.043
ours-0.043
 supervision-0.042
 studio-0.041
ses-0.040
Token her
Token ID607
Feature activation-2.42e-01

Pos logit contributions
 supervision+0.003
ours+0.003
 salon+0.003
 studio+0.002
 mouths+0.002
Neg logit contributions
ba-0.002
ve-0.002
ty-0.002
Mo-0.002
Ar-0.002
Token.
Token ID13
Feature activation-4.70e-02

Pos logit contributions
ba+0.005
ve+0.005
ty+0.005
Mo+0.004
 himself+0.004
Neg logit contributions
ours-0.009
 salon-0.009
 supervision-0.008
 studio-0.008
 mouths-0.008
Token "
Token ID366
Feature activation-6.48e-01

Pos logit contributions
ty+0.001
Mo+0.001
ve+0.001
ba+0.001
 himself+0.001
Neg logit contributions
 salon-0.002
 supervision-0.002
ours-0.001
 studio-0.001
ses-0.001
TokenWhat
Token ID2061
Feature activation+1.08e+00

Pos logit contributions
ve+0.017
ba+0.016
ty+0.016
cast+0.014
vell+0.014
Neg logit contributions
 salon-0.018
bug-0.017
 supervision-0.017
 studio-0.016
 Sharing-0.016
Token's
Token ID338
Feature activation-4.91e-01

Pos logit contributions
 salon+0.033
 supervision+0.033
ours+0.033
 studio+0.032
anged+0.031
Neg logit contributions
ba-0.026
ty-0.025
ve-0.024
Mo-0.022
B-0.022
Token wrong
Token ID2642
Feature activation-1.57e+00

Pos logit contributions
ba+0.016
ty+0.016
ve+0.015
Mo+0.015
B+0.014
Neg logit contributions
 salon-0.011
ours-0.011
 supervision-0.011
 studio-0.011
ses-0.010
Token?"
Token ID1701
Feature activation+1.41e-01

Pos logit contributions
ba+0.035
ve+0.033
ty+0.033
Mo+0.031
B+0.029
Neg logit contributions
 salon-0.052
ours-0.052
 supervision-0.050
 studio-0.050
ses-0.047
Token he
Token ID339
Feature activation-9.10e-01

Pos logit contributions
 salon+0.005
ours+0.004
 supervision+0.004
 studio+0.004
ses+0.004
Neg logit contributions
ba-0.003
ty-0.003
ve-0.003
Mo-0.003
B-0.003
Token asked
Token ID1965
Feature activation-5.25e-01

Pos logit contributions
ve+0.022
ba+0.021
ty+0.021
vell+0.020
Mo+0.019
Neg logit contributions
 salon-0.025
 supervision-0.024
ours-0.024
 studio-0.024
 gust-0.023
Token.
Token ID13
Feature activation+4.39e-01

Pos logit contributions
ba+0.011
ty+0.011
ve+0.010
Mo+0.010
 himself+0.010
Neg logit contributions
ours-0.019
 salon-0.019
 supervision-0.018
 studio-0.018
ses-0.017
Token\n
Token ID198
Feature activation+3.83e-01

Pos logit contributions
 salon+0.012
ours+0.012
 supervision+0.012
 studio+0.011
ses+0.011
Neg logit contributions
ba-0.012
ty-0.012
ve-0.012
Mo-0.011
B-0.011
Token\n
Token ID198
Feature activation-4.08e-02

Pos logit contributions
Mo+0.000
ve+0.000
ba+0.000
ty+0.000
Mill+0.000
Neg logit contributions
 salon-0.000
 supervision-0.000
ours-0.000
 studio-0.000
ses-0.000
Token\n
Token ID198
Feature activation+5.84e-03

Pos logit contributions
ba+0.001
ty+0.001
ve+0.001
Mo+0.001
vell+0.001
Neg logit contributions
ours-0.001
 studio-0.001
 salon-0.001
 supervision-0.001
 �-0.001
TokenWhen
Token ID2215
Feature activation-8.24e-01

Pos logit contributions
 salon+0.000
ours+0.000
 supervision+0.000
 studio+0.000
ses+0.000
Neg logit contributions
ba-0.000
ty-0.000
ve-0.000
Mo-0.000
Ar-0.000
Token Lou
Token ID4768
Feature activation+1.40e-01

Pos logit contributions
ty+0.020
Mo+0.020
ve+0.019
ba+0.019
 himself+0.017
Neg logit contributions
 supervision-0.027
 salon-0.027
ours-0.027
ses-0.024
anged-0.023
Token got
Token ID1392
Feature activation-7.04e-01

Pos logit contributions
 salon+0.004
 supervision+0.003
ours+0.003
 studio+0.003
anged+0.003
Neg logit contributions
ba-0.004
ve-0.004
Mo-0.004
ty-0.004
Mill-0.004
Token home
Token ID1363
Feature activation-2.02e+00

Pos logit contributions
ba+0.015
ty+0.014
ve+0.014
 himself+0.013
Mo+0.013
Neg logit contributions
 supervision-0.024
ours-0.024
 salon-0.024
ses-0.024
 studio-0.023
Token,
Token ID11
Feature activation-3.84e-01

Pos logit contributions
ba+0.056
ty+0.054
ve+0.053
Mo+0.050
 himself+0.047
Neg logit contributions
ours-0.056
 salon-0.055
 supervision-0.055
 studio-0.052
ses-0.051
Token she
Token ID673
Feature activation-1.51e-01

Pos logit contributions
 himself+0.009
ba+0.009
ty+0.009
Mo+0.008
ve+0.008
Neg logit contributions
ours-0.013
 salon-0.012
 supervision-0.012
ses-0.012
anged-0.011
Token told
Token ID1297
Feature activation-2.88e-01

Pos logit contributions
ba+0.004
ty+0.004
ve+0.004
Mo+0.004
B+0.003
Neg logit contributions
 salon-0.004
ours-0.004
 supervision-0.004
 studio-0.004
ses-0.004
Token her
Token ID607
Feature activation+4.15e-01

Pos logit contributions
ty+0.005
ba+0.005
 himself+0.004
ve+0.004
Mo+0.004
Neg logit contributions
ours-0.011
 salon-0.011
 supervision-0.011
 studio-0.011
ses-0.010
Token mom
Token ID1995
Feature activation-6.91e-01

Pos logit contributions
 supervision+0.015
ses+0.014
ours+0.014
anged+0.014
 slot+0.014
Neg logit contributions
 himself-0.008
ty-0.007
ba-0.007
ve-0.007
 cool-0.007