TOP ACTIVATIONS
MAX = 0.077

re welcome , my de ars . Enjoy your cake
" Thank you , my de ars . I appreciate your
her . Rel uct antly , Elsa left
, the farmer in the de ll ". Suddenly , he
" Thank you , my de ars ," she said .
hear it ?" Rel ieved , Tim my climbed
very intelligent and curious three year old girl . She loved
very happy and energetic 3 year old boy . He was
mom my and a 3 year old child . The mom
" Hello , my de ars ," she said .
. Soon , more cater pill ars arrived . They also
J ill was a 3 year old girl with a lot
good tip for you . Rel iable people are the best
Mel issa was a 3 year old girl who loved to
. " Hello , my de ars ! What a nice
's courage . The two de ers became good friends after
ily liked to keep cater pill ars in a jar .
<|endoftext|> John was a 3 year old boy who lived in
squ eal ed with g lee and reached out to take
" Hello , my de ars . What do you

MIN ACTIVATIONS
MIN = -0.079

. One day , Sp ide y was trying to make
that day on , Sp ide y knew that even when
that day on , Sp ide y made sure to keep
. Every day , Sp ide y would spin cob we
strong . One day , Frost y met a little girl
. One day , Sp ide y noticed that there was
that day on , Sp ide y knew that even when
rest . One day , Buzz y was flying around looking
to get home . Sp ide y cared about the bug
with me ?" R abbit thought for a moment
how to get home . Sp ide y cared about the
bees . Every day , Buzz y would fly around and
. One day , Sp ide y woke up and wanted
flowers . One day , Buzz y flew into a sticky
you again ." R abb ie forg ave Fox
From that day on , Sp ide y made sure to
After helping the bug , Sp ide y felt happy too
From that day on , Sp ide y knew that even
web went flying . Sp ide y held on tight and
's web went flying . Sp ide y held on tight

INTERVAL 0.038 - 0.077
CONTAINS 0.536%

surfing . He was so en vious that he could hardly
way to use it ! ” Baby listened to her
!" His mom smiled and said , " You 're welcome
mom smiled and said " Yes , it does ! And
The other animals were un imp ressed and scoff ed at

INTERVAL -0.001 - 0.038
CONTAINS 52.593%

down the street . The son was amazed at
, this place is quite impressive ! There must be something
told all his friends that the turkey was not dirty ,
he sat down next to the rabbit and they sat there
You want to share your doll with me ?"

INTERVAL -0.040 - -0.001
CONTAINS 46.518%

time . <|endoftext|> Once upon a time , there was a
The crab did not answer . He moved his claws and
may not be what is best for others . <|endoftext|> Once
and Tim my just shrugged his shoulders .
Silver is a shiny metal . It will help

INTERVAL -0.079 - -0.040
CONTAINS 0.354%

that could unlock things . Nem o asked his dad for
Snow y . And the living room . Mom will be
. He plugged the mixer into the wall and turned it
Suddenly , the flour flew out of the bowl and made
neighbour replied , sadly , " It 's the sk unk
Token
Token ID8151
Feature activation+4.30e-02

Pos logit contributions
ra+0.018
 magn+0.018
ast+0.017
 Fre+0.017
 inh+0.016
Neg logit contributions
rell-0.021
 better-0.020
 safer-0.019
 nicer-0.018
 company-0.018
Tokenre
Token ID260
Feature activation+1.78e-02

Pos logit contributions
 better+0.069
rell+0.068
 nicer+0.067
 safer+0.066
 company+0.062
Neg logit contributions
ra-0.066
 Fre-0.064
ert-0.062
ast-0.062
 magn-0.062
Token welcome
Token ID7062
Feature activation+8.27e-03

Pos logit contributions
rell+0.023
 better+0.021
ock+0.020
 nicer+0.020
 safer+0.020
Neg logit contributions
ra-0.034
 Fre-0.033
ast-0.033
 magn-0.032
 Miles-0.031
Token,
Token ID11
Feature activation-1.87e-02

Pos logit contributions
rell+0.011
 better+0.011
 safer+0.010
 nicer+0.010
ock+0.009
Neg logit contributions
ra-0.015
 magn-0.015
 Fre-0.015
ast-0.015
 inh-0.014
Token my
Token ID616
Feature activation+5.57e-03

Pos logit contributions
ra+0.028
 magn+0.028
 Fre+0.027
ast+0.027
 inh+0.026
Neg logit contributions
rell-0.031
 better-0.031
 safer-0.028
 nicer-0.028
 company-0.027
Token de
Token ID390
Feature activation+7.66e-02

Pos logit contributions
rell+0.009
 better+0.008
 safer+0.007
 nicer+0.007
ock+0.007
Neg logit contributions
ra-0.010
 magn-0.009
ast-0.009
 Fre-0.009
 inh-0.009
Tokenars
Token ID945
Feature activation+9.62e-03

Pos logit contributions
rell+0.122
 safer+0.099
 nicer+0.099
 better+0.095
 ambulance+0.093
Neg logit contributions
ra-0.126
ast-0.119
 magn-0.112
ix-0.110
Tony-0.109
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
rell+0.014
 better+0.013
 safer+0.013
 nicer+0.012
 less+0.011
Neg logit contributions
ra-0.017
 magn-0.016
ast-0.016
 Fre-0.016
 inh-0.015
Token Enjoy
Token ID18179
Feature activation-3.84e-03

Pos logit contributions
rell+0.021
 better+0.020
 safer+0.019
 nicer+0.018
 company+0.018
Neg logit contributions
ra-0.020
 magn-0.020
 Fre-0.019
ast-0.019
 inh-0.019
Token your
Token ID534
Feature activation+2.73e-03

Pos logit contributions
 magn+0.007
ra+0.007
 Fre+0.007
 inh+0.006
ast+0.006
Neg logit contributions
 better-0.005
rell-0.005
 nicer-0.005
 safer-0.005
 company-0.004
Token cake
Token ID12187
Feature activation-3.31e-03

Pos logit contributions
rell+0.004
 better+0.004
 safer+0.004
ock+0.004
 nicer+0.004
Neg logit contributions
ra-0.004
 magn-0.004
ast-0.004
 inh-0.004
 Fre-0.004
Token "
Token ID366
Feature activation-1.26e-02

Pos logit contributions
ra+0.012
 magn+0.012
ast+0.012
 Fre+0.012
 inh+0.011
Neg logit contributions
rell-0.015
 better-0.014
 safer-0.013
 nicer-0.013
 company-0.013
TokenThank
Token ID10449
Feature activation+1.56e-02

Pos logit contributions
ra+0.018
 magn+0.017
ast+0.017
 Fre+0.017
 inh+0.016
Neg logit contributions
rell-0.023
 better-0.022
 safer-0.021
 nicer-0.020
 company-0.020
Token you
Token ID345
Feature activation+1.55e-02

Pos logit contributions
rell+0.025
 better+0.022
 safer+0.021
 nicer+0.020
 company+0.019
Neg logit contributions
ra-0.027
 Fre-0.025
 magn-0.025
ast-0.025
 Miles-0.024
Token,
Token ID11
Feature activation-2.24e-02

Pos logit contributions
rell+0.028
vious+0.023
 safer+0.023
 better+0.023
ock+0.023
Neg logit contributions
ast-0.022
ra-0.022
 Fre-0.021
 Mason-0.021
 M-0.021
Token my
Token ID616
Feature activation+4.30e-04

Pos logit contributions
ra+0.038
 magn+0.037
 Fre+0.036
ast+0.036
 inh+0.035
Neg logit contributions
rell-0.034
 better-0.034
 safer-0.031
 nicer-0.030
 less-0.029
Token de
Token ID390
Feature activation+7.13e-02

Pos logit contributions
rell+0.001
 safer+0.001
 meantime+0.001
 better+0.001
ock+0.001
Neg logit contributions
ra-0.001
ast-0.001
 magn-0.001
 El-0.001
 inh-0.001
Tokenars
Token ID945
Feature activation+2.20e-02

Pos logit contributions
rell+0.114
 meantime+0.084
 safer+0.083
 nicer+0.083
 better+0.082
Neg logit contributions
ra-0.122
ast-0.119
 magn-0.112
ix-0.108
 cent-0.106
Token.
Token ID13
Feature activation-3.70e-03

Pos logit contributions
rell+0.034
 safer+0.031
 nicer+0.030
 better+0.030
 meantime+0.028
Neg logit contributions
ra-0.036
ast-0.034
 magn-0.033
 Tony-0.032
 Fre-0.032
Token I
Token ID314
Feature activation-2.67e-02

Pos logit contributions
ra+0.006
 magn+0.006
ast+0.006
 Fre+0.006
 inh+0.006
Neg logit contributions
rell-0.005
 better-0.005
 safer-0.005
 nicer-0.005
 company-0.005
Token appreciate
Token ID9144
Feature activation-1.40e-03

Pos logit contributions
ra+0.041
ast+0.039
 magn+0.039
 Fre+0.037
 inh+0.037
Neg logit contributions
rell-0.047
 better-0.040
 safer-0.040
 nicer-0.040
ock-0.038
Token your
Token ID534
Feature activation-7.32e-03

Pos logit contributions
ra+0.002
 magn+0.002
ast+0.002
 Fre+0.002
 inh+0.002
Neg logit contributions
rell-0.002
 better-0.002
 safer-0.002
 nicer-0.002
 company-0.002
Token her
Token ID607
Feature activation+7.40e-03

Pos logit contributions
rell+0.018
 better+0.017
 safer+0.017
 nicer+0.016
 company+0.016
Neg logit contributions
 magn-0.010
ra-0.010
 Fre-0.009
ast-0.009
 inh-0.009
Token.
Token ID13
Feature activation+1.89e-03

Pos logit contributions
rell+0.010
 better+0.010
 safer+0.009
 nicer+0.008
 company+0.008
Neg logit contributions
ra-0.014
 magn-0.014
ast-0.013
 Fre-0.013
 inh-0.013
Token
Token ID220
Feature activation+5.04e-03

Pos logit contributions
rell+0.003
 better+0.003
 safer+0.003
 nicer+0.003
 less+0.002
Neg logit contributions
ra-0.003
 magn-0.003
ast-0.003
 Fre-0.003
 inh-0.003
Token
Token ID198
Feature activation-9.43e-03

Pos logit contributions
rell+0.009
 better+0.009
 nicer+0.008
 safer+0.008
 company+0.008
Neg logit contributions
ra-0.007
 magn-0.007
 Fre-0.007
ast-0.007
 Miles-0.006
Token
Token ID198
Feature activation-1.23e-02

Pos logit contributions
ra+0.014
 magn+0.014
ast+0.014
 Fre+0.014
 inh+0.013
Neg logit contributions
rell-0.016
 better-0.015
 safer-0.014
 nicer-0.014
 company-0.013
TokenRel
Token ID6892
Feature activation+6.97e-02

Pos logit contributions
ra+0.019
 magn+0.018
ast+0.018
 Fre+0.018
 inh+0.017
Neg logit contributions
rell-0.021
 better-0.019
 safer-0.018
 nicer-0.018
 company-0.017
Tokenuct
Token ID4782
Feature activation-8.78e-03

Pos logit contributions
rell+0.112
 company+0.102
 better+0.099
 safer+0.097
 less+0.094
Neg logit contributions
ast-0.115
ra-0.114
 Fre-0.108
Tony-0.105
amm-0.105
Tokenantly
Token ID3875
Feature activation+5.45e-03

Pos logit contributions
 magn+0.012
 Fre+0.012
ast+0.012
 inh+0.012
ra+0.011
Neg logit contributions
 better-0.015
rell-0.015
 nicer-0.014
 safer-0.014
 less-0.013
Token,
Token ID11
Feature activation-1.74e-02

Pos logit contributions
rell+0.006
 better+0.006
 safer+0.005
 nicer+0.005
 less+0.005
Neg logit contributions
ra-0.011
 magn-0.011
ast-0.011
 Fre-0.011
ert-0.011
Token Elsa
Token ID19226
Feature activation-3.65e-04

Pos logit contributions
ra+0.028
ert+0.027
ast+0.027
 magn+0.027
udge+0.027
Neg logit contributions
 better-0.030
rell-0.025
 safer-0.025
 less-0.025
 nicer-0.025
Token left
Token ID1364
Feature activation-1.24e-02

Pos logit contributions
 magn+0.001
ra+0.001
 Fre+0.001
udge+0.001
ert+0.001
Neg logit contributions
 better-0.001
rell-0.001
 nicer-0.000
 safer-0.000
 safe-0.000
Token,
Token ID11
Feature activation-9.10e-03

Pos logit contributions
rell+0.071
 better+0.066
 safer+0.063
 nicer+0.061
ock+0.058
Neg logit contributions
ra-0.073
 Fre-0.072
 magn-0.070
ast-0.069
 Miles-0.068
Token the
Token ID262
Feature activation+4.03e-02

Pos logit contributions
ra+0.015
 magn+0.014
ast+0.014
 inh+0.013
udge+0.013
Neg logit contributions
 better-0.015
rell-0.014
 safer-0.014
 nicer-0.013
 less-0.013
Token farmer
Token ID18739
Feature activation+1.90e-02

Pos logit contributions
rell+0.061
 better+0.054
 safer+0.053
 nicer+0.051
ock+0.048
Neg logit contributions
 Fre-0.066
 magn-0.064
 Miles-0.064
ra-0.064
 Mason-0.063
Token in
Token ID287
Feature activation-1.04e-02

Pos logit contributions
rell+0.026
 better+0.025
 safer+0.023
 nicer+0.022
 company+0.021
Neg logit contributions
ra-0.035
 magn-0.034
 Fre-0.033
ast-0.033
 inh-0.032
Token the
Token ID262
Feature activation+1.99e-02

Pos logit contributions
ra+0.017
 magn+0.016
ast+0.016
 Fre+0.016
 inh+0.015
Neg logit contributions
rell-0.017
 better-0.016
 safer-0.015
 nicer-0.015
 company-0.014
Token de
Token ID390
Feature activation+6.92e-02

Pos logit contributions
rell+0.036
 better+0.032
ock+0.031
 safer+0.031
 nicer+0.030
Neg logit contributions
 magn-0.028
ra-0.027
 Fre-0.026
 inh-0.025
ast-0.025
Tokenll
Token ID297
Feature activation+6.22e-03

Pos logit contributions
rell+0.120
 nicer+0.107
 better+0.106
 safer+0.106
 less+0.103
Neg logit contributions
ra-0.102
Tony-0.096
ix-0.094
ast-0.093
isa-0.093
Token".
Token ID1911
Feature activation+9.43e-03

Pos logit contributions
rell+0.009
 better+0.009
 safer+0.009
 nicer+0.008
 less+0.008
Neg logit contributions
ra-0.011
ast-0.010
 magn-0.010
 Fre-0.010
 inh-0.010
Token Suddenly
Token ID24975
Feature activation-1.64e-03

Pos logit contributions
rell+0.014
 better+0.014
 safer+0.012
 less+0.012
 nicer+0.012
Neg logit contributions
ra-0.017
 magn-0.016
ast-0.016
 Fre-0.015
 inh-0.015
Token,
Token ID11
Feature activation-1.17e-02

Pos logit contributions
 magn+0.003
ra+0.003
 Fre+0.003
 Miles+0.003
 inh+0.003
Neg logit contributions
rell-0.002
 better-0.002
 safer-0.002
 nicer-0.002
 company-0.002
Token he
Token ID339
Feature activation-2.02e-02

Pos logit contributions
ra+0.020
ast+0.019
 magn+0.019
 Fre+0.018
 inh+0.018
Neg logit contributions
 better-0.018
rell-0.017
 safer-0.016
 nicer-0.016
 less-0.015
Token"
Token ID1
Feature activation-1.48e-02

Pos logit contributions
ra+0.009
ast+0.009
 magn+0.008
Tony+0.008
Jul+0.008
Neg logit contributions
rell-0.012
 better-0.011
 nicer-0.010
 company-0.010
 safer-0.010
TokenThank
Token ID10449
Feature activation+1.11e-02

Pos logit contributions
ra+0.025
 magn+0.024
ast+0.024
 Fre+0.023
 inh+0.023
Neg logit contributions
 better-0.023
rell-0.023
 safer-0.021
 nicer-0.021
 less-0.020
Token you
Token ID345
Feature activation+1.50e-02

Pos logit contributions
rell+0.016
 better+0.015
 safer+0.014
 nicer+0.014
 company+0.013
Neg logit contributions
ra-0.020
 magn-0.019
 Fre-0.019
ast-0.019
 inh-0.018
Token,
Token ID11
Feature activation-2.94e-02

Pos logit contributions
rell+0.025
 better+0.022
 safer+0.021
 nicer+0.021
ock+0.021
Neg logit contributions
ra-0.023
 Fre-0.023
ast-0.023
 magn-0.022
 Cic-0.022
Token my
Token ID616
Feature activation-5.14e-03

Pos logit contributions
ra+0.049
 magn+0.048
ast+0.046
 Fre+0.046
 inh+0.045
Neg logit contributions
rell-0.046
 better-0.045
 safer-0.041
 nicer-0.040
 company-0.039
Token de
Token ID390
Feature activation+6.88e-02

Pos logit contributions
ra+0.008
 magn+0.008
ast+0.007
 inh+0.007
 M+0.007
Neg logit contributions
rell-0.009
ock-0.008
 safer-0.008
 better-0.008
 meantime-0.007
Tokenars
Token ID945
Feature activation+2.57e-02

Pos logit contributions
rell+0.107
 meantime+0.081
 nicer+0.080
 better+0.079
 refreshed+0.079
Neg logit contributions
ra-0.118
 magn-0.113
ast-0.112
 cent-0.105
ix-0.104
Token,"
Token ID553
Feature activation-2.78e-02

Pos logit contributions
rell+0.039
 safer+0.035
 nicer+0.035
 better+0.034
 meantime+0.032
Neg logit contributions
ra-0.043
 Fre-0.040
ast-0.039
 magn-0.039
 Miles-0.039
Token she
Token ID673
Feature activation+5.04e-05

Pos logit contributions
ra+0.050
ast+0.048
 magn+0.048
 Fre+0.046
ert+0.046
Neg logit contributions
 better-0.041
rell-0.038
 safer-0.037
 nicer-0.036
 less-0.035
Token said
Token ID531
Feature activation+2.76e-02

Pos logit contributions
rell+0.000
 better+0.000
 safer+0.000
 nicer+0.000
 company+0.000
Neg logit contributions
ra-0.000
 magn-0.000
ast-0.000
 Fre-0.000
 inh-0.000
Token.
Token ID13
Feature activation-2.93e-02

Pos logit contributions
rell+0.043
 meantime+0.038
 safer+0.037
ock+0.037
 better+0.036
Neg logit contributions
ra-0.046
 magn-0.044
ast-0.043
 inh-0.042
 Fre-0.042
Token hear
Token ID3285
Feature activation+4.56e-03

Pos logit contributions
rell+0.002
 better+0.002
 safer+0.002
 nicer+0.002
 company+0.002
Neg logit contributions
ra-0.003
 magn-0.003
ast-0.003
 Fre-0.003
 inh-0.003
Token it
Token ID340
Feature activation+5.87e-03

Pos logit contributions
rell+0.007
 better+0.007
 safer+0.006
 nicer+0.006
 company+0.006
Neg logit contributions
ra-0.008
 magn-0.008
 Fre-0.007
ast-0.007
 inh-0.007
Token?"
Token ID1701
Feature activation-7.34e-03

Pos logit contributions
rell+0.009
 better+0.008
 safer+0.008
 nicer+0.008
 company+0.007
Neg logit contributions
ra-0.009
ast-0.009
 Fre-0.009
 magn-0.009
 inh-0.009
Token
Token ID198
Feature activation+4.02e-03

Pos logit contributions
ra+0.013
 magn+0.013
ast+0.013
 Fre+0.012
 inh+0.012
Neg logit contributions
 better-0.010
rell-0.010
 safer-0.009
 less-0.009
 nicer-0.009
Token
Token ID198
Feature activation+5.22e-03

Pos logit contributions
rell+0.008
 better+0.007
 nicer+0.007
 safer+0.007
 company+0.007
Neg logit contributions
ra-0.005
 Fre-0.005
 magn-0.005
ast-0.005
 Miles-0.005
TokenRel
Token ID6892
Feature activation+6.83e-02

Pos logit contributions
rell+0.010
 better+0.010
 safer+0.009
 nicer+0.009
 company+0.009
Neg logit contributions
ra-0.007
ast-0.006
 Fre-0.006
Tony-0.006
 magn-0.006
Tokenieved
Token ID39591
Feature activation-2.77e-02

Pos logit contributions
rell+0.114
 company+0.106
 safer+0.099
 better+0.098
 less+0.097
Neg logit contributions
ra-0.108
ast-0.108
 Fre-0.103
amm-0.098
ert-0.098
Token,
Token ID11
Feature activation-1.10e-02

Pos logit contributions
ra+0.063
 magn+0.061
ast+0.060
 inh+0.059
isa+0.058
Neg logit contributions
 better-0.026
rell-0.022
 safer-0.022
 less-0.021
 nicer-0.020
Token Tim
Token ID5045
Feature activation-1.24e-02

Pos logit contributions
ra+0.021
ast+0.021
 magn+0.020
alph+0.020
udge+0.020
Neg logit contributions
 better-0.016
rell-0.014
 less-0.013
 safer-0.013
 safe-0.013
Tokenmy
Token ID1820
Feature activation+6.03e-03

Pos logit contributions
ra+0.028
ast+0.027
 magn+0.027
 Fre+0.027
 inh+0.026
Neg logit contributions
rell-0.012
 better-0.011
 safer-0.010
 nicer-0.010
 company-0.009
Token climbed
Token ID19952
Feature activation-1.58e-02

Pos logit contributions
 better+0.009
rell+0.009
 nicer+0.008
 safer+0.008
 safe+0.008
Neg logit contributions
 magn-0.010
ra-0.010
 Fre-0.009
udge-0.009
ert-0.009
Token very
Token ID845
Feature activation+3.78e-03

Pos logit contributions
ra+0.010
 magn+0.010
 Fre+0.010
ert+0.009
ast+0.009
Neg logit contributions
 better-0.008
rell-0.007
 safer-0.007
 nicer-0.007
 safe-0.007
Token intelligent
Token ID12661
Feature activation+1.63e-02

Pos logit contributions
rell+0.005
 better+0.005
 safer+0.004
ock+0.004
 nicer+0.004
Neg logit contributions
ra-0.007
 magn-0.007
ast-0.007
 Fre-0.007
 inh-0.007
Token and
Token ID290
Feature activation+2.96e-02

Pos logit contributions
 better+0.025
rell+0.024
 safer+0.022
 nicer+0.022
 less+0.021
Neg logit contributions
ra-0.027
 magn-0.027
ast-0.026
 Fre-0.025
 inh-0.025
Token curious
Token ID11040
Feature activation+2.82e-02

Pos logit contributions
rell+0.048
 better+0.042
ock+0.040
 safer+0.040
 nicer+0.039
Neg logit contributions
 magn-0.047
ra-0.047
 Fre-0.046
ast-0.045
 inh-0.045
Token three
Token ID1115
Feature activation+8.69e-03

Pos logit contributions
rell+0.053
 better+0.052
 safer+0.049
 nicer+0.048
 company+0.046
Neg logit contributions
ra-0.037
 magn-0.036
 Fre-0.035
ast-0.035
 inh-0.034
Token year
Token ID614
Feature activation+6.82e-02

Pos logit contributions
rell+0.013
 better+0.013
 safer+0.012
 nicer+0.011
 company+0.011
Neg logit contributions
ra-0.015
 magn-0.014
ast-0.014
 Fre-0.014
 inh-0.014
Token old
Token ID1468
Feature activation+3.16e-02

Pos logit contributions
rell+0.093
 better+0.084
 meantime+0.083
 safer+0.079
 company+0.076
Neg logit contributions
 Fre-0.123
ra-0.121
ert-0.120
 El-0.118
ude-0.117
Token girl
Token ID2576
Feature activation+2.12e-02

Pos logit contributions
rell+0.043
 better+0.041
 safer+0.038
 nicer+0.037
 company+0.035
Neg logit contributions
ra-0.058
 magn-0.057
 Fre-0.056
ast-0.055
 inh-0.054
Token.
Token ID13
Feature activation+1.48e-02

Pos logit contributions
rell+0.024
 better+0.024
 safer+0.021
 nicer+0.020
 company+0.019
Neg logit contributions
ra-0.043
 magn-0.042
ast-0.041
 Fre-0.041
 inh-0.040
Token She
Token ID1375
Feature activation-1.38e-03

Pos logit contributions
 better+0.026
rell+0.026
 safer+0.025
 nicer+0.025
 company+0.024
Neg logit contributions
 Fre-0.020
ra-0.019
 El-0.018
 magn-0.018
ast-0.018
Token loved
Token ID6151
Feature activation+6.71e-04

Pos logit contributions
 magn+0.003
ra+0.002
udge+0.002
 Cic+0.002
 Fre+0.002
Neg logit contributions
 better-0.002
rell-0.002
 safer-0.001
 safe-0.001
 nicer-0.001
Token very
Token ID845
Feature activation-7.22e-03

Pos logit contributions
ra+0.021
 magn+0.020
 Fre+0.020
ast+0.020
ert+0.019
Neg logit contributions
 better-0.016
rell-0.015
 safer-0.014
 nicer-0.014
 less-0.013
Token happy
Token ID3772
Feature activation-1.32e-02

Pos logit contributions
ra+0.014
 magn+0.014
ast+0.013
 Fre+0.013
 inh+0.013
Neg logit contributions
rell-0.010
 better-0.009
 safer-0.008
ock-0.008
 nicer-0.008
Token and
Token ID290
Feature activation+3.60e-02

Pos logit contributions
ra+0.024
 magn+0.023
 Fre+0.023
ast+0.023
ert+0.022
Neg logit contributions
 better-0.019
 end-0.016
rell-0.016
 safe-0.016
 company-0.016
Token energetic
Token ID26758
Feature activation+2.91e-03

Pos logit contributions
rell+0.066
 better+0.058
 safer+0.058
ock+0.057
 nicer+0.057
Neg logit contributions
 magn-0.048
 Fre-0.047
 inh-0.047
ast-0.047
 Tony-0.046
Token 3
Token ID513
Feature activation+1.85e-02

Pos logit contributions
 better+0.004
rell+0.004
 company+0.004
 safer+0.004
 nicer+0.004
Neg logit contributions
ra-0.005
 magn-0.005
 Fre-0.005
ast-0.005
 inh-0.004
Token year
Token ID614
Feature activation+6.76e-02

Pos logit contributions
rell+0.032
 better+0.030
 safer+0.029
 nicer+0.028
 meantime+0.026
Neg logit contributions
ra-0.028
ast-0.026
 magn-0.026
 Fre-0.026
 inh-0.026
Token old
Token ID1468
Feature activation+2.95e-02

Pos logit contributions
rell+0.097
 better+0.090
 meantime+0.089
 safer+0.087
 company+0.081
Neg logit contributions
ra-0.109
 Fre-0.109
ert-0.108
ast-0.106
 El-0.106
Token boy
Token ID2933
Feature activation+1.62e-02

Pos logit contributions
rell+0.042
 better+0.038
 safer+0.036
 nicer+0.035
ock+0.033
Neg logit contributions
ra-0.054
 magn-0.052
 Fre-0.051
ast-0.051
 inh-0.050
Token.
Token ID13
Feature activation+1.16e-02

Pos logit contributions
rell+0.020
 better+0.019
 safer+0.017
 nicer+0.017
 company+0.016
Neg logit contributions
ra-0.032
 magn-0.031
 Fre-0.031
ast-0.031
 inh-0.030
Token He
Token ID679
Feature activation-6.59e-03

Pos logit contributions
rell+0.017
 better+0.017
 safer+0.015
 nicer+0.015
 less+0.014
Neg logit contributions
ra-0.020
 magn-0.020
ast-0.019
 Fre-0.019
 inh-0.018
Token was
Token ID373
Feature activation+3.06e-03

Pos logit contributions
ra+0.011
 magn+0.011
ast+0.010
 Fre+0.010
udge+0.010
Neg logit contributions
 better-0.010
rell-0.010
 safer-0.009
 nicer-0.009
 less-0.009
Token mom
Token ID1995
Feature activation+1.72e-02

Pos logit contributions
ra+0.012
 magn+0.011
 Fre+0.011
ast+0.011
 inh+0.011
Neg logit contributions
 better-0.011
rell-0.011
 safer-0.010
 nicer-0.010
 company-0.010
Tokenmy
Token ID1820
Feature activation+2.71e-03

Pos logit contributions
rell+0.021
 better+0.019
 safer+0.018
 nicer+0.017
 company+0.016
Neg logit contributions
ra-0.034
ast-0.034
 magn-0.033
 inh-0.032
 Fre-0.032
Token and
Token ID290
Feature activation+1.45e-02

Pos logit contributions
 better+0.004
rell+0.004
 safer+0.004
 nicer+0.004
ock+0.004
Neg logit contributions
 magn-0.004
ra-0.004
 Fre-0.004
ast-0.004
Tony-0.004
Token a
Token ID257
Feature activation+5.38e-04

Pos logit contributions
rell+0.031
 better+0.027
 safer+0.027
ock+0.027
 nicer+0.026
Neg logit contributions
ra-0.016
 magn-0.015
 inh-0.015
 Fre-0.015
 Cic-0.014
Token 3
Token ID513
Feature activation-4.73e-04

Pos logit contributions
rell+0.001
 better+0.001
 safer+0.001
 nicer+0.001
 company+0.001
Neg logit contributions
ra-0.001
 magn-0.001
ast-0.001
 Fre-0.001
 inh-0.001
Token year
Token ID614
Feature activation+6.75e-02

Pos logit contributions
ra+0.001
 magn+0.001
 Fre+0.001
ast+0.001
 Miles+0.001
Neg logit contributions
rell-0.001
 better-0.001
 company-0.001
 nicer-0.001
 safer-0.001
Token old
Token ID1468
Feature activation+1.75e-02

Pos logit contributions
rell+0.113
 meantime+0.105
 better+0.099
 safer+0.098
ock+0.097
Neg logit contributions
ra-0.090
yl-0.088
 Fre-0.087
ast-0.087
amm-0.086
Token child
Token ID1200
Feature activation+3.07e-02

Pos logit contributions
rell+0.030
 better+0.029
 safer+0.027
 nicer+0.027
 company+0.026
Neg logit contributions
ra-0.026
 magn-0.025
 Fre-0.024
ast-0.024
 inh-0.024
Token.
Token ID13
Feature activation+9.05e-03

Pos logit contributions
rell+0.051
 better+0.049
 safer+0.046
 nicer+0.046
 less+0.043
Neg logit contributions
ra-0.048
 magn-0.046
ast-0.045
 Fre-0.045
 inh-0.044
Token The
Token ID383
Feature activation+1.52e-02

Pos logit contributions
rell+0.017
 better+0.017
 safer+0.016
 nicer+0.016
ock+0.016
Neg logit contributions
ra-0.011
 magn-0.011
 Fre-0.011
ast-0.010
 inh-0.010
Token mom
Token ID1995
Feature activation-2.53e-03

Pos logit contributions
rell+0.033
 better+0.031
 safer+0.030
 nicer+0.029
 company+0.028
Neg logit contributions
ra-0.017
 magn-0.016
ast-0.016
 Fre-0.015
 inh-0.015
Token
Token ID198
Feature activation-9.83e-03

Pos logit contributions
ra+0.013
 magn+0.013
 Fre+0.012
ast+0.012
 inh+0.012
Neg logit contributions
rell-0.017
 better-0.016
 nicer-0.015
 safer-0.015
 company-0.015
Token"
Token ID1
Feature activation-7.49e-03

Pos logit contributions
ra+0.015
ast+0.015
 magn+0.014
 Fre+0.014
Tony+0.014
Neg logit contributions
rell-0.017
 better-0.016
 nicer-0.015
 company-0.015
 safer-0.014
TokenHello
Token ID15496
Feature activation-1.28e-02

Pos logit contributions
ra+0.012
 magn+0.012
ast+0.011
 Fre+0.011
 inh+0.011
Neg logit contributions
rell-0.012
 better-0.012
 safer-0.011
 nicer-0.011
 company-0.010
Token,
Token ID11
Feature activation-1.04e-02

Pos logit contributions
ra+0.022
 magn+0.022
 Fre+0.022
ast+0.021
 Miles+0.020
Neg logit contributions
rell-0.020
 better-0.018
 safer-0.017
 nicer-0.016
ock-0.016
Token my
Token ID616
Feature activation-5.91e-04

Pos logit contributions
ra+0.015
 magn+0.015
 Fre+0.015
ast+0.014
 El+0.014
Neg logit contributions
rell-0.019
 better-0.016
 safer-0.016
 nicer-0.015
 company-0.015
Token de
Token ID390
Feature activation+6.73e-02

Pos logit contributions
ra+0.001
 magn+0.001
ast+0.001
 Fre+0.001
 El+0.001
Neg logit contributions
rell-0.001
 better-0.001
 safer-0.001
 nicer-0.001
ock-0.001
Tokenars
Token ID945
Feature activation+3.19e-02

Pos logit contributions
rell+0.107
 better+0.080
 nicer+0.080
 meantime+0.079
 alright+0.078
Neg logit contributions
ra-0.117
 magn-0.113
ast-0.113
 cent-0.102
ix-0.101
Token,"
Token ID553
Feature activation-1.75e-02

Pos logit contributions
rell+0.050
 safer+0.045
 better+0.045
 nicer+0.044
 less+0.042
Neg logit contributions
ra-0.055
 magn-0.050
 Fre-0.050
ast-0.048
 El-0.047
Token she
Token ID673
Feature activation+5.50e-04

Pos logit contributions
ra+0.027
 magn+0.026
ast+0.026
 Fre+0.025
 inh+0.024
Neg logit contributions
 better-0.029
rell-0.029
 safer-0.027
 nicer-0.027
 less-0.026
Token said
Token ID531
Feature activation+3.14e-02

Pos logit contributions
rell+0.001
 better+0.001
 safer+0.001
 nicer+0.001
 company+0.001
Neg logit contributions
ra-0.001
 magn-0.001
ast-0.001
 Fre-0.001
 inh-0.001
Token.
Token ID13
Feature activation-2.28e-02

Pos logit contributions
rell+0.045
 better+0.039
 safer+0.038
 meantime+0.037
ock+0.037
Neg logit contributions
ra-0.057
 magn-0.057
 Fre-0.055
ast-0.054
 inh-0.053
Token.
Token ID13
Feature activation-6.35e-03

Pos logit contributions
rell+0.036
 better+0.030
 meantime+0.029
 safer+0.028
 nicer+0.028
Neg logit contributions
ra-0.032
 magn-0.031
 Fre-0.031
 inh-0.029
ast-0.029
Token Soon
Token ID15894
Feature activation+6.24e-03

Pos logit contributions
ra+0.010
 magn+0.010
ast+0.010
ert+0.009
 inh+0.009
Neg logit contributions
rell-0.010
 better-0.010
 safer-0.010
 nicer-0.009
 less-0.009
Token,
Token ID11
Feature activation-1.64e-02

Pos logit contributions
rell+0.011
 better+0.010
 safer+0.009
ock+0.009
 nicer+0.009
Neg logit contributions
 Fre-0.010
ra-0.009
 magn-0.009
 Cic-0.009
ast-0.009
Token more
Token ID517
Feature activation-7.71e-03

Pos logit contributions
ra+0.029
ast+0.028
udge+0.027
 magn+0.027
amm+0.027
Neg logit contributions
 better-0.027
 safer-0.025
 less-0.024
 nicer-0.024
rell-0.023
Token cater
Token ID20825
Feature activation+3.58e-03

Pos logit contributions
ra+0.013
 magn+0.013
ast+0.013
 Fre+0.013
 inh+0.012
Neg logit contributions
rell-0.011
 better-0.011
 safer-0.010
 nicer-0.010
 company-0.010
Tokenpill
Token ID27215
Feature activation+6.70e-02

Pos logit contributions
rell+0.007
 better+0.007
 safer+0.006
 nicer+0.006
 company+0.006
Neg logit contributions
ra-0.005
 magn-0.004
 Fre-0.004
ast-0.004
 inh-0.004
Tokenars
Token ID945
Feature activation-5.14e-03

Pos logit contributions
rell+0.103
 meantime+0.088
 better+0.083
 ambulance+0.081
 Taxi+0.081
Neg logit contributions
 magn-0.108
 fet-0.106
Bel-0.102
ra-0.102
aped-0.098
Token arrived
Token ID5284
Feature activation-1.84e-03

Pos logit contributions
ra+0.009
 magn+0.008
ast+0.008
 Fre+0.008
 inh+0.008
Neg logit contributions
rell-0.008
 better-0.007
 safer-0.007
 nicer-0.007
 company-0.006
Token.
Token ID13
Feature activation-7.12e-03

Pos logit contributions
ra+0.004
 magn+0.004
ast+0.003
 Fre+0.003
 inh+0.003
Neg logit contributions
rell-0.002
 better-0.002
 safer-0.002
 nicer-0.002
ock-0.002
Token They
Token ID1119
Feature activation+7.16e-03

Pos logit contributions
ra+0.011
 magn+0.011
ast+0.010
ert+0.010
 inh+0.010
Neg logit contributions
rell-0.012
 better-0.012
 safer-0.011
 nicer-0.011
 less-0.011
Token also
Token ID635
Feature activation+2.10e-02

Pos logit contributions
 better+0.011
rell+0.010
 safer+0.009
 nicer+0.009
 less+0.009
Neg logit contributions
ra-0.012
 magn-0.012
ast-0.012
 Fre-0.012
udge-0.011
TokenJ
Token ID41
Feature activation-4.16e-03

Pos logit contributions
rell+0.017
 better+0.016
 nicer+0.016
 safer+0.015
 company+0.015
Neg logit contributions
ra-0.015
 magn-0.015
 Fre-0.014
ast-0.014
 inh-0.013
Tokenill
Token ID359
Feature activation-5.35e-03

Pos logit contributions
 magn+0.005
ra+0.005
 Fre+0.004
 inh+0.004
ast+0.004
Neg logit contributions
rell-0.009
 better-0.008
ock-0.008
 safer-0.008
 nicer-0.008
Token was
Token ID373
Feature activation+1.36e-02

Pos logit contributions
ra+0.008
 magn+0.008
 Fre+0.007
ast+0.007
 Miles+0.007
Neg logit contributions
rell-0.009
 better-0.009
 safer-0.008
 nicer-0.008
ock-0.008
Token a
Token ID257
Feature activation-1.46e-02

Pos logit contributions
 better+0.020
rell+0.019
 safer+0.017
 nicer+0.017
 safe+0.016
Neg logit contributions
ra-0.024
 magn-0.024
ast-0.023
 Fre-0.023
ert-0.023
Token 3
Token ID513
Feature activation+2.74e-03

Pos logit contributions
ra+0.025
 magn+0.025
 Fre+0.025
ert+0.024
ast+0.024
Neg logit contributions
 better-0.021
 safer-0.019
rell-0.019
 nicer-0.019
 safe-0.018
Token year
Token ID614
Feature activation+6.68e-02

Pos logit contributions
 better+0.004
rell+0.004
 company+0.003
 nicer+0.003
 safer+0.003
Neg logit contributions
ra-0.005
 magn-0.005
 Fre-0.005
ast-0.005
 Cic-0.005
Token old
Token ID1468
Feature activation+2.02e-02

Pos logit contributions
rell+0.106
 better+0.090
 meantime+0.090
 safer+0.087
ock+0.086
Neg logit contributions
ra-0.104
ast-0.103
 Fre-0.103
 inh-0.098
 El-0.097
Token girl
Token ID2576
Feature activation+1.91e-02

Pos logit contributions
 better+0.032
rell+0.032
 nicer+0.029
 safer+0.029
 less+0.028
Neg logit contributions
ra-0.032
 magn-0.032
ast-0.031
 Fre-0.030
 inh-0.029
Token with
Token ID351
Feature activation+1.12e-02

Pos logit contributions
 better+0.026
rell+0.025
 safer+0.023
 nicer+0.022
 company+0.022
Neg logit contributions
 magn-0.034
ra-0.034
ast-0.033
 Fre-0.033
 inh-0.032
Token a
Token ID257
Feature activation-1.05e-02

Pos logit contributions
rell+0.019
 better+0.019
 safer+0.018
 nicer+0.017
 company+0.017
Neg logit contributions
ra-0.017
 magn-0.016
 Fre-0.016
ast-0.016
 inh-0.015
Token lot
Token ID1256
Feature activation+3.13e-02

Pos logit contributions
ra+0.018
 magn+0.018
ast+0.017
 Fre+0.017
 inh+0.017
Neg logit contributions
rell-0.016
 better-0.015
 safer-0.014
 nicer-0.014
 company-0.013
Token good
Token ID922
Feature activation-2.70e-02

Pos logit contributions
ra+0.030
 magn+0.029
 Fre+0.029
ast+0.029
 inh+0.028
Neg logit contributions
rell-0.026
 better-0.025
 safer-0.023
 nicer-0.023
 company-0.022
Token tip
Token ID8171
Feature activation-4.49e-03

Pos logit contributions
ra+0.049
 Fre+0.048
 magn+0.047
 Miles+0.047
 inh+0.046
Neg logit contributions
 better-0.037
rell-0.037
 company-0.033
 safer-0.031
 nicer-0.031
Token for
Token ID329
Feature activation-3.55e-03

Pos logit contributions
ra+0.008
 magn+0.007
ast+0.007
 Fre+0.007
 inh+0.007
Neg logit contributions
rell-0.007
 better-0.007
 safer-0.006
 nicer-0.006
 company-0.006
Token you
Token ID345
Feature activation+2.14e-02

Pos logit contributions
ra+0.007
 magn+0.007
ast+0.007
 Fre+0.006
 inh+0.006
Neg logit contributions
 better-0.004
rell-0.004
 safer-0.004
 nicer-0.004
 company-0.004
Token.
Token ID13
Feature activation+1.67e-02

Pos logit contributions
rell+0.033
 better+0.029
 safer+0.029
 nicer+0.028
ock+0.027
Neg logit contributions
ast-0.036
ra-0.036
 magn-0.035
 Fre-0.034
 Tony-0.033
Token Rel
Token ID4718
Feature activation+6.68e-02

Pos logit contributions
rell+0.023
 better+0.022
 safer+0.020
 less+0.019
 nicer+0.018
Neg logit contributions
 magn-0.030
ra-0.030
 inh-0.029
ude-0.028
ert-0.028
Tokeniable
Token ID3379
Feature activation+3.17e-03

Pos logit contributions
rell+0.091
 better+0.090
 safer+0.085
 less+0.083
 company+0.082
Neg logit contributions
ra-0.123
ast-0.116
 Fre-0.115
ert-0.107
ude-0.106
Token people
Token ID661
Feature activation+5.51e-03

Pos logit contributions
rell+0.005
 better+0.004
 safer+0.004
 nicer+0.004
 company+0.004
Neg logit contributions
ra-0.006
 magn-0.005
 Fre-0.005
ast-0.005
 inh-0.005
Token are
Token ID389
Feature activation+4.04e-03

Pos logit contributions
rell+0.009
 better+0.008
 safer+0.008
 nicer+0.008
 less+0.007
Neg logit contributions
ra-0.009
 magn-0.009
ast-0.008
 Fre-0.008
 inh-0.008
Token the
Token ID262
Feature activation+1.89e-02

Pos logit contributions
rell+0.006
 better+0.005
 safer+0.005
ock+0.005
 meantime+0.005
Neg logit contributions
ra-0.007
 magn-0.007
 Fre-0.007
ast-0.007
 inh-0.007
Token best
Token ID1266
Feature activation+5.64e-04

Pos logit contributions
 better+0.025
rell+0.025
 safer+0.022
 nicer+0.022
 company+0.021
Neg logit contributions
ra-0.035
 magn-0.034
ast-0.034
 Fre-0.033
 inh-0.033
TokenMel
Token ID21102
Feature activation-6.88e-03

Pos logit contributions
rell+0.017
 better+0.017
 safer+0.016
 nicer+0.016
 company+0.015
Neg logit contributions
ra-0.015
 magn-0.015
 Fre-0.014
ast-0.014
 inh-0.014
Tokenissa
Token ID13808
Feature activation+4.34e-03

Pos logit contributions
ra+0.010
 magn+0.010
 Fre+0.010
ast+0.010
 inh+0.010
Neg logit contributions
rell-0.012
 better-0.011
 safer-0.010
 nicer-0.010
ock-0.010
Token was
Token ID373
Feature activation+9.98e-03

Pos logit contributions
 better+0.007
rell+0.007
 safer+0.006
 nicer+0.006
 less+0.006
Neg logit contributions
 magn-0.007
ra-0.007
ast-0.006
 Fre-0.006
 Miles-0.006
Token a
Token ID257
Feature activation-1.05e-02

Pos logit contributions
 better+0.015
rell+0.014
 safer+0.013
 nicer+0.013
 safe+0.012
Neg logit contributions
ra-0.017
 magn-0.017
ast-0.017
 Fre-0.017
ert-0.016
Token 3
Token ID513
Feature activation+1.21e-03

Pos logit contributions
ra+0.017
 magn+0.017
 Fre+0.016
ert+0.016
ast+0.016
Neg logit contributions
 better-0.017
rell-0.015
 safer-0.015
 nicer-0.015
 safe-0.014
Token year
Token ID614
Feature activation+6.68e-02

Pos logit contributions
 better+0.002
rell+0.002
 company+0.001
 nicer+0.001
 safer+0.001
Neg logit contributions
 magn-0.002
ra-0.002
 Fre-0.002
 Cic-0.002
 Miles-0.002
Token old
Token ID1468
Feature activation+1.94e-02

Pos logit contributions
rell+0.109
 meantime+0.091
 better+0.091
ock+0.087
 safer+0.087
Neg logit contributions
 Fre-0.103
ra-0.103
ast-0.102
 inh-0.097
 El-0.096
Token girl
Token ID2576
Feature activation+2.07e-02

Pos logit contributions
 better+0.030
rell+0.029
 safer+0.027
 nicer+0.027
 less+0.026
Neg logit contributions
ra-0.031
 magn-0.031
ast-0.031
 Fre-0.030
 inh-0.029
Token who
Token ID508
Feature activation+7.17e-03

Pos logit contributions
 better+0.031
rell+0.030
 safer+0.028
 nicer+0.027
 company+0.027
Neg logit contributions
 magn-0.034
ra-0.034
ast-0.033
 Fre-0.033
 inh-0.032
Token loved
Token ID6151
Feature activation+8.87e-03

Pos logit contributions
rell+0.011
 better+0.011
 safer+0.010
 nicer+0.010
 company+0.010
Neg logit contributions
ra-0.011
 magn-0.011
ast-0.011
 Fre-0.011
 inh-0.011
Token to
Token ID284
Feature activation+1.66e-02

Pos logit contributions
 better+0.016
rell+0.015
 nicer+0.014
 safer+0.014
 company+0.014
Neg logit contributions
ra-0.013
 magn-0.013
ast-0.012
 Fre-0.012
 Miles-0.012
Token.
Token ID13
Feature activation-7.48e-03

Pos logit contributions
 magn+0.001
ra+0.001
ast+0.001
ude+0.001
 Tony+0.001
Neg logit contributions
rell-0.002
ock-0.001
 meantime-0.001
 safer-0.001
 inconvenience-0.001
Token "
Token ID366
Feature activation-2.64e-02

Pos logit contributions
ra+0.011
 magn+0.011
 Fre+0.011
ast+0.011
 inh+0.010
Neg logit contributions
rell-0.013
 better-0.012
 safer-0.012
 nicer-0.011
 company-0.011
TokenHello
Token ID15496
Feature activation-1.41e-02

Pos logit contributions
ra+0.045
 magn+0.045
ast+0.043
 Fre+0.042
 inh+0.042
Neg logit contributions
rell-0.039
 better-0.039
 safer-0.035
 nicer-0.035
ock-0.034
Token,
Token ID11
Feature activation-1.80e-02

Pos logit contributions
ra+0.025
 magn+0.025
 Fre+0.025
ast+0.024
 Miles+0.023
Neg logit contributions
rell-0.020
 better-0.019
 safer-0.018
 nicer-0.017
 company-0.017
Token my
Token ID616
Feature activation-1.14e-03

Pos logit contributions
ra+0.029
 magn+0.028
 Fre+0.028
ast+0.028
 inh+0.027
Neg logit contributions
rell-0.029
 better-0.027
 safer-0.025
 nicer-0.025
 company-0.024
Token de
Token ID390
Feature activation+6.62e-02

Pos logit contributions
ra+0.002
 magn+0.002
ast+0.002
 Fre+0.002
 inh+0.002
Neg logit contributions
rell-0.002
 better-0.002
 safer-0.002
 nicer-0.002
ock-0.002
Tokenars
Token ID945
Feature activation+2.50e-02

Pos logit contributions
rell+0.104
 meantime+0.074
 better+0.073
 nicer+0.072
 company+0.072
Neg logit contributions
ra-0.118
 magn-0.116
ast-0.114
 cent-0.105
ix-0.104
Token!
Token ID0
Feature activation-5.47e-03

Pos logit contributions
rell+0.038
 safer+0.035
 better+0.034
 nicer+0.034
 less+0.031
Neg logit contributions
ra-0.042
 magn-0.039
 Fre-0.038
ast-0.038
 Tony-0.037
Token What
Token ID1867
Feature activation-9.64e-03

Pos logit contributions
ra+0.009
 magn+0.009
ast+0.009
 inh+0.009
ude+0.009
Neg logit contributions
rell-0.008
 better-0.008
 safer-0.007
 nicer-0.007
 less-0.007
Token a
Token ID257
Feature activation-9.07e-03

Pos logit contributions
ra+0.019
 magn+0.019
 Fre+0.018
ast+0.018
 inh+0.017
Neg logit contributions
rell-0.012
 better-0.011
 safer-0.010
 nicer-0.010
 company-0.010
Token nice
Token ID3621
Feature activation-1.92e-03

Pos logit contributions
ra+0.018
 magn+0.017
 Fre+0.017
ast+0.017
 inh+0.017
Neg logit contributions
rell-0.011
 better-0.011
 safer-0.010
 nicer-0.010
 company-0.009
Token's
Token ID338
Feature activation+9.73e-04

Pos logit contributions
rell+0.032
 better+0.029
 nicer+0.027
 safer+0.027
ock+0.026
Neg logit contributions
ra-0.038
 magn-0.037
ast-0.036
 Fre-0.036
 inh-0.035
Token courage
Token ID11917
Feature activation+1.08e-02

Pos logit contributions
rell+0.002
 better+0.001
ock+0.001
 safer+0.001
 nicer+0.001
Neg logit contributions
ra-0.002
 magn-0.002
ast-0.002
 inh-0.001
ude-0.001
Token.
Token ID13
Feature activation+9.77e-03

Pos logit contributions
rell+0.018
 better+0.016
 nicer+0.016
 safer+0.016
 meantime+0.015
Neg logit contributions
ra-0.016
 magn-0.016
ast-0.015
 Fre-0.015
 inh-0.015
Token The
Token ID383
Feature activation-2.25e-03

Pos logit contributions
rell+0.016
 better+0.015
 safer+0.014
 nicer+0.013
 less+0.013
Neg logit contributions
ra-0.016
 magn-0.016
ast-0.015
 Fre-0.015
 inh-0.015
Token two
Token ID734
Feature activation-5.13e-03

Pos logit contributions
 magn+0.003
 Fre+0.003
ra+0.003
alph+0.002
 inh+0.002
Neg logit contributions
rell-0.005
 better-0.004
ock-0.004
 safer-0.004
 nicer-0.004
Token de
Token ID390
Feature activation+6.60e-02

Pos logit contributions
ra+0.007
 Fre+0.007
 magn+0.007
 inh+0.007
ude+0.007
Neg logit contributions
rell-0.009
 better-0.008
 safer-0.008
 nicer-0.008
 meantime-0.008
Tokeners
Token ID364
Feature activation+7.68e-03

Pos logit contributions
rell+0.095
 better+0.071
 safer+0.069
 meantime+0.069
 nicer+0.068
Neg logit contributions
ra-0.121
 magn-0.111
ast-0.109
 inh-0.108
Tony-0.107
Token became
Token ID2627
Feature activation-1.14e-02

Pos logit contributions
 better+0.010
rell+0.010
 safer+0.009
 nicer+0.009
 less+0.009
Neg logit contributions
 magn-0.014
ra-0.014
ast-0.014
 Fre-0.014
udge-0.013
Token good
Token ID922
Feature activation-7.82e-03

Pos logit contributions
ra+0.020
 magn+0.020
ast+0.020
 Fre+0.019
 inh+0.019
Neg logit contributions
 better-0.016
rell-0.016
 safer-0.015
 nicer-0.014
 less-0.014
Token friends
Token ID2460
Feature activation+2.08e-02

Pos logit contributions
 magn+0.014
ra+0.014
 Fre+0.014
ast+0.014
 inh+0.013
Neg logit contributions
 better-0.011
rell-0.011
 company-0.010
 nicer-0.010
 less-0.010
Token after
Token ID706
Feature activation-2.29e-02

Pos logit contributions
rell+0.036
 better+0.036
 safer+0.033
 nicer+0.033
 company+0.032
Neg logit contributions
ra-0.030
 magn-0.029
ast-0.028
 Fre-0.028
 inh-0.027
Tokenily
Token ID813
Feature activation-1.04e-02

Pos logit contributions
ra+0.030
 magn+0.029
 Fre+0.028
ast+0.028
 inh+0.028
Neg logit contributions
rell-0.027
 better-0.026
 safer-0.023
 nicer-0.023
ock-0.023
Token liked
Token ID8288
Feature activation+9.65e-03

Pos logit contributions
ra+0.018
 magn+0.017
ast+0.017
ert+0.017
 Fre+0.017
Neg logit contributions
 better-0.015
rell-0.015
 safer-0.013
 nicer-0.012
ock-0.012
Token to
Token ID284
Feature activation+7.81e-03

Pos logit contributions
 better+0.017
rell+0.016
 nicer+0.015
 safer+0.015
 company+0.014
Neg logit contributions
ra-0.016
ast-0.014
 magn-0.014
udge-0.013
 Fre-0.013
Token keep
Token ID1394
Feature activation+3.13e-03

Pos logit contributions
 better+0.011
rell+0.011
 company+0.009
 end+0.009
 safe+0.009
Neg logit contributions
ra-0.014
alph-0.013
ert-0.013
 Fre-0.013
udge-0.013
Token cater
Token ID20825
Feature activation+1.53e-02

Pos logit contributions
 better+0.004
rell+0.004
 safe+0.003
 company+0.003
 safer+0.003
Neg logit contributions
ra-0.006
ert-0.006
udge-0.006
Jul-0.006
ast-0.006
Tokenpill
Token ID27215
Feature activation+6.58e-02

Pos logit contributions
rell+0.031
 better+0.030
 safer+0.029
 nicer+0.029
 meantime+0.028
Neg logit contributions
ra-0.018
 magn-0.017
 Fre-0.017
ast-0.016
 inh-0.016
Tokenars
Token ID945
Feature activation-7.33e-03

Pos logit contributions
rell+0.090
 meantime+0.075
 nicer+0.075
 better+0.074
 End+0.072
Neg logit contributions
 magn-0.116
 fet-0.110
ast-0.109
Bel-0.108
ra-0.108
Token in
Token ID287
Feature activation-6.88e-03

Pos logit contributions
ra+0.009
 magn+0.009
ast+0.009
 Fre+0.009
 inh+0.009
Neg logit contributions
rell-0.014
 better-0.013
 safer-0.013
 nicer-0.013
 company-0.012
Token a
Token ID257
Feature activation-5.14e-03

Pos logit contributions
ra+0.012
ast+0.012
ert+0.011
 magn+0.011
ude+0.011
Neg logit contributions
 better-0.010
rell-0.010
 safer-0.009
 company-0.009
 less-0.009
Token jar
Token ID17379
Feature activation+8.76e-03

Pos logit contributions
ra+0.008
 magn+0.008
 Fre+0.008
ast+0.008
 inh+0.007
Neg logit contributions
rell-0.008
 better-0.008
 safer-0.008
 nicer-0.008
 company-0.007
Token.
Token ID13
Feature activation+1.13e-02

Pos logit contributions
rell+0.012
 better+0.011
 safer+0.010
 nicer+0.010
 less+0.009
Neg logit contributions
ra-0.016
 magn-0.016
ast-0.016
 Fre-0.016
 inh-0.015
Token<|endoftext|>
Token ID50256
Feature activation+1.27e-02

Pos logit contributions
rell+0.022
 better+0.021
 safer+0.019
 less+0.018
 refreshed+0.017
Neg logit contributions
ra-0.034
 magn-0.033
ast-0.031
ert-0.030
 inh-0.030
TokenJohn
Token ID7554
Feature activation-1.19e-02

Pos logit contributions
rell+0.022
 better+0.021
 nicer+0.020
 safer+0.020
 company+0.019
Neg logit contributions
ra-0.019
 magn-0.019
 Fre-0.018
ast-0.017
 inh-0.017
Token was
Token ID373
Feature activation+1.19e-02

Pos logit contributions
 magn+0.018
ra+0.018
 Fre+0.017
ast+0.017
ert+0.017
Neg logit contributions
 better-0.019
rell-0.019
 safer-0.017
 nicer-0.016
ock-0.016
Token a
Token ID257
Feature activation-5.75e-03

Pos logit contributions
 better+0.018
rell+0.016
 safer+0.015
 safe+0.015
 nicer+0.015
Neg logit contributions
 magn-0.021
ra-0.021
 Fre-0.020
ert-0.020
ast-0.020
Token 3
Token ID513
Feature activation+2.79e-03

Pos logit contributions
ra+0.010
 magn+0.010
 Fre+0.010
ert+0.010
ast+0.010
Neg logit contributions
 better-0.008
rell-0.007
 safer-0.007
 nicer-0.007
 safe-0.007
Token year
Token ID614
Feature activation+6.57e-02

Pos logit contributions
 better+0.004
rell+0.004
 company+0.003
 nicer+0.003
 safer+0.003
Neg logit contributions
ra-0.005
 magn-0.005
 Fre-0.005
 Cic-0.005
ert-0.005
Token old
Token ID1468
Feature activation+2.06e-02

Pos logit contributions
rell+0.101
 better+0.087
 meantime+0.086
 safer+0.083
ock+0.081
Neg logit contributions
ast-0.107
ra-0.107
 Fre-0.105
 inh-0.101
 Miles-0.100
Token boy
Token ID2933
Feature activation+1.73e-02

Pos logit contributions
 better+0.033
rell+0.033
 safer+0.030
 nicer+0.030
 less+0.029
Neg logit contributions
ra-0.032
 magn-0.032
ast-0.031
 Fre-0.030
 inh-0.029
Token who
Token ID508
Feature activation+6.63e-03

Pos logit contributions
 better+0.023
rell+0.022
 safer+0.020
 nicer+0.019
 company+0.019
Neg logit contributions
 magn-0.032
ra-0.032
 Fre-0.031
ast-0.031
 inh-0.030
Token lived
Token ID5615
Feature activation-1.17e-02

Pos logit contributions
rell+0.011
 better+0.010
 safer+0.009
 nicer+0.009
 company+0.009
Neg logit contributions
ra-0.011
 magn-0.010
 Fre-0.010
ast-0.010
 inh-0.010
Token in
Token ID287
Feature activation-2.22e-04

Pos logit contributions
ra+0.015
 magn+0.014
ast+0.014
 Fre+0.014
 Cic+0.014
Neg logit contributions
 better-0.023
rell-0.021
 safer-0.020
 nicer-0.020
 safe-0.020
Token squ
Token ID2809
Feature activation+1.00e-02

Pos logit contributions
 better+0.012
rell+0.012
 safer+0.010
 nicer+0.010
 less+0.010
Neg logit contributions
 magn-0.017
udge-0.016
ra-0.016
 Fre-0.015
Tony-0.015
Tokeneal
Token ID2287
Feature activation+1.81e-02

Pos logit contributions
rell+0.012
 better+0.011
 nicer+0.010
 safer+0.010
 company+0.009
Neg logit contributions
 magn-0.020
ra-0.020
 Fre-0.020
ast-0.019
 Miles-0.019
Tokened
Token ID276
Feature activation-6.87e-04

Pos logit contributions
 better+0.030
rell+0.030
 nicer+0.027
 safer+0.026
 less+0.026
Neg logit contributions
 magn-0.028
ra-0.027
ast-0.026
 Fre-0.026
 Miles-0.025
Token with
Token ID351
Feature activation+1.56e-02

Pos logit contributions
 magn+0.001
ert+0.001
ra+0.001
 Fre+0.001
Tony+0.001
Neg logit contributions
 better-0.001
rell-0.001
 safe-0.001
 nicer-0.001
 less-0.001
Token g
Token ID308
Feature activation+1.05e-02

Pos logit contributions
rell+0.024
 better+0.021
 safer+0.021
 nicer+0.020
ock+0.020
Neg logit contributions
 magn-0.026
ra-0.026
 Fre-0.025
 inh-0.025
ast-0.024
Tokenlee
Token ID7197
Feature activation+6.56e-02

Pos logit contributions
rell+0.018
 better+0.016
 nicer+0.015
 safer+0.015
 company+0.015
Neg logit contributions
 magn-0.016
ra-0.016
 Fre-0.015
ast-0.015
 inh-0.015
Token and
Token ID290
Feature activation+2.91e-02

Pos logit contributions
rell+0.104
 better+0.102
 safer+0.097
 nicer+0.095
 meantime+0.089
Neg logit contributions
ra-0.103
 magn-0.100
ast-0.098
 Fre-0.098
 inh-0.095
Token reached
Token ID4251
Feature activation+9.66e-05

Pos logit contributions
 better+0.041
 less+0.034
 safer+0.034
 safe+0.034
 nicer+0.034
Neg logit contributions
udge-0.055
ra-0.054
 magn-0.053
ert-0.053
ast-0.052
Token out
Token ID503
Feature activation-2.56e-02

Pos logit contributions
 better+0.000
 safer+0.000
 safe+0.000
rell+0.000
 nicer+0.000
Neg logit contributions
ra-0.000
ude-0.000
Tony-0.000
 magn-0.000
ast-0.000
Token to
Token ID284
Feature activation+8.49e-03

Pos logit contributions
ra+0.047
 magn+0.044
opsy+0.044
alph+0.043
ix+0.043
Neg logit contributions
 better-0.039
 nicer-0.034
 less-0.033
 safer-0.033
rell-0.032
Token take
Token ID1011
Feature activation+1.95e-02

Pos logit contributions
 better+0.013
rell+0.011
 safer+0.010
 nicer+0.010
 end+0.010
Neg logit contributions
ude-0.017
ert-0.016
utes-0.016
ra-0.015
 magn-0.015
Token\n
Token ID198
Feature activation-1.08e-02

Pos logit contributions
ra+0.007
 magn+0.007
 Fre+0.007
ast+0.007
 inh+0.007
Neg logit contributions
rell-0.010
 better-0.009
 nicer-0.009
 safer-0.009
 company-0.008
Token"
Token ID1
Feature activation-9.35e-03

Pos logit contributions
ra+0.018
 magn+0.018
ast+0.018
 Fre+0.017
Tony+0.017
Neg logit contributions
rell-0.017
 better-0.016
 nicer-0.015
 safer-0.015
 company-0.014
TokenHello
Token ID15496
Feature activation-1.56e-02

Pos logit contributions
ra+0.015
 magn+0.015
ast+0.014
 Fre+0.014
 inh+0.014
Neg logit contributions
rell-0.015
 better-0.014
 safer-0.013
 nicer-0.013
 company-0.013
Token,
Token ID11
Feature activation-1.92e-02

Pos logit contributions
ra+0.028
 magn+0.027
 Fre+0.027
ast+0.026
 Miles+0.025
Neg logit contributions
rell-0.023
 better-0.021
 safer-0.020
 nicer-0.019
 company-0.019
Token my
Token ID616
Feature activation-1.68e-03

Pos logit contributions
ra+0.032
 magn+0.031
 Fre+0.030
ast+0.030
 inh+0.029
Neg logit contributions
rell-0.030
 better-0.028
 safer-0.026
 nicer-0.026
 company-0.025
Token de
Token ID390
Feature activation+6.54e-02

Pos logit contributions
ra+0.003
 magn+0.003
ast+0.003
 Fre+0.003
 inh+0.003
Neg logit contributions
rell-0.003
 better-0.002
 safer-0.002
 nicer-0.002
ock-0.002
Tokenars
Token ID945
Feature activation+3.15e-02

Pos logit contributions
rell+0.104
 meantime+0.075
 better+0.074
 nicer+0.074
 company+0.073
Neg logit contributions
ra-0.116
ast-0.111
 magn-0.111
ix-0.102
 cent-0.102
Token.
Token ID13
Feature activation-6.01e-03

Pos logit contributions
rell+0.050
 safer+0.045
 nicer+0.044
 better+0.043
 less+0.040
Neg logit contributions
ra-0.053
 magn-0.048
 Fre-0.047
ast-0.046
 El-0.045
Token What
Token ID1867
Feature activation-5.69e-03

Pos logit contributions
ra+0.011
 magn+0.010
ast+0.010
 inh+0.010
ude+0.010
Neg logit contributions
rell-0.008
 better-0.008
 safer-0.007
 nicer-0.007
 less-0.007
Token do
Token ID466
Feature activation-8.51e-03

Pos logit contributions
ra+0.011
 magn+0.011
 Fre+0.010
ast+0.010
 inh+0.010
Neg logit contributions
rell-0.008
 better-0.007
 safer-0.007
 nicer-0.007
 company-0.006
Token you
Token ID345
Feature activation+7.36e-04

Pos logit contributions
ra+0.017
 magn+0.017
ast+0.017
 inh+0.017
 Fre+0.017
Neg logit contributions
 better-0.010
rell-0.009
 safer-0.008
 nicer-0.008
 company-0.008
Token.
Token ID13
Feature activation-1.73e-02

Pos logit contributions
 magn+0.020
ra+0.020
ast+0.020
 Fre+0.019
 inh+0.019
Neg logit contributions
rell-0.020
 better-0.019
 safer-0.017
 nicer-0.017
 meantime-0.017
Token One
Token ID1881
Feature activation-2.46e-02

Pos logit contributions
ra+0.029
 magn+0.028
ast+0.027
 Fre+0.026
 inh+0.026
Neg logit contributions
rell-0.028
 better-0.026
 safer-0.024
 nicer-0.023
 less-0.023
Token day
Token ID1110
Feature activation-8.66e-03

Pos logit contributions
ra+0.052
 magn+0.049
ast+0.049
 Fre+0.048
ude+0.047
Neg logit contributions
 better-0.029
rell-0.028
 safer-0.025
 nicer-0.024
 company-0.024
Token,
Token ID11
Feature activation-1.52e-02

Pos logit contributions
ra+0.018
ast+0.018
 magn+0.017
ude+0.017
 Fre+0.017
Neg logit contributions
 better-0.010
rell-0.010
 safer-0.008
 less-0.008
 nicer-0.008
Token Sp
Token ID1338
Feature activation-6.41e-02

Pos logit contributions
ast+0.021
ra+0.021
ude+0.020
udge+0.019
amm+0.019
Neg logit contributions
 better-0.029
rell-0.026
 less-0.026
 safer-0.026
 safe-0.024
Tokenide
Token ID485
Feature activation-7.92e-02

Pos logit contributions
 magn+0.112
ra+0.105
 Fre+0.099
 Mason+0.097
 inh+0.096
Neg logit contributions
rell-0.100
ock-0.097
 better-0.084
 company-0.081
hell-0.078
Tokeny
Token ID88
Feature activation-3.19e-02

Pos logit contributions
 magn+0.128
ra+0.124
 Fre+0.120
 cu+0.116
Tony+0.114
Neg logit contributions
rell-0.114
 better-0.113
ock-0.109
vious-0.102
ier-0.096
Token was
Token ID373
Feature activation-8.35e-03

Pos logit contributions
ra+0.052
 magn+0.052
 Fre+0.049
ast+0.049
udge+0.048
Neg logit contributions
 better-0.048
rell-0.048
 safer-0.042
 nicer-0.041
 safe-0.041
Token trying
Token ID2111
Feature activation-1.76e-02

Pos logit contributions
ra+0.012
 magn+0.012
 Fre+0.012
ast+0.012
 inh+0.012
Neg logit contributions
rell-0.014
 better-0.013
 safer-0.013
 nicer-0.013
ock-0.012
Token to
Token ID284
Feature activation-4.82e-03

Pos logit contributions
ra+0.028
ast+0.025
 Fre+0.025
opsy+0.024
 magn+0.024
Neg logit contributions
 better-0.032
 safer-0.029
rell-0.028
 nicer-0.028
 less-0.028
Token make
Token ID787
Feature activation+3.96e-03

Pos logit contributions
ra+0.008
 magn+0.008
ast+0.007
 Fre+0.007
alph+0.007
Neg logit contributions
 better-0.008
rell-0.007
 safer-0.007
 safe-0.007
 end-0.007
Token that
Token ID326
Feature activation-2.41e-02

Pos logit contributions
 better+0.031
rell+0.029
 safer+0.027
 nicer+0.027
 less+0.025
Neg logit contributions
ra-0.043
 magn-0.042
ast-0.041
 Fre-0.040
 inh-0.040
Token day
Token ID1110
Feature activation+3.43e-03

Pos logit contributions
ra+0.053
ast+0.052
udge+0.050
 Fre+0.050
ude+0.049
Neg logit contributions
 better-0.027
rell-0.025
 safer-0.022
 safe-0.021
 company-0.020
Token on
Token ID319
Feature activation-3.69e-02

Pos logit contributions
 better+0.006
rell+0.006
 safer+0.005
 nicer+0.005
 company+0.005
Neg logit contributions
ra-0.005
 magn-0.005
ast-0.005
 Fre-0.005
 inh-0.005
Token,
Token ID11
Feature activation-3.22e-02

Pos logit contributions
ast+0.096
ra+0.091
udge+0.091
Bel+0.090
Jul+0.088
Neg logit contributions
 better-0.028
rell-0.021
 safer-0.019
 less-0.017
 wrong-0.017
Token Sp
Token ID1338
Feature activation-7.12e-02

Pos logit contributions
ast+0.039
ra+0.039
 magn+0.036
ude+0.036
opsy+0.035
Neg logit contributions
 better-0.069
rell-0.063
 safer-0.063
 less-0.061
 nicer-0.058
Tokenide
Token ID485
Feature activation-7.84e-02

Pos logit contributions
 magn+0.122
ra+0.114
 Fre+0.108
 Mason+0.106
 inh+0.106
Neg logit contributions
rell-0.114
ock-0.109
 better-0.099
 company-0.095
ier-0.092
Tokeny
Token ID88
Feature activation-2.51e-02

Pos logit contributions
 magn+0.126
ra+0.123
 Fre+0.120
 cu+0.118
 inh+0.116
Neg logit contributions
 better-0.114
rell-0.110
ock-0.110
ier-0.101
vious-0.096
Token knew
Token ID2993
Feature activation+1.10e-03

Pos logit contributions
 magn+0.042
ra+0.041
 Fre+0.040
ast+0.039
udge+0.039
Neg logit contributions
 better-0.038
rell-0.037
 safer-0.032
 less-0.032
ock-0.031
Token that
Token ID326
Feature activation-1.11e-02

Pos logit contributions
 better+0.001
 safer+0.001
 less+0.001
 nicer+0.001
 safe+0.001
Neg logit contributions
ra-0.002
ast-0.002
 magn-0.002
udge-0.002
isa-0.002
Token even
Token ID772
Feature activation-2.02e-04

Pos logit contributions
ra+0.021
ast+0.020
opsy+0.019
udge+0.019
ude+0.019
Neg logit contributions
 better-0.017
 safer-0.015
 less-0.014
 safe-0.014
rell-0.014
Token when
Token ID618
Feature activation-3.40e-02

Pos logit contributions
ra+0.000
ast+0.000
 Fre+0.000
opsy+0.000
 magn+0.000
Neg logit contributions
 better-0.000
 safer-0.000
 nicer-0.000
 less-0.000
rell-0.000
Token that
Token ID326
Feature activation-2.41e-02

Pos logit contributions
 better+0.025
rell+0.024
 safer+0.022
 nicer+0.022
 company+0.020
Neg logit contributions
ra-0.034
 magn-0.033
ast-0.032
 Fre-0.032
 inh-0.031
Token day
Token ID1110
Feature activation+7.68e-03

Pos logit contributions
ra+0.053
ast+0.050
 magn+0.050
 Fre+0.049
udge+0.049
Neg logit contributions
 better-0.028
rell-0.026
 safer-0.023
 safe-0.022
 company-0.022
Token on
Token ID319
Feature activation-3.31e-02

Pos logit contributions
rell+0.014
 better+0.014
 safer+0.013
 nicer+0.012
 company+0.012
Neg logit contributions
ra-0.011
 magn-0.010
 Fre-0.010
ast-0.010
 inh-0.010
Token,
Token ID11
Feature activation-3.17e-02

Pos logit contributions
ast+0.082
ra+0.081
udge+0.078
 inh+0.077
 magn+0.076
Neg logit contributions
 better-0.028
rell-0.025
 safer-0.022
 less-0.019
 safe-0.019
Token Sp
Token ID1338
Feature activation-7.16e-02

Pos logit contributions
ra+0.038
ast+0.038
 magn+0.035
ude+0.034
opsy+0.034
Neg logit contributions
 better-0.068
rell-0.063
 safer-0.063
 less-0.060
 nicer-0.058
Tokenide
Token ID485
Feature activation-7.70e-02

Pos logit contributions
 magn+0.124
ra+0.115
 Fre+0.112
 Mason+0.110
 Tony+0.109
Neg logit contributions
rell-0.112
ock-0.109
 better-0.097
 company-0.094
ier-0.091
Tokeny
Token ID88
Feature activation-2.35e-02

Pos logit contributions
 magn+0.117
ra+0.115
 Fre+0.112
 cu+0.109
 inh+0.109
Neg logit contributions
 better-0.123
rell-0.116
ock-0.115
ier-0.106
vious-0.104
Token made
Token ID925
Feature activation+5.30e-03

Pos logit contributions
 magn+0.042
ra+0.041
 Fre+0.039
ast+0.039
udge+0.038
Neg logit contributions
 better-0.035
rell-0.033
 safer-0.029
 less-0.028
 nicer-0.028
Token sure
Token ID1654
Feature activation-1.89e-02

Pos logit contributions
 better+0.009
rell+0.008
 safer+0.008
 nicer+0.008
 less+0.007
Neg logit contributions
ra-0.009
 magn-0.009
 Fre-0.008
ast-0.008
 inh-0.008
Token to
Token ID284
Feature activation-1.30e-02

Pos logit contributions
ast+0.034
ra+0.034
 magn+0.033
opsy+0.032
isa+0.031
Neg logit contributions
 better-0.031
 nicer-0.027
 safer-0.027
 less-0.027
 wrong-0.026
Token keep
Token ID1394
Feature activation-8.69e-03

Pos logit contributions
 magn+0.024
Tony+0.023
ra+0.023
 Fre+0.023
 Miles+0.023
Neg logit contributions
 better-0.019
 safe-0.017
 safer-0.017
rell-0.016
 company-0.016
Token.
Token ID13
Feature activation-1.46e-02

Pos logit contributions
rell+0.002
 better+0.002
 nicer+0.002
 safer+0.002
ock+0.002
Neg logit contributions
 magn-0.002
ra-0.002
 Fre-0.002
 inh-0.002
ast-0.002
Token Every
Token ID3887
Feature activation-1.58e-02

Pos logit contributions
ra+0.023
 magn+0.022
ast+0.022
 Fre+0.021
 inh+0.021
Neg logit contributions
rell-0.024
 better-0.023
 safer-0.022
 nicer-0.021
 less-0.020
Token day
Token ID1110
Feature activation-3.37e-03

Pos logit contributions
ra+0.034
 magn+0.033
ast+0.033
 Fre+0.032
ude+0.032
Neg logit contributions
 better-0.017
rell-0.016
 safer-0.015
 nicer-0.014
 less-0.013
Token,
Token ID11
Feature activation-2.64e-02

Pos logit contributions
ra+0.006
ast+0.006
 magn+0.006
opsy+0.006
ude+0.006
Neg logit contributions
 better-0.005
rell-0.004
 safer-0.004
 less-0.004
 nicer-0.004
Token Sp
Token ID1338
Feature activation-6.55e-02

Pos logit contributions
ast+0.036
ra+0.036
ude+0.033
udge+0.032
amm+0.032
Neg logit contributions
 better-0.053
rell-0.048
 less-0.047
 safer-0.047
 nicer-0.044
Tokenide
Token ID485
Feature activation-7.65e-02

Pos logit contributions
 magn+0.115
ra+0.110
 Fre+0.103
 inh+0.100
 Mason+0.099
Neg logit contributions
rell-0.101
ock-0.098
 better-0.085
 company-0.081
hell-0.080
Tokeny
Token ID88
Feature activation-3.33e-02

Pos logit contributions
ra+0.116
 magn+0.114
 Fre+0.108
 cu+0.106
Tony+0.105
Neg logit contributions
rell-0.121
 better-0.119
ock-0.113
vious-0.103
ier-0.101
Token would
Token ID561
Feature activation-1.42e-02

Pos logit contributions
ra+0.053
 magn+0.052
ast+0.049
 Fre+0.049
udge+0.048
Neg logit contributions
 better-0.052
rell-0.052
 safer-0.047
 nicer-0.045
 company-0.045
Token spin
Token ID7906
Feature activation-1.50e-02

Pos logit contributions
ra+0.024
 magn+0.023
ast+0.022
 Fre+0.022
Tony+0.022
Neg logit contributions
 better-0.022
rell-0.021
 safer-0.020
 nicer-0.019
 safe-0.018
Token cob
Token ID22843
Feature activation-5.08e-02

Pos logit contributions
ra+0.025
 magn+0.025
 Fre+0.024
ast+0.024
 inh+0.024
Neg logit contributions
rell-0.022
 better-0.021
 safer-0.020
 nicer-0.020
 company-0.019
Tokenwe
Token ID732
Feature activation-2.79e-04

Pos logit contributions
alph+0.065
opsy+0.063
ast+0.062
ra+0.061
 magn+0.061
Neg logit contributions
rell-0.088
 better-0.087
ock-0.082
ier-0.082
 nicer-0.082
Token strong
Token ID1913
Feature activation+7.10e-03

Pos logit contributions
 magn+0.027
ra+0.026
ast+0.026
 inh+0.025
 Fre+0.024
Neg logit contributions
rell-0.022
 better-0.019
 safer-0.019
 nicer-0.018
 meantime-0.018
Token.
Token ID13
Feature activation+6.54e-03

Pos logit contributions
rell+0.012
 better+0.010
 nicer+0.010
 meantime+0.010
 safer+0.010
Neg logit contributions
ra-0.011
 magn-0.011
ast-0.010
 Fre-0.010
 inh-0.010
Token One
Token ID1881
Feature activation-1.30e-03

Pos logit contributions
rell+0.011
 better+0.010
 safer+0.010
 nicer+0.010
 company+0.009
Neg logit contributions
ra-0.010
 magn-0.010
 Fre-0.010
ast-0.010
 inh-0.009
Token day
Token ID1110
Feature activation+9.32e-04

Pos logit contributions
 magn+0.002
ra+0.002
ast+0.002
 Fre+0.002
 inh+0.002
Neg logit contributions
rell-0.002
 better-0.002
 nicer-0.002
 safer-0.002
 less-0.002
Token,
Token ID11
Feature activation-3.40e-03

Pos logit contributions
rell+0.001
 better+0.001
 safer+0.001
 nicer+0.001
 less+0.001
Neg logit contributions
ra-0.002
 magn-0.002
ast-0.002
 Fre-0.002
 inh-0.002
Token Frost
Token ID15122
Feature activation-7.59e-02

Pos logit contributions
ra+0.006
ast+0.006
udge+0.006
ude+0.006
 magn+0.006
Neg logit contributions
 better-0.005
 less-0.005
 safer-0.004
rell-0.004
 safe-0.004
Tokeny
Token ID88
Feature activation-1.80e-02

Pos logit contributions
 magn+0.120
 Fre+0.118
ra+0.110
 Tor+0.107
 inh+0.104
Neg logit contributions
rell-0.118
ock-0.118
 better-0.118
 company-0.107
 end-0.101
Token met
Token ID1138
Feature activation+6.16e-03

Pos logit contributions
ra+0.030
 magn+0.030
 Fre+0.029
ast+0.028
udge+0.028
Neg logit contributions
 better-0.027
rell-0.026
 safer-0.024
 nicer-0.023
 safe-0.023
Token a
Token ID257
Feature activation+7.04e-04

Pos logit contributions
rell+0.011
 better+0.010
 safer+0.010
 nicer+0.010
 company+0.009
Neg logit contributions
ra-0.009
 magn-0.009
 Fre-0.008
ast-0.008
 inh-0.008
Token little
Token ID1310
Feature activation+2.28e-03

Pos logit contributions
rell+0.001
 better+0.001
 safer+0.001
 nicer+0.001
 company+0.001
Neg logit contributions
ra-0.001
 magn-0.001
 Fre-0.001
ast-0.001
 inh-0.001
Token girl
Token ID2576
Feature activation+1.10e-02

Pos logit contributions
rell+0.004
 better+0.004
 safer+0.003
 nicer+0.003
 company+0.003
Neg logit contributions
ra-0.004
 magn-0.003
ast-0.003
 Fre-0.003
 inh-0.003
Token.
Token ID13
Feature activation-1.34e-02

Pos logit contributions
ra+0.027
 magn+0.026
ast+0.026
 Fre+0.026
 inh+0.025
Neg logit contributions
rell-0.017
 better-0.016
 safer-0.014
 nicer-0.014
 company-0.013
Token One
Token ID1881
Feature activation-1.85e-02

Pos logit contributions
ra+0.022
 magn+0.020
ast+0.020
ude+0.019
 inh+0.019
Neg logit contributions
rell-0.022
 better-0.021
 safer-0.020
 less-0.018
 company-0.018
Token day
Token ID1110
Feature activation-3.41e-03

Pos logit contributions
ra+0.035
 magn+0.034
ast+0.033
 Fre+0.033
 inh+0.032
Neg logit contributions
rell-0.025
 better-0.024
 safer-0.022
 nicer-0.021
 company-0.020
Token,
Token ID11
Feature activation-1.38e-02

Pos logit contributions
ra+0.007
ast+0.007
 magn+0.007
 Fre+0.007
ude+0.007
Neg logit contributions
 better-0.003
rell-0.003
 safer-0.003
 less-0.003
 nicer-0.003
Token Sp
Token ID1338
Feature activation-6.34e-02

Pos logit contributions
ast+0.019
ra+0.019
ude+0.018
udge+0.017
alph+0.017
Neg logit contributions
 better-0.027
 less-0.024
 safer-0.023
rell-0.023
 more-0.022
Tokenide
Token ID485
Feature activation-7.45e-02

Pos logit contributions
 magn+0.110
ra+0.104
 Fre+0.099
 Mason+0.096
 inh+0.095
Neg logit contributions
rell-0.097
ock-0.096
 better-0.083
 company-0.080
hell-0.080
Tokeny
Token ID88
Feature activation-2.64e-02

Pos logit contributions
 magn+0.120
ra+0.119
 Fre+0.113
 cu+0.110
Jul+0.108
Neg logit contributions
 better-0.105
rell-0.104
ock-0.101
vious-0.096
ier-0.088
Token noticed
Token ID6810
Feature activation-8.09e-03

Pos logit contributions
ra+0.043
 magn+0.042
 Fre+0.040
ast+0.040
udge+0.039
Neg logit contributions
 better-0.040
rell-0.040
 safer-0.035
 safe-0.034
 nicer-0.034
Token that
Token ID326
Feature activation-1.52e-03

Pos logit contributions
ra+0.014
 magn+0.013
ast+0.013
 Fre+0.013
 inh+0.012
Neg logit contributions
 better-0.012
rell-0.012
 safer-0.011
 nicer-0.011
 less-0.010
Token there
Token ID612
Feature activation+1.38e-03

Pos logit contributions
ra+0.003
ast+0.003
ude+0.003
 magn+0.003
 Fre+0.002
Neg logit contributions
 better-0.002
 safer-0.002
rell-0.002
 safe-0.002
 less-0.002
Token was
Token ID373
Feature activation-2.49e-02

Pos logit contributions
rell+0.002
 better+0.002
 safer+0.002
 nicer+0.002
 company+0.002
Neg logit contributions
ra-0.002
ast-0.002
 magn-0.002
 Fre-0.002
 inh-0.002
Token that
Token ID326
Feature activation-2.33e-02

Pos logit contributions
 better+0.030
rell+0.027
 safer+0.026
 nicer+0.026
 less+0.024
Neg logit contributions
 magn-0.047
ra-0.046
ast-0.045
 inh-0.043
 Fre-0.043
Token day
Token ID1110
Feature activation+3.09e-03

Pos logit contributions
ra+0.051
ast+0.049
 Fre+0.048
 magn+0.048
udge+0.047
Neg logit contributions
 better-0.027
rell-0.024
 safer-0.021
 end-0.020
 company-0.020
Token on
Token ID319
Feature activation-3.45e-02

Pos logit contributions
rell+0.006
 better+0.006
 safer+0.005
 nicer+0.005
 company+0.005
Neg logit contributions
ra-0.004
 magn-0.004
ast-0.004
 Fre-0.004
 inh-0.004
Token,
Token ID11
Feature activation-2.97e-02

Pos logit contributions
ast+0.086
ra+0.083
udge+0.082
Bel+0.081
 inh+0.080
Neg logit contributions
 better-0.029
rell-0.024
 safer-0.021
 wrong-0.020
 less-0.019
Token Sp
Token ID1338
Feature activation-6.52e-02

Pos logit contributions
ra+0.037
ast+0.037
 magn+0.035
ude+0.034
opsy+0.034
Neg logit contributions
 better-0.061
rell-0.058
 safer-0.056
 less-0.054
 nicer-0.052
Tokenide
Token ID485
Feature activation-7.41e-02

Pos logit contributions
 magn+0.109
ra+0.100
 Fre+0.097
 inh+0.095
 Mason+0.094
Neg logit contributions
rell-0.107
ock-0.101
 better-0.096
 company-0.091
 safer-0.088
Tokeny
Token ID88
Feature activation-2.65e-02

Pos logit contributions
 magn+0.111
ra+0.104
 Fre+0.104
 inh+0.103
 cu+0.101
Neg logit contributions
 better-0.121
rell-0.114
ock-0.112
ier-0.106
vious-0.100
Token knew
Token ID2993
Feature activation-1.24e-03

Pos logit contributions
 magn+0.047
ra+0.045
 Fre+0.044
ast+0.043
opsy+0.043
Neg logit contributions
 better-0.039
rell-0.038
 safer-0.033
 less-0.032
 nicer-0.031
Token that
Token ID326
Feature activation-1.47e-02

Pos logit contributions
ra+0.003
 magn+0.003
ast+0.003
isa+0.003
 inh+0.003
Neg logit contributions
 better-0.001
 safer-0.001
 wrong-0.001
 less-0.001
 nicer-0.001
Token even
Token ID772
Feature activation-1.46e-03

Pos logit contributions
ra+0.028
ast+0.028
opsy+0.026
ude+0.026
udge+0.026
Neg logit contributions
 better-0.023
 safer-0.020
 less-0.019
 safe-0.018
 nicer-0.018
Token when
Token ID618
Feature activation-3.68e-02

Pos logit contributions
ra+0.003
ast+0.003
opsy+0.003
Tony+0.003
 Fre+0.003
Neg logit contributions
 better-0.002
 safer-0.002
 nicer-0.002
 less-0.002
 more-0.002
Token rest
Token ID1334
Feature activation-3.32e-02

Pos logit contributions
rell+0.011
 better+0.010
 safer+0.009
 nicer+0.009
 company+0.009
Neg logit contributions
ra-0.010
 magn-0.010
ast-0.010
 Fre-0.010
 inh-0.010
Token.
Token ID13
Feature activation-1.05e-02

Pos logit contributions
ra+0.064
 magn+0.062
 Fre+0.061
ast+0.060
 inh+0.058
Neg logit contributions
 better-0.045
 safer-0.039
rell-0.039
 nicer-0.038
 safe-0.037
Token One
Token ID1881
Feature activation-1.53e-02

Pos logit contributions
ra+0.020
 magn+0.019
ast+0.019
 Fre+0.018
ude+0.018
Neg logit contributions
rell-0.015
 better-0.014
 safer-0.013
 nicer-0.012
 less-0.012
Token day
Token ID1110
Feature activation-8.15e-03

Pos logit contributions
ra+0.033
 magn+0.031
 Fre+0.031
ast+0.030
ude+0.030
Neg logit contributions
 better-0.018
rell-0.016
 safer-0.015
 company-0.014
 nicer-0.014
Token,
Token ID11
Feature activation-1.12e-02

Pos logit contributions
ra+0.017
 magn+0.016
ast+0.016
 Fre+0.016
 inh+0.016
Neg logit contributions
rell-0.009
 better-0.009
 safer-0.008
 nicer-0.008
 less-0.007
Token Buzz
Token ID15753
Feature activation-7.39e-02

Pos logit contributions
ra+0.018
ast+0.018
udge+0.017
ert+0.017
ude+0.017
Neg logit contributions
 better-0.020
 less-0.017
 safer-0.017
rell-0.017
 safe-0.016
Tokeny
Token ID88
Feature activation-2.03e-02

Pos logit contributions
 magn+0.119
ra+0.116
 Fre+0.114
 Tor+0.108
 Pats+0.106
Neg logit contributions
 better-0.111
rell-0.099
ock-0.098
vious-0.094
 end-0.092
Token was
Token ID373
Feature activation-2.07e-03

Pos logit contributions
 magn+0.036
ra+0.036
udge+0.035
 Fre+0.034
ert+0.033
Neg logit contributions
 better-0.029
rell-0.026
 safer-0.023
 nicer-0.023
 safe-0.023
Token flying
Token ID7348
Feature activation-1.06e-02

Pos logit contributions
ra+0.004
 magn+0.003
ast+0.003
 Fre+0.003
 inh+0.003
Neg logit contributions
rell-0.003
 better-0.003
 safer-0.003
 nicer-0.003
 company-0.003
Token around
Token ID1088
Feature activation-3.16e-02

Pos logit contributions
ra+0.017
 magn+0.016
ast+0.016
 Fre+0.016
 Cic+0.015
Neg logit contributions
 better-0.018
rell-0.017
 safer-0.016
 nicer-0.016
 less-0.015
Token looking
Token ID2045
Feature activation+9.00e-03

Pos logit contributions
ra+0.062
ast+0.060
opsy+0.057
amm+0.057
ude+0.056
Neg logit contributions
 better-0.045
 less-0.037
 end-0.037
 safe-0.036
 safer-0.036
Token to
Token ID284
Feature activation+3.08e-03

Pos logit contributions
 magn+0.012
 M+0.012
 inh+0.012
ra+0.011
 digging+0.011
Neg logit contributions
rell-0.039
ock-0.035
 meantime-0.034
opl-0.034
hing-0.031
Token get
Token ID651
Feature activation-5.17e-03

Pos logit contributions
rell+0.006
 better+0.005
 safer+0.005
 nicer+0.005
 less+0.005
Neg logit contributions
ra-0.004
 magn-0.004
ast-0.004
 Fre-0.004
 inh-0.004
Token home
Token ID1363
Feature activation-7.73e-03

Pos logit contributions
ra+0.006
 inh+0.006
 magn+0.006
 M+0.006
 digging+0.006
Neg logit contributions
rell-0.010
opl-0.009
ock-0.009
 meantime-0.009
 End-0.008
Token.
Token ID13
Feature activation-5.16e-03

Pos logit contributions
 magn+0.011
 inh+0.011
ra+0.011
 M+0.011
 fold+0.010
Neg logit contributions
rell-0.013
 End-0.011
ealing-0.011
 meantime-0.010
opl-0.010
Token Sp
Token ID1338
Feature activation-7.30e-02

Pos logit contributions
ra+0.006
 magn+0.006
ast+0.005
opsy+0.005
udge+0.005
Neg logit contributions
rell-0.011
 better-0.011
 safer-0.010
 less-0.010
 nicer-0.010
Tokenide
Token ID485
Feature activation-7.39e-02

Pos logit contributions
 magn+0.135
ra+0.124
 Fre+0.119
 Tony+0.116
 Mason+0.116
Neg logit contributions
rell-0.109
ock-0.108
 better-0.093
 company-0.089
ier-0.088
Tokeny
Token ID88
Feature activation-1.79e-02

Pos logit contributions
 magn+0.132
ra+0.127
 Fre+0.125
growth+0.118
 sket+0.115
Neg logit contributions
 better-0.099
ock-0.094
rell-0.093
vious-0.084
ier-0.080
Token cared
Token ID19951
Feature activation-2.44e-03

Pos logit contributions
 magn+0.033
udge+0.032
ra+0.032
 Fre+0.031
ast+0.030
Neg logit contributions
 better-0.025
rell-0.022
 nicer-0.021
 less-0.020
 end-0.020
Token about
Token ID546
Feature activation-2.50e-02

Pos logit contributions
 magn+0.004
ra+0.004
ast+0.004
 Fre+0.004
 inh+0.004
Neg logit contributions
rell-0.003
 better-0.003
 nicer-0.003
 safer-0.003
 meantime-0.003
Token the
Token ID262
Feature activation-7.44e-03

Pos logit contributions
ra+0.055
opsy+0.049
ast+0.049
udge+0.049
ude+0.049
Neg logit contributions
 better-0.033
 safer-0.028
 less-0.026
 safe-0.026
rell-0.026
Token bug
Token ID5434
Feature activation-1.01e-02

Pos logit contributions
ra+0.008
ast+0.008
 magn+0.008
 Fre+0.008
 Miles+0.007
Neg logit contributions
 better-0.016
rell-0.016
 safer-0.015
 nicer-0.014
 company-0.014
Token with
Token ID351
Feature activation-3.90e-03

Pos logit contributions
ra+0.006
 magn+0.006
ast+0.006
 Fre+0.006
 inh+0.006
Neg logit contributions
rell-0.006
 better-0.005
 safer-0.005
 nicer-0.005
 company-0.005
Token me
Token ID502
Feature activation-8.57e-03

Pos logit contributions
ra+0.007
 magn+0.007
 Fre+0.007
ast+0.007
 inh+0.007
Neg logit contributions
rell-0.005
 better-0.005
 safer-0.005
 nicer-0.005
 company-0.004
Token?"
Token ID1701
Feature activation-2.70e-02

Pos logit contributions
ra+0.015
 magn+0.015
ast+0.014
 Fre+0.014
 inh+0.014
Neg logit contributions
rell-0.013
 better-0.012
 safer-0.011
 nicer-0.011
 company-0.010
Token
Token ID198
Feature activation-1.63e-02

Pos logit contributions
ra+0.057
ast+0.053
 magn+0.053
ert+0.050
 inh+0.049
Neg logit contributions
 better-0.035
rell-0.031
 safer-0.029
 less-0.029
 company-0.026
Token
Token ID198
Feature activation-1.58e-02

Pos logit contributions
ra+0.030
 magn+0.030
ast+0.028
 inh+0.028
 Fre+0.028
Neg logit contributions
 better-0.023
rell-0.022
 safer-0.020
 nicer-0.019
 less-0.019
TokenR
Token ID49
Feature activation-7.32e-02

Pos logit contributions
 magn+0.023
ra+0.023
 knees+0.021
 inh+0.021
ast+0.020
Neg logit contributions
 better-0.028
rell-0.026
 safer-0.026
 nicer-0.025
 less-0.025
Tokenabbit
Token ID14229
Feature activation+4.40e-03

Pos logit contributions
 magn+0.106
 Fre+0.090
ra+0.089
 inh+0.088
 sket+0.085
Neg logit contributions
rell-0.138
ock-0.121
 better-0.119
 company-0.117
ealing-0.110
Token thought
Token ID1807
Feature activation+2.59e-03

Pos logit contributions
rell+0.006
 better+0.006
 safer+0.005
 nicer+0.005
 company+0.005
Neg logit contributions
ra-0.008
 magn-0.008
ast-0.007
 Fre-0.007
 inh-0.007
Token for
Token ID329
Feature activation-1.84e-02

Pos logit contributions
rell+0.004
 better+0.004
 nicer+0.004
 safer+0.004
 company+0.003
Neg logit contributions
ra-0.004
 magn-0.004
 Fre-0.004
ast-0.004
 inh-0.004
Token a
Token ID257
Feature activation-3.34e-03

Pos logit contributions
 magn+0.022
ra+0.021
 Fre+0.021
 inh+0.020
 Miles+0.020
Neg logit contributions
rell-0.038
 better-0.033
 safer-0.032
 nicer-0.032
ock-0.032
Token moment
Token ID2589
Feature activation-8.68e-03

Pos logit contributions
ra+0.006
 magn+0.005
 Fre+0.005
ast+0.005
 inh+0.005
Neg logit contributions
rell-0.005
 better-0.005
 safer-0.005
 nicer-0.004
 company-0.004
Token how
Token ID703
Feature activation-1.53e-02

Pos logit contributions
ra+0.013
 magn+0.013
 Fre+0.012
ast+0.012
 inh+0.012
Neg logit contributions
rell-0.011
 better-0.011
 safer-0.010
 nicer-0.010
 company-0.009
Token to
Token ID284
Feature activation+3.08e-03

Pos logit contributions
 magn+0.012
 M+0.012
 inh+0.012
ra+0.011
 digging+0.011
Neg logit contributions
rell-0.039
ock-0.035
 meantime-0.034
opl-0.034
hing-0.031
Token get
Token ID651
Feature activation-5.17e-03

Pos logit contributions
rell+0.006
 better+0.005
 safer+0.005
 nicer+0.005
 less+0.005
Neg logit contributions
ra-0.004
 magn-0.004
ast-0.004
 Fre-0.004
 inh-0.004
Token home
Token ID1363
Feature activation-7.73e-03

Pos logit contributions
ra+0.006
 inh+0.006
 magn+0.006
 M+0.006
 digging+0.006
Neg logit contributions
rell-0.010
opl-0.009
ock-0.009
 meantime-0.009
 End-0.008
Token.
Token ID13
Feature activation-5.16e-03

Pos logit contributions
 magn+0.011
 inh+0.011
ra+0.011
 M+0.011
 fold+0.010
Neg logit contributions
rell-0.013
 End-0.011
ealing-0.011
 meantime-0.010
opl-0.010
Token Sp
Token ID1338
Feature activation-7.30e-02

Pos logit contributions
ra+0.006
 magn+0.006
ast+0.005
opsy+0.005
udge+0.005
Neg logit contributions
rell-0.011
 better-0.011
 safer-0.010
 less-0.010
 nicer-0.010
Tokenide
Token ID485
Feature activation-7.39e-02

Pos logit contributions
 magn+0.135
ra+0.124
 Fre+0.119
 Tony+0.116
 Mason+0.116
Neg logit contributions
rell-0.109
ock-0.108
 better-0.093
 company-0.089
ier-0.088
Tokeny
Token ID88
Feature activation-1.79e-02

Pos logit contributions
 magn+0.132
ra+0.127
 Fre+0.125
growth+0.118
 sket+0.115
Neg logit contributions
 better-0.099
ock-0.094
rell-0.093
vious-0.084
ier-0.080
Token cared
Token ID19951
Feature activation-2.44e-03

Pos logit contributions
 magn+0.033
udge+0.032
ra+0.032
 Fre+0.031
ast+0.030
Neg logit contributions
 better-0.025
rell-0.022
 nicer-0.021
 less-0.020
 end-0.020
Token about
Token ID546
Feature activation-2.50e-02

Pos logit contributions
 magn+0.004
ra+0.004
ast+0.004
 Fre+0.004
 inh+0.004
Neg logit contributions
rell-0.003
 better-0.003
 nicer-0.003
 safer-0.003
 meantime-0.003
Token the
Token ID262
Feature activation-7.44e-03

Pos logit contributions
ra+0.055
opsy+0.049
ast+0.049
udge+0.049
ude+0.049
Neg logit contributions
 better-0.033
 safer-0.028
 less-0.026
 safe-0.026
rell-0.026
Token bees
Token ID17002
Feature activation-6.03e-04

Pos logit contributions
ra+0.041
 magn+0.040
 Fre+0.039
ast+0.039
 inh+0.038
Neg logit contributions
rell-0.050
 better-0.049
 safer-0.045
 nicer-0.045
 company-0.043
Token.
Token ID13
Feature activation-1.56e-02

Pos logit contributions
ra+0.001
 magn+0.001
 Fre+0.001
ast+0.001
 inh+0.001
Neg logit contributions
rell-0.001
 better-0.001
 safer-0.001
 nicer-0.001
 meantime-0.001
Token Every
Token ID3887
Feature activation-1.68e-02

Pos logit contributions
ra+0.029
 magn+0.028
ast+0.027
 Fre+0.026
 inh+0.026
Neg logit contributions
rell-0.022
 better-0.022
 safer-0.020
 nicer-0.019
 less-0.018
Token day
Token ID1110
Feature activation-2.67e-03

Pos logit contributions
ra+0.036
 magn+0.035
ast+0.034
 Fre+0.033
udge+0.033
Neg logit contributions
 better-0.019
rell-0.018
 safer-0.016
 nicer-0.015
 company-0.015
Token,
Token ID11
Feature activation-2.63e-02

Pos logit contributions
ra+0.005
 magn+0.005
 Fre+0.005
ast+0.005
 inh+0.005
Neg logit contributions
rell-0.003
 better-0.003
 safer-0.003
 nicer-0.003
 company-0.003
Token Buzz
Token ID15753
Feature activation-7.29e-02

Pos logit contributions
ra+0.040
ast+0.038
udge+0.037
 magn+0.036
ert+0.035
Neg logit contributions
 better-0.050
rell-0.045
 safer-0.045
 less-0.044
 nicer-0.043
Tokeny
Token ID88
Feature activation-2.91e-02

Pos logit contributions
ra+0.114
 magn+0.110
 Fre+0.106
 Jay+0.103
ert+0.102
Neg logit contributions
 better-0.118
rell-0.108
ock-0.105
 company-0.097
 end-0.097
Token would
Token ID561
Feature activation-1.31e-02

Pos logit contributions
ra+0.052
udge+0.050
 magn+0.050
Tony+0.048
ert+0.048
Neg logit contributions
 better-0.042
rell-0.040
 safer-0.036
 nicer-0.034
 end-0.034
Token fly
Token ID6129
Feature activation-2.13e-02

Pos logit contributions
ra+0.023
 magn+0.022
ert+0.022
udge+0.021
Tony+0.021
Neg logit contributions
 better-0.020
rell-0.019
 safer-0.017
 nicer-0.017
 safe-0.016
Token around
Token ID1088
Feature activation-3.92e-02

Pos logit contributions
ra+0.034
 magn+0.032
ast+0.032
ude+0.031
 Fre+0.030
Neg logit contributions
 better-0.036
rell-0.035
 safer-0.033
 nicer-0.032
 less-0.031
Token and
Token ID290
Feature activation+5.33e-03

Pos logit contributions
ra+0.080
ast+0.076
amm+0.076
yle+0.074
ryan+0.074
Neg logit contributions
 better-0.055
 safe-0.044
 safer-0.044
 less-0.044
 more-0.043
Token.
Token ID13
Feature activation-9.03e-03

Pos logit contributions
ra+0.021
 magn+0.021
ast+0.020
 Fre+0.020
 inh+0.020
Neg logit contributions
rell-0.014
 better-0.013
 safer-0.012
 nicer-0.012
 company-0.011
Token One
Token ID1881
Feature activation-1.58e-02

Pos logit contributions
ra+0.014
 magn+0.013
ast+0.013
 Fre+0.013
 inh+0.012
Neg logit contributions
rell-0.016
 better-0.015
 safer-0.014
 nicer-0.013
 less-0.013
Token day
Token ID1110
Feature activation-4.55e-03

Pos logit contributions
ra+0.030
 magn+0.029
ast+0.028
 Fre+0.028
 inh+0.028
Neg logit contributions
rell-0.021
 better-0.020
 safer-0.018
 nicer-0.018
 company-0.017
Token,
Token ID11
Feature activation-1.24e-02

Pos logit contributions
ra+0.010
ast+0.009
 magn+0.009
 Fre+0.009
 inh+0.009
Neg logit contributions
rell-0.005
 better-0.005
 safer-0.004
 nicer-0.004
 less-0.004
Token Sp
Token ID1338
Feature activation-6.24e-02

Pos logit contributions
ra+0.016
ast+0.015
ude+0.015
alph+0.014
udge+0.014
Neg logit contributions
 better-0.025
 less-0.023
 safer-0.023
rell-0.022
 nicer-0.021
Tokenide
Token ID485
Feature activation-7.28e-02

Pos logit contributions
 magn+0.108
ra+0.103
 Fre+0.099
 Mason+0.095
 Marc+0.095
Neg logit contributions
rell-0.094
ock-0.093
 better-0.083
 company-0.079
hell-0.077
Tokeny
Token ID88
Feature activation-2.24e-02

Pos logit contributions
 magn+0.121
ra+0.121
 Fre+0.115
 cu+0.112
Jul+0.109
Neg logit contributions
 better-0.098
rell-0.097
ock-0.094
vious-0.090
ier-0.082
Token woke
Token ID19092
Feature activation+8.22e-03

Pos logit contributions
ra+0.037
 magn+0.036
 Fre+0.034
ast+0.034
udge+0.034
Neg logit contributions
 better-0.034
rell-0.033
 safer-0.029
 safe-0.029
 nicer-0.028
Token up
Token ID510
Feature activation-3.03e-02

Pos logit contributions
rell+0.012
 better+0.011
 safer+0.010
 nicer+0.010
 company+0.010
Neg logit contributions
ra-0.015
 magn-0.014
ast-0.014
 Fre-0.014
 inh-0.014
Token and
Token ID290
Feature activation+9.45e-03

Pos logit contributions
ra+0.052
Tony+0.048
ast+0.047
 magn+0.047
 Miles+0.047
Neg logit contributions
 better-0.050
rell-0.047
 safer-0.043
 safe-0.042
 nicer-0.042
Token wanted
Token ID2227
Feature activation+7.29e-03

Pos logit contributions
rell+0.020
 better+0.018
 safer+0.018
ock+0.017
 nicer+0.017
Neg logit contributions
 magn-0.011
 inh-0.011
ra-0.011
 Fre-0.010
ast-0.010
Token flowers
Token ID12734
Feature activation-9.09e-03

Pos logit contributions
 better+0.003
rell+0.003
 safer+0.003
 nicer+0.003
 company+0.003
Neg logit contributions
ra-0.003
 magn-0.003
ast-0.003
 Fre-0.003
ude-0.003
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
ra+0.019
 magn+0.019
ast+0.018
 Fre+0.018
 inh+0.018
Neg logit contributions
rell-0.010
 better-0.010
 safer-0.009
 nicer-0.008
 company-0.008
Token One
Token ID1881
Feature activation-1.70e-02

Pos logit contributions
ra+0.020
 magn+0.020
ast+0.019
udge+0.018
ude+0.018
Neg logit contributions
rell-0.013
 better-0.013
 safer-0.011
 nicer-0.011
 less-0.011
Token day
Token ID1110
Feature activation-8.76e-03

Pos logit contributions
ra+0.033
 magn+0.032
ast+0.031
 Fre+0.031
 inh+0.030
Neg logit contributions
rell-0.022
 better-0.021
 safer-0.019
 nicer-0.018
 company-0.018
Token,
Token ID11
Feature activation-1.33e-02

Pos logit contributions
ra+0.018
 magn+0.017
ast+0.017
 Fre+0.017
 inh+0.016
Neg logit contributions
rell-0.010
 better-0.010
 safer-0.009
 nicer-0.009
 company-0.008
Token Buzz
Token ID15753
Feature activation-7.21e-02

Pos logit contributions
ra+0.021
ast+0.021
udge+0.020
ert+0.020
ude+0.020
Neg logit contributions
 better-0.024
rell-0.021
 less-0.020
 safer-0.020
 nicer-0.020
Tokeny
Token ID88
Feature activation-2.60e-02

Pos logit contributions
 magn+0.109
ra+0.107
 Fre+0.106
 Jay+0.100
 Tor+0.099
Neg logit contributions
 better-0.117
rell-0.107
ock-0.102
 company-0.098
 end-0.097
Token flew
Token ID13112
Feature activation-7.86e-03

Pos logit contributions
ra+0.044
 magn+0.044
udge+0.043
 Fre+0.043
 Miles+0.042
Neg logit contributions
 better-0.037
rell-0.035
 safer-0.031
 safe-0.031
 nicer-0.031
Token into
Token ID656
Feature activation-3.55e-02

Pos logit contributions
ra+0.011
 magn+0.010
ast+0.010
 Fre+0.010
 inh+0.010
Neg logit contributions
rell-0.015
 better-0.015
 safer-0.013
 nicer-0.013
 less-0.013
Token a
Token ID257
Feature activation-3.17e-03

Pos logit contributions
ra+0.079
ast+0.078
aped+0.073
ert+0.072
ately+0.072
Neg logit contributions
 better-0.043
 safer-0.036
rell-0.034
 safe-0.033
 less-0.032
Token sticky
Token ID23408
Feature activation-1.13e-02

Pos logit contributions
 magn+0.004
ra+0.004
 Fre+0.004
 inh+0.004
ast+0.004
Neg logit contributions
rell-0.006
 better-0.005
 company-0.005
ock-0.005
 safer-0.005
Token you
Token ID345
Feature activation+2.85e-02

Pos logit contributions
 magn+0.010
ast+0.010
ra+0.010
 inh+0.010
 Fre+0.009
Neg logit contributions
 better-0.007
rell-0.007
 nicer-0.006
 safer-0.006
ock-0.006
Token again
Token ID757
Feature activation-7.31e-03

Pos logit contributions
rell+0.043
 safer+0.035
 meantime+0.034
 nicer+0.033
 better+0.032
Neg logit contributions
ast-0.050
ra-0.049
 magn-0.045
 Fre-0.045
 inh-0.045
Token."
Token ID526
Feature activation-1.41e-02

Pos logit contributions
ra+0.015
ast+0.015
 magn+0.014
 Fre+0.014
ude+0.014
Neg logit contributions
rell-0.009
 better-0.007
 safer-0.007
ock-0.007
 nicer-0.007
Token
Token ID198
Feature activation-1.73e-03

Pos logit contributions
ra+0.027
ast+0.025
udge+0.025
 magn+0.025
ert+0.024
Neg logit contributions
 better-0.020
rell-0.018
 less-0.017
 safer-0.016
 company-0.016
Token
Token ID198
Feature activation-1.26e-02

Pos logit contributions
ra+0.003
 magn+0.003
ast+0.003
 Fre+0.003
 inh+0.003
Neg logit contributions
 better-0.002
rell-0.002
 safer-0.002
 nicer-0.002
 company-0.002
TokenR
Token ID49
Feature activation-7.21e-02

Pos logit contributions
 magn+0.024
ra+0.023
udge+0.023
 knees+0.022
ert+0.022
Neg logit contributions
 better-0.018
 less-0.015
 nicer-0.015
 safer-0.015
 end-0.014
Tokenabb
Token ID6485
Feature activation-4.28e-02

Pos logit contributions
 magn+0.099
 Fre+0.086
ra+0.085
 Jay+0.085
 Miles+0.084
Neg logit contributions
rell-0.128
 better-0.124
 company-0.123
ock-0.121
 End-0.113
Tokenie
Token ID494
Feature activation-1.60e-02

Pos logit contributions
 magn+0.077
ra+0.077
 Fre+0.075
ast+0.072
 inh+0.072
Neg logit contributions
rell-0.061
 better-0.057
ock-0.051
 nicer-0.049
 safer-0.049
Token forg
Token ID5784
Feature activation+2.24e-03

Pos logit contributions
ra+0.026
 magn+0.026
 Fre+0.024
ast+0.024
 inh+0.023
Neg logit contributions
 better-0.025
rell-0.025
 safer-0.023
 nicer-0.023
 less-0.022
Tokenave
Token ID1015
Feature activation-2.78e-02

Pos logit contributions
rell+0.003
 better+0.003
 safer+0.003
 meantime+0.003
 nicer+0.003
Neg logit contributions
ra-0.004
ast-0.004
ude-0.004
 inh-0.003
 fet-0.003
Token Fox
Token ID5426
Feature activation-3.29e-02

Pos logit contributions
ast+0.039
ra+0.039
udge+0.038
aped+0.038
 sket+0.037
Neg logit contributions
 better-0.053
rell-0.046
 less-0.044
 end-0.044
ock-0.043
Token From
Token ID3574
Feature activation+1.81e-02

Pos logit contributions
 better+0.007
rell+0.007
 safer+0.006
 less+0.006
 nicer+0.006
Neg logit contributions
 magn-0.006
ra-0.006
ast-0.006
opsy-0.006
ude-0.006
Token that
Token ID326
Feature activation-2.41e-02

Pos logit contributions
 better+0.025
rell+0.024
 safer+0.022
 nicer+0.022
 company+0.020
Neg logit contributions
ra-0.034
 magn-0.033
ast-0.032
 Fre-0.032
 inh-0.031
Token day
Token ID1110
Feature activation+7.68e-03

Pos logit contributions
ra+0.053
ast+0.050
 magn+0.050
 Fre+0.049
udge+0.049
Neg logit contributions
 better-0.028
rell-0.026
 safer-0.023
 safe-0.022
 company-0.022
Token on
Token ID319
Feature activation-3.31e-02

Pos logit contributions
rell+0.014
 better+0.014
 safer+0.013
 nicer+0.012
 company+0.012
Neg logit contributions
ra-0.011
 magn-0.010
 Fre-0.010
ast-0.010
 inh-0.010
Token,
Token ID11
Feature activation-3.17e-02

Pos logit contributions
ast+0.082
ra+0.081
udge+0.078
 inh+0.077
 magn+0.076
Neg logit contributions
 better-0.028
rell-0.025
 safer-0.022
 less-0.019
 safe-0.019
Token Sp
Token ID1338
Feature activation-7.16e-02

Pos logit contributions
ra+0.038
ast+0.038
 magn+0.035
ude+0.034
opsy+0.034
Neg logit contributions
 better-0.068
rell-0.063
 safer-0.063
 less-0.060
 nicer-0.058
Tokenide
Token ID485
Feature activation-7.70e-02

Pos logit contributions
 magn+0.124
ra+0.115
 Fre+0.112
 Mason+0.110
 Tony+0.109
Neg logit contributions
rell-0.112
ock-0.109
 better-0.097
 company-0.094
ier-0.091
Tokeny
Token ID88
Feature activation-2.35e-02

Pos logit contributions
 magn+0.117
ra+0.115
 Fre+0.112
 cu+0.109
 inh+0.109
Neg logit contributions
 better-0.123
rell-0.116
ock-0.115
ier-0.106
vious-0.104
Token made
Token ID925
Feature activation+5.30e-03

Pos logit contributions
 magn+0.042
ra+0.041
 Fre+0.039
ast+0.039
udge+0.038
Neg logit contributions
 better-0.035
rell-0.033
 safer-0.029
 less-0.028
 nicer-0.028
Token sure
Token ID1654
Feature activation-1.89e-02

Pos logit contributions
 better+0.009
rell+0.008
 safer+0.008
 nicer+0.008
 less+0.007
Neg logit contributions
ra-0.009
 magn-0.009
 Fre-0.008
ast-0.008
 inh-0.008
Token to
Token ID284
Feature activation-1.30e-02

Pos logit contributions
ast+0.034
ra+0.034
 magn+0.033
opsy+0.032
isa+0.031
Neg logit contributions
 better-0.031
 nicer-0.027
 safer-0.027
 less-0.027
 wrong-0.026
TokenAfter
Token ID3260
Feature activation-3.31e-02

Pos logit contributions
 magn+0.009
ra+0.009
 Fre+0.009
ast+0.008
opsy+0.008
Neg logit contributions
 better-0.008
rell-0.007
 safer-0.007
 nicer-0.007
ock-0.007
Token helping
Token ID5742
Feature activation-2.90e-02

Pos logit contributions
ra+0.059
ast+0.056
 magn+0.054
 Fre+0.054
ude+0.054
Neg logit contributions
 better-0.051
rell-0.046
 safer-0.045
 nicer-0.043
 less-0.043
Token the
Token ID262
Feature activation-1.42e-02

Pos logit contributions
ra+0.055
 magn+0.054
ast+0.053
ude+0.053
Tony+0.053
Neg logit contributions
 better-0.042
rell-0.036
 safer-0.036
 safe-0.035
 less-0.034
Token bug
Token ID5434
Feature activation-8.16e-03

Pos logit contributions
ra+0.018
 Fre+0.018
ast+0.018
 Miles+0.017
 Mason+0.017
Neg logit contributions
 better-0.028
 safer-0.026
rell-0.025
 nicer-0.025
 safe-0.024
Token,
Token ID11
Feature activation-2.89e-02

Pos logit contributions
ra+0.014
 magn+0.014
ast+0.014
 Fre+0.014
 inh+0.013
Neg logit contributions
rell-0.012
 better-0.012
 safer-0.011
 nicer-0.010
 company-0.010
Token Sp
Token ID1338
Feature activation-7.13e-02

Pos logit contributions
ra+0.043
ast+0.043
ude+0.040
alph+0.039
 magn+0.039
Neg logit contributions
 better-0.056
 safer-0.050
 less-0.049
rell-0.047
 safe-0.046
Tokenide
Token ID485
Feature activation-6.36e-02

Pos logit contributions
 magn+0.129
ra+0.118
 Fre+0.115
 Mason+0.113
 Tony+0.112
Neg logit contributions
rell-0.107
ock-0.106
 better-0.091
ier-0.089
 company-0.087
Tokeny
Token ID88
Feature activation-1.79e-02

Pos logit contributions
 magn+0.098
ra+0.096
 Fre+0.093
 inh+0.088
 cu+0.087
Neg logit contributions
 better-0.101
rell-0.096
ock-0.094
ier-0.088
vious-0.086
Token felt
Token ID2936
Feature activation+2.27e-02

Pos logit contributions
ra+0.030
 magn+0.030
 Fre+0.029
ast+0.028
udge+0.028
Neg logit contributions
 better-0.027
rell-0.026
 safer-0.024
 nicer-0.024
 less-0.023
Token happy
Token ID3772
Feature activation-3.89e-03

Pos logit contributions
rell+0.039
ock+0.029
opl+0.029
 Driver+0.027
 inconvenience+0.027
Neg logit contributions
 inh-0.039
 magn-0.036
 fold-0.035
 Mint-0.035
 digging-0.035
Token too
Token ID1165
Feature activation+2.17e-02

Pos logit contributions
ra+0.006
 magn+0.006
 Fre+0.006
ast+0.006
 inh+0.005
Neg logit contributions
rell-0.006
 better-0.006
 safer-0.006
 nicer-0.006
 company-0.006
Token From
Token ID3574
Feature activation+2.27e-02

Pos logit contributions
ra+0.012
 magn+0.011
ast+0.011
ude+0.010
opsy+0.010
Neg logit contributions
rell-0.013
 better-0.013
 safer-0.012
 less-0.011
 nicer-0.011
Token that
Token ID326
Feature activation-2.41e-02

Pos logit contributions
 better+0.031
rell+0.029
 safer+0.027
 nicer+0.027
 less+0.025
Neg logit contributions
ra-0.043
 magn-0.042
ast-0.041
 Fre-0.040
 inh-0.040
Token day
Token ID1110
Feature activation+3.43e-03

Pos logit contributions
ra+0.053
ast+0.052
udge+0.050
 Fre+0.050
ude+0.049
Neg logit contributions
 better-0.027
rell-0.025
 safer-0.022
 safe-0.021
 company-0.020
Token on
Token ID319
Feature activation-3.69e-02

Pos logit contributions
 better+0.006
rell+0.006
 safer+0.005
 nicer+0.005
 company+0.005
Neg logit contributions
ra-0.005
 magn-0.005
ast-0.005
 Fre-0.005
 inh-0.005
Token,
Token ID11
Feature activation-3.22e-02

Pos logit contributions
ast+0.096
ra+0.091
udge+0.091
Bel+0.090
Jul+0.088
Neg logit contributions
 better-0.028
rell-0.021
 safer-0.019
 less-0.017
 wrong-0.017
Token Sp
Token ID1338
Feature activation-7.12e-02

Pos logit contributions
ast+0.039
ra+0.039
 magn+0.036
ude+0.036
opsy+0.035
Neg logit contributions
 better-0.069
rell-0.063
 safer-0.063
 less-0.061
 nicer-0.058
Tokenide
Token ID485
Feature activation-7.84e-02

Pos logit contributions
 magn+0.122
ra+0.114
 Fre+0.108
 Mason+0.106
 inh+0.106
Neg logit contributions
rell-0.114
ock-0.109
 better-0.099
 company-0.095
ier-0.092
Tokeny
Token ID88
Feature activation-2.51e-02

Pos logit contributions
 magn+0.126
ra+0.123
 Fre+0.120
 cu+0.118
 inh+0.116
Neg logit contributions
 better-0.114
rell-0.110
ock-0.110
ier-0.101
vious-0.096
Token knew
Token ID2993
Feature activation+1.10e-03

Pos logit contributions
 magn+0.042
ra+0.041
 Fre+0.040
ast+0.039
udge+0.039
Neg logit contributions
 better-0.038
rell-0.037
 safer-0.032
 less-0.032
ock-0.031
Token that
Token ID326
Feature activation-1.11e-02

Pos logit contributions
 better+0.001
 safer+0.001
 less+0.001
 nicer+0.001
 safe+0.001
Neg logit contributions
ra-0.002
ast-0.002
 magn-0.002
udge-0.002
isa-0.002
Token even
Token ID772
Feature activation-2.02e-04

Pos logit contributions
ra+0.021
ast+0.020
opsy+0.019
udge+0.019
ude+0.019
Neg logit contributions
 better-0.017
 safer-0.015
 less-0.014
 safe-0.014
rell-0.014
Token web
Token ID3992
Feature activation-1.82e-02

Pos logit contributions
ra+0.006
 magn+0.006
 Fre+0.006
ast+0.006
 inh+0.005
Neg logit contributions
rell-0.008
 better-0.008
 safer-0.007
 nicer-0.007
 company-0.007
Token went
Token ID1816
Feature activation+1.88e-03

Pos logit contributions
ra+0.026
 magn+0.025
ast+0.024
 Fre+0.024
 inh+0.023
Neg logit contributions
rell-0.033
 better-0.032
 safer-0.030
 nicer-0.029
 company-0.028
Token flying
Token ID7348
Feature activation-1.06e-03

Pos logit contributions
rell+0.004
ock+0.003
 company+0.003
ealing+0.003
vious+0.003
Neg logit contributions
 magn-0.002
ra-0.002
 Fre-0.002
 inh-0.002
 El-0.002
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
ra+0.002
 magn+0.002
 Fre+0.002
 inh+0.002
ert+0.002
Neg logit contributions
rell-0.002
 better-0.001
 nicer-0.001
 safer-0.001
 company-0.001
Token Sp
Token ID1338
Feature activation-7.11e-02

Pos logit contributions
ra+0.024
 magn+0.024
ast+0.023
opsy+0.022
ude+0.022
Neg logit contributions
rell-0.020
 better-0.019
 safer-0.017
 less-0.016
 refreshed-0.016
Tokenide
Token ID485
Feature activation-7.12e-02

Pos logit contributions
 magn+0.130
ra+0.118
 Fre+0.113
 Tony+0.111
 Marc+0.111
Neg logit contributions
rell-0.109
ock-0.107
 better-0.090
 company-0.088
ier-0.088
Tokeny
Token ID88
Feature activation-2.11e-02

Pos logit contributions
 magn+0.112
ra+0.104
 Fre+0.104
ast+0.102
 inh+0.098
Neg logit contributions
rell-0.112
 better-0.108
ock-0.103
ier-0.093
vious-0.093
Token held
Token ID2714
Feature activation-1.55e-02

Pos logit contributions
ra+0.032
 magn+0.032
ast+0.031
 Fre+0.031
 Miles+0.030
Neg logit contributions
 better-0.034
rell-0.034
 safer-0.031
 nicer-0.030
 less-0.029
Token on
Token ID319
Feature activation-1.23e-02

Pos logit contributions
ast+0.031
ude+0.030
Fre+0.029
 sket+0.029
Jul+0.028
Neg logit contributions
 better-0.024
 safe-0.020
 safer-0.019
 less-0.018
 back-0.018
Token tight
Token ID5381
Feature activation+1.67e-03

Pos logit contributions
ra+0.019
 Fre+0.018
Tony+0.018
 magn+0.018
udge+0.018
Neg logit contributions
 better-0.021
rell-0.020
 safer-0.019
 safe-0.019
 nicer-0.018
Token and
Token ID290
Feature activation+1.37e-02

Pos logit contributions
rell+0.003
 better+0.003
 safer+0.003
 nicer+0.003
 company+0.002
Neg logit contributions
ra-0.003
 magn-0.002
ast-0.002
 Fre-0.002
 inh-0.002
Token's
Token ID338
Feature activation-4.36e-03

Pos logit contributions
ra+0.027
 magn+0.027
ast+0.026
 Fre+0.026
 inh+0.025
Neg logit contributions
rell-0.032
 better-0.031
 safer-0.029
 nicer-0.028
 company-0.027
Token web
Token ID3992
Feature activation-1.82e-02

Pos logit contributions
ra+0.006
 magn+0.006
 Fre+0.006
ast+0.006
 inh+0.005
Neg logit contributions
rell-0.008
 better-0.008
 safer-0.007
 nicer-0.007
 company-0.007
Token went
Token ID1816
Feature activation+1.88e-03

Pos logit contributions
ra+0.026
 magn+0.025
ast+0.024
 Fre+0.024
 inh+0.023
Neg logit contributions
rell-0.033
 better-0.032
 safer-0.030
 nicer-0.029
 company-0.028
Token flying
Token ID7348
Feature activation-1.06e-03

Pos logit contributions
rell+0.004
ock+0.003
 company+0.003
ealing+0.003
vious+0.003
Neg logit contributions
 magn-0.002
ra-0.002
 Fre-0.002
 inh-0.002
 El-0.002
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
ra+0.002
 magn+0.002
 Fre+0.002
 inh+0.002
ert+0.002
Neg logit contributions
rell-0.002
 better-0.001
 nicer-0.001
 safer-0.001
 company-0.001
Token Sp
Token ID1338
Feature activation-7.11e-02

Pos logit contributions
ra+0.024
 magn+0.024
ast+0.023
opsy+0.022
ude+0.022
Neg logit contributions
rell-0.020
 better-0.019
 safer-0.017
 less-0.016
 refreshed-0.016
Tokenide
Token ID485
Feature activation-7.12e-02

Pos logit contributions
 magn+0.130
ra+0.118
 Fre+0.113
 Tony+0.111
 Marc+0.111
Neg logit contributions
rell-0.109
ock-0.107
 better-0.090
 company-0.088
ier-0.088
Tokeny
Token ID88
Feature activation-2.11e-02

Pos logit contributions
 magn+0.112
ra+0.104
 Fre+0.104
ast+0.102
 inh+0.098
Neg logit contributions
rell-0.112
 better-0.108
ock-0.103
ier-0.093
vious-0.093
Token held
Token ID2714
Feature activation-1.55e-02

Pos logit contributions
ra+0.032
 magn+0.032
ast+0.031
 Fre+0.031
 Miles+0.030
Neg logit contributions
 better-0.034
rell-0.034
 safer-0.031
 nicer-0.030
 less-0.029
Token on
Token ID319
Feature activation-1.23e-02

Pos logit contributions
ast+0.031
ude+0.030
Fre+0.029
 sket+0.029
Jul+0.028
Neg logit contributions
 better-0.024
 safe-0.020
 safer-0.019
 less-0.018
 back-0.018
Token tight
Token ID5381
Feature activation+1.67e-03

Pos logit contributions
ra+0.019
 Fre+0.018
Tony+0.018
 magn+0.018
udge+0.018
Neg logit contributions
 better-0.021
rell-0.020
 safer-0.019
 safe-0.019
 nicer-0.018
Token surfing
Token ID36254
Feature activation+1.16e-03

Pos logit contributions
rell+0.022
 better+0.020
 safer+0.019
 nicer+0.019
 meantime+0.019
Neg logit contributions
 magn-0.014
ra-0.014
 Fre-0.014
 inh-0.013
ast-0.013
Token.
Token ID13
Feature activation-7.78e-03

Pos logit contributions
rell+0.002
 better+0.002
 safer+0.002
 nicer+0.002
 company+0.001
Neg logit contributions
ra-0.002
 magn-0.002
 Fre-0.002
ast-0.002
 Miles-0.002
Token He
Token ID679
Feature activation-8.86e-03

Pos logit contributions
ra+0.014
 magn+0.014
ast+0.014
opsy+0.013
 Fre+0.013
Neg logit contributions
rell-0.011
 better-0.011
 safer-0.010
 less-0.009
 nicer-0.009
Token was
Token ID373
Feature activation+9.06e-03

Pos logit contributions
 magn+0.015
ra+0.015
ast+0.014
 Fre+0.014
udge+0.014
Neg logit contributions
 better-0.013
rell-0.013
 nicer-0.012
 safer-0.012
 less-0.011
Token so
Token ID523
Feature activation+9.72e-03

Pos logit contributions
rell+0.013
 better+0.012
 safer+0.011
 nicer+0.011
ock+0.010
Neg logit contributions
ra-0.016
 Fre-0.016
 magn-0.016
ast-0.016
 inh-0.015
Token en
Token ID551
Feature activation+4.24e-02

Pos logit contributions
rell+0.014
 better+0.013
 safer+0.012
 nicer+0.012
 company+0.012
Neg logit contributions
ra-0.017
 magn-0.017
 Fre-0.017
ast-0.016
 inh-0.016
Tokenvious
Token ID1442
Feature activation+5.98e-03

Pos logit contributions
rell+0.031
 safer+0.027
 better+0.025
 company+0.024
 nicer+0.022
Neg logit contributions
ra-0.100
udge-0.099
ude-0.098
Tony-0.096
ast-0.096
Token that
Token ID326
Feature activation-2.95e-03

Pos logit contributions
rell+0.008
 better+0.008
 safer+0.007
 nicer+0.007
 company+0.007
Neg logit contributions
ra-0.011
 magn-0.011
ast-0.011
 Fre-0.010
 inh-0.010
Token he
Token ID339
Feature activation-5.45e-03

Pos logit contributions
ra+0.005
ast+0.005
 magn+0.005
 Fre+0.005
opsy+0.005
Neg logit contributions
 better-0.004
rell-0.004
 safer-0.004
 nicer-0.004
 less-0.004
Token could
Token ID714
Feature activation-6.10e-03

Pos logit contributions
ra+0.008
 magn+0.008
 Fre+0.008
ast+0.008
 inh+0.008
Neg logit contributions
rell-0.009
 better-0.009
 safer-0.008
 nicer-0.008
 company-0.008
Token hardly
Token ID8941
Feature activation+1.24e-02

Pos logit contributions
 magn+0.010
ra+0.010
ast+0.009
 Fre+0.009
 Cic+0.009
Neg logit contributions
 better-0.010
rell-0.010
 safer-0.009
 nicer-0.009
 less-0.009
Token way
Token ID835
Feature activation+1.89e-02

Pos logit contributions
ra+0.030
 Fre+0.030
 magn+0.029
ast+0.029
 Miles+0.029
Neg logit contributions
 better-0.022
rell-0.021
 nicer-0.019
 safer-0.019
 company-0.019
Token to
Token ID284
Feature activation+4.08e-03

Pos logit contributions
rell+0.029
 better+0.027
 nicer+0.026
 safer+0.026
 End+0.024
Neg logit contributions
ra-0.030
 magn-0.029
 Fre-0.029
ert-0.028
ast-0.028
Token use
Token ID779
Feature activation-3.92e-03

Pos logit contributions
rell+0.006
 better+0.006
 safer+0.006
 nicer+0.006
 company+0.005
Neg logit contributions
ra-0.007
 magn-0.007
ast-0.006
 Fre-0.006
 inh-0.006
Token it
Token ID340
Feature activation+1.11e-03

Pos logit contributions
ra+0.006
 magn+0.006
ast+0.006
 Fre+0.006
 inh+0.006
Neg logit contributions
rell-0.006
 better-0.006
 safer-0.006
 nicer-0.006
 company-0.005
Token!
Token ID0
Feature activation+1.88e-02

Pos logit contributions
rell+0.002
 better+0.002
 safer+0.002
 nicer+0.002
ock+0.001
Neg logit contributions
ra-0.002
 magn-0.002
 Fre-0.002
ast-0.002
 inh-0.002
Tokenâ
Token ID22940
Feature activation+3.78e-02

Pos logit contributions
rell+0.029
 better+0.028
 safer+0.025
 nicer+0.024
 less+0.024
Neg logit contributions
ra-0.032
 magn-0.032
ast-0.030
 Fre-0.030
 inh-0.029
Token
Token ID26391
Feature activation+3.41e-03

Pos logit contributions
 safer+0.051
 better+0.051
 nicer+0.051
 less+0.046
rell+0.046
Neg logit contributions
ra-0.068
ast-0.065
 Fre-0.064
ix-0.063
ert-0.061
Token Baby
Token ID14801
Feature activation-9.42e-04

Pos logit contributions
 better+0.006
rell+0.006
 nicer+0.005
 safer+0.005
ock+0.005
Neg logit contributions
ra-0.005
ast-0.005
 magn-0.005
 Fre-0.005
ude-0.005
Token listened
Token ID16399
Feature activation-5.78e-04

Pos logit contributions
ra+0.001
 magn+0.001
 Fre+0.001
ast+0.001
 inh+0.001
Neg logit contributions
rell-0.002
 better-0.002
 safer-0.002
 nicer-0.001
 company-0.001
Token to
Token ID284
Feature activation-2.72e-03

Pos logit contributions
ra+0.001
 magn+0.001
 Fre+0.001
ast+0.001
 inh+0.001
Neg logit contributions
 better-0.001
rell-0.001
 safer-0.001
 nicer-0.001
 less-0.001
Token her
Token ID607
Feature activation-8.33e-03

Pos logit contributions
ra+0.005
 magn+0.005
ast+0.004
 inh+0.004
ude+0.004
Neg logit contributions
 better-0.004
rell-0.004
 safer-0.004
 company-0.003
 less-0.003
Token!"
Token ID2474
Feature activation+1.06e-02

Pos logit contributions
rell+0.006
 better+0.006
 safer+0.006
 nicer+0.005
 meantime+0.005
Neg logit contributions
ra-0.009
 magn-0.009
ast-0.009
 Fre-0.009
 Miles-0.008
Token His
Token ID2399
Feature activation-2.93e-03

Pos logit contributions
rell+0.018
 better+0.017
 safer+0.016
 nicer+0.016
 company+0.015
Neg logit contributions
ra-0.017
 magn-0.016
 Fre-0.016
ast-0.015
 inh-0.015
Token mom
Token ID1995
Feature activation+1.94e-02

Pos logit contributions
ra+0.003
 magn+0.003
ast+0.003
 Fre+0.003
 inh+0.003
Neg logit contributions
rell-0.006
 better-0.006
 safer-0.006
 nicer-0.006
ock-0.006
Token smiled
Token ID13541
Feature activation-6.16e-03

Pos logit contributions
rell+0.028
 better+0.026
 safer+0.025
 nicer+0.024
 company+0.023
Neg logit contributions
ra-0.035
ast-0.033
 magn-0.033
 Fre-0.032
 inh-0.032
Token and
Token ID290
Feature activation+4.31e-02

Pos logit contributions
ra+0.006
ast+0.006
 magn+0.006
 inh+0.006
 Fre+0.006
Neg logit contributions
rell-0.014
 better-0.012
 safer-0.012
 nicer-0.012
ock-0.012
Token said
Token ID531
Feature activation+3.78e-02

Pos logit contributions
 better+0.053
rell+0.048
 safer+0.044
 nicer+0.044
 less+0.041
Neg logit contributions
ra-0.090
 magn-0.088
ast-0.085
udge-0.083
ude-0.083
Token,
Token ID11
Feature activation+3.65e-03

Pos logit contributions
rell+0.059
 better+0.053
 safer+0.052
ock+0.050
 meantime+0.050
Neg logit contributions
ra-0.062
 Fre-0.062
 magn-0.060
 Cic-0.058
ast-0.058
Token "
Token ID366
Feature activation-5.77e-03

Pos logit contributions
rell+0.006
 better+0.006
 safer+0.005
 nicer+0.005
 less+0.005
Neg logit contributions
ra-0.006
 magn-0.006
 Fre-0.006
ast-0.005
 inh-0.005
TokenYou
Token ID1639
Feature activation+8.78e-03

Pos logit contributions
 magn+0.010
ra+0.010
ast+0.009
 Fre+0.009
 inh+0.009
Neg logit contributions
rell-0.009
 better-0.009
 safer-0.008
 nicer-0.008
ock-0.008
Token're
Token ID821
Feature activation+2.09e-02

Pos logit contributions
rell+0.012
 safer+0.011
 nicer+0.011
 better+0.011
ock+0.010
Neg logit contributions
ra-0.016
 magn-0.016
 Fre-0.015
ast-0.015
 inh-0.015
Token welcome
Token ID7062
Feature activation+1.15e-02

Pos logit contributions
rell+0.025
ock+0.020
 nicer+0.019
vious+0.019
 better+0.018
Neg logit contributions
 Fre-0.044
ra-0.041
 Mason-0.041
 Tony-0.041
 magn-0.041
Token mom
Token ID1995
Feature activation+2.58e-02

Pos logit contributions
rell+0.002
 better+0.002
 safer+0.002
 nicer+0.002
 company+0.001
Neg logit contributions
ra-0.001
 Fre-0.001
 magn-0.001
ast-0.001
 inh-0.001
Token smiled
Token ID13541
Feature activation-2.37e-03

Pos logit contributions
 better+0.034
rell+0.033
 nicer+0.029
 safer+0.028
 less+0.027
Neg logit contributions
 magn-0.050
ra-0.048
 Fre-0.048
ast-0.045
ert-0.045
Token and
Token ID290
Feature activation+4.53e-02

Pos logit contributions
ra+0.002
ast+0.002
 magn+0.002
 inh+0.002
 Fre+0.002
Neg logit contributions
rell-0.005
 better-0.005
 safer-0.005
 nicer-0.005
ock-0.005
Token said
Token ID531
Feature activation+3.75e-02

Pos logit contributions
 better+0.048
rell+0.039
 nicer+0.038
 safer+0.036
 less+0.033
Neg logit contributions
ra-0.106
 magn-0.103
udge-0.100
ast-0.099
ert-0.098
Token "
Token ID366
Feature activation-9.85e-03

Pos logit contributions
rell+0.056
 better+0.054
 safer+0.050
 nicer+0.049
 company+0.047
Neg logit contributions
ra-0.064
 magn-0.063
 Fre-0.061
ast-0.061
 inh-0.059
TokenYes
Token ID5297
Feature activation+4.64e-02

Pos logit contributions
 magn+0.015
ra+0.014
 Fre+0.014
ast+0.014
 inh+0.013
Neg logit contributions
rell-0.017
 better-0.017
 nicer-0.015
 safer-0.015
 less-0.015
Token,
Token ID11
Feature activation+3.07e-04

Pos logit contributions
rell+0.082
 better+0.073
 safer+0.072
 company+0.069
ock+0.069
Neg logit contributions
ra-0.071
ast-0.066
 Fre-0.063
 Tony-0.063
 Cic-0.062
Token it
Token ID340
Feature activation+2.79e-02

Pos logit contributions
rell+0.001
 better+0.001
 safer+0.000
 nicer+0.000
 company+0.000
Neg logit contributions
ra-0.000
 magn-0.000
 Fre-0.000
ast-0.000
 inh-0.000
Token does
Token ID857
Feature activation+2.00e-02

Pos logit contributions
rell+0.038
 better+0.038
 safer+0.033
 nicer+0.032
 less+0.031
Neg logit contributions
ra-0.051
 magn-0.050
 Fre-0.050
ast-0.049
 inh-0.048
Token!
Token ID0
Feature activation+2.28e-02

Pos logit contributions
rell+0.027
 better+0.024
 safer+0.023
ock+0.023
 nicer+0.022
Neg logit contributions
ra-0.038
ast-0.037
 magn-0.037
 Fre-0.036
 inh-0.035
Token And
Token ID843
Feature activation+1.98e-02

Pos logit contributions
rell+0.035
 better+0.034
 safer+0.031
 less+0.030
 nicer+0.030
Neg logit contributions
 magn-0.038
ra-0.037
 inh-0.035
ast-0.035
 Fre-0.035
TokenThe
Token ID464
Feature activation+2.20e-02

Pos logit contributions
 better+0.008
rell+0.008
 safer+0.008
 nicer+0.007
 company+0.007
Neg logit contributions
ra-0.009
 magn-0.009
 Fre-0.009
 inh-0.008
ast-0.008
Token other
Token ID584
Feature activation-3.87e-03

Pos logit contributions
rell+0.037
 better+0.035
 safer+0.033
 nicer+0.032
 company+0.031
Neg logit contributions
ra-0.034
 magn-0.033
ast-0.032
 Fre-0.032
 inh-0.031
Token animals
Token ID4695
Feature activation+3.17e-03

Pos logit contributions
ast+0.006
 magn+0.006
ra+0.006
 Fre+0.006
 inh+0.006
Neg logit contributions
 better-0.006
rell-0.006
 end-0.005
 company-0.005
 nicer-0.005
Token were
Token ID547
Feature activation+5.36e-03

Pos logit contributions
rell+0.005
 better+0.005
 safer+0.005
 nicer+0.004
 company+0.004
Neg logit contributions
ra-0.005
 magn-0.005
ast-0.005
 Fre-0.005
 inh-0.005
Token un
Token ID555
Feature activation+4.88e-03

Pos logit contributions
rell+0.008
 better+0.007
ock+0.006
 safer+0.006
 nicer+0.006
Neg logit contributions
ra-0.010
 Fre-0.009
 magn-0.009
ast-0.009
 Miles-0.009
Tokenimp
Token ID11011
Feature activation+4.35e-02

Pos logit contributions
rell+0.009
 better+0.008
 safer+0.008
 nicer+0.008
 company+0.007
Neg logit contributions
 magn-0.007
ra-0.007
 Fre-0.007
ast-0.006
 inh-0.006
Tokenressed
Token ID2790
Feature activation-7.09e-03

Pos logit contributions
rell+0.062
 better+0.061
 safer+0.059
 nicer+0.056
 meantime+0.055
Neg logit contributions
ra-0.075
 Fre-0.071
ude-0.070
ert-0.070
ast-0.069
Token and
Token ID290
Feature activation+8.64e-03

Pos logit contributions
ra+0.011
 magn+0.010
ast+0.010
 Fre+0.010
 inh+0.010
Neg logit contributions
rell-0.012
 better-0.012
 safer-0.011
 nicer-0.011
 company-0.010
Token scoff
Token ID39247
Feature activation-9.00e-03

Pos logit contributions
 better+0.014
rell+0.014
 safer+0.013
 nicer+0.013
 less+0.013
Neg logit contributions
ra-0.013
 magn-0.013
ast-0.013
 Fre-0.012
 inh-0.012
Tokened
Token ID276
Feature activation-6.32e-03

Pos logit contributions
 magn+0.013
ra+0.013
ast+0.012
 Fre+0.012
 inh+0.012
Neg logit contributions
rell-0.016
 better-0.015
 nicer-0.014
 safer-0.014
 company-0.014
Token at
Token ID379
Feature activation-2.30e-02

Pos logit contributions
ra+0.010
 magn+0.010
ast+0.010
 Fre+0.010
 inh+0.009
Neg logit contributions
rell-0.010
 better-0.009
 safer-0.009
 nicer-0.009
ock-0.008
Token down
Token ID866
Feature activation-1.96e-02

Pos logit contributions
 better+0.027
rell+0.024
 nicer+0.024
 safer+0.024
 less+0.023
Neg logit contributions
ra-0.032
 magn-0.032
ast-0.031
udge-0.031
 Fre-0.030
Token the
Token ID262
Feature activation+1.45e-02

Pos logit contributions
ra+0.034
 magn+0.034
ast+0.033
 Fre+0.033
 inh+0.032
Neg logit contributions
 better-0.028
rell-0.028
 safer-0.026
 nicer-0.025
 less-0.024
Token street
Token ID4675
Feature activation+1.88e-02

Pos logit contributions
rell+0.024
 better+0.023
 safer+0.022
 nicer+0.021
ock+0.020
Neg logit contributions
ra-0.022
 magn-0.022
ast-0.021
 Fre-0.021
 inh-0.021
Token.
Token ID13
Feature activation+4.18e-03

Pos logit contributions
rell+0.028
 better+0.026
 safer+0.024
ock+0.024
 nicer+0.023
Neg logit contributions
ra-0.032
 magn-0.031
 Fre-0.030
ast-0.030
 inh-0.030
Token\n
Token ID198
Feature activation+1.65e-03

Pos logit contributions
rell+0.007
 better+0.006
 safer+0.006
 less+0.005
 nicer+0.005
Neg logit contributions
ra-0.007
 magn-0.007
ast-0.007
 Fre-0.007
 inh-0.006
Token\n
Token ID198
Feature activation+5.09e-04

Pos logit contributions
rell+0.003
 better+0.003
 safer+0.002
 nicer+0.002
 company+0.002
Neg logit contributions
ra-0.003
 magn-0.003
 Fre-0.003
ast-0.003
 inh-0.003
TokenThe
Token ID464
Feature activation+2.07e-03

Pos logit contributions
rell+0.001
 better+0.001
 safer+0.001
 nicer+0.001
 company+0.001
Neg logit contributions
ra-0.001
 magn-0.001
 Fre-0.001
ast-0.001
 inh-0.001
Token son
Token ID3367
Feature activation+9.65e-03

Pos logit contributions
rell+0.005
 better+0.004
 safer+0.004
 nicer+0.004
ock+0.004
Neg logit contributions
ra-0.002
 magn-0.002
ast-0.002
 Fre-0.002
 inh-0.002
Token was
Token ID373
Feature activation+2.45e-02

Pos logit contributions
 better+0.015
rell+0.014
 safer+0.013
 nicer+0.013
 less+0.012
Neg logit contributions
 magn-0.016
ra-0.016
 Fre-0.015
ast-0.015
udge-0.015
Token amazed
Token ID24562
Feature activation+1.69e-02

Pos logit contributions
rell+0.033
 better+0.029
ock+0.028
 safer+0.027
 nicer+0.027
Neg logit contributions
ra-0.046
 Fre-0.045
ast-0.044
 magn-0.043
 inh-0.043
Token at
Token ID379
Feature activation-1.35e-02

Pos logit contributions
 better+0.027
rell+0.026
 nicer+0.025
 safer+0.024
 less+0.023
Neg logit contributions
 magn-0.028
ra-0.027
ast-0.026
 Fre-0.026
 inh-0.025
Token,
Token ID11
Feature activation-5.89e-03

Pos logit contributions
rell+0.001
 better+0.001
 safer+0.001
 nicer+0.001
 company+0.001
Neg logit contributions
ra-0.001
 magn-0.001
 Fre-0.001
ast-0.001
ert-0.001
Token this
Token ID428
Feature activation-9.18e-04

Pos logit contributions
ra+0.010
 magn+0.010
udge+0.010
ast+0.010
ude+0.010
Neg logit contributions
 better-0.009
 safer-0.008
rell-0.007
 nicer-0.007
 safe-0.007
Token place
Token ID1295
Feature activation+3.04e-03

Pos logit contributions
ra+0.001
 magn+0.001
ast+0.001
 Fre+0.001
 inh+0.001
Neg logit contributions
rell-0.002
 better-0.002
 safer-0.001
 nicer-0.001
 company-0.001
Token is
Token ID318
Feature activation+1.18e-02

Pos logit contributions
rell+0.005
 better+0.005
 safer+0.005
 nicer+0.005
ock+0.005
Neg logit contributions
ra-0.004
 magn-0.004
 Fre-0.004
 inh-0.004
ast-0.004
Token quite
Token ID2407
Feature activation+1.87e-02

Pos logit contributions
rell+0.023
ock+0.020
vious+0.019
ealing+0.019
opl+0.019
Neg logit contributions
 magn-0.014
ra-0.014
 Fre-0.014
 inh-0.014
 M-0.014
Token impressive
Token ID8036
Feature activation+1.72e-02

Pos logit contributions
rell+0.031
ock+0.025
 company+0.023
 better+0.023
 meantime+0.023
Neg logit contributions
ra-0.030
 magn-0.030
 Fre-0.029
 inh-0.029
ast-0.028
Token!
Token ID0
Feature activation+1.74e-02

Pos logit contributions
rell+0.022
 better+0.020
 safer+0.019
 nicer+0.018
 company+0.017
Neg logit contributions
ra-0.033
 magn-0.032
 Fre-0.032
ast-0.032
 inh-0.031
Token There
Token ID1318
Feature activation-3.23e-03

Pos logit contributions
rell+0.029
 better+0.028
 safer+0.026
 nicer+0.025
 company+0.024
Neg logit contributions
ra-0.027
 magn-0.027
 Fre-0.026
ast-0.026
 inh-0.025
Token must
Token ID1276
Feature activation+9.37e-03

Pos logit contributions
ra+0.005
 magn+0.005
 Fre+0.005
 inh+0.005
 Miles+0.005
Neg logit contributions
rell-0.005
 better-0.005
 nicer-0.005
 safer-0.005
ock-0.004
Token be
Token ID307
Feature activation-4.20e-03

Pos logit contributions
rell+0.015
 better+0.014
 safer+0.013
 nicer+0.013
 company+0.012
Neg logit contributions
ra-0.015
 magn-0.015
 Fre-0.014
ast-0.014
 inh-0.014
Token something
Token ID1223
Feature activation-3.89e-03

Pos logit contributions
ra+0.005
 magn+0.005
 Fre+0.005
 inh+0.005
ast+0.005
Neg logit contributions
rell-0.008
 better-0.007
ock-0.007
 meantime-0.007
 safer-0.007
Token told
Token ID1297
Feature activation-1.30e-03

Pos logit contributions
 magn+0.023
udge+0.022
ra+0.022
 Fre+0.021
uggets+0.021
Neg logit contributions
 better-0.017
rell-0.014
 safer-0.014
 nicer-0.014
 safe-0.013
Token all
Token ID477
Feature activation-6.33e-03

Pos logit contributions
ra+0.002
 magn+0.002
ast+0.002
 Fre+0.002
opsy+0.002
Neg logit contributions
 better-0.002
rell-0.002
 nicer-0.002
 safer-0.002
 less-0.002
Token his
Token ID465
Feature activation-2.04e-02

Pos logit contributions
ra+0.011
 magn+0.011
ast+0.010
 Fre+0.010
 inh+0.010
Neg logit contributions
rell-0.010
 better-0.009
 safer-0.009
 nicer-0.008
 company-0.008
Token friends
Token ID2460
Feature activation+8.98e-03

Pos logit contributions
ra+0.031
 magn+0.030
ast+0.030
 Fre+0.030
 inh+0.028
Neg logit contributions
rell-0.034
 better-0.034
 safer-0.031
 nicer-0.031
 company-0.030
Token that
Token ID326
Feature activation-9.49e-03

Pos logit contributions
rell+0.014
 better+0.013
 safer+0.012
 nicer+0.012
 company+0.011
Neg logit contributions
ra-0.015
 magn-0.015
 Fre-0.014
ast-0.014
 inh-0.014
Token the
Token ID262
Feature activation+2.14e-03

Pos logit contributions
ra+0.017
ast+0.017
 magn+0.016
 Fre+0.016
opsy+0.016
Neg logit contributions
 better-0.015
rell-0.013
 safer-0.013
 nicer-0.012
 company-0.012
Token turkey
Token ID26876
Feature activation-1.87e-02

Pos logit contributions
rell+0.005
 better+0.004
 safer+0.004
 nicer+0.004
ock+0.004
Neg logit contributions
ra-0.002
 magn-0.002
 Fre-0.002
ast-0.002
 inh-0.002
Token was
Token ID373
Feature activation-4.42e-03

Pos logit contributions
ra+0.030
 magn+0.029
ast+0.028
 Fre+0.028
 inh+0.027
Neg logit contributions
 better-0.030
rell-0.030
 safer-0.027
 nicer-0.027
 company-0.026
Token not
Token ID407
Feature activation-1.32e-02

Pos logit contributions
ra+0.007
 Fre+0.007
 inh+0.006
ast+0.006
 magn+0.006
Neg logit contributions
rell-0.007
ock-0.006
 better-0.006
 meantime-0.006
 safer-0.006
Token dirty
Token ID11841
Feature activation+2.01e-03

Pos logit contributions
ra+0.024
 magn+0.023
 Fre+0.023
 inh+0.023
ast+0.023
Neg logit contributions
rell-0.018
 better-0.016
 safer-0.015
 meantime-0.015
 nicer-0.014
Token,
Token ID11
Feature activation-3.01e-02

Pos logit contributions
rell+0.002
 better+0.002
 safer+0.002
 nicer+0.002
 company+0.002
Neg logit contributions
ra-0.004
 magn-0.004
ast-0.004
 Fre-0.004
 inh-0.004
Token he
Token ID339
Feature activation-1.79e-02

Pos logit contributions
 magn+0.001
ra+0.001
ast+0.001
ude+0.001
udge+0.001
Neg logit contributions
 better-0.000
rell-0.000
 safer-0.000
 less-0.000
 nicer-0.000
Token sat
Token ID3332
Feature activation-2.20e-02

Pos logit contributions
 magn+0.033
udge+0.032
ra+0.031
 Fre+0.031
opsy+0.030
Neg logit contributions
 better-0.027
rell-0.023
 nicer-0.022
 safer-0.022
 less-0.021
Token down
Token ID866
Feature activation-3.58e-02

Pos logit contributions
ra+0.040
 magn+0.040
 Fre+0.038
ast+0.038
 inh+0.037
Neg logit contributions
 better-0.031
rell-0.029
 safer-0.028
 nicer-0.027
 less-0.026
Token next
Token ID1306
Feature activation-8.29e-03

Pos logit contributions
ra+0.057
 magn+0.055
opsy+0.055
alph+0.055
ast+0.054
Neg logit contributions
 better-0.061
rell-0.056
 safer-0.053
 nicer-0.052
 less-0.052
Token to
Token ID284
Feature activation-3.48e-03

Pos logit contributions
ra+0.010
 magn+0.010
ast+0.009
 Fre+0.009
 inh+0.009
Neg logit contributions
rell-0.017
 better-0.016
 safer-0.015
 nicer-0.015
 company-0.015
Token the
Token ID262
Feature activation+3.39e-03

Pos logit contributions
ra+0.006
 magn+0.006
ude+0.006
ast+0.006
opsy+0.006
Neg logit contributions
 better-0.005
rell-0.005
 safer-0.004
 safe-0.004
 less-0.004
Token rabbit
Token ID22746
Feature activation+1.87e-02

Pos logit contributions
rell+0.006
 better+0.005
 safer+0.005
 nicer+0.005
ock+0.005
Neg logit contributions
ra-0.005
 magn-0.005
 Fre-0.005
ast-0.005
 inh-0.005
Token and
Token ID290
Feature activation+1.43e-02

Pos logit contributions
rell+0.032
 better+0.032
 safer+0.029
 nicer+0.029
 company+0.028
Neg logit contributions
ra-0.027
 magn-0.027
 Fre-0.026
ast-0.026
 inh-0.025
Token they
Token ID484
Feature activation+1.54e-02

Pos logit contributions
 better+0.021
 safer+0.018
 nicer+0.017
 less+0.017
 safe+0.017
Neg logit contributions
ra-0.027
udge-0.026
 magn-0.026
ude-0.025
ast-0.025
Token sat
Token ID3332
Feature activation-1.00e-02

Pos logit contributions
 better+0.023
rell+0.023
 safer+0.020
 nicer+0.020
 safe+0.019
Neg logit contributions
 magn-0.026
ra-0.026
 Fre-0.025
ast-0.025
udge-0.024
Token there
Token ID612
Feature activation+1.36e-02

Pos logit contributions
ra+0.015
 magn+0.015
ast+0.015
 Fre+0.015
 inh+0.014
Neg logit contributions
rell-0.017
 better-0.017
 safer-0.015
 nicer-0.015
 company-0.014
Token You
Token ID921
Feature activation+1.01e-02

Pos logit contributions
 magn+0.013
ra+0.013
ast+0.013
ert+0.013
 inh+0.012
Neg logit contributions
rell-0.010
 better-0.010
 safer-0.009
 nicer-0.009
 less-0.009
Token want
Token ID765
Feature activation+1.04e-02

Pos logit contributions
rell+0.016
 safer+0.014
 better+0.013
 nicer+0.013
ock+0.013
Neg logit contributions
ra-0.017
 magn-0.017
ast-0.016
 Fre-0.016
 inh-0.016
Token to
Token ID284
Feature activation+4.04e-03

Pos logit contributions
rell+0.021
 better+0.020
 safer+0.019
 nicer+0.019
 company+0.019
Neg logit contributions
ra-0.012
 magn-0.012
 Fre-0.011
ast-0.011
 inh-0.011
Token share
Token ID2648
Feature activation+9.43e-03

Pos logit contributions
rell+0.006
 better+0.006
 safer+0.005
 nicer+0.005
 company+0.005
Neg logit contributions
 magn-0.007
ra-0.007
 Fre-0.006
ast-0.006
 inh-0.006
Token your
Token ID534
Feature activation-1.29e-02

Pos logit contributions
rell+0.017
 better+0.015
 safer+0.014
 nicer+0.014
 company+0.013
Neg logit contributions
ra-0.014
 magn-0.014
ast-0.013
 Fre-0.013
 inh-0.012
Token doll
Token ID3654
Feature activation+1.21e-02

Pos logit contributions
ra+0.018
 magn+0.018
ast+0.016
 inh+0.016
 Fre+0.015
Neg logit contributions
rell-0.024
 better-0.022
 safer-0.021
ock-0.021
 nicer-0.020
Token with
Token ID351
Feature activation+1.60e-03

Pos logit contributions
rell+0.022
 better+0.021
 safer+0.020
 nicer+0.020
ock+0.019
Neg logit contributions
ra-0.016
 magn-0.016
ast-0.016
 Fre-0.015
 inh-0.014
Token me
Token ID502
Feature activation-2.09e-03

Pos logit contributions
rell+0.002
 better+0.002
 safer+0.002
 nicer+0.002
 company+0.002
Neg logit contributions
ra-0.003
 magn-0.003
ast-0.003
 Fre-0.003
 inh-0.003
Token?"
Token ID1701
Feature activation-3.48e-03

Pos logit contributions
ra+0.004
ast+0.004
 magn+0.004
 Fre+0.004
 inh+0.004
Neg logit contributions
rell-0.003
 better-0.003
 safer-0.002
 nicer-0.002
ock-0.002
Token\n
Token ID198
Feature activation-5.44e-03

Pos logit contributions
ra+0.006
 magn+0.006
ast+0.006
 inh+0.006
 Fre+0.006
Neg logit contributions
 better-0.005
rell-0.005
 safer-0.004
 less-0.004
 nicer-0.004
Token\n
Token ID198
Feature activation-3.87e-03

Pos logit contributions
ra+0.007
 magn+0.007
 Fre+0.007
ast+0.007
Tony+0.007
Neg logit contributions
rell-0.010
 better-0.010
 nicer-0.009
 safer-0.009
 company-0.009
Token time
Token ID640
Feature activation-1.92e-02

Pos logit contributions
ra+0.029
 magn+0.028
 Fre+0.028
ast+0.028
 Cic+0.027
Neg logit contributions
 better-0.026
rell-0.025
 safer-0.024
 nicer-0.024
 less-0.023
Token.
Token ID13
Feature activation+1.74e-02

Pos logit contributions
ra+0.038
 magn+0.037
 Fre+0.037
ast+0.036
 inh+0.036
Neg logit contributions
rell-0.024
 better-0.022
 safer-0.020
 nicer-0.020
 company-0.018
Token<|endoftext|>
Token ID50256
Feature activation+8.59e-03

Pos logit contributions
rell+0.027
 better+0.026
 safer+0.025
 less+0.023
 company+0.023
Neg logit contributions
ra-0.030
 magn-0.029
ast-0.027
 Fre-0.027
 inh-0.027
TokenOnce
Token ID7454
Feature activation-1.26e-02

Pos logit contributions
rell+0.015
 better+0.014
 safer+0.013
 nicer+0.013
 company+0.013
Neg logit contributions
ra-0.013
 magn-0.012
 Fre-0.012
ast-0.012
 inh-0.012
Token upon
Token ID2402
Feature activation+1.57e-02

Pos logit contributions
ra+0.015
 magn+0.015
 Fre+0.014
ast+0.014
 inh+0.013
Neg logit contributions
rell-0.025
 better-0.024
 safer-0.023
 nicer-0.023
 company-0.022
Token a
Token ID257
Feature activation-1.06e-02

Pos logit contributions
rell+0.034
 better+0.029
 nicer+0.029
 safer+0.029
 meantime+0.028
Neg logit contributions
 magn-0.017
ra-0.017
ast-0.016
 inh-0.016
 Fre-0.016
Token time
Token ID640
Feature activation+2.70e-03

Pos logit contributions
ra+0.021
 magn+0.021
 Fre+0.021
ast+0.020
 Miles+0.020
Neg logit contributions
rell-0.012
 better-0.012
 safer-0.011
 nicer-0.011
 company-0.010
Token,
Token ID11
Feature activation-2.24e-02

Pos logit contributions
rell+0.003
 better+0.003
 safer+0.003
 nicer+0.003
 company+0.003
Neg logit contributions
ra-0.005
 magn-0.005
ast-0.005
 Fre-0.005
 inh-0.005
Token there
Token ID612
Feature activation+5.78e-03

Pos logit contributions
ra+0.036
ast+0.035
 magn+0.035
 Fre+0.034
 inh+0.033
Neg logit contributions
 better-0.036
rell-0.034
 safer-0.033
 safe-0.031
 nicer-0.031
Token was
Token ID373
Feature activation-1.74e-02

Pos logit contributions
 better+0.009
rell+0.009
 safer+0.008
 end+0.008
 safe+0.007
Neg logit contributions
ra-0.010
ast-0.010
 magn-0.009
 inh-0.009
 Fre-0.009
Token a
Token ID257
Feature activation-1.24e-02

Pos logit contributions
ra+0.027
 magn+0.026
ast+0.026
 Fre+0.025
 inh+0.025
Neg logit contributions
rell-0.029
 better-0.028
 safer-0.026
 nicer-0.025
 company-0.025
TokenThe
Token ID464
Feature activation-5.12e-03

Pos logit contributions
ra+0.015
 magn+0.014
ast+0.014
 Fre+0.014
 inh+0.013
Neg logit contributions
rell-0.014
 better-0.013
 safer-0.012
 nicer-0.012
 company-0.012
Token crab
Token ID32202
Feature activation-7.09e-03

Pos logit contributions
ra+0.006
ast+0.005
 magn+0.005
 Fre+0.005
 inh+0.005
Neg logit contributions
 better-0.011
rell-0.010
 nicer-0.010
 safer-0.010
 end-0.010
Token did
Token ID750
Feature activation-2.27e-03

Pos logit contributions
ra+0.012
udge+0.012
 magn+0.012
 Fre+0.012
ert+0.012
Neg logit contributions
 better-0.010
rell-0.009
 nicer-0.009
 safer-0.009
 less-0.008
Token not
Token ID407
Feature activation-6.32e-03

Pos logit contributions
ra+0.003
 magn+0.003
ast+0.003
 Fre+0.002
 inh+0.002
Neg logit contributions
rell-0.005
 better-0.004
 safer-0.004
 nicer-0.004
 company-0.004
Token answer
Token ID3280
Feature activation+1.30e-03

Pos logit contributions
ast+0.009
 magn+0.009
ra+0.008
 Fre+0.008
 inh+0.008
Neg logit contributions
rell-0.011
 safer-0.010
 meantime-0.010
 better-0.010
 nicer-0.010
Token.
Token ID13
Feature activation-6.58e-03

Pos logit contributions
rell+0.002
 better+0.002
 safer+0.001
 nicer+0.001
 company+0.001
Neg logit contributions
ra-0.002
 magn-0.002
 Fre-0.002
ast-0.002
 Miles-0.002
Token He
Token ID679
Feature activation-1.82e-02

Pos logit contributions
ra+0.013
 magn+0.013
ert+0.013
isa+0.013
opsy+0.013
Neg logit contributions
 better-0.008
rell-0.007
 safer-0.007
 less-0.007
 nicer-0.007
Token moved
Token ID3888
Feature activation-2.37e-02

Pos logit contributions
udge+0.033
 magn+0.033
ert+0.032
uggets+0.032
ast+0.032
Neg logit contributions
 better-0.025
 nicer-0.021
rell-0.020
 safer-0.020
 less-0.019
Token his
Token ID465
Feature activation-1.93e-02

Pos logit contributions
 magn+0.042
opsy+0.041
ra+0.041
ast+0.040
ude+0.040
Neg logit contributions
 better-0.035
 nicer-0.031
rell-0.030
 safer-0.030
 less-0.030
Token claws
Token ID28421
Feature activation-3.24e-02

Pos logit contributions
 Fre+0.027
ast+0.025
 Cic+0.025
ra+0.025
Tony+0.025
Neg logit contributions
 better-0.035
rell-0.033
 safer-0.032
 nicer-0.032
 company-0.032
Token and
Token ID290
Feature activation-1.23e-03

Pos logit contributions
ra+0.043
 magn+0.042
ast+0.040
 Fre+0.040
 inh+0.039
Neg logit contributions
rell-0.061
 better-0.058
 safer-0.055
 nicer-0.054
 company-0.053
Token may
Token ID743
Feature activation+1.03e-02

Pos logit contributions
rell+0.016
 better+0.015
 safer+0.014
 nicer+0.014
 company+0.013
Neg logit contributions
ra-0.009
 magn-0.009
 Fre-0.009
ast-0.008
 inh-0.008
Token not
Token ID407
Feature activation+1.23e-02

Pos logit contributions
rell+0.019
 better+0.017
 safer+0.017
 nicer+0.017
ock+0.016
Neg logit contributions
ra-0.014
ast-0.014
 magn-0.013
 Fre-0.013
 inh-0.013
Token be
Token ID307
Feature activation+7.01e-03

Pos logit contributions
rell+0.021
 better+0.019
 safer+0.019
 nicer+0.018
 meantime+0.018
Neg logit contributions
ra-0.018
ast-0.018
 magn-0.018
 inh-0.017
 Fre-0.017
Token what
Token ID644
Feature activation-1.26e-02

Pos logit contributions
rell+0.011
 meantime+0.010
 End+0.009
 refreshed+0.009
ock+0.009
Neg logit contributions
ra-0.011
ast-0.011
ude-0.011
 magn-0.011
 inh-0.010
Token is
Token ID318
Feature activation+6.55e-03

Pos logit contributions
ra+0.017
 magn+0.017
 Fre+0.016
ast+0.016
 inh+0.015
Neg logit contributions
rell-0.024
 better-0.022
 safer-0.021
 nicer-0.021
ock-0.020
Token best
Token ID1266
Feature activation-1.61e-03

Pos logit contributions
rell+0.009
 better+0.009
 safer+0.008
 nicer+0.008
 company+0.008
Neg logit contributions
ra-0.012
 magn-0.012
 Fre-0.011
ast-0.011
 inh-0.011
Token for
Token ID329
Feature activation-8.73e-03

Pos logit contributions
ra+0.003
 magn+0.003
ast+0.003
 Fre+0.003
 inh+0.003
Neg logit contributions
rell-0.002
 better-0.002
 safer-0.002
 nicer-0.002
 company-0.002
Token others
Token ID1854
Feature activation-6.01e-03

Pos logit contributions
ra+0.016
 magn+0.015
ast+0.015
 Fre+0.015
 inh+0.015
Neg logit contributions
rell-0.012
 better-0.012
 safer-0.011
 nicer-0.011
 company-0.010
Token.
Token ID13
Feature activation+6.66e-04

Pos logit contributions
ra+0.011
 magn+0.010
ast+0.010
ude+0.010
 Fre+0.010
Neg logit contributions
rell-0.009
 better-0.007
 safer-0.007
 nicer-0.007
 meantime-0.007
Token<|endoftext|>
Token ID50256
Feature activation+4.66e-03

Pos logit contributions
rell+0.001
 better+0.001
 safer+0.001
 nicer+0.001
 less+0.001
Neg logit contributions
 magn-0.001
ra-0.001
ast-0.001
 Fre-0.001
 inh-0.001
TokenOnce
Token ID7454
Feature activation-1.69e-02

Pos logit contributions
rell+0.009
 better+0.008
 nicer+0.008
 safer+0.008
 company+0.008
Neg logit contributions
 magn-0.006
ra-0.006
 Fre-0.006
 Miles-0.005
ast-0.005
Token and
Token ID290
Feature activation+8.55e-03

Pos logit contributions
ra+0.033
 magn+0.032
ast+0.031
 Fre+0.031
 inh+0.029
Neg logit contributions
rell-0.034
 better-0.034
 safer-0.031
 nicer-0.031
 less-0.029
Token Tim
Token ID5045
Feature activation-1.40e-02

Pos logit contributions
 better+0.012
rell+0.011
 safer+0.010
 nicer+0.010
 less+0.010
Neg logit contributions
ra-0.016
 magn-0.016
ast-0.016
 Fre-0.015
udge-0.015
Tokenmy
Token ID1820
Feature activation-7.12e-03

Pos logit contributions
ra+0.026
ast+0.026
ert+0.025
ude+0.024
 magn+0.024
Neg logit contributions
rell-0.019
 safer-0.018
 nicer-0.017
 better-0.016
 company-0.016
Token just
Token ID655
Feature activation+1.46e-02

Pos logit contributions
ra+0.010
ast+0.010
 magn+0.010
 inh+0.010
 Fre+0.009
Neg logit contributions
rell-0.013
 better-0.012
 safer-0.011
 nicer-0.011
 company-0.011
Token shrugged
Token ID28271
Feature activation-2.74e-03

Pos logit contributions
rell+0.025
 better+0.025
 safer+0.023
 nicer+0.022
 company+0.022
Neg logit contributions
ra-0.021
 magn-0.021
 Fre-0.020
ast-0.020
 Miles-0.019
Token his
Token ID465
Feature activation-4.08e-03

Pos logit contributions
ra+0.004
 magn+0.004
ast+0.004
 Fre+0.004
 inh+0.004
Neg logit contributions
rell-0.004
 better-0.004
 safer-0.004
 nicer-0.004
 company-0.004
Token shoulders
Token ID12450
Feature activation+5.19e-03

Pos logit contributions
ra+0.006
 magn+0.006
ast+0.006
 Fre+0.006
 inh+0.006
Neg logit contributions
rell-0.007
 better-0.006
 safer-0.006
 nicer-0.006
ock-0.006
Token.
Token ID13
Feature activation+4.47e-03

Pos logit contributions
rell+0.007
 better+0.006
 safer+0.006
 nicer+0.006
 meantime+0.006
Neg logit contributions
ra-0.009
ast-0.009
 magn-0.009
ude-0.009
 Fre-0.009
Token
Token ID220
Feature activation-2.85e-03

Pos logit contributions
rell+0.006
 better+0.006
 safer+0.005
 nicer+0.005
ock+0.005
Neg logit contributions
ra-0.008
 magn-0.008
 Fre-0.008
ast-0.008
 inh-0.008
Token
Token ID198
Feature activation-1.40e-03

Pos logit contributions
ra+0.004
 magn+0.004
ast+0.004
 Fre+0.004
 Miles+0.004
Neg logit contributions
rell-0.005
 better-0.005
 nicer-0.005
 safer-0.005
 company-0.004
Token
Token ID198
Feature activation-2.84e-03

Pos logit contributions
ra+0.002
 magn+0.002
 Fre+0.002
ast+0.002
 inh+0.002
Neg logit contributions
rell-0.002
 better-0.002
 safer-0.002
 nicer-0.002
 company-0.002
Token
Token ID129
Feature activation+3.07e-02

Pos logit contributions
ra+0.023
 magn+0.022
ast+0.021
 Fre+0.021
 inh+0.020
Neg logit contributions
rell-0.024
 better-0.024
 safer-0.022
 nicer-0.022
ock-0.021
Token
Token ID241
Feature activation-2.80e-02

Pos logit contributions
 better+0.045
rell+0.044
 safer+0.042
 nicer+0.042
 company+0.040
Neg logit contributions
ra-0.053
ast-0.053
 Fre-0.052
 Miles-0.048
ude-0.048
TokenSilver
Token ID26766
Feature activation-3.89e-03

Pos logit contributions
 magn+0.044
ra+0.042
ast+0.040
 Cic+0.040
 inh+0.039
Neg logit contributions
 better-0.044
rell-0.043
 safer-0.040
 less-0.039
ock-0.039
Token is
Token ID318
Feature activation-1.09e-02

Pos logit contributions
ra+0.007
 magn+0.007
ast+0.007
 Fre+0.006
 inh+0.006
Neg logit contributions
rell-0.006
 better-0.005
 safer-0.005
 nicer-0.005
 company-0.005
Token a
Token ID257
Feature activation-1.77e-02

Pos logit contributions
ra+0.018
 magn+0.017
ast+0.017
 Fre+0.016
 inh+0.016
Neg logit contributions
 better-0.017
rell-0.017
 safer-0.016
 nicer-0.015
 less-0.015
Token shiny
Token ID22441
Feature activation-1.23e-02

Pos logit contributions
ra+0.032
 Fre+0.031
ast+0.031
 magn+0.031
ude+0.030
Neg logit contributions
 better-0.025
rell-0.023
 safer-0.022
 nicer-0.021
 safe-0.020
Token metal
Token ID6147
Feature activation-2.17e-02

Pos logit contributions
 Fre+0.018
 magn+0.018
ra+0.017
ast+0.017
Tony+0.017
Neg logit contributions
 better-0.022
rell-0.020
 nicer-0.019
 safer-0.019
 less-0.018
Token.
Token ID13
Feature activation-6.39e-03

Pos logit contributions
 Fre+0.033
ast+0.033
Jul+0.032
Tony+0.032
 Cic+0.032
Neg logit contributions
 better-0.036
rell-0.033
 nicer-0.031
 safer-0.031
 less-0.030
Token It
Token ID632
Feature activation+4.96e-04

Pos logit contributions
 magn+0.011
ra+0.011
ast+0.011
 inh+0.011
ude+0.010
Neg logit contributions
rell-0.009
 better-0.009
 safer-0.008
 less-0.008
 nicer-0.007
Token will
Token ID481
Feature activation-1.37e-02

Pos logit contributions
 better+0.001
rell+0.001
 safer+0.001
 nicer+0.001
 less+0.001
Neg logit contributions
ra-0.001
ast-0.001
udge-0.001
 magn-0.001
 Fre-0.001
Token help
Token ID1037
Feature activation+9.11e-03

Pos logit contributions
ra+0.022
 magn+0.022
ast+0.021
 Fre+0.021
ude+0.020
Neg logit contributions
 better-0.022
rell-0.021
 safer-0.020
 nicer-0.019
 company-0.018
Token that
Token ID326
Feature activation-1.09e-02

Pos logit contributions
ra+0.020
 magn+0.019
ast+0.018
 Fre+0.018
 inh+0.018
Neg logit contributions
rell-0.030
 better-0.029
 safer-0.027
 nicer-0.027
 company-0.026
Token could
Token ID714
Feature activation-1.45e-02

Pos logit contributions
ra+0.020
ast+0.019
 magn+0.018
opsy+0.018
udge+0.018
Neg logit contributions
 better-0.017
rell-0.015
 safer-0.015
 nicer-0.014
 safe-0.014
Token unlock
Token ID12116
Feature activation-8.83e-03

Pos logit contributions
Tony+0.026
alph+0.025
 magn+0.024
Jul+0.024
ast+0.024
Neg logit contributions
 better-0.022
 safer-0.020
 safe-0.020
rell-0.020
 nicer-0.018
Token things
Token ID1243
Feature activation-1.43e-02

Pos logit contributions
ra+0.014
 magn+0.014
ast+0.013
 Fre+0.013
 inh+0.013
Neg logit contributions
rell-0.014
 better-0.014
 safer-0.013
 nicer-0.013
 company-0.012
Token.
Token ID13
Feature activation+4.45e-03

Pos logit contributions
ra+0.023
 magn+0.022
 Fre+0.022
ast+0.021
 inh+0.021
Neg logit contributions
rell-0.023
 better-0.021
 safer-0.020
 nicer-0.020
 company-0.019
Token Nem
Token ID22547
Feature activation-4.95e-02

Pos logit contributions
rell+0.007
 better+0.007
 safer+0.006
 nicer+0.006
 less+0.006
Neg logit contributions
ra-0.008
 magn-0.008
ast-0.007
 Fre-0.007
 inh-0.007
Tokeno
Token ID78
Feature activation-4.59e-03

Pos logit contributions
 magn+0.092
 Fre+0.090
growth+0.089
 inh+0.089
 El+0.086
Neg logit contributions
rell-0.059
 better-0.059
 company-0.049
ock-0.049
 safe-0.048
Token asked
Token ID1965
Feature activation+2.11e-03

Pos logit contributions
 magn+0.008
udge+0.007
 Fre+0.007
ra+0.007
ast+0.007
Neg logit contributions
 better-0.007
rell-0.007
 nicer-0.006
 safer-0.006
 less-0.006
Token his
Token ID465
Feature activation-1.26e-02

Pos logit contributions
 better+0.004
rell+0.003
 safer+0.003
 nicer+0.003
 less+0.003
Neg logit contributions
ra-0.003
 magn-0.003
ast-0.003
 Fre-0.003
 inh-0.003
Token dad
Token ID9955
Feature activation+2.48e-02

Pos logit contributions
ra+0.017
 Fre+0.016
 magn+0.016
 inh+0.016
ast+0.015
Neg logit contributions
 better-0.024
 nicer-0.022
rell-0.022
 safer-0.021
 less-0.020
Token for
Token ID329
Feature activation-5.15e-03

Pos logit contributions
 better+0.041
rell+0.040
 safer+0.037
 nicer+0.036
 less+0.035
Neg logit contributions
ra-0.039
 magn-0.038
ast-0.037
 Fre-0.036
 inh-0.036
Token Snow
Token ID7967
Feature activation-1.94e-02

Pos logit contributions
ra+0.005
 magn+0.005
ast+0.005
 Fre+0.005
 inh+0.005
Neg logit contributions
rell-0.004
 better-0.004
 safer-0.004
 nicer-0.004
 less-0.004
Tokeny
Token ID88
Feature activation+6.12e-04

Pos logit contributions
ra+0.016
 magn+0.015
ast+0.015
 Fre+0.014
ude+0.013
Neg logit contributions
rell-0.046
 better-0.045
 safer-0.043
 nicer-0.043
 less-0.041
Token.
Token ID13
Feature activation-2.27e-04

Pos logit contributions
rell+0.001
 better+0.001
 safer+0.001
 nicer+0.001
 company+0.001
Neg logit contributions
ra-0.001
 magn-0.001
ast-0.001
 Fre-0.001
 inh-0.001
Token And
Token ID843
Feature activation+9.86e-03

Pos logit contributions
ra+0.000
 magn+0.000
 Fre+0.000
ast+0.000
 inh+0.000
Neg logit contributions
rell-0.000
 better-0.000
 safer-0.000
 nicer-0.000
 company-0.000
Token the
Token ID262
Feature activation+9.20e-03

Pos logit contributions
 better+0.014
rell+0.014
 safer+0.013
 nicer+0.012
 company+0.012
Neg logit contributions
ra-0.017
 magn-0.017
ast-0.017
 Fre-0.017
ude-0.016
Token living
Token ID2877
Feature activation-4.15e-02

Pos logit contributions
rell+0.016
 better+0.015
 safer+0.014
 nicer+0.014
ock+0.013
Neg logit contributions
 magn-0.014
ra-0.014
ast-0.012
 inh-0.012
 Fre-0.012
Token room
Token ID2119
Feature activation+8.99e-03

Pos logit contributions
 Fre+0.080
 Cic+0.074
 Tor+0.072
 Marc+0.072
 Doug+0.071
Neg logit contributions
 better-0.056
 nicer-0.050
 company-0.049
 wrong-0.049
 end-0.048
Token.
Token ID13
Feature activation+3.23e-03

Pos logit contributions
rell+0.016
 better+0.014
 safer+0.014
 nicer+0.014
 company+0.014
Neg logit contributions
ra-0.014
 magn-0.013
ast-0.012
 inh-0.012
 Fre-0.011
Token Mom
Token ID11254
Feature activation+1.65e-03

Pos logit contributions
rell+0.005
 better+0.005
 safer+0.005
 nicer+0.005
 company+0.004
Neg logit contributions
ra-0.005
 magn-0.005
ast-0.005
 Fre-0.005
 inh-0.005
Token will
Token ID481
Feature activation+6.56e-03

Pos logit contributions
rell+0.003
 safer+0.002
 better+0.002
 nicer+0.002
 company+0.002
Neg logit contributions
ra-0.003
ast-0.002
 magn-0.002
 inh-0.002
 Fre-0.002
Token be
Token ID307
Feature activation+5.87e-03

Pos logit contributions
rell+0.012
 better+0.011
 safer+0.010
 nicer+0.010
vious+0.010
Neg logit contributions
ra-0.010
ast-0.009
 magn-0.009
 Fre-0.009
 inh-0.009
Token.
Token ID13
Feature activation-1.86e-02

Pos logit contributions
rell+0.041
 meantime+0.035
 safer+0.034
 better+0.034
ock+0.034
Neg logit contributions
ra-0.045
ast-0.043
 magn-0.043
 inh-0.042
 Fre-0.041
Token He
Token ID679
Feature activation-5.14e-03

Pos logit contributions
ra+0.037
 magn+0.035
ast+0.034
ert+0.033
opsy+0.033
Neg logit contributions
rell-0.024
 better-0.024
 safer-0.021
 less-0.021
 nicer-0.020
Token plugged
Token ID30601
Feature activation-4.21e-02

Pos logit contributions
 magn+0.010
 Fre+0.010
ert+0.010
udge+0.010
alph+0.010
Neg logit contributions
 better-0.007
 nicer-0.006
 cal-0.006
 safer-0.005
 less-0.005
Token the
Token ID262
Feature activation-3.81e-03

Pos logit contributions
 magn+0.085
alph+0.079
opsy+0.079
 inh+0.078
ude+0.078
Neg logit contributions
 better-0.051
rell-0.041
 less-0.041
 wrong-0.039
 nicer-0.039
Token mixer
Token ID33938
Feature activation+2.08e-03

Pos logit contributions
ra+0.005
 magn+0.005
ast+0.004
 Fre+0.004
 inh+0.004
Neg logit contributions
rell-0.008
 better-0.007
 safer-0.007
 nicer-0.007
 company-0.007
Token into
Token ID656
Feature activation-4.28e-02

Pos logit contributions
 better+0.003
rell+0.003
 safer+0.003
 less+0.003
 nicer+0.003
Neg logit contributions
 Fre-0.004
alph-0.003
ra-0.003
 magn-0.003
opsy-0.003
Token the
Token ID262
Feature activation+7.07e-03

Pos logit contributions
opsy+0.082
 Fre+0.079
ast+0.079
alph+0.079
ude+0.076
Neg logit contributions
 better-0.061
 less-0.051
 safer-0.051
rell-0.048
 nicer-0.048
Token wall
Token ID3355
Feature activation-1.84e-02

Pos logit contributions
rell+0.011
 better+0.011
 safer+0.010
 nicer+0.010
 company+0.009
Neg logit contributions
ra-0.012
 magn-0.012
ast-0.011
 Fre-0.011
 inh-0.011
Token and
Token ID290
Feature activation+1.41e-02

Pos logit contributions
ra+0.027
 magn+0.026
 Fre+0.026
ast+0.025
opsy+0.024
Neg logit contributions
 better-0.032
rell-0.032
 safer-0.030
 nicer-0.029
 less-0.028
Token turned
Token ID2900
Feature activation-2.76e-04

Pos logit contributions
 better+0.023
 safer+0.020
 less+0.019
 nicer+0.019
rell+0.019
Neg logit contributions
alph-0.024
ude-0.024
 Fre-0.024
opsy-0.024
ra-0.023
Token it
Token ID340
Feature activation+1.91e-04

Pos logit contributions
ra+0.000
 magn+0.000
 Fre+0.000
ast+0.000
 inh+0.000
Neg logit contributions
rell-0.000
 better-0.000
 safer-0.000
 nicer-0.000
 company-0.000
Token Suddenly
Token ID24975
Feature activation+4.96e-03

Pos logit contributions
rell+0.015
 better+0.015
 safer+0.013
 nicer+0.013
 less+0.013
Neg logit contributions
 magn-0.017
ra-0.017
 Fre-0.016
ast-0.016
Tony-0.016
Token,
Token ID11
Feature activation-2.78e-03

Pos logit contributions
rell+0.007
 better+0.007
 safer+0.006
 nicer+0.006
 company+0.006
Neg logit contributions
ra-0.009
 magn-0.009
 Fre-0.008
 inh-0.008
ast-0.008
Token the
Token ID262
Feature activation+2.58e-02

Pos logit contributions
ra+0.005
ast+0.005
 magn+0.005
 Fre+0.004
ude+0.004
Neg logit contributions
 better-0.004
rell-0.004
 safer-0.004
 nicer-0.004
 less-0.004
Token flour
Token ID10601
Feature activation+8.91e-03

Pos logit contributions
rell+0.057
 better+0.053
 safer+0.053
ock+0.050
 company+0.050
Neg logit contributions
 magn-0.025
ra-0.025
 inh-0.022
ast-0.022
ude-0.021
Token flew
Token ID13112
Feature activation+1.29e-02

Pos logit contributions
rell+0.014
 better+0.014
 safer+0.013
 nicer+0.012
 less+0.012
Neg logit contributions
ra-0.014
 magn-0.014
 Fre-0.014
ast-0.014
 inh-0.013
Token out
Token ID503
Feature activation-4.11e-02

Pos logit contributions
rell+0.022
 company+0.021
 safer+0.021
 better+0.021
 nicer+0.020
Neg logit contributions
ra-0.018
 magn-0.017
 Fre-0.016
ast-0.016
 inh-0.016
Token of
Token ID286
Feature activation-3.98e-03

Pos logit contributions
ra+0.068
 magn+0.067
 Fre+0.065
ast+0.064
opsy+0.064
Neg logit contributions
 better-0.066
rell-0.065
 safer-0.059
 nicer-0.059
 less-0.057
Token the
Token ID262
Feature activation+2.23e-02

Pos logit contributions
ra+0.007
 magn+0.007
ast+0.007
 Fre+0.007
Tony+0.007
Neg logit contributions
 better-0.006
rell-0.006
 safer-0.005
 nicer-0.005
 less-0.005
Token bowl
Token ID9396
Feature activation-1.78e-02

Pos logit contributions
rell+0.035
 better+0.033
 safer+0.031
 company+0.031
 nicer+0.030
Neg logit contributions
ra-0.037
 magn-0.036
ast-0.035
 inh-0.034
 Fre-0.033
Token and
Token ID290
Feature activation+9.68e-03

Pos logit contributions
ra+0.028
 magn+0.027
 Fre+0.027
ast+0.026
alph+0.026
Neg logit contributions
rell-0.030
 better-0.029
 safer-0.026
 nicer-0.026
 less-0.025
Token made
Token ID925
Feature activation+1.19e-02

Pos logit contributions
 better+0.015
rell+0.015
 nicer+0.014
 safer+0.014
 less+0.013
Neg logit contributions
ast-0.016
ra-0.016
 magn-0.016
 Fre-0.016
udge-0.015
Token neighbour
Token ID12250
Feature activation+2.39e-02

Pos logit contributions
ra+0.026
 inh+0.024
 magn+0.023
 sket+0.023
 Fre+0.023
Neg logit contributions
 better-0.028
 nicer-0.026
rell-0.025
 safer-0.025
 safe-0.024
Token replied
Token ID8712
Feature activation+2.30e-02

Pos logit contributions
rell+0.032
 better+0.032
 safer+0.028
 nicer+0.028
 company+0.026
Neg logit contributions
ra-0.044
 magn-0.043
ast-0.042
 Fre-0.042
 inh-0.041
Token,
Token ID11
Feature activation+6.17e-03

Pos logit contributions
rell+0.028
 better+0.027
 safer+0.024
 nicer+0.024
 company+0.022
Neg logit contributions
ra-0.046
 magn-0.045
ast-0.044
 Fre-0.044
 inh-0.043
Token sadly
Token ID21098
Feature activation+2.34e-02

Pos logit contributions
rell+0.011
 better+0.011
 safer+0.010
 nicer+0.010
 company+0.010
Neg logit contributions
ra-0.008
 magn-0.008
 Fre-0.008
ast-0.008
 inh-0.008
Token,
Token ID11
Feature activation+5.10e-03

Pos logit contributions
rell+0.029
 better+0.028
 safer+0.025
 nicer+0.025
 company+0.024
Neg logit contributions
ra-0.046
 magn-0.045
ast-0.044
 Fre-0.044
 inh-0.043
Token "
Token ID366
Feature activation-4.63e-02

Pos logit contributions
rell+0.009
 better+0.008
 safer+0.008
 nicer+0.008
 company+0.007
Neg logit contributions
ra-0.008
 magn-0.008
ast-0.007
 Fre-0.007
 inh-0.007
TokenIt
Token ID1026
Feature activation+7.78e-03

Pos logit contributions
 magn+0.095
ra+0.088
 knees+0.087
 Mae+0.086
 Cic+0.086
Neg logit contributions
 better-0.050
vious-0.047
ock-0.046
 less-0.045
rell-0.044
Token's
Token ID338
Feature activation-1.77e-02

Pos logit contributions
 better+0.009
rell+0.009
 safer+0.008
 nicer+0.008
 less+0.007
Neg logit contributions
ra-0.016
 magn-0.016
ast-0.015
 Fre-0.015
Tony-0.015
Token the
Token ID262
Feature activation+1.54e-02

Pos logit contributions
ra+0.029
 magn+0.028
ast+0.027
 Fre+0.027
 inh+0.026
Neg logit contributions
rell-0.028
 better-0.027
 safer-0.025
 nicer-0.025
 company-0.024
Token sk
Token ID1341
Feature activation-3.24e-02

Pos logit contributions
rell+0.022
 better+0.020
 safer+0.019
 nicer+0.018
 company+0.017
Neg logit contributions
ra-0.028
 magn-0.027
 Fre-0.027
ast-0.026
 inh-0.026
Tokenunk
Token ID2954
Feature activation+6.64e-03

Pos logit contributions
 magn+0.032
 Cic+0.029
 Fre+0.029
 inh+0.029
ra+0.028
Neg logit contributions
rell-0.072
 better-0.064
ock-0.063
 company-0.062
vious-0.061