TOP ACTIVATIONS
MAX = 10.239

gave Tim my a l oll ipop to say thank you
She gives Sara a l oll ipop and a hug .
lady gave her a l oll ipop and said , "
Sally gave her the l oll ipop . Lily learned that
Can I have a l oll ipop , Johnny ?" Tim
gave Tim my a l oll ipop after his haircut .
and gave Jack a l oll ipop . Jack thanked him
gave him a fancy l oll ipop as a reward .
. He took his l oll ipop and said , "
and handed Lucy a l oll ipop . Lucy was so
She gives them a l oll ipop and a sticker .
She gave them a l oll ipop each and told them
Lily happily took the l oll ipop and started licking it
unw ra pped the l oll ipop and l icked it
should have shared my l oll ipop with you . It
gave Sus ie a l oll ipop to make her feel
Do you want a l oll ipop ?" Tom
gave Peter a big l oll ipop ! There were many
. They took their l oll ip ops and smiled .
gives them each a l oll ipop . Tom gets a

MIN ACTIVATIONS
MIN = -7.734

out that she had been excluded from a special game .
. But the pony was confused . She was not used
day his peaceful journey was disrupted by an ignorant animal .
happy but he was also confused why he wanted to collect
the knife made him feel confused . One day
, but then Lime got confused . He couldn 't remember
, which made them very excited . Ben asked
but the nurse was very surprised and shouted " O uch
day , Jack was very confused ; he could smell something
the store keeper was very ignorant . He said he couldn
One day , he was faced with an impressive puzzle .
arm but Tim my was confused and didn 't know what
. The rabbit was very confused . He hopped around looking
dad saw that he was confused and they sat down with
woods and she was very confused . She didn 't know
however , the ducks were faced with a threat . A
. Poor Benny was too excited and tore too hard .
no doll . Sara was confused and uncomfortable . She did
bright flame that makes them curious . They want to touch
. But they were too excited and curious . They wanted

INTERVAL 5.746 - 10.239
CONTAINS 0.024%

looked brighter . From that day on , Lee always remembered
all day . From that day on , Max and Lily
! Now let 's make a delicious meal
and gave Jack a l oll ipop . Jack thanked him
man who gave her the sa us ages . She even

INTERVAL 1.252 - 5.746
CONTAINS 19.707%

oring loudly . " Look , Lily , a
. He opened the box and showed them what was inside
When it was all clean , Mia 's mom gave her
lit up and he said , " Yes , please !
from within . She had finally realized that love was the

INTERVAL -3.241 - 1.252
CONTAINS 79.090%

. Every day Jack and his mom would go
looked for bananas , but there were none . He looked
baby was taking away all the attention from her . Her
. It was fun and messy . He saw a shiny
see the flowers . They could not see the shed .

INTERVAL -7.734 - -3.241
CONTAINS 1.179%

But then he heard some noise from far away , so
Then , they heard a noise . It was
. The cricket was so relieved that he stopped hopping and
but they did not notice the time . Soon , mom
. The little girl was sad , but she waited .
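
The four interval bounds listed above (10.239, 5.746, 1.252, -3.241, -7.734) are evenly spaced, which suggests the range from MIN to MAX is split into four equal-width bins and "CONTAINS" reports the share of token activations falling in each. The sketch below reproduces those edges under that assumption. The function names `interval_edges` and `bin_shares` and the sample activations are hypothetical, chosen for illustration, and are not taken from the tool that produced this dump.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges, ordered from hi down to lo."""
    width = (hi - lo) / n_bins
    return [hi - i * width for i in range(n_bins + 1)]


def bin_shares(activations, edges):
    """Percentage of activations in each interval, highest interval first.

    A value is assigned to the first interval whose lower edge it reaches;
    anything below all inner edges falls into the last (lowest) interval.
    """
    counts = [0] * (len(edges) - 1)
    for a in activations:
        for i in range(len(counts) - 1):
            if a >= edges[i + 1]:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    total = len(activations)
    return [100.0 * c / total for c in counts]


# MIN and MAX as reported in this dump.
edges = interval_edges(-7.734, 10.239)

# Hypothetical activations, only to exercise the binning.
sample = [10.0, 2.0, 0.0, 0.5, -5.0]
shares = bin_shares(sample, edges)
```

With the reported MIN and MAX, the computed inner edges agree with the listed bounds to within rounding at the third decimal place. The four reported shares (0.024%, 19.707%, 79.090%, 1.179%) sum to 100%.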
Token gave
Token ID2921
Feature activation-3.23e-01

Pos logit contributions
 noise+0.000
 cries+0.000
 awful+0.000
 evil+0.000
 truth+0.000
Neg logit contributions
burn-0.000
 sunscreen-0.000
ipop-0.000
uggets-0.000
ip-0.000
Token Tim
Token ID5045
Feature activation+1.10e+00

Pos logit contributions
 noise+0.007
 awful+0.006
 cries+0.006
 evil+0.005
 truth+0.005
Neg logit contributions
ipop-0.008
ip-0.008
burn-0.007
 sunscreen-0.007
Whe-0.007
Tokenmy
Token ID1820
Feature activation+1.02e+00

Pos logit contributions
ipop+0.025
 sunscreen+0.025
burn+0.025
agna+0.025
ip+0.025
Neg logit contributions
 noise-0.020
 cries-0.018
 awful-0.017
 evil-0.016
.-0.016
Token a
Token ID257
Feature activation+2.64e+00

Pos logit contributions
ipop+0.028
burn+0.028
ip+0.027
rav+0.027
Whe+0.027
Neg logit contributions
 noise-0.015
 awful-0.013
 cries-0.013
 evil-0.012
 unknown-0.012
Token l
Token ID300
Feature activation+3.27e+00

Pos logit contributions
uggets+0.069
burn+0.069
 Sn+0.068
ip+0.068
Whe+0.067
Neg logit contributions
 noise-0.045
 truth-0.038
 cries-0.036
 strange-0.036
 situation-0.035
Tokenoll
Token ID692
Feature activation+1.02e+01

Pos logit contributions
burn+0.053
 sunscreen+0.050
ipop+0.048
ip+0.048
 Sn+0.047
Neg logit contributions
 noise-0.085
 awful-0.081
 cries-0.080
 evil-0.080
 truth-0.078
Tokenipop
Token ID42800
Feature activation+4.06e+00

Pos logit contributions
Whe+0.109
 sunscreen+0.084
Br+0.079
burn+0.078
 Sn+0.078
Neg logit contributions
 cries-0.309
.-0.307
 noise-0.303
 evil-0.296
 awful-0.295
Token to
Token ID284
Feature activation+8.34e-01

Pos logit contributions
ipop+0.127
ip+0.126
burn+0.125
 sunscreen+0.122
Whe+0.120
Neg logit contributions
 noise-0.047
 awful-0.038
 cries-0.037
 evil-0.035
 truth-0.034
Token say
Token ID910
Feature activation+2.15e+00

Pos logit contributions
burn+0.020
ipop+0.020
ip+0.020
rav+0.019
uggets+0.018
Neg logit contributions
 noise-0.017
 awful-0.014
 cries-0.014
 evil-0.014
 harm-0.014
Token thank
Token ID5875
Feature activation+1.54e-02

Pos logit contributions
burn+0.055
 sunscreen+0.051
uggets+0.050
Whe+0.050
ip+0.050
Neg logit contributions
 noise-0.039
 awful-0.036
 truth-0.035
 evil-0.034
 cries-0.034
Token you
Token ID345
Feature activation+1.42e+00

Pos logit contributions
ipop+0.001
burn+0.001
ip+0.000
coat+0.000
 sunscreen+0.000
Neg logit contributions
 noise-0.000
 awful-0.000
 evil-0.000
 cries-0.000
 unfamiliar-0.000
Token She
Token ID1375
Feature activation+4.38e-02

Pos logit contributions
 noise+0.006
 awful+0.006
 cries+0.005
 evil+0.005
 unfamiliar+0.005
Neg logit contributions
ipop-0.007
ip-0.007
burn-0.007
 sunscreen-0.007
 Sn-0.007
Token gives
Token ID3607
Feature activation+3.72e-01

Pos logit contributions
ip+0.001
ipop+0.001
burn+0.001
 sunscreen+0.001
 Sn+0.001
Neg logit contributions
 noise-0.001
 awful-0.001
 evil-0.001
 cries-0.001
 unfamiliar-0.001
Token Sara
Token ID24799
Feature activation+1.27e+00

Pos logit contributions
ipop+0.007
burn+0.007
ip+0.007
belt+0.007
coat+0.007
Neg logit contributions
 noise-0.008
 cries-0.008
 awful-0.008
 truth-0.007
 wonders-0.007
Token a
Token ID257
Feature activation+2.94e+00

Pos logit contributions
ipop+0.039
 seeker+0.039
belt+0.038
rav+0.038
agna+0.038
Neg logit contributions
 noise-0.014
 cries-0.013
 awful-0.011
 evil-0.010
 truth-0.010
Token l
Token ID300
Feature activation+3.54e+00

Pos logit contributions
ip+0.056
ipop+0.054
burn+0.053
uggets+0.052
 Sn+0.052
Neg logit contributions
 noise-0.072
 cries-0.064
 truth-0.063
 awful-0.061
 situation-0.060
Tokenoll
Token ID692
Feature activation+1.01e+01

Pos logit contributions
burn+0.054
ipop+0.052
ip+0.052
 sunscreen+0.051
 Sn+0.048
Neg logit contributions
 noise-0.097
 cries-0.090
 awful-0.090
 evil-0.089
 truth-0.088
Tokenipop
Token ID42800
Feature activation+3.97e+00

Pos logit contributions
Whe+0.103
 sunscreen+0.078
 Sn+0.073
burn+0.068
agna+0.066
Neg logit contributions
 cries-0.321
 noise-0.312
.-0.308
 evil-0.297
 hurts-0.297
Token and
Token ID290
Feature activation+3.91e+00

Pos logit contributions
ipop+0.122
ip+0.121
burn+0.121
Whe+0.119
 sunscreen+0.119
Neg logit contributions
 noise-0.046
 cries-0.039
 awful-0.038
 truth-0.034
 evil-0.034
Token a
Token ID257
Feature activation+4.34e+00

Pos logit contributions
ipop+0.098
ip+0.097
burn+0.095
 sunscreen+0.091
 Sn+0.089
Neg logit contributions
 noise-0.071
 cries-0.063
 awful-0.063
 evil-0.061
 unfamiliar-0.059
Token hug
Token ID16225
Feature activation+1.10e+00

Pos logit contributions
ip+0.081
uggets+0.081
Whe+0.079
burn+0.079
rav+0.077
Neg logit contributions
 noise-0.108
 cries-0.094
 truth-0.093
 awful-0.088
 evil-0.086
Token.
Token ID13
Feature activation-1.64e-01

Pos logit contributions
ip+0.041
ipop+0.041
burn+0.040
 sunscreen+0.040
Whe+0.039
Neg logit contributions
 noise-0.007
 cries-0.004
 awful-0.004
 truth-0.004
 evil-0.003
Token lady
Token ID10846
Feature activation-3.93e-01

Pos logit contributions
 noise+0.005
 truth+0.004
 cries+0.004
 situation+0.004
 unknown+0.004
Neg logit contributions
ip-0.007
burn-0.007
Whe-0.007
uggets-0.007
 sunscreen-0.007
Token gave
Token ID2921
Feature activation+6.79e-01

Pos logit contributions
 noise+0.006
 cries+0.005
 wonders+0.005
 awful+0.004
 comm+0.004
Neg logit contributions
burn-0.011
 sunscreen-0.011
Whe-0.011
uggets-0.011
ipop-0.010
Token her
Token ID607
Feature activation+1.33e+00

Pos logit contributions
burn+0.021
ipop+0.021
ip+0.021
belt+0.021
agna+0.020
Neg logit contributions
 noise-0.008
 cries-0.007
 awful-0.006
 truth-0.006
 wonders-0.006
Token a
Token ID257
Feature activation+2.65e+00

Pos logit contributions
burn+0.038
Whe+0.038
rav+0.037
agna+0.037
ugg+0.036
Neg logit contributions
 noise-0.017
 cries-0.016
 unknown-0.014
 awful-0.014
 evil-0.014
Token l
Token ID300
Feature activation+3.22e+00

Pos logit contributions
uggets+0.082
burn+0.082
Whe+0.082
agna+0.079
belt+0.078
Neg logit contributions
 noise-0.032
 strange-0.025
 cries-0.023
 truth-0.023
 unknown-0.022
Tokenoll
Token ID692
Feature activation+9.83e+00

Pos logit contributions
burn+0.054
 sunscreen+0.050
 Sn+0.047
ipop+0.047
ip+0.047
Neg logit contributions
 noise-0.083
 awful-0.080
 cries-0.078
 evil-0.078
 truth-0.077
Tokenipop
Token ID42800
Feature activation+4.23e+00

Pos logit contributions
Whe+0.124
burn+0.092
 sunscreen+0.091
Br+0.084
 Sn+0.080
Neg logit contributions
 cries-0.290
.-0.285
 noise-0.281
 evil-0.275
 awful-0.274
Token and
Token ID290
Feature activation+2.60e+00

Pos logit contributions
Whe+0.134
burn+0.133
ip+0.130
ipop+0.129
 sunscreen+0.129
Neg logit contributions
 noise-0.045
 awful-0.037
 cries-0.037
 truth-0.033
 evil-0.031
Token said
Token ID531
Feature activation+2.73e+00

Pos logit contributions
burn+0.079
Whe+0.078
rav+0.077
uggets+0.076
ipop+0.075
Neg logit contributions
 noise-0.033
 cries-0.029
 comm-0.025
 awful-0.024
 wonders-0.024
Token,
Token ID11
Feature activation+2.85e+00

Pos logit contributions
burn+0.098
Whe+0.093
 sunscreen+0.093
coat+0.089
ip+0.088
Neg logit contributions
 noise-0.024
 truth-0.019
.-0.018
 awful-0.018
 cries-0.017
Token "
Token ID366
Feature activation+2.39e+00

Pos logit contributions
burn+0.082
ipop+0.080
ip+0.079
inea+0.078
 sunscreen+0.078
Neg logit contributions
 noise-0.042
 truth-0.034
 cries-0.033
 awful-0.032
 evil-0.030
Token Sally
Token ID25737
Feature activation-1.82e-01

Pos logit contributions
ipop+0.048
ip+0.047
burn+0.046
 saddle+0.044
 reins+0.044
Neg logit contributions
 noise-0.031
 awful-0.026
 unknown-0.025
 unfamiliar-0.025
 truth-0.024
Token gave
Token ID2921
Feature activation+1.09e+00

Pos logit contributions
 noise+0.003
 awful+0.003
 cries+0.003
 evil+0.003
 unfamiliar+0.003
Neg logit contributions
ipop-0.005
ip-0.005
burn-0.005
 sunscreen-0.004
 Sn-0.004
Token her
Token ID607
Feature activation+2.18e+00

Pos logit contributions
ipop+0.029
ip+0.028
burn+0.028
Whe+0.026
 Sn+0.026
Neg logit contributions
 noise-0.020
 awful-0.017
 cries-0.017
 truth-0.016
 unfamiliar-0.016
Token the
Token ID262
Feature activation+5.92e-01

Pos logit contributions
burn+0.064
Whe+0.062
rav+0.062
belt+0.061
 steering+0.060
Neg logit contributions
 noise-0.030
 unknown-0.026
 awful-0.026
 cries-0.026
 truth-0.025
Token l
Token ID300
Feature activation+3.57e+00

Pos logit contributions
burn+0.014
ip+0.014
ipop+0.013
uggets+0.013
 sunscreen+0.013
Neg logit contributions
 noise-0.013
 truth-0.012
 awful-0.011
 situation-0.011
 cries-0.011
Tokenoll
Token ID692
Feature activation+9.81e+00

Pos logit contributions
burn+0.074
 sunscreen+0.070
Whe+0.062
 apr+0.062
 Sn+0.062
Neg logit contributions
 noise-0.076
 awful-0.074
 truth-0.073
 cries-0.072
 evil-0.071
Tokenipop
Token ID42800
Feature activation+4.12e+00

Pos logit contributions
Whe+0.130
burn+0.103
 sunscreen+0.094
 steering+0.093
 passenger+0.081
Neg logit contributions
 cries-0.280
.-0.279
 noise-0.276
 awful-0.274
 hurts-0.265
Token.
Token ID13
Feature activation+4.17e-01

Pos logit contributions
burn+0.113
ip+0.112
ipop+0.112
Whe+0.111
 sunscreen+0.108
Neg logit contributions
 noise-0.062
 awful-0.054
 cries-0.053
 evil-0.050
 truth-0.049
Token Lily
Token ID20037
Feature activation+3.77e-01

Pos logit contributions
burn+0.010
ipop+0.010
ip+0.010
 sunscreen+0.010
Whe+0.009
Neg logit contributions
 noise-0.008
 cries-0.007
 awful-0.007
 truth-0.007
 wonders-0.006
Token learned
Token ID4499
Feature activation-2.52e+00

Pos logit contributions
ipop+0.011
ip+0.011
burn+0.010
 sunscreen+0.010
Whe+0.010
Neg logit contributions
 noise-0.006
 cries-0.005
 awful-0.005
 evil-0.004
 truth-0.004
Token that
Token ID326
Feature activation-6.66e-01

Pos logit contributions
 noise+0.040
 cries+0.036
 evil+0.036
 situation+0.035
umbling+0.035
Neg logit contributions
ip-0.065
 sunscreen-0.063
ipop-0.061
burn-0.060
 Sn-0.058
TokenCan
Token ID6090
Feature activation+4.56e-01

Pos logit contributions
ipop+0.059
ip+0.058
 sunscreen+0.051
burn+0.051
uggets+0.049
Neg logit contributions
 noise-0.055
 awful-0.049
 wonders-0.049
 odd-0.048
 unfamiliar-0.048
Token I
Token ID314
Feature activation+3.77e+00

Pos logit contributions
ip+0.013
ipop+0.013
burn+0.013
 sunscreen+0.013
uggets+0.013
Neg logit contributions
 noise-0.006
 cries-0.005
 awful-0.005
 evil-0.005
 truth-0.005
Token have
Token ID423
Feature activation+2.45e+00

Pos logit contributions
burn+0.095
Whe+0.095
uggets+0.094
belt+0.093
 sunscreen+0.093
Neg logit contributions
 noise-0.059
 harm-0.051
 cries-0.050
 truth-0.049
.-0.049
Token a
Token ID257
Feature activation+3.36e+00

Pos logit contributions
ipop+0.072
coat+0.070
belt+0.070
burn+0.070
ugg+0.069
Neg logit contributions
 noise-0.034
 awful-0.029
 cries-0.028
 unknown-0.027
 evil-0.026
Token l
Token ID300
Feature activation+3.27e+00

Pos logit contributions
belt+0.069
uggets+0.069
burn+0.069
 sunscreen+0.069
Whe+0.068
Neg logit contributions
 noise-0.073
 situation-0.065
 cries-0.064
 truth-0.063
 embarrassing-0.061
Tokenoll
Token ID692
Feature activation+9.68e+00

Pos logit contributions
burn+0.057
 sunscreen+0.055
 Sn+0.050
ip+0.049
 apr+0.048
Neg logit contributions
 noise-0.081
 awful-0.076
 cries-0.076
 evil-0.075
 truth-0.075
Tokenipop
Token ID42800
Feature activation+3.76e+00

Pos logit contributions
Whe+0.089
 sunscreen+0.072
 Sn+0.067
burn+0.061
Br+0.059
Neg logit contributions
 noise-0.312
 cries-0.304
 evil-0.297
 awful-0.296
 hurts-0.292
Token,
Token ID11
Feature activation+3.11e+00

Pos logit contributions
ip+0.103
burn+0.102
 sunscreen+0.101
ipop+0.101
Whe+0.100
Neg logit contributions
 noise-0.057
 awful-0.047
 cries-0.044
 evil-0.043
 truth-0.043
Token Johnny
Token ID15470
Feature activation+6.49e-02

Pos logit contributions
ip+0.070
ipop+0.068
burn+0.067
ugg+0.063
inea+0.060
Neg logit contributions
 noise-0.071
 awful-0.061
 truth-0.059
 odd-0.058
 cries-0.058
Token?"
Token ID1701
Feature activation+5.66e-01

Pos logit contributions
ipop+0.002
ip+0.002
burn+0.002
 sunscreen+0.002
 Sn+0.002
Neg logit contributions
 noise-0.001
 awful-0.001
 cries-0.001
 evil-0.001
 unfamiliar-0.001
Token Tim
Token ID5045
Feature activation+9.94e-01

Pos logit contributions
ip+0.011
ipop+0.011
burn+0.011
 sunscreen+0.010
 Sn+0.010
Neg logit contributions
 noise-0.013
 awful-0.012
 evil-0.012
 cries-0.012
 situation-0.012
Token gave
Token ID2921
Feature activation+5.32e-02

Pos logit contributions
ipop+0.023
ip+0.023
burn+0.023
uggets+0.022
 sunscreen+0.022
Neg logit contributions
 noise-0.015
 awful-0.013
 evil-0.012
 unfamiliar-0.012
 cries-0.012
Token Tim
Token ID5045
Feature activation+1.22e+00

Pos logit contributions
ipop+0.001
ip+0.001
burn+0.001
 sunscreen+0.001
Whe+0.001
Neg logit contributions
 noise-0.001
 awful-0.001
 cries-0.001
 evil-0.001
 unfamiliar-0.001
Tokenmy
Token ID1820
Feature activation+1.53e+00

Pos logit contributions
agna+0.028
ipop+0.028
 sunscreen+0.028
ip+0.028
burn+0.028
Neg logit contributions
 noise-0.021
 cries-0.020
 awful-0.019
.-0.018
 truth-0.018
Token a
Token ID257
Feature activation+2.79e+00

Pos logit contributions
ipop+0.045
burn+0.045
ip+0.045
rav+0.044
coat+0.044
Neg logit contributions
 noise-0.018
 awful-0.016
 cries-0.016
 unknown-0.015
 evil-0.015
Token l
Token ID300
Feature activation+2.91e+00

Pos logit contributions
ip+0.074
uggets+0.073
burn+0.073
Whe+0.072
 Sn+0.071
Neg logit contributions
 noise-0.047
 truth-0.038
 cries-0.038
 evil-0.035
 situation-0.035
Tokenoll
Token ID692
Feature activation+9.66e+00

Pos logit contributions
burn+0.052
 sunscreen+0.050
ip+0.045
 Sn+0.045
ipop+0.044
Neg logit contributions
 noise-0.071
 awful-0.068
 evil-0.068
 cries-0.067
 truth-0.067
Tokenipop
Token ID42800
Feature activation+3.98e+00

Pos logit contributions
Whe+0.105
 sunscreen+0.081
burn+0.074
Br+0.074
 Sn+0.068
Neg logit contributions
 cries-0.297
.-0.293
 noise-0.287
 evil-0.283
 awful-0.279
Token after
Token ID706
Feature activation-1.51e+00

Pos logit contributions
Whe+0.137
burn+0.136
 sunscreen+0.135
ip+0.135
ipop+0.134
Neg logit contributions
 noise-0.029
 awful-0.023
 cries-0.021
 evil-0.018
 truth-0.018
Token his
Token ID465
Feature activation+1.95e+00

Pos logit contributions
 noise+0.024
 awful+0.019
 cries+0.019
 evil+0.019
 truth+0.018
Neg logit contributions
burn-0.043
ip-0.043
ipop-0.043
Whe-0.041
 sunscreen-0.040
Token haircut
Token ID45548
Feature activation+2.43e+00

Pos logit contributions
burn+0.040
ip+0.039
ipop+0.038
 sunscreen+0.037
Whe+0.037
Neg logit contributions
 noise-0.047
 cries-0.044
 evil-0.042
 situation-0.041
 truth-0.040
Token.
Token ID13
Feature activation+3.03e-01

Pos logit contributions
ip+0.086
ipop+0.086
burn+0.085
 sunscreen+0.083
Whe+0.081
Neg logit contributions
 noise-0.018
 cries-0.013
 awful-0.012
 evil-0.010
 truth-0.010
Token and
Token ID290
Feature activation+1.56e+00

Pos logit contributions
ipop+0.082
ip+0.081
burn+0.080
Whe+0.077
 sunscreen+0.077
Neg logit contributions
 noise-0.042
 cries-0.035
 awful-0.035
 unfamiliar-0.033
 situation-0.033
Token gave
Token ID2921
Feature activation+5.18e-01

Pos logit contributions
burn+0.046
ip+0.043
rav+0.043
Whe+0.043
ipop+0.043
Neg logit contributions
 noise-0.024
 cries-0.019
 awful-0.018
 comm-0.018
 wonders-0.017
Token Jack
Token ID3619
Feature activation+6.34e-01

Pos logit contributions
ipop+0.013
ip+0.013
burn+0.013
 sunscreen+0.012
 Sn+0.012
Neg logit contributions
 noise-0.009
 awful-0.008
 cries-0.008
 evil-0.008
 unfamiliar-0.008
Token a
Token ID257
Feature activation+2.04e+00

Pos logit contributions
burn+0.018
ipop+0.018
ip+0.017
Whe+0.017
coat+0.017
Neg logit contributions
 noise-0.009
 cries-0.008
 awful-0.008
 unfamiliar-0.007
 evil-0.007
Token l
Token ID300
Feature activation+2.92e+00

Pos logit contributions
burn+0.059
ip+0.058
ipop+0.056
Whe+0.056
 Sn+0.055
Neg logit contributions
 noise-0.032
 truth-0.026
 awful-0.025
 cries-0.025
 situation-0.024
Tokenoll
Token ID692
Feature activation+9.66e+00

Pos logit contributions
ipop+0.043
ip+0.042
burn+0.040
 sunscreen+0.037
 Sn+0.035
Neg logit contributions
 noise-0.085
 awful-0.078
 cries-0.078
 evil-0.076
 unfamiliar-0.076
Tokenipop
Token ID42800
Feature activation+3.52e+00

Pos logit contributions
Whe+0.089
burn+0.066
 sunscreen+0.062
 Sn+0.057
Br+0.053
Neg logit contributions
 noise-0.314
 cries-0.312
 situation-0.305
 awful-0.304
 evil-0.304
Token.
Token ID13
Feature activation+2.76e-01

Pos logit contributions
burn+0.119
ipop+0.119
ip+0.118
Whe+0.116
 sunscreen+0.116
Neg logit contributions
 noise-0.030
 awful-0.023
 cries-0.022
 truth-0.020
 evil-0.020
Token Jack
Token ID3619
Feature activation+5.56e-01

Pos logit contributions
ipop+0.007
ip+0.007
burn+0.007
 sunscreen+0.007
 Sn+0.006
Neg logit contributions
 noise-0.005
 awful-0.004
 cries-0.004
 evil-0.004
 unfamiliar-0.004
Token thanked
Token ID26280
Feature activation+3.18e-01

Pos logit contributions
burn+0.017
ipop+0.017
ip+0.016
 sunscreen+0.016
Whe+0.016
Neg logit contributions
 noise-0.007
 cries-0.006
 awful-0.006
 wonders-0.005
 evil-0.005
Token him
Token ID683
Feature activation+1.36e+00

Pos logit contributions
ipop+0.008
ip+0.007
burn+0.007
 sunscreen+0.007
 Sn+0.007
Neg logit contributions
 noise-0.006
 awful-0.005
 cries-0.005
 evil-0.005
 unfamiliar-0.005
Token gave
Token ID2921
Feature activation+8.99e-01

Pos logit contributions
ipop+0.066
ip+0.066
burn+0.066
uggets+0.063
 sunscreen+0.063
Neg logit contributions
 noise-0.044
 cries-0.038
 awful-0.037
 comm-0.036
 evil-0.036
Token him
Token ID683
Feature activation+2.12e+00

Pos logit contributions
ipop+0.023
ip+0.023
burn+0.023
 sunscreen+0.022
 Sn+0.021
Neg logit contributions
 noise-0.016
 awful-0.014
 cries-0.013
 evil-0.013
 unfamiliar-0.013
Token a
Token ID257
Feature activation+3.09e+00

Pos logit contributions
burn+0.064
ipop+0.063
ip+0.063
coat+0.062
belt+0.060
Neg logit contributions
 noise-0.025
 cries-0.022
 awful-0.022
 evil-0.021
 truth-0.020
Token fancy
Token ID14996
Feature activation+3.09e+00

Pos logit contributions
ip+0.087
burn+0.087
ipop+0.084
uggets+0.084
 Sn+0.084
Neg logit contributions
 noise-0.046
 cries-0.038
 awful-0.037
 truth-0.037
 evil-0.036
Token l
Token ID300
Feature activation+2.91e+00

Pos logit contributions
ip+0.072
burn+0.070
Whe+0.068
 Sn+0.064
uggets+0.063
Neg logit contributions
 noise-0.070
 truth-0.059
 situation-0.059
 cries-0.057
 unknown-0.056
Tokenoll
Token ID692
Feature activation+9.61e+00

Pos logit contributions
burn+0.044
ipop+0.043
ip+0.043
 sunscreen+0.041
 Sn+0.039
Neg logit contributions
 noise-0.080
 awful-0.075
 cries-0.074
 evil-0.074
 truth-0.072
Tokenipop
Token ID42800
Feature activation+3.80e+00

Pos logit contributions
Whe+0.099
burn+0.073
 sunscreen+0.068
 Sn+0.067
Br+0.062
Neg logit contributions
 cries-0.305
 noise-0.301
.-0.298
 awful-0.297
 evil-0.295
Token as
Token ID355
Feature activation+1.62e+00

Pos logit contributions
ip+0.129
ipop+0.129
burn+0.129
 sunscreen+0.126
Whe+0.126
Neg logit contributions
 noise-0.031
 awful-0.024
 cries-0.023
 evil-0.021
 truth-0.019
Token a
Token ID257
Feature activation+3.35e+00

Pos logit contributions
burn+0.046
ipop+0.045
ip+0.045
 sunscreen+0.042
Whe+0.042
Neg logit contributions
 noise-0.025
 awful-0.021
 evil-0.021
 truth-0.020
 cries-0.020
Token reward
Token ID6721
Feature activation+1.32e+00

Pos logit contributions
burn+0.068
ip+0.067
uggets+0.064
 sunscreen+0.064
Whe+0.064
Neg logit contributions
 noise-0.080
 truth-0.071
 cries-0.069
 evil-0.068
 awful-0.067
Token.
Token ID13
Feature activation+4.25e-01

Pos logit contributions
burn+0.047
ip+0.047
ipop+0.047
 sunscreen+0.046
Whe+0.046
Neg logit contributions
 noise-0.009
 cries-0.006
 awful-0.006
 truth-0.005
 evil-0.005
Token.
Token ID13
Feature activation-7.85e-01

Pos logit contributions
 situation+0.049
 noise+0.049
 harm+0.049
 comm+0.048
 destruction+0.047
Neg logit contributions
ip-0.045
burn-0.044
ipop-0.043
 Sn-0.041
 husband-0.040
Token He
Token ID679
Feature activation+2.47e-01

Pos logit contributions
 noise+0.013
 awful+0.012
 cries+0.011
 evil+0.011
 unfamiliar+0.011
Neg logit contributions
ip-0.021
ipop-0.021
 sunscreen-0.020
 Sn-0.019
burn-0.019
Token took
Token ID1718
Feature activation+1.75e+00

Pos logit contributions
ipop+0.006
ip+0.006
burn+0.005
 sunscreen+0.005
 Sn+0.005
Neg logit contributions
 noise-0.005
 awful-0.005
 cries-0.004
 evil-0.004
 unfamiliar-0.004
Token his
Token ID465
Feature activation+2.73e+00

Pos logit contributions
ipop+0.044
ip+0.044
burn+0.043
 sunscreen+0.041
 Sn+0.040
Neg logit contributions
 noise-0.032
 awful-0.028
 cries-0.028
 evil-0.027
 unfamiliar-0.027
Token l
Token ID300
Feature activation+3.38e+00

Pos logit contributions
burn+0.058
ugg+0.056
Whe+0.054
uggets+0.053
ip+0.053
Neg logit contributions
 noise-0.058
 cries-0.057
 evil-0.052
 situation-0.052
 truth-0.051
Tokenoll
Token ID692
Feature activation+9.60e+00

Pos logit contributions
burn+0.050
ipop+0.048
 sunscreen+0.047
ip+0.047
 Sn+0.044
Neg logit contributions
 noise-0.093
 cries-0.088
 awful-0.087
 evil-0.085
 truth-0.085
Tokenipop
Token ID42800
Feature activation+4.10e+00

Pos logit contributions
Whe+0.078
burn+0.056
 sunscreen+0.056
 Sn+0.047
Br+0.042
Neg logit contributions
 cries-0.325
 noise-0.312
.-0.301
 awful-0.300
 situation-0.299
Token and
Token ID290
Feature activation+3.91e+00

Pos logit contributions
ip+0.117
ipop+0.116
burn+0.112
 sunscreen+0.106
 Sn+0.105
Neg logit contributions
 noise-0.064
 awful-0.055
 evil-0.055
 unfamiliar-0.054
 comm-0.053
Token said
Token ID531
Feature activation+2.18e+00

Pos logit contributions
ipop+0.089
ip+0.089
burn+0.088
Whe+0.083
 sunscreen+0.082
Neg logit contributions
 noise-0.080
 cries-0.073
 awful-0.070
 evil-0.068
 comm-0.066
Token,
Token ID11
Feature activation+3.06e+00

Pos logit contributions
burn+0.064
ip+0.060
 sunscreen+0.060
Whe+0.060
ipop+0.060
Neg logit contributions
 noise-0.032
 cries-0.028
 truth-0.027
 awful-0.027
 evil-0.025
Token "
Token ID366
Feature activation+1.66e+00

Pos logit contributions
burn+0.078
ipop+0.077
ip+0.076
 sunscreen+0.074
Whe+0.073
Neg logit contributions
 noise-0.055
 cries-0.049
 awful-0.047
 truth-0.046
 evil-0.044
Token and
Token ID290
Feature activation+1.83e+00

Pos logit contributions
 noise+0.030
 awful+0.026
 cries+0.026
 evil+0.025
 situation+0.025
Neg logit contributions
ipop-0.050
ip-0.050
burn-0.049
 sunscreen-0.047
 Sn-0.046
Token handed
Token ID10158
Feature activation+1.50e+00

Pos logit contributions
burn+0.061
ipop+0.061
ip+0.060
uggets+0.060
 sunscreen+0.058
Neg logit contributions
 noise-0.018
 cries-0.014
 awful-0.014
 comm-0.013
 unfamiliar-0.012
Token Lucy
Token ID22162
Feature activation+1.58e+00

Pos logit contributions
ip+0.035
ipop+0.035
burn+0.034
 sunscreen+0.033
uggets+0.032
Neg logit contributions
 noise-0.031
 awful-0.028
 cries-0.027
 truth-0.027
 unfamiliar-0.026
Token a
Token ID257
Feature activation+2.63e+00

Pos logit contributions
rav+0.050
ipop+0.050
Whe+0.049
ip+0.049
 seeker+0.048
Neg logit contributions
 noise-0.015
 cries-0.014
 awful-0.013
.-0.013
 unknown-0.012
Token l
Token ID300
Feature activation+3.15e+00

Pos logit contributions
ip+0.070
Whe+0.070
uggets+0.070
burn+0.070
agna+0.067
Neg logit contributions
 noise-0.044
 truth-0.035
 cries-0.035
 strange-0.034
 situation-0.033
Tokenoll
Token ID692
Feature activation+9.55e+00

Pos logit contributions
burn+0.056
 sunscreen+0.053
 Sn+0.049
ip+0.048
ipop+0.047
Neg logit contributions
 noise-0.078
 awful-0.075
 cries-0.074
 evil-0.073
 truth-0.073
Tokenipop
Token ID42800
Feature activation+3.72e+00

Pos logit contributions
Whe+0.119
 sunscreen+0.085
burn+0.083
 Sn+0.083
Br+0.081
Neg logit contributions
 cries-0.290
.-0.285
 noise-0.278
 awful-0.273
 situation-0.271
Token.
Token ID13
Feature activation+4.94e-01

Pos logit contributions
ipop+0.124
burn+0.124
ip+0.124
Whe+0.122
 sunscreen+0.122
Neg logit contributions
 noise-0.032
 awful-0.025
 cries-0.025
 truth-0.022
 evil-0.021
Token Lucy
Token ID22162
Feature activation+1.03e+00

Pos logit contributions
burn+0.014
ip+0.014
ipop+0.014
 sunscreen+0.013
 seeker+0.012
Neg logit contributions
 noise-0.008
 cries-0.007
 truth-0.007
 awful-0.007
 wonders-0.007
Token was
Token ID373
Feature activation+5.94e-01

Pos logit contributions
burn+0.027
ipop+0.027
 sunscreen+0.027
ip+0.026
uggets+0.026
Neg logit contributions
 noise-0.016
 cries-0.016
 awful-0.014
 wonders-0.014
 comm-0.013
Token so
Token ID523
Feature activation+9.77e-01

Pos logit contributions
ipop+0.019
ip+0.019
burn+0.018
 sunscreen+0.018
 Sn+0.017
Neg logit contributions
 noise-0.007
 awful-0.006
 cries-0.006
 evil-0.006
 unfamiliar-0.005
Token She
Token ID1375
Feature activation-1.56e-01

Pos logit contributions
 noise+0.009
 awful+0.008
 evil+0.008
 situation+0.008
 unfamiliar+0.008
Neg logit contributions
ip-0.012
ipop-0.012
 Sn-0.011
burn-0.011
 sunscreen-0.011
Token gives
Token ID3607
Feature activation+4.98e-01

Pos logit contributions
 noise+0.003
 awful+0.003
 unfamiliar+0.003
 evil+0.003
 odd+0.003
Neg logit contributions
ip-0.004
ipop-0.004
burn-0.004
 sunscreen-0.003
 Sn-0.003
Token them
Token ID606
Feature activation+3.23e+00

Pos logit contributions
ip+0.013
ipop+0.013
burn+0.012
 sunscreen+0.011
coat+0.011
Neg logit contributions
 noise-0.009
 cries-0.008
 awful-0.008
 evil-0.008
 unfamiliar-0.008
Token a
Token ID257
Feature activation+2.68e+00

Pos logit contributions
ipop+0.089
ip+0.089
belt+0.087
burn+0.086
coat+0.083
Neg logit contributions
 noise-0.044
 cries-0.041
 awful-0.040
 truth-0.037
 unknown-0.036
Token l
Token ID300
Feature activation+3.06e+00

Pos logit contributions
ip+0.053
Whe+0.051
 Sn+0.050
burn+0.050
ipop+0.050
Neg logit contributions
 noise-0.064
 cries-0.056
 truth-0.055
 situation-0.053
 awful-0.053
Tokenoll
Token ID692
Feature activation+9.54e+00

Pos logit contributions
ipop+0.049
ip+0.049
burn+0.047
 sunscreen+0.044
 Sn+0.042
Neg logit contributions
 noise-0.084
 awful-0.077
 cries-0.077
 evil-0.075
 unfamiliar-0.074
Tokenipop
Token ID42800
Feature activation+3.99e+00

Pos logit contributions
Whe+0.097
burn+0.064
 sunscreen+0.064
 Sn+0.064
Br+0.060
Neg logit contributions
 cries-0.306
 noise-0.301
.-0.296
 evil-0.287
 awful-0.285
Token and
Token ID290
Feature activation+3.73e+00

Pos logit contributions
ipop+0.128
ip+0.127
burn+0.126
 sunscreen+0.123
Whe+0.122
Neg logit contributions
 noise-0.042
 awful-0.034
 cries-0.034
 evil-0.031
 truth-0.030
Token a
Token ID257
Feature activation+3.68e+00

Pos logit contributions
ipop+0.095
ip+0.094
burn+0.092
 sunscreen+0.087
 Sn+0.086
Neg logit contributions
 noise-0.067
 cries-0.060
 awful-0.059
 evil-0.057
 unfamiliar-0.055
Token sticker
Token ID27735
Feature activation+2.21e+00

Pos logit contributions
Whe+0.072
ip+0.071
burn+0.071
uggets+0.070
rav+0.068
Neg logit contributions
 noise-0.090
 cries-0.077
 truth-0.076
 awful-0.073
 evil-0.071
Token.
Token ID13
Feature activation-5.83e-02

Pos logit contributions
ipop+0.076
ip+0.076
burn+0.075
 sunscreen+0.073
 Sn+0.071
Neg logit contributions
 noise-0.019
 cries-0.014
 awful-0.014
 evil-0.013
 truth-0.012
Token She
Token ID1375
Feature activation-3.97e-02

Pos logit contributions
 noise+0.003
 cries+0.002
 awful+0.002
 unfamiliar+0.002
 truth+0.002
Neg logit contributions
burn-0.004
ipop-0.004
ip-0.004
 sunscreen-0.004
uggets-0.004
Token gave
Token ID2921
Feature activation+7.02e-01

Pos logit contributions
 noise+0.001
 cries+0.001
 awful+0.001
 wonders+0.001
 evil+0.001
Neg logit contributions
burn-0.001
ipop-0.001
Whe-0.001
ip-0.001
 sunscreen-0.001
Token them
Token ID606
Feature activation+2.60e+00

Pos logit contributions
ipop+0.021
ip+0.021
burn+0.021
coat+0.020
belt+0.020
Neg logit contributions
 noise-0.010
 cries-0.008
 wonders-0.008
 awful-0.007
 unknown-0.007
Token a
Token ID257
Feature activation+2.74e+00

Pos logit contributions
burn+0.075
ipop+0.074
ip+0.073
belt+0.069
coat+0.069
Neg logit contributions
 noise-0.036
 awful-0.034
 cries-0.031
 evil-0.031
 unfamiliar-0.030
Token l
Token ID300
Feature activation+3.12e+00

Pos logit contributions
burn+0.080
ip+0.080
Whe+0.078
 Sn+0.076
ipop+0.075
Neg logit contributions
 noise-0.041
 truth-0.033
 awful-0.031
 cries-0.031
 evil-0.030
Tokenoll
Token ID692
Feature activation+9.53e+00

Pos logit contributions
burn+0.070
 sunscreen+0.065
 Sn+0.063
ip+0.062
Whe+0.061
Neg logit contributions
 noise-0.063
 awful-0.061
 evil-0.059
 cries-0.058
 truth-0.058
Tokenipop
Token ID42800
Feature activation+4.01e+00

Pos logit contributions
Whe+0.130
burn+0.091
Br+0.091
 Sn+0.087
 sunscreen+0.085
Neg logit contributions
.-0.276
 cries-0.274
 noise-0.271
 evil-0.265
 awful-0.265
Token each
Token ID1123
Feature activation+3.75e+00

Pos logit contributions
burn+0.120
ip+0.119
ipop+0.119
Whe+0.118
 sunscreen+0.116
Neg logit contributions
 noise-0.049
 awful-0.042
 cries-0.041
 evil-0.037
 truth-0.037
Token and
Token ID290
Feature activation+2.68e+00

Pos logit contributions
burn+0.115
ip+0.113
ipop+0.111
 sunscreen+0.110
 Sn+0.109
Neg logit contributions
 noise-0.048
 cries-0.040
 awful-0.038
 evil-0.035
 truth-0.035
Token told
Token ID1297
Feature activation-7.23e-01

Pos logit contributions
burn+0.071
ipop+0.071
ip+0.070
Whe+0.069
 sunscreen+0.066
Neg logit contributions
 noise-0.046
 cries-0.040
 awful-0.039
 evil-0.037
 comm-0.036
Token them
Token ID606
Feature activation+7.89e-01

Pos logit contributions
 noise+0.015
 awful+0.014
 cries+0.013
 evil+0.013
 unfamiliar+0.013
Neg logit contributions
ipop-0.016
ip-0.016
burn-0.016
 sunscreen-0.015
 Sn-0.015
Token Lily
Token ID20037
Feature activation+2.25e-01

Pos logit contributions
 reins+0.025
Whe+0.025
ipop+0.025
coat+0.024
uggets+0.024
Neg logit contributions
 noise-0.016
 awful-0.013
 cries-0.013
.-0.013
 comm-0.012
Token happily
Token ID18177
Feature activation+1.57e+00

Pos logit contributions
Whe+0.006
burn+0.006
 sunscreen+0.006
ipop+0.006
uggets+0.006
Neg logit contributions
 noise-0.003
 cries-0.003
 wonders-0.003
 awful-0.003
 comm-0.003
Token took
Token ID1718
Feature activation+2.35e+00

Pos logit contributions
Whe+0.040
burn+0.039
 sunscreen+0.039
 Sn+0.037
uggets+0.037
Neg logit contributions
 noise-0.024
 cries-0.021
.-0.020
 comm-0.020
 awful-0.019
Token the
Token ID262
Feature activation+1.15e+00

Pos logit contributions
belt+0.078
ipop+0.077
coat+0.076
burn+0.074
rav+0.073
Neg logit contributions
 noise-0.025
 cries-0.022
 trouble-0.020
.-0.017
 awful-0.016
Token l
Token ID300
Feature activation+3.09e+00

Pos logit contributions
burn+0.022
uggets+0.022
ip+0.021
 sunscreen+0.021
inea+0.020
Neg logit contributions
 noise-0.028
 cries-0.025
 truth-0.024
 situation-0.024
 unknown-0.024
Tokenoll
Token ID692
Feature activation+9.50e+00

Pos logit contributions
burn+0.059
 sunscreen+0.055
 Sn+0.050
Whe+0.048
 apr+0.047
Neg logit contributions
 noise-0.072
 cries-0.069
 awful-0.068
 truth-0.068
 evil-0.067
Tokenipop
Token ID42800
Feature activation+3.92e+00

Pos logit contributions
Whe+0.110
burn+0.082
 sunscreen+0.078
 steering+0.077
Br+0.076
Neg logit contributions
 cries-0.288
.-0.277
 noise-0.276
 awful-0.269
 situation-0.267
Token and
Token ID290
Feature activation+3.12e+00

Pos logit contributions
ipop+0.112
ip+0.111
burn+0.109
 sunscreen+0.106
 Sn+0.103
Neg logit contributions
 noise-0.057
 awful-0.049
 cries-0.049
 evil-0.046
 truth-0.045
Token started
Token ID2067
Feature activation+6.95e-01

Pos logit contributions
rav+0.089
Whe+0.089
burn+0.086
 reins+0.086
inea+0.086
Neg logit contributions
 noise-0.042
 cries-0.039
 wonders-0.032
 that-0.031
.-0.031
Token licking
Token ID43390
Feature activation+9.79e-01

Pos logit contributions
Whe+0.022
ipop+0.022
inea+0.022
rav+0.021
 seeker+0.021
Neg logit contributions
 noise-0.007
 cries-0.007
.-0.005
 truth-0.005
 harm-0.005
Token it
Token ID340
Feature activation+2.58e-01

Pos logit contributions
burn+0.033
ugg+0.033
ipop+0.033
coat+0.033
belt+0.032
Neg logit contributions
 noise-0.010
 cries-0.008
 awful-0.007
.-0.007
 truth-0.007
Token unw
Token ID7379
Feature activation+3.86e+00

Pos logit contributions
ipop+0.004
ip+0.004
burn+0.003
 sunscreen+0.003
 Sn+0.003
Neg logit contributions
 noise-0.003
 awful-0.003
 cries-0.003
 evil-0.003
 unfamiliar-0.003
Tokenra
Token ID430
Feature activation+3.01e+00

Pos logit contributions
ipop+0.064
ip+0.061
burn+0.059
 sunscreen+0.057
Whe+0.054
Neg logit contributions
 noise-0.103
 cries-0.096
 awful-0.094
 evil-0.094
 unfamiliar-0.091
Tokenpped
Token ID1496
Feature activation+1.44e+00

Pos logit contributions
burn+0.067
ipop+0.066
uggets+0.065
Whe+0.064
 sunscreen+0.063
Neg logit contributions
 noise-0.056
 cries-0.055
 awful-0.052
 evil-0.050
 situation-0.049
Token the
Token ID262
Feature activation+1.11e+00

Pos logit contributions
ipop+0.039
ip+0.039
burn+0.038
 sunscreen+0.037
uggets+0.037
Neg logit contributions
 noise-0.023
 cries-0.020
 awful-0.020
 truth-0.019
 evil-0.019
Token l
Token ID300
Feature activation+3.81e+00

Pos logit contributions
uggets+0.028
burn+0.028
rav+0.027
days+0.027
ugg+0.026
Neg logit contributions
 noise-0.022
 cries-0.019
 awful-0.019
 unknown-0.018
 truth-0.018
Tokenoll
Token ID692
Feature activation+9.49e+00

Pos logit contributions
burn+0.068
 sunscreen+0.063
 Sn+0.056
ipop+0.056
uggets+0.056
Neg logit contributions
 noise-0.094
 cries-0.090
 awful-0.089
 truth-0.088
 evil-0.088
Tokenipop
Token ID42800
Feature activation+3.13e+00

Pos logit contributions
Whe+0.095
burn+0.085
 sunscreen+0.072
Br+0.069
uggets+0.064
Neg logit contributions
 cries-0.297
 noise-0.293
.-0.283
 awful-0.279
 situation-0.278
Token and
Token ID290
Feature activation+2.95e+00

Pos logit contributions
ip+0.081
ipop+0.079
burn+0.074
 sunscreen+0.071
 Sn+0.071
Neg logit contributions
 noise-0.056
 evil-0.051
 unfamiliar-0.050
 awful-0.050
 comm-0.049
Token l
Token ID300
Feature activation+2.62e+00

Pos logit contributions
burn+0.075
ip+0.074
ipop+0.073
uggets+0.072
rav+0.071
Neg logit contributions
 noise-0.054
 cries-0.050
 awful-0.045
 evil-0.044
 comm-0.043
Tokenicked
Token ID9484
Feature activation+1.64e+00

Pos logit contributions
burn+0.054
ipop+0.051
 sunscreen+0.050
ip+0.050
 Sn+0.047
Neg logit contributions
 noise-0.058
 awful-0.055
 cries-0.055
 evil-0.053
 truth-0.052
Token it
Token ID340
Feature activation+1.81e-01

Pos logit contributions
ipop+0.040
ip+0.040
burn+0.039
 sunscreen+0.037
 Sn+0.036
Neg logit contributions
 noise-0.031
 awful-0.028
 cries-0.027
 evil-0.027
 unfamiliar-0.026
Token should
Token ID815
Feature activation-1.77e+00

Pos logit contributions
ipop+0.067
ip+0.066
burn+0.064
 sunscreen+0.063
Whe+0.059
Neg logit contributions
 noise-0.066
 cries-0.059
 awful-0.059
 truth-0.058
 wonders-0.056
Token have
Token ID423
Feature activation+1.73e+00

Pos logit contributions
 noise+0.041
 awful+0.038
 cries+0.037
 evil+0.037
 unfamiliar+0.036
Neg logit contributions
ipop-0.035
ip-0.034
burn-0.033
 sunscreen-0.032
 Sn-0.031
Token shared
Token ID4888
Feature activation+2.77e-01

Pos logit contributions
ipop+0.026
ip+0.026
burn+0.025
 sunscreen+0.024
 Sn+0.022
Neg logit contributions
 noise-0.048
 awful-0.044
 cries-0.044
 evil-0.044
 unfamiliar-0.043
Token my
Token ID616
Feature activation+3.60e+00

Pos logit contributions
ipop+0.007
ip+0.007
burn+0.007
 sunscreen+0.007
Whe+0.007
Neg logit contributions
 noise-0.005
 cries-0.004
 awful-0.004
 truth-0.004
 evil-0.004
Token l
Token ID300
Feature activation+3.55e+00

Pos logit contributions
burn+0.086
ip+0.079
ugg+0.078
etsy+0.075
inea+0.074
Neg logit contributions
 noise-0.079
 truth-0.069
 unknown-0.069
 situation-0.069
 wonders-0.068
Tokenoll
Token ID692
Feature activation+9.48e+00

Pos logit contributions
burn+0.064
 sunscreen+0.062
 Sn+0.054
Whe+0.053
 apr+0.053
Neg logit contributions
 noise-0.085
 cries-0.081
 truth-0.080
 awful-0.080
 evil-0.079
Tokenipop
Token ID42800
Feature activation+3.63e+00

Pos logit contributions
Whe+0.087
burn+0.071
 sunscreen+0.071
 steering+0.059
Br+0.058
Neg logit contributions
 cries-0.306
 noise-0.302
.-0.290
 awful-0.287
 harm-0.285
Token with
Token ID351
Feature activation+2.48e-01

Pos logit contributions
ipop+0.103
ip+0.102
burn+0.099
 sunscreen+0.095
 Sn+0.094
Neg logit contributions
 noise-0.055
 awful-0.047
 cries-0.046
 evil-0.046
 unfamiliar-0.045
Token you
Token ID345
Feature activation+1.80e+00

Pos logit contributions
ipop+0.008
ip+0.008
burn+0.008
 sunscreen+0.007
 Sn+0.007
Neg logit contributions
 noise-0.003
 awful-0.003
 cries-0.003
 evil-0.002
 truth-0.002
Token.
Token ID13
Feature activation+2.76e-02

Pos logit contributions
ipop+0.060
ip+0.060
burn+0.059
 sunscreen+0.057
 Sn+0.056
Neg logit contributions
 noise-0.018
 awful-0.014
 cries-0.014
 evil-0.013
 unfamiliar-0.012
Token It
Token ID632
Feature activation-6.90e-01

Pos logit contributions
ipop+0.001
ip+0.001
burn+0.001
 sunscreen+0.001
coat+0.001
Neg logit contributions
 noise-0.000
 cries-0.000
 awful-0.000
 truth-0.000
 evil-0.000
Token gave
Token ID2921
Feature activation-2.59e-01

Pos logit contributions
 noise+0.001
 awful+0.001
 cries+0.001
 evil+0.001
 unfamiliar+0.001
Neg logit contributions
ip-0.002
ipop-0.002
burn-0.002
 sunscreen-0.002
uggets-0.002
Token Sus
Token ID8932
Feature activation+9.76e-01

Pos logit contributions
 noise+0.004
 cries+0.004
 awful+0.003
 truth+0.003
 harm+0.003
Neg logit contributions
ipop-0.007
ip-0.007
burn-0.007
 saddle-0.007
coat-0.007
Tokenie
Token ID494
Feature activation+1.99e+00

Pos logit contributions
ipop+0.028
ip+0.028
 sunscreen+0.028
burn+0.028
Whe+0.027
Neg logit contributions
 noise-0.013
 awful-0.011
 cries-0.011
 evil-0.011
 unfamiliar-0.010
Token a
Token ID257
Feature activation+2.91e+00

Pos logit contributions
ip+0.059
ipop+0.059
burn+0.057
agna+0.056
uggets+0.056
Neg logit contributions
 noise-0.025
 cries-0.022
 awful-0.021
 evil-0.021
 unknown-0.019
Token l
Token ID300
Feature activation+2.70e+00

Pos logit contributions
ip+0.068
uggets+0.067
ipop+0.065
burn+0.064
Whe+0.063
Neg logit contributions
 noise-0.062
 cries-0.053
 truth-0.051
 awful-0.050
 evil-0.050
Tokenoll
Token ID692
Feature activation+9.48e+00

Pos logit contributions
burn+0.044
 sunscreen+0.043
ipop+0.040
ip+0.040
 Sn+0.039
Neg logit contributions
 noise-0.071
 awful-0.067
 cries-0.067
 evil-0.066
 truth-0.065
Tokenipop
Token ID42800
Feature activation+3.78e+00

Pos logit contributions
Whe+0.112
 sunscreen+0.084
Br+0.075
 Sn+0.072
burn+0.070
Neg logit contributions
 cries-0.294
 noise-0.286
.-0.282
 evil-0.277
 awful-0.276
Token to
Token ID284
Feature activation+1.11e+00

Pos logit contributions
Whe+0.128
 sunscreen+0.126
ip+0.126
ipop+0.126
burn+0.126
Neg logit contributions
 noise-0.030
 awful-0.023
 cries-0.023
 evil-0.019
 truth-0.019
Token make
Token ID787
Feature activation-1.04e+00

Pos logit contributions
ip+0.026
ipop+0.026
burn+0.025
rav+0.024
uggets+0.024
Neg logit contributions
 noise-0.024
 cries-0.020
 awful-0.019
 harm-0.019
 truth-0.019
Token her
Token ID607
Feature activation+8.23e-01

Pos logit contributions
 noise+0.020
 awful+0.017
 cries+0.017
 evil+0.016
 truth+0.016
Neg logit contributions
ipop-0.026
ip-0.026
burn-0.025
 sunscreen-0.024
coat-0.023
Token feel
Token ID1254
Feature activation-8.40e-01

Pos logit contributions
ip+0.022
ipop+0.021
burn+0.020
 sunscreen+0.020
ugg+0.020
Neg logit contributions
 noise-0.015
 cries-0.014
 awful-0.013
 evil-0.012
 situation-0.012
Token Do
Token ID2141
Feature activation+4.39e-01

Pos logit contributions
burn+0.015
 sunscreen+0.015
ip+0.014
ipop+0.014
coat+0.014
Neg logit contributions
 noise-0.006
 truth-0.006
 cries-0.005
 He-0.005
 wonders-0.005
Token you
Token ID345
Feature activation+1.85e+00

Pos logit contributions
ipop+0.011
ip+0.011
 sunscreen+0.011
rav+0.010
 Sn+0.010
Neg logit contributions
 noise-0.007
 awful-0.007
 unfamiliar-0.007
 cries-0.007
 comm-0.007
Token want
Token ID765
Feature activation+8.32e-01

Pos logit contributions
ipop+0.048
ip+0.047
burn+0.046
 sunscreen+0.044
 Sn+0.043
Neg logit contributions
 noise-0.033
 awful-0.029
 cries-0.028
 evil-0.028
 unfamiliar-0.027
Token a
Token ID257
Feature activation+4.30e+00

Pos logit contributions
ipop+0.025
burn+0.025
ip+0.025
Whe+0.023
 sunscreen+0.023
Neg logit contributions
 noise-0.011
 cries-0.009
 truth-0.009
 awful-0.009
 evil-0.009
Token l
Token ID300
Feature activation+3.14e+00

Pos logit contributions
uggets+0.096
burn+0.091
 sunscreen+0.090
belt+0.090
bled+0.089
Neg logit contributions
 noise-0.082
 truth-0.074
 situation-0.072
 unknown-0.070
 strange-0.070
Tokenoll
Token ID692
Feature activation+9.40e+00

Pos logit contributions
burn+0.055
 sunscreen+0.053
ipop+0.048
 Sn+0.048
ip+0.047
Neg logit contributions
 noise-0.078
 evil-0.073
 awful-0.073
 cries-0.072
 truth-0.072
Tokenipop
Token ID42800
Feature activation+3.65e+00

Pos logit contributions
Whe+0.077
 sunscreen+0.067
Br+0.060
burn+0.059
 Sn+0.057
Neg logit contributions
 noise-0.301
 cries-0.296
 evil-0.290
.-0.289
 awful-0.283
Token?"
Token ID1701
Feature activation-3.36e-01

Pos logit contributions
ip+0.110
burn+0.110
 sunscreen+0.109
ipop+0.109
Whe+0.107
Neg logit contributions
 noise-0.041
 awful-0.034
 evil-0.031
 truth-0.031
 cries-0.029
Token\n
Token ID198
Feature activation+9.74e-02

Pos logit contributions
 noise+0.006
 awful+0.005
 evil+0.005
 cries+0.005
 unfamiliar+0.005
Neg logit contributions
ipop-0.009
ip-0.008
burn-0.008
 sunscreen-0.008
 Sn-0.008
Token\n
Token ID198
Feature activation+1.94e-02

Pos logit contributions
ip+0.002
ipop+0.002
burn+0.002
 sunscreen+0.002
 Sn+0.002
Neg logit contributions
 noise-0.002
 awful-0.002
 evil-0.002
 cries-0.002
 situation-0.001
TokenTom
Token ID13787
Feature activation-9.61e-01

Pos logit contributions
ipop+0.001
ip+0.000
burn+0.000
 sunscreen+0.000
uggets+0.000
Neg logit contributions
 noise-0.000
 awful-0.000
 cries-0.000
 unfamiliar-0.000
 evil-0.000
Token gave
Token ID2921
Feature activation+5.77e-01

Pos logit contributions
 noise+0.010
 cries+0.009
 wonders+0.008
 evil+0.008
 awful+0.008
Neg logit contributions
burn-0.022
 sunscreen-0.021
ip-0.021
ipop-0.021
Whe-0.020
Token Peter
Token ID5613
Feature activation+7.70e-01

Pos logit contributions
ipop+0.016
ip+0.016
burn+0.015
 sunscreen+0.015
 Sn+0.014
Neg logit contributions
 noise-0.009
 cries-0.008
 awful-0.008
 evil-0.008
 unfamiliar-0.008
Token a
Token ID257
Feature activation+2.63e+00

Pos logit contributions
burn+0.023
ipop+0.023
ip+0.022
Whe+0.022
coat+0.022
Neg logit contributions
 noise-0.009
 cries-0.008
 awful-0.008
 evil-0.008
 unknown-0.007
Token big
Token ID1263
Feature activation+2.34e+00

Pos logit contributions
ip+0.079
burn+0.078
Whe+0.077
uggets+0.076
ipop+0.073
Neg logit contributions
 noise-0.040
 truth-0.030
 strange-0.030
 unknown-0.029
 cries-0.028
Token l
Token ID300
Feature activation+3.16e+00

Pos logit contributions
ip+0.056
burn+0.055
Whe+0.054
uggets+0.053
ipop+0.050
Neg logit contributions
 noise-0.051
 cries-0.043
 situation-0.041
 voice-0.040
 evil-0.040
Tokenoll
Token ID692
Feature activation+9.38e+00

Pos logit contributions
burn+0.059
 sunscreen+0.055
ip+0.052
 Sn+0.051
ipop+0.050
Neg logit contributions
 noise-0.075
 awful-0.072
 evil-0.071
 truth-0.071
 cries-0.071
Tokenipop
Token ID42800
Feature activation+3.49e+00

Pos logit contributions
Whe+0.114
Br+0.092
burn+0.088
 sunscreen+0.084
 Sn+0.080
Neg logit contributions
.-0.280
 cries-0.280
 noise-0.276
 evil-0.270
!-0.266
Token!
Token ID0
Feature activation-5.16e-02

Pos logit contributions
ip+0.110
ipop+0.110
burn+0.110
 sunscreen+0.106
Whe+0.104
Neg logit contributions
 noise-0.039
 awful-0.031
 cries-0.031
 evil-0.029
 truth-0.028
Token There
Token ID1318
Feature activation+9.85e-01

Pos logit contributions
 noise+0.001
 cries+0.001
 awful+0.001
 evil+0.001
 truth+0.001
Neg logit contributions
ipop-0.001
ip-0.001
burn-0.001
 sunscreen-0.001
uggets-0.001
Token were
Token ID547
Feature activation-1.38e+00

Pos logit contributions
ipop+0.024
ip+0.024
burn+0.023
 sunscreen+0.022
 Sn+0.022
Neg logit contributions
 noise-0.018
 awful-0.016
 evil-0.016
 cries-0.016
 unfamiliar-0.015
Token many
Token ID867
Feature activation-5.37e-01

Pos logit contributions
 noise+0.026
 awful+0.024
 cries+0.023
 evil+0.023
 truth+0.022
Neg logit contributions
ipop-0.035
ip-0.034
burn-0.034
ugg-0.032
Whe-0.031
Token.
Token ID13
Feature activation-2.98e-02

Pos logit contributions
ipop+0.044
ip+0.043
burn+0.042
 sunscreen+0.040
 Sn+0.040
Neg logit contributions
 noise-0.014
 awful-0.012
 evil-0.012
 unfamiliar-0.012
 odd-0.011
Token They
Token ID1119
Feature activation+1.53e+00

Pos logit contributions
 noise+0.001
 awful+0.001
 evil+0.001
 unfamiliar+0.001
 situation+0.001
Neg logit contributions
ip-0.001
ipop-0.001
 Sn-0.001
 sunscreen-0.001
burn-0.001
Token took
Token ID1718
Feature activation+2.02e+00

Pos logit contributions
ip+0.037
burn+0.037
ipop+0.036
uggets+0.034
Whe+0.034
Neg logit contributions
 noise-0.028
 cries-0.026
 awful-0.025
 wonders-0.024
 evil-0.024
Token their
Token ID511
Feature activation+2.97e+00

Pos logit contributions
ipop+0.058
coat+0.055
belt+0.055
ip+0.054
burn+0.054
Neg logit contributions
 noise-0.032
 cries-0.030
 awful-0.026
 evil-0.025
 trouble-0.025
Token l
Token ID300
Feature activation+3.48e+00

Pos logit contributions
burn+0.066
belt+0.063
ugg+0.062
uggets+0.060
Whe+0.060
Neg logit contributions
 cries-0.062
 noise-0.060
 situation-0.055
 screams-0.055
 wonders-0.054
Tokenoll
Token ID692
Feature activation+9.36e+00

Pos logit contributions
burn+0.059
 sunscreen+0.056
ipop+0.052
 Sn+0.052
ip+0.050
Neg logit contributions
 noise-0.085
 cries-0.084
 awful-0.081
 evil-0.080
 truth-0.080
Tokenip
Token ID541
Feature activation+5.48e+00

Pos logit contributions
Whe+0.108
burn+0.072
Br+0.069
 sunscreen+0.068
 steering+0.066
Neg logit contributions
 cries-0.297
.-0.280
 noise-0.271
 harm-0.266
 hurts-0.265
Tokenops
Token ID2840
Feature activation+2.47e+00

Pos logit contributions
ipop+0.090
ip+0.087
burn+0.087
 sunscreen+0.083
Whe+0.080
Neg logit contributions
 noise-0.144
 cries-0.135
 awful-0.131
 situation-0.127
 evil-0.127
Token and
Token ID290
Feature activation+4.65e+00

Pos logit contributions
ipop+0.071
ip+0.071
burn+0.069
 sunscreen+0.067
 Sn+0.065
Neg logit contributions
 noise-0.036
 awful-0.031
 cries-0.030
 evil-0.029
 unfamiliar-0.029
Token smiled
Token ID13541
Feature activation-1.26e+00

Pos logit contributions
rav+0.107
Whe+0.106
ip+0.105
ipop+0.104
burn+0.103
Neg logit contributions
 noise-0.089
 cries-0.087
 comm-0.075
 screams-0.075
 awful-0.072
Token.
Token ID13
Feature activation-4.07e-02

Pos logit contributions
 noise+0.012
 awful+0.010
 cries+0.010
 evil+0.009
 situation+0.009
Neg logit contributions
ipop-0.042
ip-0.042
burn-0.041
 sunscreen-0.040
 Sn-0.039
Token gives
Token ID3607
Feature activation+5.53e-01

Pos logit contributions
 noise+0.007
 cries+0.006
 awful+0.006
 evil+0.005
 truth+0.005
Neg logit contributions
burn-0.012
ip-0.012
ipop-0.012
 sunscreen-0.011
 Sn-0.011
Token them
Token ID606
Feature activation+2.05e+00

Pos logit contributions
ipop+0.015
ip+0.015
burn+0.015
ugg+0.014
coat+0.014
Neg logit contributions
 noise-0.009
 cries-0.008
 awful-0.007
 truth-0.007
 evil-0.007
Token each
Token ID1123
Feature activation+3.20e+00

Pos logit contributions
burn+0.059
ipop+0.059
belt+0.058
ip+0.057
coat+0.057
Neg logit contributions
 noise-0.027
 cries-0.025
 awful-0.023
 unknown-0.022
 truth-0.022
Token a
Token ID257
Feature activation+2.81e+00

Pos logit contributions
burn+0.093
ipop+0.091
ip+0.091
belt+0.087
 sunscreen+0.086
Neg logit contributions
 noise-0.048
 cries-0.041
 truth-0.036
 awful-0.036
 wonders-0.036
Token l
Token ID300
Feature activation+3.22e+00

Pos logit contributions
burn+0.064
Whe+0.062
uggets+0.060
 sunscreen+0.059
ugg+0.058
Neg logit contributions
 noise-0.060
 cries-0.051
 truth-0.051
 situation-0.049
 strange-0.048
Tokenoll
Token ID692
Feature activation+9.35e+00

Pos logit contributions
burn+0.056
 sunscreen+0.053
 Sn+0.048
ipop+0.048
ip+0.047
Neg logit contributions
 noise-0.081
 cries-0.077
 awful-0.076
 truth-0.075
 evil-0.075
Tokenipop
Token ID42800
Feature activation+3.92e+00

Pos logit contributions
Whe+0.088
 sunscreen+0.072
burn+0.070
Br+0.059
 Sn+0.057
Neg logit contributions
 cries-0.300
 noise-0.292
.-0.288
 evil-0.278
 awful-0.277
Token.
Token ID13
Feature activation+2.48e-01

Pos logit contributions
ipop+0.135
ip+0.135
burn+0.134
 sunscreen+0.130
Whe+0.128
Neg logit contributions
 noise-0.033
 awful-0.025
 cries-0.025
 evil-0.022
 truth-0.021
Token Tom
Token ID4186
Feature activation-9.40e-01

Pos logit contributions
ipop+0.006
ip+0.006
burn+0.006
 sunscreen+0.006
 Sn+0.006
Neg logit contributions
 noise-0.005
 awful-0.004
 cries-0.004
 evil-0.004
 unfamiliar-0.004
Token gets
Token ID3011
Feature activation+1.03e+00

Pos logit contributions
 noise+0.012
 cries+0.011
 awful+0.010
 wonders+0.010
 evil+0.009
Neg logit contributions
burn-0.028
ipop-0.028
ip-0.028
 sunscreen-0.027
uggets-0.026
Token a
Token ID257
Feature activation+2.63e+00

Pos logit contributions
ipop+0.029
burn+0.028
ip+0.028
coat+0.028
Whe+0.027
Neg logit contributions
 noise-0.016
 cries-0.015
 awful-0.014
 evil-0.013
 unknown-0.013
Token out
Token ID503
Feature activation-2.45e-01

Pos logit contributions
 noise+0.009
 awful+0.008
 cries+0.008
 evil+0.008
 unfamiliar+0.008
Neg logit contributions
ipop-0.014
ip-0.014
burn-0.014
 sunscreen-0.013
uggets-0.013
Token that
Token ID326
Feature activation-1.12e+00

Pos logit contributions
 noise+0.003
 awful+0.003
 evil+0.003
 cries+0.003
 unfamiliar+0.002
Neg logit contributions
burn-0.008
ip-0.008
ipop-0.008
 sunscreen-0.007
uggets-0.007
Token she
Token ID673
Feature activation-2.15e+00

Pos logit contributions
 noise+0.022
 cries+0.018
 awful+0.018
 unfamiliar+0.018
 evil+0.018
Neg logit contributions
ip-0.028
ipop-0.028
burn-0.027
 sunscreen-0.025
uggets-0.025
Token had
Token ID550
Feature activation-2.22e-01

Pos logit contributions
 noise+0.038
 awful+0.033
 unfamiliar+0.032
 cries+0.032
 evil+0.032
Neg logit contributions
ip-0.055
ipop-0.055
burn-0.052
 sunscreen-0.050
 Sn-0.049
Token been
Token ID587
Feature activation-1.13e+00

Pos logit contributions
 noise+0.004
 awful+0.003
 cries+0.003
 evil+0.003
 unfamiliar+0.003
Neg logit contributions
ipop-0.006
ip-0.006
burn-0.006
 sunscreen-0.005
 Sn-0.005
Token excluded
Token ID15009
Feature activation-6.41e+00

Pos logit contributions
 noise+0.021
 evil+0.019
 awful+0.019
 cries+0.019
 unfamiliar+0.018
Neg logit contributions
ipop-0.029
ip-0.028
burn-0.027
 sunscreen-0.026
Whe-0.025
Token from
Token ID422
Feature activation-1.95e+00

Pos logit contributions
 noise+0.111
 awful+0.111
 cries+0.102
 unfamiliar+0.101
 urgent+0.100
Neg logit contributions
ipop-0.156
burn-0.154
ip-0.148
 Sn-0.140
coat-0.139
Token a
Token ID257
Feature activation-3.47e-01

Pos logit contributions
 noise+0.035
 awful+0.030
 cries+0.030
 evil+0.030
 unfamiliar+0.029
Neg logit contributions
ipop-0.051
ip-0.051
burn-0.049
 sunscreen-0.047
Whe-0.046
Token special
Token ID2041
Feature activation+4.17e-01

Pos logit contributions
 noise+0.005
 awful+0.004
 cries+0.004
 evil+0.004
 unfamiliar+0.004
Neg logit contributions
ipop-0.010
ip-0.010
burn-0.010
 sunscreen-0.009
 Sn-0.009
Token game
Token ID983
Feature activation-3.15e+00

Pos logit contributions
ip+0.010
burn+0.010
ipop+0.010
uggets+0.010
 sunscreen+0.010
Neg logit contributions
 noise-0.008
 situation-0.007
 truth-0.007
 evil-0.007
 cries-0.007
Token.
Token ID13
Feature activation-2.18e+00

Pos logit contributions
 noise+0.041
 awful+0.035
 cries+0.035
 comm+0.033
 unfamiliar+0.033
Neg logit contributions
ipop-0.094
ip-0.093
burn-0.088
coat-0.087
 Sn-0.087
Token.
Token ID13
Feature activation-4.57e-01

Pos logit contributions
ipop+0.039
ip+0.038
burn+0.038
 sunscreen+0.037
 Sn+0.035
Neg logit contributions
 noise-0.015
 awful-0.014
 evil-0.013
 comm-0.012
 unfamiliar-0.012
Token But
Token ID887
Feature activation-4.14e-01

Pos logit contributions
 noise+0.006
 cries+0.005
 truth+0.004
 wonders+0.004
 awful+0.004
Neg logit contributions
ipop-0.014
burn-0.014
ip-0.014
 sunscreen-0.013
uggets-0.013
Token the
Token ID262
Feature activation-1.70e+00

Pos logit contributions
 noise+0.004
 scared+0.003
 unknown+0.003
 unfamiliar+0.003
 cries+0.003
Neg logit contributions
uggets-0.015
burn-0.014
ip-0.014
ouses-0.014
ipop-0.014
Token pony
Token ID26902
Feature activation-9.52e-01

Pos logit contributions
 noise+0.050
 evil+0.045
 cries+0.045
 situation+0.044
 awful+0.044
Neg logit contributions
ip-0.027
uggets-0.026
ipop-0.026
burn-0.023
 sunscreen-0.022
Token was
Token ID373
Feature activation-3.28e+00

Pos logit contributions
 noise+0.014
 cries+0.012
 awful+0.012
 evil+0.012
 unfamiliar+0.011
Neg logit contributions
ipop-0.027
ip-0.027
burn-0.026
 sunscreen-0.026
uggets-0.025
Token confused
Token ID10416
Feature activation-6.40e+00

Pos logit contributions
 noise+0.048
 cries+0.041
 awful+0.040
 situation+0.039
 truth+0.038
Neg logit contributions
ipop-0.093
ip-0.092
burn-0.091
 sunscreen-0.088
 Sn-0.086
Token.
Token ID13
Feature activation-2.08e+00

Pos logit contributions
 comm+0.129
 battlefield+0.123
 destruction+0.123
 awful+0.119
 evil+0.119
Neg logit contributions
ip-0.122
ipop-0.115
 chocolate-0.107
burn-0.106
 Sn-0.104
Token She
Token ID1375
Feature activation-2.66e+00

Pos logit contributions
 noise+0.030
 cries+0.025
 awful+0.024
 truth+0.023
 unfamiliar+0.022
Neg logit contributions
ipop-0.062
burn-0.061
ip-0.061
 sunscreen-0.058
uggets-0.056
Token was
Token ID373
Feature activation-2.31e+00

Pos logit contributions
 noise+0.048
 awful+0.042
 cries+0.041
 evil+0.040
 unfamiliar+0.040
Neg logit contributions
ipop-0.068
ip-0.067
burn-0.065
 sunscreen-0.063
 Sn-0.061
Token not
Token ID407
Feature activation-2.15e+00

Pos logit contributions
 noise+0.027
 cries+0.023
 situation+0.023
 awful+0.023
 truth+0.021
Neg logit contributions
ipop-0.070
burn-0.069
ip-0.069
 sunscreen-0.067
 Sn-0.066
Token used
Token ID973
Feature activation-9.89e-01

Pos logit contributions
 noise+0.039
 awful+0.034
 cries+0.033
 evil+0.033
 unfamiliar+0.032
Neg logit contributions
ipop-0.055
ip-0.055
burn-0.053
 sunscreen-0.051
 Sn-0.050
Token day
Token ID1110
Feature activation+1.51e+00

Pos logit contributions
 noise+0.015
 awful+0.013
 cries+0.013
 evil+0.013
 unfamiliar+0.013
Neg logit contributions
ip-0.016
ipop-0.016
burn-0.016
 sunscreen-0.015
 Sn-0.014
Token his
Token ID465
Feature activation-5.00e-01

Pos logit contributions
burn+0.054
uggets+0.052
ip+0.050
coat+0.050
ugg+0.050
Neg logit contributions
 noise-0.017
 strange-0.011
 evil-0.011
 cries-0.010
 trouble-0.010
Token peaceful
Token ID12309
Feature activation-2.06e+00

Pos logit contributions
 noise+0.011
 cries+0.010
 awful+0.009
 truth+0.009
 evil+0.009
Neg logit contributions
ipop-0.011
ip-0.011
burn-0.011
Whe-0.010
 sunscreen-0.010
Token journey
Token ID7002
Feature activation-8.68e-01

Pos logit contributions
 noise+0.039
 cries+0.033
 awful+0.033
 evil+0.032
 situation+0.032
Neg logit contributions
ip-0.052
ipop-0.052
burn-0.050
 sunscreen-0.048
 Sn-0.047
Token was
Token ID373
Feature activation-3.26e+00

Pos logit contributions
 noise+0.016
 awful+0.015
 cries+0.014
 unfamiliar+0.014
 evil+0.014
Neg logit contributions
ipop-0.021
ip-0.021
burn-0.020
coat-0.019
 sunscreen-0.019
Token disrupted
Token ID30067
Feature activation-6.40e+00

Pos logit contributions
 noise+0.055
 cries+0.051
 awful+0.049
 comm+0.047
 evil+0.046
Neg logit contributions
burn-0.081
ip-0.080
ipop-0.080
 sunscreen-0.079
 Sn-0.076
Token by
Token ID416
Feature activation-3.18e+00

Pos logit contributions
 cries+0.131
 noise+0.125
 screams+0.123
 awful+0.122
 comm+0.118
Neg logit contributions
ip-0.131
 Sn-0.123
 sunscreen-0.121
burn-0.120
uggets-0.116
Token an
Token ID281
Feature activation-1.66e+00

Pos logit contributions
 noise+0.056
 awful+0.049
 cries+0.049
 evil+0.047
 unfamiliar+0.047
Neg logit contributions
ipop-0.082
ip-0.081
burn-0.079
 sunscreen-0.076
 Sn-0.074
Token ignorant
Token ID20109
Feature activation-3.58e+00

Pos logit contributions
 cries+0.021
 noise+0.020
 truth+0.019
 comm+0.018
 screams+0.017
Neg logit contributions
 sunscreen-0.042
ip-0.041
ipop-0.041
burn-0.040
 saddle-0.039
Token animal
Token ID5044
Feature activation-4.19e+00

Pos logit contributions
 noise+0.068
 awful+0.061
 cries+0.060
 evil+0.058
 unfamiliar+0.058
Neg logit contributions
ipop-0.086
ip-0.086
burn-0.083
 sunscreen-0.080
 Sn-0.078
Token.
Token ID13
Feature activation-1.83e+00

Pos logit contributions
 noise+0.050
 awful+0.045
 odd+0.041
 cries+0.041
 unfamiliar+0.041
Neg logit contributions
ip-0.124
ipop-0.124
burn-0.119
 sunscreen-0.115
 Sn-0.114
Token happy
Token ID3772
Feature activation-3.04e+00

Pos logit contributions
 noise+0.021
 awful+0.019
 cries+0.019
 evil+0.018
 unfamiliar+0.018
Neg logit contributions
ipop-0.027
ip-0.027
burn-0.026
 sunscreen-0.025
 Sn-0.024
Token but
Token ID475
Feature activation-1.53e+00

Pos logit contributions
 awful+0.066
 noise+0.064
 evil+0.062
 cries+0.061
 unfamiliar+0.060
Neg logit contributions
ipop-0.062
ip-0.062
uggets-0.058
burn-0.056
 sunscreen-0.055
Token he
Token ID339
Feature activation-2.45e+00

Pos logit contributions
 noise+0.023
 awful+0.020
 cries+0.019
 evil+0.019
 unfamiliar+0.019
Neg logit contributions
ipop-0.044
ip-0.044
burn-0.043
 sunscreen-0.041
 Sn-0.040
Token was
Token ID373
Feature activation-2.42e+00

Pos logit contributions
 noise+0.044
 cries+0.038
 awful+0.038
 evil+0.037
 unfamiliar+0.037
Neg logit contributions
ip-0.062
ipop-0.062
burn-0.058
 sunscreen-0.057
 Sn-0.056
Token also
Token ID635
Feature activation-9.83e-01

Pos logit contributions
 noise+0.038
 cries+0.033
 awful+0.032
 situation+0.031
 truth+0.030
Neg logit contributions
ipop-0.066
ip-0.066
burn-0.064
 sunscreen-0.063
 Sn-0.061
Token confused
Token ID10416
Feature activation-6.39e+00

Pos logit contributions
 noise+0.015
 awful+0.013
 evil+0.013
 cries+0.012
 unfamiliar+0.012
Neg logit contributions
ipop-0.028
ip-0.028
burn-0.028
 sunscreen-0.026
 Sn-0.025
Token why
Token ID1521
Feature activation-1.46e+00

Pos logit contributions
 comm+0.103
 noise+0.102
 cries+0.098
 situation+0.097
 awful+0.097
Neg logit contributions
ip-0.152
ipop-0.149
burn-0.131
 Sn-0.131
 sunscreen-0.130
Token he
Token ID339
Feature activation-1.85e+00

Pos logit contributions
 noise+0.022
 awful+0.019
 evil+0.018
 cries+0.018
 unfamiliar+0.017
Neg logit contributions
ipop-0.042
ip-0.042
burn-0.042
 sunscreen-0.040
uggets-0.039
Token wanted
Token ID2227
Feature activation-2.59e+00

Pos logit contributions
 noise+0.036
 awful+0.032
 cries+0.031
 truth+0.030
 unfamiliar+0.029
Neg logit contributions
ip-0.044
ipop-0.044
burn-0.041
 Sn-0.039
 sunscreen-0.039
Token to
Token ID284
Feature activation-2.46e+00

Pos logit contributions
 noise+0.044
 awful+0.038
 cries+0.038
 evil+0.036
 unfamiliar+0.035
Neg logit contributions
ipop-0.068
ip-0.068
burn-0.066
 sunscreen-0.065
 Sn-0.062
Token collect
Token ID2824
Feature activation+4.95e-01

Pos logit contributions
 awful+0.058
 noise+0.057
 comm+0.057
 urgent+0.057
 cries+0.057
Neg logit contributions
 sunscreen-0.043
ipop-0.040
burn-0.040
ip-0.038
 seeker-0.037
Token the
Token ID262
Feature activation-2.99e+00

Pos logit contributions
 noise+0.013
 unknown+0.010
 evil+0.010
 awful+0.010
 unfamiliar+0.009
Neg logit contributions
burn-0.030
ip-0.029
coat-0.029
etsy-0.028
rav-0.028
Token knife
Token ID9845
Feature activation-2.30e+00

Pos logit contributions
 noise+0.062
 awful+0.056
 cries+0.056
 evil+0.054
 unfamiliar+0.054
Neg logit contributions
ipop-0.067
ip-0.066
burn-0.064
 sunscreen-0.062
 Sn-0.059
Token made
Token ID925
Feature activation-3.22e+00

Pos logit contributions
 noise+0.040
 awful+0.036
 evil+0.035
 unfamiliar+0.035
 comm+0.035
Neg logit contributions
ipop-0.059
ip-0.059
burn-0.055
 sunscreen-0.052
coat-0.052
Token him
Token ID683
Feature activation-2.54e+00

Pos logit contributions
 noise+0.057
 situation+0.056
 comm+0.056
 awful+0.055
 unfamiliar+0.055
Neg logit contributions
ip-0.068
 sunscreen-0.067
burn-0.065
ipop-0.064
 Sn-0.063
Token feel
Token ID1254
Feature activation-2.95e+00

Pos logit contributions
 noise+0.042
 truth+0.039
 comm+0.038
 situation+0.038
 evil+0.038
Neg logit contributions
ipop-0.063
burn-0.061
ip-0.061
 sunscreen-0.059
coat-0.054
Token confused
Token ID10416
Feature activation-6.32e+00

Pos logit contributions
urious+0.048
 noise+0.047
 situation+0.046
umbling+0.045
 comm+0.045
Neg logit contributions
ip-0.069
 sunscreen-0.068
ipop-0.065
 Sn-0.064
 marsh-0.061
Token.
Token ID13
Feature activation-2.26e+00

Pos logit contributions
 comm+0.120
 screams+0.113
 cries+0.113
 awful+0.111
 evil+0.109
Neg logit contributions
ip-0.124
ipop-0.120
 Sn-0.119
burn-0.117
 chocolate-0.114
Token\n
Token ID198
Feature activation-1.24e+00

Pos logit contributions
 noise+0.032
 cries+0.026
 awful+0.026
 truth+0.025
 evil+0.025
Neg logit contributions
ipop-0.067
ip-0.067
burn-0.066
 sunscreen-0.063
 Sn-0.061
Token\n
Token ID198
Feature activation-2.19e+00

Pos logit contributions
 noise+0.025
 awful+0.024
 evil+0.023
 cries+0.022
 unfamiliar+0.022
Neg logit contributions
ip-0.029
ipop-0.027
 Sn-0.026
burn-0.026
Whe-0.025
TokenOne
Token ID3198
Feature activation-6.25e-01

Pos logit contributions
 noise+0.042
 awful+0.037
 unfamiliar+0.036
 evil+0.036
 cries+0.035
Neg logit contributions
ipop-0.056
burn-0.054
ip-0.054
 sunscreen-0.051
uggets-0.050
Token day
Token ID1110
Feature activation+1.91e+00

Pos logit contributions
 noise+0.012
 evil+0.010
 awful+0.010
 cries+0.010
 unfamiliar+0.010
Neg logit contributions
ip-0.016
burn-0.015
ipop-0.015
 sunscreen-0.015
 Sn-0.015
Token,
Token ID11
Feature activation+2.13e+00

Pos logit contributions
ipop+0.006
ip+0.006
burn+0.006
 sunscreen+0.006
 Sn+0.006
Neg logit contributions
 noise-0.003
 awful-0.003
 cries-0.003
 evil-0.003
 unfamiliar-0.003
Token but
Token ID475
Feature activation-6.51e-01

Pos logit contributions
ipop+0.058
burn+0.057
ugg+0.055
uggets+0.054
ip+0.053
Neg logit contributions
 noise-0.040
 wonders-0.034
 cries-0.033
 unfamiliar-0.033
 unknown-0.032
Token then
Token ID788
Feature activation-8.75e-01

Pos logit contributions
 noise+0.012
 awful+0.010
 evil+0.010
 cries+0.010
 unfamiliar+0.010
Neg logit contributions
ipop-0.017
burn-0.017
ip-0.017
 sunscreen-0.016
Whe-0.016
Token Lime
Token ID43503
Feature activation-1.20e+00

Pos logit contributions
 noise+0.014
 evil+0.012
 cries+0.012
 awful+0.011
 unfamiliar+0.011
Neg logit contributions
burn-0.025
ipop-0.025
ip-0.024
Whe-0.024
coat-0.024
Token got
Token ID1392
Feature activation-2.57e+00

Pos logit contributions
 noise+0.020
 awful+0.018
 cries+0.018
 evil+0.017
 unfamiliar+0.017
Neg logit contributions
ipop-0.032
ip-0.031
burn-0.031
 sunscreen-0.030
 Sn-0.029
Token confused
Token ID10416
Feature activation-6.27e+00

Pos logit contributions
 noise+0.041
 awful+0.035
 cries+0.035
 evil+0.034
 unfamiliar+0.033
Neg logit contributions
ipop-0.071
ip-0.070
burn-0.069
 sunscreen-0.066
 Sn-0.065
Token.
Token ID13
Feature activation-1.57e+00

Pos logit contributions
 noise+0.077
 cries+0.070
 awful+0.069
 evil+0.067
 comm+0.067
Neg logit contributions
ip-0.177
ipop-0.176
burn-0.170
 sunscreen-0.167
 Sn-0.165
Token He
Token ID679
Feature activation-1.80e+00

Pos logit contributions
 noise+0.030
 awful+0.028
 cries+0.027
 evil+0.027
 situation+0.026
Neg logit contributions
ip-0.037
ipop-0.036
burn-0.035
 Sn-0.034
 sunscreen-0.034
Token couldn
Token ID3521
Feature activation-2.82e+00

Pos logit contributions
 noise+0.030
 awful+0.026
 cries+0.026
 evil+0.025
 truth+0.025
Neg logit contributions
ipop-0.048
ip-0.048
burn-0.046
 sunscreen-0.044
 Sn-0.043
Token't
Token ID470
Feature activation-2.42e+00

Pos logit contributions
 noise+0.049
 unfamiliar+0.045
 evil+0.045
 awful+0.045
 comm+0.044
Neg logit contributions
ip-0.071
burn-0.068
 Sn-0.068
ipop-0.067
Whe-0.062
Token remember
Token ID3505
Feature activation-2.23e+00

Pos logit contributions
 noise+0.050
 awful+0.046
 cries+0.045
 evil+0.044
 comm+0.044
Neg logit contributions
ipop-0.052
ip-0.051
burn-0.050
 sunscreen-0.048
Whe-0.046
Token,
Token ID11
Feature activation+1.66e+00

Pos logit contributions
 noise+0.015
 evil+0.015
 awful+0.015
 cries+0.014
 comm+0.014
Neg logit contributions
ip-0.030
ipop-0.030
burn-0.030
 sunscreen-0.030
coat-0.028
Token which
Token ID543
Feature activation+3.42e-02

Pos logit contributions
ipop+0.042
ip+0.041
burn+0.041
 sunscreen+0.041
 Sn+0.038
Neg logit contributions
 noise-0.028
 awful-0.026
 cries-0.025
 evil-0.025
 truth-0.024
Token made
Token ID925
Feature activation-3.03e+00

Pos logit contributions
ipop+0.001
burn+0.001
ip+0.001
 Sn+0.001
 sunscreen+0.001
Neg logit contributions
 noise-0.001
 cries-0.001
 awful-0.001
 evil-0.001
urious-0.001
Token them
Token ID606
Feature activation-1.53e+00

Pos logit contributions
 noise+0.057
 awful+0.053
 cries+0.052
 situation+0.052
 evil+0.051
Neg logit contributions
ipop-0.068
ip-0.068
burn-0.066
 sunscreen-0.065
 Sn-0.062
Token very
Token ID845
Feature activation-2.41e+00

Pos logit contributions
 noise+0.031
 awful+0.029
 evil+0.029
 cries+0.028
 situation+0.028
Neg logit contributions
ipop-0.032
burn-0.032
ip-0.032
 sunscreen-0.031
 Sn-0.029
Token excited
Token ID6568
Feature activation-6.23e+00

Pos logit contributions
 situation+0.060
 truth+0.059
 harm+0.056
 situations+0.054
 evil+0.054
Neg logit contributions
 sunscreen-0.031
apy-0.029
ip-0.029
 marsh-0.028
uggets-0.028
Token.
Token ID13
Feature activation-1.14e+00

Pos logit contributions
 awful+0.090
 evil+0.089
 urgent+0.088
 comm+0.088
 cries+0.087
Neg logit contributions
ipop-0.150
burn-0.145
 sunscreen-0.142
 breakfast-0.141
ip-0.141
Token\n
Token ID198
Feature activation-6.46e-01

Pos logit contributions
 noise+0.026
 awful+0.024
 evil+0.024
 harm+0.024
 situation+0.023
Neg logit contributions
ip-0.023
ipop-0.023
 sunscreen-0.021
 Sn-0.021
burn-0.020
Token\n
Token ID198
Feature activation-1.28e+00

Pos logit contributions
 noise+0.013
 awful+0.012
 evil+0.012
 cries+0.011
 unfamiliar+0.011
Neg logit contributions
ip-0.016
ipop-0.015
 Sn-0.014
burn-0.014
 sunscreen-0.014
TokenBen
Token ID11696
Feature activation-1.24e+00

Pos logit contributions
 noise+0.030
 awful+0.028
 cries+0.027
 evil+0.027
 unfamiliar+0.027
Neg logit contributions
ipop-0.025
ip-0.025
burn-0.024
 sunscreen-0.023
 Sn-0.022
Token asked
Token ID1965
Feature activation+3.16e-01

Pos logit contributions
 noise+0.019
 awful+0.017
 cries+0.016
 evil+0.016
 truth+0.016
Neg logit contributions
ipop-0.034
ip-0.034
burn-0.033
 sunscreen-0.032
 Sn-0.031
Token but
Token ID475
Feature activation+9.64e-01

Pos logit contributions
uggets+0.042
ipop+0.041
rav+0.041
burn+0.040
ugg+0.040
Neg logit contributions
 noise-0.035
 cries-0.031
 awful-0.027
 unfamiliar-0.027
 odd-0.027
Token the
Token ID262
Feature activation-1.59e+00

Pos logit contributions
burn+0.031
ip+0.031
uggets+0.030
ipop+0.030
 sunscreen+0.029
Neg logit contributions
 noise-0.012
 awful-0.009
 evil-0.009
 cries-0.009
 unfamiliar-0.009
Token nurse
Token ID15849
Feature activation-2.22e+00

Pos logit contributions
 noise+0.045
 cries+0.040
 evil+0.040
 awful+0.040
 truth+0.039
Neg logit contributions
ip-0.027
ipop-0.026
burn-0.024
 sunscreen-0.024
uggets-0.023
Token was
Token ID373
Feature activation-3.18e+00

Pos logit contributions
 noise+0.034
 cries+0.030
 awful+0.029
 evil+0.028
 comm+0.026
Neg logit contributions
 sunscreen-0.062
burn-0.062
ipop-0.062
ip-0.061
uggets-0.060
Token very
Token ID845
Feature activation-2.17e+00

Pos logit contributions
 noise+0.046
 awful+0.039
 cries+0.039
 evil+0.037
 unfamiliar+0.036
Neg logit contributions
ipop-0.092
ip-0.091
burn-0.090
 sunscreen-0.087
 Sn-0.085
Token surprised
Token ID6655
Feature activation-6.20e+00

Pos logit contributions
 noise+0.031
 cries+0.027
 awful+0.027
 truth+0.026
 situation+0.026
Neg logit contributions
ipop-0.061
ip-0.061
burn-0.059
 sunscreen-0.057
 Sn-0.056
Token and
Token ID290
Feature activation-1.67e+00

Pos logit contributions
 evil+0.124
 noise+0.123
 awful+0.122
 comm+0.120
 urgent+0.118
Neg logit contributions
ip-0.115
ipop-0.111
 chocolate-0.101
uit-0.093
coat-0.092
Token shouted
Token ID17293
Feature activation-6.24e-01

Pos logit contributions
 noise+0.025
 awful+0.021
 cries+0.021
 evil+0.020
 unfamiliar+0.020
Neg logit contributions
ipop-0.048
ip-0.048
burn-0.047
 sunscreen-0.045
 Sn-0.044
Token "
Token ID366
Feature activation+1.05e+00

Pos logit contributions
 noise+0.010
 awful+0.008
 cries+0.008
 evil+0.008
 odd+0.008
Neg logit contributions
burn-0.018
ipop-0.018
ip-0.018
 sunscreen-0.018
Whe-0.017
TokenO
Token ID46
Feature activation+1.08e+00

Pos logit contributions
ip+0.023
ipop+0.023
 sunscreen+0.022
uggets+0.022
agna+0.021
Neg logit contributions
 noise-0.023
 awful-0.019
 cries-0.019
 evil-0.019
 harm-0.019
Tokenuch
Token ID794
Feature activation-1.23e+00

Pos logit contributions
ipop+0.019
ip+0.018
burn+0.018
 sunscreen+0.017
 Sn+0.016
Neg logit contributions
 noise-0.028
 awful-0.026
 cries-0.026
 evil-0.025
 unfamiliar-0.025
Token day
Token ID1110
Feature activation+2.60e+00

Pos logit contributions
 noise+0.000
 awful+0.000
 evil+0.000
 cries+0.000
 unfamiliar+0.000
Neg logit contributions
ipop-0.001
ip-0.001
burn-0.001
 sunscreen-0.001
uggets-0.001
Token,
Token ID11
Feature activation+2.17e+00

Pos logit contributions
ipop+0.069
ip+0.069
burn+0.069
 sunscreen+0.065
 Sn+0.064
Neg logit contributions
 noise-0.044
 awful-0.038
 cries-0.037
 evil-0.036
 situation-0.035
Token Jack
Token ID3619
Feature activation-1.62e+00

Pos logit contributions
ipop+0.050
ip+0.049
burn+0.046
 sunscreen+0.046
 Sn+0.044
Neg logit contributions
 noise-0.043
 cries-0.040
 awful-0.040
 evil-0.038
 truth-0.038
Token was
Token ID373
Feature activation-1.42e+00

Pos logit contributions
 noise+0.038
 awful+0.037
 odd+0.035
 truth+0.035
 evil+0.034
Neg logit contributions
ip-0.030
ipop-0.028
burn-0.028
 sunscreen-0.026
 Sn-0.026
Token very
Token ID845
Feature activation-1.62e+00

Pos logit contributions
 noise+0.030
 awful+0.027
 cries+0.027
 evil+0.026
 unfamiliar+0.026
Neg logit contributions
ip-0.031
ipop-0.031
burn-0.030
 sunscreen-0.029
 Sn-0.028
Token confused
Token ID10416
Feature activation-6.20e+00

Pos logit contributions
 noise+0.027
 awful+0.024
 cries+0.024
 truth+0.023
 evil+0.023
Neg logit contributions
ipop-0.042
ip-0.042
burn-0.040
 sunscreen-0.040
 Sn-0.038
Token;
Token ID26
Feature activation-1.93e+00

Pos logit contributions
 awful+0.112
 comm+0.107
 evil+0.105
 situation+0.104
 cries+0.103
Neg logit contributions
ipop-0.131
ip-0.130
 chocolate-0.121
burn-0.120
 breakfast-0.119
Token he
Token ID339
Feature activation-2.02e+00

Pos logit contributions
 noise+0.023
 unfamiliar+0.016
 evil+0.016
 awful+0.016
 cries+0.015
Neg logit contributions
ip-0.066
ipop-0.065
burn-0.065
uggets-0.061
ugg-0.061
Token could
Token ID714
Feature activation-1.52e+00

Pos logit contributions
 noise+0.039
 awful+0.037
 truth+0.036
 situation+0.034
 cries+0.034
Neg logit contributions
ip-0.041
ipop-0.041
burn-0.039
 Sn-0.039
bie-0.037
Token smell
Token ID8508
Feature activation-3.62e+00

Pos logit contributions
 awful+0.031
 comm+0.030
umbling+0.029
 noise+0.028
 odd+0.027
Neg logit contributions
 wife-0.026
 sunscreen-0.026
burn-0.026
 yes-0.025
 haircut-0.025
Token something
Token ID1223
Feature activation-3.48e+00

Pos logit contributions
 noise+0.065
 cries+0.062
 situation+0.061
urious+0.060
 awful+0.060
Neg logit contributions
ip-0.081
 sunscreen-0.081
ipop-0.081
burn-0.078
 Sn-0.075
Token the
Token ID262
Feature activation-1.84e+00

Pos logit contributions
ip+0.033
uggets+0.033
ipop+0.032
ugg+0.032
rav+0.031
Neg logit contributions
 noise-0.011
 strange-0.008
 scared-0.007
 unknown-0.007
 unfamiliar-0.007
Token store
Token ID3650
Feature activation-2.28e+00

Pos logit contributions
 noise+0.033
 truth+0.028
 situation+0.025
 unknown+0.025
 voice+0.023
Neg logit contributions
ip-0.057
ipop-0.053
burn-0.051
uggets-0.051
Whe-0.049
Tokenkeeper
Token ID13884
Feature activation-2.46e+00

Pos logit contributions
 noise+0.040
 awful+0.035
 cries+0.035
 evil+0.033
 truth+0.033
Neg logit contributions
ipop-0.059
ip-0.059
burn-0.057
 sunscreen-0.056
Whe-0.054
Token was
Token ID373
Feature activation-3.15e+00

Pos logit contributions
 noise+0.043
 awful+0.037
 cries+0.037
 evil+0.036
 unfamiliar+0.035
Neg logit contributions
ipop-0.064
ip-0.064
burn-0.062
 sunscreen-0.060
 Sn-0.058
Token very
Token ID845
Feature activation-2.62e+00

Pos logit contributions
 noise+0.050
 cries+0.044
 awful+0.044
 evil+0.042
 situation+0.042
Neg logit contributions
ipop-0.083
ip-0.083
burn-0.082
 sunscreen-0.078
 Sn-0.078
Token ignorant
Token ID20109
Feature activation-6.19e+00

Pos logit contributions
 noise+0.034
 awful+0.028
 cries+0.028
 evil+0.026
 unfamiliar+0.026
Neg logit contributions
ipop-0.080
ip-0.079
burn-0.077
 sunscreen-0.075
 Sn-0.074
Token.
Token ID13
Feature activation-2.56e+00

Pos logit contributions
 noise+0.079
 awful+0.073
 evil+0.072
 unfamiliar+0.068
 cries+0.068
Neg logit contributions
ip-0.177
ipop-0.176
burn-0.169
 Sn-0.163
coat-0.160
Token He
Token ID679
Feature activation-3.05e+00

Pos logit contributions
 noise+0.030
 cries+0.023
 awful+0.022
 truth+0.021
 unfamiliar+0.020
Neg logit contributions
ipop-0.083
ip-0.083
burn-0.082
 sunscreen-0.079
rav-0.076
Token said
Token ID531
Feature activation+1.82e-01

Pos logit contributions
 noise+0.049
 awful+0.042
 cries+0.042
 evil+0.040
 unfamiliar+0.039
Neg logit contributions
ipop-0.084
ip-0.083
burn-0.082
 sunscreen-0.079
 Sn-0.077
Token he
Token ID339
Feature activation-1.61e+00

Pos logit contributions
burn+0.006
Whe+0.006
coat+0.006
ip+0.006
ipop+0.006
Neg logit contributions
 noise-0.002
 awful-0.002
 cries-0.002
 evil-0.002
 truth-0.002
Token couldn
Token ID3521
Feature activation-2.98e+00

Pos logit contributions
 noise+0.024
 awful+0.021
 cries+0.021
 evil+0.020
 unfamiliar+0.019
Neg logit contributions
ipop-0.046
ip-0.045
burn-0.045
 sunscreen-0.043
 Sn-0.042
Token One
Token ID1881
Feature activation+3.43e-01

Pos logit contributions
 noise+0.027
 evil+0.026
 awful+0.026
 cries+0.025
 comm+0.024
Neg logit contributions
ip-0.029
ipop-0.029
 sunscreen-0.028
Whe-0.027
 Sn-0.027
Token day
Token ID1110
Feature activation+2.67e+00

Pos logit contributions
ip+0.008
ipop+0.008
burn+0.008
 sunscreen+0.007
 Sn+0.007
Neg logit contributions
 noise-0.007
 awful-0.006
 cries-0.006
 evil-0.006
 unfamiliar-0.006
Token,
Token ID11
Feature activation+1.72e+00

Pos logit contributions
ipop+0.066
ip+0.066
burn+0.064
 sunscreen+0.062
 Sn+0.060
Neg logit contributions
 noise-0.050
 awful-0.044
 cries-0.044
 evil-0.043
 unfamiliar-0.042
Token he
Token ID339
Feature activation-1.37e+00

Pos logit contributions
ipop+0.039
ip+0.038
 sunscreen+0.037
burn+0.034
uggets+0.034
Neg logit contributions
 noise-0.034
 awful-0.033
 cries-0.032
 situation-0.031
 evil-0.031
Token was
Token ID373
Feature activation-1.31e+00

Pos logit contributions
 noise+0.026
 awful+0.023
 cries+0.023
 evil+0.022
 unfamiliar+0.022
Neg logit contributions
ipop-0.034
ip-0.034
burn-0.033
 sunscreen-0.032
 Sn-0.031
Token faced
Token ID7452
Feature activation-6.15e+00

Pos logit contributions
 noise+0.025
 evil+0.021
 cries+0.021
 awful+0.020
 unfamiliar+0.020
Neg logit contributions
ipop-0.034
ip-0.033
burn-0.033
 sunscreen-0.031
rav-0.031
Token with
Token ID351
Feature activation-1.14e+00

Pos logit contributions
 noise+0.121
 awful+0.113
 cries+0.110
 evil+0.106
 unfamiliar+0.104
Neg logit contributions
ip-0.139
ipop-0.138
burn-0.134
 sunscreen-0.129
uggets-0.127
Token an
Token ID281
Feature activation-1.54e+00

Pos logit contributions
 noise+0.021
 awful+0.019
 cries+0.019
 truth+0.018
 comm+0.018
Neg logit contributions
ipop-0.027
ip-0.027
burn-0.026
 sunscreen-0.026
 Sn-0.024
Token impressive
Token ID8036
Feature activation-2.59e-01

Pos logit contributions
 noise+0.023
 cries+0.022
 comm+0.020
 truth+0.020
 awful+0.020
Neg logit contributions
ip-0.038
ipop-0.038
 sunscreen-0.036
burn-0.035
 saddle-0.035
Token puzzle
Token ID15027
Feature activation-2.05e+00

Pos logit contributions
 noise+0.005
 awful+0.005
 cries+0.005
 evil+0.005
 unfamiliar+0.005
Neg logit contributions
ipop-0.006
ip-0.006
burn-0.006
 sunscreen-0.005
 Sn-0.005
Token.
Token ID13
Feature activation-1.78e+00

Pos logit contributions
 cries+0.031
 noise+0.030
 awful+0.030
urious+0.029
 comm+0.028
Neg logit contributions
ipop-0.052
ip-0.052
 harness-0.047
belt-0.047
 saddle-0.047
Token arm
Token ID3211
Feature activation-1.75e+00

Pos logit contributions
ipop+0.013
ip+0.013
 sunscreen+0.012
burn+0.012
 harness+0.011
Neg logit contributions
 awful-0.016
 noise-0.015
 unfamiliar-0.015
 truth-0.015
urious-0.015
Token but
Token ID475
Feature activation+2.37e-01

Pos logit contributions
 noise+0.028
 awful+0.027
 comm+0.027
 evil+0.026
 truth+0.025
Neg logit contributions
ip-0.045
ipop-0.044
burn-0.042
coat-0.041
 Sn-0.040
Token Tim
Token ID5045
Feature activation-6.13e-01

Pos logit contributions
ip+0.008
uggets+0.008
burn+0.008
ugg+0.008
ipop+0.008
Neg logit contributions
 noise-0.003
 awful-0.002
 scared-0.002
 evil-0.002
 unknown-0.002
Tokenmy
Token ID1820
Feature activation-2.48e+00

Pos logit contributions
 noise+0.016
 awful+0.014
 cries+0.014
 evil+0.014
 unfamiliar+0.014
Neg logit contributions
ipop-0.011
ip-0.011
burn-0.011
 sunscreen-0.010
 Sn-0.010
Token was
Token ID373
Feature activation-2.46e+00

Pos logit contributions
 noise+0.044
 awful+0.039
 evil+0.038
 unfamiliar+0.038
 situation+0.037
Neg logit contributions
ip-0.064
ipop-0.064
burn-0.059
 Sn-0.056
coat-0.055
Token confused
Token ID10416
Feature activation-6.12e+00

Pos logit contributions
 noise+0.032
 cries+0.029
 situation+0.028
 awful+0.027
 truth+0.026
Neg logit contributions
ipop-0.071
ip-0.071
burn-0.070
 Sn-0.068
 sunscreen-0.067
Token and
Token ID290
Feature activation-1.92e+00

Pos logit contributions
 evil+0.111
 comm+0.110
 destruction+0.109
 screams+0.108
 situation+0.105
Neg logit contributions
ip-0.138
ipop-0.135
burn-0.124
 chocolate-0.122
 breakfast-0.114
Token didn
Token ID1422
Feature activation-3.40e+00

Pos logit contributions
 noise+0.031
 awful+0.027
 cries+0.027
 truth+0.027
 situation+0.026
Neg logit contributions
ipop-0.051
ip-0.051
burn-0.049
 sunscreen-0.047
 Sn-0.046
Token't
Token ID470
Feature activation-2.47e+00

Pos logit contributions
 noise+0.069
 wonders+0.067
 comm+0.067
urious+0.065
 unfamiliar+0.065
Neg logit contributions
ip-0.073
 Sn-0.071
burn-0.070
ipop-0.066
Whe-0.062
Token know
Token ID760
Feature activation-2.00e+00

Pos logit contributions
 noise+0.042
 awful+0.040
 cries+0.039
 evil+0.037
 comm+0.037
Neg logit contributions
ipop-0.062
burn-0.059
ip-0.059
 sunscreen-0.056
Whe-0.054
Token what
Token ID644
Feature activation-8.53e-01

Pos logit contributions
 noise+0.044
 awful+0.039
 cries+0.039
 evil+0.038
 unfamiliar+0.038
Neg logit contributions
ipop-0.043
ip-0.043
burn-0.041
 sunscreen-0.039
 Sn-0.038
Token.
Token ID13
Feature activation-1.62e+00

Pos logit contributions
 noise+0.025
 cries+0.022
 awful+0.021
 truth+0.021
 comm+0.020
Neg logit contributions
ip-0.058
ipop-0.058
burn-0.057
 sunscreen-0.056
 Sn-0.055
Token The
Token ID383
Feature activation-1.26e+00

Pos logit contributions
 noise+0.027
 awful+0.024
 cries+0.023
 evil+0.023
 unfamiliar+0.022
Neg logit contributions
ipop-0.043
ip-0.043
burn-0.042
 sunscreen-0.040
 Sn-0.039
Token rabbit
Token ID22746
Feature activation-2.61e+00

Pos logit contributions
 noise+0.027
 awful+0.023
 evil+0.023
 cries+0.023
 situation+0.022
Neg logit contributions
ipop-0.030
ip-0.030
burn-0.029
 sunscreen-0.027
uggets-0.027
Token was
Token ID373
Feature activation-1.78e+00

Pos logit contributions
 noise+0.047
 truth+0.044
 unfamiliar+0.044
 awful+0.044
 odd+0.043
Neg logit contributions
ip-0.059
ipop-0.059
burn-0.054
 Sn-0.052
coat-0.051
Token very
Token ID845
Feature activation-1.59e+00

Pos logit contributions
urious+0.031
 situation+0.028
 noise+0.028
 truth+0.028
 wonders+0.026
Neg logit contributions
burn-0.040
 Sn-0.039
ipop-0.039
ip-0.038
 sunscreen-0.038
Token confused
Token ID10416
Feature activation-6.12e+00

Pos logit contributions
 truth+0.028
 noise+0.027
 situation+0.027
urious+0.027
 abrupt+0.025
Neg logit contributions
ipop-0.035
ip-0.035
uggets-0.031
 sunscreen-0.031
 Sn-0.030
Token.
Token ID13
Feature activation-2.23e+00

Pos logit contributions
 awful+0.118
 evil+0.117
 comm+0.117
 urgent+0.114
 cries+0.113
Neg logit contributions
ip-0.112
ipop-0.108
burn-0.102
 Sn-0.100
 chocolate-0.097
Token He
Token ID679
Feature activation-2.28e+00

Pos logit contributions
 noise+0.036
 awful+0.033
 evil+0.033
 harm+0.032
 unfamiliar+0.031
Neg logit contributions
ip-0.058
ipop-0.058
 Sn-0.056
 sunscreen-0.054
Whe-0.053
Token hopped
Token ID45230
Feature activation+1.74e+00

Pos logit contributions
 noise+0.042
 awful+0.039
 truth+0.038
 situation+0.038
 cries+0.037
Neg logit contributions
ipop-0.051
ip-0.050
burn-0.047
 sunscreen-0.047
 Sn-0.047
Token around
Token ID1088
Feature activation-1.62e+00

Pos logit contributions
ipop+0.041
ip+0.040
burn+0.039
 sunscreen+0.038
 Sn+0.037
Neg logit contributions
 noise-0.035
 awful-0.031
 cries-0.031
 evil-0.030
 unfamiliar-0.030
Token looking
Token ID2045
Feature activation+1.72e-01

Pos logit contributions
 noise+0.029
 awful+0.026
 cries+0.025
 evil+0.025
 unfamiliar+0.024
Neg logit contributions
ipop-0.040
ip-0.040
burn-0.039
 sunscreen-0.038
 Sn-0.037
Token dad
Token ID9955
Feature activation-1.04e-01

Pos logit contributions
ipop+0.038
ip+0.037
burn+0.036
 sunscreen+0.034
 Sn+0.033
Neg logit contributions
 noise-0.054
 awful-0.050
 cries-0.049
 evil-0.048
 unfamiliar-0.048
Token saw
Token ID2497
Feature activation-1.85e+00

Pos logit contributions
 noise+0.002
 awful+0.002
 cries+0.002
 evil+0.002
 unfamiliar+0.002
Neg logit contributions
ipop-0.003
ip-0.003
burn-0.002
 sunscreen-0.002
 Sn-0.002
Token that
Token ID326
Feature activation-1.50e+00

Pos logit contributions
 noise+0.038
 cries+0.035
 evil+0.035
 awful+0.035
 situation+0.034
Neg logit contributions
ip-0.040
ipop-0.039
 sunscreen-0.039
burn-0.038
 Sn-0.036
Token he
Token ID339
Feature activation-2.18e+00

Pos logit contributions
 noise+0.027
 awful+0.022
 truth+0.021
 unfamiliar+0.021
 unknown+0.021
Neg logit contributions
ipop-0.042
ip-0.042
burn-0.040
ugg-0.039
uggets-0.038
Token was
Token ID373
Feature activation-1.62e+00

Pos logit contributions
 noise+0.041
 truth+0.036
 awful+0.036
 cries+0.036
 evil+0.035
Neg logit contributions
ip-0.050
ipop-0.049
burn-0.047
 Sn-0.046
 sunscreen-0.043
Token confused
Token ID10416
Feature activation-6.11e+00

Pos logit contributions
 noise+0.024
 cries+0.023
 situation+0.022
urious+0.022
 awful+0.022
Neg logit contributions
burn-0.041
 sunscreen-0.040
ip-0.040
 Sn-0.040
ipop-0.039
Token and
Token ID290
Feature activation-1.68e+00

Pos logit contributions
 evil+0.099
 awful+0.098
 screams+0.097
 comm+0.097
 cries+0.097
Neg logit contributions
burn-0.135
ipop-0.134
ip-0.132
 Sn-0.124
 chocolate-0.124
Token they
Token ID484
Feature activation-1.43e+00

Pos logit contributions
 noise+0.027
 truth+0.024
 situation+0.024
 cries+0.024
 awful+0.024
Neg logit contributions
ip-0.043
ipop-0.043
burn-0.041
 sunscreen-0.041
 Sn-0.040
Token sat
Token ID3332
Feature activation-5.57e-01

Pos logit contributions
 noise+0.026
 awful+0.023
 cries+0.023
 evil+0.023
 situation+0.022
Neg logit contributions
ipop-0.035
ip-0.034
burn-0.034
 sunscreen-0.032
 Sn-0.032
Token down
Token ID866
Feature activation+7.14e-02

Pos logit contributions
 noise+0.013
 cries+0.011
 awful+0.011
 unfamiliar+0.011
 wonders+0.011
Neg logit contributions
ipop-0.012
ip-0.011
burn-0.011
Whe-0.011
 sunscreen-0.011
Token with
Token ID351
Feature activation-5.45e-01

Pos logit contributions
ipop+0.002
ip+0.002
burn+0.002
 sunscreen+0.002
 Sn+0.002
Neg logit contributions
 noise-0.001
 awful-0.001
 cries-0.001
 evil-0.001
 unfamiliar-0.001
Token woods
Token ID16479
Feature activation-2.17e+00

Pos logit contributions
 noise+0.034
 cries+0.031
 awful+0.030
 evil+0.028
 comm+0.028
Neg logit contributions
burn-0.059
ipop-0.059
ip-0.059
 sunscreen-0.059
 Sn-0.057
Token and
Token ID290
Feature activation-1.92e+00

Pos logit contributions
 awful+0.039
 noise+0.039
 comm+0.038
 cries+0.036
 evil+0.036
Neg logit contributions
coat-0.044
ip-0.044
 Sn-0.042
ipop-0.042
burn-0.041
Token she
Token ID673
Feature activation-2.85e+00

Pos logit contributions
 noise+0.035
 awful+0.033
 truth+0.032
 urgent+0.032
urious+0.032
Neg logit contributions
ipop-0.044
 Sn-0.041
ip-0.041
 sunscreen-0.039
burn-0.039
Token was
Token ID373
Feature activation-1.99e+00

Pos logit contributions
 noise+0.055
 awful+0.054
 odd+0.051
 truth+0.051
 unfamiliar+0.051
Neg logit contributions
 Sn-0.058
ip-0.057
ipop-0.056
burn-0.052
bie-0.050
Token very
Token ID845
Feature activation-2.01e+00

Pos logit contributions
urious+0.035
 noise+0.035
 awful+0.033
 situation+0.032
 truth+0.032
Neg logit contributions
 Sn-0.044
burn-0.041
ip-0.039
ipop-0.039
 sunscreen-0.038
Token confused
Token ID10416
Feature activation-6.10e+00

Pos logit contributions
 noise+0.032
 truth+0.031
urious+0.031
 situation+0.030
 awful+0.030
Neg logit contributions
ipop-0.045
ip-0.044
 Sn-0.044
 sunscreen-0.041
uggets-0.041
Token.
Token ID13
Feature activation-2.07e+00

Pos logit contributions
 comm+0.102
 awful+0.100
 situation+0.099
 cries+0.099
 noise+0.098
Neg logit contributions
burn-0.127
ipop-0.127
 Sn-0.127
ip-0.126
 breakfast-0.121
Token She
Token ID1375
Feature activation-2.93e+00

Pos logit contributions
 noise+0.034
 awful+0.029
 cries+0.029
 evil+0.028
 unfamiliar+0.027
Neg logit contributions
ipop-0.056
ip-0.056
burn-0.055
 sunscreen-0.053
 Sn-0.051
Token didn
Token ID1422
Feature activation-2.89e+00

Pos logit contributions
 noise+0.056
 awful+0.053
 truth+0.051
umbling+0.049
 harm+0.049
Neg logit contributions
ip-0.062
ipop-0.061
 Sn-0.059
 sunscreen-0.056
burn-0.056
Token't
Token ID470
Feature activation-2.79e+00

Pos logit contributions
 noise+0.060
urious+0.058
 wonders+0.057
 comm+0.057
 unfamiliar+0.056
Neg logit contributions
ip-0.060
 Sn-0.059
burn-0.058
ipop-0.053
 Eleanor-0.051
Token know
Token ID760
Feature activation-1.81e+00

Pos logit contributions
 awful+0.049
 noise+0.048
 cries+0.048
 comm+0.046
urious+0.046
Neg logit contributions
ipop-0.060
burn-0.059
 seeker-0.058
ip-0.058
 sunscreen-0.057
Token however
Token ID2158
Feature activation+5.64e-01

Pos logit contributions
ip+0.052
ipop+0.050
burn+0.049
coat+0.048
aj+0.048
Neg logit contributions
 noise-0.018
 scared-0.013
 strange-0.011
 unknown-0.011
 scary-0.010
Token,
Token ID11
Feature activation+8.80e-01

Pos logit contributions
burn+0.019
ipop+0.019
Whe+0.019
ip+0.018
coat+0.018
Neg logit contributions
 noise-0.006
 awful-0.005
 unknown-0.004
 evil-0.004
 scared-0.004
Token the
Token ID262
Feature activation-2.29e+00

Pos logit contributions
ip+0.031
ipop+0.030
ugg+0.029
burn+0.029
pp+0.028
Neg logit contributions
 noise-0.012
 scared-0.008
 strange-0.008
 unknown-0.007
 scary-0.007
Token ducks
Token ID39694
Feature activation-3.28e+00

Pos logit contributions
 noise+0.049
 awful+0.046
 cries+0.045
 evil+0.044
 unfamiliar+0.044
Neg logit contributions
ipop-0.047
burn-0.047
ip-0.047
 sunscreen-0.046
 Sn-0.042
Token were
Token ID547
Feature activation-2.14e+00

Pos logit contributions
 noise+0.070
 awful+0.067
 odd+0.066
 evil+0.065
 unfamiliar+0.065
Neg logit contributions
ip-0.068
burn-0.066
ipop-0.066
 sunscreen-0.060
coat-0.056
Token faced
Token ID7452
Feature activation-6.08e+00

Pos logit contributions
 noise+0.043
 awful+0.040
 truth+0.039
 cries+0.039
 situation+0.039
Neg logit contributions
burn-0.048
ip-0.046
 sunscreen-0.044
ipop-0.043
 Sn-0.042
Token with
Token ID351
Feature activation-1.78e+00

Pos logit contributions
 noise+0.122
 awful+0.114
 cries+0.109
 evil+0.107
 truth+0.107
Neg logit contributions
ip-0.134
burn-0.130
ipop-0.128
uggets-0.123
 Sn-0.122
Token a
Token ID257
Feature activation-1.48e+00

Pos logit contributions
 noise+0.035
 awful+0.032
 cries+0.032
 evil+0.030
 truth+0.030
Neg logit contributions
ip-0.040
ipop-0.040
burn-0.040
 sunscreen-0.039
 Sn-0.037
Token threat
Token ID2372
Feature activation-3.47e+00

Pos logit contributions
 noise+0.022
 awful+0.019
 cries+0.019
 evil+0.018
 unfamiliar+0.018
Neg logit contributions
ipop-0.041
ip-0.041
burn-0.041
 sunscreen-0.039
 Sn-0.038
Token.
Token ID13
Feature activation-1.69e+00

Pos logit contributions
 noise+0.054
 awful+0.054
 odd+0.049
 unfamiliar+0.049
 comm+0.048
Neg logit contributions
ipop-0.088
burn-0.086
ip-0.085
 Sn-0.080
 sunscreen-0.079
Token A
Token ID317
Feature activation-1.70e+00

Pos logit contributions
 noise+0.036
 awful+0.033
 evil+0.033
 cries+0.032
 unfamiliar+0.032
Neg logit contributions
ip-0.036
ipop-0.035
burn-0.034
 sunscreen-0.034
 Sn-0.034
Token.
Token ID13
Feature activation-1.71e+00

Pos logit contributions
 noise+0.034
 cries+0.030
 evil+0.030
 awful+0.029
 unfamiliar+0.029
Neg logit contributions
ip-0.069
ipop-0.068
burn-0.066
 Sn-0.063
 sunscreen-0.063
Token Poor
Token ID23676
Feature activation-4.95e-01

Pos logit contributions
 noise+0.041
 awful+0.038
 cries+0.038
 evil+0.038
 unfamiliar+0.037
Neg logit contributions
ip-0.032
ipop-0.031
 Sn-0.029
burn-0.028
 sunscreen-0.028
Token Benny
Token ID44275
Feature activation-1.85e+00

Pos logit contributions
 noise+0.012
 awful+0.011
 cries+0.011
 evil+0.011
 unfamiliar+0.011
Neg logit contributions
ipop-0.009
ip-0.009
burn-0.009
 sunscreen-0.008
 Sn-0.008
Token was
Token ID373
Feature activation-2.02e+00

Pos logit contributions
 noise+0.035
 unfamiliar+0.033
 odd+0.032
 awful+0.032
 urgent+0.031
Neg logit contributions
ip-0.045
ipop-0.042
burn-0.039
 Sn-0.037
coat-0.036
Token too
Token ID1165
Feature activation-1.28e+00

Pos logit contributions
 noise+0.031
 cries+0.028
urious+0.027
 situation+0.026
 truth+0.025
Neg logit contributions
ipop-0.055
ip-0.055
burn-0.054
 sunscreen-0.052
 Sn-0.051
Token excited
Token ID6568
Feature activation-6.05e+00

Pos logit contributions
 noise+0.023
 awful+0.021
 situation+0.020
 cries+0.020
 evil+0.020
Neg logit contributions
ip-0.033
ipop-0.031
burn-0.030
 sunscreen-0.028
uggets-0.028
Token and
Token ID290
Feature activation-1.04e+00

Pos logit contributions
 urgent+0.156
 awful+0.155
 evil+0.154
 comm+0.147
 unfamiliar+0.147
Neg logit contributions
ipop-0.084
ip-0.079
burn-0.075
 chocolate-0.075
 breakfast-0.073
Token tore
Token ID26197
Feature activation-1.04e+00

Pos logit contributions
 noise+0.019
 truth+0.018
 situation+0.018
 awful+0.018
 unfamiliar+0.017
Neg logit contributions
ipop-0.024
ip-0.024
 sunscreen-0.023
burn-0.022
 Sn-0.021
Token too
Token ID1165
Feature activation-1.18e+00

Pos logit contributions
 noise+0.019
 awful+0.017
 cries+0.017
 evil+0.017
 unfamiliar+0.017
Neg logit contributions
ip-0.025
ipop-0.025
burn-0.024
 sunscreen-0.023
 Sn-0.023
Token hard
Token ID1327
Feature activation+2.76e-01

Pos logit contributions
 noise+0.024
 comm+0.022
 evil+0.022
 awful+0.022
 situation+0.021
Neg logit contributions
ip-0.027
ipop-0.024
burn-0.023
 Sn-0.023
 sunscreen-0.022
Token.
Token ID13
Feature activation-1.35e+00

Pos logit contributions
ip+0.008
ipop+0.007
burn+0.007
 Sn+0.006
uggets+0.006
Neg logit contributions
 noise-0.004
urious-0.004
 evil-0.004
 awful-0.004
 unfamiliar-0.004
Token no
Token ID645
Feature activation+3.61e-01

Pos logit contributions
 noise+0.054
 awful+0.048
 evil+0.046
 cries+0.046
 truth+0.045
Neg logit contributions
ip-0.097
ipop-0.096
burn-0.095
Whe-0.089
rav-0.089
Token doll
Token ID3654
Feature activation+1.16e+00

Pos logit contributions
ip+0.008
ipop+0.008
burn+0.008
ugg+0.007
 sunscreen+0.007
Neg logit contributions
 noise-0.008
 cries-0.007
 awful-0.007
 evil-0.007
 truth-0.007
Token.
Token ID13
Feature activation-9.77e-01

Pos logit contributions
ipop+0.033
ip+0.033
burn+0.032
 sunscreen+0.032
Whe+0.031
Neg logit contributions
 noise-0.017
 cries-0.015
 awful-0.015
 evil-0.014
 situation-0.013
Token Sara
Token ID24799
Feature activation-1.89e+00

Pos logit contributions
 noise+0.020
 cries+0.018
 awful+0.018
 truth+0.018
 evil+0.017
Neg logit contributions
burn-0.023
ipop-0.022
ip-0.022
 sunscreen-0.021
coat-0.020
Token was
Token ID373
Feature activation-1.52e+00

Pos logit contributions
 noise+0.033
 awful+0.029
 evil+0.027
 cries+0.027
 unfamiliar+0.027
Neg logit contributions
ip-0.050
ipop-0.050
burn-0.047
 sunscreen-0.045
 Sn-0.045
Token confused
Token ID10416
Feature activation-6.04e+00

Pos logit contributions
 noise+0.024
 situation+0.021
 cries+0.020
 awful+0.020
 truth+0.020
Neg logit contributions
ipop-0.040
ip-0.039
burn-0.038
 sunscreen-0.037
 Sn-0.037
Token and
Token ID290
Feature activation-9.11e-01

Pos logit contributions
 comm+0.120
 awful+0.119
 evil+0.118
 urgent+0.118
 destruction+0.114
Neg logit contributions
ipop-0.113
ip-0.108
burn-0.107
 chocolate-0.104
 sunscreen-0.099
Token uncomfortable
Token ID12916
Feature activation-2.78e+00

Pos logit contributions
 noise+0.012
 situation+0.011
 truth+0.011
 awful+0.011
 odd+0.010
Neg logit contributions
ipop-0.025
ip-0.025
 sunscreen-0.024
burn-0.023
 Sn-0.023
Token.
Token ID13
Feature activation-1.57e+00

Pos logit contributions
 noise+0.030
 awful+0.026
 evil+0.025
 cries+0.025
 unfamiliar+0.024
Neg logit contributions
ipop-0.086
ip-0.084
burn-0.084
 Sn-0.081
 sunscreen-0.080
Token She
Token ID1375
Feature activation-1.57e+00

Pos logit contributions
 noise+0.025
 awful+0.022
 cries+0.021
 evil+0.021
 unfamiliar+0.021
Neg logit contributions
ipop-0.043
ip-0.043
burn-0.042
 sunscreen-0.040
 Sn-0.039
Token did
Token ID750
Feature activation-2.42e+00

Pos logit contributions
 noise+0.029
 awful+0.026
 situation+0.026
 unfamiliar+0.025
 truth+0.025
Neg logit contributions
ipop-0.037
ip-0.037
burn-0.034
 sunscreen-0.034
 Sn-0.034
Token bright
Token ID6016
Feature activation-4.23e-01

Pos logit contributions
 noise+0.011
 awful+0.010
 cries+0.010
 unfamiliar+0.010
 evil+0.010
Neg logit contributions
ipop-0.015
ip-0.014
burn-0.014
 sunscreen-0.014
 Sn-0.013
Token flame
Token ID16506
Feature activation-1.67e+00

Pos logit contributions
 noise+0.008
 awful+0.007
 cries+0.007
 evil+0.007
 unfamiliar+0.007
Neg logit contributions
ipop-0.010
burn-0.010
ip-0.010
 sunscreen-0.009
 Sn-0.009
Token that
Token ID326
Feature activation-9.82e-01

Pos logit contributions
 noise+0.034
 awful+0.032
 cries+0.031
 unfamiliar+0.031
 evil+0.031
Neg logit contributions
ipop-0.037
ip-0.036
burn-0.036
 sunscreen-0.033
 Sn-0.033
Token makes
Token ID1838
Feature activation-3.46e+00

Pos logit contributions
 noise+0.015
 situation+0.015
 awful+0.015
 comm+0.014
 evil+0.014
Neg logit contributions
ipop-0.024
ip-0.024
 sunscreen-0.023
 Sn-0.022
 marsh-0.022
Token them
Token ID606
Feature activation-1.06e+00

Pos logit contributions
 noise+0.063
 awful+0.059
 situation+0.059
 cries+0.058
 comm+0.058
Neg logit contributions
ip-0.076
burn-0.076
ipop-0.076
 sunscreen-0.074
 Sn-0.072
Token curious
Token ID11040
Feature activation-6.04e+00

Pos logit contributions
 noise+0.019
 awful+0.017
 cries+0.017
 evil+0.017
 situation+0.017
Neg logit contributions
burn-0.025
ipop-0.025
ip-0.025
 sunscreen-0.024
 Sn-0.023
Token.
Token ID13
Feature activation-2.07e+00

Pos logit contributions
 noise+0.062
 awful+0.059
 evil+0.056
 cries+0.056
 comm+0.055
Neg logit contributions
ipop-0.178
burn-0.177
ip-0.175
 sunscreen-0.173
 Sn-0.170
Token They
Token ID1119
Feature activation-1.77e+00

Pos logit contributions
 noise+0.047
 harm+0.044
 awful+0.043
 evil+0.043
 unfamiliar+0.042
Neg logit contributions
ipop-0.041
ip-0.041
 sunscreen-0.038
 Sn-0.038
uggets-0.036
Token want
Token ID765
Feature activation-8.14e-01

Pos logit contributions
 noise+0.034
 awful+0.033
 situation+0.032
 evil+0.031
 comm+0.031
Neg logit contributions
 sunscreen-0.036
ipop-0.036
ip-0.035
burn-0.035
 steering-0.032
Token to
Token ID284
Feature activation-1.05e+00

Pos logit contributions
 noise+0.015
 awful+0.014
 comm+0.014
 evil+0.014
 cries+0.014
Neg logit contributions
ipop-0.018
 sunscreen-0.018
burn-0.018
ip-0.018
uggets-0.017
Token touch
Token ID3638
Feature activation-7.06e-01

Pos logit contributions
 awful+0.024
 noise+0.023
 comm+0.022
 cries+0.022
 evil+0.022
Neg logit contributions
 sunscreen-0.019
burn-0.019
ipop-0.018
ip-0.017
 harness-0.017
Token.
Token ID13
Feature activation-5.32e-01

Pos logit contributions
ipop+0.037
ip+0.036
burn+0.036
 sunscreen+0.035
Whe+0.034
Neg logit contributions
 noise-0.009
 awful-0.006
 evil-0.005
 truth-0.005
 cries-0.005
Token But
Token ID887
Feature activation+1.67e-01

Pos logit contributions
 noise+0.009
 cries+0.008
 truth+0.007
 odd+0.007
 wonders+0.007
Neg logit contributions
burn-0.015
ipop-0.015
ip-0.015
rav-0.014
pp-0.014
Token they
Token ID484
Feature activation-1.52e+00

Pos logit contributions
ip+0.005
coat+0.005
ipop+0.005
burn+0.005
Whe+0.005
Neg logit contributions
 noise-0.002
 cries-0.002
 awful-0.002
 unfamiliar-0.002
 evil-0.002
Token were
Token ID547
Feature activation-2.23e+00

Pos logit contributions
 noise+0.026
 awful+0.023
 cries+0.023
 evil+0.022
 unfamiliar+0.022
Neg logit contributions
ipop-0.039
ip-0.039
burn-0.039
 sunscreen-0.037
 Sn-0.036
Token too
Token ID1165
Feature activation-1.00e+00

Pos logit contributions
 noise+0.033
 cries+0.028
 awful+0.028
 situation+0.027
 evil+0.026
Neg logit contributions
ipop-0.063
ip-0.062
burn-0.061
 sunscreen-0.059
 Sn-0.058
Token excited
Token ID6568
Feature activation-6.03e+00

Pos logit contributions
 noise+0.019
 awful+0.018
 cries+0.017
 truth+0.017
 evil+0.017
Neg logit contributions
ip-0.023
ipop-0.023
burn-0.023
 sunscreen-0.022
 Sn-0.021
Token and
Token ID290
Feature activation-4.72e-01

Pos logit contributions
 awful+0.137
 evil+0.136
 urgent+0.129
 comm+0.126
 noise+0.125
Neg logit contributions
burn-0.105
ipop-0.101
ip-0.098
coat-0.097
 breakfast-0.094
Token curious
Token ID11040
Feature activation-4.18e+00

Pos logit contributions
 noise+0.008
 awful+0.007
 cries+0.007
 evil+0.007
 unfamiliar+0.007
Neg logit contributions
ipop-0.012
ip-0.012
burn-0.012
 sunscreen-0.011
 Sn-0.011
Token.
Token ID13
Feature activation-1.41e+00

Pos logit contributions
 noise+0.045
 awful+0.039
 cries+0.038
 evil+0.038
 unfamiliar+0.036
Neg logit contributions
ipop-0.131
ip-0.128
burn-0.126
 sunscreen-0.122
 Sn-0.121
Token They
Token ID1119
Feature activation-1.13e+00

Pos logit contributions
 noise+0.023
 cries+0.020
 truth+0.019
 wonders+0.018
 awful+0.018
Neg logit contributions
burn-0.041
ipop-0.041
ip-0.040
rav-0.036
coat-0.036
Token wanted
Token ID2227
Feature activation-1.12e+00

Pos logit contributions
 noise+0.019
 cries+0.017
 awful+0.017
 evil+0.016
 unfamiliar+0.016
Neg logit contributions
ip-0.030
ipop-0.030
burn-0.029
 sunscreen-0.028
Whe-0.027
Token looked
Token ID3114
Feature activation+1.74e+00

Pos logit contributions
 noise+0.008
 awful+0.007
 cries+0.006
 evil+0.006
 unfamiliar+0.006
Neg logit contributions
ipop-0.010
ip-0.010
burn-0.010
 sunscreen-0.010
 Sn-0.009
Token brighter
Token ID26083
Feature activation-6.69e-01

Pos logit contributions
ipop+0.041
ip+0.041
burn+0.040
 sunscreen+0.038
 Sn+0.037
Neg logit contributions
 noise-0.035
 awful-0.031
 cries-0.030
 evil-0.030
 unfamiliar-0.029
Token.
Token ID13
Feature activation+6.99e-01

Pos logit contributions
 noise+0.007
 cries+0.006
 awful+0.005
 evil+0.005
 unfamiliar+0.005
Neg logit contributions
ip-0.022
ipop-0.022
burn-0.021
 sunscreen-0.021
 Sn-0.020
Token From
Token ID3574
Feature activation-3.47e-01

Pos logit contributions
burn+0.019
ipop+0.019
 sunscreen+0.019
ip+0.019
coat+0.018
Neg logit contributions
 noise-0.010
 cries-0.010
 awful-0.009
 truth-0.009
 wonders-0.008
Token that
Token ID326
Feature activation+3.08e+00

Pos logit contributions
 noise+0.005
 awful+0.004
 cries+0.004
 unfamiliar+0.004
 evil+0.004
Neg logit contributions
ipop-0.010
burn-0.010
ip-0.010
rav-0.010
Whe-0.010
Token day
Token ID1110
Feature activation+5.78e+00

Pos logit contributions
ipop+0.067
ip+0.067
burn+0.065
 sunscreen+0.062
 Sn+0.060
Neg logit contributions
 noise-0.067
 awful-0.060
 cries-0.060
 evil-0.059
 unfamiliar-0.058
Token on
Token ID319
Feature activation+1.75e+00

Pos logit contributions
uggets+0.121
 sunscreen+0.118
ipop+0.117
burn+0.111
 apr+0.111
Neg logit contributions
 noise-0.120
 cries-0.111
 awful-0.107
 evil-0.106
 trouble-0.101
Token,
Token ID11
Feature activation+3.73e+00

Pos logit contributions
burn+0.055
uggets+0.054
ipop+0.054
Whe+0.053
 sunscreen+0.052
Neg logit contributions
 noise-0.021
 cries-0.017
 situation-0.015
 awful-0.015
 trouble-0.014
Token Lee
Token ID5741
Feature activation+8.17e-01

Pos logit contributions
ipop+0.093
coat+0.090
 sunscreen+0.090
burn+0.090
uggets+0.089
Neg logit contributions
 noise-0.065
 cries-0.057
 awful-0.053
 evil-0.053
 truth-0.053
Token always
Token ID1464
Feature activation+1.02e+00

Pos logit contributions
ipop+0.023
burn+0.022
Whe+0.022
 sunscreen+0.022
ip+0.021
Neg logit contributions
 noise-0.012
 cries-0.011
 awful-0.011
 truth-0.010
 wonders-0.010
Token remembered
Token ID12086
Feature activation-2.12e+00

Pos logit contributions
ipop+0.025
ip+0.025
burn+0.024
 sunscreen+0.023
Whe+0.023
Neg logit contributions
 noise-0.019
 cries-0.017
 awful-0.017
 wonders-0.016
 evil-0.016
Token all
Token ID477
Feature activation+1.74e+00

Pos logit contributions
ip+0.087
ipop+0.086
burn+0.084
 sunscreen+0.081
 Sn+0.079
Neg logit contributions
 noise-0.054
 awful-0.048
 cries-0.047
 evil-0.046
 unfamiliar-0.045
Token day
Token ID1110
Feature activation+3.23e+00

Pos logit contributions
ip+0.035
ipop+0.035
burn+0.034
 sunscreen+0.032
 Sn+0.031
Neg logit contributions
 noise-0.040
 cries-0.037
 awful-0.036
 evil-0.036
 truth-0.035
Token.
Token ID13
Feature activation+9.22e-01

Pos logit contributions
ip+0.083
ipop+0.075
burn+0.072
 sunscreen+0.071
 Sn+0.069
Neg logit contributions
 noise-0.060
 evil-0.055
 awful-0.054
 unfamiliar-0.053
urious-0.052
Token From
Token ID3574
Feature activation+4.11e-01

Pos logit contributions
ipop+0.026
burn+0.025
ip+0.025
 sunscreen+0.024
coat+0.024
Neg logit contributions
 noise-0.014
 cries-0.012
 awful-0.012
 truth-0.011
 evil-0.011
Token that
Token ID326
Feature activation+4.06e+00

Pos logit contributions
ipop+0.014
ip+0.014
burn+0.014
 sunscreen+0.013
 Sn+0.013
Neg logit contributions
 noise-0.004
 awful-0.003
 cries-0.003
 evil-0.003
 unfamiliar-0.003
Token day
Token ID1110
Feature activation+5.87e+00

Pos logit contributions
ipop+0.091
ip+0.090
burn+0.088
 sunscreen+0.084
 Sn+0.081
Neg logit contributions
 noise-0.086
 awful-0.077
 cries-0.076
 evil-0.075
 unfamiliar-0.074
Token on
Token ID319
Feature activation+2.29e+00

Pos logit contributions
ipop+0.109
uggets+0.104
 sunscreen+0.104
burn+0.102
Whe+0.098
Neg logit contributions
 noise-0.140
 cries-0.129
 awful-0.128
 evil-0.126
 situation-0.121
Token,
Token ID11
Feature activation+3.36e+00

Pos logit contributions
ipop+0.063
burn+0.061
ip+0.061
 sunscreen+0.059
uggets+0.058
Neg logit contributions
 noise-0.038
 cries-0.032
 awful-0.032
 evil-0.031
 situation-0.031
Token Max
Token ID5436
Feature activation-1.15e-01

Pos logit contributions
ipop+0.092
burn+0.086
ip+0.085
 sunscreen+0.083
uggets+0.083
Neg logit contributions
 noise-0.059
 cries-0.049
 awful-0.048
 embarrassing-0.047
 harm-0.047
Token and
Token ID290
Feature activation+3.64e+00

Pos logit contributions
 noise+0.002
 awful+0.001
 cries+0.001
 evil+0.001
 unfamiliar+0.001
Neg logit contributions
ipop-0.003
ip-0.003
burn-0.003
 sunscreen-0.003
 Sn-0.003
Token Lily
Token ID20037
Feature activation+1.39e+00

Pos logit contributions
ip+0.081
ipop+0.080
burn+0.079
 sunscreen+0.076
uggets+0.073
Neg logit contributions
 noise-0.081
 cries-0.072
 evil-0.071
 harm-0.070
 unfamiliar-0.069
Token!
Token ID0
Feature activation-3.78e-01

Pos logit contributions
 noise+0.000
 awful+0.000
 truth+0.000
 evil+0.000
 cries+0.000
Neg logit contributions
burn-0.000
ip-0.000
ipop-0.000
Whe-0.000
 sunscreen-0.000
Token Now
Token ID2735
Feature activation+1.01e+00

Pos logit contributions
 noise+0.005
 truth+0.005
 wonders+0.004
 cries+0.004
 Instead+0.004
Neg logit contributions
ipop-0.011
burn-0.011
ip-0.011
 sunscreen-0.011
coat-0.010
Token let
Token ID1309
Feature activation+1.74e+00

Pos logit contributions
ipop+0.029
ip+0.028
burn+0.028
uggets+0.028
ugg+0.026
Neg logit contributions
 noise-0.016
 cries-0.013
 wonders-0.012
 truth-0.012
 awful-0.012
Tokenâ
Token ID22940
Feature activation+1.37e+00

Pos logit contributions
ipop+0.054
ip+0.053
burn+0.052
 sunscreen+0.051
uggets+0.050
Neg logit contributions
 noise-0.023
 awful-0.018
 cries-0.018
 truth-0.017
 evil-0.017
Token€
Token ID26391
Feature activation+3.39e+00

Pos logit contributions
ipop+0.029
ip+0.029
burn+0.028
 sunscreen+0.027
 Sn+0.026
Neg logit contributions
 noise-0.030
 awful-0.027
 cries-0.027
 evil-0.026
 unfamiliar-0.026
Token™
Token ID8151
Feature activation+5.93e+00

Pos logit contributions
ipop+0.077
burn+0.074
ip+0.072
 sunscreen+0.072
 Sn+0.068
Neg logit contributions
 noise-0.067
 cries-0.061
 awful-0.058
 situation-0.057
 truth-0.057
Tokens
Token ID82
Feature activation+1.33e+00

Pos logit contributions
 sunscreen+0.167
burn+0.163
 apr+0.161
ipop+0.160
Whe+0.158
Neg logit contributions
 noise-0.074
 cries-0.068
 awful-0.065
 situation-0.060
 truth-0.060
Token make
Token ID787
Feature activation+1.23e-01

Pos logit contributions
ipop+0.033
ip+0.033
burn+0.031
 Sn+0.030
 sunscreen+0.030
Neg logit contributions
 noise-0.026
 cries-0.022
 awful-0.021
 truth-0.021
 situation-0.021
Token a
Token ID257
Feature activation+1.70e+00

Pos logit contributions
ip+0.003
ipop+0.003
burn+0.003
 sunscreen+0.003
 Sn+0.003
Neg logit contributions
 noise-0.002
 awful-0.002
 cries-0.002
 evil-0.002
 unfamiliar-0.002
Token delicious
Token ID12625
Feature activation+1.80e+00

Pos logit contributions
ipop+0.042
burn+0.042
ip+0.042
Whe+0.040
 sunscreen+0.039
Neg logit contributions
 noise-0.034
 awful-0.029
 cries-0.028
 evil-0.028
 situation-0.027
Token meal
Token ID9799
Feature activation+1.46e+00

Pos logit contributions
ipop+0.032
ip+0.032
burn+0.031
 sunscreen+0.029
 Sn+0.028
Neg logit contributions
 noise-0.047
 awful-0.042
 cries-0.042
 evil-0.041
 situation-0.040
Token and
Token ID290
Feature activation+1.56e+00

Pos logit contributions
ipop+0.082
ip+0.081
burn+0.080
Whe+0.077
 sunscreen+0.077
Neg logit contributions
 noise-0.042
 cries-0.035
 awful-0.035
 unfamiliar-0.033
 situation-0.033
Token gave
Token ID2921
Feature activation+5.18e-01

Pos logit contributions
burn+0.046
ip+0.043
rav+0.043
Whe+0.043
ipop+0.043
Neg logit contributions
 noise-0.024
 cries-0.019
 awful-0.018
 comm-0.018
 wonders-0.017
Token Jack
Token ID3619
Feature activation+6.34e-01

Pos logit contributions
ipop+0.013
ip+0.013
burn+0.013
 sunscreen+0.012
 Sn+0.012
Neg logit contributions
 noise-0.009
 awful-0.008
 cries-0.008
 evil-0.008
 unfamiliar-0.008
Token a
Token ID257
Feature activation+2.04e+00

Pos logit contributions
burn+0.018
ipop+0.018
ip+0.017
Whe+0.017
coat+0.017
Neg logit contributions
 noise-0.009
 cries-0.008
 awful-0.008
 unfamiliar-0.007
 evil-0.007
Token l
Token ID300
Feature activation+2.92e+00

Pos logit contributions
burn+0.059
ip+0.058
ipop+0.056
Whe+0.056
 Sn+0.055
Neg logit contributions
 noise-0.032
 truth-0.026
 awful-0.025
 cries-0.025
 situation-0.024
Tokenoll
Token ID692
Feature activation+9.66e+00

Pos logit contributions
ipop+0.043
ip+0.042
burn+0.040
 sunscreen+0.037
 Sn+0.035
Neg logit contributions
 noise-0.085
 awful-0.078
 cries-0.078
 evil-0.076
 unfamiliar-0.076
Tokenipop
Token ID42800
Feature activation+3.52e+00

Pos logit contributions
Whe+0.089
burn+0.066
 sunscreen+0.062
 Sn+0.057
Br+0.053
Neg logit contributions
 noise-0.314
 cries-0.312
 situation-0.305
 awful-0.304
 evil-0.304
Token.
Token ID13
Feature activation+2.76e-01

Pos logit contributions
burn+0.119
ipop+0.119
ip+0.118
Whe+0.116
 sunscreen+0.116
Neg logit contributions
 noise-0.030
 awful-0.023
 cries-0.022
 truth-0.020
 evil-0.020
Token Jack
Token ID3619
Feature activation+5.56e-01

Pos logit contributions
ipop+0.007
ip+0.007
burn+0.007
 sunscreen+0.007
 Sn+0.006
Neg logit contributions
 noise-0.005
 awful-0.004
 cries-0.004
 evil-0.004
 unfamiliar-0.004
Token thanked
Token ID26280
Feature activation+3.18e-01

Pos logit contributions
burn+0.017
ipop+0.017
ip+0.016
 sunscreen+0.016
Whe+0.016
Neg logit contributions
 noise-0.007
 cries-0.006
 awful-0.006
 wonders-0.005
 evil-0.005
Token him
Token ID683
Feature activation+1.36e+00

Pos logit contributions
ipop+0.008
ip+0.007
burn+0.007
 sunscreen+0.007
 Sn+0.007
Neg logit contributions
 noise-0.006
 awful-0.005
 cries-0.005
 evil-0.005
 unfamiliar-0.005
Token man
Token ID582
Feature activation-5.80e-01

Pos logit contributions
ipop+0.035
ip+0.035
burn+0.035
 sunscreen+0.033
Whe+0.032
Neg logit contributions
 noise-0.024
 awful-0.020
 cries-0.020
 evil-0.020
 truth-0.019
Token who
Token ID508
Feature activation+1.22e+00

Pos logit contributions
 noise+0.006
 awful+0.005
 cries+0.005
 evil+0.005
 unfamiliar+0.005
Neg logit contributions
burn-0.018
 sunscreen-0.018
ipop-0.018
ip-0.018
Whe-0.018
Token gave
Token ID2921
Feature activation+6.96e-01

Pos logit contributions
ipop+0.030
ip+0.030
burn+0.029
 sunscreen+0.028
 Sn+0.027
Neg logit contributions
 noise-0.023
 awful-0.020
 cries-0.020
 evil-0.019
 unfamiliar-0.019
Token her
Token ID607
Feature activation+2.18e+00

Pos logit contributions
ipop+0.021
burn+0.020
ip+0.020
coat+0.019
Whe+0.019
Neg logit contributions
 noise-0.010
 cries-0.008
 truth-0.008
 awful-0.008
 wonders-0.008
Token the
Token ID262
Feature activation+9.37e-01

Pos logit contributions
ipop+0.054
burn+0.053
Whe+0.053
ip+0.053
coat+0.052
Neg logit contributions
 noise-0.039
 truth-0.036
 awful-0.035
 cries-0.034
 evil-0.034
Token sa
Token ID473
Feature activation+6.11e+00

Pos logit contributions
ipop+0.022
ip+0.022
burn+0.021
 sunscreen+0.020
Whe+0.020
Neg logit contributions
 noise-0.019
 awful-0.017
 truth-0.017
 cries-0.016
 evil-0.016
Tokenus
Token ID385
Feature activation+3.35e+00

Pos logit contributions
Whe+0.127
 sunscreen+0.122
ipop+0.122
burn+0.121
coat+0.115
Neg logit contributions
 noise-0.125
 cries-0.115
 awful-0.113
 truth-0.113
 situation-0.112
Tokenages
Token ID1095
Feature activation+4.02e+00

Pos logit contributions
Whe+0.062
burn+0.062
 seeker+0.060
ipop+0.059
 sunscreen+0.058
Neg logit contributions
 noise-0.079
 cries-0.073
 awful-0.071
 truth-0.070
 evil-0.069
Token.
Token ID13
Feature activation+7.51e-01

Pos logit contributions
ip+0.122
ipop+0.122
burn+0.122
Whe+0.117
 sunscreen+0.115
Neg logit contributions
 noise-0.049
 awful-0.041
 cries-0.039
 truth-0.037
 evil-0.037
Token She
Token ID1375
Feature activation+1.62e-01

Pos logit contributions
burn+0.023
ipop+0.023
ip+0.022
 sunscreen+0.022
coat+0.022
Neg logit contributions
 noise-0.008
 cries-0.008
 truth-0.007
 wonders-0.007
 awful-0.006
Token even
Token ID772
Feature activation+5.58e-01

Pos logit contributions
ipop+0.004
ip+0.004
burn+0.004
 sunscreen+0.004
Whe+0.004
Neg logit contributions
 noise-0.003
 cries-0.002
 awful-0.002
 evil-0.002
 truth-0.002
Tokenoring
Token ID3255
Feature activation-2.50e+00

Pos logit contributions
ipop+0.037
 groom+0.036
burn+0.036
Whe+0.036
 seventy+0.035
Neg logit contributions
 noise-0.047
 awful-0.045
umbling-0.044
 cries-0.043
.-0.042
Token loudly
Token ID23112
Feature activation-2.46e+00

Pos logit contributions
 noise+0.036
 awful+0.030
 cries+0.029
 evil+0.028
 unfamiliar+0.028
Neg logit contributions
ipop-0.074
ip-0.073
burn-0.072
 sunscreen-0.069
 Sn-0.067
Token.
Token ID13
Feature activation-1.24e+00

Pos logit contributions
 noise+0.023
 awful+0.019
 cries+0.018
 evil+0.017
 unfamiliar+0.017
Neg logit contributions
ipop-0.081
ip-0.081
burn-0.079
 sunscreen-0.077
 Sn-0.076
Token\n
Token ID198
Feature activation-4.90e-01

Pos logit contributions
 noise+0.020
 cries+0.017
 awful+0.017
 truth+0.016
 unfamiliar+0.016
Neg logit contributions
ipop-0.034
burn-0.034
ip-0.034
 sunscreen-0.032
 Sn-0.031
Token\n
Token ID198
Feature activation-1.13e+00

Pos logit contributions
 noise+0.008
 awful+0.008
 evil+0.007
 cries+0.007
 unfamiliar+0.007
Neg logit contributions
ip-0.013
ipop-0.013
burn-0.012
 sunscreen-0.012
 Sn-0.012
Token"
Token ID1
Feature activation+1.68e+00

Pos logit contributions
 noise+0.022
 awful+0.019
 unfamiliar+0.019
 odd+0.018
 unknown+0.018
Neg logit contributions
ipop-0.028
burn-0.027
ip-0.026
uggets-0.025
inea-0.025
TokenLook
Token ID8567
Feature activation+2.65e+00

Pos logit contributions
ip+0.048
ipop+0.045
ouses+0.039
aj+0.038
belt+0.038
Neg logit contributions
 noise-0.033
 truth-0.028
 comm-0.027
 unfamiliar-0.027
 wonders-0.027
Token,
Token ID11
Feature activation+3.29e+00

Pos logit contributions
burn+0.090
coat+0.086
ugg+0.086
 saddle+0.086
Whe+0.085
Neg logit contributions
 noise-0.026
 cries-0.022
 awful-0.021
.-0.019
!-0.019
Token Lily
Token ID20037
Feature activation-1.27e-01

Pos logit contributions
burn+0.103
ip+0.101
ugg+0.099
coat+0.099
z+0.099
Neg logit contributions
 noise-0.054
 truth-0.042
 that-0.041
 mean-0.040
 strange-0.040
Token,
Token ID11
Feature activation+2.94e+00

Pos logit contributions
 noise+0.002
 awful+0.002
 cries+0.002
 evil+0.002
 unfamiliar+0.002
Neg logit contributions
ip-0.003
ipop-0.003
burn-0.003
 sunscreen-0.003
 Sn-0.003
Token a
Token ID257
Feature activation+1.79e-01

Pos logit contributions
ip+0.098
ipop+0.095
burn+0.095
coat+0.093
ugg+0.093
Neg logit contributions
 noise-0.038
 truth-0.028
 evil-0.027
 unknown-0.026
 awful-0.025
Token.
Token ID13
Feature activation-1.13e+00

Pos logit contributions
ipop+0.029
ip+0.028
burn+0.028
coat+0.026
 sunscreen+0.026
Neg logit contributions
 noise-0.014
 awful-0.013
 evil-0.012
 cries-0.012
 unfamiliar-0.011
Token He
Token ID679
Feature activation-5.49e-01

Pos logit contributions
 noise+0.014
 cries+0.013
 truth+0.012
 wonders+0.011
 awful+0.011
Neg logit contributions
burn-0.036
ipop-0.033
ip-0.033
 sunscreen-0.032
rav-0.031
Token opened
Token ID4721
Feature activation+3.37e-02

Pos logit contributions
 noise+0.010
 cries+0.009
 awful+0.009
 evil+0.009
 wonders+0.008
Neg logit contributions
burn-0.014
ipop-0.014
ip-0.014
 sunscreen-0.013
Whe-0.013
Token the
Token ID262
Feature activation-3.57e-01

Pos logit contributions
ipop+0.001
burn+0.001
ip+0.001
Whe+0.001
rav+0.001
Neg logit contributions
 noise-0.001
 cries-0.000
 awful-0.000
 wonders-0.000
 unfamiliar-0.000
Token box
Token ID3091
Feature activation+8.88e-01

Pos logit contributions
 noise+0.006
 awful+0.005
 truth+0.005
 situation+0.005
 cries+0.005
Neg logit contributions
ip-0.010
burn-0.010
ipop-0.009
Whe-0.009
ugg-0.009
Token and
Token ID290
Feature activation+1.83e+00

Pos logit contributions
ipop+0.020
ip+0.019
burn+0.017
coat+0.017
belt+0.017
Neg logit contributions
 noise-0.018
 awful-0.017
 evil-0.017
 comm-0.017
 harm-0.017
Token showed
Token ID3751
Feature activation-3.66e-01

Pos logit contributions
burn+0.057
ip+0.053
ipop+0.052
ittens+0.051
rav+0.051
Neg logit contributions
 noise-0.026
 wonders-0.022
 cries-0.022
 scared-0.019
 unknown-0.019
Token them
Token ID606
Feature activation+1.00e+00

Pos logit contributions
 noise+0.006
 cries+0.005
 awful+0.005
 evil+0.005
 truth+0.005
Neg logit contributions
ipop-0.010
burn-0.010
ip-0.010
 sunscreen-0.009
coat-0.009
Token what
Token ID644
Feature activation+1.78e-01

Pos logit contributions
ipop+0.025
ip+0.025
burn+0.024
 sunscreen+0.024
 Sn+0.023
Neg logit contributions
 noise-0.018
 awful-0.016
 cries-0.016
 evil-0.015
 unfamiliar-0.015
Token was
Token ID373
Feature activation-1.07e+00

Pos logit contributions
ipop+0.004
ip+0.004
burn+0.004
 sunscreen+0.004
 Sn+0.004
Neg logit contributions
 noise-0.003
 awful-0.003
 cries-0.003
 evil-0.003
 unfamiliar-0.003
Token inside
Token ID2641
Feature activation-4.09e-01

Pos logit contributions
 noise+0.028
 awful+0.026
 cries+0.026
 evil+0.025
 unfamiliar+0.025
Neg logit contributions
ipop-0.018
ip-0.018
burn-0.017
 sunscreen-0.016
 Sn-0.015
TokenWhen
Token ID2215
Feature activation+6.79e-01

Pos logit contributions
 noise+0.001
 cries+0.001
 evil+0.001
 harm+0.001
 awful+0.001
Neg logit contributions
ip-0.000
ipop-0.000
Fl-0.000
Whe-0.000
 sunscreen-0.000
Token it
Token ID340
Feature activation+1.33e+00

Pos logit contributions
ip+0.016
ipop+0.016
burn+0.016
 sunscreen+0.015
coat+0.015
Neg logit contributions
 noise-0.014
 cries-0.012
 awful-0.012
 truth-0.011
 evil-0.011
Token was
Token ID373
Feature activation-4.16e-01

Pos logit contributions
ipop+0.035
ip+0.035
burn+0.034
 Sn+0.033
 sunscreen+0.032
Neg logit contributions
 noise-0.022
 awful-0.019
 evil-0.019
 cries-0.019
 unfamiliar-0.019
Token all
Token ID477
Feature activation-8.94e-01

Pos logit contributions
 noise+0.006
 awful+0.005
 evil+0.004
 truth+0.004
 unfamiliar+0.004
Neg logit contributions
ip-0.013
ipop-0.013
burn-0.012
Whe-0.012
uggets-0.012
Token clean
Token ID3424
Feature activation-9.09e-02

Pos logit contributions
 evil+0.020
 noise+0.020
 awful+0.019
 odd+0.019
 unfamiliar+0.019
Neg logit contributions
burn-0.018
ipop-0.016
ip-0.016
 Sn-0.016
 sunscreen-0.016
Token,
Token ID11
Feature activation+1.47e+00

Pos logit contributions
 noise+0.002
 awful+0.001
 cries+0.001
 evil+0.001
 unfamiliar+0.001
Neg logit contributions
ipop-0.002
ip-0.002
burn-0.002
 sunscreen-0.002
 Sn-0.002
Token Mia
Token ID32189
Feature activation-5.41e-01

Pos logit contributions
ip+0.032
ipop+0.032
burn+0.031
 sunscreen+0.029
 Sn+0.028
Neg logit contributions
 noise-0.033
 awful-0.029
 cries-0.029
 evil-0.028
 unfamiliar-0.028
Token's
Token ID338
Feature activation+1.37e+00

Pos logit contributions
 noise+0.009
 awful+0.008
 evil+0.008
 cries+0.008
 unfamiliar+0.008
Neg logit contributions
ip-0.014
ipop-0.014
burn-0.014
 sunscreen-0.013
 Sn-0.013
Token mom
Token ID1995
Feature activation-7.16e-01

Pos logit contributions
ipop+0.024
ip+0.024
burn+0.023
 sunscreen+0.022
 Sn+0.021
Neg logit contributions
 noise-0.035
 awful-0.032
 cries-0.032
 evil-0.031
 unfamiliar-0.031
Token gave
Token ID2921
Feature activation-2.41e-01

Pos logit contributions
 noise+0.012
 cries+0.010
 awful+0.010
 truth+0.010
 comm+0.010
Neg logit contributions
ipop-0.019
ip-0.019
burn-0.018
 sunscreen-0.018
Whe-0.018
Token her
Token ID607
Feature activation+8.72e-01

Pos logit contributions
 noise+0.005
 cries+0.004
 awful+0.004
 evil+0.004
 truth+0.004
Neg logit contributions
ipop-0.006
ip-0.006
burn-0.005
 sunscreen-0.005
 Sn-0.005
Token lit
Token ID6578
Feature activation-1.76e+00

Pos logit contributions
ipop+0.021
ip+0.021
burn+0.020
 sunscreen+0.020
 Sn+0.020
Neg logit contributions
 noise-0.020
 situation-0.018
 cries-0.018
 awful-0.018
 comm-0.018
Token up
Token ID510
Feature activation-9.06e-01

Pos logit contributions
 noise+0.031
 awful+0.028
 cries+0.027
 evil+0.027
 situation+0.026
Neg logit contributions
ip-0.044
ipop-0.044
burn-0.043
 sunscreen-0.041
 Sn-0.041
Token and
Token ID290
Feature activation+3.73e-01

Pos logit contributions
 noise+0.009
 awful+0.007
 cries+0.007
 evil+0.007
 unfamiliar+0.007
Neg logit contributions
ipop-0.030
ip-0.030
burn-0.029
 sunscreen-0.028
 Sn-0.028
Token he
Token ID339
Feature activation-4.88e-01

Pos logit contributions
burn+0.010
ip+0.010
ipop+0.010
 sunscreen+0.009
uggets+0.009
Neg logit contributions
 noise-0.006
 cries-0.006
 awful-0.005
 evil-0.005
 unfamiliar-0.005
Token said
Token ID531
Feature activation+1.36e+00

Pos logit contributions
 noise+0.009
 awful+0.008
 cries+0.008
 evil+0.008
 unfamiliar+0.008
Neg logit contributions
ipop-0.012
ip-0.012
burn-0.011
 sunscreen-0.011
 Sn-0.010
Token,
Token ID11
Feature activation+2.50e+00

Pos logit contributions
burn+0.039
ip+0.037
ipop+0.037
 sunscreen+0.037
Whe+0.036
Neg logit contributions
 noise-0.021
 cries-0.018
 awful-0.018
 truth-0.017
 evil-0.017
Token "
Token ID366
Feature activation+1.68e+00

Pos logit contributions
burn+0.076
ip+0.072
 sunscreen+0.071
ipop+0.070
inea+0.069
Neg logit contributions
 noise-0.035
 cries-0.029
 truth-0.028
 awful-0.026
.-0.025
TokenYes
Token ID5297
Feature activation+2.50e+00

Pos logit contributions
ipop+0.035
ip+0.035
burn+0.035
 sunscreen+0.033
uggets+0.031
Neg logit contributions
 noise-0.037
 truth-0.033
 awful-0.033
 wonders-0.033
 cries-0.033
Token,
Token ID11
Feature activation+3.16e+00

Pos logit contributions
burn+0.068
ipop+0.068
ip+0.068
 sunscreen+0.065
uggets+0.064
Neg logit contributions
 noise-0.040
 awful-0.035
 cries-0.034
 evil-0.032
 truth-0.032
Token please
Token ID3387
Feature activation+2.36e+00

Pos logit contributions
burn+0.092
ip+0.089
 sunscreen+0.085
ipop+0.085
coat+0.084
Neg logit contributions
 noise-0.049
 truth-0.043
 awful-0.042
 evil-0.040
 cries-0.039
Token!
Token ID0
Feature activation+8.93e-01

Pos logit contributions
ipop+0.079
ip+0.079
burn+0.079
 sunscreen+0.076
Whe+0.074
Neg logit contributions
 noise-0.025
 awful-0.018
 cries-0.018
 truth-0.017
 evil-0.017
Token from
Token ID422
Feature activation-2.97e-02

Pos logit contributions
 noise+0.008
 awful+0.006
 cries+0.006
 evil+0.006
 unfamiliar+0.005
Neg logit contributions
ipop-0.028
ip-0.028
burn-0.027
 sunscreen-0.026
 Sn-0.026
Token within
Token ID1626
Feature activation+1.90e+00

Pos logit contributions
 noise+0.001
 awful+0.001
 cries+0.001
 evil+0.000
 unfamiliar+0.000
Neg logit contributions
ipop-0.001
ip-0.001
burn-0.001
 sunscreen-0.001
 Sn-0.001
Token.
Token ID13
Feature activation+7.12e-01

Pos logit contributions
ipop+0.047
ip+0.047
burn+0.045
 sunscreen+0.044
 Sn+0.043
Neg logit contributions
 noise-0.035
 awful-0.031
 cries-0.031
 evil-0.029
 truth-0.029
Token She
Token ID1375
Feature activation+4.37e-01

Pos logit contributions
ipop+0.021
burn+0.021
ip+0.021
 sunscreen+0.021
coat+0.019
Neg logit contributions
 noise-0.009
 cries-0.008
 awful-0.008
 truth-0.008
 wonders-0.007
Token had
Token ID550
Feature activation+2.02e+00

Pos logit contributions
ipop+0.011
ip+0.011
burn+0.011
 sunscreen+0.010
 Sn+0.010
Neg logit contributions
 noise-0.008
 awful-0.007
 cries-0.007
 evil-0.007
 unfamiliar-0.007
Token finally
Token ID3443
Feature activation+1.74e+00

Pos logit contributions
ip+0.048
ipop+0.048
burn+0.047
 sunscreen+0.046
 Sn+0.044
Neg logit contributions
 noise-0.039
 cries-0.035
 awful-0.034
 evil-0.033
 unfamiliar-0.033
Token realized
Token ID6939
Feature activation-1.40e+00

Pos logit contributions
ipop+0.038
ip+0.037
burn+0.037
 sunscreen+0.036
 Sn+0.033
Neg logit contributions
 noise-0.037
 cries-0.033
 awful-0.033
 odd-0.032
 unfamiliar-0.032
Token that
Token ID326
Feature activation-7.29e-01

Pos logit contributions
 noise+0.015
 awful+0.012
 cries+0.012
 evil+0.011
 unfamiliar+0.011
Neg logit contributions
ipop-0.045
ip-0.045
burn-0.044
 sunscreen-0.043
 Sn-0.042
Token love
Token ID1842
Feature activation-8.60e-01

Pos logit contributions
 noise+0.013
 awful+0.011
 cries+0.011
 truth+0.011
 evil+0.011
Neg logit contributions
ipop-0.019
ip-0.019
burn-0.018
 sunscreen-0.018
Whe-0.017
Token was
Token ID373
Feature activation-1.30e+00

Pos logit contributions
 noise+0.014
 awful+0.012
 cries+0.012
 evil+0.011
 unfamiliar+0.011
Neg logit contributions
ipop-0.024
ip-0.024
burn-0.023
 sunscreen-0.022
 Sn-0.022
Token the
Token ID262
Feature activation-8.68e-01

Pos logit contributions
 noise+0.022
 awful+0.019
 evil+0.018
 cries+0.018
 unfamiliar+0.018
Neg logit contributions
ipop-0.036
ip-0.035
burn-0.034
 sunscreen-0.033
Whe-0.032
Token.
Token ID13
Feature activation-3.92e-01

Pos logit contributions
 noise+0.027
 cries+0.023
 truth+0.022
 awful+0.022
 situation+0.022
Neg logit contributions
ip-0.025
ipop-0.024
burn-0.023
 sunscreen-0.022
 Sn-0.021
Token\n
Token ID198
Feature activation-6.62e-01

Pos logit contributions
 evil+0.010
 harm+0.010
 awful+0.009
 noise+0.009
 annoying+0.009
Neg logit contributions
ip-0.007
ipop-0.007
ugg-0.006
 Whe-0.006
 Enjoy-0.006
Token\n
Token ID198
Feature activation-1.26e+00

Pos logit contributions
 noise+0.014
 evil+0.014
 awful+0.014
 unfamiliar+0.012
 cries+0.012
Neg logit contributions
ip-0.015
ipop-0.014
 Sn-0.013
ugg-0.013
 sunscreen-0.013
TokenEvery
Token ID6109
Feature activation+6.96e-01

Pos logit contributions
 cries+0.036
 noise+0.035
 screams+0.033
 harm+0.033
 evil+0.033
Neg logit contributions
ip-0.016
 sunscreen-0.015
Fl-0.015
pron-0.015
Br-0.015
Token day
Token ID1110
Feature activation+2.15e+00

Pos logit contributions
ip+0.013
ipop+0.013
burn+0.013
 sunscreen+0.012
 Sn+0.011
Neg logit contributions
 noise-0.017
 awful-0.016
 cries-0.016
 evil-0.015
 unfamiliar-0.015
Token Jack
Token ID3619
Feature activation-8.46e-01

Pos logit contributions
ipop+0.050
ip+0.050
burn+0.047
 sunscreen+0.046
 Sn+0.044
Neg logit contributions
 noise-0.043
 awful-0.039
 cries-0.039
 evil-0.038
 unfamiliar-0.038
Token and
Token ID290
Feature activation+2.47e+00

Pos logit contributions
 noise+0.018
 awful+0.016
 cries+0.016
 evil+0.016
 unfamiliar+0.016
Neg logit contributions
ipop-0.018
ip-0.018
burn-0.018
 sunscreen-0.017
 Sn-0.016
Token his
Token ID465
Feature activation+1.95e+00

Pos logit contributions
ipop+0.059
ip+0.057
burn+0.057
 sunscreen+0.055
 Sn+0.052
Neg logit contributions
 noise-0.047
 awful-0.044
 cries-0.042
 evil-0.041
 unfamiliar-0.041
Token mom
Token ID1995
Feature activation+1.67e-01

Pos logit contributions
ipop+0.037
ip+0.037
burn+0.036
 sunscreen+0.033
 Sn+0.033
Neg logit contributions
 noise-0.049
 cries-0.044
 awful-0.044
 evil-0.043
 truth-0.042
Token would
Token ID561
Feature activation-1.19e-01

Pos logit contributions
ipop+0.004
ip+0.004
Whe+0.004
burn+0.004
 Sn+0.004
Neg logit contributions
 noise-0.003
 cries-0.003
 truth-0.003
 awful-0.003
 wonders-0.003
Token go
Token ID467
Feature activation+1.57e+00

Pos logit contributions
 noise+0.002
 cries+0.002
 awful+0.002
 truth+0.002
 evil+0.002
Neg logit contributions
ipop-0.003
ip-0.003
burn-0.003
 sunscreen-0.003
Whe-0.002
Token looked
Token ID3114
Feature activation+1.75e+00

Pos logit contributions
 noise+0.027
 cries+0.023
 awful+0.023
 evil+0.023
 unfamiliar+0.022
Neg logit contributions
ipop-0.044
ip-0.044
burn-0.043
 sunscreen-0.041
Whe-0.040
Token for
Token ID329
Feature activation-6.57e-01

Pos logit contributions
ipop+0.042
ip+0.042
burn+0.041
 sunscreen+0.040
 Sn+0.038
Neg logit contributions
 noise-0.033
 awful-0.030
 cries-0.029
 evil-0.029
 truth-0.028
Token bananas
Token ID35484
Feature activation+3.48e+00

Pos logit contributions
 noise+0.013
 awful+0.011
 evil+0.011
 cries+0.011
 unfamiliar+0.010
Neg logit contributions
ip-0.017
ipop-0.017
burn-0.017
Whe-0.015
rav-0.015
Token,
Token ID11
Feature activation+2.85e+00

Pos logit contributions
ipop+0.084
ip+0.084
burn+0.078
 sunscreen+0.076
coat+0.073
Neg logit contributions
 noise-0.070
 awful-0.062
 cries-0.061
 evil-0.061
 comm-0.060
Token but
Token ID475
Feature activation+4.40e-01

Pos logit contributions
ip+0.084
ipop+0.083
burn+0.083
ugg+0.078
rav+0.078
Neg logit contributions
 noise-0.042
 cries-0.035
 awful-0.035
 evil-0.033
 unfamiliar-0.033
Token there
Token ID612
Feature activation-7.14e-01

Pos logit contributions
burn+0.015
ip+0.014
Whe+0.014
coat+0.014
rav+0.013
Neg logit contributions
 noise-0.005
 awful-0.004
 cries-0.004
 evil-0.004
 unfamiliar-0.004
Token were
Token ID547
Feature activation-2.92e+00

Pos logit contributions
 noise+0.013
 evil+0.011
 awful+0.011
 cries+0.011
 comm+0.011
Neg logit contributions
ipop-0.018
ip-0.018
burn-0.017
 Sn-0.016
 sunscreen-0.016
Token none
Token ID4844
Feature activation-1.93e+00

Pos logit contributions
 noise+0.057
 awful+0.050
 cries+0.050
 evil+0.049
 unfamiliar+0.048
Neg logit contributions
ipop-0.069
ip-0.069
burn-0.067
 sunscreen-0.065
 Sn-0.063
Token.
Token ID13
Feature activation-7.28e-01

Pos logit contributions
 noise+0.030
 cries+0.026
 awful+0.026
 evil+0.025
 unfamiliar+0.024
Neg logit contributions
ipop-0.054
ip-0.053
burn-0.051
 sunscreen-0.050
uggets-0.048
Token He
Token ID679
Feature activation-1.36e+00

Pos logit contributions
 noise+0.010
 cries+0.009
 awful+0.008
 truth+0.008
 odd+0.008
Neg logit contributions
burn-0.022
ipop-0.021
ip-0.021
 sunscreen-0.020
rav-0.020
Token looked
Token ID3114
Feature activation+1.59e+00

Pos logit contributions
 noise+0.023
 awful+0.020
 cries+0.020
 evil+0.019
 unfamiliar+0.019
Neg logit contributions
ipop-0.036
ip-0.036
burn-0.035
 sunscreen-0.034
 Sn-0.033
Token baby
Token ID5156
Feature activation-5.63e-01

Pos logit contributions
 noise+0.013
 awful+0.012
 cries+0.012
 evil+0.012
 unfamiliar+0.011
Neg logit contributions
ipop-0.016
burn-0.015
ip-0.015
 sunscreen-0.014
 Sn-0.014
Token was
Token ID373
Feature activation-1.47e+00

Pos logit contributions
 noise+0.008
 cries+0.006
 awful+0.006
 evil+0.006
 truth+0.006
Neg logit contributions
ip-0.017
burn-0.017
 sunscreen-0.017
ipop-0.017
Whe-0.016
Token taking
Token ID2263
Feature activation+2.93e-01

Pos logit contributions
 noise+0.022
 cries+0.020
 situation+0.019
urious+0.018
 awful+0.018
Neg logit contributions
ipop-0.038
burn-0.038
ip-0.038
 sunscreen-0.037
 Sn-0.037
Token away
Token ID1497
Feature activation-9.28e-01

Pos logit contributions
ipop+0.008
ip+0.007
burn+0.007
 sunscreen+0.007
Whe+0.007
Neg logit contributions
 noise-0.006
 cries-0.005
 awful-0.005
 evil-0.005
 unfamiliar-0.005
Token all
Token ID477
Feature activation-3.19e-01

Pos logit contributions
 noise+0.011
 awful+0.009
 cries+0.009
 evil+0.009
 unfamiliar+0.008
Neg logit contributions
ipop-0.029
ip-0.029
burn-0.029
 sunscreen-0.028
 Sn-0.027
Token the
Token ID262
Feature activation-4.21e-01

Pos logit contributions
 noise+0.006
 awful+0.005
 evil+0.005
 cries+0.005
 urgent+0.005
Neg logit contributions
ipop-0.007
ip-0.007
burn-0.007
 sunscreen-0.007
 Sn-0.007
Token attention
Token ID3241
Feature activation-1.25e+00

Pos logit contributions
 noise+0.006
 awful+0.006
 cries+0.005
 evil+0.005
 unfamiliar+0.005
Neg logit contributions
ipop-0.012
ip-0.012
burn-0.011
 sunscreen-0.011
 Sn-0.011
Token from
Token ID422
Feature activation-4.84e-01

Pos logit contributions
 noise+0.016
 awful+0.014
 cries+0.013
 evil+0.013
 unfamiliar+0.013
Neg logit contributions
ipop-0.038
ip-0.038
burn-0.037
 sunscreen-0.035
 Sn-0.035
Token her
Token ID607
Feature activation+1.23e+00

Pos logit contributions
 noise+0.009
 cries+0.008
 awful+0.008
 evil+0.008
 unfamiliar+0.008
Neg logit contributions
ip-0.013
ipop-0.013
burn-0.012
 sunscreen-0.011
coat-0.011
Token.
Token ID13
Feature activation-3.02e-01

Pos logit contributions
ipop+0.039
ip+0.039
burn+0.039
 sunscreen+0.037
Whe+0.037
Neg logit contributions
 noise-0.015
 cries-0.012
 awful-0.012
 evil-0.011
 truth-0.011
Token Her
Token ID2332
Feature activation+1.64e+00

Pos logit contributions
 noise+0.005
 cries+0.005
 awful+0.005
 truth+0.005
 evil+0.005
Neg logit contributions
ipop-0.008
ip-0.008
burn-0.008
 sunscreen-0.007
 Sn-0.007
Token.
Token ID13
Feature activation-6.06e-01

Pos logit contributions
ipop+0.006
ip+0.006
burn+0.006
 sunscreen+0.006
 Sn+0.005
Neg logit contributions
 noise-0.003
 awful-0.002
 cries-0.002
 evil-0.002
 unfamiliar-0.002
Token It
Token ID632
Feature activation-7.06e-01

Pos logit contributions
 noise+0.009
 cries+0.008
 awful+0.008
 truth+0.008
 odd+0.007
Neg logit contributions
burn-0.018
ipop-0.017
ip-0.017
 sunscreen-0.016
rav-0.016
Token was
Token ID373
Feature activation-1.64e+00

Pos logit contributions
 noise+0.013
 cries+0.011
 evil+0.011
 awful+0.011
 unfamiliar+0.011
Neg logit contributions
ip-0.017
ipop-0.017
burn-0.017
 sunscreen-0.016
 Sn-0.016
Token fun
Token ID1257
Feature activation-7.88e-01

Pos logit contributions
 noise+0.028
 awful+0.024
 cries+0.023
 evil+0.023
 unfamiliar+0.022
Neg logit contributions
ipop-0.045
ip-0.044
burn-0.043
 sunscreen-0.041
 Sn-0.040
Token and
Token ID290
Feature activation+3.97e-02

Pos logit contributions
 noise+0.013
 evil+0.012
 awful+0.012
 cries+0.012
 unfamiliar+0.012
Neg logit contributions
ip-0.021
ipop-0.020
burn-0.019
 sunscreen-0.019
 breakfast-0.019
Token messy
Token ID24097
Feature activation-1.07e+00

Pos logit contributions
ipop+0.001
ip+0.001
burn+0.001
 sunscreen+0.001
 Sn+0.001
Neg logit contributions
 noise-0.001
 cries-0.001
 awful-0.001
 evil-0.001
 situation-0.001
Token.
Token ID13
Feature activation-9.36e-01

Pos logit contributions
 noise+0.011
 evil+0.010
 cries+0.010
 awful+0.010
 unfamiliar+0.009
Neg logit contributions
ip-0.034
ipop-0.034
burn-0.033
 Sn-0.032
 sunscreen-0.032
Token He
Token ID679
Feature activation-5.88e-01

Pos logit contributions
 noise+0.015
 cries+0.013
 awful+0.013
 truth+0.012
 wonders+0.012
Neg logit contributions
burn-0.026
ipop-0.025
ip-0.025
 sunscreen-0.024
coat-0.023
Token saw
Token ID2497
Feature activation-1.98e+00

Pos logit contributions
 noise+0.012
 awful+0.011
 cries+0.011
 evil+0.011
 unfamiliar+0.010
Neg logit contributions
ipop-0.013
ip-0.013
burn-0.013
 sunscreen-0.012
 Sn-0.012
Token a
Token ID257
Feature activation-3.13e-01

Pos logit contributions
 noise+0.037
 awful+0.033
 cries+0.033
 evil+0.032
 situation+0.032
Neg logit contributions
ipop-0.048
ip-0.048
burn-0.046
 sunscreen-0.046
 Sn-0.044
Token shiny
Token ID22441
Feature activation+1.62e+00

Pos logit contributions
 noise+0.005
 truth+0.004
 awful+0.004
 situation+0.004
 strange+0.004
Neg logit contributions
ip-0.009
ipop-0.009
burn-0.009
Whe-0.009
ugg-0.009
Token see
Token ID766
Feature activation-1.50e+00

Pos logit contributions
 noise+0.031
 awful+0.030
 cries+0.029
 comm+0.029
 evil+0.029
Neg logit contributions
burn-0.030
ipop-0.030
ip-0.029
 sunscreen-0.029
coat-0.028
Token the
Token ID262
Feature activation-3.00e+00

Pos logit contributions
 noise+0.028
 awful+0.025
 cries+0.025
 evil+0.024
 unfamiliar+0.024
Neg logit contributions
ipop-0.037
ip-0.036
burn-0.036
 sunscreen-0.034
 Sn-0.033
Token flowers
Token ID12734
Feature activation+4.92e-01

Pos logit contributions
 noise+0.056
 awful+0.047
 cries+0.047
 truth+0.047
 evil+0.046
Neg logit contributions
ip-0.079
ipop-0.078
burn-0.075
 sunscreen-0.071
uggets-0.071
Token.
Token ID13
Feature activation-9.19e-01

Pos logit contributions
ipop+0.014
ip+0.014
burn+0.013
 sunscreen+0.013
 Sn+0.013
Neg logit contributions
 noise-0.007
 cries-0.006
 awful-0.006
 evil-0.006
 comm-0.006
Token They
Token ID1119
Feature activation-9.23e-01

Pos logit contributions
 noise+0.016
 cries+0.014
 awful+0.014
 evil+0.013
 truth+0.013
Neg logit contributions
ipop-0.024
ip-0.024
burn-0.023
 sunscreen-0.022
 Sn-0.021
Token could
Token ID714
Feature activation+1.94e-01

Pos logit contributions
 noise+0.016
 awful+0.014
 cries+0.014
 evil+0.014
 unfamiliar+0.014
Neg logit contributions
ipop-0.024
ip-0.023
burn-0.023
 sunscreen-0.022
 Sn-0.021
Token not
Token ID407
Feature activation-1.59e+00

Pos logit contributions
burn+0.005
 sunscreen+0.005
ipop+0.005
ip+0.005
 Sn+0.005
Neg logit contributions
 noise-0.003
 awful-0.003
 situation-0.003
 evil-0.003
urious-0.003
Token see
Token ID766
Feature activation-1.48e+00

Pos logit contributions
 noise+0.033
 awful+0.031
 cries+0.030
 comm+0.030
 evil+0.029
Neg logit contributions
ipop-0.034
burn-0.033
ip-0.033
 sunscreen-0.032
coat-0.030
Token the
Token ID262
Feature activation-2.32e+00

Pos logit contributions
 noise+0.028
 awful+0.025
 cries+0.025
 evil+0.024
 unfamiliar+0.024
Neg logit contributions
ipop-0.036
ip-0.035
burn-0.035
 sunscreen-0.034
 Sn-0.032
Token shed
Token ID14999
Feature activation-8.78e-01

Pos logit contributions
 noise+0.046
 awful+0.040
 truth+0.039
 cries+0.039
 evil+0.039
Neg logit contributions
ip-0.059
ipop-0.058
burn-0.056
uggets-0.052
 Sn-0.052
Token.
Token ID13
Feature activation-8.92e-01

Pos logit contributions
 noise+0.011
 comm+0.010
 cries+0.010
 awful+0.009
 evil+0.009
Neg logit contributions
burn-0.025
ipop-0.025
ip-0.025
 Sn-0.024
 sunscreen-0.024
TokenBut
Token ID1537
Feature activation-2.11e-02

Pos logit contributions
 noise+0.010
 awful+0.009
 cries+0.009
 evil+0.009
 unfamiliar+0.009
Neg logit contributions
ipop-0.008
ip-0.008
burn-0.008
 sunscreen-0.007
 Sn-0.007
Token then
Token ID788
Feature activation-1.23e-01

Pos logit contributions
 noise+0.000
 awful+0.000
 cries+0.000
 evil+0.000
 unfamiliar+0.000
Neg logit contributions
ip-0.001
burn-0.001
ipop-0.001
coat-0.000
 sunscreen-0.000
Token he
Token ID339
Feature activation-1.34e+00

Pos logit contributions
 noise+0.002
 cries+0.002
 awful+0.002
 evil+0.002
 unfamiliar+0.001
Neg logit contributions
burn-0.004
ip-0.004
ipop-0.004
coat-0.003
uggets-0.003
Token heard
Token ID2982
Feature activation-4.00e+00

Pos logit contributions
 noise+0.028
 awful+0.025
 cries+0.025
 truth+0.024
 unfamiliar+0.024
Neg logit contributions
ip-0.030
ipop-0.030
burn-0.029
 sunscreen-0.027
 Sn-0.026
Token some
Token ID617
Feature activation-2.33e+00

Pos logit contributions
 noise+0.075
 cries+0.068
 awful+0.068
 unfamiliar+0.065
 evil+0.065
Neg logit contributions
ipop-0.093
ip-0.092
burn-0.090
 sunscreen-0.088
 Sn-0.085
Token noise
Token ID7838
Feature activation-3.78e+00

Pos logit contributions
 noise+0.022
 cries+0.021
 awful+0.021
 situation+0.020
 wonders+0.020
Neg logit contributions
ipop-0.066
ip-0.066
burn-0.064
 sunscreen-0.063
Whe-0.060
Token from
Token ID422
Feature activation-1.72e+00

Pos logit contributions
 noise+0.052
 cries+0.051
 awful+0.051
urious+0.050
 situation+0.049
Neg logit contributions
burn-0.096
ipop-0.093
ip-0.092
 Sn-0.090
 sunscreen-0.087
Token far
Token ID1290
Feature activation+1.93e+00

Pos logit contributions
 noise+0.029
 cries+0.025
 awful+0.025
 evil+0.025
 unfamiliar+0.024
Neg logit contributions
ip-0.046
ipop-0.046
burn-0.045
 sunscreen-0.042
 Sn-0.042
Token away
Token ID1497
Feature activation-1.08e+00

Pos logit contributions
burn+0.027
ip+0.027
ipop+0.026
 sunscreen+0.025
 Sn+0.025
Neg logit contributions
 noise-0.050
 awful-0.050
 truth-0.048
 comm-0.048
 cries-0.048
Token,
Token ID11
Feature activation+3.22e-01

Pos logit contributions
 awful+0.019
 annoying+0.018
 cries+0.018
 comm+0.018
 noise+0.018
Neg logit contributions
 Sn-0.022
 seeker-0.021
 sled-0.021
ipop-0.021
 sunscreen-0.021
Token so
Token ID523
Feature activation-1.17e+00

Pos logit contributions
ipop+0.008
ip+0.008
 sunscreen+0.008
burn+0.008
 Sn+0.008
Neg logit contributions
 noise-0.005
 awful-0.005
 cries-0.005
 truth-0.005
 evil-0.005
Token
Token ID198
Feature activation-7.10e-01

Pos logit contributions
 noise+0.003
 truth+0.003
 cries+0.002
 awful+0.002
 wonders+0.002
Neg logit contributions
ipop-0.006
burn-0.006
ip-0.006
 sunscreen-0.006
 saddle-0.006
Token
Token ID198
Feature activation-8.79e-01

Pos logit contributions
 noise+0.012
 awful+0.011
 evil+0.011
 cries+0.011
 situation+0.011
Neg logit contributions
ip-0.018
ipop-0.018
burn-0.017
 sunscreen-0.017
 Sn-0.016
TokenThen
Token ID6423
Feature activation+5.80e-01

Pos logit contributions
 noise+0.013
 awful+0.010
 unfamiliar+0.010
 truth+0.010
 unexpected+0.010
Neg logit contributions
ipop-0.028
burn-0.026
ip-0.025
coat-0.024
oured-0.023
Token,
Token ID11
Feature activation+2.85e+00

Pos logit contributions
ipop+0.018
ugg+0.018
ip+0.018
burn+0.017
coat+0.017
Neg logit contributions
 noise-0.009
 cries-0.007
 trouble-0.006
 awful-0.006
.-0.006
Token they
Token ID484
Feature activation-1.16e-01

Pos logit contributions
ip+0.097
ugg+0.096
coat+0.094
ipop+0.094
etsy+0.092
Neg logit contributions
 noise-0.038
 not-0.026
 that-0.026
 strange-0.024
 trouble-0.024
Token heard
Token ID2982
Feature activation-3.51e+00

Pos logit contributions
 noise+0.002
 cries+0.002
 awful+0.002
 evil+0.002
 unfamiliar+0.002
Neg logit contributions
ipop-0.003
ip-0.003
burn-0.003
 sunscreen-0.003
 Sn-0.003
Token a
Token ID257
Feature activation-1.09e+00

Pos logit contributions
 noise+0.064
 awful+0.054
 cries+0.053
 evil+0.052
 truth+0.051
Neg logit contributions
ipop-0.095
ip-0.094
burn-0.091
 sunscreen-0.085
coat-0.085
Token noise
Token ID7838
Feature activation-3.37e+00

Pos logit contributions
 awful+0.012
 cries+0.012
 harm+0.012
 noise+0.012
 wonders+0.011
Neg logit contributions
ipop-0.029
 Sn-0.029
burn-0.029
 sunscreen-0.028
ip-0.028
Token.
Token ID13
Feature activation-1.11e+00

Pos logit contributions
urious+0.058
 awful+0.056
 situation+0.055
 impossible+0.053
 cries+0.052
Neg logit contributions
burn-0.072
 Sn-0.069
 sunscreen-0.068
 marsh-0.068
 chocolate-0.067
Token It
Token ID632
Feature activation-1.42e+00

Pos logit contributions
 noise+0.019
 cries+0.016
 awful+0.016
 truth+0.015
 evil+0.015
Neg logit contributions
ipop-0.030
ip-0.030
burn-0.029
 sunscreen-0.028
 Sn-0.027
Token was
Token ID373
Feature activation-2.06e+00

Pos logit contributions
umbling+0.029
urious+0.028
 situations+0.028
 awful+0.028
 situation+0.027
Neg logit contributions
 Sn-0.027
burn-0.024
ipop-0.024
Fl-0.023
 marsh-0.022
Token.
Token ID13
Feature activation-5.08e-01

Pos logit contributions
 awful+0.011
 evil+0.011
urious+0.011
 noise+0.011
 situation+0.010
Neg logit contributions
burn-0.017
 Sn-0.017
ip-0.017
coat-0.017
ipop-0.017
Token The
Token ID383
Feature activation-1.07e+00

Pos logit contributions
 noise+0.008
 cries+0.006
 awful+0.006
 truth+0.006
 evil+0.006
Neg logit contributions
burn-0.015
ipop-0.015
ip-0.015
 sunscreen-0.014
uggets-0.013
Token cricket
Token ID18836
Feature activation-1.13e+00

Pos logit contributions
 noise+0.022
 awful+0.021
 cries+0.021
 evil+0.020
 harm+0.020
Neg logit contributions
ipop-0.022
ip-0.022
 sunscreen-0.022
burn-0.021
Whe-0.020
Token was
Token ID373
Feature activation-7.53e-01

Pos logit contributions
 noise+0.020
 awful+0.018
 truth+0.018
 evil+0.018
 unfamiliar+0.018
Neg logit contributions
ip-0.027
ipop-0.027
burn-0.025
 sunscreen-0.024
 Sn-0.024
Token so
Token ID523
Feature activation-1.85e-01

Pos logit contributions
 noise+0.011
 situation+0.011
 cries+0.010
 truth+0.010
urious+0.010
Neg logit contributions
burn-0.020
 sunscreen-0.019
 Sn-0.019
ipop-0.018
ip-0.018
Token relieved
Token ID22598
Feature activation-3.74e+00

Pos logit contributions
 situation+0.004
 noise+0.004
 evil+0.004
 truth+0.004
 cries+0.004
Neg logit contributions
ip-0.003
 sunscreen-0.003
ipop-0.003
 Sn-0.003
burn-0.003
Token that
Token ID326
Feature activation-8.39e-01

Pos logit contributions
 noise+0.071
 evil+0.069
 awful+0.069
 unfamiliar+0.067
 annoying+0.066
Neg logit contributions
ip-0.082
ipop-0.081
 sunscreen-0.076
burn-0.075
 Sn-0.072
Token he
Token ID339
Feature activation-4.64e-01

Pos logit contributions
 noise+0.012
 awful+0.010
 cries+0.010
 evil+0.010
 unfamiliar+0.010
Neg logit contributions
ipop-0.024
ip-0.024
burn-0.024
 sunscreen-0.023
 Sn-0.022
Token stopped
Token ID5025
Feature activation-2.27e+00

Pos logit contributions
 noise+0.009
 truth+0.008
 awful+0.008
 unfamiliar+0.008
 cries+0.008
Neg logit contributions
ip-0.011
ipop-0.010
burn-0.010
 Sn-0.010
 sunscreen-0.009
Token hopping
Token ID47153
Feature activation+1.69e+00

Pos logit contributions
 noise+0.045
 awful+0.042
 evil+0.041
 truth+0.040
 odd+0.040
Neg logit contributions
ip-0.050
burn-0.050
ipop-0.049
 sunscreen-0.048
 Sn-0.046
Token and
Token ID290
Feature activation-1.15e-01

Pos logit contributions
ipop+0.041
ip+0.041
burn+0.039
 sunscreen+0.038
 Sn+0.038
Neg logit contributions
 noise-0.030
 awful-0.028
 evil-0.027
 cries-0.027
 unfamiliar-0.026
Token but
Token ID475
Feature activation-2.55e-01

Pos logit contributions
ip+0.086
ipop+0.084
burn+0.080
ugg+0.080
rav+0.079
Neg logit contributions
 noise-0.039
 unfamiliar-0.029
 cries-0.029
 awful-0.028
 unknown-0.028
Token they
Token ID484
Feature activation-1.42e+00

Pos logit contributions
 noise+0.004
 awful+0.004
 unfamiliar+0.004
 odd+0.004
 unknown+0.004
Neg logit contributions
ip-0.007
burn-0.007
ipop-0.007
Whe-0.006
coat-0.006
Token did
Token ID750
Feature activation-1.74e+00

Pos logit contributions
 noise+0.024
 awful+0.021
 cries+0.021
 evil+0.020
 unfamiliar+0.020
Neg logit contributions
ipop-0.037
ip-0.037
burn-0.036
 sunscreen-0.035
 Sn-0.034
Token not
Token ID407
Feature activation-2.43e+00

Pos logit contributions
 noise+0.017
 awful+0.013
 cries+0.013
 evil+0.012
 unfamiliar+0.011
Neg logit contributions
ipop-0.060
ip-0.059
burn-0.058
 sunscreen-0.056
 Sn-0.055
Token notice
Token ID4003
Feature activation-3.31e+00

Pos logit contributions
 noise+0.048
 awful+0.044
 cries+0.044
 evil+0.042
 unfamiliar+0.041
Neg logit contributions
ipop-0.054
burn-0.054
ip-0.053
 sunscreen-0.052
 Sn-0.049
Token the
Token ID262
Feature activation-3.30e+00

Pos logit contributions
 noise+0.045
 awful+0.037
 cries+0.037
 evil+0.036
 unfamiliar+0.035
Neg logit contributions
ipop-0.098
ip-0.098
burn-0.096
 sunscreen-0.093
 Sn-0.091
Token time
Token ID640
Feature activation-1.75e+00

Pos logit contributions
 noise+0.057
 awful+0.048
 cries+0.048
 truth+0.047
 evil+0.047
Neg logit contributions
ip-0.091
ipop-0.091
burn-0.087
 sunscreen-0.083
uggets-0.082
Token.
Token ID13
Feature activation-7.41e-01

Pos logit contributions
 noise+0.023
 awful+0.019
 cries+0.019
 evil+0.018
 unfamiliar+0.018
Neg logit contributions
ipop-0.053
ip-0.053
burn-0.052
 sunscreen-0.050
 Sn-0.049
Token Soon
Token ID15894
Feature activation+1.61e+00

Pos logit contributions
 noise+0.010
 truth+0.008
 cries+0.008
 awful+0.008
 wonders+0.008
Neg logit contributions
ipop-0.023
burn-0.023
ip-0.022
 sunscreen-0.021
coat-0.021
Token,
Token ID11
Feature activation+1.41e+00

Pos logit contributions
burn+0.052
ipop+0.052
ip+0.051
coat+0.051
ugg+0.051
Neg logit contributions
 noise-0.019
 trouble-0.014
 cries-0.014
 awful-0.014
 situation-0.014
Token mom
Token ID1995
Feature activation-1.25e+00

Pos logit contributions
ipop+0.046
ip+0.046
ugg+0.044
coat+0.044
etsy+0.043
Neg logit contributions
 noise-0.021
 trouble-0.016
 truth-0.015
 unknown-0.015
 unfamiliar-0.014
Token.
Token ID13
Feature activation-9.52e-01

Pos logit contributions
 noise+0.034
 cries+0.027
 unfamiliar+0.027
 awful+0.026
 truth+0.026
Neg logit contributions
ipop-0.070
ip-0.067
ugg-0.062
rav-0.062
burn-0.062
Token The
Token ID383
Feature activation-9.06e-01

Pos logit contributions
 noise+0.015
 cries+0.013
 awful+0.012
 truth+0.012
 situation+0.012
Neg logit contributions
ipop-0.027
ip-0.027
burn-0.026
 sunscreen-0.025
uggets-0.024
Token little
Token ID1310
Feature activation-1.28e+00

Pos logit contributions
 noise+0.010
 situation+0.006
 truth+0.006
 unknown+0.006
 evil+0.006
Neg logit contributions
ip-0.034
uggets-0.033
burn-0.033
ipop-0.032
 Sn-0.031
Token girl
Token ID2576
Feature activation-1.92e+00

Pos logit contributions
 noise+0.030
 cries+0.026
 situation+0.026
 wonders+0.026
 truth+0.025
Neg logit contributions
ip-0.027
ipop-0.026
uggets-0.025
burn-0.025
 sunscreen-0.024
Token was
Token ID373
Feature activation-1.42e+00

Pos logit contributions
 noise+0.031
 awful+0.027
 cries+0.026
 evil+0.026
 unfamiliar+0.025
Neg logit contributions
ipop-0.052
ip-0.052
burn-0.050
 sunscreen-0.049
 Sn-0.048
Token sad
Token ID6507
Feature activation-3.82e+00

Pos logit contributions
 noise+0.020
 cries+0.017
 awful+0.017
 situation+0.017
 truth+0.016
Neg logit contributions
burn-0.039
ip-0.039
ipop-0.039
 sunscreen-0.038
 Sn-0.037
Token,
Token ID11
Feature activation-2.02e-01

Pos logit contributions
 noise+0.070
 evil+0.067
 awful+0.066
 unfamiliar+0.063
 odd+0.062
Neg logit contributions
ip-0.092
ipop-0.091
burn-0.087
 sunscreen-0.082
 Sn-0.081
Token but
Token ID475
Feature activation-6.10e-01

Pos logit contributions
 noise+0.003
 awful+0.003
 cries+0.003
 evil+0.003
 truth+0.003
Neg logit contributions
ipop-0.005
ip-0.005
burn-0.005
 sunscreen-0.005
 Sn-0.005
Token she
Token ID673
Feature activation-1.17e+00

Pos logit contributions
 noise+0.013
 awful+0.011
 cries+0.011
 evil+0.011
 unfamiliar+0.011
Neg logit contributions
ipop-0.014
ip-0.014
burn-0.013
 sunscreen-0.013
 Sn-0.012
Token waited
Token ID13488
Feature activation-7.99e-01

Pos logit contributions
 noise+0.019
 awful+0.017
 cries+0.017
 evil+0.016
 unfamiliar+0.016
Neg logit contributions
ipop-0.031
ip-0.031
burn-0.030
 sunscreen-0.029
 Sn-0.028
Token.
Token ID13
Feature activation-6.97e-01

Pos logit contributions
 noise+0.015
 awful+0.013
 cries+0.013
 evil+0.013
 unfamiliar+0.013
Neg logit contributions
ipop-0.020
ip-0.019
burn-0.019
 sunscreen-0.018
 Sn-0.018