TOP ACTIVATIONS
MAX = 0.080

. They told them to clean up the mess and to
other . They help to clean up the papers . They
smile . They decide to clean up the mess together .
them away . Can we clean our room later ?"
friends . How about we clean your old basketball and make
to stop fighting and to clean up the blocks . She
bag of peanuts and they spilled all over the floor .
a big mess . They spilled the polish on the bed
decide to work together and clean up the mess . They
says he has to help clean up . She says he
She said they had to clean up the mess . She
Mom . She helped Mom clean up the soup and the
She said they had to clean up the mess .
tells Lily and Tom to clean up the mess and to
nervous anymore when asked to clean the garage . <|endoftext|> Once
. " How about we clean up the blocks and then
. Mom told them to clean up the blocks and make
mom . They promise to clean up . They turn off
mess and asked her to clean it up . But Lily
and say sorry . They clean up their rooms and their

MIN ACTIVATIONS
MIN = -0.076

yes ! ”
yes , I want to find
Of course ! I ’
Of course you can have a
rust ling in the grass below . It was a present
yes ” and began to
yes ” . The second
said , " Wow ! What is this ?"
Kay la was over joy ed and the two of
sharp object on the ground below . Curious , he quickly
Of course . I would love
Of course . I would love
Of course ! ” he
with joy and said " yes !" From then
of himself and said " yes , I sure was ".
Of course ! Come join us
around in circles . " Why are you running like that
said , " Hi ! What is your name ?"
were reflecting off the lake below . Soon ,
man smiled and said ," Of course I can ! Watch

INTERVAL 0.041 - 0.080
CONTAINS 0.218%

dirty ? You need to clean up after you cook .
asked her , " Why are you sad , Sally ?"
and realized he forgot to wash them . He felt a
! When they finished their mixing , Amy 's
." While washing the dishes , Lily saw

INTERVAL 0.002 - 0.041
CONTAINS 42.673%

said , " Tim my , it 's not nice to
asked for a green spoon . She was so excited to
learn . We want to cook with you . We want
She had a lot of fun spl ashing water and making
and wanted to take it home . The mom

INTERVAL -0.037 - 0.002
CONTAINS 56.647%

ran up the hill and stopped when she reached the top
found a sandy beach . She was so excited to be
brown mushroom . It was so pretty and she wanted to
smaller . She looked at her hand and saw that the
was a little bird named Jack . He was singing and

INTERVAL -0.076 - -0.037
CONTAINS 0.463%

" Sam , Mia , where are you ?" she calls
so spark ly and high in the sky . They wanted
box . Sam , you come with me . We have
and a k ang aroo with pol ka dotted feet .
. She said , " Yes !" So Mom
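
The four INTERVAL ranges above look like equal-width bins spanning MIN to MAX. The sketch below is only a consistency check under that assumption, not the dashboard's own binning code: `equal_width_edges` is a hypothetical helper, and its inputs are the rounded values displayed above, so the true edges may differ in later decimal places.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi (hypothetical helper)."""
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


# Displayed (rounded) extremes from the MIN ACTIVATIONS and TOP ACTIVATIONS headers.
MIN_ACT, MAX_ACT = -0.076, 0.080

# Assumption: four equal-width bins, matching the four INTERVAL sections.
edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)
rounded_edges = [round(e, 3) for e in edges]

# Displayed CONTAINS percentages, listed from the top interval to the bottom one.
percentages = [0.218, 42.673, 56.647, 0.463]
total_pct = sum(percentages)

print(rounded_edges)
print(round(total_pct, 3))
```

Under this assumption the computed edges can be compared directly with the INTERVAL headers, and the CONTAINS percentages should sum to roughly 100% up to display rounding.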
Token.
Token ID13
Feature activation+2.42e-02

Pos logit contributions
lf+0.075
 cartoons+0.068
ilts+0.067
 spa+0.065
ruce+0.063
Neg logit contributions
ho-0.081
sed-0.077
ses-0.072
aped-0.063
rified-0.061
Token They
Token ID1119
Feature activation+3.38e-02

Pos logit contributions
lf+0.042
ilts+0.039
 cartoons+0.038
 spa+0.037
ruce+0.037
Neg logit contributions
ho-0.038
sed-0.036
ses-0.034
ks-0.030
 hope-0.029
Token told
Token ID1297
Feature activation+2.70e-02

Pos logit contributions
lf+0.058
ilts+0.054
 cartoons+0.053
 spa+0.051
ruce+0.051
Neg logit contributions
ho-0.053
sed-0.050
ses-0.047
aped-0.041
ks-0.040
Token them
Token ID606
Feature activation+2.63e-02

Pos logit contributions
lf+0.042
ilts+0.039
 spa+0.037
lore+0.036
During+0.036
Neg logit contributions
ho-0.047
sed-0.044
ses-0.042
 hope-0.038
ks-0.038
Token to
Token ID284
Feature activation+2.94e-02

Pos logit contributions
lf+0.049
ilts+0.046
 spa+0.045
iver+0.045
ruce+0.042
Neg logit contributions
ho-0.039
sed-0.035
ses-0.032
 hope-0.032
ks-0.031
Token clean
Token ID3424
Feature activation+7.62e-02

Pos logit contributions
lf+0.044
ilts+0.040
 cartoons+0.039
 spa+0.039
ruce+0.037
Neg logit contributions
ho-0.053
sed-0.050
ses-0.048
 hope-0.042
aped-0.042
Token up
Token ID510
Feature activation+2.02e-02

Pos logit contributions
lf+0.134
ilts+0.126
 spa+0.120
During+0.119
ruce+0.118
Neg logit contributions
ho-0.114
sed-0.105
ses-0.100
 hope-0.090
ks-0.090
Token the
Token ID262
Feature activation+3.78e-02

Pos logit contributions
lf+0.028
ilts+0.025
 cartoons+0.025
 spa+0.024
ruce+0.023
Neg logit contributions
ho-0.039
sed-0.037
ses-0.035
aped-0.032
ks-0.031
Token mess
Token ID2085
Feature activation+4.89e-02

Pos logit contributions
ilts+0.035
lf+0.035
 spa+0.032
iver+0.032
lore+0.032
Neg logit contributions
sed-0.082
ho-0.081
ses-0.079
 ear-0.073
 hope-0.073
Token and
Token ID290
Feature activation+3.35e-02

Pos logit contributions
lf+0.070
ilts+0.063
 cartoons+0.062
 spa+0.059
ruce+0.058
Neg logit contributions
ho-0.092
sed-0.089
ses-0.084
aped-0.074
 hope-0.072
Token to
Token ID284
Feature activation+3.73e-02

Pos logit contributions
lf+0.042
ilts+0.037
 cartoons+0.037
 spa+0.035
During+0.034
Neg logit contributions
ho-0.069
sed-0.066
ses-0.063
aped-0.057
ks-0.056
Token other
Token ID584
Feature activation+3.44e-02

Pos logit contributions
lf+0.002
 cartoons+0.001
ilts+0.001
During+0.001
 bath+0.001
Neg logit contributions
ho-0.002
sed-0.002
ses-0.002
aped-0.002
ks-0.002
Token.
Token ID13
Feature activation+2.29e-02

Pos logit contributions
lf+0.051
ilts+0.047
 cartoons+0.047
 spa+0.045
 pack+0.043
Neg logit contributions
ho-0.063
sed-0.061
ses-0.057
aped-0.050
ks-0.048
Token They
Token ID1119
Feature activation+2.37e-02

Pos logit contributions
lf+0.027
 cartoons+0.026
ilts+0.026
During+0.025
 spa+0.024
Neg logit contributions
ho-0.048
sed-0.045
ses-0.042
aped-0.039
rified-0.038
Token help
Token ID1037
Feature activation+2.96e-02

Pos logit contributions
lf+0.040
ilts+0.037
 cartoons+0.037
 spa+0.036
ruce+0.035
Neg logit contributions
ho-0.038
sed-0.036
ses-0.034
aped-0.029
ks-0.029
Token to
Token ID284
Feature activation+3.57e-02

Pos logit contributions
 cartoons+0.036
lf+0.036
 pack+0.032
 organize+0.032
 chores+0.032
Neg logit contributions
ho-0.062
sed-0.062
ses-0.059
 pad-0.052
rified-0.051
Token clean
Token ID3424
Feature activation+7.60e-02

Pos logit contributions
lf+0.039
ilts+0.036
 spa+0.034
 cartoons+0.033
ruce+0.032
Neg logit contributions
ho-0.077
sed-0.074
ses-0.071
 hope-0.067
 ear-0.064
Token up
Token ID510
Feature activation+2.58e-02

Pos logit contributions
lf+0.131
ilts+0.122
 spa+0.119
During+0.117
 cartoons+0.115
Neg logit contributions
ho-0.116
sed-0.108
ses-0.103
 hope-0.093
ks-0.092
Token the
Token ID262
Feature activation+2.77e-02

Pos logit contributions
lf+0.038
ilts+0.035
 spa+0.034
 cartoons+0.033
ruce+0.032
Neg logit contributions
ho-0.047
sed-0.044
ses-0.042
ks-0.038
 hope-0.038
Token papers
Token ID9473
Feature activation+2.64e-02

Pos logit contributions
lf+0.021
ilts+0.019
 spa+0.016
 cartoons+0.016
lore+0.016
Neg logit contributions
ho-0.070
sed-0.067
ses-0.065
 hope-0.060
ks-0.059
Token.
Token ID13
Feature activation+1.97e-02

Pos logit contributions
lf+0.042
ilts+0.039
 cartoons+0.038
 spa+0.038
lore+0.036
Neg logit contributions
ho-0.045
sed-0.042
ses-0.040
ks-0.035
 hope-0.035
Token They
Token ID1119
Feature activation+2.13e-02

Pos logit contributions
lf+0.026
ilts+0.025
 cartoons+0.024
 spa+0.023
During+0.023
Neg logit contributions
ho-0.039
sed-0.037
ses-0.034
aped-0.031
rified-0.031
Token smile
Token ID8212
Feature activation-8.11e-03

Pos logit contributions
 cartoons+0.012
lf+0.012
 pack+0.012
 organize+0.011
 chores+0.011
Neg logit contributions
ho-0.016
sed-0.016
ses-0.015
aped-0.014
rified-0.014
Token.
Token ID13
Feature activation+1.43e-02

Pos logit contributions
ho+0.017
sed+0.016
ses+0.015
aped+0.014
rified+0.013
Neg logit contributions
 cartoons-0.010
lf-0.009
ilts-0.009
 bath-0.009
 pack-0.009
Token They
Token ID1119
Feature activation+1.90e-02

Pos logit contributions
 cartoons+0.016
lf+0.016
ilts+0.016
 spa+0.015
 films+0.015
Neg logit contributions
ho-0.031
sed-0.028
ses-0.027
aped-0.025
rified-0.024
Token decide
Token ID5409
Feature activation+1.16e-02

Pos logit contributions
lf+0.030
ilts+0.028
 cartoons+0.028
 pack+0.028
 spa+0.027
Neg logit contributions
ho-0.031
sed-0.030
ses-0.028
aped-0.025
ks-0.024
Token to
Token ID284
Feature activation+2.73e-02

Pos logit contributions
lf+0.020
ilts+0.019
 spa+0.017
 cartoons+0.017
iver+0.017
Neg logit contributions
ho-0.019
sed-0.017
ses-0.016
 hope-0.015
ks-0.015
Token clean
Token ID3424
Feature activation+7.40e-02

Pos logit contributions
lf+0.039
ilts+0.036
 spa+0.035
 cartoons+0.035
ruce+0.033
Neg logit contributions
ho-0.051
sed-0.047
ses-0.045
 hope-0.041
 ear-0.040
Token up
Token ID510
Feature activation+2.08e-02

Pos logit contributions
lf+0.112
ilts+0.103
 cartoons+0.102
 spa+0.099
ruce+0.096
Neg logit contributions
ho-0.133
sed-0.127
ses-0.119
aped-0.106
ks-0.103
Token the
Token ID262
Feature activation+3.18e-02

Pos logit contributions
lf+0.026
 cartoons+0.024
ilts+0.023
 films+0.022
 spa+0.022
Neg logit contributions
ho-0.043
sed-0.042
ses-0.040
aped-0.036
rified-0.035
Token mess
Token ID2085
Feature activation+4.46e-02

Pos logit contributions
ilts+0.032
lf+0.031
 spa+0.030
lore+0.029
iver+0.029
Neg logit contributions
ho-0.068
sed-0.066
ses-0.064
 hope-0.059
 ear-0.058
Token together
Token ID1978
Feature activation+4.34e-03

Pos logit contributions
lf+0.067
 cartoons+0.061
ilts+0.059
 spa+0.055
 films+0.055
Neg logit contributions
ho-0.083
sed-0.082
ses-0.077
aped-0.066
rified-0.064
Token.
Token ID13
Feature activation+1.45e-02

Pos logit contributions
lf+0.005
 cartoons+0.004
 organize+0.004
 homework+0.004
 pack+0.004
Neg logit contributions
sed-0.010
ho-0.009
ses-0.009
rified-0.008
aped-0.008
Token them
Token ID606
Feature activation-3.86e-03

Pos logit contributions
lf+0.028
ilts+0.026
 cartoons+0.026
 spa+0.025
ruce+0.024
Neg logit contributions
ho-0.040
sed-0.039
ses-0.037
aped-0.033
ks-0.032
Token away
Token ID1497
Feature activation+1.75e-03

Pos logit contributions
ho+0.004
 hope+0.003
sed+0.003
ed+0.003
ses+0.003
Neg logit contributions
lf-0.009
ilts-0.008
 spa-0.008
iver-0.008
ruce-0.007
Token.
Token ID13
Feature activation+1.76e-02

Pos logit contributions
lf+0.003
ilts+0.003
ruce+0.003
 spa+0.003
lore+0.003
Neg logit contributions
ho-0.002
sed-0.002
ses-0.002
 hope-0.002
 ear-0.002
Token Can
Token ID1680
Feature activation+4.48e-03

Pos logit contributions
lf+0.033
ilts+0.032
 cartoons+0.031
 spa+0.030
During+0.030
Neg logit contributions
ho-0.025
sed-0.023
ses-0.021
aped-0.018
ks-0.018
Token we
Token ID356
Feature activation+1.53e-02

Pos logit contributions
ilts+0.009
iver+0.008
lf+0.008
 spa+0.008
ruce+0.008
Neg logit contributions
ho-0.006
sed-0.006
ses-0.005
 hope-0.004
 Hello-0.004
Token clean
Token ID3424
Feature activation+7.29e-02

Pos logit contributions
lf+0.028
irens+0.027
 films+0.027
ilts+0.026
 spa+0.025
Neg logit contributions
ho-0.022
 hope-0.020
sed-0.018
 ear-0.017
Deal-0.017
Token our
Token ID674
Feature activation+3.17e-02

Pos logit contributions
lf+0.136
ilts+0.135
 opener+0.132
During+0.131
 spa+0.128
Neg logit contributions
ho-0.086
sed-0.079
 hope-0.074
ses-0.073
 ear-0.072
Token room
Token ID2119
Feature activation+5.23e-02

Pos logit contributions
lf+0.038
 spa+0.035
 opener+0.035
ilts+0.035
During+0.034
Neg logit contributions
sed-0.063
ses-0.060
ho-0.058
 ear-0.056
 hope-0.053
Token later
Token ID1568
Feature activation+2.90e-02

Pos logit contributions
lf+0.077
ilts+0.075
 spa+0.071
 films+0.068
 cartoons+0.068
Neg logit contributions
ho-0.085
sed-0.081
ses-0.075
 hope-0.075
ed-0.070
Token?"
Token ID1701
Feature activation+2.93e-03

Pos logit contributions
ilts+0.029
lf+0.028
lore+0.024
iver+0.023
 product+0.023
Neg logit contributions
ho-0.061
sed-0.060
ses-0.055
ed-0.053
 hope-0.052
Token \n
Token ID198
Feature activation+7.75e-03

Pos logit contributions
lf+0.005
ilts+0.005
 cartoons+0.004
 spa+0.004
ruce+0.004
Neg logit contributions
ho-0.005
sed-0.005
ses-0.004
 hope-0.004
ks-0.004
Token friends
Token ID2460
Feature activation+1.09e-02

Pos logit contributions
lf+0.021
ilts+0.019
 cartoons+0.018
 spa+0.017
ruce+0.017
Neg logit contributions
ho-0.041
sed-0.039
ses-0.038
aped-0.034
ks-0.034
Token.
Token ID13
Feature activation+1.45e-02

Pos logit contributions
lf+0.018
ilts+0.017
 cartoons+0.016
 spa+0.016
ruce+0.015
Neg logit contributions
ho-0.018
sed-0.017
ses-0.016
ks-0.014
 hope-0.014
Token How
Token ID1374
Feature activation-3.30e-02

Pos logit contributions
lf+0.028
ilts+0.027
 cartoons+0.026
 spa+0.026
During+0.025
Neg logit contributions
ho-0.019
sed-0.018
ses-0.017
aped-0.014
ks-0.014
Token about
Token ID546
Feature activation+1.61e-02

Pos logit contributions
ho+0.049
sed+0.046
ses+0.043
aped+0.036
ks+0.036
Neg logit contributions
lf-0.060
ilts-0.056
 cartoons-0.055
 spa-0.054
ruce-0.053
Token we
Token ID356
Feature activation+2.55e-02

Pos logit contributions
lf+0.028
iver+0.027
ilts+0.027
 spa+0.026
ruce+0.025
Neg logit contributions
ho-0.022
sed-0.022
ses-0.020
 Yes-0.018
 Hello-0.018
Token clean
Token ID3424
Feature activation+7.22e-02

Pos logit contributions
lf+0.047
ilts+0.043
 films+0.040
 spa+0.040
 product+0.040
Neg logit contributions
ho-0.039
sed-0.035
 hope-0.033
ses-0.031
Deal-0.030
Token your
Token ID534
Feature activation+2.20e-02

Pos logit contributions
lf+0.126
ilts+0.118
 spa+0.116
During+0.114
 opener+0.111
Neg logit contributions
ho-0.103
sed-0.096
ses-0.090
 hope-0.085
ed-0.084
Token old
Token ID1468
Feature activation+4.16e-02

Pos logit contributions
lf+0.026
ilts+0.025
 spa+0.024
During+0.023
ruce+0.022
Neg logit contributions
sed-0.044
ho-0.042
ses-0.041
 ear-0.038
 hope-0.036
Token basketball
Token ID9669
Feature activation+3.99e-02

Pos logit contributions
ilts+0.045
lf+0.044
ruce+0.041
During+0.041
 spa+0.040
Neg logit contributions
sed-0.083
ho-0.082
ses-0.078
 ear-0.073
 pad-0.071
Token and
Token ID290
Feature activation+1.69e-02

Pos logit contributions
lf+0.049
ilts+0.048
 cartoons+0.044
 spa+0.044
 films+0.042
Neg logit contributions
ho-0.077
sed-0.072
ses-0.068
ed-0.065
Deal-0.063
Token make
Token ID787
Feature activation+3.06e-02

Pos logit contributions
lf+0.022
 cartoons+0.020
ilts+0.020
 spa+0.019
 pack+0.019
Neg logit contributions
ho-0.034
sed-0.032
ses-0.031
aped-0.027
rified-0.027
Token to
Token ID284
Feature activation+2.11e-02

Pos logit contributions
lf+0.046
ilts+0.043
 spa+0.041
 cartoons+0.040
iver+0.039
Neg logit contributions
ho-0.046
sed-0.043
ses-0.040
 hope-0.039
ks-0.038
Token stop
Token ID2245
Feature activation+1.46e-02

Pos logit contributions
lf+0.028
 cartoons+0.025
ilts+0.024
 pack+0.024
 organize+0.024
Neg logit contributions
ho-0.042
sed-0.041
ses-0.039
ed-0.034
aped-0.034
Token fighting
Token ID4330
Feature activation+1.73e-02

Pos logit contributions
lf+0.017
 cartoons+0.017
 bath+0.015
ilts+0.015
 films+0.014
Neg logit contributions
ho-0.030
sed-0.030
ses-0.029
Deal-0.026
rified-0.026
Token and
Token ID290
Feature activation+2.33e-02

Pos logit contributions
lf+0.024
 cartoons+0.022
ilts+0.021
During+0.020
ruce+0.020
Neg logit contributions
ho-0.034
sed-0.033
ses-0.031
aped-0.028
rified-0.027
Token to
Token ID284
Feature activation+1.96e-02

Pos logit contributions
lf+0.030
 cartoons+0.026
 organize+0.025
 homework+0.025
 pack+0.024
Neg logit contributions
ho-0.047
sed-0.046
ses-0.044
rified-0.040
 pad-0.039
Token clean
Token ID3424
Feature activation+7.19e-02

Pos logit contributions
lf+0.027
 cartoons+0.025
ilts+0.024
 spa+0.023
 pack+0.023
Neg logit contributions
ho-0.038
sed-0.037
ses-0.034
aped-0.031
rified-0.031
Token up
Token ID510
Feature activation+1.56e-02

Pos logit contributions
lf+0.112
ilts+0.104
 cartoons+0.100
 spa+0.099
ruce+0.097
Neg logit contributions
ho-0.125
sed-0.118
ses-0.111
ks-0.098
aped-0.098
Token the
Token ID262
Feature activation+2.70e-02

Pos logit contributions
lf+0.023
ilts+0.021
 cartoons+0.021
 spa+0.020
ruce+0.020
Neg logit contributions
ho-0.029
sed-0.027
ses-0.026
aped-0.023
ks-0.022
Token blocks
Token ID7021
Feature activation+1.23e-02

Pos logit contributions
lf+0.028
ilts+0.028
 spa+0.027
lore+0.025
 cartoons+0.025
Neg logit contributions
ho-0.056
sed-0.056
ses-0.054
ks-0.048
 hope-0.048
Token.
Token ID13
Feature activation+1.43e-02

Pos logit contributions
lf+0.021
ilts+0.019
 cartoons+0.019
 spa+0.018
ruce+0.018
Neg logit contributions
ho-0.020
sed-0.019
ses-0.018
aped-0.015
ks-0.015
Token She
Token ID1375
Feature activation+1.52e-02

Pos logit contributions
lf+0.022
ilts+0.020
 cartoons+0.020
 spa+0.019
ruce+0.019
Neg logit contributions
ho-0.025
sed-0.024
ses-0.023
aped-0.020
ks-0.020
Token bag
Token ID6131
Feature activation+2.28e-02

Pos logit contributions
lf+0.017
 cartoons+0.015
ilts+0.015
 bath+0.014
 spa+0.014
Neg logit contributions
ho-0.025
sed-0.023
ses-0.022
aped-0.021
ks-0.020
Token of
Token ID286
Feature activation+2.15e-02

Pos logit contributions
lf+0.033
ilts+0.030
 cartoons+0.029
 spa+0.028
ruce+0.028
Neg logit contributions
ho-0.042
sed-0.041
ses-0.039
aped-0.034
ks-0.034
Token peanuts
Token ID45612
Feature activation+1.70e-02

Pos logit contributions
lf+0.028
ilts+0.026
 spa+0.025
lore+0.024
During+0.024
Neg logit contributions
ho-0.042
sed-0.039
 hope-0.037
ses-0.036
 ear-0.034
Token and
Token ID290
Feature activation+9.84e-03

Pos logit contributions
lf+0.027
ilts+0.026
 cartoons+0.025
 spa+0.024
ruce+0.024
Neg logit contributions
ho-0.029
sed-0.027
ses-0.025
aped-0.022
ks-0.022
Token they
Token ID484
Feature activation+3.82e-02

Pos logit contributions
lf+0.015
ilts+0.015
 spa+0.014
iver+0.014
ruce+0.014
Neg logit contributions
ho-0.017
sed-0.016
ses-0.015
 hope-0.013
 ear-0.013
Token spilled
Token ID31896
Feature activation+7.17e-02

Pos logit contributions
lf+0.061
ilts+0.056
 cartoons+0.055
 spa+0.052
ruce+0.052
Neg logit contributions
ho-0.064
sed-0.062
ses-0.059
aped-0.051
ks-0.050
Token all
Token ID477
Feature activation+1.57e-02

Pos logit contributions
lf+0.090
ilts+0.082
 cartoons+0.080
 spa+0.077
ruce+0.075
Neg logit contributions
ho-0.145
sed-0.140
ses-0.133
aped-0.119
ks-0.118
Token over
Token ID625
Feature activation+1.73e-02

Pos logit contributions
lf+0.027
ilts+0.025
 cartoons+0.025
 spa+0.024
lore+0.024
Neg logit contributions
ho-0.025
sed-0.023
ses-0.021
aped-0.019
ks-0.019
Token the
Token ID262
Feature activation+1.23e-02

Pos logit contributions
lf+0.021
 cartoons+0.019
ilts+0.018
 bath+0.017
 product+0.017
Neg logit contributions
sed-0.037
ho-0.037
ses-0.035
aped-0.031
Deal-0.030
Token floor
Token ID4314
Feature activation+2.85e-02

Pos logit contributions
lf+0.012
ilts+0.011
 cartoons+0.011
 spa+0.010
ruce+0.010
Neg logit contributions
ho-0.028
sed-0.027
ses-0.026
aped-0.023
ks-0.023
Token.
Token ID13
Feature activation+1.46e-02

Pos logit contributions
lf+0.044
ilts+0.040
 cartoons+0.039
ruce+0.037
 spa+0.037
Neg logit contributions
ho-0.049
sed-0.048
ses-0.046
aped-0.039
rified-0.039
Token a
Token ID257
Feature activation+1.07e-02

Pos logit contributions
lf+0.032
ilts+0.029
During+0.027
 spa+0.027
lore+0.027
Neg logit contributions
ho-0.038
sed-0.037
ses-0.035
 hope-0.033
ks-0.032
Token big
Token ID1263
Feature activation-1.37e-03

Pos logit contributions
lf+0.010
ilts+0.009
 spa+0.009
ruce+0.008
During+0.008
Neg logit contributions
ho-0.025
sed-0.024
ses-0.023
 hope-0.021
 ear-0.020
Token mess
Token ID2085
Feature activation+4.09e-02

Pos logit contributions
ho+0.004
sed+0.003
ses+0.003
aped+0.003
ks+0.003
Neg logit contributions
lf-0.001
ilts-0.001
 cartoons-0.001
 spa-0.001
ruce-0.001
Token.
Token ID13
Feature activation+1.15e-02

Pos logit contributions
lf+0.071
ilts+0.068
 spa+0.065
 cartoons+0.064
During+0.064
Neg logit contributions
ho-0.062
sed-0.057
ses-0.054
ks-0.048
aped-0.047
Token They
Token ID1119
Feature activation+3.04e-02

Pos logit contributions
lf+0.019
ilts+0.017
 cartoons+0.017
 spa+0.017
ruce+0.016
Neg logit contributions
ho-0.019
sed-0.018
ses-0.017
ks-0.015
aped-0.015
Token spilled
Token ID31896
Feature activation+7.16e-02

Pos logit contributions
lf+0.054
ilts+0.049
 cartoons+0.049
 pack+0.047
 spa+0.047
Neg logit contributions
ho-0.045
sed-0.044
ses-0.041
aped-0.035
ks-0.033
Token the
Token ID262
Feature activation+2.84e-02

Pos logit contributions
lf+0.098
ilts+0.090
 cartoons+0.086
 spa+0.085
ruce+0.082
Neg logit contributions
ho-0.138
sed-0.131
ses-0.125
aped-0.111
ks-0.111
Token polish
Token ID25245
Feature activation+3.41e-02

Pos logit contributions
lf+0.031
ilts+0.029
 spa+0.026
 cartoons+0.026
During+0.026
Neg logit contributions
ho-0.061
sed-0.060
ses-0.057
 hope-0.052
 ear-0.051
Token on
Token ID319
Feature activation+7.67e-04

Pos logit contributions
lf+0.053
ilts+0.049
 cartoons+0.048
 spa+0.047
ruce+0.046
Neg logit contributions
ho-0.059
sed-0.056
ses-0.053
aped-0.046
ks-0.046
Token the
Token ID262
Feature activation+6.98e-03

Pos logit contributions
lf+0.001
ilts+0.001
 spa+0.001
 cartoons+0.001
ruce+0.001
Neg logit contributions
ho-0.001
sed-0.001
ses-0.001
 hope-0.001
ks-0.001
Token bed
Token ID3996
Feature activation+2.44e-02

Pos logit contributions
lf+0.005
 cartoons+0.004
 spa+0.003
 bath+0.003
ilts+0.003
Neg logit contributions
ho-0.019
sed-0.018
ses-0.017
aped-0.017
ks-0.017
Token decide
Token ID5409
Feature activation+1.26e-02

Pos logit contributions
lf+0.033
 pack+0.032
 cartoons+0.032
ilts+0.032
 chores+0.031
Neg logit contributions
ho-0.043
sed-0.042
ses-0.040
aped-0.036
ks-0.035
Token to
Token ID284
Feature activation+2.47e-02

Pos logit contributions
lf+0.022
ilts+0.022
iver+0.020
 spa+0.020
 cartoons+0.020
Neg logit contributions
ho-0.019
sed-0.018
ses-0.016
 hope-0.016
ks-0.015
Token work
Token ID670
Feature activation+4.21e-02

Pos logit contributions
lf+0.036
ilts+0.032
 spa+0.031
 cartoons+0.031
ruce+0.030
Neg logit contributions
ho-0.046
sed-0.043
ses-0.040
 hope-0.038
 ear-0.037
Token together
Token ID1978
Feature activation+7.71e-03

Pos logit contributions
lf+0.053
 cartoons+0.051
 homework+0.049
ilts+0.047
 spa+0.046
Neg logit contributions
ho-0.085
sed-0.083
ses-0.078
rified-0.071
 pad-0.070
Token and
Token ID290
Feature activation+2.66e-02

Pos logit contributions
lf+0.009
 cartoons+0.008
 opener+0.008
 organize+0.007
 homework+0.007
Neg logit contributions
sed-0.016
ho-0.016
ses-0.016
rified-0.014
aped-0.014
Token clean
Token ID3424
Feature activation+7.15e-02

Pos logit contributions
lf+0.035
 cartoons+0.031
 pack+0.030
 organize+0.030
 homework+0.030
Neg logit contributions
ho-0.053
sed-0.051
ses-0.048
rified-0.045
aped-0.044
Token up
Token ID510
Feature activation+2.01e-02

Pos logit contributions
lf+0.111
ilts+0.103
 cartoons+0.100
 spa+0.098
ruce+0.096
Neg logit contributions
ho-0.125
sed-0.119
ses-0.112
aped-0.099
ks-0.098
Token the
Token ID262
Feature activation+3.35e-02

Pos logit contributions
lf+0.026
 cartoons+0.024
ilts+0.024
 spa+0.023
 films+0.022
Neg logit contributions
ho-0.041
sed-0.039
ses-0.037
aped-0.033
rified-0.032
Token mess
Token ID2085
Feature activation+4.68e-02

Pos logit contributions
lf+0.033
ilts+0.031
lore+0.029
During+0.028
iver+0.027
Neg logit contributions
ho-0.073
sed-0.069
ses-0.066
 hope-0.064
 ear-0.061
Token.
Token ID13
Feature activation+1.61e-02

Pos logit contributions
lf+0.073
 cartoons+0.066
ilts+0.065
 spa+0.062
 films+0.061
Neg logit contributions
ho-0.083
sed-0.082
ses-0.077
aped-0.066
rified-0.064
Token They
Token ID1119
Feature activation+2.28e-02

Pos logit contributions
lf+0.025
ilts+0.023
 cartoons+0.022
 spa+0.022
ruce+0.021
Neg logit contributions
ho-0.029
sed-0.027
ses-0.026
aped-0.023
ks-0.023
Token says
Token ID1139
Feature activation-4.09e-04

Pos logit contributions
lf+0.041
 cartoons+0.039
ilts+0.039
 spa+0.038
ruce+0.038
Neg logit contributions
ho-0.030
sed-0.028
ses-0.025
aped-0.022
rified-0.021
Token he
Token ID339
Feature activation+1.75e-02

Pos logit contributions
ho+0.001
sed+0.001
ses+0.001
aped+0.001
ks+0.001
Neg logit contributions
lf-0.000
 cartoons-0.000
 bath-0.000
ilts-0.000
 spa-0.000
Token has
Token ID468
Feature activation+2.12e-02

Pos logit contributions
lf+0.032
ilts+0.031
 homework+0.031
 cartoons+0.030
 spa+0.030
Neg logit contributions
ho-0.024
sed-0.021
ses-0.019
rified-0.017
 ear-0.017
Token to
Token ID284
Feature activation+3.65e-02

Pos logit contributions
lf+0.028
ilts+0.026
 cartoons+0.025
 spa+0.024
 homework+0.024
Neg logit contributions
ho-0.041
sed-0.039
ses-0.037
rified-0.033
aped-0.033
Token help
Token ID1037
Feature activation+3.35e-02

Pos logit contributions
lf+0.059
ilts+0.055
 spa+0.053
 cartoons+0.052
lore+0.052
Neg logit contributions
ho-0.060
sed-0.056
ses-0.053
 hope-0.050
 ear-0.048
Token clean
Token ID3424
Feature activation+7.14e-02

Pos logit contributions
lf+0.051
ilts+0.047
 cartoons+0.046
 spa+0.045
ruce+0.043
Neg logit contributions
ho-0.060
sed-0.058
ses-0.054
aped-0.048
ks-0.047
Token up
Token ID510
Feature activation+1.95e-02

Pos logit contributions
lf+0.117
ilts+0.110
 spa+0.105
During+0.104
ruce+0.103
Neg logit contributions
ho-0.115
sed-0.106
ses-0.101
 hope-0.091
ks-0.091
Token.
Token ID13
Feature activation+2.39e-02

Pos logit contributions
lf+0.029
ilts+0.027
 cartoons+0.026
 spa+0.025
ruce+0.025
Neg logit contributions
ho-0.035
sed-0.033
ses-0.032
ks-0.028
aped-0.028
Token She
Token ID1375
Feature activation+1.98e-02

Pos logit contributions
lf+0.037
ilts+0.034
 cartoons+0.033
 spa+0.032
ruce+0.032
Neg logit contributions
ho-0.042
sed-0.040
ses-0.038
aped-0.033
ks-0.033
Token says
Token ID1139
Feature activation-4.02e-04

Pos logit contributions
lf+0.041
ilts+0.039
 cartoons+0.038
 spa+0.037
ruce+0.036
Neg logit contributions
ho-0.024
sed-0.023
ses-0.021
ks-0.017
 hope-0.017
Token he
Token ID339
Feature activation+1.65e-02

Pos logit contributions
ho+0.001
sed+0.001
ses+0.001
aped+0.001
ks+0.001
Neg logit contributions
lf-0.000
 cartoons-0.000
ilts-0.000
 spa-0.000
 bath-0.000
Token She
Token ID1375
Feature activation+1.76e-02

Pos logit contributions
lf+0.041
ilts+0.037
 spa+0.035
ruce+0.035
 cartoons+0.035
Neg logit contributions
ho-0.030
sed-0.029
ses-0.028
ks-0.024
 hope-0.024
Token said
Token ID531
Feature activation+1.12e-02

Pos logit contributions
lf+0.027
 cartoons+0.024
ilts+0.023
During+0.023
ruce+0.023
Neg logit contributions
ho-0.031
sed-0.029
ses-0.028
aped-0.025
rified-0.024
Token they
Token ID484
Feature activation+3.36e-02

Pos logit contributions
lf+0.016
ilts+0.016
 spa+0.015
iver+0.015
ruce+0.014
Neg logit contributions
ho-0.021
sed-0.020
ses-0.018
 hope-0.017
ks-0.017
Token had
Token ID550
Feature activation+2.47e-02

Pos logit contributions
lf+0.064
ilts+0.060
 cartoons+0.059
During+0.057
ruce+0.057
Neg logit contributions
ho-0.046
sed-0.044
ses-0.041
aped-0.034
rified-0.033
Token to
Token ID284
Feature activation+3.68e-02

Pos logit contributions
lf+0.038
ilts+0.035
 cartoons+0.034
 spa+0.034
ruce+0.033
Neg logit contributions
ho-0.044
sed-0.042
ses-0.039
aped-0.034
ks-0.034
Token clean
Token ID3424
Feature activation+7.14e-02

Pos logit contributions
lf+0.067
ilts+0.060
lore+0.059
 spa+0.058
ruce+0.058
Neg logit contributions
ho-0.052
 hope-0.047
sed-0.047
ses-0.045
 ear-0.044
Token up
Token ID510
Feature activation+1.86e-02

Pos logit contributions
lf+0.131
ilts+0.122
During+0.118
 spa+0.117
 opener+0.116
Neg logit contributions
ho-0.101
sed-0.092
ses-0.088
 hope-0.082
 ear-0.080
Token the
Token ID262
Feature activation+3.38e-02

Pos logit contributions
lf+0.027
ilts+0.024
 cartoons+0.024
 spa+0.023
ruce+0.023
Neg logit contributions
ho-0.035
sed-0.033
ses-0.031
ks-0.028
aped-0.028
Token mess
Token ID2085
Feature activation+4.87e-02

Pos logit contributions
lf+0.033
ilts+0.031
lore+0.030
During+0.028
iver+0.027
Neg logit contributions
ho-0.073
sed-0.072
ses-0.069
 hope-0.063
 de-0.061
Token.
Token ID13
Feature activation+2.19e-02

Pos logit contributions
lf+0.077
ilts+0.071
 cartoons+0.070
 spa+0.068
ruce+0.066
Neg logit contributions
ho-0.084
sed-0.080
ses-0.075
aped-0.066
ks-0.065
Token She
Token ID1375
Feature activation+2.14e-02

Pos logit contributions
lf+0.042
ilts+0.038
 spa+0.037
ruce+0.036
 product+0.036
Neg logit contributions
ho-0.029
sed-0.029
ses-0.027
ks-0.023
 hope-0.023
Token Mom
Token ID11254
Feature activation+2.29e-02

Pos logit contributions
lf+0.016
 cartoons+0.015
ilts+0.015
 spa+0.014
ruce+0.014
Neg logit contributions
ho-0.023
sed-0.022
ses-0.020
aped-0.018
ks-0.018
Token.
Token ID13
Feature activation+1.94e-02

Pos logit contributions
lf+0.030
 cartoons+0.029
 homework+0.026
 chores+0.026
ilts+0.026
Neg logit contributions
sed-0.046
ho-0.045
ses-0.043
aped-0.039
rified-0.036
Token She
Token ID1375
Feature activation+1.65e-02

Pos logit contributions
lf+0.021
 cartoons+0.021
ilts+0.020
 spa+0.019
 bath+0.019
Neg logit contributions
ho-0.042
sed-0.038
ses-0.037
ks-0.034
rified-0.034
Token helped
Token ID4193
Feature activation+3.70e-02

Pos logit contributions
lf+0.027
ilts+0.024
 cartoons+0.024
During+0.024
 spa+0.024
Neg logit contributions
ho-0.028
sed-0.025
ses-0.024
aped-0.021
ks-0.021
Token Mom
Token ID11254
Feature activation+3.10e-02

Pos logit contributions
lf+0.056
ilts+0.052
 spa+0.049
 cartoons+0.049
lore+0.048
Neg logit contributions
ho-0.066
sed-0.062
ses-0.059
 hope-0.052
ks-0.052
Token clean
Token ID3424
Feature activation+7.14e-02

Pos logit contributions
lf+0.041
ilts+0.036
 spa+0.034
lore+0.033
 cartoons+0.033
Neg logit contributions
ho-0.059
 hope-0.055
sed-0.054
ses-0.052
ks-0.052
Token up
Token ID510
Feature activation+1.70e-02

Pos logit contributions
lf+0.129
ilts+0.122
During+0.118
lore+0.116
 spa+0.115
Neg logit contributions
ho-0.101
sed-0.091
ses-0.087
 hope-0.081
 ear-0.080
Token the
Token ID262
Feature activation+3.65e-02

Pos logit contributions
lf+0.027
ilts+0.025
 spa+0.024
lore+0.024
ruce+0.023
Neg logit contributions
ho-0.029
sed-0.027
ses-0.025
 hope-0.023
ks-0.023
Token soup
Token ID17141
Feature activation+2.12e-02

Pos logit contributions
ilts+0.045
lore+0.045
lf+0.044
iver+0.043
During+0.042
Neg logit contributions
ho-0.064
sed-0.059
 hope-0.055
 ear-0.055
ses-0.055
Token and
Token ID290
Feature activation+1.31e-02

Pos logit contributions
lf+0.033
ilts+0.031
 cartoons+0.030
 spa+0.029
ruce+0.029
Neg logit contributions
ho-0.036
sed-0.035
ses-0.033
aped-0.029
rified-0.028
Token the
Token ID262
Feature activation+1.87e-02

Pos logit contributions
lf+0.015
 cartoons+0.013
 pack+0.012
 bath+0.012
 organize+0.012
Neg logit contributions
ho-0.029
sed-0.028
ses-0.027
rified-0.025
aped-0.025
Token She
Token ID1375
Feature activation+1.28e-02

Pos logit contributions
lf+0.022
ilts+0.020
 cartoons+0.019
 spa+0.019
ruce+0.019
Neg logit contributions
ho-0.024
sed-0.023
ses-0.022
ks-0.019
aped-0.019
Token said
Token ID531
Feature activation+7.20e-03

Pos logit contributions
lf+0.021
ilts+0.019
 cartoons+0.018
 spa+0.018
ruce+0.018
Neg logit contributions
ho-0.021
sed-0.020
ses-0.019
aped-0.017
rified-0.016
Token they
Token ID484
Feature activation+3.32e-02

Pos logit contributions
lf+0.009
 cartoons+0.007
ilts+0.007
 spa+0.007
ruce+0.007
Neg logit contributions
ho-0.015
sed-0.015
ses-0.014
aped-0.013
ks-0.013
Token had
Token ID550
Feature activation+2.45e-02

Pos logit contributions
lf+0.058
ilts+0.055
 cartoons+0.054
During+0.053
ruce+0.053
Neg logit contributions
ho-0.049
sed-0.047
ses-0.044
aped-0.037
rified-0.036
Token to
Token ID284
Feature activation+3.67e-02

Pos logit contributions
lf+0.031
ilts+0.030
 cartoons+0.029
 homework+0.028
 bath+0.028
Neg logit contributions
ho-0.048
sed-0.045
ses-0.043
rified-0.039
aped-0.039
Token clean
Token ID3424
Feature activation+7.13e-02

Pos logit contributions
lf+0.060
ilts+0.055
 spa+0.054
 cartoons+0.053
lore+0.052
Neg logit contributions
ho-0.060
sed-0.056
ses-0.053
 hope-0.051
 ear-0.048
Token up
Token ID510
Feature activation+1.88e-02

Pos logit contributions
lf+0.119
ilts+0.111
 spa+0.106
 cartoons+0.104
During+0.104
Neg logit contributions
ho-0.115
sed-0.108
ses-0.102
 hope-0.091
ks-0.090
Token the
Token ID262
Feature activation+2.72e-02

Pos logit contributions
lf+0.026
ilts+0.023
 cartoons+0.023
 spa+0.022
ruce+0.021
Neg logit contributions
ho-0.036
sed-0.035
ses-0.033
aped-0.029
rified-0.029
Token mess
Token ID2085
Feature activation+4.60e-02

Pos logit contributions
lf+0.026
ilts+0.025
 spa+0.023
 cartoons+0.022
lore+0.021
Neg logit contributions
ho-0.062
sed-0.060
ses-0.057
 hope-0.053
aped-0.051
Token.
Token ID13
Feature activation+1.48e-02

Pos logit contributions
lf+0.072
ilts+0.066
 cartoons+0.065
 spa+0.062
ruce+0.061
Neg logit contributions
ho-0.081
sed-0.079
ses-0.074
aped-0.064
rified-0.061
Token\n
Token ID198
Feature activation+4.38e-03

Pos logit contributions
lf+0.025
ilts+0.023
 cartoons+0.022
 spa+0.022
ruce+0.021
Neg logit contributions
ho-0.024
sed-0.023
ses-0.022
ks-0.019
 hope-0.019
Token tells
Token ID4952
Feature activation+2.01e-02

Pos logit contributions
lf+0.012
 cartoons+0.011
 spa+0.011
ilts+0.011
ruce+0.011
Neg logit contributions
ho-0.010
sed-0.008
ses-0.008
aped-0.007
rified-0.007
Token Lily
Token ID20037
Feature activation+1.60e-02

Pos logit contributions
lf+0.033
ilts+0.032
iver+0.031
 spa+0.030
lore+0.030
Neg logit contributions
ho-0.032
sed-0.030
ses-0.028
 hope-0.026
 ear-0.025
Token and
Token ID290
Feature activation-5.46e-03

Pos logit contributions
lf+0.028
ilts+0.026
 cartoons+0.025
 spa+0.025
lore+0.025
Neg logit contributions
ho-0.025
sed-0.023
ses-0.022
 hope-0.020
ks-0.020
Token Tom
Token ID4186
Feature activation+1.11e-02

Pos logit contributions
ho+0.010
sed+0.009
ses+0.008
rified+0.008
ks+0.008
Neg logit contributions
 cartoons-0.008
lf-0.008
 homework-0.008
 bath-0.007
 chores-0.007
Token to
Token ID284
Feature activation+2.30e-02

Pos logit contributions
lf+0.018
ilts+0.016
 spa+0.016
 cartoons+0.015
lore+0.015
Neg logit contributions
ho-0.019
sed-0.018
ses-0.017
 hope-0.016
ks-0.016
Token clean
Token ID3424
Feature activation+7.12e-02

Pos logit contributions
lf+0.031
ilts+0.028
 cartoons+0.028
 pack+0.027
 organize+0.027
Neg logit contributions
ho-0.045
sed-0.043
ses-0.041
aped-0.037
rified-0.036
Token up
Token ID510
Feature activation+1.40e-02

Pos logit contributions
lf+0.125
ilts+0.119
During+0.114
iver+0.112
ruce+0.111
Neg logit contributions
ho-0.103
sed-0.097
ses-0.092
 ear-0.085
ks-0.083
Token the
Token ID262
Feature activation+3.44e-02

Pos logit contributions
lf+0.021
ilts+0.020
 cartoons+0.019
 spa+0.019
ruce+0.018
Neg logit contributions
ho-0.025
sed-0.024
ses-0.022
ks-0.020
 hope-0.020
Token mess
Token ID2085
Feature activation+4.52e-02

Pos logit contributions
lf+0.042
lore+0.041
ilts+0.040
iver+0.040
ister+0.037
Neg logit contributions
ses-0.058
sed-0.057
 ear-0.055
ks-0.052
 follows-0.052
Token and
Token ID290
Feature activation+1.84e-02

Pos logit contributions
lf+0.076
ilts+0.071
 cartoons+0.069
 spa+0.068
ruce+0.066
Neg logit contributions
ho-0.073
sed-0.069
ses-0.065
aped-0.056
ks-0.056
Token to
Token ID284
Feature activation+3.15e-02

Pos logit contributions
lf+0.023
 cartoons+0.021
ilts+0.020
 spa+0.020
 pack+0.019
Neg logit contributions
ho-0.038
sed-0.036
ses-0.034
aped-0.031
rified-0.031
Token nervous
Token ID10927
Feature activation+1.15e-02

Pos logit contributions
lf+0.018
ilts+0.016
ruce+0.015
 spa+0.015
lore+0.015
Neg logit contributions
ho-0.015
sed-0.015
ses-0.014
 hope-0.014
 ear-0.014
Token anymore
Token ID7471
Feature activation+8.11e-03

Pos logit contributions
lf+0.021
ilts+0.019
 spa+0.018
lore+0.017
ruce+0.017
Neg logit contributions
ho-0.017
sed-0.015
ses-0.014
ks-0.014
 hope-0.014
Token when
Token ID618
Feature activation+9.74e-03

Pos logit contributions
lf+0.012
ilts+0.012
 spa+0.011
ruce+0.011
 cartoons+0.011
Neg logit contributions
ho-0.014
sed-0.013
ses-0.012
ks-0.011
 hope-0.011
Token asked
Token ID1965
Feature activation+1.40e-02

Pos logit contributions
lf+0.012
ilts+0.011
 cartoons+0.010
 spa+0.010
ruce+0.010
Neg logit contributions
ho-0.020
sed-0.019
ses-0.018
aped-0.016
 hope-0.016
Token to
Token ID284
Feature activation+1.25e-02

Pos logit contributions
lf+0.023
ilts+0.022
 spa+0.021
 cartoons+0.020
lore+0.020
Neg logit contributions
ho-0.023
sed-0.021
ses-0.020
 hope-0.018
ks-0.018
Token clean
Token ID3424
Feature activation+7.11e-02

Pos logit contributions
lf+0.014
ilts+0.013
 cartoons+0.012
 spa+0.012
ruce+0.012
Neg logit contributions
ho-0.027
sed-0.026
ses-0.025
aped-0.022
ks-0.022
Token the
Token ID262
Feature activation+2.12e-02

Pos logit contributions
lf+0.124
ilts+0.121
During+0.117
 spa+0.114
 opener+0.111
Neg logit contributions
ho-0.096
sed-0.089
 hope-0.084
ses-0.083
 ear-0.080
Token garage
Token ID15591
Feature activation+2.52e-02

Pos logit contributions
lf+0.024
ilts+0.022
 spa+0.020
 cartoons+0.019
iver+0.019
Neg logit contributions
ho-0.043
sed-0.043
ses-0.041
 hope-0.038
 ear-0.038
Token.
Token ID13
Feature activation+9.11e-03

Pos logit contributions
lf+0.041
ilts+0.039
 spa+0.037
 cartoons+0.037
ruce+0.036
Neg logit contributions
ho-0.041
sed-0.038
ses-0.036
 hope-0.032
ks-0.032
Token<|endoftext|>
Token ID50256
Feature activation+7.93e-03

Pos logit contributions
lf+0.014
ilts+0.013
 cartoons+0.013
 spa+0.013
ruce+0.012
Neg logit contributions
ho-0.016
sed-0.015
ses-0.014
aped-0.012
ks-0.012
TokenOnce
Token ID7454
Feature activation+2.22e-02

Pos logit contributions
lf+0.009
ilts+0.007
 bath+0.007
 cartoons+0.007
 spa+0.007
Neg logit contributions
ho-0.018
sed-0.018
ses-0.017
aped-0.015
ked-0.015
Token.
Token ID13
Feature activation-7.19e-03

Pos logit contributions
ho+0.017
sed+0.016
ses+0.016
aped+0.014
rified+0.014
Neg logit contributions
lf-0.009
 cartoons-0.009
 opener-0.008
ilts-0.008
 pack-0.008
Token "
Token ID366
Feature activation-3.05e-03

Pos logit contributions
ho+0.015
sed+0.015
ses+0.014
aped+0.013
rified+0.013
Neg logit contributions
lf-0.008
ilts-0.007
 opener-0.007
 spa-0.007
During-0.007
TokenHow
Token ID2437
Feature activation-3.40e-02

Pos logit contributions
ho+0.005
sed+0.005
ses+0.005
ks+0.004
aped+0.004
Neg logit contributions
lf-0.005
ilts-0.005
 spa-0.004
 cartoons-0.004
During-0.004
Token about
Token ID546
Feature activation+1.15e-02

Pos logit contributions
ho+0.076
sed+0.073
ses+0.070
aped+0.064
rified+0.063
Neg logit contributions
lf-0.037
ilts-0.032
 cartoons-0.032
ruce-0.030
During-0.030
Token we
Token ID356
Feature activation+2.03e-02

Pos logit contributions
lf+0.018
ilts+0.018
 spa+0.017
iver+0.017
ruce+0.017
Neg logit contributions
ho-0.019
sed-0.018
ses-0.017
ks-0.015
 hope-0.015
Token clean
Token ID3424
Feature activation+7.11e-02

Pos logit contributions
lf+0.035
ilts+0.032
 spa+0.030
 cartoons+0.030
 films+0.030
Neg logit contributions
ho-0.033
sed-0.029
 hope-0.028
ses-0.027
 ear-0.025
Token up
Token ID510
Feature activation+2.87e-02

Pos logit contributions
lf+0.109
ilts+0.102
 spa+0.097
 cartoons+0.097
During+0.094
Neg logit contributions
ho-0.124
sed-0.118
ses-0.111
ks-0.098
 hope-0.098
Token the
Token ID262
Feature activation+1.52e-02

Pos logit contributions
lf+0.037
ilts+0.034
 cartoons+0.033
 spa+0.032
ruce+0.031
Neg logit contributions
ho-0.057
sed-0.055
ses-0.052
aped-0.047
ks-0.047
Token blocks
Token ID7021
Feature activation+2.05e-02

Pos logit contributions
lf+0.018
ilts+0.017
 spa+0.016
 cartoons+0.016
lore+0.015
Neg logit contributions
ho-0.031
sed-0.030
ses-0.029
 hope-0.026
ks-0.026
Token and
Token ID290
Feature activation+1.61e-02

Pos logit contributions
lf+0.025
ilts+0.024
 cartoons+0.022
 spa+0.022
lore+0.021
Neg logit contributions
ho-0.042
sed-0.039
ses-0.037
 hope-0.034
ks-0.034
Token then
Token ID788
Feature activation+2.05e-02

Pos logit contributions
lf+0.020
 cartoons+0.018
ilts+0.017
 pack+0.017
 spa+0.017
Neg logit contributions
ho-0.034
sed-0.032
ses-0.030
aped-0.028
rified-0.028
Token.
Token ID13
Feature activation+1.59e-02

Pos logit contributions
lf+0.060
ilts+0.056
 spa+0.053
 cartoons+0.053
ruce+0.052
Neg logit contributions
ho-0.054
sed-0.050
ses-0.047
 hope-0.043
ks-0.043
Token Mom
Token ID11254
Feature activation+2.10e-02

Pos logit contributions
lf+0.023
ilts+0.021
 cartoons+0.021
 spa+0.020
ruce+0.020
Neg logit contributions
ho-0.029
sed-0.028
ses-0.026
aped-0.023
ks-0.023
Token told
Token ID1297
Feature activation+1.35e-02

Pos logit contributions
lf+0.042
ilts+0.039
 cartoons+0.039
 spa+0.038
ruce+0.037
Neg logit contributions
ho-0.027
sed-0.025
ses-0.023
aped-0.019
ks-0.019
Token them
Token ID606
Feature activation+1.91e-02

Pos logit contributions
lf+0.017
 cartoons+0.016
ilts+0.015
ruce+0.014
 films+0.014
Neg logit contributions
ho-0.028
sed-0.027
ses-0.026
aped-0.024
ks-0.023
Token to
Token ID284
Feature activation+2.11e-02

Pos logit contributions
lf+0.031
ilts+0.030
iver+0.028
 spa+0.028
ruce+0.026
Neg logit contributions
ho-0.032
sed-0.030
ses-0.027
 hope-0.026
ks-0.026
Token clean
Token ID3424
Feature activation+7.10e-02

Pos logit contributions
lf+0.029
ilts+0.026
 cartoons+0.026
 spa+0.025
 pack+0.025
Neg logit contributions
ho-0.040
sed-0.039
ses-0.037
aped-0.033
ks-0.033
Token up
Token ID510
Feature activation+1.46e-02

Pos logit contributions
lf+0.114
ilts+0.106
 cartoons+0.102
 spa+0.101
ruce+0.098
Neg logit contributions
ho-0.120
sed-0.114
ses-0.107
ks-0.094
aped-0.094
Token the
Token ID262
Feature activation+3.45e-02

Pos logit contributions
lf+0.020
 cartoons+0.018
ilts+0.018
 spa+0.017
ruce+0.016
Neg logit contributions
ho-0.029
sed-0.028
ses-0.026
aped-0.024
rified-0.023
Token blocks
Token ID7021
Feature activation+1.04e-02

Pos logit contributions
ilts+0.031
lf+0.031
 spa+0.029
lore+0.027
During+0.026
Neg logit contributions
ho-0.077
sed-0.077
ses-0.074
 ear-0.067
ks-0.067
Token and
Token ID290
Feature activation+1.85e-02

Pos logit contributions
lf+0.016
ilts+0.014
 cartoons+0.014
 spa+0.014
ruce+0.013
Neg logit contributions
ho-0.019
sed-0.018
ses-0.017
aped-0.015
ks-0.015
Token make
Token ID787
Feature activation+2.91e-02

Pos logit contributions
lf+0.023
 cartoons+0.020
ilts+0.019
 pack+0.019
 spa+0.019
Neg logit contributions
ho-0.039
sed-0.037
ses-0.035
rified-0.032
aped-0.032
Token mom
Token ID1995
Feature activation+2.05e-02

Pos logit contributions
 cartoons+0.014
lf+0.014
 bath+0.013
 films+0.013
 pack+0.013
Neg logit contributions
ho-0.024
sed-0.023
ses-0.021
aped-0.021
ed-0.021
Token.
Token ID13
Feature activation+1.44e-02

Pos logit contributions
lf+0.027
 cartoons+0.026
ilts+0.024
 pack+0.024
 films+0.024
Neg logit contributions
sed-0.040
ho-0.039
ses-0.038
aped-0.034
rified-0.032
Token They
Token ID1119
Feature activation+2.10e-02

Pos logit contributions
lf+0.019
ilts+0.018
 cartoons+0.017
 spa+0.017
During+0.017
Neg logit contributions
ho-0.028
sed-0.027
ses-0.025
aped-0.023
rified-0.023
Token promise
Token ID6991
Feature activation+2.66e-02

Pos logit contributions
 pack+0.031
lf+0.031
ilts+0.030
 spa+0.030
 organize+0.030
Neg logit contributions
ho-0.036
sed-0.034
ses-0.032
aped-0.029
ks-0.028
Token to
Token ID284
Feature activation+1.86e-02

Pos logit contributions
lf+0.045
ilts+0.041
 cartoons+0.040
 spa+0.039
ruce+0.039
Neg logit contributions
ho-0.044
sed-0.041
ses-0.039
 hope-0.035
ks-0.034
Token clean
Token ID3424
Feature activation+7.09e-02

Pos logit contributions
lf+0.024
 cartoons+0.022
 pack+0.022
ilts+0.022
 organize+0.021
Neg logit contributions
ho-0.037
sed-0.036
ses-0.034
rified-0.032
ed-0.031
Token up
Token ID510
Feature activation+1.85e-02

Pos logit contributions
lf+0.107
ilts+0.099
 cartoons+0.098
 spa+0.095
ruce+0.092
Neg logit contributions
ho-0.127
sed-0.122
ses-0.114
aped-0.101
ks-0.099
Token.
Token ID13
Feature activation+1.36e-02

Pos logit contributions
lf+0.024
 cartoons+0.022
ilts+0.020
 spa+0.020
 bath+0.020
Neg logit contributions
ho-0.038
sed-0.037
ses-0.035
aped-0.031
rified-0.031
Token They
Token ID1119
Feature activation+2.02e-02

Pos logit contributions
lf+0.019
ilts+0.018
 cartoons+0.017
 spa+0.017
During+0.017
Neg logit contributions
ho-0.025
sed-0.024
ses-0.023
aped-0.020
ks-0.020
Token turn
Token ID1210
Feature activation+1.52e-02

Pos logit contributions
lf+0.032
ilts+0.030
 pack+0.030
 spa+0.030
 cartoons+0.029
Neg logit contributions
ho-0.033
sed-0.032
ses-0.030
aped-0.026
ks-0.026
Token off
Token ID572
Feature activation+1.18e-02

Pos logit contributions
lf+0.021
 cartoons+0.021
ilts+0.020
 bath+0.019
 spa+0.019
Neg logit contributions
ho-0.029
sed-0.028
ses-0.026
aped-0.023
Deal-0.022
Token mess
Token ID2085
Feature activation+3.92e-02

Pos logit contributions
irens+0.035
iver+0.034
ilts+0.033
lf+0.033
ruce+0.032
Neg logit contributions
ho-0.056
sed-0.050
 hope-0.046
ses-0.045
 pad-0.044
Token and
Token ID290
Feature activation+2.84e-02

Pos logit contributions
lf+0.067
ilts+0.067
 spa+0.064
lore+0.063
During+0.062
Neg logit contributions
ho-0.057
sed-0.051
ses-0.048
ks-0.045
ed-0.044
Token asked
Token ID1965
Feature activation+1.11e-02

Pos logit contributions
lf+0.041
ilts+0.038
 cartoons+0.036
 spa+0.036
ruce+0.035
Neg logit contributions
ho-0.053
sed-0.050
ses-0.047
 hope-0.043
aped-0.042
Token her
Token ID607
Feature activation+2.95e-02

Pos logit contributions
lf+0.012
 cartoons+0.011
During+0.010
 bath+0.010
 chores+0.010
Neg logit contributions
sed-0.024
ho-0.024
ses-0.024
aped-0.021
rified-0.020
Token to
Token ID284
Feature activation+1.90e-02

Pos logit contributions
lf+0.058
ilts+0.057
iver+0.055
 spa+0.052
ruce+0.052
Neg logit contributions
ho-0.038
sed-0.034
 hope-0.034
 ear-0.030
ses-0.030
Token clean
Token ID3424
Feature activation+7.09e-02

Pos logit contributions
lf+0.021
ilts+0.019
 cartoons+0.018
 spa+0.018
 films+0.017
Neg logit contributions
ho-0.041
sed-0.039
ses-0.037
 hope-0.036
 ear-0.035
Token it
Token ID340
Feature activation+2.41e-02

Pos logit contributions
lf+0.129
ilts+0.128
During+0.124
lore+0.121
 opener+0.121
Neg logit contributions
ho-0.081
sed-0.078
 ear-0.076
 hope-0.074
ses-0.074
Token up
Token ID510
Feature activation+2.18e-03

Pos logit contributions
lf+0.047
ilts+0.045
 films+0.045
 spa+0.044
 cartoons+0.044
Neg logit contributions
ho-0.028
sed-0.025
 hope-0.023
ses-0.023
ed-0.022
Token.
Token ID13
Feature activation+1.98e-02

Pos logit contributions
lf+0.003
ilts+0.003
 cartoons+0.003
 spa+0.003
During+0.003
Neg logit contributions
ho-0.004
sed-0.004
ses-0.004
aped-0.003
rified-0.003
Token But
Token ID887
Feature activation+1.07e-02

Pos logit contributions
lf+0.030
ilts+0.028
 cartoons+0.027
 spa+0.027
ruce+0.026
Neg logit contributions
ho-0.035
sed-0.033
ses-0.032
ks-0.028
aped-0.028
Token Lily
Token ID20037
Feature activation+2.62e-02

Pos logit contributions
lf+0.013
 cartoons+0.011
ilts+0.011
 spa+0.010
During+0.010
Neg logit contributions
ho-0.023
sed-0.022
ses-0.021
aped-0.019
ks-0.019
Token and
Token ID290
Feature activation+1.17e-02

Pos logit contributions
 cartoons+0.025
lf+0.024
 films+0.021
 bath+0.021
ilts+0.020
Neg logit contributions
ho-0.038
sed-0.037
ses-0.035
aped-0.032
 pad-0.031
Token say
Token ID910
Feature activation-1.06e-02

Pos logit contributions
lf+0.017
 cartoons+0.016
 pack+0.016
 organize+0.016
 bath+0.015
Neg logit contributions
ho-0.021
sed-0.020
ses-0.019
aped-0.018
rified-0.017
Token sorry
Token ID7926
Feature activation+2.93e-02

Pos logit contributions
ho+0.019
sed+0.019
ses+0.018
aped+0.016
 pad+0.016
Neg logit contributions
lf-0.015
 cartoons-0.014
ilts-0.014
 bath-0.013
 films-0.013
Token.
Token ID13
Feature activation+2.60e-02

Pos logit contributions
lf+0.039
 cartoons+0.038
ilts+0.036
During+0.035
 bath+0.034
Neg logit contributions
ho-0.055
sed-0.054
ses-0.051
aped-0.046
rified-0.046
Token They
Token ID1119
Feature activation+2.78e-02

Pos logit contributions
lf+0.033
ilts+0.031
 cartoons+0.031
 spa+0.029
During+0.029
Neg logit contributions
ho-0.052
sed-0.048
ses-0.046
aped-0.042
rified-0.042
Token clean
Token ID3424
Feature activation+7.09e-02

Pos logit contributions
lf+0.043
ilts+0.041
 cartoons+0.040
 pack+0.040
 spa+0.040
Neg logit contributions
ho-0.046
sed-0.045
ses-0.042
aped-0.037
ks-0.036
Token up
Token ID510
Feature activation+1.91e-02

Pos logit contributions
lf+0.102
 cartoons+0.095
ilts+0.093
 spa+0.089
 films+0.088
Neg logit contributions
ho-0.134
sed-0.128
ses-0.121
aped-0.107
ks-0.105
Token their
Token ID511
Feature activation+2.81e-02

Pos logit contributions
lf+0.022
 cartoons+0.020
ilts+0.018
 films+0.018
 bath+0.018
Neg logit contributions
ho-0.042
sed-0.041
ses-0.039
 pad-0.035
aped-0.035
Token rooms
Token ID9519
Feature activation+4.00e-02

Pos logit contributions
lf+0.030
ilts+0.028
 spa+0.026
 cartoons+0.025
ruce+0.025
Neg logit contributions
ho-0.062
sed-0.060
ses-0.057
ks-0.052
aped-0.051
Token and
Token ID290
Feature activation+2.48e-02

Pos logit contributions
lf+0.055
ilts+0.049
 cartoons+0.048
 spa+0.045
ruce+0.044
Neg logit contributions
ho-0.078
sed-0.076
ses-0.072
aped-0.063
rified-0.063
Token their
Token ID511
Feature activation+2.53e-02

Pos logit contributions
lf+0.030
 cartoons+0.026
 pack+0.025
ilts+0.025
 organize+0.024
Neg logit contributions
ho-0.053
sed-0.051
ses-0.049
rified-0.044
aped-0.044
Token
Token ID6184
Feature activation-1.08e-02

Pos logit contributions
ses+0.056
sed+0.055
ho+0.054
 pad+0.047
ks+0.047
Neg logit contributions
lf-0.027
 spa-0.022
 cartoons-0.022
 bath-0.022
ilts-0.021
Token
Token ID95
Feature activation-1.48e-03

Pos logit contributions
ho+0.019
sed+0.018
ses+0.017
ks+0.015
aped+0.015
Neg logit contributions
lf-0.017
ilts-0.015
 cartoons-0.015
 spa-0.015
ruce-0.014
Token
Token ID26391
Feature activation-7.42e-03

Pos logit contributions
ho+0.003
sed+0.002
ses+0.002
aped+0.002
ks+0.002
Neg logit contributions
lf-0.002
ilts-0.002
 cartoons-0.002
 spa-0.002
ruce-0.002
Token
Token ID129
Feature activation-1.31e-02

Pos logit contributions
ho+0.013
sed+0.012
ses+0.012
aped+0.010
ks+0.010
Neg logit contributions
lf-0.012
ilts-0.011
 cartoons-0.010
 spa-0.010
lore-0.010
Token
Token ID241
Feature activation-3.44e-02

Pos logit contributions
ho+0.022
sed+0.021
ses+0.019
ks+0.017
aped+0.017
Neg logit contributions
lf-0.021
ilts-0.019
 cartoons-0.019
 spa-0.018
 bath-0.018
Tokenyes
Token ID8505
Feature activation-7.50e-02

Pos logit contributions
sed+0.063
ho+0.060
ses+0.059
ked+0.050
rified+0.049
Neg logit contributions
During-0.050
 spa-0.048
 sport-0.046
omo-0.045
On-0.045
Token!
Token ID0
Feature activation-1.61e-02

Pos logit contributions
sed+0.166
ses+0.164
ho+0.159
aped+0.141
ks+0.138
Neg logit contributions
lf-0.085
ilts-0.068
 spa-0.066
 pack-0.066
During-0.065
Tokenâ
Token ID22940
Feature activation-1.63e-02

Pos logit contributions
ho+0.028
sed+0.027
ses+0.025
ks+0.022
aped+0.022
Neg logit contributions
lf-0.024
ilts-0.023
 spa-0.023
 cartoons-0.023
During-0.022
Token
Token ID26391
Feature activation-3.31e-03

Pos logit contributions
ho+0.030
sed+0.029
ses+0.027
ks+0.023
 hope+0.023
Neg logit contributions
lf-0.025
 spa-0.023
iver-0.022
ilts-0.022
During-0.021
Token\n
Token ID198
Feature activation-8.44e-03

Pos logit contributions
ho+0.006
sed+0.005
ses+0.005
aped+0.004
ks+0.004
Neg logit contributions
lf-0.005
ilts-0.005
 cartoons-0.005
 spa-0.005
lore-0.004
Token\n
Token ID198
Feature activation+4.95e-03

Pos logit contributions
ho+0.014
sed+0.013
ses+0.012
aped+0.011
ks+0.011
Neg logit contributions
lf-0.014
ilts-0.013
 cartoons-0.012
 spa-0.012
lore-0.012
Token
Token ID6184
Feature activation-1.10e-02

Pos logit contributions
ho+0.038
sed+0.038
ses+0.037
aped+0.032
ks+0.032
Neg logit contributions
lf-0.031
 cartoons-0.028
 spa-0.028
ilts-0.027
 pack-0.026
Token
Token ID95
Feature activation-2.24e-03

Pos logit contributions
ho+0.019
sed+0.018
ses+0.017
aped+0.016
Deal+0.015
Neg logit contributions
lf-0.016
ilts-0.015
 cartoons-0.015
During-0.014
ruce-0.014
Token
Token ID26391
Feature activation-5.56e-03

Pos logit contributions
ho+0.004
sed+0.003
ses+0.003
aped+0.003
ks+0.003
Neg logit contributions
lf-0.004
ilts-0.003
 cartoons-0.003
 spa-0.003
ruce-0.003
Token
Token ID129
Feature activation-1.07e-02

Pos logit contributions
ho+0.010
sed+0.009
ses+0.009
aped+0.008
ks+0.008
Neg logit contributions
lf-0.009
ilts-0.008
 cartoons-0.008
 spa-0.008
ruce-0.007
Token
Token ID241
Feature activation-3.38e-02

Pos logit contributions
ho+0.017
sed+0.017
ses+0.016
ks+0.014
 ear+0.014
Neg logit contributions
lf-0.017
ilts-0.016
 cartoons-0.015
 spa-0.015
ruce-0.015
Tokenyes
Token ID8505
Feature activation-7.12e-02

Pos logit contributions
sed+0.065
ho+0.063
ses+0.060
ked+0.051
als+0.050
Neg logit contributions
During-0.047
 spa-0.046
 sport-0.043
While-0.043
 Wed-0.042
Token,
Token ID11
Feature activation-3.40e-02

Pos logit contributions
sed+0.154
ho+0.152
ses+0.150
aped+0.130
ks+0.126
Neg logit contributions
lf-0.085
 spa-0.070
ilts-0.069
 pack-0.068
During-0.067
Token I
Token ID314
Feature activation-3.97e-02

Pos logit contributions
ho+0.061
sed+0.060
ses+0.058
aped+0.050
ks+0.049
Neg logit contributions
lf-0.051
 spa-0.047
 cartoons-0.046
ilts-0.045
During-0.044
Token want
Token ID765
Feature activation-3.17e-02

Pos logit contributions
ho+0.068
sed+0.067
ses+0.064
rified+0.055
ked+0.053
Neg logit contributions
 pack-0.060
ilts-0.060
lf-0.060
 organize-0.058
 spa-0.058
Token to
Token ID284
Feature activation-9.23e-03

Pos logit contributions
ho+0.050
sed+0.046
ses+0.043
 hope+0.039
ks+0.039
Neg logit contributions
lf-0.054
ilts-0.050
iver-0.048
 cartoons-0.047
 spa-0.047
Token find
Token ID1064
Feature activation-1.24e-02

Pos logit contributions
ho+0.014
sed+0.013
 ear+0.012
 hope+0.012
ses+0.012
Neg logit contributions
lf-0.016
ilts-0.015
 films-0.014
 cartoons-0.014
 spa-0.014
Token
Token ID6184
Feature activation-1.68e-02

Pos logit contributions
ses+0.032
ho+0.031
sed+0.031
 hope+0.028
Deal+0.028
Neg logit contributions
lf-0.018
 spa-0.016
 cartoons-0.016
During-0.016
lore-0.015
Token
Token ID95
Feature activation-1.23e-02

Pos logit contributions
ho+0.029
sed+0.027
ses+0.026
aped+0.022
ks+0.022
Neg logit contributions
lf-0.027
ilts-0.025
 cartoons-0.024
 spa-0.024
ruce-0.023
Token
Token ID26391
Feature activation-1.59e-02

Pos logit contributions
ho+0.021
sed+0.020
ses+0.018
 hope+0.016
ks+0.016
Neg logit contributions
lf-0.020
ilts-0.019
 spa-0.018
 cartoons-0.018
lore-0.018
Token
Token ID129
Feature activation-1.90e-02

Pos logit contributions
ho+0.029
sed+0.028
ses+0.026
aped+0.023
 hope+0.023
Neg logit contributions
lf-0.023
ilts-0.022
 cartoons-0.020
lore-0.020
 spa-0.020
Token
Token ID241
Feature activation-3.49e-02

Pos logit contributions
ho+0.034
sed+0.032
ses+0.030
aped+0.026
ks+0.026
Neg logit contributions
lf-0.029
ilts-0.027
 cartoons-0.026
 spa-0.026
ruce-0.025
TokenOf
Token ID5189
Feature activation-6.86e-02

Pos logit contributions
sed+0.065
ho+0.062
ses+0.061
als+0.052
ked+0.051
Neg logit contributions
During-0.047
 spa-0.047
On-0.045
 sport-0.045
omo-0.044
Token course
Token ID1781
Feature activation-5.29e-02

Pos logit contributions
sed+0.152
ses+0.148
ho+0.130
aped+0.123
 pad+0.117
Neg logit contributions
ister-0.076
 concert-0.076
lf-0.075
?"-0.073
unt-0.073
Token!
Token ID0
Feature activation-2.65e-02

Pos logit contributions
ses+0.122
sed+0.120
ho+0.116
aped+0.102
ks+0.101
Neg logit contributions
lf-0.052
 bath-0.043
 pack-0.042
?"-0.042
 spa-0.041
Token I
Token ID314
Feature activation-2.87e-02

Pos logit contributions
ho+0.044
sed+0.043
ses+0.040
ks+0.034
aped+0.033
Neg logit contributions
 spa-0.041
During-0.040
 mechanic-0.039
ilts-0.039
 opener-0.039
Tokenâ
Token ID22940
Feature activation-9.15e-03

Pos logit contributions
ho+0.048
sed+0.046
ses+0.044
aped+0.037
rified+0.037
Neg logit contributions
lf-0.044
ilts-0.043
 pack-0.042
 spa-0.041
 organize-0.041
Token
Token ID26391
Feature activation-3.19e-03

Pos logit contributions
sed+0.016
ho+0.016
ses+0.015
ks+0.012
ked+0.012
Neg logit contributions
lf-0.014
iver-0.014
 spa-0.014
 pack-0.013
ilts-0.013
Token
Token ID6184
Feature activation-1.40e-02

Pos logit contributions
sed+0.036
ses+0.036
ho+0.036
aped+0.031
ks+0.031
Neg logit contributions
lf-0.021
 spa-0.019
 cartoons-0.019
During-0.018
 pack-0.018
Token
Token ID95
Feature activation-7.12e-03

Pos logit contributions
ho+0.023
sed+0.022
ses+0.021
aped+0.018
ks+0.018
Neg logit contributions
lf-0.023
ilts-0.021
 cartoons-0.021
 spa-0.020
ruce-0.020
Token
Token ID26391
Feature activation-1.22e-02

Pos logit contributions
ho+0.012
sed+0.011
ses+0.010
aped+0.009
ks+0.009
Neg logit contributions
lf-0.012
ilts-0.011
 cartoons-0.011
 spa-0.011
ruce-0.010
Token
Token ID129
Feature activation-1.74e-02

Pos logit contributions
ho+0.022
sed+0.021
ses+0.020
aped+0.017
 hope+0.017
Neg logit contributions
lf-0.019
ilts-0.017
 cartoons-0.016
lore-0.016
 spa-0.016
Token
Token ID241
Feature activation-3.11e-02

Pos logit contributions
ho+0.030
sed+0.028
ses+0.027
ks+0.024
aped+0.024
Neg logit contributions
lf-0.027
ilts-0.025
 cartoons-0.025
 spa-0.024
ruce-0.023
TokenOf
Token ID5189
Feature activation-6.80e-02

Pos logit contributions
sed+0.056
ho+0.056
ses+0.053
als+0.047
ked+0.045
Neg logit contributions
During-0.043
 spa-0.042
 sport-0.041
lf-0.040
omo-0.040
Token course
Token ID1781
Feature activation-5.85e-02

Pos logit contributions
sed+0.151
ses+0.145
ho+0.126
aped+0.119
 pad+0.119
Neg logit contributions
lf-0.086
ister-0.079
 concert-0.077
 operation-0.076
?"-0.073
Token you
Token ID345
Feature activation-5.53e-02

Pos logit contributions
ses+0.143
sed+0.139
ho+0.135
 pad+0.120
aped+0.119
Neg logit contributions
lf-0.054
?"-0.042
 pack-0.039
 bath-0.037
ister-0.036
Token can
Token ID460
Feature activation-3.45e-02

Pos logit contributions
ho+0.088
sed+0.086
ses+0.082
aped+0.069
ks+0.062
Neg logit contributions
 pack-0.089
lf-0.089
ilts-0.088
 organize-0.087
 opener-0.085
Token have
Token ID423
Feature activation-1.08e-02

Pos logit contributions
ho+0.062
sed+0.060
ses+0.058
rified+0.050
aped+0.050
Neg logit contributions
lf-0.051
ilts-0.047
 cartoons-0.045
 pack-0.044
 spa-0.044
Token a
Token ID257
Feature activation-1.63e-02

Pos logit contributions
ho+0.018
sed+0.018
ses+0.017
 hope+0.016
ks+0.015
Neg logit contributions
lf-0.017
ilts-0.015
 cartoons-0.015
 spa-0.015
ruce-0.014
Token rust
Token ID17000
Feature activation-2.21e-02

Pos logit contributions
ho+0.025
sed+0.024
ses+0.023
aped+0.021
ks+0.020
Neg logit contributions
lf-0.016
ilts-0.015
 cartoons-0.015
 spa-0.014
ruce-0.014
Tokenling
Token ID1359
Feature activation-2.20e-02

Pos logit contributions
sed+0.053
ses+0.050
ho+0.048
Deal+0.042
aped+0.042
Neg logit contributions
lf-0.023
 cartoons-0.020
ruce-0.020
 films-0.019
iver-0.019
Token in
Token ID287
Feature activation-4.23e-02

Pos logit contributions
sed+0.045
ses+0.043
ho+0.040
Deal+0.037
 pad+0.035
Neg logit contributions
lf-0.026
 cartoons-0.026
ruce-0.024
 films-0.024
During-0.023
Token the
Token ID262
Feature activation-4.23e-02

Pos logit contributions
sed+0.099
ho+0.098
ses+0.093
aped+0.086
Deal+0.083
Neg logit contributions
lf-0.043
 cartoons-0.039
 films-0.036
 pack-0.036
ilts-0.035
Token grass
Token ID8701
Feature activation-2.14e-02

Pos logit contributions
sed+0.096
ho+0.093
ses+0.090
aped+0.085
Deal+0.083
Neg logit contributions
 bath-0.042
lf-0.038
iver-0.037
 pack-0.037
 carpet-0.037
Token below
Token ID2174
Feature activation-6.78e-02

Pos logit contributions
sed+0.039
ses+0.038
ho+0.036
 hope+0.033
Deal+0.033
Neg logit contributions
lf-0.030
ilts-0.027
ruce-0.025
 cartoons-0.025
iver-0.025
Token.
Token ID13
Feature activation-3.33e-03

Pos logit contributions
sed+0.171
ses+0.166
Deal+0.150
ho+0.146
aped+0.138
Neg logit contributions
lf-0.061
 bath-0.055
 pack-0.053
 cartoons-0.052
iver-0.048
Token It
Token ID632
Feature activation-1.25e-02

Pos logit contributions
ho+0.006
sed+0.006
ses+0.005
 hope+0.005
ks+0.005
Neg logit contributions
lf-0.005
ilts-0.004
 cartoons-0.004
 spa-0.004
lore-0.004
Token was
Token ID373
Feature activation-1.35e-02

Pos logit contributions
ho+0.020
sed+0.019
ses+0.018
aped+0.016
ks+0.016
Neg logit contributions
lf-0.021
ilts-0.019
 cartoons-0.019
 spa-0.018
ruce-0.018
Token a
Token ID257
Feature activation-1.85e-02

Pos logit contributions
ho+0.027
sed+0.026
ses+0.024
aped+0.022
ks+0.022
Neg logit contributions
lf-0.017
ilts-0.016
 cartoons-0.016
 spa-0.015
ruce-0.015
Token present
Token ID1944
Feature activation+3.83e-03

Pos logit contributions
ho+0.040
sed+0.038
ses+0.037
aped+0.033
ks+0.033
Neg logit contributions
lf-0.021
ilts-0.019
 cartoons-0.018
 spa-0.018
During-0.017
Token
Token ID6184
Feature activation-1.09e-02

Pos logit contributions
ho+0.037
sed+0.037
ses+0.037
aped+0.032
ks+0.031
Neg logit contributions
lf-0.023
 cartoons-0.020
ilts-0.020
 spa-0.020
lore-0.019
Token
Token ID95
Feature activation+1.41e-04

Pos logit contributions
ho+0.019
sed+0.018
ses+0.017
aped+0.015
ks+0.015
Neg logit contributions
lf-0.017
ilts-0.016
 cartoons-0.015
 spa-0.015
During-0.015
Token
Token ID26391
Feature activation-6.94e-03

Pos logit contributions
lf+0.000
ilts+0.000
 cartoons+0.000
 spa+0.000
ruce+0.000
Neg logit contributions
ho-0.000
sed-0.000
ses-0.000
aped-0.000
ks-0.000
Token
Token ID129
Feature activation-9.20e-03

Pos logit contributions
ho+0.012
sed+0.011
ses+0.011
aped+0.009
ks+0.009
Neg logit contributions
lf-0.011
ilts-0.010
 cartoons-0.010
 spa-0.009
lore-0.009
Token
Token ID241
Feature activation-3.13e-02

Pos logit contributions
ho+0.015
sed+0.014
ses+0.014
ks+0.012
 ear+0.012
Neg logit contributions
lf-0.015
ilts-0.014
 cartoons-0.014
 spa-0.013
 bath-0.013
Tokenyes
Token ID8505
Feature activation-6.77e-02

Pos logit contributions
sed+0.054
ho+0.054
ses+0.050
als+0.042
rified+0.041
Neg logit contributions
 spa-0.047
lf-0.046
During-0.046
 sport-0.045
 pack-0.044
Tokenâ
Token ID22940
Feature activation-2.08e-02

Pos logit contributions
sed+0.149
ses+0.148
ho+0.143
aped+0.129
ks+0.123
Neg logit contributions
lf-0.081
 pack-0.062
 spa-0.061
ilts-0.059
ister-0.059
Token
Token ID26391
Feature activation-1.03e-02

Pos logit contributions
sed+0.032
ho+0.031
ses+0.029
ks+0.024
 pad+0.023
Neg logit contributions
lf-0.037
iver-0.035
 spa-0.034
 pack-0.034
 operation-0.033
Token and
Token ID290
Feature activation-2.22e-02

Pos logit contributions
ho+0.018
sed+0.017
ses+0.017
aped+0.014
ks+0.014
Neg logit contributions
lf-0.016
ilts-0.014
 cartoons-0.014
 spa-0.014
lore-0.013
Token began
Token ID2540
Feature activation-4.05e-03

Pos logit contributions
ho+0.039
sed+0.038
ses+0.036
ks+0.032
rified+0.031
Neg logit contributions
lf-0.033
 spa-0.030
ilts-0.029
 cartoons-0.029
 pack-0.029
Token to
Token ID284
Feature activation-9.68e-03

Pos logit contributions
ho+0.007
sed+0.007
ses+0.006
aped+0.005
ks+0.005
Neg logit contributions
lf-0.006
ilts-0.006
 cartoons-0.006
 spa-0.006
ruce-0.006
Token
Token ID6184
Feature activation-1.01e-02

Pos logit contributions
ses+0.003
sed+0.003
ho+0.003
 pad+0.003
aped+0.003
Neg logit contributions
lf-0.002
 cartoons-0.001
 bath-0.001
 spa-0.001
During-0.001
Token
Token ID95
Feature activation-2.27e-03

Pos logit contributions
ho+0.017
sed+0.016
ses+0.015
aped+0.013
ks+0.013
Neg logit contributions
lf-0.017
ilts-0.016
 cartoons-0.015
 spa-0.015
ruce-0.014
Token
Token ID26391
Feature activation-6.51e-03

Pos logit contributions
ho+0.004
sed+0.004
ses+0.003
aped+0.003
ks+0.003
Neg logit contributions
lf-0.004
ilts-0.003
 cartoons-0.003
 spa-0.003
ruce-0.003
Token
Token ID129
Feature activation-6.38e-03

Pos logit contributions
ho+0.012
sed+0.011
ses+0.010
aped+0.009
ks+0.009
Neg logit contributions
lf-0.010
ilts-0.009
 cartoons-0.009
 spa-0.009
lore-0.009
Token
Token ID241
Feature activation-2.44e-02

Pos logit contributions
ho+0.011
sed+0.010
ses+0.010
ks+0.008
aped+0.008
Neg logit contributions
lf-0.010
ilts-0.009
 cartoons-0.009
 spa-0.009
 bath-0.009
Tokenyes
Token ID8505
Feature activation-6.72e-02

Pos logit contributions
sed+0.045
ho+0.042
ses+0.042
als+0.036
 ear+0.036
Neg logit contributions
During-0.034
 spa-0.032
 sport-0.031
On-0.031
While-0.030
Tokenâ
Token ID22940
Feature activation-1.77e-02

Pos logit contributions
sed+0.153
ses+0.151
ho+0.146
aped+0.133
ks+0.126
Neg logit contributions
lf-0.071
 pack-0.056
During-0.055
 spa-0.054
ilts-0.054
Token
Token ID26391
Feature activation-1.14e-02

Pos logit contributions
sed+0.028
ho+0.027
ses+0.026
ks+0.021
 pad+0.021
Neg logit contributions
lf-0.030
iver-0.028
 spa-0.027
 pack-0.027
ilts-0.027
Token.
Token ID13
Feature activation-5.21e-03

Pos logit contributions
ho+0.021
sed+0.020
ses+0.019
aped+0.016
ks+0.016
Neg logit contributions
lf-0.017
ilts-0.015
 cartoons-0.015
 spa-0.015
lore-0.014
Token The
Token ID383
Feature activation-9.50e-03

Pos logit contributions
ho+0.009
sed+0.009
ses+0.008
aped+0.007
ks+0.007
Neg logit contributions
lf-0.008
ilts-0.007
 cartoons-0.007
 spa-0.007
During-0.007
Token second
Token ID1218
Feature activation-2.99e-02

Pos logit contributions
ho+0.018
sed+0.017
ses+0.016
aped+0.014
ks+0.014
Neg logit contributions
lf-0.013
ilts-0.012
 cartoons-0.012
 spa-0.012
ruce-0.011
Token said
Token ID531
Feature activation-2.51e-02

Pos logit contributions
ho+0.021
ses+0.021
sed+0.020
rified+0.018
 hope+0.016
Neg logit contributions
lf-0.017
During-0.015
lore-0.014
 pack-0.014
 organize-0.014
Token,
Token ID11
Feature activation-1.07e-02

Pos logit contributions
sed+0.053
ho+0.052
ses+0.052
 pad+0.046
ks+0.046
Neg logit contributions
lf-0.026
 cartoons-0.021
lore-0.021
ilts-0.021
 bath-0.020
Token "
Token ID366
Feature activation-2.36e-02

Pos logit contributions
ho+0.022
ses+0.021
sed+0.021
 hope+0.019
rified+0.019
Neg logit contributions
lf-0.012
 cartoons-0.011
 spa-0.010
During-0.010
lore-0.010
TokenWow
Token ID22017
Feature activation-2.08e-02

Pos logit contributions
sed+0.036
ho+0.034
ses+0.034
ks+0.028
aped+0.026
Neg logit contributions
lf-0.043
During-0.039
ilts-0.039
lore-0.039
 spa-0.039
Token!
Token ID0
Feature activation-2.32e-02

Pos logit contributions
ho+0.041
sed+0.039
ses+0.038
ks+0.033
aped+0.033
Neg logit contributions
lf-0.027
ilts-0.025
 cartoons-0.024
 bath-0.023
 spa-0.023
Token What
Token ID1867
Feature activation-6.67e-02

Pos logit contributions
ho+0.035
sed+0.033
ses+0.031
ks+0.028
 hope+0.027
Neg logit contributions
lf-0.039
ilts-0.037
 cartoons-0.037
 spa-0.036
lore-0.036
Token is
Token ID318
Feature activation+1.09e-02

Pos logit contributions
ho+0.117
sed+0.111
ses+0.103
aped+0.089
ked+0.088
Neg logit contributions
lf-0.106
ilts-0.096
 cartoons-0.095
ruce-0.093
 pack-0.093
Token this
Token ID428
Feature activation-5.90e-03

Pos logit contributions
lf+0.016
ilts+0.015
 cartoons+0.014
 spa+0.014
iver+0.014
Neg logit contributions
ho-0.020
sed-0.019
ses-0.018
aped-0.016
ks-0.016
Token?"
Token ID1701
Feature activation-3.74e-03

Pos logit contributions
ho+0.013
sed+0.013
ses+0.012
 ear+0.012
 pad+0.011
Neg logit contributions
ilts-0.005
lf-0.005
 cartoons-0.005
 films-0.005
 spa-0.004
Token
Token ID198
Feature activation+5.01e-04

Pos logit contributions
ho+0.007
sed+0.007
ses+0.007
ks+0.006
aped+0.006
Neg logit contributions
lf-0.005
 cartoons-0.005
ilts-0.005
 spa-0.005
During-0.004
Token
Token ID198
Feature activation+1.03e-02

Pos logit contributions
lf+0.001
ilts+0.001
 cartoons+0.001
 spa+0.001
 films+0.001
Neg logit contributions
ho-0.001
sed-0.001
ses-0.001
ks-0.001
aped-0.001
Token
Token ID26391
Feature activation-1.37e-02

Pos logit contributions
ho+0.033
sed+0.032
ses+0.030
aped+0.025
ks+0.025
Neg logit contributions
lf-0.029
iver-0.027
 spa-0.026
ilts-0.026
 concert-0.026
Token Kay
Token ID17356
Feature activation-3.60e-02

Pos logit contributions
ho+0.027
sed+0.026
ses+0.025
rified+0.022
ked+0.021
Neg logit contributions
lf-0.019
ilts-0.018
lore-0.016
During-0.015
 spa-0.015
Tokenla
Token ID5031
Feature activation-2.22e-02

Pos logit contributions
ses+0.066
sed+0.066
ho+0.059
aped+0.053
Deal+0.053
Neg logit contributions
lf-0.057
ilts-0.048
 Eve-0.045
?"-0.045
 spa-0.045
Token was
Token ID373
Feature activation-2.79e-02

Pos logit contributions
ho+0.037
sed+0.036
ses+0.035
rified+0.030
aped+0.029
Neg logit contributions
lf-0.036
ilts-0.032
 cartoons-0.030
 spa-0.030
 pack-0.030
Token over
Token ID625
Feature activation-1.05e-02

Pos logit contributions
ho+0.055
sed+0.052
ses+0.050
aped+0.044
ks+0.043
Neg logit contributions
lf-0.038
ilts-0.035
 cartoons-0.034
 spa-0.034
 pack-0.032
Tokenjoy
Token ID2633
Feature activation-6.64e-02

Pos logit contributions
sed+0.016
ses+0.016
ho+0.016
Deal+0.013
aped+0.013
Neg logit contributions
lf-0.017
 spa-0.017
 bath-0.017
 cartoons-0.017
 product-0.016
Tokened
Token ID276
Feature activation-1.20e-02

Pos logit contributions
sed+0.087
ses+0.085
ho+0.075
Deal+0.055
 hope+0.054
Neg logit contributions
uts-0.127
lf-0.118
 late-0.117
ilts-0.113
 Wed-0.112
Token and
Token ID290
Feature activation-1.33e-02

Pos logit contributions
sed+0.023
ho+0.023
ses+0.023
aped+0.019
ks+0.019
Neg logit contributions
lf-0.016
ilts-0.014
 spa-0.014
 cartoons-0.014
 pack-0.014
Token the
Token ID262
Feature activation-1.45e-02

Pos logit contributions
ho+0.024
sed+0.024
ses+0.023
rified+0.021
ks+0.020
Neg logit contributions
lf-0.019
 cartoons-0.017
 pack-0.017
ilts-0.017
 spa-0.017
Token two
Token ID734
Feature activation-4.87e-03

Pos logit contributions
ho+0.031
sed+0.030
ses+0.029
ks+0.028
aped+0.027
Neg logit contributions
lf-0.017
 cartoons-0.015
 spa-0.015
 bath-0.015
 pack-0.014
Token of
Token ID286
Feature activation-8.18e-03

Pos logit contributions
ho+0.010
sed+0.009
ses+0.009
aped+0.008
 hope+0.008
Neg logit contributions
lf-0.006
ilts-0.006
 cartoons-0.005
 spa-0.005
lore-0.005
Token sharp
Token ID7786
Feature activation-1.61e-02

Pos logit contributions
sed+0.029
ho+0.029
ses+0.028
aped+0.025
ks+0.025
Neg logit contributions
lf-0.021
 cartoons-0.017
 bath-0.017
 spa-0.017
 pack-0.016
Token object
Token ID2134
Feature activation-1.04e-02

Pos logit contributions
ho+0.034
sed+0.033
Deal+0.031
ses+0.031
aped+0.031
Neg logit contributions
lf-0.019
 battery-0.016
 bath-0.015
 pack-0.015
 carpet-0.014
Token on
Token ID319
Feature activation-1.84e-02

Pos logit contributions
sed+0.019
ho+0.019
ses+0.019
Deal+0.018
 hope+0.017
Neg logit contributions
lf-0.014
 opener-0.011
ruce-0.011
 cartoons-0.011
 bath-0.011
Token the
Token ID262
Feature activation-2.94e-02

Pos logit contributions
ho+0.030
sed+0.027
ses+0.026
 ear+0.025
 hope+0.025
Neg logit contributions
ilts-0.029
lf-0.029
arre-0.027
 spa-0.026
lore-0.026
Token ground
Token ID2323
Feature activation-1.79e-02

Pos logit contributions
ho+0.064
sed+0.062
ks+0.060
ses+0.060
aped+0.059
Neg logit contributions
lf-0.033
 carpet-0.030
 bath-0.029
 cartoons-0.028
 mechanic-0.028
Token below
Token ID2174
Feature activation-6.61e-02

Pos logit contributions
sed+0.035
ses+0.034
ho+0.034
 hope+0.030
Deal+0.029
Neg logit contributions
lf-0.024
ilts-0.020
 opener-0.020
ruce-0.019
 cartoons-0.019
Token.
Token ID13
Feature activation+1.98e-03

Pos logit contributions
sed+0.160
ses+0.155
Deal+0.141
ho+0.140
aped+0.127
Neg logit contributions
lf-0.065
 bath-0.055
 cartoons-0.053
 pack-0.052
ruce-0.048
Token Curious
Token ID44269
Feature activation-3.77e-02

Pos logit contributions
lf+0.003
ilts+0.002
 cartoons+0.002
 spa+0.002
lore+0.002
Neg logit contributions
ho-0.004
sed-0.004
ses-0.003
ks-0.003
aped-0.003
Token,
Token ID11
Feature activation+1.01e-03

Pos logit contributions
ho+0.081
sed+0.080
ses+0.078
aped+0.069
ks+0.068
Neg logit contributions
lf-0.042
 cartoons-0.035
 bath-0.034
ilts-0.034
 spa-0.034
Token he
Token ID339
Feature activation-2.63e-03

Pos logit contributions
lf+0.001
 cartoons+0.001
 spa+0.001
ilts+0.001
During+0.001
Neg logit contributions
ho-0.002
sed-0.002
ses-0.002
aped-0.002
ks-0.002
Token quickly
Token ID2952
Feature activation-5.32e-03

Pos logit contributions
ses+0.004
sed+0.004
ho+0.004
Deal+0.003
rified+0.003
Neg logit contributions
lf-0.004
During-0.004
ilts-0.004
 pack-0.004
lore-0.004
Token
Token ID6184
Feature activation-9.83e-03

Pos logit contributions
ho+0.026
sed+0.025
ses+0.025
 hope+0.022
ks+0.022
Neg logit contributions
lf-0.015
 spa-0.014
 cartoons-0.014
During-0.013
 mechanic-0.013
Token
Token ID95
Feature activation-3.12e-03

Pos logit contributions
ho+0.016
sed+0.016
ses+0.015
aped+0.013
ks+0.013
Neg logit contributions
lf-0.016
ilts-0.015
 cartoons-0.015
 spa-0.014
ruce-0.014
Token
Token ID26391
Feature activation-7.03e-03

Pos logit contributions
ho+0.005
sed+0.005
ses+0.005
aped+0.004
ks+0.004
Neg logit contributions
lf-0.005
ilts-0.005
 cartoons-0.005
 spa-0.005
ruce-0.005
Token
Token ID129
Feature activation-1.30e-02

Pos logit contributions
ho+0.013
sed+0.012
ses+0.011
aped+0.010
 hope+0.010
Neg logit contributions
lf-0.011
ilts-0.010
 cartoons-0.009
 spa-0.009
lore-0.009
Token
Token ID241
Feature activation-2.53e-02

Pos logit contributions
ho+0.021
sed+0.021
ses+0.020
ks+0.017
 ear+0.017
Neg logit contributions
lf-0.021
ilts-0.019
 cartoons-0.019
 spa-0.018
ruce-0.018
TokenOf
Token ID5189
Feature activation-6.59e-02

Pos logit contributions
sed+0.046
ho+0.045
ses+0.042
als+0.038
ked+0.036
Neg logit contributions
During-0.036
 spa-0.035
 sport-0.034
omo-0.034
lf-0.033
Token course
Token ID1781
Feature activation-5.15e-02

Pos logit contributions
sed+0.132
ses+0.126
ho+0.113
 pad+0.103
aped+0.103
Neg logit contributions
lf-0.095
ister-0.086
 concert-0.085
 operation-0.084
 bath-0.082
Token.
Token ID13
Feature activation-1.18e-02

Pos logit contributions
ses+0.121
sed+0.118
ho+0.115
aped+0.102
 pad+0.102
Neg logit contributions
lf-0.050
?"-0.042
 bath-0.038
 pack-0.037
 cartoons-0.037
Token I
Token ID314
Feature activation-2.85e-02

Pos logit contributions
ho+0.019
sed+0.018
ses+0.017
ks+0.015
aped+0.014
Neg logit contributions
 spa-0.019
lf-0.019
 cartoons-0.018
ilts-0.018
 mechanic-0.018
Token would
Token ID561
Feature activation-1.33e-02

Pos logit contributions
ho+0.046
sed+0.045
ses+0.042
aped+0.036
rified+0.036
Neg logit contributions
lf-0.044
ilts-0.044
 pack-0.043
 spa-0.042
 bath-0.042
Token love
Token ID1842
Feature activation-4.86e-04

Pos logit contributions
ho+0.024
sed+0.023
ses+0.022
aped+0.020
ks+0.020
Neg logit contributions
lf-0.019
ilts-0.018
 cartoons-0.017
 spa-0.017
 pack-0.017
Token
Token ID6184
Feature activation-9.83e-03

Pos logit contributions
ho+0.026
sed+0.025
ses+0.025
 hope+0.022
ks+0.022
Neg logit contributions
lf-0.015
 spa-0.014
 cartoons-0.014
During-0.013
 mechanic-0.013
Token
Token ID95
Feature activation-3.12e-03

Pos logit contributions
ho+0.016
sed+0.016
ses+0.015
aped+0.013
ks+0.013
Neg logit contributions
lf-0.016
ilts-0.015
 cartoons-0.015
 spa-0.014
ruce-0.014
Token
Token ID26391
Feature activation-7.03e-03

Pos logit contributions
ho+0.005
sed+0.005
ses+0.005
aped+0.004
ks+0.004
Neg logit contributions
lf-0.005
ilts-0.005
 cartoons-0.005
 spa-0.005
ruce-0.005
Token
Token ID129
Feature activation-1.30e-02

Pos logit contributions
ho+0.013
sed+0.012
ses+0.011
aped+0.010
 hope+0.010
Neg logit contributions
lf-0.011
ilts-0.010
 cartoons-0.009
 spa-0.009
lore-0.009
Token
Token ID241
Feature activation-2.53e-02

Pos logit contributions
ho+0.021
sed+0.021
ses+0.020
ks+0.017
 ear+0.017
Neg logit contributions
lf-0.021
ilts-0.019
 cartoons-0.019
 spa-0.018
ruce-0.018
TokenOf
Token ID5189
Feature activation-6.59e-02

Pos logit contributions
sed+0.046
ho+0.045
ses+0.042
als+0.038
ked+0.036
Neg logit contributions
During-0.036
 spa-0.035
 sport-0.034
omo-0.034
lf-0.033
Token course
Token ID1781
Feature activation-5.15e-02

Pos logit contributions
sed+0.132
ses+0.126
ho+0.113
 pad+0.103
aped+0.103
Neg logit contributions
lf-0.095
ister-0.086
 concert-0.085
 operation-0.084
 bath-0.082
Token.
Token ID13
Feature activation-1.18e-02

Pos logit contributions
ses+0.121
sed+0.118
ho+0.115
aped+0.102
 pad+0.102
Neg logit contributions
lf-0.050
?"-0.042
 bath-0.038
 pack-0.037
 cartoons-0.037
Token I
Token ID314
Feature activation-2.85e-02

Pos logit contributions
ho+0.019
sed+0.018
ses+0.017
ks+0.015
aped+0.014
Neg logit contributions
 spa-0.019
lf-0.019
 cartoons-0.018
ilts-0.018
 mechanic-0.018
Token would
Token ID561
Feature activation-1.33e-02

Pos logit contributions
ho+0.046
sed+0.045
ses+0.042
aped+0.036
rified+0.036
Neg logit contributions
lf-0.044
ilts-0.044
 pack-0.043
 spa-0.042
 bath-0.042
Token love
Token ID1842
Feature activation-4.86e-04

Pos logit contributions
ho+0.024
sed+0.023
ses+0.022
aped+0.020
ks+0.020
Neg logit contributions
lf-0.019
ilts-0.018
 cartoons-0.017
 spa-0.017
 pack-0.017
Token
Token ID6184
Feature activation-8.81e-03

Pos logit contributions
ho+0.029
sed+0.027
ses+0.026
aped+0.023
 hope+0.023
Neg logit contributions
 spa-0.014
During-0.013
ilts-0.013
 mechanic-0.013
 cartoons-0.013
Token
Token ID95
Feature activation-1.47e-03

Pos logit contributions
ho+0.014
sed+0.013
ses+0.012
aped+0.011
ks+0.011
Neg logit contributions
lf-0.015
ilts-0.014
 cartoons-0.014
 spa-0.013
 films-0.013
Token
Token ID26391
Feature activation-9.67e-03

Pos logit contributions
ho+0.002
sed+0.002
ses+0.002
aped+0.002
 hope+0.002
Neg logit contributions
lf-0.002
ilts-0.002
 cartoons-0.002
 spa-0.002
lore-0.002
Token
Token ID129
Feature activation-1.13e-02

Pos logit contributions
ho+0.017
sed+0.016
ses+0.015
aped+0.013
ks+0.013
Neg logit contributions
lf-0.015
ilts-0.014
 cartoons-0.013
 spa-0.013
ruce-0.013
Token
Token ID241
Feature activation-2.79e-02

Pos logit contributions
ho+0.020
sed+0.019
ses+0.018
aped+0.016
ks+0.016
Neg logit contributions
lf-0.017
ilts-0.016
 cartoons-0.016
 spa-0.015
ruce-0.015
TokenOf
Token ID5189
Feature activation-6.59e-02

Pos logit contributions
sed+0.054
ho+0.052
ses+0.050
als+0.045
ked+0.043
Neg logit contributions
During-0.037
 spa-0.036
On-0.035
omo-0.034
 sport-0.033
Token course
Token ID1781
Feature activation-5.28e-02

Pos logit contributions
sed+0.158
ses+0.153
ho+0.134
aped+0.129
ked+0.120
Neg logit contributions
lf-0.071
?"-0.066
 operation-0.065
ister-0.065
 concert-0.065
Token!
Token ID0
Feature activation-2.60e-02

Pos logit contributions
ses+0.124
sed+0.121
ho+0.118
aped+0.103
ked+0.101
Neg logit contributions
lf-0.054
?"-0.044
 pack-0.042
 bath-0.041
 spa-0.040
Tokenâ
Token ID22940
Feature activation-1.29e-02

Pos logit contributions
ho+0.043
sed+0.041
ses+0.038
ks+0.032
aped+0.032
Neg logit contributions
 spa-0.040
During-0.039
 mechanic-0.039
lf-0.039
 opener-0.039
Token
Token ID26391
Feature activation-6.53e-03

Pos logit contributions
ho+0.023
sed+0.023
ses+0.022
ks+0.018
ked+0.018
Neg logit contributions
lf-0.020
iver-0.018
 spa-0.018
 pack-0.017
ilts-0.017
Token he
Token ID339
Feature activation-1.29e-02

Pos logit contributions
ho+0.012
sed+0.011
ses+0.011
aped+0.009
ks+0.009
Neg logit contributions
lf-0.010
ilts-0.009
 cartoons-0.009
 spa-0.009
lore-0.008
Token with
Token ID351
Feature activation-3.11e-02

Pos logit contributions
sed+0.013
ses+0.013
ho+0.013
aped+0.011
Deal+0.011
Neg logit contributions
lf-0.006
 cartoons-0.006
lore-0.005
 pack-0.005
 spa-0.005
Token joy
Token ID8716
Feature activation-3.77e-03

Pos logit contributions
ho+0.056
sed+0.053
ses+0.049
aped+0.044
 pad+0.041
Neg logit contributions
 cartoons-0.047
lf-0.047
 bath-0.044
 pack-0.044
 films-0.043
Token and
Token ID290
Feature activation-1.60e-02

Pos logit contributions
ho+0.007
sed+0.007
ses+0.007
aped+0.006
 hope+0.006
Neg logit contributions
lf-0.005
 cartoons-0.004
ilts-0.004
 spa-0.004
lore-0.004
Token said
Token ID531
Feature activation-1.80e-02

Pos logit contributions
ho+0.030
sed+0.028
ses+0.028
 pad+0.024
rified+0.024
Neg logit contributions
lf-0.023
 pack-0.020
 cartoons-0.019
ilts-0.019
 organize-0.019
Token "
Token ID366
Feature activation-2.67e-02

Pos logit contributions
ses+0.042
sed+0.042
ho+0.040
 pad+0.037
rified+0.036
Neg logit contributions
lf-0.016
 cartoons-0.012
 spa-0.011
 organize-0.011
lore-0.011
Tokenyes
Token ID8505
Feature activation-6.57e-02

Pos logit contributions
sed+0.046
ses+0.044
ho+0.042
ks+0.035
 ear+0.034
Neg logit contributions
lf-0.043
During-0.040
 spa-0.040
ilts-0.039
lore-0.038
Token!"
Token ID2474
Feature activation-3.35e-03

Pos logit contributions
sed+0.134
ses+0.133
ho+0.130
aped+0.117
 hope+0.113
Neg logit contributions
lf-0.082
ilts-0.065
 spa-0.063
 cartoons-0.063
 pack-0.063
Token
Token ID198
Feature activation-2.69e-04

Pos logit contributions
ho+0.007
sed+0.007
ses+0.007
rified+0.006
Deal+0.006
Neg logit contributions
lf-0.003
ilts-0.003
 cartoons-0.003
 spa-0.003
During-0.003
Token
Token ID198
Feature activation+8.02e-03

Pos logit contributions
ho+0.001
sed+0.000
ses+0.000
aped+0.000
 hope+0.000
Neg logit contributions
lf-0.000
ilts-0.000
 cartoons-0.000
 spa-0.000
ruce-0.000
TokenFrom
Token ID4863
Feature activation-1.08e-02

Pos logit contributions
 cartoons+0.011
lf+0.011
 bath+0.011
ilts+0.011
 sausage+0.010
Neg logit contributions
ho-0.015
ks-0.013
sed-0.012
aped-0.012
ses-0.011
Token then
Token ID788
Feature activation-1.22e-02

Pos logit contributions
ho+0.022
sed+0.021
ses+0.020
aped+0.018
ks+0.018
Neg logit contributions
lf-0.014
 cartoons-0.012
ilts-0.012
 spa-0.012
ruce-0.011
Token of
Token ID286
Feature activation-1.28e-02

Pos logit contributions
ho+0.009
sed+0.009
ses+0.009
aped+0.008
rified+0.007
Neg logit contributions
lf-0.006
 cartoons-0.005
ilts-0.005
 spa-0.005
ruce-0.005
Token himself
Token ID2241
Feature activation-1.60e-02

Pos logit contributions
ho+0.031
sed+0.030
ses+0.028
aped+0.026
als+0.026
Neg logit contributions
lf-0.014
 cartoons-0.012
 chores-0.010
 spa-0.010
 homework-0.010
Token and
Token ID290
Feature activation-9.58e-03

Pos logit contributions
ho+0.030
sed+0.029
ses+0.027
aped+0.024
 hope+0.023
Neg logit contributions
lf-0.024
 cartoons-0.021
ilts-0.021
 spa-0.020
 pack-0.020
Token said
Token ID531
Feature activation-1.33e-02

Pos logit contributions
ho+0.020
sed+0.019
ses+0.019
rified+0.016
 pad+0.016
Neg logit contributions
lf-0.012
 pack-0.010
 cartoons-0.010
 spa-0.010
During-0.010
Token "
Token ID366
Feature activation-2.54e-02

Pos logit contributions
sed+0.029
ho+0.028
ses+0.028
 pad+0.025
rified+0.024
Neg logit contributions
lf-0.014
 cartoons-0.012
ilts-0.012
 spa-0.012
 pack-0.011
Tokenyes
Token ID8505
Feature activation-6.52e-02

Pos logit contributions
sed+0.050
ses+0.048
ho+0.042
 ear+0.036
ks+0.036
Neg logit contributions
lf-0.038
During-0.037
 spa-0.035
lore-0.034
Johnny-0.034
Token,
Token ID11
Feature activation-2.25e-02

Pos logit contributions
sed+0.138
ho+0.135
ses+0.135
aped+0.118
ks+0.116
Neg logit contributions
lf-0.079
ilts-0.063
 cartoons-0.062
 spa-0.062
 pack-0.061
Token I
Token ID314
Feature activation-3.10e-02

Pos logit contributions
sed+0.041
ho+0.041
ses+0.039
aped+0.035
ks+0.034
Neg logit contributions
lf-0.034
 cartoons-0.030
 spa-0.030
 mechanic-0.029
 pack-0.029
Token sure
Token ID1654
Feature activation-1.12e-02

Pos logit contributions
ho+0.046
sed+0.044
ses+0.041
aped+0.034
ks+0.034
Neg logit contributions
lf-0.056
ilts-0.053
 cartoons-0.051
 spa-0.051
During-0.050
Token was
Token ID373
Feature activation+4.91e-04

Pos logit contributions
ho+0.021
sed+0.020
ses+0.019
rified+0.016
ked+0.016
Neg logit contributions
lf-0.016
ilts-0.014
 cartoons-0.014
 pack-0.014
 spa-0.014
Token".
Token ID1911
Feature activation-8.47e-03

Pos logit contributions
lf+0.001
 cartoons+0.001
ilts+0.001
 spa+0.001
 pack+0.001
Neg logit contributions
ho-0.001
sed-0.001
ses-0.001
aped-0.001
rified-0.001
Token
Token ID6184
Feature activation-8.72e-03

Pos logit contributions
ses+0.024
sed+0.024
ho+0.024
 hope+0.021
ks+0.021
Neg logit contributions
lf-0.014
 spa-0.013
 cartoons-0.013
During-0.012
 mechanic-0.012
Token
Token ID95
Feature activation-3.39e-03

Pos logit contributions
ho+0.015
sed+0.014
ses+0.014
aped+0.012
ks+0.012
Neg logit contributions
lf-0.014
ilts-0.013
 cartoons-0.012
 spa-0.012
During-0.012
Token
Token ID26391
Feature activation-1.08e-02

Pos logit contributions
ho+0.006
sed+0.005
ses+0.005
ks+0.004
aped+0.004
Neg logit contributions
lf-0.006
ilts-0.005
 cartoons-0.005
 spa-0.005
lore-0.005
Token
Token ID129
Feature activation-1.07e-02

Pos logit contributions
ho+0.020
sed+0.019
ses+0.018
aped+0.016
ks+0.016
Neg logit contributions
lf-0.016
ilts-0.015
 cartoons-0.014
 spa-0.014
lore-0.013
Token
Token ID241
Feature activation-2.46e-02

Pos logit contributions
ho+0.018
sed+0.018
ses+0.017
ks+0.014
aped+0.014
Neg logit contributions
lf-0.017
ilts-0.016
 cartoons-0.015
 spa-0.015
ruce-0.014
TokenOf
Token ID5189
Feature activation-6.51e-02

Pos logit contributions
sed+0.047
ho+0.045
ses+0.044
als+0.038
ked+0.037
Neg logit contributions
During-0.033
 spa-0.032
On-0.031
 sport-0.030
While-0.030
Token course
Token ID1781
Feature activation-4.96e-02

Pos logit contributions
sed+0.144
ses+0.138
ho+0.127
aped+0.116
 pad+0.113
Neg logit contributions
ister-0.075
 concert-0.075
lf-0.074
unt-0.071
?"-0.070
Token!
Token ID0
Feature activation-2.14e-02

Pos logit contributions
ses+0.114
sed+0.112
ho+0.108
aped+0.094
ked+0.093
Neg logit contributions
lf-0.051
?"-0.041
 cartoons-0.041
 spa-0.040
 bath-0.040
Token Come
Token ID7911
Feature activation-4.02e-02

Pos logit contributions
ho+0.034
sed+0.033
ses+0.031
aped+0.025
ks+0.025
Neg logit contributions
 spa-0.034
ilts-0.034
During-0.033
lf-0.033
 cartoons-0.033
Token join
Token ID4654
Feature activation-3.78e-02

Pos logit contributions
ho+0.073
sed+0.072
ses+0.069
aped+0.059
rified+0.058
Neg logit contributions
lf-0.057
 cartoons-0.053
 spa-0.053
ilts-0.053
 pack-0.052
Token us
Token ID514
Feature activation-1.53e-02

Pos logit contributions
sed+0.080
ho+0.080
ses+0.077
aped+0.066
Deal+0.063
Neg logit contributions
lf-0.045
 cartoons-0.042
 bath-0.042
 spa-0.042
ilts-0.040
Token around
Token ID1088
Feature activation-1.57e-02

Pos logit contributions
ho+0.034
sed+0.032
ses+0.030
aped+0.025
 hope+0.025
Neg logit contributions
lf-0.040
ilts-0.038
 cartoons-0.037
 spa-0.036
During-0.036
Token in
Token ID287
Feature activation-3.42e-02

Pos logit contributions
sed+0.033
ho+0.032
ses+0.032
Deal+0.028
aped+0.028
Neg logit contributions
lf-0.019
 cartoons-0.017
 bath-0.016
 films-0.015
 sport-0.015
Token circles
Token ID13332
Feature activation-1.14e-02

Pos logit contributions
ho+0.071
sed+0.070
ses+0.065
aped+0.060
Deal+0.057
Neg logit contributions
lf-0.043
 cartoons-0.039
ilts-0.038
 bath-0.037
 spa-0.037
Token.
Token ID13
Feature activation+5.84e-03

Pos logit contributions
ho+0.020
sed+0.020
ses+0.019
 hope+0.016
aped+0.016
Neg logit contributions
lf-0.018
 cartoons-0.016
ilts-0.016
 spa-0.015
ruce-0.015
Token "
Token ID366
Feature activation-2.13e-02

Pos logit contributions
lf+0.011
ilts+0.011
 spa+0.010
 cartoons+0.010
iver+0.010
Neg logit contributions
ho-0.008
sed-0.007
ses-0.007
aped-0.006
 ear-0.005
TokenWhy
Token ID5195
Feature activation-6.51e-02

Pos logit contributions
ho+0.029
sed+0.029
ses+0.027
ks+0.022
aped+0.022
Neg logit contributions
lf-0.040
ilts-0.038
 cartoons-0.037
 spa-0.037
During-0.037
Token are
Token ID389
Feature activation+2.44e-02

Pos logit contributions
ho+0.082
sed+0.077
ses+0.071
aped+0.058
ks+0.058
Neg logit contributions
lf-0.132
ilts-0.125
 cartoons-0.122
 spa-0.121
ruce-0.118
Token you
Token ID345
Feature activation-1.79e-02

Pos logit contributions
ilts+0.050
lf+0.049
iver+0.048
 spa+0.047
 product+0.046
Neg logit contributions
ho-0.032
sed-0.028
ses-0.024
aped-0.021
als-0.021
Token running
Token ID2491
Feature activation-1.82e-02

Pos logit contributions
ho+0.029
sed+0.028
ses+0.025
 ear+0.022
 pad+0.022
Neg logit contributions
lf-0.031
ilts-0.030
 spa-0.028
 product-0.028
 cartoons-0.028
Token like
Token ID588
Feature activation-1.31e-02

Pos logit contributions
ho+0.029
sed+0.027
ses+0.025
ks+0.022
 hope+0.022
Neg logit contributions
lf-0.032
ilts-0.030
 spa-0.028
 cartoons-0.028
 product-0.028
Token that
Token ID326
Feature activation+5.91e-04

Pos logit contributions
ho+0.023
sed+0.022
ses+0.020
 hope+0.018
aped+0.018
Neg logit contributions
lf-0.020
ilts-0.019
 spa-0.018
 product-0.018
iver-0.018
Token said
Token ID531
Feature activation-1.42e-02

Pos logit contributions
ho+0.032
sed+0.032
ses+0.031
ks+0.026
aped+0.026
Neg logit contributions
lf-0.030
ilts-0.026
 cartoons-0.026
 spa-0.026
During-0.026
Token,
Token ID11
Feature activation-8.21e-03

Pos logit contributions
ses+0.031
sed+0.031
ho+0.030
 pad+0.028
aped+0.027
Neg logit contributions
lf-0.012
 cartoons-0.010
 bath-0.010
lore-0.009
 doctor-0.009
Token "
Token ID366
Feature activation-2.38e-02

Pos logit contributions
ho+0.016
sed+0.016
ses+0.016
Deal+0.015
 hope+0.014
Neg logit contributions
lf-0.009
 cartoons-0.008
 spa-0.008
 opener-0.008
During-0.008
TokenHi
Token ID17250
Feature activation-2.91e-02

Pos logit contributions
sed+0.039
ses+0.037
ho+0.035
ks+0.030
 ear+0.030
Neg logit contributions
lf-0.038
During-0.036
 spa-0.036
ilts-0.035
lore-0.035
Token!
Token ID0
Feature activation-2.07e-02

Pos logit contributions
ho+0.058
sed+0.056
ses+0.053
aped+0.048
ks+0.047
Neg logit contributions
lf-0.038
ilts-0.034
 cartoons-0.034
 spa-0.033
lore-0.031
Token What
Token ID1867
Feature activation-6.48e-02

Pos logit contributions
sed+0.032
ho+0.031
ses+0.030
 hope+0.027
aped+0.026
Neg logit contributions
lf-0.031
During-0.031
 spa-0.031
 cartoons-0.030
 mechanic-0.030
Token is
Token ID318
Feature activation+1.39e-02

Pos logit contributions
ho+0.098
sed+0.094
ses+0.087
aped+0.072
ked+0.071
Neg logit contributions
lf-0.113
ilts-0.105
 cartoons-0.104
 pack-0.103
 bath-0.103
Token your
Token ID534
Feature activation-1.63e-02

Pos logit contributions
lf+0.017
ilts+0.015
 cartoons+0.015
 spa+0.014
ruce+0.014
Neg logit contributions
ho-0.029
sed-0.028
ses-0.026
aped-0.024
ks-0.024
Token name
Token ID1438
Feature activation-2.33e-02

Pos logit contributions
ho+0.030
sed+0.028
ses+0.027
rified+0.026
aped+0.026
Neg logit contributions
lf-0.021
 bath-0.020
ilts-0.020
 cartoons-0.020
 spa-0.020
Token?"
Token ID1701
Feature activation-1.35e-02

Pos logit contributions
ho+0.059
sed+0.056
ses+0.053
 ear+0.050
ed+0.049
Neg logit contributions
lf-0.017
ilts-0.016
 cartoons-0.013
 spa-0.013
During-0.013
Token
Token ID198
Feature activation-1.32e-02

Pos logit contributions
ho+0.024
sed+0.023
ses+0.021
aped+0.018
ks+0.018
Neg logit contributions
lf-0.021
 cartoons-0.019
ilts-0.019
 spa-0.019
During-0.018
Token were
Token ID547
Feature activation+3.16e-03

Pos logit contributions
ho+0.019
sed+0.019
ses+0.018
Deal+0.016
rified+0.015
Neg logit contributions
lf-0.017
ilts-0.015
 cartoons-0.015
ruce-0.014
 product-0.014
Token reflecting
Token ID20252
Feature activation-2.38e-02

Pos logit contributions
lf+0.005
ilts+0.004
 cartoons+0.004
 spa+0.004
lore+0.004
Neg logit contributions
ho-0.006
sed-0.005
ses-0.005
ks-0.005
aped-0.005
Token off
Token ID572
Feature activation-5.34e-04

Pos logit contributions
ho+0.045
sed+0.043
ses+0.041
aped+0.036
ks+0.036
Neg logit contributions
lf-0.034
ilts-0.031
 cartoons-0.030
 spa-0.029
ruce-0.029
Token the
Token ID262
Feature activation-1.77e-02

Pos logit contributions
sed+0.001
ho+0.001
ses+0.001
aped+0.001
Deal+0.001
Neg logit contributions
lf-0.001
 cartoons-0.001
ilts-0.001
 films-0.001
 bath-0.001
Token lake
Token ID13546
Feature activation-2.64e-02

Pos logit contributions
ho+0.041
sed+0.039
ses+0.037
Deal+0.037
aped+0.036
Neg logit contributions
 carpet-0.017
lf-0.016
 cartoons-0.016
 bath-0.016
 television-0.015
Token below
Token ID2174
Feature activation-6.46e-02

Pos logit contributions
sed+0.052
ho+0.050
ses+0.050
aped+0.044
Deal+0.043
Neg logit contributions
lf-0.034
 cartoons-0.031
ilts-0.031
 pack-0.030
 bath-0.028
Token.
Token ID13
Feature activation-3.44e-03

Pos logit contributions
sed+0.159
ses+0.154
Deal+0.139
ho+0.139
aped+0.130
Neg logit contributions
 cartoons-0.055
lf-0.055
 bath-0.052
 pack-0.050
 TV-0.047
Token
Token ID198
Feature activation-8.08e-04

Pos logit contributions
ho+0.007
sed+0.006
ses+0.006
aped+0.005
ks+0.005
Neg logit contributions
lf-0.005
ilts-0.004
 cartoons-0.004
 spa-0.004
During-0.004
Token
Token ID198
Feature activation+6.97e-03

Pos logit contributions
ho+0.002
sed+0.001
ses+0.001
aped+0.001
ks+0.001
Neg logit contributions
lf-0.001
ilts-0.001
 cartoons-0.001
 spa-0.001
ruce-0.001
TokenSoon
Token ID28093
Feature activation-4.91e-03

Pos logit contributions
 cartoons+0.011
 homework+0.011
 films+0.011
 bath+0.010
ilts+0.010
Neg logit contributions
ho-0.011
OK-0.009
aped-0.009
ks-0.009
Deal-0.009
Token,
Token ID11
Feature activation+8.61e-03

Pos logit contributions
ho+0.008
sed+0.008
ses+0.007
 hope+0.007
ks+0.007
Neg logit contributions
lf-0.008
ilts-0.007
 cartoons-0.007
 spa-0.007
iver-0.007
Token man
Token ID582
Feature activation-2.14e-02

Pos logit contributions
ilts+0.019
lf+0.019
 cartoons+0.017
 spa+0.017
ruce+0.017
Neg logit contributions
ho-0.027
sed-0.025
ses-0.023
 hope-0.021
 ear-0.021
Token smiled
Token ID13541
Feature activation-2.58e-02

Pos logit contributions
ho+0.032
sed+0.031
ses+0.030
aped+0.024
ks+0.024
Neg logit contributions
lf-0.038
ilts-0.034
 cartoons-0.034
 spa-0.033
During-0.033
Token and
Token ID290
Feature activation-3.42e-02

Pos logit contributions
ho+0.055
sed+0.054
ses+0.052
aped+0.047
ks+0.044
Neg logit contributions
lf-0.027
 cartoons-0.026
ilts-0.025
 spa-0.025
 pack-0.024
Token said
Token ID531
Feature activation-1.46e-02

Pos logit contributions
ho+0.061
sed+0.057
ses+0.055
ks+0.049
rified+0.048
Neg logit contributions
lf-0.051
 spa-0.045
 cartoons-0.045
During-0.044
ilts-0.043
Token,"
Token ID553
Feature activation-1.60e-02

Pos logit contributions
ses+0.035
sed+0.034
ho+0.032
 pad+0.031
ks+0.029
Neg logit contributions
lf-0.013
 cartoons-0.010
 doctor-0.010
 mechanic-0.009
?"-0.009
TokenOf
Token ID5189
Feature activation-6.44e-02

Pos logit contributions
sed+0.028
ses+0.027
ho+0.026
ks+0.022
 pad+0.022
Neg logit contributions
lf-0.027
 spa-0.023
 cartoons-0.023
 mechanic-0.023
During-0.023
Token course
Token ID1781
Feature activation-4.66e-02

Pos logit contributions
sed+0.136
ses+0.130
ho+0.105
 pad+0.101
aped+0.099
Neg logit contributions
lf-0.093
 concert-0.077
que-0.075
ister-0.075
?"-0.073
Token I
Token ID314
Feature activation-3.95e-02

Pos logit contributions
ses+0.111
sed+0.109
ho+0.105
aped+0.096
 pad+0.093
Neg logit contributions
lf-0.045
?"-0.036
 mechanic-0.035
 operation-0.034
 cartoons-0.034
Token can
Token ID460
Feature activation-2.76e-02

Pos logit contributions
ho+0.061
sed+0.060
ses+0.057
rified+0.048
aped+0.047
Neg logit contributions
lf-0.065
 pack-0.063
ilts-0.063
 opener-0.062
 organize-0.062
Token!
Token ID0
Feature activation-1.71e-02

Pos logit contributions
ho+0.049
sed+0.047
ses+0.044
aped+0.039
ks+0.038
Neg logit contributions
lf-0.042
ilts-0.039
 cartoons-0.038
 spa-0.037
ruce-0.036
TokenWatch
Token ID10723
Feature activation-4.11e-02

Pos logit contributions
ho+0.029
sed+0.027
ses+0.025
aped+0.022
ks+0.022
Neg logit contributions
lf-0.028
 spa-0.027
 cartoons-0.027
ilts-0.026
During-0.026
Token dirty
Token ID11841
Feature activation+5.52e-02

Pos logit contributions
lf+0.029
ilts+0.028
 spa+0.026
 cartoons+0.025
lore+0.025
Neg logit contributions
ho-0.048
sed-0.044
ses-0.041
 hope-0.038
 ear-0.038
Token?
Token ID30
Feature activation+1.81e-02

Pos logit contributions
ilts+0.060
lf+0.059
During+0.054
lore+0.053
 spa+0.052
Neg logit contributions
ho-0.113
sed-0.104
ses-0.099
ed-0.097
 hope-0.094
Token You
Token ID921
Feature activation-3.46e-04

Pos logit contributions
lf+0.035
ilts+0.033
 cartoons+0.032
 spa+0.032
lore+0.031
Neg logit contributions
ho-0.025
sed-0.023
ses-0.021
aped-0.018
ks-0.018
Token need
Token ID761
Feature activation+2.20e-02

Pos logit contributions
ho+0.000
sed+0.000
 hope+0.000
ses+0.000
ks+0.000
Neg logit contributions
lf-0.001
 spa-0.001
 cartoons-0.001
ilts-0.001
 product-0.001
Token to
Token ID284
Feature activation+3.42e-02

Pos logit contributions
lf+0.037
ilts+0.034
lore+0.032
 cartoons+0.032
 spa+0.032
Neg logit contributions
ho-0.035
sed-0.034
ses-0.032
 hope-0.029
 ear-0.028
Token clean
Token ID3424
Feature activation+6.21e-02

Pos logit contributions
lf+0.054
irens+0.050
lore+0.049
 models+0.049
ilts+0.048
Neg logit contributions
 hope-0.052
ho-0.049
 ear-0.047
sed-0.043
ses-0.041
Token up
Token ID510
Feature activation+1.36e-02

Pos logit contributions
lf+0.112
ilts+0.107
During+0.105
 spa+0.104
 opener+0.103
Neg logit contributions
ho-0.081
sed-0.075
ses-0.070
 hope-0.070
 ear-0.069
Token after
Token ID706
Feature activation+1.82e-02

Pos logit contributions
lf+0.021
ilts+0.019
 spa+0.018
ruce+0.018
 cartoons+0.018
Neg logit contributions
ho-0.023
sed-0.021
ses-0.020
 hope-0.019
ks-0.018
Token you
Token ID345
Feature activation+5.16e-03

Pos logit contributions
ilts+0.030
During+0.029
lore+0.029
 mechanic+0.029
lf+0.029
Neg logit contributions
ho-0.025
sed-0.024
ses-0.022
 ear-0.021
aped-0.020
Token cook
Token ID4255
Feature activation+3.67e-02

Pos logit contributions
lf+0.009
ilts+0.008
 cartoons+0.008
 spa+0.008
lore+0.008
Neg logit contributions
ho-0.008
sed-0.007
ses-0.007
 hope-0.006
ks-0.006
Token.
Token ID13
Feature activation+1.03e-02

Pos logit contributions
lf+0.056
ilts+0.052
 cartoons+0.050
 spa+0.050
lore+0.049
Neg logit contributions
ho-0.063
sed-0.060
ses-0.056
 hope-0.050
ks-0.050
Token asked
Token ID1965
Feature activation-1.92e-03

Pos logit contributions
lf+0.013
ilts+0.012
 cartoons+0.012
 spa+0.011
During+0.011
Neg logit contributions
ho-0.015
sed-0.014
ses-0.014
aped-0.012
ks-0.012
Token her
Token ID607
Feature activation+1.27e-02

Pos logit contributions
sed+0.004
ho+0.004
ses+0.004
aped+0.004
ks+0.003
Neg logit contributions
lf-0.002
 cartoons-0.002
 bath-0.002
ilts-0.002
 pack-0.002
Token,
Token ID11
Feature activation+3.77e-03

Pos logit contributions
lf+0.022
ilts+0.021
 spa+0.020
iver+0.020
 cartoons+0.019
Neg logit contributions
ho-0.020
sed-0.019
ses-0.017
 hope-0.016
ks-0.016
Token "
Token ID366
Feature activation+3.23e-03

Pos logit contributions
lf+0.005
 cartoons+0.005
 spa+0.005
ilts+0.005
During+0.005
Neg logit contributions
ho-0.007
sed-0.007
ses-0.006
aped-0.006
rified-0.006
TokenWhy
Token ID5195
Feature activation-3.69e-02

Pos logit contributions
lf+0.005
ilts+0.004
 cartoons+0.004
 spa+0.004
During+0.004
Neg logit contributions
ho-0.006
sed-0.006
ses-0.006
ks-0.005
aped-0.005
Token are
Token ID389
Feature activation+4.54e-02

Pos logit contributions
ho+0.034
sed+0.031
ses+0.027
 hope+0.022
ks+0.020
Neg logit contributions
lf-0.087
ilts-0.084
 spa-0.084
 cartoons-0.082
During-0.081
Token you
Token ID345
Feature activation+1.18e-02

Pos logit contributions
ilts+0.106
iver+0.104
lf+0.100
 spa+0.099
 opener+0.098
Neg logit contributions
ho-0.048
sed-0.037
ses-0.028
 Hello-0.027
rified-0.026
Token sad
Token ID6507
Feature activation+4.74e-02

Pos logit contributions
lf+0.023
oons+0.023
ilts+0.022
 opener+0.022
During+0.021
Neg logit contributions
sed-0.015
ho-0.014
ses-0.013
 ear-0.012
ked-0.011
Token,
Token ID11
Feature activation+1.33e-02

Pos logit contributions
lf+0.062
ilts+0.062
 product+0.057
 spa+0.056
ruce+0.054
Neg logit contributions
ho-0.087
sed-0.085
ses-0.077
 ear-0.071
ed-0.071
Token Sally
Token ID25737
Feature activation+2.72e-02

Pos logit contributions
lf+0.023
ilts+0.022
 spa+0.020
 product+0.020
ruce+0.020
Neg logit contributions
ho-0.021
sed-0.020
ses-0.018
 ear-0.016
 hope-0.016
Token?"
Token ID1701
Feature activation-3.77e-05

Pos logit contributions
lf+0.021
ilts+0.018
 cartoons+0.017
 spa+0.016
ruce+0.016
Neg logit contributions
ho-0.067
sed-0.064
ses-0.061
 hope-0.058
ks-0.057
Token and
Token ID290
Feature activation-4.66e-03

Pos logit contributions
lf+0.012
ilts+0.011
 cartoons+0.011
 spa+0.011
ruce+0.010
Neg logit contributions
ho-0.018
sed-0.017
ses-0.016
aped-0.014
ks-0.014
Token realized
Token ID6939
Feature activation+1.27e-02

Pos logit contributions
ho+0.009
sed+0.008
ses+0.008
rified+0.007
aped+0.007
Neg logit contributions
lf-0.007
 cartoons-0.006
During-0.006
 pack-0.006
 spa-0.006
Token he
Token ID339
Feature activation+8.69e-03

Pos logit contributions
ilts+0.021
iver+0.019
lf+0.019
lore+0.018
 spa+0.018
Neg logit contributions
ho-0.020
sed-0.018
ses-0.017
 hope-0.017
 ear-0.017
Token forgot
Token ID16453
Feature activation+1.75e-02

Pos logit contributions
lf+0.015
ilts+0.013
 cartoons+0.013
ruce+0.012
During+0.012
Neg logit contributions
ho-0.014
sed-0.013
ses-0.013
aped-0.011
 Hello-0.011
Token to
Token ID284
Feature activation+1.48e-02

Pos logit contributions
lf+0.027
ilts+0.026
iver+0.024
 spa+0.024
lore+0.023
Neg logit contributions
ho-0.029
sed-0.027
ses-0.025
 hope-0.024
 ear-0.024
Token wash
Token ID13502
Feature activation+4.82e-02

Pos logit contributions
lf+0.020
ilts+0.019
 cartoons+0.018
 spa+0.018
ruce+0.017
Neg logit contributions
ho-0.028
sed-0.027
ses-0.025
 hope-0.023
 ear-0.023
Token them
Token ID606
Feature activation+1.23e-02

Pos logit contributions
lf+0.061
ilts+0.055
 cartoons+0.053
 spa+0.052
ruce+0.050
Neg logit contributions
ho-0.098
sed-0.094
ses-0.089
aped-0.080
ks-0.080
Token.
Token ID13
Feature activation+4.86e-03

Pos logit contributions
lf+0.024
ilts+0.022
 spa+0.022
 concert+0.022
lore+0.021
Neg logit contributions
ho-0.016
sed-0.013
 hope-0.012
ses-0.012
ed-0.012
Token He
Token ID679
Feature activation+5.74e-03

Pos logit contributions
lf+0.008
ilts+0.008
ruce+0.007
 spa+0.007
iver+0.007
Neg logit contributions
sed-0.008
ho-0.008
ses-0.007
 ear-0.006
rified-0.006
Token felt
Token ID2936
Feature activation+2.11e-03

Pos logit contributions
lf+0.011
ilts+0.011
 cartoons+0.010
 spa+0.010
ruce+0.010
Neg logit contributions
ho-0.008
sed-0.007
ses-0.007
 hope-0.006
aped-0.006
Token a
Token ID257
Feature activation-2.38e-02

Pos logit contributions
lf+0.004
ilts+0.003
 cartoons+0.003
 spa+0.003
lore+0.003
Neg logit contributions
ho-0.003
sed-0.003
ses-0.003
 hope-0.003
 ear-0.003
Token!
Token ID0
Feature activation-1.43e-02

Pos logit contributions
ho+0.011
sed+0.010
ses+0.010
rified+0.009
aped+0.009
Neg logit contributions
lf-0.008
 cartoons-0.007
ilts-0.007
 spa-0.007
 product-0.007
Token
Token ID198
Feature activation-4.34e-03

Pos logit contributions
ho+0.030
sed+0.028
ses+0.027
aped+0.024
 hope+0.024
Neg logit contributions
lf-0.018
ilts-0.016
 cartoons-0.016
During-0.015
 spa-0.015
Token
Token ID198
Feature activation+2.88e-03

Pos logit contributions
ho+0.009
sed+0.009
ses+0.008
aped+0.007
rified+0.007
Neg logit contributions
lf-0.006
ilts-0.005
 cartoons-0.005
 spa-0.005
During-0.005
TokenWhen
Token ID2215
Feature activation-3.95e-03

Pos logit contributions
lf+0.004
ilts+0.004
 cartoons+0.004
 spa+0.004
 bath+0.003
Neg logit contributions
ho-0.005
sed-0.005
ses-0.005
aped-0.004
ks-0.004
Token they
Token ID484
Feature activation+9.06e-03

Pos logit contributions
ho+0.008
sed+0.008
ses+0.007
ks+0.007
aped+0.006
Neg logit contributions
lf-0.005
 cartoons-0.005
ilts-0.004
 spa-0.004
During-0.004
Token finished
Token ID5201
Feature activation+4.59e-02

Pos logit contributions
lf+0.015
ilts+0.014
 cartoons+0.014
 spa+0.014
ruce+0.013
Neg logit contributions
ho-0.015
sed-0.014
ses-0.013
aped-0.011
ks-0.011
Token their
Token ID511
Feature activation+2.28e-02

Pos logit contributions
lf+0.072
ilts+0.071
iver+0.070
During+0.067
 spa+0.066
Neg logit contributions
ho-0.077
sed-0.069
ses-0.063
aped-0.061
 hope-0.059
Token mixing
Token ID17090
Feature activation+3.05e-02

Pos logit contributions
lf+0.022
ilts+0.019
 cartoons+0.019
 spa+0.018
ruce+0.017
Neg logit contributions
ho-0.053
sed-0.051
ses-0.049
ks-0.045
aped-0.045
Token,
Token ID11
Feature activation+1.44e-02

Pos logit contributions
lf+0.040
ilts+0.036
 cartoons+0.035
 spa+0.034
ruce+0.033
Neg logit contributions
ho-0.061
sed-0.059
ses-0.056
ks-0.050
aped-0.050
Token Amy
Token ID14235
Feature activation-7.34e-03

Pos logit contributions
lf+0.019
 cartoons+0.017
ilts+0.017
 spa+0.016
During+0.016
Neg logit contributions
ho-0.028
sed-0.027
ses-0.026
aped-0.023
ks-0.023
Token's
Token ID338
Feature activation+9.14e-03

Pos logit contributions
ho+0.012
sed+0.012
ses+0.011
aped+0.010
ks+0.010
Neg logit contributions
lf-0.012
ilts-0.011
 cartoons-0.010
 spa-0.010
lore-0.010
Token."
Token ID526
Feature activation+1.49e-02

Pos logit contributions
lf+0.037
ilts+0.035
 cartoons+0.034
During+0.033
 spa+0.033
Neg logit contributions
ho-0.033
sed-0.029
ses-0.027
 hope-0.026
ed-0.025
Token
Token ID220
Feature activation+4.84e-03

Pos logit contributions
lf+0.025
ilts+0.023
 cartoons+0.022
 spa+0.022
ruce+0.021
Neg logit contributions
ho-0.025
sed-0.023
ses-0.022
aped-0.019
ks-0.019
Token
Token ID198
Feature activation+1.23e-02

Pos logit contributions
lf+0.008
 cartoons+0.007
ilts+0.007
 spa+0.007
lore+0.007
Neg logit contributions
ho-0.008
sed-0.007
ses-0.007
ks-0.006
aped-0.006
Token
Token ID198
Feature activation+2.02e-02

Pos logit contributions
lf+0.022
ilts+0.021
 cartoons+0.020
 spa+0.020
lore+0.020
Neg logit contributions
ho-0.018
sed-0.016
ses-0.014
ks-0.014
 hope-0.013
TokenWhile
Token ID3633
Feature activation+2.73e-02

Pos logit contributions
 cartoons+0.033
 films+0.032
 operation+0.032
 fever+0.031
 opener+0.031
Neg logit contributions
ho-0.029
OK-0.026
 pad-0.024
Deal-0.023
Ok-0.023
Token washing
Token ID20518
Feature activation+5.17e-02

Pos logit contributions
ilts+0.041
lf+0.040
iver+0.039
lore+0.038
ruce+0.038
Neg logit contributions
ho-0.048
sed-0.045
ses-0.042
aped-0.039
 ear-0.038
Token the
Token ID262
Feature activation+3.38e-02

Pos logit contributions
ilts+0.085
lf+0.084
lore+0.081
During+0.079
iver+0.079
Neg logit contributions
ho-0.081
sed-0.072
ed-0.067
ses-0.066
aped-0.064
Token dishes
Token ID16759
Feature activation+4.33e-02

Pos logit contributions
lore+0.044
ilts+0.044
lf+0.044
iver+0.043
During+0.043
Neg logit contributions
 hope-0.049
ed-0.049
sed-0.049
 ear-0.048
ho-0.047
Token,
Token ID11
Feature activation+3.22e-02

Pos logit contributions
lf+0.073
ilts+0.071
 cartoons+0.068
 spa+0.067
lore+0.066
Neg logit contributions
ho-0.069
sed-0.063
ses-0.059
aped-0.053
 hope-0.052
Token Lily
Token ID20037
Feature activation+2.11e-02

Pos logit contributions
ilts+0.047
lf+0.047
iver+0.044
ruce+0.043
 spa+0.043
Neg logit contributions
ho-0.057
sed-0.054
ses-0.050
 hope-0.046
 ear-0.046
Token saw
Token ID2497
Feature activation+4.37e-03

Pos logit contributions
lf+0.037
 cartoons+0.036
ilts+0.036
ruce+0.034
 spa+0.034
Neg logit contributions
ho-0.030
sed-0.028
ses-0.025
 pad-0.024
aped-0.024
Token said
Token ID531
Feature activation+1.35e-02

Pos logit contributions
lf+0.044
ilts+0.039
 cartoons+0.039
ruce+0.037
 spa+0.037
Neg logit contributions
ho-0.050
sed-0.048
ses-0.046
aped-0.040
ks-0.039
Token,
Token ID11
Feature activation+1.15e-02

Pos logit contributions
lf+0.016
 cartoons+0.014
ilts+0.014
 spa+0.013
ruce+0.013
Neg logit contributions
ho-0.028
sed-0.027
ses-0.026
aped-0.023
ks-0.023
Token "
Token ID366
Feature activation+1.37e-02

Pos logit contributions
lf+0.017
 cartoons+0.016
ilts+0.015
 spa+0.015
During+0.015
Neg logit contributions
ho-0.021
sed-0.020
ses-0.019
ks-0.017
aped-0.017
TokenTim
Token ID14967
Feature activation+1.33e-02

Pos logit contributions
lf+0.024
ilts+0.023
 cartoons+0.022
 spa+0.022
During+0.022
Neg logit contributions
ho-0.021
sed-0.020
ses-0.019
ks-0.016
aped-0.016
Tokenmy
Token ID1820
Feature activation+2.37e-02

Pos logit contributions
lf+0.017
ilts+0.015
 cartoons+0.015
 spa+0.014
During+0.014
Neg logit contributions
ho-0.027
sed-0.026
ses-0.025
aped-0.022
ks-0.022
Token,
Token ID11
Feature activation+9.29e-03

Pos logit contributions
lf+0.029
ilts+0.027
 cartoons+0.025
 spa+0.024
 pack+0.024
Neg logit contributions
ho-0.049
sed-0.047
ses-0.045
aped-0.040
ks-0.039
Token it
Token ID340
Feature activation+1.44e-02

Pos logit contributions
lf+0.015
ilts+0.013
 cartoons+0.013
 spa+0.013
ruce+0.013
Neg logit contributions
ho-0.016
sed-0.015
ses-0.014
aped-0.013
ks-0.013
Token's
Token ID338
Feature activation+2.65e-02

Pos logit contributions
lf+0.027
ilts+0.025
 spa+0.024
 cartoons+0.024
 product+0.024
Neg logit contributions
ho-0.020
sed-0.019
ses-0.018
 hope-0.016
aped-0.015
Token not
Token ID407
Feature activation+2.91e-02

Pos logit contributions
lf+0.043
ilts+0.039
lore+0.038
 spa+0.037
irens+0.036
Neg logit contributions
sed-0.043
ses-0.041
ho-0.040
 ear-0.037
 hope-0.035
Token nice
Token ID3621
Feature activation+3.01e-02

Pos logit contributions
lf+0.050
ilts+0.045
 spa+0.043
ruce+0.042
lore+0.042
Neg logit contributions
sed-0.044
ho-0.043
ses-0.042
 hope-0.038
 ear-0.038
Token to
Token ID284
Feature activation+2.12e-02

Pos logit contributions
lf+0.046
ilts+0.043
 cartoons+0.041
 spa+0.041
ruce+0.040
Neg logit contributions
ho-0.053
sed-0.051
ses-0.048
aped-0.042
ks-0.042
Token asked
Token ID1965
Feature activation-6.30e-03

Pos logit contributions
lf+0.007
ilts+0.006
 cartoons+0.006
 spa+0.006
During+0.006
Neg logit contributions
ho-0.007
sed-0.007
ses-0.006
ks-0.006
aped-0.006
Token for
Token ID329
Feature activation+8.60e-03

Pos logit contributions
ho+0.015
sed+0.015
ses+0.015
aped+0.013
rified+0.012
Neg logit contributions
lf-0.007
 cartoons-0.006
 organize-0.005
 chores-0.005
 homework-0.005
Token a
Token ID257
Feature activation-8.79e-03

Pos logit contributions
lf+0.011
 cartoons+0.010
 spa+0.010
 chores+0.009
 bath+0.009
Neg logit contributions
ho-0.018
sed-0.017
ses-0.017
aped-0.015
rified-0.014
Token green
Token ID4077
Feature activation+7.51e-03

Pos logit contributions
ho+0.018
sed+0.017
ses+0.016
aped+0.015
ks+0.015
Neg logit contributions
lf-0.011
ilts-0.010
 spa-0.010
 cartoons-0.010
 bath-0.009
Token spoon
Token ID24556
Feature activation+1.82e-02

Pos logit contributions
lf+0.009
 cartoons+0.008
 spa+0.008
ilts+0.008
 bath+0.008
Neg logit contributions
ho-0.015
sed-0.015
ses-0.014
ks-0.013
aped-0.013
Token.
Token ID13
Feature activation+1.09e-02

Pos logit contributions
lf+0.026
ilts+0.023
 cartoons+0.023
 spa+0.023
ruce+0.022
Neg logit contributions
ho-0.033
sed-0.033
ses-0.031
aped-0.027
ks-0.027
Token She
Token ID1375
Feature activation+6.71e-03

Pos logit contributions
lf+0.014
 cartoons+0.014
ilts+0.013
During+0.013
 spa+0.013
Neg logit contributions
ho-0.021
sed-0.021
ses-0.020
ks-0.017
 hope-0.017
Token was
Token ID373
Feature activation-1.08e-02

Pos logit contributions
lf+0.011
ilts+0.009
 spa+0.009
During+0.009
 cartoons+0.009
Neg logit contributions
ho-0.011
sed-0.011
ses-0.011
rified-0.009
ks-0.009
Token so
Token ID523
Feature activation-3.02e-03

Pos logit contributions
ho+0.020
sed+0.019
ses+0.018
aped+0.016
rified+0.016
Neg logit contributions
lf-0.015
ilts-0.014
 cartoons-0.014
 spa-0.014
ruce-0.013
Token excited
Token ID6568
Feature activation-6.11e-03

Pos logit contributions
ho+0.006
sed+0.005
ses+0.005
rified+0.004
ks+0.004
Neg logit contributions
lf-0.004
ilts-0.004
 cartoons-0.004
 spa-0.004
ruce-0.004
Token to
Token ID284
Feature activation-1.57e-02

Pos logit contributions
ho+0.011
sed+0.011
ses+0.011
aped+0.009
ks+0.009
Neg logit contributions
lf-0.009
ilts-0.008
 cartoons-0.008
 spa-0.008
ruce-0.007
Token learn
Token ID2193
Feature activation+2.02e-02

Pos logit contributions
lf+0.025
arre+0.025
ooky+0.023
ruce+0.023
ilts+0.023
Neg logit contributions
 hope-0.015
ho-0.014
 ear-0.013
sed-0.012
ses-0.011
Token.
Token ID13
Feature activation+1.67e-02

Pos logit contributions
lf+0.033
ilts+0.031
iver+0.029
 spa+0.029
lore+0.028
Neg logit contributions
ho-0.032
sed-0.030
ses-0.028
 hope-0.027
ks-0.027
Token We
Token ID775
Feature activation+5.92e-03

Pos logit contributions
 cartoons+0.026
lf+0.026
 spa+0.026
 films+0.025
ilts+0.025
Neg logit contributions
ho-0.028
sed-0.025
ses-0.023
aped-0.021
ed-0.020
Token want
Token ID765
Feature activation-3.51e-03

Pos logit contributions
lf+0.011
ilts+0.011
 pack+0.010
 spa+0.010
 cartoons+0.010
Neg logit contributions
ho-0.008
sed-0.008
ses-0.007
aped-0.006
rified-0.006
Token to
Token ID284
Feature activation+1.53e-02

Pos logit contributions
ho+0.006
sed+0.006
ses+0.005
 hope+0.005
ks+0.005
Neg logit contributions
lf-0.006
ilts-0.005
 spa-0.005
 cartoons-0.005
lore-0.005
Token cook
Token ID4255
Feature activation+3.81e-02

Pos logit contributions
lf+0.028
arre+0.026
ilts+0.026
ruce+0.025
 spa+0.025
Neg logit contributions
ho-0.021
 hope-0.020
sed-0.019
 ear-0.018
ses-0.018
Token with
Token ID351
Feature activation+1.62e-02

Pos logit contributions
lf+0.052
ilts+0.047
 cartoons+0.046
 spa+0.045
ruce+0.043
Neg logit contributions
ho-0.075
sed-0.071
ses-0.068
aped-0.060
ks-0.059
Token you
Token ID345
Feature activation-1.08e-02

Pos logit contributions
lf+0.021
ilts+0.019
 cartoons+0.019
 spa+0.018
ruce+0.018
Neg logit contributions
ho-0.032
sed-0.031
ses-0.030
aped-0.026
ks-0.026
Token.
Token ID13
Feature activation+1.35e-02

Pos logit contributions
ho+0.019
sed+0.018
ses+0.017
ks+0.015
aped+0.015
Neg logit contributions
lf-0.017
ilts-0.015
 cartoons-0.015
 spa-0.015
ruce-0.014
Token We
Token ID775
Feature activation+6.08e-03

Pos logit contributions
 cartoons+0.022
lf+0.022
 spa+0.021
 films+0.021
ilts+0.021
Neg logit contributions
ho-0.022
sed-0.020
ses-0.018
aped-0.016
ed-0.016
Token want
Token ID765
Feature activation-2.60e-03

Pos logit contributions
lf+0.012
ilts+0.011
 cartoons+0.011
 spa+0.011
 pack+0.011
Neg logit contributions
ho-0.008
sed-0.008
ses-0.007
aped-0.006
ks-0.006
Token She
Token ID1375
Feature activation+1.08e-02

Pos logit contributions
lf+0.005
ilts+0.004
 cartoons+0.004
 spa+0.004
During+0.004
Neg logit contributions
ho-0.013
sed-0.012
ses-0.012
aped-0.011
ks-0.011
Token had
Token ID550
Feature activation+8.75e-03

Pos logit contributions
lf+0.022
ilts+0.021
 cartoons+0.020
 spa+0.020
ruce+0.020
Neg logit contributions
ho-0.014
sed-0.013
ses-0.011
aped-0.009
ks-0.009
Token a
Token ID257
Feature activation+9.25e-03

Pos logit contributions
lf+0.012
ilts+0.011
 cartoons+0.010
 spa+0.010
ruce+0.010
Neg logit contributions
ho-0.017
sed-0.017
ses-0.016
aped-0.014
ks-0.014
Token lot
Token ID1256
Feature activation+2.55e-02

Pos logit contributions
lf+0.012
ilts+0.012
iver+0.011
ruce+0.011
lore+0.011
Neg logit contributions
ho-0.018
sed-0.018
ses-0.017
 hope-0.015
 ear-0.015
Token of
Token ID286
Feature activation+2.12e-02

Pos logit contributions
lf+0.040
ilts+0.035
 cartoons+0.034
lore+0.034
iver+0.033
Neg logit contributions
ho-0.044
sed-0.039
ks-0.039
ed-0.038
ses-0.037
Token fun
Token ID1257
Feature activation+1.15e-02

Pos logit contributions
lf+0.035
lore+0.032
ilts+0.032
During+0.030
iver+0.030
Neg logit contributions
sed-0.032
 hope-0.031
ses-0.030
ho-0.030
 ear-0.029
Token spl
Token ID4328
Feature activation+1.66e-02

Pos logit contributions
 mechanic+0.021
 spa+0.020
ilts+0.020
 severe+0.020
 hire+0.019
Neg logit contributions
ho-0.014
sed-0.013
aped-0.013
ks-0.013
ed-0.012
Tokenashing
Token ID2140
Feature activation+9.85e-03

Pos logit contributions
lf+0.035
 cartoons+0.031
 spa+0.030
 mechanic+0.030
 product+0.030
Neg logit contributions
ho-0.023
sed-0.020
ses-0.018
aped-0.018
ed-0.017
Token water
Token ID1660
Feature activation+3.27e-03

Pos logit contributions
lf+0.016
 cartoons+0.015
ilts+0.014
 bath+0.014
ruce+0.014
Neg logit contributions
sed-0.017
ho-0.016
ses-0.016
aped-0.013
ks-0.013
Token and
Token ID290
Feature activation-1.79e-03

Pos logit contributions
lf+0.005
ilts+0.005
 cartoons+0.005
 spa+0.005
ruce+0.004
Neg logit contributions
ho-0.006
sed-0.005
ses-0.005
aped-0.004
 hope-0.004
Token making
Token ID1642
Feature activation+1.03e-02

Pos logit contributions
ho+0.003
sed+0.003
ses+0.003
rified+0.003
ks+0.003
Neg logit contributions
lf-0.002
 cartoons-0.002
ilts-0.002
 spa-0.002
 bath-0.002
Token and
Token ID290
Feature activation-1.35e-02

Pos logit contributions
ho+0.006
sed+0.006
ses+0.005
aped+0.005
ks+0.005
Neg logit contributions
lf-0.006
ilts-0.005
 cartoons-0.005
 spa-0.005
ruce-0.005
Token wanted
Token ID2227
Feature activation-3.34e-02

Pos logit contributions
ho+0.021
sed+0.020
ses+0.019
rified+0.017
ks+0.016
Neg logit contributions
lf-0.023
ilts-0.021
 cartoons-0.020
 pack-0.020
 spa-0.020
Token to
Token ID284
Feature activation-1.34e-02

Pos logit contributions
ho+0.066
sed+0.063
ses+0.060
aped+0.053
rified+0.053
Neg logit contributions
lf-0.045
 cartoons-0.040
ilts-0.040
 spa-0.039
 bath-0.038
Token take
Token ID1011
Feature activation-1.09e-02

Pos logit contributions
ho+0.018
sed+0.016
 ear+0.016
 hope+0.016
ses+0.015
Neg logit contributions
lf-0.025
arre-0.025
ilts-0.024
 spa-0.024
 films-0.023
Token it
Token ID340
Feature activation-1.25e-02

Pos logit contributions
ho+0.022
sed+0.021
ses+0.020
aped+0.017
rified+0.017
Neg logit contributions
lf-0.014
ilts-0.013
 cartoons-0.013
 spa-0.013
 bath-0.012
Token home
Token ID1363
Feature activation+4.36e-03

Pos logit contributions
ho+0.015
sed+0.013
 ear+0.012
ses+0.012
 hope+0.012
Neg logit contributions
lf-0.025
 spa-0.024
ilts-0.023
 cartoons-0.023
 films-0.023
Token.
Token ID13
Feature activation-4.02e-03

Pos logit contributions
lf+0.009
ilts+0.009
arre+0.009
 spa+0.009
 cartoons+0.009
Neg logit contributions
ho-0.005
ks-0.003
sed-0.003
 hope-0.003
 ear-0.003
Token
Token ID198
Feature activation+2.68e-04

Pos logit contributions
ho+0.007
sed+0.007
ses+0.007
ks+0.006
aped+0.006
Neg logit contributions
lf-0.006
ilts-0.005
 cartoons-0.005
 spa-0.005
During-0.005
Token
Token ID198
Feature activation+8.60e-03

Pos logit contributions
lf+0.000
ilts+0.000
 cartoons+0.000
 spa+0.000
ruce+0.000
Neg logit contributions
ho-0.000
sed-0.000
ses-0.000
ks-0.000
aped-0.000
TokenThe
Token ID464
Feature activation+3.01e-03

Pos logit contributions
 cartoons+0.013
ingu+0.012
ilts+0.012
lf+0.011
 models+0.011
Neg logit contributions
ho-0.015
OK-0.013
aped-0.013
Deal-0.013
sed-0.012
Token mom
Token ID1995
Feature activation+1.84e-02

Pos logit contributions
lf+0.004
 cartoons+0.003
 pack+0.003
 mechanic+0.003
 spa+0.003
Neg logit contributions
ho-0.006
sed-0.006
ses-0.006
ks-0.006
rified-0.006
Token ran
Token ID4966
Feature activation-1.30e-02

Pos logit contributions
lf+0.008
 pack+0.007
During+0.007
 spa+0.007
 mechanic+0.007
Neg logit contributions
ho-0.010
ses-0.010
sed-0.010
rified-0.008
 Hello-0.008
Token up
Token ID510
Feature activation-5.07e-03

Pos logit contributions
ho+0.022
sed+0.021
ses+0.020
aped+0.017
ks+0.017
Neg logit contributions
lf-0.021
ilts-0.019
 cartoons-0.019
 spa-0.019
ruce-0.018
Token the
Token ID262
Feature activation+6.74e-03

Pos logit contributions
ho+0.010
sed+0.010
ses+0.009
aped+0.008
rified+0.008
Neg logit contributions
lf-0.007
 cartoons-0.006
 spa-0.006
ilts-0.006
ruce-0.005
Token hill
Token ID12788
Feature activation-2.18e-02

Pos logit contributions
lf+0.007
 cartoons+0.005
 pack+0.005
 spa+0.005
 mechanic+0.005
Neg logit contributions
ho-0.015
sed-0.015
ses-0.015
ks-0.014
ked-0.014
Token and
Token ID290
Feature activation-1.24e-02

Pos logit contributions
sed+0.047
ses+0.045
ho+0.044
 Hello+0.039
 ear+0.039
Neg logit contributions
lf-0.024
ilts-0.019
 pack-0.019
 opener-0.019
 cartoons-0.019
Token stopped
Token ID5025
Feature activation-9.95e-03

Pos logit contributions
sed+0.023
ses+0.022
ho+0.022
Deal+0.019
 de+0.018
Neg logit contributions
lf-0.017
 pack-0.016
During-0.015
 spa-0.015
 organize-0.015
Token when
Token ID618
Feature activation-4.30e-03

Pos logit contributions
sed+0.021
ses+0.020
ho+0.020
rified+0.018
Deal+0.018
Neg logit contributions
lf-0.010
 cartoons-0.010
 pack-0.009
 bath-0.009
 organize-0.009
Token she
Token ID673
Feature activation+4.30e-03

Pos logit contributions
sed+0.010
ses+0.010
ho+0.010
 pad+0.009
ked+0.009
Neg logit contributions
lf-0.004
 cartoons-0.003
 spa-0.003
 chores-0.003
 bath-0.003
Token reached
Token ID4251
Feature activation+5.08e-03

Pos logit contributions
lf+0.007
 pack+0.005
ilts+0.005
During+0.005
 bath+0.005
Neg logit contributions
ses-0.008
sed-0.008
ho-0.008
Deal-0.007
 ear-0.006
Token the
Token ID262
Feature activation-6.07e-03

Pos logit contributions
lf+0.005
 cartoons+0.005
 pack+0.005
 spa+0.005
ilts+0.005
Neg logit contributions
ho-0.012
sed-0.011
ses-0.010
Deal-0.009
rified-0.009
Token top
Token ID1353
Feature activation-2.47e-02

Pos logit contributions
ho+0.015
sed+0.014
ses+0.014
ks+0.014
ked+0.013
Neg logit contributions
 mechanic-0.005
 battery-0.005
lf-0.005
 doctor-0.004
 pack-0.004
Token found
Token ID1043
Feature activation+4.60e-03

Pos logit contributions
ho+0.005
sed+0.005
ses+0.005
aped+0.004
ks+0.004
Neg logit contributions
lf-0.006
ilts-0.005
 spa-0.005
 cartoons-0.005
During-0.005
Token a
Token ID257
Feature activation-6.98e-03

Pos logit contributions
 bath+0.003
 doctor+0.003
lf+0.003
 spa+0.002
 doctors+0.002
Neg logit contributions
ho-0.012
sed-0.011
ses-0.011
Deal-0.010
rav-0.010
Token sandy
Token ID44039
Feature activation+2.29e-02

Pos logit contributions
ho+0.015
sed+0.014
ses+0.013
 hope+0.012
ks+0.012
Neg logit contributions
lf-0.008
ilts-0.008
 cartoons-0.007
ruce-0.007
lore-0.007
Token beach
Token ID10481
Feature activation-3.02e-03

Pos logit contributions
ilts+0.029
 cartoons+0.029
lf+0.028
During+0.027
ruce+0.027
Neg logit contributions
ho-0.045
sed-0.041
ses-0.037
 ear-0.034
 hope-0.033
Token.
Token ID13
Feature activation-7.35e-03

Pos logit contributions
ho+0.005
sed+0.005
ses+0.005
aped+0.004
rified+0.004
Neg logit contributions
lf-0.005
ilts-0.004
 cartoons-0.004
 spa-0.004
 pack-0.004
Token She
Token ID1375
Feature activation+1.50e-03

Pos logit contributions
ho+0.014
sed+0.014
ses+0.013
ks+0.012
 hope+0.012
Neg logit contributions
lf-0.009
ilts-0.009
 spa-0.009
During-0.009
 cartoons-0.009
Token was
Token ID373
Feature activation-1.71e-02

Pos logit contributions
lf+0.003
ilts+0.002
 cartoons+0.002
 spa+0.002
During+0.002
Neg logit contributions
ho-0.002
sed-0.002
ses-0.002
rified-0.002
ks-0.002
Token so
Token ID523
Feature activation-4.95e-03

Pos logit contributions
ho+0.028
sed+0.027
ses+0.025
ks+0.022
 hope+0.022
Neg logit contributions
lf-0.029
ilts-0.027
 cartoons-0.026
 spa-0.025
ruce-0.025
Token excited
Token ID6568
Feature activation-1.22e-02

Pos logit contributions
ho+0.008
sed+0.008
ses+0.007
aped+0.006
ks+0.006
Neg logit contributions
lf-0.008
ilts-0.008
 cartoons-0.008
 spa-0.007
ruce-0.007
Token to
Token ID284
Feature activation-2.53e-02

Pos logit contributions
ho+0.022
sed+0.021
ses+0.020
aped+0.018
ks+0.018
Neg logit contributions
lf-0.018
ilts-0.017
 cartoons-0.016
 spa-0.016
ruce-0.015
Token be
Token ID307
Feature activation-1.52e-02

Pos logit contributions
ho+0.043
sed+0.040
ses+0.038
 hope+0.035
 ear+0.035
Neg logit contributions
lf-0.040
ilts-0.038
 cartoons-0.036
 spa-0.036
ruce-0.034
Token brown
Token ID7586
Feature activation-3.62e-03

Pos logit contributions
ho+0.048
sed+0.047
ses+0.044
aped+0.041
ks+0.040
Neg logit contributions
lf-0.029
 cartoons-0.025
ilts-0.025
 spa-0.025
ruce-0.024
Token mushroom
Token ID28520
Feature activation+2.40e-04

Pos logit contributions
ho+0.007
sed+0.007
ses+0.006
ks+0.006
aped+0.006
Neg logit contributions
lf-0.005
ilts-0.004
 cartoons-0.004
 spa-0.004
ruce-0.004
Token.
Token ID13
Feature activation+3.03e-03

Pos logit contributions
lf+0.000
ilts+0.000
 spa+0.000
 cartoons+0.000
ruce+0.000
Neg logit contributions
ho-0.000
sed-0.000
ses-0.000
 hope-0.000
ks-0.000
Token It
Token ID632
Feature activation-1.11e-02

Pos logit contributions
lf+0.006
ilts+0.005
 cartoons+0.005
 spa+0.005
ruce+0.005
Neg logit contributions
ho-0.004
sed-0.004
ses-0.004
aped-0.003
 ear-0.003
Token was
Token ID373
Feature activation-1.74e-02

Pos logit contributions
ho+0.018
sed+0.017
ses+0.016
ks+0.014
aped+0.014
Neg logit contributions
lf-0.019
ilts-0.017
 cartoons-0.017
 spa-0.017
ruce-0.016
Token so
Token ID523
Feature activation-7.95e-03

Pos logit contributions
ho+0.029
sed+0.028
ses+0.026
 ear+0.023
 hope+0.022
Neg logit contributions
lf-0.028
ilts-0.027
 cartoons-0.026
 spa-0.025
lore-0.025
Token pretty
Token ID2495
Feature activation-9.91e-03

Pos logit contributions
ho+0.017
sed+0.017
ses+0.016
ks+0.015
aped+0.015
Neg logit contributions
lf-0.009
ilts-0.008
 cartoons-0.007
 spa-0.007
ruce-0.007
Token and
Token ID290
Feature activation-1.56e-02

Pos logit contributions
ho+0.019
sed+0.019
ses+0.018
ks+0.016
aped+0.016
Neg logit contributions
lf-0.013
ilts-0.012
 cartoons-0.011
 spa-0.011
ruce-0.011
Token she
Token ID673
Feature activation-5.30e-04

Pos logit contributions
ho+0.032
sed+0.032
ses+0.031
ks+0.028
aped+0.028
Neg logit contributions
lf-0.021
 cartoons-0.016
During-0.015
ruce-0.015
 spa-0.015
Token wanted
Token ID2227
Feature activation-3.75e-02

Pos logit contributions
ho+0.001
sed+0.001
ses+0.001
aped+0.000
 ear+0.000
Neg logit contributions
ilts-0.001
lf-0.001
 cartoons-0.001
 spa-0.001
 films-0.001
Token to
Token ID284
Feature activation-1.71e-02

Pos logit contributions
ho+0.077
sed+0.075
ses+0.072
aped+0.065
rified+0.064
Neg logit contributions
lf-0.046
 cartoons-0.041
ilts-0.040
 bath-0.040
ruce-0.038
Token smaller
Token ID4833
Feature activation-2.70e-03

Pos logit contributions
ho+0.006
sed+0.006
ses+0.006
ks+0.006
rified+0.006
Neg logit contributions
lf-0.003
 cartoons-0.003
 bath-0.002
 spa-0.002
During-0.002
Token.
Token ID13
Feature activation+7.11e-03

Pos logit contributions
sed+0.006
ho+0.006
ses+0.005
osh+0.005
Deal+0.005
Neg logit contributions
lf-0.002
 cartoons-0.002
 opener-0.002
uts-0.002
 mess-0.002
Token She
Token ID1375
Feature activation+6.92e-03

Pos logit contributions
lf+0.009
 cartoons+0.008
ilts+0.008
 spa+0.008
lore+0.008
Neg logit contributions
ho-0.014
sed-0.013
ses-0.013
ks-0.012
 hope-0.012
Token looked
Token ID3114
Feature activation-3.09e-03

Pos logit contributions
lf+0.012
ilts+0.010
During+0.010
 cartoons+0.010
lore+0.010
Neg logit contributions
ho-0.011
sed-0.011
ses-0.011
Deal-0.009
ks-0.009
Token at
Token ID379
Feature activation-3.12e-03

Pos logit contributions
ho+0.006
sed+0.005
ses+0.005
aped+0.005
Deal+0.004
Neg logit contributions
 cartoons-0.004
lf-0.004
lore-0.004
ruce-0.004
ilts-0.004
Token her
Token ID607
Feature activation-2.08e-03

Pos logit contributions
ho+0.007
sed+0.007
ses+0.007
aped+0.006
Deal+0.006
Neg logit contributions
lf-0.003
 cartoons-0.003
 bath-0.003
 carpet-0.002
ilts-0.002
Token hand
Token ID1021
Feature activation-5.98e-03

Pos logit contributions
ho+0.004
sed+0.004
ses+0.004
Deal+0.003
rified+0.003
Neg logit contributions
lf-0.003
 bath-0.002
ilts-0.002
 cartoons-0.002
 pack-0.002
Token and
Token ID290
Feature activation-1.09e-02

Pos logit contributions
ses+0.013
sed+0.013
ho+0.013
Deal+0.012
 hope+0.012
Neg logit contributions
lf-0.005
 cartoons-0.004
 bath-0.004
 pack-0.004
 spa-0.004
Token saw
Token ID2497
Feature activation-3.52e-03

Pos logit contributions
ses+0.021
sed+0.021
ho+0.020
rav+0.018
Deal+0.018
Neg logit contributions
lf-0.014
During-0.011
lore-0.011
 mess-0.011
 pack-0.011
Token that
Token ID326
Feature activation+8.87e-03

Pos logit contributions
ho+0.008
sed+0.008
ses+0.008
aped+0.007
Deal+0.007
Neg logit contributions
lf-0.003
 cartoons-0.003
 bath-0.003
ruce-0.003
ilts-0.003
Token the
Token ID262
Feature activation-7.08e-03

Pos logit contributions
lf+0.012
 cartoons+0.011
ilts+0.011
 spa+0.011
 bath+0.010
Neg logit contributions
ho-0.017
sed-0.017
ses-0.016
aped-0.014
ks-0.014
Token was
Token ID373
Feature activation+2.48e-02

Pos logit contributions
sed+0.047
ses+0.046
ho+0.044
ks+0.040
ked+0.039
Neg logit contributions
lf-0.025
 bath-0.020
omo-0.018
During-0.018
 pack-0.018
Token a
Token ID257
Feature activation+1.24e-02

Pos logit contributions
lf+0.027
 cartoons+0.026
 bath+0.025
ilts+0.025
 films+0.024
Neg logit contributions
ho-0.054
sed-0.051
ses-0.048
ks-0.044
rified-0.043
Token little
Token ID1310
Feature activation+2.02e-03

Pos logit contributions
lf+0.012
 cartoons+0.010
ilts+0.010
 spa+0.010
During+0.009
Neg logit contributions
ho-0.029
sed-0.028
ses-0.027
aped-0.025
ks-0.025
Token bird
Token ID6512
Feature activation-1.46e-02

Pos logit contributions
lf+0.003
ilts+0.003
 cartoons+0.003
 spa+0.003
 bath+0.003
Neg logit contributions
ho-0.004
sed-0.003
ses-0.003
ks-0.003
aped-0.003
Token named
Token ID3706
Feature activation+1.11e-03

Pos logit contributions
ho+0.026
sed+0.025
ses+0.024
aped+0.021
 hope+0.021
Neg logit contributions
lf-0.022
ilts-0.020
 cartoons-0.019
 spa-0.019
During-0.018
Token Jack
Token ID3619
Feature activation-1.44e-02

Pos logit contributions
lf+0.001
ilts+0.001
 cartoons+0.001
During+0.001
 bath+0.001
Neg logit contributions
ho-0.002
sed-0.002
ses-0.002
ks-0.002
aped-0.002
Token.
Token ID13
Feature activation+1.40e-02

Pos logit contributions
sed+0.031
ses+0.029
ho+0.029
aped+0.026
rified+0.025
Neg logit contributions
lf-0.017
ilts-0.015
 pack-0.015
iver-0.014
 bath-0.014
Token He
Token ID679
Feature activation-5.55e-03

Pos logit contributions
lf+0.022
ilts+0.021
 spa+0.020
 cartoons+0.020
ruce+0.019
Neg logit contributions
ho-0.024
sed-0.022
ses-0.021
aped-0.019
 hope-0.018
Token was
Token ID373
Feature activation-1.29e-02

Pos logit contributions
ho+0.009
sed+0.009
ses+0.008
rified+0.007
ks+0.007
Neg logit contributions
lf-0.009
During-0.008
ilts-0.008
 bath-0.008
 cartoons-0.007
Token singing
Token ID13777
Feature activation-9.11e-03

Pos logit contributions
ho+0.021
sed+0.020
ses+0.019
aped+0.017
ks+0.017
Neg logit contributions
lf-0.022
ilts-0.020
 cartoons-0.019
 spa-0.019
ruce-0.019
Token and
Token ID290
Feature activation-2.00e-02

Pos logit contributions
ho+0.017
sed+0.016
ses+0.015
ks+0.013
aped+0.013
Neg logit contributions
lf-0.013
ilts-0.013
 cartoons-0.012
 spa-0.012
ruce-0.012
Token"
Token ID1
Feature activation-4.13e-03

Pos logit contributions
lf+0.000
ilts+0.000
 cartoons+0.000
 spa+0.000
ruce+0.000
Neg logit contributions
ho-0.001
sed-0.001
ses-0.001
aped-0.001
ks-0.001
TokenSam
Token ID16305
Feature activation-1.02e-02

Pos logit contributions
sed+0.008
ho+0.008
ses+0.008
 ear+0.006
 de+0.006
Neg logit contributions
lf-0.006
During-0.005
 spa-0.005
ruce-0.005
 cartoons-0.005
Token,
Token ID11
Feature activation-1.56e-02

Pos logit contributions
ho+0.023
sed+0.023
ses+0.022
aped+0.020
ks+0.020
Neg logit contributions
lf-0.010
ilts-0.008
 pack-0.007
 mess-0.007
 cartoons-0.007
Token Mia
Token ID32189
Feature activation-7.83e-03

Pos logit contributions
ho+0.028
sed+0.027
ses+0.026
aped+0.024
ks+0.023
Neg logit contributions
lf-0.023
 cartoons-0.020
 bath-0.020
 pack-0.020
 organize-0.019
Token,
Token ID11
Feature activation-1.84e-02

Pos logit contributions
ho+0.018
sed+0.018
ses+0.017
aped+0.016
ked+0.016
Neg logit contributions
lf-0.008
ilts-0.006
 pack-0.006
?"-0.005
 mess-0.005
Token where
Token ID810
Feature activation-3.87e-02

Pos logit contributions
ho+0.034
sed+0.033
ses+0.031
aped+0.028
ks+0.028
Neg logit contributions
lf-0.026
 cartoons-0.023
 pack-0.023
 bath-0.023
 organize-0.023
Token are
Token ID389
Feature activation+3.83e-02

Pos logit contributions
ho+0.038
sed+0.034
ses+0.031
 hope+0.027
ks+0.026
Neg logit contributions
lf-0.088
ilts-0.085
 spa-0.082
 cartoons-0.081
lore-0.080
Token you
Token ID345
Feature activation+5.67e-03

Pos logit contributions
lf+0.074
ilts+0.073
iver+0.070
 spa+0.069
 cartoons+0.067
Neg logit contributions
ho-0.052
sed-0.047
ses-0.042
aped-0.036
 hope-0.036
Token?"
Token ID1701
Feature activation-2.71e-03

Pos logit contributions
lf+0.006
ilts+0.005
During+0.005
lore+0.005
 cartoons+0.004
Neg logit contributions
ho-0.013
sed-0.012
ses-0.011
 hope-0.011
 ear-0.011
Token she
Token ID673
Feature activation+1.24e-02

Pos logit contributions
ho+0.006
sed+0.006
ses+0.006
 pad+0.005
rified+0.005
Neg logit contributions
lf-0.003
 cartoons-0.003
 bath-0.002
ilts-0.002
 spa-0.002
Token calls
Token ID3848
Feature activation-1.21e-02

Pos logit contributions
lf+0.023
ilts+0.021
 cartoons+0.020
 spa+0.020
ruce+0.020
Neg logit contributions
ho-0.018
sed-0.017
ses-0.016
ks-0.014
 ear-0.014
Token so
Token ID523
Feature activation-6.28e-03

Pos logit contributions
ho+0.035
sed+0.033
ses+0.032
aped+0.029
ks+0.028
Neg logit contributions
lf-0.025
ilts-0.023
 cartoons-0.022
 spa-0.022
ruce-0.021
Token spark
Token ID9009
Feature activation+1.41e-03

Pos logit contributions
ho+0.015
sed+0.014
ses+0.014
rified+0.013
 hope+0.013
Neg logit contributions
lf-0.006
ilts-0.005
ruce-0.005
During-0.004
 cartoons-0.004
Tokenly
Token ID306
Feature activation-2.07e-02

Pos logit contributions
lf+0.002
ilts+0.002
 cartoons+0.002
ruce+0.002
lore+0.002
Neg logit contributions
sed-0.002
ho-0.002
ses-0.002
 ear-0.002
 hope-0.002
Token and
Token ID290
Feature activation-1.28e-02

Pos logit contributions
ho+0.046
sed+0.046
ses+0.045
Deal+0.043
rified+0.041
Neg logit contributions
lf-0.017
 cartoons-0.013
 bath-0.013
ilts-0.012
 mess-0.012
Token high
Token ID1029
Feature activation-3.37e-02

Pos logit contributions
ho+0.027
sed+0.026
ses+0.025
rified+0.024
Deal+0.023
Neg logit contributions
lf-0.017
 cartoons-0.013
 carpet-0.013
ilts-0.012
ruce-0.012
Token in
Token ID287
Feature activation-4.57e-02

Pos logit contributions
ho+0.064
sed+0.063
ses+0.061
ks+0.054
 ear+0.053
Neg logit contributions
lf-0.043
 cartoons-0.037
ilts-0.036
lore-0.035
ruce-0.035
Token the
Token ID262
Feature activation-4.25e-02

Pos logit contributions
sed+0.107
ho+0.104
ses+0.100
Deal+0.091
aped+0.090
Neg logit contributions
lf-0.050
 cartoons-0.041
ilts-0.040
 bath-0.040
 films-0.039
Token sky
Token ID6766
Feature activation-3.79e-02

Pos logit contributions
sed+0.090
ho+0.089
ses+0.086
ks+0.081
aped+0.080
Neg logit contributions
lf-0.048
 bath-0.048
 carpet-0.045
 pack-0.042
 battery-0.042
Token.
Token ID13
Feature activation-7.86e-03

Pos logit contributions
sed+0.074
ho+0.073
ses+0.072
Deal+0.063
rified+0.061
Neg logit contributions
lf-0.051
ilts-0.044
 cartoons-0.042
 bath-0.041
 product-0.041
Token They
Token ID1119
Feature activation+1.32e-02

Pos logit contributions
ho+0.015
sed+0.014
ses+0.013
 hope+0.012
aped+0.012
Neg logit contributions
lf-0.011
ilts-0.010
 cartoons-0.010
During-0.010
 spa-0.010
Token wanted
Token ID2227
Feature activation-3.21e-02

Pos logit contributions
lf+0.024
ilts+0.021
 cartoons+0.021
During+0.021
ruce+0.021
Neg logit contributions
ho-0.019
sed-0.019
ses-0.018
ks-0.015
aped-0.015
Token box
Token ID3091
Feature activation-2.78e-03

Pos logit contributions
lf+0.013
 pack+0.012
 cartoons+0.012
 bath+0.011
 mechanic+0.011
Neg logit contributions
ho-0.036
sed-0.035
ses-0.034
aped-0.033
rified-0.032
Token.
Token ID13
Feature activation-5.46e-03

Pos logit contributions
ho+0.004
sed+0.004
ses+0.004
 hope+0.003
ks+0.003
Neg logit contributions
lf-0.005
ilts-0.004
 cartoons-0.004
 spa-0.004
ruce-0.004
Token Sam
Token ID3409
Feature activation-9.18e-03

Pos logit contributions
ho+0.008
sed+0.007
ses+0.007
aped+0.006
ks+0.006
Neg logit contributions
lf-0.010
ilts-0.009
 cartoons-0.009
 spa-0.009
 mechanic-0.009
Token,
Token ID11
Feature activation-2.70e-02

Pos logit contributions
ho+0.013
sed+0.013
ses+0.012
aped+0.010
ks+0.010
Neg logit contributions
lf-0.017
ilts-0.016
 cartoons-0.015
 spa-0.015
ruce-0.015
Token you
Token ID345
Feature activation-2.48e-02

Pos logit contributions
ho+0.049
sed+0.048
ses+0.045
aped+0.042
rified+0.040
Neg logit contributions
lf-0.039
 cartoons-0.036
 pack-0.035
 bath-0.034
 spa-0.034
Token come
Token ID1282
Feature activation-3.73e-02

Pos logit contributions
ho+0.031
sed+0.029
ses+0.026
 hope+0.025
ks+0.024
Neg logit contributions
lf-0.050
 cartoons-0.047
ilts-0.047
 spa-0.046
lore-0.045
Token with
Token ID351
Feature activation-1.12e-02

Pos logit contributions
ho+0.061
sed+0.058
ses+0.054
aped+0.047
ks+0.047
Neg logit contributions
lf-0.062
ilts-0.058
 cartoons-0.056
 spa-0.055
ruce-0.054
Token me
Token ID502
Feature activation-2.46e-02

Pos logit contributions
ho+0.024
sed+0.022
ses+0.021
aped+0.020
rified+0.019
Neg logit contributions
lf-0.014
 cartoons-0.012
ilts-0.012
 spa-0.012
 bath-0.012
Token.
Token ID13
Feature activation-4.57e-03

Pos logit contributions
ho+0.045
sed+0.044
ses+0.041
aped+0.037
rified+0.035
Neg logit contributions
lf-0.036
ilts-0.033
 cartoons-0.032
 spa-0.031
 pack-0.030
Token We
Token ID775
Feature activation-8.70e-04

Pos logit contributions
ho+0.007
sed+0.006
ses+0.006
aped+0.005
ks+0.005
Neg logit contributions
lf-0.008
 cartoons-0.008
ilts-0.008
 spa-0.008
 mechanic-0.007
Token have
Token ID423
Feature activation+8.02e-03

Pos logit contributions
ho+0.001
sed+0.001
ses+0.001
 hope+0.001
ks+0.001
Neg logit contributions
lf-0.002
ilts-0.002
 cartoons-0.002
 spa-0.002
lore-0.001
Token and
Token ID290
Feature activation-2.30e-02

Pos logit contributions
ho+0.062
sed+0.057
ses+0.054
Deal+0.052
aped+0.050
Neg logit contributions
lf-0.030
 cartoons-0.027
ilts-0.027
 spa-0.026
 pack-0.025
Token a
Token ID257
Feature activation-3.95e-02

Pos logit contributions
ho+0.051
sed+0.048
ses+0.046
Deal+0.042
rified+0.042
Neg logit contributions
lf-0.025
 cartoons-0.023
ilts-0.021
 spa-0.020
 films-0.020
Token k
Token ID479
Feature activation-2.96e-02

Pos logit contributions
ho+0.081
sed+0.075
ses+0.073
aped+0.071
rified+0.070
Neg logit contributions
lf-0.046
 cartoons-0.042
 bath-0.042
ilts-0.041
 spa-0.040
Tokenang
Token ID648
Feature activation-3.27e-02

Pos logit contributions
sed+0.051
ho+0.050
ses+0.049
Deal+0.047
 hope+0.045
Neg logit contributions
ilts-0.036
lf-0.036
 bath-0.035
 battery-0.034
 opener-0.033
Tokenaroo
Token ID38049
Feature activation-1.93e-02

Pos logit contributions
ses+0.061
sed+0.061
Deal+0.057
ho+0.055
OK+0.051
Neg logit contributions
lf-0.040
?"-0.031
 mechanic-0.030
't-0.029
 pack-0.029
Token with
Token ID351
Feature activation-4.55e-02

Pos logit contributions
ho+0.035
sed+0.033
ses+0.033
Deal+0.030
rified+0.029
Neg logit contributions
lf-0.027
ilts-0.025
 cartoons-0.024
 bath-0.023
 pack-0.022
Token pol
Token ID755
Feature activation+4.44e-03

Pos logit contributions
ho+0.103
sed+0.100
ses+0.097
Deal+0.093
rified+0.084
Neg logit contributions
lf-0.043
ilts-0.038
 cartoons-0.038
 films-0.036
 bath-0.035
Tokenka
Token ID4914
Feature activation-3.24e-02

Pos logit contributions
lf+0.007
ilts+0.007
 cartoons+0.007
 spa+0.007
ruce+0.007
Neg logit contributions
ho-0.007
sed-0.007
ses-0.006
aped-0.006
ks-0.005
Token dotted
Token ID38745
Feature activation-1.23e-02

Pos logit contributions
ho+0.049
sed+0.046
ses+0.043
ks+0.038
aped+0.037
Neg logit contributions
lf-0.057
ilts-0.053
 spa-0.053
 cartoons-0.052
ruce-0.051
Token feet
Token ID3625
Feature activation-1.53e-02

Pos logit contributions
ho+0.022
sed+0.021
ses+0.020
aped+0.018
 hope+0.017
Neg logit contributions
lf-0.019
ilts-0.017
 cartoons-0.017
 spa-0.016
ruce-0.016
Token.
Token ID13
Feature activation-6.70e-03

Pos logit contributions
ho+0.026
sed+0.025
ses+0.023
aped+0.020
 hope+0.020
Neg logit contributions
lf-0.024
ilts-0.023
 cartoons-0.022
 spa-0.021
ruce-0.021
Token.
Token ID13
Feature activation-2.59e-03

Pos logit contributions
ho+0.005
sed+0.005
ses+0.005
aped+0.004
 hope+0.004
Neg logit contributions
lf-0.004
 cartoons-0.004
ilts-0.004
 spa-0.003
ruce-0.003
Token She
Token ID1375
Feature activation-3.49e-03

Pos logit contributions
ho+0.005
sed+0.005
ses+0.005
ks+0.004
 hope+0.004
Neg logit contributions
lf-0.003
 spa-0.003
ilts-0.003
 cartoons-0.003
During-0.003
Token said
Token ID531
Feature activation-1.12e-02

Pos logit contributions
ho+0.006
sed+0.005
ses+0.005
rified+0.005
aped+0.005
Neg logit contributions
lf-0.006
ilts-0.005
During-0.005
 pack-0.005
 spa-0.005
Token,
Token ID11
Feature activation-1.26e-02

Pos logit contributions
ses+0.024
sed+0.024
ho+0.023
rified+0.021
Deal+0.021
Neg logit contributions
lf-0.011
 cartoons-0.009
 spa-0.009
 bath-0.009
ilts-0.009
Token "
Token ID366
Feature activation-1.62e-02

Pos logit contributions
ho+0.024
sed+0.024
ses+0.023
rified+0.021
 hope+0.021
Neg logit contributions
lf-0.016
 cartoons-0.015
 spa-0.014
During-0.014
ilts-0.014
TokenYes
Token ID5297
Feature activation-4.13e-02

Pos logit contributions
sed+0.029
ses+0.029
ho+0.026
ks+0.022
rified+0.021
Neg logit contributions
 spa-0.025
During-0.025
lf-0.023
On-0.023
lore-0.023
Token!"
Token ID2474
Feature activation+5.81e-04

Pos logit contributions
ho+0.082
sed+0.079
ses+0.077
ks+0.069
aped+0.069
Neg logit contributions
lf-0.054
 cartoons-0.049
ilts-0.048
 spa-0.046
 bath-0.045
Token\n
Token ID198
Feature activation+8.47e-03

Pos logit contributions
lf+0.001
 cartoons+0.001
ilts+0.001
 spa+0.001
 bath+0.001
Neg logit contributions
ho-0.001
sed-0.001
ses-0.001
aped-0.001
ks-0.001
Token\n
Token ID198
Feature activation+1.75e-02

Pos logit contributions
lf+0.013
ilts+0.013
 cartoons+0.012
 spa+0.012
lore+0.012
Neg logit contributions
ho-0.014
sed-0.013
ses-0.012
ks-0.011
aped-0.011
TokenSo
Token ID2396
Feature activation-6.54e-04

Pos logit contributions
 cartoons+0.026
ilts+0.026
 homework+0.025
lf+0.025
 restaurants+0.024
Neg logit contributions
ho-0.029
aped-0.023
sed-0.022
OK-0.022
 Hello-0.021
Token Mom
Token ID11254
Feature activation+8.38e-03

Pos logit contributions
ho+0.001
sed+0.001
ses+0.001
aped+0.001
ks+0.001
Neg logit contributions
lf-0.001
ilts-0.001
 cartoons-0.001
 spa-0.001
ruce-0.001