TOP ACTIVATIONS
MAX = 0.072

to sleep . When it woke up , it felt much
fell asleep . When he woke up , he was back
his bed . When she woke up , she saw that
hot sun . When he woke up , all his candy
a nap and when she woke up , her grandma was
her cube . When she woke up , she felt much
fun . <|endoftext|> L ily woke up and felt hungry .
very tired . When she woke up , she played with
. When she woke up , it was a
knee . When Tim my woke up , he felt much
asleep fast . When she woke up , she felt much
adventure ever . When he woke up in the morning ,
with food . When she woke up , Fl uffy was
a nap . When she woke up , her mom my
the rope and the dog woke up , but was too
fell asleep . When it woke up , the rain had
slept peacefully . When she woke up , she said to
a nap . When she woke up , she saw that
And when Emily woke up in the morning ,
decided to play pirates again after lunch . But this time

MIN ACTIVATIONS
MIN = -0.075

in the scary forest " Just then ,
called him " Bob the Explorer " and all of his
" What are you doing ?" The tube didn 't
will understand you better " They sk ated
jealous , because the ch ame leon was very colorful and
is mine !" The ch ame leon could tell he was
pile of paw prints " L ucky smiled
Sara says . She reaches out her hand to touch the
Anna said . She reached out her hand to touch the
Lily says . She reaches out her arm to touch the
. John reached out and tried to stretch it
sharp object on the ground below . Curious , he quickly
. Bob pulled out his finger and slowly scratched
S ara reached out her finger and gently stro
's take a break " But Tim was determined
the shelf . She reached out and picked it up .
?" Joe took out a large can from his
. She reached out and touched it . It
The customer picked out two pairs of sunglasses .
with him . The ch ame leon said yes and the

INTERVAL 0.035 - 0.072
CONTAINS 0.344%

lighter . They made more mud pies and had fun .
Everyone was happy , until one night . M omm
my !" She drank the hot milk and felt warm inside
. <|endoftext|> S ara was sick . She had a fever
mud and dust and bird poop on it . Tim and

INTERVAL -0.002 - 0.035
CONTAINS 55.592%

go . The woman smiled and went back to
play outside and look for pretty rocks . One day ,
apa explained that the coal was for the fireplace . He
ant and wanted to grab the apple . She climbed the
day ." L ily felt happy and continued

INTERVAL -0.038 - -0.002
CONTAINS 43.849%

see it . She hopes the rabbit is safe and finds
scared of the dark . He would always remember the oy
. Mom my found the key and restored it to Lily
did not want to be apart . They thought and thought
again . Tom asked why and his mom told him that

INTERVAL -0.075 - -0.038
CONTAINS 0.215%

Yes , they are ost ric hes ," mom said .
. They hugged each other and went back to
's soar together to the top of the mountain !" But
her best , she could win anything she wanted . <|endoftext|>
walk in the woods . The forest was
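
NOTE ON INTERVALS
The four INTERVAL ranges above look like equal-width slices of the range from MIN (-0.075) to MAX (0.072), and the four CONTAINS percentages sum to 100%. The sketch below checks that reading. It assumes equal-width binning; the dashboard's own binning code is not shown in this dump, and the function names `bin_edges` and `bin_shares` are hypothetical.

```python
# Sketch only: assumes the dashboard bins activations into n equal-width
# intervals between the reported MIN and MAX (not confirmed by the source).

def bin_edges(lo, hi, n=4):
    """Return the n + 1 edges of n equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n + 1)]


def bin_shares(values, lo, hi, n=4):
    """Return the percentage of values that falls into each bin."""
    width = (hi - lo) / n
    counts = [0] * n
    for v in values:
        # Clamp the index so that values at or beyond the ends of the range
        # land in the first or last bin.
        i = min(max(int((v - lo) / width), 0), n - 1)
        counts[i] += 1
    return [100.0 * c / len(values) for c in counts]


# Edges implied by the reported MIN and MAX, rounded for display.
edges = bin_edges(-0.075, 0.072)
print([round(e, 4) for e in edges])

# Percentages as reported in the CONTAINS lines, lowest interval first.
reported = [0.215, 43.849, 55.592, 0.344]
print(round(sum(reported), 3))
```

The computed edges agree with the printed interval bounds to within rounding of the displayed three decimals. The reported MIN and MAX are themselves rounded, so exact agreement is not expected.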
Token to
Token ID284
Feature activation+4.48e-03

Pos logit contributions
 accidents+0.014
rim+0.013
reating+0.012
iver+0.012
tex+0.012
Neg logit contributions
sed-0.006
 three-0.005
 hand-0.005
cious-0.005
 seven-0.004
Token sleep
Token ID3993
Feature activation+1.77e-02

Pos logit contributions
 accidents+0.005
reating+0.004
rim+0.004
THE+0.004
 emergency+0.004
Neg logit contributions
sed-0.009
 three-0.008
ses-0.008
cious-0.008
 hand-0.008
Token.
Token ID13
Feature activation+1.00e-02

Pos logit contributions
 accidents+0.031
 mess+0.026
THE+0.025
reating+0.025
 cartoons+0.024
Neg logit contributions
sed-0.025
2-0.023
ogram-0.023
ses-0.022
cious-0.022
Token When
Token ID1649
Feature activation+3.30e-02

Pos logit contributions
 accidents+0.017
reating+0.015
THE+0.014
rim+0.014
 mess+0.014
Neg logit contributions
sed-0.014
cious-0.013
 three-0.013
2-0.012
ses-0.012
Token it
Token ID340
Feature activation+1.85e-02

Pos logit contributions
rim+0.075
iver+0.073
reating+0.070
ivering+0.068
hing+0.067
Neg logit contributions
 three-0.027
 "-0.025
sed-0.024
 one-0.023
 hand-0.023
Token woke
Token ID19092
Feature activation+6.88e-02

Pos logit contributions
reating+0.042
 accidents+0.040
rim+0.038
 cartoons+0.036
 emergency+0.036
Neg logit contributions
 three-0.016
 pointed-0.013
 saw-0.013
 hand-0.012
 looked-0.012
Token up
Token ID510
Feature activation-2.93e-03

Pos logit contributions
rim+0.122
reating+0.122
THE+0.111
iver+0.110
Found+0.109
Neg logit contributions
 three-0.072
 one-0.070
 "-0.069
 each-0.064
sed-0.063
Token,
Token ID11
Feature activation+1.42e-02

Pos logit contributions
 three+0.002
sed+0.002
 "+0.002
 to+0.002
 hand+0.002
Neg logit contributions
reating-0.006
rim-0.006
 accidents-0.006
THE-0.006
iver-0.006
Token it
Token ID340
Feature activation+6.94e-03

Pos logit contributions
 accidents+0.024
rim+0.024
reating+0.023
iver+0.022
 emergency+0.021
Neg logit contributions
 three-0.018
sed-0.017
 4-0.015
 hand-0.015
 finger-0.015
Token felt
Token ID2936
Feature activation+1.07e-02

Pos logit contributions
reating+0.015
 accidents+0.015
rim+0.014
 emergency+0.013
 vacations+0.013
Neg logit contributions
sed-0.006
 three-0.006
 pointed-0.005
 liked-0.005
 finger-0.005
Token much
Token ID881
Feature activation+2.43e-02

Pos logit contributions
reating+0.018
 accidents+0.018
rim+0.017
lf+0.017
 emergency+0.016
Neg logit contributions
 three-0.014
sed-0.012
 hand-0.012
 liked-0.011
 "-0.011
Token fell
Token ID3214
Feature activation+2.67e-02

Pos logit contributions
 accidents+0.040
reating+0.035
rim+0.034
THE+0.033
 mess+0.033
Neg logit contributions
sed-0.024
 three-0.022
cious-0.021
2-0.021
ses-0.021
Token asleep
Token ID16039
Feature activation+1.56e-02

Pos logit contributions
 accidents+0.046
rim+0.043
reating+0.042
 emergency+0.040
 vacations+0.039
Neg logit contributions
sed-0.034
 three-0.032
 hand-0.030
ses-0.029
 arms-0.029
Token.
Token ID13
Feature activation+1.42e-02

Pos logit contributions
 accidents+0.021
THE+0.018
 mess+0.017
reating+0.016
iver+0.016
Neg logit contributions
sed-0.026
ogram-0.025
2-0.025
cious-0.024
ses-0.024
Token When
Token ID1649
Feature activation+3.53e-02

Pos logit contributions
 accidents+0.026
reating+0.023
THE+0.022
rim+0.022
 mess+0.021
Neg logit contributions
sed-0.018
 three-0.016
cious-0.016
ses-0.016
2-0.016
Token he
Token ID339
Feature activation+5.00e-03

Pos logit contributions
iver+0.072
rim+0.071
reating+0.068
ivering+0.065
ilt+0.064
Neg logit contributions
 three-0.037
 one-0.036
 "-0.033
 to-0.030
sed-0.030
Token woke
Token ID19092
Feature activation+6.63e-02

Pos logit contributions
 accidents+0.010
reating+0.010
rim+0.009
adoes+0.009
 emergency+0.009
Neg logit contributions
 three-0.005
 pointed-0.005
 hand-0.004
 saw-0.004
 3-0.004
Token up
Token ID510
Feature activation-4.79e-03

Pos logit contributions
reating+0.112
Time+0.110
text+0.108
rim+0.107
THE+0.103
Neg logit contributions
 three-0.077
sed-0.075
 one-0.072
 to-0.069
 "-0.069
Token,
Token ID11
Feature activation+1.62e-02

Pos logit contributions
 three+0.004
sed+0.004
 to+0.004
 3+0.003
 hand+0.003
Neg logit contributions
THE-0.010
reating-0.010
rim-0.010
iver-0.009
ilt-0.009
Token he
Token ID339
Feature activation-6.95e-03

Pos logit contributions
reating+0.025
rim+0.025
 accidents+0.025
iver+0.024
ilt+0.023
Neg logit contributions
 three-0.023
sed-0.021
 3-0.020
 "-0.019
 hand-0.019
Token was
Token ID373
Feature activation-6.51e-04

Pos logit contributions
sed+0.007
 three+0.006
cious+0.006
2+0.006
 4+0.006
Neg logit contributions
 accidents-0.015
reating-0.013
rim-0.013
 emergency-0.012
THE-0.012
Token back
Token ID736
Feature activation-2.55e-04

Pos logit contributions
sed+0.001
 three+0.001
cious+0.001
 finger+0.001
 4+0.001
Neg logit contributions
 accidents-0.001
reating-0.001
rim-0.001
 emergency-0.001
THE-0.001
Token his
Token ID465
Feature activation+1.69e-02

Pos logit contributions
 accidents+0.001
rim+0.001
reating+0.001
iver+0.001
ivering+0.001
Neg logit contributions
 three-0.001
sed-0.001
 hand-0.000
 4-0.000
 arms-0.000
Token bed
Token ID3996
Feature activation+1.30e-02

Pos logit contributions
 accidents+0.022
reating+0.020
rim+0.019
THE+0.017
adays+0.016
Neg logit contributions
sed-0.031
 three-0.029
 hand-0.028
 finger-0.028
ses-0.028
Token.
Token ID13
Feature activation+2.07e-02

Pos logit contributions
 accidents+0.022
 mess+0.020
reating+0.019
lf+0.019
iver+0.019
Neg logit contributions
sed-0.019
ses-0.017
2-0.016
cious-0.016
ogram-0.016
Token When
Token ID1649
Feature activation+3.44e-02

Pos logit contributions
 accidents+0.039
reating+0.035
rim+0.034
THE+0.033
 mess+0.032
Neg logit contributions
sed-0.026
 three-0.023
ses-0.022
cious-0.022
2-0.022
Token she
Token ID673
Feature activation+8.84e-03

Pos logit contributions
rim+0.081
iver+0.080
ivering+0.075
ilt+0.073
reating+0.069
Neg logit contributions
 three-0.029
 one-0.026
 "-0.026
 to-0.023
 hand-0.021
Token woke
Token ID19092
Feature activation+6.62e-02

Pos logit contributions
 accidents+0.018
reating+0.017
rim+0.016
 emergency+0.016
 hurricane+0.015
Neg logit contributions
 three-0.009
sed-0.008
 pointed-0.008
 hand-0.008
 finger-0.008
Token up
Token ID510
Feature activation-1.10e-02

Pos logit contributions
rim+0.131
text+0.117
reating+0.117
Ro+0.116
iver+0.115
Neg logit contributions
 one-0.063
 three-0.061
 "-0.060
 to-0.059
 each-0.056
Token,
Token ID11
Feature activation+1.51e-02

Pos logit contributions
 three+0.009
 to+0.007
 "+0.007
 3+0.006
 hand+0.006
Neg logit contributions
rim-0.025
THE-0.025
reating-0.024
ilt-0.024
iver-0.024
Token she
Token ID673
Feature activation-5.17e-04

Pos logit contributions
rim+0.032
iver+0.031
ilt+0.029
asis+0.029
ivering+0.028
Neg logit contributions
 three-0.015
 "-0.014
 one-0.013
 pointed-0.012
 3-0.012
Token saw
Token ID2497
Feature activation+1.54e-02

Pos logit contributions
sed+0.001
 three+0.000
cious+0.000
 finger+0.000
2+0.000
Neg logit contributions
 accidents-0.001
reating-0.001
rim-0.001
 emergency-0.001
THE-0.001
Token that
Token ID326
Feature activation+2.28e-03

Pos logit contributions
rim+0.031
iver+0.030
asis+0.030
ivering+0.029
ital+0.029
Neg logit contributions
 three-0.015
 one-0.012
 "-0.012
sed-0.012
 finger-0.012
Token hot
Token ID3024
Feature activation+3.84e-02

Pos logit contributions
rim+0.021
 accidents+0.021
reating+0.020
iver+0.018
THE+0.018
Neg logit contributions
 three-0.020
 hand-0.019
sed-0.018
 3-0.017
 seven-0.017
Token sun
Token ID4252
Feature activation+3.50e-03

Pos logit contributions
 accidents+0.058
rim+0.054
THE+0.053
reating+0.052
adays+0.050
Neg logit contributions
 three-0.052
 hand-0.050
 3-0.045
 arms-0.044
 fingers-0.044
Token.
Token ID13
Feature activation+2.11e-02

Pos logit contributions
 accidents+0.008
reating+0.007
rim+0.007
THE+0.007
 mess+0.006
Neg logit contributions
sed-0.003
 three-0.003
cious-0.003
2-0.003
ses-0.003
Token When
Token ID1649
Feature activation+3.82e-02

Pos logit contributions
 accidents+0.038
reating+0.034
rim+0.034
THE+0.032
 mess+0.031
Neg logit contributions
sed-0.027
 three-0.025
2-0.024
ses-0.024
 4-0.023
Token he
Token ID339
Feature activation+6.06e-03

Pos logit contributions
rim+0.082
iver+0.075
reating+0.073
ivering+0.071
ilt+0.070
Neg logit contributions
 three-0.039
 one-0.035
 "-0.033
 hand-0.032
 4-0.031
Token woke
Token ID19092
Feature activation+6.54e-02

Pos logit contributions
 accidents+0.012
reating+0.012
rim+0.010
 hurricane+0.010
 vacations+0.010
Neg logit contributions
 three-0.006
 pointed-0.006
 hand-0.006
 finger-0.006
sed-0.006
Token up
Token ID510
Feature activation-7.33e-03

Pos logit contributions
rim+0.130
text+0.117
reating+0.117
Ro+0.116
Found+0.115
Neg logit contributions
 three-0.063
 one-0.062
 "-0.060
 to-0.060
 each-0.055
Token,
Token ID11
Feature activation+1.38e-02

Pos logit contributions
 three+0.007
sed+0.006
 3+0.006
 hand+0.006
 "+0.006
Neg logit contributions
reating-0.015
THE-0.015
rim-0.014
 accidents-0.014
tex-0.014
Token all
Token ID477
Feature activation-3.73e-03

Pos logit contributions
rim+0.027
reating+0.024
iver+0.024
ilt+0.024
ivering+0.023
Neg logit contributions
 three-0.017
 one-0.014
 "-0.014
 3-0.014
 hand-0.014
Token his
Token ID465
Feature activation+2.81e-03

Pos logit contributions
 three+0.004
 hand+0.004
 3+0.004
 finger+0.003
 arms+0.003
Neg logit contributions
 accidents-0.007
reating-0.006
rim-0.006
iver-0.006
THE-0.006
Token candy
Token ID18550
Feature activation+3.07e-02

Pos logit contributions
 accidents+0.005
reating+0.005
rim+0.004
THE+0.004
iver+0.004
Neg logit contributions
sed-0.004
 three-0.004
ses-0.003
 finger-0.003
 hand-0.003
Token a
Token ID257
Feature activation+8.61e-03

Pos logit contributions
 three+0.007
 hand+0.006
 3+0.006
 4+0.006
 seven+0.005
Neg logit contributions
 accidents-0.011
rim-0.011
asis-0.010
reating-0.010
THE-0.010
Token nap
Token ID25422
Feature activation+2.13e-02

Pos logit contributions
 accidents+0.014
reating+0.013
rim+0.012
THE+0.012
iver+0.012
Neg logit contributions
sed-0.012
 three-0.012
 hand-0.011
 finger-0.011
cious-0.011
Token and
Token ID290
Feature activation+5.06e-03

Pos logit contributions
 accidents+0.036
reating+0.030
rim+0.029
THE+0.029
 mess+0.029
Neg logit contributions
sed-0.031
2-0.028
cious-0.028
ses-0.028
 three-0.027
Token when
Token ID618
Feature activation+3.16e-02

Pos logit contributions
 accidents+0.009
reating+0.008
rim+0.008
 emergency+0.008
iver+0.008
Neg logit contributions
sed-0.006
 three-0.006
cious-0.006
ses-0.005
 4-0.005
Token she
Token ID673
Feature activation+1.08e-02

Pos logit contributions
rim+0.064
iver+0.061
 accidents+0.056
ivering+0.056
ilt+0.055
Neg logit contributions
 three-0.035
sed-0.031
 hand-0.029
 3-0.028
 4-0.028
Token woke
Token ID19092
Feature activation+6.51e-02

Pos logit contributions
 accidents+0.020
reating+0.018
rim+0.017
 emergency+0.017
 cartoons+0.016
Neg logit contributions
 three-0.013
sed-0.013
 hand-0.012
 finger-0.012
sel-0.012
Token up
Token ID510
Feature activation-1.05e-02

Pos logit contributions
rim+0.102
THE+0.093
 accidents+0.093
reating+0.092
asis+0.091
Neg logit contributions
 three-0.089
sed-0.083
 one-0.078
 finger-0.077
 "-0.077
Token,
Token ID11
Feature activation+1.38e-02

Pos logit contributions
 three+0.012
sed+0.011
 hand+0.010
 3+0.010
 4+0.010
Neg logit contributions
 accidents-0.019
reating-0.018
rim-0.018
THE-0.018
tex-0.017
Token her
Token ID607
Feature activation+1.71e-03

Pos logit contributions
 accidents+0.023
rim+0.022
reating+0.021
iver+0.021
THE+0.020
Neg logit contributions
 three-0.018
sed-0.017
 4-0.016
cious-0.015
 3-0.015
Token grandma
Token ID49890
Feature activation+2.45e-02

Pos logit contributions
 accidents+0.003
 mess+0.002
reating+0.002
 emergency+0.002
THE+0.002
Neg logit contributions
sed-0.002
cious-0.002
2-0.002
sel-0.002
 three-0.002
Token was
Token ID373
Feature activation+1.11e-02

Pos logit contributions
 accidents+0.044
reating+0.038
 mess+0.037
rim+0.036
THE+0.036
Neg logit contributions
sed-0.032
ses-0.029
 three-0.028
cious-0.028
2-0.028
Token her
Token ID607
Feature activation+2.11e-02

Pos logit contributions
 accidents+0.008
reating+0.007
rim+0.007
THE+0.007
iver+0.007
Neg logit contributions
sed-0.008
 three-0.008
cious-0.007
ses-0.007
2-0.007
Token cube
Token ID23441
Feature activation+8.32e-03

Pos logit contributions
 accidents+0.035
reating+0.032
rim+0.031
iver+0.030
THE+0.029
Neg logit contributions
sed-0.030
 three-0.028
 finger-0.026
ses-0.025
 hand-0.025
Token.
Token ID13
Feature activation+1.10e-02

Pos logit contributions
 accidents+0.016
rim+0.015
reating+0.015
THE+0.014
adays+0.014
Neg logit contributions
sed-0.009
 three-0.008
 finger-0.007
cious-0.007
 hand-0.007
Token When
Token ID1649
Feature activation+3.47e-02

Pos logit contributions
 accidents+0.021
reating+0.019
THE+0.018
rim+0.018
 mess+0.018
Neg logit contributions
sed-0.013
 three-0.011
cious-0.011
2-0.011
ses-0.011
Token she
Token ID673
Feature activation+5.92e-03

Pos logit contributions
rim+0.083
iver+0.078
ivering+0.074
hing+0.071
reating+0.069
Neg logit contributions
 three-0.030
 one-0.027
 "-0.024
 hand-0.024
 to-0.023
Token woke
Token ID19092
Feature activation+6.51e-02

Pos logit contributions
 accidents+0.011
reating+0.010
rim+0.010
 emergency+0.009
 hurricane+0.009
Neg logit contributions
 three-0.007
sed-0.007
 pointed-0.007
 hand-0.007
 finger-0.006
Token up
Token ID510
Feature activation-8.61e-03

Pos logit contributions
rim+0.111
Time+0.101
reating+0.101
THE+0.098
text+0.097
Neg logit contributions
 three-0.081
sed-0.079
 one-0.072
 "-0.070
 to-0.068
Token,
Token ID11
Feature activation+1.57e-02

Pos logit contributions
 three+0.009
sed+0.008
 3+0.007
 hand+0.007
 to+0.006
Neg logit contributions
THE-0.017
rim-0.017
 accidents-0.017
reating-0.016
iver-0.016
Token she
Token ID673
Feature activation-3.21e-03

Pos logit contributions
rim+0.031
 accidents+0.028
iver+0.028
reating+0.028
asis+0.027
Neg logit contributions
 three-0.018
sed-0.016
 hand-0.015
 finger-0.015
 3-0.015
Token felt
Token ID2936
Feature activation+7.38e-03

Pos logit contributions
sed+0.003
 three+0.003
cious+0.003
2+0.003
ses+0.003
Neg logit contributions
 accidents-0.007
reating-0.006
rim-0.006
 emergency-0.006
THE-0.006
Token much
Token ID881
Feature activation+1.94e-02

Pos logit contributions
 accidents+0.013
rim+0.013
reating+0.013
lf+0.013
 emergency+0.012
Neg logit contributions
 three-0.009
 hand-0.007
 finger-0.007
 3-0.007
sed-0.007
Token fun
Token ID1257
Feature activation+1.86e-02

Pos logit contributions
sed+0.032
cious+0.030
2+0.029
ses+0.029
ogram+0.029
Neg logit contributions
 accidents-0.015
 mess-0.011
reating-0.010
 mud-0.010
 cartoons-0.009
Token.
Token ID13
Feature activation+1.51e-02

Pos logit contributions
 accidents+0.038
reating+0.035
rim+0.034
THE+0.033
iver+0.032
Neg logit contributions
sed-0.019
 three-0.017
2-0.016
cious-0.016
ses-0.015
Token<|endoftext|>
Token ID50256
Feature activation+2.10e-02

Pos logit contributions
 accidents+0.028
reating+0.024
THE+0.024
 mess+0.023
rim+0.023
Neg logit contributions
sed-0.018
cious-0.016
2-0.016
sel-0.016
 three-0.016
TokenL
Token ID43
Feature activation+6.60e-03

Pos logit contributions
 accidents+0.036
reating+0.031
rim+0.029
THE+0.029
 mess+0.029
Neg logit contributions
sed-0.030
 three-0.027
cious-0.026
2-0.026
ses-0.026
Tokenily
Token ID813
Feature activation+2.89e-02

Pos logit contributions
 accidents+0.010
reating+0.008
rim+0.008
THE+0.008
 mess+0.008
Neg logit contributions
sed-0.011
 three-0.010
cious-0.010
ses-0.010
2-0.010
Token woke
Token ID19092
Feature activation+6.50e-02

Pos logit contributions
 accidents+0.060
reating+0.054
rim+0.051
THE+0.050
 emergency+0.049
Neg logit contributions
sed-0.031
 three-0.028
ses-0.026
 seven-0.026
 4-0.026
Token up
Token ID510
Feature activation-1.49e-02

Pos logit contributions
Time+0.099
reating+0.098
Ro+0.098
Got+0.097
rim+0.097
Neg logit contributions
sed-0.087
 three-0.077
 each-0.074
 finger-0.074
ses-0.073
Token and
Token ID290
Feature activation-3.42e-03

Pos logit contributions
sed+0.023
 three+0.021
cious+0.021
ses+0.020
2+0.020
Neg logit contributions
 accidents-0.023
reating-0.020
rim-0.019
THE-0.019
 mess-0.018
Token felt
Token ID2936
Feature activation+1.73e-02

Pos logit contributions
sed+0.005
 three+0.004
cious+0.004
2+0.004
ses+0.004
Neg logit contributions
 accidents-0.006
reating-0.005
rim-0.005
THE-0.005
 mess-0.005
Token hungry
Token ID14720
Feature activation+2.55e-02

Pos logit contributions
 accidents+0.034
rim+0.033
reating+0.033
lf+0.031
adays+0.031
Neg logit contributions
 three-0.017
sed-0.016
 finger-0.015
 hand-0.015
 "-0.014
Token.
Token ID13
Feature activation+2.31e-02

Pos logit contributions
 accidents+0.057
reating+0.051
rim+0.051
THE+0.050
 mess+0.049
Neg logit contributions
sed-0.021
 three-0.019
cious-0.017
2-0.017
ses-0.017
Token very
Token ID845
Feature activation+5.23e-04

Pos logit contributions
 accidents+0.012
reating+0.010
rim+0.009
THE+0.009
 mess+0.009
Neg logit contributions
sed-0.009
 three-0.008
cious-0.008
2-0.008
ses-0.008
Token tired
Token ID10032
Feature activation+1.51e-02

Pos logit contributions
 accidents+0.001
reating+0.001
rim+0.001
THE+0.001
 mess+0.001
Neg logit contributions
sed-0.001
 three-0.001
cious-0.001
2-0.001
ses-0.001
Token.
Token ID13
Feature activation+1.37e-02

Pos logit contributions
 accidents+0.033
rim+0.031
THE+0.030
reating+0.029
tex+0.029
Neg logit contributions
 three-0.012
sed-0.010
 hand-0.010
 3-0.009
2-0.009
Token When
Token ID1649
Feature activation+3.61e-02

Pos logit contributions
 accidents+0.026
reating+0.023
rim+0.022
THE+0.021
 mess+0.021
Neg logit contributions
sed-0.017
 three-0.015
cious-0.015
2-0.015
ses-0.014
Token she
Token ID673
Feature activation+6.11e-03

Pos logit contributions
rim+0.082
iver+0.077
ivering+0.072
ilt+0.070
hing+0.070
Neg logit contributions
 three-0.033
 one-0.029
 hand-0.028
 "-0.027
sed-0.027
Token woke
Token ID19092
Feature activation+6.45e-02

Pos logit contributions
 accidents+0.011
reating+0.011
rim+0.010
 emergency+0.010
 cartoons+0.009
Neg logit contributions
 three-0.007
sed-0.007
 hand-0.007
 pointed-0.007
 finger-0.006
Token up
Token ID510
Feature activation-1.14e-02

Pos logit contributions
rim+0.113
Time+0.104
reating+0.102
Ro+0.102
Got+0.101
Neg logit contributions
 three-0.076
sed-0.073
 one-0.068
 to-0.067
 "-0.066
Token,
Token ID11
Feature activation+1.49e-02

Pos logit contributions
 three+0.012
sed+0.011
 hand+0.010
 3+0.010
 finger+0.010
Neg logit contributions
 accidents-0.021
reating-0.021
THE-0.021
rim-0.021
tex-0.020
Token she
Token ID673
Feature activation-1.88e-03

Pos logit contributions
rim+0.030
iver+0.027
reating+0.027
 accidents+0.026
asis+0.026
Neg logit contributions
 three-0.017
sed-0.014
 hand-0.014
 3-0.014
 finger-0.014
Token played
Token ID2826
Feature activation+1.96e-02

Pos logit contributions
sed+0.002
 three+0.002
cious+0.002
 finger+0.002
 4+0.002
Neg logit contributions
 accidents-0.004
reating-0.004
rim-0.003
 emergency-0.003
THE-0.003
Token with
Token ID351
Feature activation+1.20e-03

Pos logit contributions
rim+0.039
 accidents+0.038
reating+0.036
 hurricane+0.036
adays+0.035
Neg logit contributions
 three-0.022
 "-0.019
 hand-0.018
 one-0.017
 finger-0.017
Token.
Token ID13
Feature activation+1.69e-02

Pos logit contributions
sed+0.022
ses+0.020
cious+0.020
2+0.020
sel+0.019
Neg logit contributions
 accidents-0.021
reating-0.018
THE-0.018
 mess-0.017
rim-0.017
Token \n
Token ID198
Feature activation+1.01e-02

Pos logit contributions
 accidents+0.031
reating+0.027
THE+0.026
rim+0.025
 mess+0.025
Neg logit contributions
sed-0.022
cious-0.019
 three-0.019
2-0.019
ses-0.019
Token \n
Token ID198
Feature activation+1.50e-02

Pos logit contributions
 accidents+0.016
reating+0.014
rim+0.013
 emergency+0.013
adays+0.013
Neg logit contributions
sed-0.015
 three-0.014
2-0.013
cious-0.013
 4-0.013
TokenWhen
Token ID2215
Feature activation+3.56e-02

Pos logit contributions
 accidents+0.028
 emergency+0.024
rim+0.023
reating+0.023
 hurricane+0.023
Neg logit contributions
2-0.018
 three-0.017
cious-0.016
 seven-0.016
 eight-0.016
Token she
Token ID673
Feature activation+5.48e-03

Pos logit contributions
rim+0.080
iver+0.078
ivering+0.072
ilt+0.068
reating+0.067
Neg logit contributions
 three-0.035
 one-0.031
 "-0.030
sed-0.029
 hand-0.027
Token woke
Token ID19092
Feature activation+6.41e-02

Pos logit contributions
 accidents+0.011
reating+0.010
rim+0.010
 emergency+0.009
 cartoons+0.009
Neg logit contributions
 three-0.006
sed-0.005
 pointed-0.005
 hand-0.005
 4-0.005
Token up
Token ID510
Feature activation-7.69e-03

Pos logit contributions
rim+0.115
Time+0.107
reating+0.105
text+0.102
Got+0.102
Neg logit contributions
 three-0.075
sed-0.075
 one-0.067
 to-0.066
 each-0.064
Token,
Token ID11
Feature activation+1.35e-02

Pos logit contributions
 three+0.007
sed+0.006
 3+0.006
 hand+0.006
 to+0.006
Neg logit contributions
rim-0.016
reating-0.015
THE-0.015
 accidents-0.015
tex-0.015
Token it
Token ID340
Feature activation+6.53e-03

Pos logit contributions
rim+0.027
iver+0.024
 accidents+0.024
reating+0.024
asis+0.023
Neg logit contributions
 three-0.016
sed-0.014
 3-0.013
 hand-0.013
 "-0.012
Token was
Token ID373
Feature activation+7.27e-03

Pos logit contributions
 accidents+0.013
reating+0.013
rim+0.011
 emergency+0.011
 vacations+0.011
Neg logit contributions
sed-0.007
 three-0.007
cious-0.006
 4-0.006
 seven-0.006
Token a
Token ID257
Feature activation+4.54e-03

Pos logit contributions
 accidents+0.013
reating+0.013
rim+0.012
uggle+0.012
adays+0.011
Neg logit contributions
 three-0.009
sed-0.008
 seven-0.008
 4-0.008
 hand-0.007
Token knee
Token ID10329
Feature activation+2.64e-02

Pos logit contributions
 accidents+0.021
reating+0.019
rim+0.019
THE+0.018
iver+0.017
Neg logit contributions
sed-0.019
 three-0.018
ses-0.017
 finger-0.017
 hand-0.016
Token.
Token ID13
Feature activation+1.22e-02

Pos logit contributions
 accidents+0.052
reating+0.046
 mess+0.044
THE+0.044
rim+0.044
Neg logit contributions
sed-0.030
 three-0.026
cious-0.026
2-0.026
ses-0.025
Token When
Token ID1649
Feature activation+3.56e-02

Pos logit contributions
 accidents+0.022
reating+0.019
rim+0.018
THE+0.018
 mess+0.018
Neg logit contributions
sed-0.016
 three-0.014
cious-0.014
2-0.014
ses-0.014
Token Tim
Token ID5045
Feature activation+1.17e-02

Pos logit contributions
iver+0.077
rim+0.076
ivering+0.075
ilt+0.072
reating+0.067
Neg logit contributions
 three-0.037
 one-0.034
 "-0.029
 4-0.028
 3-0.028
Tokenmy
Token ID1820
Feature activation+1.36e-02

Pos logit contributions
 accidents+0.022
rim+0.019
reating+0.019
 hurricane+0.018
THE+0.018
Neg logit contributions
 three-0.014
sed-0.014
 hand-0.013
2-0.013
 4-0.013
Token woke
Token ID19092
Feature activation+6.40e-02

Pos logit contributions
 accidents+0.027
reating+0.025
rim+0.025
THE+0.023
 hurricane+0.022
Neg logit contributions
 three-0.015
sed-0.014
 hand-0.014
 pointed-0.014
 finger-0.014
Token up
Token ID510
Feature activation-6.62e-03

Pos logit contributions
rim+0.119
Time+0.118
text+0.116
Ro+0.116
Got+0.115
Neg logit contributions
 three-0.066
 one-0.062
sed-0.061
 to-0.060
 "-0.059
Token,
Token ID11
Feature activation+1.24e-02

Pos logit contributions
 three+0.006
 to+0.005
 3+0.005
 hand+0.005
sed+0.005
Neg logit contributions
THE-0.014
rim-0.014
ilt-0.013
reating-0.013
iver-0.013
Token he
Token ID339
Feature activation-5.99e-03

Pos logit contributions
rim+0.022
 accidents+0.021
ilt+0.021
reating+0.020
iver+0.020
Neg logit contributions
 three-0.015
sed-0.014
 hand-0.013
 3-0.013
 "-0.013
Token felt
Token ID2936
Feature activation+1.23e-02

Pos logit contributions
sed+0.006
 three+0.005
cious+0.005
2+0.005
 finger+0.005
Neg logit contributions
 accidents-0.013
reating-0.012
rim-0.011
 emergency-0.011
THE-0.011
Token much
Token ID881
Feature activation+2.05e-02

Pos logit contributions
reating+0.022
 accidents+0.022
rim+0.021
lf+0.020
adays+0.019
Neg logit contributions
 three-0.015
 hand-0.013
 finger-0.013
 3-0.012
sed-0.012
Token asleep
Token ID16039
Feature activation+1.31e-02

Pos logit contributions
 accidents+0.042
rim+0.040
reating+0.038
 emergency+0.035
THE+0.035
Neg logit contributions
sed-0.045
 three-0.043
 hand-0.040
ses-0.040
 arms-0.039
Token fast
Token ID3049
Feature activation+3.44e-04

Pos logit contributions
 accidents+0.020
 mess+0.017
THE+0.017
lf+0.016
reating+0.016
Neg logit contributions
sed-0.020
ogram-0.020
2-0.019
ses-0.018
cious-0.018
Token.
Token ID13
Feature activation+1.47e-02

Pos logit contributions
 accidents+0.001
rim+0.001
reating+0.001
THE+0.001
iver+0.001
Neg logit contributions
sed-0.000
 three-0.000
ses-0.000
cious-0.000
2-0.000
Token When
Token ID1649
Feature activation+3.27e-02

Pos logit contributions
 accidents+0.026
reating+0.022
THE+0.022
 mess+0.021
rim+0.021
Neg logit contributions
sed-0.019
 three-0.017
cious-0.017
sel-0.017
2-0.017
Token she
Token ID673
Feature activation+4.30e-03

Pos logit contributions
rim+0.077
iver+0.073
ivering+0.069
ilt+0.066
 accidents+0.066
Neg logit contributions
 three-0.026
sed-0.024
 "-0.024
 one-0.023
 hand-0.022
Token woke
Token ID19092
Feature activation+6.37e-02

Pos logit contributions
 accidents+0.008
reating+0.008
rim+0.007
 emergency+0.007
 hurricane+0.007
Neg logit contributions
 three-0.005
sed-0.005
 pointed-0.005
 hand-0.005
 finger-0.005
Token up
Token ID510
Feature activation-1.11e-02

Pos logit contributions
rim+0.110
Time+0.102
reating+0.099
iver+0.098
asis+0.098
Neg logit contributions
sed-0.078
 three-0.072
 one-0.066
 "-0.065
 wrists-0.064
Token,
Token ID11
Feature activation+1.38e-02

Pos logit contributions
 three+0.010
sed+0.009
 hand+0.008
 3+0.008
 to+0.008
Neg logit contributions
THE-0.022
rim-0.022
reating-0.022
 accidents-0.021
tex-0.021
Token she
Token ID673
Feature activation-5.66e-03

Pos logit contributions
rim+0.025
 accidents+0.024
reating+0.024
iver+0.022
asis+0.022
Neg logit contributions
sed-0.016
 three-0.016
 hand-0.014
 finger-0.014
 "-0.014
Token felt
Token ID2936
Feature activation+7.89e-03

Pos logit contributions
sed+0.006
 three+0.006
cious+0.005
2+0.005
ses+0.005
Neg logit contributions
 accidents-0.011
reating-0.010
rim-0.010
 emergency-0.010
THE-0.009
Token much
Token ID881
Feature activation+1.97e-02

Pos logit contributions
rim+0.014
reating+0.014
 accidents+0.013
lf+0.013
 emergency+0.012
Neg logit contributions
 three-0.009
 hand-0.008
 finger-0.008
 "-0.008
sed-0.008
Token adventure
Token ID8855
Feature activation+1.23e-06

Pos logit contributions
sed+0.012
 three+0.011
 hand+0.010
ses+0.010
 finger+0.010
Neg logit contributions
 accidents-0.010
reating-0.009
rim-0.009
THE-0.008
adays-0.008
Token ever
Token ID1683
Feature activation+8.98e-03

Feature not active on prev token;
no predictions were affected.
Token.
Token ID13
Feature activation+1.32e-02

Pos logit contributions
 accidents+0.016
reating+0.014
THE+0.014
 mess+0.013
rim+0.013
Neg logit contributions
sed-0.013
ses-0.011
 three-0.011
2-0.011
cious-0.010
Token When
Token ID1649
Feature activation+3.83e-02

Pos logit contributions
 accidents+0.024
reating+0.021
rim+0.021
 mess+0.020
THE+0.020
Neg logit contributions
sed-0.017
 three-0.016
cious-0.015
2-0.015
ses-0.015
Token he
Token ID339
Feature activation+3.25e-03

Pos logit contributions
rim+0.087
iver+0.082
ivering+0.080
ilt+0.078
reating+0.076
Neg logit contributions
 three-0.037
 one-0.036
 3-0.028
 "-0.028
 4-0.027
Token woke
Token ID19092
Feature activation+6.36e-02

Pos logit contributions
 accidents+0.006
reating+0.006
rim+0.005
 cartoons+0.005
 emergency+0.005
Neg logit contributions
 three-0.004
 pointed-0.004
sed-0.003
 hand-0.003
 3-0.003
Token up
Token ID510
Feature activation-7.85e-03

Pos logit contributions
rim+0.116
reating+0.110
Got+0.106
text+0.105
Time+0.105
Neg logit contributions
 three-0.072
 one-0.065
sed-0.064
 to-0.064
 "-0.063
Token in
Token ID287
Feature activation-3.79e-03

Pos logit contributions
 three+0.008
sed+0.008
 3+0.007
 hand+0.007
 4+0.006
Neg logit contributions
reating-0.015
 accidents-0.015
THE-0.015
rim-0.015
iver-0.014
Token the
Token ID262
Feature activation+1.16e-02

Pos logit contributions
sed+0.005
 three+0.005
 hand+0.004
 4+0.004
 3+0.004
Neg logit contributions
 accidents-0.007
reating-0.006
rim-0.006
THE-0.006
iver-0.006
Token morning
Token ID3329
Feature activation-6.65e-03

Pos logit contributions
 accidents+0.019
rim+0.018
reating+0.017
iver+0.015
affles+0.015
Neg logit contributions
 three-0.018
 hand-0.016
 3-0.016
 seven-0.015
 4-0.015
Token,
Token ID11
Feature activation+1.31e-02

Pos logit contributions
 three+0.008
 3+0.006
sed+0.006
 4+0.006
 hand+0.006
Neg logit contributions
 accidents-0.013
reating-0.012
rim-0.012
THE-0.012
iver-0.011
Token with
Token ID351
Feature activation+9.05e-03

Pos logit contributions
sed+0.031
ses+0.028
 three+0.026
2+0.026
 wrists+0.025
Neg logit contributions
 accidents-0.043
reating-0.038
THE-0.036
rim-0.036
 mess-0.035
Token food
Token ID2057
Feature activation+2.64e-02

Pos logit contributions
 accidents+0.015
reating+0.013
rim+0.013
THE+0.013
 mess+0.012
Neg logit contributions
sed-0.013
 three-0.012
ses-0.011
cious-0.011
 hand-0.011
Token.
Token ID13
Feature activation+1.78e-02

Pos logit contributions
 accidents+0.058
reating+0.052
rim+0.050
THE+0.049
 mess+0.049
Neg logit contributions
sed-0.024
 three-0.021
cious-0.020
ses-0.020
2-0.019
Token When
Token ID1649
Feature activation+3.88e-02

Pos logit contributions
 accidents+0.036
reating+0.033
rim+0.033
 mess+0.032
 emergency+0.031
Neg logit contributions
sed-0.020
 three-0.018
ses-0.017
cious-0.016
 "-0.016
Token she
Token ID673
Feature activation+8.90e-03

Pos logit contributions
rim+0.088
iver+0.083
reating+0.082
ilt+0.079
ivering+0.077
Neg logit contributions
 three-0.034
sed-0.032
 "-0.030
 hand-0.028
 one-0.028
Token woke
Token ID19092
Feature activation+6.36e-02

Pos logit contributions
 accidents+0.016
reating+0.015
rim+0.014
 emergency+0.014
 cartoons+0.014
Neg logit contributions
 three-0.011
sed-0.011
 hand-0.010
 pointed-0.010
 finger-0.009
Token up
Token ID510
Feature activation-8.88e-03

Pos logit contributions
rim+0.126
reating+0.116
text+0.109
Time+0.108
Got+0.107
Neg logit contributions
 three-0.066
 one-0.062
 to-0.061
sed-0.060
 "-0.057
Token,
Token ID11
Feature activation+1.10e-02

Pos logit contributions
 three+0.008
sed+0.007
 "+0.006
 to+0.006
 3+0.006
Neg logit contributions
reating-0.019
rim-0.019
THE-0.018
iver-0.018
 accidents-0.018
Token Fl
Token ID1610
Feature activation+5.91e-04

Pos logit contributions
rim+0.023
reating+0.022
iver+0.021
ilt+0.021
 accidents+0.020
Neg logit contributions
 three-0.012
sed-0.011
 "-0.010
 hand-0.009
 3-0.009
Tokenuffy
Token ID15352
Feature activation+8.94e-03

Pos logit contributions
 accidents+0.001
reating+0.001
rim+0.001
THE+0.001
 mess+0.001
Neg logit contributions
sed-0.001
 three-0.001
cious-0.001
2-0.001
ses-0.001
Token was
Token ID373
Feature activation+5.67e-03

Pos logit contributions
 accidents+0.017
reating+0.015
rim+0.014
THE+0.014
 emergency+0.014
Neg logit contributions
sed-0.011
 three-0.010
cious-0.010
2-0.009
ses-0.009
Token a
Token ID257
Feature activation+4.77e-03

Pos logit contributions
sed+0.012
 three+0.011
cious+0.010
2+0.010
ses+0.010
Neg logit contributions
 accidents-0.027
reating-0.024
rim-0.024
THE-0.023
 mess-0.023
Token nap
Token ID25422
Feature activation+1.54e-02

Pos logit contributions
 accidents+0.008
reating+0.007
rim+0.006
THE+0.006
iver+0.006
Neg logit contributions
sed-0.007
 three-0.007
 hand-0.007
 finger-0.007
cious-0.006
Token.
Token ID13
Feature activation+1.21e-02

Pos logit contributions
 accidents+0.027
reating+0.023
rim+0.022
THE+0.022
 mess+0.022
Neg logit contributions
sed-0.021
 three-0.019
2-0.019
cious-0.019
ses-0.019
Token When
Token ID1649
Feature activation+3.12e-02

Pos logit contributions
 accidents+0.020
reating+0.016
 mess+0.016
THE+0.016
 cartoons+0.015
Neg logit contributions
sed-0.018
cious-0.016
sel-0.016
2-0.016
ses-0.016
Token she
Token ID673
Feature activation+6.09e-03

Pos logit contributions
rim+0.064
iver+0.059
 accidents+0.058
reating+0.058
ivering+0.055
Neg logit contributions
 three-0.033
sed-0.030
 hand-0.028
 "-0.027
 4-0.027
Token woke
Token ID19092
Feature activation+6.34e-02

Pos logit contributions
 accidents+0.013
reating+0.012
rim+0.011
 emergency+0.011
 cartoons+0.011
Neg logit contributions
 three-0.006
sed-0.006
 hand-0.005
cious-0.005
 finger-0.005
Token up
Token ID510
Feature activation-1.44e-02

Pos logit contributions
rim+0.110
reating+0.101
text+0.098
Time+0.096
 accidents+0.095
Neg logit contributions
 three-0.078
sed-0.077
 one-0.070
 "-0.068
 finger-0.067
Token,
Token ID11
Feature activation+1.17e-02

Pos logit contributions
 three+0.014
sed+0.012
 3+0.011
 hand+0.011
 "+0.011
Neg logit contributions
rim-0.029
reating-0.028
 accidents-0.028
THE-0.027
tex-0.027
Token her
Token ID607
Feature activation+6.97e-04

Pos logit contributions
rim+0.020
 accidents+0.020
reating+0.019
iver+0.018
THE+0.018
Neg logit contributions
 three-0.015
sed-0.014
 hand-0.013
 finger-0.013
 3-0.013
Token mom
Token ID1995
Feature activation+2.19e-02

Pos logit contributions
 accidents+0.001
reating+0.001
 mess+0.001
rim+0.001
THE+0.001
Neg logit contributions
sed-0.001
cious-0.001
 three-0.001
2-0.001
ses-0.001
Tokenmy
Token ID1820
Feature activation+2.04e-02

Pos logit contributions
 accidents+0.035
reating+0.030
rim+0.029
THE+0.028
 vacations+0.028
Neg logit contributions
sed-0.032
 three-0.031
2-0.029
 4-0.029
cious-0.028
Token the
Token ID262
Feature activation-1.28e-03

Pos logit contributions
 accidents+0.005
reating+0.005
rim+0.005
THE+0.004
iver+0.004
Neg logit contributions
sed-0.003
 three-0.003
2-0.003
 4-0.003
ses-0.003
Token rope
Token ID17182
Feature activation+6.66e-05

Pos logit contributions
sed+0.002
 three+0.002
cious+0.002
2+0.002
ses+0.002
Neg logit contributions
 accidents-0.002
reating-0.002
THE-0.002
 mess-0.002
rim-0.002
Token and
Token ID290
Feature activation+1.04e-02

Pos logit contributions
 accidents+0.000
rim+0.000
reating+0.000
THE+0.000
 emergency+0.000
Neg logit contributions
sed-0.000
 three-0.000
 hand-0.000
ses-0.000
 4-0.000
Token the
Token ID262
Feature activation-7.07e-03

Pos logit contributions
 accidents+0.019
reating+0.017
THE+0.016
rim+0.016
 mess+0.016
Neg logit contributions
sed-0.013
 three-0.012
cious-0.012
2-0.012
ses-0.012
Token dog
Token ID3290
Feature activation+3.65e-03

Pos logit contributions
sed+0.010
 three+0.009
cious+0.009
2+0.009
ses+0.009
Neg logit contributions
 accidents-0.012
reating-0.010
 mess-0.010
THE-0.010
 emergency-0.009
Token woke
Token ID19092
Feature activation+6.33e-02

Pos logit contributions
 accidents+0.006
 mess+0.005
THE+0.005
reating+0.005
rim+0.005
Neg logit contributions
sed-0.005
2-0.005
 three-0.005
ses-0.004
cious-0.004
Token up
Token ID510
Feature activation-1.10e-02

Pos logit contributions
rim+0.104
reating+0.098
asis+0.096
 accidents+0.095
iver+0.093
Neg logit contributions
 three-0.073
 each-0.072
 one-0.071
sed-0.071
 "-0.069
Token,
Token ID11
Feature activation+4.03e-03

Pos logit contributions
sed+0.013
 three+0.012
cious+0.011
2+0.011
ses+0.011
Neg logit contributions
 accidents-0.021
reating-0.019
rim-0.018
THE-0.018
 mess-0.018
Token but
Token ID475
Feature activation+3.19e-02

Pos logit contributions
 accidents+0.008
reating+0.007
rim+0.006
THE+0.006
 emergency+0.006
Neg logit contributions
sed-0.005
 three-0.004
cious-0.004
ses-0.004
2-0.004
Token was
Token ID373
Feature activation-1.75e-03

Pos logit contributions
 accidents+0.055
reating+0.050
rim+0.050
iver+0.048
THE+0.046
Neg logit contributions
sed-0.042
 three-0.040
ses-0.036
cious-0.036
 4-0.036
Token too
Token ID1165
Feature activation-1.68e-02

Pos logit contributions
sed+0.002
 three+0.002
cious+0.002
2+0.002
ses+0.002
Neg logit contributions
 accidents-0.003
reating-0.003
THE-0.002
rim-0.002
 mess-0.002
Token fell
Token ID3214
Feature activation+2.56e-02

Pos logit contributions
 accidents+0.018
reating+0.017
rim+0.016
 emergency+0.016
iver+0.016
Neg logit contributions
sed-0.013
 three-0.012
ses-0.011
cious-0.011
 hand-0.011
Token asleep
Token ID16039
Feature activation+1.17e-02

Pos logit contributions
 accidents+0.038
rim+0.035
reating+0.035
 emergency+0.032
adays+0.032
Neg logit contributions
sed-0.039
 three-0.037
 hand-0.035
ses-0.034
 arms-0.034
Token.
Token ID13
Feature activation+8.96e-03

Pos logit contributions
 accidents+0.017
 mess+0.014
THE+0.014
lf+0.013
 happen+0.013
Neg logit contributions
ogram-0.019
sed-0.018
2-0.018
ses-0.017
cious-0.017
Token When
Token ID1649
Feature activation+2.96e-02

Pos logit contributions
 accidents+0.017
THE+0.014
tex+0.014
adays+0.014
uggle+0.014
Neg logit contributions
sed-0.011
cious-0.010
 wrists-0.010
2-0.010
sel-0.010
Token it
Token ID340
Feature activation+1.87e-02

Pos logit contributions
 accidents+0.048
reating+0.044
rim+0.042
iver+0.039
THE+0.038
Neg logit contributions
sed-0.043
 three-0.041
cious-0.038
ses-0.038
 4-0.038
Token woke
Token ID19092
Feature activation+6.30e-02

Pos logit contributions
 accidents+0.037
reating+0.037
rim+0.034
 cartoons+0.033
 emergency+0.032
Neg logit contributions
 three-0.020
 hand-0.018
sed-0.018
 pointed-0.017
 3-0.017
Token up
Token ID510
Feature activation-3.78e-03

Pos logit contributions
reating+0.105
rim+0.102
 accidents+0.095
Got+0.093
text+0.093
Neg logit contributions
 three-0.077
sed-0.075
 "-0.070
 one-0.069
 to-0.066
Token,
Token ID11
Feature activation+6.22e-03

Pos logit contributions
 three+0.004
sed+0.004
 hand+0.003
 3+0.003
 4+0.003
Neg logit contributions
reating-0.007
 accidents-0.007
rim-0.007
THE-0.007
iver-0.006
Token the
Token ID262
Feature activation-1.86e-02

Pos logit contributions
 accidents+0.010
reating+0.008
THE+0.008
 mess+0.008
 emergency+0.007
Neg logit contributions
sed-0.010
cious-0.009
 three-0.009
2-0.009
ses-0.009
Token rain
Token ID6290
Feature activation+3.22e-02

Pos logit contributions
sed+0.021
 three+0.020
ses+0.018
2+0.018
 hand+0.018
Neg logit contributions
 accidents-0.036
reating-0.035
rim-0.033
THE-0.031
 vacations-0.031
Token had
Token ID550
Feature activation-5.69e-03

Pos logit contributions
 accidents+0.059
reating+0.055
rim+0.052
 emergency+0.050
THE+0.049
Neg logit contributions
sed-0.041
 three-0.036
cious-0.034
ses-0.034
 hand-0.032
Token slept
Token ID21256
Feature activation+2.93e-02

Pos logit contributions
 accidents+0.006
reating+0.005
rim+0.005
THE+0.005
 emergency+0.005
Neg logit contributions
sed-0.003
 three-0.003
cious-0.003
ses-0.003
2-0.003
Token peacefully
Token ID30996
Feature activation+2.73e-04

Pos logit contributions
 accidents+0.047
reating+0.040
rim+0.038
THE+0.038
 mess+0.037
Neg logit contributions
sed-0.044
 three-0.040
cious-0.040
2-0.039
ses-0.039
Token.
Token ID13
Feature activation+1.11e-02

Pos logit contributions
 accidents+0.000
reating+0.000
THE+0.000
 mess+0.000
rim+0.000
Neg logit contributions
sed-0.000
2-0.000
cious-0.000
ses-0.000
 three-0.000
Token When
Token ID1649
Feature activation+3.57e-02

Pos logit contributions
 accidents+0.019
reating+0.017
THE+0.016
rim+0.016
 mess+0.016
Neg logit contributions
sed-0.015
 three-0.014
cious-0.014
2-0.013
ses-0.013
Token she
Token ID673
Feature activation+6.90e-03

Pos logit contributions
rim+0.076
iver+0.071
ivering+0.067
 accidents+0.065
reating+0.064
Neg logit contributions
 three-0.035
sed-0.030
 one-0.030
 hand-0.030
 "-0.030
Token woke
Token ID19092
Feature activation+6.30e-02

Pos logit contributions
 accidents+0.013
reating+0.011
rim+0.011
 emergency+0.010
 hurricane+0.010
Neg logit contributions
 three-0.009
sed-0.008
 pointed-0.008
 hand-0.008
 finger-0.008
Token up
Token ID510
Feature activation-1.12e-02

Pos logit contributions
rim+0.101
reating+0.093
 accidents+0.091
THE+0.091
Got+0.088
Neg logit contributions
 three-0.082
sed-0.080
 one-0.072
 "-0.072
 finger-0.071
Token,
Token ID11
Feature activation+9.50e-03

Pos logit contributions
 three+0.012
sed+0.011
 3+0.010
 hand+0.010
 "+0.009
Neg logit contributions
 accidents-0.021
rim-0.021
reating-0.021
THE-0.020
tex-0.019
Token she
Token ID673
Feature activation-2.93e-04

Pos logit contributions
 accidents+0.016
rim+0.016
reating+0.015
iver+0.015
THE+0.014
Neg logit contributions
 three-0.012
sed-0.011
 hand-0.011
 finger-0.010
 3-0.010
Token said
Token ID531
Feature activation+1.18e-02

Pos logit contributions
sed+0.000
 three+0.000
cious+0.000
2+0.000
 4+0.000
Neg logit contributions
 accidents-0.001
reating-0.001
rim-0.001
 emergency-0.001
THE-0.001
Token to
Token ID284
Feature activation-1.66e-03

Pos logit contributions
 accidents+0.022
rim+0.020
reating+0.020
iver+0.019
 mess+0.019
Neg logit contributions
sed-0.015
 three-0.014
cious-0.013
 4-0.013
 seven-0.012
Token a
Token ID257
Feature activation+8.76e-03

Pos logit contributions
 three+0.003
sed+0.003
 hand+0.003
 4+0.003
 seven+0.003
Neg logit contributions
 accidents-0.007
rim-0.007
reating-0.007
THE-0.006
 mess-0.006
Token nap
Token ID25422
Feature activation+2.45e-02

Pos logit contributions
 accidents+0.013
reating+0.011
rim+0.011
THE+0.011
 emergency+0.010
Neg logit contributions
sed-0.014
 three-0.013
cious-0.013
2-0.012
ses-0.012
Token.
Token ID13
Feature activation+1.56e-02

Pos logit contributions
 accidents+0.048
reating+0.042
THE+0.041
rim+0.041
 mess+0.041
Neg logit contributions
sed-0.029
2-0.025
ses-0.025
cious-0.025
 three-0.024
Token When
Token ID1649
Feature activation+3.18e-02

Pos logit contributions
 accidents+0.026
reating+0.021
THE+0.021
 mess+0.020
adays+0.020
Neg logit contributions
sed-0.022
cious-0.020
sel-0.020
 three-0.020
2-0.020
Token she
Token ID673
Feature activation+8.85e-03

Pos logit contributions
rim+0.073
iver+0.066
reating+0.064
 accidents+0.062
ivering+0.062
Neg logit contributions
 three-0.030
sed-0.025
 hand-0.025
 "-0.025
 eight-0.023
Token woke
Token ID19092
Feature activation+6.30e-02

Pos logit contributions
 accidents+0.016
reating+0.016
 emergency+0.014
rim+0.014
 cartoons+0.014
Neg logit contributions
 three-0.011
sed-0.010
 hand-0.010
 finger-0.010
 pointed-0.009
Token up
Token ID510
Feature activation-1.08e-02

Pos logit contributions
rim+0.111
reating+0.102
text+0.099
Time+0.097
Ro+0.096
Neg logit contributions
 three-0.074
sed-0.072
 one-0.067
 "-0.065
 to-0.064
Token,
Token ID11
Feature activation+1.20e-02

Pos logit contributions
 three+0.008
 to+0.007
 "+0.007
 hand+0.006
 3+0.006
Neg logit contributions
rim-0.025
reating-0.024
THE-0.023
ilt-0.023
iver-0.023
Token she
Token ID673
Feature activation-6.99e-05

Pos logit contributions
rim+0.025
reating+0.023
iver+0.022
 accidents+0.022
ilt+0.021
Neg logit contributions
 three-0.013
sed-0.011
 3-0.011
 hand-0.011
 finger-0.010
Token saw
Token ID2497
Feature activation+1.25e-02

Pos logit contributions
sed+0.000
 three+0.000
cious+0.000
ses+0.000
2+0.000
Neg logit contributions
 accidents-0.000
reating-0.000
rim-0.000
THE-0.000
 emergency-0.000
Token that
Token ID326
Feature activation-2.20e-03

Pos logit contributions
rim+0.026
iver+0.025
 accidents+0.024
asis+0.024
adays+0.024
Neg logit contributions
 three-0.012
sed-0.010
 finger-0.010
 hand-0.009
 3-0.009
Token '\n'
Token ID198
Feature activation+1.15e-02

Pos logit contributions
 accidents+0.009
THE+0.007
reating+0.007
 mess+0.007
tex+0.007
Neg logit contributions
sed-0.007
cious-0.006
ses-0.006
 three-0.006
2-0.006
Token '\n'
Token ID198
Feature activation+1.44e-02

Pos logit contributions
 accidents+0.020
reating+0.017
rim+0.016
 emergency+0.016
 mess+0.016
Neg logit contributions
 three-0.015
sed-0.014
 hand-0.014
cious-0.014
 "-0.014
TokenAnd
Token ID1870
Feature activation-1.36e-02

Pos logit contributions
 accidents+0.023
rim+0.022
reating+0.021
 emergency+0.021
 bamboo+0.020
Neg logit contributions
 three-0.018
2-0.018
 eight-0.017
cious-0.017
 4-0.017
Token when
Token ID618
Feature activation+2.41e-02

Pos logit contributions
sed+0.019
 three+0.019
cious+0.017
2+0.017
 4+0.017
Neg logit contributions
 accidents-0.022
rim-0.020
reating-0.020
iver-0.019
 mess-0.019
Token Emily
Token ID17608
Feature activation+2.08e-02

Pos logit contributions
rim+0.042
reating+0.040
 accidents+0.040
iver+0.038
asis+0.036
Neg logit contributions
 three-0.029
sed-0.028
 hand-0.025
 4-0.025
 3-0.025
Token woke
Token ID19092
Feature activation+6.29e-02

Pos logit contributions
 accidents+0.036
reating+0.033
rim+0.031
 emergency+0.029
 cartoons+0.028
Neg logit contributions
 three-0.028
sed-0.027
cious-0.026
 pointed-0.025
 seven-0.024
Token up
Token ID510
Feature activation-5.25e-03

Pos logit contributions
rim+0.104
Got+0.098
reating+0.096
 accidents+0.095
THE+0.094
Neg logit contributions
 three-0.080
sed-0.071
 one-0.069
 finger-0.066
 to-0.066
Token in
Token ID287
Feature activation+2.71e-03

Pos logit contributions
 three+0.006
sed+0.005
 3+0.005
 hand+0.005
 4+0.004
Neg logit contributions
reating-0.010
 accidents-0.010
rim-0.010
THE-0.009
tex-0.009
Token the
Token ID262
Feature activation+1.86e-02

Pos logit contributions
 accidents+0.004
rim+0.004
reating+0.004
THE+0.004
iver+0.004
Neg logit contributions
 three-0.004
sed-0.004
 hand-0.004
 3-0.003
 4-0.003
Token morning
Token ID3329
Feature activation-2.19e-03

Pos logit contributions
rim+0.032
reating+0.029
 accidents+0.028
affles+0.025
THE+0.025
Neg logit contributions
 three-0.028
 hand-0.026
 3-0.025
 arms-0.024
 one-0.022
Token,
Token ID11
Feature activation+6.58e-03

Pos logit contributions
 three+0.003
sed+0.002
 3+0.002
 hand+0.002
 4+0.002
Neg logit contributions
 accidents-0.004
reating-0.004
rim-0.004
THE-0.004
iver-0.004
Token decided
Token ID3066
Feature activation+8.86e-03

Pos logit contributions
sed+0.002
ses+0.002
2+0.002
ogram+0.002
cious+0.002
Neg logit contributions
 accidents-0.002
 mess-0.002
THE-0.002
lf-0.002
 cancel-0.002
Token to
Token ID284
Feature activation+1.41e-02

Pos logit contributions
 accidents+0.021
reating+0.019
THE+0.018
 mess+0.018
rim+0.018
Neg logit contributions
sed-0.007
 three-0.006
2-0.005
ses-0.005
cious-0.005
Token play
Token ID711
Feature activation+6.75e-03

Pos logit contributions
 accidents+0.026
reating+0.022
rim+0.020
THE+0.020
 emergency+0.020
Neg logit contributions
 three-0.019
sed-0.018
 hand-0.017
 4-0.017
 finger-0.016
Token pirates
Token ID27516
Feature activation+1.47e-02

Pos logit contributions
 accidents+0.011
reating+0.009
rim+0.009
THE+0.009
 mess+0.008
Neg logit contributions
sed-0.010
 three-0.010
cious-0.009
2-0.009
ses-0.009
Token again
Token ID757
Feature activation-1.22e-03

Pos logit contributions
 accidents+0.028
reating+0.024
rim+0.024
THE+0.023
 emergency+0.023
Neg logit contributions
sed-0.018
 three-0.017
cious-0.016
2-0.015
ses-0.015
Token after
Token ID706
Feature activation+6.28e-02

Pos logit contributions
sed+0.002
ses+0.002
 eyebrows+0.002
sel+0.002
ogram+0.002
Neg logit contributions
 accidents-0.002
THE-0.001
 mess-0.001
reating-0.001
lf-0.001
Token lunch
Token ID9965
Feature activation+3.21e-02

Pos logit contributions
 accidents+0.103
rim+0.093
reating+0.092
iver+0.086
adays+0.084
Neg logit contributions
sed-0.084
 three-0.083
cious-0.075
ses-0.074
 4-0.073
Token.
Token ID13
Feature activation+9.10e-03

Pos logit contributions
 accidents+0.063
reating+0.056
 mess+0.055
THE+0.054
rim+0.053
Neg logit contributions
sed-0.037
ses-0.032
cious-0.031
 three-0.031
2-0.031
Token But
Token ID887
Feature activation+2.80e-02

Pos logit contributions
 accidents+0.019
reating+0.017
rim+0.016
THE+0.016
 mess+0.016
Neg logit contributions
sed-0.010
 three-0.008
cious-0.008
2-0.008
ses-0.008
Token this
Token ID428
Feature activation-3.10e-03

Pos logit contributions
 accidents+0.052
rim+0.049
iver+0.049
reating+0.049
ilt+0.046
Neg logit contributions
sed-0.033
 three-0.032
ses-0.028
 hand-0.027
 4-0.027
Token time
Token ID640
Feature activation+7.20e-03

Pos logit contributions
 three+0.004
sed+0.004
 seven+0.004
 one+0.004
 hand+0.004
Neg logit contributions
 accidents-0.005
reating-0.005
rim-0.005
ivering-0.004
iver-0.004
Token in
Token ID287
Feature activation-1.50e-02

Pos logit contributions
 three+0.009
sed+0.009
 hand+0.008
 seven+0.008
 finger+0.008
Neg logit contributions
 accidents-0.014
rim-0.013
reating-0.012
THE-0.012
iver-0.012
Token the
Token ID262
Feature activation-2.63e-03

Pos logit contributions
 three+0.018
sed+0.017
 hand+0.016
 4+0.016
cious+0.016
Neg logit contributions
 accidents-0.027
reating-0.024
rim-0.024
THE-0.023
iver-0.023
Token scary
Token ID14343
Feature activation-8.21e-03

Pos logit contributions
 three+0.005
sed+0.005
 seven+0.004
 4+0.004
 3+0.004
Neg logit contributions
 accidents-0.003
reating-0.003
rim-0.003
THE-0.002
iver-0.002
Token forest
Token ID8222
Feature activation-3.37e-02

Pos logit contributions
sed+0.018
 three+0.017
cious+0.016
ses+0.016
2+0.016
Neg logit contributions
 accidents-0.008
reating-0.006
rim-0.006
THE-0.006
 mess-0.005
Token"
Token ID1
Feature activation-4.31e-02

Pos logit contributions
sed+0.046
ses+0.041
cious+0.039
2+0.039
 three+0.039
Neg logit contributions
 accidents-0.059
reating-0.052
 mess-0.049
THE-0.049
rim-0.048
Token ' '
Token ID220
Feature activation-7.52e-02

Pos logit contributions
sed+0.074
ses+0.067
cious+0.065
 wrists+0.062
ence+0.061
Neg logit contributions
 accidents-0.063
reating-0.054
THE-0.049
 happen-0.048
iver-0.047
Token '\n'
Token ID198
Feature activation-1.07e-02

Pos logit contributions
sed+0.129
ses+0.117
 wrists+0.111
cious+0.100
 three+0.100
Neg logit contributions
 accidents-0.115
reating-0.105
THE-0.097
iver-0.096
rim-0.093
Token '\n'
Token ID198
Feature activation-4.99e-03

Pos logit contributions
sed+0.017
ses+0.016
 wrists+0.014
cious+0.014
sel+0.014
Neg logit contributions
 accidents-0.016
THE-0.015
reating-0.014
Later-0.014
rim-0.014
TokenJust
Token ID5703
Feature activation-8.44e-03

Pos logit contributions
sed+0.007
 three+0.007
cious+0.006
2+0.006
 4+0.006
Neg logit contributions
 accidents-0.009
reating-0.007
rim-0.007
 mess-0.007
 emergency-0.007
Token then
Token ID788
Feature activation-3.93e-03

Pos logit contributions
sed+0.015
cious+0.014
2+0.014
ses+0.014
sel+0.013
Neg logit contributions
 accidents-0.011
reating-0.008
 emergency-0.008
THE-0.008
 mess-0.008
Token,
Token ID11
Feature activation+1.53e-03

Pos logit contributions
sed+0.007
ses+0.006
2+0.006
cious+0.006
 three+0.006
Neg logit contributions
 accidents-0.006
reating-0.005
THE-0.004
 emergency-0.004
 mess-0.004
Token called
Token ID1444
Feature activation-5.75e-03

Pos logit contributions
 accidents+0.002
THE+0.001
reating+0.001
 mess+0.001
lf+0.001
Neg logit contributions
sed-0.002
ses-0.002
cious-0.002
2-0.002
 finger-0.001
Token him
Token ID683
Feature activation-1.37e-02

Pos logit contributions
sed+0.009
cious+0.008
2+0.008
 three+0.008
ses+0.008
Neg logit contributions
 accidents-0.009
reating-0.008
THE-0.007
rim-0.007
 mess-0.007
Token "
Token ID366
Feature activation-1.59e-02

Pos logit contributions
sed+0.019
2+0.017
ses+0.017
cious+0.016
sel+0.016
Neg logit contributions
 accidents-0.023
reating-0.019
THE-0.019
adays-0.018
 emergency-0.018
TokenBob
Token ID18861
Feature activation-1.45e-02

Pos logit contributions
sed+0.028
ses+0.026
cious+0.024
 three+0.024
 finger+0.024
Neg logit contributions
 accidents-0.022
reating-0.019
THE-0.018
rim-0.018
iver-0.017
Token the
Token ID262
Feature activation-1.12e-02

Pos logit contributions
sed+0.023
ses+0.021
2+0.021
cious+0.020
 wrists+0.020
Neg logit contributions
 accidents-0.021
reating-0.017
rim-0.017
THE-0.017
 mess-0.017
Token Explorer
Token ID19142
Feature activation-7.15e-02

Pos logit contributions
sed+0.017
cious+0.016
sel+0.016
2+0.015
ses+0.015
Neg logit contributions
 accidents-0.017
 emergency-0.014
 mess-0.014
reating-0.014
 hurricane-0.014
Token"
Token ID1
Feature activation-3.51e-02

Pos logit contributions
sed+0.123
ses+0.112
 wrists+0.106
cious+0.097
 finger+0.095
Neg logit contributions
 accidents-0.101
reating-0.091
THE-0.090
rim-0.088
iver-0.085
Token and
Token ID290
Feature activation-9.54e-03

Pos logit contributions
sed+0.061
ses+0.056
cious+0.053
 finger+0.052
 wrists+0.051
Neg logit contributions
 accidents-0.050
reating-0.044
THE-0.040
rim-0.037
 happen-0.036
Token all
Token ID477
Feature activation-2.39e-02

Pos logit contributions
sed+0.014
ses+0.013
cious+0.012
2+0.012
 three+0.012
Neg logit contributions
 accidents-0.016
reating-0.013
 mess-0.012
adays-0.012
 happen-0.012
Token of
Token ID286
Feature activation+2.26e-02

Pos logit contributions
sed+0.039
ses+0.035
cious+0.035
 three+0.034
2+0.034
Neg logit contributions
 accidents-0.037
reating-0.031
rim-0.030
THE-0.029
 mess-0.029
Token his
Token ID465
Feature activation-3.53e-03

Pos logit contributions
 accidents+0.026
reating+0.021
THE+0.020
 emergency+0.019
rim+0.019
Neg logit contributions
sed-0.046
ses-0.042
cious-0.041
2-0.040
sel-0.040
Token "
Token ID366
Feature activation-1.21e-02

Pos logit contributions
sed+0.031
cious+0.029
ses+0.028
sel+0.027
 wrists+0.026
Neg logit contributions
 accidents-0.023
reating-0.019
 emergency-0.017
 cartoons-0.016
THE-0.016
TokenWhat
Token ID2061
Feature activation-3.59e-02

Pos logit contributions
sed+0.026
ses+0.024
cious+0.023
 three+0.022
sel+0.022
Neg logit contributions
 accidents-0.013
reating-0.011
THE-0.010
rim-0.010
 mess-0.009
Token are
Token ID389
Feature activation+1.35e-02

Pos logit contributions
sed+0.065
ses+0.057
sel+0.057
cious+0.057
 seven+0.057
Neg logit contributions
 accidents-0.049
reating-0.039
 happen-0.038
 mess-0.037
 emergency-0.036
Token you
Token ID345
Feature activation-1.44e-02

Pos logit contributions
 accidents+0.013
reating+0.010
rim+0.009
 emergency+0.008
THE+0.008
Neg logit contributions
sed-0.029
cious-0.027
ses-0.027
 three-0.027
2-0.027
Token doing
Token ID1804
Feature activation-5.71e-03

Pos logit contributions
sed+0.029
cious+0.027
sel+0.027
ses+0.027
 wrists+0.026
Neg logit contributions
 accidents-0.016
 mess-0.012
reating-0.011
 emergency-0.011
iver-0.011
Token ' '
Token ID220
Feature activation-7.10e-02

Pos logit contributions
 three+0.009
sed+0.008
 4+0.008
2+0.008
 3+0.008
Neg logit contributions
 accidents-0.009
reating-0.008
rim-0.008
THE-0.007
tex-0.007
Token?"
Token ID1701
Feature activation-2.74e-02

Pos logit contributions
sed+0.115
ses+0.103
 three+0.101
cious+0.101
2+0.097
Neg logit contributions
 accidents-0.109
reating-0.094
rim-0.089
THE-0.087
iver-0.085
Token The
Token ID383
Feature activation-2.54e-02

Pos logit contributions
sed+0.053
ses+0.049
cious+0.046
sel+0.043
ige+0.042
Neg logit contributions
 accidents-0.035
 Bun-0.033
 noise-0.032
 him-0.030
 cartoons-0.030
Token tube
Token ID12403
Feature activation+1.62e-03

Pos logit contributions
sed+0.037
 three+0.035
2+0.033
ses+0.032
cious+0.032
Neg logit contributions
 accidents-0.042
reating-0.037
rim-0.035
THE-0.034
iver-0.033
Token didn
Token ID1422
Feature activation+1.06e-02

Pos logit contributions
 accidents+0.003
 mess+0.002
lf+0.002
 noise+0.002
 happen+0.002
Neg logit contributions
cious-0.002
sed-0.002
2-0.002
ses-0.002
sel-0.002
Token't
Token ID470
Feature activation-6.89e-03

Pos logit contributions
 accidents+0.017
reating+0.015
rim+0.015
 mess+0.014
THE+0.014
Neg logit contributions
sed-0.015
 three-0.015
2-0.014
 4-0.013
cious-0.013
Token will
Token ID481
Feature activation-1.28e-03

Pos logit contributions
sed+0.037
cious+0.033
ses+0.033
2+0.032
 three+0.031
Neg logit contributions
 accidents-0.041
 mess-0.036
rim-0.036
iver-0.036
THE-0.035
Token understand
Token ID1833
Feature activation-1.97e-02

Pos logit contributions
sed+0.002
 three+0.002
cious+0.002
2+0.002
ses+0.002
Neg logit contributions
 accidents-0.002
reating-0.002
rim-0.002
THE-0.002
 mess-0.002
Token you
Token ID345
Feature activation-1.89e-02

Pos logit contributions
sed+0.040
ses+0.037
cious+0.035
sel+0.034
ogram+0.032
Neg logit contributions
 accidents-0.026
reating-0.022
THE-0.019
 cartoons-0.018
 happen-0.018
Token better
Token ID1365
Feature activation+1.28e-03

Pos logit contributions
sed+0.031
ses+0.028
cious+0.026
2+0.026
sel+0.026
Neg logit contributions
 accidents-0.028
reating-0.024
 happen-0.023
THE-0.023
 mess-0.022
Token"
Token ID1
Feature activation-4.45e-02

Pos logit contributions
 accidents+0.002
reating+0.002
 mess+0.001
THE+0.001
 happen+0.001
Neg logit contributions
sed-0.002
ses-0.002
cious-0.002
2-0.002
sel-0.002
Token ' '
Token ID220
Feature activation-6.91e-02

Pos logit contributions
sed+0.082
ses+0.076
cious+0.066
 wrists+0.066
 finger+0.065
Neg logit contributions
 accidents-0.064
reating-0.055
THE-0.050
 off-0.048
 happen-0.046
Token '\n'
Token ID198
Feature activation-1.28e-02

Pos logit contributions
sed+0.122
ses+0.114
 wrists+0.102
 finger+0.092
cious+0.091
Neg logit contributions
 accidents-0.100
reating-0.094
THE-0.088
iver-0.088
rim-0.086
Token '\n'
Token ID198
Feature activation-7.83e-03

Pos logit contributions
sed+0.023
ses+0.021
 finger+0.018
cious+0.017
 wrists+0.017
Neg logit contributions
 accidents-0.018
THE-0.018
reating-0.016
rim-0.016
Later-0.016
TokenThey
Token ID2990
Feature activation-3.02e-03

Pos logit contributions
sed+0.011
 three+0.011
cious+0.010
2+0.010
 4+0.010
Neg logit contributions
 accidents-0.013
reating-0.011
rim-0.011
 mess-0.011
 emergency-0.010
Token sk
Token ID1341
Feature activation-6.32e-04

Pos logit contributions
sed+0.005
ses+0.005
cious+0.004
2+0.004
ogram+0.004
Neg logit contributions
 accidents-0.004
THE-0.004
reating-0.004
tex-0.003
adays-0.003
Tokenated
Token ID515
Feature activation+3.11e-03

Pos logit contributions
sed+0.001
ses+0.001
 three+0.001
cious+0.001
 seven+0.001
Neg logit contributions
 accidents-0.001
rim-0.001
reating-0.001
THE-0.001
iver-0.001
Token jealous
Token ID19354
Feature activation+3.69e-03

Pos logit contributions
 accidents+0.013
reating+0.012
rim+0.011
THE+0.011
 mess+0.010
Neg logit contributions
sed-0.009
 three-0.009
cious-0.008
 4-0.008
 seven-0.008
Token,
Token ID11
Feature activation-1.13e-02

Pos logit contributions
 accidents+0.007
reating+0.006
 mess+0.005
THE+0.005
rim+0.005
Neg logit contributions
sed-0.005
cious-0.005
 three-0.005
ses-0.005
2-0.005
Token because
Token ID780
Feature activation+4.82e-03

Pos logit contributions
sed+0.015
 three+0.013
cious+0.013
2+0.013
ses+0.013
Neg logit contributions
 accidents-0.021
reating-0.018
rim-0.017
THE-0.017
 mess-0.017
Token the
Token ID262
Feature activation-2.87e-02

Pos logit contributions
 accidents+0.007
reating+0.006
rim+0.006
THE+0.006
 mess+0.005
Neg logit contributions
sed-0.008
 three-0.007
cious-0.007
2-0.007
ses-0.007
Token ch
Token ID442
Feature activation-3.33e-02

Pos logit contributions
sed+0.046
cious+0.045
sel+0.045
ogram+0.044
2+0.043
Neg logit contributions
 accidents-0.039
 mess-0.034
 emergency-0.033
 noise-0.033
 bamboo-0.032
Tokename
Token ID480
Feature activation-6.83e-02

Pos logit contributions
cious+0.052
sed+0.050
 wrists+0.049
 three+0.049
 4+0.049
Neg logit contributions
 accidents-0.045
rim-0.045
ilt-0.038
 emergency-0.037
iver-0.036
Tokenleon
Token ID38970
Feature activation-3.44e-02

Pos logit contributions
sed+0.105
 wrists+0.103
ses+0.101
2+0.094
cious+0.094
Neg logit contributions
 accidents-0.096
rim-0.088
ie-0.085
 fog-0.080
lf-0.080
Token was
Token ID373
Feature activation+6.42e-03

Pos logit contributions
sed+0.051
2+0.049
ses+0.047
cious+0.047
 three+0.046
Neg logit contributions
 accidents-0.052
rim-0.043
ie-0.042
THE-0.041
 mess-0.040
Token very
Token ID845
Feature activation-2.29e-03

Pos logit contributions
 accidents+0.010
 mess+0.008
reating+0.008
rim+0.007
 happen+0.007
Neg logit contributions
sed-0.010
2-0.010
cious-0.010
sel-0.009
ses-0.009
Token colorful
Token ID20239
Feature activation+1.11e-02

Pos logit contributions
sed+0.004
2+0.004
ses+0.004
sel+0.003
cious+0.003
Neg logit contributions
 accidents-0.003
 mess-0.003
rim-0.003
 emergency-0.002
 happen-0.002
Token and
Token ID290
Feature activation-8.12e-04

Pos logit contributions
 accidents+0.022
reating+0.020
THE+0.019
 mess+0.019
rim+0.018
Neg logit contributions
sed-0.012
cious-0.011
 three-0.011
2-0.011
ses-0.010
Token is
Token ID318
Feature activation+2.66e-03

Pos logit contributions
sed+0.019
2+0.018
 three+0.017
ses+0.017
cious+0.017
Neg logit contributions
 accidents-0.016
 happen-0.012
rim-0.012
 mess-0.012
ie-0.012
Token mine
Token ID6164
Feature activation+4.03e-03

Pos logit contributions
 accidents+0.004
 mess+0.003
reating+0.003
rim+0.003
 emergency+0.003
Neg logit contributions
sed-0.004
2-0.004
cious-0.004
ses-0.004
sel-0.004
Token!"
Token ID2474
Feature activation-2.31e-02

Pos logit contributions
 accidents+0.006
reating+0.005
THE+0.005
 mess+0.004
 happen+0.004
Neg logit contributions
sed-0.007
2-0.006
ses-0.006
cious-0.006
sel-0.006
Token The
Token ID383
Feature activation-3.07e-02

Pos logit contributions
sed+0.029
cious+0.025
ses+0.025
sel+0.024
ogram+0.024
Neg logit contributions
 accidents-0.044
THE-0.035
 mess-0.035
 noise-0.035
adays-0.035
Token ch
Token ID442
Feature activation-3.63e-02

Pos logit contributions
sed+0.049
ogram+0.047
cious+0.046
sel+0.045
ses+0.044
Neg logit contributions
 accidents-0.043
 mess-0.037
 emergency-0.035
 noise-0.035
 bamboo-0.035
Tokename
Token ID480
Feature activation-6.81e-02

Pos logit contributions
sed+0.056
cious+0.055
 three+0.052
sel+0.051
 4+0.051
Neg logit contributions
 accidents-0.053
rim-0.049
ilt-0.043
 mess-0.042
 emergency-0.042
Tokenleon
Token ID38970
Feature activation-2.62e-02

Pos logit contributions
sed+0.098
ses+0.091
 wrists+0.090
cious+0.087
2+0.085
Neg logit contributions
 accidents-0.106
rim-0.095
ie-0.090
lf-0.088
 fog-0.087
Token could
Token ID714
Feature activation-1.77e-04

Pos logit contributions
2+0.035
sed+0.035
cious+0.033
ses+0.033
sel+0.032
Neg logit contributions
 accidents-0.042
rim-0.035
ie-0.034
 mess-0.033
THE-0.032
Token tell
Token ID1560
Feature activation-1.37e-02

Pos logit contributions
sed+0.000
 three+0.000
cious+0.000
2+0.000
ses+0.000
Neg logit contributions
 accidents-0.000
reating-0.000
rim-0.000
THE-0.000
 mess-0.000
Token he
Token ID339
Feature activation-1.94e-02

Pos logit contributions
sed+0.023
cious+0.021
ses+0.021
 three+0.021
2+0.021
Neg logit contributions
 accidents-0.020
reating-0.016
 emergency-0.015
 mess-0.015
rim-0.015
Token was
Token ID373
Feature activation-8.18e-03

Pos logit contributions
sed+0.026
2+0.025
ses+0.024
cious+0.024
sel+0.023
Neg logit contributions
 accidents-0.033
 mess-0.027
 happen-0.027
ie-0.026
reating-0.026
Token pile
Token ID14540
Feature activation-8.13e-03

Pos logit contributions
 accidents+0.004
reating+0.004
rim+0.003
THE+0.003
 mess+0.003
Neg logit contributions
sed-0.005
 three-0.005
cious-0.005
2-0.005
ses-0.005
Token of
Token ID286
Feature activation+1.06e-02

Pos logit contributions
sed+0.013
 three+0.012
cious+0.012
ses+0.012
2+0.012
Neg logit contributions
 accidents-0.012
reating-0.010
rim-0.010
THE-0.010
 mess-0.009
Token paw
Token ID41827
Feature activation+1.13e-02

Pos logit contributions
 accidents+0.016
reating+0.013
rim+0.013
 mess+0.013
THE+0.012
Neg logit contributions
sed-0.017
cious-0.016
 three-0.015
2-0.015
ses-0.015
Token prints
Token ID20842
Feature activation+1.35e-03

Pos logit contributions
 accidents+0.020
reating+0.017
rim+0.016
THE+0.016
 mess+0.016
Neg logit contributions
sed-0.015
 three-0.014
cious-0.013
2-0.013
ses-0.013
Token"
Token ID1
Feature activation-3.96e-02

Pos logit contributions
 accidents+0.002
reating+0.002
rim+0.002
THE+0.002
 mess+0.002
Neg logit contributions
sed-0.002
 three-0.002
cious-0.002
2-0.002
ses-0.002
Token
Token ID220
Feature activation-6.71e-02

Pos logit contributions
sed+0.064
ses+0.058
cious+0.055
 finger+0.052
ence+0.051
Neg logit contributions
 accidents-0.064
reating-0.056
THE-0.051
 cartoons-0.049
 mess-0.049
Token
Token ID198
Feature activation-1.25e-02

Pos logit contributions
sed+0.118
ses+0.109
 wrists+0.099
cious+0.092
 three+0.090
Neg logit contributions
 accidents-0.101
reating-0.091
iver-0.086
THE-0.085
rim-0.084
Token
Token ID198
Feature activation-4.45e-03

Pos logit contributions
sed+0.018
ses+0.015
cious+0.015
 three+0.014
sel+0.014
Neg logit contributions
 accidents-0.022
THE-0.020
reating-0.019
rim-0.019
uggle-0.018
TokenL
Token ID43
Feature activation+1.75e-03

Pos logit contributions
sed+0.006
 three+0.006
2+0.006
cious+0.006
 4+0.006
Neg logit contributions
 accidents-0.007
reating-0.006
 emergency-0.006
 mess-0.006
rim-0.006
Tokenucky
Token ID5309
Feature activation+1.68e-02

Pos logit contributions
 accidents+0.003
reating+0.002
rim+0.002
 mess+0.002
THE+0.002
Neg logit contributions
sed-0.003
 three-0.003
2-0.003
cious-0.003
ses-0.002
Token smiled
Token ID13541
Feature activation+1.33e-03

Pos logit contributions
 accidents+0.029
 mess+0.025
THE+0.024
rim+0.023
lf+0.023
Neg logit contributions
sed-0.023
ses-0.021
ogram-0.020
cious-0.020
2-0.019
Token Sara
Token ID24799
Feature activation-1.60e-02

Pos logit contributions
sed+0.033
cious+0.030
ses+0.029
sel+0.029
2+0.029
Neg logit contributions
 accidents-0.029
 mess-0.022
 bamboo-0.022
rim-0.022
reating-0.022
Token says
Token ID1139
Feature activation-1.50e-02

Pos logit contributions
sed+0.026
cious+0.024
ogram+0.024
sel+0.024
ses+0.023
Neg logit contributions
 accidents-0.020
 mess-0.018
 noise-0.017
iver-0.016
 cartoons-0.016
Token.
Token ID13
Feature activation-1.49e-02

Pos logit contributions
sed+0.027
ogram+0.027
ses+0.027
sel+0.025
ence+0.025
Neg logit contributions
 accidents-0.016
 off-0.016
 from-0.015
 him-0.013
 disappear-0.012
Token She
Token ID1375
Feature activation-7.24e-03

Pos logit contributions
sel+0.028
chie+0.027
ence+0.027
cious+0.026
ogram+0.026
Neg logit contributions
 accidents-0.013
 him-0.012
 Tweet-0.011
 off-0.011
Later-0.011
Token reaches
Token ID12229
Feature activation-1.81e-02

Pos logit contributions
sed+0.010
cious+0.009
 three+0.009
2+0.009
ses+0.009
Neg logit contributions
 accidents-0.012
reating-0.010
 mess-0.010
rim-0.010
THE-0.010
Token out
Token ID503
Feature activation-6.46e-02

Pos logit contributions
cious+0.033
sel+0.032
sed+0.032
2+0.032
ogram+0.030
Neg logit contributions
 accidents-0.020
 off-0.018
reating-0.017
 bamboo-0.017
 fog-0.016
Token her
Token ID607
Feature activation-1.12e-02

Pos logit contributions
sed+0.146
ses+0.136
cious+0.130
2+0.125
sel+0.125
Neg logit contributions
 accidents-0.064
reating-0.053
 off-0.049
 bamboo-0.046
 fog-0.043
Token hand
Token ID1021
Feature activation-1.49e-02

Pos logit contributions
sel+0.010
2+0.010
4+0.009
 4+0.009
cious+0.008
Neg logit contributions
 accidents-0.022
 mess-0.021
 mud-0.020
 bamboo-0.020
 off-0.020
Token to
Token ID284
Feature activation-4.97e-03

Pos logit contributions
sed+0.028
2+0.027
ses+0.026
cious+0.026
chie+0.026
Neg logit contributions
 off-0.015
 accidents-0.015
 from-0.014
reating-0.011
 bamboo-0.011
Token touch
Token ID3638
Feature activation-1.89e-02

Pos logit contributions
sed+0.008
cious+0.008
ogram+0.008
sel+0.008
ence+0.008
Neg logit contributions
 accidents-0.007
 happen-0.006
reating-0.006
 bamboo-0.006
 cartoons-0.005
Token the
Token ID262
Feature activation-9.21e-03

Pos logit contributions
sed+0.037
cious+0.033
ses+0.032
2+0.032
ence+0.032
Neg logit contributions
 accidents-0.025
reating-0.020
 bamboo-0.019
 fog-0.018
THE-0.018
Token Anna
Token ID11735
Feature activation-2.45e-02

Pos logit contributions
sed+0.041
cious+0.037
ses+0.036
2+0.036
sel+0.034
Neg logit contributions
 accidents-0.042
 mess-0.034
reating-0.034
 cartoons-0.033
THE-0.033
Token said
Token ID531
Feature activation-1.07e-02

Pos logit contributions
sed+0.041
ses+0.040
ogram+0.037
chie+0.036
ence+0.036
Neg logit contributions
 accidents-0.027
 noise-0.026
 from-0.025
ie-0.024
 off-0.023
Token.
Token ID13
Feature activation-1.81e-02

Pos logit contributions
sed+0.019
ses+0.018
ogram+0.017
sel+0.017
2+0.017
Neg logit contributions
 accidents-0.013
 off-0.012
 from-0.011
 happen-0.010
 noise-0.010
Token She
Token ID1375
Feature activation-5.76e-03

Pos logit contributions
sel+0.032
cious+0.031
ence+0.031
ogram+0.030
ige+0.029
Neg logit contributions
 accidents-0.018
 cartoons-0.015
Ro-0.015
 Bun-0.015
 showers-0.015
Token reached
Token ID4251
Feature activation-1.86e-02

Pos logit contributions
ogram+0.008
sed+0.008
ses+0.008
cious+0.007
sel+0.007
Neg logit contributions
 accidents-0.009
 mess-0.007
lf-0.007
 fog-0.007
tex-0.007
Token out
Token ID503
Feature activation-6.45e-02

Pos logit contributions
cious+0.036
sed+0.035
sel+0.035
2+0.034
ence+0.032
Neg logit contributions
 accidents-0.018
 off-0.016
reating-0.015
 bamboo-0.015
 fog-0.015
Token her
Token ID607
Feature activation-9.84e-03

Pos logit contributions
sed+0.137
ses+0.127
cious+0.125
2+0.119
sel+0.119
Neg logit contributions
 accidents-0.073
reating-0.058
 cartoons-0.050
 off-0.049
 bamboo-0.048
Token hand
Token ID1021
Feature activation-1.62e-02

Pos logit contributions
sel+0.009
2+0.009
4+0.008
 4+0.008
cious+0.008
Neg logit contributions
 accidents-0.019
 mess-0.018
 mud-0.017
 off-0.017
 snow-0.017
Token to
Token ID284
Feature activation-6.19e-03

Pos logit contributions
sed+0.030
ses+0.028
2+0.028
cious+0.027
chie+0.027
Neg logit contributions
 accidents-0.017
 off-0.017
 from-0.015
reating-0.013
iver-0.013
Token touch
Token ID3638
Feature activation-1.98e-02

Pos logit contributions
sed+0.010
cious+0.009
ogram+0.009
sel+0.009
ence+0.009
Neg logit contributions
 accidents-0.009
 happen-0.008
reating-0.007
 bamboo-0.007
 cartoons-0.007
Token the
Token ID262
Feature activation-8.61e-03

Pos logit contributions
sed+0.039
cious+0.034
ses+0.034
2+0.034
ence+0.033
Neg logit contributions
 accidents-0.026
reating-0.021
 bamboo-0.019
 fog-0.019
THE-0.018
Token Lily
Token ID20037
Feature activation+7.41e-04

Pos logit contributions
sed+0.041
cious+0.038
ses+0.037
2+0.037
 three+0.036
Neg logit contributions
 accidents-0.027
 mess-0.020
reating-0.020
 cartoons-0.019
rim-0.019
Token says
Token ID1139
Feature activation-1.35e-02

Pos logit contributions
 accidents+0.001
 mess+0.001
reating+0.001
 cartoons+0.001
 noise+0.001
Neg logit contributions
sed-0.001
cious-0.001
sel-0.001
 three-0.001
ogram-0.001
Token.
Token ID13
Feature activation-1.13e-02

Pos logit contributions
ogram+0.024
sed+0.023
ses+0.022
 seven+0.022
2+0.022
Neg logit contributions
 accidents-0.016
 off-0.014
 from-0.013
 happen-0.012
 noise-0.011
Token She
Token ID1375
Feature activation-5.29e-03

Pos logit contributions
sel+0.021
ence+0.020
cious+0.019
ogram+0.019
chie+0.019
Neg logit contributions
 accidents-0.011
 cartoons-0.009
 off-0.009
Ro-0.009
 showers-0.009
Token reaches
Token ID12229
Feature activation-1.97e-02

Pos logit contributions
sed+0.008
 three+0.007
cious+0.007
2+0.007
ses+0.007
Neg logit contributions
 accidents-0.009
reating-0.007
 mess-0.007
rim-0.007
THE-0.007
Token out
Token ID503
Feature activation-6.45e-02

Pos logit contributions
cious+0.036
sed+0.036
sel+0.035
2+0.035
ogram+0.033
Neg logit contributions
 accidents-0.022
 off-0.019
reating-0.018
 bamboo-0.018
 fog-0.018
Token her
Token ID607
Feature activation-1.11e-02

Pos logit contributions
sed+0.135
ses+0.125
cious+0.122
2+0.116
sel+0.115
Neg logit contributions
 accidents-0.078
reating-0.061
 cartoons-0.054
 off-0.053
 bamboo-0.052
Token arm
Token ID3211
Feature activation+1.49e-02

Pos logit contributions
2+0.010
sel+0.009
4+0.009
 4+0.009
cious+0.008
Neg logit contributions
 accidents-0.022
 mess-0.020
 bamboo-0.020
 mud-0.020
 off-0.020
Token to
Token ID284
Feature activation+1.28e-03

Pos logit contributions
 accidents+0.019
reating+0.017
 bamboo+0.016
 off+0.016
 fog+0.016
Neg logit contributions
sed-0.024
cious-0.024
2-0.023
chie-0.023
 4-0.023
Token touch
Token ID3638
Feature activation-1.76e-02

Pos logit contributions
 accidents+0.002
 bamboo+0.002
reating+0.002
 happen+0.002
 mess+0.002
Neg logit contributions
sed-0.002
ogram-0.002
cious-0.002
sel-0.002
ses-0.002
Token the
Token ID262
Feature activation-7.76e-03

Pos logit contributions
sed+0.033
cious+0.030
ses+0.029
2+0.029
ence+0.028
Neg logit contributions
 accidents-0.024
reating-0.020
THE-0.018
 bamboo-0.017
 emergency-0.017
Token.
Token ID13
Feature activation-1.63e-02

Pos logit contributions
sed+0.005
ses+0.005
2+0.005
ogram+0.005
 seven+0.005
Neg logit contributions
 accidents-0.004
 off-0.003
 happen-0.003
 noise-0.003
 from-0.003
Token
Token ID198
Feature activation-5.26e-03

Pos logit contributions
sed+0.021
ses+0.019
cious+0.018
sel+0.018
 finger+0.016
Neg logit contributions
 accidents-0.030
lf-0.025
 cartoons-0.025
reating-0.025
 emergency-0.024
Token
Token ID198
Feature activation-2.94e-03

Pos logit contributions
sed+0.007
ses+0.006
 three+0.006
cious+0.006
2+0.006
Neg logit contributions
 accidents-0.009
reating-0.008
THE-0.008
rim-0.008
 mess-0.008
TokenJohn
Token ID7554
Feature activation-9.27e-03

Pos logit contributions
sed+0.005
 three+0.005
cious+0.005
ses+0.005
2+0.005
Neg logit contributions
 accidents-0.004
reating-0.003
rim-0.003
THE-0.003
 mess-0.003
Token reached
Token ID4251
Feature activation-2.06e-02

Pos logit contributions
ses+0.014
sed+0.014
cious+0.012
ogram+0.012
sel+0.011
Neg logit contributions
 accidents-0.013
ie-0.012
lf-0.012
 mess-0.011
iver-0.011
Token out
Token ID503
Feature activation-6.40e-02

Pos logit contributions
sed+0.041
cious+0.040
ses+0.036
sel+0.034
 wrists+0.033
Neg logit contributions
 accidents-0.024
reating-0.022
 off-0.022
 mess-0.021
 cartoons-0.021
Token and
Token ID290
Feature activation-1.57e-02

Pos logit contributions
sed+0.134
ses+0.126
cious+0.119
 wrists+0.110
ogram+0.107
Neg logit contributions
 accidents-0.084
reating-0.068
 off-0.062
 cartoons-0.061
 mess-0.058
Token tried
Token ID3088
Feature activation-2.16e-02

Pos logit contributions
sed+0.026
ses+0.026
cious+0.023
sel+0.020
 ago+0.020
Neg logit contributions
 accidents-0.022
 off-0.019
 forth-0.018
 mess-0.018
lf-0.018
Token to
Token ID284
Feature activation-5.26e-03

Pos logit contributions
sed+0.036
ses+0.032
sel+0.030
cious+0.030
 seven+0.027
Neg logit contributions
 accidents-0.038
reating-0.033
 off-0.031
 mess-0.031
 happen-0.029
Token stretch
Token ID7539
Feature activation-2.69e-02

Pos logit contributions
sed+0.009
ses+0.008
cious+0.008
sel+0.008
2+0.007
Neg logit contributions
 accidents-0.008
reating-0.006
 mess-0.006
 happen-0.006
iver-0.006
Token it
Token ID340
Feature activation+5.49e-04

Pos logit contributions
sed+0.053
cious+0.049
ses+0.048
2+0.046
sel+0.044
Neg logit contributions
 accidents-0.033
reating-0.028
THE-0.025
 happen-0.024
 mess-0.024
Token sharp
Token ID7786
Feature activation+8.93e-03

Pos logit contributions
 three+0.025
sed+0.025
 finger+0.023
 hand+0.023
 seven+0.022
Neg logit contributions
 accidents-0.030
reating-0.026
rim-0.025
THE-0.025
iver-0.025
Token object
Token ID2134
Feature activation-1.24e-02

Pos logit contributions
 accidents+0.013
reating+0.011
 mess+0.010
 emergency+0.010
rim+0.010
Neg logit contributions
sed-0.015
cious-0.013
2-0.013
 three-0.013
ses-0.013
Token on
Token ID319
Feature activation-1.36e-02

Pos logit contributions
sed+0.019
2+0.018
cious+0.018
ses+0.018
 4+0.017
Neg logit contributions
 accidents-0.019
reating-0.015
 happen-0.014
 mess-0.014
THE-0.014
Token the
Token ID262
Feature activation-8.56e-03

Pos logit contributions
 three+0.014
sed+0.013
 4+0.012
 seven+0.012
 hand+0.012
Neg logit contributions
 accidents-0.028
rim-0.025
reating-0.025
iver-0.024
THE-0.024
Token ground
Token ID2323
Feature activation-1.81e-02

Pos logit contributions
sed+0.018
ses+0.017
2+0.017
cious+0.017
ogram+0.017
Neg logit contributions
 accidents-0.007
reating-0.006
THE-0.006
 mess-0.006
 emergency-0.005
Token below
Token ID2174
Feature activation-6.38e-02

Pos logit contributions
sed+0.027
ses+0.024
cious+0.024
2+0.023
 three+0.022
Neg logit contributions
 accidents-0.030
reating-0.026
 mess-0.026
THE-0.025
 happen-0.024
Token.
Token ID13
Feature activation-7.19e-04

Pos logit contributions
sed+0.081
cious+0.070
ses+0.070
2+0.069
 three+0.068
Neg logit contributions
 accidents-0.119
reating-0.106
THE-0.102
rim-0.100
 emergency-0.099
Token Curious
Token ID44269
Feature activation-1.95e-02

Pos logit contributions
sed+0.001
cious+0.001
ses+0.001
sel+0.001
ogram+0.001
Neg logit contributions
 accidents-0.001
reating-0.001
 mess-0.001
 emergency-0.001
 bamboo-0.001
Token,
Token ID11
Feature activation-1.22e-02

Pos logit contributions
sed+0.026
 three+0.024
2+0.023
cious+0.022
 4+0.022
Neg logit contributions
 accidents-0.034
reating-0.031
rim-0.030
THE-0.029
iver-0.029
Token he
Token ID339
Feature activation-4.20e-03

Pos logit contributions
sed+0.021
 three+0.020
cious+0.019
2+0.019
ses+0.019
Neg logit contributions
 accidents-0.017
reating-0.014
rim-0.013
THE-0.013
 mess-0.013
Token quickly
Token ID2952
Feature activation-1.13e-02

Pos logit contributions
sed+0.007
ses+0.006
cious+0.006
 three+0.006
2+0.006
Neg logit contributions
 accidents-0.006
reating-0.005
THE-0.005
rim-0.005
 mess-0.005
Token.
Token ID13
Feature activation-6.72e-03

Pos logit contributions
sed+0.013
 three+0.012
cious+0.011
2+0.011
ses+0.011
Neg logit contributions
 accidents-0.026
reating-0.023
rim-0.022
THE-0.022
 mess-0.022
Token
Token ID198
Feature activation-9.35e-03

Pos logit contributions
sed+0.010
sel+0.010
ses+0.009
cious+0.009
 seven+0.009
Neg logit contributions
 accidents-0.010
THE-0.008
Later-0.008
rim-0.008
adays-0.008
Token
Token ID198
Feature activation-3.68e-03

Pos logit contributions
sed+0.013
ses+0.012
cious+0.011
 three+0.011
 wrists+0.011
Neg logit contributions
 accidents-0.016
THE-0.014
rim-0.014
reating-0.014
iver-0.013
TokenBob
Token ID18861
Feature activation-8.78e-03

Pos logit contributions
cious+0.006
 three+0.006
sed+0.006
2+0.006
 4+0.005
Neg logit contributions
 accidents-0.006
reating-0.005
 mess-0.004
 cartoons-0.004
 emergency-0.004
Token pulled
Token ID5954
Feature activation-1.39e-02

Pos logit contributions
sed+0.011
ses+0.010
cious+0.010
ogram+0.010
2+0.009
Neg logit contributions
 accidents-0.015
 mess-0.013
rim-0.013
THE-0.013
ie-0.012
Token out
Token ID503
Feature activation-6.38e-02

Pos logit contributions
sed+0.020
 three+0.019
2+0.018
cious+0.018
 4+0.018
Neg logit contributions
 accidents-0.023
reating-0.020
rim-0.020
THE-0.019
 mess-0.019
Token his
Token ID465
Feature activation-9.73e-03

Pos logit contributions
sed+0.099
cious+0.088
ses+0.088
 wrists+0.082
2+0.079
Neg logit contributions
 accidents-0.110
reating-0.090
 bamboo-0.085
 emergency-0.084
 fog-0.084
Token finger
Token ID7660
Feature activation-8.64e-03

Pos logit contributions
sel+0.014
2+0.013
 4+0.013
cious+0.013
ence+0.013
Neg logit contributions
 accidents-0.017
 mess-0.015
 emergency-0.014
 mud-0.014
iver-0.014
Token and
Token ID290
Feature activation-1.91e-03

Pos logit contributions
sed+0.014
ses+0.013
cious+0.013
2+0.012
sel+0.012
Neg logit contributions
 accidents-0.014
reating-0.010
 mess-0.010
iver-0.010
 off-0.010
Token slowly
Token ID6364
Feature activation-1.48e-02

Pos logit contributions
sed+0.003
ses+0.002
cious+0.002
2+0.002
 three+0.002
Neg logit contributions
 accidents-0.003
 mess-0.003
reating-0.003
lf-0.003
rim-0.003
Token scratched
Token ID37468
Feature activation+2.37e-03

Pos logit contributions
sed+0.021
ses+0.020
cious+0.018
ogram+0.018
 wrists+0.017
Neg logit contributions
 accidents-0.026
 mess-0.021
 mud-0.020
rim-0.020
 off-0.020
Token
Token ID198
Feature activation-1.83e-02

Pos logit contributions
sed+0.042
cious+0.038
ses+0.038
sel+0.036
ence+0.035
Neg logit contributions
 accidents-0.040
THE-0.034
reating-0.032
 bamboo-0.032
 mess-0.031
Token
Token ID198
Feature activation-1.46e-02

Pos logit contributions
sed+0.024
cious+0.021
ses+0.021
 three+0.020
sel+0.020
Neg logit contributions
 accidents-0.033
reating-0.029
THE-0.029
rim-0.029
 mess-0.028
TokenS
Token ID50
Feature activation-5.26e-04

Pos logit contributions
sed+0.028
ses+0.025
 three+0.025
cious+0.025
2+0.025
Neg logit contributions
 accidents-0.018
reating-0.015
THE-0.014
rim-0.014
 mess-0.013
Tokenara
Token ID3301
Feature activation-2.48e-03

Pos logit contributions
sed+0.001
 three+0.001
cious+0.001
2+0.001
ses+0.001
Neg logit contributions
 accidents-0.001
reating-0.001
rim-0.001
THE-0.001
 mess-0.001
Token reached
Token ID4251
Feature activation-1.81e-02

Pos logit contributions
sed+0.003
ogram+0.003
ses+0.003
chie+0.003
sel+0.003
Neg logit contributions
 accidents-0.004
 mess-0.003
iver-0.003
 stain-0.003
ie-0.003
Token out
Token ID503
Feature activation-6.34e-02

Pos logit contributions
cious+0.035
sed+0.035
sel+0.032
2+0.032
 wrists+0.031
Neg logit contributions
 accidents-0.019
 off-0.017
 mess-0.015
reating-0.015
ie-0.015
Token her
Token ID607
Feature activation-1.06e-02

Pos logit contributions
sed+0.138
ses+0.127
cious+0.123
2+0.118
 wrists+0.117
Neg logit contributions
 accidents-0.072
reating-0.056
 fog-0.046
 off-0.046
 bamboo-0.046
Token finger
Token ID7660
Feature activation-1.03e-02

Pos logit contributions
sel+0.011
2+0.009
cious+0.009
sed+0.008
4+0.008
Neg logit contributions
 accidents-0.021
 mess-0.019
 mud-0.019
 off-0.018
 stain-0.018
Token and
Token ID290
Feature activation-7.47e-03

Pos logit contributions
sed+0.019
ses+0.018
2+0.017
cious+0.017
chie+0.016
Neg logit contributions
 accidents-0.013
 off-0.012
 from-0.010
ie-0.010
iver-0.010
Token gently
Token ID15165
Feature activation-1.67e-02

Pos logit contributions
sed+0.011
chie+0.011
ses+0.011
ogram+0.011
 ago+0.010
Neg logit contributions
 accidents-0.012
 off-0.009
 mess-0.009
 mud-0.009
 stain-0.009
Token stro
Token ID9660
Feature activation-8.25e-03

Pos logit contributions
sed+0.027
ses+0.027
ogram+0.026
chie+0.024
sing+0.024
Neg logit contributions
 accidents-0.021
 off-0.019
 mud-0.018
 happen-0.018
 mess-0.018
Token's
Token ID338
Feature activation+5.18e-03

Pos logit contributions
 three+0.014
 hand+0.012
 seven+0.010
 finger+0.010
cious+0.010
Neg logit contributions
 accidents-0.058
reating-0.056
rim-0.055
 bamboo-0.053
THE-0.052
Token take
Token ID1011
Feature activation-1.34e-02

Pos logit contributions
 accidents+0.010
reating+0.009
rim+0.008
THE+0.008
tex+0.007
Neg logit contributions
 three-0.007
 hand-0.007
 finger-0.006
 4-0.006
 3-0.006
Token a
Token ID257
Feature activation+5.66e-03

Pos logit contributions
 three+0.014
sed+0.014
 4+0.012
cious+0.012
 seven+0.012
Neg logit contributions
 accidents-0.027
reating-0.024
rim-0.024
THE-0.023
 mess-0.023
Token break
Token ID2270
Feature activation-1.47e-03

Pos logit contributions
 accidents+0.009
reating+0.007
 mess+0.007
THE+0.007
rim+0.007
Neg logit contributions
sed-0.009
ses-0.008
cious-0.008
 three-0.008
2-0.008
Token"
Token ID1
Feature activation-3.59e-02

Pos logit contributions
sed+0.002
 three+0.002
cious+0.002
2+0.002
ses+0.002
Neg logit contributions
 accidents-0.003
reating-0.002
rim-0.002
THE-0.002
 mess-0.002
Token
Token ID220
Feature activation-6.31e-02

Pos logit contributions
sed+0.061
ses+0.055
cious+0.054
 finger+0.050
 eight+0.050
Neg logit contributions
 accidents-0.054
reating-0.047
THE-0.043
iver-0.041
 mess-0.041
Token
Token ID198
Feature activation-1.38e-02

Pos logit contributions
sed+0.101
ses+0.091
 wrists+0.085
cious+0.083
 three+0.082
Neg logit contributions
 accidents-0.100
reating-0.090
THE-0.085
iver-0.084
rim-0.084
TokenBut
Token ID1537
Feature activation+5.09e-03

Pos logit contributions
sed+0.021
ses+0.019
cious+0.017
 three+0.017
 finger+0.016
Neg logit contributions
 accidents-0.023
THE-0.021
reating-0.021
rim-0.021
iver-0.019
Token Tim
Token ID5045
Feature activation+5.01e-03

Pos logit contributions
 accidents+0.008
reating+0.007
rim+0.007
THE+0.006
 mess+0.006
Neg logit contributions
sed-0.008
 three-0.007
 4-0.007
cious-0.007
2-0.007
Token was
Token ID373
Feature activation-5.52e-03

Pos logit contributions
 accidents+0.007
 mess+0.006
ie+0.006
 emergency+0.005
 happen+0.005
Neg logit contributions
sed-0.008
cious-0.008
ses-0.008
 three-0.007
 wrists-0.007
Token determined
Token ID5295
Feature activation-2.65e-02

Pos logit contributions
sed+0.009
cious+0.009
ses+0.009
sel+0.009
 wrists+0.008
Neg logit contributions
 accidents-0.008
 mess-0.007
 fog-0.006
 emergency-0.006
reating-0.006
Token the
Token ID262
Feature activation-6.06e-03

Pos logit contributions
sed+0.012
 three+0.011
cious+0.010
2+0.010
ses+0.010
Neg logit contributions
 accidents-0.015
reating-0.013
rim-0.013
THE-0.012
 mess-0.012
Token shelf
Token ID18316
Feature activation-2.26e-02

Pos logit contributions
sed+0.011
ses+0.010
cious+0.009
2+0.009
 three+0.009
Neg logit contributions
 accidents-0.008
reating-0.007
 mess-0.007
 emergency-0.007
THE-0.006
Token.
Token ID13
Feature activation+6.82e-03

Pos logit contributions
sed+0.038
ses+0.035
cious+0.033
 eyebrows+0.031
2+0.030
Neg logit contributions
 accidents-0.033
reating-0.029
lf-0.029
 mess-0.028
THE-0.028
Token She
Token ID1375
Feature activation-2.89e-03

Pos logit contributions
 accidents+0.009
THE+0.008
Later+0.007
tex+0.007
 mud+0.007
Neg logit contributions
sed-0.011
sel-0.011
cious-0.010
ses-0.010
 three-0.009
Token reached
Token ID4251
Feature activation-2.01e-02

Pos logit contributions
sed+0.004
ses+0.003
cious+0.003
ogram+0.003
sel+0.003
Neg logit contributions
 accidents-0.005
 mess-0.005
lf-0.004
THE-0.004
rim-0.004
Token out
Token ID503
Feature activation-6.31e-02

Pos logit contributions
sed+0.043
cious+0.042
ses+0.038
sel+0.037
2+0.035
Neg logit contributions
 off-0.020
 accidents-0.019
 mess-0.018
reating-0.017
ie-0.015
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
sed+0.135
ses+0.127
cious+0.111
ogram+0.109
sel+0.106
Neg logit contributions
 accidents-0.084
reating-0.067
 off-0.067
 from-0.059
ie-0.057
Token picked
Token ID6497
Feature activation+3.56e-03

Pos logit contributions
sed+0.023
ses+0.022
cious+0.019
sing+0.018
 ago+0.018
Neg logit contributions
 accidents-0.019
 off-0.017
 Bun-0.016
 mess-0.015
 mud-0.015
Token it
Token ID340
Feature activation-2.67e-03

Pos logit contributions
 accidents+0.003
reating+0.003
 off+0.003
 mess+0.002
 bamboo+0.002
Neg logit contributions
sed-0.008
ses-0.007
cious-0.007
ogram-0.007
sing-0.007
Token up
Token ID510
Feature activation-2.81e-02

Pos logit contributions
sed+0.007
ses+0.006
cious+0.006
2+0.006
ogram+0.006
Neg logit contributions
 accidents-0.002
reating-0.001
 mess-0.001
 happen-0.001
THE-0.001
Token.
Token ID13
Feature activation+5.88e-03

Pos logit contributions
sed+0.053
ses+0.051
ogram+0.044
2+0.044
cious+0.042
Neg logit contributions
 accidents-0.040
 off-0.033
THE-0.032
 from-0.031
 mess-0.030
Token?"
Token ID1701
Feature activation-2.29e-02

Pos logit contributions
sed+0.011
 three+0.010
cious+0.010
2+0.010
ses+0.010
Neg logit contributions
 accidents-0.009
reating-0.007
rim-0.007
THE-0.007
 mess-0.007
Token
Token ID198
Feature activation-1.45e-02

Pos logit contributions
sed+0.044
ses+0.041
cious+0.039
sel+0.039
 wrists+0.037
Neg logit contributions
 accidents-0.030
THE-0.024
 cartoons-0.023
lf-0.022
 noise-0.022
Token
Token ID198
Feature activation-7.03e-03

Pos logit contributions
sed+0.027
ses+0.025
cious+0.023
 wrists+0.023
sel+0.022
Neg logit contributions
 accidents-0.020
THE-0.019
rim-0.017
Later-0.017
reating-0.016
TokenJoe
Token ID19585
Feature activation-3.00e-03

Pos logit contributions
sed+0.015
ses+0.014
 three+0.013
cious+0.013
 wrists+0.012
Neg logit contributions
 accidents-0.008
THE-0.006
reating-0.006
rim-0.006
 mess-0.006
Token took
Token ID1718
Feature activation-1.85e-02

Pos logit contributions
sed+0.005
ses+0.005
cious+0.004
sel+0.004
ogram+0.004
Neg logit contributions
ie-0.004
 accidents-0.004
 mess-0.004
rim-0.003
lf-0.003
Token out
Token ID503
Feature activation-6.30e-02

Pos logit contributions
sed+0.027
cious+0.025
ses+0.024
2+0.023
 three+0.023
Neg logit contributions
 accidents-0.031
reating-0.026
rim-0.025
 emergency-0.025
THE-0.025
Token a
Token ID257
Feature activation+7.51e-04

Pos logit contributions
sed+0.106
ses+0.095
cious+0.094
 wrists+0.088
ence+0.084
Neg logit contributions
 accidents-0.103
reating-0.085
 bamboo-0.081
 cartoons-0.081
 fog-0.079
Token large
Token ID1588
Feature activation+1.53e-02

Pos logit contributions
 accidents+0.001
 mess+0.001
reating+0.001
rim+0.001
 emergency+0.001
Neg logit contributions
sed-0.001
cious-0.001
ses-0.001
2-0.001
 three-0.001
Token can
Token ID460
Feature activation-1.76e-02

Pos logit contributions
 accidents+0.029
reating+0.026
THE+0.025
rim+0.024
iver+0.023
Neg logit contributions
sed-0.019
 three-0.018
ses-0.016
cious-0.016
2-0.016
Token from
Token ID422
Feature activation+1.33e-02

Pos logit contributions
sed+0.035
 wrists+0.033
ses+0.031
2+0.029
ence+0.029
Neg logit contributions
 accidents-0.018
iver-0.018
reating-0.017
rim-0.017
THE-0.016
Token his
Token ID465
Feature activation+4.97e-04

Pos logit contributions
 accidents+0.018
reating+0.016
 mess+0.015
 emergency+0.015
rim+0.014
Neg logit contributions
sed-0.023
cious-0.022
2-0.021
sel-0.021
ses-0.020
Token.
Token ID13
Feature activation-1.26e-02

Pos logit contributions
 accidents+0.005
reating+0.004
THE+0.004
 mess+0.004
 happen+0.004
Neg logit contributions
sed-0.003
ses-0.003
cious-0.002
2-0.002
 three-0.002
Token
Token ID198
Feature activation-1.19e-02

Pos logit contributions
sed+0.020
ses+0.018
cious+0.017
sel+0.017
ogram+0.016
Neg logit contributions
 accidents-0.019
THE-0.017
reating-0.016
uggle-0.016
adays-0.016
Token
Token ID198
Feature activation-1.31e-02

Pos logit contributions
sed+0.016
ses+0.014
cious+0.013
 three+0.013
sel+0.013
Neg logit contributions
 accidents-0.021
reating-0.019
THE-0.019
rim-0.018
iver-0.018
TokenShe
Token ID3347
Feature activation-1.04e-03

Pos logit contributions
sed+0.020
 three+0.019
2+0.019
cious+0.019
 4+0.018
Neg logit contributions
 accidents-0.020
reating-0.017
rim-0.017
 mess-0.016
THE-0.016
Token reached
Token ID4251
Feature activation-2.16e-02

Pos logit contributions
sed+0.001
ses+0.001
cious+0.001
ogram+0.001
2+0.001
Neg logit contributions
 accidents-0.002
 mess-0.002
reating-0.002
THE-0.001
lf-0.001
Token out
Token ID503
Feature activation-6.29e-02

Pos logit contributions
sed+0.044
cious+0.042
ses+0.039
sel+0.038
2+0.038
Neg logit contributions
 accidents-0.023
 off-0.020
reating-0.019
 mess-0.019
 fog-0.017
Token and
Token ID290
Feature activation-1.51e-02

Pos logit contributions
sed+0.124
ses+0.115
cious+0.109
sel+0.103
ogram+0.102
Neg logit contributions
 accidents-0.087
reating-0.071
 off-0.065
 fog-0.060
ie-0.060
Token touched
Token ID12615
Feature activation-1.04e-02

Pos logit contributions
sed+0.025
ses+0.024
cious+0.020
ogram+0.018
sing+0.018
Neg logit contributions
 accidents-0.022
 off-0.020
 forth-0.019
 Bun-0.018
ie-0.018
Token it
Token ID340
Feature activation-2.27e-04

Pos logit contributions
sed+0.023
ses+0.021
cious+0.020
sing+0.019
2+0.019
Neg logit contributions
 accidents-0.012
reating-0.009
 off-0.008
 fog-0.008
THE-0.008
Token.
Token ID13
Feature activation-1.12e-02

Pos logit contributions
sed+0.000
ses+0.000
cious+0.000
2+0.000
sel+0.000
Neg logit contributions
 accidents-0.000
 mess-0.000
 happen-0.000
reating-0.000
THE-0.000
Token It
Token ID632
Feature activation+5.29e-03

Pos logit contributions
sed+0.016
ogram+0.015
cious+0.015
ses+0.014
sel+0.014
Neg logit contributions
 accidents-0.017
adays-0.014
reating-0.014
 cartoons-0.014
 showers-0.014
Token
Token ID198
Feature activation-8.15e-03

Pos logit contributions
sed+0.004
sel+0.004
cious+0.004
 finger+0.004
ses+0.004
Neg logit contributions
 accidents-0.003
THE-0.003
Later-0.003
adays-0.002
 vacations-0.002
Token
Token ID198
Feature activation-5.86e-03

Pos logit contributions
sed+0.010
ses+0.009
 three+0.009
cious+0.009
sel+0.008
Neg logit contributions
 accidents-0.015
reating-0.014
THE-0.013
rim-0.013
 mess-0.013
TokenThe
Token ID464
Feature activation-1.13e-02

Pos logit contributions
sed+0.009
 three+0.009
cious+0.009
2+0.008
ses+0.008
Neg logit contributions
 accidents-0.009
reating-0.008
rim-0.007
THE-0.007
 mess-0.007
Token customer
Token ID6491
Feature activation-9.66e-03

Pos logit contributions
sed+0.016
 three+0.016
cious+0.015
2+0.015
 seven+0.014
Neg logit contributions
 accidents-0.019
reating-0.016
rim-0.016
iver-0.015
THE-0.015
Token picked
Token ID6497
Feature activation+1.14e-03

Pos logit contributions
sed+0.014
ses+0.013
cious+0.012
ogram+0.011
2+0.011
Neg logit contributions
 accidents-0.015
lf-0.014
 mess-0.013
 emergency-0.012
reating-0.012
Token out
Token ID503
Feature activation-6.28e-02

Pos logit contributions
 accidents+0.001
reating+0.001
 mess+0.001
THE+0.001
 emergency+0.001
Neg logit contributions
sed-0.002
ses-0.002
cious-0.002
2-0.002
 three-0.002
Token two
Token ID734
Feature activation-1.70e-02

Pos logit contributions
sed+0.108
ses+0.099
cious+0.090
2+0.086
sing+0.086
Neg logit contributions
 accidents-0.094
reating-0.077
 bamboo-0.073
 fog-0.072
 off-0.071
Token pairs
Token ID14729
Feature activation-1.31e-02

Pos logit contributions
sed+0.027
ses+0.025
cious+0.023
ogram+0.022
 three+0.022
Neg logit contributions
 accidents-0.027
reating-0.023
 mess-0.022
rim-0.021
THE-0.021
Token of
Token ID286
Feature activation+8.66e-04

Pos logit contributions
sed+0.025
ses+0.023
cious+0.022
sing+0.021
 eyebrows+0.021
Neg logit contributions
 accidents-0.015
reating-0.013
THE-0.012
 from-0.012
 off-0.011
Token sunglasses
Token ID36251
Feature activation+7.18e-03

Pos logit contributions
 accidents+0.001
reating+0.001
 cartoons+0.001
 mess+0.001
rim+0.001
Neg logit contributions
sed-0.002
ses-0.001
cious-0.001
ogram-0.001
sing-0.001
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
 accidents+0.010
reating+0.010
THE+0.009
 happen+0.009
 mess+0.008
Neg logit contributions
sed-0.012
ses-0.011
cious-0.010
2-0.010
 eight-0.010
Token with
Token ID351
Feature activation+7.84e-04

Pos logit contributions
sed+0.004
cious+0.004
ses+0.004
2+0.004
 three+0.004
Neg logit contributions
 accidents-0.004
reating-0.003
THE-0.003
rim-0.003
 mess-0.003
Token him
Token ID683
Feature activation+1.20e-04

Pos logit contributions
 accidents+0.001
reating+0.001
THE+0.001
 mess+0.001
 emergency+0.001
Neg logit contributions
sed-0.002
ses-0.001
cious-0.001
2-0.001
ogram-0.001
Token.
Token ID13
Feature activation-8.04e-03

Pos logit contributions
 accidents+0.000
THE+0.000
reating+0.000
 mess+0.000
 bamboo+0.000
Neg logit contributions
sed-0.000
cious-0.000
ses-0.000
2-0.000
sel-0.000
Token The
Token ID383
Feature activation-2.48e-02

Pos logit contributions
cious+0.012
sed+0.012
sel+0.011
 finger+0.011
ses+0.011
Neg logit contributions
 accidents-0.011
adays-0.010
Later-0.009
 Bun-0.009
 cartoons-0.009
Token ch
Token ID442
Feature activation-3.39e-02

Pos logit contributions
sed+0.042
cious+0.040
ogram+0.040
sel+0.039
ared+0.038
Neg logit contributions
 accidents-0.033
 mess-0.029
 noise-0.028
 emergency-0.027
 happen-0.026
Tokename
Token ID480
Feature activation-6.26e-02

Pos logit contributions
sed+0.054
cious+0.053
 three+0.049
sel+0.049
 wrists+0.048
Neg logit contributions
 accidents-0.048
rim-0.045
ilt-0.039
 mess-0.039
iver-0.038
Tokenleon
Token ID38970
Feature activation-1.98e-02

Pos logit contributions
sed+0.093
ses+0.089
 wrists+0.085
cious+0.083
2+0.078
Neg logit contributions
 accidents-0.095
rim-0.087
ie-0.083
lf-0.081
THE-0.080
Token said
Token ID531
Feature activation+1.46e-02

Pos logit contributions
sed+0.024
2+0.023
cious+0.023
ses+0.023
sel+0.022
Neg logit contributions
 accidents-0.034
THE-0.029
rim-0.029
ie-0.028
 mess-0.027
Token yes
Token ID3763
Feature activation-8.16e-03

Pos logit contributions
 accidents+0.022
reating+0.018
THE+0.017
 emergency+0.017
rim+0.016
Neg logit contributions
sed-0.024
ses-0.022
cious-0.022
2-0.022
 three-0.021
Token and
Token ID290
Feature activation-1.09e-02

Pos logit contributions
sed+0.014
ses+0.013
ogram+0.012
cious+0.012
 4+0.012
Neg logit contributions
 accidents-0.012
reating-0.010
 mess-0.010
THE-0.009
 happen-0.009
Token the
Token ID262
Feature activation-2.46e-02

Pos logit contributions
sed+0.017
ses+0.015
cious+0.015
2+0.014
ogram+0.014
Neg logit contributions
 accidents-0.018
reating-0.014
THE-0.014
rim-0.013
adays-0.013
Token lighter
Token ID14871
Feature activation+9.19e-03

Pos logit contributions
 accidents+0.019
reating+0.016
rim+0.015
THE+0.015
 mess+0.015
Neg logit contributions
sed-0.020
 three-0.019
cious-0.019
2-0.018
ses-0.018
Token.
Token ID13
Feature activation+1.02e-02

Pos logit contributions
 accidents+0.018
reating+0.015
 mess+0.015
THE+0.014
 happen+0.014
Neg logit contributions
sed-0.011
2-0.010
cious-0.010
 three-0.010
 4-0.010
Token They
Token ID1119
Feature activation+1.32e-03

Pos logit contributions
 accidents+0.017
reating+0.014
THE+0.014
 cartoons+0.013
adays+0.013
Neg logit contributions
sed-0.015
cious-0.014
 three-0.013
ses-0.013
sel-0.013
Token made
Token ID925
Feature activation+3.44e-03

Pos logit contributions
 accidents+0.002
 mess+0.002
THE+0.002
reating+0.002
 bamboo+0.002
Neg logit contributions
sed-0.002
cious-0.002
2-0.002
ses-0.002
ence-0.002
Token more
Token ID517
Feature activation+1.24e-02

Pos logit contributions
 accidents+0.007
rim+0.006
THE+0.006
reating+0.006
iver+0.006
Neg logit contributions
 three-0.004
sed-0.004
ses-0.003
 4-0.003
 hand-0.003
Token mud
Token ID17492
Feature activation+4.84e-02

Pos logit contributions
reating+0.018
adays+0.017
 accidents+0.017
THE+0.017
Later+0.016
Neg logit contributions
 three-0.017
sed-0.017
 finger-0.017
 hand-0.017
ks-0.017
Token pies
Token ID47199
Feature activation+2.41e-02

Pos logit contributions
 accidents+0.080
reating+0.075
rim+0.073
THE+0.072
 vacations+0.072
Neg logit contributions
sed-0.064
 three-0.057
ses-0.057
ks-0.056
sel-0.053
Token and
Token ID290
Feature activation+1.94e-02

Pos logit contributions
 accidents+0.046
reating+0.041
rim+0.039
THE+0.039
 mess+0.038
Neg logit contributions
sed-0.029
 three-0.026
cious-0.025
2-0.025
ses-0.024
Token had
Token ID550
Feature activation-8.62e-03

Pos logit contributions
 accidents+0.028
reating+0.024
rim+0.024
THE+0.023
iver+0.023
Neg logit contributions
sed-0.031
 three-0.030
ses-0.028
cious-0.028
 4-0.027
Token fun
Token ID1257
Feature activation+2.17e-02

Pos logit contributions
sed+0.019
2+0.017
ses+0.017
cious+0.017
ogram+0.017
Neg logit contributions
 accidents-0.008
 mess-0.006
THE-0.006
reating-0.006
 bamboo-0.005
Token.
Token ID13
Feature activation+1.68e-02

Pos logit contributions
 accidents+0.049
reating+0.044
rim+0.042
THE+0.042
 mess+0.041
Neg logit contributions
sed-0.019
 three-0.016
cious-0.015
2-0.015
ses-0.015
Token
Token ID198
Feature activation+8.76e-03

Pos logit contributions
 accidents+0.012
reating+0.011
rim+0.010
THE+0.010
 mess+0.010
Neg logit contributions
sed-0.006
 three-0.005
cious-0.005
2-0.005
ses-0.005
TokenEveryone
Token ID16190
Feature activation+1.02e-02

Pos logit contributions
 accidents+0.016
reating+0.015
 emergency+0.014
 mess+0.014
 happen+0.013
Neg logit contributions
 three-0.011
2-0.010
 4-0.010
cious-0.009
 eight-0.009
Token was
Token ID373
Feature activation+1.08e-03

Pos logit contributions
 accidents+0.017
reating+0.014
THE+0.014
rim+0.014
 mess+0.014
Neg logit contributions
sed-0.015
ses-0.014
cious-0.013
2-0.013
 three-0.013
Token happy
Token ID3772
Feature activation+2.17e-03

Pos logit contributions
 accidents+0.002
reating+0.002
rim+0.002
THE+0.002
 mess+0.001
Neg logit contributions
sed-0.002
 three-0.001
cious-0.001
2-0.001
ses-0.001
Token,
Token ID11
Feature activation-9.99e-03

Pos logit contributions
 accidents+0.004
reating+0.004
rim+0.004
THE+0.003
 mess+0.003
Neg logit contributions
sed-0.003
 three-0.002
cious-0.002
2-0.002
ses-0.002
Token until
Token ID1566
Feature activation+3.78e-02

Pos logit contributions
sed+0.013
 three+0.012
cious+0.011
2+0.011
ses+0.011
Neg logit contributions
 accidents-0.018
reating-0.016
rim-0.015
THE-0.015
 mess-0.015
Token one
Token ID530
Feature activation-9.15e-03

Pos logit contributions
 accidents+0.060
THE+0.048
reating+0.047
 emergency+0.046
 mess+0.045
Neg logit contributions
sed-0.061
cious-0.056
ses-0.055
2-0.055
sel-0.052
Token night
Token ID1755
Feature activation+2.82e-03

Pos logit contributions
sed+0.019
ses+0.018
cious+0.018
2+0.017
 three+0.017
Neg logit contributions
 accidents-0.010
reating-0.008
rim-0.007
THE-0.007
 mess-0.007
Token.
Token ID13
Feature activation+8.92e-04

Pos logit contributions
 accidents+0.005
reating+0.004
THE+0.004
 mess+0.004
rim+0.004
Neg logit contributions
sed-0.004
ses-0.004
cious-0.004
 three-0.004
2-0.004
Token M
Token ID337
Feature activation+5.34e-03

Pos logit contributions
 accidents+0.002
THE+0.001
adays+0.001
reating+0.001
tex+0.001
Neg logit contributions
sed-0.001
cious-0.001
ses-0.001
 wrists-0.001
 finger-0.001
Tokenomm
Token ID2002
Feature activation+1.05e-02

Pos logit contributions
 accidents+0.006
reating+0.005
THE+0.005
rim+0.005
 mess+0.005
Neg logit contributions
sed-0.010
 three-0.010
cious-0.009
2-0.009
ses-0.009
Tokenmy
Token ID1820
Feature activation+2.93e-02

Pos logit contributions
 accidents+0.034
reating+0.029
 mess+0.027
THE+0.027
rim+0.027
Neg logit contributions
sed-0.044
cious-0.040
ses-0.040
2-0.038
sel-0.037
Token!"
Token ID2474
Feature activation-6.40e-03

Pos logit contributions
 accidents+0.048
reating+0.041
 mess+0.040
THE+0.040
rim+0.039
Neg logit contributions
sed-0.044
cious-0.039
ses-0.039
2-0.038
 three-0.037
Token She
Token ID1375
Feature activation-5.02e-03

Pos logit contributions
sed+0.009
cious+0.008
ses+0.008
2+0.008
sel+0.008
Neg logit contributions
 accidents-0.011
THE-0.009
reating-0.009
 cartoons-0.009
rim-0.009
Token drank
Token ID24070
Feature activation+2.72e-02

Pos logit contributions
sed+0.007
cious+0.006
2+0.006
ses+0.006
 three+0.006
Neg logit contributions
 accidents-0.008
 mess-0.007
THE-0.007
reating-0.007
rim-0.007
Token the
Token ID262
Feature activation+1.27e-02

Pos logit contributions
 accidents+0.052
rim+0.050
reating+0.049
iver+0.048
 emergency+0.047
Neg logit contributions
 three-0.029
sed-0.027
 hand-0.024
 4-0.023
 seven-0.023
Token hot
Token ID3024
Feature activation+3.52e-02

Pos logit contributions
 accidents+0.018
reating+0.015
rim+0.014
THE+0.014
 emergency+0.013
Neg logit contributions
sed-0.022
 three-0.020
ses-0.019
cious-0.019
2-0.019
Token milk
Token ID7545
Feature activation+2.21e-02

Pos logit contributions
 accidents+0.050
reating+0.045
THE+0.042
rim+0.041
adays+0.040
Neg logit contributions
sed-0.057
 three-0.055
cious-0.051
ses-0.050
 hand-0.050
Token and
Token ID290
Feature activation+1.38e-02

Pos logit contributions
 accidents+0.038
reating+0.033
THE+0.031
 mess+0.031
rim+0.031
Neg logit contributions
sed-0.031
2-0.027
cious-0.027
 three-0.027
ses-0.027
Token felt
Token ID2936
Feature activation+1.96e-02

Pos logit contributions
 accidents+0.026
reating+0.023
rim+0.022
THE+0.022
 mess+0.022
Neg logit contributions
sed-0.017
 three-0.015
cious-0.014
ses-0.014
2-0.014
Token warm
Token ID5814
Feature activation+3.23e-02

Pos logit contributions
 accidents+0.036
reating+0.035
rim+0.033
lf+0.033
 emergency+0.032
Neg logit contributions
 three-0.023
 hand-0.020
 finger-0.019
sed-0.019
 seven-0.019
Token inside
Token ID2641
Feature activation+1.53e-02

Pos logit contributions
 accidents+0.064
reating+0.057
rim+0.055
THE+0.054
 mess+0.054
Neg logit contributions
sed-0.036
 three-0.032
cious-0.031
2-0.030
ses-0.030
Token.
Token ID13
Feature activation+7.86e-03

Pos logit contributions
 accidents+0.120
reating+0.108
rim+0.105
THE+0.103
 mess+0.102
Neg logit contributions
sed-0.043
 three-0.037
cious-0.035
ses-0.033
2-0.033
Token<|endoftext|>
Token ID50256
Feature activation+1.75e-02

Pos logit contributions
 accidents+0.015
reating+0.013
THE+0.013
rim+0.012
 mess+0.012
Neg logit contributions
sed-0.010
cious-0.009
 three-0.009
ses-0.008
2-0.008
TokenS
Token ID50
Feature activation-1.84e-03

Pos logit contributions
 accidents+0.029
THE+0.025
reating+0.025
rim+0.024
 mess+0.023
Neg logit contributions
sed-0.026
cious-0.023
ses-0.022
 three-0.022
2-0.022
Tokenara
Token ID3301
Feature activation+2.04e-02

Pos logit contributions
sed+0.002
2+0.002
 three+0.002
 4+0.002
cious+0.002
Neg logit contributions
 accidents-0.004
reating-0.003
rim-0.003
 emergency-0.003
iver-0.003
Token was
Token ID373
Feature activation-1.66e-03

Pos logit contributions
 accidents+0.034
reating+0.029
THE+0.029
 mess+0.028
rim+0.028
Neg logit contributions
sed-0.028
 three-0.025
2-0.025
cious-0.025
ses-0.025
Token sick
Token ID6639
Feature activation+3.85e-02

Pos logit contributions
sed+0.002
 three+0.002
ses+0.002
 4+0.002
cious+0.002
Neg logit contributions
 accidents-0.003
reating-0.003
rim-0.003
THE-0.003
 mess-0.003
Token.
Token ID13
Feature activation+1.80e-02

Pos logit contributions
 accidents+0.063
reating+0.053
 mess+0.052
 emergency+0.050
THE+0.050
Neg logit contributions
sed-0.055
cious-0.053
2-0.051
sel-0.049
ses-0.049
Token She
Token ID1375
Feature activation+1.01e-02

Pos logit contributions
 accidents+0.031
reating+0.027
rim+0.026
THE+0.026
 mess+0.025
Neg logit contributions
sed-0.025
 three-0.022
cious-0.022
2-0.022
ses-0.021
Token had
Token ID550
Feature activation-1.75e-02

Pos logit contributions
 accidents+0.021
reating+0.019
rim+0.018
THE+0.018
 emergency+0.017
Neg logit contributions
sed-0.010
 three-0.009
cious-0.009
ses-0.009
2-0.008
Token a
Token ID257
Feature activation+1.20e-02

Pos logit contributions
sed+0.022
2+0.021
ses+0.020
sel+0.019
cious+0.019
Neg logit contributions
 accidents-0.033
 mess-0.028
reating-0.027
THE-0.026
 emergency-0.026
Token fever
Token ID17372
Feature activation+3.39e-02

Pos logit contributions
 accidents+0.019
 mess+0.016
reating+0.015
 emergency+0.015
rim+0.015
Neg logit contributions
sed-0.018
cious-0.017
2-0.017
sel-0.016
 three-0.016
Token mud
Token ID17492
Feature activation+5.34e-02

Pos logit contributions
sed+0.016
cious+0.016
2+0.016
ses+0.015
sel+0.014
Neg logit contributions
 accidents-0.019
 mess-0.016
 mud-0.016
reating-0.015
 fog-0.015
Token and
Token ID290
Feature activation+7.32e-03

Pos logit contributions
 accidents+0.098
reating+0.091
rim+0.089
THE+0.088
iver+0.087
Neg logit contributions
sed-0.064
 three-0.058
ses-0.055
 seven-0.051
sel-0.051
Token dust
Token ID8977
Feature activation+3.22e-02

Pos logit contributions
 accidents+0.010
reating+0.008
 mess+0.008
THE+0.007
rim+0.007
Neg logit contributions
sed-0.013
cious-0.012
2-0.012
 three-0.012
ses-0.012
Token and
Token ID290
Feature activation+6.00e-03

Pos logit contributions
 accidents+0.058
reating+0.050
rim+0.048
 mess+0.048
THE+0.047
Neg logit contributions
sed-0.043
 three-0.038
cious-0.038
2-0.038
ses-0.037
Token bird
Token ID6512
Feature activation+3.02e-03

Pos logit contributions
 accidents+0.008
reating+0.006
rim+0.006
THE+0.006
 mess+0.006
Neg logit contributions
sed-0.011
 three-0.010
cious-0.010
2-0.010
ses-0.010
Token poop
Token ID40125
Feature activation+4.40e-02

Pos logit contributions
 accidents+0.005
reating+0.004
rim+0.004
 mess+0.004
THE+0.004
Neg logit contributions
sed-0.004
 three-0.004
2-0.004
cious-0.004
ses-0.004
Token on
Token ID319
Feature activation+2.09e-03

Pos logit contributions
 accidents+0.086
reating+0.076
rim+0.073
THE+0.072
 mess+0.071
Neg logit contributions
sed-0.051
 three-0.045
cious-0.044
2-0.043
ses-0.043
Token it
Token ID340
Feature activation+2.68e-03

Pos logit contributions
 accidents+0.003
 mess+0.002
reating+0.002
 fog+0.002
 happen+0.002
Neg logit contributions
sed-0.004
cious-0.004
ses-0.004
2-0.004
ence-0.003
Token.
Token ID13
Feature activation+7.34e-03

Pos logit contributions
 accidents+0.006
reating+0.005
 mess+0.005
THE+0.005
rim+0.005
Neg logit contributions
sed-0.003
 three-0.002
cious-0.002
2-0.002
ses-0.002
Token Tim
Token ID5045
Feature activation+9.04e-03

Pos logit contributions
 accidents+0.007
 emergency+0.005
rim+0.005
reating+0.005
 mess+0.005
Neg logit contributions
sed-0.016
cious-0.015
sel-0.015
ses-0.014
 three-0.014
Token and
Token ID290
Feature activation-2.55e-02

Pos logit contributions
 accidents+0.016
 mess+0.014
iver+0.013
rim+0.013
lf+0.013
Neg logit contributions
sed-0.012
ses-0.011
cious-0.010
sel-0.010
 three-0.010
Token go
Token ID467
Feature activation+1.31e-02

Pos logit contributions
sed+0.001
 three+0.001
 4+0.001
cious+0.001
 hand+0.001
Neg logit contributions
 accidents-0.002
reating-0.001
rim-0.001
THE-0.001
 emergency-0.001
Token.
Token ID13
Feature activation+3.89e-04

Pos logit contributions
 accidents+0.020
reating+0.017
rim+0.016
THE+0.016
 mess+0.015
Neg logit contributions
sed-0.021
 three-0.020
cious-0.019
ses-0.019
2-0.019
Token
Token ID198
Feature activation+1.25e-03

Pos logit contributions
 accidents+0.001
THE+0.001
reating+0.001
adays+0.001
uggle+0.001
Neg logit contributions
sed-0.001
cious-0.000
ses-0.000
sel-0.000
 finger-0.000
Token
Token ID198
Feature activation+4.74e-03

Pos logit contributions
 accidents+0.003
reating+0.002
rim+0.002
THE+0.002
 mess+0.002
Neg logit contributions
sed-0.001
 three-0.001
cious-0.001
2-0.001
ses-0.001
TokenThe
Token ID464
Feature activation-2.06e-02

Pos logit contributions
 accidents+0.009
reating+0.008
 mess+0.008
 emergency+0.008
rim+0.007
Neg logit contributions
2-0.005
 three-0.005
cious-0.005
 4-0.005
 seven-0.005
Token woman
Token ID2415
Feature activation-1.67e-03

Pos logit contributions
sed+0.028
 three+0.026
cious+0.025
2+0.025
ses+0.025
Neg logit contributions
 accidents-0.036
reating-0.032
rim-0.030
THE-0.030
 mess-0.029
Token smiled
Token ID13541
Feature activation-1.04e-03

Pos logit contributions
sed+0.002
ses+0.002
cious+0.002
2+0.002
 three+0.002
Neg logit contributions
 accidents-0.003
THE-0.003
rim-0.002
reating-0.002
 mess-0.002
Token and
Token ID290
Feature activation-3.82e-03

Pos logit contributions
sed+0.002
cious+0.002
ses+0.002
2+0.001
ogram+0.001
Neg logit contributions
 accidents-0.002
THE-0.001
reating-0.001
 mess-0.001
iver-0.001
Token went
Token ID1816
Feature activation+1.10e-02

Pos logit contributions
sed+0.005
ogram+0.005
ses+0.005
2+0.005
cious+0.005
Neg logit contributions
 accidents-0.007
THE-0.005
 fog-0.005
 mess-0.005
 bamboo-0.005
Token back
Token ID736
Feature activation+2.07e-03

Pos logit contributions
 accidents+0.018
reating+0.015
rim+0.014
THE+0.014
 mess+0.014
Neg logit contributions
sed-0.017
 three-0.016
cious-0.015
2-0.015
ses-0.015
Token to
Token ID284
Feature activation+1.69e-02

Pos logit contributions
 accidents+0.003
reating+0.003
THE+0.003
 mess+0.003
 bamboo+0.003
Neg logit contributions
sed-0.003
sel-0.003
ses-0.003
2-0.003
cious-0.003
Token play
Token ID711
Feature activation+6.87e-03

Pos logit contributions
 accidents+0.010
reating+0.009
rim+0.008
THE+0.008
 mess+0.008
Neg logit contributions
sed-0.013
 three-0.012
cious-0.012
ses-0.012
2-0.012
Token outside
Token ID2354
Feature activation-6.03e-03

Pos logit contributions
 accidents+0.013
reating+0.013
adays+0.012
 collapse+0.012
rim+0.011
Neg logit contributions
 three-0.008
 "-0.007
 who-0.007
 each-0.006
 hand-0.006
Token and
Token ID290
Feature activation+1.42e-03

Pos logit contributions
 three+0.007
sed+0.007
 4+0.007
cious+0.006
 seven+0.006
Neg logit contributions
 accidents-0.011
reating-0.010
rim-0.010
THE-0.009
 mess-0.009
Token look
Token ID804
Feature activation+3.06e-02

Pos logit contributions
 accidents+0.002
reating+0.002
THE+0.002
 mess+0.002
rim+0.002
Neg logit contributions
sed-0.002
ses-0.002
cious-0.002
2-0.002
 three-0.002
Token for
Token ID329
Feature activation-1.03e-02

Pos logit contributions
 accidents+0.063
rim+0.060
THE+0.057
iver+0.057
 collapse+0.056
Neg logit contributions
 three-0.031
 seven-0.025
 "-0.024
 hand-0.024
 4-0.023
Token pretty
Token ID2495
Feature activation+5.94e-03

Pos logit contributions
 three+0.016
sed+0.014
2+0.014
cious+0.014
 4+0.014
Neg logit contributions
 accidents-0.015
rim-0.014
THE-0.014
iver-0.013
reating-0.013
Token rocks
Token ID12586
Feature activation+1.46e-02

Pos logit contributions
 accidents+0.010
iver+0.010
rim+0.009
 cancel+0.009
 collapse+0.009
Neg logit contributions
 three-0.007
 fingers-0.007
 arms-0.007
 wrists-0.007
sed-0.007
Token.
Token ID13
Feature activation+1.65e-02

Pos logit contributions
 accidents+0.030
reating+0.027
THE+0.026
rim+0.025
 mess+0.025
Neg logit contributions
sed-0.016
 three-0.014
ses-0.014
cious-0.013
2-0.013
Token One
Token ID1881
Feature activation-1.72e-02

Pos logit contributions
 accidents+0.027
reating+0.025
THE+0.024
 cartoons+0.023
 mess+0.023
Neg logit contributions
sed-0.023
cious-0.022
sel-0.021
ses-0.020
ks-0.020
Token day
Token ID1110
Feature activation+1.12e-02

Pos logit contributions
sed+0.031
 three+0.030
cious+0.029
2+0.029
ses+0.029
Neg logit contributions
 accidents-0.022
reating-0.018
rim-0.017
THE-0.016
 mess-0.016
Token,
Token ID11
Feature activation+1.37e-02

Pos logit contributions
 accidents+0.020
reating+0.018
rim+0.018
THE+0.017
iver+0.017
Neg logit contributions
sed-0.014
 three-0.014
cious-0.013
2-0.012
ses-0.012
Tokenapa
Token ID32678
Feature activation+5.42e-03

Pos logit contributions
sed+0.034
ses+0.031
 three+0.031
cious+0.031
 finger+0.029
Neg logit contributions
 accidents-0.031
THE-0.027
reating-0.026
rim-0.026
tex-0.025
Token explained
Token ID4893
Feature activation-9.40e-03

Pos logit contributions
 accidents+0.010
rim+0.009
reating+0.009
 mess+0.008
THE+0.008
Neg logit contributions
sed-0.007
ses-0.006
cious-0.006
 three-0.006
2-0.006
Token that
Token ID326
Feature activation-1.02e-02

Pos logit contributions
sed+0.014
ses+0.012
cious+0.012
2+0.012
 three+0.011
Neg logit contributions
 accidents-0.016
reating-0.014
 mess-0.013
rim-0.013
 emergency-0.013
Token the
Token ID262
Feature activation-1.39e-02

Pos logit contributions
sed+0.016
 three+0.015
cious+0.014
2+0.014
ses+0.014
Neg logit contributions
 accidents-0.016
reating-0.014
rim-0.013
THE-0.013
 mess-0.012
Token coal
Token ID5655
Feature activation+1.93e-02

Pos logit contributions
sed+0.021
 three+0.019
cious+0.019
2+0.019
ses+0.019
Neg logit contributions
 accidents-0.022
reating-0.019
rim-0.018
THE-0.018
 mess-0.018
Token was
Token ID373
Feature activation+6.82e-03

Pos logit contributions
 accidents+0.033
reating+0.027
 mess+0.027
rim+0.027
THE+0.027
Neg logit contributions
sed-0.027
 three-0.024
cious-0.024
2-0.024
ses-0.024
Token for
Token ID329
Feature activation+1.02e-02

Pos logit contributions
 accidents+0.011
 mess+0.009
 emergency+0.009
reating+0.009
rim+0.009
Neg logit contributions
sed-0.010
cious-0.009
2-0.009
ses-0.009
 wrists-0.009
Token the
Token ID262
Feature activation-5.27e-03

Pos logit contributions
 accidents+0.015
reating+0.012
rim+0.012
 emergency+0.012
 mess+0.012
Neg logit contributions
sed-0.017
cious-0.015
ses-0.015
2-0.015
sel-0.015
Token fireplace
Token ID44250
Feature activation-9.17e-03

Pos logit contributions
sed+0.008
 three+0.008
cious+0.007
ses+0.007
2+0.007
Neg logit contributions
 accidents-0.008
reating-0.007
rim-0.007
THE-0.007
 mess-0.006
Token.
Token ID13
Feature activation-1.47e-02

Pos logit contributions
sed+0.013
ses+0.012
2+0.012
cious+0.012
 three+0.012
Neg logit contributions
 accidents-0.016
reating-0.013
 mess-0.013
 emergency-0.012
rim-0.012
Token He
Token ID679
Feature activation-1.56e-02

Pos logit contributions
sed+0.024
ses+0.022
cious+0.022
 wrists+0.020
ogram+0.020
Neg logit contributions
 accidents-0.022
adays-0.017
THE-0.017
rim-0.017
reating-0.016
Token ant
Token ID1885
Feature activation-6.17e-03

Pos logit contributions
 accidents+0.015
 mess+0.013
 emergency+0.012
reating+0.012
 hurricane+0.011
Neg logit contributions
sed-0.016
sel-0.015
cious-0.015
2-0.015
ks-0.015
Token and
Token ID290
Feature activation+1.35e-02

Pos logit contributions
sed+0.008
 three+0.007
cious+0.007
2+0.007
ses+0.007
Neg logit contributions
 accidents-0.011
reating-0.010
rim-0.009
THE-0.009
 mess-0.009
Token wanted
Token ID2227
Feature activation-1.07e-02

Pos logit contributions
 accidents+0.027
reating+0.024
rim+0.024
THE+0.023
iver+0.023
Neg logit contributions
sed-0.015
 three-0.013
cious-0.012
ses-0.012
2-0.012
Token to
Token ID284
Feature activation+5.95e-03

Pos logit contributions
sed+0.015
ses+0.013
2+0.013
sel+0.012
cious+0.012
Neg logit contributions
 accidents-0.020
reating-0.016
 bamboo-0.016
THE-0.016
 mess-0.015
Token grab
Token ID5552
Feature activation-2.20e-02

Pos logit contributions
 accidents+0.010
reating+0.009
rim+0.008
THE+0.008
 vacations+0.008
Neg logit contributions
sed-0.009
 three-0.008
 hand-0.008
cious-0.008
ses-0.008
Token the
Token ID262
Feature activation+2.36e-03

Pos logit contributions
sed+0.046
ses+0.041
cious+0.040
2+0.040
ence+0.039
Neg logit contributions
 accidents-0.026
 mess-0.018
reating-0.018
THE-0.018
 bamboo-0.018
Token apple
Token ID17180
Feature activation+2.49e-03

Pos logit contributions
 accidents+0.003
 mess+0.002
reating+0.002
 emergency+0.002
THE+0.002
Neg logit contributions
sed-0.005
cious-0.004
2-0.004
sel-0.004
ence-0.004
Token.
Token ID13
Feature activation+1.48e-02

Pos logit contributions
 accidents+0.005
reating+0.004
rim+0.004
THE+0.004
 emergency+0.004
Neg logit contributions
sed-0.003
 three-0.003
cious-0.002
2-0.002
ses-0.002
Token She
Token ID1375
Feature activation+3.81e-03

Pos logit contributions
 accidents+0.027
reating+0.022
 mess+0.022
THE+0.022
 emergency+0.021
Neg logit contributions
sed-0.019
cious-0.018
sel-0.017
 three-0.017
ses-0.017
Token climbed
Token ID19952
Feature activation-5.54e-03

Pos logit contributions
 accidents+0.008
reating+0.007
rim+0.006
 mess+0.006
THE+0.006
Neg logit contributions
sed-0.004
 three-0.004
cious-0.004
ses-0.004
2-0.004
Token the
Token ID262
Feature activation+1.27e-02

Pos logit contributions
sed+0.009
 three+0.008
cious+0.008
2+0.008
ses+0.008
Neg logit contributions
 accidents-0.008
reating-0.007
rim-0.007
THE-0.006
 mess-0.006
Token day
Token ID1110
Feature activation-1.94e-02

Pos logit contributions
sed+0.029
ses+0.026
cious+0.026
sel+0.025
 wrists+0.024
Neg logit contributions
 accidents-0.015
THE-0.012
reating-0.012
 happen-0.012
ie-0.011
Token."
Token ID526
Feature activation-1.22e-02

Pos logit contributions
sed+0.036
ses+0.034
 eyebrows+0.029
 wrists+0.029
cious+0.028
Neg logit contributions
 accidents-0.026
THE-0.022
reating-0.022
 happen-0.021
 vacations-0.020
Token
Token ID220
Feature activation-3.29e-02

Pos logit contributions
sed+0.015
cious+0.013
 three+0.013
ses+0.013
2+0.012
Neg logit contributions
 accidents-0.023
reating-0.020
THE-0.020
rim-0.019
 mess-0.019
Token
Token ID198
Feature activation-6.76e-03

Pos logit contributions
sed+0.041
cious+0.036
 three+0.036
ses+0.035
sel+0.033
Neg logit contributions
 accidents-0.061
reating-0.054
THE-0.053
rim-0.051
 mess-0.051
Token
Token ID198
Feature activation-4.62e-04

Pos logit contributions
sed+0.007
 three+0.006
cious+0.006
2+0.006
ses+0.006
Neg logit contributions
 accidents-0.014
reating-0.012
rim-0.012
THE-0.012
 mess-0.012
TokenL
Token ID43
Feature activation+9.80e-03

Pos logit contributions
2+0.001
 three+0.001
 4+0.001
 seven+0.001
cious+0.001
Neg logit contributions
 accidents-0.001
 emergency-0.001
reating-0.001
rim-0.001
 happen-0.001
Tokenily
Token ID813
Feature activation+1.01e-02

Pos logit contributions
 accidents+0.016
reating+0.014
rim+0.013
THE+0.013
 mess+0.013
Neg logit contributions
sed-0.014
 three-0.013
2-0.013
cious-0.012
ses-0.012
Token felt
Token ID2936
Feature activation+4.35e-03

Pos logit contributions
 accidents+0.019
reating+0.016
THE+0.016
rim+0.016
 mess+0.016
Neg logit contributions
sed-0.012
ses-0.011
cious-0.011
 three-0.010
2-0.010
Token happy
Token ID3772
Feature activation-9.21e-03

Pos logit contributions
 accidents+0.008
reating+0.008
rim+0.007
lf+0.007
 emergency+0.007
Neg logit contributions
 three-0.005
 seven-0.005
sed-0.005
 hand-0.004
cious-0.004
Token and
Token ID290
Feature activation-8.29e-04

Pos logit contributions
sed+0.010
 three+0.010
cious+0.009
 hand+0.008
 finger+0.008
Neg logit contributions
 accidents-0.017
rim-0.016
reating-0.016
 emergency-0.016
iver-0.015
Token continued
Token ID3767
Feature activation+1.36e-02

Pos logit contributions
sed+0.001
2+0.001
ses+0.001
 three+0.001
ogram+0.001
Neg logit contributions
 accidents-0.001
THE-0.001
 mess-0.001
 happen-0.001
adays-0.001
Token see
Token ID766
Feature activation-1.69e-03

Pos logit contributions
 accidents+0.004
reating+0.004
THE+0.004
iver+0.004
 mess+0.004
Neg logit contributions
sed-0.004
ses-0.004
2-0.004
 three-0.004
cious-0.004
Token it
Token ID340
Feature activation-1.44e-02

Pos logit contributions
sed+0.003
cious+0.003
2+0.003
sel+0.003
 4+0.003
Neg logit contributions
 accidents-0.002
reating-0.002
 happen-0.002
 emergency-0.002
 bamboo-0.002
Token.
Token ID13
Feature activation+7.58e-03

Pos logit contributions
sed+0.013
 three+0.011
cious+0.011
ses+0.010
2+0.010
Neg logit contributions
 accidents-0.032
reating-0.028
rim-0.028
THE-0.027
 mess-0.027
Token She
Token ID1375
Feature activation-1.12e-02

Pos logit contributions
 accidents+0.011
THE+0.008
uggle+0.008
adays+0.008
reating+0.008
Neg logit contributions
sed-0.012
sel-0.012
cious-0.012
 three-0.011
 finger-0.011
Token hopes
Token ID7176
Feature activation+3.00e-03

Pos logit contributions
sed+0.014
cious+0.013
 three+0.013
2+0.013
 4+0.012
Neg logit contributions
 accidents-0.020
reating-0.017
 mess-0.017
THE-0.017
rim-0.016
Token the
Token ID262
Feature activation-1.97e-02

Pos logit contributions
 accidents+0.005
reating+0.004
THE+0.004
rim+0.004
 mess+0.004
Neg logit contributions
sed-0.005
 three-0.004
cious-0.004
2-0.004
ses-0.004
Token rabbit
Token ID22746
Feature activation-8.52e-03

Pos logit contributions
ence+0.029
sel+0.029
2+0.029
ogram+0.029
sed+0.029
Neg logit contributions
 accidents-0.030
 mess-0.026
 emergency-0.025
 fog-0.023
 noise-0.023
Token is
Token ID318
Feature activation+6.75e-03

Pos logit contributions
sed+0.012
 three+0.011
cious+0.010
2+0.010
ses+0.010
Neg logit contributions
 accidents-0.015
reating-0.013
THE-0.012
rim-0.012
 mess-0.012
Token safe
Token ID3338
Feature activation+1.08e-02

Pos logit contributions
 accidents+0.011
reating+0.010
rim+0.009
THE+0.009
adays+0.009
Neg logit contributions
sed-0.010
 three-0.009
ses-0.009
cious-0.009
 seven-0.008
Token and
Token ID290
Feature activation+3.01e-03

Pos logit contributions
 accidents+0.021
reating+0.018
THE+0.018
rim+0.018
 mess+0.017
Neg logit contributions
sed-0.013
 three-0.011
2-0.011
cious-0.011
ses-0.011
Token finds
Token ID7228
Feature activation-2.88e-04

Pos logit contributions
 accidents+0.005
reating+0.004
rim+0.004
THE+0.004
 mess+0.004
Neg logit contributions
sed-0.005
 three-0.004
cious-0.004
2-0.004
ses-0.004
Token scared
Token ID12008
Feature activation-5.07e-03

Pos logit contributions
sed+0.024
ses+0.022
cious+0.021
 wrists+0.021
2+0.020
Neg logit contributions
 accidents-0.027
 mess-0.022
reating-0.022
 emergency-0.020
THE-0.020
Token of
Token ID286
Feature activation+1.40e-02

Pos logit contributions
sed+0.007
 three+0.007
2+0.007
cious+0.007
 4+0.007
Neg logit contributions
 accidents-0.008
reating-0.007
rim-0.007
THE-0.007
iver-0.006
Token the
Token ID262
Feature activation+5.07e-03

Pos logit contributions
 accidents+0.022
reating+0.018
 mess+0.017
 emergency+0.017
rim+0.016
Neg logit contributions
sed-0.023
cious-0.021
ses-0.020
sel-0.020
2-0.020
Token dark
Token ID3223
Feature activation+1.15e-03

Pos logit contributions
 accidents+0.008
reating+0.007
rim+0.006
THE+0.006
iver+0.006
Neg logit contributions
sed-0.008
 three-0.008
cious-0.007
2-0.007
ses-0.007
Token.
Token ID13
Feature activation-2.35e-03

Pos logit contributions
 accidents+0.002
reating+0.002
rim+0.001
THE+0.001
 mess+0.001
Neg logit contributions
sed-0.002
 three-0.002
cious-0.002
2-0.002
ses-0.002
Token He
Token ID679
Feature activation-1.10e-02

Pos logit contributions
sed+0.004
cious+0.003
sel+0.003
ses+0.003
 three+0.003
Neg logit contributions
 accidents-0.004
THE-0.003
reating-0.003
adays-0.003
 cartoons-0.003
Token would
Token ID561
Feature activation+9.84e-03

Pos logit contributions
sed+0.014
 three+0.013
cious+0.012
ses+0.012
2+0.012
Neg logit contributions
 accidents-0.020
reating-0.017
THE-0.017
rim-0.017
 mess-0.017
Token always
Token ID1464
Feature activation+2.21e-03

Pos logit contributions
 accidents+0.017
reating+0.015
rim+0.014
THE+0.014
 emergency+0.014
Neg logit contributions
sed-0.013
 three-0.013
 4-0.012
cious-0.012
ses-0.012
Token remember
Token ID3505
Feature activation-1.47e-02

Pos logit contributions
 accidents+0.003
 mess+0.003
reating+0.003
 happen+0.003
 emergency+0.003
Neg logit contributions
sed-0.004
cious-0.003
ses-0.003
2-0.003
 three-0.003
Token the
Token ID262
Feature activation+4.98e-03

Pos logit contributions
sed+0.025
cious+0.023
ses+0.023
2+0.023
 three+0.022
Neg logit contributions
 accidents-0.022
reating-0.018
THE-0.017
 mess-0.016
rim-0.016
Token oy
Token ID35104
Feature activation-9.99e-03

Pos logit contributions
 accidents+0.008
reating+0.006
 mess+0.006
rim+0.006
THE+0.006
Neg logit contributions
sed-0.008
cious-0.007
ses-0.007
 three-0.007
2-0.007
Token.
Token ID13
Feature activation+1.24e-02

Pos logit contributions
sed+0.013
 three+0.012
cious+0.011
2+0.011
ses+0.011
Neg logit contributions
 accidents-0.022
reating-0.020
rim-0.019
THE-0.019
 mess-0.018
Token Mom
Token ID11254
Feature activation+2.70e-02

Pos logit contributions
 accidents+0.020
 mess+0.016
reating+0.016
THE+0.016
 cartoons+0.015
Neg logit contributions
sed-0.018
cious-0.017
ses-0.016
sel-0.016
2-0.016
Tokenmy
Token ID1820
Feature activation+1.95e-02

Pos logit contributions
 accidents+0.039
reating+0.032
rim+0.031
THE+0.030
 mess+0.030
Neg logit contributions
sed-0.045
 three-0.042
cious-0.041
2-0.041
ses-0.040
Token found
Token ID1043
Feature activation+6.96e-03

Pos logit contributions
 accidents+0.037
 mess+0.031
reating+0.030
rim+0.030
lf+0.030
Neg logit contributions
sed-0.024
ses-0.022
cious-0.021
 three-0.020
2-0.020
Token the
Token ID262
Feature activation-6.18e-03

Pos logit contributions
 accidents+0.012
reating+0.010
rim+0.010
THE+0.010
 mess+0.010
Neg logit contributions
sed-0.010
cious-0.009
 three-0.009
2-0.009
ses-0.009
Token key
Token ID1994
Feature activation-1.06e-02

Pos logit contributions
sed+0.008
 three+0.007
ses+0.006
 finger+0.006
 hand+0.006
Neg logit contributions
 accidents-0.012
rim-0.011
reating-0.011
THE-0.010
iver-0.010
Token and
Token ID290
Feature activation-6.13e-04

Pos logit contributions
sed+0.019
cious+0.017
ses+0.017
2+0.017
 4+0.016
Neg logit contributions
 accidents-0.015
reating-0.012
 mess-0.012
lf-0.011
 happen-0.011
Token restored
Token ID15032
Feature activation+8.21e-03

Pos logit contributions
sed+0.001
cious+0.001
 three+0.001
ses+0.001
2+0.001
Neg logit contributions
 accidents-0.001
reating-0.001
 mess-0.001
rim-0.001
THE-0.001
Token it
Token ID340
Feature activation+1.27e-02

Pos logit contributions
 accidents+0.013
reating+0.011
 mess+0.010
THE+0.010
rim+0.010
Neg logit contributions
sed-0.013
cious-0.012
ses-0.012
 three-0.012
2-0.012
Token to
Token ID284
Feature activation+1.74e-02

Pos logit contributions
 accidents+0.024
reating+0.021
 mess+0.020
THE+0.020
rim+0.020
Neg logit contributions
sed-0.015
 three-0.014
cious-0.014
2-0.013
ses-0.013
Token Lily
Token ID20037
Feature activation-5.26e-03

Pos logit contributions
 accidents+0.021
reating+0.017
 mess+0.017
 emergency+0.016
 happen+0.016
Neg logit contributions
sed-0.034
cious-0.033
ogram-0.032
2-0.031
ses-0.031
Token did
Token ID750
Feature activation-2.28e-03

Pos logit contributions
 accidents+0.013
 mess+0.011
THE+0.011
reating+0.011
 emergency+0.011
Neg logit contributions
sed-0.008
cious-0.007
2-0.007
 three-0.007
ses-0.007
Token not
Token ID407
Feature activation+3.92e-04

Pos logit contributions
sed+0.003
 three+0.003
cious+0.002
ses+0.002
2+0.002
Neg logit contributions
 accidents-0.004
reating-0.004
rim-0.004
THE-0.003
 emergency-0.003
Token want
Token ID765
Feature activation-2.28e-02

Pos logit contributions
 accidents+0.001
reating+0.001
rim+0.001
THE+0.001
 emergency+0.001
Neg logit contributions
sed-0.000
 three-0.000
cious-0.000
2-0.000
ses-0.000
Token to
Token ID284
Feature activation+3.04e-03

Pos logit contributions
sed+0.018
cious+0.015
2+0.014
 three+0.014
ses+0.014
Neg logit contributions
 accidents-0.053
reating-0.048
rim-0.046
THE-0.046
 mess-0.046
Token be
Token ID307
Feature activation-2.22e-02

Pos logit contributions
 accidents+0.006
THE+0.005
reating+0.005
tex+0.005
rim+0.005
Neg logit contributions
 three-0.004
 hand-0.003
 each-0.003
 finger-0.003
sed-0.003
Token apart
Token ID5475
Feature activation-1.71e-02

Pos logit contributions
sed+0.037
cious+0.035
2+0.034
ses+0.034
sel+0.034
Neg logit contributions
 accidents-0.033
 mess-0.026
reating-0.026
rim-0.026
 emergency-0.025
Token.
Token ID13
Feature activation+4.28e-03

Pos logit contributions
sed+0.021
cious+0.019
ses+0.018
2+0.018
 three+0.018
Neg logit contributions
 accidents-0.033
reating-0.028
 mess-0.027
THE-0.027
rim-0.027
Token They
Token ID1119
Feature activation+4.48e-03

Pos logit contributions
 accidents+0.006
Later+0.005
 Bun+0.004
 noise+0.004
 cartoons+0.004
Neg logit contributions
sel-0.007
cious-0.007
sed-0.007
ence-0.006
 finger-0.006
Token thought
Token ID1807
Feature activation-9.36e-03

Pos logit contributions
 accidents+0.008
 mess+0.007
THE+0.006
 emergency+0.006
 cancel+0.006
Neg logit contributions
sed-0.006
cious-0.005
2-0.005
ses-0.005
sel-0.005
Token and
Token ID290
Feature activation+3.40e-03

Pos logit contributions
sed+0.014
cious+0.013
 three+0.012
2+0.012
ses+0.012
Neg logit contributions
 accidents-0.016
reating-0.013
 mess-0.012
 emergency-0.012
THE-0.012
Token thought
Token ID1807
Feature activation-5.27e-03

Pos logit contributions
 accidents+0.007
 mess+0.006
 happen+0.005
reating+0.005
 noise+0.005
Neg logit contributions
sed-0.004
cious-0.004
2-0.004
ogram-0.004
sel-0.004
Token again
Token ID757
Feature activation-1.43e-02

Pos logit contributions
 three+0.006
sed+0.006
cious+0.005
 hand+0.005
 finger+0.005
Neg logit contributions
 accidents-0.014
rim-0.013
THE-0.013
reating-0.013
tex-0.013
Token.
Token ID13
Feature activation-6.73e-03

Pos logit contributions
sed+0.022
ses+0.020
2+0.018
cious+0.018
ogram+0.018
Neg logit contributions
 accidents-0.024
 mess-0.020
reating-0.020
 happen-0.019
THE-0.019
Token Tom
Token ID4186
Feature activation+2.27e-03

Pos logit contributions
sed+0.010
cious+0.009
 finger+0.009
sel+0.009
ses+0.009
Neg logit contributions
 accidents-0.010
adays-0.009
reating-0.009
 mess-0.008
 cartoons-0.008
Token asked
Token ID1965
Feature activation-1.23e-02

Pos logit contributions
 accidents+0.004
 mess+0.003
reating+0.003
THE+0.003
rim+0.003
Neg logit contributions
sed-0.003
ses-0.003
cious-0.003
 three-0.003
2-0.003
Token why
Token ID1521
Feature activation-8.69e-03

Pos logit contributions
sed+0.023
cious+0.021
ses+0.021
 three+0.021
2+0.020
Neg logit contributions
 accidents-0.016
reating-0.013
rim-0.012
THE-0.012
 emergency-0.012
Token and
Token ID290
Feature activation-9.60e-03

Pos logit contributions
sed+0.017
ses+0.015
cious+0.015
 three+0.014
2+0.014
Neg logit contributions
 accidents-0.012
reating-0.009
 mess-0.008
lf-0.008
 happen-0.008
Token his
Token ID465
Feature activation-9.89e-03

Pos logit contributions
sed+0.018
ses+0.017
cious+0.016
2+0.016
sel+0.015
Neg logit contributions
 accidents-0.013
reating-0.011
 emergency-0.010
lf-0.009
 mess-0.009
Token mom
Token ID1995
Feature activation+6.09e-03

Pos logit contributions
sed+0.013
2+0.012
cious+0.012
 three+0.012
ses+0.012
Neg logit contributions
 accidents-0.017
reating-0.015
 mess-0.014
rim-0.014
 emergency-0.014
Token told
Token ID1297
Feature activation+9.16e-03

Pos logit contributions
 accidents+0.012
reating+0.010
 mess+0.010
rim+0.010
THE+0.009
Neg logit contributions
sed-0.008
ses-0.007
cious-0.007
2-0.006
 three-0.006
Token him
Token ID683
Feature activation-1.49e-03

Pos logit contributions
 accidents+0.010
THE+0.008
rim+0.008
reating+0.007
iver+0.007
Neg logit contributions
 three-0.018
sed-0.017
2-0.017
 4-0.016
cious-0.016
Token that
Token ID326
Feature activation+2.47e-03

Pos logit contributions
sed+0.002
ses+0.002
2+0.002
 three+0.002
cious+0.002
Neg logit contributions
 accidents-0.002
reating-0.002
THE-0.002
 mess-0.002
rim-0.002
TokenYes
Token ID5297
Feature activation-1.00e-02

Pos logit contributions
 accidents+0.016
THE+0.013
reating+0.013
rim+0.011
Time+0.011
Neg logit contributions
sed-0.034
ses-0.031
cious-0.030
 finger-0.030
 three-0.029
Token,
Token ID11
Feature activation-1.11e-02

Pos logit contributions
sed+0.015
 three+0.013
cious+0.013
2+0.013
ses+0.013
Neg logit contributions
 accidents-0.017
reating-0.014
rim-0.014
THE-0.014
 mess-0.013
Token they
Token ID484
Feature activation-8.02e-04

Pos logit contributions
sed+0.018
ses+0.016
cious+0.016
2+0.016
sel+0.015
Neg logit contributions
 accidents-0.017
reating-0.014
 mess-0.013
THE-0.013
rim-0.013
Token are
Token ID389
Feature activation+1.37e-02

Pos logit contributions
sed+0.001
 three+0.001
cious+0.001
ses+0.001
2+0.001
Neg logit contributions
 accidents-0.001
reating-0.001
 mess-0.001
THE-0.001
rim-0.001
Token ost
Token ID23619
Feature activation-2.17e-02

Pos logit contributions
 accidents+0.021
reating+0.017
 mess+0.017
THE+0.017
rim+0.017
Neg logit contributions
sed-0.022
 three-0.019
2-0.019
ses-0.019
cious-0.019
Tokenric
Token ID1173
Feature activation-4.01e-02

Pos logit contributions
 3+0.040
 three+0.040
 Sure+0.036
 hand+0.035
 finger+0.035
Neg logit contributions
rim-0.028
ie-0.023
asis-0.021
iver-0.021
 accidents-0.020
Tokenhes
Token ID956
Feature activation-1.24e-02

Pos logit contributions
sed+0.073
ses+0.065
 wrists+0.064
 three+0.064
 3+0.063
Neg logit contributions
 accidents-0.049
THE-0.042
rim-0.042
reating-0.041
 mess-0.040
Token,"
Token ID553
Feature activation-1.26e-02

Pos logit contributions
sed+0.020
2+0.018
cious+0.018
ses+0.017
 three+0.017
Neg logit contributions
 accidents-0.019
reating-0.016
rim-0.016
 mess-0.015
THE-0.015
Token mom
Token ID1995
Feature activation+6.17e-04

Pos logit contributions
sed+0.017
cious+0.016
2+0.016
sel+0.015
ses+0.015
Neg logit contributions
 accidents-0.022
reating-0.018
rim-0.017
 mess-0.017
THE-0.017
Token said
Token ID531
Feature activation+6.37e-03

Pos logit contributions
 accidents+0.001
reating+0.001
rim+0.001
lf+0.001
ie+0.001
Neg logit contributions
sed-0.001
ses-0.001
cious-0.001
 three-0.001
 wrists-0.001
Token.
Token ID13
Feature activation-1.24e-03

Pos logit contributions
 accidents+0.009
 off+0.007
 happen+0.007
 noise+0.007
 cartoons+0.007
Neg logit contributions
ogram-0.010
sed-0.010
2-0.010
ses-0.010
 seven-0.010
Token.
Token ID13
Feature activation-7.80e-03

Pos logit contributions
 accidents+0.002
reating+0.001
 mess+0.001
 happen+0.001
 off+0.001
Neg logit contributions
sed-0.001
ses-0.001
ogram-0.001
cious-0.001
2-0.001
Token\n
Token ID198
Feature activation-2.36e-03

Pos logit contributions
sed+0.011
cious+0.011
sel+0.011
 wrists+0.010
ence+0.010
Neg logit contributions
 accidents-0.012
THE-0.010
 bamboo-0.010
reating-0.010
Later-0.010
Token\n
Token ID198
Feature activation-5.79e-04

Pos logit contributions
sed+0.002
 three+0.002
2+0.002
cious+0.002
ses+0.002
Neg logit contributions
 accidents-0.005
reating-0.005
rim-0.004
 emergency-0.004
 mess-0.004
TokenThey
Token ID2990
Feature activation-5.12e-03

Pos logit contributions
sed+0.001
 three+0.001
2+0.001
cious+0.001
 seven+0.001
Neg logit contributions
 accidents-0.001
reating-0.001
 emergency-0.001
rim-0.001
 mess-0.001
Token hugged
Token ID41130
Feature activation+7.41e-03

Pos logit contributions
sed+0.008
ses+0.008
cious+0.007
ogram+0.007
2+0.007
Neg logit contributions
 accidents-0.007
THE-0.006
 mess-0.006
 cancel-0.006
tex-0.006
Token each
Token ID1123
Feature activation-3.97e-02

Pos logit contributions
 accidents+0.016
reating+0.014
THE+0.014
rim+0.013
 mess+0.013
Neg logit contributions
sed-0.007
cious-0.006
 three-0.006
2-0.006
ses-0.006
Token other
Token ID584
Feature activation+9.58e-03

Pos logit contributions
sed+0.085
ses+0.075
ence+0.074
cious+0.073
 wrists+0.069
Neg logit contributions
reating-0.043
 accidents-0.041
 mess-0.037
THE-0.037
tex-0.037
Token and
Token ID290
Feature activation-1.00e-03

Pos logit contributions
 accidents+0.017
reating+0.014
THE+0.014
 mess+0.014
rim+0.013
Neg logit contributions
sed-0.014
ses-0.012
cious-0.012
2-0.012
 three-0.012
Token went
Token ID1816
Feature activation+9.54e-03

Pos logit contributions
sed+0.001
2+0.001
cious+0.001
ogram+0.001
 three+0.001
Neg logit contributions
 accidents-0.002
 mess-0.001
THE-0.001
reating-0.001
tex-0.001
Token back
Token ID736
Feature activation+1.13e-03

Pos logit contributions
 accidents+0.019
reating+0.017
rim+0.017
THE+0.016
 mess+0.016
Neg logit contributions
sed-0.010
 three-0.009
cious-0.009
2-0.009
ses-0.009
Token to
Token ID284
Feature activation+2.08e-02

Pos logit contributions
 accidents+0.002
reating+0.002
uggle+0.002
 mess+0.002
THE+0.002
Neg logit contributions
sed-0.001
sel-0.001
2-0.001
cious-0.001
ses-0.001
Token's
Token ID338
Feature activation+4.65e-03

Pos logit contributions
 three+0.012
 hand+0.008
sed+0.007
cious+0.007
 one+0.007
Neg logit contributions
 accidents-0.066
reating-0.065
rim-0.064
 bamboo-0.061
lf-0.060
Token soar
Token ID48701
Feature activation-2.29e-02

Pos logit contributions
 accidents+0.008
reating+0.007
rim+0.006
THE+0.006
 mess+0.006
Neg logit contributions
 three-0.007
sed-0.007
 hand-0.006
 finger-0.006
 4-0.006
Token together
Token ID1978
Feature activation-2.71e-02

Pos logit contributions
sed+0.029
 three+0.029
 seven+0.026
 4+0.026
2+0.026
Neg logit contributions
 accidents-0.041
reating-0.036
rim-0.035
 emergency-0.034
 mess-0.033
Token to
Token ID284
Feature activation+8.25e-03

Pos logit contributions
sed+0.047
ses+0.043
cious+0.042
sel+0.040
2+0.040
Neg logit contributions
 accidents-0.037
reating-0.032
THE-0.032
 mess-0.030
 cartoons-0.028
Token the
Token ID262
Feature activation+6.02e-03

Pos logit contributions
 accidents+0.013
reating+0.011
 mess+0.010
 happen+0.010
THE+0.010
Neg logit contributions
sed-0.014
cious-0.012
ses-0.012
sel-0.012
2-0.011
Token top
Token ID1353
Feature activation-4.01e-02

Pos logit contributions
 accidents+0.009
reating+0.008
rim+0.008
THE+0.007
 mess+0.007
Neg logit contributions
sed-0.009
 three-0.009
2-0.008
 4-0.008
cious-0.008
Token of
Token ID286
Feature activation+1.35e-02

Pos logit contributions
sed+0.076
cious+0.069
ses+0.068
sel+0.064
 three+0.064
Neg logit contributions
 accidents-0.051
reating-0.044
THE-0.041
 mess-0.041
rim-0.038
Token the
Token ID262
Feature activation+1.62e-02

Pos logit contributions
 accidents+0.022
reating+0.019
 mess+0.018
THE+0.018
 emergency+0.018
Neg logit contributions
sed-0.020
cious-0.019
2-0.018
ses-0.018
 three-0.018
Token mountain
Token ID8598
Feature activation-4.18e-02

Pos logit contributions
 accidents+0.025
reating+0.022
rim+0.021
THE+0.020
iver+0.019
Neg logit contributions
sed-0.025
 three-0.024
2-0.022
 4-0.022
 seven-0.022
Token!"
Token ID2474
Feature activation-1.64e-02

Pos logit contributions
sed+0.078
cious+0.074
ses+0.071
 eyebrows+0.069
sel+0.068
Neg logit contributions
 accidents-0.049
 mess-0.044
THE-0.043
reating-0.041
asis-0.039
Token But
Token ID887
Feature activation+2.08e-03

Pos logit contributions
sed+0.033
cious+0.031
 three+0.031
2+0.031
ses+0.031
Neg logit contributions
 accidents-0.017
reating-0.012
THE-0.012
 mess-0.012
 emergency-0.012
Token her
Token ID607
Feature activation-9.75e-03

Pos logit contributions
sed+0.028
 three+0.024
cious+0.024
ses+0.024
2+0.024
Neg logit contributions
 accidents-0.039
reating-0.034
rim-0.032
THE-0.032
 mess-0.032
Token best
Token ID1266
Feature activation-2.19e-02

Pos logit contributions
sed+0.015
2+0.013
ses+0.013
cious+0.013
 three+0.013
Neg logit contributions
 accidents-0.016
reating-0.013
 mess-0.013
THE-0.013
 emergency-0.013
Token,
Token ID11
Feature activation-5.56e-03

Pos logit contributions
sed+0.027
2+0.024
cious+0.023
ses+0.023
 three+0.023
Neg logit contributions
 accidents-0.043
reating-0.036
 mess-0.035
rim-0.035
THE-0.034
Token she
Token ID673
Feature activation-1.78e-03

Pos logit contributions
sed+0.008
 three+0.008
cious+0.008
2+0.008
ses+0.007
Neg logit contributions
 accidents-0.009
reating-0.007
rim-0.007
THE-0.007
 mess-0.007
Token could
Token ID714
Feature activation+3.03e-03

Pos logit contributions
sed+0.002
cious+0.002
 three+0.002
2+0.002
ses+0.002
Neg logit contributions
 accidents-0.004
reating-0.003
THE-0.003
 mess-0.003
rim-0.003
Token win
Token ID1592
Feature activation-3.93e-02

Pos logit contributions
 accidents+0.005
reating+0.005
rim+0.005
THE+0.004
 emergency+0.004
Neg logit contributions
sed-0.004
 three-0.004
cious-0.003
 4-0.003
2-0.003
Token anything
Token ID1997
Feature activation+8.99e-03

Pos logit contributions
sed+0.058
ses+0.051
cious+0.051
 three+0.050
2+0.049
Neg logit contributions
 accidents-0.068
reating-0.057
 mess-0.054
THE-0.054
rim-0.053
Token she
Token ID673
Feature activation-1.07e-02

Pos logit contributions
 accidents+0.015
reating+0.014
rim+0.014
THE+0.013
iver+0.013
Neg logit contributions
sed-0.011
 three-0.011
cious-0.010
2-0.010
 4-0.010
Token wanted
Token ID2227
Feature activation-7.83e-03

Pos logit contributions
sed+0.012
ses+0.010
cious+0.010
 three+0.010
2+0.010
Neg logit contributions
 accidents-0.021
 mess-0.018
THE-0.018
 happen-0.018
reating-0.018
Token.
Token ID13
Feature activation+1.47e-02

Pos logit contributions
sed+0.010
ses+0.009
cious+0.008
2+0.008
 three+0.008
Neg logit contributions
 accidents-0.015
reating-0.013
THE-0.012
 mess-0.012
 happen-0.012
Token<|endoftext|>
Token ID50256
Feature activation+2.18e-02

Pos logit contributions
 accidents+0.028
reating+0.025
THE+0.024
rim+0.023
 mess+0.023
Neg logit contributions
sed-0.018
cious-0.016
 three-0.016
ses-0.015
2-0.015
Token walk
Token ID2513
Feature activation-1.29e-02

Pos logit contributions
 accidents+0.021
reating+0.017
 mess+0.016
THE+0.016
 emergency+0.016
Neg logit contributions
sed-0.030
ses-0.028
cious-0.027
2-0.027
 three-0.027
Token in
Token ID287
Feature activation-1.72e-02

Pos logit contributions
sed+0.027
ses+0.025
cious+0.023
 finger+0.022
 eyebrows+0.022
Neg logit contributions
 accidents-0.013
THE-0.012
ie-0.012
reating-0.012
 vacations-0.011
Token the
Token ID262
Feature activation+3.76e-03

Pos logit contributions
sed+0.034
cious+0.030
ses+0.029
2+0.028
sel+0.028
Neg logit contributions
 accidents-0.023
reating-0.020
THE-0.020
 mess-0.019
 vacations-0.019
Token woods
Token ID16479
Feature activation-2.18e-02

Pos logit contributions
 accidents+0.006
rim+0.005
reating+0.005
iver+0.004
 happen+0.004
Neg logit contributions
 three-0.006
sed-0.006
 4-0.006
2-0.005
 seven-0.005
Token.
Token ID13
Feature activation-5.91e-04

Pos logit contributions
sed+0.041
ses+0.037
cious+0.035
ige+0.035
 wrists+0.034
Neg logit contributions
 accidents-0.025
 mess-0.024
reating-0.023
THE-0.023
 roads-0.022
Token
Token ID220
Feature activation-3.87e-02

Pos logit contributions
sed+0.001
ses+0.001
cious+0.001
 finger+0.001
sel+0.001
Neg logit contributions
 accidents-0.001
 vacations-0.001
THE-0.001
reating-0.001
 mess-0.001
Token\n
Token ID198
Feature activation+1.45e-03

Pos logit contributions
sed+0.070
cious+0.064
ses+0.063
 wrists+0.060
sel+0.060
Neg logit contributions
 accidents-0.050
THE-0.046
reating-0.042
iver-0.041
uggle-0.041
Token\n
Token ID198
Feature activation+4.85e-03

Pos logit contributions
 accidents+0.002
reating+0.002
THE+0.002
 vacations+0.002
rim+0.002
Neg logit contributions
sed-0.002
ses-0.002
cious-0.002
 finger-0.002
 three-0.002
TokenThe
Token ID464
Feature activation-6.88e-03

Pos logit contributions
 accidents+0.007
reating+0.005
rim+0.005
 emergency+0.005
 mess+0.005
Neg logit contributions
sed-0.008
 three-0.008
cious-0.008
2-0.008
 4-0.008
Token forest
Token ID8222
Feature activation-8.06e-03

Pos logit contributions
sed+0.010
 three+0.010
 4+0.008
ses+0.008
 seven+0.008
Neg logit contributions
 accidents-0.013
rim-0.011
reating-0.011
iver-0.011
tex-0.010
Token was
Token ID373
Feature activation-1.00e-02

Pos logit contributions
sed+0.012
cious+0.011
ses+0.011
2+0.011
 three+0.011
Neg logit contributions
 accidents-0.013
 mess-0.011
reating-0.011
rim-0.011
THE-0.010