TOP ACTIVATIONS
MAX = 5.645

The dog w ags his tail more . He
The dog w ags its tail and l icks
The dog w ags his tail . He sees
The dog w ags its tail . It likes
The dog w ags its tail again . It
pet Rex . He w ags his tail and l icks
her band . It w ags its tail and bar ks
The dog w ags its tail . It runs
see Sam . He w ags his tail and l icks
not answer . It w ags its tail . It l
hello . The dog w ags its tail and l icks
hello . The dog w ags his tail . He likes
?" Tom w ags his tail and l icks
their hands . Rex w ags his tail . He l
The dog w ags its tail . It l
ike bar ks and w ags his tail . He is
The dog w ags its tail . It likes
stops barking . It w ags its tail . It l
Ben 's feet and w ags his tail .
The dog w ags its tail . It l

MIN ACTIVATIONS
MIN = -4.451

We have different likes and disl ikes . We can be
had hamm ers , screw drivers , pl iers , and
for some funny little sun fl owers . Her m ama
garden and admire his sun fl owers . They also see
careful , Lily . Cater pill ars can be tricky .
back with lots of sun fl owers to fill the prince
know , Lily . Cater pill ars need leaves and air
Did you know that cater pill ars turn into beautiful butterflies
watching her eat the pine con es . The squirrel said
da is ies , sun fl owers , and tul ips
away in her pig gy bank every week . This way
happy with all the sun fl owers and he ordered his
<|endoftext|> L ily loved sun fl owers . She liked to
and lady bugs and cater pill ars ." Tim
lots of big juicy pine con es , so Penny started
and watched as the cater pill ars grew bigger and bigger
stretch out and enormous sun fl owers would show their bright
the biggest and brightest sun fl owers she could find .
and Lily felt even sad der . From that day on
he noticed some other cater pill ars stretching towards it too

INTERVAL 3.121 - 5.645
CONTAINS 0.006%

. She grabbed her nap kin and covered her face .
She found some old r ags which were dirty . She
. He grabbed the nap kin and wiped his face .
iffs them . It w ags its tail . It l
mom gave her a nap kin and showed her how to

INTERVAL 0.597 - 3.121
CONTAINS 8.466%

He now felt warm and cozy . His cold ness had
" We 're glad you 're safe . We love you
. Lucy and her brother stood back and cl apped with
teacher , Miss Rose . One day , Lily went to
safe . She pushed it back up the hill , taking

INTERVAL -1.927 - 0.597
CONTAINS 91.423%

prayed with her . They prayed to God and
the instrument until it looked like new again .
tree with lots of leaves . He ran over to the
beautiful season . She was so grateful to be able to
like to play baseball , but now we can 't ,"

INTERVAL -4.451 - -1.927
CONTAINS 0.105%

, rabbits , and fox es all gathered around to listen
- old t ipt o ed up to the toilet ,
Her mom said , " Good idea , Mia . Let
Soon , the broccoli - fl avored l oll ip ops
. It was a special part of the family . Everyone
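The four INTERVAL ranges above appear to be equal-width bins spanning MIN to MAX. A minimal sketch, assuming that binning scheme (the helper name `interval_edges` is hypothetical and not taken from the dashboard's code), reproduces the listed boundaries and confirms that the CONTAINS percentages sum to 100:

```python
# Sketch only: assumes the dashboard splits [MIN, MAX] into four equal-width bins.
# `interval_edges` is a hypothetical helper, not part of any dashboard library.

def interval_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# MIN and MAX as printed in the dump (already rounded to three decimals).
edges = [round(e, 3) for e in interval_edges(-4.451, 5.645, 4)]

# CONTAINS percentages as printed, ordered from the highest interval to the lowest.
shares = [0.006, 8.466, 91.423, 0.105]
total_share = sum(shares)

print(edges)
print(round(total_share, 3))
```

Because MIN and MAX are themselves rounded in the dump, edges computed this way could differ from the dashboard's internal values in the last decimal place.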
Token↵
Token ID198
Feature activation+2.14e-01

Pos logit contributions
over+0.037
date+0.036
ute+0.035
apping+0.034
pron+0.033
Neg logit contributions
 imag-0.030
 scratches-0.028
 pulls-0.027
 pret-0.027
 hides-0.026
Token↵
Token ID198
Feature activation+2.26e-01

Pos logit contributions
over+0.009
date+0.008
apping+0.008
ute+0.008
he+0.008
Neg logit contributions
 imag-0.011
 hides-0.010
 scratches-0.010
 pulls-0.010
 pret-0.010
TokenThe
Token ID464
Feature activation+4.86e-01

Pos logit contributions
over+0.009
date+0.008
ute+0.008
apping+0.008
he+0.008
Neg logit contributions
 imag-0.012
 scratches-0.012
 pulls-0.012
 hides-0.011
 pret-0.011
Token dog
Token ID3290
Feature activation-5.88e-01

Pos logit contributions
over+0.024
date+0.024
apping+0.024
ute+0.022
he+0.022
Neg logit contributions
 imag-0.020
 scratches-0.020
 hides-0.019
 pulls-0.019
 screams-0.019
Token w
Token ID266
Feature activation-1.10e+00

Pos logit contributions
drivers+0.022
 imag+0.021
 fault+0.018
 supervision+0.018
 hides+0.016
Neg logit contributions
over-0.029
 year-0.028
apping-0.028
ute-0.028
he-0.027
Tokenags
Token ID3775
Feature activation+4.99e+00

Pos logit contributions
 imag+0.068
 scratches+0.062
 pulls+0.061
 hides+0.061
 pret+0.060
Neg logit contributions
over-0.038
date-0.036
pron-0.033
 year-0.033
ute-0.033
Token his
Token ID465
Feature activation-1.75e-01

Pos logit contributions
over+0.286
date+0.266
apping+0.263
ute+0.262
he+0.259
Neg logit contributions
 imag-0.197
 scratches-0.172
 hides-0.170
 pret-0.168
 pulls-0.168
Token tail
Token ID7894
Feature activation+7.53e-02

Pos logit contributions
 imag+0.008
 scratches+0.008
 pulls+0.007
 pret+0.007
 hides+0.007
Neg logit contributions
over-0.008
date-0.008
he-0.008
apping-0.008
ute-0.008
Token more
Token ID517
Feature activation+2.35e-01

Pos logit contributions
over+0.004
date+0.003
ute+0.003
apping+0.003
pron+0.003
Neg logit contributions
 imag-0.004
 scratches-0.003
 pulls-0.003
 hides-0.003
 pret-0.003
Token.
Token ID13
Feature activation+6.11e-01

Pos logit contributions
date+0.012
over+0.011
ute+0.011
pron+0.011
he+0.010
Neg logit contributions
 imag-0.010
 scratches-0.009
 screams-0.009
 pulls-0.009
 cries-0.009
Token He
Token ID679
Feature activation+3.49e-01

Pos logit contributions
over+0.032
ute+0.030
date+0.029
apping+0.029
 year+0.028
Neg logit contributions
 imag-0.027
 scratches-0.024
 pulls-0.024
 hides-0.024
 pret-0.024
Token↵
Token ID198
Feature activation+8.60e-02

Pos logit contributions
over+0.019
date+0.018
ute+0.018
apping+0.017
he+0.017
Neg logit contributions
 imag-0.012
 pret-0.012
 scratches-0.011
 pulls-0.011
 knocks-0.011
Token↵
Token ID198
Feature activation+1.20e-01

Pos logit contributions
over+0.005
date+0.004
ute+0.004
apping+0.004
pron+0.004
Neg logit contributions
 imag-0.003
 scratches-0.003
 pulls-0.003
 pret-0.003
 knocks-0.003
TokenThe
Token ID464
Feature activation+5.36e-01

Pos logit contributions
over+0.006
ute+0.005
date+0.005
apping+0.005
he+0.005
Neg logit contributions
 imag-0.005
 scratches-0.005
 pulls-0.005
 pret-0.005
 hides-0.005
Token dog
Token ID3290
Feature activation-5.22e-01

Pos logit contributions
over+0.028
date+0.027
apping+0.027
he+0.025
abby+0.025
Neg logit contributions
 imag-0.022
 scratches-0.021
 screams-0.020
 pret-0.020
 hides-0.020
Token w
Token ID266
Feature activation-9.85e-01

Pos logit contributions
 imag+0.018
drivers+0.017
 fault+0.015
 scratches+0.014
 supervision+0.014
Neg logit contributions
over-0.027
 year-0.027
apping-0.026
ute-0.026
he-0.025
Tokenags
Token ID3775
Feature activation+4.96e+00

Pos logit contributions
 imag+0.060
 scratches+0.055
 pulls+0.054
 hides+0.053
 pret+0.052
Neg logit contributions
over-0.035
date-0.033
pron-0.032
 year-0.031
ute-0.031
Token its
Token ID663
Feature activation-2.24e-01

Pos logit contributions
over+0.300
date+0.281
ute+0.277
apping+0.275
he+0.273
Neg logit contributions
 imag-0.176
 scratches-0.151
 pret-0.149
 hides-0.149
 pulls-0.148
Token tail
Token ID7894
Feature activation+1.07e-01

Pos logit contributions
 imag+0.011
 scratches+0.010
 hides+0.010
 pulls+0.010
 pret+0.010
Neg logit contributions
over-0.010
date-0.010
he-0.009
apping-0.009
ute-0.009
Token and
Token ID290
Feature activation+3.55e-01

Pos logit contributions
over+0.005
date+0.005
ute+0.005
pron+0.005
he+0.005
Neg logit contributions
 imag-0.005
 scratches-0.005
 pulls-0.005
 hides-0.004
 pret-0.004
Token l
Token ID300
Feature activation-7.33e-01

Pos logit contributions
over+0.015
 year+0.014
pron+0.014
ute+0.014
mber+0.014
Neg logit contributions
drivers-0.015
 imag-0.014
 fault-0.013
 supervision-0.012
 fourteen-0.011
Tokenicks
Token ID3378
Feature activation-5.49e-01

Pos logit contributions
 imag+0.022
 hides+0.018
 scratches+0.018
 pulls+0.017
 pret+0.017
Neg logit contributions
over-0.049
date-0.046
ute-0.046
apping-0.046
he-0.045
Token↵
Token ID198
Feature activation+7.74e-02

Pos logit contributions
over+0.020
date+0.019
ute+0.019
apping+0.019
 year+0.018
Neg logit contributions
 imag-0.017
 scratches-0.015
 hides-0.015
 pulls-0.015
 pret-0.015
Token↵
Token ID198
Feature activation+9.11e-02

Pos logit contributions
over+0.004
date+0.003
apping+0.003
ute+0.003
pron+0.003
Neg logit contributions
 imag-0.004
 scratches-0.003
 pulls-0.003
 hides-0.003
 pret-0.003
TokenThe
Token ID464
Feature activation+3.96e-01

Pos logit contributions
over+0.004
ute+0.003
date+0.003
apping+0.003
pron+0.003
Neg logit contributions
 imag-0.005
 scratches-0.005
 pulls-0.005
 hides-0.005
 pret-0.005
Token dog
Token ID3290
Feature activation-6.71e-01

Pos logit contributions
over+0.021
date+0.020
apping+0.020
abby+0.019
ute+0.019
Neg logit contributions
 scratches-0.016
 imag-0.016
 hides-0.015
 screams-0.015
 pulls-0.015
Token w
Token ID266
Feature activation-1.05e+00

Pos logit contributions
drivers+0.027
 imag+0.026
 fault+0.023
 supervision+0.022
 disl+0.021
Neg logit contributions
 year-0.030
over-0.030
apping-0.029
ute-0.028
 world-0.028
Tokenags
Token ID3775
Feature activation+4.94e+00

Pos logit contributions
 imag+0.069
 hides+0.062
 scratches+0.062
 pulls+0.062
 pret+0.061
Neg logit contributions
over-0.034
date-0.029
apping-0.029
ute-0.029
he-0.028
Token his
Token ID465
Feature activation-8.93e-02

Pos logit contributions
over+0.276
date+0.255
apping+0.254
ute+0.253
he+0.249
Neg logit contributions
 imag-0.203
 hides-0.175
 scratches-0.174
 pulls-0.172
 pret-0.171
Token tail
Token ID7894
Feature activation-3.61e-02

Pos logit contributions
 imag+0.005
 scratches+0.004
 hides+0.004
 pulls+0.004
 pret+0.004
Neg logit contributions
over-0.004
date-0.004
apping-0.004
he-0.004
ute-0.004
Token.
Token ID13
Feature activation+4.06e-01

Pos logit contributions
 imag+0.002
 hides+0.002
 scratches+0.002
 pulls+0.002
 pret+0.002
Neg logit contributions
over-0.002
date-0.001
apping-0.001
ute-0.001
he-0.001
Token He
Token ID679
Feature activation+2.61e-01

Pos logit contributions
over+0.020
ute+0.019
date+0.019
apping+0.019
he+0.018
Neg logit contributions
 imag-0.019
 scratches-0.017
 hides-0.017
 pulls-0.017
 pret-0.016
Token sees
Token ID7224
Feature activation-3.64e-02

Pos logit contributions
 year+0.011
over+0.010
ute+0.010
apping+0.010
 fisherman+0.010
Neg logit contributions
drivers-0.011
 imag-0.010
 supervision-0.009
 fault-0.009
esh-0.008
Token↵
Token ID198
Feature activation+9.61e-02

Pos logit contributions
over+0.018
date+0.017
ute+0.016
apping+0.016
he+0.016
Neg logit contributions
 imag-0.015
 scratches-0.014
 pret-0.013
 hides-0.013
 pulls-0.013
Token↵
Token ID198
Feature activation+1.23e-01

Pos logit contributions
over+0.005
date+0.004
apping+0.004
ute+0.004
pron+0.004
Neg logit contributions
 imag-0.004
 scratches-0.004
 pulls-0.004
 hides-0.004
 pret-0.004
TokenThe
Token ID464
Feature activation+5.33e-01

Pos logit contributions
over+0.005
date+0.005
apping+0.004
ute+0.004
he+0.004
Neg logit contributions
 imag-0.006
 scratches-0.006
 pulls-0.006
 hides-0.006
 pret-0.006
Token dog
Token ID3290
Feature activation-5.39e-01

Pos logit contributions
over+0.028
date+0.027
apping+0.027
he+0.025
abby+0.025
Neg logit contributions
 imag-0.022
 scratches-0.022
 hides-0.020
 screams-0.020
 pret-0.020
Token w
Token ID266
Feature activation-1.04e+00

Pos logit contributions
 imag+0.019
drivers+0.018
 fault+0.016
 scratches+0.015
 supervision+0.015
Neg logit contributions
over-0.028
 year-0.027
apping-0.027
ute-0.026
he-0.026
Tokenags
Token ID3775
Feature activation+4.92e+00

Pos logit contributions
 imag+0.064
 scratches+0.059
 pulls+0.058
 hides+0.058
 pret+0.057
Neg logit contributions
over-0.036
date-0.033
apping-0.031
ute-0.031
pron-0.031
Token its
Token ID663
Feature activation-2.10e-01

Pos logit contributions
over+0.291
date+0.272
ute+0.268
apping+0.268
he+0.264
Neg logit contributions
 imag-0.183
 scratches-0.158
 hides-0.156
 pret-0.155
 pulls-0.155
Token tail
Token ID7894
Feature activation+9.07e-02

Pos logit contributions
 imag+0.011
 scratches+0.010
 hides+0.010
 pulls+0.010
 pret+0.010
Neg logit contributions
over-0.009
date-0.008
apping-0.008
he-0.008
ute-0.008
Token.
Token ID13
Feature activation+4.57e-01

Pos logit contributions
over+0.004
date+0.004
ute+0.004
pron+0.004
he+0.004
Neg logit contributions
 imag-0.004
 scratches-0.004
 pulls-0.004
 hides-0.004
 pret-0.004
Token It
Token ID632
Feature activation-7.79e-01

Pos logit contributions
over+0.022
apping+0.020
date+0.020
ute+0.020
he+0.020
Neg logit contributions
 imag-0.022
 hides-0.020
 scratches-0.019
 pulls-0.019
 pret-0.019
Token likes
Token ID7832
Feature activation-1.76e-01

Pos logit contributions
drivers+0.036
 supervision+0.032
 imag+0.030
 fault+0.030
imet+0.028
Neg logit contributions
 year-0.032
 world-0.030
over-0.030
 corner-0.029
 reef-0.029
Token↵
Token ID198
Feature activation+1.79e-02

Pos logit contributions
over+0.018
date+0.017
ute+0.017
apping+0.017
kin+0.016
Neg logit contributions
 imag-0.012
 scratches-0.011
 pret-0.011
 pulls-0.011
 hides-0.011
Token↵
Token ID198
Feature activation+5.42e-02

Pos logit contributions
over+0.001
date+0.001
apping+0.001
ute+0.001
 year+0.001
Neg logit contributions
 imag-0.001
 scratches-0.001
 pulls-0.001
 pret-0.001
 hides-0.001
TokenThe
Token ID464
Feature activation+4.42e-01

Pos logit contributions
over+0.002
date+0.002
apping+0.002
ute+0.002
kin+0.002
Neg logit contributions
 scratches-0.002
 imag-0.002
 pulls-0.002
 pret-0.002
 screams-0.002
Token dog
Token ID3290
Feature activation-5.95e-01

Pos logit contributions
over+0.023
date+0.023
apping+0.023
abby+0.021
he+0.021
Neg logit contributions
 scratches-0.018
 imag-0.018
 hides-0.016
 screams-0.016
 pulls-0.016
Token w
Token ID266
Feature activation-1.10e+00

Pos logit contributions
 imag+0.019
drivers+0.019
 fault+0.015
 scratches+0.015
 hides+0.014
Neg logit contributions
over-0.033
apping-0.032
 year-0.032
ute-0.031
he-0.031
Tokenags
Token ID3775
Feature activation+4.91e+00

Pos logit contributions
 imag+0.066
 scratches+0.061
 pulls+0.060
 hides+0.059
 screams+0.058
Neg logit contributions
over-0.039
date-0.038
pron-0.036
 year-0.035
ute-0.034
Token its
Token ID663
Feature activation-2.62e-01

Pos logit contributions
over+0.291
date+0.277
ute+0.269
apping+0.267
he+0.267
Neg logit contributions
 imag-0.179
 scratches-0.157
 hides-0.155
 pret-0.155
 pulls-0.154
Token tail
Token ID7894
Feature activation+6.33e-02

Pos logit contributions
 imag+0.013
 scratches+0.012
 hides+0.011
 pulls+0.011
 pret+0.011
Neg logit contributions
over-0.012
date-0.012
he-0.012
apping-0.011
ute-0.011
Token again
Token ID757
Feature activation-1.68e-02

Pos logit contributions
over+0.003
date+0.003
ute+0.003
pron+0.003
he+0.003
Neg logit contributions
 imag-0.003
 scratches-0.003
 pulls-0.002
 screams-0.002
 cries-0.002
Token.
Token ID13
Feature activation+4.31e-01

Pos logit contributions
 imag+0.001
 scratches+0.001
 pret+0.000
 hides+0.000
 screams+0.000
Neg logit contributions
date-0.001
ute-0.001
pron-0.001
over-0.001
abby-0.001
Token It
Token ID632
Feature activation-8.73e-01

Pos logit contributions
over+0.023
ute+0.021
date+0.021
apping+0.021
he+0.020
Neg logit contributions
 imag-0.018
 scratches-0.017
 pulls-0.016
 hides-0.016
 pret-0.016
Token pet
Token ID4273
Feature activation-6.24e-01

Pos logit contributions
over+0.013
he+0.013
apping+0.012
date+0.012
ute+0.012
Neg logit contributions
 imag-0.014
drivers-0.012
 hides-0.012
 scratches-0.012
 pret-0.012
Token Rex
Token ID17853
Feature activation+2.30e-01

Pos logit contributions
 imag+0.022
 hides+0.018
drivers+0.017
 scratches+0.017
 pret+0.017
Neg logit contributions
over-0.039
he-0.036
ute-0.036
date-0.036
apping-0.035
Token.
Token ID13
Feature activation+6.27e-01

Pos logit contributions
over+0.007
apping+0.006
ute+0.006
he+0.006
 year+0.006
Neg logit contributions
 imag-0.014
drivers-0.013
 fault-0.012
 hides-0.012
 disl-0.012
Token He
Token ID679
Feature activation+3.19e-01

Pos logit contributions
over+0.031
apping+0.028
ute+0.028
date+0.027
he+0.027
Neg logit contributions
 imag-0.030
 hides-0.026
 pulls-0.025
 scratches-0.025
 pret-0.025
Token w
Token ID266
Feature activation-1.16e+00

Pos logit contributions
over+0.015
 year+0.015
ute+0.014
apping+0.014
 world+0.014
Neg logit contributions
drivers-0.012
 imag-0.011
 fault-0.010
 supervision-0.009
esh-0.009
Tokenags
Token ID3775
Feature activation+4.90e+00

Pos logit contributions
 imag+0.073
 scratches+0.067
 pulls+0.066
 hides+0.065
 pret+0.064
Neg logit contributions
over-0.039
date-0.036
ute-0.034
apping-0.034
pron-0.033
Token his
Token ID465
Feature activation-2.07e-01

Pos logit contributions
over+0.268
apping+0.247
date+0.244
ute+0.243
 year+0.239
Neg logit contributions
 imag-0.210
 hides-0.181
 pulls-0.177
 scratches-0.176
 pret-0.173
Token tail
Token ID7894
Feature activation+1.54e-03

Pos logit contributions
 imag+0.010
 scratches+0.009
 pulls+0.009
 pret+0.009
 hides+0.009
Neg logit contributions
over-0.009
date-0.009
he-0.009
apping-0.009
ute-0.009
Token and
Token ID290
Feature activation+3.25e-01

Pos logit contributions
over+0.000
date+0.000
ute+0.000
pron+0.000
he+0.000
Neg logit contributions
 imag-0.000
 scratches-0.000
 pulls-0.000
 hides-0.000
 pret-0.000
Token l
Token ID300
Feature activation-8.09e-01

Pos logit contributions
over+0.014
 fisherman+0.012
 year+0.012
mare+0.012
ute+0.012
Neg logit contributions
 imag-0.014
drivers-0.014
 fault-0.013
imet-0.011
esh-0.011
Tokenicks
Token ID3378
Feature activation-6.28e-01

Pos logit contributions
 imag+0.026
 hides+0.021
 scratches+0.021
 pulls+0.021
 pret+0.021
Neg logit contributions
over-0.053
apping-0.049
date-0.049
ute-0.049
he-0.048
Token her
Token ID607
Feature activation-2.79e-01

Pos logit contributions
over+0.014
 fisherman+0.013
 year+0.013
chard+0.013
 town+0.013
Neg logit contributions
drivers-0.021
 imag-0.019
 fault-0.018
 fourteen-0.016
imet-0.016
Token band
Token ID4097
Feature activation+7.72e-01

Pos logit contributions
 imag+0.014
 scratches+0.013
 hides+0.013
 pulls+0.013
 pret+0.013
Neg logit contributions
over-0.013
date-0.012
apping-0.012
he-0.011
ute-0.011
Token.
Token ID13
Feature activation+4.82e-01

Pos logit contributions
over+0.033
apping+0.030
date+0.030
ute+0.030
he+0.029
Neg logit contributions
 imag-0.041
 hides-0.037
 scratches-0.037
 pulls-0.036
 pret-0.036
Token It
Token ID632
Feature activation-8.40e-01

Pos logit contributions
over+0.024
date+0.022
apping+0.022
ute+0.022
he+0.021
Neg logit contributions
 imag-0.022
 scratches-0.020
 hides-0.020
 pulls-0.020
 pret-0.020
Token w
Token ID266
Feature activation-1.11e+00

Pos logit contributions
drivers+0.039
 supervision+0.034
 fault+0.032
 imag+0.032
imet+0.030
Neg logit contributions
 year-0.034
 world-0.033
 corner-0.031
ute-0.031
 reef-0.030
Tokenags
Token ID3775
Feature activation+4.90e+00

Pos logit contributions
 imag+0.071
 scratches+0.065
 hides+0.065
 pulls+0.064
 pret+0.064
Neg logit contributions
over-0.036
date-0.032
apping-0.031
ute-0.031
he-0.030
Token its
Token ID663
Feature activation-3.02e-01

Pos logit contributions
over+0.273
date+0.256
apping+0.251
ute+0.251
he+0.247
Neg logit contributions
 imag-0.198
 scratches-0.174
 hides-0.172
 pret-0.170
 pulls-0.169
Token tail
Token ID7894
Feature activation-3.91e-02

Pos logit contributions
 imag+0.015
 scratches+0.014
 hides+0.014
 pulls+0.013
 pret+0.013
Neg logit contributions
over-0.014
date-0.013
he-0.013
apping-0.013
ute-0.012
Token and
Token ID290
Feature activation+3.39e-01

Pos logit contributions
 imag+0.002
 scratches+0.002
 pulls+0.002
 hides+0.001
 pret+0.001
Neg logit contributions
date-0.002
over-0.002
ute-0.002
pron-0.002
mber-0.002
Token bar
Token ID2318
Feature activation-1.96e+00

Pos logit contributions
over+0.014
 year+0.013
ute+0.013
 place+0.013
 sunny+0.012
Neg logit contributions
drivers-0.015
 imag-0.013
 fault-0.013
imet-0.011
 supervision-0.011
Tokenks
Token ID591
Feature activation-1.15e+00

Pos logit contributions
 imag+0.069
 hides+0.057
 scratches+0.056
 pulls+0.056
 pret+0.055
Neg logit contributions
over-0.121
apping-0.115
ute-0.111
he-0.111
date-0.110
Token↵
Token ID198
Feature activation+1.58e-01

Pos logit contributions
over+0.014
ute+0.013
date+0.013
apping+0.013
pron+0.012
Neg logit contributions
 imag-0.010
 scratches-0.010
 pulls-0.010
 hides-0.010
 pret-0.010
Token↵
Token ID198
Feature activation+1.68e-01

Pos logit contributions
over+0.009
date+0.008
ute+0.008
apping+0.008
pron+0.008
Neg logit contributions
 imag-0.006
 scratches-0.006
 pulls-0.006
 knocks-0.005
 pret-0.005
TokenThe
Token ID464
Feature activation+5.19e-01

Pos logit contributions
over+0.007
ute+0.007
date+0.006
apping+0.006
pron+0.006
Neg logit contributions
 imag-0.008
 scratches-0.008
 pulls-0.008
 hides-0.008
 pret-0.008
Token dog
Token ID3290
Feature activation-6.36e-01

Pos logit contributions
over+0.027
apping+0.026
date+0.026
abby+0.024
ute+0.024
Neg logit contributions
 scratches-0.021
 imag-0.021
 hides-0.021
 pulls-0.020
 screams-0.020
Token w
Token ID266
Feature activation-1.06e+00

Pos logit contributions
 imag+0.022
drivers+0.021
 scratches+0.018
 fault+0.018
 hides+0.017
Neg logit contributions
over-0.034
 year-0.032
apping-0.032
ute-0.032
he-0.031
Tokenags
Token ID3775
Feature activation+4.88e+00

Pos logit contributions
 imag+0.065
 scratches+0.060
 pulls+0.059
 hides+0.058
 pret+0.057
Neg logit contributions
over-0.037
date-0.034
pron-0.033
ute-0.032
apping-0.032
Token its
Token ID663
Feature activation-1.87e-01

Pos logit contributions
over+0.302
date+0.284
ute+0.279
apping+0.278
he+0.276
Neg logit contributions
 imag-0.168
 scratches-0.143
 pret-0.142
 hides-0.142
 pulls-0.141
Token tail
Token ID7894
Feature activation-8.96e-03

Pos logit contributions
 imag+0.010
 scratches+0.009
 hides+0.009
 pulls+0.009
 pret+0.009
Neg logit contributions
over-0.008
date-0.008
apping-0.008
he-0.007
ute-0.007
Token.
Token ID13
Feature activation+3.65e-01

Pos logit contributions
 imag+0.000
 scratches+0.000
 pulls+0.000
 hides+0.000
 pret+0.000
Neg logit contributions
over-0.000
date-0.000
ute-0.000
he-0.000
apping-0.000
Token It
Token ID632
Feature activation-8.02e-01

Pos logit contributions
over+0.020
date+0.018
ute+0.018
apping+0.018
he+0.018
Neg logit contributions
 imag-0.016
 hides-0.014
 scratches-0.014
 pulls-0.014
 pret-0.013
Token runs
Token ID4539
Feature activation-9.46e-01

Pos logit contributions
drivers+0.035
 imag+0.031
 supervision+0.030
 fault+0.029
 fourteen+0.028
Neg logit contributions
 year-0.035
over-0.034
 world-0.033
ute-0.032
he-0.032
Token see
Token ID766
Feature activation+5.66e-01

Pos logit contributions
 imag+0.033
 hides+0.029
 pulls+0.029
 scratches+0.029
 pret+0.029
Neg logit contributions
over-0.025
date-0.023
he-0.022
apping-0.022
ute-0.022
Token Sam
Token ID3409
Feature activation+5.28e-01

Pos logit contributions
over+0.029
he+0.028
date+0.027
apping+0.027
ute+0.026
Neg logit contributions
 imag-0.025
 hides-0.022
 pret-0.020
 scratches-0.020
 pulls-0.020
Token.
Token ID13
Feature activation+4.36e-01

Pos logit contributions
over+0.023
apping+0.021
date+0.020
ute+0.020
he+0.020
Neg logit contributions
 imag-0.028
 scratches-0.025
 hides-0.025
 pulls-0.024
 pret-0.024
Token He
Token ID679
Feature activation+8.09e-02

Pos logit contributions
over+0.019
apping+0.017
date+0.017
ute+0.017
he+0.016
Neg logit contributions
 imag-0.023
 hides-0.021
 scratches-0.020
 pret-0.020
 pulls-0.020
Token w
Token ID266
Feature activation-1.18e+00

Pos logit contributions
 year+0.004
 world+0.004
ute+0.004
apping+0.003
he+0.003
Neg logit contributions
drivers-0.003
 imag-0.002
 supervision-0.002
 fault-0.002
fl-0.002
Tokenags
Token ID3775
Feature activation+4.87e+00

Pos logit contributions
 imag+0.072
 scratches+0.066
 pulls+0.065
 hides+0.064
 pret+0.063
Neg logit contributions
over-0.042
date-0.039
pron-0.037
ute-0.037
 year-0.036
Token his
Token ID465
Feature activation-2.88e-01

Pos logit contributions
over+0.286
apping+0.264
date+0.264
ute+0.263
he+0.258
Neg logit contributions
 imag-0.187
 hides-0.160
 scratches-0.158
 pulls-0.156
 pret-0.156
Token tail
Token ID7894
Feature activation-8.05e-02

Pos logit contributions
 imag+0.014
 scratches+0.013
 hides+0.013
 pulls+0.013
 pret+0.013
Neg logit contributions
over-0.013
date-0.013
he-0.012
apping-0.012
ute-0.012
Token and
Token ID290
Feature activation+3.07e-01

Pos logit contributions
 imag+0.004
 scratches+0.003
 pulls+0.003
 hides+0.003
 pret+0.003
Neg logit contributions
over-0.004
date-0.004
ute-0.004
pron-0.003
apping-0.003
Token l
Token ID300
Feature activation-8.25e-01

Pos logit contributions
 fisherman+0.012
 place+0.012
mare+0.012
over+0.011
 town+0.011
Neg logit contributions
drivers-0.012
 fault-0.011
 imag-0.011
imet-0.010
 supervision-0.009
Tokenicks
Token ID3378
Feature activation-7.09e-01

Pos logit contributions
 imag+0.026
 hides+0.021
 scratches+0.021
 pulls+0.021
 pret+0.021
Neg logit contributions
over-0.054
apping-0.050
date-0.050
ute-0.050
he-0.049
Token not
Token ID407
Feature activation+8.98e-01

Pos logit contributions
 imag+0.008
 hides+0.007
 scratches+0.007
 pulls+0.007
 pret+0.007
Neg logit contributions
over-0.009
apping-0.008
ute-0.008
date-0.008
he-0.008
Token answer
Token ID3280
Feature activation+3.26e-01

Pos logit contributions
over+0.046
date+0.042
apping+0.042
ute+0.041
pron+0.040
Neg logit contributions
 imag-0.042
 hides-0.038
 pret-0.038
 scratches-0.037
 pulls-0.037
Token.
Token ID13
Feature activation+4.72e-01

Pos logit contributions
over+0.016
date+0.015
ute+0.015
apping+0.015
he+0.014
Neg logit contributions
 imag-0.015
 scratches-0.014
 hides-0.014
 pulls-0.013
 pret-0.013
Token It
Token ID632
Feature activation-8.18e-01

Pos logit contributions
over+0.025
date+0.024
ute+0.024
apping+0.023
 year+0.023
Neg logit contributions
 imag-0.019
 pulls-0.018
 scratches-0.018
 hides-0.018
 pret-0.018
Token w
Token ID266
Feature activation-1.08e+00

Pos logit contributions
drivers+0.035
 supervision+0.028
 imag+0.027
 fault+0.026
 subt+0.025
Neg logit contributions
 year-0.039
over-0.037
ute-0.035
 world-0.035
he-0.034
Tokenags
Token ID3775
Feature activation+4.87e+00

Pos logit contributions
 imag+0.065
 scratches+0.060
 pulls+0.059
 hides+0.058
 screams+0.057
Neg logit contributions
over-0.040
date-0.038
pron-0.035
 year-0.034
apping-0.034
Token its
Token ID663
Feature activation-2.76e-01

Pos logit contributions
over+0.256
date+0.237
apping+0.235
ute+0.233
he+0.230
Neg logit contributions
 imag-0.216
 hides-0.189
 scratches-0.188
 pulls-0.186
 pret-0.186
Token tail
Token ID7894
Feature activation+4.00e-02

Pos logit contributions
 imag+0.014
 scratches+0.013
 hides+0.012
 pulls+0.012
 pret+0.012
Neg logit contributions
over-0.012
date-0.012
apping-0.012
he-0.012
ute-0.011
Token.
Token ID13
Feature activation+4.65e-01

Pos logit contributions
over+0.002
date+0.002
ute+0.002
pron+0.002
mber+0.002
Neg logit contributions
 imag-0.002
 scratches-0.002
 pulls-0.002
 hides-0.002
 screams-0.001
Token It
Token ID632
Feature activation-8.40e-01

Pos logit contributions
over+0.025
date+0.024
ute+0.024
apping+0.023
 year+0.023
Neg logit contributions
 imag-0.019
 pulls-0.017
 scratches-0.017
 pret-0.017
 hides-0.017
Token l
Token ID300
Feature activation-7.82e-01

Pos logit contributions
drivers+0.037
 supervision+0.030
 imag+0.029
 fault+0.028
imet+0.026
Neg logit contributions
 year-0.039
over-0.037
 world-0.036
ute-0.035
 corner-0.035
Token hello
Token ID23748
Feature activation-1.09e-01

Pos logit contributions
over+0.027
apping+0.025
ute+0.024
date+0.024
 year+0.023
Neg logit contributions
 imag-0.033
 hides-0.030
 pulls-0.029
 scratches-0.029
 pret-0.029
Token.
Token ID13
Feature activation+5.79e-01

Pos logit contributions
 imag+0.006
 pulls+0.006
 scratches+0.006
 hides+0.006
 pret+0.005
Neg logit contributions
over-0.004
apping-0.004
ute-0.004
date-0.004
 year-0.004
Token The
Token ID383
Feature activation+2.14e-02

Pos logit contributions
over+0.029
date+0.026
apping+0.026
ute+0.026
he+0.025
Neg logit contributions
 imag-0.028
 scratches-0.024
 hides-0.024
 pulls-0.024
 pret-0.024
Token dog
Token ID3290
Feature activation-6.79e-01

Pos logit contributions
over+0.001
apping+0.001
date+0.001
ute+0.001
he+0.001
Neg logit contributions
 imag-0.001
 scratches-0.001
 hides-0.001
 pulls-0.001
 pret-0.001
Token w
Token ID266
Feature activation-1.14e+00

Pos logit contributions
drivers+0.028
 imag+0.025
 supervision+0.022
 subt+0.022
 fault+0.021
Neg logit contributions
 year-0.030
over-0.030
 world-0.029
ute-0.029
apping-0.028
Tokenags
Token ID3775
Feature activation+4.85e+00

Pos logit contributions
 imag+0.072
 scratches+0.066
 pulls+0.065
 hides+0.065
 pret+0.064
Neg logit contributions
over-0.038
date-0.035
ute-0.033
apping-0.033
 year-0.032
Token its
Token ID663
Feature activation-3.02e-01

Pos logit contributions
over+0.280
date+0.261
ute+0.257
apping+0.257
he+0.256
Neg logit contributions
 imag-0.189
 scratches-0.162
 pret-0.161
 hides-0.160
 pulls-0.158
Token tail
Token ID7894
Feature activation-4.75e-02

Pos logit contributions
 imag+0.016
 scratches+0.014
 hides+0.014
 pulls+0.014
 pret+0.014
Neg logit contributions
over-0.013
date-0.012
apping-0.012
he-0.012
ute-0.012
Token and
Token ID290
Feature activation+3.53e-01

Pos logit contributions
 imag+0.003
 hides+0.002
 scratches+0.002
 pulls+0.002
 pret+0.002
Neg logit contributions
over-0.002
date-0.002
apping-0.002
ute-0.002
he-0.002
Token l
Token ID300
Feature activation-7.77e-01

Pos logit contributions
over+0.016
ute+0.014
 fisherman+0.014
 place+0.013
 year+0.013
Neg logit contributions
drivers-0.015
 imag-0.014
 fault-0.013
imet-0.011
esh-0.011
Tokenicks
Token ID3378
Feature activation-6.74e-01

Pos logit contributions
 imag+0.026
 hides+0.022
 scratches+0.021
 pret+0.021
 pulls+0.021
Neg logit contributions
over-0.049
apping-0.046
ute-0.045
date-0.045
he-0.045
Token hello
Token ID23748
Feature activation+1.34e-02

Pos logit contributions
 imag+0.018
 hides+0.016
 scratches+0.016
 pulls+0.016
 pret+0.016
Neg logit contributions
over-0.016
apping-0.014
date-0.014
ute-0.014
 year-0.014
Token.
Token ID13
Feature activation+1.07e+00

Pos logit contributions
over+0.001
apping+0.001
ute+0.001
date+0.001
he+0.001
Neg logit contributions
 imag-0.001
 scratches-0.001
 hides-0.001
 pulls-0.001
 pret-0.001
Token The
Token ID383
Feature activation+4.04e-01

Pos logit contributions
over+0.057
date+0.054
ute+0.053
he+0.052
pron+0.052
Neg logit contributions
 imag-0.043
 pret-0.039
 scratches-0.039
 pulls-0.038
 hides-0.038
Token dog
Token ID3290
Feature activation-4.69e-01

Pos logit contributions
over+0.020
apping+0.020
date+0.019
he+0.019
ute+0.018
Neg logit contributions
 imag-0.018
 scratches-0.017
 pret-0.016
 screams-0.016
 hides-0.016
Token w
Token ID266
Feature activation-1.06e+00

Pos logit contributions
drivers+0.016
 imag+0.016
 fault+0.013
 supervision+0.013
 subt+0.012
Neg logit contributions
over-0.024
 year-0.024
ute-0.023
apping-0.023
he-0.023
Tokenags
Token ID3775
Feature activation+4.85e+00

Pos logit contributions
 imag+0.065
 scratches+0.059
 pulls+0.059
 hides+0.058
 pret+0.057
Neg logit contributions
over-0.037
date-0.034
ute-0.032
apping-0.032
he-0.031
Token his
Token ID465
Feature activation-3.48e-01

Pos logit contributions
over+0.287
date+0.273
ute+0.272
apping+0.270
he+0.264
Neg logit contributions
 imag-0.171
 scratches-0.150
 pret-0.148
 hides-0.147
 pulls-0.145
Token tail
Token ID7894
Feature activation+1.23e-01

Pos logit contributions
 imag+0.016
 scratches+0.015
 pulls+0.014
 pret+0.014
 hides+0.014
Neg logit contributions
date-0.016
over-0.016
he-0.016
apping-0.016
ute-0.016
Token.
Token ID13
Feature activation+8.82e-01

Pos logit contributions
over+0.006
date+0.005
ute+0.005
apping+0.005
pron+0.005
Neg logit contributions
 imag-0.006
 scratches-0.005
 pulls-0.005
 hides-0.005
 pret-0.005
Token He
Token ID679
Feature activation+2.87e-01

Pos logit contributions
over+0.048
ute+0.046
date+0.046
apping+0.044
pron+0.043
Neg logit contributions
 imag-0.035
 scratches-0.033
 pulls-0.032
 pret-0.031
 hides-0.031
Token likes
Token ID7832
Feature activation-2.41e-01

Pos logit contributions
 year+0.014
over+0.013
ute+0.013
 fisherman+0.012
apping+0.012
Neg logit contributions
drivers-0.011
 imag-0.010
 supervision-0.009
 fault-0.008
 subt-0.008
Token?"
Token ID1701
Feature activation+2.84e-01

Pos logit contributions
over+0.038
apping+0.033
boat+0.031
date+0.031
he+0.031
Neg logit contributions
 imag-0.055
 pret-0.053
 hides-0.050
 scratches-0.048
 knocks-0.048
Token\n
Token ID198
Feature activation+2.45e-02

Pos logit contributions
over+0.015
date+0.013
apping+0.013
ute+0.013
he+0.013
Neg logit contributions
 imag-0.013
 hides-0.011
 scratches-0.011
 pulls-0.011
 pret-0.011
Token\n
Token ID198
Feature activation+5.64e-02

Pos logit contributions
over+0.001
date+0.001
apping+0.001
ute+0.001
 year+0.001
Neg logit contributions
 imag-0.001
 scratches-0.001
 pulls-0.001
 hides-0.001
 pret-0.001
TokenTom
Token ID13787
Feature activation+1.50e-03

Pos logit contributions
over+0.002
ute+0.002
date+0.002
apping+0.002
 year+0.002
Neg logit contributions
 imag-0.003
 scratches-0.003
 pulls-0.003
 hides-0.003
 pret-0.003
Token w
Token ID266
Feature activation-1.11e+00

Pos logit contributions
apping+0.000
over+0.000
he+0.000
 year+0.000
ute+0.000
Neg logit contributions
drivers-0.000
 imag-0.000
 scratches-0.000
 fault-0.000
 supervision-0.000
Tokenags
Token ID3775
Feature activation+4.84e+00

Pos logit contributions
 imag+0.069
 scratches+0.063
 pulls+0.062
 hides+0.062
 pret+0.061
Neg logit contributions
over-0.038
date-0.035
ute-0.033
apping-0.032
pron-0.032
Token his
Token ID465
Feature activation-2.51e-01

Pos logit contributions
over+0.278
apping+0.257
date+0.256
ute+0.255
he+0.251
Neg logit contributions
 imag-0.192
 hides-0.164
 scratches-0.162
 pulls-0.160
 pret-0.159
Token tail
Token ID7894
Feature activation+3.47e-02

Pos logit contributions
 imag+0.013
 scratches+0.012
 hides+0.011
 pulls+0.011
 pret+0.011
Neg logit contributions
over-0.011
date-0.011
he-0.011
apping-0.011
ute-0.010
Token and
Token ID290
Feature activation+2.95e-01

Pos logit contributions
over+0.002
date+0.002
ute+0.002
pron+0.002
he+0.002
Neg logit contributions
 imag-0.002
 scratches-0.001
 pulls-0.001
 pret-0.001
 hides-0.001
Token l
Token ID300
Feature activation-8.05e-01

Pos logit contributions
over+0.012
mber+0.011
 fisherman+0.011
ute+0.011
pron+0.011
Neg logit contributions
drivers-0.012
 imag-0.012
 fault-0.011
 supervision-0.009
 fourteen-0.009
Tokenicks
Token ID3378
Feature activation-6.33e-01

Pos logit contributions
 imag+0.026
 hides+0.021
 scratches+0.021
 pulls+0.021
 pret+0.021
Neg logit contributions
over-0.052
apping-0.049
date-0.048
ute-0.048
he-0.048
Token their
Token ID511
Feature activation-4.52e-01

Pos logit contributions
over+0.018
apping+0.016
date+0.016
ute+0.016
he+0.016
Neg logit contributions
 imag-0.013
 hides-0.012
 scratches-0.011
 pulls-0.011
 pret-0.011
Token hands
Token ID2832
Feature activation-1.41e-01

Pos logit contributions
 imag+0.022
drivers+0.021
 pret+0.021
 fourteen+0.020
 pulls+0.020
Neg logit contributions
over-0.018
ute-0.017
 year-0.017
date-0.017
apping-0.016
Token.
Token ID13
Feature activation+2.97e-01

Pos logit contributions
 imag+0.005
 scratches+0.005
 pulls+0.005
 pret+0.005
 cries+0.005
Neg logit contributions
over-0.007
date-0.007
ute-0.007
pron-0.007
mber-0.007
Token Rex
Token ID17853
Feature activation+3.52e-01

Pos logit contributions
over+0.016
date+0.015
apping+0.015
ute+0.015
he+0.014
Neg logit contributions
 imag-0.013
 hides-0.011
 scratches-0.011
 pulls-0.011
 pret-0.011
Token w
Token ID266
Feature activation-1.10e+00

Pos logit contributions
over+0.018
 year+0.016
apping+0.016
ute+0.016
he+0.016
Neg logit contributions
 imag-0.012
drivers-0.012
 fault-0.011
 supervision-0.010
 disl-0.009
Tokenags
Token ID3775
Feature activation+4.84e+00

Pos logit contributions
 imag+0.070
 scratches+0.064
 pulls+0.063
 hides+0.063
 pret+0.062
Neg logit contributions
over-0.037
date-0.033
ute-0.031
apping-0.031
he-0.031
Token his
Token ID465
Feature activation-1.58e-01

Pos logit contributions
over+0.305
date+0.290
ute+0.286
apping+0.283
he+0.282
Neg logit contributions
 imag-0.159
 scratches-0.136
 pret-0.134
 hides-0.133
 pulls-0.131
Token tail
Token ID7894
Feature activation-1.13e-01

Pos logit contributions
 imag+0.008
 scratches+0.007
 hides+0.007
 pulls+0.007
 pret+0.007
Neg logit contributions
over-0.007
date-0.006
apping-0.006
ute-0.006
he-0.006
Token.
Token ID13
Feature activation+2.67e-01

Pos logit contributions
 imag+0.006
 scratches+0.005
 hides+0.005
 pulls+0.005
 pret+0.005
Neg logit contributions
over-0.005
date-0.005
ute-0.005
apping-0.005
he-0.005
Token He
Token ID679
Feature activation+1.89e-01

Pos logit contributions
over+0.014
date+0.013
ute+0.013
apping+0.013
he+0.012
Neg logit contributions
 imag-0.012
 scratches-0.011
 hides-0.011
 pulls-0.011
 pret-0.010
Token l
Token ID300
Feature activation-7.27e-01

Pos logit contributions
over+0.008
 year+0.008
apping+0.007
 world+0.007
ute+0.007
Neg logit contributions
drivers-0.008
 imag-0.007
 supervision-0.006
 fault-0.006
esh-0.006
Token\n
Token ID198
Feature activation+1.02e-01

Pos logit contributions
over+0.019
date+0.017
ute+0.017
apping+0.017
he+0.017
Neg logit contributions
 imag-0.018
 scratches-0.016
 hides-0.016
 pret-0.016
 pulls-0.016
Token\n
Token ID198
Feature activation+1.29e-01

Pos logit contributions
over+0.005
date+0.004
apping+0.004
ute+0.004
he+0.004
Neg logit contributions
 imag-0.005
 scratches-0.004
 pulls-0.004
 hides-0.004
 pret-0.004
TokenThe
Token ID464
Feature activation+5.03e-01

Pos logit contributions
over+0.005
date+0.005
ute+0.005
apping+0.005
he+0.005
Neg logit contributions
 imag-0.007
 scratches-0.006
 pulls-0.006
 pret-0.006
 hides-0.006
Token dog
Token ID3290
Feature activation-5.69e-01

Pos logit contributions
over+0.026
date+0.026
apping+0.025
he+0.024
abby+0.024
Neg logit contributions
 scratches-0.021
 imag-0.020
 screams-0.019
 hides-0.019
 pulls-0.019
Token w
Token ID266
Feature activation-1.04e+00

Pos logit contributions
drivers+0.021
 imag+0.020
 fault+0.018
 supervision+0.017
 scratches+0.015
Neg logit contributions
over-0.028
 year-0.027
apping-0.027
ute-0.026
he-0.026
Tokenags
Token ID3775
Feature activation+4.84e+00

Pos logit contributions
 imag+0.064
 scratches+0.059
 pulls+0.058
 hides+0.058
 pret+0.057
Neg logit contributions
over-0.036
date-0.034
pron-0.032
apping-0.031
ute-0.031
Token its
Token ID663
Feature activation-2.26e-01

Pos logit contributions
over+0.285
date+0.265
apping+0.262
ute+0.262
he+0.258
Neg logit contributions
 imag-0.184
 scratches-0.157
 hides-0.157
 pret-0.155
 pulls-0.155
Token tail
Token ID7894
Feature activation+8.46e-02

Pos logit contributions
 imag+0.012
 scratches+0.010
 hides+0.010
 pulls+0.010
 pret+0.010
Neg logit contributions
over-0.010
date-0.009
he-0.009
apping-0.009
ute-0.009
Token.
Token ID13
Feature activation+4.76e-01

Pos logit contributions
over+0.004
date+0.004
ute+0.004
apping+0.004
he+0.003
Neg logit contributions
 imag-0.004
 scratches-0.004
 pulls-0.004
 hides-0.004
 pret-0.004
Token It
Token ID632
Feature activation-8.10e-01

Pos logit contributions
over+0.026
ute+0.024
date+0.024
apping+0.024
 year+0.023
Neg logit contributions
 imag-0.020
 scratches-0.018
 pulls-0.018
 hides-0.018
 pret-0.017
Token l
Token ID300
Feature activation-7.57e-01

Pos logit contributions
drivers+0.039
 supervision+0.034
 imag+0.033
 fault+0.032
 subt+0.030
Neg logit contributions
 year-0.031
over-0.030
 world-0.030
 fisherman-0.029
ute-0.029
Tokenike
Token ID522
Feature activation+4.21e-01

Pos logit contributions
over+0.008
date+0.007
apping+0.007
ute+0.007
he+0.007
Neg logit contributions
 imag-0.010
 hides-0.009
 scratches-0.009
 pulls-0.009
 pret-0.009
Token bar
Token ID2318
Feature activation-2.00e+00

Pos logit contributions
over+0.023
apping+0.022
ute+0.022
 year+0.022
he+0.022
Neg logit contributions
drivers-0.013
 imag-0.013
 scratches-0.011
 fault-0.011
 hides-0.011
Tokenks
Token ID591
Feature activation-1.21e+00

Pos logit contributions
 imag+0.068
 hides+0.057
 scratches+0.056
 pret+0.055
 pulls+0.055
Neg logit contributions
over-0.124
apping-0.118
ute-0.115
he-0.114
date-0.114
Token and
Token ID290
Feature activation+2.02e-01

Pos logit contributions
 imag+0.074
 hides+0.067
drivers+0.066
 pret+0.065
 scratches+0.064
Neg logit contributions
over-0.043
apping-0.039
he-0.037
 year-0.036
ute-0.035
Token w
Token ID266
Feature activation-1.18e+00

Pos logit contributions
apping+0.009
ute+0.009
over+0.009
 fisherman+0.009
 reef+0.009
Neg logit contributions
drivers-0.007
 imag-0.007
 fault-0.006
wed-0.005
anged-0.005
Tokenags
Token ID3775
Feature activation+4.84e+00

Pos logit contributions
 imag+0.077
 hides+0.070
 scratches+0.070
 pulls+0.070
 pret+0.069
Neg logit contributions
over-0.037
date-0.032
apping-0.032
ute-0.032
he-0.030
Token his
Token ID465
Feature activation-2.89e-01

Pos logit contributions
over+0.232
apping+0.214
 year+0.210
date+0.207
he+0.207
Neg logit contributions
 imag-0.233
 hides-0.203
 pulls-0.199
drivers-0.195
 scratches-0.194
Token tail
Token ID7894
Feature activation-7.28e-02

Pos logit contributions
 imag+0.016
 hides+0.014
 scratches+0.014
 pulls+0.014
 pret+0.014
Neg logit contributions
over-0.012
apping-0.011
date-0.011
ute-0.011
he-0.011
Token.
Token ID13
Feature activation+3.12e-01

Pos logit contributions
 imag+0.004
 hides+0.004
 scratches+0.003
 pulls+0.003
 pret+0.003
Neg logit contributions
over-0.003
apping-0.003
date-0.003
ute-0.003
he-0.003
Token He
Token ID679
Feature activation+7.94e-02

Pos logit contributions
over+0.013
apping+0.011
ute+0.011
date+0.011
he+0.011
Neg logit contributions
 imag-0.018
 hides-0.015
 pret-0.015
 pulls-0.015
 scratches-0.015
Token is
Token ID318
Feature activation-4.89e-01

Pos logit contributions
 year+0.004
 world+0.004
ute+0.004
over+0.003
he+0.003
Neg logit contributions
drivers-0.003
 imag-0.003
 supervision-0.003
 fault-0.002
 scratches-0.002
Token\n
Token ID198
Feature activation+9.22e-02

Pos logit contributions
 imag+0.005
 scratches+0.005
 pulls+0.005
 pret+0.005
 hides+0.005
Neg logit contributions
over-0.007
date-0.007
pron-0.007
kin-0.007
ute-0.007
Token\n
Token ID198
Feature activation+1.09e-01

Pos logit contributions
over+0.004
date+0.004
apping+0.004
ute+0.004
pron+0.004
Neg logit contributions
 imag-0.004
 scratches-0.004
 pulls-0.004
 hides-0.004
 pret-0.004
TokenThe
Token ID464
Feature activation+4.73e-01

Pos logit contributions
over+0.004
date+0.004
apping+0.004
ute+0.004
pron+0.004
Neg logit contributions
 imag-0.006
 scratches-0.006
 pulls-0.006
 hides-0.006
 pret-0.005
Token dog
Token ID3290
Feature activation-6.25e-01

Pos logit contributions
over+0.025
date+0.024
apping+0.024
abby+0.023
ute+0.022
Neg logit contributions
 imag-0.020
 scratches-0.019
 hides-0.018
 pulls-0.018
 screams-0.018
Token w
Token ID266
Feature activation-1.06e+00

Pos logit contributions
 imag+0.021
drivers+0.021
 fault+0.017
 scratches+0.017
 supervision+0.016
Neg logit contributions
over-0.032
 year-0.032
apping-0.031
he-0.031
ute-0.031
Tokenags
Token ID3775
Feature activation+4.83e+00

Pos logit contributions
 imag+0.066
 scratches+0.061
 pulls+0.060
 hides+0.060
 pret+0.059
Neg logit contributions
over-0.036
date-0.033
apping-0.031
ute-0.031
pron-0.031
Token its
Token ID663
Feature activation-2.54e-01

Pos logit contributions
over+0.289
date+0.271
ute+0.265
apping+0.265
he+0.264
Neg logit contributions
 imag-0.177
 scratches-0.153
 hides-0.151
 pret-0.151
 pulls-0.150
Token tail
Token ID7894
Feature activation-1.16e-02

Pos logit contributions
 imag+0.014
 hides+0.012
 scratches+0.012
 pulls+0.012
 pret+0.012
Neg logit contributions
over-0.011
date-0.010
apping-0.010
ute-0.010
he-0.010
Token.
Token ID13
Feature activation+4.65e-01

Pos logit contributions
 imag+0.001
 scratches+0.000
 pulls+0.000
 hides+0.000
 pret+0.000
Neg logit contributions
over-0.001
date-0.001
ute-0.001
apping-0.001
pron-0.001
Token It
Token ID632
Feature activation-8.97e-01

Pos logit contributions
over+0.025
ute+0.023
date+0.023
apping+0.022
he+0.022
Neg logit contributions
 imag-0.020
 scratches-0.018
 pulls-0.018
 hides-0.018
 pret-0.018
Token likes
Token ID7832
Feature activation-2.38e-01

Pos logit contributions
drivers+0.040
 supervision+0.034
 imag+0.033
 fault+0.032
imet+0.029
Neg logit contributions
 year-0.039
 world-0.037
over-0.036
 corner-0.036
ute-0.035
Token stops
Token ID9911
Feature activation-6.58e-01

Pos logit contributions
 imag+0.022
drivers+0.022
 fault+0.018
 supervision+0.017
 scratches+0.017
Neg logit contributions
over-0.034
 year-0.033
ute-0.033
apping-0.033
he-0.032
Token barking
Token ID48961
Feature activation-9.07e-01

Pos logit contributions
 imag+0.031
 scratches+0.028
 hides+0.027
 pret+0.027
 pulls+0.027
Neg logit contributions
over-0.033
date-0.030
ute-0.029
he-0.029
apping-0.029
Token.
Token ID13
Feature activation+3.95e-01

Pos logit contributions
 imag+0.046
 scratches+0.042
 hides+0.041
 pulls+0.041
 pret+0.041
Neg logit contributions
over-0.041
date-0.038
ute-0.038
apping-0.037
pron-0.037
Token It
Token ID632
Feature activation-8.91e-01

Pos logit contributions
over+0.023
date+0.022
ute+0.021
apping+0.021
 year+0.020
Neg logit contributions
 imag-0.015
 scratches-0.014
 pulls-0.014
 hides-0.013
 pret-0.013
Token w
Token ID266
Feature activation-1.11e+00

Pos logit contributions
drivers+0.042
 supervision+0.035
 imag+0.033
 fault+0.033
 subt+0.032
Neg logit contributions
 year-0.037
over-0.034
 world-0.034
ute-0.034
 corner-0.033
Tokenags
Token ID3775
Feature activation+4.83e+00

Pos logit contributions
 imag+0.069
 scratches+0.064
 pulls+0.063
 hides+0.062
 pret+0.061
Neg logit contributions
over-0.038
date-0.036
ute-0.033
pron-0.033
apping-0.033
Token its
Token ID663
Feature activation-2.86e-01

Pos logit contributions
over+0.258
date+0.239
apping+0.237
ute+0.235
he+0.232
Neg logit contributions
 imag-0.211
 hides-0.183
 scratches-0.183
 pulls-0.180
 pret-0.180
Token tail
Token ID7894
Feature activation-3.32e-02

Pos logit contributions
 imag+0.015
 scratches+0.013
 hides+0.013
 pulls+0.013
 pret+0.013
Neg logit contributions
over-0.013
date-0.012
apping-0.012
he-0.011
ute-0.011
Token.
Token ID13
Feature activation+3.96e-01

Pos logit contributions
 imag+0.002
 scratches+0.001
 pulls+0.001
 hides+0.001
 pret+0.001
Neg logit contributions
over-0.002
date-0.002
ute-0.002
pron-0.001
apping-0.001
Token It
Token ID632
Feature activation-9.10e-01

Pos logit contributions
over+0.023
date+0.022
ute+0.021
apping+0.021
 year+0.020
Neg logit contributions
 imag-0.015
 scratches-0.014
 pulls-0.014
 hides-0.013
 pret-0.013
Token l
Token ID300
Feature activation-7.74e-01

Pos logit contributions
drivers+0.045
 supervision+0.038
 imag+0.035
 fault+0.035
 subt+0.033
Neg logit contributions
 year-0.036
 world-0.034
over-0.033
ute-0.033
 fisherman-0.033
Token Ben
Token ID3932
Feature activation+3.30e-01

Pos logit contributions
over+0.018
date+0.016
apping+0.016
ute+0.016
he+0.016
Neg logit contributions
 imag-0.012
 hides-0.010
 scratches-0.010
 pulls-0.010
 pret-0.010
Token's
Token ID338
Feature activation+6.14e-01

Pos logit contributions
over+0.016
date+0.015
apping+0.015
ute+0.015
he+0.014
Neg logit contributions
 imag-0.015
 hides-0.014
 pulls-0.014
 scratches-0.014
 pret-0.014
Token feet
Token ID3625
Feature activation+2.97e-02

Pos logit contributions
over+0.025
ute+0.022
apping+0.022
date+0.022
he+0.021
Neg logit contributions
 imag-0.034
 hides-0.031
 pulls-0.030
 scratches-0.030
 pret-0.030
Token and
Token ID290
Feature activation+2.44e-01

Pos logit contributions
over+0.001
date+0.001
ute+0.001
apping+0.001
pron+0.001
Neg logit contributions
 imag-0.001
 scratches-0.001
 pulls-0.001
 pret-0.001
 hides-0.001
Token w
Token ID266
Feature activation-1.20e+00

Pos logit contributions
 fisherman+0.010
 field+0.010
 corner+0.010
 place+0.009
 basket+0.009
Neg logit contributions
drivers-0.010
 imag-0.008
 fault-0.008
wed-0.007
 fourteen-0.006
Tokenags
Token ID3775
Feature activation+4.81e+00

Pos logit contributions
 imag+0.078
 hides+0.071
 scratches+0.071
 pulls+0.071
 pret+0.070
Neg logit contributions
over-0.039
date-0.033
apping-0.033
ute-0.033
he-0.032
Token his
Token ID465
Feature activation-2.92e-01

Pos logit contributions
over+0.259
apping+0.238
date+0.236
ute+0.234
he+0.232
Neg logit contributions
 imag-0.209
 hides-0.179
 pulls-0.176
 scratches-0.175
 pret-0.172
Token tail
Token ID7894
Feature activation-1.32e-01

Pos logit contributions
 imag+0.015
 scratches+0.013
 hides+0.013
 pulls+0.013
 pret+0.013
Neg logit contributions
over-0.013
date-0.012
apping-0.012
he-0.012
ute-0.012
Token.
Token ID13
Feature activation+3.49e-01

Pos logit contributions
 imag+0.006
 scratches+0.005
 hides+0.005
 pulls+0.005
 pret+0.005
Neg logit contributions
over-0.007
date-0.006
ute-0.006
pron-0.006
apping-0.006
Token\n
Token ID198
Feature activation-6.26e-02

Pos logit contributions
over+0.016
apping+0.014
date+0.014
ute+0.014
he+0.014
Neg logit contributions
 imag-0.018
 hides-0.016
 scratches-0.016
 pulls-0.016
 pret-0.016
Token\n
Token ID198
Feature activation-5.53e-02

Pos logit contributions
 imag+0.004
 hides+0.003
 scratches+0.003
 pret+0.003
 pulls+0.003
Neg logit contributions
over-0.002
apping-0.002
he-0.002
ute-0.002
date-0.002
Token\n
Token ID198
Feature activation-1.19e-01

Pos logit contributions
over+0.017
ute+0.016
date+0.016
apping+0.016
he+0.015
Neg logit contributions
 imag-0.012
 scratches-0.011
 pulls-0.011
 hides-0.011
 pret-0.011
Token\n
Token ID198
Feature activation-1.65e-02

Pos logit contributions
 imag+0.004
 scratches+0.004
 pulls+0.004
 screams+0.004
 pret+0.004
Neg logit contributions
over-0.007
date-0.006
apping-0.006
ute-0.006
kin-0.006
TokenThe
Token ID464
Feature activation+6.22e-01

Pos logit contributions
 scratches+0.001
 imag+0.001
 pulls+0.001
 dr+0.001
 pret+0.001
Neg logit contributions
over-0.001
apping-0.001
ute-0.001
date-0.001
kin-0.001
Token dog
Token ID3290
Feature activation-3.33e-01

Pos logit contributions
over+0.028
apping+0.026
date+0.025
he+0.025
ute+0.024
Neg logit contributions
 imag-0.032
 scratches-0.030
 pret-0.029
 hides-0.028
 screams-0.028
Token w
Token ID266
Feature activation-8.76e-01

Pos logit contributions
 imag+0.013
drivers+0.011
 hides+0.010
 fault+0.010
 scratches+0.010
Neg logit contributions
over-0.018
 year-0.016
apping-0.016
ute-0.016
date-0.016
Tokenags
Token ID3775
Feature activation+4.81e+00

Pos logit contributions
 imag+0.057
 hides+0.053
 scratches+0.052
 pret+0.052
 pulls+0.052
Neg logit contributions
over-0.028
apping-0.024
date-0.023
ute-0.023
he-0.023
Token its
Token ID663
Feature activation-1.44e-01

Pos logit contributions
over+0.278
date+0.258
ute+0.256
apping+0.255
he+0.251
Neg logit contributions
 imag-0.184
 scratches-0.160
 pulls-0.158
 hides-0.157
 pret-0.157
Token tail
Token ID7894
Feature activation+2.16e-01

Pos logit contributions
 imag+0.007
 scratches+0.007
 hides+0.007
 pulls+0.007
 pret+0.007
Neg logit contributions
over-0.006
date-0.006
apping-0.006
he-0.006
ute-0.006
Token.
Token ID13
Feature activation+3.84e-01

Pos logit contributions
over+0.010
date+0.010
ute+0.010
apping+0.009
he+0.009
Neg logit contributions
 imag-0.010
 scratches-0.009
 pulls-0.009
 hides-0.009
 pret-0.009
Token It
Token ID632
Feature activation-8.56e-01

Pos logit contributions
over+0.022
ute+0.021
date+0.021
he+0.020
apping+0.020
Neg logit contributions
 imag-0.014
 pulls-0.013
 scratches-0.013
 pret-0.013
 hides-0.013
Token l
Token ID300
Feature activation-7.55e-01

Pos logit contributions
drivers+0.040
 supervision+0.034
 imag+0.033
 fault+0.032
 subt+0.030
Neg logit contributions
 year-0.035
over-0.033
 world-0.032
ute-0.031
pron-0.031
Token We
Token ID775
Feature activation-1.74e-01

Pos logit contributions
over+0.008
date+0.007
ute+0.007
he+0.007
apping+0.007
Neg logit contributions
 imag-0.007
 hides-0.006
 scratches-0.006
 pulls-0.006
 pret-0.006
Token have
Token ID423
Feature activation+2.01e-03

Pos logit contributions
 imag+0.007
 hides+0.006
 scratches+0.006
 pulls+0.006
 pret+0.006
Neg logit contributions
over-0.009
apping-0.009
date-0.009
ute-0.009
he-0.008
Token different
Token ID1180
Feature activation-6.65e-01

Pos logit contributions
over+0.000
apping+0.000
 year+0.000
ute+0.000
date+0.000
Neg logit contributions
 imag-0.000
 pret-0.000
 hides-0.000
 pulls-0.000
drivers-0.000
Token likes
Token ID7832
Feature activation-4.13e-01

Pos logit contributions
 imag+0.029
 pulls+0.028
 hides+0.027
 scratches+0.027
drivers+0.026
Neg logit contributions
over-0.031
 year-0.028
ute-0.028
apping-0.027
date-0.027
Token and
Token ID290
Feature activation+1.02e-01

Pos logit contributions
 imag+0.018
 pulls+0.016
 hides+0.016
 scratches+0.016
 pret+0.016
Neg logit contributions
over-0.021
ute-0.019
apping-0.019
date-0.019
 year-0.019
Token disl
Token ID19621
Feature activation-4.29e+00

Pos logit contributions
over+0.005
apping+0.005
ute+0.005
date+0.004
 year+0.004
Neg logit contributions
 imag-0.004
drivers-0.004
 pulls-0.004
 pret-0.004
 scratches-0.004
Tokenikes
Token ID7938
Feature activation+5.91e-01

Pos logit contributions
 imag+0.224
 scratches+0.210
 hides+0.202
 pulls+0.202
 pret+0.197
Neg logit contributions
over-0.189
apping-0.181
ute-0.177
he-0.173
date-0.165
Token.
Token ID13
Feature activation+1.08e-01

Pos logit contributions
over+0.028
date+0.026
ute+0.026
apping+0.026
he+0.024
Neg logit contributions
 imag-0.029
 scratches-0.025
 hides-0.025
 pulls-0.025
 pret-0.025
Token We
Token ID775
Feature activation-2.12e-01

Pos logit contributions
over+0.006
date+0.005
ute+0.005
he+0.005
apping+0.005
Neg logit contributions
 imag-0.005
 hides-0.004
 scratches-0.004
 pulls-0.004
 pret-0.004
Token can
Token ID460
Feature activation+1.24e-01

Pos logit contributions
 imag+0.009
 hides+0.008
 scratches+0.008
 pulls+0.008
 pret+0.008
Neg logit contributions
over-0.011
date-0.010
ute-0.010
apping-0.010
he-0.010
Token be
Token ID307
Feature activation+3.01e-01

Pos logit contributions
over+0.006
apping+0.006
date+0.006
ute+0.006
he+0.006
Neg logit contributions
 imag-0.006
 hides-0.005
 scratches-0.005
 pulls-0.005
 pret-0.005
Token had
Token ID550
Feature activation+1.82e-01

Pos logit contributions
 imag+0.006
 hides+0.005
 scratches+0.005
 pulls+0.005
 pret+0.005
Neg logit contributions
over-0.006
he-0.006
date-0.006
 year-0.006
apping-0.006
Token hamm
Token ID25755
Feature activation+1.25e+00

Pos logit contributions
over+0.007
 year+0.007
ak+0.006
date+0.006
pron+0.006
Neg logit contributions
 imag-0.009
 hides-0.009
 screams-0.009
 pulls-0.008
 pret-0.008
Tokeners
Token ID364
Feature activation-2.38e-02

Pos logit contributions
over+0.055
apping+0.049
date+0.049
ute+0.049
he+0.049
Neg logit contributions
 imag-0.066
 hides-0.059
 scratches-0.058
 pulls-0.058
 pret-0.057
Token,
Token ID11
Feature activation-4.89e-02

Pos logit contributions
 imag+0.001
 hides+0.001
 scratches+0.001
 pulls+0.001
 pret+0.001
Neg logit contributions
over-0.001
apping-0.001
date-0.001
ute-0.001
he-0.001
Token screw
Token ID9580
Feature activation+4.01e-01

Pos logit contributions
 imag+0.002
 hides+0.002
 screams+0.002
 pulls+0.002
 scratches+0.002
Neg logit contributions
over-0.003
 year-0.002
he-0.002
ute-0.002
pron-0.002
Tokendrivers
Token ID36702
Feature activation-4.04e+00

Pos logit contributions
over+0.024
ute+0.022
apping+0.022
he+0.022
kin+0.021
Neg logit contributions
 imag-0.014
 hides-0.011
 screams-0.011
 pret-0.010
 scratches-0.010
Token,
Token ID11
Feature activation-7.24e-02

Pos logit contributions
 imag+0.264
 hides+0.238
 pret+0.234
 pulls+0.232
 scratches+0.230
Neg logit contributions
over-0.131
apping-0.111
 year-0.110
he-0.107
ute-0.104
Token pl
Token ID458
Feature activation-2.58e-01

Pos logit contributions
 imag+0.004
 hides+0.004
 screams+0.004
 pulls+0.003
 scratches+0.003
Neg logit contributions
over-0.003
 year-0.002
he-0.002
ute-0.002
apping-0.002
Tokeniers
Token ID3183
Feature activation+5.09e-01

Pos logit contributions
 imag+0.014
 scratches+0.013
 pulls+0.013
 hides+0.013
 pret+0.012
Neg logit contributions
over-0.011
date-0.010
ute-0.009
apping-0.009
 year-0.009
Token,
Token ID11
Feature activation-6.98e-02

Pos logit contributions
over+0.017
apping+0.015
ute+0.013
he+0.013
 year+0.013
Neg logit contributions
 imag-0.033
 hides-0.030
 pret-0.029
 pulls-0.029
 scratches-0.029
Token and
Token ID290
Feature activation-9.77e-02

Pos logit contributions
 imag+0.004
 hides+0.004
 screams+0.004
 scratches+0.004
 pulls+0.004
Neg logit contributions
over-0.002
 year-0.002
ute-0.002
apping-0.002
he-0.002
Token for
Token ID329
Feature activation+4.26e-01

Pos logit contributions
 imag+0.012
 hides+0.011
 scratches+0.011
 pulls+0.011
 pret+0.011
Neg logit contributions
over-0.013
apping-0.012
date-0.012
ute-0.012
he-0.011
Token some
Token ID617
Feature activation-4.22e-01

Pos logit contributions
over+0.023
date+0.021
apping+0.021
 year+0.021
he+0.020
Neg logit contributions
 imag-0.018
 hides-0.016
 pulls-0.015
 pret-0.015
 scratches-0.015
Token funny
Token ID8258
Feature activation-4.82e-01

Pos logit contributions
 imag+0.019
 scratches+0.016
 hides+0.016
 pulls+0.016
 pret+0.016
Neg logit contributions
over-0.022
apping-0.020
ute-0.020
date-0.020
he-0.020
Token little
Token ID1310
Feature activation+3.53e-01

Pos logit contributions
 imag+0.021
 hides+0.019
 scratches+0.018
 pulls+0.018
 pret+0.018
Neg logit contributions
over-0.025
date-0.023
apping-0.023
ute-0.023
 year-0.022
Token sun
Token ID4252
Feature activation-6.35e-01

Pos logit contributions
over+0.017
date+0.015
apping+0.015
 year+0.015
ute+0.014
Neg logit contributions
 imag-0.017
 hides-0.015
 pulls-0.015
 scratches-0.015
 pret-0.015
Tokenfl
Token ID2704
Feature activation-3.94e+00

Pos logit contributions
 imag+0.032
 hides+0.028
 scratches+0.028
 pulls+0.028
 pret+0.028
Neg logit contributions
over-0.029
ute-0.026
apping-0.026
date-0.026
he-0.025
Tokenowers
Token ID3618
Feature activation+7.92e-01

Pos logit contributions
 imag+0.253
 hides+0.226
 pulls+0.225
 scratches+0.225
 pret+0.223
Neg logit contributions
over-0.125
apping-0.109
ute-0.109
date-0.107
 year-0.104
Token.
Token ID13
Feature activation+2.30e-01

Pos logit contributions
over+0.039
ute+0.036
date+0.036
apping+0.035
 year+0.034
Neg logit contributions
 imag-0.036
 hides-0.031
 pulls-0.031
 scratches-0.031
 pret-0.031
Token Her
Token ID2332
Feature activation+4.94e-01

Pos logit contributions
over+0.012
ute+0.012
apping+0.011
he+0.011
date+0.011
Neg logit contributions
 imag-0.009
 pret-0.008
 hides-0.008
 pulls-0.008
 scratches-0.008
Token m
Token ID285
Feature activation+4.12e-01

Pos logit contributions
over+0.018
ute+0.016
apping+0.016
date+0.016
 year+0.016
Neg logit contributions
 imag-0.029
 hides-0.027
 pulls-0.026
 scratches-0.026
 pret-0.026
Tokenama
Token ID1689
Feature activation-4.43e-01

Pos logit contributions
over+0.011
date+0.009
apping+0.009
ute+0.009
he+0.008
Neg logit contributions
 imag-0.029
 scratches-0.027
 hides-0.027
 pulls-0.026
 pret-0.026
Token garden
Token ID11376
Feature activation-5.15e-01

Pos logit contributions
 imag+0.005
 hides+0.005
 scratches+0.005
 pulls+0.005
 pret+0.005
Neg logit contributions
over-0.005
apping-0.005
date-0.004
ute-0.004
he-0.004
Token and
Token ID290
Feature activation+4.46e-02

Pos logit contributions
 imag+0.025
 hides+0.022
 scratches+0.022
 pret+0.022
 pulls+0.022
Neg logit contributions
over-0.024
ute-0.022
apping-0.022
date-0.022
he-0.021
Token admire
Token ID21099
Feature activation-4.47e-01

Pos logit contributions
over+0.002
date+0.002
apping+0.002
he+0.002
ute+0.002
Neg logit contributions
 imag-0.002
 scratches-0.002
 hides-0.002
 pulls-0.002
drivers-0.002
Token his
Token ID465
Feature activation-1.05e+00

Pos logit contributions
 imag+0.018
 hides+0.015
 scratches+0.015
 pulls+0.015
 pret+0.015
Neg logit contributions
over-0.026
apping-0.024
date-0.024
ute-0.024
he-0.023
Token sun
Token ID4252
Feature activation-9.15e-01

Pos logit contributions
 imag+0.051
 scratches+0.045
 hides+0.044
 pret+0.043
 pulls+0.043
Neg logit contributions
over-0.051
apping-0.047
date-0.047
he-0.046
ute-0.045
Tokenfl
Token ID2704
Feature activation-3.90e+00

Pos logit contributions
 imag+0.044
 hides+0.039
 pret+0.039
 scratches+0.039
 pulls+0.038
Neg logit contributions
over-0.042
apping-0.039
date-0.039
ute-0.039
he-0.037
Tokenowers
Token ID3618
Feature activation+5.22e-01

Pos logit contributions
 imag+0.267
 hides+0.241
 scratches+0.240
 pulls+0.239
 pret+0.239
Neg logit contributions
over-0.110
apping-0.093
date-0.092
ute-0.091
he-0.087
Token.
Token ID13
Feature activation-8.24e-03

Pos logit contributions
over+0.025
date+0.024
ute+0.023
apping+0.023
 year+0.022
Neg logit contributions
 imag-0.024
 hides-0.021
 pret-0.021
 pulls-0.021
 scratches-0.021
Token They
Token ID1119
Feature activation-4.64e-01

Pos logit contributions
 imag+0.000
 hides+0.000
 scratches+0.000
 pulls+0.000
 pret+0.000
Neg logit contributions
over-0.000
date-0.000
ute-0.000
apping-0.000
he-0.000
Token also
Token ID635
Feature activation+1.86e-01

Pos logit contributions
 imag+0.021
 hides+0.018
 scratches+0.018
 pulls+0.018
drivers+0.018
Neg logit contributions
over-0.023
he-0.021
date-0.021
apping-0.021
ute-0.021
Token see
Token ID766
Feature activation+3.89e-01

Pos logit contributions
over+0.009
he+0.008
date+0.008
apping+0.008
 year+0.008
Neg logit contributions
 imag-0.008
drivers-0.008
 hides-0.007
 scratches-0.007
 pulls-0.007
Token careful
Token ID8161
Feature activation+1.70e-01

Pos logit contributions
over+0.058
date+0.057
ute+0.056
he+0.056
apping+0.055
Neg logit contributions
 imag-0.055
 scratches-0.051
 pulls-0.050
 hides-0.048
 pret-0.048
Token,
Token ID11
Feature activation+1.67e-01

Pos logit contributions
over+0.008
date+0.008
ute+0.008
he+0.008
apping+0.007
Neg logit contributions
 imag-0.007
 pulls-0.007
 scratches-0.007
 hides-0.007
 knocks-0.006
Token Lily
Token ID20037
Feature activation-5.65e-01

Pos logit contributions
over+0.009
apping+0.008
date+0.008
ute+0.008
he+0.008
Neg logit contributions
 imag-0.008
 hides-0.007
 scratches-0.007
 pulls-0.007
 pret-0.006
Token.
Token ID13
Feature activation-1.34e-01

Pos logit contributions
 imag+0.025
 hides+0.023
 scratches+0.023
 pulls+0.023
 pret+0.022
Neg logit contributions
over-0.027
date-0.026
ute-0.026
apping-0.026
he-0.025
Token Cater
Token ID40742
Feature activation-1.36e+00

Pos logit contributions
 imag+0.005
 hides+0.005
 pulls+0.005
 scratches+0.005
 pret+0.005
Neg logit contributions
over-0.007
date-0.007
he-0.007
ute-0.007
apping-0.007
Tokenpill
Token ID27215
Feature activation-3.88e+00

Pos logit contributions
 imag+0.058
 hides+0.052
 pret+0.051
 pulls+0.051
 scratches+0.050
Neg logit contributions
date-0.066
over-0.065
ute-0.065
pron-0.064
apping-0.062
Tokenars
Token ID945
Feature activation+5.18e-01

Pos logit contributions
 imag+0.206
 scratches+0.188
 pulls+0.186
 hides+0.181
 pret+0.177
Neg logit contributions
over-0.162
date-0.152
ute-0.148
apping-0.148
pron-0.145
Token can
Token ID460
Feature activation+1.90e-02

Pos logit contributions
over+0.032
date+0.030
apping+0.029
ute+0.029
he+0.029
Neg logit contributions
 imag-0.018
 hides-0.015
 scratches-0.015
 pulls-0.015
 pret-0.015
Token be
Token ID307
Feature activation+3.66e-01

Pos logit contributions
over+0.001
ute+0.001
date+0.001
apping+0.001
he+0.001
Neg logit contributions
 imag-0.001
 hides-0.001
 pret-0.001
 scratches-0.001
 pulls-0.001
Token tricky
Token ID17198
Feature activation-5.61e-01

Pos logit contributions
over+0.016
apping+0.014
date+0.014
 year+0.014
ute+0.014
Neg logit contributions
 imag-0.019
 hides-0.016
 pret-0.016
 pulls-0.016
drivers-0.016
Token.
Token ID13
Feature activation-1.96e-01

Pos logit contributions
 imag+0.029
 hides+0.026
 scratches+0.025
 pret+0.025
 pulls+0.025
Neg logit contributions
over-0.025
date-0.023
ute-0.023
apping-0.023
he-0.023
Token back
Token ID736
Feature activation+4.71e-01

Pos logit contributions
 imag+0.014
 scratches+0.013
 hides+0.012
 pulls+0.012
 screams+0.012
Neg logit contributions
over-0.020
he-0.019
ute-0.019
pron-0.019
date-0.019
Token with
Token ID351
Feature activation-3.40e-01

Pos logit contributions
over+0.024
ute+0.023
date+0.022
pron+0.022
apping+0.022
Neg logit contributions
 imag-0.020
 scratches-0.018
 hides-0.018
 pulls-0.017
 pret-0.017
Token lots
Token ID6041
Feature activation-6.38e-01

Pos logit contributions
 imag+0.014
 hides+0.013
 scratches+0.012
 pret+0.012
 pulls+0.012
Neg logit contributions
over-0.018
date-0.017
apping-0.017
ute-0.017
he-0.017
Token of
Token ID286
Feature activation+1.26e+00

Pos logit contributions
 imag+0.047
 scratches+0.043
 hides+0.043
 pulls+0.043
 pret+0.043
Neg logit contributions
over-0.015
date-0.012
apping-0.012
ute-0.012
he-0.011
Token sun
Token ID4252
Feature activation-5.85e-01

Pos logit contributions
over+0.064
apping+0.059
date+0.059
ute+0.058
 year+0.057
Neg logit contributions
 imag-0.056
 hides-0.050
 pulls-0.049
 pret-0.049
 scratches-0.048
Tokenfl
Token ID2704
Feature activation-3.80e+00

Pos logit contributions
 imag+0.025
 hides+0.021
 scratches+0.021
 pulls+0.021
 pret+0.021
Neg logit contributions
over-0.031
date-0.029
apping-0.029
ute-0.029
he-0.028
Tokenowers
Token ID3618
Feature activation+5.63e-01

Pos logit contributions
 imag+0.259
 hides+0.235
 scratches+0.235
 pulls+0.233
 pret+0.231
Neg logit contributions
over-0.106
date-0.091
apping-0.090
ute-0.090
he-0.085
Token to
Token ID284
Feature activation-7.15e-01

Pos logit contributions
over+0.026
date+0.024
ute+0.024
apping+0.023
he+0.023
Neg logit contributions
 imag-0.028
 hides-0.025
 scratches-0.025
 pulls-0.024
 pret-0.024
Token fill
Token ID6070
Feature activation+6.82e-01

Pos logit contributions
 imag+0.029
 scratches+0.026
 pret+0.025
 hides+0.024
 pulls+0.024
Neg logit contributions
over-0.040
apping-0.038
ute-0.037
date-0.036
he-0.034
Token the
Token ID262
Feature activation-1.37e-01

Pos logit contributions
over+0.037
apping+0.036
ute+0.035
date+0.034
pron+0.033
Neg logit contributions
 imag-0.027
 scratches-0.024
 pret-0.023
 hides-0.023
 pulls-0.022
Token prince
Token ID19716
Feature activation-9.86e-01

Pos logit contributions
 imag+0.007
 scratches+0.006
 hides+0.006
 pret+0.006
 pulls+0.006
Neg logit contributions
over-0.006
apping-0.006
ute-0.005
date-0.005
he-0.005
Token know
Token ID760
Feature activation+1.62e-01

Pos logit contributions
over+0.044
ute+0.044
date+0.043
apping+0.043
pron+0.042
Neg logit contributions
 imag-0.019
 hides-0.017
 scratches-0.017
 pret-0.016
 pulls-0.016
Token,
Token ID11
Feature activation+1.75e-01

Pos logit contributions
over+0.006
apping+0.006
 year+0.006
ute+0.006
he+0.006
Neg logit contributions
 imag-0.009
 hides-0.008
 scratches-0.008
 pret-0.007
drivers-0.007
Token Lily
Token ID20037
Feature activation-4.43e-01

Pos logit contributions
over+0.010
date+0.009
apping+0.009
ute+0.009
he+0.009
Neg logit contributions
 imag-0.007
 hides-0.006
 pret-0.006
 pulls-0.006
 scratches-0.006
Token.
Token ID13
Feature activation-5.05e-02

Pos logit contributions
 imag+0.024
 hides+0.021
 scratches+0.021
 pulls+0.021
 pret+0.021
Neg logit contributions
over-0.019
apping-0.017
date-0.017
ute-0.017
he-0.016
Token Cater
Token ID40742
Feature activation-1.40e+00

Pos logit contributions
 imag+0.002
 hides+0.002
 scratches+0.002
 pret+0.002
 pulls+0.002
Neg logit contributions
over-0.002
apping-0.002
ute-0.002
date-0.002
he-0.002
Tokenpill
Token ID27215
Feature activation-3.79e+00

Pos logit contributions
 imag+0.062
 hides+0.056
 pret+0.054
 scratches+0.054
 pulls+0.054
Neg logit contributions
over-0.070
date-0.067
ute-0.066
pron-0.065
apping-0.065
Tokenars
Token ID945
Feature activation+5.36e-01

Pos logit contributions
 imag+0.201
 scratches+0.182
 pulls+0.180
 hides+0.178
 pret+0.174
Neg logit contributions
over-0.162
date-0.152
apping-0.147
ute-0.146
pron-0.145
Token need
Token ID761
Feature activation+2.70e-01

Pos logit contributions
over+0.032
apping+0.030
ute+0.030
date+0.030
he+0.029
Neg logit contributions
 imag-0.020
 hides-0.016
 scratches-0.016
 pret-0.016
 pulls-0.016
Token leaves
Token ID5667
Feature activation-2.51e-02

Pos logit contributions
over+0.013
apping+0.012
ute+0.012
date+0.012
 year+0.012
Neg logit contributions
 imag-0.012
 hides-0.011
 pret-0.011
 pulls-0.011
 scratches-0.010
Token and
Token ID290
Feature activation+6.39e-02

Pos logit contributions
 imag+0.001
 hides+0.001
 scratches+0.001
 pulls+0.001
 pret+0.001
Neg logit contributions
over-0.001
date-0.001
apping-0.001
ute-0.001
he-0.001
Token air
Token ID1633
Feature activation+3.93e-01

Pos logit contributions
over+0.003
 year+0.003
 world+0.002
date+0.002
ute+0.002
Neg logit contributions
 imag-0.003
drivers-0.002
 knocks-0.002
 pulls-0.002
 pret-0.002
Token Did
Token ID7731
Feature activation+2.26e-01

Pos logit contributions
over+0.002
date+0.002
he+0.002
ute+0.002
apping+0.002
Neg logit contributions
 imag-0.002
 hides-0.001
 pulls-0.001
 scratches-0.001
 pret-0.001
Token you
Token ID345
Feature activation+8.58e-01

Pos logit contributions
over+0.007
ute+0.006
apping+0.006
 year+0.006
he+0.006
Neg logit contributions
 imag-0.014
 pret-0.012
 scratches-0.012
drivers-0.012
 hides-0.012
Token know
Token ID760
Feature activation+1.64e-01

Pos logit contributions
over+0.051
apping+0.046
date+0.045
ute+0.044
 year+0.044
Neg logit contributions
 imag-0.033
 pulls-0.027
 scratches-0.027
 hides-0.027
 pret-0.027
Token that
Token ID326
Feature activation+1.90e-01

Pos logit contributions
over+0.011
date+0.010
apping+0.010
ute+0.010
he+0.010
Neg logit contributions
 imag-0.005
 hides-0.005
 scratches-0.004
 pulls-0.004
 pret-0.004
Token cater
Token ID20825
Feature activation-1.16e+00

Pos logit contributions
over+0.010
apping+0.009
date+0.009
ute+0.009
he+0.009
Neg logit contributions
 imag-0.009
 hides-0.008
 pret-0.008
 scratches-0.007
 pulls-0.007
Tokenpill
Token ID27215
Feature activation-3.78e+00

Pos logit contributions
 imag+0.041
 hides+0.034
 scratches+0.034
 pulls+0.033
 pret+0.033
Neg logit contributions
over-0.070
date-0.066
apping-0.066
ute-0.065
pron-0.063
Tokenars
Token ID945
Feature activation+5.39e-01

Pos logit contributions
 imag+0.195
 pulls+0.170
 scratches+0.169
 hides+0.167
 pret+0.164
Neg logit contributions
over-0.162
date-0.152
apping-0.150
ute-0.150
pron-0.148
Token turn
Token ID1210
Feature activation+1.05e-01

Pos logit contributions
over+0.030
apping+0.027
date+0.027
ute+0.026
he+0.025
Neg logit contributions
 imag-0.023
 hides-0.020
 pulls-0.019
 scratches-0.019
 pret-0.019
Token into
Token ID656
Feature activation-3.72e-01

Pos logit contributions
over+0.005
apping+0.005
date+0.005
ute+0.005
he+0.004
Neg logit contributions
 imag-0.005
 hides-0.004
 scratches-0.004
 pulls-0.004
 pret-0.004
Token beautiful
Token ID4950
Feature activation-2.99e-01

Pos logit contributions
 imag+0.015
 scratches+0.013
 hides+0.013
 pret+0.013
 pulls+0.013
Neg logit contributions
over-0.021
apping-0.019
date-0.019
ute-0.019
he-0.019
Token butterflies
Token ID44571
Feature activation-2.76e-01

Pos logit contributions
 imag+0.013
 hides+0.011
 pret+0.011
 scratches+0.011
 pulls+0.011
Neg logit contributions
over-0.016
apping-0.015
he-0.015
ute-0.014
date-0.014
Token watching
Token ID4964
Feature activation-1.06e+00

Pos logit contributions
over+0.031
ute+0.030
he+0.029
date+0.029
apping+0.029
Neg logit contributions
 imag-0.029
 scratches-0.026
 hides-0.026
 pulls-0.026
 pret-0.025
Token her
Token ID607
Feature activation-5.93e-01

Pos logit contributions
 imag+0.041
 scratches+0.036
 hides+0.035
 pret+0.035
 pulls+0.035
Neg logit contributions
over-0.059
apping-0.056
ute-0.055
date-0.055
he-0.054
Token eat
Token ID4483
Feature activation-1.13e+00

Pos logit contributions
 imag+0.028
 hides+0.025
 scratches+0.025
 pulls+0.025
 pret+0.025
Neg logit contributions
over-0.028
date-0.026
apping-0.026
pron-0.025
ute-0.025
Token the
Token ID262
Feature activation+4.61e-02

Pos logit contributions
 imag+0.047
 scratches+0.044
 hides+0.042
 pret+0.042
 pulls+0.041
Neg logit contributions
over-0.058
apping-0.057
ute-0.056
date-0.055
he-0.052
Token pine
Token ID20161
Feature activation-3.52e-01

Pos logit contributions
over+0.002
apping+0.002
ute+0.002
date+0.002
he+0.002
Neg logit contributions
 imag-0.002
 scratches-0.002
 hides-0.002
 pulls-0.002
 pret-0.002
Tokencon
Token ID1102
Feature activation-3.77e+00

Pos logit contributions
 imag+0.018
 scratches+0.017
 hides+0.016
 pulls+0.016
 pret+0.016
Neg logit contributions
over-0.015
apping-0.014
date-0.014
ute-0.014
he-0.013
Tokenes
Token ID274
Feature activation-2.31e+00

Pos logit contributions
 imag+0.142
drivers+0.126
 pulls+0.125
 hides+0.124
 scratches+0.124
Neg logit contributions
over-0.198
ute-0.194
apping-0.191
date-0.190
pron-0.183
Token.
Token ID13
Feature activation-1.02e-01

Pos logit contributions
 imag+0.095
 scratches+0.084
 hides+0.084
 pulls+0.084
 pret+0.084
Neg logit contributions
over-0.120
apping-0.112
ute-0.111
date-0.109
pron-0.106
Token The
Token ID383
Feature activation-2.12e-01

Pos logit contributions
 imag+0.004
 hides+0.004
 pulls+0.004
 scratches+0.004
 pret+0.004
Neg logit contributions
over-0.005
ute-0.005
date-0.005
apping-0.005
he-0.005
Token squirrel
Token ID33039
Feature activation-7.99e-02

Pos logit contributions
 imag+0.010
 scratches+0.009
 hides+0.009
 pulls+0.009
 screams+0.009
Neg logit contributions
over-0.010
apping-0.010
date-0.009
ute-0.009
he-0.009
Token said
Token ID531
Feature activation+4.87e-01

Pos logit contributions
 imag+0.003
 hides+0.003
 scratches+0.003
 pret+0.003
 pulls+0.003
Neg logit contributions
over-0.005
apping-0.004
date-0.004
he-0.004
ute-0.004
Token da
Token ID12379
Feature activation+3.44e-01

Pos logit contributions
over+0.002
date+0.002
ute+0.002
apping+0.002
he+0.002
Neg logit contributions
 imag-0.001
 scratches-0.001
 pret-0.001
 hides-0.001
 pulls-0.001
Tokenis
Token ID271
Feature activation-1.54e+00

Pos logit contributions
over+0.014
apping+0.012
date+0.012
ute+0.012
he+0.012
Neg logit contributions
 imag-0.019
 hides-0.017
 scratches-0.017
 pulls-0.017
 pret-0.017
Tokenies
Token ID444
Feature activation-3.58e-02

Pos logit contributions
 imag+0.093
 hides+0.085
 pulls+0.083
 scratches+0.083
 pret+0.082
Neg logit contributions
over-0.056
apping-0.048
 year-0.047
ute-0.047
he-0.044
Token,
Token ID11
Feature activation-3.00e-03

Pos logit contributions
 imag+0.002
 hides+0.002
 pulls+0.002
 pret+0.002
 scratches+0.002
Neg logit contributions
over-0.002
date-0.002
ute-0.002
apping-0.002
he-0.001
Token sun
Token ID4252
Feature activation-6.42e-01

Pos logit contributions
 imag+0.000
 hides+0.000
 scratches+0.000
 pulls+0.000
 pret+0.000
Neg logit contributions
over-0.000
apping-0.000
date-0.000
ute-0.000
he-0.000
Tokenfl
Token ID2704
Feature activation-3.76e+00

Pos logit contributions
 imag+0.025
 hides+0.021
 scratches+0.021
 pulls+0.020
 pret+0.020
Neg logit contributions
over-0.037
date-0.035
apping-0.035
ute-0.034
he-0.034
Tokenowers
Token ID3618
Feature activation+5.34e-01

Pos logit contributions
 imag+0.266
 hides+0.244
 scratches+0.243
 pulls+0.242
 pret+0.241
Neg logit contributions
over-0.097
apping-0.081
date-0.080
ute-0.080
he-0.076
Token,
Token ID11
Feature activation+6.17e-03

Pos logit contributions
over+0.026
date+0.024
ute+0.023
apping+0.023
he+0.022
Neg logit contributions
 imag-0.025
 hides-0.023
 scratches-0.022
 pulls-0.022
 pret-0.022
Token and
Token ID290
Feature activation+1.09e-01

Pos logit contributions
over+0.000
apping+0.000
date+0.000
ute+0.000
he+0.000
Neg logit contributions
 imag-0.000
 hides-0.000
 scratches-0.000
 pulls-0.000
 pret-0.000
Token tul
Token ID48373
Feature activation-1.86e+00

Pos logit contributions
over+0.006
apping+0.005
date+0.005
ute+0.005
 year+0.005
Neg logit contributions
 imag-0.005
 hides-0.004
 scratches-0.004
 pulls-0.004
drivers-0.004
Tokenips
Token ID2419
Feature activation-3.19e-01

Pos logit contributions
 imag+0.094
 hides+0.085
 scratches+0.084
 pulls+0.084
 pret+0.082
Neg logit contributions
over-0.086
apping-0.078
date-0.077
ute-0.077
he-0.075
Token away
Token ID1497
Feature activation+1.15e+00

Pos logit contributions
 imag+0.030
 hides+0.027
 scratches+0.026
 pulls+0.026
 pret+0.026
Neg logit contributions
over-0.030
date-0.028
apping-0.028
ute-0.028
 year-0.026
Token in
Token ID287
Feature activation+2.40e-01

Pos logit contributions
over+0.057
date+0.055
apping+0.054
ute+0.054
pron+0.051
Neg logit contributions
 imag-0.050
 hides-0.046
 scratches-0.045
 pret-0.044
 pulls-0.043
Token her
Token ID607
Feature activation-7.29e-01

Pos logit contributions
over+0.015
apping+0.014
ute+0.014
date+0.013
he+0.013
Neg logit contributions
 imag-0.009
 scratches-0.007
 hides-0.007
 pret-0.007
 pulls-0.007
Token pig
Token ID12967
Feature activation-1.76e-01

Pos logit contributions
 imag+0.042
 hides+0.037
 scratches+0.036
 pret+0.036
 pulls+0.035
Neg logit contributions
over-0.029
apping-0.028
he-0.027
date-0.027
ute-0.026
Tokengy
Token ID1360
Feature activation-2.12e+00

Pos logit contributions
 imag+0.007
 hides+0.007
 scratches+0.006
 pulls+0.006
 pret+0.006
Neg logit contributions
over-0.009
apping-0.009
ute-0.009
he-0.008
date-0.008
Tokenbank
Token ID17796
Feature activation-3.73e+00

Pos logit contributions
 imag+0.094
 hides+0.082
 scratches+0.081
 pret+0.080
 pulls+0.080
Neg logit contributions
over-0.110
apping-0.102
ute-0.100
date-0.100
he-0.099
Token every
Token ID790
Feature activation-5.69e-01

Pos logit contributions
 imag+0.173
 hides+0.156
 scratches+0.153
 pulls+0.151
 pret+0.149
Neg logit contributions
over-0.174
date-0.166
ute-0.164
apping-0.160
he-0.159
Token week
Token ID1285
Feature activation-7.37e-01

Pos logit contributions
 imag+0.030
 knocks+0.027
 hides+0.026
 scratches+0.026
 pret+0.026
Neg logit contributions
over-0.023
apping-0.023
ute-0.022
he-0.021
abby-0.020
Token.
Token ID13
Feature activation-2.10e-01

Pos logit contributions
 imag+0.037
 hides+0.033
 scratches+0.032
 pret+0.032
 pulls+0.032
Neg logit contributions
over-0.034
ute-0.031
apping-0.031
date-0.031
he-0.030
Token This
Token ID770
Feature activation-9.60e-01

Pos logit contributions
 imag+0.009
 hides+0.008
 pulls+0.008
 pret+0.008
 scratches+0.008
Neg logit contributions
over-0.011
ute-0.010
date-0.010
apping-0.010
he-0.010
Token way
Token ID835
Feature activation-4.17e-01

Pos logit contributions
 imag+0.055
 hides+0.049
 pret+0.049
 scratches+0.049
 pulls+0.048
Neg logit contributions
over-0.038
apping-0.036
ute-0.034
date-0.033
he-0.033
Token happy
Token ID3772
Feature activation+1.63e-01

Pos logit contributions
 imag+0.014
 scratches+0.012
 hides+0.012
 pulls+0.012
 pret+0.012
Neg logit contributions
over-0.010
date-0.009
ute-0.009
apping-0.009
he-0.009
Token with
Token ID351
Feature activation-4.65e-01

Pos logit contributions
over+0.008
date+0.007
ute+0.007
apping+0.007
he+0.007
Neg logit contributions
 imag-0.008
 hides-0.007
 scratches-0.007
 pulls-0.007
 pret-0.007
Token all
Token ID477
Feature activation-8.64e-01

Pos logit contributions
 imag+0.019
 hides+0.016
 scratches+0.016
 pulls+0.016
 pret+0.016
Neg logit contributions
over-0.026
apping-0.024
date-0.024
ute-0.024
he-0.024
Token the
Token ID262
Feature activation-1.25e-01

Pos logit contributions
 imag+0.033
 scratches+0.030
 hides+0.029
 pulls+0.029
 pret+0.028
Neg logit contributions
over-0.049
ute-0.046
date-0.046
apping-0.046
pron-0.043
Token sun
Token ID4252
Feature activation-4.86e-01

Pos logit contributions
 imag+0.006
 scratches+0.005
 hides+0.005
 pret+0.005
 pulls+0.005
Neg logit contributions
over-0.007
apping-0.006
date-0.006
ute-0.006
he-0.006
Tokenfl
Token ID2704
Feature activation-3.71e+00

Pos logit contributions
 imag+0.020
 scratches+0.017
 hides+0.017
 pulls+0.017
 pret+0.017
Neg logit contributions
over-0.026
date-0.024
apping-0.024
ute-0.024
he-0.023
Tokenowers
Token ID3618
Feature activation+5.82e-01

Pos logit contributions
 imag+0.253
 scratches+0.227
 hides+0.227
 pulls+0.225
 pret+0.224
Neg logit contributions
over-0.105
apping-0.090
date-0.090
ute-0.089
he-0.084
Token and
Token ID290
Feature activation+6.47e-02

Pos logit contributions
over+0.027
date+0.025
ute+0.025
apping+0.025
he+0.024
Neg logit contributions
 imag-0.028
 scratches-0.025
 hides-0.025
 pulls-0.025
 pret-0.024
Token he
Token ID339
Feature activation-4.95e-01

Pos logit contributions
over+0.003
apping+0.003
date+0.003
 year+0.003
he+0.003
Neg logit contributions
 imag-0.003
 hides-0.003
 scratches-0.003
drivers-0.003
 pulls-0.003
Token ordered
Token ID6149
Feature activation+5.25e-02

Pos logit contributions
 imag+0.021
 hides+0.018
 scratches+0.018
drivers+0.018
 pulls+0.018
Neg logit contributions
over-0.026
 year-0.024
he-0.024
apping-0.024
date-0.024
Token his
Token ID465
Feature activation-6.25e-01

Pos logit contributions
over+0.003
apping+0.003
ute+0.003
date+0.003
he+0.003
Neg logit contributions
 imag-0.002
 scratches-0.002
 hides-0.002
 pulls-0.002
 pret-0.002
Token<|endoftext|>
Token ID50256
Feature activation-1.26e-01

Pos logit contributions
over+0.054
ute+0.051
date+0.050
he+0.050
apping+0.049
Neg logit contributions
 imag-0.043
 hides-0.037
 pulls-0.037
 pret-0.037
 scratches-0.036
TokenL
Token ID43
Feature activation-5.80e-01

Pos logit contributions
 imag+0.005
 pret+0.004
 hides+0.004
 scratches+0.004
 pulls+0.004
Neg logit contributions
over-0.007
ute-0.006
apping-0.006
pron-0.006
date-0.006
Tokenily
Token ID813
Feature activation-1.13e+00

Pos logit contributions
 imag+0.033
 scratches+0.029
 hides+0.029
 pulls+0.029
 pret+0.029
Neg logit contributions
over-0.023
date-0.021
apping-0.021
ute-0.021
he-0.020
Token loved
Token ID6151
Feature activation-1.93e-01

Pos logit contributions
 imag+0.062
 hides+0.055
 scratches+0.055
 pulls+0.055
 pret+0.055
Neg logit contributions
over-0.047
apping-0.043
date-0.043
ute-0.042
he-0.041
Token sun
Token ID4252
Feature activation-8.56e-01

Pos logit contributions
 imag+0.010
 pulls+0.009
 hides+0.009
drivers+0.008
 scratches+0.008
Neg logit contributions
over-0.008
date-0.008
ute-0.008
 year-0.008
apping-0.008
Tokenfl
Token ID2704
Feature activation-3.64e+00

Pos logit contributions
 imag+0.048
 hides+0.043
 scratches+0.043
 pret+0.043
 pulls+0.043
Neg logit contributions
over-0.034
apping-0.031
date-0.031
ute-0.030
he-0.030
Tokenowers
Token ID3618
Feature activation+5.73e-01

Pos logit contributions
 imag+0.241
 scratches+0.215
 hides+0.215
 pulls+0.214
 pret+0.214
Neg logit contributions
over-0.109
apping-0.093
ute-0.092
date-0.091
 year-0.088
Token.
Token ID13
Feature activation-1.75e-01

Pos logit contributions
over+0.031
date+0.030
mber+0.029
ute+0.028
apping+0.028
Neg logit contributions
 imag-0.019
fl-0.018
 scratches-0.016
 pret-0.016
 puts-0.016
Token She
Token ID1375
Feature activation-2.76e-01

Pos logit contributions
 imag+0.007
 pret+0.006
 hides+0.006
 scratches+0.006
 knocks+0.006
Neg logit contributions
over-0.009
ute-0.009
date-0.008
apping-0.008
pron-0.008
Token liked
Token ID8288
Feature activation+3.63e-01

Pos logit contributions
 imag+0.014
 hides+0.012
 scratches+0.012
 pulls+0.012
drivers+0.012
Neg logit contributions
over-0.013
 year-0.012
ute-0.011
date-0.011
apping-0.011
Token to
Token ID284
Feature activation-8.63e-01

Pos logit contributions
over+0.018
apping+0.017
date+0.017
ute+0.017
he+0.016
Neg logit contributions
 imag-0.016
 hides-0.014
 scratches-0.014
 pulls-0.014
 pret-0.014
Token and
Token ID290
Feature activation-7.94e-02

Pos logit contributions
 imag+0.008
 hides+0.007
 scratches+0.007
 pret+0.007
 pulls+0.007
Neg logit contributions
over-0.008
date-0.007
apping-0.007
ute-0.007
he-0.007
Token lady
Token ID10846
Feature activation-1.11e+00

Pos logit contributions
 imag+0.003
drivers+0.003
 pulls+0.003
 hides+0.003
 pret+0.003
Neg logit contributions
over-0.003
 year-0.003
apping-0.003
ute-0.003
date-0.003
Tokenbugs
Token ID32965
Feature activation-7.05e-01

Pos logit contributions
 imag+0.056
 hides+0.050
 scratches+0.050
 pulls+0.049
 pret+0.049
Neg logit contributions
over-0.051
apping-0.046
date-0.046
ute-0.046
he-0.045
Token and
Token ID290
Feature activation-1.45e-01

Pos logit contributions
 imag+0.037
 hides+0.033
 scratches+0.033
 pulls+0.033
 pret+0.033
Neg logit contributions
over-0.031
apping-0.028
date-0.028
ute-0.027
he-0.027
Token cater
Token ID20825
Feature activation-1.40e+00

Pos logit contributions
 imag+0.007
drivers+0.006
 pulls+0.006
 hides+0.006
 knocks+0.006
Neg logit contributions
over-0.006
 year-0.005
apping-0.005
date-0.005
ute-0.005
Tokenpill
Token ID27215
Feature activation-3.60e+00

Pos logit contributions
 imag+0.040
 hides+0.032
 pulls+0.032
 pret+0.031
 knocks+0.030
Neg logit contributions
over-0.096
apping-0.089
ute-0.088
he-0.087
date-0.087
Tokenars
Token ID945
Feature activation+3.27e-01

Pos logit contributions
 imag+0.208
 scratches+0.188
 hides+0.188
 pulls+0.186
 pret+0.185
Neg logit contributions
over-0.138
date-0.125
apping-0.124
ute-0.122
he-0.117
Token."
Token ID526
Feature activation-4.90e-01

Pos logit contributions
over+0.016
date+0.015
apping+0.014
ute+0.014
he+0.014
Neg logit contributions
 imag-0.016
 scratches-0.014
 hides-0.014
 pret-0.014
 pulls-0.014
Token\n
Token ID198
Feature activation-3.03e-01

Pos logit contributions
 imag+0.026
 hides+0.023
 scratches+0.023
 pulls+0.023
 pret+0.023
Neg logit contributions
over-0.021
apping-0.019
date-0.019
ute-0.019
he-0.019
Token\n
Token ID198
Feature activation-2.65e-01

Pos logit contributions
 imag+0.018
 hides+0.016
 pret+0.015
 scratches+0.015
 pulls+0.015
Neg logit contributions
over-0.012
apping-0.011
he-0.011
ute-0.011
date-0.010
TokenTim
Token ID14967
Feature activation+2.42e-01

Pos logit contributions
 imag+0.014
 hides+0.012
 scratches+0.012
 pulls+0.012
 pret+0.012
Neg logit contributions
over-0.012
apping-0.011
date-0.011
ute-0.011
he-0.011
Token lots
Token ID6041
Feature activation-5.75e-01

Pos logit contributions
 year+0.007
over+0.007
 world+0.006
he+0.006
 corner+0.006
Neg logit contributions
 imag-0.010
 pret-0.009
 hides-0.009
drivers-0.008
 pulls-0.008
Token of
Token ID286
Feature activation+1.48e+00

Pos logit contributions
 imag+0.044
 hides+0.041
 pret+0.041
 pulls+0.041
 scratches+0.040
Neg logit contributions
over-0.011
apping-0.009
date-0.009
ute-0.008
he-0.008
Token big
Token ID1263
Feature activation+7.03e-02

Pos logit contributions
 year+0.069
over+0.065
 world+0.060
he+0.060
 outdoors+0.059
Neg logit contributions
 imag-0.069
 pulls-0.061
drivers-0.061
 knocks-0.060
 pret-0.060
Token juicy
Token ID35075
Feature activation+5.62e-01

Pos logit contributions
 year+0.003
over+0.003
 world+0.003
 outdoors+0.003
ak+0.003
Neg logit contributions
 imag-0.003
drivers-0.003
 knocks-0.003
 pulls-0.003
 pret-0.003
Token pine
Token ID20161
Feature activation-2.28e-01

Pos logit contributions
over+0.027
 year+0.026
ute+0.024
apping+0.024
he+0.024
Neg logit contributions
 imag-0.026
 pulls-0.022
 pret-0.022
 knocks-0.022
drivers-0.021
Tokencon
Token ID1102
Feature activation-3.60e+00

Pos logit contributions
 imag+0.010
 hides+0.008
 pulls+0.008
 pret+0.008
 scratches+0.008
Neg logit contributions
over-0.013
apping-0.011
ute-0.011
he-0.011
date-0.011
Tokenes
Token ID274
Feature activation-2.26e+00

Pos logit contributions
 imag+0.138
 hides+0.116
 scratches+0.114
 pulls+0.113
 pret+0.112
Neg logit contributions
over-0.215
apping-0.197
date-0.195
ute-0.194
he-0.193
Token,
Token ID11
Feature activation+1.27e-01

Pos logit contributions
 imag+0.107
 hides+0.099
 scratches+0.096
 pulls+0.095
 pret+0.095
Neg logit contributions
over-0.107
date-0.099
ute-0.098
apping-0.098
pron-0.094
Token so
Token ID523
Feature activation-3.30e-01

Pos logit contributions
over+0.006
date+0.005
apping+0.005
ute+0.005
he+0.005
Neg logit contributions
 imag-0.007
 scratches-0.006
 hides-0.006
 pulls-0.006
 pret-0.006
Token Penny
Token ID25965
Feature activation-1.54e-01

Pos logit contributions
 imag+0.015
 hides+0.013
 scratches+0.013
 pulls+0.013
 pret+0.013
Neg logit contributions
over-0.016
date-0.016
apping-0.015
ute-0.015
he-0.015
Token started
Token ID2067
Feature activation+1.96e-01

Pos logit contributions
 imag+0.006
 hides+0.006
 scratches+0.006
 pulls+0.005
 pret+0.005
Neg logit contributions
over-0.008
date-0.008
apping-0.008
ute-0.008
he-0.007
Token and
Token ID290
Feature activation+2.81e-02

Pos logit contributions
 imag+0.005
 pulls+0.005
 hides+0.004
 scratches+0.004
 pret+0.004
Neg logit contributions
over-0.005
date-0.005
ute-0.004
apping-0.004
pron-0.004
Token watched
Token ID7342
Feature activation-1.56e-01

Pos logit contributions
over+0.002
apping+0.001
ute+0.001
date+0.001
he+0.001
Neg logit contributions
 imag-0.001
 hides-0.001
 scratches-0.001
 pulls-0.001
 pret-0.001
Token as
Token ID355
Feature activation+2.22e-01

Pos logit contributions
 imag+0.006
 scratches+0.005
 hides+0.005
 pret+0.005
 pulls+0.005
Neg logit contributions
over-0.009
date-0.008
apping-0.008
ute-0.008
he-0.008
Token the
Token ID262
Feature activation+1.42e-01

Pos logit contributions
over+0.013
date+0.012
apping+0.012
ute+0.012
he+0.012
Neg logit contributions
 imag-0.009
 scratches-0.007
 hides-0.007
 pret-0.007
 pulls-0.007
Token cater
Token ID20825
Feature activation-1.27e+00

Pos logit contributions
over+0.006
apping+0.006
date+0.005
he+0.005
ute+0.005
Neg logit contributions
 imag-0.008
 scratches-0.007
 pulls-0.007
 screams-0.007
 turns-0.007
Tokenpill
Token ID27215
Feature activation-3.59e+00

Pos logit contributions
 imag+0.070
 scratches+0.064
 pulls+0.064
 hides+0.064
 pret+0.062
Neg logit contributions
over-0.049
apping-0.046
date-0.045
ute-0.045
 year-0.043
Tokenars
Token ID945
Feature activation+4.96e-01

Pos logit contributions
 imag+0.191
 scratches+0.174
 pulls+0.173
 hides+0.170
 pret+0.167
Neg logit contributions
over-0.146
ute-0.135
apping-0.134
date-0.133
pron-0.126
Token grew
Token ID6348
Feature activation-5.96e-01

Pos logit contributions
over+0.027
apping+0.025
date+0.025
ute+0.025
he+0.024
Neg logit contributions
 imag-0.021
 hides-0.018
 scratches-0.018
 pulls-0.018
 pret-0.017
Token bigger
Token ID5749
Feature activation-1.06e+00

Pos logit contributions
 imag+0.031
 hides+0.027
 pulls+0.027
 pret+0.027
 scratches+0.027
Neg logit contributions
over-0.027
apping-0.024
ute-0.024
date-0.024
he-0.023
Token and
Token ID290
Feature activation-4.85e-02

Pos logit contributions
 imag+0.060
 hides+0.054
 pulls+0.053
 scratches+0.053
 pret+0.053
Neg logit contributions
over-0.043
apping-0.038
date-0.037
ute-0.037
he-0.036
Token bigger
Token ID5749
Feature activation-1.07e+00

Pos logit contributions
 imag+0.003
 hides+0.002
 scratches+0.002
drivers+0.002
 pulls+0.002
Neg logit contributions
over-0.002
pron-0.002
date-0.002
apping-0.002
he-0.002
Token stretch
Token ID7539
Feature activation+3.08e-01

Pos logit contributions
over+0.018
apping+0.016
date+0.016
ute+0.016
he+0.015
Neg logit contributions
 imag-0.016
 hides-0.014
 scratches-0.014
 pulls-0.013
 pret-0.013
Token out
Token ID503
Feature activation-5.44e-02

Pos logit contributions
over+0.015
apping+0.014
date+0.014
ute+0.013
he+0.013
Neg logit contributions
 imag-0.015
 scratches-0.013
 pret-0.012
 hides-0.012
 pulls-0.012
Token and
Token ID290
Feature activation+1.36e-01

Pos logit contributions
 imag+0.003
 scratches+0.003
 pulls+0.003
 hides+0.003
 pret+0.003
Neg logit contributions
over-0.002
date-0.002
ute-0.002
apping-0.002
he-0.002
Token enormous
Token ID9812
Feature activation+1.78e-01

Pos logit contributions
over+0.007
apping+0.006
date+0.006
ute+0.006
he+0.006
Neg logit contributions
 imag-0.006
 hides-0.006
 scratches-0.006
 pulls-0.006
 pret-0.006
Token sun
Token ID4252
Feature activation-4.38e-01

Pos logit contributions
over+0.009
apping+0.009
date+0.009
ute+0.009
he+0.008
Neg logit contributions
 imag-0.008
 hides-0.007
 scratches-0.007
 pulls-0.007
 pret-0.007
Tokenfl
Token ID2704
Feature activation-3.58e+00

Pos logit contributions
 imag+0.018
 hides+0.015
 scratches+0.015
 pret+0.015
 pulls+0.015
Neg logit contributions
over-0.024
apping-0.023
date-0.022
ute-0.022
he-0.022
Tokenowers
Token ID3618
Feature activation+6.98e-01

Pos logit contributions
 imag+0.248
 hides+0.226
 scratches+0.225
 pulls+0.224
 pret+0.223
Neg logit contributions
over-0.097
apping-0.082
date-0.082
ute-0.080
he-0.077
Token would
Token ID561
Feature activation+3.48e-01

Pos logit contributions
over+0.032
date+0.029
apping+0.029
ute+0.028
he+0.027
Neg logit contributions
 imag-0.036
 hides-0.032
 scratches-0.031
 pulls-0.031
 pret-0.031
Token show
Token ID905
Feature activation-2.84e-01

Pos logit contributions
over+0.018
apping+0.017
date+0.017
ute+0.017
he+0.016
Neg logit contributions
 imag-0.015
 hides-0.013
 scratches-0.013
 pulls-0.013
 pret-0.013
Token their
Token ID511
Feature activation-6.77e-01

Pos logit contributions
 imag+0.012
 hides+0.011
 scratches+0.011
 pulls+0.011
 pret+0.010
Neg logit contributions
over-0.015
apping-0.014
date-0.014
ute-0.014
he-0.013
Token bright
Token ID6016
Feature activation-3.68e-01

Pos logit contributions
 imag+0.032
 pulls+0.030
 hides+0.029
drivers+0.029
 pret+0.029
Neg logit contributions
over-0.031
 year-0.029
ute-0.028
date-0.028
apping-0.028
Token the
Token ID262
Feature activation-1.33e-02

Pos logit contributions
over+0.010
he+0.009
date+0.009
ute+0.009
apping+0.009
Neg logit contributions
 imag-0.010
 hides-0.008
 pret-0.008
 pulls-0.008
 knocks-0.008
Token biggest
Token ID4094
Feature activation+7.53e-01

Pos logit contributions
 imag+0.001
 scratches+0.001
 hides+0.001
 pret+0.001
 pulls+0.001
Neg logit contributions
over-0.001
apping-0.001
date-0.001
ute-0.001
he-0.001
Token and
Token ID290
Feature activation+1.86e-01

Pos logit contributions
over+0.038
apping+0.035
date+0.035
ute+0.034
he+0.034
Neg logit contributions
 imag-0.035
 scratches-0.031
 hides-0.031
 pret-0.031
 pulls-0.030
Token brightest
Token ID33871
Feature activation+1.04e+00

Pos logit contributions
over+0.008
date+0.008
 year+0.008
apping+0.008
he+0.008
Neg logit contributions
 imag-0.009
 hides-0.008
 pulls-0.008
 scratches-0.008
 pret-0.008
Token sun
Token ID4252
Feature activation-4.13e-01

Pos logit contributions
over+0.052
apping+0.047
date+0.047
ute+0.047
he+0.046
Neg logit contributions
 imag-0.049
 hides-0.043
 scratches-0.043
 pulls-0.042
 pret-0.042
Tokenfl
Token ID2704
Feature activation-3.57e+00

Pos logit contributions
 imag+0.009
 hides+0.006
 scratches+0.006
 pulls+0.006
 pret+0.006
Neg logit contributions
over-0.031
apping-0.029
date-0.029
ute-0.029
he-0.028
Tokenowers
Token ID3618
Feature activation+7.30e-01

Pos logit contributions
 imag+0.243
 hides+0.221
 scratches+0.220
 pulls+0.219
 pret+0.218
Neg logit contributions
over-0.101
apping-0.086
date-0.085
ute-0.085
he-0.081
Token she
Token ID673
Feature activation-3.22e-01

Pos logit contributions
over+0.033
date+0.030
ute+0.030
apping+0.029
he+0.028
Neg logit contributions
 imag-0.037
 hides-0.033
 scratches-0.033
 pulls-0.032
 pret-0.032
Token could
Token ID714
Feature activation-2.53e-01

Pos logit contributions
 imag+0.013
 hides+0.011
 scratches+0.011
 pulls+0.011
 pret+0.011
Neg logit contributions
over-0.018
apping-0.017
ute-0.017
date-0.017
he-0.017
Token find
Token ID1064
Feature activation+8.60e-01

Pos logit contributions
 imag+0.011
 scratches+0.009
 hides+0.009
 pulls+0.009
 pret+0.009
Neg logit contributions
over-0.014
apping-0.012
ute-0.012
date-0.012
 year-0.012
Token.
Token ID13
Feature activation-1.28e-01

Pos logit contributions
over+0.040
ute+0.037
apping+0.037
date+0.037
he+0.035
Neg logit contributions
 imag-0.042
 scratches-0.037
 hides-0.037
 pret-0.036
 pulls-0.036
Token and
Token ID290
Feature activation+1.31e-01

Pos logit contributions
over+0.014
apping+0.013
date+0.012
ute+0.012
he+0.012
Neg logit contributions
 imag-0.017
 hides-0.015
 scratches-0.015
 pulls-0.015
 pret-0.015
Token Lily
Token ID20037
Feature activation-1.02e+00

Pos logit contributions
over+0.007
date+0.006
apping+0.006
he+0.006
ute+0.006
Neg logit contributions
 imag-0.006
 hides-0.005
 pret-0.005
 scratches-0.005
 pulls-0.004
Token felt
Token ID2936
Feature activation+1.36e-01

Pos logit contributions
 imag+0.039
 hides+0.033
 scratches+0.033
 pulls+0.032
 pret+0.032
Neg logit contributions
over-0.060
apping-0.055
date-0.055
ute-0.055
he-0.054
Token even
Token ID772
Feature activation-4.63e-01

Pos logit contributions
over+0.006
apping+0.005
he+0.005
date+0.005
 year+0.005
Neg logit contributions
 imag-0.007
 hides-0.007
 pret-0.007
 pulls-0.006
drivers-0.006
Token sad
Token ID6507
Feature activation-2.89e-01

Pos logit contributions
 imag+0.023
drivers+0.020
 hides+0.020
 pret+0.019
 scratches+0.019
Neg logit contributions
over-0.021
 year-0.019
date-0.019
apping-0.019
he-0.018
Tokender
Token ID1082
Feature activation-3.55e+00

Pos logit contributions
 imag+0.016
 hides+0.014
 pulls+0.014
 scratches+0.014
 pret+0.014
Neg logit contributions
over-0.012
apping-0.011
ute-0.011
date-0.011
he-0.010
Token.
Token ID13
Feature activation-1.79e-01

Pos logit contributions
 imag+0.188
 hides+0.168
 scratches+0.167
 pulls+0.166
 pret+0.165
Neg logit contributions
over-0.154
apping-0.139
date-0.139
ute-0.138
he-0.135
Token From
Token ID3574
Feature activation-7.12e-01

Pos logit contributions
 imag+0.009
 hides+0.008
 scratches+0.008
 pulls+0.008
 pret+0.008
Neg logit contributions
over-0.008
apping-0.007
date-0.007
ute-0.007
he-0.007
Token that
Token ID326
Feature activation+1.48e-01

Pos logit contributions
 imag+0.028
 hides+0.024
 scratches+0.024
 pulls+0.024
 pret+0.023
Neg logit contributions
over-0.041
apping-0.038
date-0.038
ute-0.037
he-0.037
Token day
Token ID1110
Feature activation-3.73e-01

Pos logit contributions
over+0.005
apping+0.005
ute+0.005
date+0.005
he+0.004
Neg logit contributions
 imag-0.009
 scratches-0.008
 pret-0.008
 hides-0.008
 pulls-0.008
Token on
Token ID319
Feature activation+6.10e-01

Pos logit contributions
 imag+0.017
 pulls+0.016
 scratches+0.016
 hides+0.015
 pret+0.015
Neg logit contributions
over-0.017
ute-0.016
date-0.016
apping-0.015
pron-0.015
Token he
Token ID339
Feature activation-1.35e-01

Pos logit contributions
over+0.015
apping+0.014
ute+0.014
date+0.014
he+0.014
Neg logit contributions
 imag-0.010
 scratches-0.009
 pret-0.009
 hides-0.009
 pulls-0.009
Token noticed
Token ID6810
Feature activation+7.62e-01

Pos logit contributions
 imag+0.005
 hides+0.005
 scratches+0.005
 pulls+0.005
 pret+0.004
Neg logit contributions
over-0.008
apping-0.007
date-0.007
ute-0.007
he-0.007
Token some
Token ID617
Feature activation-3.87e-01

Pos logit contributions
over+0.040
apping+0.036
date+0.036
ute+0.036
he+0.035
Neg logit contributions
 imag-0.034
 hides-0.030
 scratches-0.029
 pulls-0.029
 pret-0.029
Token other
Token ID584
Feature activation-1.88e-01

Pos logit contributions
 imag+0.018
 scratches+0.017
 pulls+0.016
 hides+0.016
 pret+0.016
Neg logit contributions
over-0.019
ute-0.018
date-0.017
apping-0.017
he-0.017
Token cater
Token ID20825
Feature activation-1.23e+00

Pos logit contributions
 imag+0.009
 hides+0.008
 scratches+0.008
 pulls+0.008
 pret+0.008
Neg logit contributions
over-0.009
apping-0.008
date-0.008
ute-0.008
he-0.008
Tokenpill
Token ID27215
Feature activation-3.55e+00

Pos logit contributions
 imag+0.035
 hides+0.028
 scratches+0.027
 pulls+0.027
 pret+0.027
Neg logit contributions
over-0.085
apping-0.079
date-0.079
ute-0.078
he-0.078
Tokenars
Token ID945
Feature activation+6.19e-01

Pos logit contributions
 imag+0.178
 scratches+0.160
 pulls+0.159
 hides+0.154
 screams+0.153
Neg logit contributions
over-0.148
ute-0.144
date-0.142
apping-0.136
pron-0.135
Token stretching
Token ID20880
Feature activation-4.50e-01

Pos logit contributions
over+0.029
date+0.026
ute+0.026
apping+0.026
he+0.025
Neg logit contributions
 imag-0.030
 scratches-0.027
 pulls-0.027
 hides-0.027
 pret-0.026
Token towards
Token ID3371
Feature activation-3.07e-01

Pos logit contributions
 imag+0.021
 scratches+0.018
 pret+0.018
 pulls+0.018
 hides+0.018
Neg logit contributions
over-0.022
apping-0.021
date-0.021
ute-0.021
he-0.020
Token it
Token ID340
Feature activation+4.29e-01

Pos logit contributions
 imag+0.011
 hides+0.009
 scratches+0.009
 pulls+0.009
 pret+0.009
Neg logit contributions
over-0.019
date-0.017
apping-0.017
ute-0.017
he-0.017
Token too
Token ID1165
Feature activation+4.00e-01

Pos logit contributions
over+0.020
apping+0.018
ute+0.018
date+0.018
he+0.017
Neg logit contributions
 imag-0.021
 hides-0.019
 scratches-0.019
 pulls-0.019
 pret-0.018
Token.
Token ID13
Feature activation+2.96e-01

Pos logit contributions
 imag+0.034
 pulls+0.030
 hides+0.029
 pret+0.029
 scratches+0.029
Neg logit contributions
over-0.021
apping-0.018
 year-0.016
date-0.016
he-0.016
Token She
Token ID1375
Feature activation-1.37e-02

Pos logit contributions
over+0.015
date+0.013
ute+0.013
apping+0.013
 year+0.013
Neg logit contributions
 imag-0.014
 scratches-0.013
 hides-0.012
 pulls-0.012
 pret-0.012
Token grabbed
Token ID13176
Feature activation+6.88e-01

Pos logit contributions
 imag+0.001
 fault+0.001
 hides+0.001
drivers+0.001
fl+0.001
Neg logit contributions
over-0.001
 year-0.001
apping-0.001
pron-0.001
ak-0.001
Token her
Token ID607
Feature activation-3.36e-01

Pos logit contributions
over+0.032
he+0.030
date+0.030
apping+0.029
ute+0.028
Neg logit contributions
 imag-0.033
 hides-0.030
 pret-0.028
 knocks-0.027
drivers-0.027
Token nap
Token ID25422
Feature activation-3.56e-01

Pos logit contributions
 imag+0.019
 hides+0.017
 scratches+0.017
 pulls+0.017
 pret+0.017
Neg logit contributions
over-0.014
apping-0.012
date-0.012
ute-0.012
he-0.012
Tokenkin
Token ID5116
Feature activation+3.76e+00

Pos logit contributions
 imag+0.025
 scratches+0.024
 pret+0.024
 pulls+0.024
drivers+0.023
Neg logit contributions
date-0.007
ute-0.007
over-0.007
apping-0.006
 year-0.006
Token and
Token ID290
Feature activation+1.75e-01

Pos logit contributions
over+0.181
date+0.171
ute+0.163
apping+0.159
 year+0.159
Neg logit contributions
 imag-0.168
 pulls-0.158
 scratches-0.157
 hides-0.156
drivers-0.149
Token covered
Token ID5017
Feature activation+2.32e-01

Pos logit contributions
over+0.008
kin+0.008
pron+0.008
he+0.007
date+0.007
Neg logit contributions
 imag-0.008
 fault-0.006
drivers-0.006
 hides-0.006
fl-0.006
Token her
Token ID607
Feature activation-3.80e-01

Pos logit contributions
over+0.014
date+0.013
ute+0.013
apping+0.013
 year+0.012
Neg logit contributions
 imag-0.008
 scratches-0.007
 pulls-0.007
 hides-0.007
 pret-0.007
Token face
Token ID1986
Feature activation+4.77e-01

Pos logit contributions
 imag+0.020
 hides+0.018
 scratches+0.018
 pulls+0.018
 pret+0.018
Neg logit contributions
over-0.017
apping-0.015
ute-0.015
date-0.015
he-0.014
Token.
Token ID13
Feature activation+1.89e-01

Pos logit contributions
over+0.022
date+0.021
ute+0.021
apping+0.020
he+0.019
Neg logit contributions
 imag-0.023
 scratches-0.021
 hides-0.020
 pulls-0.020
 pret-0.020
Token She
Token ID1375
Feature activation-7.62e-02

Pos logit contributions
 imag+0.004
 hides+0.003
 pulls+0.003
 scratches+0.003
 pret+0.003
Neg logit contributions
over-0.004
ute-0.004
date-0.003
apping-0.003
he-0.003
Token found
Token ID1043
Feature activation+2.94e-01

Pos logit contributions
 imag+0.003
 hides+0.003
 scratches+0.003
 pret+0.003
 pulls+0.003
Neg logit contributions
over-0.004
date-0.004
apping-0.004
he-0.003
ute-0.003
Token some
Token ID617
Feature activation-4.32e-01

Pos logit contributions
over+0.011
he+0.011
ute+0.010
date+0.010
 year+0.010
Neg logit contributions
 imag-0.016
 hides-0.015
 pret-0.014
 knocks-0.014
 disl-0.013
Token old
Token ID1468
Feature activation-3.75e-01

Pos logit contributions
 imag+0.020
 hides+0.017
 pret+0.017
 pulls+0.017
 knocks+0.016
Neg logit contributions
over-0.022
ute-0.020
apping-0.020
date-0.020
he-0.020
Token r
Token ID374
Feature activation-1.02e+00

Pos logit contributions
 imag+0.017
 pulls+0.015
 hides+0.014
 knocks+0.014
 fourteen+0.014
Neg logit contributions
over-0.018
 year-0.017
ute-0.016
date-0.016
 world-0.015
Tokenags
Token ID3775
Feature activation+4.67e+00

Pos logit contributions
 imag+0.055
 scratches+0.049
 hides+0.049
 pulls+0.049
 pret+0.048
Neg logit contributions
over-0.043
date-0.039
ute-0.039
apping-0.038
he-0.038
Token which
Token ID543
Feature activation-3.90e-02

Pos logit contributions
over+0.219
date+0.204
apping+0.197
ute+0.196
he+0.188
Neg logit contributions
 imag-0.225
 scratches-0.203
 hides-0.200
 pulls-0.200
 pret-0.196
Token were
Token ID547
Feature activation-2.73e-01

Pos logit contributions
 imag+0.001
 hides+0.001
 scratches+0.001
 pulls+0.001
 pret+0.001
Neg logit contributions
over-0.002
date-0.002
apping-0.002
ute-0.002
he-0.002
Token dirty
Token ID11841
Feature activation-3.84e-01

Pos logit contributions
 imag+0.014
 hides+0.013
 pret+0.013
 pulls+0.012
 scratches+0.012
Neg logit contributions
over-0.012
apping-0.011
date-0.011
 year-0.011
he-0.011
Token.
Token ID13
Feature activation-1.48e-02

Pos logit contributions
 imag+0.020
 hides+0.018
 scratches+0.018
 pulls+0.018
 pret+0.018
Neg logit contributions
over-0.017
apping-0.015
ute-0.015
date-0.015
he-0.014
Token She
Token ID1375
Feature activation-3.08e-02

Pos logit contributions
 imag+0.001
 hides+0.001
 pulls+0.001
 scratches+0.001
 pret+0.001
Neg logit contributions
over-0.001
ute-0.001
date-0.001
apping-0.001
he-0.001
Token.
Token ID13
Feature activation+1.75e-01

Pos logit contributions
over+0.018
date+0.017
ute+0.017
he+0.016
apping+0.016
Neg logit contributions
 imag-0.015
 hides-0.014
 pret-0.013
 pulls-0.013
 scratches-0.013
Token He
Token ID679
Feature activation+3.23e-02

Pos logit contributions
over+0.009
apping+0.008
date+0.008
ute+0.008
he+0.008
Neg logit contributions
 imag-0.008
 hides-0.007
 scratches-0.007
 pulls-0.007
 pret-0.007
Token grabbed
Token ID13176
Feature activation+7.00e-01

Pos logit contributions
over+0.002
 year+0.001
he+0.001
pron+0.001
apping+0.001
Neg logit contributions
 imag-0.001
drivers-0.001
 hides-0.001
 scratches-0.001
 pret-0.001
Token the
Token ID262
Feature activation+2.01e-01

Pos logit contributions
over+0.038
date+0.034
he+0.034
apping+0.034
ute+0.034
Neg logit contributions
 imag-0.030
 hides-0.026
 scratches-0.025
 pret-0.025
 pulls-0.025
Token nap
Token ID25422
Feature activation-2.42e-01

Pos logit contributions
over+0.009
apping+0.008
date+0.008
ute+0.008
he+0.007
Neg logit contributions
 imag-0.011
 scratches-0.010
 pret-0.010
 pulls-0.010
 hides-0.009
Tokenkin
Token ID5116
Feature activation+3.70e+00

Pos logit contributions
 imag+0.017
 pret+0.016
 scratches+0.016
 pulls+0.015
drivers+0.015
Neg logit contributions
ute-0.005
apping-0.004
over-0.004
date-0.004
?'-0.003
Token and
Token ID290
Feature activation+1.22e-01

Pos logit contributions
over+0.187
date+0.177
ute+0.170
apping+0.165
 year+0.162
Neg logit contributions
 imag-0.151
 pulls-0.140
 hides-0.136
 scratches-0.135
drivers-0.131
Token wiped
Token ID21122
Feature activation-6.04e-02

Pos logit contributions
over+0.006
he+0.005
date+0.005
kin+0.005
pron+0.005
Neg logit contributions
 imag-0.005
drivers-0.005
 scratches-0.005
 hides-0.005
 fault-0.005
Token his
Token ID465
Feature activation-3.68e-01

Pos logit contributions
 imag+0.003
 hides+0.003
drivers+0.003
 pret+0.002
 pulls+0.002
Neg logit contributions
over-0.003
he-0.003
date-0.003
ute-0.003
apping-0.003
Token face
Token ID1986
Feature activation+4.72e-01

Pos logit contributions
 imag+0.014
 hides+0.012
 pret+0.012
 scratches+0.012
 pulls+0.012
Neg logit contributions
over-0.021
date-0.019
apping-0.019
ute-0.019
 year-0.019
Token.
Token ID13
Feature activation+4.74e-02

Pos logit contributions
over+0.022
date+0.020
ute+0.020
apping+0.020
 year+0.019
Neg logit contributions
 imag-0.022
 scratches-0.020
 pulls-0.020
 pret-0.020
 hides-0.020
Tokeniffs
Token ID10203
Feature activation-4.04e-01

Pos logit contributions
over+0.007
date+0.007
 year+0.007
pron+0.007
apping+0.007
Neg logit contributions
 imag-0.005
 scratches-0.005
 hides-0.005
 pulls-0.005
drivers-0.005
Token them
Token ID606
Feature activation+1.35e-01

Pos logit contributions
 imag+0.015
 scratches+0.013
 hides+0.013
 pulls+0.013
 pret+0.013
Neg logit contributions
over-0.024
date-0.022
ute-0.022
apping-0.022
he-0.021
Token.
Token ID13
Feature activation+4.80e-01

Pos logit contributions
over+0.006
date+0.006
ute+0.006
apping+0.006
he+0.005
Neg logit contributions
 imag-0.007
 hides-0.006
 scratches-0.006
 pulls-0.006
 pret-0.006
Token It
Token ID632
Feature activation-8.93e-01

Pos logit contributions
over+0.025
date+0.024
ute+0.023
apping+0.023
he+0.023
Neg logit contributions
 imag-0.021
 scratches-0.019
 hides-0.019
 pulls-0.018
 pret-0.018
Token w
Token ID266
Feature activation-1.10e+00

Pos logit contributions
drivers+0.042
 supervision+0.033
 imag+0.033
 fault+0.031
 subt+0.031
Neg logit contributions
 year-0.038
 world-0.036
 fisherman-0.035
over-0.035
 corner-0.035
Tokenags
Token ID3775
Feature activation+4.75e+00

Pos logit contributions
 imag+0.070
 scratches+0.064
 hides+0.064
 pulls+0.063
 pret+0.063
Neg logit contributions
over-0.036
date-0.032
apping-0.031
ute-0.031
he-0.030
Token its
Token ID663
Feature activation-3.27e-01

Pos logit contributions
over+0.254
date+0.238
apping+0.233
ute+0.233
he+0.231
Neg logit contributions
 imag-0.204
 scratches-0.179
 hides-0.178
 pret-0.176
 pulls-0.175
Token tail
Token ID7894
Feature activation+8.34e-03

Pos logit contributions
 imag+0.017
 scratches+0.015
 hides+0.015
 pulls+0.015
 pret+0.015
Neg logit contributions
over-0.014
date-0.014
apping-0.013
he-0.013
ute-0.013
Token.
Token ID13
Feature activation+4.84e-01

Pos logit contributions
over+0.000
date+0.000
ute+0.000
pron+0.000
apping+0.000
Neg logit contributions
 imag-0.000
 scratches-0.000
 pulls-0.000
 hides-0.000
 pret-0.000
Token It
Token ID632
Feature activation-9.09e-01

Pos logit contributions
over+0.025
date+0.024
ute+0.023
apping+0.023
he+0.023
Neg logit contributions
 imag-0.021
 scratches-0.019
 pulls-0.019
 hides-0.019
 pret-0.018
Token l
Token ID300
Feature activation-8.14e-01

Pos logit contributions
drivers+0.045
 supervision+0.036
 imag+0.036
 fault+0.033
 subt+0.033
Neg logit contributions
 year-0.038
 world-0.035
over-0.035
 fisherman-0.035
 reef-0.033
Token mom
Token ID1995
Feature activation-2.53e-01

Pos logit contributions
over+0.017
ute+0.015
 year+0.015
date+0.015
pron+0.015
Neg logit contributions
 imag-0.029
 hides-0.027
 pret-0.026
drivers-0.026
 pulls-0.026
Token gave
Token ID2921
Feature activation-7.12e-01

Pos logit contributions
 imag+0.012
 hides+0.010
 pret+0.010
 scratches+0.010
 pulls+0.009
Neg logit contributions
over-0.012
he-0.012
 year-0.012
apping-0.012
date-0.012
Token her
Token ID607
Feature activation-5.62e-01

Pos logit contributions
 imag+0.036
drivers+0.031
 hides+0.031
 pret+0.031
 pulls+0.030
Neg logit contributions
over-0.032
date-0.030
he-0.030
 year-0.030
pron-0.027
Token a
Token ID257
Feature activation+4.95e-02

Pos logit contributions
 imag+0.028
 hides+0.025
 pret+0.024
drivers+0.023
 scratches+0.023
Neg logit contributions
over-0.027
date-0.024
 year-0.024
he-0.024
apping-0.024
Token nap
Token ID25422
Feature activation-2.98e-01

Pos logit contributions
over+0.002
apping+0.002
date+0.002
ute+0.002
he+0.002
Neg logit contributions
 imag-0.002
 scratches-0.002
 pret-0.002
 pulls-0.002
 hides-0.002
Tokenkin
Token ID5116
Feature activation+3.71e+00

Pos logit contributions
 imag+0.020
 pret+0.019
 scratches+0.019
drivers+0.019
 pulls+0.019
Neg logit contributions
ute-0.006
apping-0.006
over-0.005
date-0.005
 year-0.004
Token and
Token ID290
Feature activation+6.09e-02

Pos logit contributions
over+0.182
ute+0.166
date+0.166
apping+0.164
 year+0.158
Neg logit contributions
 imag-0.166
 hides-0.148
 scratches-0.148
 pulls-0.147
 pret-0.143
Token showed
Token ID3751
Feature activation-1.16e+00

Pos logit contributions
pron+0.003
over+0.002
date+0.002
kin+0.002
he+0.002
Neg logit contributions
 imag-0.003
drivers-0.003
 hides-0.003
 scratches-0.003
 screams-0.003
Token her
Token ID607
Feature activation-5.38e-01

Pos logit contributions
 imag+0.066
drivers+0.055
 pret+0.055
 hides+0.055
 pulls+0.054
Neg logit contributions
over-0.046
he-0.045
date-0.044
 year-0.044
ute-0.039
Token how
Token ID703
Feature activation-1.16e+00

Pos logit contributions
 imag+0.022
 hides+0.018
 scratches+0.018
drivers+0.018
 pret+0.017
Neg logit contributions
over-0.030
date-0.028
he-0.028
apping-0.028
ute-0.028
Token to
Token ID284
Feature activation-6.65e-01

Pos logit contributions
 imag+0.061
 pret+0.056
 hides+0.054
 pulls+0.054
drivers+0.054
Neg logit contributions
over-0.049
apping-0.046
 year-0.045
ute-0.045
pron-0.044
Token He
Token ID679
Feature activation+2.37e-01

Pos logit contributions
over+0.014
date+0.013
ute+0.013
apping+0.013
he+0.012
Neg logit contributions
 imag-0.013
 hides-0.011
 scratches-0.011
 pulls-0.011
 pret-0.011
Token now
Token ID783
Feature activation-3.23e-01

Pos logit contributions
over+0.013
apping+0.012
ute+0.012
date+0.012
 year+0.012
Neg logit contributions
 imag-0.010
 hides-0.009
 scratches-0.008
 pret-0.008
 pulls-0.008
Token felt
Token ID2936
Feature activation+3.33e-01

Pos logit contributions
 imag+0.013
 scratches+0.012
 hides+0.011
 pulls+0.011
 pret+0.011
Neg logit contributions
over-0.018
apping-0.016
date-0.016
ute-0.016
he-0.016
Token warm
Token ID5814
Feature activation+2.55e-01

Pos logit contributions
over+0.015
apping+0.014
date+0.014
ute+0.014
he+0.014
Neg logit contributions
 imag-0.017
 hides-0.015
 scratches-0.015
 pulls-0.015
 pret-0.015
Token and
Token ID290
Feature activation+1.95e-01

Pos logit contributions
over+0.012
ute+0.011
date+0.011
apping+0.011
he+0.010
Neg logit contributions
 imag-0.012
 scratches-0.011
 hides-0.011
 pret-0.011
 pulls-0.011
Token cozy
Token ID37438
Feature activation+6.89e-01

Pos logit contributions
over+0.008
apping+0.007
 year+0.007
he+0.007
date+0.007
Neg logit contributions
 imag-0.011
 hides-0.010
drivers-0.010
 pulls-0.010
 scratches-0.010
Token.
Token ID13
Feature activation+1.63e-01

Pos logit contributions
ute+0.033
over+0.032
apping+0.031
date+0.031
he+0.029
Neg logit contributions
 imag-0.031
 scratches-0.028
 hides-0.028
 pret-0.028
 pulls-0.026
Token His
Token ID2399
Feature activation+4.99e-01

Pos logit contributions
over+0.008
date+0.007
apping+0.007
ute+0.007
he+0.007
Neg logit contributions
 imag-0.008
 hides-0.007
 scratches-0.007
 pulls-0.007
 pret-0.007
Token cold
Token ID4692
Feature activation-2.87e-01

Pos logit contributions
over+0.023
apping+0.021
date+0.021
ute+0.021
he+0.020
Neg logit contributions
 imag-0.026
 hides-0.023
 scratches-0.023
 pulls-0.022
 pret-0.022
Tokenness
Token ID1108
Feature activation-2.71e-01

Pos logit contributions
 imag+0.014
 scratches+0.012
 hides+0.012
 pulls+0.012
 pret+0.012
Neg logit contributions
over-0.014
apping-0.013
date-0.013
ute-0.012
he-0.012
Token had
Token ID550
Feature activation+1.53e-01

Pos logit contributions
 imag+0.011
 hides+0.010
 scratches+0.009
 pret+0.009
 pulls+0.009
Neg logit contributions
over-0.015
apping-0.014
date-0.014
he-0.014
ute-0.014
Token "
Token ID366
Feature activation-2.62e-02

Pos logit contributions
over+0.009
date+0.009
ute+0.009
apping+0.009
he+0.008
Neg logit contributions
 imag-0.008
 scratches-0.007
 pret-0.007
 pulls-0.007
 hides-0.007
TokenWe
Token ID1135
Feature activation+1.22e-01

Pos logit contributions
 imag+0.001
 scratches+0.001
 pulls+0.001
 hides+0.001
 pret+0.001
Neg logit contributions
over-0.001
ute-0.001
date-0.001
apping-0.001
he-0.001
Token're
Token ID821
Feature activation+8.46e-01

Pos logit contributions
over+0.007
apping+0.007
date+0.007
ute+0.007
he+0.007
Neg logit contributions
 imag-0.005
 hides-0.004
 scratches-0.004
 pulls-0.004
 pret-0.004
Token glad
Token ID9675
Feature activation+2.31e-01

Pos logit contributions
over+0.042
date+0.042
ute+0.040
 year+0.039
pron+0.039
Neg logit contributions
 imag-0.035
 scratches-0.031
 hides-0.031
 pulls-0.031
 pret-0.031
Token you
Token ID345
Feature activation+1.03e+00

Pos logit contributions
over+0.011
ute+0.010
date+0.010
apping+0.010
he+0.010
Neg logit contributions
 imag-0.011
 hides-0.010
 scratches-0.010
 pulls-0.010
 pret-0.010
Token're
Token ID821
Feature activation+8.24e-01

Pos logit contributions
over+0.059
date+0.055
apping+0.055
ute+0.054
he+0.054
Neg logit contributions
 imag-0.040
 hides-0.034
 scratches-0.034
 pulls-0.033
 pret-0.033
Token safe
Token ID3338
Feature activation+5.37e-01

Pos logit contributions
date+0.039
over+0.038
ute+0.037
pron+0.037
 year+0.036
Neg logit contributions
ps-0.033
 imag-0.032
 hides-0.031
 screams-0.031
 scratches-0.030
Token.
Token ID13
Feature activation-5.84e-02

Pos logit contributions
over+0.026
date+0.026
pron+0.026
ute+0.026
he+0.025
Neg logit contributions
 imag-0.023
 hides-0.021
 scratches-0.020
 pret-0.020
 screams-0.019
Token We
Token ID775
Feature activation-2.69e-01

Pos logit contributions
 imag+0.003
 hides+0.002
 scratches+0.002
 pulls+0.002
 pret+0.002
Neg logit contributions
over-0.003
apping-0.003
date-0.003
ute-0.003
he-0.003
Token love
Token ID1842
Feature activation+3.30e-01

Pos logit contributions
 imag+0.010
 hides+0.009
 scratches+0.009
 pulls+0.009
 pret+0.009
Neg logit contributions
over-0.015
date-0.014
apping-0.014
ute-0.014
he-0.014
Token you
Token ID345
Feature activation+1.02e+00

Pos logit contributions
over+0.016
ute+0.015
apping+0.015
date+0.015
he+0.014
Neg logit contributions
 imag-0.016
 hides-0.014
 scratches-0.014
 pulls-0.014
 pret-0.014
Token.
Token ID13
Feature activation+3.58e-02

Pos logit contributions
over+0.022
ute+0.021
date+0.019
apping+0.019
he+0.018
Neg logit contributions
 imag-0.048
 scratches-0.045
 pulls-0.044
 hides-0.043
 pret-0.043
Token Lucy
Token ID22162
Feature activation+1.61e-01

Pos logit contributions
over+0.002
ute+0.002
date+0.002
apping+0.002
he+0.002
Neg logit contributions
 imag-0.002
 hides-0.001
 scratches-0.001
 pulls-0.001
 pret-0.001
Token and
Token ID290
Feature activation+3.56e-02

Pos logit contributions
over+0.008
apping+0.007
 year+0.007
he+0.007
ute+0.007
Neg logit contributions
 imag-0.007
 scratches-0.006
 hides-0.006
 pret-0.006
drivers-0.006
Token her
Token ID607
Feature activation-2.69e-01

Pos logit contributions
over+0.002
apping+0.002
ute+0.002
date+0.002
he+0.002
Neg logit contributions
 imag-0.002
 hides-0.001
 scratches-0.001
 pulls-0.001
 pret-0.001
Token brother
Token ID3956
Feature activation-2.45e-01

Pos logit contributions
 imag+0.014
 hides+0.013
 scratches+0.012
 pulls+0.012
 pret+0.012
Neg logit contributions
over-0.012
apping-0.011
date-0.011
ute-0.011
he-0.011
Token stood
Token ID6204
Feature activation+9.40e-01

Pos logit contributions
 imag+0.010
 hides+0.009
 pret+0.009
 scratches+0.009
 pulls+0.009
Neg logit contributions
over-0.013
he-0.012
date-0.012
apping-0.012
 year-0.012
Token back
Token ID736
Feature activation+8.29e-01

Pos logit contributions
over+0.047
date+0.045
ute+0.043
 year+0.043
mber+0.042
Neg logit contributions
 imag-0.042
 pret-0.035
 scratches-0.035
 hides-0.035
 smiles-0.035
Token and
Token ID290
Feature activation+9.39e-02

Pos logit contributions
over+0.039
date+0.036
ute+0.035
apping+0.035
pron+0.035
Neg logit contributions
 imag-0.041
 hides-0.036
 scratches-0.036
 pret-0.035
 pulls-0.035
Token cl
Token ID537
Feature activation-3.39e-02

Pos logit contributions
over+0.005
apping+0.005
date+0.005
he+0.005
ute+0.004
Neg logit contributions
 imag-0.004
 scratches-0.003
 hides-0.003
 pret-0.003
 pulls-0.003
Tokenapped
Token ID6320
Feature activation+1.22e+00

Pos logit contributions
 imag+0.002
 hides+0.002
 scratches+0.002
 pulls+0.002
 pret+0.002
Neg logit contributions
over-0.001
date-0.001
ute-0.001
apping-0.001
he-0.001
Token with
Token ID351
Feature activation-4.55e-01

Pos logit contributions
over+0.065
apping+0.059
date+0.059
ute+0.059
he+0.058
Neg logit contributions
 imag-0.053
 hides-0.046
 scratches-0.046
 pulls-0.045
 pret-0.045
Token teacher
Token ID4701
Feature activation-6.45e-01

Pos logit contributions
 imag+0.032
 hides+0.029
 pret+0.029
 scratches+0.028
 pulls+0.028
Neg logit contributions
over-0.019
apping-0.016
date-0.016
ute-0.016
he-0.016
Token,
Token ID11
Feature activation+8.43e-02

Pos logit contributions
 imag+0.030
 pret+0.027
 hides+0.027
 scratches+0.026
 pulls+0.026
Neg logit contributions
over-0.031
ute-0.029
apping-0.029
date-0.029
he-0.028
Token Miss
Token ID4544
Feature activation-1.24e-01

Pos logit contributions
over+0.004
apping+0.004
date+0.004
ute+0.004
he+0.004
Neg logit contributions
 imag-0.004
 hides-0.004
 scratches-0.004
 pulls-0.004
 pret-0.004
Token Rose
Token ID8049
Feature activation-5.61e-01

Pos logit contributions
 imag+0.006
 scratches+0.005
 pulls+0.005
 hides+0.005
 screams+0.005
Neg logit contributions
over-0.006
date-0.006
apping-0.006
ute-0.005
he-0.005
Token.
Token ID13
Feature activation-3.72e-02

Pos logit contributions
 imag+0.030
 hides+0.027
 scratches+0.027
 pulls+0.027
 pret+0.027
Neg logit contributions
over-0.024
apping-0.021
date-0.021
ute-0.021
he-0.021
Token One
Token ID1881
Feature activation+6.48e-01

Pos logit contributions
 imag+0.002
 scratches+0.002
 pulls+0.002
 hides+0.002
 pret+0.002
Neg logit contributions
over-0.002
date-0.002
apping-0.002
he-0.001
ute-0.001
Token day
Token ID1110
Feature activation-1.48e-01

Pos logit contributions
over+0.028
date+0.026
ute+0.026
apping+0.025
umber+0.024
Neg logit contributions
 imag-0.031
 knocks-0.027
 pret-0.027
 hides-0.027
 turns-0.026
Token,
Token ID11
Feature activation+2.01e-01

Pos logit contributions
 imag+0.009
 hides+0.008
 scratches+0.008
 pulls+0.008
 pret+0.008
Neg logit contributions
over-0.006
apping-0.005
date-0.005
ute-0.005
he-0.005
Token Lily
Token ID20037
Feature activation-3.89e-01

Pos logit contributions
over+0.012
apping+0.011
ute+0.011
date+0.011
he+0.010
Neg logit contributions
 imag-0.008
 hides-0.007
 scratches-0.007
 pret-0.007
 pulls-0.007
Token went
Token ID1816
Feature activation-2.37e-01

Pos logit contributions
 imag+0.017
 scratches+0.015
 pulls+0.014
 hides+0.014
 pret+0.014
Neg logit contributions
over-0.020
date-0.019
apping-0.019
he-0.018
ute-0.018
Token to
Token ID284
Feature activation-6.45e-01

Pos logit contributions
 imag+0.010
 hides+0.009
 pret+0.008
 scratches+0.008
 pulls+0.008
Neg logit contributions
over-0.014
apping-0.012
date-0.012
ute-0.012
he-0.012
Token safe
Token ID3338
Feature activation+6.20e-01

Pos logit contributions
 imag+0.027
 scratches+0.025
 pulls+0.024
 hides+0.024
 knocks+0.023
Neg logit contributions
over-0.028
ute-0.027
apping-0.026
date-0.026
he-0.025
Token.
Token ID13
Feature activation+8.23e-02

Pos logit contributions
over+0.028
ute+0.027
date+0.027
apping+0.026
he+0.025
Neg logit contributions
 imag-0.030
 hides-0.027
 scratches-0.027
 pulls-0.026
 pret-0.026
Token She
Token ID1375
Feature activation-1.84e-01

Pos logit contributions
over+0.004
date+0.004
ute+0.004
apping+0.004
he+0.004
Neg logit contributions
 imag-0.004
 hides-0.003
 pulls-0.003
 scratches-0.003
 pret-0.003
Token pushed
Token ID7121
Feature activation-1.27e-01

Pos logit contributions
 imag+0.008
 hides+0.007
drivers+0.006
 scratches+0.006
 pret+0.006
Neg logit contributions
over-0.010
 year-0.009
apping-0.009
he-0.009
date-0.009
Token it
Token ID340
Feature activation+3.27e-01

Pos logit contributions
 imag+0.006
 hides+0.005
 scratches+0.005
 pret+0.005
 pulls+0.005
Neg logit contributions
over-0.007
apping-0.006
he-0.006
date-0.006
ute-0.006
Token back
Token ID736
Feature activation+7.00e-01

Pos logit contributions
over+0.014
apping+0.013
ute+0.013
date+0.013
he+0.013
Neg logit contributions
 imag-0.017
 hides-0.015
 pret-0.015
 scratches-0.015
 pulls-0.015
Token up
Token ID510
Feature activation-3.53e-01

Pos logit contributions
over+0.033
date+0.030
apping+0.030
ute+0.030
he+0.029
Neg logit contributions
 imag-0.034
 hides-0.031
 scratches-0.030
 pulls-0.030
 pret-0.030
Token the
Token ID262
Feature activation+7.62e-02

Pos logit contributions
 imag+0.017
 scratches+0.015
 pulls+0.014
 hides+0.014
 pret+0.014
Neg logit contributions
over-0.017
ute-0.016
apping-0.016
date-0.016
pron-0.015
Token hill
Token ID12788
Feature activation+7.90e-01

Pos logit contributions
over+0.003
apping+0.003
date+0.003
ute+0.003
he+0.003
Neg logit contributions
 imag-0.004
 scratches-0.004
 hides-0.004
 pulls-0.004
 knocks-0.004
Token,
Token ID11
Feature activation+1.89e-01

Pos logit contributions
over+0.036
ute+0.034
date+0.033
apping+0.033
pron+0.032
Neg logit contributions
 imag-0.038
 scratches-0.034
 hides-0.034
 pulls-0.034
 pret-0.033
Token taking
Token ID2263
Feature activation-8.68e-01

Pos logit contributions
over+0.008
apping+0.008
date+0.008
ute+0.008
he+0.007
Neg logit contributions
 imag-0.010
 hides-0.009
 scratches-0.009
 pret-0.008
 pulls-0.008
Token prayed
Token ID32168
Feature activation+2.13e-01

Pos logit contributions
over+0.002
 year+0.001
he+0.001
pron+0.001
apping+0.001
Neg logit contributions
 imag-0.001
drivers-0.001
 hides-0.001
 scratches-0.001
 pulls-0.001
Token with
Token ID351
Feature activation-3.16e-01

Pos logit contributions
over+0.011
date+0.010
ute+0.010
apping+0.010
he+0.010
Neg logit contributions
 imag-0.010
 scratches-0.008
 hides-0.008
 pulls-0.008
 pret-0.008
Token her
Token ID607
Feature activation-2.25e-01

Pos logit contributions
 imag+0.014
 hides+0.013
 pulls+0.012
 pret+0.012
drivers+0.012
Neg logit contributions
over-0.016
he-0.014
 year-0.014
date-0.014
pron-0.014
Token.
Token ID13
Feature activation+3.56e-01

Pos logit contributions
 imag+0.011
 scratches+0.009
 hides+0.009
 pulls+0.009
 pret+0.009
Neg logit contributions
over-0.011
date-0.011
apping-0.011
he-0.010
ute-0.010
Token\n
Token ID198
Feature activation-1.29e-01

Pos logit contributions
over+0.018
date+0.017
ute+0.016
apping+0.016
he+0.016
Neg logit contributions
 imag-0.016
 scratches-0.015
 hides-0.015
 pret-0.015
 pulls-0.015
Token\n
Token ID198
Feature activation-1.03e-01

Pos logit contributions
 imag+0.008
 hides+0.007
 pulls+0.007
 pret+0.007
 scratches+0.007
Neg logit contributions
over-0.005
he-0.004
apping-0.004
ute-0.004
 year-0.004
TokenThey
Token ID2990
Feature activation+4.10e-02

Pos logit contributions
 imag+0.006
 scratches+0.006
 hides+0.006
 pulls+0.006
 pret+0.006
Neg logit contributions
over-0.004
date-0.003
apping-0.003
ute-0.003
he-0.003
Token prayed
Token ID32168
Feature activation+2.68e-01

Pos logit contributions
over+0.002
date+0.002
he+0.002
apping+0.002
ute+0.002
Neg logit contributions
 imag-0.002
 hides-0.001
 scratches-0.001
 pulls-0.001
 pret-0.001
Token to
Token ID284
Feature activation-5.29e-01

Pos logit contributions
over+0.014
date+0.013
ute+0.012
apping+0.012
he+0.012
Neg logit contributions
 imag-0.012
 scratches-0.011
 hides-0.011
 pret-0.010
 pulls-0.010
Token God
Token ID1793
Feature activation+2.02e-01

Pos logit contributions
 imag+0.033
 hides+0.030
 pulls+0.029
 scratches+0.029
 pret+0.029
Neg logit contributions
over-0.018
date-0.016
apping-0.016
ute-0.016
he-0.016
Token and
Token ID290
Feature activation+2.45e-01

Pos logit contributions
over+0.009
apping+0.008
date+0.008
ute+0.008
he+0.008
Neg logit contributions
 imag-0.011
 hides-0.009
 scratches-0.009
 pulls-0.009
 pret-0.009
Token the
Token ID262
Feature activation+1.41e-01

Pos logit contributions
over+0.036
date+0.033
 year+0.032
apping+0.032
he+0.032
Neg logit contributions
 imag-0.031
 hides-0.027
 pulls-0.026
 pret-0.026
 scratches-0.026
Token instrument
Token ID8875
Feature activation+3.49e-01

Pos logit contributions
over+0.006
ute+0.006
date+0.006
apping+0.005
he+0.005
Neg logit contributions
 imag-0.008
 hides-0.007
 scratches-0.007
 pulls-0.007
 pret-0.007
Token until
Token ID1566
Feature activation-7.47e-01

Pos logit contributions
over+0.016
apping+0.015
date+0.014
ute+0.014
he+0.014
Neg logit contributions
 imag-0.017
 hides-0.015
 scratches-0.015
 pulls-0.015
 pret-0.015
Token it
Token ID340
Feature activation+3.32e-01

Pos logit contributions
 imag+0.032
 hides+0.028
 scratches+0.028
 pulls+0.028
 pret+0.027
Neg logit contributions
over-0.040
apping-0.037
ute-0.037
date-0.036
he-0.036
Token looked
Token ID3114
Feature activation+8.63e-01

Pos logit contributions
over+0.017
date+0.016
he+0.016
 year+0.015
kin+0.015
Neg logit contributions
 imag-0.015
 hides-0.013
 pret-0.013
 scratches-0.013
 screams-0.012
Token like
Token ID588
Feature activation-1.81e-01

Pos logit contributions
over+0.037
apping+0.033
date+0.033
he+0.032
ute+0.032
Neg logit contributions
 imag-0.046
 hides-0.041
 scratches-0.041
 pulls-0.041
 pret-0.040
Token new
Token ID649
Feature activation-4.41e-01

Pos logit contributions
 imag+0.009
 hides+0.008
 pulls+0.008
 pret+0.008
 knocks+0.007
Neg logit contributions
over-0.008
date-0.008
 year-0.007
pron-0.007
apping-0.007
Token again
Token ID757
Feature activation-2.50e-01

Pos logit contributions
 imag+0.025
 pulls+0.024
 hides+0.023
 scratches+0.023
 knocks+0.023
Neg logit contributions
over-0.016
apping-0.014
date-0.014
ute-0.014
 year-0.014
Token.
Token ID13
Feature activation-1.34e-01

Pos logit contributions
 imag+0.012
 scratches+0.010
 hides+0.010
 pulls+0.010
 pret+0.010
Neg logit contributions
over-0.012
ute-0.012
apping-0.011
date-0.011
he-0.011
Token\n
Token ID198
Feature activation-1.96e-01

Pos logit contributions
 imag+0.006
 hides+0.006
 pulls+0.006
 scratches+0.006
 pret+0.006
Neg logit contributions
over-0.007
ute-0.006
date-0.006
apping-0.006
he-0.006
Token\n
Token ID198
Feature activation-1.37e-01

Pos logit contributions
 imag+0.009
 scratches+0.008
 pulls+0.008
 hides+0.008
 pret+0.008
Neg logit contributions
over-0.010
date-0.009
ute-0.009
apping-0.009
pron-0.008
Token tree
Token ID5509
Feature activation+2.03e-01

Pos logit contributions
over+0.004
apping+0.004
ute+0.004
date+0.004
he+0.003
Neg logit contributions
 imag-0.003
 scratches-0.003
 hides-0.003
 pret-0.003
 knocks-0.003
Token with
Token ID351
Feature activation-5.00e-01

Pos logit contributions
over+0.010
date+0.009
mber+0.009
ute+0.009
pron+0.009
Neg logit contributions
 hides-0.009
 knocks-0.008
 scratches-0.008
 pulls-0.008
 imag-0.008
Token lots
Token ID6041
Feature activation-5.23e-01

Pos logit contributions
 imag+0.029
 pret+0.027
 hides+0.027
drivers+0.026
 knocks+0.026
Neg logit contributions
 year-0.016
over-0.015
 sunny-0.014
 world-0.014
he-0.014
Token of
Token ID286
Feature activation+1.48e+00

Pos logit contributions
 imag+0.038
 scratches+0.035
 hides+0.035
 pulls+0.035
 pret+0.035
Neg logit contributions
over-0.012
date-0.010
apping-0.010
ute-0.010
he-0.009
Token leaves
Token ID5667
Feature activation+5.41e-02

Pos logit contributions
 year+0.066
over+0.064
he+0.062
 world+0.059
ute+0.057
Neg logit contributions
 imag-0.072
drivers-0.064
 pret-0.062
 pulls-0.062
 hides-0.061
Token.
Token ID13
Feature activation+9.75e-02

Pos logit contributions
over+0.002
apping+0.002
date+0.002
ute+0.002
he+0.002
Neg logit contributions
 imag-0.003
 hides-0.003
 scratches-0.003
 pulls-0.003
 pret-0.003
Token He
Token ID679
Feature activation+1.96e-01

Pos logit contributions
over+0.004
apping+0.004
date+0.004
he+0.004
ute+0.004
Neg logit contributions
 imag-0.005
 hides-0.005
 scratches-0.005
 pret-0.004
 pulls-0.004
Token ran
Token ID4966
Feature activation-2.27e-01

Pos logit contributions
 year+0.009
over+0.009
pron+0.008
apping+0.008
he+0.008
Neg logit contributions
 imag-0.009
 hides-0.008
 fault-0.008
drivers-0.008
 pret-0.008
Token over
Token ID625
Feature activation-1.40e-01

Pos logit contributions
 imag+0.012
 scratches+0.010
 pret+0.010
 pulls+0.010
 hides+0.010
Neg logit contributions
over-0.010
he-0.009
date-0.009
apping-0.009
ute-0.009
Token to
Token ID284
Feature activation-7.07e-01

Pos logit contributions
 imag+0.007
 hides+0.006
 pulls+0.006
 scratches+0.006
 pret+0.006
Neg logit contributions
over-0.007
date-0.006
apping-0.006
he-0.006
 year-0.006
Token the
Token ID262
Feature activation+1.20e-01

Pos logit contributions
 imag+0.027
 pulls+0.024
 scratches+0.024
 hides+0.024
 pret+0.023
Neg logit contributions
over-0.043
apping-0.040
ute-0.039
date-0.039
he-0.037
Token beautiful
Token ID4950
Feature activation-2.53e-01

Pos logit contributions
over+0.010
apping+0.009
he+0.009
ute+0.009
date+0.009
Neg logit contributions
 imag-0.008
 scratches-0.007
 pret-0.007
 hides-0.007
 pulls-0.006
Token season
Token ID1622
Feature activation-5.24e-02

Pos logit contributions
 imag+0.013
 scratches+0.011
 pret+0.011
fl+0.011
 bumps+0.011
Neg logit contributions
over-0.011
apping-0.011
he-0.010
ute-0.010
umber-0.010
Token.
Token ID13
Feature activation-2.93e-02

Pos logit contributions
 imag+0.002
 hides+0.002
 scratches+0.002
 pulls+0.002
 pret+0.002
Neg logit contributions
over-0.003
ute-0.002
date-0.002
apping-0.002
pron-0.002
Token She
Token ID1375
Feature activation-1.39e-01

Pos logit contributions
 imag+0.001
 hides+0.001
 pulls+0.001
 scratches+0.001
 pret+0.001
Neg logit contributions
over-0.001
ute-0.001
date-0.001
apping-0.001
pron-0.001
Token was
Token ID373
Feature activation-5.78e-01

Pos logit contributions
 imag+0.006
 hides+0.005
 scratches+0.005
drivers+0.005
 pulls+0.005
Neg logit contributions
over-0.008
apping-0.007
ute-0.007
date-0.007
 year-0.007
Token so
Token ID523
Feature activation-2.45e-01

Pos logit contributions
 imag+0.028
 hides+0.024
 pulls+0.024
 scratches+0.024
 pret+0.024
Neg logit contributions
over-0.028
ute-0.026
date-0.026
apping-0.025
he-0.025
Token grateful
Token ID14066
Feature activation-2.49e-01

Pos logit contributions
 imag+0.012
 scratches+0.011
 hides+0.011
 pulls+0.011
 screams+0.010
Neg logit contributions
over-0.011
date-0.011
ute-0.010
apping-0.010
he-0.009
Token to
Token ID284
Feature activation-4.66e-01

Pos logit contributions
 imag+0.010
 hides+0.009
 scratches+0.009
 pulls+0.008
 pret+0.008
Neg logit contributions
over-0.014
date-0.013
ute-0.013
apping-0.013
he-0.013
Token be
Token ID307
Feature activation+4.09e-01

Pos logit contributions
 imag+0.018
 pulls+0.015
 hides+0.015
 scratches+0.015
 pret+0.015
Neg logit contributions
over-0.027
date-0.025
ute-0.025
apping-0.025
he-0.024
Token able
Token ID1498
Feature activation-7.17e-01

Pos logit contributions
ute+0.021
he+0.020
over+0.020
 year+0.020
apping+0.019
Neg logit contributions
 imag-0.016
 hugs-0.013
 scratches-0.013
 pret-0.013
 hides-0.013
Token to
Token ID284
Feature activation-4.91e-01

Pos logit contributions
 imag+0.032
 hides+0.028
 pret+0.028
 scratches+0.028
 pulls+0.028
Neg logit contributions
over-0.037
apping-0.034
date-0.033
ute-0.033
he-0.033
Token like
Token ID588
Feature activation+5.42e-02

Pos logit contributions
over+0.021
date+0.020
he+0.019
 year+0.019
ute+0.019
Neg logit contributions
 imag-0.018
 scratches-0.016
drivers-0.016
 pret-0.015
 hides-0.015
Token to
Token ID284
Feature activation-5.59e-01

Pos logit contributions
over+0.002
ute+0.002
 year+0.002
apping+0.002
date+0.002
Neg logit contributions
 imag-0.003
 hides-0.002
drivers-0.002
 pret-0.002
 pulls-0.002
Token play
Token ID711
Feature activation+4.03e-01

Pos logit contributions
 imag+0.027
 hides+0.024
 pulls+0.024
 scratches+0.024
 pret+0.024
Neg logit contributions
over-0.027
date-0.025
ute-0.024
apping-0.024
he-0.024
Token baseball
Token ID9283
Feature activation-9.29e-02

Pos logit contributions
over+0.017
apping+0.017
ute+0.016
pron+0.016
mber+0.016
Neg logit contributions
 imag-0.020
 pret-0.019
 scratches-0.019
 hides-0.018
 pulls-0.018
Token,
Token ID11
Feature activation+2.20e-01

Pos logit contributions
 imag+0.005
 hides+0.004
 scratches+0.004
 pret+0.004
 pulls+0.004
Neg logit contributions
over-0.004
apping-0.004
date-0.004
ute-0.004
he-0.004
Token but
Token ID475
Feature activation+3.98e-01

Pos logit contributions
over+0.011
date+0.011
ute+0.010
he+0.010
apping+0.010
Neg logit contributions
 imag-0.010
 pret-0.009
 scratches-0.009
 hides-0.009
 pulls-0.009
Token now
Token ID783
Feature activation-3.57e-01

Pos logit contributions
over+0.020
apping+0.018
ute+0.018
date+0.018
he+0.017
Neg logit contributions
 imag-0.019
 hides-0.016
 scratches-0.016
 pulls-0.016
 pret-0.016
Token we
Token ID356
Feature activation+9.28e-02

Pos logit contributions
 imag+0.017
 scratches+0.015
 hides+0.015
 pulls+0.015
 pret+0.015
Neg logit contributions
over-0.017
date-0.016
apping-0.016
ute-0.016
 year-0.015
Token can
Token ID460
Feature activation+1.05e-01

Pos logit contributions
over+0.005
apping+0.004
date+0.004
ute+0.004
he+0.004
Neg logit contributions
 imag-0.004
 hides-0.004
 scratches-0.004
 pulls-0.004
 pret-0.004
Token't
Token ID470
Feature activation+5.53e-01

Pos logit contributions
over+0.005
date+0.005
ute+0.005
apping+0.005
he+0.004
Neg logit contributions
 imag-0.005
 hides-0.005
 scratches-0.004
 pret-0.004
 pulls-0.004
Token,"
Token ID553
Feature activation+1.26e-01

Pos logit contributions
over+0.032
apping+0.030
ute+0.029
pron+0.029
mber+0.029
Neg logit contributions
 imag-0.020
 hides-0.019
 pulls-0.019
 scratches-0.019
 pret-0.018
Token,
Token ID11
Feature activation+2.09e-01

Pos logit contributions
over+0.017
apping+0.016
date+0.015
ute+0.015
 year+0.015
Neg logit contributions
 imag-0.015
 scratches-0.013
 hides-0.013
 pulls-0.012
 pret-0.012
Token rabbits
Token ID33043
Feature activation+1.21e-01

Pos logit contributions
over+0.011
date+0.010
ute+0.010
apping+0.010
he+0.010
Neg logit contributions
 imag-0.010
 hides-0.008
 pret-0.008
 scratches-0.008
 pulls-0.008
Token,
Token ID11
Feature activation+1.99e-01

Pos logit contributions
over+0.006
date+0.005
ute+0.005
apping+0.005
he+0.005
Neg logit contributions
 imag-0.006
 hides-0.005
 scratches-0.005
 pulls-0.005
 pret-0.005
Token and
Token ID290
Feature activation+3.72e-02

Pos logit contributions
over+0.008
apping+0.007
ute+0.007
date+0.007
he+0.007
Neg logit contributions
 imag-0.011
 hides-0.010
 scratches-0.010
 pulls-0.010
 pret-0.009
Token fox
Token ID21831
Feature activation-7.90e-01

Pos logit contributions
over+0.002
date+0.002
apping+0.002
 year+0.002
ute+0.002
Neg logit contributions
 imag-0.002
 hides-0.002
 scratches-0.002
 pulls-0.002
 pret-0.002
Tokenes
Token ID274
Feature activation-2.25e+00

Pos logit contributions
 imag+0.041
 scratches+0.036
 hides+0.036
 pulls+0.036
 pret+0.035
Neg logit contributions
over-0.035
apping-0.032
date-0.032
ute-0.031
 year-0.030
Token all
Token ID477
Feature activation-5.73e-01

Pos logit contributions
 imag+0.103
 hides+0.090
 scratches+0.089
 pret+0.089
 pulls+0.089
Neg logit contributions
over-0.113
ute-0.103
apping-0.103
date-0.101
pron-0.098
Token gathered
Token ID9272
Feature activation+5.13e-01

Pos logit contributions
 imag+0.027
 hides+0.023
 scratches+0.023
 pulls+0.023
 pret+0.023
Neg logit contributions
over-0.029
apping-0.026
date-0.026
ute-0.026
he-0.025
Token around
Token ID1088
Feature activation+5.04e-01

Pos logit contributions
over+0.022
apping+0.019
date+0.019
ute+0.019
he+0.019
Neg logit contributions
 imag-0.027
 hides-0.025
 pulls-0.025
 pret-0.025
 scratches-0.024
Token to
Token ID284
Feature activation-5.36e-01

Pos logit contributions
over+0.029
apping+0.027
date+0.027
ute+0.027
he+0.027
Neg logit contributions
 imag-0.019
 hides-0.016
 scratches-0.016
 pulls-0.016
 pret-0.016
Token listen
Token ID6004
Feature activation-1.62e-01

Pos logit contributions
 imag+0.026
 hides+0.023
 scratches+0.023
 pulls+0.023
 pret+0.023
Neg logit contributions
over-0.026
apping-0.024
date-0.023
ute-0.023
he-0.023
Token-
Token ID12
Feature activation+8.36e-01

Pos logit contributions
over+0.093
date+0.083
apping+0.083
ute+0.082
he+0.081
Neg logit contributions
 imag-0.132
 hides-0.119
 scratches-0.119
 pulls-0.118
 pret-0.117
Tokenold
Token ID727
Feature activation-6.25e-01

Pos logit contributions
over+0.039
apping+0.036
date+0.036
ute+0.035
he+0.035
Neg logit contributions
 imag-0.042
 hides-0.037
 scratches-0.037
 pulls-0.036
 pret-0.036
Token t
Token ID256
Feature activation+2.65e-01

Pos logit contributions
 imag+0.027
 hides+0.023
 scratches+0.023
 pulls+0.023
 pret+0.023
Neg logit contributions
over-0.033
apping-0.031
date-0.031
ute-0.031
he-0.030
Tokenipt
Token ID10257
Feature activation+8.38e-02

Pos logit contributions
over+0.011
date+0.010
ute+0.010
apping+0.010
he+0.010
Neg logit contributions
 imag-0.014
 hides-0.013
 scratches-0.013
 pulls-0.013
 pret-0.012
Tokeno
Token ID78
Feature activation-7.40e-02

Pos logit contributions
over+0.004
date+0.004
ute+0.004
apping+0.004
 year+0.004
Neg logit contributions
 imag-0.004
 hides-0.003
 pulls-0.003
 scratches-0.003
drivers-0.003
Tokened
Token ID276
Feature activation-2.08e+00

Pos logit contributions
 imag+0.002
 hides+0.002
 pret+0.002
 scratches+0.002
 pulls+0.002
Neg logit contributions
over-0.005
apping-0.004
date-0.004
ute-0.004
he-0.004
Token up
Token ID510
Feature activation-2.10e-01

Pos logit contributions
 imag+0.109
 hides+0.098
 pret+0.097
 scratches+0.096
 pulls+0.096
Neg logit contributions
over-0.089
date-0.081
apping-0.079
he-0.078
 year-0.077
Token to
Token ID284
Feature activation-3.85e-01

Pos logit contributions
 imag+0.008
 scratches+0.007
 hides+0.007
 pulls+0.007
 pret+0.007
Neg logit contributions
over-0.012
ute-0.011
date-0.011
apping-0.011
he-0.011
Token the
Token ID262
Feature activation+1.87e-01

Pos logit contributions
 imag+0.013
 pret+0.011
 scratches+0.011
 pulls+0.011
 hides+0.011
Neg logit contributions
over-0.025
ute-0.023
apping-0.023
date-0.022
he-0.022
Token toilet
Token ID16146
Feature activation+5.15e-01

Pos logit contributions
over+0.010
apping+0.009
date+0.009
ute+0.009
he+0.008
Neg logit contributions
 imag-0.009
 hides-0.008
 scratches-0.008
 pulls-0.007
 pret-0.007
Token,
Token ID11
Feature activation+2.75e-01

Pos logit contributions
over+0.025
ute+0.023
date+0.023
apping+0.023
he+0.021
Neg logit contributions
 imag-0.023
 pret-0.021
 hides-0.021
 scratches-0.021
 pulls-0.021
Token Her
Token ID2332
Feature activation+6.30e-01

Pos logit contributions
 imag+0.021
 hides+0.018
 scratches+0.018
 pulls+0.018
 pret+0.018
Neg logit contributions
over-0.018
apping-0.016
date-0.016
ute-0.016
he-0.016
Token mom
Token ID1995
Feature activation-1.63e-01

Pos logit contributions
over+0.029
apping+0.026
ute+0.026
date+0.026
he+0.025
Neg logit contributions
 imag-0.032
 hides-0.029
 scratches-0.028
 pulls-0.028
 pret-0.028
Token said
Token ID531
Feature activation+4.79e-01

Pos logit contributions
 imag+0.007
 hides+0.006
 scratches+0.006
 pret+0.006
 fault+0.006
Neg logit contributions
over-0.008
he-0.008
apping-0.008
 year-0.008
date-0.007
Token,
Token ID11
Feature activation+1.82e-01

Pos logit contributions
over+0.020
apping+0.018
date+0.017
ute+0.017
he+0.017
Neg logit contributions
 imag-0.027
 hides-0.024
 scratches-0.024
 pulls-0.024
 pret-0.024
Token "
Token ID366
Feature activation+1.37e-01

Pos logit contributions
over+0.010
date+0.010
ute+0.010
he+0.009
apping+0.009
Neg logit contributions
 imag-0.007
 pret-0.006
 pulls-0.006
 scratches-0.006
 hides-0.006
TokenGood
Token ID10248
Feature activation-2.03e+00

Pos logit contributions
over+0.006
apping+0.005
date+0.005
ute+0.005
he+0.005
Neg logit contributions
 imag-0.008
 hides-0.007
 scratches-0.007
 pulls-0.007
 pret-0.007
Token idea
Token ID2126
Feature activation+1.80e+00

Pos logit contributions
 imag+0.121
 hides+0.110
 scratches+0.108
 pret+0.108
 pulls+0.106
Neg logit contributions
over-0.074
date-0.069
apping-0.069
ute-0.067
he-0.066
Token,
Token ID11
Feature activation+1.23e-01

Pos logit contributions
over+0.080
date+0.072
ute+0.072
apping+0.072
he+0.069
Neg logit contributions
 imag-0.094
 hides-0.084
 scratches-0.084
 pulls-0.083
 pret-0.083
Token Mia
Token ID32189
Feature activation+2.01e-01

Pos logit contributions
over+0.006
apping+0.006
date+0.006
 year+0.006
he+0.006
Neg logit contributions
 imag-0.005
 hides-0.005
 scratches-0.005
 pret-0.004
 pulls-0.004
Token.
Token ID13
Feature activation+6.13e-03

Pos logit contributions
over+0.008
apping+0.007
date+0.007
ute+0.007
he+0.007
Neg logit contributions
 imag-0.011
 hides-0.010
 scratches-0.010
 pulls-0.010
 pret-0.010
Token Let
Token ID3914
Feature activation-5.43e-01

Pos logit contributions
over+0.000
apping+0.000
date+0.000
ute+0.000
he+0.000
Neg logit contributions
 imag-0.000
 scratches-0.000
 hides-0.000
 pulls-0.000
 pret-0.000
Token Soon
Token ID15894
Feature activation-9.01e-01

Pos logit contributions
over+0.002
date+0.002
apping+0.002
ute+0.002
he+0.002
Neg logit contributions
 imag-0.003
 scratches-0.002
 pret-0.002
 hides-0.002
 pulls-0.002
Token,
Token ID11
Feature activation+1.71e-01

Pos logit contributions
 imag+0.054
 scratches+0.049
 pulls+0.049
drivers+0.048
 pret+0.047
Neg logit contributions
over-0.034
date-0.028
apping-0.028
 year-0.027
kin-0.025
Token the
Token ID262
Feature activation+9.29e-02

Pos logit contributions
over+0.010
apping+0.009
ute+0.009
date+0.009
he+0.009
Neg logit contributions
 imag-0.007
 hides-0.006
 scratches-0.006
 pret-0.006
 pulls-0.006
Token broccoli
Token ID44653
Feature activation+2.15e-01

Pos logit contributions
over+0.005
date+0.005
ute+0.005
apping+0.005
he+0.005
Neg logit contributions
 imag-0.004
 hides-0.003
 pulls-0.003
 scratches-0.003
 pret-0.003
Token-
Token ID12
Feature activation+7.67e-01

Pos logit contributions
over+0.013
apping+0.012
date+0.012
ute+0.012
he+0.011
Neg logit contributions
 imag-0.008
 hides-0.007
 scratches-0.007
 pulls-0.007
 pret-0.007
Tokenfl
Token ID2704
Feature activation-3.27e+00

Pos logit contributions
over+0.039
date+0.036
he+0.036
ute+0.036
apping+0.036
Neg logit contributions
 imag-0.035
 hides-0.031
 scratches-0.030
 pulls-0.030
 pret-0.030
Tokenavored
Token ID48275
Feature activation+1.09e+00

Pos logit contributions
 imag+0.160
 scratches+0.140
 hides+0.140
 pulls+0.139
 pret+0.136
Neg logit contributions
over-0.153
apping-0.142
ute-0.141
date-0.139
 year-0.136
Token l
Token ID300
Feature activation-8.28e-01

Pos logit contributions
over+0.066
date+0.060
apping+0.060
he+0.060
ute+0.059
Neg logit contributions
 imag-0.040
 hides-0.033
 scratches-0.033
 pulls-0.033
 pret-0.032
Tokenoll
Token ID692
Feature activation+8.44e-01

Pos logit contributions
 imag+0.032
 scratches+0.028
 pulls+0.028
 hides+0.027
 pret+0.026
Neg logit contributions
over-0.046
date-0.044
ute-0.043
apping-0.042
 year-0.042
Tokenip
Token ID541
Feature activation+2.49e-01

Pos logit contributions
over+0.032
ute+0.028
he+0.028
kin+0.027
apping+0.027
Neg logit contributions
 imag-0.050
 hides-0.045
 scratches-0.044
 pulls-0.044
 pret-0.044
Tokenops
Token ID2840
Feature activation-2.38e+00

Pos logit contributions
over+0.015
he+0.014
date+0.013
ute+0.013
apping+0.013
Neg logit contributions
 imag-0.010
 hides-0.008
 scratches-0.008
 pret-0.008
 knocks-0.008
Token.
Token ID13
Feature activation-1.86e-01

Pos logit contributions
over+0.000
apping+0.000
date+0.000
ute+0.000
he+0.000
Neg logit contributions
 imag-0.000
 hides-0.000
 scratches-0.000
 pulls-0.000
 pret-0.000
Token It
Token ID632
Feature activation-1.23e+00

Pos logit contributions
 imag+0.009
 hides+0.008
 scratches+0.007
 pulls+0.007
 pret+0.007
Neg logit contributions
over-0.009
apping-0.009
date-0.009
ute-0.009
he-0.008
Token was
Token ID373
Feature activation-6.44e-01

Pos logit contributions
 imag+0.054
drivers+0.050
 scratches+0.048
 hides+0.048
 pret+0.047
Neg logit contributions
over-0.059
he-0.057
 year-0.057
date-0.055
apping-0.053
Token a
Token ID257
Feature activation+9.74e-02

Pos logit contributions
 imag+0.028
 hides+0.025
 scratches+0.025
 pulls+0.024
 pret+0.024
Neg logit contributions
over-0.034
apping-0.031
date-0.031
ute-0.031
he-0.030
Token special
Token ID2041
Feature activation-8.79e-02

Pos logit contributions
over+0.005
apping+0.005
ute+0.005
he+0.005
date+0.004
Neg logit contributions
 imag-0.005
 pret-0.004
 scratches-0.004
 hides-0.004
 knocks-0.004
Token part
Token ID636
Feature activation-2.11e+00

Pos logit contributions
 imag+0.005
 pret+0.004
 scratches+0.004
 hides+0.004
 knocks+0.004
Neg logit contributions
over-0.003
apping-0.003
ute-0.003
he-0.003
date-0.003
Token of
Token ID286
Feature activation+1.32e+00

Pos logit contributions
 imag+0.119
 hides+0.108
 scratches+0.107
 knocks+0.106
 pulls+0.106
Neg logit contributions
over-0.072
ute-0.066
date-0.063
apping-0.061
pron-0.058
Token the
Token ID262
Feature activation+1.15e-01

Pos logit contributions
over+0.084
apping+0.079
ute+0.079
date+0.077
he+0.076
Neg logit contributions
 imag-0.045
 scratches-0.038
 pret-0.036
 pulls-0.035
 hides-0.035
Token family
Token ID1641
Feature activation-1.36e-01

Pos logit contributions
over+0.005
ute+0.005
apping+0.005
boat+0.005
date+0.005
Neg logit contributions
 imag-0.006
 scratches-0.005
 knocks-0.005
 hides-0.005
 turns-0.005
Token.
Token ID13
Feature activation-1.89e-01

Pos logit contributions
 imag+0.006
 hides+0.005
 pret+0.005
 pulls+0.005
 knocks+0.005
Neg logit contributions
over-0.006
ute-0.006
date-0.006
apping-0.006
pron-0.006
Token Everyone
Token ID11075
Feature activation-5.90e-01

Pos logit contributions
 imag+0.008
 hides+0.007
 scratches+0.007
 pret+0.007
 pulls+0.007
Neg logit contributions
over-0.010
date-0.009
ute-0.009
apping-0.009
he-0.009