TOP ACTIVATIONS
MAX = 6.558

a black cat named Wh isk ers . Wh isk ers
a naughty cat named Wh isk ers . Wh isk ers
Wh isk ers . Wh isk ers loved to walk around
not eat too many ch ocol ates and his friend learned
ch ubby cat named Wh isk ers . Tim my loved
on the cards . Wh isk ers was hesitant at first
a big box of ch ocol ates . It was so
Ch ir py and Wh isk ers knew that poison was
was a box of ch ocol ates . Lily loved ch
find a box of ch ocol ates . "
Wh isk ers . Wh isk ers loved to play with
giving him too many ch ocol ates ." The king learned
Ch ir py and Wh isk ers became good friends ,
box and share the ch ocol ates . They are happy
would heal soon . Wh isk ers was very happy to
After Mr . Wh isk ers was done with his
a quiet mouse named Wh isk ers . Wh isk ers
a smart cat named Wh isk ers . Wh isk ers
her pet cat , Wh isk ers . Lily loved to
Wh isk ers . Wh isk ers was sad because he

MIN ACTIVATIONS
MIN = -3.816

Jane hopped on the sees aw and before long she was
her mom my watched in amaz ement as the machine flew
les for extra y um min ess .
dog as a yellow lab rador . Lily got excited and
go ! Everyone watched in amaz ement as it took off
friends stopped and watched in amaz ement . The dog too
sun fl owers were Y ol anda 's best friends .
- a box of mult icol ored bubbles !
the c rows were c aw ing in unison to make
fish , a noisy squ aw king hawk , and many
ered in , blinking in amaz ement . But
little girl looked up in amaz ement and said , "
for the lock . Y ol anda was so excited !
and forth on the sees aw . They kept singing and
of them balanced the sees aw . They laughed and laughed
she just stood there g aw king at the beautiful gold
was jumping on the sees aw , she accidentally it and
As the light ening f lic kered away , the man
ball ?" A lic ia nodded and said ,
it . A lic ia smiled while she looked

INTERVAL 3.965 - 6.558
CONTAINS 0.008%

Ch ir py and Wh isk ers became good friends ,
slowly and carefully to Wh isk ers . Wh isk ers
care of the rh in oc eros and it needed to
that day on , Wh isk ers and Ch ir py
to see the rh in oc eros . Tim my was

INTERVAL 1.371 - 3.965
CONTAINS 1.290%

to use a stool , but it was too far .
L ily was sad , but knew she had to face
's nice , Mia , but listen to my drum !"
the boy if he could help him . The boy said
She sees a man get out of the truck and walk

INTERVAL -1.222 - 1.371
CONTAINS 97.583%

's mom my said , " We can 't keep the
mummy replied . C her ie felt an urge
," replied the mole . The rabbit had an
Emily and Tim spend the rest of the day making her
she didn 't sing well . But her parents told her

INTERVAL -3.816 - -1.222
CONTAINS 1.118%

one day , the colorful metal started to tell stories too
she asked . M um pointed to a huge
. He traveled around for days and days until he reached
, so he pushed the door open and hopped inside .
ily and Ben looked at each other . They both wanted
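
The four INTERVAL ranges above are consistent with splitting the span from MIN to MAX into four equal-width bins. The sketch below checks that reading against the reported numbers. The equal-width assumption and the helper name `interval_edges` are illustrative and not taken from the tool that produced this dump.

```python
from typing import List


def interval_edges(lo: float, hi: float, n_bins: int = 4) -> List[float]:
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX as reported in the dump.
edges = interval_edges(-3.816, 6.558)

# Interval bounds and CONTAINS percentages as reported, ordered low to high.
reported_edges = [-3.816, -1.222, 1.371, 3.965, 6.558]
reported_shares = [1.118, 97.583, 1.290, 0.008]

# Largest gap between the computed and the reported bounds.
max_gap = max(abs(e - r) for e, r in zip(edges, reported_edges))
total_share = sum(reported_shares)

print(edges)
print(max_gap, total_share)
```

Under this assumption the computed edges agree with the reported bounds to within rounding of the third decimal place, and the four shares add up to roughly 100%.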
Token a
Token ID257
Feature activation+1.35e-01

Pos logit contributions
 convince+0.008
 hips+0.006
 injuries+0.006
me+0.006
bing+0.006
Neg logit contributions
 wow-0.009
 fault-0.009
 happens-0.009
unts-0.009
 ago-0.009
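
The per-token record above follows a fixed layout: the label and its value are fused on one line (`Token ID257`, `Feature activation+1.35e-01`, ` convince+0.008`), and any leading space belongs to the token string itself. A minimal sketch of a parser for one such record follows. The names `split_value` and `parse_record` are hypothetical and the layout is inferred from this dump only.

```python
import re
from typing import Dict, List, Tuple

# A line ends in a signed number such as +0.008, -0.009 or +1.35e-01.
# Everything before that number is the label or the token string.
_VALUE = re.compile(r"^(.*?)([+-]\d+(?:\.\d+)?(?:e[+-]?\d+)?)$")


def split_value(line: str) -> Tuple[str, float]:
    """Split 'name+0.008' into ('name', 0.008), keeping leading spaces."""
    match = _VALUE.match(line)
    if match is None:
        raise ValueError(f"no trailing signed number in {line!r}")
    return match.group(1), float(match.group(2))


def parse_record(lines: List[str]) -> Dict[str, object]:
    """Parse one record: token, token ID, activation, pos/neg contributions."""
    record: Dict[str, object] = {
        "token": lines[0][len("Token"):],            # "Token a" -> " a"
        "token_id": int(lines[1][len("Token ID"):]),
        "activation": split_value(lines[2])[1],
        "pos": [],
        "neg": [],
    }
    bucket = None
    for line in lines[3:]:
        if not line.strip():
            continue                                 # skip blank separator
        if line.startswith("Pos logit contributions"):
            bucket = "pos"
        elif line.startswith("Neg logit contributions"):
            bucket = "neg"
        elif bucket is not None:
            record[bucket].append(split_value(line))
    return record


# The first record of the dump, copied line by line.
example = [
    "Token a", "Token ID257", "Feature activation+1.35e-01", "",
    "Pos logit contributions",
    " convince+0.008", " hips+0.006", " injuries+0.006", "me+0.006", "bing+0.006",
    "Neg logit contributions",
    " wow-0.009", " fault-0.009", " happens-0.009", "unts-0.009", " ago-0.009",
]
record = parse_record(example)
print(record["token_id"], record["activation"], record["pos"][0])
```

Keeping the leading space matters when comparing records: ` a` (ID 257) and a token without the space would be different vocabulary entries.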
Token black
Token ID2042
Feature activation-1.16e+00

Pos logit contributions
 convince+0.005
 hips+0.004
 injuries+0.004
ivering+0.004
bing+0.004
Neg logit contributions
 wow-0.005
 fault-0.004
 ago-0.004
uzz-0.004
 happens-0.004
Token cat
Token ID3797
Feature activation+2.98e-01

Pos logit contributions
 wow+0.041
Kat+0.038
 fault+0.038
 happens+0.038
 ago+0.038
Neg logit contributions
 convince-0.036
 hips-0.030
 injuries-0.029
ivering-0.028
bing-0.028
Token named
Token ID3706
Feature activation-5.98e-03

Pos logit contributions
 convince+0.012
 hips+0.011
 injuries+0.011
 negotiate+0.010
arians+0.010
Neg logit contributions
 wow-0.008
unts-0.007
uzz-0.007
 ago-0.007
ull-0.007
Token Wh
Token ID854
Feature activation+1.59e+00

Pos logit contributions
 wow+0.000
 ago+0.000
 maj+0.000
uzz+0.000
 happened+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
eling-0.000
ivering-0.000
Tokenisk
Token ID1984
Feature activation+6.05e+00

Pos logit contributions
 convince+0.054
 hips+0.044
 persuade+0.041
 hire+0.041
bing+0.040
Neg logit contributions
uzz-0.054
 fault-0.053
unts-0.053
 ago-0.052
 wow-0.052
Tokeners
Token ID364
Feature activation+4.27e-01

Pos logit contributions
 convince+0.084
me+0.046
 hips+0.046
bing+0.043
 injuries+0.041
Neg logit contributions
 wow-0.324
 telescope-0.307
 ago-0.307
 amaz-0.306
 fault-0.302
Token.
Token ID13
Feature activation+6.56e-01

Pos logit contributions
 convince+0.017
 hips+0.015
 injuries+0.014
vation+0.014
eling+0.014
Neg logit contributions
 wow-0.012
uzz-0.012
 many-0.011
unts-0.011
 ago-0.011
Token Wh
Token ID854
Feature activation+1.55e+00

Pos logit contributions
 convince+0.027
eling+0.023
 hips+0.023
vation+0.022
 persuade+0.022
Neg logit contributions
 wow-0.019
 many-0.018
 ago-0.017
unts-0.017
uzz-0.016
Tokenisk
Token ID1984
Feature activation+5.98e+00

Pos logit contributions
 convince+0.053
 hips+0.044
 persuade+0.040
 injuries+0.040
 hire+0.039
Neg logit contributions
uzz-0.053
 fault-0.052
unts-0.051
 ago-0.051
 wow-0.051
Tokeners
Token ID364
Feature activation+3.84e-01

Pos logit contributions
 convince+0.079
me+0.040
bing+0.040
 hips+0.039
ty+0.039
Neg logit contributions
 wow-0.325
 ago-0.311
 telescope-0.309
 amaz-0.308
 fault-0.303
Token a
Token ID257
Feature activation+8.52e-02

Pos logit contributions
 convince+0.010
 hips+0.008
ittens+0.008
Mr+0.008
me+0.008
Neg logit contributions
 wow-0.012
 happens-0.011
 fault-0.011
 telescope-0.011
uzz-0.011
Token naughty
Token ID45128
Feature activation-6.74e-01

Pos logit contributions
 convince+0.003
 hips+0.002
 injuries+0.002
ivering+0.002
bing+0.002
Neg logit contributions
 wow-0.003
 fault-0.003
 ago-0.003
uzz-0.003
 happens-0.003
Token cat
Token ID3797
Feature activation+3.55e-01

Pos logit contributions
 wow+0.024
Kat+0.023
 ago+0.023
uzz+0.023
 fault+0.023
Neg logit contributions
 convince-0.019
 hips-0.016
 injuries-0.015
ivering-0.015
ittens-0.015
Token named
Token ID3706
Feature activation+3.06e-02

Pos logit contributions
 convince+0.014
 hips+0.013
 injuries+0.012
 negotiate+0.012
bing+0.012
Neg logit contributions
 wow-0.010
unts-0.009
 ago-0.009
uzz-0.009
 happened-0.009
Token Wh
Token ID854
Feature activation+1.77e+00

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
bing+0.001
 hire+0.001
Neg logit contributions
 wow-0.001
 maj-0.001
 ago-0.001
 happened-0.001
 fault-0.001
Tokenisk
Token ID1984
Feature activation+6.01e+00

Pos logit contributions
 convince+0.057
 hips+0.047
 persuade+0.043
 hire+0.042
bing+0.042
Neg logit contributions
 fault-0.063
uzz-0.062
unts-0.062
 wow-0.062
 ago-0.061
Tokeners
Token ID364
Feature activation+4.42e-01

Pos logit contributions
 convince+0.079
me+0.043
ty+0.041
bing+0.040
 hips+0.040
Neg logit contributions
 wow-0.324
 amaz-0.313
 telescope-0.310
 ago-0.306
 solar-0.302
Token.
Token ID13
Feature activation+5.80e-01

Pos logit contributions
 convince+0.017
 hips+0.015
 injuries+0.014
 persuade+0.014
vation+0.014
Neg logit contributions
 wow-0.013
 many-0.012
 happened-0.012
uzz-0.012
unts-0.012
Token Wh
Token ID854
Feature activation+1.68e+00

Pos logit contributions
 convince+0.023
 hips+0.019
eling+0.019
 injuries+0.018
 persuade+0.018
Neg logit contributions
 wow-0.017
 many-0.017
 ago-0.016
unts-0.016
uzz-0.016
Tokenisk
Token ID1984
Feature activation+5.86e+00

Pos logit contributions
 convince+0.057
 hips+0.048
 persuade+0.043
 injuries+0.042
 hire+0.041
Neg logit contributions
 fault-0.057
uzz-0.057
unts-0.056
 ago-0.055
 wow-0.055
Tokeners
Token ID364
Feature activation+3.89e-01

Pos logit contributions
 convince+0.082
bing+0.044
me+0.044
 hips+0.042
ty+0.042
Neg logit contributions
 wow-0.314
 amaz-0.300
 telescope-0.300
 ago-0.298
 fault-0.291
Token Wh
Token ID854
Feature activation+1.59e+00

Pos logit contributions
 wow+0.000
 ago+0.000
 maj+0.000
uzz+0.000
 happened+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
eling-0.000
ivering-0.000
Tokenisk
Token ID1984
Feature activation+6.05e+00

Pos logit contributions
 convince+0.054
 hips+0.044
 persuade+0.041
 hire+0.041
bing+0.040
Neg logit contributions
uzz-0.054
 fault-0.053
unts-0.053
 ago-0.052
 wow-0.052
Tokeners
Token ID364
Feature activation+4.27e-01

Pos logit contributions
 convince+0.084
me+0.046
 hips+0.046
bing+0.043
 injuries+0.041
Neg logit contributions
 wow-0.324
 telescope-0.307
 ago-0.307
 amaz-0.306
 fault-0.302
Token.
Token ID13
Feature activation+6.56e-01

Pos logit contributions
 convince+0.017
 hips+0.015
 injuries+0.014
vation+0.014
eling+0.014
Neg logit contributions
 wow-0.012
uzz-0.012
 many-0.011
unts-0.011
 ago-0.011
Token Wh
Token ID854
Feature activation+1.55e+00

Pos logit contributions
 convince+0.027
eling+0.023
 hips+0.023
vation+0.022
 persuade+0.022
Neg logit contributions
 wow-0.019
 many-0.018
 ago-0.017
unts-0.017
uzz-0.016
Tokenisk
Token ID1984
Feature activation+5.98e+00

Pos logit contributions
 convince+0.053
 hips+0.044
 persuade+0.040
 injuries+0.040
 hire+0.039
Neg logit contributions
uzz-0.053
 fault-0.052
unts-0.051
 ago-0.051
 wow-0.051
Tokeners
Token ID364
Feature activation+3.84e-01

Pos logit contributions
 convince+0.079
me+0.040
bing+0.040
 hips+0.039
ty+0.039
Neg logit contributions
 wow-0.325
 ago-0.311
 telescope-0.309
 amaz-0.308
 fault-0.303
Token loved
Token ID6151
Feature activation+1.01e+00

Pos logit contributions
 convince+0.020
 hips+0.018
 injuries+0.017
bing+0.017
 hire+0.017
Neg logit contributions
 wow-0.006
 ago-0.005
 fault-0.005
unts-0.005
uzz-0.005
Token to
Token ID284
Feature activation+2.94e-01

Pos logit contributions
 convince+0.039
 hips+0.033
 injuries+0.033
bing+0.031
ittens+0.031
Neg logit contributions
 wow-0.030
Kat-0.027
 ago-0.026
apped-0.026
unts-0.026
Token walk
Token ID2513
Feature activation+4.19e-01

Pos logit contributions
 convince+0.010
 hips+0.009
 injuries+0.008
bing+0.008
arians+0.008
Neg logit contributions
 wow-0.010
unts-0.009
 telescope-0.009
 maj-0.009
 happened-0.009
Token around
Token ID1088
Feature activation-1.12e-01

Pos logit contributions
 convince+0.017
 hips+0.015
 injuries+0.015
eling+0.015
arians+0.015
Neg logit contributions
 wow-0.011
 many-0.011
uzz-0.010
 maj-0.010
 happened-0.010
Token not
Token ID407
Feature activation+8.19e-01

Pos logit contributions
 wow+0.036
 ago+0.032
unts+0.032
uzz+0.032
 maj+0.032
Neg logit contributions
 convince-0.036
 hips-0.031
 injuries-0.029
bing-0.028
 hire-0.028
Token eat
Token ID4483
Feature activation+1.13e+00

Pos logit contributions
 convince+0.026
 hips+0.024
bing+0.022
 injuries+0.022
arians+0.022
Neg logit contributions
 wow-0.028
unts-0.027
 ago-0.026
uzz-0.026
 fault-0.026
Token too
Token ID1165
Feature activation-6.41e-03

Pos logit contributions
 convince+0.046
 hips+0.039
ivering+0.038
vation+0.038
 hire+0.037
Neg logit contributions
 wow-0.033
 many-0.031
 ago-0.029
 fault-0.029
uzz-0.029
Token many
Token ID867
Feature activation-7.12e-01

Pos logit contributions
 wow+0.000
 maj+0.000
 ago+0.000
uzz+0.000
unts+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
bing-0.000
 hire-0.000
Token ch
Token ID442
Feature activation-1.95e-01

Pos logit contributions
 wow+0.020
 many+0.019
 ago+0.019
 happened+0.019
 happens+0.018
Neg logit contributions
 convince-0.029
 hips-0.026
-0.025
ivering-0.024
bing-0.024
Tokenocol
Token ID4668
Feature activation+5.96e+00

Pos logit contributions
 wow+0.007
 ago+0.006
unts+0.006
uzz+0.006
 fault+0.006
Neg logit contributions
 convince-0.006
 hips-0.005
 injuries-0.005
bing-0.005
arians-0.005
Tokenates
Token ID689
Feature activation+1.56e+00

Pos logit contributions
 convince+0.157
 hips+0.128
 injuries+0.119
bing+0.112
 hire+0.108
Neg logit contributions
 wow-0.250
 ago-0.230
unts-0.229
uzz-0.228
 fault-0.226
Token and
Token ID290
Feature activation-2.27e-01

Pos logit contributions
 convince+0.061
 hips+0.051
 injuries+0.050
 hire+0.048
 negotiate+0.048
Neg logit contributions
 wow-0.046
 ago-0.045
uzz-0.042
 fault-0.042
 many-0.042
Token his
Token ID465
Feature activation+3.34e-01

Pos logit contributions
 wow+0.007
unts+0.007
Kat+0.007
 maj+0.007
uzz+0.007
Neg logit contributions
 convince-0.008
 hips-0.007
 injuries-0.007
 hire-0.006
eling-0.006
Token friend
Token ID1545
Feature activation-8.95e-02

Pos logit contributions
 convince+0.013
 hips+0.011
 injuries+0.010
bing+0.010
ivering+0.010
Neg logit contributions
 wow-0.010
 ago-0.009
 fault-0.009
unts-0.009
uzz-0.009
Token learned
Token ID4499
Feature activation+3.71e-01

Pos logit contributions
 wow+0.002
 ago+0.002
unts+0.002
uzz+0.002
 maj+0.002
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
bing-0.003
 hire-0.003
Token ch
Token ID442
Feature activation-1.32e-01

Pos logit contributions
 wow+0.007
uzz+0.007
 ago+0.006
apped+0.006
Kat+0.006
Neg logit contributions
 convince-0.008
 hips-0.007
 injuries-0.007
 negotiate-0.006
 sleeve-0.006
Tokenubby
Token ID38393
Feature activation+6.92e-01

Pos logit contributions
 wow+0.005
 happened+0.004
Kat+0.004
 happens+0.004
 ago+0.004
Neg logit contributions
 convince-0.004
 hips-0.004
 injuries-0.004
ivering-0.003
bing-0.003
Token cat
Token ID3797
Feature activation+1.69e-01

Pos logit contributions
 convince+0.023
 hips+0.020
 injuries+0.019
bing+0.018
ivering+0.018
Neg logit contributions
 wow-0.023
Kat-0.021
unts-0.021
uzz-0.021
 happens-0.021
Token named
Token ID3706
Feature activation-1.87e-01

Pos logit contributions
 convince+0.006
 hips+0.006
 injuries+0.005
bing+0.005
eling+0.005
Neg logit contributions
 wow-0.005
 ago-0.005
unts-0.005
uzz-0.005
 happened-0.005
Token Wh
Token ID854
Feature activation+1.67e+00

Pos logit contributions
 wow+0.006
 ago+0.005
uzz+0.005
 happened+0.005
unts+0.005
Neg logit contributions
 convince-0.007
 hips-0.006
 injuries-0.006
ivering-0.006
bing-0.006
Tokenisk
Token ID1984
Feature activation+5.94e+00

Pos logit contributions
 convince+0.056
 hips+0.045
 persuade+0.043
 hire+0.042
 injuries+0.041
Neg logit contributions
 fault-0.058
uzz-0.058
 wow-0.056
 ago-0.055
unts-0.055
Tokeners
Token ID364
Feature activation+4.01e-01

Pos logit contributions
 convince+0.102
 hips+0.067
 injuries+0.061
bing+0.060
arians+0.057
Neg logit contributions
 wow-0.302
 telescope-0.284
 ago-0.282
unts-0.280
 fault-0.278
Token.
Token ID13
Feature activation+1.08e-01

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.012
eling+0.011
 persuade+0.011
Neg logit contributions
 wow-0.013
 ago-0.013
uzz-0.012
 happened-0.012
unts-0.012
Token Tim
Token ID5045
Feature activation+7.90e-01

Pos logit contributions
 convince+0.005
eling+0.004
 injuries+0.004
ising+0.004
vation+0.004
Neg logit contributions
 wow-0.002
 ago-0.002
 It-0.002
uzz-0.002
 many-0.002
Tokenmy
Token ID1820
Feature activation+1.21e+00

Pos logit contributions
 convince+0.030
 hips+0.026
 injuries+0.025
bing+0.024
 hire+0.024
Neg logit contributions
 wow-0.024
unts-0.021
 ago-0.021
uzz-0.021
 fault-0.021
Token loved
Token ID6151
Feature activation+7.34e-01

Pos logit contributions
 convince+0.052
 hips+0.047
bing+0.044
me+0.043
ivering+0.042
Neg logit contributions
 wow-0.030
 ago-0.027
 fault-0.026
 telescope-0.026
uzz-0.026
Token on
Token ID319
Feature activation+1.12e-01

Pos logit contributions
 wow+0.021
 ago+0.019
unts+0.019
uzz+0.018
 fault+0.018
Neg logit contributions
 convince-0.021
 hips-0.018
 injuries-0.017
bing-0.017
arians-0.016
Token the
Token ID262
Feature activation+6.22e-02

Pos logit contributions
 convince+0.005
 hips+0.004
 injuries+0.004
ivering+0.004
bing+0.004
Neg logit contributions
 wow-0.003
 ago-0.003
unts-0.003
 telescope-0.003
 happened-0.003
Token cards
Token ID4116
Feature activation+5.02e-01

Pos logit contributions
 convince+0.002
 hips+0.002
 injuries+0.002
bing+0.002
ivering+0.002
Neg logit contributions
 wow-0.002
 ago-0.002
unts-0.002
 telescope-0.002
 fault-0.002
Token.
Token ID13
Feature activation+5.22e-02

Pos logit contributions
 convince+0.017
 hips+0.014
 injuries+0.014
bing+0.013
ivering+0.013
Neg logit contributions
 wow-0.017
 ago-0.016
unts-0.016
 happened-0.016
 fault-0.015
Token Wh
Token ID854
Feature activation+1.71e+00

Pos logit contributions
 convince+0.002
 hips+0.002
+0.002
 hire+0.002
arians+0.002
Neg logit contributions
 wow-0.001
 happened-0.001
 telescope-0.001
 fault-0.001
 happens-0.001
Tokenisk
Token ID1984
Feature activation+5.93e+00

Pos logit contributions
 convince+0.055
 hips+0.045
 injuries+0.041
 persuade+0.040
bing+0.039
Neg logit contributions
 wow-0.062
 fault-0.062
uzz-0.062
 ago-0.060
unts-0.060
Tokeners
Token ID364
Feature activation+2.73e-01

Pos logit contributions
 convince+0.085
 hips+0.049
me+0.048
bing+0.046
arians+0.044
Neg logit contributions
 wow-0.313
 telescope-0.301
 ago-0.298
 amaz-0.295
unts-0.294
Token was
Token ID373
Feature activation-1.09e-02

Pos logit contributions
 convince+0.011
 hips+0.010
 injuries+0.009
bing+0.009
 hire+0.009
Neg logit contributions
 wow-0.008
 ago-0.007
unts-0.007
 fault-0.007
uzz-0.007
Token hesitant
Token ID32848
Feature activation+1.08e+00

Pos logit contributions
 wow+0.000
 telescope+0.000
 ago+0.000
 fault+0.000
uzz+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
bing-0.000
 hire-0.000
Token at
Token ID379
Feature activation-7.46e-01

Pos logit contributions
 convince+0.046
 hips+0.040
arians+0.040
 injuries+0.039
 negotiate+0.039
Neg logit contributions
 wow-0.027
 happened-0.025
 ago-0.025
 fault-0.025
uzz-0.025
Token first
Token ID717
Feature activation+1.27e+00

Pos logit contributions
 wow+0.039
 ago+0.036
uzz+0.036
unts+0.036
 fault+0.036
Neg logit contributions
 convince-0.012
 hips-0.009
 injuries-0.008
bing-0.007
ivering-0.006
Token a
Token ID257
Feature activation-1.36e-01

Pos logit contributions
unts+0.011
 wow+0.011
 ago+0.010
 maj+0.010
 fault+0.010
Neg logit contributions
 convince-0.012
 injuries-0.010
 hips-0.010
eling-0.009
bing-0.009
Token big
Token ID1263
Feature activation-6.99e-03

Pos logit contributions
 wow+0.004
uzz+0.004
 ago+0.004
 fault+0.004
apped+0.004
Neg logit contributions
 convince-0.005
 hips-0.004
 injuries-0.004
 hire-0.004
 negotiate-0.004
Token box
Token ID3091
Feature activation-1.01e-01

Pos logit contributions
 wow+0.000
uzz+0.000
Kat+0.000
 fault+0.000
unts+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
bing-0.000
 hire-0.000
Token of
Token ID286
Feature activation+5.86e-01

Pos logit contributions
 wow+0.004
 ago+0.003
unts+0.003
uzz+0.003
 happened+0.003
Neg logit contributions
 convince-0.003
 hips-0.003
 injuries-0.003
bing-0.003
arians-0.003
Token ch
Token ID442
Feature activation-1.84e-01

Pos logit contributions
 convince+0.022
 hips+0.019
 injuries+0.018
bing+0.017
ivering+0.017
Neg logit contributions
 wow-0.018
 ago-0.016
 telescope-0.016
unts-0.016
 happened-0.016
Tokenocol
Token ID4668
Feature activation+5.90e+00

Pos logit contributions
 wow+0.008
 happened+0.007
 fault+0.007
Kat+0.007
 maj+0.007
Neg logit contributions
 convince-0.005
 hips-0.004
 injuries-0.004
bing-0.004
ivering-0.004
Tokenates
Token ID689
Feature activation+1.72e+00

Pos logit contributions
 convince+0.136
 hips+0.114
 injuries+0.107
bing+0.101
arians+0.096
Neg logit contributions
 wow-0.263
unts-0.240
 happened-0.236
 happens-0.236
 fault-0.236
Token.
Token ID13
Feature activation-4.80e-02

Pos logit contributions
 convince+0.057
 hips+0.048
 injuries+0.046
 hire+0.043
arians+0.043
Neg logit contributions
 wow-0.060
 ago-0.056
 happened-0.054
uzz-0.054
unts-0.054
Token It
Token ID632
Feature activation+8.66e-01

Pos logit contributions
 wow+0.001
unts+0.001
 fault+0.001
 happened+0.001
 telescope+0.001
Neg logit contributions
 convince-0.002
 hips-0.002
 hire-0.001
 injuries-0.001
arians-0.001
Token was
Token ID373
Feature activation-7.16e-03

Pos logit contributions
 convince+0.037
 hips+0.033
 injuries+0.033
bing+0.031
ivering+0.030
Neg logit contributions
 wow-0.022
 happened-0.019
 ago-0.019
unts-0.019
uzz-0.019
Token so
Token ID523
Feature activation+3.37e-01

Pos logit contributions
 wow+0.000
unts+0.000
 ago+0.000
uzz+0.000
 fault+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
bing-0.000
 hire-0.000
Token Ch
Token ID609
Feature activation+5.69e-01

Pos logit contributions
 wow+0.016
 ago+0.014
uzz+0.014
Kat+0.014
unts+0.013
Neg logit contributions
 convince-0.031
 hips-0.028
 injuries-0.028
eling-0.026
bing-0.026
Tokenir
Token ID343
Feature activation+2.74e+00

Pos logit contributions
 convince+0.022
 hips+0.019
 injuries+0.019
bing+0.018
 negotiate+0.018
Neg logit contributions
 wow-0.016
uzz-0.015
 ago-0.015
unts-0.015
 fault-0.015
Tokenpy
Token ID9078
Feature activation+2.50e-01

Pos logit contributions
 convince+0.083
 hips+0.069
 injuries+0.064
bing+0.063
ivering+0.062
Neg logit contributions
 wow-0.103
 telescope-0.094
unts-0.093
 ago-0.093
Kat-0.093
Token and
Token ID290
Feature activation-3.04e-01

Pos logit contributions
 convince+0.010
 hips+0.009
 injuries+0.008
bing+0.008
eling+0.008
Neg logit contributions
 wow-0.007
 ago-0.006
 fault-0.006
unts-0.006
uzz-0.006
Token Wh
Token ID854
Feature activation+1.61e+00

Pos logit contributions
 fault+0.010
 wow+0.010
unts+0.010
Kat+0.010
apped+0.010
Neg logit contributions
 convince-0.010
 hips-0.008
bing-0.008
 injuries-0.008
 hire-0.007
Tokenisk
Token ID1984
Feature activation+5.89e+00

Pos logit contributions
 convince+0.051
 hips+0.043
 injuries+0.040
bing+0.038
 hire+0.038
Neg logit contributions
 wow-0.059
uzz-0.056
unts-0.055
 ago-0.055
 fault-0.054
Tokeners
Token ID364
Feature activation+3.21e-01

Pos logit contributions
 convince+0.091
 hips+0.054
bing+0.053
arians+0.050
pper+0.049
Neg logit contributions
 wow-0.303
 telescope-0.297
 ago-0.289
 amaz-0.287
unts-0.286
Token knew
Token ID2993
Feature activation-7.08e-01

Pos logit contributions
 convince+0.013
 hips+0.012
 injuries+0.011
bing+0.011
arians+0.011
Neg logit contributions
 wow-0.009
 ago-0.007
unts-0.007
uzz-0.007
 happened-0.007
Token that
Token ID326
Feature activation-9.33e-01

Pos logit contributions
 wow+0.021
 ago+0.018
 happened+0.018
uzz+0.018
unts+0.018
Neg logit contributions
 convince-0.027
 hips-0.024
 injuries-0.023
bing-0.022
ivering-0.022
Token poison
Token ID8764
Feature activation+1.86e-01

Pos logit contributions
 wow+0.031
unts+0.028
 ago+0.028
 maj+0.027
Kat+0.027
Neg logit contributions
 convince-0.033
 hips-0.028
 injuries-0.027
bing-0.026
 hire-0.025
Token was
Token ID373
Feature activation-1.33e-01

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.007
bing+0.007
 negotiate+0.007
Neg logit contributions
 wow-0.004
 happened-0.004
 happens-0.004
 ago-0.004
 fault-0.004
Token was
Token ID373
Feature activation+9.45e-03

Pos logit contributions
 convince+0.026
 hips+0.022
 injuries+0.021
bing+0.021
 hire+0.020
Neg logit contributions
 wow-0.018
 ago-0.016
unts-0.016
 fault-0.016
uzz-0.016
Token a
Token ID257
Feature activation-3.99e-01

Pos logit contributions
 convince+0.000
 hips+0.000
 injuries+0.000
bing+0.000
 sleeve+0.000
Neg logit contributions
 wow-0.000
Kat-0.000
unts-0.000
uzz-0.000
 fault-0.000
Token box
Token ID3091
Feature activation-1.17e-01

Pos logit contributions
 wow+0.012
uzz+0.011
apped+0.011
 fault+0.011
Kat+0.011
Neg logit contributions
 convince-0.015
 hips-0.013
 injuries-0.013
 sleeve-0.012
 negotiate-0.012
Token of
Token ID286
Feature activation+7.20e-01

Pos logit contributions
 wow+0.004
 ago+0.004
unts+0.004
uzz+0.004
 telescope+0.004
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
eling-0.003
bing-0.003
Token ch
Token ID442
Feature activation-1.50e-01

Pos logit contributions
 convince+0.028
 hips+0.024
 injuries+0.023
bing+0.022
ivering+0.022
Neg logit contributions
 wow-0.022
 ago-0.019
unts-0.019
 telescope-0.019
 happened-0.019
Tokenocol
Token ID4668
Feature activation+5.88e+00

Pos logit contributions
 wow+0.006
 happened+0.005
Kat+0.005
 fault+0.005
uzz+0.005
Neg logit contributions
 convince-0.004
 hips-0.004
 injuries-0.004
bing-0.003
ivering-0.003
Tokenates
Token ID689
Feature activation+1.70e+00

Pos logit contributions
 convince+0.150
 hips+0.123
 injuries+0.115
bing+0.108
arians+0.104
Neg logit contributions
 wow-0.250
unts-0.229
 ago-0.228
 fault-0.226
uzz-0.226
Token.
Token ID13
Feature activation-9.74e-02

Pos logit contributions
 convince+0.057
 hips+0.048
 injuries+0.046
 hire+0.043
ivering+0.043
Neg logit contributions
 wow-0.059
 ago-0.055
unts-0.053
 happened-0.053
uzz-0.053
Token Lily
Token ID20037
Feature activation+4.65e-02

Pos logit contributions
 wow+0.003
 happened+0.003
 fault+0.003
unts+0.002
 telescope+0.002
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
 hire-0.003
 sleeve-0.003
Token loved
Token ID6151
Feature activation+8.15e-01

Pos logit contributions
 convince+0.002
 hips+0.002
me+0.001
arians+0.001
bing+0.001
Neg logit contributions
 wow-0.001
 fault-0.001
unts-0.001
uzz-0.001
ull-0.001
Token ch
Token ID442
Feature activation-2.32e-01

Pos logit contributions
 convince+0.026
 hips+0.023
 injuries+0.022
bing+0.020
eling+0.020
Neg logit contributions
 wow-0.028
apped-0.027
unts-0.026
Kat-0.026
 ago-0.026
Token find
Token ID1064
Feature activation+7.87e-01

Pos logit contributions
 wow+0.003
 ago+0.003
unts+0.003
uzz+0.003
 telescope+0.003
Neg logit contributions
 convince-0.002
 hips-0.002
 injuries-0.002
bing-0.002
ivering-0.002
Token a
Token ID257
Feature activation-1.98e-01

Pos logit contributions
 convince+0.026
 hips+0.023
 injuries+0.023
 sleeve+0.021
bing+0.020
Neg logit contributions
 wow-0.026
Kat-0.025
 ago-0.024
unts-0.024
uzz-0.023
Token box
Token ID3091
Feature activation-2.11e-01

Pos logit contributions
 wow+0.006
apped+0.006
uzz+0.006
aw+0.006
Kat+0.006
Neg logit contributions
 convince-0.007
 hips-0.006
 injuries-0.006
 sleeve-0.005
 negotiate-0.005
Token of
Token ID286
Feature activation+8.15e-01

Pos logit contributions
 wow+0.008
unts+0.007
 fault+0.007
 ago+0.007
uzz+0.007
Neg logit contributions
 convince-0.007
 hips-0.006
 injuries-0.005
bing-0.005
 hire-0.005
Token ch
Token ID442
Feature activation-5.61e-01

Pos logit contributions
 convince+0.032
 hips+0.028
 injuries+0.026
bing+0.026
ivering+0.026
Neg logit contributions
 wow-0.024
 ago-0.022
 telescope-0.021
unts-0.021
 maj-0.021
Tokenocol
Token ID4668
Feature activation+5.87e+00

Pos logit contributions
 wow+0.021
 happened+0.019
Kat+0.019
 fault+0.018
 maj+0.018
Neg logit contributions
 convince-0.017
 hips-0.015
 injuries-0.015
bing-0.014
eling-0.014
Tokenates
Token ID689
Feature activation+1.51e+00

Pos logit contributions
 convince+0.140
 hips+0.120
 injuries+0.113
bing+0.106
arians+0.102
Neg logit contributions
 wow-0.256
unts-0.230
 happened-0.230
 fault-0.230
Kat-0.229
Token.
Token ID13
Feature activation+8.33e-02

Pos logit contributions
 convince+0.052
 hips+0.045
 injuries+0.043
bing+0.041
 hire+0.040
Neg logit contributions
 wow-0.051
 ago-0.046
unts-0.046
uzz-0.045
 fault-0.045
Token
Token ID198
Feature activation+3.50e-02

Pos logit contributions
 convince+0.003
 hips+0.003
 injuries+0.003
+0.003
 hire+0.003
Neg logit contributions
 wow-0.002
 fault-0.002
 telescope-0.002
 happened-0.002
Kat-0.002
Token
Token ID198
Feature activation+5.37e-02

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
bing+0.001
 hire+0.001
Neg logit contributions
 wow-0.001
 fault-0.001
 ago-0.001
 telescope-0.001
 happened-0.001
Token"
Token ID1
Feature activation+1.76e-01

Pos logit contributions
 convince+0.002
 hips+0.001
 injuries+0.001
bing+0.001
 hire+0.001
Neg logit contributions
 wow-0.002
 fault-0.002
 happened-0.002
 ago-0.002
 telescope-0.002
Token Wh
Token ID854
Feature activation+1.77e+00

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
bing+0.001
 hire+0.001
Neg logit contributions
 wow-0.001
 maj-0.001
 ago-0.001
 happened-0.001
 fault-0.001
Tokenisk
Token ID1984
Feature activation+6.01e+00

Pos logit contributions
 convince+0.057
 hips+0.047
 persuade+0.043
 hire+0.042
bing+0.042
Neg logit contributions
 fault-0.063
uzz-0.062
unts-0.062
 wow-0.062
 ago-0.061
Tokeners
Token ID364
Feature activation+4.42e-01

Pos logit contributions
 convince+0.079
me+0.043
ty+0.041
bing+0.040
 hips+0.040
Neg logit contributions
 wow-0.324
 amaz-0.313
 telescope-0.310
 ago-0.306
 solar-0.302
Token.
Token ID13
Feature activation+5.80e-01

Pos logit contributions
 convince+0.017
 hips+0.015
 injuries+0.014
 persuade+0.014
vation+0.014
Neg logit contributions
 wow-0.013
 many-0.012
 happened-0.012
uzz-0.012
unts-0.012
Token Wh
Token ID854
Feature activation+1.68e+00

Pos logit contributions
 convince+0.023
 hips+0.019
eling+0.019
 injuries+0.018
 persuade+0.018
Neg logit contributions
 wow-0.017
 many-0.017
 ago-0.016
unts-0.016
uzz-0.016
Tokenisk
Token ID1984
Feature activation+5.86e+00

Pos logit contributions
 convince+0.057
 hips+0.048
 persuade+0.043
 injuries+0.042
 hire+0.041
Neg logit contributions
 fault-0.057
uzz-0.057
unts-0.056
 ago-0.055
 wow-0.055
Tokeners
Token ID364
Feature activation+3.89e-01

Pos logit contributions
 convince+0.082
bing+0.044
me+0.044
 hips+0.042
ty+0.042
Neg logit contributions
 wow-0.314
 amaz-0.300
 telescope-0.300
 ago-0.298
 fault-0.291
Token loved
Token ID6151
Feature activation+9.97e-01

Pos logit contributions
 convince+0.021
 hips+0.020
 injuries+0.019
bing+0.018
arians+0.018
Neg logit contributions
 wow-0.005
 happened-0.004
unts-0.004
 ago-0.004
uzz-0.004
Token to
Token ID284
Feature activation+8.20e-02

Pos logit contributions
 convince+0.039
 hips+0.033
 injuries+0.033
bing+0.031
ittens+0.031
Neg logit contributions
 wow-0.030
apped-0.026
 ago-0.026
Kat-0.026
uzz-0.026
Token play
Token ID711
Feature activation-1.30e-01

Pos logit contributions
 convince+0.003
 hips+0.002
 injuries+0.002
bing+0.002
arians+0.002
Neg logit contributions
 wow-0.003
unts-0.003
 ago-0.003
uzz-0.003
 happened-0.003
Token with
Token ID351
Feature activation+1.43e-01

Pos logit contributions
 wow+0.003
 many+0.003
 maj+0.003
aw+0.003
 happened+0.003
Neg logit contributions
 convince-0.005
 hips-0.005
 injuries-0.005
eling-0.005
 hire-0.005
Token giving
Token ID3501
Feature activation+8.59e-01

Pos logit contributions
 wow+0.020
 ago+0.018
unts+0.018
uzz+0.018
 fault+0.018
Neg logit contributions
 convince-0.016
 hips-0.014
 injuries-0.013
bing-0.012
 hire-0.012
Token him
Token ID683
Feature activation+1.90e-01

Pos logit contributions
 convince+0.023
 hips+0.019
 injuries+0.017
bing+0.017
 hire+0.016
Neg logit contributions
 wow-0.036
 ago-0.033
unts-0.033
 telescope-0.032
uzz-0.032
Token too
Token ID1165
Feature activation+3.91e-02

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.006
arians+0.006
bing+0.006
Neg logit contributions
 wow-0.006
 fault-0.005
uzz-0.005
 maj-0.005
 ago-0.005
Token many
Token ID867
Feature activation-6.99e-01

Pos logit contributions
 convince+0.002
 hips+0.002
 injuries+0.002
bing+0.002
 hire+0.002
Neg logit contributions
 wow-0.001
 ago-0.001
unts-0.001
uzz-0.001
 fault-0.001
Token ch
Token ID442
Feature activation-7.35e-02

Pos logit contributions
 wow+0.023
 ago+0.021
 many+0.020
 happened+0.020
 happens+0.020
Neg logit contributions
 convince-0.025
 hips-0.022
bing-0.021
ivering-0.020
 injuries-0.020
Tokenocol
Token ID4668
Feature activation+5.85e+00

Pos logit contributions
 wow+0.003
 happened+0.002
Kat+0.002
 telescope+0.002
 maj+0.002
Neg logit contributions
 convince-0.002
 hips-0.002
 injuries-0.002
ivering-0.002
eling-0.002
Tokenates
Token ID689
Feature activation+1.70e+00

Pos logit contributions
 convince+0.144
 hips+0.117
 injuries+0.111
bing+0.104
arians+0.099
Neg logit contributions
 wow-0.254
unts-0.232
 ago-0.230
 telescope-0.229
 maj-0.228
Token."
Token ID526
Feature activation-6.81e-01

Pos logit contributions
 convince+0.059
 hips+0.049
 injuries+0.049
 negotiate+0.045
 hire+0.045
Neg logit contributions
 wow-0.056
 ago-0.054
uzz-0.052
 fault-0.051
unts-0.050
Token The
Token ID383
Feature activation+3.74e-01

Pos logit contributions
 wow+0.016
 ago+0.014
uzz+0.014
unts+0.014
Kat+0.014
Neg logit contributions
 convince-0.030
 hips-0.026
 injuries-0.026
bing-0.025
eling-0.025
Token king
Token ID5822
Feature activation+5.58e-02

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.011
 hire+0.011
bing+0.011
Neg logit contributions
 wow-0.012
 fault-0.010
uzz-0.010
 ago-0.010
unts-0.010
Token learned
Token ID4499
Feature activation+2.35e-01

Pos logit contributions
 convince+0.002
 hips+0.002
me+0.002
ivering+0.002
 hire+0.002
Neg logit contributions
 wow-0.002
 ago-0.002
 fault-0.002
 telescope-0.002
 maj-0.002
Token Ch
Token ID609
Feature activation+3.13e-01

Pos logit contributions
 wow+0.000
 many+0.000
 happened+0.000
uzz+0.000
 ago+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
ivering-0.000
eling-0.000
Tokenir
Token ID343
Feature activation+1.71e+00

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.012
 persuade+0.012
 negotiate+0.011
Neg logit contributions
 ago-0.007
uzz-0.007
unts-0.007
 wow-0.007
ull-0.007
Tokenpy
Token ID9078
Feature activation-5.17e-01

Pos logit contributions
 convince+0.064
 hips+0.056
 injuries+0.053
 negotiate+0.050
bing+0.049
Neg logit contributions
 wow-0.052
 ago-0.049
unts-0.048
uzz-0.048
 happened-0.047
Token and
Token ID290
Feature activation-2.00e-01

Pos logit contributions
 wow+0.013
 ago+0.012
unts+0.012
 happened+0.012
 maj+0.011
Neg logit contributions
 convince-0.022
 hips-0.019
 injuries-0.019
bing-0.018
arians-0.018
Token Wh
Token ID854
Feature activation+1.61e+00

Pos logit contributions
 wow+0.007
Kat+0.007
unts+0.007
 fault+0.007
 ago+0.007
Neg logit contributions
 convince-0.006
 hips-0.005
 injuries-0.005
 hire-0.005
bing-0.005
Tokenisk
Token ID1984
Feature activation+5.84e+00

Pos logit contributions
 convince+0.052
 hips+0.043
 injuries+0.039
 persuade+0.038
bing+0.037
Neg logit contributions
uzz-0.058
 wow-0.057
unts-0.056
 fault-0.055
 ago-0.055
Tokeners
Token ID364
Feature activation+2.54e-01

Pos logit contributions
 convince+0.100
 hips+0.065
 injuries+0.061
bing+0.058
arians+0.056
Neg logit contributions
 wow-0.295
 telescope-0.280
 ago-0.280
unts-0.276
 fault-0.275
Token became
Token ID2627
Feature activation+2.53e-01

Pos logit contributions
 convince+0.011
 hips+0.010
 injuries+0.010
bing+0.009
arians+0.009
Neg logit contributions
 wow-0.006
 ago-0.005
unts-0.005
uzz-0.005
 happened-0.005
Token good
Token ID922
Feature activation-3.02e-01

Pos logit contributions
 convince+0.010
 hips+0.009
 injuries+0.008
arians+0.008
vation+0.008
Neg logit contributions
 wow-0.007
unts-0.007
 many-0.007
 fault-0.006
 happened-0.006
Token friends
Token ID2460
Feature activation-2.59e-01

Pos logit contributions
 wow+0.009
unts+0.008
 fault+0.008
 ago+0.008
 happened+0.008
Neg logit contributions
 convince-0.011
 hips-0.010
 injuries-0.009
 negotiate-0.009
bing-0.009
Token,
Token ID11
Feature activation-8.64e-02

Pos logit contributions
 wow+0.008
 ago+0.007
unts+0.007
uzz+0.007
 many+0.007
Neg logit contributions
 convince-0.010
 hips-0.008
 negotiate-0.008
eling-0.008
 injuries-0.008
Token box
Token ID3091
Feature activation-3.01e-01

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.007
bing+0.007
ivering+0.007
Neg logit contributions
 wow-0.004
 ago-0.004
 telescope-0.004
unts-0.003
 maj-0.003
Token and
Token ID290
Feature activation-5.82e-02

Pos logit contributions
 wow+0.011
 fault+0.010
Kat+0.010
 happened+0.010
unts+0.010
Neg logit contributions
 convince-0.009
 hips-0.008
 injuries-0.007
bing-0.007
 sleeve-0.007
Token share
Token ID2648
Feature activation-1.94e-01

Pos logit contributions
 wow+0.002
unts+0.002
 fault+0.002
uzz+0.002
 ago+0.002
Neg logit contributions
 convince-0.002
 hips-0.002
 injuries-0.002
 hire-0.002
bing-0.002
Token the
Token ID262
Feature activation+2.04e-01

Pos logit contributions
 wow+0.007
 fault+0.006
 ago+0.006
unts+0.006
 happened+0.006
Neg logit contributions
 convince-0.006
 hips-0.006
 injuries-0.005
bing-0.005
 negotiate-0.005
Token ch
Token ID442
Feature activation-6.47e-01

Pos logit contributions
 convince+0.009
 hips+0.008
 injuries+0.007
bing+0.007
 hire+0.007
Neg logit contributions
 wow-0.005
 ago-0.005
unts-0.005
uzz-0.005
 fault-0.004
Tokenocol
Token ID4668
Feature activation+5.83e+00

Pos logit contributions
 wow+0.022
 ago+0.021
unts+0.020
uzz+0.020
 maj+0.020
Neg logit contributions
 convince-0.022
 hips-0.019
 injuries-0.018
bing-0.017
 negotiate-0.017
Tokenates
Token ID689
Feature activation+1.44e+00

Pos logit contributions
 convince+0.161
 hips+0.132
 injuries+0.124
bing+0.116
 hire+0.113
Neg logit contributions
 wow-0.237
 ago-0.218
unts-0.217
uzz-0.215
 fault-0.214
Token.
Token ID13
Feature activation+8.30e-02

Pos logit contributions
 convince+0.051
 injuries+0.044
 hips+0.043
ivering+0.042
eling+0.041
Neg logit contributions
 wow-0.046
uzz-0.044
 many-0.042
 ago-0.042
unts-0.041
Token They
Token ID1119
Feature activation-5.11e-02

Pos logit contributions
 convince+0.003
 hips+0.003
+0.003
 injuries+0.002
 before+0.002
Neg logit contributions
 wow-0.002
 fault-0.002
 telescope-0.002
od-0.002
Kat-0.002
Token are
Token ID389
Feature activation+2.17e-01

Pos logit contributions
 wow+0.002
 ago+0.002
unts+0.002
uzz+0.002
 maj+0.002
Neg logit contributions
 convince-0.002
 hips-0.001
 injuries-0.001
bing-0.001
ivering-0.001
Token happy
Token ID3772
Feature activation+8.63e-01

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.007
bing+0.007
 negotiate+0.006
Neg logit contributions
 wow-0.006
Kat-0.006
 ago-0.006
 happens-0.006
 telescope-0.006
Token would
Token ID561
Feature activation-9.42e-01

Pos logit contributions
 wow+0.027
 happened+0.024
 ago+0.024
unts+0.024
 fault+0.023
Neg logit contributions
 convince-0.036
 hips-0.032
 injuries-0.031
bing-0.030
ivering-0.029
Token heal
Token ID12035
Feature activation+1.46e+00

Pos logit contributions
 wow+0.032
 happened+0.029
 ago+0.029
unts+0.029
 wonder+0.029
Neg logit contributions
 convince-0.028
 hips-0.027
arians-0.026
bing-0.026
ittens-0.025
Token soon
Token ID2582
Feature activation+5.08e-01

Pos logit contributions
 convince+0.057
 hips+0.050
vation+0.046
+0.046
ivering+0.046
Neg logit contributions
 wow-0.040
 many-0.039
uzz-0.038
 happened-0.037
 fault-0.036
Token.
Token ID13
Feature activation-3.66e-01

Pos logit contributions
 convince+0.019
 hips+0.017
 negotiate+0.015
arians+0.015
ivering+0.015
Neg logit contributions
 ago-0.015
 wow-0.015
 happened-0.015
 happens-0.014
 fault-0.014
Token Wh
Token ID854
Feature activation+1.60e+00

Pos logit contributions
 wow+0.009
 ago+0.008
unts+0.008
uzz+0.008
apped+0.008
Neg logit contributions
 convince-0.016
 hips-0.014
 injuries-0.013
eling-0.013
bing-0.013
Tokenisk
Token ID1984
Feature activation+5.82e+00

Pos logit contributions
 convince+0.055
 hips+0.049
 persuade+0.042
 injuries+0.041
 hind+0.040
Neg logit contributions
uzz-0.055
 ago-0.053
 fault-0.053
unts-0.051
om-0.050
Tokeners
Token ID364
Feature activation+2.43e-01

Pos logit contributions
 convince+0.109
 hips+0.076
 injuries+0.070
bing+0.068
me+0.063
Neg logit contributions
 wow-0.288
 ago-0.271
 telescope-0.269
unts-0.268
 fault-0.267
Token was
Token ID373
Feature activation-6.24e-02

Pos logit contributions
 convince+0.010
 hips+0.009
 injuries+0.009
bing+0.008
arians+0.008
Neg logit contributions
 wow-0.006
unts-0.005
 ago-0.005
uzz-0.005
 happened-0.005
Token very
Token ID845
Feature activation+9.64e-01

Pos logit contributions
 wow+0.002
 telescope+0.002
 ago+0.002
unts+0.002
uzz+0.002
Neg logit contributions
 convince-0.002
 hips-0.002
 injuries-0.002
bing-0.002
ivering-0.002
Token happy
Token ID3772
Feature activation+8.29e-01

Pos logit contributions
 convince+0.041
 hips+0.035
 injuries+0.034
bing+0.033
ivering+0.032
Neg logit contributions
 wow-0.026
 ago-0.023
uzz-0.022
 telescope-0.022
 fault-0.022
Token to
Token ID284
Feature activation+3.77e-02

Pos logit contributions
 convince+0.028
 hips+0.024
 injuries+0.023
bing+0.022
arians+0.021
Neg logit contributions
 wow-0.028
unts-0.026
 ago-0.025
uzz-0.025
 maj-0.025
Token
Token ID198
Feature activation-3.03e-01

Pos logit contributions
 wow+0.008
 fault+0.007
 ago+0.007
 telescope+0.007
 happens+0.007
Neg logit contributions
 convince-0.010
 hips-0.009
 injuries-0.008
me-0.008
 hire-0.008
TokenAfter
Token ID3260
Feature activation-1.41e+00

Pos logit contributions
 wow+0.011
 ago+0.010
 fault+0.010
unts+0.010
 happened+0.010
Neg logit contributions
 convince-0.010
 hips-0.008
 injuries-0.008
bing-0.007
 hire-0.007
Token Mr
Token ID1770
Feature activation+4.97e-01

Pos logit contributions
 wow+0.042
unts+0.039
 fault+0.039
 maj+0.039
uzz+0.038
Neg logit contributions
 convince-0.052
 hips-0.046
 injuries-0.044
bing-0.042
 hire-0.041
Token.
Token ID13
Feature activation-2.05e-01

Pos logit contributions
 convince+0.017
 hips+0.014
bing+0.013
 injuries+0.013
 persuade+0.013
Neg logit contributions
 wow-0.017
 ago-0.016
 fault-0.016
 happened-0.016
unts-0.016
Token Wh
Token ID854
Feature activation+1.68e+00

Pos logit contributions
 wow+0.007
unts+0.006
 ago+0.006
uzz+0.006
 fault+0.006
Neg logit contributions
 convince-0.007
 hips-0.006
 injuries-0.006
bing-0.005
 hire-0.005
Tokenisk
Token ID1984
Feature activation+5.82e+00

Pos logit contributions
 convince+0.055
 hips+0.046
bing+0.042
 persuade+0.042
 hind+0.041
Neg logit contributions
 fault-0.061
uzz-0.059
 wow-0.057
 happened-0.056
 ago-0.056
Tokeners
Token ID364
Feature activation+6.88e-02

Pos logit contributions
 convince+0.094
 hips+0.059
 injuries+0.054
me+0.054
bing+0.053
Neg logit contributions
 wow-0.298
 telescope-0.287
 ago-0.285
unts-0.282
 fault-0.281
Token was
Token ID373
Feature activation-1.49e-01

Pos logit contributions
 convince+0.003
 hips+0.003
 injuries+0.002
bing+0.002
arians+0.002
Neg logit contributions
 wow-0.002
 ago-0.002
unts-0.002
uzz-0.002
 happened-0.002
Token done
Token ID1760
Feature activation-1.19e-01

Pos logit contributions
 wow+0.005
 telescope+0.004
 ago+0.004
unts+0.004
uzz+0.004
Neg logit contributions
 convince-0.005
 hips-0.005
 injuries-0.004
ivering-0.004
bing-0.004
Token with
Token ID351
Feature activation-4.28e-01

Pos logit contributions
 wow+0.004
unts+0.004
uzz+0.004
 ago+0.004
 fault+0.004
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
bing-0.003
 hire-0.003
Token his
Token ID465
Feature activation+1.97e-01

Pos logit contributions
 wow+0.012
uzz+0.011
 many+0.011
 fault+0.010
 ago+0.010
Neg logit contributions
 convince-0.017
 hips-0.015
ivering-0.015
bing-0.015
arians-0.015
Token a
Token ID257
Feature activation-7.56e-03

Pos logit contributions
 convince+0.002
 hips+0.002
 injuries+0.002
bing+0.002
 hire+0.002
Neg logit contributions
 wow-0.001
 ago-0.001
unts-0.001
uzz-0.001
 fault-0.001
Token quiet
Token ID5897
Feature activation-3.28e-01

Pos logit contributions
 wow+0.000
 ago+0.000
uzz+0.000
 fault+0.000
apped+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
 negotiate-0.000
 hire-0.000
Token mouse
Token ID10211
Feature activation+3.06e-01

Pos logit contributions
 wow+0.014
 fault+0.013
uzz+0.013
Kat+0.013
 happens+0.012
Neg logit contributions
 convince-0.008
ivering-0.006
 hips-0.006
 rescue-0.006
 hire-0.006
Token named
Token ID3706
Feature activation-2.48e-01

Pos logit contributions
 convince+0.010
 hips+0.009
 injuries+0.008
bing+0.008
ivering+0.008
Neg logit contributions
 wow-0.011
 ago-0.010
unts-0.010
 fault-0.010
Kat-0.009
Token Wh
Token ID854
Feature activation+1.49e+00

Pos logit contributions
 wow+0.007
 ago+0.006
unts+0.006
uzz+0.006
 fault+0.006
Neg logit contributions
 convince-0.010
 hips-0.009
 injuries-0.008
bing-0.008
ivering-0.008
Tokenisk
Token ID1984
Feature activation+5.81e+00

Pos logit contributions
 convince+0.048
 hips+0.040
 injuries+0.036
 persuade+0.036
 hire+0.035
Neg logit contributions
 wow-0.053
uzz-0.052
unts-0.051
 fault-0.051
 ago-0.051
Tokeners
Token ID364
Feature activation+2.57e-01

Pos logit contributions
 convince+0.058
ty+0.032
me+0.028
ches+0.022
bing+0.022
Neg logit contributions
 wow-0.326
 amaz-0.325
 telescope-0.314
 ago-0.314
 fault-0.313
Token.
Token ID13
Feature activation+7.11e-02

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.006
bing+0.006
 hire+0.006
Neg logit contributions
 wow-0.009
 ago-0.009
unts-0.009
uzz-0.008
 happened-0.008
Token Wh
Token ID854
Feature activation+1.47e+00

Pos logit contributions
 convince+0.003
 injuries+0.003
eling+0.003
 hips+0.003
ivering+0.003
Neg logit contributions
 wow-0.001
 many-0.001
unts-0.001
 ago-0.001
apped-0.001
Tokenisk
Token ID1984
Feature activation+5.76e+00

Pos logit contributions
 convince+0.046
 hips+0.039
 injuries+0.036
 persuade+0.033
 hire+0.033
Neg logit contributions
 wow-0.054
uzz-0.053
 fault-0.052
unts-0.051
 ago-0.051
Tokeners
Token ID364
Feature activation+2.40e-01

Pos logit contributions
 convince+0.074
bing+0.039
me+0.038
ty+0.037
 hips+0.036
Neg logit contributions
 wow-0.311
 ago-0.302
 fault-0.298
 amaz-0.298
 telescope-0.297
Token a
Token ID257
Feature activation+3.66e-02

Pos logit contributions
 convince+0.009
 hips+0.007
ittens+0.007
 injuries+0.007
me+0.007
Neg logit contributions
 wow-0.011
 happens-0.010
 fault-0.010
 telescope-0.010
 ago-0.010
Token smart
Token ID4451
Feature activation+3.60e-01

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
ivering+0.001
bing+0.001
Neg logit contributions
 wow-0.001
 fault-0.001
 ago-0.001
uzz-0.001
 happens-0.001
Token cat
Token ID3797
Feature activation+3.23e-01

Pos logit contributions
 convince+0.010
 hips+0.008
ivering+0.008
me+0.008
ittens+0.007
Neg logit contributions
 wow-0.014
 fault-0.013
 ago-0.013
Kat-0.013
uzz-0.013
Token named
Token ID3706
Feature activation-2.92e-02

Pos logit contributions
 convince+0.013
 hips+0.011
 injuries+0.011
 negotiate+0.010
bing+0.010
Neg logit contributions
 wow-0.009
unts-0.008
 ago-0.008
uzz-0.008
 happened-0.008
Token Wh
Token ID854
Feature activation+1.71e+00

Pos logit contributions
 wow+0.001
 ago+0.001
 maj+0.001
uzz+0.001
 happened+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
eling-0.001
bing-0.001
Tokenisk
Token ID1984
Feature activation+5.77e+00

Pos logit contributions
 convince+0.055
 hips+0.046
 hire+0.042
 persuade+0.042
bing+0.041
Neg logit contributions
 wow-0.060
 fault-0.060
uzz-0.060
unts-0.059
 ago-0.058
Tokeners
Token ID364
Feature activation+3.49e-01

Pos logit contributions
 convince+0.073
me+0.039
ty+0.037
bing+0.035
 hips+0.035
Neg logit contributions
 wow-0.314
 amaz-0.304
 telescope-0.301
 ago-0.298
 solar-0.293
Token.
Token ID13
Feature activation+4.85e-01

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.012
 persuade+0.011
vation+0.011
Neg logit contributions
 wow-0.010
 many-0.009
uzz-0.009
unts-0.009
 happened-0.009
Token Wh
Token ID854
Feature activation+1.64e+00

Pos logit contributions
 convince+0.019
 hips+0.016
eling+0.015
 injuries+0.015
 persuade+0.015
Neg logit contributions
 wow-0.015
 many-0.014
 ago-0.014
unts-0.014
uzz-0.013
Tokenisk
Token ID1984
Feature activation+5.68e+00

Pos logit contributions
 convince+0.055
 hips+0.047
 injuries+0.042
 persuade+0.042
 hire+0.041
Neg logit contributions
 fault-0.056
uzz-0.056
 wow-0.055
unts-0.054
 ago-0.054
Tokeners
Token ID364
Feature activation+3.22e-01

Pos logit contributions
 convince+0.074
me+0.038
bing+0.038
ty+0.037
 hips+0.035
Neg logit contributions
 wow-0.309
 amaz-0.296
 ago-0.296
 telescope-0.295
 fault-0.288
Token her
Token ID607
Feature activation+3.32e-01

Pos logit contributions
 convince+0.010
 help+0.008
 explain+0.007
 first+0.007
Mr+0.007
Neg logit contributions
 fault-0.015
unts-0.015
 ago-0.014
uzz-0.014
ff-0.014
Token pet
Token ID4273
Feature activation+2.09e-01

Pos logit contributions
 convince+0.011
 hips+0.010
 injuries+0.009
 sleeve+0.009
arians+0.008
Neg logit contributions
 wow-0.011
Kat-0.010
 ago-0.010
uzz-0.010
unts-0.010
Token cat
Token ID3797
Feature activation+3.87e-01

Pos logit contributions
 convince+0.007
 hips+0.006
 injuries+0.006
bing+0.006
arians+0.006
Neg logit contributions
 wow-0.007
 ago-0.006
unts-0.006
uzz-0.006
 fault-0.006
Token,
Token ID11
Feature activation+5.35e-01

Pos logit contributions
 convince+0.015
 hips+0.013
 injuries+0.013
eling+0.012
vation+0.012
Neg logit contributions
 wow-0.011
 ago-0.010
uzz-0.010
 happened-0.010
 many-0.010
Token Wh
Token ID854
Feature activation+1.65e+00

Pos logit contributions
 convince+0.021
 hips+0.019
 injuries+0.018
bing+0.018
arians+0.017
Neg logit contributions
 wow-0.015
 happened-0.014
uzz-0.013
 telescope-0.013
 ago-0.013
Tokenisk
Token ID1984
Feature activation+5.76e+00

Pos logit contributions
 convince+0.052
 hips+0.042
 persuade+0.039
 hire+0.039
bing+0.038
Neg logit contributions
 fault-0.059
 wow-0.059
uzz-0.059
unts-0.058
 ago-0.057
Tokeners
Token ID364
Feature activation+4.11e-01

Pos logit contributions
 convince+0.074
me+0.040
ty+0.036
bing+0.035
 hips+0.034
Neg logit contributions
 wow-0.314
 amaz-0.305
 telescope-0.301
 ago-0.299
 fault-0.295
Token.
Token ID13
Feature activation+6.12e-01

Pos logit contributions
 convince+0.016
 hips+0.014
eling+0.013
 persuade+0.013
 injuries+0.013
Neg logit contributions
 wow-0.012
 ago-0.011
 many-0.011
 happened-0.010
unts-0.010
Token Lily
Token ID20037
Feature activation+3.10e-01

Pos logit contributions
 convince+0.029
 hips+0.025
eling+0.025
 injuries+0.025
bing+0.024
Neg logit contributions
 wow-0.013
 ago-0.012
uzz-0.011
unts-0.011
 maj-0.011
Token loved
Token ID6151
Feature activation+9.49e-01

Pos logit contributions
 convince+0.014
 hips+0.013
bing+0.012
me+0.012
 injuries+0.012
Neg logit contributions
 wow-0.007
 ago-0.006
 fault-0.006
uzz-0.006
 telescope-0.006
Token to
Token ID284
Feature activation+1.83e-01

Pos logit contributions
 convince+0.035
 hips+0.029
 injuries+0.029
bing+0.028
ittens+0.027
Neg logit contributions
 wow-0.030
apped-0.027
 ago-0.027
unts-0.026
Kat-0.026
Token Wh
Token ID854
Feature activation+1.49e+00

Pos logit contributions
 wow+0.007
 ago+0.006
unts+0.006
uzz+0.006
 fault+0.006
Neg logit contributions
 convince-0.010
 hips-0.009
 injuries-0.008
bing-0.008
ivering-0.008
Tokenisk
Token ID1984
Feature activation+5.81e+00

Pos logit contributions
 convince+0.048
 hips+0.040
 injuries+0.036
 persuade+0.036
 hire+0.035
Neg logit contributions
 wow-0.053
uzz-0.052
unts-0.051
 fault-0.051
 ago-0.051
Tokeners
Token ID364
Feature activation+2.57e-01

Pos logit contributions
 convince+0.058
ty+0.032
me+0.028
ches+0.022
bing+0.022
Neg logit contributions
 wow-0.326
 amaz-0.325
 telescope-0.314
 ago-0.314
 fault-0.313
Token.
Token ID13
Feature activation+7.11e-02

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.006
bing+0.006
 hire+0.006
Neg logit contributions
 wow-0.009
 ago-0.009
unts-0.009
uzz-0.008
 happened-0.008
Token Wh
Token ID854
Feature activation+1.47e+00

Pos logit contributions
 convince+0.003
 injuries+0.003
eling+0.003
 hips+0.003
ivering+0.003
Neg logit contributions
 wow-0.001
 many-0.001
unts-0.001
 ago-0.001
apped-0.001
Tokenisk
Token ID1984
Feature activation+5.76e+00

Pos logit contributions
 convince+0.046
 hips+0.039
 injuries+0.036
 persuade+0.033
 hire+0.033
Neg logit contributions
 wow-0.054
uzz-0.053
 fault-0.052
unts-0.051
 ago-0.051
Tokeners
Token ID364
Feature activation+2.40e-01

Pos logit contributions
 convince+0.074
bing+0.039
me+0.038
ty+0.037
 hips+0.036
Neg logit contributions
 wow-0.311
 ago-0.302
 fault-0.298
 amaz-0.298
 telescope-0.297
Token was
Token ID373
Feature activation+1.06e-01

Pos logit contributions
 convince+0.011
 hips+0.010
 injuries+0.009
bing+0.009
 hire+0.009
Neg logit contributions
 wow-0.006
 ago-0.005
 fault-0.005
unts-0.005
uzz-0.005
Token sad
Token ID6507
Feature activation-1.26e-02

Pos logit contributions
 convince+0.004
 hips+0.003
 injuries+0.003
bing+0.003
 hire+0.003
Neg logit contributions
 wow-0.003
 ago-0.003
unts-0.003
uzz-0.003
 fault-0.003
Token because
Token ID780
Feature activation+3.14e-01

Pos logit contributions
 wow+0.001
unts+0.000
Kat+0.000
 ago+0.000
uzz+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
bing-0.000
ivering-0.000
Token he
Token ID339
Feature activation+2.18e-01

Pos logit contributions
 convince+0.011
 hips+0.010
 injuries+0.009
bing+0.009
arians+0.009
Neg logit contributions
 wow-0.010
 ago-0.009
uzz-0.009
unts-0.009
 happened-0.009
Token Jane
Token ID12091
Feature activation-4.52e-01

Pos logit contributions
 convince+0.007
bing+0.006
ising+0.006
 hips+0.006
place+0.005
Neg logit contributions
 wow-0.009
 fault-0.009
uzz-0.009
apped-0.008
unts-0.008
Token hopped
Token ID45230
Feature activation-5.21e-01

Pos logit contributions
 wow+0.013
 fault+0.012
 telescope+0.012
unts+0.012
uzz+0.012
Neg logit contributions
 convince-0.017
 hips-0.015
me-0.014
eling-0.013
 injuries-0.013
Token on
Token ID319
Feature activation-4.65e-01

Pos logit contributions
 wow+0.018
 fault+0.017
unts+0.017
 ago+0.016
 happened+0.016
Neg logit contributions
 convince-0.017
 hips-0.015
bing-0.013
 injuries-0.013
 persuade-0.012
Token the
Token ID262
Feature activation-9.42e-02

Pos logit contributions
 wow+0.014
unts+0.013
 ago+0.013
uzz+0.013
apped+0.013
Neg logit contributions
 convince-0.017
 hips-0.015
 injuries-0.014
bing-0.014
 hire-0.013
Token sees
Token ID7224
Feature activation-8.77e-01

Pos logit contributions
 wow+0.003
 ago+0.003
 telescope+0.003
unts+0.003
 maj+0.003
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
bing-0.003
ivering-0.003
Tokenaw
Token ID707
Feature activation-3.38e+00

Pos logit contributions
 wow+0.026
unts+0.026
uzz+0.025
 fault+0.025
 maj+0.024
Neg logit contributions
 convince-0.030
 hips-0.026
 injuries-0.025
eling-0.024
 sleeve-0.023
Token and
Token ID290
Feature activation-4.55e-01

Pos logit contributions
 wow+0.124
unts+0.115
 fault+0.113
 telescope+0.113
uzz+0.112
Neg logit contributions
 convince-0.103
 hips-0.089
 injuries-0.083
bing-0.081
ivering-0.077
Token before
Token ID878
Feature activation+3.80e-01

Pos logit contributions
 fault+0.015
unts+0.014
uzz+0.014
 ago+0.013
 wow+0.013
Neg logit contributions
 convince-0.016
 hips-0.014
bing-0.013
 persuade-0.012
eling-0.012
Token long
Token ID890
Feature activation-5.76e-01

Pos logit contributions
 convince+0.013
 hips+0.011
 injuries+0.011
+0.010
bing+0.010
Neg logit contributions
ocado-0.012
 wow-0.012
 fault-0.012
uzz-0.012
unts-0.012
Token she
Token ID673
Feature activation-2.35e-01

Pos logit contributions
 wow+0.021
 ago+0.019
unts+0.019
uzz+0.019
 fault+0.018
Neg logit contributions
 convince-0.018
 hips-0.016
 injuries-0.015
bing-0.014
 hire-0.014
Token was
Token ID373
Feature activation-1.07e-01

Pos logit contributions
 wow+0.006
 happened+0.005
 ago+0.005
unts+0.005
uzz+0.005
Neg logit contributions
 convince-0.010
 hips-0.009
 injuries-0.009
bing-0.009
ivering-0.009
Token her
Token ID607
Feature activation+4.70e-01

Pos logit contributions
 wow+0.003
 fault+0.003
uzz+0.003
unts+0.003
 maj+0.003
Neg logit contributions
 convince-0.003
 hips-0.003
 injuries-0.002
 hire-0.002
eling-0.002
Token mom
Token ID1995
Feature activation-3.64e-01

Pos logit contributions
 convince+0.018
 hips+0.015
 injuries+0.015
bing+0.014
ivering+0.014
Neg logit contributions
 wow-0.014
 ago-0.013
 fault-0.013
uzz-0.012
 maj-0.012
Tokenmy
Token ID1820
Feature activation+1.17e+00

Pos logit contributions
 wow+0.011
 ago+0.010
unts+0.010
 happened+0.010
 maj+0.010
Neg logit contributions
 convince-0.014
 hips-0.012
 injuries-0.011
bing-0.011
arians-0.011
Token watched
Token ID7342
Feature activation-8.38e-01

Pos logit contributions
 convince+0.043
 hips+0.038
 injuries+0.036
bing+0.035
arians+0.034
Neg logit contributions
 wow-0.036
 ago-0.032
unts-0.032
uzz-0.032
 happened-0.031
Token in
Token ID287
Feature activation-1.77e-01

Pos logit contributions
 wow+0.026
 ago+0.023
uzz+0.022
 maj+0.022
 happened+0.022
Neg logit contributions
 convince-0.031
 hips-0.027
 injuries-0.026
arians-0.025
ivering-0.025
Token amaz
Token ID40642
Feature activation-3.37e+00

Pos logit contributions
 wow+0.006
unts+0.005
uzz+0.005
 ago+0.005
 fault+0.005
Neg logit contributions
 convince-0.006
 hips-0.006
 injuries-0.005
bing-0.005
 hire-0.005
Tokenement
Token ID972
Feature activation+2.17e+00

Pos logit contributions
 wow+0.152
unts+0.146
uzz+0.143
 ago+0.143
 telescope+0.141
Neg logit contributions
 convince-0.074
 hips-0.058
 injuries-0.053
bing-0.050
eling-0.048
Token as
Token ID355
Feature activation-9.75e-01

Pos logit contributions
 convince+0.071
 hips+0.060
 injuries+0.058
bing+0.054
 negotiate+0.054
Neg logit contributions
 wow-0.077
 ago-0.070
uzz-0.068
unts-0.068
 happened-0.068
Token the
Token ID262
Feature activation+4.02e-02

Pos logit contributions
 wow+0.029
unts+0.027
 fault+0.027
 ago+0.027
 maj+0.027
Neg logit contributions
 convince-0.036
 hips-0.030
 injuries-0.029
 hire-0.028
bing-0.027
Token machine
Token ID4572
Feature activation+8.05e-02

Pos logit contributions
 convince+0.002
 hips+0.001
 injuries+0.001
bing+0.001
arians+0.001
Neg logit contributions
 wow-0.001
 ago-0.001
unts-0.001
uzz-0.001
 fault-0.001
Token flew
Token ID13112
Feature activation-6.89e-01

Pos logit contributions
 convince+0.003
 hips+0.003
 injuries+0.003
bing+0.003
ivering+0.003
Neg logit contributions
 wow-0.002
 ago-0.002
 happened-0.002
unts-0.002
uzz-0.002
Tokenles
Token ID829
Feature activation-7.67e-01

Pos logit contributions
 wow+0.005
 happened+0.004
 ago+0.004
 happens+0.004
aw+0.004
Neg logit contributions
 convince-0.006
 hips-0.005
 injuries-0.005
 hire-0.005
arians-0.005
Token for
Token ID329
Feature activation+4.69e-01

Pos logit contributions
 wow+0.024
 ago+0.022
unts+0.021
uzz+0.021
 happened+0.021
Neg logit contributions
 convince-0.028
 hips-0.024
 injuries-0.023
bing-0.022
 hire-0.022
Token extra
Token ID3131
Feature activation+8.26e-02

Pos logit contributions
 convince+0.020
ivering+0.019
 hips+0.018
bing+0.017
arians+0.017
Neg logit contributions
 wow-0.012
 many-0.011
Kat-0.010
 ago-0.010
 happened-0.009
Token y
Token ID331
Feature activation+8.96e-01

Pos logit contributions
 convince+0.003
 hips+0.003
 injuries+0.003
ivering+0.003
bing+0.003
Neg logit contributions
 wow-0.002
 ago-0.002
unts-0.002
 happened-0.002
 maj-0.002
Tokenum
Token ID388
Feature activation-7.53e-01

Pos logit contributions
 convince+0.033
 hips+0.028
 injuries+0.026
arians+0.025
 hire+0.025
Neg logit contributions
 wow-0.028
unts-0.026
 ago-0.026
 fault-0.026
uzz-0.025
Tokenmin
Token ID1084
Feature activation-3.31e+00

Pos logit contributions
 wow+0.025
 ago+0.022
 happened+0.022
unts+0.022
 fault+0.022
Neg logit contributions
 convince-0.027
 hips-0.023
 injuries-0.022
bing-0.021
ivering-0.021
Tokeness
Token ID408
Feature activation-1.47e+00

Pos logit contributions
 wow+0.073
unts+0.061
 ago+0.061
 telescope+0.060
 maj+0.059
Neg logit contributions
 convince-0.153
 hips-0.136
 injuries-0.131
bing-0.128
me-0.127
Token.
Token ID13
Feature activation-1.17e-01

Pos logit contributions
 wow+0.048
 ago+0.043
unts+0.043
uzz+0.043
 fault+0.042
Neg logit contributions
 convince-0.052
 hips-0.045
 injuries-0.043
bing-0.041
 hire-0.040
Token
Token ID220
Feature activation+1.73e-01

Pos logit contributions
 wow+0.004
 telescope+0.003
unts+0.003
 fault+0.003
 happens+0.003
Neg logit contributions
 convince-0.004
 hips-0.004
-0.003
 hire-0.003
bing-0.003
Token
Token ID198
Feature activation-1.32e-01

Pos logit contributions
 convince+0.007
 hips+0.006
me+0.005
bing+0.005
 injuries+0.005
Neg logit contributions
 wow-0.005
 telescope-0.005
 ago-0.004
unts-0.004
 solar-0.004
Token
Token ID198
Feature activation-1.21e-01

Pos logit contributions
 wow+0.004
 fault+0.004
 telescope+0.004
 ago+0.003
 happens+0.003
Neg logit contributions
 convince-0.005
 hips-0.004
bing-0.004
me-0.004
 hire-0.004
Token dog
Token ID3290
Feature activation+4.83e-01

Pos logit contributions
 convince+0.006
 hips+0.005
 injuries+0.005
bing+0.005
eling+0.005
Neg logit contributions
 wow-0.004
 happened-0.003
 ago-0.003
unts-0.003
 fault-0.003
Token as
Token ID355
Feature activation-6.57e-01

Pos logit contributions
 convince+0.019
 hips+0.017
 injuries+0.016
ivering+0.016
eling+0.016
Neg logit contributions
 wow-0.013
 ago-0.012
 many-0.012
uzz-0.012
 fault-0.012
Token a
Token ID257
Feature activation-1.13e-01

Pos logit contributions
 wow+0.022
unts+0.020
 telescope+0.020
 ago+0.020
Kat+0.020
Neg logit contributions
 convince-0.022
 hips-0.018
 injuries-0.018
bing-0.017
 hire-0.017
Token yellow
Token ID7872
Feature activation-9.05e-01

Pos logit contributions
 wow+0.004
uzz+0.003
 ago+0.003
 fault+0.003
 maj+0.003
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
 hire-0.003
 negotiate-0.003
Token lab
Token ID2248
Feature activation+4.14e-01

Pos logit contributions
 wow+0.027
 ago+0.024
unts+0.024
uzz+0.024
 happened+0.024
Neg logit contributions
 convince-0.035
 hips-0.031
 injuries-0.029
eling-0.029
bing-0.029
Tokenrador
Token ID40368
Feature activation-3.22e+00

Pos logit contributions
 convince+0.015
 hips+0.013
 injuries+0.012
bing+0.012
ivering+0.012
Neg logit contributions
 wow-0.013
 ago-0.012
unts-0.012
uzz-0.012
 happened-0.012
Token.
Token ID13
Feature activation+1.64e-02

Pos logit contributions
 wow+0.099
 happened+0.089
 ago+0.089
uzz+0.087
unts+0.086
Neg logit contributions
 convince-0.122
 hips-0.105
 injuries-0.101
bing-0.098
ivering-0.097
Token Lily
Token ID20037
Feature activation+5.92e-02

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
 hire+0.000
bing+0.000
Neg logit contributions
 wow-0.000
 telescope-0.000
unts-0.000
 happened-0.000
 fault-0.000
Token got
Token ID1392
Feature activation-1.19e-01

Pos logit contributions
 convince+0.002
 hips+0.002
 injuries+0.002
bing+0.002
ivering+0.002
Neg logit contributions
 wow-0.002
unts-0.001
 ago-0.001
uzz-0.001
 happened-0.001
Token excited
Token ID6568
Feature activation+9.59e-01

Pos logit contributions
 wow+0.003
unts+0.003
uzz+0.003
 ago+0.003
 happened+0.003
Neg logit contributions
 convince-0.005
 hips-0.004
 injuries-0.004
bing-0.004
arians-0.004
Token and
Token ID290
Feature activation-2.71e-01

Pos logit contributions
 convince+0.036
 hips+0.032
 injuries+0.030
bing+0.029
 hire+0.029
Neg logit contributions
 wow-0.029
 ago-0.026
unts-0.026
 happened-0.026
 fault-0.025
Token go
Token ID467
Feature activation+1.04e-01

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
bing+0.001
ivering+0.001
Neg logit contributions
 wow-0.001
 ago-0.001
 happened-0.001
 telescope-0.001
unts-0.001
Token!
Token ID0
Feature activation-2.54e-01

Pos logit contributions
 convince+0.005
arians+0.005
 injuries+0.004
ivering+0.004
bing+0.004
Neg logit contributions
 wow-0.002
 ago-0.002
 many-0.002
uzz-0.002
 telescope-0.002
Token Everyone
Token ID11075
Feature activation+3.54e-02

Pos logit contributions
 wow+0.007
 ago+0.006
uzz+0.006
unts+0.006
Kat+0.006
Neg logit contributions
 convince-0.011
 hips-0.009
 injuries-0.009
bing-0.009
eling-0.009
Token watched
Token ID7342
Feature activation-4.11e-01

Pos logit contributions
 convince+0.001
 hips+0.001
eling+0.001
bing+0.001
 hire+0.001
Neg logit contributions
 wow-0.001
 fault-0.001
unts-0.001
 ago-0.001
uzz-0.001
Token in
Token ID287
Feature activation-4.37e-01

Pos logit contributions
 wow+0.013
 ago+0.012
unts+0.012
uzz+0.012
 maj+0.012
Neg logit contributions
 convince-0.015
 hips-0.013
 injuries-0.012
bing-0.012
ivering-0.012
Token amaz
Token ID40642
Feature activation-3.20e+00

Pos logit contributions
 wow+0.014
unts+0.013
uzz+0.013
Kat+0.013
 fault+0.013
Neg logit contributions
 convince-0.016
 hips-0.014
 injuries-0.013
bing-0.012
 hire-0.012
Tokenement
Token ID972
Feature activation+2.08e+00

Pos logit contributions
 wow+0.147
unts+0.142
uzz+0.139
 ago+0.138
 fault+0.137
Neg logit contributions
 convince-0.066
 hips-0.051
 injuries-0.045
bing-0.044
eling-0.042
Token as
Token ID355
Feature activation-1.03e+00

Pos logit contributions
 convince+0.063
 hips+0.054
 injuries+0.050
bing+0.049
arians+0.046
Neg logit contributions
 wow-0.078
unts-0.071
 ago-0.070
uzz-0.070
Kat-0.070
Token it
Token ID340
Feature activation-4.91e-01

Pos logit contributions
 wow+0.031
 fault+0.029
unts+0.029
apped+0.029
uzz+0.028
Neg logit contributions
 convince-0.037
 hips-0.031
 injuries-0.029
 hire-0.029
bing-0.028
Token took
Token ID1718
Feature activation+3.76e-01

Pos logit contributions
 wow+0.014
unts+0.012
 ago+0.012
 fault+0.012
uzz+0.012
Neg logit contributions
 convince-0.020
 hips-0.017
 injuries-0.016
bing-0.016
eling-0.016
Token off
Token ID572
Feature activation-1.80e+00

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.012
bing+0.011
 hire+0.011
Neg logit contributions
 wow-0.011
 ago-0.010
unts-0.010
 fault-0.010
uzz-0.010
Token friends
Token ID2460
Feature activation-7.00e-01

Pos logit contributions
 wow+0.021
Kat+0.019
unts+0.018
uzz+0.018
 happens+0.018
Neg logit contributions
 convince-0.022
 hips-0.019
 injuries-0.019
 sleeve-0.017
 hire-0.017
Token stopped
Token ID5025
Feature activation-1.03e+00

Pos logit contributions
 wow+0.019
 ago+0.016
 fault+0.016
unts+0.016
uzz+0.016
Neg logit contributions
 convince-0.029
 hips-0.026
 injuries-0.024
 hire-0.023
bing-0.023
Token and
Token ID290
Feature activation-4.59e-01

Pos logit contributions
 wow+0.031
 ago+0.028
unts+0.028
uzz+0.028
 maj+0.028
Neg logit contributions
 convince-0.039
 hips-0.034
 injuries-0.032
arians-0.031
bing-0.031
Token watched
Token ID7342
Feature activation-7.09e-01

Pos logit contributions
 fault+0.017
unts+0.016
 ago+0.016
uzz+0.015
 wow+0.015
Neg logit contributions
 convince-0.016
 hips-0.012
 persuade-0.011
 explain-0.011
 before-0.011
Token in
Token ID287
Feature activation-4.89e-01

Pos logit contributions
 wow+0.026
 ago+0.023
uzz+0.023
unts+0.023
 happened+0.023
Neg logit contributions
 convince-0.023
 hips-0.019
 injuries-0.018
ivering-0.018
arians-0.018
Token amaz
Token ID40642
Feature activation-3.19e+00

Pos logit contributions
 wow+0.013
 ago+0.011
unts+0.011
uzz+0.011
 fault+0.011
Neg logit contributions
 convince-0.020
 hips-0.018
 injuries-0.017
bing-0.017
 hire-0.016
Tokenement
Token ID972
Feature activation+2.00e+00

Pos logit contributions
 wow+0.147
unts+0.141
uzz+0.138
 ago+0.138
 fault+0.136
Neg logit contributions
 convince-0.066
 hips-0.052
 injuries-0.048
bing-0.045
eling-0.043
Token.
Token ID13
Feature activation-2.58e-01

Pos logit contributions
 convince+0.061
 hips+0.051
 injuries+0.048
bing+0.045
 hire+0.045
Neg logit contributions
 wow-0.075
 ago-0.069
unts-0.068
uzz-0.068
 fault-0.067
Token The
Token ID383
Feature activation+5.44e-01

Pos logit contributions
 wow+0.007
 fault+0.007
 happened+0.006
 happens+0.006
 telescope+0.006
Neg logit contributions
 convince-0.009
 hips-0.009
-0.008
 hire-0.008
arians-0.008
Token dog
Token ID3290
Feature activation+3.84e-01

Pos logit contributions
 convince+0.021
 hips+0.018
 injuries+0.017
bing+0.016
 hire+0.016
Neg logit contributions
 wow-0.017
 ago-0.015
unts-0.015
uzz-0.015
 fault-0.014
Token too
Token ID1165
Feature activation-2.30e-01

Pos logit contributions
 convince+0.016
 hips+0.014
 injuries+0.014
bing+0.013
ivering+0.013
Neg logit contributions
 wow-0.010
 happened-0.009
unts-0.009
 ago-0.009
 happens-0.008
Token sun
Token ID4252
Feature activation-7.34e-01

Pos logit contributions
 wow+0.000
 maj+0.000
 ago+0.000
 many+0.000
 fault+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
arians-0.000
bing-0.000
Tokenfl
Token ID2704
Feature activation-8.01e-01

Pos logit contributions
 wow+0.027
unts+0.025
Kat+0.025
 fault+0.024
 ago+0.024
Neg logit contributions
 convince-0.022
 hips-0.019
bing-0.018
me-0.018
 injuries-0.017
Tokenowers
Token ID3618
Feature activation+5.46e-01

Pos logit contributions
 wow+0.025
 ago+0.022
unts+0.022
uzz+0.022
 fault+0.021
Neg logit contributions
 convince-0.030
 hips-0.026
 injuries-0.025
bing-0.024
 hire-0.024
Token were
Token ID547
Feature activation+3.53e-01

Pos logit contributions
 convince+0.022
 hips+0.019
 injuries+0.018
bing+0.018
 hire+0.017
Neg logit contributions
 wow-0.015
 ago-0.013
unts-0.013
uzz-0.013
 fault-0.013
Token Y
Token ID575
Feature activation+9.00e-01

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.011
bing+0.011
 hire+0.011
Neg logit contributions
 wow-0.011
 ago-0.009
unts-0.009
uzz-0.009
 telescope-0.009
Tokenol
Token ID349
Feature activation-3.12e+00

Pos logit contributions
 convince+0.038
 hips+0.034
 injuries+0.032
eling+0.031
bing+0.031
Neg logit contributions
 wow-0.024
uzz-0.022
 maj-0.021
apped-0.021
 fault-0.021
Tokenanda
Token ID5282
Feature activation-5.63e-01

Pos logit contributions
 wow+0.092
 ago+0.082
uzz+0.082
 maj+0.080
 fault+0.079
Neg logit contributions
 convince-0.125
 hips-0.108
 injuries-0.104
 negotiate-0.099
bing-0.099
Token's
Token ID338
Feature activation-1.10e+00

Pos logit contributions
 wow+0.018
uzz+0.016
 maj+0.016
 ago+0.016
 fault+0.016
Neg logit contributions
 convince-0.021
 hips-0.017
 injuries-0.017
 negotiate-0.016
eling-0.016
Token best
Token ID1266
Feature activation-6.31e-01

Pos logit contributions
 wow+0.031
unts+0.027
uzz+0.027
 ago+0.027
Kat+0.027
Neg logit contributions
 convince-0.044
 hips-0.039
 injuries-0.037
bing-0.036
 hire-0.035
Token friends
Token ID2460
Feature activation-4.59e-01

Pos logit contributions
 wow+0.020
 ago+0.018
unts+0.018
uzz+0.018
Kat+0.017
Neg logit contributions
 convince-0.023
 hips-0.020
 injuries-0.019
bing-0.018
 hire-0.018
Token.
Token ID13
Feature activation-8.06e-02

Pos logit contributions
 wow+0.016
 ago+0.014
unts+0.014
uzz+0.014
 fault+0.014
Neg logit contributions
 convince-0.015
 hips-0.013
 injuries-0.012
bing-0.012
arians-0.012
Token -
Token ID532
Feature activation-2.67e-01

Pos logit contributions
 convince+0.010
 hips+0.008
 injuries+0.007
bing+0.007
arians+0.007
Neg logit contributions
 wow-0.014
unts-0.012
Kat-0.012
uzz-0.012
 maj-0.012
Token a
Token ID257
Feature activation-9.15e-02

Pos logit contributions
 wow+0.008
unts+0.007
uzz+0.007
 ago+0.007
Kat+0.007
Neg logit contributions
 convince-0.010
 hips-0.009
 injuries-0.009
bing-0.008
 hire-0.008
Token box
Token ID3091
Feature activation-2.94e-01

Pos logit contributions
 wow+0.003
uzz+0.003
 ago+0.002
apped+0.002
Kat+0.002
Neg logit contributions
 convince-0.003
 hips-0.003
 injuries-0.003
 hire-0.003
 sleeve-0.003
Token of
Token ID286
Feature activation+5.89e-01

Pos logit contributions
 wow+0.010
 ago+0.010
unts+0.009
uzz+0.009
 fault+0.009
Neg logit contributions
 convince-0.010
 hips-0.008
 injuries-0.008
bing-0.007
arians-0.007
Token mult
Token ID1963
Feature activation-1.44e+00

Pos logit contributions
 convince+0.023
 hips+0.020
 injuries+0.019
bing+0.018
 hire+0.018
Neg logit contributions
 wow-0.017
unts-0.015
 ago-0.015
uzz-0.015
 fault-0.015
Tokenicol
Token ID27045
Feature activation-3.12e+00

Pos logit contributions
 wow+0.016
unts+0.013
 happened+0.012
Kat+0.012
uzz+0.012
Neg logit contributions
 convince-0.079
 hips-0.075
 injuries-0.072
bing-0.071
arians-0.070
Tokenored
Token ID1850
Feature activation-1.02e+00

Pos logit contributions
 wow+0.119
unts+0.111
 ago+0.109
uzz+0.108
 telescope+0.108
Neg logit contributions
 convince-0.093
 hips-0.078
 injuries-0.074
bing-0.070
arians-0.068
Token bubbles
Token ID25037
Feature activation+9.13e-02

Pos logit contributions
 wow+0.032
uzz+0.028
unts+0.028
 fault+0.028
Kat+0.028
Neg logit contributions
 convince-0.035
 hips-0.031
 injuries-0.030
bing-0.029
 hire-0.029
Token!
Token ID0
Feature activation-5.44e-01

Pos logit contributions
 convince+0.003
 hips+0.002
 injuries+0.002
bing+0.002
ivering+0.002
Neg logit contributions
 wow-0.004
uzz-0.003
Kat-0.003
unts-0.003
 maj-0.003
Token
Token ID198
Feature activation-3.01e-02

Pos logit contributions
 wow+0.019
 fault+0.017
 telescope+0.017
ull+0.017
uzz+0.017
Neg logit contributions
 convince-0.017
 hips-0.015
-0.015
 hire-0.014
me-0.014
Token
Token ID198
Feature activation-3.11e-02

Pos logit contributions
 wow+0.001
 fault+0.001
 ago+0.001
 telescope+0.001
 happens+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
me-0.001
bing-0.001
ittens-0.001
Token the
Token ID262
Feature activation+3.98e-02

Pos logit contributions
 wow+0.019
uzz+0.017
Kat+0.017
unts+0.017
 fault+0.017
Neg logit contributions
 convince-0.021
 hips-0.018
eling-0.017
bing-0.017
 injuries-0.016
Token c
Token ID269
Feature activation+2.93e-01

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
bing+0.001
ivering+0.001
Neg logit contributions
 wow-0.001
 ago-0.001
unts-0.001
 fault-0.001
 telescope-0.001
Tokenrows
Token ID8516
Feature activation-4.03e-01

Pos logit contributions
 convince+0.011
 hips+0.010
 injuries+0.010
bing+0.009
ivering+0.009
Neg logit contributions
 wow-0.009
 ago-0.008
unts-0.008
 telescope-0.008
uzz-0.007
Token were
Token ID547
Feature activation+4.30e-01

Pos logit contributions
 wow+0.010
 ago+0.009
unts+0.009
uzz+0.009
 fault+0.009
Neg logit contributions
 convince-0.017
 hips-0.015
 injuries-0.014
bing-0.014
 hire-0.014
Token c
Token ID269
Feature activation+1.57e-01

Pos logit contributions
 convince+0.016
 hips+0.014
 injuries+0.013
bing+0.013
 hire+0.012
Neg logit contributions
 wow-0.014
 ago-0.012
unts-0.012
uzz-0.012
 fault-0.012
Tokenaw
Token ID707
Feature activation-3.11e+00

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.007
 hire+0.007
 negotiate+0.007
Neg logit contributions
 wow-0.002
aw-0.002
apped-0.002
unts-0.002
 maj-0.002
Tokening
Token ID278
Feature activation-7.06e-01

Pos logit contributions
 wow+0.100
 ago+0.090
unts+0.088
 happened+0.088
uzz+0.087
Neg logit contributions
 convince-0.113
 hips-0.097
 injuries-0.093
bing-0.088
 hire-0.088
Token in
Token ID287
Feature activation-2.23e-01

Pos logit contributions
 wow+0.023
 ago+0.021
unts+0.020
uzz+0.020
 happened+0.020
Neg logit contributions
 convince-0.025
 hips-0.022
 injuries-0.021
bing-0.020
 hire-0.019
Token unison
Token ID49418
Feature activation+2.71e-01

Pos logit contributions
 wow+0.007
unts+0.006
 ago+0.006
 fault+0.006
uzz+0.006
Neg logit contributions
 convince-0.008
 hips-0.007
 injuries-0.007
bing-0.007
 hire-0.006
Token to
Token ID284
Feature activation-2.55e-01

Pos logit contributions
 convince+0.008
 hips+0.006
 injuries+0.006
me+0.005
bing+0.005
Neg logit contributions
 wow-0.011
 telescope-0.010
 fault-0.010
 maj-0.010
unts-0.010
Token make
Token ID787
Feature activation+8.27e-01

Pos logit contributions
 wow+0.009
 ago+0.008
uzz+0.008
unts+0.008
 telescope+0.008
Neg logit contributions
 convince-0.008
 hips-0.007
 injuries-0.007
bing-0.007
arians-0.006
Token fish
Token ID5916
Feature activation+7.56e-02

Pos logit contributions
 wow+0.028
Kat+0.026
 fault+0.025
unts+0.025
uzz+0.025
Neg logit contributions
 convince-0.030
 hips-0.026
 injuries-0.025
bing-0.024
 sleeve-0.023
Token,
Token ID11
Feature activation-1.98e-01

Pos logit contributions
 convince+0.003
 hips+0.002
 injuries+0.002
bing+0.002
ivering+0.002
Neg logit contributions
 wow-0.003
unts-0.002
 ago-0.002
 fault-0.002
Kat-0.002
Token a
Token ID257
Feature activation-1.05e-01

Pos logit contributions
 wow+0.007
unts+0.006
 fault+0.006
uzz+0.006
 ago+0.006
Neg logit contributions
 convince-0.006
 hips-0.006
 injuries-0.005
bing-0.005
 hire-0.005
Token noisy
Token ID31210
Feature activation-6.27e-01

Pos logit contributions
 wow+0.003
uzz+0.003
apped+0.003
 fault+0.003
 ago+0.003
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
 sleeve-0.003
 negotiate-0.003
Token squ
Token ID2809
Feature activation-5.95e-01

Pos logit contributions
 wow+0.021
Kat+0.020
uzz+0.020
 fault+0.020
 happens+0.019
Neg logit contributions
 convince-0.020
 hips-0.017
 injuries-0.016
bing-0.016
ivering-0.015
Tokenaw
Token ID707
Feature activation-3.05e+00

Pos logit contributions
 wow+0.015
 ago+0.013
Kat+0.013
 happened+0.013
 telescope+0.013
Neg logit contributions
 convince-0.025
 hips-0.023
 injuries-0.022
ivering-0.021
bing-0.021
Tokenking
Token ID3364
Feature activation-1.52e-01

Pos logit contributions
 wow+0.121
unts+0.115
Kat+0.114
 fault+0.114
uzz+0.113
Neg logit contributions
 convince-0.080
 hips-0.070
 injuries-0.066
bing-0.064
pper-0.061
Token hawk
Token ID48710
Feature activation-5.90e-01

Pos logit contributions
 wow+0.005
Kat+0.005
 fault+0.005
 maj+0.005
uzz+0.005
Neg logit contributions
 convince-0.005
 hips-0.004
bing-0.004
 injuries-0.004
 sleeve-0.004
Token,
Token ID11
Feature activation-1.14e-01

Pos logit contributions
 wow+0.018
 ago+0.017
uzz+0.016
 happened+0.016
unts+0.015
Neg logit contributions
 convince-0.023
 hips-0.020
 injuries-0.019
 hire-0.018
 negotiate-0.018
Token and
Token ID290
Feature activation-2.97e-01

Pos logit contributions
 wow+0.004
unts+0.004
uzz+0.004
Kat+0.004
 ago+0.004
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
bing-0.003
arians-0.003
Token many
Token ID867
Feature activation-7.13e-01

Pos logit contributions
 wow+0.009
 fault+0.009
unts+0.008
 ago+0.008
 maj+0.008
Neg logit contributions
 convince-0.011
 hips-0.009
bing-0.009
 injuries-0.009
eling-0.009
Tokenered
Token ID1068
Feature activation+8.60e-01

Pos logit contributions
 wow+0.017
unts+0.015
 ago+0.015
 fault+0.015
uzz+0.015
Neg logit contributions
 convince-0.012
 hips-0.010
 injuries-0.009
bing-0.009
eling-0.008
Token in
Token ID287
Feature activation-3.28e-01

Pos logit contributions
 convince+0.028
 hips+0.024
 injuries+0.022
bing+0.022
eling+0.021
Neg logit contributions
 wow-0.030
unts-0.029
 fault-0.028
 happened-0.027
 happens-0.027
Token,
Token ID11
Feature activation-2.77e-01

Pos logit contributions
 wow+0.011
unts+0.010
 fault+0.010
apped+0.010
uzz+0.010
Neg logit contributions
 convince-0.011
 hips-0.010
 injuries-0.009
bing-0.009
 hire-0.009
Token blinking
Token ID43196
Feature activation+6.62e-03

Pos logit contributions
 wow+0.010
unts+0.010
 fault+0.010
 happens+0.009
 ago+0.009
Neg logit contributions
 convince-0.009
 hips-0.007
 injuries-0.007
bing-0.007
eling-0.006
Token in
Token ID287
Feature activation-3.05e-01

Pos logit contributions
 convince+0.000
 hips+0.000
bing+0.000
 injuries+0.000
 sleeve+0.000
Neg logit contributions
 wow-0.000
 fault-0.000
unts-0.000
 ago-0.000
uzz-0.000
Token amaz
Token ID40642
Feature activation-3.03e+00

Pos logit contributions
 wow+0.010
 fault+0.009
unts+0.009
apped+0.009
 maj+0.009
Neg logit contributions
 convince-0.010
 hips-0.009
 injuries-0.009
bing-0.008
 hire-0.008
Tokenement
Token ID972
Feature activation+2.00e+00

Pos logit contributions
 wow+0.141
unts+0.137
 fault+0.133
uzz+0.133
 ago+0.131
Neg logit contributions
 convince-0.059
 hips-0.047
 injuries-0.043
bing-0.041
eling-0.039
Token.
Token ID13
Feature activation-3.96e-01

Pos logit contributions
 convince+0.059
 hips+0.049
 injuries+0.046
bing+0.045
arians+0.042
Neg logit contributions
 wow-0.077
unts-0.070
 ago-0.070
 fault-0.069
uzz-0.069
Token
Token ID198
Feature activation-4.86e-01

Pos logit contributions
 wow+0.012
 fault+0.011
unts+0.011
 happens+0.011
 maj+0.011
Neg logit contributions
 convince-0.014
 hips-0.013
-0.011
 hire-0.011
arians-0.011
Token
Token ID198
Feature activation-5.07e-01

Pos logit contributions
 wow+0.017
 fault+0.016
 telescope+0.015
 happens+0.015
 maj+0.015
Neg logit contributions
 convince-0.017
 hips-0.014
me-0.013
 hire-0.013
 negotiate-0.012
TokenBut
Token ID1537
Feature activation+7.23e-01

Pos logit contributions
 fault+0.020
 wow+0.020
 happened+0.019
 happens+0.019
 telescope+0.019
Neg logit contributions
 convince-0.015
 hips-0.011
me-0.011
 hire-0.011
 negotiate-0.011
Token little
Token ID1310
Feature activation-5.07e-01

Pos logit contributions
 convince+0.007
 hips+0.006
 injuries+0.006
bing+0.006
arians+0.006
Neg logit contributions
 wow-0.007
 ago-0.006
unts-0.006
uzz-0.006
 fault-0.006
Token girl
Token ID2576
Feature activation-5.72e-01

Pos logit contributions
 wow+0.015
 fault+0.013
unts+0.013
uzz+0.013
Kat+0.013
Neg logit contributions
 convince-0.020
 hips-0.017
 injuries-0.016
bing-0.016
 hire-0.016
Token looked
Token ID3114
Feature activation+2.91e-01

Pos logit contributions
 wow+0.017
 fault+0.016
uzz+0.015
unts+0.015
 ago+0.015
Neg logit contributions
 convince-0.022
 hips-0.019
 injuries-0.018
me-0.017
bing-0.017
Token up
Token ID510
Feature activation+5.34e-01

Pos logit contributions
 convince+0.009
 hips+0.008
bing+0.007
ivering+0.007
 injuries+0.007
Neg logit contributions
 wow-0.010
unts-0.010
uzz-0.010
 ago-0.010
 fault-0.010
Token in
Token ID287
Feature activation-5.10e-01

Pos logit contributions
 convince+0.021
 hips+0.018
 injuries+0.017
bing+0.017
ivering+0.016
Neg logit contributions
 wow-0.016
unts-0.014
 fault-0.014
uzz-0.014
 happens-0.014
Token amaz
Token ID40642
Feature activation-3.01e+00

Pos logit contributions
 wow+0.017
 fault+0.016
uzz+0.016
unts+0.016
apped+0.016
Neg logit contributions
 convince-0.018
 hips-0.015
 injuries-0.015
 sleeve-0.013
 hire-0.013
Tokenement
Token ID972
Feature activation+1.99e+00

Pos logit contributions
 wow+0.142
unts+0.138
uzz+0.135
 fault+0.134
 ago+0.134
Neg logit contributions
 convince-0.057
 hips-0.044
 injuries-0.040
bing-0.038
eling-0.036
Token and
Token ID290
Feature activation-8.20e-01

Pos logit contributions
 convince+0.058
 hips+0.050
 injuries+0.046
bing+0.045
ivering+0.042
Neg logit contributions
 wow-0.077
unts-0.071
 fault-0.069
 happens-0.069
uzz-0.069
Token said
Token ID531
Feature activation-9.38e-01

Pos logit contributions
 fault+0.027
unts+0.026
uzz+0.025
ull+0.025
 wow+0.024
Neg logit contributions
 convince-0.030
 hips-0.026
bing-0.024
eling-0.023
 persuade-0.023
Token,
Token ID11
Feature activation-6.02e-01

Pos logit contributions
unts+0.043
 fault+0.043
 solar+0.042
 galaxy+0.041
ocado+0.040
Neg logit contributions
 convince-0.017
 hips-0.016
 before-0.013
 help-0.012
bing-0.012
Token "
Token ID366
Feature activation-5.30e-01

Pos logit contributions
 fault+0.018
 happens+0.017
unts+0.017
 happened+0.016
aw+0.016
Neg logit contributions
 convince-0.022
 hips-0.022
bing-0.019
 before-0.019
 sleeve-0.018
Token for
Token ID329
Feature activation+4.13e-01

Pos logit contributions
 convince+0.015
 hips+0.013
 injuries+0.013
bing+0.012
arians+0.012
Neg logit contributions
 wow-0.017
 ago-0.015
unts-0.015
 fault-0.015
uzz-0.015
Token the
Token ID262
Feature activation+1.12e-01

Pos logit contributions
 convince+0.017
 hips+0.015
ivering+0.015
bing+0.014
me+0.014
Neg logit contributions
 wow-0.011
 ago-0.010
uzz-0.010
 many-0.009
 Jas-0.009
Token lock
Token ID5793
Feature activation-1.12e+00

Pos logit contributions
 convince+0.004
 hips+0.004
bing+0.004
ivering+0.003
 injuries+0.003
Neg logit contributions
 wow-0.003
 ago-0.003
 maj-0.003
 fault-0.003
 many-0.003
Token.
Token ID13
Feature activation+2.42e-02

Pos logit contributions
 wow+0.039
 ago+0.037
 happened+0.036
unts+0.036
 fault+0.036
Neg logit contributions
 convince-0.036
 hips-0.031
 injuries-0.029
bing-0.028
ivering-0.028
Token Y
Token ID575
Feature activation+9.94e-01

Pos logit contributions
 convince+0.001
 hips+0.001
 hire+0.001
arians+0.001
 injuries+0.001
Neg logit contributions
 wow-0.001
 telescope-0.001
unts-0.001
 happened-0.001
ull-0.001
Tokenol
Token ID349
Feature activation-2.99e+00

Pos logit contributions
 convince+0.040
 hips+0.035
 injuries+0.034
eling+0.032
bing+0.032
Neg logit contributions
 wow-0.029
apped-0.026
uzz-0.026
 maj-0.025
unts-0.025
Tokenanda
Token ID5282
Feature activation-2.77e-01

Pos logit contributions
 wow+0.093
unts+0.083
 ago+0.081
 fault+0.081
 telescope+0.081
Neg logit contributions
 convince-0.111
 hips-0.097
 injuries-0.092
bing-0.089
arians-0.088
Token was
Token ID373
Feature activation-2.49e-01

Pos logit contributions
 wow+0.008
 ago+0.007
unts+0.007
uzz+0.007
 happened+0.007
Neg logit contributions
 convince-0.011
 hips-0.010
 injuries-0.009
bing-0.009
ivering-0.009
Token so
Token ID523
Feature activation+5.03e-01

Pos logit contributions
 wow+0.008
 ago+0.007
unts+0.007
uzz+0.007
 fault+0.007
Neg logit contributions
 convince-0.009
 hips-0.008
 injuries-0.008
bing-0.007
 hire-0.007
Token excited
Token ID6568
Feature activation+9.08e-01

Pos logit contributions
 convince+0.018
 hips+0.014
 hire+0.013
 persuade+0.013
bing+0.013
Neg logit contributions
 wow-0.016
 telescope-0.016
 fault-0.015
 ago-0.015
 solar-0.015
Token!
Token ID0
Feature activation-6.27e-01

Pos logit contributions
 convince+0.033
 hips+0.029
 injuries+0.027
bing+0.026
 hire+0.026
Neg logit contributions
 wow-0.029
 ago-0.026
unts-0.026
uzz-0.026
 fault-0.025
Token and
Token ID290
Feature activation-4.36e-01

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.007
bing+0.006
ivering+0.006
Neg logit contributions
 wow-0.008
unts-0.007
 ago-0.007
Kat-0.007
uzz-0.007
Token forth
Token ID6071
Feature activation-6.22e-01

Pos logit contributions
 wow+0.018
unts+0.018
uzz+0.017
 ago+0.017
Kat+0.017
Neg logit contributions
 convince-0.011
 hips-0.009
 injuries-0.008
bing-0.008
eling-0.008
Token on
Token ID319
Feature activation-2.63e-01

Pos logit contributions
 wow+0.022
unts+0.020
 ago+0.020
Kat+0.020
uzz+0.020
Neg logit contributions
 convince-0.020
 hips-0.018
 injuries-0.016
bing-0.016
ivering-0.015
Token the
Token ID262
Feature activation+1.04e-02

Pos logit contributions
 wow+0.008
 ago+0.007
unts+0.007
 fault+0.007
uzz+0.007
Neg logit contributions
 convince-0.010
 hips-0.009
 injuries-0.008
bing-0.008
ivering-0.008
Token sees
Token ID7224
Feature activation-7.68e-01

Pos logit contributions
 convince+0.000
 hips+0.000
 injuries+0.000
bing+0.000
ivering+0.000
Neg logit contributions
 wow-0.000
 telescope-0.000
 ago-0.000
 fault-0.000
unts-0.000
Tokenaw
Token ID707
Feature activation-2.99e+00

Pos logit contributions
 wow+0.025
unts+0.023
uzz+0.023
 fault+0.023
 ago+0.023
Neg logit contributions
 convince-0.026
 hips-0.022
 injuries-0.022
bing-0.021
 hire-0.020
Token.
Token ID13
Feature activation-2.56e-01

Pos logit contributions
 wow+0.115
unts+0.106
uzz+0.105
 telescope+0.105
 ago+0.104
Neg logit contributions
 convince-0.087
 hips-0.074
 injuries-0.070
bing-0.067
ivering-0.065
Token They
Token ID1119
Feature activation-3.50e-01

Pos logit contributions
 wow+0.008
 telescope+0.007
 happened+0.007
 fault+0.007
unts+0.007
Neg logit contributions
 convince-0.009
 hips-0.008
-0.007
 hire-0.007
ittens-0.007
Token kept
Token ID4030
Feature activation-4.46e-01

Pos logit contributions
 wow+0.012
Kat+0.012
 ago+0.012
 fault+0.012
uzz+0.011
Neg logit contributions
 convince-0.011
 hips-0.010
 injuries-0.009
eling-0.009
 hire-0.009
Token singing
Token ID13777
Feature activation+4.52e-01

Pos logit contributions
 wow+0.016
 ago+0.015
Kat+0.015
unts+0.015
ocado+0.015
Neg logit contributions
 convince-0.014
 hips-0.012
bing-0.012
 injuries-0.012
eling-0.011
Token and
Token ID290
Feature activation-5.37e-01

Pos logit contributions
 convince+0.012
 hips+0.012
bing+0.010
 injuries+0.010
 before+0.010
Neg logit contributions
 fault-0.016
 wow-0.016
Kat-0.016
unts-0.016
 ago-0.016
Token of
Token ID286
Feature activation+4.66e-01

Pos logit contributions
 wow+0.024
unts+0.022
apped+0.022
 happened+0.022
 ago+0.021
Neg logit contributions
 convince-0.026
 hips-0.024
 injuries-0.022
bing-0.022
arians-0.021
Token them
Token ID606
Feature activation+5.38e-02

Pos logit contributions
 convince+0.015
 injuries+0.012
 hips+0.012
eling+0.012
bing+0.011
Neg logit contributions
 wow-0.017
 fault-0.016
 maj-0.015
 ago-0.015
unts-0.015
Token balanced
Token ID12974
Feature activation+4.09e-01

Pos logit contributions
 convince+0.002
 hips+0.002
 injuries+0.002
bing+0.002
me+0.002
Neg logit contributions
 wow-0.002
 fault-0.001
Kat-0.001
 ago-0.001
unts-0.001
Token the
Token ID262
Feature activation+6.67e-02

Pos logit contributions
 convince+0.015
 hips+0.013
 injuries+0.012
ivering+0.012
arians+0.012
Neg logit contributions
 wow-0.013
 ago-0.012
uzz-0.012
unts-0.011
 fault-0.011
Token sees
Token ID7224
Feature activation-1.10e+00

Pos logit contributions
 convince+0.003
 hips+0.002
 injuries+0.002
bing+0.002
arians+0.002
Neg logit contributions
 wow-0.002
 ago-0.002
 fault-0.002
 maj-0.002
 telescope-0.002
Tokenaw
Token ID707
Feature activation-2.99e+00

Pos logit contributions
 wow+0.028
Kat+0.025
unts+0.025
uzz+0.025
 ago+0.024
Neg logit contributions
 convince-0.045
 hips-0.040
 injuries-0.039
bing-0.038
eling-0.037
Token.
Token ID13
Feature activation-4.07e-02

Pos logit contributions
 wow+0.115
unts+0.107
 ago+0.105
uzz+0.105
 fault+0.105
Neg logit contributions
 convince-0.086
 hips-0.074
 injuries-0.069
bing-0.067
arians-0.064
Token They
Token ID1119
Feature activation+3.72e-01

Pos logit contributions
 wow+0.001
 happens+0.001
 happened+0.001
unts+0.001
 telescope+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
-0.001
 hire-0.001
bing-0.001
Token laughed
Token ID13818
Feature activation-1.34e-01

Pos logit contributions
 convince+0.013
 hips+0.010
eling+0.010
pper+0.010
 persuade+0.009
Neg logit contributions
 fault-0.013
 ago-0.012
Kat-0.012
 wow-0.012
 maj-0.012
Token and
Token ID290
Feature activation-4.53e-01

Pos logit contributions
 ago+0.005
 fault+0.005
Kat+0.005
unts+0.005
 wow+0.005
Neg logit contributions
 convince-0.003
 hips-0.003
bing-0.003
 before-0.002
 injuries-0.002
Token laughed
Token ID13818
Feature activation-1.48e-01

Pos logit contributions
 fault+0.015
 ago+0.014
 maj+0.014
Kat+0.014
 telescope+0.014
Neg logit contributions
 convince-0.016
 hips-0.014
bing-0.013
 injuries-0.012
eling-0.012
Token she
Token ID673
Feature activation-2.29e-01

Pos logit contributions
 wow+0.036
unts+0.033
 fault+0.033
 ago+0.032
uzz+0.032
Neg logit contributions
 convince-0.040
 hips-0.034
 injuries-0.032
bing-0.031
 hire-0.031
Token just
Token ID655
Feature activation-6.68e-01

Pos logit contributions
 wow+0.006
 fault+0.006
 ago+0.006
uzz+0.006
unts+0.006
Neg logit contributions
 convince-0.009
 hips-0.008
 injuries-0.008
bing-0.008
 hire-0.007
Token stood
Token ID6204
Feature activation-6.82e-02

Pos logit contributions
 fault+0.021
unts+0.020
Kat+0.019
ane+0.018
uzz+0.018
Neg logit contributions
 convince-0.023
 hips-0.021
 before-0.020
 resist-0.020
eling-0.019
Token there
Token ID612
Feature activation+4.12e-01

Pos logit contributions
 wow+0.002
unts+0.002
 ago+0.002
 fault+0.002
 happened+0.002
Neg logit contributions
 convince-0.002
 hips-0.002
 injuries-0.002
bing-0.002
ivering-0.002
Token g
Token ID308
Feature activation-3.30e-01

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.011
bing+0.011
 hire+0.011
Neg logit contributions
 wow-0.014
 ago-0.013
unts-0.013
uzz-0.013
 fault-0.013
Tokenaw
Token ID707
Feature activation-2.97e+00

Pos logit contributions
 wow+0.010
unts+0.008
Look+0.008
Kat+0.008
 telescope+0.008
Neg logit contributions
 convince-0.012
 hips-0.011
bing-0.011
ivering-0.011
itching-0.011
Tokenking
Token ID3364
Feature activation-1.20e-01

Pos logit contributions
 wow+0.112
unts+0.107
 fault+0.104
uzz+0.103
 telescope+0.103
Neg logit contributions
 convince-0.086
 hips-0.076
bing-0.070
 injuries-0.069
arians-0.065
Token at
Token ID379
Feature activation-7.74e-01

Pos logit contributions
 wow+0.004
unts+0.004
 fault+0.004
Kat+0.004
 ago+0.004
Neg logit contributions
 convince-0.004
 hips-0.003
bing-0.003
 injuries-0.003
 sleeve-0.003
Token the
Token ID262
Feature activation-1.88e-02

Pos logit contributions
 wow+0.021
 ago+0.019
uzz+0.019
 happened+0.018
 fault+0.018
Neg logit contributions
 convince-0.032
 hips-0.028
 injuries-0.027
ivering-0.026
 hire-0.026
Token beautiful
Token ID4950
Feature activation-5.96e-01

Pos logit contributions
 wow+0.001
 ago+0.000
unts+0.000
 maj+0.000
 telescope+0.000
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
bing-0.001
arians-0.001
Token gold
Token ID3869
Feature activation-8.20e-01

Pos logit contributions
 wow+0.017
uzz+0.015
unts+0.015
 fault+0.015
Kat+0.015
Neg logit contributions
 convince-0.023
 hips-0.021
bing-0.020
 injuries-0.020
me-0.019
Token was
Token ID373
Feature activation-1.29e-01

Pos logit contributions
 wow+0.001
 happened+0.000
 ago+0.000
uzz+0.000
unts+0.000
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
bing-0.001
arians-0.001
Token jumping
Token ID14284
Feature activation+5.84e-02

Pos logit contributions
 wow+0.005
 telescope+0.005
ocado+0.004
unts+0.004
uzz+0.004
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
bing-0.003
eling-0.003
Token on
Token ID319
Feature activation-2.60e-01

Pos logit contributions
 convince+0.002
 hips+0.002
bing+0.002
 injuries+0.002
 persuade+0.001
Neg logit contributions
 wow-0.002
unts-0.002
 ago-0.002
 fault-0.002
Kat-0.002
Token the
Token ID262
Feature activation-4.10e-02

Pos logit contributions
 wow+0.009
unts+0.008
 ago+0.008
 fault+0.008
uzz+0.008
Neg logit contributions
 convince-0.009
 hips-0.008
 injuries-0.007
bing-0.007
 hire-0.007
Token sees
Token ID7224
Feature activation-9.37e-01

Pos logit contributions
 wow+0.001
 ago+0.001
uzz+0.001
unts+0.001
 fault+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
bing-0.001
eling-0.001
Tokenaw
Token ID707
Feature activation-2.97e+00

Pos logit contributions
unts+0.030
 fault+0.028
 wow+0.028
 solar+0.027
uzz+0.027
Neg logit contributions
 convince-0.031
 hips-0.026
 injuries-0.026
bing-0.024
me-0.024
Token,
Token ID11
Feature activation-9.15e-02

Pos logit contributions
 wow+0.096
 ago+0.086
unts+0.086
uzz+0.085
 fault+0.085
Neg logit contributions
 convince-0.106
 hips-0.092
 injuries-0.088
bing-0.084
arians-0.082
Token she
Token ID673
Feature activation+1.92e-02

Pos logit contributions
unts+0.003
 fault+0.003
 wow+0.003
ocado+0.003
 ago+0.003
Neg logit contributions
 convince-0.003
 hips-0.002
 injuries-0.002
bing-0.002
 hire-0.002
Token accidentally
Token ID14716
Feature activation-2.16e-01

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
me+0.001
eling+0.001
Neg logit contributions
 wow-0.001
 fault-0.001
uzz-0.000
 ago-0.000
unts-0.000
Token it
Token ID340
Feature activation-4.55e-01

Pos logit contributions
 wow+0.007
 ago+0.006
unts+0.006
Kat+0.006
 fault+0.006
Neg logit contributions
 convince-0.008
 hips-0.007
 injuries-0.007
bing-0.006
eling-0.006
Token and
Token ID290
Feature activation-3.27e-01

Pos logit contributions
 happened+0.011
 wow+0.011
 happens+0.010
 ago+0.010
unts+0.010
Neg logit contributions
 convince-0.019
 hips-0.017
arians-0.016
 injuries-0.016
 negotiate-0.016
TokenAs
Token ID1722
Feature activation-5.97e-01

Pos logit contributions
 wow+0.021
 ago+0.019
unts+0.019
uzz+0.019
 fault+0.019
Neg logit contributions
 convince-0.014
 hips-0.012
 injuries-0.011
bing-0.011
 hire-0.010
Token the
Token ID262
Feature activation-3.12e-01

Pos logit contributions
 wow+0.019
 ago+0.017
unts+0.017
uzz+0.017
 fault+0.017
Neg logit contributions
 convince-0.022
 hips-0.019
 injuries-0.018
bing-0.017
ivering-0.017
Token light
Token ID1657
Feature activation-4.49e-02

Pos logit contributions
 wow+0.008
unts+0.007
 ago+0.007
uzz+0.007
 fault+0.007
Neg logit contributions
 convince-0.013
 hips-0.012
 injuries-0.011
bing-0.011
 hire-0.011
Tokenening
Token ID3101
Feature activation-3.80e-01

Pos logit contributions
 wow+0.001
unts+0.001
 fault+0.001
uzz+0.001
Kat+0.001
Neg logit contributions
 convince-0.002
 hips-0.001
 injuries-0.001
bing-0.001
 hire-0.001
Token f
Token ID277
Feature activation-9.68e-01

Pos logit contributions
 wow+0.013
unts+0.012
uzz+0.011
 fault+0.011
Kat+0.011
Neg logit contributions
 convince-0.013
 hips-0.011
bing-0.011
 injuries-0.011
 hire-0.010
Tokenlic
Token ID677
Feature activation-2.96e+00

Pos logit contributions
 wow+0.028
 fault+0.024
 telescope+0.024
unts+0.024
 many+0.023
Neg logit contributions
 convince-0.037
ivering-0.033
 hips-0.033
bing-0.033
me-0.031
Tokenkered
Token ID28970
Feature activation-1.13e+00

Pos logit contributions
 wow+0.090
 fault+0.080
 telescope+0.079
 ago+0.078
uzz+0.078
Neg logit contributions
 convince-0.112
 hips-0.097
bing-0.094
 hire-0.090
 injuries-0.090
Token away
Token ID1497
Feature activation-6.71e-01

Pos logit contributions
 wow+0.039
unts+0.036
 fault+0.036
Kat+0.035
 ago+0.035
Neg logit contributions
 convince-0.037
 hips-0.033
 injuries-0.030
bing-0.030
 hire-0.028
Token,
Token ID11
Feature activation-4.97e-01

Pos logit contributions
 wow+0.021
 ago+0.019
unts+0.019
uzz+0.019
 maj+0.019
Neg logit contributions
 convince-0.025
 hips-0.021
 injuries-0.020
bing-0.019
arians-0.019
Token the
Token ID262
Feature activation-2.41e-01

Pos logit contributions
 wow+0.016
 ago+0.014
unts+0.014
 happened+0.014
 telescope+0.013
Neg logit contributions
 convince-0.018
 hips-0.016
 injuries-0.015
bing-0.015
ivering-0.015
Token man
Token ID582
Feature activation-3.41e-01

Pos logit contributions
 wow+0.008
uzz+0.007
 fault+0.007
unts+0.007
Kat+0.007
Neg logit contributions
 convince-0.009
 hips-0.007
 injuries-0.007
bing-0.007
 hire-0.007
Token ball
Token ID2613
Feature activation+1.13e+00

Pos logit contributions
 wow+0.002
 ago+0.002
unts+0.002
uzz+0.002
 fault+0.002
Neg logit contributions
 convince-0.003
 hips-0.003
 injuries-0.003
bing-0.003
 hire-0.003
Token?"
Token ID1701
Feature activation-9.12e-02

Pos logit contributions
 convince+0.041
 hips+0.036
 injuries+0.035
bing+0.033
eling+0.033
Neg logit contributions
 wow-0.036
 ago-0.033
uzz-0.032
unts-0.031
 fault-0.031
Token
Token ID198
Feature activation+5.39e-02

Pos logit contributions
 wow+0.002
 ago+0.002
unts+0.002
uzz+0.002
 fault+0.002
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
bing-0.003
 hire-0.003
Token
Token ID198
Feature activation+5.69e-02

Pos logit contributions
 convince+0.002
 hips+0.002
 injuries+0.002
bing+0.002
me+0.002
Neg logit contributions
 wow-0.002
 ago-0.001
 fault-0.001
 happened-0.001
 telescope-0.001
TokenA
Token ID32
Feature activation-4.20e-01

Pos logit contributions
 convince+0.002
 hips+0.001
 injuries+0.001
 hire+0.001
bing+0.001
Neg logit contributions
 wow-0.002
 fault-0.002
 ago-0.002
 happened-0.002
 maj-0.002
Tokenlic
Token ID677
Feature activation-2.96e+00

Pos logit contributions
 fault+0.014
 wow+0.014
 maj+0.013
uzz+0.013
 ago+0.013
Neg logit contributions
 convince-0.015
 hips-0.012
 hire-0.011
 persuade-0.011
me-0.011
Tokenia
Token ID544
Feature activation-1.18e+00

Pos logit contributions
 wow+0.089
 fault+0.083
 ago+0.077
 telescope+0.076
Kat+0.073
Neg logit contributions
 convince-0.114
 hips-0.100
bing-0.094
me-0.093
 hire-0.091
Token nodded
Token ID14464
Feature activation-1.84e+00

Pos logit contributions
 fault+0.035
 wow+0.035
unts+0.033
 maj+0.032
uzz+0.032
Neg logit contributions
 convince-0.045
 hips-0.039
me-0.035
bing-0.035
 injuries-0.035
Token and
Token ID290
Feature activation-3.11e-01

Pos logit contributions
 wow+0.061
unts+0.055
 ago+0.054
 fault+0.054
 maj+0.053
Neg logit contributions
 convince-0.065
 hips-0.056
 injuries-0.053
bing-0.051
 hire-0.050
Token said
Token ID531
Feature activation-3.60e-01

Pos logit contributions
 fault+0.012
unts+0.011
 ago+0.011
uzz+0.011
 solar+0.011
Neg logit contributions
 convince-0.010
 hips-0.008
 persuade-0.007
 before-0.007
bing-0.007
Token,
Token ID11
Feature activation-1.68e-01

Pos logit contributions
unts+0.016
 fault+0.015
 ago+0.015
 solar+0.015
 telescope+0.015
Neg logit contributions
 convince-0.008
 hips-0.007
 help-0.006
bing-0.006
 before-0.005
Token it
Token ID340
Feature activation-5.20e-01

Pos logit contributions
 convince+0.005
 hips+0.005
ivering+0.005
 injuries+0.005
arians+0.005
Neg logit contributions
 wow-0.002
 ago-0.002
 happened-0.002
Kat-0.002
unts-0.002
Token.
Token ID13
Feature activation-4.80e-02

Pos logit contributions
 wow+0.016
 happened+0.016
 ago+0.016
 fault+0.015
 happens+0.015
Neg logit contributions
 convince-0.019
 hips-0.016
 injuries-0.015
arians-0.015
bing-0.015
Token
Token ID198
Feature activation-7.56e-02

Pos logit contributions
 wow+0.002
 telescope+0.002
 prope+0.002
 fault+0.001
 maj+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
-0.001
 hire-0.001
 sleeve-0.001
Token
Token ID198
Feature activation-7.96e-02

Pos logit contributions
 wow+0.003
 fault+0.002
 telescope+0.002
 ago+0.002
 maj+0.002
Neg logit contributions
 convince-0.003
 hips-0.002
me-0.002
 injuries-0.002
 persuade-0.002
TokenA
Token ID32
Feature activation-5.01e-01

Pos logit contributions
 wow+0.003
 fault+0.003
 ago+0.003
uzz+0.003
 telescope+0.003
Neg logit contributions
 convince-0.003
 hips-0.002
 injuries-0.002
 hire-0.002
me-0.002
Tokenlic
Token ID677
Feature activation-2.96e+00

Pos logit contributions
 wow+0.016
 fault+0.015
uzz+0.015
 ago+0.014
 maj+0.014
Neg logit contributions
 convince-0.018
 hips-0.016
 injuries-0.015
 hire-0.014
bing-0.014
Tokenia
Token ID544
Feature activation-1.21e+00

Pos logit contributions
 wow+0.082
 fault+0.079
 telescope+0.072
 ago+0.072
 maj+0.069
Neg logit contributions
 convince-0.118
 hips-0.103
me-0.097
bing-0.096
 hire-0.095
Token smiled
Token ID13541
Feature activation-2.46e-01

Pos logit contributions
 wow+0.033
 fault+0.030
uzz+0.030
unts+0.030
 maj+0.029
Neg logit contributions
 convince-0.049
 hips-0.043
 injuries-0.041
me-0.040
bing-0.040
Token while
Token ID981
Feature activation-2.06e-01

Pos logit contributions
 wow+0.009
unts+0.009
 fault+0.008
uzz+0.008
 ago+0.008
Neg logit contributions
 convince-0.007
 hips-0.006
bing-0.006
 injuries-0.006
ivering-0.006
Token she
Token ID673
Feature activation-2.21e-01

Pos logit contributions
 wow+0.006
 fault+0.005
unts+0.005
 ago+0.005
uzz+0.005
Neg logit contributions
 convince-0.008
 hips-0.007
 injuries-0.007
bing-0.006
 hire-0.006
Token looked
Token ID3114
Feature activation-2.56e-01

Pos logit contributions
 wow+0.006
 fault+0.006
uzz+0.005
 ago+0.005
 maj+0.005
Neg logit contributions
 convince-0.009
 hips-0.008
 injuries-0.008
 hire-0.007
me-0.007
Token Ch
Token ID609
Feature activation+3.13e-01

Pos logit contributions
 wow+0.000
 many+0.000
 happened+0.000
uzz+0.000
 ago+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
ivering-0.000
eling-0.000
Tokenir
Token ID343
Feature activation+1.71e+00

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.012
 persuade+0.012
 negotiate+0.011
Neg logit contributions
 ago-0.007
uzz-0.007
unts-0.007
 wow-0.007
ull-0.007
Tokenpy
Token ID9078
Feature activation-5.17e-01

Pos logit contributions
 convince+0.064
 hips+0.056
 injuries+0.053
 negotiate+0.050
bing+0.049
Neg logit contributions
 wow-0.052
 ago-0.049
unts-0.048
uzz-0.048
 happened-0.047
Token and
Token ID290
Feature activation-2.00e-01

Pos logit contributions
 wow+0.013
 ago+0.012
unts+0.012
 happened+0.012
 maj+0.011
Neg logit contributions
 convince-0.022
 hips-0.019
 injuries-0.019
bing-0.018
arians-0.018
Token Wh
Token ID854
Feature activation+1.61e+00

Pos logit contributions
 wow+0.007
Kat+0.007
unts+0.007
 fault+0.007
 ago+0.007
Neg logit contributions
 convince-0.006
 hips-0.005
 injuries-0.005
 hire-0.005
bing-0.005
Tokenisk
Token ID1984
Feature activation+5.84e+00

Pos logit contributions
 convince+0.052
 hips+0.043
 injuries+0.039
 persuade+0.038
bing+0.037
Neg logit contributions
uzz-0.058
 wow-0.057
unts-0.056
 fault-0.055
 ago-0.055
Tokeners
Token ID364
Feature activation+2.54e-01

Pos logit contributions
 convince+0.100
 hips+0.065
 injuries+0.061
bing+0.058
arians+0.056
Neg logit contributions
 wow-0.295
 telescope-0.280
 ago-0.280
unts-0.276
 fault-0.275
Token became
Token ID2627
Feature activation+2.53e-01

Pos logit contributions
 convince+0.011
 hips+0.010
 injuries+0.010
bing+0.009
arians+0.009
Neg logit contributions
 wow-0.006
 ago-0.005
unts-0.005
uzz-0.005
 happened-0.005
Token good
Token ID922
Feature activation-3.02e-01

Pos logit contributions
 convince+0.010
 hips+0.009
 injuries+0.008
arians+0.008
vation+0.008
Neg logit contributions
 wow-0.007
unts-0.007
 many-0.007
 fault-0.006
 happened-0.006
Token friends
Token ID2460
Feature activation-2.59e-01

Pos logit contributions
 wow+0.009
unts+0.008
 fault+0.008
 ago+0.008
 happened+0.008
Neg logit contributions
 convince-0.011
 hips-0.010
 injuries-0.009
 negotiate-0.009
bing-0.009
Token,
Token ID11
Feature activation-8.64e-02

Pos logit contributions
 wow+0.008
 ago+0.007
unts+0.007
uzz+0.007
 many+0.007
Neg logit contributions
 convince-0.010
 hips-0.008
 negotiate-0.008
eling-0.008
 injuries-0.008
Token slowly
Token ID6364
Feature activation-5.42e-01

Pos logit contributions
 convince+0.032
 hips+0.027
 injuries+0.026
arians+0.025
 hire+0.025
Neg logit contributions
 wow-0.028
uzz-0.026
unts-0.026
 ago-0.026
 maj-0.025
Token and
Token ID290
Feature activation-1.65e-01

Pos logit contributions
 wow+0.017
unts+0.016
uzz+0.015
Kat+0.015
 ago+0.015
Neg logit contributions
 convince-0.020
 hips-0.017
 injuries-0.016
bing-0.016
 hire-0.015
Token carefully
Token ID7773
Feature activation-5.31e-01

Pos logit contributions
 wow+0.005
unts+0.005
 ago+0.005
 fault+0.005
uzz+0.005
Neg logit contributions
 convince-0.006
 hips-0.005
 injuries-0.005
bing-0.005
 hire-0.004
Token to
Token ID284
Feature activation+1.26e-01

Pos logit contributions
 wow+0.017
 ago+0.015
unts+0.015
uzz+0.014
 fault+0.014
Neg logit contributions
 convince-0.020
 hips-0.017
 injuries-0.016
bing-0.016
 hire-0.015
Token Wh
Token ID854
Feature activation+1.33e+00

Pos logit contributions
 convince+0.004
 hips+0.004
ivering+0.004
arians+0.004
ider+0.004
Neg logit contributions
 wow-0.004
 many-0.004
 maj-0.003
 wonder-0.003
unts-0.003
Tokenisk
Token ID1984
Feature activation+5.37e+00

Pos logit contributions
 convince+0.042
 hips+0.033
bing+0.031
 persuade+0.031
 injuries+0.030
Neg logit contributions
 wow-0.049
 fault-0.048
unts-0.046
uzz-0.046
 maj-0.045
Tokeners
Token ID364
Feature activation+1.04e-01

Pos logit contributions
 convince+0.068
me+0.039
bing+0.037
 hips+0.035
 injuries+0.035
Neg logit contributions
 wow-0.293
 ago-0.286
 telescope-0.286
 amaz-0.280
 solar-0.278
Token.
Token ID13
Feature activation+9.96e-02

Pos logit contributions
 convince+0.003
 hips+0.003
 injuries+0.003
bing+0.003
ivering+0.003
Neg logit contributions
 wow-0.004
 ago-0.003
unts-0.003
uzz-0.003
 fault-0.003
Token Wh
Token ID854
Feature activation+1.32e+00

Pos logit contributions
 convince+0.004
 hips+0.004
 injuries+0.003
bing+0.003
 hire+0.003
Neg logit contributions
 wow-0.003
 telescope-0.002
Kat-0.002
 ago-0.002
unts-0.002
Tokenisk
Token ID1984
Feature activation+5.36e+00

Pos logit contributions
 convince+0.045
 hips+0.038
 persuade+0.034
 injuries+0.034
 hire+0.033
Neg logit contributions
 fault-0.045
uzz-0.044
 wow-0.043
unts-0.043
 happened-0.042
Tokeners
Token ID364
Feature activation+9.85e-02

Pos logit contributions
 convince+0.093
 hips+0.061
bing+0.056
 injuries+0.056
me+0.054
Neg logit contributions
 wow-0.272
 ago-0.260
 telescope-0.257
 fault-0.254
unts-0.254
Token care
Token ID1337
Feature activation+2.43e-01

Pos logit contributions
 convince+0.014
 hips+0.012
ivering+0.012
 injuries+0.012
bing+0.012
Neg logit contributions
 wow-0.010
 fault-0.009
 many-0.009
 happened-0.009
 ago-0.009
Token of
Token ID286
Feature activation+6.22e-01

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.006
bing+0.006
arians+0.006
Neg logit contributions
 wow-0.009
 ago-0.008
unts-0.008
uzz-0.008
 fault-0.008
Token the
Token ID262
Feature activation+8.50e-02

Pos logit contributions
 convince+0.028
 hips+0.025
ider+0.025
ivering+0.024
vation+0.024
Neg logit contributions
 wow-0.015
 many-0.013
 fault-0.013
 ago-0.012
uzz-0.012
Token rh
Token ID9529
Feature activation+1.55e+00

Pos logit contributions
 convince+0.004
 hips+0.003
bing+0.003
eling+0.003
pper+0.003
Neg logit contributions
 wow-0.002
 ago-0.002
 many-0.002
 fault-0.002
unts-0.002
Tokenin
Token ID259
Feature activation+4.16e-01

Pos logit contributions
 convince+0.083
 hips+0.080
 hind+0.070
 persuade+0.070
 injuries+0.069
Neg logit contributions
aw-0.024
ull-0.023
unts-0.022
od-0.021
ane-0.021
Tokenoc
Token ID420
Feature activation+5.09e+00

Pos logit contributions
 convince+0.023
 hips+0.022
 injuries+0.020
 persuade+0.019
itching+0.019
Neg logit contributions
om-0.005
 many-0.005
 temple-0.005
ocado-0.005
aw-0.005
Tokeneros
Token ID27498
Feature activation+7.36e-01

Pos logit contributions
 convince+0.183
 hips+0.147
 persuade+0.135
 hire+0.125
ivering+0.125
Neg logit contributions
 fault-0.169
 wow-0.168
aw-0.164
uzz-0.162
 ago-0.158
Token and
Token ID290
Feature activation-5.16e-02

Pos logit contributions
 convince+0.028
 hips+0.025
 injuries+0.023
 negotiate+0.022
eling+0.022
Neg logit contributions
 wow-0.021
 ago-0.021
 fault-0.020
 happened-0.019
 many-0.019
Token it
Token ID340
Feature activation-4.27e-01

Pos logit contributions
 wow+0.002
unts+0.001
uzz+0.001
 ago+0.001
Kat+0.001
Neg logit contributions
 convince-0.002
 hips-0.002
 injuries-0.002
bing-0.002
 hire-0.002
Token needed
Token ID2622
Feature activation+1.05e+00

Pos logit contributions
 wow+0.010
 happened+0.010
 happens+0.010
 fault+0.009
 maj+0.009
Neg logit contributions
 convince-0.019
 hips-0.017
 injuries-0.016
arians-0.016
ittens-0.016
Token to
Token ID284
Feature activation+9.91e-02

Pos logit contributions
 convince+0.048
 hips+0.043
ivering+0.042
bing+0.042
arians+0.041
Neg logit contributions
 wow-0.023
 happened-0.021
 many-0.021
 ago-0.021
 fault-0.020
Token that
Token ID326
Feature activation-9.51e-01

Pos logit contributions
 wow+0.001
 ago+0.001
 happened+0.001
uzz+0.001
 telescope+0.001
Neg logit contributions
 convince-0.002
 hips-0.001
 injuries-0.001
 hire-0.001
arians-0.001
Token day
Token ID1110
Feature activation+5.78e-01

Pos logit contributions
 wow+0.024
uzz+0.021
 ago+0.021
 fault+0.020
Kat+0.020
Neg logit contributions
 convince-0.042
 hips-0.036
 injuries-0.035
bing-0.035
me-0.033
Token on
Token ID319
Feature activation-3.16e-01

Pos logit contributions
 convince+0.025
 hips+0.022
 injuries+0.021
 negotiate+0.021
arians+0.021
Neg logit contributions
 wow-0.014
apped-0.013
 many-0.013
unts-0.013
 happened-0.013
Token,
Token ID11
Feature activation-2.77e-01

Pos logit contributions
 wow+0.006
 happened+0.006
 many+0.006
 ago+0.006
 happens+0.005
Neg logit contributions
 convince-0.016
 negotiate-0.014
 hips-0.014
ivering-0.014
eling-0.013
Token Wh
Token ID854
Feature activation+1.22e+00

Pos logit contributions
 wow+0.008
 many+0.008
 happened+0.007
unts+0.007
 happens+0.007
Neg logit contributions
 convince-0.011
 hips-0.010
ivering-0.009
 injuries-0.009
 sleeve-0.009
Tokenisk
Token ID1984
Feature activation+5.23e+00

Pos logit contributions
 convince+0.041
 hips+0.035
 persuade+0.031
 injuries+0.031
 negotiate+0.030
Neg logit contributions
uzz-0.041
unts-0.040
 fault-0.040
 wow-0.040
 ago-0.040
Tokeners
Token ID364
Feature activation-7.00e-02

Pos logit contributions
 convince+0.118
 hips+0.094
 injuries+0.086
bing+0.077
 hire+0.075
Neg logit contributions
 wow-0.239
unts-0.220
uzz-0.220
 ago-0.219
 maj-0.218
Token and
Token ID290
Feature activation-2.74e-01

Pos logit contributions
 wow+0.002
 fault+0.002
 ago+0.002
unts+0.002
uzz+0.002
Neg logit contributions
 convince-0.003
 hips-0.002
 injuries-0.002
bing-0.002
arians-0.002
Token Ch
Token ID609
Feature activation-1.86e-01

Pos logit contributions
 wow+0.007
 many+0.007
 Jas+0.007
uzz+0.007
unts+0.007
Neg logit contributions
 convince-0.011
 hips-0.010
ivering-0.009
vation-0.009
ider-0.009
Tokenir
Token ID343
Feature activation+7.91e-01

Pos logit contributions
uzz+0.004
 ago+0.004
unts+0.004
 fault+0.004
om+0.003
Neg logit contributions
 convince-0.009
 hips-0.008
 persuade-0.007
vation-0.007
 injuries-0.007
Tokenpy
Token ID9078
Feature activation-1.36e+00

Pos logit contributions
 convince+0.032
 hips+0.028
 injuries+0.027
 negotiate+0.025
+0.025
Neg logit contributions
 wow-0.022
uzz-0.021
unts-0.021
 ago-0.020
apped-0.020
Token to
Token ID284
Feature activation-2.36e-01

Pos logit contributions
 wow+0.016
unts+0.014
Kat+0.014
 ago+0.014
 happened+0.014
Neg logit contributions
 convince-0.022
 hips-0.019
 injuries-0.018
bing-0.018
 hire-0.017
Token see
Token ID766
Feature activation+5.81e-01

Pos logit contributions
 wow+0.008
unts+0.008
apped+0.008
 ago+0.007
 fault+0.007
Neg logit contributions
 convince-0.008
 hips-0.006
 injuries-0.006
 hire-0.006
 negotiate-0.006
Token the
Token ID262
Feature activation+9.22e-03

Pos logit contributions
 convince+0.019
 injuries+0.015
 hips+0.015
bing+0.014
place+0.014
Neg logit contributions
unts-0.019
 wow-0.019
 ago-0.019
uzz-0.018
apped-0.018
Token rh
Token ID9529
Feature activation+1.49e+00

Pos logit contributions
 convince+0.000
 hips+0.000
 injuries+0.000
bing+0.000
ivering+0.000
Neg logit contributions
 wow-0.000
 ago-0.000
unts-0.000
 fault-0.000
 telescope-0.000
Tokenin
Token ID259
Feature activation+4.13e-01

Pos logit contributions
 convince+0.072
 hips+0.067
 injuries+0.060
 persuade+0.059
 hind+0.058
Neg logit contributions
aw-0.029
ull-0.029
unts-0.029
 wow-0.028
apped-0.027
Tokenoc
Token ID420
Feature activation+4.83e+00

Pos logit contributions
 convince+0.019
 hips+0.019
 injuries+0.017
 persuade+0.016
 hind+0.016
Neg logit contributions
 wow-0.008
 many-0.008
 ago-0.008
ull-0.008
om-0.008
Tokeneros
Token ID27498
Feature activation+9.76e-01

Pos logit contributions
 convince+0.145
 hips+0.115
 injuries+0.104
 persuade+0.100
ivering+0.099
Neg logit contributions
 wow-0.191
uzz-0.176
 ago-0.176
 fault-0.176
 maj-0.174
Token.
Token ID13
Feature activation-1.07e-01

Pos logit contributions
 convince+0.035
 hips+0.031
 injuries+0.030
arians+0.028
ivering+0.028
Neg logit contributions
 wow-0.031
 ago-0.029
 happened-0.029
 many-0.027
 telescope-0.027
Token Tim
Token ID5045
Feature activation+1.00e+00

Pos logit contributions
 wow+0.003
 ago+0.002
unts+0.002
uzz+0.002
 maj+0.002
Neg logit contributions
 convince-0.005
 hips-0.004
 injuries-0.004
bing-0.004
ivering-0.004
Tokenmy
Token ID1820
Feature activation+1.11e+00

Pos logit contributions
 convince+0.040
 hips+0.035
 injuries+0.035
bing+0.033
arians+0.032
Neg logit contributions
 wow-0.027
 ago-0.025
 happened-0.024
unts-0.024
uzz-0.024
Token was
Token ID373
Feature activation+1.07e-01

Pos logit contributions
 convince+0.049
 hips+0.044
 injuries+0.043
bing+0.041
arians+0.040
Neg logit contributions
 wow-0.026
 ago-0.023
unts-0.023
 happened-0.022
uzz-0.022
Token to
Token ID284
Feature activation+1.88e-01

Pos logit contributions
 convince+0.007
 hips+0.006
 injuries+0.006
arians+0.006
 hire+0.006
Neg logit contributions
 wow-0.004
 fault-0.004
 ago-0.004
uzz-0.004
unts-0.004
Token use
Token ID779
Feature activation+1.27e-01

Pos logit contributions
arians+0.006
ider+0.006
vation+0.006
ns+0.005
ittens+0.005
Neg logit contributions
 wow-0.006
 wonder-0.005
 telescope-0.005
 many-0.005
 Jas-0.005
Token a
Token ID257
Feature activation-7.17e-02

Pos logit contributions
 convince+0.005
ivering+0.005
arians+0.005
 hips+0.005
me+0.005
Neg logit contributions
 wow-0.003
 fault-0.003
 many-0.003
 ago-0.003
 telescope-0.003
Token stool
Token ID38753
Feature activation+1.85e-01

Pos logit contributions
 wow+0.002
unts+0.002
 telescope+0.002
 ago+0.002
 happened+0.002
Neg logit contributions
 convince-0.003
 hips-0.002
arians-0.002
 injuries-0.002
bing-0.002
Token,
Token ID11
Feature activation+2.04e-01

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.006
bing+0.006
eling+0.006
Neg logit contributions
 wow-0.005
 ago-0.004
 fault-0.004
unts-0.004
uzz-0.004
Token but
Token ID475
Feature activation+1.61e+00

Pos logit contributions
 convince+0.006
ivering+0.005
arians+0.005
vation+0.005
 hire+0.005
Neg logit contributions
 wow-0.008
 ago-0.007
 many-0.007
 fault-0.007
 maj-0.007
Token it
Token ID340
Feature activation-4.57e-01

Pos logit contributions
 convince+0.073
 hips+0.065
 injuries+0.063
ivering+0.062
arians+0.061
Neg logit contributions
 wow-0.036
 ago-0.031
 happened-0.031
 happens-0.030
uzz-0.030
Token was
Token ID373
Feature activation-1.37e-01

Pos logit contributions
 wow+0.011
 happened+0.010
 happens+0.010
 fault+0.010
uzz+0.010
Neg logit contributions
 convince-0.019
 hips-0.017
 injuries-0.017
bing-0.016
arians-0.016
Token too
Token ID1165
Feature activation-1.10e-01

Pos logit contributions
 wow+0.005
 many+0.005
 happened+0.005
 ago+0.005
 fault+0.005
Neg logit contributions
 convince-0.004
arians-0.003
vation-0.003
 hips-0.003
 hire-0.003
Token far
Token ID1290
Feature activation+2.49e-01

Pos logit contributions
 wow+0.003
uzz+0.003
 ago+0.003
Kat+0.003
 telescope+0.003
Neg logit contributions
 convince-0.004
 hips-0.004
 injuries-0.004
bing-0.003
ivering-0.003
Token.
Token ID13
Feature activation+1.70e-01

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.006
arians+0.006
 hire+0.006
Neg logit contributions
 wow-0.009
 ago-0.008
 fault-0.008
 happened-0.008
unts-0.008
TokenL
Token ID43
Feature activation-6.98e-01

Pos logit contributions
 wow+0.002
unts+0.002
Kat+0.002
 many+0.002
 happens+0.002
Neg logit contributions
 convince-0.003
 hips-0.003
 injuries-0.003
arians-0.003
bing-0.003
Tokenily
Token ID813
Feature activation-9.48e-01

Pos logit contributions
 wow+0.016
unts+0.013
 telescope+0.013
 ago+0.013
 maj+0.013
Neg logit contributions
 convince-0.032
 hips-0.028
 injuries-0.027
ivering-0.026
bing-0.026
Token was
Token ID373
Feature activation-2.73e-01

Pos logit contributions
 wow+0.022
unts+0.019
 happened+0.019
apped+0.018
Kat+0.018
Neg logit contributions
 convince-0.042
 hips-0.038
 injuries-0.037
bing-0.035
arians-0.035
Token sad
Token ID6507
Feature activation-1.79e-01

Pos logit contributions
 wow+0.009
 ago+0.008
 telescope+0.008
unts+0.008
 maj+0.008
Neg logit contributions
 convince-0.009
 hips-0.008
 injuries-0.008
bing-0.007
ivering-0.007
Token,
Token ID11
Feature activation-7.45e-02

Pos logit contributions
 wow+0.006
unts+0.006
Kat+0.006
 telescope+0.006
 ago+0.006
Neg logit contributions
 convince-0.006
 hips-0.005
 injuries-0.005
bing-0.005
ivering-0.004
Token but
Token ID475
Feature activation+1.47e+00

Pos logit contributions
 wow+0.003
 ago+0.003
unts+0.003
 happened+0.003
 fault+0.003
Neg logit contributions
 convince-0.002
 hips-0.001
 injuries-0.001
arians-0.001
 hire-0.001
Token knew
Token ID2993
Feature activation-1.03e+00

Pos logit contributions
 convince+0.052
 hips+0.043
 injuries+0.042
bing+0.040
eling+0.039
Neg logit contributions
 wow-0.049
 fault-0.046
 ago-0.045
unts-0.045
uzz-0.045
Token she
Token ID673
Feature activation-8.66e-02

Pos logit contributions
 wow+0.032
unts+0.030
 ago+0.030
 maj+0.029
uzz+0.029
Neg logit contributions
 convince-0.038
 hips-0.032
 injuries-0.031
bing-0.030
 hire-0.029
Token had
Token ID550
Feature activation-3.75e-01

Pos logit contributions
 wow+0.002
 ago+0.002
unts+0.002
uzz+0.002
 fault+0.002
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
bing-0.003
 hire-0.003
Token to
Token ID284
Feature activation+1.15e-01

Pos logit contributions
 wow+0.013
 telescope+0.012
 ago+0.012
uzz+0.012
 solar+0.011
Neg logit contributions
 convince-0.013
 injuries-0.012
 hips-0.011
 resist-0.010
 hurt-0.010
Token face
Token ID1986
Feature activation-6.99e-01

Pos logit contributions
bing+0.004
 hips+0.004
arians+0.004
ns+0.003
 convince+0.003
Neg logit contributions
 wow-0.004
 wonder-0.003
unts-0.003
 fault-0.003
 many-0.003
Token's
Token ID338
Feature activation-5.37e-01

Pos logit contributions
uzz+0.061
Kat+0.056
 ago+0.056
 wow+0.055
year+0.053
Neg logit contributions
 convince-0.026
 hips-0.021
me-0.019
 first-0.018
 hire-0.017
Token nice
Token ID3621
Feature activation+8.74e-03

Pos logit contributions
 wow+0.014
 many+0.012
unts+0.011
 maj+0.011
 happened+0.011
Neg logit contributions
 convince-0.023
 hips-0.021
 injuries-0.020
ivering-0.020
bing-0.019
Token,
Token ID11
Feature activation+3.05e-01

Pos logit contributions
 convince+0.000
 hips+0.000
me+0.000
 hire+0.000
 myself+0.000
Neg logit contributions
Kat-0.000
 wow-0.000
uzz-0.000
 solar-0.000
 maj-0.000
Token Mia
Token ID32189
Feature activation+3.51e-01

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.012
bing+0.012
arians+0.011
Neg logit contributions
 wow-0.007
unts-0.006
 ago-0.006
uzz-0.006
 fault-0.006
Token,
Token ID11
Feature activation+3.53e-01

Pos logit contributions
 convince+0.010
 hips+0.009
 injuries+0.008
bing+0.008
ivering+0.007
Neg logit contributions
 wow-0.014
uzz-0.012
 ago-0.012
Kat-0.012
 telescope-0.012
Token but
Token ID475
Feature activation+1.76e+00

Pos logit contributions
 convince+0.010
 hips+0.009
 injuries+0.008
bing+0.007
me+0.007
Neg logit contributions
unts-0.013
 wow-0.013
Kat-0.013
uzz-0.013
apped-0.013
Token listen
Token ID6004
Feature activation-9.94e-01

Pos logit contributions
 convince+0.045
 hips+0.036
 injuries+0.033
bing+0.032
 hire+0.031
Neg logit contributions
 wow-0.073
unts-0.071
Kat-0.070
uzz-0.069
 fault-0.069
Token to
Token ID284
Feature activation+2.40e-01

Pos logit contributions
 wow+0.031
uzz+0.028
unts+0.027
apped+0.027
 telescope+0.027
Neg logit contributions
 convince-0.037
 hips-0.033
bing-0.030
 injuries-0.029
 before-0.028
Token my
Token ID616
Feature activation+1.42e-01

Pos logit contributions
 convince+0.007
 hips+0.006
 injuries+0.006
bing+0.005
 hire+0.005
Neg logit contributions
 wow-0.009
 ago-0.008
unts-0.008
uzz-0.008
 happened-0.008
Token drum
Token ID13026
Feature activation+3.43e-01

Pos logit contributions
 convince+0.005
 hips+0.005
 injuries+0.004
bing+0.004
 hire+0.004
Neg logit contributions
 wow-0.004
 ago-0.004
uzz-0.004
unts-0.004
Kat-0.004
Token!"
Token ID2474
Feature activation-1.35e-01

Pos logit contributions
 convince+0.011
 hips+0.009
 injuries+0.009
bing+0.009
ivering+0.008
Neg logit contributions
 wow-0.012
 ago-0.011
unts-0.011
 fault-0.011
uzz-0.011
Token the
Token ID262
Feature activation-1.01e-01

Pos logit contributions
 wow+0.020
 ago+0.017
uzz+0.017
unts+0.017
 happened+0.017
Neg logit contributions
 convince-0.023
 hips-0.020
 injuries-0.019
bing-0.019
ivering-0.019
Token boy
Token ID2933
Feature activation+3.11e-01

Pos logit contributions
 wow+0.002
uzz+0.002
 ago+0.002
unts+0.002
 fault+0.002
Neg logit contributions
 convince-0.005
 hips-0.004
 injuries-0.004
 hire-0.004
bing-0.004
Token if
Token ID611
Feature activation+1.34e-02

Pos logit contributions
 convince+0.012
 hips+0.011
 injuries+0.010
bing+0.010
 hire+0.010
Neg logit contributions
 wow-0.009
unts-0.008
 ago-0.008
 happened-0.008
 telescope-0.007
Token he
Token ID339
Feature activation-1.47e-01

Pos logit contributions
 convince+0.000
 hips+0.000
 injuries+0.000
 hire+0.000
bing+0.000
Neg logit contributions
 wow-0.001
unts-0.000
 ago-0.000
 fault-0.000
uzz-0.000
Token could
Token ID714
Feature activation-3.73e-01

Pos logit contributions
 wow+0.002
 happened+0.002
 happens+0.002
 ago+0.002
uzz+0.002
Neg logit contributions
 convince-0.008
 hips-0.007
 injuries-0.007
bing-0.007
ivering-0.007
Token help
Token ID1037
Feature activation+1.67e+00

Pos logit contributions
 wonder+0.011
 wow+0.011
 ago+0.011
uzz+0.010
unts+0.010
Neg logit contributions
eling-0.012
bing-0.012
ivering-0.012
 hips-0.012
izing-0.012
Token him
Token ID683
Feature activation+2.19e-03

Pos logit contributions
ider+0.082
vation+0.079
ivering+0.077
 convince+0.075
arians+0.075
Neg logit contributions
 it-0.024
 Jill-0.023
 many-0.023
od-0.021
 ago-0.021
Token.
Token ID13
Feature activation-6.68e-01

Pos logit contributions
 injuries+0.000
 convince+0.000
arians+0.000
vation+0.000
ittens+0.000
Neg logit contributions
 wow-0.000
 many-0.000
ull-0.000
 wonder-0.000
od-0.000
Token The
Token ID383
Feature activation+2.88e-01

Pos logit contributions
 wow+0.014
unts+0.012
apped+0.012
 ago+0.012
uzz+0.012
Neg logit contributions
 convince-0.031
 injuries-0.029
ivering-0.028
eling-0.028
 hips-0.027
Token boy
Token ID2933
Feature activation+3.99e-01

Pos logit contributions
 convince+0.014
 hips+0.012
 injuries+0.012
bing+0.012
arians+0.012
Neg logit contributions
 wow-0.006
 ago-0.005
unts-0.005
uzz-0.005
 fault-0.005
Token said
Token ID531
Feature activation-2.81e-01

Pos logit contributions
 convince+0.017
 injuries+0.016
 hips+0.016
bing+0.015
ivering+0.015
Neg logit contributions
 wow-0.009
 happened-0.008
uzz-0.008
apped-0.008
unts-0.008
Token She
Token ID1375
Feature activation-2.89e-01

Pos logit contributions
 convince+0.004
 hips+0.004
 injuries+0.003
 sleeve+0.003
 hire+0.003
Neg logit contributions
 wow-0.003
 telescope-0.002
 happened-0.002
 fault-0.002
ull-0.002
Token sees
Token ID7224
Feature activation-8.13e-01

Pos logit contributions
 wow+0.008
Kat+0.007
uzz+0.007
 fault+0.007
unts+0.007
Neg logit contributions
 convince-0.012
 hips-0.011
 injuries-0.010
eling-0.010
 hire-0.010
Token a
Token ID257
Feature activation-1.17e-01

Pos logit contributions
 wow+0.026
unts+0.025
Kat+0.025
 maj+0.025
 telescope+0.024
Neg logit contributions
 convince-0.026
 hips-0.023
 injuries-0.022
bing-0.021
eling-0.021
Token man
Token ID582
Feature activation-1.40e-01

Pos logit contributions
 wow+0.004
uzz+0.004
 fault+0.004
apped+0.004
Kat+0.004
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
 hire-0.003
 negotiate-0.003
Token get
Token ID651
Feature activation+7.64e-01

Pos logit contributions
 wow+0.005
 ago+0.004
unts+0.004
uzz+0.004
 happened+0.004
Neg logit contributions
 convince-0.005
 hips-0.004
 injuries-0.004
arians-0.004
 hind-0.004
Token out
Token ID503
Feature activation+1.70e+00

Pos logit contributions
 convince+0.030
 hips+0.026
 injuries+0.025
bing+0.024
arians+0.024
Neg logit contributions
 wow-0.022
 ago-0.020
unts-0.020
uzz-0.019
 maj-0.019
Token of
Token ID286
Feature activation+8.12e-01

Pos logit contributions
 convince+0.062
 hips+0.053
 injuries+0.049
arians+0.049
 negotiate+0.049
Neg logit contributions
 wow-0.054
 ago-0.051
unts-0.050
uzz-0.049
 happened-0.049
Token the
Token ID262
Feature activation+2.30e-01

Pos logit contributions
 convince+0.032
 hips+0.028
ivering+0.027
me+0.026
bing+0.026
Neg logit contributions
 wow-0.024
 ago-0.022
uzz-0.021
 fault-0.021
 happened-0.020
Token truck
Token ID7779
Feature activation+1.04e+00

Pos logit contributions
 convince+0.009
 hips+0.008
bing+0.007
eling+0.007
arians+0.007
Neg logit contributions
 wow-0.007
 ago-0.007
 telescope-0.006
 maj-0.006
unts-0.006
Token and
Token ID290
Feature activation+1.14e-01

Pos logit contributions
 convince+0.039
 hips+0.033
 injuries+0.032
eling+0.031
ittens+0.030
Neg logit contributions
 ago-0.031
 wow-0.031
unts-0.029
 happened-0.028
 telescope-0.028
Token walk
Token ID2513
Feature activation+3.64e-01

Pos logit contributions
 convince+0.004
 hips+0.003
 injuries+0.003
 hire+0.003
eling+0.003
Neg logit contributions
 wow-0.004
 fault-0.003
uzz-0.003
unts-0.003
Kat-0.003
Token's
Token ID338
Feature activation-7.90e-01

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
bing+0.001
arians+0.001
Neg logit contributions
 wow-0.001
 happened-0.000
unts-0.000
 ago-0.000
 fault-0.000
Token mom
Token ID1995
Feature activation-4.27e-01

Pos logit contributions
 wow+0.029
Kat+0.026
unts+0.026
 ago+0.026
uzz+0.025
Neg logit contributions
 convince-0.026
 hips-0.022
 injuries-0.022
 sleeve-0.020
 hire-0.019
Tokenmy
Token ID1820
Feature activation+3.12e-01

Pos logit contributions
 wow+0.013
unts+0.011
 happened+0.011
 ago+0.011
uzz+0.011
Neg logit contributions
 convince-0.016
 hips-0.014
 injuries-0.014
bing-0.013
arians-0.013
Token said
Token ID531
Feature activation-1.87e-01

Pos logit contributions
 convince+0.013
 hips+0.012
 injuries+0.011
bing+0.011
arians+0.011
Neg logit contributions
 wow-0.008
 happened-0.007
unts-0.007
 ago-0.007
 fault-0.007
Token,
Token ID11
Feature activation+1.72e-01

Pos logit contributions
 wow+0.006
unts+0.006
Kat+0.006
 telescope+0.006
uzz+0.006
Neg logit contributions
 convince-0.006
 hips-0.005
 injuries-0.005
bing-0.005
 sleeve-0.004
Token "
Token ID366
Feature activation-3.05e-01

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.007
bing+0.007
arians+0.007
Neg logit contributions
 wow-0.003
unts-0.003
 ago-0.003
 fault-0.003
uzz-0.003
TokenWe
Token ID1135
Feature activation+4.34e-02

Pos logit contributions
 wow+0.008
unts+0.008
Kat+0.008
 happened+0.007
uzz+0.007
Neg logit contributions
 convince-0.012
 hips-0.011
 injuries-0.011
bing-0.010
arians-0.010
Token can
Token ID460
Feature activation-6.45e-01

Pos logit contributions
 convince+0.001
 hips+0.001
 hire+0.001
 injuries+0.001
 negotiate+0.001
Neg logit contributions
 wow-0.002
 maj-0.002
Kat-0.002
uzz-0.002
 telescope-0.002
Token't
Token ID470
Feature activation+1.09e+00

Pos logit contributions
 wonder+0.020
 wow+0.019
 many+0.019
 It+0.018
unts+0.018
Neg logit contributions
 injuries-0.020
ns-0.020
eling-0.018
 hind-0.018
lish-0.018
Token keep
Token ID1394
Feature activation+6.26e-01

Pos logit contributions
 hips+0.040
 hind+0.040
eling+0.038
bing+0.038
arians+0.038
Neg logit contributions
 wonder-0.028
 wow-0.025
 nation-0.024
unts-0.024
od-0.024
Token the
Token ID262
Feature activation+2.06e-01

Pos logit contributions
 convince+0.030
 hips+0.027
ivering+0.027
 injuries+0.026
arians+0.026
Neg logit contributions
 wow-0.013
 many-0.011
 fault-0.010
 maj-0.010
 telescope-0.010
Token mummy
Token ID38567
Feature activation-2.57e-01

Pos logit contributions
 wow+0.020
 ago+0.017
unts+0.017
uzz+0.017
Kat+0.017
Neg logit contributions
 convince-0.030
 hips-0.026
 injuries-0.025
bing-0.024
ivering-0.024
Token replied
Token ID8712
Feature activation-2.28e-01

Pos logit contributions
 wow+0.007
 ago+0.007
unts+0.007
uzz+0.006
 happened+0.006
Neg logit contributions
 convince-0.010
 hips-0.009
 injuries-0.008
bing-0.008
arians-0.008
Token.
Token ID13
Feature activation-2.98e-01

Pos logit contributions
 wow+0.008
 ago+0.007
uzz+0.007
apped+0.007
unts+0.007
Neg logit contributions
 convince-0.008
 hips-0.006
 injuries-0.006
arians-0.006
eling-0.006
Token
Token ID198
Feature activation-3.00e-01

Pos logit contributions
 wow+0.009
 fault+0.008
 happens+0.008
unts+0.008
 telescope+0.008
Neg logit contributions
 convince-0.011
 hips-0.010
bing-0.009
-0.009
 hire-0.009
Token
Token ID198
Feature activation-3.07e-01

Pos logit contributions
 wow+0.009
 fault+0.008
 ago+0.008
 happens+0.008
 telescope+0.008
Neg logit contributions
 convince-0.012
 hips-0.010
 injuries-0.009
bing-0.009
me-0.009
TokenC
Token ID34
Feature activation+9.41e-02

Pos logit contributions
 wow+0.011
 fault+0.010
 ago+0.010
 happened+0.010
unts+0.010
Neg logit contributions
 convince-0.010
 hips-0.008
 injuries-0.008
bing-0.008
 hire-0.007
Tokenher
Token ID372
Feature activation+9.77e-01

Pos logit contributions
 convince+0.003
me+0.003
bing+0.003
ivering+0.003
 hire+0.003
Neg logit contributions
 wow-0.004
 ago-0.003
 happens-0.003
 happened-0.003
 fault-0.003
Tokenie
Token ID494
Feature activation+1.07e-01

Pos logit contributions
 convince+0.036
 hips+0.031
 injuries+0.029
bing+0.028
me+0.028
Neg logit contributions
 wow-0.031
 fault-0.029
 ago-0.028
unts-0.028
 solar-0.028
Token felt
Token ID2936
Feature activation+6.14e-02

Pos logit contributions
 convince+0.004
 hips+0.004
 injuries+0.003
bing+0.003
 hire+0.003
Neg logit contributions
 wow-0.003
 fault-0.003
unts-0.003
 ago-0.003
uzz-0.003
Token an
Token ID281
Feature activation+6.21e-01

Pos logit contributions
 convince+0.002
 hips+0.002
arians+0.002
 injuries+0.002
bing+0.002
Neg logit contributions
 wow-0.002
unts-0.002
 happened-0.002
 ago-0.002
 many-0.002
Token urge
Token ID14960
Feature activation+9.86e-01

Pos logit contributions
 convince+0.020
arians+0.018
 hips+0.018
 injuries+0.018
bing+0.017
Neg logit contributions
 wow-0.020
unts-0.020
 wonder-0.019
 ago-0.019
 fault-0.019
Token,"
Token ID553
Feature activation-2.99e-01

Pos logit contributions
 wow+0.016
unts+0.015
uzz+0.015
Kat+0.015
apped+0.014
Neg logit contributions
 convince-0.016
 hips-0.014
 injuries-0.013
bing-0.012
 hire-0.012
Token replied
Token ID8712
Feature activation-1.48e-01

Pos logit contributions
 wow+0.007
 happened+0.006
 happens+0.006
 many+0.005
unts+0.005
Neg logit contributions
 convince-0.014
 injuries-0.013
 hips-0.013
ivering-0.012
arians-0.012
Token the
Token ID262
Feature activation-3.70e-02

Pos logit contributions
 wow+0.003
 happened+0.003
ull+0.003
 many+0.003
apped+0.003
Neg logit contributions
 convince-0.007
ivering-0.006
 injuries-0.006
arians-0.006
ittens-0.006
Token mole
Token ID9411
Feature activation+4.70e-01

Pos logit contributions
 wow+0.001
 ago+0.001
unts+0.001
uzz+0.001
 fault+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
bing-0.001
 hire-0.001
Token.
Token ID13
Feature activation-2.05e-01

Pos logit contributions
 convince+0.017
 injuries+0.015
arians+0.014
 hips+0.014
vation+0.014
Neg logit contributions
 wow-0.014
uzz-0.013
 ago-0.013
ull-0.013
apped-0.013
Token
Token ID198
Feature activation-1.29e-01

Pos logit contributions
 wow+0.005
 ago+0.005
 telescope+0.005
 fault+0.005
unts+0.005
Neg logit contributions
 convince-0.009
 hips-0.008
 injuries-0.007
bing-0.007
arians-0.007
Token
Token ID198
Feature activation-1.31e-01

Pos logit contributions
 wow+0.003
 ago+0.003
 fault+0.003
unts+0.003
 telescope+0.003
Neg logit contributions
 convince-0.005
 hips-0.005
 injuries-0.005
bing-0.004
 hire-0.004
TokenThe
Token ID464
Feature activation+2.23e-01

Pos logit contributions
 wow+0.005
Kat+0.004
unts+0.004
uzz+0.004
 ago+0.004
Neg logit contributions
 convince-0.004
 hips-0.004
 injuries-0.004
bing-0.003
eling-0.003
Token rabbit
Token ID22746
Feature activation-8.36e-03

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.007
bing+0.006
arians+0.006
Neg logit contributions
 wow-0.007
 ago-0.007
unts-0.006
uzz-0.006
 telescope-0.006
Token had
Token ID550
Feature activation-2.85e-01

Pos logit contributions
 wow+0.000
 ago+0.000
unts+0.000
uzz+0.000
 happened+0.000
Neg logit contributions
 convince-0.000
 hips-0.000
 injuries-0.000
bing-0.000
arians-0.000
Token an
Token ID281
Feature activation+4.35e-01

Pos logit contributions
 wow+0.009
 telescope+0.008
uzz+0.008
apped+0.008
unts+0.008
Neg logit contributions
 convince-0.011
 hips-0.009
 injuries-0.009
 persuade-0.008
 hire-0.008
TokenEmily
Token ID48640
Feature activation-4.23e-01

Pos logit contributions
 wow+0.007
 ago+0.006
 fault+0.006
uzz+0.006
 maj+0.006
Neg logit contributions
 convince-0.007
 hips-0.006
 injuries-0.006
bing-0.005
 hire-0.005
Token and
Token ID290
Feature activation-3.88e-01

Pos logit contributions
 wow+0.014
 fault+0.013
unts+0.013
uzz+0.013
 ago+0.012
Neg logit contributions
 convince-0.015
 hips-0.013
me-0.012
bing-0.011
 hire-0.011
Token Tim
Token ID5045
Feature activation+5.89e-01

Pos logit contributions
 wow+0.012
unts+0.012
 maj+0.011
 fault+0.011
Kat+0.011
Neg logit contributions
 convince-0.014
 hips-0.012
 injuries-0.011
bing-0.011
 hire-0.011
Token spend
Token ID4341
Feature activation+4.88e-01

Pos logit contributions
 convince+0.022
 hips+0.019
 injuries+0.018
eling+0.018
me+0.017
Neg logit contributions
 wow-0.018
 fault-0.017
 ago-0.016
Kat-0.016
 maj-0.016
Token the
Token ID262
Feature activation+5.50e-02

Pos logit contributions
 convince+0.018
 hips+0.016
 injuries+0.015
eling+0.014
bing+0.014
Neg logit contributions
 wow-0.015
uzz-0.014
 fault-0.014
 maj-0.014
 telescope-0.014
Token rest
Token ID1334
Feature activation+2.39e-01

Pos logit contributions
 convince+0.002
 hips+0.002
 injuries+0.002
arians+0.002
 hire+0.002
Neg logit contributions
 wow-0.002
 happened-0.001
 many-0.001
 maj-0.001
 happens-0.001
Token of
Token ID286
Feature activation+4.47e-01

Pos logit contributions
 convince+0.008
 hips+0.007
 injuries+0.007
 negotiate+0.007
 hire+0.007
Neg logit contributions
 wow-0.008
 ago-0.007
uzz-0.007
unts-0.007
apped-0.007
Token the
Token ID262
Feature activation+6.84e-02

Pos logit contributions
 convince+0.016
 hips+0.014
 injuries+0.014
bing+0.013
ivering+0.013
Neg logit contributions
 wow-0.014
 ago-0.013
unts-0.013
uzz-0.013
 fault-0.012
Token day
Token ID1110
Feature activation+4.26e-01

Pos logit contributions
 convince+0.003
 hips+0.003
arians+0.003
 injuries+0.003
 hire+0.003
Neg logit contributions
 wow-0.001
 ago-0.001
 maj-0.001
 many-0.001
 happened-0.001
Token making
Token ID1642
Feature activation+1.83e-01

Pos logit contributions
 convince+0.016
 hips+0.014
 injuries+0.013
bing+0.013
 negotiate+0.013
Neg logit contributions
 wow-0.013
 ago-0.012
unts-0.012
 maj-0.011
 happened-0.011
Token her
Token ID607
Feature activation-6.57e-02

Pos logit contributions
 convince+0.006
 hips+0.005
 injuries+0.005
bing+0.005
eling+0.005
Neg logit contributions
 wow-0.006
 ago-0.006
uzz-0.006
 telescope-0.006
unts-0.005
Token she
Token ID673
Feature activation-3.86e-01

Pos logit contributions
 convince+0.012
 hips+0.010
 injuries+0.010
bing+0.009
 hire+0.009
Neg logit contributions
 wow-0.007
uzz-0.007
unts-0.007
Kat-0.007
apped-0.007
Token didn
Token ID1422
Feature activation+1.11e-01

Pos logit contributions
 wow+0.011
 ago+0.010
uzz+0.010
unts+0.010
 telescope+0.010
Neg logit contributions
 convince-0.015
 hips-0.014
 injuries-0.013
me-0.012
bing-0.012
Token't
Token ID470
Feature activation+9.92e-01

Pos logit contributions
 convince+0.003
 hips+0.002
 injuries+0.002
eling+0.002
ivering+0.002
Neg logit contributions
 happens-0.005
unts-0.005
 wow-0.005
 happened-0.005
 many-0.004
Token sing
Token ID1702
Feature activation+8.72e-01

Pos logit contributions
 convince+0.037
 hips+0.030
 injuries+0.029
 hire+0.029
 persuade+0.028
Neg logit contributions
 wow-0.032
 maj-0.028
 telescope-0.028
apped-0.028
 ago-0.028
Token well
Token ID880
Feature activation-6.19e-01

Pos logit contributions
 convince+0.028
 hips+0.024
 injuries+0.023
bing+0.022
 hire+0.021
Neg logit contributions
 wow-0.031
 ago-0.028
unts-0.028
uzz-0.028
 telescope-0.028
Token.
Token ID13
Feature activation-1.89e-01

Pos logit contributions
 wow+0.023
 ago+0.021
unts+0.021
 fault+0.020
uzz+0.020
Neg logit contributions
 convince-0.020
 hips-0.016
 injuries-0.016
bing-0.015
 hire-0.015
Token But
Token ID887
Feature activation+1.44e+00

Pos logit contributions
 wow+0.007
ull+0.007
 telescope+0.007
unts+0.006
 happens+0.006
Neg logit contributions
 convince-0.006
 hips-0.005
-0.005
 before-0.004
 hire-0.004
Token her
Token ID607
Feature activation-1.25e-01

Pos logit contributions
 convince+0.048
 hips+0.040
 injuries+0.038
bing+0.036
 hire+0.035
Neg logit contributions
 wow-0.051
unts-0.049
 fault-0.047
 maj-0.047
 ago-0.046
Token parents
Token ID3397
Feature activation-8.77e-01

Pos logit contributions
 wow+0.003
uzz+0.003
 ago+0.003
unts+0.003
 maj+0.003
Neg logit contributions
 convince-0.005
 hips-0.005
 injuries-0.005
bing-0.004
 sleeve-0.004
Token told
Token ID1297
Feature activation+5.67e-02

Pos logit contributions
 wow+0.024
unts+0.021
 ago+0.021
 happened+0.021
uzz+0.021
Neg logit contributions
 convince-0.035
 hips-0.031
 injuries-0.030
bing-0.029
arians-0.028
Token her
Token ID607
Feature activation-1.41e-01

Pos logit contributions
 convince+0.002
 hips+0.002
 injuries+0.002
bing+0.002
arians+0.002
Neg logit contributions
 wow-0.001
 ago-0.001
 happened-0.001
unts-0.001
 fault-0.001
Token one
Token ID530
Feature activation+1.05e+00

Pos logit contributions
 convince+0.037
 hips+0.032
 injuries+0.030
bing+0.029
 hire+0.028
Neg logit contributions
 wow-0.038
unts-0.035
 ago-0.034
uzz-0.034
 fault-0.034
Token day
Token ID1110
Feature activation+4.48e-01

Pos logit contributions
 convince+0.043
 hips+0.038
 injuries+0.036
bing+0.035
 hire+0.034
Neg logit contributions
 wow-0.029
 ago-0.025
unts-0.025
uzz-0.025
 fault-0.025
Token,
Token ID11
Feature activation-1.23e-01

Pos logit contributions
 convince+0.016
 hips+0.014
 injuries+0.013
bing+0.013
 hire+0.013
Neg logit contributions
 wow-0.014
 ago-0.013
unts-0.013
uzz-0.013
 fault-0.013
Token the
Token ID262
Feature activation+4.59e-02

Pos logit contributions
 wow+0.004
unts+0.004
 ago+0.004
uzz+0.004
 fault+0.004
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
 hire-0.003
bing-0.003
Token colorful
Token ID20239
Feature activation-1.41e-01

Pos logit contributions
 convince+0.002
 hips+0.002
 injuries+0.002
bing+0.001
 hire+0.001
Neg logit contributions
 wow-0.001
uzz-0.001
unts-0.001
Kat-0.001
 ago-0.001
Token metal
Token ID6147
Feature activation-1.79e+00

Pos logit contributions
uzz+0.005
 fault+0.005
 wow+0.005
Kat+0.005
unts+0.005
Neg logit contributions
 convince-0.004
 hips-0.004
 hire-0.004
me-0.003
 injuries-0.003
Token started
Token ID2067
Feature activation-9.64e-01

Pos logit contributions
 wow+0.056
Kat+0.051
unts+0.051
uzz+0.051
 ago+0.050
Neg logit contributions
 convince-0.065
 hips-0.057
 injuries-0.053
 hire-0.051
bing-0.051
Token to
Token ID284
Feature activation+3.04e-02

Pos logit contributions
 wow+0.026
unts+0.023
 ago+0.023
uzz+0.023
Kat+0.022
Neg logit contributions
 convince-0.039
 hips-0.035
 injuries-0.034
bing-0.032
 hire-0.032
Token tell
Token ID1560
Feature activation+1.06e-01

Pos logit contributions
 convince+0.001
 hips+0.001
 injuries+0.001
bing+0.001
arians+0.001
Neg logit contributions
 wow-0.001
 ago-0.001
unts-0.001
uzz-0.001
 maj-0.001
Token stories
Token ID3923
Feature activation-7.72e-01

Pos logit contributions
 convince+0.004
 hips+0.003
 injuries+0.003
bing+0.003
ivering+0.003
Neg logit contributions
 wow-0.003
 ago-0.003
unts-0.003
uzz-0.003
 fault-0.003
Token too
Token ID1165
Feature activation-9.38e-02

Pos logit contributions
 wow+0.025
unts+0.023
uzz+0.023
 ago+0.023
 maj+0.023
Neg logit contributions
 convince-0.027
 hips-0.023
 injuries-0.022
bing-0.021
arians-0.021
Token she
Token ID673
Feature activation-2.52e-02

Pos logit contributions
 convince+0.011
 hips+0.009
 persuade+0.008
bing+0.008
 hire+0.008
Neg logit contributions
 fault-0.011
 wow-0.011
ocado-0.010
unts-0.010
 ago-0.010
Token asked
Token ID1965
Feature activation-1.26e-01

Pos logit contributions
 wow+0.001
 ago+0.001
unts+0.001
uzz+0.001
 happened+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
bing-0.001
arians-0.001
Token.
Token ID13
Feature activation-2.39e-01

Pos logit contributions
 wow+0.005
 ago+0.004
uzz+0.004
unts+0.004
 happened+0.004
Neg logit contributions
 convince-0.004
 hips-0.003
 injuries-0.003
bing-0.003
arians-0.003
Token
Token ID198
Feature activation-5.15e-02

Pos logit contributions
 fault+0.009
 wow+0.008
aw+0.008
ocado+0.008
unts+0.008
Neg logit contributions
 convince-0.008
 hips-0.007
-0.006
 persuade-0.006
't-0.006
Token
Token ID198
Feature activation-5.33e-02

Pos logit contributions
 wow+0.002
 fault+0.002
 ago+0.001
 telescope+0.001
 solar+0.001
Neg logit contributions
 convince-0.002
 hips-0.002
me-0.001
bing-0.001
 injuries-0.001
TokenM
Token ID44
Feature activation-1.59e+00

Pos logit contributions
 fault+0.002
 wow+0.002
 ago+0.002
unts+0.002
 happened+0.002
Neg logit contributions
 convince-0.002
 hips-0.001
me-0.001
bing-0.001
 persuade-0.001
Tokenum
Token ID388
Feature activation-7.36e-01

Pos logit contributions
 wow+0.062
 solar+0.056
 fault+0.054
ocado+0.054
 Wow+0.053
Neg logit contributions
 convince-0.045
 injuries-0.037
 hips-0.036
ittens-0.036
ivering-0.035
Token pointed
Token ID6235
Feature activation-5.17e-01

Pos logit contributions
 wow+0.022
 fault+0.021
 ago+0.020
unts+0.020
 solar+0.020
Neg logit contributions
 convince-0.028
 hips-0.024
 injuries-0.023
bing-0.022
me-0.022
Token to
Token ID284
Feature activation-3.04e-02

Pos logit contributions
 wow+0.018
unts+0.017
 fault+0.016
Kat+0.016
 galaxy+0.016
Neg logit contributions
 convince-0.016
 hips-0.015
 before-0.013
 sleeve-0.013
bing-0.013
Token a
Token ID257
Feature activation-2.13e-01

Pos logit contributions
 fault+0.001
unts+0.001
 wow+0.001
ocado+0.001
apped+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
 persuade-0.001
 hire-0.001
Token huge
Token ID3236
Feature activation-5.71e-01

Pos logit contributions
 wow+0.007
uzz+0.006
 fault+0.006
apped+0.006
unts+0.006
Neg logit contributions
 convince-0.008
 hips-0.006
 injuries-0.006
bing-0.006
 negotiate-0.006
Token.
Token ID13
Feature activation-3.69e-02

Pos logit contributions
 wow+0.007
 many+0.006
 ago+0.006
 fault+0.006
 happened+0.006
Neg logit contributions
 convince-0.009
 hips-0.008
arians-0.008
ivering-0.008
 hire-0.008
Token He
Token ID679
Feature activation+3.76e-01

Pos logit contributions
 wow+0.001
 happened+0.001
 telescope+0.001
 maj+0.001
 happens+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
-0.001
 hire-0.001
Token traveled
Token ID14113
Feature activation+9.02e-01

Pos logit contributions
 convince+0.015
 hips+0.013
eling+0.012
 injuries+0.012
me+0.012
Neg logit contributions
 wow-0.011
 fault-0.010
unts-0.010
 ago-0.010
uzz-0.010
Token around
Token ID1088
Feature activation-2.39e-01

Pos logit contributions
 convince+0.032
 hips+0.027
 injuries+0.025
bing+0.024
 persuade+0.023
Neg logit contributions
 wow-0.030
unts-0.028
 ago-0.028
 fault-0.027
 happened-0.027
Token for
Token ID329
Feature activation+3.78e-01

Pos logit contributions
 wow+0.008
unts+0.008
 fault+0.007
uzz+0.007
 ago+0.007
Neg logit contributions
 convince-0.008
 hips-0.007
 injuries-0.006
bing-0.006
eling-0.006
Token days
Token ID1528
Feature activation-1.88e+00

Pos logit contributions
 convince+0.015
 hips+0.013
 injuries+0.013
bing+0.012
eling+0.012
Neg logit contributions
 wow-0.010
unts-0.009
 ago-0.009
uzz-0.009
 fault-0.009
Token and
Token ID290
Feature activation-8.34e-01

Pos logit contributions
 wow+0.063
unts+0.057
uzz+0.056
 fault+0.056
 maj+0.056
Neg logit contributions
 convince-0.065
 hips-0.057
 injuries-0.054
bing-0.052
arians-0.050
Token days
Token ID1528
Feature activation-1.87e+00

Pos logit contributions
 wow+0.029
 fault+0.028
unts+0.027
uzz+0.027
 maj+0.027
Neg logit contributions
 convince-0.028
 injuries-0.023
 hips-0.022
bing-0.022
eling-0.022
Token until
Token ID1566
Feature activation+3.21e-03

Pos logit contributions
 wow+0.064
unts+0.058
 ago+0.057
uzz+0.057
 fault+0.057
Neg logit contributions
 convince-0.064
 hips-0.055
 injuries-0.052
bing-0.050
arians-0.049
Token he
Token ID339
Feature activation-4.31e-02

Pos logit contributions
 convince+0.000
 injuries+0.000
 hips+0.000
+0.000
bing+0.000
Neg logit contributions
 wow-0.000
unts-0.000
 fault-0.000
uzz-0.000
ocado-0.000
Token reached
Token ID4251
Feature activation-4.54e-01

Pos logit contributions
 wow+0.002
 fault+0.001
unts+0.001
uzz+0.001
 telescope+0.001
Neg logit contributions
 convince-0.001
me-0.001
 injuries-0.001
 before-0.001
 hips-0.001
Token,
Token ID11
Feature activation-3.13e-01

Pos logit contributions
 convince+0.022
 hips+0.019
 hire+0.018
arians+0.018
ivering+0.018
Neg logit contributions
 wow-0.016
 ago-0.016
 happened-0.015
 fault-0.014
 many-0.014
Token so
Token ID523
Feature activation+5.06e-01

Pos logit contributions
 wow+0.012
 ago+0.011
 happened+0.011
unts+0.011
 telescope+0.011
Neg logit contributions
 convince-0.009
 hips-0.008
 injuries-0.007
bing-0.007
arians-0.007
Token he
Token ID339
Feature activation+8.04e-02

Pos logit contributions
 convince+0.016
 hips+0.014
 injuries+0.013
bing+0.013
 hire+0.012
Neg logit contributions
 wow-0.018
 ago-0.017
unts-0.017
uzz-0.016
 fault-0.016
Token pushed
Token ID7121
Feature activation-1.26e-01

Pos logit contributions
 convince+0.003
 hips+0.003
 injuries+0.003
bing+0.003
arians+0.003
Neg logit contributions
 wow-0.002
 ago-0.002
unts-0.002
uzz-0.002
 happened-0.002
Token the
Token ID262
Feature activation-1.18e-02

Pos logit contributions
 wow+0.003
 many+0.003
ull+0.003
uzz+0.003
 Jas+0.002
Neg logit contributions
 convince-0.005
arians-0.005
 hips-0.005
ivering-0.005
ider-0.005
Token door
Token ID3420
Feature activation-1.29e+00

Pos logit contributions
 wow+0.000
 telescope+0.000
 happened+0.000
 maj+0.000
 ago+0.000
Neg logit contributions
 convince-0.001
 hips-0.000
 injuries-0.000
eling-0.000
arians-0.000
Token open
Token ID1280
Feature activation+2.24e-01

Pos logit contributions
 wow+0.041
 ago+0.038
unts+0.037
uzz+0.037
 telescope+0.037
Neg logit contributions
 convince-0.046
 hips-0.039
 injuries-0.038
arians-0.038
eling-0.038
Token and
Token ID290
Feature activation-6.95e-01

Pos logit contributions
 convince+0.007
 hips+0.006
 injuries+0.006
bing+0.005
 hire+0.005
Neg logit contributions
 wow-0.008
unts-0.008
 fault-0.007
uzz-0.007
 happened-0.007
Token hopped
Token ID45230
Feature activation-4.61e-01

Pos logit contributions
 wow+0.020
unts+0.019
 fault+0.019
 ago+0.019
uzz+0.018
Neg logit contributions
 convince-0.027
 hips-0.023
 injuries-0.022
bing-0.021
 hire-0.021
Token inside
Token ID2641
Feature activation-4.67e-01

Pos logit contributions
 wow+0.017
 fault+0.016
unts+0.015
 ago+0.015
 happened+0.015
Neg logit contributions
 convince-0.015
 hips-0.013
 injuries-0.012
bing-0.011
 persuade-0.011
Token.
Token ID13
Feature activation-3.40e-01

Pos logit contributions
 wow+0.016
 ago+0.015
unts+0.014
uzz+0.014
 maj+0.014
Neg logit contributions
 convince-0.016
 hips-0.013
 injuries-0.013
arians-0.012
bing-0.012
Tokenily
Token ID813
Feature activation-8.81e-01

Pos logit contributions
 wow+0.016
 telescope+0.013
 solar+0.013
unts+0.013
 maj+0.013
Neg logit contributions
 convince-0.025
 hips-0.022
 injuries-0.021
ivering-0.021
me-0.020
Token and
Token ID290
Feature activation-2.85e-02

Pos logit contributions
 wow+0.030
 ago+0.027
 fault+0.026
uzz+0.026
unts+0.026
Neg logit contributions
 convince-0.031
 hips-0.027
 injuries-0.025
bing-0.024
ivering-0.024
Token Ben
Token ID3932
Feature activation-2.01e-01

Pos logit contributions
 wow+0.001
 ago+0.001
unts+0.001
 maj+0.001
 fault+0.001
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
bing-0.001
 sleeve-0.001
Token looked
Token ID3114
Feature activation+2.76e-02

Pos logit contributions
 wow+0.005
 fault+0.005
 ago+0.004
uzz+0.004
unts+0.004
Neg logit contributions
 convince-0.009
 hips-0.008
 injuries-0.007
bing-0.007
ivering-0.007
Token at
Token ID379
Feature activation-5.39e-01

Pos logit contributions
 convince+0.001
 hips+0.001
bing+0.001
 injuries+0.001
ivering+0.001
Neg logit contributions
 wow-0.001
Kat-0.001
 ago-0.001
uzz-0.001
 fault-0.001
Token each
Token ID1123
Feature activation-1.35e+00

Pos logit contributions
 wow+0.014
 ago+0.012
unts+0.012
uzz+0.012
 fault+0.012
Neg logit contributions
 convince-0.023
 hips-0.020
 injuries-0.019
bing-0.019
arians-0.018
Token other
Token ID584
Feature activation-5.46e-01

Pos logit contributions
 many+0.044
unts+0.038
 wonder+0.035
 telescope+0.035
Kat+0.034
Neg logit contributions
arians-0.051
ider-0.049
 injuries-0.048
-0.048
 hips-0.047
Token.
Token ID13
Feature activation+3.43e-01

Pos logit contributions
 wow+0.017
 ago+0.016
unts+0.016
 fault+0.016
uzz+0.015
Neg logit contributions
 convince-0.020
 hips-0.016
 injuries-0.016
arians-0.016
 hire-0.015
Token They
Token ID1119
Feature activation-1.77e-02

Pos logit contributions
 hips+0.012
 convince+0.012
+0.012
 sleeve+0.011
 injuries+0.011
Neg logit contributions
 wow-0.010
Kat-0.008
 telescope-0.008
 happened-0.008
aw-0.008
Token both
Token ID1111
Feature activation+2.44e-01

Pos logit contributions
 wow+0.001
 fault+0.001
 ago+0.000
uzz+0.000
 happens+0.000
Neg logit contributions
 convince-0.001
 hips-0.001
 injuries-0.001
eling-0.001
 sleeve-0.001
Token wanted
Token ID2227
Feature activation-7.00e-01

Pos logit contributions
 convince+0.009
 hips+0.008
eling+0.008
 injuries+0.007
 resist+0.007
Neg logit contributions
 wow-0.007
 fault-0.007
 ago-0.007
 telescope-0.007
uzz-0.007