TOP ACTIVATIONS
MAX = 0.062

perform tricks to make his wife and cat happy . His
and the mum gave her daughter lots of hugs .
outside , a strange thought came to her mind . She
she wanted to give her mom a flower .
happy that he made his wife laugh . They hugged and
It looks great ! Now off we go on our walk
Finally , Dave 's box came back down to the ground
counted down slowly and gave everyone an equal chance .
laughed . He wanted his sister to get up so they
she could and gave Grand ma the biggest hug ever .
shiny toy that makes his dad 's face smooth . He
, you ! Leave those mice alone ! They are my
She smiled and gave the mouse a little pat on its
, he heard a voice calling his name from behind .
. " Look out !" Tom shouted . He
walk outside to give his neighbours a big surprise .
Grand mother would give her granddaughter a file . The file
she decided to make her mother a healthy snack . She
with his foot . The hut fell down . He was
able to make his little sister feel better with a hug

MIN ACTIVATIONS
MIN = -0.065

or herbs that the mole doesn 't like . Or we
mad at us . Please don 't hurt us ."
has big words that I don 't know . But we
naughty and bad . Please don 't be mad at us
. " I don 't know ," Ben says
wanted to touch the ost ric hes . They reached out
tells her the ones she doesn 't know . They have
. " I don 't know . Maybe it
Lily says , " I don 't know , Ben .
. " I don 't know . Maybe they
?" " I don 't know , but we
is green . But I don 't know everything ."
will be worried if they don 't see us ."
, " No , I don 't have any cheese ."
, " No , I don 't . How were you
said " no , please don 't touch the crocod ile
. " I don 't know ," Ben says
. " I don 't know . Maybe .
?" " I don 't know yet . I
voice say , " Please don 't touch me !" Tim

INTERVAL 0.031 - 0.062
CONTAINS 0.387%

" Mom , mom , where are you ?"
rules and she happily skipped away with her toe - w
day , the man looked out the window and saw a
my made sure to keep a closer eye on his beloved
he shouted , " Look Lily , I found your pretty

INTERVAL -0.001 - 0.031
CONTAINS 53.954%

them and taste them . One day , Jen
, " You 're right , Lily ! My collar is
Tom my smiled and started looking for some pants as
be her friend . <|endoftext|> S ara loved to play with
No , I don 't . You have boring toys and

INTERVAL -0.033 - -0.001
CONTAINS 45.323%

honey . They also sampled some crunch y chips and some
, stay over there ". M ia was troubled
my toy . You can play with something else ."
played with balls , ran around , and even made a
ily took the orange and started to wave it around ,

INTERVAL -0.065 - -0.033
CONTAINS 0.336%

some cake too . Lily didn 't want to share and
Her mom my told her that it came from a special
box and showed them what was inside . It was a
delicious drink inside . He couldn 't believe how good it
rot named Polly flew by and saw Max drinking the lime
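
The four INTERVAL / CONTAINS pairs above can be read as a coarse histogram of this feature's activations. The following is a minimal sketch, not part of the original dashboard, that parses those pairs and checks two things: that the bins tile the range from MIN to MAX without gaps, and that the percentages add up to 100. The names `SUMMARY`, `INTERVAL_RE` and `parse_intervals` are hypothetical, and the text in `SUMMARY` is copied from the section above with the example sequences left out.

```python
import re

# INTERVAL / CONTAINS lines copied from the dashboard dump above
# (example sequences omitted).
SUMMARY = """\
INTERVAL 0.031 - 0.062
CONTAINS 0.387%
INTERVAL -0.001 - 0.031
CONTAINS 53.954%
INTERVAL -0.033 - -0.001
CONTAINS 45.323%
INTERVAL -0.065 - -0.033
CONTAINS 0.336%
"""

# One match per bin: lower edge, upper edge, share of tokens in percent.
INTERVAL_RE = re.compile(
    r"INTERVAL (-?\d+\.\d+) - (-?\d+\.\d+)\nCONTAINS (\d+\.\d+)%"
)


def parse_intervals(text):
    """Return a list of (low, high, percent) tuples in the order they appear."""
    return [
        (float(low), float(high), float(pct))
        for low, high, pct in INTERVAL_RE.findall(text)
    ]


bins = parse_intervals(SUMMARY)

# Bins are listed from highest to lowest, so each bin's lower edge
# should equal the next bin's upper edge.
contiguous = all(bins[i][0] == bins[i + 1][1] for i in range(len(bins) - 1))

# Total share across all bins, and the share held by the two outer bins.
total = sum(pct for _, _, pct in bins)
tails = bins[0][2] + bins[-1][2]

print(f"contiguous={contiguous} total={total:.3f}% tails={tails:.3f}%")
```

On the values shown here, the bins are contiguous, the shares sum to 100%, and the two outer bins together hold 0.723% of tokens, so almost all activations fall between -0.033 and 0.031.
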
Token perform
Token ID1620
Feature activation-7.27e-04

Pos logit contributions
pron+0.011
ller+0.011
issa+0.010
mallow+0.010
lights+0.010
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
 belong-0.005
 risks-0.005
Got-0.004
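
In the record above, and in every per-token record that follows, labels and values are fused (`Token ID1620`, `Feature activation-7.27e-04`, `pron+0.011`). The following is a minimal sketch of one way to split such a record back into fields. It assumes that the text after `Token` is the raw token string including any leading space, and that every contribution line ends in a signed decimal; the names `RECORD`, `CONTRIB_RE` and `parse_record` are hypothetical and not part of the original tool.

```python
import re

# First per-token record, copied verbatim from the dump above.
RECORD = """\
Token perform
Token ID1620
Feature activation-7.27e-04

Pos logit contributions
pron+0.011
ller+0.011
issa+0.010
mallow+0.010
lights+0.010
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
 belong-0.005
 risks-0.005
Got-0.004
"""

# A contribution line is a token string followed by a signed decimal.
# The lazy prefix keeps tokens that contain '+' or '-' intact.
CONTRIB_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")


def parse_record(text):
    """Split one fused record into token, id, activation and contributions."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    record = {
        # Assumption: everything after the label is the token itself.
        "token": lines[0][len("Token"):],
        "token_id": int(lines[1][len("Token ID"):]),
        "activation": float(lines[2][len("Feature activation"):]),
        "pos": [],
        "neg": [],
    }
    bucket = None
    for ln in lines[3:]:
        if ln.startswith("Pos logit contributions"):
            bucket = "pos"
        elif ln.startswith("Neg logit contributions"):
            bucket = "neg"
        else:
            match = CONTRIB_RE.match(ln)
            if match and bucket:
                record[bucket].append((match.group(1), float(match.group(2))))
    return record


rec = parse_record(RECORD)
print(rec["token_id"], repr(rec["token"]), rec["activation"])
```

Leading spaces are kept on purpose, since ` unnecessary` and `unnecessary` would be different tokens in a byte-pair vocabulary. A record whose first line is just `Token` (as for token IDs 198 and 220 later in the dump) parses to an empty token string under this sketch, because the whitespace token itself was lost in extraction.
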
Token tricks
Token ID15910
Feature activation-9.85e-03

Pos logit contributions
pron+0.001
ller+0.001
mallow+0.001
rant+0.001
issa+0.001
Neg logit contributions
 unnecessary-0.001
 lifelong-0.001
 risks-0.001
 chances-0.001
 friendships-0.001
Token to
Token ID284
Feature activation+1.99e-03

Pos logit contributions
pron+0.020
ller+0.018
issa+0.017
mallow+0.016
lf+0.016
Neg logit contributions
 unnecessary-0.014
 lifelong-0.013
 true-0.011
Sur-0.011
 risks-0.011
Token make
Token ID787
Feature activation-6.58e-04

Pos logit contributions
 unnecessary+0.003
 lifelong+0.002
 risks+0.002
 chances+0.002
 situations+0.002
Neg logit contributions
pron-0.004
ller-0.004
mallow-0.004
lf-0.004
lights-0.004
Token his
Token ID465
Feature activation+1.45e-02

Pos logit contributions
pron+0.001
ller+0.001
lf+0.001
ibrarian+0.001
arge+0.001
Neg logit contributions
 unnecessary-0.001
 lifelong-0.001
Got-0.001
Sur-0.001
 risks-0.001
Token wife
Token ID3656
Feature activation+6.23e-02

Pos logit contributions
 unnecessary+0.024
 lifelong+0.020
Sur+0.019
Got+0.019
 risks+0.018
Neg logit contributions
pron-0.028
ller-0.027
lf-0.024
ibrarian-0.023
lights-0.023
Token and
Token ID290
Feature activation+1.10e-02

Pos logit contributions
 unnecessary+0.123
 lifelong+0.102
Got+0.101
Sur+0.100
 situations+0.097
Neg logit contributions
pron-0.105
ller-0.094
lf-0.086
washer-0.081
lights-0.080
Token cat
Token ID3797
Feature activation+3.38e-02

Pos logit contributions
 unnecessary+0.020
 lifelong+0.017
Sur+0.016
 risks+0.016
Got+0.016
Neg logit contributions
pron-0.020
ller-0.018
lf-0.017
ibrarian-0.016
lights-0.016
Token happy
Token ID3772
Feature activation-8.07e-03

Pos logit contributions
 unnecessary+0.066
 lifelong+0.057
Sur+0.053
Got+0.052
 risks+0.052
Neg logit contributions
pron-0.057
ller-0.052
lf-0.048
ibrarian-0.045
lights-0.045
Token.
Token ID13
Feature activation-8.90e-03

Pos logit contributions
pron+0.012
ller+0.011
lf+0.010
mallow+0.009
ibrarian+0.009
Neg logit contributions
 unnecessary-0.017
 lifelong-0.015
Sur-0.014
 risks-0.014
 situations-0.014
Token His
Token ID2399
Feature activation-3.38e-03

Pos logit contributions
pron+0.020
ller+0.018
lf+0.017
ibrarian+0.016
issa+0.016
Neg logit contributions
 unnecessary-0.012
 lifelong-0.011
Sur-0.010
Got-0.009
 situations-0.009
Token and
Token ID290
Feature activation+1.15e-02

Pos logit contributions
 unnecessary+0.034
 lifelong+0.030
Sur+0.028
Got+0.028
 risks+0.028
Neg logit contributions
pron-0.022
ller-0.019
lf-0.017
lights-0.016
mallow-0.016
Token the
Token ID262
Feature activation+5.93e-03

Pos logit contributions
 unnecessary+0.022
 lifelong+0.019
Sur+0.018
 risks+0.017
Got+0.017
Neg logit contributions
pron-0.020
ller-0.018
lf-0.017
lights-0.016
mallow-0.016
Token mum
Token ID25682
Feature activation+1.39e-02

Pos logit contributions
 unnecessary+0.010
 lifelong+0.008
Sur+0.007
 chances+0.007
 risks+0.007
Neg logit contributions
pron-0.012
ller-0.011
lf-0.010
lights-0.010
rant-0.010
Token gave
Token ID2921
Feature activation+2.78e-03

Pos logit contributions
 unnecessary+0.024
 lifelong+0.022
Sur+0.020
Got+0.020
 risks+0.019
Neg logit contributions
pron-0.025
ller-0.023
lf-0.021
lights-0.020
rant-0.020
Token her
Token ID607
Feature activation+2.01e-02

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
Sur+0.004
 risks+0.004
Got+0.004
Neg logit contributions
pron-0.005
ller-0.005
lf-0.004
ibrarian-0.004
mallow-0.004
Token daughter
Token ID4957
Feature activation+6.05e-02

Pos logit contributions
 unnecessary+0.032
 lifelong+0.027
Sur+0.024
 risks+0.024
Got+0.024
Neg logit contributions
pron-0.041
ller-0.038
lf-0.036
lights-0.034
ibrarian-0.034
Token lots
Token ID6041
Feature activation+3.04e-03

Pos logit contributions
 unnecessary+0.100
 lifelong+0.083
Sur+0.076
Got+0.075
 risks+0.075
Neg logit contributions
pron-0.120
ller-0.110
lf-0.102
ibrarian-0.098
lights-0.098
Token of
Token ID286
Feature activation+1.62e-02

Pos logit contributions
 unnecessary+0.005
 lifelong+0.005
Sur+0.004
Got+0.004
 risks+0.004
Neg logit contributions
pron-0.006
ller-0.005
lf-0.005
lights-0.005
ibrarian-0.005
Token hugs
Token ID40657
Feature activation+8.73e-03

Pos logit contributions
 unnecessary+0.031
 lifelong+0.027
Sur+0.025
Got+0.025
 risks+0.025
Neg logit contributions
pron-0.028
ller-0.025
lf-0.023
ibrarian-0.022
lights-0.022
Token.
Token ID13
Feature activation+6.03e-03

Pos logit contributions
 unnecessary+0.018
 lifelong+0.016
Sur+0.015
 risks+0.015
Got+0.015
Neg logit contributions
pron-0.014
ller-0.012
lf-0.011
lights-0.010
ibrarian-0.010
Token
Token ID198
Feature activation+3.11e-03

Pos logit contributions
 unnecessary+0.011
 lifelong+0.010
Sur+0.009
Got+0.009
 risks+0.009
Neg logit contributions
pron-0.011
ller-0.010
lf-0.009
ibrarian-0.009
lights-0.009
Token outside
Token ID2354
Feature activation+8.87e-03

Pos logit contributions
pron+0.016
ller+0.015
lf+0.014
lights+0.013
washer+0.013
Neg logit contributions
 unnecessary-0.019
 lifelong-0.016
Sur-0.015
Got-0.015
 risks-0.015
Token,
Token ID11
Feature activation+7.14e-03

Pos logit contributions
 unnecessary+0.022
 lifelong+0.019
 risks+0.018
Got+0.018
Sur+0.018
Neg logit contributions
pron-0.010
lf-0.010
ller-0.009
lights-0.009
washer-0.009
Token a
Token ID257
Feature activation-3.80e-03

Pos logit contributions
 unnecessary+0.016
 lifelong+0.014
Sur+0.013
 risks+0.013
 situations+0.013
Neg logit contributions
pron-0.010
ller-0.009
lf-0.008
issa-0.008
mallow-0.008
Token strange
Token ID6283
Feature activation-1.14e-02

Pos logit contributions
pron+0.008
ller+0.007
lf+0.007
lights+0.007
ibrarian+0.007
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
Sur-0.004
 risks-0.004
Got-0.004
Token thought
Token ID1807
Feature activation-8.79e-04

Pos logit contributions
pron+0.021
ller+0.020
ibrarian+0.018
lights+0.018
lf+0.018
Neg logit contributions
 unnecessary-0.018
 lifelong-0.016
 situations-0.015
 risks-0.015
Sur-0.015
Token came
Token ID1625
Feature activation+5.74e-02

Pos logit contributions
pron+0.002
ller+0.001
lf+0.001
ibrarian+0.001
lights+0.001
Neg logit contributions
 unnecessary-0.002
 lifelong-0.001
Sur-0.001
 risks-0.001
Got-0.001
Token to
Token ID284
Feature activation+1.36e-02

Pos logit contributions
 unnecessary+0.122
Got+0.114
 lifelong+0.110
Sur+0.108
 situations+0.103
Neg logit contributions
ller-0.067
pron-0.057
work-0.054
aby-0.053
:-0.052
Token her
Token ID607
Feature activation-2.33e-03

Pos logit contributions
 unnecessary+0.027
 lifelong+0.023
Sur+0.022
Got+0.021
 risks+0.021
Neg logit contributions
pron-0.021
ller-0.019
lf-0.019
 stove-0.018
 clock-0.017
Token mind
Token ID2000
Feature activation+8.30e-04

Pos logit contributions
pron+0.003
ller+0.003
lf+0.003
ibrarian+0.002
mallow+0.002
Neg logit contributions
 unnecessary-0.005
 lifelong-0.005
Sur-0.004
Got-0.004
 risks-0.004
Token.
Token ID13
Feature activation-3.51e-04

Pos logit contributions
 unnecessary+0.002
 lifelong+0.001
 risks+0.001
 situations+0.001
Sur+0.001
Neg logit contributions
pron-0.001
ller-0.001
lf-0.001
ibrarian-0.001
 timer-0.001
Token She
Token ID1375
Feature activation-1.20e-02

Pos logit contributions
pron+0.001
ller+0.001
lf+0.001
ibrarian+0.000
lights+0.000
Neg logit contributions
 unnecessary-0.001
 lifelong-0.001
 risks-0.001
Sur-0.001
Got-0.001
Token she
Token ID673
Feature activation+2.16e-02

Pos logit contributions
 unnecessary+0.006
 lifelong+0.005
Sur+0.005
 risks+0.005
Got+0.005
Neg logit contributions
pron-0.005
ller-0.005
lf-0.004
lights-0.004
mallow-0.004
Token wanted
Token ID2227
Feature activation-9.36e-03

Pos logit contributions
 unnecessary+0.029
 lifelong+0.025
Sur+0.025
Got+0.024
 risks+0.022
Neg logit contributions
pron-0.044
ller-0.040
lights-0.039
lf-0.039
rant-0.038
Token to
Token ID284
Feature activation+1.24e-02

Pos logit contributions
pron+0.017
ller+0.015
lf+0.014
lights+0.013
mallow+0.013
Neg logit contributions
 unnecessary-0.018
 lifelong-0.015
 risks-0.014
Sur-0.014
 chances-0.014
Token give
Token ID1577
Feature activation-7.50e-03

Pos logit contributions
 unnecessary+0.018
 lifelong+0.015
 risks+0.014
Sur+0.013
Got+0.013
Neg logit contributions
pron-0.027
ller-0.025
lf-0.023
lights-0.023
ibrarian-0.022
Token her
Token ID607
Feature activation+1.14e-02

Pos logit contributions
pron+0.015
ller+0.014
lf+0.013
lights+0.012
ibrarian+0.012
Neg logit contributions
 unnecessary-0.013
 lifelong-0.011
 risks-0.010
Sur-0.010
 chances-0.010
Token mom
Token ID1995
Feature activation+5.70e-02

Pos logit contributions
 unnecessary+0.021
 lifelong+0.019
 chances+0.017
 risks+0.017
 friendships+0.016
Neg logit contributions
pron-0.020
ller-0.019
lf-0.018
rant-0.017
ibrarian-0.017
Token a
Token ID257
Feature activation+1.68e-02

Pos logit contributions
 unnecessary+0.088
 lifelong+0.074
 risks+0.067
Sur+0.066
Got+0.065
Neg logit contributions
pron-0.118
ller-0.112
lf-0.104
lights-0.099
ibrarian-0.098
Token flower
Token ID15061
Feature activation+1.25e-02

Pos logit contributions
 unnecessary+0.030
 lifelong+0.026
Sur+0.023
 risks+0.023
Got+0.023
Neg logit contributions
pron-0.030
ller-0.029
lf-0.027
lights-0.026
ibrarian-0.025
Token.
Token ID13
Feature activation-3.55e-03

Pos logit contributions
 unnecessary+0.025
 lifelong+0.023
 risks+0.021
Sur+0.021
 chances+0.020
Neg logit contributions
pron-0.019
ller-0.018
lf-0.016
issa-0.015
lights-0.015
Token
Token ID220
Feature activation+3.48e-03

Pos logit contributions
pron+0.006
ller+0.006
lf+0.005
lights+0.005
ibrarian+0.005
Neg logit contributions
 unnecessary-0.007
 lifelong-0.006
Sur-0.005
Got-0.005
 risks-0.005
Token
Token ID198
Feature activation+3.04e-03

Pos logit contributions
 unnecessary+0.006
 lifelong+0.005
Sur+0.005
Got+0.005
 risks+0.005
Neg logit contributions
pron-0.007
ller-0.006
lf-0.006
lights-0.005
mallow-0.005
Token happy
Token ID3772
Feature activation-8.03e-03

Pos logit contributions
pron+0.028
ller+0.025
lf+0.023
ibrarian+0.022
mallow+0.022
Neg logit contributions
 unnecessary-0.029
 lifelong-0.024
 risks-0.023
 situations-0.023
Sur-0.023
Token that
Token ID326
Feature activation-6.92e-03

Pos logit contributions
pron+0.015
ller+0.013
rant+0.012
lf+0.012
lights+0.012
Neg logit contributions
 unnecessary-0.014
 lifelong-0.012
Sur-0.011
 friendships-0.011
 risks-0.011
Token he
Token ID339
Feature activation-4.99e-03

Pos logit contributions
pron+0.009
lf+0.008
ller+0.008
mallow+0.007
washer+0.007
Neg logit contributions
 unnecessary-0.016
 lifelong-0.013
 risks-0.013
Sur-0.013
Got-0.013
Token made
Token ID925
Feature activation-1.61e-03

Pos logit contributions
pron+0.008
ller+0.008
lf+0.007
lights+0.007
ibrarian+0.007
Neg logit contributions
 unnecessary-0.010
 lifelong-0.008
Sur-0.008
Got-0.008
 risks-0.008
Token his
Token ID465
Feature activation+1.09e-02

Pos logit contributions
pron+0.003
ller+0.003
lf+0.002
ibrarian+0.002
washer+0.002
Neg logit contributions
 unnecessary-0.003
 lifelong-0.003
Got-0.002
Sur-0.002
 situations-0.002
Token wife
Token ID3656
Feature activation+5.67e-02

Pos logit contributions
 unnecessary+0.022
Got+0.020
Sur+0.020
 lifelong+0.019
 situations+0.018
Neg logit contributions
pron-0.017
ller-0.014
lf-0.013
 clock-0.013
ibrarian-0.013
Token laugh
Token ID6487
Feature activation+1.96e-03

Pos logit contributions
 unnecessary+0.118
Got+0.097
Sur+0.096
 lifelong+0.095
 situations+0.094
Neg logit contributions
pron-0.092
ller-0.081
lf-0.074
washer-0.071
 clock-0.068
Token.
Token ID13
Feature activation-5.88e-03

Pos logit contributions
 unnecessary+0.004
 lifelong+0.004
Sur+0.003
 risks+0.003
Got+0.003
Neg logit contributions
pron-0.003
ller-0.003
lf-0.002
mallow-0.002
ibrarian-0.002
Token They
Token ID1119
Feature activation-1.36e-02

Pos logit contributions
pron+0.013
ller+0.012
lf+0.011
rant+0.011
 timer+0.011
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
Sur-0.007
Got-0.006
 friendships-0.006
Token hugged
Token ID41130
Feature activation-6.08e-03

Pos logit contributions
pron+0.031
ller+0.030
lights+0.029
oons+0.027
lf+0.026
Neg logit contributions
 unnecessary-0.015
 lifelong-0.014
Got-0.013
 belong-0.013
Sur-0.012
Token and
Token ID290
Feature activation-5.54e-04

Pos logit contributions
pron+0.011
ller+0.010
lf+0.009
washer+0.009
ibrarian+0.009
Neg logit contributions
 unnecessary-0.011
 lifelong-0.010
Sur-0.009
 risks-0.009
Got-0.009
TokenIt
Token ID1026
Feature activation+1.36e-02

Pos logit contributions
 unnecessary+0.035
 lifelong+0.029
 situations+0.028
 chances+0.026
 friendships+0.026
Neg logit contributions
pron-0.041
ller-0.036
mallow-0.036
isa-0.034
issa-0.033
Token looks
Token ID3073
Feature activation+2.54e-02

Pos logit contributions
 unnecessary+0.020
 lifelong+0.017
Sur+0.015
Got+0.015
 risks+0.015
Neg logit contributions
pron-0.029
ller-0.027
lf-0.025
washer-0.024
issa-0.024
Token great
Token ID1049
Feature activation+1.87e-02

Pos logit contributions
 unnecessary+0.045
 lifelong+0.039
Sur+0.039
Got+0.038
 situations+0.038
Neg logit contributions
pron-0.045
ller-0.039
rant-0.036
lights-0.034
lf-0.034
Token!
Token ID0
Feature activation+1.52e-02

Pos logit contributions
 unnecessary+0.038
 lifelong+0.033
Sur+0.031
Got+0.031
 risks+0.030
Neg logit contributions
pron-0.030
ller-0.027
lf-0.025
lights-0.023
ibrarian-0.023
Token Now
Token ID2735
Feature activation+3.87e-02

Pos logit contributions
 unnecessary+0.026
 lifelong+0.022
 risks+0.020
Sur+0.020
Got+0.020
Neg logit contributions
pron-0.029
ller-0.027
lf-0.024
mallow-0.024
ibrarian-0.024
Token off
Token ID572
Feature activation+5.63e-02

Pos logit contributions
 unnecessary+0.079
 lifelong+0.074
Got+0.068
Sur+0.068
 situations+0.065
Neg logit contributions
pron-0.051
ller-0.047
lf-0.043
ma-0.042
issa-0.040
Token we
Token ID356
Feature activation+8.69e-03

Pos logit contributions
 unnecessary+0.112
 lifelong+0.104
Got+0.095
Sur+0.094
 risks+0.088
Neg logit contributions
pron-0.082
ller-0.077
lf-0.071
issa-0.066
mallow-0.065
Token go
Token ID467
Feature activation+1.65e-03

Pos logit contributions
 unnecessary+0.014
 lifelong+0.012
 risks+0.011
Sur+0.011
Got+0.010
Neg logit contributions
pron-0.018
ller-0.017
lf-0.015
lights-0.015
ibrarian-0.014
Token on
Token ID319
Feature activation+2.82e-02

Pos logit contributions
 unnecessary+0.003
 lifelong+0.003
 risks+0.002
 chances+0.002
Sur+0.002
Neg logit contributions
pron-0.003
ller-0.003
lf-0.003
ibrarian-0.003
mallow-0.002
Token our
Token ID674
Feature activation+2.77e-04

Pos logit contributions
 unnecessary+0.046
 lifelong+0.039
Sur+0.035
 risks+0.035
Got+0.035
Neg logit contributions
pron-0.056
ller-0.052
lf-0.049
lights-0.046
ibrarian-0.046
Token walk
Token ID2513
Feature activation-1.95e-03

Pos logit contributions
 unnecessary+0.000
 lifelong+0.000
Sur+0.000
Got+0.000
 risks+0.000
Neg logit contributions
pron-0.001
ller-0.001
lf-0.000
lights-0.000
issa-0.000
TokenFinally
Token ID11158
Feature activation+1.78e-02

Pos logit contributions
 unnecessary+0.023
 lifelong+0.018
 situations+0.018
 risks+0.018
 friendships+0.017
Neg logit contributions
pron-0.019
lf-0.016
ller-0.016
ibrarian-0.015
washer-0.015
Token,
Token ID11
Feature activation+4.94e-03

Pos logit contributions
 unnecessary+0.047
 lifelong+0.040
 situations+0.037
 risks+0.037
Got+0.037
Neg logit contributions
pron-0.018
lf-0.018
ller-0.015
ibrarian-0.015
 maker-0.015
Token Dave
Token ID9935
Feature activation+1.79e-02

Pos logit contributions
 unnecessary+0.010
 lifelong+0.009
 situations+0.008
 risks+0.008
Got+0.008
Neg logit contributions
pron-0.008
ller-0.007
lf-0.007
mallow-0.006
rant-0.006
Token's
Token ID338
Feature activation+1.11e-02

Pos logit contributions
 unnecessary+0.036
 lifelong+0.031
 risks+0.029
 situations+0.028
Got+0.028
Neg logit contributions
pron-0.029
ller-0.027
lf-0.025
ibrarian-0.023
washer-0.023
Token box
Token ID3091
Feature activation+3.95e-02

Pos logit contributions
 unnecessary+0.025
 lifelong+0.022
Sur+0.021
 risks+0.021
Got+0.021
Neg logit contributions
pron-0.015
ller-0.013
lf-0.012
lights-0.011
ibrarian-0.011
Token came
Token ID1625
Feature activation+5.63e-02

Pos logit contributions
 unnecessary+0.085
 lifelong+0.068
 risks+0.067
 situations+0.065
Got+0.064
Neg logit contributions
pron-0.063
ller-0.057
lf-0.055
lights-0.053
ibrarian-0.050
Token back
Token ID736
Feature activation+2.90e-02

Pos logit contributions
 unnecessary+0.127
Got+0.114
 situations+0.111
 risks+0.105
 endings+0.104
Neg logit contributions
ller-0.063
pron-0.060
bell-0.056
lights-0.055
work-0.052
Token down
Token ID866
Feature activation+2.28e-02

Pos logit contributions
 unnecessary+0.064
 lifelong+0.055
Got+0.053
Sur+0.052
 risks+0.052
Neg logit contributions
pron-0.042
ller-0.038
lf-0.033
lights-0.032
mallow-0.032
Token to
Token ID284
Feature activation+3.97e-03

Pos logit contributions
 unnecessary+0.050
 lifelong+0.044
Sur+0.041
 risks+0.041
Got+0.041
Neg logit contributions
pron-0.033
ller-0.029
lf-0.027
lights-0.025
ibrarian-0.025
Token the
Token ID262
Feature activation-2.06e-03

Pos logit contributions
 unnecessary+0.008
 lifelong+0.006
 risks+0.006
Sur+0.006
Got+0.006
Neg logit contributions
pron-0.007
ller-0.006
lf-0.006
rant-0.006
mallow-0.006
Token ground
Token ID2323
Feature activation+1.24e-03

Pos logit contributions
pron+0.003
ller+0.003
lf+0.003
washer+0.003
mallow+0.002
Neg logit contributions
 unnecessary-0.004
 lifelong-0.004
Sur-0.003
 risks-0.003
Got-0.003
Token counted
Token ID14789
Feature activation+2.07e-02

Pos logit contributions
pron+0.008
ller+0.007
lf+0.007
lights+0.006
mallow+0.006
Neg logit contributions
 unnecessary-0.009
 lifelong-0.007
Sur-0.007
Got-0.007
 risks-0.007
Token down
Token ID866
Feature activation+2.32e-02

Pos logit contributions
 unnecessary+0.041
 lifelong+0.035
Sur+0.033
 risks+0.033
Got+0.033
Neg logit contributions
pron-0.034
ller-0.031
lf-0.029
lights-0.027
ibrarian-0.027
Token slowly
Token ID6364
Feature activation+1.78e-03

Pos logit contributions
 unnecessary+0.051
 lifelong+0.044
Got+0.041
 risks+0.041
Sur+0.041
Neg logit contributions
pron-0.031
ller-0.029
lf-0.026
ibrarian-0.025
lights-0.025
Token and
Token ID290
Feature activation-9.68e-03

Pos logit contributions
 unnecessary+0.004
 lifelong+0.003
Got+0.003
Sur+0.003
 risks+0.003
Neg logit contributions
pron-0.003
ller-0.002
lf-0.002
lights-0.002
ibrarian-0.002
Token gave
Token ID2921
Feature activation+4.15e-03

Pos logit contributions
pron+0.015
ller+0.014
lf+0.013
ibrarian+0.012
lights+0.012
Neg logit contributions
 unnecessary-0.020
 lifelong-0.017
Sur-0.016
 risks-0.016
Got-0.016
Token everyone
Token ID2506
Feature activation+5.57e-02

Pos logit contributions
 unnecessary+0.008
 lifelong+0.007
Sur+0.006
 risks+0.006
 situations+0.006
Neg logit contributions
pron-0.007
ller-0.007
lf-0.006
issa-0.006
ibrarian-0.006
Token an
Token ID281
Feature activation+1.45e-02

Pos logit contributions
 unnecessary+0.108
 lifelong+0.087
 situations+0.085
 risks+0.081
 endings+0.080
Neg logit contributions
pron-0.094
ller-0.082
washer-0.082
ibrarian-0.079
issa-0.078
Token equal
Token ID4961
Feature activation+2.03e-02

Pos logit contributions
 unnecessary+0.024
 lifelong+0.020
Sur+0.019
Got+0.019
 risks+0.018
Neg logit contributions
pron-0.028
ller-0.026
lf-0.024
ibrarian-0.023
lights-0.023
Token chance
Token ID2863
Feature activation+1.74e-02

Pos logit contributions
 unnecessary+0.038
 lifelong+0.032
Got+0.030
Sur+0.030
Pe+0.029
Neg logit contributions
pron-0.036
ller-0.031
lf-0.029
ibrarian-0.028
washer-0.028
Token.
Token ID13
Feature activation+1.30e-02

Pos logit contributions
 unnecessary+0.041
 lifelong+0.037
Got+0.035
Sur+0.034
 risks+0.034
Neg logit contributions
pron-0.021
ller-0.019
lf-0.017
ibrarian-0.016
washer-0.016
Token
Token ID220
Feature activation+1.62e-02

Pos logit contributions
 unnecessary+0.026
 lifelong+0.023
Sur+0.021
 risks+0.021
Got+0.021
Neg logit contributions
pron-0.021
ller-0.019
lf-0.018
ibrarian-0.017
lights-0.017
Token laughed
Token ID13818
Feature activation+8.65e-03

Pos logit contributions
 unnecessary+0.002
 lifelong+0.001
 risks+0.001
Sur+0.001
 situations+0.001
Neg logit contributions
pron-0.001
ller-0.001
lf-0.001
mallow-0.001
lights-0.001
Token.
Token ID13
Feature activation-4.01e-03

Pos logit contributions
 unnecessary+0.020
 lifelong+0.017
Sur+0.016
 risks+0.016
Got+0.016
Neg logit contributions
pron-0.012
ller-0.011
lf-0.009
lights-0.009
ibrarian-0.009
Token He
Token ID679
Feature activation-1.08e-03

Pos logit contributions
pron+0.008
ller+0.008
lf+0.007
issa+0.007
rant+0.007
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
Sur-0.005
Got-0.005
 See-0.004
Token wanted
Token ID2227
Feature activation-9.96e-03

Pos logit contributions
pron+0.002
ller+0.002
lights+0.002
lf+0.002
mallow+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Sur-0.002
Got-0.002
 risks-0.002
Token his
Token ID465
Feature activation-5.87e-04

Pos logit contributions
pron+0.017
ller+0.016
lf+0.015
lights+0.014
ibrarian+0.014
Neg logit contributions
 unnecessary-0.019
 lifelong-0.016
 risks-0.015
Sur-0.015
Got-0.015
Token sister
Token ID6621
Feature activation+5.56e-02

Pos logit contributions
pron+0.001
ller+0.001
lf+0.001
washer+0.001
lights+0.001
Neg logit contributions
 unnecessary-0.001
 lifelong-0.001
Sur-0.001
Got-0.001
 risks-0.001
Token to
Token ID284
Feature activation+1.92e-02

Pos logit contributions
 unnecessary+0.132
 lifelong+0.119
Sur+0.117
Got+0.116
Pe+0.115
Neg logit contributions
pron-0.058
ller-0.047
washer-0.044
chair-0.042
ibrarian-0.042
Token get
Token ID651
Feature activation+1.32e-02

Pos logit contributions
 unnecessary+0.036
 lifelong+0.031
Sur+0.029
Got+0.027
 situations+0.027
Neg logit contributions
pron-0.033
lf-0.030
ller-0.029
ibrarian-0.026
ma-0.026
Token up
Token ID510
Feature activation+8.89e-03

Pos logit contributions
 unnecessary+0.025
 lifelong+0.022
Sur+0.021
 situations+0.021
Got+0.020
Neg logit contributions
pron-0.022
ller-0.019
mallow-0.017
washer-0.017
rant-0.017
Token so
Token ID523
Feature activation+1.71e-03

Pos logit contributions
 unnecessary+0.017
 lifelong+0.015
 risks+0.014
Sur+0.014
Got+0.014
Neg logit contributions
pron-0.015
ller-0.014
lf-0.012
ibrarian-0.012
issa-0.012
Token they
Token ID484
Feature activation+1.55e-02

Pos logit contributions
 unnecessary+0.003
 lifelong+0.003
Sur+0.003
Got+0.003
 risks+0.002
Neg logit contributions
pron-0.003
ller-0.003
lf-0.003
lights-0.002
mallow-0.002
Token she
Token ID673
Feature activation-3.25e-03

Pos logit contributions
pron+0.004
ller+0.003
lf+0.003
ibrarian+0.003
lights+0.003
Neg logit contributions
 unnecessary-0.004
 lifelong-0.004
Sur-0.003
 risks-0.003
Got-0.003
Token could
Token ID714
Feature activation-4.36e-03

Pos logit contributions
pron+0.005
ller+0.004
lf+0.004
lights+0.004
ibrarian+0.004
Neg logit contributions
 unnecessary-0.007
 lifelong-0.006
Sur-0.006
Got-0.006
 risks-0.006
Token and
Token ID290
Feature activation-1.20e-02

Pos logit contributions
pron+0.007
ller+0.006
lf+0.005
ibrarian+0.005
lights+0.005
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.008
 risks-0.008
Got-0.008
Token gave
Token ID2921
Feature activation-4.26e-03

Pos logit contributions
pron+0.020
ller+0.018
lf+0.016
issa+0.016
ibrarian+0.015
Neg logit contributions
 unnecessary-0.024
 lifelong-0.021
Sur-0.019
 risks-0.019
Got-0.019
Token Grand
Token ID5675
Feature activation-3.93e-03

Pos logit contributions
pron+0.007
ller+0.006
lf+0.005
lights+0.005
ibrarian+0.005
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.007
 risks-0.007
Got-0.007
Tokenma
Token ID2611
Feature activation+5.53e-02

Pos logit contributions
pron+0.005
ller+0.004
rant+0.003
lights+0.003
lf+0.003
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.008
 chances-0.008
 friendships-0.008
Token the
Token ID262
Feature activation+2.02e-03

Pos logit contributions
 unnecessary+0.089
 lifelong+0.074
Sur+0.067
 risks+0.067
Got+0.066
Neg logit contributions
pron-0.112
ller-0.104
lf-0.097
ibrarian-0.092
lights-0.092
Token biggest
Token ID4094
Feature activation-3.20e-03

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
 chances+0.004
Sur+0.004
 risks+0.004
Neg logit contributions
ller-0.002
pron-0.002
rant-0.002
lights-0.002
lf-0.002
Token hug
Token ID16225
Feature activation+1.60e-02

Pos logit contributions
pron+0.005
ller+0.005
lf+0.004
rant+0.004
lights+0.004
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
Sur-0.005
 risks-0.005
 chances-0.005
Token ever
Token ID1683
Feature activation-6.69e-04

Pos logit contributions
 unnecessary+0.034
 lifelong+0.030
Sur+0.028
 risks+0.028
Got+0.028
Neg logit contributions
pron-0.023
ller-0.021
lf-0.019
lights-0.018
ibrarian-0.018
Token.
Token ID13
Feature activation+4.03e-03

Pos logit contributions
pron+0.001
ller+0.001
lf+0.001
ibrarian+0.001
lights+0.001
Neg logit contributions
 unnecessary-0.002
 lifelong-0.001
Sur-0.001
 risks-0.001
Got-0.001
Token shiny
Token ID22441
Feature activation-3.56e-03

Pos logit contributions
 unnecessary+0.015
 lifelong+0.012
 situations+0.012
 risks+0.012
Got+0.011
Neg logit contributions
pron-0.019
ller-0.016
lf-0.016
washer-0.015
 clock-0.015
Token toy
Token ID13373
Feature activation-2.42e-03

Pos logit contributions
pron+0.007
ller+0.007
lf+0.006
ibrarian+0.006
lights+0.006
Neg logit contributions
 unnecessary-0.005
 lifelong-0.005
 risks-0.004
 chances-0.004
 situations-0.004
Token that
Token ID326
Feature activation-2.71e-03

Pos logit contributions
pron+0.003
ller+0.003
lf+0.003
lights+0.003
ibrarian+0.003
Neg logit contributions
 unnecessary-0.005
 lifelong-0.005
Sur-0.004
 risks-0.004
Got-0.004
Token makes
Token ID1838
Feature activation-8.30e-03

Pos logit contributions
pron+0.005
ller+0.004
lf+0.004
lights+0.004
ibrarian+0.004
Neg logit contributions
 unnecessary-0.005
 lifelong-0.004
Sur-0.004
Got-0.004
 risks-0.004
Token his
Token ID465
Feature activation+1.60e-02

Pos logit contributions
pron+0.014
ller+0.013
lf+0.012
washer+0.012
lights+0.012
Neg logit contributions
 unnecessary-0.016
 lifelong-0.014
Sur-0.013
Got-0.013
Pe-0.012
Token dad
Token ID9955
Feature activation+5.51e-02

Pos logit contributions
 unnecessary+0.030
 lifelong+0.026
 risks+0.024
Sur+0.023
Got+0.023
Neg logit contributions
pron-0.028
ller-0.026
lf-0.024
ibrarian-0.023
lights-0.022
Token's
Token ID338
Feature activation+4.07e-03

Pos logit contributions
 unnecessary+0.105
 lifelong+0.088
Sur+0.084
Got+0.084
 risks+0.082
Neg logit contributions
pron-0.097
ller-0.088
lf-0.080
washer-0.077
lights-0.076
Token face
Token ID1986
Feature activation+2.58e-02

Pos logit contributions
 unnecessary+0.007
 lifelong+0.006
 risks+0.006
Sur+0.006
Got+0.006
Neg logit contributions
pron-0.008
ller-0.007
lf-0.006
ibrarian-0.006
lights-0.006
Token smooth
Token ID7209
Feature activation-1.02e-03

Pos logit contributions
 unnecessary+0.053
 lifelong+0.046
Sur+0.043
Got+0.043
 risks+0.043
Neg logit contributions
pron-0.041
ller-0.037
lf-0.034
lights-0.032
washer-0.031
Token.
Token ID13
Feature activation+3.40e-03

Pos logit contributions
pron+0.001
ller+0.001
lf+0.001
ibrarian+0.001
mallow+0.001
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
 risks-0.002
Sur-0.002
 situations-0.002
Token He
Token ID679
Feature activation+4.40e-03

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
Sur+0.004
Got+0.004
 chances+0.004
Neg logit contributions
pron-0.007
ller-0.006
lf-0.006
ibrarian-0.006
issa-0.006
Token,
Token ID11
Feature activation-7.82e-03

Pos logit contributions
 unnecessary+0.032
 lifelong+0.029
 situations+0.027
 risks+0.026
 endings+0.026
Neg logit contributions
pron-0.016
lf-0.015
ller-0.015
mallow-0.013
ma-0.013
Token you
Token ID345
Feature activation+5.51e-03

Pos logit contributions
pron+0.011
ibrarian+0.010
mallow+0.010
lf+0.010
ller+0.010
Neg logit contributions
 unnecessary-0.016
 lifelong-0.015
 situations-0.014
 friendships-0.013
 risks-0.013
Token!
Token ID0
Feature activation+2.57e-02

Pos logit contributions
 unnecessary+0.011
 lifelong+0.011
 situations+0.009
 friendships+0.009
 endings+0.009
Neg logit contributions
lf-0.007
ibrarian-0.007
pron-0.007
ller-0.007
washer-0.007
Token Leave
Token ID17446
Feature activation-1.82e-02

Pos logit contributions
 unnecessary+0.051
 lifelong+0.046
 risks+0.038
 chances+0.036
 friendships+0.035
Neg logit contributions
mallow-0.042
issa-0.039
ller-0.039
pron-0.038
ibrarian-0.038
Token those
Token ID883
Feature activation+1.42e-02

Pos logit contributions
pron+0.029
ller+0.027
lf+0.026
 oven+0.024
ibrarian+0.024
Neg logit contributions
 unnecessary-0.035
 lifelong-0.031
Sur-0.028
 risks-0.027
 situations-0.027
Token mice
Token ID10693
Feature activation+5.49e-02

Pos logit contributions
 unnecessary+0.026
 lifelong+0.024
Pe+0.020
 risks+0.020
 situations+0.020
Neg logit contributions
pron-0.024
ibrarian-0.022
lf-0.022
 oven-0.021
ma-0.021
Token alone
Token ID3436
Feature activation+2.00e-02

Pos logit contributions
 unnecessary+0.119
 lifelong+0.107
Sur+0.098
Got+0.098
Pe+0.094
Neg logit contributions
pron-0.075
ller-0.069
lf-0.067
washer-0.060
ibrarian-0.059
Token!
Token ID0
Feature activation+1.50e-02

Pos logit contributions
 unnecessary+0.045
 lifelong+0.040
Got+0.037
Sur+0.036
 risks+0.036
Neg logit contributions
pron-0.027
ller-0.025
lf-0.023
ibrarian-0.021
mallow-0.021
Token They
Token ID1119
Feature activation+2.19e-02

Pos logit contributions
 unnecessary+0.031
 lifelong+0.028
 risks+0.024
 chances+0.023
 friendships+0.022
Neg logit contributions
ller-0.022
ibrarian-0.022
mallow-0.022
pron-0.021
issa-0.021
Token are
Token ID389
Feature activation-5.20e-03

Pos logit contributions
 unnecessary+0.039
 lifelong+0.034
 situations+0.029
Sur+0.029
Pe+0.028
Neg logit contributions
pron-0.039
ller-0.036
lf-0.035
washer-0.034
ibrarian-0.034
Token my
Token ID616
Feature activation+1.20e-02

Pos logit contributions
pron+0.008
ller+0.007
lf+0.007
ibrarian+0.007
 oven+0.006
Neg logit contributions
 unnecessary-0.010
 lifelong-0.009
Got-0.009
 situations-0.009
Sur-0.009
Token She
Token ID1375
Feature activation-5.25e-03

Pos logit contributions
pron+0.016
ller+0.015
lf+0.014
lights+0.013
rant+0.013
Neg logit contributions
 unnecessary-0.012
 lifelong-0.010
Sur-0.009
Got-0.009
 friendships-0.008
Token smiled
Token ID13541
Feature activation+1.58e-02

Pos logit contributions
pron+0.009
ller+0.009
lights+0.008
lf+0.008
mallow+0.008
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.008
Got-0.008
 risks-0.007
Token and
Token ID290
Feature activation-7.28e-03

Pos logit contributions
 unnecessary+0.037
 lifelong+0.032
 risks+0.030
Sur+0.030
Got+0.030
Neg logit contributions
pron-0.021
ller-0.019
lf-0.016
lights-0.016
ibrarian-0.015
Token gave
Token ID2921
Feature activation-2.41e-02

Pos logit contributions
pron+0.010
ller+0.009
lf+0.008
lights+0.007
ibrarian+0.007
Neg logit contributions
 unnecessary-0.016
 lifelong-0.015
Sur-0.014
 risks-0.014
Got-0.014
Token the
Token ID262
Feature activation+3.01e-05

Pos logit contributions
pron+0.053
ller+0.051
lf+0.046
lights+0.045
mallow+0.044
Neg logit contributions
 unnecessary-0.036
 lifelong-0.030
 chances-0.026
 risks-0.026
 friendships-0.025
Token mouse
Token ID10211
Feature activation+5.47e-02

Pos logit contributions
 unnecessary+0.000
 lifelong+0.000
Sur+0.000
 risks+0.000
Got+0.000
Neg logit contributions
pron-0.000
ller-0.000
lf-0.000
lights-0.000
mallow-0.000
Token a
Token ID257
Feature activation+1.59e-02

Pos logit contributions
 unnecessary+0.106
 lifelong+0.088
 situations+0.084
Sur+0.083
Got+0.083
Neg logit contributions
pron-0.089
ller-0.079
ibrarian-0.074
washer-0.072
lf-0.070
Token little
Token ID1310
Feature activation+1.35e-02

Pos logit contributions
 unnecessary+0.030
 lifelong+0.025
 risks+0.023
Sur+0.023
Got+0.023
Neg logit contributions
pron-0.028
ller-0.026
lf-0.024
ibrarian-0.022
mallow-0.022
Token pat
Token ID1458
Feature activation+1.36e-03

Pos logit contributions
 unnecessary+0.029
 lifelong+0.024
 situations+0.023
Got+0.023
 risks+0.023
Neg logit contributions
pron-0.021
ller-0.017
 timer-0.016
 oven-0.016
 clock-0.016
Token on
Token ID319
Feature activation-6.38e-03

Pos logit contributions
 unnecessary+0.002
 lifelong+0.002
Sur+0.002
Got+0.002
 risks+0.002
Neg logit contributions
pron-0.002
ller-0.002
lf-0.002
lights-0.002
arge-0.002
Token its
Token ID663
Feature activation-8.88e-03

Pos logit contributions
pron+0.014
ller+0.013
washer+0.013
lights+0.012
mallow+0.012
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
Sur-0.006
 risks-0.006
 chances-0.006
Token,
Token ID11
Feature activation+2.89e-02

Pos logit contributions
 unnecessary+0.003
 lifelong+0.003
Sur+0.003
Got+0.003
 risks+0.003
Neg logit contributions
pron-0.003
ller-0.003
lf-0.003
lights-0.002
mallow-0.002
Token he
Token ID339
Feature activation+9.99e-03

Pos logit contributions
 unnecessary+0.065
 lifelong+0.058
 situations+0.055
Sur+0.054
Got+0.054
Neg logit contributions
ller-0.033
pron-0.033
lf-0.028
mallow-0.028
washer-0.027
Token heard
Token ID2982
Feature activation+2.05e-02

Pos logit contributions
 unnecessary+0.017
 lifelong+0.014
Sur+0.013
 risks+0.013
Got+0.013
Neg logit contributions
pron-0.019
ller-0.018
lights-0.017
lf-0.016
mallow-0.016
Token a
Token ID257
Feature activation-1.09e-02

Pos logit contributions
 unnecessary+0.035
 lifelong+0.030
Got+0.027
Sur+0.027
 risks+0.027
Neg logit contributions
pron-0.038
ller-0.036
lf-0.033
ibrarian-0.032
lights-0.032
Token voice
Token ID3809
Feature activation-8.03e-03

Pos logit contributions
pron+0.019
ller+0.017
lf+0.016
ibrarian+0.015
washer+0.014
Neg logit contributions
 unnecessary-0.021
 lifelong-0.018
 risks-0.017
Got-0.017
 situations-0.016
Token calling
Token ID4585
Feature activation+5.46e-02

Pos logit contributions
pron+0.014
ller+0.013
lf+0.011
lights+0.011
mallow+0.011
Neg logit contributions
 unnecessary-0.015
 lifelong-0.013
Sur-0.012
 risks-0.012
 situations-0.012
Token his
Token ID465
Feature activation+9.30e-03

Pos logit contributions
 unnecessary+0.104
 lifelong+0.089
Got+0.083
 situations+0.082
Sur+0.082
Neg logit contributions
pron-0.088
ller-0.085
ibrarian-0.077
lf-0.075
rant-0.073
Token name
Token ID1438
Feature activation+1.18e-02

Pos logit contributions
 unnecessary+0.016
 lifelong+0.014
Sur+0.012
 risks+0.012
Got+0.012
Neg logit contributions
pron-0.018
ller-0.016
lf-0.015
lights-0.015
mallow-0.014
Token from
Token ID422
Feature activation+1.41e-02

Pos logit contributions
 unnecessary+0.026
 lifelong+0.023
Sur+0.022
 risks+0.021
Got+0.021
Neg logit contributions
pron-0.017
ller-0.015
lf-0.013
lights-0.012
ibrarian-0.012
Token behind
Token ID2157
Feature activation+2.94e-02

Pos logit contributions
 unnecessary+0.023
 lifelong+0.018
 risks+0.018
 chances+0.017
 situations+0.016
Neg logit contributions
pron-0.031
ller-0.027
lf-0.026
oons-0.025
washer-0.025
Token.
Token ID13
Feature activation+8.39e-03

Pos logit contributions
 unnecessary+0.051
 lifelong+0.043
 risks+0.040
Sur+0.040
Got+0.039
Neg logit contributions
pron-0.056
ller-0.052
lf-0.048
lights-0.046
ibrarian-0.046
Token.
Token ID13
Feature activation+1.34e-03

Pos logit contributions
pron+0.001
ller+0.001
lf+0.001
ibrarian+0.001
washer+0.001
Neg logit contributions
 unnecessary-0.003
 lifelong-0.003
Sur-0.002
Got-0.002
 risks-0.002
Token\n
Token ID198
Feature activation-9.78e-04

Pos logit contributions
 unnecessary+0.003
 lifelong+0.002
 risks+0.002
 situations+0.002
Sur+0.002
Neg logit contributions
pron-0.002
ller-0.002
lf-0.002
ibrarian-0.002
lights-0.002
Token\n
Token ID198
Feature activation+9.72e-03

Pos logit contributions
pron+0.002
ller+0.002
lf+0.002
lights+0.002
mallow+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.001
Sur-0.001
Got-0.001
 risks-0.001
Token"
Token ID1
Feature activation+1.59e-02

Pos logit contributions
 unnecessary+0.021
 lifelong+0.019
 situations+0.017
 risks+0.017
 endings+0.017
Neg logit contributions
pron-0.015
lf-0.013
ller-0.012
ibrarian-0.012
washer-0.012
TokenLook
Token ID8567
Feature activation-2.28e-03

Pos logit contributions
 unnecessary+0.033
 lifelong+0.027
 situations+0.026
 decisions+0.024
 risks+0.023
Neg logit contributions
pron-0.025
mallow-0.023
 timer-0.022
lf-0.022
ller-0.021
Token out
Token ID503
Feature activation+5.45e-02

Pos logit contributions
pron+0.004
ller+0.003
lf+0.003
ibrarian+0.003
mallow+0.003
Neg logit contributions
 unnecessary-0.005
 lifelong-0.004
Got-0.004
 situations-0.004
Sur-0.004
Token!"
Token ID2474
Feature activation+5.68e-03

Pos logit contributions
 unnecessary+0.110
 lifelong+0.097
Got+0.090
 situations+0.089
 risks+0.088
Neg logit contributions
pron-0.084
ller-0.079
lf-0.074
ibrarian-0.070
issa-0.068
Token Tom
Token ID4186
Feature activation-2.18e-03

Pos logit contributions
 unnecessary+0.011
 lifelong+0.010
 risks+0.009
 situations+0.009
Sur+0.009
Neg logit contributions
pron-0.009
ller-0.009
lf-0.008
ibrarian-0.008
lights-0.007
Token shouted
Token ID17293
Feature activation+1.05e-02

Pos logit contributions
pron+0.003
ller+0.003
lf+0.002
lights+0.002
ibrarian+0.002
Neg logit contributions
 unnecessary-0.005
 lifelong-0.004
Sur-0.004
 risks-0.004
Got-0.004
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
 unnecessary+0.026
 lifelong+0.023
Got+0.022
Sur+0.022
 risks+0.021
Neg logit contributions
pron-0.011
ller-0.010
ibrarian-0.009
lf-0.009
lights-0.008
Token He
Token ID679
Feature activation-3.38e-03

Pos logit contributions
pron+0.019
ller+0.017
lf+0.016
rant+0.015
issa+0.015
Neg logit contributions
 unnecessary-0.017
 lifelong-0.015
Sur-0.015
Got-0.014
Help-0.014
Token walk
Token ID2513
Feature activation+2.27e-03

Pos logit contributions
pron+0.003
ller+0.003
lf+0.003
ibrarian+0.003
washer+0.003
Neg logit contributions
 unnecessary-0.003
 lifelong-0.002
Sur-0.002
Got-0.002
 risks-0.002
Token outside
Token ID2354
Feature activation+1.10e-02

Pos logit contributions
 unnecessary+0.004
 lifelong+0.003
 risks+0.003
 chances+0.003
 situations+0.003
Neg logit contributions
pron-0.004
ller-0.004
lf-0.004
ibrarian-0.004
issa-0.003
Token to
Token ID284
Feature activation+8.75e-03

Pos logit contributions
 unnecessary+0.023
 lifelong+0.020
Sur+0.019
 risks+0.018
Got+0.018
Neg logit contributions
pron-0.017
ller-0.015
lf-0.014
lights-0.013
ibrarian-0.013
Token give
Token ID1577
Feature activation+1.23e-03

Pos logit contributions
 unnecessary+0.016
 lifelong+0.013
Sur+0.012
Got+0.012
 risks+0.012
Neg logit contributions
pron-0.016
ller-0.014
lf-0.014
issa-0.013
rant-0.013
Token his
Token ID465
Feature activation+1.07e-02

Pos logit contributions
 unnecessary+0.002
 lifelong+0.002
Got+0.002
Sur+0.002
 situations+0.002
Neg logit contributions
pron-0.002
ller-0.002
lf-0.002
washer-0.002
pa-0.001
Token neighbours
Token ID23788
Feature activation+5.43e-02

Pos logit contributions
 unnecessary+0.017
 lifelong+0.014
Sur+0.013
 risks+0.013
Got+0.013
Neg logit contributions
pron-0.022
ller-0.020
lf-0.019
ibrarian-0.018
lights-0.018
Token a
Token ID257
Feature activation+2.08e-02

Pos logit contributions
 unnecessary+0.098
 lifelong+0.082
Sur+0.077
Got+0.076
 risks+0.074
Neg logit contributions
pron-0.099
ller-0.089
lf-0.083
washer-0.080
ibrarian-0.080
Token big
Token ID1263
Feature activation+1.60e-02

Pos logit contributions
 unnecessary+0.037
 lifelong+0.031
Sur+0.029
Got+0.028
 risks+0.028
Neg logit contributions
pron-0.039
ller-0.035
lf-0.033
washer-0.031
ibrarian-0.031
Token surprise
Token ID5975
Feature activation+3.34e-03

Pos logit contributions
 unnecessary+0.029
 lifelong+0.025
 risks+0.023
 chances+0.022
Sur+0.022
Neg logit contributions
pron-0.028
ller-0.027
lf-0.025
lights-0.024
rant-0.023
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
 unnecessary+0.007
 lifelong+0.006
Sur+0.006
 risks+0.006
 situations+0.006
Neg logit contributions
pron-0.005
ller-0.004
lf-0.004
lights-0.004
issa-0.004
Token\n
Token ID198
Feature activation-3.60e-03

Pos logit contributions
pron+0.017
ller+0.015
lf+0.014
washer+0.014
lights+0.013
Neg logit contributions
 unnecessary-0.021
 lifelong-0.018
 risks-0.017
Got-0.017
Sur-0.016
Token Grand
Token ID5675
Feature activation-2.54e-02

Pos logit contributions
pron+0.020
ller+0.018
rant+0.017
ibrarian+0.016
lf+0.016
Neg logit contributions
 unnecessary-0.024
 lifelong-0.021
Sur-0.020
Got-0.019
 risks-0.019
Tokenmother
Token ID13552
Feature activation+4.64e-03

Pos logit contributions
pron+0.028
ller+0.022
lights+0.021
rant+0.019
 maker+0.019
Neg logit contributions
 unnecessary-0.056
 lifelong-0.054
Sur-0.053
 endings-0.051
 true-0.050
Token would
Token ID561
Feature activation-1.48e-02

Pos logit contributions
 unnecessary+0.009
 lifelong+0.008
Sur+0.008
Got+0.007
 risks+0.007
Neg logit contributions
pron-0.007
ller-0.007
lf-0.006
lights-0.006
mallow-0.006
Token give
Token ID1577
Feature activation-3.31e-03

Pos logit contributions
pron+0.031
ller+0.030
lights+0.028
mallow+0.027
lf+0.027
Neg logit contributions
 unnecessary-0.020
 lifelong-0.017
 belong-0.016
Sur-0.015
 risks-0.015
Token her
Token ID607
Feature activation+1.59e-02

Pos logit contributions
pron+0.006
ller+0.006
lf+0.005
ibrarian+0.005
lights+0.005
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
Sur-0.005
 risks-0.005
Got-0.005
Token granddaughter
Token ID46458
Feature activation+5.42e-02

Pos logit contributions
 unnecessary+0.027
 lifelong+0.023
Sur+0.021
 risks+0.020
Got+0.020
Neg logit contributions
pron-0.031
ller-0.029
lf-0.027
ibrarian-0.025
lights-0.025
Token a
Token ID257
Feature activation+1.31e-02

Pos logit contributions
 unnecessary+0.096
 lifelong+0.081
Got+0.076
Sur+0.075
 situations+0.073
Neg logit contributions
pron-0.100
ller-0.088
lf-0.083
ibrarian-0.081
washer-0.080
Token file
Token ID2393
Feature activation+9.81e-04

Pos logit contributions
 unnecessary+0.023
 lifelong+0.020
Sur+0.018
 risks+0.017
Got+0.017
Neg logit contributions
pron-0.024
ller-0.023
lf-0.022
lights-0.021
mallow-0.020
Token.
Token ID13
Feature activation-9.83e-03

Pos logit contributions
 unnecessary+0.002
 lifelong+0.001
Sur+0.001
 chances+0.001
 belong+0.001
Neg logit contributions
ller-0.002
pron-0.002
lf-0.001
lights-0.001
arge-0.001
Token The
Token ID383
Feature activation-1.68e-02

Pos logit contributions
pron+0.020
ller+0.018
lf+0.018
ibrarian+0.017
washer+0.017
Neg logit contributions
 unnecessary-0.014
 lifelong-0.013
Sur-0.012
Got-0.011
Pe-0.011
Token file
Token ID2393
Feature activation-3.00e-03

Pos logit contributions
pron+0.029
ller+0.028
lights+0.025
lf+0.025
mallow+0.024
Neg logit contributions
 unnecessary-0.031
 lifelong-0.027
Sur-0.025
Got-0.024
 chances-0.024
Token she
Token ID673
Feature activation+1.09e-02

Pos logit contributions
pron+0.018
ller+0.016
lf+0.015
ibrarian+0.014
lights+0.014
Neg logit contributions
 unnecessary-0.019
 lifelong-0.016
Sur-0.015
Got-0.015
 risks-0.015
Token decided
Token ID3066
Feature activation+7.69e-03

Pos logit contributions
 unnecessary+0.017
 lifelong+0.017
Sur+0.016
Got+0.015
 friendships+0.014
Neg logit contributions
pron-0.020
ller-0.018
lights-0.016
rant-0.016
mallow-0.016
Token to
Token ID284
Feature activation+1.28e-02

Pos logit contributions
 unnecessary+0.016
 lifelong+0.014
Sur+0.013
 risks+0.013
Got+0.013
Neg logit contributions
pron-0.012
ller-0.011
lf-0.010
ibrarian-0.009
lights-0.009
Token make
Token ID787
Feature activation+8.76e-04

Pos logit contributions
 unnecessary+0.019
 lifelong+0.016
Got+0.014
Sur+0.014
 risks+0.014
Neg logit contributions
pron-0.026
ller-0.025
lf-0.023
lights-0.023
mallow-0.022
Token her
Token ID607
Feature activation+1.90e-02

Pos logit contributions
 unnecessary+0.002
 lifelong+0.001
Sur+0.001
Got+0.001
 risks+0.001
Neg logit contributions
pron-0.002
ller-0.001
lf-0.001
ibrarian-0.001
lights-0.001
Token mother
Token ID2802
Feature activation+5.40e-02

Pos logit contributions
 unnecessary+0.038
 lifelong+0.032
Sur+0.030
 risks+0.030
Got+0.030
Neg logit contributions
pron-0.032
ller-0.029
lf-0.027
lights-0.025
ibrarian-0.025
Token a
Token ID257
Feature activation+2.32e-02

Pos logit contributions
 unnecessary+0.110
Got+0.089
 lifelong+0.089
Sur+0.089
 situations+0.088
Neg logit contributions
pron-0.089
ller-0.079
lf-0.071
washer-0.068
lights-0.067
Token healthy
Token ID5448
Feature activation+1.18e-02

Pos logit contributions
 unnecessary+0.039
 lifelong+0.032
 situations+0.030
 risks+0.030
Sur+0.030
Neg logit contributions
pron-0.047
ller-0.041
lf-0.038
washer-0.036
ibrarian-0.036
Token snack
Token ID26906
Feature activation-2.80e-03

Pos logit contributions
 unnecessary+0.022
 lifelong+0.019
 risks+0.018
Sur+0.018
Got+0.018
Neg logit contributions
pron-0.020
ller-0.019
lf-0.017
lights-0.016
ibrarian-0.016
Token.
Token ID13
Feature activation+3.17e-03

Pos logit contributions
pron+0.004
ller+0.004
lf+0.004
issa+0.003
mallow+0.003
Neg logit contributions
 unnecessary-0.005
 lifelong-0.005
Sur-0.005
 risks-0.004
Got-0.004
Token She
Token ID1375
Feature activation-8.19e-03

Pos logit contributions
 unnecessary+0.005
 lifelong+0.005
Sur+0.004
Got+0.004
 risks+0.004
Neg logit contributions
pron-0.006
ller-0.006
lf-0.005
mallow-0.005
lights-0.005
Token with
Token ID351
Feature activation+1.03e-02

Pos logit contributions
 unnecessary+0.002
 lifelong+0.002
Sur+0.002
Got+0.002
Pe+0.002
Neg logit contributions
pron-0.001
ller-0.001
lf-0.001
lights-0.001
 maker-0.001
Token his
Token ID465
Feature activation+2.34e-03

Pos logit contributions
 unnecessary+0.021
 lifelong+0.018
 situations+0.017
Sur+0.016
 risks+0.016
Neg logit contributions
pron-0.015
ller-0.013
lf-0.013
lights-0.013
 screen-0.013
Token foot
Token ID2366
Feature activation+7.49e-03

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
Got+0.004
Sur+0.004
 situations+0.004
Neg logit contributions
pron-0.004
ller-0.004
lf-0.003
ibrarian-0.003
mallow-0.003
Token.
Token ID13
Feature activation-4.92e-03

Pos logit contributions
 unnecessary+0.017
 lifelong+0.014
Sur+0.014
 risks+0.014
Got+0.014
Neg logit contributions
pron-0.011
ller-0.009
lf-0.009
ibrarian-0.008
lights-0.008
Token The
Token ID383
Feature activation-8.12e-03

Pos logit contributions
pron+0.009
ller+0.008
lf+0.008
mallow+0.007
lights+0.007
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.007
Got-0.007
 risks-0.007
Token hut
Token ID40812
Feature activation+5.40e-02

Pos logit contributions
pron+0.015
ller+0.013
lf+0.012
ibrarian+0.012
lights+0.011
Neg logit contributions
 unnecessary-0.015
 lifelong-0.013
Sur-0.012
 risks-0.012
Got-0.012
Token fell
Token ID3214
Feature activation-5.41e-04

Pos logit contributions
 unnecessary+0.102
 lifelong+0.087
Sur+0.080
 risks+0.080
Got+0.079
Neg logit contributions
pron-0.095
ller-0.087
lf-0.081
ibrarian-0.076
lights-0.076
Token down
Token ID866
Feature activation+1.24e-02

Pos logit contributions
pron+0.001
ller+0.001
lights+0.001
lf+0.001
washer+0.001
Neg logit contributions
 unnecessary-0.001
 lifelong-0.001
 risks-0.001
Got-0.001
Sur-0.001
Token.
Token ID13
Feature activation-8.84e-03

Pos logit contributions
 unnecessary+0.029
 lifelong+0.026
Sur+0.024
 risks+0.024
Got+0.024
Neg logit contributions
pron-0.016
ller-0.014
lf-0.013
lights-0.012
ibrarian-0.012
Token He
Token ID679
Feature activation+7.59e-04

Pos logit contributions
pron+0.016
ller+0.015
lf+0.014
mallow+0.013
lights+0.013
Neg logit contributions
 unnecessary-0.016
 lifelong-0.014
Sur-0.013
Got-0.013
 risks-0.013
Token was
Token ID373
Feature activation-2.63e-03

Pos logit contributions
 unnecessary+0.001
 lifelong+0.001
Sur+0.001
Got+0.001
 risks+0.001
Neg logit contributions
pron-0.001
ller-0.001
lf-0.001
lights-0.001
ibrarian-0.001
Token able
Token ID1498
Feature activation+2.18e-02

Pos logit contributions
 unnecessary+0.008
Got+0.007
 situations+0.007
Sur+0.007
 lifelong+0.007
Neg logit contributions
pron-0.008
ller-0.007
lf-0.007
ma-0.006
washer-0.006
Token to
Token ID284
Feature activation+1.55e-02

Pos logit contributions
 unnecessary+0.053
Got+0.048
 lifelong+0.047
Sur+0.046
 situations+0.046
Neg logit contributions
pron-0.022
ller-0.020
lf-0.015
ma-0.015
washer-0.015
Token make
Token ID787
Feature activation+1.62e-02

Pos logit contributions
 unnecessary+0.025
 lifelong+0.021
Sur+0.019
 risks+0.019
Got+0.019
Neg logit contributions
pron-0.031
ller-0.029
lf-0.027
ibrarian-0.026
lights-0.026
Token his
Token ID465
Feature activation+1.62e-02

Pos logit contributions
 unnecessary+0.030
Got+0.027
Sur+0.026
Help+0.026
 lifelong+0.025
Neg logit contributions
ller-0.024
pron-0.023
ilda-0.020
lf-0.020
issa-0.019
Token little
Token ID1310
Feature activation+3.80e-02

Pos logit contributions
 unnecessary+0.031
 lifelong+0.027
Sur+0.026
Got+0.026
 risks+0.025
Neg logit contributions
pron-0.028
ller-0.025
lf-0.023
lights-0.021
mallow-0.021
Token sister
Token ID6621
Feature activation+5.38e-02

Pos logit contributions
 unnecessary+0.073
Sur+0.064
Got+0.063
 lifelong+0.061
Pe+0.058
Neg logit contributions
pron-0.060
ller-0.057
 clock-0.050
mallow-0.048
lights-0.047
Token feel
Token ID1254
Feature activation+1.19e-02

Pos logit contributions
 unnecessary+0.115
Got+0.095
 lifelong+0.094
Sur+0.094
 situations+0.091
Neg logit contributions
pron-0.080
ller-0.071
lf-0.063
washer-0.061
lights-0.060
Token better
Token ID1365
Feature activation+4.02e-03

Pos logit contributions
 unnecessary+0.022
Sur+0.018
Got+0.017
 lifelong+0.017
 situations+0.017
Neg logit contributions
pron-0.021
ller-0.018
washer-0.017
rant-0.016
lights-0.016
Token with
Token ID351
Feature activation+1.27e-02

Pos logit contributions
 unnecessary+0.009
 lifelong+0.008
Sur+0.008
Got+0.008
 risks+0.008
Neg logit contributions
pron-0.005
ller-0.005
lf-0.004
lights-0.004
mallow-0.004
Token a
Token ID257
Feature activation+1.47e-02

Pos logit contributions
 unnecessary+0.021
 lifelong+0.018
Sur+0.017
Got+0.017
 risks+0.016
Neg logit contributions
pron-0.024
ller-0.022
lf-0.021
lights-0.020
mallow-0.020
Token hug
Token ID16225
Feature activation+5.83e-03

Pos logit contributions
 unnecessary+0.029
 lifelong+0.024
 risks+0.023
 situations+0.023
Got+0.023
Neg logit contributions
pron-0.026
ller-0.022
lf-0.020
washer-0.019
mallow-0.019
Token or
Token ID393
Feature activation+1.26e-02

Pos logit contributions
pron+0.020
ller+0.019
lf+0.018
arge+0.018
issa+0.017
Neg logit contributions
 unnecessary-0.017
 lifelong-0.015
 risks-0.014
 belong-0.014
 chances-0.014
Token herbs
Token ID25411
Feature activation-9.17e-03

Pos logit contributions
 unnecessary+0.023
 lifelong+0.021
Sur+0.019
Got+0.018
 risks+0.018
Neg logit contributions
pron-0.021
ller-0.019
lf-0.018
 stove-0.017
 clock-0.017
Token that
Token ID326
Feature activation-1.38e-02

Pos logit contributions
pron+0.017
ller+0.016
lf+0.015
arge+0.015
issa+0.014
Neg logit contributions
 unnecessary-0.016
 lifelong-0.013
 risks-0.013
 chances-0.012
Sur-0.012
Token the
Token ID262
Feature activation+6.86e-03

Pos logit contributions
pron+0.027
ller+0.025
lf+0.023
ibrarian+0.022
lights+0.022
Neg logit contributions
 unnecessary-0.024
 lifelong-0.020
Sur-0.018
 risks-0.018
Got-0.018
Token mole
Token ID9411
Feature activation+7.23e-03

Pos logit contributions
 unnecessary+0.012
 lifelong+0.010
Sur+0.009
 risks+0.009
Got+0.009
Neg logit contributions
pron-0.013
ller-0.012
lf-0.012
ibrarian-0.011
lights-0.011
Token doesn
Token ID1595
Feature activation-6.46e-02

Pos logit contributions
 unnecessary+0.013
 lifelong+0.011
 risks+0.010
Sur+0.010
 chances+0.010
Neg logit contributions
pron-0.013
ller-0.012
lf-0.011
lights-0.011
arge-0.010
Token't
Token ID470
Feature activation-5.92e-03

Pos logit contributions
pron+0.109
ller+0.098
lights+0.091
isa+0.090
issa+0.089
Neg logit contributions
 unnecessary-0.117
 lifelong-0.104
Sur-0.098
 belong-0.098
 risks-0.096
Token like
Token ID588
Feature activation-3.76e-02

Pos logit contributions
pron+0.012
ller+0.011
lf+0.011
ibrarian+0.010
rant+0.010
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.007
Got-0.007
 situations-0.007
Token.
Token ID13
Feature activation+6.64e-03

Pos logit contributions
pron+0.067
ller+0.061
lf+0.056
ibrarian+0.053
mallow+0.053
Neg logit contributions
 unnecessary-0.070
 lifelong-0.060
 risks-0.055
Sur-0.055
Got-0.054
Token Or
Token ID1471
Feature activation-1.50e-02

Pos logit contributions
 unnecessary+0.012
 lifelong+0.010
Got+0.009
 risks+0.009
 situations+0.009
Neg logit contributions
pron-0.012
ller-0.011
mallow-0.010
ibrarian-0.010
lights-0.009
Token we
Token ID356
Feature activation+1.36e-03

Pos logit contributions
pron+0.027
ller+0.025
lf+0.022
ibrarian+0.022
lights+0.021
Neg logit contributions
 unnecessary-0.027
 lifelong-0.024
Sur-0.022
Got-0.022
 risks-0.021
Token mad
Token ID8805
Feature activation+3.67e-03

Pos logit contributions
pron+0.010
ller+0.009
lf+0.008
issa+0.008
ibrarian+0.008
Neg logit contributions
 unnecessary-0.012
 lifelong-0.011
Got-0.010
Sur-0.010
 situations-0.010
Token at
Token ID379
Feature activation-1.51e-02

Pos logit contributions
 unnecessary+0.007
 lifelong+0.006
Got+0.006
Sur+0.006
 risks+0.006
Neg logit contributions
pron-0.006
ller-0.006
lf-0.005
ibrarian-0.005
lights-0.005
Token us
Token ID514
Feature activation-1.20e-02

Pos logit contributions
pron+0.034
ller+0.030
lights+0.028
ibrarian+0.028
lf+0.028
Neg logit contributions
 unnecessary-0.023
 lifelong-0.017
 risks-0.017
 situations-0.016
 friendships-0.016
Token.
Token ID13
Feature activation+5.69e-03

Pos logit contributions
pron+0.021
ller+0.018
lf+0.017
mallow+0.016
lights+0.016
Neg logit contributions
 unnecessary-0.023
 lifelong-0.019
 risks-0.019
 chances-0.018
 situations-0.018
Token Please
Token ID4222
Feature activation-1.76e-02

Pos logit contributions
 unnecessary+0.009
 lifelong+0.008
Got+0.007
 risks+0.007
Sur+0.007
Neg logit contributions
pron-0.011
ller-0.010
lf-0.009
ibrarian-0.009
mallow-0.009
Token don
Token ID836
Feature activation-6.21e-02

Pos logit contributions
pron+0.040
ller+0.037
lights+0.036
lf+0.035
arge+0.034
Neg logit contributions
 unnecessary-0.023
 risks-0.017
 lifelong-0.017
 belong-0.017
Sur-0.017
Token't
Token ID470
Feature activation-9.16e-03

Pos logit contributions
pron+0.120
isa+0.100
rant+0.097
ller+0.096
lights+0.092
Neg logit contributions
 unnecessary-0.098
Sur-0.091
 lifelong-0.087
 Leave-0.086
 belong-0.081
Token hurt
Token ID5938
Feature activation-1.22e-02

Pos logit contributions
pron+0.019
ller+0.018
lf+0.017
mallow+0.016
ibrarian+0.016
Neg logit contributions
 unnecessary-0.014
 lifelong-0.012
Sur-0.010
Got-0.010
 risks-0.010
Token us
Token ID514
Feature activation-5.64e-03

Pos logit contributions
pron+0.020
ller+0.019
lf+0.017
issa+0.016
lights+0.016
Neg logit contributions
 unnecessary-0.023
 lifelong-0.021
Got-0.019
Sur-0.019
 risks-0.018
Token."
Token ID526
Feature activation-8.18e-03

Pos logit contributions
pron+0.009
ller+0.008
lf+0.008
ibrarian+0.007
lights+0.007
Neg logit contributions
 unnecessary-0.012
 lifelong-0.010
 risks-0.009
 chances-0.009
Sur-0.009
Token\n
Token ID198
Feature activation-1.29e-03

Pos logit contributions
pron+0.015
ller+0.014
lf+0.013
ibrarian+0.012
lights+0.012
Neg logit contributions
 unnecessary-0.015
 lifelong-0.013
Sur-0.012
 risks-0.012
Got-0.012
Token has
Token ID468
Feature activation-7.01e-03

Pos logit contributions
pron+0.031
ller+0.028
lf+0.027
lights+0.026
mallow+0.025
Neg logit contributions
 unnecessary-0.022
 lifelong-0.018
Sur-0.016
 risks-0.016
Got-0.016
Token big
Token ID1263
Feature activation+1.62e-04

Pos logit contributions
pron+0.012
ller+0.011
lf+0.010
ibrarian+0.009
lights+0.009
Neg logit contributions
 unnecessary-0.013
 lifelong-0.012
Got-0.011
Sur-0.011
 situations-0.010
Token words
Token ID2456
Feature activation-1.18e-02

Pos logit contributions
 unnecessary+0.000
 lifelong+0.000
 risks+0.000
 situations+0.000
Sur+0.000
Neg logit contributions
pron-0.000
ller-0.000
lf-0.000
mallow-0.000
washer-0.000
Token that
Token ID326
Feature activation-1.62e-02

Pos logit contributions
pron+0.021
ller+0.018
lf+0.018
arge+0.017
isa+0.017
Neg logit contributions
 unnecessary-0.020
 lifelong-0.017
 risks-0.017
Sur-0.016
 belong-0.016
Token I
Token ID314
Feature activation-2.00e-02

Pos logit contributions
pron+0.030
ller+0.027
lf+0.025
lights+0.024
mallow+0.024
Neg logit contributions
 unnecessary-0.029
 lifelong-0.024
 risks-0.022
Sur-0.022
Got-0.022
Token don
Token ID836
Feature activation-6.18e-02

Pos logit contributions
pron+0.047
lights+0.045
ller+0.045
lf+0.044
arge+0.040
Neg logit contributions
 unnecessary-0.023
 belong-0.019
 lifelong-0.018
 risks-0.017
Got-0.017
Token't
Token ID470
Feature activation-1.28e-02

Pos logit contributions
pron+0.122
isa+0.103
rant+0.101
ller+0.099
lights+0.098
Neg logit contributions
 unnecessary-0.088
Sur-0.081
 belong-0.078
 Leave-0.077
 lifelong-0.076
Token know
Token ID760
Feature activation-1.70e-02

Pos logit contributions
pron+0.032
lights+0.031
ller+0.030
lf+0.029
mallow+0.029
Neg logit contributions
 unnecessary-0.014
 belong-0.011
 risks-0.010
 lifelong-0.009
 chances-0.009
Token.
Token ID13
Feature activation-4.76e-03

Pos logit contributions
pron+0.028
ller+0.025
lf+0.023
mallow+0.022
lights+0.022
Neg logit contributions
 unnecessary-0.034
 lifelong-0.028
 risks-0.027
Sur-0.027
 situations-0.026
Token But
Token ID887
Feature activation-5.00e-03

Pos logit contributions
pron+0.009
ller+0.008
lf+0.008
lights+0.007
ibrarian+0.007
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
Sur-0.006
 risks-0.006
Got-0.006
Token we
Token ID356
Feature activation-4.90e-03

Pos logit contributions
pron+0.009
ller+0.008
lf+0.008
ibrarian+0.007
lights+0.007
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.007
 risks-0.007
Got-0.007
Token naughty
Token ID45128
Feature activation+2.98e-03

Pos logit contributions
pron+0.005
ller+0.004
lf+0.004
lights+0.004
mallow+0.004
Neg logit contributions
 unnecessary-0.004
 lifelong-0.003
 risks-0.003
Sur-0.003
 chances-0.003
Token and
Token ID290
Feature activation+4.71e-03

Pos logit contributions
 unnecessary+0.007
 lifelong+0.006
Sur+0.006
 risks+0.006
Got+0.006
Neg logit contributions
pron-0.004
ller-0.004
lf-0.003
ibrarian-0.003
lights-0.003
Token bad
Token ID2089
Feature activation-4.84e-03

Pos logit contributions
 unnecessary+0.007
 lifelong+0.005
 risks+0.005
Sur+0.005
 chances+0.005
Neg logit contributions
pron-0.010
ller-0.009
lf-0.009
ibrarian-0.009
lights-0.009
Token.
Token ID13
Feature activation+7.52e-03

Pos logit contributions
pron+0.008
ller+0.007
lf+0.006
lights+0.006
ibrarian+0.006
Neg logit contributions
 unnecessary-0.010
 lifelong-0.009
Got-0.008
Sur-0.008
 risks-0.008
Token Please
Token ID4222
Feature activation-1.74e-02

Pos logit contributions
 unnecessary+0.013
 lifelong+0.011
Got+0.009
 risks+0.009
Sur+0.009
Neg logit contributions
pron-0.014
ller-0.014
lf-0.012
ibrarian-0.012
issa-0.012
Token don
Token ID836
Feature activation-6.15e-02

Pos logit contributions
pron+0.040
ller+0.038
lights+0.037
lf+0.035
issa+0.035
Neg logit contributions
 unnecessary-0.022
 risks-0.017
 lifelong-0.016
 belong-0.016
Sur-0.016
Token't
Token ID470
Feature activation-1.13e-02

Pos logit contributions
pron+0.119
isa+0.100
rant+0.097
ller+0.095
lights+0.092
Neg logit contributions
 unnecessary-0.096
Sur-0.090
 lifelong-0.085
 Leave-0.085
 belong-0.080
Token be
Token ID307
Feature activation-6.19e-03

Pos logit contributions
pron+0.024
ller+0.023
lf+0.021
ibrarian+0.020
lights+0.020
Neg logit contributions
 unnecessary-0.017
 lifelong-0.014
Sur-0.012
 risks-0.012
Got-0.012
Token mad
Token ID8805
Feature activation+3.67e-03

Pos logit contributions
pron+0.010
ller+0.009
lf+0.008
issa+0.008
ibrarian+0.008
Neg logit contributions
 unnecessary-0.012
 lifelong-0.011
Got-0.010
Sur-0.010
 situations-0.010
Token at
Token ID379
Feature activation-1.51e-02

Pos logit contributions
 unnecessary+0.007
 lifelong+0.006
Got+0.006
Sur+0.006
 risks+0.006
Neg logit contributions
pron-0.006
ller-0.006
lf-0.005
ibrarian-0.005
lights-0.005
Token us
Token ID514
Feature activation-1.20e-02

Pos logit contributions
pron+0.034
ller+0.030
lights+0.028
ibrarian+0.028
lf+0.028
Neg logit contributions
 unnecessary-0.023
 lifelong-0.017
 risks-0.017
 situations-0.016
 friendships-0.016
Token.
Token ID13
Feature activation-2.11e-02

Pos logit contributions
 unnecessary+0.014
 lifelong+0.012
 risks+0.012
Sur+0.012
 situations+0.011
Neg logit contributions
pron-0.011
ller-0.010
lf-0.009
lights-0.008
rant-0.008
Token\n
Token ID198
Feature activation-6.09e-03

Pos logit contributions
pron+0.044
ller+0.040
lf+0.039
lights+0.037
rant+0.036
Neg logit contributions
 unnecessary-0.032
 lifelong-0.028
Sur-0.027
Got-0.025
Pe-0.024
Token\n
Token ID198
Feature activation-8.31e-05

Pos logit contributions
pron+0.014
ller+0.012
lf+0.012
lights+0.012
rant+0.012
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
Sur-0.007
Got-0.006
Pe-0.006
Token"
Token ID1
Feature activation+1.86e-03

Pos logit contributions
pron+0.000
ller+0.000
lf+0.000
ibrarian+0.000
isa+0.000
Neg logit contributions
 unnecessary-0.000
 lifelong-0.000
 risks-0.000
 chances-0.000
 situations-0.000
TokenI
Token ID40
Feature activation-1.89e-02

Pos logit contributions
 unnecessary+0.003
 lifelong+0.003
 situations+0.003
 risks+0.003
 chances+0.003
Neg logit contributions
pron-0.003
ller-0.003
lf-0.003
ibrarian-0.003
mallow-0.003
Token don
Token ID836
Feature activation-6.11e-02

Pos logit contributions
pron+0.038
ller+0.035
ibrarian+0.032
lf+0.031
mallow+0.031
Neg logit contributions
 unnecessary-0.031
 lifelong-0.025
 situations-0.024
Sur-0.024
Got-0.023
Token't
Token ID470
Feature activation-1.59e-02

Pos logit contributions
pron+0.127
rant+0.106
isa+0.103
roach+0.103
lights+0.103
Neg logit contributions
 unnecessary-0.083
Sur-0.080
 Leave-0.078
 lifelong-0.076
 belong-0.072
Token know
Token ID760
Feature activation-1.28e-02

Pos logit contributions
pron+0.035
ller+0.033
lf+0.031
ibrarian+0.030
mallow+0.030
Neg logit contributions
 unnecessary-0.022
 lifelong-0.018
Sur-0.016
Got-0.016
 risks-0.015
Token,"
Token ID553
Feature activation+3.04e-03

Pos logit contributions
pron+0.018
ller+0.016
lf+0.014
ibrarian+0.013
lights+0.013
Neg logit contributions
 unnecessary-0.028
 lifelong-0.025
Sur-0.023
 risks-0.023
Got-0.023
Token Ben
Token ID3932
Feature activation-2.68e-03

Pos logit contributions
 unnecessary+0.006
 lifelong+0.005
Sur+0.004
Got+0.004
 risks+0.004
Neg logit contributions
pron-0.005
ller-0.005
lf-0.005
lights-0.004
ibrarian-0.004
Token says
Token ID1139
Feature activation-5.42e-03

Pos logit contributions
pron+0.005
ller+0.004
lf+0.004
lights+0.004
mallow+0.003
Neg logit contributions
 unnecessary-0.005
 lifelong-0.005
 risks-0.004
Sur-0.004
 chances-0.004
Token wanted
Token ID2227
Feature activation-8.60e-03

Pos logit contributions
pron+0.010
ller+0.009
lf+0.008
lights+0.008
washer+0.008
Neg logit contributions
 unnecessary-0.009
 lifelong-0.007
Got-0.007
Sur-0.007
 risks-0.007
Token to
Token ID284
Feature activation+1.88e-02

Pos logit contributions
pron+0.014
ller+0.013
lf+0.012
lights+0.011
ibrarian+0.011
Neg logit contributions
 unnecessary-0.017
 lifelong-0.015
Sur-0.014
 risks-0.014
Got-0.014
Token touch
Token ID3638
Feature activation+9.10e-03

Pos logit contributions
 unnecessary+0.033
 lifelong+0.028
Sur+0.026
 situations+0.025
Got+0.025
Neg logit contributions
pron-0.034
ller-0.033
lf-0.030
ibrarian-0.028
 clock-0.027
Token the
Token ID262
Feature activation+3.67e-03

Pos logit contributions
 unnecessary+0.017
 lifelong+0.014
Sur+0.013
Got+0.013
 risks+0.013
Neg logit contributions
pron-0.016
ller-0.015
lf-0.014
lights-0.013
mallow-0.013
Token ost
Token ID23619
Feature activation-1.27e-02

Pos logit contributions
 unnecessary+0.006
 lifelong+0.005
 risks+0.004
Sur+0.004
Got+0.004
Neg logit contributions
pron-0.008
ller-0.007
lf-0.007
lights-0.007
ibrarian-0.007
Tokenric
Token ID1173
Feature activation-6.07e-02

Pos logit contributions
pron+0.027
lf+0.024
ibrarian+0.022
chair+0.022
issa+0.022
Neg logit contributions
 unnecessary-0.019
 lifelong-0.015
 risks-0.014
 friendships-0.014
 situations-0.013
Tokenhes
Token ID956
Feature activation+6.92e-03

Pos logit contributions
pron+0.086
ller+0.078
lights+0.068
ci+0.066
issa+0.065
Neg logit contributions
 unnecessary-0.115
Sur-0.107
 lifelong-0.101
 risks-0.099
 belong-0.098
Token.
Token ID13
Feature activation+1.47e-02

Pos logit contributions
 unnecessary+0.014
 lifelong+0.013
 risks+0.012
Sur+0.012
Got+0.011
Neg logit contributions
pron-0.011
ller-0.010
lf-0.009
issa-0.008
ibrarian-0.008
Token They
Token ID1119
Feature activation-2.20e-03

Pos logit contributions
 unnecessary+0.033
 lifelong+0.029
 risks+0.027
 situations+0.026
 chances+0.026
Neg logit contributions
pron-0.019
ller-0.018
lights-0.016
arge-0.015
ibrarian-0.015
Token reached
Token ID4251
Feature activation+1.66e-02

Pos logit contributions
pron+0.004
ller+0.004
lf+0.004
lights+0.004
washer+0.003
Neg logit contributions
 unnecessary-0.004
Got-0.003
 lifelong-0.003
Sur-0.003
 risks-0.003
Token out
Token ID503
Feature activation+1.32e-02

Pos logit contributions
 unnecessary+0.022
 lifelong+0.018
 risks+0.015
 chances+0.015
 situations+0.014
Neg logit contributions
pron-0.040
ller-0.037
lf-0.035
lights-0.032
washer-0.032
Token tells
Token ID4952
Feature activation-1.64e-02

Pos logit contributions
 unnecessary+0.030
 lifelong+0.026
Sur+0.024
 risks+0.024
Got+0.024
Neg logit contributions
pron-0.021
ller-0.019
lf-0.017
ibrarian-0.016
mallow-0.016
Token her
Token ID607
Feature activation+6.19e-03

Pos logit contributions
pron+0.030
ller+0.028
lf+0.025
lights+0.024
mallow+0.024
Neg logit contributions
 unnecessary-0.030
 lifelong-0.025
 risks-0.023
Sur-0.023
Got-0.023
Token the
Token ID262
Feature activation+9.65e-03

Pos logit contributions
 unnecessary+0.012
 lifelong+0.010
Sur+0.009
 risks+0.009
Got+0.009
Neg logit contributions
pron-0.011
ller-0.010
lf-0.009
mallow-0.008
lights-0.008
Token ones
Token ID3392
Feature activation-1.09e-02

Pos logit contributions
 unnecessary+0.017
 lifelong+0.014
 risks+0.013
Sur+0.013
Got+0.013
Neg logit contributions
pron-0.018
ller-0.017
lf-0.016
lights-0.015
ibrarian-0.015
Token she
Token ID673
Feature activation+7.72e-03

Pos logit contributions
pron+0.017
ller+0.015
lf+0.013
lights+0.012
mallow+0.012
Neg logit contributions
 unnecessary-0.023
 lifelong-0.020
 risks-0.019
Sur-0.019
 chances-0.018
Token doesn
Token ID1595
Feature activation-6.03e-02

Pos logit contributions
 unnecessary+0.013
 lifelong+0.010
Got+0.010
 risks+0.010
Sur+0.010
Neg logit contributions
pron-0.016
ller-0.014
lights-0.014
lf-0.014
mallow-0.013
Token't
Token ID470
Feature activation-8.18e-06

Pos logit contributions
pron+0.102
ller+0.091
lights+0.086
lf+0.083
issa+0.082
Neg logit contributions
 unnecessary-0.107
 lifelong-0.097
Sur-0.089
 belong-0.089
 risks-0.087
Token know
Token ID760
Feature activation-1.72e-02

Feature not active on prev token; no predictions were affected.
Token.
Token ID13
Feature activation-7.25e-03

Pos logit contributions
pron+0.028
ller+0.025
lf+0.023
ibrarian+0.022
washer+0.021
Neg logit contributions
 unnecessary-0.034
 lifelong-0.030
 risks-0.028
Sur-0.027
 situations-0.027
Token They
Token ID1119
Feature activation+3.87e-03

Pos logit contributions
pron+0.013
ller+0.012
lf+0.011
lights+0.011
ibrarian+0.011
Neg logit contributions
 unnecessary-0.013
 lifelong-0.011
Sur-0.010
Got-0.010
 risks-0.010
Token have
Token ID423
Feature activation+3.99e-03

Pos logit contributions
 unnecessary+0.006
 lifelong+0.005
 risks+0.004
Got+0.004
 chances+0.004
Neg logit contributions
pron-0.009
ller-0.008
lf-0.007
lights-0.007
mallow-0.007
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
pron+0.017
ller+0.015
lights+0.014
lf+0.014
roach+0.013
Neg logit contributions
 unnecessary-0.017
 lifelong-0.015
 risks-0.014
 chances-0.014
 situations-0.013
Token
Token ID198
Feature activation+1.61e-03

Pos logit contributions
pron+0.028
ller+0.024
lf+0.024
lights+0.023
rant+0.023
Neg logit contributions
 unnecessary-0.020
Sur-0.018
 lifelong-0.017
Pe-0.017
Got-0.017
Token
Token ID198
Feature activation+7.31e-03

Pos logit contributions
 unnecessary+0.002
 lifelong+0.002
Sur+0.002
Got+0.002
 risks+0.002
Neg logit contributions
pron-0.003
ller-0.003
lf-0.003
lights-0.003
mallow-0.003
Token"
Token ID1
Feature activation-1.19e-03

Pos logit contributions
 unnecessary+0.016
 lifelong+0.013
 risks+0.012
 situations+0.012
 friendships+0.012
Neg logit contributions
pron-0.012
ller-0.011
lf-0.010
ibrarian-0.009
arge-0.009
TokenI
Token ID40
Feature activation-2.06e-02

Pos logit contributions
pron+0.002
ller+0.002
lf+0.002
lights+0.002
ibrarian+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Sur-0.001
Got-0.001
 risks-0.001
Token don
Token ID836
Feature activation-6.03e-02

Pos logit contributions
pron+0.044
ller+0.041
lf+0.039
ibrarian+0.037
lights+0.037
Neg logit contributions
 unnecessary-0.031
 lifelong-0.025
Sur-0.022
 risks-0.022
Got-0.022
Token't
Token ID470
Feature activation-9.38e-03

Pos logit contributions
pron+0.125
rant+0.112
lights+0.108
roach+0.105
isa+0.105
Neg logit contributions
Sur-0.073
 unnecessary-0.073
 Leave-0.068
 lifelong-0.067
 true-0.063
Token know
Token ID760
Feature activation-9.12e-03

Pos logit contributions
pron+0.026
ller+0.024
lights+0.023
lf+0.023
mallow+0.022
Neg logit contributions
 unnecessary-0.009
 lifelong-0.006
 risks-0.006
Sur-0.005
 chances-0.005
Token.
Token ID13
Feature activation-1.02e-02

Pos logit contributions
pron+0.015
ller+0.013
lf+0.011
mallow+0.011
rant+0.011
Neg logit contributions
 unnecessary-0.019
 lifelong-0.016
 risks-0.016
Sur-0.015
 situations-0.015
Token Maybe
Token ID6674
Feature activation-1.27e-02

Pos logit contributions
pron+0.021
ller+0.019
lf+0.018
mallow+0.017
lights+0.017
Neg logit contributions
 unnecessary-0.016
 lifelong-0.013
Sur-0.013
 risks-0.012
Got-0.012
Token it
Token ID340
Feature activation-1.41e-02

Pos logit contributions
pron+0.021
ller+0.019
lf+0.018
lights+0.017
issa+0.017
Neg logit contributions
 unnecessary-0.025
 lifelong-0.022
Got-0.020
Sur-0.020
 risks-0.020
Token Lily
Token ID20037
Feature activation-1.86e-03

Pos logit contributions
pron+0.003
ller+0.003
lf+0.002
mallow+0.002
rant+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Sur-0.002
Got-0.001
 risks-0.001
Token says
Token ID1139
Feature activation+1.48e-03

Pos logit contributions
pron+0.004
lf+0.003
lights+0.003
ller+0.003
mallow+0.003
Neg logit contributions
Sur-0.002
 unnecessary-0.002
 lifelong-0.002
Got-0.002
 chances-0.002
Token,
Token ID11
Feature activation-1.07e-02

Pos logit contributions
 unnecessary+0.002
 lifelong+0.002
Sur+0.001
 risks+0.001
 chances+0.001
Neg logit contributions
pron-0.003
ller-0.003
lf-0.003
rant-0.003
lights-0.003
Token "
Token ID366
Feature activation+6.08e-04

Pos logit contributions
pron+0.020
lights+0.019
oons+0.019
ller+0.018
ibrarian+0.018
Neg logit contributions
Sur-0.014
 unnecessary-0.014
 lifelong-0.013
Got-0.013
Pe-0.012
TokenI
Token ID40
Feature activation-1.23e-02

Pos logit contributions
 unnecessary+0.001
 lifelong+0.001
Sur+0.001
 risks+0.001
Got+0.001
Neg logit contributions
pron-0.001
ller-0.001
lf-0.001
lights-0.001
ibrarian-0.001
Token don
Token ID836
Feature activation-6.02e-02

Pos logit contributions
pron+0.025
ller+0.023
lf+0.021
ibrarian+0.020
mallow+0.020
Neg logit contributions
 unnecessary-0.020
 lifelong-0.017
Sur-0.015
 risks-0.015
Got-0.015
Token't
Token ID470
Feature activation-1.26e-03

Pos logit contributions
pron+0.114
rant+0.097
isa+0.097
lights+0.095
ller+0.095
Neg logit contributions
 unnecessary-0.085
Sur-0.084
 lifelong-0.079
 Leave-0.078
 belong-0.073
Token know
Token ID760
Feature activation-1.21e-02

Pos logit contributions
pron+0.003
ller+0.003
lf+0.003
lights+0.003
ibrarian+0.003
Neg logit contributions
 unnecessary-0.002
 lifelong-0.001
 risks-0.001
Sur-0.001
Got-0.001
Token,
Token ID11
Feature activation-1.77e-02

Pos logit contributions
pron+0.021
ller+0.019
lf+0.017
ibrarian+0.016
lights+0.016
Neg logit contributions
 unnecessary-0.023
 lifelong-0.020
 risks-0.018
Sur-0.018
Got-0.018
Token Ben
Token ID3932
Feature activation+9.37e-03

Pos logit contributions
pron+0.032
ller+0.029
lf+0.027
lights+0.026
ibrarian+0.026
Neg logit contributions
 unnecessary-0.032
 lifelong-0.027
Sur-0.025
 risks-0.025
Got-0.025
Token.
Token ID13
Feature activation-3.69e-03

Pos logit contributions
 unnecessary+0.020
 lifelong+0.018
Sur+0.017
 risks+0.017
Got+0.016
Neg logit contributions
pron-0.014
ller-0.012
lf-0.011
lights-0.010
ibrarian-0.010
Token.
Token ID13
Feature activation-2.15e-02

Pos logit contributions
 unnecessary+0.006
 lifelong+0.005
Sur+0.005
 chances+0.004
 risks+0.004
Neg logit contributions
pron-0.006
ller-0.005
lf-0.004
mallow-0.004
rant-0.004
Token
Token ID198
Feature activation-3.32e-03

Pos logit contributions
pron+0.046
ller+0.041
mallow+0.039
lf+0.038
rant+0.038
Neg logit contributions
 unnecessary-0.028
 lifelong-0.027
Sur-0.027
Got-0.024
Help-0.024
Token
Token ID198
Feature activation-4.89e-03

Pos logit contributions
pron+0.007
ller+0.007
mallow+0.006
lf+0.006
rant+0.006
Neg logit contributions
 unnecessary-0.005
 lifelong-0.004
Sur-0.004
Got-0.004
Pe-0.003
Token"
Token ID1
Feature activation+4.80e-03

Pos logit contributions
pron+0.009
ller+0.009
lf+0.008
ibrarian+0.008
lights+0.007
Neg logit contributions
 unnecessary-0.009
 lifelong-0.007
 risks-0.007
Got-0.007
Sur-0.007
TokenI
Token ID40
Feature activation-1.51e-02

Pos logit contributions
 unnecessary+0.009
 lifelong+0.007
 risks+0.007
 situations+0.006
 chances+0.006
Neg logit contributions
pron-0.009
ller-0.009
lf-0.008
ibrarian-0.008
washer-0.008
Token don
Token ID836
Feature activation-6.02e-02

Pos logit contributions
pron+0.029
ller+0.026
ibrarian+0.025
issa+0.025
lf+0.025
Neg logit contributions
 unnecessary-0.027
 lifelong-0.020
 situations-0.020
 risks-0.019
 friendships-0.018
Token't
Token ID470
Feature activation-1.28e-02

Pos logit contributions
pron+0.127
rant+0.111
isa+0.109
roach+0.107
lights+0.107
Neg logit contributions
Sur-0.070
 unnecessary-0.070
 Leave-0.069
 lifelong-0.067
 belong-0.064
Token know
Token ID760
Feature activation-1.20e-02

Pos logit contributions
pron+0.031
ller+0.029
lf+0.027
lights+0.026
mallow+0.026
Neg logit contributions
 unnecessary-0.016
 lifelong-0.013
 risks-0.011
Sur-0.011
Got-0.011
Token.
Token ID13
Feature activation-6.46e-03

Pos logit contributions
pron+0.018
ller+0.016
lf+0.014
ibrarian+0.014
mallow+0.014
Neg logit contributions
 unnecessary-0.026
 lifelong-0.022
 risks-0.021
Sur-0.021
Got-0.021
Token Maybe
Token ID6674
Feature activation-2.22e-02

Pos logit contributions
pron+0.013
ller+0.012
lf+0.011
lights+0.011
ibrarian+0.011
Neg logit contributions
 unnecessary-0.011
 lifelong-0.009
Sur-0.008
 risks-0.008
Got-0.008
Token they
Token ID484
Feature activation+5.29e-03

Pos logit contributions
pron+0.036
ller+0.034
lf+0.031
lights+0.029
ibrarian+0.029
Neg logit contributions
 unnecessary-0.044
 lifelong-0.038
Got-0.035
Sur-0.035
 risks-0.035
Token?"
Token ID1701
Feature activation-2.14e-02

Pos logit contributions
pron+0.018
ller+0.017
issa+0.014
arge+0.014
lf+0.014
Neg logit contributions
 unnecessary-0.036
 lifelong-0.033
 chances-0.030
Sur-0.029
 risks-0.029
Token
Token ID198
Feature activation-1.37e-02

Pos logit contributions
pron+0.051
lights+0.046
rant+0.044
lf+0.043
arge+0.043
Neg logit contributions
 unnecessary-0.025
Sur-0.022
 lifelong-0.022
Help-0.019
Got-0.019
Token
Token ID198
Feature activation-9.07e-03

Pos logit contributions
pron+0.031
ller+0.028
lights+0.027
rant+0.027
mallow+0.027
Neg logit contributions
 unnecessary-0.016
 lifelong-0.015
Sur-0.015
Pe-0.013
Got-0.013
Token"
Token ID1
Feature activation+9.44e-03

Pos logit contributions
pron+0.017
ller+0.016
lf+0.015
ibrarian+0.014
mallow+0.014
Neg logit contributions
 unnecessary-0.016
 lifelong-0.013
 risks-0.012
 situations-0.012
 chances-0.012
TokenI
Token ID40
Feature activation-1.84e-02

Pos logit contributions
 unnecessary+0.016
 lifelong+0.013
 risks+0.012
Sur+0.012
 situations+0.012
Neg logit contributions
pron-0.018
ller-0.017
lf-0.016
ibrarian-0.015
lights-0.015
Token don
Token ID836
Feature activation-6.02e-02

Pos logit contributions
pron+0.042
ller+0.039
lf+0.037
lights+0.036
ibrarian+0.035
Neg logit contributions
 unnecessary-0.025
 lifelong-0.021
Sur-0.018
 risks-0.018
Got-0.018
Token't
Token ID470
Feature activation-1.06e-02

Pos logit contributions
pron+0.118
rant+0.100
isa+0.099
lights+0.098
ller+0.096
Neg logit contributions
 unnecessary-0.088
Sur-0.081
 lifelong-0.080
 Leave-0.080
 belong-0.073
Token know
Token ID760
Feature activation-7.58e-03

Pos logit contributions
pron+0.027
ller+0.025
lf+0.024
lights+0.023
mallow+0.023
Neg logit contributions
 unnecessary-0.012
 lifelong-0.009
 risks-0.008
Sur-0.008
 chances-0.008
Token,
Token ID11
Feature activation-8.59e-03

Pos logit contributions
pron+0.012
ller+0.011
lf+0.010
mallow+0.009
ibrarian+0.009
Neg logit contributions
 unnecessary-0.016
 lifelong-0.013
 risks-0.013
Sur-0.013
 chances-0.013
Token but
Token ID475
Feature activation-2.11e-03

Pos logit contributions
pron+0.016
ller+0.014
lf+0.013
lights+0.012
ibrarian+0.012
Neg logit contributions
 unnecessary-0.016
 lifelong-0.013
Sur-0.012
Got-0.012
 risks-0.012
Token we
Token ID356
Feature activation-5.94e-03

Pos logit contributions
pron+0.004
ller+0.004
lf+0.003
ibrarian+0.003
lights+0.003
Neg logit contributions
 unnecessary-0.004
 lifelong-0.003
Sur-0.003
 risks-0.003
Got-0.003
Token is
Token ID318
Feature activation-1.63e-02

Pos logit contributions
pron+0.011
ller+0.010
lf+0.010
issa+0.009
arge+0.009
Neg logit contributions
 unnecessary-0.009
 lifelong-0.007
 belong-0.007
 risks-0.007
 situations-0.007
Token green
Token ID4077
Feature activation-1.55e-02

Pos logit contributions
pron+0.031
ller+0.029
lf+0.028
ibrarian+0.027
issa+0.027
Neg logit contributions
 unnecessary-0.028
 lifelong-0.024
 risks-0.021
 belong-0.020
 chances-0.020
Token.
Token ID13
Feature activation-6.50e-03

Pos logit contributions
pron+0.028
ller+0.024
lf+0.023
ibrarian+0.022
issa+0.022
Neg logit contributions
 unnecessary-0.028
 lifelong-0.024
 risks-0.024
 situations-0.023
 chances-0.023
Token But
Token ID887
Feature activation-9.70e-03

Pos logit contributions
pron+0.014
ller+0.013
lf+0.012
lights+0.012
mallow+0.012
Neg logit contributions
 unnecessary-0.010
 lifelong-0.008
Sur-0.007
Got-0.007
 risks-0.007
Token I
Token ID314
Feature activation-2.07e-02

Pos logit contributions
pron+0.019
ller+0.017
lf+0.016
ibrarian+0.015
lights+0.015
Neg logit contributions
 unnecessary-0.017
 lifelong-0.014
 risks-0.013
Sur-0.013
Got-0.013
Token don
Token ID836
Feature activation-6.02e-02

Pos logit contributions
pron+0.048
ller+0.046
lf+0.044
lights+0.044
 timer+0.041
Neg logit contributions
 unnecessary-0.027
 lifelong-0.022
 risks-0.020
 belong-0.019
 chances-0.019
Token't
Token ID470
Feature activation-8.71e-03

Pos logit contributions
pron+0.121
lights+0.104
rant+0.103
isa+0.100
ller+0.097
Neg logit contributions
 unnecessary-0.083
Sur-0.079
 lifelong-0.076
 Leave-0.075
Pe-0.072
Token know
Token ID760
Feature activation-1.41e-02

Pos logit contributions
pron+0.022
ller+0.020
lf+0.020
lights+0.020
mallow+0.019
Neg logit contributions
 unnecessary-0.010
 lifelong-0.008
 risks-0.007
Sur-0.007
 chances-0.007
Token everything
Token ID2279
Feature activation-5.77e-03

Pos logit contributions
pron+0.026
ller+0.023
lf+0.022
ibrarian+0.021
lights+0.020
Neg logit contributions
 unnecessary-0.026
 lifelong-0.022
 risks-0.021
 chances-0.020
 situations-0.020
Token."
Token ID526
Feature activation-5.70e-03

Pos logit contributions
pron+0.011
ller+0.009
lf+0.009
arge+0.009
mallow+0.008
Neg logit contributions
 unnecessary-0.010
 lifelong-0.009
 risks-0.008
 chances-0.008
 belong-0.008
Token
Token ID198
Feature activation-2.55e-03

Pos logit contributions
pron+0.011
ller+0.010
lf+0.010
ibrarian+0.009
lights+0.009
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.007
 risks-0.007
Got-0.007
Token will
Token ID481
Feature activation-1.94e-02

Pos logit contributions
 unnecessary+0.022
 lifelong+0.018
 risks+0.017
 chances+0.016
 belong+0.016
Neg logit contributions
pron-0.028
ller-0.026
lf-0.025
lights-0.024
arge-0.023
Token be
Token ID307
Feature activation+7.44e-03

Pos logit contributions
pron+0.033
ller+0.030
lf+0.027
lights+0.026
mallow+0.026
Neg logit contributions
 unnecessary-0.038
 lifelong-0.033
Sur-0.030
 risks-0.030
Got-0.030
Token worried
Token ID7960
Feature activation-1.59e-02

Pos logit contributions
 unnecessary+0.014
Got+0.012
 situations+0.012
 endings+0.012
 lifelong+0.012
Neg logit contributions
pron-0.012
ller-0.010
hes-0.010
ma-0.010
 oven-0.009
Token if
Token ID611
Feature activation-1.11e-02

Pos logit contributions
pron+0.031
ller+0.027
mallow+0.025
ibrarian+0.025
lf+0.025
Neg logit contributions
 unnecessary-0.027
 lifelong-0.022
 risks-0.022
 chances-0.021
 situations-0.021
Token they
Token ID484
Feature activation+2.91e-03

Pos logit contributions
pron+0.019
ller+0.017
lf+0.016
lights+0.015
ibrarian+0.015
Neg logit contributions
 unnecessary-0.021
 lifelong-0.019
Got-0.017
Sur-0.017
 risks-0.017
Token don
Token ID836
Feature activation-5.99e-02

Pos logit contributions
 unnecessary+0.004
 belong+0.003
 risks+0.003
 lifelong+0.003
 chances+0.003
Neg logit contributions
pron-0.007
ller-0.006
lf-0.006
lights-0.006
arge-0.006
Token't
Token ID470
Feature activation-5.18e-03

Pos logit contributions
pron+0.106
ller+0.089
lights+0.084
isa+0.081
mallow+0.080
Neg logit contributions
 unnecessary-0.106
 lifelong-0.095
Sur-0.094
 belong-0.089
 risks-0.088
Token see
Token ID766
Feature activation-6.10e-03

Pos logit contributions
pron+0.009
ller+0.009
lf+0.008
ibrarian+0.008
rant+0.008
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Got-0.007
Sur-0.007
 situations-0.007
Token us
Token ID514
Feature activation-9.24e-03

Pos logit contributions
pron+0.011
ller+0.010
lf+0.009
lights+0.009
ibrarian+0.009
Neg logit contributions
 unnecessary-0.011
 lifelong-0.010
Sur-0.009
Got-0.009
 risks-0.009
Token."
Token ID526
Feature activation-8.21e-04

Pos logit contributions
pron+0.012
ller+0.011
lf+0.010
lights+0.009
ibrarian+0.009
Neg logit contributions
 unnecessary-0.021
 lifelong-0.019
Sur-0.018
Got-0.018
 risks-0.017
Token
Token ID198
Feature activation+2.49e-03

Pos logit contributions
pron+0.001
ller+0.001
lf+0.001
ibrarian+0.001
lights+0.001
Neg logit contributions
 unnecessary-0.002
 lifelong-0.001
 risks-0.001
Sur-0.001
Got-0.001
Token,
Token ID11
Feature activation-9.25e-03

Pos logit contributions
pron+0.009
ller+0.008
lf+0.007
ibrarian+0.007
lights+0.007
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.007
 risks-0.007
Got-0.007
Token "
Token ID366
Feature activation+5.78e-03

Pos logit contributions
lights+0.020
issa+0.019
ller+0.019
pron+0.019
pa+0.019
Neg logit contributions
Sur-0.009
 lifelong-0.008
 okay-0.008
 unnecessary-0.008
Pe-0.008
TokenNo
Token ID2949
Feature activation-1.52e-02

Pos logit contributions
 unnecessary+0.010
 lifelong+0.008
 risks+0.007
 situations+0.007
Sur+0.007
Neg logit contributions
pron-0.012
ller-0.011
lf-0.010
ibrarian-0.009
mallow-0.009
Token,
Token ID11
Feature activation-1.95e-02

Pos logit contributions
pron+0.026
ller+0.024
lf+0.022
ibrarian+0.020
lights+0.020
Neg logit contributions
 unnecessary-0.030
 lifelong-0.026
Sur-0.024
Got-0.024
 risks-0.023
Token I
Token ID314
Feature activation-1.84e-02

Pos logit contributions
pron+0.031
ller+0.028
lf+0.027
lights+0.024
ibrarian+0.024
Neg logit contributions
 unnecessary-0.040
 lifelong-0.035
 situations-0.032
Sur-0.031
 risks-0.031
Token don
Token ID836
Feature activation-5.97e-02

Pos logit contributions
pron+0.040
ller+0.039
lf+0.037
lights+0.036
issa+0.034
Neg logit contributions
 unnecessary-0.026
 lifelong-0.021
Got-0.019
 risks-0.019
Sur-0.019
Token't
Token ID470
Feature activation-1.25e-02

Pos logit contributions
pron+0.108
ller+0.091
rant+0.089
isa+0.088
lights+0.088
Neg logit contributions
 unnecessary-0.099
Sur-0.090
 lifelong-0.089
 Leave-0.087
 belong-0.083
Token have
Token ID423
Feature activation-9.48e-03

Pos logit contributions
pron+0.027
ller+0.026
lights+0.024
lf+0.024
issa+0.023
Neg logit contributions
 unnecessary-0.018
 lifelong-0.014
 risks-0.014
Sur-0.013
 chances-0.013
Token any
Token ID597
Feature activation-1.36e-02

Pos logit contributions
pron+0.020
ller+0.019
lf+0.017
ibrarian+0.017
rant+0.016
Neg logit contributions
 unnecessary-0.014
 lifelong-0.012
Sur-0.010
Got-0.010
 situations-0.010
Token cheese
Token ID9891
Feature activation-1.26e-02

Pos logit contributions
pron+0.021
ller+0.019
lf+0.017
ibrarian+0.016
lights+0.016
Neg logit contributions
 unnecessary-0.029
 lifelong-0.025
Sur-0.023
 risks-0.023
Got-0.023
Token."
Token ID526
Feature activation-6.01e-03

Pos logit contributions
pron+0.019
ller+0.017
lf+0.016
lights+0.015
issa+0.015
Neg logit contributions
 unnecessary-0.026
 lifelong-0.023
Sur-0.022
 risks-0.022
Got-0.021
Token,
Token ID11
Feature activation-1.88e-02

Pos logit contributions
pron+0.007
ller+0.007
lf+0.006
lights+0.006
issa+0.006
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
Sur-0.005
 risks-0.005
 chances-0.004
Token "
Token ID366
Feature activation-1.63e-03

Pos logit contributions
pron+0.038
lights+0.037
oons+0.036
washer+0.035
issa+0.034
Neg logit contributions
Sur-0.022
 unnecessary-0.022
 lifelong-0.020
Help-0.019
Pe-0.019
TokenNo
Token ID2949
Feature activation-7.40e-03

Pos logit contributions
pron+0.003
ller+0.003
lf+0.003
ibrarian+0.003
mallow+0.003
Neg logit contributions
 unnecessary-0.003
 lifelong-0.002
 risks-0.002
Sur-0.002
Got-0.002
Token,
Token ID11
Feature activation-1.65e-02

Pos logit contributions
pron+0.014
ller+0.013
lf+0.012
lights+0.012
ibrarian+0.012
Neg logit contributions
 unnecessary-0.013
 lifelong-0.011
 risks-0.010
Sur-0.010
 situations-0.010
Token I
Token ID314
Feature activation-2.08e-02

Pos logit contributions
pron+0.032
ller+0.029
lf+0.027
lights+0.026
ibrarian+0.025
Neg logit contributions
 unnecessary-0.028
 lifelong-0.024
Sur-0.022
 risks-0.022
Got-0.022
Token don
Token ID836
Feature activation-5.95e-02

Pos logit contributions
pron+0.047
ller+0.045
lights+0.043
lf+0.043
issa+0.040
Neg logit contributions
 unnecessary-0.028
 lifelong-0.023
 risks-0.020
Sur-0.020
Got-0.020
Token't
Token ID470
Feature activation-1.33e-02

Pos logit contributions
pron+0.116
rant+0.096
isa+0.095
ller+0.095
lights+0.092
Neg logit contributions
 unnecessary-0.089
Sur-0.084
 lifelong-0.081
 Leave-0.079
 belong-0.073
Token.
Token ID13
Feature activation+5.88e-04

Pos logit contributions
pron+0.030
ller+0.026
lights+0.026
lf+0.026
mallow+0.024
Neg logit contributions
 unnecessary-0.020
 risks-0.016
 lifelong-0.016
 belong-0.015
 chances-0.014
Token How
Token ID1374
Feature activation-1.75e-02

Pos logit contributions
 unnecessary+0.001
 lifelong+0.001
Sur+0.001
 risks+0.001
Got+0.001
Neg logit contributions
pron-0.001
ller-0.001
lf-0.001
ibrarian-0.001
mallow-0.001
Token were
Token ID547
Feature activation-5.00e-03

Pos logit contributions
pron+0.033
ller+0.031
lf+0.029
mallow+0.029
rant+0.028
Neg logit contributions
 unnecessary-0.030
 lifelong-0.024
 risks-0.023
 belong-0.023
 chances-0.023
Token you
Token ID345
Feature activation-2.49e-02

Pos logit contributions
pron+0.010
ller+0.009
lf+0.009
ibrarian+0.009
issa+0.008
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
Sur-0.006
Got-0.006
 risks-0.006
Token said
Token ID531
Feature activation-1.06e-02

Pos logit contributions
 unnecessary+0.024
 lifelong+0.021
Sur+0.020
 belong+0.020
Got+0.020
Neg logit contributions
pron-0.022
lights-0.021
ller-0.020
rescent-0.019
rant-0.018
Token "
Token ID366
Feature activation-2.27e-03

Pos logit contributions
pron+0.020
ller+0.019
lf+0.017
lights+0.017
rant+0.016
Neg logit contributions
 unnecessary-0.019
 lifelong-0.016
Sur-0.014
 risks-0.014
Got-0.014
Tokenno
Token ID3919
Feature activation-1.19e-02

Pos logit contributions
pron+0.005
ller+0.004
mallow+0.004
lf+0.004
isa+0.004
Neg logit contributions
 unnecessary-0.004
 lifelong-0.003
 risks-0.003
 situations-0.003
 chances-0.003
Token,
Token ID11
Feature activation-8.73e-03

Pos logit contributions
pron+0.023
ller+0.021
lights+0.019
lf+0.019
rant+0.018
Neg logit contributions
 unnecessary-0.021
 lifelong-0.017
 risks-0.017
Sur-0.016
 situations-0.016
Token please
Token ID3387
Feature activation-6.25e-03

Pos logit contributions
pron+0.016
ller+0.015
lf+0.013
ibrarian+0.013
lights+0.013
Neg logit contributions
 unnecessary-0.016
 lifelong-0.013
Sur-0.012
 risks-0.012
Got-0.012
Token don
Token ID836
Feature activation-5.95e-02

Pos logit contributions
pron+0.014
ller+0.013
lights+0.013
lf+0.013
ibrarian+0.012
Neg logit contributions
 unnecessary-0.008
 risks-0.006
 lifelong-0.006
 belong-0.006
 chances-0.006
Token't
Token ID470
Feature activation-5.71e-03

Pos logit contributions
pron+0.099
ller+0.085
lights+0.079
rant+0.078
isa+0.078
Neg logit contributions
 unnecessary-0.112
Sur-0.099
 lifelong-0.098
 risks-0.092
 Leave-0.092
Token touch
Token ID3638
Feature activation-1.33e-02

Pos logit contributions
ller+0.010
pron+0.010
lf+0.008
 stove+0.008
 clock+0.008
Neg logit contributions
 unnecessary-0.011
 lifelong-0.010
Got-0.009
Sur-0.008
 situations-0.008
Token the
Token ID262
Feature activation+3.87e-04

Pos logit contributions
pron+0.023
ller+0.020
lf+0.019
ibrarian+0.018
lights+0.018
Neg logit contributions
 unnecessary-0.026
 lifelong-0.022
 risks-0.021
Sur-0.021
Got-0.020
Token crocod
Token ID37565
Feature activation-1.34e-02

Pos logit contributions
 unnecessary+0.001
 lifelong+0.001
Sur+0.001
Got+0.001
 risks+0.001
Neg logit contributions
pron-0.001
ller-0.001
lf-0.001
washer-0.001
mallow-0.001
Tokenile
Token ID576
Feature activation-2.90e-03

Pos logit contributions
pron+0.021
ller+0.018
lf+0.017
issa+0.016
lights+0.016
Neg logit contributions
 unnecessary-0.028
 lifelong-0.024
 risks-0.024
Sur-0.023
 chances-0.023
Token.
Token ID13
Feature activation-2.15e-02

Pos logit contributions
 unnecessary+0.013
 lifelong+0.011
 risks+0.010
 situations+0.010
Sur+0.010
Neg logit contributions
pron-0.011
ller-0.010
lf-0.009
lights-0.008
rant-0.008
Token
Token ID198
Feature activation-5.27e-03

Pos logit contributions
pron+0.045
ller+0.042
lf+0.041
lights+0.039
washer+0.038
Neg logit contributions
 unnecessary-0.032
Sur-0.027
 lifelong-0.027
Got-0.025
Help-0.024
Token
Token ID198
Feature activation+2.70e-03

Pos logit contributions
pron+0.012
ller+0.011
lf+0.010
rant+0.010
mallow+0.010
Neg logit contributions
 unnecessary-0.007
Sur-0.006
 lifelong-0.005
Got-0.005
Pe-0.005
Token"
Token ID1
Feature activation-2.86e-03

Pos logit contributions
 unnecessary+0.005
 lifelong+0.005
 risks+0.004
 chances+0.004
 situations+0.004
Neg logit contributions
pron-0.005
ller-0.004
lf-0.004
ibrarian-0.004
washer-0.003
TokenI
Token ID40
Feature activation-1.85e-02

Pos logit contributions
pron+0.006
ller+0.006
lf+0.005
lights+0.005
ibrarian+0.005
Neg logit contributions
 unnecessary-0.004
 lifelong-0.004
Sur-0.003
Got-0.003
 risks-0.003
Token don
Token ID836
Feature activation-5.95e-02

Pos logit contributions
pron+0.039
ller+0.036
lf+0.034
lights+0.032
ibrarian+0.032
Neg logit contributions
 unnecessary-0.029
 lifelong-0.024
Sur-0.022
 risks-0.021
Got-0.021
Token't
Token ID470
Feature activation-1.26e-02

Pos logit contributions
pron+0.119
rant+0.102
isa+0.100
lights+0.098
ller+0.096
Neg logit contributions
 unnecessary-0.081
Sur-0.078
 lifelong-0.073
 Leave-0.072
 belong-0.070
Token know
Token ID760
Feature activation-1.39e-02

Pos logit contributions
pron+0.036
ller+0.033
lights+0.033
lf+0.032
mallow+0.031
Neg logit contributions
 unnecessary-0.011
 risks-0.008
 lifelong-0.008
Sur-0.006
 belong-0.006
Token,"
Token ID553
Feature activation+4.83e-03

Pos logit contributions
pron+0.021
ller+0.019
lf+0.017
mallow+0.016
ibrarian+0.016
Neg logit contributions
 unnecessary-0.030
 lifelong-0.026
 risks-0.024
Sur-0.024
Got-0.024
Token Ben
Token ID3932
Feature activation-5.95e-03

Pos logit contributions
 unnecessary+0.009
 lifelong+0.007
Got+0.007
Sur+0.007
 risks+0.007
Neg logit contributions
pron-0.009
ller-0.008
lf-0.008
lights-0.007
washer-0.007
Token says
Token ID1139
Feature activation-1.06e-02

Pos logit contributions
pron+0.012
lf+0.010
ller+0.010
lights+0.010
mallow+0.010
Neg logit contributions
 unnecessary-0.010
 lifelong-0.008
 risks-0.008
Sur-0.008
 chances-0.008
Token.
Token ID13
Feature activation-1.34e-02

Pos logit contributions
 unnecessary+0.017
 lifelong+0.015
Sur+0.014
 risks+0.014
 chances+0.014
Neg logit contributions
pron-0.012
ller-0.011
lf-0.010
mallow-0.009
issa-0.009
Token
Token ID198
Feature activation+2.37e-03

Pos logit contributions
pron+0.028
ller+0.026
lf+0.024
rant+0.024
issa+0.023
Neg logit contributions
 unnecessary-0.019
 lifelong-0.017
Sur-0.017
Got-0.016
Help-0.014
Token
Token ID198
Feature activation+4.85e-03

Pos logit contributions
 unnecessary+0.003
 lifelong+0.003
Sur+0.003
Got+0.003
 risks+0.003
Neg logit contributions
pron-0.005
ller-0.005
lf-0.004
rant-0.004
work-0.004
Token"
Token ID1
Feature activation+5.51e-03

Pos logit contributions
 unnecessary+0.010
 lifelong+0.008
 risks+0.008
 situations+0.007
 chances+0.007
Neg logit contributions
pron-0.008
lf-0.007
ller-0.007
ibrarian-0.007
arge-0.007
TokenI
Token ID40
Feature activation-1.62e-02

Pos logit contributions
 unnecessary+0.010
 lifelong+0.008
 risks+0.008
 situations+0.008
 chances+0.008
Neg logit contributions
pron-0.010
ller-0.009
lf-0.009
ibrarian-0.008
lights-0.008
Token don
Token ID836
Feature activation-5.94e-02

Pos logit contributions
pron+0.034
ller+0.032
lf+0.030
lights+0.028
ibrarian+0.028
Neg logit contributions
 unnecessary-0.025
 lifelong-0.021
Sur-0.019
 risks-0.019
Got-0.019
Token't
Token ID470
Feature activation-1.39e-02

Pos logit contributions
pron+0.125
rant+0.111
isa+0.109
roach+0.104
anne+0.103
Neg logit contributions
Sur-0.072
 unnecessary-0.068
 Leave-0.068
 lifelong-0.067
 belong-0.063
Token know
Token ID760
Feature activation-1.71e-02

Pos logit contributions
pron+0.038
ller+0.035
lf+0.034
lights+0.034
 timer+0.032
Neg logit contributions
 unnecessary-0.014
 risks-0.010
 lifelong-0.010
 chances-0.009
 belong-0.009
Token.
Token ID13
Feature activation+1.28e-03

Pos logit contributions
pron+0.027
ller+0.023
lf+0.021
ibrarian+0.020
mallow+0.020
Neg logit contributions
 unnecessary-0.036
 lifelong-0.031
 risks-0.029
Sur-0.029
Got-0.028
Token Maybe
Token ID6674
Feature activation-1.86e-02

Pos logit contributions
 unnecessary+0.002
 lifelong+0.002
 risks+0.002
Sur+0.002
Got+0.002
Neg logit contributions
pron-0.002
ller-0.002
lf-0.002
ibrarian-0.002
lights-0.002
Token.
Token ID13
Feature activation+4.90e-03

Pos logit contributions
pron+0.030
ller+0.028
lf+0.026
lights+0.024
ibrarian+0.024
Neg logit contributions
 unnecessary-0.038
 lifelong-0.033
Got-0.030
 risks-0.030
Sur-0.030
Token?"
Token ID1701
Feature activation-8.92e-03

Pos logit contributions
 unnecessary+0.001
 belong+0.001
 lifelong+0.001
 true+0.001
 situations+0.001
Neg logit contributions
lights-0.001
pron-0.001
ller-0.001
 timer-0.001
lf-0.001
Token\n
Token ID198
Feature activation-5.10e-03

Pos logit contributions
pron+0.019
ller+0.019
lf+0.017
lights+0.017
isa+0.017
Neg logit contributions
 unnecessary-0.010
Sur-0.010
 lifelong-0.010
Got-0.009
 See-0.008
Token\n
Token ID198
Feature activation-1.77e-03

Pos logit contributions
pron+0.011
ller+0.010
mallow+0.009
lf+0.009
rant+0.009
Neg logit contributions
 unnecessary-0.007
 lifelong-0.006
Sur-0.006
Got-0.006
Pe-0.005
Token"
Token ID1
Feature activation+2.56e-03

Pos logit contributions
pron+0.003
ller+0.003
lf+0.003
ibrarian+0.003
issa+0.003
Neg logit contributions
 unnecessary-0.003
 lifelong-0.003
 risks-0.002
 situations-0.002
 chances-0.002
TokenI
Token ID40
Feature activation-2.44e-02

Pos logit contributions
 unnecessary+0.003
Sur+0.003
 lifelong+0.003
Got+0.003
Pe+0.003
Neg logit contributions
pron-0.005
ller-0.005
lf-0.005
rant-0.005
lights-0.005
Token don
Token ID836
Feature activation-5.94e-02

Pos logit contributions
pron+0.054
ller+0.053
lights+0.051
lf+0.050
ibrarian+0.047
Neg logit contributions
 unnecessary-0.031
 lifelong-0.027
Sur-0.023
Got-0.023
 belong-0.023
Token't
Token ID470
Feature activation-7.64e-03

Pos logit contributions
pron+0.116
rant+0.103
isa+0.102
lights+0.101
anne+0.099
Neg logit contributions
 unnecessary-0.080
Sur-0.077
 lifelong-0.073
 Leave-0.073
 belong-0.067
Token know
Token ID760
Feature activation+4.41e-05

Pos logit contributions
pron+0.020
ller+0.019
lights+0.018
lf+0.018
mallow+0.017
Neg logit contributions
 unnecessary-0.008
 lifelong-0.006
 risks-0.005
Sur-0.005
 chances-0.005
Token yet
Token ID1865
Feature activation+3.41e-03

Pos logit contributions
 unnecessary+0.000
 lifelong+0.000
 risks+0.000
Sur+0.000
 chances+0.000
Neg logit contributions
pron-0.000
ller-0.000
lf-0.000
ibrarian-0.000
mallow-0.000
Token.
Token ID13
Feature activation-2.77e-03

Pos logit contributions
 unnecessary+0.007
 lifelong+0.006
 risks+0.006
Sur+0.006
 situations+0.005
Neg logit contributions
pron-0.005
ller-0.005
lf-0.004
mallow-0.004
ibrarian-0.004
Token I
Token ID314
Feature activation-1.57e-02

Pos logit contributions
pron+0.006
ller+0.005
lf+0.005
mallow+0.005
lights+0.005
Neg logit contributions
 unnecessary-0.004
 lifelong-0.003
Sur-0.003
Got-0.003
 risks-0.003
Token voice
Token ID3809
Feature activation-9.93e-03

Pos logit contributions
pron+0.006
ller+0.005
lf+0.005
ibrarian+0.004
lights+0.004
Neg logit contributions
 unnecessary-0.007
 lifelong-0.006
Got-0.005
 risks-0.005
Sur-0.005
Token say
Token ID910
Feature activation-4.83e-03

Pos logit contributions
pron+0.016
ller+0.014
lf+0.012
lights+0.012
ibrarian+0.011
Neg logit contributions
 unnecessary-0.021
 lifelong-0.018
Sur-0.017
 risks-0.017
 chances-0.017
Token,
Token ID11
Feature activation-9.39e-03

Pos logit contributions
pron+0.008
ller+0.008
lf+0.007
ibrarian+0.007
mallow+0.007
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
 risks-0.007
Sur-0.007
Got-0.007
Token "
Token ID366
Feature activation+4.09e-04

Pos logit contributions
pron+0.021
lights+0.019
lf+0.019
oons+0.018
ibrarian+0.018
Neg logit contributions
 unnecessary-0.012
Sur-0.011
Help-0.009
Pe-0.009
 lifelong-0.009
TokenPlease
Token ID5492
Feature activation-9.64e-03

Pos logit contributions
 unnecessary+0.001
 lifelong+0.001
 risks+0.001
 friendships+0.001
 chances+0.001
Neg logit contributions
pron-0.001
ller-0.001
lf-0.001
mallow-0.001
bell-0.001
Token don
Token ID836
Feature activation-5.94e-02

Pos logit contributions
pron+0.018
ller+0.016
lf+0.015
ibrarian+0.015
mallow+0.014
Neg logit contributions
 unnecessary-0.017
 lifelong-0.015
Sur-0.014
 risks-0.013
Got-0.013
Token't
Token ID470
Feature activation-5.40e-03

Pos logit contributions
pron+0.098
ller+0.081
lights+0.076
rant+0.075
isa+0.073
Neg logit contributions
 unnecessary-0.115
Sur-0.102
 lifelong-0.101
 risks-0.095
Got-0.095
Token touch
Token ID3638
Feature activation-2.70e-03

Pos logit contributions
pron+0.009
ller+0.009
lf+0.009
washer+0.008
ibrarian+0.008
Neg logit contributions
 unnecessary-0.010
 lifelong-0.009
Got-0.008
Sur-0.008
 risks-0.008
Token me
Token ID502
Feature activation-9.71e-03

Pos logit contributions
pron+0.005
ller+0.004
lf+0.004
ibrarian+0.004
lights+0.004
Neg logit contributions
 unnecessary-0.005
 lifelong-0.004
 risks-0.004
Sur-0.004
Got-0.004
Token!"
Token ID2474
Feature activation-1.24e-02

Pos logit contributions
pron+0.018
ller+0.016
lights+0.015
lf+0.015
ibrarian+0.014
Neg logit contributions
 unnecessary-0.017
 lifelong-0.014
 risks-0.014
 chances-0.013
 situations-0.013
Token Tim
Token ID5045
Feature activation-1.05e-02

Pos logit contributions
pron+0.026
ller+0.024
lf+0.023
lights+0.022
ibrarian+0.022
Neg logit contributions
 unnecessary-0.019
 lifelong-0.015
Sur-0.015
 risks-0.014
Got-0.014
Token\n
Token ID198
Feature activation+3.22e-03

Pos logit contributions
 unnecessary+0.009
 lifelong+0.008
Sur+0.007
Got+0.007
 risks+0.007
Neg logit contributions
pron-0.010
ller-0.009
lf-0.008
lights-0.008
ibrarian-0.008
Token\n
Token ID198
Feature activation+1.51e-02

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
Sur+0.004
Got+0.003
 risks+0.003
Neg logit contributions
pron-0.007
ller-0.006
lf-0.006
lights-0.006
rant-0.006
Token"
Token ID1
Feature activation+1.39e-02

Pos logit contributions
 unnecessary+0.028
 lifelong+0.024
 risks+0.021
 situations+0.021
 chances+0.021
Neg logit contributions
pron-0.028
ller-0.025
lf-0.024
ibrarian-0.023
issa-0.022
TokenMom
Token ID29252
Feature activation+3.49e-02

Pos logit contributions
 unnecessary+0.026
 lifelong+0.022
 risks+0.020
 situations+0.020
 chances+0.019
Neg logit contributions
pron-0.026
ller-0.023
lf-0.022
mallow-0.021
issa-0.021
Token,
Token ID11
Feature activation+1.42e-02

Pos logit contributions
 unnecessary+0.058
 lifelong+0.048
Sur+0.046
Got+0.045
 risks+0.045
Neg logit contributions
pron-0.068
ller-0.061
lights-0.057
lf-0.057
rant-0.055
Token mom
Token ID1995
Feature activation+3.31e-02

Pos logit contributions
 unnecessary+0.033
 lifelong+0.030
 situations+0.027
 risks+0.027
Sur+0.027
Neg logit contributions
pron-0.019
ller-0.015
lf-0.015
mallow-0.014
ibrarian-0.014
Token,
Token ID11
Feature activation+2.43e-02

Pos logit contributions
 unnecessary+0.054
 lifelong+0.044
Sur+0.042
Got+0.041
 risks+0.041
Neg logit contributions
pron-0.066
ller-0.059
lights-0.055
lf-0.055
rant-0.054
Token where
Token ID810
Feature activation+5.06e-03

Pos logit contributions
 unnecessary+0.049
 lifelong+0.045
 situations+0.040
 risks+0.039
 friendships+0.039
Neg logit contributions
pron-0.039
lf-0.032
ller-0.032
washer-0.032
mallow-0.032
Token are
Token ID389
Feature activation+8.08e-03

Pos logit contributions
 unnecessary+0.009
 lifelong+0.008
Sur+0.007
 risks+0.007
Pe+0.007
Neg logit contributions
pron-0.009
washer-0.008
ller-0.008
lf-0.008
ibrarian-0.008
Token you
Token ID345
Feature activation-4.20e-03

Pos logit contributions
 unnecessary+0.016
 lifelong+0.015
Got+0.013
 situations+0.013
 risks+0.013
Neg logit contributions
pron-0.012
lf-0.012
ller-0.012
 oven-0.011
washer-0.011
Token?"
Token ID1701
Feature activation+5.72e-03

Pos logit contributions
pron+0.004
ller+0.003
lf+0.003
mallow+0.002
washer+0.002
Neg logit contributions
 unnecessary-0.011
 lifelong-0.010
Got-0.010
Sur-0.010
 risks-0.010
Token rules
Token ID3173
Feature activation-1.49e-02

Pos logit contributions
pron+0.021
ller+0.019
lf+0.018
ibrarian+0.017
lights+0.017
Neg logit contributions
 unnecessary-0.015
 lifelong-0.013
Sur-0.011
 risks-0.011
Got-0.011
Token and
Token ID290
Feature activation+8.65e-03

Pos logit contributions
pron+0.026
ller+0.024
lf+0.022
rescent+0.020
arge+0.020
Neg logit contributions
 unnecessary-0.026
 lifelong-0.023
 situations-0.021
 risks-0.021
 chances-0.021
Token she
Token ID673
Feature activation+7.23e-03

Pos logit contributions
 unnecessary+0.018
 lifelong+0.015
Sur+0.014
Got+0.014
 risks+0.014
Neg logit contributions
pron-0.014
ller-0.012
lf-0.012
washer-0.011
ibrarian-0.011
Token happily
Token ID18177
Feature activation+5.63e-03

Pos logit contributions
 unnecessary+0.011
 lifelong+0.010
Sur+0.009
Got+0.009
 friendships+0.009
Neg logit contributions
pron-0.014
ller-0.013
lf-0.013
lights-0.012
rant-0.012
Token skipped
Token ID26684
Feature activation+6.08e-03

Pos logit contributions
 unnecessary+0.010
 lifelong+0.009
Sur+0.008
 risks+0.008
Got+0.008
Neg logit contributions
pron-0.010
ller-0.009
lf-0.008
ibrarian-0.008
lights-0.008
Token away
Token ID1497
Feature activation+3.18e-02

Pos logit contributions
 unnecessary+0.011
 lifelong+0.009
Sur+0.008
Got+0.008
 risks+0.008
Neg logit contributions
pron-0.011
ller-0.010
lf-0.010
lights-0.009
ibrarian-0.009
Token with
Token ID351
Feature activation+8.67e-03

Pos logit contributions
 unnecessary+0.070
 lifelong+0.060
Sur+0.058
Got+0.058
 risks+0.056
Neg logit contributions
pron-0.045
ller-0.040
lf-0.037
lights-0.035
ibrarian-0.033
Token her
Token ID607
Feature activation+5.32e-03

Pos logit contributions
 unnecessary+0.015
 lifelong+0.013
Sur+0.012
 risks+0.011
Got+0.011
Neg logit contributions
pron-0.017
ller-0.015
lf-0.014
lights-0.014
ibrarian-0.014
Token toe
Token ID21189
Feature activation+8.58e-03

Pos logit contributions
 unnecessary+0.010
 lifelong+0.009
 risks+0.008
 chances+0.008
 friendships+0.008
Neg logit contributions
pron-0.009
ller-0.009
rant-0.008
lf-0.008
ibrarian-0.008
Token-
Token ID12
Feature activation-2.41e-05

Pos logit contributions
 unnecessary+0.017
 lifelong+0.015
 risks+0.014
 chances+0.014
 situations+0.014
Neg logit contributions
pron-0.013
ller-0.012
lf-0.011
rant-0.011
ibrarian-0.010
Tokenw
Token ID86
Feature activation-7.63e-03

Pos logit contributions
pron+0.000
ller+0.000
lf+0.000
lights+0.000
ibrarian+0.000
Neg logit contributions
 unnecessary-0.000
 lifelong-0.000
 risks-0.000
Sur-0.000
Got-0.000
Token day
Token ID1110
Feature activation+1.13e-02

Pos logit contributions
pron+0.041
ller+0.040
lights+0.036
lf+0.036
ibrarian+0.035
Neg logit contributions
 unnecessary-0.033
 lifelong-0.030
Sur-0.026
 chances-0.026
 risks-0.025
Token,
Token ID11
Feature activation-1.35e-03

Pos logit contributions
 unnecessary+0.028
 lifelong+0.024
 risks+0.022
 situations+0.022
Got+0.022
Neg logit contributions
pron-0.014
ller-0.012
lf-0.012
mallow-0.011
 clock-0.010
Token the
Token ID262
Feature activation-6.64e-03

Pos logit contributions
pron+0.003
ller+0.002
lf+0.002
ibrarian+0.002
mallow+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Sur-0.002
 risks-0.002
Got-0.002
Token man
Token ID582
Feature activation+1.75e-03

Pos logit contributions
pron+0.011
ller+0.011
lights+0.010
isa+0.010
rant+0.010
Neg logit contributions
 unnecessary-0.012
 lifelong-0.010
Sur-0.010
 risks-0.009
 chances-0.009
Token looked
Token ID3114
Feature activation+8.84e-03

Pos logit contributions
 unnecessary+0.004
 lifelong+0.003
 risks+0.003
Got+0.003
 situations+0.003
Neg logit contributions
pron-0.003
ller-0.003
lf-0.002
washer-0.002
ibrarian-0.002
Token out
Token ID503
Feature activation+3.13e-02

Pos logit contributions
 unnecessary+0.016
 lifelong+0.013
 risks+0.012
 chances+0.012
 situations+0.011
Neg logit contributions
pron-0.017
ller-0.016
lf-0.015
ibrarian-0.014
mallow-0.014
Token the
Token ID262
Feature activation-7.54e-03

Pos logit contributions
 unnecessary+0.057
 lifelong+0.049
Sur+0.044
 risks+0.044
Got+0.044
Neg logit contributions
pron-0.058
ller-0.053
lf-0.049
ibrarian-0.046
mallow-0.046
Token window
Token ID4324
Feature activation+3.79e-03

Pos logit contributions
pron+0.009
ller+0.009
lf+0.008
ibrarian+0.007
mallow+0.007
Neg logit contributions
 unnecessary-0.018
 lifelong-0.016
 risks-0.015
 chances-0.015
Sur-0.015
Token and
Token ID290
Feature activation-9.88e-03

Pos logit contributions
 unnecessary+0.007
 lifelong+0.006
Sur+0.006
 situations+0.006
 chances+0.006
Neg logit contributions
pron-0.007
ller-0.006
lf-0.005
ibrarian-0.005
issa-0.005
Token saw
Token ID2497
Feature activation-2.08e-03

Pos logit contributions
pron+0.020
ller+0.018
lights+0.017
oons+0.016
lf+0.016
Neg logit contributions
 unnecessary-0.015
 lifelong-0.012
Got-0.012
Sur-0.012
Help-0.012
Token a
Token ID257
Feature activation-4.98e-03

Pos logit contributions
pron+0.005
ller+0.004
lf+0.004
lights+0.004
rant+0.004
Neg logit contributions
 unnecessary-0.003
 lifelong-0.002
Sur-0.002
Got-0.002
 risks-0.002
Tokenmy
Token ID1820
Feature activation-3.79e-03

Pos logit contributions
pron+0.008
ller+0.007
lights+0.006
lf+0.006
 timer+0.006
Neg logit contributions
 unnecessary-0.010
 lifelong-0.010
Got-0.009
Sur-0.008
 friendships-0.008
Token made
Token ID925
Feature activation+1.05e-03

Pos logit contributions
pron+0.006
ller+0.005
lf+0.005
ibrarian+0.005
lights+0.005
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
Sur-0.006
 risks-0.006
Got-0.006
Token sure
Token ID1654
Feature activation+1.22e-02

Pos logit contributions
 unnecessary+0.002
 situations+0.002
 lifelong+0.002
Got+0.002
Sur+0.002
Neg logit contributions
pron-0.002
ller-0.002
lf-0.001
arge-0.001
issa-0.001
Token to
Token ID284
Feature activation+1.29e-02

Pos logit contributions
 unnecessary+0.031
 lifelong+0.028
Sur+0.026
Got+0.026
 risks+0.026
Neg logit contributions
pron-0.011
ller-0.011
washer-0.010
lf-0.010
ma-0.009
Token keep
Token ID1394
Feature activation+2.87e-03

Pos logit contributions
 unnecessary+0.023
 lifelong+0.020
Sur+0.019
Got+0.018
 risks+0.018
Neg logit contributions
pron-0.024
ller-0.021
lf-0.020
ibrarian-0.019
mallow-0.018
Token a
Token ID257
Feature activation+3.23e-02

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
Sur+0.004
Got+0.004
 situations+0.004
Neg logit contributions
pron-0.005
ller-0.005
lf-0.005
issa-0.004
mallow-0.004
Token closer
Token ID5699
Feature activation+1.71e-02

Pos logit contributions
 unnecessary+0.065
 situations+0.057
 endings+0.057
 lifelong+0.055
Got+0.054
Neg logit contributions
pron-0.057
ller-0.041
lf-0.037
washer-0.037
 timer-0.036
Token eye
Token ID4151
Feature activation+2.53e-03

Pos logit contributions
 unnecessary+0.035
 lifelong+0.030
Got+0.029
Sur+0.029
Pe+0.027
Neg logit contributions
pron-0.027
ller-0.024
lf-0.021
lights-0.021
washer-0.021
Token on
Token ID319
Feature activation+1.47e-02

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
Sur+0.004
Got+0.004
 risks+0.004
Neg logit contributions
pron-0.004
ller-0.004
lf-0.003
ibrarian-0.003
lights-0.003
Token his
Token ID465
Feature activation+1.01e-02

Pos logit contributions
 unnecessary+0.025
 lifelong+0.021
Sur+0.019
Got+0.019
 risks+0.019
Neg logit contributions
pron-0.028
ller-0.026
lf-0.024
lights-0.023
ibrarian-0.023
Token beloved
Token ID14142
Feature activation+1.62e-02

Pos logit contributions
 unnecessary+0.020
 lifelong+0.017
Sur+0.016
Got+0.016
 risks+0.016
Neg logit contributions
pron-0.017
ller-0.015
lf-0.014
lights-0.013
mallow-0.013
Token he
Token ID339
Feature activation+1.39e-02

Pos logit contributions
pron+0.010
ller+0.009
lf+0.009
lights+0.008
ibrarian+0.008
Neg logit contributions
 unnecessary-0.014
 lifelong-0.012
 risks-0.011
 situations-0.011
Sur-0.011
Token shouted
Token ID17293
Feature activation+1.42e-02

Pos logit contributions
 unnecessary+0.027
 lifelong+0.023
Sur+0.021
 risks+0.021
Got+0.021
Neg logit contributions
pron-0.024
ller-0.022
lf-0.020
lights-0.019
ibrarian-0.019
Token,
Token ID11
Feature activation+2.96e-04

Pos logit contributions
 unnecessary+0.030
 lifelong+0.026
 situations+0.024
 risks+0.024
Got+0.024
Neg logit contributions
pron-0.022
ller-0.020
lf-0.018
ibrarian-0.018
washer-0.017
Token "
Token ID366
Feature activation+1.60e-02

Pos logit contributions
 unnecessary+0.000
 lifelong+0.000
Sur+0.000
Got+0.000
Help+0.000
Neg logit contributions
pron-0.001
lights-0.000
lf-0.000
ller-0.000
oons-0.000
TokenLook
Token ID8567
Feature activation+3.86e-03

Pos logit contributions
 unnecessary+0.032
 situations+0.025
 lifelong+0.025
 risks+0.024
 chances+0.023
Neg logit contributions
pron-0.028
mallow-0.025
ller-0.024
lf-0.023
flake-0.022
Token Lily
Token ID20037
Feature activation+3.35e-02

Pos logit contributions
 unnecessary+0.008
 lifelong+0.007
 situations+0.006
Got+0.006
 risks+0.006
Neg logit contributions
pron-0.006
ller-0.006
lf-0.006
ibrarian-0.005
issa-0.005
Token,
Token ID11
Feature activation+1.71e-02

Pos logit contributions
 unnecessary+0.077
 lifelong+0.066
 risks+0.065
 situations+0.065
Got+0.063
Neg logit contributions
pron-0.050
ller-0.042
lf-0.041
ma-0.039
mallow-0.037
Token I
Token ID314
Feature activation+1.59e-02

Pos logit contributions
 unnecessary+0.036
 lifelong+0.032
 situations+0.030
 risks+0.029
 chances+0.028
Neg logit contributions
pron-0.027
ller-0.024
lf-0.023
washer-0.021
ibrarian-0.021
Token found
Token ID1043
Feature activation+2.05e-02

Pos logit contributions
 unnecessary+0.034
 lifelong+0.029
 situations+0.028
 risks+0.026
 endings+0.026
Neg logit contributions
pron-0.024
washer-0.021
mallow-0.020
lf-0.020
ller-0.019
Token your
Token ID534
Feature activation+5.76e-03

Pos logit contributions
 unnecessary+0.039
 lifelong+0.035
 situations+0.032
Got+0.032
 risks+0.031
Neg logit contributions
pron-0.032
ller-0.031
lf-0.028
ibrarian-0.027
 oven-0.027
Token pretty
Token ID2495
Feature activation+2.48e-03

Pos logit contributions
 unnecessary+0.011
 lifelong+0.009
Sur+0.009
Got+0.008
 risks+0.008
Neg logit contributions
pron-0.010
ller-0.010
lf-0.009
ibrarian-0.008
lights-0.008
Token them
Token ID606
Feature activation+2.39e-02

Pos logit contributions
 unnecessary+0.043
 lifelong+0.038
Sur+0.036
 situations+0.036
Got+0.036
Neg logit contributions
pron-0.028
ller-0.024
lights-0.022
lf-0.022
rant-0.021
Token and
Token ID290
Feature activation+1.98e-02

Pos logit contributions
 unnecessary+0.055
 lifelong+0.048
Sur+0.046
Got+0.046
 risks+0.045
Neg logit contributions
pron-0.032
ller-0.029
lf-0.025
lights-0.024
mallow-0.023
Token taste
Token ID6938
Feature activation+1.76e-02

Pos logit contributions
 unnecessary+0.036
 lifelong+0.031
Sur+0.028
 risks+0.028
Got+0.028
Neg logit contributions
pron-0.036
ller-0.033
lf-0.031
lights-0.029
ibrarian-0.029
Token them
Token ID606
Feature activation+1.93e-02

Pos logit contributions
 unnecessary+0.039
 lifelong+0.034
Sur+0.033
Got+0.033
 situations+0.032
Neg logit contributions
pron-0.024
ller-0.020
lights-0.018
lf-0.018
rant-0.018
Token.
Token ID13
Feature activation+1.26e-02

Pos logit contributions
 unnecessary+0.043
 lifelong+0.038
 risks+0.035
Sur+0.035
Got+0.035
Neg logit contributions
pron-0.027
ller-0.024
lf-0.022
ibrarian-0.021
lights-0.020
Token\n
Token ID198
Feature activation+7.57e-04

Pos logit contributions
 unnecessary+0.022
 lifelong+0.019
Sur+0.018
Got+0.017
 risks+0.017
Neg logit contributions
pron-0.024
ller-0.022
lf-0.020
lights-0.019
ibrarian-0.019
Token\n
Token ID198
Feature activation+1.94e-02

Pos logit contributions
 unnecessary+0.001
 lifelong+0.001
Sur+0.001
Pe+0.001
Got+0.001
Neg logit contributions
pron-0.002
ller-0.002
lights-0.001
lf-0.001
work-0.001
TokenOne
Token ID3198
Feature activation-1.11e-02

Pos logit contributions
 unnecessary+0.041
 lifelong+0.035
 risks+0.032
 situations+0.032
 chances+0.031
Neg logit contributions
pron-0.029
ller-0.027
lf-0.026
arge-0.024
mallow-0.024
Token day
Token ID1110
Feature activation+2.80e-02

Pos logit contributions
ller+0.022
pron+0.022
lf+0.019
lights+0.019
issa+0.019
Neg logit contributions
 unnecessary-0.017
 lifelong-0.015
Sur-0.013
 chances-0.013
 risks-0.013
Token,
Token ID11
Feature activation+2.22e-02

Pos logit contributions
 unnecessary+0.073
 lifelong+0.066
 situations+0.062
 risks+0.060
 friendships+0.059
Neg logit contributions
pron-0.023
ller-0.022
arge-0.022
lf-0.022
mallow-0.021
Token Jen
Token ID13364
Feature activation+2.63e-03

Pos logit contributions
 unnecessary+0.047
 lifelong+0.041
 situations+0.039
 risks+0.038
 friendships+0.037
Neg logit contributions
pron-0.033
ller-0.030
lf-0.028
arge-0.028
mallow-0.027
Token,
Token ID11
Feature activation-1.04e-04

Pos logit contributions
 unnecessary+0.015
 lifelong+0.013
 risks+0.012
 situations+0.012
Sur+0.012
Neg logit contributions
pron-0.013
ller-0.012
lf-0.011
ibrarian-0.010
lights-0.010
Token "
Token ID366
Feature activation+9.52e-03

Pos logit contributions
pron+0.000
ller+0.000
lf+0.000
lights+0.000
issa+0.000
Neg logit contributions
 unnecessary-0.000
 lifelong-0.000
Sur-0.000
Pe-0.000
Got-0.000
TokenYou
Token ID1639
Feature activation-4.95e-03

Pos logit contributions
 unnecessary+0.017
 lifelong+0.014
 risks+0.012
 situations+0.012
 chances+0.012
Neg logit contributions
pron-0.019
ller-0.017
mallow-0.015
lf-0.015
arge-0.015
Token're
Token ID821
Feature activation-5.05e-03

Pos logit contributions
pron+0.011
ller+0.010
lf+0.009
ibrarian+0.009
washer+0.009
Neg logit contributions
 unnecessary-0.008
 lifelong-0.006
Sur-0.005
Got-0.005
 situations-0.005
Token right
Token ID826
Feature activation+1.33e-02

Pos logit contributions
pron+0.010
ller+0.010
lf+0.009
ibrarian+0.009
mallow+0.009
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
Sur-0.006
Got-0.006
 risks-0.006
Token,
Token ID11
Feature activation+2.88e-03

Pos logit contributions
 unnecessary+0.025
 lifelong+0.021
 risks+0.020
Sur+0.020
 chances+0.019
Neg logit contributions
pron-0.023
ller-0.021
lf-0.020
lights-0.019
ibrarian-0.019
Token Lily
Token ID20037
Feature activation+1.96e-02

Pos logit contributions
 unnecessary+0.006
 lifelong+0.005
Sur+0.005
 risks+0.005
Got+0.005
Neg logit contributions
pron-0.005
ller-0.004
lf-0.004
ibrarian-0.004
mallow-0.004
Token!
Token ID0
Feature activation+1.33e-02

Pos logit contributions
 unnecessary+0.038
 lifelong+0.033
 risks+0.031
 situations+0.030
Sur+0.030
Neg logit contributions
pron-0.032
ller-0.029
lf-0.026
lights-0.026
ibrarian-0.025
Token My
Token ID2011
Feature activation-8.89e-04

Pos logit contributions
 unnecessary+0.021
 lifelong+0.018
Sur+0.016
 risks+0.016
Got+0.016
Neg logit contributions
pron-0.027
ller-0.025
lf-0.023
lights-0.022
ibrarian-0.022
Token collar
Token ID19908
Feature activation+1.39e-02

Pos logit contributions
pron+0.002
ller+0.001
lf+0.001
lights+0.001
ibrarian+0.001
Neg logit contributions
 unnecessary-0.002
 lifelong-0.001
 risks-0.001
Sur-0.001
Got-0.001
Token is
Token ID318
Feature activation-4.55e-04

Pos logit contributions
 unnecessary+0.026
 lifelong+0.023
Sur+0.021
 risks+0.021
Got+0.021
Neg logit contributions
pron-0.024
ller-0.022
lf-0.020
lights-0.019
ibrarian-0.019
Token\n
Token ID198
Feature activation+1.47e-02

Pos logit contributions
 unnecessary+0.019
 lifelong+0.016
 risks+0.015
 situations+0.015
 chances+0.014
Neg logit contributions
pron-0.016
ller-0.015
lf-0.014
ibrarian-0.013
mallow-0.013
TokenTom
Token ID13787
Feature activation-6.93e-03

Pos logit contributions
 unnecessary+0.035
 lifelong+0.030
 situations+0.029
 risks+0.028
 chances+0.028
Neg logit contributions
pron-0.022
lf-0.019
ller-0.017
ma-0.016
mallow-0.015
Tokenmy
Token ID1820
Feature activation-8.93e-03

Pos logit contributions
pron+0.012
ller+0.011
lights+0.010
lf+0.010
 timer+0.010
Neg logit contributions
 unnecessary-0.011
 lifelong-0.010
Sur-0.010
Got-0.010
 friendships-0.009
Token smiled
Token ID13541
Feature activation+1.09e-02

Pos logit contributions
pron+0.016
ller+0.015
lights+0.015
rant+0.014
mallow+0.014
Neg logit contributions
 unnecessary-0.013
Sur-0.012
 lifelong-0.012
Got-0.012
Help-0.010
Token and
Token ID290
Feature activation-9.03e-03

Pos logit contributions
 unnecessary+0.026
 lifelong+0.023
Got+0.022
 risks+0.022
Sur+0.022
Neg logit contributions
pron-0.013
ller-0.012
lf-0.010
lights-0.010
mallow-0.009
Token started
Token ID2067
Feature activation+8.15e-03

Pos logit contributions
pron+0.013
ller+0.011
lf+0.010
ibrarian+0.009
lights+0.009
Neg logit contributions
 unnecessary-0.021
 lifelong-0.018
 risks-0.017
 situations-0.017
Sur-0.017
Token looking
Token ID2045
Feature activation+2.51e-02

Pos logit contributions
 unnecessary+0.017
 lifelong+0.014
 risks+0.013
Sur+0.013
Got+0.013
Neg logit contributions
pron-0.013
ller-0.012
lf-0.011
lights-0.010
mallow-0.010
Token for
Token ID329
Feature activation+2.25e-02

Pos logit contributions
 unnecessary+0.050
 lifelong+0.043
Sur+0.040
 risks+0.040
Got+0.040
Neg logit contributions
pron-0.041
ller-0.037
lf-0.034
lights-0.032
ibrarian-0.032
Token some
Token ID617
Feature activation+9.46e-03

Pos logit contributions
 unnecessary+0.042
 lifelong+0.036
Sur+0.035
Got+0.033
 situations+0.033
Neg logit contributions
pron-0.038
ller-0.032
lf-0.032
lights-0.030
issa-0.030
Token pants
Token ID12581
Feature activation-5.20e-03

Pos logit contributions
 unnecessary+0.017
 lifelong+0.015
Sur+0.014
Got+0.014
 situations+0.013
Neg logit contributions
pron-0.017
ller-0.014
lf-0.014
lights-0.014
issa-0.013
Token as
Token ID355
Feature activation-1.33e-04

Pos logit contributions
pron+0.008
ller+0.007
lf+0.007
issa+0.006
ibrarian+0.006
Neg logit contributions
 unnecessary-0.010
 lifelong-0.009
 risks-0.009
Sur-0.008
 chances-0.008
Token be
Token ID307
Feature activation+3.98e-03

Pos logit contributions
 unnecessary+0.011
 lifelong+0.009
Sur+0.008
 risks+0.008
Got+0.008
Neg logit contributions
pron-0.010
ller-0.009
lf-0.008
ibrarian-0.008
lights-0.008
Token her
Token ID607
Feature activation+1.16e-02

Pos logit contributions
 unnecessary+0.008
 situations+0.007
 endings+0.006
 risks+0.006
Got+0.006
Neg logit contributions
pron-0.007
hes-0.006
ma-0.006
ller-0.006
lf-0.005
Token friend
Token ID1545
Feature activation+2.05e-02

Pos logit contributions
 unnecessary+0.019
Sur+0.016
 lifelong+0.016
Got+0.016
 situations+0.015
Neg logit contributions
pron-0.022
ller-0.021
lf-0.019
ibrarian-0.019
washer-0.018
Token.
Token ID13
Feature activation+8.99e-03

Pos logit contributions
 unnecessary+0.047
 lifelong+0.042
Sur+0.039
 risks+0.039
Got+0.039
Neg logit contributions
pron-0.027
ller-0.024
lf-0.021
ibrarian-0.020
lights-0.020
Token<|endoftext|>
Token ID50256
Feature activation-8.83e-03

Pos logit contributions
 unnecessary+0.018
 lifelong+0.016
 risks+0.015
 situations+0.014
Sur+0.014
Neg logit contributions
pron-0.015
ller-0.014
lf-0.012
ibrarian-0.012
mallow-0.012
TokenS
Token ID50
Feature activation-8.25e-04

Pos logit contributions
pron+0.019
ller+0.018
lf+0.016
lights+0.016
rant+0.016
Neg logit contributions
 unnecessary-0.012
 lifelong-0.011
Sur-0.010
Got-0.010
 risks-0.009
Tokenara
Token ID3301
Feature activation-1.45e-02

Pos logit contributions
pron+0.001
ller+0.001
lights+0.001
lf+0.001
washer+0.001
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Sur-0.001
Got-0.001
Pe-0.001
Token loved
Token ID6151
Feature activation-1.82e-02

Pos logit contributions
pron+0.026
ller+0.024
lights+0.023
lf+0.021
 timer+0.021
Neg logit contributions
 unnecessary-0.024
 lifelong-0.022
Got-0.021
Sur-0.021
 situations-0.020
Token to
Token ID284
Feature activation+6.56e-03

Pos logit contributions
pron+0.035
ller+0.032
rant+0.030
lights+0.030
ibrarian+0.030
Neg logit contributions
 unnecessary-0.030
 lifelong-0.025
Sur-0.024
Got-0.023
 risks-0.023
Token play
Token ID711
Feature activation-3.83e-03

Pos logit contributions
 unnecessary+0.008
 lifelong+0.007
Sur+0.006
Got+0.006
 risks+0.006
Neg logit contributions
pron-0.014
ller-0.014
lights-0.013
ibrarian-0.013
lf-0.013
Token with
Token ID351
Feature activation-1.49e-03

Pos logit contributions
pron+0.007
ller+0.006
lf+0.006
ibrarian+0.006
lights+0.006
Neg logit contributions
 unnecessary-0.007
 lifelong-0.006
Sur-0.005
 risks-0.005
Got-0.005
TokenNo
Token ID2949
Feature activation-3.92e-03

Pos logit contributions
 unnecessary+0.017
 lifelong+0.013
 situations+0.013
 risks+0.013
 chances+0.012
Neg logit contributions
pron-0.014
ller-0.013
mallow-0.011
lf-0.011
ibrarian-0.011
Token,
Token ID11
Feature activation-1.50e-02

Pos logit contributions
pron+0.006
ller+0.006
lf+0.005
lights+0.005
issa+0.005
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
Sur-0.006
Got-0.006
 risks-0.006
Token I
Token ID314
Feature activation-5.02e-03

Pos logit contributions
pron+0.027
ller+0.024
lf+0.023
ibrarian+0.021
lights+0.021
Neg logit contributions
 unnecessary-0.028
 lifelong-0.024
Sur-0.022
 risks-0.022
Got-0.022
Token don
Token ID836
Feature activation-5.16e-02

Pos logit contributions
pron+0.011
ller+0.010
lf+0.009
ibrarian+0.009
mallow+0.009
Neg logit contributions
 unnecessary-0.008
 lifelong-0.006
Sur-0.006
 risks-0.006
Got-0.006
Token't
Token ID470
Feature activation-4.07e-03

Pos logit contributions
pron+0.093
ller+0.077
isa+0.074
rant+0.073
lights+0.073
Neg logit contributions
 unnecessary-0.090
Sur-0.080
 lifelong-0.080
 Leave-0.077
 risks-0.074
Token.
Token ID13
Feature activation+1.29e-02

Pos logit contributions
pron+0.008
ller+0.008
lf+0.007
ibrarian+0.007
mallow+0.007
Neg logit contributions
 unnecessary-0.007
 lifelong-0.006
Sur-0.005
Got-0.005
 risks-0.005
Token You
Token ID921
Feature activation+2.61e-04

Pos logit contributions
 unnecessary+0.024
 lifelong+0.020
 risks+0.018
Sur+0.018
Got+0.018
Neg logit contributions
pron-0.023
ller-0.021
ibrarian-0.019
lf-0.019
lights-0.019
Token have
Token ID423
Feature activation+8.54e-03

Pos logit contributions
 unnecessary+0.000
 lifelong+0.000
Sur+0.000
Got+0.000
 situations+0.000
Neg logit contributions
pron-0.000
ller-0.000
lf-0.000
ibrarian-0.000
washer-0.000
Token boring
Token ID14262
Feature activation-4.76e-03

Pos logit contributions
 unnecessary+0.017
 lifelong+0.016
Sur+0.015
Got+0.015
 situations+0.014
Neg logit contributions
pron-0.012
ller-0.011
lf-0.009
issa-0.009
 clock-0.009
Token toys
Token ID14958
Feature activation+2.21e-03

Pos logit contributions
pron+0.009
ller+0.008
lf+0.008
lights+0.007
ibrarian+0.007
Neg logit contributions
 unnecessary-0.009
 lifelong-0.007
Sur-0.007
Got-0.007
 risks-0.007
Token and
Token ID290
Feature activation+1.32e-02

Pos logit contributions
 unnecessary+0.004
 lifelong+0.004
 risks+0.004
 chances+0.004
Sur+0.004
Neg logit contributions
pron-0.004
ller-0.003
lf-0.003
arge-0.003
lights-0.003
Token honey
Token ID12498
Feature activation-9.79e-03

Pos logit contributions
pron+0.035
ller+0.032
lf+0.030
ibrarian+0.028
lights+0.028
Neg logit contributions
 unnecessary-0.037
 lifelong-0.032
Sur-0.029
 risks-0.029
Got-0.029
Token.
Token ID13
Feature activation-1.47e-03

Pos logit contributions
pron+0.015
ller+0.014
issa+0.013
lf+0.013
mallow+0.012
Neg logit contributions
 unnecessary-0.019
 lifelong-0.017
 risks-0.015
Sur-0.015
 situations-0.015
Token They
Token ID1119
Feature activation-6.11e-03

Pos logit contributions
pron+0.003
ller+0.002
lf+0.002
lights+0.002
ibrarian+0.002
Neg logit contributions
 unnecessary-0.003
 lifelong-0.002
Sur-0.002
Got-0.002
 risks-0.002
Token also
Token ID635
Feature activation-1.19e-03

Pos logit contributions
pron+0.012
ller+0.011
lights+0.010
lf+0.010
issa+0.010
Neg logit contributions
 unnecessary-0.009
Got-0.008
 lifelong-0.008
 belong-0.008
Sur-0.008
Token sampled
Token ID35846
Feature activation+3.42e-03

Pos logit contributions
pron+0.002
ller+0.002
lights+0.002
lf+0.002
issa+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Got-0.002
Sur-0.002
 belong-0.002
Token some
Token ID617
Feature activation-1.26e-02

Pos logit contributions
 unnecessary+0.006
 lifelong+0.005
Sur+0.005
 risks+0.005
Got+0.005
Neg logit contributions
pron-0.006
ller-0.006
lf-0.005
ibrarian-0.005
lights-0.005
Token crunch
Token ID23527
Feature activation-8.94e-03

Pos logit contributions
pron+0.024
ller+0.023
lf+0.021
issa+0.020
washer+0.020
Neg logit contributions
 unnecessary-0.022
 lifelong-0.018
 risks-0.017
Sur-0.017
Got-0.017
Tokeny
Token ID88
Feature activation-1.12e-02

Pos logit contributions
pron+0.016
ller+0.015
lf+0.014
issa+0.013
ibrarian+0.013
Neg logit contributions
 unnecessary-0.015
 lifelong-0.013
Sur-0.012
Got-0.012
 risks-0.012
Token chips
Token ID12014
Feature activation-1.51e-02

Pos logit contributions
pron+0.020
ller+0.018
lf+0.016
ibrarian+0.016
lights+0.015
Neg logit contributions
 unnecessary-0.021
 lifelong-0.018
Sur-0.017
 risks-0.017
Got-0.017
Token and
Token ID290
Feature activation+1.89e-03

Pos logit contributions
pron+0.026
ller+0.024
lf+0.022
issa+0.021
lights+0.020
Neg logit contributions
 unnecessary-0.027
 lifelong-0.024
Sur-0.023
 risks-0.022
Got-0.022
Token some
Token ID617
Feature activation-5.84e-03

Pos logit contributions
 unnecessary+0.004
 lifelong+0.003
Sur+0.003
Got+0.003
 risks+0.003
Neg logit contributions
pron-0.003
ller-0.003
lf-0.003
ibrarian-0.002
issa-0.002
Token,
Token ID11
Feature activation-2.51e-05

Pos logit contributions
pron+0.010
ller+0.009
lf+0.008
mallow+0.008
issa+0.008
Neg logit contributions
 unnecessary-0.012
 lifelong-0.010
Sur-0.009
Got-0.009
 risks-0.009
Token stay
Token ID2652
Feature activation+1.09e-02

Pos logit contributions
pron+0.000
ller+0.000
lf+0.000
ibrarian+0.000
issa+0.000
Neg logit contributions
 unnecessary-0.000
 lifelong-0.000
Sur-0.000
 risks-0.000
 situations-0.000
Token over
Token ID625
Feature activation+1.40e-02

Pos logit contributions
 unnecessary+0.019
 lifelong+0.016
Got+0.015
Sur+0.015
 situations+0.015
Neg logit contributions
pron-0.021
ller-0.019
lf-0.018
issa-0.017
mallow-0.016
Token there
Token ID612
Feature activation+2.10e-02

Pos logit contributions
 unnecessary+0.027
 lifelong+0.024
Sur+0.022
Got+0.022
 risks+0.021
Neg logit contributions
pron-0.024
ller-0.022
lf-0.020
lights-0.019
mallow-0.019
Token".
Token ID1911
Feature activation+2.42e-03

Pos logit contributions
 unnecessary+0.041
 lifelong+0.035
 risks+0.032
Sur+0.032
 chances+0.032
Neg logit contributions
pron-0.035
ller-0.032
lf-0.030
lights-0.028
ibrarian-0.028
Token
Token ID198
Feature activation-1.27e-03

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
 risks+0.004
Got+0.004
Sur+0.004
Neg logit contributions
pron-0.004
ller-0.004
lf-0.003
ibrarian-0.003
lights-0.003
Token
Token ID198
Feature activation+8.48e-03

Pos logit contributions
pron+0.003
ller+0.002
lf+0.002
lights+0.002
mallow+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Sur-0.002
Got-0.002
 risks-0.002
TokenM
Token ID44
Feature activation-3.76e-04

Pos logit contributions
 unnecessary+0.019
 lifelong+0.016
 situations+0.015
 risks+0.015
 chances+0.015
Neg logit contributions
pron-0.013
lf-0.011
ibrarian-0.010
ller-0.010
arge-0.010
Tokenia
Token ID544
Feature activation-1.18e-03

Pos logit contributions
pron+0.001
ller+0.000
lf+0.000
mallow+0.000
rant+0.000
Neg logit contributions
 unnecessary-0.001
 lifelong-0.001
Sur-0.001
 risks-0.001
 chances-0.001
Token was
Token ID373
Feature activation-7.93e-03

Pos logit contributions
pron+0.002
ller+0.002
lf+0.002
lights+0.002
mallow+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Sur-0.002
Got-0.002
 risks-0.002
Token troubled
Token ID17840
Feature activation+6.65e-03

Pos logit contributions
pron+0.014
ller+0.013
lf+0.012
ibrarian+0.012
lights+0.012
Neg logit contributions
 unnecessary-0.015
 lifelong-0.012
Sur-0.011
 risks-0.011
Got-0.011
Token my
Token ID616
Feature activation-3.78e-04

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
 risks+0.004
Sur+0.004
Got+0.004
Neg logit contributions
pron-0.004
ller-0.004
lf-0.004
lights-0.003
ibrarian-0.003
Token toy
Token ID13373
Feature activation-1.34e-02

Pos logit contributions
pron+0.001
ller+0.001
lf+0.001
ibrarian+0.001
mallow+0.001
Neg logit contributions
 unnecessary-0.001
 lifelong-0.001
Sur-0.001
Got-0.001
 risks-0.001
Token.
Token ID13
Feature activation+4.78e-03

Pos logit contributions
pron+0.021
ller+0.018
lf+0.017
lights+0.016
ibrarian+0.016
Neg logit contributions
 unnecessary-0.028
 lifelong-0.025
Sur-0.023
 risks-0.023
Got-0.023
Token You
Token ID921
Feature activation-3.46e-03

Pos logit contributions
 unnecessary+0.009
 lifelong+0.008
 risks+0.007
Sur+0.007
Got+0.007
Neg logit contributions
pron-0.008
ller-0.008
lf-0.007
ibrarian-0.007
mallow-0.007
Token can
Token ID460
Feature activation-1.46e-02

Pos logit contributions
pron+0.007
ller+0.006
lf+0.006
lights+0.006
ibrarian+0.005
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
 risks-0.005
Sur-0.005
Got-0.004
Token play
Token ID711
Feature activation-1.76e-02

Pos logit contributions
pron+0.027
ller+0.024
lights+0.022
lf+0.022
mallow+0.021
Neg logit contributions
 unnecessary-0.025
 lifelong-0.022
 risks-0.021
 belong-0.020
Sur-0.020
Token with
Token ID351
Feature activation-1.18e-02

Pos logit contributions
pron+0.034
ller+0.031
lf+0.029
lights+0.028
ibrarian+0.028
Neg logit contributions
 unnecessary-0.030
 lifelong-0.025
Sur-0.023
 risks-0.023
Got-0.023
Token something
Token ID1223
Feature activation-1.08e-03

Pos logit contributions
pron+0.021
ller+0.019
lf+0.017
lights+0.017
mallow+0.017
Neg logit contributions
 unnecessary-0.022
 lifelong-0.018
 risks-0.018
 chances-0.017
Sur-0.017
Token else
Token ID2073
Feature activation-9.13e-03

Pos logit contributions
pron+0.002
ller+0.002
lf+0.002
washer+0.002
lights+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Got-0.002
Sur-0.002
 situations-0.002
Token."
Token ID526
Feature activation-5.31e-04

Pos logit contributions
pron+0.015
ller+0.013
lf+0.012
lights+0.012
mallow+0.011
Neg logit contributions
 unnecessary-0.018
 lifelong-0.016
 risks-0.015
 chances-0.015
 situations-0.015
Token
Token ID198
Feature activation-2.35e-04

Pos logit contributions
pron+0.001
ller+0.001
lf+0.001
lights+0.001
ibrarian+0.001
Neg logit contributions
 unnecessary-0.001
 lifelong-0.001
Sur-0.001
 risks-0.001
Got-0.001
Token played
Token ID2826
Feature activation+5.99e-04

Pos logit contributions
pron+0.016
ller+0.015
lights+0.013
lf+0.013
ibrarian+0.013
Neg logit contributions
 unnecessary-0.013
 lifelong-0.012
Got-0.011
Sur-0.010
 risks-0.010
Token with
Token ID351
Feature activation+9.04e-03

Pos logit contributions
 unnecessary+0.001
 lifelong+0.001
Sur+0.001
 risks+0.001
Got+0.001
Neg logit contributions
pron-0.001
ller-0.001
lf-0.001
lights-0.001
mallow-0.001
Token balls
Token ID11333
Feature activation-6.04e-03

Pos logit contributions
 unnecessary+0.016
 lifelong+0.014
Sur+0.013
 risks+0.013
Got+0.012
Neg logit contributions
pron-0.017
ller-0.015
lf-0.014
lights-0.014
ibrarian-0.013
Token,
Token ID11
Feature activation-4.91e-03

Pos logit contributions
pron+0.011
ller+0.011
issa+0.009
lf+0.009
isa+0.009
Neg logit contributions
 unnecessary-0.010
 lifelong-0.009
Sur-0.008
 chances-0.008
 friendships-0.008
Token ran
Token ID4966
Feature activation+1.86e-03

Pos logit contributions
pron+0.008
ller+0.008
lf+0.007
ibrarian+0.007
lights+0.007
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Got-0.008
Sur-0.007
 risks-0.007
Token around
Token ID1088
Feature activation-5.41e-03

Pos logit contributions
 unnecessary+0.004
 lifelong+0.004
Sur+0.004
 risks+0.004
Got+0.004
Neg logit contributions
pron-0.002
ller-0.002
lf-0.002
lights-0.002
ibrarian-0.002
Token,
Token ID11
Feature activation-1.40e-02

Pos logit contributions
pron+0.010
ller+0.009
ibrarian+0.008
lights+0.008
lf+0.008
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.007
Got-0.007
 risks-0.007
Token and
Token ID290
Feature activation-1.17e-02

Pos logit contributions
pron+0.024
ller+0.022
lights+0.021
ibrarian+0.021
roach+0.020
Neg logit contributions
 unnecessary-0.022
 lifelong-0.021
Sur-0.019
Got-0.019
Help-0.018
Token even
Token ID772
Feature activation-2.71e-03

Pos logit contributions
pron+0.021
ller+0.019
ibrarian+0.017
lf+0.017
lights+0.017
Neg logit contributions
 unnecessary-0.021
 lifelong-0.019
Got-0.017
Sur-0.017
 risks-0.016
Token made
Token ID925
Feature activation+6.83e-03

Pos logit contributions
pron+0.004
lf+0.003
ller+0.003
washer+0.003
rant+0.003
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
 risks-0.005
 situations-0.005
Sur-0.005
Token a
Token ID257
Feature activation+2.49e-02

Pos logit contributions
 unnecessary+0.012
 lifelong+0.011
 situations+0.010
Sur+0.010
 risks+0.010
Neg logit contributions
pron-0.011
ller-0.010
lf-0.010
lights-0.009
ibrarian-0.009
Tokenily
Token ID813
Feature activation-1.53e-02

Pos logit contributions
pron+0.032
ller+0.027
lf+0.026
ibrarian+0.026
issa+0.024
Neg logit contributions
 unnecessary-0.054
 lifelong-0.048
 risks-0.044
Got-0.043
Sur-0.043
Token took
Token ID1718
Feature activation-1.08e-02

Pos logit contributions
pron+0.028
ller+0.025
lights+0.024
rant+0.023
mallow+0.023
Neg logit contributions
 unnecessary-0.023
 lifelong-0.023
Sur-0.022
Got-0.021
 friendships-0.019
Token the
Token ID262
Feature activation-8.66e-03

Pos logit contributions
pron+0.015
ller+0.014
ibrarian+0.012
issa+0.012
lf+0.012
Neg logit contributions
 unnecessary-0.023
 situations-0.020
 lifelong-0.019
 endings-0.019
Got-0.018
Token orange
Token ID10912
Feature activation-2.78e-02

Pos logit contributions
pron+0.014
ller+0.013
lf+0.011
lights+0.011
ibrarian+0.011
Neg logit contributions
 unnecessary-0.018
 lifelong-0.016
Sur-0.014
Got-0.014
 risks-0.014
Token and
Token ID290
Feature activation-2.08e-02

Pos logit contributions
pron+0.043
ller+0.039
lf+0.035
ibrarian+0.034
issa+0.033
Neg logit contributions
 unnecessary-0.057
 lifelong-0.051
Sur-0.048
Got-0.047
 risks-0.047
Token started
Token ID2067
Feature activation-7.33e-03

Pos logit contributions
pron+0.037
ller+0.034
lf+0.031
ibrarian+0.030
lights+0.030
Neg logit contributions
 unnecessary-0.038
 lifelong-0.033
Sur-0.030
Got-0.030
 risks-0.029
Token to
Token ID284
Feature activation-2.71e-03

Pos logit contributions
pron+0.011
ller+0.010
lf+0.009
mallow+0.008
washer+0.008
Neg logit contributions
 unnecessary-0.016
 lifelong-0.014
 risks-0.013
Got-0.013
 situations-0.013
Token wave
Token ID6769
Feature activation+2.44e-03

Pos logit contributions
pron+0.005
ller+0.005
lf+0.005
mallow+0.004
ibrarian+0.004
Neg logit contributions
 unnecessary-0.005
 lifelong-0.004
 risks-0.004
Sur-0.004
Got-0.004
Token it
Token ID340
Feature activation-6.70e-03

Pos logit contributions
 unnecessary+0.005
 lifelong+0.004
 situations+0.004
 risks+0.004
Got+0.004
Neg logit contributions
pron-0.004
ller-0.004
lf-0.003
ibrarian-0.003
lights-0.003
Token around
Token ID1088
Feature activation-1.20e-02

Pos logit contributions
pron+0.010
ller+0.009
lf+0.008
lights+0.008
ibrarian+0.008
Neg logit contributions
 unnecessary-0.014
 lifelong-0.012
Sur-0.012
Got-0.011
 risks-0.011
Token,
Token ID11
Feature activation-1.52e-02

Pos logit contributions
pron+0.022
ller+0.020
lf+0.018
lights+0.018
ibrarian+0.018
Neg logit contributions
 unnecessary-0.022
 lifelong-0.019
Sur-0.017
 risks-0.017
Got-0.017
Token some
Token ID617
Feature activation-7.79e-03

Pos logit contributions
pron+0.041
ller+0.039
lf+0.036
lights+0.035
ibrarian+0.034
Neg logit contributions
 unnecessary-0.037
 lifelong-0.031
Sur-0.029
 risks-0.028
Got-0.028
Token cake
Token ID12187
Feature activation-2.93e-03

Pos logit contributions
pron+0.013
ller+0.012
lf+0.011
lights+0.011
issa+0.010
Neg logit contributions
 unnecessary-0.015
 lifelong-0.013
Sur-0.012
 chances-0.012
 risks-0.012
Token too
Token ID1165
Feature activation-4.08e-03

Pos logit contributions
pron+0.005
ller+0.004
lf+0.004
issa+0.004
ibrarian+0.004
Neg logit contributions
 unnecessary-0.006
 lifelong-0.005
Sur-0.005
 risks-0.005
 chances-0.005
Token.
Token ID13
Feature activation-7.67e-03

Pos logit contributions
pron+0.006
ller+0.006
lf+0.005
ibrarian+0.005
issa+0.005
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
 risks-0.007
Sur-0.007
 chances-0.007
Token Lily
Token ID20037
Feature activation-1.21e-02

Pos logit contributions
pron+0.015
ller+0.014
lf+0.013
mallow+0.012
ibrarian+0.012
Neg logit contributions
 unnecessary-0.013
 lifelong-0.011
Sur-0.010
Got-0.010
 risks-0.010
Token didn
Token ID1422
Feature activation-3.43e-02

Pos logit contributions
pron+0.025
lights+0.024
ller+0.023
rant+0.021
lf+0.021
Neg logit contributions
 unnecessary-0.017
 lifelong-0.014
Sur-0.014
Got-0.013
Help-0.012
Token't
Token ID470
Feature activation-4.61e-03

Pos logit contributions
pron+0.053
ller+0.047
lights+0.043
lf+0.042
isa+0.040
Neg logit contributions
 unnecessary-0.070
 lifelong-0.062
Sur-0.059
 risks-0.057
Got-0.057
Token want
Token ID765
Feature activation-2.84e-02

Pos logit contributions
pron+0.009
ller+0.009
lf+0.008
ibrarian+0.008
mallow+0.008
Neg logit contributions
 unnecessary-0.007
 lifelong-0.006
Sur-0.005
Got-0.005
 risks-0.005
Token to
Token ID284
Feature activation-1.83e-03

Pos logit contributions
pron+0.053
ller+0.048
lf+0.045
lights+0.043
mallow+0.042
Neg logit contributions
 unnecessary-0.052
 lifelong-0.044
 risks-0.041
Sur-0.040
 chances-0.040
Token share
Token ID2648
Feature activation-9.03e-03

Pos logit contributions
pron+0.004
ller+0.004
lf+0.003
lights+0.003
ibrarian+0.003
Neg logit contributions
 unnecessary-0.003
 lifelong-0.002
Sur-0.002
 risks-0.002
Got-0.002
Token and
Token ID290
Feature activation+9.35e-03

Pos logit contributions
pron+0.017
ller+0.016
lf+0.015
ibrarian+0.014
washer+0.014
Neg logit contributions
 unnecessary-0.015
 lifelong-0.013
 risks-0.012
Sur-0.012
Got-0.011
Token Her
Token ID2332
Feature activation-5.17e-03

Pos logit contributions
pron+0.027
ller+0.025
lf+0.024
rant+0.024
ibrarian+0.024
Neg logit contributions
 unnecessary-0.015
 lifelong-0.013
Sur-0.013
Got-0.011
Pe-0.011
Token mom
Token ID1995
Feature activation+5.60e-03

Pos logit contributions
pron+0.007
ller+0.007
rant+0.006
lf+0.006
ibrarian+0.006
Neg logit contributions
 unnecessary-0.011
 lifelong-0.010
Sur-0.009
Got-0.009
 friendships-0.009
Tokenmy
Token ID1820
Feature activation-1.01e-03

Pos logit contributions
 lifelong+0.005
Sur+0.005
Got+0.004
 true+0.004
 worry+0.004
Neg logit contributions
lights-0.012
pron-0.012
roach-0.011
ller-0.011
alo-0.011
Token told
Token ID1297
Feature activation-1.33e-02

Pos logit contributions
lights+0.002
pron+0.002
ller+0.002
rant+0.002
roach+0.002
Neg logit contributions
Sur-0.001
 lifelong-0.001
Got-0.001
Help-0.001
 unnecessary-0.001
Token her
Token ID607
Feature activation+1.86e-03

Pos logit contributions
pron+0.024
ller+0.022
lf+0.020
ibrarian+0.019
issa+0.019
Neg logit contributions
 unnecessary-0.025
 lifelong-0.021
Sur-0.019
 risks-0.019
Got-0.019
Token that
Token ID326
Feature activation-3.67e-02

Pos logit contributions
 unnecessary+0.004
 lifelong+0.003
 risks+0.003
 chances+0.003
 friendships+0.003
Neg logit contributions
pron-0.003
ller-0.003
mallow-0.002
rant-0.002
lf-0.002
Token it
Token ID340
Feature activation+3.98e-03

Pos logit contributions
pron+0.063
ller+0.058
lf+0.053
ibrarian+0.051
lights+0.051
Neg logit contributions
 unnecessary-0.068
 lifelong-0.060
Sur-0.055
Got-0.054
 risks-0.054
Token came
Token ID1625
Feature activation+2.95e-02

Pos logit contributions
 unnecessary+0.006
 lifelong+0.005
Sur+0.005
 risks+0.005
Got+0.005
Neg logit contributions
pron-0.008
ller-0.008
lf-0.007
lights-0.007
ibrarian-0.007
Token from
Token ID422
Feature activation+8.64e-03

Pos logit contributions
 unnecessary+0.054
Got+0.049
 situations+0.047
 lifelong+0.046
Sur+0.045
Neg logit contributions
pron-0.048
ller-0.046
work-0.041
rant-0.040
bell-0.039
Token a
Token ID257
Feature activation-1.24e-02

Pos logit contributions
 unnecessary+0.014
 lifelong+0.012
 risks+0.011
 situations+0.010
 chances+0.010
Neg logit contributions
pron-0.018
ller-0.016
lf-0.015
lights-0.015
ibrarian-0.015
Token special
Token ID2041
Feature activation-1.29e-02

Pos logit contributions
pron+0.024
ller+0.021
lf+0.020
washer+0.018
mallow+0.018
Neg logit contributions
 unnecessary-0.022
 lifelong-0.019
Sur-0.017
Got-0.017
 risks-0.017
Token box
Token ID3091
Feature activation-2.20e-02

Pos logit contributions
pron+0.016
ller+0.015
lf+0.013
lights+0.012
mallow+0.012
Neg logit contributions
 unnecessary-0.026
 lifelong-0.023
Sur-0.022
 risks-0.022
Got-0.022
Token and
Token ID290
Feature activation-2.10e-02

Pos logit contributions
pron+0.041
ller+0.037
roach+0.034
mallow+0.033
issa+0.033
Neg logit contributions
 lifelong-0.032
 unnecessary-0.031
Sur-0.030
 chances-0.028
 situations-0.027
Token showed
Token ID3751
Feature activation-6.14e-03

Pos logit contributions
pron+0.036
ller+0.033
lf+0.030
lights+0.029
mallow+0.028
Neg logit contributions
 unnecessary-0.040
 lifelong-0.035
Sur-0.033
Got-0.032
 risks-0.032
Token them
Token ID606
Feature activation+1.84e-02

Pos logit contributions
pron+0.008
ller+0.008
ibrarian+0.008
lf+0.007
issa+0.007
Neg logit contributions
 unnecessary-0.014
 lifelong-0.011
 situations-0.011
 risks-0.011
Sur-0.011
Token what
Token ID644
Feature activation+7.25e-03

Pos logit contributions
 unnecessary+0.033
 lifelong+0.028
Sur+0.026
 risks+0.025
Got+0.025
Neg logit contributions
pron-0.034
ller-0.031
lf-0.029
ibrarian-0.028
lights-0.028
Token was
Token ID373
Feature activation-3.32e-02

Pos logit contributions
 unnecessary+0.015
 lifelong+0.013
 risks+0.012
Sur+0.012
Got+0.012
Neg logit contributions
pron-0.012
ller-0.011
lf-0.010
ibrarian-0.009
washer-0.009
Token inside
Token ID2641
Feature activation+1.51e-02

Pos logit contributions
pron+0.053
ller+0.048
lf+0.043
lights+0.041
mallow+0.041
Neg logit contributions
 unnecessary-0.068
 lifelong-0.059
 risks-0.055
Sur-0.055
Got-0.054
Token.
Token ID13
Feature activation-1.05e-02

Pos logit contributions
 unnecessary+0.033
 lifelong+0.029
 risks+0.027
Sur+0.027
 situations+0.027
Neg logit contributions
pron-0.022
ller-0.019
lf-0.017
ibrarian-0.016
lights-0.016
Token It
Token ID632
Feature activation-1.60e-02

Pos logit contributions
pron+0.019
ller+0.017
lf+0.016
mallow+0.015
ibrarian+0.015
Neg logit contributions
 unnecessary-0.019
 lifelong-0.017
Sur-0.015
Got-0.015
 risks-0.015
Token was
Token ID373
Feature activation-6.41e-03

Pos logit contributions
pron+0.034
ller+0.031
lf+0.028
lights+0.028
mallow+0.028
Neg logit contributions
 unnecessary-0.022
 lifelong-0.020
Sur-0.019
Got-0.018
 risks-0.016
Token a
Token ID257
Feature activation-9.94e-03

Pos logit contributions
pron+0.012
ller+0.011
lf+0.010
ibrarian+0.010
lights+0.010
Neg logit contributions
 unnecessary-0.011
 lifelong-0.009
Sur-0.009
 risks-0.008
Got-0.008
Token delicious
Token ID12625
Feature activation-1.03e-03

Pos logit contributions
pron+0.012
ller+0.011
lf+0.011
ibrarian+0.010
lights+0.010
Neg logit contributions
 unnecessary-0.012
 lifelong-0.010
Sur-0.009
 risks-0.009
Got-0.009
Token drink
Token ID4144
Feature activation-1.17e-02

Pos logit contributions
pron+0.002
ller+0.002
lights+0.002
ibrarian+0.002
issa+0.002
Neg logit contributions
 unnecessary-0.002
 lifelong-0.002
Sur-0.001
 risks-0.001
Got-0.001
Token inside
Token ID2641
Feature activation+1.15e-02

Pos logit contributions
pron+0.019
ller+0.018
lf+0.016
issa+0.016
ibrarian+0.015
Neg logit contributions
 unnecessary-0.022
 lifelong-0.019
Sur-0.019
Got-0.018
 risks-0.018
Token.
Token ID13
Feature activation-4.60e-03

Pos logit contributions
 unnecessary+0.020
 lifelong+0.018
 situations+0.017
 risks+0.016
 chances+0.016
Neg logit contributions
pron-0.020
ller-0.019
ibrarian-0.017
issa-0.016
lf-0.016
Token He
Token ID679
Feature activation-1.22e-02

Pos logit contributions
pron+0.008
ller+0.007
lf+0.007
ibrarian+0.007
lights+0.006
Neg logit contributions
 unnecessary-0.009
 lifelong-0.007
 risks-0.007
Sur-0.007
Got-0.007
Token couldn
Token ID3521
Feature activation-3.60e-02

Pos logit contributions
pron+0.022
ller+0.020
lf+0.018
lights+0.018
mallow+0.018
Neg logit contributions
 unnecessary-0.022
 lifelong-0.019
Sur-0.018
Got-0.018
 risks-0.017
Token't
Token ID470
Feature activation-1.61e-02

Pos logit contributions
pron+0.052
ller+0.046
lf+0.042
lights+0.041
ibrarian+0.038
Neg logit contributions
 unnecessary-0.077
 lifelong-0.069
Sur-0.065
Got-0.064
 risks-0.063
Token believe
Token ID1975
Feature activation-2.59e-03

Pos logit contributions
pron+0.034
ller+0.032
lf+0.029
lights+0.028
ibrarian+0.028
Neg logit contributions
 unnecessary-0.025
 lifelong-0.021
Sur-0.019
 risks-0.019
Got-0.018
Token how
Token ID703
Feature activation+4.82e-03

Pos logit contributions
pron+0.005
ller+0.005
ibrarian+0.004
mallow+0.004
lf+0.004
Neg logit contributions
 unnecessary-0.004
 lifelong-0.004
 risks-0.003
Sur-0.003
 chances-0.003
Token good
Token ID922
Feature activation+5.71e-03

Pos logit contributions
 unnecessary+0.007
 lifelong+0.006
Sur+0.005
 risks+0.005
Got+0.005
Neg logit contributions
pron-0.010
ller-0.010
lf-0.009
ibrarian-0.009
mallow-0.009
Token it
Token ID340
Feature activation-2.16e-03

Pos logit contributions
 unnecessary+0.012
 lifelong+0.011
Sur+0.010
Got+0.010
 risks+0.010
Neg logit contributions
pron-0.009
ller-0.008
lf-0.007
lights-0.007
mallow-0.006
Tokenrot
Token ID10599
Feature activation-6.73e-03

Pos logit contributions
pron+0.020
ller+0.018
lights+0.017
lf+0.017
mallow+0.017
Neg logit contributions
 unnecessary-0.016
 lifelong-0.014
Sur-0.013
Got-0.013
 risks-0.012
Token named
Token ID3706
Feature activation-3.87e-03

Pos logit contributions
pron+0.010
ller+0.009
lf+0.009
ibrarian+0.008
mallow+0.008
Neg logit contributions
 unnecessary-0.015
 lifelong-0.013
 risks-0.012
 situations-0.011
 chances-0.011
Token Polly
Token ID36898
Feature activation+1.08e-02

Pos logit contributions
ller+0.006
pron+0.005
lf+0.005
issa+0.005
ibrarian+0.005
Neg logit contributions
 unnecessary-0.008
 lifelong-0.007
 risks-0.007
Sur-0.006
 situations-0.006
Token flew
Token ID13112
Feature activation+4.74e-03

Pos logit contributions
 unnecessary+0.022
 lifelong+0.019
 risks+0.018
Sur+0.017
 chances+0.017
Neg logit contributions
pron-0.017
ller-0.016
lf-0.015
ibrarian-0.014
mallow-0.014
Token by
Token ID416
Feature activation-4.42e-03

Pos logit contributions
 unnecessary+0.010
 lifelong+0.008
 situations+0.008
Sur+0.008
Got+0.008
Neg logit contributions
pron-0.007
ller-0.007
lf-0.006
ibrarian-0.006
rant-0.006
Token and
Token ID290
Feature activation-3.30e-02

Pos logit contributions
pron+0.007
ller+0.006
lf+0.005
lights+0.005
washer+0.005
Neg logit contributions
 unnecessary-0.009
 lifelong-0.008
Sur-0.008
 risks-0.008
 situations-0.008
Token saw
Token ID2497
Feature activation-8.37e-03

Pos logit contributions
pron+0.055
ller+0.049
lf+0.045
lights+0.043
ibrarian+0.043
Neg logit contributions
 unnecessary-0.065
 lifelong-0.057
Sur-0.053
Got-0.052
 risks-0.052
Token Max
Token ID5436
Feature activation+1.35e-03

Pos logit contributions
pron+0.013
ller+0.012
lf+0.011
ibrarian+0.010
issa+0.010
Neg logit contributions
 unnecessary-0.017
 lifelong-0.015
Sur-0.014
 risks-0.014
Got-0.014
Token drinking
Token ID7722
Feature activation+4.55e-03

Pos logit contributions
 unnecessary+0.003
 lifelong+0.002
Sur+0.002
Got+0.002
 risks+0.002
Neg logit contributions
pron-0.002
ller-0.002
lf-0.002
mallow-0.002
ibrarian-0.002
Token the
Token ID262
Feature activation+7.11e-03

Pos logit contributions
 unnecessary+0.009
 lifelong+0.008
Sur+0.007
 risks+0.007
Got+0.007
Neg logit contributions
pron-0.008
ller-0.007
lf-0.007
lights-0.006
mallow-0.006
Token lime
Token ID28738
Feature activation-1.92e-02

Pos logit contributions
 unnecessary+0.015
 lifelong+0.013
 situations+0.012
Sur+0.012
 risks+0.012
Neg logit contributions
pron-0.011
ller-0.009
 oven-0.008
oni-0.008
 stove-0.008