TOP ACTIVATIONS
MAX = 0.077

a lick and a w ag . He was nice .
strong body and a w ag gy tail . One day
. It was a se ag ull . It had Pip
brown dog with a w ag gy tail . Tim my
black dog with a w ag gy tail . The dog
his friend , a se ag ull named Tim my ,
. It was a se ag ull ! It flew over
then , a big se ag ull saw the oy ster
She gives Sara a l oll ipop and a hug .
M um and Bobby hugged each other and then Mum said
had a long , w ag gy tail and loved to
, they saw a se ag ull fly down and pick
flew off with a w ag of his tail ! <|endoftext|>
. She grabbed a l oll ipop from her bag and
at ?" The bear thought as he peek ed out from
Suddenly , a se ag ull flew down , and
She even saw a se ag ull ! When
gave Tim my and Sally each a medal of their own
J enny spotted a se ag ull on the sand .
friends , but a se ag ull swoop ed down and

MIN ACTIVATIONS
MIN = -0.075

day long and when the evening came , it was very
owl 's stories until the night was over , and the
and cl apping as the night sky lit up with streaks
all day and when the evening came , he looked at
, but it was all fore shadow ed by the kind
in her room when the night came . She wanted to
in the machine until the evening . He was so proud
flew like butterflies until the evening . Then , with a
said " Hom eless and Hungry ". Tim my asked his
time , until eventually the night sky filled with stars .
all week and when the night finally arrived , he was
around and around until the night sky began to come alive
danced together , until the night turned to day . From
everything she recorded on her Sunday adventure . <|endoftext|> Once there
Ash ley , when the night comes you must hide the
the heavy bag had been fore shadow ed , and that
not wake up until the morning . He smiled , and
all afternoon and as the evening sky turned a deep pink
all afternoon , until the morning sky started to turn orange
's go get your p aj amas from the drawer ."

INTERVAL 0.039 - 0.077
CONTAINS 0.786%

with your lion . And we can be friends again ."
happened . Her mom did not believe her . She thought
a mess like that . I must punish you ." So
He was sad and didn 't know what to do .
said with a ch ir p . But Jack

INTERVAL 0.001 - 0.039
CONTAINS 45.905%

tasty . They said thank you to Ben and smiled .
liked to pretend they were heroes and villains . Ben had
stuck . Max was scared and didn 't know
old man told him the truth , but then it began
From that day , Tim and Sam shared their

INTERVAL -0.037 - 0.001
CONTAINS 52.708%

applaud ing . He started to applaud too . Then ,
Just get back up and try again ." L
was like or what she was thinking . One
called Jenna . Jenna was very scared , because she was
patch of red flowers that were dancing in the breeze .

INTERVAL -0.075 - -0.037
CONTAINS 0.600%

Tommy asked his mum : Can
minutes . Finally , her mom said , " OK ,
. Tim my loved going on vacation because he got to
and vowed to come back soon . When
can 't play until your room is clean ." So Tim

Token a
Token ID257
Feature activation+1.65e-02

Pos logit contributions
 nod+0.047
 agreement+0.042
 name+0.041
 understand+0.040
 gra+0.034
Neg logit contributions
ways-0.052
pill-0.052
ths-0.050
ience-0.050
-0.048
Token lick
Token ID36728
Feature activation+3.67e-02

Pos logit contributions
 agreement+0.024
 nod+0.022
 understand+0.019
 whine+0.018
 name+0.018
Neg logit contributions
pill-0.028
ha-0.028
endant-0.027
ww-0.027
-0.027
Token and
Token ID290
Feature activation-2.27e-03

Pos logit contributions
 nod+0.074
 agreement+0.070
 name+0.067
 understand+0.067
 gra+0.062
Neg logit contributions
oh-0.047
ways-0.045
ar-0.045
t-0.044
st-0.043
Token a
Token ID257
Feature activation+2.47e-02

Pos logit contributions
pill+0.003
ways+0.003
ience+0.003
ths+0.003
endant+0.003
Neg logit contributions
 nod-0.004
 agreement-0.004
 name-0.004
 understand-0.004
 gra-0.003
Token w
Token ID266
Feature activation+2.56e-02

Pos logit contributions
 agreement+0.035
 nod+0.032
 understand+0.031
 name+0.029
 whine+0.028
Neg logit contributions
endant-0.040
-0.040
pill-0.039
br-0.038
 transformed-0.038
Tokenag
Token ID363
Feature activation+7.72e-02

Pos logit contributions
 nod+0.045
 name+0.040
 agreement+0.040
 understand+0.039
 head+0.033
Neg logit contributions
ways-0.043
pill-0.042
ths-0.042
ience-0.040
 lifetime-0.039
Token.
Token ID13
Feature activation+1.25e-02

Pos logit contributions
 nod+0.147
 agreement+0.130
 understand+0.121
 whine+0.119
 name+0.118
Neg logit contributions
ar-0.110
t-0.107
oh-0.106
ways-0.106
yl-0.102
Token He
Token ID679
Feature activation-7.83e-03

Pos logit contributions
 nod+0.018
 agreement+0.016
 name+0.015
 understand+0.015
 gra+0.013
Neg logit contributions
ways-0.025
pill-0.024
-0.023
ths-0.023
ience-0.023
Token was
Token ID373
Feature activation+6.25e-03

Pos logit contributions
ways+0.013
pill+0.012
ths+0.012
ience+0.012
+0.011
Neg logit contributions
 nod-0.014
 name-0.013
 agreement-0.013
 understand-0.012
 head-0.011
Token nice
Token ID3621
Feature activation-1.06e-02

Pos logit contributions
 nod+0.012
 agreement+0.011
 name+0.010
 understand+0.010
 gra+0.009
Neg logit contributions
ways-0.009
ths-0.008
 transformed-0.008
pill-0.008
endant-0.008
Token.
Token ID13
Feature activation+1.08e-02

Pos logit contributions
ways+0.020
pill+0.020
ths+0.020
ience+0.020
+0.019
Neg logit contributions
 nod-0.016
 name-0.014
 agreement-0.014
 understand-0.013
 head-0.012
Token strong
Token ID1913
Feature activation+3.20e-02

Pos logit contributions
 agreement+0.012
 nod+0.012
 understand+0.011
 gra+0.010
 name+0.010
Neg logit contributions
-0.009
 lifetime-0.009
 seas-0.008
ww-0.008
ience-0.008
Token body
Token ID1767
Feature activation+3.28e-02

Pos logit contributions
 agreement+0.041
 nod+0.041
 understand+0.038
 gra+0.038
 name+0.033
Neg logit contributions
ar-0.052
-0.051
 seas-0.051
ways-0.049
ience-0.049
Token and
Token ID290
Feature activation-5.07e-03

Pos logit contributions
 nod+0.068
 name+0.062
 agreement+0.062
 understand+0.060
 head+0.054
Neg logit contributions
ways-0.043
pill-0.043
ths-0.041
ience-0.041
-0.039
Token a
Token ID257
Feature activation+1.88e-02

Pos logit contributions
ways+0.008
pill+0.008
ience+0.008
ths+0.008
ar+0.007
Neg logit contributions
 nod-0.009
 agreement-0.008
 name-0.007
 understand-0.007
 gra-0.007
Token w
Token ID266
Feature activation+2.18e-02

Pos logit contributions
 nod+0.033
 agreement+0.033
 understand+0.030
 gra+0.028
 name+0.028
Neg logit contributions
 lifetime-0.025
-0.024
ience-0.023
ways-0.023
 seas-0.023
Tokenag
Token ID363
Feature activation+7.47e-02

Pos logit contributions
 nod+0.037
 name+0.034
 agreement+0.033
 understand+0.032
 head+0.028
Neg logit contributions
ways-0.037
pill-0.036
ths-0.036
ience-0.035
 lifetime-0.034
Tokengy
Token ID1360
Feature activation+1.30e-02

Pos logit contributions
 nod+0.153
 agreement+0.134
 understand+0.127
 whine+0.124
 name+0.120
Neg logit contributions
ways-0.095
ar-0.094
oh-0.091
st-0.091
t-0.091
Token tail
Token ID7894
Feature activation+1.75e-02

Pos logit contributions
 nod+0.020
 agreement+0.017
 name+0.017
 understand+0.017
 gra+0.014
Neg logit contributions
ways-0.024
pill-0.024
ience-0.023
ths-0.023
ar-0.022
Token.
Token ID13
Feature activation+8.69e-04

Pos logit contributions
 nod+0.032
 agreement+0.029
 name+0.029
 understand+0.028
 gra+0.024
Neg logit contributions
ways-0.028
pill-0.027
ths-0.026
ience-0.026
-0.025
Token One
Token ID1881
Feature activation+2.23e-02

Pos logit contributions
 nod+0.002
 agreement+0.002
 understand+0.001
 name+0.001
 whine+0.001
Neg logit contributions
ways-0.001
pill-0.001
-0.001
ar-0.001
 seas-0.001
Token day
Token ID1110
Feature activation-7.74e-03

Pos logit contributions
 nod+0.051
 agreement+0.051
 whine+0.045
 reins+0.045
 understand+0.044
Neg logit contributions
ww-0.019
oh-0.019
t-0.019
way-0.019
st-0.018
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
 nod+0.029
 agreement+0.028
 understand+0.026
 name+0.026
 gra+0.024
Neg logit contributions
ways-0.019
pill-0.017
endant-0.017
-0.017
ar-0.017
Token It
Token ID632
Feature activation+3.26e-03

Pos logit contributions
 nod+0.018
 agreement+0.016
 name+0.016
 understand+0.015
 gra+0.013
Neg logit contributions
ways-0.023
pill-0.022
ths-0.022
ience-0.021
-0.021
Token was
Token ID373
Feature activation-2.00e-03

Pos logit contributions
 nod+0.006
 agreement+0.005
 name+0.005
 understand+0.005
 gra+0.004
Neg logit contributions
ways-0.005
pill-0.005
ths-0.005
ience-0.005
-0.005
Token a
Token ID257
Feature activation+8.16e-03

Pos logit contributions
ways+0.003
pill+0.003
ths+0.003
endant+0.003
 transformed+0.003
Neg logit contributions
 nod-0.004
 agreement-0.004
 understand-0.003
 name-0.003
 whine-0.003
Token se
Token ID384
Feature activation+2.79e-02

Pos logit contributions
 nod+0.014
 agreement+0.013
 name+0.013
 understand+0.012
 gra+0.011
Neg logit contributions
ways-0.013
pill-0.013
ths-0.013
ience-0.013
-0.012
Tokenag
Token ID363
Feature activation+7.43e-02

Pos logit contributions
 nod+0.041
 agreement+0.037
 name+0.037
 understand+0.034
 gra+0.030
Neg logit contributions
ways-0.053
pill-0.051
ience-0.050
ar-0.050
ths-0.049
Tokenull
Token ID724
Feature activation+1.54e-02

Pos logit contributions
 nod+0.139
 agreement+0.124
 understand+0.118
 name+0.113
 whine+0.111
Neg logit contributions
ar-0.107
oh-0.107
ways-0.106
-0.102
st-0.102
Token.
Token ID13
Feature activation+1.17e-02

Pos logit contributions
 nod+0.028
 name+0.026
 understand+0.025
 agreement+0.024
 head+0.023
Neg logit contributions
ways-0.024
ths-0.024
pill-0.023
ience-0.022
-0.022
Token It
Token ID632
Feature activation+4.05e-03

Pos logit contributions
 nod+0.018
 name+0.016
 agreement+0.016
 understand+0.015
 gra+0.013
Neg logit contributions
ways-0.022
pill-0.022
ths-0.021
ience-0.021
oh-0.020
Token had
Token ID550
Feature activation-2.57e-02

Pos logit contributions
 nod+0.007
 agreement+0.007
 name+0.007
 understand+0.006
 gra+0.006
Neg logit contributions
ways-0.006
pill-0.006
ths-0.006
ience-0.006
-0.006
Token Pip
Token ID25149
Feature activation+5.95e-03

Pos logit contributions
ths+0.049
ways+0.048
+0.048
ience+0.046
pill+0.045
Neg logit contributions
 nod-0.040
 name-0.040
 understand-0.038
 agreement-0.034
 gra-0.032
Token brown
Token ID7586
Feature activation+3.57e-02

Pos logit contributions
 nod+0.033
 agreement+0.033
 name+0.030
 understand+0.030
 gra+0.027
Neg logit contributions
ways-0.023
pill-0.023
ience-0.022
 seas-0.021
endant-0.020
Token dog
Token ID3290
Feature activation+1.88e-03

Pos logit contributions
 understand+0.059
 agreement+0.058
 nod+0.058
 gra+0.056
rated+0.050
Neg logit contributions
 seas-0.046
ar-0.046
-0.046
opus-0.043
st-0.043
Token with
Token ID351
Feature activation-1.24e-02

Pos logit contributions
 nod+0.003
 name+0.003
 agreement+0.002
 understand+0.002
 head+0.002
Neg logit contributions
ths-0.004
ways-0.004
pill-0.004
endant-0.003
ience-0.003
Token a
Token ID257
Feature activation+2.89e-04

Pos logit contributions
pill+0.020
ways+0.020
ths+0.019
ience+0.019
+0.018
Neg logit contributions
 nod-0.022
 agreement-0.020
 name-0.019
 understand-0.019
 gra-0.016
Token w
Token ID266
Feature activation+1.87e-02

Pos logit contributions
 agreement+0.001
 nod+0.001
 understand+0.001
 gra+0.000
 whine+0.000
Neg logit contributions
-0.000
ways-0.000
endant-0.000
ience-0.000
pill-0.000
Tokenag
Token ID363
Feature activation+7.41e-02

Pos logit contributions
 nod+0.029
 name+0.026
 agreement+0.026
 understand+0.025
 head+0.021
Neg logit contributions
ways-0.035
pill-0.034
ths-0.034
ience-0.033
oh-0.032
Tokengy
Token ID1360
Feature activation+1.22e-02

Pos logit contributions
 nod+0.148
 agreement+0.134
 understand+0.127
 whine+0.121
 gra+0.117
Neg logit contributions
ways-0.097
ar-0.095
-0.092
t-0.091
yl-0.089
Token tail
Token ID7894
Feature activation+2.14e-02

Pos logit contributions
 nod+0.016
 agreement+0.014
 understand+0.014
 name+0.013
 gra+0.011
Neg logit contributions
ways-0.024
pill-0.024
ience-0.024
-0.023
ar-0.023
Token.
Token ID13
Feature activation+1.24e-02

Pos logit contributions
 nod+0.037
 agreement+0.033
 name+0.033
 understand+0.031
 gra+0.027
Neg logit contributions
ways-0.036
pill-0.036
ths-0.035
ience-0.034
-0.033
Token Tim
Token ID5045
Feature activation+4.82e-03

Pos logit contributions
 nod+0.025
 agreement+0.024
 understand+0.023
 name+0.022
 head+0.020
Neg logit contributions
ways-0.016
-0.016
pill-0.016
ths-0.014
ar-0.013
Tokenmy
Token ID1820
Feature activation-1.34e-02

Pos logit contributions
 nod+0.007
 agreement+0.007
 name+0.006
 understand+0.006
 gra+0.005
Neg logit contributions
pill-0.009
ways-0.009
ience-0.009
oh-0.009
ths-0.009
Token black
Token ID2042
Feature activation+3.53e-02

Pos logit contributions
 nod+0.034
 agreement+0.033
 understand+0.031
 name+0.031
 gra+0.027
Neg logit contributions
ways-0.020
 seas-0.020
pill-0.020
endant-0.019
ience-0.019
Token dog
Token ID3290
Feature activation+4.88e-03

Pos logit contributions
 nod+0.055
 agreement+0.053
 understand+0.052
 name+0.046
 gra+0.043
Neg logit contributions
-0.060
ways-0.059
pill-0.059
ar-0.058
 seas-0.057
Token with
Token ID351
Feature activation-9.71e-03

Pos logit contributions
 nod+0.009
 name+0.008
 agreement+0.008
 understand+0.007
 gra+0.007
Neg logit contributions
ways-0.008
pill-0.008
ths-0.008
ience-0.008
-0.007
Token a
Token ID257
Feature activation-3.16e-04

Pos logit contributions
pill+0.016
ways+0.016
ths+0.015
ience+0.015
+0.014
Neg logit contributions
 nod-0.017
 agreement-0.016
 name-0.015
 understand-0.015
 gra-0.013
Token w
Token ID266
Feature activation+2.12e-02

Pos logit contributions
ways+0.000
+0.000
endant+0.000
ience+0.000
pill+0.000
Neg logit contributions
 nod-0.001
 agreement-0.001
 understand-0.001
 name-0.000
 gra-0.000
Tokenag
Token ID363
Feature activation+7.33e-02

Pos logit contributions
 nod+0.035
 name+0.031
 agreement+0.030
 understand+0.029
 head+0.025
Neg logit contributions
ways-0.038
pill-0.038
ths-0.037
ience-0.036
 lifetime-0.035
Tokengy
Token ID1360
Feature activation+1.11e-02

Pos logit contributions
 nod+0.145
 agreement+0.133
 understand+0.126
 whine+0.118
 name+0.118
Neg logit contributions
ways-0.096
ar-0.094
-0.092
oh-0.092
t-0.090
Token tail
Token ID7894
Feature activation+2.24e-02

Pos logit contributions
 nod+0.014
 agreement+0.013
 understand+0.012
 name+0.012
 gra+0.010
Neg logit contributions
ways-0.023
pill-0.022
ience-0.022
-0.021
endant-0.021
Token.
Token ID13
Feature activation+1.83e-02

Pos logit contributions
 nod+0.040
 agreement+0.036
 name+0.035
 understand+0.035
 gra+0.030
Neg logit contributions
ways-0.036
pill-0.035
ths-0.034
ience-0.034
-0.033
Token The
Token ID383
Feature activation+8.42e-03

Pos logit contributions
 nod+0.034
 agreement+0.032
 understand+0.031
 name+0.030
 whine+0.027
Neg logit contributions
ways-0.027
pill-0.027
-0.026
oh-0.024
ar-0.024
Token dog
Token ID3290
Feature activation-7.51e-03

Pos logit contributions
 agreement+0.014
 nod+0.014
 gra+0.013
 understand+0.013
 Amen+0.012
Neg logit contributions
-0.010
t-0.010
ww-0.010
 fallen-0.010
 arrest-0.010
Token his
Token ID465
Feature activation+1.06e-02

Pos logit contributions
 nod+0.011
 agreement+0.010
 name+0.010
 understand+0.009
 gra+0.009
Neg logit contributions
ar-0.008
ways-0.008
ths-0.008
ha-0.008
 seas-0.008
Token friend
Token ID1545
Feature activation+2.99e-03

Pos logit contributions
 nod+0.019
 agreement+0.017
 name+0.016
 understand+0.016
 gra+0.014
Neg logit contributions
ways-0.017
pill-0.017
ths-0.017
ience-0.016
ar-0.016
Token,
Token ID11
Feature activation+1.34e-02

Pos logit contributions
 nod+0.007
 agreement+0.007
 name+0.007
 understand+0.007
 gra+0.006
Neg logit contributions
ways-0.002
pill-0.002
ience-0.002
ar-0.002
ths-0.002
Token a
Token ID257
Feature activation+1.51e-02

Pos logit contributions
 nod+0.024
 agreement+0.023
 name+0.022
 understand+0.021
 gra+0.020
Neg logit contributions
-0.018
oh-0.018
ways-0.018
ths-0.017
ar-0.017
Token se
Token ID384
Feature activation+4.45e-02

Pos logit contributions
 nod+0.025
 agreement+0.023
 understand+0.022
 name+0.021
 gra+0.019
Neg logit contributions
pill-0.024
ways-0.024
lyss-0.023
ience-0.023
ths-0.023
Tokenag
Token ID363
Feature activation+7.28e-02

Pos logit contributions
 nod+0.071
 name+0.064
 agreement+0.062
 understand+0.058
 gra+0.054
Neg logit contributions
ar-0.077
ways-0.076
ience-0.075
pill-0.074
oh-0.073
Tokenull
Token ID724
Feature activation+2.82e-02

Pos logit contributions
 nod+0.136
 agreement+0.118
 understand+0.116
 name+0.112
 whine+0.108
Neg logit contributions
oh-0.102
ar-0.102
ways-0.102
st-0.100
yl-0.098
Token named
Token ID3706
Feature activation-9.27e-03

Pos logit contributions
 nod+0.058
 agreement+0.053
 name+0.053
 understand+0.051
 gra+0.046
Neg logit contributions
ways-0.038
pill-0.037
ths-0.036
ience-0.035
-0.034
Token Tim
Token ID5045
Feature activation+1.24e-02

Pos logit contributions
ways+0.011
pill+0.011
ths+0.011
ience+0.010
+0.010
Neg logit contributions
 nod-0.020
 agreement-0.018
 name-0.018
 understand-0.018
 whine-0.016
Tokenmy
Token ID1820
Feature activation-1.11e-02

Pos logit contributions
 nod+0.019
 agreement+0.018
 name+0.017
 understand+0.017
 whine+0.014
Neg logit contributions
oh-0.021
pill-0.021
ways-0.021
st-0.021
 least-0.021
Token,
Token ID11
Feature activation+1.32e-02

Pos logit contributions
ways+0.012
pill+0.012
ience+0.011
ths+0.011
+0.011
Neg logit contributions
 nod-0.025
 agreement-0.024
 name-0.023
 understand-0.023
 gra-0.020
Token.
Token ID13
Feature activation-8.87e-03

Pos logit contributions
 nod+0.046
 agreement+0.042
 name+0.042
 understand+0.040
 gra+0.037
Neg logit contributions
ways-0.034
ths-0.031
pill-0.030
qu-0.030
ar-0.029
Token It
Token ID632
Feature activation+3.70e-03

Pos logit contributions
ways+0.018
pill+0.018
ths+0.018
ience+0.018
endant+0.017
Neg logit contributions
 nod-0.012
 name-0.011
 agreement-0.010
 understand-0.010
 gra-0.008
Token was
Token ID373
Feature activation-8.24e-03

Pos logit contributions
 nod+0.007
 agreement+0.006
 name+0.006
 understand+0.006
 gra+0.005
Neg logit contributions
ways-0.006
pill-0.006
ths-0.006
ience-0.006
ar-0.005
Token a
Token ID257
Feature activation+9.13e-03

Pos logit contributions
ways+0.013
pill+0.012
ths+0.012
+0.011
 transformed+0.011
Neg logit contributions
 nod-0.015
 agreement-0.014
 name-0.013
 understand-0.013
 gra-0.012
Token se
Token ID384
Feature activation+2.91e-02

Pos logit contributions
 nod+0.015
 agreement+0.014
 name+0.014
 understand+0.013
 gra+0.011
Neg logit contributions
ways-0.016
pill-0.016
ths-0.015
ience-0.015
-0.014
Tokenag
Token ID363
Feature activation+7.17e-02

Pos logit contributions
 nod+0.043
 name+0.038
 agreement+0.038
 understand+0.036
 gra+0.031
Neg logit contributions
ways-0.055
pill-0.054
ience-0.053
ths-0.052
ar-0.052
Tokenull
Token ID724
Feature activation+1.75e-02

Pos logit contributions
 nod+0.131
 agreement+0.115
 understand+0.110
 name+0.109
 whine+0.104
Neg logit contributions
ways-0.109
oh-0.106
ar-0.104
pill-0.101
ths-0.101
Token!
Token ID0
Feature activation-1.62e-03

Pos logit contributions
 nod+0.033
 name+0.030
 agreement+0.029
 understand+0.029
 head+0.026
Neg logit contributions
ways-0.027
pill-0.026
ths-0.025
ience-0.024
-0.024
Token It
Token ID632
Feature activation+4.59e-03

Pos logit contributions
ways+0.003
ths+0.003
ience+0.003
pill+0.003
endant+0.003
Neg logit contributions
 nod-0.002
 name-0.002
 agreement-0.002
 understand-0.002
 gra-0.002
Token flew
Token ID13112
Feature activation+1.14e-02

Pos logit contributions
 nod+0.008
 agreement+0.008
 name+0.007
 understand+0.007
 gra+0.006
Neg logit contributions
ways-0.007
pill-0.007
ths-0.007
ience-0.007
-0.007
Token over
Token ID625
Feature activation-1.02e-02

Pos logit contributions
 nod+0.017
 agreement+0.015
 name+0.014
 understand+0.014
 gra+0.012
Neg logit contributions
ways-0.022
pill-0.022
ths-0.021
ience-0.021
ar-0.021
Token then
Token ID788
Feature activation-6.31e-03

Pos logit contributions
 nod+0.041
 agreement+0.038
 name+0.037
 understand+0.035
 head+0.031
Neg logit contributions
pill-0.043
ways-0.042
ience-0.040
qu-0.039
ths-0.039
Token,
Token ID11
Feature activation+5.97e-03

Pos logit contributions
pill+0.011
ways+0.010
endant+0.010
 lifetime+0.010
+0.010
Neg logit contributions
 nod-0.012
 name-0.010
 understand-0.010
 agreement-0.010
 whine-0.009
Token a
Token ID257
Feature activation+1.27e-02

Pos logit contributions
 nod+0.010
 name+0.009
 agreement+0.009
 understand+0.009
 whine+0.007
Neg logit contributions
pill-0.011
ways-0.011
ths-0.010
ience-0.010
endant-0.010
Token big
Token ID1263
Feature activation+2.95e-02

Pos logit contributions
 nod+0.021
 agreement+0.019
 name+0.019
 understand+0.018
 gra+0.015
Neg logit contributions
ways-0.022
pill-0.022
ths-0.021
ience-0.021
-0.020
Token se
Token ID384
Feature activation+4.32e-02

Pos logit contributions
 nod+0.047
 agreement+0.043
 understand+0.043
 name+0.043
 gra+0.039
Neg logit contributions
ways-0.047
ths-0.047
ience-0.046
ar-0.046
pill-0.046
Tokenag
Token ID363
Feature activation+7.15e-02

Pos logit contributions
 nod+0.065
 name+0.057
 agreement+0.057
 understand+0.053
 gra+0.047
Neg logit contributions
ways-0.080
pill-0.078
ar-0.078
ience-0.077
oh-0.075
Tokenull
Token ID724
Feature activation+1.73e-02

Pos logit contributions
 nod+0.126
 agreement+0.111
 understand+0.107
 name+0.106
 whine+0.097
Neg logit contributions
ways-0.112
oh-0.110
ar-0.110
ths-0.108
pill-0.107
Token saw
Token ID2497
Feature activation+7.82e-03

Pos logit contributions
 nod+0.029
 understand+0.025
 name+0.025
 agreement+0.024
 say+0.023
Neg logit contributions
ways-0.030
igsaw-0.029
pill-0.029
ths-0.028
-0.027
Token the
Token ID262
Feature activation-1.28e-02

Pos logit contributions
 nod+0.009
 agreement+0.008
 name+0.008
 understand+0.007
 say+0.006
Neg logit contributions
ways-0.017
 lifetime-0.017
ths-0.017
ience-0.017
pill-0.016
Token oy
Token ID35104
Feature activation-1.36e-02

Pos logit contributions
ways+0.019
pill+0.019
ths+0.018
ience+0.017
+0.017
Neg logit contributions
 nod-0.026
 name-0.024
 agreement-0.024
 understand-0.023
 head-0.022
Tokenster
Token ID1706
Feature activation+1.30e-02

Pos logit contributions
ways+0.023
pill+0.022
ience+0.021
ths+0.021
+0.020
Neg logit contributions
 nod-0.025
 name-0.023
 understand-0.020
 head-0.020
 agreement-0.020
Token She
Token ID1375
Feature activation-1.72e-02

Pos logit contributions
 nod+0.012
 agreement+0.011
 name+0.011
 understand+0.011
 gra+0.009
Neg logit contributions
ways-0.016
pill-0.016
ience-0.015
ths-0.015
-0.015
Token gives
Token ID3607
Feature activation+9.99e-03

Pos logit contributions
ways+0.033
pill+0.033
ths+0.032
ience+0.032
+0.031
Neg logit contributions
 nod-0.025
 agreement-0.022
 name-0.022
 understand-0.021
 gra-0.018
Token Sara
Token ID24799
Feature activation+1.12e-02

Pos logit contributions
 nod+0.016
 agreement+0.014
 name+0.014
 understand+0.013
 whine+0.011
Neg logit contributions
ways-0.018
pill-0.018
ths-0.018
ience-0.017
oh-0.017
Token a
Token ID257
Feature activation+1.98e-02

Pos logit contributions
 nod+0.020
 agreement+0.019
 name+0.018
 understand+0.018
 whine+0.016
Neg logit contributions
pill-0.017
ways-0.017
ience-0.016
ths-0.016
oh-0.016
Token l
Token ID300
Feature activation+3.01e-02

Pos logit contributions
 agreement+0.029
 nod+0.027
 understand+0.023
 name+0.022
 whine+0.022
Neg logit contributions
endant-0.034
pill-0.033
ha-0.032
-0.032
st-0.032
Tokenoll
Token ID692
Feature activation+7.10e-02

Pos logit contributions
 nod+0.066
 agreement+0.062
 name+0.060
 understand+0.059
 gra+0.054
Neg logit contributions
pill-0.036
ways-0.036
ience-0.034
ar-0.033
endant-0.033
Tokenipop
Token ID42800
Feature activation+1.23e-02

Pos logit contributions
 nod+0.149
 agreement+0.139
 name+0.135
 understand+0.133
 gra+0.121
Neg logit contributions
ways-0.090
pill-0.088
ience-0.087
ths-0.084
ar-0.084
Token and
Token ID290
Feature activation+6.55e-03

Pos logit contributions
 nod+0.018
 agreement+0.016
 name+0.016
 understand+0.015
 gra+0.013
Neg logit contributions
ways-0.024
pill-0.023
ths-0.023
ience-0.022
-0.022
Token a
Token ID257
Feature activation+2.15e-02

Pos logit contributions
 nod+0.011
 agreement+0.010
 name+0.010
 understand+0.010
 gra+0.008
Neg logit contributions
ways-0.011
pill-0.011
ths-0.010
ience-0.010
-0.010
Token hug
Token ID16225
Feature activation+1.32e-02

Pos logit contributions
 agreement+0.031
 nod+0.030
 understand+0.027
 name+0.025
 whine+0.024
Neg logit contributions
pill-0.037
endant-0.037
-0.036
ience-0.035
 seas-0.034
Token.
Token ID13
Feature activation+1.25e-02

Pos logit contributions
 nod+0.030
 agreement+0.029
 understand+0.028
 name+0.028
 gra+0.027
Neg logit contributions
ways-0.013
oh-0.012
ar-0.012
pill-0.012
st-0.011
TokenM
Token ID44
Feature activation+1.91e-02

Pos logit contributions
pill+0.024
ways+0.024
+0.023
ths+0.023
ience+0.022
Neg logit contributions
 nod-0.027
 name-0.024
 agreement-0.024
 understand-0.024
 head-0.020
Tokenum
Token ID388
Feature activation+1.39e-03

Pos logit contributions
 nod+0.040
 agreement+0.036
 name+0.036
 understand+0.035
 head+0.031
Neg logit contributions
ways-0.026
pill-0.025
ths-0.024
ience-0.023
-0.023
Token and
Token ID290
Feature activation+1.56e-02

Pos logit contributions
 nod+0.002
 agreement+0.002
 name+0.002
 understand+0.002
 gra+0.002
Neg logit contributions
ways-0.002
pill-0.002
ths-0.002
ience-0.002
-0.002
Token Bobby
Token ID17155
Feature activation+8.96e-03

Pos logit contributions
 nod+0.024
 agreement+0.022
 name+0.021
 understand+0.021
 gra+0.018
Neg logit contributions
ways-0.028
pill-0.028
ths-0.027
ience-0.027
oh-0.026
Token hugged
Token ID41130
Feature activation+1.54e-02

Pos logit contributions
 nod+0.016
 agreement+0.015
 name+0.014
 understand+0.014
 gra+0.012
Neg logit contributions
ways-0.014
pill-0.014
ths-0.014
ience-0.014
-0.013
Token each
Token ID1123
Feature activation+7.10e-02

Pos logit contributions
 nod+0.032
 name+0.029
 understand+0.029
 agreement+0.028
 gra+0.026
Neg logit contributions
ths-0.018
ways-0.018
-0.018
pill-0.017
ar-0.017
Token other
Token ID584
Feature activation+2.32e-02

Pos logit contributions
 nod+0.139
 name+0.123
 gra+0.119
 whine+0.118
 understand+0.117
Neg logit contributions
ha-0.078
ience-0.073
is-0.072
 years-0.072
way-0.069
Token and
Token ID290
Feature activation+7.78e-03

Pos logit contributions
 nod+0.043
 name+0.039
 agreement+0.039
 understand+0.037
 gra+0.033
Neg logit contributions
ways-0.035
pill-0.034
ience-0.034
ths-0.033
-0.031
Token then
Token ID788
Feature activation-5.02e-03

Pos logit contributions
 nod+0.015
 agreement+0.014
 name+0.014
 understand+0.013
 gra+0.012
Neg logit contributions
ways-0.010
ths-0.010
pill-0.010
ience-0.010
 lifetime-0.009
Token Mum
Token ID38367
Feature activation-1.72e-02

Pos logit contributions
ways+0.007
ths+0.007
ience+0.007
pill+0.007
ar+0.006
Neg logit contributions
 nod-0.010
 agreement-0.009
 name-0.009
 understand-0.008
 gra-0.008
Token said
Token ID531
Feature activation-2.19e-02

Pos logit contributions
ways+0.024
ience+0.024
pill+0.023
ths+0.023
oh+0.022
Neg logit contributions
 nod-0.033
 agreement-0.031
 name-0.030
 understand-0.029
 gra-0.026
Token had
Token ID550
Feature activation-1.67e-02

Pos logit contributions
 nod+0.014
 name+0.012
 understand+0.011
 agreement+0.011
 head+0.009
Neg logit contributions
ways-0.024
pill-0.024
ths-0.023
-0.023
ience-0.023
Token a
Token ID257
Feature activation+1.08e-02

Pos logit contributions
ways+0.023
pill+0.023
ths+0.021
ience+0.021
oh+0.021
Neg logit contributions
 nod-0.033
 agreement-0.031
 name-0.029
 understand-0.028
 gra-0.026
Token long
Token ID890
Feature activation+3.66e-02

Pos logit contributions
 nod+0.020
 agreement+0.019
 understand+0.018
 name+0.017
 gra+0.016
Neg logit contributions
ways-0.015
pill-0.015
-0.015
ience-0.015
ths-0.014
Token,
Token ID11
Feature activation+1.35e-02

Pos logit contributions
 nod+0.053
 agreement+0.051
 understand+0.048
 gra+0.045
 name+0.042
Neg logit contributions
ar-0.061
ways-0.060
-0.060
ience-0.056
endant-0.056
Token w
Token ID266
Feature activation+1.90e-02

Pos logit contributions
 nod+0.026
 agreement+0.026
 understand+0.024
 name+0.023
 gra+0.021
Neg logit contributions
pill-0.017
 seas-0.017
ways-0.017
ience-0.016
endant-0.016
Tokenag
Token ID363
Feature activation+7.10e-02

Pos logit contributions
 nod+0.028
 name+0.026
 agreement+0.024
 understand+0.024
 head+0.020
Neg logit contributions
ways-0.037
ths-0.036
pill-0.036
ience-0.035
 lifetime-0.034
Tokengy
Token ID1360
Feature activation+6.24e-03

Pos logit contributions
 nod+0.143
 agreement+0.123
 understand+0.117
 whine+0.115
 name+0.112
Neg logit contributions
ways-0.096
ar-0.092
yl-0.090
st-0.089
t-0.089
Token tail
Token ID7894
Feature activation+1.72e-02

Pos logit contributions
 nod+0.009
 agreement+0.008
 name+0.008
 understand+0.008
 gra+0.007
Neg logit contributions
ways-0.012
pill-0.012
ience-0.011
ths-0.011
-0.011
Token and
Token ID290
Feature activation-3.66e-03

Pos logit contributions
 nod+0.026
 name+0.024
 agreement+0.023
 understand+0.022
 head+0.020
Neg logit contributions
pill-0.032
ways-0.032
ths-0.032
ience-0.032
 lifetime-0.030
Token loved
Token ID6151
Feature activation-2.39e-03

Pos logit contributions
ways+0.006
pill+0.006
ience+0.006
ths+0.005
+0.005
Neg logit contributions
 nod-0.006
 agreement-0.006
 name-0.006
 understand-0.005
 gra-0.005
Token to
Token ID284
Feature activation-4.13e-03

Pos logit contributions
ways+0.003
pill+0.003
ience+0.003
ths+0.003
+0.003
Neg logit contributions
 nod-0.005
 agreement-0.004
 name-0.004
 understand-0.004
 gra-0.004
Token,
Token ID11
Feature activation+4.98e-04

Pos logit contributions
pill+0.030
ience+0.028
endant+0.026
ways+0.026
ths+0.026
Neg logit contributions
 nod-0.039
 name-0.034
 understand-0.033
 agreement-0.033
 head-0.031
Token they
Token ID484
Feature activation+1.71e-03

Pos logit contributions
 nod+0.001
 agreement+0.001
 understand+0.001
 name+0.001
 record+0.001
Neg logit contributions
pill-0.001
endant-0.001
ocus-0.001
ience-0.001
oh-0.001
Token saw
Token ID2497
Feature activation-4.62e-03

Pos logit contributions
 nod+0.003
 agreement+0.003
 name+0.003
 understand+0.003
 gra+0.003
Neg logit contributions
ways-0.003
pill-0.002
ths-0.002
ience-0.002
-0.002
Token a
Token ID257
Feature activation+3.72e-03

Pos logit contributions
ways+0.006
pill+0.006
ths+0.006
ience+0.006
endant+0.006
Neg logit contributions
 nod-0.009
 agreement-0.009
 name-0.008
 understand-0.008
 gra-0.007
Token se
Token ID384
Feature activation+3.74e-02

Pos logit contributions
 nod+0.006
 agreement+0.006
 name+0.005
 understand+0.005
 gra+0.005
Neg logit contributions
ways-0.006
pill-0.006
ths-0.006
ience-0.006
-0.006
Tokenag
Token ID363
Feature activation+7.04e-02

Pos logit contributions
 nod+0.053
 name+0.047
 agreement+0.047
 understand+0.044
 gra+0.038
Neg logit contributions
ways-0.073
pill-0.072
ience-0.070
ths-0.069
ar-0.069
Tokenull
Token ID724
Feature activation+1.99e-02

Pos logit contributions
 nod+0.126
 agreement+0.110
 understand+0.106
 name+0.106
 whine+0.098
Neg logit contributions
ways-0.111
ar-0.107
pill-0.106
oh-0.106
ths-0.104
Token fly
Token ID6129
Feature activation+1.60e-02

Pos logit contributions
 nod+0.037
 agreement+0.033
 name+0.033
 understand+0.033
 gra+0.029
Neg logit contributions
ways-0.031
pill-0.030
ths-0.029
ience-0.029
-0.028
Token down
Token ID866
Feature activation-1.60e-02

Pos logit contributions
 nod+0.026
 agreement+0.023
 name+0.023
 understand+0.022
 gra+0.019
Neg logit contributions
ways-0.029
pill-0.028
ths-0.028
ience-0.027
ar-0.027
Token and
Token ID290
Feature activation-8.30e-03

Pos logit contributions
pill+0.031
ience+0.030
ways+0.030
ths+0.030
endant+0.029
Neg logit contributions
 nod-0.024
 agreement-0.021
 name-0.021
 understand-0.020
 gra-0.016
Token pick
Token ID2298
Feature activation-2.03e-02

Pos logit contributions
ways+0.013
pill+0.013
ths+0.012
ience+0.012
+0.012
Neg logit contributions
 nod-0.015
 agreement-0.014
 name-0.014
 understand-0.013
 gra-0.012
Token flew
Token ID13112
Feature activation-8.31e-04

Pos logit contributions
ways+0.031
pill+0.030
ths+0.029
ience+0.029
+0.028
Neg logit contributions
 nod-0.038
 agreement-0.035
 name-0.035
 understand-0.034
 gra-0.030
Token off
Token ID572
Feature activation-2.16e-02

Pos logit contributions
ways+0.001
pill+0.001
ths+0.001
ience+0.001
ar+0.001
Neg logit contributions
 nod-0.001
 agreement-0.001
 name-0.001
 understand-0.001
 gra-0.001
Token with
Token ID351
Feature activation+1.42e-02

Pos logit contributions
pill+0.040
ths+0.038
ience+0.038
endant+0.037
ways+0.037
Neg logit contributions
 nod-0.035
 agreement-0.031
 name-0.031
 understand-0.030
 gra-0.026
Token a
Token ID257
Feature activation-2.31e-03

Pos logit contributions
 nod+0.022
 agreement+0.020
 name+0.020
 understand+0.019
 gra+0.016
Neg logit contributions
ways-0.026
pill-0.026
ths-0.025
ience-0.025
-0.024
Token w
Token ID266
Feature activation+2.05e-02

Pos logit contributions
ways+0.004
pill+0.004
 lifetime+0.004
ience+0.004
+0.004
Neg logit contributions
 nod-0.003
 agreement-0.003
 name-0.003
 understand-0.003
 whine-0.003
Tokenag
Token ID363
Feature activation+7.02e-02

Pos logit contributions
 nod+0.040
 name+0.037
 agreement+0.036
 understand+0.036
 head+0.031
Neg logit contributions
pill-0.029
ways-0.029
ths-0.029
oh-0.027
ience-0.027
Token of
Token ID286
Feature activation+2.90e-02

Pos logit contributions
 nod+0.134
 agreement+0.121
 name+0.114
 understand+0.113
 whine+0.111
Neg logit contributions
ways-0.098
ar-0.094
-0.090
t-0.089
way-0.088
Token his
Token ID465
Feature activation+1.34e-02

Pos logit contributions
 nod+0.050
 name+0.044
 understand+0.043
 agreement+0.042
 whine+0.038
Neg logit contributions
pill-0.044
ways-0.043
 seas-0.042
ience-0.042
ths-0.042
Token tail
Token ID7894
Feature activation+6.71e-03

Pos logit contributions
 nod+0.020
 agreement+0.017
 name+0.017
 understand+0.017
 gra+0.014
Neg logit contributions
ways-0.025
pill-0.025
ience-0.024
ths-0.024
-0.024
Token!
Token ID0
Feature activation+1.27e-02

Pos logit contributions
 nod+0.012
 agreement+0.011
 name+0.011
 understand+0.011
 head+0.009
Neg logit contributions
ways-0.010
pill-0.010
ths-0.010
ience-0.010
-0.010
Token<|endoftext|>
Token ID50256
Feature activation-1.97e-02

Pos logit contributions
 nod+0.022
 agreement+0.021
 name+0.020
 understand+0.020
 whine+0.017
Neg logit contributions
pill-0.020
ways-0.020
-0.020
ths-0.019
ience-0.019
Token.
Token ID13
Feature activation+1.87e-02

Pos logit contributions
pill+0.003
ways+0.003
ience+0.003
ths+0.003
+0.003
Neg logit contributions
 nod-0.003
 name-0.002
 agreement-0.002
 understand-0.002
 head-0.002
Token She
Token ID1375
Feature activation-4.65e-03

Pos logit contributions
 nod+0.033
 agreement+0.032
 understand+0.030
 name+0.028
 gra+0.026
Neg logit contributions
ways-0.029
pill-0.029
-0.028
oh-0.027
ths-0.027
Token grabbed
Token ID13176
Feature activation+1.44e-02

Pos logit contributions
ways+0.008
pill+0.008
+0.008
ths+0.007
ience+0.007
Neg logit contributions
 nod-0.008
 name-0.007
 agreement-0.007
 understand-0.007
 head-0.006
Token a
Token ID257
Feature activation+1.91e-02

Pos logit contributions
 nod+0.024
 agreement+0.022
 name+0.021
 understand+0.021
 gra+0.018
Neg logit contributions
ways-0.025
pill-0.025
ths-0.024
ience-0.024
-0.023
Token l
Token ID300
Feature activation+2.77e-02

Pos logit contributions
 nod+0.032
 agreement+0.030
 understand+0.028
 name+0.028
 whine+0.025
Neg logit contributions
pill-0.030
ways-0.030
endant-0.029
oh-0.029
ha-0.029
Tokenoll
Token ID692
Feature activation+7.01e-02

Pos logit contributions
 nod+0.048
 name+0.044
 agreement+0.043
 understand+0.042
 head+0.036
Neg logit contributions
ways-0.047
ths-0.046
pill-0.045
ience-0.044
 lifetime-0.043
Tokenipop
Token ID42800
Feature activation+1.69e-02

Pos logit contributions
 nod+0.127
 name+0.116
 agreement+0.112
 understand+0.109
 head+0.099
Neg logit contributions
pill-0.114
ways-0.114
ths-0.108
 lifetime-0.102
qu-0.102
Token from
Token ID422
Feature activation+3.05e-02

Pos logit contributions
 nod+0.023
 name+0.020
 agreement+0.019
 understand+0.019
 head+0.016
Neg logit contributions
pill-0.035
ways-0.035
ience-0.034
ths-0.034
 lifetime-0.032
Token her
Token ID607
Feature activation+4.91e-03

Pos logit contributions
 nod+0.040
 agreement+0.034
 name+0.034
 understand+0.033
 gra+0.027
Neg logit contributions
ways-0.064
pill-0.063
ths-0.062
ience-0.061
-0.060
Token bag
Token ID6131
Feature activation-2.72e-03

Pos logit contributions
 nod+0.005
 name+0.004
 agreement+0.004
 understand+0.003
 head+0.003
Neg logit contributions
ways-0.012
pill-0.012
ience-0.012
ths-0.012
qu-0.012
Token and
Token ID290
Feature activation+1.21e-02

Pos logit contributions
pill+0.005
ths+0.005
ways+0.005
ience+0.005
qu+0.005
Neg logit contributions
 nod-0.004
 name-0.004
 agreement-0.004
 understand-0.003
 head-0.003
Tokenat
Token ID265
Feature activation-4.62e-03

Pos logit contributions
ways+0.005
pill+0.005
ths+0.005
ience+0.005
+0.005
Neg logit contributions
 nod-0.006
 name-0.006
 agreement-0.006
 understand-0.006
 head-0.005
Token?"
Token ID1701
Feature activation-9.45e-03

Pos logit contributions
ways+0.006
pill+0.006
ths+0.005
ience+0.005
+0.005
Neg logit contributions
 nod-0.010
 agreement-0.009
 name-0.009
 understand-0.009
 gra-0.008
Token The
Token ID383
Feature activation+1.32e-02

Pos logit contributions
ways+0.014
pill+0.013
ths+0.013
ience+0.013
ar+0.013
Neg logit contributions
 nod-0.018
 agreement-0.017
 name-0.016
 understand-0.016
 gra-0.014
Token bear
Token ID6842
Feature activation+1.87e-02

Pos logit contributions
 nod+0.023
 understand+0.020
 agreement+0.019
 whine+0.018
 name+0.017
Neg logit contributions
ience-0.020
pill-0.020
ths-0.020
-0.019
oh-0.019
Token thought
Token ID1807
Feature activation+2.76e-02

Pos logit contributions
 nod+0.034
 name+0.030
 understand+0.029
 agreement+0.029
 head+0.026
Neg logit contributions
pill-0.031
ways-0.031
-0.029
ths-0.029
ience-0.029
Token as
Token ID355
Feature activation+6.94e-02

Pos logit contributions
 nod+0.040
 agreement+0.035
 name+0.034
 understand+0.033
 gra+0.028
Neg logit contributions
ways-0.054
pill-0.054
ths-0.052
ience-0.052
-0.050
Token he
Token ID339
Feature activation+7.96e-03

Pos logit contributions
 nod+0.168
 agreement+0.162
 name+0.156
 understand+0.149
 whine+0.143
Neg logit contributions
ience-0.059
ways-0.059
pill-0.055
ths-0.053
ha-0.052
Token peek
Token ID27185
Feature activation-6.15e-04

Pos logit contributions
 nod+0.014
 name+0.012
 understand+0.012
 agreement+0.012
 head+0.010
Neg logit contributions
ways-0.013
pill-0.013
-0.012
ths-0.012
endant-0.012
Tokened
Token ID276
Feature activation-5.72e-03

Pos logit contributions
ways+0.001
pill+0.001
ience+0.001
ths+0.001
ar+0.001
Neg logit contributions
 nod-0.001
 agreement-0.001
 name-0.001
 understand-0.001
 whine-0.001
Token out
Token ID503
Feature activation+1.32e-02

Pos logit contributions
ways+0.010
ar+0.009
ths+0.009
ience+0.009
oh+0.009
Neg logit contributions
 nod-0.009
 agreement-0.008
 understand-0.008
 name-0.008
 gra-0.007
Token from
Token ID422
Feature activation+1.57e-02

Pos logit contributions
 nod+0.023
 name+0.021
 agreement+0.021
 understand+0.020
 gra+0.018
Neg logit contributions
ways-0.021
ths-0.020
pill-0.019
ience-0.019
ar-0.019
Token\n
Token ID198
Feature activation-2.09e-02

Pos logit contributions
 nod+0.017
 agreement+0.016
 name+0.015
 head+0.014
 whine+0.014
Neg logit contributions
ways-0.011
pill-0.011
oh-0.011
-0.011
ar-0.010
TokenSuddenly
Token ID38582
Feature activation-1.40e-02

Pos logit contributions
ways+0.033
oh+0.031
ths+0.031
pill+0.031
ha+0.031
Neg logit contributions
 nod-0.038
 agreement-0.035
 name-0.034
 head-0.031
 understand-0.031
Token,
Token ID11
Feature activation-6.00e-04

Pos logit contributions
pill+0.026
endant+0.022
ths+0.022
 lifetime+0.022
qu+0.022
Neg logit contributions
 nod-0.027
 name-0.024
 understand-0.022
 agreement-0.021
 head-0.019
Token a
Token ID257
Feature activation+1.19e-02

Pos logit contributions
ways+0.001
pill+0.001
ths+0.001
ience+0.001
+0.001
Neg logit contributions
 nod-0.001
 agreement-0.001
 name-0.001
 understand-0.001
 gra-0.001
Token se
Token ID384
Feature activation+4.21e-02

Pos logit contributions
 nod+0.019
 agreement+0.017
 name+0.017
 understand+0.016
 gra+0.014
Neg logit contributions
ways-0.021
pill-0.021
ths-0.020
ience-0.020
-0.020
Tokenag
Token ID363
Feature activation+6.89e-02

Pos logit contributions
 nod+0.075
 name+0.069
 agreement+0.068
 understand+0.064
 gra+0.058
Neg logit contributions
ways-0.067
pill-0.065
ience-0.063
ar-0.063
oh-0.062
Tokenull
Token ID724
Feature activation+1.44e-02

Pos logit contributions
 nod+0.121
 agreement+0.106
 understand+0.102
 name+0.102
 whine+0.092
Neg logit contributions
ways-0.110
oh-0.107
ar-0.106
ths-0.105
pill-0.104
Token flew
Token ID13112
Feature activation+9.24e-03

Pos logit contributions
 nod+0.027
 name+0.024
 agreement+0.024
 understand+0.024
 gra+0.022
Neg logit contributions
ways-0.022
ths-0.021
pill-0.021
-0.020
igsaw-0.020
Token down
Token ID866
Feature activation-1.07e-02

Pos logit contributions
 nod+0.013
 agreement+0.012
 name+0.011
 understand+0.011
 gra+0.009
Neg logit contributions
ways-0.018
pill-0.018
ths-0.018
ience-0.017
oh-0.017
Token,
Token ID11
Feature activation+6.74e-03

Pos logit contributions
ience+0.023
pill+0.023
ths+0.023
ways+0.022
endant+0.022
Neg logit contributions
 nod-0.013
 agreement-0.012
 name-0.012
 understand-0.011
 pat-0.009
Token and
Token ID290
Feature activation-8.26e-03

Pos logit contributions
 nod+0.011
 agreement+0.010
 name+0.010
 understand+0.010
 gra+0.008
Neg logit contributions
ways-0.012
pill-0.012
ths-0.011
ience-0.011
-0.011
Token She
Token ID1375
Feature activation-1.15e-02

Pos logit contributions
 nod+0.006
 name+0.006
 agreement+0.006
 understand+0.005
 gra+0.004
Neg logit contributions
pill-0.008
ways-0.008
ience-0.007
ths-0.007
ar-0.007
Token even
Token ID772
Feature activation+8.16e-03

Pos logit contributions
ways+0.020
pill+0.020
ths+0.019
+0.019
ience+0.019
Neg logit contributions
 nod-0.019
 name-0.017
 agreement-0.017
 understand-0.017
 head-0.014
Token saw
Token ID2497
Feature activation-8.21e-03

Pos logit contributions
 nod+0.015
 agreement+0.014
 name+0.014
 understand+0.014
 gra+0.012
Neg logit contributions
ways-0.012
pill-0.012
ths-0.012
ience-0.012
-0.011
Token a
Token ID257
Feature activation+4.90e-03

Pos logit contributions
ways+0.014
ths+0.014
pill+0.014
ience+0.014
+0.013
Neg logit contributions
 nod-0.014
 name-0.012
 agreement-0.012
 understand-0.012
 gra-0.010
Token se
Token ID384
Feature activation+3.36e-02

Pos logit contributions
 nod+0.008
 agreement+0.008
 understand+0.007
 name+0.007
 gra+0.006
Neg logit contributions
pill-0.008
ways-0.008
ience-0.008
 lifetime-0.008
ths-0.008
Tokenag
Token ID363
Feature activation+6.87e-02

Pos logit contributions
 nod+0.049
 agreement+0.044
 name+0.044
 understand+0.041
 gra+0.035
Neg logit contributions
ways-0.064
pill-0.063
ar-0.061
ience-0.061
oh-0.060
Tokenull
Token ID724
Feature activation+1.68e-02

Pos logit contributions
 nod+0.130
 agreement+0.114
 understand+0.111
 name+0.106
 whine+0.106
Neg logit contributions
ar-0.098
oh-0.097
ways-0.096
st-0.094
way-0.092
Token!
Token ID0
Feature activation+9.37e-03

Pos logit contributions
 nod+0.032
 name+0.029
 agreement+0.028
 understand+0.028
 gra+0.025
Neg logit contributions
ways-0.026
pill-0.025
ths-0.025
ience-0.024
-0.023
Token\n
Token ID198
Feature activation+1.08e-02

Pos logit contributions
 nod+0.014
 name+0.012
 agreement+0.012
 understand+0.012
 gra+0.010
Neg logit contributions
pill-0.018
ways-0.018
ience-0.018
ths-0.018
ar-0.017
Token\n
Token ID198
Feature activation-1.80e-02

Pos logit contributions
 nod+0.018
 agreement+0.017
 name+0.016
 understand+0.015
 head+0.014
Neg logit contributions
ways-0.019
pill-0.018
ths-0.018
ience-0.017
-0.017
TokenWhen
Token ID2215
Feature activation-3.82e-03

Pos logit contributions
ways+0.031
pill+0.031
ths+0.030
ience+0.029
+0.029
Neg logit contributions
 nod-0.030
 agreement-0.027
 name-0.027
 understand-0.026
 gra-0.022
Token gave
Token ID2921
Feature activation+1.19e-02

Pos logit contributions
ways+0.010
pill+0.010
ths+0.009
ience+0.009
+0.009
Neg logit contributions
 nod-0.012
 agreement-0.011
 name-0.011
 understand-0.010
 gra-0.009
Token Tim
Token ID5045
Feature activation+2.41e-02

Pos logit contributions
 nod+0.021
 agreement+0.018
 name+0.018
 understand+0.018
 gra+0.015
Neg logit contributions
ways-0.020
pill-0.020
ths-0.019
ience-0.019
ar-0.018
Tokenmy
Token ID1820
Feature activation+8.32e-03

Pos logit contributions
 nod+0.035
 agreement+0.031
 name+0.030
 understand+0.029
 whine+0.026
Neg logit contributions
pill-0.045
ways-0.045
ience-0.044
oh-0.043
ar-0.043
Token and
Token ID290
Feature activation+2.36e-02

Pos logit contributions
 nod+0.014
 agreement+0.013
 name+0.013
 understand+0.012
 gra+0.011
Neg logit contributions
ways-0.014
pill-0.014
ths-0.013
ience-0.013
-0.013
Token Sally
Token ID25737
Feature activation+1.74e-02

Pos logit contributions
 nod+0.047
 agreement+0.043
 name+0.043
 understand+0.042
 gra+0.037
Neg logit contributions
ways-0.033
pill-0.032
ths-0.031
ience-0.030
-0.029
Token each
Token ID1123
Feature activation+6.78e-02

Pos logit contributions
 nod+0.028
 agreement+0.025
 name+0.025
 understand+0.024
 gra+0.020
Neg logit contributions
ways-0.031
pill-0.031
ths-0.030
ience-0.030
-0.029
Token a
Token ID257
Feature activation+2.66e-02

Pos logit contributions
 nod+0.124
 agreement+0.117
 name+0.107
 whine+0.106
 understand+0.103
Neg logit contributions
ha-0.090
ar-0.088
oh-0.087
ience-0.086
st-0.084
Token medal
Token ID18279
Feature activation+1.88e-02

Pos logit contributions
 nod+0.044
 agreement+0.040
 name+0.039
 understand+0.038
 gra+0.034
Neg logit contributions
ways-0.044
pill-0.044
ience-0.043
ths-0.042
-0.042
Token of
Token ID286
Feature activation+1.64e-02

Pos logit contributions
 nod+0.033
 name+0.029
 agreement+0.028
 understand+0.028
 head+0.025
Neg logit contributions
pill-0.032
ways-0.032
ths-0.031
ience-0.030
-0.029
Token their
Token ID511
Feature activation-7.71e-03

Pos logit contributions
 nod+0.027
 agreement+0.024
 name+0.023
 understand+0.022
 whine+0.020
Neg logit contributions
pill-0.028
ways-0.028
ths-0.027
ience-0.027
-0.026
Token own
Token ID898
Feature activation-1.35e-04

Pos logit contributions
ways+0.013
pill+0.013
ths+0.012
ience+0.012
qu+0.012
Neg logit contributions
 nod-0.013
 agreement-0.012
 name-0.012
 understand-0.011
 head-0.010
TokenJ
Token ID41
Feature activation-1.46e-02

Pos logit contributions
ways+0.030
pill+0.030
ths+0.029
ience+0.029
+0.028
Neg logit contributions
 nod-0.036
 agreement-0.033
 name-0.032
 understand-0.031
 gra-0.027
Tokenenny
Token ID11870
Feature activation-1.03e-02

Pos logit contributions
ways+0.010
ths+0.010
pill+0.010
ience+0.010
+0.009
Neg logit contributions
 nod-0.038
 name-0.036
 agreement-0.036
 understand-0.035
 head-0.032
Token spotted
Token ID13489
Feature activation-9.27e-03

Pos logit contributions
pill+0.019
ways+0.018
endant+0.018
+0.017
ths+0.017
Neg logit contributions
 nod-0.017
 name-0.015
 understand-0.014
 agreement-0.014
 head-0.013
Token a
Token ID257
Feature activation+2.03e-03

Pos logit contributions
ways+0.015
pill+0.014
ths+0.014
ience+0.014
+0.013
Neg logit contributions
 nod-0.017
 agreement-0.015
 name-0.015
 understand-0.015
 gra-0.013
Token se
Token ID384
Feature activation+3.96e-02

Pos logit contributions
 nod+0.004
 agreement+0.003
 name+0.003
 understand+0.003
 gra+0.003
Neg logit contributions
ways-0.003
pill-0.003
ths-0.003
ience-0.003
-0.003
Tokenag
Token ID363
Feature activation+6.76e-02

Pos logit contributions
 nod+0.058
 name+0.052
 agreement+0.051
 understand+0.048
 gra+0.042
Neg logit contributions
ways-0.075
pill-0.073
ar-0.072
ience-0.072
oh-0.070
Tokenull
Token ID724
Feature activation+1.98e-02

Pos logit contributions
 nod+0.121
 agreement+0.107
 understand+0.105
 name+0.102
 whine+0.098
Neg logit contributions
ar-0.101
ways-0.101
oh-0.100
st-0.098
-0.097
Token on
Token ID319
Feature activation+1.81e-02

Pos logit contributions
 nod+0.038
 agreement+0.035
 name+0.035
 understand+0.034
 gra+0.030
Neg logit contributions
ways-0.029
pill-0.028
ths-0.027
ience-0.027
-0.026
Token the
Token ID262
Feature activation-1.49e-02

Pos logit contributions
 nod+0.031
 agreement+0.027
 name+0.027
 understand+0.027
 whine+0.023
Neg logit contributions
ways-0.030
pill-0.030
ths-0.029
-0.028
ience-0.028
Token sand
Token ID6450
Feature activation-2.28e-03

Pos logit contributions
ways+0.022
pill+0.021
ths+0.020
ience+0.020
oh+0.020
Neg logit contributions
 nod-0.028
 agreement-0.025
 name-0.025
 understand-0.025
 whine-0.022
Token.
Token ID13
Feature activation+4.66e-03

Pos logit contributions
ways+0.004
pill+0.004
ths+0.004
ience+0.004
+0.004
Neg logit contributions
 nod-0.004
 agreement-0.003
 name-0.003
 understand-0.003
 gra-0.003
Token friends
Token ID2460
Feature activation+9.79e-03

Pos logit contributions
 name+0.001
 nod+0.001
 agreement+0.001
 understand+0.001
 head+0.001
Neg logit contributions
ways-0.001
pill-0.001
ience-0.001
ocus-0.001
plets-0.001
Token,
Token ID11
Feature activation+7.06e-03

Pos logit contributions
 nod+0.022
 agreement+0.020
 name+0.020
 understand+0.019
 gra+0.018
Neg logit contributions
ways-0.011
pill-0.010
ths-0.010
ience-0.010
oh-0.009
Token but
Token ID475
Feature activation+2.55e-02

Pos logit contributions
 nod+0.013
 agreement+0.012
 name+0.012
 understand+0.011
 gra+0.011
Neg logit contributions
pill-0.010
ways-0.010
ths-0.009
 seas-0.009
endant-0.009
Token a
Token ID257
Feature activation+1.42e-02

Pos logit contributions
 nod+0.048
 agreement+0.043
 name+0.043
 understand+0.042
 gra+0.037
Neg logit contributions
ways-0.039
pill-0.038
ths-0.038
ience-0.037
ar-0.036
Token se
Token ID384
Feature activation+3.78e-02

Pos logit contributions
 nod+0.024
 name+0.022
 agreement+0.022
 understand+0.021
 head+0.018
Neg logit contributions
ways-0.024
pill-0.024
ths-0.023
ience-0.023
-0.022
Tokenag
Token ID363
Feature activation+6.75e-02

Pos logit contributions
 nod+0.056
 agreement+0.050
 name+0.050
 understand+0.046
 gra+0.041
Neg logit contributions
ways-0.071
pill-0.069
ar-0.068
ience-0.068
oh-0.066
Tokenull
Token ID724
Feature activation+1.42e-02

Pos logit contributions
 nod+0.126
 agreement+0.109
 understand+0.105
 name+0.103
 whine+0.100
Neg logit contributions
ways-0.099
oh-0.099
ar-0.097
st-0.095
way-0.094
Token swoop
Token ID38527
Feature activation-6.48e-03

Pos logit contributions
 nod+0.027
 understand+0.024
 name+0.024
 agreement+0.023
 say+0.022
Neg logit contributions
ways-0.022
pill-0.021
ths-0.021
igsaw-0.020
-0.020
Tokened
Token ID276
Feature activation+6.23e-03

Pos logit contributions
ways+0.008
pill+0.007
ths+0.007
ience+0.007
oh+0.007
Neg logit contributions
 nod-0.014
 agreement-0.014
 name-0.013
 understand-0.013
 whine-0.012
Token down
Token ID866
Feature activation+1.31e-03

Pos logit contributions
 nod+0.010
 agreement+0.009
 understand+0.009
 name+0.009
 gra+0.007
Neg logit contributions
ways-0.011
ths-0.010
ience-0.010
pill-0.010
ar-0.010
Token and
Token ID290
Feature activation-1.49e-02

Pos logit contributions
 nod+0.002
 agreement+0.002
 name+0.002
 understand+0.002
 whine+0.002
Neg logit contributions
ways-0.002
pill-0.002
ths-0.002
-0.002
ience-0.002
Token day
Token ID1110
Feature activation-1.75e-02

Pos logit contributions
 nod+0.043
 name+0.038
 agreement+0.037
 understand+0.036
 head+0.030
Neg logit contributions
pill-0.071
-0.068
endant-0.067
ways-0.067
ience-0.067
Token long
Token ID890
Feature activation-9.37e-03

Pos logit contributions
pill+0.034
endant+0.032
ths+0.032
ience+0.032
ways+0.029
Neg logit contributions
 nod-0.026
 understand-0.024
 name-0.023
 agreement-0.023
 head-0.020
Token and
Token ID290
Feature activation-1.86e-02

Pos logit contributions
pill+0.016
endant+0.016
ths+0.016
ience+0.016
oh+0.015
Neg logit contributions
 nod-0.014
 name-0.013
 agreement-0.013
 understand-0.012
 head-0.011
Token when
Token ID618
Feature activation-1.75e-04

Pos logit contributions
pill+0.030
ways+0.029
ience+0.028
+0.027
ths+0.027
Neg logit contributions
 nod-0.034
 agreement-0.031
 name-0.030
 understand-0.030
 head-0.026
Token the
Token ID262
Feature activation-6.55e-03

Pos logit contributions
pill+0.000
endant+0.000
unn+0.000
ocus+0.000
qu+0.000
Neg logit contributions
 nod-0.000
 name-0.000
 understand-0.000
 agreement-0.000
 record-0.000
Token evening
Token ID6180
Feature activation-7.46e-02

Pos logit contributions
ways+0.012
pill+0.012
ths+0.012
ience+0.012
+0.011
Neg logit contributions
 nod-0.010
 agreement-0.009
 name-0.009
 understand-0.009
 gra-0.007
Token came
Token ID1625
Feature activation-3.61e-02

Pos logit contributions
pill+0.134
ience+0.132
endant+0.126
ths+0.124
ocus+0.122
Neg logit contributions
 nod-0.125
 name-0.117
 agreement-0.102
 understand-0.101
 head-0.092
Token,
Token ID11
Feature activation-6.05e-03

Pos logit contributions
pill+0.067
ience+0.063
ths+0.061
ways+0.059
ocus+0.057
Neg logit contributions
 nod-0.064
 name-0.061
 understand-0.055
 agreement-0.053
 head-0.047
Token it
Token ID340
Feature activation-1.83e-02

Pos logit contributions
ways+0.009
pill+0.009
ths+0.009
ience+0.009
+0.009
Neg logit contributions
 nod-0.011
 agreement-0.010
 name-0.010
 understand-0.010
 gra-0.009
Token was
Token ID373
Feature activation-1.40e-02

Pos logit contributions
pill+0.031
ways+0.030
+0.028
ths+0.028
ience+0.028
Neg logit contributions
 nod-0.032
 name-0.029
 agreement-0.029
 understand-0.028
 head-0.025
Token very
Token ID845
Feature activation-1.36e-02

Pos logit contributions
ways+0.020
ths+0.019
pill+0.019
ience+0.018
+0.018
Neg logit contributions
 nod-0.028
 agreement-0.025
 name-0.025
 understand-0.024
 gra-0.022
Token owl
Token ID39610
Feature activation+7.32e-03

Pos logit contributions
 nod+0.014
 agreement+0.012
 name+0.012
 understand+0.012
 gra+0.010
Neg logit contributions
ways-0.015
pill-0.015
ths-0.015
ience-0.015
-0.014
Token's
Token ID338
Feature activation+1.37e-02

Pos logit contributions
 nod+0.013
 agreement+0.012
 name+0.012
 understand+0.012
 head+0.010
Neg logit contributions
ways-0.012
pill-0.012
ths-0.011
-0.011
ience-0.011
Token stories
Token ID3923
Feature activation+3.46e-03

Pos logit contributions
 nod+0.022
 agreement+0.019
 name+0.019
 understand+0.018
 gra+0.017
Neg logit contributions
ths-0.023
 lifetime-0.022
ience-0.022
ways-0.022
 least-0.021
Token until
Token ID1566
Feature activation-1.48e-02

Pos logit contributions
 nod+0.006
 agreement+0.006
 name+0.005
 understand+0.005
 gra+0.005
Neg logit contributions
pill-0.006
ways-0.006
ths-0.005
ience-0.005
-0.005
Token the
Token ID262
Feature activation-4.67e-04

Pos logit contributions
pill+0.028
ways+0.025
ience+0.025
endant+0.024
ocus+0.024
Neg logit contributions
 nod-0.025
 agreement-0.023
 name-0.022
 understand-0.022
 gra-0.019
Token night
Token ID1755
Feature activation-7.36e-02

Pos logit contributions
pill+0.001
ways+0.001
ths+0.001
ience+0.001
+0.001
Neg logit contributions
 nod-0.001
 name-0.001
 agreement-0.001
 understand-0.001
 head-0.001
Token was
Token ID373
Feature activation-2.82e-02

Pos logit contributions
pill+0.129
ience+0.128
ths+0.125
endant+0.121
key+0.119
Neg logit contributions
 nod-0.122
 name-0.112
 understand-0.103
 agreement-0.098
 gra-0.093
Token over
Token ID625
Feature activation-2.87e-02

Pos logit contributions
ways+0.040
ths+0.038
pill+0.037
ience+0.036
+0.036
Neg logit contributions
 nod-0.056
 agreement-0.051
 name-0.050
 understand-0.049
 gra-0.043
Token,
Token ID11
Feature activation+3.41e-03

Pos logit contributions
pill+0.055
ways+0.052
ience+0.052
ocus+0.050
ths+0.050
Neg logit contributions
 nod-0.046
 agreement-0.041
 name-0.041
 understand-0.039
 head-0.033
Token and
Token ID290
Feature activation-4.24e-03

Pos logit contributions
 nod+0.006
 agreement+0.006
 name+0.006
 understand+0.005
 whine+0.005
Neg logit contributions
ways-0.005
ths-0.005
pill-0.005
ience-0.005
-0.005
Token the
Token ID262
Feature activation+7.21e-03

Pos logit contributions
ways+0.006
ths+0.006
ience+0.005
pill+0.005
 lifetime+0.005
Neg logit contributions
 nod-0.008
 agreement-0.008
 name-0.008
 understand-0.007
 whine-0.007
Token and
Token ID290
Feature activation-2.18e-02

Pos logit contributions
ways+0.006
ths+0.006
pill+0.006
ience+0.006
ar+0.005
Neg logit contributions
 nod-0.006
 agreement-0.006
 name-0.005
 understand-0.005
 gra-0.005
Token cl
Token ID537
Feature activation-3.91e-02

Pos logit contributions
ways+0.037
ths+0.035
pill+0.035
ience+0.034
 lifetime+0.033
Neg logit contributions
 nod-0.037
 agreement-0.034
 name-0.033
 understand-0.032
 gra-0.028
Tokenapping
Token ID5912
Feature activation+8.77e-03

Pos logit contributions
pill+0.083
ience+0.081
ths+0.079
ha+0.079
 seas+0.078
Neg logit contributions
 nod-0.055
 name-0.048
 agreement-0.044
 understand-0.044
 head-0.034
Token as
Token ID355
Feature activation+1.83e-02

Pos logit contributions
 nod+0.014
 agreement+0.012
 name+0.012
 understand+0.012
 gra+0.011
Neg logit contributions
ways-0.016
ths-0.015
ience-0.015
pill-0.014
ar-0.014
Token the
Token ID262
Feature activation-7.70e-03

Pos logit contributions
 nod+0.040
 agreement+0.038
 name+0.036
 understand+0.036
 gra+0.032
Neg logit contributions
ways-0.022
ths-0.021
pill-0.020
ience-0.020
ar-0.019
Token night
Token ID1755
Feature activation-7.21e-02

Pos logit contributions
pill+0.013
ways+0.013
ths+0.013
ience+0.012
+0.012
Neg logit contributions
 nod-0.013
 agreement-0.012
 name-0.012
 understand-0.011
 head-0.010
Token sky
Token ID6766
Feature activation-3.70e-02

Pos logit contributions
pill+0.119
ience+0.116
endant+0.113
ths+0.113
 least+0.107
Neg logit contributions
 nod-0.125
 name-0.116
 understand-0.105
 agreement-0.105
 gra-0.098
Token lit
Token ID6578
Feature activation-5.00e-02

Pos logit contributions
pill+0.056
ways+0.053
ience+0.052
ths+0.052
endant+0.051
Neg logit contributions
 nod-0.071
 name-0.064
 agreement-0.063
 understand-0.061
 gra-0.055
Token up
Token ID510
Feature activation-1.60e-02

Pos logit contributions
pill+0.116
ience+0.111
+0.111
ways+0.111
ths+0.110
Neg logit contributions
 nod-0.056
 name-0.051
 agreement-0.047
 understand-0.044
 gra-0.036
Token with
Token ID351
Feature activation-8.80e-03

Pos logit contributions
pill+0.031
ience+0.029
endant+0.029
+0.029
ways+0.028
Neg logit contributions
 nod-0.025
 name-0.022
 agreement-0.022
 understand-0.021
 head-0.017
Token streaks
Token ID48356
Feature activation+1.33e-02

Pos logit contributions
ways+0.015
pill+0.015
ths+0.015
ience+0.014
+0.014
Neg logit contributions
 nod-0.015
 agreement-0.013
 name-0.013
 understand-0.012
 gra-0.011
Token all
Token ID477
Feature activation+3.74e-02

Pos logit contributions
endant+0.009
pill+0.009
ience+0.009
oh+0.008
lyss+0.008
Neg logit contributions
 nod-0.006
 understand-0.005
 name-0.005
 agreement-0.005
 answer-0.004
Token day
Token ID1110
Feature activation-1.61e-02

Pos logit contributions
 nod+0.053
 agreement+0.047
 name+0.047
 understand+0.045
 head+0.038
Neg logit contributions
pill-0.074
ways-0.072
ience-0.072
endant-0.071
-0.070
Token and
Token ID290
Feature activation-1.07e-02

Pos logit contributions
pill+0.030
ience+0.029
endant+0.029
ths+0.028
lyss+0.026
Neg logit contributions
 nod-0.025
 understand-0.023
 name-0.022
 agreement-0.022
 head-0.019
Token when
Token ID618
Feature activation+1.02e-02

Pos logit contributions
pill+0.017
endant+0.015
+0.015
ience+0.015
ways+0.015
Neg logit contributions
 nod-0.020
 understand-0.019
 agreement-0.018
 name-0.018
 head-0.016
Token the
Token ID262
Feature activation-7.97e-03

Pos logit contributions
 nod+0.020
 agreement+0.018
 name+0.017
 understand+0.017
 whine+0.015
Neg logit contributions
pill-0.016
endant-0.015
ways-0.015
ocus-0.015
ience-0.014
Token evening
Token ID6180
Feature activation-7.18e-02

Pos logit contributions
pill+0.016
ways+0.016
ience+0.015
ocus+0.015
ar+0.015
Neg logit contributions
 nod-0.012
 name-0.011
 agreement-0.010
 understand-0.010
 head-0.008
Token came
Token ID1625
Feature activation-3.40e-02

Pos logit contributions
pill+0.114
ience+0.110
ways+0.107
endant+0.107
ths+0.106
Neg logit contributions
 nod-0.136
 name-0.123
 agreement-0.116
 understand-0.114
 gra-0.101
Token,
Token ID11
Feature activation-1.26e-03

Pos logit contributions
pill+0.051
ience+0.048
ways+0.047
ths+0.047
endant+0.045
Neg logit contributions
 nod-0.069
 name-0.065
 understand-0.061
 agreement-0.060
 head-0.053
Token he
Token ID339
Feature activation-1.46e-02

Pos logit contributions
ways+0.001
ths+0.001
pill+0.001
ience+0.001
+0.001
Neg logit contributions
 nod-0.003
 agreement-0.003
 name-0.003
 understand-0.003
 gra-0.002
Token looked
Token ID3114
Feature activation+4.13e-03

Pos logit contributions
pill+0.024
ways+0.023
ths+0.022
+0.022
ience+0.022
Neg logit contributions
 nod-0.026
 name-0.023
 understand-0.023
 agreement-0.023
 head-0.020
Token at
Token ID379
Feature activation+4.07e-03

Pos logit contributions
 nod+0.006
 agreement+0.006
 name+0.006
 understand+0.005
 gra+0.004
Neg logit contributions
ways-0.008
pill-0.008
ths-0.007
ience-0.007
-0.007
Token,
Token ID11
Feature activation+1.07e-02

Pos logit contributions
pill+0.001
ways+0.001
ths+0.001
ience+0.001
endant+0.001
Neg logit contributions
 nod-0.002
 name-0.001
 agreement-0.001
 understand-0.001
 head-0.001
Token but
Token ID475
Feature activation+1.68e-02

Pos logit contributions
 nod+0.019
 agreement+0.017
 name+0.017
 understand+0.016
 gra+0.014
Neg logit contributions
ways-0.017
pill-0.017
ths-0.017
ience-0.017
-0.016
Token it
Token ID340
Feature activation-7.70e-03

Pos logit contributions
 nod+0.034
 agreement+0.031
 name+0.031
 understand+0.030
 head+0.027
Neg logit contributions
pill-0.024
ways-0.024
-0.022
ths-0.022
ience-0.022
Token was
Token ID373
Feature activation-1.44e-02

Pos logit contributions
ways+0.013
pill+0.013
ths+0.012
+0.012
endant+0.012
Neg logit contributions
 nod-0.013
 agreement-0.012
 name-0.012
 understand-0.011
 head-0.010
Token all
Token ID477
Feature activation+6.40e-04

Pos logit contributions
ways+0.019
pill+0.018
ths+0.018
ience+0.018
+0.017
Neg logit contributions
 nod-0.030
 agreement-0.027
 name-0.027
 understand-0.026
 gra-0.024
Token fore
Token ID1674
Feature activation-7.14e-02

Pos logit contributions
 nod+0.001
 agreement+0.001
 understand+0.001
 name+0.001
 gra+0.001
Neg logit contributions
ways-0.001
pill-0.001
ths-0.001
-0.001
ience-0.001
Tokenshadow
Token ID19106
Feature activation-4.53e-02

Pos logit contributions
pill+0.098
+0.094
ways+0.086
endant+0.083
 seas+0.079
Neg logit contributions
 nod-0.144
 agreement-0.141
 understand-0.138
 name-0.123
 reins-0.121
Tokened
Token ID276
Feature activation-6.49e-03

Pos logit contributions
endant+0.068
pill+0.068
+0.066
ways+0.065
ths+0.064
Neg logit contributions
 nod-0.080
 name-0.072
 understand-0.070
 agreement-0.068
 head-0.062
Token by
Token ID416
Feature activation+3.85e-03

Pos logit contributions
pill+0.012
ways+0.011
+0.011
endant+0.011
ths+0.011
Neg logit contributions
 nod-0.011
 name-0.010
 agreement-0.009
 understand-0.009
 head-0.008
Token the
Token ID262
Feature activation-7.01e-03

Pos logit contributions
 nod+0.006
 name+0.005
 agreement+0.005
 understand+0.005
 head+0.004
Neg logit contributions
pill-0.007
ways-0.007
ths-0.007
ience-0.007
endant-0.007
Token kind
Token ID1611
Feature activation+4.88e-03

Pos logit contributions
ways+0.013
pill+0.012
ths+0.012
ience+0.012
+0.012
Neg logit contributions
 nod-0.011
 agreement-0.010
 name-0.010
 understand-0.009
 gra-0.008
Token in
Token ID287
Feature activation-2.04e-02

Pos logit contributions
ways+0.009
pill+0.009
ths+0.008
ience+0.008
endant+0.008
Neg logit contributions
 nod-0.010
 agreement-0.009
 name-0.009
 understand-0.009
 gra-0.008
Token her
Token ID607
Feature activation-1.79e-02

Pos logit contributions
ways+0.043
pill+0.043
ience+0.042
ths+0.042
ar+0.041
Neg logit contributions
 nod-0.026
 agreement-0.024
 name-0.023
 understand-0.022
 head-0.018
Token room
Token ID2119
Feature activation-2.14e-02

Pos logit contributions
ways+0.033
pill+0.032
ths+0.032
ience+0.031
+0.031
Neg logit contributions
 nod-0.027
 agreement-0.025
 understand-0.024
 name-0.024
 gra-0.020
Token when
Token ID618
Feature activation-5.50e-04

Pos logit contributions
pill+0.036
ways+0.034
ths+0.034
ience+0.034
endant+0.033
Neg logit contributions
 nod-0.037
 agreement-0.034
 name-0.033
 understand-0.032
 head-0.029
Token the
Token ID262
Feature activation-1.12e-02

Pos logit contributions
pill+0.001
ways+0.001
ience+0.001
ths+0.001
qu+0.001
Neg logit contributions
 nod-0.001
 name-0.001
 agreement-0.001
 understand-0.001
 whine-0.001
Token night
Token ID1755
Feature activation-7.09e-02

Pos logit contributions
ways+0.016
pill+0.015
ths+0.015
ience+0.015
 lifetime+0.014
Neg logit contributions
 nod-0.021
 agreement-0.019
 understand-0.019
 name-0.018
 gra-0.017
Token came
Token ID1625
Feature activation-2.68e-02

Pos logit contributions
pill+0.121
ience+0.120
ths+0.119
ways+0.113
endant+0.113
Neg logit contributions
 nod-0.121
 name-0.112
 understand-0.100
 agreement-0.099
 head-0.092
Token.
Token ID13
Feature activation-6.51e-04

Pos logit contributions
ways+0.044
pill+0.043
ths+0.042
ience+0.042
+0.040
Neg logit contributions
 nod-0.048
 agreement-0.043
 name-0.043
 understand-0.042
 gra-0.036
Token She
Token ID1375
Feature activation-4.65e-03

Pos logit contributions
ways+0.001
pill+0.001
ths+0.001
ience+0.001
+0.001
Neg logit contributions
 nod-0.001
 agreement-0.001
 name-0.001
 understand-0.001
 gra-0.001
Token wanted
Token ID2227
Feature activation+1.40e-02

Pos logit contributions
ways+0.007
pill+0.007
ths+0.007
ience+0.007
+0.007
Neg logit contributions
 nod-0.008
 agreement-0.008
 name-0.008
 understand-0.007
 gra-0.006
Token to
Token ID284
Feature activation-5.32e-03

Pos logit contributions
 nod+0.027
 agreement+0.024
 name+0.024
 understand+0.024
 gra+0.021
Neg logit contributions
ways-0.021
pill-0.021
ths-0.020
ience-0.020
-0.019
Token in
Token ID287
Feature activation-1.12e-02

Pos logit contributions
 nod+0.004
 agreement+0.004
 name+0.004
 understand+0.004
 gra+0.003
Neg logit contributions
ways-0.004
pill-0.004
ths-0.004
ience-0.004
ar-0.004
Token the
Token ID262
Feature activation-3.76e-02

Pos logit contributions
ways+0.020
pill+0.020
ths+0.019
ience+0.019
+0.019
Neg logit contributions
 nod-0.018
 agreement-0.016
 name-0.016
 understand-0.015
 gra-0.013
Token machine
Token ID4572
Feature activation-4.09e-04

Pos logit contributions
ways+0.064
pill+0.063
ths+0.061
ience+0.060
qu+0.059
Neg logit contributions
 nod-0.067
 agreement-0.059
 name-0.059
 understand-0.056
 head-0.051
Token until
Token ID1566
Feature activation-1.40e-02

Pos logit contributions
pill+0.001
ths+0.001
ways+0.001
ience+0.001
endant+0.001
Neg logit contributions
 nod-0.001
 name-0.001
 agreement-0.001
 understand-0.001
 head-0.001
Token the
Token ID262
Feature activation-6.37e-03

Pos logit contributions
pill+0.030
ience+0.027
qu+0.027
unn+0.027
endant+0.027
Neg logit contributions
 nod-0.021
 name-0.018
 agreement-0.017
 understand-0.017
 pat-0.015
Token evening
Token ID6180
Feature activation-7.08e-02

Pos logit contributions
ways+0.011
pill+0.011
ths+0.011
ience+0.010
+0.010
Neg logit contributions
 nod-0.011
 agreement-0.009
 name-0.009
 understand-0.009
 gra-0.008
Token.
Token ID13
Feature activation-9.54e-03

Pos logit contributions
ience+0.120
pill+0.120
ths+0.119
endant+0.114
ways+0.113
Neg logit contributions
 nod-0.122
 name-0.111
 understand-0.102
 agreement-0.102
 gra-0.092
Token He
Token ID679
Feature activation-7.32e-03

Pos logit contributions
ways+0.017
pill+0.017
+0.017
ience+0.016
ths+0.016
Neg logit contributions
 nod-0.015
 agreement-0.013
 understand-0.013
 name-0.013
 whine-0.011
Token was
Token ID373
Feature activation-5.30e-03

Pos logit contributions
ways+0.013
pill+0.013
+0.012
ths+0.012
endant+0.012
Neg logit contributions
 nod-0.012
 name-0.011
 understand-0.011
 agreement-0.010
 head-0.009
Token so
Token ID523
Feature activation-1.49e-02

Pos logit contributions
pill+0.008
ways+0.008
ience+0.008
ths+0.007
+0.007
Neg logit contributions
 nod-0.010
 agreement-0.010
 name-0.010
 understand-0.009
 head-0.008
Token proud
Token ID6613
Feature activation-2.48e-02

Pos logit contributions
pill+0.034
ience+0.032
ways+0.032
ths+0.032
+0.031
Neg logit contributions
 nod-0.020
 understand-0.017
 name-0.017
 agreement-0.016
 gra-0.013
Token flew
Token ID13112
Feature activation+8.21e-03

Pos logit contributions
 nod+0.021
 agreement+0.019
 name+0.019
 understand+0.019
 gra+0.016
Neg logit contributions
ways-0.018
pill-0.018
ths-0.017
ience-0.017
-0.016
Token like
Token ID588
Feature activation-4.70e-03

Pos logit contributions
 nod+0.014
 agreement+0.012
 name+0.012
 understand+0.012
 whine+0.011
Neg logit contributions
ways-0.014
pill-0.013
ths-0.013
ar-0.013
ience-0.013
Token butterflies
Token ID44571
Feature activation+1.66e-02

Pos logit contributions
ways+0.008
pill+0.008
ths+0.008
ience+0.007
+0.007
Neg logit contributions
 nod-0.008
 agreement-0.007
 name-0.007
 understand-0.007
 gra-0.006
Token until
Token ID1566
Feature activation-1.25e-02

Pos logit contributions
 nod+0.026
 name+0.022
 agreement+0.022
 understand+0.022
 gra+0.018
Neg logit contributions
ways-0.031
ths-0.030
pill-0.030
ience-0.029
endant-0.028
Token the
Token ID262
Feature activation-1.44e-02

Pos logit contributions
pill+0.026
endant+0.024
ience+0.024
ways+0.023
unn+0.023
Neg logit contributions
 nod-0.019
 agreement-0.016
 name-0.016
 understand-0.015
 gra-0.013
Token evening
Token ID6180
Feature activation-7.03e-02

Pos logit contributions
ways+0.026
pill+0.026
ths+0.025
ience+0.025
+0.024
Neg logit contributions
 nod-0.023
 agreement-0.020
 name-0.020
 understand-0.019
 gra-0.017
Token.
Token ID13
Feature activation-4.65e-03

Pos logit contributions
pill+0.117
ience+0.116
ths+0.115
ways+0.115
endant+0.111
Neg logit contributions
 nod-0.122
 name-0.109
 agreement-0.105
 understand-0.103
 gra-0.091
Token Then
Token ID3244
Feature activation-1.45e-02

Pos logit contributions
ways+0.008
pill+0.008
+0.007
ths+0.007
ience+0.007
Neg logit contributions
 nod-0.008
 agreement-0.007
 name-0.007
 understand-0.007
 head-0.006
Token,
Token ID11
Feature activation-2.00e-03

Pos logit contributions
pill+0.023
endant+0.022
ways+0.022
ience+0.021
ths+0.020
Neg logit contributions
 nod-0.029
 name-0.026
 agreement-0.025
 understand-0.024
 say-0.021
Token with
Token ID351
Feature activation+1.24e-02

Pos logit contributions
ways+0.003
pill+0.003
ths+0.002
ience+0.002
+0.002
Neg logit contributions
 nod-0.004
 agreement-0.004
 name-0.004
 understand-0.004
 gra-0.003
Token a
Token ID257
Feature activation+3.00e-03

Pos logit contributions
 nod+0.023
 name+0.021
 agreement+0.020
 understand+0.020
 gra+0.018
Neg logit contributions
ways-0.018
ths-0.018
pill-0.018
ience-0.018
 transformed-0.017
Token said
Token ID531
Feature activation-3.46e-02

Pos logit contributions
ways+0.015
pill+0.015
ths+0.014
ience+0.014
+0.014
Neg logit contributions
 nod-0.020
 agreement-0.018
 name-0.018
 understand-0.017
 gra-0.015
Token "
Token ID366
Feature activation-1.08e-02

Pos logit contributions
ways+0.062
pill+0.062
ths+0.061
ience+0.060
+0.058
Neg logit contributions
 nod-0.056
 name-0.049
 agreement-0.049
 understand-0.047
 gra-0.040
TokenHom
Token ID28718
Feature activation+3.18e-03

Pos logit contributions
ways+0.019
pill+0.018
ience+0.018
ths+0.018
endant+0.018
Neg logit contributions
 nod-0.018
 agreement-0.016
 name-0.016
 understand-0.016
 gra-0.013
Tokeneless
Token ID5321
Feature activation-3.08e-02

Pos logit contributions
 nod+0.005
 agreement+0.005
 name+0.005
 understand+0.005
 whine+0.004
Neg logit contributions
pill-0.006
ways-0.005
ience-0.005
ths-0.005
 lifetime-0.005
Token and
Token ID290
Feature activation-1.92e-02

Pos logit contributions
ways+0.044
pill+0.044
ths+0.042
ience+0.041
endant+0.040
Neg logit contributions
 nod-0.060
 name-0.056
 agreement-0.055
 understand-0.053
 head-0.047
Token Hungry
Token ID42939
Feature activation-7.00e-02

Pos logit contributions
ways+0.030
pill+0.030
ths+0.029
ience+0.029
+0.028
Neg logit contributions
 nod-0.035
 agreement-0.032
 name-0.031
 understand-0.030
 gra-0.027
Token".
Token ID1911
Feature activation+3.61e-03

Pos logit contributions
endant+0.108
ths+0.104
 transformed+0.101
ways+0.100
inations+0.099
Neg logit contributions
 nod-0.125
 name-0.115
 agreement-0.108
 head-0.098
 understand-0.097
Token Tim
Token ID5045
Feature activation+1.87e-02

Pos logit contributions
 nod+0.005
 name+0.005
 agreement+0.005
 understand+0.004
 say+0.004
Neg logit contributions
ience-0.007
pill-0.007
 lifetime-0.007
ths-0.007
endant-0.007
Tokenmy
Token ID1820
Feature activation+8.99e-04

Pos logit contributions
 nod+0.031
 agreement+0.028
 name+0.028
 gra+0.027
 understand+0.026
Neg logit contributions
oh-0.030
ths-0.030
ience-0.029
 least-0.029
ways-0.029
Token asked
Token ID1965
Feature activation+1.56e-02

Pos logit contributions
 nod+0.002
 understand+0.001
 name+0.001
 agreement+0.001
 head+0.001
Neg logit contributions
pill-0.002
ways-0.002
-0.002
endant-0.002
 seas-0.002
Token his
Token ID465
Feature activation+1.99e-02

Pos logit contributions
 nod+0.030
 name+0.030
 agreement+0.028
 gra+0.027
 reins+0.027
Neg logit contributions
ways-0.018
ths-0.018
ar-0.016
ience-0.015
 you-0.015
Token time
Token ID640
Feature activation+9.13e-03

Pos logit contributions
 nod+0.042
 agreement+0.038
 name+0.037
 understand+0.036
 head+0.031
Neg logit contributions
pill-0.043
ways-0.043
ths-0.041
ience-0.040
endant-0.040
Token,
Token ID11
Feature activation+1.05e-02

Pos logit contributions
 nod+0.016
 agreement+0.014
 understand+0.014
 name+0.013
 head+0.012
Neg logit contributions
pill-0.017
ience-0.016
endant-0.016
ths-0.015
ways-0.015
Token until
Token ID1566
Feature activation-6.67e-03

Pos logit contributions
 nod+0.018
 agreement+0.016
 name+0.016
 understand+0.016
 gra+0.014
Neg logit contributions
ways-0.018
pill-0.017
ths-0.017
ience-0.017
-0.016
Token eventually
Token ID4191
Feature activation-3.56e-02

Pos logit contributions
pill+0.014
endant+0.013
qu+0.013
ience+0.012
unn+0.012
Neg logit contributions
 nod-0.010
 agreement-0.009
 name-0.009
 understand-0.008
 pat-0.007
Token the
Token ID262
Feature activation-1.03e-02

Pos logit contributions
pill+0.076
endant+0.070
ience+0.066
ths+0.065
years+0.064
Neg logit contributions
 nod-0.056
 name-0.051
 understand-0.047
 agreement-0.044
 say-0.041
Token night
Token ID1755
Feature activation-6.98e-02

Pos logit contributions
pill+0.020
ways+0.020
ience+0.019
ocus+0.018
ths+0.018
Neg logit contributions
 nod-0.016
 name-0.015
 agreement-0.015
 understand-0.013
 head-0.012
Token sky
Token ID6766
Feature activation-3.04e-02

Pos logit contributions
ience+0.121
pill+0.120
ths+0.118
endant+0.114
ways+0.109
Neg logit contributions
 nod-0.119
 name-0.110
 understand-0.099
 agreement-0.097
 gra-0.089
Token filled
Token ID5901
Feature activation-4.79e-03

Pos logit contributions
pill+0.045
ways+0.044
ths+0.043
ience+0.043
endant+0.041
Neg logit contributions
 nod-0.059
 name-0.053
 agreement-0.053
 understand-0.052
 gra-0.046
Token with
Token ID351
Feature activation-1.63e-02

Pos logit contributions
ways+0.008
pill+0.008
ths+0.008
ience+0.008
oh+0.008
Neg logit contributions
 nod-0.008
 agreement-0.007
 name-0.007
 understand-0.007
 whine-0.006
Token stars
Token ID5788
Feature activation+2.98e-03

Pos logit contributions
ways+0.028
pill+0.028
ience+0.027
ths+0.027
+0.027
Neg logit contributions
 nod-0.027
 agreement-0.025
 name-0.025
 understand-0.024
 gra-0.021
Token.
Token ID13
Feature activation-4.58e-03

Pos logit contributions
 nod+0.005
 name+0.004
 agreement+0.004
 understand+0.004
 head+0.003
Neg logit contributions
pill-0.006
ways-0.005
ths-0.005
ience-0.005
-0.005
Token all
Token ID477
Feature activation+3.60e-02

Pos logit contributions
pill+0.015
endant+0.014
ience+0.014
oh+0.014
ths+0.014
Neg logit contributions
 nod-0.009
 name-0.008
 understand-0.008
 agreement-0.007
 head-0.006
Token week
Token ID1285
Feature activation-2.89e-02

Pos logit contributions
 nod+0.054
 agreement+0.048
 name+0.048
 understand+0.046
 gra+0.039
Neg logit contributions
ways-0.068
pill-0.068
ths-0.066
ience-0.065
-0.064
Token and
Token ID290
Feature activation-1.80e-02

Pos logit contributions
pill+0.049
ways+0.048
ths+0.047
ience+0.046
endant+0.045
Neg logit contributions
 nod-0.050
 name-0.045
 agreement-0.044
 understand-0.044
 head-0.037
Token when
Token ID618
Feature activation+6.38e-03

Pos logit contributions
pill+0.026
ways+0.025
+0.025
ths+0.024
ience+0.024
Neg logit contributions
 nod-0.035
 agreement-0.032
 name-0.031
 understand-0.031
 head-0.027
Token the
Token ID262
Feature activation-7.87e-03

Pos logit contributions
 nod+0.011
 name+0.010
 agreement+0.010
 understand+0.010
 whine+0.009
Neg logit contributions
pill-0.011
endant-0.010
ways-0.010
ience-0.010
qu-0.010
Token night
Token ID1755
Feature activation-6.98e-02

Pos logit contributions
ways+0.014
pill+0.014
ths+0.014
ience+0.014
+0.013
Neg logit contributions
 nod-0.012
 agreement-0.011
 name-0.011
 understand-0.010
 head-0.009
Token finally
Token ID3443
Feature activation-2.78e-02

Pos logit contributions
pill+0.122
ience+0.116
endant+0.115
ths+0.115
 least+0.113
Neg logit contributions
 nod-0.120
 name-0.111
 understand-0.097
 agreement-0.096
 kiss-0.089
Token arrived
Token ID5284
Feature activation-4.28e-02

Pos logit contributions
pill+0.050
+0.046
ience+0.044
ways+0.043
endant+0.043
Neg logit contributions
 nod-0.049
 name-0.044
 understand-0.042
 agreement-0.042
 head-0.038
Token,
Token ID11
Feature activation-4.28e-03

Pos logit contributions
pill+0.065
ways+0.058
ience+0.057
ths+0.056
+0.055
Neg logit contributions
 nod-0.087
 name-0.080
 agreement-0.079
 understand-0.076
 head-0.069
Token he
Token ID339
Feature activation-1.01e-02

Pos logit contributions
ways+0.006
pill+0.005
ths+0.005
ience+0.005
+0.005
Neg logit contributions
 nod-0.009
 agreement-0.008
 name-0.008
 understand-0.008
 gra-0.007
Token was
Token ID373
Feature activation-1.24e-03

Pos logit contributions
ways+0.017
pill+0.017
ths+0.016
+0.016
ience+0.016
Neg logit contributions
 nod-0.018
 name-0.016
 agreement-0.016
 understand-0.015
 head-0.014
Token around
Token ID1088
Feature activation+1.54e-02

Pos logit contributions
 nod+0.001
 agreement+0.001
 name+0.001
 understand+0.001
 gra+0.001
Neg logit contributions
ways-0.002
pill-0.002
ths-0.002
ience-0.002
-0.002
Token and
Token ID290
Feature activation-8.56e-03

Pos logit contributions
 nod+0.024
 agreement+0.021
 name+0.021
 understand+0.020
 head+0.017
Neg logit contributions
pill-0.029
ways-0.029
ths-0.028
ience-0.028
endant-0.027
Token around
Token ID1088
Feature activation+3.66e-03

Pos logit contributions
ways+0.015
pill+0.015
ths+0.014
ience+0.014
+0.014
Neg logit contributions
 nod-0.015
 agreement-0.013
 name-0.013
 understand-0.012
 head-0.011
Token until
Token ID1566
Feature activation-1.19e-02

Pos logit contributions
 nod+0.006
 agreement+0.006
 name+0.006
 understand+0.005
 head+0.005
Neg logit contributions
pill-0.006
ways-0.006
ths-0.006
ience-0.006
endant-0.006
Token the
Token ID262
Feature activation-1.05e-02

Pos logit contributions
pill+0.027
endant+0.025
unn+0.025
qu+0.024
ocus+0.023
Neg logit contributions
 nod-0.016
 agreement-0.014
 understand-0.013
 name-0.013
 kiss-0.011
Token night
Token ID1755
Feature activation-6.94e-02

Pos logit contributions
ways+0.020
pill+0.020
ths+0.019
ience+0.019
ar+0.019
Neg logit contributions
 nod-0.016
 name-0.015
 agreement-0.015
 understand-0.013
 head-0.012
Token sky
Token ID6766
Feature activation-3.38e-02

Pos logit contributions
pill+0.120
ience+0.120
ths+0.118
endant+0.114
key+0.111
Neg logit contributions
 nod-0.117
 name-0.108
 understand-0.097
 agreement-0.096
 kiss-0.088
Token began
Token ID2540
Feature activation-3.48e-02

Pos logit contributions
pill+0.049
ways+0.048
ience+0.046
ths+0.046
endant+0.044
Neg logit contributions
 nod-0.066
 name-0.060
 agreement-0.060
 understand-0.058
 head-0.051
Token to
Token ID284
Feature activation-2.55e-02

Pos logit contributions
ience+0.066
pill+0.065
ths+0.061
ar+0.061
opus+0.059
Neg logit contributions
 nod-0.057
 name-0.051
 understand-0.049
 agreement-0.048
 pat-0.043
Token come
Token ID1282
Feature activation-2.38e-02

Pos logit contributions
ways+0.048
pill+0.046
ths+0.046
ience+0.045
+0.044
Neg logit contributions
 nod-0.038
 agreement-0.034
 name-0.034
 understand-0.032
 gra-0.027
Token alive
Token ID6776
Feature activation-1.75e-02

Pos logit contributions
ways+0.044
pill+0.042
ths+0.042
ience+0.041
+0.040
Neg logit contributions
 nod-0.037
 agreement-0.033
 name-0.032
 understand-0.031
 gra-0.027
Token danced
Token ID39480
Feature activation+1.95e-03

Pos logit contributions
ways+0.028
pill+0.027
ths+0.026
ience+0.026
+0.026
Neg logit contributions
 nod-0.028
 agreement-0.025
 name-0.025
 understand-0.024
 gra-0.021
Token together
Token ID1978
Feature activation-3.04e-03

Pos logit contributions
 nod+0.004
 agreement+0.003
 name+0.003
 understand+0.003
 gra+0.003
Neg logit contributions
ways-0.003
pill-0.003
ths-0.003
ience-0.003
ar-0.003
Token,
Token ID11
Feature activation+4.92e-03

Pos logit contributions
pill+0.005
ways+0.005
ths+0.005
ience+0.005
endant+0.004
Neg logit contributions
 nod-0.006
 agreement-0.005
 name-0.005
 understand-0.005
 head-0.004
Token until
Token ID1566
Feature activation-9.37e-03

Pos logit contributions
 nod+0.009
 agreement+0.008
 name+0.008
 understand+0.007
 gra+0.007
Neg logit contributions
ways-0.008
pill-0.008
ths-0.008
ience-0.008
-0.007
Token the
Token ID262
Feature activation-3.49e-03

Pos logit contributions
pill+0.023
endant+0.022
ocus+0.022
unn+0.021
oh+0.021
Neg logit contributions
 nod-0.010
 agreement-0.009
 name-0.009
 understand-0.008
 their-0.008
Token night
Token ID1755
Feature activation-6.92e-02

Pos logit contributions
ways+0.006
pill+0.006
ths+0.006
ience+0.006
+0.006
Neg logit contributions
 nod-0.005
 agreement-0.005
 name-0.005
 understand-0.005
 gra-0.004
Token turned
Token ID2900
Feature activation-3.70e-02

Pos logit contributions
pill+0.117
ience+0.117
ths+0.116
endant+0.115
ocus+0.112
Neg logit contributions
 nod-0.117
 name-0.108
 understand-0.099
 agreement-0.096
 record-0.087
Token to
Token ID284
Feature activation-4.04e-03

Pos logit contributions
pill+0.070
ths+0.067
ways+0.067
+0.066
ience+0.065
Neg logit contributions
 nod-0.058
 name-0.051
 agreement-0.050
 understand-0.049
 gra-0.041
Token day
Token ID1110
Feature activation-3.60e-02

Pos logit contributions
ways+0.007
ths+0.006
 least+0.006
+0.006
 lifetime+0.006
Neg logit contributions
 nod-0.007
 name-0.006
 agreement-0.006
 understand-0.006
 whine-0.005
Token.
Token ID13
Feature activation+1.12e-03

Pos logit contributions
pill+0.065
ths+0.062
endant+0.060
ience+0.060
ways+0.060
Neg logit contributions
 nod-0.062
 name-0.055
 agreement-0.054
 understand-0.053
 head-0.044
Token From
Token ID3574
Feature activation-1.08e-02

Pos logit contributions
 nod+0.002
 name+0.002
 agreement+0.002
 understand+0.002
 gra+0.002
Neg logit contributions
ways-0.002
ths-0.002
pill-0.002
ience-0.002
endant-0.002
Token everything
Token ID2279
Feature activation-6.80e-03

Pos logit contributions
ways+0.011
pill+0.011
ths+0.011
 least+0.011
ience+0.011
Neg logit contributions
 nod-0.011
 agreement-0.010
 name-0.010
 understand-0.009
 whine-0.008
Token she
Token ID673
Feature activation-2.83e-02

Pos logit contributions
pill+0.010
ways+0.010
+0.009
ths+0.009
endant+0.009
Neg logit contributions
 nod-0.013
 agreement-0.011
 name-0.011
 understand-0.011
 head-0.010
Token recorded
Token ID6264
Feature activation-2.12e-02

Pos logit contributions
ways+0.049
pill+0.048
ths+0.047
+0.046
ience+0.045
Neg logit contributions
 nod-0.047
 name-0.043
 agreement-0.041
 understand-0.041
 head-0.035
Token on
Token ID319
Feature activation-2.43e-02

Pos logit contributions
ways+0.030
pill+0.029
ths+0.028
ience+0.028
ar+0.027
Neg logit contributions
 nod-0.042
 agreement-0.038
 name-0.037
 understand-0.036
 gra-0.033
Token her
Token ID607
Feature activation-1.06e-02

Pos logit contributions
ways+0.043
pill+0.042
ths+0.041
ience+0.040
+0.039
Neg logit contributions
 nod-0.040
 agreement-0.036
 name-0.035
 understand-0.034
 gra-0.029
Token Sunday
Token ID3502
Feature activation-6.84e-02

Pos logit contributions
ways+0.019
pill+0.019
ths+0.018
ience+0.018
+0.017
Neg logit contributions
 nod-0.017
 agreement-0.016
 name-0.015
 understand-0.015
 head-0.013
Token adventure
Token ID8855
Feature activation-1.74e-02

Pos logit contributions
pill+0.111
ways+0.108
ience+0.108
endant+0.108
ths+0.107
Neg logit contributions
 nod-0.119
 name-0.108
 agreement-0.105
 understand-0.100
 head-0.092
Token.
Token ID13
Feature activation+6.78e-03

Pos logit contributions
pill+0.027
ways+0.027
ths+0.026
ience+0.026
endant+0.025
Neg logit contributions
 nod-0.032
 agreement-0.029
 name-0.028
 understand-0.027
 gra-0.024
Token<|endoftext|>
Token ID50256
Feature activation-2.69e-02

Pos logit contributions
 nod+0.012
 agreement+0.011
 name+0.010
 understand+0.010
 gra+0.009
Neg logit contributions
ways-0.011
pill-0.011
ths-0.011
ience-0.011
-0.011
TokenOnce
Token ID7454
Feature activation-7.01e-03

Pos logit contributions
pill+0.033
ways+0.033
+0.031
oh+0.029
ar+0.029
Neg logit contributions
 nod-0.059
 agreement-0.055
 understand-0.053
 name-0.052
 whine-0.048
Token there
Token ID612
Feature activation+2.25e-03

Pos logit contributions
ways+0.012
pill+0.012
ths+0.012
ience+0.012
+0.011
Neg logit contributions
 nod-0.011
 agreement-0.010
 name-0.010
 understand-0.010
 gra-0.008
TokenAsh
Token ID26754
Feature activation-7.21e-03

Pos logit contributions
ways+0.018
pill+0.018
ths+0.017
ience+0.017
oh+0.017
Neg logit contributions
 nod-0.014
 agreement-0.012
 name-0.012
 understand-0.011
 gra-0.010
Tokenley
Token ID1636
Feature activation-7.04e-03

Pos logit contributions
ways+0.011
pill+0.011
ths+0.010
+0.010
 lifetime+0.010
Neg logit contributions
 nod-0.014
 name-0.013
 agreement-0.012
 understand-0.012
 head-0.011
Token,
Token ID11
Feature activation-1.24e-02

Pos logit contributions
pill+0.010
ways+0.010
ths+0.010
 lifetime+0.009
 seas+0.009
Neg logit contributions
 nod-0.014
 name-0.013
 understand-0.012
 agreement-0.012
 head-0.011
Token when
Token ID618
Feature activation-2.96e-03

Pos logit contributions
ways+0.013
pill+0.013
ths+0.013
ience+0.013
oh+0.012
Neg logit contributions
 nod-0.027
 agreement-0.026
 name-0.025
 understand-0.025
 gra-0.023
Token the
Token ID262
Feature activation-1.24e-02

Pos logit contributions
pill+0.003
ways+0.003
endant+0.003
ience+0.003
ths+0.003
Neg logit contributions
 nod-0.007
 name-0.007
 understand-0.007
 agreement-0.006
 head-0.006
Token night
Token ID1755
Feature activation-6.79e-02

Pos logit contributions
ways+0.021
pill+0.021
ths+0.020
ience+0.020
+0.020
Neg logit contributions
 nod-0.020
 agreement-0.019
 understand-0.018
 name-0.017
 gra-0.015
Token comes
Token ID2058
Feature activation-2.34e-02

Pos logit contributions
pill+0.110
ways+0.108
ths+0.108
endant+0.106
ience+0.102
Neg logit contributions
 nod-0.121
 name-0.113
 agreement-0.100
 understand-0.098
 head-0.094
Token you
Token ID345
Feature activation-2.20e-02

Pos logit contributions
pill+0.042
ways+0.041
ths+0.039
ience+0.039
endant+0.038
Neg logit contributions
 nod-0.039
 name-0.037
 agreement-0.035
 understand-0.034
 head-0.030
Token must
Token ID1276
Feature activation-1.59e-02

Pos logit contributions
pill+0.037
ways+0.036
ths+0.035
ience+0.034
+0.033
Neg logit contributions
 nod-0.039
 name-0.036
 understand-0.035
 agreement-0.035
 head-0.030
Token hide
Token ID7808
Feature activation+9.62e-03

Pos logit contributions
ways+0.030
pill+0.029
ths+0.029
ience+0.029
ar+0.028
Neg logit contributions
 nod-0.023
 agreement-0.020
 name-0.019
 understand-0.018
 gra-0.016
Token the
Token ID262
Feature activation-1.04e-02

Pos logit contributions
 nod+0.017
 agreement+0.015
 name+0.015
 understand+0.015
 head+0.013
Neg logit contributions
pill-0.016
ways-0.016
ths-0.015
ience-0.015
-0.015
Token the
Token ID262
Feature activation-3.27e-03

Pos logit contributions
 nod+0.004
 agreement+0.004
 name+0.004
 understand+0.003
 head+0.003
Neg logit contributions
ways-0.004
pill-0.004
ths-0.004
ience-0.004
-0.004
Token heavy
Token ID4334
Feature activation+1.48e-02

Pos logit contributions
ways+0.006
pill+0.005
ths+0.005
ience+0.005
+0.005
Neg logit contributions
 nod-0.006
 agreement-0.005
 name-0.005
 understand-0.005
 gra-0.004
Token bag
Token ID6131
Feature activation+2.32e-03

Pos logit contributions
 nod+0.025
 agreement+0.023
 name+0.022
 understand+0.022
 gra+0.019
Neg logit contributions
ways-0.025
pill-0.024
ience-0.024
ths-0.023
ar-0.023
Token had
Token ID550
Feature activation-4.51e-02

Pos logit contributions
 nod+0.004
 agreement+0.003
 understand+0.003
 name+0.003
 head+0.003
Neg logit contributions
pill-0.004
ways-0.004
ths-0.004
ience-0.004
qu-0.004
Token been
Token ID587
Feature activation-1.30e-02

Pos logit contributions
pill+0.072
ths+0.067
ways+0.067
+0.065
ience+0.065
Neg logit contributions
 nod-0.084
 name-0.079
 understand-0.079
 agreement-0.076
 head-0.068
Token fore
Token ID1674
Feature activation-6.79e-02

Pos logit contributions
pill+0.022
ths+0.020
ways+0.020
+0.020
ience+0.019
Neg logit contributions
 nod-0.023
 understand-0.021
 agreement-0.021
 name-0.021
 head-0.019
Tokenshadow
Token ID19106
Feature activation-5.10e-02

Pos logit contributions
pill+0.092
+0.084
ways+0.080
endant+0.073
 seas+0.071
Neg logit contributions
 nod-0.142
 agreement-0.138
 understand-0.135
 name-0.128
 head-0.122
Tokened
Token ID276
Feature activation-1.35e-02

Pos logit contributions
endant+0.078
pill+0.078
ths+0.074
ience+0.073
+0.072
Neg logit contributions
 nod-0.090
 name-0.083
 understand-0.081
 agreement-0.080
 head-0.073
Token,
Token ID11
Feature activation-4.02e-03

Pos logit contributions
pill+0.023
ways+0.023
ths+0.022
ience+0.022
+0.021
Neg logit contributions
 nod-0.023
 name-0.021
 agreement-0.021
 understand-0.020
 head-0.017
Token and
Token ID290
Feature activation-1.76e-02

Pos logit contributions
ways+0.007
pill+0.007
ths+0.007
ience+0.007
+0.007
Neg logit contributions
 nod-0.006
 agreement-0.006
 name-0.006
 understand-0.005
 gra-0.005
Token that
Token ID326
Feature activation+1.05e-02

Pos logit contributions
pill+0.032
ways+0.031
endant+0.030
+0.030
ocus+0.029
Neg logit contributions
 nod-0.028
 agreement-0.025
 understand-0.025
 name-0.025
 head-0.022
Token not
Token ID407
Feature activation+1.80e-02

Pos logit contributions
pill+0.031
ways+0.031
ths+0.030
ience+0.029
+0.029
Neg logit contributions
 nod-0.031
 name-0.028
 agreement-0.028
 understand-0.027
 gra-0.023
Token wake
Token ID7765
Feature activation-2.06e-02

Pos logit contributions
 nod+0.024
 agreement+0.021
 name+0.021
 understand+0.020
 gra+0.017
Neg logit contributions
ways-0.037
pill-0.036
ths-0.035
ience-0.035
-0.034
Token up
Token ID510
Feature activation-9.12e-03

Pos logit contributions
ways+0.040
ths+0.039
pill+0.038
ience+0.038
ar+0.037
Neg logit contributions
 nod-0.030
 agreement-0.026
 understand-0.024
 name-0.024
 gra-0.020
Token until
Token ID1566
Feature activation-1.64e-02

Pos logit contributions
ways+0.015
pill+0.014
ths+0.014
ience+0.014
+0.013
Neg logit contributions
 nod-0.016
 agreement-0.015
 name-0.015
 understand-0.014
 gra-0.012
Token the
Token ID262
Feature activation-1.08e-02

Pos logit contributions
ways+0.031
pill+0.031
ths+0.030
ience+0.030
+0.029
Neg logit contributions
 nod-0.025
 agreement-0.022
 name-0.022
 understand-0.021
 gra-0.018
Token morning
Token ID3329
Feature activation-6.73e-02

Pos logit contributions
ways+0.016
ths+0.015
pill+0.015
 lifetime+0.015
ience+0.015
Neg logit contributions
 nod-0.019
 agreement-0.017
 understand-0.017
 name-0.017
 gra-0.015
Token.
Token ID13
Feature activation+1.60e-03

Pos logit contributions
pill+0.119
ths+0.119
ience+0.117
ways+0.115
endant+0.112
Neg logit contributions
 nod-0.110
 name-0.101
 agreement-0.095
 understand-0.092
 gra-0.083
Token He
Token ID679
Feature activation-5.33e-03

Pos logit contributions
 nod+0.002
 agreement+0.002
 understand+0.002
 name+0.002
 gra+0.002
Neg logit contributions
ways-0.003
pill-0.003
-0.003
ience-0.003
ths-0.003
Token smiled
Token ID13541
Feature activation-1.02e-02

Pos logit contributions
ways+0.009
pill+0.009
ths+0.008
ience+0.008
+0.008
Neg logit contributions
 nod-0.009
 agreement-0.008
 name-0.008
 understand-0.008
 gra-0.007
Token,
Token ID11
Feature activation+1.66e-02

Pos logit contributions
ways+0.016
ths+0.015
pill+0.015
ience+0.015
ar+0.014
Neg logit contributions
 nod-0.018
 agreement-0.016
 name-0.016
 understand-0.015
 gra-0.014
Token and
Token ID290
Feature activation-3.72e-03

Pos logit contributions
 nod+0.029
 agreement+0.029
 name+0.028
 understand+0.026
 whine+0.025
Neg logit contributions
ways-0.022
ths-0.021
ience-0.021
 lifetime-0.021
endant-0.020
Token all
Token ID477
Feature activation+2.22e-02

Pos logit contributions
 nod+0.022
 agreement+0.020
 understand+0.020
 name+0.019
 whine+0.019
Neg logit contributions
ways-0.016
ar-0.015
ths-0.014
 seas-0.014
st-0.014
Token afternoon
Token ID6672
Feature activation-2.49e-02

Pos logit contributions
 nod+0.037
 agreement+0.033
 name+0.032
 understand+0.031
 gra+0.027
Neg logit contributions
ways-0.039
pill-0.038
ths-0.037
ience-0.037
oh-0.036
Token and
Token ID290
Feature activation-1.77e-02

Pos logit contributions
pill+0.053
endant+0.053
ience+0.052
ths+0.050
 least+0.050
Neg logit contributions
 nod-0.031
 name-0.027
 understand-0.025
 agreement-0.025
 head-0.024
Token as
Token ID355
Feature activation+2.62e-02

Pos logit contributions
pill+0.026
ways+0.026
ience+0.025
ths+0.025
+0.024
Neg logit contributions
 nod-0.034
 name-0.031
 agreement-0.031
 understand-0.030
 head-0.026
Token the
Token ID262
Feature activation-5.10e-04

Pos logit contributions
 nod+0.047
 name+0.044
 understand+0.044
 agreement+0.041
 gra+0.037
Neg logit contributions
pill-0.040
ways-0.039
ience-0.039
ar-0.039
-0.038
Token evening
Token ID6180
Feature activation-6.72e-02

Pos logit contributions
ways+0.001
pill+0.001
ths+0.001
ience+0.001
+0.001
Neg logit contributions
 nod-0.001
 agreement-0.001
 name-0.001
 understand-0.001
 gra-0.001
Token sky
Token ID6766
Feature activation-3.26e-02

Pos logit contributions
pill+0.104
ience+0.102
ths+0.099
ways+0.099
endant+0.097
Neg logit contributions
 nod-0.126
 name-0.115
 agreement-0.110
 understand-0.107
 gra-0.096
Token turned
Token ID2900
Feature activation-4.78e-02

Pos logit contributions
pill+0.052
ways+0.051
ience+0.050
ths+0.050
endant+0.048
Neg logit contributions
 nod-0.059
 name-0.053
 agreement-0.052
 understand-0.051
 gra-0.045
Token a
Token ID257
Feature activation-2.31e-02

Pos logit contributions
pill+0.091
ths+0.089
+0.088
ience+0.086
ways+0.086
Neg logit contributions
 nod-0.073
 name-0.067
 agreement-0.061
 understand-0.060
 head-0.052
Token deep
Token ID2769
Feature activation-2.70e-03

Pos logit contributions
ways+0.039
pill+0.038
ths+0.037
ience+0.036
+0.035
Neg logit contributions
 nod-0.040
 agreement-0.036
 name-0.036
 understand-0.034
 gra-0.030
Token pink
Token ID11398
Feature activation+7.16e-03

Pos logit contributions
pill+0.005
ways+0.005
ths+0.004
ience+0.004
+0.004
Neg logit contributions
 nod-0.005
 agreement-0.004
 name-0.004
 understand-0.004
 head-0.003
Token all
Token ID477
Feature activation+2.74e-02

Pos logit contributions
ways+0.011
pill+0.011
ths+0.011
ience+0.011
+0.010
Neg logit contributions
 nod-0.017
 agreement-0.015
 name-0.015
 understand-0.015
 gra-0.013
Token afternoon
Token ID6672
Feature activation-3.29e-02

Pos logit contributions
 nod+0.042
 agreement+0.038
 name+0.037
 understand+0.035
 gra+0.030
Neg logit contributions
ways-0.051
pill-0.051
ths-0.050
ience-0.049
-0.049
Token,
Token ID11
Feature activation+5.68e-03

Pos logit contributions
endant+0.061
ience+0.058
ths+0.058
pill+0.058
 least+0.057
Neg logit contributions
 nod-0.053
 name-0.045
 understand-0.044
 agreement-0.043
 head-0.038
Token until
Token ID1566
Feature activation-1.14e-02

Pos logit contributions
 nod+0.010
 agreement+0.009
 name+0.009
 understand+0.008
 gra+0.007
Neg logit contributions
ways-0.010
pill-0.010
ths-0.009
ience-0.009
endant-0.009
Token the
Token ID262
Feature activation-4.55e-03

Pos logit contributions
pill+0.028
endant+0.027
ocus+0.026
inations+0.025
enny+0.025
Neg logit contributions
 nod-0.013
 name-0.010
 agreement-0.010
 understand-0.009
 their-0.009
Token morning
Token ID3329
Feature activation-6.65e-02

Pos logit contributions
ways+0.008
pill+0.008
ths+0.008
ience+0.008
+0.008
Neg logit contributions
 nod-0.007
 agreement-0.007
 name-0.006
 understand-0.006
 gra-0.005
Token sky
Token ID6766
Feature activation-3.39e-02

Pos logit contributions
ths+0.131
ience+0.130
years+0.127
 least+0.126
endant+0.125
Neg logit contributions
 nod-0.098
 name-0.091
 understand-0.077
 agreement-0.074
 gra-0.070
Token started
Token ID2067
Feature activation-3.62e-02

Pos logit contributions
pill+0.050
ways+0.048
ience+0.047
ths+0.047
endant+0.045
Neg logit contributions
 nod-0.067
 name-0.061
 agreement-0.059
 understand-0.058
 gra-0.052
Token to
Token ID284
Feature activation-2.19e-02

Pos logit contributions
ience+0.069
pill+0.067
ths+0.064
ar+0.063
opus+0.062
Neg logit contributions
 nod-0.059
 name-0.052
 understand-0.050
 agreement-0.050
 pat-0.044
Token turn
Token ID1210
Feature activation-1.78e-02

Pos logit contributions
ways+0.037
pill+0.036
ths+0.035
ience+0.035
+0.034
Neg logit contributions
 nod-0.037
 agreement-0.033
 name-0.033
 understand-0.032
 gra-0.028
Token orange
Token ID10912
Feature activation-1.37e-02

Pos logit contributions
pill+0.033
ways+0.032
ths+0.032
+0.031
ience+0.031
Neg logit contributions
 nod-0.028
 name-0.025
 agreement-0.025
 understand-0.024
 head-0.020
Token's
Token ID338
Feature activation+3.96e-03

Pos logit contributions
 nod+0.032
 agreement+0.029
 name+0.029
 understand+0.028
 gra+0.025
Neg logit contributions
ways-0.022
pill-0.021
ths-0.020
ience-0.020
-0.019
Token go
Token ID467
Feature activation-3.11e-03

Pos logit contributions
 nod+0.005
 agreement+0.005
 understand+0.004
 reins+0.004
 name+0.004
Neg logit contributions
 least-0.008
-0.007
 you-0.007
oh-0.007
ths-0.007
Token get
Token ID651
Feature activation-1.78e-02

Pos logit contributions
ways+0.006
pill+0.006
ths+0.006
ience+0.006
+0.005
Neg logit contributions
 nod-0.005
 agreement-0.004
 name-0.004
 understand-0.004
 gra-0.003
Token your
Token ID534
Feature activation-7.27e-04

Pos logit contributions
ways+0.037
pill+0.036
ths+0.034
ience+0.034
+0.032
Neg logit contributions
 nod-0.026
 name-0.024
 agreement-0.023
 understand-0.022
 say-0.020
Token p
Token ID279
Feature activation-3.54e-02

Pos logit contributions
ways+0.001
pill+0.001
ths+0.001
ience+0.001
qu+0.001
Neg logit contributions
 nod-0.001
 name-0.001
 agreement-0.001
 understand-0.001
 head-0.001
Tokenaj
Token ID1228
Feature activation-6.64e-02

Pos logit contributions
pill+0.046
ths+0.046
ways+0.045
ience+0.043
 lifetime+0.041
Neg logit contributions
 nod-0.075
 name-0.073
 agreement-0.064
 head-0.063
 understand-0.062
Tokenamas
Token ID17485
Feature activation-4.48e-03

Pos logit contributions
pill+0.114
ways+0.110
ience+0.109
ths+0.106
endant+0.105
Neg logit contributions
 nod-0.112
 name-0.104
 agreement-0.098
 understand-0.092
 head-0.090
Token from
Token ID422
Feature activation+2.26e-03

Pos logit contributions
ways+0.005
pill+0.005
ths+0.005
ience+0.005
+0.005
Neg logit contributions
 nod-0.010
 agreement-0.009
 name-0.009
 understand-0.009
 gra-0.008
Token the
Token ID262
Feature activation-2.84e-02

Pos logit contributions
 nod+0.004
 agreement+0.004
 name+0.003
 understand+0.003
 head+0.003
Neg logit contributions
pill-0.004
ways-0.004
ience-0.004
ths-0.004
endant-0.004
Token drawer
Token ID33451
Feature activation-1.60e-02

Pos logit contributions
ways+0.046
pill+0.045
ience+0.043
ths+0.043
+0.042
Neg logit contributions
 nod-0.050
 agreement-0.045
 understand-0.044
 name-0.044
 gra-0.037
Token."
Token ID526
Feature activation+6.17e-03

Pos logit contributions
pill+0.025
ths+0.024
ways+0.023
ience+0.023
+0.022
Neg logit contributions
 nod-0.030
 name-0.027
 agreement-0.027
 understand-0.026
 head-0.025
Token with
Token ID351
Feature activation+5.53e-03

Pos logit contributions
 nod+0.002
 agreement+0.002
 name+0.002
 understand+0.002
 gra+0.001
Neg logit contributions
ways-0.002
pill-0.002
ths-0.002
ience-0.002
ar-0.002
Token your
Token ID534
Feature activation+8.20e-03

Pos logit contributions
 nod+0.012
 name+0.011
 agreement+0.011
 understand+0.011
 say+0.010
Neg logit contributions
ways-0.007
pill-0.006
-0.006
ths-0.006
 lifetime-0.006
Token lion
Token ID18744
Feature activation+7.40e-03

Pos logit contributions
 nod+0.012
 name+0.011
 agreement+0.011
 understand+0.010
 head+0.009
Neg logit contributions
ways-0.017
pill-0.015
ths-0.015
-0.015
ience-0.015
Token.
Token ID13
Feature activation-6.66e-03

Pos logit contributions
 nod+0.013
 name+0.012
 agreement+0.012
 understand+0.011
 head+0.010
Neg logit contributions
ways-0.012
pill-0.012
ths-0.012
-0.012
ience-0.011
Token And
Token ID843
Feature activation-2.02e-02

Pos logit contributions
pill+0.010
ways+0.010
ience+0.010
oh+0.009
+0.009
Neg logit contributions
 nod-0.012
 agreement-0.011
 understand-0.011
 name-0.010
 gra-0.009
Token we
Token ID356
Feature activation+4.01e-02

Pos logit contributions
ways+0.020
pill+0.019
ths+0.018
ience+0.018
ar+0.017
Neg logit contributions
 nod-0.049
 agreement-0.045
 name-0.045
 understand-0.044
 gra-0.040
Token can
Token ID460
Feature activation+1.07e-02

Pos logit contributions
 nod+0.061
 name+0.054
 agreement+0.054
 understand+0.053
 gra+0.043
Neg logit contributions
ways-0.077
pill-0.076
ths-0.074
ience-0.072
-0.070
Token be
Token ID307
Feature activation+6.43e-03

Pos logit contributions
 nod+0.014
 agreement+0.012
 name+0.011
 understand+0.011
 gra+0.009
Neg logit contributions
ways-0.021
pill-0.021
ths-0.021
-0.021
qu-0.021
Token friends
Token ID2460
Feature activation+5.82e-04

Pos logit contributions
 nod+0.011
 name+0.010
 agreement+0.010
 understand+0.009
 head+0.008
Neg logit contributions
ways-0.012
pill-0.012
-0.011
ths-0.011
ience-0.011
Token again
Token ID757
Feature activation-1.24e-02

Pos logit contributions
 nod+0.001
 agreement+0.001
 name+0.001
 understand+0.001
 gra+0.001
Neg logit contributions
ways-0.001
pill-0.001
ience-0.001
ths-0.001
ar-0.001
Token."
Token ID526
Feature activation+4.81e-03

Pos logit contributions
ways+0.017
pill+0.017
ths+0.016
ience+0.015
+0.015
Neg logit contributions
 nod-0.026
 agreement-0.024
 name-0.023
 understand-0.023
 head-0.020
Token happened
Token ID3022
Feature activation-1.39e-02

Pos logit contributions
ways+0.012
pill+0.012
ths+0.011
ience+0.011
+0.011
Neg logit contributions
 nod-0.024
 agreement-0.023
 name-0.022
 understand-0.022
 head-0.020
Token.
Token ID13
Feature activation+1.54e-02

Pos logit contributions
ways+0.018
pill+0.017
ths+0.017
ience+0.016
oh+0.016
Neg logit contributions
 nod-0.028
 agreement-0.026
 name-0.026
 understand-0.025
 gra-0.024
Token Her
Token ID2332
Feature activation+6.82e-03

Pos logit contributions
 nod+0.026
 agreement+0.025
 understand+0.023
 name+0.022
 gra+0.021
Neg logit contributions
ways-0.025
pill-0.024
-0.024
endant-0.023
ths-0.022
Token mom
Token ID1995
Feature activation-3.68e-02

Pos logit contributions
 nod+0.010
 agreement+0.010
 gra+0.009
 understand+0.009
rated+0.007
Neg logit contributions
endant-0.012
ths-0.011
-0.011
ways-0.011
 seas-0.010
Token did
Token ID750
Feature activation-1.37e-03

Pos logit contributions
ways+0.061
pill+0.060
ths+0.058
ience+0.058
+0.056
Neg logit contributions
 nod-0.064
 agreement-0.058
 name-0.057
 understand-0.056
 gra-0.048
Token not
Token ID407
Feature activation+4.19e-02

Pos logit contributions
ways+0.002
ths+0.002
+0.002
pill+0.002
ience+0.002
Neg logit contributions
 nod-0.003
 agreement-0.002
 name-0.002
 understand-0.002
 gra-0.002
Token believe
Token ID1975
Feature activation-7.38e-03

Pos logit contributions
 nod+0.051
 agreement+0.047
 name+0.044
 understand+0.039
 gra+0.039
Neg logit contributions
ways-0.085
pill-0.081
oh-0.081
qu-0.080
ar-0.080
Token her
Token ID607
Feature activation-6.85e-03

Pos logit contributions
ways+0.014
pill+0.014
ths+0.014
ience+0.014
oh+0.013
Neg logit contributions
 nod-0.011
 agreement-0.009
 name-0.009
 understand-0.009
 gra-0.008
Token.
Token ID13
Feature activation+1.20e-02

Pos logit contributions
ways+0.010
pill+0.010
ths+0.010
ience+0.010
+0.009
Neg logit contributions
 nod-0.013
 agreement-0.012
 name-0.012
 understand-0.011
 gra-0.010
Token She
Token ID1375
Feature activation-1.73e-02

Pos logit contributions
 nod+0.019
 agreement+0.017
 understand+0.016
 name+0.016
 gra+0.014
Neg logit contributions
ways-0.022
pill-0.021
-0.020
endant-0.020
ths-0.020
Token thought
Token ID1807
Feature activation+1.22e-02

Pos logit contributions
ways+0.025
pill+0.025
ths+0.024
ience+0.024
+0.023
Neg logit contributions
 nod-0.033
 agreement-0.031
 name-0.030
 understand-0.029
 gra-0.026
Token a
Token ID257
Feature activation+1.39e-03

Pos logit contributions
ways+0.013
pill+0.013
endant+0.013
+0.013
ar+0.013
Neg logit contributions
 nod-0.013
 understand-0.013
 name-0.013
 agreement-0.012
 head-0.011
Token mess
Token ID2085
Feature activation+3.25e-03

Pos logit contributions
 nod+0.003
 agreement+0.002
 name+0.002
 understand+0.002
 gra+0.002
Neg logit contributions
ways-0.002
pill-0.002
ths-0.002
ience-0.002
-0.002
Token like
Token ID588
Feature activation-2.17e-02

Pos logit contributions
 nod+0.006
 agreement+0.005
 name+0.005
 understand+0.005
 head+0.005
Neg logit contributions
pill-0.005
ways-0.005
ths-0.005
ience-0.005
endant-0.005
Token that
Token ID326
Feature activation-1.05e-02

Pos logit contributions
pill+0.033
ways+0.032
ar+0.032
ience+0.031
 lifetime+0.031
Neg logit contributions
 nod-0.038
 name-0.037
 agreement-0.036
 understand-0.035
 head-0.032
Token.
Token ID13
Feature activation-2.39e-02

Pos logit contributions
ways+0.016
pill+0.016
ths+0.016
ience+0.016
+0.015
Neg logit contributions
 nod-0.019
 agreement-0.017
 name-0.017
 understand-0.017
 gra-0.015
Token I
Token ID314
Feature activation+4.08e-02

Pos logit contributions
ways+0.040
pill+0.039
ths+0.038
ience+0.038
+0.036
Neg logit contributions
 nod-0.041
 agreement-0.037
 name-0.037
 understand-0.036
 gra-0.031
Token must
Token ID1276
Feature activation-2.11e-02

Pos logit contributions
 nod+0.062
 agreement+0.058
 name+0.053
 understand+0.050
 gra+0.047
Neg logit contributions
ways-0.072
pill-0.071
ience-0.070
oh-0.069
ar-0.069
Token punish
Token ID6878
Feature activation-7.41e-03

Pos logit contributions
ways+0.038
ths+0.037
oh+0.037
+0.037
 least+0.037
Neg logit contributions
 nod-0.029
 agreement-0.026
 name-0.024
 gra-0.023
 reins-0.023
Token you
Token ID345
Feature activation+1.06e-02

Pos logit contributions
pill+0.006
ways+0.006
ience+0.006
+0.005
ths+0.005
Neg logit contributions
 nod-0.019
 agreement-0.018
 name-0.018
 understand-0.018
 head-0.016
Token."
Token ID526
Feature activation-7.52e-03

Pos logit contributions
 nod+0.021
 agreement+0.019
 name+0.019
 understand+0.018
 head+0.016
Neg logit contributions
ways-0.016
pill-0.016
ths-0.015
ience-0.015
-0.014
Token So
Token ID1406
Feature activation-5.33e-03

Pos logit contributions
pill+0.013
ience+0.013
ar+0.013
ths+0.012
ways+0.012
Neg logit contributions
 nod-0.013
 name-0.012
 understand-0.011
 agreement-0.011
 whine-0.009
Token He
Token ID679
Feature activation+7.47e-04

Pos logit contributions
 nod+0.017
 agreement+0.015
 name+0.015
 understand+0.014
 gra+0.012
Neg logit contributions
ways-0.019
pill-0.019
ths-0.019
ience-0.018
-0.018
Token was
Token ID373
Feature activation+1.01e-02

Pos logit contributions
 nod+0.001
 understand+0.001
 name+0.001
 agreement+0.001
 head+0.001
Neg logit contributions
pill-0.001
ways-0.001
-0.001
 lifetime-0.001
ths-0.001
Token sad
Token ID6507
Feature activation-4.48e-03

Pos logit contributions
 nod+0.018
 agreement+0.017
 name+0.017
 understand+0.016
 head+0.014
Neg logit contributions
ways-0.016
pill-0.016
ths-0.015
ience-0.015
-0.015
Token and
Token ID290
Feature activation-1.16e-02

Pos logit contributions
pill+0.010
ways+0.010
ience+0.010
ths+0.009
qu+0.009
Neg logit contributions
 nod-0.006
 understand-0.005
 name-0.005
 agreement-0.005
 head-0.004
Token didn
Token ID1422
Feature activation+6.49e-03

Pos logit contributions
ways+0.017
pill+0.016
ths+0.016
ience+0.016
+0.015
Neg logit contributions
 nod-0.023
 agreement-0.021
 name-0.021
 understand-0.020
 gra-0.018
Token't
Token ID470
Feature activation+5.57e-02

Pos logit contributions
 nod+0.016
 name+0.015
 understand+0.015
 agreement+0.015
 gra+0.013
Neg logit contributions
ways-0.006
pill-0.006
ience-0.006
ths-0.006
 lifetime-0.005
Token know
Token ID760
Feature activation-1.88e-02

Pos logit contributions
 nod+0.091
 agreement+0.082
 gra+0.077
 name+0.077
 reins+0.070
Neg logit contributions
ways-0.085
ha-0.083
qu-0.080
endant-0.080
st-0.080
Token what
Token ID644
Feature activation-2.98e-03

Pos logit contributions
ways+0.032
pill+0.032
ths+0.031
ience+0.030
+0.029
Neg logit contributions
 nod-0.032
 agreement-0.029
 name-0.028
 understand-0.027
 gra-0.024
Token to
Token ID284
Feature activation-2.37e-03

Pos logit contributions
ways+0.004
pill+0.004
ths+0.004
ience+0.004
+0.004
Neg logit contributions
 nod-0.006
 name-0.005
 agreement-0.005
 understand-0.005
 gra-0.005
Token do
Token ID466
Feature activation-1.33e-02

Pos logit contributions
oh+0.005
qu+0.005
ways+0.005
 you+0.005
ths+0.005
Neg logit contributions
 nod-0.003
 agreement-0.002
 name-0.002
 reins-0.002
 gra-0.002
Token.
Token ID13
Feature activation+1.46e-02

Pos logit contributions
pill+0.028
ience+0.027
endant+0.026
ths+0.026
ways+0.026
Neg logit contributions
 nod-0.018
 agreement-0.016
 understand-0.016
 name-0.016
 head-0.013
Token said
Token ID531
Feature activation-1.29e-02

Pos logit contributions
 nod+0.029
 name+0.026
 agreement+0.025
 understand+0.025
 head+0.022
Neg logit contributions
ways-0.027
pill-0.027
ths-0.026
-0.026
ience-0.025
Token with
Token ID351
Feature activation+1.76e-02

Pos logit contributions
pill+0.021
ways+0.021
ths+0.020
ience+0.020
+0.019
Neg logit contributions
 nod-0.024
 agreement-0.021
 name-0.021
 understand-0.020
 head-0.018
Token a
Token ID257
Feature activation+1.74e-02

Pos logit contributions
 nod+0.035
 agreement+0.031
 name+0.030
 understand+0.030
 whine+0.029
Neg logit contributions
pill-0.023
-0.021
endant-0.021
ways-0.021
ience-0.021
Token ch
Token ID442
Feature activation+3.56e-02

Pos logit contributions
 agreement+0.023
 nod+0.022
 understand+0.019
 name+0.019
 whine+0.019
Neg logit contributions
-0.029
endant-0.029
 transformed-0.029
 lifetime-0.029
pill-0.028
Tokenir
Token ID343
Feature activation+2.79e-02

Pos logit contributions
 nod+0.086
 agreement+0.080
 name+0.079
 understand+0.077
 gra+0.070
Neg logit contributions
ways-0.036
pill-0.035
ths-0.033
ience-0.032
-0.031
Tokenp
Token ID79
Feature activation+4.85e-02

Pos logit contributions
 nod+0.048
 agreement+0.043
 name+0.043
 understand+0.040
 whine+0.039
Neg logit contributions
pill-0.043
ways-0.043
ience-0.041
qu-0.041
ar-0.041
Token.
Token ID13
Feature activation+1.60e-02

Pos logit contributions
 nod+0.096
 agreement+0.093
 name+0.085
 understand+0.085
 whine+0.081
Neg logit contributions
ways-0.059
ar-0.059
endant-0.057
pill-0.056
oh-0.056
Token
Token ID198
Feature activation+7.50e-03

Pos logit contributions
 nod+0.016
 name+0.013
 agreement+0.012
 understand+0.011
 head+0.008
Neg logit contributions
ths-0.039
ways-0.039
pill-0.039
ience-0.038
 lifetime-0.037
Token
Token ID198
Feature activation-1.92e-02

Pos logit contributions
 nod+0.011
 name+0.010
 agreement+0.010
 understand+0.010
 gra+0.008
Neg logit contributions
ways-0.014
pill-0.014
ths-0.014
ience-0.014
-0.013
TokenBut
Token ID1537
Feature activation+1.96e-02

Pos logit contributions
pill+0.032
ways+0.031
endant+0.031
+0.030
ths+0.030
Neg logit contributions
 nod-0.033
 agreement-0.030
 understand-0.029
 name-0.028
 head-0.023
Token Jack
Token ID3619
Feature activation-1.27e-02

Pos logit contributions
 nod+0.034
 agreement+0.032
 name+0.029
 understand+0.029
 head+0.027
Neg logit contributions
pill-0.039
ways-0.034
endant-0.034
qu-0.033
unn-0.032
Token tasty
Token ID25103
Feature activation-3.65e-03

Pos logit contributions
 nod+0.032
 agreement+0.030
 name+0.029
 understand+0.028
 gra+0.027
Neg logit contributions
ways-0.021
pill-0.020
ths-0.020
ience-0.020
oh-0.018
Token.
Token ID13
Feature activation+1.03e-02

Pos logit contributions
ways+0.006
pill+0.006
ths+0.005
ience+0.005
+0.005
Neg logit contributions
 nod-0.007
 agreement-0.006
 name-0.006
 understand-0.006
 gra-0.005
Token They
Token ID1119
Feature activation+1.17e-02

Pos logit contributions
 nod+0.016
 agreement+0.015
 understand+0.014
 name+0.013
 gra+0.013
Neg logit contributions
pill-0.018
ways-0.018
-0.017
ience-0.016
endant-0.016
Token said
Token ID531
Feature activation-2.10e-02

Pos logit contributions
 nod+0.022
 agreement+0.020
 name+0.020
 understand+0.019
 gra+0.017
Neg logit contributions
ways-0.018
pill-0.017
ience-0.017
ths-0.017
-0.016
Token thank
Token ID5875
Feature activation-2.50e-02

Pos logit contributions
ways+0.028
pill+0.027
ths+0.026
+0.025
ience+0.024
Neg logit contributions
 nod-0.046
 name-0.041
 agreement-0.041
 understand-0.040
 whine-0.035
Token you
Token ID345
Feature activation+6.82e-03

Pos logit contributions
ways+0.015
pill+0.015
ths+0.013
ience+0.013
endant+0.012
Neg logit contributions
 nod-0.070
 agreement-0.066
 name-0.065
 understand-0.064
 gra-0.059
Token to
Token ID284
Feature activation+4.05e-03

Pos logit contributions
 nod+0.011
 agreement+0.010
 name+0.010
 understand+0.010
 head+0.008
Neg logit contributions
ways-0.012
pill-0.012
ths-0.012
ience-0.011
-0.011
Token Ben
Token ID3932
Feature activation+8.87e-03

Pos logit contributions
 nod+0.007
 name+0.007
 agreement+0.006
 understand+0.006
 gra+0.006
Neg logit contributions
 least-0.006
ths-0.006
ways-0.006
 you-0.006
-0.006
Token and
Token ID290
Feature activation+1.50e-02

Pos logit contributions
 nod+0.015
 agreement+0.013
 name+0.013
 understand+0.013
 gra+0.011
Neg logit contributions
ways-0.015
pill-0.015
ience-0.014
ths-0.014
ar-0.014
Token smiled
Token ID13541
Feature activation-5.24e-03

Pos logit contributions
 nod+0.025
 agreement+0.023
 name+0.022
 understand+0.021
 gra+0.019
Neg logit contributions
ways-0.024
pill-0.024
ience-0.024
ths-0.023
ar-0.023
Token.
Token ID13
Feature activation+1.55e-02

Pos logit contributions
ways+0.005
ience+0.005
ths+0.005
ar+0.005
 lifetime+0.005
Neg logit contributions
 nod-0.011
 gra-0.011
 agreement-0.010
 whine-0.010
 understand-0.010
Token liked
Token ID8288
Feature activation-1.46e-02

Pos logit contributions
 nod+0.023
 agreement+0.021
 name+0.021
 understand+0.020
 gra+0.018
Neg logit contributions
ways-0.018
pill-0.017
ths-0.016
oh-0.016
ience-0.016
Token to
Token ID284
Feature activation-1.32e-03

Pos logit contributions
ways+0.019
pill+0.019
ths+0.018
ience+0.017
+0.017
Neg logit contributions
 nod-0.030
 agreement-0.028
 name-0.028
 understand-0.027
 gra-0.024
Token pretend
Token ID16614
Feature activation-3.16e-02

Pos logit contributions
ways+0.002
+0.002
ar+0.002
 seas+0.002
 arrest+0.002
Neg logit contributions
 agreement-0.002
 nod-0.002
 lull-0.002
 Amen-0.002
 name-0.002
Token they
Token ID484
Feature activation-1.68e-02

Pos logit contributions
ience+0.052
pill+0.051
+0.051
ths+0.049
ways+0.048
Neg logit contributions
 nod-0.055
 name-0.051
 understand-0.049
 agreement-0.047
 say-0.043
Token were
Token ID547
Feature activation-3.61e-02

Pos logit contributions
ways+0.027
pill+0.026
ths+0.025
ience+0.025
+0.024
Neg logit contributions
 nod-0.030
 agreement-0.028
 name-0.027
 understand-0.026
 gra-0.023
Token heroes
Token ID10281
Feature activation+8.34e-03

Pos logit contributions
ways+0.062
pill+0.062
ths+0.060
ience+0.060
+0.058
Neg logit contributions
 nod-0.060
 name-0.054
 agreement-0.054
 understand-0.052
 gra-0.045
Token and
Token ID290
Feature activation-1.78e-02

Pos logit contributions
 nod+0.014
 agreement+0.013
 name+0.013
 understand+0.012
 gra+0.010
Neg logit contributions
ways-0.014
pill-0.014
ths-0.014
ience-0.013
-0.013
Token villains
Token ID25239
Feature activation-2.19e-04

Pos logit contributions
ways+0.032
pill+0.031
ths+0.029
ience+0.029
 lifetime+0.029
Neg logit contributions
 nod-0.028
 agreement-0.026
 name-0.025
 understand-0.023
 gra-0.021
Token.
Token ID13
Feature activation+3.84e-03

Pos logit contributions
pill+0.000
ths+0.000
ways+0.000
ience+0.000
+0.000
Neg logit contributions
 nod-0.000
 name-0.000
 agreement-0.000
 understand-0.000
 head-0.000
Token Ben
Token ID3932
Feature activation-1.59e-02

Pos logit contributions
 nod+0.007
 understand+0.006
 agreement+0.006
 name+0.006
 gra+0.005
Neg logit contributions
ways-0.006
pill-0.006
-0.006
ience-0.006
ar-0.005
Token had
Token ID550
Feature activation-2.07e-02

Pos logit contributions
ways+0.028
pill+0.028
ths+0.028
+0.027
ience+0.026
Neg logit contributions
 nod-0.026
 name-0.023
 understand-0.022
 agreement-0.022
 head-0.020
Token stuck
Token ID7819
Feature activation+2.97e-03

Pos logit contributions
 nod+0.004
 agreement+0.004
 name+0.004
 understand+0.004
 head+0.003
Neg logit contributions
pill-0.003
ways-0.003
ience-0.003
ths-0.003
-0.003
Token.
Token ID13
Feature activation+1.19e-02

Pos logit contributions
 nod+0.005
 agreement+0.005
 name+0.005
 understand+0.004
 head+0.004
Neg logit contributions
pill-0.005
ways-0.005
ience-0.005
ths-0.005
-0.004
Token
Token ID198
Feature activation+1.88e-02

Pos logit contributions
 nod+0.016
 name+0.014
 agreement+0.014
 understand+0.013
 head+0.011
Neg logit contributions
ways-0.024
pill-0.024
ths-0.024
ience-0.023
endant-0.023
Token
Token ID198
Feature activation-1.18e-02

Pos logit contributions
 nod+0.032
 agreement+0.029
 name+0.028
 understand+0.027
 head+0.024
Neg logit contributions
ways-0.032
pill-0.031
ths-0.030
ience-0.030
-0.029
TokenMax
Token ID11518
Feature activation-4.40e-03

Pos logit contributions
ways+0.023
pill+0.023
ths+0.022
ience+0.022
+0.021
Neg logit contributions
 nod-0.017
 agreement-0.015
 name-0.015
 understand-0.014
 gra-0.012
Token was
Token ID373
Feature activation+8.14e-03

Pos logit contributions
pill+0.009
+0.009
ways+0.009
endant+0.008
igsaw+0.008
Neg logit contributions
 nod-0.006
 understand-0.006
 name-0.006
 head-0.005
 agreement-0.005
Token scared
Token ID12008
Feature activation+3.86e-03

Pos logit contributions
 nod+0.015
 agreement+0.014
 name+0.014
 understand+0.014
 head+0.012
Neg logit contributions
pill-0.012
ways-0.012
ience-0.012
ths-0.012
-0.011
Token and
Token ID290
Feature activation-4.54e-03

Pos logit contributions
 nod+0.006
 agreement+0.005
 understand+0.005
 name+0.005
 head+0.004
Neg logit contributions
pill-0.008
ways-0.007
endant-0.007
-0.007
ience-0.007
Token didn
Token ID1422
Feature activation-8.10e-04

Pos logit contributions
pill+0.007
ways+0.007
ths+0.006
ience+0.006
+0.006
Neg logit contributions
 nod-0.009
 agreement-0.008
 name-0.008
 understand-0.008
 head-0.007
Token't
Token ID470
Feature activation+4.63e-02

Pos logit contributions
ways+0.001
pill+0.001
ience+0.001
ths+0.001
endant+0.001
Neg logit contributions
 nod-0.002
 name-0.002
 understand-0.002
 agreement-0.002
 gra-0.002
Token know
Token ID760
Feature activation-2.17e-02

Pos logit contributions
 nod+0.071
 agreement+0.069
 name+0.062
 gra+0.060
 whine+0.055
Neg logit contributions
ways-0.073
ha-0.073
ience-0.068
oh-0.068
qu-0.068
Token old
Token ID1468
Feature activation+1.15e-02

Pos logit contributions
 nod+0.011
 agreement+0.010
 name+0.010
 understand+0.009
 gra+0.009
Neg logit contributions
ways-0.009
ths-0.008
ience-0.008
pill-0.008
 lifetime-0.008
Token man
Token ID582
Feature activation-8.18e-03

Pos logit contributions
 agreement+0.016
 nod+0.016
 name+0.015
 understand+0.015
 gra+0.014
Neg logit contributions
ways-0.019
ths-0.019
ience-0.018
ar-0.018
lyss-0.018
Token told
Token ID1297
Feature activation+1.55e-02

Pos logit contributions
ways+0.013
pill+0.013
ths+0.013
ience+0.012
+0.012
Neg logit contributions
 nod-0.015
 agreement-0.013
 name-0.013
 understand-0.013
 head-0.011
Token him
Token ID683
Feature activation+1.13e-02

Pos logit contributions
 nod+0.021
 agreement+0.019
 name+0.018
 understand+0.018
 gra+0.015
Neg logit contributions
ways-0.032
pill-0.032
endant-0.030
ience-0.030
ths-0.030
Token the
Token ID262
Feature activation+1.98e-03

Pos logit contributions
 nod+0.025
 name+0.023
 understand+0.023
 agreement+0.022
 head+0.019
Neg logit contributions
pill-0.014
ways-0.014
-0.014
ths-0.013
endant-0.013
Token truth
Token ID3872
Feature activation+1.57e-02

Pos logit contributions
 nod+0.003
 agreement+0.003
 name+0.003
 understand+0.003
 gra+0.002
Neg logit contributions
pill-0.003
ways-0.003
ience-0.003
ths-0.003
 lifetime-0.003
Token,
Token ID11
Feature activation+1.31e-02

Pos logit contributions
 nod+0.028
 agreement+0.026
 name+0.025
 understand+0.024
 gra+0.021
Neg logit contributions
ways-0.025
pill-0.025
ths-0.024
ience-0.024
ar-0.023
Token but
Token ID475
Feature activation+2.22e-02

Pos logit contributions
 nod+0.021
 agreement+0.019
 name+0.019
 understand+0.018
 gra+0.016
Neg logit contributions
ways-0.023
pill-0.023
ths-0.022
ience-0.022
-0.021
Token then
Token ID788
Feature activation-1.49e-02

Pos logit contributions
 nod+0.044
 agreement+0.040
 name+0.039
 understand+0.038
 gra+0.034
Neg logit contributions
ways-0.032
pill-0.031
ths-0.030
ience-0.030
-0.029
Token it
Token ID340
Feature activation-1.11e-02

Pos logit contributions
pill+0.027
ways+0.025
+0.025
endant+0.025
qu+0.024
Neg logit contributions
 nod-0.027
 understand-0.023
 name-0.023
 agreement-0.022
 head-0.019
Token began
Token ID2540
Feature activation-1.91e-02

Pos logit contributions
ways+0.017
pill+0.017
ths+0.016
ience+0.016
+0.016
Neg logit contributions
 nod-0.021
 agreement-0.019
 name-0.019
 understand-0.018
 gra-0.016
Token
Token ID198
Feature activation+1.51e-02

Pos logit contributions
 nod+0.012
 agreement+0.011
 name+0.011
 understand+0.010
 gra+0.009
Neg logit contributions
ways-0.015
pill-0.014
ths-0.014
ience-0.014
endant-0.013
Token
Token ID198
Feature activation-9.37e-03

Pos logit contributions
 nod+0.029
 agreement+0.026
 head+0.024
 name+0.024
 whine+0.024
Neg logit contributions
pill-0.023
ways-0.021
oh-0.021
ha-0.021
qu-0.021
TokenFrom
Token ID4863
Feature activation-2.29e-03

Pos logit contributions
pill+0.014
ways+0.014
qu+0.013
oh+0.013
ths+0.013
Neg logit contributions
 nod-0.018
 agreement-0.016
 name-0.016
 understand-0.015
 head-0.015
Token that
Token ID326
Feature activation+2.66e-02

Pos logit contributions
ways+0.002
pill+0.002
ience+0.002
ths+0.002
+0.002
Neg logit contributions
 nod-0.006
 agreement-0.005
 name-0.005
 understand-0.005
 head-0.005
Token day
Token ID1110
Feature activation+9.95e-04

Pos logit contributions
 nod+0.046
 agreement+0.041
 name+0.040
 understand+0.039
 gra+0.034
Neg logit contributions
ways-0.044
pill-0.044
ths-0.043
ience-0.043
-0.041
Token,
Token ID11
Feature activation+6.74e-03

Pos logit contributions
 nod+0.001
 agreement+0.001
 name+0.001
 understand+0.001
 head+0.001
Neg logit contributions
ways-0.002
pill-0.002
ths-0.002
ience-0.002
endant-0.002
Token Tim
Token ID5045
Feature activation+2.80e-03

Pos logit contributions
 nod+0.011
 name+0.010
 agreement+0.010
 understand+0.010
 gra+0.008
Neg logit contributions
ways-0.012
endant-0.012
pill-0.012
ience-0.011
ocus-0.011
Token and
Token ID290
Feature activation+3.69e-02

Pos logit contributions
 nod+0.004
 name+0.004
 agreement+0.004
 understand+0.004
 head+0.003
Neg logit contributions
ways-0.005
pill-0.005
ths-0.005
endant-0.005
-0.005
Token Sam
Token ID3409
Feature activation+5.99e-03

Pos logit contributions
 nod+0.074
 agreement+0.069
 name+0.068
 understand+0.064
 reins+0.061
Neg logit contributions
ths-0.046
pill-0.044
ways-0.043
 least-0.042
key-0.041
Token shared
Token ID4888
Feature activation-1.36e-02

Pos logit contributions
 nod+0.010
 understand+0.009
 agreement+0.009
 name+0.009
 gra+0.007
Neg logit contributions
ways-0.010
pill-0.010
ths-0.010
endant-0.010
ience-0.010
Token their
Token ID511
Feature activation+7.14e-03

Pos logit contributions
ways+0.022
pill+0.021
ths+0.021
 seas+0.021
ience+0.021
Neg logit contributions
 nod-0.023
 agreement-0.020
 name-0.019
 understand-0.019
 whine-0.018
Token applaud
Token ID22517
Feature activation-1.28e-02

Pos logit contributions
 nod+0.002
 agreement+0.002
 name+0.002
 understand+0.002
 gra+0.002
Neg logit contributions
ways-0.002
ths-0.001
pill-0.001
ience-0.001
-0.001
Tokening
Token ID278
Feature activation+3.07e-02

Pos logit contributions
ways+0.018
pill+0.018
ths+0.017
ience+0.017
ar+0.016
Neg logit contributions
 nod-0.026
 agreement-0.023
 name-0.023
 understand-0.023
 gra-0.020
Token.
Token ID13
Feature activation+1.32e-03

Pos logit contributions
 nod+0.050
 agreement+0.045
 name+0.044
 understand+0.043
 whine+0.037
Neg logit contributions
ways-0.054
pill-0.053
ience-0.052
ths-0.051
ar-0.050
Token He
Token ID679
Feature activation-4.27e-03

Pos logit contributions
 nod+0.002
 name+0.002
 agreement+0.001
 understand+0.001
 gra+0.001
Neg logit contributions
ths-0.003
ience-0.003
pill-0.003
ways-0.003
endant-0.003
Token started
Token ID2067
Feature activation-1.09e-02

Pos logit contributions
pill+0.008
ways+0.008
+0.007
endant+0.007
ths+0.007
Neg logit contributions
 nod-0.007
 name-0.006
 understand-0.006
 agreement-0.006
 head-0.005
Token to
Token ID284
Feature activation-6.96e-03

Pos logit contributions
ways+0.017
pill+0.017
ience+0.016
ths+0.016
+0.016
Neg logit contributions
 nod-0.020
 name-0.019
 agreement-0.018
 understand-0.018
 gra-0.016
Token applaud
Token ID22517
Feature activation-8.09e-03

Pos logit contributions
ways+0.013
 least+0.013
pill+0.013
qu+0.013
ths+0.013
Neg logit contributions
 nod-0.008
 agreement-0.007
 name-0.007
 understand-0.006
 reins-0.006
Token too
Token ID1165
Feature activation+8.79e-03

Pos logit contributions
ways+0.013
pill+0.012
ths+0.012
ience+0.012
ar+0.011
Neg logit contributions
 nod-0.015
 agreement-0.014
 name-0.013
 understand-0.013
 whine-0.012
Token.
Token ID13
Feature activation+6.43e-03

Pos logit contributions
 nod+0.013
 name+0.012
 understand+0.012
 agreement+0.011
 answer+0.009
Neg logit contributions
endant-0.018
pill-0.017
uch-0.017
lyss-0.016
ths-0.016
Token Then
Token ID3244
Feature activation-1.15e-02

Pos logit contributions
 nod+0.009
 name+0.008
 agreement+0.007
 understand+0.007
 gra+0.006
Neg logit contributions
ths-0.013
ience-0.013
pill-0.013
ways-0.013
endant-0.013
Token,
Token ID11
Feature activation+5.72e-03

Pos logit contributions
pill+0.023
endant+0.021
+0.020
ways+0.019
qu+0.019
Neg logit contributions
 nod-0.021
 name-0.018
 understand-0.018
 agreement-0.017
 head-0.015
Token Just
Token ID2329
Feature activation+5.38e-03

Pos logit contributions
ways+0.029
pill+0.029
ths+0.027
ience+0.027
+0.027
Neg logit contributions
 nod-0.034
 agreement-0.031
 name-0.030
 understand-0.029
 gra-0.026
Token get
Token ID651
Feature activation-1.35e-02

Pos logit contributions
 nod+0.007
 agreement+0.007
 name+0.007
 understand+0.006
 gra+0.006
Neg logit contributions
 least-0.010
ths-0.009
oh-0.009
ience-0.009
-0.009
Token back
Token ID736
Feature activation-6.66e-03

Pos logit contributions
ways+0.028
pill+0.027
 lifetime+0.025
+0.024
qu+0.024
Neg logit contributions
 nod-0.020
 name-0.018
 understand-0.018
 agreement-0.017
 say-0.016
Token up
Token ID510
Feature activation-1.33e-02

Pos logit contributions
pill+0.012
ways+0.012
+0.011
ience+0.011
ths+0.011
Neg logit contributions
 nod-0.011
 name-0.010
 agreement-0.009
 understand-0.009
 head-0.008
Token and
Token ID290
Feature activation-1.98e-02

Pos logit contributions
pill+0.029
ways+0.028
endant+0.028
ths+0.027
ar+0.027
Neg logit contributions
 nod-0.016
 name-0.014
 agreement-0.014
 understand-0.014
 head-0.012
Token try
Token ID1949
Feature activation-1.28e-02

Pos logit contributions
ths+0.034
ways+0.033
ience+0.033
pill+0.032
oh+0.032
Neg logit contributions
 nod-0.031
 agreement-0.030
 name-0.028
 understand-0.027
 gra-0.024
Token again
Token ID757
Feature activation-2.08e-02

Pos logit contributions
pill+0.021
ways+0.021
ths+0.020
+0.019
ar+0.019
Neg logit contributions
 nod-0.023
 name-0.021
 understand-0.020
 agreement-0.020
 head-0.017
Token."
Token ID526
Feature activation+4.70e-03

Pos logit contributions
pill+0.032
ways+0.031
ths+0.029
+0.029
ience+0.029
Neg logit contributions
 nod-0.040
 name-0.036
 agreement-0.036
 understand-0.036
 head-0.031
Token
Token ID198
Feature activation+1.10e-02

Pos logit contributions
 nod+0.007
 name+0.006
 agreement+0.006
 understand+0.006
 gra+0.005
Neg logit contributions
ways-0.009
pill-0.009
ths-0.009
ience-0.009
 lifetime-0.009
Token
Token ID198
Feature activation-1.45e-02

Pos logit contributions
 nod+0.020
 agreement+0.018
 name+0.018
 understand+0.017
 whine+0.016
Neg logit contributions
pill-0.016
ways-0.016
oh-0.015
-0.015
qu-0.015
TokenL
Token ID43
Feature activation+1.75e-02

Pos logit contributions
ways+0.023
pill+0.022
ths+0.021
ience+0.021
+0.020
Neg logit contributions
 nod-0.027
 agreement-0.024
 name-0.024
 understand-0.023
 gra-0.021
Token was
Token ID373
Feature activation-5.72e-03

Pos logit contributions
ways+0.036
pill+0.035
ths+0.034
ience+0.034
+0.033
Neg logit contributions
 nod-0.037
 agreement-0.033
 name-0.033
 understand-0.032
 gra-0.027
Token like
Token ID588
Feature activation-7.17e-04

Pos logit contributions
pill+0.010
ways+0.010
ths+0.010
+0.009
ience+0.009
Neg logit contributions
 nod-0.010
 name-0.009
 understand-0.009
 agreement-0.009
 say-0.008
Token or
Token ID393
Feature activation-3.04e-02

Pos logit contributions
pill+0.001
ths+0.001
ways+0.001
ience+0.001
endant+0.001
Neg logit contributions
 nod-0.001
 name-0.001
 agreement-0.001
 understand-0.001
 head-0.001
Token what
Token ID644
Feature activation-1.05e-02

Pos logit contributions
ways+0.051
pill+0.050
ths+0.049
ience+0.049
+0.047
Neg logit contributions
 nod-0.052
 agreement-0.047
 name-0.046
 understand-0.044
 gra-0.038
Token she
Token ID673
Feature activation-2.43e-02

Pos logit contributions
ways+0.013
pill+0.013
ths+0.012
ience+0.012
+0.012
Neg logit contributions
 nod-0.023
 name-0.021
 agreement-0.021
 understand-0.020
 gra-0.018
Token was
Token ID373
Feature activation-8.61e-03

Pos logit contributions
ways+0.043
pill+0.043
ths+0.041
ience+0.040
+0.040
Neg logit contributions
 nod-0.040
 name-0.036
 agreement-0.034
 understand-0.034
 head-0.029
Token thinking
Token ID3612
Feature activation-2.11e-02

Pos logit contributions
pill+0.015
+0.015
ths+0.014
ways+0.014
ience+0.014
Neg logit contributions
 nod-0.015
 name-0.014
 understand-0.014
 agreement-0.013
 head-0.012
Token.
Token ID13
Feature activation+6.52e-03

Pos logit contributions
ways+0.039
pill+0.038
ths+0.037
ience+0.037
+0.036
Neg logit contributions
 nod-0.033
 agreement-0.029
 name-0.029
 understand-0.028
 gra-0.023
Token
Token ID198
Feature activation+1.24e-02

Pos logit contributions
 nod+0.010
 agreement+0.009
 name+0.009
 understand+0.009
 gra+0.008
Neg logit contributions
ways-0.012
pill-0.011
ths-0.011
ience-0.011
-0.011
Token
Token ID198
Feature activation-2.05e-02

Pos logit contributions
 nod+0.022
 agreement+0.020
 name+0.020
 understand+0.019
 head+0.018
Neg logit contributions
ways-0.020
pill-0.019
ths-0.018
oh-0.018
-0.018
TokenOne
Token ID3198
Feature activation+1.83e-02

Pos logit contributions
ways+0.029
pill+0.029
ience+0.027
ths+0.027
+0.026
Neg logit contributions
 nod-0.041
 agreement-0.037
 name-0.037
 understand-0.036
 gra-0.032
Token called
Token ID1444
Feature activation-1.39e-02

Pos logit contributions
 nod+0.007
 agreement+0.007
 name+0.007
 understand+0.006
 head+0.006
Neg logit contributions
ways-0.007
pill-0.007
ths-0.006
ience-0.006
-0.006
Token Jenna
Token ID45377
Feature activation-2.50e-04

Pos logit contributions
ths+0.029
ience+0.029
pill+0.028
ways+0.028
 lifetime+0.028
Neg logit contributions
 nod-0.019
 name-0.018
 agreement-0.016
 understand-0.016
 whine-0.013
Token.
Token ID13
Feature activation-7.44e-03

Pos logit contributions
ways+0.000
pill+0.000
ths+0.000
ience+0.000
oh+0.000
Neg logit contributions
 nod-0.000
 agreement-0.000
 name-0.000
 understand-0.000
 gra-0.000
Token Jenna
Token ID45377
Feature activation-1.55e-02

Pos logit contributions
ways+0.012
pill+0.012
ths+0.011
ience+0.011
+0.011
Neg logit contributions
 nod-0.013
 agreement-0.012
 name-0.012
 understand-0.012
 gra-0.010
Token was
Token ID373
Feature activation+5.95e-03

Pos logit contributions
ways+0.033
pill+0.033
ths+0.032
ience+0.031
+0.031
Neg logit contributions
 nod-0.020
 name-0.017
 understand-0.017
 agreement-0.016
 head-0.014
Token very
Token ID845
Feature activation-6.09e-03

Pos logit contributions
 nod+0.011
 agreement+0.010
 name+0.010
 understand+0.010
 gra+0.009
Neg logit contributions
ways-0.009
pill-0.009
ths-0.008
ience-0.008
-0.008
Token scared
Token ID12008
Feature activation-1.21e-03

Pos logit contributions
ways+0.011
pill+0.011
ths+0.011
ience+0.011
ar+0.010
Neg logit contributions
 nod-0.010
 name-0.009
 agreement-0.008
 understand-0.008
 head-0.007
Token,
Token ID11
Feature activation+2.22e-02

Pos logit contributions
pill+0.002
ways+0.002
ience+0.002
ths+0.002
+0.002
Neg logit contributions
 nod-0.002
 agreement-0.002
 name-0.002
 understand-0.002
 head-0.002
Token because
Token ID780
Feature activation+2.15e-02

Pos logit contributions
 nod+0.039
 agreement+0.035
 name+0.035
 understand+0.034
 gra+0.029
Neg logit contributions
ways-0.036
pill-0.036
ience-0.035
ths-0.035
endant-0.034
Token she
Token ID673
Feature activation-1.33e-02

Pos logit contributions
 nod+0.046
 agreement+0.043
 name+0.042
 understand+0.041
 gra+0.037
Neg logit contributions
ways-0.027
pill-0.026
ths-0.025
ience-0.025
-0.024
Token was
Token ID373
Feature activation-6.69e-03

Pos logit contributions
ways+0.024
pill+0.023
ths+0.023
+0.022
qu+0.022
Neg logit contributions
 nod-0.021
 name-0.020
 understand-0.019
 agreement-0.017
 head-0.016
Token patch
Token ID8529
Feature activation+9.11e-03

Pos logit contributions
pill+0.008
ways+0.008
ths+0.008
ience+0.007
qu+0.007
Neg logit contributions
 nod-0.008
 agreement-0.007
 name-0.007
 understand-0.007
 head-0.006
Token of
Token ID286
Feature activation+4.12e-03

Pos logit contributions
 nod+0.018
 agreement+0.016
 name+0.016
 understand+0.016
 whine+0.014
Neg logit contributions
ways-0.013
pill-0.012
ience-0.012
ths-0.012
ar-0.012
Token red
Token ID2266
Feature activation+1.02e-02

Pos logit contributions
 nod+0.008
 agreement+0.007
 name+0.007
 understand+0.007
 whine+0.006
Neg logit contributions
pill-0.006
 seas-0.006
ways-0.006
ience-0.006
endant-0.006
Token flowers
Token ID12734
Feature activation+7.31e-03

Pos logit contributions
 nod+0.018
 agreement+0.016
 name+0.016
 understand+0.015
 gra+0.013
Neg logit contributions
ways-0.017
pill-0.017
ths-0.016
ience-0.016
-0.015
Token that
Token ID326
Feature activation-4.37e-03

Pos logit contributions
 nod+0.013
 name+0.011
 agreement+0.011
 understand+0.011
 head+0.010
Neg logit contributions
pill-0.012
ways-0.012
ths-0.012
ience-0.012
ar-0.011
Token were
Token ID547
Feature activation-1.73e-02

Pos logit contributions
pill+0.007
ways+0.007
ience+0.006
ths+0.006
+0.006
Neg logit contributions
 nod-0.008
 name-0.007
 agreement-0.007
 understand-0.007
 gra-0.006
Token dancing
Token ID15360
Feature activation+2.55e-03

Pos logit contributions
ways+0.024
pill+0.024
ths+0.023
ience+0.023
+0.022
Neg logit contributions
 nod-0.034
 agreement-0.031
 name-0.031
 understand-0.030
 gra-0.027
Token in
Token ID287
Feature activation-7.29e-03

Pos logit contributions
 nod+0.004
 name+0.004
 agreement+0.004
 understand+0.004
 head+0.003
Neg logit contributions
pill-0.004
ways-0.004
ths-0.004
ience-0.004
-0.004
Token the
Token ID262
Feature activation-3.18e-02

Pos logit contributions
pill+0.015
ways+0.014
ience+0.014
ths+0.014
ar+0.014
Neg logit contributions
 nod-0.010
 agreement-0.010
 name-0.009
 understand-0.009
 gra-0.007
Token breeze
Token ID28633
Feature activation-8.91e-04

Pos logit contributions
pill+0.057
ways+0.056
ths+0.055
ience+0.054
+0.053
Neg logit contributions
 nod-0.052
 agreement-0.047
 name-0.047
 understand-0.044
 head-0.039
Token.
Token ID13
Feature activation+1.02e-03

Pos logit contributions
ways+0.001
pill+0.001
ths+0.001
ience+0.001
+0.001
Neg logit contributions
 nod-0.002
 agreement-0.001
 name-0.001
 understand-0.001
 gra-0.001
Token Tommy
Token ID19919
Feature activation-2.21e-03

Pos logit contributions
 nod+0.008
 agreement+0.008
 understand+0.007
 name+0.007
 gra+0.006
Neg logit contributions
ways-0.007
pill-0.007
-0.007
ths-0.007
ience-0.006
Token asked
Token ID1965
Feature activation+2.79e-03

Pos logit contributions
ways+0.004
pill+0.004
ths+0.004
ience+0.004
+0.004
Neg logit contributions
 nod-0.003
 name-0.003
 understand-0.003
 agreement-0.003
 head-0.003
Token his
Token ID465
Feature activation+8.76e-03

Pos logit contributions
 nod+0.005
 agreement+0.005
 understand+0.005
 name+0.005
 gra+0.004
Neg logit contributions
ways-0.004
ar-0.004
ha-0.004
ience-0.003
oh-0.003
Token mum
Token ID25682
Feature activation-1.11e-02

Pos logit contributions
 nod+0.012
 agreement+0.010
 understand+0.010
 name+0.010
 gra+0.008
Neg logit contributions
ways-0.017
pill-0.017
ths-0.017
ience-0.017
-0.016
Token:
Token ID25
Feature activation+1.25e-02

Pos logit contributions
ways+0.015
pill+0.015
ience+0.015
ths+0.014
oh+0.014
Neg logit contributions
 nod-0.021
 agreement-0.020
 name-0.019
 understand-0.018
 gra-0.017
Token
Token ID6184
Feature activation-4.27e-02

Pos logit contributions
 nod+0.017
 agreement+0.016
 understand+0.015
 name+0.014
 head+0.013
Neg logit contributions
ways-0.024
pill-0.023
-0.023
oh-0.023
ths-0.023
Token
Token ID95
Feature activation-3.08e-02

Pos logit contributions
pill+0.056
ways+0.056
 lifetime+0.056
ths+0.055
endant+0.053
Neg logit contributions
 nod-0.088
 name-0.080
 agreement-0.075
 gra-0.071
 understand-0.071
Token
Token ID26391
Feature activation-3.88e-02

Pos logit contributions
ways+0.045
pill+0.044
ths+0.043
ience+0.043
endant+0.041
Neg logit contributions
 nod-0.060
 name-0.054
 agreement-0.054
 understand-0.052
 gra-0.047
Token
Token ID129
Feature activation-1.98e-02

Pos logit contributions
pill+0.062
ience+0.061
ths+0.061
 least+0.059
ways+0.059
Neg logit contributions
 nod-0.069
 name-0.065
 understand-0.058
 agreement-0.053
 gra-0.050
Token
Token ID241
Feature activation+4.98e-02

Pos logit contributions
ways+0.034
pill+0.033
ths+0.032
ience+0.032
ar+0.031
Neg logit contributions
 nod-0.034
 agreement-0.030
 name-0.029
 understand-0.029
 gra-0.025
TokenCan
Token ID6090
Feature activation+5.10e-03

Pos logit contributions
 nod+0.086
 agreement+0.083
 head+0.072
 name+0.071
 understand+0.071
Neg logit contributions
pill-0.079
ways-0.078
oh-0.076
-0.075
t-0.073
Token minutes
Token ID2431
Feature activation-1.11e-02

Pos logit contributions
 nod+0.007
 agreement+0.006
 name+0.006
 understand+0.006
 whine+0.005
Neg logit contributions
ways-0.007
pill-0.007
endant-0.007
ths-0.007
-0.007
Token.
Token ID13
Feature activation+6.21e-03

Pos logit contributions
pill+0.016
ways+0.016
ience+0.015
ths+0.015
endant+0.015
Neg logit contributions
 nod-0.022
 agreement-0.020
 name-0.020
 understand-0.019
 gra-0.017
Token Finally
Token ID9461
Feature activation-1.62e-02

Pos logit contributions
 nod+0.010
 name+0.009
 agreement+0.008
 understand+0.008
 gra+0.007
Neg logit contributions
ience-0.012
pill-0.012
ths-0.011
ways-0.011
endant-0.011
Token,
Token ID11
Feature activation+1.85e-03

Pos logit contributions
pill+0.032
endant+0.029
ience+0.027
unn+0.027
 lifetime+0.027
Neg logit contributions
 nod-0.029
 agreement-0.026
 name-0.025
 understand-0.024
 say-0.021
Token her
Token ID607
Feature activation+9.87e-04

Pos logit contributions
 nod+0.003
 name+0.003
 agreement+0.003
 understand+0.003
 head+0.002
Neg logit contributions
pill-0.003
ience-0.003
endant-0.003
ways-0.003
qu-0.003
Token mom
Token ID1995
Feature activation-3.91e-02

Pos logit contributions
 nod+0.001
 agreement+0.001
 name+0.001
 understand+0.001
 gra+0.001
Neg logit contributions
ways-0.002
ths-0.002
pill-0.002
ience-0.002
-0.002
Token said
Token ID531
Feature activation-2.28e-02

Pos logit contributions
pill+0.062
ways+0.062
ience+0.058
+0.058
ths+0.058
Neg logit contributions
 nod-0.072
 name-0.064
 agreement-0.063
 understand-0.063
 head-0.054
Token,
Token ID11
Feature activation+9.30e-03

Pos logit contributions
ways+0.029
pill+0.029
ths+0.027
ience+0.027
+0.026
Neg logit contributions
 nod-0.049
 agreement-0.045
 name-0.044
 understand-0.044
 gra-0.039
Token "
Token ID366
Feature activation-9.34e-03

Pos logit contributions
 nod+0.012
 agreement+0.010
 name+0.010
 understand+0.010
 gra+0.008
Neg logit contributions
ways-0.020
pill-0.019
ths-0.019
ience-0.018
-0.018
TokenOK
Token ID11380
Feature activation-5.24e-03

Pos logit contributions
ways+0.017
pill+0.017
ths+0.016
oh+0.016
ience+0.016
Neg logit contributions
 nod-0.015
 agreement-0.013
 name-0.013
 understand-0.013
 head-0.011
Token,
Token ID11
Feature activation-7.87e-03

Pos logit contributions
pill+0.007
ways+0.007
endant+0.006
ience+0.006
ths+0.006
Neg logit contributions
 nod-0.011
 name-0.011
 agreement-0.010
 understand-0.010
 head-0.009
Token.
Token ID13
Feature activation+3.41e-03

Pos logit contributions
pill+0.077
ience+0.075
endant+0.073
ths+0.069
ocus+0.069
Neg logit contributions
 nod-0.041
 name-0.039
 understand-0.034
 pack-0.034
 agreement-0.032
Token Tim
Token ID5045
Feature activation+5.77e-03

Pos logit contributions
 nod+0.006
 agreement+0.006
 understand+0.006
 name+0.005
 whine+0.005
Neg logit contributions
ways-0.005
-0.005
pill-0.005
oh-0.005
endant-0.005
Tokenmy
Token ID1820
Feature activation-1.18e-02

Pos logit contributions
 nod+0.008
 agreement+0.007
 name+0.007
 understand+0.007
 gra+0.006
Neg logit contributions
pill-0.011
ways-0.011
ths-0.011
ience-0.011
oh-0.011
Token loved
Token ID6151
Feature activation-1.10e-04

Pos logit contributions
ways+0.022
pill+0.022
ths+0.021
+0.021
endant+0.021
Neg logit contributions
 nod-0.018
 name-0.016
 agreement-0.016
 understand-0.016
 head-0.014
Token going
Token ID1016
Feature activation-1.83e-02

Pos logit contributions
ways+0.000
pill+0.000
ience+0.000
ths+0.000
+0.000
Neg logit contributions
 nod-0.000
 name-0.000
 agreement-0.000
 understand-0.000
 gra-0.000
Token on
Token ID319
Feature activation-3.94e-02

Pos logit contributions
pill+0.036
ience+0.034
ways+0.034
ths+0.034
endant+0.033
Neg logit contributions
 nod-0.027
 name-0.026
 agreement-0.023
 understand-0.023
 pack-0.021
Token vacation
Token ID14600
Feature activation-2.72e-02

Pos logit contributions
endant+0.068
ience+0.068
ways+0.067
oh+0.067
pill+0.066
Neg logit contributions
 nod-0.059
 name-0.057
 agreement-0.055
 pack-0.051
 understand-0.050
Token because
Token ID780
Feature activation+1.83e-02

Pos logit contributions
pill+0.062
ience+0.061
endant+0.061
ths+0.056
ocus+0.055
Neg logit contributions
 nod-0.032
 name-0.029
 understand-0.026
 pack-0.026
 agreement-0.024
Token he
Token ID339
Feature activation-1.71e-02

Pos logit contributions
 nod+0.038
 agreement+0.035
 name+0.035
 understand+0.034
 gra+0.030
Neg logit contributions
ways-0.024
pill-0.023
ths-0.022
ience-0.022
-0.021
Token got
Token ID1392
Feature activation-3.32e-02

Pos logit contributions
ways+0.036
pill+0.035
endant+0.035
ths+0.034
mmm+0.033
Neg logit contributions
 nod-0.021
 name-0.020
 understand-0.018
 agreement-0.017
 head-0.016
Token to
Token ID284
Feature activation-1.41e-02

Pos logit contributions
pill+0.062
+0.062
ience+0.060
ths+0.059
ways+0.058
Neg logit contributions
 nod-0.050
 name-0.050
 understand-0.046
 agreement-0.044
 pack-0.041
Token and
Token ID290
Feature activation-1.11e-02

Pos logit contributions
pill+0.051
ways+0.051
ths+0.049
ience+0.049
+0.047
Neg logit contributions
 nod-0.063
 name-0.058
 agreement-0.057
 understand-0.055
 head-0.049
Token vowed
Token ID19982
Feature activation-2.85e-02

Pos logit contributions
ways+0.017
pill+0.017
ths+0.017
ience+0.016
ar+0.016
Neg logit contributions
 nod-0.020
 agreement-0.018
 name-0.018
 understand-0.018
 gra-0.015
Token to
Token ID284
Feature activation-4.10e-03

Pos logit contributions
pill+0.048
ways+0.044
ience+0.043
ths+0.043
endant+0.042
Neg logit contributions
 nod-0.053
 name-0.048
 understand-0.047
 agreement-0.045
 head-0.041
Token come
Token ID1282
Feature activation+2.81e-03

Pos logit contributions
ths+0.007
ways+0.007
+0.007
ar+0.007
oh+0.007
Neg logit contributions
 nod-0.006
 agreement-0.005
 name-0.005
 understand-0.005
 whine-0.004
Token back
Token ID736
Feature activation-1.04e-02

Pos logit contributions
 nod+0.005
 agreement+0.004
 understand+0.004
 name+0.004
 whine+0.004
Neg logit contributions
ways-0.005
ar-0.005
oh-0.004
ths-0.004
 least-0.004
Token soon
Token ID2582
Feature activation-3.75e-02

Pos logit contributions
ways+0.015
pill+0.014
ths+0.014
ience+0.014
ar+0.013
Neg logit contributions
 nod-0.021
 agreement-0.019
 name-0.019
 understand-0.018
 whine-0.016
Token.
Token ID13
Feature activation+3.77e-03

Pos logit contributions
pill+0.065
ways+0.063
ths+0.061
ience+0.060
endant+0.060
Neg logit contributions
 nod-0.064
 agreement-0.057
 name-0.057
 understand-0.055
 head-0.048
Token
Token ID220
Feature activation-1.71e-02

Pos logit contributions
 nod+0.006
 name+0.005
 agreement+0.005
 understand+0.005
 gra+0.004
Neg logit contributions
ways-0.007
ience-0.007
pill-0.007
ths-0.007
endant-0.007
Token
Token ID198
Feature activation+3.59e-03

Pos logit contributions
ways+0.030
pill+0.030
ths+0.029
ience+0.028
+0.028
Neg logit contributions
 nod-0.028
 agreement-0.026
 name-0.025
 understand-0.024
 whine-0.021
Token
Token ID198
Feature activation-1.88e-02

Pos logit contributions
 nod+0.007
 agreement+0.006
 name+0.006
 understand+0.006
 head+0.006
Neg logit contributions
pill-0.005
ways-0.005
-0.005
oh-0.005
qu-0.005
TokenWhen
Token ID2215
Feature activation-8.27e-03

Pos logit contributions
ths+0.028
ways+0.028
pill+0.028
qu+0.028
oh+0.027
Neg logit contributions
 nod-0.035
 name-0.032
 agreement-0.031
 understand-0.030
 head-0.028
Token can
Token ID460
Feature activation+1.21e-03

Pos logit contributions
 nod+0.029
 understand+0.026
 name+0.026
 agreement+0.022
 head+0.020
Neg logit contributions
ways-0.046
pill-0.045
ths-0.044
 lifetime-0.043
qu-0.042
Token't
Token ID470
Feature activation+1.80e-02

Pos logit contributions
 nod+0.002
 name+0.002
 agreement+0.002
 understand+0.002
 head+0.002
Neg logit contributions
pill-0.002
ways-0.002
ience-0.002
ths-0.002
endant-0.002
Token play
Token ID711
Feature activation-2.94e-04

Pos logit contributions
 nod+0.021
 agreement+0.018
 name+0.018
 understand+0.017
 gra+0.013
Neg logit contributions
ways-0.040
pill-0.040
ths-0.039
ience-0.039
-0.038
Token until
Token ID1566
Feature activation-2.00e-02

Pos logit contributions
ways+0.001
pill+0.001
ths+0.000
ience+0.000
+0.000
Neg logit contributions
 nod-0.000
 agreement-0.000
 name-0.000
 understand-0.000
 gra-0.000
Token your
Token ID534
Feature activation+5.36e-04

Pos logit contributions
pill+0.032
ways+0.030
ience+0.028
endant+0.027
ar+0.026
Neg logit contributions
 nod-0.039
 agreement-0.036
 name-0.035
 understand-0.035
 head-0.032
Token room
Token ID2119
Feature activation-4.07e-02

Pos logit contributions
 nod+0.001
 agreement+0.001
 name+0.001
 understand+0.001
 gra+0.001
Neg logit contributions
ways-0.001
pill-0.001
ths-0.001
ience-0.001
-0.001
Token is
Token ID318
Feature activation-2.42e-02

Pos logit contributions
pill+0.093
ths+0.088
ways+0.086
 lifetime+0.086
years+0.085
Neg logit contributions
 nod-0.047
 name-0.043
 head-0.038
 understand-0.036
 agreement-0.035
Token clean
Token ID3424
Feature activation-5.83e-03

Pos logit contributions
pill+0.043
ways+0.041
ience+0.040
ths+0.039
ar+0.038
Neg logit contributions
 nod-0.041
 agreement-0.038
 name-0.037
 understand-0.036
 head-0.032
Token."
Token ID526
Feature activation+1.38e-03

Pos logit contributions
ways+0.008
pill+0.008
ths+0.008
ience+0.008
+0.007
Neg logit contributions
 nod-0.012
 agreement-0.011
 name-0.010
 understand-0.010
 gra-0.009
Token So
Token ID1406
Feature activation-7.94e-03

Pos logit contributions
 nod+0.003
 agreement+0.002
 name+0.002
 understand+0.002
 gra+0.002
Neg logit contributions
ways-0.002
pill-0.002
ience-0.002
ths-0.002
endant-0.002
Token Tim
Token ID5045
Feature activation+2.28e-02

Pos logit contributions
pill+0.015
endant+0.013
ience+0.013
ways+0.013
ocus+0.012
Neg logit contributions
 nod-0.015
 name-0.013
 understand-0.013
 agreement-0.013
 say-0.011