TOP ACTIVATIONS
MAX = 5.906

He realized it was his fault and he felt regret .
wrong and it was his fault . He learned a lesson
felt like it was her fault , like she had done
he knew it was his fault . In the end ,
knew it was his own fault . He would have to
and it was his own fault . Johnny smiled
realized it was his own fault for not val uing her
He wanted to make a difference in the world .
because he had made a difference and brought joy back to
knew they were making a difference in their community . <|endoftext|>
sorry . It was my fault . I was too brave
strong body and a w ag gy tail . One day
that he was making a difference . He was proud to
know he had made a difference . Even small acts of
told her it was her fault because she dropped her toy
that she could make a difference in someone 's day .
it was all Joe 's fault . Joe tried to talk
Tom . It was my fault . I pulled the string
knew she could make a difference . The moral
that she could make a difference just by caring . <|endoftext|>

MIN ACTIVATIONS
MIN = -6.977

. She was three - years - old and she loved
girl who was three - years - old . She had
She was just three - years - old and loved to
He was only three - years - old and he was
. She was three - years - old and she had
still wanted to eat something scr um pt ious .
that even a three - year - old can deliver something
. She was three - years old and she loved to
She was only three - years - old , but she
bracelet for her three - year - old child . The
. He was three - years - old and liked to
, who was three - years - old , said to
there teaching the 3 - year - old girl about the
. Daisy was three - years - old , and she
She was only three - years - old , but she
complete . Height : 90 centimeters . Weight : 15 kilograms
. She was three - years - old and loved to
a very curious 3 - year - old boy . One
tell Daddy 's three - year - old a story about
. Jake was 3 - years - old and the classroom

INTERVAL 2.685 - 5.906
CONTAINS 1.054%

when she heard a strange noise . She was so scared
He wanted to have adventures but wasn 't sure how .
, she fell off and hurt herself . Her mom rushed
dress . He made a big hole . L
Lu cy was so happy that she had learned something new

INTERVAL -0.536 - 2.685
CONTAINS 68.829%

They went on a search throughout the house until they came
anger . She saw they were all wet ! She was
a time , in a small house , lived a compassionate
" Tweet , tweet , tweet !" L ily
and photos . Ben and Lily were curious and wanted to

INTERVAL -3.756 - -0.536
CONTAINS 29.887%

, such as juicy apples and crunch y carrots .
and parked it inside . They drove the van to the
kicked the football and it flew into the goal easily .
to the p uddle and jumped in it and spl ashed
and said , " Mom , we did it ! We

INTERVAL -6.977 - -3.756
CONTAINS 0.230%

win ! An ostr ich was in the
join today ." M um smiled . She knew
. Their bags are too full . Their things do not
advantage . M ate o had to think
making noise . M ama : " Let 's
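
The four INTERVAL ranges above look like equal-width bins spanning MIN to MAX. The sketch below checks that reading against the printed numbers. The helper name `interval_edges` is hypothetical, and the equal-width assumption is inferred from the values rather than stated by the dashboard. Because MIN and MAX are printed to three decimals, the computed edges only match the listed boundaries to within rounding.

```python
def interval_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# Values copied from the MIN / MAX lines of this dump.
MIN_ACT = -6.977
MAX_ACT = 5.906

edges = interval_edges(MIN_ACT, MAX_ACT, 4)

# Interval boundaries as printed in the INTERVAL lines, lowest to highest.
listed = [-6.977, -3.756, -0.536, 2.685, 5.906]

# Share of tokens per interval, from the CONTAINS lines (same order).
shares = [0.230, 29.887, 68.829, 1.054]

for edge, shown in zip(edges, listed):
    print(f"computed {edge:+.4f}   listed {shown:+.3f}")
print(f"total share: {sum(shares):.3f}%")
```

Under this reading the four shares account for all tokens, with roughly 99% of activations falling in the two middle intervals and only about 1% in the top interval where the "fault" and "difference" examples sit.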
Token He
Token ID679
Feature activation+3.21e-01

Pos logit contributions
ling+0.018
ishly+0.017
 sorry+0.017
 gru+0.017
oding+0.017
Neg logit contributions
 pic-0.019
 Christmas-0.019
ine-0.018
 weeks-0.017
 Monday-0.017
Token realized
Token ID6939
Feature activation-2.30e-01

Pos logit contributions
ling+0.008
 sorry+0.007
ishly+0.007
oding+0.007
 gru+0.007
Neg logit contributions
 pic-0.006
 Christmas-0.005
ine-0.005
 weeks-0.005
 planned-0.005
Token it
Token ID340
Feature activation-7.04e-02

Pos logit contributions
 pic+0.005
 Christmas+0.005
ine+0.004
 weeks+0.004
 jars+0.004
Neg logit contributions
ling-0.005
ishly-0.005
 sorry-0.005
 gru-0.005
 terrified-0.005
Token was
Token ID373
Feature activation+2.58e-01

Pos logit contributions
 pic+0.001
 Christmas+0.001
ine+0.001
 weeks+0.001
 jars+0.001
Neg logit contributions
ling-0.002
ishly-0.002
 sorry-0.002
 gru-0.002
 terrified-0.002
Token his
Token ID465
Feature activation+9.65e-01

Pos logit contributions
 sorry+0.006
ling+0.006
 gru+0.006
ishly+0.005
 terrified+0.005
Neg logit contributions
 pic-0.005
 Christmas-0.005
ine-0.005
 strips-0.004
 hotels-0.004
Token fault
Token ID8046
Feature activation+5.91e+00

Pos logit contributions
 sorry+0.020
ling+0.020
ishly+0.020
 gru+0.019
 terrified+0.019
Neg logit contributions
 pic-0.020
 Christmas-0.019
ine-0.018
 weeks-0.018
 hotels-0.018
Token and
Token ID290
Feature activation-1.97e-01

Pos logit contributions
ling+0.162
 gru+0.161
ishly+0.160
 sorry+0.159
hing+0.158
Neg logit contributions
 Christmas-0.078
 pic-0.076
ine-0.072
 weeks-0.069
 strips-0.066
Token he
Token ID339
Feature activation+8.06e-01

Pos logit contributions
 Christmas+0.003
 pic+0.003
 weeks+0.003
ine+0.003
 planned+0.003
Neg logit contributions
ling-0.005
ishly-0.005
hing-0.005
 vibration-0.005
oding-0.005
Token felt
Token ID2936
Feature activation+5.42e-01

Pos logit contributions
ling+0.019
ishly+0.019
 sorry+0.019
 gru+0.019
oding+0.018
Neg logit contributions
 pic-0.014
 Christmas-0.013
ine-0.013
 weeks-0.012
 jars-0.012
Token regret
Token ID13721
Feature activation+1.33e+00

Pos logit contributions
ishly+0.014
arse+0.013
oding+0.013
hing+0.013
ivered+0.013
Neg logit contributions
 Christmas-0.008
 planned-0.008
 to-0.008
 weeks-0.008
 pic-0.008
Token.
Token ID13
Feature activation+1.37e+00

Pos logit contributions
ling+0.033
 gru+0.033
ishly+0.033
 sorry+0.033
 vibration+0.032
Neg logit contributions
 Christmas-0.021
 pic-0.020
ine-0.019
 weeks-0.018
 Monday-0.018
Token wrong
Token ID2642
Feature activation+1.15e+00

Pos logit contributions
ishly+0.028
hing+0.026
ling+0.026
 vibration+0.026
 gru+0.025
Neg logit contributions
 pic-0.054
 Christmas-0.053
ine-0.051
 planned-0.050
 Monday-0.050
Token and
Token ID290
Feature activation+3.35e-01

Pos logit contributions
ling+0.027
 sorry+0.027
ishly+0.026
 gru+0.026
oding+0.025
Neg logit contributions
 pic-0.022
 Christmas-0.020
 hotels-0.018
 jars-0.018
ine-0.018
Token it
Token ID340
Feature activation+6.05e-01

Pos logit contributions
ishly+0.008
hing+0.008
ling+0.008
 vibration+0.008
oding+0.008
Neg logit contributions
 Christmas-0.005
 weeks-0.005
 pic-0.005
 planned-0.005
 unp-0.005
Token was
Token ID373
Feature activation+4.73e-01

Pos logit contributions
ling+0.015
ishly+0.015
 sorry+0.015
 gru+0.014
 terrified+0.014
Neg logit contributions
 pic-0.010
 Christmas-0.010
ine-0.009
 weeks-0.009
 jars-0.009
Token his
Token ID465
Feature activation+1.03e+00

Pos logit contributions
 sorry+0.010
ling+0.009
 gru+0.009
ishly+0.009
 terrified+0.009
Neg logit contributions
 pic-0.010
 Christmas-0.010
ine-0.009
 hotels-0.009
 strips-0.009
Token fault
Token ID8046
Feature activation+5.83e+00

Pos logit contributions
 sorry+0.023
 gru+0.020
ishly+0.020
 terrified+0.020
 vibration+0.020
Neg logit contributions
 pic-0.021
 Christmas-0.019
ine-0.019
 hotels-0.019
aters-0.019
Token.
Token ID13
Feature activation+7.31e-01

Pos logit contributions
 gru+0.159
hing+0.159
ling+0.158
 alarmed+0.158
ishly+0.157
Neg logit contributions
 Christmas-0.077
 pic-0.069
ine-0.068
 weeks-0.067
 Monday-0.063
Token He
Token ID679
Feature activation-2.48e-02

Pos logit contributions
ling+0.017
ishly+0.017
 sorry+0.016
hing+0.016
 vibration+0.016
Neg logit contributions
 pic-0.013
 Christmas-0.013
ine-0.012
 Monday-0.012
 weeks-0.012
Token learned
Token ID4499
Feature activation-1.65e+00

Pos logit contributions
 pic+0.001
 Christmas+0.000
 hotels+0.000
 Monday+0.000
 jars+0.000
Neg logit contributions
ling-0.001
ishly-0.001
 sorry-0.001
 terrified-0.001
 gru-0.001
Token a
Token ID257
Feature activation+9.01e-01

Pos logit contributions
 pic+0.034
 Christmas+0.032
ine+0.031
 jars+0.030
 weeks+0.030
Neg logit contributions
 sorry-0.035
ishly-0.035
ling-0.035
 gru-0.034
oding-0.033
Token lesson
Token ID11483
Feature activation+1.11e+00

Pos logit contributions
ishly+0.024
ling+0.024
hing+0.023
 sorry+0.023
 gru+0.023
Neg logit contributions
 pic-0.013
 Christmas-0.013
ine-0.012
 Monday-0.012
 weeks-0.012
Token felt
Token ID2936
Feature activation+6.00e-01

Pos logit contributions
 pic+0.006
 Christmas+0.006
ine+0.005
 jars+0.005
 weeks+0.005
Neg logit contributions
ling-0.007
ishly-0.007
 sorry-0.007
 gru-0.006
 terrified-0.006
Token like
Token ID588
Feature activation-4.34e-01

Pos logit contributions
oding+0.011
hing+0.010
ishly+0.010
arse+0.010
ivered+0.010
Neg logit contributions
 Christmas-0.013
 weeks-0.013
 planned-0.013
 pic-0.013
 Monday-0.012
Token it
Token ID340
Feature activation+1.57e+00

Pos logit contributions
 pic+0.007
 Christmas+0.007
ine+0.007
 weeks+0.006
 jars+0.006
Neg logit contributions
ling-0.011
 sorry-0.011
ishly-0.010
 gru-0.010
 terrified-0.010
Token was
Token ID373
Feature activation+2.24e-01

Pos logit contributions
ling+0.039
ishly+0.038
 sorry+0.038
 gru+0.037
 terrified+0.037
Neg logit contributions
 pic-0.027
 Christmas-0.026
ine-0.024
 weeks-0.023
 jars-0.023
Token her
Token ID607
Feature activation+8.14e-01

Pos logit contributions
ishly+0.005
ling+0.005
 gru+0.005
oding+0.005
hing+0.005
Neg logit contributions
 pic-0.004
 Christmas-0.004
ine-0.004
 weeks-0.004
 Monday-0.004
Token fault
Token ID8046
Feature activation+5.82e+00

Pos logit contributions
 sorry+0.017
ling+0.016
ishly+0.016
 gru+0.016
 vibration+0.016
Neg logit contributions
 pic-0.017
 Christmas-0.016
ine-0.016
 weeks-0.015
 jars-0.015
Token,
Token ID11
Feature activation+3.24e-01

Pos logit contributions
 gru+0.163
ling+0.161
hing+0.158
 sorry+0.158
 alarmed+0.158
Neg logit contributions
 Christmas-0.074
 pic-0.071
ine-0.069
 weeks-0.065
 strips-0.062
Token like
Token ID588
Feature activation-2.42e+00

Pos logit contributions
ling+0.008
ishly+0.008
 sorry+0.008
 gru+0.008
 terrified+0.008
Neg logit contributions
 pic-0.005
 Christmas-0.005
ine-0.005
 weeks-0.004
 jars-0.004
Token she
Token ID673
Feature activation+1.14e-01

Pos logit contributions
 pic+0.043
ine+0.042
 Christmas+0.040
ast+0.039
aters+0.038
Neg logit contributions
 sorry-0.058
ishly-0.054
ling-0.054
 gru-0.053
 terrified-0.052
Token had
Token ID550
Feature activation-1.73e+00

Pos logit contributions
ling+0.003
 sorry+0.003
ishly+0.003
 gru+0.003
 terrified+0.003
Neg logit contributions
 pic-0.002
 Christmas-0.002
ine-0.002
 weeks-0.002
 jars-0.002
Token done
Token ID1760
Feature activation-1.03e+00

Pos logit contributions
 pic+0.032
 Christmas+0.032
 hotels+0.030
 Monday+0.030
ine+0.029
Neg logit contributions
 sorry-0.038
ling-0.037
 terrified-0.036
 gru-0.036
ishly-0.035
Token he
Token ID339
Feature activation+7.50e-01

Pos logit contributions
ishly+0.068
hing+0.067
 trigger+0.064
 vibration+0.064
ling+0.064
Neg logit contributions
 pic-0.043
 Christmas-0.043
 weeks-0.041
 Monday-0.040
 planned-0.040
Token knew
Token ID2993
Feature activation-1.42e-01

Pos logit contributions
ling+0.015
 sorry+0.015
ishly+0.015
 gru+0.014
 terrified+0.014
Neg logit contributions
 pic-0.016
 Christmas-0.016
ine-0.015
 jars-0.015
 weeks-0.015
Token it
Token ID340
Feature activation+2.84e-01

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 weeks+0.003
 jars+0.003
Neg logit contributions
ling-0.003
 sorry-0.003
ishly-0.003
 gru-0.003
 terrified-0.003
Token was
Token ID373
Feature activation+8.28e-01

Pos logit contributions
ling+0.007
ishly+0.007
 sorry+0.007
 gru+0.007
 terrified+0.006
Neg logit contributions
 pic-0.005
 Christmas-0.005
ine-0.005
 jars-0.004
 weeks-0.004
Token his
Token ID465
Feature activation+1.42e+00

Pos logit contributions
ishly+0.018
ling+0.018
 gru+0.017
 sorry+0.017
 terrified+0.017
Neg logit contributions
 pic-0.016
 Christmas-0.016
ine-0.015
 weeks-0.015
 Monday-0.015
Token fault
Token ID8046
Feature activation+5.80e+00

Pos logit contributions
 sorry+0.033
ishly+0.031
 gru+0.031
ling+0.031
 alarmed+0.030
Neg logit contributions
 pic-0.027
 Christmas-0.025
ine-0.025
 jars-0.024
 hotels-0.024
Token.
Token ID13
Feature activation+1.55e-01

Pos logit contributions
ling+0.158
ishly+0.156
 sorry+0.156
 gru+0.156
 terrified+0.153
Neg logit contributions
 pic-0.078
 Christmas-0.077
ine-0.071
 weeks-0.069
 Monday-0.067
Token In
Token ID554
Feature activation+7.53e-02

Pos logit contributions
 sorry+0.002
 relieved+0.002
 gru+0.002
ishly+0.002
oding+0.002
Neg logit contributions
 pic-0.004
ine-0.004
 Christmas-0.004
 strips-0.004
 jars-0.004
Token the
Token ID262
Feature activation+1.55e+00

Pos logit contributions
 sorry+0.002
ling+0.002
ishly+0.002
 gru+0.002
 terrified+0.002
Neg logit contributions
 pic-0.001
 Christmas-0.001
ine-0.001
 jars-0.001
 weeks-0.001
Token end
Token ID886
Feature activation+1.34e-01

Pos logit contributions
ishly+0.036
hing+0.036
ling+0.035
oding+0.035
 terrified+0.035
Neg logit contributions
 Christmas-0.028
 pic-0.028
 jars-0.027
 Monday-0.027
 weeks-0.027
Token,
Token ID11
Feature activation+6.62e-01

Pos logit contributions
ling+0.004
ishly+0.004
 sorry+0.004
 gru+0.004
 terrified+0.004
Neg logit contributions
 pic-0.002
 Christmas-0.002
ine-0.001
 weeks-0.001
 jars-0.001
Token knew
Token ID2993
Feature activation-3.21e-01

Pos logit contributions
ling+0.015
ishly+0.015
 sorry+0.014
 gru+0.014
 terrified+0.014
Neg logit contributions
 pic-0.015
 Christmas-0.015
 hotels-0.014
 jars-0.014
 strips-0.014
Token it
Token ID340
Feature activation+2.96e-01

Pos logit contributions
 pic+0.008
 Christmas+0.007
aters+0.007
 jars+0.007
 strips+0.007
Neg logit contributions
 sorry-0.006
ling-0.006
ishly-0.006
 gru-0.006
oding-0.005
Token was
Token ID373
Feature activation+4.93e-01

Pos logit contributions
ling+0.007
ishly+0.007
 sorry+0.007
 gru+0.007
 terrified+0.007
Neg logit contributions
 pic-0.005
 Christmas-0.005
ine-0.005
 jars-0.005
 weeks-0.004
Token his
Token ID465
Feature activation+1.55e+00

Pos logit contributions
 sorry+0.010
ling+0.010
 gru+0.010
ishly+0.010
 terrified+0.009
Neg logit contributions
 pic-0.011
 Christmas-0.010
ine-0.010
 strips-0.010
 jars-0.010
Token own
Token ID898
Feature activation+1.12e-02

Pos logit contributions
 sorry+0.035
 vibration+0.030
 gru+0.030
 relief+0.030
ishly+0.030
Neg logit contributions
 pic-0.031
ine-0.029
 Christmas-0.028
aters-0.028
 hotels-0.028
Token fault
Token ID8046
Feature activation+5.72e+00

Pos logit contributions
 sorry+0.000
 gru+0.000
 relief+0.000
 vibration+0.000
ishly+0.000
Neg logit contributions
 pic-0.000
 Christmas-0.000
ine-0.000
 jars-0.000
ast-0.000
Token.
Token ID13
Feature activation+6.93e-01

Pos logit contributions
ling+0.151
ishly+0.149
 sorry+0.149
 gru+0.147
 terrified+0.144
Neg logit contributions
 pic-0.086
 Christmas-0.083
ine-0.077
 weeks-0.074
 jars-0.073
Token He
Token ID679
Feature activation+9.10e-02

Pos logit contributions
 sorry+0.014
 gru+0.013
ishly+0.013
ling+0.013
 terrified+0.013
Neg logit contributions
 pic-0.015
 Christmas-0.015
 jars-0.014
ine-0.014
 weeks-0.014
Token would
Token ID561
Feature activation-4.50e-01

Pos logit contributions
ishly+0.002
ling+0.002
 relieved+0.002
 terrified+0.002
 sorry+0.002
Neg logit contributions
 pic-0.002
 Christmas-0.002
 hotels-0.002
 jars-0.002
aters-0.002
Token have
Token ID423
Feature activation-1.63e+00

Pos logit contributions
 pic+0.008
 jars+0.007
 Christmas+0.007
 strips+0.007
 weeks+0.007
Neg logit contributions
ishly-0.011
 sorry-0.011
 apologise-0.010
ling-0.010
hing-0.010
Token to
Token ID284
Feature activation-9.58e-01

Pos logit contributions
 Christmas+0.028
 pic+0.027
aters+0.027
ast+0.027
 hotels+0.026
Neg logit contributions
 sorry-0.038
 gru-0.036
 relieved-0.035
 apologise-0.035
 relief-0.035
Token and
Token ID290
Feature activation-1.58e-01

Pos logit contributions
 pic+0.001
 Christmas+0.001
 jars+0.001
 Monday+0.001
ine+0.001
Neg logit contributions
ling-0.002
ishly-0.002
 sorry-0.002
oding-0.002
 gru-0.002
Token it
Token ID340
Feature activation+4.96e-01

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 jars+0.003
 strips+0.003
Neg logit contributions
 sorry-0.003
ling-0.003
ishly-0.003
 gru-0.003
 terrified-0.003
Token was
Token ID373
Feature activation+7.80e-02

Pos logit contributions
ling+0.011
ishly+0.011
 sorry+0.011
 terrified+0.010
 gru+0.010
Neg logit contributions
 pic-0.010
 Christmas-0.010
 jars-0.009
 hotels-0.009
ine-0.009
Token his
Token ID465
Feature activation+1.07e+00

Pos logit contributions
 sorry+0.002
 gru+0.002
ling+0.002
 terrified+0.002
ishly+0.002
Neg logit contributions
 pic-0.002
 Christmas-0.001
 jars-0.001
 strips-0.001
 hotels-0.001
Token own
Token ID898
Feature activation-1.36e-01

Pos logit contributions
 sorry+0.025
ishly+0.024
ling+0.023
 vibration+0.023
 gru+0.023
Neg logit contributions
 pic-0.021
 Christmas-0.019
ine-0.018
 jars-0.018
 hotels-0.018
Token fault
Token ID8046
Feature activation+5.70e+00

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.002
 jars+0.002
 weeks+0.002
Neg logit contributions
 sorry-0.003
ling-0.003
ishly-0.003
 gru-0.003
oding-0.003
Token.
Token ID13
Feature activation+8.89e-01

Pos logit contributions
ling+0.150
 gru+0.150
 sorry+0.149
ishly+0.148
 alarmed+0.147
Neg logit contributions
 Christmas-0.080
 pic-0.078
ine-0.074
 weeks-0.072
 Monday-0.069
Token
Token ID198
Feature activation+4.28e-02

Pos logit contributions
 sorry+0.019
ling+0.018
ishly+0.018
 gru+0.018
oding+0.018
Neg logit contributions
 pic-0.018
 Christmas-0.018
ine-0.017
 jars-0.017
 weeks-0.016
Token
Token ID198
Feature activation-2.92e-02

Pos logit contributions
ishly+0.001
 vibration+0.001
ling+0.001
rified+0.001
 gru+0.001
Neg logit contributions
 Grand-0.000
 pic-0.000
ine-0.000
 Monday-0.000
 Christmas-0.000
TokenJohnny
Token ID44960
Feature activation-1.36e+00

Pos logit contributions
 pic+0.001
 Christmas+0.001
ine+0.001
 jars+0.001
 weeks+0.001
Neg logit contributions
ling-0.001
 sorry-0.001
ishly-0.001
 gru-0.001
oding-0.001
Token smiled
Token ID13541
Feature activation+7.29e-01

Pos logit contributions
 pic+0.031
 Christmas+0.030
 hotels+0.029
 jars+0.028
 Monday+0.027
Neg logit contributions
ling-0.028
ishly-0.028
 terrified-0.026
 relieved-0.025
 sorry-0.025
Token realized
Token ID6939
Feature activation-2.45e-01

Pos logit contributions
 pic+0.001
 Christmas+0.001
 hotels+0.001
 jars+0.001
 Monday+0.001
Neg logit contributions
ling-0.001
ishly-0.001
 terrified-0.001
 sorry-0.001
 gru-0.001
Token it
Token ID340
Feature activation+1.32e-01

Pos logit contributions
 pic+0.006
 Christmas+0.005
ine+0.005
 jars+0.005
 strips+0.005
Neg logit contributions
 sorry-0.005
ishly-0.004
ling-0.004
 gru-0.004
oding-0.004
Token was
Token ID373
Feature activation+4.09e-01

Pos logit contributions
ling+0.003
ishly+0.003
 sorry+0.003
 terrified+0.003
 shocked+0.003
Neg logit contributions
 pic-0.003
 Christmas-0.003
 hotels-0.003
 jars-0.003
aters-0.003
Token his
Token ID465
Feature activation+1.34e+00

Pos logit contributions
 sorry+0.009
ling+0.008
 gru+0.008
ishly+0.008
 terrified+0.008
Neg logit contributions
 pic-0.009
 hotels-0.008
 strips-0.008
aters-0.008
ast-0.008
Token own
Token ID898
Feature activation-1.07e-01

Pos logit contributions
 sorry+0.028
ishly+0.028
ling+0.027
 gru+0.027
 terrified+0.026
Neg logit contributions
 pic-0.028
 Christmas-0.026
ine-0.025
 weeks-0.025
 jars-0.025
Token fault
Token ID8046
Feature activation+5.65e+00

Pos logit contributions
 pic+0.002
 Christmas+0.002
ine+0.002
 weeks+0.002
 jars+0.002
Neg logit contributions
 sorry-0.002
ishly-0.002
ling-0.002
 gru-0.002
 vibration-0.002
Token for
Token ID329
Feature activation+6.43e-01

Pos logit contributions
ling+0.153
ishly+0.151
 sorry+0.151
 gru+0.148
 terrified+0.145
Neg logit contributions
 pic-0.082
 Christmas-0.078
ine-0.072
 jars-0.070
 weeks-0.069
Token not
Token ID407
Feature activation+3.68e-01

Pos logit contributions
 sorry+0.013
 gru+0.012
ishly+0.012
ling+0.012
 apologise+0.012
Neg logit contributions
 pic-0.014
ine-0.012
 Christmas-0.012
aters-0.012
 strips-0.012
Token val
Token ID1188
Feature activation-1.76e-01

Pos logit contributions
 sorry+0.009
ling+0.009
 apologise+0.009
 gru+0.009
ishly+0.009
Neg logit contributions
 pic-0.006
 jars-0.006
 Christmas-0.006
ine-0.006
 strips-0.005
Tokenuing
Token ID4250
Feature activation-4.82e-01

Pos logit contributions
 pic+0.005
 Christmas+0.004
 weeks+0.004
ine+0.004
 jars+0.004
Neg logit contributions
ling-0.003
ishly-0.003
 sorry-0.003
 gru-0.003
oding-0.003
Token her
Token ID607
Feature activation-7.59e-01

Pos logit contributions
 pic+0.010
 Christmas+0.009
 Monday+0.009
ine+0.009
 Saturday+0.009
Neg logit contributions
ling-0.010
 sorry-0.010
ishly-0.010
 gru-0.010
oding-0.009
Token He
Token ID679
Feature activation-1.77e-01

Pos logit contributions
ishly+0.003
 gru+0.003
 sorry+0.003
 terrified+0.003
ling+0.003
Neg logit contributions
 pic-0.004
 Christmas-0.004
ine-0.004
 weeks-0.004
 strips-0.004
Token wanted
Token ID2227
Feature activation-6.52e-02

Pos logit contributions
 pic+0.004
aters+0.004
 hotels+0.003
 Christmas+0.003
 Wed+0.003
Neg logit contributions
ling-0.004
 gru-0.003
 terrified-0.003
ishly-0.003
cut-0.003
Token to
Token ID284
Feature activation+9.11e-02

Pos logit contributions
ine+0.001
aters+0.001
 pic+0.001
ast+0.001
ara+0.001
Neg logit contributions
 sorry-0.002
 gru-0.002
ling-0.002
 alarmed-0.002
 apologise-0.002
Token make
Token ID787
Feature activation-1.15e-01

Pos logit contributions
 sorry+0.002
ling+0.002
ishly+0.002
 gru+0.002
 apologise+0.002
Neg logit contributions
 pic-0.002
 Christmas-0.002
ine-0.002
 weeks-0.002
 jars-0.002
Token a
Token ID257
Feature activation+1.53e+00

Pos logit contributions
 pic+0.002
 Christmas+0.002
ine+0.002
 jars+0.002
 weeks+0.002
Neg logit contributions
ishly-0.003
ling-0.003
 gru-0.003
oding-0.003
hing-0.003
Token difference
Token ID3580
Feature activation+5.51e+00

Pos logit contributions
ishly+0.037
ling+0.036
hing+0.035
oding+0.035
rified+0.034
Neg logit contributions
 pic-0.029
 Christmas-0.028
 jars-0.025
ine-0.025
 strips-0.024
Token in
Token ID287
Feature activation-1.59e+00

Pos logit contributions
ling+0.138
ishly+0.137
 sorry+0.137
 gru+0.135
 terrified+0.132
Neg logit contributions
 pic-0.089
 Christmas-0.087
ine-0.080
 weeks-0.078
 jars-0.077
Token the
Token ID262
Feature activation+2.04e+00

Pos logit contributions
 pic+0.031
ine+0.030
aters+0.028
 Christmas+0.028
ara+0.028
Neg logit contributions
 sorry-0.035
 gru-0.033
ling-0.032
 terrified-0.032
 vibration-0.031
Token world
Token ID995
Feature activation+1.34e-01

Pos logit contributions
ishly+0.054
ling+0.052
oding+0.052
 terrified+0.050
 alarmed+0.050
Neg logit contributions
 Christmas-0.033
 pic-0.032
 weeks-0.029
 jars-0.028
 Monday-0.027
Token.
Token ID13
Feature activation+4.62e-01

Pos logit contributions
ling+0.003
ishly+0.003
 sorry+0.003
hing+0.003
 gru+0.003
Neg logit contributions
 pic-0.003
 Christmas-0.002
ine-0.002
 weeks-0.002
ast-0.002
Token
Token ID220
Feature activation-5.16e-01

Pos logit contributions
ishly+0.008
ling+0.008
 sorry+0.008
 gru+0.008
 terrified+0.007
Neg logit contributions
 pic-0.011
 Christmas-0.011
ine-0.010
 weeks-0.010
 jars-0.010
Token because
Token ID780
Feature activation+4.52e-01

Pos logit contributions
 sorry+0.060
ling+0.059
ishly+0.058
 relieved+0.058
 terrified+0.057
Neg logit contributions
 pic-0.051
 folders-0.047
 jars-0.046
 hotels-0.046
aters-0.045
Token he
Token ID339
Feature activation+1.29e+00

Pos logit contributions
 sorry+0.007
 gru+0.007
ling+0.007
 terrified+0.007
 relieved+0.007
Neg logit contributions
 pic-0.011
ine-0.011
aters-0.011
 jars-0.011
 strips-0.010
Token had
Token ID550
Feature activation-1.16e+00

Pos logit contributions
ling+0.028
 terrified+0.027
 relieved+0.027
ishly+0.027
 sorry+0.026
Neg logit contributions
 pic-0.026
 hotels-0.025
 Christmas-0.024
 jars-0.024
aters-0.024
Token made
Token ID925
Feature activation-1.47e-01

Pos logit contributions
 pic+0.024
 Christmas+0.023
aters+0.023
 hotels+0.023
ine+0.023
Neg logit contributions
 sorry-0.023
 relieved-0.022
 terrified-0.022
 shocked-0.022
 alarmed-0.021
Token a
Token ID257
Feature activation+2.31e+00

Pos logit contributions
 pic+0.002
 Christmas+0.002
ine+0.002
 weeks+0.002
 jars+0.002
Neg logit contributions
ling-0.004
ishly-0.004
 sorry-0.004
 gru-0.004
 terrified-0.004
Token difference
Token ID3580
Feature activation+5.47e+00

Pos logit contributions
ishly+0.050
ling+0.050
hing+0.048
oding+0.047
 gru+0.047
Neg logit contributions
 pic-0.047
 Christmas-0.046
ine-0.042
 jars-0.041
 weeks-0.041
Token and
Token ID290
Feature activation+2.80e-01

Pos logit contributions
ling+0.131
ishly+0.128
 sorry+0.128
 gru+0.125
 terrified+0.122
Neg logit contributions
 pic-0.098
 Christmas-0.093
ine-0.089
 jars-0.085
 weeks-0.085
Token brought
Token ID3181
Feature activation+7.52e-01

Pos logit contributions
ling+0.007
ishly+0.007
 sorry+0.007
 gru+0.006
oding+0.006
Neg logit contributions
 pic-0.005
 Christmas-0.005
ine-0.004
 weeks-0.004
 jars-0.004
Token joy
Token ID8716
Feature activation+2.23e-01

Pos logit contributions
ishly+0.018
ling+0.018
 sorry+0.018
 gru+0.017
 terrified+0.017
Neg logit contributions
 pic-0.013
 Christmas-0.013
 weeks-0.012
ine-0.012
 jars-0.012
Token back
Token ID736
Feature activation+4.09e-02

Pos logit contributions
ling+0.007
ishly+0.006
 sorry+0.006
 gru+0.006
oding+0.006
Neg logit contributions
 pic-0.003
 Christmas-0.002
ine-0.002
 weeks-0.002
 Monday-0.002
Token to
Token ID284
Feature activation+1.22e+00

Pos logit contributions
ling+0.001
ishly+0.001
 sorry+0.001
 gru+0.001
oding+0.001
Neg logit contributions
 pic-0.001
ine-0.001
 Christmas-0.001
 Monday-0.001
 strips-0.001
Token knew
Token ID2993
Feature activation-2.80e-01

Pos logit contributions
ling+0.022
 sorry+0.022
ishly+0.022
 gru+0.021
 terrified+0.021
Neg logit contributions
 pic-0.017
 Christmas-0.017
ine-0.016
 weeks-0.015
 Monday-0.015
Token they
Token ID484
Feature activation+2.37e-01

Pos logit contributions
 pic+0.005
aters+0.005
 Christmas+0.005
 strips+0.005
ine+0.005
Neg logit contributions
 sorry-0.007
ishly-0.006
ling-0.006
 gru-0.006
 apologise-0.006
Token were
Token ID547
Feature activation+1.12e+00

Pos logit contributions
 gru+0.007
ling+0.007
ishly+0.007
 sorry+0.007
 vibration+0.006
Neg logit contributions
 Christmas-0.003
 pic-0.003
ine-0.003
 planned-0.003
 weeks-0.003
Token making
Token ID1642
Feature activation-5.89e-01

Pos logit contributions
ling+0.023
ishly+0.023
 gru+0.022
 sorry+0.022
oding+0.022
Neg logit contributions
 pic-0.024
 Christmas-0.023
ine-0.022
 weeks-0.021
 jars-0.021
Token a
Token ID257
Feature activation+1.85e+00

Pos logit contributions
 pic+0.010
 Christmas+0.009
ine+0.009
 weeks+0.008
 jars+0.008
Neg logit contributions
ling-0.015
ishly-0.015
 gru-0.014
 sorry-0.014
oding-0.014
Token difference
Token ID3580
Feature activation+5.41e+00

Pos logit contributions
ishly+0.047
hing+0.044
rified+0.044
ling+0.044
oding+0.044
Neg logit contributions
 pic-0.035
 Christmas-0.033
 jars-0.030
ine-0.029
 strips-0.029
Token in
Token ID287
Feature activation-7.96e-01

Pos logit contributions
ling+0.139
ishly+0.137
 sorry+0.137
 gru+0.135
 terrified+0.132
Neg logit contributions
 pic-0.086
 Christmas-0.083
ine-0.077
 weeks-0.074
 jars-0.074
Token their
Token ID511
Feature activation+8.26e-01

Pos logit contributions
 pic+0.013
ine+0.012
 Christmas+0.012
aters+0.012
 strips+0.011
Neg logit contributions
 sorry-0.020
ling-0.019
 gru-0.018
 apologise-0.018
 relief-0.018
Token community
Token ID2055
Feature activation+8.49e-01

Pos logit contributions
ling+0.019
 sorry+0.019
ishly+0.019
 gru+0.019
 terrified+0.018
Neg logit contributions
 pic-0.015
 Christmas-0.014
ine-0.014
 weeks-0.013
 Monday-0.013
Token.
Token ID13
Feature activation+7.25e-01

Pos logit contributions
ling+0.021
 sorry+0.020
ishly+0.020
 gru+0.020
 terrified+0.019
Neg logit contributions
 pic-0.015
 Christmas-0.014
ine-0.013
 Monday-0.013
 jars-0.012
Token<|endoftext|>
Token ID50256
Feature activation+2.19e+00

Pos logit contributions
ishly+0.014
 sorry+0.014
 gru+0.013
 relieved+0.013
 terrified+0.013
Neg logit contributions
 Christmas-0.016
 pic-0.015
 jars-0.015
 strips-0.015
 weeks-0.015
Token sorry
Token ID7926
Feature activation+1.32e+00

Pos logit contributions
ishly+0.021
hing+0.019
ling+0.019
 apologise+0.018
 trigger+0.018
Neg logit contributions
ine-0.036
 pic-0.035
 Christmas-0.035
 Grand-0.033
 strips-0.033
Token.
Token ID13
Feature activation-1.92e+00

Pos logit contributions
 sorry+0.034
ling+0.034
ishly+0.032
 gru+0.032
oding+0.032
Neg logit contributions
 pic-0.022
 Christmas-0.021
ine-0.019
 jars-0.019
 weeks-0.019
Token It
Token ID632
Feature activation-1.83e-01

Pos logit contributions
 pic+0.036
 Christmas+0.033
ine+0.032
 weeks+0.032
 jars+0.031
Neg logit contributions
 sorry-0.047
ishly-0.045
 gru-0.044
 vibration-0.043
ling-0.043
Token was
Token ID373
Feature activation-2.87e-01

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 weeks+0.002
 Monday+0.002
Neg logit contributions
ling-0.005
ishly-0.005
 sorry-0.005
 gru-0.005
oding-0.005
Token my
Token ID616
Feature activation+1.48e+00

Pos logit contributions
 pic+0.005
 Christmas+0.005
ine+0.005
 weeks+0.005
 hotels+0.005
Neg logit contributions
 sorry-0.007
ling-0.006
 gru-0.006
 terrified-0.006
ishly-0.006
Token fault
Token ID8046
Feature activation+5.41e+00

Pos logit contributions
ling+0.032
 sorry+0.032
ishly+0.032
 gru+0.031
 terrified+0.030
Neg logit contributions
 pic-0.029
 Christmas-0.028
ine-0.027
 weeks-0.026
 jars-0.026
Token.
Token ID13
Feature activation-1.71e+00

Pos logit contributions
ling+0.152
 gru+0.152
ishly+0.150
 sorry+0.148
 alarmed+0.148
Neg logit contributions
 Christmas-0.068
 pic-0.066
ine-0.060
 weeks-0.059
 Monday-0.057
Token I
Token ID314
Feature activation+2.37e+00

Pos logit contributions
 pic+0.035
 Christmas+0.033
 weeks+0.032
ine+0.032
 jars+0.032
Neg logit contributions
 sorry-0.038
ishly-0.036
 gru-0.035
 vibration-0.034
 terrified-0.034
Token was
Token ID373
Feature activation+1.89e+00

Pos logit contributions
 gru+0.068
ishly+0.066
 sidelines+0.064
ling+0.063
iece+0.063
Neg logit contributions
 planned-0.032
ine-0.032
 Christmas-0.031
 pic-0.031
 unp-0.029
Token too
Token ID1165
Feature activation+4.29e-01

Pos logit contributions
ishly+0.049
hing+0.047
ling+0.045
rified+0.045
 gru+0.045
Neg logit contributions
 Christmas-0.032
 pic-0.031
ine-0.029
 unp-0.029
 Grand-0.028
Token brave
Token ID14802
Feature activation-7.46e-01

Pos logit contributions
ling+0.007
ishly+0.007
 sorry+0.007
 gru+0.007
 terrified+0.007
Neg logit contributions
 pic-0.010
 Christmas-0.010
ine-0.010
 weeks-0.010
 Monday-0.009
Token strong
Token ID1913
Feature activation+9.01e-01

Pos logit contributions
ling+0.007
hing+0.007
ishly+0.007
oding+0.007
iece+0.007
Neg logit contributions
 pic-0.004
 Christmas-0.004
ine-0.003
 Monday-0.003
 strips-0.003
Token body
Token ID1767
Feature activation+3.29e-01

Pos logit contributions
ling+0.019
 sorry+0.019
ishly+0.019
 gru+0.018
 terrified+0.018
Neg logit contributions
 pic-0.018
 Christmas-0.018
ine-0.017
 weeks-0.017
 jars-0.016
Token and
Token ID290
Feature activation-3.35e-02

Pos logit contributions
ling+0.008
 sorry+0.007
ishly+0.007
 terrified+0.007
oding+0.007
Neg logit contributions
 pic-0.007
 Christmas-0.006
 weeks-0.006
 Monday-0.006
 hotels-0.006
Token a
Token ID257
Feature activation+3.93e-01

Pos logit contributions
 pic+0.001
 Christmas+0.001
ine+0.001
aters+0.001
 jars+0.001
Neg logit contributions
 sorry-0.001
 gru-0.001
ling-0.001
ishly-0.001
 terrified-0.001
Token w
Token ID266
Feature activation-6.30e-01

Pos logit contributions
hing+0.011
ishly+0.011
ling+0.011
iece+0.011
rified+0.011
Neg logit contributions
 Christmas-0.006
 pic-0.005
ine-0.005
 strips-0.005
 jars-0.005
Tokenag
Token ID363
Feature activation+5.40e+00

Pos logit contributions
 pic+0.017
 Christmas+0.016
ine+0.016
 weeks+0.016
 strips+0.015
Neg logit contributions
ishly-0.009
 gru-0.009
ling-0.009
 terrified-0.008
 sorry-0.008
Tokengy
Token ID1360
Feature activation+1.73e+00

Pos logit contributions
ishly+0.144
 sidelines+0.137
 sorry+0.136
 apologise+0.135
 gru+0.134
Neg logit contributions
ine-0.098
 Christmas-0.092
ast-0.086
 Halloween-0.076
ro-0.076
Token tail
Token ID7894
Feature activation+1.74e+00

Pos logit contributions
 sorry+0.039
ling+0.039
 gru+0.039
ishly+0.038
oding+0.038
Neg logit contributions
 Christmas-0.032
 pic-0.031
ine-0.030
 weeks-0.028
 strips-0.028
Token.
Token ID13
Feature activation+6.00e-01

Pos logit contributions
ling+0.049
ishly+0.048
 sorry+0.048
 gru+0.048
oding+0.047
Neg logit contributions
 Christmas-0.022
 pic-0.022
ine-0.020
 weeks-0.019
 strips-0.019
Token One
Token ID1881
Feature activation-1.44e+00

Pos logit contributions
 sorry+0.012
 gru+0.012
ishly+0.012
 vibration+0.011
ling+0.011
Neg logit contributions
 pic-0.012
ine-0.012
 Christmas-0.011
 weeks-0.011
 planned-0.011
Token day
Token ID1110
Feature activation+1.63e+00

Pos logit contributions
 pic+0.020
 Christmas+0.019
ine+0.018
 weeks+0.017
 Monday+0.017
Neg logit contributions
ling-0.040
ishly-0.040
 sorry-0.040
 gru-0.039
 terrified-0.038
Token that
Token ID326
Feature activation+2.64e+00

Pos logit contributions
 pic+0.019
 Christmas+0.018
ine+0.017
 jars+0.017
 strips+0.017
Neg logit contributions
 sorry-0.017
ishly-0.017
ling-0.017
 gru-0.017
oding-0.016
Token he
Token ID339
Feature activation+5.38e-01

Pos logit contributions
 sorry+0.060
ling+0.059
 gru+0.058
ishly+0.058
 terrified+0.056
Neg logit contributions
 pic-0.050
 Christmas-0.048
ine-0.045
 jars-0.044
 strips-0.044
Token was
Token ID373
Feature activation+2.62e+00

Pos logit contributions
ling+0.014
ishly+0.013
 sorry+0.013
 gru+0.013
 terrified+0.013
Neg logit contributions
 pic-0.009
 Christmas-0.009
 hotels-0.008
 jars-0.008
ine-0.008
Token making
Token ID1642
Feature activation-3.86e-01

Pos logit contributions
ling+0.057
ishly+0.057
 sorry+0.055
 gru+0.055
 vibration+0.054
Neg logit contributions
 pic-0.051
 Christmas-0.050
ine-0.047
 weeks-0.046
 Monday-0.046
Token a
Token ID257
Feature activation+1.88e+00

Pos logit contributions
 pic+0.006
 Christmas+0.006
ine+0.006
 weeks+0.005
 strips+0.005
Neg logit contributions
ishly-0.010
ling-0.010
oding-0.009
 gru-0.009
hing-0.009
Token difference
Token ID3580
Feature activation+5.39e+00

Pos logit contributions
ishly+0.044
ling+0.043
hing+0.042
oding+0.041
 gru+0.041
Neg logit contributions
 pic-0.035
 Christmas-0.034
ine-0.032
 weeks-0.030
 Monday-0.030
Token.
Token ID13
Feature activation+4.50e-01

Pos logit contributions
ling+0.134
ishly+0.132
 sorry+0.132
 gru+0.129
oding+0.127
Neg logit contributions
 pic-0.090
 Christmas-0.086
ine-0.081
 weeks-0.078
 jars-0.077
Token He
Token ID679
Feature activation-6.10e-03

Pos logit contributions
 sorry+0.007
ishly+0.007
 relieved+0.006
 gru+0.006
oding+0.006
Neg logit contributions
 pic-0.012
 Christmas-0.012
 jars-0.011
 strips-0.011
 hotels-0.011
Token was
Token ID373
Feature activation+2.32e+00

Pos logit contributions
 pic+0.000
 Christmas+0.000
 jars+0.000
 hotels+0.000
 Monday+0.000
Neg logit contributions
ling-0.000
ishly-0.000
 sorry-0.000
 terrified-0.000
 relieved-0.000
Token proud
Token ID6613
Feature activation-6.88e-01

Pos logit contributions
ling+0.043
ishly+0.043
 sorry+0.042
 gru+0.042
 terrified+0.040
Neg logit contributions
 pic-0.053
 Christmas-0.051
ine-0.049
 weeks-0.048
 Monday-0.048
Token to
Token ID284
Feature activation+2.23e+00

Pos logit contributions
 pic+0.013
 Christmas+0.013
ine+0.012
 jars+0.012
 hotels+0.011
Neg logit contributions
ling-0.016
 sorry-0.016
ishly-0.015
 gru-0.015
 terrified-0.015
Token know
Token ID760
Feature activation-1.34e+00

Pos logit contributions
ling+0.055
ishly+0.054
 gru+0.053
 sorry+0.052
oding+0.052
Neg logit contributions
 Christmas-0.030
 pic-0.030
ine-0.028
 weeks-0.027
 Monday-0.026
Token he
Token ID339
Feature activation+1.89e-01

Pos logit contributions
 pic+0.027
 Christmas+0.026
ine+0.024
 weeks+0.024
 jars+0.024
Neg logit contributions
ling-0.029
ishly-0.029
 sorry-0.029
 gru-0.028
 terrified-0.028
Token had
Token ID550
Feature activation-9.62e-01

Pos logit contributions
ling+0.005
ishly+0.005
 sorry+0.005
 gru+0.005
 terrified+0.005
Neg logit contributions
 pic-0.003
 Christmas-0.003
ine-0.003
 jars-0.003
 hotels-0.003
Token made
Token ID925
Feature activation-5.28e-01

Pos logit contributions
 pic+0.019
 Christmas+0.018
 hotels+0.018
ine+0.017
 jars+0.017
Neg logit contributions
 sorry-0.021
ling-0.020
 gru-0.020
oding-0.020
 terrified-0.020
Token a
Token ID257
Feature activation+2.01e+00

Pos logit contributions
 pic+0.008
 Christmas+0.008
ine+0.007
 weeks+0.007
 jars+0.007
Neg logit contributions
ishly-0.014
ling-0.014
 sorry-0.014
 gru-0.014
 terrified-0.013
Token difference
Token ID3580
Feature activation+5.38e+00

Pos logit contributions
ishly+0.045
hing+0.044
ling+0.044
oding+0.043
rified+0.042
Neg logit contributions
 pic-0.040
 Christmas-0.038
ine-0.036
 jars-0.034
 weeks-0.034
Token.
Token ID13
Feature activation+8.13e-01

Pos logit contributions
ling+0.133
 sorry+0.132
ishly+0.132
 gru+0.130
 terrified+0.128
Neg logit contributions
 pic-0.088
 Christmas-0.086
ine-0.080
 weeks-0.078
 Monday-0.076
Token Even
Token ID3412
Feature activation+3.34e-01

Pos logit contributions
 sorry+0.014
ishly+0.014
 relieved+0.014
 gru+0.014
 terrified+0.013
Neg logit contributions
 pic-0.019
 Christmas-0.019
 jars-0.018
ine-0.018
 strips-0.018
Token small
Token ID1402
Feature activation-7.29e-02

Pos logit contributions
ling+0.007
 sorry+0.007
 gru+0.007
ishly+0.007
 terrified+0.007
Neg logit contributions
 pic-0.007
 Christmas-0.006
ine-0.006
 strips-0.006
 jars-0.006
Token acts
Token ID6529
Feature activation+3.71e-01

Pos logit contributions
 pic+0.001
 Christmas+0.001
ine+0.001
 Monday+0.001
 hotels+0.001
Neg logit contributions
ling-0.002
ishly-0.002
 sorry-0.002
 gru-0.002
 terrified-0.001
Token of
Token ID286
Feature activation-2.05e-01

Pos logit contributions
ling+0.010
ishly+0.010
 gru+0.010
 sorry+0.010
 vibration+0.010
Neg logit contributions
 pic-0.005
 Christmas-0.005
ine-0.005
 weeks-0.005
 strips-0.004
Token told
Token ID1297
Feature activation+1.31e+00

Pos logit contributions
 pic+0.013
 Christmas+0.012
 hotels+0.012
 Monday+0.012
 weeks+0.011
Neg logit contributions
ling-0.013
ishly-0.012
 sorry-0.012
 terrified-0.012
 alarmed-0.012
Token her
Token ID607
Feature activation-6.82e-01

Pos logit contributions
 sorry+0.033
ishly+0.032
ling+0.032
 alarmed+0.030
 vibration+0.030
Neg logit contributions
 pic-0.022
 Christmas-0.021
ine-0.020
 strips-0.020
 jars-0.019
Token it
Token ID340
Feature activation-8.02e-02

Pos logit contributions
 pic+0.013
 strips+0.011
 Christmas+0.011
aters+0.011
 hotels+0.011
Neg logit contributions
 sorry-0.017
ling-0.015
 shocked-0.015
 relieved-0.015
ishly-0.015
Token was
Token ID373
Feature activation-2.57e-01

Pos logit contributions
 pic+0.002
 Christmas+0.002
 hotels+0.002
 weeks+0.001
 jars+0.001
Neg logit contributions
ling-0.002
ishly-0.002
 sorry-0.002
 alarmed-0.002
 terrified-0.002
Token her
Token ID607
Feature activation-3.76e-01

Pos logit contributions
 pic+0.007
 hotels+0.007
 strips+0.007
 jars+0.006
 Mon+0.006
Neg logit contributions
 sorry-0.004
ling-0.003
 wrong-0.003
 relief-0.003
 shocked-0.003
Token fault
Token ID8046
Feature activation+5.35e+00

Pos logit contributions
 pic+0.008
 hotels+0.008
 Christmas+0.007
ine+0.007
 Monday+0.007
Neg logit contributions
 sorry-0.008
ishly-0.007
ling-0.007
 shocked-0.007
 terrified-0.007
Token because
Token ID780
Feature activation-4.31e-01

Pos logit contributions
ling+0.143
ishly+0.141
 sorry+0.141
 gru+0.140
 terrified+0.137
Neg logit contributions
 pic-0.077
 Christmas-0.075
ine-0.069
 weeks-0.067
 Monday-0.066
Token she
Token ID673
Feature activation-3.05e-01

Pos logit contributions
ast+0.010
 pic+0.009
ine+0.009
oki+0.009
ona+0.009
Neg logit contributions
 sorry-0.009
 shocked-0.008
ling-0.007
 bleeding-0.007
 relieved-0.007
Token dropped
Token ID5710
Feature activation+3.66e-01

Pos logit contributions
 pic+0.006
 Christmas+0.006
 hotels+0.006
 Monday+0.005
 jars+0.005
Neg logit contributions
ling-0.007
 sorry-0.006
 terrified-0.006
ishly-0.006
 alarmed-0.006
Token her
Token ID607
Feature activation-1.74e+00

Pos logit contributions
 sorry+0.009
ishly+0.008
ling+0.008
 trigger+0.008
 terrified+0.008
Neg logit contributions
 pic-0.007
 Christmas-0.006
ine-0.006
 Monday-0.006
ast-0.006
Token toy
Token ID13373
Feature activation+1.72e-01

Pos logit contributions
 pic+0.034
 Christmas+0.031
ine+0.030
 weeks+0.030
 Monday+0.030
Neg logit contributions
 sorry-0.039
ling-0.039
ishly-0.038
 gru-0.037
 vibration-0.037
Token that
Token ID326
Feature activation+2.99e+00

Pos logit contributions
 pic+0.013
ine+0.013
 Christmas+0.013
 strips+0.012
 jars+0.012
Neg logit contributions
 sorry-0.013
ishly-0.012
ling-0.012
 gru-0.011
 terrified-0.011
Token she
Token ID673
Feature activation-8.14e-02

Pos logit contributions
 sorry+0.074
 gru+0.067
ling+0.067
ishly+0.066
 apologise+0.066
Neg logit contributions
 pic-0.052
ine-0.052
ast-0.050
 strips-0.048
aters-0.048
Token could
Token ID714
Feature activation-3.00e-01

Pos logit contributions
 pic+0.001
 Christmas+0.001
ine+0.001
 weeks+0.001
 jars+0.001
Neg logit contributions
ling-0.002
ishly-0.002
 sorry-0.002
 gru-0.002
 terrified-0.002
Token make
Token ID787
Feature activation-3.48e-01

Pos logit contributions
 Christmas+0.006
 pic+0.005
ine+0.005
 Monday+0.005
 weeks+0.005
Neg logit contributions
 gru-0.007
ling-0.007
ishly-0.007
 terrified-0.007
rified-0.007
Token a
Token ID257
Feature activation+1.82e+00

Pos logit contributions
 pic+0.006
 Christmas+0.005
 Grand+0.005
ine+0.005
 weeks+0.005
Neg logit contributions
ishly-0.009
ling-0.009
 gru-0.009
hing-0.009
oding-0.009
Token difference
Token ID3580
Feature activation+5.32e+00

Pos logit contributions
ishly+0.044
rified+0.042
hing+0.042
ivered+0.040
iece+0.040
Neg logit contributions
 pic-0.037
 Christmas-0.037
 Grand-0.033
ast-0.030
 grand-0.030
Token in
Token ID287
Feature activation-1.39e+00

Pos logit contributions
ling+0.133
ishly+0.132
 sorry+0.132
 gru+0.130
 terrified+0.127
Neg logit contributions
 pic-0.087
 Christmas-0.084
ine-0.078
 weeks-0.076
 jars-0.075
Token someone
Token ID2130
Feature activation+8.77e-01

Pos logit contributions
 pic+0.025
ine+0.024
aters+0.023
 strips+0.022
ast+0.022
Neg logit contributions
 sorry-0.034
 relief-0.031
 apologise-0.031
 gru-0.030
 vibration-0.030
Token's
Token ID338
Feature activation-1.53e+00

Pos logit contributions
ling+0.019
 sorry+0.018
ishly+0.017
 terrified+0.017
 gru+0.017
Neg logit contributions
 pic-0.018
 Christmas-0.018
ine-0.017
 Monday-0.017
 strips-0.017
Token day
Token ID1110
Feature activation-2.53e-01

Pos logit contributions
 Christmas+0.028
 pic+0.027
 weeks+0.026
ine+0.025
 jars+0.024
Neg logit contributions
ishly-0.036
ling-0.035
hing-0.035
 terrified-0.034
 gru-0.034
Token.
Token ID13
Feature activation+1.33e+00

Pos logit contributions
 pic+0.005
 jars+0.005
 hotels+0.005
 Christmas+0.005
ine+0.005
Neg logit contributions
ling-0.006
 sorry-0.005
ishly-0.005
 gru-0.005
 relieved-0.005
Token it
Token ID340
Feature activation+1.88e-01

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 weeks+0.003
 jars+0.003
Neg logit contributions
ling-0.003
 sorry-0.003
ishly-0.003
 gru-0.003
 terrified-0.003
Token was
Token ID373
Feature activation+5.20e-01

Pos logit contributions
ling+0.004
ishly+0.004
 sorry+0.004
 gru+0.004
 terrified+0.004
Neg logit contributions
 pic-0.004
 Christmas-0.003
ine-0.003
 jars-0.003
 hotels-0.003
Token all
Token ID477
Feature activation+2.39e-01

Pos logit contributions
 sorry+0.011
ling+0.011
 gru+0.011
ishly+0.010
 terrified+0.010
Neg logit contributions
 pic-0.011
 Christmas-0.010
ine-0.010
 strips-0.010
 jars-0.010
Token Joe
Token ID5689
Feature activation+4.38e-01

Pos logit contributions
ishly+0.006
ling+0.006
 sorry+0.006
 gru+0.006
oding+0.006
Neg logit contributions
 pic-0.004
 Christmas-0.004
 Monday-0.003
ine-0.003
 weeks-0.003
Token's
Token ID338
Feature activation-6.15e-01

Pos logit contributions
ling+0.011
 sorry+0.010
ishly+0.010
 gru+0.010
 terrified+0.010
Neg logit contributions
 pic-0.008
 Christmas-0.008
 hotels-0.007
 Monday-0.007
aters-0.007
Token fault
Token ID8046
Feature activation+5.30e+00

Pos logit contributions
 pic+0.012
ine+0.011
 Christmas+0.011
 hotels+0.011
 jars+0.011
Neg logit contributions
 sorry-0.014
ling-0.013
ishly-0.013
 vibration-0.012
 gru-0.012
Token.
Token ID13
Feature activation+8.69e-01

Pos logit contributions
ling+0.137
ishly+0.135
 sorry+0.135
 gru+0.132
 terrified+0.130
Neg logit contributions
 pic-0.084
 Christmas-0.080
ine-0.075
 weeks-0.072
 jars-0.072
Token Joe
Token ID5689
Feature activation+4.79e-02

Pos logit contributions
 sorry+0.018
ishly+0.018
 gru+0.018
ling+0.018
 terrified+0.017
Neg logit contributions
 pic-0.018
 Christmas-0.018
ine-0.016
 jars-0.016
 weeks-0.016
Token tried
Token ID3088
Feature activation-1.94e-01

Pos logit contributions
ling+0.001
ishly+0.001
 sorry+0.001
 terrified+0.001
 gru+0.001
Neg logit contributions
 pic-0.001
 Christmas-0.001
 hotels-0.001
 jars-0.001
 Monday-0.001
Token to
Token ID284
Feature activation+2.34e-01

Pos logit contributions
 pic+0.002
 Christmas+0.002
ine+0.001
 weeks+0.001
 jars+0.001
Neg logit contributions
ling-0.006
 sorry-0.006
ishly-0.006
 gru-0.006
 terrified-0.006
Token talk
Token ID1561
Feature activation+4.04e-01

Pos logit contributions
 gru+0.006
ling+0.005
rified+0.005
ishly+0.005
oding+0.005
Neg logit contributions
 Christmas-0.004
 pic-0.004
ine-0.004
ast-0.004
 unp-0.004
Token Tom
Token ID4186
Feature activation-5.09e-01

Pos logit contributions
 pic+0.003
ine+0.003
 Christmas+0.003
 weeks+0.003
 jars+0.003
Neg logit contributions
 sorry-0.005
ishly-0.004
 gru-0.004
 terrified-0.004
ling-0.004
Token.
Token ID13
Feature activation-2.16e+00

Pos logit contributions
 pic+0.008
 Christmas+0.008
ine+0.008
 weeks+0.007
 jars+0.007
Neg logit contributions
ishly-0.013
ling-0.013
 sorry-0.013
 gru-0.012
 vibration-0.012
Token It
Token ID632
Feature activation+2.56e-01

Pos logit contributions
 pic+0.029
 Christmas+0.029
ine+0.026
 Monday+0.025
 Grand+0.025
Neg logit contributions
ling-0.061
ishly-0.058
 gru-0.058
 sorry-0.058
 alarmed-0.058
Token was
Token ID373
Feature activation-3.32e-01

Pos logit contributions
ishly+0.007
 gru+0.007
ling+0.007
oding+0.007
 sorry+0.007
Neg logit contributions
 pic-0.003
 Christmas-0.003
ine-0.003
 planned-0.003
 Monday-0.003
Token my
Token ID616
Feature activation+1.59e+00

Pos logit contributions
 pic+0.006
 Christmas+0.005
ine+0.005
 weeks+0.005
 Monday+0.005
Neg logit contributions
ling-0.008
ishly-0.008
 sorry-0.008
 gru-0.008
oding-0.008
Token fault
Token ID8046
Feature activation+5.30e+00

Pos logit contributions
ling+0.037
ishly+0.037
 sorry+0.037
 gru+0.036
 terrified+0.035
Neg logit contributions
 pic-0.029
 Christmas-0.028
ine-0.026
 weeks-0.026
 Monday-0.025
Token.
Token ID13
Feature activation-1.86e+00

Pos logit contributions
 gru+0.152
hing+0.151
ling+0.150
 alarmed+0.150
 vibration+0.150
Neg logit contributions
 Christmas-0.063
 pic-0.057
ine-0.056
 weeks-0.053
 strips-0.052
Token I
Token ID314
Feature activation+2.59e+00

Pos logit contributions
 pic+0.033
 Christmas+0.032
ine+0.030
 weeks+0.029
 jars+0.029
Neg logit contributions
 sorry-0.045
ishly-0.044
ling-0.044
 gru-0.043
 terrified-0.042
Token pulled
Token ID5954
Feature activation+5.63e-01

Pos logit contributions
ishly+0.066
 gru+0.066
ling+0.065
 sorry+0.064
oding+0.063
Neg logit contributions
 pic-0.041
ine-0.040
 Christmas-0.039
 planned-0.038
 unp-0.036
Token the
Token ID262
Feature activation+6.71e-02

Pos logit contributions
 sorry+0.014
ling+0.014
ishly+0.013
 gru+0.013
 terrified+0.013
Neg logit contributions
 pic-0.010
 Christmas-0.009
ine-0.009
 jars-0.008
 Monday-0.008
Token string
Token ID4731
Feature activation+1.83e-01

Pos logit contributions
ling+0.002
ishly+0.002
 sorry+0.002
 gru+0.002
 terrified+0.002
Neg logit contributions
 pic-0.001
 Christmas-0.001
ine-0.001
 weeks-0.001
 jars-0.001
Token knew
Token ID2993
Feature activation-6.60e-01

Pos logit contributions
 pic+0.013
 Christmas+0.013
 hotels+0.012
 jars+0.012
 folders+0.012
Neg logit contributions
ling-0.012
ishly-0.012
 sorry-0.012
 terrified-0.012
 relieved-0.012
Token she
Token ID673
Feature activation-7.28e-01

Pos logit contributions
 pic+0.013
 Christmas+0.012
ine+0.012
 strips+0.012
 jars+0.012
Neg logit contributions
 sorry-0.015
ishly-0.014
ling-0.014
 gru-0.014
oding-0.014
Token could
Token ID714
Feature activation-4.72e-01

Pos logit contributions
 pic+0.012
 Christmas+0.011
 hotels+0.011
 jars+0.010
 strips+0.010
Neg logit contributions
ling-0.019
ishly-0.018
 sorry-0.018
 gru-0.018
 terrified-0.018
Token make
Token ID787
Feature activation-2.77e-01

Pos logit contributions
 pic+0.009
 jars+0.008
 Christmas+0.008
 strips+0.008
 folders+0.008
Neg logit contributions
ishly-0.011
 sorry-0.011
ling-0.011
 apologise-0.010
hing-0.010
Token a
Token ID257
Feature activation+1.97e+00

Pos logit contributions
 pic+0.005
 Christmas+0.004
ine+0.004
 weeks+0.004
 Monday+0.004
Neg logit contributions
ling-0.007
 sorry-0.007
ishly-0.007
 gru-0.007
 terrified-0.007
Token difference
Token ID3580
Feature activation+5.30e+00

Pos logit contributions
ishly+0.042
ling+0.042
hing+0.041
oding+0.041
 sorry+0.040
Neg logit contributions
 pic-0.042
 Christmas-0.040
ine-0.037
 jars-0.036
 Monday-0.036
Token.
Token ID13
Feature activation+5.93e-01

Pos logit contributions
ling+0.127
ishly+0.124
 sorry+0.124
 gru+0.121
oding+0.118
Neg logit contributions
 pic-0.095
 Christmas-0.090
ine-0.085
 jars-0.082
 Monday-0.082
Token\n
Token ID198
Feature activation-2.52e-01

Pos logit contributions
ishly+0.010
 relieved+0.010
 sorry+0.010
rified+0.009
 gru+0.009
Neg logit contributions
 pic-0.014
 Christmas-0.014
 jars-0.013
 strips-0.013
 hotels-0.013
Token\n
Token ID198
Feature activation-3.77e-01

Pos logit contributions
 pic+0.005
 Christmas+0.005
ine+0.005
 Monday+0.005
 weeks+0.005
Neg logit contributions
ishly-0.005
 vibration-0.005
ling-0.005
 sorry-0.005
 trigger-0.005
TokenThe
Token ID464
Feature activation+1.67e-01

Pos logit contributions
 pic+0.008
 Christmas+0.007
ine+0.007
 jars+0.007
 strips+0.007
Neg logit contributions
ling-0.008
ishly-0.008
 gru-0.008
 sorry-0.008
oding-0.008
Token moral
Token ID6573
Feature activation-7.20e-01

Pos logit contributions
 sorry+0.004
ling+0.004
 gru+0.004
ishly+0.003
 relief+0.003
Neg logit contributions
 pic-0.003
ine-0.003
 Christmas-0.003
 jars-0.003
aters-0.003
Token that
Token ID326
Feature activation+3.74e+00

Pos logit contributions
 pic+0.002
 Christmas+0.002
ine+0.002
 jars+0.002
 Monday+0.002
Neg logit contributions
 sorry-0.003
ling-0.003
ishly-0.003
 gru-0.003
 terrified-0.003
Token she
Token ID673
Feature activation+6.68e-01

Pos logit contributions
 sorry+0.093
ling+0.085
ishly+0.083
 gru+0.083
 apologise+0.082
Neg logit contributions
 pic-0.069
ine-0.064
 Christmas-0.062
ast-0.061
 strips-0.061
Token could
Token ID714
Feature activation-9.44e-02

Pos logit contributions
ling+0.017
ishly+0.017
 sorry+0.017
 gru+0.017
oding+0.017
Neg logit contributions
 pic-0.010
 Christmas-0.010
ine-0.009
 weeks-0.009
 Monday-0.009
Token make
Token ID787
Feature activation+1.59e-02

Pos logit contributions
 pic+0.002
 Christmas+0.002
ine+0.002
 weeks+0.002
 jars+0.002
Neg logit contributions
ling-0.002
 sorry-0.002
ishly-0.002
 gru-0.002
 terrified-0.002
Token a
Token ID257
Feature activation+2.08e+00

Pos logit contributions
ishly+0.000
ling+0.000
 gru+0.000
hing+0.000
 terrified+0.000
Neg logit contributions
 pic-0.000
 Christmas-0.000
ine-0.000
 jars-0.000
 weeks-0.000
Token difference
Token ID3580
Feature activation+5.26e+00

Pos logit contributions
hing+0.054
ishly+0.053
rified+0.051
ling+0.051
oding+0.049
Neg logit contributions
 pic-0.038
 Christmas-0.035
 jars-0.033
 sugar-0.031
 weeks-0.031
Token just
Token ID655
Feature activation-4.39e-01

Pos logit contributions
ling+0.133
ishly+0.131
 sorry+0.131
 gru+0.129
 terrified+0.127
Neg logit contributions
 pic-0.085
 Christmas-0.082
ine-0.077
 weeks-0.074
 jars-0.074
Token by
Token ID416
Feature activation+5.87e-01

Pos logit contributions
 pic+0.007
 Christmas+0.007
ine+0.006
 weeks+0.006
 Monday+0.006
Neg logit contributions
ling-0.011
ishly-0.011
 gru-0.011
 sorry-0.011
oding-0.011
Token caring
Token ID18088
Feature activation-1.25e+00

Pos logit contributions
 sorry+0.012
ling+0.012
ishly+0.012
 gru+0.012
 terrified+0.011
Neg logit contributions
 pic-0.012
 Christmas-0.012
ine-0.011
 Monday-0.011
 weeks-0.011
Token.
Token ID13
Feature activation+2.07e+00

Pos logit contributions
 pic+0.023
 Christmas+0.021
 Monday+0.020
ine+0.020
 hotels+0.020
Neg logit contributions
ling-0.030
 sorry-0.029
ishly-0.028
hing-0.026
 vibration-0.026
Token<|endoftext|>
Token ID50256
Feature activation+1.68e+00

Pos logit contributions
 sorry+0.040
ishly+0.039
 relieved+0.038
ling+0.038
 terrified+0.037
Neg logit contributions
 pic-0.045
 Christmas-0.043
ine-0.042
 jars-0.041
 strips-0.041
Token.
Token ID13
Feature activation-2.31e-01

Pos logit contributions
ling+0.016
 sorry+0.016
ishly+0.015
 gru+0.014
 terrified+0.014
Neg logit contributions
 pic-0.022
 strips-0.020
 Christmas-0.020
 jars-0.020
 weeks-0.019
Token She
Token ID1375
Feature activation-1.06e+00

Pos logit contributions
 pic+0.005
ine+0.005
 passports+0.005
ast+0.005
 strips+0.005
Neg logit contributions
 sorry-0.004
 relief-0.004
 vibration-0.004
 gru-0.004
ishly-0.004
Token was
Token ID373
Feature activation+1.12e+00

Pos logit contributions
 pic+0.028
aters+0.027
 strips+0.026
 jars+0.026
 hotels+0.026
Neg logit contributions
ling-0.017
 sorry-0.016
ined-0.014
 shocked-0.014
 terrified-0.014
Token three
Token ID1115
Feature activation-2.09e+00

Pos logit contributions
 sorry+0.025
 terrified+0.022
ling+0.022
ishly+0.022
 alarmed+0.022
Neg logit contributions
 pic-0.024
ine-0.022
 Christmas-0.021
 weeks-0.021
 strips-0.021
Token-
Token ID12
Feature activation-1.98e+00

Pos logit contributions
 pic+0.031
 Monday+0.031
 Christmas+0.029
 Wed+0.028
ine+0.027
Neg logit contributions
ling-0.060
 sorry-0.055
 shocked-0.054
 terrified-0.053
 gru-0.052
Tokenyears
Token ID19002
Feature activation-6.98e+00

Pos logit contributions
 pic+0.045
 Christmas+0.043
ine+0.041
 Monday+0.040
 strips+0.040
Neg logit contributions
ling-0.038
 sorry-0.037
ishly-0.037
 gru-0.036
oding-0.036
Token-
Token ID12
Feature activation-1.88e+00

Pos logit contributions
 pic+0.144
 Monday+0.137
 hotels+0.129
 Christmas+0.128
 Saturday+0.126
Neg logit contributions
ling-0.164
ag-0.146
 gru-0.143
 sorry-0.141
cut-0.136
Tokenold
Token ID727
Feature activation+1.02e+00

Pos logit contributions
 pic+0.054
 Christmas+0.051
ine+0.050
 Monday+0.050
 strips+0.049
Neg logit contributions
ling-0.025
 sorry-0.023
 gru-0.023
oding-0.023
ishly-0.022
Token and
Token ID290
Feature activation+1.97e-01

Pos logit contributions
ishly+0.024
ling+0.024
 sorry+0.024
 gru+0.023
 relieved+0.022
Neg logit contributions
 pic-0.020
 Monday-0.018
 Christmas-0.017
 planned-0.016
 hotels-0.016
Token she
Token ID673
Feature activation-9.17e-02

Pos logit contributions
 sorry+0.003
ling+0.003
ishly+0.003
 gru+0.003
oding+0.003
Neg logit contributions
 pic-0.005
 Christmas-0.005
ine-0.005
 strips-0.004
 jars-0.004
Token loved
Token ID6151
Feature activation-2.10e+00

Pos logit contributions
 pic+0.003
aters+0.002
 strips+0.002
 hotels+0.002
 jars+0.002
Neg logit contributions
ling-0.001
 sorry-0.001
 shocked-0.001
 relieved-0.001
 terrified-0.001
Token girl
Token ID2576
Feature activation+8.87e-01

Pos logit contributions
 sorry+0.005
ling+0.005
ishly+0.004
 gru+0.004
 vibration+0.004
Neg logit contributions
 pic-0.007
ine-0.006
 Christmas-0.006
 weeks-0.006
ast-0.006
Token who
Token ID508
Feature activation+1.81e-01

Pos logit contributions
ling+0.014
 sorry+0.014
ishly+0.013
 gru+0.012
cut+0.012
Neg logit contributions
 pic-0.023
 strips-0.021
 jars-0.021
 weeks-0.020
aters-0.020
Token was
Token ID373
Feature activation+1.22e+00

Pos logit contributions
 sorry+0.003
ling+0.002
 relieved+0.002
 relief+0.002
hing+0.002
Neg logit contributions
 pic-0.005
 strips-0.005
aters-0.005
 jars-0.005
ine-0.005
Token three
Token ID1115
Feature activation-1.87e+00

Pos logit contributions
 sorry+0.026
ling+0.025
ishly+0.025
 gru+0.024
 terrified+0.024
Neg logit contributions
 pic-0.025
 Christmas-0.024
ine-0.023
 weeks-0.023
 strips-0.023
Token-
Token ID12
Feature activation-1.56e+00

Pos logit contributions
 pic+0.028
 Monday+0.027
 Christmas+0.026
ine+0.025
 Wed+0.024
Neg logit contributions
ling-0.053
 sorry-0.049
 terrified-0.048
 gru-0.047
 shocked-0.047
Tokenyears
Token ID19002
Feature activation-6.96e+00

Pos logit contributions
 pic+0.033
 Christmas+0.032
ine+0.030
 weeks+0.030
 jars+0.029
Neg logit contributions
ling-0.032
ishly-0.032
 sorry-0.031
 gru-0.031
 vibration-0.030
Token-
Token ID12
Feature activation-1.62e+00

Pos logit contributions
 pic+0.140
 Monday+0.130
 hotels+0.123
 Christmas+0.122
ine+0.120
Neg logit contributions
ling-0.166
ag-0.148
 sorry-0.146
 gru-0.146
oding-0.140
Tokenold
Token ID727
Feature activation+1.31e+00

Pos logit contributions
 pic+0.044
 Christmas+0.042
ine+0.040
 strips+0.040
 Monday+0.040
Neg logit contributions
ling-0.024
 sorry-0.023
ishly-0.023
 gru-0.023
oding-0.022
Token.
Token ID13
Feature activation+2.82e-02

Pos logit contributions
ling+0.029
ishly+0.029
 sorry+0.029
 gru+0.028
 terrified+0.027
Neg logit contributions
 pic-0.027
 Monday-0.024
 Christmas-0.024
ine-0.023
 hotels-0.022
Token She
Token ID1375
Feature activation-1.05e+00

Pos logit contributions
 sorry+0.000
 hi+0.000
 gru+0.000
 relief+0.000
 relieved+0.000
Neg logit contributions
ine-0.001
ast-0.001
 pic-0.001
 passports-0.001
ri-0.001
Token had
Token ID550
Feature activation-2.08e+00

Pos logit contributions
aters+0.032
 pic+0.032
 jars+0.031
ait+0.031
 hotels+0.031
Neg logit contributions
ling-0.011
 sorry-0.010
ined-0.010
 shocked-0.010
 relieved-0.010
Token She
Token ID1375
Feature activation-1.13e+00

Pos logit contributions
 pic+0.007
 jars+0.006
 teaspoons+0.006
ona+0.006
 Christmas+0.006
Neg logit contributions
 sorry-0.007
oding-0.007
ishly-0.007
hing-0.007
 vibration-0.007
Token was
Token ID373
Feature activation+1.39e+00

Pos logit contributions
 pic+0.031
aters+0.030
 jars+0.030
 strips+0.029
 hotels+0.029
Neg logit contributions
ling-0.016
 sorry-0.016
 shocked-0.014
 relieved-0.014
 terrified-0.014
Token just
Token ID655
Feature activation+1.18e+00

Pos logit contributions
 sorry+0.031
ishly+0.029
 terrified+0.028
ling+0.028
 alarmed+0.028
Neg logit contributions
 pic-0.028
ine-0.026
 Christmas-0.026
 strips-0.026
 weeks-0.026
Token three
Token ID1115
Feature activation-2.34e+00

Pos logit contributions
ling+0.030
ishly+0.030
 sorry+0.030
 gru+0.029
 terrified+0.029
Neg logit contributions
 pic-0.019
 Christmas-0.018
ine-0.017
 weeks-0.016
 jars-0.016
Token-
Token ID12
Feature activation-1.69e+00

Pos logit contributions
 Monday+0.038
 pic+0.036
 Christmas+0.035
 Wed+0.034
 Saturday+0.034
Neg logit contributions
ling-0.065
 sorry-0.057
 terrified-0.056
 shocked-0.056
ivering-0.055
Tokenyears
Token ID19002
Feature activation-6.92e+00

Pos logit contributions
 pic+0.040
 Christmas+0.038
ine+0.036
 strips+0.035
 Monday+0.035
Neg logit contributions
ling-0.032
 sorry-0.030
ishly-0.030
 gru-0.030
oding-0.030
Token-
Token ID12
Feature activation-1.77e+00

Pos logit contributions
 pic+0.146
 Monday+0.139
 hotels+0.130
 Christmas+0.129
 Saturday+0.129
Neg logit contributions
ling-0.160
ag-0.142
 gru-0.137
 sorry-0.136
cut-0.132
Tokenold
Token ID727
Feature activation+9.99e-01

Pos logit contributions
 pic+0.052
 Christmas+0.050
 strips+0.048
 Monday+0.048
ine+0.048
Neg logit contributions
ling-0.022
 sorry-0.021
oding-0.020
ishly-0.020
 gru-0.020
Token and
Token ID290
Feature activation+4.12e-01

Pos logit contributions
ishly+0.025
ling+0.024
 sorry+0.024
 gru+0.023
 terrified+0.022
Neg logit contributions
 pic-0.019
 Monday-0.017
 Christmas-0.017
 Saturday-0.016
 hotels-0.016
Token loved
Token ID6151
Feature activation-2.01e+00

Pos logit contributions
ling+0.008
 sorry+0.008
ishly+0.007
 gru+0.007
 terrified+0.007
Neg logit contributions
 pic-0.010
 Christmas-0.009
ine-0.009
 weeks-0.009
 jars-0.009
Token to
Token ID284
Feature activation-1.75e-01

Pos logit contributions
ine+0.040
ara+0.039
ast+0.036
ona+0.035
azines+0.034
Neg logit contributions
 sorry-0.043
 weeping-0.042
 crying-0.042
 noise-0.039
ling-0.039
Token He
Token ID679
Feature activation-1.16e-01

Pos logit contributions
 pic+0.006
 Christmas+0.005
ine+0.005
 weeks+0.005
 jars+0.005
Neg logit contributions
 sorry-0.008
 gru-0.007
ishly-0.007
ling-0.007
 vibration-0.007
Token was
Token ID373
Feature activation+8.75e-01

Pos logit contributions
 pic+0.003
aters+0.003
 jars+0.003
 hotels+0.003
 strips+0.003
Neg logit contributions
ling-0.002
 sorry-0.002
 terrified-0.002
ined-0.002
 gru-0.002
Token only
Token ID691
Feature activation+1.36e+00

Pos logit contributions
 sorry+0.022
ling+0.021
ishly+0.021
 gru+0.020
 terrified+0.020
Neg logit contributions
 pic-0.015
 Christmas-0.014
ine-0.014
 weeks-0.013
 jars-0.013
Token three
Token ID1115
Feature activation-2.15e+00

Pos logit contributions
 sorry+0.035
ling+0.035
 vibration+0.034
 gru+0.034
ishly+0.034
Neg logit contributions
 pic-0.023
 Christmas-0.020
 weeks-0.019
ine-0.019
 jars-0.018
Token-
Token ID12
Feature activation-1.81e+00

Pos logit contributions
 pic+0.034
 Monday+0.032
 Christmas+0.031
 Sale+0.029
ine+0.029
Neg logit contributions
ling-0.061
 sorry-0.054
 shocked-0.054
 terrified-0.053
 gru-0.053
Tokenyears
Token ID19002
Feature activation-6.85e+00

Pos logit contributions
 pic+0.041
 Christmas+0.039
ine+0.037
 jars+0.036
 Monday+0.036
Neg logit contributions
ling-0.035
 sorry-0.034
ishly-0.034
 gru-0.034
oding-0.033
Token-
Token ID12
Feature activation-1.86e+00

Pos logit contributions
 pic+0.150
 Monday+0.139
 hotels+0.130
 Christmas+0.129
 Saturday+0.128
Neg logit contributions
ling-0.155
ag-0.137
 gru-0.133
 sorry-0.130
cut-0.128
Tokenold
Token ID727
Feature activation+1.11e+00

Pos logit contributions
 pic+0.054
 Christmas+0.051
 strips+0.049
ine+0.049
 Monday+0.049
Neg logit contributions
ling-0.025
 sorry-0.023
 gru-0.023
oding-0.023
hing-0.022
Token and
Token ID290
Feature activation+5.01e-01

Pos logit contributions
ling+0.028
ishly+0.027
 sorry+0.027
 gru+0.027
 terrified+0.026
Neg logit contributions
 pic-0.019
 Christmas-0.018
 Monday-0.017
ine-0.016
 weeks-0.016
Token he
Token ID339
Feature activation+1.12e+00

Pos logit contributions
ling+0.010
ishly+0.010
 gru+0.010
 sorry+0.010
 vibration+0.010
Neg logit contributions
 Christmas-0.011
 pic-0.010
ine-0.010
 weeks-0.010
 Monday-0.010
Token was
Token ID373
Feature activation+1.59e+00

Pos logit contributions
ling+0.016
 sorry+0.016
 terrified+0.015
 shocked+0.015
 relieved+0.015
Neg logit contributions
 pic-0.031
aters-0.029
 strips-0.029
 hotels-0.029
 jars-0.028
Token.
Token ID13
Feature activation-2.33e-01

Pos logit contributions
 pic+0.034
 Christmas+0.031
 jars+0.030
 strips+0.030
 weeks+0.030
Neg logit contributions
ling-0.045
 sorry-0.042
ishly-0.042
oding-0.041
 gru-0.041
Token She
Token ID1375
Feature activation-1.14e+00

Pos logit contributions
 pic+0.004
 Christmas+0.004
ine+0.004
 strips+0.004
 jars+0.004
Neg logit contributions
ling-0.005
 sorry-0.005
ishly-0.005
 gru-0.005
oding-0.005
Token was
Token ID373
Feature activation+1.23e+00

Pos logit contributions
 pic+0.031
aters+0.029
 jars+0.029
 strips+0.029
 hotels+0.029
Neg logit contributions
ling-0.017
 sorry-0.016
 relieved-0.015
ined-0.015
 shocked-0.015
Token three
Token ID1115
Feature activation-2.08e+00

Pos logit contributions
 sorry+0.028
 terrified+0.025
ishly+0.025
 shocked+0.025
 alarmed+0.025
Neg logit contributions
 pic-0.025
ine-0.023
 strips-0.023
 weeks-0.023
 jars-0.023
Token-
Token ID12
Feature activation-1.53e+00

Pos logit contributions
 pic+0.030
 Monday+0.029
 Christmas+0.028
 Wed+0.025
 Saturday+0.025
Neg logit contributions
ling-0.061
 sorry-0.055
 terrified-0.054
 shocked-0.054
 gru-0.053
Tokenyears
Token ID19002
Feature activation-6.82e+00

Pos logit contributions
 pic+0.034
 Christmas+0.032
ine+0.031
 Monday+0.030
 jars+0.030
Neg logit contributions
ling-0.030
 sorry-0.030
ishly-0.030
 gru-0.029
oding-0.028
Token-
Token ID12
Feature activation-1.49e+00

Pos logit contributions
 pic+0.140
 Monday+0.133
 hotels+0.124
 Christmas+0.123
 Saturday+0.122
Neg logit contributions
ling-0.161
ag-0.141
 gru-0.138
 sorry-0.138
cut-0.134
Tokenold
Token ID727
Feature activation+1.34e+00

Pos logit contributions
 pic+0.042
 Christmas+0.040
ine+0.039
 strips+0.038
 Monday+0.038
Neg logit contributions
ling-0.021
 sorry-0.020
ishly-0.019
 gru-0.019
oding-0.019
Token and
Token ID290
Feature activation+2.34e-01

Pos logit contributions
ishly+0.035
ling+0.035
 sorry+0.035
 gru+0.034
 terrified+0.033
Neg logit contributions
 pic-0.022
 Monday-0.019
 Christmas-0.019
ine-0.018
 planned-0.017
Token she
Token ID673
Feature activation+2.93e-02

Pos logit contributions
ling+0.004
ishly+0.004
 sorry+0.004
 gru+0.004
 terrified+0.004
Neg logit contributions
 pic-0.005
 Christmas-0.005
ine-0.005
 weeks-0.005
 Monday-0.005
Token had
Token ID550
Feature activation-2.01e+00

Pos logit contributions
ling+0.000
 sorry+0.000
 terrified+0.000
 relieved+0.000
ishly+0.000
Neg logit contributions
 pic-0.001
 strips-0.001
 Christmas-0.001
 jars-0.001
 hotels-0.001
Token still
Token ID991
Feature activation+1.90e+00

Pos logit contributions
ling+0.016
 terrified+0.015
 sorry+0.015
 relieved+0.015
ishly+0.015
Neg logit contributions
 pic-0.022
aters-0.021
 hotels-0.020
 Christmas-0.020
 strips-0.020
Token wanted
Token ID2227
Feature activation-3.81e-01

Pos logit contributions
ling+0.039
 sorry+0.038
ishly+0.038
 gru+0.037
 terrified+0.036
Neg logit contributions
 pic-0.040
 Christmas-0.039
ine-0.037
 weeks-0.036
 jars-0.036
Token to
Token ID284
Feature activation-1.06e-01

Pos logit contributions
ine+0.005
aters+0.005
 pic+0.005
ast+0.005
ara+0.005
Neg logit contributions
 sorry-0.010
 gru-0.010
ling-0.010
 alarmed-0.010
 terrified-0.010
Token eat
Token ID4483
Feature activation-5.24e-01

Pos logit contributions
 pic+0.002
 Christmas+0.002
 weeks+0.002
ine+0.002
 jars+0.002
Neg logit contributions
 sorry-0.002
ishly-0.002
ling-0.002
 apologise-0.002
 vibration-0.002
Token something
Token ID1223
Feature activation+1.02e+00

Pos logit contributions
 pic+0.010
 Christmas+0.010
ine+0.009
 weeks+0.009
 Monday+0.009
Neg logit contributions
ling-0.012
 sorry-0.012
ishly-0.011
 gru-0.011
 terrified-0.011
Token scr
Token ID6040
Feature activation-6.80e+00

Pos logit contributions
ling+0.022
 sorry+0.022
 gru+0.021
ishly+0.021
 terrified+0.020
Neg logit contributions
 pic-0.020
 Christmas-0.019
ine-0.019
 weeks-0.018
 strips-0.018
Tokenum
Token ID388
Feature activation-6.20e-01

Pos logit contributions
 pic+0.164
 Christmas+0.161
 hotels+0.153
 weeks+0.152
 jars+0.152
Neg logit contributions
ling-0.118
 sorry-0.113
ishly-0.108
ivering-0.107
 alarmed-0.106
Tokenpt
Token ID457
Feature activation+1.84e-01

Pos logit contributions
 pic+0.011
 Christmas+0.010
 Monday+0.010
 hotels+0.010
ine+0.010
Neg logit contributions
ling-0.015
 sorry-0.014
ishly-0.014
cut-0.014
 gru-0.014
Tokenious
Token ID699
Feature activation-2.02e+00

Pos logit contributions
ling+0.005
ishly+0.005
 sorry+0.005
 gru+0.005
 terrified+0.005
Neg logit contributions
 pic-0.003
 Christmas-0.003
ine-0.003
 Monday-0.002
 jars-0.002
Token.
Token ID13
Feature activation+9.04e-01

Pos logit contributions
ara+0.042
ine+0.042
 pic+0.042
aters+0.042
 hotels+0.041
Neg logit contributions
cut-0.037
ling-0.037
 sorry-0.036
ishly-0.035
 gru-0.035
Token
Token ID220
Feature activation-4.24e-01

Pos logit contributions
ling+0.020
 sorry+0.020
ishly+0.020
 gru+0.020
 terrified+0.020
Neg logit contributions
 pic-0.017
 Christmas-0.016
ine-0.015
 weeks-0.015
 jars-0.015
Token that
Token ID326
Feature activation+2.11e+00

Pos logit contributions
ling+0.047
ishly+0.046
 sorry+0.046
 gru+0.045
 terrified+0.044
Neg logit contributions
 pic-0.040
 Christmas-0.039
ine-0.037
 weeks-0.036
 jars-0.035
Token even
Token ID772
Feature activation-8.31e-01

Pos logit contributions
ling+0.047
 sorry+0.046
ishly+0.046
 gru+0.045
 terrified+0.044
Neg logit contributions
 pic-0.041
 Christmas-0.039
ine-0.037
 weeks-0.036
 jars-0.036
Token a
Token ID257
Feature activation-6.68e-01

Pos logit contributions
 pic+0.015
 Christmas+0.014
ine+0.013
 weeks+0.013
 Monday+0.013
Neg logit contributions
ling-0.020
ishly-0.020
 sorry-0.019
 gru-0.019
 terrified-0.019
Token three
Token ID1115
Feature activation-2.30e+00

Pos logit contributions
 pic+0.014
 Christmas+0.013
ine+0.013
 jars+0.013
 weeks+0.012
Neg logit contributions
 sorry-0.014
 gru-0.014
ling-0.013
ishly-0.013
 terrified-0.013
Token-
Token ID12
Feature activation-3.05e+00

Pos logit contributions
 pic+0.032
 Christmas+0.030
ine+0.028
 Monday+0.028
 jars+0.027
Neg logit contributions
ling-0.063
 gru-0.063
 sorry-0.062
ishly-0.060
 terrified-0.060
Tokenyear
Token ID1941
Feature activation-6.78e+00

Pos logit contributions
 pic+0.075
 Christmas+0.074
ine+0.071
 weeks+0.069
 Monday+0.069
Neg logit contributions
ling-0.051
ishly-0.050
 sorry-0.050
 gru-0.049
 terrified-0.047
Token-
Token ID12
Feature activation-3.05e+00

Pos logit contributions
 pic+0.137
 Monday+0.122
 jars+0.120
ine+0.119
aters+0.118
Neg logit contributions
ling-0.159
 gru-0.156
 sorry-0.152
cut-0.145
 terrified-0.141
Tokenold
Token ID727
Feature activation+8.71e-01

Pos logit contributions
 pic+0.084
 Christmas+0.080
ine+0.078
 jars+0.077
 strips+0.076
Neg logit contributions
ling-0.043
 sorry-0.043
 gru-0.042
ishly-0.041
oding-0.041
Token can
Token ID460
Feature activation-1.37e+00

Pos logit contributions
 sorry+0.019
ishly+0.019
ling+0.019
 gru+0.018
 terrified+0.018
Neg logit contributions
 pic-0.018
 Christmas-0.017
ine-0.016
 jars-0.016
 Monday-0.016
Token deliver
Token ID5203
Feature activation-8.73e-01

Pos logit contributions
 pic+0.025
 Christmas+0.023
ine+0.022
 jars+0.022
 weeks+0.021
Neg logit contributions
ling-0.032
ishly-0.032
 sorry-0.032
 gru-0.031
hing-0.031
Token something
Token ID1223
Feature activation+5.86e-02

Pos logit contributions
 pic+0.016
 Christmas+0.015
ine+0.014
ast+0.014
 Monday+0.014
Neg logit contributions
 sorry-0.020
ling-0.020
 gru-0.020
ishly-0.019
oding-0.019
Token.
Token ID13
Feature activation-2.14e-02

Pos logit contributions
ishly+0.002
ling+0.002
 sorry+0.002
 gru+0.002
hing+0.002
Neg logit contributions
 Christmas-0.001
 pic-0.001
ine-0.001
 Monday-0.001
 weeks-0.001
Token She
Token ID1375
Feature activation-1.14e+00

Pos logit contributions
 pic+0.000
ine+0.000
 weeks+0.000
 passports+0.000
ast+0.000
Neg logit contributions
 sorry-0.000
 gru-0.000
ishly-0.000
 hi-0.000
 relief-0.000
Token was
Token ID373
Feature activation+1.35e+00

Pos logit contributions
 pic+0.029
aters+0.028
ine+0.027
 strips+0.027
 hotels+0.027
Neg logit contributions
ling-0.019
 sorry-0.017
 shocked-0.017
 relieved-0.017
 terrified-0.017
Token three
Token ID1115
Feature activation-1.84e+00

Pos logit contributions
 sorry+0.029
ling+0.028
ishly+0.027
 gru+0.027
 terrified+0.027
Neg logit contributions
 pic-0.029
 Christmas-0.027
ine-0.026
 weeks-0.026
 strips-0.025
Token-
Token ID12
Feature activation-1.83e+00

Pos logit contributions
 Monday+0.030
 pic+0.028
 Wed+0.027
ine+0.027
 Christmas+0.027
Neg logit contributions
ling-0.052
 sorry-0.045
 terrified-0.044
 shocked-0.044
 relieved-0.043
Tokenyears
Token ID19002
Feature activation-6.78e+00

Pos logit contributions
 pic+0.039
 Christmas+0.038
ine+0.036
 weeks+0.035
 jars+0.035
Neg logit contributions
ling-0.037
ishly-0.036
 sorry-0.036
 gru-0.035
 terrified-0.035
Token old
Token ID1468
Feature activation+1.20e+00

Pos logit contributions
 pic+0.133
 Monday+0.127
 Christmas+0.119
ine+0.118
 hotels+0.118
Neg logit contributions
ling-0.164
ag-0.143
 gru-0.143
 sorry-0.142
cut-0.137
Token and
Token ID290
Feature activation+5.66e-01

Pos logit contributions
ling+0.031
ishly+0.030
 sorry+0.030
 gru+0.029
 terrified+0.029
Neg logit contributions
 pic-0.019
 Monday-0.017
ine-0.017
 Christmas-0.016
ast-0.016
Token she
Token ID673
Feature activation-1.95e-01

Pos logit contributions
ling+0.011
ishly+0.010
 sorry+0.010
 gru+0.010
 terrified+0.010
Neg logit contributions
 pic-0.013
 Christmas-0.013
ine-0.012
 weeks-0.012
 Monday-0.012
Token loved
Token ID6151
Feature activation-1.80e+00

Pos logit contributions
 pic+0.006
 strips+0.005
aters+0.005
 hotels+0.005
 jars+0.005
Neg logit contributions
ling-0.003
 sorry-0.003
 terrified-0.002
 relieved-0.002
 shocked-0.002
Token to
Token ID284
Feature activation+1.76e-02

Pos logit contributions
ine+0.034
ara+0.031
ast+0.030
em+0.028
ait+0.028
Neg logit contributions
 sorry-0.041
 weeping-0.039
 hi-0.038
ling-0.038
 crying-0.038
Token She
Token ID1375
Feature activation-1.07e+00

Pos logit contributions
 pic+0.010
 Christmas+0.010
ine+0.008
 passports+0.008
 weeks+0.008
Neg logit contributions
iece-0.009
 sorry-0.009
oding-0.009
 vibration-0.009
 smug-0.009
Token was
Token ID373
Feature activation+1.02e+00

Pos logit contributions
 pic+0.028
 jars+0.026
 Christmas+0.026
 strips+0.026
aters+0.026
Neg logit contributions
ling-0.017
 sorry-0.016
 terrified-0.015
 vibration-0.015
hing-0.015
Token only
Token ID691
Feature activation+1.43e+00

Pos logit contributions
 sorry+0.023
ling+0.021
 terrified+0.021
ishly+0.021
 alarmed+0.021
Neg logit contributions
 pic-0.020
ine-0.019
 Christmas-0.019
 weeks-0.018
 jars-0.018
Token three
Token ID1115
Feature activation-2.07e+00

Pos logit contributions
 sorry+0.035
 vibration+0.032
ling+0.031
 gru+0.031
 terrified+0.031
Neg logit contributions
 pic-0.028
 Christmas-0.023
 weeks-0.023
ine-0.023
 Monday-0.023
Token-
Token ID12
Feature activation-1.90e+00

Pos logit contributions
 Monday+0.033
 pic+0.032
 Christmas+0.031
 Sale+0.030
Inside+0.030
Neg logit contributions
ling-0.057
 sorry-0.051
 shocked-0.050
 terrified-0.050
ivering-0.049
Tokenyears
Token ID19002
Feature activation-6.78e+00

Pos logit contributions
 pic+0.046
 Christmas+0.043
ine+0.041
 Monday+0.041
 strips+0.040
Neg logit contributions
ling-0.034
 sorry-0.033
ishly-0.032
 gru-0.032
oding-0.031
Token-
Token ID12
Feature activation-1.86e+00

Pos logit contributions
 pic+0.143
 Monday+0.136
 hotels+0.127
 Saturday+0.126
 Christmas+0.125
Neg logit contributions
ling-0.154
ag-0.141
 gru-0.133
 sorry-0.132
cut-0.128
Tokenold
Token ID727
Feature activation+1.13e+00

Pos logit contributions
 pic+0.054
 Christmas+0.050
 Monday+0.049
 strips+0.049
ine+0.048
Neg logit contributions
ling-0.024
oding-0.023
faced-0.022
 sorry-0.022
 gru-0.022
Token,
Token ID11
Feature activation+1.01e+00

Pos logit contributions
ishly+0.028
ling+0.027
 sorry+0.027
 gru+0.026
rified+0.025
Neg logit contributions
 pic-0.020
 Monday-0.019
 Christmas-0.019
 Saturday-0.017
 hotels-0.017
Token but
Token ID475
Feature activation+1.72e+00

Pos logit contributions
 sorry+0.019
 gru+0.018
ling+0.017
 relieved+0.017
 terrified+0.017
Neg logit contributions
 Sale-0.022
 pic-0.021
aters-0.021
 folders-0.021
ast-0.021
Token she
Token ID673
Feature activation-1.52e-02

Pos logit contributions
 sorry+0.042
 gru+0.039
ishly+0.039
 relieved+0.038
ling+0.038
Neg logit contributions
 pic-0.031
ast-0.028
 strips-0.028
ine-0.028
aters-0.028
Token bracelet
Token ID42893
Feature activation+7.89e-01

Pos logit contributions
 sorry+0.027
ling+0.027
ishly+0.027
 gru+0.027
 vibration+0.026
Neg logit contributions
 pic-0.022
 Christmas-0.021
ine-0.020
 weeks-0.020
 Monday-0.019
Token for
Token ID329
Feature activation+5.90e-01

Pos logit contributions
ishly+0.016
ling+0.015
 sorry+0.015
cut+0.015
faced+0.014
Neg logit contributions
 pic-0.017
 hotels-0.016
ine-0.015
 weeks-0.015
 Monday-0.015
Token her
Token ID607
Feature activation-8.22e-01

Pos logit contributions
 sorry+0.014
ishly+0.012
ling+0.012
 relief+0.012
 gru+0.012
Neg logit contributions
ine-0.011
 pic-0.011
aters-0.011
ast-0.010
 folders-0.010
Token three
Token ID1115
Feature activation-1.92e+00

Pos logit contributions
 pic+0.014
 hotels+0.013
ine+0.012
 Christmas+0.012
 jars+0.012
Neg logit contributions
 sorry-0.020
ishly-0.020
ling-0.019
 vibration-0.018
 gru-0.018
Token-
Token ID12
Feature activation-2.38e+00

Pos logit contributions
 Monday+0.038
 pays+0.038
 Min+0.038
 Wed+0.037
 Mon+0.037
Neg logit contributions
ling-0.041
 gru-0.036
ag-0.034
 sorry-0.033
 noise-0.032
Tokenyear
Token ID1941
Feature activation-6.78e+00

Pos logit contributions
 pic+0.063
 Christmas+0.058
ine+0.056
 hotels+0.056
 strips+0.056
Neg logit contributions
ling-0.036
faced-0.035
 gru-0.034
ishly-0.034
oding-0.033
Token-
Token ID12
Feature activation-2.53e+00

Pos logit contributions
 pic+0.166
 Monday+0.153
 Min+0.144
 hotels+0.143
aters+0.142
Neg logit contributions
ling-0.141
 gru-0.121
cut-0.118
ag-0.118
rier-0.116
Tokenold
Token ID727
Feature activation+9.18e-01

Pos logit contributions
 pic+0.072
 Christmas+0.067
 hotels+0.066
 strips+0.066
 Monday+0.066
Neg logit contributions
faced-0.031
ling-0.030
ag-0.029
cut-0.029
 gru-0.029
Token child
Token ID1200
Feature activation+6.95e-01

Pos logit contributions
ishly+0.018
 sorry+0.018
ling+0.017
 gru+0.016
 hi+0.016
Neg logit contributions
 pic-0.021
 Christmas-0.019
 Monday-0.019
 weeks-0.018
 hotels-0.018
Token.
Token ID13
Feature activation+1.71e-01

Pos logit contributions
ishly+0.012
ling+0.012
 sorry+0.011
faced+0.010
 relieved+0.010
Neg logit contributions
 pic-0.016
 weeks-0.015
 hotels-0.015
 Christmas-0.015
 Monday-0.014
Token The
Token ID383
Feature activation-2.87e-01

Pos logit contributions
 relieved+0.003
ishly+0.003
 sorry+0.003
 hi+0.003
 gru+0.002
Neg logit contributions
 weeks-0.004
ine-0.004
 Christmas-0.004
 jars-0.004
 pic-0.004
Token.
Token ID13
Feature activation-3.33e-01

Pos logit contributions
ling+0.005
 sorry+0.004
ishly+0.004
 gru+0.004
oding+0.004
Neg logit contributions
 pic-0.005
 weeks-0.005
 folders-0.005
 strips-0.005
 jars-0.005
Token He
Token ID679
Feature activation-2.93e-01

Pos logit contributions
 pic+0.006
 Christmas+0.005
 Monday+0.005
 Saturday+0.005
 weeks+0.005
Neg logit contributions
ling-0.009
oding-0.008
ishly-0.008
 vibration-0.008
 sorry-0.008
Token was
Token ID373
Feature activation+9.54e-01

Pos logit contributions
 pic+0.008
aters+0.008
 jars+0.008
ait+0.008
 strips+0.008
Neg logit contributions
ling-0.004
 sorry-0.004
ined-0.004
 relieved-0.004
 shocked-0.004
Token three
Token ID1115
Feature activation-2.15e+00

Pos logit contributions
 sorry+0.023
ling+0.022
ishly+0.022
 gru+0.022
 terrified+0.021
Neg logit contributions
 pic-0.017
 Christmas-0.016
ine-0.016
 weeks-0.015
 jars-0.015
Token-
Token ID12
Feature activation-1.60e+00

Pos logit contributions
 pic+0.030
 Monday+0.029
 Christmas+0.028
ine+0.027
 Wed+0.026
Neg logit contributions
ling-0.064
 sorry-0.057
 terrified-0.057
 gru-0.056
 shocked-0.056
Tokenyears
Token ID19002
Feature activation-6.77e+00

Pos logit contributions
 pic+0.031
 Christmas+0.031
ine+0.029
 weeks+0.029
 jars+0.028
Neg logit contributions
ishly-0.034
 sorry-0.034
ling-0.034
 gru-0.033
 vibration-0.033
Token-
Token ID12
Feature activation-1.66e+00

Pos logit contributions
 pic+0.136
 Monday+0.127
 hotels+0.120
 Christmas+0.120
ine+0.118
Neg logit contributions
ling-0.162
 gru-0.143
 sorry-0.141
ag-0.141
cut-0.138
Tokenold
Token ID727
Feature activation+1.20e+00

Pos logit contributions
 pic+0.044
 Christmas+0.043
ine+0.041
 weeks+0.040
 Monday+0.040
Neg logit contributions
ling-0.025
 sorry-0.025
ishly-0.024
 gru-0.024
oding-0.023
Token and
Token ID290
Feature activation+1.34e-01

Pos logit contributions
ling+0.033
ishly+0.033
 sorry+0.033
 gru+0.032
 terrified+0.032
Neg logit contributions
 pic-0.017
 Christmas-0.016
 Monday-0.015
ine-0.015
 weeks-0.014
Token liked
Token ID8288
Feature activation-3.34e+00

Pos logit contributions
ling+0.003
ishly+0.003
 vibration+0.003
 gru+0.003
hing+0.003
Neg logit contributions
 Christmas-0.003
 pic-0.003
ine-0.003
 weeks-0.003
 Monday-0.003
Token to
Token ID284
Feature activation-1.13e-01

Pos logit contributions
ine+0.060
ara+0.057
ast+0.053
ona+0.050
aters+0.050
Neg logit contributions
 sorry-0.079
 hi-0.074
ling-0.074
 gru-0.073
 weeping-0.073
Token,
Token ID11
Feature activation+1.65e+00

Pos logit contributions
ling+0.008
ishly+0.008
 sorry+0.008
 terrified+0.008
 gru+0.008
Neg logit contributions
 pic-0.008
 Christmas-0.008
 hotels-0.008
aters-0.007
 strips-0.007
Token who
Token ID508
Feature activation+4.25e-01

Pos logit contributions
ishly+0.039
 sorry+0.039
ling+0.039
 gru+0.038
 terrified+0.037
Neg logit contributions
 pic-0.030
 Christmas-0.029
ine-0.027
 weeks-0.026
 jars-0.026
Token was
Token ID373
Feature activation+1.01e+00

Pos logit contributions
ling+0.009
 terrified+0.009
 sorry+0.009
hing+0.008
ishly+0.008
Neg logit contributions
 pic-0.009
 Christmas-0.008
 weeks-0.008
 hotels-0.008
aters-0.008
Token three
Token ID1115
Feature activation-2.20e+00

Pos logit contributions
ling+0.022
ishly+0.021
 sorry+0.021
 gru+0.021
 terrified+0.020
Neg logit contributions
 pic-0.020
 Christmas-0.020
ine-0.019
 weeks-0.018
 jars-0.018
Token-
Token ID12
Feature activation-2.24e+00

Pos logit contributions
 pic+0.032
 Christmas+0.032
 Monday+0.030
 Wed+0.028
 Saturday+0.027
Neg logit contributions
ling-0.065
 terrified-0.060
 gru-0.059
 sorry-0.059
 shocked-0.058
Tokenyears
Token ID19002
Feature activation-6.77e+00

Pos logit contributions
 pic+0.046
 Christmas+0.045
ine+0.043
 weeks+0.042
 jars+0.041
Neg logit contributions
ling-0.047
ishly-0.046
 sorry-0.046
 gru-0.045
 terrified-0.044
Token-
Token ID12
Feature activation-2.18e+00

Pos logit contributions
 pic+0.135
 Monday+0.126
 Christmas+0.124
 hotels+0.120
 Saturday+0.117
Neg logit contributions
ling-0.166
 gru-0.148
 sorry-0.142
ag-0.142
 terrified-0.140
Token old
Token ID1468
Feature activation+6.73e-01

Pos logit contributions
 pic+0.060
 Christmas+0.058
ine+0.055
 Monday+0.055
 weeks+0.054
Neg logit contributions
ling-0.032
 gru-0.030
ishly-0.030
 sorry-0.030
oding-0.029
Token,
Token ID11
Feature activation+7.81e-01

Pos logit contributions
ling+0.017
ishly+0.017
 gru+0.017
 sorry+0.016
 terrified+0.016
Neg logit contributions
 pic-0.011
 Christmas-0.011
 Monday-0.010
ast-0.010
ine-0.010
Token said
Token ID531
Feature activation-1.03e+00

Pos logit contributions
 sorry+0.019
 gru+0.019
ling+0.018
 terrified+0.018
ishly+0.018
Neg logit contributions
 Christmas-0.013
 pic-0.013
ine-0.012
 hotels-0.012
 weeks-0.012
Token to
Token ID284
Feature activation+8.80e-01

Pos logit contributions
 pic+0.012
 Christmas+0.012
ine+0.011
 weeks+0.010
 Monday+0.010
Neg logit contributions
ling-0.030
ishly-0.030
 sorry-0.030
 gru-0.029
 vibration-0.029
Token there
Token ID612
Feature activation+2.92e-01

Pos logit contributions
ling+0.007
ishly+0.007
 sorry+0.007
 gru+0.007
 terrified+0.007
Neg logit contributions
 pic-0.007
 Christmas-0.006
ine-0.006
 weeks-0.006
 jars-0.006
Token teaching
Token ID7743
Feature activation+5.27e-01

Pos logit contributions
ling+0.008
 sorry+0.008
ishly+0.008
 gru+0.008
oding+0.007
Neg logit contributions
 pic-0.004
 Christmas-0.004
ine-0.004
 hotels-0.004
 strips-0.004
Token the
Token ID262
Feature activation+7.81e-01

Pos logit contributions
 sorry+0.012
ling+0.012
ishly+0.012
 apologise+0.011
 gru+0.011
Neg logit contributions
 pic-0.010
 Christmas-0.009
ast-0.009
 Monday-0.009
ine-0.008
Token 3
Token ID513
Feature activation-1.15e+00

Pos logit contributions
ling+0.016
 sorry+0.015
 gru+0.015
 vibration+0.015
ishly+0.014
Neg logit contributions
 pic-0.017
 Christmas-0.016
ine-0.016
ast-0.015
aters-0.015
Token-
Token ID12
Feature activation-1.86e+00

Pos logit contributions
 pic+0.013
 Christmas+0.013
ine+0.011
 weeks+0.011
 jars+0.011
Neg logit contributions
ishly-0.034
ling-0.034
 sorry-0.034
 gru-0.034
oding-0.033
Tokenyear
Token ID1941
Feature activation-6.76e+00

Pos logit contributions
 pic+0.045
 Christmas+0.043
 strips+0.040
ine+0.040
 jars+0.040
Neg logit contributions
ling-0.033
ishly-0.032
 sorry-0.032
 gru-0.032
oding-0.031
Token-
Token ID12
Feature activation-2.21e+00

Pos logit contributions
 pic+0.152
 Monday+0.132
aters+0.127
 hotels+0.126
 Christmas+0.126
Neg logit contributions
ling-0.158
 gru-0.143
 sorry-0.140
cut-0.136
ishly-0.135
Tokenold
Token ID727
Feature activation+1.18e+00

Pos logit contributions
 pic+0.060
 Christmas+0.057
 Monday+0.054
 strips+0.054
ine+0.054
Neg logit contributions
ling-0.034
 sorry-0.032
 gru-0.031
ishly-0.031
oding-0.031
Token girl
Token ID2576
Feature activation-2.75e-01

Pos logit contributions
ling+0.028
ishly+0.028
 sorry+0.028
 gru+0.027
 terrified+0.026
Neg logit contributions
 pic-0.022
 Christmas-0.020
 Monday-0.019
 strips-0.018
 hotels-0.018
Token about
Token ID546
Feature activation-1.16e+00

Pos logit contributions
 pic+0.005
 Christmas+0.005
 Monday+0.005
 weeks+0.004
 hotels+0.004
Neg logit contributions
 sorry-0.006
ling-0.006
ishly-0.006
 gru-0.006
 terrified-0.006
Token the
Token ID262
Feature activation+1.18e+00

Pos logit contributions
 pic+0.020
 Christmas+0.020
ine+0.019
 Monday+0.018
 weeks+0.018
Neg logit contributions
 sorry-0.027
ling-0.027
ishly-0.027
 gru-0.026
 terrified-0.026
Token.
Token ID13
Feature activation-4.60e-01

Pos logit contributions
 pic+0.018
 weeks+0.016
 jars+0.016
 hotels+0.016
 strips+0.016
Neg logit contributions
ling-0.016
 sorry-0.015
ishly-0.014
oding-0.014
 relieved-0.014
Token Daisy
Token ID40355
Feature activation-9.20e-01

Pos logit contributions
 pic+0.008
 Christmas+0.007
ine+0.007
 weeks+0.007
 Monday+0.007
Neg logit contributions
ling-0.011
ishly-0.011
 sorry-0.011
 gru-0.011
oding-0.011
Token was
Token ID373
Feature activation+1.00e+00

Pos logit contributions
 pic+0.026
 hotels+0.025
aters+0.025
 strips+0.025
ait+0.025
Neg logit contributions
ling-0.012
 relieved-0.010
ined-0.010
 sorry-0.010
 shocked-0.010
Token three
Token ID1115
Feature activation-1.96e+00

Pos logit contributions
 sorry+0.023
ling+0.023
ishly+0.022
 gru+0.022
 terrified+0.022
Neg logit contributions
 pic-0.019
 Christmas-0.018
ine-0.017
 weeks-0.017
 strips-0.017
Token-
Token ID12
Feature activation-1.71e+00

Pos logit contributions
 pic+0.029
 Monday+0.028
 Christmas+0.027
ine+0.026
 Wed+0.025
Neg logit contributions
ling-0.056
 sorry-0.051
 shocked-0.050
 terrified-0.050
 gru-0.050
Tokenyears
Token ID19002
Feature activation-6.76e+00

Pos logit contributions
 pic+0.035
 Christmas+0.034
ine+0.032
 weeks+0.031
 jars+0.031
Neg logit contributions
ling-0.036
ishly-0.036
 sorry-0.036
 gru-0.035
 terrified-0.034
Token-
Token ID12
Feature activation-1.80e+00

Pos logit contributions
 pic+0.136
 Monday+0.129
 Christmas+0.120
 hotels+0.119
 Saturday+0.119
Neg logit contributions
ling-0.161
ag-0.144
 gru-0.140
 sorry-0.139
cut-0.136
Tokenold
Token ID727
Feature activation+1.25e+00

Pos logit contributions
 pic+0.049
 Christmas+0.046
 Monday+0.044
ine+0.044
 strips+0.044
Neg logit contributions
ling-0.027
 sorry-0.026
 gru-0.026
oding-0.025
ishly-0.025
Token,
Token ID11
Feature activation+6.93e-01

Pos logit contributions
ling+0.031
ishly+0.031
 sorry+0.031
 gru+0.030
 terrified+0.029
Neg logit contributions
 pic-0.022
 Christmas-0.020
 Monday-0.020
ine-0.019
 Saturday-0.018
Token and
Token ID290
Feature activation+2.83e-02

Pos logit contributions
 sorry+0.014
 gru+0.014
 relieved+0.013
 terrified+0.013
ling+0.013
Neg logit contributions
 Sale-0.013
 weeks-0.013
 folders-0.013
ell-0.013
aters-0.013
Token she
Token ID673
Feature activation-1.58e-01

Pos logit contributions
 sorry+0.001
ishly+0.001
ling+0.001
 gru+0.001
 terrified+0.001
Neg logit contributions
 pic-0.001
 Christmas-0.000
ine-0.000
 strips-0.000
 jars-0.000
Token She
Token ID1375
Feature activation-1.11e+00

Pos logit contributions
 pic+0.006
ine+0.005
ast+0.005
 Christmas+0.005
 weeks+0.005
Neg logit contributions
 sorry-0.006
ishly-0.006
hing-0.006
oding-0.006
ined-0.006
Token was
Token ID373
Feature activation+1.38e+00

Pos logit contributions
 pic+0.028
 jars+0.026
aters+0.025
 strips+0.025
 hotels+0.025
Neg logit contributions
ling-0.019
 sorry-0.019
 gru-0.017
 relieved-0.017
 terrified-0.017
Token only
Token ID691
Feature activation+1.49e+00

Pos logit contributions
 sorry+0.031
 terrified+0.028
ishly+0.028
 alarmed+0.027
 shocked+0.027
Neg logit contributions
 pic-0.029
ine-0.026
 strips-0.026
 weeks-0.026
 jars-0.026
Token three
Token ID1115
Feature activation-2.10e+00

Pos logit contributions
 sorry+0.037
 vibration+0.035
ling+0.034
 gru+0.034
ishly+0.034
Neg logit contributions
 pic-0.028
 Christmas-0.024
 weeks-0.023
ine-0.023
 Monday-0.023
Token-
Token ID12
Feature activation-1.41e+00

Pos logit contributions
 Monday+0.033
 pic+0.032
 Christmas+0.031
 Saturday+0.030
 Sale+0.029
Neg logit contributions
ling-0.059
 sorry-0.052
 shocked-0.052
 terrified-0.051
ivering-0.050
Tokenyears
Token ID19002
Feature activation-6.76e+00

Pos logit contributions
 pic+0.035
 Christmas+0.033
ine+0.032
 strips+0.031
 Monday+0.031
Neg logit contributions
ling-0.024
 sorry-0.024
ishly-0.023
 gru-0.023
oding-0.023
Token-
Token ID12
Feature activation-1.59e+00

Pos logit contributions
 pic+0.145
 Monday+0.140
 Saturday+0.130
 hotels+0.128
 Christmas+0.128
Neg logit contributions
ling-0.153
ag-0.137
 gru-0.129
 sorry-0.129
agging-0.124
Tokenold
Token ID727
Feature activation+1.35e+00

Pos logit contributions
 pic+0.049
 Christmas+0.046
 strips+0.045
 Monday+0.045
ine+0.045
Neg logit contributions
ling-0.018
 sorry-0.017
oding-0.016
 gru-0.016
ishly-0.016
Token,
Token ID11
Feature activation+1.01e+00

Pos logit contributions
ishly+0.033
ling+0.033
 sorry+0.032
 gru+0.031
rified+0.030
Neg logit contributions
 pic-0.025
 Monday-0.023
 Christmas-0.022
 Saturday-0.021
ine-0.021
Token but
Token ID475
Feature activation+1.68e+00

Pos logit contributions
 sorry+0.018
 gru+0.017
 relieved+0.017
ling+0.016
 terrified+0.016
Neg logit contributions
 Sale-0.023
aters-0.022
 pic-0.022
 hotels-0.022
 folders-0.022
Token she
Token ID673
Feature activation+1.44e-02

Pos logit contributions
 sorry+0.045
ling+0.045
ishly+0.045
 gru+0.044
 terrified+0.043
Neg logit contributions
 pic-0.025
 Christmas-0.023
ine-0.022
 strips-0.021
 jars-0.021
Token complete
Token ID1844
Feature activation-2.97e+00

Pos logit contributions
 pic+0.019
 Christmas+0.018
ine+0.016
 hotels+0.016
 Monday+0.016
Neg logit contributions
ling-0.023
 sorry-0.022
 gru-0.021
ishly-0.021
oding-0.021
Token.
Token ID13
Feature activation-2.19e+00

Pos logit contributions
 pic+0.058
 Christmas+0.053
ait+0.051
 packed+0.051
aters+0.050
Neg logit contributions
ling-0.060
 sorry-0.058
 gru-0.057
 wrong-0.056
cut-0.053
Token Height
Token ID27280
Feature activation-2.31e+00

Pos logit contributions
 pic+0.052
 Christmas+0.046
 packed+0.046
 weeks+0.046
 planned+0.044
Neg logit contributions
 sorry-0.044
 gru-0.038
ishly-0.037
 hi-0.037
 vibration-0.035
Token:
Token ID25
Feature activation-5.03e-01

Pos logit contributions
 pic+0.060
 Christmas+0.047
 hotels+0.047
 packed+0.046
 folders+0.045
Neg logit contributions
oding-0.043
ling-0.041
ble-0.040
ishly-0.039
 sorry-0.039
Token 90
Token ID4101
Feature activation-2.40e+00

Pos logit contributions
 pic+0.009
 Christmas+0.008
aters+0.008
 hotels+0.008
 weeks+0.007
Neg logit contributions
 sorry-0.012
 gru-0.012
 vibration-0.011
 terrified-0.011
ling-0.011
Token centimeters
Token ID48829
Feature activation-6.75e+00

Pos logit contributions
 pic+0.049
 Christmas+0.043
 hotels+0.041
 Saturday+0.039
 Monday+0.038
Neg logit contributions
ling-0.058
 sorry-0.050
 gru-0.049
inking-0.048
ering-0.048
Token.
Token ID13
Feature activation-2.17e+00

Pos logit contributions
 pic+0.150
 Christmas+0.145
 hotels+0.140
 Wed+0.136
ait+0.132
Neg logit contributions
ling-0.122
 sorry-0.117
 wrong-0.111
 gru-0.110
 racket-0.109
Token Weight
Token ID14331
Feature activation-1.53e+00

Pos logit contributions
 pic+0.051
 Christmas+0.047
 weeks+0.047
 packed+0.046
 planned+0.045
Neg logit contributions
 sorry-0.042
 hi-0.036
 gru-0.036
ishly-0.036
hing-0.033
Token:
Token ID25
Feature activation+5.01e-01

Pos logit contributions
 pic+0.032
 Christmas+0.028
 hotels+0.026
 Halloween+0.025
 Saturday+0.025
Neg logit contributions
ling-0.035
hing-0.033
 sorry-0.032
oding-0.031
ishly-0.031
Token 15
Token ID1315
Feature activation-2.51e+00

Pos logit contributions
 sorry+0.012
 gru+0.011
ling+0.011
 terrified+0.011
 vibration+0.011
Neg logit contributions
 pic-0.009
 Christmas-0.008
 hotels-0.008
aters-0.008
 weeks-0.008
Token kilograms
Token ID37075
Feature activation-3.47e+00

Pos logit contributions
 pic+0.050
 Christmas+0.045
ait+0.044
 hotels+0.043
aters+0.042
Neg logit contributions
ling-0.055
 sorry-0.052
 gru-0.052
 terrified-0.050
 shocked-0.049
Token.
Token ID13
Feature activation-2.92e-01

Pos logit contributions
 pic+0.021
 Christmas+0.019
 jars+0.018
 weeks+0.018
ine+0.018
Neg logit contributions
ling-0.029
 sorry-0.028
ishly-0.028
 gru-0.027
oding-0.027
Token She
Token ID1375
Feature activation-1.16e+00

Pos logit contributions
 pic+0.006
 Christmas+0.005
ine+0.005
 jars+0.005
ast+0.005
Neg logit contributions
 sorry-0.006
ishly-0.006
 vibration-0.006
hing-0.006
oding-0.006
Token was
Token ID373
Feature activation+1.24e+00

Pos logit contributions
 pic+0.029
aters+0.027
 jars+0.027
 strips+0.027
 hotels+0.027
Neg logit contributions
ling-0.019
 sorry-0.019
 shocked-0.018
 relieved-0.018
 terrified-0.017
Token three
Token ID1115
Feature activation-2.07e+00

Pos logit contributions
 sorry+0.028
 terrified+0.025
ishly+0.025
ling+0.025
 alarmed+0.024
Neg logit contributions
 pic-0.026
ine-0.024
 strips-0.024
 weeks-0.023
 Christmas-0.023
Token-
Token ID12
Feature activation-1.46e+00

Pos logit contributions
 Monday+0.030
 pic+0.030
 Christmas+0.027
 Wed+0.026
ine+0.026
Neg logit contributions
ling-0.060
 sorry-0.054
 shocked-0.053
 terrified-0.053
ivering-0.052
Tokenyears
Token ID19002
Feature activation-6.75e+00

Pos logit contributions
 pic+0.030
 Christmas+0.029
ine+0.028
 weeks+0.027
 jars+0.027
Neg logit contributions
ling-0.030
ishly-0.030
 sorry-0.030
 gru-0.029
 terrified-0.029
Token-
Token ID12
Feature activation-1.49e+00

Pos logit contributions
 pic+0.139
 Monday+0.131
 hotels+0.121
 Christmas+0.120
 Saturday+0.120
Neg logit contributions
ling-0.160
ag-0.140
 gru-0.138
 sorry-0.138
oding-0.134
Tokenold
Token ID727
Feature activation+1.31e+00

Pos logit contributions
 pic+0.042
 Christmas+0.039
ine+0.038
 strips+0.038
 Monday+0.038
Neg logit contributions
ling-0.021
 sorry-0.021
 gru-0.020
ishly-0.020
oding-0.020
Token and
Token ID290
Feature activation+1.48e-01

Pos logit contributions
ling+0.034
ishly+0.034
 sorry+0.033
 gru+0.032
 terrified+0.031
Neg logit contributions
 pic-0.022
 Monday-0.020
 Christmas-0.019
ine-0.018
 planned-0.018
Token loved
Token ID6151
Feature activation-2.19e+00

Pos logit contributions
ling+0.003
 sorry+0.003
ishly+0.003
 gru+0.003
oding+0.002
Neg logit contributions
 pic-0.004
 Christmas-0.003
ine-0.003
 weeks-0.003
 jars-0.003
Token to
Token ID284
Feature activation-1.85e-01

Pos logit contributions
ine+0.044
ara+0.042
ona+0.039
ast+0.038
em+0.036
Neg logit contributions
 sorry-0.047
 weeping-0.046
 crying-0.046
 noise-0.043
 hi-0.042
Token a
Token ID257
Feature activation-9.35e-01

Pos logit contributions
 pic+0.006
 Christmas+0.005
ine+0.005
 Monday+0.005
 weeks+0.005
Neg logit contributions
ling-0.006
ishly-0.006
 sorry-0.006
 gru-0.006
oding-0.005
Token very
Token ID845
Feature activation+5.78e-03

Pos logit contributions
 pic+0.021
aters+0.020
 jars+0.020
 strips+0.020
ine+0.020
Neg logit contributions
 sorry-0.019
ling-0.018
 gru-0.017
 relieved-0.017
 terrified-0.017
Token curious
Token ID11040
Feature activation-4.79e-01

Pos logit contributions
 sorry+0.000
ling+0.000
 gru+0.000
 terrified+0.000
ishly+0.000
Neg logit contributions
 pic-0.000
 Christmas-0.000
ine-0.000
 strips-0.000
 jars-0.000
Token 3
Token ID513
Feature activation-5.52e-01

Pos logit contributions
 pic+0.012
ine+0.011
 jars+0.011
 strips+0.011
 weeks+0.011
Neg logit contributions
ling-0.008
 sorry-0.008
 gru-0.008
ishly-0.008
 hi-0.007
Token-
Token ID12
Feature activation-1.90e+00

Pos logit contributions
 pic+0.009
 Christmas+0.008
aters+0.008
 Monday+0.008
 strips+0.007
Neg logit contributions
ling-0.015
 gru-0.014
ishly-0.014
 sorry-0.014
 terrified-0.014
Tokenyear
Token ID1941
Feature activation-6.75e+00

Pos logit contributions
 pic+0.044
 Christmas+0.044
ine+0.042
 weeks+0.041
 Monday+0.040
Neg logit contributions
ling-0.034
ishly-0.034
 sorry-0.034
 vibration-0.033
 gru-0.032
Token-
Token ID12
Feature activation-1.95e+00

Pos logit contributions
 pic+0.151
 Monday+0.133
aters+0.131
 jars+0.131
ine+0.129
Neg logit contributions
ling-0.154
 gru-0.141
cut-0.137
 sorry-0.135
ishly-0.129
Tokenold
Token ID727
Feature activation+6.80e-01

Pos logit contributions
 pic+0.054
 Christmas+0.052
ine+0.050
 strips+0.049
 jars+0.049
Neg logit contributions
ling-0.028
 sorry-0.027
ishly-0.027
 gru-0.027
oding-0.026
Token boy
Token ID2933
Feature activation+2.06e+00

Pos logit contributions
ling+0.015
ishly+0.015
 sorry+0.015
 gru+0.015
 terrified+0.014
Neg logit contributions
 pic-0.014
 Christmas-0.012
 Monday-0.012
 strips-0.012
 jars-0.012
Token.
Token ID13
Feature activation-4.57e-01

Pos logit contributions
ishly+0.039
 sorry+0.037
ling+0.036
 gru+0.035
 hi+0.034
Neg logit contributions
 pic-0.046
 Christmas-0.043
 hotels-0.043
 strips-0.042
 jars-0.042
Token One
Token ID1881
Feature activation-2.06e+00

Pos logit contributions
 weeks+0.011
 pic+0.011
 jars+0.010
 strips+0.010
 planned+0.010
Neg logit contributions
ishly-0.008
 gru-0.007
 hi-0.007
 sorry-0.007
ling-0.007
Token tell
Token ID1560
Feature activation-6.48e-01

Pos logit contributions
ling+0.003
ishly+0.003
 gru+0.003
 sorry+0.003
hing+0.003
Neg logit contributions
 pic-0.003
 Christmas-0.003
ine-0.003
 Monday-0.003
 weeks-0.003
Token Daddy
Token ID26926
Feature activation-9.07e-01

Pos logit contributions
 pic+0.014
 Christmas+0.013
ine+0.012
 jars+0.012
aters+0.012
Neg logit contributions
 sorry-0.013
ling-0.013
ishly-0.013
 gru-0.012
hing-0.012
Token's
Token ID338
Feature activation-1.87e+00

Pos logit contributions
 pic+0.021
aters+0.018
 strips+0.018
 weeks+0.018
 jars+0.018
Neg logit contributions
ling-0.019
ishly-0.018
 sorry-0.018
hing-0.017
oding-0.016
Token three
Token ID1115
Feature activation-3.50e+00

Pos logit contributions
 pic+0.033
 Christmas+0.031
ine+0.031
 weeks+0.029
 jars+0.029
Neg logit contributions
 sorry-0.045
ishly-0.043
 gru-0.043
ling-0.043
 alarmed-0.042
Token-
Token ID12
Feature activation-2.29e+00

Pos logit contributions
 pic+0.065
 hotels+0.058
 Christmas+0.058
aters+0.057
 Monday+0.057
Neg logit contributions
ling-0.086
 gru-0.080
 sorry-0.076
ishly-0.074
 relieved-0.073
Tokenyear
Token ID1941
Feature activation-6.75e+00

Pos logit contributions
 pic+0.061
 Christmas+0.059
ine+0.056
 jars+0.055
 strips+0.055
Neg logit contributions
ling-0.035
 gru-0.033
 sorry-0.033
ishly-0.033
oding-0.032
Token-
Token ID12
Feature activation-2.27e+00

Pos logit contributions
 pic+0.163
 Monday+0.146
aters+0.141
 jars+0.140
 hotels+0.140
Neg logit contributions
ling-0.146
 gru-0.127
cut-0.123
 sorry-0.122
ag-0.118
Tokenold
Token ID727
Feature activation+7.17e-01

Pos logit contributions
 pic+0.067
 Christmas+0.064
 strips+0.061
 jars+0.061
ine+0.061
Neg logit contributions
ling-0.029
 gru-0.027
 sorry-0.026
oding-0.025
ishly-0.025
Token a
Token ID257
Feature activation-6.40e-01

Pos logit contributions
ling+0.016
ishly+0.016
 sorry+0.015
 gru+0.015
oding+0.014
Neg logit contributions
 pic-0.015
 Christmas-0.014
 Monday-0.013
 strips-0.013
 jars-0.013
Token story
Token ID1621
Feature activation+3.18e-01

Pos logit contributions
 pic+0.013
ine+0.013
 Christmas+0.012
 weeks+0.012
aters+0.012
Neg logit contributions
 sorry-0.013
 gru-0.013
ling-0.012
ishly-0.012
 alarmed-0.012
Token about
Token ID546
Feature activation-1.21e+00

Pos logit contributions
 sorry+0.006
ling+0.006
 gru+0.006
hing+0.006
ishly+0.006
Neg logit contributions
 pic-0.007
 jars-0.006
aters-0.006
 hotels-0.006
 strips-0.006
Token.
Token ID13
Feature activation-9.39e-01

Pos logit contributions
 pic+0.041
 jars+0.035
aters+0.035
 Christmas+0.035
 strips+0.034
Neg logit contributions
 gru-0.036
 sorry-0.036
ishly-0.035
ling-0.035
 shocked-0.034
Token Jake
Token ID14757
Feature activation-1.02e+00

Pos logit contributions
 pic+0.023
ine+0.022
com+0.022
 jars+0.021
 strips+0.021
Neg logit contributions
 relief-0.014
 hi-0.014
 sorry-0.014
 gru-0.014
hing-0.013
Token was
Token ID373
Feature activation+1.02e+00

Pos logit contributions
 pic+0.028
aters+0.028
 hotels+0.027
 teaspoons+0.026
 jars+0.026
Neg logit contributions
ling-0.014
 shocked-0.013
faced-0.012
 relieved-0.012
 cheered-0.012
Token 3
Token ID513
Feature activation-4.80e-01

Pos logit contributions
 sorry+0.022
 terrified+0.020
 shocked+0.020
 gru+0.019
 alarmed+0.019
Neg logit contributions
 pic-0.023
ine-0.021
 Christmas-0.021
 jars-0.020
 strips-0.020
Token-
Token ID12
Feature activation-1.91e+00

Pos logit contributions
 pic+0.007
 Monday+0.007
 Christmas+0.007
aters+0.006
 Wed+0.006
Neg logit contributions
ling-0.014
 sorry-0.013
 gru-0.013
 terrified-0.013
 shocked-0.012
Tokenyears
Token ID19002
Feature activation-6.74e+00

Pos logit contributions
 pic+0.037
 Christmas+0.037
ine+0.035
 weeks+0.035
 jars+0.034
Neg logit contributions
ishly-0.041
ling-0.041
 sorry-0.040
 gru-0.040
 vibration-0.039
Token-
Token ID12
Feature activation-1.78e+00

Pos logit contributions
 pic+0.139
 Monday+0.130
 Christmas+0.123
 hotels+0.123
 jars+0.120
Neg logit contributions
ling-0.158
ag-0.139
 gru-0.138
 sorry-0.138
oding-0.134
Tokenold
Token ID727
Feature activation+1.26e+00

Pos logit contributions
 pic+0.048
 Christmas+0.046
ine+0.044
 Monday+0.043
 jars+0.043
Neg logit contributions
ling-0.027
 sorry-0.026
ishly-0.026
 gru-0.026
oding-0.025
Token and
Token ID290
Feature activation+1.11e-01

Pos logit contributions
ling+0.033
ishly+0.032
 sorry+0.032
 gru+0.031
 terrified+0.030
Neg logit contributions
 pic-0.022
 Christmas-0.019
 Monday-0.019
 weeks-0.017
 hotels-0.017
Token the
Token ID262
Feature activation+8.55e-01

Pos logit contributions
ling+0.002
 sorry+0.002
ishly+0.002
 gru+0.002
oding+0.002
Neg logit contributions
 pic-0.003
 Christmas-0.002
ine-0.002
 weeks-0.002
 jars-0.002
Token classroom
Token ID15806
Feature activation-1.41e+00

Pos logit contributions
 sorry+0.019
ling+0.018
 gru+0.018
ishly+0.018
 vibration+0.018
Neg logit contributions
 pic-0.017
 Christmas-0.016
ine-0.016
 Monday-0.015
 weeks-0.015
Token when
Token ID618
Feature activation+1.11e+00

Pos logit contributions
 pic+0.013
 Christmas+0.012
ine+0.011
 strips+0.011
aters+0.011
Neg logit contributions
ling-0.019
 gru-0.018
 sorry-0.018
 terrified-0.018
 alarmed-0.018
Token she
Token ID673
Feature activation-7.10e-01

Pos logit contributions
ling+0.033
ishly+0.032
 sorry+0.031
 gru+0.031
hing+0.031
Neg logit contributions
 Christmas-0.014
 pic-0.014
 weeks-0.012
 Monday-0.012
ine-0.012
Token heard
Token ID2982
Feature activation-2.93e-01

Pos logit contributions
 pic+0.016
 Christmas+0.016
ine+0.015
 weeks+0.015
 Monday+0.015
Neg logit contributions
ishly-0.013
 sorry-0.013
ling-0.013
 gru-0.013
hing-0.013
Token a
Token ID257
Feature activation+1.30e+00

Pos logit contributions
 Christmas+0.004
 pic+0.004
 Grand+0.003
 weeks+0.003
ine+0.003
Neg logit contributions
ishly-0.009
ling-0.009
 gru-0.008
 sorry-0.008
hing-0.008
Token strange
Token ID6283
Feature activation+2.46e+00

Pos logit contributions
ishly+0.030
ivered+0.029
iece+0.028
onel+0.028
oding+0.028
Neg logit contributions
 Christmas-0.024
 Grand-0.023
 pic-0.022
 jars-0.021
 planned-0.021
Token noise
Token ID7838
Feature activation+3.18e+00

Pos logit contributions
ishly+0.038
onel+0.036
oding+0.035
rified+0.033
arse+0.033
Neg logit contributions
 Christmas-0.065
ine-0.063
 Grand-0.059
 pic-0.059
 Monday-0.059
Token.
Token ID13
Feature activation+5.28e-01

Pos logit contributions
ishly+0.091
 sorry+0.088
 gru+0.088
ling+0.087
oding+0.086
Neg logit contributions
 Christmas-0.040
 pic-0.038
ine-0.036
 Monday-0.035
 weeks-0.035
Token She
Token ID1375
Feature activation-8.09e-01

Pos logit contributions
 sorry+0.013
ling+0.013
ishly+0.013
 gru+0.013
 terrified+0.013
Neg logit contributions
 pic-0.009
 Christmas-0.008
ine-0.008
 weeks-0.007
 jars-0.007
Token was
Token ID373
Feature activation+1.68e+00

Pos logit contributions
 pic+0.014
 Christmas+0.014
ine+0.013
 weeks+0.013
 Monday+0.013
Neg logit contributions
ling-0.019
ishly-0.019
 sorry-0.019
 gru-0.019
oding-0.018
Token so
Token ID523
Feature activation+9.04e-01

Pos logit contributions
 trigger+0.033
hing+0.032
ling+0.031
ishly+0.031
cut+0.030
Neg logit contributions
 Christmas-0.036
 Grand-0.035
 unp-0.035
 pic-0.035
ine-0.034
Token scared
Token ID12008
Feature activation+2.27e+00

Pos logit contributions
ishly+0.019
 trigger+0.018
coon+0.016
rified+0.016
hing+0.016
Neg logit contributions
 Grand-0.019
 Christmas-0.019
 planned-0.019
 pic-0.019
 unp-0.018
Token He
Token ID679
Feature activation+1.65e+00

Pos logit contributions
ling+0.007
ishly+0.007
hing+0.007
 alarmed+0.007
 sorry+0.007
Neg logit contributions
 Christmas-0.004
 pic-0.003
 strips-0.003
 jars-0.003
ine-0.003
Token wanted
Token ID2227
Feature activation+4.93e-01

Pos logit contributions
ling+0.039
ishly+0.039
 sorry+0.038
 gru+0.038
oding+0.037
Neg logit contributions
 pic-0.029
 Christmas-0.028
ine-0.026
 weeks-0.025
 jars-0.025
Token to
Token ID284
Feature activation+3.17e-01

Pos logit contributions
 sorry+0.015
 gru+0.014
ling+0.014
ishly+0.014
 apologise+0.014
Neg logit contributions
 pic-0.005
ine-0.005
 Christmas-0.005
aters-0.005
ast-0.005
Token have
Token ID423
Feature activation-1.63e+00

Pos logit contributions
 sorry+0.007
ishly+0.006
 apologise+0.006
ling+0.006
 gru+0.006
Neg logit contributions
 pic-0.007
 Christmas-0.006
 weeks-0.006
ine-0.006
 jars-0.006
Token adventures
Token ID17545
Feature activation+1.29e+00

Pos logit contributions
 pic+0.028
 Christmas+0.027
ine+0.027
ast+0.025
 Monday+0.025
Neg logit contributions
 sorry-0.038
 gru-0.037
ling-0.037
ishly-0.037
 terrified-0.036
Token but
Token ID475
Feature activation+3.03e+00

Pos logit contributions
ling+0.032
 sorry+0.032
ishly+0.032
 gru+0.031
 terrified+0.030
Neg logit contributions
 pic-0.021
 Christmas-0.020
ine-0.020
 jars-0.019
 Monday-0.018
Token wasn
Token ID2492
Feature activation-9.23e-01

Pos logit contributions
ishly+0.053
 sorry+0.053
 gru+0.051
ling+0.050
 terrified+0.048
Neg logit contributions
 pic-0.074
 Christmas-0.071
 strips-0.069
 jars-0.068
aters-0.067
Token't
Token ID470
Feature activation+1.22e+00

Pos logit contributions
 pic+0.019
 Christmas+0.017
 jars+0.017
ine+0.016
 weeks+0.016
Neg logit contributions
 sorry-0.022
 terrified-0.020
ishly-0.020
ling-0.020
 gru-0.020
Token sure
Token ID1654
Feature activation-6.53e-01

Pos logit contributions
ling+0.025
 sorry+0.025
 gru+0.024
 terrified+0.024
 alarmed+0.024
Neg logit contributions
 pic-0.027
 Christmas-0.025
 jars-0.024
ine-0.024
 strips-0.023
Token how
Token ID703
Feature activation-1.53e+00

Pos logit contributions
 pic+0.016
 Christmas+0.015
ine+0.014
 strips+0.014
 jars+0.014
Neg logit contributions
 sorry-0.012
ling-0.012
 gru-0.011
oding-0.011
ishly-0.011
Token.
Token ID13
Feature activation+3.90e-01

Pos logit contributions
 pic+0.021
 Christmas+0.019
ine+0.019
 strips+0.018
ast+0.018
Neg logit contributions
 sorry-0.043
ling-0.043
 gru-0.042
 terrified-0.041
ishly-0.040
Token,
Token ID11
Feature activation-6.93e-02

Pos logit contributions
 pic+0.001
ast+0.001
ine+0.001
 Christmas+0.001
 strips+0.001
Neg logit contributions
 sorry-0.001
ishly-0.001
ling-0.001
 shocked-0.001
 gru-0.001
Token she
Token ID673
Feature activation-1.42e+00

Pos logit contributions
 pic+0.001
 Christmas+0.001
ine+0.001
 weeks+0.001
 strips+0.001
Neg logit contributions
 sorry-0.002
ling-0.002
ishly-0.002
 gru-0.002
 vibration-0.002
Token fell
Token ID3214
Feature activation-1.96e-01

Pos logit contributions
 pic+0.031
 Christmas+0.030
ine+0.029
 weeks+0.028
 jars+0.028
Neg logit contributions
ishly-0.028
ling-0.028
 sorry-0.027
 gru-0.027
oding-0.026
Token off
Token ID572
Feature activation-1.03e+00

Pos logit contributions
 pic+0.005
 Christmas+0.005
ine+0.005
 jars+0.005
 weeks+0.005
Neg logit contributions
ling-0.003
ishly-0.003
 sorry-0.003
 gru-0.003
oding-0.003
Token and
Token ID290
Feature activation+6.66e-01

Pos logit contributions
 pic+0.021
ast+0.020
 fiber+0.020
ine+0.020
aters+0.019
Neg logit contributions
 sorry-0.022
 shocked-0.021
 terrified-0.021
ling-0.021
ishly-0.020
Token hurt
Token ID5938
Feature activation+2.86e+00

Pos logit contributions
ling+0.016
ishly+0.016
 gru+0.016
hing+0.015
oding+0.015
Neg logit contributions
 Christmas-0.012
 pic-0.011
 Grand-0.011
 planned-0.010
ine-0.010
Token herself
Token ID5223
Feature activation-7.00e-01

Pos logit contributions
ling+0.074
 sorry+0.074
ishly+0.073
 gru+0.072
 terrified+0.071
Neg logit contributions
 pic-0.046
 Christmas-0.043
ine-0.040
 jars-0.039
 Monday-0.039
Token.
Token ID13
Feature activation+9.21e-01

Pos logit contributions
 Christmas+0.012
 pic+0.012
ine+0.011
 weeks+0.011
 Monday+0.010
Neg logit contributions
ishly-0.017
 gru-0.017
 sorry-0.016
ling-0.016
hing-0.016
Token Her
Token ID2332
Feature activation-1.21e+00

Pos logit contributions
 sorry+0.023
ling+0.023
ishly+0.023
 gru+0.022
 terrified+0.022
Neg logit contributions
 pic-0.015
 Christmas-0.015
ine-0.014
 weeks-0.013
 jars-0.013
Token mom
Token ID1995
Feature activation+3.98e-01

Pos logit contributions
 pic+0.015
 Christmas+0.015
ine+0.013
 weeks+0.013
 jars+0.013
Neg logit contributions
ling-0.035
ishly-0.034
 sorry-0.034
 gru-0.034
 terrified-0.033
Token rushed
Token ID13999
Feature activation+1.12e+00

Pos logit contributions
 gru+0.010
 sorry+0.010
ishly+0.010
 vibration+0.010
ling+0.010
Neg logit contributions
ine-0.006
 Christmas-0.006
 pic-0.006
 weeks-0.006
 unp-0.005
Token dress
Token ID6576
Feature activation-1.74e+00

Pos logit contributions
 pic+0.018
 Christmas+0.017
ine+0.016
 weeks+0.016
 Monday+0.016
Neg logit contributions
ling-0.025
 sorry-0.025
ishly-0.025
 gru-0.024
 vibration-0.024
Token.
Token ID13
Feature activation-1.33e-01

Pos logit contributions
 pic+0.030
 Christmas+0.026
 hotels+0.025
 Monday+0.024
 weeks+0.024
Neg logit contributions
ling-0.046
ishly-0.045
 sorry-0.044
oding-0.043
 gru-0.042
Token He
Token ID679
Feature activation-9.45e-02

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 weeks+0.003
 strips+0.003
Neg logit contributions
 sorry-0.003
 gru-0.003
 terrified-0.002
 relieved-0.002
oding-0.002
Token made
Token ID925
Feature activation-9.96e-01

Pos logit contributions
 pic+0.002
 Christmas+0.002
ine+0.002
 jars+0.002
 weeks+0.002
Neg logit contributions
ishly-0.002
ling-0.002
 sorry-0.002
 gru-0.002
oding-0.002
Token a
Token ID257
Feature activation+1.65e+00

Pos logit contributions
 Christmas+0.012
 pic+0.011
ine+0.011
 jars+0.010
 strips+0.010
Neg logit contributions
ishly-0.030
ling-0.029
 gru-0.029
oding-0.028
hing-0.028
Token big
Token ID1263
Feature activation+2.96e+00

Pos logit contributions
ishly+0.048
rified+0.044
hing+0.043
ling+0.042
oding+0.041
Neg logit contributions
 Christmas-0.024
 jars-0.023
 strips-0.023
 line-0.023
 pic-0.023
Token hole
Token ID7604
Feature activation+1.37e+00

Pos logit contributions
oding+0.073
ishly+0.072
rified+0.069
ling+0.069
 gru+0.068
Neg logit contributions
 Christmas-0.050
 strips-0.049
 jars-0.047
ine-0.047
 sugar-0.045
Token.
Token ID13
Feature activation-1.32e-01

Pos logit contributions
ishly+0.036
ling+0.035
 sorry+0.035
 gru+0.035
oding+0.034
Neg logit contributions
 pic-0.020
 Christmas-0.020
ine-0.019
 jars-0.018
 weeks-0.018
Token
Token ID198
Feature activation-4.52e-01

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 weeks+0.003
 strips+0.003
Neg logit contributions
 sorry-0.003
 gru-0.002
 terrified-0.002
oding-0.002
 relieved-0.002
Token
Token ID198
Feature activation-2.93e-01

Pos logit contributions
 pic+0.006
 Christmas+0.006
ine+0.006
 Monday+0.006
 Grand+0.006
Neg logit contributions
ishly-0.013
ling-0.012
 vibration-0.012
 sorry-0.012
 gru-0.012
TokenL
Token ID43
Feature activation-2.30e+00

Pos logit contributions
 pic+0.006
 Christmas+0.006
aters+0.005
 strips+0.005
 folders+0.005
Neg logit contributions
ling-0.007
 gru-0.007
 sorry-0.007
oding-0.007
 terrified-0.006
TokenLu
Token ID25596
Feature activation-2.17e+00

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 jars+0.003
 strips+0.003
Neg logit contributions
ling-0.005
 sorry-0.005
ishly-0.005
 gru-0.005
 terrified-0.005
Tokency
Token ID948
Feature activation-1.35e+00

Pos logit contributions
 pic+0.051
 Christmas+0.046
 hotels+0.045
 weeks+0.045
 strips+0.045
Neg logit contributions
ling-0.044
ishly-0.043
rified-0.040
 terrified-0.038
 relieved-0.038
Token was
Token ID373
Feature activation+2.02e+00

Pos logit contributions
 pic+0.031
 Christmas+0.029
 hotels+0.028
 Monday+0.027
 jars+0.027
Neg logit contributions
ishly-0.026
ling-0.026
 sorry-0.025
 terrified-0.025
 relieved-0.025
Token so
Token ID523
Feature activation+1.29e+00

Pos logit contributions
ling+0.057
 trigger+0.056
ishly+0.055
hing+0.055
cut+0.054
Neg logit contributions
 Christmas-0.027
 pic-0.025
ine-0.025
 unp-0.025
 Grand-0.024
Token happy
Token ID3772
Feature activation-8.39e-01

Pos logit contributions
 trigger+0.028
ishly+0.028
coon+0.026
ling+0.026
cut+0.025
Neg logit contributions
 unp-0.026
 Grand-0.026
 Christmas-0.026
 planned-0.026
 un-0.024
Token that
Token ID326
Feature activation+3.52e+00

Pos logit contributions
 pic+0.015
 Christmas+0.014
ine+0.014
 Monday+0.013
 jars+0.013
Neg logit contributions
ling-0.020
 sorry-0.020
ishly-0.020
 gru-0.019
 terrified-0.019
Token she
Token ID673
Feature activation+3.67e-01

Pos logit contributions
 sorry+0.080
 relief+0.072
 gru+0.072
ishly+0.072
 apologise+0.071
Neg logit contributions
 pic-0.070
ine-0.068
 strips-0.067
ast-0.066
aters-0.066
Token had
Token ID550
Feature activation-5.32e-01

Pos logit contributions
ling+0.008
 sorry+0.008
ishly+0.008
 terrified+0.008
 relieved+0.008
Neg logit contributions
 pic-0.007
 Christmas-0.007
 hotels-0.007
 jars-0.007
 Monday-0.007
Token learned
Token ID4499
Feature activation-1.73e+00

Pos logit contributions
 Christmas+0.011
 pic+0.011
 hotels+0.011
aters+0.011
ine+0.011
Neg logit contributions
 sorry-0.010
 relieved-0.010
 apologise-0.010
 shocked-0.010
 alarmed-0.010
Token something
Token ID1223
Feature activation+1.36e+00

Pos logit contributions
 pic+0.036
 Christmas+0.035
ine+0.033
aters+0.033
 weeks+0.033
Neg logit contributions
 sorry-0.036
ishly-0.034
oding-0.033
ling-0.033
 gru-0.033
Token new
Token ID649
Feature activation+7.47e-01

Pos logit contributions
ling+0.025
 sorry+0.025
ishly+0.024
 gru+0.024
oding+0.023
Neg logit contributions
 pic-0.032
 Christmas-0.031
ine-0.029
 weeks-0.029
 hotels-0.029
Token They
Token ID1119
Feature activation-2.87e-01

Pos logit contributions
 weeks+0.012
com+0.011
ast+0.011
 pic+0.011
 queens+0.011
Neg logit contributions
 sorry-0.006
 hi-0.006
ishly-0.005
 gru-0.005
 Oh-0.004
Token went
Token ID1816
Feature activation+7.40e-01

Pos logit contributions
 pic+0.007
aters+0.006
ast+0.006
 Wed+0.006
ine+0.006
Neg logit contributions
ling-0.005
ishly-0.005
hing-0.005
 sorry-0.005
cut-0.005
Token on
Token ID319
Feature activation-1.54e+00

Pos logit contributions
ling+0.021
 sorry+0.020
ishly+0.020
 gru+0.019
oding+0.019
Neg logit contributions
 pic-0.010
ine-0.009
 Christmas-0.009
 weeks-0.009
 jars-0.008
Token a
Token ID257
Feature activation-7.80e-02

Pos logit contributions
ine+0.031
aters+0.031
ast+0.030
ell+0.029
ait+0.029
Neg logit contributions
 gru-0.029
 sorry-0.028
 alarmed-0.028
 horrified-0.027
 terrified-0.027
Token search
Token ID2989
Feature activation+3.74e-01

Pos logit contributions
aters+0.001
ine+0.001
 weeks+0.001
 pic+0.001
 jars+0.001
Neg logit contributions
ling-0.002
 sorry-0.002
 gru-0.002
ishly-0.002
 alarmed-0.002
Token throughout
Token ID3690
Feature activation-1.12e-01

Pos logit contributions
ling+0.008
 sorry+0.008
ishly+0.008
 gru+0.008
oding+0.008
Neg logit contributions
 pic-0.007
ine-0.007
 Christmas-0.007
ast-0.006
 strips-0.006
Token the
Token ID262
Feature activation+1.26e+00

Pos logit contributions
aters+0.002
 pic+0.002
ine+0.002
ara+0.002
ast+0.002
Neg logit contributions
 sorry-0.003
 gru-0.002
 relief-0.002
ling-0.002
 alarmed-0.002
Token house
Token ID2156
Feature activation-2.68e-01

Pos logit contributions
ling+0.026
 gru+0.025
 sorry+0.024
 vibration+0.024
 alarmed+0.023
Neg logit contributions
ine-0.027
 pic-0.025
aters-0.024
 Christmas-0.024
ast-0.024
Token until
Token ID1566
Feature activation-3.16e-01

Pos logit contributions
 pic+0.005
ara+0.004
 Christmas+0.004
 Wed+0.004
ine+0.004
Neg logit contributions
ling-0.006
 sorry-0.006
ishly-0.006
 alarmed-0.006
oding-0.006
Token they
Token ID484
Feature activation-7.86e-01

Pos logit contributions
 pic+0.005
aters+0.005
ine+0.005
 strips+0.005
ast+0.005
Neg logit contributions
 sorry-0.008
 gru-0.007
ishly-0.007
ling-0.007
 shocked-0.007
Token came
Token ID1625
Feature activation-2.58e-02

Pos logit contributions
 pic+0.018
 Christmas+0.017
ine+0.017
 strips+0.017
aters+0.016
Neg logit contributions
ling-0.014
 sorry-0.014
ishly-0.014
 alarmed-0.014
 terrified-0.014
Tokenanger
Token ID2564
Feature activation-1.03e+00

Pos logit contributions
 pic+0.116
 Christmas+0.109
 Monday+0.101
 Saturday+0.101
 hotels+0.100
Neg logit contributions
oding-0.059
ivering-0.058
ling-0.058
arling-0.057
onel-0.057
Token.
Token ID13
Feature activation-2.50e-01

Pos logit contributions
 pic+0.018
 Christmas+0.017
ine+0.016
 hotels+0.016
 Monday+0.016
Neg logit contributions
ishly-0.025
ling-0.025
 sorry-0.024
 gru-0.024
oding-0.023
Token She
Token ID1375
Feature activation-1.19e+00

Pos logit contributions
 Christmas+0.005
ine+0.005
 pic+0.005
 weeks+0.005
 strips+0.005
Neg logit contributions
 sorry-0.005
ishly-0.005
 relieved-0.005
 gru-0.005
 hi-0.005
Token saw
Token ID2497
Feature activation+4.70e-01

Pos logit contributions
 Christmas+0.027
 pic+0.026
 hotels+0.026
aters+0.026
 Wed+0.026
Neg logit contributions
 relieved-0.021
 shocked-0.021
ling-0.021
ishly-0.020
 terrified-0.020
Token they
Token ID484
Feature activation+1.19e-01

Pos logit contributions
ling+0.014
 gru+0.014
ishly+0.014
 vibration+0.014
ivering+0.013
Neg logit contributions
 Christmas-0.006
 pic-0.006
 Grand-0.005
 jars-0.005
 weeks-0.005
Token were
Token ID547
Feature activation+6.32e-01

Pos logit contributions
 gru+0.003
ishly+0.003
ling+0.003
 sorry+0.003
oding+0.003
Neg logit contributions
 Christmas-0.002
 pic-0.002
ine-0.002
 planned-0.002
 unp-0.002
Token all
Token ID477
Feature activation+1.04e+00

Pos logit contributions
 sorry+0.012
ling+0.012
ishly+0.012
 terrified+0.012
 gru+0.012
Neg logit contributions
 pic-0.014
 Christmas-0.014
ine-0.013
 Monday-0.013
 weeks-0.013
Token wet
Token ID9583
Feature activation+2.97e-01

Pos logit contributions
ling+0.021
ishly+0.021
 gru+0.020
 sorry+0.020
oding+0.020
Neg logit contributions
 pic-0.022
 Christmas-0.022
ine-0.020
 jars-0.020
 weeks-0.020
Token!
Token ID0
Feature activation+8.74e-01

Pos logit contributions
ling+0.008
ishly+0.008
 sorry+0.008
hing+0.007
 vibration+0.007
Neg logit contributions
 pic-0.004
 Christmas-0.004
ine-0.004
 Monday-0.004
ast-0.004
Token She
Token ID1375
Feature activation-8.00e-01

Pos logit contributions
ishly+0.020
 sorry+0.020
 gru+0.019
 terrified+0.019
ling+0.019
Neg logit contributions
 pic-0.016
 Christmas-0.016
ine-0.015
 weeks-0.015
 jars-0.015
Token was
Token ID373
Feature activation+1.80e+00

Pos logit contributions
 pic+0.018
 Christmas+0.017
 hotels+0.016
 Monday+0.016
aters+0.016
Neg logit contributions
ling-0.016
 sorry-0.016
ishly-0.015
 terrified-0.015
 shocked-0.015
Token a
Token ID257
Feature activation-2.45e+00

Pos logit contributions
ishly+0.017
ling+0.016
 gru+0.016
 sorry+0.016
hing+0.016
Neg logit contributions
 Christmas-0.005
 pic-0.005
 Monday-0.003
 weeks-0.003
ine-0.003
Token time
Token ID640
Feature activation+5.27e-01

Pos logit contributions
 pic+0.021
 Christmas+0.021
ine+0.018
 jars+0.018
 strips+0.018
Neg logit contributions
ling-0.079
 sorry-0.077
 gru-0.076
ishly-0.076
 terrified-0.075
Token,
Token ID11
Feature activation+2.66e-02

Pos logit contributions
ling+0.015
 sorry+0.015
ishly+0.015
 gru+0.015
 terrified+0.014
Neg logit contributions
 pic-0.007
 Christmas-0.007
ine-0.006
 weeks-0.006
 jars-0.006
Token in
Token ID287
Feature activation-1.08e+00

Pos logit contributions
ishly+0.001
 sidelines+0.001
 apologise+0.001
 relief+0.001
 gru+0.001
Neg logit contributions
 Christmas-0.000
 Monday-0.000
ine-0.000
 eight-0.000
 Saturday-0.000
Token a
Token ID257
Feature activation-1.10e+00

Pos logit contributions
aters+0.021
ine+0.021
ell+0.021
 pic+0.021
 planned+0.020
Neg logit contributions
 sorry-0.021
 relief-0.021
 gru-0.020
 vibration-0.020
 relieved-0.020
Token small
Token ID1402
Feature activation+4.03e-02

Pos logit contributions
 pic+0.019
 Christmas+0.019
ine+0.018
 weeks+0.017
 jars+0.017
Neg logit contributions
ling-0.026
ishly-0.026
 sorry-0.026
 gru-0.025
 terrified-0.025
Token house
Token ID2156
Feature activation-2.41e-01

Pos logit contributions
ling+0.001
 sorry+0.001
 gru+0.001
 vibration+0.001
ishly+0.001
Neg logit contributions
 pic-0.001
ine-0.001
ast-0.001
 weeks-0.001
 planned-0.001
Token,
Token ID11
Feature activation+4.56e-01

Pos logit contributions
 pic+0.004
ine+0.004
 strips+0.004
 Christmas+0.004
aters+0.003
Neg logit contributions
 sorry-0.006
ling-0.006
ishly-0.006
 gru-0.006
oding-0.006
Token lived
Token ID5615
Feature activation-6.90e-01

Pos logit contributions
ishly+0.014
 sidelines+0.014
ling+0.014
hing+0.013
 vibration+0.013
Neg logit contributions
 Christmas-0.006
 Monday-0.005
 planned-0.005
 unp-0.005
 pic-0.004
Token a
Token ID257
Feature activation+1.01e-01

Pos logit contributions
 pic+0.011
 Christmas+0.011
ine+0.010
 weeks+0.010
 jars+0.010
Neg logit contributions
ling-0.018
 sorry-0.017
ishly-0.017
 gru-0.017
 terrified-0.017
Token compassionate
Token ID32533
Feature activation+1.02e+00

Pos logit contributions
 sorry+0.002
ling+0.002
 gru+0.002
ishly+0.002
 terrified+0.002
Neg logit contributions
 pic-0.002
 Christmas-0.002
ine-0.002
 weeks-0.002
 jars-0.002
Token "
Token ID366
Feature activation-5.99e-01

Pos logit contributions
 pic+0.027
 Christmas+0.027
ine+0.025
 weeks+0.024
 Monday+0.023
Neg logit contributions
ling-0.049
ishly-0.048
 gru-0.047
 sorry-0.047
 vibration-0.047
TokenTweet
Token ID47845
Feature activation-9.02e-01

Pos logit contributions
 pic+0.012
 Christmas+0.012
ine+0.012
 weeks+0.011
 strips+0.011
Neg logit contributions
 sorry-0.012
ling-0.012
ishly-0.012
 gru-0.012
oding-0.012
Token,
Token ID11
Feature activation-1.81e+00

Pos logit contributions
 pic+0.014
 Christmas+0.013
 weeks+0.012
 Monday+0.012
 hotels+0.012
Neg logit contributions
ling-0.025
ishly-0.024
 sorry-0.024
 gru-0.023
hing-0.023
Token tweet
Token ID6126
Feature activation+2.43e-02

Pos logit contributions
 pic+0.035
 Christmas+0.034
ine+0.033
 weeks+0.032
 Monday+0.031
Neg logit contributions
 sorry-0.040
ishly-0.039
ling-0.039
 gru-0.039
 terrified-0.038
Token,
Token ID11
Feature activation-2.16e+00

Pos logit contributions
ling+0.001
 sorry+0.001
ishly+0.001
 gru+0.001
hing+0.001
Neg logit contributions
 pic-0.000
 Christmas-0.000
 jars-0.000
ine-0.000
 weeks-0.000
Token tweet
Token ID6126
Feature activation-4.53e-01

Pos logit contributions
 pic+0.048
 Christmas+0.047
ine+0.044
 weeks+0.043
 jars+0.043
Neg logit contributions
ling-0.042
ishly-0.041
 gru-0.040
 sorry-0.040
oding-0.040
Token!"
Token ID2474
Feature activation-3.10e-01

Pos logit contributions
 pic+0.008
 Christmas+0.008
 jars+0.007
ine+0.007
 weeks+0.007
Neg logit contributions
ling-0.011
 sorry-0.011
ishly-0.010
hing-0.010
 gru-0.010
Token
Token ID198
Feature activation-2.37e-01

Pos logit contributions
 pic+0.006
 Christmas+0.005
ine+0.005
 weeks+0.005
 jars+0.005
Neg logit contributions
 sorry-0.007
ling-0.007
ishly-0.007
 gru-0.007
 terrified-0.007
Token
Token ID198
Feature activation-3.40e-01

Pos logit contributions
 pic+0.002
 Grand+0.002
 Christmas+0.002
ine+0.002
ara+0.002
Neg logit contributions
ishly-0.008
 vibration-0.008
ling-0.008
rified-0.007
hing-0.007
TokenL
Token ID43
Feature activation-2.50e+00

Pos logit contributions
 pic+0.005
 Christmas+0.005
ine+0.004
 weeks+0.004
 jars+0.004
Neg logit contributions
ling-0.009
ishly-0.009
 sorry-0.009
 gru-0.009
 terrified-0.009
Tokenily
Token ID813
Feature activation-1.96e+00

Pos logit contributions
 pic+0.049
 Christmas+0.048
ine+0.045
 weeks+0.044
 jars+0.044
Neg logit contributions
ling-0.055
ishly-0.054
 sorry-0.053
 gru-0.053
 terrified-0.052
Token and
Token ID290
Feature activation-3.49e+00

Pos logit contributions
 Monday+0.059
ait+0.057
 pic+0.057
arian+0.056
 Saturday+0.056
Neg logit contributions
 sorry-0.074
ling-0.072
 hi-0.071
 shocked-0.069
hing-0.068
Token photos
Token ID5205
Feature activation-1.97e+00

Pos logit contributions
 pic+0.055
ine+0.054
ast+0.053
 Monday+0.052
 Christmas+0.051
Neg logit contributions
 sorry-0.088
 gru-0.083
 hi-0.081
 vibration-0.080
 terrified-0.080
Token.
Token ID13
Feature activation+1.67e-02

Pos logit contributions
ast+0.040
 pic+0.040
ine+0.039
 Christmas+0.038
 weeks+0.037
Neg logit contributions
ling-0.040
ishly-0.039
 sorry-0.038
hing-0.038
 hi-0.036
Token Ben
Token ID3932
Feature activation-3.99e-01

Pos logit contributions
 sorry+0.000
 gru+0.000
ishly+0.000
 relieved+0.000
ling+0.000
Neg logit contributions
 Christmas-0.000
 pic-0.000
 weeks-0.000
ine-0.000
 strips-0.000
Token and
Token ID290
Feature activation-1.18e-01

Pos logit contributions
 Christmas+0.009
 pic+0.009
 hotels+0.009
 Wed+0.008
 Monday+0.008
Neg logit contributions
 relieved-0.007
 shocked-0.007
ling-0.007
 terrified-0.007
 cheered-0.007
Token Lily
Token ID20037
Feature activation-1.23e-01

Pos logit contributions
 pic+0.002
 Christmas+0.002
aters+0.002
 strips+0.002
ast+0.002
Neg logit contributions
 sorry-0.003
 gru-0.003
ling-0.003
 amused-0.002
 apologise-0.002
Token were
Token ID547
Feature activation+1.20e+00

Pos logit contributions
 pic+0.002
 Christmas+0.002
ine+0.002
 weeks+0.002
 jars+0.002
Neg logit contributions
ling-0.003
 sorry-0.003
ishly-0.003
 gru-0.003
 terrified-0.003
Token curious
Token ID11040
Feature activation+8.53e-02

Pos logit contributions
 sorry+0.024
ling+0.023
ishly+0.023
 gru+0.023
 terrified+0.023
Neg logit contributions
 pic-0.027
 Christmas-0.025
ine-0.025
 weeks-0.024
 Monday-0.024
Token and
Token ID290
Feature activation+4.19e-01

Pos logit contributions
 sorry+0.002
ling+0.002
 gru+0.002
ishly+0.002
 alarmed+0.002
Neg logit contributions
 pic-0.002
 Christmas-0.002
ine-0.001
ast-0.001
 Monday-0.001
Token wanted
Token ID2227
Feature activation-1.31e-01

Pos logit contributions
 sorry+0.009
ling+0.009
 gru+0.009
ishly+0.009
 alarmed+0.009
Neg logit contributions
 pic-0.008
 Christmas-0.008
ine-0.008
aters-0.007
 strips-0.007
Token to
Token ID284
Feature activation+7.80e-02

Pos logit contributions
 pic+0.002
aters+0.002
ast+0.002
ine+0.002
ara+0.002
Neg logit contributions
 sorry-0.003
 shocked-0.003
 gru-0.003
 alarmed-0.003
 hi-0.003
Token,
Token ID11
Feature activation+4.45e-02

Pos logit contributions
 pic+0.039
 Christmas+0.037
 hotels+0.036
 Monday+0.035
 Wed+0.034
Neg logit contributions
ishly-0.058
ling-0.057
cut-0.053
 gru-0.051
 sorry-0.051
Token such
Token ID884
Feature activation-1.10e+00

Pos logit contributions
 sorry+0.001
ling+0.001
ishly+0.001
 relieved+0.001
 terrified+0.001
Neg logit contributions
 Christmas-0.001
 pic-0.001
ine-0.001
 Monday-0.001
aters-0.001
Token as
Token ID355
Feature activation-2.52e-01

Pos logit contributions
 pic+0.019
 Christmas+0.018
ine+0.018
 Monday+0.016
 hotels+0.016
Neg logit contributions
 sorry-0.027
ling-0.026
 gru-0.026
ishly-0.026
 terrified-0.026
Token juicy
Token ID35075
Feature activation-2.32e+00

Pos logit contributions
ine+0.006
azines+0.005
ast+0.005
aters+0.005
ro+0.005
Neg logit contributions
 sorry-0.005
 shocked-0.004
 relieved-0.004
ling-0.004
 hi-0.004
Token apples
Token ID22514
Feature activation-1.09e+00

Pos logit contributions
ine+0.048
isha+0.047
 hotels+0.046
 folders+0.046
aya+0.045
Neg logit contributions
ling-0.052
 sorry-0.044
ishly-0.042
cut-0.041
ging-0.041
Token and
Token ID290
Feature activation-1.75e+00

Pos logit contributions
 pic+0.022
 hotels+0.021
 Christmas+0.020
ine+0.020
ast+0.019
Neg logit contributions
ling-0.025
ishly-0.024
 sorry-0.022
cut-0.022
hing-0.021
Token crunch
Token ID23527
Feature activation-2.82e-01

Pos logit contributions
ine+0.029
ast+0.028
 pic+0.028
ait+0.027
 hotels+0.027
Neg logit contributions
ling-0.041
ishly-0.039
 gru-0.039
 sorry-0.039
cut-0.039
Tokeny
Token ID88
Feature activation-1.23e+00

Pos logit contributions
 pic+0.006
 Christmas+0.006
 hotels+0.006
 Monday+0.005
ine+0.005
Neg logit contributions
ling-0.007
hing-0.006
ishly-0.006
led-0.005
ging-0.005
Token carrots
Token ID34397
Feature activation-1.85e-02

Pos logit contributions
 pic+0.022
ine+0.022
 hotels+0.021
 Christmas+0.021
 Monday+0.021
Neg logit contributions
ling-0.029
ishly-0.028
 sorry-0.026
cut-0.026
 gru-0.025
Token.
Token ID13
Feature activation+7.31e-01

Pos logit contributions
 pic+0.000
 Christmas+0.000
ine+0.000
 hotels+0.000
 Monday+0.000
Neg logit contributions
ling-0.000
ishly-0.000
 sorry-0.000
 terrified-0.000
 gru-0.000
Token[space]
Token ID220
Feature activation-3.44e-01

Pos logit contributions
 sorry+0.017
ishly+0.017
ling+0.017
 gru+0.017
oding+0.016
Neg logit contributions
 pic-0.013
 Christmas-0.013
ine-0.012
 jars-0.012
 weeks-0.011
Token and
Token ID290
Feature activation-5.45e-01

Pos logit contributions
 pic+0.012
 Christmas+0.011
ine+0.011
 weeks+0.010
 Monday+0.010
Neg logit contributions
ling-0.019
ishly-0.019
 sorry-0.019
 gru-0.018
hing-0.018
Token parked
Token ID19584
Feature activation-5.53e-01

Pos logit contributions
 pic+0.009
 Christmas+0.008
ine+0.008
 weeks+0.008
 strips+0.008
Neg logit contributions
 sorry-0.014
ishly-0.013
ling-0.013
 gru-0.013
 terrified-0.013
Token it
Token ID340
Feature activation-1.85e-01

Pos logit contributions
 pic+0.009
 Christmas+0.008
ine+0.008
 weeks+0.008
 strips+0.007
Neg logit contributions
 sorry-0.014
ling-0.014
ishly-0.014
 gru-0.014
 terrified-0.013
Token inside
Token ID2641
Feature activation-1.35e+00

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 weeks+0.003
 jars+0.003
Neg logit contributions
ling-0.004
 sorry-0.004
ishly-0.004
 gru-0.004
 terrified-0.004
Token.
Token ID13
Feature activation-4.03e-01

Pos logit contributions
 pic+0.026
ine+0.025
 Christmas+0.025
ast+0.024
aters+0.023
Neg logit contributions
 sorry-0.029
ling-0.028
 gru-0.027
 shocked-0.026
ishly-0.026
Token They
Token ID1119
Feature activation-8.93e-01

Pos logit contributions
 Christmas+0.008
ine+0.008
 weeks+0.007
com+0.007
 pic+0.007
Neg logit contributions
 sorry-0.009
 relieved-0.009
 relief-0.008
 gru-0.008
 terrified-0.008
Token drove
Token ID10357
Feature activation+4.61e-01

Pos logit contributions
 pic+0.018
 Christmas+0.017
ine+0.017
 Monday+0.017
aters+0.017
Neg logit contributions
ling-0.018
 sorry-0.018
 terrified-0.018
ishly-0.018
 shocked-0.017
Token the
Token ID262
Feature activation+1.24e-01

Pos logit contributions
ling+0.011
 sorry+0.011
ishly+0.011
 gru+0.011
 alarmed+0.010
Neg logit contributions
 pic-0.008
 Christmas-0.008
ine-0.008
 Monday-0.007
 jars-0.007
Token van
Token ID5719
Feature activation-6.90e-01

Pos logit contributions
ling+0.003
 sorry+0.003
ishly+0.003
 gru+0.002
 terrified+0.002
Neg logit contributions
 pic-0.003
 Christmas-0.002
ine-0.002
 weeks-0.002
 Monday-0.002
Token to
Token ID284
Feature activation-3.03e-01

Pos logit contributions
 pic+0.011
 Christmas+0.010
ine+0.010
 jars+0.009
 Monday+0.009
Neg logit contributions
ling-0.018
ishly-0.018
 sorry-0.017
 gru-0.017
 terrified-0.017
Token the
Token ID262
Feature activation+1.31e+00

Pos logit contributions
 pic+0.004
 Christmas+0.004
ine+0.003
 weeks+0.003
 Monday+0.003
Neg logit contributions
ling-0.009
ishly-0.009
 sorry-0.008
 gru-0.008
oding-0.008
Token kicked
Token ID12165
Feature activation+1.01e+00

Pos logit contributions
 pic+0.013
 Christmas+0.013
aters+0.012
 hotels+0.012
 jars+0.012
Neg logit contributions
ling-0.013
 gru-0.013
 terrified-0.013
ishly-0.012
 sorry-0.012
Token the
Token ID262
Feature activation+1.22e-01

Pos logit contributions
ling+0.028
 sorry+0.027
ishly+0.027
 gru+0.027
 terrified+0.026
Neg logit contributions
 pic-0.015
 Christmas-0.014
ine-0.013
 weeks-0.012
 Monday-0.012
Token football
Token ID4346
Feature activation+1.34e+00

Pos logit contributions
ishly+0.003
oding+0.003
 alarmed+0.003
ling+0.003
ined+0.003
Neg logit contributions
 strips-0.002
 Christmas-0.002
 pic-0.002
 jars-0.002
 plates-0.002
Token and
Token ID290
Feature activation-2.76e-01

Pos logit contributions
ling+0.032
ishly+0.032
 sorry+0.031
 terrified+0.030
 gru+0.030
Neg logit contributions
 pic-0.025
 Christmas-0.023
ine-0.022
 hotels-0.021
 weeks-0.021
Token it
Token ID340
Feature activation+3.64e-01

Pos logit contributions
 pic+0.005
 Christmas+0.005
ine+0.005
 weeks+0.005
 jars+0.005
Neg logit contributions
ling-0.006
ishly-0.006
 sorry-0.006
 gru-0.006
oding-0.006
Token flew
Token ID13112
Feature activation-8.88e-01

Pos logit contributions
ling+0.008
ishly+0.008
 sorry+0.008
 gru+0.008
oding+0.008
Neg logit contributions
 pic-0.007
 Christmas-0.006
ine-0.006
 weeks-0.006
 jars-0.006
Token into
Token ID656
Feature activation-6.55e-01

Pos logit contributions
 pic+0.023
 Christmas+0.023
ine+0.022
 weeks+0.021
 jars+0.021
Neg logit contributions
ling-0.014
ishly-0.014
 sorry-0.013
 gru-0.013
 terrified-0.013
Token the
Token ID262
Feature activation+1.77e+00

Pos logit contributions
ine+0.011
 pic+0.011
 Christmas+0.011
ast+0.010
aters+0.010
Neg logit contributions
 gru-0.015
 terrified-0.015
 sorry-0.015
 relief-0.015
 alarmed-0.015
Token goal
Token ID3061
Feature activation-5.04e-01

Pos logit contributions
ling+0.043
ishly+0.043
 sorry+0.042
oding+0.042
 gru+0.041
Neg logit contributions
 pic-0.030
 Christmas-0.029
 jars-0.027
 strips-0.027
ine-0.027
Token easily
Token ID3538
Feature activation-1.83e+00

Pos logit contributions
 pic+0.010
 Christmas+0.010
 hotels+0.009
ast+0.009
ine+0.009
Neg logit contributions
ling-0.011
 gru-0.010
 terrified-0.010
ishly-0.010
hing-0.010
Token.
Token ID13
Feature activation+6.71e-01

Pos logit contributions
 pic+0.038
 Christmas+0.036
 hotels+0.034
 Monday+0.033
 jars+0.033
Neg logit contributions
ling-0.040
ishly-0.038
 sorry-0.038
 gru-0.037
 terrified-0.037
Token to
Token ID284
Feature activation-8.21e-02

Pos logit contributions
ling+0.002
 sorry+0.002
ishly+0.002
 terrified+0.002
 alarmed+0.002
Neg logit contributions
 pic-0.001
ine-0.001
 Christmas-0.001
 Monday-0.001
 jars-0.001
Token the
Token ID262
Feature activation+1.35e+00

Pos logit contributions
 pic+0.001
 Christmas+0.001
ine+0.001
 Monday+0.001
 weeks+0.001
Neg logit contributions
ling-0.002
ishly-0.002
 sorry-0.002
 gru-0.002
oding-0.002
Token p
Token ID279
Feature activation-2.82e+00

Pos logit contributions
ling+0.032
 sorry+0.032
ishly+0.032
 gru+0.031
 terrified+0.031
Neg logit contributions
 pic-0.023
 Christmas-0.022
ine-0.022
 weeks-0.021
 Monday-0.020
Tokenuddle
Token ID24500
Feature activation-7.80e-02

Pos logit contributions
 Christmas+0.091
 pic+0.091
 weeks+0.087
 Monday+0.087
 Saturday+0.084
Neg logit contributions
ling-0.029
ishly-0.026
oding-0.024
 sorry-0.023
 gru-0.023
Token and
Token ID290
Feature activation-1.91e-01

Pos logit contributions
 pic+0.001
 Christmas+0.001
ine+0.001
 Monday+0.001
 weeks+0.001
Neg logit contributions
ling-0.002
ishly-0.002
 sorry-0.002
 gru-0.002
 terrified-0.002
Token jumped
Token ID11687
Feature activation-1.22e+00

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 weeks+0.003
 Monday+0.003
Neg logit contributions
 sorry-0.004
ishly-0.004
ling-0.004
 gru-0.004
 terrified-0.004
Token in
Token ID287
Feature activation-1.78e-01

Pos logit contributions
 pic+0.027
 Christmas+0.026
ine+0.025
 weeks+0.024
 Monday+0.024
Neg logit contributions
ling-0.024
 sorry-0.024
ishly-0.024
 gru-0.023
 terrified-0.023
Token it
Token ID340
Feature activation+2.32e-01

Pos logit contributions
 pic+0.003
 Christmas+0.003
ine+0.003
 Monday+0.002
 weeks+0.002
Neg logit contributions
 sorry-0.004
ling-0.004
 gru-0.004
ishly-0.004
 terrified-0.004
Token and
Token ID290
Feature activation+2.65e-02

Pos logit contributions
ling+0.006
ishly+0.006
 sorry+0.005
 gru+0.005
 terrified+0.005
Neg logit contributions
 pic-0.004
 Christmas-0.004
ine-0.004
 weeks-0.004
 hotels-0.004
Token spl
Token ID4328
Feature activation-1.60e-02

Pos logit contributions
ling+0.001
ishly+0.001
 gru+0.001
oding+0.001
 sorry+0.001
Neg logit contributions
 pic-0.000
 Christmas-0.000
ine-0.000
 jars-0.000
 weeks-0.000
Tokenashed
Token ID5263
Feature activation+6.64e-02

Pos logit contributions
 pic+0.000
ine+0.000
 Christmas+0.000
ast+0.000
 planned+0.000
Neg logit contributions
 gru-0.000
 sorry-0.000
ishly-0.000
 terrified-0.000
 alarmed-0.000
Token and
Token ID290
Feature activation+8.51e-01

Pos logit contributions
 pic+0.014
 Christmas+0.014
 Monday+0.013
ine+0.013
 folders+0.013
Neg logit contributions
ling-0.022
ishly-0.021
 sorry-0.021
 terrified-0.021
 gru-0.021
Token said
Token ID531
Feature activation+2.56e-01

Pos logit contributions
ling+0.017
ishly+0.017
 gru+0.016
 sorry+0.016
hing+0.016
Neg logit contributions
 pic-0.018
 Christmas-0.018
ine-0.017
 weeks-0.016
 planned-0.016
Token,
Token ID11
Feature activation-1.05e+00

Pos logit contributions
ling+0.008
 vibration+0.008
ishly+0.008
 terrified+0.008
hing+0.008
Neg logit contributions
 Christmas-0.003
 pic-0.003
 weeks-0.002
 Monday-0.002
ine-0.002
Token "
Token ID366
Feature activation+1.08e+00

Pos logit contributions
 pic+0.021
 Christmas+0.020
 jars+0.019
ine+0.019
 strips+0.019
Neg logit contributions
 sorry-0.024
ishly-0.023
 gru-0.022
ling-0.022
 relieved-0.022
TokenMom
Token ID29252
Feature activation-1.12e+00

Pos logit contributions
ling+0.018
ishly+0.018
 gru+0.017
hing+0.017
 vibration+0.017
Neg logit contributions
 pic-0.026
 Christmas-0.025
 jars-0.025
ine-0.024
 hotels-0.024
Token,
Token ID11
Feature activation-2.17e+00

Pos logit contributions
 pic+0.029
 hotels+0.025
 folders+0.025
 weeks+0.025
 Christmas+0.025
Neg logit contributions
 sorry-0.020
ishly-0.020
ling-0.019
oding-0.018
 gru-0.017
Token we
Token ID356
Feature activation+1.75e+00

Pos logit contributions
 pic+0.038
ine+0.038
aters+0.035
 strips+0.034
 weeks+0.034
Neg logit contributions
 sorry-0.054
 hi-0.050
ishly-0.049
 gru-0.048
 relief-0.048
Token did
Token ID750
Feature activation-1.84e+00

Pos logit contributions
ling+0.037
 sorry+0.037
ishly+0.036
hing+0.035
 terrified+0.035
Neg logit contributions
 pic-0.036
 Christmas-0.034
ine-0.032
 weeks-0.032
 Monday-0.032
Token it
Token ID340
Feature activation+7.84e-01

Pos logit contributions
 pic+0.035
 Christmas+0.033
ine+0.033
 weeks+0.033
aters+0.032
Neg logit contributions
 sorry-0.042
 terrified-0.038
 gru-0.038
ling-0.038
ishly-0.038
Token!
Token ID0
Feature activation-1.03e+00

Pos logit contributions
ling+0.019
ishly+0.018
 sorry+0.018
 gru+0.018
oding+0.017
Neg logit contributions
 pic-0.014
 Christmas-0.013
ine-0.013
 jars-0.013
 weeks-0.012
Token We
Token ID775
Feature activation+2.09e+00

Pos logit contributions
 pic+0.022
 jars+0.022
 weeks+0.021
 Christmas+0.021
ine+0.021
Neg logit contributions
 sorry-0.022
ishly-0.021
 gru-0.020
 hi-0.020
 relieved-0.020
Token win
Token ID1592
Feature activation-7.65e-01

Pos logit contributions
 pic+0.014
 Christmas+0.014
ine+0.013
 weeks+0.013
 Monday+0.012
Neg logit contributions
ling-0.014
ishly-0.014
 sorry-0.014
 gru-0.014
oding-0.013
Token!
Token ID0
Feature activation+7.68e-01

Pos logit contributions
 pic+0.015
 Christmas+0.014
aters+0.013
ast+0.013
 Monday+0.012
Neg logit contributions
ling-0.017
 terrified-0.017
 sorry-0.017
 gru-0.016
 alarmed-0.016
Token[space]
Token ID220
Feature activation-4.77e-01

Pos logit contributions
 sorry+0.018
 gru+0.017
ishly+0.017
 vibration+0.016
 terrified+0.016
Neg logit contributions
 pic-0.014
 Christmas-0.013
ine-0.013
 weeks-0.013
ast-0.012
Token[newline]
Token ID198
Feature activation-6.15e-01

Pos logit contributions
 pic+0.006
 Christmas+0.005
ine+0.005
 Grand+0.005
 Monday+0.005
Neg logit contributions
ishly-0.014
ling-0.014
 vibration-0.014
 gru-0.014
 sorry-0.014
Token[newline]
Token ID198
Feature activation-6.82e-01

Pos logit contributions
 pic+0.009
 Christmas+0.009
ine+0.008
 Monday+0.008
 weeks+0.008
Neg logit contributions
ishly-0.017
 sorry-0.017
ling-0.017
 vibration-0.016
 gru-0.016
TokenAn
Token ID2025
Feature activation-3.95e+00

Pos logit contributions
 pic+0.013
 jars+0.012
 Christmas+0.012
ine+0.012
 passports+0.012
Neg logit contributions
hing-0.015
ling-0.015
 gru-0.014
cut-0.013
faced-0.013
Token ostr
Token ID42387
Feature activation-1.99e+00

Pos logit contributions
 pic+0.091
 Christmas+0.082
 jars+0.081
aters+0.080
 Wed+0.078
Neg logit contributions
ling-0.082
 gru-0.072
 sorry-0.069
cut-0.069
faced-0.068
Tokenich
Token ID488
Feature activation+1.33e+00

Pos logit contributions
 Christmas+0.038
 pic+0.037
ine+0.036
ast+0.035
 weeks+0.034
Neg logit contributions
 sorry-0.044
 gru-0.043
 relief-0.043
 vibration-0.043
ling-0.042
Token was
Token ID373
Feature activation+1.12e+00

Pos logit contributions
ling+0.031
ishly+0.030
 sorry+0.030
 gru+0.030
 terrified+0.029
Neg logit contributions
 pic-0.024
 Christmas-0.023
ine-0.022
 jars-0.021
 weeks-0.021
Token in
Token ID287
Feature activation+4.79e-01

Pos logit contributions
 sorry+0.024
ling+0.024
 terrified+0.023
 gru+0.023
ishly+0.023
Neg logit contributions
 pic-0.023
 Christmas-0.022
ine-0.021
 jars-0.021
 weeks-0.021
Token the
Token ID262
Feature activation+2.13e+00

Pos logit contributions
 relief+0.009
 sorry+0.009
 gru+0.009
 terrified+0.009
 vibration+0.008
Neg logit contributions
ell-0.010
ast-0.010
aters-0.010
ip-0.010
ine-0.010
Token join
Token ID4654
Feature activation+4.73e-01

Pos logit contributions
 pic+0.014
 Christmas+0.013
 jars+0.013
 weeks+0.013
 strips+0.013
Neg logit contributions
 sorry-0.014
ling-0.014
ishly-0.014
 apologise-0.013
hing-0.013
Token today
Token ID1909
Feature activation-2.53e+00

Pos logit contributions
 sorry+0.011
ling+0.011
ishly+0.010
 apologise+0.010
hing+0.010
Neg logit contributions
 pic-0.009
ine-0.009
 Christmas-0.008
 weeks-0.008
 strips-0.008
Token."
Token ID526
Feature activation+7.69e-01

Pos logit contributions
ine+0.052
 pic+0.051
ast+0.050
 folders+0.048
 Christmas+0.048
Neg logit contributions
ling-0.047
 sorry-0.045
ishly-0.042
 gru-0.041
 terrified-0.040
Token[newline]
Token ID198
Feature activation-1.04e-01

Pos logit contributions
ling+0.021
ishly+0.021
hing+0.020
 vibration+0.020
 gru+0.020
Neg logit contributions
 pic-0.011
 Christmas-0.010
ine-0.010
 Monday-0.009
 weeks-0.009
Token[newline]
Token ID198
Feature activation-8.14e-02

Pos logit contributions
 pic+0.001
 Monday+0.001
 Christmas+0.001
 Grand+0.001
ine+0.001
Neg logit contributions
ishly-0.003
 vibration-0.003
ling-0.003
rified-0.003
hing-0.003
TokenM
Token ID44
Feature activation-4.37e+00

Pos logit contributions
 pic+0.001
 Christmas+0.001
ine+0.001
 weeks+0.001
 Monday+0.001
Neg logit contributions
ling-0.002
ishly-0.002
 sorry-0.002
 gru-0.002
 vibration-0.002
Tokenum
Token ID388
Feature activation-1.40e+00

Pos logit contributions
 pic+0.090
 Christmas+0.086
 hotels+0.084
 folders+0.079
 jars+0.077
Neg logit contributions
 sorry-0.096
ling-0.094
 gru-0.090
oding-0.090
 alarmed-0.087
Token smiled
Token ID13541
Feature activation+2.70e-01

Pos logit contributions
 pic+0.028
 Christmas+0.026
 hotels+0.024
 weeks+0.024
 jars+0.024
Neg logit contributions
ling-0.032
ishly-0.032
 sorry-0.031
 gru-0.030
 alarmed-0.029
Token.
Token ID13
Feature activation-3.80e-01

Pos logit contributions
ishly+0.009
hing+0.009
 gru+0.009
ling+0.009
 vibration+0.009
Neg logit contributions
 Christmas-0.002
 pic-0.002
ine-0.002
 weeks-0.002
 Grand-0.002
Token She
Token ID1375
Feature activation-1.55e+00

Pos logit contributions
 pic+0.007
 Christmas+0.007
ine+0.006
 weeks+0.006
 hotels+0.006
Neg logit contributions
 sorry-0.009
ishly-0.009
ling-0.009
 gru-0.009
 alarmed-0.008
Token knew
Token ID2993
Feature activation-6.17e-01

Pos logit contributions
 pic+0.034
 Christmas+0.033
 hotels+0.031
 jars+0.031
ine+0.031
Neg logit contributions
ling-0.030
ishly-0.030
 sorry-0.030
 gru-0.029
 terrified-0.029
Token.
Token ID13
Feature activation-3.88e-01

Pos logit contributions
 sorry+0.011
 shocked+0.011
ling+0.010
 apologise+0.010
 wrong+0.010
Neg logit contributions
 pic-0.016
ast-0.015
 Christmas-0.015
 maple-0.015
aters-0.015
Token Their
Token ID5334
Feature activation-1.90e+00

Pos logit contributions
ine+0.007
ast+0.007
 pic+0.007
 weeks+0.006
com+0.006
Neg logit contributions
 sorry-0.010
 gru-0.009
 ashamed-0.008
 shocked-0.008
 terrified-0.008
Token bags
Token ID11668
Feature activation-7.42e-01

Pos logit contributions
ine+0.034
 pic+0.034
alia+0.032
ona+0.032
 Saturday+0.032
Neg logit contributions
 sorry-0.045
 shocked-0.038
 sincere-0.037
 hi-0.037
 apologise-0.037
Token are
Token ID389
Feature activation-1.60e+00

Pos logit contributions
 pic+0.017
 Christmas+0.016
ine+0.015
 weeks+0.015
 Monday+0.015
Neg logit contributions
 sorry-0.015
ling-0.015
ishly-0.014
 terrified-0.014
oding-0.014
Token too
Token ID1165
Feature activation-7.85e-01

Pos logit contributions
 pic+0.042
ell+0.039
ib+0.038
ara+0.037
aters+0.037
Neg logit contributions
 sorry-0.032
 shocked-0.029
 wrong-0.028
 furious-0.028
 terrified-0.027
Token full
Token ID1336
Feature activation-3.91e+00

Pos logit contributions
 pic+0.022
 Mon+0.021
aters+0.021
 Wed+0.021
oki+0.020
Neg logit contributions
 sorry-0.013
 shocked-0.012
ling-0.012
 relieved-0.011
 wrong-0.011
Token.
Token ID13
Feature activation-1.14e-01

Pos logit contributions
 pic+0.089
 Christmas+0.082
ast+0.082
 hotels+0.082
 Wed+0.080
Neg logit contributions
ling-0.073
 sorry-0.071
ishly-0.068
hing-0.063
rified-0.063
Token Their
Token ID5334
Feature activation-1.69e+00

Pos logit contributions
ine+0.002
ast+0.002
 pic+0.002
 weeks+0.002
 Christmas+0.002
Neg logit contributions
 sorry-0.003
 gru-0.003
 ashamed-0.002
 relieved-0.002
 shocked-0.002
Token things
Token ID1243
Feature activation+1.51e+00

Pos logit contributions
alia+0.030
ona+0.030
ine+0.030
 Mon+0.029
 pic+0.028
Neg logit contributions
 sorry-0.040
 relief-0.034
 shocked-0.034
 blame-0.033
 hi-0.033
Token do
Token ID466
Feature activation-2.88e+00

Pos logit contributions
ling+0.032
 sorry+0.032
ishly+0.031
 terrified+0.030
 gru+0.030
Neg logit contributions
 pic-0.031
 Christmas-0.030
ine-0.029
 weeks-0.028
 Monday-0.027
Token not
Token ID407
Feature activation+9.75e-01

Pos logit contributions
aters+0.080
 pic+0.077
 strawberries+0.073
ib+0.072
 Pr+0.070
Neg logit contributions
 wrong-0.044
 sorry-0.039
 shocked-0.038
 noise-0.038
 apologise-0.038
Token advantage
Token ID4621
Feature activation+7.80e-01

Pos logit contributions
 pic+0.015
 Christmas+0.014
 jars+0.014
 hotels+0.013
ine+0.013
Neg logit contributions
 sorry-0.013
ling-0.013
 gru-0.013
ishly-0.013
 alarmed-0.012
Token.
Token ID13
Feature activation-1.95e-02

Pos logit contributions
ling+0.017
 sorry+0.017
ishly+0.017
oding+0.016
hing+0.016
Neg logit contributions
 pic-0.016
 Christmas-0.014
 hotels-0.014
 jars-0.013
 folders-0.013
Tokenâ
Token ID22940
Feature activation+7.19e-01

Pos logit contributions
 Christmas+0.000
 pic+0.000
 weeks+0.000
 jars+0.000
 strips+0.000
Neg logit contributions
 sorry-0.000
 gru-0.000
 relieved-0.000
 hi-0.000
 terrified-0.000
Token
Token ID26391
Feature activation+6.45e-01

Pos logit contributions
ling+0.012
ishly+0.012
 sorry+0.011
 gru+0.011
 vibration+0.011
Neg logit contributions
 pic-0.018
 Christmas-0.018
ine-0.017
 jars-0.017
 weeks-0.017
Token[newline]
Token ID198
Feature activation+2.80e-01

Pos logit contributions
 relieved+0.013
 sorry+0.013
 terrified+0.012
oding+0.012
 shocked+0.012
Neg logit contributions
 Christmas-0.013
 pic-0.013
 folders-0.013
 planned-0.013
 weeks-0.013
TokenM
Token ID44
Feature activation-4.40e+00

Pos logit contributions
ishly+0.006
ling+0.006
 vibration+0.006
 sorry+0.006
 gru+0.006
Neg logit contributions
 pic-0.006
 Christmas-0.006
ine-0.005
 Monday-0.005
 weeks-0.005
Tokenate
Token ID378
Feature activation-1.97e+00

Pos logit contributions
 pic+0.099
 folders+0.098
 hotels+0.093
 skirts+0.091
 puzzles+0.089
Neg logit contributions
 sorry-0.081
inking-0.079
ling-0.079
 gru-0.077
oding-0.076
Tokeno
Token ID78
Feature activation-4.84e-01

Pos logit contributions
 pic+0.044
 Christmas+0.040
 hotels+0.040
 jars+0.038
aters+0.038
Neg logit contributions
ling-0.044
ishly-0.039
hing-0.037
 sorry-0.037
 shocked-0.036
Token had
Token ID550
Feature activation-1.78e+00

Pos logit contributions
 pic+0.010
 Christmas+0.010
 hotels+0.010
 jars+0.009
 folders+0.009
Neg logit contributions
ling-0.011
ishly-0.010
 sorry-0.010
 terrified-0.010
 relieved-0.010
Token to
Token ID284
Feature activation-6.67e-01

Pos logit contributions
 Christmas+0.034
 pic+0.034
 hotels+0.033
aters+0.032
ast+0.032
Neg logit contributions
 sorry-0.037
 gru-0.035
 terrified-0.035
 ang-0.035
 relieved-0.034
Token think
Token ID892
Feature activation-4.40e-01

Pos logit contributions
 pic+0.014
 Christmas+0.013
 jars+0.012
 weeks+0.012
ine+0.012
Neg logit contributions
 sorry-0.014
ishly-0.014
ling-0.014
 apologise-0.013
 gru-0.013
Token making
Token ID1642
Feature activation-6.18e-01

Pos logit contributions
ling+0.026
 sorry+0.026
ishly+0.025
 gru+0.025
 terrified+0.024
Neg logit contributions
 pic-0.018
ine-0.017
 Christmas-0.017
aters-0.016
ast-0.016
Token noise
Token ID7838
Feature activation+2.19e+00

Pos logit contributions
 pic+0.011
 Christmas+0.010
ine+0.010
 weeks+0.010
 Monday+0.010
Neg logit contributions
ling-0.015
 sorry-0.015
ishly-0.014
 gru-0.014
 alarmed-0.014
Token.
Token ID13
Feature activation+5.93e-01

Pos logit contributions
ling+0.055
 sorry+0.052
hing+0.050
ishly+0.049
 terrified+0.048
Neg logit contributions
 pic-0.041
 Christmas-0.037
 Monday-0.035
 weeks-0.035
ine-0.035
Token[newline]
Token ID198
Feature activation-1.05e+00

Pos logit contributions
ishly+0.012
 sorry+0.012
ling+0.012
 gru+0.012
 terrified+0.012
Neg logit contributions
 pic-0.012
 Christmas-0.012
 jars-0.011
 weeks-0.011
ine-0.011
Token[newline]
Token ID198
Feature activation-1.00e+00

Pos logit contributions
 pic+0.020
 Christmas+0.019
 jars+0.018
ine+0.018
 folders+0.017
Neg logit contributions
ling-0.024
 gru-0.023
 sorry-0.023
oding-0.023
 terrified-0.023
TokenM
Token ID44
Feature activation-4.62e+00

Pos logit contributions
 pic+0.017
 Christmas+0.016
 jars+0.015
 folders+0.015
ine+0.015
Neg logit contributions
ling-0.025
 gru-0.024
 terrified-0.024
 relieved-0.024
 alarmed-0.024
Tokenama
Token ID1689
Feature activation-1.68e+00

Pos logit contributions
 pic+0.108
 jars+0.101
 hotels+0.100
 Christmas+0.099
 puzzles+0.098
Neg logit contributions
ling-0.093
ishly-0.089
 sorry-0.081
ivering-0.081
 alarmed-0.079
Token:
Token ID25
Feature activation+1.47e+00

Pos logit contributions
 pic+0.038
 Christmas+0.035
 hotels+0.034
 jars+0.033
aters+0.033
Neg logit contributions
ling-0.036
ishly-0.034
 gru-0.032
 terrified-0.032
 sorry-0.032
Token "
Token ID366
Feature activation+1.16e+00

Pos logit contributions
 sorry+0.035
ling+0.035
ishly+0.035
 gru+0.034
 terrified+0.033
Neg logit contributions
 pic-0.026
 Christmas-0.025
ine-0.024
 weeks-0.023
 jars-0.023
TokenLet
Token ID5756
Feature activation+7.26e-01

Pos logit contributions
ling+0.021
hing+0.020
 gru+0.020
ishly+0.019
 vibration+0.019
Neg logit contributions
 pic-0.027
 Christmas-0.027
ine-0.025
 jars-0.025
 weeks-0.024
Token's
Token ID338
Feature activation-7.56e-01

Pos logit contributions
ling+0.017
 sorry+0.017
ishly+0.017
 gru+0.016
 alarmed+0.016
Neg logit contributions
 pic-0.013
 Christmas-0.013
 jars-0.012
 weeks-0.012
ine-0.012