TOP ACTIVATIONS
MAX = 0.101

. " Can we take some corn ?" Tom
idea . Let 's get some cream and make some ice
" Can I have some ice , please ?" Tim
mom set to making a delicious pie . They took out
Lily . Let 's get some tasty ice cream ," said
, " Can I buy some ice cream ?"
Yes , you can have some ice cream !"
the tree and picked a juicy peach . But it was
" Can we have some ?" Lily asked .
ran to grab all sorts of decorations , including balloons ,
friends and ate lots of delicious food . She had so
! Can we please have some ?" asked Lily .
. Let 's get some tasty ice cream ," said her
backyard . She found a ripe one and carefully pl ucked
, let 's go get some ." When
She would take all sorts of things and make them spark
, " Can I have some ice cream now ?" Her
" You can choose either an apple or a banana ."
" Can we have some ice cream , please ?"
, I 'll get you some ." But

MIN ACTIVATIONS
MIN = -0.099

little girl playing a toy sax ophone . She was playing
were two musicians playing their guitars . They were singing and
, a woman with a violin , a boy with a
toys . She put the drum around her neck and the
his dream of playing the violin had come true . <|endoftext|>
time and became a great violin ist . <|endoftext|> Once upon
when she saw Gina 's violin . She wanted one too
sunshine . One day , the sky went pale and it
. They b anged their drums and lifted their feet .
her friend playing a beautiful violin . She asked her friend
watch . It hit the golf hole in one shot -
sound . It is a violin . She follows the sound
article about the local ar chery tournament . He decided he
a ribbon with a big bow . It was a pink
and a girl with a piano . They hear how the
was wearing a pretty pink bow in her hair .
a fantastic time visiting the jazz club . She danced and
and always wore his lucky red shirt . One day ,
, she stopped playing her violin . Then
, he is playing a sax ophone . It is a

INTERVAL 0.051 - 0.101
CONTAINS 0.193%

bear named Brown ie . Brown ie was very lonely because
in the kitchen , mixing some things together . Suddenly ,
bury the corn . The farmer was so happy that he
named Lily . She loved ice - cream more than anything
who wanted a haircut . She asked her mom if she

INTERVAL 0.001 - 0.051
CONTAINS 46.676%

some crunch y carrot sticks . They were very excited to
it . They heard a buzzing sound in their ears .
Finally they reached a good spot to fish and
the strawberries . The strawberries were sweet and y ummy .
cookie . Her mom said , " Yes , but you

INTERVAL -0.049 - 0.001
CONTAINS 52.974%

know how tall he was ! From that
" Yes , I do . Can I carry my guitar
't know either . Suddenly , Danny saw it ! Sour
started to spin around and around . He spun faster and
cats . And there are no other par rots . I

INTERVAL -0.099 - -0.049
CONTAINS 0.157%

and Anna like to play soccer . They are on the
a big box with a bow . " Happy
The End . <|endoftext|> One day Mom my
day when Jake and his dog , F uzz y ,
, a gentleman wearing a hat stopped and waved at them
Token.
Token ID13
Feature activation-5.75e-03

Pos logit contributions
cher+0.016
ses+0.015
 spoiler+0.015
robe+0.015
 trigger+0.015
Neg logit contributions
 asked-0.012
 Jane-0.012
 tidy-0.011
 clean-0.010
 John-0.010
Token "
Token ID366
Feature activation+2.14e-02

Pos logit contributions
ses+0.009
ider+0.009
robe+0.009
cher+0.009
 spoiler+0.009
Neg logit contributions
 tidy-0.007
 asked-0.007
 Jane-0.007
agg-0.007
by-0.007
TokenCan
Token ID6090
Feature activation+1.90e-02

Pos logit contributions
 asked+0.027
 Jane+0.026
 tidy+0.025
Jane+0.022
 clean+0.022
Neg logit contributions
ses-0.038
cher-0.038
ider-0.037
robe-0.036
 trigger-0.036
Token we
Token ID356
Feature activation+4.58e-02

Pos logit contributions
 asked+0.033
 Jane+0.033
 tidy+0.032
agg+0.029
Jane+0.029
Neg logit contributions
cher-0.023
ses-0.023
ider-0.023
 spoiler-0.022
 trigger-0.022
Token take
Token ID1011
Feature activation+6.03e-02

Pos logit contributions
 Jane+0.072
Jane+0.071
Kat+0.067
agg+0.067
 Katie+0.067
Neg logit contributions
pper-0.059
cher-0.053
ses-0.053
 loser-0.053
hered-0.053
Token some
Token ID617
Feature activation+1.01e-01

Pos logit contributions
 tidy+0.101
Jane+0.100
 Jane+0.099
 asked+0.094
agg+0.092
Neg logit contributions
 trigger-0.069
hered-0.067
cher-0.067
pper-0.066
ses-0.065
Token corn
Token ID11676
Feature activation+4.10e-02

Pos logit contributions
 Jane+0.145
Jane+0.142
Kat+0.132
 Katie+0.127
 John+0.125
Neg logit contributions
 trigger-0.133
pper-0.133
 wand-0.131
plets-0.124
 flap-0.123
Token?"
Token ID1701
Feature activation+1.46e-02

Pos logit contributions
 Jane+0.054
 asked+0.052
 tidy+0.050
Jane+0.048
 John+0.046
Neg logit contributions
cher-0.068
ider-0.067
ses-0.065
robe-0.065
pper-0.064
Token\n
Token ID198
Feature activation+1.35e-04

Pos logit contributions
 tidy+0.017
 Jane+0.015
 asked+0.015
agg+0.015
 admired+0.014
Neg logit contributions
ider-0.025
ses-0.024
cher-0.024
 spoiler-0.024
robe-0.023
Token\n
Token ID198
Feature activation-4.67e-03

Pos logit contributions
 tidy+0.000
 Jane+0.000
 asked+0.000
 desserts+0.000
 admired+0.000
Neg logit contributions
cher-0.000
robe-0.000
ider-0.000
ses-0.000
 spoiler-0.000
TokenTom
Token ID13787
Feature activation+1.57e-02

Pos logit contributions
ider+0.007
cher+0.007
pper+0.007
 loser+0.007
 spoiler+0.007
Neg logit contributions
 Jane-0.007
 asked-0.006
 tidy-0.006
 Baby-0.006
 Maggie-0.006
Token idea
Token ID2126
Feature activation+2.28e-02

Pos logit contributions
 asked+0.050
 Jane+0.047
 admired+0.046
agg+0.045
 tidy+0.045
Neg logit contributions
 loser-0.042
ses-0.041
ider-0.040
 shooter-0.039
 spoiler-0.039
Token.
Token ID13
Feature activation+8.99e-03

Pos logit contributions
 asked+0.029
 Jane+0.028
 tidy+0.027
agg+0.025
Jane+0.024
Neg logit contributions
cher-0.038
ider-0.038
ses-0.037
robe-0.036
 trigger-0.036
Token Let
Token ID3914
Feature activation-9.49e-03

Pos logit contributions
 asked+0.015
 tidy+0.015
 Jane+0.014
agg+0.013
Jane+0.013
Neg logit contributions
cher-0.012
ider-0.011
ses-0.011
 spoiler-0.011
robe-0.011
Token's
Token ID338
Feature activation+3.86e-02

Pos logit contributions
hered+0.011
pper+0.011
 trigger+0.011
cher+0.010
 AW+0.010
Neg logit contributions
 Jane-0.015
agg-0.015
 tidy-0.014
 admired-0.014
 desserts-0.014
Token get
Token ID651
Feature activation+5.49e-02

Pos logit contributions
 Jane+0.060
Jane+0.059
agg+0.059
Kat+0.056
 asked+0.055
Neg logit contributions
 spoiler-0.045
 trigger-0.044
cher-0.044
pper-0.044
 loser-0.043
Token some
Token ID617
Feature activation+9.59e-02

Pos logit contributions
 Jane+0.087
 asked+0.083
agg+0.083
 tidy+0.083
Jane+0.082
Neg logit contributions
cher-0.070
ider-0.064
 trigger-0.058
 loser-0.058
hered-0.058
Token cream
Token ID8566
Feature activation+4.54e-02

Pos logit contributions
 Jane+0.131
Kat+0.124
Jane+0.123
 Jessie+0.121
 Katie+0.120
Neg logit contributions
ider-0.130
cher-0.125
 controller-0.123
 trigger-0.122
 loser-0.121
Token and
Token ID290
Feature activation+3.10e-02

Pos logit contributions
 asked+0.060
 Jane+0.059
 tidy+0.057
agg+0.053
Jane+0.052
Neg logit contributions
cher-0.075
ider-0.073
robe-0.071
ses-0.070
pper-0.069
Token make
Token ID787
Feature activation+4.19e-02

Pos logit contributions
 Jane+0.042
agg+0.041
Jane+0.040
 asked+0.040
 tidy+0.039
Neg logit contributions
cher-0.044
 spoiler-0.042
ider-0.042
 trigger-0.041
ses-0.041
Token some
Token ID617
Feature activation+7.37e-02

Pos logit contributions
 asked+0.058
 Jane+0.058
 tidy+0.056
Jane+0.054
 admired+0.053
Neg logit contributions
cher-0.056
 controller-0.053
 trigger-0.053
 spoiler-0.052
pper-0.052
Token ice
Token ID4771
Feature activation+6.00e-02

Pos logit contributions
 Jane+0.079
 asked+0.077
Jane+0.070
 Katie+0.070
idi+0.069
Neg logit contributions
ider-0.125
cher-0.123
 spoiler-0.120
 trigger-0.119
 loser-0.119
Token\n
Token ID198
Feature activation+1.57e-02

Pos logit contributions
 tidy+0.037
 desserts+0.035
 Jane+0.035
 asked+0.034
 admired+0.033
Neg logit contributions
robe-0.035
cher-0.035
hered-0.034
 controller-0.033
 CD-0.032
Token"
Token ID1
Feature activation+1.30e-02

Pos logit contributions
 Jane+0.022
 asked+0.021
 Katie+0.020
 tidy+0.020
 Maggie+0.019
Neg logit contributions
ider-0.024
cher-0.023
hered-0.023
pper-0.023
robe-0.022
TokenCan
Token ID6090
Feature activation+7.78e-03

Pos logit contributions
 Jane+0.020
 asked+0.019
 tidy+0.019
agg+0.018
 Katie+0.017
Neg logit contributions
cher-0.019
ider-0.018
hered-0.018
robe-0.018
 trigger-0.018
Token I
Token ID314
Feature activation+3.60e-02

Pos logit contributions
 asked+0.013
 Jane+0.013
 tidy+0.013
agg+0.012
Jane+0.012
Neg logit contributions
cher-0.010
ider-0.010
ses-0.009
robe-0.009
 trigger-0.009
Token have
Token ID423
Feature activation+5.56e-02

Pos logit contributions
 Jane+0.058
 asked+0.057
 tidy+0.054
agg+0.053
 Katie+0.053
Neg logit contributions
cher-0.047
ider-0.045
pper-0.044
hered-0.044
ses-0.043
Token some
Token ID617
Feature activation+9.52e-02

Pos logit contributions
 Jane+0.088
 tidy+0.083
Jane+0.083
agg+0.082
 asked+0.080
Neg logit contributions
cher-0.067
 spoiler-0.065
ses-0.063
pper-0.063
hered-0.063
Token ice
Token ID4771
Feature activation+6.04e-02

Pos logit contributions
Kat+0.135
 Jane+0.134
 Jessie+0.133
 Katie+0.131
anne+0.130
Neg logit contributions
ider-0.133
 loser-0.125
 spoiler-0.118
ses-0.116
arf-0.115
Token,
Token ID11
Feature activation+2.38e-02

Pos logit contributions
 asked+0.057
 Jane+0.055
 Katie+0.051
 tidy+0.050
 Jessie+0.050
Neg logit contributions
ider-0.123
cher-0.119
robe-0.118
ses-0.116
pper-0.115
Token please
Token ID3387
Feature activation+3.75e-02

Pos logit contributions
 asked+0.024
 Jane+0.023
agg+0.023
by+0.022
 admired+0.022
Neg logit contributions
cher-0.043
 spoiler-0.043
ses-0.043
 trigger-0.042
 robber-0.042
Token?"
Token ID1701
Feature activation+2.25e-02

Pos logit contributions
 Jane+0.061
Jane+0.060
agg+0.060
Kat+0.058
 admired+0.057
Neg logit contributions
cher-0.045
ider-0.044
pper-0.043
ses-0.042
 spoiler-0.040
Token Tim
Token ID5045
Feature activation+3.01e-03

Pos logit contributions
Kat+0.026
by+0.026
 tidy+0.026
agg+0.026
 desserts+0.024
Neg logit contributions
ses-0.036
hered-0.034
 spoiler-0.034
 trigger-0.034
ider-0.033
Token mom
Token ID1995
Feature activation+3.68e-02

Pos logit contributions
 asked+0.007
 Jane+0.007
 tidy+0.006
Jane+0.006
agg+0.006
Neg logit contributions
ses-0.011
robe-0.011
ider-0.010
cher-0.010
 spoiler-0.010
Token set
Token ID900
Feature activation+4.15e-02

Pos logit contributions
 pic+0.046
 Jane+0.045
Kat+0.043
 Katie+0.042
 Clara+0.041
Neg logit contributions
pper-0.053
cher-0.049
hered-0.047
ede-0.047
wheel-0.046
Token to
Token ID284
Feature activation+4.40e-02

Pos logit contributions
 asked+0.057
 Jane+0.055
 tidy+0.054
Jane+0.052
 admired+0.050
Neg logit contributions
hered-0.060
 trigger-0.059
cher-0.059
ses-0.057
 controller-0.056
Token making
Token ID1642
Feature activation+4.12e-02

Pos logit contributions
agg+0.072
Jane+0.071
 Jane+0.068
mber+0.068
Kat+0.066
Neg logit contributions
 flap-0.042
 controller-0.041
cher-0.039
 grant-0.039
 trigger-0.039
Token a
Token ID257
Feature activation+6.37e-02

Pos logit contributions
 tidy+0.062
 asked+0.062
Jane+0.062
idi+0.060
 Jane+0.060
Neg logit contributions
 trigger-0.042
ses-0.041
 controller-0.041
 wand-0.040
cher-0.038
Token delicious
Token ID12625
Feature activation+9.37e-02

Pos logit contributions
ere+0.077
 asked+0.076
 John+0.075
 Jane+0.075
assembled+0.070
Neg logit contributions
 wand-0.088
 cape-0.088
 trigger-0.087
cher-0.086
 spoiler-0.084
Token pie
Token ID2508
Feature activation+3.62e-02

Pos logit contributions
 asked+0.125
Kat+0.120
Jamie+0.119
idi+0.116
Jane+0.114
Neg logit contributions
 end-0.114
 floating-0.111
ider-0.110
 wand-0.109
 trigger-0.107
Token.
Token ID13
Feature activation+8.37e-03

Pos logit contributions
 asked+0.051
 Jane+0.050
 tidy+0.048
agg+0.044
 John+0.044
Neg logit contributions
cher-0.055
ider-0.054
robe-0.053
ses-0.053
 trigger-0.051
Token They
Token ID1119
Feature activation+2.74e-02

Pos logit contributions
 asked+0.010
 tidy+0.010
 Jane+0.010
agg+0.009
Jane+0.009
Neg logit contributions
ses-0.015
ider-0.014
robe-0.014
 spoiler-0.014
hered-0.014
Token took
Token ID1718
Feature activation+2.67e-02

Pos logit contributions
 Jane+0.031
Jane+0.028
 tidy+0.028
agg+0.028
Kat+0.028
Neg logit contributions
cher-0.044
pper-0.043
ses-0.040
 shooter-0.040
 spoiler-0.040
Token out
Token ID503
Feature activation+3.73e-02

Pos logit contributions
 asked+0.040
 tidy+0.038
 questioned+0.037
Jane+0.036
urious+0.035
Neg logit contributions
hered-0.036
 trigger-0.033
plets-0.032
 AW-0.031
 controller-0.031
Token Lily
Token ID20037
Feature activation-2.33e-03

Pos logit contributions
 asked+0.029
 Jane+0.027
 admired+0.026
agg+0.026
 tidy+0.026
Neg logit contributions
cher-0.029
ider-0.027
ses-0.027
pper-0.027
 trigger-0.027
Token.
Token ID13
Feature activation+7.19e-03

Pos logit contributions
cher+0.004
ider+0.004
ses+0.004
robe+0.004
 trigger+0.003
Neg logit contributions
 asked-0.003
 Jane-0.003
 tidy-0.003
 John-0.003
agg-0.003
Token Let
Token ID3914
Feature activation-1.02e-02

Pos logit contributions
 asked+0.011
 Jane+0.011
 tidy+0.011
agg+0.010
Jane+0.009
Neg logit contributions
cher-0.010
ider-0.010
ses-0.010
robe-0.010
 spoiler-0.010
Token's
Token ID338
Feature activation+3.39e-02

Pos logit contributions
hered+0.015
cher+0.015
 trigger+0.015
pper+0.015
ses+0.015
Neg logit contributions
 Jane-0.014
 asked-0.013
 tidy-0.013
agg-0.012
Jane-0.012
Token get
Token ID651
Feature activation+5.53e-02

Pos logit contributions
 Jane+0.056
Jane+0.054
agg+0.054
 asked+0.052
Kat+0.051
Neg logit contributions
 trigger-0.039
 spoiler-0.039
pper-0.039
cher-0.038
hered-0.037
Token some
Token ID617
Feature activation+9.36e-02

Pos logit contributions
 Jane+0.084
 tidy+0.080
agg+0.080
 asked+0.079
Jane+0.078
Neg logit contributions
cher-0.076
 trigger-0.070
ider-0.069
hered-0.068
pper-0.067
Token tasty
Token ID25103
Feature activation+9.18e-02

Pos logit contributions
 Jane+0.123
Kat+0.120
 Jessie+0.116
Jess+0.116
 Katie+0.116
Neg logit contributions
ider-0.131
 trigger-0.129
hered-0.123
 controller-0.122
cher-0.121
Token ice
Token ID4771
Feature activation+5.90e-02

Pos logit contributions
Kat+0.112
 Jane+0.107
idi+0.106
Jess+0.105
 tidy+0.105
Neg logit contributions
hered-0.139
 trigger-0.133
ses-0.131
rob-0.131
ider-0.130
Token cream
Token ID8566
Feature activation+3.21e-02

Pos logit contributions
 asked+0.046
 Jane+0.045
 tidy+0.040
 Katie+0.038
agg+0.035
Neg logit contributions
ider-0.127
cher-0.127
robe-0.124
pper-0.122
ses-0.121
Token,"
Token ID553
Feature activation+3.28e-02

Pos logit contributions
 Jane+0.040
 asked+0.040
 tidy+0.038
agg+0.034
Jane+0.033
Neg logit contributions
cher-0.055
ider-0.054
ses-0.054
robe-0.053
 trigger-0.052
Token said
Token ID531
Feature activation+1.14e-02

Pos logit contributions
agg+0.039
 Jane+0.037
 pic+0.036
 tidy+0.035
Kat+0.034
Neg logit contributions
cher-0.053
pper-0.050
ses-0.050
hered-0.049
 spoiler-0.049
Token,
Token ID11
Feature activation+1.41e-02

Pos logit contributions
agg+0.033
 tidy+0.031
 asked+0.031
 pic+0.030
idi+0.029
Neg logit contributions
cher-0.040
robe-0.039
pper-0.039
hered-0.038
ses-0.037
Token "
Token ID366
Feature activation+2.04e-02

Pos logit contributions
 tidy+0.019
 asked+0.019
 Jane+0.018
agg+0.018
 admired+0.017
Neg logit contributions
cher-0.022
ses-0.021
ider-0.020
robe-0.020
hered-0.020
TokenCan
Token ID6090
Feature activation+1.42e-03

Pos logit contributions
 Jane+0.031
 tidy+0.030
 asked+0.030
agg+0.029
 John+0.027
Neg logit contributions
cher-0.027
ider-0.026
hered-0.026
pper-0.026
robe-0.026
Token I
Token ID314
Feature activation+3.71e-02

Pos logit contributions
 asked+0.002
 Jane+0.002
 tidy+0.002
 John+0.002
 clean+0.002
Neg logit contributions
cher-0.002
ider-0.002
ses-0.002
robe-0.002
 trigger-0.002
Token buy
Token ID2822
Feature activation+6.49e-02

Pos logit contributions
 Jane+0.063
 asked+0.061
 tidy+0.059
agg+0.057
Jane+0.056
Neg logit contributions
cher-0.046
ider-0.045
ses-0.043
 spoiler-0.042
hered-0.042
Token some
Token ID617
Feature activation+9.35e-02

Pos logit contributions
 Jane+0.096
 tidy+0.095
 asked+0.095
Jane+0.090
agg+0.090
Neg logit contributions
cher-0.087
pper-0.080
ider-0.078
 spoiler-0.077
hered-0.076
Token ice
Token ID4771
Feature activation+5.93e-02

Pos logit contributions
 Jane+0.123
Kat+0.116
 Jessie+0.116
Jane+0.115
anne+0.114
Neg logit contributions
ider-0.133
cher-0.127
 sax-0.125
 CD-0.124
 spoiler-0.123
Token cream
Token ID8566
Feature activation+3.12e-02

Pos logit contributions
 asked+0.050
 Jane+0.049
 tidy+0.045
 Jessie+0.043
 Katie+0.043
Neg logit contributions
ider-0.124
cher-0.122
robe-0.122
pper-0.117
 loser-0.115
Token?"
Token ID1701
Feature activation+3.34e-02

Pos logit contributions
 asked+0.041
 Jane+0.041
 tidy+0.039
agg+0.036
Jane+0.035
Neg logit contributions
cher-0.051
ider-0.051
robe-0.049
ses-0.048
pper-0.047
Token
Token ID220
Feature activation+8.45e-03

Pos logit contributions
 tidy+0.058
 gre+0.057
 affordable+0.056
 rewarded+0.056
 chees+0.055
Neg logit contributions
hered-0.028
ses-0.028
robe-0.027
ishly-0.025
 Have-0.024
Token\n
Token ID198
Feature activation+1.76e-02

Pos logit contributions
 tidy+0.010
 asked+0.010
 Jane+0.009
 desserts+0.009
 admired+0.009
Neg logit contributions
robe-0.014
cher-0.014
ses-0.013
ider-0.013
hered-0.013
TokenYes
Token ID5297
Feature activation+1.81e-02

Pos logit contributions
 Jane+0.003
 asked+0.003
 tidy+0.003
agg+0.003
 John+0.003
Neg logit contributions
cher-0.003
ider-0.003
ses-0.003
robe-0.003
 spoiler-0.003
Token,
Token ID11
Feature activation+1.24e-02

Pos logit contributions
 asked+0.014
 Jane+0.014
 tidy+0.013
 John+0.011
 clean+0.010
Neg logit contributions
cher-0.040
ses-0.040
ider-0.040
robe-0.039
 trigger-0.038
Token you
Token ID345
Feature activation+2.40e-02

Pos logit contributions
 asked+0.015
 Jane+0.015
 tidy+0.014
agg+0.013
Jane+0.013
Neg logit contributions
cher-0.021
ider-0.021
ses-0.021
 spoiler-0.020
 trigger-0.020
Token can
Token ID460
Feature activation+4.36e-02

Pos logit contributions
 asked+0.035
 Jane+0.035
 tidy+0.034
 John+0.030
 clean+0.030
Neg logit contributions
cher-0.036
ider-0.036
ses-0.036
robe-0.034
 trigger-0.034
Token have
Token ID423
Feature activation+5.08e-02

Pos logit contributions
 Jane+0.065
 asked+0.063
 tidy+0.059
agg+0.058
Jane+0.058
Neg logit contributions
ider-0.062
cher-0.060
 spoiler-0.059
pper-0.058
robe-0.058
Token some
Token ID617
Feature activation+9.31e-02

Pos logit contributions
agg+0.086
 Jane+0.085
 tidy+0.084
Jane+0.082
 asked+0.079
Neg logit contributions
cher-0.047
 spoiler-0.045
pper-0.045
 Have-0.045
hered-0.044
Token ice
Token ID4771
Feature activation+5.97e-02

Pos logit contributions
Kat+0.134
Jess+0.128
 Jane+0.128
 Jessie+0.127
Maria+0.127
Neg logit contributions
ider-0.124
 loser-0.117
 Have-0.112
 end-0.108
 spoiler-0.107
Token cream
Token ID8566
Feature activation+3.16e-02

Pos logit contributions
 asked+0.055
 Jane+0.052
 Jessie+0.049
 Katie+0.049
agg+0.046
Neg logit contributions
ider-0.124
robe-0.119
cher-0.118
 loser-0.117
pper-0.115
Token!"
Token ID2474
Feature activation+2.05e-03

Pos logit contributions
 asked+0.045
 Jane+0.044
 tidy+0.042
agg+0.040
Jane+0.039
Neg logit contributions
cher-0.049
ider-0.049
robe-0.047
ses-0.046
 loser-0.045
Token\n
Token ID198
Feature activation+2.67e-03

Pos logit contributions
 asked+0.002
 Jane+0.002
 tidy+0.002
 John+0.002
 clean+0.002
Neg logit contributions
cher-0.004
ider-0.004
ses-0.004
robe-0.004
 trigger-0.004
Token\n
Token ID198
Feature activation-6.82e-03

Pos logit contributions
 Jane+0.003
 asked+0.003
 tidy+0.002
agg+0.002
 John+0.002
Neg logit contributions
cher-0.005
ses-0.005
ider-0.005
robe-0.005
 spoiler-0.005
Token the
Token ID262
Feature activation+3.90e-02

Pos logit contributions
 asked+0.018
 Jane+0.018
 tidy+0.018
 John+0.015
agg+0.015
Neg logit contributions
cher-0.032
ider-0.032
ses-0.032
robe-0.031
 trigger-0.031
Token tree
Token ID5509
Feature activation+1.11e-02

Pos logit contributions
 Jane+0.063
 asked+0.060
 John+0.059
 Katie+0.059
Jane+0.058
Neg logit contributions
 trigger-0.052
ses-0.049
ider-0.048
 loser-0.048
robe-0.045
Token and
Token ID290
Feature activation+5.33e-03

Pos logit contributions
 asked+0.012
 Jane+0.012
 tidy+0.011
agg+0.011
 Baby+0.011
Neg logit contributions
ider-0.020
cher-0.020
ses-0.019
robe-0.019
hered-0.019
Token picked
Token ID6497
Feature activation+3.11e-02

Pos logit contributions
 tidy+0.006
agg+0.006
 Jane+0.006
Jane+0.006
ere+0.005
Neg logit contributions
cher-0.009
pper-0.008
hered-0.008
ses-0.008
plets-0.008
Token a
Token ID257
Feature activation+5.55e-02

Pos logit contributions
 tidy+0.041
 asked+0.039
 Jane+0.035
Jane+0.035
 clean+0.033
Neg logit contributions
 trigger-0.048
ses-0.047
cher-0.046
robe-0.045
hered-0.045
Token juicy
Token ID35075
Feature activation+9.30e-02

Pos logit contributions
 John+0.073
 Jane+0.073
 asked+0.068
 Katie+0.067
ere+0.065
Neg logit contributions
cher-0.081
 wand-0.080
 loser-0.080
 trigger-0.079
ider-0.077
Token peach
Token ID47565
Feature activation+2.80e-02

Pos logit contributions
 John+0.124
Jamie+0.121
 Jane+0.120
 asked+0.120
Kat+0.120
Neg logit contributions
 trigger-0.132
 loser-0.127
pper-0.127
ses-0.126
 wand-0.124
Token.
Token ID13
Feature activation-3.99e-03

Pos logit contributions
 asked+0.041
 Jane+0.040
 tidy+0.038
 John+0.035
 clean+0.034
Neg logit contributions
cher-0.043
ider-0.042
ses-0.042
robe-0.040
 trigger-0.040
Token But
Token ID887
Feature activation-8.16e-03

Pos logit contributions
ses+0.006
robe+0.005
ider+0.005
 Have+0.005
plets+0.005
Neg logit contributions
 tidy-0.006
 asked-0.005
 desserts-0.005
agg-0.005
by-0.005
Token it
Token ID340
Feature activation+1.61e-02

Pos logit contributions
ider+0.013
ses+0.013
cher+0.013
 loser+0.012
pper+0.012
Neg logit contributions
 tidy-0.010
 asked-0.009
agg-0.009
 Jane-0.009
 Baby-0.009
Token was
Token ID373
Feature activation+1.48e-02

Pos logit contributions
 Jane+0.019
 Katie+0.018
 tidy+0.017
Kat+0.017
 John+0.016
Neg logit contributions
pper-0.028
ses-0.028
cher-0.027
ider-0.027
 trigger-0.027
Token\n
Token ID198
Feature activation-7.78e-03

Pos logit contributions
cher+0.006
robe+0.006
ider+0.006
hered+0.006
ula+0.005
Neg logit contributions
 desserts-0.008
aul-0.007
elia-0.007
 Jane-0.007
 Katie-0.007
Token"
Token ID1
Feature activation+1.88e-02

Pos logit contributions
ider+0.011
cher+0.011
robe+0.010
 loser+0.010
hered+0.010
Neg logit contributions
 Jane-0.012
 asked-0.011
 Katie-0.011
 John-0.011
 desserts-0.011
TokenCan
Token ID6090
Feature activation+1.54e-02

Pos logit contributions
 asked+0.025
 Jane+0.025
 tidy+0.024
 John+0.022
agg+0.021
Neg logit contributions
ider-0.030
cher-0.030
ses-0.030
robe-0.029
 spoiler-0.029
Token we
Token ID356
Feature activation+4.73e-02

Pos logit contributions
 asked+0.028
 Jane+0.027
 tidy+0.027
agg+0.025
 admired+0.025
Neg logit contributions
ider-0.017
cher-0.017
ses-0.017
 spoiler-0.016
robe-0.016
Token have
Token ID423
Feature activation+5.59e-02

Pos logit contributions
agg+0.086
 Jane+0.085
Jane+0.084
Kat+0.084
 Katie+0.081
Neg logit contributions
pper-0.042
 spoiler-0.041
 Have-0.040
cher-0.039
 loser-0.039
Token some
Token ID617
Feature activation+9.25e-02

Pos logit contributions
agg+0.098
 Jane+0.098
 tidy+0.095
Jane+0.095
 asked+0.094
Neg logit contributions
cher-0.049
 spoiler-0.049
 controller-0.046
hered-0.045
pper-0.044
Token?"
Token ID1701
Feature activation+1.01e-02

Pos logit contributions
Kat+0.120
 Jane+0.115
Jess+0.113
Jane+0.113
 Katie+0.110
Neg logit contributions
ider-0.132
 loser-0.118
 spoiler-0.116
 CD-0.115
 controller-0.113
Token Lily
Token ID20037
Feature activation-3.82e-03

Pos logit contributions
aul+0.014
 tidy+0.013
 desserts+0.013
agg+0.013
 bakery+0.012
Neg logit contributions
robe-0.013
hered-0.013
 robber-0.013
ider-0.013
cher-0.013
Token asked
Token ID1965
Feature activation-1.16e-02

Pos logit contributions
ider+0.008
cher+0.008
pper+0.008
hered+0.008
bee+0.008
Neg logit contributions
agg-0.003
 proper-0.002
 pic-0.002
 Jane-0.002
 desserts-0.002
Token.
Token ID13
Feature activation-5.22e-03

Pos logit contributions
cher+0.020
ider+0.020
ses+0.020
 spoiler+0.019
 trigger+0.019
Neg logit contributions
 Jane-0.015
 asked-0.015
 tidy-0.014
 John-0.013
 clean-0.013
Token\n
Token ID198
Feature activation+7.41e-03

Pos logit contributions
ses+0.008
ider+0.008
hered+0.008
robe+0.008
cher+0.007
Neg logit contributions
 tidy-0.007
agg-0.007
by-0.006
 desserts-0.006
 asked-0.006
Token ran
Token ID4966
Feature activation-7.03e-04

Pos logit contributions
 Jane+0.020
 tidy+0.019
 asked+0.018
 John+0.016
 Baby+0.015
Neg logit contributions
cher-0.042
robe-0.041
ider-0.041
ses-0.040
hered-0.040
Token to
Token ID284
Feature activation+1.99e-02

Pos logit contributions
robe+0.001
cher+0.001
ider+0.001
hered+0.001
ses+0.001
Neg logit contributions
 asked-0.001
agg-0.001
 Jane-0.001
aul-0.001
 Baby-0.001
Token grab
Token ID5552
Feature activation+1.96e-02

Pos logit contributions
aul+0.025
agg+0.024
by+0.024
 asked+0.023
 Jane+0.022
Neg logit contributions
cher-0.029
robe-0.027
 grant-0.026
ses-0.026
hered-0.026
Token all
Token ID477
Feature activation+3.91e-02

Pos logit contributions
 asked+0.024
 Jane+0.023
 tidy+0.022
agg+0.020
Jane+0.019
Neg logit contributions
cher-0.034
ses-0.033
robe-0.033
ider-0.033
 trigger-0.033
Token sorts
Token ID10524
Feature activation+4.00e-02

Pos logit contributions
 asked+0.048
 questioned+0.043
agg+0.043
 pic+0.042
idi+0.042
Neg logit contributions
robe-0.056
 controller-0.053
 CD-0.053
hered-0.052
ula-0.051
Token of
Token ID286
Feature activation+9.22e-02

Pos logit contributions
 asked+0.044
 Jane+0.043
 tidy+0.041
 John+0.036
 Baby+0.035
Neg logit contributions
cher-0.074
ider-0.073
ses-0.072
robe-0.071
 trigger-0.071
Token decorations
Token ID39011
Feature activation+1.25e-02

Pos logit contributions
 Jim+0.154
 Todd+0.145
 Dave+0.144
Jim+0.142
 John+0.142
Neg logit contributions
 wand-0.103
 threads-0.085
ps-0.083
 keys-0.080
aces-0.079
Token,
Token ID11
Feature activation+1.42e-02

Pos logit contributions
 asked+0.015
 Jane+0.015
 tidy+0.014
 John+0.013
agg+0.013
Neg logit contributions
cher-0.021
ider-0.021
ses-0.021
robe-0.021
 trigger-0.020
Token including
Token ID1390
Feature activation+9.22e-03

Pos logit contributions
 asked+0.020
 Jane+0.019
 tidy+0.018
agg+0.017
 John+0.017
Neg logit contributions
cher-0.022
ider-0.022
ses-0.021
robe-0.021
 spoiler-0.021
Token balloons
Token ID35485
Feature activation-1.20e-02

Pos logit contributions
 asked+0.012
 Jane+0.012
 tidy+0.011
Jane+0.010
agg+0.010
Neg logit contributions
cher-0.015
ses-0.014
robe-0.014
ider-0.014
 spoiler-0.014
Token,
Token ID11
Feature activation+4.04e-03

Pos logit contributions
cher+0.025
ses+0.024
ider+0.024
 trigger+0.024
 spoiler+0.024
Neg logit contributions
 asked-0.012
 Jane-0.011
 tidy-0.011
 clean-0.009
 John-0.009
Token friends
Token ID2460
Feature activation-2.37e-02

Pos logit contributions
 Jane+0.018
 asked+0.017
Jane+0.016
 Katie+0.016
agg+0.016
Neg logit contributions
ses-0.021
ider-0.021
robe-0.020
cher-0.020
 loser-0.019
Token and
Token ID290
Feature activation-2.94e-02

Pos logit contributions
 spoiler+0.051
ses+0.051
cher+0.050
 trigger+0.049
ider+0.049
Neg logit contributions
 asked-0.023
 Jane-0.021
 tidy-0.021
 to-0.018
 clean-0.018
Token ate
Token ID15063
Feature activation+4.14e-02

Pos logit contributions
cher+0.046
ses+0.046
robe+0.045
ider+0.043
 shooter+0.042
Neg logit contributions
 Jane-0.038
 asked-0.038
 tidy-0.038
agg-0.036
ere-0.034
Token lots
Token ID6041
Feature activation+4.56e-02

Pos logit contributions
 asked+0.056
agg+0.052
 questioned+0.052
urious+0.051
idi+0.051
Neg logit contributions
robe-0.053
ses-0.051
 CD-0.051
 spoiler-0.051
 trigger-0.051
Token of
Token ID286
Feature activation+8.49e-02

Pos logit contributions
 asked+0.067
 Jane+0.062
 John+0.059
 tidy+0.059
 Todd+0.058
Neg logit contributions
ider-0.063
hered-0.061
cher-0.060
robe-0.060
 CD-0.060
Token delicious
Token ID12625
Feature activation+9.21e-02

Pos logit contributions
assembled+0.115
urious+0.114
anne+0.111
Kat+0.109
to+0.108
Neg logit contributions
rob-0.094
 cotton-0.091
arf-0.084
bee-0.081
 cape-0.080
Token food
Token ID2057
Feature activation+1.90e-02

Pos logit contributions
Kat+0.139
Jamie+0.136
idi+0.136
 Jane+0.135
 John+0.135
Neg logit contributions
bee-0.097
 cotton-0.093
rob-0.090
ops-0.086
hered-0.084
Token.
Token ID13
Feature activation-1.29e-02

Pos logit contributions
 asked+0.029
 Jane+0.028
 tidy+0.027
agg+0.025
 John+0.025
Neg logit contributions
cher-0.027
ider-0.027
ses-0.026
robe-0.026
pper-0.025
Token She
Token ID1375
Feature activation+1.80e-02

Pos logit contributions
robe+0.022
ses+0.022
ider+0.021
 trigger+0.020
 spoiler+0.020
Neg logit contributions
 asked-0.016
 tidy-0.015
 Jane-0.015
agg-0.014
Jane-0.014
Token had
Token ID550
Feature activation-1.59e-02

Pos logit contributions
 Jane+0.023
ere+0.020
 tidy+0.020
 John+0.020
agg+0.020
Neg logit contributions
cher-0.027
robe-0.027
ses-0.027
pper-0.026
 shooter-0.026
Token so
Token ID523
Feature activation-1.93e-03

Pos logit contributions
cher+0.021
robe+0.020
ses+0.020
hered+0.019
ider+0.019
Neg logit contributions
 asked-0.022
 Jane-0.022
agg-0.021
 tidy-0.021
ere-0.020
Token!
Token ID0
Feature activation+2.99e-02

Pos logit contributions
 asked+0.035
 Jane+0.035
 tidy+0.033
 John+0.030
 clean+0.029
Neg logit contributions
cher-0.046
ider-0.045
ses-0.045
robe-0.044
 trigger-0.043
Token Can
Token ID1680
Feature activation+3.62e-03

Pos logit contributions
 asked+0.044
 Jane+0.043
 tidy+0.042
agg+0.038
Jane+0.038
Neg logit contributions
cher-0.044
ider-0.044
ses-0.043
 spoiler-0.042
robe-0.042
Token we
Token ID356
Feature activation+3.86e-02

Pos logit contributions
 asked+0.005
 Jane+0.005
 tidy+0.005
 John+0.005
 Baby+0.004
Neg logit contributions
ses-0.006
cher-0.006
robe-0.006
ider-0.006
 trigger-0.005
Token please
Token ID3387
Feature activation+3.84e-02

Pos logit contributions
 Jane+0.065
agg+0.065
Jane+0.063
Kat+0.061
 Katie+0.060
Neg logit contributions
pper-0.044
hered-0.042
 spoiler-0.041
assing-0.041
 trigger-0.041
Token have
Token ID423
Feature activation+5.45e-02

Pos logit contributions
 Jane+0.059
agg+0.057
Jane+0.056
 asked+0.056
 Katie+0.055
Neg logit contributions
cher-0.048
ider-0.047
pper-0.046
 spoiler-0.045
hered-0.045
Token some
Token ID617
Feature activation+9.20e-02

Pos logit contributions
agg+0.090
 Jane+0.088
 tidy+0.086
Jane+0.085
 Maggie+0.083
Neg logit contributions
hered-0.059
 spoiler-0.059
cher-0.058
pper-0.056
 trigger-0.055
Token?"
Token ID1701
Feature activation+7.25e-03

Pos logit contributions
Kat+0.130
 Jane+0.130
Maria+0.129
 Jessie+0.129
Jamie+0.129
Neg logit contributions
ider-0.120
 loser-0.110
pper-0.109
hered-0.109
 trigger-0.107
Token asked
Token ID1965
Feature activation-1.46e-02

Pos logit contributions
agg+0.011
 desserts+0.010
ere+0.010
aul+0.009
 pic+0.009
Neg logit contributions
 trigger-0.009
hered-0.009
robe-0.009
cher-0.009
 CD-0.009
Token Lily
Token ID20037
Feature activation-1.22e-02

Pos logit contributions
cher+0.029
ider+0.029
ses+0.029
robe+0.028
 trigger+0.028
Neg logit contributions
 asked-0.014
 Jane-0.014
 tidy-0.013
 John-0.011
agg-0.011
Token.
Token ID13
Feature activation+3.85e-03

Pos logit contributions
cher+0.017
ider+0.016
robe+0.016
pper+0.016
hered+0.016
Neg logit contributions
 Jane-0.019
 asked-0.018
 tidy-0.017
agg-0.017
Jane-0.017
Token\n
Token ID198
Feature activation+1.49e-02

Pos logit contributions
agg+0.006
 tidy+0.006
 pic+0.006
 desserts+0.006
ere+0.006
Neg logit contributions
robe-0.005
hered-0.004
ses-0.004
 controller-0.004
 spoiler-0.004
Token.
Token ID13
Feature activation+7.19e-03

Pos logit contributions
cher+0.004
ider+0.004
ses+0.004
robe+0.004
 trigger+0.003
Neg logit contributions
 asked-0.003
 Jane-0.003
 tidy-0.003
 John-0.003
agg-0.003
Token Let
Token ID3914
Feature activation-1.02e-02

Pos logit contributions
 asked+0.011
 Jane+0.011
 tidy+0.011
agg+0.010
Jane+0.009
Neg logit contributions
cher-0.010
ider-0.010
ses-0.010
robe-0.010
 spoiler-0.010
Token's
Token ID338
Feature activation+3.39e-02

Pos logit contributions
hered+0.015
cher+0.015
 trigger+0.015
pper+0.015
ses+0.015
Neg logit contributions
 Jane-0.014
 asked-0.013
 tidy-0.013
agg-0.012
Jane-0.012
Token get
Token ID651
Feature activation+5.53e-02

Pos logit contributions
 Jane+0.056
Jane+0.054
agg+0.054
 asked+0.052
Kat+0.051
Neg logit contributions
 trigger-0.039
 spoiler-0.039
pper-0.039
cher-0.038
hered-0.037
Token some
Token ID617
Feature activation+9.36e-02

Pos logit contributions
 Jane+0.084
 tidy+0.080
agg+0.080
 asked+0.079
Jane+0.078
Neg logit contributions
cher-0.076
 trigger-0.070
ider-0.069
hered-0.068
pper-0.067
Token tasty
Token ID25103
Feature activation+9.18e-02

Pos logit contributions
 Jane+0.123
Kat+0.120
 Jessie+0.116
Jess+0.116
 Katie+0.116
Neg logit contributions
ider-0.131
 trigger-0.129
hered-0.123
 controller-0.122
cher-0.121
Token ice
Token ID4771
Feature activation+5.90e-02

Pos logit contributions
Kat+0.112
 Jane+0.107
idi+0.106
Jess+0.105
 tidy+0.105
Neg logit contributions
hered-0.139
 trigger-0.133
ses-0.131
rob-0.131
ider-0.130
Token cream
Token ID8566
Feature activation+3.21e-02

Pos logit contributions
 asked+0.046
 Jane+0.045
 tidy+0.040
 Katie+0.038
agg+0.035
Neg logit contributions
ider-0.127
cher-0.127
robe-0.124
pper-0.122
ses-0.121
Token,"
Token ID553
Feature activation+3.28e-02

Pos logit contributions
 Jane+0.040
 asked+0.040
 tidy+0.038
agg+0.034
Jane+0.033
Neg logit contributions
cher-0.055
ider-0.054
ses-0.054
robe-0.053
 trigger-0.052
Token said
Token ID531
Feature activation+1.14e-02

Pos logit contributions
agg+0.039
 Jane+0.037
 pic+0.036
 tidy+0.035
Kat+0.034
Neg logit contributions
cher-0.053
pper-0.050
ses-0.050
hered-0.049
 spoiler-0.049
Token her
Token ID607
Feature activation+7.41e-04

Pos logit contributions
agg+0.011
 tidy+0.010
 Jane+0.010
 pic+0.010
Jane+0.010
Neg logit contributions
cher-0.021
hered-0.021
 trigger-0.020
 spoiler-0.020
robe-0.020
Token backyard
Token ID24296
Feature activation+1.16e-02

Pos logit contributions
 Jane+0.023
agg+0.023
 asked+0.022
 John+0.021
 Katie+0.021
Neg logit contributions
ses-0.019
 Have-0.019
robe-0.019
Have-0.019
tail-0.019
Token.
Token ID13
Feature activation+1.47e-02

Pos logit contributions
 asked+0.017
 Jane+0.017
 tidy+0.016
 John+0.014
 clean+0.014
Neg logit contributions
cher-0.018
ses-0.018
ider-0.018
 trigger-0.017
robe-0.017
Token She
Token ID1375
Feature activation+3.92e-02

Pos logit contributions
agg+0.022
 tidy+0.021
ere+0.021
by+0.020
Joe+0.020
Neg logit contributions
ses-0.019
 Have-0.018
 CD-0.017
hered-0.017
robe-0.017
Token found
Token ID1043
Feature activation+1.19e-02

Pos logit contributions
 Jane+0.053
 Katie+0.051
 Todd+0.049
 John+0.049
 Maggie+0.048
Neg logit contributions
ede-0.053
ses-0.052
pper-0.051
ider-0.051
robe-0.051
Token a
Token ID257
Feature activation+2.90e-02

Pos logit contributions
 asked+0.019
 Jane+0.019
 tidy+0.019
agg+0.017
Jane+0.017
Neg logit contributions
cher-0.015
ses-0.015
 trigger-0.015
ider-0.015
robe-0.015
Token ripe
Token ID29036
Feature activation+9.12e-02

Pos logit contributions
 Jane+0.037
 asked+0.036
 John+0.034
agg+0.033
Jane+0.032
Neg logit contributions
cher-0.047
ses-0.047
robe-0.046
ider-0.046
 loser-0.046
Token one
Token ID530
Feature activation+3.03e-02

Pos logit contributions
 John+0.121
 Jane+0.120
old+0.119
Kat+0.117
 Katie+0.114
Neg logit contributions
ider-0.127
 trigger-0.125
 loser-0.125
pper-0.125
ses-0.122
Token and
Token ID290
Feature activation+1.32e-02

Pos logit contributions
 asked+0.039
 tidy+0.037
 Jane+0.037
agg+0.035
 Baby+0.034
Neg logit contributions
ider-0.047
pper-0.047
ses-0.047
cher-0.046
robe-0.046
Token carefully
Token ID7773
Feature activation+2.21e-02

Pos logit contributions
agg+0.014
Jamie+0.014
 tidy+0.013
 Todd+0.013
 Jane+0.013
Neg logit contributions
pper-0.019
ider-0.018
cher-0.018
ses-0.018
 trigger-0.018
Token pl
Token ID458
Feature activation+3.46e-02

Pos logit contributions
 Jane+0.024
 tidy+0.023
 John+0.023
 Katie+0.022
 Baby+0.021
Neg logit contributions
ider-0.039
pper-0.037
ses-0.037
hered-0.036
cher-0.035
Tokenucked
Token ID17758
Feature activation+1.72e-02

Pos logit contributions
 tidy+0.059
 asked+0.058
 Jane+0.057
 clean+0.053
 John+0.052
Neg logit contributions
cher-0.047
ider-0.047
pper-0.044
ses-0.043
hered-0.042
Token,
Token ID11
Feature activation+3.76e-03

Pos logit contributions
 asked+0.003
 Jane+0.003
 tidy+0.003
agg+0.002
 John+0.002
Neg logit contributions
cher-0.007
ider-0.007
ses-0.007
robe-0.007
 trigger-0.007
Token let
Token ID1309
Feature activation-6.33e-03

Pos logit contributions
 asked+0.005
 Jane+0.005
agg+0.005
 admired+0.004
 tidy+0.004
Neg logit contributions
cher-0.006
ider-0.006
ses-0.006
robe-0.006
 trigger-0.006
Token's
Token ID338
Feature activation+2.83e-02

Pos logit contributions
hered+0.008
 trigger+0.007
 AW+0.007
pper+0.007
cher+0.007
Neg logit contributions
 Jane-0.010
agg-0.010
 admired-0.009
 asked-0.009
 tidy-0.009
Token go
Token ID467
Feature activation+2.57e-02

Pos logit contributions
 Jane+0.049
agg+0.048
Jane+0.047
 asked+0.046
Kat+0.045
Neg logit contributions
cher-0.031
 spoiler-0.030
 trigger-0.030
 loser-0.029
hered-0.029
Token get
Token ID651
Feature activation+5.28e-02

Pos logit contributions
 asked+0.031
 Jane+0.030
 tidy+0.029
 clean+0.026
 to+0.024
Neg logit contributions
ses-0.049
ider-0.047
cher-0.047
 trigger-0.047
 spoiler-0.046
Token some
Token ID617
Feature activation+9.12e-02

Pos logit contributions
 Jane+0.072
agg+0.072
 tidy+0.069
Jane+0.068
 asked+0.067
Neg logit contributions
cher-0.075
ider-0.071
 trigger-0.069
 loser-0.068
hered-0.068
Token."
Token ID526
Feature activation-1.83e-03

Pos logit contributions
 Jane+0.121
Kat+0.118
 Jessie+0.116
Jess+0.116
 Katie+0.114
Neg logit contributions
ider-0.128
 trigger-0.122
 loser-0.119
 controller-0.118
cher-0.117
Token
Token ID220
Feature activation-8.62e-03

Pos logit contributions
cher+0.003
ses+0.003
ider+0.003
robe+0.003
 spoiler+0.003
Neg logit contributions
 asked-0.002
 Jane-0.002
 tidy-0.002
agg-0.002
 John-0.002
Token\n
Token ID198
Feature activation+2.33e-03

Pos logit contributions
cher+0.017
ses+0.016
ider+0.016
robe+0.016
 trigger+0.016
Neg logit contributions
 asked-0.009
 Jane-0.008
 tidy-0.008
agg-0.007
 clean-0.007
Token\n
Token ID198
Feature activation-7.63e-03

Pos logit contributions
 Jane+0.003
 asked+0.003
 tidy+0.003
agg+0.003
 John+0.002
Neg logit contributions
robe-0.004
cher-0.004
hered-0.004
ider-0.004
 CD-0.004
TokenWhen
Token ID2215
Feature activation-1.03e-02

Pos logit contributions
cher+0.011
ider+0.011
ses+0.011
robe+0.011
 spoiler+0.011
Neg logit contributions
 Jane-0.011
 asked-0.011
 tidy-0.010
 John-0.010
agg-0.010
Token She
Token ID1375
Feature activation+5.22e-02

Pos logit contributions
 asked+0.030
agg+0.029
 tidy+0.029
 Jane+0.028
 admired+0.027
Neg logit contributions
robe-0.030
ses-0.029
 spoiler-0.028
 Have-0.027
hered-0.027
Token would
Token ID561
Feature activation+3.03e-02

Pos logit contributions
 Jane+0.066
 John+0.055
 Mary+0.055
 Katie+0.055
Jane+0.054
Neg logit contributions
robe-0.089
cher-0.087
ider-0.085
ses-0.082
 spoiler-0.081
Token take
Token ID1011
Feature activation+2.37e-02

Pos logit contributions
 asked+0.039
 Jane+0.037
 tidy+0.037
 clean+0.032
 John+0.032
Neg logit contributions
cher-0.053
ses-0.052
ider-0.052
robe-0.050
 trigger-0.050
Token all
Token ID477
Feature activation+4.18e-02

Pos logit contributions
 Jane+0.033
 asked+0.033
Jane+0.031
agg+0.031
urious+0.031
Neg logit contributions
hered-0.034
robe-0.033
 wand-0.030
 trigger-0.030
 spoiler-0.029
Token sorts
Token ID10524
Feature activation+4.37e-02

Pos logit contributions
 asked+0.048
 Jane+0.044
 tidy+0.042
agg+0.042
 questioned+0.041
Neg logit contributions
robe-0.068
hered-0.066
 shooter-0.063
 CD-0.063
cher-0.063
Token of
Token ID286
Feature activation+9.10e-02

Pos logit contributions
 asked+0.047
 Jane+0.046
 tidy+0.043
 John+0.038
 clean+0.037
Neg logit contributions
cher-0.083
ider-0.082
ses-0.081
robe-0.079
 trigger-0.079
Token things
Token ID1243
Feature activation+3.18e-02

Pos logit contributions
 Jane+0.149
 Jim+0.149
 Mary+0.148
 Todd+0.141
oph+0.139
Neg logit contributions
 wand-0.100
aces-0.090
 cape-0.090
robe-0.087
 sax-0.086
Token and
Token ID290
Feature activation+3.39e-03

Pos logit contributions
 asked+0.034
 Jane+0.034
 tidy+0.031
agg+0.029
Jane+0.028
Neg logit contributions
cher-0.059
ider-0.058
robe-0.058
ses-0.057
hered-0.056
Token make
Token ID787
Feature activation+9.79e-03

Pos logit contributions
 Jane+0.005
agg+0.005
Jane+0.005
 asked+0.004
 John+0.004
Neg logit contributions
cher-0.005
robe-0.005
 spoiler-0.005
 trigger-0.005
ses-0.004
Token them
Token ID606
Feature activation+1.31e-02

Pos logit contributions
 asked+0.012
 Jane+0.012
Jane+0.011
idi+0.011
 John+0.011
Neg logit contributions
robe-0.014
cher-0.014
 trigger-0.014
 wand-0.013
 spoiler-0.013
Token spark
Token ID9009
Feature activation-4.06e-03

Pos logit contributions
 tidy+0.015
 asked+0.015
 Jane+0.014
 clean+0.014
 admired+0.012
Neg logit contributions
ses-0.025
cher-0.025
ider-0.025
pper-0.024
 trigger-0.024
Token,
Token ID11
Feature activation+7.92e-03

Pos logit contributions
agg+0.017
 asked+0.016
 pic+0.016
 tidy+0.016
 admired+0.015
Neg logit contributions
cher-0.021
robe-0.020
hered-0.019
ider-0.019
 loser-0.019
Token "
Token ID366
Feature activation+1.48e-02

Pos logit contributions
 Jane+0.012
agg+0.012
 tidy+0.011
by+0.011
 asked+0.011
Neg logit contributions
cher-0.011
ider-0.011
robe-0.010
ses-0.010
hered-0.010
TokenCan
Token ID6090
Feature activation+3.24e-03

Pos logit contributions
 asked+0.021
 Jane+0.021
 tidy+0.020
 John+0.018
agg+0.018
Neg logit contributions
cher-0.023
ider-0.022
ses-0.022
robe-0.022
 trigger-0.022
Token I
Token ID314
Feature activation+3.17e-02

Pos logit contributions
 asked+0.006
 Jane+0.006
 tidy+0.005
agg+0.005
 John+0.005
Neg logit contributions
cher-0.004
ider-0.004
ses-0.004
robe-0.004
 trigger-0.004
Token have
Token ID423
Feature activation+4.54e-02

Pos logit contributions
 Jane+0.049
 asked+0.048
 tidy+0.045
Jane+0.044
 Katie+0.043
Neg logit contributions
cher-0.044
ider-0.043
ses-0.042
robe-0.041
pper-0.041
Token some
Token ID617
Feature activation+9.02e-02

Pos logit contributions
 Jane+0.069
Jane+0.069
agg+0.068
 tidy+0.065
urious+0.064
Neg logit contributions
cher-0.050
 cape-0.046
 controller-0.046
 spoiler-0.045
hered-0.045
Token ice
Token ID4771
Feature activation+6.11e-02

Pos logit contributions
Kat+0.134
 Jessie+0.131
anne+0.130
 Jane+0.130
Jess+0.127
Neg logit contributions
ider-0.122
 loser-0.107
 spoiler-0.101
cher-0.101
 CD-0.098
Token cream
Token ID8566
Feature activation+3.93e-02

Pos logit contributions
 asked+0.053
 Jane+0.051
 Jessie+0.047
 Katie+0.046
 tidy+0.045
Neg logit contributions
ider-0.131
cher-0.127
robe-0.125
pper-0.121
 loser-0.121
Token now
Token ID783
Feature activation+9.30e-03

Pos logit contributions
 Jane+0.051
 asked+0.050
 tidy+0.048
agg+0.047
Jane+0.045
Neg logit contributions
cher-0.065
ider-0.065
robe-0.062
ses-0.061
 loser-0.061
Token?"
Token ID1701
Feature activation+2.20e-02

Pos logit contributions
 Jane+0.014
 asked+0.013
 tidy+0.013
agg+0.012
Jane+0.012
Neg logit contributions
ider-0.013
cher-0.013
robe-0.013
ses-0.013
pper-0.012
Token Her
Token ID2332
Feature activation+1.23e-02

Pos logit contributions
agg+0.036
Jess+0.034
 tidy+0.034
elia+0.033
Jane+0.032
Neg logit contributions
robe-0.026
ses-0.024
 CD-0.024
hered-0.023
 sax-0.021
Token "
Token ID366
Feature activation+2.08e-02

Pos logit contributions
 Jane+0.029
 tidy+0.029
 asked+0.028
agg+0.028
by+0.028
Neg logit contributions
cher-0.028
ider-0.028
ses-0.027
robe-0.026
 trigger-0.026
TokenYou
Token ID1639
Feature activation+1.45e-02

Pos logit contributions
 Jane+0.034
agg+0.032
 tidy+0.031
 asked+0.031
 John+0.031
Neg logit contributions
cher-0.026
ider-0.026
Have-0.024
hered-0.024
 loser-0.024
Token can
Token ID460
Feature activation+2.56e-02

Pos logit contributions
 Jane+0.023
 asked+0.023
 tidy+0.022
Jane+0.020
agg+0.020
Neg logit contributions
cher-0.020
ses-0.019
ider-0.019
 trigger-0.019
robe-0.019
Token choose
Token ID3853
Feature activation+6.19e-02

Pos logit contributions
 Jane+0.039
 asked+0.039
 tidy+0.037
Jane+0.035
 John+0.034
Neg logit contributions
ider-0.036
cher-0.036
ses-0.035
robe-0.034
 trigger-0.034
Token either
Token ID2035
Feature activation+3.48e-02

Pos logit contributions
 Jane+0.083
 asked+0.082
agg+0.080
Jane+0.080
 tidy+0.079
Neg logit contributions
cher-0.090
ider-0.089
pper-0.088
ses-0.084
robe-0.082
Token an
Token ID281
Feature activation+8.93e-02

Pos logit contributions
 asked+0.053
 Jane+0.052
 tidy+0.052
agg+0.049
 admired+0.048
Neg logit contributions
ider-0.047
cher-0.047
pper-0.046
robe-0.045
 loser-0.045
Token apple
Token ID17180
Feature activation+3.15e-02

Pos logit contributions
 Jane+0.123
Jane+0.115
 asked+0.114
 tidy+0.108
 admired+0.108
Neg logit contributions
ses-0.133
ider-0.126
 trigger-0.126
pper-0.126
 shooter-0.121
Token or
Token ID393
Feature activation+1.14e-02

Pos logit contributions
 asked+0.043
 Jane+0.042
 tidy+0.041
agg+0.038
Jane+0.037
Neg logit contributions
cher-0.050
ider-0.049
ses-0.049
robe-0.048
pper-0.047
Token a
Token ID257
Feature activation+5.40e-02

Pos logit contributions
 Jane+0.015
agg+0.015
 asked+0.015
Jane+0.014
 tidy+0.014
Neg logit contributions
cher-0.017
ses-0.017
ider-0.016
 trigger-0.016
pper-0.016
Token banana
Token ID25996
Feature activation+3.03e-02

Pos logit contributions
 Jane+0.075
 asked+0.070
Jane+0.069
agg+0.068
ere+0.067
Neg logit contributions
ses-0.083
 loser-0.082
 trigger-0.081
cher-0.079
 wand-0.078
Token."
Token ID526
Feature activation+2.69e-02

Pos logit contributions
 Jane+0.046
 asked+0.046
 tidy+0.044
agg+0.040
 John+0.040
Neg logit contributions
cher-0.044
ider-0.044
ses-0.043
robe-0.041
 trigger-0.041
Token\n
Token ID198
Feature activation-1.25e-02

Pos logit contributions
 Jane+0.001
 tidy+0.001
 desserts+0.001
 asked+0.001
agg+0.001
Neg logit contributions
cher-0.001
robe-0.001
 controller-0.001
ses-0.001
 spoiler-0.001
Token"
Token ID1
Feature activation-1.58e-03

Pos logit contributions
cher+0.019
ider+0.018
 trigger+0.017
hered+0.017
pper+0.017
Neg logit contributions
 Jane-0.019
 asked-0.017
 tidy-0.017
agg-0.016
 John-0.016
TokenCan
Token ID6090
Feature activation-9.35e-03

Pos logit contributions
ses+0.003
cher+0.003
ider+0.003
robe+0.003
 trigger+0.002
Neg logit contributions
 asked-0.002
 Jane-0.002
 tidy-0.002
Jane-0.002
 John-0.002
Token we
Token ID356
Feature activation+3.50e-02

Pos logit contributions
cher+0.013
ses+0.013
ider+0.013
robe+0.012
 trigger+0.012
Neg logit contributions
 asked-0.015
 Jane-0.015
 tidy-0.014
 John-0.013
 clean-0.013
Token have
Token ID423
Feature activation+5.34e-02

Pos logit contributions
 Jane+0.058
agg+0.058
Jane+0.054
 asked+0.053
 Katie+0.053
Neg logit contributions
pper-0.039
cher-0.039
hered-0.039
 spoiler-0.039
 loser-0.038
Token some
Token ID617
Feature activation+8.92e-02

Pos logit contributions
agg+0.083
 Jane+0.081
Jane+0.077
 tidy+0.077
 Maggie+0.075
Neg logit contributions
 spoiler-0.063
cher-0.060
hered-0.060
 controller-0.058
 Have-0.057
Token ice
Token ID4771
Feature activation+5.46e-02

Pos logit contributions
 Jane+0.115
Kat+0.114
 Jessie+0.113
Jess+0.111
 Katie+0.110
Neg logit contributions
ider-0.130
 spoiler-0.120
 loser-0.119
hered-0.115
 sax-0.115
Token cream
Token ID8566
Feature activation+3.29e-02

Pos logit contributions
 asked+0.048
 Jane+0.045
 tidy+0.042
 Jessie+0.040
 Katie+0.039
Neg logit contributions
ider-0.114
robe-0.111
cher-0.111
ses-0.109
pper-0.107
Token,
Token ID11
Feature activation+7.61e-04

Pos logit contributions
 asked+0.039
 Jane+0.039
 tidy+0.037
agg+0.035
Jane+0.034
Neg logit contributions
cher-0.057
ider-0.057
ses-0.055
robe-0.055
pper-0.054
Token please
Token ID3387
Feature activation+3.01e-02

Pos logit contributions
 asked+0.001
 Jane+0.001
 tidy+0.001
agg+0.001
 admired+0.001
Neg logit contributions
cher-0.001
ider-0.001
ses-0.001
 spoiler-0.001
 trigger-0.001
Token?"
Token ID1701
Feature activation+6.03e-03

Pos logit contributions
 Jane+0.047
agg+0.047
 tidy+0.046
Jane+0.046
 admired+0.045
Neg logit contributions
cher-0.036
ider-0.036
pper-0.034
ses-0.033
 spoiler-0.033
Token,
Token ID11
Feature activation+1.98e-02

Pos logit contributions
 asked+0.019
 Jane+0.017
 tidy+0.016
 admired+0.016
agg+0.016
Neg logit contributions
 trigger-0.033
ses-0.032
ider-0.032
cher-0.031
robe-0.031
Token I
Token ID314
Feature activation+1.33e-02

Pos logit contributions
 asked+0.029
agg+0.028
by+0.027
idi+0.027
Kat+0.026
Neg logit contributions
cher-0.027
 trigger-0.026
ses-0.026
 spoiler-0.025
 robber-0.025
Token'll
Token ID1183
Feature activation+2.23e-02

Pos logit contributions
 Jane+0.017
 asked+0.017
 tidy+0.016
agg+0.015
 John+0.015
Neg logit contributions
cher-0.022
ider-0.022
ses-0.021
robe-0.021
 trigger-0.021
Token get
Token ID651
Feature activation+3.54e-02

Pos logit contributions
 Jane+0.037
 asked+0.036
 tidy+0.035
agg+0.032
 John+0.032
Neg logit contributions
cher-0.029
ider-0.028
ses-0.028
 trigger-0.027
robe-0.027
Token you
Token ID345
Feature activation+3.15e-02

Pos logit contributions
 Jane+0.049
 asked+0.046
 tidy+0.045
agg+0.044
Jane+0.043
Neg logit contributions
cher-0.056
ider-0.053
 trigger-0.052
ses-0.052
hered-0.050
Token some
Token ID617
Feature activation+8.89e-02

Pos logit contributions
 asked+0.043
 Jane+0.042
 tidy+0.041
 clean+0.036
 John+0.036
Neg logit contributions
cher-0.051
ses-0.051
ider-0.050
 trigger-0.049
robe-0.049
Token."
Token ID526
Feature activation+1.75e-02

Pos logit contributions
 Jane+0.133
Jane+0.124
Kat+0.122
 Jessie+0.121
 Katie+0.119
Neg logit contributions
ider-0.121
cher-0.114
 trigger-0.112
 loser-0.112
 CD-0.105
Token
Token ID220
Feature activation-4.90e-03

Pos logit contributions
agg+0.024
 tidy+0.023
idi+0.021
elia+0.021
by+0.021
Neg logit contributions
ses-0.024
hered-0.024
robe-0.024
ishly-0.023
 spoiler-0.023
Token\n
Token ID198
Feature activation+1.86e-02

Pos logit contributions
cher+0.009
robe+0.009
ider+0.008
ses+0.008
hered+0.008
Neg logit contributions
 tidy-0.005
 asked-0.005
 Jane-0.005
agg-0.005
 Katie-0.005
Token\n
Token ID198
Feature activation+1.22e-02

Pos logit contributions
 tidy+0.025
 asked+0.024
 Jane+0.024
agg+0.023
 desserts+0.023
Neg logit contributions
cher-0.025
robe-0.025
hered-0.025
 controller-0.024
ider-0.023
TokenBut
Token ID1537
Feature activation-3.51e-03

Pos logit contributions
 Jane+0.017
 asked+0.017
 Katie+0.015
agg+0.015
 tidy+0.015
Neg logit contributions
cher-0.018
ider-0.018
 spoiler-0.018
pper-0.017
 trigger-0.017
Token little
Token ID1310
Feature activation-1.51e-02

Pos logit contributions
cher+0.018
ider+0.018
ses+0.018
robe+0.018
 trigger+0.018
Neg logit contributions
 asked-0.010
 Jane-0.010
 tidy-0.010
 John-0.008
 clean-0.008
Token girl
Token ID2576
Feature activation-3.34e-02

Pos logit contributions
cher+0.028
ider+0.028
ses+0.028
robe+0.027
pper+0.027
Neg logit contributions
 asked-0.017
 Jane-0.017
 tidy-0.016
 clean-0.014
 John-0.014
Token playing
Token ID2712
Feature activation-3.13e-02

Pos logit contributions
ses+0.059
cher+0.058
 trigger+0.057
ider+0.057
robe+0.056
Neg logit contributions
 asked-0.045
 Jane-0.043
 tidy-0.040
 clean-0.036
 John-0.036
Token a
Token ID257
Feature activation-1.01e-02

Pos logit contributions
ses+0.077
plets+0.077
pper+0.075
 trigger+0.075
ider+0.073
Neg logit contributions
 Jane-0.024
 asked-0.022
 tidy-0.022
 clean-0.019
 to-0.019
Token toy
Token ID13373
Feature activation-6.05e-02

Pos logit contributions
cher+0.018
ses+0.018
pper+0.017
ider+0.017
 spoiler+0.016
Neg logit contributions
 tidy-0.013
 Jane-0.012
 asked-0.012
 clean-0.012
Jane-0.011
Token sax
Token ID46909
Feature activation-8.75e-02

Pos logit contributions
plets+0.125
 spoiler+0.121
 AW+0.120
ses+0.120
pper+0.118
Neg logit contributions
 to-0.063
 asked-0.061
 by-0.054
 Jane-0.049
 tidy-0.048
Tokenophone
Token ID38656
Feature activation-7.80e-02

Pos logit contributions
ses+0.120
 sy+0.116
 trigger+0.104
 affect+0.101
 drown+0.099
Neg logit contributions
 Jane-0.143
 asked-0.140
 admired-0.138
 to-0.138
to-0.128
Token.
Token ID13
Feature activation-1.23e-02

Pos logit contributions
ses+0.168
 spoiler+0.166
 AW+0.161
cher+0.161
 trigger+0.160
Neg logit contributions
 to-0.073
 asked-0.071
 Jane-0.067
 John-0.058
 by-0.056
Token She
Token ID1375
Feature activation+6.07e-03

Pos logit contributions
robe+0.015
ses+0.014
hered+0.014
 robber+0.014
 Have+0.013
Neg logit contributions
agg-0.020
 tidy-0.019
by-0.018
 asked-0.017
 desserts-0.017
Token was
Token ID373
Feature activation-2.61e-02

Pos logit contributions
 Jane+0.006
 tidy+0.006
agg+0.005
 desserts+0.005
 Baby+0.005
Neg logit contributions
cher-0.011
hered-0.011
pper-0.011
ider-0.010
 shooter-0.010
Token playing
Token ID2712
Feature activation-3.09e-02

Pos logit contributions
cher+0.036
ses+0.036
ider+0.035
hered+0.034
robe+0.034
Neg logit contributions
 Jane-0.038
 asked-0.037
 tidy-0.036
agg-0.035
by-0.034
Token were
Token ID547
Feature activation-3.67e-02

Pos logit contributions
cher+0.002
ider+0.002
ses+0.002
robe+0.002
 trigger+0.002
Neg logit contributions
 asked-0.001
 Jane-0.001
 tidy-0.001
 John-0.001
 clean-0.001
Token two
Token ID734
Feature activation-6.86e-03

Pos logit contributions
cher+0.055
ider+0.054
ses+0.054
robe+0.053
 trigger+0.052
Neg logit contributions
 asked-0.052
 Jane-0.050
 tidy-0.049
agg-0.044
 John-0.044
Token musicians
Token ID17245
Feature activation-5.67e-02

Pos logit contributions
cher+0.013
ider+0.013
ses+0.013
robe+0.012
 trigger+0.012
Neg logit contributions
 asked-0.008
 Jane-0.008
 tidy-0.007
 John-0.006
 clean-0.006
Token playing
Token ID2712
Feature activation-4.08e-02

Pos logit contributions
 trigger+0.117
ses+0.117
 spoiler+0.112
ider+0.111
robe+0.110
Neg logit contributions
 asked-0.066
 Jane-0.057
 to-0.050
 John-0.049
 admired-0.048
Token their
Token ID511
Feature activation-3.08e-02

Pos logit contributions
ses+0.084
ider+0.084
 trigger+0.082
pper+0.082
plets+0.081
Neg logit contributions
 asked-0.042
 Jane-0.040
 tidy-0.036
 John-0.033
 to-0.033
Token guitars
Token ID32497
Feature activation-8.64e-02

Pos logit contributions
cher+0.051
ider+0.049
plets+0.048
pper+0.048
ses+0.047
Neg logit contributions
 asked-0.042
 tidy-0.042
 Jane-0.041
 clean-0.038
 John-0.037
Token.
Token ID13
Feature activation-2.91e-02

Pos logit contributions
ses+0.187
ider+0.186
 trigger+0.185
cher+0.185
 spoiler+0.182
Neg logit contributions
 asked-0.084
 to-0.077
 Jane-0.066
 tidy-0.066
 clean-0.062
Token They
Token ID1119
Feature activation-3.62e-02

Pos logit contributions
cher+0.059
ider+0.058
ses+0.057
 trigger+0.056
robe+0.056
Neg logit contributions
 asked-0.031
 Jane-0.031
 tidy-0.026
 John-0.025
 Baby-0.024
Token were
Token ID547
Feature activation-3.56e-02

Pos logit contributions
ider+0.080
ses+0.078
cher+0.076
 trigger+0.075
robe+0.075
Neg logit contributions
 asked-0.040
 tidy-0.032
 Jane-0.031
 admired-0.029
 clean-0.027
Token singing
Token ID13777
Feature activation-4.06e-02

Pos logit contributions
cher+0.055
ses+0.054
ider+0.053
 spoiler+0.051
robe+0.051
Neg logit contributions
 Jane-0.050
 asked-0.049
 tidy-0.047
agg-0.043
Jane-0.043
Token and
Token ID290
Feature activation-3.72e-02

Pos logit contributions
cher+0.083
ider+0.083
ses+0.083
 trigger+0.080
pper+0.079
Neg logit contributions
 asked-0.040
 Jane-0.039
 tidy-0.037
 John-0.032
 clean-0.032
Token,
Token ID11
Feature activation-2.11e-02

Pos logit contributions
plets+0.138
cher+0.138
ses+0.136
 trigger+0.136
ider+0.135
Neg logit contributions
 asked-0.075
 to-0.066
 Jane-0.059
 tidy-0.059
 clean-0.055
Token a
Token ID257
Feature activation-3.50e-02

Pos logit contributions
ider+0.045
 trigger+0.043
cher+0.043
 loser+0.041
robe+0.041
Neg logit contributions
 asked-0.024
 Jane-0.023
 John-0.021
 Mary-0.019
 tidy-0.019
Token woman
Token ID2415
Feature activation-2.64e-02

Pos logit contributions
ider+0.054
pper+0.053
cher+0.052
ses+0.051
assing+0.049
Neg logit contributions
 asked-0.052
 tidy-0.051
 Jane-0.048
 clean-0.047
 to-0.044
Token with
Token ID351
Feature activation-6.18e-02

Pos logit contributions
cher+0.051
ses+0.050
 trigger+0.050
ider+0.050
robe+0.050
Neg logit contributions
 asked-0.030
 Jane-0.028
 tidy-0.026
 John-0.023
 clean-0.023
Token a
Token ID257
Feature activation-3.46e-02

Pos logit contributions
ider+0.108
cher+0.106
ses+0.102
robe+0.101
pper+0.100
Neg logit contributions
 asked-0.084
 Jane-0.080
 tidy-0.076
 John-0.071
 clean-0.068
Token violin
Token ID38283
Feature activation-8.57e-02

Pos logit contributions
ider+0.060
cher+0.059
pper+0.059
ses+0.058
robe+0.056
Neg logit contributions
 asked-0.046
 tidy-0.043
 Jane-0.041
 clean-0.039
 to-0.036
Token,
Token ID11
Feature activation-2.22e-02

Pos logit contributions
ses+0.179
cher+0.175
pper+0.173
assing+0.173
 loser+0.172
Neg logit contributions
 asked-0.086
 to-0.081
 tidy-0.067
 clean-0.062
 Jane-0.060
Token a
Token ID257
Feature activation-4.82e-02

Pos logit contributions
ider+0.048
 trigger+0.047
cher+0.046
 loser+0.044
 AW+0.044
Neg logit contributions
 asked-0.025
 Jane-0.021
 John-0.020
 to-0.019
 Mary-0.018
Token boy
Token ID2933
Feature activation-3.97e-02

Pos logit contributions
ider+0.077
pper+0.077
cher+0.074
ses+0.071
assing+0.070
Neg logit contributions
 asked-0.072
 tidy-0.070
 clean-0.065
 Jane-0.063
 to-0.061
Token with
Token ID351
Feature activation-6.24e-02

Pos logit contributions
 trigger+0.071
cher+0.070
ses+0.070
ider+0.069
robe+0.068
Neg logit contributions
 asked-0.051
 Jane-0.048
 tidy-0.044
 John-0.041
 Mary-0.040
Token a
Token ID257
Feature activation-4.35e-02

Pos logit contributions
ider+0.111
cher+0.108
ses+0.105
robe+0.103
pper+0.103
Neg logit contributions
 asked-0.084
 Jane-0.079
 tidy-0.075
 John-0.070
 clean-0.068
Token toys
Token ID14958
Feature activation-3.60e-02

Pos logit contributions
ider+0.043
pper+0.041
ses+0.036
plets+0.036
oor+0.035
Neg logit contributions
 tidy-0.034
 asked-0.034
 clean-0.033
 to-0.030
 different-0.028
Token.
Token ID13
Feature activation-9.20e-03

Pos logit contributions
ses+0.091
ider+0.087
plets+0.087
 spoiler+0.087
cel+0.084
Neg logit contributions
 to-0.027
 tidy-0.025
 asked-0.024
 clean-0.024
 Jane-0.019
Token She
Token ID1375
Feature activation-1.13e-02

Pos logit contributions
ses+0.014
ider+0.014
robe+0.014
 spoiler+0.014
 trigger+0.013
Neg logit contributions
 asked-0.012
 Jane-0.012
agg-0.012
 tidy-0.012
by-0.011
Token put
Token ID1234
Feature activation-2.29e-02

Pos logit contributions
cher+0.022
ses+0.022
ider+0.022
robe+0.021
 spoiler+0.021
Neg logit contributions
 Jane-0.011
 asked-0.011
 tidy-0.010
 John-0.009
Jane-0.009
Token the
Token ID262
Feature activation-2.22e-02

Pos logit contributions
ider+0.048
ses+0.047
cher+0.047
robe+0.045
 spoiler+0.045
Neg logit contributions
 asked-0.022
 Jane-0.021
 tidy-0.020
 John-0.017
 clean-0.017
Token drum
Token ID13026
Feature activation-8.35e-02

Pos logit contributions
ider+0.042
pper+0.040
oor+0.038
plets+0.038
ses+0.036
Neg logit contributions
 tidy-0.033
 clean-0.031
 asked-0.030
 room-0.027
 different-0.027
Token around
Token ID1088
Feature activation-2.03e-02

Pos logit contributions
ses+0.178
plets+0.165
pper+0.164
 trigger+0.163
assing+0.161
Neg logit contributions
 tidy-0.090
 to-0.086
 asked-0.084
 clean-0.082
 organized-0.071
Token her
Token ID607
Feature activation-1.57e-02

Pos logit contributions
cher+0.046
ider+0.046
ses+0.045
 spoiler+0.044
 trigger+0.044
Neg logit contributions
 Jane-0.016
 asked-0.016
 tidy-0.014
 John-0.012
 clean-0.011
Token neck
Token ID7393
Feature activation-3.04e-02

Pos logit contributions
cher+0.027
ider+0.026
 trigger+0.026
plets+0.025
 AW+0.025
Neg logit contributions
 asked-0.021
 tidy-0.021
 Jane-0.019
 clean-0.019
 admired-0.017
Token and
Token ID290
Feature activation-4.08e-02

Pos logit contributions
cher+0.056
ider+0.056
ses+0.055
robe+0.054
 trigger+0.054
Neg logit contributions
 asked-0.034
 Jane-0.033
 tidy-0.031
 John-0.028
agg-0.027
Token the
Token ID262
Feature activation-3.82e-02

Pos logit contributions
cher+0.072
ses+0.071
ider+0.071
robe+0.069
 spoiler+0.069
Neg logit contributions
 Jane-0.048
 asked-0.048
 tidy-0.046
 John-0.040
agg-0.040
Token his
Token ID465
Feature activation-4.26e-02

Pos logit contributions
cher+0.041
ses+0.040
ider+0.040
robe+0.039
 spoiler+0.039
Neg logit contributions
 asked-0.027
 Jane-0.027
 tidy-0.026
agg-0.023
Jane-0.023
Token dream
Token ID4320
Feature activation-3.58e-02

Pos logit contributions
cher+0.089
plets+0.086
ider+0.086
pper+0.085
 trigger+0.082
Neg logit contributions
 tidy-0.044
 asked-0.043
 Jane-0.039
 clean-0.038
 Dad-0.034
Token of
Token ID286
Feature activation-3.46e-02

Pos logit contributions
ses+0.074
ider+0.073
cher+0.072
 spoiler+0.072
 trigger+0.071
Neg logit contributions
 asked-0.034
 Jane-0.032
 tidy-0.031
 clean-0.026
 John-0.026
Token playing
Token ID2712
Feature activation-3.93e-02

Pos logit contributions
ider+0.063
cher+0.059
 trigger+0.059
pper+0.059
ses+0.059
Neg logit contributions
 asked-0.047
 Jane-0.044
 tidy-0.044
 John-0.040
 clean-0.040
Token the
Token ID262
Feature activation-4.15e-02

Pos logit contributions
ses+0.089
ider+0.085
pper+0.084
plets+0.082
 trigger+0.081
Neg logit contributions
 tidy-0.039
 asked-0.038
 Jane-0.036
 clean-0.034
 to-0.032
Token violin
Token ID38283
Feature activation-8.32e-02

Pos logit contributions
ses+0.065
ider+0.064
pper+0.061
cher+0.058
 spoiler+0.056
Neg logit contributions
 tidy-0.076
 clean-0.069
 asked-0.065
 organized-0.064
 me-0.062
Token had
Token ID550
Feature activation-5.82e-02

Pos logit contributions
ses+0.181
 trigger+0.168
 spoiler+0.164
pper+0.163
ider+0.163
Neg logit contributions
 asked-0.087
 Jane-0.082
 tidy-0.081
 to-0.081
 often-0.072
Token come
Token ID1282
Feature activation-3.27e-02

Pos logit contributions
ider+0.112
ses+0.112
cher+0.110
 trigger+0.109
 spoiler+0.106
Neg logit contributions
 asked-0.068
 Jane-0.064
 tidy-0.061
 John-0.054
 clean-0.054
Token true
Token ID2081
Feature activation-7.27e-03

Pos logit contributions
ses+0.066
ider+0.064
cher+0.061
plets+0.060
 trigger+0.059
Neg logit contributions
 asked-0.037
 tidy-0.036
 Jane-0.035
 to-0.033
 clean-0.032
Token.
Token ID13
Feature activation-1.81e-02

Pos logit contributions
cher+0.011
ses+0.011
ider+0.011
 spoiler+0.011
robe+0.011
Neg logit contributions
 asked-0.010
 Jane-0.010
 tidy-0.010
 John-0.009
 clean-0.009
Token<|endoftext|>
Token ID50256
Feature activation-2.19e-02

Pos logit contributions
cher+0.037
ider+0.036
ses+0.036
robe+0.035
 trigger+0.035
Neg logit contributions
 asked-0.017
 Jane-0.017
 tidy-0.016
 John-0.014
 clean-0.013
Token time
Token ID640
Feature activation-2.40e-02

Pos logit contributions
ider+0.032
ses+0.031
cher+0.031
pper+0.030
 spoiler+0.029
Neg logit contributions
 tidy-0.014
 asked-0.014
 Jane-0.012
 clean-0.011
 John-0.010
Token and
Token ID290
Feature activation-2.60e-02

Pos logit contributions
cher+0.043
ider+0.042
ses+0.042
robe+0.041
hered+0.040
Neg logit contributions
 asked-0.028
 Jane-0.028
 tidy-0.026
agg-0.023
 John-0.023
Token became
Token ID2627
Feature activation-2.39e-02

Pos logit contributions
cher+0.044
robe+0.042
hered+0.042
ses+0.042
 shooter+0.041
Neg logit contributions
 Jane-0.031
agg-0.030
 asked-0.029
 tidy-0.029
 Baby-0.028
Token a
Token ID257
Feature activation-1.90e-02

Pos logit contributions
cher+0.045
ider+0.044
ses+0.044
robe+0.043
 spoiler+0.043
Neg logit contributions
 asked-0.025
 Jane-0.025
 tidy-0.023
agg-0.021
 John-0.021
Token great
Token ID1049
Feature activation-3.49e-02

Pos logit contributions
ses+0.038
cher+0.038
ider+0.038
 trigger+0.037
pper+0.036
Neg logit contributions
 tidy-0.020
 asked-0.020
 Jane-0.019
 clean-0.017
 admired-0.016
Token violin
Token ID38283
Feature activation-8.25e-02

Pos logit contributions
cher+0.065
ider+0.064
ses+0.063
pper+0.061
 trigger+0.060
Neg logit contributions
 asked-0.045
 tidy-0.045
 Jane-0.043
 clean-0.040
 John-0.037
Tokenist
Token ID396
Feature activation-2.05e-02

Pos logit contributions
ses+0.189
 spoiler+0.176
 trigger+0.174
assing+0.167
pper+0.167
Neg logit contributions
 to-0.081
 asked-0.081
 tidy-0.078
 Jane-0.072
 clean-0.067
Token.
Token ID13
Feature activation-5.98e-03

Pos logit contributions
ses+0.040
cher+0.039
ider+0.039
 spoiler+0.037
robe+0.037
Neg logit contributions
 asked-0.023
 Jane-0.023
 tidy-0.021
 John-0.019
 admired-0.019
Token<|endoftext|>
Token ID50256
Feature activation-1.89e-02

Pos logit contributions
cher+0.012
ider+0.012
ses+0.012
robe+0.011
 spoiler+0.011
Neg logit contributions
 asked-0.006
 Jane-0.006
 tidy-0.005
 John-0.005
 clean-0.005
TokenOnce
Token ID7454
Feature activation+6.35e-03

Pos logit contributions
cher+0.036
ider+0.035
ses+0.035
robe+0.035
 trigger+0.034
Neg logit contributions
 asked-0.020
 Jane-0.019
 tidy-0.019
agg-0.016
 John-0.016
Token upon
Token ID2402
Feature activation-2.39e-02

Pos logit contributions
 asked+0.011
 tidy+0.010
 Jane+0.010
agg+0.009
 desserts+0.009
Neg logit contributions
cher-0.007
robe-0.007
ses-0.007
ider-0.007
 trigger-0.007
Token when
Token ID618
Feature activation-1.12e-02

Pos logit contributions
cher+0.035
ses+0.034
ider+0.034
 spoiler+0.033
 trigger+0.033
Neg logit contributions
 asked-0.023
 Jane-0.022
 tidy-0.021
 John-0.018
 clean-0.018
Token she
Token ID673
Feature activation-1.44e-02

Pos logit contributions
cher+0.021
ses+0.020
ider+0.020
 trigger+0.020
robe+0.020
Neg logit contributions
 asked-0.013
 Jane-0.013
 tidy-0.012
 John-0.011
 Katie-0.011
Token saw
Token ID2497
Feature activation-7.99e-04

Pos logit contributions
cher+0.028
ider+0.028
ses+0.027
hered+0.027
robe+0.027
Neg logit contributions
 Jane-0.014
 tidy-0.013
 asked-0.012
agg-0.012
Jane-0.011
Token Gina
Token ID43002
Feature activation-4.29e-02

Pos logit contributions
cher+0.001
ses+0.001
robe+0.001
 trigger+0.001
 spoiler+0.001
Neg logit contributions
 asked-0.001
 Jane-0.001
 tidy-0.001
agg-0.001
Jane-0.001
Token's
Token ID338
Feature activation-5.46e-02

Pos logit contributions
ses+0.095
ider+0.093
cher+0.090
 loser+0.090
plets+0.089
Neg logit contributions
 asked-0.044
 tidy-0.041
 Jane-0.038
 clean-0.037
 to-0.034
Token violin
Token ID38283
Feature activation-8.18e-02

Pos logit contributions
cher+0.097
pper+0.094
ider+0.092
plets+0.089
coon+0.081
Neg logit contributions
 tidy-0.090
 asked-0.085
 clean-0.084
 to-0.082
 Mom-0.076
Token.
Token ID13
Feature activation-1.19e-02

Pos logit contributions
ses+0.174
 trigger+0.167
cher+0.167
plets+0.164
pper+0.163
Neg logit contributions
 asked-0.088
 tidy-0.082
 to-0.074
 clean-0.073
 Jane-0.071
Token She
Token ID1375
Feature activation+1.59e-02

Pos logit contributions
robe+0.018
ses+0.018
 spoiler+0.018
ider+0.018
 trigger+0.017
Neg logit contributions
 tidy-0.016
 Jane-0.015
 asked-0.015
agg-0.014
Jane-0.014
Token wanted
Token ID2227
Feature activation+2.40e-03

Pos logit contributions
 Jane+0.015
agg+0.014
 tidy+0.013
 desserts+0.013
 pic+0.013
Neg logit contributions
cher-0.030
ses-0.028
ider-0.028
pper-0.028
robe-0.028
Token one
Token ID530
Feature activation+8.71e-03

Pos logit contributions
agg+0.002
 tidy+0.002
 Jane+0.002
 asked+0.001
Jane+0.001
Neg logit contributions
cher-0.005
ses-0.005
hered-0.005
 sax-0.005
robe-0.005
Token too
Token ID1165
Feature activation+6.63e-03

Pos logit contributions
 Jane+0.011
 asked+0.011
 tidy+0.011
 clean+0.010
 John+0.009
Neg logit contributions
cher-0.015
ider-0.015
ses-0.015
 trigger-0.015
 spoiler-0.015
Token sunshine
Token ID34488
Feature activation-1.57e-02

Pos logit contributions
cher+0.057
ses+0.056
ider+0.056
robe+0.055
 spoiler+0.055
Neg logit contributions
 asked-0.036
 Jane-0.035
 tidy-0.033
 John-0.029
agg-0.029
Token.
Token ID13
Feature activation-9.42e-03

Pos logit contributions
cher+0.025
ses+0.024
ider+0.024
 spoiler+0.024
 trigger+0.024
Neg logit contributions
 asked-0.022
 Jane-0.022
 tidy-0.021
 John-0.019
 clean-0.019
Token One
Token ID1881
Feature activation+1.72e-02

Pos logit contributions
ses+0.016
cher+0.016
ider+0.016
robe+0.015
 spoiler+0.015
Neg logit contributions
 asked-0.012
 Jane-0.012
 tidy-0.011
agg-0.010
 clean-0.010
Token day
Token ID1110
Feature activation-1.06e-02

Pos logit contributions
 asked+0.019
 Jane+0.018
 tidy+0.017
agg+0.016
 John+0.015
Neg logit contributions
ses-0.031
cher-0.031
robe-0.031
ider-0.030
 shooter-0.030
Token,
Token ID11
Feature activation-1.64e-02

Pos logit contributions
hered+0.019
cher+0.018
 trigger+0.018
 loser+0.018
 shooter+0.017
Neg logit contributions
 tidy-0.012
 asked-0.011
 Jane-0.011
agg-0.011
Jane-0.011
Token the
Token ID262
Feature activation-8.12e-02

Pos logit contributions
cher+0.034
ider+0.034
ses+0.034
robe+0.033
 spoiler+0.033
Neg logit contributions
 asked-0.015
 Jane-0.015
 tidy-0.014
 John-0.012
 clean-0.011
Token sky
Token ID6766
Feature activation+1.36e-03

Pos logit contributions
cher+0.171
ses+0.169
ider+0.168
pper+0.168
plets+0.165
Neg logit contributions
 asked-0.100
 tidy-0.098
 clean-0.087
 organized-0.085
 me-0.074
Token went
Token ID1816
Feature activation-2.14e-02

Pos logit contributions
 Jane+0.002
 desserts+0.002
Kat+0.002
 pic+0.002
 Katie+0.002
Neg logit contributions
hered-0.002
 trigger-0.002
plets-0.002
 shooter-0.002
 controller-0.002
Token pale
Token ID14005
Feature activation-2.71e-04

Pos logit contributions
cher+0.040
ses+0.040
ider+0.040
 spoiler+0.038
robe+0.038
Neg logit contributions
 asked-0.024
 Jane-0.023
 tidy-0.023
 clean-0.020
 John-0.019
Token and
Token ID290
Feature activation-2.48e-02

Pos logit contributions
robe+0.000
ider+0.000
plets+0.000
ses+0.000
hered+0.000
Neg logit contributions
 asked-0.000
 tidy-0.000
 Jane-0.000
agg-0.000
Jane-0.000
Token it
Token ID340
Feature activation-2.50e-02

Pos logit contributions
robe+0.043
 spoiler+0.043
ses+0.043
cher+0.043
 trigger+0.042
Neg logit contributions
 asked-0.025
agg-0.024
 tidy-0.024
 Jane-0.023
old-0.022
Token.
Token ID13
Feature activation-5.53e-04

Pos logit contributions
ses+0.099
ider+0.097
cher+0.096
 spoiler+0.096
 trigger+0.095
Neg logit contributions
 asked-0.055
 Jane-0.050
 tidy-0.048
 to-0.046
 clean-0.043
Token They
Token ID1119
Feature activation-4.28e-02

Pos logit contributions
ses+0.001
 spoiler+0.001
ider+0.001
robe+0.001
 trigger+0.001
Neg logit contributions
 asked-0.001
 tidy-0.001
 Jane-0.001
agg-0.001
Jane-0.001
Token b
Token ID275
Feature activation+1.02e-02

Pos logit contributions
ider+0.095
ses+0.093
cher+0.092
 trigger+0.090
robe+0.089
Neg logit contributions
 asked-0.042
 tidy-0.034
 Jane-0.034
 admired-0.030
 clean-0.029
Tokenanged
Token ID5102
Feature activation-6.38e-03

Pos logit contributions
 tidy+0.018
 asked+0.018
 Jane+0.018
 clean+0.016
idi+0.016
Neg logit contributions
hered-0.013
ses-0.012
cher-0.012
pper-0.012
ider-0.012
Token their
Token ID511
Feature activation-3.44e-02

Pos logit contributions
cher+0.010
ider+0.010
ses+0.010
robe+0.010
 trigger+0.010
Neg logit contributions
 asked-0.009
 Jane-0.009
 tidy-0.008
 John-0.007
agg-0.007
Token drums
Token ID24730
Feature activation-8.10e-02

Pos logit contributions
cher+0.060
ider+0.058
pper+0.057
plets+0.055
ses+0.054
Neg logit contributions
 asked-0.045
 tidy-0.044
 Jane-0.040
 clean-0.040
 to-0.039
Token and
Token ID290
Feature activation-3.89e-02

Pos logit contributions
ses+0.195
cher+0.191
ider+0.189
plets+0.184
 loser+0.183
Neg logit contributions
 asked-0.058
 to-0.055
 tidy-0.053
 clean-0.050
 Jane-0.038
Token lifted
Token ID13663
Feature activation-1.39e-03

Pos logit contributions
cher+0.068
ider+0.068
ses+0.067
robe+0.065
 trigger+0.065
Neg logit contributions
 asked-0.049
 Jane-0.047
 tidy-0.045
 John-0.040
 clean-0.040
Token their
Token ID511
Feature activation-3.15e-02

Pos logit contributions
 trigger+0.002
 spoiler+0.002
ses+0.002
 controller+0.002
hered+0.002
Neg logit contributions
 asked-0.002
agg-0.002
 tidy-0.002
 Jane-0.001
Jane-0.001
Token feet
Token ID3625
Feature activation-4.44e-02

Pos logit contributions
ider+0.059
cher+0.059
pper+0.057
plets+0.055
hered+0.054
Neg logit contributions
 tidy-0.038
 asked-0.036
 clean-0.034
 to-0.032
 homes-0.031
Token.
Token ID13
Feature activation+7.51e-03

Pos logit contributions
ses+0.096
cher+0.089
ider+0.088
plets+0.087
robe+0.086
Neg logit contributions
 tidy-0.046
 asked-0.045
 clean-0.043
 to-0.043
 Jane-0.040
Token her
Token ID607
Feature activation-1.73e-02

Pos logit contributions
 asked+0.006
 Jane+0.006
 tidy+0.006
agg+0.005
 John+0.005
Neg logit contributions
cher-0.006
ses-0.006
robe-0.006
ider-0.006
 trigger-0.006
Token friend
Token ID1545
Feature activation-2.72e-02

Pos logit contributions
cher+0.036
ider+0.035
plets+0.034
 spoiler+0.034
ses+0.033
Neg logit contributions
 asked-0.017
 Jane-0.017
 tidy-0.017
 clean-0.014
 John-0.014
Token playing
Token ID2712
Feature activation-2.18e-02

Pos logit contributions
hered+0.048
cher+0.045
pper+0.044
robe+0.044
ses+0.043
Neg logit contributions
agg-0.030
 tidy-0.028
 asked-0.026
 admired-0.025
 Jane-0.025
Token a
Token ID257
Feature activation-1.00e-02

Pos logit contributions
plets+0.056
pper+0.054
ider+0.054
ses+0.054
 trigger+0.053
Neg logit contributions
 Jane-0.016
 asked-0.015
 tidy-0.014
 clean-0.013
 to-0.012
Token beautiful
Token ID4950
Feature activation-2.69e-02

Pos logit contributions
cher+0.018
ses+0.017
pper+0.017
ider+0.017
plets+0.016
Neg logit contributions
 tidy-0.014
 Jane-0.013
 asked-0.013
 clean-0.012
Jane-0.011
Token violin
Token ID38283
Feature activation-8.09e-02

Pos logit contributions
cher+0.044
ider+0.042
ses+0.042
pper+0.041
plets+0.039
Neg logit contributions
 Jane-0.041
 tidy-0.040
 asked-0.039
 clean-0.037
Jane-0.034
Token.
Token ID13
Feature activation-1.52e-02

Pos logit contributions
ses+0.211
 spoiler+0.199
 Deal+0.198
 AW+0.198
 trigger+0.191
Neg logit contributions
 to-0.064
 asked-0.063
 by-0.051
 tidy-0.051
 clean-0.048
Token She
Token ID1375
Feature activation+6.67e-03

Pos logit contributions
robe+0.019
hered+0.018
 CD+0.017
 spoiler+0.016
ses+0.016
Neg logit contributions
agg-0.026
by-0.024
elia-0.023
 tidy-0.023
 desserts-0.023
Token asked
Token ID1965
Feature activation-2.52e-04

Pos logit contributions
 Jane+0.006
 tidy+0.006
 desserts+0.006
agg+0.006
aul+0.005
Neg logit contributions
cher-0.012
ses-0.012
pper-0.012
 shooter-0.012
hered-0.012
Token her
Token ID607
Feature activation+1.36e-03

Pos logit contributions
hered+0.000
cher+0.000
robe+0.000
 robber+0.000
 villain+0.000
Neg logit contributions
agg-0.000
 tidy-0.000
 asked-0.000
 admired-0.000
 measured-0.000
Token friend
Token ID1545
Feature activation-1.96e-03

Pos logit contributions
 asked+0.002
 Jane+0.002
agg+0.001
 tidy+0.001
Jane+0.001
Neg logit contributions
cher-0.002
robe-0.002
 shooter-0.002
ses-0.002
 trigger-0.002
Token watch
Token ID2342
Feature activation-1.30e-02

Pos logit contributions
ider+0.018
cher+0.017
robe+0.016
ses+0.016
plets+0.016
Neg logit contributions
 Jane-0.015
 asked-0.015
 tidy-0.014
 John-0.013
 clean-0.013
Token.
Token ID13
Feature activation+2.30e-03

Pos logit contributions
cher+0.026
ider+0.026
ses+0.026
robe+0.025
 trigger+0.025
Neg logit contributions
 asked-0.013
 Jane-0.013
 tidy-0.012
 John-0.011
 clean-0.010
Token It
Token ID632
Feature activation-3.54e-02

Pos logit contributions
 tidy+0.002
 asked+0.002
 Jane+0.002
 clean+0.002
by+0.002
Neg logit contributions
ses-0.004
ider-0.004
cher-0.004
 spoiler-0.004
robe-0.004
Token hit
Token ID2277
Feature activation-7.20e-03

Pos logit contributions
cher+0.077
ider+0.077
ses+0.077
robe+0.075
 trigger+0.075
Neg logit contributions
 asked-0.030
 Jane-0.028
 tidy-0.026
 John-0.021
 clean-0.021
Token the
Token ID262
Feature activation-5.64e-02

Pos logit contributions
ses+0.013
cher+0.013
 trigger+0.013
 spoiler+0.013
hered+0.013
Neg logit contributions
 asked-0.008
 tidy-0.008
 Jane-0.007
agg-0.007
 clean-0.006
Token golf
Token ID13126
Feature activation-8.04e-02

Pos logit contributions
cher+0.105
ider+0.104
pper+0.095
robe+0.093
asso+0.093
Neg logit contributions
 tidy-0.073
 asked-0.069
 clean-0.069
 me-0.058
 organized-0.058
Token hole
Token ID7604
Feature activation-2.40e-02

Pos logit contributions
ses+0.134
ider+0.130
cher+0.123
arse+0.113
pper+0.113
Neg logit contributions
 asked-0.106
 tidy-0.098
 to-0.097
 me-0.097
 paid-0.097
Token in
Token ID287
Feature activation-5.35e-03

Pos logit contributions
ses+0.044
cher+0.043
ider+0.042
 trigger+0.042
 spoiler+0.042
Neg logit contributions
 Jane-0.028
 asked-0.028
 tidy-0.027
 clean-0.024
 John-0.023
Token one
Token ID530
Feature activation-3.15e-03

Pos logit contributions
ider+0.011
cher+0.011
ses+0.011
robe+0.011
 trigger+0.010
Neg logit contributions
 Jane-0.006
 asked-0.005
 tidy-0.005
 John-0.005
 clean-0.004
Token shot
Token ID2823
Feature activation-4.04e-02

Pos logit contributions
cher+0.005
ider+0.005
ses+0.005
robe+0.005
 trigger+0.005
Neg logit contributions
 asked-0.004
 Jane-0.004
 tidy-0.004
 clean-0.003
 John-0.003
Token -
Token ID532
Feature activation-2.93e-02

Pos logit contributions
ider+0.070
ses+0.070
cher+0.068
 spoiler+0.066
robe+0.065
Neg logit contributions
 asked-0.051
 Jane-0.051
 tidy-0.048
 John-0.044
 clean-0.043
Token sound
Token ID2128
Feature activation-3.94e-02

Pos logit contributions
cher+0.046
pper+0.045
plets+0.044
 spoiler+0.044
 trigger+0.042
Neg logit contributions
 Jane-0.022
 asked-0.022
 tidy-0.021
 clean-0.019
 to-0.019
Token.
Token ID13
Feature activation+1.10e-03

Pos logit contributions
ses+0.094
 spoiler+0.088
cher+0.087
ider+0.085
 trigger+0.085
Neg logit contributions
 asked-0.040
 Jane-0.037
 to-0.034
 tidy-0.033
 clean-0.032
Token It
Token ID632
Feature activation-3.78e-02

Pos logit contributions
 tidy+0.002
 asked+0.002
by+0.001
 admired+0.001
 Jane+0.001
Neg logit contributions
ses-0.002
robe-0.002
ider-0.002
 trigger-0.001
 spoiler-0.001
Token is
Token ID318
Feature activation-3.60e-02

Pos logit contributions
cher+0.080
ider+0.080
ses+0.077
pper+0.077
 trigger+0.076
Neg logit contributions
 asked-0.040
 Jane-0.036
 tidy-0.033
 clean-0.029
 Mary-0.028
Token a
Token ID257
Feature activation-8.68e-03

Pos logit contributions
cher+0.067
ider+0.066
 trigger+0.064
ses+0.064
 spoiler+0.063
Neg logit contributions
 Jane-0.045
 asked-0.044
 tidy-0.042
 John-0.037
 clean-0.037
Token violin
Token ID38283
Feature activation-7.98e-02

Pos logit contributions
cher+0.016
ider+0.016
ses+0.016
pper+0.016
 spoiler+0.016
Neg logit contributions
 asked-0.010
 Jane-0.009
 tidy-0.009
 clean-0.008
 John-0.008
Token.
Token ID13
Feature activation+1.15e-02

Pos logit contributions
ses+0.190
 trigger+0.178
 spoiler+0.178
pper+0.177
 Deal+0.175
Neg logit contributions
 asked-0.066
 to-0.063
 tidy-0.060
 clean-0.055
 Jane-0.050
Token She
Token ID1375
Feature activation+2.26e-03

Pos logit contributions
 tidy+0.016
 asked+0.016
by+0.015
 admired+0.014
Kat+0.014
Neg logit contributions
ses-0.019
robe-0.017
ider-0.017
 trigger-0.016
 spoiler-0.015
Token follows
Token ID5679
Feature activation+1.02e-02

Pos logit contributions
 asked+0.003
 Jane+0.003
 tidy+0.003
 John+0.002
 clean+0.002
Neg logit contributions
cher-0.004
ider-0.004
ses-0.004
robe-0.004
 trigger-0.004
Token the
Token ID262
Feature activation-4.52e-02

Pos logit contributions
 asked+0.008
 Jane+0.008
 tidy+0.008
agg+0.006
 John+0.006
Neg logit contributions
cher-0.022
ses-0.021
ider-0.021
 trigger-0.021
robe-0.021
Token sound
Token ID2128
Feature activation-2.10e-02

Pos logit contributions
ider+0.081
pper+0.080
cher+0.077
ses+0.076
plets+0.071
Neg logit contributions
 tidy-0.070
 asked-0.068
 clean-0.065
 me-0.060
 to-0.057
Token article
Token ID2708
Feature activation-3.84e-02

Pos logit contributions
cher+0.003
ider+0.003
ses+0.003
robe+0.003
hered+0.003
Neg logit contributions
 asked-0.002
 Jane-0.002
 tidy-0.001
 John-0.001
 clean-0.001
Token about
Token ID546
Feature activation-1.83e-02

Pos logit contributions
ses+0.093
 spoiler+0.088
ider+0.087
cher+0.086
pper+0.086
Neg logit contributions
 Jane-0.029
 asked-0.028
 tidy-0.024
 John-0.022
 to-0.020
Token the
Token ID262
Feature activation-3.50e-02

Pos logit contributions
ider+0.035
cher+0.033
ses+0.033
 trigger+0.033
pper+0.032
Neg logit contributions
 Jane-0.023
 asked-0.022
 tidy-0.020
 John-0.020
 Mary-0.018
Token local
Token ID1957
Feature activation-4.41e-02

Pos logit contributions
ider+0.067
cher+0.064
plets+0.064
ses+0.063
pper+0.063
Neg logit contributions
 asked-0.042
 tidy-0.040
 Jane-0.037
 clean-0.035
 John-0.033
Token ar
Token ID610
Feature activation-4.93e-02

Pos logit contributions
ider+0.075
ses+0.071
cher+0.070
plets+0.070
 spoiler+0.069
Neg logit contributions
 asked-0.063
 tidy-0.058
 Jane-0.058
 clean-0.052
 John-0.052
Tokenchery
Token ID31132
Feature activation-7.97e-02

Pos logit contributions
ses+0.074
 spoiler+0.054
robe+0.053
 elbow+0.049
sed+0.049
Neg logit contributions
 asked-0.093
 tidy-0.087
old-0.087
to-0.085
 often-0.083
Token tournament
Token ID7756
Feature activation-4.93e-02

Pos logit contributions
ses+0.138
ider+0.131
 spoiler+0.125
plets+0.121
cher+0.118
Neg logit contributions
 asked-0.114
 to-0.100
 tidy-0.100
 Jane-0.097
 often-0.096
Token.
Token ID13
Feature activation+4.05e-03

Pos logit contributions
ses+0.103
 spoiler+0.098
ider+0.095
 trigger+0.095
plets+0.093
Neg logit contributions
 asked-0.056
 Jane-0.053
 tidy-0.046
 to-0.046
 John-0.045
Token He
Token ID679
Feature activation-3.62e-04

Pos logit contributions
 tidy+0.006
 asked+0.005
by+0.005
 clean+0.005
agg+0.005
Neg logit contributions
ses-0.006
 Have-0.006
 spoiler-0.006
robe-0.006
cher-0.005
Token decided
Token ID3066
Feature activation+3.74e-03

Pos logit contributions
cher+0.001
ses+0.001
ider+0.001
pper+0.001
 shooter+0.001
Neg logit contributions
 Jane-0.000
 tidy-0.000
agg-0.000
 asked-0.000
 Baby-0.000
Token he
Token ID339
Feature activation+7.59e-03

Pos logit contributions
 Jane+0.002
 tidy+0.002
agg+0.002
 asked+0.002
 Baby+0.002
Neg logit contributions
cher-0.009
ider-0.008
 trigger-0.008
ses-0.008
 shooter-0.008
Token a
Token ID257
Feature activation-1.60e-02

Pos logit contributions
ider+0.085
cher+0.084
ses+0.080
plets+0.079
pper+0.079
Neg logit contributions
 asked-0.058
 tidy-0.058
 Jane-0.058
 clean-0.053
 John-0.049
Token ribbon
Token ID29092
Feature activation-6.53e-02

Pos logit contributions
cher+0.028
ider+0.027
ses+0.027
robe+0.026
 trigger+0.026
Neg logit contributions
 asked-0.020
 tidy-0.019
 Jane-0.019
 clean-0.017
 John-0.016
Token with
Token ID351
Feature activation-4.44e-02

Pos logit contributions
ses+0.144
 loser+0.136
 spoiler+0.135
assing+0.135
cher+0.134
Neg logit contributions
 asked-0.062
 to-0.060
 tidy-0.055
 clean-0.053
 Jane-0.049
Token a
Token ID257
Feature activation-2.28e-02

Pos logit contributions
ider+0.079
cher+0.079
ses+0.077
robe+0.075
 spoiler+0.074
Neg logit contributions
 asked-0.055
 Jane-0.054
 tidy-0.051
 John-0.045
 clean-0.045
Token big
Token ID1263
Feature activation-3.44e-03

Pos logit contributions
cher+0.039
ider+0.038
ses+0.038
robe+0.037
pper+0.036
Neg logit contributions
 asked-0.030
 Jane-0.028
 tidy-0.028
 clean-0.025
 John-0.024
Token bow
Token ID9563
Feature activation-7.95e-02

Pos logit contributions
cher+0.006
ider+0.006
robe+0.006
ses+0.005
pper+0.005
Neg logit contributions
 asked-0.005
 Jane-0.004
 tidy-0.004
 clean-0.004
 John-0.004
Token.
Token ID13
Feature activation-2.05e-02

Pos logit contributions
 Deal+0.173
ses+0.171
rob+0.168
assing+0.167
 spoiler+0.167
Neg logit contributions
 to-0.073
 asked-0.070
 by-0.052
 clean-0.048
 tidy-0.048
Token It
Token ID632
Feature activation-2.24e-02

Pos logit contributions
ses+0.033
robe+0.033
cher+0.033
ider+0.032
 trigger+0.031
Neg logit contributions
 tidy-0.025
 Jane-0.025
 asked-0.025
agg-0.023
Jane-0.023
Token was
Token ID373
Feature activation-1.63e-02

Pos logit contributions
cher+0.046
ses+0.045
ider+0.045
robe+0.044
 trigger+0.043
Neg logit contributions
 Jane-0.021
 asked-0.020
 tidy-0.019
agg-0.017
 Katie-0.016
Token a
Token ID257
Feature activation-2.29e-03

Pos logit contributions
cher+0.027
ider+0.027
ses+0.027
robe+0.026
 trigger+0.026
Neg logit contributions
 asked-0.021
 Jane-0.021
 tidy-0.020
 John-0.018
 clean-0.018
Token pink
Token ID11398
Feature activation-4.18e-02

Pos logit contributions
cher+0.004
ider+0.004
ses+0.004
robe+0.004
 trigger+0.004
Neg logit contributions
 asked-0.003
 Jane-0.003
 tidy-0.002
 clean-0.002
 John-0.002
Token and
Token ID290
Feature activation-4.95e-02

Pos logit contributions
ider+0.063
cher+0.061
 trigger+0.061
 loser+0.058
assing+0.058
Neg logit contributions
 asked-0.029
 Jane-0.024
 to-0.023
 John-0.022
 tidy-0.021
Token a
Token ID257
Feature activation-5.20e-02

Pos logit contributions
ider+0.090
pper+0.087
cher+0.086
robe+0.085
ses+0.084
Neg logit contributions
 asked-0.068
 Jane-0.063
 tidy-0.058
 John-0.055
 Mary-0.053
Token girl
Token ID2576
Feature activation-4.82e-02

Pos logit contributions
pper+0.089
ider+0.087
cher+0.083
assing+0.080
ses+0.080
Neg logit contributions
 asked-0.073
 tidy-0.071
 clean-0.066
 Jane-0.064
 to-0.062
Token with
Token ID351
Feature activation-5.96e-02

Pos logit contributions
 trigger+0.090
ses+0.089
cher+0.088
ider+0.088
robe+0.087
Neg logit contributions
 asked-0.065
 Jane-0.057
 tidy-0.051
 admired-0.050
 to-0.049
Token a
Token ID257
Feature activation-4.17e-02

Pos logit contributions
ider+0.109
cher+0.106
ses+0.103
pper+0.101
robe+0.101
Neg logit contributions
 asked-0.078
 Jane-0.074
 tidy-0.070
 John-0.065
 clean-0.063
Token piano
Token ID19132
Feature activation-7.95e-02

Pos logit contributions
ider+0.080
pper+0.080
cher+0.077
ses+0.075
assing+0.074
Neg logit contributions
 asked-0.052
 tidy-0.047
 clean-0.043
 Jane-0.042
 to-0.040
Token.
Token ID13
Feature activation-5.64e-03

Pos logit contributions
ses+0.171
cher+0.168
pper+0.167
plets+0.166
 trigger+0.165
Neg logit contributions
 asked-0.075
 to-0.071
 tidy-0.058
 clean-0.053
 Jane-0.049
Token They
Token ID1119
Feature activation-2.58e-02

Pos logit contributions
ses+0.007
ider+0.007
 trigger+0.007
 spoiler+0.007
robe+0.006
Neg logit contributions
 tidy-0.009
 desserts-0.008
 clean-0.008
agg-0.008
Jane-0.008
Token hear
Token ID3285
Feature activation-2.77e-02

Pos logit contributions
ider+0.045
ses+0.043
cher+0.043
robe+0.042
 trigger+0.041
Neg logit contributions
 asked-0.040
 tidy-0.035
 Jane-0.035
 clean-0.032
 admired-0.031
Token how
Token ID703
Feature activation-9.76e-03

Pos logit contributions
cher+0.048
ses+0.047
 trigger+0.047
 spoiler+0.046
robe+0.046
Neg logit contributions
 asked-0.032
 Jane-0.030
 tidy-0.030
agg-0.027
Jane-0.026
Token the
Token ID262
Feature activation-1.18e-02

Pos logit contributions
cher+0.016
 spoiler+0.015
robe+0.015
ses+0.015
 trigger+0.015
Neg logit contributions
 Jane-0.012
 asked-0.012
agg-0.011
 tidy-0.011
 Baby-0.010
Token was
Token ID373
Feature activation-2.20e-02

Pos logit contributions
cher+0.001
ses+0.001
hered+0.001
robe+0.001
ider+0.001
Neg logit contributions
 Jane-0.001
 tidy-0.001
 asked-0.001
agg-0.001
Jane-0.001
Token wearing
Token ID5762
Feature activation-4.09e-02

Pos logit contributions
cher+0.037
ses+0.036
ider+0.036
robe+0.035
hered+0.035
Neg logit contributions
 Jane-0.025
 asked-0.025
 tidy-0.023
agg-0.023
by-0.023
Token a
Token ID257
Feature activation-1.87e-02

Pos logit contributions
cher+0.071
ider+0.070
ses+0.069
robe+0.067
 trigger+0.067
Neg logit contributions
 asked-0.051
 Jane-0.051
 tidy-0.049
 John-0.043
 clean-0.043
Token pretty
Token ID2495
Feature activation-2.50e-02

Pos logit contributions
cher+0.038
ider+0.038
ses+0.037
pper+0.036
plets+0.036
Neg logit contributions
 tidy-0.019
 asked-0.018
 Jane-0.017
 clean-0.017
 admired-0.014
Token pink
Token ID11398
Feature activation-4.46e-02

Pos logit contributions
cher+0.040
ider+0.039
ses+0.038
pper+0.036
assing+0.036
Neg logit contributions
 asked-0.037
 tidy-0.036
 Jane-0.036
 clean-0.034
 admired-0.032
Token bow
Token ID9563
Feature activation-7.92e-02

Pos logit contributions
cher+0.084
assing+0.078
chers+0.078
plets+0.077
ses+0.075
Neg logit contributions
 asked-0.056
 tidy-0.054
 clean-0.053
 to-0.053
 me-0.052
Token in
Token ID287
Feature activation-1.27e-02

Pos logit contributions
ses+0.178
assing+0.175
 AW+0.173
cher+0.170
 spoiler+0.169
Neg logit contributions
 asked-0.067
 to-0.067
 tidy-0.052
 clean-0.050
 Jane-0.049
Token her
Token ID607
Feature activation-1.97e-02

Pos logit contributions
cher+0.029
ider+0.028
ses+0.028
robe+0.027
 trigger+0.027
Neg logit contributions
 asked-0.010
 Jane-0.010
 tidy-0.009
 John-0.007
 clean-0.007
Token hair
Token ID4190
Feature activation-4.36e-02

Pos logit contributions
cher+0.033
ider+0.032
 trigger+0.031
ses+0.031
 spoiler+0.030
Neg logit contributions
 asked-0.027
 Jane-0.026
 tidy-0.026
 clean-0.023
 admired-0.022
Token.
Token ID13
Feature activation-2.75e-02

Pos logit contributions
cher+0.081
ses+0.081
 trigger+0.078
 spoiler+0.077
ider+0.076
Neg logit contributions
 asked-0.052
 Jane-0.048
 tidy-0.046
 clean-0.042
 to-0.041
Token\n
Token ID198
Feature activation-1.58e-02

Pos logit contributions
ses+0.049
robe+0.049
cher+0.048
ider+0.048
 spoiler+0.047
Neg logit contributions
 tidy-0.030
 asked-0.030
 Jane-0.029
agg-0.026
 clean-0.025
Token a
Token ID257
Feature activation-1.38e-02

Pos logit contributions
cher+0.071
ses+0.068
ider+0.068
robe+0.068
hered+0.067
Neg logit contributions
 Jane-0.050
 asked-0.049
 tidy-0.047
agg-0.043
 John-0.042
Token fantastic
Token ID9623
Feature activation-2.18e-02

Pos logit contributions
ider+0.033
cher+0.032
pper+0.031
ses+0.031
 trigger+0.031
Neg logit contributions
 tidy-0.011
 asked-0.010
 Jane-0.010
 clean-0.009
 John-0.007
Token time
Token ID640
Feature activation-3.31e-02

Pos logit contributions
cher+0.057
ider+0.054
 trigger+0.053
pper+0.053
plets+0.052
Neg logit contributions
 asked-0.014
 Jane-0.014
 tidy-0.013
 clean-0.010
 admired-0.009
Token visiting
Token ID10013
Feature activation-1.14e-02

Pos logit contributions
 trigger+0.069
ses+0.068
 spoiler+0.066
cher+0.065
robe+0.065
Neg logit contributions
 asked-0.036
 Jane-0.034
 tidy-0.030
 to-0.029
 admired-0.028
Token the
Token ID262
Feature activation-4.80e-02

Pos logit contributions
cher+0.024
ider+0.023
ses+0.023
robe+0.023
 trigger+0.023
Neg logit contributions
 asked-0.010
 Jane-0.010
 tidy-0.009
 John-0.008
 clean-0.008
Token jazz
Token ID21274
Feature activation-7.90e-02

Pos logit contributions
cher+0.098
pper+0.098
ider+0.096
ses+0.096
 spoiler+0.090
Neg logit contributions
 tidy-0.058
 asked-0.056
 clean-0.051
 me-0.048
 organized-0.048
Token club
Token ID3430
Feature activation-5.12e-02

Pos logit contributions
ses+0.151
 trigger+0.140
 spoiler+0.139
pper+0.139
ider+0.134
Neg logit contributions
 asked-0.099
 Jane-0.094
 tidy-0.093
 to-0.090
 clean-0.084
Token.
Token ID13
Feature activation-1.89e-02

Pos logit contributions
ses+0.104
 trigger+0.103
 spoiler+0.100
cher+0.097
ider+0.097
Neg logit contributions
 asked-0.057
 Jane-0.057
 tidy-0.052
 to-0.048
 clean-0.046
Token She
Token ID1375
Feature activation-1.05e-02

Pos logit contributions
cher+0.037
ider+0.037
ses+0.036
 trigger+0.036
 spoiler+0.035
Neg logit contributions
 Jane-0.020
 asked-0.020
 tidy-0.018
 John-0.016
 Katie-0.016
Token danced
Token ID39480
Feature activation-2.88e-02

Pos logit contributions
cher+0.020
ses+0.020
ider+0.020
robe+0.020
 spoiler+0.019
Neg logit contributions
 Jane-0.011
 asked-0.010
 tidy-0.010
 John-0.008
agg-0.008
Token and
Token ID290
Feature activation-4.25e-02

Pos logit contributions
ses+0.059
ider+0.058
cher+0.058
 trigger+0.056
 spoiler+0.056
Neg logit contributions
 asked-0.029
 Jane-0.029
 tidy-0.027
 John-0.023
 clean-0.023
Token and
Token ID290
Feature activation-1.47e-02

Pos logit contributions
cher+0.008
ider+0.007
ses+0.007
robe+0.007
 trigger+0.007
Neg logit contributions
 Jane-0.005
 asked-0.005
 tidy-0.005
agg-0.004
Jane-0.004
Token always
Token ID1464
Feature activation-3.31e-02

Pos logit contributions
cher+0.022
robe+0.021
 spoiler+0.021
 shooter+0.021
ses+0.021
Neg logit contributions
agg-0.020
 Jane-0.019
 tidy-0.019
Jane-0.018
 asked-0.018
Token wore
Token ID12408
Feature activation-3.83e-02

Pos logit contributions
cher+0.059
robe+0.055
ider+0.055
 spoiler+0.055
ses+0.054
Neg logit contributions
 Jane-0.039
agg-0.037
Jane-0.035
 tidy-0.034
 Baby-0.033
Token his
Token ID465
Feature activation-5.31e-02

Pos logit contributions
cher+0.075
ses+0.073
ider+0.073
robe+0.072
 spoiler+0.072
Neg logit contributions
 asked-0.039
 Jane-0.038
 tidy-0.035
agg-0.031
Jane-0.031
Token lucky
Token ID9670
Feature activation-7.01e-02

Pos logit contributions
plets+0.099
pper+0.098
ider+0.097
cher+0.097
 trigger+0.094
Neg logit contributions
 tidy-0.068
 asked-0.065
 clean-0.062
 to-0.057
 Dad-0.056
Token red
Token ID2266
Feature activation-7.88e-02

Pos logit contributions
cher+0.108
pper+0.107
ider+0.102
plets+0.102
ses+0.102
Neg logit contributions
 tidy-0.108
 asked-0.107
 Jane-0.100
 clean-0.100
 to-0.098
Token shirt
Token ID10147
Feature activation-5.01e-02

Pos logit contributions
cher+0.141
pper+0.140
plets+0.138
ses+0.137
ider+0.134
Neg logit contributions
 to-0.110
 tidy-0.104
 asked-0.102
 clean-0.100
 me-0.093
Token.
Token ID13
Feature activation+1.69e-02

Pos logit contributions
ses+0.102
 spoiler+0.098
plets+0.097
 AW+0.096
assing+0.095
Neg logit contributions
 asked-0.054
 tidy-0.051
 to-0.049
 Jane-0.048
 clean-0.046
Token One
Token ID1881
Feature activation+1.99e-02

Pos logit contributions
 asked+0.018
 tidy+0.018
 Jane+0.018
agg+0.016
Jane+0.015
Neg logit contributions
 spoiler-0.030
ses-0.030
robe-0.030
cher-0.029
ider-0.029
Token day
Token ID1110
Feature activation+4.67e-05

Pos logit contributions
 asked+0.022
 Jane+0.022
 tidy+0.020
agg+0.019
 John+0.018
Neg logit contributions
robe-0.035
ses-0.035
ider-0.035
cher-0.035
 shooter-0.034
Token,
Token ID11
Feature activation+1.30e-02

Pos logit contributions
 tidy+0.000
 asked+0.000
 Jane+0.000
agg+0.000
Kat+0.000
Neg logit contributions
hered-0.000
 loser-0.000
ider-0.000
ses-0.000
 trigger-0.000
Token,
Token ID11
Feature activation-1.13e-02

Pos logit contributions
cher+0.007
ses+0.007
ider+0.007
 spoiler+0.007
 trigger+0.007
Neg logit contributions
 asked-0.005
 Jane-0.005
 tidy-0.004
 John-0.004
 clean-0.004
Token she
Token ID673
Feature activation+2.05e-02

Pos logit contributions
cher+0.024
ider+0.024
ses+0.023
 spoiler+0.022
robe+0.022
Neg logit contributions
 asked-0.012
 Jane-0.012
 tidy-0.010
 John-0.009
 to-0.009
Token stopped
Token ID5025
Feature activation+2.90e-03

Pos logit contributions
 Jane+0.023
 asked+0.021
 tidy+0.021
Jane+0.019
agg+0.019
Neg logit contributions
cher-0.038
ider-0.037
ses-0.037
 trigger-0.037
robe-0.036
Token playing
Token ID2712
Feature activation-3.41e-02

Pos logit contributions
 asked+0.004
 Jane+0.004
 tidy+0.004
 clean+0.003
 John+0.003
Neg logit contributions
cher-0.005
ider-0.005
ses-0.005
robe-0.005
 trigger-0.005
Token her
Token ID607
Feature activation-5.07e-02

Pos logit contributions
ses+0.085
ider+0.083
pper+0.083
plets+0.082
cher+0.079
Neg logit contributions
 tidy-0.027
 asked-0.025
 clean-0.024
 to-0.023
 Jane-0.021
Token violin
Token ID38283
Feature activation-7.85e-02

Pos logit contributions
pper+0.085
cher+0.085
plets+0.080
ider+0.078
 spoiler+0.076
Neg logit contributions
 tidy-0.083
 clean-0.078
 to-0.077
 asked-0.077
 organized-0.071
Token.
Token ID13
Feature activation-2.47e-02

Pos logit contributions
ses+0.182
cher+0.172
plets+0.171
 spoiler+0.169
ider+0.168
Neg logit contributions
 asked-0.069
 tidy-0.064
 to-0.064
 clean-0.058
 Jane-0.054
Token␣
Token ID220
Feature activation-1.13e-02

Pos logit contributions
ses+0.043
robe+0.042
ider+0.041
hered+0.041
 trigger+0.041
Neg logit contributions
 tidy-0.029
 Jane-0.028
 asked-0.028
agg-0.025
 John-0.025
Token
Token ID198
Feature activation-9.80e-03

Pos logit contributions
cher+0.023
ider+0.022
ses+0.022
robe+0.022
 trigger+0.022
Neg logit contributions
 asked-0.011
 Jane-0.010
 tidy-0.010
 John-0.008
agg-0.008
Token
Token ID198
Feature activation-2.96e-02

Pos logit contributions
cher+0.017
hered+0.016
robe+0.016
ider+0.016
ses+0.016
Neg logit contributions
 tidy-0.011
 Jane-0.011
 asked-0.011
 desserts-0.009
 Baby-0.009
TokenThen
Token ID6423
Feature activation-4.01e-02

Pos logit contributions
cher+0.054
ses+0.054
 spoiler+0.051
ider+0.051
robe+0.050
Neg logit contributions
 asked-0.038
 Jane-0.036
 tidy-0.036
Jane-0.032
 clean-0.031
Token,
Token ID11
Feature activation+4.31e-04

Pos logit contributions
cher+0.010
ider+0.010
ses+0.009
robe+0.009
 trigger+0.009
Neg logit contributions
 asked-0.004
 Jane-0.004
 tidy-0.004
 John-0.003
 clean-0.003
Token he
Token ID339
Feature activation-3.15e-03

Pos logit contributions
 Jane+0.000
 asked+0.000
 tidy+0.000
 John+0.000
 to+0.000
Neg logit contributions
ider-0.001
robe-0.001
ses-0.001
cher-0.001
hered-0.001
Token is
Token ID318
Feature activation-2.96e-02

Pos logit contributions
ider+0.006
cher+0.006
 trigger+0.006
 spoiler+0.006
ses+0.006
Neg logit contributions
 asked-0.003
 tidy-0.003
 Jane-0.003
 clean-0.003
 admired-0.003
Token playing
Token ID2712
Feature activation-1.76e-02

Pos logit contributions
ider+0.056
 trigger+0.054
cher+0.054
ses+0.054
robe+0.052
Neg logit contributions
 asked-0.038
 Jane-0.036
 tidy-0.035
 clean-0.032
 John-0.031
Token a
Token ID257
Feature activation-2.52e-03

Pos logit contributions
ses+0.042
plets+0.040
ider+0.039
pper+0.039
 trigger+0.039
Neg logit contributions
 tidy-0.015
 asked-0.015
 clean-0.013
 Jane-0.013
 me-0.012
Token sax
Token ID46909
Feature activation-7.85e-02

Pos logit contributions
ses+0.005
cher+0.005
ider+0.005
pper+0.004
robe+0.004
Neg logit contributions
 tidy-0.003
 asked-0.003
 Jane-0.003
 clean-0.003
 admired-0.002
Tokenophone
Token ID38656
Feature activation-5.74e-02

Pos logit contributions
ses+0.099
 trigger+0.091
 drown+0.085
 affect+0.085
 injuries+0.085
Neg logit contributions
 to-0.128
 asked-0.127
 Jane-0.127
 by-0.117
 John-0.117
Token.
Token ID13
Feature activation-1.52e-02

Pos logit contributions
 spoiler+0.110
ses+0.110
 trigger+0.109
assing+0.108
cher+0.106
Neg logit contributions
 asked-0.061
 Jane-0.058
 to-0.058
 John-0.050
 tidy-0.050
Token It
Token ID632
Feature activation-4.09e-02

Pos logit contributions
cher+0.032
pper+0.031
hered+0.031
ider+0.030
 trigger+0.030
Neg logit contributions
 Jane-0.014
 asked-0.014
 tidy-0.012
 John-0.012
 Mary-0.011
Token is
Token ID318
Feature activation-3.79e-02

Pos logit contributions
ider+0.098
 trigger+0.097
ses+0.095
pper+0.094
 loser+0.093
Neg logit contributions
 asked-0.035
 tidy-0.027
 clean-0.024
 me-0.023
 to-0.022
Token a
Token ID257
Feature activation-1.29e-02

Pos logit contributions
ider+0.082
cher+0.081
 trigger+0.080
hered+0.079
ses+0.077
Neg logit contributions
 asked-0.037
 tidy-0.036
 Jane-0.034
 clean-0.032
 John-0.028
Token bear
Token ID6842
Feature activation-1.63e-02

Pos logit contributions
ider+0.031
ses+0.031
plets+0.030
 AW+0.029
cher+0.029
Neg logit contributions
 asked-0.026
 tidy-0.022
 Jane-0.021
 clean-0.021
 admired-0.020
Token named
Token ID3706
Feature activation+3.22e-03

Pos logit contributions
 trigger+0.035
 spoiler+0.035
ses+0.035
cher+0.034
 AW+0.034
Neg logit contributions
 asked-0.018
 Jane-0.014
 to-0.014
 tidy-0.014
 clean-0.013
Token Brown
Token ID4373
Feature activation+1.03e-02

Pos logit contributions
 asked+0.005
 Jane+0.005
 tidy+0.005
agg+0.005
 admired+0.005
Neg logit contributions
cher-0.004
robe-0.004
 trigger-0.004
ses-0.004
pper-0.004
Tokenie
Token ID494
Feature activation-1.96e-02

Pos logit contributions
 asked+0.007
 Jane+0.006
 tidy+0.006
 clean+0.004
 John+0.004
Neg logit contributions
ses-0.025
ider-0.025
cher-0.024
 spoiler-0.024
 trigger-0.024
Token.
Token ID13
Feature activation+2.32e-02

Pos logit contributions
ider+0.044
plets+0.043
ses+0.043
cher+0.043
 AW+0.042
Neg logit contributions
 asked-0.020
 to-0.017
 tidy-0.016
 clean-0.015
 Jane-0.015
Token Brown
Token ID4373
Feature activation+5.89e-02

Pos logit contributions
 Jane+0.037
 asked+0.037
 tidy+0.035
agg+0.035
Jane+0.034
Neg logit contributions
 spoiler-0.031
robe-0.030
ses-0.030
cher-0.029
ider-0.029
Tokenie
Token ID494
Feature activation+4.17e-02

Pos logit contributions
 Jane+0.081
 Mary+0.071
 Katie+0.070
agg+0.069
 Maggie+0.069
Neg logit contributions
pper-0.091
robe-0.090
cher-0.087
bee-0.083
 shooter-0.081
Token was
Token ID373
Feature activation+1.58e-03

Pos logit contributions
 Jane+0.055
 Katie+0.050
agg+0.049
 desserts+0.049
Jane+0.048
Neg logit contributions
robe-0.064
ses-0.062
cher-0.060
pper-0.060
ider-0.060
Token very
Token ID845
Feature activation+1.86e-02

Pos logit contributions
 Jane+0.002
 asked+0.002
 tidy+0.002
Jane+0.002
agg+0.002
Neg logit contributions
ses-0.002
cher-0.002
ider-0.002
robe-0.002
 shooter-0.002
Token lonely
Token ID21757
Feature activation-1.14e-02

Pos logit contributions
 Jane+0.027
Jane+0.025
 Katie+0.024
aman+0.024
 John+0.024
Neg logit contributions
ider-0.028
ses-0.025
cher-0.025
 loser-0.025
robe-0.025
Token because
Token ID780
Feature activation+1.71e-02

Pos logit contributions
cher+0.022
ses+0.021
 trigger+0.021
 spoiler+0.021
ider+0.021
Neg logit contributions
 asked-0.013
 tidy-0.012
 Jane-0.012
 clean-0.011
 to-0.010
Token in
Token ID287
Feature activation+1.44e-02

Pos logit contributions
ider+0.021
cher+0.020
ses+0.020
hered+0.020
robe+0.019
Neg logit contributions
 Jane-0.029
Jane-0.027
agg-0.027
 asked-0.026
 John-0.026
Token the
Token ID262
Feature activation-2.73e-04

Pos logit contributions
 asked+0.012
 Jane+0.012
 tidy+0.011
 John+0.010
 clean+0.009
Neg logit contributions
cher-0.031
ider-0.030
ses-0.030
robe-0.029
 trigger-0.029
Token kitchen
Token ID9592
Feature activation+4.20e-02

Pos logit contributions
robe+0.000
 sax+0.000
 loser+0.000
 CD+0.000
Have+0.000
Neg logit contributions
 Jane-0.000
agg-0.000
Jane-0.000
 Maggie-0.000
 asked-0.000
Token,
Token ID11
Feature activation+7.79e-03

Pos logit contributions
 Jane+0.053
 asked+0.052
agg+0.050
Jane+0.049
 tidy+0.048
Neg logit contributions
ider-0.068
robe-0.065
ses-0.063
hered-0.061
 CD-0.058
Token mixing
Token ID17090
Feature activation+4.85e-02

Pos logit contributions
 asked+0.010
 Jane+0.010
 tidy+0.009
 John+0.008
agg+0.008
Neg logit contributions
cher-0.013
ider-0.013
ses-0.013
robe-0.013
 trigger-0.013
Token some
Token ID617
Feature activation+5.34e-02

Pos logit contributions
 asked+0.068
idi+0.064
old+0.064
 questioned+0.063
Kat+0.062
Neg logit contributions
robe-0.060
ters-0.059
ula-0.055
 wand-0.054
 controller-0.053
Token things
Token ID1243
Feature activation+9.84e-03

Pos logit contributions
 Jane+0.064
 asked+0.062
 John+0.061
 Todd+0.060
idi+0.059
Neg logit contributions
 trigger-0.082
robe-0.081
 controller-0.080
 wand-0.080
ider-0.079
Token together
Token ID1978
Feature activation+1.59e-02

Pos logit contributions
 asked+0.012
 Jane+0.012
 tidy+0.011
 John+0.010
 clean+0.010
Neg logit contributions
cher-0.017
ider-0.017
ses-0.017
robe-0.016
 trigger-0.016
Token.
Token ID13
Feature activation+9.25e-03

Pos logit contributions
 asked+0.024
 Jane+0.024
 tidy+0.023
agg+0.023
 Baby+0.022
Neg logit contributions
cher-0.021
ider-0.021
robe-0.020
ses-0.020
hered-0.019
Token Suddenly
Token ID24975
Feature activation-6.44e-03

Pos logit contributions
 tidy+0.013
agg+0.012
by+0.012
 asked+0.012
 clean+0.011
Neg logit contributions
robe-0.013
ses-0.012
hered-0.012
 Have-0.012
ider-0.012
Token,
Token ID11
Feature activation-7.71e-03

Pos logit contributions
cher+0.011
ses+0.011
ider+0.011
hered+0.011
robe+0.011
Neg logit contributions
 asked-0.007
 tidy-0.007
 Jane-0.006
Jane-0.006
agg-0.005
Token bury
Token ID30033
Feature activation+1.98e-02

Pos logit contributions
 Jane+0.027
 asked+0.027
 Katie+0.025
agg+0.025
Jane+0.025
Neg logit contributions
cher-0.025
ses-0.025
ider-0.025
 trigger-0.025
pper-0.024
Token the
Token ID262
Feature activation+2.68e-02

Pos logit contributions
 asked+0.023
 Jane+0.022
 tidy+0.021
agg+0.019
Jane+0.019
Neg logit contributions
cher-0.035
robe-0.034
 trigger-0.034
ses-0.034
ider-0.033
Token corn
Token ID11676
Feature activation+2.95e-03

Pos logit contributions
 Jane+0.032
 asked+0.031
Jane+0.028
 tidy+0.027
 John+0.027
Neg logit contributions
ses-0.047
ider-0.047
 trigger-0.047
cher-0.046
 loser-0.045
Token.
Token ID13
Feature activation-1.15e-03

Pos logit contributions
 asked+0.004
 Jane+0.004
 tidy+0.004
 clean+0.003
 John+0.003
Neg logit contributions
cher-0.005
ses-0.005
ider-0.005
 spoiler-0.005
robe-0.005
Token The
Token ID383
Feature activation+9.50e-03

Pos logit contributions
ses+0.002
cher+0.002
robe+0.002
ider+0.002
 spoiler+0.002
Neg logit contributions
 asked-0.001
 tidy-0.001
 Jane-0.001
agg-0.001
 clean-0.001
Token farmer
Token ID18739
Feature activation+5.24e-02

Pos logit contributions
 asked+0.011
 Jane+0.011
 tidy+0.011
 John+0.009
 clean+0.009
Neg logit contributions
cher-0.017
ider-0.017
ses-0.017
robe-0.016
 trigger-0.016
Token was
Token ID373
Feature activation+1.47e-03

Pos logit contributions
 Jane+0.056
 tidy+0.052
 Katie+0.051
agg+0.050
Jane+0.049
Neg logit contributions
cher-0.093
pper-0.087
robe-0.087
 shooter-0.086
ses-0.085
Token so
Token ID523
Feature activation+8.98e-03

Pos logit contributions
 Jane+0.002
 asked+0.002
 tidy+0.002
agg+0.002
by+0.001
Neg logit contributions
cher-0.003
ider-0.002
ses-0.002
 spoiler-0.002
robe-0.002
Token happy
Token ID3772
Feature activation+4.98e-03

Pos logit contributions
 Jane+0.011
 asked+0.010
 pic+0.010
idi+0.010
 John+0.010
Neg logit contributions
cher-0.014
ider-0.014
ses-0.014
 loser-0.014
robe-0.014
Token that
Token ID326
Feature activation-1.00e-02

Pos logit contributions
 asked+0.006
 Jane+0.006
 tidy+0.005
agg+0.005
 John+0.005
Neg logit contributions
cher-0.009
ider-0.009
ses-0.009
robe-0.009
 trigger-0.008
Token he
Token ID339
Feature activation+2.36e-02

Pos logit contributions
cher+0.018
ider+0.017
ses+0.017
robe+0.017
 spoiler+0.017
Neg logit contributions
 asked-0.012
 Jane-0.012
 tidy-0.011
agg-0.010
Jane-0.010
Token named
Token ID3706
Feature activation-2.05e-03

Pos logit contributions
cher+0.010
ider+0.010
ses+0.010
robe+0.010
 spoiler+0.010
Neg logit contributions
 Jane-0.011
 asked-0.011
 tidy-0.010
 John-0.010
agg-0.010
Token Lily
Token ID20037
Feature activation-6.76e-03

Pos logit contributions
ider+0.004
ses+0.003
robe+0.003
cher+0.003
hered+0.003
Neg logit contributions
 asked-0.002
 tidy-0.002
 Jane-0.002
agg-0.002
 admired-0.002
Token.
Token ID13
Feature activation-6.49e-03

Pos logit contributions
cher+0.012
ider+0.012
ses+0.011
 trigger+0.011
 spoiler+0.011
Neg logit contributions
 asked-0.009
 Jane-0.009
 tidy-0.009
 clean-0.008
 John-0.008
Token She
Token ID1375
Feature activation+3.42e-02

Pos logit contributions
ider+0.012
ses+0.012
robe+0.012
cher+0.011
 spoiler+0.011
Neg logit contributions
 asked-0.007
 Jane-0.007
 tidy-0.007
agg-0.006
 John-0.006
Token loved
Token ID6151
Feature activation+4.61e-02

Pos logit contributions
 Jane+0.031
 asked+0.029
 tidy+0.027
 John+0.025
 Katie+0.025
Neg logit contributions
cher-0.069
ses-0.069
robe-0.069
ider-0.069
hered-0.066
Token ice
Token ID4771
Feature activation+5.20e-02

Pos logit contributions
agg+0.057
Jane+0.050
 Jane+0.049
idi+0.046
Jamie+0.045
Neg logit contributions
 sax-0.076
ses-0.074
 trump-0.071
 CD-0.070
 robber-0.070
Token-
Token ID12
Feature activation+7.51e-02

Pos logit contributions
 asked+0.052
 Jane+0.050
 Katie+0.047
idi+0.045
 Jessie+0.045
Neg logit contributions
robe-0.102
ider-0.100
cher-0.098
pper-0.096
ses-0.095
Tokencream
Token ID36277
Feature activation+2.26e-02

Pos logit contributions
 Jane+0.113
 asked+0.109
 Katie+0.107
 Jessie+0.104
 tidy+0.102
Neg logit contributions
pper-0.102
plets-0.100
cher-0.098
ider-0.096
 trigger-0.096
Token more
Token ID517
Feature activation+5.56e-02

Pos logit contributions
 asked+0.030
 Jane+0.029
 tidy+0.028
agg+0.026
Jane+0.026
Neg logit contributions
ider-0.036
robe-0.036
cher-0.036
ses-0.036
 spoiler-0.034
Token than
Token ID621
Feature activation+1.49e-02

Pos logit contributions
 Jane+0.100
 asked+0.098
idi+0.095
agg+0.092
Jane+0.092
Neg logit contributions
ider-0.054
pper-0.050
robe-0.050
 shooter-0.049
 sax-0.047
Token anything
Token ID1997
Feature activation+2.76e-02

Pos logit contributions
 asked+0.020
 Jane+0.019
 tidy+0.019
agg+0.018
Jane+0.017
Neg logit contributions
ses-0.023
ider-0.023
cher-0.023
robe-0.022
 trigger-0.022
Token who
Token ID508
Feature activation+1.10e-02

Pos logit contributions
ider+0.015
cher+0.015
ses+0.015
robe+0.014
pper+0.014
Neg logit contributions
 Jane-0.019
 asked-0.019
 tidy-0.018
agg-0.018
 John-0.017
Token wanted
Token ID2227
Feature activation+1.95e-02

Pos logit contributions
 Jane+0.009
 asked+0.009
 tidy+0.008
agg+0.007
 John+0.007
Neg logit contributions
ses-0.023
robe-0.023
cher-0.022
ider-0.022
hered-0.022
Token a
Token ID257
Feature activation+2.29e-02

Pos logit contributions
agg+0.016
ere+0.015
 asked+0.015
 Jane+0.015
Jane+0.014
Neg logit contributions
ses-0.035
robe-0.035
 CD-0.035
cher-0.035
 sax-0.034
Token haircut
Token ID45548
Feature activation+1.15e-03

Pos logit contributions
 asked+0.032
 Jane+0.032
 tidy+0.031
 clean+0.027
 John+0.027
Neg logit contributions
cher-0.036
ider-0.036
ses-0.035
robe-0.034
 trigger-0.034
Token.
Token ID13
Feature activation+2.18e-02

Pos logit contributions
 Jane+0.001
 asked+0.001
 tidy+0.001
 to+0.001
 John+0.001
Neg logit contributions
 trigger-0.002
ses-0.002
 spoiler-0.002
cher-0.002
plets-0.002
Token She
Token ID1375
Feature activation+5.24e-02

Pos logit contributions
agg+0.035
idi+0.032
 asked+0.032
 Todd+0.032
 pic+0.031
Neg logit contributions
robe-0.028
ses-0.026
 spoiler-0.024
 Have-0.024
ider-0.023
Token asked
Token ID1965
Feature activation+2.74e-02

Pos logit contributions
 Jane+0.078
 Maggie+0.071
 Katie+0.068
 Jim+0.068
 Jessie+0.067
Neg logit contributions
robe-0.068
bee-0.067
pper-0.066
 cape-0.065
ider-0.065
Token her
Token ID607
Feature activation+2.80e-02

Pos logit contributions
agg+0.039
 topp+0.038
 pic+0.038
idi+0.038
Jamie+0.037
Neg logit contributions
robe-0.030
ses-0.029
hered-0.029
 robber-0.027
 villain-0.027
Token mom
Token ID1995
Feature activation+3.36e-02

Pos logit contributions
 Jane+0.034
 asked+0.032
Jane+0.031
 tidy+0.030
agg+0.030
Neg logit contributions
ses-0.046
cher-0.045
robe-0.045
ider-0.043
 wand-0.043
Token if
Token ID611
Feature activation+1.85e-02

Pos logit contributions
agg+0.051
 Jane+0.048
 asked+0.047
Jane+0.047
 tidy+0.047
Neg logit contributions
cher-0.043
robe-0.043
pper-0.040
 shooter-0.038
ider-0.038
Token she
Token ID673
Feature activation+3.63e-02

Pos logit contributions
 Jane+0.030
 tidy+0.029
agg+0.029
Jane+0.029
 asked+0.028
Neg logit contributions
cher-0.022
ses-0.021
robe-0.020
ider-0.020
 villain-0.019
Token some
Token ID617
Feature activation+1.82e-02

Pos logit contributions
cher+0.003
ses+0.003
robe+0.003
 trigger+0.003
ider+0.003
Neg logit contributions
 Jane-0.002
 asked-0.002
 tidy-0.002
agg-0.002
Jane-0.002
Token crunch
Token ID23527
Feature activation+3.46e-02

Pos logit contributions
 asked+0.018
 Jane+0.018
 tidy+0.016
 John+0.014
 Katie+0.014
Neg logit contributions
cher-0.035
ider-0.035
ses-0.035
robe-0.034
 trigger-0.034
Tokeny
Token ID88
Feature activation+1.64e-03

Pos logit contributions
 asked+0.045
 tidy+0.043
 Jane+0.043
agg+0.037
 John+0.037
Neg logit contributions
cher-0.056
robe-0.056
 trigger-0.054
ider-0.054
pper-0.054
Token carrot
Token ID40562
Feature activation+6.31e-03

Pos logit contributions
 asked+0.002
 Jane+0.002
 tidy+0.002
 John+0.002
agg+0.002
Neg logit contributions
cher-0.003
ses-0.003
ider-0.003
robe-0.003
 trigger-0.003
Token sticks
Token ID16461
Feature activation-1.45e-02

Pos logit contributions
 asked+0.011
 Jane+0.011
 tidy+0.010
 John+0.009
Jane+0.009
Neg logit contributions
cher-0.008
ider-0.008
ses-0.008
robe-0.008
 trigger-0.007
Token.
Token ID13
Feature activation+9.28e-03

Pos logit contributions
cher+0.027
ses+0.027
ider+0.027
 spoiler+0.027
 trigger+0.026
Neg logit contributions
 asked-0.016
 Jane-0.016
 tidy-0.015
 clean-0.013
 to-0.013
Token They
Token ID1119
Feature activation+1.36e-02

Pos logit contributions
 tidy+0.013
 asked+0.012
agg+0.011
Jane+0.011
 Jane+0.011
Neg logit contributions
ses-0.015
ider-0.014
robe-0.014
 trigger-0.013
 spoiler-0.013
Token were
Token ID547
Feature activation+1.24e-02

Pos logit contributions
 Jane+0.016
 tidy+0.015
agg+0.014
 asked+0.014
Jane+0.014
Neg logit contributions
cher-0.023
pper-0.022
robe-0.022
ses-0.021
 spoiler-0.021
Token very
Token ID845
Feature activation+1.88e-02

Pos logit contributions
by+0.018
 Jane+0.017
agg+0.017
 asked+0.017
Kat+0.017
Neg logit contributions
cher-0.015
ses-0.015
 shooter-0.014
robe-0.014
ider-0.014
Token excited
Token ID6568
Feature activation+3.14e-03

Pos logit contributions
aman+0.030
assembled+0.029
 Katie+0.028
 John+0.028
idi+0.028
Neg logit contributions
ider-0.023
 loser-0.020
balls-0.020
ses-0.020
cher-0.020
Token to
Token ID284
Feature activation+1.21e-02

Pos logit contributions
 asked+0.003
 tidy+0.003
 Jane+0.003
agg+0.003
 Baby+0.002
Neg logit contributions
ider-0.006
cher-0.006
robe-0.006
ses-0.006
hered-0.006
Token it
Token ID340
Feature activation-2.46e-02

Pos logit contributions
cher+0.010
ider+0.010
ses+0.010
robe+0.010
 spoiler+0.010
Neg logit contributions
 asked-0.004
 Jane-0.004
 tidy-0.003
 John-0.003
 clean-0.003
Token.
Token ID13
Feature activation+1.35e-03

Pos logit contributions
ses+0.051
plets+0.051
 spoiler+0.051
cher+0.051
robe+0.049
Neg logit contributions
 asked-0.026
 Jane-0.025
 tidy-0.023
 to-0.023
 clean-0.022
Token They
Token ID1119
Feature activation-5.43e-03

Pos logit contributions
 tidy+0.002
 asked+0.002
agg+0.002
 desserts+0.002
by+0.002
Neg logit contributions
ses-0.002
ider-0.002
 spoiler-0.002
 Have-0.002
robe-0.002
Token heard
Token ID2982
Feature activation-2.46e-02

Pos logit contributions
cher+0.010
ses+0.010
ider+0.010
 trigger+0.010
pper+0.010
Neg logit contributions
 Jane-0.006
 tidy-0.006
 asked-0.005
 John-0.005
Jane-0.005
Token a
Token ID257
Feature activation+1.07e-02

Pos logit contributions
ses+0.038
cher+0.038
 trigger+0.037
ider+0.036
 spoiler+0.036
Neg logit contributions
 asked-0.034
 Jane-0.033
 tidy-0.033
agg-0.029
Jane-0.029
Token buzzing
Token ID41719
Feature activation+7.98e-03

Pos logit contributions
 asked+0.014
 Jane+0.013
 tidy+0.013
 John+0.011
 Katie+0.011
Neg logit contributions
cher-0.018
ider-0.018
ses-0.018
 trigger-0.017
robe-0.017
Token sound
Token ID2128
Feature activation-2.33e-02

Pos logit contributions
 asked+0.009
 Jane+0.009
 tidy+0.008
 clean+0.007
Jane+0.007
Neg logit contributions
cher-0.016
ider-0.016
ses-0.015
 spoiler-0.015
robe-0.015
Token in
Token ID287
Feature activation+1.75e-02

Pos logit contributions
ses+0.050
cher+0.049
ider+0.048
 spoiler+0.047
robe+0.047
Neg logit contributions
 asked-0.024
 Jane-0.023
 tidy-0.021
 to-0.020
 clean-0.019
Token their
Token ID511
Feature activation+1.34e-02

Pos logit contributions
 asked+0.013
 Jane+0.013
 tidy+0.012
 John+0.010
 clean+0.009
Neg logit contributions
cher-0.039
ider-0.039
ses-0.039
robe-0.038
 trigger-0.038
Token ears
Token ID11368
Feature activation-1.78e-02

Pos logit contributions
 Jane+0.022
 asked+0.022
Jane+0.021
 Katie+0.020
 John+0.020
Neg logit contributions
ses-0.020
 trigger-0.017
robe-0.017
 spoiler-0.017
 wand-0.016
Token.
Token ID13
Feature activation+2.97e-05

Pos logit contributions
cher+0.032
ses+0.031
ider+0.031
robe+0.030
 spoiler+0.030
Neg logit contributions
 asked-0.022
 Jane-0.022
 tidy-0.020
 John-0.018
 clean-0.018
Token
Token ID198
Feature activation+9.24e-03

Pos logit contributions
cher+0.011
ses+0.011
ider+0.011
robe+0.011
 spoiler+0.011
Neg logit contributions
 asked-0.006
 Jane-0.006
 tidy-0.006
agg-0.005
 clean-0.005
Token
Token ID198
Feature activation-6.17e-03

Pos logit contributions
 Jane+0.010
 asked+0.010
 tidy+0.010
agg+0.008
 clean+0.008
Neg logit contributions
cher-0.017
ider-0.016
robe-0.016
ses-0.016
hered-0.016
TokenFinally
Token ID11158
Feature activation-6.18e-03

Pos logit contributions
ses+0.013
cher+0.012
ider+0.012
robe+0.011
 trigger+0.011
Neg logit contributions
 asked-0.007
 tidy-0.007
 Jane-0.006
Jane-0.006
 clean-0.006
Token they
Token ID484
Feature activation+8.14e-04

Pos logit contributions
cher+0.010
hered+0.010
ider+0.010
 shooter+0.010
 loser+0.010
Neg logit contributions
 tidy-0.007
 asked-0.007
 Jane-0.007
Jane-0.006
agg-0.006
Token reached
Token ID4251
Feature activation-3.45e-03

Pos logit contributions
 Jane+0.001
 tidy+0.001
 asked+0.001
Jane+0.001
 Katie+0.001
Neg logit contributions
cher-0.001
pper-0.001
hered-0.001
ses-0.001
 trigger-0.001
Token a
Token ID257
Feature activation+1.08e-02

Pos logit contributions
ses+0.007
cher+0.007
pper+0.007
ider+0.007
hered+0.007
Neg logit contributions
 asked-0.003
 Jane-0.003
 tidy-0.003
Jane-0.003
agg-0.003
Token good
Token ID922
Feature activation+2.75e-02

Pos logit contributions
 Jane+0.014
 asked+0.013
agg+0.012
 tidy+0.012
 John+0.012
Neg logit contributions
cher-0.018
 trigger-0.018
 spoiler-0.017
ses-0.017
 loser-0.017
Token spot
Token ID4136
Feature activation+5.50e-03

Pos logit contributions
 Jane+0.039
 asked+0.039
Jane+0.035
 tidy+0.035
agg+0.034
Neg logit contributions
 shooter-0.041
 trigger-0.040
ses-0.040
 loser-0.040
hered-0.040
Token to
Token ID284
Feature activation+9.72e-03

Pos logit contributions
 Jane+0.004
 asked+0.004
 tidy+0.003
 John+0.003
 to+0.003
Neg logit contributions
ses-0.012
cher-0.012
 spoiler-0.012
 trigger-0.012
ider-0.012
Token fish
Token ID5916
Feature activation+3.84e-04

Pos logit contributions
 asked+0.015
 Jane+0.015
 tidy+0.015
 John+0.013
Jane+0.013
Neg logit contributions
cher-0.014
ses-0.013
ider-0.013
 trigger-0.013
robe-0.013
Token and
Token ID290
Feature activation-1.71e-02

Pos logit contributions
 asked+0.000
 Jane+0.000
 tidy+0.000
 clean+0.000
 to+0.000
Neg logit contributions
ses-0.001
ider-0.001
 spoiler-0.001
 trigger-0.001
cher-0.001
Token the
Token ID262
Feature activation+4.88e-02

Pos logit contributions
 asked+0.058
agg+0.058
 tidy+0.057
Kat+0.057
idi+0.056
Neg logit contributions
ses-0.063
robe-0.059
 trigger-0.059
hered-0.058
 spoiler-0.057
Token strawberries
Token ID36973
Feature activation+6.61e-03

Pos logit contributions
 asked+0.052
 Jane+0.050
 tidy+0.047
 John+0.046
 Katie+0.042
Neg logit contributions
ses-0.086
 trigger-0.083
ider-0.082
robe-0.082
hered-0.081
Token.
Token ID13
Feature activation+1.57e-02

Pos logit contributions
 asked+0.010
 Jane+0.010
 tidy+0.009
 John+0.009
agg+0.008
Neg logit contributions
cher-0.010
ider-0.010
robe-0.009
ses-0.009
 trigger-0.009
Token The
Token ID383
Feature activation+1.81e-02

Pos logit contributions
by+0.021
 tidy+0.020
 asked+0.020
agg+0.019
aul+0.019
Neg logit contributions
ses-0.026
ider-0.024
 spoiler-0.022
 Have-0.022
robe-0.022
Token strawberries
Token ID36973
Feature activation+2.81e-02

Pos logit contributions
 asked+0.024
 Jane+0.023
 tidy+0.022
 John+0.021
 Katie+0.020
Neg logit contributions
ider-0.030
ses-0.030
cher-0.029
robe-0.029
 trigger-0.028
Token were
Token ID547
Feature activation+2.07e-02

Pos logit contributions
 Jane+0.038
 John+0.037
 tidy+0.036
 Katie+0.036
Kat+0.035
Neg logit contributions
robe-0.043
cher-0.040
hered-0.039
pper-0.039
ider-0.038
Token sweet
Token ID6029
Feature activation+3.14e-02

Pos logit contributions
 Jane+0.027
old+0.026
 Katie+0.026
Kat+0.026
to+0.025
Neg logit contributions
ses-0.029
pper-0.028
robe-0.028
 spoiler-0.027
 floating-0.027
Token and
Token ID290
Feature activation+2.21e-02

Pos logit contributions
 asked+0.047
 tidy+0.047
idi+0.046
 Jane+0.045
agg+0.045
Neg logit contributions
hered-0.038
ses-0.038
ider-0.038
robe-0.036
plets-0.035
Token y
Token ID331
Feature activation+3.83e-02

Pos logit contributions
old+0.028
to+0.027
Kat+0.026
agg+0.026
ere+0.025
Neg logit contributions
ses-0.034
 spoiler-0.031
robe-0.031
pper-0.031
plets-0.030
Tokenummy
Token ID13513
Feature activation+3.04e-02

Pos logit contributions
 asked+0.043
 Jane+0.037
 tidy+0.036
idi+0.036
 admired+0.034
Neg logit contributions
ider-0.073
hered-0.072
plets-0.068
ses-0.068
 trigger-0.068
Token.
Token ID13
Feature activation+1.43e-02

Pos logit contributions
 asked+0.049
 Jane+0.048
 tidy+0.048
agg+0.045
Jane+0.044
Neg logit contributions
ider-0.038
ses-0.038
cher-0.038
robe-0.037
hered-0.036
Token cookie
Token ID19751
Feature activation+1.86e-02

Pos logit contributions
 Jane+0.057
 asked+0.055
 John+0.054
 Todd+0.054
agg+0.054
Neg logit contributions
ider-0.048
cher-0.046
 wand-0.046
 CD-0.046
ses-0.046
Token.
Token ID13
Feature activation+1.52e-02

Pos logit contributions
 asked+0.024
 Jane+0.024
 tidy+0.023
 clean+0.020
 John+0.020
Neg logit contributions
ses-0.032
cher-0.032
ider-0.031
 spoiler-0.031
 trigger-0.031
Token Her
Token ID2332
Feature activation+1.91e-02

Pos logit contributions
agg+0.026
 tidy+0.026
Jess+0.024
 desserts+0.024
Jamie+0.024
Neg logit contributions
robe-0.017
ses-0.016
hered-0.015
ula-0.015
 CD-0.015
Token mom
Token ID1995
Feature activation+4.06e-02

Pos logit contributions
 Jane+0.025
 asked+0.023
agg+0.023
Jane+0.023
 Maggie+0.022
Neg logit contributions
ses-0.032
robe-0.030
ider-0.028
 loser-0.028
 trigger-0.028
Token said
Token ID531
Feature activation+1.74e-02

Pos logit contributions
 Jane+0.041
 tidy+0.040
 pic+0.039
agg+0.037
 Todd+0.037
Neg logit contributions
cher-0.073
pper-0.070
ider-0.068
hered-0.067
ses-0.066
Token,
Token ID11
Feature activation+2.13e-02

Pos logit contributions
 asked+0.037
 Jane+0.036
 tidy+0.036
agg+0.033
 admired+0.033
Neg logit contributions
cher-0.015
ider-0.014
ses-0.013
robe-0.013
hered-0.012
Token "
Token ID366
Feature activation+1.19e-02

Pos logit contributions
 tidy+0.030
 asked+0.030
 Jane+0.029
agg+0.028
by+0.027
Neg logit contributions
cher-0.031
ider-0.031
ses-0.030
robe-0.029
hered-0.028
TokenYes
Token ID5297
Feature activation+2.53e-02

Pos logit contributions
 asked+0.016
 Jane+0.016
 tidy+0.015
 John+0.014
agg+0.013
Neg logit contributions
cher-0.019
ider-0.019
ses-0.019
robe-0.018
 trigger-0.018
Token,
Token ID11
Feature activation+2.06e-02

Pos logit contributions
 asked+0.023
 Jane+0.022
 tidy+0.021
 admired+0.018
agg+0.017
Neg logit contributions
ider-0.052
cher-0.051
ses-0.050
 trigger-0.050
 spoiler-0.049
Token but
Token ID475
Feature activation+2.14e-02

Pos logit contributions
 asked+0.031
 Jane+0.030
 admired+0.029
agg+0.028
 tidy+0.028
Neg logit contributions
cher-0.028
ider-0.027
ses-0.027
pper-0.026
 trigger-0.026
Token you
Token ID345
Feature activation+1.60e-02

Pos logit contributions
agg+0.036
Kat+0.036
Jane+0.036
 Jane+0.035
old+0.035
Neg logit contributions
ses-0.024
ider-0.022
 loser-0.022
pper-0.022
cher-0.021
Token know
Token ID760
Feature activation+2.28e-02

Pos logit contributions
 asked+0.019
 Jane+0.019
 tidy+0.018
 John+0.016
agg+0.016
Neg logit contributions
cher-0.019
ider-0.018
ses-0.018
robe-0.018
 trigger-0.018
Token how
Token ID703
Feature activation+3.66e-02

Pos logit contributions
 asked+0.028
 Jane+0.027
 tidy+0.026
agg+0.024
Jane+0.023
Neg logit contributions
cher-0.039
ider-0.037
 spoiler-0.036
robe-0.036
ses-0.036
Token tall
Token ID7331
Feature activation+3.20e-02

Pos logit contributions
 asked+0.032
 Jane+0.032
agg+0.030
Jane+0.029
idi+0.028
Neg logit contributions
cher-0.068
ider-0.065
 shooter-0.064
 AW-0.061
 Have-0.061
Token he
Token ID339
Feature activation+2.50e-02

Pos logit contributions
 Jane+0.039
 asked+0.038
 tidy+0.038
agg+0.035
Jane+0.034
Neg logit contributions
cher-0.053
 trigger-0.052
ses-0.052
ider-0.051
 spoiler-0.051
Token was
Token ID373
Feature activation+5.06e-03

Pos logit contributions
 Jane+0.030
 tidy+0.027
 asked+0.026
 Baby+0.026
agg+0.025
Neg logit contributions
cher-0.046
ider-0.043
ses-0.042
robe-0.042
pper-0.042
Token!
Token ID0
Feature activation-1.26e-04

Pos logit contributions
 Jane+0.007
 asked+0.007
 tidy+0.006
agg+0.006
 Baby+0.006
Neg logit contributions
cher-0.008
ider-0.008
ses-0.008
robe-0.008
hered-0.008
Token
Token ID220
Feature activation-7.29e-03

Pos logit contributions
cher+0.000
ses+0.000
ider+0.000
robe+0.000
 spoiler+0.000
Neg logit contributions
 asked-0.000
 Jane-0.000
 tidy-0.000
 clean-0.000
agg-0.000
Token
Token ID198
Feature activation+1.06e-02

Pos logit contributions
cher+0.015
ider+0.015
ses+0.015
robe+0.014
 trigger+0.014
Neg logit contributions
 asked-0.007
 Jane-0.007
 tidy-0.006
 John-0.005
 clean-0.005
Token
Token ID198
Feature activation-4.93e-03

Pos logit contributions
 Jane+0.011
 asked+0.011
 tidy+0.011
 clean+0.009
 John+0.009
Neg logit contributions
cher-0.020
ider-0.019
robe-0.019
ses-0.018
hered-0.018
TokenFrom
Token ID4863
Feature activation+4.45e-03

Pos logit contributions
ses+0.010
cher+0.010
ider+0.010
robe+0.009
 trigger+0.009
Neg logit contributions
 asked-0.005
 Jane-0.005
 tidy-0.005
Jane-0.005
 clean-0.004
Token that
Token ID326
Feature activation-1.19e-02

Pos logit contributions
 asked+0.006
 Jane+0.006
 tidy+0.005
 John+0.005
Jane+0.005
Neg logit contributions
cher-0.007
ider-0.007
ses-0.007
robe-0.007
 trigger-0.007
Token "
Token ID366
Feature activation-2.05e-03

Pos logit contributions
cher+0.007
ider+0.007
ses+0.007
 trigger+0.006
robe+0.006
Neg logit contributions
 asked-0.004
 Jane-0.004
 tidy-0.004
 John-0.003
 clean-0.003
TokenYes
Token ID5297
Feature activation+7.82e-03

Pos logit contributions
ses+0.004
cher+0.004
ider+0.004
 trigger+0.003
robe+0.003
Neg logit contributions
 asked-0.003
 Jane-0.003
 tidy-0.002
Jane-0.002
 Katie-0.002
Token,
Token ID11
Feature activation+5.44e-03

Pos logit contributions
 Jane+0.006
 asked+0.006
 tidy+0.006
 John+0.005
 clean+0.004
Neg logit contributions
cher-0.017
ses-0.017
ider-0.017
robe-0.017
 trigger-0.017
Token I
Token ID314
Feature activation+9.94e-03

Pos logit contributions
 Jane+0.007
 asked+0.007
 tidy+0.007
 John+0.006
 clean+0.006
Neg logit contributions
ider-0.009
cher-0.009
ses-0.009
robe-0.009
 trigger-0.009
Token do
Token ID466
Feature activation+1.29e-03

Pos logit contributions
 asked+0.013
 Jane+0.013
 tidy+0.012
 clean+0.011
 John+0.011
Neg logit contributions
cher-0.017
ider-0.017
ses-0.016
 trigger-0.016
 spoiler-0.016
Token.
Token ID13
Feature activation-1.43e-03

Pos logit contributions
 Jane+0.002
 asked+0.002
 tidy+0.002
 clean+0.001
 John+0.001
Neg logit contributions
ses-0.002
 trigger-0.002
cher-0.002
robe-0.002
 spoiler-0.002
Token Can
Token ID1680
Feature activation-5.03e-03

Pos logit contributions
cher+0.002
ider+0.002
 trigger+0.002
ses+0.002
pper+0.002
Neg logit contributions
 Jane-0.002
 asked-0.002
 tidy-0.002
 John-0.002
 Baby-0.002
Token I
Token ID314
Feature activation-1.73e-03

Pos logit contributions
cher+0.009
ses+0.009
 trigger+0.008
robe+0.008
ider+0.008
Neg logit contributions
 asked-0.007
 Jane-0.007
 tidy-0.006
 John-0.006
 Dad-0.006
Token carry
Token ID3283
Feature activation+2.45e-02

Pos logit contributions
ses+0.003
cher+0.003
ider+0.003
 trigger+0.003
robe+0.003
Neg logit contributions
 asked-0.002
 tidy-0.002
 Jane-0.002
 clean-0.002
 John-0.002
Token my
Token ID616
Feature activation+6.63e-03

Pos logit contributions
 asked+0.027
 Jane+0.026
 tidy+0.025
 John+0.022
 clean+0.021
Neg logit contributions
cher-0.047
ider-0.046
ses-0.046
robe-0.045
 spoiler-0.045
Token guitar
Token ID10047
Feature activation-3.42e-02

Pos logit contributions
 asked+0.009
 tidy+0.009
 Jane+0.009
 clean+0.008
 Baby+0.008
Neg logit contributions
cher-0.011
ider-0.010
ses-0.010
plets-0.010
pper-0.010
Token't
Token ID470
Feature activation-7.63e-03

Pos logit contributions
ses+0.015
cher+0.015
ider+0.013
pper+0.013
ems+0.013
Neg logit contributions
 Jane-0.012
 asked-0.012
 tidy-0.011
 John-0.011
 clean-0.011
Token know
Token ID760
Feature activation+2.56e-02

Pos logit contributions
ses+0.013
cher+0.013
ider+0.013
robe+0.012
 spoiler+0.012
Neg logit contributions
 asked-0.010
 Jane-0.010
 tidy-0.010
 clean-0.009
 John-0.008
Token either
Token ID2035
Feature activation+1.36e-03

Pos logit contributions
 asked+0.033
 Jane+0.032
 tidy+0.031
agg+0.028
Jane+0.028
Neg logit contributions
cher-0.042
ses-0.041
ider-0.040
robe-0.040
 trigger-0.040
Token.
Token ID13
Feature activation+1.32e-02

Pos logit contributions
 asked+0.002
 Jane+0.002
 tidy+0.002
 John+0.002
 clean+0.002
Neg logit contributions
cher-0.002
ider-0.002
ses-0.002
robe-0.002
 trigger-0.002
Token Suddenly
Token ID24975
Feature activation-1.08e-03

Pos logit contributions
 tidy+0.018
 asked+0.016
 clean+0.016
 pic+0.015
ere+0.015
Neg logit contributions
ider-0.020
robe-0.019
ses-0.019
 spoiler-0.018
 loser-0.018
Token,
Token ID11
Feature activation+9.47e-05

Pos logit contributions
cher+0.002
ider+0.002
ses+0.002
robe+0.002
hered+0.002
Neg logit contributions
 tidy-0.001
 asked-0.001
 Jane-0.001
Jane-0.001
 clean-0.001
Token Danny
Token ID15105
Feature activation+5.07e-03

Pos logit contributions
 asked+0.000
 Jane+0.000
 tidy+0.000
 clean+0.000
Jane+0.000
Neg logit contributions
cher-0.000
ider-0.000
ses-0.000
robe-0.000
 trigger-0.000
Token saw
Token ID2497
Feature activation+4.18e-04

Pos logit contributions
 Jane+0.006
 tidy+0.005
 asked+0.005
Jane+0.005
 John+0.005
Neg logit contributions
cher-0.009
ider-0.009
pper-0.009
hered-0.009
 trigger-0.009
Token it
Token ID340
Feature activation-2.08e-02

Pos logit contributions
 asked+0.001
 Jane+0.001
 tidy+0.001
agg+0.001
 John+0.001
Neg logit contributions
cher-0.001
ses-0.001
ider-0.001
robe-0.001
 trigger-0.001
Token!
Token ID0
Feature activation+7.80e-03

Pos logit contributions
ses+0.045
cher+0.045
 spoiler+0.044
ider+0.043
 trigger+0.043
Neg logit contributions
 asked-0.020
 Jane-0.020
 tidy-0.018
 clean-0.016
 John-0.015
Token Sour
Token ID47694
Feature activation+3.69e-03

Pos logit contributions
 tidy+0.011
 asked+0.010
 admired+0.010
 homes+0.010
 clean+0.010
Neg logit contributions
robe-0.012
ider-0.011
ses-0.011
 spoiler-0.011
 loser-0.010
Token started
Token ID2067
Feature activation+1.72e-02

Pos logit contributions
 Jane+0.018
 asked+0.017
 tidy+0.016
agg+0.015
 John+0.015
Neg logit contributions
cher-0.030
ider-0.028
robe-0.028
ses-0.028
 spoiler-0.027
Token to
Token ID284
Feature activation+5.20e-03

Pos logit contributions
 Jane+0.014
agg+0.014
 asked+0.013
 tidy+0.012
Jane+0.012
Neg logit contributions
cher-0.034
ses-0.034
 shooter-0.033
hered-0.033
ider-0.033
Token spin
Token ID7906
Feature activation-1.14e-02

Pos logit contributions
 Jane+0.008
 asked+0.008
 tidy+0.007
agg+0.007
Jane+0.007
Neg logit contributions
cher-0.007
ses-0.007
 spoiler-0.007
ider-0.007
robe-0.007
Token around
Token ID1088
Feature activation-1.66e-03

Pos logit contributions
ses+0.028
ider+0.028
cher+0.028
pper+0.027
robe+0.026
Neg logit contributions
 asked-0.008
 Jane-0.007
 tidy-0.007
 clean-0.006
 to-0.005
Token and
Token ID290
Feature activation-2.63e-02

Pos logit contributions
cher+0.003
ses+0.003
ider+0.003
robe+0.003
 trigger+0.003
Neg logit contributions
 asked-0.002
 Jane-0.002
 tidy-0.002
 John-0.002
 clean-0.001
Token around
Token ID1088
Feature activation-5.35e-03

Pos logit contributions
cher+0.046
ider+0.046
ses+0.045
robe+0.044
 trigger+0.044
Neg logit contributions
 asked-0.034
 Jane-0.033
 tidy-0.031
 clean-0.028
 John-0.027
Token.
Token ID13
Feature activation+1.29e-02

Pos logit contributions
cher+0.009
ses+0.009
ider+0.009
robe+0.009
pper+0.009
Neg logit contributions
 asked-0.007
 Jane-0.007
 tidy-0.006
 John-0.006
 clean-0.006
Token He
Token ID679
Feature activation-3.70e-03

Pos logit contributions
 tidy+0.018
agg+0.017
 asked+0.017
 Jane+0.016
 desserts+0.016
Neg logit contributions
robe-0.019
ses-0.018
 Have-0.018
 spoiler-0.017
ider-0.017
Token spun
Token ID26843
Feature activation-8.61e-03

Pos logit contributions
cher+0.006
ses+0.005
ider+0.005
robe+0.005
 spoiler+0.005
Neg logit contributions
 Jane-0.005
 asked-0.005
 tidy-0.005
 John-0.005
agg-0.005
Token faster
Token ID5443
Feature activation-8.84e-03

Pos logit contributions
cher+0.018
ider+0.018
ses+0.018
robe+0.017
 trigger+0.017
Neg logit contributions
 asked-0.008
 Jane-0.008
 tidy-0.007
 John-0.006
 clean-0.006
Token and
Token ID290
Feature activation-1.09e-02

Pos logit contributions
cher+0.017
ider+0.017
ses+0.017
robe+0.017
 trigger+0.017
Neg logit contributions
 asked-0.009
 Jane-0.009
 tidy-0.008
 John-0.007
 clean-0.007
Token cats
Token ID11875
Feature activation-2.43e-02

Pos logit contributions
cher+0.003
ses+0.003
ider+0.003
robe+0.003
 trigger+0.003
Neg logit contributions
 asked-0.002
 Jane-0.002
 tidy-0.002
agg-0.002
 John-0.002
Token.
Token ID13
Feature activation-1.27e-02

Pos logit contributions
ider+0.044
cher+0.043
ses+0.043
 spoiler+0.043
robe+0.042
Neg logit contributions
 asked-0.030
 Jane-0.028
 tidy-0.027
 clean-0.024
 John-0.024
Token And
Token ID843
Feature activation-1.23e-02

Pos logit contributions
cher+0.025
ses+0.024
ider+0.024
 trigger+0.024
robe+0.023
Neg logit contributions
 Jane-0.014
 asked-0.014
 tidy-0.012
 John-0.011
 Baby-0.011
Token there
Token ID612
Feature activation-1.81e-03

Pos logit contributions
cher+0.021
ses+0.021
ider+0.020
robe+0.020
 trigger+0.020
Neg logit contributions
 asked-0.015
 Jane-0.015
 tidy-0.014
agg-0.013
Jane-0.013
Token are
Token ID389
Feature activation-3.27e-02

Pos logit contributions
ses+0.002
cher+0.002
robe+0.002
ider+0.002
 spoiler+0.002
Neg logit contributions
 asked-0.003
 Jane-0.003
 tidy-0.003
agg-0.003
Jane-0.003
Token no
Token ID645
Feature activation-2.45e-02

Pos logit contributions
cher+0.052
ider+0.051
ses+0.050
robe+0.049
 spoiler+0.049
Neg logit contributions
 asked-0.046
 Jane-0.045
 tidy-0.043
 John-0.039
 clean-0.039
Token other
Token ID584
Feature activation-3.01e-02

Pos logit contributions
cher+0.044
ider+0.042
pper+0.041
robe+0.041
ses+0.040
Neg logit contributions
 asked-0.033
 Jane-0.032
 tidy-0.030
 clean-0.027
 John-0.026
Token par
Token ID1582
Feature activation-5.32e-02

Pos logit contributions
cher+0.052
ider+0.051
pper+0.049
 spoiler+0.048
robe+0.048
Neg logit contributions
 asked-0.043
 Jane-0.040
 tidy-0.039
 clean-0.036
 John-0.035
Tokenrots
Token ID24744
Feature activation-1.63e-02

Pos logit contributions
ses+0.076
 spoiler+0.074
robe+0.071
 trigger+0.065
 CD+0.062
Neg logit contributions
ere-0.072
 asked-0.072
by-0.068
ups-0.068
agg-0.067
Token.
Token ID13
Feature activation-1.48e-02

Pos logit contributions
ses+0.028
 spoiler+0.028
ider+0.028
cher+0.028
robe+0.027
Neg logit contributions
 asked-0.022
 Jane-0.020
 tidy-0.019
 clean-0.017
 John-0.017
Token I
Token ID314
Feature activation-8.85e-03

Pos logit contributions
cher+0.026
ses+0.026
ider+0.026
 trigger+0.025
robe+0.025
Neg logit contributions
 Jane-0.018
 asked-0.018
 tidy-0.016
 John-0.015
 Baby-0.015
Token and
Token ID290
Feature activation+1.08e-02

Pos logit contributions
agg+0.036
 Jane+0.036
 tidy+0.035
 desserts+0.035
Jane+0.034
Neg logit contributions
pper-0.030
robe-0.030
ses-0.028
hered-0.027
cher-0.026
Token Anna
Token ID11735
Feature activation+6.01e-03

Pos logit contributions
 tidy+0.010
agg+0.009
 asked+0.009
 Jane+0.009
ere+0.008
Neg logit contributions
cher-0.020
ses-0.020
hered-0.019
robe-0.019
 shooter-0.019
Token like
Token ID588
Feature activation+1.45e-02

Pos logit contributions
 Jane+0.008
 asked+0.007
 tidy+0.007
Jane+0.006
agg+0.006
Neg logit contributions
ses-0.010
cher-0.010
ider-0.010
robe-0.010
hered-0.009
Token to
Token ID284
Feature activation+2.51e-03

Pos logit contributions
 asked+0.007
 Jane+0.007
 tidy+0.007
agg+0.006
Jane+0.005
Neg logit contributions
ses-0.034
cher-0.034
robe-0.034
ider-0.033
hered-0.033
Token play
Token ID711
Feature activation+4.62e-03

Pos logit contributions
 Jane+0.003
 asked+0.003
 tidy+0.003
 clean+0.003
 John+0.003
Neg logit contributions
cher-0.004
ider-0.004
ses-0.004
 trigger-0.004
robe-0.004
Token soccer
Token ID11783
Feature activation-5.17e-02

Pos logit contributions
 asked+0.005
 tidy+0.004
 Jane+0.004
 clean+0.004
 to+0.003
Neg logit contributions
ses-0.010
cher-0.010
 spoiler-0.009
plets-0.009
ider-0.009
Token.
Token ID13
Feature activation+1.09e-02

Pos logit contributions
plets+0.123
ses+0.121
cher+0.121
 spoiler+0.120
 trigger+0.118
Neg logit contributions
 asked-0.048
 to-0.042
 tidy-0.041
 Jane-0.037
 clean-0.036
Token They
Token ID1119
Feature activation-2.49e-02

Pos logit contributions
 tidy+0.015
 asked+0.015
agg+0.014
 Jane+0.013
 admired+0.013
Neg logit contributions
ses-0.017
 spoiler-0.016
ider-0.016
robe-0.015
 loser-0.015
Token are
Token ID389
Feature activation-2.86e-02

Pos logit contributions
ider+0.047
cher+0.046
ses+0.044
robe+0.043
plets+0.042
Neg logit contributions
 asked-0.037
 tidy-0.034
 Jane-0.032
 clean-0.031
 often-0.029
Token on
Token ID319
Feature activation-2.67e-02

Pos logit contributions
cher+0.043
ses+0.043
ider+0.042
robe+0.041
 spoiler+0.041
Neg logit contributions
 asked-0.041
 Jane-0.041
 tidy-0.039
agg-0.036
 John-0.035
Token the
Token ID262
Feature activation-1.94e-02

Pos logit contributions
cher+0.061
ider+0.059
ses+0.057
 trigger+0.057
robe+0.057
Neg logit contributions
 asked-0.022
 Jane-0.022
 tidy-0.021
 John-0.017
 clean-0.017
Token a
Token ID257
Feature activation+1.63e-02

Pos logit contributions
 asked+0.004
 Jane+0.003
 tidy+0.003
agg+0.003
 admired+0.003
Neg logit contributions
cher-0.005
ses-0.005
ider-0.005
robe-0.005
 spoiler-0.005
Token big
Token ID1263
Feature activation+3.57e-02

Pos logit contributions
 Jane+0.024
 asked+0.024
 John+0.022
agg+0.022
 tidy+0.021
Neg logit contributions
cher-0.024
ider-0.023
 trigger-0.023
robe-0.023
ses-0.023
Token box
Token ID3091
Feature activation-4.89e-04

Pos logit contributions
 asked+0.059
 Jane+0.056
agg+0.055
 tidy+0.052
Jane+0.052
Neg logit contributions
 spoiler-0.045
cher-0.044
ses-0.044
 trigger-0.043
hered-0.042
Token with
Token ID351
Feature activation-2.11e-02

Pos logit contributions
cher+0.001
ses+0.001
ider+0.001
 trigger+0.001
 spoiler+0.001
Neg logit contributions
 asked-0.001
 Jane-0.001
 tidy-0.001
 clean-0.000
 John-0.000
Token a
Token ID257
Feature activation-4.66e-03

Pos logit contributions
cher+0.030
ses+0.030
 trigger+0.029
robe+0.029
 spoiler+0.028
Neg logit contributions
 asked-0.031
 Jane-0.030
 tidy-0.029
agg-0.027
Jane-0.027
Token bow
Token ID9563
Feature activation-5.47e-02

Pos logit contributions
cher+0.008
ider+0.008
ses+0.008
robe+0.007
pper+0.007
Neg logit contributions
 asked-0.006
 Jane-0.006
 tidy-0.006
 clean-0.005
 John-0.005
Token.
Token ID13
Feature activation+5.93e-03

Pos logit contributions
assing+0.119
ses+0.118
 AW+0.118
 spoiler+0.116
cher+0.115
Neg logit contributions
 to-0.049
 asked-0.046
 Jane-0.039
 tidy-0.036
 clean-0.034
Token\n
Token ID198
Feature activation+4.81e-03

Pos logit contributions
 tidy+0.008
 asked+0.007
by+0.007
 clean+0.007
 admired+0.007
Neg logit contributions
ses-0.010
ider-0.009
robe-0.009
cher-0.009
 spoiler-0.009
Token\n
Token ID198
Feature activation-6.39e-03

Pos logit contributions
 asked+0.005
 tidy+0.005
 Jane+0.005
 admired+0.004
 clean+0.004
Neg logit contributions
cher-0.008
ses-0.008
robe-0.008
ider-0.008
 spoiler-0.008
Token"
Token ID1
Feature activation+1.77e-03

Pos logit contributions
ses+0.011
cher+0.011
ider+0.011
robe+0.011
 trigger+0.011
Neg logit contributions
 asked-0.008
 Jane-0.008
 tidy-0.007
Jane-0.007
 John-0.006
TokenHappy
Token ID25082
Feature activation+1.99e-02

Pos logit contributions
 asked+0.002
 Jane+0.002
 tidy+0.002
Jane+0.002
 Katie+0.002
Neg logit contributions
cher-0.003
ses-0.003
ider-0.003
 trigger-0.003
robe-0.003
Token
Token ID220
Feature activation-3.93e-02

Pos logit contributions
cher+0.107
ider+0.106
ses+0.103
 trigger+0.102
pper+0.102
Neg logit contributions
 Jane-0.044
 asked-0.044
 tidy-0.036
 John-0.034
 Katie-0.033
Token\n
Token ID198
Feature activation-3.88e-02

Pos logit contributions
cher+0.085
ider+0.085
ses+0.085
 trigger+0.083
 spoiler+0.083
Neg logit contributions
 asked-0.032
 Jane-0.032
 tidy-0.028
 John-0.024
Jane-0.023
Token\n
Token ID198
Feature activation-4.94e-02

Pos logit contributions
ses+0.093
cher+0.092
ider+0.092
 spoiler+0.090
 trigger+0.090
Neg logit contributions
 asked-0.028
 Jane-0.027
 tidy-0.022
Jane-0.021
 Katie-0.018
TokenThe
Token ID464
Feature activation-3.05e-02

Pos logit contributions
ses+0.116
 spoiler+0.105
cher+0.105
 trigger+0.105
ider+0.103
Neg logit contributions
 asked-0.051
Jane-0.047
 Jane-0.046
 tidy-0.045
Kat-0.041
Token End
Token ID5268
Feature activation-3.25e-02

Pos logit contributions
plets+0.054
pper+0.053
ses+0.053
cher+0.051
hered+0.049
Neg logit contributions
 tidy-0.049
 asked-0.044
 clean-0.044
 organized-0.039
 Jane-0.038
Token.
Token ID13
Feature activation-5.22e-02

Pos logit contributions
 trigger+0.067
ses+0.067
pper+0.065
cher+0.065
 spoiler+0.064
Neg logit contributions
 Jane-0.037
 asked-0.035
 tidy-0.029
 Mary-0.029
Jane-0.029
Token<|endoftext|>
Token ID50256
Feature activation-3.94e-02

Pos logit contributions
ider+0.129
cher+0.128
ses+0.127
 trigger+0.126
 spoiler+0.122
Neg logit contributions
 Jane-0.037
 asked-0.035
 John-0.026
 Katie-0.025
 tidy-0.025
TokenOne
Token ID3198
Feature activation+1.98e-02

Pos logit contributions
ses+0.075
cher+0.074
ider+0.074
 trigger+0.072
 spoiler+0.071
Neg logit contributions
 asked-0.046
 Jane-0.045
 tidy-0.040
 John-0.037
Jane-0.036
Token day
Token ID1110
Feature activation-2.60e-02

Pos logit contributions
 asked+0.020
 Jane+0.019
 tidy+0.018
 John+0.016
agg+0.016
Neg logit contributions
cher-0.038
ider-0.038
ses-0.038
robe-0.037
 spoiler-0.037
Token Mom
Token ID11254
Feature activation+3.38e-02

Pos logit contributions
cher+0.060
ses+0.060
ider+0.059
 spoiler+0.059
 trigger+0.058
Neg logit contributions
 asked-0.019
 Jane-0.018
 tidy-0.016
 John-0.013
 Katie-0.013
Tokenmy
Token ID1820
Feature activation+1.51e-02

Pos logit contributions
 tidy+0.044
agg+0.044
 Jane+0.043
 asked+0.041
Jane+0.041
Neg logit contributions
robe-0.046
cher-0.045
pper-0.044
hered-0.043
 shooter-0.042
Token day
Token ID1110
Feature activation-1.43e-02

Pos logit contributions
 asked+0.012
 Jane+0.012
 tidy+0.011
 John+0.010
agg+0.010
Neg logit contributions
robe-0.015
cher-0.015
ider-0.015
ses-0.015
 spoiler-0.015
Token when
Token ID618
Feature activation-1.14e-02

Pos logit contributions
cher+0.026
ses+0.025
ider+0.025
 spoiler+0.024
 trigger+0.024
Neg logit contributions
 asked-0.018
 Jane-0.017
 tidy-0.016
 John-0.015
 clean-0.014
Token Jake
Token ID14757
Feature activation-8.28e-03

Pos logit contributions
cher+0.024
ider+0.023
ses+0.023
 trigger+0.023
robe+0.023
Neg logit contributions
 Jane-0.011
 asked-0.011
 tidy-0.010
 John-0.009
 Baby-0.009
Token and
Token ID290
Feature activation-1.78e-02

Pos logit contributions
cher+0.015
hered+0.015
pper+0.014
robe+0.014
ider+0.014
Neg logit contributions
 Jane-0.009
 tidy-0.008
 asked-0.008
agg-0.008
Jane-0.007
Token his
Token ID465
Feature activation-1.64e-02

Pos logit contributions
cher+0.033
robe+0.033
 spoiler+0.032
ses+0.032
 shooter+0.032
Neg logit contributions
agg-0.015
 asked-0.015
 tidy-0.015
 Jane-0.015
Jane-0.014
Token dog
Token ID3290
Feature activation-5.16e-02

Pos logit contributions
cher+0.033
ider+0.033
ses+0.032
 trigger+0.032
robe+0.032
Neg logit contributions
 asked-0.016
 Jane-0.015
 tidy-0.015
 John-0.012
 clean-0.012
Token,
Token ID11
Feature activation-4.10e-02

Pos logit contributions
cher+0.102
ider+0.102
ses+0.100
 trigger+0.097
 spoiler+0.097
Neg logit contributions
 asked-0.056
 Jane-0.053
 tidy-0.049
 John-0.044
 clean-0.043
Token F
Token ID376
Feature activation-3.24e-02

Pos logit contributions
ider+0.084
ses+0.081
cher+0.081
 trigger+0.079
 spoiler+0.079
Neg logit contributions
 Jane-0.043
 asked-0.043
 tidy-0.038
 John-0.037
 Baby-0.034
Tokenuzz
Token ID4715
Feature activation-3.28e-02

Pos logit contributions
cher+0.062
 trigger+0.061
ses+0.061
ider+0.060
 spoiler+0.059
Neg logit contributions
 Jane-0.037
 asked-0.036
 tidy-0.034
 John-0.031
 Katie-0.030
Tokeny
Token ID88
Feature activation-5.83e-02

Pos logit contributions
ses+0.067
cher+0.066
ider+0.066
 spoiler+0.065
 trigger+0.064
Neg logit contributions
 asked-0.034
 Jane-0.031
 tidy-0.029
 admired-0.027
 John-0.025
Token,
Token ID11
Feature activation-2.25e-02

Pos logit contributions
cher+0.122
ider+0.119
ses+0.118
 trigger+0.115
 spoiler+0.115
Neg logit contributions
 asked-0.057
 Jane-0.055
 tidy-0.049
 John-0.045
 Baby-0.043
Token,
Token ID11
Feature activation-4.39e-03

Pos logit contributions
cher+0.036
ider+0.036
ses+0.036
hered+0.035
robe+0.035
Neg logit contributions
 asked-0.016
 tidy-0.015
 Jane-0.015
 clean-0.012
Jane-0.012
Token a
Token ID257
Feature activation-2.04e-02

Pos logit contributions
cher+0.008
ider+0.008
ses+0.008
robe+0.007
 trigger+0.007
Neg logit contributions
 asked-0.005
 Jane-0.005
 tidy-0.005
 clean-0.004
agg-0.004
Token gentleman
Token ID22427
Feature activation-3.62e-02

Pos logit contributions
cher+0.039
ider+0.038
ses+0.038
 trigger+0.038
robe+0.038
Neg logit contributions
 Jane-0.021
 asked-0.020
 tidy-0.018
agg-0.017
 John-0.017
Token wearing
Token ID5762
Feature activation-4.47e-02

Pos logit contributions
ses+0.073
cher+0.073
ider+0.073
robe+0.071
 trigger+0.071
Neg logit contributions
 asked-0.035
 Jane-0.034
 tidy-0.031
 John-0.027
 clean-0.026
Token a
Token ID257
Feature activation-1.09e-02

Pos logit contributions
cher+0.077
ider+0.076
ses+0.076
robe+0.073
 trigger+0.073
Neg logit contributions
 asked-0.056
 Jane-0.056
 tidy-0.053
 John-0.047
 clean-0.046
Token hat
Token ID6877
Feature activation-4.99e-02

Pos logit contributions
cher+0.019
ider+0.019
ses+0.019
robe+0.019
 trigger+0.018
Neg logit contributions
 asked-0.013
 Jane-0.013
 tidy-0.012
 clean-0.011
 John-0.011
Token stopped
Token ID5025
Feature activation+3.42e-04

Pos logit contributions
ses+0.112
 loser+0.110
ider+0.109
cher+0.108
 AW+0.108
Neg logit contributions
 asked-0.046
 Jane-0.037
 to-0.036
 tidy-0.032
 John-0.032
Token and
Token ID290
Feature activation-2.03e-02

Pos logit contributions
 asked+0.000
 Jane+0.000
 tidy+0.000
 John+0.000
 clean+0.000
Neg logit contributions
cher-0.001
ses-0.001
ider-0.001
robe-0.001
 trigger-0.001
Token waved
Token ID26834
Feature activation-1.84e-02

Pos logit contributions
cher+0.039
ses+0.036
hered+0.035
ider+0.035
 spoiler+0.035
Neg logit contributions
 tidy-0.019
agg-0.018
 Jane-0.017
Jane-0.016
ere-0.016
Token at
Token ID379
Feature activation+2.38e-02

Pos logit contributions
cher+0.033
 trigger+0.032
ider+0.032
ses+0.031
hered+0.031
Neg logit contributions
 asked-0.020
 Jane-0.020
 tidy-0.019
agg-0.018
Jane-0.018
Token them
Token ID606
Feature activation-3.06e-02

Pos logit contributions
 tidy+0.015
 asked+0.014
 pic+0.011
aul+0.011
 clean+0.011
Neg logit contributions
cher-0.049
 trigger-0.048
hered-0.048
robe-0.048
ses-0.047