TOP ACTIVATIONS
MAX = 0.104

" Ring , ring , ring , we are playing
Ring , ring , ring , we are playing with the
. " Ring , ring , ring , we
ows , blankets , hats , and stickers . They had
plates , forks , knives , and nap kins ," mom
blocks , wheels , sticks , and strings . They wanted
mouths , like " bang , bang , bang ". They
dogs , cats , birds , and monkeys . They laugh
bikes , dogs , birds , and flowers . They say
like " bang , bang , bang ". They ran and
have more flour , eggs , milk and sugar . We
books , toys , clothes , and candy . They smiled
They saw monkeys , birds , and snakes . They did
books , dolls , cars , balls and candy . But
eggs , flour , sugar , milk , and some green
, shoes , scar ves , and belts . They looked
flour , sugar , milk , and some green stuff ,"
bowl with lettuce , tomatoes , cheese and dressing . They
They see bones , pots , coins , paintings , and
We put eggs , flour , sugar , milk , and

MIN ACTIVATIONS
MIN = -0.098

night , a three - year - old child was walking
beach . A three - year - old boy was so
asked the three - year - old .
day , a three - year - old boy was walking
, a little three - year - old girl named Lucy
, a little three - year - old boy named John
, a little 3 - year - old girl found the
day , a three - year - old boy was very
olly was a three - year - old who liked to
was an eager 3 - year - old boy who loved
day , a 3 - year - old child found the
ally was a 3 - year - old girl who had
nie was a three - year - old with bright eyes
was a curious 3 - year - old , eager to
across a little 3 - year - old boy who was
a very curious three - year - old boy . He
Charlie was a three - year - old who loved books
a gr umpy three - year - old . She had
was an eager 3 - year - old . He liked
a very gifted three - year - old . He loved

INTERVAL 0.053 to 0.104
CONTAINS 0.874%

above the mountains and rivers , and even saw some animals
nodded again . They said , " Yes , we want
. " Help , help !" Tom and Mia
. They hugged their dad , and thanked him for the
it at them and said , " A br ac ad

INTERVAL 0.003 to 0.053
CONTAINS 40.072%

't worry , your blocks are safe . You can design
check the kitchen . He said , " Good job ,
you , Lily . You did a good job . You
were bright colours and magical things . A man
gave them the ball and they said goodbye . They ran

INTERVAL -0.048 to 0.003
CONTAINS 58.852%

As the show ended , Tim my 's mom my
the water . One day , Tom has a
mighty jump over a big pile of rocks . Leo was
saw a big , round pie sitting on the counter .
my came to her and asked her what was wrong .

INTERVAL -0.098 to -0.048
CONTAINS 0.201%

way to keep herself safe from Fl uffy . From that
It was the best day ever ! <|endoftext|> Once upon a
scared but she did it because she was very thirsty !
. <|endoftext|> Jimmy was a very lively 3 year old .
before . M oral of the story : Working
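
The four INTERVAL bands look like equal-width slices of the range from MIN (-0.098) to MAX (0.104): each is about 0.050 wide, and the four CONTAINS figures sum to about 100%. That reading is an inference from the numbers above, not something the dashboard states. Below is a minimal sketch of that binning, assuming equal-width intervals. The `interval_shares` helper and the sample activation values are hypothetical and are not the dashboard's code or data.

```python
from typing import List, Tuple


def interval_shares(activations: List[float], lo: float, hi: float,
                    n_bins: int = 4) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: split [lo, hi] into n_bins equal-width
    intervals and return (left_edge, right_edge, percent) per interval.

    Assumes equal-width binning; values outside [lo, hi] are ignored.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        if a < lo or a > hi:
            continue  # outside the displayed range, so not counted
        # min() keeps a value exactly equal to hi in the last interval.
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no activations fall inside [lo, hi]")
    return [
        (lo + i * width, lo + (i + 1) * width, 100.0 * c / total)
        for i, c in enumerate(counts)
    ]


if __name__ == "__main__":
    # Illustrative values only, not the dashboard's real sample.
    sample = [0.104, 0.062, 0.031, 0.012, 0.0, -0.002, -0.02, -0.098]
    for left, right, pct in interval_shares(sample, lo=-0.098, hi=0.104):
        print(f"INTERVAL {left:.3f} to {right:.3f}  CONTAINS {pct:.3f}%")
```

With MIN = -0.098 and MAX = 0.104 the computed edges are -0.098, -0.0475, 0.003, 0.0535 and 0.104. These agree with the displayed edges to within the rounding of MIN and MAX themselves.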
PER-TOKEN DETAIL

Condensed from the per-token hover cards for the top-activation examples. Each card gave the token, its token ID, the feature activation at that token, and the five largest positive and negative logit contributions.

Commas inside lists and repeated phrases carry the strongest activations, roughly 0.073 to 0.104. The maximum of 0.104 falls on a comma in " Ring , ring , ring , we". "and" scores 0.024 to 0.066 and "bang" 0.045 to 0.056. Most list nouns score 0.016 to 0.052, although the two occurrences of "books" that open an example are near zero (-2.43e-03 and +1.37e-03). Sentence-final periods also stay near zero (-3.41e-03 to +7.15e-03), and the capitalized "Ring" that opens the chant is slightly negative (-1.79e-03).

The top positive logit contributions come from a small recurring set (coat, aults, apr, Aud, Ol, skirts), often joined by name-like tokens such as Daniel, Ryan, V and J. The top negative contributions are mostly need, needs, harm, policy, choice and of. Magnitudes range from 0.000 (every entry at "plates") to +0.223 (Ryan at the "and" in "birds , and monkeys"). The most negative entry is -0.180 (need and needs at " bang"). Eight cards reverse this pattern, promoting need and needs and suppressing coat and aults. These are the comma right after "Ring" (three occurrences), "the" in "playing with the", "We" in "sugar . We", the comma after the first "books", "They" in "candy . They smiled", and "They" in "They saw".
Ol+0.003
Aud+0.003
Neg logit contributions
 need-0.002
 needs-0.002
 harm-0.002
 policy-0.002
 choice-0.001
Token dolls
Token ID36062
Feature activation+3.70e-02

Pos logit contributions
coat+0.132
aults+0.131
Ol+0.129
Aud+0.129
Daniel+0.127
Neg logit contributions
 need-0.162
 needs-0.160
 harm-0.155
 policy-0.149
 of-0.148
Token,
Token ID11
Feature activation+9.32e-02

Pos logit contributions
Ol+0.084
 apr+0.084
aults+0.082
V+0.082
La+0.081
Neg logit contributions
 need-0.038
 needs-0.037
 of-0.035
 harm-0.034
 policy-0.033
Token cars
Token ID5006
Feature activation+5.68e-02

Pos logit contributions
coat+0.143
aults+0.142
Aud+0.141
Ryan+0.140
V+0.140
Neg logit contributions
 need-0.163
 needs-0.161
 harm-0.153
 of-0.152
 animal-0.147
Token,
Token ID11
Feature activation+9.61e-02

Pos logit contributions
Ol+0.129
 apr+0.129
aults+0.128
La+0.126
Daniel+0.126
Neg logit contributions
 need-0.058
 needs-0.057
 of-0.054
 harm-0.052
 policy-0.051
Token balls
Token ID11333
Feature activation+3.88e-02

Pos logit contributions
Ryan+0.164
V+0.162
coat+0.161
Daniel+0.160
Aud+0.160
Neg logit contributions
 need-0.153
 needs-0.152
 of-0.145
 times-0.140
 harm-0.139
Token and
Token ID290
Feature activation+6.05e-02

Pos logit contributions
 apr+0.084
Ol+0.084
aults+0.084
Aud+0.082
Daniel+0.082
Neg logit contributions
 need-0.043
 needs-0.042
 harm-0.039
 of-0.039
 policy-0.039
Token candy
Token ID18550
Feature activation+3.14e-02

Pos logit contributions
coat+0.084
aults+0.083
Ol+0.082
Aud+0.082
 apr+0.081
Neg logit contributions
 need-0.112
 needs-0.112
 harm-0.109
 policy-0.106
 choice-0.102
Token.
Token ID13
Feature activation-7.53e-04

Pos logit contributions
coat+0.073
aults+0.072
 apr+0.072
Ol+0.072
Aud+0.071
Neg logit contributions
 need-0.029
 needs-0.029
 harm-0.027
 policy-0.026
 choice-0.024
Token But
Token ID887
Feature activation-1.74e-02

Pos logit contributions
 need+0.001
 needs+0.001
 of+0.001
 day+0.001
 policy+0.001
Neg logit contributions
aults-0.002
coat-0.002
Ol-0.002
Aud-0.002
 apr-0.002
Token eggs
Token ID9653
Feature activation+1.84e-02

Pos logit contributions
 apr+0.066
aults+0.066
coat+0.065
Aud+0.065
Ol+0.065
Neg logit contributions
 need-0.038
 needs-0.038
 harm-0.038
 policy-0.037
 choice-0.034
Token,
Token ID11
Feature activation+8.65e-02

Pos logit contributions
coat+0.038
Aud+0.038
aults+0.038
Ol+0.038
 apr+0.037
Neg logit contributions
 need-0.021
 needs-0.021
 harm-0.020
 policy-0.020
 of-0.019
Token flour
Token ID10601
Feature activation+3.86e-02

Pos logit contributions
aults+0.145
coat+0.145
Aud+0.142
Ryan+0.142
Ol+0.140
Neg logit contributions
 need-0.141
 needs-0.141
 of-0.130
 harm-0.127
 policy-0.127
Token,
Token ID11
Feature activation+9.52e-02

Pos logit contributions
Aud+0.084
aults+0.083
Ol+0.083
coat+0.083
La+0.083
Neg logit contributions
 needs-0.042
 need-0.042
 harm-0.039
 policy-0.037
 of-0.037
Token sugar
Token ID7543
Feature activation+3.84e-02

Pos logit contributions
Ryan+0.176
Aud+0.169
aults+0.169
coat+0.166
Rose+0.164
Neg logit contributions
 need-0.151
 needs-0.150
 of-0.146
 times-0.133
 thing-0.131
Token,
Token ID11
Feature activation+9.60e-02

Pos logit contributions
Aud+0.083
aults+0.083
Ol+0.083
Ryan+0.083
La+0.083
Neg logit contributions
 needs-0.042
 need-0.041
 harm-0.038
 policy-0.037
 of-0.036
Token milk
Token ID7545
Feature activation+4.10e-02

Pos logit contributions
Ryan+0.187
Aud+0.178
aults+0.176
Rose+0.174
Daniel+0.173
Neg logit contributions
 need-0.145
 needs-0.144
 of-0.142
 times-0.127
 thing-0.124
Token,
Token ID11
Feature activation+9.59e-02

Pos logit contributions
Aud+0.089
aults+0.089
Ol+0.089
La+0.089
Ryan+0.089
Neg logit contributions
 needs-0.044
 need-0.044
 harm-0.041
 policy-0.040
 of-0.039
Token and
Token ID290
Feature activation+4.55e-02

Pos logit contributions
Ryan+0.195
Aud+0.184
aults+0.183
Daniel+0.181
Rose+0.181
Neg logit contributions
 need-0.137
 needs-0.136
 of-0.133
 times-0.118
 day-0.116
Token some
Token ID617
Feature activation+2.09e-02

Pos logit contributions
coat+0.070
aults+0.069
Aud+0.068
Ol+0.068
Ryan+0.067
Neg logit contributions
 need-0.079
 needs-0.079
 harm-0.075
 policy-0.073
 of-0.071
Token green
Token ID4077
Feature activation+2.47e-02

Pos logit contributions
 apr+0.031
coat+0.030
aults+0.030
 skirts+0.029
Ol+0.029
Neg logit contributions
 harm-0.036
 needs-0.035
 need-0.035
 policy-0.035
 decision-0.034
Token,
Token ID11
Feature activation+8.71e-02

Pos logit contributions
Ol+0.008
 apr+0.008
aults+0.008
Aud+0.008
Daniel+0.008
Neg logit contributions
 need-0.004
 needs-0.004
 of-0.004
 harm-0.004
 policy-0.004
Token shoes
Token ID10012
Feature activation+1.95e-02

Pos logit contributions
aults+0.119
Daniel+0.115
Ryan+0.115
coat+0.115
Aud+0.112
Neg logit contributions
 need-0.168
 needs-0.166
 of-0.156
 animal-0.156
 harm-0.156
Token,
Token ID11
Feature activation+9.08e-02

Pos logit contributions
 apr+0.042
aults+0.042
Ol+0.042
Aud+0.041
coat+0.041
Neg logit contributions
 need-0.022
 needs-0.021
 harm-0.020
 policy-0.020
 of-0.019
Token scar
Token ID10153
Feature activation+3.17e-02

Pos logit contributions
Ryan+0.132
aults+0.132
Daniel+0.130
V+0.127
Vict+0.126
Neg logit contributions
 need-0.168
 needs-0.166
 of-0.162
 animal-0.156
 times-0.154
Tokenves
Token ID1158
Feature activation+2.72e-02

Pos logit contributions
Ol+0.056
 apr+0.055
Aud+0.055
V+0.055
Ryan+0.054
Neg logit contributions
 of-0.046
 need-0.045
 needs-0.044
 harm-0.043
 times-0.043
Token,
Token ID11
Feature activation+9.59e-02

Pos logit contributions
 apr+0.063
Ol+0.062
aults+0.062
Daniel+0.061
Ryan+0.061
Neg logit contributions
 need-0.027
 needs-0.026
 of-0.025
 harm-0.025
 policy-0.024
Token and
Token ID290
Feature activation+5.27e-02

Pos logit contributions
Ryan+0.184
Daniel+0.179
V+0.178
aults+0.178
Vict+0.175
Neg logit contributions
 needs-0.133
 need-0.133
 of-0.129
 animal-0.121
 times-0.120
Token belts
Token ID28617
Feature activation+1.46e-02

Pos logit contributions
coat+0.070
 apr+0.070
aults+0.069
Ol+0.068
Aud+0.068
Neg logit contributions
 needs-0.099
 need-0.099
 harm-0.098
 policy-0.098
 decision-0.094
Token.
Token ID13
Feature activation+5.85e-03

Pos logit contributions
Ol+0.036
 apr+0.035
aults+0.035
Aud+0.035
coat+0.035
Neg logit contributions
 need-0.012
 needs-0.011
 harm-0.011
 policy-0.010
 of-0.010
Token They
Token ID1119
Feature activation+3.14e-02

Pos logit contributions
coat+0.016
aults+0.016
Ryan+0.015
Ol+0.015
Daniel+0.015
Neg logit contributions
 of-0.004
 need-0.004
 needs-0.004
 thing-0.003
 day-0.003
Token looked
Token ID3114
Feature activation+1.70e-02

Pos logit contributions
Aud+0.057
aults+0.055
Ryan+0.054
coat+0.054
Ol+0.054
Neg logit contributions
 need-0.048
 needs-0.044
 of-0.044
 harm-0.042
 day-0.041
Token flour
Token ID10601
Feature activation+3.86e-02

Pos logit contributions
aults+0.145
coat+0.145
Aud+0.142
Ryan+0.142
Ol+0.140
Neg logit contributions
 need-0.141
 needs-0.141
 of-0.130
 harm-0.127
 policy-0.127
Token,
Token ID11
Feature activation+9.52e-02

Pos logit contributions
Aud+0.084
aults+0.083
Ol+0.083
coat+0.083
La+0.083
Neg logit contributions
 needs-0.042
 need-0.042
 harm-0.039
 policy-0.037
 of-0.037
Token sugar
Token ID7543
Feature activation+3.84e-02

Pos logit contributions
Ryan+0.176
Aud+0.169
aults+0.169
coat+0.166
Rose+0.164
Neg logit contributions
 need-0.151
 needs-0.150
 of-0.146
 times-0.133
 thing-0.131
Token,
Token ID11
Feature activation+9.60e-02

Pos logit contributions
Aud+0.083
aults+0.083
Ol+0.083
Ryan+0.083
La+0.083
Neg logit contributions
 needs-0.042
 need-0.041
 harm-0.038
 policy-0.037
 of-0.036
Token milk
Token ID7545
Feature activation+4.10e-02

Pos logit contributions
Ryan+0.187
Aud+0.178
aults+0.176
Rose+0.174
Daniel+0.173
Neg logit contributions
 need-0.145
 needs-0.144
 of-0.142
 times-0.127
 thing-0.124
Token,
Token ID11
Feature activation+9.59e-02

Pos logit contributions
Aud+0.089
aults+0.089
Ol+0.089
La+0.089
Ryan+0.089
Neg logit contributions
 needs-0.044
 need-0.044
 harm-0.041
 policy-0.040
 of-0.039
Token and
Token ID290
Feature activation+4.55e-02

Pos logit contributions
Ryan+0.195
Aud+0.184
aults+0.183
Daniel+0.181
Rose+0.181
Neg logit contributions
 need-0.137
 needs-0.136
 of-0.133
 times-0.118
 day-0.116
Token some
Token ID617
Feature activation+2.09e-02

Pos logit contributions
coat+0.070
aults+0.069
Aud+0.068
Ol+0.068
Ryan+0.067
Neg logit contributions
 need-0.079
 needs-0.079
 harm-0.075
 policy-0.073
 of-0.071
Token green
Token ID4077
Feature activation+2.47e-02

Pos logit contributions
 apr+0.031
coat+0.030
aults+0.030
 skirts+0.029
Ol+0.029
Neg logit contributions
 harm-0.036
 needs-0.035
 need-0.035
 policy-0.035
 decision-0.034
Token stuff
Token ID3404
Feature activation+1.13e-02

Pos logit contributions
coat+0.039
 apr+0.039
aults+0.039
Ol+0.038
Aud+0.038
Neg logit contributions
 need-0.040
 needs-0.040
 harm-0.040
 policy-0.039
 choice-0.037
Token,"
Token ID553
Feature activation+4.09e-02

Pos logit contributions
coat+0.023
 apr+0.023
aults+0.023
 skirts+0.022
Aud+0.022
Neg logit contributions
 harm-0.013
 need-0.013
 needs-0.013
 policy-0.013
 decision-0.012
Token bowl
Token ID9396
Feature activation+1.11e-03

Pos logit contributions
coat+0.056
aults+0.056
Aud+0.055
Ol+0.055
 apr+0.054
Neg logit contributions
 need-0.066
 needs-0.066
 harm-0.063
 policy-0.062
 choice-0.060
Token with
Token ID351
Feature activation-1.62e-04

Pos logit contributions
coat+0.002
 apr+0.002
aults+0.002
J+0.002
 skirts+0.002
Neg logit contributions
 harm-0.002
 needs-0.002
 need-0.002
 policy-0.002
 situations-0.002
Token lettuce
Token ID39406
Feature activation+3.00e-02

Pos logit contributions
 need+0.000
 needs+0.000
 harm+0.000
 policy+0.000
 of+0.000
Neg logit contributions
coat-0.000
Aud-0.000
aults-0.000
Ol-0.000
Ryan-0.000
Token,
Token ID11
Feature activation+8.10e-02

Pos logit contributions
coat+0.062
aults+0.062
 apr+0.061
Ol+0.061
Aud+0.061
Neg logit contributions
 need-0.035
 needs-0.035
 harm-0.034
 policy-0.033
 choice-0.031
Token tomatoes
Token ID23972
Feature activation+2.91e-02

Pos logit contributions
coat+0.139
Aud+0.139
aults+0.137
Ryan+0.135
V+0.134
Neg logit contributions
 needs-0.133
 need-0.132
 harm-0.121
 of-0.119
 day-0.117
Token,
Token ID11
Feature activation+9.58e-02

Pos logit contributions
aults+0.059
coat+0.059
Ol+0.059
Aud+0.058
 apr+0.058
Neg logit contributions
 needs-0.035
 need-0.035
 harm-0.033
 policy-0.033
 of-0.031
Token cheese
Token ID9891
Feature activation+3.55e-02

Pos logit contributions
Ryan+0.185
Aud+0.184
coat+0.183
V+0.182
La+0.177
Neg logit contributions
 needs-0.142
 need-0.141
 of-0.133
 day-0.124
 times-0.123
Token and
Token ID290
Feature activation+5.33e-02

Pos logit contributions
coat+0.075
Ol+0.075
aults+0.074
Aud+0.074
Daniel+0.074
Neg logit contributions
 needs-0.040
 need-0.040
 harm-0.038
 policy-0.037
 of-0.036
Token dressing
Token ID18544
Feature activation+2.57e-02

Pos logit contributions
coat+0.084
aults+0.083
Aud+0.082
Ol+0.081
Daniel+0.081
Neg logit contributions
 needs-0.090
 need-0.090
 harm-0.086
 policy-0.084
 of-0.080
Token.
Token ID13
Feature activation-2.30e-03

Pos logit contributions
coat+0.051
aults+0.051
 apr+0.050
Ol+0.050
Aud+0.050
Neg logit contributions
 need-0.032
 needs-0.032
 harm-0.031
 policy-0.030
 choice-0.028
Token They
Token ID1119
Feature activation+2.60e-02

Pos logit contributions
 need+0.002
 needs+0.002
 policy+0.002
 of+0.002
 day+0.002
Neg logit contributions
aults-0.005
coat-0.005
Ol-0.005
 apr-0.005
Daniel-0.005
Token They
Token ID1119
Feature activation+2.17e-02

Pos logit contributions
 need+0.013
 needs+0.013
 harm+0.012
 policy+0.012
 of+0.012
Neg logit contributions
aults-0.026
coat-0.025
Ol-0.025
 apr-0.025
Aud-0.024
Token see
Token ID766
Feature activation+2.10e-02

Pos logit contributions
aults+0.036
coat+0.036
 apr+0.036
Ol+0.035
Aud+0.035
Neg logit contributions
 needs-0.034
 need-0.033
 harm-0.033
 policy-0.033
 choice-0.031
Token bones
Token ID11945
Feature activation+1.17e-02

Pos logit contributions
coat+0.040
Ol+0.039
aults+0.038
Aud+0.038
 apr+0.037
Neg logit contributions
 need-0.031
 needs-0.030
 harm-0.029
 of-0.028
 animal-0.028
Token,
Token ID11
Feature activation+8.76e-02

Pos logit contributions
aults+0.023
coat+0.023
Aud+0.022
 apr+0.022
Ol+0.022
Neg logit contributions
 needs-0.015
 need-0.015
 harm-0.014
 policy-0.014
 choice-0.013
Token pots
Token ID32195
Feature activation+3.78e-02

Pos logit contributions
coat+0.135
aults+0.133
Aud+0.133
Ol+0.130
Daniel+0.130
Neg logit contributions
 need-0.152
 needs-0.151
 harm-0.145
 policy-0.140
 animal-0.138
Token,
Token ID11
Feature activation+9.58e-02

Pos logit contributions
Ol+0.089
Aud+0.088
La+0.088
aults+0.088
V+0.087
Neg logit contributions
 of-0.035
 need-0.035
 needs-0.033
 policy-0.031
 harm-0.030
Token coins
Token ID10796
Feature activation+8.34e-03

Pos logit contributions
Ryan+0.152
Aud+0.152
coat+0.151
V+0.147
Daniel+0.146
Neg logit contributions
 needs-0.166
 need-0.164
 of-0.153
 animal-0.151
 harm-0.151
Token,
Token ID11
Feature activation+9.35e-02

Pos logit contributions
coat+0.017
aults+0.017
 apr+0.017
Ol+0.017
Aud+0.017
Neg logit contributions
 need-0.010
 needs-0.010
 harm-0.009
 policy-0.009
 choice-0.008
Token paintings
Token ID21641
Feature activation+3.83e-02

Pos logit contributions
Ryan+0.169
Aud+0.167
coat+0.166
V+0.165
Ol+0.162
Neg logit contributions
 needs-0.141
 need-0.140
 of-0.130
 animal-0.127
 harm-0.127
Token,
Token ID11
Feature activation+9.11e-02

Pos logit contributions
Ol+0.088
V+0.086
Aud+0.086
coat+0.086
Ryan+0.086
Neg logit contributions
 need-0.037
 needs-0.037
 of-0.037
 harm-0.034
 policy-0.034
Token and
Token ID290
Feature activation+4.41e-02

Pos logit contributions
Ryan+0.188
Aud+0.187
V+0.185
coat+0.185
Ol+0.184
Neg logit contributions
 needs-0.113
 need-0.111
 of-0.103
 animal-0.101
 harm-0.100
TokenWe
Token ID1135
Feature activation+3.25e-02

Pos logit contributions
aults+0.088
coat+0.085
king+0.080
Aud+0.079
 apr+0.078
Neg logit contributions
 of-0.079
 need-0.079
 needs-0.079
 day-0.075
 thing-0.074
Token put
Token ID1234
Feature activation+3.25e-02

Pos logit contributions
coat+0.057
Aud+0.057
aults+0.057
Ol+0.056
 apr+0.056
Neg logit contributions
 need-0.050
 needs-0.049
 harm-0.046
 policy-0.045
 choice-0.043
Token eggs
Token ID9653
Feature activation+1.84e-02

Pos logit contributions
 apr+0.066
aults+0.066
coat+0.065
Aud+0.065
Ol+0.065
Neg logit contributions
 need-0.038
 needs-0.038
 harm-0.038
 policy-0.037
 choice-0.034
Token,
Token ID11
Feature activation+8.65e-02

Pos logit contributions
coat+0.038
Aud+0.038
aults+0.038
Ol+0.038
 apr+0.037
Neg logit contributions
 need-0.021
 needs-0.021
 harm-0.020
 policy-0.020
 of-0.019
Token flour
Token ID10601
Feature activation+3.86e-02

Pos logit contributions
aults+0.145
coat+0.145
Aud+0.142
Ryan+0.142
Ol+0.140
Neg logit contributions
 need-0.141
 needs-0.141
 of-0.130
 harm-0.127
 policy-0.127
Token,
Token ID11
Feature activation+9.52e-02

Pos logit contributions
Aud+0.084
aults+0.083
Ol+0.083
coat+0.083
La+0.083
Neg logit contributions
 needs-0.042
 need-0.042
 harm-0.039
 policy-0.037
 of-0.037
Token sugar
Token ID7543
Feature activation+3.84e-02

Pos logit contributions
Ryan+0.176
Aud+0.169
aults+0.169
coat+0.166
Rose+0.164
Neg logit contributions
 need-0.151
 needs-0.150
 of-0.146
 times-0.133
 thing-0.131
Token,
Token ID11
Feature activation+9.60e-02

Pos logit contributions
Aud+0.083
aults+0.083
Ol+0.083
Ryan+0.083
La+0.083
Neg logit contributions
 needs-0.042
 need-0.041
 harm-0.038
 policy-0.037
 of-0.036
Token milk
Token ID7545
Feature activation+4.10e-02

Pos logit contributions
Ryan+0.187
Aud+0.178
aults+0.176
Rose+0.174
Daniel+0.173
Neg logit contributions
 need-0.145
 needs-0.144
 of-0.142
 times-0.127
 thing-0.124
Token,
Token ID11
Feature activation+9.59e-02

Pos logit contributions
Aud+0.089
aults+0.089
Ol+0.089
La+0.089
Ryan+0.089
Neg logit contributions
 needs-0.044
 need-0.044
 harm-0.041
 policy-0.040
 of-0.039
Token and
Token ID290
Feature activation+4.55e-02

Pos logit contributions
Ryan+0.195
Aud+0.184
aults+0.183
Daniel+0.181
Rose+0.181
Neg logit contributions
 need-0.137
 needs-0.136
 of-0.133
 times-0.118
 day-0.116
Token night
Token ID1755
Feature activation+5.26e-04

Pos logit contributions
 harm+0.029
 policy+0.028
 situations+0.028
 deeds+0.027
 decisions+0.026
Neg logit contributions
coat-0.064
J-0.062
La-0.062
Ben-0.062
 skirts-0.060
Token,
Token ID11
Feature activation+1.82e-02

Pos logit contributions
coat+0.001
aults+0.001
Ol+0.001
 apr+0.001
Aud+0.001
Neg logit contributions
 needs-0.001
 need-0.001
 harm-0.001
 policy-0.001
 choice-0.001
Token a
Token ID257
Feature activation-1.43e-03

Pos logit contributions
 apr+0.038
aults+0.037
Aud+0.037
coat+0.037
Ol+0.036
Neg logit contributions
 need-0.023
 needs-0.021
 day-0.021
 of-0.021
 thing-0.020
Token three
Token ID1115
Feature activation-3.36e-02

Pos logit contributions
 day+0.002
 need+0.002
 needs+0.002
 thing+0.002
 animal+0.001
Neg logit contributions
aults-0.003
Aud-0.003
 apr-0.003
 skirts-0.003
coat-0.003
Token-
Token ID12
Feature activation-3.81e-02

Pos logit contributions
 harm+0.039
 policy+0.037
 decisions+0.035
 need+0.035
 needs+0.035
Neg logit contributions
 apr-0.072
J-0.071
coat-0.068
 guitars-0.067
Ben-0.067
Tokenyear
Token ID1941
Feature activation-9.83e-02

Pos logit contributions
 needs+0.042
 need+0.038
 policy+0.038
 harm+0.038
 decision+0.037
Neg logit contributions
J-0.086
coat-0.083
V-0.083
Ol-0.082
T-0.082
Token-
Token ID12
Feature activation-5.01e-02

Pos logit contributions
 needs+0.144
 harm+0.142
 need+0.139
 policy+0.136
 situations+0.134
Neg logit contributions
coat-0.164
 apr-0.164
ding-0.150
J-0.149
 shirts-0.149
Tokenold
Token ID727
Feature activation-3.10e-02

Pos logit contributions
 needs+0.054
 need+0.050
 harm+0.049
 policy+0.048
 situations+0.047
Neg logit contributions
J-0.115
coat-0.110
T-0.109
V-0.108
Daniel-0.108
Token child
Token ID1200
Feature activation-2.33e-02

Pos logit contributions
 harm+0.045
 policy+0.044
 need+0.044
 needs+0.044
 decisions+0.042
Neg logit contributions
coat-0.054
Ol-0.053
 apr-0.053
T-0.053
J-0.053
Token was
Token ID373
Feature activation-1.36e-02

Pos logit contributions
 decisions+0.035
 decision+0.035
 harm+0.035
 needs+0.034
 choice+0.034
Neg logit contributions
coat-0.039
J-0.038
 apr-0.038
Ol-0.037
T-0.037
Token walking
Token ID6155
Feature activation+9.27e-03

Pos logit contributions
 policy+0.026
 needs+0.026
 harm+0.026
 choice+0.025
 need+0.025
Neg logit contributions
coat-0.018
Ol-0.018
 apr-0.017
aults-0.017
V-0.017
Token beach
Token ID10481
Feature activation-1.37e-02

Pos logit contributions
 need+0.021
 day+0.021
 needs+0.020
 night+0.020
 harm+0.020
Neg logit contributions
coat-0.021
Aud-0.021
 apr-0.021
Ol-0.021
aults-0.021
Token.
Token ID13
Feature activation-2.94e-02

Pos logit contributions
 harm+0.016
 needs+0.016
 need+0.016
 policy+0.015
 choice+0.014
Neg logit contributions
coat-0.028
aults-0.027
 apr-0.027
J-0.027
 guitars-0.027
Token A
Token ID317
Feature activation-1.41e-02

Pos logit contributions
 needs+0.036
 harm+0.036
 need+0.035
 policy+0.034
 choice+0.033
Neg logit contributions
coat-0.059
 apr-0.059
Ol-0.058
Aud-0.057
aults-0.057
Token three
Token ID1115
Feature activation-3.74e-02

Pos logit contributions
 need+0.019
 needs+0.019
 day+0.018
 thing+0.018
 animal+0.017
Neg logit contributions
Aud-0.027
aults-0.027
coat-0.027
 apr-0.027
Ol-0.026
Token-
Token ID12
Feature activation-3.67e-02

Pos logit contributions
 harm+0.042
 policy+0.038
 needs+0.038
 need+0.037
 decisions+0.035
Neg logit contributions
J-0.084
 apr-0.082
 guitars-0.079
 skirts-0.078
 shirts-0.078
Tokenyear
Token ID1941
Feature activation-9.74e-02

Pos logit contributions
 needs+0.037
 need+0.035
 harm+0.035
 policy+0.034
 decision+0.033
Neg logit contributions
J-0.084
coat-0.083
Aud-0.082
Ol-0.082
T-0.082
Token-
Token ID12
Feature activation-4.98e-02

Pos logit contributions
 needs+0.143
 harm+0.139
 need+0.138
 situations+0.135
 policy+0.133
Neg logit contributions
coat-0.158
 apr-0.158
ades-0.151
 shirts-0.149
 guitars-0.148
Tokenold
Token ID727
Feature activation-3.79e-02

Pos logit contributions
 needs+0.053
 need+0.050
 harm+0.049
 policy+0.047
 decision+0.046
Neg logit contributions
J-0.114
Daniel-0.110
coat-0.110
Aud-0.109
T-0.109
Token boy
Token ID2933
Feature activation-2.16e-02

Pos logit contributions
 harm+0.059
 need+0.055
 policy+0.055
 earthquake+0.055
 decisions+0.054
Neg logit contributions
coat-0.063
J-0.062
Ryan-0.061
Ol-0.061
Daniel-0.061
Token was
Token ID373
Feature activation-2.14e-02

Pos logit contributions
 decision+0.034
 decisions+0.034
 harm+0.033
 choice+0.033
 fate+0.033
Neg logit contributions
coat-0.036
 guitars-0.033
 apr-0.033
T-0.033
Daniel-0.033
Token so
Token ID523
Feature activation-2.47e-02

Pos logit contributions
 needs+0.035
 need+0.035
 harm+0.034
 policy+0.034
 choice+0.032
Neg logit contributions
coat-0.034
aults-0.034
 apr-0.033
Ol-0.033
Aud-0.033
Token
Token ID26391
Feature activation+8.34e-03

Pos logit contributions
 need+0.040
 harm+0.040
 needs+0.040
 policy+0.039
 choice+0.038
Neg logit contributions
coat-0.037
Aud-0.036
 apr-0.035
V-0.035
Ol-0.035
Token asked
Token ID1965
Feature activation-1.76e-02

Pos logit contributions
 apr+0.014
Ol+0.014
aults+0.014
 groom+0.013
 skirts+0.013
Neg logit contributions
 need-0.013
 needs-0.013
ore-0.012
 of-0.012
 day-0.012
Token the
Token ID262
Feature activation-9.19e-04

Pos logit contributions
 need+0.026
 needs+0.025
 harm+0.024
 policy+0.024
 of+0.023
Neg logit contributions
aults-0.032
coat-0.031
Aud-0.031
 apr-0.031
Ol-0.031
Token three
Token ID1115
Feature activation-2.77e-02

Pos logit contributions
 need+0.001
 needs+0.001
 harm+0.001
 policy+0.001
 day+0.001
Neg logit contributions
aults-0.002
coat-0.002
Aud-0.002
 apr-0.002
Ol-0.002
Token-
Token ID12
Feature activation-2.63e-02

Pos logit contributions
 harm+0.029
 policy+0.027
 needs+0.026
 need+0.026
 earthquake+0.024
Neg logit contributions
J-0.060
 apr-0.060
 skirts-0.059
aults-0.059
 shirts-0.058
Tokenyear
Token ID1941
Feature activation-9.68e-02

Pos logit contributions
 needs+0.023
 need+0.023
 harm+0.022
 policy+0.021
 decision+0.019
Neg logit contributions
coat-0.062
Ol-0.062
 apr-0.062
Aud-0.062
aults-0.062
Token-
Token ID12
Feature activation-3.71e-02

Pos logit contributions
 needs+0.138
 need+0.138
 harm+0.136
 policy+0.130
 situations+0.127
Neg logit contributions
J-0.158
 apr-0.154
ades-0.150
 skirts-0.150
 shirts-0.150
Tokenold
Token ID727
Feature activation-1.80e-02

Pos logit contributions
 needs+0.037
 need+0.035
 harm+0.034
 policy+0.033
 decision+0.032
Neg logit contributions
J-0.086
T-0.084
coat-0.084
Aud-0.083
Ol-0.083
Token.
Token ID13
Feature activation-1.49e-02

Pos logit contributions
 harm+0.025
 need+0.025
 needs+0.024
 policy+0.024
 earthquake+0.023
Neg logit contributions
coat-0.033
Ol-0.032
Daniel-0.032
 apr-0.032
aults-0.032
Token\n
Token ID198
Feature activation+1.10e-02

Pos logit contributions
 need+0.019
 needs+0.018
 of+0.018
 day+0.017
crow+0.017
Neg logit contributions
aults-0.031
Aud-0.029
coat-0.029
 apr-0.029
 skirts-0.028
Token\n
Token ID198
Feature activation+3.80e-03

Pos logit contributions
 apr+0.024
aults+0.024
 skirts+0.023
 wrists+0.023
coat+0.022
Neg logit contributions
 of-0.015
 day-0.013
ore-0.012
 need-0.011
crow-0.011
Token day
Token ID1110
Feature activation+9.43e-03

Pos logit contributions
 situations+0.052
 harm+0.049
 decisions+0.048
rum+0.045
 deeds+0.044
Neg logit contributions
J-0.085
Ben-0.082
La-0.081
T-0.081
coat-0.080
Token,
Token ID11
Feature activation-1.92e-03

Pos logit contributions
coat+0.021
aults+0.021
Aud+0.021
 apr+0.021
Ol+0.021
Neg logit contributions
 need-0.010
 needs-0.010
 policy-0.009
 harm-0.009
 day-0.009
Token a
Token ID257
Feature activation-1.01e-02

Pos logit contributions
 needs+0.003
 harm+0.003
 policy+0.003
 need+0.003
 decision+0.003
Neg logit contributions
 apr-0.003
aults-0.003
Ol-0.003
 guitars-0.003
coat-0.003
Token three
Token ID1115
Feature activation-3.53e-02

Pos logit contributions
 need+0.013
 needs+0.013
 harm+0.013
 policy+0.012
 day+0.012
Neg logit contributions
aults-0.020
coat-0.019
Aud-0.019
 apr-0.019
Ol-0.019
Token-
Token ID12
Feature activation-3.40e-02

Pos logit contributions
 harm+0.041
 policy+0.041
 needs+0.039
 need+0.038
 fate+0.037
Neg logit contributions
J-0.073
 guitars-0.070
 apr-0.070
aults-0.068
 skirts-0.068
Tokenyear
Token ID1941
Feature activation-9.66e-02

Pos logit contributions
 needs+0.034
 need+0.033
 policy+0.031
 harm+0.031
 decision+0.030
Neg logit contributions
J-0.078
V-0.077
coat-0.076
Aud-0.076
Ol-0.076
Token-
Token ID12
Feature activation-4.78e-02

Pos logit contributions
 needs+0.145
 need+0.139
 harm+0.137
 policy+0.134
 situations+0.134
Neg logit contributions
coat-0.155
J-0.155
 apr-0.154
 guitars-0.152
aults-0.151
Tokenold
Token ID727
Feature activation-3.31e-02

Pos logit contributions
 needs+0.051
 need+0.049
 harm+0.046
 policy+0.045
 situations+0.044
Neg logit contributions
J-0.110
T-0.105
Daniel-0.105
Aud-0.105
Ryan-0.104
Token boy
Token ID2933
Feature activation-1.62e-02

Pos logit contributions
 harm+0.049
 need+0.047
 policy+0.047
 needs+0.047
 earthquake+0.046
Neg logit contributions
J-0.055
coat-0.055
Ol-0.055
Ryan-0.054
Daniel-0.054
Token was
Token ID373
Feature activation-2.21e-02

Pos logit contributions
 needs+0.025
 harm+0.024
 decision+0.024
 need+0.024
 decisions+0.024
Neg logit contributions
coat-0.026
aults-0.026
J-0.026
Ol-0.026
T-0.025
Token walking
Token ID6155
Feature activation+5.52e-03

Pos logit contributions
 needs+0.037
 need+0.037
 harm+0.036
 policy+0.036
 choice+0.034
Neg logit contributions
coat-0.035
aults-0.034
 apr-0.034
Ol-0.034
Aud-0.034
Token,
Token ID11
Feature activation+1.21e-02

Pos logit contributions
coat+0.014
aults+0.014
Aud+0.014
 apr+0.014
Ol+0.013
Neg logit contributions
 need-0.007
 needs-0.006
 policy-0.006
 harm-0.006
 day-0.006
Token a
Token ID257
Feature activation-2.79e-03

Pos logit contributions
aults+0.022
coat+0.022
 apr+0.022
Aud+0.022
Ol+0.022
Neg logit contributions
 need-0.017
 needs-0.017
 harm-0.016
 policy-0.016
 day-0.015
Token little
Token ID1310
Feature activation-3.06e-02

Pos logit contributions
 day+0.003
 need+0.003
 needs+0.003
 animal+0.003
 thing+0.003
Neg logit contributions
aults-0.006
Aud-0.006
 apr-0.006
coat-0.006
 skirts-0.006
Token three
Token ID1115
Feature activation-3.07e-02

Pos logit contributions
 harm+0.043
 needs+0.042
 need+0.042
 policy+0.041
 decisions+0.040
Neg logit contributions
coat-0.057
Ol-0.055
 apr-0.054
Aud-0.054
aults-0.053
Token-
Token ID12
Feature activation-3.72e-02

Pos logit contributions
 harm+0.040
 policy+0.037
 decisions+0.037
 need+0.036
 situations+0.035
Neg logit contributions
 apr-0.060
J-0.059
coat-0.057
 guitars-0.056
Ben-0.056
Tokenyear
Token ID1941
Feature activation-9.66e-02

Pos logit contributions
 needs+0.047
 situations+0.044
 policy+0.043
 decision+0.043
 need+0.043
Neg logit contributions
J-0.080
V-0.076
T-0.075
coat-0.074
Ol-0.074
Token-
Token ID12
Feature activation-5.16e-02

Pos logit contributions
 needs+0.148
 harm+0.147
 need+0.145
 situations+0.144
 policy+0.139
Neg logit contributions
coat-0.152
 apr-0.148
ades-0.140
ding-0.139
udd-0.139
Tokenold
Token ID727
Feature activation-3.08e-02

Pos logit contributions
 needs+0.065
 need+0.060
 situations+0.059
 harm+0.058
 fate+0.057
Neg logit contributions
J-0.113
T-0.105
coat-0.103
V-0.103
Daniel-0.102
Token girl
Token ID2576
Feature activation-1.88e-02

Pos logit contributions
 harm+0.044
 needs+0.043
 policy+0.043
 need+0.043
 decisions+0.042
Neg logit contributions
Ol-0.054
coat-0.053
T-0.053
J-0.053
 apr-0.052
Token named
Token ID3706
Feature activation-7.74e-03

Pos logit contributions
 harm+0.027
 decision+0.027
 decisions+0.027
 needs+0.027
 policy+0.026
Neg logit contributions
coat-0.032
 apr-0.032
J-0.031
Ol-0.031
T-0.031
Token Lucy
Token ID22162
Feature activation-2.45e-03

Pos logit contributions
 needs+0.014
 harm+0.013
 need+0.013
 policy+0.013
 decision+0.013
Neg logit contributions
coat-0.011
Aud-0.011
 apr-0.011
aults-0.011
Ol-0.011
Token,
Token ID11
Feature activation+1.33e-03

Pos logit contributions
aults+0.016
coat+0.016
Aud+0.016
 apr+0.015
Daniel+0.015
Neg logit contributions
 need-0.007
 needs-0.007
 harm-0.007
 day-0.006
 policy-0.006
Token a
Token ID257
Feature activation-8.66e-03

Pos logit contributions
coat+0.003
 apr+0.003
Ol+0.003
aults+0.003
Aud+0.003
Neg logit contributions
 needs-0.002
 need-0.002
 harm-0.002
 policy-0.002
 decision-0.002
Token little
Token ID1310
Feature activation-4.00e-02

Pos logit contributions
 need+0.012
 needs+0.012
 harm+0.011
 policy+0.011
 day+0.011
Neg logit contributions
aults-0.016
coat-0.016
Aud-0.016
 apr-0.016
Ol-0.016
Token three
Token ID1115
Feature activation-3.22e-02

Pos logit contributions
 policy+0.064
 harm+0.061
 needs+0.060
 need+0.059
 fate+0.059
Neg logit contributions
coat-0.068
Ol-0.063
 guitars-0.062
V-0.062
Aud-0.061
Token-
Token ID12
Feature activation-3.38e-02

Pos logit contributions
 policy+0.041
 harm+0.038
 needs+0.036
 need+0.036
 fate+0.036
Neg logit contributions
 apr-0.062
 guitars-0.062
J-0.060
coat-0.060
La-0.060
Tokenyear
Token ID1941
Feature activation-9.65e-02

Pos logit contributions
 needs+0.039
 policy+0.037
 need+0.037
 decision+0.037
 situations+0.036
Neg logit contributions
J-0.074
V-0.073
T-0.072
coat-0.072
Ol-0.072
Token-
Token ID12
Feature activation-4.59e-02

Pos logit contributions
 needs+0.150
 policy+0.146
 need+0.146
 situations+0.143
 harm+0.136
Neg logit contributions
 groom-0.147
coat-0.144
 apr-0.141
 guitars-0.141
aults-0.137
Tokenold
Token ID727
Feature activation-3.12e-02

Pos logit contributions
 needs+0.056
 need+0.053
 policy+0.052
 decision+0.052
 situations+0.050
Neg logit contributions
J-0.099
T-0.096
Aud-0.095
V-0.095
coat-0.094
Token boy
Token ID2933
Feature activation-1.71e-02

Pos logit contributions
 policy+0.046
 harm+0.046
 need+0.045
 needs+0.044
 decision+0.044
Neg logit contributions
Ol-0.052
coat-0.052
T-0.052
J-0.051
 apr-0.050
Token named
Token ID3706
Feature activation-6.73e-03

Pos logit contributions
 needs+0.025
 policy+0.025
 need+0.025
 harm+0.025
 decision+0.025
Neg logit contributions
coat-0.029
Ol-0.028
aults-0.028
T-0.028
 apr-0.028
Token John
Token ID1757
Feature activation+8.73e-03

Pos logit contributions
 need+0.011
 needs+0.011
 harm+0.011
 policy+0.011
 choice+0.010
Neg logit contributions
coat-0.010
aults-0.010
 apr-0.010
Aud-0.010
Ol-0.010
Token,
Token ID11
Feature activation-2.91e-03

Pos logit contributions
coat+0.017
aults+0.017
 apr+0.017
Aud+0.017
Ol+0.017
Neg logit contributions
 need-0.010
 needs-0.010
 harm-0.009
 policy-0.009
 choice-0.008
Token a
Token ID257
Feature activation-1.03e-02

Pos logit contributions
 harm+0.005
 needs+0.005
 policy+0.005
 need+0.005
 decisions+0.004
Neg logit contributions
Ol-0.005
 apr-0.005
coat-0.005
aults-0.004
V-0.004
Token little
Token ID1310
Feature activation-4.02e-02

Pos logit contributions
 need+0.014
 needs+0.014
 harm+0.013
 policy+0.013
 day+0.013
Neg logit contributions
aults-0.020
coat-0.020
 apr-0.019
Aud-0.019
Ol-0.019
Token 3
Token ID513
Feature activation-3.36e-02

Pos logit contributions
 harm+0.065
 policy+0.065
 decisions+0.063
 needs+0.063
 fate+0.062
Neg logit contributions
coat-0.065
Ol-0.060
Ben-0.059
 guitars-0.059
Max-0.059
Token-
Token ID12
Feature activation-3.89e-02

Pos logit contributions
 needs+0.046
 decisions+0.045
 policy+0.043
 need+0.043
 harm+0.042
Neg logit contributions
J-0.066
Tom-0.059
coat-0.058
 apr-0.058
T-0.057
Tokenyear
Token ID1941
Feature activation-9.63e-02

Pos logit contributions
 needs+0.049
 decision+0.047
 situations+0.047
 need+0.046
 policy+0.045
Neg logit contributions
J-0.084
V-0.080
Ol-0.079
T-0.078
coat-0.077
Token-
Token ID12
Feature activation-5.06e-02

Pos logit contributions
 needs+0.152
 situations+0.148
 need+0.147
 harm+0.142
 policy+0.141
Neg logit contributions
coat-0.141
J-0.134
 apr-0.133
udd-0.129
 groom-0.127
Tokenold
Token ID727
Feature activation-3.34e-02

Pos logit contributions
 needs+0.064
 need+0.060
 decision+0.059
 situations+0.059
 harm+0.057
Neg logit contributions
J-0.111
T-0.102
V-0.102
Ol-0.100
Aud-0.100
Token girl
Token ID2576
Feature activation-2.15e-02

Pos logit contributions
 harm+0.050
 policy+0.049
 need+0.048
 needs+0.048
 decision+0.047
Neg logit contributions
Ol-0.056
J-0.056
coat-0.056
T-0.055
 apr-0.054
Token found
Token ID1043
Feature activation-1.37e-02

Pos logit contributions
 harm+0.032
 needs+0.032
 decision+0.032
 policy+0.032
 need+0.032
Neg logit contributions
coat-0.036
 apr-0.035
Ol-0.035
J-0.035
Aud-0.035
Token the
Token ID262
Feature activation+6.23e-03

Pos logit contributions
 needs+0.022
 policy+0.022
 decision+0.021
 need+0.021
 harm+0.021
Neg logit contributions
 apr-0.022
Ol-0.022
J-0.021
coat-0.021
T-0.021
Token day
Token ID1110
Feature activation+6.70e-03

Pos logit contributions
 situations+0.026
 policy+0.026
 harm+0.025
 decisions+0.024
 deeds+0.023
Neg logit contributions
coat-0.057
J-0.056
La-0.056
Ben-0.054
ding-0.054
Token,
Token ID11
Feature activation+1.28e-02

Pos logit contributions
coat+0.015
aults+0.014
Aud+0.014
 apr+0.014
Ol+0.014
Neg logit contributions
 need-0.007
 needs-0.007
 policy-0.007
 harm-0.007
 day-0.006
Token a
Token ID257
Feature activation-3.42e-03

Pos logit contributions
 apr+0.026
Aud+0.026
aults+0.026
coat+0.025
La+0.025
Neg logit contributions
 need-0.016
 needs-0.015
 day-0.015
 of-0.015
 thing-0.014
Token three
Token ID1115
Feature activation-3.14e-02

Pos logit contributions
 day+0.004
 need+0.004
 needs+0.004
 thing+0.004
 animal+0.004
Neg logit contributions
aults-0.007
Aud-0.007
 apr-0.007
coat-0.007
 skirts-0.007
Token-
Token ID12
Feature activation-3.74e-02

Pos logit contributions
 harm+0.037
 policy+0.036
 decisions+0.035
 needs+0.035
 need+0.034
Neg logit contributions
J-0.065
 apr-0.065
coat-0.062
 guitars-0.061
Ben-0.060
Tokenyear
Token ID1941
Feature activation-9.63e-02

Pos logit contributions
 needs+0.040
 need+0.037
 policy+0.036
 harm+0.036
 decision+0.035
Neg logit contributions
J-0.085
coat-0.082
T-0.082
Ol-0.082
V-0.082
Token-
Token ID12
Feature activation-5.10e-02

Pos logit contributions
 needs+0.141
 harm+0.137
 need+0.136
 policy+0.134
 situations+0.132
Neg logit contributions
coat-0.157
 apr-0.152
ding-0.146
aults-0.145
udd-0.144
Tokenold
Token ID727
Feature activation-3.20e-02

Pos logit contributions
 needs+0.055
 need+0.050
 harm+0.048
 policy+0.048
 situations+0.046
Neg logit contributions
J-0.117
T-0.113
coat-0.112
Daniel-0.112
Ol-0.111
Token boy
Token ID2933
Feature activation-1.26e-02

Pos logit contributions
 harm+0.045
 needs+0.044
 need+0.044
 policy+0.044
 decision+0.042
Neg logit contributions
Ol-0.057
coat-0.057
T-0.056
J-0.056
 apr-0.055
Token was
Token ID373
Feature activation-1.37e-02

Pos logit contributions
 decision+0.019
 decisions+0.018
 needs+0.018
 harm+0.018
 fate+0.018
Neg logit contributions
J-0.021
coat-0.021
Ol-0.021
T-0.021
 apr-0.021
Token very
Token ID845
Feature activation-3.58e-02

Pos logit contributions
 needs+0.026
 policy+0.026
 harm+0.026
 need+0.026
 choice+0.025
Neg logit contributions
coat-0.018
Ol-0.018
 apr-0.018
aults-0.017
V-0.017
Tokenolly
Token ID5098
Feature activation-1.41e-02

Pos logit contributions
 situations+0.058
 harm+0.058
 needs+0.056
 policy+0.054
 need+0.053
Neg logit contributions
Daniel-0.046
 apr-0.045
Ol-0.044
J-0.044
Ryan-0.044
Token was
Token ID373
Feature activation+4.82e-03

Pos logit contributions
 needs+0.019
 need+0.019
 harm+0.018
 policy+0.018
 day+0.017
Neg logit contributions
aults-0.027
coat-0.026
Ol-0.026
 apr-0.026
Aud-0.026
Token a
Token ID257
Feature activation+3.58e-03

Pos logit contributions
aults+0.008
Ol+0.008
coat+0.008
Aud+0.008
 apr+0.008
Neg logit contributions
 need-0.008
 needs-0.007
 harm-0.007
 day-0.007
 of-0.007
Token three
Token ID1115
Feature activation-3.07e-02

Pos logit contributions
coat+0.007
Aud+0.007
aults+0.007
Ol+0.007
 apr+0.007
Neg logit contributions
 need-0.005
 needs-0.005
 day-0.005
 animal-0.004
 night-0.004
Token-
Token ID12
Feature activation-1.84e-02

Pos logit contributions
 policy+0.036
 harm+0.034
 need+0.033
 needs+0.033
 fate+0.032
Neg logit contributions
 apr-0.065
 guitars-0.063
 skirts-0.062
aults-0.062
J-0.062
Tokenyear
Token ID1941
Feature activation-9.61e-02

Pos logit contributions
 needs+0.018
 need+0.018
 harm+0.017
 policy+0.016
 choice+0.015
Neg logit contributions
coat-0.042
aults-0.042
 apr-0.041
Ol-0.041
Aud-0.041
Token-
Token ID12
Feature activation-3.70e-02

Pos logit contributions
 needs+0.146
 need+0.145
 policy+0.145
 harm+0.137
 belong+0.133
Neg logit contributions
 apr-0.152
coat-0.148
aults-0.147
 shirts-0.141
 skirts-0.137
Tokenold
Token ID727
Feature activation-2.76e-02

Pos logit contributions
 needs+0.037
 need+0.035
 policy+0.033
 harm+0.032
 decision+0.031
Neg logit contributions
J-0.086
coat-0.085
aults-0.085
Aud-0.085
Daniel-0.084
Token who
Token ID508
Feature activation-1.79e-02

Pos logit contributions
 harm+0.040
 policy+0.040
 need+0.040
 needs+0.039
 earthquake+0.039
Neg logit contributions
coat-0.048
Ol-0.047
aults-0.047
 apr-0.047
Ryan-0.047
Token liked
Token ID8288
Feature activation-9.07e-03

Pos logit contributions
 harm+0.031
 policy+0.031
 needs+0.030
 need+0.030
 decisions+0.030
Neg logit contributions
coat-0.027
Daniel-0.026
Ol-0.026
T-0.026
Aud-0.026
Token to
Token ID284
Feature activation-1.45e-02

Pos logit contributions
 need+0.011
 needs+0.011
 harm+0.011
 policy+0.010
 choice+0.010
Neg logit contributions
aults-0.018
coat-0.018
Aud-0.018
Ol-0.018
 apr-0.018
Token was
Token ID373
Feature activation-1.92e-03

Pos logit contributions
coat+0.010
aults+0.010
 apr+0.010
Ol+0.009
Aud+0.009
Neg logit contributions
 need-0.008
 needs-0.008
 harm-0.008
 policy-0.007
 choice-0.007
Token an
Token ID281
Feature activation-2.50e-02

Pos logit contributions
 need+0.003
 needs+0.003
 harm+0.003
 policy+0.002
 day+0.002
Neg logit contributions
aults-0.004
coat-0.004
Ol-0.003
Aud-0.003
 apr-0.003
Token eager
Token ID11069
Feature activation-1.85e-02

Pos logit contributions
 policy+0.037
 needs+0.037
 harm+0.037
 need+0.036
ment+0.036
Neg logit contributions
 apr-0.043
 skirts-0.040
coat-0.040
aults-0.040
 guitars-0.039
Token 3
Token ID513
Feature activation-3.88e-02

Pos logit contributions
 harm+0.029
 needs+0.028
 situations+0.027
 belongs+0.027
 policy+0.027
Neg logit contributions
coat-0.031
 apr-0.029
 skirts-0.028
 guitars-0.028
 shirts-0.028
Token-
Token ID12
Feature activation-3.82e-02

Pos logit contributions
 needs+0.051
 need+0.048
 fate+0.047
 policy+0.047
 harm+0.046
Neg logit contributions
J-0.073
 apr-0.072
 guitars-0.070
aults-0.069
 shirts-0.066
Tokenyear
Token ID1941
Feature activation-9.59e-02

Pos logit contributions
 needs+0.046
 need+0.043
 situations+0.042
 fear+0.040
 policy+0.040
Neg logit contributions
J-0.085
T-0.081
coat-0.080
V-0.079
Ryan-0.079
Token-
Token ID12
Feature activation-4.79e-02

Pos logit contributions
 need+0.151
 needs+0.150
 harm+0.144
 situations+0.140
 policy+0.138
Neg logit contributions
 apr-0.143
aults-0.141
coat-0.139
 shirts-0.138
ding-0.132
Tokenold
Token ID727
Feature activation-3.43e-02

Pos logit contributions
 needs+0.064
 need+0.060
 situations+0.057
 fear+0.057
 fate+0.056
Neg logit contributions
J-0.102
T-0.095
Ryan-0.094
Daniel-0.093
Adam-0.093
Token boy
Token ID2933
Feature activation-1.90e-02

Pos logit contributions
 harm+0.053
 policy+0.052
 earthquake+0.052
 fate+0.051
 need+0.050
Neg logit contributions
coat-0.054
 guitars-0.054
Ryan-0.053
T-0.053
 apr-0.053
Token who
Token ID508
Feature activation-2.41e-02

Pos logit contributions
 decision+0.027
 choice+0.027
 fate+0.026
 decisions+0.026
 policy+0.026
Neg logit contributions
coat-0.033
aults-0.032
T-0.032
 guitars-0.032
 apr-0.032
Token loved
Token ID6151
Feature activation-1.57e-02

Pos logit contributions
 policy+0.045
 harm+0.045
 decisions+0.044
 situations+0.043
 fate+0.043
Neg logit contributions
coat-0.032
Daniel-0.031
Ryan-0.031
 apr-0.031
T-0.031
Token day
Token ID1110
Feature activation+7.42e-03

Pos logit contributions
 situations+0.062
 decisions+0.061
 harm+0.056
rum+0.054
 policy+0.053
Neg logit contributions
Ben-0.082
J-0.080
T-0.079
La-0.076
Max-0.076
Token,
Token ID11
Feature activation-2.55e-04

Pos logit contributions
coat+0.015
aults+0.015
 apr+0.015
Aud+0.015
Ol+0.015
Neg logit contributions
 need-0.009
 needs-0.009
 harm-0.008
 policy-0.008
 choice-0.007
Token a
Token ID257
Feature activation-1.19e-02

Pos logit contributions
 needs+0.000
 need+0.000
 harm+0.000
 policy+0.000
 choice+0.000
Neg logit contributions
coat-0.000
aults-0.000
Ol-0.000
 apr-0.000
Aud-0.000
Token 3
Token ID513
Feature activation-3.44e-02

Pos logit contributions
 need+0.017
 needs+0.017
 harm+0.016
 policy+0.016
 choice+0.015
Neg logit contributions
aults-0.022
coat-0.022
 apr-0.022
Aud-0.022
Ol-0.022
Token-
Token ID12
Feature activation-3.76e-02

Pos logit contributions
 needs+0.047
 decisions+0.045
 need+0.045
 policy+0.045
 harm+0.043
Neg logit contributions
J-0.068
Tom-0.061
coat-0.059
Ben-0.058
 apr-0.058
Tokenyear
Token ID1941
Feature activation-9.59e-02

Pos logit contributions
 needs+0.047
 decision+0.044
 need+0.043
 situations+0.043
 policy+0.043
Neg logit contributions
J-0.081
V-0.078
Ol-0.077
T-0.077
coat-0.076
Token-
Token ID12
Feature activation-4.85e-02

Pos logit contributions
 needs+0.151
 situations+0.145
 need+0.145
 policy+0.141
 harm+0.140
Neg logit contributions
coat-0.134
J-0.129
 apr-0.127
 groom-0.126
aults-0.124
Tokenold
Token ID727
Feature activation-3.67e-02

Pos logit contributions
 needs+0.061
 need+0.057
 decision+0.056
 situations+0.055
 policy+0.054
Neg logit contributions
J-0.106
T-0.099
V-0.097
Ol-0.096
Aud-0.095
Token child
Token ID1200
Feature activation-3.04e-02

Pos logit contributions
 harm+0.058
 policy+0.056
 earthquake+0.055
 decision+0.055
 fate+0.055
Neg logit contributions
J-0.058
Ol-0.057
coat-0.056
T-0.056
Ryan-0.056
Token found
Token ID1043
Feature activation-1.64e-02

Pos logit contributions
 decision+0.049
 needs+0.049
 policy+0.048
 need+0.048
 decisions+0.048
Neg logit contributions
coat-0.047
aults-0.046
Ol-0.046
J-0.045
 apr-0.045
Token the
Token ID262
Feature activation+4.43e-03

Pos logit contributions
 needs+0.028
 policy+0.028
 decision+0.028
 need+0.027
 harm+0.027
Neg logit contributions
Ol-0.025
 apr-0.025
T-0.024
J-0.024
V-0.023
Tokenally
Token ID453
Feature activation+2.69e-04

Pos logit contributions
 need+0.015
 needs+0.015
 night+0.015
 of+0.015
 day+0.015
Neg logit contributions
coat-0.011
 guitars-0.011
aults-0.010
Aud-0.010
 apr-0.010
Token was
Token ID373
Feature activation-2.53e-03

Pos logit contributions
coat+0.000
aults+0.000
 apr+0.000
Ol+0.000
Aud+0.000
Neg logit contributions
 need-0.000
 needs-0.000
 harm-0.000
 policy-0.000
 choice-0.000
Token a
Token ID257
Feature activation+1.60e-04

Pos logit contributions
 need+0.004
 needs+0.004
 harm+0.003
 day+0.003
 policy+0.003
Neg logit contributions
aults-0.005
coat-0.005
Ol-0.005
Aud-0.005
 apr-0.005
Token 3
Token ID513
Feature activation-3.51e-02

Pos logit contributions
Aud+0.000
coat+0.000
aults+0.000
Ol+0.000
 apr+0.000
Neg logit contributions
 need-0.000
 needs-0.000
 day-0.000
 animal-0.000
 night-0.000
Token-
Token ID12
Feature activation-2.70e-02

Pos logit contributions
 needs+0.041
 policy+0.039
 need+0.039
 harm+0.038
 belong+0.036
Neg logit contributions
J-0.073
 apr-0.073
aults-0.071
 guitars-0.071
coat-0.069
Tokenyear
Token ID1941
Feature activation-9.58e-02

Pos logit contributions
 needs+0.025
 need+0.025
 harm+0.024
 policy+0.023
 situations+0.021
Neg logit contributions
coat-0.063
aults-0.062
Ol-0.062
 apr-0.062
Aud-0.061
Token-
Token ID12
Feature activation-4.37e-02

Pos logit contributions
 needs+0.142
 need+0.141
 harm+0.140
 situations+0.138
 policy+0.137
Neg logit contributions
 apr-0.155
aults-0.149
coat-0.147
 shirts-0.142
 groom-0.139
Tokenold
Token ID727
Feature activation-3.24e-02

Pos logit contributions
 needs+0.045
 need+0.042
 harm+0.040
 policy+0.039
 situations+0.039
Neg logit contributions
J-0.101
coat-0.099
aults-0.098
Daniel-0.097
Aud-0.097
Token girl
Token ID2576
Feature activation-2.11e-02

Pos logit contributions
 harm+0.051
 policy+0.050
 needs+0.050
 need+0.050
 earthquake+0.048
Neg logit contributions
coat-0.052
aults-0.051
 apr-0.050
 guitars-0.050
Ol-0.050
Token who
Token ID508
Feature activation-2.04e-02

Pos logit contributions
 harm+0.032
 needs+0.032
 policy+0.032
 need+0.031
 decision+0.031
Neg logit contributions
coat-0.035
aults-0.035
 apr-0.034
 guitars-0.034
T-0.034
Token had
Token ID550
Feature activation+2.93e-03

Pos logit contributions
 harm+0.037
 policy+0.036
 decisions+0.035
 needs+0.035
 need+0.035
Neg logit contributions
coat-0.029
Daniel-0.028
 apr-0.028
Aud-0.028
Ryan-0.028
Tokennie
Token ID11952
Feature activation-2.42e-02

Pos logit contributions
 policy+0.038
 earthquake+0.038
 harm+0.036
 needs+0.036
 situations+0.035
Neg logit contributions
T-0.053
V-0.053
Daniel-0.053
aults-0.052
Ryan-0.052
Token was
Token ID373
Feature activation-4.53e-03

Pos logit contributions
 harm+0.036
 policy+0.036
 need+0.035
 needs+0.035
 decisions+0.034
Neg logit contributions
coat-0.041
aults-0.040
Daniel-0.040
 apr-0.040
Ol-0.039
Token a
Token ID257
Feature activation+7.33e-04

Pos logit contributions
 needs+0.007
 need+0.007
 harm+0.007
 policy+0.007
 choice+0.006
Neg logit contributions
coat-0.008
aults-0.008
 apr-0.007
Ol-0.007
Aud-0.007
Token three
Token ID1115
Feature activation-3.36e-02

Pos logit contributions
Aud+0.001
coat+0.001
aults+0.001
Ol+0.001
 apr+0.001
Neg logit contributions
 need-0.001
 needs-0.001
 day-0.001
 harm-0.001
 animal-0.001
Token-
Token ID12
Feature activation-2.47e-02

Pos logit contributions
 harm+0.037
 policy+0.037
 need+0.034
 needs+0.033
 fate+0.033
Neg logit contributions
 apr-0.073
aults-0.070
 guitars-0.070
 skirts-0.069
 shirts-0.068
Tokenyear
Token ID1941
Feature activation-9.57e-02

Pos logit contributions
 needs+0.025
 need+0.024
 policy+0.023
 harm+0.023
 decision+0.022
Neg logit contributions
coat-0.056
aults-0.055
J-0.055
Ol-0.055
 apr-0.055
Token-
Token ID12
Feature activation-4.00e-02

Pos logit contributions
 harm+0.133
 needs+0.133
 need+0.132
 means+0.131
 policy+0.131
Neg logit contributions
 apr-0.161
aults-0.157
coat-0.153
 shirts-0.151
ding-0.145
Tokenold
Token ID727
Feature activation-2.80e-02

Pos logit contributions
 needs+0.046
 need+0.043
 harm+0.042
 policy+0.041
 earthquake+0.040
Neg logit contributions
J-0.089
Daniel-0.085
coat-0.085
Ryan-0.085
T-0.084
Token with
Token ID351
Feature activation-1.56e-02

Pos logit contributions
 harm+0.043
 policy+0.042
 need+0.041
 earthquake+0.041
 needs+0.041
Neg logit contributions
coat-0.047
 guitars-0.045
aults-0.045
 apr-0.045
 skirts-0.045
Token bright
Token ID6016
Feature activation-1.38e-02

Pos logit contributions
 needs+0.026
 harm+0.025
 policy+0.025
 need+0.025
 decision+0.024
Neg logit contributions
 apr-0.025
Ol-0.025
coat-0.024
aults-0.024
Aud-0.024
Token eyes
Token ID2951
Feature activation-2.07e-02

Pos logit contributions
 harm+0.026
 needs+0.026
 need+0.025
 policy+0.025
crow+0.024
Neg logit contributions
 apr-0.019
coat-0.019
aults-0.018
 skirts-0.018
 guitars-0.018
Token was
Token ID373
Feature activation-7.42e-03

Pos logit contributions
 need+0.039
 needs+0.039
 harm+0.038
 policy+0.037
 choice+0.035
Neg logit contributions
coat-0.057
aults-0.057
 apr-0.056
Ol-0.056
Aud-0.056
Token a
Token ID257
Feature activation-1.06e-03

Pos logit contributions
 need+0.011
 needs+0.010
 harm+0.010
 policy+0.010
 day+0.009
Neg logit contributions
aults-0.014
coat-0.014
Ol-0.013
Aud-0.013
 apr-0.013
Token curious
Token ID11040
Feature activation-1.78e-02

Pos logit contributions
 need+0.001
 needs+0.001
 day+0.001
 animal+0.001
 night+0.001
Neg logit contributions
Aud-0.002
coat-0.002
aults-0.002
Ol-0.002
 apr-0.002
Token 3
Token ID513
Feature activation-3.88e-02

Pos logit contributions
 harm+0.030
 needs+0.029
 policy+0.029
 need+0.029
 situations+0.027
Neg logit contributions
coat-0.028
 apr-0.027
aults-0.026
Daniel-0.026
 guitars-0.026
Token-
Token ID12
Feature activation-3.32e-02

Pos logit contributions
 needs+0.047
 policy+0.046
 need+0.045
 harm+0.044
 belong+0.042
Neg logit contributions
J-0.078
 apr-0.077
 guitars-0.076
aults-0.074
coat-0.072
Tokenyear
Token ID1941
Feature activation-9.55e-02

Pos logit contributions
 needs+0.036
 need+0.034
 policy+0.033
 harm+0.031
 situations+0.031
Neg logit contributions
J-0.075
aults-0.074
coat-0.074
Ol-0.073
T-0.073
Token-
Token ID12
Feature activation-4.51e-02

Pos logit contributions
 need+0.143
 needs+0.142
 policy+0.137
 harm+0.134
 means+0.126
Neg logit contributions
aults-0.152
 apr-0.151
coat-0.147
 shirts-0.146
 guitars-0.144
Tokenold
Token ID727
Feature activation-3.53e-02

Pos logit contributions
 needs+0.054
 need+0.050
 policy+0.047
 harm+0.046
 situations+0.045
Neg logit contributions
J-0.100
Ryan-0.096
Daniel-0.096
aults-0.096
T-0.096
Token,
Token ID11
Feature activation+1.03e-02

Pos logit contributions
 harm+0.055
 policy+0.054
 earthquake+0.054
 need+0.053
 needs+0.053
Neg logit contributions
coat-0.057
Ryan-0.056
T-0.056
 guitars-0.056
aults-0.056
Token eager
Token ID11069
Feature activation-2.30e-02

Pos logit contributions
aults+0.019
 apr+0.019
coat+0.018
Aud+0.018
Ol+0.018
Neg logit contributions
 need-0.015
 needs-0.015
 harm-0.014
 day-0.013
 policy-0.013
Token to
Token ID284
Feature activation-4.28e-02

Pos logit contributions
 needs+0.037
 harm+0.037
 need+0.035
crow+0.035
 policy+0.035
Neg logit contributions
coat-0.034
 guitars-0.034
 skirts-0.034
 shirts-0.033
 apr-0.033
Token across
Token ID1973
Feature activation-3.93e-03

Pos logit contributions
coat+0.033
aults+0.033
 apr+0.033
Ol+0.033
Aud+0.033
Neg logit contributions
 needs-0.022
 need-0.022
 harm-0.021
 policy-0.021
 choice-0.019
Token a
Token ID257
Feature activation-6.21e-03

Pos logit contributions
 needs+0.006
 need+0.006
 policy+0.006
 harm+0.006
 decision+0.006
Neg logit contributions
 apr-0.006
Ol-0.006
coat-0.006
Aud-0.006
T-0.006
Token little
Token ID1310
Feature activation-2.89e-02

Pos logit contributions
 need+0.009
 needs+0.009
 harm+0.009
 policy+0.009
 animal+0.008
Neg logit contributions
aults-0.011
coat-0.011
Aud-0.011
 apr-0.011
Ol-0.011
Token 3
Token ID513
Feature activation-2.94e-02

Pos logit contributions
 policy+0.039
 need+0.039
 needs+0.039
 harm+0.039
 choice+0.036
Neg logit contributions
coat-0.053
aults-0.052
 apr-0.051
Ol-0.051
Aud-0.050
Token-
Token ID12
Feature activation-2.09e-02

Pos logit contributions
 needs+0.038
 policy+0.035
 need+0.035
 belong+0.031
 fate+0.031
Neg logit contributions
J-0.062
aults-0.058
 apr-0.058
T-0.057
 guitars-0.057
Tokenyear
Token ID1941
Feature activation-9.54e-02

Pos logit contributions
 needs+0.018
 need+0.018
 harm+0.016
 policy+0.016
 choice+0.015
Neg logit contributions
coat-0.050
aults-0.050
Ol-0.050
 apr-0.050
Aud-0.050
Token-
Token ID12
Feature activation-3.26e-02

Pos logit contributions
 needs+0.150
 need+0.145
 policy+0.143
 situations+0.135
 harm+0.131
Neg logit contributions
 apr-0.139
aults-0.139
coat-0.139
king-0.134
 shirts-0.133
Tokenold
Token ID727
Feature activation-2.28e-02

Pos logit contributions
 needs+0.026
 need+0.026
 harm+0.024
 policy+0.023
 choice+0.021
Neg logit contributions
coat-0.080
aults-0.080
Aud-0.079
Ol-0.079
 apr-0.079
Token boy
Token ID2933
Feature activation-1.28e-02

Pos logit contributions
 need+0.032
 needs+0.031
 policy+0.031
 harm+0.031
 earthquake+0.030
Neg logit contributions
coat-0.041
Ol-0.041
 apr-0.040
aults-0.040
Ryan-0.040
Token who
Token ID508
Feature activation-1.48e-02

Pos logit contributions
 needs+0.020
 policy+0.019
 need+0.019
 decision+0.019
 choice+0.019
Neg logit contributions
coat-0.021
aults-0.021
Ol-0.021
T-0.021
 apr-0.021
Token was
Token ID373
Feature activation-1.28e-02

Pos logit contributions
 policy+0.022
 harm+0.021
 needs+0.021
 need+0.021
 decisions+0.021
Neg logit contributions
coat-0.025
Ol-0.025
Daniel-0.025
T-0.025
aults-0.025
Token a
Token ID257
Feature activation-1.73e-03

Pos logit contributions
 need+0.005
 needs+0.004
 harm+0.004
 policy+0.004
 day+0.004
Neg logit contributions
aults-0.005
coat-0.005
Aud-0.005
Ol-0.005
 apr-0.005
Token very
Token ID845
Feature activation-5.23e-02

Pos logit contributions
 need+0.002
 needs+0.002
 day+0.002
 harm+0.002
 night+0.002
Neg logit contributions
Aud-0.003
coat-0.003
aults-0.003
Ol-0.003
 apr-0.003
Token curious
Token ID11040
Feature activation-2.06e-02

Pos logit contributions
 harm+0.099
 situations+0.098
 policy+0.097
 decisions+0.094
crow+0.093
Neg logit contributions
 apr-0.068
coat-0.068
J-0.067
Daniel-0.065
La-0.064
Token three
Token ID1115
Feature activation-3.85e-02

Pos logit contributions
 harm+0.035
 needs+0.034
 policy+0.034
 need+0.033
 situations+0.031
Neg logit contributions
coat-0.032
 apr-0.031
 skirts-0.030
 guitars-0.030
aults-0.030
Token-
Token ID12
Feature activation-3.29e-02

Pos logit contributions
 policy+0.049
 harm+0.047
 needs+0.047
 need+0.046
 belong+0.046
Neg logit contributions
 apr-0.072
 guitars-0.072
J-0.071
 skirts-0.071
 shirts-0.070
Tokenyear
Token ID1941
Feature activation-9.54e-02

Pos logit contributions
 needs+0.037
 need+0.035
 policy+0.033
 harm+0.032
 decision+0.032
Neg logit contributions
J-0.075
coat-0.072
T-0.072
Ryan-0.072
Daniel-0.071
Token-
Token ID12
Feature activation-4.26e-02

Pos logit contributions
 need+0.150
 needs+0.148
 harm+0.144
 policy+0.140
 means+0.137
Neg logit contributions
coat-0.143
 apr-0.141
 shirts-0.137
aults-0.136
ding-0.133
Tokenold
Token ID727
Feature activation-3.39e-02

Pos logit contributions
 needs+0.050
 need+0.047
 policy+0.044
 harm+0.043
 fear+0.043
Neg logit contributions
J-0.096
Ryan-0.092
T-0.091
Daniel-0.091
coat-0.090
Token boy
Token ID2933
Feature activation-2.08e-02

Pos logit contributions
 harm+0.046
 policy+0.046
 need+0.044
 earthquake+0.044
 needs+0.044
Neg logit contributions
coat-0.061
 guitars-0.061
Ryan-0.060
aults-0.060
T-0.060
Token.
Token ID13
Feature activation-2.47e-02

Pos logit contributions
 decision+0.029
 choice+0.028
 policy+0.028
 fate+0.027
 harm+0.027
Neg logit contributions
 guitars-0.037
T-0.037
coat-0.037
aults-0.036
 skirts-0.036
Token He
Token ID679
Feature activation-2.33e-02

Pos logit contributions
 need+0.021
 needs+0.021
 harm+0.020
 policy+0.019
 choice+0.017
Neg logit contributions
coat-0.059
aults-0.059
 apr-0.059
Aud-0.058
Ol-0.058
TokenCharlie
Token ID37136
Feature activation-5.81e-03

Pos logit contributions
 needs+0.005
 need+0.005
 harm+0.005
 policy+0.005
 choice+0.004
Neg logit contributions
coat-0.005
Ol-0.005
aults-0.005
Aud-0.005
 apr-0.005
Token was
Token ID373
Feature activation-4.33e-03

Pos logit contributions
 need+0.008
 needs+0.008
 harm+0.008
 policy+0.008
 choice+0.008
Neg logit contributions
coat-0.010
aults-0.010
 apr-0.010
Ol-0.010
Aud-0.010
Token a
Token ID257
Feature activation-3.22e-03

Pos logit contributions
 need+0.006
 needs+0.006
 harm+0.006
 policy+0.006
 day+0.006
Neg logit contributions
aults-0.008
coat-0.008
 apr-0.008
Ol-0.008
Aud-0.008
Token three
Token ID1115
Feature activation-3.44e-02

Pos logit contributions
 need+0.005
 needs+0.005
 day+0.004
 night+0.004
 animal+0.004
Neg logit contributions
Aud-0.006
coat-0.006
aults-0.006
 apr-0.006
Ol-0.006
Token-
Token ID12
Feature activation-2.40e-02

Pos logit contributions
 policy+0.038
 harm+0.037
 needs+0.035
 need+0.035
 fate+0.033
Neg logit contributions
 apr-0.074
 guitars-0.072
aults-0.072
J-0.071
 skirts-0.071
Tokenyear
Token ID1941
Feature activation-9.53e-02

Pos logit contributions
 needs+0.022
 need+0.021
 policy+0.020
 harm+0.020
 choice+0.018
Neg logit contributions
coat-0.057
aults-0.057
Ol-0.056
Aud-0.056
 apr-0.056
Token-
Token ID12
Feature activation-3.89e-02

Pos logit contributions
 needs+0.138
 policy+0.136
 need+0.136
 harm+0.130
 means+0.129
Neg logit contributions
aults-0.155
 apr-0.152
coat-0.148
 shirts-0.147
 guitars-0.142
Tokenold
Token ID727
Feature activation-3.14e-02

Pos logit contributions
 needs+0.040
 need+0.038
 policy+0.036
 harm+0.035
 situations+0.034
Neg logit contributions
J-0.090
aults-0.088
coat-0.087
Daniel-0.087
Ryan-0.087
Token who
Token ID508
Feature activation-2.13e-02

Pos logit contributions
 harm+0.051
 policy+0.051
 need+0.050
 needs+0.049
 earthquake+0.049
Neg logit contributions
coat-0.049
 guitars-0.048
aults-0.048
Ryan-0.047
Daniel-0.047
Token loved
Token ID6151
Feature activation-2.19e-02

Pos logit contributions
 policy+0.039
 harm+0.039
 decisions+0.037
 decision+0.037
 choice+0.036
Neg logit contributions
Daniel-0.030
coat-0.030
Ryan-0.029
Ol-0.029
T-0.029
Token books
Token ID3835
Feature activation-3.31e-02

Pos logit contributions
 needs+0.032
 need+0.032
 policy+0.031
 harm+0.031
crow+0.029
Neg logit contributions
 apr-0.039
coat-0.039
 guitars-0.038
aults-0.038
 skirts-0.038
Token a
Token ID257
Feature activation-2.45e-03

Pos logit contributions
aults+0.006
coat+0.006
Ol+0.006
Aud+0.006
 apr+0.006
Neg logit contributions
 need-0.005
 needs-0.005
 harm-0.005
 policy-0.005
 day-0.005
Token gr
Token ID1036
Feature activation-3.34e-02

Pos logit contributions
 need+0.003
 needs+0.003
 day+0.003
 harm+0.003
 animal+0.003
Neg logit contributions
coat-0.005
aults-0.005
Aud-0.005
 apr-0.004
Ol-0.004
Tokenumpy
Token ID32152
Feature activation-2.59e-02

Pos logit contributions
 needs+0.046
 policy+0.046
 need+0.045
 harm+0.045
 choice+0.044
Neg logit contributions
coat-0.061
J-0.060
T-0.059
aults-0.059
ding-0.059
Token three
Token ID1115
Feature activation-3.64e-02

Pos logit contributions
 policy+0.039
 harm+0.039
 need+0.038
 needs+0.038
 situations+0.036
Neg logit contributions
coat-0.045
Ol-0.042
aults-0.042
Daniel-0.042
Ryan-0.042
Token-
Token ID12
Feature activation-2.83e-02

Pos logit contributions
 policy+0.040
 harm+0.036
 needs+0.036
 need+0.035
 belong+0.034
Neg logit contributions
 apr-0.076
 guitars-0.076
J-0.076
aults-0.076
 skirts-0.074
Tokenyear
Token ID1941
Feature activation-9.52e-02

Pos logit contributions
 needs+0.030
 need+0.029
 policy+0.029
 harm+0.028
 situations+0.027
Neg logit contributions
J-0.063
coat-0.063
Ol-0.062
T-0.062
Daniel-0.062
Token-
Token ID12
Feature activation-4.08e-02

Pos logit contributions
 need+0.145
 needs+0.145
 policy+0.142
 harm+0.141
 situations+0.135
Neg logit contributions
coat-0.149
 apr-0.143
 shirts-0.140
aults-0.140
 skirts-0.136
Tokenold
Token ID727
Feature activation-3.15e-02

Pos logit contributions
 needs+0.044
 need+0.042
 policy+0.040
 harm+0.039
 situations+0.038
Neg logit contributions
J-0.094
coat-0.090
Daniel-0.090
Ryan-0.090
T-0.089
Token.
Token ID13
Feature activation-2.60e-02

Pos logit contributions
 harm+0.043
 policy+0.043
 need+0.042
 needs+0.042
 earthquake+0.041
Neg logit contributions
coat-0.058
aults-0.056
Ol-0.056
Daniel-0.055
Ryan-0.055
Token She
Token ID1375
Feature activation-2.63e-02

Pos logit contributions
 harm+0.040
 policy+0.039
 needs+0.038
 need+0.038
 choice+0.036
Neg logit contributions
V-0.044
coat-0.044
La-0.043
J-0.043
Ban-0.043
Token had
Token ID550
Feature activation+1.39e-02

Pos logit contributions
 policy+0.045
 harm+0.044
 decisions+0.044
 need+0.042
 situations+0.042
Neg logit contributions
coat-0.040
La-0.039
 apr-0.039
Daniel-0.039
Ol-0.039
Token was
Token ID373
Feature activation+5.63e-04

Pos logit contributions
aults+0.019
coat+0.019
Aud+0.019
 apr+0.019
Ol+0.019
Neg logit contributions
 needs-0.017
 need-0.017
 harm-0.016
 day-0.015
 policy-0.015
Token an
Token ID281
Feature activation-2.32e-02

Pos logit contributions
aults+0.001
coat+0.001
Ol+0.001
Aud+0.001
 apr+0.001
Neg logit contributions
 need-0.001
 needs-0.001
 harm-0.001
 policy-0.001
 day-0.001
Token eager
Token ID11069
Feature activation-1.85e-02

Pos logit contributions
 policy+0.034
 needs+0.034
 harm+0.034
 need+0.033
ment+0.032
Neg logit contributions
 apr-0.040
coat-0.038
aults-0.038
 guitars-0.037
 skirts-0.037
Token 3
Token ID513
Feature activation-3.80e-02

Pos logit contributions
 harm+0.027
 needs+0.027
 policy+0.026
 situations+0.025
 belongs+0.025
Neg logit contributions
coat-0.033
 apr-0.030
 guitars-0.030
aults-0.029
ding-0.029
Token-
Token ID12
Feature activation-3.73e-02

Pos logit contributions
 needs+0.048
 policy+0.046
 need+0.046
 harm+0.045
 belong+0.044
Neg logit contributions
J-0.072
 apr-0.072
 guitars-0.071
aults-0.068
T-0.066
Tokenyear
Token ID1941
Feature activation-9.50e-02

Pos logit contributions
 needs+0.045
 need+0.042
 situations+0.040
 policy+0.040
 decision+0.040
Neg logit contributions
J-0.083
T-0.079
Ryan-0.079
Ol-0.078
Daniel-0.078
Token-
Token ID12
Feature activation-4.71e-02

Pos logit contributions
 need+0.142
 needs+0.140
 harm+0.137
 policy+0.135
 means+0.130
Neg logit contributions
aults-0.148
 apr-0.147
coat-0.145
 shirts-0.143
 guitars-0.141
Tokenold
Token ID727
Feature activation-3.44e-02

Pos logit contributions
 needs+0.062
 need+0.057
 fear+0.055
 policy+0.054
 situations+0.053
Neg logit contributions
J-0.102
T-0.096
Ryan-0.095
Daniel-0.094
aults-0.094
Token.
Token ID13
Feature activation-2.45e-02

Pos logit contributions
 harm+0.052
 policy+0.051
 earthquake+0.050
 needs+0.049
 need+0.049
Neg logit contributions
Ryan-0.056
coat-0.056
 guitars-0.056
T-0.056
aults-0.056
Token He
Token ID679
Feature activation-2.36e-02

Pos logit contributions
 harm+0.025
 needs+0.024
 need+0.024
 policy+0.023
 choice+0.021
Neg logit contributions
coat-0.054
 apr-0.053
V-0.053
Ol-0.053
Daniel-0.052
Token liked
Token ID8288
Feature activation-2.13e-03

Pos logit contributions
 need+0.036
 harm+0.036
 needs+0.036
 policy+0.036
 decisions+0.034
Neg logit contributions
coat-0.039
 apr-0.039
aults-0.039
Ol-0.038
Aud-0.038
Token a
Token ID257
Feature activation-3.06e-03

Pos logit contributions
 needs+0.006
 need+0.006
 harm+0.006
 policy+0.006
 choice+0.006
Neg logit contributions
coat-0.007
aults-0.007
 apr-0.007
Ol-0.007
Aud-0.007
Token very
Token ID845
Feature activation-4.91e-02

Pos logit contributions
 need+0.004
 needs+0.004
 day+0.004
 harm+0.004
 animal+0.004
Neg logit contributions
coat-0.006
Aud-0.006
aults-0.006
 apr-0.006
Ol-0.006
Token gifted
Token ID22527
Feature activation-2.71e-02

Pos logit contributions
 policy+0.093
 harm+0.093
 situations+0.092
 decisions+0.088
crow+0.087
Neg logit contributions
J-0.061
 apr-0.061
Daniel-0.060
coat-0.060
aults-0.059
Token three
Token ID1115
Feature activation-3.37e-02

Pos logit contributions
 harm+0.042
 needs+0.042
 policy+0.042
 need+0.041
 situations+0.039
Neg logit contributions
coat-0.045
aults-0.043
 apr-0.042
Ryan-0.042
Daniel-0.042
Token-
Token ID12
Feature activation-3.52e-02

Pos logit contributions
 policy+0.040
 harm+0.039
 need+0.036
 belong+0.036
 needs+0.036
Neg logit contributions
 apr-0.067
 guitars-0.064
 shirts-0.064
 skirts-0.063
aults-0.063
Tokenyear
Token ID1941
Feature activation-9.49e-02

Pos logit contributions
 needs+0.041
 need+0.039
 decision+0.038
 policy+0.037
 situations+0.037
Neg logit contributions
J-0.078
T-0.075
Ryan-0.074
Daniel-0.074
Ol-0.074
Token-
Token ID12
Feature activation-4.44e-02

Pos logit contributions
 need+0.149
 needs+0.146
 harm+0.144
 policy+0.140
 means+0.140
Neg logit contributions
 apr-0.128
coat-0.125
 shirts-0.124
aults-0.122
udd-0.121
Tokenold
Token ID727
Feature activation-3.72e-02

Pos logit contributions
 needs+0.057
 need+0.053
 fear+0.052
 earthquake+0.051
 harm+0.051
Neg logit contributions
J-0.094
T-0.088
Ryan-0.088
Daniel-0.088
Ol-0.085
Token.
Token ID13
Feature activation-2.69e-02

Pos logit contributions
 earthquake+0.053
 harm+0.053
 policy+0.052
 need+0.050
 fate+0.049
Neg logit contributions
 guitars-0.064
Ryan-0.063
T-0.063
Daniel-0.063
aults-0.063
Token He
Token ID679
Feature activation-2.15e-02

Pos logit contributions
 harm+0.034
 policy+0.031
 needs+0.031
 earthquake+0.030
 need+0.029
Neg logit contributions
Daniel-0.053
Ban-0.053
V-0.053
 apr-0.052
coat-0.051
Token loved
Token ID6151
Feature activation-1.69e-02

Pos logit contributions
 harm+0.032
 need+0.031
 needs+0.031
 policy+0.031
 decisions+0.030
Neg logit contributions
coat-0.037
 apr-0.036
Ol-0.036
aults-0.036
Daniel-0.036
Token above
Token ID2029
Feature activation+1.34e-02

Pos logit contributions
Ol+0.040
 apr+0.040
V+0.039
T+0.039
Ryan+0.039
Neg logit contributions
 of-0.019
 day-0.017
 need-0.016
 night-0.016
 thing-0.016
Token the
Token ID262
Feature activation+1.34e-02

Pos logit contributions
coat+0.029
Ol+0.029
aults+0.029
Aud+0.029
 apr+0.028
Neg logit contributions
 need-0.015
 needs-0.014
 harm-0.014
 day-0.013
 of-0.013
Token mountains
Token ID12269
Feature activation+1.45e-02

Pos logit contributions
aults+0.022
 apr+0.022
coat+0.022
La+0.022
Aud+0.021
Neg logit contributions
 need-0.021
 policy-0.021
 needs-0.020
 harm-0.020
 choice-0.019
Token and
Token ID290
Feature activation+5.43e-03

Pos logit contributions
coat+0.028
aults+0.027
 apr+0.027
Ol+0.027
Aud+0.027
Neg logit contributions
 need-0.019
 needs-0.019
 harm-0.018
 policy-0.018
 choice-0.017
Token rivers
Token ID18180
Feature activation+3.91e-02

Pos logit contributions
 apr+0.010
coat+0.010
Ol+0.010
aults+0.009
Daniel+0.009
Neg logit contributions
 need-0.008
 needs-0.008
 day-0.008
 harm-0.008
 night-0.007
Token,
Token ID11
Feature activation+5.58e-02

Pos logit contributions
Ol+0.085
 apr+0.084
V+0.083
Ryan+0.083
Aud+0.083
Neg logit contributions
 of-0.042
 need-0.042
 harm-0.041
 needs-0.040
 day-0.039
Token and
Token ID290
Feature activation+1.22e-02

Pos logit contributions
 apr+0.108
V+0.107
Ol+0.107
Ryan+0.106
T+0.103
Neg logit contributions
 day-0.075
 of-0.074
 night-0.071
 need-0.069
 harm-0.067
Token even
Token ID772
Feature activation-1.14e-02

Pos logit contributions
coat+0.021
aults+0.021
 apr+0.021
Ol+0.021
Aud+0.020
Neg logit contributions
 need-0.019
 needs-0.019
 harm-0.018
 policy-0.017
 day-0.017
Token saw
Token ID2497
Feature activation+6.74e-03

Pos logit contributions
 policy+0.018
 needs+0.018
 need+0.017
 harm+0.017
 choice+0.017
Neg logit contributions
aults-0.019
coat-0.019
Aud-0.019
La-0.019
 guitars-0.019
Token some
Token ID617
Feature activation-2.96e-04

Pos logit contributions
coat+0.012
aults+0.012
 apr+0.012
Ol+0.012
Aud+0.012
Neg logit contributions
 need-0.010
 needs-0.010
 harm-0.010
 policy-0.009
 choice-0.009
Token animals
Token ID4695
Feature activation+2.38e-02

Pos logit contributions
 needs+0.000
 need+0.000
 policy+0.000
 harm+0.000
 choice+0.000
Neg logit contributions
aults-0.000
coat-0.000
 apr-0.000
Aud-0.000
Ol-0.000
Token nodded
Token ID14464
Feature activation+6.10e-03

Pos logit contributions
Aud+0.053
coat+0.053
 apr+0.052
aults+0.052
La+0.051
Neg logit contributions
 need-0.035
 needs-0.034
 of-0.031
 animal-0.030
 fear-0.030
Token again
Token ID757
Feature activation-1.99e-02

Pos logit contributions
coat+0.015
aults+0.015
Aud+0.015
Ol+0.015
 apr+0.015
Neg logit contributions
 need-0.004
 needs-0.004
 harm-0.004
 policy-0.004
 of-0.004
Token.
Token ID13
Feature activation-1.44e-02

Pos logit contributions
 needs+0.028
 situations+0.027
 earthquake+0.027
crow+0.027
body+0.026
Neg logit contributions
J-0.035
aults-0.035
T-0.035
La-0.034
king-0.034
Token They
Token ID1119
Feature activation+2.04e-02

Pos logit contributions
 need+0.014
 needs+0.014
 policy+0.013
 harm+0.012
crow+0.012
Neg logit contributions
aults-0.033
coat-0.033
Ol-0.032
 apr-0.032
Daniel-0.032
Token said
Token ID531
Feature activation+1.29e-02

Pos logit contributions
aults+0.040
 apr+0.040
coat+0.040
Aud+0.039
Ol+0.039
Neg logit contributions
 need-0.027
 needs-0.026
 of-0.024
 animal-0.024
 harm-0.023
Token,
Token ID11
Feature activation+6.34e-02

Pos logit contributions
coat+0.026
aults+0.026
 apr+0.026
Ol+0.026
Aud+0.026
Neg logit contributions
 need-0.015
 needs-0.015
 harm-0.015
 policy-0.015
 choice-0.014
Token "
Token ID366
Feature activation+4.46e-02

Pos logit contributions
aults+0.148
coat+0.136
Ryan+0.132
 apr+0.132
Daniel+0.132
Neg logit contributions
 of-0.067
 needs-0.066
 day-0.065
 need-0.063
 thing-0.059
TokenYes
Token ID5297
Feature activation-1.19e-02

Pos logit contributions
coat+0.072
 apr+0.072
 skirts+0.071
aults+0.071
Daniel+0.067
Neg logit contributions
 needs-0.072
 need-0.071
 harm-0.068
 animal-0.068
 policy-0.068
Token,
Token ID11
Feature activation+4.45e-02

Pos logit contributions
 need+0.011
 needs+0.011
 of+0.010
 policy+0.010
 choice+0.010
Neg logit contributions
Aud-0.028
coat-0.028
aults-0.028
Ol-0.028
 apr-0.027
Token we
Token ID356
Feature activation+1.43e-02

Pos logit contributions
Aud+0.075
coat+0.074
Ol+0.073
aults+0.073
 apr+0.072
Neg logit contributions
 needs-0.068
 harm-0.068
 need-0.068
 situations-0.066
 policy-0.064
Token want
Token ID765
Feature activation-1.20e-03

Pos logit contributions
coat+0.031
 apr+0.030
aults+0.030
Ol+0.030
Aud+0.030
Neg logit contributions
 need-0.017
 needs-0.016
 harm-0.015
 policy-0.015
 choice-0.014
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
 needs+0.038
 need+0.038
 policy+0.038
 harm+0.037
 decisions+0.035
Neg logit contributions
coat-0.053
aults-0.051
 apr-0.050
Aud-0.050
Daniel-0.049
Token
Token ID198
Feature activation+1.06e-02

Pos logit contributions
 harm+0.013
 needs+0.013
 need+0.012
 policy+0.012
 choice+0.012
Neg logit contributions
coat-0.021
Ol-0.021
Aud-0.021
 apr-0.021
Daniel-0.021
Token
Token ID198
Feature activation+1.64e-03

Pos logit contributions
aults+0.019
coat+0.019
 apr+0.019
Aud+0.018
Ol+0.018
Neg logit contributions
 need-0.016
 needs-0.015
 harm-0.015
 policy-0.014
 of-0.014
Token"
Token ID1
Feature activation+3.49e-02

Pos logit contributions
coat+0.002
 apr+0.002
aults+0.002
 skirts+0.002
 guitars+0.002
Neg logit contributions
 need-0.003
crow-0.003
 of-0.003
 needs-0.003
 day-0.003
TokenHelp
Token ID22087
Feature activation-1.33e-02

Pos logit contributions
udd+0.059
coat+0.058
aults+0.058
aj+0.056
 skirts+0.055
Neg logit contributions
crow-0.057
 animal-0.053
 thing-0.052
igator-0.051
 day-0.051
Token,
Token ID11
Feature activation+6.34e-02

Pos logit contributions
 need+0.016
 needs+0.016
 harm+0.016
 policy+0.015
crow+0.014
Neg logit contributions
coat-0.027
aults-0.027
Aud-0.027
 apr-0.027
Ol-0.026
Token help
Token ID1037
Feature activation-4.78e-03

Pos logit contributions
Ol+0.109
aults+0.106
coat+0.105
 apr+0.105
V+0.105
Neg logit contributions
 harm-0.093
 needs-0.091
 situations-0.090
 policy-0.090
 decisions-0.089
Token!"
Token ID2474
Feature activation+3.13e-02

Pos logit contributions
 need+0.007
 needs+0.007
 harm+0.007
 policy+0.006
 choice+0.006
Neg logit contributions
coat-0.009
aults-0.009
 apr-0.009
Ol-0.009
Aud-0.009
Token Tom
Token ID4186
Feature activation+1.53e-02

Pos logit contributions
Ol+0.045
 apr+0.044
Daniel+0.043
T+0.042
J+0.042
Neg logit contributions
 policy-0.057
 harm-0.056
 needs-0.056
 decision-0.055
 decisions-0.053
Token and
Token ID290
Feature activation+1.30e-02

Pos logit contributions
coat+0.024
Ol+0.023
aults+0.023
T+0.022
 apr+0.022
Neg logit contributions
 policy-0.027
 needs-0.026
 harm-0.025
 need-0.025
 earthquake-0.024
Token Mia
Token ID32189
Feature activation+2.00e-02

Pos logit contributions
Ol+0.017
coat+0.016
Aud+0.016
V+0.016
J+0.016
Neg logit contributions
 policy-0.026
 needs-0.025
 decisions-0.025
 harm-0.025
 need-0.025
Token.
Token ID13
Feature activation-7.37e-03

Pos logit contributions
 need+0.023
 needs+0.022
 policy+0.021
 harm+0.021
 earthquake+0.020
Neg logit contributions
aults-0.042
Daniel-0.041
J-0.040
 skirts-0.040
 guitars-0.040
Token They
Token ID1119
Feature activation+1.61e-02

Pos logit contributions
 need+0.009
 needs+0.009
 policy+0.008
 harm+0.008
 of+0.008
Neg logit contributions
coat-0.016
aults-0.016
 apr-0.015
Ol-0.015
Aud-0.015
Token hugged
Token ID41130
Feature activation+3.43e-03

Pos logit contributions
coat+0.029
 apr+0.028
Aud+0.028
aults+0.028
Ol+0.028
Neg logit contributions
 need-0.024
 needs-0.024
 harm-0.022
 of-0.021
 day-0.021
Token their
Token ID511
Feature activation+1.23e-02

Pos logit contributions
 apr+0.006
aults+0.006
coat+0.006
Ol+0.005
Aud+0.005
Neg logit contributions
 needs-0.006
 need-0.006
 harm-0.005
 policy-0.005
 choice-0.005
Token dad
Token ID9955
Feature activation+1.10e-02

Pos logit contributions
coat+0.021
Aud+0.021
Ol+0.020
aults+0.020
V+0.020
Neg logit contributions
 needs-0.020
 need-0.019
 harm-0.019
 policy-0.018
 choice-0.018
Token,
Token ID11
Feature activation+7.30e-02

Pos logit contributions
aults+0.019
coat+0.019
 apr+0.019
Aud+0.018
Daniel+0.018
Neg logit contributions
 need-0.017
 harm-0.017
 needs-0.017
 policy-0.016
 earthquake-0.016
Token and
Token ID290
Feature activation+2.16e-02

Pos logit contributions
aults+0.144
coat+0.143
V+0.142
Aud+0.139
Ol+0.139
Neg logit contributions
 need-0.097
 needs-0.093
 of-0.091
 harm-0.089
 animal-0.085
Token thanked
Token ID26280
Feature activation-2.11e-04

Pos logit contributions
 apr+0.030
 guitars+0.029
 skirts+0.029
Aud+0.028
aults+0.028
Neg logit contributions
 harm-0.041
 need-0.040
 policy-0.040
 situations-0.040
 needs-0.040
Token him
Token ID683
Feature activation+1.38e-03

Pos logit contributions
 needs+0.000
 need+0.000
 harm+0.000
 policy+0.000
 decision+0.000
Neg logit contributions
aults-0.000
 apr-0.000
Ol-0.000
coat-0.000
Daniel-0.000
Token for
Token ID329
Feature activation-4.93e-03

Pos logit contributions
coat+0.003
aults+0.003
 apr+0.003
Ol+0.003
Aud+0.003
Neg logit contributions
 need-0.001
 needs-0.001
 harm-0.001
 policy-0.001
 choice-0.001
Token the
Token ID262
Feature activation+2.53e-02

Pos logit contributions
 needs+0.007
 need+0.007
 harm+0.007
 policy+0.007
 choice+0.007
Neg logit contributions
 apr-0.009
aults-0.009
Ol-0.009
coat-0.009
Aud-0.009
Token it
Token ID340
Feature activation-1.21e-04

Pos logit contributions
coat+0.026
aults+0.025
Ol+0.025
 apr+0.025
Aud+0.025
Neg logit contributions
 need-0.010
 needs-0.010
 harm-0.009
 of-0.009
 thing-0.009
Token at
Token ID379
Feature activation-2.47e-02

Pos logit contributions
 need+0.000
 needs+0.000
 harm+0.000
 policy+0.000
 choice+0.000
Neg logit contributions
coat-0.000
aults-0.000
 apr-0.000
Ol-0.000
Aud-0.000
Token them
Token ID606
Feature activation+2.43e-03

Pos logit contributions
 needs+0.040
 policy+0.038
 need+0.037
 decisions+0.035
 decision+0.035
Neg logit contributions
 apr-0.041
J-0.040
T-0.040
Ol-0.040
coat-0.040
Token and
Token ID290
Feature activation-1.13e-03

Pos logit contributions
coat+0.005
aults+0.005
 apr+0.005
Ol+0.005
Aud+0.005
Neg logit contributions
 need-0.003
 needs-0.003
 harm-0.003
 policy-0.003
 choice-0.003
Token said
Token ID531
Feature activation+1.68e-02

Pos logit contributions
 need+0.002
 needs+0.002
 policy+0.002
 harm+0.002
 decisions+0.002
Neg logit contributions
aults-0.002
coat-0.002
Aud-0.002
 apr-0.002
Ol-0.002
Token,
Token ID11
Feature activation+6.26e-02

Pos logit contributions
coat+0.039
aults+0.039
 apr+0.038
Aud+0.038
Ol+0.038
Neg logit contributions
 needs-0.016
 need-0.016
 of-0.015
 harm-0.015
 policy-0.015
Token "
Token ID366
Feature activation+3.71e-02

Pos logit contributions
aults+0.128
coat+0.124
Aud+0.121
 apr+0.120
Ryan+0.119
Neg logit contributions
 needs-0.080
 need-0.078
 of-0.073
 day-0.073
 harm-0.070
TokenA
Token ID32
Feature activation+8.33e-03

Pos logit contributions
 apr+0.065
coat+0.062
 skirts+0.059
aults+0.059
aj+0.058
Neg logit contributions
crow-0.060
 harm-0.057
 need-0.055
 needs-0.055
 day-0.054
Tokenbr
Token ID1671
Feature activation+4.17e-02

Pos logit contributions
coat+0.012
aults+0.012
Ol+0.012
Aud+0.012
 apr+0.012
Neg logit contributions
 needs-0.014
 need-0.014
 harm-0.014
 policy-0.014
 decisions-0.013
Tokenac
Token ID330
Feature activation-2.92e-02

Pos logit contributions
coat+0.074
aults+0.074
 apr+0.074
Ol+0.073
Aud+0.072
Neg logit contributions
 need-0.060
 needs-0.060
 harm-0.060
 policy-0.058
 earthquake-0.056
Tokenad
Token ID324
Feature activation-2.64e-03

Pos logit contributions
 needs+0.047
 policy+0.046
 need+0.045
 night+0.042
 harm+0.041
Neg logit contributions
J-0.046
coat-0.044
V-0.044
La-0.043
Daniel-0.043
Token't
Token ID470
Feature activation-1.54e-02

Pos logit contributions
aults+0.009
V+0.009
Ol+0.009
Aud+0.008
Max+0.008
Neg logit contributions
 policy-0.011
 needs-0.011
 harm-0.011
 situations-0.011
ment-0.011
Token worry
Token ID5490
Feature activation+1.10e-02

Pos logit contributions
crow+0.024
ment+0.024
ore+0.023
 needs+0.023
 policy+0.023
Neg logit contributions
aults-0.026
 guitars-0.024
coat-0.024
 skirts-0.024
 apr-0.024
Token,
Token ID11
Feature activation+5.26e-02

Pos logit contributions
coat+0.021
aults+0.021
 apr+0.021
 skirts+0.021
Aud+0.021
Neg logit contributions
 need-0.014
 harm-0.013
 needs-0.013
 policy-0.013
crow-0.012
Token your
Token ID534
Feature activation-4.38e-03

Pos logit contributions
aults+0.093
 apr+0.090
 guitars+0.090
coat+0.090
Aud+0.090
Neg logit contributions
 needs-0.074
 need-0.074
 harm-0.074
 policy-0.073
 choice-0.070
Token blocks
Token ID7021
Feature activation+1.13e-02

Pos logit contributions
 need+0.007
 needs+0.006
 harm+0.006
 policy+0.006
crow+0.006
Neg logit contributions
coat-0.008
 apr-0.007
aults-0.007
Ol-0.007
 guitars-0.007
Token are
Token ID389
Feature activation+5.83e-03

Pos logit contributions
coat+0.018
aults+0.017
ding+0.016
 apr+0.016
 skirts+0.015
Neg logit contributions
 policy-0.018
crow-0.018
 deeds-0.017
 situations-0.017
 animal-0.017
Token safe
Token ID3338
Feature activation+5.91e-03

Pos logit contributions
aults+0.010
coat+0.009
 apr+0.009
Aud+0.009
 skirts+0.009
Neg logit contributions
 needs-0.009
 need-0.009
 policy-0.009
 harm-0.009
crow-0.009
Token.
Token ID13
Feature activation-3.82e-03

Pos logit contributions
coat+0.011
aults+0.011
 apr+0.010
 skirts+0.010
ding+0.010
Neg logit contributions
 need-0.008
crow-0.008
 needs-0.008
 policy-0.007
 harm-0.007
Token You
Token ID921
Feature activation+1.31e-02

Pos logit contributions
 harm+0.005
 need+0.005
 decision+0.005
 needs+0.005
 choice+0.005
Neg logit contributions
V-0.007
Ol-0.007
Aud-0.007
 apr-0.007
T-0.007
Token can
Token ID460
Feature activation+2.04e-02

Pos logit contributions
aults+0.021
coat+0.020
Ol+0.020
 apr+0.019
Max+0.019
Neg logit contributions
 policy-0.021
ment-0.021
 situations-0.021
 decision-0.020
crow-0.020
Token design
Token ID1486
Feature activation+2.27e-02

Pos logit contributions
coat+0.037
aults+0.036
Ol+0.036
 apr+0.035
Daniel+0.035
Neg logit contributions
 needs-0.029
 policy-0.029
 need-0.028
 harm-0.028
 situations-0.027
Token check
Token ID2198
Feature activation+1.21e-03

Pos logit contributions
 apr+0.001
coat+0.001
V+0.001
aults+0.001
Ol+0.001
Neg logit contributions
 policy-0.001
 needs-0.001
 decisions-0.001
 deeds-0.001
 decision-0.001
Token the
Token ID262
Feature activation+2.75e-02

Pos logit contributions
 apr+0.002
 skirts+0.002
 guitars+0.002
aults+0.002
coat+0.002
Neg logit contributions
 needs-0.002
 decision-0.002
 need-0.002
 harm-0.002
 policy-0.002
Token kitchen
Token ID9592
Feature activation+1.05e-02

Pos logit contributions
 apr+0.050
coat+0.049
aults+0.049
 guitars+0.048
Aud+0.047
Neg logit contributions
 need-0.037
 needs-0.037
 policy-0.036
 harm-0.036
ment-0.034
Token.
Token ID13
Feature activation-4.41e-03

Pos logit contributions
 apr+0.022
aults+0.022
coat+0.022
 skirts+0.021
 shirts+0.021
Neg logit contributions
 policy-0.011
 need-0.011
 needs-0.011
 harm-0.011
crow-0.011
Token He
Token ID679
Feature activation+6.11e-03

Pos logit contributions
 need+0.005
 needs+0.005
 harm+0.005
 policy+0.005
crow+0.004
Neg logit contributions
aults-0.010
coat-0.009
 apr-0.009
Ol-0.009
Aud-0.009
Token said
Token ID531
Feature activation+1.42e-02

Pos logit contributions
aults+0.012
coat+0.012
Aud+0.012
 apr+0.011
Ol+0.011
Neg logit contributions
 needs-0.008
 need-0.008
 harm-0.008
 of-0.008
 animal-0.007
Token,
Token ID11
Feature activation+7.39e-02

Pos logit contributions
coat+0.030
aults+0.030
 apr+0.029
Aud+0.029
Ol+0.029
Neg logit contributions
 need-0.016
 needs-0.016
 harm-0.016
 policy-0.015
 of-0.014
Token "
Token ID366
Feature activation+4.50e-02

Pos logit contributions
aults+0.142
coat+0.134
 apr+0.128
Ryan+0.128
Aud+0.127
Neg logit contributions
 needs-0.109
 need-0.108
 of-0.102
 day-0.098
 harm-0.097
TokenGood
Token ID10248
Feature activation-1.31e-02

Pos logit contributions
coat+0.069
 skirts+0.066
aults+0.065
 apr+0.065
aj+0.061
Neg logit contributions
crow-0.081
 need-0.077
 day-0.076
 needs-0.076
 animal-0.075
Token job
Token ID1693
Feature activation+1.19e-02

Pos logit contributions
 harm+0.011
 need+0.010
 situations+0.009
 needs+0.009
 policy+0.009
Neg logit contributions
coat-0.031
 apr-0.030
ding-0.029
V-0.028
aults-0.028
Token,
Token ID11
Feature activation+6.01e-02

Pos logit contributions
coat+0.022
aults+0.022
 apr+0.021
Daniel+0.021
Ol+0.021
Neg logit contributions
 harm-0.016
 needs-0.016
 need-0.016
 policy-0.015
crow-0.015
Token you
Token ID345
Feature activation-4.84e-03

Pos logit contributions
 needs+0.024
 need+0.024
 policy+0.023
 harm+0.023
 decision+0.023
Neg logit contributions
Ol-0.039
 apr-0.039
 skirts-0.039
aults-0.039
Daniel-0.039
Token,
Token ID11
Feature activation+5.86e-02

Pos logit contributions
 harm+0.006
 needs+0.006
 need+0.005
 policy+0.005
crow+0.005
Neg logit contributions
aults-0.010
 apr-0.010
coat-0.010
T-0.009
 skirts-0.009
Token Lily
Token ID20037
Feature activation+1.66e-02

Pos logit contributions
coat+0.103
aults+0.102
Ol+0.102
Aud+0.101
 apr+0.101
Neg logit contributions
 need-0.090
 needs-0.088
 harm-0.084
 policy-0.084
 choice-0.082
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
coat+0.034
aults+0.033
 skirts+0.032
 apr+0.032
Aud+0.032
Neg logit contributions
 harm-0.019
 need-0.019
 policy-0.019
 needs-0.018
 situations-0.018
Token You
Token ID921
Feature activation+1.38e-02

Pos logit contributions
 harm+0.016
 need+0.016
 needs+0.016
 policy+0.015
 decision+0.015
Neg logit contributions
coat-0.027
Aud-0.027
Ol-0.027
V-0.027
 apr-0.027
Token did
Token ID750
Feature activation+2.93e-02

Pos logit contributions
coat+0.028
aults+0.027
Aud+0.027
 apr+0.027
Ol+0.027
Neg logit contributions
 need-0.018
 needs-0.018
 harm-0.016
 policy-0.016
 of-0.015
Token a
Token ID257
Feature activation+4.36e-02

Pos logit contributions
coat+0.062
Ol+0.059
Aud+0.059
 apr+0.058
aults+0.058
Neg logit contributions
 need-0.038
 needs-0.036
 harm-0.035
 choice-0.033
 wishes-0.032
Token good
Token ID922
Feature activation+3.08e-02

Pos logit contributions
coat+0.094
Ol+0.090
Aud+0.090
Daniel+0.087
 apr+0.087
Neg logit contributions
 need-0.056
 needs-0.055
 job-0.053
 thing-0.053
 day-0.052
Token job
Token ID1693
Feature activation+1.20e-02

Pos logit contributions
coat+0.072
V+0.071
 apr+0.070
T+0.070
La+0.068
Neg logit contributions
 harm-0.025
igator-0.025
crow-0.024
 situations-0.023
igators-0.022
Token.
Token ID13
Feature activation-1.58e-02

Pos logit contributions
coat+0.025
 apr+0.025
aults+0.024
 skirts+0.024
 guitars+0.024
Neg logit contributions
 needs-0.013
 need-0.013
 harm-0.013
 policy-0.012
crow-0.012
Token You
Token ID921
Feature activation+1.12e-02

Pos logit contributions
 harm+0.020
 needs+0.019
 need+0.019
 decision+0.018
 policy+0.018
Neg logit contributions
Aud-0.031
coat-0.031
V-0.031
Ol-0.031
 apr-0.031
Token were
Token ID547
Feature activation+1.45e-02

Pos logit contributions
coat+0.012
aults+0.012
Ol+0.012
 apr+0.012
Daniel+0.012
Neg logit contributions
 needs-0.010
 need-0.010
 harm-0.010
 policy-0.009
 choice-0.009
Token bright
Token ID6016
Feature activation+6.05e-03

Pos logit contributions
coat+0.028
Aud+0.028
Ol+0.027
 apr+0.027
aults+0.027
Neg logit contributions
 need-0.020
 of-0.020
 needs-0.019
 animal-0.019
 harm-0.019
Token colours
Token ID18915
Feature activation+1.92e-02

Pos logit contributions
Ol+0.011
Aud+0.011
coat+0.011
aults+0.011
 apr+0.011
Neg logit contributions
 need-0.009
 needs-0.009
 of-0.009
 harm-0.009
 animal-0.008
Token and
Token ID290
Feature activation+8.73e-03

Pos logit contributions
Ol+0.038
 apr+0.038
coat+0.038
aults+0.038
Aud+0.038
Neg logit contributions
 need-0.024
 needs-0.023
 harm-0.023
 policy-0.022
 of-0.021
Token magical
Token ID10883
Feature activation-4.98e-03

Pos logit contributions
coat+0.013
aults+0.013
Ol+0.013
 apr+0.013
Aud+0.013
Neg logit contributions
 need-0.016
 needs-0.015
 harm-0.015
 policy-0.015
 day-0.014
Token things
Token ID1243
Feature activation+8.71e-03

Pos logit contributions
 need+0.008
 needs+0.007
 policy+0.007
 harm+0.007
 earthquake+0.007
Neg logit contributions
 skirts-0.008
 guitars-0.008
aults-0.008
coat-0.008
 shirts-0.008
Token.
Token ID13
Feature activation-8.42e-03

Pos logit contributions
coat+0.019
aults+0.019
Aud+0.019
 apr+0.019
Daniel+0.019
Neg logit contributions
 needs-0.009
 need-0.009
 policy-0.008
 harm-0.008
 choice-0.008
Token
Token ID198
Feature activation+1.31e-02

Pos logit contributions
 needs+0.013
 harm+0.013
 need+0.013
 policy+0.013
 choice+0.012
Neg logit contributions
coat-0.014
Ol-0.014
 apr-0.014
V-0.014
Aud-0.014
Token
Token ID198
Feature activation+1.09e-03

Pos logit contributions
aults+0.025
coat+0.025
 apr+0.025
Aud+0.024
 skirts+0.024
Neg logit contributions
 need-0.018
 needs-0.018
 harm-0.017
 of-0.017
 policy-0.017
TokenA
Token ID32
Feature activation-8.32e-03

Pos logit contributions
 apr+0.001
 skirts+0.001
Aud+0.001
coat+0.001
 guitars+0.001
Neg logit contributions
 need-0.002
 of-0.002
 day-0.002
 needs-0.002
 harm-0.002
Token man
Token ID582
Feature activation-2.17e-02

Pos logit contributions
 need+0.011
 day+0.011
 needs+0.011
 animal+0.010
 harm+0.010
Neg logit contributions
 apr-0.017
coat-0.016
aults-0.016
Aud-0.016
 skirts-0.016
Token gave
Token ID2921
Feature activation-1.53e-02

Pos logit contributions
 needs+0.014
 need+0.014
 harm+0.013
 of+0.013
 policy+0.013
Neg logit contributions
aults-0.012
coat-0.011
 apr-0.011
Ol-0.011
Aud-0.011
Token them
Token ID606
Feature activation+1.45e-02

Pos logit contributions
 needs+0.027
 need+0.027
 policy+0.026
 harm+0.026
 decision+0.025
Neg logit contributions
coat-0.022
 apr-0.022
aults-0.022
Aud-0.022
Ol-0.022
Token the
Token ID262
Feature activation+2.77e-02

Pos logit contributions
coat+0.031
aults+0.030
Daniel+0.030
ding+0.030
Ryan+0.029
Neg logit contributions
 need-0.016
 of-0.016
 animal-0.016
 needs-0.016
 thing-0.015
Token ball
Token ID2613
Feature activation-1.32e-02

Pos logit contributions
aults+0.044
La+0.044
Daniel+0.043
T+0.043
Ol+0.043
Neg logit contributions
 need-0.049
 needs-0.048
 choice-0.045
 policy-0.045
 harm-0.045
Token and
Token ID290
Feature activation+1.15e-02

Pos logit contributions
 harm+0.021
 needs+0.021
 situations+0.021
 need+0.021
 policy+0.021
Neg logit contributions
coat-0.021
 skirts-0.020
 apr-0.020
aults-0.020
Aud-0.020
Token they
Token ID484
Feature activation+3.23e-02

Pos logit contributions
coat+0.018
 apr+0.018
aults+0.018
Aud+0.018
Ol+0.018
Neg logit contributions
 need-0.019
 needs-0.019
 harm-0.019
 policy-0.018
 choice-0.017
Token said
Token ID531
Feature activation+1.50e-02

Pos logit contributions
coat+0.058
Aud+0.058
Daniel+0.058
aults+0.057
Ol+0.057
Neg logit contributions
 need-0.046
 needs-0.045
 harm-0.042
 of-0.041
 day-0.041
Token goodbye
Token ID24829
Feature activation-3.82e-03

Pos logit contributions
coat+0.027
aults+0.027
 apr+0.027
Aud+0.027
Ol+0.027
Neg logit contributions
 need-0.021
 needs-0.021
 harm-0.021
 policy-0.020
 choice-0.019
Token.
Token ID13
Feature activation-1.05e-02

Pos logit contributions
 need+0.004
 needs+0.004
 policy+0.004
 harm+0.004
 situations+0.004
Neg logit contributions
coat-0.008
aults-0.008
 apr-0.008
Aud-0.008
T-0.007
Token They
Token ID1119
Feature activation+2.24e-02

Pos logit contributions
 need+0.011
 needs+0.011
 harm+0.010
 policy+0.010
 of+0.010
Neg logit contributions
aults-0.023
coat-0.023
Ol-0.023
 apr-0.023
Daniel-0.022
Token ran
Token ID4966
Feature activation+2.61e-02

Pos logit contributions
aults+0.041
coat+0.041
 apr+0.040
Ol+0.040
Aud+0.040
Neg logit contributions
 need-0.032
 needs-0.031
 harm-0.030
 of-0.029
 day-0.028
Token
Token ID198
Feature activation-1.53e-03

Pos logit contributions
aults+0.014
coat+0.014
 apr+0.014
Aud+0.014
 skirts+0.014
Neg logit contributions
 need-0.010
 needs-0.010
 harm-0.010
 of-0.009
 day-0.009
TokenAs
Token ID1722
Feature activation-2.87e-02

Pos logit contributions
 of+0.003
 day+0.003
 need+0.003
 policy+0.003
 harm+0.003
Neg logit contributions
 skirts-0.002
 apr-0.002
coat-0.002
 wrists-0.002
Aud-0.002
Token the
Token ID262
Feature activation-1.43e-02

Pos logit contributions
 needs+0.045
 need+0.043
 harm+0.042
 situations+0.041
 decisions+0.041
Neg logit contributions
aults-0.048
T-0.047
coat-0.047
Ol-0.047
J-0.047
Token show
Token ID905
Feature activation-1.95e-02

Pos logit contributions
 needs+0.019
 need+0.019
 harm+0.018
 policy+0.018
ment+0.018
Neg logit contributions
 apr-0.028
coat-0.027
 guitars-0.027
aults-0.026
Ryan-0.026
Token ended
Token ID4444
Feature activation-3.30e-02

Pos logit contributions
 harm+0.027
 need+0.026
 needs+0.026
 policy+0.026
 situations+0.025
Neg logit contributions
coat-0.035
aults-0.035
 apr-0.034
 guitars-0.034
Daniel-0.034
Token,
Token ID11
Feature activation+6.62e-04

Pos logit contributions
 needs+0.046
 need+0.046
 policy+0.043
 harm+0.043
crow+0.041
Neg logit contributions
aults-0.062
coat-0.061
 apr-0.061
Daniel-0.059
 guitars-0.059
Token Tim
Token ID5045
Feature activation+1.17e-02

Pos logit contributions
 apr+0.001
Ol+0.001
coat+0.001
Max+0.001
aults+0.001
Neg logit contributions
 needs-0.001
 harm-0.001
 need-0.001
 policy-0.001
 situations-0.001
Tokenmy
Token ID1820
Feature activation-1.06e-02

Pos logit contributions
aults+0.015
coat+0.015
 apr+0.015
Aud+0.015
Ol+0.015
Neg logit contributions
 needs-0.023
 need-0.023
 harm-0.022
 policy-0.022
 choice-0.021
Token's
Token ID338
Feature activation-2.30e-02

Pos logit contributions
 needs+0.015
 need+0.014
 harm+0.014
 day+0.013
 of+0.013
Neg logit contributions
coat-0.020
Aud-0.020
aults-0.020
 apr-0.020
Ol-0.020
Token mom
Token ID1995
Feature activation+2.07e-02

Pos logit contributions
 needs+0.037
 need+0.037
 harm+0.036
 policy+0.035
crow+0.034
Neg logit contributions
coat-0.037
 apr-0.037
aults-0.037
Ol-0.036
Aud-0.036
Tokenmy
Token ID1820
Feature activation+9.62e-03

Pos logit contributions
aults+0.036
Aud+0.036
 apr+0.035
Ol+0.034
 guitars+0.034
Neg logit contributions
 need-0.033
 needs-0.032
 of-0.032
 harm-0.031
 day-0.030
Token the
Token ID262
Feature activation+2.87e-02

Pos logit contributions
aults+0.002
 apr+0.002
coat+0.002
Ol+0.002
Aud+0.002
Neg logit contributions
 needs-0.001
 need-0.001
 policy-0.001
 harm-0.001
 choice-0.001
Token water
Token ID1660
Feature activation+1.88e-02

Pos logit contributions
coat+0.052
aults+0.052
 apr+0.051
Aud+0.051
Ol+0.051
Neg logit contributions
 need-0.041
 needs-0.041
 harm-0.040
 policy-0.039
 choice-0.037
Token.
Token ID13
Feature activation-2.67e-03

Pos logit contributions
aults+0.038
coat+0.038
 skirts+0.037
 guitars+0.037
 apr+0.037
Neg logit contributions
 policy-0.022
 needs-0.021
 harm-0.021
 need-0.020
 earthquake-0.019
Token
Token ID198
Feature activation+1.92e-02

Pos logit contributions
 need+0.003
 needs+0.003
 of+0.003
 policy+0.003
crow+0.003
Neg logit contributions
aults-0.006
coat-0.006
Ol-0.005
 apr-0.005
Aud-0.005
Token
Token ID198
Feature activation+4.79e-03

Pos logit contributions
aults+0.040
coat+0.039
 apr+0.038
 skirts+0.038
Aud+0.037
Neg logit contributions
 need-0.025
 of-0.024
 needs-0.023
 harm-0.023
 day-0.023
TokenOne
Token ID3198
Feature activation-4.07e-02

Pos logit contributions
coat+0.008
aults+0.008
 apr+0.007
 skirts+0.007
Ol+0.007
Neg logit contributions
 need-0.008
 day-0.008
 of-0.008
 needs-0.008
crow-0.008
Token day
Token ID1110
Feature activation+1.59e-02

Pos logit contributions
 situations+0.044
 decisions+0.043
 policy+0.043
 harm+0.042
 deeds+0.041
Neg logit contributions
Ben-0.083
J-0.079
ding-0.077
La-0.077
coat-0.077
Token,
Token ID11
Feature activation+1.21e-02

Pos logit contributions
coat+0.036
Aud+0.036
aults+0.036
Ol+0.036
 apr+0.036
Neg logit contributions
 need-0.016
 needs-0.016
 policy-0.015
 of-0.014
 day-0.014
Token Tom
Token ID4186
Feature activation-5.54e-03

Pos logit contributions
coat+0.020
aults+0.020
Ol+0.020
Aud+0.020
 apr+0.020
Neg logit contributions
 need-0.019
 needs-0.019
 policy-0.018
 harm-0.018
 day-0.017
Token has
Token ID468
Feature activation+5.59e-03

Pos logit contributions
 need+0.008
 harm+0.008
 needs+0.008
 policy+0.008
 choice+0.007
Neg logit contributions
coat-0.010
aults-0.010
 apr-0.009
Ol-0.009
Aud-0.009
Token a
Token ID257
Feature activation+2.90e-03

Pos logit contributions
coat+0.011
aults+0.011
Ol+0.011
Aud+0.011
 apr+0.011
Neg logit contributions
 need-0.007
 needs-0.007
 harm-0.007
 day-0.006
 policy-0.006
Token mighty
Token ID18680
Feature activation-6.38e-03

Pos logit contributions
aults+0.018
coat+0.018
 apr+0.017
Ol+0.017
Aud+0.017
Neg logit contributions
 needs-0.015
 need-0.015
 harm-0.015
 policy-0.014
crow-0.014
Token jump
Token ID4391
Feature activation-1.81e-03

Pos logit contributions
 needs+0.009
 policy+0.009
crow+0.009
ment+0.009
 harm+0.009
Neg logit contributions
aults-0.011
coat-0.011
 apr-0.011
 guitars-0.011
 skirts-0.010
Token over
Token ID625
Feature activation-9.15e-03

Pos logit contributions
 needs+0.002
 need+0.002
 policy+0.002
 harm+0.002
 choice+0.002
Neg logit contributions
coat-0.004
aults-0.004
 apr-0.003
Ol-0.003
Aud-0.003
Token a
Token ID257
Feature activation+1.30e-02

Pos logit contributions
 needs+0.013
 policy+0.013
 need+0.013
 harm+0.012
 decision+0.012
Neg logit contributions
 apr-0.016
aults-0.016
 skirts-0.016
 guitars-0.016
Daniel-0.016
Token big
Token ID1263
Feature activation+5.54e-03

Pos logit contributions
coat+0.022
aults+0.022
 apr+0.022
Ol+0.022
Aud+0.022
Neg logit contributions
 need-0.019
 needs-0.019
 harm-0.019
 policy-0.018
 choice-0.018
Token pile
Token ID14540
Feature activation-1.57e-02

Pos logit contributions
coat+0.009
 apr+0.009
aults+0.009
Aud+0.009
 skirts+0.009
Neg logit contributions
 needs-0.008
 policy-0.008
 need-0.008
 harm-0.008
ment-0.008
Token of
Token ID286
Feature activation+9.21e-03

Pos logit contributions
 needs+0.011
 harm+0.011
 need+0.010
 policy+0.010
 wishes+0.009
Neg logit contributions
coat-0.040
aults-0.039
J-0.037
 skirts-0.037
 apr-0.037
Token rocks
Token ID12586
Feature activation+1.78e-04

Pos logit contributions
 apr+0.014
coat+0.014
aults+0.013
 skirts+0.013
 guitars+0.013
Neg logit contributions
 needs-0.016
 need-0.016
 policy-0.015
 harm-0.015
 decision-0.015
Token.
Token ID13
Feature activation-1.44e-02

Pos logit contributions
coat+0.000
aults+0.000
 skirts+0.000
ding+0.000
 guitars+0.000
Neg logit contributions
 policy-0.000
 needs-0.000
 need-0.000
 harm-0.000
 choice-0.000
Token Leo
Token ID19632
Feature activation+1.49e-03

Pos logit contributions
 need+0.022
 needs+0.022
 harm+0.022
 policy+0.021
 choice+0.020
Neg logit contributions
Aud-0.024
Ol-0.024
coat-0.024
 apr-0.024
aults-0.024
Token was
Token ID373
Feature activation-8.71e-03

Pos logit contributions
coat+0.003
aults+0.003
 apr+0.003
Ol+0.003
Aud+0.003
Neg logit contributions
 need-0.002
 needs-0.002
 harm-0.002
 policy-0.002
 choice-0.002
Token saw
Token ID2497
Feature activation+4.09e-04

Pos logit contributions
Ol+0.001
aults+0.001
Aud+0.001
V+0.001
La+0.001
Neg logit contributions
 needs-0.001
 need-0.000
 animal-0.000
 lives-0.000
 of-0.000
Token a
Token ID257
Feature activation+6.51e-03

Pos logit contributions
coat+0.001
aults+0.001
Ol+0.001
Aud+0.001
 apr+0.001
Neg logit contributions
 need-0.001
 needs-0.000
 harm-0.000
 policy-0.000
 choice-0.000
Token big
Token ID1263
Feature activation+1.77e-03

Pos logit contributions
Aud+0.013
Ol+0.012
aults+0.012
 apr+0.012
La+0.012
Neg logit contributions
 thing-0.009
 need-0.009
 animal-0.009
 needs-0.009
 day-0.008
Token,
Token ID11
Feature activation+3.95e-02

Pos logit contributions
coat+0.003
aults+0.003
 apr+0.003
Aud+0.003
Ol+0.003
Neg logit contributions
 needs-0.003
 need-0.003
 harm-0.003
 policy-0.003
 situations-0.002
Token round
Token ID2835
Feature activation+1.48e-03

Pos logit contributions
coat+0.064
aults+0.064
 apr+0.063
Ol+0.063
Aud+0.062
Neg logit contributions
 need-0.064
 needs-0.064
 harm-0.062
 policy-0.061
 choice-0.058
Token pie
Token ID2508
Feature activation-3.05e-03

Pos logit contributions
coat+0.003
 apr+0.002
 skirts+0.002
aults+0.002
 shirts+0.002
Neg logit contributions
 harm-0.002
 needs-0.002
 need-0.002
 policy-0.002
 situations-0.002
Token sitting
Token ID5586
Feature activation+1.59e-02

Pos logit contributions
 need+0.004
 harm+0.004
 needs+0.004
 policy+0.004
 decisions+0.004
Neg logit contributions
coat-0.006
aults-0.006
 apr-0.005
 skirts-0.005
Aud-0.005
Token on
Token ID319
Feature activation-1.09e-02

Pos logit contributions
coat+0.029
aults+0.028
Ol+0.028
 apr+0.028
Daniel+0.028
Neg logit contributions
 need-0.023
 harm-0.023
 needs-0.022
 policy-0.022
 choice-0.021
Token the
Token ID262
Feature activation+2.67e-02

Pos logit contributions
 needs+0.014
 need+0.014
 harm+0.014
 policy+0.013
 choice+0.012
Neg logit contributions
coat-0.021
aults-0.021
 apr-0.021
Aud-0.021
Ol-0.021
Token counter
Token ID3753
Feature activation-1.53e-02

Pos logit contributions
Ol+0.049
coat+0.049
Aud+0.049
aults+0.049
Daniel+0.048
Neg logit contributions
 needs-0.038
 need-0.037
 harm-0.036
 policy-0.035
 animal-0.034
Token.
Token ID13
Feature activation-1.59e-02

Pos logit contributions
 need+0.015
 needs+0.015
 harm+0.015
 policy+0.015
 choice+0.013
Neg logit contributions
coat-0.034
aults-0.034
 apr-0.033
Daniel-0.033
Aud-0.032
Tokenmy
Token ID1820
Feature activation+2.13e-02

Pos logit contributions
aults+0.032
Aud+0.031
 apr+0.031
coat+0.030
Ol+0.030
Neg logit contributions
 need-0.029
 needs-0.029
 harm-0.028
 of-0.027
 day-0.026
Token came
Token ID1625
Feature activation+3.90e-02

Pos logit contributions
aults+0.043
Aud+0.043
 apr+0.042
La+0.042
coat+0.041
Neg logit contributions
 need-0.028
 needs-0.028
 of-0.026
 harm-0.026
 animal-0.024
Token to
Token ID284
Feature activation+1.10e-04

Pos logit contributions
Aud+0.081
aults+0.080
coat+0.080
 apr+0.080
Ol+0.079
Neg logit contributions
 need-0.047
 harm-0.045
 needs-0.044
 of-0.043
 day-0.041
Token her
Token ID607
Feature activation-2.13e-02

Pos logit contributions
aults+0.000
Aud+0.000
coat+0.000
 guitars+0.000
ding+0.000
Neg logit contributions
 need-0.000
 harm-0.000
 needs-0.000
 choice-0.000
 of-0.000
Token and
Token ID290
Feature activation+5.44e-04

Pos logit contributions
 needs+0.033
 harm+0.033
 policy+0.033
 need+0.032
crow+0.032
Neg logit contributions
 apr-0.036
coat-0.035
aults-0.035
 skirts-0.034
Daniel-0.033
Token asked
Token ID1965
Feature activation-8.33e-03

Pos logit contributions
aults+0.001
coat+0.001
Aud+0.001
La+0.001
 apr+0.001
Neg logit contributions
 needs-0.001
 need-0.001
 harm-0.001
 of-0.001
 day-0.001
Token her
Token ID607
Feature activation-1.87e-02

Pos logit contributions
 need+0.011
 needs+0.011
 harm+0.010
 policy+0.010
 choice+0.009
Neg logit contributions
coat-0.016
aults-0.016
 apr-0.016
Aud-0.016
Ol-0.016
Token what
Token ID644
Feature activation+2.45e-03

Pos logit contributions
 harm+0.028
 needs+0.027
 need+0.027
 policy+0.027
 earthquake+0.026
Neg logit contributions
 apr-0.032
aults-0.032
coat-0.032
 skirts-0.031
Ol-0.031
Token was
Token ID373
Feature activation-7.64e-03

Pos logit contributions
coat+0.005
Ol+0.005
T+0.005
Aud+0.005
J+0.005
Neg logit contributions
 needs-0.003
 need-0.003
 harm-0.003
 decisions-0.003
 decision-0.003
Token wrong
Token ID2642
Feature activation-2.64e-02

Pos logit contributions
 needs+0.007
 need+0.007
 harm+0.007
 policy+0.006
 day+0.006
Neg logit contributions
aults-0.018
Aud-0.018
Ol-0.018
 apr-0.018
coat-0.017
Token.
Token ID13
Feature activation-1.42e-02

Pos logit contributions
 needs+0.033
 harm+0.032
 need+0.032
crow+0.031
igator+0.030
Neg logit contributions
coat-0.051
 skirts-0.050
aults-0.050
J-0.049
 apr-0.049
Token way
Token ID835
Feature activation-6.31e-03

Pos logit contributions
 apr+0.019
aults+0.019
Ol+0.019
Aud+0.019
coat+0.019
Neg logit contributions
 need-0.011
 needs-0.011
 harm-0.010
 policy-0.010
 choice-0.010
Token to
Token ID284
Feature activation-1.59e-02

Pos logit contributions
 needs+0.011
 deeds+0.011
 choice+0.011
 situations+0.011
 decisions+0.010
Neg logit contributions
coat-0.009
aults-0.008
ding-0.008
J-0.008
Daniel-0.008
Token keep
Token ID1394
Feature activation-2.09e-02

Pos logit contributions
 needs+0.027
 policy+0.026
 decisions+0.024
ore+0.024
 deeds+0.024
Neg logit contributions
coat-0.025
Wh-0.024
Aud-0.024
aults-0.024
Daniel-0.024
Token herself
Token ID5223
Feature activation-9.80e-03

Pos logit contributions
 needs+0.036
 need+0.034
 policy+0.034
 decision+0.033
 choice+0.033
Neg logit contributions
coat-0.032
aults-0.032
Aud-0.031
 skirts-0.031
T-0.031
Token safe
Token ID3338
Feature activation-3.23e-03

Pos logit contributions
crow+0.015
 needs+0.014
 policy+0.014
 harm+0.013
 need+0.013
Neg logit contributions
aults-0.019
coat-0.017
 skirts-0.017
 apr-0.016
ding-0.016
Token from
Token ID422
Feature activation-4.80e-02

Pos logit contributions
 needs+0.005
 need+0.004
body+0.004
crow+0.004
 policy+0.004
Neg logit contributions
aults-0.006
coat-0.006
 skirts-0.005
 guitars-0.005
king-0.005
Token Fl
Token ID1610
Feature activation-3.26e-03

Pos logit contributions
 policy+0.090
 needs+0.088
 deeds+0.086
ment+0.085
ore+0.085
Neg logit contributions
 skirts-0.057
aults-0.056
La-0.055
 guitars-0.054
 sunscreen-0.051
Tokenuffy
Token ID15352
Feature activation-9.79e-03

Pos logit contributions
 need+0.006
 needs+0.006
 harm+0.006
 policy+0.006
 choice+0.006
Neg logit contributions
coat-0.004
aults-0.004
Ol-0.004
Aud-0.004
 apr-0.004
Token.
Token ID13
Feature activation-1.22e-02

Pos logit contributions
 policy+0.013
ment+0.013
 needs+0.012
crow+0.012
 need+0.012
Neg logit contributions
aults-0.019
coat-0.018
 skirts-0.017
T-0.017
Max-0.016
Token From
Token ID3574
Feature activation-5.55e-02

Pos logit contributions
 harm+0.022
 need+0.021
 policy+0.020
 earthquake+0.020
 decision+0.020
Neg logit contributions
V-0.018
Ol-0.018
T-0.018
Aud-0.018
Max-0.018
Token that
Token ID326
Feature activation-1.91e-02

Pos logit contributions
 need+0.074
 policy+0.072
 situations+0.071
 needs+0.069
 decisions+0.068
Neg logit contributions
Aud-0.107
Max-0.105
Ben-0.104
aults-0.104
T-0.103
Token It
Token ID632
Feature activation-6.29e-03

Pos logit contributions
 needs+0.012
 need+0.012
 policy+0.011
 day+0.011
 of+0.011
Neg logit contributions
aults-0.024
coat-0.024
Aud-0.023
Ol-0.023
 apr-0.023
Token was
Token ID373
Feature activation-9.69e-03

Pos logit contributions
 needs+0.007
 need+0.007
 harm+0.007
 policy+0.007
 choice+0.006
Neg logit contributions
coat-0.013
aults-0.013
Aud-0.013
 apr-0.013
Ol-0.013
Token the
Token ID262
Feature activation+1.50e-02

Pos logit contributions
 need+0.014
 needs+0.013
 harm+0.013
 day+0.013
 of+0.012
Neg logit contributions
coat-0.018
Ol-0.018
Aud-0.018
aults-0.018
Ryan-0.018
Token best
Token ID1266
Feature activation+9.71e-03

Pos logit contributions
Aud+0.029
Ol+0.029
T+0.029
V+0.029
J+0.028
Neg logit contributions
 day-0.021
 need-0.021
 needs-0.020
 choice-0.020
 night-0.020
Token day
Token ID1110
Feature activation+8.32e-03

Pos logit contributions
coat+0.016
 apr+0.016
aults+0.015
 skirts+0.014
Aud+0.014
Neg logit contributions
 needs-0.016
 harm-0.016
crow-0.015
 need-0.015
 situations-0.015
Token ever
Token ID1683
Feature activation-4.93e-02

Pos logit contributions
aults+0.018
J+0.017
 apr+0.017
ding+0.017
coat+0.017
Neg logit contributions
 harm-0.009
 needs-0.009
 decision-0.008
 choice-0.008
 situations-0.008
Token!
Token ID0
Feature activation-1.90e-02

Pos logit contributions
 harm+0.080
 need+0.079
crow+0.079
 needs+0.079
 situations+0.077
Neg logit contributions
aults-0.079
ding-0.076
Daniel-0.073
coat-0.073
T-0.073
Token<|endoftext|>
Token ID50256
Feature activation+4.68e-03

Pos logit contributions
 needs+0.031
 need+0.031
 harm+0.030
 policy+0.029
 choice+0.028
Neg logit contributions
coat-0.030
aults-0.030
 apr-0.030
Ol-0.030
Aud-0.030
TokenOnce
Token ID7454
Feature activation+2.44e-03

Pos logit contributions
aults+0.008
coat+0.008
 apr+0.008
Aud+0.008
Ol+0.008
Neg logit contributions
 need-0.007
 needs-0.007
 harm-0.007
 policy-0.007
 day-0.007
Token upon
Token ID2402
Feature activation+6.50e-03

Pos logit contributions
coat+0.003
Daniel+0.003
Ol+0.003
J+0.003
 apr+0.003
Neg logit contributions
 harm-0.005
 needs-0.005
 need-0.005
 situations-0.005
 decisions-0.005
Token a
Token ID257
Feature activation+2.90e-02

Pos logit contributions
coat+0.012
aults+0.012
 apr+0.012
Aud+0.012
Ol+0.012
Neg logit contributions
 need-0.009
 needs-0.009
 harm-0.009
 policy-0.008
 of-0.008
Token scared
Token ID12008
Feature activation-4.49e-03

Pos logit contributions
 needs+0.008
 policy+0.008
 harm+0.008
 need+0.008
 situations+0.008
Neg logit contributions
coat-0.008
aults-0.008
Aud-0.008
 apr-0.008
Ol-0.008
Token but
Token ID475
Feature activation-3.57e-02

Pos logit contributions
 needs+0.008
 policy+0.007
 need+0.007
 choice+0.007
 harm+0.007
Neg logit contributions
coat-0.006
aults-0.006
 skirts-0.006
T-0.006
Ryan-0.006
Token she
Token ID673
Feature activation-1.62e-02

Pos logit contributions
 situations+0.067
 needs+0.067
 policy+0.067
ore+0.066
 need+0.064
Neg logit contributions
J-0.050
T-0.049
Max-0.048
Aud-0.047
Ol-0.047
Token did
Token ID750
Feature activation+6.37e-04

Pos logit contributions
 situations+0.028
 need+0.027
 policy+0.027
 harm+0.027
 needs+0.027
Neg logit contributions
T-0.025
Ol-0.025
Max-0.024
J-0.024
Daniel-0.024
Token it
Token ID340
Feature activation-1.24e-02

Pos logit contributions
aults+0.001
Daniel+0.001
Aud+0.001
T+0.001
coat+0.001
Neg logit contributions
 needs-0.001
crow-0.001
 policy-0.001
 need-0.001
 situations-0.001
Token because
Token ID780
Feature activation-5.15e-02

Pos logit contributions
 needs+0.018
 harm+0.017
crow+0.017
 need+0.017
 policy+0.017
Neg logit contributions
coat-0.023
aults-0.021
Max-0.021
T-0.021
Ol-0.020
Token she
Token ID673
Feature activation-1.21e-02

Pos logit contributions
 situations+0.105
eer+0.105
 needs+0.104
 deeds+0.101
 decisions+0.100
Neg logit contributions
Max-0.060
T-0.059
J-0.058
Rose-0.057
 skirts-0.052
Token was
Token ID373
Feature activation-5.40e-03

Pos logit contributions
 policy+0.019
 situations+0.019
 decisions+0.019
 earthquake+0.018
 need+0.018
Neg logit contributions
T-0.020
Ol-0.020
Max-0.020
coat-0.020
V-0.020
Token very
Token ID845
Feature activation-2.66e-02

Pos logit contributions
 policy+0.009
 needs+0.009
 situations+0.009
 need+0.008
 harm+0.008
Neg logit contributions
Aud-0.008
aults-0.008
coat-0.008
 skirts-0.008
Ryan-0.008
Token thirsty
Token ID47124
Feature activation+1.37e-02

Pos logit contributions
 situations+0.056
 policy+0.054
crow+0.053
loo+0.052
ore+0.052
Neg logit contributions
Aud-0.028
coat-0.028
ends-0.027
Ben-0.026
Pink-0.026
Token!
Token ID0
Feature activation-1.79e-02

Pos logit contributions
coat+0.025
aults+0.024
 skirts+0.023
ding+0.022
 shirts+0.022
Neg logit contributions
 needs-0.019
 policy-0.018
 need-0.018
 harm-0.018
ment-0.017
Token.
Token ID13
Feature activation-2.05e-02

Pos logit contributions
 policy+0.002
 harm+0.002
 choice+0.002
 needs+0.002
 decision+0.002
Neg logit contributions
aults-0.003
coat-0.003
Aud-0.002
 apr-0.002
 guitars-0.002
Token<|endoftext|>
Token ID50256
Feature activation+1.06e-03

Pos logit contributions
 harm+0.029
 needs+0.028
 choice+0.028
 decision+0.028
 policy+0.027
Neg logit contributions
V-0.037
Ryan-0.036
Ol-0.036
T-0.036
Daniel-0.036
TokenJimmy
Token ID40335
Feature activation+5.29e-03

Pos logit contributions
coat+0.002
aults+0.002
 apr+0.002
Ol+0.002
Aud+0.002
Neg logit contributions
 need-0.002
 needs-0.002
 harm-0.002
 policy-0.002
 choice-0.001
Token was
Token ID373
Feature activation-4.25e-03

Pos logit contributions
coat+0.010
aults+0.010
 apr+0.009
Ol+0.009
Aud+0.009
Neg logit contributions
 need-0.007
 needs-0.007
 harm-0.007
 policy-0.007
 choice-0.007
Token a
Token ID257
Feature activation-1.85e-03

Pos logit contributions
 needs+0.007
 need+0.007
 harm+0.007
 policy+0.007
 choice+0.006
Neg logit contributions
coat-0.007
aults-0.007
 apr-0.007
Ol-0.007
Aud-0.007
Token very
Token ID845
Feature activation-5.22e-02

Pos logit contributions
 need+0.003
 needs+0.003
 day+0.002
 harm+0.002
 animal+0.002
Neg logit contributions
Aud-0.003
coat-0.003
aults-0.003
 apr-0.003
Ol-0.003
Token lively
Token ID29696
Feature activation-1.35e-02

Pos logit contributions
 harm+0.098
 policy+0.097
 situations+0.096
 decisions+0.092
 needs+0.091
Neg logit contributions
coat-0.068
 apr-0.068
J-0.067
Daniel-0.066
La-0.065
Token 3
Token ID513
Feature activation-3.87e-02

Pos logit contributions
 needs+0.019
 harm+0.019
 need+0.019
 policy+0.018
 decisions+0.017
Neg logit contributions
coat-0.025
 apr-0.024
aults-0.024
Ol-0.024
Daniel-0.023
Token year
Token ID614
Feature activation-8.01e-02

Pos logit contributions
 needs+0.047
 policy+0.047
 need+0.045
 belong+0.045
 harm+0.045
Neg logit contributions
J-0.074
 apr-0.074
 guitars-0.070
coat-0.069
 shirts-0.068
Token old
Token ID1468
Feature activation-4.00e-02

Pos logit contributions
 harm+0.106
 need+0.105
 policy+0.102
 needs+0.096
 fate+0.095
Neg logit contributions
ding-0.145
coat-0.142
 skirts-0.142
 apr-0.142
Daniel-0.141
Token.
Token ID13
Feature activation-2.27e-02

Pos logit contributions
 harm+0.059
 policy+0.058
 need+0.058
 needs+0.056
 fate+0.054
Neg logit contributions
coat-0.070
 guitars-0.068
Daniel-0.066
 apr-0.066
Ryan-0.066
Token before
Token ID878
Feature activation-1.80e-02

Pos logit contributions
 needs+0.030
crow+0.029
 situations+0.028
 choice+0.028
 decision+0.028
Neg logit contributions
aults-0.049
J-0.048
 skirts-0.048
 guitars-0.047
 apr-0.046
Token.
Token ID13
Feature activation-1.28e-02

Pos logit contributions
 needs+0.019
 need+0.019
 harm+0.018
 policy+0.018
 choice+0.016
Neg logit contributions
aults-0.039
coat-0.039
 apr-0.039
Ol-0.039
Aud-0.039
Token\n
Token ID198
Feature activation+8.92e-03

Pos logit contributions
 harm+0.019
 needs+0.019
 need+0.019
 policy+0.018
 choice+0.018
Neg logit contributions
 apr-0.022
coat-0.022
aults-0.022
Ol-0.022
Aud-0.022
Token\n
Token ID198
Feature activation+1.67e-03

Pos logit contributions
coat+0.019
aults+0.018
 apr+0.018
Aud+0.017
 skirts+0.017
Neg logit contributions
 need-0.011
 of-0.011
 needs-0.011
 day-0.010
ore-0.010
TokenM
Token ID44
Feature activation-3.31e-02

Pos logit contributions
coat+0.002
paper+0.002
 apr+0.002
ding+0.002
Aud+0.002
Neg logit contributions
 of-0.003
 need-0.003
 day-0.003
 needs-0.003
crow-0.003
Tokenoral
Token ID6864
Feature activation-5.03e-02

Pos logit contributions
 harm+0.064
 need+0.063
 needs+0.063
igator+0.060
 situations+0.060
Neg logit contributions
J-0.044
Daniel-0.043
 apr-0.042
T-0.042
Ryan-0.041
Token of
Token ID286
Feature activation-3.06e-02

Pos logit contributions
 harm+0.061
 situations+0.060
rum+0.058
 decision+0.058
 fear+0.057
Neg logit contributions
Grand-0.096
J-0.095
Daniel-0.091
Ryan-0.089
Miss-0.088
Token the
Token ID262
Feature activation-1.80e-02

Pos logit contributions
 needs+0.054
 situations+0.054
 earthquake+0.053
 policy+0.052
crow+0.051
Neg logit contributions
 apr-0.043
Ol-0.042
T-0.041
Daniel-0.041
J-0.040
Token story
Token ID1621
Feature activation-3.03e-02

Pos logit contributions
 harm+0.021
ment+0.020
 needs+0.020
 policy+0.020
 situations+0.020
Neg logit contributions
 apr-0.036
Ol-0.036
aults-0.035
Daniel-0.035
Ryan-0.034
Token:
Token ID25
Feature activation+1.58e-02

Pos logit contributions
 harm+0.050
 situations+0.045
 need+0.045
 earthquake+0.044
 danger+0.043
Neg logit contributions
Daniel-0.047
Rose-0.046
Ryan-0.045
Aud-0.044
uits-0.044
Token Working
Token ID14594
Feature activation+9.82e-03

Pos logit contributions
T+0.026
Ol+0.026
V+0.026
Daniel+0.025
Che+0.025
Neg logit contributions
 harm-0.023
 choice-0.023
 policy-0.022
 earthquake-0.022
 decision-0.022