TOP ACTIVATIONS
MAX = 0.249

She loved to go on pic n ics with her family
liked to play and have pic n ics . One day
the warm sunshine and having pic n ics with her family
near the cliff and have pic n ics . One day
in the garden and have pic n ics together .
Tim my loved going on pic n ics with his family
. She loved to have pic n ics with her family
saw lots of people having pic n ics and walking their
and it was going on pic n ics !
a family who really enjoyed pic n ics . They loved
even see other people having pic n ics .
girl named Lily who loved pic n ics . She would
the park , and have pic n ics outside .
roof . They would have pic n ics and play games
see the bird and have pic n ics . <|endoftext|> Once
to be careful when having pic n ics and not to
family were ready to go pic n icking again . But
went on adventures , had pic n ics , and even
children running around , families pic n icking , couples st
favourite place to go for pic n ics . <|endoftext|> Once

MIN ACTIVATIONS
MIN = -0.150

angry . He is the artist . " Sorry
beautiful . The artist said she was very talented
Grand ma ," said the granddaughter . They
. He was becoming an artist . His mom was so
she 'll be a powerful artist just like the boy in
special present !" said the fairy . John was
lose their magic ." the fairy said . L
paint . The artist watched as the little girl
You 're such a talented artist !" <|endoftext|> Once upon a
even more famous !" The fairy and the princess laughed and
real world ," explained the fairy . L ily
a 3 - year - old boy . He was sitting
ge . The artist decided to knock on the
girl smiled and showed the artist her painting . He was
so well !" said the fairy . " It looks so
, magic fairy !" The fairy smiled back and said ,
" Hello ," said the fairy . J
" That monkey is an artist . He is very careless
Suddenly , a friendly fairy flew into the park .
up to be a wealthy artist . She sold her pictures

INTERVAL 0.149 - 0.249
CONTAINS 0.011%

there was a dog named F ido . He loved to
in her heart . <|endoftext|> S ara and Ben like to
about the hill . <|endoftext|> S ara and Tom love pizza
in the backyard . <|endoftext|> S ara likes to make things
a family who really enjoyed pic n ics . They loved

INTERVAL 0.050 - 0.149
CONTAINS 9.189%

" Get off ! Get off ! You are ruining my
they were a family and they had to be nice to
all the fruits we saw on the cards . And some
't you come and join us ?" Max was
. <|endoftext|> Once upon a time , there was a little

INTERVAL -0.050 - 0.050
CONTAINS 83.317%

. " Let 's go inside , Lily ,"
went to the wall and challenged himself with a new route
piano . It was so pretty and shiny , and it
. One day , Anna found a very big
birthday ." Sam said . L ily took the

INTERVAL -0.150 - -0.050
CONTAINS 7.484%

there was a thoughtful little girl who liked to speak all
. It was so beautiful that he wanted to take a
was very kind . She taught Lucy how to take her
with others . The little bird realized the owl was right
away !" The old person was amazed and said

Token She
Token ID1375
Feature activation+4.07e-03

Pos logit contributions
heit+0.088
rated+0.088
ils+0.082
fy+0.082
wy+0.082
Neg logit contributions
 doctor-0.043
 mouse-0.040
 woman-0.039
 vet-0.036
 fairy-0.036
Token loved
Token ID6151
Feature activation-9.53e-03

Pos logit contributions
rated+0.009
fy+0.009
heit+0.009
cluded+0.008
ash+0.008
Neg logit contributions
 woman-0.006
 doctor-0.006
 mouse-0.006
 parent-0.006
 fairy-0.006
Token to
Token ID284
Feature activation+2.83e-02

Pos logit contributions
 doctor+0.015
 woman+0.015
 parent+0.014
 vet+0.014
 wizard+0.014
Neg logit contributions
rated-0.020
fy-0.019
cluded-0.019
|-0.018
iling-0.018
Token go
Token ID467
Feature activation+9.93e-02

Pos logit contributions
rated+0.053
fy+0.049
iling+0.046
heit+0.046
cluded+0.045
Neg logit contributions
 doctor-0.056
 woman-0.053
 parent-0.052
 painter-0.051
 wizard-0.051
Token on
Token ID319
Feature activation+9.09e-02

Pos logit contributions
rated+0.198
fy+0.181
heit+0.178
iling+0.175
rene+0.171
Neg logit contributions
 doctor-0.176
 woman-0.170
 parent-0.164
 mouse-0.160
 wizard-0.160
Token pic
Token ID8301
Feature activation+2.49e-01

Pos logit contributions
rated+0.161
fy+0.146
cluded+0.139
rene+0.135
iling+0.134
Neg logit contributions
 doctor-0.185
 woman-0.183
 parent-0.173
 mouse-0.172
 vet-0.169
Token n
Token ID77
Feature activation+9.95e-02

Pos logit contributions
rated+0.285
fy+0.268
cluded+0.240
irm+0.225
iling+0.224
Neg logit contributions
 doctor-0.622
 woman-0.612
 vet-0.599
 parent-0.595
 wizard-0.588
Token ics
Token ID873
Feature activation+6.51e-02

Pos logit contributions
rated+0.094
fy+0.084
cluded+0.083
|+0.079
irl+0.077
Neg logit contributions
 woman-0.264
 doctor-0.262
 mouse-0.255
 vet-0.254
 wizard-0.254
Token with
Token ID351
Feature activation+3.28e-02

Pos logit contributions
rated+0.197
heit+0.189
fy+0.179
rene+0.173
wed+0.171
Neg logit contributions
 doctor-0.058
 woman-0.053
 mouse-0.047
 owl-0.046
 bird-0.045
Token her
Token ID607
Feature activation+5.86e-02

Pos logit contributions
rated+0.077
fy+0.070
cluded+0.067
iling+0.066
heit+0.066
Neg logit contributions
 doctor-0.052
 woman-0.051
 parent-0.050
 mouse-0.048
 vet-0.047
Token family
Token ID1641
Feature activation+4.21e-03

Pos logit contributions
rated+0.159
fy+0.156
cluded+0.151
|+0.148
iling+0.146
Neg logit contributions
 doctor-0.052
 woman-0.051
 vet-0.047
 wizard-0.046
 parent-0.046
Token liked
Token ID8288
Feature activation-2.09e-02

Pos logit contributions
rated+0.086
fy+0.083
cluded+0.082
heit+0.082
iling+0.081
Neg logit contributions
 parent-0.062
 woman-0.058
 mouse-0.058
 fairy-0.057
 bird-0.057
Token to
Token ID284
Feature activation+2.64e-02

Pos logit contributions
 doctor+0.032
 woman+0.032
 parent+0.030
 vet+0.030
 wizard+0.030
Neg logit contributions
rated-0.043
fy-0.042
cluded-0.041
|-0.039
iling-0.039
Token play
Token ID711
Feature activation+6.28e-02

Pos logit contributions
rated+0.050
fy+0.049
iling+0.045
heit+0.044
cluded+0.043
Neg logit contributions
 doctor-0.051
 parent-0.048
 vet-0.047
 painter-0.046
 mouse-0.046
Token and
Token ID290
Feature activation+4.24e-02

Pos logit contributions
rated+0.083
fy+0.079
cluded+0.076
|+0.073
iling+0.071
Neg logit contributions
 woman-0.143
 doctor-0.141
 parent-0.137
 wizard-0.136
 vet-0.136
Token have
Token ID423
Feature activation+2.01e-03

Pos logit contributions
rated+0.067
fy+0.065
cluded+0.062
iling+0.061
heit+0.059
Neg logit contributions
 doctor-0.086
 woman-0.086
 parent-0.082
 mouse-0.082
 wizard-0.081
Token pic
Token ID8301
Feature activation+2.43e-01

Pos logit contributions
rated+0.003
fy+0.003
cluded+0.003
iling+0.003
heit+0.003
Neg logit contributions
 doctor-0.004
 woman-0.004
 mouse-0.004
 vet-0.004
 parent-0.004
Token n
Token ID77
Feature activation+9.43e-02

Pos logit contributions
rated+0.290
fy+0.267
cluded+0.241
irm+0.233
iling+0.228
Neg logit contributions
 doctor-0.575
 vet-0.561
 woman-0.558
 parent-0.547
 wizard-0.543
Token ics
Token ID873
Feature activation+5.79e-02

Pos logit contributions
rated+0.095
fy+0.086
cluded+0.082
|+0.077
irl+0.075
Neg logit contributions
 woman-0.246
 doctor-0.246
 vet-0.237
 wizard-0.237
 parent-0.236
Token .
Token ID13
Feature activation+2.91e-02

Pos logit contributions
rated+0.126
heit+0.119
fy+0.113
cluded+0.111
iling+0.109
Neg logit contributions
 doctor-0.093
 mouse-0.089
 woman-0.089
 fairy-0.084
 vet-0.084
Token One
Token ID1881
Feature activation+5.94e-02

Pos logit contributions
rated+0.055
fy+0.053
cluded+0.050
heit+0.049
iling+0.048
Neg logit contributions
 doctor-0.052
 woman-0.051
 parent-0.049
 vet-0.049
 mouse-0.048
Token day
Token ID1110
Feature activation+3.01e-02

Pos logit contributions
heit+0.094
fy+0.087
rated+0.086
ils+0.085
iling+0.084
Neg logit contributions
 doctor-0.128
 woman-0.127
 mouse-0.124
 bird-0.122
 parent-0.120
Token the
Token ID262
Feature activation+1.97e-02

Pos logit contributions
rated+0.112
fy+0.109
|+0.107
cluded+0.106
irl+0.104
Neg logit contributions
 woman-0.109
 doctor-0.108
 vet-0.104
 wizard-0.103
 parent-0.103
Token warm
Token ID5814
Feature activation+8.61e-02

Pos logit contributions
rated+0.049
fy+0.046
heit+0.045
iling+0.044
|+0.044
Neg logit contributions
 doctor-0.028
 woman-0.026
 vet-0.024
 mouse-0.024
 wizard-0.024
Token sunshine
Token ID34488
Feature activation+3.84e-02

Pos logit contributions
rated+0.185
fy+0.176
heit+0.175
cluded+0.170
beh+0.169
Neg logit contributions
 doctor-0.132
 woman-0.130
 mouse-0.123
 fairy-0.122
 parent-0.121
Token and
Token ID290
Feature activation+2.80e-02

Pos logit contributions
rated+0.078
fy+0.075
cluded+0.073
heit+0.071
|+0.070
Neg logit contributions
 doctor-0.062
 woman-0.062
 mouse-0.058
 parent-0.058
 vet-0.057
Token having
Token ID1719
Feature activation+1.76e-02

Pos logit contributions
rated+0.055
fy+0.053
cluded+0.051
|+0.049
iling+0.049
Neg logit contributions
 doctor-0.047
 woman-0.047
 parent-0.044
 wizard-0.044
 vet-0.044
Token pic
Token ID8301
Feature activation+2.41e-01

Pos logit contributions
rated+0.033
fy+0.028
cluded+0.027
heit+0.027
iling+0.027
Neg logit contributions
 doctor-0.034
 woman-0.034
 mouse-0.033
 vet-0.032
 fairy-0.032
Token n
Token ID77
Feature activation+8.48e-02

Pos logit contributions
rated+0.339
fy+0.311
ishy+0.292
irm+0.289
cluded+0.286
Neg logit contributions
 doctor-0.486
 woman-0.474
 vet-0.474
 fairy-0.467
 wizard-0.465
Token ics
Token ID873
Feature activation+4.74e-02

Pos logit contributions
rated+0.104
fy+0.102
cluded+0.092
heit+0.089
|+0.087
Neg logit contributions
 doctor-0.201
 woman-0.199
 parent-0.193
 wizard-0.189
 vet-0.189
Token with
Token ID351
Feature activation+2.11e-02

Pos logit contributions
rated+0.108
heit+0.103
estival+0.096
fy+0.096
cluded+0.095
Neg logit contributions
 doctor-0.072
 woman-0.071
 mouse-0.068
 fairy-0.068
 wizard-0.064
Token her
Token ID607
Feature activation+4.38e-02

Pos logit contributions
rated+0.035
fy+0.034
cluded+0.032
|+0.031
iling+0.031
Neg logit contributions
 woman-0.041
 doctor-0.041
 wizard-0.039
 vet-0.039
 parent-0.039
Token family
Token ID1641
Feature activation-4.96e-03

Pos logit contributions
rated+0.101
fy+0.099
cluded+0.096
|+0.093
iling+0.092
Neg logit contributions
 doctor-0.055
 woman-0.055
 vet-0.051
 wizard-0.051
 parent-0.050
Token near
Token ID1474
Feature activation+4.69e-02

Pos logit contributions
heit+0.238
rated+0.238
fy+0.224
wy+0.224
iling+0.221
Neg logit contributions
 woman-0.128
 mouse-0.121
 doctor-0.118
 owl-0.114
 bird-0.113
Token the
Token ID262
Feature activation+7.63e-03

Pos logit contributions
rated+0.109
fy+0.107
cluded+0.102
|+0.099
rene+0.098
Neg logit contributions
 woman-0.066
 doctor-0.065
 parent-0.062
 mouse-0.062
 vet-0.060
Token cliff
Token ID19516
Feature activation+4.49e-02

Pos logit contributions
rated+0.014
fy+0.013
cluded+0.013
irl+0.012
|+0.012
Neg logit contributions
 doctor-0.013
 woman-0.013
 parent-0.013
 vet-0.013
 wizard-0.012
Token and
Token ID290
Feature activation+3.26e-02

Pos logit contributions
rated+0.092
fy+0.089
heit+0.087
cluded+0.086
wy+0.085
Neg logit contributions
 doctor-0.070
 woman-0.070
 mouse-0.066
 wizard-0.065
 vet-0.065
Token have
Token ID423
Feature activation-8.63e-03

Pos logit contributions
rated+0.053
fy+0.052
cluded+0.048
iling+0.047
wy+0.047
Neg logit contributions
 doctor-0.067
 woman-0.067
 wizard-0.063
 parent-0.063
 vet-0.063
Token pic
Token ID8301
Feature activation+2.39e-01

Pos logit contributions
 woman+0.015
 doctor+0.015
 vet+0.014
 bird+0.014
 mouse+0.014
Neg logit contributions
rated-0.018
fy-0.016
heit-0.016
iling-0.016
cluded-0.015
Token n
Token ID77
Feature activation+8.63e-02

Pos logit contributions
rated+0.322
fy+0.290
ishy+0.273
irm+0.267
cluded+0.266
Neg logit contributions
 doctor-0.514
 vet-0.509
 woman-0.498
 conductor-0.491
 fairy-0.488
Token ics
Token ID873
Feature activation+4.62e-02

Pos logit contributions
rated+0.099
fy+0.095
cluded+0.087
|+0.082
iling+0.082
Neg logit contributions
 doctor-0.212
 woman-0.211
 parent-0.204
 vet-0.202
 wizard-0.201
Token .
Token ID13
Feature activation+2.40e-02

Pos logit contributions
rated+0.106
heit+0.105
iling+0.095
fy+0.095
cluded+0.093
Neg logit contributions
 doctor-0.067
 woman-0.067
 mouse-0.064
 bird-0.063
 fairy-0.062
Token One
Token ID1881
Feature activation+5.61e-02

Pos logit contributions
rated+0.052
fy+0.050
cluded+0.048
heit+0.047
ils+0.047
Neg logit contributions
 doctor-0.036
 woman-0.036
 vet-0.034
 parent-0.033
 mouse-0.033
Token day
Token ID1110
Feature activation+2.52e-02

Pos logit contributions
rated+0.075
heit+0.075
fy+0.073
ils+0.071
iling+0.069
Neg logit contributions
 woman-0.130
 doctor-0.129
 mouse-0.123
 parent-0.123
 vet-0.121
Token in
Token ID287
Feature activation+7.00e-02

Pos logit contributions
rated+0.112
fy+0.107
heit+0.100
cluded+0.099
iling+0.096
Neg logit contributions
 doctor-0.108
 woman-0.105
 mouse-0.103
 fairy-0.100
 wizard-0.099
Token the
Token ID262
Feature activation+2.43e-02

Pos logit contributions
rated+0.146
fy+0.141
cluded+0.136
|+0.132
iling+0.131
Neg logit contributions
 doctor-0.109
 woman-0.109
 parent-0.102
 vet-0.102
 wizard-0.101
Token garden
Token ID11376
Feature activation+2.30e-02

Pos logit contributions
rated+0.060
fy+0.059
cluded+0.057
heit+0.056
|+0.056
Neg logit contributions
 woman-0.029
 doctor-0.029
 parent-0.027
 vet-0.026
 mouse-0.026
Token and
Token ID290
Feature activation+3.93e-02

Pos logit contributions
rated+0.040
fy+0.039
cluded+0.037
heit+0.036
|+0.036
Neg logit contributions
 doctor-0.043
 woman-0.043
 parent-0.041
 vet-0.041
 mouse-0.041
Token have
Token ID423
Feature activation-3.57e-04

Pos logit contributions
rated+0.057
fy+0.055
cluded+0.052
|+0.050
iling+0.049
Neg logit contributions
 doctor-0.085
 woman-0.085
 parent-0.081
 vet-0.081
 wizard-0.081
Token pic
Token ID8301
Feature activation+2.38e-01

Pos logit contributions
 doctor+0.001
 woman+0.001
 mouse+0.001
 bird+0.001
 vet+0.001
Neg logit contributions
rated-0.001
fy-0.001
heit-0.001
iling-0.001
cluded-0.001
Token n
Token ID77
Feature activation+8.79e-02

Pos logit contributions
rated+0.315
fy+0.291
ishy+0.265
cluded+0.264
irm+0.263
Neg logit contributions
 doctor-0.519
 vet-0.510
 woman-0.501
 conductor-0.491
 parent-0.489
Token ics
Token ID873
Feature activation+4.87e-02

Pos logit contributions
rated+0.100
fy+0.096
cluded+0.088
|+0.083
heit+0.082
Neg logit contributions
 doctor-0.217
 woman-0.215
 parent-0.209
 vet-0.207
 wizard-0.206
Token together
Token ID1978
Feature activation+8.09e-02

Pos logit contributions
rated+0.114
heit+0.113
iling+0.102
cluded+0.101
fy+0.101
Neg logit contributions
 doctor-0.071
 woman-0.070
 mouse-0.068
 bird-0.065
 fairy-0.064
Token .
Token ID13
Feature activation+2.47e-02

Pos logit contributions
rated+0.215
heit+0.212
fy+0.200
cluded+0.198
|+0.194
Neg logit contributions
 doctor-0.092
 mouse-0.086
 woman-0.081
 bird-0.080
 fairy-0.076
Token (space)
Token ID220
Feature activation+5.06e-02

Pos logit contributions
rated+0.052
fy+0.050
cluded+0.049
|+0.048
estival+0.047
Neg logit contributions
 doctor-0.037
 woman-0.037
 wizard-0.035
 parent-0.035
 vet-0.034
Token Tim
Token ID5045
Feature activation+5.46e-02

Pos logit contributions
ils+0.068
rated+0.068
fy+0.066
heit+0.066
cluded+0.063
Neg logit contributions
 doctor-0.040
 mouse-0.038
 mechanic-0.038
 painter-0.036
 woman-0.036
Token my
Token ID1820
Feature activation-3.21e-02

Pos logit contributions
fy+0.136
heit+0.134
estival+0.133
rated+0.132
cluded+0.129
Neg logit contributions
 doctor-0.060
 woman-0.058
 parent-0.056
 mouse-0.055
 owl-0.055
Token loved
Token ID6151
Feature activation-1.40e-02

Pos logit contributions
 doctor+0.053
 woman+0.053
 parent+0.051
 mouse+0.050
 mechanic+0.049
Neg logit contributions
rated-0.066
heit-0.063
fy-0.063
cluded-0.061
estival-0.060
Token going
Token ID1016
Feature activation+9.64e-02

Pos logit contributions
 woman+0.023
 doctor+0.022
 vet+0.021
 wizard+0.021
 parent+0.021
Neg logit contributions
rated-0.027
fy-0.026
cluded-0.025
|-0.025
iling-0.025
Token on
Token ID319
Feature activation+7.95e-02

Pos logit contributions
rated+0.202
fy+0.191
cluded+0.183
heit+0.182
iling+0.181
Neg logit contributions
 doctor-0.155
 woman-0.151
 parent-0.143
 mechanic-0.142
 vet-0.141
Token pic
Token ID8301
Feature activation+2.37e-01

Pos logit contributions
rated+0.131
fy+0.121
cluded+0.114
rene+0.109
iling+0.108
Neg logit contributions
 doctor-0.169
 woman-0.165
 parent-0.157
 mouse-0.157
 vet-0.157
Token n
Token ID77
Feature activation+8.75e-02

Pos logit contributions
rated+0.277
fy+0.262
cluded+0.232
irm+0.223
ash+0.220
Neg logit contributions
 doctor-0.588
 woman-0.569
 vet-0.564
 wizard-0.555
 parent-0.553
Token ics
Token ID873
Feature activation+5.74e-02

Pos logit contributions
rated+0.093
fy+0.085
cluded+0.083
|+0.078
irl+0.077
Neg logit contributions
 woman-0.223
 doctor-0.221
 mouse-0.214
 vet-0.214
 parent-0.214
Token with
Token ID351
Feature activation+2.40e-02

Pos logit contributions
rated+0.177
heit+0.166
fy+0.159
rene+0.156
wed+0.150
Neg logit contributions
 doctor-0.051
 mouse-0.038
 woman-0.037
 mechanic-0.036
 bird-0.036
Token his
Token ID465
Feature activation+5.06e-02

Pos logit contributions
rated+0.052
fy+0.048
cluded+0.046
iling+0.045
|+0.045
Neg logit contributions
 doctor-0.040
 woman-0.038
 parent-0.038
 mouse-0.037
 vet-0.037
Token family
Token ID1641
Feature activation-9.73e-04

Pos logit contributions
rated+0.141
fy+0.136
cluded+0.133
|+0.130
iling+0.130
Neg logit contributions
 doctor-0.046
 woman-0.045
 parent-0.041
 mouse-0.040
 vet-0.040
Token .
Token ID13
Feature activation+2.83e-02

Pos logit contributions
rated+0.067
fy+0.064
cluded+0.061
|+0.059
iling+0.058
Neg logit contributions
 doctor-0.091
 woman-0.090
 vet-0.086
 wizard-0.086
 parent-0.086
Token She
Token ID1375
Feature activation+5.86e-03

Pos logit contributions
rated+0.065
heit+0.063
fy+0.062
cluded+0.061
ils+0.060
Neg logit contributions
 doctor-0.040
 woman-0.037
 mouse-0.037
 parent-0.036
 vet-0.036
Token loved
Token ID6151
Feature activation-1.53e-02

Pos logit contributions
rated+0.012
fy+0.012
cluded+0.011
heit+0.011
ash+0.011
Neg logit contributions
 woman-0.010
 doctor-0.010
 parent-0.009
 mouse-0.009
 fairy-0.009
Token to
Token ID284
Feature activation+2.42e-02

Pos logit contributions
 doctor+0.024
 woman+0.024
 vet+0.022
 parent+0.022
 wizard+0.022
Neg logit contributions
rated-0.031
fy-0.030
cluded-0.029
|-0.028
iling-0.028
Token have
Token ID423
Feature activation+2.73e-03

Pos logit contributions
rated+0.044
fy+0.040
iling+0.038
cluded+0.038
heit+0.038
Neg logit contributions
 doctor-0.049
 woman-0.047
 parent-0.046
 painter-0.045
 vet-0.044
Token pic
Token ID8301
Feature activation+2.37e-01

Pos logit contributions
rated+0.005
fy+0.004
cluded+0.004
heit+0.004
iling+0.004
Neg logit contributions
 doctor-0.006
 woman-0.006
 mouse-0.005
 parent-0.005
 vet-0.005
Token n
Token ID77
Feature activation+9.65e-02

Pos logit contributions
rated+0.279
fy+0.255
cluded+0.232
irm+0.218
ash+0.213
Neg logit contributions
 doctor-0.582
 woman-0.565
 vet-0.562
 parent-0.553
 wizard-0.550
Token ics
Token ID873
Feature activation+5.45e-02

Pos logit contributions
rated+0.081
cluded+0.072
fy+0.071
irl+0.070
|+0.067
Neg logit contributions
 woman-0.266
 doctor-0.262
 mouse-0.258
 wizard-0.256
 vet-0.256
Token with
Token ID351
Feature activation+2.57e-02

Pos logit contributions
rated+0.147
heit+0.142
fy+0.133
cluded+0.127
iling+0.126
Neg logit contributions
 doctor-0.066
 woman-0.060
 fairy-0.057
 mouse-0.057
 owl-0.056
Token her
Token ID607
Feature activation+5.49e-02

Pos logit contributions
rated+0.064
fy+0.057
cluded+0.056
rene+0.054
heit+0.054
Neg logit contributions
 doctor-0.040
 woman-0.038
 mouse-0.037
 parent-0.036
 vet-0.035
Token family
Token ID1641
Feature activation-6.22e-04

Pos logit contributions
rated+0.146
fy+0.142
cluded+0.138
|+0.135
iling+0.134
Neg logit contributions
 doctor-0.054
 woman-0.054
 parent-0.049
 vet-0.048
 mouse-0.048
Token saw
Token ID2497
Feature activation-9.86e-02

Pos logit contributions
rated+0.080
fy+0.077
cluded+0.075
|+0.073
iling+0.072
Neg logit contributions
 woman-0.065
 doctor-0.065
 parent-0.061
 vet-0.060
 wizard-0.060
Token lots
Token ID6041
Feature activation+3.76e-02

Pos logit contributions
 wizard+0.174
 parent+0.171
 woman+0.171
 mechanic+0.171
 doctor+0.169
Neg logit contributions
rated-0.147
ils-0.146
 matters-0.145
fy-0.142
cluded-0.141
Token of
Token ID286
Feature activation-3.86e-02

Pos logit contributions
rated+0.088
fy+0.087
cluded+0.084
iling+0.082
|+0.081
Neg logit contributions
 doctor-0.046
 woman-0.046
 parent-0.044
 vet-0.043
 wizard-0.042
Token people
Token ID661
Feature activation-2.15e-02

Pos logit contributions
 doctor+0.055
 woman+0.055
 parent+0.050
 vet+0.050
 mouse+0.050
Neg logit contributions
rated-0.086
fy-0.083
cluded-0.080
|-0.078
iling-0.077
Token having
Token ID1719
Feature activation+7.57e-03

Pos logit contributions
 woman+0.034
 doctor+0.034
 wizard+0.031
 parent+0.031
 mouse+0.031
Neg logit contributions
rated-0.046
fy-0.043
heit-0.042
cluded-0.042
rene-0.041
Token pic
Token ID8301
Feature activation+2.36e-01

Pos logit contributions
rated+0.016
fy+0.013
wy+0.013
heit+0.013
cluded+0.013
Neg logit contributions
 woman-0.013
 doctor-0.013
 mouse-0.013
 bird-0.013
 vet-0.012
Token n
Token ID77
Feature activation+8.52e-02

Pos logit contributions
rated+0.330
fy+0.300
ishy+0.276
irm+0.275
ash+0.273
Neg logit contributions
 doctor-0.472
 vet-0.466
 woman-0.463
 wizard-0.457
 owl-0.453
Token ics
Token ID873
Feature activation+4.81e-02

Pos logit contributions
rated+0.102
fy+0.100
cluded+0.090
heit+0.087
|+0.084
Neg logit contributions
 doctor-0.203
 woman-0.202
 parent-0.196
 vet-0.193
 wizard-0.192
Token and
Token ID290
Feature activation+2.27e-02

Pos logit contributions
rated+0.117
heit+0.114
fy+0.106
cluded+0.104
|+0.103
Neg logit contributions
 doctor-0.064
 woman-0.064
 mouse-0.060
 fairy-0.059
 bird-0.056
Token walking
Token ID6155
Feature activation+8.93e-02

Pos logit contributions
rated+0.040
fy+0.039
cluded+0.037
|+0.036
iling+0.036
Neg logit contributions
 doctor-0.041
 woman-0.041
 vet-0.039
 parent-0.039
 wizard-0.039
Token their
Token ID511
Feature activation+4.11e-02

Pos logit contributions
wy+0.197
heit+0.193
iling+0.190
rated+0.187
irl+0.182
Neg logit contributions
 woman-0.138
 mouse-0.126
 bird-0.124
 lady-0.117
 doctor-0.117
Token and
Token ID290
Feature activation+2.08e-02

Pos logit contributions
rated+0.137
fy+0.131
cluded+0.125
heit+0.124
|+0.123
Neg logit contributions
 doctor-0.110
 woman-0.106
 parent-0.102
 mouse-0.102
 wizard-0.101
Token it
Token ID340
Feature activation+2.21e-02

Pos logit contributions
rated+0.038
fy+0.037
cluded+0.036
|+0.034
estival+0.034
Neg logit contributions
 doctor-0.038
 woman-0.037
 parent-0.036
 mouse-0.035
 wizard-0.035
Token was
Token ID373
Feature activation-3.42e-03

Pos logit contributions
rated+0.044
fy+0.044
cluded+0.042
heit+0.041
estival+0.041
Neg logit contributions
 doctor-0.036
 woman-0.035
 parent-0.034
 mouse-0.034
 wizard-0.033
Token going
Token ID1016
Feature activation+9.26e-02

Pos logit contributions
 doctor+0.005
 woman+0.005
 parent+0.005
 mouse+0.005
 vet+0.004
Neg logit contributions
rated-0.008
fy-0.008
cluded-0.007
|-0.007
iling-0.007
Token on
Token ID319
Feature activation+7.80e-02

Pos logit contributions
rated+0.223
iling+0.222
heit+0.220
fy+0.220
wy+0.210
Neg logit contributions
 doctor-0.109
 mouse-0.101
 mechanic-0.099
 conductor-0.098
 woman-0.097
Token pic
Token ID8301
Feature activation+2.36e-01

Pos logit contributions
rated+0.197
fy+0.196
cluded+0.185
iling+0.178
ishy+0.178
Neg logit contributions
 doctor-0.094
 vet-0.084
 mouse-0.084
 woman-0.083
 mechanic-0.080
Token n
Token ID77
Feature activation+9.28e-02

Pos logit contributions
fy+0.300
rated+0.290
ishy+0.265
ash+0.264
cluded+0.260
Neg logit contributions
 doctor-0.530
 vet-0.516
 woman-0.510
 conductor-0.502
 wizard-0.500
Token ics
Token ID873
Feature activation+5.27e-02

Pos logit contributions
rated+0.157
fy+0.153
cluded+0.143
heit+0.139
iling+0.138
Neg logit contributions
 doctor-0.179
 woman-0.176
 parent-0.169
 vet-0.167
 wizard-0.166
Token !
Token ID0
Feature activation+2.06e-02

Pos logit contributions
rated+0.123
fy+0.120
heit+0.120
cluded+0.111
wy+0.111
Neg logit contributions
 doctor-0.074
 mouse-0.066
 woman-0.065
 wizard-0.063
 vet-0.062
Token (newline)
Token ID198
Feature activation-1.11e-02

Pos logit contributions
rated+0.049
fy+0.048
cluded+0.046
iling+0.045
|+0.045
Neg logit contributions
 doctor-0.026
 woman-0.026
 vet-0.024
 parent-0.024
 mouse-0.024
Token (newline)
Token ID198
Feature activation-1.72e-02

Pos logit contributions
 woman+0.017
 doctor+0.017
 wizard+0.016
 parent+0.016
 mouse+0.016
Neg logit contributions
rated-0.022
|-0.022
cluded-0.021
fy-0.021
irl-0.021
Token a
Token ID257
Feature activation-2.21e-02

Pos logit contributions
rated+0.018
fy+0.017
cluded+0.017
heit+0.016
iling+0.016
Neg logit contributions
 doctor-0.009
 woman-0.008
 mouse-0.008
 vet-0.008
 parent-0.008
Token family
Token ID1641
Feature activation-2.18e-02

Pos logit contributions
 doctor+0.022
 vet+0.022
 wizard+0.021
 conductor+0.021
 woman+0.021
Neg logit contributions
fy-0.049
rated-0.047
irl-0.046
cluded-0.045
 tim-0.044
Token who
Token ID508
Feature activation-4.40e-02

Pos logit contributions
 woman+0.035
 doctor+0.035
 vet+0.034
 wizard+0.033
 mouse+0.032
Neg logit contributions
rated-0.039
fy-0.039
iling-0.038
|-0.038
cluded-0.037
Token really
Token ID1107
Feature activation+8.44e-02

Pos logit contributions
 woman+0.061
 doctor+0.060
 mouse+0.058
 parent+0.058
 mechanic+0.055
Neg logit contributions
heit-0.108
rated-0.099
estival-0.096
fy-0.095
ash-0.095
Token enjoyed
Token ID8359
Feature activation+1.38e-02

Pos logit contributions
heit+0.186
rated+0.183
iling+0.175
irm+0.169
estival+0.168
Neg logit contributions
 woman-0.126
 doctor-0.119
 parent-0.116
 painter-0.114
 mouse-0.114
Token pic
Token ID8301
Feature activation+2.33e-01

Pos logit contributions
rated+0.028
fy+0.027
cluded+0.025
rene+0.024
|+0.024
Neg logit contributions
 doctor-0.024
 woman-0.024
 parent-0.023
 mouse-0.023
 wizard-0.022
Token n
Token ID77
Feature activation+9.28e-02

Pos logit contributions
rated+0.308
fy+0.287
irm+0.262
cluded+0.260
ishy+0.257
Neg logit contributions
 doctor-0.516
 woman-0.508
 wizard-0.505
 vet-0.497
 owl-0.495
Token ics
Token ID873
Feature activation+5.91e-02

Pos logit contributions
rated+0.097
fy+0.089
cluded+0.084
|+0.079
iling+0.077
Neg logit contributions
 doctor-0.239
 woman-0.238
 vet-0.229
 parent-0.229
 wizard-0.228
Token .
Token ID13
Feature activation+3.15e-02

Pos logit contributions
rated+0.154
heit+0.148
fy+0.136
rene+0.135
cluded+0.135
Neg logit contributions
 doctor-0.075
 woman-0.074
 mouse-0.073
 owl-0.068
 bird-0.067
Token They
Token ID1119
Feature activation+4.77e-02

Pos logit contributions
rated+0.056
fy+0.053
heit+0.050
ils+0.050
cluded+0.050
Neg logit contributions
 doctor-0.063
 woman-0.062
 vet-0.059
 parent-0.059
 mechanic-0.059
Token loved
Token ID6151
Feature activation-1.02e-02

Pos logit contributions
rated+0.104
cluded+0.103
irm+0.103
fy+0.103
heit+0.100
Neg logit contributions
 woman-0.071
 parent-0.067
 mouse-0.066
 bird-0.063
 fairy-0.063
Token even
Token ID772
Feature activation+4.02e-02

Pos logit contributions
 doctor+0.035
 woman+0.034
 vet+0.032
 parent+0.032
 wizard+0.032
Neg logit contributions
rated-0.038
fy-0.037
cluded-0.035
iling-0.034
|-0.034
Token see
Token ID766
Feature activation-5.16e-02

Pos logit contributions
rated+0.065
fy+0.063
cluded+0.060
|+0.058
iling+0.057
Neg logit contributions
 doctor-0.080
 woman-0.080
 vet-0.076
 wizard-0.076
 parent-0.076
Token other
Token ID584
Feature activation-4.62e-02

Pos logit contributions
 woman+0.079
 doctor+0.078
 mouse+0.073
 parent+0.073
 vet+0.072
Neg logit contributions
rated-0.110
fy-0.107
cluded-0.103
|-0.101
iling-0.099
Token people
Token ID661
Feature activation-2.70e-02

Pos logit contributions
 woman+0.056
 doctor+0.054
 mouse+0.052
 bird+0.051
 owl+0.049
Neg logit contributions
rated-0.115
fy-0.112
cluded-0.109
wy-0.107
|-0.106
Token having
Token ID1719
Feature activation+3.16e-03

Pos logit contributions
 woman+0.043
 doctor+0.043
 wizard+0.041
 parent+0.040
 mouse+0.040
Neg logit contributions
rated-0.055
fy-0.053
cluded-0.051
heit-0.050
|-0.049
Token pic
Token ID8301
Feature activation+2.31e-01

Pos logit contributions
rated+0.007
fy+0.006
wy+0.006
cluded+0.005
heit+0.005
Neg logit contributions
 woman-0.005
 doctor-0.005
 bird-0.005
 mouse-0.005
 owl-0.005
Token n
Token ID77
Feature activation+7.96e-02

Pos logit contributions
rated+0.323
fy+0.297
irm+0.278
ash+0.274
ishy+0.273
Neg logit contributions
 woman-0.449
 doctor-0.449
 vet-0.445
 wizard-0.443
 fairy-0.439
Token ics
Token ID873
Feature activation+4.30e-02

Pos logit contributions
fy+0.102
rated+0.102
cluded+0.091
heit+0.090
|+0.086
Neg logit contributions
 doctor-0.180
 woman-0.179
 parent-0.173
 painter-0.169
 wizard-0.169
Token .
Token ID13
Feature activation+1.07e-02

Pos logit contributions
rated+0.095
heit+0.092
fy+0.088
cluded+0.086
|+0.085
Neg logit contributions
 woman-0.064
 doctor-0.063
 mouse-0.061
 fairy-0.060
 wizard-0.059
Token (newline)
Token ID198
Feature activation-2.30e-02

Pos logit contributions
rated+0.021
fy+0.020
cluded+0.019
heit+0.019
irl+0.018
Neg logit contributions
 doctor-0.018
 woman-0.018
 mouse-0.017
 vet-0.017
 parent-0.017
Token (newline)
Token ID198
Feature activation-2.94e-02

Pos logit contributions
 woman+0.034
 doctor+0.034
 parent+0.032
 mouse+0.032
 wizard+0.032
Neg logit contributions
rated-0.048
fy-0.047
|-0.046
cluded-0.046
irl-0.045
Token girl
Token ID2576
Feature activation-6.00e-02

Pos logit contributions
 doctor+0.045
 vet+0.044
 woman+0.042
 conductor+0.042
 wizard+0.042
Neg logit contributions
fy-0.110
rated-0.105
ishy-0.101
cluded-0.100
|-0.099
Token named
Token ID3706
Feature activation-1.39e-02

Pos logit contributions
 doctor+0.106
 woman+0.105
 wizard+0.102
 vet+0.101
 mechanic+0.099
Neg logit contributions
fy-0.102
rated-0.101
|-0.096
cluded-0.095
iling-0.095
Token Lily
Token ID20037
Feature activation+4.43e-02

Pos logit contributions
 doctor+0.021
 woman+0.020
 parent+0.019
 mouse+0.019
 vet+0.019
Neg logit contributions
rated-0.030
fy-0.029
cluded-0.028
heit-0.028
iling-0.028
Token who
Token ID508
Feature activation-4.55e-02

Pos logit contributions
rated+0.065
fy+0.061
cluded+0.058
|+0.057
iling+0.056
Neg logit contributions
 doctor-0.095
 woman-0.094
 vet-0.091
 wizard-0.091
 parent-0.089
Token loved
Token ID6151
Feature activation-1.30e-02

Pos logit contributions
 doctor+0.075
 woman+0.075
 mouse+0.071
 parent+0.070
 fairy+0.068
Neg logit contributions
rated-0.093
heit-0.092
fy-0.088
estival-0.088
cluded-0.087
Token pic
Token ID8301
Feature activation+2.26e-01

Pos logit contributions
 doctor+0.020
 woman+0.020
 parent+0.019
 vet+0.019
 wizard+0.018
Neg logit contributions
rated-0.027
fy-0.026
cluded-0.025
|-0.025
iling-0.024
Token n
Token ID77
Feature activation+8.59e-02

Pos logit contributions
rated+0.285
fy+0.257
cluded+0.239
irm+0.235
ash+0.228
Neg logit contributions
 doctor-0.527
 woman-0.518
 vet-0.508
 wizard-0.502
 fairy-0.500
Token ics
Token ID873
Feature activation+5.51e-02

Pos logit contributions
rated+0.081
fy+0.070
cluded+0.069
|+0.065
irl+0.064
Neg logit contributions
 woman-0.230
 doctor-0.228
 vet-0.222
 wizard-0.222
 mouse-0.221
Token .
Token ID13
Feature activation+2.97e-02

Pos logit contributions
rated+0.129
heit+0.126
cluded+0.114
fy+0.113
rene+0.111
Neg logit contributions
 doctor-0.087
 woman-0.083
 mouse-0.081
 fairy-0.078
 vet-0.078
Token She
Token ID1375
Feature activation+9.75e-03

Pos logit contributions
rated+0.061
fy+0.058
heit+0.057
cluded+0.056
iling+0.056
Neg logit contributions
 doctor-0.050
 woman-0.048
 vet-0.046
 parent-0.045
 mouse-0.045
Token would
Token ID561
Feature activation-4.40e-03

Pos logit contributions
rated+0.022
iling+0.022
beh+0.022
heit+0.021
ash+0.021
Neg logit contributions
 woman-0.014
 doctor-0.013
 parent-0.012
 mouse-0.012
 bird-0.012
Token the
Token ID262
Feature activation+4.47e-03

Pos logit contributions
rated+0.097
fy+0.094
cluded+0.091
|+0.091
irl+0.089
Neg logit contributions
 woman-0.088
 doctor-0.086
 parent-0.083
 wizard-0.082
 vet-0.082
Token park
Token ID3952
Feature activation+2.84e-02

Pos logit contributions
rated+0.009
fy+0.009
iling+0.009
cluded+0.009
ils+0.009
Neg logit contributions
 doctor-0.007
 woman-0.007
 vet-0.007
 mechanic-0.007
 parent-0.007
Token,
Token ID11
Feature activation-1.24e-02

Pos logit contributions
rated+0.052
fy+0.050
cluded+0.048
heit+0.047
iling+0.046
Neg logit contributions
 doctor-0.052
 woman-0.051
 mouse-0.049
 vet-0.049
 wizard-0.049
Token and
Token ID290
Feature activation+3.76e-03

Pos logit contributions
 woman+0.020
 doctor+0.020
 parent+0.019
 vet+0.019
 wizard+0.019
Neg logit contributions
rated-0.024
fy-0.023
cluded-0.023
|-0.022
irm-0.022
Token have
Token ID423
Feature activation-2.37e-02

Pos logit contributions
rated+0.007
fy+0.007
cluded+0.006
|+0.006
iling+0.006
Neg logit contributions
 doctor-0.007
 woman-0.007
 parent-0.006
 vet-0.006
 wizard-0.006
Token pic
Token ID8301
Feature activation+2.26e-01

Pos logit contributions
 doctor+0.043
 woman+0.042
 vet+0.041
 mouse+0.041
 parent+0.040
Neg logit contributions
rated-0.045
fy-0.042
cluded-0.040
iling-0.039
heit-0.039
Tokenn
Token ID77
Feature activation+7.11e-02

Pos logit contributions
rated+0.314
fy+0.284
ishy+0.273
irm+0.272
ash+0.262
Neg logit contributions
 doctor-0.450
 vet-0.441
 fairy-0.434
 woman-0.433
 conductor-0.432
Tokenics
Token ID873
Feature activation+3.02e-02

Pos logit contributions
rated+0.091
fy+0.088
cluded+0.081
|+0.078
iling+0.077
Neg logit contributions
 doctor-0.165
 woman-0.164
 parent-0.158
 vet-0.157
 wizard-0.156
Token outside
Token ID2354
Feature activation+2.56e-02

Pos logit contributions
rated+0.061
heit+0.057
iling+0.055
fy+0.055
estival+0.054
Neg logit contributions
 doctor-0.052
 woman-0.051
 mouse-0.049
 fairy-0.048
 vet-0.047
Token.
Token ID13
Feature activation+5.35e-04

Pos logit contributions
rated+0.051
heit+0.048
fy+0.047
cluded+0.046
iling+0.046
Neg logit contributions
 doctor-0.045
 woman-0.042
 mouse-0.042
 parent-0.041
 wizard-0.040
Token
Token ID220
Feature activation+3.36e-02

Pos logit contributions
rated+0.001
fy+0.001
cluded+0.001
iling+0.001
heit+0.001
Neg logit contributions
 doctor-0.001
 woman-0.001
 vet-0.001
 parent-0.001
 mouse-0.001
Token roof
Token ID9753
Feature activation-3.21e-03

Pos logit contributions
rated+0.011
fy+0.011
cluded+0.011
|+0.010
iling+0.010
Neg logit contributions
 doctor-0.009
 woman-0.009
 vet-0.009
 parent-0.009
 wizard-0.009
Token.
Token ID13
Feature activation+7.16e-03

Pos logit contributions
 woman+0.006
 doctor+0.006
 vet+0.006
 parent+0.006
 wizard+0.006
Neg logit contributions
rated-0.005
fy-0.005
cluded-0.005
|-0.005
ash-0.004
Token They
Token ID1119
Feature activation+3.45e-02

Pos logit contributions
rated+0.010
fy+0.009
|+0.009
cluded+0.009
iling+0.008
Neg logit contributions
 woman-0.016
 doctor-0.016
 wizard-0.015
 parent-0.015
 vet-0.015
Token would
Token ID561
Feature activation-9.69e-03

Pos logit contributions
rated+0.055
fy+0.053
cluded+0.051
|+0.048
iling+0.048
Neg logit contributions
 doctor-0.070
 woman-0.070
 parent-0.067
 wizard-0.066
 mouse-0.066
Token have
Token ID423
Feature activation-1.99e-02

Pos logit contributions
 doctor+0.020
 woman+0.020
 parent+0.019
 vet+0.019
 wizard+0.019
Neg logit contributions
rated-0.015
fy-0.015
cluded-0.014
|-0.013
heit-0.013
Token pic
Token ID8301
Feature activation+2.25e-01

Pos logit contributions
 doctor+0.034
 woman+0.034
 vet+0.032
 parent+0.032
 mouse+0.032
Neg logit contributions
rated-0.039
fy-0.037
cluded-0.036
|-0.034
iling-0.034
Tokenn
Token ID77
Feature activation+7.43e-02

Pos logit contributions
rated+0.272
fy+0.242
cluded+0.232
irm+0.220
ash+0.213
Neg logit contributions
 doctor-0.518
 woman-0.508
 vet-0.504
 parent-0.491
 wizard-0.490
Tokenics
Token ID873
Feature activation+3.18e-02

Pos logit contributions
rated+0.067
fy+0.062
cluded+0.059
irl+0.056
|+0.056
Neg logit contributions
 woman-0.201
 doctor-0.199
 wizard-0.194
 vet-0.194
 mouse-0.194
Token and
Token ID290
Feature activation+1.88e-02

Pos logit contributions
rated+0.061
fy+0.057
cluded+0.056
heit+0.055
iling+0.054
Neg logit contributions
 doctor-0.056
 woman-0.055
 mouse-0.052
 parent-0.052
 vet-0.052
Token play
Token ID711
Feature activation+4.94e-02

Pos logit contributions
rated+0.026
fy+0.026
|+0.024
cluded+0.024
ils+0.024
Neg logit contributions
 woman-0.040
 doctor-0.040
 vet-0.038
 parent-0.038
 wizard-0.038
Token games
Token ID1830
Feature activation+8.29e-02

Pos logit contributions
rated+0.079
fy+0.075
cluded+0.069
heit+0.069
|+0.066
Neg logit contributions
 doctor-0.104
 woman-0.102
 mouse-0.099
 vet-0.097
 wizard-0.097
Token see
Token ID766
Feature activation-5.06e-02

Pos logit contributions
rated+0.016
fy+0.015
cluded+0.015
iling+0.014
heit+0.014
Neg logit contributions
 doctor-0.022
 woman-0.022
 vet-0.021
 parent-0.021
 wizard-0.021
Token the
Token ID262
Feature activation-3.05e-02

Pos logit contributions
 doctor+0.086
 woman+0.085
 wizard+0.081
 parent+0.081
 vet+0.081
Neg logit contributions
rated-0.095
fy-0.091
cluded-0.088
iling-0.085
|-0.085
Token bird
Token ID6512
Feature activation-9.18e-02

Pos logit contributions
 doctor+0.039
 woman+0.038
 wizard+0.037
 mechanic+0.036
 parent+0.036
Neg logit contributions
rated-0.069
fy-0.064
cluded-0.061
|-0.061
rene-0.061
Token and
Token ID290
Feature activation+1.14e-03

Pos logit contributions
 doctor+0.156
 woman+0.156
 parent+0.147
 vet+0.147
 wizard+0.146
Neg logit contributions
rated-0.176
fy-0.170
cluded-0.163
|-0.158
iling-0.156
Token have
Token ID423
Feature activation-2.39e-02

Pos logit contributions
rated+0.002
fy+0.002
|+0.002
cluded+0.002
 matters+0.001
Neg logit contributions
 doctor-0.002
 woman-0.002
 mechanic-0.002
 wizard-0.002
 vet-0.002
Token pic
Token ID8301
Feature activation+2.22e-01

Pos logit contributions
 doctor+0.041
 woman+0.040
 vet+0.039
 mouse+0.038
 parent+0.038
Neg logit contributions
rated-0.049
fy-0.045
cluded-0.044
iling-0.043
heit-0.043
Tokenn
Token ID77
Feature activation+7.43e-02

Pos logit contributions
rated+0.276
fy+0.245
cluded+0.230
irm+0.222
ash+0.219
Neg logit contributions
 doctor-0.509
 vet-0.500
 woman-0.496
 fairy-0.485
 parent-0.482
Tokenics
Token ID873
Feature activation+3.19e-02

Pos logit contributions
rated+0.079
fy+0.074
cluded+0.069
|+0.065
iling+0.063
Neg logit contributions
 doctor-0.190
 woman-0.189
 parent-0.182
 vet-0.182
 wizard-0.181
Token.
Token ID13
Feature activation+3.46e-03

Pos logit contributions
rated+0.066
heit+0.063
fy+0.060
iling+0.058
cluded+0.058
Neg logit contributions
 doctor-0.054
 woman-0.053
 mouse-0.050
 fairy-0.050
 vet-0.050
Token<|endoftext|>
Token ID50256
Feature activation+9.12e-02

Pos logit contributions
|+0.007
rated+0.007
fy+0.007
cluded+0.007
rene+0.006
Neg logit contributions
 woman-0.005
 doctor-0.005
 wizard-0.005
 parent-0.005
 vet-0.005
TokenOnce
Token ID7454
Feature activation+1.21e-01

Pos logit contributions
rated+0.235
fy+0.229
cluded+0.221
heit+0.215
iling+0.214
Neg logit contributions
 doctor-0.099
 woman-0.096
 vet-0.088
 mouse-0.088
 parent-0.087
Token to
Token ID284
Feature activation-4.96e-03

Pos logit contributions
 woman+0.098
 wizard+0.096
 doctor+0.095
 parent+0.092
 mouse+0.092
Neg logit contributions
rated-0.129
fy-0.126
cluded-0.122
 matters-0.121
|-0.121
Token be
Token ID307
Feature activation+2.76e-02

Pos logit contributions
 doctor+0.009
 vet+0.009
 woman+0.009
 parent+0.009
 owl+0.008
Neg logit contributions
rated-0.009
fy-0.009
cluded-0.008
ash-0.008
estival-0.008
Token careful
Token ID8161
Feature activation-3.55e-04

Pos logit contributions
rated+0.056
fy+0.054
cluded+0.052
|+0.051
iling+0.050
Neg logit contributions
 doctor-0.044
 woman-0.044
 vet-0.042
 parent-0.042
 wizard-0.041
Token when
Token ID618
Feature activation-2.78e-02

Pos logit contributions
 doctor+0.000
 woman+0.000
 vet+0.000
 parent+0.000
 wizard+0.000
Neg logit contributions
rated-0.001
fy-0.001
cluded-0.001
|-0.001
iling-0.001
Token having
Token ID1719
Feature activation+5.90e-03

Pos logit contributions
 doctor+0.044
 woman+0.042
 parent+0.040
 mouse+0.040
 wizard+0.040
Neg logit contributions
rated-0.060
fy-0.059
cluded-0.057
rene-0.055
|-0.055
Token pic
Token ID8301
Feature activation+2.22e-01

Pos logit contributions
rated+0.012
fy+0.011
cluded+0.011
|+0.011
estival+0.011
Neg logit contributions
 doctor-0.010
 vet-0.009
 mouse-0.009
 bird-0.009
 woman-0.009
Tokenn
Token ID77
Feature activation+7.94e-02

Pos logit contributions
rated+0.270
fy+0.269
cluded+0.242
ash+0.228
ishy+0.226
Neg logit contributions
 doctor-0.506
 vet-0.493
 woman-0.480
 wizard-0.478
 owl-0.470
Tokenics
Token ID873
Feature activation+3.72e-02

Pos logit contributions
rated+0.100
fy+0.096
cluded+0.090
|+0.084
heit+0.083
Neg logit contributions
 doctor-0.188
 woman-0.186
 parent-0.180
 vet-0.179
 wizard-0.177
Token and
Token ID290
Feature activation+1.56e-02

Pos logit contributions
rated+0.080
fy+0.076
cluded+0.075
heit+0.074
estival+0.074
Neg logit contributions
 doctor-0.059
 woman-0.056
 mouse-0.054
 vet-0.053
 parent-0.052
Token not
Token ID407
Feature activation-1.11e-02

Pos logit contributions
rated+0.030
fy+0.028
cluded+0.027
|+0.027
iling+0.026
Neg logit contributions
 woman-0.027
 doctor-0.027
 parent-0.025
 vet-0.025
 wizard-0.025
Token to
Token ID284
Feature activation-2.20e-03

Pos logit contributions
 doctor+0.019
 woman+0.018
 vet+0.018
 mouse+0.017
 owl+0.017
Neg logit contributions
rated-0.022
fy-0.021
cluded-0.021
heit-0.020
rene-0.020
Token family
Token ID1641
Feature activation-3.64e-02

Pos logit contributions
rated+0.012
fy+0.012
cluded+0.012
ils+0.011
irl+0.011
Neg logit contributions
 doctor-0.005
 woman-0.005
 wizard-0.005
 vet-0.005
 mechanic-0.004
Token were
Token ID547
Feature activation-1.08e-02

Pos logit contributions
 doctor+0.071
 woman+0.070
 parent+0.066
 mouse+0.066
 wizard+0.066
Neg logit contributions
rated-0.065
fy-0.062
cluded-0.060
heit-0.057
estival-0.057
Token ready
Token ID3492
Feature activation+6.46e-02

Pos logit contributions
 doctor+0.020
 woman+0.020
 parent+0.019
 vet+0.019
 wizard+0.019
Neg logit contributions
rated-0.019
fy-0.019
cluded-0.018
|-0.017
iling-0.017
Token to
Token ID284
Feature activation+3.41e-03

Pos logit contributions
rated+0.152
fy+0.145
|+0.144
heit+0.143
cluded+0.141
Neg logit contributions
 doctor-0.085
 woman-0.084
 parent-0.077
 wizard-0.077
 mouse-0.075
Token go
Token ID467
Feature activation+7.83e-02

Pos logit contributions
rated+0.006
fy+0.006
cluded+0.005
ash+0.005
iling+0.005
Neg logit contributions
 doctor-0.007
 vet-0.007
 parent-0.007
 woman-0.007
 mouse-0.007
Token pic
Token ID8301
Feature activation+2.20e-01

Pos logit contributions
rated+0.181
heit+0.164
fy+0.159
iling+0.157
wy+0.154
Neg logit contributions
 doctor-0.127
 woman-0.116
 vet-0.114
 parent-0.111
 wizard-0.111
Tokenn
Token ID77
Feature activation+8.48e-02

Pos logit contributions
rated+0.242
fy+0.225
cluded+0.208
irm+0.193
iling+0.193
Neg logit contributions
 doctor-0.555
 woman-0.546
 vet-0.536
 parent-0.530
 wizard-0.527
Tokenicking
Token ID7958
Feature activation-2.92e-02

Pos logit contributions
rated+0.104
fy+0.098
cluded+0.092
heit+0.086
|+0.086
Neg logit contributions
 doctor-0.203
 woman-0.202
 parent-0.194
 vet-0.194
 wizard-0.192
Token again
Token ID757
Feature activation+3.27e-02

Pos logit contributions
 doctor+0.043
 mouse+0.041
 woman+0.040
 fairy+0.039
 bird+0.038
Neg logit contributions
rated-0.064
wy-0.062
fy-0.062
rene-0.060
cluded-0.060
Token.
Token ID13
Feature activation-4.10e-03

Pos logit contributions
rated+0.068
fy+0.064
heit+0.063
cluded+0.062
wy+0.061
Neg logit contributions
 woman-0.053
 doctor-0.052
 parent-0.050
 mouse-0.050
 vet-0.049
Token But
Token ID887
Feature activation-3.65e-02

Pos logit contributions
 doctor+0.008
 woman+0.008
 vet+0.008
 mouse+0.008
 parent+0.008
Neg logit contributions
rated-0.007
fy-0.006
cluded-0.006
wy-0.006
irl-0.006
Token went
Token ID1816
Feature activation+6.97e-02

Pos logit contributions
rated+0.049
fy+0.046
|+0.044
cluded+0.043
ils+0.042
Neg logit contributions
 doctor-0.079
 woman-0.078
 vet-0.075
 wizard-0.075
 parent-0.074
Token on
Token ID319
Feature activation+6.91e-02

Pos logit contributions
rated+0.132
fy+0.123
cluded+0.118
iling+0.116
heit+0.116
Neg logit contributions
 doctor-0.126
 woman-0.124
 mouse-0.119
 parent-0.118
 wizard-0.118
Token adventures
Token ID17545
Feature activation+1.05e-01

Pos logit contributions
rated+0.119
fy+0.112
cluded+0.107
iling+0.101
rene+0.101
Neg logit contributions
 doctor-0.136
 woman-0.135
 mouse-0.129
 parent-0.129
 vet-0.128
Token,
Token ID11
Feature activation-1.29e-02

Pos logit contributions
rated+0.193
fy+0.188
cluded+0.180
|+0.176
iling+0.173
Neg logit contributions
 doctor-0.186
 woman-0.185
 parent-0.176
 vet-0.176
 wizard-0.174
Token had
Token ID550
Feature activation-3.43e-03

Pos logit contributions
 doctor+0.024
 vet+0.023
 woman+0.023
 mechanic+0.022
 parent+0.022
Neg logit contributions
|-0.022
rated-0.022
fy-0.022
cluded-0.021
ils-0.020
Token pic
Token ID8301
Feature activation+2.20e-01

Pos logit contributions
 doctor+0.006
 woman+0.006
 parent+0.006
 vet+0.006
 wizard+0.006
Neg logit contributions
rated-0.006
fy-0.006
cluded-0.006
|-0.006
iling-0.006
Tokenn
Token ID77
Feature activation+6.60e-02

Pos logit contributions
rated+0.263
fy+0.237
cluded+0.223
irm+0.213
ash+0.211
Neg logit contributions
 doctor-0.502
 woman-0.498
 vet-0.491
 fairy-0.482
 parent-0.481
Tokenics
Token ID873
Feature activation+2.94e-02

Pos logit contributions
rated+0.064
fy+0.058
cluded+0.056
|+0.054
irl+0.053
Neg logit contributions
 doctor-0.175
 woman-0.174
 vet-0.170
 wizard-0.170
 mouse-0.168
Token,
Token ID11
Feature activation-1.25e-02

Pos logit contributions
rated+0.065
fy+0.060
heit+0.059
cluded+0.059
estival+0.058
Neg logit contributions
 doctor-0.044
 woman-0.044
 mouse-0.042
 parent-0.041
 vet-0.041
Token and
Token ID290
Feature activation+4.63e-03

Pos logit contributions
 doctor+0.023
 vet+0.022
 mechanic+0.022
 wizard+0.022
 woman+0.022
Neg logit contributions
|-0.020
fy-0.020
irl-0.019
rated-0.018
cluded-0.018
Token even
Token ID772
Feature activation+2.87e-02

Pos logit contributions
|+0.007
fy+0.007
rated+0.006
ils+0.006
 matters+0.006
Neg logit contributions
 doctor-0.010
 woman-0.010
 vet-0.009
 wizard-0.009
 mechanic-0.009
Token children
Token ID1751
Feature activation-1.75e-02

Pos logit contributions
 doctor+0.011
 woman+0.011
 wizard+0.011
 parent+0.011
 vet+0.011
Neg logit contributions
rated-0.012
fy-0.012
cluded-0.012
|-0.011
ils-0.011
Token running
Token ID2491
Feature activation+8.71e-02

Pos logit contributions
 doctor+0.030
 woman+0.030
 parent+0.028
 wizard+0.028
 vet+0.028
Neg logit contributions
rated-0.034
fy-0.032
cluded-0.031
heit-0.030
|-0.030
Token around
Token ID1088
Feature activation+1.04e-02

Pos logit contributions
rated+0.149
wy+0.138
iling+0.137
fy+0.135
heit+0.132
Neg logit contributions
 woman-0.171
 doctor-0.166
 mouse-0.166
 conductor-0.158
 fairy-0.157
Token,
Token ID11
Feature activation-2.31e-02

Pos logit contributions
rated+0.023
fy+0.022
cluded+0.021
rene+0.020
|+0.020
Neg logit contributions
 doctor-0.015
 woman-0.015
 mouse-0.014
 wizard-0.014
 parent-0.014
Token families
Token ID4172
Feature activation+4.66e-03

Pos logit contributions
 doctor+0.040
 woman+0.040
 vet+0.038
 parent+0.038
 wizard+0.038
Neg logit contributions
rated-0.042
fy-0.041
cluded-0.039
|-0.039
irm-0.038
Token pic
Token ID8301
Feature activation+2.19e-01

Pos logit contributions
rated+0.009
fy+0.009
cluded+0.008
|+0.008
iling+0.008
Neg logit contributions
 doctor-0.008
 woman-0.008
 vet-0.007
 parent-0.007
 wizard-0.007
Tokenn
Token ID77
Feature activation+6.41e-02

Pos logit contributions
rated+0.276
fy+0.260
irm+0.233
ishy+0.232
cluded+0.231
Neg logit contributions
 woman-0.504
 doctor-0.493
 conductor-0.485
 wizard-0.482
 owl-0.480
Tokenicking
Token ID7958
Feature activation-3.56e-02

Pos logit contributions
rated+0.078
fy+0.072
cluded+0.069
|+0.066
iling+0.064
Neg logit contributions
 doctor-0.155
 woman-0.153
 vet-0.149
 wizard-0.148
 mouse-0.148
Token,
Token ID11
Feature activation-2.29e-02

Pos logit contributions
 mouse+0.053
 woman+0.053
 doctor+0.052
 parent+0.049
 fairy+0.049
Neg logit contributions
rated-0.076
fy-0.072
rene-0.071
wy-0.070
cluded-0.070
Token couples
Token ID11886
Feature activation+4.22e-03

Pos logit contributions
 doctor+0.037
 woman+0.036
 vet+0.036
 parent+0.035
 wizard+0.035
Neg logit contributions
rated-0.043
fy-0.042
cluded-0.040
|-0.040
irm-0.039
Token st
Token ID336
Feature activation+1.07e-01

Pos logit contributions
rated+0.008
fy+0.008
cluded+0.008
|+0.007
rene+0.007
Neg logit contributions
 doctor-0.007
 woman-0.007
 parent-0.007
 mouse-0.007
 wizard-0.007
Token favourite
Token ID12507
Feature activation-2.46e-02

Pos logit contributions
rated+0.029
cluded+0.027
fy+0.027
iling+0.026
heit+0.026
Neg logit contributions
 doctor-0.026
 woman-0.025
 parent-0.025
 mouse-0.024
 fairy-0.024
Token place
Token ID1295
Feature activation-1.39e-02

Pos logit contributions
 doctor+0.043
 woman+0.042
 mouse+0.041
 fairy+0.040
 parent+0.039
Neg logit contributions
rated-0.049
fy-0.047
cluded-0.045
wy-0.045
heit-0.045
Token to
Token ID284
Feature activation+8.15e-03

Pos logit contributions
 doctor+0.022
 woman+0.022
 fairy+0.021
 mouse+0.021
 parent+0.021
Neg logit contributions
rated-0.029
heit-0.027
iling-0.027
wy-0.026
fy-0.026
Token go
Token ID467
Feature activation+7.22e-02

Pos logit contributions
rated+0.014
fy+0.014
heit+0.013
iling+0.013
cluded+0.012
Neg logit contributions
 doctor-0.016
 woman-0.015
 fairy-0.015
 wizard-0.015
 vet-0.015
Token for
Token ID329
Feature activation-4.35e-03

Pos logit contributions
rated+0.161
heit+0.154
wy+0.149
iling+0.145
fy+0.145
Neg logit contributions
 doctor-0.114
 woman-0.104
 fairy-0.102
 mechanic-0.101
 parent-0.101
Token pic
Token ID8301
Feature activation+2.18e-01

Pos logit contributions
 doctor+0.006
 woman+0.006
 vet+0.006
 mouse+0.006
 parent+0.005
Neg logit contributions
rated-0.011
fy-0.010
cluded-0.010
ash-0.009
heit-0.009
Tokenn
Token ID77
Feature activation+7.56e-02

Pos logit contributions
rated+0.277
fy+0.256
ishy+0.249
ash+0.245
irm+0.244
Neg logit contributions
 doctor-0.486
 vet-0.476
 woman-0.476
 wizard-0.461
 parent-0.461
Tokenics
Token ID873
Feature activation+3.42e-02

Pos logit contributions
rated+0.125
fy+0.122
heit+0.114
cluded+0.113
iling+0.109
Neg logit contributions
 doctor-0.147
 woman-0.145
 parent-0.140
 vet-0.137
 mechanic-0.135
Token.
Token ID13
Feature activation+3.58e-03

Pos logit contributions
rated+0.076
heit+0.072
fy+0.069
cluded+0.067
iling+0.067
Neg logit contributions
 doctor-0.054
 woman-0.052
 mouse-0.048
 vet-0.048
 fairy-0.048
Token<|endoftext|>
Token ID50256
Feature activation+8.86e-02

Pos logit contributions
rated+0.008
fy+0.008
cluded+0.008
|+0.007
iling+0.007
Neg logit contributions
 doctor-0.005
 woman-0.005
 parent-0.004
 vet-0.004
 wizard-0.004
TokenOnce
Token ID7454
Feature activation+1.17e-01

Pos logit contributions
rated+0.224
fy+0.220
heit+0.208
cluded+0.208
irm+0.201
Neg logit contributions
 doctor-0.108
 woman-0.103
 mouse-0.100
 wizard-0.093
 fairy-0.093
Token angry
Token ID7954
Feature activation-2.26e-02

Pos logit contributions
 mouse+0.048
 woman+0.047
 parent+0.046
 vet+0.046
 doctor+0.046
Neg logit contributions
irl-0.043
cluded-0.043
ag-0.041
 matters-0.041
cks-0.041
Token.
Token ID13
Feature activation+2.89e-03

Pos logit contributions
 woman+0.048
 doctor+0.048
 parent+0.047
 vet+0.046
 wizard+0.046
Neg logit contributions
rated-0.032
fy-0.031
cluded-0.029
iling-0.029
|-0.028
Token He
Token ID679
Feature activation-2.27e-02

Pos logit contributions
rated+0.005
fy+0.005
|+0.005
cluded+0.005
 matters+0.004
Neg logit contributions
 doctor-0.005
 woman-0.005
 parent-0.005
 vet-0.005
 wizard-0.005
Token is
Token ID318
Feature activation-3.52e-02

Pos logit contributions
 doctor+0.046
 woman+0.046
 wizard+0.044
 vet+0.044
 parent+0.043
Neg logit contributions
rated-0.037
fy-0.035
cluded-0.033
|-0.032
estival-0.031
Token the
Token ID262
Feature activation-3.31e-02

Pos logit contributions
 woman+0.061
 mouse+0.061
 parent+0.060
 doctor+0.060
 vet+0.060
Neg logit contributions
cluded-0.055
rated-0.054
fy-0.053
irl-0.053
ag-0.052
Token artist
Token ID6802
Feature activation-1.50e-01

Pos logit contributions
 vet+0.043
ress+0.043
 doctor+0.043
Ros+0.040
 mouse+0.040
Neg logit contributions
rated-0.059
 matters-0.055
wy-0.055
cluded-0.055
 mine-0.054
Token.
Token ID13
Feature activation+5.77e-03

Pos logit contributions
 doctor+0.305
 woman+0.304
 parent+0.290
 vet+0.290
 wizard+0.288
Neg logit contributions
rated-0.235
fy-0.225
cluded-0.215
|-0.206
iling-0.204
Token
Token ID198
Feature activation-1.69e-02

Pos logit contributions
rated+0.009
|+0.009
fy+0.009
cluded+0.009
 matters+0.008
Neg logit contributions
 woman-0.011
 doctor-0.010
 parent-0.010
 wizard-0.010
 vet-0.010
Token
Token ID198
Feature activation-2.12e-02

Pos logit contributions
 woman+0.027
 parent+0.026
 doctor+0.026
 wizard+0.026
 mouse+0.026
Neg logit contributions
|-0.031
fy-0.030
irl-0.029
rated-0.029
cluded-0.029
Token"
Token ID1
Feature activation-4.50e-02

Pos logit contributions
 doctor+0.039
 woman+0.038
 parent+0.037
 wizard+0.036
 vet+0.036
Neg logit contributions
rated-0.036
fy-0.036
cluded-0.034
|-0.034
iling-0.033
TokenSorry
Token ID14385
Feature activation+7.05e-03

Pos logit contributions
 doctor+0.085
 parent+0.085
 wizard+0.082
 grandmother+0.080
 mouse+0.079
Neg logit contributions
rated-0.067
|-0.064
iling-0.064
cluded-0.064
fy-0.063
Token beautiful
Token ID4950
Feature activation-2.17e-02

Pos logit contributions
rated+0.095
fy+0.089
|+0.089
cluded+0.087
iling+0.086
Neg logit contributions
 doctor-0.078
 woman-0.077
 wizard-0.073
 parent-0.071
 vet-0.071
Token.
Token ID13
Feature activation+7.05e-04

Pos logit contributions
 doctor+0.034
 woman+0.034
 wizard+0.032
 mouse+0.031
 vet+0.031
Neg logit contributions
rated-0.045
fy-0.043
cluded-0.042
|-0.042
heit-0.041
Token
Token ID198
Feature activation-3.11e-02

Pos logit contributions
|+0.001
fy+0.001
rated+0.001
cluded+0.001
 matters+0.001
Neg logit contributions
 woman-0.001
 doctor-0.001
 parent-0.001
 wizard-0.001
 vet-0.001
Token
Token ID198
Feature activation-3.66e-02

Pos logit contributions
 woman+0.055
 doctor+0.054
 mouse+0.053
 parent+0.053
 wizard+0.052
Neg logit contributions
|-0.053
rated-0.052
fy-0.051
cluded-0.051
irl-0.051
TokenThe
Token ID464
Feature activation-5.80e-02

Pos logit contributions
 woman+0.056
 mouse+0.055
 parent+0.055
 doctor+0.055
 wizard+0.052
Neg logit contributions
rated-0.071
fy-0.070
cluded-0.070
irl-0.069
|-0.069
Token artist
Token ID6802
Feature activation-1.45e-01

Pos logit contributions
 vet+0.077
 doctor+0.071
 blood+0.064
 gen+0.062
 Owl+0.061
Neg logit contributions
rated-0.125
fy-0.115
 afternoon-0.112
ils-0.112
cluded-0.111
Token said
Token ID531
Feature activation-5.71e-02

Pos logit contributions
 doctor+0.257
 woman+0.254
 vet+0.246
 mouse+0.241
 parent+0.241
Neg logit contributions
rated-0.259
fy-0.250
cluded-0.238
ils-0.231
iling-0.230
Token she
Token ID673
Feature activation-7.09e-03

Pos logit contributions
 woman+0.101
 doctor+0.098
 parent+0.098
 vet+0.097
 mouse+0.094
Neg logit contributions
rated-0.096
fy-0.095
cluded-0.091
ils-0.088
heit-0.087
Token was
Token ID373
Feature activation-3.44e-02

Pos logit contributions
 doctor+0.013
 woman+0.013
 vet+0.012
 parent+0.012
 wizard+0.012
Neg logit contributions
rated-0.012
fy-0.012
cluded-0.011
|-0.011
rene-0.011
Token very
Token ID845
Feature activation+3.07e-02

Pos logit contributions
 doctor+0.056
 woman+0.055
 parent+0.052
 wizard+0.052
 vet+0.051
Neg logit contributions
rated-0.070
fy-0.068
cluded-0.065
|-0.064
iling-0.063
Token talented
Token ID12356
Feature activation-3.48e-02

Pos logit contributions
rated+0.067
fy+0.064
cluded+0.062
|+0.061
iling+0.060
Neg logit contributions
 doctor-0.045
 woman-0.045
 parent-0.042
 vet-0.042
 wizard-0.042
Token Grand
Token ID5675
Feature activation+1.51e-02

Pos logit contributions
 doctor+0.033
 woman+0.033
 vet+0.031
 wizard+0.031
 parent+0.031
Neg logit contributions
rated-0.048
fy-0.047
cluded-0.045
|-0.043
iling-0.043
Tokenma
Token ID2611
Feature activation-4.92e-02

Pos logit contributions
rated+0.034
fy+0.033
irm+0.032
cluded+0.032
heit+0.031
Neg logit contributions
 doctor-0.021
 woman-0.021
 parent-0.020
 painter-0.020
 mechanic-0.019
Token,"
Token ID553
Feature activation-4.83e-02

Pos logit contributions
 doctor+0.109
 woman+0.109
 parent+0.104
 mouse+0.103
 wizard+0.103
Neg logit contributions
rated-0.070
fy-0.066
cluded-0.065
heit-0.062
|-0.062
Token said
Token ID531
Feature activation-5.69e-02

Pos logit contributions
 doctor+0.084
 vet+0.082
 woman+0.081
 wizard+0.081
 parent+0.080
Neg logit contributions
fy-0.085
rated-0.082
cluded-0.078
 matters-0.076
|-0.075
Token the
Token ID262
Feature activation-4.87e-02

Pos logit contributions
 vet+0.091
 doctor+0.090
 woman+0.089
 wizard+0.089
 mouse+0.087
Neg logit contributions
fy-0.102
rated-0.102
 matters-0.095
cluded-0.095
|-0.094
Token granddaughter
Token ID46458
Feature activation-1.43e-01

Pos logit contributions
 wizard+0.048
 vet+0.046
 doctor+0.045
 woman+0.042
 gen+0.041
Neg logit contributions
rated-0.115
fy-0.114
cluded-0.107
 Saturday-0.104
ils-0.102
Token.
Token ID13
Feature activation-1.21e-02

Pos logit contributions
 doctor+0.270
 woman+0.269
 parent+0.256
 vet+0.256
 wizard+0.254
Neg logit contributions
rated-0.245
fy-0.236
cluded-0.225
|-0.218
iling-0.214
Token
Token ID220
Feature activation+2.10e-02

Pos logit contributions
 doctor+0.020
 wizard+0.020
 woman+0.019
 mouse+0.019
 vet+0.019
Neg logit contributions
fy-0.022
rated-0.021
cluded-0.020
|-0.020
estival-0.020
Token
Token ID198
Feature activation-3.86e-02

Pos logit contributions
rated+0.048
fy+0.047
cluded+0.046
|+0.045
iling+0.044
Neg logit contributions
 woman-0.027
 doctor-0.026
 wizard-0.025
 vet-0.025
 mouse-0.025
Token
Token ID198
Feature activation-4.15e-02

Pos logit contributions
 wizard+0.056
 doctor+0.056
 woman+0.055
 mouse+0.055
 vet+0.053
Neg logit contributions
fy-0.075
|-0.075
rated-0.074
cluded-0.074
irl-0.072
TokenThey
Token ID2990
Feature activation+3.06e-02

Pos logit contributions
 doctor+0.057
 woman+0.056
 mouse+0.056
 wizard+0.056
 parent+0.054
Neg logit contributions
fy-0.087
rated-0.087
cluded-0.084
|-0.082
iling-0.082
Token.
Token ID13
Feature activation+2.63e-03

Pos logit contributions
rated+0.017
fy+0.016
|+0.016
heit+0.015
cluded+0.015
Neg logit contributions
 doctor-0.015
 woman-0.014
 wizard-0.014
 fairy-0.013
 mouse-0.013
Token He
Token ID679
Feature activation-1.71e-02

Pos logit contributions
rated+0.005
fy+0.005
|+0.004
cluded+0.004
irm+0.004
Neg logit contributions
 woman-0.005
 doctor-0.005
 wizard-0.004
 parent-0.004
 vet-0.004
Token was
Token ID373
Feature activation-2.84e-02

Pos logit contributions
 doctor+0.031
 woman+0.031
 vet+0.029
 mouse+0.029
 wizard+0.029
Neg logit contributions
rated-0.030
fy-0.030
cluded-0.027
irl-0.027
ils-0.027
Token becoming
Token ID5033
Feature activation-2.53e-02

Pos logit contributions
 woman+0.050
 doctor+0.049
 mouse+0.047
 parent+0.047
 vet+0.047
Neg logit contributions
rated-0.051
fy-0.050
cluded-0.048
rene-0.046
irl-0.046
Token an
Token ID281
Feature activation-4.35e-03

Pos logit contributions
 woman+0.043
 doctor+0.042
 parent+0.041
 vet+0.040
 mouse+0.040
Neg logit contributions
rated-0.046
fy-0.045
cluded-0.043
irl-0.042
rene-0.042
Token artist
Token ID6802
Feature activation-1.41e-01

Pos logit contributions
 doctor+0.008
 woman+0.008
 vet+0.007
 parent+0.007
 mouse+0.007
Neg logit contributions
rated-0.008
fy-0.008
cluded-0.007
iling-0.007
rene-0.007
Token.
Token ID13
Feature activation-2.04e-03

Pos logit contributions
 woman+0.252
 doctor+0.250
 vet+0.248
 parent+0.238
 wizard+0.232
Neg logit contributions
fy-0.234
rated-0.231
cluded-0.216
ils-0.211
iling-0.211
Token His
Token ID2399
Feature activation-2.16e-02

Pos logit contributions
 woman+0.003
 doctor+0.003
 wizard+0.003
 parent+0.003
 fairy+0.003
Neg logit contributions
fy-0.004
rated-0.004
|-0.004
cluded-0.004
irm-0.003
Token mom
Token ID1995
Feature activation-7.02e-02

Pos logit contributions
 doctor+0.026
 woman+0.026
 vet+0.025
 wizard+0.024
 mouse+0.024
Neg logit contributions
rated-0.049
fy-0.048
cluded-0.047
irl-0.046
heit-0.045
Token was
Token ID373
Feature activation-3.94e-02

Pos logit contributions
 doctor+0.134
 vet+0.132
 woman+0.132
 wizard+0.127
 parent+0.125
Neg logit contributions
fy-0.111
rated-0.109
ils-0.102
iling-0.099
cluded-0.097
Token so
Token ID523
Feature activation+4.31e-02

Pos logit contributions
 mouse+0.073
 woman+0.072
 parent+0.070
 doctor+0.070
 vet+0.069
Neg logit contributions
irl-0.060
fy-0.059
rated-0.059
cluded-0.059
iling-0.057
Token she
Token ID673
Feature activation+1.58e-02

Pos logit contributions
rated+0.032
fy+0.029
cluded+0.029
|+0.029
rene+0.029
Neg logit contributions
 doctor-0.021
 woman-0.020
 parent-0.020
 fairy-0.019
 mouse-0.019
Token'll
Token ID1183
Feature activation-3.20e-02

Pos logit contributions
rated+0.029
fy+0.028
cluded+0.027
|+0.026
heit+0.026
Neg logit contributions
 doctor-0.029
 woman-0.028
 parent-0.027
 vet-0.027
 wizard-0.027
Token be
Token ID307
Feature activation+1.87e-02

Pos logit contributions
 woman+0.065
 doctor+0.062
 mouse+0.061
 vet+0.060
 parent+0.059
Neg logit contributions
rated-0.048
fy-0.046
irl-0.046
cluded-0.046
rene-0.044
Token a
Token ID257
Feature activation-5.18e-02

Pos logit contributions
rated+0.034
fy+0.033
cluded+0.032
rene+0.031
irl+0.031
Neg logit contributions
 woman-0.032
 doctor-0.031
 vet-0.030
 mouse-0.030
 parent-0.030
Token powerful
Token ID3665
Feature activation-4.48e-02

Pos logit contributions
 woman+0.071
 conductor+0.070
 blood+0.070
 parent+0.068
 vet+0.067
Neg logit contributions
fy-0.088
 mine-0.080
rated-0.079
cluded-0.077
irl-0.077
Token artist
Token ID6802
Feature activation-1.40e-01

Pos logit contributions
 vet+0.049
 parent+0.048
 woman+0.047
 doctor+0.046
 conductor+0.046
Neg logit contributions
rated-0.094
fy-0.092
cluded-0.086
iling-0.082
 matters-0.081
Token just
Token ID655
Feature activation+1.21e-02

Pos logit contributions
 woman+0.274
 doctor+0.270
 vet+0.267
 parent+0.260
 wizard+0.256
Neg logit contributions
rated-0.211
fy-0.207
cluded-0.198
ils-0.193
iling-0.187
Token like
Token ID588
Feature activation-1.77e-02

Pos logit contributions
fy+0.023
rated+0.023
|+0.022
cluded+0.022
heit+0.021
Neg logit contributions
 doctor-0.020
 woman-0.020
 wizard-0.019
 owl-0.019
 parent-0.019
Token the
Token ID262
Feature activation-1.75e-02

Pos logit contributions
 woman+0.031
 doctor+0.031
 vet+0.030
 parent+0.030
 wizard+0.029
Neg logit contributions
rated-0.030
fy-0.030
cluded-0.029
ils-0.028
|-0.028
Token boy
Token ID2933
Feature activation-4.89e-02

Pos logit contributions
 parent+0.025
 vet+0.025
 doctor+0.025
 woman+0.024
 conductor+0.024
Neg logit contributions
rated-0.031
cluded-0.029
fy-0.029
 time-0.028
 matters-0.028
Token in
Token ID287
Feature activation+5.60e-02

Pos logit contributions
 doctor+0.081
 woman+0.078
 mouse+0.075
 parent+0.074
 fairy+0.073
Neg logit contributions
rated-0.103
heit-0.097
fy-0.095
cluded-0.093
|-0.091
Token special
Token ID2041
Feature activation-1.01e-02

Pos logit contributions
 woman+0.047
 doctor+0.045
 wizard+0.044
 parent+0.044
 vet+0.043
Neg logit contributions
fy-0.053
rated-0.051
cluded-0.050
|-0.047
 matters-0.046
Token present
Token ID1944
Feature activation-4.17e-02

Pos logit contributions
 woman+0.016
 doctor+0.016
 parent+0.015
 vet+0.015
 wizard+0.015
Neg logit contributions
rated-0.020
fy-0.020
cluded-0.019
|-0.018
estival-0.018
Token!"
Token ID2474
Feature activation-1.10e-02

Pos logit contributions
 doctor+0.085
 woman+0.084
 mouse+0.080
 parent+0.080
 vet+0.079
Neg logit contributions
rated-0.069
fy-0.064
cluded-0.062
|-0.062
heit-0.061
Token said
Token ID531
Feature activation-4.98e-02

Pos logit contributions
 doctor+0.018
 woman+0.018
 vet+0.018
 parent+0.018
 mouse+0.018
Neg logit contributions
rated-0.018
|-0.018
fy-0.018
cluded-0.018
 matters-0.018
Token the
Token ID262
Feature activation-4.91e-02

Pos logit contributions
 woman+0.099
 vet+0.098
 doctor+0.098
 parent+0.095
 wizard+0.094
Neg logit contributions
rated-0.066
fy-0.063
 matters-0.062
cluded-0.061
|-0.060
Token fairy
Token ID25607
Feature activation-1.40e-01

Pos logit contributions
 parent+0.042
 woman+0.042
 vet+0.041
 doctor+0.041
 wizard+0.039
Neg logit contributions
rated-0.110
fy-0.107
cluded-0.104
heit-0.103
|-0.103
Token.
Token ID13
Feature activation-1.26e-02

Pos logit contributions
 doctor+0.284
 vet+0.276
 woman+0.275
 parent+0.271
 mechanic+0.268
Neg logit contributions
rated-0.205
fy-0.205
cluded-0.190
rene-0.177
|-0.174
Token
Token ID198
Feature activation-2.98e-02

Pos logit contributions
 doctor+0.021
 parent+0.021
 wizard+0.020
 woman+0.020
 fairy+0.020
Neg logit contributions
|-0.021
rated-0.021
fy-0.021
cluded-0.020
estival-0.020
Token
Token ID198
Feature activation-3.33e-02

Pos logit contributions
 doctor+0.043
 woman+0.043
 parent+0.042
 wizard+0.042
 mouse+0.042
Neg logit contributions
|-0.058
rated-0.058
fy-0.057
cluded-0.057
rene-0.056
TokenJohn
Token ID7554
Feature activation+1.10e-02

Pos logit contributions
 doctor+0.063
 woman+0.062
 parent+0.060
 wizard+0.060
 mouse+0.060
Neg logit contributions
rated-0.056
fy-0.054
cluded-0.053
iling-0.051
|-0.051
Token was
Token ID373
Feature activation-3.06e-02

Pos logit contributions
rated+0.022
fy+0.021
cluded+0.020
|+0.019
iling+0.019
Neg logit contributions
 doctor-0.018
 woman-0.018
 parent-0.017
 vet-0.017
 wizard-0.017
Token lose
Token ID4425
Feature activation-3.24e-02

Pos logit contributions
 woman+0.047
 doctor+0.047
 parent+0.044
 vet+0.044
 wizard+0.044
Neg logit contributions
rated-0.046
fy-0.044
cluded-0.043
|-0.041
iling-0.041
Token their
Token ID511
Feature activation+2.38e-02

Pos logit contributions
 doctor+0.063
 woman+0.063
 parent+0.060
 vet+0.060
 wizard+0.059
Neg logit contributions
rated-0.054
fy-0.052
cluded-0.050
|-0.048
iling-0.047
Token magic
Token ID5536
Feature activation-2.23e-02

Pos logit contributions
rated+0.047
fy+0.045
cluded+0.043
rene+0.042
iling+0.042
Neg logit contributions
 doctor-0.041
 woman-0.041
 parent-0.040
 mouse-0.039
 fairy-0.039
Token."
Token ID526
Feature activation+1.35e-02

Pos logit contributions
 doctor+0.035
 woman+0.035
 vet+0.033
 parent+0.033
 wizard+0.033
Neg logit contributions
rated-0.045
fy-0.044
cluded-0.042
|-0.041
iling-0.040
Token the
Token ID262
Feature activation-4.66e-02

Pos logit contributions
 matters+0.021
|+0.021
irl+0.020
irm+0.020
fy+0.020
Neg logit contributions
 wizard-0.022
 doctor-0.022
 gen-0.022
 woman-0.022
 conductor-0.022
Token fairy
Token ID25607
Feature activation-1.40e-01

Pos logit contributions
 mechanic+0.032
 paramed+0.029
 vet+0.028
 conductor+0.027
smith+0.027
Neg logit contributions
rated-0.106
irl-0.106
 matters-0.105
ils-0.105
heit-0.101
Token said
Token ID531
Feature activation-4.94e-02

Pos logit contributions
 doctor+0.284
 mechanic+0.284
 parent+0.281
 scientist+0.279
 vet+0.276
Neg logit contributions
rated-0.169
fy-0.166
ils-0.163
led-0.158
cluded-0.156
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
 woman+0.108
 doctor+0.108
 vet+0.104
 parent+0.104
 wizard+0.103
Neg logit contributions
rated-0.068
fy-0.066
cluded-0.062
irm-0.058
iling-0.058
Token
Token ID198
Feature activation-1.69e-02

Pos logit contributions
 matters+0.010
|+0.009
aks+0.009
irm+0.009
 Today+0.009
Neg logit contributions
 wizard-0.022
 parent-0.022
 conductor-0.022
 painter-0.022
 mechanic-0.022
Token
Token ID198
Feature activation-2.04e-02

Pos logit contributions
 woman+0.029
 wizard+0.029
 doctor+0.028
 painter+0.028
 mouse+0.027
Neg logit contributions
|-0.029
fy-0.027
irl-0.027
rated-0.027
 matters-0.027
TokenL
Token ID43
Feature activation+4.35e-02

Pos logit contributions
 doctor+0.034
 woman+0.034
 wizard+0.033
 parent+0.033
 mouse+0.032
Neg logit contributions
fy-0.037
rated-0.037
cluded-0.035
|-0.034
iling-0.034
Token paint
Token ID7521
Feature activation-1.21e-02

Pos logit contributions
rated+0.004
fy+0.004
cluded+0.004
|+0.004
iling+0.003
Neg logit contributions
 doctor-0.004
 woman-0.004
 parent-0.003
 vet-0.003
 wizard-0.003
Token.
Token ID13
Feature activation+1.49e-02

Pos logit contributions
 doctor+0.023
 woman+0.022
 mouse+0.021
 wizard+0.021
 vet+0.021
Neg logit contributions
rated-0.022
fy-0.021
cluded-0.020
|-0.020
iling-0.020
Token
Token ID198
Feature activation-1.74e-02

Pos logit contributions
rated+0.025
fy+0.025
|+0.024
cluded+0.024
irm+0.023
Neg logit contributions
 woman-0.027
 doctor-0.026
 parent-0.026
 wizard-0.026
 vet-0.025
Token
Token ID198
Feature activation-2.37e-02

Pos logit contributions
 woman+0.027
 doctor+0.026
 parent+0.025
 mouse+0.025
 wizard+0.025
Neg logit contributions
|-0.034
rated-0.033
fy-0.033
cluded-0.033
rene-0.032
TokenThe
Token ID464
Feature activation-5.07e-02

Pos logit contributions
 woman+0.038
 doctor+0.038
 parent+0.036
 mouse+0.035
 wizard+0.035
Neg logit contributions
rated-0.047
fy-0.046
cluded-0.045
|-0.044
rene-0.043
Token artist
Token ID6802
Feature activation-1.40e-01

Pos logit contributions
 doctor+0.041
 woman+0.040
 parent+0.038
 vet+0.038
 wizard+0.037
Neg logit contributions
rated-0.136
fy-0.134
cluded-0.129
irl-0.124
|-0.124
Token watched
Token ID7342
Feature activation+9.96e-04

Pos logit contributions
 doctor+0.242
 woman+0.241
 vet+0.230
 parent+0.228
 wizard+0.227
Neg logit contributions
rated-0.259
fy-0.250
cluded-0.239
iling-0.230
|-0.230
Token as
Token ID355
Feature activation+1.26e-02

Pos logit contributions
rated+0.002
fy+0.002
cluded+0.002
|+0.001
ils+0.001
Neg logit contributions
 woman-0.002
 doctor-0.002
 vet-0.002
 parent-0.002
 wizard-0.002
Token the
Token ID262
Feature activation-1.87e-02

Pos logit contributions
rated+0.017
irl+0.017
 matters+0.017
fy+0.016
 Saturday+0.016
Neg logit contributions
 vet-0.023
 woman-0.022
 doctor-0.022
 parent-0.022
 mechanic-0.022
Token little
Token ID1310
Feature activation-5.13e-02

Pos logit contributions
 vet+0.020
 doctor+0.019
 woman+0.016
 parent+0.016
 wizard+0.016
Neg logit contributions
rated-0.044
fy-0.043
cluded-0.041
ils-0.040
estival-0.039
Token girl
Token ID2576
Feature activation-7.00e-02

Pos logit contributions
 doctor+0.054
 vet+0.053
 parent+0.049
 woman+0.049
 conductor+0.047
Neg logit contributions
rated-0.122
fy-0.120
cluded-0.113
 matters-0.111
iling-0.111
Token You
Token ID921
Feature activation+3.41e-02

Pos logit contributions
 woman+0.006
 doctor+0.005
 wizard+0.005
 parent+0.005
 vet+0.005
Neg logit contributions
rated-0.004
fy-0.004
|-0.003
cluded-0.003
estival-0.003
Token're
Token ID821
Feature activation-7.99e-03

Pos logit contributions
rated+0.046
fy+0.044
cluded+0.040
|+0.040
ils+0.039
Neg logit contributions
 woman-0.078
 doctor-0.077
 vet-0.075
 parent-0.073
 wizard-0.073
Token such
Token ID884
Feature activation+7.84e-03

Pos logit contributions
 woman+0.014
 doctor+0.014
 vet+0.013
 parent+0.013
 mouse+0.013
Neg logit contributions
rated-0.014
fy-0.014
cluded-0.013
iling-0.013
|-0.013
Token a
Token ID257
Feature activation-3.34e-02

Pos logit contributions
rated+0.018
fy+0.017
cluded+0.016
|+0.016
iling+0.016
Neg logit contributions
 doctor-0.011
 woman-0.011
 parent-0.010
 mouse-0.010
 vet-0.010
Token talented
Token ID12356
Feature activation-3.45e-02

Pos logit contributions
 woman+0.071
 doctor+0.070
 parent+0.068
 conductor+0.067
 vet+0.067
Neg logit contributions
fy-0.046
rated-0.043
cluded-0.039
rene-0.038
irl-0.037
Token artist
Token ID6802
Feature activation-1.39e-01

Pos logit contributions
 vet+0.036
 doctor+0.034
 woman+0.033
 wizard+0.031
 conductor+0.031
Neg logit contributions
rated-0.082
fy-0.079
cluded-0.076
iling-0.074
ils-0.073
Token!"
Token ID2474
Feature activation+1.52e-04

Pos logit contributions
 doctor+0.328
 woman+0.327
 vet+0.321
 parent+0.315
 wizard+0.313
Neg logit contributions
rated-0.166
fy-0.158
cluded-0.145
iling-0.139
ils-0.137
Token<|endoftext|>
Token ID50256
Feature activation+8.47e-02

Pos logit contributions
fy+0.000
rated+0.000
|+0.000
cluded+0.000
irm+0.000
Neg logit contributions
 woman-0.000
 doctor-0.000
 wizard-0.000
 parent-0.000
 mechanic-0.000
TokenOnce
Token ID7454
Feature activation+1.13e-01

Pos logit contributions
rated+0.231
fy+0.225
cluded+0.214
heit+0.214
ils+0.207
Neg logit contributions
 doctor-0.088
 woman-0.083
 mouse-0.079
 vet-0.074
 wizard-0.073
Token upon
Token ID2402
Feature activation+3.73e-02

Pos logit contributions
rated+0.200
fy+0.191
cluded+0.184
|+0.181
irl+0.175
Neg logit contributions
 doctor-0.206
 woman-0.205
 vet-0.200
 wizard-0.196
 mouse-0.194
Token a
Token ID257
Feature activation+3.38e-03

Pos logit contributions
rated+0.078
fy+0.075
cluded+0.073
|+0.072
rene+0.070
Neg logit contributions
 woman-0.055
 doctor-0.055
 parent-0.052
 wizard-0.052
 vet-0.052
Token even
Token ID772
Feature activation+2.07e-02

Pos logit contributions
rated+0.110
fy+0.105
cluded+0.102
iling+0.100
heit+0.100
Neg logit contributions
 doctor-0.078
 woman-0.077
 parent-0.073
 wizard-0.072
 mouse-0.072
Token more
Token ID517
Feature activation+6.47e-03

Pos logit contributions
rated+0.042
fy+0.040
cluded+0.039
|+0.038
iling+0.037
Neg logit contributions
 woman-0.033
 doctor-0.033
 parent-0.031
 vet-0.031
 wizard-0.031
Token famous
Token ID5863
Feature activation-8.58e-03

Pos logit contributions
rated+0.013
fy+0.012
|+0.012
estival+0.012
cluded+0.012
Neg logit contributions
 doctor-0.011
 woman-0.011
 wizard-0.011
 fairy-0.010
 parent-0.010
Token!"
Token ID2474
Feature activation-6.90e-03

Pos logit contributions
 doctor+0.017
 woman+0.017
 wizard+0.016
 mouse+0.016
 fairy+0.016
Neg logit contributions
rated-0.015
fy-0.014
cluded-0.014
|-0.013
rene-0.013
Token The
Token ID383
Feature activation-7.13e-02

Pos logit contributions
 doctor+0.011
 woman+0.011
 parent+0.011
 mechanic+0.011
 vet+0.011
Neg logit contributions
fy-0.013
rated-0.013
|-0.012
cluded-0.012
irm-0.012
Token fairy
Token ID25607
Feature activation-1.39e-01

Pos logit contributions
 woman+0.060
 vet+0.060
 doctor+0.059
 mechanic+0.058
 parent+0.055
Neg logit contributions
rated-0.172
fy-0.170
|-0.165
ils-0.164
irl-0.161
Token and
Token ID290
Feature activation-1.06e-02

Pos logit contributions
 doctor+0.263
 vet+0.255
 mechanic+0.252
 woman+0.251
 parent+0.246
Neg logit contributions
fy-0.217
rated-0.215
ils-0.197
cluded-0.197
|-0.195
Token the
Token ID262
Feature activation-3.54e-02

Pos logit contributions
 doctor+0.020
 mechanic+0.019
 vet+0.019
 woman+0.018
 parent+0.018
Neg logit contributions
rated-0.016
|-0.015
 matters-0.015
ils-0.015
heit-0.015
Token princess
Token ID21752
Feature activation-7.72e-02

Pos logit contributions
 vet+0.031
 doctor+0.031
 mechanic+0.030
 woman+0.030
 parent+0.029
Neg logit contributions
rated-0.084
fy-0.084
|-0.081
ils-0.080
cluded-0.080
Token laughed
Token ID13818
Feature activation-4.71e-03

Pos logit contributions
 doctor+0.153
 woman+0.151
 vet+0.145
 wizard+0.144
 parent+0.144
Neg logit contributions
rated-0.125
fy-0.121
cluded-0.114
|-0.111
iling-0.109
Token and
Token ID290
Feature activation-6.67e-03

Pos logit contributions
 woman+0.007
 doctor+0.007
 mouse+0.006
 parent+0.006
 fairy+0.006
Neg logit contributions
rated-0.011
fy-0.010
cluded-0.010
|-0.010
heit-0.010
Token real
Token ID1103
Feature activation-6.22e-04

Pos logit contributions
 woman+0.017
 doctor+0.016
 parent+0.016
 wizard+0.016
 mechanic+0.016
Neg logit contributions
rated-0.022
fy-0.022
|-0.021
cluded-0.021
ils-0.020
Token world
Token ID995
Feature activation+3.82e-02

Pos logit contributions
 woman+0.001
 doctor+0.001
 parent+0.001
 vet+0.001
 mechanic+0.001
Neg logit contributions
rated-0.002
fy-0.001
cluded-0.001
|-0.001
iling-0.001
Token,"
Token ID553
Feature activation-3.65e-02

Pos logit contributions
rated+0.062
heit+0.057
cluded+0.056
irm+0.055
fy+0.055
Neg logit contributions
 doctor-0.078
 woman-0.076
 mouse-0.075
 fairy-0.073
 mechanic-0.072
Token explained
Token ID4893
Feature activation-6.34e-02

Pos logit contributions
 doctor+0.063
 parent+0.062
 woman+0.062
 mechanic+0.061
 wizard+0.061
Neg logit contributions
fy-0.059
rated-0.058
 matters-0.058
|-0.056
cluded-0.056
Token the
Token ID262
Feature activation-4.18e-02

Pos logit contributions
 woman+0.130
 doctor+0.130
 parent+0.127
 wizard+0.127
 vet+0.126
Neg logit contributions
fy-0.085
rated-0.084
cluded-0.078
irm-0.077
ils-0.077
Token fairy
Token ID25607
Feature activation-1.39e-01

Pos logit contributions
 woman+0.026
 vet+0.026
 mechanic+0.025
 conductor+0.023
 doctor+0.023
Neg logit contributions
rated-0.105
 matters-0.102
ils-0.100
cluded-0.100
irl-0.097
Token.
Token ID13
Feature activation-4.87e-03

Pos logit contributions
 doctor+0.272
 woman+0.271
 parent+0.258
 vet+0.257
 wizard+0.257
Neg logit contributions
rated-0.231
fy-0.222
cluded-0.212
|-0.205
iling-0.202
Token
Token ID198
Feature activation-2.94e-02

Pos logit contributions
 parent+0.009
 woman+0.009
 wizard+0.009
 painter+0.009
 mechanic+0.008
Neg logit contributions
|-0.007
 matters-0.007
irm-0.006
estival-0.006
fy-0.006
Token
Token ID198
Feature activation-3.25e-02

Pos logit contributions
 woman+0.048
 wizard+0.048
 doctor+0.046
 parent+0.046
 mouse+0.046
Neg logit contributions
|-0.053
fy-0.051
cluded-0.050
irl-0.050
rated-0.050
TokenL
Token ID43
Feature activation+4.39e-02

Pos logit contributions
 doctor+0.053
 woman+0.053
 wizard+0.052
 parent+0.052
 mouse+0.050
Neg logit contributions
fy-0.060
rated-0.057
cluded-0.057
|-0.056
iling-0.055
Tokenily
Token ID813
Feature activation-1.44e-02

Pos logit contributions
rated+0.073
fy+0.069
cluded+0.067
|+0.065
iling+0.064
Neg logit contributions
 doctor-0.087
 woman-0.087
 parent-0.083
 vet-0.082
 wizard-0.082
Token a
Token ID257
Feature activation-5.32e-02

Pos logit contributions
 doctor+0.065
 woman+0.064
 parent+0.060
 vet+0.060
 wizard+0.059
Neg logit contributions
rated-0.113
fy-0.110
cluded-0.107
|-0.104
heit-0.103
Token 3
Token ID513
Feature activation-6.97e-02

Pos logit contributions
 doctor+0.070
 woman+0.067
 parent+0.067
 vet+0.067
 wizard+0.065
Neg logit contributions
fy-0.111
rated-0.110
cluded-0.105
heit-0.100
irl-0.099
Token-
Token ID12
Feature activation-4.13e-02

Pos logit contributions
 woman+0.155
 doctor+0.154
 wizard+0.147
 parent+0.147
 vet+0.146
Neg logit contributions
rated-0.097
fy-0.093
cluded-0.088
|-0.084
irm-0.082
Tokenyear
Token ID1941
Feature activation-3.10e-04

Pos logit contributions
 doctor+0.085
 woman+0.084
 parent+0.083
 mouse+0.081
 vet+0.081
Neg logit contributions
rated-0.063
fy-0.063
cluded-0.060
rene-0.057
|-0.056
Token-
Token ID12
Feature activation-3.15e-02

Pos logit contributions
 doctor+0.001
 woman+0.001
 parent+0.001
 vet+0.001
 mouse+0.001
Neg logit contributions
rated-0.001
fy-0.001
cluded-0.001
iling-0.000
|-0.000
Tokenold
Token ID727
Feature activation-1.39e-01

Pos logit contributions
 doctor+0.058
 woman+0.057
 parent+0.056
 mouse+0.055
 vet+0.054
Neg logit contributions
fy-0.055
rated-0.055
cluded-0.053
rene-0.051
iling-0.050
Token boy
Token ID2933
Feature activation-6.99e-02

Pos logit contributions
 doctor+0.159
 vet+0.151
 woman+0.150
 parent+0.146
 wizard+0.140
Neg logit contributions
rated-0.329
fy-0.322
cluded-0.313
|-0.300
iling-0.297
Token.
Token ID13
Feature activation-9.98e-03

Pos logit contributions
 doctor+0.112
 woman+0.112
 wizard+0.104
 parent+0.104
 mouse+0.103
Neg logit contributions
rated-0.147
fy-0.139
cluded-0.134
|-0.130
heit-0.130
Token He
Token ID679
Feature activation-1.24e-02

Pos logit contributions
 doctor+0.015
 woman+0.015
 parent+0.014
 vet+0.014
 wizard+0.014
Neg logit contributions
rated-0.021
fy-0.020
cluded-0.019
|-0.019
estival-0.019
Token was
Token ID373
Feature activation-3.54e-02

Pos logit contributions
 woman+0.020
 doctor+0.020
 parent+0.019
 wizard+0.019
 vet+0.019
Neg logit contributions
rated-0.025
fy-0.024
cluded-0.023
|-0.022
iling-0.022
Token sitting
Token ID5586
Feature activation+1.44e-02

Pos logit contributions
 doctor+0.058
 woman+0.058
 parent+0.055
 vet+0.055
 mouse+0.055
Neg logit contributions
rated-0.068
fy-0.066
cluded-0.064
|-0.062
iling-0.061
Tokenge
Token ID469
Feature activation+1.30e-02

Pos logit contributions
rated+0.069
fy+0.066
cluded+0.061
|+0.058
rene+0.057
Neg logit contributions
 doctor-0.113
 woman-0.112
 vet-0.108
 wizard-0.107
 mouse-0.107
Token.
Token ID13
Feature activation+1.24e-02

Pos logit contributions
rated+0.027
cluded+0.027
|+0.027
heit+0.027
irl+0.027
Neg logit contributions
 doctor-0.019
 mouse-0.019
 woman-0.019
 parent-0.018
 fairy-0.018
Token
Token ID198
Feature activation-1.76e-02

Pos logit contributions
rated+0.020
fy+0.019
|+0.018
cluded+0.018
estival+0.018
Neg logit contributions
 woman-0.024
 wizard-0.023
 doctor-0.023
 parent-0.023
 mechanic-0.022
Token
Token ID198
Feature activation-2.22e-02

Pos logit contributions
 woman+0.027
 doctor+0.026
 wizard+0.026
 parent+0.026
 mouse+0.025
Neg logit contributions
|-0.034
rated-0.034
fy-0.033
cluded-0.033
rene-0.032
TokenThe
Token ID464
Feature activation-4.63e-02

Pos logit contributions
 doctor+0.030
 woman+0.029
 vet+0.027
 parent+0.027
 mouse+0.027
Neg logit contributions
rated-0.052
fy-0.050
cluded-0.048
|-0.047
heit-0.047
Token artist
Token ID6802
Feature activation-1.38e-01

Pos logit contributions
 vet+0.075
 paramed+0.066
 blood+0.065
 woman+0.064
 doctor+0.064
Neg logit contributions
rated-0.080
ils-0.074
 matters-0.074
cluded-0.072
fy-0.070
Token decided
Token ID3066
Feature activation+1.02e-02

Pos logit contributions
 doctor+0.261
 woman+0.260
 vet+0.253
 parent+0.248
 wizard+0.248
Neg logit contributions
rated-0.228
fy-0.214
cluded-0.204
ils-0.202
iling-0.195
Token to
Token ID284
Feature activation-3.26e-04

Pos logit contributions
rated+0.018
fy+0.018
cluded+0.017
estival+0.016
ils+0.016
Neg logit contributions
 doctor-0.017
 woman-0.017
 parent-0.017
 mouse-0.016
 wizard-0.016
Token knock
Token ID10643
Feature activation-1.43e-03

Pos logit contributions
 doctor+0.001
 woman+0.001
 parent+0.001
 vet+0.001
 wizard+0.001
Neg logit contributions
rated-0.001
fy-0.001
cluded-0.001
|-0.000
iling-0.000
Token on
Token ID319
Feature activation+4.98e-02

Pos logit contributions
 doctor+0.003
 woman+0.003
 wizard+0.003
 mouse+0.003
 parent+0.003
Neg logit contributions
rated-0.003
fy-0.002
cluded-0.002
|-0.002
iling-0.002
Token the
Token ID262
Feature activation-1.31e-02

Pos logit contributions
rated+0.102
fy+0.099
cluded+0.095
|+0.092
iling+0.092
Neg logit contributions
 doctor-0.079
 woman-0.079
 parent-0.074
 vet-0.074
 wizard-0.073
Token girl
Token ID2576
Feature activation-9.40e-02

Pos logit contributions
 vet+0.088
 doctor+0.087
 conductor+0.078
 parent+0.077
 mechanic+0.076
Neg logit contributions
rated-0.155
fy-0.151
cluded-0.142
ishy-0.141
ils-0.139
Token smiled
Token ID13541
Feature activation-5.73e-02

Pos logit contributions
 doctor+0.175
 woman+0.172
 vet+0.169
 wizard+0.167
 parent+0.165
Neg logit contributions
rated-0.157
fy-0.156
ils-0.143
cluded-0.142
irl-0.140
Token and
Token ID290
Feature activation-1.24e-02

Pos logit contributions
 doctor+0.098
 woman+0.097
 vet+0.095
 wizard+0.092
 mechanic+0.091
Neg logit contributions
rated-0.104
fy-0.102
cluded-0.096
irm-0.095
rene-0.092
Token showed
Token ID3751
Feature activation-5.39e-02

Pos logit contributions
 doctor+0.023
 woman+0.022
 vet+0.022
 mechanic+0.022
 parent+0.022
Neg logit contributions
rated-0.021
fy-0.020
cluded-0.019
irm-0.019
ils-0.019
Token the
Token ID262
Feature activation-2.41e-02

Pos logit contributions
 woman+0.120
 vet+0.120
 doctor+0.119
 mechanic+0.118
 parent+0.116
Neg logit contributions
ils-0.060
 matters-0.059
fy-0.054
ag-0.054
cluded-0.053
Token artist
Token ID6802
Feature activation-1.38e-01

Pos logit contributions
 vet+0.041
 doctor+0.036
 paramed+0.033
 mechanic+0.032
 Owl+0.032
Neg logit contributions
rated-0.044
ets-0.040
fy-0.040
cluded-0.040
ils-0.040
Token her
Token ID607
Feature activation+1.84e-02

Pos logit contributions
 doctor+0.233
 woman+0.231
 vet+0.220
 parent+0.218
 wizard+0.217
Neg logit contributions
rated-0.264
fy-0.256
cluded-0.247
|-0.237
iling-0.235
Token painting
Token ID12036
Feature activation-5.70e-02

Pos logit contributions
rated+0.038
fy+0.037
cluded+0.035
irl+0.034
ils+0.034
Neg logit contributions
 doctor-0.028
 woman-0.028
 vet-0.027
 parent-0.026
 wizard-0.026
Token.
Token ID13
Feature activation-1.74e-03

Pos logit contributions
 doctor+0.112
 woman+0.112
 vet+0.107
 parent+0.107
 wizard+0.106
Neg logit contributions
rated-0.092
fy-0.088
cluded-0.085
|-0.080
rene-0.079
Token He
Token ID679
Feature activation-1.47e-02

Pos logit contributions
 woman+0.003
 doctor+0.003
 wizard+0.003
 parent+0.003
 fairy+0.003
Neg logit contributions
fy-0.002
rated-0.002
|-0.002
estival-0.002
cluded-0.002
Token was
Token ID373
Feature activation-2.92e-02

Pos logit contributions
 doctor+0.028
 woman+0.027
 vet+0.027
 wizard+0.027
 mouse+0.026
Neg logit contributions
fy-0.023
rated-0.023
ils-0.022
irl-0.022
cluded-0.021
Token so
Token ID523
Feature activation+6.34e-02

Pos logit contributions
rated+0.118
cluded+0.109
iling+0.109
fy+0.107
estival+0.107
Neg logit contributions
 doctor-0.123
 mouse-0.120
 parent-0.120
 woman-0.119
 fairy-0.119
Token well
Token ID880
Feature activation+4.82e-02

Pos logit contributions
rated+0.128
estival+0.122
cluded+0.120
iling+0.120
wed+0.118
Neg logit contributions
 doctor-0.099
 fairy-0.095
 mouse-0.094
 woman-0.094
 parent-0.093
Token!"
Token ID2474
Feature activation-1.20e-02

Pos logit contributions
rated+0.086
cluded+0.084
estival+0.084
|+0.084
rene+0.080
Neg logit contributions
 doctor-0.084
 woman-0.084
 fairy-0.082
 wizard-0.081
 mouse-0.081
Token said
Token ID531
Feature activation-5.23e-02

Pos logit contributions
 doctor+0.021
 woman+0.020
 mechanic+0.020
 vet+0.020
 parent+0.020
Neg logit contributions
fy-0.021
rated-0.021
cluded-0.020
 matters-0.020
|-0.020
Token the
Token ID262
Feature activation-4.99e-02

Pos logit contributions
 doctor+0.096
 woman+0.094
 vet+0.094
 mechanic+0.092
 parent+0.089
Neg logit contributions
rated-0.082
fy-0.080
 matters-0.077
|-0.075
cluded-0.075
Token fairy
Token ID25607
Feature activation-1.38e-01

Pos logit contributions
 mechanic+0.027
 woman+0.026
 vet+0.026
 doctor+0.025
 conductor+0.023
Neg logit contributions
rated-0.129
fy-0.124
irl-0.123
ils-0.123
 matters-0.122
Token.
Token ID13
Feature activation-1.49e-02

Pos logit contributions
 doctor+0.270
 woman+0.266
 vet+0.259
 parent+0.255
 mechanic+0.253
Neg logit contributions
rated-0.225
fy-0.219
cluded-0.206
|-0.195
rene-0.195
Token "
Token ID366
Feature activation-6.03e-02

Pos logit contributions
 woman+0.023
 doctor+0.023
 parent+0.023
 mechanic+0.022
 conductor+0.022
Neg logit contributions
fy-0.027
rated-0.026
|-0.026
irm-0.026
cluded-0.026
TokenIt
Token ID1026
Feature activation-1.66e-02

Pos logit contributions
 doctor+0.106
 woman+0.105
 parent+0.101
 vet+0.099
 mouse+0.099
Neg logit contributions
rated-0.111
fy-0.108
cluded-0.104
|-0.102
rene-0.101
Token looks
Token ID3073
Feature activation+2.06e-02

Pos logit contributions
 doctor+0.032
 woman+0.031
 mouse+0.030
 parent+0.030
 wizard+0.030
Neg logit contributions
rated-0.028
fy-0.027
cluded-0.026
|-0.025
estival-0.025
Token so
Token ID523
Feature activation+4.43e-02

Pos logit contributions
iling+0.044
rated+0.044
|+0.043
estival+0.042
cluded+0.041
Neg logit contributions
 doctor-0.028
 fairy-0.027
 woman-0.026
 mouse-0.026
 wizard-0.026
Token,
Token ID11
Feature activation-2.72e-02

Pos logit contributions
rated+0.069
fy+0.065
wy+0.064
iling+0.064
cluded+0.064
Neg logit contributions
 doctor-0.057
 mouse-0.054
 woman-0.054
 fairy-0.054
 parent-0.054
Token magic
Token ID5536
Feature activation-3.07e-02

Pos logit contributions
 woman+0.048
 doctor+0.048
 vet+0.046
 wizard+0.045
 parent+0.045
Neg logit contributions
rated-0.047
fy-0.046
cluded-0.043
|-0.042
 matters-0.042
Token fairy
Token ID25607
Feature activation-1.11e-01

Pos logit contributions
 doctor+0.063
 woman+0.060
 wizard+0.060
 mouse+0.059
 fairy+0.059
Neg logit contributions
rated-0.052
fy-0.049
|-0.048
cluded-0.048
rene-0.046
Token!"
Token ID2474
Feature activation-6.32e-03

Pos logit contributions
 doctor+0.240
 woman+0.238
 mouse+0.229
 wizard+0.229
 parent+0.227
Neg logit contributions
rated-0.166
fy-0.155
cluded-0.149
|-0.145
heit-0.144
Token The
Token ID383
Feature activation-7.11e-02

Pos logit contributions
 doctor+0.011
 woman+0.010
 wizard+0.010
 vet+0.010
 mechanic+0.010
Neg logit contributions
rated-0.011
fy-0.011
|-0.011
cluded-0.011
irm-0.010
Token fairy
Token ID25607
Feature activation-1.38e-01

Pos logit contributions
 woman+0.085
 parent+0.082
 doctor+0.081
 vet+0.078
 conductor+0.077
Neg logit contributions
rated-0.149
fy-0.145
cluded-0.141
 matters-0.139
irl-0.139
Token smiled
Token ID13541
Feature activation-6.06e-02

Pos logit contributions
 doctor+0.254
 woman+0.246
 vet+0.244
 mechanic+0.241
 parent+0.237
Neg logit contributions
rated-0.230
fy-0.228
cluded-0.213
ils-0.207
irl-0.206
Token back
Token ID736
Feature activation+5.92e-03

Pos logit contributions
 doctor+0.104
 woman+0.102
 vet+0.100
 mechanic+0.098
 parent+0.096
Neg logit contributions
rated-0.112
fy-0.109
cluded-0.103
irm-0.102
rene-0.100
Token and
Token ID290
Feature activation-1.48e-02

Pos logit contributions
rated+0.014
fy+0.013
cluded+0.013
|+0.013
rene+0.012
Neg logit contributions
 woman-0.008
 doctor-0.008
 parent-0.008
 mouse-0.007
 wizard-0.007
Token said
Token ID531
Feature activation-5.21e-02

Pos logit contributions
 doctor+0.027
 woman+0.026
 vet+0.025
 mechanic+0.025
 wizard+0.025
Neg logit contributions
rated-0.026
fy-0.026
cluded-0.024
ils-0.023
 matters-0.023
Token,
Token ID11
Feature activation-4.70e-02

Pos logit contributions
 doctor+0.089
 woman+0.088
 wizard+0.083
 parent+0.083
 mouse+0.083
Neg logit contributions
rated-0.102
fy-0.098
cluded-0.095
|-0.093
iling-0.091
Token "
Token ID366
Feature activation-6.20e-02

Pos logit contributions
 doctor+0.025
 vet+0.025
 woman+0.024
 mouse+0.024
 wizard+0.024
Neg logit contributions
fy-0.021
rated-0.021
cluded-0.020
estival-0.020
|-0.020
TokenHello
Token ID15496
Feature activation-4.22e-03

Pos logit contributions
 doctor+0.120
 woman+0.113
 parent+0.112
 vet+0.110
 wizard+0.110
Neg logit contributions
rated-0.105
fy-0.100
|-0.099
cluded-0.097
rene-0.095
Token,"
Token ID553
Feature activation-3.70e-02

Pos logit contributions
 doctor+0.007
 woman+0.007
 vet+0.007
 parent+0.007
 wizard+0.007
Neg logit contributions
rated-0.008
fy-0.007
cluded-0.007
|-0.007
iling-0.007
Token said
Token ID531
Feature activation-5.04e-02

Pos logit contributions
 doctor+0.062
 vet+0.061
 woman+0.059
 parent+0.058
 mouse+0.058
Neg logit contributions
rated-0.066
fy-0.065
 matters-0.062
cluded-0.062
heit-0.060
Token the
Token ID262
Feature activation-4.15e-02

Pos logit contributions
 vet+0.094
 doctor+0.090
 woman+0.089
 mouse+0.087
 owl+0.087
Neg logit contributions
 matters-0.073
rated-0.073
fy-0.072
cluded-0.069
€-0.066
Token fairy
Token ID25607
Feature activation-1.37e-01

Pos logit contributions
 vet+0.024
 mechanic+0.024
 woman+0.022
 doctor+0.022
 paramed+0.022
Neg logit contributions
rated-0.100
ils-0.097
 matters-0.096
fy-0.096
|-0.096
Token.
Token ID13
Feature activation-7.53e-03

Pos logit contributions
 doctor+0.292
 woman+0.291
 vet+0.282
 parent+0.279
 mechanic+0.275
Neg logit contributions
rated-0.200
fy-0.191
cluded-0.181
|-0.170
rene-0.168
Token
Token ID220
Feature activation+2.71e-02

Pos logit contributions
 vet+0.010
 doctor+0.010
 parent+0.010
 mouse+0.010
 owl+0.009
Neg logit contributions
fy-0.014
€-0.014
estival-0.014
rated-0.013
 matters-0.013
Token
Token ID198
Feature activation-3.15e-02

Pos logit contributions
rated+0.058
fy+0.056
cluded+0.055
|+0.055
 matters+0.054
Neg logit contributions
 woman-0.038
 doctor-0.037
 wizard-0.035
 parent-0.035
 mouse-0.034
Token
Token ID198
Feature activation-3.53e-02

Pos logit contributions
 mouse+0.052
 doctor+0.051
 woman+0.050
 parent+0.050
 vet+0.049
Neg logit contributions
fy-0.056
cluded-0.055
irl-0.055
|-0.054
rated-0.054
TokenJ
Token ID41
Feature activation+3.89e-02

Pos logit contributions
 doctor+0.061
 mouse+0.059
 woman+0.059
 vet+0.058
 parent+0.058
Neg logit contributions
fy-0.064
rated-0.063
cluded-0.062
iling-0.059
rene-0.058
Token "
Token ID366
Feature activation-6.30e-02

Pos logit contributions
 vet+0.048
 doctor+0.046
 parent+0.046
 woman+0.046
 mouse+0.044
Neg logit contributions
rated-0.089
fy-0.084
estival-0.083
cluded-0.082
heit-0.082
TokenThat
Token ID2504
Feature activation+3.15e-02

Pos logit contributions
 doctor+0.113
 parent+0.108
 woman+0.108
 wizard+0.105
 mouse+0.104
Neg logit contributions
rated-0.113
fy-0.111
cluded-0.107
|-0.106
rene-0.103
Token monkey
Token ID21657
Feature activation-7.15e-02

Pos logit contributions
rated+0.058
fy+0.056
cluded+0.053
|+0.052
iling+0.051
Neg logit contributions
 doctor-0.057
 woman-0.056
 parent-0.054
 vet-0.053
 wizard-0.053
Token is
Token ID318
Feature activation-1.71e-02

Pos logit contributions
 doctor+0.131
 mouse+0.131
 fairy+0.127
 woman+0.126
 wizard+0.125
Neg logit contributions
rated-0.135
|-0.124
cluded-0.123
fy-0.123
irm-0.120
Token an
Token ID281
Feature activation-3.72e-03

Pos logit contributions
 doctor+0.027
 woman+0.027
 wizard+0.026
 parent+0.025
 mouse+0.025
Neg logit contributions
rated-0.036
fy-0.034
cluded-0.033
|-0.032
iling-0.032
Token artist
Token ID6802
Feature activation-1.37e-01

Pos logit contributions
 doctor+0.004
 woman+0.004
 wizard+0.004
 mouse+0.004
 vet+0.004
Neg logit contributions
rated-0.009
fy-0.009
cluded-0.009
|-0.009
irl-0.009
Token.
Token ID13
Feature activation-2.44e-03

Pos logit contributions
 doctor+0.277
 woman+0.276
 vet+0.264
 parent+0.264
 wizard+0.261
Neg logit contributions
rated-0.217
fy-0.208
cluded-0.199
|-0.190
iling-0.189
Token He
Token ID679
Feature activation-6.24e-03

Pos logit contributions
 woman+0.005
 doctor+0.005
 parent+0.005
 wizard+0.004
 vet+0.004
Neg logit contributions
rated-0.004
fy-0.004
cluded-0.004
|-0.004
estival-0.003
Token is
Token ID318
Feature activation-2.12e-02

Pos logit contributions
 doctor+0.013
 woman+0.012
 wizard+0.012
 mouse+0.012
 parent+0.012
Neg logit contributions
rated-0.010
fy-0.010
cluded-0.009
|-0.009
estival-0.009
Token very
Token ID845
Feature activation+3.75e-02

Pos logit contributions
 woman+0.036
 doctor+0.036
 vet+0.034
 parent+0.034
 mouse+0.034
Neg logit contributions
rated-0.038
fy-0.037
cluded-0.036
iling-0.035
irl-0.035
Token careless
Token ID36138
Feature activation-3.21e-02

Pos logit contributions
rated+0.081
fy+0.078
cluded+0.075
|+0.074
iling+0.073
Neg logit contributions
 doctor-0.056
 woman-0.055
 parent-0.052
 wizard-0.052
 vet-0.052
Token
Token ID198
Feature activation-3.92e-02

Pos logit contributions
 woman+0.051
 doctor+0.050
 wizard+0.050
 mouse+0.049
 fairy+0.048
Neg logit contributions
rated-0.067
|-0.066
fy-0.065
cluded-0.065
rene-0.064
TokenSuddenly
Token ID38582
Feature activation-4.80e-02

Pos logit contributions
 doctor+0.065
 woman+0.065
 parent+0.062
 wizard+0.061
 mouse+0.061
Neg logit contributions
rated-0.077
fy-0.074
cluded-0.071
|-0.069
iling-0.069
Token,
Token ID11
Feature activation-3.20e-02

Pos logit contributions
 doctor+0.083
 woman+0.083
 parent+0.078
 vet+0.078
 mouse+0.078
Neg logit contributions
rated-0.092
fy-0.088
cluded-0.085
|-0.083
iling-0.082
Token a
Token ID257
Feature activation-5.86e-02

Pos logit contributions
 doctor+0.055
 woman+0.055
 wizard+0.053
 vet+0.053
 parent+0.053
Neg logit contributions
rated-0.058
fy-0.055
cluded-0.053
heit-0.052
|-0.052
Token friendly
Token ID8030
Feature activation-3.58e-02

Pos logit contributions
 parent+0.076
 doctor+0.075
 vet+0.074
 wizard+0.073
 grandmother+0.069
Neg logit contributions
fy-0.107
cluded-0.100
rated-0.100
heit-0.094
irl-0.090
Token fairy
Token ID25607
Feature activation-1.37e-01

Pos logit contributions
 doctor+0.034
 parent+0.033
 vet+0.033
 woman+0.031
 wizard+0.030
Neg logit contributions
rated-0.083
cluded-0.082
fy-0.081
|-0.078
irm-0.076
Token flew
Token ID13112
Feature activation+1.81e-02

Pos logit contributions
 doctor+0.241
 vet+0.228
 mechanic+0.224
 parent+0.222
 woman+0.222
Neg logit contributions
rated-0.235
fy-0.229
ils-0.225
cluded-0.222
|-0.219
Token into
Token ID656
Feature activation+1.56e-02

Pos logit contributions
rated+0.026
fy+0.025
cluded+0.024
iling+0.023
|+0.023
Neg logit contributions
 woman-0.040
 doctor-0.039
 parent-0.038
 mouse-0.037
 vet-0.037
Token the
Token ID262
Feature activation-2.45e-02

Pos logit contributions
|+0.019
rated+0.017
igators+0.017
 matters+0.017
estival+0.017
Neg logit contributions
 doctor-0.034
 mechanic-0.032
 vet-0.032
 wizard-0.032
 parent-0.032
Token park
Token ID3952
Feature activation+5.59e-03

Pos logit contributions
 doctor+0.034
 woman+0.033
 parent+0.031
 vet+0.031
 wizard+0.031
Neg logit contributions
rated-0.055
fy-0.054
cluded-0.052
|-0.051
iling-0.050
Token.
Token ID13
Feature activation-1.10e-02

Pos logit contributions
rated+0.010
fy+0.009
cluded+0.009
|+0.009
iling+0.009
Neg logit contributions
 doctor-0.010
 woman-0.010
 parent-0.010
 vet-0.010
 wizard-0.010
Token up
Token ID510
Feature activation-2.31e-02

Pos logit contributions
 woman+0.025
 doctor+0.024
 vet+0.023
 parent+0.023
 mouse+0.023
Neg logit contributions
rated-0.019
fy-0.018
cluded-0.017
irm-0.016
rene-0.016
Token to
Token ID284
Feature activation-1.77e-02

Pos logit contributions
 doctor+0.039
 woman+0.038
 wizard+0.036
 parent+0.036
 fairy+0.035
Neg logit contributions
rated-0.045
fy-0.043
cluded-0.042
|-0.042
heit-0.041
Token be
Token ID307
Feature activation+1.70e-02

Pos logit contributions
 doctor+0.032
 woman+0.032
 vet+0.031
 mouse+0.031
 parent+0.031
Neg logit contributions
rated-0.030
fy-0.028
cluded-0.028
irl-0.027
|-0.027
Token a
Token ID257
Feature activation-4.79e-02

Pos logit contributions
rated+0.033
fy+0.032
cluded+0.030
|+0.029
iling+0.029
Neg logit contributions
 doctor-0.029
 woman-0.029
 parent-0.027
 vet-0.027
 wizard-0.027
Token wealthy
Token ID11574
Feature activation-2.02e-02

Pos logit contributions
 owl+0.064
 blood+0.064
 woman+0.063
 vet+0.063
 parent+0.063
Neg logit contributions
fy-0.092
rated-0.083
cluded-0.079
 Saturday-0.074
irl-0.073
Token artist
Token ID6802
Feature activation-1.37e-01

Pos logit contributions
 blood+0.017
 vet+0.017
 doctor+0.016
 conductor+0.016
 parent+0.015
Neg logit contributions
fy-0.047
rated-0.044
cluded-0.042
gers-0.040
 Saturday-0.040
Token.
Token ID13
Feature activation-5.62e-03

Pos logit contributions
 doctor+0.277
 woman+0.275
 vet+0.268
 parent+0.263
 wizard+0.262
Neg logit contributions
rated-0.208
fy-0.201
cluded-0.190
ils-0.182
iling-0.181
Token She
Token ID1375
Feature activation-1.53e-02

Pos logit contributions
 woman+0.011
 doctor+0.011
 wizard+0.010
 parent+0.010
 mouse+0.010
Neg logit contributions
rated-0.009
fy-0.009
cluded-0.009
|-0.009
estival-0.008
Token sold
Token ID2702
Feature activation-7.58e-03

Pos logit contributions
 doctor+0.028
 woman+0.028
 vet+0.026
 parent+0.026
 wizard+0.026
Neg logit contributions
rated-0.027
fy-0.026
cluded-0.025
|-0.024
iling-0.024
Token her
Token ID607
Feature activation+1.65e-02

Pos logit contributions
 doctor+0.015
 woman+0.015
 vet+0.015
 parent+0.014
 wizard+0.014
Neg logit contributions
fy-0.011
ils-0.011
rated-0.011
 matters-0.010
cluded-0.010
Token pictures
Token ID5986
Feature activation-5.75e-02

Pos logit contributions
rated+0.031
fy+0.031
cluded+0.029
 matters+0.029
irl+0.029
Neg logit contributions
 doctor-0.026
 woman-0.026
 vet-0.026
 wizard-0.024
 mechanic-0.024
Token there
Token ID612
Feature activation+3.06e-02

Pos logit contributions
rated+0.055
fy+0.053
cluded+0.051
|+0.049
iling+0.049
Neg logit contributions
 doctor-0.054
 woman-0.054
 parent-0.051
 vet-0.051
 wizard-0.051
Token was
Token ID373
Feature activation+9.22e-03

Pos logit contributions
rated+0.059
fy+0.055
|+0.054
cluded+0.054
iling+0.053
Neg logit contributions
 woman-0.051
 doctor-0.050
 wizard-0.048
 vet-0.048
 parent-0.048
Token a
Token ID257
Feature activation-1.82e-02

Pos logit contributions
rated+0.021
fy+0.020
cluded+0.020
|+0.019
iling+0.019
Neg logit contributions
 doctor-0.013
 woman-0.012
 vet-0.012
 parent-0.012
 mouse-0.012
Token dog
Token ID3290
Feature activation-3.34e-02

Pos logit contributions
 vet+0.019
 doctor+0.019
 wizard+0.018
 conductor+0.018
 woman+0.017
Neg logit contributions
fy-0.039
irl-0.038
rated-0.037
 tim-0.036
cluded-0.036
Token named
Token ID3706
Feature activation-5.57e-03

Pos logit contributions
 woman+0.058
 doctor+0.057
 wizard+0.055
 parent+0.055
 vet+0.054
Neg logit contributions
fy-0.058
rated-0.057
|-0.056
cluded-0.055
iling-0.054
Token F
Token ID376
Feature activation+1.50e-01

Pos logit contributions
 doctor+0.008
 woman+0.008
 mouse+0.008
 vet+0.007
 parent+0.007
Neg logit contributions
rated-0.013
fy-0.012
cluded-0.012
iling-0.012
rene-0.012
Tokenido
Token ID17305
Feature activation+5.00e-02

Pos logit contributions
rated+0.220
heit+0.215
fy+0.208
ils+0.197
iling+0.181
Neg logit contributions
 doctor-0.321
 woman-0.315
 parent-0.305
 mechanic-0.304
 vet-0.303
Token.
Token ID13
Feature activation+3.36e-02

Pos logit contributions
rated+0.103
fy+0.096
cluded+0.094
heit+0.094
irm+0.090
Neg logit contributions
 doctor-0.087
 woman-0.086
 mouse-0.083
 parent-0.082
 vet-0.079
Token He
Token ID679
Feature activation+6.60e-03

Pos logit contributions
rated+0.072
ils+0.071
fy+0.067
iling+0.067
wy+0.067
Neg logit contributions
 doctor-0.051
 woman-0.050
 mouse-0.050
 vet-0.049
 parent-0.048
Token loved
Token ID6151
Feature activation-1.72e-02

Pos logit contributions
rated+0.014
fy+0.013
cluded+0.013
heit+0.013
ash+0.013
Neg logit contributions
 doctor-0.011
 woman-0.011
 mouse-0.010
 vet-0.010
 parent-0.010
Token to
Token ID284
Feature activation+2.21e-02

Pos logit contributions
 doctor+0.025
 woman+0.025
 parent+0.023
 vet+0.023
 mouse+0.023
Neg logit contributions
rated-0.039
fy-0.037
cluded-0.036
|-0.035
rene-0.035
Token in
Token ID287
Feature activation+1.03e-02

Pos logit contributions
 doctor+0.112
 woman+0.112
 parent+0.105
 wizard+0.105
 fairy+0.103
Neg logit contributions
rated-0.140
fy-0.136
|-0.134
cluded-0.131
iling-0.126
Token her
Token ID607
Feature activation+1.49e-02

Pos logit contributions
rated+0.023
fy+0.022
cluded+0.021
|+0.021
rene+0.020
Neg logit contributions
 doctor-0.016
 woman-0.015
 wizard-0.014
 parent-0.014
 mouse-0.014
Token heart
Token ID2612
Feature activation-2.62e-02

Pos logit contributions
rated+0.030
fy+0.029
cluded+0.028
|+0.027
iling+0.027
Neg logit contributions
 doctor-0.025
 woman-0.025
 parent-0.023
 wizard-0.023
 vet-0.023
Token.
Token ID13
Feature activation+6.01e-03

Pos logit contributions
 doctor+0.046
 woman+0.046
 parent+0.043
 wizard+0.043
 fairy+0.043
Neg logit contributions
rated-0.049
fy-0.047
cluded-0.046
|-0.045
heit-0.044
Token<|endoftext|>
Token ID50256
Feature activation+1.01e-01

Pos logit contributions
rated+0.013
fy+0.012
cluded+0.012
iling+0.012
ils+0.012
Neg logit contributions
 doctor-0.009
 woman-0.009
 vet-0.009
 parent-0.009
 mouse-0.008
TokenS
Token ID50
Feature activation+1.53e-01

Pos logit contributions
rated+0.264
fy+0.256
cluded+0.242
heit+0.241
ils+0.236
Neg logit contributions
 doctor-0.119
 woman-0.112
 mouse-0.108
 wizard-0.103
 vet-0.102
Tokenara
Token ID3301
Feature activation+9.75e-02

Pos logit contributions
rated+0.281
fy+0.267
cluded+0.257
irm+0.245
|+0.242
Neg logit contributions
 doctor-0.280
 woman-0.278
 vet-0.263
 parent-0.262
 wizard-0.261
Token and
Token ID290
Feature activation+5.64e-02

Pos logit contributions
rated+0.255
fy+0.252
cluded+0.243
heit+0.238
estival+0.237
Neg logit contributions
 woman-0.105
 doctor-0.102
 parent-0.091
 mouse-0.090
 wizard-0.090
Token Ben
Token ID3932
Feature activation+1.05e-01

Pos logit contributions
rated+0.127
fy+0.125
cluded+0.119
iling+0.115
ash+0.115
Neg logit contributions
 woman-0.089
 doctor-0.085
 parent-0.082
 mouse-0.079
 vet-0.077
Token like
Token ID588
Feature activation+1.24e-02

Pos logit contributions
rated+0.236
fy+0.231
cluded+0.221
estival+0.217
heit+0.212
Neg logit contributions
 woman-0.157
 doctor-0.155
 parent-0.142
 fairy-0.142
 bird-0.139
Token to
Token ID284
Feature activation+2.88e-02

Pos logit contributions
rated+0.029
fy+0.028
cluded+0.027
heit+0.027
iling+0.026
Neg logit contributions
 woman-0.017
 doctor-0.017
 mouse-0.016
 parent-0.016
 vet-0.016
Token about
Token ID546
Feature activation-2.91e-02

Pos logit contributions
 doctor+0.055
 woman+0.053
 vet+0.049
 parent+0.049
 mouse+0.048
Neg logit contributions
rated-0.059
fy-0.058
cluded-0.055
|-0.055
wy-0.054
Token the
Token ID262
Feature activation-9.11e-03

Pos logit contributions
 doctor+0.043
 woman+0.043
 vet+0.040
 parent+0.039
 mouse+0.039
Neg logit contributions
rated-0.065
fy-0.065
cluded-0.062
|-0.059
rene-0.059
Token hill
Token ID12788
Feature activation+5.14e-02

Pos logit contributions
 doctor+0.016
 woman+0.016
 parent+0.015
 vet+0.015
 wizard+0.015
Neg logit contributions
rated-0.017
fy-0.017
cluded-0.016
|-0.016
iling-0.015
Token.
Token ID13
Feature activation+2.56e-02

Pos logit contributions
rated+0.104
heit+0.100
cluded+0.097
fy+0.097
irl+0.093
Neg logit contributions
 doctor-0.087
 woman-0.084
 mouse-0.081
 vet-0.079
 fairy-0.077
Token<|endoftext|>
Token ID50256
Feature activation+1.02e-01

Pos logit contributions
rated+0.061
fy+0.058
cluded+0.056
ils+0.055
heit+0.054
Neg logit contributions
 doctor-0.036
 woman-0.035
 vet-0.034
 parent-0.032
 mouse-0.032
TokenS
Token ID50
Feature activation+1.62e-01

Pos logit contributions
rated+0.249
fy+0.241
heit+0.224
cluded+0.223
rene+0.219
Neg logit contributions
 doctor-0.145
 mouse-0.135
 woman-0.132
 wizard-0.126
 fairy-0.126
Tokenara
Token ID3301
Feature activation+9.09e-02

Pos logit contributions
rated+0.230
fy+0.221
cluded+0.209
|+0.201
iling+0.197
Neg logit contributions
 doctor-0.351
 woman-0.351
 parent-0.336
 vet-0.335
 wizard-0.335
Token and
Token ID290
Feature activation+5.10e-02

Pos logit contributions
rated+0.221
fy+0.213
heit+0.204
cluded+0.199
appointed+0.193
Neg logit contributions
 doctor-0.128
 woman-0.123
 painter-0.115
 wizard-0.112
 mouse-0.112
Token Tom
Token ID4186
Feature activation+9.28e-02

Pos logit contributions
rated+0.111
fy+0.107
cluded+0.101
iling+0.098
ash+0.097
Neg logit contributions
 woman-0.084
 doctor-0.082
 parent-0.078
 vet-0.077
 mouse-0.076
Token love
Token ID1842
Feature activation-3.18e-03

Pos logit contributions
rated+0.205
fy+0.192
cluded+0.183
estival+0.179
appointed+0.177
Neg logit contributions
 doctor-0.142
 woman-0.139
 bird-0.135
 painter-0.132
 fairy-0.128
Token pizza
Token ID14256
Feature activation+4.12e-02

Pos logit contributions
 doctor+0.004
 woman+0.004
 painter+0.004
 mouse+0.004
 parent+0.003
Neg logit contributions
rated-0.008
fy-0.008
rene-0.008
cluded-0.008
heit-0.008
Token in
Token ID287
Feature activation+5.86e-02

Pos logit contributions
rated+0.263
fy+0.252
cluded+0.240
rene+0.238
wy+0.233
Neg logit contributions
 doctor-0.186
 woman-0.175
 parent-0.172
 mouse-0.171
 painter-0.168
Token the
Token ID262
Feature activation+2.18e-03

Pos logit contributions
rated+0.096
fy+0.093
|+0.090
cluded+0.089
iling+0.087
Neg logit contributions
 woman-0.111
 doctor-0.111
 wizard-0.106
 vet-0.106
 parent-0.104
Token backyard
Token ID24296
Feature activation+6.88e-03

Pos logit contributions
rated+0.005
fy+0.005
cluded+0.005
heit+0.005
iling+0.004
Neg logit contributions
 doctor-0.003
 woman-0.003
 parent-0.003
 vet-0.003
 mouse-0.003
Token.
Token ID13
Feature activation+2.38e-02

Pos logit contributions
rated+0.010
fy+0.010
cluded+0.009
|+0.009
iling+0.009
Neg logit contributions
 woman-0.014
 doctor-0.014
 parent-0.014
 vet-0.014
 wizard-0.014
Token<|endoftext|>
Token ID50256
Feature activation+9.67e-02

Pos logit contributions
|+0.043
fy+0.042
rated+0.042
 matters+0.041
cluded+0.040
Neg logit contributions
 wizard-0.040
 woman-0.040
 doctor-0.040
 owl-0.037
 painter-0.037
TokenS
Token ID50
Feature activation+1.55e-01

Pos logit contributions
rated+0.222
fy+0.214
cluded+0.204
heit+0.200
rene+0.196
Neg logit contributions
 doctor-0.139
 woman-0.133
 mouse-0.127
 vet-0.126
 wizard-0.124
Tokenara
Token ID3301
Feature activation+8.54e-02

Pos logit contributions
fy+0.201
rated+0.197
cluded+0.187
|+0.183
estival+0.178
Neg logit contributions
 woman-0.348
 doctor-0.343
 wizard-0.335
 mouse-0.333
 parent-0.332
Token likes
Token ID7832
Feature activation-1.18e-02

Pos logit contributions
rated+0.210
fy+0.201
heit+0.200
cluded+0.198
estival+0.192
Neg logit contributions
 woman-0.110
 doctor-0.109
 mouse-0.101
 parent-0.097
 fairy-0.097
Token to
Token ID284
Feature activation+2.93e-02

Pos logit contributions
 woman+0.018
 doctor+0.018
 parent+0.017
 wizard+0.017
 vet+0.017
Neg logit contributions
rated-0.024
fy-0.023
cluded-0.022
|-0.022
estival-0.021
Token make
Token ID787
Feature activation-2.21e-02

Pos logit contributions
rated+0.055
fy+0.052
iling+0.048
cluded+0.047
heit+0.046
Neg logit contributions
 doctor-0.056
 woman-0.054
 parent-0.053
 wizard-0.052
 painter-0.052
Token things
Token ID1243
Feature activation+2.18e-02

Pos logit contributions
 woman+0.042
 doctor+0.040
 wizard+0.039
 parent+0.039
 mouse+0.038
Neg logit contributions
rated-0.036
irl-0.035
fy-0.035
ils-0.035
cluded-0.034
Token a
Token ID257
Feature activation-2.21e-02

Pos logit contributions
rated+0.018
fy+0.017
cluded+0.017
heit+0.016
iling+0.016
Neg logit contributions
 doctor-0.009
 woman-0.008
 mouse-0.008
 vet-0.008
 parent-0.008
Token family
Token ID1641
Feature activation-2.18e-02

Pos logit contributions
 doctor+0.022
 vet+0.022
 wizard+0.021
 conductor+0.021
 woman+0.021
Neg logit contributions
fy-0.049
rated-0.047
irl-0.046
cluded-0.045
 tim-0.044
Token who
Token ID508
Feature activation-4.40e-02

Pos logit contributions
 woman+0.035
 doctor+0.035
 vet+0.034
 wizard+0.033
 mouse+0.032
Neg logit contributions
rated-0.039
fy-0.039
iling-0.038
|-0.038
cluded-0.037
Token really
Token ID1107
Feature activation+8.44e-02

Pos logit contributions
 woman+0.061
 doctor+0.060
 mouse+0.058
 parent+0.058
 mechanic+0.055
Neg logit contributions
heit-0.108
rated-0.099
estival-0.096
fy-0.095
ash-0.095
Token enjoyed
Token ID8359
Feature activation+1.38e-02

Pos logit contributions
heit+0.186
rated+0.183
iling+0.175
irm+0.169
estival+0.168
Neg logit contributions
 woman-0.126
 doctor-0.119
 parent-0.116
 painter-0.114
 mouse-0.114
Token pic
Token ID8301
Feature activation+2.33e-01

Pos logit contributions
rated+0.028
fy+0.027
cluded+0.025
rene+0.024
|+0.024
Neg logit contributions
 doctor-0.024
 woman-0.024
 parent-0.023
 mouse-0.023
 wizard-0.022
Tokenn
Token ID77
Feature activation+9.28e-02

Pos logit contributions
rated+0.308
fy+0.287
irm+0.262
cluded+0.260
ishy+0.257
Neg logit contributions
 doctor-0.516
 woman-0.508
 wizard-0.505
 vet-0.497
 owl-0.495
Tokenics
Token ID873
Feature activation+5.91e-02

Pos logit contributions
rated+0.097
fy+0.089
cluded+0.084
|+0.079
iling+0.077
Neg logit contributions
 doctor-0.239
 woman-0.238
 vet-0.229
 parent-0.229
 wizard-0.228
Token.
Token ID13
Feature activation+3.15e-02

Pos logit contributions
rated+0.154
heit+0.148
fy+0.136
rene+0.135
cluded+0.135
Neg logit contributions
 doctor-0.075
 woman-0.074
 mouse-0.073
 owl-0.068
 bird-0.067
Token They
Token ID1119
Feature activation+4.77e-02

Pos logit contributions
rated+0.056
fy+0.053
heit+0.050
ils+0.050
cluded+0.050
Neg logit contributions
 doctor-0.063
 woman-0.062
 vet-0.059
 parent-0.059
 mechanic-0.059
Token loved
Token ID6151
Feature activation-1.02e-02

Pos logit contributions
rated+0.104
cluded+0.103
irm+0.103
fy+0.103
heit+0.100
Neg logit contributions
 woman-0.071
 parent-0.067
 mouse-0.066
 bird-0.063
 fairy-0.063
Token "
Token ID366
Feature activation-5.45e-02

Pos logit contributions
 doctor+0.038
 mechanic+0.037
 woman+0.037
 wizard+0.036
 painter+0.036
Neg logit contributions
rated-0.083
fy-0.080
irm-0.078
cluded-0.078
heit-0.078
TokenGet
Token ID3855
Feature activation+6.57e-02

Pos logit contributions
 woman+0.104
 doctor+0.104
 vet+0.099
 parent+0.097
 mouse+0.096
Neg logit contributions
rated-0.096
fy-0.093
cluded-0.088
heit-0.083
|-0.082
Token off
Token ID572
Feature activation+6.29e-02

Pos logit contributions
rated+0.133
rene+0.123
iling+0.122
cluded+0.122
estival+0.121
Neg logit contributions
 doctor-0.112
 woman-0.097
 bird-0.097
 vet-0.096
 mechanic-0.096
Token!
Token ID0
Feature activation+2.27e-02

Pos logit contributions
rated+0.120
fy+0.113
cluded+0.112
rene+0.106
|+0.106
Neg logit contributions
 doctor-0.114
 woman-0.110
 vet-0.104
 mouse-0.104
 wizard-0.103
Token Get
Token ID3497
Feature activation+4.42e-02

Pos logit contributions
rated+0.041
fy+0.038
cluded+0.036
ils+0.036
rene+0.035
Neg logit contributions
 doctor-0.045
 woman-0.043
 vet-0.042
 parent-0.042
 mouse-0.042
Token off
Token ID572
Feature activation+6.30e-02

Pos logit contributions
rated+0.085
iling+0.080
cluded+0.079
rene+0.078
estival+0.077
Neg logit contributions
 doctor-0.078
 vet-0.068
 woman-0.068
 mechanic-0.067
 bird-0.067
Token!
Token ID0
Feature activation+2.25e-02

Pos logit contributions
rated+0.112
cluded+0.105
fy+0.104
rene+0.099
|+0.098
Neg logit contributions
 doctor-0.122
 woman-0.118
 mouse-0.112
 vet-0.112
 fairy-0.112
Token You
Token ID921
Feature activation+4.16e-02

Pos logit contributions
rated+0.038
fy+0.036
cluded+0.034
ils+0.033
heit+0.032
Neg logit contributions
 doctor-0.046
 woman-0.045
 vet-0.043
 parent-0.043
 mouse-0.043
Token are
Token ID389
Feature activation-2.04e-02

Pos logit contributions
rated+0.065
fy+0.060
cluded+0.059
estival+0.056
iling+0.055
Neg logit contributions
 doctor-0.087
 woman-0.085
 mouse-0.083
 parent-0.083
 vet-0.082
Token ruining
Token ID39842
Feature activation+4.09e-02

Pos logit contributions
 doctor+0.034
 woman+0.033
 mouse+0.032
 parent+0.031
 vet+0.031
Neg logit contributions
rated-0.042
fy-0.039
cluded-0.038
rene-0.037
estival-0.037
Token my
Token ID616
Feature activation+2.64e-02

Pos logit contributions
rated+0.063
fy+0.059
cluded+0.057
|+0.054
rene+0.053
Neg logit contributions
 doctor-0.088
 woman-0.087
 vet-0.083
 parent-0.083
 mouse-0.083
Token they
Token ID484
Feature activation+2.96e-02

Pos logit contributions
 doctor+0.081
 woman+0.080
 parent+0.076
 vet+0.076
 wizard+0.075
Neg logit contributions
rated-0.086
fy-0.082
cluded-0.079
|-0.077
iling-0.076
Token were
Token ID547
Feature activation-1.52e-02

Pos logit contributions
rated+0.058
cluded+0.055
heit+0.055
iling+0.054
fy+0.054
Neg logit contributions
 doctor-0.050
 woman-0.049
 parent-0.048
 mouse-0.047
 fairy-0.045
Token a
Token ID257
Feature activation-4.01e-02

Pos logit contributions
 doctor+0.025
 woman+0.023
 parent+0.023
 fairy+0.022
 wizard+0.022
Neg logit contributions
rated-0.034
fy-0.032
|-0.031
cluded-0.031
heit-0.030
Token family
Token ID1641
Feature activation-3.82e-02

Pos logit contributions
 woman+0.080
 wizard+0.078
 doctor+0.077
 owl+0.077
 vet+0.076
Neg logit contributions
fy-0.055
rated-0.047
irl-0.046
cluded-0.046
|-0.044
Token and
Token ID290
Feature activation+1.10e-02

Pos logit contributions
 doctor+0.069
 woman+0.068
 parent+0.065
 vet+0.064
 mouse+0.064
Neg logit contributions
rated-0.071
fy-0.067
cluded-0.065
heit-0.063
|-0.063
Token they
Token ID484
Feature activation+5.49e-02

Pos logit contributions
rated+0.021
fy+0.021
cluded+0.020
|+0.019
ils+0.019
Neg logit contributions
 doctor-0.018
 woman-0.018
 wizard-0.017
 vet-0.017
 parent-0.017
Token had
Token ID550
Feature activation+8.14e-03

Pos logit contributions
rated+0.099
fy+0.092
cluded+0.092
heit+0.091
iling+0.090
Neg logit contributions
 woman-0.102
 doctor-0.101
 parent-0.097
 fairy-0.096
 mouse-0.095
Token to
Token ID284
Feature activation+1.02e-02

Pos logit contributions
rated+0.020
cluded+0.018
heit+0.017
iling+0.017
fy+0.017
Neg logit contributions
 woman-0.011
 doctor-0.011
 vet-0.011
 mouse-0.011
 parent-0.011
Token be
Token ID307
Feature activation+2.98e-02

Pos logit contributions
rated+0.020
fy+0.019
cluded+0.017
ash+0.017
iling+0.017
Neg logit contributions
 doctor-0.019
 parent-0.018
 vet-0.018
 woman-0.018
 fairy-0.018
Token nice
Token ID3621
Feature activation+3.40e-02

Pos logit contributions
rated+0.065
fy+0.061
|+0.059
heit+0.059
cluded+0.058
Neg logit contributions
 doctor-0.047
 woman-0.046
 parent-0.044
 vet-0.043
 fairy-0.042
Token to
Token ID284
Feature activation-6.27e-03

Pos logit contributions
rated+0.076
fy+0.071
heit+0.071
cluded+0.071
|+0.069
Neg logit contributions
 doctor-0.051
 woman-0.050
 mouse-0.047
 parent-0.047
 vet-0.046
Token all
Token ID477
Feature activation+8.18e-02

Pos logit contributions
rated+0.040
fy+0.037
cluded+0.037
|+0.035
rene+0.035
Neg logit contributions
 doctor-0.035
 woman-0.034
 mouse-0.033
 vet-0.032
 parent-0.032
Token the
Token ID262
Feature activation+1.48e-02

Pos logit contributions
rated+0.166
fy+0.159
cluded+0.152
irm+0.149
heit+0.148
Neg logit contributions
 woman-0.137
 doctor-0.135
 mouse-0.130
 wizard-0.128
 fairy-0.128
Token fruits
Token ID15921
Feature activation+2.02e-02

Pos logit contributions
rated+0.033
fy+0.032
cluded+0.031
|+0.031
rene+0.030
Neg logit contributions
 doctor-0.021
 woman-0.021
 mouse-0.020
 wizard-0.020
 parent-0.020
Token we
Token ID356
Feature activation+8.91e-02

Pos logit contributions
rated+0.043
heit+0.041
cluded+0.040
|+0.039
rene+0.039
Neg logit contributions
 doctor-0.033
 woman-0.032
 mouse-0.030
 owl-0.030
 fairy-0.030
Token saw
Token ID2497
Feature activation-7.53e-02

Pos logit contributions
rated+0.160
heit+0.153
cluded+0.147
fy+0.144
irm+0.141
Neg logit contributions
 woman-0.155
 wizard-0.155
 doctor-0.154
 fairy-0.153
 mouse-0.151
Token on
Token ID319
Feature activation+8.12e-02

Pos logit contributions
 doctor+0.120
 woman+0.117
 mouse+0.117
 owl+0.114
 fairy+0.113
Neg logit contributions
rated-0.154
fy-0.145
cluded-0.145
rene-0.145
|-0.144
Token the
Token ID262
Feature activation+1.39e-02

Pos logit contributions
rated+0.189
fy+0.183
cluded+0.178
rene+0.171
beh+0.166
Neg logit contributions
 doctor-0.120
 woman-0.113
 mouse-0.111
 vet-0.108
 fairy-0.105
Token cards
Token ID4116
Feature activation-4.03e-03

Pos logit contributions
rated+0.032
fy+0.031
cluded+0.030
rene+0.030
iling+0.029
Neg logit contributions
 doctor-0.020
 woman-0.020
 wizard-0.019
 mouse-0.019
 vet-0.018
Token.
Token ID13
Feature activation+2.43e-02

Pos logit contributions
 doctor+0.008
 woman+0.007
 wizard+0.007
 fairy+0.007
 mouse+0.007
Neg logit contributions
rated-0.007
fy-0.007
cluded-0.007
heit-0.007
|-0.006
Token And
Token ID843
Feature activation+5.33e-03

Pos logit contributions
rated+0.039
fy+0.039
cluded+0.036
|+0.035
estival+0.035
Neg logit contributions
 woman-0.047
 doctor-0.047
 parent-0.045
 wizard-0.045
 vet-0.044
Token some
Token ID617
Feature activation+2.46e-02

Pos logit contributions
rated+0.010
fy+0.010
cluded+0.009
|+0.009
rene+0.009
Neg logit contributions
 doctor-0.009
 woman-0.009
 mouse-0.009
 parent-0.009
 vet-0.009
Token't
Token ID470
Feature activation-1.83e-02

Pos logit contributions
 woman+0.024
 doctor+0.023
 vet+0.022
 parent+0.022
 wizard+0.022
Neg logit contributions
rated-0.039
fy-0.038
cluded-0.037
|-0.036
iling-0.036
Token you
Token ID345
Feature activation+7.09e-02

Pos logit contributions
 doctor+0.042
 woman+0.042
 parent+0.040
 vet+0.040
 wizard+0.040
Neg logit contributions
rated-0.025
fy-0.023
cluded-0.022
|-0.021
rene-0.021
Token come
Token ID1282
Feature activation+5.79e-02

Pos logit contributions
rated+0.123
fy+0.118
cluded+0.113
iling+0.112
ils+0.111
Neg logit contributions
 doctor-0.134
 parent-0.127
 woman-0.126
 vet-0.125
 mouse-0.123
Token and
Token ID290
Feature activation+9.17e-03

Pos logit contributions
rated+0.116
fy+0.111
cluded+0.107
heit+0.104
iling+0.103
Neg logit contributions
 doctor-0.097
 woman-0.094
 parent-0.091
 vet-0.090
 mouse-0.090
Token join
Token ID4654
Feature activation+5.48e-02

Pos logit contributions
rated+0.013
fy+0.012
cluded+0.012
|+0.011
iling+0.011
Neg logit contributions
 woman-0.020
 doctor-0.020
 parent-0.019
 wizard-0.019
 vet-0.019
Token us
Token ID514
Feature activation+7.28e-02

Pos logit contributions
rated+0.116
fy+0.115
rene+0.107
ishy+0.102
cluded+0.101
Neg logit contributions
 doctor-0.087
 mouse-0.086
 parent-0.084
 fairy-0.081
 bird-0.079
Token?"
Token ID1701
Feature activation-4.88e-03

Pos logit contributions
rated+0.130
fy+0.124
heit+0.118
cluded+0.117
rene+0.113
Neg logit contributions
 doctor-0.141
 woman-0.131
 parent-0.129
 vet-0.128
 mouse-0.128
Token\n
Token ID198
Feature activation-2.81e-02

Pos logit contributions
 woman+0.008
 doctor+0.008
 wizard+0.008
 parent+0.008
 fairy+0.007
Neg logit contributions
rated-0.009
fy-0.009
|-0.009
cluded-0.009
 matters-0.008
Token\n
Token ID198
Feature activation-3.15e-02

Pos logit contributions
 woman+0.043
 wizard+0.042
 doctor+0.042
 fairy+0.040
 parent+0.040
Neg logit contributions
|-0.054
fy-0.054
rated-0.053
cluded-0.053
irl-0.052
TokenMax
Token ID11518
Feature activation+6.86e-03

Pos logit contributions
 doctor+0.045
 woman+0.045
 parent+0.042
 vet+0.042
 wizard+0.042
Neg logit contributions
rated-0.069
fy-0.067
cluded-0.065
|-0.063
iling-0.063
Token was
Token ID373
Feature activation-2.45e-02

Pos logit contributions
rated+0.014
fy+0.013
cluded+0.013
|+0.013
iling+0.012
Neg logit contributions
 doctor-0.011
 woman-0.011
 parent-0.010
 mouse-0.010
 vet-0.010
Token.
Token ID13
Feature activation+6.16e-04

Pos logit contributions
 doctor+0.119
 woman+0.113
 mouse+0.108
 fairy+0.107
 parent+0.107
Neg logit contributions
rated-0.134
heit-0.128
wy-0.125
fy-0.123
iling-0.122
Token<|endoftext|>
Token ID50256
Feature activation+9.41e-02

Pos logit contributions
|+0.001
fy+0.001
rated+0.001
cluded+0.001
 matters+0.001
Neg logit contributions
 wizard-0.001
 woman-0.001
 doctor-0.001
 fairy-0.001
 painter-0.001
TokenOnce
Token ID7454
Feature activation+1.19e-01

Pos logit contributions
rated+0.257
fy+0.251
cluded+0.240
heit+0.237
ils+0.232
Neg logit contributions
 doctor-0.094
 woman-0.089
 mouse-0.083
 vet-0.080
 wizard-0.079
Token upon
Token ID2402
Feature activation+3.97e-02

Pos logit contributions
rated+0.211
fy+0.202
cluded+0.195
|+0.191
irl+0.185
Neg logit contributions
 doctor-0.216
 woman-0.216
 vet-0.209
 wizard-0.205
 mouse-0.203
Token a
Token ID257
Feature activation+5.30e-03

Pos logit contributions
rated+0.083
fy+0.080
cluded+0.078
|+0.077
rene+0.075
Neg logit contributions
 woman-0.059
 doctor-0.059
 parent-0.056
 wizard-0.055
 vet-0.055
Token time
Token ID640
Feature activation+1.16e-01

Pos logit contributions
rated+0.007
beh+0.006
heit+0.006
ils+0.006
estival+0.006
Neg logit contributions
 doctor-0.012
 woman-0.012
 vet-0.012
 parent-0.012
 mouse-0.012
Token,
Token ID11
Feature activation+2.25e-02

Pos logit contributions
rated+0.222
fy+0.213
cluded+0.205
rene+0.196
iling+0.195
Neg logit contributions
 doctor-0.199
 woman-0.197
 parent-0.187
 mouse-0.186
 wizard-0.185
Token there
Token ID612
Feature activation+2.39e-02

Pos logit contributions
rated+0.044
fy+0.043
cluded+0.041
heit+0.040
ils+0.040
Neg logit contributions
 doctor-0.037
 woman-0.037
 mouse-0.035
 parent-0.035
 vet-0.034
Token was
Token ID373
Feature activation+4.48e-03

Pos logit contributions
rated+0.052
fy+0.051
cluded+0.049
iling+0.047
|+0.047
Neg logit contributions
 doctor-0.035
 woman-0.034
 parent-0.032
 vet-0.032
 mouse-0.032
Token a
Token ID257
Feature activation-2.25e-02

Pos logit contributions
rated+0.010
fy+0.010
cluded+0.009
|+0.009
iling+0.009
Neg logit contributions
 doctor-0.006
 woman-0.006
 parent-0.006
 wizard-0.006
 vet-0.006
Token little
Token ID1310
Feature activation-4.74e-02

Pos logit contributions
 vet+0.024
 doctor+0.023
 wizard+0.023
 conductor+0.023
 woman+0.022
Neg logit contributions
fy-0.047
irl-0.045
rated-0.044
 tim-0.043
cluded-0.042
Token.
Token ID13
Feature activation+1.88e-02

Pos logit contributions
fy+0.152
rated+0.151
heit+0.150
cluded+0.148
irl+0.145
Neg logit contributions
 doctor-0.140
 woman-0.138
 mouse-0.134
 parent-0.132
 fairy-0.127
Token\n
Token ID198
Feature activation-8.97e-03

Pos logit contributions
rated+0.027
fy+0.026
cluded+0.025
|+0.024
irm+0.024
Neg logit contributions
 woman-0.040
 doctor-0.040
 wizard-0.038
 parent-0.038
 vet-0.038
Token\n
Token ID198
Feature activation-1.36e-02

Pos logit contributions
 woman+0.015
 wizard+0.015
 doctor+0.014
 painter+0.014
 owl+0.014
Neg logit contributions
|-0.017
fy-0.016
rated-0.016
irl-0.016
 matters-0.016
Token"
Token ID1
Feature activation-4.03e-02

Pos logit contributions
 woman+0.021
 doctor+0.021
 vet+0.020
 parent+0.020
 mouse+0.020
Neg logit contributions
rated-0.029
fy-0.027
cluded-0.026
|-0.025
rene-0.025
TokenLet
Token ID5756
Feature activation-4.33e-03

Pos logit contributions
 doctor+0.063
 woman+0.063
 parent+0.059
 wizard+0.059
 vet+0.059
Neg logit contributions
rated-0.083
fy-0.080
cluded-0.078
|-0.075
iling-0.075
Token's
Token ID338
Feature activation+1.41e-02

Pos logit contributions
 doctor+0.007
 woman+0.006
 vet+0.006
 parent+0.006
 mouse+0.006
Neg logit contributions
rated-0.009
fy-0.009
cluded-0.009
rene-0.009
iling-0.009
Token go
Token ID467
Feature activation+8.24e-02

Pos logit contributions
rated+0.024
fy+0.023
cluded+0.022
iling+0.022
estival+0.021
Neg logit contributions
 doctor-0.029
 parent-0.027
 vet-0.027
 woman-0.026
 mouse-0.026
Token inside
Token ID2641
Feature activation-3.56e-02

Pos logit contributions
rated+0.156
fy+0.151
cluded+0.145
|+0.141
iling+0.138
Neg logit contributions
 doctor-0.142
 woman-0.141
 parent-0.134
 vet-0.133
 wizard-0.133
Token,
Token ID11
Feature activation+3.57e-03

Pos logit contributions
 doctor+0.059
 woman+0.058
 parent+0.055
 mouse+0.054
 wizard+0.054
Neg logit contributions
rated-0.071
fy-0.068
cluded-0.067
|-0.064
rene-0.063
Token Lily
Token ID20037
Feature activation+2.68e-02

Pos logit contributions
rated+0.007
fy+0.007
cluded+0.006
|+0.006
irm+0.006
Neg logit contributions
 woman-0.006
 doctor-0.006
 wizard-0.006
 vet-0.006
 parent-0.006
Token,"
Token ID553
Feature activation-2.87e-02

Pos logit contributions
rated+0.042
fy+0.040
cluded+0.039
|+0.037
estival+0.037
Neg logit contributions
 woman-0.055
 doctor-0.055
 parent-0.052
 mouse-0.052
 vet-0.052
Token went
Token ID1816
Feature activation+6.71e-02

Pos logit contributions
rated+0.086
fy+0.083
cluded+0.081
|+0.078
estival+0.077
Neg logit contributions
 doctor-0.078
 woman-0.077
 parent-0.073
 wizard-0.072
 mouse-0.072
Token to
Token ID284
Feature activation+3.72e-04

Pos logit contributions
rated+0.123
fy+0.119
cluded+0.115
|+0.110
iling+0.109
Neg logit contributions
 woman-0.118
 doctor-0.117
 parent-0.112
 vet-0.111
 mouse-0.111
Token the
Token ID262
Feature activation-9.38e-03

Pos logit contributions
rated+0.001
fy+0.001
cluded+0.001
|+0.001
iling+0.001
Neg logit contributions
 woman-0.001
 doctor-0.001
 parent-0.001
 vet-0.001
 wizard-0.001
Token wall
Token ID3355
Feature activation-3.06e-02

Pos logit contributions
 woman+0.014
 parent+0.014
 mouse+0.013
 fairy+0.013
 wizard+0.013
Neg logit contributions
rated-0.017
fy-0.017
cluded-0.017
estival-0.016
irl-0.016
Token and
Token ID290
Feature activation+1.37e-02

Pos logit contributions
 woman+0.049
 doctor+0.048
 parent+0.047
 vet+0.046
 mouse+0.046
Neg logit contributions
rated-0.060
fy-0.059
cluded-0.057
|-0.054
iling-0.054
Token challenged
Token ID12827
Feature activation+1.73e-02

Pos logit contributions
rated+0.024
fy+0.023
cluded+0.023
|+0.022
 matters+0.022
Neg logit contributions
 woman-0.023
 parent-0.023
 doctor-0.023
 vet-0.023
 mouse-0.022
Token himself
Token ID2241
Feature activation+4.89e-02

Pos logit contributions
rated+0.042
fy+0.040
|+0.038
cluded+0.037
iling+0.037
Neg logit contributions
 doctor-0.025
 woman-0.024
 parent-0.022
 wizard-0.022
 vet-0.022
Token with
Token ID351
Feature activation-1.42e-03

Pos logit contributions
rated+0.110
fy+0.104
|+0.101
cluded+0.099
iling+0.099
Neg logit contributions
 doctor-0.074
 woman-0.071
 parent-0.068
 mouse-0.067
 wizard-0.066
Token a
Token ID257
Feature activation-1.26e-02

Pos logit contributions
 woman+0.003
 doctor+0.003
 parent+0.002
 vet+0.002
 wizard+0.002
Neg logit contributions
rated-0.003
fy-0.002
cluded-0.002
|-0.002
heit-0.002
Token new
Token ID649
Feature activation-1.96e-02

Pos logit contributions
 woman+0.022
 doctor+0.022
 vet+0.021
 parent+0.021
 wizard+0.020
Neg logit contributions
fy-0.022
rated-0.022
cluded-0.022
heit-0.020
|-0.020
Token route
Token ID6339
Feature activation+2.39e-02

Pos logit contributions
 doctor+0.025
 woman+0.025
 vet+0.024
 parent+0.023
 wizard+0.023
Neg logit contributions
rated-0.044
fy-0.043
cluded-0.042
irl-0.040
estival-0.040
Token piano
Token ID19132
Feature activation-2.56e-02

Pos logit contributions
 vet+0.041
 doctor+0.041
 parent+0.040
 wizard+0.039
 woman+0.038
Neg logit contributions
rated-0.054
fy-0.053
 matters-0.052
cluded-0.051
irl-0.051
Token.
Token ID13
Feature activation+1.07e-02

Pos logit contributions
 doctor+0.052
 woman+0.052
 parent+0.049
 vet+0.049
 wizard+0.049
Neg logit contributions
rated-0.040
fy-0.039
cluded-0.037
|-0.035
iling-0.035
Token It
Token ID632
Feature activation-4.00e-02

Pos logit contributions
rated+0.016
|+0.016
fy+0.016
cluded+0.015
estival+0.015
Neg logit contributions
 wizard-0.021
 woman-0.021
 doctor-0.020
 fairy-0.020
 owl-0.020
Token was
Token ID373
Feature activation-1.47e-02

Pos logit contributions
 doctor+0.072
 woman+0.071
 vet+0.069
 wizard+0.068
 parent+0.067
Neg logit contributions
rated-0.072
fy-0.069
|-0.066
cluded-0.065
irl-0.063
Token so
Token ID523
Feature activation+6.40e-02

Pos logit contributions
 woman+0.027
 doctor+0.027
 wizard+0.026
 vet+0.025
 parent+0.025
Neg logit contributions
rated-0.025
fy-0.025
cluded-0.023
irl-0.023
|-0.022
Token pretty
Token ID2495
Feature activation+3.18e-03

Pos logit contributions
rated+0.148
iling+0.138
cluded+0.137
fy+0.137
wy+0.137
Neg logit contributions
 woman-0.090
 doctor-0.087
 parent-0.080
 painter-0.078
 fairy-0.078
Token and
Token ID290
Feature activation-7.33e-03

Pos logit contributions
rated+0.007
fy+0.007
cluded+0.007
|+0.007
rene+0.007
Neg logit contributions
 doctor-0.004
 woman-0.004
 mouse-0.004
 wizard-0.004
 parent-0.004
Token shiny
Token ID22441
Feature activation+5.04e-03

Pos logit contributions
 doctor+0.013
 woman+0.013
 vet+0.012
 wizard+0.012
 parent+0.012
Neg logit contributions
rated-0.013
fy-0.013
cluded-0.012
|-0.012
irl-0.012
Token,
Token ID11
Feature activation-2.14e-02

Pos logit contributions
rated+0.010
|+0.010
cluded+0.010
fy+0.010
rene+0.009
Neg logit contributions
 doctor-0.008
 woman-0.008
 mouse-0.008
 fairy-0.008
 wizard-0.008
Token and
Token ID290
Feature activation-3.11e-03

Pos logit contributions
 doctor+0.037
 woman+0.036
 parent+0.036
 vet+0.036
 mechanic+0.036
Neg logit contributions
rated-0.038
fy-0.037
irl-0.035
cluded-0.035
 matters-0.034
Token it
Token ID340
Feature activation+8.84e-03

Pos logit contributions
 doctor+0.006
 woman+0.006
 wizard+0.006
 vet+0.006
 mechanic+0.005
Neg logit contributions
rated-0.005
fy-0.005
 matters-0.005
cluded-0.005
ils-0.005
Token.
Token ID13
Feature activation+2.36e-02

Pos logit contributions
rated+0.029
cluded+0.028
|+0.027
heit+0.027
rene+0.027
Neg logit contributions
 doctor-0.019
 mouse-0.019
 fairy-0.018
 woman-0.018
 bird-0.017
Token
Token ID198
Feature activation-4.40e-03

Pos logit contributions
rated+0.049
fy+0.047
cluded+0.045
|+0.043
iling+0.043
Neg logit contributions
 doctor-0.038
 woman-0.038
 vet-0.036
 parent-0.035
 mouse-0.035
Token
Token ID198
Feature activation-1.07e-02

Pos logit contributions
 woman+0.006
 doctor+0.006
 wizard+0.005
 parent+0.005
 mouse+0.005
Neg logit contributions
rated-0.010
fy-0.010
cluded-0.010
|-0.009
irl-0.009
TokenOne
Token ID3198
Feature activation+7.03e-02

Pos logit contributions
 doctor+0.017
 vet+0.017
 woman+0.017
 mouse+0.016
 nurse+0.015
Neg logit contributions
rated-0.023
fy-0.021
cluded-0.021
beh-0.021
heit-0.020
Token day
Token ID1110
Feature activation+2.26e-02

Pos logit contributions
heit+0.122
rated+0.118
fy+0.114
wy+0.112
ils+0.112
Neg logit contributions
 doctor-0.140
 woman-0.140
 bird-0.136
 mouse-0.133
 fairy-0.132
Token,
Token ID11
Feature activation-9.56e-03

Pos logit contributions
rated+0.057
cluded+0.054
fy+0.053
rene+0.052
|+0.051
Neg logit contributions
 doctor-0.031
 mouse-0.029
 woman-0.028
 fairy-0.026
 vet-0.025
Token Anna
Token ID11735
Feature activation+4.32e-02

Pos logit contributions
 doctor+0.017
 woman+0.017
 mouse+0.016
 parent+0.016
 vet+0.016
Neg logit contributions
rated-0.018
fy-0.018
cluded-0.017
|-0.016
iling-0.016
Token found
Token ID1043
Feature activation-6.66e-02

Pos logit contributions
rated+0.086
fy+0.083
cluded+0.080
heit+0.078
|+0.077
Neg logit contributions
 woman-0.072
 doctor-0.071
 parent-0.068
 vet-0.067
 mouse-0.067
Token a
Token ID257
Feature activation-3.25e-02

Pos logit contributions
 mechanic+0.136
 woman+0.133
 wizard+0.130
 painter+0.130
 parent+0.126
Neg logit contributions
ils-0.082
 matters-0.081
estival-0.076
irl-0.076
irm-0.076
Token very
Token ID845
Feature activation+6.36e-02

Pos logit contributions
 doctor+0.050
 vet+0.049
 woman+0.048
 parent+0.048
 conductor+0.047
Neg logit contributions
fy-0.059
rated-0.059
irl-0.056
cluded-0.056
heit-0.055
Token big
Token ID1263
Feature activation-7.08e-03

Pos logit contributions
rated+0.152
|+0.147
iling+0.146
fy+0.144
ils+0.141
Neg logit contributions
 doctor-0.083
 woman-0.083
 wizard-0.075
 fairy-0.075
 vet-0.074
Token birthday
Token ID10955
Feature activation+5.64e-02

Pos logit contributions
rated+0.070
fy+0.069
cluded+0.066
|+0.064
estival+0.063
Neg logit contributions
 woman-0.053
 doctor-0.053
 vet-0.050
 wizard-0.050
 parent-0.050
Token."
Token ID526
Feature activation+7.93e-03

Pos logit contributions
rated+0.094
cluded+0.086
heit+0.085
fy+0.084
|+0.084
Neg logit contributions
 doctor-0.116
 woman-0.113
 mouse-0.108
 parent-0.108
 mechanic-0.107
Token Sam
Token ID3409
Feature activation+2.75e-03

Pos logit contributions
rated+0.015
fy+0.014
cluded+0.014
|+0.014
estival+0.013
Neg logit contributions
 doctor-0.014
 woman-0.013
 wizard-0.013
 parent-0.013
 vet-0.013
Token said
Token ID531
Feature activation-4.08e-02

Pos logit contributions
rated+0.005
fy+0.005
cluded+0.005
|+0.005
iling+0.005
Neg logit contributions
 doctor-0.005
 woman-0.005
 parent-0.004
 vet-0.004
 wizard-0.004
Token.
Token ID13
Feature activation+5.52e-03

Pos logit contributions
 doctor+0.076
 woman+0.074
 mouse+0.069
 parent+0.069
 wizard+0.069
Neg logit contributions
rated-0.075
fy-0.070
heit-0.069
iling-0.069
cluded-0.069
Token
Token ID198
Feature activation-1.64e-02

Pos logit contributions
|+0.009
fy+0.009
cluded+0.009
irm+0.008
estival+0.008
Neg logit contributions
 woman-0.010
 wizard-0.010
 parent-0.009
 owl-0.009
 doctor-0.009
Token
Token ID198
Feature activation-1.95e-02

Pos logit contributions
 woman+0.025
 wizard+0.024
 doctor+0.024
 parent+0.024
 mouse+0.024
Neg logit contributions
|-0.032
rated-0.032
fy-0.032
cluded-0.031
rene-0.031
TokenL
Token ID43
Feature activation+5.06e-02

Pos logit contributions
 doctor+0.030
 woman+0.030
 parent+0.028
 wizard+0.028
 mouse+0.027
Neg logit contributions
rated-0.041
fy-0.040
cluded-0.038
|-0.037
rene-0.037
Tokenily
Token ID813
Feature activation-5.80e-03

Pos logit contributions
rated+0.094
fy+0.092
cluded+0.085
heit+0.084
|+0.084
Neg logit contributions
 doctor-0.091
 woman-0.088
 vet-0.085
 wizard-0.084
 mouse-0.084
Token took
Token ID1718
Feature activation+1.53e-02

Pos logit contributions
 doctor+0.010
 woman+0.010
 wizard+0.010
 parent+0.010
 vet+0.010
Neg logit contributions
rated-0.010
fy-0.010
cluded-0.009
ils-0.009
|-0.009
Token the
Token ID262
Feature activation+1.60e-03

Pos logit contributions
rated+0.029
fy+0.029
cluded+0.027
|+0.027
irm+0.026
Neg logit contributions
 woman-0.026
 doctor-0.026
 parent-0.024
 wizard-0.024
 vet-0.024
Token there
Token ID612
Feature activation+3.55e-02

Pos logit contributions
rated+0.213
fy+0.205
cluded+0.198
|+0.191
rene+0.187
Neg logit contributions
 doctor-0.203
 woman-0.203
 vet-0.194
 wizard-0.191
 parent-0.191
Token was
Token ID373
Feature activation+1.32e-02

Pos logit contributions
rated+0.079
fy+0.077
cluded+0.074
iling+0.072
heit+0.071
Neg logit contributions
 doctor-0.050
 woman-0.049
 parent-0.046
 mouse-0.046
 wizard-0.045
Token a
Token ID257
Feature activation-1.57e-02

Pos logit contributions
rated+0.036
fy+0.033
heit+0.033
cluded+0.032
ils+0.032
Neg logit contributions
 doctor-0.014
 woman-0.013
 mouse-0.013
 vet-0.013
 parent-0.012
Token thoughtful
Token ID22677
Feature activation-1.04e-02

Pos logit contributions
 doctor+0.015
 wizard+0.015
 vet+0.015
 woman+0.014
 conductor+0.014
Neg logit contributions
fy-0.037
rated-0.035
irl-0.034
cluded-0.034
|-0.033
Token little
Token ID1310
Feature activation-4.42e-02

Pos logit contributions
 doctor+0.009
 vet+0.009
 woman+0.008
 parent+0.008
 wizard+0.008
Neg logit contributions
rated-0.025
fy-0.025
cluded-0.024
|-0.023
irm-0.023
Token girl
Token ID2576
Feature activation-5.53e-02

Pos logit contributions
 doctor+0.040
 woman+0.039
 vet+0.037
 parent+0.036
 wizard+0.036
Neg logit contributions
fy-0.114
rated-0.113
cluded-0.108
|-0.106
iling-0.105
Token who
Token ID508
Feature activation-4.88e-02

Pos logit contributions
 doctor+0.085
 woman+0.085
 parent+0.080
 vet+0.080
 wizard+0.079
Neg logit contributions
rated-0.115
fy-0.112
cluded-0.108
|-0.105
iling-0.104
Token liked
Token ID8288
Feature activation-2.99e-02

Pos logit contributions
 doctor+0.059
 woman+0.058
 fairy+0.057
 mouse+0.056
 parent+0.054
Neg logit contributions
heit-0.126
rated-0.120
estival-0.117
ash-0.113
raw-0.112
Token to
Token ID284
Feature activation+1.73e-02

Pos logit contributions
 doctor+0.045
 woman+0.045
 parent+0.042
 vet+0.042
 wizard+0.041
Neg logit contributions
rated-0.063
fy-0.061
cluded-0.059
|-0.058
iling-0.057
Token speak
Token ID2740
Feature activation+1.82e-03

Pos logit contributions
rated+0.033
fy+0.032
iling+0.030
heit+0.029
wy+0.029
Neg logit contributions
 doctor-0.033
 woman-0.031
 parent-0.031
 wizard-0.030
 fairy-0.030
Token all
Token ID477
Feature activation+7.14e-02

Pos logit contributions
rated+0.004
fy+0.004
|+0.004
wy+0.004
iling+0.004
Neg logit contributions
 doctor-0.003
 woman-0.003
 parent-0.003
 mouse-0.003
 fairy-0.003
Token.
Token ID13
Feature activation+3.00e-03

Pos logit contributions
rated+0.041
fy+0.040
cluded+0.038
|+0.037
iling+0.036
Neg logit contributions
 doctor-0.045
 woman-0.045
 vet-0.043
 parent-0.043
 wizard-0.043
Token It
Token ID632
Feature activation-4.50e-02

Pos logit contributions
rated+0.006
fy+0.006
cluded+0.005
|+0.005
iling+0.005
Neg logit contributions
 doctor-0.005
 woman-0.005
 vet-0.005
 parent-0.005
 wizard-0.005
Token was
Token ID373
Feature activation-2.12e-02

Pos logit contributions
 doctor+0.066
 woman+0.066
 parent+0.062
 vet+0.062
 mouse+0.061
Neg logit contributions
rated-0.096
fy-0.094
cluded-0.091
iling-0.088
|-0.088
Token so
Token ID523
Feature activation+5.81e-02

Pos logit contributions
 doctor+0.035
 woman+0.035
 parent+0.033
 wizard+0.033
 vet+0.033
Neg logit contributions
rated-0.042
fy-0.040
cluded-0.039
|-0.038
iling-0.037
Token beautiful
Token ID4950
Feature activation-2.39e-02

Pos logit contributions
rated+0.137
ils+0.135
cluded+0.131
iling+0.130
fy+0.129
Neg logit contributions
 doctor-0.071
 woman-0.070
 fairy-0.064
 mouse-0.064
 bird-0.063
Token that
Token ID326
Feature activation-5.52e-02

Pos logit contributions
 doctor+0.032
 woman+0.032
 mouse+0.030
 wizard+0.030
 owl+0.029
Neg logit contributions
rated-0.055
fy-0.053
cluded-0.052
|-0.051
rene-0.050
Token he
Token ID339
Feature activation+1.93e-02

Pos logit contributions
 doctor+0.079
 woman+0.077
 parent+0.072
 wizard+0.072
 mouse+0.072
Neg logit contributions
rated-0.123
fy-0.120
cluded-0.116
|-0.112
iling-0.112
Token wanted
Token ID2227
Feature activation-1.57e-02

Pos logit contributions
rated+0.037
cluded+0.036
fy+0.036
iling+0.035
|+0.034
Neg logit contributions
 doctor-0.032
 woman-0.032
 wizard-0.030
 parent-0.030
 mouse-0.030
Token to
Token ID284
Feature activation-6.28e-03

Pos logit contributions
 woman+0.027
 doctor+0.026
 parent+0.026
 vet+0.025
 wizard+0.025
Neg logit contributions
rated-0.028
fy-0.028
cluded-0.027
estival-0.026
irm-0.026
Token take
Token ID1011
Feature activation+4.11e-02

Pos logit contributions
 doctor+0.013
 woman+0.012
 vet+0.012
 parent+0.012
 wizard+0.012
Neg logit contributions
rated-0.010
fy-0.010
cluded-0.009
iling-0.009
ils-0.009
Token a
Token ID257
Feature activation-7.79e-03

Pos logit contributions
rated+0.112
fy+0.102
cluded+0.099
iling+0.099
|+0.098
Neg logit contributions
 doctor-0.046
 mouse-0.041
 wizard-0.040
 woman-0.040
 bird-0.040
Token was
Token ID373
Feature activation-1.44e-02

Pos logit contributions
rated+0.032
fy+0.032
|+0.029
ils+0.029
cluded+0.028
Neg logit contributions
 wizard-0.035
 woman-0.035
 doctor-0.035
 painter-0.035
 mechanic-0.034
Token very
Token ID845
Feature activation+4.48e-02

Pos logit contributions
 woman+0.026
 doctor+0.025
 wizard+0.025
 parent+0.024
 mouse+0.024
Neg logit contributions
rated-0.025
fy-0.025
cluded-0.024
|-0.023
iling-0.022
Token kind
Token ID1611
Feature activation-3.48e-02

Pos logit contributions
rated+0.094
fy+0.092
cluded+0.088
|+0.085
iling+0.085
Neg logit contributions
 doctor-0.066
 woman-0.066
 wizard-0.062
 vet-0.062
 parent-0.062
Token.
Token ID13
Feature activation+3.89e-03

Pos logit contributions
 doctor+0.071
 woman+0.071
 parent+0.069
 wizard+0.069
 vet+0.069
Neg logit contributions
fy-0.051
rated-0.050
cluded-0.045
iling-0.044
ils-0.044
Token She
Token ID1375
Feature activation-1.31e-02

Pos logit contributions
rated+0.008
fy+0.008
cluded+0.007
irl+0.007
ils+0.007
Neg logit contributions
 doctor-0.006
 woman-0.006
 mouse-0.006
 vet-0.006
 parent-0.006
Token taught
Token ID7817
Feature activation-6.58e-02

Pos logit contributions
 woman+0.024
 doctor+0.023
 wizard+0.023
 painter+0.023
 mechanic+0.023
Neg logit contributions
fy-0.021
rated-0.021
ils-0.020
|-0.019
cluded-0.019
Token Lucy
Token ID22162
Feature activation+5.13e-02

Pos logit contributions
 doctor+0.130
 woman+0.129
 mouse+0.124
 vet+0.122
 parent+0.122
Neg logit contributions
rated-0.112
fy-0.107
cluded-0.104
irl-0.097
estival-0.096
Token how
Token ID703
Feature activation-1.19e-02

Pos logit contributions
rated+0.098
fy+0.097
ils+0.092
cluded+0.092
|+0.090
Neg logit contributions
 woman-0.082
 doctor-0.081
 wizard-0.079
 parent-0.078
 mechanic-0.077
Token to
Token ID284
Feature activation+1.66e-02

Pos logit contributions
 doctor+0.015
 woman+0.015
 parent+0.014
 mouse+0.014
 vet+0.014
Neg logit contributions
rated-0.029
fy-0.027
cluded-0.027
heit-0.027
|-0.027
Token take
Token ID1011
Feature activation+5.55e-02

Pos logit contributions
rated+0.031
fy+0.030
cluded+0.029
|+0.028
iling+0.028
Neg logit contributions
 doctor-0.029
 woman-0.029
 vet-0.027
 parent-0.027
 mouse-0.027
Token her
Token ID607
Feature activation+4.12e-02

Pos logit contributions
rated+0.089
fy+0.084
cluded+0.081
|+0.078
iling+0.077
Neg logit contributions
 doctor-0.114
 woman-0.113
 vet-0.108
 mouse-0.108
 parent-0.108
Token with
Token ID351
Feature activation-5.86e-03

Pos logit contributions
rated+0.053
fy+0.051
cluded+0.049
|+0.047
estival+0.047
Neg logit contributions
 doctor-0.045
 woman-0.045
 fairy-0.043
 mouse-0.042
 parent-0.041
Token others
Token ID1854
Feature activation-2.97e-02

Pos logit contributions
 doctor+0.010
 woman+0.010
 vet+0.010
 wizard+0.009
 parent+0.009
Neg logit contributions
rated-0.011
fy-0.011
cluded-0.010
|-0.010
iling-0.010
Token.
Token ID13
Feature activation+4.43e-04

Pos logit contributions
 doctor+0.050
 woman+0.049
 mouse+0.048
 parent+0.046
 fairy+0.046
Neg logit contributions
rated-0.060
cluded-0.056
fy-0.056
heit-0.055
rene-0.054
Token The
Token ID383
Feature activation-5.80e-02

Pos logit contributions
fy+0.001
rated+0.001
|+0.001
cluded+0.001
iling+0.001
Neg logit contributions
 woman-0.001
 doctor-0.001
 parent-0.001
 wizard-0.001
 mechanic-0.001
Token little
Token ID1310
Feature activation-7.12e-02

Pos logit contributions
 doctor+0.065
 scientist+0.064
 principal+0.062
 wizard+0.058
 mechanic+0.057
Neg logit contributions
rated-0.122
 time-0.109
fy-0.108
 afternoon-0.105
iling-0.104
Token bird
Token ID6512
Feature activation-1.11e-01

Pos logit contributions
 doctor+0.066
 principal+0.065
 scientist+0.060
 dentist+0.060
 conductor+0.057
Neg logit contributions
rated-0.168
fy-0.154
iling-0.150
cluded-0.148
 matters-0.148
Token realized
Token ID6939
Feature activation-9.75e-02

Pos logit contributions
 woman+0.194
 doctor+0.192
 parent+0.183
 vet+0.182
 mechanic+0.182
Neg logit contributions
rated-0.204
fy-0.195
cluded-0.186
|-0.183
iling-0.182
Token the
Token ID262
Feature activation-3.10e-02

Pos logit contributions
 parent+0.152
 woman+0.152
 mechanic+0.152
 painter+0.148
 doctor+0.148
Neg logit contributions
 matters-0.166
rated-0.164
iling-0.162
heit-0.162
ils-0.162
Token owl
Token ID39610
Feature activation-1.17e-01

Pos logit contributions
 doctor+0.041
 woman+0.040
 parent+0.038
 mechanic+0.038
 painter+0.037
Neg logit contributions
rated-0.063
cluded-0.055
fy-0.055
heit-0.054
 matters-0.054
Token was
Token ID373
Feature activation-3.60e-02

Pos logit contributions
 doctor+0.201
 woman+0.200
 parent+0.190
 vet+0.190
 wizard+0.187
Neg logit contributions
rated-0.220
fy-0.212
cluded-0.203
|-0.198
iling-0.197
Token right
Token ID826
Feature activation+4.91e-02

Pos logit contributions
 woman+0.061
 doctor+0.060
 parent+0.058
 mouse+0.057
 vet+0.057
Neg logit contributions
rated-0.067
fy-0.065
cluded-0.063
iling-0.061
|-0.060
Token away
Token ID1497
Feature activation+2.79e-02

Pos logit contributions
rated+0.060
fy+0.057
heit+0.056
cluded+0.056
iling+0.055
Neg logit contributions
 doctor-0.062
 woman-0.059
 mouse-0.058
 vet-0.057
 fairy-0.057
Token!"
Token ID2474
Feature activation-1.27e-02

Pos logit contributions
fy+0.050
rated+0.050
cluded+0.049
|+0.047
irl+0.046
Neg logit contributions
 doctor-0.049
 woman-0.048
 mouse-0.047
 wizard-0.046
 fairy-0.046
Token
Token ID198
Feature activation-4.20e-02

Pos logit contributions
 doctor+0.022
 parent+0.022
 woman+0.021
 mechanic+0.021
 wizard+0.021
Neg logit contributions
fy-0.021
|-0.021
rated-0.021
cluded-0.020
iling-0.020
Token
Token ID198
Feature activation-4.31e-02

Pos logit contributions
 doctor+0.063
 wizard+0.063
 parent+0.063
 woman+0.063
 mouse+0.061
Neg logit contributions
|-0.080
fy-0.073
 matters-0.073
cluded-0.072
irl-0.070
TokenThe
Token ID464
Feature activation-6.20e-02

Pos logit contributions
 doctor+0.060
 woman+0.058
 parent+0.058
 wizard+0.057
 mouse+0.056
Neg logit contributions
fy-0.087
|-0.087
rated-0.085
cluded-0.085
iling-0.084
Token old
Token ID1468
Feature activation-6.95e-02

Pos logit contributions
 conductor+0.087
smith+0.084
Steven+0.080
 parent+0.079
 Eagle+0.077
Neg logit contributions
 matters-0.101
rated-0.101
 belong-0.095
igators-0.094
 belongs-0.094
Token person
Token ID1048
Feature activation-1.09e-01

Pos logit contributions
 conductor+0.066
 parent+0.062
 mechanic+0.058
 Eagle+0.058
 vet+0.057
Neg logit contributions
rated-0.142
lla-0.140
 matters-0.137
ancy-0.136
 belongs-0.136
Token was
Token ID373
Feature activation-2.34e-02

Pos logit contributions
 doctor+0.206
 woman+0.205
 vet+0.204
 parent+0.200
 mechanic+0.196
Neg logit contributions
rated-0.174
fy-0.165
cluded-0.155
ils-0.153
|-0.152
Token amazed
Token ID24562
Feature activation-6.87e-02

Pos logit contributions
 woman+0.042
 doctor+0.041
 painter+0.040
 wizard+0.040
 parent+0.040
Neg logit contributions
cluded-0.037
|-0.037
rated-0.037
fy-0.036
irl-0.036
Token and
Token ID290
Feature activation-1.43e-02

Pos logit contributions
 woman+0.098
 doctor+0.098
 parent+0.091
 wizard+0.091
 mouse+0.091
Neg logit contributions
rated-0.152
fy-0.147
cluded-0.143
|-0.141
heit-0.138
Token said
Token ID531
Feature activation-4.92e-02

Pos logit contributions
 doctor+0.025
 woman+0.024
 parent+0.024
 vet+0.024
 mechanic+0.024
Neg logit contributions
rated-0.026
fy-0.025
cluded-0.024
heit-0.023
|-0.023