TOP ACTIVATIONS
MAX = 0.181

There was a gr umpy ranc her named Tom . Tom
control . The ranc her heard the noise and
scratched . The ranc her heard the noise and
willing . The ranc her forg ave them ,
nodded . They thanked the ranc her and hugged him .
. <|endoftext|> Sam was three years old and he loved to
<|endoftext|> J enna was 3 years old . She liked to
<|endoftext|> Sam my was 3 years old and loved to play
and planes . There were men and women wearing bright v
spoiled and mold y and ranc id and expired and best
. <|endoftext|> Johnny was 3 years old and he liked to
<|endoftext|> J ill was 3 years old . She was learning
. <|endoftext|> John was 3 years old . He loved to
. <|endoftext|> John was three years old and very curious .
the new horse when the ranc her was not looking .
Ada . She was 3 years old , and loved playing
twins . They are three years old . They like to
. <|endoftext|> John was three years old and he loved pointing
. <|endoftext|> Jack was 3 years old . He loved to
Mary . Mary was 3 years old and liked to wear

MIN ACTIVATIONS
MIN = -0.181

moved it away , " See , it 's gone now
monster goodbye and said " See you soon !" The monster
hug and said , " See , Jenny ? You did
Let 's take the tw ig with us on all our
bird and took the tw ig back to her nest to
up and said , " See , Mitt ens , I
smiled and said , " See , sometimes things that seem
key . She unlocked the pad lock and opened the door
little boy used the tw ig to help them with all
he could finish his z ig zag . <|endoftext|> Once upon
laughed and said , " See , I told you it
to Tim my , " See , I told you I
His mom said , " See , the blanket will keep
he never refused the z ig zag again . <|endoftext|> John
waved goodbye and said " See you soon !"
His mom said , " See , you are so independent
mom smiled and said " See , I told you being
his mom said , ' See , I told you I
. Sara said , " See you soon !" and the
smiled and said , " See , big bear ! I

INTERVAL 0.091 - 0.181
CONTAINS 0.367%

big farm with lots of animals . There was a horse
eat . It was very hungry and greedy .
the lion and got scared too . He ran to Lily
pretend they were grown - ups and use the things they
coming from outside . It sounded like a big boom !

INTERVAL 0.000 - 0.091
CONTAINS 49.816%

day was finally over , they both felt satisfied with their
The girl said she wanted to help the deaf man and
sister . He loved to play with her and they were
. They had had an adventure . <|endoftext|> L ily loved
hopped by . It looked up at the bear , who

INTERVAL -0.090 - 0.000
CONTAINS 49.586%

said . " We hope you have a better birthday next
was too deep . The insect tried to swim but it
You 're welcome , Ben . You are a good friend
mom smiled and said , " That
. Inside the box , she found a beautiful dress .

INTERVAL -0.181 - -0.090
CONTAINS 0.232%

the crystal stayed on the rock . That was OK ,
bear showed him how to smooth the walls with his paws
fun . " Look , a hose !" Ben
and helped him prop the tomato plant back up . They
close and said : " Wow ! This is so cool
TokenThere
Token ID1858
Feature activation-4.19e-02

Pos logit contributions
 ranc+0.203
 officers+0.190
 men+0.186
 policemen+0.184
 goats+0.181
Neg logit contributions
ig-0.137
 perfectly-0.113
 patiently-0.102
 Linda-0.098
 marker-0.095
Token was
Token ID373
Feature activation+2.73e-02

Pos logit contributions
ig+0.067
 perfectly+0.056
 patiently+0.052
 marker+0.048
 Linda+0.048
Neg logit contributions
 ranc-0.092
 officers-0.085
 men-0.085
 policemen-0.083
 goats-0.083
Token a
Token ID257
Feature activation+1.43e-02

Pos logit contributions
 ranc+0.059
 men+0.055
 officers+0.054
 goats+0.054
 policemen+0.053
Neg logit contributions
ig-0.046
 perfectly-0.037
 patiently-0.034
mate-0.032
 marker-0.032
Token gr
Token ID1036
Feature activation+1.01e-01

Pos logit contributions
 ranc+0.027
 men+0.024
 goats+0.023
 officers+0.023
 policemen+0.023
Neg logit contributions
ig-0.028
 perfectly-0.025
fully-0.023
 patiently-0.022
mate-0.022
Tokenumpy
Token ID32152
Feature activation+6.66e-02

Pos logit contributions
 ranc+0.123
 policemen+0.113
 officers+0.113
 cages+0.107
 goats+0.106
Neg logit contributions
ig-0.262
 perfectly-0.224
 triangle-0.222
inn-0.222
 patiently-0.218
Token ranc
Token ID32205
Feature activation+1.81e-01

Pos logit contributions
 ranc+0.124
 officers+0.117
 policemen+0.116
 men+0.114
 goats+0.111
Neg logit contributions
ig-0.127
 perfectly-0.106
 patiently-0.102
 triangle-0.098
 toe-0.097
Tokenher
Token ID372
Feature activation+1.10e-02

Pos logit contributions
 ranc+0.440
 policemen+0.439
 men+0.437
 goats+0.432
 officers+0.409
Neg logit contributions
 patiently-0.200
ig-0.198
 perfectly-0.198
 proudly-0.181
fully-0.166
Token named
Token ID3706
Feature activation+8.01e-02

Pos logit contributions
 ranc+0.024
 policemen+0.024
 announcements+0.024
 men+0.023
 cages+0.023
Neg logit contributions
ig-0.014
 perfectly-0.012
 patiently-0.011
 M-0.009
fully-0.009
Token Tom
Token ID4186
Feature activation+2.84e-02

Pos logit contributions
 men+0.189
 ranc+0.189
 policemen+0.188
 officers+0.186
 cages+0.181
Neg logit contributions
ig-0.085
 M-0.080
 perfectly-0.080
 Linda-0.073
 Emily-0.069
Token.
Token ID13
Feature activation+2.50e-02

Pos logit contributions
 ranc+0.066
 men+0.061
 officers+0.061
 policemen+0.061
 goats+0.059
Neg logit contributions
ig-0.043
 perfectly-0.035
 patiently-0.032
 Linda-0.029
 toe-0.029
Token Tom
Token ID4186
Feature activation+8.02e-03

Pos logit contributions
 ranc+0.063
 officers+0.060
 policemen+0.056
 goats+0.055
 men+0.053
Neg logit contributions
ig-0.027
inn-0.024
 triangle-0.024
 Jill-0.023
 �-0.023
Token control
Token ID1630
Feature activation+1.76e-02

Pos logit contributions
 ranc+0.030
 officers+0.028
 men+0.027
 policemen+0.027
 goats+0.027
Neg logit contributions
ig-0.018
 perfectly-0.015
 patiently-0.014
 marker-0.012
 Linda-0.012
Token.
Token ID13
Feature activation+4.42e-03

Pos logit contributions
 ranc+0.044
 officers+0.040
 policemen+0.039
 goats+0.039
 men+0.038
Neg logit contributions
ig-0.024
 perfectly-0.020
fully-0.017
 patiently-0.017
 toe-0.016
Token
Token ID198
Feature activation+6.57e-03

Pos logit contributions
 ranc+0.010
 officers+0.010
 men+0.009
 policemen+0.009
 goats+0.009
Neg logit contributions
ig-0.007
 perfectly-0.005
 patiently-0.005
 Linda-0.005
 marker-0.005
Token
Token ID198
Feature activation+5.61e-03

Pos logit contributions
 ranc+0.016
 officers+0.015
 men+0.014
 goats+0.014
 policemen+0.014
Neg logit contributions
ig-0.010
 perfectly-0.008
 patiently-0.007
 Linda-0.007
 dab-0.007
TokenThe
Token ID464
Feature activation+3.18e-02

Pos logit contributions
 ranc+0.014
 officers+0.013
 goats+0.012
 policemen+0.012
 men+0.012
Neg logit contributions
ig-0.007
on-0.006
 perfectly-0.006
 gently-0.006
zz-0.006
Token ranc
Token ID32205
Feature activation+1.72e-01

Pos logit contributions
 ranc+0.053
 pm+0.051
cases+0.051
 officers+0.050
uts+0.050
Neg logit contributions
 patiently-0.044
 brush-0.043
ig-0.042
 marker-0.041
 leaf-0.039
Tokenher
Token ID372
Feature activation+3.97e-03

Pos logit contributions
 ranc+0.419
 goats+0.408
 policemen+0.379
 officers+0.379
 seals+0.368
Neg logit contributions
ig-0.236
 perfectly-0.206
 patiently-0.183
 gently-0.164
on-0.162
Token heard
Token ID2982
Feature activation+3.50e-02

Pos logit contributions
 ranc+0.010
 goats+0.009
 cages+0.009
 policemen+0.009
 announcements+0.009
Neg logit contributions
ig-0.004
 perfectly-0.004
 patiently-0.003
zz-0.003
 gently-0.003
Token the
Token ID262
Feature activation+4.15e-02

Pos logit contributions
 ranc+0.092
 officers+0.083
 goats+0.081
 policemen+0.081
 men+0.080
Neg logit contributions
ig-0.038
 perfectly-0.030
 Linda-0.029
 patiently-0.029
 Kate-0.028
Token noise
Token ID7838
Feature activation+7.10e-02

Pos logit contributions
 ranc+0.065
 vendors+0.064
guards+0.063
 stalls+0.060
 seals+0.059
Neg logit contributions
 marker-0.059
 brush-0.058
ig-0.058
 patiently-0.055
 paste-0.055
Token and
Token ID290
Feature activation+2.54e-02

Pos logit contributions
 ranc+0.182
 officers+0.166
 goats+0.164
 policemen+0.163
 men+0.161
Neg logit contributions
ig-0.087
 perfectly-0.073
 patiently-0.070
fully-0.063
 gently-0.061
Token scratched
Token ID37468
Feature activation-1.88e-02

Pos logit contributions
 ranc+0.029
 officers+0.027
 men+0.027
 policemen+0.026
 goats+0.026
Neg logit contributions
ig-0.028
 perfectly-0.023
 patiently-0.021
inn-0.020
zz-0.020
Token.
Token ID13
Feature activation+1.41e-02

Pos logit contributions
ig+0.030
 perfectly+0.024
 patiently+0.022
 marker+0.021
 dab+0.021
Neg logit contributions
 ranc-0.042
 men-0.040
 officers-0.040
 policemen-0.039
 goats-0.038
Token
Token ID198
Feature activation+1.15e-02

Pos logit contributions
 ranc+0.030
 officers+0.028
 men+0.027
 goats+0.027
 policemen+0.026
Neg logit contributions
ig-0.022
 perfectly-0.020
 Kate-0.019
 Linda-0.018
 Emily-0.018
Token
Token ID198
Feature activation+9.42e-03

Pos logit contributions
 ranc+0.025
 officers+0.024
 men+0.023
 policemen+0.023
 goats+0.023
Neg logit contributions
ig-0.019
 perfectly-0.015
 patiently-0.014
 Linda-0.013
 marker-0.013
TokenThe
Token ID464
Feature activation+3.55e-02

Pos logit contributions
 ranc+0.022
 officers+0.021
 men+0.020
 goats+0.020
 policemen+0.019
Neg logit contributions
ig-0.013
 perfectly-0.011
 patiently-0.010
 gently-0.010
inn-0.010
Token ranc
Token ID32205
Feature activation+1.70e-01

Pos logit contributions
 vendors+0.065
 officers+0.064
uts+0.064
anes+0.063
irens+0.063
Neg logit contributions
ig-0.049
 leaf-0.043
 patiently-0.041
 brush-0.040
 crystal-0.038
Tokenher
Token ID372
Feature activation+1.31e-03

Pos logit contributions
 ranc+0.416
 goats+0.404
 policemen+0.380
 officers+0.374
 men+0.369
Neg logit contributions
ig-0.221
 perfectly-0.202
 patiently-0.179
 gently-0.162
fully-0.159
Token heard
Token ID2982
Feature activation+3.59e-02

Pos logit contributions
 ranc+0.003
 cages+0.003
 goats+0.003
 vendors+0.003
 policemen+0.003
Neg logit contributions
ig-0.001
 perfectly-0.001
 patiently-0.001
 proudly-0.001
 gently-0.001
Token the
Token ID262
Feature activation+4.31e-02

Pos logit contributions
 ranc+0.097
 officers+0.086
 goats+0.085
 vendors+0.081
 policemen+0.081
Neg logit contributions
ig-0.030
 Emily-0.028
 Kate-0.028
 patiently-0.026
 perfectly-0.026
Token noise
Token ID7838
Feature activation+6.92e-02

Pos logit contributions
 vendors+0.063
 ranc+0.061
guards+0.060
 officers+0.059
 stalls+0.057
Neg logit contributions
ig-0.073
 patiently-0.064
 leaf-0.064
 brush-0.063
 gently-0.062
Token and
Token ID290
Feature activation+2.70e-02

Pos logit contributions
 ranc+0.162
 officers+0.149
 men+0.147
 goats+0.146
 policemen+0.146
Neg logit contributions
ig-0.102
 perfectly-0.085
 patiently-0.079
fully-0.072
 toe-0.071
Token willing
Token ID4684
Feature activation+9.22e-03

Pos logit contributions
ig+0.003
 perfectly+0.003
 patiently+0.003
 marker+0.002
 Linda+0.002
Neg logit contributions
 ranc-0.004
 officers-0.004
 men-0.004
 policemen-0.004
 goats-0.004
Token.
Token ID13
Feature activation-1.41e-02

Pos logit contributions
 ranc+0.027
 men+0.024
 officers+0.024
 policemen+0.024
 goats+0.024
Neg logit contributions
ig-0.008
 perfectly-0.006
 patiently-0.005
 proudly-0.005
 gently-0.005
Token
Token ID198
Feature activation-6.01e-03

Pos logit contributions
ig+0.022
 perfectly+0.019
 patiently+0.019
 Linda+0.018
 Emily+0.018
Neg logit contributions
 ranc-0.030
 officers-0.028
 men-0.028
 policemen-0.027
 goats-0.027
Token
Token ID198
Feature activation-5.17e-03

Pos logit contributions
ig+0.009
 perfectly+0.007
 patiently+0.007
 Linda+0.006
on+0.006
Neg logit contributions
 ranc-0.014
 officers-0.013
 men-0.013
 policemen-0.013
 goats-0.013
TokenThe
Token ID464
Feature activation+2.53e-02

Pos logit contributions
ig+0.008
 patiently+0.006
on+0.006
 gently+0.006
 perfectly+0.006
Neg logit contributions
 ranc-0.011
 officers-0.011
 men-0.011
 goats-0.010
 policemen-0.010
Token ranc
Token ID32205
Feature activation+1.61e-01

Pos logit contributions
cases+0.050
 officers+0.050
 vendors+0.049
guards+0.049
 policemen+0.048
Neg logit contributions
ig-0.035
 patiently-0.029
 er-0.024
 leaf-0.024
 gently-0.024
Tokenher
Token ID372
Feature activation-1.25e-02

Pos logit contributions
 ranc+0.390
 goats+0.380
 policemen+0.362
 men+0.355
 officers+0.352
Neg logit contributions
ig-0.218
 perfectly-0.199
 patiently-0.188
fully-0.162
 gently-0.160
Token forg
Token ID5784
Feature activation+5.67e-02

Pos logit contributions
ig+0.018
 patiently+0.016
 perfectly+0.015
 gently+0.014
 proudly+0.014
Neg logit contributions
 ranc-0.026
 cages-0.025
 policemen-0.025
 announcements-0.024
 goats-0.024
Tokenave
Token ID1015
Feature activation-3.44e-02

Pos logit contributions
 ranc+0.111
 officers+0.102
 men+0.101
 policemen+0.099
 goats+0.098
Neg logit contributions
ig-0.107
 perfectly-0.091
 patiently-0.085
 marker-0.080
 Linda-0.080
Token them
Token ID606
Feature activation+6.09e-02

Pos logit contributions
ig+0.052
 perfectly+0.043
 patiently+0.039
 Linda+0.037
fully+0.036
Neg logit contributions
 ranc-0.080
 officers-0.071
 men-0.071
 policemen-0.070
 goats-0.069
Token,
Token ID11
Feature activation-1.61e-03

Pos logit contributions
 ranc+0.150
 men+0.139
 policemen+0.138
 officers+0.136
 goats+0.135
Neg logit contributions
ig-0.083
 perfectly-0.069
 patiently-0.066
 proudly-0.058
 gently-0.057
Token nodded
Token ID14464
Feature activation+2.43e-02

Pos logit contributions
ig+0.004
 perfectly+0.004
 patiently+0.003
 marker+0.003
 toe+0.003
Neg logit contributions
 ranc-0.005
 officers-0.004
 men-0.004
 goats-0.004
 policemen-0.004
Token.
Token ID13
Feature activation-2.83e-02

Pos logit contributions
 ranc+0.050
 officers+0.047
 men+0.046
 policemen+0.045
 goats+0.045
Neg logit contributions
ig-0.044
 perfectly-0.034
zz-0.031
 marker-0.031
 dab-0.031
Token They
Token ID1119
Feature activation+5.00e-02

Pos logit contributions
ig+0.057
 perfectly+0.050
 patiently+0.048
 Linda+0.045
 triangle+0.045
Neg logit contributions
 ranc-0.051
 officers-0.045
 men-0.045
 policemen-0.044
 goats-0.043
Token thanked
Token ID26280
Feature activation+3.38e-03

Pos logit contributions
 ranc+0.111
 men+0.104
 policemen+0.102
 officers+0.101
 cages+0.100
Neg logit contributions
ig-0.077
 perfectly-0.067
 patiently-0.063
 proudly-0.059
 gently-0.057
Token the
Token ID262
Feature activation+1.98e-02

Pos logit contributions
 ranc+0.008
 officers+0.007
 men+0.007
 policemen+0.007
 goats+0.007
Neg logit contributions
ig-0.005
 perfectly-0.004
 patiently-0.004
 marker-0.004
 toe-0.004
Token ranc
Token ID32205
Feature activation+1.60e-01

Pos logit contributions
 ranc+0.034
 officers+0.033
cases+0.032
 cages+0.031
irens+0.031
Neg logit contributions
ig-0.033
 patiently-0.030
 triangle-0.028
 leaf-0.027
 brush-0.027
Tokenher
Token ID372
Feature activation-2.76e-02

Pos logit contributions
 ranc+0.356
 goats+0.338
 policemen+0.338
 men+0.333
 officers+0.322
Neg logit contributions
ig-0.244
 patiently-0.229
 perfectly-0.221
 gently-0.197
 proudly-0.195
Token and
Token ID290
Feature activation-2.54e-04

Pos logit contributions
ig+0.034
 perfectly+0.028
 patiently+0.027
fully+0.022
 gently+0.022
Neg logit contributions
 ranc-0.070
 policemen-0.065
 men-0.064
 officers-0.063
 goats-0.063
Token hugged
Token ID41130
Feature activation+2.09e-03

Pos logit contributions
ig+0.000
 perfectly+0.000
 patiently+0.000
 marker+0.000
 toe+0.000
Neg logit contributions
 ranc-0.001
 officers-0.000
 men-0.000
 policemen-0.000
 goats-0.000
Token him
Token ID683
Feature activation+3.09e-02

Pos logit contributions
 ranc+0.005
 officers+0.005
 men+0.005
 policemen+0.005
 goats+0.005
Neg logit contributions
ig-0.003
 perfectly-0.002
 patiently-0.002
fully-0.002
 Linda-0.002
Token.
Token ID13
Feature activation-1.77e-02

Pos logit contributions
 ranc+0.074
 officers+0.069
 men+0.069
 policemen+0.068
 goats+0.067
Neg logit contributions
ig-0.045
 perfectly-0.037
 patiently-0.033
 marker-0.030
 toe-0.030
Token.
Token ID13
Feature activation-3.33e-03

Pos logit contributions
ig+0.008
 perfectly+0.007
 patiently+0.006
 marker+0.006
 Linda+0.006
Neg logit contributions
 ranc-0.013
 officers-0.012
 men-0.011
 policemen-0.011
 goats-0.011
Token<|endoftext|>
Token ID50256
Feature activation+9.36e-02

Pos logit contributions
ig+0.005
 perfectly+0.004
 Linda+0.004
 patiently+0.004
 Kate+0.003
Neg logit contributions
 ranc-0.008
 officers-0.007
 men-0.007
 goats-0.007
 policemen-0.007
TokenSam
Token ID16305
Feature activation+8.04e-03

Pos logit contributions
 ranc+0.203
 officers+0.188
 men+0.184
 policemen+0.182
 goats+0.180
Neg logit contributions
ig-0.157
 perfectly-0.131
 patiently-0.120
 Linda-0.114
 marker-0.113
Token was
Token ID373
Feature activation+4.03e-02

Pos logit contributions
 ranc+0.020
 officers+0.019
 men+0.019
 policemen+0.019
 goats+0.018
Neg logit contributions
ig-0.011
 perfectly-0.008
 patiently-0.007
 Linda-0.007
 marker-0.007
Token three
Token ID1115
Feature activation+3.33e-02

Pos logit contributions
 ranc+0.084
 officers+0.078
 men+0.077
 policemen+0.075
 goats+0.075
Neg logit contributions
ig-0.070
 perfectly-0.060
 patiently-0.056
 toe-0.051
 marker-0.051
Token years
Token ID812
Feature activation+1.60e-01

Pos logit contributions
 ranc+0.059
 officers+0.054
 announcements+0.048
 policemen+0.048
 goats+0.047
Neg logit contributions
ig-0.069
 perfectly-0.061
 patiently-0.058
 proudly-0.055
 toe-0.053
Token old
Token ID1468
Feature activation+5.90e-02

Pos logit contributions
 ranc+0.263
 officers+0.241
 goats+0.233
 policemen+0.229
 men+0.224
Neg logit contributions
ig-0.338
 patiently-0.298
 perfectly-0.294
 proudly-0.272
 triangle-0.271
Token and
Token ID290
Feature activation+5.05e-02

Pos logit contributions
 ranc+0.136
 officers+0.125
 men+0.124
 policemen+0.122
 goats+0.120
Neg logit contributions
ig-0.087
 perfectly-0.071
 patiently-0.069
 proudly-0.061
 toe-0.061
Token he
Token ID339
Feature activation+6.28e-02

Pos logit contributions
 ranc+0.113
 officers+0.106
 goats+0.102
 policemen+0.102
 men+0.102
Neg logit contributions
ig-0.077
 perfectly-0.068
 patiently-0.064
 Linda-0.058
 proudly-0.058
Token loved
Token ID6151
Feature activation+1.65e-02

Pos logit contributions
 ranc+0.160
 officers+0.149
 goats+0.147
 men+0.146
 policemen+0.146
Neg logit contributions
ig-0.078
 perfectly-0.071
 patiently-0.061
 proudly-0.059
 toe-0.055
Token to
Token ID284
Feature activation+4.47e-02

Pos logit contributions
 ranc+0.041
 officers+0.038
 men+0.038
 policemen+0.037
 goats+0.037
Neg logit contributions
ig-0.021
 perfectly-0.017
 patiently-0.016
 marker-0.015
 coloring-0.014
Token<|endoftext|>
Token ID50256
Feature activation+8.68e-02

Pos logit contributions
ig+0.063
 perfectly+0.054
 patiently+0.051
 marker+0.049
 toe+0.047
Neg logit contributions
 ranc-0.051
 officers-0.044
 policemen-0.043
 goats-0.043
 men-0.043
TokenJ
Token ID41
Feature activation-1.18e-02

Pos logit contributions
 ranc+0.209
 officers+0.196
 men+0.192
 policemen+0.190
 goats+0.188
Neg logit contributions
ig-0.125
 perfectly-0.100
 patiently-0.089
on-0.087
 Linda-0.087
Tokenenna
Token ID13713
Feature activation-2.70e-02

Pos logit contributions
ig+0.013
 perfectly+0.011
 patiently+0.010
 toe+0.009
 triangle+0.008
Neg logit contributions
 ranc-0.030
 officers-0.029
 men-0.028
 goats-0.028
 policemen-0.027
Token was
Token ID373
Feature activation+4.18e-02

Pos logit contributions
ig+0.040
 perfectly+0.032
 patiently+0.030
 proudly+0.027
 marker+0.027
Neg logit contributions
 ranc-0.064
 officers-0.060
 men-0.059
 policemen-0.059
 goats-0.057
Token 3
Token ID513
Feature activation+4.62e-02

Pos logit contributions
 ranc+0.082
 men+0.073
 policemen+0.072
 officers+0.072
 goats+0.071
Neg logit contributions
ig-0.079
 perfectly-0.062
 marker-0.059
mate-0.059
fully-0.058
Token years
Token ID812
Feature activation+1.59e-01

Pos logit contributions
 ranc+0.076
 officers+0.069
 men+0.065
 policemen+0.065
 goats+0.062
Neg logit contributions
ig-0.101
 perfectly-0.091
 patiently-0.086
 proudly-0.081
 toe-0.081
Token old
Token ID1468
Feature activation+6.09e-02

Pos logit contributions
 ranc+0.244
 officers+0.221
 men+0.212
 policemen+0.212
 goats+0.211
Neg logit contributions
ig-0.363
 perfectly-0.317
 patiently-0.313
 Linda-0.290
 toe-0.289
Token.
Token ID13
Feature activation+2.53e-02

Pos logit contributions
 ranc+0.136
 officers+0.126
 men+0.126
 policemen+0.123
 goats+0.122
Neg logit contributions
ig-0.096
 perfectly-0.080
 patiently-0.074
 marker-0.068
 Linda-0.067
Token She
Token ID1375
Feature activation+1.32e-02

Pos logit contributions
 ranc+0.045
 officers+0.041
 men+0.040
 policemen+0.039
 goats+0.038
Neg logit contributions
ig-0.054
 perfectly-0.044
 patiently-0.041
zz-0.040
mate-0.039
Token liked
Token ID8288
Feature activation+1.54e-03

Pos logit contributions
 ranc+0.029
 officers+0.027
 men+0.027
 policemen+0.027
 goats+0.026
Neg logit contributions
ig-0.021
 perfectly-0.018
 patiently-0.017
 marker-0.015
 toe-0.015
Token to
Token ID284
Feature activation+4.21e-02

Pos logit contributions
 ranc+0.004
 officers+0.003
 men+0.003
 policemen+0.003
 goats+0.003
Neg logit contributions
ig-0.002
 perfectly-0.002
 patiently-0.002
 Linda-0.001
 marker-0.001
Token<|endoftext|>
Token ID50256
Feature activation+8.99e-02

Pos logit contributions
ig+0.054
 perfectly+0.045
 patiently+0.043
mate+0.043
fully+0.042
Neg logit contributions
 ranc-0.039
 officers-0.036
 men-0.035
 policemen-0.033
 goats-0.033
TokenSam
Token ID16305
Feature activation+8.49e-03

Pos logit contributions
 ranc+0.181
 officers+0.167
 men+0.164
 policemen+0.162
 goats+0.160
Neg logit contributions
ig-0.164
 perfectly-0.139
 patiently-0.128
 Linda-0.123
 marker-0.122
Tokenmy
Token ID1820
Feature activation-4.36e-03

Pos logit contributions
 ranc+0.020
 officers+0.019
 men+0.018
 goats+0.018
 policemen+0.018
Neg logit contributions
ig-0.012
 perfectly-0.010
 patiently-0.009
 marker-0.008
mate-0.008
Token was
Token ID373
Feature activation+3.79e-02

Pos logit contributions
ig+0.007
 perfectly+0.006
 patiently+0.005
 marker+0.005
 toe+0.005
Neg logit contributions
 ranc-0.010
 officers-0.009
 men-0.009
 policemen-0.009
 goats-0.009
Token 3
Token ID513
Feature activation+4.33e-02

Pos logit contributions
 ranc+0.077
 men+0.071
 officers+0.070
 policemen+0.069
 goats+0.068
Neg logit contributions
ig-0.068
 perfectly-0.055
fully-0.050
 patiently-0.050
 marker-0.050
Token years
Token ID812
Feature activation+1.59e-01

Pos logit contributions
 ranc+0.077
 officers+0.069
 men+0.065
 policemen+0.064
 announcements+0.063
Neg logit contributions
ig-0.088
 perfectly-0.079
 patiently-0.076
 proudly-0.072
 toe-0.072
Token old
Token ID1468
Feature activation+5.89e-02

Pos logit contributions
 ranc+0.248
 officers+0.225
 goats+0.216
 men+0.216
 policemen+0.215
Neg logit contributions
ig-0.353
 perfectly-0.310
 patiently-0.309
 proudly-0.284
 Linda-0.283
Token and
Token ID290
Feature activation+4.76e-02

Pos logit contributions
 ranc+0.131
 officers+0.121
 men+0.120
 policemen+0.118
 goats+0.117
Neg logit contributions
ig-0.094
 perfectly-0.078
 patiently-0.073
 marker-0.066
 toe-0.066
Token loved
Token ID6151
Feature activation+1.83e-02

Pos logit contributions
 ranc+0.091
 men+0.083
 officers+0.082
 policemen+0.080
 goats+0.080
Neg logit contributions
ig-0.094
 perfectly-0.077
 patiently-0.071
zz-0.069
 marker-0.069
Token to
Token ID284
Feature activation+4.67e-02

Pos logit contributions
 ranc+0.044
 officers+0.041
 men+0.040
 policemen+0.040
 goats+0.039
Neg logit contributions
ig-0.026
 perfectly-0.021
 patiently-0.020
 marker-0.018
 toe-0.018
Token play
Token ID711
Feature activation+1.47e-02

Pos logit contributions
 ranc+0.098
 men+0.089
 goats+0.089
 officers+0.088
 policemen+0.086
Neg logit contributions
ig-0.081
 perfectly-0.068
 patiently-0.061
inn-0.059
fully-0.057
Token and
Token ID290
Feature activation+1.24e-02

Pos logit contributions
 ranc+0.228
 goats+0.203
 policemen+0.199
 announcements+0.194
 officers+0.192
Neg logit contributions
 perfectly-0.086
ig-0.084
on-0.069
 patiently-0.067
 coloring-0.067
Token planes
Token ID13016
Feature activation+8.32e-02

Pos logit contributions
 ranc+0.018
 officers+0.015
 goats+0.014
 vendors+0.014
 policemen+0.013
Neg logit contributions
ig-0.023
 perfectly-0.023
 pencil-0.022
 toe-0.022
 Linda-0.022
Token.
Token ID13
Feature activation+1.53e-02

Pos logit contributions
 ranc+0.217
 policemen+0.196
 goats+0.191
 men+0.189
 officers+0.187
Neg logit contributions
ig-0.096
 perfectly-0.090
fully-0.074
 patiently-0.072
 proudly-0.069
Token There
Token ID1318
Feature activation-2.54e-02

Pos logit contributions
 ranc+0.034
 officers+0.031
 men+0.030
 goats+0.030
 policemen+0.030
Neg logit contributions
ig-0.022
 Linda-0.019
 Kate-0.019
 perfectly-0.019
 Emily-0.018
Token were
Token ID547
Feature activation+1.47e-02

Pos logit contributions
ig+0.037
 perfectly+0.030
on+0.028
 Kate+0.027
 triangle+0.025
Neg logit contributions
 ranc-0.062
 officers-0.056
 goats-0.055
 policemen-0.054
 announcements-0.051
Token men
Token ID1450
Feature activation+1.58e-01

Pos logit contributions
 ranc+0.027
 officers+0.024
 goats+0.023
 MacDonald+0.021
 vendors+0.020
Neg logit contributions
 perfectly-0.024
ig-0.022
 coloring-0.021
 patiently-0.021
 Kate-0.021
Token and
Token ID290
Feature activation+2.92e-02

Pos logit contributions
 ranc+0.453
 goats+0.402
 officers+0.399
 announcements+0.397
 policemen+0.396
Neg logit contributions
ig-0.132
 perfectly-0.121
 coloring-0.104
 patiently-0.100
 proudly-0.091
Token women
Token ID1466
Feature activation+1.06e-01

Pos logit contributions
 ranc+0.039
umpy+0.032
 officers+0.032
 vendors+0.031
 pm+0.031
Neg logit contributions
 perfectly-0.053
ig-0.053
 toe-0.052
 Linda-0.052
 pencil-0.052
Token wearing
Token ID5762
Feature activation+2.57e-02

Pos logit contributions
 ranc+0.278
 goats+0.245
 policemen+0.244
 men+0.243
 officers+0.241
Neg logit contributions
ig-0.104
 perfectly-0.103
 coloring-0.092
 patiently-0.085
 proudly-0.082
Token bright
Token ID6016
Feature activation-3.11e-02

Pos logit contributions
 ranc+0.062
 officers+0.056
 men+0.055
 policemen+0.055
 goats+0.054
Neg logit contributions
 perfectly-0.030
ig-0.029
 pencil-0.024
 patiently-0.023
 marker-0.023
Token v
Token ID410
Feature activation+2.63e-02

Pos logit contributions
ig+0.057
 perfectly+0.049
 marker+0.047
 patiently+0.046
 coloring+0.045
Neg logit contributions
 ranc-0.061
 officers-0.054
 policemen-0.051
 goats-0.051
 men-0.050
Token spoiled
Token ID34781
Feature activation-3.53e-03

Pos logit contributions
ig+0.015
 perfectly+0.013
 patiently+0.012
 marker+0.011
 toe+0.011
Neg logit contributions
 ranc-0.019
 officers-0.018
 men-0.018
 policemen-0.017
 goats-0.017
Token and
Token ID290
Feature activation-1.25e-02

Pos logit contributions
ig+0.005
 perfectly+0.005
 patiently+0.004
fully+0.004
 toe+0.004
Neg logit contributions
 ranc-0.008
 officers-0.008
 policemen-0.007
 men-0.007
 goats-0.007
Token mold
Token ID15936
Feature activation-3.79e-02

Pos logit contributions
ig+0.020
 perfectly+0.018
 patiently+0.016
 toe+0.014
 marker+0.014
Neg logit contributions
 ranc-0.028
 officers-0.025
 men-0.025
 policemen-0.025
 goats-0.025
Tokeny
Token ID88
Feature activation-4.61e-02

Pos logit contributions
ig+0.056
 perfectly+0.047
 patiently+0.041
fully+0.038
 Linda+0.038
Neg logit contributions
 ranc-0.090
 officers-0.083
 men-0.082
 policemen-0.082
 goats-0.081
Token and
Token ID290
Feature activation-9.06e-03

Pos logit contributions
ig+0.076
 perfectly+0.067
 patiently+0.058
 toe+0.056
 marker+0.055
Neg logit contributions
 ranc-0.101
 officers-0.092
 policemen-0.090
 men-0.090
 goats-0.088
Token ranc
Token ID32205
Feature activation+1.58e-01

Pos logit contributions
ig+0.015
 perfectly+0.013
 patiently+0.012
 marker+0.011
 toe+0.011
Neg logit contributions
 ranc-0.019
 officers-0.018
 men-0.017
 policemen-0.017
 goats-0.017
Tokenid
Token ID312
Feature activation+1.56e-02

Pos logit contributions
 ranc+0.399
 goats+0.380
 policemen+0.377
 men+0.376
 seals+0.364
Neg logit contributions
 perfectly-0.196
ig-0.174
 patiently-0.160
fully-0.151
 proudly-0.133
Token and
Token ID290
Feature activation-1.24e-03

Pos logit contributions
 ranc+0.036
 policemen+0.033
 men+0.033
 officers+0.032
 goats+0.032
Neg logit contributions
ig-0.024
 perfectly-0.021
 patiently-0.017
fully-0.017
 proudly-0.016
Token expired
Token ID21350
Feature activation+2.10e-02

Pos logit contributions
ig+0.002
 perfectly+0.002
 patiently+0.002
 toe+0.002
 marker+0.002
Neg logit contributions
 ranc-0.003
 officers-0.002
 men-0.002
 policemen-0.002
 goats-0.002
Token and
Token ID290
Feature activation-2.74e-03

Pos logit contributions
 ranc+0.050
 officers+0.045
 policemen+0.045
 men+0.045
 goats+0.044
Neg logit contributions
ig-0.031
 perfectly-0.027
 patiently-0.024
fully-0.021
 toe-0.021
Token best
Token ID1266
Feature activation-6.13e-03

Pos logit contributions
ig+0.005
 perfectly+0.004
 patiently+0.004
 marker+0.003
 toe+0.003
Neg logit contributions
 ranc-0.006
 officers-0.005
 men-0.005
 policemen-0.005
 goats-0.005
Token.
Token ID13
Feature activation-1.55e-02

Pos logit contributions
ig+0.020
 perfectly+0.017
 patiently+0.016
 Linda+0.015
zz+0.015
Neg logit contributions
 ranc-0.020
 officers-0.019
 men-0.019
 goats-0.018
 policemen-0.018
Token<|endoftext|>
Token ID50256
Feature activation+8.39e-02

Pos logit contributions
ig+0.029
 perfectly+0.024
 patiently+0.022
 marker+0.021
zz+0.021
Neg logit contributions
 ranc-0.031
 officers-0.029
 men-0.028
 policemen-0.028
 goats-0.028
TokenJohnny
Token ID44960
Feature activation+9.51e-03

Pos logit contributions
 ranc+0.175
 officers+0.162
 men+0.159
 policemen+0.157
 goats+0.155
Neg logit contributions
ig-0.147
 perfectly-0.123
 patiently-0.114
 Linda-0.107
 marker-0.107
Token was
Token ID373
Feature activation+3.66e-02

Pos logit contributions
 ranc+0.022
 officers+0.020
 men+0.020
 policemen+0.020
 goats+0.020
Neg logit contributions
ig-0.015
 perfectly-0.012
 patiently-0.011
 marker-0.010
 Linda-0.010
Token 3
Token ID513
Feature activation+4.47e-02

Pos logit contributions
 ranc+0.069
 men+0.063
 policemen+0.061
 officers+0.061
 goats+0.061
Neg logit contributions
ig-0.070
 perfectly-0.057
 marker-0.053
mate-0.053
fully-0.052
Token years
Token ID812
Feature activation+1.58e-01

Pos logit contributions
 ranc+0.077
 officers+0.070
 policemen+0.065
 men+0.064
 goats+0.063
Neg logit contributions
ig-0.095
 perfectly-0.085
 patiently-0.080
 proudly-0.077
 toe-0.076
Token old
Token ID1468
Feature activation+6.08e-02

Pos logit contributions
 ranc+0.246
 officers+0.224
 policemen+0.214
 goats+0.213
 men+0.212
Neg logit contributions
ig-0.355
 perfectly-0.311
 patiently-0.307
 triangle-0.283
 proudly-0.283
Token and
Token ID290
Feature activation+4.78e-02

Pos logit contributions
 ranc+0.142
 officers+0.131
 men+0.131
 policemen+0.129
 goats+0.127
Neg logit contributions
ig-0.087
 perfectly-0.072
 patiently-0.069
 proudly-0.061
 toe-0.061
Token he
Token ID339
Feature activation+5.88e-02

Pos logit contributions
 ranc+0.100
 officers+0.093
 men+0.091
 policemen+0.090
 goats+0.089
Neg logit contributions
ig-0.083
 perfectly-0.070
 patiently-0.066
 marker-0.061
 Linda-0.061
Token liked
Token ID8288
Feature activation+4.53e-03

Pos logit contributions
 ranc+0.130
 officers+0.121
 men+0.119
 policemen+0.117
 goats+0.116
Neg logit contributions
ig-0.096
 perfectly-0.080
 patiently-0.074
 marker-0.068
 toe-0.068
Token to
Token ID284
Feature activation+3.98e-02

Pos logit contributions
 ranc+0.011
 officers+0.011
 men+0.010
 policemen+0.010
 goats+0.010
Neg logit contributions
ig-0.006
 perfectly-0.005
 patiently-0.004
 marker-0.004
 toe-0.004
Token<|endoftext|>
Token ID50256
Feature activation+8.13e-02

Pos logit contributions
ig+0.040
 perfectly+0.034
 patiently+0.031
mate+0.030
 marker+0.030
Neg logit contributions
 ranc-0.036
 officers-0.033
 men-0.032
 goats-0.032
 policemen-0.031
TokenJ
Token ID41
Feature activation-1.61e-02

Pos logit contributions
 ranc+0.188
 officers+0.175
 men+0.171
 policemen+0.169
 goats+0.168
Neg logit contributions
ig-0.125
 perfectly-0.102
 patiently-0.093
 Linda-0.088
 marker-0.087
Tokenill
Token ID359
Feature activation-2.26e-02

Pos logit contributions
ig+0.017
 perfectly+0.014
 patiently+0.012
 toe+0.011
 triangle+0.011
Neg logit contributions
 ranc-0.042
 officers-0.040
 men-0.039
 goats-0.038
 policemen-0.038
Token was
Token ID373
Feature activation+3.61e-02

Pos logit contributions
ig+0.035
 perfectly+0.028
 patiently+0.026
 marker+0.024
 toe+0.023
Neg logit contributions
 ranc-0.052
 officers-0.049
 men-0.048
 policemen-0.048
 goats-0.047
Token 3
Token ID513
Feature activation+4.24e-02

Pos logit contributions
 ranc+0.066
 men+0.060
 policemen+0.058
 officers+0.058
 goats+0.057
Neg logit contributions
ig-0.072
 perfectly-0.055
fully-0.055
mate-0.054
 marker-0.054
Token years
Token ID812
Feature activation+1.58e-01

Pos logit contributions
 ranc+0.073
 officers+0.067
 policemen+0.061
 men+0.061
 goats+0.060
Neg logit contributions
ig-0.091
 perfectly-0.082
 patiently-0.078
 proudly-0.074
 toe-0.073
Token old
Token ID1468
Feature activation+6.13e-02

Pos logit contributions
 ranc+0.234
 officers+0.211
 men+0.203
 policemen+0.201
 goats+0.200
Neg logit contributions
ig-0.369
 perfectly-0.325
 patiently-0.316
 marker-0.295
 Linda-0.295
Token.
Token ID13
Feature activation+2.66e-02

Pos logit contributions
 ranc+0.134
 officers+0.124
 men+0.123
 policemen+0.121
 goats+0.120
Neg logit contributions
ig-0.101
 perfectly-0.084
 patiently-0.078
 marker-0.072
 Linda-0.072
Token She
Token ID1375
Feature activation+1.29e-02

Pos logit contributions
 ranc+0.049
 men+0.046
 officers+0.045
 policemen+0.044
 vendors+0.043
Neg logit contributions
ig-0.054
 perfectly-0.040
zz-0.038
 patiently-0.038
mate-0.038
Token was
Token ID373
Feature activation+2.03e-02

Pos logit contributions
 ranc+0.027
 officers+0.025
 men+0.025
 policemen+0.024
 goats+0.024
Neg logit contributions
ig-0.023
 perfectly-0.019
 patiently-0.018
 marker-0.016
 Linda-0.016
Token learning
Token ID4673
Feature activation+6.26e-02

Pos logit contributions
 ranc+0.041
 men+0.037
 officers+0.036
 policemen+0.036
 goats+0.035
Neg logit contributions
ig-0.039
 perfectly-0.030
zz-0.028
mate-0.028
 marker-0.028
Token.
Token ID13
Feature activation-1.73e-02

Pos logit contributions
ig+0.049
 perfectly+0.040
 patiently+0.037
 marker+0.034
 Linda+0.034
Neg logit contributions
 ranc-0.070
 officers-0.065
 men-0.065
 goats-0.063
 policemen-0.063
Token<|endoftext|>
Token ID50256
Feature activation+7.13e-02

Pos logit contributions
ig+0.030
 perfectly+0.025
 patiently+0.024
 marker+0.022
 toe+0.022
Neg logit contributions
 ranc-0.036
 officers-0.033
 men-0.033
 policemen-0.032
 goats-0.032
TokenJohn
Token ID7554
Feature activation+1.10e-02

Pos logit contributions
 ranc+0.131
 officers+0.119
 men+0.119
 policemen+0.115
 goats+0.115
Neg logit contributions
ig-0.139
 perfectly-0.121
 patiently-0.115
 marker-0.106
fully-0.105
Token was
Token ID373
Feature activation+3.15e-02

Pos logit contributions
 ranc+0.023
 officers+0.022
 men+0.021
 goats+0.021
 policemen+0.021
Neg logit contributions
ig-0.018
 perfectly-0.016
 patiently-0.014
mate-0.014
 Linda-0.013
Token 3
Token ID513
Feature activation+4.89e-02

Pos logit contributions
 ranc+0.065
 men+0.060
 officers+0.058
 goats+0.058
 policemen+0.058
Neg logit contributions
ig-0.055
 perfectly-0.045
mate-0.041
 marker-0.041
 patiently-0.040
Token years
Token ID812
Feature activation+1.57e-01

Pos logit contributions
 ranc+0.084
 officers+0.077
 goats+0.070
 policemen+0.069
 men+0.068
Neg logit contributions
ig-0.103
 perfectly-0.088
 toe-0.084
 patiently-0.082
 proudly-0.079
Token old
Token ID1468
Feature activation+6.74e-02

Pos logit contributions
 ranc+0.260
 officers+0.239
 policemen+0.230
 goats+0.228
 men+0.223
Neg logit contributions
ig-0.330
 patiently-0.292
 perfectly-0.287
 proudly-0.266
fully-0.261
Token.
Token ID13
Feature activation+3.19e-02

Pos logit contributions
 ranc+0.156
 men+0.144
 officers+0.143
 policemen+0.142
 goats+0.140
Neg logit contributions
ig-0.097
 perfectly-0.079
 patiently-0.075
 toe-0.069
 proudly-0.068
Token He
Token ID679
Feature activation+3.36e-02

Pos logit contributions
 ranc+0.057
 officers+0.052
 men+0.051
 goats+0.050
 policemen+0.050
Neg logit contributions
ig-0.064
 perfectly-0.056
 patiently-0.054
 Linda-0.050
 marker-0.050
Token loved
Token ID6151
Feature activation+4.55e-03

Pos logit contributions
 ranc+0.074
 officers+0.068
 men+0.068
 policemen+0.066
 goats+0.066
Neg logit contributions
ig-0.055
 perfectly-0.045
 patiently-0.042
 marker-0.039
 Linda-0.038
Token to
Token ID284
Feature activation+4.33e-02

Pos logit contributions
 ranc+0.011
 officers+0.010
 men+0.010
 policemen+0.010
 goats+0.010
Neg logit contributions
ig-0.007
 perfectly-0.005
 patiently-0.005
 marker-0.004
 Linda-0.004
Token.
Token ID13
Feature activation-1.93e-02

Pos logit contributions
 ranc+0.101
 officers+0.092
 policemen+0.091
 men+0.091
 goats+0.090
Neg logit contributions
ig-0.063
 perfectly-0.053
 patiently-0.048
 marker-0.044
fully-0.044
Token<|endoftext|>
Token ID50256
Feature activation+8.84e-02

Pos logit contributions
ig+0.029
 perfectly+0.024
 patiently+0.022
 Linda+0.022
 Kate+0.021
Neg logit contributions
 ranc-0.045
 officers-0.041
 men-0.041
 policemen-0.040
 goats-0.040
TokenJohn
Token ID7554
Feature activation+1.95e-02

Pos logit contributions
 ranc+0.184
 officers+0.170
 policemen+0.164
 men+0.164
 goats+0.162
Neg logit contributions
ig-0.158
 perfectly-0.133
 patiently-0.119
 Linda-0.118
on-0.117
Token was
Token ID373
Feature activation+3.65e-02

Pos logit contributions
 ranc+0.044
 officers+0.041
 men+0.040
 policemen+0.040
 goats+0.039
Neg logit contributions
ig-0.031
 perfectly-0.025
 patiently-0.023
 marker-0.021
 toe-0.021
Token three
Token ID1115
Feature activation+3.35e-02

Pos logit contributions
 ranc+0.078
 officers+0.072
 men+0.071
 policemen+0.070
 goats+0.069
Neg logit contributions
ig-0.062
 perfectly-0.051
 patiently-0.047
 marker-0.045
 Linda-0.044
Token years
Token ID812
Feature activation+1.57e-01

Pos logit contributions
 ranc+0.058
 officers+0.054
 policemen+0.048
 announcements+0.047
 men+0.047
Neg logit contributions
ig-0.070
 perfectly-0.063
 patiently-0.059
 proudly-0.057
 toe-0.056
Token old
Token ID1468
Feature activation+5.69e-02

Pos logit contributions
 ranc+0.261
 officers+0.239
 goats+0.230
 policemen+0.228
 men+0.223
Neg logit contributions
ig-0.329
 patiently-0.292
 perfectly-0.289
 proudly-0.267
 triangle-0.262
Token and
Token ID290
Feature activation+5.02e-02

Pos logit contributions
 ranc+0.133
 men+0.122
 officers+0.122
 policemen+0.121
 goats+0.119
Neg logit contributions
ig-0.080
 perfectly-0.066
 patiently-0.064
 proudly-0.057
 toe-0.057
Token very
Token ID845
Feature activation+6.30e-02

Pos logit contributions
 ranc+0.104
 officers+0.096
 men+0.095
 policemen+0.093
 goats+0.092
Neg logit contributions
ig-0.089
 perfectly-0.075
 patiently-0.070
 marker-0.065
 Linda-0.064
Token curious
Token ID11040
Feature activation+7.10e-02

Pos logit contributions
 ranc+0.121
 officers+0.113
 goats+0.108
 policemen+0.107
 men+0.106
Neg logit contributions
ig-0.119
 perfectly-0.107
 patiently-0.100
 proudly-0.094
 toe-0.094
Token.
Token ID13
Feature activation+2.56e-02

Pos logit contributions
 ranc+0.176
 officers+0.160
 policemen+0.157
 men+0.157
 goats+0.157
Neg logit contributions
ig-0.090
 perfectly-0.077
 patiently-0.071
fully-0.068
 proudly-0.066
Token the
Token ID262
Feature activation-7.68e-03

Pos logit contributions
ig+0.053
 perfectly+0.043
 patiently+0.038
 marker+0.034
 Linda+0.034
Neg logit contributions
 ranc-0.098
 officers-0.091
 men-0.091
 policemen-0.089
 goats-0.089
Token new
Token ID649
Feature activation-7.27e-02

Pos logit contributions
ig+0.015
 perfectly+0.013
 patiently+0.012
 triangle+0.012
 leaf+0.011
Neg logit contributions
 ranc-0.014
 officers-0.013
 men-0.013
 policemen-0.013
 goats-0.012
Token horse
Token ID8223
Feature activation-9.48e-03

Pos logit contributions
ig+0.145
 perfectly+0.120
 patiently+0.115
 triangle+0.110
 toe+0.109
Neg logit contributions
 ranc-0.135
 officers-0.125
 policemen-0.121
 men-0.121
 goats-0.116
Token when
Token ID618
Feature activation-2.68e-02

Pos logit contributions
ig+0.014
 perfectly+0.012
 patiently+0.010
fully+0.009
 toe+0.009
Neg logit contributions
 ranc-0.023
 men-0.021
 officers-0.021
 policemen-0.021
 goats-0.020
Token the
Token ID262
Feature activation+1.09e-02

Pos logit contributions
ig+0.029
 perfectly+0.026
 patiently+0.022
 Kate+0.021
 proudly+0.020
Neg logit contributions
 ranc-0.072
 officers-0.066
 goats-0.065
 policemen-0.063
 men-0.062
Token ranc
Token ID32205
Feature activation+1.57e-01

Pos logit contributions
 ranc+0.020
 officers+0.019
 policemen+0.018
 men+0.018
 goats+0.018
Neg logit contributions
ig-0.020
 perfectly-0.017
 patiently-0.016
 triangle-0.016
 toe-0.015
Tokenher
Token ID372
Feature activation-1.77e-02

Pos logit contributions
 ranc+0.373
 goats+0.373
 policemen+0.355
 men+0.350
 seals+0.349
Neg logit contributions
ig-0.177
 patiently-0.171
 perfectly-0.165
 proudly-0.145
rap-0.136
Token was
Token ID373
Feature activation-2.25e-03

Pos logit contributions
ig+0.023
 perfectly+0.020
 patiently+0.018
 proudly+0.017
zz+0.016
Neg logit contributions
 ranc-0.042
 policemen-0.038
 cages-0.038
 goats-0.037
 men-0.037
Token not
Token ID407
Feature activation+6.78e-02

Pos logit contributions
ig+0.003
 perfectly+0.003
 patiently+0.003
 proudly+0.003
 coloring+0.003
Neg logit contributions
 ranc-0.005
 officers-0.005
 goats-0.005
 men-0.004
 policemen-0.004
Token looking
Token ID2045
Feature activation+2.49e-02

Pos logit contributions
 ranc+0.159
 officers+0.148
 men+0.146
 policemen+0.145
 goats+0.143
Neg logit contributions
ig-0.097
 perfectly-0.085
 patiently-0.080
fully-0.070
 coloring-0.068
Token.
Token ID13
Feature activation+1.55e-02

Pos logit contributions
 ranc+0.067
 policemen+0.061
 officers+0.061
 goats+0.060
 cages+0.059
Neg logit contributions
ig-0.026
 perfectly-0.023
 patiently-0.018
 proudly-0.015
fully-0.015
Token Ada
Token ID47395
Feature activation+3.65e-02

Pos logit contributions
 ranc+0.187
 officers+0.166
 goats+0.165
 cages+0.161
 men+0.160
Neg logit contributions
ig-0.134
 perfectly-0.096
 patiently-0.091
 marker-0.089
bul-0.088
Token.
Token ID13
Feature activation+2.91e-02

Pos logit contributions
 ranc+0.075
 officers+0.068
 men+0.068
 goats+0.067
 policemen+0.066
Neg logit contributions
ig-0.064
 perfectly-0.055
 patiently-0.050
 Linda-0.048
 marker-0.048
Token She
Token ID1375
Feature activation+1.27e-02

Pos logit contributions
 ranc+0.058
 officers+0.053
 men+0.053
 policemen+0.051
 goats+0.051
Neg logit contributions
ig-0.054
 perfectly-0.045
 patiently-0.042
 toe-0.040
 marker-0.040
Token was
Token ID373
Feature activation+2.48e-02

Pos logit contributions
 ranc+0.024
 men+0.021
 officers+0.021
 goats+0.020
 policemen+0.020
Neg logit contributions
ig-0.025
 perfectly-0.021
 patiently-0.020
mate-0.019
 Linda-0.019
Token 3
Token ID513
Feature activation+4.02e-02

Pos logit contributions
 ranc+0.054
 officers+0.049
 men+0.049
 policemen+0.048
 goats+0.048
Neg logit contributions
ig-0.043
 perfectly-0.034
 patiently-0.031
fully-0.031
 marker-0.030
Token years
Token ID812
Feature activation+1.56e-01

Pos logit contributions
 ranc+0.071
 officers+0.066
 men+0.062
 policemen+0.060
 goats+0.059
Neg logit contributions
ig-0.084
 perfectly-0.071
 toe-0.067
 patiently-0.066
 Kate-0.065
Token old
Token ID1468
Feature activation+6.01e-02

Pos logit contributions
 ranc+0.232
 officers+0.211
 men+0.203
 policemen+0.201
 goats+0.200
Neg logit contributions
ig-0.362
 perfectly-0.317
 patiently-0.309
 toe-0.289
 Linda-0.288
Token,
Token ID11
Feature activation+4.51e-02

Pos logit contributions
 ranc+0.132
 officers+0.123
 men+0.122
 policemen+0.120
 goats+0.119
Neg logit contributions
ig-0.098
 perfectly-0.081
 patiently-0.075
 marker-0.069
 toe-0.069
Token and
Token ID290
Feature activation+3.99e-02

Pos logit contributions
 ranc+0.093
 officers+0.084
 goats+0.083
 men+0.083
 policemen+0.081
Neg logit contributions
ig-0.082
 perfectly-0.066
 patiently-0.060
zz-0.059
mate-0.058
Token loved
Token ID6151
Feature activation+1.27e-02

Pos logit contributions
 ranc+0.076
 men+0.070
 officers+0.067
 policemen+0.066
 goats+0.065
Neg logit contributions
ig-0.083
zz-0.059
 perfectly-0.059
 patiently-0.057
mate-0.057
Token playing
Token ID2712
Feature activation+3.67e-02

Pos logit contributions
 ranc+0.030
 officers+0.028
 men+0.028
 policemen+0.027
 goats+0.027
Neg logit contributions
ig-0.019
 perfectly-0.015
 patiently-0.014
 Linda-0.013
 marker-0.013
Token twins
Token ID20345
Feature activation+1.34e-01

Pos logit contributions
 ranc+0.100
 officers+0.092
 men+0.091
 policemen+0.090
 goats+0.089
Neg logit contributions
ig-0.089
 perfectly-0.075
 patiently-0.070
 marker-0.065
 Linda-0.065
Token.
Token ID13
Feature activation+3.08e-02

Pos logit contributions
 ranc+0.325
 officers+0.304
 men+0.303
 policemen+0.300
 goats+0.292
Neg logit contributions
ig-0.186
 perfectly-0.154
 patiently-0.145
 proudly-0.131
 coloring-0.126
Token They
Token ID1119
Feature activation+8.02e-02

Pos logit contributions
 ranc+0.044
 officers+0.038
 men+0.038
 goats+0.037
 policemen+0.036
Neg logit contributions
ig-0.077
 perfectly-0.068
 patiently-0.063
 marker-0.060
zz-0.060
Token are
Token ID389
Feature activation+4.29e-02

Pos logit contributions
 ranc+0.188
 policemen+0.177
 officers+0.176
 men+0.175
 goats+0.171
Neg logit contributions
ig-0.109
 perfectly-0.098
 patiently-0.090
 brush-0.084
 proudly-0.081
Token three
Token ID1115
Feature activation+2.40e-02

Pos logit contributions
 ranc+0.091
 officers+0.084
 men+0.083
 policemen+0.082
 goats+0.082
Neg logit contributions
ig-0.074
 perfectly-0.062
 patiently-0.057
 marker-0.054
 Linda-0.053
Token years
Token ID812
Feature activation+1.56e-01

Pos logit contributions
 ranc+0.049
 officers+0.047
 policemen+0.043
 men+0.042
 announcements+0.041
Neg logit contributions
ig-0.044
 perfectly-0.037
 patiently-0.035
 proudly-0.032
 gently-0.032
Token old
Token ID1468
Feature activation+5.53e-02

Pos logit contributions
 ranc+0.254
 officers+0.237
 goats+0.224
 policemen+0.222
 men+0.220
Neg logit contributions
ig-0.328
 patiently-0.289
 perfectly-0.281
 triangle-0.263
 proudly-0.262
Token.
Token ID13
Feature activation+3.12e-02

Pos logit contributions
 ranc+0.138
 officers+0.128
 policemen+0.127
 men+0.127
 announcements+0.125
Neg logit contributions
ig-0.066
 perfectly-0.052
 patiently-0.052
 gently-0.045
 proudly-0.044
Token They
Token ID1119
Feature activation+7.75e-02

Pos logit contributions
 ranc+0.056
 officers+0.051
 men+0.050
 goats+0.049
 policemen+0.049
Neg logit contributions
ig-0.065
 perfectly-0.056
 patiently-0.052
 marker-0.049
zz-0.049
Token like
Token ID588
Feature activation+2.81e-03

Pos logit contributions
 ranc+0.184
 policemen+0.174
 officers+0.173
 men+0.172
 goats+0.169
Neg logit contributions
ig-0.101
 perfectly-0.092
 patiently-0.083
 brush-0.080
 coloring-0.076
Token to
Token ID284
Feature activation+3.07e-02

Pos logit contributions
 ranc+0.007
 officers+0.007
 men+0.007
 policemen+0.006
 goats+0.006
Neg logit contributions
ig-0.004
 perfectly-0.003
 patiently-0.003
 marker-0.002
 Linda-0.002
Token.
Token ID13
Feature activation-2.44e-02

Pos logit contributions
ig+0.013
 perfectly+0.011
 patiently+0.010
fully+0.009
 Linda+0.009
Neg logit contributions
 ranc-0.023
 men-0.021
 officers-0.021
 policemen-0.021
 goats-0.020
Token<|endoftext|>
Token ID50256
Feature activation+8.44e-02

Pos logit contributions
ig+0.047
 perfectly+0.040
 patiently+0.037
 dab+0.035
fully+0.035
Neg logit contributions
 ranc-0.047
 officers-0.043
 men-0.042
 policemen-0.042
 goats-0.042
TokenJohn
Token ID7554
Feature activation+1.62e-02

Pos logit contributions
 ranc+0.186
 officers+0.172
 men+0.169
 policemen+0.167
 goats+0.165
Neg logit contributions
ig-0.139
 perfectly-0.115
 patiently-0.106
 Linda-0.099
 marker-0.099
Token was
Token ID373
Feature activation+3.63e-02

Pos logit contributions
 ranc+0.036
 officers+0.033
 men+0.033
 policemen+0.032
 goats+0.032
Neg logit contributions
ig-0.026
 perfectly-0.022
 patiently-0.020
 marker-0.019
 Linda-0.019
Token three
Token ID1115
Feature activation+3.04e-02

Pos logit contributions
 ranc+0.072
 men+0.066
 officers+0.065
 policemen+0.065
 goats+0.064
Neg logit contributions
ig-0.067
 perfectly-0.054
 marker-0.050
fully-0.049
 patiently-0.049
Token years
Token ID812
Feature activation+1.56e-01

Pos logit contributions
 ranc+0.053
 officers+0.050
 policemen+0.046
 men+0.046
 announcements+0.045
Neg logit contributions
ig-0.064
 perfectly-0.056
 patiently-0.054
 proudly-0.051
 toe-0.049
Token old
Token ID1468
Feature activation+5.53e-02

Pos logit contributions
 ranc+0.250
 officers+0.230
 policemen+0.220
 goats+0.219
 men+0.218
Neg logit contributions
ig-0.337
 patiently-0.295
 perfectly-0.290
 proudly-0.270
 Linda-0.269
Token and
Token ID290
Feature activation+4.74e-02

Pos logit contributions
 ranc+0.129
 men+0.119
 officers+0.118
 policemen+0.117
 goats+0.114
Neg logit contributions
ig-0.079
 perfectly-0.065
 patiently-0.063
 proudly-0.056
 toe-0.055
Token he
Token ID339
Feature activation+6.14e-02

Pos logit contributions
 ranc+0.091
 men+0.083
 officers+0.082
 policemen+0.081
 goats+0.080
Neg logit contributions
ig-0.093
 perfectly-0.076
 patiently-0.071
zz-0.069
 marker-0.068
Token loved
Token ID6151
Feature activation+1.37e-02

Pos logit contributions
 ranc+0.131
 officers+0.121
 men+0.120
 policemen+0.118
 goats+0.117
Neg logit contributions
ig-0.105
 perfectly-0.088
 patiently-0.081
 marker-0.076
 Linda-0.075
Token pointing
Token ID10609
Feature activation-5.96e-03

Pos logit contributions
 ranc+0.034
 officers+0.032
 men+0.032
 policemen+0.031
 goats+0.031
Neg logit contributions
ig-0.017
 perfectly-0.014
 patiently-0.013
 marker-0.012
 coloring-0.012
Token.
Token ID13
Feature activation-3.58e-03

Pos logit contributions
 ranc+0.038
 men+0.036
 officers+0.036
 policemen+0.035
 goats+0.034
Neg logit contributions
ig-0.024
 perfectly-0.018
 patiently-0.017
 toe-0.016
 Linda-0.016
Token<|endoftext|>
Token ID50256
Feature activation+6.61e-02

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.005
 Linda+0.005
 marker+0.005
Neg logit contributions
 ranc-0.008
 officers-0.007
 men-0.007
 policemen-0.007
 goats-0.007
TokenJack
Token ID14295
Feature activation-2.68e-02

Pos logit contributions
 ranc+0.116
 men+0.106
 officers+0.105
 goats+0.102
 policemen+0.101
Neg logit contributions
ig-0.133
 perfectly-0.117
 patiently-0.112
fully-0.103
 marker-0.103
Token was
Token ID373
Feature activation+2.84e-02

Pos logit contributions
ig+0.047
 perfectly+0.041
 patiently+0.038
mate+0.036
 Linda+0.035
Neg logit contributions
 ranc-0.053
 officers-0.049
 men-0.048
 goats-0.048
 policemen-0.047
Token 3
Token ID513
Feature activation+4.70e-02

Pos logit contributions
 ranc+0.059
 men+0.055
 goats+0.053
 officers+0.053
 policemen+0.053
Neg logit contributions
ig-0.049
 perfectly-0.040
mate-0.037
 marker-0.036
 patiently-0.035
Token years
Token ID812
Feature activation+1.56e-01

Pos logit contributions
 ranc+0.086
 officers+0.078
 goats+0.072
 policemen+0.072
 men+0.070
Neg logit contributions
ig-0.096
 perfectly-0.081
 toe-0.077
 patiently-0.077
 Kate-0.073
Token old
Token ID1468
Feature activation+6.76e-02

Pos logit contributions
 ranc+0.243
 officers+0.220
 policemen+0.210
 goats+0.209
 men+0.208
Neg logit contributions
ig-0.348
 perfectly-0.305
 patiently-0.303
 proudly-0.277
 toe-0.277
Token.
Token ID13
Feature activation+3.06e-02

Pos logit contributions
 ranc+0.153
 officers+0.141
 men+0.140
 policemen+0.139
 goats+0.137
Neg logit contributions
ig-0.103
 perfectly-0.085
 patiently-0.080
 toe-0.073
 triangle-0.072
Token He
Token ID679
Feature activation+3.53e-02

Pos logit contributions
 ranc+0.046
 men+0.041
 officers+0.040
 policemen+0.040
 goats+0.038
Neg logit contributions
ig-0.073
 perfectly-0.062
 patiently-0.057
zz-0.056
 dab-0.056
Token loved
Token ID6151
Feature activation+4.21e-03

Pos logit contributions
 ranc+0.076
 men+0.069
 officers+0.069
 policemen+0.067
 goats+0.066
Neg logit contributions
ig-0.061
 perfectly-0.049
 patiently-0.046
 marker-0.043
 Linda-0.043
Token to
Token ID284
Feature activation+4.42e-02

Pos logit contributions
 ranc+0.010
 officers+0.009
 men+0.009
 policemen+0.009
 goats+0.009
Neg logit contributions
ig-0.006
 perfectly-0.005
 patiently-0.005
 Linda-0.004
 marker-0.004
Token Mary
Token ID5335
Feature activation+3.55e-02

Pos logit contributions
 ranc+0.043
 goats+0.040
 men+0.039
 dogs+0.038
 officers+0.038
Neg logit contributions
ig-0.033
 perfectly-0.026
 patiently-0.026
fully-0.022
mate-0.022
Token.
Token ID13
Feature activation+2.74e-02

Pos logit contributions
 ranc+0.068
 officers+0.061
 goats+0.061
 men+0.060
 policemen+0.058
Neg logit contributions
ig-0.067
 perfectly-0.057
 patiently-0.053
 marker-0.052
mate-0.051
Token Mary
Token ID5335
Feature activation+1.82e-02

Pos logit contributions
 ranc+0.060
 officers+0.056
 men+0.055
 policemen+0.054
 goats+0.053
Neg logit contributions
ig-0.045
 perfectly-0.038
 patiently-0.035
 marker-0.033
 Linda-0.033
Token was
Token ID373
Feature activation+1.89e-02

Pos logit contributions
 ranc+0.041
 officers+0.038
 men+0.037
 policemen+0.037
 goats+0.036
Neg logit contributions
ig-0.029
 perfectly-0.024
 patiently-0.022
 marker-0.021
 toe-0.021
Token 3
Token ID513
Feature activation+3.34e-02

Pos logit contributions
 ranc+0.040
 men+0.036
 officers+0.036
 goats+0.035
 policemen+0.035
Neg logit contributions
ig-0.033
 perfectly-0.027
 patiently-0.025
fully-0.025
 marker-0.024
Token years
Token ID812
Feature activation+1.56e-01

Pos logit contributions
 ranc+0.059
 officers+0.055
 men+0.050
 policemen+0.050
 goats+0.049
Neg logit contributions
ig-0.070
 perfectly-0.060
 toe-0.056
 patiently-0.055
 Kate-0.054
Token old
Token ID1468
Feature activation+5.74e-02

Pos logit contributions
 ranc+0.238
 officers+0.218
 men+0.209
 policemen+0.208
 goats+0.207
Neg logit contributions
ig-0.354
 perfectly-0.306
 patiently-0.300
 toe-0.281
 Linda-0.281
Token and
Token ID290
Feature activation+4.26e-02

Pos logit contributions
 ranc+0.128
 officers+0.119
 men+0.118
 policemen+0.116
 goats+0.115
Neg logit contributions
ig-0.091
 perfectly-0.075
 patiently-0.070
 toe-0.064
 marker-0.064
Token liked
Token ID8288
Feature activation+8.73e-03

Pos logit contributions
 ranc+0.082
 men+0.074
 officers+0.073
 policemen+0.071
 goats+0.071
Neg logit contributions
ig-0.084
 perfectly-0.067
 patiently-0.065
 marker-0.062
zz-0.061
Token to
Token ID284
Feature activation+4.11e-02

Pos logit contributions
 ranc+0.021
 officers+0.019
 men+0.019
 policemen+0.019
 goats+0.019
Neg logit contributions
ig-0.013
 perfectly-0.011
 patiently-0.010
inn-0.009
 Linda-0.009
Token wear
Token ID5806
Feature activation-3.43e-02

Pos logit contributions
 ranc+0.090
 men+0.081
 officers+0.081
 goats+0.079
 policemen+0.079
Neg logit contributions
ig-0.070
 perfectly-0.057
 patiently-0.053
inn-0.049
fully-0.048
Token moved
Token ID3888
Feature activation-2.41e-02

Pos logit contributions
 ranc+0.015
 officers+0.014
 men+0.014
 policemen+0.014
 goats+0.013
Neg logit contributions
ig-0.009
 perfectly-0.008
 patiently-0.007
 toe-0.007
 Linda-0.007
Token it
Token ID340
Feature activation-1.77e-02

Pos logit contributions
ig+0.035
 perfectly+0.027
 patiently+0.025
 marker+0.024
 dab+0.023
Neg logit contributions
 ranc-0.056
 officers-0.054
 men-0.053
 policemen-0.051
 goats-0.051
Token away
Token ID1497
Feature activation-6.77e-03

Pos logit contributions
ig+0.026
 perfectly+0.020
 patiently+0.018
 Linda+0.017
 marker+0.017
Neg logit contributions
 ranc-0.041
 officers-0.039
 men-0.038
 goats-0.037
 policemen-0.037
Token,
Token ID11
Feature activation-1.19e-02

Pos logit contributions
ig+0.010
 perfectly+0.008
 patiently+0.007
fully+0.007
 marker+0.007
Neg logit contributions
 ranc-0.016
 officers-0.015
 men-0.015
 policemen-0.015
 goats-0.015
Token "
Token ID366
Feature activation-3.01e-03

Pos logit contributions
ig+0.018
 perfectly+0.015
 patiently+0.014
 marker+0.012
 Linda+0.012
Neg logit contributions
 ranc-0.028
 officers-0.026
 men-0.025
 policemen-0.025
 goats-0.025
TokenSee
Token ID6214
Feature activation-1.49e-01

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.005
 marker+0.004
 Linda+0.004
Neg logit contributions
 ranc-0.006
 officers-0.005
 men-0.005
 policemen-0.005
 goats-0.005
Token,
Token ID11
Feature activation-3.77e-02

Pos logit contributions
ig+0.233
inn+0.201
 marker+0.189
 perfectly+0.188
 paste+0.187
Neg logit contributions
 ranc-0.289
 men-0.283
 officers-0.276
 goats-0.265
 policemen-0.263
Token it
Token ID340
Feature activation-1.33e-02

Pos logit contributions
ig+0.053
 perfectly+0.042
 patiently+0.039
 marker+0.035
 Linda+0.034
Neg logit contributions
 ranc-0.091
 officers-0.085
 men-0.085
 policemen-0.083
 goats-0.083
Token's
Token ID338
Feature activation-1.31e-03

Pos logit contributions
ig+0.021
 perfectly+0.017
 patiently+0.015
 marker+0.014
 triangle+0.014
Neg logit contributions
 ranc-0.031
 officers-0.029
 men-0.029
 policemen-0.028
 goats-0.028
Token gone
Token ID3750
Feature activation-4.38e-02

Pos logit contributions
ig+0.002
zz+0.002
 perfectly+0.002
 marker+0.002
mate+0.002
Neg logit contributions
 ranc-0.003
 men-0.002
 officers-0.002
 goats-0.002
 cages-0.002
Token now
Token ID783
Feature activation-3.93e-02

Pos logit contributions
ig+0.067
 perfectly+0.051
 patiently+0.048
 marker+0.047
inn+0.047
Neg logit contributions
 ranc-0.097
 officers-0.092
 men-0.091
 policemen-0.088
 goats-0.088
Token monster
Token ID9234
Feature activation-2.31e-02

Pos logit contributions
 ranc+0.045
 officers+0.042
 men+0.041
 goats+0.040
 policemen+0.040
Neg logit contributions
ig-0.035
 perfectly-0.029
 patiently-0.027
 toe-0.026
 marker-0.026
Token goodbye
Token ID24829
Feature activation+5.25e-02

Pos logit contributions
ig+0.037
 perfectly+0.031
 patiently+0.028
 marker+0.027
 Linda+0.026
Neg logit contributions
 ranc-0.051
 officers-0.047
 men-0.047
 policemen-0.046
 goats-0.046
Token and
Token ID290
Feature activation-2.59e-02

Pos logit contributions
 ranc+0.122
 men+0.113
 officers+0.113
 policemen+0.112
 goats+0.110
Neg logit contributions
ig-0.079
 perfectly-0.066
 patiently-0.062
fully-0.055
 proudly-0.054
Token said
Token ID531
Feature activation-1.73e-03

Pos logit contributions
ig+0.047
 perfectly+0.039
inn+0.036
 marker+0.035
 patiently+0.035
Neg logit contributions
 ranc-0.052
 officers-0.049
 men-0.048
 policemen-0.047
 goats-0.046
Token "
Token ID366
Feature activation-2.96e-03

Pos logit contributions
ig+0.003
 perfectly+0.003
 marker+0.002
 triangle+0.002
 paste+0.002
Neg logit contributions
 ranc-0.003
 men-0.003
 officers-0.003
 policemen-0.003
 goats-0.003
TokenSee
Token ID6214
Feature activation-1.46e-01

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.005
 marker+0.004
 Linda+0.004
Neg logit contributions
 ranc-0.006
 officers-0.005
 men-0.005
 goats-0.005
 policemen-0.005
Token you
Token ID345
Feature activation-3.97e-02

Pos logit contributions
ig+0.220
inn+0.183
 paste+0.183
 dab+0.179
 scrub+0.179
Neg logit contributions
 ranc-0.303
 men-0.286
 officers-0.285
 goats-0.271
 policemen-0.271
Token soon
Token ID2582
Feature activation-3.81e-02

Pos logit contributions
ig+0.051
 perfectly+0.040
 patiently+0.037
 marker+0.032
 Linda+0.032
Neg logit contributions
 ranc-0.102
 officers-0.095
 men-0.094
 policemen-0.093
 goats-0.092
Token!"
Token ID2474
Feature activation+6.37e-03

Pos logit contributions
ig+0.047
 perfectly+0.039
 patiently+0.035
 proudly+0.031
 coloring+0.030
Neg logit contributions
 ranc-0.098
 officers-0.092
 policemen-0.091
 men-0.090
 goats-0.089
Token The
Token ID383
Feature activation+2.20e-02

Pos logit contributions
 ranc+0.010
 officers+0.010
 goats+0.009
 men+0.009
 policemen+0.009
Neg logit contributions
ig-0.013
 perfectly-0.011
 crystal-0.010
 toe-0.010
 patiently-0.010
Token monster
Token ID9234
Feature activation-1.33e-02

Pos logit contributions
 ranc+0.046
 officers+0.043
 men+0.042
 goats+0.041
 policemen+0.041
Neg logit contributions
ig-0.038
 perfectly-0.032
 patiently-0.030
 toe-0.028
 marker-0.028
Token hug
Token ID16225
Feature activation-4.32e-03

Pos logit contributions
ig+0.058
 perfectly+0.048
 patiently+0.045
 marker+0.043
 toe+0.042
Neg logit contributions
 ranc-0.068
 officers-0.062
 men-0.061
 policemen-0.060
 goats-0.060
Token and
Token ID290
Feature activation-6.49e-03

Pos logit contributions
ig+0.007
 perfectly+0.005
 patiently+0.005
 marker+0.005
 Linda+0.005
Neg logit contributions
 ranc-0.010
 officers-0.009
 men-0.009
 policemen-0.009
 goats-0.009
Token said
Token ID531
Feature activation+1.15e-02

Pos logit contributions
ig+0.011
 perfectly+0.009
 patiently+0.008
 marker+0.008
zz+0.008
Neg logit contributions
 ranc-0.014
 officers-0.013
 men-0.013
 policemen-0.013
 goats-0.013
Token,
Token ID11
Feature activation-9.76e-03

Pos logit contributions
 ranc+0.026
 men+0.025
 officers+0.024
 policemen+0.023
 goats+0.023
Neg logit contributions
ig-0.018
 perfectly-0.014
 marker-0.013
 patiently-0.013
 triangle-0.012
Token "
Token ID366
Feature activation+1.70e-03

Pos logit contributions
ig+0.018
 perfectly+0.014
 patiently+0.013
zz+0.012
 marker+0.012
Neg logit contributions
 ranc-0.021
 officers-0.019
 men-0.018
 goats-0.018
 policemen-0.018
TokenSee
Token ID6214
Feature activation-1.45e-01

Pos logit contributions
 ranc+0.003
 officers+0.003
 men+0.003
 goats+0.003
 policemen+0.003
Neg logit contributions
ig-0.003
 perfectly-0.003
 patiently-0.003
 marker-0.002
 toe-0.002
Token,
Token ID11
Feature activation-3.65e-02

Pos logit contributions
ig+0.222
inn+0.184
 perfectly+0.183
 patiently+0.176
 marker+0.176
Neg logit contributions
 ranc-0.306
 men-0.292
 officers-0.287
 goats-0.278
 policemen-0.275
Token Jenny
Token ID28437
Feature activation-4.68e-02

Pos logit contributions
ig+0.051
 perfectly+0.041
 patiently+0.037
 marker+0.034
 Linda+0.033
Neg logit contributions
 ranc-0.089
 officers-0.083
 men-0.082
 policemen-0.081
 goats-0.081
Token?
Token ID30
Feature activation-5.38e-02

Pos logit contributions
ig+0.072
 perfectly+0.059
 patiently+0.054
 marker+0.050
 Linda+0.049
Neg logit contributions
 ranc-0.107
 officers-0.099
 men-0.099
 policemen-0.097
 goats-0.096
Token You
Token ID921
Feature activation-3.86e-02

Pos logit contributions
ig+0.088
 perfectly+0.073
 patiently+0.068
 marker+0.063
fully+0.062
Neg logit contributions
 ranc-0.117
 officers-0.109
 men-0.107
 policemen-0.105
 goats-0.105
Token did
Token ID750
Feature activation-2.51e-02

Pos logit contributions
ig+0.060
 perfectly+0.050
 patiently+0.044
 marker+0.043
 brush+0.043
Neg logit contributions
 ranc-0.090
 men-0.082
 policemen-0.082
 officers-0.082
 goats-0.081
TokenLet
Token ID5756
Feature activation-6.83e-02

Pos logit contributions
 ranc+0.019
 officers+0.018
 goats+0.017
 men+0.017
 policemen+0.017
Neg logit contributions
ig-0.021
 perfectly-0.018
 patiently-0.017
 marker-0.016
 toe-0.016
Token's
Token ID338
Feature activation-2.21e-02

Pos logit contributions
ig+0.114
 perfectly+0.095
 patiently+0.088
 marker+0.082
 Linda+0.081
Neg logit contributions
 ranc-0.148
 officers-0.137
 men-0.136
 policemen-0.133
 goats-0.133
Token take
Token ID1011
Feature activation-4.89e-02

Pos logit contributions
ig+0.036
 perfectly+0.028
inn+0.028
 patiently+0.026
fully+0.026
Neg logit contributions
 ranc-0.047
 officers-0.043
 men-0.043
 goats-0.042
 policemen-0.042
Token the
Token ID262
Feature activation-2.01e-02

Pos logit contributions
ig+0.084
 patiently+0.069
 perfectly+0.069
 marker+0.066
inn+0.064
Neg logit contributions
 men-0.092
 ranc-0.091
 officers-0.090
 goats-0.088
 policemen-0.087
Token tw
Token ID665
Feature activation-1.33e-01

Pos logit contributions
ig+0.035
 perfectly+0.029
 patiently+0.027
 marker+0.025
 toe+0.025
Neg logit contributions
 ranc-0.043
 officers-0.040
 men-0.039
 policemen-0.039
 goats-0.038
Tokenig
Token ID328
Feature activation-1.44e-01

Pos logit contributions
ig+0.021
 perfectly-0.003
 patiently-0.015
 marker-0.021
 Linda-0.024
Neg logit contributions
 ranc-0.470
 men-0.443
 officers-0.442
 goats-0.435
 policemen-0.433
Token with
Token ID351
Feature activation-3.11e-02

Pos logit contributions
ig+0.186
 perfectly+0.150
 patiently+0.137
 marker+0.127
 dab+0.126
Neg logit contributions
 ranc-0.352
 officers-0.327
 men-0.322
 goats-0.315
 policemen-0.314
Token us
Token ID514
Feature activation+3.42e-02

Pos logit contributions
ig+0.048
 perfectly+0.038
 patiently+0.036
 marker+0.033
 Linda+0.033
Neg logit contributions
 ranc-0.071
 officers-0.067
 men-0.066
 policemen-0.065
 goats-0.065
Token on
Token ID319
Feature activation-5.37e-02

Pos logit contributions
 ranc+0.082
 officers+0.076
 men+0.076
 policemen+0.075
 goats+0.073
Neg logit contributions
ig-0.050
 perfectly-0.042
 patiently-0.038
 marker-0.034
 toe-0.034
Token all
Token ID477
Feature activation-1.63e-02

Pos logit contributions
ig+0.066
 perfectly+0.057
 patiently+0.051
 triangle+0.043
 proudly+0.043
Neg logit contributions
 ranc-0.139
 officers-0.131
 men-0.128
 policemen-0.127
 goats-0.123
Token our
Token ID674
Feature activation+7.31e-03

Pos logit contributions
ig+0.027
 perfectly+0.023
 patiently+0.021
 marker+0.019
 Linda+0.019
Neg logit contributions
 ranc-0.036
 officers-0.033
 men-0.033
 policemen-0.032
 goats-0.032
Token bird
Token ID6512
Feature activation-3.14e-02

Pos logit contributions
ig+0.100
 perfectly+0.080
 patiently+0.075
 toe+0.072
 triangle+0.071
Neg logit contributions
 ranc-0.128
 officers-0.120
 policemen-0.117
 men-0.117
 goats-0.114
Token and
Token ID290
Feature activation-1.48e-02

Pos logit contributions
ig+0.045
 perfectly+0.037
 patiently+0.033
 marker+0.032
 dab+0.031
Neg logit contributions
 ranc-0.074
 officers-0.069
 men-0.068
 goats-0.068
 policemen-0.067
Token took
Token ID1718
Feature activation-4.85e-02

Pos logit contributions
ig+0.029
 perfectly+0.025
inn+0.023
mate+0.022
 marker+0.022
Neg logit contributions
 ranc-0.028
 officers-0.026
 men-0.026
 goats-0.026
 policemen-0.025
Token the
Token ID262
Feature activation-1.01e-02

Pos logit contributions
ig+0.099
Acc+0.087
pron+0.086
 perfectly+0.083
 patiently+0.083
Neg logit contributions
 men-0.076
 officers-0.068
 goats-0.068
 cages-0.068
 policemen-0.068
Token tw
Token ID665
Feature activation-1.26e-01

Pos logit contributions
ig+0.012
 perfectly+0.011
 patiently+0.010
 Kate+0.009
 Linda+0.009
Neg logit contributions
 ranc-0.025
 men-0.022
 goats-0.022
 officers-0.022
 cages-0.022
Tokenig
Token ID328
Feature activation-1.43e-01

Pos logit contributions
 Emily+0.024
 Kate+0.020
 perfectly+0.019
 marker+0.014
 patiently+0.012
Neg logit contributions
 ranc-0.395
 men-0.363
 cages-0.360
 goats-0.354
 officers-0.349
Token back
Token ID736
Feature activation-3.19e-02

Pos logit contributions
ig+0.181
 perfectly+0.147
 patiently+0.131
 marker+0.122
 triangle+0.121
Neg logit contributions
 ranc-0.358
 officers-0.332
 men-0.327
 goats-0.321
 policemen-0.319
Token to
Token ID284
Feature activation+5.83e-03

Pos logit contributions
ig+0.051
 perfectly+0.045
 patiently+0.042
 Linda+0.040
 marker+0.040
Neg logit contributions
 ranc-0.064
 cages-0.060
 officers-0.060
 goats-0.060
 men-0.059
Token her
Token ID607
Feature activation-2.72e-02

Pos logit contributions
 ranc+0.012
 men+0.011
 goats+0.011
 policemen+0.011
 officers+0.011
Neg logit contributions
ig-0.011
 perfectly-0.009
inn-0.008
zz-0.007
 marker-0.007
Token nest
Token ID16343
Feature activation-2.20e-02

Pos logit contributions
ig+0.048
 perfectly+0.040
 patiently+0.037
 Linda+0.035
 marker+0.035
Neg logit contributions
 ranc-0.057
 officers-0.052
 men-0.052
 goats-0.051
 policemen-0.050
Token to
Token ID284
Feature activation+1.25e-02

Pos logit contributions
ig+0.036
 perfectly+0.030
 patiently+0.028
 marker+0.026
 Linda+0.026
Neg logit contributions
 ranc-0.048
 officers-0.044
 men-0.044
 policemen-0.043
 goats-0.042
Token up
Token ID510
Feature activation-1.97e-02

Pos logit contributions
ig+0.032
 perfectly+0.026
 patiently+0.023
 marker+0.021
zz+0.021
Neg logit contributions
 ranc-0.049
 officers-0.046
 men-0.045
 goats-0.044
 policemen-0.044
Token and
Token ID290
Feature activation-3.00e-02

Pos logit contributions
ig+0.029
 perfectly+0.023
 patiently+0.021
 marker+0.020
 Linda+0.019
Neg logit contributions
 ranc-0.047
 officers-0.043
 men-0.043
 policemen-0.042
 goats-0.042
Token said
Token ID531
Feature activation-2.42e-03

Pos logit contributions
ig+0.052
 perfectly+0.042
 patiently+0.037
inn+0.036
mate+0.036
Neg logit contributions
 ranc-0.064
 officers-0.060
 men-0.059
 goats-0.058
 policemen-0.058
Token,
Token ID11
Feature activation-2.02e-02

Pos logit contributions
ig+0.004
 perfectly+0.003
 patiently+0.003
 marker+0.003
zz+0.003
Neg logit contributions
 ranc-0.006
 men-0.005
 officers-0.005
 policemen-0.005
 goats-0.005
Token "
Token ID366
Feature activation-1.24e-03

Pos logit contributions
ig+0.034
 perfectly+0.028
 patiently+0.025
 marker+0.024
 toe+0.024
Neg logit contributions
 ranc-0.044
 officers-0.041
 men-0.040
 goats-0.039
 policemen-0.039
TokenSee
Token ID6214
Feature activation-1.42e-01

Pos logit contributions
ig+0.002
 perfectly+0.002
 patiently+0.002
 marker+0.002
 Linda+0.002
Neg logit contributions
 ranc-0.003
 officers-0.002
 men-0.002
 policemen-0.002
 goats-0.002
Token,
Token ID11
Feature activation-3.28e-02

Pos logit contributions
ig+0.210
 perfectly+0.175
 patiently+0.166
inn+0.165
 marker+0.164
Neg logit contributions
 ranc-0.312
 officers-0.291
 men-0.291
 goats-0.285
 policemen-0.279
Token Mitt
Token ID16627
Feature activation-6.35e-03

Pos logit contributions
ig+0.046
 perfectly+0.037
 patiently+0.033
 Linda+0.031
 marker+0.030
Neg logit contributions
 ranc-0.080
 officers-0.075
 men-0.074
 policemen-0.073
 goats-0.072
Tokenens
Token ID641
Feature activation-1.12e-02

Pos logit contributions
ig+0.012
 perfectly+0.010
 patiently+0.009
 Linda+0.009
zz+0.009
Neg logit contributions
 ranc-0.013
 policemen-0.012
 officers-0.012
 men-0.011
 cages-0.011
Token,
Token ID11
Feature activation-3.06e-02

Pos logit contributions
ig+0.016
 perfectly+0.013
 patiently+0.012
 Linda+0.011
 toe+0.010
Neg logit contributions
 ranc-0.027
 officers-0.025
 policemen-0.025
 men-0.025
 goats-0.024
Token I
Token ID314
Feature activation-6.37e-02

Pos logit contributions
ig+0.044
 perfectly+0.036
 patiently+0.033
 marker+0.030
 Linda+0.029
Neg logit contributions
 ranc-0.073
 officers-0.068
 men-0.067
 goats-0.066
 policemen-0.066
Token smiled
Token ID13541
Feature activation+1.41e-02

Pos logit contributions
 ranc+0.103
 officers+0.097
 men+0.096
 policemen+0.094
 goats+0.093
Neg logit contributions
ig-0.078
 perfectly-0.065
 patiently-0.058
 triangle-0.055
 marker-0.055
Token and
Token ID290
Feature activation+5.26e-04

Pos logit contributions
 ranc+0.029
 officers+0.027
 men+0.027
 goats+0.026
 policemen+0.026
Neg logit contributions
ig-0.025
 perfectly-0.021
 patiently-0.019
 marker-0.018
zz-0.018
Token said
Token ID531
Feature activation+1.54e-02

Pos logit contributions
 ranc+0.001
 officers+0.001
 men+0.001
 goats+0.001
 policemen+0.001
Neg logit contributions
ig-0.001
 perfectly-0.001
mate-0.001
inn-0.001
 marker-0.001
Token,
Token ID11
Feature activation-1.20e-02

Pos logit contributions
 ranc+0.032
 men+0.031
 officers+0.030
 policemen+0.029
 goats+0.029
Neg logit contributions
ig-0.026
 perfectly-0.020
zz-0.019
 marker-0.019
inn-0.019
Token "
Token ID366
Feature activation+2.70e-03

Pos logit contributions
ig+0.023
 perfectly+0.018
zz+0.016
 bead+0.016
 toe+0.015
Neg logit contributions
 ranc-0.025
 officers-0.023
 men-0.022
 goats-0.022
 policemen-0.022
TokenSee
Token ID6214
Feature activation-1.42e-01

Pos logit contributions
 ranc+0.005
 officers+0.005
 goats+0.005
 policemen+0.005
 men+0.005
Neg logit contributions
ig-0.005
 perfectly-0.004
 patiently-0.004
 toe-0.004
 marker-0.004
Token,
Token ID11
Feature activation-3.80e-02

Pos logit contributions
ig+0.220
inn+0.181
 perfectly+0.179
 marker+0.172
 patiently+0.170
Neg logit contributions
 ranc-0.295
 men-0.280
 officers-0.277
 goats-0.268
 policemen-0.264
Token sometimes
Token ID3360
Feature activation+7.67e-03

Pos logit contributions
ig+0.054
 perfectly+0.042
 patiently+0.038
inn+0.035
 marker+0.035
Neg logit contributions
 ranc-0.092
 officers-0.086
 men-0.086
 policemen-0.084
 goats-0.084
Token things
Token ID1243
Feature activation+4.05e-02

Pos logit contributions
 ranc+0.018
 officers+0.017
 men+0.017
 policemen+0.017
 goats+0.016
Neg logit contributions
ig-0.011
 perfectly-0.009
 patiently-0.008
 marker-0.007
 Linda-0.007
Token that
Token ID326
Feature activation-1.05e-02

Pos logit contributions
 ranc+0.097
 officers+0.090
 men+0.089
 policemen+0.088
 goats+0.088
Neg logit contributions
ig-0.058
 perfectly-0.048
 patiently-0.043
 marker-0.039
 Linda-0.038
Token seem
Token ID1283
Feature activation+4.83e-02

Pos logit contributions
ig+0.016
 perfectly+0.013
 patiently+0.012
 marker+0.011
 triangle+0.011
Neg logit contributions
 ranc-0.024
 officers-0.022
 men-0.022
 policemen-0.022
 goats-0.021
Token key
Token ID1994
Feature activation-5.26e-02

Pos logit contributions
 ranc+0.017
 officers+0.016
 goats+0.016
 policemen+0.015
 men+0.015
Neg logit contributions
ig-0.007
 perfectly-0.006
 marker-0.005
 patiently-0.005
 brush-0.005
Token.
Token ID13
Feature activation-1.45e-02

Pos logit contributions
ig+0.088
 perfectly+0.073
 patiently+0.068
 Linda+0.065
 toe+0.064
Neg logit contributions
 ranc-0.110
 officers-0.104
 men-0.102
 policemen-0.099
 goats-0.098
Token She
Token ID1375
Feature activation-1.94e-02

Pos logit contributions
ig+0.027
 perfectly+0.022
 patiently+0.021
zz+0.020
 toe+0.020
Neg logit contributions
 ranc-0.029
 officers-0.027
 men-0.026
 policemen-0.026
 goats-0.025
Token unlocked
Token ID14838
Feature activation-6.65e-02

Pos logit contributions
ig+0.031
 perfectly+0.025
 marker+0.023
mate+0.023
fully+0.023
Neg logit contributions
 ranc-0.042
 officers-0.039
 men-0.038
 policemen-0.038
 goats-0.037
Token the
Token ID262
Feature activation-3.82e-03

Pos logit contributions
ig+0.097
 perfectly+0.078
 patiently+0.071
 marker+0.066
 Linda+0.064
Neg logit contributions
 ranc-0.158
 officers-0.148
 men-0.147
 policemen-0.144
 goats-0.143
Token pad
Token ID14841
Feature activation-1.41e-01

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.005
 marker+0.005
 crystal+0.005
Neg logit contributions
 ranc-0.008
 officers-0.007
 goats-0.007
 men-0.007
 policemen-0.007
Tokenlock
Token ID5354
Feature activation-8.44e-02

Pos logit contributions
ig+0.251
 perfectly+0.210
 Linda+0.194
 patiently+0.192
 dab+0.189
Neg logit contributions
 ranc-0.278
 officers-0.257
 men-0.256
 policemen-0.246
 goats-0.238
Token and
Token ID290
Feature activation-1.65e-02

Pos logit contributions
ig+0.147
 perfectly+0.121
 patiently+0.112
 Linda+0.109
 toe+0.109
Neg logit contributions
 ranc-0.169
 officers-0.158
 men-0.157
 policemen-0.151
 goats-0.149
Token opened
Token ID4721
Feature activation-6.72e-02

Pos logit contributions
ig+0.029
 perfectly+0.023
inn+0.021
 triangle+0.021
 patiently+0.020
Neg logit contributions
 ranc-0.033
 officers-0.032
 men-0.032
 policemen-0.031
 goats-0.029
Token the
Token ID262
Feature activation+3.65e-03

Pos logit contributions
ig+0.100
 perfectly+0.078
 patiently+0.071
 marker+0.070
zz+0.069
Neg logit contributions
 ranc-0.151
 men-0.144
 officers-0.142
 policemen-0.139
 goats-0.138
Token door
Token ID3420
Feature activation-1.39e-02

Pos logit contributions
 ranc+0.008
 officers+0.007
 men+0.007
 policemen+0.007
 goats+0.007
Neg logit contributions
ig-0.006
 perfectly-0.005
 patiently-0.005
 marker-0.005
 toe-0.005
Token little
Token ID1310
Feature activation-8.20e-02

Pos logit contributions
 ranc+0.049
 men+0.045
 officers+0.045
 goats+0.043
 policemen+0.043
Neg logit contributions
ig-0.038
 perfectly-0.033
 patiently-0.031
fully-0.031
inn-0.030
Token boy
Token ID2933
Feature activation-3.67e-03

Pos logit contributions
ig+0.152
 perfectly+0.138
 patiently+0.133
fully+0.125
mate+0.123
Neg logit contributions
 ranc-0.151
 officers-0.140
 men-0.139
 goats-0.136
 policemen-0.131
Token used
Token ID973
Feature activation-7.83e-02

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.004
 marker+0.004
 Linda+0.004
Neg logit contributions
 ranc-0.008
 officers-0.008
 men-0.008
 policemen-0.007
 goats-0.007
Token the
Token ID262
Feature activation-2.96e-02

Pos logit contributions
ig+0.131
 perfectly+0.103
 patiently+0.096
zz+0.091
mate+0.090
Neg logit contributions
 ranc-0.165
 officers-0.158
 men-0.156
 policemen-0.154
 goats-0.152
Token tw
Token ID665
Feature activation-1.29e-01

Pos logit contributions
ig+0.037
inn+0.030
fully+0.030
 perfectly+0.030
 patiently+0.029
Neg logit contributions
 ranc-0.070
 men-0.064
 officers-0.063
 cages-0.061
 goats-0.060
Tokenig
Token ID328
Feature activation-1.41e-01

Pos logit contributions
ig+0.048
 perfectly+0.036
 Emily+0.036
 triangle+0.035
 marker+0.035
Neg logit contributions
 ranc-0.385
 men-0.361
 officers-0.354
 cages-0.352
 goats-0.349
Token to
Token ID284
Feature activation-2.42e-03

Pos logit contributions
ig+0.195
 perfectly+0.160
 patiently+0.149
 triangle+0.139
 marker+0.138
Neg logit contributions
 ranc-0.329
 officers-0.308
 men-0.304
 goats-0.296
 policemen-0.294
Token help
Token ID1037
Feature activation+6.05e-03

Pos logit contributions
ig+0.003
 perfectly+0.003
 patiently+0.002
inn+0.002
 triangle+0.002
Neg logit contributions
 ranc-0.006
 officers-0.005
 men-0.005
 goats-0.005
 policemen-0.005
Token them
Token ID606
Feature activation+4.80e-02

Pos logit contributions
 ranc+0.013
 officers+0.012
 men+0.012
 policemen+0.012
 goats+0.012
Neg logit contributions
ig-0.010
 perfectly-0.008
 patiently-0.007
zz-0.007
 marker-0.007
Token with
Token ID351
Feature activation-3.11e-02

Pos logit contributions
 ranc+0.109
 officers+0.103
 men+0.100
 goats+0.098
 policemen+0.098
Neg logit contributions
ig-0.073
 perfectly-0.057
 patiently-0.052
inn-0.050
 Linda-0.049
Token all
Token ID477
Feature activation-1.54e-02

Pos logit contributions
ig+0.047
 perfectly+0.039
 patiently+0.036
 marker+0.033
 toe+0.033
Neg logit contributions
 ranc-0.073
 officers-0.067
 men-0.066
 policemen-0.065
 goats-0.065
Token he
Token ID339
Feature activation+8.40e-03

Pos logit contributions
 ranc+0.001
 men+0.001
 officers+0.001
 policemen+0.001
 goats+0.001
Neg logit contributions
ig-0.001
 perfectly-0.000
 marker-0.000
 patiently-0.000
zz-0.000
Token could
Token ID714
Feature activation-7.06e-02

Pos logit contributions
 ranc+0.019
 officers+0.018
 men+0.017
 goats+0.017
 policemen+0.017
Neg logit contributions
ig-0.013
 perfectly-0.010
 patiently-0.009
 marker-0.009
 triangle-0.009
Token finish
Token ID5461
Feature activation-5.61e-02

Pos logit contributions
ig+0.112
 perfectly+0.091
 patiently+0.082
 Linda+0.079
inn+0.079
Neg logit contributions
 ranc-0.154
 officers-0.142
 men-0.141
 policemen-0.137
 goats-0.137
Token his
Token ID465
Feature activation-3.52e-02

Pos logit contributions
ig+0.087
 perfectly+0.066
zz+0.065
inn+0.063
mate+0.063
Neg logit contributions
 ranc-0.115
 men-0.112
 cages-0.108
 goats-0.107
 policemen-0.107
Token z
Token ID1976
Feature activation-7.78e-02

Pos logit contributions
ig+0.055
inn+0.050
fully+0.049
 Linda+0.048
 perfectly+0.047
Neg logit contributions
 ranc-0.069
 men-0.065
 officers-0.062
 cages-0.061
 policemen-0.060
Tokenig
Token ID328
Feature activation-1.41e-01

Pos logit contributions
 Emily+0.024
 Jackie+0.023
 ï¿½+0.016
 Kate+0.016
fully+0.015
Neg logit contributions
 ranc-0.231
 men-0.221
 cages-0.215
 officers-0.214
 goats-0.208
Tokenzag
Token ID50183
Feature activation-7.00e-02

Pos logit contributions
ig+0.238
 perfectly+0.206
 patiently+0.197
 marker+0.189
 Linda+0.188
Neg logit contributions
 ranc-0.271
 officers-0.248
 men-0.243
 goats-0.234
 policemen-0.230
Token.
Token ID13
Feature activation-2.65e-02

Pos logit contributions
ig+0.113
 perfectly+0.094
 patiently+0.091
 Linda+0.089
inn+0.088
Neg logit contributions
 ranc-0.141
 men-0.131
 officers-0.129
 cages-0.127
 goats-0.124
Token<|endoftext|>
Token ID50256
Feature activation+8.23e-02

Pos logit contributions
ig+0.049
 perfectly+0.041
 patiently+0.038
 marker+0.036
 toe+0.036
Neg logit contributions
 ranc-0.053
 officers-0.050
 men-0.049
 policemen-0.048
 goats-0.047
TokenOnce
Token ID7454
Feature activation+4.43e-04

Pos logit contributions
 ranc+0.170
 officers+0.157
 men+0.154
 policemen+0.152
 goats+0.151
Neg logit contributions
ig-0.147
 perfectly-0.124
 patiently-0.113
 Linda-0.109
 marker-0.107
Token upon
Token ID2402
Feature activation+2.14e-02

Pos logit contributions
 ranc+0.001
 men+0.001
 officers+0.001
 policemen+0.001
 goats+0.001
Neg logit contributions
ig-0.001
 perfectly-0.001
 patiently-0.001
 marker-0.001
 dab-0.001
Token laughed
Token ID13818
Feature activation+1.17e-02

Pos logit contributions
 ranc+0.046
 officers+0.043
 men+0.042
 policemen+0.042
 goats+0.041
Neg logit contributions
ig-0.031
 perfectly-0.026
 patiently-0.024
 marker-0.022
 Linda-0.022
Token and
Token ID290
Feature activation-1.16e-03

Pos logit contributions
 ranc+0.022
 men+0.021
 officers+0.021
 goats+0.020
 policemen+0.020
Neg logit contributions
ig-0.022
 perfectly-0.017
zz-0.017
 marker-0.016
 dab-0.016
Token said
Token ID531
Feature activation+7.78e-03

Pos logit contributions
ig+0.002
 perfectly+0.002
 marker+0.002
mate+0.002
 patiently+0.002
Neg logit contributions
 ranc-0.002
 officers-0.002
 men-0.002
 goats-0.002
 policemen-0.002
Token,
Token ID11
Feature activation-9.34e-03

Pos logit contributions
 ranc+0.017
 men+0.016
 officers+0.015
 policemen+0.015
 goats+0.015
Neg logit contributions
ig-0.013
 perfectly-0.010
zz-0.009
 marker-0.009
inn-0.009
Token "
Token ID366
Feature activation+8.26e-03

Pos logit contributions
ig+0.019
 perfectly+0.015
zz+0.013
 toe+0.013
 bead+0.013
Neg logit contributions
 ranc-0.018
 officers-0.016
 men-0.016
 goats-0.016
 policemen-0.015
TokenSee
Token ID6214
Feature activation-1.41e-01

Pos logit contributions
 ranc+0.014
 officers+0.013
 goats+0.013
 men+0.012
 policemen+0.012
Neg logit contributions
ig-0.017
 perfectly-0.014
 patiently-0.014
 toe-0.013
 coloring-0.013
Token,
Token ID11
Feature activation-3.14e-02

Pos logit contributions
ig+0.218
inn+0.181
 perfectly+0.178
 marker+0.175
 patiently+0.173
Neg logit contributions
 ranc-0.287
 men-0.274
 officers-0.270
 goats-0.258
 policemen-0.258
Token I
Token ID314
Feature activation-7.37e-02

Pos logit contributions
ig+0.048
 perfectly+0.037
 patiently+0.033
inn+0.033
 marker+0.032
Neg logit contributions
 ranc-0.072
 men-0.068
 officers-0.068
 goats-0.066
 policemen-0.066
Token told
Token ID1297
Feature activation+2.00e-02

Pos logit contributions
ig+0.122
 perfectly+0.100
 patiently+0.092
fully+0.087
 marker+0.086
Neg logit contributions
 ranc-0.158
 officers-0.147
 men-0.146
 policemen-0.142
 goats-0.141
Token you
Token ID345
Feature activation-2.80e-02

Pos logit contributions
 ranc+0.044
 officers+0.041
 men+0.041
 goats+0.040
 policemen+0.040
Neg logit contributions
ig-0.032
 perfectly-0.026
 patiently-0.024
 marker-0.023
inn-0.022
Token it
Token ID340
Feature activation-2.56e-02

Pos logit contributions
ig+0.036
 perfectly+0.031
 patiently+0.027
fully+0.024
 proudly+0.024
Neg logit contributions
 ranc-0.072
 officers-0.067
 men-0.065
 policemen-0.065
 goats-0.064
Token to
Token ID284
Feature activation+1.77e-02

Pos logit contributions
 ranc+0.009
 men+0.009
 officers+0.009
 policemen+0.009
 goats+0.008
Neg logit contributions
ig-0.008
 perfectly-0.007
 marker-0.006
zz-0.006
 paste-0.006
Token Tim
Token ID5045
Feature activation-5.87e-02

Pos logit contributions
 ranc+0.037
 officers+0.035
 goats+0.034
 men+0.033
 dogs+0.033
Neg logit contributions
ig-0.033
 perfectly-0.024
 marker-0.023
zz-0.023
inn-0.023
Tokenmy
Token ID1820
Feature activation-6.89e-02

Pos logit contributions
ig+0.087
 perfectly+0.071
 patiently+0.062
 dab+0.062
 marker+0.060
Neg logit contributions
 ranc-0.135
 officers-0.129
 men-0.126
 goats-0.123
 policemen-0.122
Token,
Token ID11
Feature activation-1.54e-02

Pos logit contributions
ig+0.094
 perfectly+0.074
 patiently+0.067
 marker+0.062
 Linda+0.061
Neg logit contributions
 ranc-0.169
 officers-0.159
 men-0.158
 policemen-0.155
 goats-0.154
Token "
Token ID366
Feature activation+9.02e-03

Pos logit contributions
ig+0.032
 perfectly+0.025
zz+0.023
 marker+0.023
 toe+0.023
Neg logit contributions
 ranc-0.030
 officers-0.027
 men-0.026
 policemen-0.026
 goats-0.026
TokenSee
Token ID6214
Feature activation-1.40e-01

Pos logit contributions
 ranc+0.015
 officers+0.014
 goats+0.014
 men+0.014
 policemen+0.014
Neg logit contributions
ig-0.019
 perfectly-0.016
 patiently-0.016
 toe-0.015
 Linda-0.014
Token,
Token ID11
Feature activation-3.13e-02

Pos logit contributions
ig+0.216
 perfectly+0.179
 dab+0.172
inn+0.171
 paste+0.171
Neg logit contributions
 ranc-0.298
 men-0.283
 officers-0.283
 policemen-0.270
 goats-0.269
Token I
Token ID314
Feature activation-7.05e-02

Pos logit contributions
ig+0.054
 perfectly+0.042
inn+0.040
zz+0.039
 patiently+0.039
Neg logit contributions
 ranc-0.065
 men-0.063
 officers-0.062
 goats-0.061
 policemen-0.060
Token told
Token ID1297
Feature activation+2.08e-02

Pos logit contributions
ig+0.117
 perfectly+0.097
 patiently+0.089
fully+0.082
 marker+0.082
Neg logit contributions
 ranc-0.153
 officers-0.143
 men-0.141
 policemen-0.139
 goats-0.137
Token you
Token ID345
Feature activation-2.54e-02

Pos logit contributions
 ranc+0.043
 officers+0.041
 men+0.041
 goats+0.040
 policemen+0.040
Neg logit contributions
ig-0.036
 perfectly-0.028
 patiently-0.027
inn-0.026
 marker-0.026
Token I
Token ID314
Feature activation-6.57e-02

Pos logit contributions
ig+0.032
 perfectly+0.028
 patiently+0.025
 proudly+0.022
fully+0.022
Neg logit contributions
 ranc-0.066
 officers-0.061
 policemen-0.059
 men-0.059
 cages-0.058
Token His
Token ID2399
Feature activation-3.07e-02

Pos logit contributions
ig+0.046
 perfectly+0.039
 patiently+0.036
 marker+0.035
zz+0.035
Neg logit contributions
 ranc-0.040
 officers-0.037
 men-0.037
 goats-0.036
 policemen-0.036
Token mom
Token ID1995
Feature activation+3.26e-02

Pos logit contributions
ig+0.051
 perfectly+0.041
inn+0.040
fully+0.039
zz+0.039
Neg logit contributions
 ranc-0.062
 men-0.060
 officers-0.059
 policemen-0.058
 goats-0.057
Token said
Token ID531
Feature activation+6.72e-03

Pos logit contributions
 ranc+0.072
 officers+0.068
 men+0.066
 policemen+0.065
 goats+0.065
Neg logit contributions
ig-0.052
 perfectly-0.043
 patiently-0.038
 marker-0.037
 toe-0.036
Token,
Token ID11
Feature activation-1.43e-02

Pos logit contributions
 ranc+0.015
 men+0.015
 officers+0.015
 policemen+0.014
 goats+0.014
Neg logit contributions
ig-0.010
 perfectly-0.008
zz-0.007
 patiently-0.007
 marker-0.007
Token "
Token ID366
Feature activation+1.34e-03

Pos logit contributions
ig+0.023
 perfectly+0.018
zz+0.016
 toe+0.016
 marker+0.015
Neg logit contributions
 ranc-0.033
 officers-0.030
 men-0.030
 goats-0.029
 policemen-0.029
TokenSee
Token ID6214
Feature activation-1.40e-01

Pos logit contributions
 ranc+0.002
 officers+0.002
 goats+0.002
 men+0.002
 policemen+0.002
Neg logit contributions
ig-0.003
 perfectly-0.002
 patiently-0.002
 toe-0.002
 coloring-0.002
Token,
Token ID11
Feature activation-3.61e-02

Pos logit contributions
ig+0.208
 perfectly+0.167
 dab+0.158
 patiently+0.156
inn+0.155
Neg logit contributions
 ranc-0.306
 officers-0.293
 men-0.290
 policemen-0.280
 goats-0.278
Token the
Token ID262
Feature activation+2.03e-02

Pos logit contributions
ig+0.052
 perfectly+0.041
 patiently+0.038
 marker+0.035
 toe+0.034
Neg logit contributions
 ranc-0.086
 officers-0.081
 men-0.080
 policemen-0.079
 goats-0.079
Token blanket
Token ID18447
Feature activation-1.83e-02

Pos logit contributions
 ranc+0.043
 officers+0.040
 men+0.039
 policemen+0.039
 goats+0.038
Neg logit contributions
ig-0.034
 perfectly-0.029
 patiently-0.027
 marker-0.026
 toe-0.026
Token will
Token ID481
Feature activation+1.92e-02

Pos logit contributions
ig+0.027
 perfectly+0.022
 patiently+0.020
 marker+0.018
 toe+0.018
Neg logit contributions
 ranc-0.044
 officers-0.041
 men-0.040
 policemen-0.040
 goats-0.040
Token keep
Token ID1394
Feature activation-4.84e-02

Pos logit contributions
 ranc+0.048
 officers+0.045
 men+0.044
 policemen+0.044
 goats+0.043
Neg logit contributions
ig-0.026
 perfectly-0.021
 patiently-0.019
 marker-0.017
 Linda-0.017
Token he
Token ID339
Feature activation+1.35e-02

Pos logit contributions
ig+0.014
 perfectly+0.012
 patiently+0.011
 marker+0.010
 toe+0.010
Neg logit contributions
 ranc-0.017
 officers-0.016
 men-0.015
 policemen-0.015
 goats-0.015
Token never
Token ID1239
Feature activation-2.33e-02

Pos logit contributions
 ranc+0.030
 officers+0.028
 men+0.027
 goats+0.027
 policemen+0.027
Neg logit contributions
ig-0.021
 perfectly-0.017
 patiently-0.016
 marker-0.015
 triangle-0.015
Token refused
Token ID6520
Feature activation+6.57e-03

Pos logit contributions
ig+0.042
 perfectly+0.035
 patiently+0.033
 marker+0.031
 Linda+0.030
Neg logit contributions
 ranc-0.048
 officers-0.044
 men-0.044
 policemen-0.043
 goats-0.043
Token the
Token ID262
Feature activation+1.92e-03

Pos logit contributions
 ranc+0.017
 officers+0.016
 men+0.016
 policemen+0.015
 goats+0.015
Neg logit contributions
ig-0.008
 perfectly-0.007
 patiently-0.006
 Linda-0.005
 toe-0.005
Token z
Token ID1976
Feature activation-6.43e-02

Pos logit contributions
 ranc+0.004
 officers+0.004
 men+0.004
 policemen+0.004
 goats+0.004
Neg logit contributions
ig-0.003
 perfectly-0.003
 patiently-0.002
 marker-0.002
 toe-0.002
Tokenig
Token ID328
Feature activation-1.40e-01

Pos logit contributions
 Emily+0.012
 Jackie+0.011
 ï¿½+0.007
 Kate+0.006
 patiently+0.006
Neg logit contributions
 ranc-0.204
 men-0.195
 officers-0.192
 cages-0.189
 policemen-0.184
Tokenzag
Token ID50183
Feature activation-6.60e-02

Pos logit contributions
ig+0.251
 patiently+0.217
 perfectly+0.214
 dab+0.206
 marker+0.202
Neg logit contributions
 ranc-0.252
 officers-0.230
 men-0.219
 goats-0.209
 policemen-0.206
Token again
Token ID757
Feature activation-2.49e-02

Pos logit contributions
ig+0.105
 patiently+0.084
 perfectly+0.082
inn+0.079
 Linda+0.078
Neg logit contributions
 ranc-0.138
 officers-0.131
 men-0.130
 cages-0.124
 policemen-0.123
Token.
Token ID13
Feature activation-2.00e-02

Pos logit contributions
ig+0.042
 perfectly+0.034
 patiently+0.032
 marker+0.030
 Linda+0.029
Neg logit contributions
 ranc-0.054
 officers-0.050
 men-0.049
 policemen-0.048
 goats-0.048
Token<|endoftext|>
Token ID50256
Feature activation+8.58e-02

Pos logit contributions
ig+0.034
 perfectly+0.028
 patiently+0.026
 marker+0.025
 toe+0.024
Neg logit contributions
 ranc-0.043
 officers-0.040
 men-0.039
 policemen-0.039
 goats-0.038
TokenJohn
Token ID7554
Feature activation+1.21e-02

Pos logit contributions
 ranc+0.172
 officers+0.158
 men+0.154
 policemen+0.153
 goats+0.151
Neg logit contributions
ig-0.159
 perfectly-0.135
 patiently-0.124
 Linda-0.119
 marker-0.118
Token waved
Token ID26834
Feature activation-1.09e-02

Pos logit contributions
 ranc+0.024
 officers+0.022
 men+0.022
 goats+0.021
 policemen+0.021
Neg logit contributions
ig-0.020
 perfectly-0.017
 patiently-0.015
mate-0.015
inn-0.015
Token goodbye
Token ID24829
Feature activation+4.96e-02

Pos logit contributions
ig+0.017
 perfectly+0.014
 patiently+0.012
 marker+0.012
 triangle+0.012
Neg logit contributions
 ranc-0.025
 officers-0.023
 men-0.023
 policemen-0.022
 goats-0.022
Token and
Token ID290
Feature activation-2.40e-02

Pos logit contributions
 ranc+0.115
 officers+0.107
 men+0.106
 policemen+0.103
 goats+0.103
Neg logit contributions
ig-0.075
 perfectly-0.060
 patiently-0.055
 marker-0.052
 Linda-0.051
Token said
Token ID531
Feature activation+1.60e-03

Pos logit contributions
ig+0.046
 perfectly+0.037
 marker+0.036
inn+0.035
 triangle+0.034
Neg logit contributions
 ranc-0.045
 officers-0.043
 men-0.043
 policemen-0.040
 goats-0.040
Token "
Token ID366
Feature activation+2.76e-03

Pos logit contributions
 ranc+0.003
 men+0.003
 officers+0.003
 policemen+0.003
 goats+0.003
Neg logit contributions
ig-0.003
 perfectly-0.002
 marker-0.002
 paste-0.002
zz-0.002
TokenSee
Token ID6214
Feature activation-1.40e-01

Pos logit contributions
 ranc+0.005
 officers+0.005
 goats+0.004
 men+0.004
 policemen+0.004
Neg logit contributions
ig-0.006
 perfectly-0.005
 patiently-0.004
 marker-0.004
 Linda-0.004
Token you
Token ID345
Feature activation-3.54e-02

Pos logit contributions
ig+0.203
 perfectly+0.163
 patiently+0.154
 marker+0.149
 dab+0.147
Neg logit contributions
 ranc-0.319
 officers-0.298
 men-0.297
 policemen-0.287
 goats-0.286
Token soon
Token ID2582
Feature activation-3.37e-02

Pos logit contributions
ig+0.049
 perfectly+0.040
 patiently+0.036
 marker+0.032
 Linda+0.032
Neg logit contributions
 ranc-0.087
 officers-0.081
 men-0.080
 policemen-0.079
 goats-0.078
Token!"
Token ID2474
Feature activation+1.10e-02

Pos logit contributions
ig+0.044
 perfectly+0.036
 patiently+0.033
 proudly+0.029
 marker+0.029
Neg logit contributions
 ranc-0.085
 officers-0.079
 policemen-0.078
 men-0.078
 goats-0.076
Token\n
Token ID198
Feature activation+1.97e-03

Pos logit contributions
 ranc+0.019
 officers+0.017
 men+0.016
 policemen+0.016
 goats+0.016
Neg logit contributions
ig-0.023
 perfectly-0.020
 patiently-0.018
 marker-0.018
 triangle-0.017
Token\n
Token ID198
Feature activation-1.53e-03

Pos logit contributions
 ranc+0.004
 officers+0.003
 men+0.003
 policemen+0.003
 goats+0.003
Neg logit contributions
ig-0.004
 perfectly-0.003
 patiently-0.003
 marker-0.003
zz-0.003
Token His
Token ID2399
Feature activation-2.67e-02

Pos logit contributions
ig+0.040
 perfectly+0.033
zz+0.030
 patiently+0.030
mate+0.029
Neg logit contributions
 ranc-0.030
 officers-0.028
 men-0.027
 policemen-0.026
 goats-0.026
Token mom
Token ID1995
Feature activation+3.67e-02

Pos logit contributions
ig+0.047
 perfectly+0.037
inn+0.037
zz+0.036
fully+0.036
Neg logit contributions
 men-0.051
 ranc-0.050
 officers-0.049
 policemen-0.048
 goats-0.048
Token said
Token ID531
Feature activation+1.16e-02

Pos logit contributions
 ranc+0.076
 officers+0.073
 men+0.071
 policemen+0.069
 goats+0.068
Neg logit contributions
ig-0.063
 perfectly-0.051
 patiently-0.046
inn-0.045
 toe-0.045
Token,
Token ID11
Feature activation-1.05e-02

Pos logit contributions
 ranc+0.026
 men+0.025
 officers+0.025
 policemen+0.024
 goats+0.024
Neg logit contributions
ig-0.018
 perfectly-0.014
zz-0.013
 patiently-0.012
 toe-0.012
Token "
Token ID366
Feature activation+3.64e-03

Pos logit contributions
ig+0.020
 perfectly+0.016
 toe+0.014
zz+0.014
 bead+0.014
Neg logit contributions
 ranc-0.022
 officers-0.020
 men-0.020
 policemen-0.019
 goats-0.019
TokenSee
Token ID6214
Feature activation-1.40e-01

Pos logit contributions
 ranc+0.007
 officers+0.007
 goats+0.006
 policemen+0.006
 men+0.006
Neg logit contributions
ig-0.007
 perfectly-0.006
 patiently-0.005
 toe-0.005
 brush-0.005
Token,
Token ID11
Feature activation-3.60e-02

Pos logit contributions
ig+0.218
 perfectly+0.176
 dab+0.173
inn+0.170
 paste+0.167
Neg logit contributions
 ranc-0.285
 men-0.281
 officers-0.277
 policemen-0.268
 goats-0.261
Token you
Token ID345
Feature activation-2.52e-02

Pos logit contributions
ig+0.052
 perfectly+0.038
 patiently+0.034
zz+0.033
inn+0.033
Neg logit contributions
 ranc-0.084
 men-0.082
 officers-0.081
 policemen-0.080
 goats-0.079
Token are
Token ID389
Feature activation-1.20e-02

Pos logit contributions
ig+0.039
 perfectly+0.032
 patiently+0.029
 marker+0.027
 Linda+0.026
Neg logit contributions
 ranc-0.059
 officers-0.054
 men-0.054
 policemen-0.053
 goats-0.053
Token so
Token ID523
Feature activation+7.59e-03

Pos logit contributions
ig+0.020
 perfectly+0.017
 patiently+0.015
 marker+0.014
 Linda+0.014
Neg logit contributions
 ranc-0.027
 officers-0.025
 men-0.024
 policemen-0.024
 goats-0.024
Token independent
Token ID4795
Feature activation-3.07e-04

Pos logit contributions
 ranc+0.016
 officers+0.015
 men+0.014
 policemen+0.014
 goats+0.014
Neg logit contributions
ig-0.014
 perfectly-0.012
 patiently-0.011
 triangle-0.010
 toe-0.010
Token mom
Token ID1995
Feature activation+4.46e-02

Pos logit contributions
 ranc+0.006
 officers+0.006
 men+0.006
 policemen+0.006
 goats+0.005
Neg logit contributions
ig-0.005
 perfectly-0.004
inn-0.004
zz-0.004
fully-0.004
Token smiled
Token ID13541
Feature activation+7.95e-03

Pos logit contributions
 ranc+0.096
 officers+0.092
 men+0.089
 policemen+0.087
 goats+0.087
Neg logit contributions
ig-0.073
 perfectly-0.059
 patiently-0.054
 toe-0.052
 marker-0.052
Token and
Token ID290
Feature activation-4.89e-03

Pos logit contributions
 ranc+0.017
 officers+0.016
 men+0.016
 policemen+0.016
 goats+0.015
Neg logit contributions
ig-0.013
 perfectly-0.011
 patiently-0.010
 marker-0.009
 Linda-0.009
Token said
Token ID531
Feature activation+7.46e-03

Pos logit contributions
ig+0.011
 perfectly+0.009
mate+0.009
inn+0.009
 marker+0.009
Neg logit contributions
 ranc-0.007
 officers-0.007
 men-0.007
 goats-0.007
 policemen-0.006
Token "
Token ID366
Feature activation+3.19e-03

Pos logit contributions
 ranc+0.016
 men+0.015
 officers+0.015
 policemen+0.014
 goats+0.014
Neg logit contributions
ig-0.013
 perfectly-0.009
zz-0.009
 toe-0.009
 marker-0.009
TokenSee
Token ID6214
Feature activation-1.40e-01

Pos logit contributions
 ranc+0.006
 officers+0.006
 goats+0.006
 men+0.005
 policemen+0.005
Neg logit contributions
ig-0.006
 perfectly-0.005
 patiently-0.005
 toe-0.004
 marker-0.004
Token,
Token ID11
Feature activation-3.45e-02

Pos logit contributions
ig+0.213
inn+0.172
 perfectly+0.167
 dab+0.165
 paste+0.164
Neg logit contributions
 ranc-0.295
 officers-0.281
 men-0.278
 goats-0.265
 policemen-0.264
Token I
Token ID314
Feature activation-7.21e-02

Pos logit contributions
ig+0.052
 perfectly+0.038
inn+0.036
 patiently+0.035
zz+0.034
Neg logit contributions
 ranc-0.079
 officers-0.076
 men-0.076
 policemen-0.074
 goats-0.074
Token told
Token ID1297
Feature activation+1.61e-02

Pos logit contributions
ig+0.121
 perfectly+0.099
 patiently+0.091
fully+0.086
 triangle+0.086
Neg logit contributions
 ranc-0.154
 officers-0.143
 men-0.140
 policemen-0.137
 goats-0.136
Token you
Token ID345
Feature activation-2.66e-02

Pos logit contributions
 ranc+0.035
 officers+0.033
 men+0.033
 goats+0.032
 policemen+0.032
Neg logit contributions
ig-0.027
 perfectly-0.021
 patiently-0.020
 marker-0.019
inn-0.018
Token being
Token ID852
Feature activation+2.54e-02

Pos logit contributions
ig+0.031
 perfectly+0.027
 patiently+0.023
 proudly+0.020
 Linda+0.020
Neg logit contributions
 ranc-0.071
 officers-0.066
 men-0.066
 policemen-0.065
 goats-0.064
Token his
Token ID465
Feature activation-8.35e-04

Pos logit contributions
ig+0.020
 perfectly+0.016
 patiently+0.015
 marker+0.015
mate+0.015
Neg logit contributions
 ranc-0.021
 officers-0.019
 men-0.019
 policemen-0.019
 goats-0.019
Token mom
Token ID1995
Feature activation+3.08e-02

Pos logit contributions
ig+0.001
 perfectly+0.001
fully+0.001
 patiently+0.001
zz+0.001
Neg logit contributions
 ranc-0.002
 men-0.002
 officers-0.002
 policemen-0.002
 goats-0.002
Token said
Token ID531
Feature activation+7.54e-03

Pos logit contributions
 ranc+0.067
 officers+0.063
 men+0.061
 policemen+0.060
 goats+0.060
Neg logit contributions
ig-0.050
 perfectly-0.041
 patiently-0.037
 marker-0.036
 toe-0.035
Token,
Token ID11
Feature activation-9.29e-03

Pos logit contributions
 ranc+0.017
 men+0.016
 officers+0.015
 policemen+0.015
 goats+0.015
Neg logit contributions
ig-0.012
 perfectly-0.009
zz-0.009
 marker-0.009
 patiently-0.008
Token '
Token ID705
Feature activation-5.06e-03

Pos logit contributions
ig+0.017
 perfectly+0.014
zz+0.012
 toe+0.012
 marker+0.012
Neg logit contributions
 ranc-0.020
 officers-0.018
 men-0.018
 goats-0.017
 policemen-0.017
TokenSee
Token ID6214
Feature activation-1.39e-01

Pos logit contributions
ig+0.010
 perfectly+0.009
 patiently+0.008
 toe+0.008
 Linda+0.008
Neg logit contributions
 ranc-0.009
 officers-0.008
 goats-0.008
 men-0.008
 policemen-0.008
Token,
Token ID11
Feature activation-3.65e-02

Pos logit contributions
ig+0.205
 perfectly+0.162
 patiently+0.149
 dab+0.146
 marker+0.143
Neg logit contributions
 ranc-0.320
 officers-0.301
 men-0.300
 policemen-0.292
 goats-0.288
Token I
Token ID314
Feature activation-7.44e-02

Pos logit contributions
ig+0.054
 perfectly+0.043
 patiently+0.038
inn+0.036
 dab+0.036
Neg logit contributions
 ranc-0.085
 men-0.080
 officers-0.080
 policemen-0.078
 goats-0.078
Token told
Token ID1297
Feature activation+1.41e-02

Pos logit contributions
ig+0.114
 perfectly+0.094
 patiently+0.087
 marker+0.079
 Linda+0.079
Neg logit contributions
 ranc-0.172
 officers-0.160
 men-0.158
 policemen-0.156
 goats-0.155
Token you
Token ID345
Feature activation-2.75e-02

Pos logit contributions
 ranc+0.032
 officers+0.030
 men+0.030
 policemen+0.029
 goats+0.029
Neg logit contributions
ig-0.022
 perfectly-0.017
 patiently-0.016
 marker-0.015
 Linda-0.015
Token I
Token ID314
Feature activation-6.85e-02

Pos logit contributions
ig+0.036
 perfectly+0.030
 patiently+0.027
 proudly+0.024
fully+0.024
Neg logit contributions
 ranc-0.070
 officers-0.066
 men-0.064
 policemen-0.064
 goats-0.063
Token.
Token ID13
Feature activation-1.86e-02

Pos logit contributions
ig+0.016
 perfectly+0.014
 patiently+0.012
 proudly+0.011
fully+0.011
Neg logit contributions
 ranc-0.028
 men-0.025
 policemen-0.025
 officers-0.025
 goats-0.025
Token Sara
Token ID24799
Feature activation-3.58e-02

Pos logit contributions
ig+0.032
 perfectly+0.027
 patiently+0.025
 marker+0.023
 Linda+0.023
Neg logit contributions
 ranc-0.039
 officers-0.036
 men-0.035
 policemen-0.035
 goats-0.035
Token said
Token ID531
Feature activation+5.20e-03

Pos logit contributions
ig+0.051
 perfectly+0.040
 patiently+0.037
 proudly+0.035
 toe+0.033
Neg logit contributions
 ranc-0.088
 men-0.082
 officers-0.082
 policemen-0.082
 goats-0.080
Token,
Token ID11
Feature activation-1.52e-02

Pos logit contributions
 ranc+0.012
 officers+0.011
 men+0.011
 policemen+0.011
 goats+0.011
Neg logit contributions
ig-0.008
 perfectly-0.006
 patiently-0.006
 marker-0.006
 Linda-0.005
Token "
Token ID366
Feature activation+3.09e-03

Pos logit contributions
ig+0.027
 perfectly+0.022
 patiently+0.020
 marker+0.020
zz+0.020
Neg logit contributions
 ranc-0.032
 officers-0.029
 men-0.029
 goats-0.028
 policemen-0.028
TokenSee
Token ID6214
Feature activation-1.39e-01

Pos logit contributions
 ranc+0.006
 officers+0.006
 men+0.006
 policemen+0.006
 goats+0.006
Neg logit contributions
ig-0.005
 perfectly-0.005
 patiently-0.004
 marker-0.004
 Linda-0.004
Token you
Token ID345
Feature activation-3.86e-02

Pos logit contributions
ig+0.203
 perfectly+0.165
 patiently+0.154
 marker+0.151
inn+0.149
Neg logit contributions
 ranc-0.320
 officers-0.299
 men-0.298
 policemen-0.288
 goats-0.288
Token soon
Token ID2582
Feature activation-3.44e-02

Pos logit contributions
ig+0.052
 perfectly+0.041
 patiently+0.038
 marker+0.034
 Linda+0.033
Neg logit contributions
 ranc-0.097
 officers-0.090
 men-0.089
 policemen-0.088
 goats-0.087
Token!"
Token ID2474
Feature activation+1.63e-02

Pos logit contributions
ig+0.042
 perfectly+0.034
 patiently+0.030
 proudly+0.026
 marker+0.026
Neg logit contributions
 ranc-0.090
 officers-0.084
 policemen-0.083
 men-0.082
 goats-0.081
Token and
Token ID290
Feature activation-2.87e-03

Pos logit contributions
 ranc+0.035
 officers+0.032
 men+0.032
 goats+0.031
 policemen+0.031
Neg logit contributions
ig-0.028
 perfectly-0.024
 patiently-0.021
 marker-0.020
 toe-0.020
Token the
Token ID262
Feature activation+3.46e-02

Pos logit contributions
ig+0.005
 perfectly+0.004
 patiently+0.004
 marker+0.003
 triangle+0.003
Neg logit contributions
 ranc-0.006
 officers-0.006
 men-0.006
 goats-0.006
 policemen-0.006
Token smiled
Token ID13541
Feature activation+1.07e-02

Pos logit contributions
ig+0.015
 perfectly+0.012
 patiently+0.011
 marker+0.010
 Linda+0.010
Neg logit contributions
 ranc-0.023
 officers-0.021
 men-0.021
 policemen-0.021
 goats-0.020
Token and
Token ID290
Feature activation-5.00e-03

Pos logit contributions
 ranc+0.022
 officers+0.020
 men+0.020
 goats+0.019
 policemen+0.019
Neg logit contributions
ig-0.019
 perfectly-0.016
 marker-0.014
 dab-0.014
zz-0.014
Token said
Token ID531
Feature activation+8.52e-03

Pos logit contributions
ig+0.010
 perfectly+0.008
mate+0.008
 marker+0.008
inn+0.007
Neg logit contributions
 ranc-0.009
 officers-0.009
 men-0.009
 goats-0.008
 policemen-0.008
Token,
Token ID11
Feature activation-1.09e-02

Pos logit contributions
 ranc+0.018
 men+0.017
 officers+0.016
 policemen+0.016
 goats+0.015
Neg logit contributions
ig-0.015
 perfectly-0.012
zz-0.011
 marker-0.011
 dab-0.011
Token "
Token ID366
Feature activation+3.77e-03

Pos logit contributions
ig+0.022
 perfectly+0.017
zz+0.015
 toe+0.015
 bead+0.015
Neg logit contributions
 ranc-0.022
 officers-0.019
 men-0.019
 goats-0.018
 policemen-0.018
TokenSee
Token ID6214
Feature activation-1.39e-01

Pos logit contributions
 ranc+0.006
 officers+0.006
 goats+0.006
 men+0.006
 policemen+0.006
Neg logit contributions
ig-0.008
 perfectly-0.007
 patiently-0.006
 coloring-0.006
 toe-0.006
Token,
Token ID11
Feature activation-3.32e-02

Pos logit contributions
ig+0.204
 perfectly+0.165
 patiently+0.157
inn+0.154
 dab+0.154
Neg logit contributions
 ranc-0.310
 men-0.293
 officers-0.292
 policemen-0.281
 goats-0.280
Token big
Token ID1263
Feature activation-3.29e-02

Pos logit contributions
ig+0.046
 perfectly+0.037
 patiently+0.033
 marker+0.030
 Linda+0.030
Neg logit contributions
 ranc-0.081
 officers-0.076
 men-0.075
 policemen-0.074
 goats-0.073
Token bear
Token ID6842
Feature activation-8.53e-03

Pos logit contributions
ig+0.051
 perfectly+0.041
 patiently+0.037
 toe+0.035
 leaf+0.035
Neg logit contributions
 ranc-0.076
 officers-0.071
 policemen-0.069
 men-0.069
 goats-0.067
Token!
Token ID0
Feature activation-4.78e-02

Pos logit contributions
ig+0.012
 perfectly+0.010
 patiently+0.009
 Linda+0.008
 marker+0.008
Neg logit contributions
 ranc-0.021
 officers-0.019
 men-0.019
 policemen-0.019
 goats-0.019
Token I
Token ID314
Feature activation-6.66e-02

Pos logit contributions
ig+0.077
 perfectly+0.064
 patiently+0.059
 marker+0.054
fully+0.053
Neg logit contributions
 ranc-0.106
 officers-0.098
 men-0.097
 policemen-0.095
 goats-0.095
Token big
Token ID1263
Feature activation-6.74e-03

Pos logit contributions
 ranc+0.011
 men+0.010
 goats+0.010
 officers+0.010
 policemen+0.010
Neg logit contributions
ig-0.012
 perfectly-0.011
fully-0.010
 patiently-0.010
 Linda-0.009
Token farm
Token ID5318
Feature activation+1.10e-01

Pos logit contributions
ig+0.013
 perfectly+0.012
 patiently+0.011
fully+0.010
 dab+0.010
Neg logit contributions
 ranc-0.013
 men-0.012
 officers-0.011
 goats-0.011
 policemen-0.011
Token with
Token ID351
Feature activation+3.06e-02

Pos logit contributions
 ranc+0.307
 officers+0.299
 policemen+0.297
 announcements+0.292
 men+0.290
Neg logit contributions
ig-0.101
 perfectly-0.071
 patiently-0.070
on-0.069
 triangle-0.068
Token lots
Token ID6041
Feature activation+2.46e-02

Pos logit contributions
 ranc+0.069
 officers+0.067
 policemen+0.062
 men+0.062
 announcements+0.061
Neg logit contributions
ig-0.037
 perfectly-0.033
 M-0.032
 triangle-0.031
 patiently-0.031
Token of
Token ID286
Feature activation+2.44e-02

Pos logit contributions
 ranc+0.066
 officers+0.065
 announcements+0.064
 policemen+0.063
 cages+0.060
Neg logit contributions
ig-0.021
 patiently-0.014
 leaf-0.014
on-0.012
 triangle-0.011
Token animals
Token ID4695
Feature activation+9.41e-02

Pos logit contributions
 officers+0.038
 ranc+0.035
cases+0.035
guards+0.033
 policemen+0.032
Neg logit contributions
ig-0.040
 leaf-0.038
 M-0.038
 crystal-0.038
 flower-0.037
Token.
Token ID13
Feature activation+1.62e-02

Pos logit contributions
 ranc+0.242
 officers+0.228
 policemen+0.224
 men+0.222
 announcements+0.215
Neg logit contributions
ig-0.118
 perfectly-0.087
 patiently-0.082
 coloring-0.077
 toe-0.075
Token There
Token ID1318
Feature activation-2.35e-02

Pos logit contributions
 ranc+0.035
 officers+0.034
 policemen+0.033
 men+0.033
 goats+0.032
Neg logit contributions
ig-0.020
 Kate-0.020
 triangle-0.020
 Emily-0.019
 Linda-0.019
Token was
Token ID373
Feature activation+3.81e-03

Pos logit contributions
ig+0.037
 perfectly+0.030
 patiently+0.028
 triangle+0.026
 marker+0.026
Neg logit contributions
 ranc-0.054
 officers-0.050
 policemen-0.049
 men-0.048
 goats-0.048
Token a
Token ID257
Feature activation-9.33e-03

Pos logit contributions
 officers+0.010
folk+0.009
cases+0.009
 ranc+0.009
guards+0.009
Neg logit contributions
 M-0.003
 Mia-0.003
 leaf-0.002
 her-0.002
 Lily-0.002
Token horse
Token ID8223
Feature activation+4.97e-02

Pos logit contributions
ig+0.017
 perfectly+0.014
 patiently+0.013
 triangle+0.013
 marker+0.013
Neg logit contributions
 ranc-0.018
 officers-0.017
 men-0.017
 policemen-0.016
 goats-0.016
Token eat
Token ID4483
Feature activation-1.41e-02

Pos logit contributions
 ranc+0.065
 officers+0.061
 men+0.060
 policemen+0.059
 goats+0.059
Neg logit contributions
ig-0.042
 perfectly-0.035
 patiently-0.033
 marker-0.030
 toe-0.030
Token.
Token ID13
Feature activation+8.41e-04

Pos logit contributions
ig+0.022
 perfectly+0.017
 patiently+0.015
zz+0.015
 marker+0.015
Neg logit contributions
 ranc-0.032
 men-0.031
 officers-0.030
 goats-0.030
 policemen-0.029
Token It
Token ID632
Feature activation-6.65e-03

Pos logit contributions
 ranc+0.002
 officers+0.002
 men+0.002
 policemen+0.002
 goats+0.002
Neg logit contributions
ig-0.001
 perfectly-0.001
 patiently-0.001
 Linda-0.001
 marker-0.001
Token was
Token ID373
Feature activation+1.74e-02

Pos logit contributions
ig+0.010
 perfectly+0.008
 patiently+0.008
 marker+0.007
 Linda+0.007
Neg logit contributions
 ranc-0.015
 officers-0.014
 men-0.014
 policemen-0.014
 goats-0.014
Token very
Token ID845
Feature activation+5.75e-02

Pos logit contributions
 ranc+0.037
 officers+0.034
 men+0.034
 policemen+0.033
 goats+0.033
Neg logit contributions
ig-0.030
 perfectly-0.025
 patiently-0.023
 marker-0.022
 Linda-0.022
Token hungry
Token ID14720
Feature activation+9.98e-02

Pos logit contributions
 ranc+0.114
 officers+0.109
 goats+0.102
 policemen+0.102
 men+0.099
Neg logit contributions
ig-0.102
 perfectly-0.092
 patiently-0.089
 gently-0.081
 proudly-0.080
Token and
Token ID290
Feature activation+2.76e-02

Pos logit contributions
 ranc+0.256
 officers+0.237
 policemen+0.229
 goats+0.228
 men+0.226
Neg logit contributions
ig-0.125
 perfectly-0.103
 patiently-0.097
 toe-0.085
fully-0.085
Token greedy
Token ID31828
Feature activation+7.75e-02

Pos logit contributions
 ranc+0.049
 officers+0.045
 policemen+0.043
 goats+0.043
 men+0.043
Neg logit contributions
ig-0.054
 perfectly-0.050
 patiently-0.047
 toe-0.044
 proudly-0.044
Token.
Token ID13
Feature activation+9.34e-03

Pos logit contributions
 ranc+0.211
 officers+0.190
 policemen+0.187
 goats+0.183
 announcements+0.180
Neg logit contributions
ig-0.083
 perfectly-0.067
 patiently-0.060
fully-0.056
 gently-0.054
Token\n
Token ID198
Feature activation+7.61e-03

Pos logit contributions
 ranc+0.022
 officers+0.020
 men+0.020
 policemen+0.019
 goats+0.019
Neg logit contributions
ig-0.014
 perfectly-0.012
 patiently-0.011
 Linda-0.010
 triangle-0.010
Token\n
Token ID198
Feature activation+6.36e-03

Pos logit contributions
 ranc+0.017
 officers+0.016
 men+0.016
 goats+0.016
 policemen+0.016
Neg logit contributions
ig-0.012
 perfectly-0.010
 patiently-0.009
 Linda-0.008
 marker-0.008
Token the
Token ID262
Feature activation+3.49e-02

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.004
 marker+0.004
zz+0.004
Neg logit contributions
 ranc-0.010
 officers-0.009
 men-0.009
 policemen-0.009
 goats-0.009
Token lion
Token ID18744
Feature activation+6.90e-03

Pos logit contributions
 ranc+0.076
 officers+0.070
 policemen+0.067
 goats+0.066
 men+0.065
Neg logit contributions
ig-0.058
 perfectly-0.049
 triangle-0.048
 marker-0.047
 toe-0.046
Token and
Token ID290
Feature activation+1.53e-02

Pos logit contributions
 ranc+0.016
 officers+0.015
 men+0.014
 policemen+0.014
 goats+0.014
Neg logit contributions
ig-0.011
 perfectly-0.009
 patiently-0.008
 marker-0.008
 toe-0.008
Token got
Token ID1392
Feature activation+1.95e-02

Pos logit contributions
 ranc+0.032
 officers+0.030
 men+0.030
 policemen+0.029
 goats+0.029
Neg logit contributions
ig-0.027
 perfectly-0.022
 patiently-0.020
 marker-0.019
zz-0.019
Token scared
Token ID12008
Feature activation+7.04e-02

Pos logit contributions
 ranc+0.039
 officers+0.035
 goats+0.034
 men+0.034
 policemen+0.033
Neg logit contributions
ig-0.035
 perfectly-0.031
 patiently-0.029
 toe-0.028
 Linda-0.027
Token too
Token ID1165
Feature activation+1.03e-01

Pos logit contributions
 ranc+0.164
 officers+0.150
 men+0.148
 policemen+0.146
 goats+0.145
Neg logit contributions
ig-0.107
 perfectly-0.089
 patiently-0.082
 toe-0.076
 Linda-0.076
Token.
Token ID13
Feature activation-5.47e-03

Pos logit contributions
 ranc+0.289
 officers+0.272
 goats+0.255
 policemen+0.252
 men+0.249
Neg logit contributions
ig-0.106
 patiently-0.085
 perfectly-0.084
 gently-0.083
 proudly-0.074
Token He
Token ID679
Feature activation+1.21e-02

Pos logit contributions
ig+0.010
 perfectly+0.008
 patiently+0.008
 marker+0.008
zz+0.008
Neg logit contributions
 ranc-0.011
 men-0.010
 officers-0.010
 policemen-0.010
 goats-0.010
Token ran
Token ID4966
Feature activation+1.16e-02

Pos logit contributions
 ranc+0.025
 officers+0.023
 men+0.023
 policemen+0.023
 goats+0.022
Neg logit contributions
ig-0.022
 perfectly-0.018
 patiently-0.017
 marker-0.016
 triangle-0.016
Token to
Token ID284
Feature activation+2.34e-02

Pos logit contributions
 ranc+0.031
 officers+0.029
 men+0.029
 goats+0.028
 policemen+0.028
Neg logit contributions
ig-0.013
 perfectly-0.011
 patiently-0.010
 Linda-0.008
fully-0.008
Token Lily
Token ID20037
Feature activation-6.67e-02

Pos logit contributions
 ranc+0.048
 officers+0.045
 men+0.045
 policemen+0.044
 goats+0.042
Neg logit contributions
ig-0.044
 perfectly-0.034
bul-0.030
zz-0.029
 marker-0.029
Token pretend
Token ID16614
Feature activation-4.44e-03

Pos logit contributions
 ranc+0.085
 men+0.078
 officers+0.077
 policemen+0.075
 goats+0.073
Neg logit contributions
ig-0.059
 perfectly-0.047
 patiently-0.042
fully-0.040
mate-0.039
Token they
Token ID484
Feature activation+6.28e-02

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.005
 Kate+0.004
 Linda+0.004
Neg logit contributions
 ranc-0.011
 officers-0.010
 policemen-0.010
 goats-0.010
 men-0.010
Token were
Token ID547
Feature activation+4.33e-03

Pos logit contributions
 ranc+0.138
 goats+0.127
 men+0.127
 policemen+0.126
 officers+0.125
Neg logit contributions
ig-0.098
 perfectly-0.089
 patiently-0.075
 toe-0.074
 brush-0.074
Token grown
Token ID7334
Feature activation+2.27e-04

Pos logit contributions
 ranc+0.009
 officers+0.008
 policemen+0.008
 goats+0.008
 men+0.008
Neg logit contributions
ig-0.007
 perfectly-0.007
 patiently-0.006
 coloring-0.006
 Linda-0.006
Token-
Token ID12
Feature activation+2.84e-02

Pos logit contributions
 ranc+0.001
 officers+0.001
 goats+0.001
 policemen+0.001
 men+0.001
Neg logit contributions
ig-0.000
 perfectly-0.000
 patiently-0.000
 proudly-0.000
 coloring-0.000
Tokenups
Token ID4739
Feature activation+9.52e-02

Pos logit contributions
 ranc+0.052
 men+0.046
 goats+0.046
 policemen+0.045
 cages+0.042
Neg logit contributions
ig-0.055
 perfectly-0.050
 patiently-0.046
 coloring-0.045
zz-0.045
Token and
Token ID290
Feature activation+1.37e-02

Pos logit contributions
 ranc+0.244
 policemen+0.229
 men+0.229
 officers+0.222
 goats+0.219
Neg logit contributions
ig-0.112
 perfectly-0.099
 patiently-0.084
 coloring-0.083
 brush-0.076
Token use
Token ID779
Feature activation-3.01e-02

Pos logit contributions
 ranc+0.032
 officers+0.030
 men+0.030
 policemen+0.029
 goats+0.029
Neg logit contributions
ig-0.020
 perfectly-0.017
 patiently-0.015
 marker-0.014
 Linda-0.014
Token the
Token ID262
Feature activation-2.43e-03

Pos logit contributions
ig+0.053
 perfectly+0.045
 patiently+0.042
 marker+0.039
 Linda+0.039
Neg logit contributions
 ranc-0.062
 officers-0.057
 men-0.056
 policemen-0.055
 goats-0.055
Token things
Token ID1243
Feature activation+3.87e-02

Pos logit contributions
ig+0.004
 perfectly+0.003
 patiently+0.003
 marker+0.003
 toe+0.003
Neg logit contributions
 ranc-0.005
 officers-0.005
 men-0.005
 policemen-0.005
 goats-0.005
Token they
Token ID484
Feature activation+5.40e-02

Pos logit contributions
 ranc+0.092
 officers+0.084
 men+0.084
 policemen+0.083
 goats+0.082
Neg logit contributions
ig-0.056
 perfectly-0.047
 patiently-0.043
 marker-0.039
 Linda-0.039
Token coming
Token ID2406
Feature activation+6.71e-02

Pos logit contributions
 ranc+0.189
 officers+0.176
 men+0.175
 policemen+0.169
 goats+0.166
Neg logit contributions
ig-0.157
 perfectly-0.129
 patiently-0.113
 dab-0.107
 Linda-0.107
Token from
Token ID422
Feature activation+5.25e-02

Pos logit contributions
 ranc+0.147
 officers+0.136
 men+0.135
 policemen+0.132
 goats+0.132
Neg logit contributions
ig-0.111
 perfectly-0.092
 patiently-0.085
 marker-0.079
 Linda-0.078
Token outside
Token ID2354
Feature activation-3.53e-03

Pos logit contributions
 ranc+0.116
 men+0.108
 officers+0.106
 cages+0.105
 policemen+0.104
Neg logit contributions
ig-0.088
 perfectly-0.071
 patiently-0.066
 marker-0.060
 dab-0.060
Token.
Token ID13
Feature activation+1.56e-02

Pos logit contributions
ig+0.004
 perfectly+0.003
 patiently+0.003
 Linda+0.003
 toe+0.003
Neg logit contributions
 ranc-0.009
 officers-0.009
 men-0.009
 goats-0.009
 policemen-0.009
Token It
Token ID632
Feature activation+4.12e-03

Pos logit contributions
 ranc+0.030
 officers+0.027
 men+0.027
 policemen+0.026
 goats+0.026
Neg logit contributions
ig-0.031
 perfectly-0.026
 patiently-0.024
mate-0.022
zz-0.022
Token sounded
Token ID14846
Feature activation+1.01e-01

Pos logit contributions
 ranc+0.010
 officers+0.009
 men+0.009
 policemen+0.009
 goats+0.009
Neg logit contributions
ig-0.006
 perfectly-0.005
 patiently-0.005
 marker-0.004
 Linda-0.004
Token like
Token ID588
Feature activation+3.62e-02

Pos logit contributions
 ranc+0.201
 officers+0.184
 men+0.183
 policemen+0.178
 goats+0.176
Neg logit contributions
ig-0.196
 perfectly-0.157
 patiently-0.144
zz-0.140
 marker-0.139
Token a
Token ID257
Feature activation-1.45e-03

Pos logit contributions
 ranc+0.086
 officers+0.080
 men+0.078
 policemen+0.077
 goats+0.077
Neg logit contributions
ig-0.050
 perfectly-0.042
 patiently-0.038
 Linda-0.036
 marker-0.036
Token big
Token ID1263
Feature activation-7.13e-03

Pos logit contributions
ig+0.003
 perfectly+0.003
 patiently+0.002
fully+0.002
inn+0.002
Neg logit contributions
 ranc-0.003
 men-0.002
 policemen-0.002
 officers-0.002
 goats-0.002
Token boom
Token ID14166
Feature activation+5.25e-02

Pos logit contributions
ig+0.013
 perfectly+0.011
 patiently+0.011
fully+0.010
inn+0.010
Neg logit contributions
 ranc-0.014
 men-0.013
 officers-0.013
 policemen-0.013
 goats-0.012
Token!
Token ID0
Feature activation+1.47e-02

Pos logit contributions
 ranc+0.125
 officers+0.115
 men+0.113
 policemen+0.113
 goats+0.112
Neg logit contributions
ig-0.077
 perfectly-0.064
 patiently-0.058
 marker-0.053
fully-0.053
Token day
Token ID1110
Feature activation-3.47e-02

Pos logit contributions
 ranc+0.043
 officers+0.040
 men+0.039
 policemen+0.039
 goats+0.039
Neg logit contributions
ig-0.026
 perfectly-0.022
 patiently-0.020
 marker-0.018
 triangle-0.018
Token was
Token ID373
Feature activation-1.62e-02

Pos logit contributions
ig+0.055
 perfectly+0.046
 patiently+0.042
 marker+0.039
 Linda+0.039
Neg logit contributions
 ranc-0.078
 officers-0.072
 men-0.071
 policemen-0.070
 goats-0.070
Token finally
Token ID3443
Feature activation-3.95e-02

Pos logit contributions
ig+0.024
 perfectly+0.017
 marker+0.016
 triangle+0.016
 toe+0.016
Neg logit contributions
 ranc-0.037
 men-0.035
 officers-0.035
 policemen-0.035
 goats-0.034
Token over
Token ID625
Feature activation+3.75e-03

Pos logit contributions
ig+0.051
 perfectly+0.042
 patiently+0.038
 Linda+0.034
 marker+0.033
Neg logit contributions
 ranc-0.100
 officers-0.094
 men-0.093
 policemen-0.092
 goats-0.092
Token,
Token ID11
Feature activation-1.21e-02

Pos logit contributions
 ranc+0.010
 officers+0.009
 men+0.009
 policemen+0.009
 goats+0.009
Neg logit contributions
ig-0.004
 perfectly-0.003
 patiently-0.003
 Linda-0.003
 marker-0.003
Token they
Token ID484
Feature activation+6.73e-02

Pos logit contributions
ig+0.016
 perfectly+0.014
 patiently+0.013
 Kate+0.012
 Linda+0.012
Neg logit contributions
 ranc-0.030
 officers-0.028
 goats-0.026
 policemen-0.026
 men-0.026
Token both
Token ID1111
Feature activation-7.95e-03

Pos logit contributions
 ranc+0.140
 officers+0.130
 men+0.127
 policemen+0.125
 goats+0.124
Neg logit contributions
ig-0.118
 perfectly-0.098
 patiently-0.090
 marker-0.086
 toe-0.085
Token felt
Token ID2936
Feature activation+2.37e-03

Pos logit contributions
ig+0.014
 perfectly+0.012
 patiently+0.011
 marker+0.011
mate+0.011
Neg logit contributions
 ranc-0.016
 officers-0.015
 men-0.015
 goats-0.014
 policemen-0.014
Token satisfied
Token ID11378
Feature activation+1.34e-02

Pos logit contributions
 ranc+0.006
 officers+0.005
 men+0.005
 policemen+0.005
 goats+0.005
Neg logit contributions
ig-0.003
 perfectly-0.003
 patiently-0.003
 proudly-0.002
 Linda-0.002
Token with
Token ID351
Feature activation-2.68e-03

Pos logit contributions
 ranc+0.031
 officers+0.029
 men+0.029
 policemen+0.029
 goats+0.028
Neg logit contributions
ig-0.020
 perfectly-0.017
 patiently-0.016
 marker-0.014
 Linda-0.014
Token their
Token ID511
Feature activation+3.50e-02

Pos logit contributions
ig+0.004
 perfectly+0.003
 patiently+0.003
 marker+0.003
zz+0.003
Neg logit contributions
 ranc-0.006
 officers-0.006
 men-0.006
 policemen-0.006
 goats-0.006
Token The
Token ID383
Feature activation+2.62e-02

Pos logit contributions
 ranc+0.003
 men+0.003
 officers+0.003
 policemen+0.003
 goats+0.003
Neg logit contributions
ig-0.002
 Emily-0.002
 Linda-0.002
 M-0.002
 Kate-0.002
Token girl
Token ID2576
Feature activation-1.56e-02

Pos logit contributions
 ranc+0.054
 officers+0.051
 goats+0.050
 cages+0.049
 policemen+0.049
Neg logit contributions
ig-0.041
 patiently-0.036
 perfectly-0.034
 brush-0.033
 marker-0.033
Token said
Token ID531
Feature activation+1.72e-02

Pos logit contributions
ig+0.022
 perfectly+0.018
 patiently+0.017
 dab+0.015
 proudly+0.015
Neg logit contributions
 ranc-0.037
 officers-0.035
 men-0.035
 policemen-0.034
 goats-0.034
Token she
Token ID673
Feature activation+9.70e-03

Pos logit contributions
 ranc+0.042
 men+0.039
 officers+0.039
 policemen+0.038
 goats+0.037
Neg logit contributions
ig-0.025
 perfectly-0.020
 patiently-0.018
 marker-0.016
zz-0.016
Token wanted
Token ID2227
Feature activation-8.93e-03

Pos logit contributions
 ranc+0.022
 officers+0.020
 men+0.020
 goats+0.020
 policemen+0.020
Neg logit contributions
ig-0.015
 perfectly-0.013
 patiently-0.012
 marker-0.011
 Linda-0.011
Token to
Token ID284
Feature activation+2.88e-02

Pos logit contributions
ig+0.012
 perfectly+0.010
 patiently+0.009
 marker+0.008
 Linda+0.008
Neg logit contributions
 ranc-0.022
 officers-0.021
 men-0.021
 policemen-0.020
 goats-0.020
Token help
Token ID1037
Feature activation+1.93e-02

Pos logit contributions
 ranc+0.066
 officers+0.062
 men+0.061
 policemen+0.060
 goats+0.059
Neg logit contributions
ig-0.045
 perfectly-0.036
 patiently-0.032
inn-0.030
 Linda-0.030
Token the
Token ID262
Feature activation+1.75e-02

Pos logit contributions
 ranc+0.042
 officers+0.039
 men+0.039
 policemen+0.038
 goats+0.038
Neg logit contributions
ig-0.032
 perfectly-0.026
 patiently-0.024
 marker-0.023
zz-0.022
Token deaf
Token ID21815
Feature activation-1.64e-02

Pos logit contributions
 ranc+0.030
 officers+0.028
 goats+0.028
 cages+0.027
 policemen+0.026
Neg logit contributions
ig-0.032
 patiently-0.030
 brush-0.028
 crystal-0.028
 marker-0.027
Token man
Token ID582
Feature activation+4.93e-02

Pos logit contributions
ig+0.024
 pencil+0.024
 brush+0.023
 patiently+0.022
 proudly+0.022
Neg logit contributions
 ranc-0.031
 cages-0.029
girls-0.029
 goats-0.028
 officers-0.028
Token and
Token ID290
Feature activation-2.77e-03

Pos logit contributions
 ranc+0.103
 officers+0.097
 men+0.095
 policemen+0.092
 goats+0.092
Neg logit contributions
ig-0.084
 perfectly-0.068
 patiently-0.064
 Linda-0.061
 marker-0.061
Token sister
Token ID6621
Feature activation+2.39e-02

Pos logit contributions
ig+0.058
 perfectly+0.050
mate+0.046
fully+0.045
inn+0.044
Neg logit contributions
 ranc-0.071
 men-0.063
 goats-0.063
 officers-0.062
 policemen-0.060
Token.
Token ID13
Feature activation+2.33e-02

Pos logit contributions
 ranc+0.049
 officers+0.046
 men+0.044
 goats+0.044
 policemen+0.043
Neg logit contributions
ig-0.042
 perfectly-0.036
 patiently-0.033
zz-0.031
 marker-0.031
Token He
Token ID679
Feature activation+2.21e-02

Pos logit contributions
 ranc+0.049
 officers+0.045
 men+0.044
 policemen+0.044
 goats+0.044
Neg logit contributions
ig-0.039
 perfectly-0.034
 patiently-0.032
 marker-0.029
 Kate-0.029
Token loved
Token ID6151
Feature activation+1.16e-03

Pos logit contributions
 ranc+0.045
 officers+0.041
 men+0.040
 goats+0.039
 policemen+0.039
Neg logit contributions
ig-0.040
 perfectly-0.033
 patiently-0.031
 marker-0.029
mate-0.029
Token to
Token ID284
Feature activation+4.02e-02

Pos logit contributions
 ranc+0.003
 officers+0.003
 men+0.002
 policemen+0.002
 goats+0.002
Neg logit contributions
ig-0.002
 perfectly-0.001
 patiently-0.001
zz-0.001
 Linda-0.001
Token play
Token ID711
Feature activation+8.00e-03

Pos logit contributions
 ranc+0.084
 officers+0.077
 goats+0.076
 men+0.075
 policemen+0.073
Neg logit contributions
ig-0.069
 perfectly-0.057
 patiently-0.051
inn-0.050
fully-0.047
Token with
Token ID351
Feature activation-1.24e-02

Pos logit contributions
 ranc+0.017
 men+0.016
 officers+0.016
 policemen+0.015
 goats+0.015
Neg logit contributions
ig-0.013
 perfectly-0.011
 patiently-0.009
zz-0.009
 dab-0.009
Token her
Token ID607
Feature activation-1.41e-02

Pos logit contributions
ig+0.019
 perfectly+0.015
 patiently+0.014
zz+0.013
 dab+0.013
Neg logit contributions
 ranc-0.028
 officers-0.027
 men-0.027
 policemen-0.026
 goats-0.026
Token and
Token ID290
Feature activation-1.09e-03

Pos logit contributions
ig+0.027
 perfectly+0.023
 patiently+0.022
zz+0.021
 dab+0.021
Neg logit contributions
 ranc-0.026
 officers-0.024
 men-0.024
 goats-0.023
 policemen-0.023
Token they
Token ID484
Feature activation+7.78e-02

Pos logit contributions
ig+0.002
 perfectly+0.001
inn+0.001
zz+0.001
 patiently+0.001
Neg logit contributions
 ranc-0.002
 officers-0.002
 men-0.002
 goats-0.002
 policemen-0.002
Token were
Token ID547
Feature activation+7.96e-03

Pos logit contributions
 ranc+0.166
 officers+0.153
 goats+0.147
 men+0.146
 policemen+0.143
Neg logit contributions
ig-0.135
 perfectly-0.107
 patiently-0.102
 toe-0.095
 Linda-0.094
Token.
Token ID13
Feature activation+2.25e-03

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.004
 marker+0.004
fully+0.004
Neg logit contributions
 ranc-0.011
 officers-0.010
 men-0.010
 policemen-0.010
 goats-0.010
Token They
Token ID1119
Feature activation+5.52e-02

Pos logit contributions
 ranc+0.005
 officers+0.004
 men+0.004
 policemen+0.004
 goats+0.004
Neg logit contributions
ig-0.004
 Linda-0.003
 perfectly-0.003
 Emily-0.003
 Kate-0.003
Token had
Token ID550
Feature activation-4.70e-02

Pos logit contributions
 ranc+0.123
 men+0.118
 policemen+0.116
 officers+0.115
 goats+0.114
Neg logit contributions
ig-0.080
 perfectly-0.070
 patiently-0.067
 proudly-0.063
 gently-0.061
Token had
Token ID550
Feature activation-4.67e-02

Pos logit contributions
ig+0.047
 perfectly+0.044
 patiently+0.039
 coloring+0.034
 dab+0.033
Neg logit contributions
 ranc-0.123
 officers-0.114
 men-0.114
 policemen-0.113
 goats-0.108
Token an
Token ID281
Feature activation-1.36e-02

Pos logit contributions
ig+0.056
 perfectly+0.050
 patiently+0.044
 coloring+0.039
 Linda+0.039
Neg logit contributions
 ranc-0.118
 officers-0.109
 men-0.108
 policemen-0.107
 goats-0.103
Token adventure
Token ID8855
Feature activation+2.58e-02

Pos logit contributions
ig+0.024
 perfectly+0.020
 patiently+0.019
 marker+0.018
 Linda+0.018
Neg logit contributions
 ranc-0.028
 officers-0.026
 men-0.026
 policemen-0.025
 goats-0.025
Token.
Token ID13
Feature activation+6.73e-03

Pos logit contributions
 ranc+0.065
 officers+0.060
 policemen+0.059
 men+0.059
 goats+0.058
Neg logit contributions
ig-0.034
 perfectly-0.029
 patiently-0.026
fully-0.023
 proudly-0.022
Token<|endoftext|>
Token ID50256
Feature activation+9.32e-02

Pos logit contributions
 ranc+0.014
 officers+0.013
 men+0.013
 policemen+0.013
 goats+0.013
Neg logit contributions
ig-0.011
 perfectly-0.009
 Linda-0.009
 patiently-0.009
 Emily-0.009
TokenL
Token ID43
Feature activation-1.58e-02

Pos logit contributions
 ranc+0.179
 officers+0.166
 policemen+0.160
 men+0.160
 goats+0.156
Neg logit contributions
ig-0.182
 perfectly-0.151
 patiently-0.140
 triangle-0.136
on-0.136
Tokenily
Token ID813
Feature activation-3.74e-02

Pos logit contributions
ig+0.026
 perfectly+0.021
 patiently+0.019
 Linda+0.018
 marker+0.018
Neg logit contributions
 ranc-0.036
 officers-0.033
 men-0.032
 policemen-0.032
 goats-0.032
Token loved
Token ID6151
Feature activation+2.24e-02

Pos logit contributions
ig+0.052
 perfectly+0.041
 patiently+0.039
 proudly+0.036
zz+0.034
Neg logit contributions
 ranc-0.091
 officers-0.086
 policemen-0.085
 men-0.085
 goats-0.082
Token hopped
Token ID45230
Feature activation-2.03e-02

Pos logit contributions
 ranc+0.007
 officers+0.006
 men+0.006
 policemen+0.006
 goats+0.006
Neg logit contributions
ig-0.005
 perfectly-0.004
 patiently-0.003
 Linda-0.003
 marker-0.003
Token by
Token ID416
Feature activation+1.89e-02

Pos logit contributions
ig+0.019
 perfectly+0.015
 patiently+0.013
 proudly+0.011
 Linda+0.011
Neg logit contributions
 ranc-0.058
 officers-0.054
 policemen-0.054
 men-0.054
 goats-0.053
Token.
Token ID13
Feature activation-2.38e-02

Pos logit contributions
 ranc+0.031
 men+0.031
 officers+0.029
 cages+0.029
 policemen+0.029
Neg logit contributions
ig-0.038
 perfectly-0.030
pron-0.029
 marker-0.028
 patiently-0.028
Token It
Token ID632
Feature activation-1.02e-02

Pos logit contributions
ig+0.038
 perfectly+0.032
 patiently+0.032
 Linda+0.031
 Emily+0.031
Neg logit contributions
 ranc-0.050
 officers-0.046
 men-0.045
 policemen-0.045
 goats-0.044
Token looked
Token ID3114
Feature activation-1.77e-03

Pos logit contributions
ig+0.015
 perfectly+0.013
 patiently+0.012
 Linda+0.011
 proudly+0.010
Neg logit contributions
 ranc-0.024
 officers-0.022
 men-0.022
 policemen-0.022
 goats-0.021
Token up
Token ID510
Feature activation+2.53e-02

Pos logit contributions
ig+0.003
 perfectly+0.002
 patiently+0.002
zz+0.002
 marker+0.002
Neg logit contributions
 ranc-0.004
 officers-0.004
 men-0.004
 goats-0.004
 policemen-0.004
Token at
Token ID379
Feature activation+2.23e-02

Pos logit contributions
 ranc+0.052
 men+0.049
 officers+0.048
 goats+0.047
 policemen+0.047
Neg logit contributions
ig-0.043
 perfectly-0.035
 marker-0.031
mate-0.031
 dab-0.031
Token the
Token ID262
Feature activation+2.34e-02

Pos logit contributions
 ranc+0.044
 men+0.042
 goats+0.041
 officers+0.041
 policemen+0.040
Neg logit contributions
ig-0.040
 perfectly-0.033
 patiently-0.031
 dab-0.030
 paste-0.030
Token bear
Token ID6842
Feature activation-2.07e-02

Pos logit contributions
 ranc+0.055
 policemen+0.052
 announcements+0.052
 men+0.052
 officers+0.051
Neg logit contributions
ig-0.031
 leaf-0.026
 marker-0.026
 triangle-0.025
 crystal-0.025
Token,
Token ID11
Feature activation-4.35e-03

Pos logit contributions
ig+0.034
 perfectly+0.028
 patiently+0.026
 marker+0.024
 Linda+0.024
Neg logit contributions
 ranc-0.045
 officers-0.042
 men-0.041
 policemen-0.040
 goats-0.040
Token who
Token ID508
Feature activation-3.29e-02

Pos logit contributions
ig+0.007
 perfectly+0.006
 patiently+0.005
 marker+0.005
 Linda+0.005
Neg logit contributions
 ranc-0.010
 officers-0.009
 men-0.009
 policemen-0.009
 goats-0.009
Token said
Token ID531
Feature activation+3.86e-02

Pos logit contributions
 ranc+0.160
 men+0.148
 goats+0.147
 cages+0.147
 policemen+0.147
Neg logit contributions
ig-0.102
 perfectly-0.085
 patiently-0.084
fully-0.077
 proudly-0.077
Token.
Token ID13
Feature activation-4.90e-03

Pos logit contributions
 ranc+0.079
 officers+0.074
 men+0.073
 policemen+0.071
 goats+0.070
Neg logit contributions
ig-0.068
 perfectly-0.057
 patiently-0.052
 marker-0.050
 Linda-0.050
Token "
Token ID366
Feature activation+2.68e-02

Pos logit contributions
ig+0.006
 patiently+0.005
 perfectly+0.005
 proudly+0.004
 Linda+0.004
Neg logit contributions
 ranc-0.013
 officers-0.012
 goats-0.011
 men-0.011
 policemen-0.011
TokenWe
Token ID1135
Feature activation-6.16e-03

Pos logit contributions
 ranc+0.058
 officers+0.054
 men+0.053
 policemen+0.052
 goats+0.051
Neg logit contributions
ig-0.043
 perfectly-0.038
 patiently-0.036
 triangle-0.034
fully-0.034
Token hope
Token ID2911
Feature activation+3.44e-03

Pos logit contributions
ig+0.009
 perfectly+0.007
 patiently+0.007
zz+0.006
 marker+0.006
Neg logit contributions
 ranc-0.015
 men-0.014
 goats-0.013
 officers-0.013
 policemen-0.013
Token you
Token ID345
Feature activation-9.23e-03

Pos logit contributions
 ranc+0.009
 officers+0.008
 men+0.008
 policemen+0.008
 goats+0.008
Neg logit contributions
ig-0.005
 perfectly-0.004
 patiently-0.004
 Linda-0.003
fully-0.003
Token have
Token ID423
Feature activation-4.08e-02

Pos logit contributions
ig+0.013
 perfectly+0.011
 patiently+0.010
 toe+0.009
zz+0.009
Neg logit contributions
 ranc-0.022
 men-0.021
 officers-0.020
 policemen-0.020
 goats-0.020
Token a
Token ID257
Feature activation-5.38e-02

Pos logit contributions
ig+0.066
 perfectly+0.055
 patiently+0.051
 marker+0.047
 Linda+0.047
Neg logit contributions
 ranc-0.090
 officers-0.084
 men-0.083
 policemen-0.082
 goats-0.081
Token better
Token ID1365
Feature activation-3.18e-02

Pos logit contributions
ig+0.094
 perfectly+0.079
fully+0.073
inn+0.070
zz+0.069
Neg logit contributions
 ranc-0.113
 officers-0.102
 men-0.102
 policemen-0.102
 goats-0.098
Token birthday
Token ID10955
Feature activation-1.60e-02

Pos logit contributions
ig+0.047
 perfectly+0.038
 patiently+0.035
 marker+0.032
 Linda+0.032
Neg logit contributions
 ranc-0.075
 officers-0.070
 men-0.069
 policemen-0.068
 goats-0.068
Token next
Token ID1306
Feature activation-8.11e-03

Pos logit contributions
ig+0.026
 perfectly+0.021
 patiently+0.019
 toe+0.019
 Linda+0.019
Neg logit contributions
 ranc-0.035
 officers-0.033
 men-0.033
 policemen-0.032
 goats-0.031
Token was
Token ID373
Feature activation+1.85e-02

Pos logit contributions
 ranc+0.135
 policemen+0.127
 officers+0.126
 goats+0.124
 announcements+0.122
Neg logit contributions
ig-0.047
 perfectly-0.037
 patiently-0.026
 leaf-0.024
 gently-0.023
Token too
Token ID1165
Feature activation+1.20e-01

Pos logit contributions
 ranc+0.044
 officers+0.043
 men+0.041
 policemen+0.041
 goats+0.041
Neg logit contributions
ig-0.025
 perfectly-0.022
 patiently-0.020
 toe-0.018
 leaf-0.018
Token deep
Token ID2769
Feature activation-3.16e-03

Pos logit contributions
 officers+0.310
 ranc+0.298
 policemen+0.294
 men+0.283
 announcements+0.281
Neg logit contributions
ig-0.131
 patiently-0.117
 leaf-0.113
 gently-0.113
 perfectly-0.111
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
ig+0.005
 perfectly+0.004
 patiently+0.004
 marker+0.003
 Linda+0.003
Neg logit contributions
 ranc-0.007
 officers-0.007
 men-0.007
 policemen-0.007
 goats-0.006
Token The
Token ID383
Feature activation+3.16e-02

Pos logit contributions
 ranc+0.029
 officers+0.027
 men+0.026
 policemen+0.026
 goats+0.026
Neg logit contributions
ig-0.015
 perfectly-0.013
 triangle-0.012
 Kate-0.012
 Emily-0.012
Token insect
Token ID12833
Feature activation-2.66e-02

Pos logit contributions
 ranc+0.071
 men+0.067
 officers+0.065
 goats+0.065
 policemen+0.064
Neg logit contributions
ig-0.050
 perfectly-0.042
 patiently-0.038
fully-0.035
inn-0.034
Token tried
Token ID3088
Feature activation-3.49e-02

Pos logit contributions
ig+0.039
 perfectly+0.031
 patiently+0.028
 toe+0.026
 marker+0.025
Neg logit contributions
 ranc-0.064
 officers-0.060
 policemen-0.059
 men-0.059
 goats-0.058
Token to
Token ID284
Feature activation+2.16e-02

Pos logit contributions
ig+0.052
 perfectly+0.041
 patiently+0.037
 marker+0.035
 Linda+0.035
Neg logit contributions
 ranc-0.082
 officers-0.076
 men-0.076
 policemen-0.075
 goats-0.075
Token swim
Token ID9422
Feature activation+1.75e-02

Pos logit contributions
 ranc+0.052
 officers+0.050
 men+0.049
 policemen+0.049
 goats+0.048
Neg logit contributions
ig-0.029
 perfectly-0.024
 patiently-0.023
 toe-0.021
 leaf-0.021
Token but
Token ID475
Feature activation-2.65e-02

Pos logit contributions
 ranc+0.043
 officers+0.040
 men+0.040
 policemen+0.039
 goats+0.039
Neg logit contributions
ig-0.024
 perfectly-0.019
 patiently-0.017
 marker-0.016
 dab-0.016
Token it
Token ID340
Feature activation+1.41e-02

Pos logit contributions
ig+0.034
 perfectly+0.026
 patiently+0.023
 marker+0.021
 Linda+0.021
Neg logit contributions
 ranc-0.067
 officers-0.063
 men-0.063
 policemen-0.061
 goats-0.061
TokenYou
Token ID1639
Feature activation-4.29e-02

Pos logit contributions
 ranc+0.038
 officers+0.034
 goats+0.034
 men+0.033
 policemen+0.033
Neg logit contributions
ig-0.048
 perfectly-0.042
 patiently-0.040
 Linda-0.037
 marker-0.037
Token're
Token ID821
Feature activation-1.69e-02

Pos logit contributions
ig+0.070
 perfectly+0.060
 patiently+0.054
 marker+0.054
 Linda+0.052
Neg logit contributions
 ranc-0.095
 men-0.086
 officers-0.086
 policemen-0.086
 goats-0.084
Token welcome
Token ID7062
Feature activation+1.78e-02

Pos logit contributions
ig+0.028
 perfectly+0.023
 patiently+0.022
 marker+0.020
 Linda+0.020
Neg logit contributions
 ranc-0.037
 officers-0.034
 men-0.034
 policemen-0.033
 goats-0.033
Token,
Token ID11
Feature activation-3.42e-03

Pos logit contributions
 ranc+0.045
 officers+0.043
 men+0.042
 policemen+0.042
 goats+0.041
Neg logit contributions
ig-0.023
 perfectly-0.018
 patiently-0.016
 Linda-0.015
 marker-0.015
Token Ben
Token ID3932
Feature activation-2.52e-02

Pos logit contributions
ig+0.004
 perfectly+0.003
 patiently+0.003
 marker+0.003
fully+0.003
Neg logit contributions
 ranc-0.009
 officers-0.008
 men-0.008
 goats-0.008
 policemen-0.008
Token.
Token ID13
Feature activation-2.88e-02

Pos logit contributions
ig+0.040
 perfectly+0.033
 patiently+0.031
 marker+0.028
 Linda+0.028
Neg logit contributions
 ranc-0.057
 officers-0.053
 men-0.052
 policemen-0.052
 goats-0.051
Token You
Token ID921
Feature activation-3.24e-02

Pos logit contributions
ig+0.055
 perfectly+0.046
 patiently+0.045
 marker+0.042
 masterpiece+0.041
Neg logit contributions
 ranc-0.055
 officers-0.049
 men-0.049
 goats-0.049
 policemen-0.047
Token are
Token ID389
Feature activation+4.94e-03

Pos logit contributions
ig+0.053
 perfectly+0.046
 patiently+0.041
 marker+0.040
 brush+0.039
Neg logit contributions
 ranc-0.071
 men-0.065
 policemen-0.065
 officers-0.064
 goats-0.063
Token a
Token ID257
Feature activation-2.15e-02

Pos logit contributions
 ranc+0.011
 officers+0.010
 men+0.010
 policemen+0.010
 goats+0.010
Neg logit contributions
ig-0.008
 perfectly-0.006
 patiently-0.006
 marker-0.005
zz-0.005
Token good
Token ID922
Feature activation-1.21e-02

Pos logit contributions
ig+0.040
fully+0.033
 perfectly+0.033
inn+0.031
 patiently+0.031
Neg logit contributions
 ranc-0.043
 men-0.039
 policemen-0.038
 goats-0.038
 officers-0.037
Token friend
Token ID1545
Feature activation+4.69e-03

Pos logit contributions
ig+0.021
 perfectly+0.020
fully+0.017
 patiently+0.017
on+0.017
Neg logit contributions
 ranc-0.024
 men-0.023
 officers-0.021
 goats-0.021
 policemen-0.021
Token mom
Token ID1995
Feature activation+3.67e-02

Pos logit contributions
 ranc+0.006
 officers+0.006
 men+0.006
 policemen+0.006
 goats+0.006
Neg logit contributions
ig-0.004
 perfectly-0.003
 patiently-0.003
 marker-0.002
 toe-0.002
Token smiled
Token ID13541
Feature activation+1.04e-02

Pos logit contributions
 ranc+0.085
 officers+0.079
 men+0.079
 policemen+0.077
 goats+0.077
Neg logit contributions
ig-0.055
 perfectly-0.045
 patiently-0.042
 marker-0.038
 Linda-0.038
Token and
Token ID290
Feature activation-9.01e-04

Pos logit contributions
 ranc+0.022
 officers+0.021
 men+0.020
 policemen+0.020
 goats+0.020
Neg logit contributions
ig-0.018
 perfectly-0.014
 patiently-0.013
 marker-0.013
zz-0.013
Token said
Token ID531
Feature activation+6.07e-03

Pos logit contributions
ig+0.002
 perfectly+0.002
 marker+0.002
inn+0.001
mate+0.001
Neg logit contributions
 ranc-0.002
 officers-0.001
 goats-0.001
 men-0.001
 policemen-0.001
Token,
Token ID11
Feature activation-1.06e-02

Pos logit contributions
 ranc+0.013
 men+0.013
 officers+0.013
 policemen+0.012
 goats+0.012
Neg logit contributions
ig-0.010
 perfectly-0.007
 marker-0.007
 patiently-0.007
 toe-0.007
Token �
Token ID6184
Feature activation-8.59e-03

Pos logit contributions
ig+0.021
 perfectly+0.016
 toe+0.015
 marker+0.015
zz+0.015
Neg logit contributions
 ranc-0.021
 officers-0.019
 men-0.018
 goats-0.018
 policemen-0.018
Token�
Token ID95
Feature activation+2.88e-02

Pos logit contributions
ig+0.016
 perfectly+0.014
 patiently+0.013
 proudly+0.012
fully+0.012
Neg logit contributions
 ranc-0.017
 men-0.016
 goats-0.015
 policemen-0.015
 officers-0.015
Token€
Token ID26391
Feature activation-4.46e-02

Pos logit contributions
 ranc+0.059
 men+0.055
 goats+0.055
 policemen+0.054
 officers+0.053
Neg logit contributions
ig-0.048
 perfectly-0.044
 patiently-0.040
mate-0.037
 Linda-0.036
Token�
Token ID129
Feature activation+1.45e-02

Pos logit contributions
ig+0.064
 perfectly+0.051
 Emily+0.050
on+0.048
 proudly+0.046
Neg logit contributions
 ranc-0.108
 men-0.098
 goats-0.097
 seals-0.097
 officers-0.097
Token�
Token ID241
Feature activation+3.65e-02

Pos logit contributions
 ranc+0.029
 policemen+0.028
 men+0.028
 officers+0.027
 goats+0.027
Neg logit contributions
ig-0.023
 perfectly-0.022
 patiently-0.021
 ï¿½-0.018
 gently-0.018
TokenThat
Token ID2504
Feature activation-6.24e-02

Pos logit contributions
 ranc+0.076
 officers+0.070
 men+0.069
 policemen+0.068
 goats+0.068
Neg logit contributions
ig-0.065
 perfectly-0.053
 patiently-0.050
 marker-0.046
 toe-0.046
Token.
Token ID13
Feature activation-1.39e-02

Pos logit contributions
 ranc+0.067
 officers+0.062
 policemen+0.061
 goats+0.061
 men+0.061
Neg logit contributions
ig-0.032
 perfectly-0.028
 patiently-0.026
 coloring-0.023
 proudly-0.023
Token Inside
Token ID14384
Feature activation-9.35e-02

Pos logit contributions
ig+0.021
 perfectly+0.018
 patiently+0.017
 triangle+0.016
 Kate+0.016
Neg logit contributions
 ranc-0.031
 officers-0.029
 men-0.028
 goats-0.028
 policemen-0.028
Token the
Token ID262
Feature activation-3.60e-03

Pos logit contributions
ig+0.140
 perfectly+0.115
 patiently+0.106
 marker+0.097
 toe+0.096
Neg logit contributions
 ranc-0.216
 officers-0.201
 men-0.201
 policemen-0.195
 goats-0.195
Token box
Token ID3091
Feature activation-3.88e-02

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.005
fully+0.004
zz+0.004
Neg logit contributions
 ranc-0.008
 officers-0.007
 men-0.007
 policemen-0.007
 cages-0.007
Token,
Token ID11
Feature activation-1.55e-02

Pos logit contributions
ig+0.059
 perfectly+0.050
 patiently+0.046
zz+0.045
 dab+0.044
Neg logit contributions
 ranc-0.082
 men-0.079
 officers-0.078
 cages-0.074
 policemen-0.074
Token she
Token ID673
Feature activation-3.70e-04

Pos logit contributions
ig+0.027
 perfectly+0.022
 patiently+0.020
zz+0.019
 marker+0.019
Neg logit contributions
 ranc-0.033
 men-0.031
 officers-0.031
 policemen-0.030
 goats-0.029
Token found
Token ID1043
Feature activation-3.83e-02

Pos logit contributions
ig+0.001
 perfectly+0.001
fully+0.001
mate+0.001
inn+0.001
Neg logit contributions
 ranc-0.001
 officers-0.001
 men-0.001
 policemen-0.001
 goats-0.001
Token a
Token ID257
Feature activation-2.78e-02

Pos logit contributions
ig+0.066
 perfectly+0.053
 patiently+0.051
zz+0.048
 marker+0.047
Neg logit contributions
 ranc-0.080
 men-0.077
 officers-0.076
 policemen-0.074
 goats-0.073
Token beautiful
Token ID4950
Feature activation-3.74e-02

Pos logit contributions
ig+0.056
fully+0.047
 perfectly+0.046
inn+0.046
zz+0.045
Neg logit contributions
 ranc-0.051
 men-0.045
 officers-0.045
 policemen-0.044
 cages-0.043
Token dress
Token ID6576
Feature activation-2.63e-02

Pos logit contributions
ig+0.054
 perfectly+0.046
Acc+0.044
 patiently+0.044
inn+0.044
Neg logit contributions
 ranc-0.083
 men-0.075
 officers-0.074
 cages-0.073
 goats-0.072
Token.
Token ID13
Feature activation-1.32e-02

Pos logit contributions
ig+0.044
 perfectly+0.036
 patiently+0.034
 toe+0.032
 dab+0.032
Neg logit contributions
 ranc-0.056
 officers-0.053
 men-0.051
 goats-0.050
 policemen-0.049
Token the
Token ID262
Feature activation+2.60e-02

Pos logit contributions
ig+0.017
 perfectly+0.013
 patiently+0.012
 marker+0.012
 triangle+0.011
Neg logit contributions
 ranc-0.023
 officers-0.022
 men-0.021
 policemen-0.021
 goats-0.021
Token crystal
Token ID15121
Feature activation-9.79e-02

Pos logit contributions
 ranc+0.052
 officers+0.048
 men+0.048
 policemen+0.046
 goats+0.046
Neg logit contributions
ig-0.048
 perfectly-0.041
 patiently-0.038
 Linda-0.036
inn-0.036
Token stayed
Token ID9658
Feature activation+2.49e-02

Pos logit contributions
ig+0.149
 perfectly+0.122
 patiently+0.114
 Linda+0.108
 marker+0.107
Neg logit contributions
 ranc-0.219
 officers-0.204
 men-0.202
 policemen-0.196
 goats-0.196
Token on
Token ID319
Feature activation-4.37e-02

Pos logit contributions
 ranc+0.061
 officers+0.057
 men+0.057
 policemen+0.056
 goats+0.055
Neg logit contributions
ig-0.034
 perfectly-0.027
 patiently-0.025
 marker-0.023
 Linda-0.022
Token the
Token ID262
Feature activation-1.81e-03

Pos logit contributions
ig+0.069
 perfectly+0.053
 patiently+0.049
 marker+0.047
zz+0.047
Neg logit contributions
 ranc-0.100
 men-0.093
 officers-0.092
 policemen-0.091
 goats-0.091
Token rock
Token ID3881
Feature activation-9.20e-02

Pos logit contributions
ig+0.004
 perfectly+0.003
 patiently+0.003
 Linda+0.003
 marker+0.003
Neg logit contributions
 ranc-0.003
 officers-0.003
 men-0.003
 goats-0.003
 policemen-0.003
Token.
Token ID13
Feature activation-2.31e-02

Pos logit contributions
ig+0.146
 perfectly+0.120
 patiently+0.111
 marker+0.103
 Linda+0.102
Neg logit contributions
 ranc-0.205
 officers-0.191
 men-0.189
 policemen-0.185
 goats-0.184
Token That
Token ID1320
Feature activation-4.65e-02

Pos logit contributions
ig+0.040
 perfectly+0.033
 patiently+0.031
 marker+0.029
 toe+0.028
Neg logit contributions
 ranc-0.049
 officers-0.045
 men-0.045
 policemen-0.044
 goats-0.044
Token was
Token ID373
Feature activation-1.23e-02

Pos logit contributions
ig+0.050
 perfectly+0.038
 patiently+0.031
 crystal+0.029
 triangle+0.029
Neg logit contributions
 ranc-0.130
 officers-0.122
 policemen-0.120
 men-0.118
 goats-0.117
Token OK
Token ID7477
Feature activation-1.15e-02

Pos logit contributions
ig+0.020
 perfectly+0.015
 patiently+0.014
 marker+0.014
zz+0.013
Neg logit contributions
 ranc-0.028
 officers-0.026
 men-0.026
 policemen-0.025
 goats-0.025
Token,
Token ID11
Feature activation-5.09e-03

Pos logit contributions
ig+0.018
 perfectly+0.015
 patiently+0.014
 marker+0.013
 dab+0.013
Neg logit contributions
 ranc-0.025
 officers-0.024
 men-0.023
 policemen-0.023
 goats-0.023
Token bear
Token ID6842
Feature activation+7.24e-04

Pos logit contributions
 ranc+0.045
 officers+0.041
 men+0.041
 policemen+0.041
 goats+0.040
Neg logit contributions
ig-0.029
 perfectly-0.023
 patiently-0.023
 Linda-0.020
 marker-0.020
Token showed
Token ID3751
Feature activation-2.10e-02

Pos logit contributions
 ranc+0.002
 officers+0.002
 men+0.002
 policemen+0.002
 goats+0.002
Neg logit contributions
ig-0.001
 perfectly-0.001
 patiently-0.001
 marker-0.001
 Linda-0.001
Token him
Token ID683
Feature activation+1.66e-02

Pos logit contributions
ig+0.038
 perfectly+0.032
 patiently+0.029
 marker+0.028
 dab+0.027
Neg logit contributions
 ranc-0.043
 officers-0.040
 men-0.040
 policemen-0.039
 goats-0.039
Token how
Token ID703
Feature activation-2.73e-02

Pos logit contributions
 ranc+0.046
 men+0.043
 officers+0.043
 policemen+0.043
 goats+0.042
Neg logit contributions
ig-0.018
 perfectly-0.013
 patiently-0.011
 marker-0.010
 Linda-0.010
Token to
Token ID284
Feature activation-8.04e-03

Pos logit contributions
ig+0.036
 perfectly+0.029
 patiently+0.027
 marker+0.024
 Linda+0.024
Neg logit contributions
 ranc-0.069
 officers-0.064
 men-0.063
 policemen-0.063
 goats-0.062
Token smooth
Token ID7209
Feature activation-9.83e-02

Pos logit contributions
ig+0.013
 perfectly+0.010
 patiently+0.009
fully+0.009
mate+0.009
Neg logit contributions
 ranc-0.017
 men-0.016
 officers-0.016
 policemen-0.015
 cages-0.015
Token the
Token ID262
Feature activation+6.88e-03

Pos logit contributions
ig+0.155
 perfectly+0.128
 patiently+0.121
 marker+0.115
 dab+0.115
Neg logit contributions
 ranc-0.210
 men-0.201
 officers-0.195
 goats-0.193
 policemen-0.190
Token walls
Token ID7714
Feature activation+4.21e-03

Pos logit contributions
 ranc+0.017
 officers+0.016
 men+0.015
 policemen+0.015
 goats+0.015
Neg logit contributions
ig-0.010
 perfectly-0.008
 patiently-0.007
 marker-0.007
 toe-0.007
Token with
Token ID351
Feature activation-1.54e-02

Pos logit contributions
 ranc+0.008
 men+0.008
 officers+0.008
 policemen+0.008
 goats+0.007
Neg logit contributions
ig-0.007
 perfectly-0.006
 patiently-0.005
 Linda-0.005
 dab-0.005
Token his
Token ID465
Feature activation-1.52e-02

Pos logit contributions
ig+0.025
 perfectly+0.021
 patiently+0.019
 marker+0.018
 Linda+0.018
Neg logit contributions
 ranc-0.034
 officers-0.031
 men-0.031
 policemen-0.030
 goats-0.030
Token paws
Token ID49193
Feature activation-3.94e-02

Pos logit contributions
ig+0.025
 perfectly+0.021
 patiently+0.020
 Linda+0.019
fully+0.019
Neg logit contributions
 ranc-0.032
 men-0.030
 officers-0.030
 policemen-0.029
 goats-0.029
Token fun
Token ID1257
Feature activation-1.28e-02

Pos logit contributions
ig+0.070
 perfectly+0.059
 patiently+0.054
 marker+0.050
 Linda+0.049
Neg logit contributions
 ranc-0.102
 officers-0.094
 men-0.093
 policemen-0.092
 goats-0.091
Token.
Token ID13
Feature activation-8.30e-03

Pos logit contributions
ig+0.021
 perfectly+0.017
 patiently+0.016
 marker+0.015
 Linda+0.015
Neg logit contributions
 ranc-0.027
 officers-0.026
 men-0.026
 policemen-0.025
 goats-0.025
Token
Token ID198
Feature activation-4.82e-03

Pos logit contributions
ig+0.016
 perfectly+0.013
 patiently+0.012
 marker+0.012
fully+0.011
Neg logit contributions
 ranc-0.016
 officers-0.015
 men-0.015
 goats-0.014
 policemen-0.014
Token
Token ID198
Feature activation-1.57e-03

Pos logit contributions
ig+0.007
 perfectly+0.006
 patiently+0.006
 Linda+0.005
 toe+0.005
Neg logit contributions
 ranc-0.011
 officers-0.011
 men-0.010
 policemen-0.010
 goats-0.010
Token"
Token ID1
Feature activation+2.06e-02

Pos logit contributions
ig+0.003
 perfectly+0.002
 patiently+0.002
 Linda+0.002
inn+0.002
Neg logit contributions
 ranc-0.003
 officers-0.003
 men-0.003
 policemen-0.003
 goats-0.003
TokenLook
Token ID8567
Feature activation-9.68e-02

Pos logit contributions
 ranc+0.038
 officers+0.035
 goats+0.034
 men+0.034
 policemen+0.034
Neg logit contributions
ig-0.043
 perfectly-0.036
 patiently-0.034
 toe-0.032
 Linda-0.031
Token,
Token ID11
Feature activation-8.67e-03

Pos logit contributions
ig+0.158
bul+0.149
 paste+0.144
Acc+0.144
 dab+0.141
Neg logit contributions
 men-0.172
 ranc-0.167
 officers-0.166
 goats-0.157
 policemen-0.156
Token a
Token ID257
Feature activation-2.22e-02

Pos logit contributions
ig+0.013
 perfectly+0.011
 patiently+0.010
inn+0.009
fully+0.009
Neg logit contributions
 ranc-0.020
 men-0.019
 officers-0.019
 policemen-0.019
 goats-0.019
Token hose
Token ID31489
Feature activation+3.03e-02

Pos logit contributions
ig+0.042
 perfectly+0.037
 patiently+0.035
fully+0.034
inn+0.032
Neg logit contributions
 ranc-0.043
 men-0.040
 officers-0.039
 goats-0.039
 policemen-0.039
Token!"
Token ID2474
Feature activation+2.16e-02

Pos logit contributions
 ranc+0.071
 officers+0.065
 policemen+0.064
 men+0.064
 goats+0.063
Neg logit contributions
ig-0.046
 perfectly-0.037
 patiently-0.034
 marker-0.032
 triangle-0.031
Token Ben
Token ID3932
Feature activation-4.09e-03

Pos logit contributions
 ranc+0.059
 officers+0.055
 policemen+0.054
 goats+0.054
 men+0.054
Neg logit contributions
ig-0.022
 perfectly-0.017
 patiently-0.016
 Linda-0.015
 Emily-0.015
Token and
Token ID290
Feature activation-3.42e-03

Pos logit contributions
 ranc+0.054
 officers+0.050
 men+0.050
 policemen+0.049
 goats+0.049
Neg logit contributions
ig-0.036
 perfectly-0.030
 patiently-0.028
 Linda-0.025
fully-0.025
Token helped
Token ID4193
Feature activation-2.26e-02

Pos logit contributions
ig+0.006
 perfectly+0.005
 patiently+0.004
zz+0.004
 marker+0.004
Neg logit contributions
 ranc-0.007
 officers-0.007
 men-0.007
 policemen-0.007
 goats-0.006
Token him
Token ID683
Feature activation+1.68e-02

Pos logit contributions
ig+0.045
 perfectly+0.036
zz+0.033
 patiently+0.032
 bead+0.032
Neg logit contributions
 ranc-0.042
 officers-0.040
 men-0.039
 policemen-0.038
 goats-0.038
Token prop
Token ID2632
Feature activation-5.83e-02

Pos logit contributions
 ranc+0.032
 officers+0.031
 men+0.030
 policemen+0.029
 goats+0.028
Neg logit contributions
ig-0.029
 perfectly-0.023
inn-0.022
 Linda-0.021
mate-0.020
Token the
Token ID262
Feature activation-1.38e-02

Pos logit contributions
ig+0.105
 dab+0.078
zz+0.078
 blush+0.074
 perfectly+0.074
Neg logit contributions
 men-0.110
 officers-0.106
 policemen-0.102
 ranc-0.102
 cages-0.099
Token tomato
Token ID24240
Feature activation-9.07e-02

Pos logit contributions
ig+0.020
 perfectly+0.015
inn+0.013
 Linda+0.013
 patiently+0.013
Neg logit contributions
 ranc-0.034
 men-0.033
 goats-0.032
 officers-0.031
 policemen-0.031
Token plant
Token ID4618
Feature activation-3.57e-02

Pos logit contributions
ig+0.152
zz+0.119
 perfectly+0.117
inn+0.116
 Linda+0.115
Neg logit contributions
 ranc-0.168
 men-0.165
 officers-0.158
 cages-0.153
 goats-0.152
Token back
Token ID736
Feature activation-1.33e-02

Pos logit contributions
ig+0.058
 perfectly+0.044
zz+0.042
inn+0.041
 patiently+0.041
Neg logit contributions
 ranc-0.073
 men-0.071
 officers-0.069
 cages-0.067
 goats-0.067
Token up
Token ID510
Feature activation-1.55e-02

Pos logit contributions
ig+0.023
 dab+0.017
 marker+0.017
 perfectly+0.017
zz+0.017
Neg logit contributions
 ranc-0.025
 officers-0.024
 men-0.023
 cages-0.023
 policemen-0.022
Token.
Token ID13
Feature activation-2.12e-02

Pos logit contributions
ig+0.028
 perfectly+0.021
zz+0.020
 patiently+0.020
 dab+0.019
Neg logit contributions
 ranc-0.031
 officers-0.029
 men-0.029
 policemen-0.028
 cages-0.028
Token They
Token ID1119
Feature activation+4.99e-02

Pos logit contributions
ig+0.037
 perfectly+0.030
 patiently+0.027
zz+0.027
 marker+0.026
Neg logit contributions
 ranc-0.045
 officers-0.042
 men-0.041
 goats-0.040
 policemen-0.040
Token close
Token ID1969
Feature activation-1.05e-02

Pos logit contributions
ig+0.004
 perfectly+0.003
 patiently+0.003
 marker+0.003
 proudly+0.003
Neg logit contributions
 ranc-0.006
 officers-0.006
 men-0.006
 policemen-0.006
 goats-0.006
Token and
Token ID290
Feature activation-9.43e-03

Pos logit contributions
ig+0.014
 perfectly+0.012
 patiently+0.011
fully+0.010
 proudly+0.010
Neg logit contributions
 ranc-0.026
 officers-0.024
 policemen-0.024
 men-0.023
 goats-0.023
Token said
Token ID531
Feature activation+8.07e-03

Pos logit contributions
ig+0.015
 perfectly+0.011
 patiently+0.010
inn+0.010
 marker+0.010
Neg logit contributions
 ranc-0.022
 officers-0.021
 men-0.020
 policemen-0.020
 goats-0.020
Token:
Token ID25
Feature activation-1.35e-02

Pos logit contributions
 ranc+0.018
 men+0.016
 officers+0.016
 policemen+0.016
 goats+0.016
Neg logit contributions
ig-0.013
 perfectly-0.011
 patiently-0.010
 marker-0.010
 Linda-0.009
Token "
Token ID366
Feature activation+1.26e-02

Pos logit contributions
ig+0.025
 perfectly+0.020
 patiently+0.018
zz+0.018
 marker+0.017
Neg logit contributions
 ranc-0.028
 men-0.025
 officers-0.025
 goats-0.024
 policemen-0.024
TokenWow
Token ID22017
Feature activation-1.10e-01

Pos logit contributions
 ranc+0.025
 officers+0.023
 men+0.023
 goats+0.022
 policemen+0.022
Neg logit contributions
ig-0.024
 perfectly-0.020
 patiently-0.018
 Linda-0.017
 toe-0.017
Token!
Token ID0
Feature activation-4.86e-02

Pos logit contributions
ig+0.157
 perfectly+0.127
 patiently+0.118
 marker+0.113
 dab+0.111
Neg logit contributions
 ranc-0.252
 men-0.237
 officers-0.235
 policemen-0.229
 goats-0.228
Token This
Token ID770
Feature activation-3.12e-02

Pos logit contributions
ig+0.074
 perfectly+0.061
 patiently+0.056
 marker+0.051
 Linda+0.051
Neg logit contributions
 ranc-0.112
 officers-0.104
 men-0.103
 policemen-0.101
 goats-0.101
Token is
Token ID318
Feature activation+1.15e-02

Pos logit contributions
ig+0.043
 perfectly+0.034
 triangle+0.029
 marker+0.029
 patiently+0.028
Neg logit contributions
 ranc-0.078
 officers-0.071
 policemen-0.070
 goats-0.070
 men-0.068
Token so
Token ID523
Feature activation+1.54e-02

Pos logit contributions
 ranc+0.026
 officers+0.024
 men+0.024
 policemen+0.024
 goats+0.024
Neg logit contributions
ig-0.018
 perfectly-0.015
 patiently-0.014
 marker-0.013
 toe-0.012
Token cool
Token ID3608
Feature activation-4.59e-02

Pos logit contributions
 ranc+0.038
 officers+0.036
 policemen+0.035
 goats+0.035
 men+0.034
Neg logit contributions
ig-0.020
 perfectly-0.019
 patiently-0.016
 triangle-0.015
 Kate-0.014