TOP ACTIVATIONS
MAX = 0.223

ran towards him , w agging his tail and licking his
on the lawn , w agging his tail . She went
ran towards them , w agging its tail . Lily and
ran and jumped , w agging their tails .
roots , z ig z agging back and forth .
after the bone , w agging his tail with excitement .
around the house , w agging his tail .
ran towards them , w agging its tail . Lily and
Rex followed them , w agging his tail . They reached
around the yard , w agging his tail with the ribbon
together all day , w agging their tails and having fun
They saw the dog w agging his tail and licking the
playing with it and w agging his tail . He is
, he saw Spot w agging his tail again . He
down the town , w agging his tail and barking happily
, sniff ing and w agging his tail . Finally ,
after the stick , w agging his tail . He was
is pant ing and w agging his tail . He is
pet Rex . He w ags his tail and l icks
new friends . He w ags his tail and bar ks

MIN ACTIVATIONS
MIN = -0.236

was a fairy or a gn ome ," Mia says .
root , larger and more gn ar led than the others
special surprise . A thoughtful gn ome was playing the most
she had made a garden gn ome . The
bunny was a very quick lear ner and soon he was
. The gn ome was so pleased with
being brave was the best choice . The
. Sam was a quick lear ner and soon he was
purple car was the best choice . They took it home
," replied the ha ird ress er . " It feels
Rex would be a good lear ner and look er too
still so the ha ird ress er could do a good
. Peter was a fast lear ner , and after a
she told the ha ird ress er , " I want
said that was a perfect choice . We bought
to remain until another weary traveler needed it . <|endoftext|> Once
being happy was the best choice for her to remain .
was now the mouse 's reality . <|endoftext|> Once upon a
and thanked the ha ird ress er . As they were
N ora was a fast lear ner . Soon enough she

INTERVAL 0.108 to 0.223
CONTAINS 1.015%

a un icycle . She laughed and cl apped , she
" Maybe Mom will laugh too ," Lily said .
both . Now , let 's have some cookies and milk
, and they rolled all over the land . They rolled
animal and said " Let 's go cl ap for my

INTERVAL -0.006 to 0.108
CONTAINS 57.916%

quickly flew away but the fire had spread too far .
air . The light is good for the sky . The
let it splash with the water . The ducks came closer
that his favorite toy was missing . He looked everywhere ,
my . Then he said ,

INTERVAL -0.121 to -0.006
CONTAINS 39.949%

couldn 't reach it . She laughed and yelled , "
ground and Lily couldn 't get it out .
to each other . <|endoftext|> Once upon a time , there
She had been very brave and powerful . <|endoftext|> One day
! You are not my friend !" Tom felt

INTERVAL -0.236 to -0.121
CONTAINS 1.120%

and practice , the 3 year old started getting better and
, he had finally saved enough ! He was so happy
Lily watched out the window as they drove . She saw
's go inside and see how much it costs ."
and to follow good advice from those who care about you
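The four INTERVAL bounds above are consistent with equal-width bins between MIN (-0.236) and MAX (0.223): each bin is about 0.115 wide, and the CONTAINS percentages sum to 100%. A minimal Python sketch of that binning follows, assuming this is how the dashboard builds its intervals; the function names are hypothetical and not the tool's API.

```python
from bisect import bisect_right


def interval_edges(lo: float, hi: float, n_bins: int = 4) -> list[float]:
    """Equal-width bin edges from lo (MIN) to hi (MAX).

    The last edge is set to hi exactly so floating-point error cannot
    push the maximum activation outside the top bin.
    """
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(n_bins)] + [hi]


def interval_shares(activations: list[float], edges: list[float]) -> list[float]:
    """Percentage of activations in each [edges[i], edges[i+1]) bin.

    The top bin is closed on the right so a value equal to MAX is counted.
    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for a in activations:
        if not edges[0] <= a <= edges[-1]:
            continue
        # bisect_right returns the number of edges <= a; minus 1 is the bin index.
        # Clamp so that a == edges[-1] falls into the last bin.
        i = min(bisect_right(edges, a) - 1, len(counts) - 1)
        counts[i] += 1
    total = len(activations)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]
```

For example, interval_edges(-0.236, 0.223) gives edges of roughly -0.236, -0.121, -0.006, 0.108 and 0.223, matching the printed INTERVAL bounds up to rounding.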

PER-TOKEN ACTIVATIONS AND LOGIT CONTRIBUTIONS

Token ran
Token ID4966
Feature activation-6.34e-02

Pos logit contributions
ang+0.004
iker+0.004
 sur+0.004
lves+0.004
 energ+0.004
Neg logit contributions
gh-0.003
 sounds-0.002
 noses-0.002
 color-0.002
thia-0.002
Token towards
Token ID3371
Feature activation+3.21e-02

Pos logit contributions
ang+0.127
 sur+0.113
 begg+0.111
lves+0.110
iker+0.109
Neg logit contributions
gh-0.071
Their-0.063
 color-0.062
 respect-0.062
 noses-0.062
Token him
Token ID683
Feature activation+7.84e-02

Pos logit contributions
gh+0.031
 draw+0.029
 color+0.029
 talk+0.029
 respect+0.029
Neg logit contributions
ang-0.071
 energ-0.065
 sur-0.063
iker-0.062
 begg-0.062
Token,
Token ID11
Feature activation+4.58e-02

Pos logit contributions
gh+0.078
 color+0.072
 noses+0.070
 respect+0.067
 sounds+0.067
Neg logit contributions
ang-0.161
 sur-0.156
lves-0.150
 begg-0.150
iker-0.148
Token w
Token ID266
Feature activation+3.14e-02

Pos logit contributions
gh+0.055
 noses+0.054
 color+0.052
 respect+0.050
 sounds+0.050
Neg logit contributions
ang-0.084
 sur-0.078
iker-0.078
 begg-0.076
lves-0.074
Tokenagging
Token ID16406
Feature activation+2.15e-01

Pos logit contributions
 color+0.035
 draw+0.032
 discuss+0.030
 opinions+0.030
gh+0.030
Neg logit contributions
ang-0.066
 sur-0.059
iker-0.059
aro-0.056
aroo-0.056
Token his
Token ID465
Feature activation+5.04e-02

Pos logit contributions
gh+0.373
 opinions+0.343
 color+0.342
 noses+0.339
 cooperate+0.336
Neg logit contributions
ang-0.293
 sur-0.268
iker-0.250
lves-0.248
 begg-0.245
Token tail
Token ID7894
Feature activation+6.21e-02

Pos logit contributions
gh+0.061
 cooperate+0.053
 opinions+0.053
 color+0.052
 argue+0.051
Neg logit contributions
ang-0.095
 sur-0.083
aroo-0.083
lves-0.083
iker-0.082
Token and
Token ID290
Feature activation-9.95e-03

Pos logit contributions
gh+0.087
 opinions+0.079
 color+0.078
 noses+0.077
 cooperate+0.075
Neg logit contributions
ang-0.105
 sur-0.097
iker-0.092
lves-0.092
 begg-0.091
Token licking
Token ID43390
Feature activation+8.28e-02

Pos logit contributions
ang+0.018
iker+0.017
 sur+0.017
lves+0.017
 begg+0.016
Neg logit contributions
 noses-0.012
gh-0.012
 color-0.012
 sounds-0.011
 faces-0.011
Token his
Token ID465
Feature activation+3.49e-02

Pos logit contributions
gh+0.135
 color+0.123
 noses+0.123
 opinions+0.121
 respect+0.117
Neg logit contributions
ang-0.125
 sur-0.115
iker-0.110
lves-0.109
 begg-0.108
Token on
Token ID319
Feature activation+6.41e-02

Pos logit contributions
ang+0.042
 sur+0.040
lves+0.039
iker+0.038
 begg+0.038
Neg logit contributions
gh-0.033
 color-0.030
 noses-0.029
 sounds-0.028
 opinions-0.028
Token the
Token ID262
Feature activation+4.96e-03

Pos logit contributions
gh+0.099
 color+0.092
 noses+0.090
 opinions+0.089
 respect+0.086
Neg logit contributions
ang-0.106
 sur-0.097
lves-0.094
iker-0.093
 begg-0.091
Token lawn
Token ID21133
Feature activation+4.32e-02

Pos logit contributions
gh+0.006
 color+0.006
 noses+0.005
 rhythm+0.005
 opinions+0.005
Neg logit contributions
ang-0.010
 sur-0.009
lves-0.008
iker-0.008
 begg-0.008
Token,
Token ID11
Feature activation+3.32e-02

Pos logit contributions
gh+0.052
 color+0.049
 noses+0.047
 opinions+0.046
 sounds+0.045
Neg logit contributions
ang-0.082
 sur-0.077
lves-0.075
iker-0.074
 begg-0.073
Token w
Token ID266
Feature activation+2.51e-02

Pos logit contributions
 color+0.041
 noses+0.041
gh+0.040
 sounds+0.039
 draw+0.039
Neg logit contributions
ang-0.062
iker-0.057
 sur-0.056
lves-0.055
 begg-0.054
Tokenagging
Token ID16406
Feature activation+2.12e-01

Pos logit contributions
gh+0.032
 color+0.029
 noses+0.029
 opinions+0.028
 cooperate+0.027
Neg logit contributions
ang-0.046
 sur-0.043
iker-0.041
lves-0.041
 begg-0.040
Token his
Token ID465
Feature activation+4.53e-02

Pos logit contributions
gh+0.317
 opinions+0.288
 color+0.284
 cooperate+0.283
 noses+0.282
Neg logit contributions
ang-0.333
 sur-0.309
lves-0.288
 Rory-0.287
iker-0.287
Token tail
Token ID7894
Feature activation+5.59e-02

Pos logit contributions
gh+0.048
 opinions+0.040
 cooperate+0.040
 color+0.039
etrical+0.038
Neg logit contributions
ang-0.092
 sur-0.085
iker-0.081
 begg-0.081
aroo-0.081
Token.
Token ID13
Feature activation+1.36e-02

Pos logit contributions
gh+0.070
 color+0.063
 noses+0.063
 opinions+0.061
 respect+0.058
Neg logit contributions
ang-0.104
 sur-0.097
lves-0.094
iker-0.093
 begg-0.092
Token She
Token ID1375
Feature activation-1.36e-02

Pos logit contributions
 color+0.014
gh+0.014
 noses+0.013
Their+0.013
thia+0.013
Neg logit contributions
ang-0.026
iker-0.026
lves-0.026
 sur-0.025
ester-0.025
Token went
Token ID1816
Feature activation-8.57e-02

Pos logit contributions
ang+0.026
iker+0.025
lves+0.024
 sur+0.023
 begg+0.023
Neg logit contributions
gh-0.016
 noses-0.016
 sounds-0.015
 color-0.015
 opinions-0.015
Token ran
Token ID4966
Feature activation-7.20e-02

Pos logit contributions
gh+0.093
 color+0.086
 noses+0.086
 opinions+0.082
 sounds+0.082
Neg logit contributions
ang-0.112
 sur-0.105
iker-0.103
lves-0.099
 begg-0.098
Token towards
Token ID3371
Feature activation+2.56e-02

Pos logit contributions
ang+0.114
 sur+0.105
iker+0.100
 begg+0.098
lves+0.097
Neg logit contributions
gh-0.109
 noses-0.100
 color-0.100
 opinions-0.098
 respect-0.097
Token them
Token ID606
Feature activation+8.56e-02

Pos logit contributions
gh+0.026
 color+0.023
 noses+0.023
 opinions+0.023
 respect+0.021
Neg logit contributions
ang-0.054
 sur-0.051
iker-0.049
lves-0.048
 begg-0.048
Token,
Token ID11
Feature activation+5.11e-02

Pos logit contributions
gh+0.111
 color+0.099
 noses+0.098
 opinions+0.097
 cooperate+0.093
Neg logit contributions
ang-0.156
 sur-0.146
iker-0.139
lves-0.139
 begg-0.138
Token w
Token ID266
Feature activation+3.43e-02

Pos logit contributions
gh+0.071
 color+0.062
 opinions+0.061
 noses+0.060
 cooperate+0.059
Neg logit contributions
ang-0.090
 sur-0.083
lves-0.079
iker-0.078
 begg-0.078
Tokenagging
Token ID16406
Feature activation+2.12e-01

Pos logit contributions
 color+0.042
 draw+0.038
tr+0.037
 opinions+0.037
 discuss+0.036
Neg logit contributions
ang-0.071
 sur-0.062
iker-0.061
aro-0.060
aroo-0.058
Token its
Token ID663
Feature activation+8.35e-02

Pos logit contributions
gh+0.375
 cooperate+0.372
icious+0.364
etrical+0.360
 opinions+0.357
Neg logit contributions
ang-0.242
aroo-0.215
 sur-0.209
 he-0.202
aro-0.199
Token tail
Token ID7894
Feature activation+5.34e-02

Pos logit contributions
gh+0.131
etrical+0.113
 cooperate+0.111
 discuss+0.109
ographs+0.107
Neg logit contributions
ang-0.130
lves-0.109
aroo-0.106
 Kang-0.105
 fisherman-0.103
Token.
Token ID13
Feature activation+1.45e-02

Pos logit contributions
gh+0.084
 opinions+0.076
 cooperate+0.073
 color+0.072
etrical+0.072
Neg logit contributions
ang-0.082
 sur-0.073
 begg-0.069
iker-0.069
aroo-0.068
Token Lily
Token ID20037
Feature activation+1.26e-02

Pos logit contributions
Their+0.015
gh+0.014
 color+0.013
 noses+0.013
 opinions+0.013
Neg logit contributions
ang-0.029
iker-0.029
lves-0.028
 sur-0.028
 traveller-0.028
Token and
Token ID290
Feature activation-1.74e-02

Pos logit contributions
gh+0.015
 noses+0.014
 color+0.014
 opinions+0.013
 sounds+0.013
Neg logit contributions
ang-0.024
 sur-0.022
iker-0.022
lves-0.022
 begg-0.021
Token ran
Token ID4966
Feature activation-5.50e-02

Pos logit contributions
gh+0.076
 color+0.069
 noses+0.068
 opinions+0.065
 draw+0.064
Neg logit contributions
ang-0.120
 sur-0.112
iker-0.109
 begg-0.108
lves-0.106
Token and
Token ID290
Feature activation+1.17e-02

Pos logit contributions
ang+0.086
 sur+0.079
iker+0.075
 begg+0.075
lves+0.075
Neg logit contributions
gh-0.085
 color-0.078
 noses-0.077
 opinions-0.076
 cooperate-0.074
Token jumped
Token ID11687
Feature activation+5.59e-03

Pos logit contributions
gh+0.016
 noses+0.015
 color+0.015
 opinions+0.014
 cooperate+0.014
Neg logit contributions
ang-0.020
 sur-0.019
iker-0.018
 begg-0.018
lves-0.018
Token,
Token ID11
Feature activation+4.95e-02

Pos logit contributions
gh+0.008
 color+0.007
 noses+0.007
 opinions+0.007
 cooperate+0.007
Neg logit contributions
ang-0.010
 sur-0.009
iker-0.008
lves-0.008
 begg-0.008
Token w
Token ID266
Feature activation+2.90e-02

Pos logit contributions
gh+0.061
 noses+0.056
 color+0.056
 opinions+0.052
 draw+0.052
Neg logit contributions
ang-0.091
 sur-0.085
iker-0.084
 begg-0.083
lves-0.081
Tokenagging
Token ID16406
Feature activation+2.11e-01

Pos logit contributions
 color+0.034
 draw+0.032
tr+0.031
gh+0.031
 argue+0.031
Neg logit contributions
ang-0.059
 sur-0.051
iker-0.049
lves-0.049
aro-0.049
Token their
Token ID511
Feature activation+1.16e-01

Pos logit contributions
gh+0.349
 cooperate+0.323
etrical+0.317
 opinions+0.314
 draw+0.312
Neg logit contributions
ang-0.287
 sur-0.252
aroo-0.247
 Rory-0.241
lves-0.240
Token tails
Token ID30514
Feature activation+1.47e-01

Pos logit contributions
gh+0.131
tr+0.101
 erase+0.100
 argue+0.100
etrical+0.099
Neg logit contributions
ang-0.230
lves-0.202
 sur-0.195
 Nem-0.194
aroo-0.190
Token.
Token ID13
Feature activation+2.10e-02

Pos logit contributions
gh+0.231
 opinions+0.207
 color+0.197
 cooperate+0.197
thia+0.196
Neg logit contributions
ang-0.227
 sur-0.206
 Rory-0.193
iker-0.191
lves-0.189
Token\n
Token ID198
Feature activation+2.14e-02

Pos logit contributions
gh+0.026
 color+0.024
 noses+0.024
Their+0.024
 opinions+0.023
Neg logit contributions
ang-0.038
 sur-0.037
iker-0.036
 begg-0.036
 traveller-0.035
Token\n
Token ID198
Feature activation+2.27e-02

Pos logit contributions
gh+0.034
 color+0.031
 noses+0.031
 opinions+0.030
 cooperate+0.029
Neg logit contributions
ang-0.033
 sur-0.030
iker-0.029
lves-0.029
 begg-0.028
Token roots
Token ID11135
Feature activation+1.01e-02

Pos logit contributions
gh+0.181
 color+0.162
 opinions+0.162
 noses+0.161
 cooperate+0.156
Neg logit contributions
ang-0.194
 sur-0.180
iker-0.171
lves-0.170
 begg-0.169
Token,
Token ID11
Feature activation+4.01e-02

Pos logit contributions
gh+0.011
 color+0.011
 noses+0.010
 draw+0.010
 shape+0.010
Neg logit contributions
ang-0.019
 sur-0.019
lves-0.018
iker-0.018
 begg-0.018
Token z
Token ID1976
Feature activation-8.44e-02

Pos logit contributions
 color+0.047
 noses+0.046
gh+0.044
 draw+0.043
 respect+0.041
Neg logit contributions
ang-0.078
 sur-0.072
iker-0.072
 begg-0.069
ester-0.067
Tokenig
Token ID328
Feature activation-5.09e-02

Pos logit contributions
ang+0.124
 sur+0.120
lves+0.117
 begg+0.116
 energ+0.113
Neg logit contributions
gh-0.138
 noses-0.121
 opinions-0.116
 color-0.116
ap-0.112
Token z
Token ID1976
Feature activation-8.49e-02

Pos logit contributions
 sur+0.104
ang+0.103
 begg+0.098
iker+0.097
 Rory+0.097
Neg logit contributions
gh-0.057
 noses-0.046
 color-0.045
thia-0.042
 sounds-0.041
Tokenagging
Token ID16406
Feature activation+2.09e-01

Pos logit contributions
ang+0.127
 sur+0.125
lves+0.122
 begg+0.120
 energ+0.117
Neg logit contributions
gh-0.138
 noses-0.119
 opinions-0.113
 color-0.113
ags-0.111
Token back
Token ID736
Feature activation+7.64e-02

Pos logit contributions
gh+0.325
 cooperate+0.303
 opinions+0.300
 noses+0.287
 argue+0.286
Neg logit contributions
ang-0.307
 sur-0.279
 Rory-0.267
lves-0.265
aroo-0.263
Token and
Token ID290
Feature activation-7.70e-03

Pos logit contributions
 color+0.086
gh+0.085
 noses+0.081
 opinions+0.077
 sounds+0.076
Neg logit contributions
ang-0.147
 sur-0.144
iker-0.138
lves-0.137
 begg-0.137
Token forth
Token ID6071
Feature activation+7.91e-02

Pos logit contributions
ang+0.015
iker+0.014
 sur+0.014
 begg+0.014
lves+0.014
Neg logit contributions
 color-0.009
 noses-0.008
gh-0.008
 sounds-0.008
 rhythm-0.008
Token.
Token ID13
Feature activation+1.21e-02

Pos logit contributions
gh+0.106
 color+0.095
 opinions+0.094
 noses+0.094
 cooperate+0.090
Neg logit contributions
ang-0.140
 sur-0.130
iker-0.124
lves-0.124
 begg-0.123
Token\n
Token ID198
Feature activation+1.94e-02

Pos logit contributions
 color+0.010
Her+0.010
thia+0.010
 draw+0.010
 Her+0.009
Neg logit contributions
ang-0.026
 sur-0.025
iker-0.025
 begg-0.024
lves-0.024
Token after
Token ID706
Feature activation-9.24e-03

Pos logit contributions
ang+0.094
 sur+0.086
 begg+0.082
iker+0.082
lves+0.080
Neg logit contributions
gh-0.088
 color-0.079
 noses-0.079
Their-0.077
 respect-0.077
Token the
Token ID262
Feature activation-2.24e-02

Pos logit contributions
ang+0.018
 sur+0.017
iker+0.017
 begg+0.017
ester+0.017
Neg logit contributions
gh-0.011
Their-0.010
 rhythm-0.009
 noses-0.009
 purpose-0.009
Token bone
Token ID9970
Feature activation-1.44e-02

Pos logit contributions
ang+0.033
 sur+0.030
iker+0.029
 begg+0.029
lves+0.028
Neg logit contributions
gh-0.037
 opinions-0.033
 color-0.033
 noses-0.033
 cooperate-0.032
Token,
Token ID11
Feature activation+4.78e-02

Pos logit contributions
ang+0.026
 sur+0.024
lves+0.023
iker+0.023
 begg+0.023
Neg logit contributions
gh-0.019
 color-0.017
 noses-0.016
 opinions-0.016
 sounds-0.016
Token w
Token ID266
Feature activation+3.13e-02

Pos logit contributions
 noses+0.045
 color+0.045
gh+0.044
 quarrel+0.042
 sounds+0.040
Neg logit contributions
ang-0.101
 sur-0.094
 begg-0.093
iker-0.092
ester-0.090
Tokenagging
Token ID16406
Feature activation+2.09e-01

Pos logit contributions
 color+0.035
 draw+0.031
 opinions+0.031
 argue+0.030
 noses+0.030
Neg logit contributions
ang-0.065
iker-0.059
 sur-0.058
lves-0.056
 Rory-0.055
Token his
Token ID465
Feature activation+5.15e-02

Pos logit contributions
gh+0.360
 cooperate+0.352
 opinions+0.348
etrical+0.338
 draw+0.333
Neg logit contributions
ang-0.254
 sur-0.231
aroo-0.217
lves-0.217
 Rory-0.214
Token tail
Token ID7894
Feature activation+5.79e-02

Pos logit contributions
gh+0.075
 cooperate+0.068
 argue+0.068
 opinions+0.068
etrical+0.067
Neg logit contributions
ang-0.082
lves-0.069
aroo-0.066
iker-0.065
 fisherman-0.064
Token with
Token ID351
Feature activation-1.55e-02

Pos logit contributions
gh+0.089
 opinions+0.082
 cooperate+0.079
 color+0.078
 noses+0.077
Neg logit contributions
ang-0.090
 sur-0.082
iker-0.077
 begg-0.076
lves-0.076
Token excitement
Token ID14067
Feature activation+5.58e-02

Pos logit contributions
ang+0.028
 sur+0.026
iker+0.025
lves+0.025
 begg+0.025
Neg logit contributions
gh-0.020
 color-0.018
 noses-0.018
 opinions-0.018
 respect-0.017
Token.
Token ID13
Feature activation+6.95e-03

Pos logit contributions
gh+0.074
 color+0.066
 noses+0.065
 opinions+0.065
 cooperate+0.062
Neg logit contributions
ang-0.100
 sur-0.093
iker-0.089
lves-0.089
 begg-0.088
Token around
Token ID1088
Feature activation+1.07e-01

Pos logit contributions
ang+0.012
 sur+0.011
iker+0.011
lves+0.010
 begg+0.010
Neg logit contributions
gh-0.009
 color-0.008
 noses-0.008
 opinions-0.008
 sounds-0.008
Token the
Token ID262
Feature activation+4.47e-03

Pos logit contributions
gh+0.166
 color+0.153
 noses+0.150
 opinions+0.147
 draw+0.144
Neg logit contributions
ang-0.170
 sur-0.157
iker-0.151
lves-0.150
 begg-0.147
Token house
Token ID2156
Feature activation+1.75e-03

Pos logit contributions
gh+0.008
 opinions+0.007
 cooperate+0.007
 noses+0.007
 argue+0.007
Neg logit contributions
ang-0.006
 sur-0.006
 begg-0.006
lves-0.005
 Kang-0.005
Token,
Token ID11
Feature activation+4.92e-02

Pos logit contributions
gh+0.002
 color+0.002
 noses+0.002
 opinions+0.002
 cooperate+0.002
Neg logit contributions
ang-0.003
 sur-0.003
iker-0.003
lves-0.003
 begg-0.003
Token w
Token ID266
Feature activation+3.51e-02

Pos logit contributions
gh+0.066
 color+0.061
 noses+0.060
 opinions+0.059
 draw+0.057
Neg logit contributions
ang-0.087
 sur-0.080
iker-0.077
lves-0.077
 begg-0.076
Tokenagging
Token ID16406
Feature activation+2.09e-01

Pos logit contributions
 color+0.042
 argue+0.038
tr+0.038
 draw+0.037
 opinions+0.037
Neg logit contributions
ang-0.072
 sur-0.062
iker-0.060
lves-0.058
ester-0.058
Token his
Token ID465
Feature activation+4.86e-02

Pos logit contributions
gh+0.386
 cooperate+0.373
 opinions+0.365
etrical+0.355
 argue+0.354
Neg logit contributions
ang-0.237
 sur-0.212
 Rory-0.199
aroo-0.194
lves-0.190
Token tail
Token ID7894
Feature activation+5.72e-02

Pos logit contributions
gh+0.078
 argue+0.074
 cooperate+0.071
 communicate+0.069
 erase+0.068
Neg logit contributions
ang-0.074
 elderly-0.061
 Kang-0.057
 fisherman-0.057
 begg-0.056
Token.
Token ID13
Feature activation+1.09e-02

Pos logit contributions
gh+0.082
 opinions+0.073
 noses+0.072
 color+0.072
 cooperate+0.071
Neg logit contributions
ang-0.096
 sur-0.088
iker-0.083
 begg-0.083
lves-0.083
Token\n
Token ID198
Feature activation+2.14e-02

Pos logit contributions
gh+0.016
 color+0.015
 noses+0.015
 opinions+0.015
 cooperate+0.014
Neg logit contributions
ang-0.017
 sur-0.016
lves-0.015
iker-0.015
 begg-0.015
Token\n
Token ID198
Feature activation+2.26e-02

Pos logit contributions
gh+0.038
 noses+0.036
 opinions+0.035
 color+0.035
 faces+0.034
Neg logit contributions
ang-0.029
 sur-0.025
 begg-0.023
 Rory-0.023
lves-0.023
Token ran
Token ID4966
Feature activation-6.66e-02

Pos logit contributions
gh+0.093
 noses+0.087
 color+0.086
 sounds+0.083
 quarrel+0.080
Neg logit contributions
ang-0.123
 sur-0.116
iker-0.115
 traveller-0.109
lves-0.109
Token towards
Token ID3371
Feature activation+2.58e-02

Pos logit contributions
ang+0.113
 sur+0.103
 begg+0.099
iker+0.098
lves+0.095
Neg logit contributions
gh-0.094
 noses-0.087
 color-0.086
Their-0.085
 respect-0.084
Token them
Token ID606
Feature activation+8.84e-02

Pos logit contributions
gh+0.024
 color+0.021
 noses+0.020
 opinions+0.020
 respect+0.019
Neg logit contributions
ang-0.057
 sur-0.054
iker-0.052
lves-0.051
 begg-0.051
Token,
Token ID11
Feature activation+5.18e-02

Pos logit contributions
gh+0.105
 color+0.096
 noses+0.095
 opinions+0.091
 respect+0.089
Neg logit contributions
ang-0.168
 sur-0.160
iker-0.153
lves-0.152
 begg-0.152
Token w
Token ID266
Feature activation+3.50e-02

Pos logit contributions
gh+0.069
 color+0.061
 opinions+0.061
 noses+0.060
 cooperate+0.058
Neg logit contributions
ang-0.093
 sur-0.087
lves-0.082
iker-0.082
 begg-0.081
Tokenagging
Token ID16406
Feature activation+2.07e-01

Pos logit contributions
 color+0.040
 draw+0.037
tr+0.036
 opinions+0.036
 discuss+0.034
Neg logit contributions
ang-0.074
 sur-0.065
iker-0.064
aro-0.063
aroo-0.061
Token its
Token ID663
Feature activation+8.22e-02

Pos logit contributions
gh+0.353
 cooperate+0.352
 opinions+0.339
etrical+0.337
 draw+0.330
Neg logit contributions
ang-0.254
 sur-0.228
aroo-0.226
aro-0.212
 Rory-0.212
Token tail
Token ID7894
Feature activation+5.30e-02

Pos logit contributions
gh+0.135
etrical+0.117
 cooperate+0.117
 discuss+0.113
ographs+0.111
Neg logit contributions
ang-0.123
 Kang-0.100
lves-0.099
 elderly-0.098
aroo-0.097
Token.
Token ID13
Feature activation+1.20e-02

Pos logit contributions
gh+0.079
 opinions+0.071
 color+0.070
 noses+0.069
 cooperate+0.068
Neg logit contributions
ang-0.086
 sur-0.078
iker-0.074
 begg-0.073
lves-0.073
Token Lily
Token ID20037
Feature activation+1.21e-02

Pos logit contributions
Their+0.013
gh+0.013
 color+0.012
 noses+0.011
 draw+0.011
Neg logit contributions
ang-0.024
iker-0.023
 sur-0.023
lves-0.022
 traveller-0.022
Token and
Token ID290
Feature activation-1.83e-02

Pos logit contributions
gh+0.013
 noses+0.012
 color+0.012
 sounds+0.011
 opinions+0.011
Neg logit contributions
ang-0.023
 sur-0.022
iker-0.022
lves-0.021
 begg-0.021
Token Rex
Token ID17853
Feature activation+2.08e-02

Pos logit contributions
gh+0.030
 noses+0.027
 color+0.027
 opinions+0.026
 cooperate+0.025
Neg logit contributions
ang-0.044
 sur-0.041
iker-0.040
lves-0.039
 begg-0.039
Token followed
Token ID3940
Feature activation-4.30e-02

Pos logit contributions
gh+0.030
 color+0.027
 noses+0.027
 opinions+0.027
 cooperate+0.026
Neg logit contributions
ang-0.035
 sur-0.032
iker-0.030
lves-0.030
 begg-0.030
Token them
Token ID606
Feature activation+8.27e-02

Pos logit contributions
ang+0.098
 sur+0.093
lves+0.090
iker+0.089
 energ+0.089
Neg logit contributions
gh-0.038
 noses-0.033
 color-0.032
Their-0.031
 respect-0.031
Token,
Token ID11
Feature activation+4.63e-02

Pos logit contributions
gh+0.117
 color+0.105
 noses+0.104
 opinions+0.104
 cooperate+0.099
Neg logit contributions
ang-0.142
 sur-0.132
iker-0.125
lves-0.125
 begg-0.123
Token w
Token ID266
Feature activation+3.22e-02

Pos logit contributions
gh+0.065
 noses+0.060
 color+0.060
 opinions+0.059
 respect+0.057
Neg logit contributions
ang-0.078
 sur-0.072
iker-0.069
lves-0.068
 begg-0.068
Tokenagging
Token ID16406
Feature activation+2.07e-01

Pos logit contributions
 color+0.043
 draw+0.039
tr+0.038
 discuss+0.037
 argue+0.036
Neg logit contributions
ang-0.064
iker-0.055
aro-0.054
 sur-0.054
ela-0.052
Token his
Token ID465
Feature activation+5.36e-02

Pos logit contributions
gh+0.416
 cooperate+0.400
 opinions+0.393
etrical+0.386
 draw+0.381
Neg logit contributions
ang-0.200
 sur-0.185
 Rory-0.172
aroo-0.164
 Ned-0.158
Token tail
Token ID7894
Feature activation+5.86e-02

Pos logit contributions
gh+0.078
 cooperate+0.067
 opinions+0.066
etrical+0.064
 argue+0.063
Neg logit contributions
ang-0.089
lves-0.076
aroo-0.076
 begg-0.073
 fisherman-0.073
Token.
Token ID13
Feature activation+1.35e-02

Pos logit contributions
gh+0.090
 opinions+0.080
 cooperate+0.078
 color+0.078
 noses+0.077
Neg logit contributions
ang-0.092
 sur-0.084
iker-0.079
 begg-0.079
 Rory-0.078
Token They
Token ID1119
Feature activation+6.31e-02

Pos logit contributions
gh+0.020
 noses+0.020
Their+0.019
 sounds+0.019
 color+0.019
Neg logit contributions
ang-0.022
 sur-0.020
lves-0.019
iker-0.019
 begg-0.019
Token reached
Token ID4251
Feature activation-1.42e-02

Pos logit contributions
 color+0.073
 noses+0.073
gh+0.073
 draw+0.070
 quarrel+0.069
Neg logit contributions
ang-0.119
iker-0.108
 sur-0.107
lves-0.103
 begg-0.103
Token around
Token ID1088
Feature activation+9.76e-02

Pos logit contributions
ang+0.180
lves+0.160
 begg+0.160
iker+0.159
 sur+0.158
Neg logit contributions
gh-0.092
 rhythm-0.079
 color-0.077
 draw-0.076
 respect-0.074
Token the
Token ID262
Feature activation-8.73e-04

Pos logit contributions
gh+0.136
 draw+0.128
 color+0.127
 rhythm+0.126
 quarrel+0.122
Neg logit contributions
ang-0.168
iker-0.157
lves-0.153
 begg-0.150
unn-0.147
Token yard
Token ID12699
Feature activation-4.90e-05

Pos logit contributions
ang+0.001
 sur+0.001
lves+0.001
 begg+0.001
iker+0.001
Neg logit contributions
gh-0.001
 opinions-0.001
 noses-0.001
 cooperate-0.001
 argue-0.001
Token,
Token ID11
Feature activation+4.29e-02

Pos logit contributions
ang+0.000
 sur+0.000
lves+0.000
iker+0.000
 begg+0.000
Neg logit contributions
gh-0.000
 color-0.000
 noses-0.000
 opinions-0.000
 draw-0.000
Token w
Token ID266
Feature activation+2.98e-02

Pos logit contributions
gh+0.048
 color+0.046
 noses+0.045
 draw+0.045
 rhythm+0.042
Neg logit contributions
ang-0.082
iker-0.077
 sur-0.075
 begg-0.074
 traveller-0.074
Tokenagging
Token ID16406
Feature activation+2.07e-01

Pos logit contributions
 color+0.034
 argue+0.031
gh+0.031
 noses+0.030
 draw+0.029
Neg logit contributions
ang-0.062
 sur-0.056
iker-0.054
lves-0.053
ester-0.051
Token his
Token ID465
Feature activation+4.49e-02

Pos logit contributions
gh+0.360
 opinions+0.339
 cooperate+0.339
 noses+0.325
etrical+0.325
Neg logit contributions
ang-0.268
 sur-0.245
 Rory-0.227
lves-0.223
aroo-0.222
Token tail
Token ID7894
Feature activation+5.60e-02

Pos logit contributions
gh+0.063
 argue+0.056
 cooperate+0.053
 opinions+0.052
 communicate+0.051
Neg logit contributions
ang-0.078
lves-0.066
 sur-0.065
aroo-0.064
iker-0.062
Token with
Token ID351
Feature activation-1.70e-02

Pos logit contributions
gh+0.077
 color+0.069
 noses+0.068
 opinions+0.068
 cooperate+0.065
Neg logit contributions
ang-0.097
 sur-0.091
iker-0.086
lves-0.086
 begg-0.085
Token the
Token ID262
Feature activation-3.54e-02

Pos logit contributions
ang+0.035
iker+0.032
 sur+0.032
lves+0.032
 begg+0.031
Neg logit contributions
gh-0.018
 color-0.018
 noses-0.017
 respect-0.017
 rhythm-0.017
Token ribbon
Token ID29092
Feature activation-2.42e-02

Pos logit contributions
ang+0.066
 sur+0.062
iker+0.059
lves+0.059
 begg+0.058
Neg logit contributions
gh-0.044
 color-0.038
 opinions-0.038
 noses-0.038
 cooperate-0.037
Token together
Token ID1978
Feature activation+1.17e-01

Pos logit contributions
gh+0.027
 color+0.024
 noses+0.024
 opinions+0.024
thia+0.024
Neg logit contributions
ang-0.026
 sur-0.024
 Rory-0.022
 energ-0.022
lves-0.022
Token all
Token ID477
Feature activation+2.94e-02

Pos logit contributions
gh+0.217
 opinions+0.209
thia+0.202
 noses+0.198
etrical+0.194
Neg logit contributions
ang-0.142
 sur-0.125
 Rory-0.115
 he-0.113
 energ-0.110
Token day
Token ID1110
Feature activation-3.16e-02

Pos logit contributions
gh+0.054
 noses+0.048
 opinions+0.048
 color+0.047
 cooperate+0.046
Neg logit contributions
ang-0.040
 sur-0.035
lves-0.033
iker-0.032
 begg-0.032
Token,
Token ID11
Feature activation+5.23e-02

Pos logit contributions
ang+0.049
 sur+0.043
lves+0.040
iker+0.039
 energ+0.039
Neg logit contributions
gh-0.049
 opinions-0.049
 respect-0.045
 color-0.045
 noses-0.045
Token w
Token ID266
Feature activation+3.44e-02

Pos logit contributions
gh+0.071
 color+0.063
 noses+0.063
 opinions+0.062
 cooperate+0.060
Neg logit contributions
ang-0.092
 sur-0.086
iker-0.082
lves-0.082
 begg-0.081
Tokenagging
Token ID16406
Feature activation+2.07e-01

Pos logit contributions
 color+0.043
gh+0.039
 draw+0.038
 argue+0.038
 noses+0.037
Neg logit contributions
ang-0.072
lves-0.059
aroo-0.058
 sur-0.058
aro-0.058
Token their
Token ID511
Feature activation+1.11e-01

Pos logit contributions
gh+0.330
 cooperate+0.291
etrical+0.287
 opinions+0.284
 argue+0.278
Neg logit contributions
ang-0.304
lves-0.269
 sur-0.267
 Rory-0.261
aroo-0.259
Token tails
Token ID30514
Feature activation+1.44e-01

Pos logit contributions
gh+0.138
 argue+0.102
 mascara+0.101
etrical+0.101
 cooperate+0.098
Neg logit contributions
ang-0.215
lves-0.198
 elderly-0.178
 Nem-0.174
aroo-0.174
Token and
Token ID290
Feature activation-2.21e-03

Pos logit contributions
gh+0.240
 opinions+0.212
etrical+0.204
 cooperate+0.203
thia+0.202
Neg logit contributions
ang-0.213
 sur-0.187
 Rory-0.175
iker-0.174
lves-0.174
Token having
Token ID1719
Feature activation+5.12e-02

Pos logit contributions
ang+0.004
 sur+0.004
lves+0.003
 Rory+0.003
iker+0.003
Neg logit contributions
gh-0.003
 opinions-0.003
 color-0.003
Their-0.003
 cooperate-0.003
Token fun
Token ID1257
Feature activation+1.13e-01

Pos logit contributions
gh+0.059
 color+0.051
 noses+0.050
 opinions+0.049
 cooperate+0.048
Neg logit contributions
ang-0.097
 sur-0.092
 begg-0.088
 Rory-0.088
lves-0.088
Token They
Token ID1119
Feature activation+6.63e-02

Pos logit contributions
gh+0.021
 noses+0.019
 color+0.019
 opinions+0.019
Their+0.018
Neg logit contributions
ang-0.040
 sur-0.038
iker-0.036
lves-0.036
 begg-0.036
Token saw
Token ID2497
Feature activation+1.84e-02

Pos logit contributions
gh+0.085
 color+0.081
 noses+0.080
 opinions+0.078
 draw+0.077
Neg logit contributions
ang-0.119
 sur-0.108
iker-0.106
 begg-0.104
lves-0.103
Token the
Token ID262
Feature activation-3.67e-02

Pos logit contributions
 noses+0.022
 faces+0.021
 sounds+0.021
 color+0.021
 opinions+0.021
Neg logit contributions
ang-0.036
 sur-0.032
iker-0.032
lves-0.031
 begg-0.031
Token dog
Token ID3290
Feature activation-4.85e-02

Pos logit contributions
ang+0.047
 sur+0.042
lves+0.039
iker+0.039
 begg+0.039
Neg logit contributions
gh-0.068
 color-0.063
 noses-0.063
 opinions-0.062
 sounds-0.061
Token w
Token ID266
Feature activation+3.57e-02

Pos logit contributions
ang+0.087
 sur+0.079
lves+0.077
 begg+0.075
iker+0.075
Neg logit contributions
gh-0.064
 opinions-0.058
 color-0.057
 noses-0.056
 cooperate-0.055
Tokenagging
Token ID16406
Feature activation+2.07e-01

Pos logit contributions
 color+0.050
tr+0.044
 draw+0.043
 discuss+0.040
gh+0.040
Neg logit contributions
ang-0.071
iker-0.061
 sur-0.059
lves-0.056
 Rory-0.056
Token his
Token ID465
Feature activation+4.52e-02

Pos logit contributions
 cooperate+0.385
gh+0.385
etrical+0.368
 opinions+0.366
icious+0.363
Neg logit contributions
ang-0.217
aroo-0.188
 sur-0.188
 Rory-0.184
 he-0.178
Token tail
Token ID7894
Feature activation+5.59e-02

Pos logit contributions
gh+0.068
 cooperate+0.059
etrical+0.058
 opinions+0.058
 argue+0.057
Neg logit contributions
ang-0.073
aroo-0.060
lves-0.059
 fisherman-0.058
 elderly-0.057
Token and
Token ID290
Feature activation-7.41e-03

Pos logit contributions
gh+0.091
 opinions+0.083
 cooperate+0.081
etrical+0.079
Their+0.079
Neg logit contributions
ang-0.082
 sur-0.072
iker-0.068
 Rory-0.068
 begg-0.067
Token licking
Token ID43390
Feature activation+7.81e-02

Pos logit contributions
ang+0.013
 sur+0.012
iker+0.012
lves+0.011
 begg+0.011
Neg logit contributions
gh-0.010
 noses-0.009
 color-0.009
 opinions-0.009
 respect-0.009
Token the
Token ID262
Feature activation-2.50e-02

Pos logit contributions
gh+0.111
 opinions+0.099
 cooperate+0.098
thia+0.098
etrical+0.097
Neg logit contributions
ang-0.127
 sur-0.118
 Rory-0.117
 Ned-0.110
 Nem-0.109
Token playing
Token ID2712
Feature activation+4.39e-02

Pos logit contributions
gh+0.058
 color+0.053
 noses+0.052
 opinions+0.051
 respect+0.050
Neg logit contributions
ang-0.063
 sur-0.059
iker-0.057
lves-0.056
 begg-0.054
Token with
Token ID351
Feature activation-2.17e-03

Pos logit contributions
gh+0.055
 color+0.048
 sounds+0.046
 rhythm+0.046
 draw+0.046
Neg logit contributions
ang-0.084
 sur-0.079
lves-0.078
iker-0.078
 begg-0.076
Token it
Token ID340
Feature activation+7.80e-02

Pos logit contributions
ang+0.004
 sur+0.004
iker+0.003
 energ+0.003
 begg+0.003
Neg logit contributions
 noses-0.003
gh-0.003
 color-0.003
 opinions-0.003
 sounds-0.003
Token and
Token ID290
Feature activation-2.94e-03

Pos logit contributions
gh+0.084
 color+0.078
 noses+0.073
 sounds+0.072
 opinions+0.071
Neg logit contributions
ang-0.160
 sur-0.156
lves-0.149
iker-0.148
 begg-0.146
Token w
Token ID266
Feature activation+3.41e-02

Pos logit contributions
ang+0.005
iker+0.005
lves+0.005
 sur+0.005
ester+0.005
Neg logit contributions
 noses-0.004
 sounds-0.004
 color-0.004
 opinions-0.004
 draw-0.004
Tokenagging
Token ID16406
Feature activation+2.06e-01

Pos logit contributions
 color+0.034
gh+0.031
tr+0.030
 draw+0.028
 noses+0.027
Neg logit contributions
ang-0.074
 sur-0.066
iker-0.066
 Rory-0.063
ester-0.062
Token his
Token ID465
Feature activation+4.91e-02

Pos logit contributions
gh+0.383
 cooperate+0.358
 opinions+0.358
 noses+0.353
etrical+0.353
Neg logit contributions
ang-0.234
 sur-0.212
 Rory-0.194
lves-0.191
aroo-0.191
Token tail
Token ID7894
Feature activation+5.55e-02

Pos logit contributions
gh+0.067
 opinions+0.055
etrical+0.055
 cooperate+0.055
 discuss+0.054
Neg logit contributions
ang-0.087
aroo-0.074
 sur-0.073
lves-0.072
 begg-0.072
Token.
Token ID13
Feature activation+1.11e-02

Pos logit contributions
gh+0.069
 color+0.062
 noses+0.061
 opinions+0.061
 respect+0.058
Neg logit contributions
ang-0.103
 sur-0.097
iker-0.092
lves-0.092
 begg-0.091
Token He
Token ID679
Feature activation-1.45e-05

Pos logit contributions
Their+0.013
 color+0.012
gh+0.012
 names+0.011
 draw+0.011
Neg logit contributions
lves-0.021
ang-0.021
 energ-0.020
 traveller-0.020
iker-0.019
Token is
Token ID318
Feature activation+3.86e-02

Pos logit contributions
ang+0.000
iker+0.000
 Wil+0.000
 sur+0.000
 traveller+0.000
Neg logit contributions
 sounds-0.000
 noses-0.000
 opinions-0.000
 draw-0.000
 color-0.000
Token,
Token ID11
Feature activation+3.95e-02

Pos logit contributions
ang+0.140
lves+0.133
iker+0.125
 sur+0.123
 energ+0.122
Neg logit contributions
gh-0.054
 color-0.049
 sounds-0.042
 noses-0.042
 draw-0.042
Token he
Token ID339
Feature activation-3.57e-02

Pos logit contributions
 color+0.054
 draw+0.048
 talk+0.048
 joke+0.047
 respect+0.046
Neg logit contributions
ang-0.083
lves-0.071
iker-0.071
unn-0.071
ester-0.067
Token saw
Token ID2497
Feature activation+7.07e-04

Pos logit contributions
ang+0.079
iker+0.072
lves+0.071
aro+0.070
aroo+0.069
Neg logit contributions
 draw-0.037
 drew-0.035
gh-0.034
 color-0.034
 joke-0.034
Token Spot
Token ID15899
Feature activation+5.70e-03

Pos logit contributions
 color+0.001
 opinions+0.001
gh+0.001
 sounds+0.001
 noses+0.001
Neg logit contributions
ang-0.001
iker-0.001
 sur-0.001
 begg-0.001
 energ-0.001
Token w
Token ID266
Feature activation+3.55e-02

Pos logit contributions
gh+0.007
 color+0.006
 noses+0.006
 opinions+0.006
 draw+0.006
Neg logit contributions
ang-0.011
 sur-0.010
iker-0.010
lves-0.010
 begg-0.009
Tokenagging
Token ID16406
Feature activation+2.05e-01

Pos logit contributions
gh+0.033
 color+0.033
 noses+0.029
 draw+0.029
 opinions+0.028
Neg logit contributions
ang-0.077
 sur-0.071
iker-0.070
 begg-0.067
lves-0.067
Token his
Token ID465
Feature activation+4.36e-02

Pos logit contributions
gh+0.364
 cooperate+0.333
 opinions+0.332
 noses+0.329
 color+0.328
Neg logit contributions
ang-0.264
 sur-0.242
iker-0.224
lves-0.221
 begg-0.220
Token tail
Token ID7894
Feature activation+5.33e-02

Pos logit contributions
gh+0.050
 cooperate+0.043
 opinions+0.042
etrical+0.041
 color+0.040
Neg logit contributions
ang-0.084
 sur-0.075
iker-0.073
 begg-0.073
aroo-0.073
Token again
Token ID757
Feature activation+3.08e-02

Pos logit contributions
gh+0.075
 opinions+0.067
 noses+0.066
 color+0.066
 cooperate+0.064
Neg logit contributions
ang-0.091
 sur-0.083
iker-0.080
 begg-0.079
lves-0.078
Token.
Token ID13
Feature activation+1.39e-02

Pos logit contributions
gh+0.038
 color+0.034
 noses+0.033
 opinions+0.033
 cooperate+0.032
Neg logit contributions
ang-0.058
 sur-0.055
lves-0.053
iker-0.053
 begg-0.052
Token He
Token ID679
Feature activation-3.87e-04

Pos logit contributions
gh+0.016
 color+0.016
 noses+0.016
 opinions+0.015
thia+0.015
Neg logit contributions
ang-0.026
 sur-0.024
lves-0.024
iker-0.023
 energ-0.023
Token down
Token ID866
Feature activation+4.33e-02

Pos logit contributions
ang+0.010
 sur+0.009
iker+0.009
lves+0.009
 begg+0.009
Neg logit contributions
gh-0.007
 color-0.007
 noses-0.006
 opinions-0.006
 draw-0.006
Token the
Token ID262
Feature activation-8.77e-03

Pos logit contributions
 color+0.055
 sounds+0.054
gh+0.052
 rhythm+0.051
 cooperate+0.051
Neg logit contributions
ang-0.078
lves-0.073
iker-0.073
 sur-0.069
 traveller-0.069
Token town
Token ID3240
Feature activation-2.47e-02

Pos logit contributions
ang+0.015
 sur+0.014
iker+0.013
lves+0.013
 begg+0.013
Neg logit contributions
gh-0.013
 color-0.012
 noses-0.011
 opinions-0.011
 sounds-0.011
Token,
Token ID11
Feature activation+4.90e-02

Pos logit contributions
ang+0.046
 sur+0.043
lves+0.042
iker+0.042
 begg+0.042
Neg logit contributions
gh-0.030
 color-0.029
 opinions-0.027
 noses-0.027
 sounds-0.027
Token w
Token ID266
Feature activation+3.43e-02

Pos logit contributions
gh+0.058
 color+0.057
 noses+0.055
 draw+0.054
 quarrel+0.054
Neg logit contributions
ang-0.093
 sur-0.084
iker-0.083
 begg-0.082
lves-0.082
Tokenagging
Token ID16406
Feature activation+2.05e-01

Pos logit contributions
 color+0.034
gh+0.032
 noses+0.030
 draw+0.030
 opinions+0.029
Neg logit contributions
ang-0.075
 sur-0.067
iker-0.066
lves-0.064
ester-0.062
Token his
Token ID465
Feature activation+4.65e-02

Pos logit contributions
gh+0.368
 opinions+0.337
 cooperate+0.335
 noses+0.331
 color+0.329
Neg logit contributions
ang-0.264
 sur-0.241
iker-0.222
lves-0.222
 Rory-0.218
Token tail
Token ID7894
Feature activation+5.61e-02

Pos logit contributions
gh+0.069
 argue+0.063
 cooperate+0.061
 erase+0.059
 communicate+0.059
Neg logit contributions
ang-0.078
 elderly-0.063
 sur-0.061
lves-0.061
 fisherman-0.061
Token and
Token ID290
Feature activation-1.22e-02

Pos logit contributions
gh+0.077
 color+0.068
 noses+0.068
 opinions+0.068
 cooperate+0.065
Neg logit contributions
ang-0.098
 sur-0.091
iker-0.087
lves-0.087
 begg-0.086
Token barking
Token ID48961
Feature activation+6.21e-02

Pos logit contributions
ang+0.026
 sur+0.024
lves+0.023
iker+0.023
 begg+0.023
Neg logit contributions
gh-0.012
 color-0.012
 noses-0.011
 opinions-0.011
 draw-0.010
Token happily
Token ID18177
Feature activation-4.84e-02

Pos logit contributions
gh+0.087
 color+0.079
 noses+0.078
 opinions+0.078
 cooperate+0.074
Neg logit contributions
ang-0.106
 sur-0.099
iker-0.094
lves-0.094
 begg-0.093
Token,
Token ID11
Feature activation+4.90e-02

Pos logit contributions
ang+0.049
 sur+0.046
iker+0.044
lves+0.044
 begg+0.043
Neg logit contributions
gh-0.038
 color-0.034
 noses-0.033
 opinions-0.033
 cooperate-0.032
Token sniff
Token ID26300
Feature activation+1.30e-02

Pos logit contributions
gh+0.063
 color+0.059
 noses+0.057
 opinions+0.055
 draw+0.055
Neg logit contributions
ang-0.089
 sur-0.082
iker-0.080
 begg-0.078
lves-0.078
Tokening
Token ID278
Feature activation+6.13e-02

Pos logit contributions
 sounds+0.018
gh+0.017
 noses+0.017
 color+0.017
 shapes+0.017
Neg logit contributions
 sur-0.021
ang-0.020
iker-0.019
 begg-0.019
 energ-0.018
Token and
Token ID290
Feature activation-1.82e-02

Pos logit contributions
gh+0.085
 noses+0.080
 color+0.079
 opinions+0.077
 sounds+0.075
Neg logit contributions
ang-0.106
 sur-0.099
iker-0.094
lves-0.093
 begg-0.092
Token w
Token ID266
Feature activation+3.29e-02

Pos logit contributions
ang+0.031
 sur+0.029
iker+0.028
lves+0.028
 begg+0.027
Neg logit contributions
gh-0.026
 color-0.023
 opinions-0.023
 noses-0.023
 cooperate-0.022
Tokenagging
Token ID16406
Feature activation+2.05e-01

Pos logit contributions
 color+0.052
tr+0.047
 draw+0.047
 rhythm+0.046
 argue+0.046
Neg logit contributions
ang-0.054
iker-0.046
 sur-0.044
lves-0.044
ester-0.044
Token his
Token ID465
Feature activation+4.91e-02

Pos logit contributions
gh+0.389
 cooperate+0.375
 opinions+0.365
etrical+0.358
 respect+0.354
Neg logit contributions
ang-0.213
 sur-0.196
 Rory-0.187
lves-0.182
 Wil-0.182
Token tail
Token ID7894
Feature activation+5.48e-02

Pos logit contributions
gh+0.068
 cooperate+0.063
 argue+0.062
 communicate+0.058
etrical+0.057
Neg logit contributions
ang-0.074
lves-0.068
 begg-0.066
 elderly-0.064
 energ-0.064
Token.
Token ID13
Feature activation-2.88e-04

Pos logit contributions
gh+0.081
 opinions+0.072
 cooperate+0.071
Their+0.070
 noses+0.070
Neg logit contributions
ang-0.087
 sur-0.079
iker-0.076
 begg-0.076
 Rory-0.075
Token Finally
Token ID9461
Feature activation-5.61e-02

Pos logit contributions
ang+0.000
 sur+0.000
iker+0.000
lves+0.000
 begg+0.000
Neg logit contributions
gh-0.000
 color-0.000
 noses-0.000
 opinions-0.000
 cooperate-0.000
Token,
Token ID11
Feature activation+1.45e-02

Pos logit contributions
ang+0.125
lves+0.115
 sur+0.115
iker+0.111
 begg+0.108
Neg logit contributions
gh-0.061
 color-0.048
 noses-0.046
 draw-0.044
Their-0.043
Token after
Token ID706
Feature activation-1.13e-02

Pos logit contributions
ang+0.100
 sur+0.091
 begg+0.087
iker+0.087
lves+0.085
Neg logit contributions
gh-0.100
 color-0.090
 noses-0.090
Their-0.088
 respect-0.088
Token the
Token ID262
Feature activation-2.76e-02

Pos logit contributions
ang+0.024
 sur+0.023
lves+0.022
iker+0.022
 begg+0.022
Neg logit contributions
gh-0.012
Their-0.011
 rhythm-0.010
 color-0.010
 noses-0.010
Token stick
Token ID4859
Feature activation+9.06e-03

Pos logit contributions
ang+0.042
 sur+0.038
iker+0.036
lves+0.036
 begg+0.036
Neg logit contributions
gh-0.045
 opinions-0.041
 noses-0.040
 color-0.040
 cooperate-0.039
Token,
Token ID11
Feature activation+4.55e-02

Pos logit contributions
gh+0.012
 color+0.011
 noses+0.011
 opinions+0.011
 cooperate+0.010
Neg logit contributions
ang-0.016
 sur-0.015
iker-0.014
lves-0.014
 begg-0.014
Token w
Token ID266
Feature activation+2.99e-02

Pos logit contributions
gh+0.056
 noses+0.053
 color+0.053
 opinions+0.049
 quarrel+0.049
Neg logit contributions
ang-0.084
 sur-0.077
iker-0.076
 begg-0.075
lves-0.073
Tokenagging
Token ID16406
Feature activation+2.04e-01

Pos logit contributions
 color+0.039
 draw+0.035
 discuss+0.034
 argue+0.034
tr+0.033
Neg logit contributions
ang-0.060
iker-0.052
 sur-0.051
lves-0.049
aroo-0.049
Token his
Token ID465
Feature activation+4.61e-02

Pos logit contributions
gh+0.375
 cooperate+0.371
 opinions+0.369
etrical+0.357
 draw+0.353
Neg logit contributions
ang-0.225
 sur-0.204
aroo-0.194
 Rory-0.189
lves-0.184
Token tail
Token ID7894
Feature activation+5.46e-02

Pos logit contributions
gh+0.071
 opinions+0.065
 cooperate+0.065
 argue+0.064
 discuss+0.063
Neg logit contributions
ang-0.071
aroo-0.058
lves-0.058
 sur-0.054
 fisherman-0.054
Token.
Token ID13
Feature activation+3.11e-03

Pos logit contributions
gh+0.088
 opinions+0.082
 cooperate+0.079
 color+0.077
etrical+0.077
Neg logit contributions
ang-0.080
 sur-0.073
iker-0.068
 Rory-0.067
 begg-0.067
Token He
Token ID679
Feature activation-4.09e-03

Pos logit contributions
gh+0.004
Their+0.004
 color+0.004
 noses+0.004
 sounds+0.004
Neg logit contributions
ang-0.006
 sur-0.005
iker-0.005
 traveller-0.005
lves-0.005
Token was
Token ID373
Feature activation+1.04e-02

Pos logit contributions
ang+0.007
iker+0.007
 sur+0.007
 begg+0.007
 traveller+0.006
Neg logit contributions
gh-0.005
 noses-0.005
 color-0.005
 sounds-0.005
 quarrel-0.005
Token is
Token ID318
Feature activation+3.81e-02

Pos logit contributions
gh+0.027
 noses+0.025
 sounds+0.024
 opinions+0.024
 color+0.024
Neg logit contributions
ang-0.041
 sur-0.039
iker-0.038
lves-0.037
 begg-0.036
Token pant
Token ID15857
Feature activation-1.09e-03

Pos logit contributions
gh+0.060
 color+0.055
 opinions+0.054
 noses+0.054
 cooperate+0.052
Neg logit contributions
ang-0.059
 sur-0.055
iker-0.051
 begg-0.051
lves-0.051
Tokening
Token ID278
Feature activation+5.36e-02

Pos logit contributions
ang+0.002
lves+0.002
 sur+0.002
 begg+0.002
 Rory+0.002
Neg logit contributions
gh-0.001
 noses-0.001
 color-0.001
 sounds-0.001
 opinions-0.001
Token and
Token ID290
Feature activation-2.15e-02

Pos logit contributions
gh+0.053
 noses+0.047
 sounds+0.046
 color+0.046
 opinions+0.044
Neg logit contributions
ang-0.115
 sur-0.110
lves-0.109
iker-0.107
 begg-0.105
Token w
Token ID266
Feature activation+2.74e-02

Pos logit contributions
ang+0.041
iker+0.039
 sur+0.039
lves+0.039
 traveller+0.038
Neg logit contributions
 noses-0.025
 color-0.023
 sounds-0.023
gh-0.023
 faces-0.022
Tokenagging
Token ID16406
Feature activation+2.04e-01

Pos logit contributions
 color+0.026
gh+0.025
 noses+0.022
 opinions+0.022
 draw+0.022
Neg logit contributions
ang-0.061
 sur-0.055
iker-0.055
lves-0.052
 Rory-0.052
Token his
Token ID465
Feature activation+5.42e-02

Pos logit contributions
gh+0.384
 opinions+0.356
 cooperate+0.354
 noses+0.352
 color+0.352
Neg logit contributions
ang-0.236
 sur-0.217
 Rory-0.197
 begg-0.197
iker-0.196
Token tail
Token ID7894
Feature activation+6.08e-02

Pos logit contributions
gh+0.065
 opinions+0.057
 cooperate+0.057
etrical+0.055
 color+0.054
Neg logit contributions
ang-0.100
 sur-0.091
 begg-0.090
iker-0.089
aroo-0.087
Token.
Token ID13
Feature activation+1.16e-02

Pos logit contributions
gh+0.081
 color+0.072
 opinions+0.072
 noses+0.071
 cooperate+0.069
Neg logit contributions
ang-0.107
 sur-0.100
iker-0.095
 begg-0.094
lves-0.094
Token He
Token ID679
Feature activation+2.03e-03

Pos logit contributions
 sounds+0.017
gh+0.016
 noses+0.016
Their+0.016
 names+0.015
Neg logit contributions
ang-0.019
lves-0.018
 energ-0.017
 sur-0.017
ester-0.017
Token is
Token ID318
Feature activation+4.18e-02

Pos logit contributions
 sounds+0.002
 noses+0.002
 opinions+0.002
gh+0.002
 draw+0.002
Neg logit contributions
ang-0.004
iker-0.004
 sur-0.004
lves-0.004
aroo-0.004
Token pet
Token ID4273
Feature activation-2.24e-02

Pos logit contributions
gh+0.042
 color+0.036
 noses+0.036
 opinions+0.036
 cooperate+0.033
Neg logit contributions
ang-0.077
 sur-0.073
lves-0.069
iker-0.069
 begg-0.069
Token Rex
Token ID17853
Feature activation+2.37e-02

Pos logit contributions
ang+0.035
iker+0.034
 energ+0.033
 sur+0.033
 begg+0.033
Neg logit contributions
 noses-0.034
 sounds-0.032
 respect-0.031
 talk-0.031
 names-0.031
Token.
Token ID13
Feature activation+1.74e-02

Pos logit contributions
gh+0.032
 color+0.029
 noses+0.028
 opinions+0.028
 respect+0.027
Neg logit contributions
ang-0.042
 sur-0.039
iker-0.037
lves-0.037
 begg-0.037
Token He
Token ID679
Feature activation+7.13e-03

Pos logit contributions
 noses+0.017
gh+0.017
 color+0.016
 sounds+0.016
Their+0.016
Neg logit contributions
ang-0.035
 sur-0.034
iker-0.033
lves-0.033
 traveller-0.032
Token w
Token ID266
Feature activation+4.60e-02

Pos logit contributions
gh+0.009
 noses+0.009
 sounds+0.009
 opinions+0.009
 color+0.008
Neg logit contributions
ang-0.013
iker-0.012
 sur-0.012
lves-0.011
 begg-0.011
Tokenags
Token ID3775
Feature activation+2.03e-01

Pos logit contributions
gh+0.036
 color+0.034
 draw+0.027
tr+0.027
 noses+0.026
Neg logit contributions
ang-0.112
 sur-0.101
iker-0.098
 Rory-0.098
aro-0.096
Token his
Token ID465
Feature activation+6.43e-02

Pos logit contributions
gh+0.383
 cooperate+0.344
 opinions+0.343
 noses+0.339
 draw+0.334
Neg logit contributions
ang-0.263
 sur-0.228
aroo-0.219
aro-0.214
 Rory-0.213
Token tail
Token ID7894
Feature activation+6.97e-02

Pos logit contributions
gh+0.088
 cooperate+0.077
 opinions+0.073
 argue+0.071
 draw+0.070
Neg logit contributions
ang-0.118
aroo-0.100
lves-0.097
 sur-0.097
 Rory-0.097
Token and
Token ID290
Feature activation+8.60e-03

Pos logit contributions
gh+0.113
 opinions+0.099
 noses+0.096
 color+0.096
 cooperate+0.096
Neg logit contributions
ang-0.106
 sur-0.095
 begg-0.091
 Rory-0.089
iker-0.087
Token l
Token ID300
Feature activation-3.97e-02

Pos logit contributions
 sounds+0.012
 noses+0.012
 respect+0.011
 talk+0.011
 faces+0.011
Neg logit contributions
ang-0.015
iker-0.015
 sur-0.014
ester-0.014
 traveller-0.014
Tokenicks
Token ID3378
Feature activation+8.91e-02

Pos logit contributions
ang+0.082
 sur+0.076
iker+0.073
 Rory+0.072
 begg+0.071
Neg logit contributions
gh-0.044
 color-0.040
 noses-0.038
 opinions-0.037
 rhythm-0.035
Token new
Token ID649
Feature activation+5.70e-02

Pos logit contributions
gh+0.058
 opinions+0.055
 sounds+0.055
 noses+0.054
 color+0.053
Neg logit contributions
ang-0.093
 sur-0.088
iker-0.086
lves-0.082
 traveller-0.082
Token friends
Token ID2460
Feature activation+1.14e-02

Pos logit contributions
gh+0.114
etrical+0.098
 color+0.098
Their+0.098
 noses+0.097
Neg logit contributions
ang-0.068
 sur-0.061
iker-0.055
 begg-0.055
 Rory-0.053
Token.
Token ID13
Feature activation+1.78e-02

Pos logit contributions
gh+0.015
 noses+0.014
 color+0.013
 opinions+0.013
 cooperate+0.013
Neg logit contributions
ang-0.020
 sur-0.019
iker-0.018
lves-0.018
 begg-0.018
Token He
Token ID679
Feature activation+6.70e-03

Pos logit contributions
 color+0.024
 opinions+0.024
 respect+0.023
Their+0.023
 sounds+0.023
Neg logit contributions
ang-0.030
lves-0.028
 sur-0.027
iker-0.027
aroo-0.027
Token w
Token ID266
Feature activation+4.83e-02

Pos logit contributions
 sounds+0.007
 opinions+0.007
 draw+0.007
 respect+0.007
 noses+0.007
Neg logit contributions
ang-0.014
iker-0.013
 Wil-0.012
 sur-0.012
aroo-0.012
Tokenags
Token ID3775
Feature activation+2.03e-01

Pos logit contributions
gh+0.038
 color+0.035
 noses+0.030
tr+0.029
 sounds+0.027
Neg logit contributions
ang-0.116
 sur-0.105
iker-0.103
 Rory-0.100
ester-0.098
Token his
Token ID465
Feature activation+6.36e-02

Pos logit contributions
gh+0.341
 color+0.316
 noses+0.309
 opinions+0.308
 respect+0.297
Neg logit contributions
ang-0.298
 sur-0.272
lves-0.261
iker-0.260
 begg-0.253
Token tail
Token ID7894
Feature activation+6.74e-02

Pos logit contributions
gh+0.077
 cooperate+0.062
 opinions+0.062
 respect+0.059
 color+0.059
Neg logit contributions
ang-0.125
lves-0.111
 sur-0.111
aroo-0.108
 begg-0.107
Token and
Token ID290
Feature activation+7.07e-03

Pos logit contributions
gh+0.087
 color+0.080
 noses+0.078
 opinions+0.077
 respect+0.074
Neg logit contributions
ang-0.122
 sur-0.115
iker-0.110
lves-0.110
 begg-0.107
Token bar
Token ID2318
Feature activation+5.75e-04

Pos logit contributions
 sounds+0.009
 noses+0.009
 talk+0.008
 respect+0.008
 draw+0.008
Neg logit contributions
ang-0.014
iker-0.014
ester-0.013
 sur-0.013
 traveller-0.013
Tokenks
Token ID591
Feature activation+5.27e-02

Pos logit contributions
gh+0.001
 color+0.001
 noses+0.001
 opinions+0.001
 cooperate+0.000
Neg logit contributions
ang-0.001
 sur-0.001
lves-0.001
iker-0.001
 begg-0.001
Token was
Token ID373
Feature activation-1.20e-02

Pos logit contributions
gh+0.045
 color+0.040
 opinions+0.039
 noses+0.039
 sounds+0.039
Neg logit contributions
ang-0.093
 sur-0.088
iker-0.085
 Rory-0.083
 begg-0.083
Token a
Token ID257
Feature activation-1.98e-02

Pos logit contributions
ang+0.023
 sur+0.022
iker+0.021
lves+0.020
 begg+0.020
Neg logit contributions
gh-0.015
 color-0.013
 noses-0.013
 opinions-0.013
 respect-0.012
Token fairy
Token ID25607
Feature activation-1.18e-01

Pos logit contributions
ang+0.032
 sur+0.031
 begg+0.029
iker+0.029
lves+0.028
Neg logit contributions
gh-0.028
 opinions-0.026
 noses-0.026
 cooperate-0.024
 color-0.024
Token or
Token ID393
Feature activation-9.97e-02

Pos logit contributions
ang+0.211
 sur+0.198
lves+0.195
iker+0.191
 Rory+0.190
Neg logit contributions
gh-0.154
 color-0.142
 draw-0.136
 opinions-0.135
 sounds-0.134
Token a
Token ID257
Feature activation-2.05e-02

Pos logit contributions
ang+0.233
 sur+0.211
iker+0.209
lves+0.207
ester+0.201
Neg logit contributions
 color-0.091
gh-0.087
 draw-0.086
 noses-0.084
 respect-0.081
Token gn
Token ID19967
Feature activation-2.36e-01

Pos logit contributions
ang+0.033
 sur+0.031
iker+0.029
lves+0.029
 begg+0.029
Neg logit contributions
gh-0.030
 color-0.027
 noses-0.027
 opinions-0.027
 cooperate-0.026
Tokenome
Token ID462
Feature activation-7.65e-02

Pos logit contributions
ang+0.324
 Wil+0.318
 Rory+0.304
 Ned+0.299
 sur+0.297
Neg logit contributions
gh-0.393
 opinions-0.340
 noses-0.338
 respect-0.334
 cooperate-0.331
Token,"
Token ID553
Feature activation-5.05e-02

Pos logit contributions
ang+0.138
 sur+0.130
 energ+0.129
lves+0.128
 Rory+0.124
Neg logit contributions
gh-0.096
 color-0.093
 draw-0.087
 sounds-0.086
 opinions-0.086
Token Mia
Token ID32189
Feature activation-6.62e-03

Pos logit contributions
ang+0.103
lves+0.099
iker+0.096
 sur+0.093
 traveller+0.091
Neg logit contributions
 color-0.055
 opinions-0.053
 sounds-0.053
 noses-0.053
 draw-0.051
Token says
Token ID1139
Feature activation-5.13e-02

Pos logit contributions
ang+0.008
 sur+0.008
 begg+0.007
 Rory+0.007
iker+0.007
Neg logit contributions
gh-0.013
 color-0.011
 noses-0.011
 rhythm-0.011
Their-0.011
Token.
Token ID13
Feature activation-1.44e-02

Pos logit contributions
ang+0.073
 sur+0.069
iker+0.063
 begg+0.063
 he+0.062
Neg logit contributions
gh-0.083
 color-0.077
 opinions-0.076
 noses-0.074
 rhythm-0.074
Token root
Token ID6808
Feature activation-3.44e-02

Pos logit contributions
ang+0.000
 sur+0.000
iker+0.000
lves+0.000
 begg+0.000
Neg logit contributions
gh-0.000
 color-0.000
 noses-0.000
 opinions-0.000
 respect-0.000
Token,
Token ID11
Feature activation+2.14e-02

Pos logit contributions
ang+0.062
 sur+0.058
lves+0.057
 Rory+0.055
iker+0.055
Neg logit contributions
gh-0.046
 color-0.042
 noses-0.041
 draw-0.040
 sounds-0.039
Token larger
Token ID4025
Feature activation+1.26e-02

Pos logit contributions
 color+0.028
gh+0.027
 noses+0.027
 sounds+0.026
 opinions+0.026
Neg logit contributions
ang-0.039
 sur-0.035
iker-0.035
lves-0.034
aroo-0.033
Token and
Token ID290
Feature activation-5.56e-02

Pos logit contributions
 color+0.015
 noses+0.015
 sounds+0.014
 opinions+0.014
 shape+0.014
Neg logit contributions
ang-0.025
lves-0.023
 sur-0.023
 energ-0.022
 Rory-0.022
Token more
Token ID517
Feature activation+3.54e-02

Pos logit contributions
ang+0.113
iker+0.103
 sur+0.102
lves+0.101
 Rory+0.099
Neg logit contributions
 color-0.060
gh-0.059
 sounds-0.057
 opinions-0.056
 noses-0.054
Token gn
Token ID19967
Feature activation-2.27e-01

Pos logit contributions
 color+0.043
 sounds+0.041
 noses+0.040
 opinions+0.040
 shape+0.040
Neg logit contributions
ang-0.068
lves-0.062
 sur-0.062
 Rory-0.061
 begg-0.060
Tokenar
Token ID283
Feature activation-2.65e-02

Pos logit contributions
ang+0.281
 sur+0.263
 Rory+0.249
 Wil+0.248
 begg+0.246
Neg logit contributions
gh-0.419
 noses-0.359
 opinions-0.358
etrical-0.350
 respect-0.349
Tokenled
Token ID992
Feature activation+1.91e-02

Pos logit contributions
ang+0.041
 sur+0.039
iker+0.038
 Kang+0.038
 begg+0.038
Neg logit contributions
gh-0.042
 color-0.037
 noses-0.036
 sounds-0.035
 shape-0.034
Token than
Token ID621
Feature activation-7.06e-02

Pos logit contributions
gh+0.026
 color+0.025
 noses+0.025
 opinions+0.025
 sounds+0.024
Neg logit contributions
ang-0.034
 sur-0.031
iker-0.030
 begg-0.030
lves-0.029
Token the
Token ID262
Feature activation-6.40e-02

Pos logit contributions
ang+0.133
 sur+0.113
lves+0.112
 begg+0.109
iker+0.108
Neg logit contributions
 color-0.097
gh-0.096
 respect-0.087
 noses-0.086
 draw-0.085
Token others
Token ID1854
Feature activation-1.86e-02

Pos logit contributions
ang+0.117
 sur+0.106
iker+0.100
lves+0.100
 begg+0.099
Neg logit contributions
gh-0.085
 color-0.083
 noses-0.078
 opinions-0.078
 sounds-0.075
Token special
Token ID2041
Feature activation+8.08e-03

Pos logit contributions
gh+0.000
 opinions+0.000
 noses+0.000
 color+0.000
 cooperate+0.000
Neg logit contributions
ang-0.000
 sur-0.000
iker-0.000
lves-0.000
 begg-0.000
Token surprise
Token ID5975
Feature activation-4.83e-02

Pos logit contributions
gh+0.011
 color+0.010
 noses+0.010
 opinions+0.010
 cooperate+0.009
Neg logit contributions
ang-0.014
 sur-0.013
lves-0.012
iker-0.012
 begg-0.012
Token.
Token ID13
Feature activation-6.92e-03

Pos logit contributions
ang+0.088
lves+0.081
 sur+0.080
iker+0.077
 begg+0.076
Neg logit contributions
gh-0.066
 color-0.058
 noses-0.054
 opinions-0.054
 draw-0.054
Token A
Token ID317
Feature activation+4.37e-03

Pos logit contributions
ang+0.013
lves+0.012
 sur+0.012
iker+0.011
aro+0.011
Neg logit contributions
gh-0.009
 color-0.009
 opinions-0.008
 respect-0.008
thia-0.008
Token thoughtful
Token ID22677
Feature activation+2.45e-02

Pos logit contributions
gh+0.006
 color+0.006
 noses+0.005
 rhythm+0.005
 sounds+0.005
Neg logit contributions
ang-0.008
 sur-0.007
lves-0.007
iker-0.007
 Rory-0.006
Token gn
Token ID19967
Feature activation-2.22e-01

Pos logit contributions
gh+0.037
 color+0.036
 rhythm+0.033
 respect+0.033
 opinions+0.033
Neg logit contributions
ang-0.040
lves-0.036
 sur-0.034
iker-0.034
aro-0.033
Tokenome
Token ID462
Feature activation-5.01e-02

Pos logit contributions
ang+0.265
 Wil+0.242
 energ+0.239
 Rory+0.237
 sur+0.235
Neg logit contributions
gh-0.419
 respect-0.353
 opinions-0.351
 color-0.349
 noses-0.347
Token was
Token ID373
Feature activation-4.38e-03

Pos logit contributions
ang+0.093
 energ+0.091
lves+0.090
iker+0.083
unn+0.083
Neg logit contributions
gh-0.062
 draw-0.054
 color-0.054
 shape-0.052
 sounds-0.051
Token playing
Token ID2712
Feature activation+3.63e-02

Pos logit contributions
ang+0.008
lves+0.008
iker+0.007
 sur+0.007
unn+0.007
Neg logit contributions
gh-0.006
 color-0.005
 draw-0.005
 respect-0.005
 noses-0.005
Token the
Token ID262
Feature activation-1.71e-02

Pos logit contributions
gh+0.042
 rhythm+0.038
 color+0.038
 draw+0.037
 sounds+0.036
Neg logit contributions
ang-0.073
unn-0.070
lves-0.068
iker-0.068
 sur-0.066
Token most
Token ID749
Feature activation-2.39e-02

Pos logit contributions
ang+0.031
 sur+0.027
 Rory+0.027
iker+0.027
lves+0.027
Neg logit contributions
gh-0.024
 color-0.022
 rhythm-0.020
 draw-0.020
 sounds-0.020
Token she
Token ID673
Feature activation-6.95e-02

Pos logit contributions
ang+0.091
iker+0.085
unn+0.079
 sur+0.078
oun+0.075
Neg logit contributions
 color-0.048
 draw-0.045
 sounds-0.043
 drew-0.042
 talk-0.041
Token had
Token ID550
Feature activation+1.94e-02

Pos logit contributions
ang+0.139
iker+0.134
lves+0.126
 sur+0.120
 traveller+0.119
Neg logit contributions
 color-0.079
gh-0.078
 draw-0.075
 sounds-0.073
 shape-0.072
Token made
Token ID925
Feature activation+2.10e-02

Pos logit contributions
gh+0.028
 noses+0.025
 opinions+0.025
 color+0.025
 cooperate+0.024
Neg logit contributions
 sur-0.029
ang-0.029
 begg-0.029
 energ-0.027
 Rory-0.027
Token a
Token ID257
Feature activation+8.13e-03

Pos logit contributions
gh+0.024
 color+0.022
 sounds+0.022
 opinions+0.022
 noses+0.022
Neg logit contributions
ang-0.043
 sur-0.039
iker-0.039
lves-0.038
aroo-0.037
Token garden
Token ID11376
Feature activation-3.93e-02

Pos logit contributions
gh+0.012
 opinions+0.011
 noses+0.010
 color+0.010
 cooperate+0.010
Neg logit contributions
ang-0.014
 sur-0.013
lves-0.012
 begg-0.012
iker-0.012
Token gn
Token ID19967
Feature activation-2.18e-01

Pos logit contributions
ang+0.072
 sur+0.065
lves+0.065
iker+0.064
aroo+0.062
Neg logit contributions
gh-0.051
 color-0.049
 draw-0.047
 noses-0.046
 sounds-0.046
Tokenome
Token ID462
Feature activation-4.48e-02

Pos logit contributions
ang+0.251
 sur+0.229
 Wil+0.216
 Rory+0.213
 Ned+0.209
Neg logit contributions
gh-0.424
 opinions-0.371
 noses-0.371
 color-0.367
etrical-0.367
Token.
Token ID13
Feature activation-4.08e-03

Pos logit contributions
ang+0.083
 sur+0.080
iker+0.078
lves+0.077
 Rory+0.076
Neg logit contributions
 color-0.052
 draw-0.051
gh-0.050
 sounds-0.050
 shape-0.049
Token\n
Token ID198
Feature activation+7.96e-03

Pos logit contributions
ang+0.010
 sur+0.009
iker+0.009
 begg+0.009
aro+0.009
Neg logit contributions
 color-0.003
 draw-0.003
 Her-0.003
Her-0.003
 opinions-0.002
Token\n
Token ID198
Feature activation+1.08e-02

Pos logit contributions
gh+0.010
Their+0.009
 color+0.009
 draw+0.009
 respect+0.009
Neg logit contributions
ang-0.014
 sur-0.014
iker-0.014
lves-0.013
 begg-0.013
TokenThe
Token ID464
Feature activation-1.45e-02

Pos logit contributions
gh+0.013
 color+0.011
 opinions+0.011
 noses+0.011
 cooperate+0.010
Neg logit contributions
ang-0.021
 sur-0.020
iker-0.019
lves-0.019
 begg-0.019
Token bunny
Token ID44915
Feature activation-4.10e-02

Pos logit contributions
ang+0.014
 sur+0.013
iker+0.013
aro+0.012
lves+0.012
Neg logit contributions
 color-0.019
gh-0.018
 sounds-0.018
 rhythm-0.018
 noses-0.018
Token was
Token ID373
Feature activation+4.06e-04

Pos logit contributions
ang+0.072
 sur+0.067
iker+0.064
lves+0.064
 begg+0.063
Neg logit contributions
gh-0.055
 color-0.050
 noses-0.049
 opinions-0.049
 cooperate-0.047
Token a
Token ID257
Feature activation-1.79e-02

Pos logit contributions
gh+0.001
 color+0.001
 opinions+0.001
 noses+0.001
 rhythm+0.001
Neg logit contributions
ang-0.001
 sur-0.001
iker-0.001
lves-0.001
 begg-0.001
Token very
Token ID845
Feature activation-2.13e-02

Pos logit contributions
ang+0.028
 sur+0.026
iker+0.025
lves+0.024
 begg+0.024
Neg logit contributions
gh-0.028
 color-0.026
 noses-0.025
 opinions-0.025
 cooperate-0.024
Token quick
Token ID2068
Feature activation-7.60e-03

Pos logit contributions
ang+0.036
lves+0.033
iker+0.033
 sur+0.033
 begg+0.032
Neg logit contributions
gh-0.030
 color-0.028
 cooperate-0.026
 joke-0.026
 opinions-0.026
Token lear
Token ID22454
Feature activation-2.17e-01

Pos logit contributions
ang+0.014
lves+0.012
aro+0.012
 sur+0.012
iker+0.012
Neg logit contributions
gh-0.010
 color-0.010
 joke-0.009
 rhythm-0.009
 noses-0.009
Tokenner
Token ID1008
Feature activation-4.57e-02

Pos logit contributions
ang+0.226
lves+0.208
 begg+0.201
 traveller+0.199
 sur+0.198
Neg logit contributions
gh-0.468
 cooperate-0.364
ap-0.363
ze-0.362
 draw-0.361
Token and
Token ID290
Feature activation-5.53e-02

Pos logit contributions
ang+0.094
lves+0.090
 sur+0.088
 begg+0.087
iker+0.086
Neg logit contributions
gh-0.048
 color-0.044
 noses-0.041
 opinions-0.039
 draw-0.039
Token soon
Token ID2582
Feature activation-3.29e-02

Pos logit contributions
ang+0.099
lves+0.098
iker+0.098
 traveller+0.090
 begg+0.089
Neg logit contributions
 color-0.069
gh-0.068
 sounds-0.066
 shape-0.065
 joke-0.063
Token he
Token ID339
Feature activation-6.26e-02

Pos logit contributions
ang+0.070
iker+0.060
aroo+0.057
lves+0.056
 Kang+0.055
Neg logit contributions
gh-0.039
 sounds-0.039
 color-0.036
 become-0.036
's-0.035
Token was
Token ID373
Feature activation-2.38e-02

Pos logit contributions
ang+0.128
iker+0.123
 sur+0.117
lves+0.117
 begg+0.114
Neg logit contributions
gh-0.069
 color-0.064
 sounds-0.061
 draw-0.059
 noses-0.059
Token.
Token ID13
Feature activation-1.52e-02

Pos logit contributions
gh+0.017
 color+0.016
 opinions+0.015
 noses+0.014
 rhythm+0.014
Neg logit contributions
ang-0.033
lves-0.032
iker-0.031
aroo-0.030
 sur-0.029
Token
Token ID220
Feature activation+3.80e-02

Pos logit contributions
ang+0.030
 sur+0.028
iker+0.028
lves+0.027
 begg+0.027
Neg logit contributions
gh-0.017
 opinions-0.016
 color-0.016
thia-0.016
 respect-0.016
Token
Token ID198
Feature activation+1.78e-03

Pos logit contributions
gh+0.038
thia+0.035
Their+0.035
 cooperate+0.034
 opinions+0.034
Neg logit contributions
ang-0.078
iker-0.074
 sur-0.074
 begg-0.074
lves-0.073
Token
Token ID198
Feature activation+4.91e-03

Pos logit contributions
gh+0.003
 color+0.002
 opinions+0.002
Their+0.002
 noses+0.002
Neg logit contributions
ang-0.003
 sur-0.003
iker-0.003
lves-0.003
 begg-0.003
TokenThe
Token ID464
Feature activation-2.43e-02

Pos logit contributions
gh+0.006
 color+0.006
 noses+0.006
 opinions+0.006
 cooperate+0.005
Neg logit contributions
ang-0.009
 sur-0.008
iker-0.008
lves-0.008
 begg-0.008
Token gn
Token ID19967
Feature activation-2.17e-01

Pos logit contributions
ang+0.053
lves+0.050
 Wil+0.048
rily+0.048
iker+0.047
Neg logit contributions
gh-0.023
 color-0.023
 sounds-0.023
 joke-0.022
 quarrel-0.022
Tokenome
Token ID462
Feature activation-4.27e-02

Pos logit contributions
ang+0.281
 Wil+0.269
 Kang+0.261
 Rory+0.261
 Ned+0.251
Neg logit contributions
gh-0.385
 opinions-0.317
thia-0.314
 sounds-0.309
 cooperate-0.307
Token was
Token ID373
Feature activation-7.11e-04

Pos logit contributions
ang+0.083
lves+0.082
iker+0.079
 sur+0.074
 energ+0.073
Neg logit contributions
gh-0.048
 color-0.044
 sounds-0.043
 opinions-0.042
 draw-0.041
Token so
Token ID523
Feature activation-8.09e-02

Pos logit contributions
ang+0.001
iker+0.001
lves+0.001
unn+0.001
 sur+0.001
Neg logit contributions
gh-0.001
 color-0.001
 rhythm-0.001
 respect-0.001
Their-0.001
Token pleased
Token ID10607
Feature activation-4.52e-04

Pos logit contributions
iker+0.151
lves+0.147
ang+0.142
unn+0.137
aro+0.131
Neg logit contributions
gh-0.098
 color-0.098
Their-0.093
 respect-0.090
 rhythm-0.090
Token with
Token ID351
Feature activation-6.88e-02

Pos logit contributions
lves+0.001
ang+0.001
iker+0.001
 begg+0.001
 sur+0.001
Neg logit contributions
gh-0.001
 color-0.000
 sounds-0.000
 respect-0.000
 opinions-0.000
Token being
Token ID852
Feature activation-3.86e-02

Pos logit contributions
ang+0.160
 sur+0.142
iker+0.138
lves+0.137
 begg+0.131
Neg logit contributions
gh-0.148
 color-0.144
 opinions-0.137
 respect-0.134
 noses-0.133
Token brave
Token ID14802
Feature activation-5.04e-02

Pos logit contributions
ang+0.068
 sur+0.063
iker+0.062
lves+0.058
 begg+0.058
Neg logit contributions
gh-0.051
 color-0.049
 opinions-0.049
 respect-0.047
 noses-0.046
Token was
Token ID373
Feature activation-2.49e-02

Pos logit contributions
ang+0.103
 sur+0.096
lves+0.093
iker+0.091
aro+0.090
Neg logit contributions
gh-0.058
 color-0.056
 opinions-0.056
 sounds-0.055
 noses-0.053
Token the
Token ID262
Feature activation-8.51e-02

Pos logit contributions
ang+0.051
 sur+0.047
iker+0.046
lves+0.044
 begg+0.044
Neg logit contributions
gh-0.030
 color-0.027
 respect-0.026
 opinions-0.026
 noses-0.024
Token best
Token ID1266
Feature activation-4.41e-02

Pos logit contributions
ang+0.179
 sur+0.157
aro+0.153
lves+0.152
iker+0.151
Neg logit contributions
gh-0.097
 color-0.094
 opinions-0.087
 respect-0.085
 draw-0.084
Token choice
Token ID3572
Feature activation-2.14e-01

Pos logit contributions
ang+0.098
lves+0.084
 sur+0.083
aro+0.083
 Rory+0.081
Neg logit contributions
 color-0.055
gh-0.051
 draw-0.049
 respect-0.047
 feelings-0.046
Token.
Token ID13
Feature activation-1.13e-02

Pos logit contributions
ang+0.396
 sur+0.363
lves+0.360
 begg+0.349
iker+0.341
Neg logit contributions
gh-0.306
 respect-0.266
 color-0.265
 opinions-0.249
 sounds-0.245
Token
Token ID220
Feature activation+2.97e-02

Pos logit contributions
 sur+0.026
ang+0.025
iker+0.025
lves+0.024
 begg+0.024
Neg logit contributions
Her-0.007
 color-0.007
 respect-0.007
 Her-0.007
 opinions-0.007
Token
Token ID198
Feature activation-2.41e-03

Pos logit contributions
gh+0.030
thia+0.028
Their+0.028
 cooperate+0.027
 opinions+0.026
Neg logit contributions
 sur-0.060
ang-0.059
iker-0.058
 begg-0.057
 traveller-0.056
Token
Token ID198
Feature activation+1.63e-03

Pos logit contributions
ang+0.005
iker+0.005
 sur+0.005
lves+0.005
 begg+0.004
Neg logit contributions
gh-0.003
Their-0.002
 cooperate-0.002
 respect-0.002
Her-0.002
TokenThe
Token ID464
Feature activation-2.94e-02

Pos logit contributions
gh+0.002
Their+0.001
 respect+0.001
 cooperate+0.001
 color+0.001
Neg logit contributions
ang-0.003
iker-0.003
 sur-0.003
lves-0.003
 begg-0.003
Token.
Token ID13
Feature activation-6.20e-03

Pos logit contributions
gh+0.075
 noses+0.065
 opinions+0.065
 color+0.065
 sounds+0.063
Neg logit contributions
ang-0.125
 sur-0.117
lves-0.113
iker-0.112
 begg-0.111
Token Sam
Token ID3409
Feature activation+1.97e-02

Pos logit contributions
ang+0.011
 sur+0.010
iker+0.010
 begg+0.010
lves+0.010
Neg logit contributions
gh-0.008
 opinions-0.008
 noses-0.008
 color-0.008
 cooperate-0.007
Token was
Token ID373
Feature activation+9.11e-04

Pos logit contributions
gh+0.023
 opinions+0.021
 color+0.021
 noses+0.021
 sounds+0.021
Neg logit contributions
ang-0.038
 sur-0.035
iker-0.035
 begg-0.034
lves-0.033
Token a
Token ID257
Feature activation-2.44e-02

Pos logit contributions
gh+0.001
 color+0.001
 opinions+0.001
 noses+0.001
 respect+0.001
Neg logit contributions
ang-0.002
 sur-0.001
iker-0.001
lves-0.001
 begg-0.001
Token quick
Token ID2068
Feature activation-3.73e-03

Pos logit contributions
ang+0.040
 sur+0.035
iker+0.035
lves+0.034
 Rory+0.034
Neg logit contributions
gh-0.037
 color-0.034
 noses-0.032
 rhythm-0.032
 quarrel-0.032
Token lear
Token ID22454
Feature activation-2.14e-01

Pos logit contributions
ang+0.006
lves+0.005
aro+0.005
 sur+0.005
 begg+0.005
Neg logit contributions
gh-0.006
 color-0.005
 rhythm-0.005
 quarrel-0.005
 joke-0.005
Tokenner
Token ID1008
Feature activation-4.52e-02

Pos logit contributions
ang+0.222
 begg+0.219
lves+0.216
 pine+0.213
 traveller+0.209
Neg logit contributions
gh-0.444
ap-0.358
 cooperate-0.357
 argue-0.356
 quarrel-0.354
Token and
Token ID290
Feature activation-5.43e-02

Pos logit contributions
ang+0.095
lves+0.091
 sur+0.089
 begg+0.088
iker+0.087
Neg logit contributions
gh-0.046
 color-0.042
 noses-0.038
 respect-0.038
 quarrel-0.038
Token soon
Token ID2582
Feature activation-3.59e-02

Pos logit contributions
ang+0.102
lves+0.096
iker+0.093
 begg+0.088
 traveller+0.088
Neg logit contributions
gh-0.071
 color-0.065
 sounds-0.063
 cooperate-0.063
 draw-0.062
Token he
Token ID339
Feature activation-6.64e-02

Pos logit contributions
ang+0.066
lves+0.057
iker+0.055
 Kang+0.054
aroo+0.054
Neg logit contributions
gh-0.054
 sounds-0.050
 color-0.048
 noses-0.047
 faces-0.047
Token was
Token ID373
Feature activation-2.86e-02

Pos logit contributions
ang+0.138
iker+0.131
lves+0.124
 sur+0.123
 Wil+0.123
Neg logit contributions
gh-0.072
 sounds-0.065
 quarrel-0.060
 color-0.060
 joke-0.060
Token purple
Token ID14032
Feature activation+7.13e-02

Pos logit contributions
ang+0.103
lves+0.091
 sur+0.091
iker+0.089
 Rory+0.088
Neg logit contributions
gh-0.085
 color-0.083
 opinions-0.080
 noses-0.079
 quarrel-0.077
Token car
Token ID1097
Feature activation-7.71e-02

Pos logit contributions
gh+0.111
 color+0.109
 opinions+0.103
 noses+0.102
 draw+0.100
Neg logit contributions
ang-0.114
 sur-0.104
lves-0.103
 Rory-0.101
 energ-0.099
Token was
Token ID373
Feature activation-2.40e-02

Pos logit contributions
ang+0.167
 sur+0.153
 Rory+0.153
lves+0.150
iker+0.148
Neg logit contributions
gh-0.080
 sounds-0.077
 color-0.075
 opinions-0.074
 respect-0.069
Token the
Token ID262
Feature activation-8.95e-02

Pos logit contributions
ang+0.047
iker+0.043
 sur+0.042
lves+0.041
unn+0.040
Neg logit contributions
gh-0.032
 color-0.029
 respect-0.027
 opinions-0.027
 draw-0.026
Token best
Token ID1266
Feature activation-4.14e-02

Pos logit contributions
ang+0.181
 sur+0.165
lves+0.161
iker+0.161
aro+0.158
Neg logit contributions
gh-0.102
 color-0.096
 opinions-0.090
 noses-0.088
 respect-0.085
Token choice
Token ID3572
Feature activation-2.14e-01

Pos logit contributions
ang+0.081
lves+0.075
 Rory+0.071
 sur+0.070
 begg+0.069
Neg logit contributions
gh-0.053
 color-0.053
 opinions-0.046
 shape-0.046
 rhythm-0.046
Token.
Token ID13
Feature activation-4.50e-03

Pos logit contributions
ang+0.420
lves+0.389
 sur+0.387
 begg+0.380
iker+0.371
Neg logit contributions
gh-0.270
 color-0.228
 respect-0.225
 opinions-0.217
 noses-0.214
Token They
Token ID1119
Feature activation+4.23e-02

Pos logit contributions
iker+0.009
 begg+0.009
ang+0.009
 sur+0.009
 traveller+0.009
Neg logit contributions
Their-0.004
 noses-0.004
 color-0.004
 opinions-0.004
thia-0.003
Token took
Token ID1718
Feature activation-6.11e-02

Pos logit contributions
 color+0.041
 draw+0.038
 quarrel+0.037
gh+0.036
 respect+0.036
Neg logit contributions
ang-0.091
iker-0.086
 begg-0.082
 sur-0.082
lves-0.081
Token it
Token ID340
Feature activation+7.16e-02

Pos logit contributions
ang+0.118
 sur+0.110
iker+0.108
 begg+0.106
lves+0.105
Neg logit contributions
gh-0.075
 color-0.068
 noses-0.067
 opinions-0.067
 respect-0.065
Token home
Token ID1363
Feature activation-1.85e-02

Pos logit contributions
gh+0.101
 color+0.091
 noses+0.087
 opinions+0.086
 respect+0.085
Neg logit contributions
ang-0.125
 sur-0.117
lves-0.111
iker-0.109
 begg-0.109
Token,"
Token ID553
Feature activation-4.81e-02

Pos logit contributions
 color+0.014
 opinions+0.013
gh+0.013
 sounds+0.013
 respect+0.013
Neg logit contributions
ang-0.018
 sur-0.017
iker-0.017
lves-0.017
 begg-0.016
Token replied
Token ID8712
Feature activation-2.29e-02

Pos logit contributions
ang+0.078
 sur+0.072
iker+0.072
lves+0.069
 begg+0.067
Neg logit contributions
gh-0.070
 color-0.069
 opinions-0.066
 draw-0.066
 sounds-0.065
Token the
Token ID262
Feature activation-8.39e-02

Pos logit contributions
ang+0.034
 sur+0.031
iker+0.030
lves+0.029
 begg+0.029
Neg logit contributions
gh-0.038
 color-0.035
 opinions-0.034
 noses-0.034
 draw-0.033
Token ha
Token ID387
Feature activation-1.46e-01

Pos logit contributions
ang+0.140
lves+0.134
 Wil+0.129
 Nem+0.128
aro+0.123
Neg logit contributions
 color-0.123
gh-0.121
 rhythm-0.115
 sounds-0.115
 draw-0.114
Tokenird
Token ID1447
Feature activation-1.18e-01

Pos logit contributions
lves+0.277
 Nem+0.266
 energ+0.266
 fisherman+0.263
 traveller+0.263
Neg logit contributions
gh-0.185
 color-0.136
 cooperate-0.128
ap-0.127
 opinions-0.124
Tokenress
Token ID601
Feature activation-2.13e-01

Pos logit contributions
 energ+0.188
lves+0.182
 Rory+0.169
 fisherman+0.168
 sur+0.166
Neg logit contributions
gh-0.171
 color-0.150
Her-0.150
ap-0.147
tr-0.146
Tokener
Token ID263
Feature activation-9.68e-02

Pos logit contributions
 sur+0.460
lves+0.457
 Rory+0.444
 Ned+0.443
 Sailor+0.441
Neg logit contributions
gh-0.166
Her-0.144
 feelings-0.136
 color-0.132
 sounds-0.127
Token.
Token ID13
Feature activation-6.82e-03

Pos logit contributions
ang+0.216
 sur+0.208
lves+0.205
 energ+0.201
 begg+0.198
Neg logit contributions
gh-0.085
 color-0.079
 draw-0.066
 opinions-0.065
 feelings-0.063
Token "
Token ID366
Feature activation+2.22e-03

Pos logit contributions
ang+0.013
iker+0.013
 sur+0.013
aro+0.012
 traveller+0.012
Neg logit contributions
 color-0.007
Their-0.007
gh-0.007
 draw-0.007
Her-0.007
TokenIt
Token ID1026
Feature activation+8.37e-02

Pos logit contributions
gh+0.003
 noses+0.003
 opinions+0.003
 color+0.003
 faces+0.003
Neg logit contributions
ang-0.004
 sur-0.003
lves-0.003
iker-0.003
 begg-0.003
Token feels
Token ID5300
Feature activation-4.54e-03

Pos logit contributions
gh+0.064
 color+0.059
 opinions+0.058
 sounds+0.057
 noses+0.056
Neg logit contributions
ang-0.196
 sur-0.188
iker-0.182
 begg-0.178
 Rory-0.178
Token Rex
Token ID17853
Feature activation-1.82e-02

Pos logit contributions
ang+0.035
 sur+0.033
lves+0.032
iker+0.030
 begg+0.030
Neg logit contributions
gh-0.039
 noses-0.036
 color-0.035
 sounds-0.034
 opinions-0.034
Token would
Token ID561
Feature activation+1.56e-02

Pos logit contributions
ang+0.032
 sur+0.030
iker+0.029
lves+0.029
 begg+0.029
Neg logit contributions
gh-0.024
 color-0.022
 noses-0.021
 opinions-0.021
 cooperate-0.020
Token be
Token ID307
Feature activation+1.56e-02

Pos logit contributions
gh+0.016
thia+0.013
 opinions+0.013
 noses+0.013
etrical+0.013
Neg logit contributions
ang-0.030
 sur-0.030
 Rory-0.029
 begg-0.029
 he-0.028
Token a
Token ID257
Feature activation-2.89e-02

Pos logit contributions
gh+0.020
 color+0.018
 opinions+0.018
 noses+0.018
 respect+0.017
Neg logit contributions
ang-0.029
 sur-0.027
iker-0.026
lves-0.026
 begg-0.025
Token good
Token ID922
Feature activation-1.76e-02

Pos logit contributions
ang+0.055
 sur+0.050
lves+0.049
iker+0.049
aro+0.048
Neg logit contributions
gh-0.036
 color-0.033
 respect-0.030
 noses-0.030
 rhythm-0.030
Token lear
Token ID22454
Feature activation-2.12e-01

Pos logit contributions
ang+0.030
lves+0.027
aro+0.027
 sur+0.026
 energ+0.026
Neg logit contributions
gh-0.026
 color-0.025
 joke-0.023
 opinions-0.023
 draw-0.023
Tokenner
Token ID1008
Feature activation-4.87e-02

Pos logit contributions
ang+0.240
lves+0.232
 sur+0.231
 begg+0.217
 traveller+0.211
Neg logit contributions
gh-0.437
 cooperate-0.352
 draw-0.343
Their-0.342
 quarrel-0.342
Token and
Token ID290
Feature activation-4.34e-02

Pos logit contributions
ang+0.104
 sur+0.098
lves+0.098
 begg+0.096
iker+0.094
Neg logit contributions
gh-0.048
 color-0.046
 opinions-0.041
 noses-0.041
 respect-0.041
Token look
Token ID804
Feature activation+2.43e-02

Pos logit contributions
ang+0.091
lves+0.082
iker+0.079
 sur+0.079
aroo+0.077
Neg logit contributions
gh-0.049
 respect-0.048
 opinions-0.048
 color-0.048
 sounds-0.047
Tokener
Token ID263
Feature activation-8.95e-02

Pos logit contributions
 respect+0.029
 sounds+0.029
gh+0.029
 color+0.028
 faces+0.028
Neg logit contributions
ang-0.044
 sur-0.044
 begg-0.042
aroo-0.042
iker-0.041
Token too
Token ID1165
Feature activation+2.00e-02

Pos logit contributions
 sur+0.169
ang+0.168
 energ+0.167
 begg+0.166
lves+0.165
Neg logit contributions
 color-0.094
gh-0.094
 sounds-0.092
 opinions-0.088
 noses-0.088
Token still
Token ID991
Feature activation+2.95e-02

Pos logit contributions
gh+0.001
 color+0.001
 opinions+0.001
 noses+0.001
 cooperate+0.001
Neg logit contributions
ang-0.001
 sur-0.001
iker-0.001
lves-0.001
 begg-0.001
Token so
Token ID523
Feature activation-7.71e-02

Pos logit contributions
gh+0.042
 color+0.039
 noses+0.038
 opinions+0.037
 draw+0.036
Neg logit contributions
ang-0.050
 sur-0.046
lves-0.045
iker-0.044
 begg-0.044
Token the
Token ID262
Feature activation-8.09e-02

Pos logit contributions
ang+0.142
lves+0.142
iker+0.141
 traveller+0.134
 begg+0.133
Neg logit contributions
gh-0.097
 color-0.087
 respect-0.081
 shape-0.081
Their-0.079
Token ha
Token ID387
Feature activation-1.44e-01

Pos logit contributions
ang+0.143
lves+0.130
aro+0.121
 Wil+0.120
 Nem+0.119
Neg logit contributions
gh-0.119
 color-0.117
 rhythm-0.108
 draw-0.106
 sounds-0.104
Tokenird
Token ID1447
Feature activation-1.22e-01

Pos logit contributions
lves+0.260
 traveller+0.250
 Nem+0.246
 sur+0.243
 energ+0.242
Neg logit contributions
gh-0.198
 color-0.162
 draw-0.151
ap-0.147
 noses-0.144
Tokenress
Token ID601
Feature activation-2.12e-01

Pos logit contributions
lves+0.175
 energ+0.168
 traveller+0.163
 sur+0.160
 fisherman+0.159
Neg logit contributions
gh-0.202
 color-0.181
ap-0.179
 draw-0.172
tr-0.171
Tokener
Token ID263
Feature activation-9.46e-02

Pos logit contributions
lves+0.434
ang+0.417
 sur+0.412
 Rory+0.396
 traveller+0.394
Neg logit contributions
gh-0.228
 color-0.178
ze-0.173
 sounds-0.170
 feelings-0.170
Token could
Token ID714
Feature activation+1.22e-02

Pos logit contributions
ang+0.176
lves+0.167
 sur+0.161
iker+0.157
 begg+0.156
Neg logit contributions
gh-0.121
 color-0.116
 draw-0.108
 noses-0.106
 opinions-0.105
Token do
Token ID466
Feature activation+2.43e-02

Pos logit contributions
gh+0.013
 color+0.011
 opinions+0.010
 noses+0.010
 draw+0.010
Neg logit contributions
ang-0.026
 sur-0.024
lves-0.024
iker-0.023
 begg-0.023
Token a
Token ID257
Feature activation+3.87e-04

Pos logit contributions
gh+0.036
 color+0.032
 sounds+0.031
 opinions+0.030
 respect+0.030
Neg logit contributions
ang-0.046
 sur-0.041
lves-0.040
iker-0.040
 Kang-0.039
Token good
Token ID922
Feature activation+1.36e-02

Pos logit contributions
gh+0.001
 color+0.000
 noses+0.000
 opinions+0.000
 cooperate+0.000
Neg logit contributions
ang-0.001
 sur-0.001
iker-0.001
lves-0.001
 begg-0.001
Token.
Token ID13
Feature activation-1.23e-02

Pos logit contributions
ang+0.027
lves+0.026
 sur+0.026
iker+0.025
 begg+0.025
Neg logit contributions
gh-0.015
 color-0.014
 opinions-0.013
 noses-0.013
 rhythm-0.013
Token Peter
Token ID5613
Feature activation+5.46e-02

Pos logit contributions
ang+0.018
 sur+0.016
iker+0.015
lves+0.015
 begg+0.015
Neg logit contributions
gh-0.021
 color-0.019
 noses-0.019
 opinions-0.019
 cooperate-0.018
Token was
Token ID373
Feature activation+2.57e-03

Pos logit contributions
gh+0.070
 color+0.063
 noses+0.062
 opinions+0.062
 sounds+0.059
Neg logit contributions
ang-0.100
 sur-0.092
iker-0.092
lves-0.090
 begg-0.089
Token a
Token ID257
Feature activation-1.78e-02

Pos logit contributions
gh+0.004
 color+0.004
 opinions+0.004
 noses+0.004
 rhythm+0.004
Neg logit contributions
ang-0.004
 sur-0.003
iker-0.003
lves-0.003
 begg-0.003
Token fast
Token ID3049
Feature activation-1.20e-02

Pos logit contributions
ang+0.029
iker+0.026
lves+0.025
 sur+0.025
aro+0.025
Neg logit contributions
gh-0.026
 color-0.024
 rhythm-0.023
 quarrel-0.023
 joke-0.023
Token lear
Token ID22454
Feature activation-2.11e-01

Pos logit contributions
ang+0.020
 Wil+0.019
lves+0.019
aro+0.018
iker+0.018
Neg logit contributions
gh-0.015
 rhythm-0.015
 color-0.015
 shape-0.015
 joke-0.014
Tokenner
Token ID1008
Feature activation-4.09e-02

Pos logit contributions
ang+0.206
lves+0.188
 begg+0.186
 traveller+0.180
iker+0.175
Neg logit contributions
gh-0.458
 draw-0.370
 cooperate-0.370
 quarrel-0.370
ap-0.368
Token,
Token ID11
Feature activation+1.30e-02

Pos logit contributions
ang+0.082
lves+0.078
 sur+0.076
 begg+0.075
iker+0.075
Neg logit contributions
gh-0.045
 color-0.042
 noses-0.038
 quarrel-0.037
 rhythm-0.037
Token and
Token ID290
Feature activation-4.35e-02

Pos logit contributions
gh+0.015
 color+0.015
 opinions+0.014
 noses+0.014
 sounds+0.013
Neg logit contributions
ang-0.026
iker-0.023
lves-0.023
 sur-0.023
 begg-0.023
Token after
Token ID706
Feature activation-4.62e-02

Pos logit contributions
ang+0.070
lves+0.061
iker+0.061
unn+0.060
 begg+0.058
Neg logit contributions
 color-0.070
gh-0.069
 shape-0.066
 sounds-0.066
 rhythm-0.065
Token a
Token ID257
Feature activation-1.34e-03

Pos logit contributions
ang+0.090
lves+0.085
 sur+0.084
iker+0.083
 begg+0.081
Neg logit contributions
gh-0.056
 color-0.052
 noses-0.049
 rhythm-0.049
 opinions-0.048
Token she
Token ID673
Feature activation-1.18e-01

Pos logit contributions
gh+0.019
 cooperate+0.016
 noses+0.016
 communicate+0.016
 opinions+0.016
Neg logit contributions
ang-0.015
 Wil-0.014
 Nem-0.014
 Rory-0.013
 he-0.013
Token told
Token ID1297
Feature activation-5.00e-02

Pos logit contributions
ang+0.253
iker+0.234
lves+0.233
 sur+0.221
 Kang+0.220
Neg logit contributions
 draw-0.126
gh-0.126
 color-0.120
 sounds-0.118
 drew-0.116
Token the
Token ID262
Feature activation-5.40e-02

Pos logit contributions
ang+0.120
 sur+0.108
lves+0.105
iker+0.105
 begg+0.101
Neg logit contributions
gh-0.047
 noses-0.043
 opinions-0.043
 faces-0.043
 respect-0.041
Token ha
Token ID387
Feature activation-1.37e-01

Pos logit contributions
ang+0.107
lves+0.093
iker+0.089
 sur+0.086
 energ+0.085
Neg logit contributions
gh-0.069
 color-0.068
 sounds-0.067
 joke-0.067
 rhythm-0.065
Tokenird
Token ID1447
Feature activation-1.28e-01

Pos logit contributions
lves+0.238
ang+0.224
 sur+0.219
 energ+0.218
 traveller+0.216
Neg logit contributions
gh-0.200
 color-0.166
ap-0.159
 opinions-0.157
 noses-0.157
Tokenress
Token ID601
Feature activation-2.10e-01

Pos logit contributions
lves+0.171
 energ+0.164
 sur+0.162
ang+0.161
 begg+0.154
Neg logit contributions
gh-0.221
 color-0.200
thia-0.190
ap-0.189
 opinions-0.189
Tokener
Token ID263
Feature activation-9.31e-02

Pos logit contributions
lves+0.395
ang+0.391
 sur+0.384
 energ+0.358
 Rory+0.354
Neg logit contributions
gh-0.268
 opinions-0.224
 color-0.217
 sounds-0.210
 feelings-0.207
Token,
Token ID11
Feature activation+2.34e-02

Pos logit contributions
ang+0.174
lves+0.162
 sur+0.159
 begg+0.156
iker+0.154
Neg logit contributions
gh-0.119
 color-0.113
 opinions-0.108
 noses-0.107
 respect-0.107
Token "
Token ID366
Feature activation-1.34e-02

Pos logit contributions
 color+0.026
 draw+0.025
 respect+0.025
 noses+0.023
 rhythm+0.023
Neg logit contributions
ang-0.048
iker-0.044
lves-0.044
unn-0.043
 sur-0.043
TokenI
Token ID40
Feature activation-1.88e-02

Pos logit contributions
ang+0.023
 sur+0.022
iker+0.022
lves+0.021
 begg+0.021
Neg logit contributions
gh-0.018
 color-0.016
 opinions-0.016
 respect-0.015
 cooperate-0.015
Token want
Token ID765
Feature activation+1.67e-02

Pos logit contributions
ang+0.038
lves+0.036
iker+0.035
aro+0.034
 begg+0.034
Neg logit contributions
gh-0.021
 respect-0.020
 draw-0.019
 quarrel-0.019
 color-0.019
Token said
Token ID531
Feature activation-5.43e-02

Pos logit contributions
iker+0.110
ang+0.107
lves+0.102
aro+0.095
ikers+0.094
Neg logit contributions
 color-0.042
 drew-0.039
 draw-0.038
 faces-0.038
 sounds-0.038
Token that
Token ID326
Feature activation-8.90e-02

Pos logit contributions
ang+0.122
lves+0.114
iker+0.111
 sur+0.106
 begg+0.103
Neg logit contributions
gh-0.060
 color-0.050
 respect-0.047
 noses-0.045
 faces-0.044
Token was
Token ID373
Feature activation-3.25e-02

Pos logit contributions
ang+0.156
iker+0.141
 sur+0.138
lves+0.138
aroo+0.128
Neg logit contributions
 color-0.138
gh-0.128
 opinions-0.126
 sounds-0.126
 noses-0.122
Token a
Token ID257
Feature activation-3.65e-02

Pos logit contributions
ang+0.076
iker+0.071
unn+0.069
 sur+0.068
lves+0.068
Neg logit contributions
 color-0.034
gh-0.034
 respect-0.028
 opinions-0.028
 shape-0.028
Token perfect
Token ID2818
Feature activation-6.47e-02

Pos logit contributions
ang+0.069
 sur+0.063
iker+0.061
lves+0.061
 begg+0.060
Neg logit contributions
gh-0.046
 color-0.042
 noses-0.040
 opinions-0.040
 respect-0.038
Token choice
Token ID3572
Feature activation-2.10e-01

Pos logit contributions
ang+0.124
 sur+0.116
iker+0.111
lves+0.111
 begg+0.110
Neg logit contributions
gh-0.078
 color-0.069
 noses-0.068
 opinions-0.068
 cooperate-0.064
Token.
Token ID13
Feature activation-1.72e-02

Pos logit contributions
ang+0.434
lves+0.401
 sur+0.399
 begg+0.384
iker+0.380
Neg logit contributions
gh-0.243
 color-0.210
 respect-0.202
 opinions-0.200
 noses-0.195
Token
Token ID220
Feature activation+3.10e-02

Pos logit contributions
iker+0.040
ang+0.038
lves+0.038
 sur+0.037
 begg+0.037
Neg logit contributions
 color-0.012
 noses-0.012
Their-0.012
 opinions-0.011
Her-0.011
Token
Token ID198
Feature activation-3.59e-03

Pos logit contributions
gh+0.032
thia+0.029
Their+0.029
 opinions+0.028
 color+0.028
Neg logit contributions
ang-0.062
 sur-0.062
iker-0.060
 begg-0.059
lves-0.059
TokenWe
Token ID1135
Feature activation+4.29e-02

Pos logit contributions
ang+0.007
 sur+0.007
iker+0.007
lves+0.007
 begg+0.007
Neg logit contributions
gh-0.004
 color-0.003
Their-0.003
 opinions-0.003
 cooperate-0.003
Token bought
Token ID5839
Feature activation-4.16e-02

Pos logit contributions
 color+0.046
 respect+0.044
 draw+0.042
 discuss+0.041
 quarrel+0.041
Neg logit contributions
ang-0.088
iker-0.083
 begg-0.082
 sur-0.081
lves-0.078
Token to
Token ID284
Feature activation+2.31e-03

Pos logit contributions
 color+0.030
 noses+0.030
gh+0.030
 sounds+0.028
 faces+0.028
Neg logit contributions
ang-0.054
lves-0.051
 sur-0.048
iker-0.048
 begg-0.048
Token remain
Token ID3520
Feature activation-2.71e-02

Pos logit contributions
gh+0.002
 color+0.002
 respect+0.002
 opinions+0.002
 draw+0.002
Neg logit contributions
ang-0.005
lves-0.005
iker-0.004
 sur-0.004
 begg-0.004
Token until
Token ID1566
Feature activation-9.41e-02

Pos logit contributions
ang+0.047
iker+0.044
 sur+0.044
lves+0.044
 begg+0.042
Neg logit contributions
gh-0.038
 color-0.035
 opinions-0.033
 noses-0.033
 respect-0.033
Token another
Token ID1194
Feature activation-5.00e-02

Pos logit contributions
ang+0.184
lves+0.171
iker+0.162
unn+0.160
ester+0.150
Neg logit contributions
gh-0.131
 color-0.127
 draw-0.116
 talk-0.116
 quarrel-0.113
Token weary
Token ID34730
Feature activation-5.55e-02

Pos logit contributions
ang+0.095
lves+0.094
 begg+0.088
iker+0.088
 sur+0.087
Neg logit contributions
 color-0.060
gh-0.055
 rhythm-0.055
 sounds-0.053
 respect-0.052
Token traveler
Token ID40168
Feature activation-2.10e-01

Pos logit contributions
ang+0.104
lves+0.103
aro+0.092
 Wil+0.089
 begg+0.089
Neg logit contributions
 color-0.071
 rhythm-0.066
 sounds-0.064
 respect-0.064
 noses-0.062
Token needed
Token ID2622
Feature activation-8.37e-02

Pos logit contributions
ang+0.444
lves+0.411
iker+0.394
 sur+0.387
 energ+0.375
Neg logit contributions
gh-0.236
 respect-0.222
 color-0.209
 sounds-0.207
 draw-0.206
Token it
Token ID340
Feature activation+2.13e-02

Pos logit contributions
ang+0.174
lves+0.159
 sur+0.158
iker+0.153
 begg+0.149
Neg logit contributions
gh-0.100
 color-0.087
 respect-0.086
 opinions-0.082
 noses-0.082
Token.
Token ID13
Feature activation-1.29e-02

Pos logit contributions
gh+0.022
 color+0.020
 respect+0.019
 opinions+0.018
 draw+0.018
Neg logit contributions
ang-0.047
 sur-0.044
lves-0.042
iker-0.042
 begg-0.041
Token<|endoftext|>
Token ID50256
Feature activation-8.49e-03

Pos logit contributions
ang+0.029
iker+0.028
lves+0.028
 sur+0.027
 energ+0.027
Neg logit contributions
One-0.009
thia-0.008
 color-0.008
Their-0.008
 Her-0.008
TokenOnce
Token ID7454
Feature activation-4.08e-02

Pos logit contributions
lves+0.015
ang+0.015
iker+0.015
 sur+0.014
 energ+0.013
Neg logit contributions
gh-0.011
 color-0.011
 sounds-0.010
 draw-0.010
 names-0.010
Token being
Token ID852
Feature activation-2.71e-02

Pos logit contributions
ang+0.117
 sur+0.107
iker+0.102
lves+0.102
 begg+0.099
Neg logit contributions
gh-0.113
 color-0.105
 opinions-0.103
 noses-0.102
 draw-0.098
Token happy
Token ID3772
Feature activation+2.70e-02

Pos logit contributions
ang+0.043
 sur+0.040
iker+0.037
lves+0.037
 begg+0.037
Neg logit contributions
gh-0.042
 color-0.038
 noses-0.037
 opinions-0.037
 cooperate-0.036
Token was
Token ID373
Feature activation-2.14e-02

Pos logit contributions
gh+0.039
 color+0.035
 noses+0.035
 opinions+0.035
 cooperate+0.033
Neg logit contributions
ang-0.045
 sur-0.042
iker-0.040
lves-0.040
 begg-0.039
Token the
Token ID262
Feature activation-8.59e-02

Pos logit contributions
ang+0.037
 sur+0.035
lves+0.033
iker+0.033
 begg+0.033
Neg logit contributions
gh-0.028
 noses-0.026
 color-0.025
 opinions-0.025
 cooperate-0.024
Token best
Token ID1266
Feature activation-4.02e-02

Pos logit contributions
ang+0.147
 sur+0.143
 begg+0.135
iker+0.133
 traveller+0.132
Neg logit contributions
gh-0.116
 noses-0.104
 opinions-0.100
Their-0.100
 cooperate-0.098
Token choice
Token ID3572
Feature activation-2.09e-01

Pos logit contributions
ang+0.080
lves+0.074
 sur+0.071
iker+0.069
 begg+0.069
Neg logit contributions
gh-0.048
 color-0.047
 rhythm-0.043
 opinions-0.043
 respect-0.042
Token for
Token ID329
Feature activation-6.06e-02

Pos logit contributions
ang+0.373
 sur+0.340
lves+0.338
 begg+0.326
iker+0.322
Neg logit contributions
gh-0.304
 color-0.273
 respect-0.258
 opinions-0.253
 sounds-0.251
Token her
Token ID607
Feature activation-1.13e-02

Pos logit contributions
ang+0.115
 sur+0.108
iker+0.103
lves+0.103
 begg+0.102
Neg logit contributions
gh-0.075
 color-0.066
 noses-0.065
 opinions-0.065
 cooperate-0.062
Token to
Token ID284
Feature activation+6.57e-03

Pos logit contributions
ang+0.020
 sur+0.018
iker+0.017
lves+0.017
 begg+0.017
Neg logit contributions
gh-0.016
 color-0.014
 noses-0.014
 opinions-0.014
 cooperate-0.013
Token remain
Token ID3520
Feature activation-1.67e-02

Pos logit contributions
gh+0.008
 noses+0.007
Their+0.007
thia+0.007
 opinions+0.007
Neg logit contributions
 sur-0.012
ang-0.011
 Rory-0.011
 begg-0.011
 traveller-0.011
Token.
Token ID13
Feature activation-3.15e-03

Pos logit contributions
ang+0.028
 sur+0.026
iker+0.024
lves+0.024
 begg+0.024
Neg logit contributions
gh-0.025
 color-0.022
 noses-0.022
 opinions-0.022
 cooperate-0.021
Token was
Token ID373
Feature activation-2.67e-02

Pos logit contributions
ang+0.129
lves+0.125
 sur+0.120
 begg+0.116
iker+0.116
Neg logit contributions
gh-0.067
 color-0.056
 respect-0.055
 sounds-0.053
 cooperate-0.053
Token now
Token ID783
Feature activation-7.97e-02

Pos logit contributions
ang+0.057
 sur+0.051
iker+0.051
lves+0.050
 begg+0.048
Neg logit contributions
gh-0.033
 color-0.027
 respect-0.027
 opinions-0.026
 quarrel-0.026
Token the
Token ID262
Feature activation-9.65e-02

Pos logit contributions
ang+0.156
lves+0.138
 sur+0.138
iker+0.136
 begg+0.131
Neg logit contributions
gh-0.106
 color-0.097
 respect-0.095
 opinions-0.092
 noses-0.091
Token mouse
Token ID10211
Feature activation-1.09e-01

Pos logit contributions
ang+0.185
 sur+0.160
lves+0.160
iker+0.158
aro+0.156
Neg logit contributions
gh-0.125
 color-0.119
 opinions-0.112
 quarrel-0.111
 noses-0.111
Token's
Token ID338
Feature activation+6.50e-02

Pos logit contributions
ang+0.236
 sur+0.218
iker+0.212
lves+0.212
 Rory+0.208
Neg logit contributions
gh-0.113
 color-0.102
 respect-0.102
 sounds-0.098
 draw-0.097
Token reality
Token ID3950
Feature activation-2.09e-01

Pos logit contributions
gh+0.081
 color+0.074
 opinions+0.074
 noses+0.073
 respect+0.071
Neg logit contributions
ang-0.127
 sur-0.116
lves-0.112
iker-0.110
 Rory-0.108
Token.
Token ID13
Feature activation-8.77e-03

Pos logit contributions
ang+0.427
lves+0.403
 sur+0.388
 Rory+0.384
 Kang+0.377
Neg logit contributions
gh-0.255
 color-0.219
 respect-0.211
 sounds-0.205
 cooperate-0.204
Token<|endoftext|>
Token ID50256
Feature activation+1.87e-03

Pos logit contributions
iker+0.020
ang+0.020
lves+0.019
 sur+0.019
 energ+0.018
Neg logit contributions
 color-0.006
Their-0.006
 Her-0.006
Her-0.006
thia-0.006
TokenOnce
Token ID7454
Feature activation-3.04e-02

Pos logit contributions
gh+0.002
 color+0.002
 sounds+0.002
 draw+0.002
 opinions+0.002
Neg logit contributions
ang-0.003
iker-0.003
lves-0.003
 sur-0.003
 energ-0.003
Token upon
Token ID2402
Feature activation-1.99e-02

Pos logit contributions
 sur+0.054
iker+0.053
 begg+0.052
ang+0.052
lves+0.050
Neg logit contributions
gh-0.038
 color-0.038
Their-0.034
 draw-0.032
 quarrel-0.032
Token a
Token ID257
Feature activation+2.70e-02

Pos logit contributions
ang+0.038
 sur+0.035
iker+0.034
lves+0.033
 begg+0.033
Neg logit contributions
gh-0.024
 color-0.022
 noses-0.021
 opinions-0.021
 cooperate-0.020
Token and
Token ID290
Feature activation-5.73e-02

Pos logit contributions
gh+0.025
 color+0.025
 sounds+0.023
 rhythm+0.023
 draw+0.023
Neg logit contributions
ang-0.058
lves-0.056
aro-0.051
iker-0.051
 begg-0.051
Token thanked
Token ID26280
Feature activation-9.34e-02

Pos logit contributions
ang+0.112
lves+0.102
iker+0.100
aro+0.096
erry+0.096
Neg logit contributions
 color-0.071
 draw-0.071
 sounds-0.071
 drew-0.068
gh-0.067
Token the
Token ID262
Feature activation-7.83e-02

Pos logit contributions
ang+0.186
lves+0.170
iker+0.161
unn+0.157
 sur+0.153
Neg logit contributions
 color-0.122
gh-0.117
 draw-0.111
 jokes-0.109
 rhythm-0.109
Token ha
Token ID387
Feature activation-1.43e-01

Pos logit contributions
ang+0.130
lves+0.123
 Wil+0.115
aro+0.113
 Nem+0.113
Neg logit contributions
gh-0.118
 color-0.115
 joke-0.111
 rhythm-0.109
 sounds-0.108
Tokenird
Token ID1447
Feature activation-1.16e-01

Pos logit contributions
lves+0.270
 Nem+0.260
 energ+0.258
 traveller+0.253
 Eddie+0.250
Neg logit contributions
gh-0.185
 color-0.138
ap-0.134
 draw-0.126
 cooperate-0.125
Tokenress
Token ID601
Feature activation-2.09e-01

Pos logit contributions
lves+0.155
 energ+0.153
 sur+0.147
 traveller+0.142
 Rory+0.141
Neg logit contributions
gh-0.194
 color-0.179
ap-0.169
thia-0.167
Her-0.167
Tokener
Token ID263
Feature activation-9.54e-02

Pos logit contributions
lves+0.442
 sur+0.426
ang+0.417
 energ+0.417
 Rory+0.408
Neg logit contributions
gh-0.176
 color-0.138
 sounds-0.133
Her-0.131
 feelings-0.131
Token.
Token ID13
Feature activation-3.11e-03

Pos logit contributions
lves+0.201
ang+0.200
 sur+0.193
 energ+0.192
 begg+0.188
Neg logit contributions
 color-0.084
gh-0.082
 draw-0.073
 respect-0.072
 noses-0.067
Token As
Token ID1081
Feature activation-1.10e-01

Pos logit contributions
ang+0.007
iker+0.006
 sur+0.006
 traveller+0.006
 begg+0.006
Neg logit contributions
 Her-0.003
 color-0.002
 opinions-0.002
thia-0.002
 draw-0.002
Token they
Token ID484
Feature activation-1.43e-02

Pos logit contributions
ang+0.178
lves+0.159
iker+0.156
unn+0.153
 begg+0.146
Neg logit contributions
 color-0.176
gh-0.175
 purpose-0.157
Their-0.153
 draw-0.151
Token were
Token ID547
Feature activation+1.77e-02

Pos logit contributions
ang+0.026
iker+0.025
lves+0.024
 sur+0.024
 traveller+0.024
Neg logit contributions
 color-0.017
gh-0.017
 draw-0.016
 noses-0.016
 quarrel-0.015
TokenN
Token ID45
Feature activation-1.77e-02

Pos logit contributions
gh+0.011
Their+0.010
 color+0.010
 respect+0.009
 cooperate+0.009
Neg logit contributions
ang-0.013
iker-0.012
 sur-0.012
lves-0.012
 begg-0.011
Tokenora
Token ID5799
Feature activation-5.78e-02

Pos logit contributions
lves+0.026
 Wil+0.026
ang+0.025
iker+0.024
aro+0.024
Neg logit contributions
gh-0.026
 opinions-0.024
 sounds-0.024
 color-0.024
ap-0.024
Token was
Token ID373
Feature activation+7.30e-03

Pos logit contributions
ang+0.110
iker+0.108
lves+0.105
 begg+0.101
 Wil+0.099
Neg logit contributions
gh-0.065
 opinions-0.060
 sounds-0.060
 color-0.060
 noses-0.059
Token a
Token ID257
Feature activation-1.40e-02

Pos logit contributions
gh+0.010
 color+0.009
 opinions+0.009
 rhythm+0.009
 respect+0.009
Neg logit contributions
ang-0.012
iker-0.012
lves-0.011
 sur-0.011
unn-0.011
Token fast
Token ID3049
Feature activation-5.99e-03

Pos logit contributions
ang+0.025
iker+0.023
aro+0.022
 Wil+0.022
lves+0.021
Neg logit contributions
gh-0.018
 color-0.018
 rhythm-0.017
 quarrel-0.016
 joke-0.016
Token lear
Token ID22454
Feature activation-2.08e-01

Pos logit contributions
ang+0.009
 Wil+0.009
aro+0.009
lves+0.008
unn+0.008
Neg logit contributions
 shape-0.008
 rhythm-0.008
 color-0.008
gh-0.008
 draw-0.008
Tokenner
Token ID1008
Feature activation-3.96e-02

Pos logit contributions
ang+0.211
lves+0.208
 begg+0.203
iker+0.198
 traveller+0.192
Neg logit contributions
gh-0.426
ap-0.351
 respect-0.337
ze-0.335
 cooperate-0.334
Token.
Token ID13
Feature activation-4.47e-03

Pos logit contributions
ang+0.082
lves+0.079
 sur+0.077
iker+0.076
 begg+0.076
Neg logit contributions
gh-0.041
 color-0.038
 noses-0.034
 respect-0.033
 quarrel-0.033
Token Soon
Token ID15894
Feature activation-2.65e-02

Pos logit contributions
ang+0.011
aro+0.011
lves+0.010
iker+0.010
aroo+0.010
Neg logit contributions
 names-0.003
 color-0.003
 Her-0.003
ap-0.002
Her-0.002
Token enough
Token ID1576
Feature activation-1.39e-01

Pos logit contributions
ang+0.059
iker+0.053
lves+0.051
 sur+0.050
aroo+0.050
Neg logit contributions
gh-0.027
 color-0.024
 noses-0.023
 sounds-0.023
 respect-0.022
Token she
Token ID673
Feature activation-9.45e-02

Pos logit contributions
ang+0.302
lves+0.281
 sur+0.267
iker+0.263
 begg+0.256
Neg logit contributions
gh-0.148
 color-0.143
 noses-0.132
 opinions-0.126
 respect-0.125
Token a
Token ID257
Feature activation+1.59e-02

Pos logit contributions
ang+0.200
 sur+0.185
iker+0.177
lves+0.175
 begg+0.174
Neg logit contributions
gh-0.170
 color-0.154
 noses-0.152
 opinions-0.150
 respect-0.145
Token un
Token ID555
Feature activation-6.63e-02

Pos logit contributions
gh+0.023
 color+0.020
 noses+0.020
 rhythm+0.019
 draw+0.019
Neg logit contributions
ang-0.027
 sur-0.025
iker-0.024
lves-0.024
 begg-0.024
Tokenicycle
Token ID35298
Feature activation-2.23e-02

Pos logit contributions
ang+0.108
 sur+0.099
iker+0.094
lves+0.094
 begg+0.093
Neg logit contributions
gh-0.100
 color-0.091
 noses-0.090
 opinions-0.090
 cooperate-0.086
Token.
Token ID13
Feature activation-5.79e-03

Pos logit contributions
ang+0.038
 sur+0.036
lves+0.036
 begg+0.035
 Rory+0.035
Neg logit contributions
gh-0.030
 color-0.028
 noses-0.026
 sounds-0.026
 draw-0.026
Token She
Token ID1375
Feature activation-3.03e-02

Pos logit contributions
ang+0.011
iker+0.010
lves+0.010
 sur+0.010
 begg+0.010
Neg logit contributions
 color-0.007
gh-0.007
 noses-0.006
 draw-0.006
Their-0.006
Token laughed
Token ID13818
Feature activation+1.15e-01

Pos logit contributions
ang+0.059
iker+0.057
lves+0.055
 traveller+0.054
 begg+0.053
Neg logit contributions
gh-0.034
 color-0.034
 noses-0.031
 draw-0.031
 sounds-0.031
Token and
Token ID290
Feature activation-3.64e-02

Pos logit contributions
gh+0.174
 color+0.158
 noses+0.157
 opinions+0.156
 cooperate+0.150
Neg logit contributions
ang-0.185
 sur-0.171
iker-0.163
lves-0.162
 begg-0.160
Token cl
Token ID537
Feature activation+3.87e-02

Pos logit contributions
ang+0.071
iker+0.068
lves+0.066
 begg+0.064
 traveller+0.064
Neg logit contributions
 color-0.040
gh-0.039
 noses-0.038
 sounds-0.037
 rhythm-0.037
Tokenapped
Token ID6320
Feature activation+3.82e-02

Pos logit contributions
gh+0.062
 color+0.056
 noses+0.056
 opinions+0.056
 cooperate+0.054
Neg logit contributions
ang-0.059
 sur-0.055
iker-0.052
lves-0.052
 begg-0.051
Token,
Token ID11
Feature activation+2.39e-02

Pos logit contributions
gh+0.053
 opinions+0.047
 noses+0.046
 color+0.046
 cooperate+0.045
Neg logit contributions
ang-0.065
 sur-0.061
 Rory-0.057
 begg-0.057
iker-0.056
Token she
Token ID673
Feature activation-7.30e-02

Pos logit contributions
gh+0.032
 color+0.030
 noses+0.030
 rhythm+0.028
 sounds+0.028
Neg logit contributions
ang-0.042
iker-0.039
 sur-0.038
lves-0.038
 begg-0.037
Token
Token ID198
Feature activation+2.98e-02

Pos logit contributions
gh+0.048
 color+0.044
 noses+0.043
 opinions+0.043
 cooperate+0.041
Neg logit contributions
ang-0.044
 sur-0.041
iker-0.038
lves-0.038
 begg-0.038
Token"
Token ID1
Feature activation+1.10e-02

Pos logit contributions
gh+0.035
 noses+0.031
 color+0.030
 opinions+0.030
 sounds+0.028
Neg logit contributions
ang-0.059
 sur-0.055
lves-0.052
iker-0.052
 begg-0.051
TokenMaybe
Token ID13300
Feature activation-4.66e-02

Pos logit contributions
gh+0.020
sing+0.018
ag+0.018
 noses+0.018
ap+0.017
Neg logit contributions
ang-0.014
 sur-0.013
 Rory-0.012
aroo-0.011
 begg-0.011
Token Mom
Token ID11254
Feature activation+7.99e-02

Pos logit contributions
ang+0.072
lves+0.066
iker+0.065
 sur+0.063
 begg+0.060
Neg logit contributions
 color-0.076
gh-0.076
 noses-0.070
 opinions-0.070
 respect-0.070
Token will
Token ID481
Feature activation+4.32e-02

Pos logit contributions
 color+0.055
gh+0.054
 opinions+0.051
 sounds+0.048
 noses+0.047
Neg logit contributions
ang-0.190
 sur-0.184
iker-0.178
lves-0.176
 begg-0.175
Token laugh
Token ID6487
Feature activation+1.64e-01

Pos logit contributions
gh+0.056
 noses+0.051
 opinions+0.049
etrical+0.047
Their+0.047
Neg logit contributions
 sur-0.072
ang-0.071
 Rory-0.066
 begg-0.066
 Wil-0.066
Token too
Token ID1165
Feature activation+5.45e-02

Pos logit contributions
gh+0.255
 color+0.231
 noses+0.230
 opinions+0.229
 cooperate+0.220
Neg logit contributions
ang-0.255
 sur-0.236
iker-0.222
lves-0.222
 begg-0.220
Token,"
Token ID553
Feature activation-2.22e-02

Pos logit contributions
gh+0.081
 color+0.074
 noses+0.072
 opinions+0.071
 cooperate+0.069
Neg logit contributions
ang-0.091
 sur-0.084
lves-0.082
iker-0.080
 begg-0.079
Token Lily
Token ID20037
Feature activation-1.17e-02

Pos logit contributions
ang+0.041
lves+0.039
iker+0.038
 sur+0.037
 energ+0.037
Neg logit contributions
 noses-0.028
 color-0.028
 faces-0.028
Their-0.027
 draw-0.027
Token said
Token ID531
Feature activation-3.42e-02

Pos logit contributions
ang+0.015
 sur+0.013
 begg+0.012
iker+0.012
lves+0.012
Neg logit contributions
gh-0.022
 color-0.020
 opinions-0.020
 noses-0.020
 respect-0.019
Token.
Token ID13
Feature activation+2.62e-03

Pos logit contributions
ang+0.044
 sur+0.041
 he+0.040
 Rory+0.037
iker+0.037
Neg logit contributions
gh-0.057
 opinions-0.056
 noses-0.054
etrical-0.054
 color-0.054
Token both
Token ID1111
Feature activation+1.21e-01

Pos logit contributions
gh+0.017
thia+0.015
etrical+0.015
tr+0.015
 sentence+0.014
Neg logit contributions
ang-0.007
 he-0.007
 elderly-0.007
 Nem-0.006
 Eddie-0.005
Token.
Token ID13
Feature activation+2.70e-02

Pos logit contributions
gh+0.184
 noses+0.157
 color+0.151
 opinions+0.151
etrical+0.151
Neg logit contributions
ang-0.194
 sur-0.181
iker-0.170
 begg-0.169
lves-0.167
Token Now
Token ID2735
Feature activation-3.12e-02

Pos logit contributions
gh+0.046
 color+0.044
 opinions+0.044
 noses+0.043
 respect+0.043
Neg logit contributions
ang-0.037
 sur-0.034
lves-0.032
iker-0.032
 begg-0.031
Token,
Token ID11
Feature activation+4.14e-02

Pos logit contributions
ang+0.058
 sur+0.053
lves+0.052
iker+0.052
 begg+0.050
Neg logit contributions
gh-0.041
 color-0.038
 noses-0.037
 opinions-0.036
 respect-0.036
Token let
Token ID1309
Feature activation-6.03e-02

Pos logit contributions
gh+0.054
 color+0.053
 opinions+0.050
 respect+0.050
 noses+0.049
Neg logit contributions
ang-0.076
 sur-0.069
iker-0.067
lves-0.067
 begg-0.065
Token's
Token ID338
Feature activation+1.38e-01

Pos logit contributions
ang+0.133
 sur+0.126
lves+0.124
 begg+0.124
 Rory+0.121
Neg logit contributions
gh-0.050
 color-0.042
 noses-0.041
 opinions-0.040
 cooperate-0.040
Token have
Token ID423
Feature activation+3.45e-02

Pos logit contributions
gh+0.144
 noses+0.123
 color+0.122
 opinions+0.121
 cooperate+0.113
Neg logit contributions
ang-0.284
 sur-0.270
iker-0.257
 begg-0.256
lves-0.256
Token some
Token ID617
Feature activation+6.56e-02

Pos logit contributions
gh+0.053
iver+0.046
 noses+0.046
 rhythm+0.045
sing+0.045
Neg logit contributions
 Nem-0.048
 sur-0.047
 begg-0.046
 Wil-0.046
 he-0.044
Token cookies
Token ID14746
Feature activation+3.92e-02

Pos logit contributions
gh+0.075
 noses+0.064
 color+0.064
 opinions+0.064
 cooperate+0.060
Neg logit contributions
ang-0.129
 sur-0.122
iker-0.117
 begg-0.116
lves-0.115
Token and
Token ID290
Feature activation+9.15e-03

Pos logit contributions
gh+0.053
 color+0.048
 opinions+0.047
 noses+0.047
 cooperate+0.046
Neg logit contributions
ang-0.069
 sur-0.064
lves-0.061
iker-0.061
 begg-0.061
Token milk
Token ID7545
Feature activation+5.29e-02

Pos logit contributions
gh+0.011
 color+0.009
 noses+0.009
 opinions+0.009
 rhythm+0.009
Neg logit contributions
ang-0.017
 sur-0.016
 begg-0.015
lves-0.015
iker-0.015
Token,
Token ID11
Feature activation+4.80e-02

Pos logit contributions
gh+0.090
 color+0.078
 opinions+0.077
 noses+0.077
 cooperate+0.073
Neg logit contributions
ang-0.166
 sur-0.156
iker-0.150
lves-0.149
 begg-0.148
Token and
Token ID290
Feature activation-1.30e-02

Pos logit contributions
gh+0.059
 color+0.058
 noses+0.058
 draw+0.055
 rhythm+0.054
Neg logit contributions
ang-0.086
iker-0.081
 sur-0.080
lves-0.077
 begg-0.077
Token they
Token ID484
Feature activation+2.67e-02

Pos logit contributions
ang+0.023
iker+0.022
lves+0.021
 traveller+0.021
 sur+0.021
Neg logit contributions
 color-0.015
gh-0.015
 noses-0.015
 draw-0.015
 drew-0.014
Token rolled
Token ID11686
Feature activation+1.61e-02

Pos logit contributions
gh+0.030
 color+0.029
 noses+0.028
 draw+0.028
 quarrel+0.026
Neg logit contributions
ang-0.051
iker-0.049
 sur-0.048
 begg-0.047
lves-0.047
Token all
Token ID477
Feature activation+2.22e-02

Pos logit contributions
gh+0.019
 noses+0.019
 color+0.018
 opinions+0.018
Their+0.017
Neg logit contributions
ang-0.030
 sur-0.028
iker-0.028
 begg-0.028
lves-0.028
Token over
Token ID625
Feature activation+1.09e-01

Pos logit contributions
 color+0.034
gh+0.033
 opinions+0.032
Their+0.031
 noses+0.031
Neg logit contributions
ang-0.032
 sur-0.032
iker-0.032
 begg-0.031
lves-0.030
Token the
Token ID262
Feature activation-1.36e-02

Pos logit contributions
gh+0.167
 color+0.161
 noses+0.156
 opinions+0.155
 respect+0.150
Neg logit contributions
ang-0.166
 sur-0.153
iker-0.153
lves-0.152
 begg-0.151
Token land
Token ID1956
Feature activation-1.03e-01

Pos logit contributions
ang+0.021
 sur+0.019
lves+0.018
iker+0.018
aro+0.018
Neg logit contributions
gh-0.021
 color-0.021
 rhythm-0.020
 sounds-0.020
 noses-0.020
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
ang+0.202
 sur+0.197
lves+0.197
 begg+0.191
iker+0.190
Neg logit contributions
gh-0.114
 color-0.104
 noses-0.099
 opinions-0.096
 cooperate-0.095
Token They
Token ID1119
Feature activation+6.41e-02

Pos logit contributions
 color+0.012
 noses+0.012
 names+0.012
Their+0.011
thia+0.011
Neg logit contributions
ang-0.026
iker-0.024
 sur-0.024
lves-0.024
 begg-0.024
Token rolled
Token ID11686
Feature activation+2.69e-02

Pos logit contributions
gh+0.066
 color+0.065
 noses+0.061
 draw+0.060
 quarrel+0.058
Neg logit contributions
ang-0.129
iker-0.122
 sur-0.121
 begg-0.120
 traveller-0.117
Token animal
Token ID5044
Feature activation-7.54e-02

Pos logit contributions
 color+0.003
gh+0.003
 opinions+0.003
 noses+0.003
 sounds+0.003
Neg logit contributions
ang-0.003
 sur-0.003
 begg-0.003
iker-0.003
 energ-0.003
Token and
Token ID290
Feature activation-3.68e-02

Pos logit contributions
ang+0.127
 sur+0.118
iker+0.112
lves+0.111
 begg+0.111
Neg logit contributions
gh-0.109
 color-0.097
 opinions-0.096
 noses-0.096
 cooperate-0.092
Token said
Token ID531
Feature activation-6.18e-02

Pos logit contributions
ang+0.071
iker+0.065
lves+0.063
 sur+0.060
aro+0.059
Neg logit contributions
 noses-0.046
 color-0.046
 faces-0.045
 sounds-0.045
gh-0.044
Token "
Token ID366
Feature activation-4.00e-02

Pos logit contributions
ang+0.094
 sur+0.086
iker+0.081
lves+0.081
 begg+0.080
Neg logit contributions
gh-0.099
 color-0.090
 noses-0.090
 opinions-0.089
 cooperate-0.086
TokenLet
Token ID5756
Feature activation-1.41e-02

Pos logit contributions
ang+0.066
 sur+0.061
iker+0.059
lves+0.058
 begg+0.058
Neg logit contributions
gh-0.058
 color-0.053
 opinions-0.052
 noses-0.052
 respect-0.050
Token's
Token ID338
Feature activation+1.29e-01

Pos logit contributions
ang+0.029
 sur+0.028
lves+0.027
 Rory+0.027
 traveller+0.027
Neg logit contributions
gh-0.013
 noses-0.012
 color-0.011
etrical-0.011
 opinions-0.011
Token go
Token ID467
Feature activation-4.93e-02

Pos logit contributions
gh+0.135
 color+0.118
 noses+0.116
 opinions+0.116
 cooperate+0.109
Neg logit contributions
ang-0.266
 sur-0.250
iker-0.240
lves-0.240
 begg-0.238
Token cl
Token ID537
Feature activation+3.43e-02

Pos logit contributions
ang+0.093
lves+0.089
 sur+0.089
iker+0.087
 begg+0.086
Neg logit contributions
gh-0.054
 color-0.051
 draw-0.051
 respect-0.051
 talk-0.050
Tokenap
Token ID499
Feature activation+9.18e-02

Pos logit contributions
gh+0.012
 opinions+0.007
 color+0.007
 cooperate+0.006
 noses+0.006
Neg logit contributions
ang-0.093
lves-0.090
 sur-0.090
iker-0.089
 begg-0.088
Token for
Token ID329
Feature activation-3.03e-02

Pos logit contributions
gh+0.133
 color+0.119
 noses+0.119
 opinions+0.118
 cooperate+0.113
Neg logit contributions
ang-0.153
 sur-0.142
iker-0.133
lves-0.132
 begg-0.132
Token my
Token ID616
Feature activation+1.02e-02

Pos logit contributions
ang+0.049
 sur+0.045
iker+0.043
lves+0.043
 begg+0.042
Neg logit contributions
gh-0.046
 color-0.042
 noses-0.041
 opinions-0.041
 cooperate-0.039
Token quickly
Token ID2952
Feature activation-3.05e-02

Pos logit contributions
ang+0.043
iker+0.040
 sur+0.039
lves+0.039
 begg+0.038
Neg logit contributions
gh-0.025
 sounds-0.024
 noses-0.024
 color-0.023
 opinions-0.023
Token flew
Token ID13112
Feature activation-9.14e-02

Pos logit contributions
ang+0.058
iker+0.050
lves+0.050
 sur+0.049
 Kang+0.048
Neg logit contributions
gh-0.039
 respect-0.036
 draw-0.035
 noses-0.035
 faces-0.035
Token away
Token ID1497
Feature activation+2.25e-02

Pos logit contributions
ang+0.183
 sur+0.170
iker+0.166
lves+0.165
 begg+0.164
Neg logit contributions
gh-0.102
 color-0.093
 noses-0.091
 respect-0.088
 rhythm-0.088
Token but
Token ID475
Feature activation-2.97e-02

Pos logit contributions
gh+0.024
 color+0.022
 rhythm+0.022
 noses+0.021
 respect+0.021
Neg logit contributions
ang-0.046
lves-0.042
iker-0.042
 sur-0.041
 begg-0.040
Token the
Token ID262
Feature activation-4.72e-02

Pos logit contributions
ang+0.052
iker+0.047
 sur+0.047
 begg+0.045
lves+0.045
Neg logit contributions
gh-0.040
 respect-0.038
 color-0.038
 opinions-0.036
 sounds-0.035
Token fire
Token ID2046
Feature activation-3.78e-03

Pos logit contributions
ang+0.080
 sur+0.075
lves+0.071
iker+0.070
 begg+0.070
Neg logit contributions
gh-0.068
 opinions-0.058
 noses-0.058
 color-0.057
 cooperate-0.056
Token had
Token ID550
Feature activation+2.33e-02

Pos logit contributions
ang+0.007
 sur+0.006
lves+0.006
iker+0.006
 begg+0.006
Neg logit contributions
gh-0.005
 color-0.004
 noses-0.004
 opinions-0.004
 sounds-0.004
Token spread
Token ID4104
Feature activation-5.20e-02

Pos logit contributions
gh+0.031
 color+0.027
 noses+0.027
 opinions+0.027
 cooperate+0.026
Neg logit contributions
ang-0.041
 sur-0.039
iker-0.037
lves-0.037
 begg-0.037
Token too
Token ID1165
Feature activation+3.84e-02

Pos logit contributions
ang+0.090
 sur+0.083
iker+0.079
lves+0.079
 begg+0.078
Neg logit contributions
gh-0.073
 color-0.066
 noses-0.065
 opinions-0.065
 respect-0.063
Token far
Token ID1290
Feature activation-1.12e-01

Pos logit contributions
gh+0.049
 color+0.043
 respect+0.043
 noses+0.042
 opinions+0.042
Neg logit contributions
ang-0.071
 sur-0.067
iker-0.065
lves-0.064
 begg-0.064
Token.
Token ID13
Feature activation-7.22e-03

Pos logit contributions
ang+0.244
 sur+0.226
 begg+0.221
lves+0.216
iker+0.214
Neg logit contributions
gh-0.111
 respect-0.102
 color-0.097
 noses-0.097
 sounds-0.092
Token air
Token ID1633
Feature activation+1.00e-02

Pos logit contributions
ang+0.064
 sur+0.058
iker+0.056
lves+0.056
 Rory+0.056
Neg logit contributions
gh-0.038
 color-0.037
 opinions-0.035
 noses-0.035
 rhythm-0.034
Token.
Token ID13
Feature activation+2.15e-02

Pos logit contributions
gh+0.013
 color+0.012
 noses+0.012
 opinions+0.011
 respect+0.011
Neg logit contributions
ang-0.018
 sur-0.017
iker-0.016
lves-0.016
 begg-0.016
Token The
Token ID383
Feature activation+1.81e-03

Pos logit contributions
 color+0.023
Their+0.022
 opinions+0.021
 respect+0.021
 draw+0.021
Neg logit contributions
ang-0.041
iker-0.041
lves-0.041
 sur-0.040
 traveller-0.040
Token light
Token ID1657
Feature activation+5.52e-02

Pos logit contributions
gh+0.002
 color+0.002
 noses+0.002
 opinions+0.002
 sounds+0.002
Neg logit contributions
ang-0.004
 sur-0.003
iker-0.003
lves-0.003
 Rory-0.003
Token is
Token ID318
Feature activation+3.09e-02

Pos logit contributions
gh+0.064
 color+0.061
 noses+0.060
 opinions+0.059
 sounds+0.058
Neg logit contributions
ang-0.106
 sur-0.098
iker-0.097
lves-0.097
 begg-0.095
Token good
Token ID922
Feature activation+1.76e-02

Pos logit contributions
gh+0.043
 opinions+0.037
 noses+0.037
 color+0.037
 cooperate+0.036
Neg logit contributions
ang-0.053
 sur-0.050
lves-0.047
 begg-0.046
iker-0.046
Token for
Token ID329
Feature activation-3.44e-02

Pos logit contributions
 color+0.019
gh+0.019
 opinions+0.018
 noses+0.018
 sounds+0.017
Neg logit contributions
ang-0.036
lves-0.033
 sur-0.033
 begg-0.032
iker-0.032
Token the
Token ID262
Feature activation-3.52e-02

Pos logit contributions
ang+0.054
 sur+0.050
iker+0.048
lves+0.047
 begg+0.047
Neg logit contributions
gh-0.054
 color-0.050
 noses-0.049
 opinions-0.049
 respect-0.047
Token sky
Token ID6766
Feature activation+5.13e-02

Pos logit contributions
ang+0.066
 sur+0.061
iker+0.059
lves+0.059
 begg+0.058
Neg logit contributions
gh-0.044
 color-0.040
 noses-0.039
 opinions-0.039
 respect-0.037
Token.
Token ID13
Feature activation+2.13e-02

Pos logit contributions
gh+0.069
 color+0.062
 noses+0.061
 opinions+0.061
 cooperate+0.058
Neg logit contributions
ang-0.091
 sur-0.085
iker-0.081
lves-0.081
 begg-0.080
Token The
Token ID383
Feature activation-2.68e-04

Pos logit contributions
 color+0.024
Their+0.022
 opinions+0.022
 respect+0.021
 draw+0.021
Neg logit contributions
iker-0.040
ang-0.040
lves-0.039
 sur-0.039
 traveller-0.039
Token let
Token ID1309
Feature activation-6.69e-02

Pos logit contributions
ang+0.013
iker+0.011
lves+0.011
 sur+0.011
 traveller+0.011
Neg logit contributions
 color-0.008
 noses-0.008
 draw-0.008
gh-0.007
 sounds-0.007
Token it
Token ID340
Feature activation+6.32e-02

Pos logit contributions
ang+0.124
 sur+0.114
iker+0.111
lves+0.109
 begg+0.107
Neg logit contributions
gh-0.093
 color-0.084
 noses-0.079
 draw-0.078
 opinions-0.078
Token splash
Token ID22870
Feature activation+7.15e-02

Pos logit contributions
gh+0.069
 color+0.063
 noses+0.060
 opinions+0.059
 draw+0.058
Neg logit contributions
ang-0.128
 sur-0.121
iker-0.117
lves-0.117
 begg-0.115
Token with
Token ID351
Feature activation-2.27e-02

Pos logit contributions
gh+0.093
 color+0.089
 noses+0.085
 opinions+0.084
 sounds+0.083
Neg logit contributions
ang-0.129
 sur-0.123
iker-0.122
 begg-0.121
lves-0.119
Token the
Token ID262
Feature activation-3.06e-02

Pos logit contributions
ang+0.041
 sur+0.037
iker+0.037
lves+0.036
 begg+0.036
Neg logit contributions
gh-0.031
 color-0.029
 noses-0.029
 opinions-0.028
 respect-0.027
Token water
Token ID1660
Feature activation+1.40e-02

Pos logit contributions
ang+0.050
 sur+0.046
iker+0.046
lves+0.044
 begg+0.043
Neg logit contributions
gh-0.044
 color-0.044
 noses-0.043
 sounds-0.042
 opinions-0.041
Token.
Token ID13
Feature activation+5.50e-03

Pos logit contributions
gh+0.014
 color+0.014
 noses+0.013
 sounds+0.012
 faces+0.012
Neg logit contributions
ang-0.029
lves-0.028
 sur-0.028
iker-0.027
 begg-0.027
Token The
Token ID383
Feature activation-2.95e-03

Pos logit contributions
gh+0.007
 color+0.007
 opinions+0.007
 noses+0.007
 draw+0.007
Neg logit contributions
ang-0.010
lves-0.009
 sur-0.009
iker-0.009
 begg-0.009
Token ducks
Token ID39694
Feature activation-2.67e-04

Pos logit contributions
ang+0.004
 sur+0.004
iker+0.004
lves+0.004
 begg+0.004
Neg logit contributions
gh-0.005
 color-0.005
 noses-0.005
 opinions-0.005
 respect-0.004
Token came
Token ID1625
Feature activation-1.18e-02

Pos logit contributions
ang+0.000
 sur+0.000
iker+0.000
 begg+0.000
lves+0.000
Neg logit contributions
gh-0.000
 color-0.000
 noses-0.000
 opinions-0.000
 draw-0.000
Token closer
Token ID5699
Feature activation+1.18e-01

Pos logit contributions
ang+0.016
 sur+0.014
aroo+0.012
 Rory+0.012
 Nem+0.012
Neg logit contributions
gh-0.022
 opinions-0.020
 cooperate-0.020
thia-0.020
 purpose-0.019
Token that
Token ID326
Feature activation-5.77e-02

Pos logit contributions
gh+0.003
 color+0.003
 noses+0.003
 opinions+0.003
 respect+0.002
Neg logit contributions
ang-0.003
 sur-0.003
iker-0.003
lves-0.003
 begg-0.003
Token his
Token ID465
Feature activation+1.64e-02

Pos logit contributions
ang+0.097
iker+0.089
 sur+0.085
lves+0.084
 begg+0.080
Neg logit contributions
 color-0.088
gh-0.081
 sounds-0.080
 noses-0.080
 rhythm-0.079
Token favorite
Token ID4004
Feature activation+9.89e-02

Pos logit contributions
gh+0.023
 color+0.022
 noses+0.022
 opinions+0.021
 rhythm+0.020
Neg logit contributions
ang-0.028
 sur-0.025
lves-0.025
iker-0.025
 energ-0.024
Token toy
Token ID13373
Feature activation+5.49e-03

Pos logit contributions
gh+0.162
 opinions+0.146
 noses+0.145
 cooperate+0.144
 color+0.144
Neg logit contributions
ang-0.145
 sur-0.134
iker-0.125
 begg-0.124
lves-0.123
Token was
Token ID373
Feature activation-3.77e-03

Pos logit contributions
gh+0.008
 color+0.007
 noses+0.007
 opinions+0.007
 cooperate+0.007
Neg logit contributions
ang-0.009
 sur-0.009
iker-0.008
lves-0.008
 begg-0.008
Token missing
Token ID4814
Feature activation+9.44e-03

Pos logit contributions
ang+0.006
 sur+0.006
 begg+0.006
 Rory+0.005
lves+0.005
Neg logit contributions
gh-0.005
 opinions-0.005
 cooperate-0.005
 noses-0.005
 respect-0.005
Token.
Token ID13
Feature activation-2.26e-03

Pos logit contributions
gh+0.012
 color+0.011
 noses+0.011
 opinions+0.011
 respect+0.010
Neg logit contributions
ang-0.017
 sur-0.016
iker-0.015
lves-0.015
 begg-0.015
Token He
Token ID679
Feature activation-8.50e-03

Pos logit contributions
ang+0.003
 sur+0.003
iker+0.003
lves+0.003
aro+0.003
Neg logit contributions
gh-0.004
 color-0.004
 opinions-0.003
 noses-0.003
 sounds-0.003
Token looked
Token ID3114
Feature activation+5.23e-02

Pos logit contributions
ang+0.013
 sur+0.012
iker+0.012
lves+0.012
 begg+0.011
Neg logit contributions
gh-0.013
 color-0.012
 noses-0.012
 opinions-0.012
 cooperate-0.011
Token everywhere
Token ID8347
Feature activation-4.15e-02

Pos logit contributions
gh+0.078
 color+0.074
 noses+0.072
 opinions+0.071
 draw+0.070
Neg logit contributions
ang-0.085
 sur-0.078
iker-0.075
 begg-0.074
 Rory-0.071
Token,
Token ID11
Feature activation+3.71e-02

Pos logit contributions
ang+0.060
 sur+0.056
iker+0.052
 Rory+0.051
lves+0.051
Neg logit contributions
gh-0.067
 opinions-0.063
 noses-0.062
 color-0.060
 cooperate-0.059
Tokenmy
Token ID1820
Feature activation+6.91e-02

Pos logit contributions
 color+0.033
gh+0.032
 noses+0.030
 respect+0.029
 sounds+0.028
Neg logit contributions
ang-0.073
 sur-0.072
lves-0.072
iker-0.069
 begg-0.068
Token.
Token ID13
Feature activation-2.89e-04

Pos logit contributions
gh+0.073
 color+0.072
 noses+0.068
 respect+0.064
 sounds+0.063
Neg logit contributions
ang-0.138
 sur-0.135
lves-0.130
iker-0.128
 begg-0.127
Token Then
Token ID3244
Feature activation-8.72e-03

Pos logit contributions
lves+0.001
ang+0.001
iker+0.001
 energ+0.001
sie+0.001
Neg logit contributions
 color-0.000
Their-0.000
 names-0.000
 opinions-0.000
 respect-0.000
Token he
Token ID339
Feature activation-5.46e-02

Pos logit contributions
lves+0.016
ang+0.015
iker+0.013
pill+0.013
igator+0.013
Neg logit contributions
 color-0.013
gh-0.012
 sounds-0.012
 talk-0.011
 be-0.011
Token said
Token ID531
Feature activation-5.06e-02

Pos logit contributions
ang+0.110
lves+0.107
iker+0.103
aro+0.095
aroo+0.094
Neg logit contributions
 draw-0.058
 drew-0.058
 color-0.057
gh-0.057
 sounds-0.056
Token,
Token ID11
Feature activation+7.44e-03

Pos logit contributions
ang+0.110
lves+0.107
iker+0.097
 sur+0.091
 Kang+0.091
Neg logit contributions
gh-0.064
 respect-0.049
 color-0.048
Their-0.045
 faces-0.044
Token �
Token ID6184
Feature activation-7.13e-02

Pos logit contributions
gh+0.008
 color+0.008
Their+0.008
 draw+0.007
 respect+0.007
Neg logit contributions
ang-0.015
lves-0.015
iker-0.014
unn-0.013
 sur-0.013
Token�
Token ID95
Feature activation-6.82e-02

Pos logit contributions
ang+0.113
 sur+0.103
lves+0.099
iker+0.099
 begg+0.097
Neg logit contributions
gh-0.110
 color-0.098
 opinions-0.097
 cooperate-0.096
 noses-0.094
Token€
Token ID26391
Feature activation+5.78e-02

Pos logit contributions
ang+0.108
 sur+0.102
iker+0.095
lves+0.095
 begg+0.094
Neg logit contributions
gh-0.102
 noses-0.097
 color-0.094
 opinions-0.091
 rhythm-0.088
Token�
Token ID129
Feature activation-3.44e-02

Pos logit contributions
gh+0.089
 color+0.078
 opinions+0.077
 noses+0.077
thia+0.077
Neg logit contributions
ang-0.092
 sur-0.086
iker-0.083
lves-0.082
 Rory-0.082
Token�
Token ID241
Feature activation+3.45e-02

Pos logit contributions
ang+0.059
 sur+0.055
lves+0.053
iker+0.051
 begg+0.051
Neg logit contributions
gh-0.046
 color-0.045
 noses-0.043
 opinions-0.042
 cooperate-0.040
Token couldn
Token ID3521
Feature activation-7.01e-02

Pos logit contributions
ang+0.108
 sur+0.098
iker+0.097
lves+0.096
 begg+0.093
Neg logit contributions
gh-0.070
 color-0.063
 noses-0.062
 opinions-0.059
 quarrel-0.059
Token't
Token ID470
Feature activation+3.98e-02

Pos logit contributions
ang+0.161
 sur+0.153
iker+0.151
lves+0.150
 begg+0.145
Neg logit contributions
gh-0.067
 color-0.047
 sounds-0.046
Their-0.045
 quarrel-0.045
Token reach
Token ID3151
Feature activation-2.79e-02

Pos logit contributions
gh+0.051
 opinions+0.048
 noses+0.047
 color+0.044
 sounds+0.043
Neg logit contributions
ang-0.070
 sur-0.069
iker-0.065
lves-0.064
aroo-0.063
Token it
Token ID340
Feature activation+4.20e-02

Pos logit contributions
ang+0.050
 sur+0.045
lves+0.043
iker+0.043
 begg+0.042
Neg logit contributions
gh-0.041
 color-0.037
 noses-0.036
 respect-0.034
 draw-0.034
Token.
Token ID13
Feature activation-3.32e-03

Pos logit contributions
gh+0.057
 color+0.051
 noses+0.050
 opinions+0.049
 cooperate+0.047
Neg logit contributions
ang-0.075
 sur-0.070
lves-0.066
iker-0.066
 begg-0.066
Token She
Token ID1375
Feature activation-3.12e-02

Pos logit contributions
ang+0.006
 sur+0.006
iker+0.006
aro+0.005
lves+0.005
Neg logit contributions
gh-0.004
 opinions-0.004
 color-0.004
Their-0.004
 noses-0.004
Token laughed
Token ID13818
Feature activation+1.20e-01

Pos logit contributions
ang+0.061
 sur+0.057
iker+0.056
 begg+0.054
lves+0.054
Neg logit contributions
gh-0.036
 noses-0.034
 color-0.033
 quarrel-0.032
 sounds-0.032
Token and
Token ID290
Feature activation-3.17e-02

Pos logit contributions
gh+0.175
 color+0.158
 opinions+0.156
 noses+0.156
 cooperate+0.150
Neg logit contributions
ang-0.198
 sur-0.184
iker-0.174
lves-0.173
 begg-0.172
Token yelled
Token ID22760
Feature activation-9.59e-03

Pos logit contributions
ang+0.059
iker+0.055
 begg+0.053
 sur+0.053
 traveller+0.053
Neg logit contributions
gh-0.037
 noses-0.037
 color-0.037
 sounds-0.035
 faces-0.035
Token,
Token ID11
Feature activation+6.40e-03

Pos logit contributions
ang+0.015
 sur+0.014
iker+0.013
lves+0.013
 Rory+0.013
Neg logit contributions
gh-0.014
 color-0.013
 opinions-0.013
 noses-0.013
 cooperate-0.013
Token "
Token ID366
Feature activation-1.78e-02

Pos logit contributions
gh+0.009
 noses+0.008
 color+0.008
 quarrel+0.008
 sounds+0.008
Neg logit contributions
ang-0.011
 sur-0.010
iker-0.010
 begg-0.010
lves-0.010
Token ground
Token ID2323
Feature activation+1.37e-02

Pos logit contributions
ang+0.032
 sur+0.030
iker+0.029
lves+0.029
 begg+0.029
Neg logit contributions
gh-0.026
 opinions-0.023
 color-0.023
 noses-0.023
 cooperate-0.022
Token and
Token ID290
Feature activation-2.97e-02

Pos logit contributions
gh+0.017
 color+0.016
 noses+0.015
 opinions+0.015
 sounds+0.014
Neg logit contributions
ang-0.025
 sur-0.024
iker-0.023
lves-0.023
 begg-0.023
Token Lily
Token ID20037
Feature activation+3.21e-03

Pos logit contributions
ang+0.068
 sur+0.062
iker+0.061
unn+0.057
 Kang+0.056
Neg logit contributions
 color-0.030
 draw-0.030
 shape-0.029
 sounds-0.029
gh-0.029
Token couldn
Token ID3521
Feature activation-8.32e-02

Pos logit contributions
gh+0.004
 color+0.004
 noses+0.004
 draw+0.004
 sounds+0.004
Neg logit contributions
ang-0.006
 sur-0.005
iker-0.005
lves-0.005
 Rory-0.005
Token't
Token ID470
Feature activation+2.75e-02

Pos logit contributions
ang+0.191
 sur+0.183
iker+0.179
lves+0.175
 begg+0.172
Neg logit contributions
gh-0.078
 color-0.055
Their-0.054
 draw-0.053
 quarrel-0.053
Token get
Token ID651
Feature activation-1.07e-01

Pos logit contributions
gh+0.041
 opinions+0.038
 noses+0.038
thia+0.035
 color+0.035
Neg logit contributions
ang-0.044
 sur-0.042
iker-0.040
lves-0.039
aroo-0.039
Token it
Token ID340
Feature activation+1.57e-02

Pos logit contributions
ang+0.204
 sur+0.188
iker+0.183
 begg+0.172
 Kang+0.170
Neg logit contributions
gh-0.135
 color-0.124
 noses-0.122
 draw-0.121
 respect-0.120
Token out
Token ID503
Feature activation-1.15e-02

Pos logit contributions
gh+0.020
 color+0.017
 noses+0.017
 opinions+0.017
 respect+0.016
Neg logit contributions
ang-0.030
 sur-0.028
iker-0.026
lves-0.026
 begg-0.026
Token.
Token ID13
Feature activation-8.27e-03

Pos logit contributions
ang+0.025
lves+0.025
 sur+0.024
iker+0.024
 Kang+0.024
Neg logit contributions
gh-0.010
 color-0.009
 respect-0.009
 draw-0.008
 shape-0.008
Token
Token ID220
Feature activation+3.09e-02

Pos logit contributions
ang+0.015
 sur+0.014
iker+0.014
 begg+0.013
aroo+0.013
Neg logit contributions
gh-0.011
 color-0.010
 opinions-0.010
 noses-0.010
 draw-0.009
Token
Token ID198
Feature activation+8.18e-04

Pos logit contributions
gh+0.039
Their+0.033
 opinions+0.033
 color+0.033
 cooperate+0.032
Neg logit contributions
ang-0.057
 sur-0.057
 begg-0.054
iker-0.054
 traveller-0.053
Token to
Token ID284
Feature activation-7.95e-03

Pos logit contributions
gh+0.008
 color+0.007
 opinions+0.007
 noses+0.007
 respect+0.007
Neg logit contributions
ang-0.013
 sur-0.012
lves-0.011
iker-0.011
 begg-0.011
Token each
Token ID1123
Feature activation+4.13e-02

Pos logit contributions
ang+0.014
lves+0.013
 sur+0.013
 Nem+0.012
iker+0.012
Neg logit contributions
gh-0.011
 noses-0.009
thia-0.009
etrical-0.009
 opinions-0.009
Token other
Token ID584
Feature activation+2.80e-02

Pos logit contributions
gh+0.056
 noses+0.049
 color+0.049
 opinions+0.049
 cooperate+0.047
Neg logit contributions
ang-0.074
 sur-0.069
lves-0.065
iker-0.065
 begg-0.065
Token.
Token ID13
Feature activation+1.31e-02

Pos logit contributions
gh+0.038
 noses+0.033
 color+0.033
 opinions+0.033
 cooperate+0.031
Neg logit contributions
ang-0.049
 sur-0.046
iker-0.044
lves-0.044
 begg-0.043
Token<|endoftext|>
Token ID50256
Feature activation+5.53e-03

Pos logit contributions
gh+0.019
 color+0.018
 opinions+0.017
 noses+0.017
 respect+0.017
Neg logit contributions
ang-0.021
 sur-0.020
iker-0.020
 begg-0.019
lves-0.019
TokenOnce
Token ID7454
Feature activation-2.90e-02

Pos logit contributions
gh+0.008
 color+0.008
 opinions+0.007
 noses+0.007
 cooperate+0.007
Neg logit contributions
ang-0.009
 sur-0.008
iker-0.008
lves-0.008
 begg-0.008
Token upon
Token ID2402
Feature activation-2.04e-02

Pos logit contributions
ang+0.043
 sur+0.042
iker+0.041
 begg+0.040
lves+0.040
Neg logit contributions
gh-0.045
 color-0.044
Their-0.041
 draw-0.040
 noses-0.040
Token a
Token ID257
Feature activation+1.66e-02

Pos logit contributions
ang+0.026
 he+0.024
 sur+0.024
 begg+0.024
 energ+0.023
Neg logit contributions
gh-0.034
 noses-0.032
 opinions-0.031
 argue-0.030
 cooperate-0.030
Token time
Token ID640
Feature activation-1.32e-02

Pos logit contributions
 opinions+0.024
 noses+0.024
 mascara+0.023
 erase+0.023
 communicate+0.023
Neg logit contributions
ang-0.025
 sur-0.021
 energ-0.020
lves-0.020
 Nem-0.019
Token,
Token ID11
Feature activation+3.78e-02

Pos logit contributions
ang+0.021
 sur+0.019
iker+0.018
lves+0.018
 begg+0.018
Neg logit contributions
gh-0.019
 opinions-0.018
 noses-0.018
 color-0.018
 cooperate-0.017
Token there
Token ID612
Feature activation-8.54e-03

Pos logit contributions
gh+0.080
 color+0.075
 noses+0.074
 opinions+0.073
 cooperate+0.071
Neg logit contributions
ang-0.038
 sur-0.034
iker-0.031
lves-0.031
 begg-0.030
Token She
Token ID1375
Feature activation-3.73e-02

Pos logit contributions
ang+0.028
 sur+0.026
iker+0.025
aro+0.025
 begg+0.024
Neg logit contributions
 opinions-0.014
 color-0.014
gh-0.013
 draw-0.013
 respect-0.013
Token had
Token ID550
Feature activation+3.05e-02

Pos logit contributions
ang+0.073
iker+0.068
lves+0.065
 sur+0.065
aro+0.063
Neg logit contributions
gh-0.042
 color-0.041
 sounds-0.041
 noses-0.040
 draw-0.040
Token been
Token ID587
Feature activation-3.75e-03

Pos logit contributions
gh+0.039
 color+0.035
 noses+0.035
 opinions+0.035
 cooperate+0.033
Neg logit contributions
ang-0.056
 sur-0.052
iker-0.050
lves-0.050
 begg-0.049
Token very
Token ID845
Feature activation-1.53e-02

Pos logit contributions
ang+0.006
 sur+0.005
iker+0.005
lves+0.005
 Rory+0.005
Neg logit contributions
gh-0.006
 color-0.005
 opinions-0.005
 noses-0.005
 respect-0.005
Token brave
Token ID14802
Feature activation-3.77e-02

Pos logit contributions
ang+0.029
iker+0.027
 sur+0.027
lves+0.027
aro+0.026
Neg logit contributions
gh-0.017
 respect-0.017
 color-0.016
 opinions-0.016
 cooperate-0.016
Token and
Token ID290
Feature activation-5.67e-02

Pos logit contributions
ang+0.082
aro+0.078
lves+0.076
aroo+0.074
 sur+0.074
Neg logit contributions
gh-0.035
 color-0.035
 respect-0.034
 opinions-0.033
 feelings-0.033
Token powerful
Token ID3665
Feature activation-8.27e-03

Pos logit contributions
ang+0.120
iker+0.106
lves+0.105
 sur+0.102
 Rory+0.100
Neg logit contributions
 color-0.063
 respect-0.062
gh-0.062
 draw-0.059
 sounds-0.059
Token.
Token ID13
Feature activation-2.71e-03

Pos logit contributions
ang+0.017
lves+0.017
 sur+0.016
 begg+0.015
aro+0.015
Neg logit contributions
gh-0.008
 color-0.008
 opinions-0.008
 respect-0.008
 sounds-0.008
Token<|endoftext|>
Token ID50256
Feature activation-3.91e-03

Pos logit contributions
ang+0.005
 sur+0.005
iker+0.005
 begg+0.005
lves+0.004
Neg logit contributions
gh-0.003
 opinions-0.003
 color-0.003
 draw-0.003
 cooperate-0.003
TokenOne
Token ID3198
Feature activation+5.50e-02

Pos logit contributions
ang+0.007
iker+0.007
lves+0.007
 sur+0.007
 begg+0.006
Neg logit contributions
gh-0.005
 color-0.005
 sounds-0.005
 draw-0.004
 names-0.004
Token day
Token ID1110
Feature activation-4.06e-02

Pos logit contributions
gh+0.075
 color+0.075
 opinions+0.068
 rhythm+0.067
 draw+0.066
Neg logit contributions
ang-0.096
lves-0.092
 sur-0.091
iker-0.090
 begg-0.085
Token!
Token ID0
Feature activation+9.47e-03

Pos logit contributions
gh+0.097
 color+0.087
 respect+0.085
 opinions+0.085
 noses+0.083
Neg logit contributions
ang-0.181
lves-0.168
 energ-0.162
 sur-0.161
aro-0.160
Token You
Token ID921
Feature activation-1.76e-02

Pos logit contributions
gh+0.012
 respect+0.011
 opinions+0.011
 color+0.011
 noses+0.010
Neg logit contributions
ang-0.017
 energ-0.017
 sur-0.016
lves-0.016
 traveller-0.015
Token are
Token ID389
Feature activation+6.54e-02

Pos logit contributions
ang+0.041
 sur+0.038
iker+0.037
 Rory+0.036
 Ned+0.035
Neg logit contributions
gh-0.015
 draw-0.014
 color-0.014
 respect-0.014
 quarrel-0.014
Token not
Token ID407
Feature activation+9.26e-03

Pos logit contributions
gh+0.075
 color+0.068
 noses+0.065
 opinions+0.065
 respect+0.064
Neg logit contributions
ang-0.132
 sur-0.122
iker-0.118
lves-0.113
 begg-0.112
Token my
Token ID616
Feature activation-1.51e-02

Pos logit contributions
gh+0.010
 color+0.010
 respect+0.010
 feelings+0.010
 opinions+0.009
Neg logit contributions
ang-0.019
 sur-0.017
iker-0.017
aro-0.016
aroo-0.016
Token friend
Token ID1545
Feature activation-5.72e-02

Pos logit contributions
ang+0.027
 energ+0.024
 Wil+0.023
lves+0.023
aro+0.023
Neg logit contributions
 color-0.021
gh-0.020
 feelings-0.019
 opinions-0.019
 respect-0.019
Token!"
Token ID2474
Feature activation+2.91e-03

Pos logit contributions
ang+0.114
 sur+0.109
lves+0.106
 energ+0.104
 Ned+0.104
Neg logit contributions
gh-0.068
 color-0.060
 respect-0.057
 quarrel-0.056
 sounds-0.056
Token
Token ID198
Feature activation+4.36e-03

Pos logit contributions
Their+0.004
Her+0.004
gh+0.004
 respect+0.004
 sounds+0.004
Neg logit contributions
ang-0.005
lves-0.005
iker-0.005
 sur-0.005
 traveller-0.005
Token
Token ID198
Feature activation+1.02e-02

Pos logit contributions
gh+0.005
Their+0.005
 respect+0.005
 cooperate+0.005
 color+0.005
Neg logit contributions
ang-0.008
 sur-0.008
iker-0.008
lves-0.007
 traveller-0.007
TokenTom
Token ID13787
Feature activation+3.37e-02

Pos logit contributions
gh+0.010
 color+0.009
Their+0.008
 respect+0.008
 cooperate+0.008
Neg logit contributions
ang-0.022
 sur-0.021
iker-0.020
lves-0.020
 begg-0.020
Token felt
Token ID2936
Feature activation-3.02e-02

Pos logit contributions
gh+0.058
 purpose+0.053
 opinions+0.052
 color+0.052
 cooperate+0.051
Neg logit contributions
ang-0.045
 sur-0.043
 begg-0.040
 Rory-0.039
lves-0.038
Token and
Token ID290
Feature activation-3.05e-02

Pos logit contributions
ang+0.017
lves+0.017
 sur+0.016
 begg+0.016
iker+0.016
Neg logit contributions
gh-0.009
 color-0.009
 opinions-0.008
 noses-0.008
 respect-0.008
Token practice
Token ID3357
Feature activation+8.43e-03

Pos logit contributions
ang+0.062
lves+0.056
 sur+0.056
iker+0.054
 begg+0.053
Neg logit contributions
gh-0.036
 cooperate-0.034
 respect-0.034
 color-0.034
 opinions-0.034
Token,
Token ID11
Feature activation+3.09e-02

Pos logit contributions
gh+0.011
 color+0.010
 opinions+0.010
 noses+0.010
 cooperate+0.009
Neg logit contributions
ang-0.016
 sur-0.015
lves-0.014
 begg-0.014
iker-0.014
Token the
Token ID262
Feature activation-5.23e-02

Pos logit contributions
 color+0.042
 noses+0.041
 draw+0.041
gh+0.041
 opinions+0.041
Neg logit contributions
ang-0.056
iker-0.050
 sur-0.050
lves-0.049
 begg-0.048
Token 3
Token ID513
Feature activation-6.26e-02

Pos logit contributions
ang+0.096
lves+0.086
 sur+0.084
iker+0.083
 Wil+0.082
Neg logit contributions
gh-0.068
 color-0.068
 rhythm-0.064
 noses-0.064
 sounds-0.063
Token year
Token ID614
Feature activation-1.42e-01

Pos logit contributions
lves+0.102
ang+0.102
iker+0.097
 sur+0.095
 Wil+0.095
Neg logit contributions
gh-0.097
 color-0.087
 sounds-0.086
 opinions-0.085
 noses-0.082
Token old
Token ID1468
Feature activation-5.30e-02

Pos logit contributions
ang+0.253
lves+0.241
 sur+0.228
iker+0.228
 Rory+0.226
Neg logit contributions
gh-0.183
 opinions-0.181
 sounds-0.178
 color-0.174
 cooperate-0.169
Token started
Token ID2067
Feature activation-1.54e-02

Pos logit contributions
ang+0.086
lves+0.082
iker+0.073
 energ+0.073
 Wil+0.072
Neg logit contributions
 color-0.081
gh-0.079
 sounds-0.078
 opinions-0.077
 draw-0.076
Token getting
Token ID1972
Feature activation-8.10e-02

Pos logit contributions
ang+0.029
 sur+0.027
lves+0.027
iker+0.026
 begg+0.025
Neg logit contributions
gh-0.020
 color-0.018
 noses-0.017
 opinions-0.017
 draw-0.016
Token better
Token ID1365
Feature activation-8.91e-03

Pos logit contributions
ang+0.161
lves+0.143
iker+0.143
aroo+0.142
 sur+0.141
Neg logit contributions
gh-0.097
 color-0.090
 opinions-0.086
 noses-0.084
 sounds-0.082
Token and
Token ID290
Feature activation-6.34e-02

Pos logit contributions
ang+0.018
lves+0.017
aroo+0.017
aro+0.017
 sur+0.016
Neg logit contributions
gh-0.010
 color-0.009
 noses-0.009
 opinions-0.009
 sounds-0.009
Token,
Token ID11
Feature activation+2.70e-02

Pos logit contributions
lves+0.232
ang+0.230
 sur+0.216
 begg+0.210
 elf+0.209
Neg logit contributions
gh-0.083
 color-0.072
 draw-0.069
 cooperate-0.066
Their-0.064
Token he
Token ID339
Feature activation-4.94e-02

Pos logit contributions
 color+0.044
 noses+0.040
 quarrel+0.039
 talk+0.039
 shape+0.038
Neg logit contributions
ang-0.046
unn-0.041
ester-0.038
iker-0.038
 sur-0.037
Token had
Token ID550
Feature activation+1.99e-02

Pos logit contributions
ang+0.106
iker+0.094
aro+0.093
lves+0.091
aroo+0.090
Neg logit contributions
gh-0.056
 draw-0.049
 quarrel-0.049
 color-0.049
 drew-0.048
Token finally
Token ID3443
Feature activation-7.03e-02

Pos logit contributions
gh+0.022
 color+0.020
 opinions+0.019
 respect+0.019
 noses+0.018
Neg logit contributions
ang-0.046
iker-0.039
 sur-0.039
aroo-0.038
lves-0.038
Token saved
Token ID7448
Feature activation-7.79e-02

Pos logit contributions
ang+0.141
 Kang+0.120
lves+0.119
 sur+0.118
aro+0.116
Neg logit contributions
gh-0.099
 draw-0.085
 color-0.085
 respect-0.085
 cooperate-0.081
Token enough
Token ID1576
Feature activation-1.54e-01

Pos logit contributions
ang+0.145
 sur+0.131
iker+0.124
aroo+0.123
lves+0.120
Neg logit contributions
gh-0.111
 color-0.104
 purpose-0.101
 noses-0.100
 opinions-0.100
Token!
Token ID0
Feature activation-1.05e-02

Pos logit contributions
ang+0.289
lves+0.253
 sur+0.252
aroo+0.241
aro+0.232
Neg logit contributions
 color-0.215
 respect-0.206
gh-0.201
 noses-0.201
 opinions-0.200
Token He
Token ID679
Feature activation-1.93e-02

Pos logit contributions
ang+0.021
 sur+0.019
lves+0.019
aro+0.018
ela+0.018
Neg logit contributions
 color-0.012
tr-0.011
thia-0.011
 opinions-0.011
 draw-0.011
Token was
Token ID373
Feature activation-2.71e-03

Pos logit contributions
ang+0.044
iker+0.040
 sur+0.038
aro+0.038
lves+0.038
Neg logit contributions
gh-0.018
 color-0.016
 sounds-0.016
 quarrel-0.016
 draw-0.015
Token so
Token ID523
Feature activation-8.35e-02

Pos logit contributions
ang+0.006
iker+0.005
unn+0.005
 sur+0.005
lves+0.005
Neg logit contributions
gh-0.003
 color-0.003
 quarrel-0.003
 talk-0.003
 shape-0.002
Token happy
Token ID3772
Feature activation+2.17e-02

Pos logit contributions
ang+0.163
lves+0.160
iker+0.156
aro+0.151
 Rory+0.148
Neg logit contributions
gh-0.093
 color-0.088
Their-0.083
 cooperate-0.082
 talk-0.081
Token Lily
Token ID20037
Feature activation+1.17e-02

Pos logit contributions
 draw+0.001
 color+0.001
Their+0.001
gh+0.001
 faces+0.001
Neg logit contributions
ang-0.002
 sur-0.001
aro-0.001
iker-0.001
aroo-0.001
Token watched
Token ID7342
Feature activation-2.77e-03

Pos logit contributions
 color+0.011
gh+0.011
 sounds+0.011
 draw+0.010
 noses+0.010
Neg logit contributions
ang-0.024
iker-0.024
lves-0.023
 traveller-0.022
aro-0.022
Token out
Token ID503
Feature activation+3.24e-02

Pos logit contributions
ang+0.005
lves+0.004
iker+0.004
 sur+0.004
 begg+0.004
Neg logit contributions
gh-0.004
 color-0.004
 noses-0.004
 draw-0.004
 faces-0.004
Token the
Token ID262
Feature activation-3.71e-02

Pos logit contributions
 color+0.042
 sounds+0.040
 rhythm+0.040
 draw+0.040
gh+0.040
Neg logit contributions
ang-0.059
lves-0.057
iker-0.055
 sur-0.055
aroo-0.052
Token window
Token ID4324
Feature activation+2.18e-02

Pos logit contributions
ang+0.062
 sur+0.056
lves+0.056
 Rory+0.056
iker+0.054
Neg logit contributions
gh-0.053
 color-0.051
 noses-0.050
 sounds-0.049
 opinions-0.049
Token as
Token ID355
Feature activation-1.30e-01

Pos logit contributions
gh+0.031
 opinions+0.027
 color+0.027
 noses+0.027
 cooperate+0.026
Neg logit contributions
ang-0.037
 sur-0.034
iker-0.032
 begg-0.032
lves-0.032
Token they
Token ID484
Feature activation-1.61e-02

Pos logit contributions
ang+0.231
lves+0.216
iker+0.205
unn+0.193
 begg+0.189
Neg logit contributions
 color-0.197
gh-0.182
 draw-0.173
 shapes-0.170
 sounds-0.168
Token drove
Token ID10357
Feature activation-1.40e-01

Pos logit contributions
ang+0.021
 Rory+0.020
iker+0.019
 sur+0.019
lves+0.019
Neg logit contributions
 color-0.027
 noses-0.026
 draw-0.026
gh-0.025
 quarrel-0.025
Token.
Token ID13
Feature activation+8.94e-03

Pos logit contributions
ang+0.259
lves+0.236
 sur+0.235
 begg+0.230
iker+0.229
Neg logit contributions
gh-0.181
 color-0.172
 noses-0.164
 sounds-0.162
 respect-0.162
Token She
Token ID1375
Feature activation-2.45e-02

Pos logit contributions
gh+0.012
 color+0.012
 noses+0.011
 opinions+0.011
 draw+0.011
Neg logit contributions
ang-0.015
 sur-0.014
iker-0.014
lves-0.013
 begg-0.013
Token saw
Token ID2497
Feature activation+8.22e-03

Pos logit contributions
ang+0.047
iker+0.045
lves+0.043
 traveller+0.043
 Rory+0.043
Neg logit contributions
 color-0.028
gh-0.027
 sounds-0.026
 draw-0.026
 noses-0.025
Token's
Token ID338
Feature activation+1.45e-01

Pos logit contributions
gh+0.001
sing+0.001
etrical+0.001
 noses+0.001
 opinions+0.001
Neg logit contributions
ang-0.002
 sur-0.002
lves-0.002
 begg-0.002
 traveller-0.002
Token go
Token ID467
Feature activation-3.90e-02

Pos logit contributions
gh+0.149
 color+0.146
 draw+0.135
 opinions+0.131
 respect+0.131
Neg logit contributions
ang-0.311
 sur-0.286
iker-0.282
lves-0.279
 begg-0.271
Token inside
Token ID2641
Feature activation+2.77e-02

Pos logit contributions
ang+0.078
lves+0.076
 sur+0.075
iker+0.073
 energ+0.073
Neg logit contributions
 draw-0.038
 color-0.038
gh-0.038
 talk-0.036
 respect-0.035
Token and
Token ID290
Feature activation-7.21e-03

Pos logit contributions
 color+0.028
gh+0.024
 draw+0.024
 respect+0.023
 cooperate+0.023
Neg logit contributions
ang-0.059
iker-0.058
lves-0.058
 sur-0.056
 begg-0.055
Token see
Token ID766
Feature activation+4.77e-03

Pos logit contributions
ang+0.016
iker+0.014
 sur+0.014
lves+0.014
 begg+0.013
Neg logit contributions
 draw-0.007
 color-0.007
 discuss-0.007
 talk-0.007
 respect-0.006
Token how
Token ID703
Feature activation-1.35e-01

Pos logit contributions
 color+0.006
gh+0.006
 respect+0.006
 sounds+0.006
 opinions+0.006
Neg logit contributions
ang-0.008
 sur-0.008
iker-0.008
 begg-0.008
 traveller-0.007
Token much
Token ID881
Feature activation-1.81e-02

Pos logit contributions
ang+0.271
iker+0.264
 sur+0.255
 Rory+0.245
lves+0.243
Neg logit contributions
 color-0.159
 sounds-0.149
gh-0.146
 respect-0.145
 faces-0.141
Token it
Token ID340
Feature activation+3.17e-02

Pos logit contributions
ang+0.034
 sur+0.030
iker+0.030
lves+0.030
 begg+0.029
Neg logit contributions
 color-0.024
gh-0.023
 opinions-0.022
 sounds-0.022
 noses-0.022
Token costs
Token ID3484
Feature activation-4.84e-02

Pos logit contributions
gh+0.029
 sounds+0.028
 color+0.028
 opinions+0.027
 noses+0.027
Neg logit contributions
ang-0.069
 sur-0.065
iker-0.063
aroo-0.061
 Rory-0.061
Token."
Token ID526
Feature activation-1.29e-02

Pos logit contributions
ang+0.088
 sur+0.083
lves+0.081
iker+0.077
aroo+0.077
Neg logit contributions
gh-0.066
 color-0.061
 opinions-0.058
 noses-0.056
 draw-0.056
Token
Token ID198
Feature activation-4.46e-03

Pos logit contributions
ang+0.025
 sur+0.024
iker+0.023
aroo+0.022
 begg+0.022
Neg logit contributions
 color-0.015
gh-0.015
Their-0.014
 noses-0.014
 faces-0.014
Token and
Token ID290
Feature activation-8.14e-03

Pos logit contributions
ang+0.105
 sur+0.101
 begg+0.098
iker+0.096
lves+0.095
Neg logit contributions
gh-0.049
 sounds-0.049
 color-0.046
 opinions-0.046
 respect-0.044
Token to
Token ID284
Feature activation+3.11e-02

Pos logit contributions
ang+0.016
 sur+0.015
iker+0.014
lves+0.014
 begg+0.014
Neg logit contributions
gh-0.010
 color-0.009
 opinions-0.009
 noses-0.008
 respect-0.008
Token follow
Token ID1061
Feature activation-3.98e-02

Pos logit contributions
gh+0.037
 noses+0.032
 opinions+0.031
 color+0.031
Their+0.029
Neg logit contributions
ang-0.059
 sur-0.056
lves-0.053
iker-0.053
 begg-0.053
Token good
Token ID922
Feature activation+3.20e-02

Pos logit contributions
ang+0.069
 sur+0.065
iker+0.061
 begg+0.061
lves+0.060
Neg logit contributions
gh-0.056
 color-0.052
 noses-0.051
 opinions-0.051
 respect-0.051
Token advice
Token ID5608
Feature activation-4.91e-02

Pos logit contributions
gh+0.041
 opinions+0.037
 noses+0.037
 color+0.037
 respect+0.036
Neg logit contributions
ang-0.060
 sur-0.055
lves-0.053
iker-0.052
 begg-0.052
Token from
Token ID422
Feature activation-1.22e-01

Pos logit contributions
ang+0.093
 sur+0.088
 begg+0.084
lves+0.084
iker+0.084
Neg logit contributions
gh-0.060
 color-0.054
 opinions-0.053
 respect-0.053
 noses-0.053
Token those
Token ID883
Feature activation+1.65e-02

Pos logit contributions
ang+0.234
 sur+0.224
 begg+0.215
iker+0.215
lves+0.201
Neg logit contributions
 respect-0.148
 color-0.144
gh-0.141
 opinions-0.139
 rhythm-0.135
Token who
Token ID508
Feature activation-9.53e-02

Pos logit contributions
 sounds+0.020
 opinions+0.020
 respect+0.020
 color+0.019
 noses+0.019
Neg logit contributions
ang-0.031
 sur-0.030
 begg-0.029
 Rory-0.029
iker-0.028
Token care
Token ID1337
Feature activation+4.88e-02

Pos logit contributions
ang+0.188
 sur+0.175
iker+0.170
 begg+0.168
 Rory+0.165
Neg logit contributions
 respect-0.109
 opinions-0.107
 sounds-0.104
 color-0.102
gh-0.101
Token about
Token ID546
Feature activation+1.97e-02

Pos logit contributions
gh+0.059
 color+0.058
 respect+0.056
 opinions+0.056
 sounds+0.055
Neg logit contributions
ang-0.092
 sur-0.087
 begg-0.084
iker-0.083
lves-0.081
Token you
Token ID345
Feature activation-8.74e-03

Pos logit contributions
gh+0.027
 color+0.024
 opinions+0.024
 noses+0.024
 respect+0.023
Neg logit contributions
ang-0.035
 sur-0.033
iker-0.031
lves-0.031
 begg-0.031