TOP ACTIVATIONS
MAX = 0.198

saw a whole family of gor illas , from the mom
. The gor illas were having a picnic
big cage with lots of gor illas inside !"
some bananas and feed the gor illas ?" Tim my nodded
anas available for feeding the gor illas ." She said to
there was a family of gor illas . There was a
Jane waved goodbye to the gor illas as her family left
and that there were no gor illas in their house .
Lily and Ben liked the gor illas best . They were
time playing with her or nam ents and making her special
any bad dreams or scary gor illas . <|endoftext|> Once upon
borrowed her mom 's magn ifying glass to take a closer
the lock and the or nam ents . "
Suddenly , one of the gor illas reached out his furry
. Lily had a magn ifying glass that she found in
see the big , hairy gor illas . As
o was not like other gor illas . He was bald
cried , dropping the magn ifying glass . "
She learned that the or nam ents were not for playing
They reached for the or nam ents and touched them softly

MIN ACTIVATIONS
MIN = -0.188

they began to carefully ch isel the rock . They kept
a hammer and a ch isel and they began to carefully
, circles , and o vals . She matched up triangles
be ak , two o vals for the eyes , a
they rode the roller co aster together . It was scary
popcorn was sticky and gre asy . They used a towel
put it in the to aster . Then , she went
, thank you . Cig arettes are stupid . They hurt
They watch as she gl ues and tapes the horn back
the bread in the to aster and waited for it to
just stuck with the to aster in their modern home .
put it in the to aster . One day
and said , " F aster , mom my ! Faster
He said , " F aster , faster !" Jane said
toast out of the to aster and put it on a
there were lots of gl ues to choose from , nearly
m op and a dust pan and cleaned up the dough
or it will look gre asy ." Tim
her little broom and dust pan , trying to get all
white bread into the to aster . She saw the glowing

INTERVAL 0.102 - 0.198
CONTAINS 1.910%

loved her . <|endoftext|> Once upon a time there was a
minutes , the daughter calmed down and the mom my asked
the sky . <|endoftext|> Once upon a time , a little
way back feeling happy and content . <|endoftext|> Once upon a
. He wants his balls back . " Hey

INTERVAL 0.005 - 0.102
CONTAINS 44.296%

snow ing ! She put on her coat , hat ,
you ?" Lisa asks . S ara and Tom
hopped and skipped his way across the finish line .
and said , “ It is lovely
with joy . From then on , he always remembered the

INTERVAL -0.092 - 0.005
CONTAINS 49.538%

what " famous " meant , but they liked the sound
happy family . <|endoftext|> Tim and Sue were brother and sister
be gentle and respectful next time ." The lady
his garage . He likes to fix things . Lily and
. When Boo left , the town became sad and quiet

INTERVAL -0.188 - -0.092
CONTAINS 4.257%

boy . He had no one to play with and was
" Anna , look , a crocod ile !"
skills . She wanted to see if she could be successful
one leg for a few seconds , then he fell down
Bob said , " I made this elevator . It goes
Token saw
Token ID2497
Feature activation-2.11e-02

Pos logit contributions
 Australian+0.190
opl+0.186
 habitats+0.175
 Jungle+0.174
 former+0.172
Neg logit contributions
up-0.073
 they-0.061
ups-0.054
 it-0.053
l-0.052
Token a
Token ID257
Feature activation-3.44e-02

Pos logit contributions
up+0.033
 they+0.028
ade+0.028
ups+0.028
l+0.027
Neg logit contributions
 Australian-0.043
opl-0.040
 habitats-0.039
 Jungle-0.039
 mastering-0.038
Token whole
Token ID2187
Feature activation+9.89e-02

Pos logit contributions
up+0.037
 they+0.034
l+0.032
 it+0.032
head+0.029
Neg logit contributions
 Australian-0.085
opl-0.085
 habitats-0.082
charged-0.080
 mastering-0.080
Token family
Token ID1641
Feature activation+9.26e-02

Pos logit contributions
 Australian+0.190
opl+0.182
charged+0.177
 mastering+0.176
 habitats+0.175
Neg logit contributions
up-0.161
 they-0.151
 it-0.144
ade-0.139
 one-0.139
Token of
Token ID286
Feature activation+6.25e-02

Pos logit contributions
 Australian+0.247
opl+0.235
 habitats+0.229
 Jungle+0.227
 former+0.224
Neg logit contributions
up-0.091
 they-0.072
ups-0.066
ade-0.065
l-0.063
Token gor
Token ID30344
Feature activation+1.98e-01

Pos logit contributions
opl+0.143
 Australian+0.136
 former+0.131
charged+0.129
ses+0.126
Neg logit contributions
 it-0.082
 they-0.082
up-0.079
 one-0.079
 them-0.073
Tokenillas
Token ID25314
Feature activation+1.31e-02

Pos logit contributions
 Australian+0.303
 former+0.296
opl+0.278
 habitats+0.267
 hind+0.247
Neg logit contributions
up-0.391
 they-0.388
ups-0.380
ing-0.362
l-0.356
Token,
Token ID11
Feature activation-2.39e-02

Pos logit contributions
 Australian+0.036
opl+0.034
 habitats+0.034
 former+0.033
charged+0.033
Neg logit contributions
 they-0.011
up-0.011
 it-0.009
 she-0.009
 to-0.008
Token from
Token ID422
Feature activation+1.23e-01

Pos logit contributions
 they+0.026
up+0.024
 it+0.022
 she+0.021
 he+0.020
Neg logit contributions
 Australian-0.060
opl-0.059
charged-0.057
 former-0.055
 habitats-0.055
Token the
Token ID262
Feature activation-3.37e-02

Pos logit contributions
 Australian+0.366
opl+0.364
charged+0.337
 mastering+0.333
 habitats+0.331
Neg logit contributions
 they-0.071
 it-0.069
 one-0.066
up-0.062
 to-0.051
Token mom
Token ID1995
Feature activation-5.60e-02

Pos logit contributions
up+0.064
 they+0.061
 one+0.061
 next+0.060
l+0.059
Neg logit contributions
opl-0.054
 Australian-0.054
 mastering-0.051
und-0.049
charged-0.047
Token.
Token ID13
Feature activation+1.23e-03

Pos logit contributions
 Australian+0.146
opl+0.139
 habitats+0.135
 Jungle+0.135
 former+0.133
Neg logit contributions
up-0.041
 they-0.031
ups-0.027
l-0.026
ade-0.026
Token " "
Token ID220
Feature activation-9.23e-03

Pos logit contributions
 Australian+0.003
opl+0.002
 mastering+0.002
 habitats+0.002
charged+0.002
Neg logit contributions
up-0.002
 they-0.002
cakes-0.002
ups-0.002
 it-0.002
Token "\n"
Token ID198
Feature activation+1.61e-02

Pos logit contributions
up+0.010
 they+0.008
ups+0.008
ade+0.008
l+0.008
Neg logit contributions
 Australian-0.023
opl-0.022
 habitats-0.022
 Jungle-0.021
 former-0.021
Token "\n"
Token ID198
Feature activation+1.65e-02

Pos logit contributions
 Australian+0.039
opl+0.037
 habitats+0.036
 Jungle+0.036
 mastering+0.035
Neg logit contributions
up-0.020
 they-0.016
ups-0.015
ade-0.015
l-0.015
TokenThe
Token ID464
Feature activation-1.91e-02

Pos logit contributions
 Australian+0.039
opl+0.038
 habitats+0.037
 Jungle+0.036
 former+0.036
Neg logit contributions
up-0.021
 they-0.018
ups-0.017
ade-0.016
l-0.016
Token gor
Token ID30344
Feature activation+1.98e-01

Pos logit contributions
up+0.027
 they+0.025
l+0.024
 one+0.022
cakes+0.022
Neg logit contributions
 Australian-0.042
opl-0.041
 mastering-0.038
charged-0.038
 habitats-0.038
Tokenillas
Token ID25314
Feature activation+1.39e-02

Pos logit contributions
 Australian+0.316
 former+0.302
opl+0.284
 habitats+0.273
 hind+0.273
Neg logit contributions
up-0.390
 they-0.378
ups-0.363
ing-0.361
guard-0.355
Token were
Token ID547
Feature activation+9.73e-02

Pos logit contributions
 Australian+0.035
opl+0.035
 habitats+0.032
 former+0.032
ses+0.032
Neg logit contributions
 they-0.016
up-0.015
 it-0.013
 one-0.013
 again-0.013
Token having
Token ID1719
Feature activation+3.25e-02

Pos logit contributions
charged+0.228
opl+0.227
 former+0.225
 habitats+0.224
 Australian+0.223
Neg logit contributions
 they-0.118
 it-0.106
 to-0.103
 one-0.102
up-0.099
Token a
Token ID257
Feature activation-3.33e-02

Pos logit contributions
 Australian+0.079
opl+0.078
 mastering+0.073
charged+0.073
 former+0.073
Neg logit contributions
up-0.038
 they-0.035
 it-0.033
 to-0.030
 one-0.030
Token picnic
Token ID35715
Feature activation+3.63e-02

Pos logit contributions
up+0.050
 they+0.049
 one+0.046
ade+0.045
 to+0.044
Neg logit contributions
 Australian-0.067
opl-0.067
charged-0.065
 mastering-0.063
 habitats-0.063
Token big
Token ID1263
Feature activation+2.98e-02

Pos logit contributions
up+0.046
 they+0.041
 it+0.040
 to+0.039
 one+0.039
Neg logit contributions
 Australian-0.080
opl-0.079
 habitats-0.075
 mastering-0.075
charged-0.073
Token cage
Token ID17186
Feature activation+4.90e-02

Pos logit contributions
 Australian+0.065
opl+0.060
 habitats+0.058
 mastering+0.058
charged+0.056
Neg logit contributions
up-0.042
 they-0.039
 it-0.038
 one-0.038
head-0.036
Token with
Token ID351
Feature activation+1.50e-02

Pos logit contributions
 Australian+0.112
opl+0.106
 Jungle+0.104
 habitats+0.103
 former+0.101
Neg logit contributions
up-0.060
ade-0.047
l-0.047
 they-0.047
ups-0.046
Token lots
Token ID6041
Feature activation+1.47e-02

Pos logit contributions
 Australian+0.038
opl+0.036
 mastering+0.034
 habitats+0.034
 former+0.034
Neg logit contributions
up-0.017
 they-0.014
 it-0.014
ups-0.013
l-0.013
Token of
Token ID286
Feature activation+6.12e-02

Pos logit contributions
 Australian+0.040
opl+0.039
 habitats+0.037
 Jungle+0.037
 mastering+0.036
Neg logit contributions
up-0.012
 they-0.009
ups-0.008
ade-0.008
cakes-0.008
Token gor
Token ID30344
Feature activation+1.98e-01

Pos logit contributions
 Australian+0.142
opl+0.132
 mastering+0.130
charged+0.127
 former+0.126
Neg logit contributions
up-0.081
 it-0.072
 they-0.071
l-0.070
 one-0.065
Tokenillas
Token ID25314
Feature activation+1.22e-02

Pos logit contributions
 Australian+0.289
 former+0.273
opl+0.252
 habitats+0.247
athed+0.226
Neg logit contributions
up-0.411
ups-0.400
 they-0.394
l-0.383
 it-0.366
Token inside
Token ID2641
Feature activation-8.41e-03

Pos logit contributions
 Australian+0.033
opl+0.032
 habitats+0.031
 Jungle+0.031
 former+0.030
Neg logit contributions
up-0.011
 they-0.009
ups-0.008
ade-0.008
l-0.008
Token!"
Token ID2474
Feature activation-6.56e-02

Pos logit contributions
up+0.009
 they+0.007
ups+0.006
ade+0.006
l+0.006
Neg logit contributions
 Australian-0.022
opl-0.021
 habitats-0.020
 Jungle-0.020
 mastering-0.020
Token "\n"
Token ID198
Feature activation+1.32e-02

Pos logit contributions
up+0.099
 they+0.086
ups+0.081
ade+0.081
l+0.080
Neg logit contributions
 Australian-0.140
opl-0.131
 habitats-0.127
 Jungle-0.125
 former-0.124
Token "\n"
Token ID198
Feature activation+1.39e-02

Pos logit contributions
 Australian+0.031
opl+0.029
 Jungle+0.028
 habitats+0.028
 mastering+0.027
Neg logit contributions
up-0.017
 they-0.014
ups-0.014
ade-0.014
cakes-0.013
Token some
Token ID617
Feature activation-1.04e-01

Pos logit contributions
up+0.071
ade+0.060
ups+0.058
cakes+0.056
 they+0.056
Neg logit contributions
 Australian-0.145
 Jungle-0.138
opl-0.138
 jungle-0.129
 habitats-0.129
Token bananas
Token ID35484
Feature activation+2.10e-02

Pos logit contributions
up+0.144
 they+0.121
ups+0.117
ade+0.115
l+0.112
Neg logit contributions
 Australian-0.233
opl-0.219
 Jungle-0.212
 habitats-0.212
 mastering-0.205
Token and
Token ID290
Feature activation-2.12e-02

Pos logit contributions
 Australian+0.053
opl+0.051
 Jungle+0.049
 habitats+0.048
 mastering+0.047
Neg logit contributions
up-0.023
ups-0.017
 they-0.016
l-0.016
ade-0.016
Token feed
Token ID3745
Feature activation-5.98e-02

Pos logit contributions
up+0.030
 they+0.026
ups+0.024
ade+0.024
 it+0.023
Neg logit contributions
 Australian-0.047
opl-0.045
 habitats-0.043
 Jungle-0.042
 mastering-0.042
Token the
Token ID262
Feature activation-3.46e-02

Pos logit contributions
up+0.060
ade+0.043
ups+0.041
l+0.039
 they+0.039
Neg logit contributions
 Australian-0.155
opl-0.146
 Jungle-0.145
 habitats-0.141
 mastering-0.135
Token gor
Token ID30344
Feature activation+1.97e-01

Pos logit contributions
up+0.081
 they+0.076
l+0.075
 it+0.073
 one+0.072
Neg logit contributions
 Australian-0.043
opl-0.039
 former-0.038
 mastering-0.037
 habitats-0.037
Tokenillas
Token ID25314
Feature activation+1.34e-02

Pos logit contributions
 Australian+0.286
 former+0.266
opl+0.251
 habitats+0.238
 mastering+0.229
Neg logit contributions
up-0.416
 they-0.407
ups-0.397
l-0.388
guard-0.369
Token?"
Token ID1701
Feature activation-8.43e-02

Pos logit contributions
 Australian+0.035
opl+0.033
 Jungle+0.032
 habitats+0.032
 mastering+0.031
Neg logit contributions
up-0.014
 they-0.011
ups-0.011
ade-0.010
l-0.010
Token Tim
Token ID5045
Feature activation-3.90e-02

Pos logit contributions
up+0.136
ups+0.113
ade+0.111
 they+0.110
l+0.109
Neg logit contributions
 Australian-0.174
opl-0.163
 Jungle-0.156
 habitats-0.155
 mastering-0.153
Tokenmy
Token ID1820
Feature activation-3.33e-02

Pos logit contributions
up+0.035
 they+0.025
ups+0.024
ade+0.024
l+0.023
Neg logit contributions
 Australian-0.107
opl-0.102
 habitats-0.099
 Jungle-0.099
 former-0.097
Token nodded
Token ID14464
Feature activation+1.98e-02

Pos logit contributions
up+0.044
 they+0.036
ups+0.035
ade+0.035
l+0.034
Neg logit contributions
 Australian-0.077
opl-0.073
 habitats-0.070
 Jungle-0.070
 mastering-0.069
Tokenanas
Token ID15991
Feature activation-1.96e-02

Pos logit contributions
up+0.052
 they+0.046
ups+0.045
ade+0.045
l+0.045
Neg logit contributions
 Australian-0.037
opl-0.034
 habitats-0.032
 Jungle-0.032
 mastering-0.031
Token available
Token ID1695
Feature activation+1.93e-02

Pos logit contributions
up+0.026
 they+0.023
ups+0.021
ade+0.021
l+0.021
Neg logit contributions
 Australian-0.045
opl-0.043
 habitats-0.042
 former-0.041
 Jungle-0.041
Token for
Token ID329
Feature activation+2.51e-02

Pos logit contributions
 Australian+0.051
opl+0.049
 habitats+0.047
 Jungle+0.047
 former+0.047
Neg logit contributions
up-0.019
 they-0.015
ups-0.014
ade-0.014
 it-0.013
Token feeding
Token ID13017
Feature activation-4.07e-02

Pos logit contributions
 Australian+0.062
opl+0.060
 habitats+0.057
 former+0.057
ses+0.056
Neg logit contributions
 they-0.027
up-0.027
 it-0.026
 one-0.023
ups-0.022
Token the
Token ID262
Feature activation-3.45e-02

Pos logit contributions
up+0.044
 they+0.037
ups+0.034
 it+0.034
ade+0.033
Neg logit contributions
 Australian-0.103
opl-0.098
 habitats-0.095
 former-0.094
 mastering-0.094
Token gor
Token ID30344
Feature activation+1.96e-01

Pos logit contributions
up+0.073
 they+0.070
l+0.069
 one+0.067
 it+0.066
Neg logit contributions
 Australian-0.049
opl-0.046
 former-0.044
 mastering-0.044
charged-0.043
Tokenillas
Token ID25314
Feature activation+1.26e-02

Pos logit contributions
 Australian+0.285
 former+0.265
opl+0.254
 habitats+0.236
 mastering+0.227
Neg logit contributions
up-0.416
 they-0.405
ups-0.404
l-0.387
 it-0.365
Token."
Token ID526
Feature activation-4.97e-02

Pos logit contributions
 Australian+0.034
opl+0.032
 habitats+0.032
 Jungle+0.031
 mastering+0.031
Neg logit contributions
up-0.012
 they-0.010
ups-0.009
ade-0.008
l-0.008
Token She
Token ID1375
Feature activation+6.69e-02

Pos logit contributions
up+0.076
 they+0.063
ups+0.062
ade+0.062
l+0.060
Neg logit contributions
 Australian-0.106
opl-0.100
 habitats-0.096
 Jungle-0.096
 mastering-0.093
Token said
Token ID531
Feature activation-2.20e-02

Pos logit contributions
 Australian+0.159
opl+0.146
 Jungle+0.145
 habitats+0.145
 mastering+0.143
Neg logit contributions
up-0.082
ade-0.065
ups-0.065
 they-0.064
l-0.063
Token to
Token ID284
Feature activation+6.18e-03

Pos logit contributions
up+0.032
ups+0.027
ade+0.026
l+0.024
head+0.023
Neg logit contributions
 Australian-0.047
 mastering-0.043
 Jungle-0.043
opl-0.042
 habitats-0.041
Token there
Token ID612
Feature activation-7.45e-02

Pos logit contributions
up+0.015
ade+0.009
ups+0.009
l+0.008
cakes+0.008
Neg logit contributions
 Australian-0.062
opl-0.057
 habitats-0.057
 Jungle-0.057
 mastering-0.056
Token was
Token ID373
Feature activation+5.39e-02

Pos logit contributions
up+0.081
ups+0.062
 they+0.060
ade+0.060
l+0.058
Neg logit contributions
 Australian-0.188
opl-0.178
 habitats-0.173
 Jungle-0.172
 mastering-0.169
Token a
Token ID257
Feature activation-3.84e-02

Pos logit contributions
 Australian+0.142
opl+0.135
 habitats+0.130
 Jungle+0.130
 mastering+0.128
Neg logit contributions
up-0.055
 they-0.044
ups-0.041
ade-0.040
l-0.039
Token family
Token ID1641
Feature activation+8.97e-02

Pos logit contributions
 they+0.034
up+0.033
 it+0.032
l+0.030
 se+0.030
Neg logit contributions
opl-0.102
 Australian-0.100
 habitats-0.098
charged-0.096
 mastering-0.094
Token of
Token ID286
Feature activation+5.46e-02

Pos logit contributions
 Australian+0.248
opl+0.236
 Jungle+0.230
 habitats+0.228
 mastering+0.225
Neg logit contributions
up-0.077
 they-0.054
ups-0.054
ade-0.054
l-0.050
Token gor
Token ID30344
Feature activation+1.96e-01

Pos logit contributions
 Australian+0.127
opl+0.127
 mastering+0.114
 former+0.114
charged+0.114
Neg logit contributions
up-0.068
 they-0.065
 it-0.063
 one-0.060
 sugar-0.058
Tokenillas
Token ID25314
Feature activation+1.48e-03

Pos logit contributions
 Australian+0.290
 former+0.266
opl+0.263
 habitats+0.243
 hind+0.229
Neg logit contributions
up-0.413
 they-0.404
ups-0.395
l-0.377
 it-0.371
Token.
Token ID13
Feature activation+7.35e-03

Pos logit contributions
 Australian+0.005
opl+0.005
 former+0.004
 habitats+0.004
 Jungle+0.004
Neg logit contributions
up-0.001
 they-0.001
 it-0.000
ups-0.000
l-0.000
Token There
Token ID1318
Feature activation-6.12e-02

Pos logit contributions
 Australian+0.015
opl+0.015
 mastering+0.014
 habitats+0.014
charged+0.014
Neg logit contributions
up-0.011
 they-0.010
l-0.009
 it-0.009
cakes-0.009
Token was
Token ID373
Feature activation+5.42e-02

Pos logit contributions
up+0.089
 they+0.079
ade+0.073
ups+0.073
l+0.072
Neg logit contributions
 Australian-0.134
opl-0.126
 habitats-0.122
 Jungle-0.121
 former-0.120
Token a
Token ID257
Feature activation-3.84e-02

Pos logit contributions
 Australian+0.141
opl+0.137
charged+0.131
 mastering+0.129
 habitats+0.129
Neg logit contributions
up-0.053
 they-0.048
ups-0.041
l-0.041
 it-0.041
TokenJane
Token ID41083
Feature activation-4.00e-02

Pos logit contributions
 Australian+0.043
opl+0.041
 habitats+0.040
 Jungle+0.040
 former+0.039
Neg logit contributions
up-0.029
 they-0.024
ups-0.024
ade-0.023
l-0.022
Token waved
Token ID26834
Feature activation+1.38e-02

Pos logit contributions
up+0.041
 they+0.033
l+0.029
ups+0.029
 it+0.029
Neg logit contributions
 Australian-0.106
opl-0.104
 habitats-0.100
 former-0.099
 Jungle-0.099
Token goodbye
Token ID24829
Feature activation+6.80e-02

Pos logit contributions
 Australian+0.026
opl+0.024
 habitats+0.023
 Jungle+0.023
 mastering+0.023
Neg logit contributions
up-0.024
ade-0.021
ups-0.021
 they-0.020
cakes-0.020
Token to
Token ID284
Feature activation+3.80e-03

Pos logit contributions
 Australian+0.203
opl+0.193
 habitats+0.189
 Jungle+0.188
 former+0.185
Neg logit contributions
up-0.045
 they-0.028
ade-0.027
ups-0.027
l-0.024
Token the
Token ID262
Feature activation-3.19e-02

Pos logit contributions
 Australian+0.011
opl+0.011
 habitats+0.010
 former+0.010
charged+0.010
Neg logit contributions
 they-0.003
 it-0.002
up-0.002
 one-0.002
 she-0.002
Token gor
Token ID30344
Feature activation+1.95e-01

Pos logit contributions
up+0.062
 one+0.061
l+0.060
 next+0.059
 they+0.057
Neg logit contributions
 Australian-0.051
opl-0.046
und-0.045
 mastering-0.045
charged-0.045
Tokenillas
Token ID25314
Feature activation+6.88e-03

Pos logit contributions
 Australian+0.309
 former+0.285
opl+0.276
 habitats+0.261
ses+0.242
Neg logit contributions
 they-0.391
up-0.388
l-0.372
ups-0.372
 again-0.362
Token as
Token ID355
Feature activation+5.24e-04

Pos logit contributions
 Australian+0.021
opl+0.020
 habitats+0.020
 Jungle+0.019
 former+0.019
Neg logit contributions
 they-0.004
up-0.004
 again-0.003
 it-0.003
 to-0.002
Token her
Token ID607
Feature activation+6.97e-02

Pos logit contributions
 Australian+0.002
opl+0.002
 habitats+0.002
 Jungle+0.002
 former+0.002
Neg logit contributions
up-0.000
 they-0.000
ups-0.000
l-0.000
ade-0.000
Token family
Token ID1641
Feature activation+9.08e-02

Pos logit contributions
 Australian+0.166
opl+0.152
und+0.147
 mastering+0.147
athed+0.146
Neg logit contributions
up-0.089
 they-0.076
 to-0.075
l-0.075
 next-0.074
Token left
Token ID1364
Feature activation+2.35e-02

Pos logit contributions
 Australian+0.225
opl+0.215
 habitats+0.207
 former+0.206
 Jungle+0.203
Neg logit contributions
up-0.109
 they-0.098
 it-0.085
ups-0.083
l-0.082
Token and
Token ID290
Feature activation-2.23e-02

Pos logit contributions
up+0.030
ade+0.023
ups+0.023
l+0.022
 they+0.022
Neg logit contributions
 Australian-0.067
opl-0.064
 habitats-0.062
 Jungle-0.062
 former-0.061
Token that
Token ID326
Feature activation-9.13e-02

Pos logit contributions
up+0.019
ade+0.014
l+0.013
ups+0.013
cakes+0.013
Neg logit contributions
 Australian-0.060
opl-0.056
 former-0.056
 habitats-0.056
 mastering-0.056
Token there
Token ID612
Feature activation-7.07e-02

Pos logit contributions
up+0.076
ade+0.050
ups+0.050
 they+0.048
l+0.047
Neg logit contributions
 Australian-0.258
opl-0.246
 habitats-0.238
 Jungle-0.237
 mastering-0.236
Token were
Token ID547
Feature activation+9.71e-02

Pos logit contributions
up+0.076
 they+0.062
ups+0.057
ade+0.057
l+0.055
Neg logit contributions
 Australian-0.182
opl-0.172
 habitats-0.168
 Jungle-0.168
 former-0.164
Token no
Token ID645
Feature activation-8.30e-02

Pos logit contributions
 Australian+0.216
opl+0.204
 Jungle+0.196
 habitats+0.190
 mastering+0.190
Neg logit contributions
up-0.140
 they-0.125
ups-0.118
 it-0.114
ade-0.113
Token gor
Token ID30344
Feature activation+1.95e-01

Pos logit contributions
up+0.124
 they+0.105
ups+0.101
ade+0.101
l+0.099
Neg logit contributions
 Australian-0.182
opl-0.169
 Jungle-0.163
 habitats-0.162
 mastering-0.159
Tokenillas
Token ID25314
Feature activation+7.54e-03

Pos logit contributions
 Australian+0.278
opl+0.259
 former+0.244
 habitats+0.236
 Jungle+0.220
Neg logit contributions
up-0.427
 they-0.408
ups-0.399
 again-0.383
 it-0.379
Token in
Token ID287
Feature activation+8.48e-02

Pos logit contributions
 Australian+0.019
opl+0.018
 habitats+0.017
 Jungle+0.017
 former+0.017
Neg logit contributions
up-0.008
 they-0.007
ups-0.007
l-0.007
 it-0.006
Token their
Token ID511
Feature activation+6.39e-02

Pos logit contributions
 Australian+0.218
opl+0.203
 habitats+0.198
 mastering+0.198
 Jungle+0.196
Neg logit contributions
up-0.091
 they-0.080
 it-0.078
head-0.070
l-0.070
Token house
Token ID2156
Feature activation+4.22e-02

Pos logit contributions
 Australian+0.122
 mastering+0.118
opl+0.117
 Jungle+0.107
athed+0.107
Neg logit contributions
up-0.105
 they-0.095
 one-0.091
 to-0.091
 again-0.089
Token.
Token ID13
Feature activation-3.40e-03

Pos logit contributions
 Australian+0.130
opl+0.123
 Jungle+0.121
 habitats+0.120
 mastering+0.118
Neg logit contributions
up-0.024
 they-0.016
ups-0.014
ade-0.013
l-0.012
Token Lily
Token ID20037
Feature activation-5.35e-02

Pos logit contributions
up+0.062
ade+0.049
ups+0.048
cakes+0.046
l+0.044
Neg logit contributions
 Australian-0.132
 habitats-0.121
 Jungle-0.119
opl-0.119
 mastering-0.117
Token and
Token ID290
Feature activation-1.38e-02

Pos logit contributions
up+0.051
 they+0.042
ups+0.038
ade+0.036
 it+0.036
Neg logit contributions
 Australian-0.143
opl-0.138
 former-0.134
 Jungle-0.133
 habitats-0.133
Token Ben
Token ID3932
Feature activation-6.85e-02

Pos logit contributions
up+0.021
ups+0.018
l+0.017
ade+0.017
cakes+0.017
Neg logit contributions
 Australian-0.029
opl-0.027
 habitats-0.026
 Jungle-0.026
 mastering-0.025
Token liked
Token ID8288
Feature activation+8.18e-02

Pos logit contributions
up+0.080
 they+0.078
 it+0.066
l+0.065
 she+0.063
Neg logit contributions
opl-0.170
 Australian-0.167
 former-0.159
 Jungle-0.158
 habitats-0.155
Token the
Token ID262
Feature activation-3.23e-02

Pos logit contributions
 Australian+0.204
opl+0.191
 habitats+0.189
 Jungle+0.187
 mastering+0.182
Neg logit contributions
up-0.095
ade-0.072
 they-0.072
ups-0.072
l-0.069
Token gor
Token ID30344
Feature activation+1.95e-01

Pos logit contributions
up+0.060
 they+0.053
l+0.052
ade+0.052
ups+0.050
Neg logit contributions
 Australian-0.057
opl-0.053
 mastering-0.051
 habitats-0.050
charged-0.049
Tokenillas
Token ID25314
Feature activation+1.38e-02

Pos logit contributions
 Australian+0.302
 former+0.280
 habitats+0.261
opl+0.252
 hind+0.241
Neg logit contributions
up-0.405
 they-0.385
ups-0.379
l-0.372
ing-0.353
Token best
Token ID1266
Feature activation+6.79e-02

Pos logit contributions
 Australian+0.040
opl+0.038
 habitats+0.038
 Jungle+0.037
 former+0.037
Neg logit contributions
up-0.010
 they-0.008
ups-0.007
ade-0.006
l-0.006
Token.
Token ID13
Feature activation+6.42e-03

Pos logit contributions
 Australian+0.195
opl+0.189
 habitats+0.184
 Jungle+0.181
 mastering+0.180
Neg logit contributions
up-0.050
 they-0.030
ade-0.030
head-0.029
ups-0.029
Token They
Token ID1119
Feature activation+1.78e-02

Pos logit contributions
 Australian+0.011
opl+0.010
 habitats+0.010
 Jungle+0.010
 former+0.009
Neg logit contributions
up-0.013
ups-0.011
ade-0.011
l-0.011
guard-0.010
Token were
Token ID547
Feature activation+9.87e-02

Pos logit contributions
opl+0.046
 Australian+0.045
 former+0.043
 mastering+0.042
 Jungle+0.042
Neg logit contributions
 they-0.019
up-0.019
 it-0.016
 she-0.015
 let-0.015
Token time
Token ID640
Feature activation-3.10e-02

Pos logit contributions
 Australian+0.195
opl+0.180
 habitats+0.175
charged+0.173
 Jungle+0.171
Neg logit contributions
up-0.121
 they-0.106
ade-0.103
 it-0.101
 one-0.098
Token playing
Token ID2712
Feature activation+4.82e-02

Pos logit contributions
up+0.040
 they+0.038
 it+0.033
 to+0.033
 she+0.033
Neg logit contributions
 Australian-0.073
opl-0.068
 habitats-0.066
 Jungle-0.065
charged-0.064
Token with
Token ID351
Feature activation+2.28e-02

Pos logit contributions
 Australian+0.127
opl+0.121
 habitats+0.117
 Jungle+0.116
 former+0.114
Neg logit contributions
up-0.049
 they-0.040
ade-0.037
 it-0.035
ups-0.035
Token her
Token ID607
Feature activation+7.24e-02

Pos logit contributions
 Australian+0.058
opl+0.054
 habitats+0.052
 Jungle+0.052
 mastering+0.051
Neg logit contributions
up-0.026
 they-0.022
ups-0.020
ade-0.020
 it-0.020
Token or
Token ID393
Feature activation-3.63e-02

Pos logit contributions
 Australian+0.164
opl+0.154
 habitats+0.145
 Jungle+0.144
 mastering+0.141
Neg logit contributions
up-0.099
 they-0.098
 it-0.088
l-0.087
ade-0.086
Tokennam
Token ID7402
Feature activation+1.95e-01

Pos logit contributions
up+0.054
 they+0.052
 it+0.049
ups+0.047
 she+0.046
Neg logit contributions
 Australian-0.076
opl-0.073
 habitats-0.068
 Jungle-0.068
charged-0.066
Tokenents
Token ID658
Feature activation-3.83e-02

Pos logit contributions
 Australian+0.429
opl+0.401
 habitats+0.391
 Jungle+0.388
 mastering+0.378
Neg logit contributions
up-0.285
ups-0.235
 they-0.232
ade-0.227
l-0.223
Token and
Token ID290
Feature activation-1.94e-02

Pos logit contributions
up+0.032
 they+0.022
ups+0.022
ade+0.021
cakes+0.020
Neg logit contributions
 Australian-0.106
opl-0.101
 habitats-0.098
 Jungle-0.098
 mastering-0.096
Token making
Token ID1642
Feature activation-7.69e-02

Pos logit contributions
up+0.024
 they+0.019
ups+0.019
ade+0.019
l+0.018
Neg logit contributions
 Australian-0.047
opl-0.044
 habitats-0.043
 Jungle-0.043
 mastering-0.042
Token her
Token ID607
Feature activation+7.01e-02

Pos logit contributions
up+0.096
 they+0.079
ups+0.075
ade+0.075
l+0.073
Neg logit contributions
 Australian-0.185
opl-0.175
 habitats-0.169
 Jungle-0.168
 mastering-0.165
Token special
Token ID2041
Feature activation+1.97e-02

Pos logit contributions
 Australian+0.150
opl+0.137
 habitats+0.134
 Jungle+0.132
und+0.128
Neg logit contributions
up-0.107
 they-0.100
 it-0.096
 one-0.093
l-0.092
Token any
Token ID597
Feature activation-3.70e-02

Pos logit contributions
 Australian+0.291
opl+0.270
 Jungle+0.260
 habitats+0.257
 mastering+0.252
Neg logit contributions
 they-0.136
up-0.136
 it-0.135
 one-0.123
 again-0.121
Token bad
Token ID2089
Feature activation+4.65e-02

Pos logit contributions
up+0.057
 they+0.049
ups+0.047
ade+0.047
l+0.046
Neg logit contributions
 Australian-0.078
opl-0.073
 habitats-0.070
 Jungle-0.070
 mastering-0.068
Token dreams
Token ID10625
Feature activation-5.21e-02

Pos logit contributions
 Australian+0.087
 habitats+0.083
opl+0.082
 Jungle+0.079
 jungle+0.077
Neg logit contributions
up-0.078
cakes-0.068
ups-0.067
ade-0.066
 they-0.065
Token or
Token ID393
Feature activation-4.23e-02

Pos logit contributions
up+0.045
ade+0.031
ups+0.030
cakes+0.028
l+0.027
Neg logit contributions
opl-0.138
 Australian-0.137
 Jungle-0.129
 habitats-0.127
 mastering-0.126
Token scary
Token ID14343
Feature activation+3.40e-02

Pos logit contributions
up+0.052
 they+0.048
 it+0.043
ups+0.043
l+0.040
Neg logit contributions
 Australian-0.102
opl-0.098
 Jungle-0.092
 habitats-0.092
 mastering-0.090
Token gor
Token ID30344
Feature activation+1.95e-01

Pos logit contributions
 Australian+0.084
opl+0.080
 habitats+0.078
 Jungle+0.077
 former+0.076
Neg logit contributions
up-0.039
 they-0.031
ups-0.030
ade-0.030
l-0.029
Tokenillas
Token ID25314
Feature activation-2.50e-04

Pos logit contributions
 Australian+0.283
opl+0.255
 former+0.246
 habitats+0.239
 Jungle+0.222
Neg logit contributions
up-0.406
 they-0.406
ups-0.390
l-0.381
 again-0.378
Token.
Token ID13
Feature activation+6.45e-03

Pos logit contributions
up+0.000
 they+0.000
ups+0.000
l+0.000
ade+0.000
Neg logit contributions
 Australian-0.001
opl-0.001
 habitats-0.001
 Jungle-0.001
 former-0.001
Token<|endoftext|>
Token ID50256
Feature activation+7.44e-02

Pos logit contributions
 Australian+0.016
opl+0.015
 habitats+0.015
 mastering+0.015
charged+0.015
Neg logit contributions
up-0.007
 they-0.007
 it-0.006
l-0.005
 he-0.005
TokenOnce
Token ID7454
Feature activation+3.44e-02

Pos logit contributions
 Australian+0.175
opl+0.165
 Jungle+0.159
 habitats+0.158
 mastering+0.154
Neg logit contributions
up-0.097
ups-0.077
 they-0.077
ade-0.075
cakes-0.073
Token upon
Token ID2402
Feature activation+1.02e-01

Pos logit contributions
 Australian+0.056
opl+0.051
 Jungle+0.047
 habitats+0.047
 mastering+0.046
Neg logit contributions
up-0.069
ups-0.060
cakes-0.059
head-0.058
ade-0.055
Token borrowed
Token ID22546
Feature activation-1.96e-02

Pos logit contributions
opl+0.198
 Australian+0.194
 habitats+0.184
 former+0.183
 Jungle+0.182
Neg logit contributions
up-0.068
 they-0.057
 it-0.054
l-0.049
ade-0.048
Token her
Token ID607
Feature activation+6.97e-02

Pos logit contributions
up+0.020
ups+0.015
 they+0.015
ade+0.015
cakes+0.014
Neg logit contributions
 Australian-0.051
opl-0.048
 Jungle-0.047
 habitats-0.047
 mastering-0.046
Token mom
Token ID1995
Feature activation-5.64e-02

Pos logit contributions
 Australian+0.158
opl+0.157
 habitats+0.150
 Jungle+0.146
 mastering+0.142
Neg logit contributions
up-0.092
 it-0.084
 to-0.083
 they-0.081
 one-0.079
Token's
Token ID338
Feature activation+2.36e-02

Pos logit contributions
up+0.064
ade+0.050
 they+0.050
ups+0.050
cakes+0.050
Neg logit contributions
 Australian-0.139
opl-0.130
 Jungle-0.127
 habitats-0.123
 mastering-0.121
Token magn
Token ID7842
Feature activation+1.74e-02

Pos logit contributions
opl+0.051
 Australian+0.051
 Jungle+0.046
 habitats+0.046
 mastering+0.045
Neg logit contributions
up-0.035
 to-0.032
 it-0.030
 one-0.029
 they-0.028
Tokenifying
Token ID4035
Feature activation+1.95e-01

Pos logit contributions
 Australian+0.017
opl+0.016
 habitats+0.014
 Jungle+0.013
 mastering+0.013
Neg logit contributions
up-0.046
ade-0.042
ly-0.042
l-0.042
 they-0.042
Token glass
Token ID5405
Feature activation-1.44e-02

Pos logit contributions
 Australian+0.511
opl+0.484
 habitats+0.476
 mastering+0.469
 Jungle+0.464
Neg logit contributions
up-0.200
ade-0.153
 they-0.152
l-0.142
 it-0.141
Token to
Token ID284
Feature activation+7.35e-03

Pos logit contributions
up+0.014
 they+0.011
ups+0.011
head+0.010
cakes+0.010
Neg logit contributions
 Australian-0.036
opl-0.034
 habitats-0.034
 Jungle-0.033
 former-0.032
Token take
Token ID1011
Feature activation-3.47e-02

Pos logit contributions
 Australian+0.019
opl+0.018
 habitats+0.017
charged+0.017
 former+0.016
Neg logit contributions
up-0.008
 they-0.007
 it-0.007
 she-0.006
ade-0.006
Token a
Token ID257
Feature activation-3.39e-02

Pos logit contributions
up+0.043
 they+0.035
ups+0.035
ade+0.034
cakes+0.033
Neg logit contributions
 Australian-0.082
opl-0.077
 Jungle-0.075
 habitats-0.075
 mastering-0.073
Token closer
Token ID5699
Feature activation-4.87e-02

Pos logit contributions
up+0.044
ups+0.036
 they+0.035
ade+0.034
cakes+0.034
Neg logit contributions
 Australian-0.080
opl-0.073
 Jungle-0.072
 habitats-0.071
 mastering-0.070
Token the
Token ID262
Feature activation-2.96e-02

Pos logit contributions
 Australian+0.048
opl+0.046
 habitats+0.045
 Jungle+0.044
 mastering+0.044
Neg logit contributions
up-0.017
 they-0.013
ups-0.013
ade-0.012
l-0.012
Token lock
Token ID5793
Feature activation-2.72e-02

Pos logit contributions
up+0.039
 they+0.033
l+0.031
ade+0.031
 it+0.030
Neg logit contributions
 Australian-0.069
opl-0.066
 habitats-0.063
 Jungle-0.063
 mastering-0.062
Token and
Token ID290
Feature activation-1.79e-02

Pos logit contributions
up+0.031
ups+0.024
ade+0.023
head+0.023
cakes+0.022
Neg logit contributions
opl-0.061
 Australian-0.060
und-0.057
 Jungle-0.056
athed-0.056
Token the
Token ID262
Feature activation-2.96e-02

Pos logit contributions
up+0.021
ade+0.016
ups+0.016
l+0.015
 they+0.015
Neg logit contributions
 Australian-0.044
opl-0.041
 Jungle-0.040
 habitats-0.040
 mastering-0.040
Token or
Token ID393
Feature activation-4.14e-02

Pos logit contributions
up+0.039
 they+0.033
l+0.032
ade+0.031
ups+0.031
Neg logit contributions
 Australian-0.069
opl-0.065
 habitats-0.063
 Jungle-0.062
 mastering-0.061
Tokennam
Token ID7402
Feature activation+1.94e-01

Pos logit contributions
up+0.055
ade+0.046
ups+0.043
cakes+0.042
l+0.042
Neg logit contributions
 Australian-0.094
 Jungle-0.087
 habitats-0.086
 mastering-0.086
opl-0.085
Tokenents
Token ID658
Feature activation-2.62e-02

Pos logit contributions
 Australian+0.382
opl+0.358
 habitats+0.340
 Jungle+0.339
 mastering+0.334
Neg logit contributions
up-0.311
 they-0.275
ade-0.264
cakes-0.260
ups-0.256
Token.
Token ID13
Feature activation+3.63e-03

Pos logit contributions
up+0.028
ups+0.022
head+0.020
cakes+0.020
ade+0.019
Neg logit contributions
 Australian-0.060
opl-0.060
 mastering-0.057
 Jungle-0.056
und-0.056
Token
Token ID198
Feature activation+1.45e-02

Pos logit contributions
 Australian+0.006
opl+0.006
 Jungle+0.006
 habitats+0.006
 former+0.006
Neg logit contributions
up-0.007
ups-0.006
ade-0.006
ly-0.005
guard-0.005
Token
Token ID198
Feature activation+1.49e-02

Pos logit contributions
 Australian+0.027
opl+0.026
 Jungle+0.024
und+0.024
 habitats+0.023
Neg logit contributions
up-0.024
ly-0.021
cakes-0.020
ade-0.020
ups-0.020
Token"
Token ID1
Feature activation-1.27e-02

Pos logit contributions
 Australian+0.033
opl+0.031
 Jungle+0.030
 mastering+0.029
 habitats+0.029
Neg logit contributions
up-0.019
ade-0.016
cakes-0.016
ly-0.016
head-0.015
TokenSuddenly
Token ID38582
Feature activation-5.96e-02

Pos logit contributions
 Australian+0.056
opl+0.054
 habitats+0.053
 Jungle+0.052
 mastering+0.051
Neg logit contributions
up-0.028
 they-0.023
ups-0.022
ade-0.021
l-0.021
Token,
Token ID11
Feature activation-2.27e-02

Pos logit contributions
up+0.074
ups+0.061
ade+0.061
cakes+0.059
l+0.054
Neg logit contributions
 Australian-0.137
 Jungle-0.125
 mastering-0.122
opl-0.122
 habitats-0.121
Token one
Token ID530
Feature activation-1.20e-01

Pos logit contributions
up+0.022
ups+0.016
ade+0.016
 they+0.015
l+0.015
Neg logit contributions
 Australian-0.061
opl-0.057
 habitats-0.056
 Jungle-0.056
 mastering-0.055
Token of
Token ID286
Feature activation+5.35e-02

Pos logit contributions
up+0.106
ups+0.079
 they+0.076
ade+0.076
cakes+0.074
Neg logit contributions
 Australian-0.321
opl-0.301
 Jungle-0.299
 habitats-0.295
 mastering-0.290
Token the
Token ID262
Feature activation-3.37e-02

Pos logit contributions
 Australian+0.150
opl+0.144
 habitats+0.140
 Jungle+0.139
 former+0.137
Neg logit contributions
up-0.044
 they-0.034
 it-0.030
ups-0.030
l-0.029
Token gor
Token ID30344
Feature activation+1.94e-01

Pos logit contributions
up+0.049
l+0.041
 they+0.041
ups+0.040
 one+0.039
Neg logit contributions
 Australian-0.075
opl-0.072
 mastering-0.068
 habitats-0.067
charged-0.066
Tokenillas
Token ID25314
Feature activation+2.06e-03

Pos logit contributions
 Australian+0.283
opl+0.254
 former+0.250
 habitats+0.227
 hind+0.221
Neg logit contributions
up-0.423
 they-0.406
ups-0.397
l-0.386
guard-0.374
Token reached
Token ID4251
Feature activation-1.03e-01

Pos logit contributions
 Australian+0.005
opl+0.005
 habitats+0.005
 former+0.005
 Jungle+0.005
Neg logit contributions
up-0.002
 they-0.002
l-0.002
ade-0.002
ups-0.002
Token out
Token ID503
Feature activation+4.54e-02

Pos logit contributions
up+0.098
 they+0.079
ups+0.071
 it+0.069
ade+0.069
Neg logit contributions
 Australian-0.276
opl-0.264
 habitats-0.256
 Jungle-0.252
 mastering-0.250
Token his
Token ID465
Feature activation+6.44e-02

Pos logit contributions
 Australian+0.120
opl+0.114
 mastering+0.112
 habitats+0.112
 Jungle+0.111
Neg logit contributions
up-0.042
ade-0.031
ups-0.030
head-0.028
cakes-0.028
Token furry
Token ID40857
Feature activation+6.23e-02

Pos logit contributions
 Australian+0.140
opl+0.132
 mastering+0.129
 habitats+0.126
charged+0.123
Neg logit contributions
up-0.094
 to-0.085
 one-0.080
 they-0.080
ups-0.077
Token.
Token ID13
Feature activation+7.32e-03

Pos logit contributions
up+0.062
ups+0.043
ade+0.042
cakes+0.042
 they+0.041
Neg logit contributions
 Australian-0.180
opl-0.173
 Jungle-0.168
 habitats-0.167
 mastering-0.164
Token Lily
Token ID20037
Feature activation-5.44e-02

Pos logit contributions
 Australian+0.012
 habitats+0.010
 Jungle+0.010
 jungle+0.010
 former+0.010
Neg logit contributions
up-0.015
ups-0.013
guard-0.013
ly-0.012
cakes-0.012
Token had
Token ID550
Feature activation+5.89e-03

Pos logit contributions
up+0.066
 they+0.055
ups+0.051
ade+0.050
l+0.049
Neg logit contributions
 Australian-0.133
opl-0.128
 habitats-0.122
 Jungle-0.121
 former-0.120
Token a
Token ID257
Feature activation-3.47e-02

Pos logit contributions
 Australian+0.015
opl+0.014
 habitats+0.014
 Jungle+0.014
 former+0.014
Neg logit contributions
up-0.007
 they-0.005
ups-0.005
ade-0.005
 it-0.005
Token magn
Token ID7842
Feature activation+1.70e-02

Pos logit contributions
up+0.042
 they+0.041
 it+0.038
l+0.037
 one+0.037
Neg logit contributions
 Australian-0.084
opl-0.081
charged-0.079
 habitats-0.077
 mastering-0.075
Tokenifying
Token ID4035
Feature activation+1.94e-01

Pos logit contributions
 Australian+0.011
opl+0.009
 habitats+0.008
 Jungle+0.008
 mastering+0.006
Neg logit contributions
up-0.051
 they-0.047
ups-0.047
ade-0.046
cakes-0.046
Token glass
Token ID5405
Feature activation-1.61e-02

Pos logit contributions
 Australian+0.517
opl+0.484
 habitats+0.482
 mastering+0.481
charged+0.474
Neg logit contributions
up-0.190
 they-0.151
ade-0.150
l-0.139
 it-0.138
Token that
Token ID326
Feature activation-8.76e-02

Pos logit contributions
up+0.017
ups+0.013
 they+0.012
cakes+0.012
l+0.012
Neg logit contributions
 Australian-0.039
opl-0.038
 Jungle-0.037
 habitats-0.037
 jungle-0.035
Token she
Token ID673
Feature activation+6.32e-03

Pos logit contributions
up+0.084
ups+0.063
head+0.061
l+0.059
ade+0.058
Neg logit contributions
 Australian-0.223
 Jungle-0.209
opl-0.208
 habitats-0.206
 mastering-0.199
Token found
Token ID1043
Feature activation-6.73e-02

Pos logit contributions
 Australian+0.015
opl+0.015
 habitats+0.014
 former+0.014
 Jungle+0.014
Neg logit contributions
up-0.008
 they-0.007
ups-0.006
ade-0.006
 it-0.006
Token in
Token ID287
Feature activation+8.76e-02

Pos logit contributions
up+0.071
 they+0.054
ade+0.052
ups+0.052
cakes+0.051
Neg logit contributions
 Australian-0.172
opl-0.164
 Jungle-0.160
 habitats-0.160
 mastering-0.156
Token see
Token ID766
Feature activation-1.05e-01

Pos logit contributions
 Australian+0.012
opl+0.011
charged+0.011
 habitats+0.011
 former+0.011
Neg logit contributions
up-0.005
 they-0.005
 it-0.005
 to-0.004
 one-0.004
Token the
Token ID262
Feature activation-3.58e-02

Pos logit contributions
up+0.134
ade+0.109
ups+0.104
cakes+0.099
head+0.097
Neg logit contributions
 Australian-0.248
opl-0.238
 Jungle-0.230
 habitats-0.225
 mastering-0.220
Token big
Token ID1263
Feature activation+2.76e-02

Pos logit contributions
up+0.082
l+0.076
 they+0.076
 it+0.073
 one+0.073
Neg logit contributions
 Australian-0.046
opl-0.041
 mastering-0.040
 former-0.040
charged-0.040
Token,
Token ID11
Feature activation-2.79e-02

Pos logit contributions
 Australian+0.041
 former+0.038
 mastering+0.036
opl+0.036
 habitats+0.036
Neg logit contributions
up-0.056
 they-0.053
l-0.052
 it-0.052
 one-0.051
Token hairy
Token ID43873
Feature activation+2.27e-02

Pos logit contributions
up+0.050
ups+0.043
ade+0.042
cakes+0.037
oons+0.037
Neg logit contributions
 Australian-0.053
 Jungle-0.049
opl-0.049
 jungle-0.046
 habitats-0.046
Token gor
Token ID30344
Feature activation+1.94e-01

Pos logit contributions
 Australian+0.032
 former+0.028
opl+0.027
 mastering+0.027
charged+0.027
Neg logit contributions
up-0.049
l-0.047
 they-0.046
 it-0.046
ly-0.045
Tokenillas
Token ID25314
Feature activation+9.67e-03

Pos logit contributions
 Australian+0.286
 former+0.275
opl+0.258
 habitats+0.248
 hind+0.226
Neg logit contributions
up-0.401
 they-0.392
ups-0.382
l-0.367
 it-0.365
Token.
Token ID13
Feature activation-2.84e-03

Pos logit contributions
 Australian+0.024
opl+0.023
 habitats+0.023
 former+0.023
 Jungle+0.022
Neg logit contributions
 they-0.011
up-0.010
 it-0.009
 again-0.009
 to-0.008
Token
Token ID198
Feature activation+1.30e-02

Pos logit contributions
up+0.005
ade+0.004
ups+0.004
l+0.004
ly+0.004
Neg logit contributions
 Australian-0.006
opl-0.005
 Jungle-0.005
 habitats-0.005
 former-0.005
Token
Token ID198
Feature activation+1.33e-02

Pos logit contributions
 Australian+0.023
opl+0.020
und+0.019
 Jungle+0.019
 habitats+0.018
Neg logit contributions
up-0.024
ly-0.021
ups-0.021
cakes-0.021
 cleaner-0.020
TokenAs
Token ID1722
Feature activation+3.64e-02

Pos logit contributions
 Australian+0.026
opl+0.024
 habitats+0.023
 Jungle+0.023
 mastering+0.022
Neg logit contributions
up-0.022
 they-0.019
ade-0.019
ups-0.019
l-0.018
Tokeno
Token ID78
Feature activation-9.87e-03

Pos logit contributions
 former+0.068
 Australian+0.066
 mastering+0.066
opl+0.065
 habitats+0.062
Neg logit contributions
up-0.029
 they-0.024
ade-0.024
l-0.024
 to-0.024
Token was
Token ID373
Feature activation+5.55e-02

Pos logit contributions
up+0.010
 they+0.009
 it+0.008
ade+0.008
l+0.008
Neg logit contributions
 Australian-0.025
opl-0.025
 former-0.024
 mastering-0.024
 habitats-0.023
Token not
Token ID407
Feature activation-8.84e-02

Pos logit contributions
 habitats+0.134
 former+0.131
opl+0.130
charged+0.126
 Australian+0.125
Neg logit contributions
 they-0.066
 it-0.064
 to-0.063
 one-0.062
 he-0.059
Token like
Token ID588
Feature activation+2.88e-02

Pos logit contributions
up+0.124
 they+0.114
 it+0.110
 to+0.105
l+0.104
Neg logit contributions
 Australian-0.193
opl-0.187
 habitats-0.185
 former-0.179
charged-0.173
Token other
Token ID584
Feature activation+1.28e-01

Pos logit contributions
 Australian+0.079
opl+0.076
 former+0.074
charged+0.073
 mastering+0.072
Neg logit contributions
up-0.024
 they-0.022
 it-0.020
 to-0.018
l-0.018
Token gor
Token ID30344
Feature activation+1.94e-01

Pos logit contributions
 Australian+0.227
 former+0.213
opl+0.206
 mastering+0.198
 habitats+0.195
Neg logit contributions
up-0.233
 one-0.219
 they-0.216
l-0.212
 it-0.211
Tokenillas
Token ID25314
Feature activation+1.21e-02

Pos logit contributions
 former+0.298
 Australian+0.296
 habitats+0.271
opl+0.267
 hind+0.250
Neg logit contributions
up-0.372
 they-0.364
ups-0.360
ing-0.350
 it-0.349
Token.
Token ID13
Feature activation+6.37e-03

Pos logit contributions
 Australian+0.037
opl+0.035
 habitats+0.035
 former+0.035
 Jungle+0.034
Neg logit contributions
up-0.007
 they-0.006
 it-0.005
l-0.005
ups-0.004
Token He
Token ID679
Feature activation+6.11e-02

Pos logit contributions
 Australian+0.013
opl+0.013
 former+0.012
 mastering+0.012
 habitats+0.012
Neg logit contributions
up-0.009
 they-0.008
ade-0.008
 it-0.008
l-0.008
Token was
Token ID373
Feature activation+5.53e-02

Pos logit contributions
opl+0.173
 Australian+0.170
 former+0.169
 mastering+0.162
charged+0.161
Neg logit contributions
up-0.044
 they-0.044
 it-0.040
 he-0.038
 said-0.038
Token bald
Token ID28273
Feature activation+2.25e-02

Pos logit contributions
opl+0.133
 Australian+0.129
 habitats+0.128
 former+0.126
charged+0.125
Neg logit contributions
 to-0.066
 they-0.065
 one-0.063
 it-0.063
 bigger-0.060
Token cried
Token ID16896
Feature activation-5.94e-03

Pos logit contributions
up+0.063
 they+0.049
ade+0.049
ups+0.048
cakes+0.047
Neg logit contributions
 Australian-0.134
opl-0.125
 habitats-0.122
 Jungle-0.122
 mastering-0.120
Token,
Token ID11
Feature activation-2.74e-02

Pos logit contributions
up+0.008
ade+0.007
cakes+0.007
ups+0.006
oons+0.006
Neg logit contributions
 Australian-0.013
 ear-0.012
pered-0.012
 mastering-0.012
 tail-0.012
Token dropping
Token ID12047
Feature activation+3.69e-02

Pos logit contributions
 they+0.040
 it+0.037
up+0.036
 she+0.035
 one+0.034
Neg logit contributions
opl-0.058
 Australian-0.058
 habitats-0.057
 former-0.055
 Jungle-0.054
Token the
Token ID262
Feature activation-3.19e-02

Pos logit contributions
 Australian+0.082
opl+0.075
 habitats+0.073
 Jungle+0.072
 mastering+0.072
Neg logit contributions
up-0.052
ade-0.042
ups-0.042
cakes-0.041
head-0.038
Token magn
Token ID7842
Feature activation+1.70e-02

Pos logit contributions
up+0.047
 they+0.040
ade+0.039
l+0.038
ups+0.038
Neg logit contributions
 Australian-0.070
opl-0.066
 habitats-0.064
 Jungle-0.063
 former-0.062
Tokenifying
Token ID4035
Feature activation+1.94e-01

Pos logit contributions
 Australian+0.010
opl+0.008
 Jungle+0.007
 habitats+0.006
 mastering+0.006
Neg logit contributions
up-0.052
 they-0.048
ups-0.047
ade-0.047
cakes-0.046
Token glass
Token ID5405
Feature activation-1.51e-02

Pos logit contributions
 Australian+0.484
opl+0.458
 habitats+0.444
 Jungle+0.442
 mastering+0.434
Neg logit contributions
up-0.223
 they-0.178
ups-0.170
ade-0.170
l-0.165
Token.
Token ID13
Feature activation+5.71e-03

Pos logit contributions
up+0.016
ups+0.012
cakes+0.011
 they+0.011
l+0.011
Neg logit contributions
 Australian-0.035
opl-0.034
 habitats-0.033
und-0.033
 Jungle-0.032
Token
Token ID198
Feature activation+1.48e-02

Pos logit contributions
 Australian+0.014
opl+0.013
 habitats+0.013
 Jungle+0.013
 former+0.012
Neg logit contributions
up-0.007
 they-0.006
ups-0.005
ade-0.005
l-0.005
Token
Token ID198
Feature activation+1.52e-02

Pos logit contributions
 Australian+0.035
opl+0.033
 Jungle+0.032
 habitats+0.032
 mastering+0.031
Neg logit contributions
up-0.019
 they-0.015
ade-0.015
ups-0.015
cakes-0.014
Token"
Token ID1
Feature activation-1.23e-02

Pos logit contributions
 Australian+0.039
opl+0.037
 habitats+0.036
 Jungle+0.036
 mastering+0.035
Neg logit contributions
up-0.016
 they-0.013
ade-0.012
ups-0.012
l-0.012
Token She
Token ID1375
Feature activation+6.46e-02

Pos logit contributions
 Australian+0.005
opl+0.005
 Jungle+0.005
 habitats+0.004
 mastering+0.004
Neg logit contributions
up-0.003
 they-0.003
ups-0.003
 it-0.003
l-0.003
Token learned
Token ID4499
Feature activation+4.59e-03

Pos logit contributions
 Australian+0.143
 habitats+0.131
opl+0.131
 former+0.131
athed+0.130
Neg logit contributions
up-0.078
head-0.072
ly-0.065
cakes-0.064
l-0.063
Token that
Token ID326
Feature activation-9.12e-02

Pos logit contributions
 Australian+0.010
opl+0.010
 habitats+0.010
 former+0.010
 mastering+0.010
Neg logit contributions
up-0.005
head-0.005
ade-0.005
cakes-0.004
ups-0.004
Token the
Token ID262
Feature activation-3.12e-02

Pos logit contributions
up+0.143
ade+0.115
ups+0.114
l+0.113
ly+0.112
Neg logit contributions
 Australian-0.190
opl-0.178
 habitats-0.173
 mastering-0.172
 former-0.169
Token or
Token ID393
Feature activation-4.22e-02

Pos logit contributions
up+0.045
 they+0.039
ade+0.038
ups+0.037
l+0.037
Neg logit contributions
 Australian-0.068
opl-0.065
 Jungle-0.062
 habitats-0.062
 mastering-0.060
Tokennam
Token ID7402
Feature activation+1.94e-01

Pos logit contributions
up+0.052
 they+0.042
ade+0.041
ups+0.041
l+0.040
Neg logit contributions
 Australian-0.101
opl-0.095
 habitats-0.093
 Jungle-0.092
 mastering-0.091
Tokenents
Token ID658
Feature activation-2.89e-02

Pos logit contributions
 Australian+0.418
opl+0.391
 habitats+0.378
 Jungle+0.377
 mastering+0.367
Neg logit contributions
up-0.290
 they-0.243
ups-0.238
ade-0.236
l-0.231
Token were
Token ID547
Feature activation+1.01e-01

Pos logit contributions
up+0.047
ups+0.040
ade+0.039
head+0.039
 they+0.039
Neg logit contributions
 Australian-0.055
opl-0.052
 habitats-0.050
 Jungle-0.049
 mastering-0.049
Token not
Token ID407
Feature activation-8.91e-02

Pos logit contributions
 Australian+0.238
opl+0.226
 habitats+0.217
 Jungle+0.215
charged+0.208
Neg logit contributions
up-0.135
 they-0.120
ups-0.112
 it-0.109
ade-0.107
Token for
Token ID329
Feature activation+2.36e-02

Pos logit contributions
up+0.104
 they+0.102
 it+0.093
 to+0.089
 one+0.086
Neg logit contributions
 Australian-0.219
opl-0.215
 habitats-0.204
 Jungle-0.202
charged-0.192
Token playing
Token ID2712
Feature activation+4.66e-02

Pos logit contributions
 Australian+0.046
opl+0.043
 Jungle+0.041
athed+0.040
 habitats+0.039
Neg logit contributions
up-0.040
 they-0.039
 it-0.037
ups-0.035
 to-0.035
Token They
Token ID1119
Feature activation+1.18e-02

Pos logit contributions
 Australian+0.008
opl+0.007
 habitats+0.007
 Jungle+0.007
 former+0.007
Neg logit contributions
up-0.008
 they-0.007
ade-0.007
ups-0.007
l-0.006
Token reached
Token ID4251
Feature activation-9.24e-02

Pos logit contributions
 Australian+0.034
opl+0.034
 Jungle+0.032
 habitats+0.032
 mastering+0.032
Neg logit contributions
up-0.009
 they-0.008
ade-0.006
l-0.006
ups-0.005
Token for
Token ID329
Feature activation+2.47e-02

Pos logit contributions
up+0.099
 they+0.081
ups+0.075
ade+0.074
l+0.072
Neg logit contributions
 Australian-0.238
opl-0.226
 habitats-0.219
 Jungle-0.218
 mastering-0.213
Token the
Token ID262
Feature activation-2.90e-02

Pos logit contributions
 Australian+0.062
opl+0.058
 habitats+0.057
 Jungle+0.057
 mastering+0.056
Neg logit contributions
up-0.028
ade-0.022
ups-0.022
 they-0.021
l-0.020
Token or
Token ID393
Feature activation-4.08e-02

Pos logit contributions
up+0.040
 they+0.035
l+0.034
ups+0.032
 one+0.032
Neg logit contributions
 Australian-0.066
opl-0.063
 habitats-0.060
 mastering-0.059
 Jungle-0.059
Tokennam
Token ID7402
Feature activation+1.94e-01

Pos logit contributions
up+0.115
ade+0.105
ups+0.103
cakes+0.101
l+0.101
Neg logit contributions
 Australian-0.034
 Jungle-0.026
 habitats-0.026
opl-0.025
 mastering-0.024
Tokenents
Token ID658
Feature activation-2.60e-02

Pos logit contributions
 Australian+0.406
opl+0.381
 habitats+0.367
 Jungle+0.365
 former+0.356
Neg logit contributions
up-0.295
 they-0.253
ade-0.244
ups-0.242
l-0.238
Token and
Token ID290
Feature activation-1.55e-02

Pos logit contributions
up+0.028
ups+0.021
ade+0.020
cakes+0.020
head+0.020
Neg logit contributions
 Australian-0.062
opl-0.061
 mastering-0.058
 habitats-0.058
 Jungle-0.057
Token touched
Token ID12615
Feature activation-6.65e-02

Pos logit contributions
up+0.020
 they+0.016
ade+0.016
ups+0.016
l+0.015
Neg logit contributions
 Australian-0.037
opl-0.034
 habitats-0.034
 Jungle-0.033
 mastering-0.033
Token them
Token ID606
Feature activation-7.82e-02

Pos logit contributions
up+0.060
ade+0.043
ups+0.042
cakes+0.039
 they+0.039
Neg logit contributions
 Australian-0.182
opl-0.172
 habitats-0.169
 Jungle-0.167
 former-0.164
Token softly
Token ID26625
Feature activation-5.23e-03

Pos logit contributions
up+0.052
 they+0.036
ups+0.032
ade+0.031
l+0.030
Neg logit contributions
 Australian-0.233
opl-0.223
 habitats-0.217
 Jungle-0.217
 former-0.212
Token they
Token ID484
Feature activation-4.98e-02

Pos logit contributions
up+0.027
l+0.021
ups+0.021
ade+0.021
head+0.021
Neg logit contributions
 Australian-0.050
 Jungle-0.046
 mastering-0.046
opl-0.046
 former-0.046
Token began
Token ID2540
Feature activation+9.37e-02

Pos logit contributions
up+0.072
ups+0.059
head+0.057
 they+0.057
l+0.057
Neg logit contributions
 Australian-0.106
opl-0.098
 Jungle-0.095
 mastering-0.095
 habitats-0.094
Token to
Token ID284
Feature activation+1.35e-03

Pos logit contributions
 Australian+0.256
opl+0.242
 mastering+0.240
 Jungle+0.237
 former+0.232
Neg logit contributions
up-0.083
head-0.055
cakes-0.054
ups-0.054
ade-0.053
Token carefully
Token ID7773
Feature activation-1.86e-03

Pos logit contributions
 Australian+0.003
opl+0.003
 Jungle+0.003
 habitats+0.003
 former+0.003
Neg logit contributions
up-0.002
 they-0.002
ups-0.002
ade-0.001
l-0.001
Token ch
Token ID442
Feature activation+8.56e-02

Pos logit contributions
up+0.002
ups+0.002
cakes+0.002
 they+0.002
head+0.002
Neg logit contributions
 Australian-0.004
opl-0.004
 Jungle-0.004
 mastering-0.004
 former-0.004
Tokenisel
Token ID36811
Feature activation-1.88e-01

Pos logit contributions
 Australian+0.134
opl+0.125
 Jungle+0.116
 habitats+0.116
 mastering+0.114
Neg logit contributions
up-0.178
 they-0.158
cakes-0.158
ups-0.155
ade-0.152
Token the
Token ID262
Feature activation-3.59e-02

Pos logit contributions
up+0.203
cakes+0.159
head+0.157
ade+0.148
 happens+0.132
Neg logit contributions
 Australian-0.453
opl-0.447
 mastering-0.443
 winning-0.430
 habitats-0.423
Token rock
Token ID3881
Feature activation-2.48e-02

Pos logit contributions
up+0.048
 they+0.040
ade+0.039
ups+0.039
l+0.038
Neg logit contributions
 Australian-0.083
opl-0.078
 habitats-0.076
 Jungle-0.075
 mastering-0.073
Token.
Token ID13
Feature activation+3.54e-03

Pos logit contributions
up+0.026
head+0.019
cakes+0.018
ups+0.018
ade+0.018
Neg logit contributions
 Australian-0.062
opl-0.060
 mastering-0.057
 Jungle-0.057
 habitats-0.055
Token They
Token ID1119
Feature activation+1.26e-02

Pos logit contributions
 Australian+0.008
opl+0.008
 habitats+0.008
charged+0.008
 former+0.008
Neg logit contributions
 they-0.004
up-0.004
 it-0.004
 he-0.003
cakes-0.003
Token kept
Token ID4030
Feature activation-3.24e-03

Pos logit contributions
 Australian+0.028
opl+0.027
 habitats+0.026
 Jungle+0.026
 mastering+0.025
Neg logit contributions
up-0.018
 they-0.015
ups-0.014
ade-0.014
l-0.014
Token a
Token ID257
Feature activation-3.68e-02

Pos logit contributions
 Australian+0.108
opl+0.104
 mastering+0.102
 defeating+0.099
 Jungle+0.098
Neg logit contributions
up-0.055
ups-0.043
ade-0.040
head-0.040
cakes-0.039
Token hammer
Token ID15554
Feature activation-8.98e-02

Pos logit contributions
up+0.048
 they+0.040
ups+0.039
ade+0.038
l+0.037
Neg logit contributions
 Australian-0.086
opl-0.080
 Jungle-0.078
 habitats-0.077
 mastering-0.076
Token and
Token ID290
Feature activation-2.14e-02

Pos logit contributions
up+0.113
cakes+0.091
ups+0.091
ade+0.084
head+0.083
Neg logit contributions
opl-0.197
 Australian-0.189
und-0.182
 mastering-0.178
athed-0.172
Token a
Token ID257
Feature activation-3.72e-02

Pos logit contributions
up+0.029
l+0.024
ups+0.024
head+0.024
guard+0.023
Neg logit contributions
 Australian-0.043
 mastering-0.040
 Jungle-0.039
 former-0.039
opl-0.039
Token ch
Token ID442
Feature activation+8.60e-02

Pos logit contributions
up+0.046
 they+0.038
ade+0.037
l+0.036
 it+0.035
Neg logit contributions
 Australian-0.090
opl-0.086
 habitats-0.085
 Jungle-0.081
 mastering-0.080
Tokenisel
Token ID36811
Feature activation-1.88e-01

Pos logit contributions
 Australian+0.150
opl+0.136
 habitats+0.132
 Jungle+0.130
 former+0.126
Neg logit contributions
up-0.167
 they-0.144
ade-0.144
ups-0.142
l-0.141
Token and
Token ID290
Feature activation-2.20e-02

Pos logit contributions
up+0.192
cakes+0.164
head+0.158
ups+0.139
ade+0.137
Neg logit contributions
opl-0.433
 Australian-0.432
 mastering-0.424
und-0.405
 winning-0.402
Token they
Token ID484
Feature activation-4.98e-02

Pos logit contributions
up+0.027
l+0.021
ups+0.021
ade+0.021
head+0.021
Neg logit contributions
 Australian-0.050
 Jungle-0.046
 mastering-0.046
opl-0.046
 former-0.046
Token began
Token ID2540
Feature activation+9.37e-02

Pos logit contributions
up+0.072
ups+0.059
head+0.057
 they+0.057
l+0.057
Neg logit contributions
 Australian-0.106
opl-0.098
 Jungle-0.095
 mastering-0.095
 habitats-0.094
Token to
Token ID284
Feature activation+1.35e-03

Pos logit contributions
 Australian+0.256
opl+0.242
 mastering+0.240
 Jungle+0.237
 former+0.232
Neg logit contributions
up-0.083
head-0.055
cakes-0.054
ups-0.054
ade-0.053
Token carefully
Token ID7773
Feature activation-1.86e-03

Pos logit contributions
 Australian+0.003
opl+0.003
 Jungle+0.003
 habitats+0.003
 former+0.003
Neg logit contributions
up-0.002
 they-0.002
ups-0.002
ade-0.001
l-0.001
Token,
Token ID11
Feature activation-1.94e-02

Pos logit contributions
up+0.075
ups+0.058
head+0.055
cakes+0.054
 they+0.054
Neg logit contributions
 Australian-0.168
opl-0.161
 Jungle-0.157
 habitats-0.154
 mastering-0.154
Token circles
Token ID13332
Feature activation-1.19e-01

Pos logit contributions
up+0.031
ade+0.026
ups+0.026
 they+0.025
cakes+0.025
Neg logit contributions
 Australian-0.039
opl-0.037
 Jungle-0.036
 habitats-0.036
 mastering-0.035
Token,
Token ID11
Feature activation-1.97e-02

Pos logit contributions
up+0.126
ups+0.096
 they+0.092
ade+0.091
l+0.088
Neg logit contributions
 Australian-0.300
opl-0.288
 Jungle-0.279
 habitats-0.277
 mastering-0.274
Token and
Token ID290
Feature activation-1.99e-02

Pos logit contributions
up+0.023
ups+0.017
ade+0.017
 they+0.016
cakes+0.016
Neg logit contributions
 Australian-0.049
opl-0.046
 Jungle-0.046
 habitats-0.045
 mastering-0.044
Token o
Token ID267
Feature activation+1.33e-02

Pos logit contributions
up+0.027
ade+0.022
ups+0.021
 they+0.021
l+0.021
Neg logit contributions
 Australian-0.045
opl-0.042
 Jungle-0.041
 habitats-0.041
 mastering-0.041
Tokenvals
Token ID12786
Feature activation-1.85e-01

Pos logit contributions
 Australian+0.024
opl+0.022
 habitats+0.022
 Jungle+0.021
 mastering+0.021
Neg logit contributions
up-0.024
 they-0.021
ups-0.021
ade-0.021
l-0.020
Token.
Token ID13
Feature activation+9.18e-03

Pos logit contributions
up+0.236
guard+0.193
head+0.193
ups+0.190
cakes+0.182
Neg logit contributions
 Australian-0.397
opl-0.381
 Jungle-0.369
 habitats-0.369
 mastering-0.367
Token She
Token ID1375
Feature activation+6.83e-02

Pos logit contributions
 Australian+0.021
opl+0.019
 habitats+0.019
charged+0.018
 Jungle+0.018
Neg logit contributions
up-0.012
 they-0.012
 it-0.011
cakes-0.010
 she-0.010
Token matched
Token ID14451
Feature activation-9.80e-02

Pos logit contributions
 Australian+0.174
opl+0.164
 habitats+0.159
 Jungle+0.159
 former+0.155
Neg logit contributions
up-0.077
 they-0.062
ade-0.059
ups-0.058
l-0.056
Token up
Token ID510
Feature activation+2.71e-02

Pos logit contributions
up+0.121
ups+0.095
ade+0.092
head+0.092
cakes+0.091
Neg logit contributions
 Australian-0.229
opl-0.216
 Jungle-0.213
 habitats-0.212
 mastering-0.209
Token triangles
Token ID44360
Feature activation-4.22e-02

Pos logit contributions
 Australian+0.069
opl+0.066
 habitats+0.063
 Jungle+0.063
 mastering+0.062
Neg logit contributions
up-0.030
 they-0.024
ups-0.022
ade-0.022
l-0.022
Token be
Token ID307
Feature activation+2.86e-03

Pos logit contributions
up+0.044
 they+0.038
l+0.035
ade+0.035
 it+0.035
Neg logit contributions
 Australian-0.077
opl-0.073
 habitats-0.070
 Jungle-0.069
 mastering-0.068
Tokenak
Token ID461
Feature activation-6.62e-02

Pos logit contributions
 Australian+0.005
opl+0.004
 Jungle+0.004
 habitats+0.004
 mastering+0.004
Neg logit contributions
up-0.005
guard-0.005
 they-0.005
cakes-0.005
head-0.005
Token,
Token ID11
Feature activation-2.27e-02

Pos logit contributions
up+0.054
 they+0.038
ups+0.036
ade+0.036
cakes+0.034
Neg logit contributions
 Australian-0.184
opl-0.177
 habitats-0.171
 Jungle-0.170
 mastering-0.167
Token two
Token ID734
Feature activation-4.87e-02

Pos logit contributions
up+0.021
 they+0.018
 it+0.015
ups+0.015
l+0.015
Neg logit contributions
 Australian-0.062
opl-0.058
 habitats-0.057
 Jungle-0.056
 former-0.055
Token o
Token ID267
Feature activation+1.44e-02

Pos logit contributions
up+0.078
 they+0.068
ups+0.065
l+0.064
ade+0.064
Neg logit contributions
 Australian-0.102
opl-0.094
 habitats-0.091
 Jungle-0.090
 former-0.089
Tokenvals
Token ID12786
Feature activation-1.85e-01

Pos logit contributions
 Australian+0.023
opl+0.023
 Jungle+0.022
 habitats+0.020
 mastering+0.020
Neg logit contributions
up-0.027
 they-0.025
l-0.024
 it-0.024
ly-0.024
Token for
Token ID329
Feature activation+2.46e-02

Pos logit contributions
up+0.233
head+0.184
cakes+0.182
ups+0.182
guard+0.177
Neg logit contributions
 Australian-0.411
opl-0.396
 habitats-0.383
 Jungle-0.380
 mastering-0.376
Token the
Token ID262
Feature activation-3.30e-02

Pos logit contributions
 Australian+0.071
opl+0.066
 habitats+0.065
 former+0.064
charged+0.063
Neg logit contributions
 it-0.020
 they-0.019
up-0.017
 one-0.015
 to-0.014
Token eyes
Token ID2951
Feature activation-2.38e-02

Pos logit contributions
up+0.044
 they+0.037
l+0.036
 it+0.036
ade+0.035
Neg logit contributions
 Australian-0.077
opl-0.071
 habitats-0.070
 mastering-0.068
 former-0.067
Token,
Token ID11
Feature activation-2.18e-02

Pos logit contributions
up+0.026
ups+0.019
guard+0.019
ade+0.019
head+0.019
Neg logit contributions
 Australian-0.055
opl-0.055
 mastering-0.052
 Jungle-0.051
und-0.051
Token a
Token ID257
Feature activation-3.51e-02

Pos logit contributions
up+0.020
 they+0.015
ups+0.014
ade+0.014
l+0.014
Neg logit contributions
 Australian-0.059
opl-0.056
 habitats-0.055
 Jungle-0.054
 mastering-0.053
Token they
Token ID484
Feature activation-5.86e-02

Pos logit contributions
up+0.045
 they+0.038
 it+0.036
l+0.036
 one+0.036
Neg logit contributions
 Australian-0.078
opl-0.076
 habitats-0.073
 mastering-0.072
 Jungle-0.070
Token rode
Token ID22075
Feature activation+9.17e-03

Pos logit contributions
up+0.075
 they+0.067
ade+0.060
ups+0.059
l+0.059
Neg logit contributions
 Australian-0.139
opl-0.134
 Jungle-0.127
 habitats-0.126
 mastering-0.125
Token the
Token ID262
Feature activation-2.53e-02

Pos logit contributions
 Australian+0.023
opl+0.022
 habitats+0.021
 Jungle+0.021
 mastering+0.020
Neg logit contributions
up-0.011
 they-0.009
ups-0.008
ade-0.008
l-0.008
Token roller
Token ID24471
Feature activation-5.41e-02

Pos logit contributions
up+0.039
 they+0.033
l+0.033
ade+0.032
ups+0.032
Neg logit contributions
 Australian-0.053
opl-0.049
 mastering-0.049
 habitats-0.048
 Jungle-0.047
Tokenco
Token ID1073
Feature activation-4.13e-03

Pos logit contributions
up+0.102
 they+0.089
ups+0.087
ade+0.087
l+0.085
Neg logit contributions
 Australian-0.092
opl-0.089
 habitats-0.082
 Jungle-0.082
 mastering-0.078
Tokenaster
Token ID1603
Feature activation-1.74e-01

Pos logit contributions
up+0.007
 they+0.006
ups+0.006
ade+0.006
l+0.006
Neg logit contributions
 Australian-0.008
opl-0.007
 habitats-0.007
 Jungle-0.007
 former-0.007
Token together
Token ID1978
Feature activation+1.44e-02

Pos logit contributions
up+0.164
 they+0.121
ade+0.117
ups+0.116
l+0.110
Neg logit contributions
 Australian-0.461
opl-0.442
 habitats-0.427
 Jungle-0.426
 mastering-0.414
Token.
Token ID13
Feature activation+1.51e-02

Pos logit contributions
 Australian+0.042
opl+0.040
 habitats+0.039
 Jungle+0.039
 former+0.039
Neg logit contributions
up-0.010
 they-0.008
ade-0.006
ups-0.006
l-0.006
Token It
Token ID632
Feature activation-5.83e-02

Pos logit contributions
 Australian+0.034
opl+0.033
charged+0.033
 mastering+0.032
 habitats+0.031
Neg logit contributions
 they-0.019
up-0.018
 it-0.017
cakes-0.016
head-0.016
Token was
Token ID373
Feature activation+5.88e-02

Pos logit contributions
up+0.056
 they+0.042
ups+0.041
ade+0.040
l+0.038
Neg logit contributions
 Australian-0.155
opl-0.147
 habitats-0.143
 Jungle-0.142
 mastering-0.140
Token scary
Token ID14343
Feature activation+2.87e-02

Pos logit contributions
 Australian+0.142
opl+0.134
charged+0.131
 habitats+0.131
 Jungle+0.129
Neg logit contributions
up-0.068
 they-0.064
 it-0.060
ade-0.055
l-0.054
Token popcorn
Token ID38914
Feature activation+3.17e-02

Pos logit contributions
up+0.043
 they+0.038
l+0.037
ade+0.035
ups+0.035
Neg logit contributions
 Australian-0.070
opl-0.066
 mastering-0.063
 habitats-0.063
charged-0.062
Token was
Token ID373
Feature activation+6.34e-02

Pos logit contributions
 Australian+0.080
opl+0.076
 habitats+0.073
 Jungle+0.073
 mastering+0.072
Neg logit contributions
up-0.036
 they-0.028
ups-0.027
ade-0.027
l-0.026
Token sticky
Token ID23408
Feature activation-4.50e-02

Pos logit contributions
 Australian+0.142
opl+0.133
 habitats+0.129
 Jungle+0.128
 mastering+0.125
Neg logit contributions
up-0.089
 they-0.078
ade-0.073
ups-0.073
l-0.072
Token and
Token ID290
Feature activation-1.70e-02

Pos logit contributions
up+0.061
ups+0.048
l+0.046
cakes+0.045
guard+0.045
Neg logit contributions
opl-0.095
 Australian-0.092
 mastering-0.088
und-0.086
 winning-0.085
Token gre
Token ID10536
Feature activation+1.21e-02

Pos logit contributions
up+0.025
ups+0.019
ade+0.019
cakes+0.018
l+0.018
Neg logit contributions
 Australian-0.037
opl-0.034
 habitats-0.033
 Jungle-0.033
 mastering-0.033
Tokenasy
Token ID4107
Feature activation-1.73e-01

Pos logit contributions
 Australian+0.023
charged+0.022
 habitats+0.022
ses+0.022
 former+0.022
Neg logit contributions
up-0.018
ade-0.017
cakes-0.017
 they-0.016
ups-0.016
Token.
Token ID13
Feature activation+1.73e-02

Pos logit contributions
up+0.153
 they+0.106
ade+0.100
ups+0.100
cakes+0.099
Neg logit contributions
 Australian-0.461
opl-0.447
 habitats-0.435
 mastering-0.425
 Jungle-0.422
Token They
Token ID1119
Feature activation+1.59e-02

Pos logit contributions
 Australian+0.041
opl+0.039
 habitats+0.038
 mastering+0.037
charged+0.037
Neg logit contributions
up-0.021
 they-0.020
 it-0.018
head-0.017
l-0.017
Token used
Token ID973
Feature activation-2.48e-02

Pos logit contributions
 Australian+0.040
opl+0.040
 Jungle+0.037
 habitats+0.037
 mastering+0.036
Neg logit contributions
up-0.019
 they-0.017
l-0.015
 it-0.015
ade-0.015
Token a
Token ID257
Feature activation-3.46e-02

Pos logit contributions
up+0.028
ups+0.021
cakes+0.020
ade+0.020
head+0.020
Neg logit contributions
 Australian-0.061
opl-0.057
 habitats-0.057
 former-0.055
 Jungle-0.055
Token towel
Token ID24808
Feature activation-2.73e-02

Pos logit contributions
up+0.048
ups+0.039
 they+0.039
cakes+0.037
ade+0.037
Neg logit contributions
 Australian-0.078
opl-0.072
 Jungle-0.070
 habitats-0.069
 former-0.069
Token put
Token ID1234
Feature activation-1.16e-01

Pos logit contributions
up+0.029
ade+0.023
l+0.022
ups+0.022
cakes+0.022
Neg logit contributions
 Australian-0.052
 Jungle-0.047
opl-0.047
 habitats-0.047
 mastering-0.047
Token it
Token ID340
Feature activation-1.29e-01

Pos logit contributions
up+0.127
ups+0.097
guard+0.096
cakes+0.087
ade+0.084
Neg logit contributions
 Australian-0.279
 Jungle-0.267
opl-0.265
athed-0.256
 ear-0.252
Token in
Token ID287
Feature activation+8.29e-02

Pos logit contributions
up+0.184
 they+0.150
ups+0.149
head+0.147
l+0.144
Neg logit contributions
 Australian-0.277
opl-0.251
 Jungle-0.248
 habitats-0.245
 former-0.245
Token the
Token ID262
Feature activation-3.61e-02

Pos logit contributions
 Australian+0.226
opl+0.215
 habitats+0.210
 Jungle+0.206
charged+0.202
Neg logit contributions
up-0.073
 they-0.065
 it-0.060
ade-0.054
 one-0.053
Token to
Token ID284
Feature activation-1.28e-03

Pos logit contributions
up+0.052
 they+0.044
ups+0.042
ade+0.042
l+0.041
Neg logit contributions
 Australian-0.080
opl-0.075
 habitats-0.072
 Jungle-0.072
 mastering-0.070
Tokenaster
Token ID1603
Feature activation-1.73e-01

Pos logit contributions
up+0.002
ups+0.001
ade+0.001
 they+0.001
cakes+0.001
Neg logit contributions
 Australian-0.003
opl-0.003
 Jungle-0.003
 former-0.003
 habitats-0.003
Token.
Token ID13
Feature activation+5.64e-03

Pos logit contributions
up+0.144
 they+0.100
ade+0.094
ups+0.094
l+0.090
Neg logit contributions
 Australian-0.475
opl-0.459
 habitats-0.444
 Jungle-0.442
 former-0.434
Token Then
Token ID3244
Feature activation+6.14e-03

Pos logit contributions
 Australian+0.012
opl+0.011
 habitats+0.011
 Jungle+0.011
charged+0.010
Neg logit contributions
up-0.009
 they-0.008
ups-0.007
cakes-0.007
 it-0.007
Token,
Token ID11
Feature activation-2.27e-02

Pos logit contributions
 Australian+0.018
opl+0.016
 Jungle+0.016
 mastering+0.016
 habitats+0.016
Neg logit contributions
up-0.005
ade-0.003
ups-0.003
l-0.003
cakes-0.003
Token she
Token ID673
Feature activation+2.70e-03

Pos logit contributions
up+0.015
ade+0.009
ups+0.009
 they+0.009
l+0.008
Neg logit contributions
 Australian-0.068
opl-0.064
 Jungle-0.063
 habitats-0.063
 mastering-0.062
Token went
Token ID1816
Feature activation+3.57e-02

Pos logit contributions
 Australian+0.006
opl+0.006
 habitats+0.006
 Jungle+0.006
 mastering+0.006
Neg logit contributions
up-0.003
 they-0.003
ups-0.003
ade-0.003
l-0.002
Token,
Token ID11
Feature activation-2.37e-02

Pos logit contributions
up+0.060
ups+0.046
ade+0.045
cakes+0.045
head+0.043
Neg logit contributions
 Australian-0.122
opl-0.119
 habitats-0.114
 Jungle-0.113
 mastering-0.113
Token thank
Token ID5875
Feature activation+1.75e-02

Pos logit contributions
up+0.032
ade+0.026
ups+0.026
l+0.024
cakes+0.024
Neg logit contributions
 Australian-0.051
opl-0.050
 Jungle-0.049
 mastering-0.048
 ear-0.047
Token you
Token ID345
Feature activation-3.63e-02

Pos logit contributions
 Australian+0.035
 Jungle+0.033
charged+0.031
opl+0.031
 mastering+0.031
Neg logit contributions
up-0.027
ade-0.025
cakes-0.023
oons-0.022
ups-0.022
Token.
Token ID13
Feature activation+1.19e-02

Pos logit contributions
up+0.036
ade+0.027
cakes+0.024
ups+0.024
head+0.023
Neg logit contributions
 Australian-0.093
opl-0.089
 former-0.085
 habitats-0.085
charged-0.085
Token Cig
Token ID32616
Feature activation+4.17e-02

Pos logit contributions
 Australian+0.024
opl+0.022
 habitats+0.021
 Jungle+0.021
 mastering+0.021
Neg logit contributions
up-0.020
 they-0.017
ups-0.017
ade-0.016
l-0.016
Tokenarettes
Token ID13890
Feature activation-1.73e-01

Pos logit contributions
 Australian+0.065
opl+0.060
 Jungle+0.058
 mastering+0.056
charged+0.054
Neg logit contributions
up-0.083
ade-0.074
l-0.073
 they-0.073
ly-0.073
Token are
Token ID389
Feature activation+9.24e-02

Pos logit contributions
up+0.279
 they+0.238
ups+0.233
ade+0.230
l+0.227
Neg logit contributions
 Australian-0.346
opl-0.325
 habitats-0.309
 Jungle-0.308
 mastering-0.298
Token stupid
Token ID8531
Feature activation+2.20e-02

Pos logit contributions
 Australian+0.212
opl+0.201
 habitats+0.194
 Jungle+0.192
 mastering+0.188
Neg logit contributions
up-0.125
 they-0.111
ups-0.104
ade-0.102
l-0.101
Token.
Token ID13
Feature activation+1.09e-02

Pos logit contributions
opl+0.051
 Australian+0.051
 habitats+0.048
charged+0.047
 tail+0.047
Neg logit contributions
up-0.023
cakes-0.017
ade-0.017
ups-0.016
head-0.015
Token They
Token ID1119
Feature activation+1.57e-02

Pos logit contributions
 Australian+0.020
opl+0.019
 habitats+0.018
 Jungle+0.018
 former+0.017
Neg logit contributions
up-0.020
ade-0.017
 they-0.017
ups-0.017
l-0.016
Token hurt
Token ID5938
Feature activation-1.51e-02

Pos logit contributions
 Australian+0.035
opl+0.033
 habitats+0.032
 Jungle+0.032
 mastering+0.031
Neg logit contributions
up-0.023
 they-0.020
ade-0.018
ups-0.018
l-0.018
Token They
Token ID1119
Feature activation+1.64e-02

Pos logit contributions
 Australian+0.038
opl+0.037
charged+0.035
 mastering+0.035
 Jungle+0.035
Neg logit contributions
up-0.025
 they-0.024
 it-0.021
l-0.021
head-0.020
Token watch
Token ID2342
Feature activation-7.46e-02

Pos logit contributions
 Australian+0.040
opl+0.039
charged+0.037
 habitats+0.037
 mastering+0.036
Neg logit contributions
up-0.020
 they-0.018
 it-0.016
l-0.016
 let-0.016
Token as
Token ID355
Feature activation-2.11e-03

Pos logit contributions
up+0.071
 they+0.056
ups+0.051
ade+0.050
l+0.050
Neg logit contributions
 Australian-0.201
opl-0.192
 habitats-0.186
 Jungle-0.185
 former-0.182
Token she
Token ID673
Feature activation+3.76e-03

Pos logit contributions
up+0.002
ade+0.001
ly+0.001
ups+0.001
cakes+0.001
Neg logit contributions
 Australian-0.005
 habitats-0.005
opl-0.005
 mastering-0.005
 Jungle-0.005
Token gl
Token ID1278
Feature activation-1.99e-02

Pos logit contributions
 Australian+0.009
charged+0.008
opl+0.008
 former+0.008
 mastering+0.008
Neg logit contributions
up-0.005
 they-0.004
l-0.004
 poured-0.004
ade-0.004
Tokenues
Token ID947
Feature activation-1.72e-01

Pos logit contributions
up+0.033
 they+0.031
 it+0.029
head+0.029
ups+0.028
Neg logit contributions
 Australian-0.036
opl-0.034
 habitats-0.033
 former-0.033
 Jungle-0.032
Token and
Token ID290
Feature activation-1.53e-02

Pos logit contributions
up+0.169
ade+0.123
ups+0.119
 they+0.116
head+0.114
Neg logit contributions
 Australian-0.438
opl-0.423
 habitats-0.413
 Jungle-0.407
 mastering-0.400
Token tapes
Token ID23695
Feature activation-9.63e-02

Pos logit contributions
up+0.022
 they+0.020
ups+0.018
l+0.018
 it+0.018
Neg logit contributions
 Australian-0.034
opl-0.032
 habitats-0.030
 Jungle-0.030
charged-0.030
Token the
Token ID262
Feature activation-3.10e-02

Pos logit contributions
up+0.092
ade+0.066
ups+0.064
cakes+0.062
head+0.061
Neg logit contributions
 Australian-0.246
opl-0.239
 habitats-0.234
 Jungle-0.231
 mastering-0.226
Token horn
Token ID12718
Feature activation+1.82e-02

Pos logit contributions
up+0.044
 they+0.037
ups+0.036
ade+0.036
l+0.035
Neg logit contributions
 Australian-0.069
opl-0.065
 habitats-0.062
 Jungle-0.062
 mastering-0.061
Token back
Token ID736
Feature activation+1.05e-01

Pos logit contributions
 Australian+0.050
opl+0.049
 habitats+0.047
 Jungle+0.047
 mastering+0.046
Neg logit contributions
up-0.014
ade-0.009
ups-0.009
 they-0.009
head-0.009
Token the
Token ID262
Feature activation-3.26e-02

Pos logit contributions
up+0.147
ups+0.121
guard+0.114
ade+0.108
l+0.103
Neg logit contributions
 Australian-0.247
 Jungle-0.235
opl-0.234
 ear-0.224
athed-0.223
Token bread
Token ID8509
Feature activation-6.21e-03

Pos logit contributions
up+0.043
 they+0.038
cakes+0.034
ups+0.034
guard+0.034
Neg logit contributions
 Australian-0.073
opl-0.068
 Jungle-0.067
 tail-0.064
 former-0.063
Token in
Token ID287
Feature activation+8.40e-02

Pos logit contributions
up+0.009
ups+0.007
head+0.007
l+0.007
cakes+0.007
Neg logit contributions
 Australian-0.013
 Jungle-0.012
opl-0.011
 habitats-0.011
 tail-0.011
Token the
Token ID262
Feature activation-3.35e-02

Pos logit contributions
 Australian+0.226
opl+0.215
 habitats+0.210
 Jungle+0.207
 mastering+0.203
Neg logit contributions
up-0.079
 they-0.065
ade-0.058
 it-0.058
l-0.056
Token to
Token ID284
Feature activation-8.18e-04

Pos logit contributions
up+0.044
 they+0.037
ups+0.035
ade+0.035
l+0.034
Neg logit contributions
 Australian-0.077
opl-0.073
 habitats-0.070
 Jungle-0.070
 mastering-0.069
Tokenaster
Token ID1603
Feature activation-1.71e-01

Pos logit contributions
up+0.001
ups+0.001
ade+0.001
 they+0.001
cakes+0.001
Neg logit contributions
 Australian-0.002
opl-0.002
 Jungle-0.002
 former-0.002
 mastering-0.002
Token and
Token ID290
Feature activation-2.31e-02

Pos logit contributions
up+0.164
 they+0.120
ups+0.117
ade+0.115
l+0.112
Neg logit contributions
 Australian-0.454
opl-0.436
 habitats-0.421
 Jungle-0.420
 former-0.411
Token waited
Token ID13488
Feature activation-9.76e-02

Pos logit contributions
up+0.027
ups+0.021
ade+0.021
 they+0.021
l+0.020
Neg logit contributions
 Australian-0.056
opl-0.053
 Jungle-0.051
 habitats-0.051
 mastering-0.051
Token for
Token ID329
Feature activation+2.33e-02

Pos logit contributions
up+0.149
ups+0.122
ade+0.116
guard+0.115
cakes+0.112
Neg logit contributions
 Australian-0.196
 mastering-0.189
 former-0.183
opl-0.182
 Jungle-0.181
Token it
Token ID340
Feature activation-1.27e-01

Pos logit contributions
 Australian+0.057
 Jungle+0.055
opl+0.054
 mastering+0.054
 habitats+0.053
Neg logit contributions
up-0.025
ade-0.019
cakes-0.019
ups-0.018
l-0.018
Token to
Token ID284
Feature activation-8.59e-04

Pos logit contributions
up+0.092
ups+0.056
 they+0.056
ade+0.056
l+0.055
Neg logit contributions
 Australian-0.361
opl-0.341
 habitats-0.337
 Jungle-0.335
 former-0.331
Token just
Token ID655
Feature activation+1.45e-02

Pos logit contributions
up+0.055
 they+0.049
ups+0.046
ade+0.045
 it+0.044
Neg logit contributions
 Australian-0.079
opl-0.075
 habitats-0.072
 Jungle-0.071
 former-0.069
Token stuck
Token ID7819
Feature activation-3.12e-02

Pos logit contributions
 Australian+0.033
opl+0.031
 Jungle+0.029
 habitats+0.029
charged+0.028
Neg logit contributions
up-0.021
 they-0.019
 it-0.018
 to-0.017
ups-0.017
Token with
Token ID351
Feature activation+1.75e-02

Pos logit contributions
up+0.031
 they+0.024
ups+0.023
ade+0.023
l+0.022
Neg logit contributions
 Australian-0.082
opl-0.078
 habitats-0.076
 Jungle-0.075
 mastering-0.074
Token the
Token ID262
Feature activation-2.85e-02

Pos logit contributions
 Australian+0.045
opl+0.043
 Jungle+0.041
 habitats+0.041
 mastering+0.041
Neg logit contributions
up-0.019
 they-0.016
ups-0.014
 it-0.014
ade-0.014
Token to
Token ID284
Feature activation+1.01e-03

Pos logit contributions
up+0.040
 they+0.035
ade+0.034
l+0.034
ups+0.032
Neg logit contributions
 Australian-0.063
opl-0.059
 habitats-0.056
 mastering-0.056
 Jungle-0.056
Tokenaster
Token ID1603
Feature activation-1.71e-01

Pos logit contributions
 Australian+0.003
opl+0.003
 habitats+0.002
 Jungle+0.002
 mastering+0.002
Neg logit contributions
up-0.001
 they-0.001
 it-0.001
ups-0.001
ade-0.001
Token in
Token ID287
Feature activation+8.52e-02

Pos logit contributions
up+0.160
 they+0.119
ade+0.113
ups+0.113
l+0.108
Neg logit contributions
 Australian-0.458
opl-0.438
 habitats-0.426
 Jungle-0.423
 former-0.415
Token their
Token ID511
Feature activation+5.97e-02

Pos logit contributions
 Australian+0.231
opl+0.219
 habitats+0.213
 Jungle+0.212
 mastering+0.208
Neg logit contributions
up-0.080
 they-0.063
ups-0.058
ade-0.057
l-0.055
Token modern
Token ID3660
Feature activation+6.64e-03

Pos logit contributions
 Australian+0.120
 mastering+0.113
opl+0.112
 habitats+0.106
 Jungle+0.106
Neg logit contributions
up-0.096
ups-0.086
 they-0.086
 to-0.084
 it-0.080
Token home
Token ID1363
Feature activation+4.80e-02

Pos logit contributions
 Australian+0.014
opl+0.013
 habitats+0.012
 Jungle+0.012
 mastering+0.012
Neg logit contributions
up-0.011
 they-0.009
ups-0.009
ade-0.009
 it-0.009
Token.
Token ID13
Feature activation+1.38e-02

Pos logit contributions
 Australian+0.138
opl+0.132
 habitats+0.129
 Jungle+0.128
 former+0.127
Neg logit contributions
up-0.035
ade-0.022
 they-0.022
ups-0.021
l-0.020
Token put
Token ID1234
Feature activation-1.12e-01

Pos logit contributions
up+0.003
 they+0.002
ade+0.002
ups+0.002
l+0.002
Neg logit contributions
 Australian-0.005
opl-0.005
 habitats-0.005
 Jungle-0.005
 former-0.005
Token it
Token ID340
Feature activation-1.27e-01

Pos logit contributions
up+0.108
ups+0.081
guard+0.078
cakes+0.077
ade+0.073
Neg logit contributions
 Australian-0.285
opl-0.280
 Jungle-0.276
athed-0.262
 ear-0.260
Token in
Token ID287
Feature activation+8.46e-02

Pos logit contributions
up+0.176
 they+0.145
ups+0.143
l+0.140
ade+0.139
Neg logit contributions
 Australian-0.276
opl-0.256
 Jungle-0.249
 former-0.246
 habitats-0.244
Token the
Token ID262
Feature activation-3.46e-02

Pos logit contributions
 Australian+0.224
opl+0.211
 habitats+0.208
 Jungle+0.204
 mastering+0.201
Neg logit contributions
up-0.081
 they-0.069
 it-0.064
ade-0.060
l-0.059
Token to
Token ID284
Feature activation-2.99e-04

Pos logit contributions
up+0.051
 they+0.044
ups+0.041
cakes+0.041
l+0.041
Neg logit contributions
 Australian-0.075
opl-0.070
 Jungle-0.068
 former-0.066
 habitats-0.065
Tokenaster
Token ID1603
Feature activation-1.70e-01

Pos logit contributions
up+0.000
 they+0.000
ups+0.000
ade+0.000
l+0.000
Neg logit contributions
 Australian-0.001
opl-0.001
 Jungle-0.001
 habitats-0.001
 former-0.001
Token.
Token ID13
Feature activation+8.34e-03

Pos logit contributions
up+0.161
 they+0.120
ups+0.114
ade+0.114
l+0.110
Neg logit contributions
 Australian-0.455
opl-0.434
 habitats-0.421
 Jungle-0.421
 former-0.413
Token
Token ID198
Feature activation+2.12e-02

Pos logit contributions
 Australian+0.020
 habitats+0.019
opl+0.018
charged+0.018
 mastering+0.018
Neg logit contributions
up-0.009
 they-0.009
 it-0.008
cakes-0.008
 one-0.007
Token
Token ID198
Feature activation+2.21e-02

Pos logit contributions
 Australian+0.050
opl+0.047
 Jungle+0.045
 habitats+0.045
 mastering+0.044
Neg logit contributions
up-0.028
 they-0.022
ups-0.022
ade-0.022
cakes-0.021
TokenOne
Token ID3198
Feature activation-1.66e-02

Pos logit contributions
 Australian+0.049
opl+0.046
 habitats+0.045
 Jungle+0.045
ses+0.044
Neg logit contributions
up-0.031
 they-0.027
ups-0.025
ade-0.025
l-0.025
Token day
Token ID1110
Feature activation+1.19e-02

Pos logit contributions
up+0.027
ups+0.023
cakes+0.023
ade+0.022
 they+0.022
Neg logit contributions
 Australian-0.033
opl-0.031
 Jungle-0.029
 habitats-0.029
 mastering-0.028
Token and
Token ID290
Feature activation-2.31e-02

Pos logit contributions
 Australian+0.170
opl+0.161
 habitats+0.157
 Jungle+0.156
 mastering+0.154
Neg logit contributions
up-0.056
 they-0.041
ups-0.039
ade-0.039
l-0.037
Token said
Token ID531
Feature activation-2.31e-02

Pos logit contributions
up+0.032
 they+0.026
ups+0.025
ade+0.025
l+0.025
Neg logit contributions
 Australian-0.053
opl-0.049
 habitats-0.048
 Jungle-0.048
 mastering-0.047
Token,
Token ID11
Feature activation-2.52e-02

Pos logit contributions
up+0.028
ups+0.023
ade+0.022
cakes+0.022
head+0.021
Neg logit contributions
 Australian-0.053
 mastering-0.049
opl-0.049
 Jungle-0.048
 habitats-0.048
Token "
Token ID366
Feature activation+5.45e-02

Pos logit contributions
up+0.037
ade+0.030
ups+0.030
 they+0.029
l+0.028
Neg logit contributions
 Australian-0.055
opl-0.052
 Jungle-0.050
 habitats-0.050
 mastering-0.049
TokenF
Token ID37
Feature activation+2.10e-02

Pos logit contributions
 Australian+0.103
opl+0.093
 Jungle+0.090
 habitats+0.089
 mastering+0.084
Neg logit contributions
up-0.097
 they-0.086
ade-0.084
l-0.084
ups-0.083
Tokenaster
Token ID1603
Feature activation-1.70e-01

Pos logit contributions
 Australian+0.037
opl+0.035
 habitats+0.033
 Jungle+0.033
 mastering+0.031
Neg logit contributions
up-0.040
 they-0.035
ups-0.035
l-0.034
ade-0.033
Token,
Token ID11
Feature activation-2.47e-02

Pos logit contributions
up+0.200
 they+0.161
ups+0.155
ade+0.154
l+0.151
Neg logit contributions
 Australian-0.415
opl-0.394
 habitats-0.381
 Jungle-0.381
 mastering-0.373
Token mom
Token ID1995
Feature activation-5.49e-02

Pos logit contributions
up+0.030
 they+0.024
ups+0.023
ade+0.023
l+0.022
Neg logit contributions
 Australian-0.060
opl-0.057
 habitats-0.055
 Jungle-0.055
 mastering-0.054
Tokenmy
Token ID1820
Feature activation-3.72e-02

Pos logit contributions
up+0.067
l+0.056
cakes+0.054
ade+0.054
 they+0.054
Neg logit contributions
 Australian-0.126
opl-0.122
 Jungle-0.116
 mastering-0.113
 habitats-0.113
Token!
Token ID0
Feature activation-5.74e-03

Pos logit contributions
up+0.047
ups+0.038
cakes+0.037
ade+0.037
l+0.037
Neg logit contributions
 Australian-0.087
opl-0.082
 Jungle-0.079
 habitats-0.078
 mastering-0.077
Token Faster
Token ID38996
Feature activation-4.88e-02

Pos logit contributions
up+0.008
 they+0.007
head+0.007
 it+0.007
cakes+0.007
Neg logit contributions
 Australian-0.012
opl-0.012
 habitats-0.012
 mastering-0.011
 Jungle-0.011
TokenHe
Token ID1544
Feature activation+5.09e-02

Pos logit contributions
 Australian+0.042
opl+0.039
 habitats+0.038
 Jungle+0.038
 former+0.037
Neg logit contributions
up-0.029
 they-0.025
ups-0.024
ade-0.024
l-0.023
Token said
Token ID531
Feature activation-2.22e-02

Pos logit contributions
 Australian+0.127
opl+0.122
 Jungle+0.116
 habitats+0.116
 former+0.113
Neg logit contributions
up-0.061
 they-0.050
ups-0.046
ade-0.046
l-0.045
Token,
Token ID11
Feature activation-2.45e-02

Pos logit contributions
up+0.025
ups+0.019
ade+0.019
cakes+0.018
 they+0.018
Neg logit contributions
 Australian-0.054
opl-0.050
 Jungle-0.050
 mastering-0.049
 habitats-0.049
Token "
Token ID366
Feature activation+5.52e-02

Pos logit contributions
up+0.036
ade+0.029
ups+0.029
 they+0.029
l+0.028
Neg logit contributions
 Australian-0.054
opl-0.050
 Jungle-0.049
 habitats-0.049
 mastering-0.048
TokenF
Token ID37
Feature activation+2.22e-02

Pos logit contributions
 Australian+0.109
opl+0.100
 Jungle+0.097
 habitats+0.097
 mastering+0.093
Neg logit contributions
up-0.093
 they-0.081
ade-0.079
ups-0.078
l-0.078
Tokenaster
Token ID1603
Feature activation-1.70e-01

Pos logit contributions
 Australian+0.040
opl+0.038
 habitats+0.036
 Jungle+0.036
 mastering+0.035
Neg logit contributions
up-0.040
 they-0.036
ups-0.036
l-0.034
cakes-0.034
Token,
Token ID11
Feature activation-2.41e-02

Pos logit contributions
up+0.198
 they+0.159
ups+0.152
ade+0.151
l+0.146
Neg logit contributions
 Australian-0.422
opl-0.398
 habitats-0.387
 Jungle-0.385
 former-0.377
Token faster
Token ID5443
Feature activation-3.91e-02

Pos logit contributions
up+0.031
 they+0.025
ups+0.025
ade+0.024
l+0.024
Neg logit contributions
 Australian-0.057
opl-0.053
 Jungle-0.052
 habitats-0.052
 mastering-0.051
Token!"
Token ID2474
Feature activation-6.38e-02

Pos logit contributions
up+0.048
 they+0.039
ade+0.038
ups+0.038
cakes+0.037
Neg logit contributions
 Australian-0.093
opl-0.089
 habitats-0.086
 Jungle-0.085
 mastering-0.084
Token Jane
Token ID12091
Feature activation-4.02e-02

Pos logit contributions
up+0.089
 they+0.080
cakes+0.073
ups+0.072
 it+0.072
Neg logit contributions
 Australian-0.144
opl-0.137
 habitats-0.131
 Jungle-0.130
charged-0.128
Token said
Token ID531
Feature activation-2.24e-02

Pos logit contributions
up+0.052
 they+0.043
ups+0.041
ade+0.040
l+0.040
Neg logit contributions
 Australian-0.096
opl-0.091
 habitats-0.089
 Jungle-0.088
 former-0.086
Token toast
Token ID27805
Feature activation-2.33e-02

Pos logit contributions
up+0.045
ups+0.039
 they+0.038
cakes+0.036
er+0.035
Neg logit contributions
 Australian-0.059
 tail-0.056
opl-0.055
ses-0.054
 Jungle-0.054
Token out
Token ID503
Feature activation+4.47e-02

Pos logit contributions
up+0.027
ups+0.021
head+0.020
 happens+0.020
l+0.020
Neg logit contributions
 Australian-0.052
opl-0.051
 tail-0.048
 former-0.048
 Jungle-0.047
Token of
Token ID286
Feature activation+5.33e-02

Pos logit contributions
 Australian+0.099
opl+0.097
 mastering+0.094
und+0.092
 former+0.091
Neg logit contributions
up-0.052
ups-0.038
cakes-0.038
head-0.037
guard-0.037
Token the
Token ID262
Feature activation-3.20e-02

Pos logit contributions
 Australian+0.134
opl+0.127
 Jungle+0.124
 habitats+0.122
 mastering+0.120
Neg logit contributions
up-0.062
ups-0.047
ade-0.045
l-0.045
 they-0.044
Token to
Token ID284
Feature activation-2.40e-04

Pos logit contributions
up+0.049
 they+0.041
ups+0.040
ade+0.040
l+0.039
Neg logit contributions
 Australian-0.068
opl-0.064
 Jungle-0.061
 habitats-0.061
 former-0.059
Tokenaster
Token ID1603
Feature activation-1.70e-01

Pos logit contributions
up+0.000
ups+0.000
 they+0.000
l+0.000
ade+0.000
Neg logit contributions
 Australian-0.001
opl-0.001
 Jungle-0.001
 habitats-0.001
 mastering-0.001
Token and
Token ID290
Feature activation-1.93e-02

Pos logit contributions
up+0.143
 they+0.096
ade+0.095
ups+0.093
l+0.090
Neg logit contributions
 Australian-0.461
opl-0.452
 habitats-0.430
 Jungle-0.430
 former-0.420
Token put
Token ID1234
Feature activation-1.14e-01

Pos logit contributions
up+0.025
ade+0.020
ups+0.020
l+0.020
head+0.018
Neg logit contributions
 Australian-0.043
 Jungle-0.040
opl-0.039
 mastering-0.039
athed-0.038
Token it
Token ID340
Feature activation-1.28e-01

Pos logit contributions
up+0.121
ups+0.093
guard+0.088
ade+0.083
cakes+0.079
Neg logit contributions
 Australian-0.275
 Jungle-0.262
opl-0.261
 ear-0.261
athed-0.258
Token on
Token ID319
Feature activation+6.14e-02

Pos logit contributions
up+0.194
ups+0.160
 they+0.158
head+0.157
guard+0.152
Neg logit contributions
 Australian-0.262
opl-0.233
 Jungle-0.228
 former-0.226
athed-0.225
Token a
Token ID257
Feature activation-3.92e-02

Pos logit contributions
 Australian+0.163
opl+0.155
 habitats+0.151
 Jungle+0.150
 mastering+0.147
Neg logit contributions
up-0.061
 they-0.047
ups-0.044
ade-0.044
l-0.042
Token there
Token ID612
Feature activation-7.20e-02

Pos logit contributions
up+0.035
ade+0.030
ups+0.029
l+0.028
cakes+0.027
Neg logit contributions
 Australian-0.052
 mastering-0.048
opl-0.047
 Jungle-0.047
 habitats-0.047
Token were
Token ID547
Feature activation+9.44e-02

Pos logit contributions
up+0.082
 they+0.064
ups+0.063
ade+0.062
l+0.060
Neg logit contributions
 Australian-0.180
opl-0.170
 habitats-0.165
 Jungle-0.164
 mastering-0.161
Token lots
Token ID6041
Feature activation+1.22e-02

Pos logit contributions
 Australian+0.185
opl+0.171
 habitats+0.166
 Jungle+0.164
 former+0.161
Neg logit contributions
up-0.157
 they-0.135
ade-0.132
ups-0.131
l-0.129
Token of
Token ID286
Feature activation+5.73e-02

Pos logit contributions
 Australian+0.031
opl+0.031
 Jungle+0.029
 mastering+0.029
 habitats+0.029
Neg logit contributions
up-0.011
 they-0.008
ups-0.008
ade-0.007
head-0.007
Token gl
Token ID1278
Feature activation-2.06e-02

Pos logit contributions
 Australian+0.127
opl+0.119
 habitats+0.115
 Jungle+0.114
 mastering+0.111
Neg logit contributions
up-0.083
 they-0.070
ade-0.068
ups-0.068
l-0.067
Tokenues
Token ID947
Feature activation-1.70e-01

Pos logit contributions
 they+0.033
 it+0.031
 she+0.028
 happens+0.028
 sugar+0.027
Neg logit contributions
 Australian-0.037
opl-0.037
 former-0.035
 habitats-0.035
charged-0.034
Token to
Token ID284
Feature activation+2.77e-03

Pos logit contributions
up+0.169
ade+0.123
ups+0.120
head+0.120
 they+0.116
Neg logit contributions
 Australian-0.424
opl-0.416
 habitats-0.402
 Jungle-0.400
 mastering-0.393
Token choose
Token ID3853
Feature activation-1.24e-02

Pos logit contributions
 Australian+0.007
opl+0.006
 habitats+0.006
 Jungle+0.006
 mastering+0.006
Neg logit contributions
up-0.003
 they-0.003
ups-0.003
ade-0.003
l-0.003
Token from
Token ID422
Feature activation+1.21e-01

Pos logit contributions
up+0.013
head+0.011
ade+0.011
cakes+0.010
ups+0.009
Neg logit contributions
 Australian-0.029
 habitats-0.028
opl-0.028
 former-0.028
 mastering-0.027
Token,
Token ID11
Feature activation-2.52e-02

Pos logit contributions
 Australian+0.340
opl+0.324
 habitats+0.313
 Jungle+0.313
 mastering+0.306
Neg logit contributions
up-0.099
 they-0.078
ups-0.068
 it-0.066
ade-0.066
Token nearly
Token ID3016
Feature activation+2.59e-04

Pos logit contributions
up+0.019
 they+0.018
 it+0.015
ups+0.013
 she+0.013
Neg logit contributions
 Australian-0.072
opl-0.069
 habitats-0.066
charged-0.066
 Jungle-0.066
Token m
Token ID285
Feature activation+3.56e-02

Pos logit contributions
up+0.047
ups+0.039
 they+0.037
head+0.036
l+0.036
Neg logit contributions
 Australian-0.084
opl-0.077
 Jungle-0.075
 former-0.075
 mastering-0.074
Tokenop
Token ID404
Feature activation-3.54e-02

Pos logit contributions
 Australian+0.059
opl+0.058
charged+0.053
 habitats+0.052
 former+0.051
Neg logit contributions
up-0.065
 they-0.060
 it-0.058
ade-0.057
ups-0.056
Token and
Token ID290
Feature activation-1.83e-02

Pos logit contributions
up+0.044
ups+0.033
ade+0.033
l+0.032
 happens+0.032
Neg logit contributions
 Australian-0.078
opl-0.077
charged-0.073
und-0.071
 mastering-0.071
Token a
Token ID257
Feature activation-3.69e-02

Pos logit contributions
up+0.025
l+0.022
head+0.021
ade+0.021
t+0.021
Neg logit contributions
 Australian-0.036
 former-0.033
 tail-0.032
athed-0.032
 mastering-0.032
Token dust
Token ID8977
Feature activation-5.01e-02

Pos logit contributions
up+0.040
 they+0.034
ade+0.031
l+0.030
cakes+0.029
Neg logit contributions
 Australian-0.095
opl-0.091
 habitats-0.090
 Jungle-0.087
 mastering-0.086
Tokenpan
Token ID6839
Feature activation-1.69e-01

Pos logit contributions
up+0.067
ups+0.056
cakes+0.052
ade+0.051
l+0.048
Neg logit contributions
 Australian-0.105
opl-0.100
und-0.097
 tail-0.097
athed-0.096
Token and
Token ID290
Feature activation-1.91e-02

Pos logit contributions
up+0.162
ups+0.115
ade+0.113
head+0.110
 they+0.110
Neg logit contributions
 Australian-0.446
opl-0.428
 habitats-0.417
 mastering-0.407
 Jungle-0.407
Token cleaned
Token ID20750
Feature activation-6.74e-02

Pos logit contributions
up+0.025
l+0.022
ade+0.021
cakes+0.021
head+0.021
Neg logit contributions
 Australian-0.038
 former-0.036
 habitats-0.035
 tail-0.035
ses-0.034
Token up
Token ID510
Feature activation+2.78e-02

Pos logit contributions
up+0.086
cakes+0.070
ade+0.069
guard+0.068
t+0.066
Neg logit contributions
 Australian-0.150
und-0.138
opl-0.138
charged-0.136
 mastering-0.135
Token the
Token ID262
Feature activation-3.19e-02

Pos logit contributions
 Australian+0.066
opl+0.063
 habitats+0.061
 former+0.061
 mastering+0.061
Neg logit contributions
up-0.034
ade-0.025
ups-0.025
head-0.024
cakes-0.024
Token dough
Token ID15756
Feature activation-1.55e-02

Pos logit contributions
up+0.049
 they+0.043
ade+0.040
ups+0.040
cakes+0.040
Neg logit contributions
 Australian-0.067
opl-0.062
 habitats-0.059
 former-0.059
 Jungle-0.059
Token or
Token ID393
Feature activation-4.22e-02

Pos logit contributions
opl+0.127
 Australian+0.126
 mastering+0.120
 habitats+0.118
 former+0.116
Neg logit contributions
up-0.068
cakes-0.060
 they-0.059
head-0.057
ups-0.056
Token it
Token ID340
Feature activation-1.25e-01

Pos logit contributions
up+0.055
ade+0.045
l+0.045
cakes+0.044
 they+0.043
Neg logit contributions
 Australian-0.094
opl-0.088
 former-0.086
 habitats-0.086
 mastering-0.086
Token will
Token ID481
Feature activation-2.32e-02

Pos logit contributions
up+0.144
head+0.112
ade+0.111
l+0.110
 they+0.110
Neg logit contributions
 Australian-0.294
opl-0.278
 habitats-0.266
 Jungle-0.265
 former-0.265
Token look
Token ID804
Feature activation-1.16e-01

Pos logit contributions
up+0.033
ups+0.027
 they+0.026
cakes+0.026
ade+0.026
Neg logit contributions
 Australian-0.050
opl-0.048
 former-0.046
 Jungle-0.045
 mastering-0.045
Token gre
Token ID10536
Feature activation+1.15e-02

Pos logit contributions
up+0.201
guard+0.174
ups+0.173
cakes+0.172
ade+0.169
Neg logit contributions
 Australian-0.205
 former-0.191
opl-0.181
 mastering-0.180
 Jungle-0.178
Tokenasy
Token ID4107
Feature activation-1.69e-01

Pos logit contributions
 Australian+0.021
 former+0.021
 ear+0.021
charged+0.020
 mastering+0.020
Neg logit contributions
 they-0.018
up-0.017
cakes-0.017
ade-0.017
l-0.015
Token."
Token ID526
Feature activation-4.34e-02

Pos logit contributions
up+0.189
 they+0.156
cakes+0.147
head+0.142
l+0.142
Neg logit contributions
 Australian-0.398
opl-0.387
 habitats-0.379
 former-0.368
 mastering-0.364
Token
Token ID220
Feature activation-1.03e-02

Pos logit contributions
up+0.064
 they+0.056
ade+0.053
ups+0.053
l+0.052
Neg logit contributions
 Australian-0.094
opl-0.088
 habitats-0.085
 Jungle-0.084
 mastering-0.082
Token
Token ID198
Feature activation+1.65e-02

Pos logit contributions
up+0.013
 they+0.010
ups+0.010
ade+0.010
l+0.010
Neg logit contributions
 Australian-0.025
opl-0.023
 Jungle-0.022
 habitats-0.022
 mastering-0.022
Token
Token ID198
Feature activation+1.68e-02

Pos logit contributions
 Australian+0.039
opl+0.037
 Jungle+0.035
 habitats+0.035
 mastering+0.034
Neg logit contributions
up-0.022
 they-0.017
ups-0.017
ade-0.017
head-0.016
TokenTim
Token ID14967
Feature activation-6.63e-02

Pos logit contributions
 Australian+0.044
opl+0.043
 habitats+0.042
 Jungle+0.042
 former+0.041
Neg logit contributions
up-0.017
 they-0.014
ups-0.013
ade-0.013
l-0.012
Token her
Token ID607
Feature activation+6.82e-02

Pos logit contributions
 Australian+0.037
opl+0.035
 habitats+0.034
 Jungle+0.034
 former+0.033
Neg logit contributions
up-0.017
 they-0.014
ade-0.013
ups-0.013
 it-0.013
Token little
Token ID1310
Feature activation+5.61e-02

Pos logit contributions
 Australian+0.188
opl+0.184
 Jungle+0.176
 habitats+0.174
 former+0.168
Neg logit contributions
up-0.062
 they-0.055
 it-0.048
ade-0.046
ups-0.046
Token broom
Token ID47085
Feature activation-3.92e-02

Pos logit contributions
 Australian+0.106
opl+0.100
 Jungle+0.094
 habitats+0.094
 mastering+0.092
Neg logit contributions
up-0.100
 they-0.088
ups-0.084
ade-0.083
l-0.081
Token and
Token ID290
Feature activation-2.10e-02

Pos logit contributions
up+0.039
 they+0.030
ade+0.029
ups+0.028
l+0.027
Neg logit contributions
 Australian-0.102
opl-0.097
 habitats-0.095
 Jungle-0.093
 mastering-0.093
Token dust
Token ID8977
Feature activation-4.88e-02

Pos logit contributions
up+0.027
 they+0.023
ups+0.022
ade+0.021
l+0.021
Neg logit contributions
 Australian-0.049
opl-0.047
 habitats-0.045
 Jungle-0.045
 former-0.044
Tokenpan
Token ID6839
Feature activation-1.69e-01

Pos logit contributions
up+0.060
ups+0.047
ade+0.046
cakes+0.046
head+0.044
Neg logit contributions
 Australian-0.106
opl-0.099
 mastering-0.099
 habitats-0.097
und-0.096
Token,
Token ID11
Feature activation-2.54e-02

Pos logit contributions
up+0.161
 they+0.119
ade+0.115
ups+0.115
l+0.110
Neg logit contributions
 Australian-0.451
opl-0.428
 habitats-0.417
 Jungle-0.413
 mastering-0.409
Token trying
Token ID2111
Feature activation-4.49e-02

Pos logit contributions
up+0.030
 they+0.025
ups+0.023
ade+0.023
l+0.022
Neg logit contributions
 Australian-0.063
opl-0.059
 habitats-0.058
 Jungle-0.057
 former-0.056
Token to
Token ID284
Feature activation+1.30e-03

Pos logit contributions
up+0.035
head+0.023
cakes+0.022
ade+0.020
guard+0.020
Neg logit contributions
 Australian-0.126
 mastering-0.120
opl-0.119
 habitats-0.118
 former-0.117
Token get
Token ID651
Feature activation-2.35e-02

Pos logit contributions
 Australian+0.003
opl+0.003
 former+0.003
 mastering+0.003
 Jungle+0.003
Neg logit contributions
up-0.002
 they-0.002
ups-0.002
cakes-0.001
l-0.001
Token all
Token ID477
Feature activation+3.14e-02

Pos logit contributions
up+0.030
cakes+0.026
head+0.025
l+0.025
ade+0.025
Neg logit contributions
 Australian-0.050
 former-0.046
 mastering-0.046
charged-0.045
 winning-0.045
Token white
Token ID2330
Feature activation-1.96e-02

Pos logit contributions
 Australian+0.124
opl+0.117
 Jungle+0.114
 habitats+0.114
 mastering+0.111
Neg logit contributions
up-0.063
 they-0.050
ups-0.049
ade-0.048
l-0.047
Token bread
Token ID8509
Feature activation-2.04e-03

Pos logit contributions
up+0.026
 they+0.022
ade+0.021
ups+0.021
l+0.020
Neg logit contributions
 Australian-0.043
opl-0.041
 habitats-0.040
 Jungle-0.040
 mastering-0.038
Token into
Token ID656
Feature activation+7.31e-02

Pos logit contributions
up+0.002
head+0.002
ups+0.002
ade+0.002
l+0.002
Neg logit contributions
 Australian-0.005
opl-0.005
 habitats-0.005
 Jungle-0.005
 former-0.004
Token the
Token ID262
Feature activation-3.11e-02

Pos logit contributions
 Australian+0.193
opl+0.184
 habitats+0.178
 Jungle+0.176
 mastering+0.173
Neg logit contributions
up-0.073
 they-0.060
ups-0.054
ade-0.053
l-0.053
Token to
Token ID284
Feature activation+1.31e-03

Pos logit contributions
up+0.046
 they+0.039
ups+0.038
ade+0.038
l+0.036
Neg logit contributions
 Australian-0.066
opl-0.063
 habitats-0.061
 mastering-0.060
 Jungle-0.059
Tokenaster
Token ID1603
Feature activation-1.69e-01

Pos logit contributions
 Australian+0.003
opl+0.003
 habitats+0.003
 Jungle+0.003
 mastering+0.003
Neg logit contributions
up-0.002
 they-0.001
ade-0.001
 it-0.001
l-0.001
Token.
Token ID13
Feature activation+1.04e-02

Pos logit contributions
up+0.134
 they+0.092
ups+0.087
ade+0.087
l+0.083
Neg logit contributions
 Australian-0.476
opl-0.457
 habitats-0.443
 Jungle-0.442
 former-0.434
Token She
Token ID1375
Feature activation+6.87e-02

Pos logit contributions
 Australian+0.023
opl+0.022
 habitats+0.021
charged+0.021
 mastering+0.020
Neg logit contributions
up-0.014
 they-0.013
 it-0.012
cakes-0.012
ups-0.012
Token saw
Token ID2497
Feature activation-2.58e-02

Pos logit contributions
 Australian+0.177
opl+0.170
 habitats+0.163
 Jungle+0.162
 former+0.157
Neg logit contributions
up-0.076
 they-0.064
ups-0.058
ade-0.058
 it-0.057
Token the
Token ID262
Feature activation-3.38e-02

Pos logit contributions
up+0.030
ade+0.024
ups+0.023
 they+0.023
cakes+0.022
Neg logit contributions
 Australian-0.062
opl-0.059
 habitats-0.057
 Jungle-0.057
 mastering-0.056
Token glowing
Token ID21377
Feature activation-3.35e-03

Pos logit contributions
up+0.046
 they+0.038
ups+0.036
ade+0.036
l+0.035
Neg logit contributions
 Australian-0.077
opl-0.073
 Jungle-0.070
 habitats-0.070
 former-0.069
Token loved
Token ID6151
Feature activation+9.08e-02

Pos logit contributions
 Australian+0.211
opl+0.199
 habitats+0.192
 Jungle+0.190
 former+0.186
Neg logit contributions
up-0.105
 they-0.090
ups-0.081
ade-0.080
l-0.080
Token her
Token ID607
Feature activation+7.27e-02

Pos logit contributions
 Australian+0.247
opl+0.235
 habitats+0.229
 Jungle+0.228
 mastering+0.224
Neg logit contributions
up-0.083
 they-0.063
ups-0.059
ade-0.059
l-0.056
Token.
Token ID13
Feature activation+7.09e-03

Pos logit contributions
 Australian+0.192
opl+0.180
 habitats+0.174
 Jungle+0.173
 mastering+0.170
Neg logit contributions
up-0.075
 they-0.063
ade-0.055
l-0.055
 it-0.054
Token<|endoftext|>
Token ID50256
Feature activation+7.83e-02

Pos logit contributions
 Australian+0.017
opl+0.016
charged+0.016
 habitats+0.016
 former+0.016
Neg logit contributions
 they-0.008
up-0.008
 it-0.007
 she-0.007
cakes-0.006
TokenOnce
Token ID7454
Feature activation+3.72e-02

Pos logit contributions
 Australian+0.168
opl+0.156
 Jungle+0.152
 habitats+0.150
 mastering+0.145
Neg logit contributions
up-0.118
ups-0.098
 they-0.097
ade-0.096
cakes-0.092
Token upon
Token ID2402
Feature activation+1.05e-01

Pos logit contributions
 Australian+0.056
opl+0.052
 habitats+0.046
 Jungle+0.046
 mastering+0.046
Neg logit contributions
up-0.077
ups-0.068
cakes-0.065
head-0.064
 cleaner-0.063
Token a
Token ID257
Feature activation-3.26e-02

Pos logit contributions
 Australian+0.243
opl+0.224
 Jungle+0.218
 habitats+0.218
 mastering+0.215
Neg logit contributions
up-0.136
ups-0.113
ade-0.108
cakes-0.106
 they-0.103
Token time
Token ID640
Feature activation-1.63e-02

Pos logit contributions
up+0.048
 they+0.040
ups+0.039
ade+0.039
l+0.038
Neg logit contributions
 Australian-0.071
opl-0.066
 habitats-0.064
 Jungle-0.063
 former-0.062
Token there
Token ID612
Feature activation-6.72e-02

Pos logit contributions
up+0.018
ups+0.014
ade+0.013
cakes+0.013
 they+0.013
Neg logit contributions
 Australian-0.041
opl-0.039
 Jungle-0.038
 habitats-0.038
 mastering-0.037
Token was
Token ID373
Feature activation+5.93e-02

Pos logit contributions
up+0.053
 they+0.040
ade+0.036
ups+0.034
l+0.034
Neg logit contributions
 Australian-0.191
opl-0.182
 habitats-0.178
 Jungle-0.177
 former-0.175
Token a
Token ID257
Feature activation-3.20e-02

Pos logit contributions
 Australian+0.153
opl+0.145
 habitats+0.141
 Jungle+0.141
 mastering+0.138
Neg logit contributions
up-0.063
 they-0.050
ade-0.047
ups-0.046
l-0.045
Token minutes
Token ID2431
Feature activation-9.13e-02

Pos logit contributions
 Australian+0.143
opl+0.134
 habitats+0.129
 Jungle+0.129
 mastering+0.127
Neg logit contributions
up-0.102
 they-0.086
ups-0.084
ade-0.083
cakes-0.081
Token,
Token ID11
Feature activation-2.84e-02

Pos logit contributions
up+0.111
ups+0.090
ade+0.083
cakes+0.080
l+0.080
Neg logit contributions
 Australian-0.215
opl-0.203
 mastering-0.201
 Jungle-0.197
 habitats-0.192
Token the
Token ID262
Feature activation-3.34e-02

Pos logit contributions
up+0.034
ade+0.027
ups+0.025
l+0.024
ly+0.023
Neg logit contributions
 Australian-0.070
 mastering-0.064
 Jungle-0.064
 former-0.064
opl-0.063
Token daughter
Token ID4957
Feature activation+7.03e-02

Pos logit contributions
up+0.043
 they+0.035
ups+0.035
ade+0.034
l+0.033
Neg logit contributions
 Australian-0.079
opl-0.074
 habitats-0.072
 Jungle-0.071
 former-0.070
Token calmed
Token ID49566
Feature activation-6.18e-02

Pos logit contributions
 Australian+0.174
opl+0.162
 habitats+0.159
 Jungle+0.159
 mastering+0.156
Neg logit contributions
up-0.079
 they-0.063
ade-0.061
ups-0.061
l-0.059
Token down
Token ID866
Feature activation+1.15e-01

Pos logit contributions
up+0.114
ade+0.104
cakes+0.097
oons+0.096
ups+0.095
Neg logit contributions
 Australian-0.112
 Jungle-0.097
opl-0.097
 ear-0.095
 habitats-0.095
Token and
Token ID290
Feature activation-2.44e-02

Pos logit contributions
 Australian+0.270
opl+0.255
 mastering+0.252
 Jungle+0.249
 former+0.245
Neg logit contributions
up-0.139
ade-0.111
ups-0.105
head-0.101
cakes-0.096
Token the
Token ID262
Feature activation-3.34e-02

Pos logit contributions
up+0.028
ade+0.023
ups+0.022
l+0.021
cakes+0.021
Neg logit contributions
 Australian-0.060
opl-0.054
 Jungle-0.054
 habitats-0.054
 mastering-0.053
Token mom
Token ID1995
Feature activation-6.03e-02

Pos logit contributions
up+0.049
 they+0.041
ade+0.039
ups+0.039
l+0.039
Neg logit contributions
 Australian-0.074
opl-0.069
 habitats-0.067
 Jungle-0.067
 mastering-0.065
Tokenmy
Token ID1820
Feature activation-4.09e-02

Pos logit contributions
up+0.081
ade+0.076
 happens+0.073
head+0.072
 they+0.071
Neg logit contributions
 Australian-0.119
charged-0.110
opl-0.109
 mastering-0.106
 Jungle-0.105
Token asked
Token ID1965
Feature activation-3.91e-02

Pos logit contributions
up+0.052
ade+0.047
cakes+0.045
head+0.045
oons+0.044
Neg logit contributions
 Australian-0.086
athed-0.078
 mastering-0.077
 Jungle-0.076
opl-0.076
Token the
Token ID262
Feature activation-2.82e-02

Pos logit contributions
 Australian+0.166
opl+0.156
 habitats+0.153
 Jungle+0.151
 mastering+0.148
Neg logit contributions
up-0.083
ups-0.064
 they-0.064
ade-0.064
l-0.062
Token sky
Token ID6766
Feature activation-3.14e-02

Pos logit contributions
up+0.037
ade+0.031
 they+0.031
l+0.030
ups+0.030
Neg logit contributions
 Australian-0.065
opl-0.062
 habitats-0.060
 Jungle-0.059
charged-0.059
Token.
Token ID13
Feature activation+5.46e-03

Pos logit contributions
up+0.030
ups+0.021
ade+0.020
head+0.018
cakes+0.018
Neg logit contributions
 Australian-0.082
opl-0.078
 habitats-0.077
 Jungle-0.076
 former-0.076
Token<|endoftext|>
Token ID50256
Feature activation+8.04e-02

Pos logit contributions
 Australian+0.013
opl+0.012
 habitats+0.012
 Jungle+0.012
 mastering+0.011
Neg logit contributions
up-0.007
 they-0.006
ups-0.006
ade-0.006
l-0.006
TokenOnce
Token ID7454
Feature activation+3.59e-02

Pos logit contributions
 Australian+0.175
opl+0.161
 Jungle+0.158
 habitats+0.152
 mastering+0.147
Neg logit contributions
up-0.123
ups-0.102
ly-0.097
ade-0.096
cakes-0.095
Token upon
Token ID2402
Feature activation+1.04e-01

Pos logit contributions
 Australian+0.055
opl+0.051
 Jungle+0.046
 mastering+0.045
 habitats+0.045
Neg logit contributions
up-0.074
ups-0.064
cakes-0.063
head-0.062
 cleaner-0.061
Token a
Token ID257
Feature activation-3.35e-02

Pos logit contributions
 Australian+0.235
opl+0.214
 Jungle+0.209
 habitats+0.207
 mastering+0.204
Neg logit contributions
up-0.142
ups-0.120
cakes-0.112
ade-0.112
 they-0.106
Token time
Token ID640
Feature activation-1.86e-02

Pos logit contributions
up+0.052
 they+0.043
ups+0.043
cakes+0.042
ade+0.041
Neg logit contributions
 Australian-0.071
opl-0.066
 habitats-0.064
 Jungle-0.063
 former-0.061
Token,
Token ID11
Feature activation-1.76e-02

Pos logit contributions
up+0.024
ups+0.019
cakes+0.018
ade+0.017
ly+0.016
Neg logit contributions
 Australian-0.044
opl-0.042
 Jungle-0.040
 mastering-0.040
 habitats-0.040
Token a
Token ID257
Feature activation-3.33e-02

Pos logit contributions
up+0.019
ups+0.015
cakes+0.014
ade+0.013
head+0.012
Neg logit contributions
 Australian-0.047
opl-0.042
 Jungle-0.042
 mastering-0.041
 habitats-0.041
Token little
Token ID1310
Feature activation+6.30e-02

Pos logit contributions
up+0.042
 they+0.035
ade+0.033
ups+0.033
l+0.033
Neg logit contributions
 Australian-0.079
opl-0.075
 habitats-0.072
 Jungle-0.072
 mastering-0.070
Token way
Token ID835
Feature activation+1.64e-04

Pos logit contributions
 Australian+0.144
opl+0.139
 habitats+0.133
 Jungle+0.130
 mastering+0.129
Neg logit contributions
up-0.096
 they-0.087
 it-0.081
l-0.079
 to-0.079
Token back
Token ID736
Feature activation+1.02e-01

Pos logit contributions
 Australian+0.000
opl+0.000
 Jungle+0.000
 mastering+0.000
 habitats+0.000
Neg logit contributions
up-0.000
cakes-0.000
ade-0.000
ups-0.000
head-0.000
Token feeling
Token ID4203
Feature activation+2.11e-02

Pos logit contributions
 Australian+0.251
opl+0.239
 habitats+0.232
 Jungle+0.229
 mastering+0.226
Neg logit contributions
up-0.121
 they-0.103
ups-0.094
ade-0.093
 it-0.092
Token happy
Token ID3772
Feature activation+7.20e-02

Pos logit contributions
 Australian+0.041
opl+0.039
 habitats+0.038
 Jungle+0.037
 former+0.036
Neg logit contributions
up-0.036
 they-0.033
 it-0.031
l-0.030
ade-0.030
Token and
Token ID290
Feature activation-2.06e-02

Pos logit contributions
 Australian+0.205
opl+0.196
 habitats+0.190
 Jungle+0.190
 mastering+0.186
Neg logit contributions
up-0.058
 they-0.043
ups-0.038
ade-0.038
l-0.036
Token content
Token ID2695
Feature activation+1.02e-01

Pos logit contributions
up+0.039
 they+0.038
 it+0.035
 she+0.034
 he+0.034
Neg logit contributions
 Australian-0.035
opl-0.034
 habitats-0.033
 Jungle-0.032
 former-0.030
Token.
Token ID13
Feature activation+9.21e-03

Pos logit contributions
 Australian+0.289
opl+0.275
 habitats+0.269
 Jungle+0.267
 former+0.264
Neg logit contributions
up-0.080
 they-0.054
ups-0.053
ade-0.053
l-0.049
Token<|endoftext|>
Token ID50256
Feature activation+7.51e-02

Pos logit contributions
 Australian+0.024
charged+0.024
opl+0.024
 habitats+0.023
 mastering+0.023
Neg logit contributions
 they-0.008
 it-0.007
up-0.007
 she-0.007
 he-0.006
TokenOnce
Token ID7454
Feature activation+3.48e-02

Pos logit contributions
 Australian+0.167
opl+0.156
 Jungle+0.151
 habitats+0.150
 mastering+0.146
Neg logit contributions
up-0.107
 they-0.089
ups-0.087
ade-0.086
l-0.084
Token upon
Token ID2402
Feature activation+1.01e-01

Pos logit contributions
 Australian+0.057
opl+0.052
 Jungle+0.049
 habitats+0.048
 mastering+0.048
Neg logit contributions
up-0.070
ups-0.060
cakes-0.059
head-0.058
ade-0.056
Token a
Token ID257
Feature activation-3.78e-02

Pos logit contributions
 Australian+0.252
opl+0.237
 habitats+0.230
 Jungle+0.230
 mastering+0.225
Neg logit contributions
up-0.115
 they-0.089
ups-0.089
ade-0.087
cakes-0.084
Token.
Token ID13
Feature activation+9.30e-03

Pos logit contributions
 Australian+0.160
opl+0.151
 habitats+0.149
 Jungle+0.148
 former+0.146
Neg logit contributions
up-0.031
 they-0.021
ade-0.017
ups-0.017
l-0.016
Token He
Token ID679
Feature activation+5.52e-02

Pos logit contributions
 Australian+0.016
opl+0.016
 Jungle+0.015
 habitats+0.015
 mastering+0.014
Neg logit contributions
up-0.017
ups-0.015
ade-0.015
 they-0.015
cakes-0.014
Token wants
Token ID3382
Feature activation-2.20e-02

Pos logit contributions
 Australian+0.139
 former+0.133
opl+0.132
charged+0.130
 habitats+0.126
Neg logit contributions
up-0.066
 they-0.055
l-0.055
ups-0.049
 it-0.049
Token his
Token ID465
Feature activation+6.29e-02

Pos logit contributions
up+0.029
oons+0.026
ade+0.025
le+0.025
head+0.025
Neg logit contributions
 habitats-0.044
 Jungle-0.043
 winning-0.043
 Australian-0.043
 jungle-0.041
Token balls
Token ID11333
Feature activation+4.76e-03

Pos logit contributions
 Australian+0.132
opl+0.124
 habitats+0.119
 Jungle+0.118
 mastering+0.117
Neg logit contributions
up-0.098
 they-0.085
ade-0.081
ups-0.081
l-0.079
Token back
Token ID736
Feature activation+1.03e-01

Pos logit contributions
 Australian+0.012
opl+0.011
 habitats+0.011
 Jungle+0.011
 mastering+0.011
Neg logit contributions
up-0.005
ups-0.004
 they-0.004
head-0.004
cakes-0.004
Token.
Token ID13
Feature activation+9.49e-03

Pos logit contributions
 Australian+0.283
opl+0.272
 habitats+0.265
 Jungle+0.262
 mastering+0.259
Neg logit contributions
up-0.088
cakes-0.059
ups-0.058
ade-0.057
head-0.056
Token
Token ID198
Feature activation+1.76e-02

Pos logit contributions
 Australian+0.015
opl+0.014
 Jungle+0.014
 habitats+0.013
 jungle+0.013
Neg logit contributions
up-0.019
ade-0.017
ups-0.016
cakes-0.016
oons-0.016
Token
Token ID198
Feature activation+1.76e-02

Pos logit contributions
 Australian+0.032
opl+0.031
 Jungle+0.029
und+0.027
 jungle+0.027
Neg logit contributions
up-0.031
ly-0.027
cakes-0.027
head-0.026
ups-0.025
Token"
Token ID1
Feature activation-1.22e-02

Pos logit contributions
 Australian+0.038
opl+0.036
 Jungle+0.034
und+0.033
 mastering+0.032
Neg logit contributions
up-0.024
cakes-0.021
ade-0.020
head-0.020
ly-0.019
TokenHey
Token ID10814
Feature activation+9.71e-03

Pos logit contributions
up+0.022
ade+0.019
ups+0.018
 they+0.018
cakes+0.018
Neg logit contributions
 Australian-0.022
opl-0.019
 Jungle-0.019
 habitats-0.019
athed-0.018
Token snow
Token ID6729
Feature activation-1.97e-02

Pos logit contributions
 Australian+0.117
opl+0.112
 habitats+0.110
 Jungle+0.108
charged+0.105
Neg logit contributions
up-0.072
 they-0.065
 it-0.061
ade-0.061
l-0.059
Tokening
Token ID278
Feature activation+5.46e-03

Pos logit contributions
up+0.028
ups+0.021
l+0.020
 happens+0.020
ade+0.020
Neg logit contributions
 Australian-0.042
opl-0.042
 tail-0.038
 Jungle-0.038
und-0.038
Token!
Token ID0
Feature activation-5.18e-03

Pos logit contributions
 Australian+0.011
opl+0.010
athed+0.010
und+0.009
 mastering+0.009
Neg logit contributions
up-0.009
cakes-0.007
ups-0.007
ade-0.007
oons-0.006
Token She
Token ID1375
Feature activation+6.80e-02

Pos logit contributions
up+0.010
ups+0.009
guard+0.009
ade+0.009
l+0.009
Neg logit contributions
 Australian-0.009
 Jungle-0.008
opl-0.008
 habitats-0.007
 mastering-0.007
Token put
Token ID1234
Feature activation-1.07e-01

Pos logit contributions
 Australian+0.142
opl+0.125
 habitats+0.124
 Jungle+0.122
 mastering+0.121
Neg logit contributions
up-0.104
cakes-0.088
ups-0.086
head-0.085
ade-0.084
Token on
Token ID319
Feature activation+6.25e-02

Pos logit contributions
up+0.182
ups+0.167
guard+0.154
cakes+0.147
ade+0.142
Neg logit contributions
 Australian-0.204
 ear-0.186
 tail-0.181
opl-0.178
und-0.173
Token her
Token ID607
Feature activation+6.65e-02

Pos logit contributions
 Australian+0.128
opl+0.110
 tail+0.106
 mastering+0.105
 ear+0.104
Neg logit contributions
up-0.100
guard-0.090
ups-0.086
cakes-0.084
head-0.081
Token coat
Token ID13209
Feature activation-2.87e-02

Pos logit contributions
 Australian+0.136
opl+0.126
 habitats+0.120
 Jungle+0.119
 former+0.119
Neg logit contributions
up-0.104
ups-0.090
cakes-0.086
 they-0.085
head-0.084
Token,
Token ID11
Feature activation-2.58e-02

Pos logit contributions
up+0.033
ups+0.026
ade+0.026
l+0.024
guard+0.023
Neg logit contributions
 Australian-0.069
opl-0.066
 habitats-0.063
 Jungle-0.063
 mastering-0.062
Token hat
Token ID6877
Feature activation-4.60e-02

Pos logit contributions
up+0.042
guard+0.036
ade+0.036
ups+0.036
cakes+0.035
Neg logit contributions
 Australian-0.049
 tail-0.046
 hind-0.043
 tails-0.043
 mastering-0.042
Token,
Token ID11
Feature activation-2.58e-02

Pos logit contributions
up+0.054
ups+0.043
ade+0.042
cakes+0.039
guard+0.039
Neg logit contributions
 Australian-0.110
opl-0.105
 Jungle-0.100
 mastering-0.099
 habitats-0.098
Token you
Token ID345
Feature activation-3.97e-02

Pos logit contributions
 Australian+0.032
opl+0.031
 habitats+0.030
 Jungle+0.030
 former+0.029
Neg logit contributions
up-0.019
ade-0.015
 they-0.015
ups-0.015
l-0.015
Token?"
Token ID1701
Feature activation-7.67e-02

Pos logit contributions
up+0.033
 they+0.024
ups+0.023
ade+0.023
l+0.021
Neg logit contributions
 Australian-0.111
opl-0.106
 habitats-0.103
 Jungle-0.103
 mastering-0.101
Token Lisa
Token ID15378
Feature activation-5.90e-02

Pos logit contributions
up+0.116
ade+0.101
ups+0.095
cakes+0.091
ly+0.090
Neg logit contributions
 Australian-0.162
opl-0.150
 Jungle-0.149
 habitats-0.149
 mastering-0.145
Token asks
Token ID7893
Feature activation-3.03e-02

Pos logit contributions
up+0.103
ade+0.096
cakes+0.092
head+0.090
ups+0.089
Neg logit contributions
 Australian-0.107
opl-0.100
 Jungle-0.096
 habitats-0.094
 mastering-0.091
Token.
Token ID13
Feature activation+9.65e-03

Pos logit contributions
up+0.040
ade+0.037
head+0.033
cakes+0.031
 salt+0.029
Neg logit contributions
 winning-0.060
 ear-0.060
 mastering-0.060
 Australian-0.059
 age-0.058
Token
Token ID198
Feature activation+1.77e-02

Pos logit contributions
 Australian+0.018
 Jungle+0.017
opl+0.017
 habitats+0.016
 mastering+0.016
Neg logit contributions
up-0.017
ade-0.015
ups-0.014
ly-0.013
cakes-0.013
Token
Token ID198
Feature activation+1.78e-02

Pos logit contributions
 Australian+0.035
opl+0.034
 Jungle+0.032
 habitats+0.031
 mastering+0.030
Neg logit contributions
up-0.028
ly-0.024
head-0.024
cakes-0.024
ade-0.024
TokenS
Token ID50
Feature activation-1.22e-02

Pos logit contributions
 Australian+0.042
opl+0.039
 Jungle+0.039
 habitats+0.038
 mastering+0.037
Neg logit contributions
up-0.021
ade-0.018
cakes-0.017
 they-0.017
head-0.017
Tokenara
Token ID3301
Feature activation-5.89e-02

Pos logit contributions
up+0.013
 they+0.012
head+0.011
ups+0.011
 it+0.011
Neg logit contributions
 Australian-0.029
opl-0.029
 habitats-0.028
 Jungle-0.028
 mastering-0.026
Token and
Token ID290
Feature activation-1.59e-02

Pos logit contributions
up+0.075
 they+0.061
ups+0.059
ade+0.059
l+0.057
Neg logit contributions
 Australian-0.139
opl-0.131
 habitats-0.127
 Jungle-0.127
 mastering-0.124
Token Tom
Token ID4186
Feature activation-5.14e-02

Pos logit contributions
up+0.020
 they+0.017
ups+0.016
ade+0.016
l+0.016
Neg logit contributions
 Australian-0.038
opl-0.036
 habitats-0.035
 Jungle-0.034
 mastering-0.034
Token hopped
Token ID45230
Feature activation+1.75e-02

Pos logit contributions
 Australian+0.006
opl+0.006
 habitats+0.005
 Jungle+0.005
 mastering+0.005
Neg logit contributions
up-0.003
 they-0.003
ups-0.002
ade-0.002
l-0.002
Token and
Token ID290
Feature activation-1.61e-02

Pos logit contributions
 Australian+0.037
opl+0.033
 Jungle+0.033
athed+0.032
 habitats+0.032
Neg logit contributions
up-0.026
ups-0.022
cakes-0.021
head-0.021
ade-0.021
Token skipped
Token ID26684
Feature activation+5.47e-02

Pos logit contributions
up+0.023
 they+0.022
 it+0.020
ade+0.019
ups+0.019
Neg logit contributions
 Australian-0.035
opl-0.035
 habitats-0.033
 Jungle-0.032
 former-0.032
Token his
Token ID465
Feature activation+7.03e-02

Pos logit contributions
 Australian+0.132
opl+0.125
 habitats+0.121
 Jungle+0.120
 former+0.118
Neg logit contributions
up-0.068
 they-0.057
ade-0.053
l-0.052
ups-0.052
Token way
Token ID835
Feature activation+1.53e-03

Pos logit contributions
 Australian+0.152
opl+0.151
 mastering+0.144
 habitats+0.142
und+0.140
Neg logit contributions
 they-0.092
up-0.089
 it-0.086
 to-0.086
 next-0.085
Token across
Token ID1973
Feature activation+6.27e-02

Pos logit contributions
 Australian+0.003
opl+0.003
 Jungle+0.003
 mastering+0.003
 habitats+0.003
Neg logit contributions
up-0.002
cakes-0.002
ups-0.002
ade-0.002
head-0.002
Token the
Token ID262
Feature activation-2.88e-02

Pos logit contributions
 Australian+0.163
opl+0.155
 Jungle+0.151
 habitats+0.149
 mastering+0.147
Neg logit contributions
up-0.066
ups-0.050
cakes-0.049
ade-0.048
 they-0.048
Token finish
Token ID5461
Feature activation+1.76e-02

Pos logit contributions
up+0.053
ade+0.046
 they+0.045
l+0.045
 one+0.045
Neg logit contributions
 Australian-0.047
opl-0.047
 mastering-0.046
 habitats-0.045
charged-0.044
Token line
Token ID1627
Feature activation-3.89e-02

Pos logit contributions
 Australian+0.035
opl+0.033
 Jungle+0.032
 habitats+0.032
 former+0.031
Neg logit contributions
up-0.028
 they-0.024
cakes-0.023
ups-0.023
l-0.023
Token.
Token ID13
Feature activation+1.40e-02

Pos logit contributions
up+0.036
cakes+0.026
head+0.025
ade+0.025
ups+0.025
Neg logit contributions
 Australian-0.099
opl-0.097
 Jungle-0.093
 habitats-0.092
 mastering-0.091
Token
Token ID198
Feature activation+2.75e-02

Pos logit contributions
 Australian+0.033
opl+0.032
 habitats+0.032
charged+0.032
 mastering+0.031
Neg logit contributions
 they-0.015
up-0.015
 it-0.014
 he-0.013
head-0.012
Token and
Token ID290
Feature activation-2.05e-02

Pos logit contributions
 Australian+0.190
opl+0.181
 habitats+0.179
 mastering+0.177
 Jungle+0.176
Neg logit contributions
up-0.065
ade-0.047
l-0.044
ups-0.044
cakes-0.042
Token said
Token ID531
Feature activation-2.01e-02

Pos logit contributions
up+0.032
 they+0.030
 it+0.027
 she+0.027
ups+0.026
Neg logit contributions
 Australian-0.044
opl-0.043
 habitats-0.041
 Jungle-0.040
 former-0.039
Token,
Token ID11
Feature activation-1.86e-02

Pos logit contributions
up+0.026
ade+0.021
ups+0.021
head+0.019
cakes+0.019
Neg logit contributions
 Australian-0.045
 mastering-0.042
 Jungle-0.042
opl-0.042
 habitats-0.041
Token �
Token ID6184
Feature activation+1.98e-02

Pos logit contributions
up+0.021
 they+0.020
 it+0.018
l+0.017
ups+0.017
Neg logit contributions
 Australian-0.045
opl-0.043
 habitats-0.042
 Jungle-0.041
 former-0.041
Token�
Token ID95
Feature activation-2.92e-02

Pos logit contributions
 Australian+0.032
opl+0.030
 habitats+0.029
 Jungle+0.028
 former+0.027
Neg logit contributions
up-0.040
 they-0.036
ade-0.035
cakes-0.034
ups-0.034
Token€
Token ID26391
Feature activation+3.17e-02

Pos logit contributions
up+0.058
ups+0.049
 they+0.049
ade+0.048
cakes+0.048
Neg logit contributions
 Australian-0.049
 habitats-0.044
opl-0.044
 Jungle-0.042
 former-0.042
Token�
Token ID129
Feature activation-8.42e-03

Pos logit contributions
 Australian+0.065
opl+0.061
 habitats+0.059
 Jungle+0.059
 former+0.058
Neg logit contributions
up-0.050
 they-0.044
ups-0.042
ade-0.041
l-0.041
Token�
Token ID241
Feature activation+1.38e-02

Pos logit contributions
up+0.016
ups+0.013
 it+0.013
l+0.013
er+0.013
Neg logit contributions
 Australian-0.013
charged-0.012
 prisoners-0.012
 habitats-0.012
 former-0.012
TokenIt
Token ID1026
Feature activation-6.05e-02

Pos logit contributions
 Australian+0.031
opl+0.029
 habitats+0.028
 Jungle+0.028
 former+0.027
Neg logit contributions
up-0.020
 they-0.017
ups-0.016
ade-0.016
l-0.016
Token is
Token ID318
Feature activation+2.62e-02

Pos logit contributions
up+0.062
 they+0.048
ade+0.046
ups+0.046
l+0.044
Neg logit contributions
 Australian-0.157
opl-0.149
 Jungle-0.144
 habitats-0.144
 mastering-0.142
Token lovely
Token ID14081
Feature activation+5.70e-02

Pos logit contributions
 Australian+0.063
opl+0.059
 habitats+0.058
 Jungle+0.057
 mastering+0.056
Neg logit contributions
up-0.033
 they-0.028
ups-0.026
ade-0.026
l-0.025
Token with
Token ID351
Feature activation+1.54e-02

Pos logit contributions
up+0.020
ups+0.016
ade+0.016
cakes+0.016
 they+0.015
Neg logit contributions
 Australian-0.037
opl-0.035
 Jungle-0.034
 habitats-0.033
 mastering-0.033
Token joy
Token ID8716
Feature activation+4.72e-03

Pos logit contributions
 Australian+0.035
opl+0.033
 Jungle+0.032
 habitats+0.032
 mastering+0.031
Neg logit contributions
up-0.022
 they-0.019
 it-0.018
ade-0.017
l-0.017
Token.
Token ID13
Feature activation+1.14e-02

Pos logit contributions
 Australian+0.012
opl+0.012
 habitats+0.012
 mastering+0.012
 Jungle+0.012
Neg logit contributions
up-0.004
ade-0.003
ups-0.003
head-0.003
cakes-0.003
Token From
Token ID3574
Feature activation+1.02e-01

Pos logit contributions
 Australian+0.025
opl+0.023
 habitats+0.023
 Jungle+0.023
charged+0.022
Neg logit contributions
up-0.016
 they-0.015
 it-0.013
ups-0.013
l-0.013
Token then
Token ID788
Feature activation-8.57e-02

Pos logit contributions
 Australian+0.254
opl+0.237
 habitats+0.233
 Jungle+0.232
 mastering+0.225
Neg logit contributions
up-0.118
ade-0.094
ups-0.092
cakes-0.091
 they-0.089
Token on
Token ID319
Feature activation+6.27e-02

Pos logit contributions
up+0.067
 they+0.065
 it+0.056
ade+0.049
 she+0.049
Neg logit contributions
 Australian-0.237
 habitats-0.235
 former-0.232
 Jungle-0.222
opl-0.222
Token,
Token ID11
Feature activation-2.24e-02

Pos logit contributions
 Australian+0.168
opl+0.160
 habitats+0.155
 Jungle+0.155
 mastering+0.152
Neg logit contributions
up-0.061
 they-0.044
ups-0.044
ade-0.043
l-0.041
Token he
Token ID339
Feature activation-2.08e-03

Pos logit contributions
up+0.024
ade+0.018
cakes+0.017
ups+0.017
l+0.017
Neg logit contributions
 Australian-0.058
opl-0.056
 mastering-0.054
 Jungle-0.053
 habitats-0.052
Token always
Token ID1464
Feature activation+3.45e-02

Pos logit contributions
up+0.003
 they+0.002
 it+0.002
ade+0.002
ups+0.002
Neg logit contributions
 Australian-0.005
opl-0.005
 habitats-0.005
 Jungle-0.005
charged-0.005
Token remembered
Token ID12086
Feature activation+4.48e-02

Pos logit contributions
 Australian+0.083
opl+0.079
 habitats+0.076
 Jungle+0.075
 former+0.074
Neg logit contributions
up-0.043
 they-0.038
 it-0.034
ups-0.033
ade-0.032
Token the
Token ID262
Feature activation-3.07e-02

Pos logit contributions
 Australian+0.132
opl+0.127
 Jungle+0.123
 habitats+0.122
 mastering+0.120
Neg logit contributions
up-0.031
 they-0.023
ups-0.020
 it-0.019
ade-0.019
Token what
Token ID644
Feature activation-6.86e-02

Pos logit contributions
up+0.059
ade+0.048
cakes+0.046
head+0.044
ups+0.043
Neg logit contributions
 Australian-0.099
opl-0.095
 habitats-0.094
 Jungle-0.091
 mastering-0.091
Token "
Token ID366
Feature activation+5.32e-02

Pos logit contributions
up+0.059
ade+0.040
ups+0.039
cakes+0.037
 they+0.036
Neg logit contributions
 Australian-0.191
opl-0.181
 Jungle-0.178
 habitats-0.177
 mastering-0.173
Tokenfamous
Token ID45143
Feature activation-2.26e-03

Pos logit contributions
 Australian+0.109
opl+0.102
 habitats+0.098
 Jungle+0.098
 former+0.094
Neg logit contributions
up-0.084
 they-0.072
ade-0.070
ups-0.070
l-0.067
Token"
Token ID1
Feature activation-1.71e-02

Pos logit contributions
up+0.003
ade+0.002
ups+0.002
cakes+0.002
 they+0.002
Neg logit contributions
 Australian-0.005
opl-0.005
 Jungle-0.005
 habitats-0.005
 mastering-0.004
Token meant
Token ID4001
Feature activation+3.59e-03

Pos logit contributions
up+0.022
ade+0.017
ups+0.016
 they+0.016
head+0.015
Neg logit contributions
 Australian-0.039
opl-0.036
 habitats-0.036
 Jungle-0.035
 mastering-0.034
Token,
Token ID11
Feature activation-2.30e-02

Pos logit contributions
 Australian+0.009
 mastering+0.008
 Jungle+0.008
opl+0.008
 winning+0.008
Neg logit contributions
up-0.004
ade-0.003
ups-0.003
cakes-0.003
l-0.003
Token but
Token ID475
Feature activation-5.38e-02

Pos logit contributions
up+0.025
 they+0.019
ade+0.018
ups+0.018
l+0.017
Neg logit contributions
 Australian-0.059
opl-0.056
 habitats-0.055
 Jungle-0.054
 mastering-0.053
Token they
Token ID484
Feature activation-5.05e-02

Pos logit contributions
up+0.019
 they+0.010
ups+0.005
ade+0.004
l+0.004
Neg logit contributions
 Australian-0.177
opl-0.171
 habitats-0.166
 Jungle-0.165
 former-0.163
Token liked
Token ID8288
Feature activation+7.66e-02

Pos logit contributions
up+0.054
 they+0.053
l+0.043
 it+0.042
 said+0.042
Neg logit contributions
opl-0.134
 Australian-0.133
 Jungle-0.123
charged-0.122
 former-0.120
Token the
Token ID262
Feature activation-3.38e-02

Pos logit contributions
 Australian+0.207
opl+0.196
 habitats+0.191
 Jungle+0.190
 mastering+0.187
Neg logit contributions
up-0.072
 they-0.055
ups-0.052
ade-0.051
l-0.049
Token sound
Token ID2128
Feature activation-2.36e-02

Pos logit contributions
up+0.050
 they+0.044
l+0.042
 one+0.042
ade+0.041
Neg logit contributions
 Australian-0.073
charged-0.068
opl-0.067
 mastering-0.066
 former-0.065
Token happy
Token ID3772
Feature activation+7.65e-02

Pos logit contributions
 they+0.041
up+0.037
 se+0.034
 one+0.034
 it+0.033
Neg logit contributions
 Australian-0.075
opl-0.075
 habitats-0.072
und-0.071
charged-0.069
Token family
Token ID1641
Feature activation+9.31e-02

Pos logit contributions
 Australian+0.169
opl+0.159
 habitats+0.150
 Jungle+0.149
charged+0.148
Neg logit contributions
up-0.109
 they-0.105
 it-0.093
ade-0.091
head-0.089
Token.
Token ID13
Feature activation+2.17e-02

Pos logit contributions
 Australian+0.293
opl+0.277
 habitats+0.272
 former+0.269
 Jungle+0.269
Neg logit contributions
up-0.049
 they-0.039
 it-0.025
ups-0.024
ade-0.021
Token<|endoftext|>
Token ID50256
Feature activation+8.66e-02

Pos logit contributions
 Australian+0.053
opl+0.050
 mastering+0.048
 habitats+0.048
charged+0.048
Neg logit contributions
up-0.024
 they-0.023
 it-0.020
l-0.019
ups-0.019
TokenTim
Token ID14967
Feature activation-6.42e-02

Pos logit contributions
 Australian+0.195
opl+0.183
 habitats+0.178
 Jungle+0.177
 former+0.174
Neg logit contributions
up-0.119
 they-0.100
ade-0.096
ups-0.096
l-0.094
Token and
Token ID290
Feature activation-1.39e-02

Pos logit contributions
up+0.050
 they+0.034
ade+0.033
ups+0.031
l+0.030
Neg logit contributions
 Australian-0.183
opl-0.174
 habitats-0.171
 Jungle-0.171
 former-0.171
Token Sue
Token ID26089
Feature activation-3.07e-02

Pos logit contributions
up+0.019
ups+0.016
ade+0.016
cakes+0.015
l+0.015
Neg logit contributions
 Australian-0.032
opl-0.030
 Jungle-0.029
 habitats-0.029
 mastering-0.028
Token were
Token ID547
Feature activation+9.54e-02

Pos logit contributions
up+0.043
 they+0.038
 it+0.035
l+0.034
ade+0.034
Neg logit contributions
 Australian-0.068
opl-0.065
 former-0.065
 habitats-0.064
 Jungle-0.063
Token brother
Token ID3956
Feature activation+4.32e-02

Pos logit contributions
 Australian+0.177
 habitats+0.173
opl+0.173
 Jungle+0.166
 mastering+0.163
Neg logit contributions
 they-0.159
up-0.156
 to-0.148
 one-0.143
 next-0.143
Token and
Token ID290
Feature activation-1.43e-02

Pos logit contributions
 Australian+0.118
opl+0.112
 habitats+0.109
 Jungle+0.109
 mastering+0.107
Neg logit contributions
up-0.039
ade-0.027
ups-0.026
 they-0.026
head-0.025
Token sister
Token ID6621
Feature activation+6.50e-02

Pos logit contributions
up+0.017
 they+0.014
ups+0.013
ade+0.013
l+0.013
Neg logit contributions
 Australian-0.035
opl-0.033
 habitats-0.032
 Jungle-0.032
 former-0.032
Token be
Token ID307
Feature activation+1.91e-03

Pos logit contributions
up+0.031
 they+0.027
 it+0.025
 to+0.024
ups+0.024
Neg logit contributions
 Australian-0.058
opl-0.056
 Jungle-0.054
 habitats-0.054
 mastering-0.052
Token gentle
Token ID10296
Feature activation+2.33e-02

Pos logit contributions
 Australian+0.004
 habitats+0.004
opl+0.004
 Jungle+0.004
 former+0.004
Neg logit contributions
up-0.003
 they-0.003
ups-0.003
 it-0.002
 one-0.002
Token and
Token ID290
Feature activation-1.70e-02

Pos logit contributions
 Australian+0.051
opl+0.049
 habitats+0.048
charged+0.047
 tail+0.047
Neg logit contributions
up-0.030
ade-0.027
cakes-0.024
l-0.023
oons-0.023
Token respectful
Token ID25923
Feature activation-3.20e-03

Pos logit contributions
up+0.026
 they+0.024
ups+0.022
 it+0.021
 to+0.021
Neg logit contributions
 Australian-0.037
opl-0.037
 habitats-0.033
 Jungle-0.033
und-0.033
Token next
Token ID1306
Feature activation-5.70e-02

Pos logit contributions
up+0.003
ade+0.002
cakes+0.002
l+0.002
 they+0.002
Neg logit contributions
 Australian-0.008
opl-0.008
 habitats-0.008
 Jungle-0.008
 mastering-0.008
Token time
Token ID640
Feature activation-2.04e-02

Pos logit contributions
up+0.091
 they+0.078
cakes+0.076
l+0.075
ade+0.074
Neg logit contributions
 Australian-0.116
opl-0.107
 habitats-0.106
 Jungle-0.104
 former-0.100
Token."
Token ID526
Feature activation-4.36e-02

Pos logit contributions
up+0.020
ade+0.015
cakes+0.014
ups+0.013
l+0.013
Neg logit contributions
 Australian-0.054
opl-0.051
 habitats-0.051
 Jungle-0.050
 mastering-0.049
Token\n
Token ID198
Feature activation+1.91e-02

Pos logit contributions
up+0.075
ade+0.064
ups+0.061
ly+0.060
cakes+0.059
Neg logit contributions
 Australian-0.085
 habitats-0.077
 Jungle-0.077
opl-0.076
 former-0.072
Token\n
Token ID198
Feature activation+1.94e-02

Pos logit contributions
 Australian+0.041
opl+0.038
 Jungle+0.036
 habitats+0.036
 mastering+0.034
Neg logit contributions
up-0.029
ade-0.024
cakes-0.024
head-0.023
ups-0.023
TokenThe
Token ID464
Feature activation-2.02e-02

Pos logit contributions
 Australian+0.040
opl+0.036
 Jungle+0.036
 habitats+0.035
 mastering+0.034
Neg logit contributions
up-0.030
ade-0.026
cakes-0.025
ly-0.025
head-0.025
Token lady
Token ID10846
Feature activation+6.70e-02

Pos logit contributions
up+0.016
 they+0.012
l+0.011
ups+0.011
ade+0.010
Neg logit contributions
 Australian-0.059
opl-0.056
 habitats-0.054
 Jungle-0.054
charged-0.053
Token his
Token ID465
Feature activation+6.66e-02

Pos logit contributions
 Australian+0.196
opl+0.183
 habitats+0.176
 Jungle+0.176
 mastering+0.174
Neg logit contributions
up-0.113
 they-0.098
 it-0.090
ups-0.089
ade-0.089
Token garage
Token ID15591
Feature activation+4.87e-03

Pos logit contributions
 Australian+0.139
 mastering+0.132
opl+0.127
athed+0.126
charged+0.125
Neg logit contributions
up-0.100
 they-0.093
 to-0.090
 one-0.087
ups-0.083
Token.
Token ID13
Feature activation+1.52e-02

Pos logit contributions
 Australian+0.014
opl+0.014
 habitats+0.013
 Jungle+0.013
 former+0.013
Neg logit contributions
up-0.003
 they-0.002
ade-0.002
ups-0.002
l-0.002
Token He
Token ID679
Feature activation+6.06e-02

Pos logit contributions
 Australian+0.030
opl+0.028
 habitats+0.027
 Jungle+0.027
 mastering+0.026
Neg logit contributions
up-0.025
 they-0.022
ups-0.021
ade-0.021
l-0.021
Token likes
Token ID7832
Feature activation+5.38e-02

Pos logit contributions
 Australian+0.140
opl+0.128
charged+0.124
 Jungle+0.122
 habitats+0.122
Neg logit contributions
up-0.087
 they-0.076
l-0.069
ups-0.069
 it-0.067
Token to
Token ID284
Feature activation+1.96e-03

Pos logit contributions
 Australian+0.149
opl+0.142
 Jungle+0.137
 habitats+0.137
 mastering+0.135
Neg logit contributions
up-0.047
 they-0.037
ups-0.033
ade-0.032
l-0.031
Token fix
Token ID4259
Feature activation-5.66e-02

Pos logit contributions
 Australian+0.004
opl+0.004
 habitats+0.004
 Jungle+0.004
 mastering+0.004
Neg logit contributions
up-0.003
 they-0.003
ade-0.002
ups-0.002
l-0.002
Token things
Token ID1243
Feature activation-2.00e-02

Pos logit contributions
up+0.060
 they+0.046
ade+0.045
ups+0.045
cakes+0.043
Neg logit contributions
 Australian-0.145
opl-0.138
 habitats-0.134
 Jungle-0.133
 mastering-0.131
Token.
Token ID13
Feature activation+1.39e-02

Pos logit contributions
up+0.018
ade+0.012
ups+0.012
head+0.012
cakes+0.012
Neg logit contributions
 Australian-0.053
opl-0.052
 Jungle-0.049
 mastering-0.049
 habitats-0.049
Token Lily
Token ID20037
Feature activation-5.18e-02

Pos logit contributions
 Australian+0.026
opl+0.024
 habitats+0.024
 Jungle+0.024
 former+0.023
Neg logit contributions
up-0.025
ade-0.021
ups-0.021
 they-0.021
l-0.020
Token and
Token ID290
Feature activation-1.66e-02

Pos logit contributions
up+0.036
 they+0.025
ups+0.022
ade+0.020
l+0.019
Neg logit contributions
 Australian-0.155
opl-0.147
 habitats-0.144
 Jungle-0.144
 former-0.144
Token.
Token ID13
Feature activation+1.00e-02

Pos logit contributions
up+0.008
 they+0.007
 it+0.007
 again+0.006
l+0.006
Neg logit contributions
 Australian-0.028
opl-0.026
 habitats-0.025
 Jungle-0.025
 former-0.025
Token When
Token ID1649
Feature activation+6.20e-03

Pos logit contributions
 Australian+0.026
opl+0.025
 former+0.024
 mastering+0.024
charged+0.024
Neg logit contributions
 they-0.010
up-0.010
 it-0.009
 she-0.009
 to-0.008
Token Boo
Token ID21458
Feature activation+1.90e-02

Pos logit contributions
 Australian+0.016
opl+0.016
 habitats+0.015
 former+0.015
 Jungle+0.015
Neg logit contributions
 they-0.007
 it-0.007
up-0.006
 she-0.006
 he-0.005
Token left
Token ID1364
Feature activation+2.03e-02

Pos logit contributions
 Australian+0.047
opl+0.044
 Jungle+0.043
 habitats+0.043
 mastering+0.042
Neg logit contributions
up-0.022
ups-0.017
ade-0.017
 they-0.017
cakes-0.017
Token,
Token ID11
Feature activation-2.77e-02

Pos logit contributions
 Australian+0.058
opl+0.055
 former+0.054
 habitats+0.053
 Jungle+0.052
Neg logit contributions
 they-0.016
up-0.015
 it-0.015
 she-0.013
 again-0.013
Token the
Token ID262
Feature activation-3.15e-02

Pos logit contributions
 they+0.021
up+0.019
 it+0.018
 she+0.017
 he+0.016
Neg logit contributions
 Australian-0.082
opl-0.080
 habitats-0.077
 former-0.076
 Jungle-0.075
Token town
Token ID3240
Feature activation+9.59e-02

Pos logit contributions
up+0.042
 next+0.041
 they+0.038
 one+0.037
guard+0.037
Neg logit contributions
 Australian-0.071
opl-0.068
 habitats-0.067
und-0.066
 Jungle-0.065
Token became
Token ID2627
Feature activation+1.05e-01

Pos logit contributions
 Australian+0.242
opl+0.229
 habitats+0.223
 Jungle+0.222
 mastering+0.218
Neg logit contributions
up-0.105
 they-0.082
ade-0.080
ups-0.079
l-0.076
Token sad
Token ID6507
Feature activation+4.25e-02

Pos logit contributions
 Australian+0.249
opl+0.249
 habitats+0.237
 Jungle+0.234
 mastering+0.224
Neg logit contributions
up-0.129
 they-0.117
 it-0.106
ups-0.105
 one-0.102
Token and
Token ID290
Feature activation-1.92e-02

Pos logit contributions
 Australian+0.103
 habitats+0.098
opl+0.098
 former+0.096
 mastering+0.096
Neg logit contributions
up-0.048
ups-0.036
ade-0.036
cakes-0.035
l-0.033
Token quiet
Token ID5897
Feature activation+1.16e-02

Pos logit contributions
up+0.027
 they+0.025
 it+0.023
ups+0.022
l+0.022
Neg logit contributions
 Australian-0.043
opl-0.041
 Jungle-0.039
 habitats-0.039
 mastering-0.038
Token boy
Token ID2933
Feature activation+5.60e-02

Pos logit contributions
 Australian+0.137
opl+0.130
 habitats+0.124
 Jungle+0.123
 mastering+0.122
Neg logit contributions
up-0.084
 they-0.074
ade-0.069
l-0.068
head-0.068
Token.
Token ID13
Feature activation+4.88e-03

Pos logit contributions
 Australian+0.086
opl+0.079
 habitats+0.075
 Jungle+0.075
 mastering+0.072
Neg logit contributions
up-0.118
 they-0.103
ade-0.103
ups-0.103
cakes-0.102
Token He
Token ID679
Feature activation+6.08e-02

Pos logit contributions
 Australian+0.011
opl+0.010
 former+0.010
charged+0.010
 mastering+0.010
Neg logit contributions
up-0.007
 they-0.006
 it-0.006
head-0.006
 he-0.006
Token had
Token ID550
Feature activation+4.22e-03

Pos logit contributions
opl+0.124
 Australian+0.122
 former+0.116
 Jungle+0.115
 mastering+0.114
Neg logit contributions
up-0.097
 they-0.088
l-0.084
 it-0.084
 he-0.080
Token no
Token ID645
Feature activation-8.36e-02

Pos logit contributions
 Australian+0.011
opl+0.011
 Jungle+0.010
 habitats+0.010
 mastering+0.010
Neg logit contributions
up-0.004
 they-0.004
ups-0.003
ade-0.003
 it-0.003
Token one
Token ID530
Feature activation-1.15e-01

Pos logit contributions
up+0.128
 one+0.119
 they+0.113
ade+0.111
 it+0.108
Neg logit contributions
 Australian-0.174
opl-0.161
 mastering-0.157
 Jungle-0.154
und-0.153
Token to
Token ID284
Feature activation+7.00e-03

Pos logit contributions
up+0.131
 they+0.111
ade+0.103
ups+0.099
 it+0.098
Neg logit contributions
 Australian-0.291
opl-0.274
 habitats-0.266
 former-0.263
 Jungle-0.263
Token play
Token ID711
Feature activation+3.25e-02

Pos logit contributions
 Australian+0.014
opl+0.013
 habitats+0.013
 Jungle+0.013
 mastering+0.013
Neg logit contributions
up-0.011
 they-0.010
ade-0.009
ups-0.009
l-0.009
Token with
Token ID351
Feature activation+1.44e-02

Pos logit contributions
 Australian+0.087
opl+0.083
 habitats+0.081
 Jungle+0.080
 former+0.079
Neg logit contributions
up-0.032
 they-0.026
ade-0.024
 it-0.023
ups-0.022
Token and
Token ID290
Feature activation-1.85e-02

Pos logit contributions
 Australian+0.044
opl+0.042
 Jungle+0.041
 former+0.040
 habitats+0.040
Neg logit contributions
up-0.009
 they-0.007
 it-0.007
l-0.006
ade-0.006
Token was
Token ID373
Feature activation+5.66e-02

Pos logit contributions
up+0.021
 they+0.020
 it+0.018
ups+0.017
l+0.017
Neg logit contributions
 Australian-0.046
opl-0.043
 Jungle-0.042
 habitats-0.041
charged-0.041
Token\n
Token ID198
Feature activation+2.40e-02

Pos logit contributions
 Australian+0.025
opl+0.023
 Jungle+0.022
 habitats+0.022
 mastering+0.022
Neg logit contributions
up-0.019
 they-0.015
ade-0.015
ups-0.015
l-0.015
Token\n
Token ID198
Feature activation+2.46e-02

Pos logit contributions
 Australian+0.051
opl+0.048
 Jungle+0.045
 habitats+0.045
 mastering+0.043
Neg logit contributions
up-0.036
ly-0.031
cakes-0.031
head-0.029
ups-0.029
Token"
Token ID1
Feature activation-9.35e-03

Pos logit contributions
 Australian+0.066
opl+0.064
 habitats+0.062
 Jungle+0.062
 former+0.061
Neg logit contributions
up-0.023
 they-0.018
ups-0.017
ade-0.016
l-0.016
TokenAnna
Token ID31160
Feature activation-4.97e-02

Pos logit contributions
up+0.017
ups+0.015
ade+0.015
ly+0.015
l+0.015
Neg logit contributions
 Australian-0.017
 Jungle-0.015
opl-0.014
 habitats-0.014
athed-0.014
Token,
Token ID11
Feature activation-1.56e-02

Pos logit contributions
up+0.050
 they+0.038
ups+0.037
ade+0.036
l+0.035
Neg logit contributions
 Australian-0.131
opl-0.124
 Jungle-0.121
 habitats-0.120
 mastering-0.118
Token look
Token ID804
Feature activation-1.20e-01

Pos logit contributions
up+0.021
ups+0.017
ade+0.017
l+0.016
cakes+0.016
Neg logit contributions
 Australian-0.036
opl-0.033
 Jungle-0.033
 habitats-0.032
 mastering-0.032
Token,
Token ID11
Feature activation-1.53e-02

Pos logit contributions
up+0.199
ups+0.169
cakes+0.166
guard+0.162
ade+0.162
Neg logit contributions
 Australian-0.235
opl-0.217
 Jungle-0.207
 mastering-0.206
 habitats-0.199
Token a
Token ID257
Feature activation-3.72e-02

Pos logit contributions
up+0.018
ups+0.014
ade+0.013
cakes+0.013
l+0.013
Neg logit contributions
 Australian-0.038
opl-0.035
 Jungle-0.035
 habitats-0.034
 mastering-0.034
Token crocod
Token ID37565
Feature activation+2.99e-02

Pos logit contributions
up+0.046
 they+0.039
ade+0.038
l+0.037
ups+0.036
Neg logit contributions
 Australian-0.089
opl-0.084
 habitats-0.082
 mastering-0.080
 Jungle-0.079
Tokenile
Token ID576
Feature activation+3.98e-02

Pos logit contributions
 Australian+0.046
opl+0.042
 habitats+0.040
 Jungle+0.040
 mastering+0.038
Neg logit contributions
up-0.063
 they-0.056
ups-0.055
ade-0.055
l-0.054
Token!"
Token ID2474
Feature activation-5.92e-02

Pos logit contributions
 Australian+0.096
opl+0.091
 Jungle+0.088
 habitats+0.087
 mastering+0.085
Neg logit contributions
up-0.049
 they-0.039
ups-0.038
ade-0.038
cakes-0.037
Token skills
Token ID4678
Feature activation-3.90e-02

Pos logit contributions
 Australian+0.147
opl+0.136
 Jungle+0.133
 habitats+0.132
athed+0.130
Neg logit contributions
up-0.093
 they-0.082
ade-0.077
ups-0.077
 it-0.076
Token.
Token ID13
Feature activation+2.24e-03

Pos logit contributions
up+0.036
head+0.027
cakes+0.027
ade+0.026
ups+0.026
Neg logit contributions
 Australian-0.100
opl-0.098
 mastering-0.094
 Jungle-0.093
 winning-0.091
Token She
Token ID1375
Feature activation+6.80e-02

Pos logit contributions
 Australian+0.004
opl+0.004
 habitats+0.004
 Jungle+0.004
 former+0.003
Neg logit contributions
up-0.004
guard-0.004
ups-0.004
head-0.003
l-0.003
Token wanted
Token ID2227
Feature activation+1.45e-03

Pos logit contributions
 Australian+0.161
opl+0.150
 habitats+0.147
 Jungle+0.146
 mastering+0.143
Neg logit contributions
up-0.087
 they-0.071
ups-0.069
ade-0.069
l-0.067
Token to
Token ID284
Feature activation+2.23e-03

Pos logit contributions
 Australian+0.004
 habitats+0.004
 winning+0.004
 mastering+0.004
opl+0.004
Neg logit contributions
up-0.001
l-0.001
head-0.001
ade-0.001
ly-0.001
Token see
Token ID766
Feature activation-1.08e-01

Pos logit contributions
 Australian+0.005
opl+0.005
 habitats+0.004
 Jungle+0.004
 former+0.004
Neg logit contributions
up-0.003
 they-0.003
ade-0.003
ups-0.003
l-0.003
Token if
Token ID611
Feature activation-8.37e-02

Pos logit contributions
up+0.164
ade+0.141
head+0.139
cakes+0.137
ups+0.135
Neg logit contributions
 Australian-0.219
opl-0.207
 mastering-0.199
 Jungle-0.197
 habitats-0.193
Token she
Token ID673
Feature activation+4.86e-03

Pos logit contributions
up+0.078
ade+0.059
l+0.054
oons+0.053
ups+0.053
Neg logit contributions
 Australian-0.222
 Jungle-0.205
 habitats-0.205
opl-0.202
 mastering-0.201
Token could
Token ID714
Feature activation+1.56e-02

Pos logit contributions
 Australian+0.012
opl+0.012
 habitats+0.011
 Jungle+0.011
 former+0.011
Neg logit contributions
up-0.005
 they-0.004
ups-0.004
ade-0.004
l-0.004
Token be
Token ID307
Feature activation+6.30e-04

Pos logit contributions
 Australian+0.033
opl+0.031
 habitats+0.030
 Jungle+0.030
 former+0.029
Neg logit contributions
up-0.024
 they-0.020
ade-0.020
ups-0.020
l-0.019
Token successful
Token ID4388
Feature activation+5.38e-02

Pos logit contributions
 Australian+0.001
opl+0.001
 habitats+0.001
 Jungle+0.001
 former+0.001
Neg logit contributions
up-0.001
 they-0.001
ups-0.001
 it-0.001
ade-0.001
Token one
Token ID530
Feature activation-1.19e-01

Pos logit contributions
 Australian+0.171
opl+0.161
 habitats+0.157
 Jungle+0.156
 former+0.153
Neg logit contributions
up-0.058
 they-0.046
 it-0.043
l-0.042
ade-0.041
Token leg
Token ID1232
Feature activation+2.22e-02

Pos logit contributions
up+0.196
 they+0.169
ups+0.165
cakes+0.161
ade+0.160
Neg logit contributions
 Australian-0.235
opl-0.221
 Jungle-0.210
 habitats-0.206
 mastering-0.205
Token for
Token ID329
Feature activation+2.28e-02

Pos logit contributions
 Australian+0.065
opl+0.061
 habitats+0.061
 Jungle+0.060
 former+0.060
Neg logit contributions
up-0.016
 they-0.012
ade-0.010
ups-0.010
 it-0.010
Token a
Token ID257
Feature activation-3.64e-02

Pos logit contributions
 Australian+0.062
opl+0.058
 habitats+0.057
 Jungle+0.056
 former+0.056
Neg logit contributions
up-0.020
 it-0.019
 they-0.018
 one-0.017
l-0.015
Token few
Token ID1178
Feature activation+6.53e-02

Pos logit contributions
up+0.048
 they+0.040
ups+0.038
ade+0.038
l+0.037
Neg logit contributions
 Australian-0.084
opl-0.080
 habitats-0.077
 Jungle-0.077
 mastering-0.075
Token seconds
Token ID4201
Feature activation-1.27e-01

Pos logit contributions
 Australian+0.149
opl+0.140
 habitats+0.135
 Jungle+0.135
 former+0.132
Neg logit contributions
up-0.090
 they-0.074
ups-0.073
ade-0.072
l-0.070
Token,
Token ID11
Feature activation-2.49e-02

Pos logit contributions
up+0.121
ade+0.086
cakes+0.085
ups+0.080
l+0.077
Neg logit contributions
 Australian-0.333
opl-0.329
 Jungle-0.316
 mastering-0.314
 habitats-0.312
Token then
Token ID788
Feature activation-8.12e-02

Pos logit contributions
up+0.026
ade+0.019
ups+0.019
 they+0.019
l+0.019
Neg logit contributions
 Australian-0.065
opl-0.061
 Jungle-0.060
 habitats-0.059
 mastering-0.059
Token he
Token ID339
Feature activation+1.89e-04

Pos logit contributions
up+0.092
ups+0.071
ade+0.071
cakes+0.071
 they+0.069
Neg logit contributions
 Australian-0.197
opl-0.185
 Jungle-0.183
 mastering-0.181
 habitats-0.180
Token fell
Token ID3214
Feature activation-7.20e-02

Pos logit contributions
 Australian+0.000
opl+0.000
 habitats+0.000
 Jungle+0.000
 mastering+0.000
Neg logit contributions
up-0.000
 they-0.000
ups-0.000
ade-0.000
cakes-0.000
Token down
Token ID866
Feature activation+1.13e-01

Pos logit contributions
up+0.080
 they+0.064
ups+0.060
ade+0.060
l+0.060
Neg logit contributions
 Australian-0.184
opl-0.173
 habitats-0.170
 Jungle-0.170
 former-0.165
Token Bob
Token ID5811
Feature activation+2.85e-02

Pos logit contributions
opl+0.012
 Australian+0.012
charged+0.012
 former+0.011
 mastering+0.011
Neg logit contributions
 they-0.007
up-0.007
cakes-0.007
 it-0.006
 to-0.006
Token said
Token ID531
Feature activation-1.70e-02

Pos logit contributions
 Australian+0.076
opl+0.073
 former+0.071
 habitats+0.071
 Jungle+0.070
Neg logit contributions
up-0.029
 they-0.024
ade-0.021
ups-0.021
 it-0.021
Token,
Token ID11
Feature activation-2.14e-02

Pos logit contributions
up+0.021
ups+0.017
ade+0.016
l+0.015
cakes+0.015
Neg logit contributions
 Australian-0.038
 Jungle-0.035
 mastering-0.035
 winning-0.035
 habitats-0.035
Token "
Token ID366
Feature activation+5.61e-02

Pos logit contributions
up+0.035
ups+0.029
ade+0.028
l+0.026
cakes+0.026
Neg logit contributions
 Australian-0.044
 Jungle-0.040
opl-0.040
 habitats-0.039
 mastering-0.039
TokenI
Token ID40
Feature activation-1.13e-02

Pos logit contributions
 Australian+0.121
opl+0.106
 habitats+0.105
 Jungle+0.104
 jungle+0.099
Neg logit contributions
up-0.088
 they-0.074
l-0.074
ups-0.073
ade-0.072
Token made
Token ID925
Feature activation-1.13e-01

Pos logit contributions
up+0.018
 they+0.016
ade+0.015
ups+0.015
l+0.015
Neg logit contributions
 Australian-0.023
opl-0.022
 habitats-0.021
 Jungle-0.021
 mastering-0.020
Token this
Token ID428
Feature activation-1.08e-01

Pos logit contributions
up+0.136
 they+0.109
ups+0.108
ade+0.108
l+0.107
Neg logit contributions
 Australian-0.266
opl-0.252
 Jungle-0.242
 mastering-0.241
 habitats-0.236
Token elevator
Token ID20932
Feature activation-4.68e-02

Pos logit contributions
up+0.161
 they+0.135
ups+0.133
ade+0.131
l+0.129
Neg logit contributions
 Australian-0.231
opl-0.219
 Jungle-0.208
 habitats-0.207
 mastering-0.205
Token.
Token ID13
Feature activation+5.31e-03

Pos logit contributions
up+0.057
l+0.046
ade+0.046
ups+0.045
 they+0.045
Neg logit contributions
 Australian-0.106
opl-0.104
 Jungle-0.099
 mastering-0.098
 habitats-0.097
Token It
Token ID632
Feature activation-5.20e-02

Pos logit contributions
 Australian+0.012
opl+0.012
 habitats+0.011
charged+0.011
 mastering+0.011
Neg logit contributions
up-0.007
 they-0.006
head-0.006
 it-0.006
cakes-0.006
Token goes
Token ID2925
Feature activation+2.72e-02

Pos logit contributions
up+0.063
 they+0.051
ups+0.050
ade+0.050
l+0.049
Neg logit contributions
 Australian-0.123
opl-0.117
 Jungle-0.112
 habitats-0.110
 mastering-0.109