TOP ACTIVATIONS
MAX = 0.098

, ow !" Sue w ailed . She held her knee
. " Of course Emma ", smiled her mom .
me back my blocks !" she said . " They are
I found a key !" she whispered . " Maybe it
Give me the spider !" he said . But
! Give it back !" she shouted . Ben
I will break her ," he said . Tom
! Ow ! Ow !" she cried as she stepped on
touch the v ase ," she warned . " It could
It belongs to me !" she shouted . Max
Give me your gum !" she said . She tried to
it , you two !" she said . " Why are
I 'm sorry ," the clerk said , " we don
coming to get you !" she said . She climbed the
own statues , Ben !" she suggested . "
This is my spider ," she said . Max
, I hurt myself !" she sob bed .
Stop that right now !" she said . " You are
, Tim my ," his dad said . " I 'll
Don 't worry ," she said . " I have an

MIN ACTIVATIONS
MIN = -0.091

day on , the successful frog continued to swim and have
Hello , can we be friends ?" The other
Do you want to be friends ?" Lily nodded her head
Do you want to be friends ?" The girl smiled and
Do you want to join us ?" Sam said
that day on , Nem o knew that even though there
explore . Will you join us ?" Lu cy
you want to be my friend ?" The fly was happy
Do you want to join us ?" Sara asked .
, " Can we be friends and play together again next
From that day on , Bob always used his big ears
why don 't you join us and try to forget about
, the mole and the duck became good friends and they
Do you want to be friends ?
music . Can we be friends ?" Tom smiled and said
, can you whistle for me ?" The bird whist led
that day on , the cat wasn 't so lazy anymore
you want to swim with me ?" The mole said ,
to play and have fun . <|endoftext|> Once upon a time
Do you want to join us and play with us ?"

INTERVAL 0.051 - 0.098
CONTAINS 1.012%

a candy bar and started to chew it .
and pressed it . Suddenly , the lab started shaking and
what is that ?" Ben asks . " I
all her secrets . One day , Lily was feeling gr
couldn 't find a way to measure them !

INTERVAL 0.004 - 0.051
CONTAINS 40.172%

shore . She hugged him and checked if he was okay
Alex was so happy to have found a
store . Then they sne aked to the kitchen . They
of you , so you don 't get soaked !"
wouldn 't find out . But later that night , Lily

INTERVAL -0.043 - 0.004
CONTAINS 57.979%

and say , " You are my friend . I like
The moral of the story is that sharing is
weak . May wanted to make the dress look
but he was too scared . He told his friend Lucy
there was a wise old grandfather . He liked to tell

INTERVAL -0.091 - -0.043
CONTAINS 0.838%

air on his face . One day the bald
everyone had fun and they all became friends . <|endoftext|> Once
room in the palace . One day , Lily
, Fl uffy . Fl uffy was happy and loved to
things for his curious mind . <|endoftext|> John was three years
Token,
Token ID11
Feature activation+4.14e-02

Pos logit contributions
 stool+0.070
cloth+0.064
 counter+0.063
box+0.060
 clerk+0.060
Neg logit contributions
zo-0.039
 adventures-0.039
 rewarding-0.037
 sharks-0.037
 seals-0.035
Token ow
Token ID12334
Feature activation+3.32e-02

Pos logit contributions
 stool+0.108
box+0.101
coat+0.098
amas+0.097
 counter+0.094
Neg logit contributions
 sharks-0.031
 aliens-0.030
 amazing-0.030
 extraordinary-0.029
 anyone-0.028
Token!"
Token ID2474
Feature activation+4.90e-02

Pos logit contributions
 stool+0.078
cloth+0.072
 counter+0.070
box+0.069
 clerk+0.066
Neg logit contributions
zo-0.039
 adventures-0.038
 rewarding-0.036
 sharks-0.035
 hunters-0.034
Token Sue
Token ID26089
Feature activation+6.43e-02

Pos logit contributions
 stool+0.100
box+0.086
cloth+0.085
 counter+0.078
 container+0.076
Neg logit contributions
 adventures-0.077
 culture-0.074
 Earth-0.073
zo-0.071
 villains-0.071
Token w
Token ID266
Feature activation+3.70e-02

Pos logit contributions
 stool+0.151
 counter+0.125
box+0.124
cloth+0.120
pron+0.120
Neg logit contributions
zo-0.090
 villains-0.071
 adventures-0.069
minded-0.069
 whist-0.068
Tokenailed
Token ID6255
Feature activation+9.79e-02

Pos logit contributions
 stool+0.079
 counter+0.065
 chair+0.061
cloth+0.061
 knob+0.057
Neg logit contributions
zo-0.061
 villains-0.056
 habitats-0.055
 culture-0.055
 aliens-0.054
Token.
Token ID13
Feature activation+4.91e-02

Pos logit contributions
 stool+0.285
box+0.259
pron+0.250
 counter+0.245
amas+0.238
Neg logit contributions
minded-0.056
 adventures-0.055
 anymore-0.055
 about-0.050
 hunters-0.050
Token She
Token ID1375
Feature activation+3.99e-02

Pos logit contributions
 stool+0.099
cloth+0.090
box+0.087
 container+0.082
itten+0.081
Neg logit contributions
 Earth-0.072
 Friendship-0.068
 habitats-0.066
 adventures-0.066
 culture-0.065
Token held
Token ID2714
Feature activation+2.60e-02

Pos logit contributions
 stool+0.080
 counter+0.065
box+0.063
cloth+0.061
itten+0.060
Neg logit contributions
zo-0.068
 villains-0.064
 aliens-0.062
 planets-0.062
 seals-0.062
Token her
Token ID607
Feature activation+1.38e-02

Pos logit contributions
 stool+0.050
 counter+0.042
cloth+0.040
box+0.040
 wra+0.037
Neg logit contributions
 Earth-0.042
 aliens-0.042
 adventures-0.041
 planets-0.041
minded-0.041
Token knee
Token ID10329
Feature activation+4.83e-02

Pos logit contributions
 stool+0.020
 counter+0.016
cloth+0.016
box+0.015
amas+0.014
Neg logit contributions
 adventures-0.027
zo-0.027
 sailors-0.027
 culture-0.027
 targets-0.027
Token.
Token ID13
Feature activation+5.91e-02

Pos logit contributions
 stool+0.045
itten+0.041
 wra+0.039
aman+0.038
Uh+0.037
Neg logit contributions
 adventures-0.020
minded-0.018
 rewarding-0.017
 extraordinary-0.017
 sailors-0.017
Token "
Token ID366
Feature activation+2.47e-02

Pos logit contributions
 stool+0.112
itten+0.108
cloth+0.093
pit+0.090
 wra+0.089
Neg logit contributions
 Earth-0.095
 Friendship-0.094
minded-0.090
 culture-0.090
 Jupiter-0.089
TokenOf
Token ID5189
Feature activation+2.28e-02

Pos logit contributions
 stool+0.051
cloth+0.043
 counter+0.041
box+0.039
 chair+0.039
Neg logit contributions
zo-0.040
 culture-0.040
minded-0.039
 Friendship-0.038
 rewarding-0.038
Token course
Token ID1781
Feature activation+2.19e-02

Pos logit contributions
 stool+0.052
cloth+0.045
 counter+0.042
box+0.041
 chair+0.041
Neg logit contributions
zo-0.033
 culture-0.033
 habitats-0.031
 galaxies-0.031
 planets-0.031
Token Emma
Token ID18966
Feature activation+3.75e-02

Pos logit contributions
 stool+0.043
cloth+0.037
 counter+0.035
itten+0.034
box+0.034
Neg logit contributions
zo-0.035
 adventures-0.035
 rewarding-0.035
 culture-0.035
 aliens-0.034
Token",
Token ID1600
Feature activation+9.61e-02

Pos logit contributions
 stool+0.081
cloth+0.067
 counter+0.064
itten+0.062
 sponge+0.060
Neg logit contributions
zo-0.057
 adventures-0.056
minded-0.055
 culture-0.054
 rewarding-0.054
Token smiled
Token ID13541
Feature activation+3.22e-02

Pos logit contributions
 stool+0.208
itten+0.186
pit+0.177
cloth+0.170
coat+0.170
Neg logit contributions
 adventures-0.126
zo-0.123
 Friendship-0.118
 sailors-0.116
 rewarding-0.116
Token her
Token ID607
Feature activation+4.08e-02

Pos logit contributions
 stool+0.071
cloth+0.060
 counter+0.058
box+0.055
 chair+0.054
Neg logit contributions
zo-0.049
minded-0.047
 culture-0.046
 adventures-0.046
 galaxies-0.045
Token mom
Token ID1995
Feature activation+5.40e-02

Pos logit contributions
 stool+0.067
cloth+0.060
 counter+0.057
itten+0.054
aman+0.052
Neg logit contributions
zo-0.072
 sailors-0.072
 Earth-0.071
 aliens-0.071
 adventures-0.070
Token.
Token ID13
Feature activation+3.63e-02

Pos logit contributions
 stool+0.126
cloth+0.107
 counter+0.103
 wra+0.098
box+0.097
Neg logit contributions
zo-0.074
minded-0.068
 adventures-0.063
 sailors-0.063
 rewarding-0.061
Token
Token ID198
Feature activation+9.84e-03

Pos logit contributions
 stool+0.068
itten+0.059
cloth+0.057
 wra+0.052
 knob+0.051
Neg logit contributions
 Friendship-0.064
 Earth-0.061
minded-0.061
 culture-0.060
 aliens-0.059
Token me
Token ID502
Feature activation-3.44e-03

Pos logit contributions
minded+0.009
zo+0.008
 adventures+0.008
 culture+0.008
 seals+0.008
Neg logit contributions
 stool-0.013
cloth-0.011
 counter-0.010
itten-0.010
 sponge-0.010
Token back
Token ID736
Feature activation-1.18e-03

Pos logit contributions
zo+0.003
minded+0.003
 sailors+0.003
 diversity+0.003
 extraordinary+0.003
Neg logit contributions
 stool-0.008
 counter-0.007
cloth-0.007
 wra-0.007
coat-0.007
Token my
Token ID616
Feature activation+1.24e-02

Pos logit contributions
 adventures+0.001
minded+0.001
 culture+0.001
zo+0.001
 habitats+0.001
Neg logit contributions
 stool-0.003
cloth-0.003
 counter-0.003
itten-0.003
 lid-0.003
Token blocks
Token ID7021
Feature activation+4.58e-02

Pos logit contributions
 stool+0.020
cloth+0.018
itten+0.016
 counter+0.016
coat+0.015
Neg logit contributions
 adventures-0.024
 planets-0.024
 aliens-0.023
 culture-0.023
 Earth-0.023
Token!"
Token ID2474
Feature activation+5.62e-02

Pos logit contributions
 stool+0.114
 counter+0.096
cloth+0.093
itten+0.092
aman+0.091
Neg logit contributions
 rewarding-0.044
 adventures-0.042
zo-0.040
 someday-0.038
 enjoyable-0.038
Token she
Token ID673
Feature activation+9.55e-02

Pos logit contributions
 stool+0.126
box+0.109
cloth+0.108
itten+0.104
coat+0.099
Neg logit contributions
 adventures-0.075
 rewarding-0.072
 villains-0.071
 Friendship-0.071
 culture-0.070
Token said
Token ID531
Feature activation+7.54e-02

Pos logit contributions
 stool+0.231
itten+0.192
cloth+0.186
box+0.186
 counter+0.183
Neg logit contributions
zo-0.129
 villains-0.109
 sailors-0.105
 seals-0.102
 aliens-0.100
Token.
Token ID13
Feature activation+5.12e-02

Pos logit contributions
 stool+0.191
itten+0.160
 counter+0.159
cloth+0.158
box+0.158
Neg logit contributions
minded-0.074
zo-0.073
 sailors-0.068
 Friendship-0.066
 Earth-0.065
Token "
Token ID366
Feature activation+8.91e-03

Pos logit contributions
 stool+0.121
cloth+0.111
itten+0.103
 knob+0.103
box+0.102
Neg logit contributions
 Friendship-0.061
 Earth-0.056
 Friends-0.054
 Peace-0.051
 Sharing-0.050
TokenThey
Token ID2990
Feature activation+1.67e-02

Pos logit contributions
 stool+0.020
cloth+0.017
 counter+0.016
box+0.016
 chair+0.016
Neg logit contributions
zo-0.013
 culture-0.013
 habitats-0.012
 galaxies-0.012
minded-0.012
Token are
Token ID389
Feature activation-1.15e-04

Pos logit contributions
 stool+0.036
cloth+0.029
 counter+0.029
itten+0.027
 chair+0.027
Neg logit contributions
zo-0.027
 culture-0.026
 planets-0.025
 adventures-0.025
 rewarding-0.025
Token I
Token ID314
Feature activation+2.14e-02

Pos logit contributions
 stool+0.033
cloth+0.028
 counter+0.027
itten+0.026
box+0.026
Neg logit contributions
 culture-0.023
zo-0.022
 planets-0.022
 aliens-0.022
 Friendship-0.022
Token found
Token ID1043
Feature activation+1.42e-02

Pos logit contributions
 stool+0.046
cloth+0.039
 counter+0.038
 chair+0.035
box+0.035
Neg logit contributions
zo-0.033
 culture-0.032
 adventures-0.031
 rewarding-0.031
 galaxies-0.031
Token a
Token ID257
Feature activation+2.14e-02

Pos logit contributions
 stool+0.032
cloth+0.027
 counter+0.026
coat+0.024
itten+0.024
Neg logit contributions
zo-0.020
 culture-0.020
 planets-0.020
 aliens-0.020
 adventures-0.019
Token key
Token ID1994
Feature activation+5.70e-02

Pos logit contributions
cloth+0.033
 stool+0.033
itten+0.026
amas+0.026
 counter+0.025
Neg logit contributions
 Earth-0.041
 culture-0.041
 Jupiter-0.040
 planet-0.039
 adventures-0.039
Token!"
Token ID2474
Feature activation+4.45e-02

Pos logit contributions
 stool+0.142
cloth+0.132
 sponge+0.124
 checkout+0.121
 pitcher+0.121
Neg logit contributions
 Earth-0.046
 someday-0.046
 aliens-0.045
zo-0.045
 planets-0.045
Token she
Token ID673
Feature activation+9.49e-02

Pos logit contributions
 stool+0.090
box+0.082
cloth+0.079
amas+0.076
 counter+0.074
Neg logit contributions
 adventures-0.065
 Jupiter-0.063
 Earth-0.061
 sailors-0.061
 Fin-0.061
Token whispered
Token ID25029
Feature activation+5.93e-02

Pos logit contributions
 stool+0.210
cloth+0.175
itten+0.173
 counter+0.171
aman+0.169
Neg logit contributions
zo-0.138
 villains-0.131
 aliens-0.127
 sailors-0.124
 adventures-0.123
Token.
Token ID13
Feature activation+4.03e-02

Pos logit contributions
 stool+0.133
cloth+0.112
 counter+0.110
box+0.107
 wra+0.105
Neg logit contributions
zo-0.083
minded-0.082
 sailors-0.077
 adventures-0.077
 culture-0.077
Token "
Token ID366
Feature activation+2.13e-02

Pos logit contributions
 stool+0.076
cloth+0.069
box+0.063
 counter+0.063
 wra+0.062
Neg logit contributions
 Friendship-0.065
 Earth-0.062
zo-0.062
 Jupiter-0.061
minded-0.061
TokenMaybe
Token ID13300
Feature activation+2.57e-02

Pos logit contributions
 stool+0.039
cloth+0.032
 counter+0.031
box+0.029
 chair+0.029
Neg logit contributions
zo-0.039
 culture-0.039
 habitats-0.038
 adventures-0.037
 galaxies-0.037
Token it
Token ID340
Feature activation+2.00e-02

Pos logit contributions
 stool+0.056
cloth+0.048
 counter+0.045
box+0.043
 chair+0.043
Neg logit contributions
zo-0.039
 culture-0.038
 planets-0.037
 adventures-0.037
 galaxies-0.037
Token Give
Token ID13786
Feature activation-4.26e-03

Pos logit contributions
 stool+0.052
cloth+0.046
 counter+0.044
 wra+0.044
box+0.044
Neg logit contributions
 Friendship-0.020
 Friends-0.018
 Earth-0.018
 villains-0.016
 Jupiter-0.016
Token me
Token ID502
Feature activation-2.45e-02

Pos logit contributions
zo+0.007
 culture+0.007
minded+0.007
 adventures+0.007
 habitats+0.007
Neg logit contributions
 stool-0.008
cloth-0.007
 counter-0.007
box-0.006
 chair-0.006
Token the
Token ID262
Feature activation+9.31e-03

Pos logit contributions
zo+0.032
minded+0.031
 culture+0.030
 diversity+0.030
 adventures+0.030
Neg logit contributions
 stool-0.057
cloth-0.048
 counter-0.048
 wra-0.046
box-0.044
Token spider
Token ID19230
Feature activation+1.55e-03

Pos logit contributions
 stool+0.015
cloth+0.014
 counter+0.013
box+0.012
 wra+0.012
Neg logit contributions
zo-0.017
 planets-0.017
 aliens-0.016
 diversity-0.016
 adventures-0.016
Token!"
Token ID2474
Feature activation+3.08e-02

Pos logit contributions
 stool+0.004
 counter+0.003
cloth+0.003
 checkout+0.003
amas+0.003
Neg logit contributions
zo-0.002
 adventures-0.002
 culture-0.002
 rewarding-0.002
 aliens-0.002
Token he
Token ID339
Feature activation+9.45e-02

Pos logit contributions
 stool+0.068
cloth+0.059
 counter+0.055
box+0.054
itten+0.053
Neg logit contributions
zo-0.046
 adventures-0.044
 culture-0.043
 aliens-0.043
 planets-0.043
Token said
Token ID531
Feature activation+7.45e-02

Pos logit contributions
 stool+0.212
cloth+0.174
itten+0.168
 counter+0.168
box+0.166
Neg logit contributions
zo-0.143
 villains-0.132
 seals-0.126
 sailors-0.122
 aliens-0.122
Token.
Token ID13
Feature activation+3.73e-02

Pos logit contributions
 stool+0.174
cloth+0.145
 counter+0.145
itten+0.140
box+0.138
Neg logit contributions
zo-0.097
minded-0.093
 sailors-0.090
 aliens-0.089
 culture-0.088
Token
Token ID198
Feature activation+1.83e-02

Pos logit contributions
 stool+0.101
cloth+0.090
itten+0.087
 counter+0.085
box+0.084
Neg logit contributions
zo-0.033
 Friendship-0.032
 Earth-0.031
 villains-0.031
 aliens-0.030
Token
Token ID198
Feature activation+3.30e-02

Pos logit contributions
 stool+0.044
cloth+0.038
 counter+0.037
box+0.035
 chair+0.035
Neg logit contributions
zo-0.024
 culture-0.023
 adventures-0.022
 Friendship-0.022
minded-0.022
TokenBut
Token ID1537
Feature activation+4.25e-02

Pos logit contributions
 stool+0.071
 counter+0.061
itten+0.060
cloth+0.058
 container+0.058
Neg logit contributions
 villains-0.051
zo-0.051
 rewarding-0.047
minded-0.046
 sailors-0.046
Token!
Token ID0
Feature activation+1.26e-02

Pos logit contributions
 stool+0.138
cloth+0.121
 counter+0.120
itten+0.118
 checkout+0.118
Neg logit contributions
 adventures-0.031
 anymore-0.031
 amazing-0.030
 ever-0.028
 aliens-0.027
Token Give
Token ID13786
Feature activation-4.13e-03

Pos logit contributions
 stool+0.031
cloth+0.026
 counter+0.025
coat+0.024
 wra+0.024
Neg logit contributions
 Friendship-0.016
 culture-0.014
zo-0.014
minded-0.014
 Earth-0.014
Token it
Token ID340
Feature activation+1.05e-02

Pos logit contributions
zo+0.006
 culture+0.006
minded+0.006
 adventures+0.006
 Friendship+0.006
Neg logit contributions
 stool-0.009
cloth-0.008
 counter-0.007
box-0.007
 chair-0.007
Token back
Token ID736
Feature activation-1.59e-03

Pos logit contributions
 stool+0.028
cloth+0.024
 counter+0.024
itten+0.022
 sponge+0.022
Neg logit contributions
zo-0.009
minded-0.009
 adventures-0.009
 extraordinary-0.009
 rewarding-0.009
Token!"
Token ID2474
Feature activation+3.98e-02

Pos logit contributions
 adventures+0.002
minded+0.002
 culture+0.002
zo+0.002
 rewarding+0.002
Neg logit contributions
 stool-0.004
cloth-0.003
 counter-0.003
 sponge-0.003
itten-0.003
Token she
Token ID673
Feature activation+9.32e-02

Pos logit contributions
 stool+0.067
cloth+0.053
box+0.049
 counter+0.049
itten+0.048
Neg logit contributions
zo-0.079
 culture-0.078
 adventures-0.076
 Friendship-0.076
minded-0.076
Token shouted
Token ID17293
Feature activation+7.37e-02

Pos logit contributions
 stool+0.219
itten+0.179
cloth+0.177
box+0.174
pit+0.174
Neg logit contributions
zo-0.130
 villains-0.116
 seals-0.108
 sailors-0.107
 aliens-0.107
Token.
Token ID13
Feature activation+4.18e-02

Pos logit contributions
 stool+0.166
cloth+0.138
 counter+0.136
box+0.132
 chair+0.129
Neg logit contributions
zo-0.103
minded-0.101
 culture-0.098
 aliens-0.097
 Friendship-0.096
Token
Token ID198
Feature activation+1.88e-02

Pos logit contributions
 stool+0.106
cloth+0.096
box+0.090
itten+0.089
 container+0.089
Neg logit contributions
 Friendship-0.043
 Earth-0.042
zo-0.042
 culture-0.039
minded-0.039
Token
Token ID198
Feature activation+2.65e-02

Pos logit contributions
 stool+0.046
cloth+0.040
 counter+0.038
box+0.037
 chair+0.037
Neg logit contributions
zo-0.023
 culture-0.022
 Friendship-0.022
 adventures-0.022
minded-0.021
TokenBen
Token ID11696
Feature activation+3.31e-02

Pos logit contributions
 stool+0.055
cloth+0.046
box+0.045
 container+0.045
itten+0.045
Neg logit contributions
zo-0.041
 villains-0.041
 Friendship-0.040
minded-0.039
 culture-0.038
Token I
Token ID314
Feature activation-2.87e-03

Pos logit contributions
 stool+0.001
cloth+0.001
 counter+0.001
box+0.001
 chair+0.001
Neg logit contributions
zo-0.001
 culture-0.001
minded-0.001
 planets-0.001
 habitats-0.001
Token will
Token ID481
Feature activation+4.11e-02

Pos logit contributions
zo+0.005
 culture+0.005
 habitats+0.005
 galaxies+0.005
 planets+0.005
Neg logit contributions
 stool-0.006
cloth-0.005
 counter-0.005
box-0.004
 chair-0.004
Token break
Token ID2270
Feature activation+2.26e-02

Pos logit contributions
 stool+0.085
cloth+0.071
 counter+0.067
box+0.067
itten+0.065
Neg logit contributions
zo-0.067
 culture-0.065
 adventures-0.064
 rewarding-0.063
 villains-0.062
Token her
Token ID607
Feature activation-6.78e-03

Pos logit contributions
 stool+0.052
box+0.042
cloth+0.042
 counter+0.042
itten+0.041
Neg logit contributions
minded-0.032
zo-0.031
 culture-0.030
 diversity-0.030
 planets-0.030
Token,"
Token ID553
Feature activation+3.69e-02

Pos logit contributions
 culture+0.009
zo+0.008
 adventures+0.008
 planets+0.008
 Earth+0.008
Neg logit contributions
 stool-0.016
cloth-0.014
 counter-0.013
itten-0.013
box-0.013
Token he
Token ID339
Feature activation+9.31e-02

Pos logit contributions
 stool+0.079
cloth+0.067
box+0.064
itten+0.063
 counter+0.063
Neg logit contributions
zo-0.056
 culture-0.054
 adventures-0.053
minded-0.053
 planets-0.053
Token said
Token ID531
Feature activation+6.94e-02

Pos logit contributions
 stool+0.194
cloth+0.156
 counter+0.155
itten+0.154
box+0.151
Neg logit contributions
zo-0.151
 villains-0.136
minded-0.135
 seals-0.133
 adventures-0.132
Token.
Token ID13
Feature activation+2.91e-02

Pos logit contributions
 stool+0.165
 counter+0.138
itten+0.137
cloth+0.137
 container+0.132
Neg logit contributions
minded-0.083
zo-0.081
 Friendship-0.078
 sailors-0.077
 aliens-0.075
Token
Token ID198
Feature activation+7.36e-03

Pos logit contributions
 stool+0.062
itten+0.054
cloth+0.052
box+0.050
coat+0.050
Neg logit contributions
zo-0.042
 Friendship-0.042
 villains-0.040
 Earth-0.040
 aliens-0.039
Token
Token ID198
Feature activation+2.83e-02

Pos logit contributions
 stool+0.017
cloth+0.014
 counter+0.014
box+0.013
 chair+0.013
Neg logit contributions
zo-0.010
 culture-0.010
 habitats-0.010
 galaxies-0.010
 planets-0.010
TokenTom
Token ID13787
Feature activation+2.35e-02

Pos logit contributions
 stool+0.062
itten+0.052
 knob+0.050
 counter+0.050
 container+0.048
Neg logit contributions
zo-0.044
 villains-0.042
minded-0.041
 rewarding-0.041
 diversity-0.040
Token!
Token ID0
Feature activation+3.98e-02

Pos logit contributions
 stool+0.076
 counter+0.068
 checkout+0.067
cloth+0.066
 timer+0.063
Neg logit contributions
zo-0.036
 sailors-0.030
 aliens-0.029
 Jupiter-0.029
 adventures-0.029
Token Ow
Token ID11960
Feature activation+2.56e-02

Pos logit contributions
 stool+0.090
cloth+0.075
coat+0.073
box+0.072
 counter+0.071
Neg logit contributions
 Friendship-0.054
 Earth-0.049
 Jupiter-0.047
zo-0.046
minded-0.046
Token!
Token ID0
Feature activation+3.03e-02

Pos logit contributions
 stool+0.059
cloth+0.053
 clerk+0.052
 counter+0.051
 cash+0.051
Neg logit contributions
zo-0.030
 adventures-0.029
 sharks-0.027
 planets-0.026
 journeys-0.025
Token Ow
Token ID11960
Feature activation+2.69e-02

Pos logit contributions
 stool+0.068
cloth+0.054
 counter+0.053
box+0.052
coat+0.052
Neg logit contributions
 Friendship-0.045
 Earth-0.041
zo-0.041
 culture-0.040
minded-0.040
Token!"
Token ID2474
Feature activation+5.25e-02

Pos logit contributions
 stool+0.062
cloth+0.055
 clerk+0.054
 counter+0.053
 cash+0.053
Neg logit contributions
zo-0.032
 adventures-0.031
 sharks-0.029
 planets-0.029
 aliens-0.027
Token she
Token ID673
Feature activation+9.28e-02

Pos logit contributions
 stool+0.116
box+0.097
cloth+0.097
pit+0.093
coat+0.092
Neg logit contributions
 adventures-0.071
 Earth-0.070
 culture-0.069
 Jupiter-0.067
 Friendship-0.066
Token cried
Token ID16896
Feature activation+6.91e-02

Pos logit contributions
 stool+0.226
pit+0.189
box+0.188
aman+0.186
 counter+0.184
Neg logit contributions
zo-0.115
 villains-0.100
 sailors-0.095
 aliens-0.093
 whist-0.092
Token as
Token ID355
Feature activation+5.29e-02

Pos logit contributions
 stool+0.157
cloth+0.132
 counter+0.129
box+0.128
 chair+0.122
Neg logit contributions
zo-0.096
minded-0.093
 aliens-0.092
 sailors-0.090
 adventures-0.090
Token she
Token ID673
Feature activation+2.50e-02

Pos logit contributions
 stool+0.087
box+0.080
 counter+0.073
cloth+0.073
 container+0.071
Neg logit contributions
 Earth-0.099
 villains-0.096
 aliens-0.095
zo-0.095
 adventures-0.094
Token stepped
Token ID10764
Feature activation+3.80e-02

Pos logit contributions
 stool+0.047
cloth+0.038
 counter+0.038
box+0.035
 ladder+0.034
Neg logit contributions
zo-0.046
 adventures-0.043
 planets-0.042
 habitats-0.042
 aliens-0.042
Token on
Token ID319
Feature activation+3.97e-02

Pos logit contributions
 stool+0.082
cloth+0.068
 counter+0.068
 wra+0.065
box+0.064
Neg logit contributions
minded-0.054
zo-0.053
 sailors-0.051
 aliens-0.051
 adventures-0.050
Token touch
Token ID3638
Feature activation-6.06e-04

Pos logit contributions
 stool+0.104
cloth+0.090
box+0.087
 counter+0.081
 lid+0.079
Neg logit contributions
zo-0.087
 adventures-0.082
 rewarding-0.081
 Earth-0.081
 culture-0.080
Token the
Token ID262
Feature activation+2.45e-02

Pos logit contributions
 culture+0.001
zo+0.001
 planets+0.001
 adventures+0.001
 aliens+0.001
Neg logit contributions
 stool-0.001
cloth-0.001
 counter-0.001
box-0.001
coat-0.001
Token v
Token ID410
Feature activation+3.67e-02

Pos logit contributions
 stool+0.044
cloth+0.041
box+0.037
 firmly+0.037
itten+0.035
Neg logit contributions
 planets-0.039
 adventures-0.039
 culture-0.039
 Earth-0.039
 aliens-0.038
Tokenase
Token ID589
Feature activation+3.88e-02

Pos logit contributions
 stool+0.069
cloth+0.058
 sponge+0.055
 wra+0.052
 counter+0.052
Neg logit contributions
zo-0.064
minded-0.062
 habitats-0.060
 Friendship-0.059
 culture-0.057
Token,"
Token ID553
Feature activation+7.83e-02

Pos logit contributions
 stool+0.088
cloth+0.078
itten+0.073
 sponge+0.073
 wra+0.072
Neg logit contributions
 rewarding-0.045
 adventures-0.045
zo-0.044
minded-0.042
 someday-0.042
Token she
Token ID673
Feature activation+9.24e-02

Pos logit contributions
 stool+0.161
cloth+0.139
box+0.137
pit+0.135
itten+0.133
Neg logit contributions
 adventures-0.111
 Friendship-0.110
minded-0.110
 culture-0.109
 marriage-0.107
Token warned
Token ID7728
Feature activation+7.37e-02

Pos logit contributions
 stool+0.222
cloth+0.192
pit+0.183
 counter+0.180
box+0.176
Neg logit contributions
zo-0.124
minded-0.100
 adventures-0.099
 rewarding-0.093
 extraordinary-0.093
Token.
Token ID13
Feature activation+4.83e-02

Pos logit contributions
 stool+0.178
itten+0.150
box+0.147
pit+0.144
 container+0.142
Neg logit contributions
minded-0.092
 adventures-0.083
zo-0.082
 Earth-0.080
 aliens-0.079
Token "
Token ID366
Feature activation+2.95e-03

Pos logit contributions
 stool+0.111
itten+0.103
cloth+0.099
 knob+0.096
box+0.095
Neg logit contributions
 Friendship-0.056
minded-0.053
 Earth-0.051
 Friends-0.050
 Captain-0.047
TokenIt
Token ID1026
Feature activation+1.48e-02

Pos logit contributions
 stool+0.006
cloth+0.005
 counter+0.004
box+0.004
 chair+0.004
Neg logit contributions
zo-0.005
 culture-0.005
 habitats-0.005
 galaxies-0.005
 planets-0.005
Token could
Token ID714
Feature activation+6.39e-02

Pos logit contributions
 stool+0.032
cloth+0.027
 counter+0.026
box+0.025
 chair+0.024
Neg logit contributions
zo-0.023
 culture-0.022
 planets-0.022
minded-0.022
 rewarding-0.021
Token It
Token ID632
Feature activation+1.20e-02

Pos logit contributions
zo+0.003
 Friendship+0.003
 villains+0.003
 culture+0.003
 seals+0.003
Neg logit contributions
 stool-0.005
cloth-0.005
 counter-0.004
box-0.004
 wra-0.004
Token belongs
Token ID14448
Feature activation-2.16e-02

Pos logit contributions
 stool+0.029
cloth+0.025
 counter+0.024
box+0.022
 chair+0.022
Neg logit contributions
zo-0.016
 adventures-0.014
 aliens-0.014
 seals-0.014
 Friendship-0.014
Token to
Token ID284
Feature activation-6.47e-03

Pos logit contributions
zo+0.027
 culture+0.026
 adventures+0.026
 habitats+0.025
minded+0.025
Neg logit contributions
 stool-0.053
cloth-0.045
 counter-0.044
 chair-0.042
itten-0.041
Token me
Token ID502
Feature activation-2.37e-02

Pos logit contributions
zo+0.011
 culture+0.011
 adventures+0.011
 habitats+0.010
 planets+0.010
Neg logit contributions
 stool-0.013
cloth-0.011
 counter-0.010
box-0.010
 chair-0.010
Token!"
Token ID2474
Feature activation+4.45e-02

Pos logit contributions
zo+0.028
 adventures+0.027
 culture+0.026
minded+0.026
 rewarding+0.026
Neg logit contributions
 stool-0.056
cloth-0.049
 sponge-0.047
 counter-0.047
 wra-0.047
Token she
Token ID673
Feature activation+9.17e-02

Pos logit contributions
 stool+0.082
cloth+0.069
box+0.065
itten+0.061
coat+0.061
Neg logit contributions
zo-0.078
 adventures-0.076
 Earth-0.075
 Friendship-0.074
minded-0.074
Token shouted
Token ID17293
Feature activation+6.57e-02

Pos logit contributions
 stool+0.219
itten+0.184
box+0.181
pit+0.180
cloth+0.178
Neg logit contributions
zo-0.118
 villains-0.110
 seals-0.100
 sailors-0.099
 aliens-0.099
Token.
Token ID13
Feature activation+3.45e-02

Pos logit contributions
 stool+0.146
cloth+0.122
 counter+0.119
box+0.116
itten+0.113
Neg logit contributions
zo-0.094
minded-0.091
 culture-0.090
 adventures-0.088
 aliens-0.088
Token
Token ID198
Feature activation+7.30e-03

Pos logit contributions
 stool+0.090
cloth+0.081
box+0.075
itten+0.075
 container+0.075
Neg logit contributions
 Friendship-0.034
 Earth-0.033
zo-0.032
 Jupiter-0.030
 aliens-0.030
Token
Token ID198
Feature activation+2.33e-02

Pos logit contributions
 stool+0.017
cloth+0.015
 counter+0.014
box+0.013
 chair+0.013
Neg logit contributions
zo-0.010
 culture-0.010
 habitats-0.009
 galaxies-0.009
 planets-0.009
TokenMax
Token ID11518
Feature activation+1.94e-02

Pos logit contributions
 stool+0.050
cloth+0.041
 container+0.041
 counter+0.041
box+0.041
Neg logit contributions
zo-0.036
 villains-0.036
 culture-0.033
minded-0.033
 Friendship-0.033
TokenGive
Token ID23318
Feature activation+2.98e-03

Pos logit contributions
 stool+0.104
cloth+0.086
box+0.083
 counter+0.083
 chair+0.078
Neg logit contributions
zo-0.097
 culture-0.096
minded-0.094
 Friendship-0.094
 adventures-0.093
Token me
Token ID502
Feature activation-2.13e-02

Pos logit contributions
 stool+0.006
cloth+0.005
 counter+0.005
box+0.005
 chair+0.004
Neg logit contributions
zo-0.005
 culture-0.005
minded-0.005
 adventures-0.005
 Friendship-0.005
Token your
Token ID534
Feature activation+5.68e-03

Pos logit contributions
zo+0.026
minded+0.025
 culture+0.025
 diversity+0.024
 Friendship+0.024
Neg logit contributions
 stool-0.051
cloth-0.044
 counter-0.043
 wra-0.042
box-0.041
Token gum
Token ID27142
Feature activation+3.77e-02

Pos logit contributions
 stool+0.009
cloth+0.008
 counter+0.007
box+0.007
 knob+0.006
Neg logit contributions
zo-0.011
 adventures-0.011
 Friendship-0.011
 culture-0.011
minded-0.011
Token!"
Token ID2474
Feature activation+5.69e-02

Pos logit contributions
 stool+0.086
cloth+0.071
 counter+0.070
 knob+0.068
 broom+0.067
Neg logit contributions
zo-0.050
 adventures-0.047
 rewarding-0.044
minded-0.044
 seals-0.043
Token she
Token ID673
Feature activation+9.16e-02

Pos logit contributions
 stool+0.140
box+0.121
cloth+0.120
itten+0.119
coat+0.115
Neg logit contributions
 Earth-0.062
 adventures-0.061
zo-0.061
 aliens-0.059
 planets-0.058
Token said
Token ID531
Feature activation+7.65e-02

Pos logit contributions
 stool+0.204
cloth+0.167
box+0.164
 counter+0.161
itten+0.160
Neg logit contributions
zo-0.137
 seals-0.122
 villains-0.119
 sailors-0.118
minded-0.116
Token.
Token ID13
Feature activation+4.77e-02

Pos logit contributions
 stool+0.179
cloth+0.150
 counter+0.149
itten+0.146
box+0.145
Neg logit contributions
minded-0.093
zo-0.093
 sailors-0.089
 Friendship-0.088
 culture-0.085
Token She
Token ID1375
Feature activation+4.66e-02

Pos logit contributions
 stool+0.091
cloth+0.080
itten+0.078
box+0.077
 container+0.075
Neg logit contributions
 Friendship-0.075
 Earth-0.074
 Jupiter-0.067
 Friends-0.067
 aliens-0.066
Token tried
Token ID3088
Feature activation+1.74e-02

Pos logit contributions
 stool+0.092
 counter+0.073
itten+0.072
cloth+0.071
box+0.070
Neg logit contributions
zo-0.073
 seals-0.073
 sailors-0.068
 Friendship-0.067
 aliens-0.067
Token to
Token ID284
Feature activation+6.18e-02

Pos logit contributions
 stool+0.044
cloth+0.039
 counter+0.038
box+0.037
itten+0.037
Neg logit contributions
 Friendship-0.018
minded-0.017
 adventures-0.017
 diversity-0.017
 culture-0.017
Token it
Token ID340
Feature activation+2.48e-02

Pos logit contributions
 stool+0.018
cloth+0.015
 counter+0.014
box+0.014
 chair+0.014
Neg logit contributions
zo-0.017
 culture-0.017
 habitats-0.016
 galaxies-0.016
minded-0.016
Token,
Token ID11
Feature activation+3.92e-02

Pos logit contributions
 stool+0.057
cloth+0.050
 counter+0.045
 chair+0.045
 sponge+0.045
Neg logit contributions
zo-0.029
 adventures-0.029
 Earth-0.027
 hunters-0.027
 wars-0.027
Token you
Token ID345
Feature activation+2.11e-02

Pos logit contributions
 stool+0.095
itten+0.082
box+0.078
 counter+0.078
coat+0.077
Neg logit contributions
 aliens-0.050
 sailors-0.047
 seals-0.044
 Earth-0.043
 villains-0.043
Token two
Token ID734
Feature activation+1.84e-02

Pos logit contributions
 stool+0.044
cloth+0.037
 counter+0.035
 chair+0.033
coat+0.033
Neg logit contributions
zo-0.033
 Friendship-0.033
 aliens-0.032
 sailors-0.032
 culture-0.032
Token!"
Token ID2474
Feature activation+5.24e-02

Pos logit contributions
 stool+0.042
cloth+0.035
 counter+0.034
itten+0.033
 clerk+0.032
Neg logit contributions
 battles-0.024
 adventures-0.024
 aliens-0.024
 nations-0.024
 sailors-0.024
Token she
Token ID673
Feature activation+9.12e-02

Pos logit contributions
 stool+0.101
cloth+0.084
coat+0.084
itten+0.083
pit+0.081
Neg logit contributions
 aliens-0.085
 adventures-0.085
 Earth-0.084
 Friendship-0.083
 Jupiter-0.082
Token said
Token ID531
Feature activation+8.40e-02

Pos logit contributions
 stool+0.213
cloth+0.177
itten+0.175
pit+0.170
 counter+0.170
Neg logit contributions
zo-0.124
 sailors-0.109
 villains-0.109
 aliens-0.105
 seals-0.104
Token.
Token ID13
Feature activation+5.12e-02

Pos logit contributions
 stool+0.188
 counter+0.157
cloth+0.155
itten+0.154
box+0.150
Neg logit contributions
minded-0.107
zo-0.105
 sailors-0.103
 aliens-0.097
 adventures-0.096
Token "
Token ID366
Feature activation+1.77e-02

Pos logit contributions
 stool+0.104
itten+0.095
cloth+0.094
 container+0.089
box+0.086
Neg logit contributions
 Friendship-0.073
 Earth-0.073
 Friends-0.069
 aliens-0.068
 Jupiter-0.066
TokenWhy
Token ID5195
Feature activation+3.00e-02

Pos logit contributions
 stool+0.037
cloth+0.032
 counter+0.030
 chair+0.029
box+0.029
Neg logit contributions
zo-0.028
 culture-0.028
 habitats-0.027
 galaxies-0.027
 adventures-0.026
Token are
Token ID389
Feature activation-1.25e-02

Pos logit contributions
 stool+0.063
 counter+0.053
cloth+0.051
itten+0.049
 container+0.048
Neg logit contributions
zo-0.047
 Friendship-0.047
 diversity-0.047
 culture-0.047
 planets-0.045
TokenI
Token ID40
Feature activation-1.54e-02

Pos logit contributions
 stool+0.090
cloth+0.079
 broom+0.073
box+0.073
 chair+0.072
Neg logit contributions
minded-0.051
 Friendship-0.051
zo-0.051
 culture-0.050
 seals-0.049
Token'm
Token ID1101
Feature activation-3.28e-03

Pos logit contributions
zo+0.021
 culture+0.020
 rewarding+0.020
 adventures+0.020
minded+0.019
Neg logit contributions
 stool-0.036
cloth-0.030
 counter-0.030
box-0.028
 wra-0.028
Token sorry
Token ID7926
Feature activation+1.63e-02

Pos logit contributions
 culture+0.004
 sailors+0.004
 Earth+0.004
 rewarding+0.004
 Jupiter+0.004
Neg logit contributions
 stool-0.007
cloth-0.006
 counter-0.006
box-0.006
itten-0.006
Token,"
Token ID553
Feature activation+4.25e-02

Pos logit contributions
 stool+0.033
cloth+0.030
box+0.027
 counter+0.026
 wra+0.026
Neg logit contributions
zo-0.025
 culture-0.025
 adventures-0.024
 rewarding-0.024
 aliens-0.024
Token the
Token ID262
Feature activation+4.46e-02

Pos logit contributions
 stool+0.102
cloth+0.095
box+0.089
coat+0.085
 broom+0.084
Neg logit contributions
 culture-0.051
minded-0.050
 Friendship-0.050
 adventures-0.050
zo-0.049
Token clerk
Token ID21120
Feature activation+9.11e-02

Pos logit contributions
 stool+0.060
cloth+0.054
pan+0.050
itten+0.046
 broom+0.046
Neg logit contributions
minded-0.089
 culture-0.087
 Friendship-0.087
 Captain-0.086
 marriage-0.085
Token said
Token ID531
Feature activation+6.45e-02

Pos logit contributions
 stool+0.194
cloth+0.160
itten+0.146
 chair+0.145
 counter+0.145
Neg logit contributions
zo-0.144
minded-0.139
 sailors-0.128
 Friendship-0.127
 culture-0.124
Token,
Token ID11
Feature activation+4.79e-02

Pos logit contributions
 stool+0.156
itten+0.131
cloth+0.128
 wra+0.128
box+0.126
Neg logit contributions
minded-0.073
zo-0.073
 culture-0.068
 Friendship-0.068
 sailors-0.068
Token "
Token ID366
Feature activation+1.33e-02

Pos logit contributions
 stool+0.095
cloth+0.082
box+0.080
itten+0.080
 wra+0.076
Neg logit contributions
 Friendship-0.072
 Earth-0.070
 culture-0.069
 sailors-0.068
 Jupiter-0.067
Tokenwe
Token ID732
Feature activation-1.93e-02

Pos logit contributions
 stool+0.027
cloth+0.022
 counter+0.021
box+0.021
 chair+0.020
Neg logit contributions
zo-0.023
 culture-0.022
minded-0.022
 Friendship-0.021
 galaxies-0.021
Token don
Token ID836
Feature activation+1.46e-02

Pos logit contributions
zo+0.029
 culture+0.027
minded+0.026
 Friendship+0.026
 rewarding+0.025
Neg logit contributions
 stool-0.044
 counter-0.037
cloth-0.036
 chair-0.034
box-0.034
Token coming
Token ID2406
Feature activation+7.88e-03

Pos logit contributions
zo+0.003
 culture+0.003
 habitats+0.003
 galaxies+0.003
minded+0.003
Neg logit contributions
 stool-0.004
cloth-0.003
 counter-0.003
box-0.003
 chair-0.003
Token to
Token ID284
Feature activation+9.22e-03

Pos logit contributions
 stool+0.018
cloth+0.015
 counter+0.015
 wra+0.015
box+0.015
Neg logit contributions
zo-0.011
 sailors-0.010
minded-0.010
 Earth-0.010
 aliens-0.010
Token get
Token ID651
Feature activation-6.20e-03

Pos logit contributions
 stool+0.017
 counter+0.014
cloth+0.013
 chair+0.013
 bowl+0.013
Neg logit contributions
zo-0.017
 culture-0.017
 habitats-0.016
 galaxies-0.016
 languages-0.016
Token you
Token ID345
Feature activation-1.22e-02

Pos logit contributions
zo+0.009
 culture+0.008
minded+0.008
 galaxies+0.008
 planets+0.008
Neg logit contributions
 stool-0.014
cloth-0.012
 counter-0.012
box-0.011
 chair-0.011
Token!"
Token ID2474
Feature activation+3.24e-02

Pos logit contributions
 adventures+0.016
zo+0.015
 planets+0.014
 Earth+0.014
 rewarding+0.014
Neg logit contributions
 stool-0.029
cloth-0.026
 counter-0.024
 clerk-0.023
 wra-0.023
Token she
Token ID673
Feature activation+9.09e-02

Pos logit contributions
 stool+0.056
cloth+0.045
box+0.044
 counter+0.044
coat+0.042
Neg logit contributions
 adventures-0.060
 culture-0.060
zo-0.059
 Jupiter-0.059
 Friendship-0.059
Token said
Token ID531
Feature activation+7.30e-02

Pos logit contributions
 stool+0.214
cloth+0.180
pit+0.177
 counter+0.174
box+0.172
Neg logit contributions
zo-0.125
 villains-0.108
 sailors-0.103
 aliens-0.102
 adventures-0.100
Token.
Token ID13
Feature activation+3.91e-02

Pos logit contributions
 stool+0.171
cloth+0.144
 counter+0.144
itten+0.140
box+0.139
Neg logit contributions
zo-0.087
minded-0.086
 sailors-0.084
 Friendship-0.081
 Earth-0.080
Token She
Token ID1375
Feature activation+3.98e-02

Pos logit contributions
 stool+0.103
cloth+0.093
 counter+0.089
 container+0.088
itten+0.088
Neg logit contributions
 Friendship-0.036
 Earth-0.033
 Jupiter-0.033
zo-0.032
 aliens-0.031
Token climbed
Token ID19952
Feature activation+3.45e-02

Pos logit contributions
 stool+0.078
 counter+0.066
cloth+0.065
itten+0.062
amas+0.061
Neg logit contributions
zo-0.062
 sailors-0.059
 adventures-0.057
 seals-0.057
 aliens-0.056
Token the
Token ID262
Feature activation+2.82e-02

Pos logit contributions
 stool+0.075
cloth+0.070
box+0.066
 counter+0.064
itten+0.063
Neg logit contributions
 Earth-0.046
 Jupiter-0.043
minded-0.043
 islands-0.042
 adventures-0.041
Token own
Token ID898
Feature activation-6.09e-03

Pos logit contributions
zo+0.017
 rewarding+0.017
minded+0.016
 Friendship+0.016
 villains+0.016
Neg logit contributions
 stool-0.022
 chair-0.019
 counter-0.018
 bowl-0.018
cloth-0.018
Token statues
Token ID25827
Feature activation+1.82e-02

Pos logit contributions
zo+0.009
 culture+0.009
minded+0.009
 habitats+0.009
 rewarding+0.009
Neg logit contributions
 stool-0.013
cloth-0.011
 counter-0.011
 chair-0.010
box-0.010
Token,
Token ID11
Feature activation+1.41e-02

Pos logit contributions
 stool+0.039
cloth+0.033
 counter+0.031
box+0.030
itten+0.029
Neg logit contributions
zo-0.028
 culture-0.028
 adventures-0.028
 planets-0.027
 rewarding-0.027
Token Ben
Token ID3932
Feature activation+9.44e-04

Pos logit contributions
 stool+0.027
cloth+0.022
 counter+0.021
itten+0.021
box+0.021
Neg logit contributions
 culture-0.024
 aliens-0.024
zo-0.024
 planets-0.023
 sailors-0.023
Token!"
Token ID2474
Feature activation+4.49e-02

Pos logit contributions
 stool+0.002
cloth+0.002
 counter+0.002
box+0.002
itten+0.002
Neg logit contributions
zo-0.001
 adventures-0.001
 culture-0.001
 rewarding-0.001
 sailors-0.001
Token she
Token ID673
Feature activation+9.08e-02

Pos logit contributions
 stool+0.080
cloth+0.065
box+0.063
 counter+0.061
itten+0.060
Neg logit contributions
zo-0.081
 adventures-0.081
 Friendship-0.081
 culture-0.080
 aliens-0.080
Token suggested
Token ID5220
Feature activation+6.63e-02

Pos logit contributions
 stool+0.198
cloth+0.165
itten+0.160
 counter+0.159
aman+0.154
Neg logit contributions
zo-0.143
 adventures-0.123
 villains-0.121
 sailors-0.120
 battles-0.119
Token.
Token ID13
Feature activation+3.78e-02

Pos logit contributions
 stool+0.164
itten+0.145
 wrapper+0.138
 container+0.138
cloth+0.137
Neg logit contributions
minded-0.065
 adventures-0.065
zo-0.064
 culture-0.063
 sailors-0.063
Token
Token ID198
Feature activation+1.52e-02

Pos logit contributions
 stool+0.099
cloth+0.088
 counter+0.083
itten+0.082
box+0.082
Neg logit contributions
 Friendship-0.038
zo-0.036
 Earth-0.034
 culture-0.034
minded-0.034
Token
Token ID198
Feature activation+3.14e-02

Pos logit contributions
 stool+0.037
cloth+0.032
 counter+0.031
box+0.029
 chair+0.029
Neg logit contributions
zo-0.019
 culture-0.019
 Friendship-0.018
 adventures-0.018
minded-0.018
Token"
Token ID1
Feature activation+5.15e-02

Pos logit contributions
 stool+0.061
 counter+0.050
itten+0.050
coat+0.049
cloth+0.049
Neg logit contributions
zo-0.051
 Friendship-0.051
 sailors-0.049
minded-0.049
 islands-0.048
Token This
Token ID770
Feature activation+1.12e-02

Pos logit contributions
zo+0.027
 culture+0.026
 galaxies+0.025
 adventures+0.025
 habitats+0.025
Neg logit contributions
 stool-0.036
cloth-0.031
 counter-0.029
box-0.028
 chair-0.028
Token is
Token ID318
Feature activation-1.68e-02

Pos logit contributions
 stool+0.023
cloth+0.020
 counter+0.019
box+0.018
 wra+0.018
Neg logit contributions
zo-0.018
 adventures-0.017
 planets-0.017
 culture-0.017
 aliens-0.016
Token my
Token ID616
Feature activation-6.79e-03

Pos logit contributions
zo+0.027
 culture+0.026
 habitats+0.025
 galaxies+0.025
minded+0.025
Neg logit contributions
 stool-0.035
cloth-0.030
 counter-0.028
box-0.027
 chair-0.027
Token spider
Token ID19230
Feature activation-1.53e-02

Pos logit contributions
zo+0.012
 culture+0.012
 adventures+0.011
 habitats+0.011
 planets+0.011
Neg logit contributions
 stool-0.013
cloth-0.011
 counter-0.010
box-0.010
 wra-0.010
Token,"
Token ID553
Feature activation+3.62e-02

Pos logit contributions
zo+0.023
 adventures+0.022
 culture+0.022
 rewarding+0.021
 planets+0.021
Neg logit contributions
 stool-0.033
cloth-0.028
 counter-0.027
box-0.025
itten-0.025
Token she
Token ID673
Feature activation+9.08e-02

Pos logit contributions
 stool+0.072
cloth+0.064
box+0.058
itten+0.058
coat+0.057
Neg logit contributions
 adventures-0.059
zo-0.058
 Friendship-0.057
 villains-0.057
 culture-0.057
Token said
Token ID531
Feature activation+7.36e-02

Pos logit contributions
 stool+0.215
cloth+0.181
 counter+0.175
box+0.174
itten+0.170
Neg logit contributions
zo-0.121
 villains-0.100
minded-0.100
 adventures-0.099
 seals-0.097
Token.
Token ID13
Feature activation+3.38e-02

Pos logit contributions
 stool+0.177
cloth+0.150
 counter+0.148
itten+0.145
 container+0.143
Neg logit contributions
zo-0.085
minded-0.085
 sailors-0.079
 Friendship-0.077
 adventures-0.076
Token
Token ID198
Feature activation+1.15e-02

Pos logit contributions
 stool+0.090
cloth+0.082
 container+0.076
 counter+0.076
itten+0.076
Neg logit contributions
 Friendship-0.033
 Earth-0.031
zo-0.029
 Jupiter-0.029
 aliens-0.028
Token
Token ID198
Feature activation+2.52e-02

Pos logit contributions
 stool+0.028
cloth+0.025
 counter+0.024
box+0.023
 chair+0.022
Neg logit contributions
zo-0.014
 culture-0.014
 Friendship-0.013
 adventures-0.013
 planets-0.013
TokenMax
Token ID11518
Feature activation+2.87e-02

Pos logit contributions
 stool+0.053
cloth+0.048
 container+0.047
 counter+0.046
 spoon+0.045
Neg logit contributions
zo-0.039
 villains-0.038
 Friendship-0.036
 adventures-0.035
 rewarding-0.035
Token,
Token ID11
Feature activation+1.71e-02

Pos logit contributions
 stool+0.076
cloth+0.066
 counter+0.063
 checkout+0.061
 clerk+0.060
Neg logit contributions
zo-0.039
 adventures-0.033
 harm-0.033
 meets-0.033
 thanks-0.032
Token I
Token ID314
Feature activation+2.93e-02

Pos logit contributions
 stool+0.039
 counter+0.035
coat+0.034
amas+0.034
box+0.034
Neg logit contributions
 sailors-0.019
 sharks-0.019
 aliens-0.019
 whales-0.019
 dolphins-0.018
Token hurt
Token ID5938
Feature activation+1.85e-02

Pos logit contributions
 stool+0.054
 counter+0.044
cloth+0.044
box+0.040
 chair+0.039
Neg logit contributions
zo-0.055
 culture-0.052
 rewarding-0.052
 adventures-0.052
minded-0.051
Token myself
Token ID3589
Feature activation+2.13e-02

Pos logit contributions
 stool+0.049
box+0.041
coat+0.041
cloth+0.041
 counter+0.040
Neg logit contributions
minded-0.017
 aliens-0.017
 Earth-0.017
 adventures-0.016
 planets-0.016
Token!"
Token ID2474
Feature activation+4.18e-02

Pos logit contributions
 stool+0.054
 clerk+0.047
 counter+0.047
 checkout+0.046
cloth+0.045
Neg logit contributions
 adventures-0.018
 someday-0.018
 rewarding-0.018
 anymore-0.017
zo-0.016
Token she
Token ID673
Feature activation+9.05e-02

Pos logit contributions
 stool+0.085
cloth+0.070
box+0.068
 counter+0.066
itten+0.065
Neg logit contributions
zo-0.066
 Friendship-0.065
 culture-0.065
 adventures-0.065
 Earth-0.064
Token sob
Token ID27355
Feature activation+5.04e-02

Pos logit contributions
 stool+0.208
 counter+0.168
box+0.167
pit+0.166
cloth+0.165
Neg logit contributions
zo-0.134
 villains-0.116
 aliens-0.114
 sailors-0.113
 whist-0.108
Tokenbed
Token ID3077
Feature activation+5.62e-02

Pos logit contributions
 stool+0.084
cloth+0.067
 counter+0.063
box+0.060
 chair+0.059
Neg logit contributions
zo-0.103
 culture-0.101
minded-0.098
 planets-0.098
 habitats-0.097
Token.
Token ID13
Feature activation+4.35e-02

Pos logit contributions
 stool+0.154
box+0.132
itten+0.128
 container+0.127
 counter+0.126
Neg logit contributions
minded-0.050
 sailors-0.047
 aliens-0.046
zo-0.045
 adventures-0.044
Token
Token ID198
Feature activation+1.95e-02

Pos logit contributions
 stool+0.120
cloth+0.108
box+0.103
itten+0.103
 counter+0.102
Neg logit contributions
 Friendship-0.036
 Earth-0.035
 Jupiter-0.031
zo-0.030
 culture-0.030
Token
Token ID198
Feature activation+3.55e-02

Pos logit contributions
 stool+0.049
cloth+0.042
 counter+0.041
box+0.039
 chair+0.039
Neg logit contributions
zo-0.023
 culture-0.022
 Friendship-0.022
 adventures-0.021
minded-0.021
TokenStop
Token ID19485
Feature activation+6.33e-03

Pos logit contributions
 stool+0.097
cloth+0.080
 counter+0.076
box+0.073
 chair+0.073
Neg logit contributions
zo-0.085
 culture-0.084
minded-0.082
 Friendship-0.081
 diversity-0.081
Token that
Token ID326
Feature activation+3.39e-02

Pos logit contributions
 stool+0.012
cloth+0.010
 counter+0.009
box+0.009
 chair+0.009
Neg logit contributions
zo-0.012
 culture-0.012
 habitats-0.011
 galaxies-0.011
 rewarding-0.011
Token right
Token ID826
Feature activation+2.30e-02

Pos logit contributions
 stool+0.074
cloth+0.068
 counter+0.058
 wra+0.058
box+0.058
Neg logit contributions
zo-0.047
 adventures-0.047
 seals-0.045
 Friendship-0.044
 aliens-0.044
Token now
Token ID783
Feature activation+2.75e-02

Pos logit contributions
 stool+0.053
cloth+0.046
itten+0.045
 counter+0.045
coat+0.044
Neg logit contributions
 culture-0.029
 aliens-0.029
 sailors-0.029
 adventures-0.028
 Earth-0.028
Token!"
Token ID2474
Feature activation+4.56e-02

Pos logit contributions
 stool+0.058
cloth+0.050
 counter+0.045
itten+0.045
box+0.045
Neg logit contributions
zo-0.041
 adventures-0.040
 Earth-0.040
 aliens-0.038
 culture-0.038
Token she
Token ID673
Feature activation+9.02e-02

Pos logit contributions
 stool+0.073
cloth+0.058
itten+0.056
box+0.056
coat+0.055
Neg logit contributions
 culture-0.089
 Earth-0.089
 adventures-0.088
 Friendship-0.088
 aliens-0.088
Token said
Token ID531
Feature activation+7.82e-02

Pos logit contributions
 stool+0.210
cloth+0.175
itten+0.174
pit+0.170
box+0.167
Neg logit contributions
zo-0.119
 sailors-0.107
minded-0.105
 seals-0.104
 villains-0.103
Token.
Token ID13
Feature activation+4.38e-02

Pos logit contributions
 stool+0.185
cloth+0.157
 counter+0.153
itten+0.153
box+0.151
Neg logit contributions
minded-0.092
zo-0.088
 sailors-0.086
 Friendship-0.084
 culture-0.083
Token "
Token ID366
Feature activation+1.37e-02

Pos logit contributions
 stool+0.087
cloth+0.077
itten+0.077
 container+0.072
coat+0.071
Neg logit contributions
 Friendship-0.066
 Earth-0.064
minded-0.060
 Jupiter-0.060
 aliens-0.059
TokenYou
Token ID1639
Feature activation+1.05e-03

Pos logit contributions
 stool+0.028
cloth+0.024
 counter+0.022
 wra+0.021
 chair+0.021
Neg logit contributions
zo-0.023
 culture-0.022
 habitats-0.022
 adventures-0.022
 galaxies-0.021
Token are
Token ID389
Feature activation-6.05e-03

Pos logit contributions
 stool+0.002
cloth+0.002
 counter+0.002
box+0.002
 chair+0.001
Neg logit contributions
zo-0.002
 culture-0.002
 habitats-0.002
 galaxies-0.002
minded-0.002
Token,
Token ID11
Feature activation+2.27e-02

Pos logit contributions
 stool+0.088
cloth+0.075
 counter+0.069
box+0.069
 tin+0.068
Neg logit contributions
zo-0.060
 aliens-0.057
 adventures-0.056
 Earth-0.055
 culture-0.054
Token Tim
Token ID5045
Feature activation+1.29e-02

Pos logit contributions
 stool+0.050
cloth+0.043
 counter+0.041
box+0.040
itten+0.039
Neg logit contributions
 culture-0.032
zo-0.032
 rewarding-0.032
 aliens-0.032
 sailors-0.031
Tokenmy
Token ID1820
Feature activation+3.38e-02

Pos logit contributions
 stool+0.024
cloth+0.020
 counter+0.019
apy+0.019
 wra+0.018
Neg logit contributions
zo-0.024
minded-0.021
 adventures-0.021
 sailors-0.021
 rewarding-0.021
Token,"
Token ID553
Feature activation+6.25e-02

Pos logit contributions
 stool+0.080
cloth+0.071
 counter+0.066
 sponge+0.064
box+0.064
Neg logit contributions
zo-0.045
minded-0.041
 adventures-0.040
 rewarding-0.039
 sailors-0.039
Token his
Token ID465
Feature activation+2.78e-02

Pos logit contributions
 stool+0.162
coat+0.146
cloth+0.145
itten+0.140
pit+0.139
Neg logit contributions
 adventures-0.059
zo-0.058
 sailors-0.058
 Friendship-0.056
 culture-0.055
Token dad
Token ID9955
Feature activation+9.01e-02

Pos logit contributions
 stool+0.052
cloth+0.045
 counter+0.041
itten+0.039
 wra+0.038
Neg logit contributions
zo-0.049
 culture-0.049
 sailors-0.047
 adventures-0.047
minded-0.046
Token said
Token ID531
Feature activation+8.74e-02

Pos logit contributions
 stool+0.197
cloth+0.166
 counter+0.157
 sponge+0.150
 pitcher+0.147
Neg logit contributions
zo-0.141
minded-0.127
 sailors-0.121
 adventures-0.116
 Friendship-0.114
Token.
Token ID13
Feature activation+3.54e-02

Pos logit contributions
 stool+0.216
itten+0.185
pit+0.182
cloth+0.181
 counter+0.180
Neg logit contributions
zo-0.085
minded-0.085
 sailors-0.082
 rewarding-0.075
 Earth-0.074
Token "
Token ID366
Feature activation+1.71e-04

Pos logit contributions
 stool+0.073
itten+0.068
cloth+0.068
 knob+0.060
brush+0.060
Neg logit contributions
 Friendship-0.049
 Friends-0.046
 Captain-0.044
 Earth-0.044
minded-0.044
TokenI
Token ID40
Feature activation+1.02e-02

Pos logit contributions
 stool+0.000
cloth+0.000
 counter+0.000
box+0.000
 chair+0.000
Neg logit contributions
 culture-0.000
zo-0.000
 planets-0.000
 adventures-0.000
 aliens-0.000
Token'll
Token ID1183
Feature activation+3.95e-02

Pos logit contributions
 stool+0.021
cloth+0.018
 counter+0.017
box+0.016
 chair+0.016
Neg logit contributions
zo-0.017
 culture-0.016
 habitats-0.016
 galaxies-0.016
 planets-0.016
TokenDon
Token ID3987
Feature activation+2.94e-02

Pos logit contributions
 stool+0.061
cloth+0.051
 counter+0.048
box+0.046
 chair+0.046
Neg logit contributions
zo-0.052
 culture-0.052
minded-0.050
 adventures-0.049
 Friendship-0.049
Token't
Token ID470
Feature activation+3.26e-02

Pos logit contributions
 stool+0.059
cloth+0.051
 counter+0.047
 wra+0.045
 chair+0.044
Neg logit contributions
zo-0.048
 Friendship-0.046
 rewarding-0.045
 aliens-0.045
 adventures-0.045
Token worry
Token ID5490
Feature activation+5.19e-02

Pos logit contributions
 stool+0.064
cloth+0.051
 counter+0.051
box+0.050
 chair+0.046
Neg logit contributions
zo-0.059
 rewarding-0.057
 adventures-0.056
 culture-0.056
 villains-0.055
Token,"
Token ID553
Feature activation+8.51e-02

Pos logit contributions
 stool+0.115
cloth+0.095
box+0.094
 counter+0.093
itten+0.093
Neg logit contributions
 adventures-0.068
 aliens-0.067
zo-0.066
 rewarding-0.064
 Earth-0.064
Token she
Token ID673
Feature activation+7.68e-02

Pos logit contributions
 stool+0.170
itten+0.152
pit+0.146
box+0.145
cloth+0.143
Neg logit contributions
 adventures-0.130
 rewarding-0.124
minded-0.123
 Friendship-0.122
 culture-0.121
Token said
Token ID531
Feature activation+8.99e-02

Pos logit contributions
 stool+0.164
cloth+0.137
 counter+0.134
itten+0.124
 chair+0.123
Neg logit contributions
zo-0.129
minded-0.113
 villains-0.110
 adventures-0.109
 sailors-0.108
Token.
Token ID13
Feature activation+5.51e-02

Pos logit contributions
 stool+0.220
itten+0.191
pit+0.189
box+0.185
cloth+0.185
Neg logit contributions
minded-0.092
zo-0.088
 sailors-0.082
 adventures-0.082
 rewarding-0.082
Token "
Token ID366
Feature activation+2.11e-02

Pos logit contributions
 stool+0.108
itten+0.099
cloth+0.094
 knob+0.087
box+0.086
Neg logit contributions
 Friendship-0.086
minded-0.086
 Earth-0.085
 culture-0.080
 Friends-0.079
TokenI
Token ID40
Feature activation+1.24e-02

Pos logit contributions
 stool+0.041
cloth+0.035
 counter+0.033
box+0.031
 chair+0.031
Neg logit contributions
zo-0.037
 culture-0.036
 habitats-0.035
 galaxies-0.035
minded-0.035
Token have
Token ID423
Feature activation-1.60e-02

Pos logit contributions
 stool+0.025
cloth+0.021
 counter+0.020
box+0.019
 chair+0.019
Neg logit contributions
zo-0.021
 culture-0.021
 habitats-0.020
 galaxies-0.020
 planets-0.020
Token an
Token ID281
Feature activation+1.92e-02

Pos logit contributions
zo+0.024
 culture+0.024
 habitats+0.023
 galaxies+0.023
minded+0.023
Neg logit contributions
 stool-0.035
cloth-0.030
 counter-0.028
box-0.027
 chair-0.027
Token day
Token ID1110
Feature activation+9.77e-04

Pos logit contributions
 habitats+0.046
minded+0.045
 culture+0.045
 cultures+0.043
 galaxies+0.043
Neg logit contributions
 stool-0.051
 chair-0.042
 bowl-0.041
 spoon-0.040
cloth-0.039
Token on
Token ID319
Feature activation-1.58e-02

Pos logit contributions
 stool+0.002
cloth+0.001
 counter+0.001
box+0.001
 chair+0.001
Neg logit contributions
 culture-0.002
zo-0.002
 habitats-0.002
 galaxies-0.002
minded-0.002
Token,
Token ID11
Feature activation-3.93e-02

Pos logit contributions
zo+0.031
 adventures+0.030
 culture+0.030
 planets+0.029
 Earth+0.029
Neg logit contributions
 stool-0.028
cloth-0.023
 counter-0.022
box-0.020
 wra-0.020
Token the
Token ID262
Feature activation-1.13e-02

Pos logit contributions
 culture+0.091
 habitats+0.091
zo+0.087
minded+0.084
 cultures+0.084
Neg logit contributions
 stool-0.051
 ladder-0.040
 chair-0.039
 spoon-0.037
 bowl-0.037
Token successful
Token ID4388
Feature activation-2.98e-05

Pos logit contributions
zo+0.025
 culture+0.024
 habitats+0.024
minded+0.023
 galaxies+0.023
Neg logit contributions
 stool-0.018
 bowl-0.014
 spoon-0.014
cloth-0.014
 chair-0.014
Token frog
Token ID21264
Feature activation-9.06e-02

Pos logit contributions
minded+0.000
zo+0.000
 culture+0.000
 rewarding+0.000
 habitats+0.000
Neg logit contributions
 stool-0.000
cloth-0.000
 chair-0.000
 bowl-0.000
 counter-0.000
Token continued
Token ID3767
Feature activation-5.05e-02

Pos logit contributions
 culture+0.205
 rewarding+0.194
 diversity+0.193
 cultures+0.193
 habitats+0.193
Neg logit contributions
 stool-0.105
 slipped-0.085
cloth-0.084
 spoon-0.082
 chair-0.081
Token to
Token ID284
Feature activation-2.03e-02

Pos logit contributions
 rewarding+0.107
 culture+0.107
 habitats+0.103
zo+0.102
 galaxies+0.102
Neg logit contributions
 stool-0.076
 chair-0.058
cloth-0.054
 spoon-0.053
 bowl-0.052
Token swim
Token ID9422
Feature activation-5.87e-02

Pos logit contributions
 culture+0.031
minded+0.031
 habitats+0.031
 galaxies+0.030
zo+0.030
Neg logit contributions
 stool-0.041
cloth-0.035
 spoon-0.033
 chair-0.033
 bowl-0.033
Token and
Token ID290
Feature activation-6.31e-02

Pos logit contributions
 culture+0.132
 Friendship+0.129
 rewarding+0.128
 habitats+0.125
 villains+0.124
Neg logit contributions
 stool-0.075
cloth-0.063
 chair-0.053
 bowl-0.052
 tin-0.051
Token have
Token ID423
Feature activation-5.71e-02

Pos logit contributions
 habitats+0.113
 culture+0.112
zo+0.112
 languages+0.111
 galaxies+0.109
Neg logit contributions
 stool-0.110
 spoon-0.089
 chair-0.088
 bowl-0.088
cloth-0.086
TokenHello
Token ID15496
Feature activation-2.29e-02

Pos logit contributions
 stool+0.014
cloth+0.012
 counter+0.011
box+0.011
 chair+0.011
Neg logit contributions
zo-0.010
 culture-0.010
 galaxies-0.010
 planets-0.010
 rewarding-0.010
Token,
Token ID11
Feature activation-6.96e-03

Pos logit contributions
zo+0.052
 culture+0.051
 habitats+0.050
 galaxies+0.050
minded+0.049
Neg logit contributions
 stool-0.033
cloth-0.025
 counter-0.024
 chair-0.023
 bowl-0.022
Token can
Token ID460
Feature activation+2.23e-03

Pos logit contributions
zo+0.012
 culture+0.012
 habitats+0.011
minded+0.011
 galaxies+0.011
Neg logit contributions
 stool-0.014
cloth-0.012
 counter-0.011
 chair-0.011
 bowl-0.010
Token we
Token ID356
Feature activation-2.57e-02

Pos logit contributions
 stool+0.004
cloth+0.003
 counter+0.003
 chair+0.003
 spoon+0.003
Neg logit contributions
 habitats-0.004
zo-0.004
 galaxies-0.004
 culture-0.004
 planets-0.004
Token be
Token ID307
Feature activation-3.99e-02

Pos logit contributions
 culture+0.040
zo+0.040
 galaxies+0.039
 habitats+0.039
 planets+0.038
Neg logit contributions
 stool-0.053
cloth-0.046
 counter-0.043
 bowl-0.042
 spoon-0.042
Token friends
Token ID2460
Feature activation-8.83e-02

Pos logit contributions
 habitats+0.066
 galaxies+0.066
 culture+0.066
zo+0.065
minded+0.065
Neg logit contributions
 stool-0.072
cloth-0.063
 bowl-0.060
box-0.058
 spoon-0.057
Token?"
Token ID1701
Feature activation-1.91e-02

Pos logit contributions
 habitats+0.205
 galaxies+0.198
 culture+0.196
zo+0.194
 rewarding+0.191
Neg logit contributions
 stool-0.117
 bowl-0.092
cloth-0.089
 chair-0.087
box-0.085
Token
Token ID198
Feature activation-2.77e-02

Pos logit contributions
 rewarding+0.036
 culture+0.035
 galaxies+0.034
 cultures+0.034
 habitats+0.033
Neg logit contributions
 stool-0.034
 counter-0.028
 chair-0.028
 bowl-0.027
cloth-0.026
Token
Token ID198
Feature activation-1.09e-02

Pos logit contributions
 culture+0.063
 habitats+0.061
 cultures+0.060
 galaxies+0.059
 planets+0.059
Neg logit contributions
 stool-0.040
 chair-0.030
 counter-0.030
cloth-0.030
 bowl-0.028
TokenThe
Token ID464
Feature activation-1.65e-03

Pos logit contributions
 culture+0.019
 habitats+0.019
zo+0.019
 planets+0.018
 galaxies+0.018
Neg logit contributions
 stool-0.021
cloth-0.017
 counter-0.016
 chair-0.015
box-0.015
Token other
Token ID584
Feature activation-1.25e-03

Pos logit contributions
zo+0.003
 culture+0.003
 habitats+0.003
 galaxies+0.003
minded+0.003
Neg logit contributions
 stool-0.003
cloth-0.002
 chair-0.002
 counter-0.002
 bowl-0.002
Token Do
Token ID2141
Feature activation-1.45e-02

Pos logit contributions
zo+0.078
 adventures+0.077
 culture+0.075
 targets+0.073
 conflicts+0.072
Neg logit contributions
 stool-0.078
cloth-0.064
 counter-0.063
 chair-0.058
box-0.056
Token you
Token ID345
Feature activation-4.71e-02

Pos logit contributions
minded+0.025
 culture+0.025
 solar+0.025
zo+0.024
 habitats+0.024
Neg logit contributions
 stool-0.025
cloth-0.020
 wra-0.020
 counter-0.019
 chair-0.019
Token want
Token ID765
Feature activation-4.00e-02

Pos logit contributions
 habitats+0.092
 culture+0.092
minded+0.092
 seals+0.090
zo+0.088
Neg logit contributions
cloth-0.068
 stool-0.066
 brush-0.062
 spoon-0.059
 sponge-0.054
Token to
Token ID284
Feature activation+1.73e-02

Pos logit contributions
zo+0.087
 habitats+0.084
 rewarding+0.084
 culture+0.083
 victories+0.082
Neg logit contributions
 stool-0.054
 spoon-0.044
 bowl-0.044
 counter-0.044
 chair-0.042
Token be
Token ID307
Feature activation-4.14e-02

Pos logit contributions
 stool+0.033
cloth+0.028
 counter+0.027
 chair+0.025
 spoon+0.025
Neg logit contributions
zo-0.030
 culture-0.029
 galaxies-0.029
minded-0.029
 habitats-0.028
Token friends
Token ID2460
Feature activation-8.65e-02

Pos logit contributions
zo+0.076
 habitats+0.075
 culture+0.073
 galaxies+0.072
minded+0.072
Neg logit contributions
 stool-0.068
cloth-0.060
 bowl-0.056
 spoon-0.054
 brush-0.054
Token?"
Token ID1701
Feature activation+4.10e-03

Pos logit contributions
 habitats+0.205
 galaxies+0.196
zo+0.194
 rewarding+0.193
 culture+0.192
Neg logit contributions
 stool-0.108
 bowl-0.084
cloth-0.083
 chair-0.081
 spoon-0.080
Token Lily
Token ID20037
Feature activation+1.95e-02

Pos logit contributions
 stool+0.007
cloth+0.006
 counter+0.006
 chair+0.006
box+0.005
Neg logit contributions
 culture-0.008
zo-0.008
 habitats-0.008
 galaxies-0.008
 planets-0.008
Token nodded
Token ID14464
Feature activation+2.89e-02

Pos logit contributions
 stool+0.035
cloth+0.028
 counter+0.027
 chair+0.025
box+0.025
Neg logit contributions
 culture-0.037
zo-0.037
 rewarding-0.036
 habitats-0.036
 galaxies-0.036
Token her
Token ID607
Feature activation+3.72e-02

Pos logit contributions
 stool+0.050
cloth+0.041
 counter+0.038
 chair+0.036
box+0.035
Neg logit contributions
 culture-0.057
zo-0.057
 habitats-0.055
 rewarding-0.055
 galaxies-0.054
Token head
Token ID1182
Feature activation+3.31e-02

Pos logit contributions
 stool+0.048
cloth+0.037
 counter+0.034
box+0.031
 wra+0.029
Neg logit contributions
zo-0.089
 culture-0.088
 adventures-0.086
 planets-0.086
 habitats-0.085
Token Do
Token ID2141
Feature activation-6.69e-03

Pos logit contributions
zo+0.059
 adventures+0.057
 culture+0.057
minded+0.055
 habitats+0.055
Neg logit contributions
 stool-0.066
cloth-0.056
 counter-0.054
 chair-0.051
box-0.050
Token you
Token ID345
Feature activation-4.80e-02

Pos logit contributions
zo+0.011
 galaxies+0.011
 habitats+0.011
 culture+0.011
minded+0.011
Neg logit contributions
 stool-0.013
cloth-0.011
 counter-0.010
 wra-0.010
 chair-0.010
Token want
Token ID765
Feature activation-3.64e-02

Pos logit contributions
 habitats+0.094
 culture+0.093
minded+0.093
 islands+0.092
 seals+0.092
Neg logit contributions
 stool-0.071
cloth-0.070
 brush-0.063
 spoon-0.059
coat-0.058
Token to
Token ID284
Feature activation+3.69e-03

Pos logit contributions
zo+0.071
 habitats+0.068
 rewarding+0.067
 culture+0.067
 galaxies+0.066
Neg logit contributions
 stool-0.062
 counter-0.050
 chair-0.049
cloth-0.049
box-0.049
Token be
Token ID307
Feature activation-3.73e-02

Pos logit contributions
 stool+0.007
cloth+0.006
 counter+0.005
 chair+0.005
 brush+0.005
Neg logit contributions
minded-0.006
zo-0.006
 culture-0.006
 habitats-0.006
 galaxies-0.006
Token friends
Token ID2460
Feature activation-8.37e-02

Pos logit contributions
zo+0.070
 habitats+0.070
 culture+0.069
minded+0.068
 galaxies+0.068
Neg logit contributions
 stool-0.061
cloth-0.053
 chair-0.048
box-0.047
 counter-0.047
Token?"
Token ID1701
Feature activation+1.04e-02

Pos logit contributions
 habitats+0.187
 galaxies+0.182
zo+0.182
 rewarding+0.177
 culture+0.177
Neg logit contributions
 stool-0.118
 chair-0.091
cloth-0.091
 bowl-0.088
box-0.088
Token The
Token ID383
Feature activation+2.43e-02

Pos logit contributions
 stool+0.023
cloth+0.019
 counter+0.019
 chair+0.018
box+0.017
Neg logit contributions
zo-0.016
 culture-0.016
 rewarding-0.015
 galaxies-0.015
 habitats-0.015
Token girl
Token ID2576
Feature activation+2.48e-02

Pos logit contributions
 stool+0.034
cloth+0.027
 counter+0.025
 chair+0.023
box+0.022
Neg logit contributions
zo-0.056
 culture-0.055
 habitats-0.054
 galaxies-0.053
 rewarding-0.053
Token smiled
Token ID13541
Feature activation+9.03e-03

Pos logit contributions
 stool+0.039
cloth+0.031
 counter+0.029
 chair+0.027
box+0.027
Neg logit contributions
zo-0.052
 culture-0.052
 habitats-0.050
 galaxies-0.050
 rewarding-0.050
Token and
Token ID290
Feature activation+8.56e-03

Pos logit contributions
 stool+0.016
cloth+0.014
 counter+0.013
 chair+0.012
box+0.012
Neg logit contributions
zo-0.017
 culture-0.017
 habitats-0.016
 rewarding-0.016
 galaxies-0.016
Token Do
Token ID2141
Feature activation-6.58e-03

Pos logit contributions
zo+0.067
 adventures+0.067
 conflicts+0.064
 habitats+0.063
 kingdom+0.062
Neg logit contributions
 stool-0.053
cloth-0.048
 counter-0.046
 lid-0.041
box-0.040
Token you
Token ID345
Feature activation-4.09e-02

Pos logit contributions
zo+0.011
minded+0.011
 habitats+0.011
 galaxies+0.011
 culture+0.011
Neg logit contributions
 stool-0.012
cloth-0.010
 wra-0.010
 chair-0.009
 counter-0.009
Token want
Token ID765
Feature activation-3.17e-02

Pos logit contributions
 habitats+0.076
 culture+0.076
minded+0.075
 galaxies+0.074
 seals+0.074
Neg logit contributions
 stool-0.069
cloth-0.066
 brush-0.056
 spoon-0.054
 knob-0.053
Token to
Token ID284
Feature activation+8.64e-03

Pos logit contributions
zo+0.065
 habitats+0.063
 rewarding+0.061
 victories+0.061
 culture+0.060
Neg logit contributions
 stool-0.051
 counter-0.043
box-0.041
 chair-0.041
 bowl-0.041
Token join
Token ID4654
Feature activation-4.35e-02

Pos logit contributions
 stool+0.016
cloth+0.013
 counter+0.013
 chair+0.012
 brush+0.012
Neg logit contributions
zo-0.015
minded-0.015
 culture-0.015
 galaxies-0.015
 rewarding-0.015
Token us
Token ID514
Feature activation-8.35e-02

Pos logit contributions
 rewarding+0.070
zo+0.068
 habitats+0.065
 culture+0.065
minded+0.064
Neg logit contributions
 stool-0.089
cloth-0.079
 counter-0.073
 chair-0.071
box-0.071
Token?"
Token ID1701
Feature activation-1.29e-02

Pos logit contributions
zo+0.165
 galaxies+0.161
 culture+0.160
 habitats+0.160
 villains+0.159
Neg logit contributions
 stool-0.135
cloth-0.110
 bowl-0.102
box-0.102
 chair-0.100
Token \n (newline)
Token ID198
Feature activation-1.07e-02

Pos logit contributions
 culture+0.026
 rewarding+0.026
 habitats+0.025
 galaxies+0.025
zo+0.024
Neg logit contributions
 stool-0.022
 counter-0.018
cloth-0.017
 chair-0.017
 bowl-0.016
Token \n (newline)
Token ID198
Feature activation-1.16e-03

Pos logit contributions
 culture+0.019
zo+0.018
 habitats+0.018
 rewarding+0.018
 galaxies+0.018
Neg logit contributions
 stool-0.021
cloth-0.017
 counter-0.017
 chair-0.016
box-0.016
TokenSam
Token ID16305
Feature activation+3.79e-02

Pos logit contributions
zo+0.002
 culture+0.002
minded+0.002
 habitats+0.002
 galaxies+0.002
Neg logit contributions
 stool-0.002
cloth-0.002
 counter-0.002
box-0.002
 chair-0.002
Token said
Token ID531
Feature activation+3.97e-02

Pos logit contributions
 stool+0.074
cloth+0.061
 counter+0.057
box+0.055
itten+0.054
Neg logit contributions
zo-0.067
 culture-0.063
 adventures-0.062
minded-0.062
 Friendship-0.061
Token that
Token ID326
Feature activation-1.82e-02

Pos logit contributions
 villains+0.095
 battles+0.093
 Friendship+0.092
 victories+0.092
zo+0.089
Neg logit contributions
 stool-0.083
cloth-0.064
 chair-0.063
 knob-0.061
 bowl-0.060
Token day
Token ID1110
Feature activation+9.72e-04

Pos logit contributions
minded+0.036
 habitats+0.035
 villains+0.035
 culture+0.035
 aliens+0.034
Neg logit contributions
 stool-0.034
 chair-0.028
 bowl-0.026
 spoon-0.025
 cup-0.024
Token on
Token ID319
Feature activation-1.91e-02

Pos logit contributions
 stool+0.002
cloth+0.001
box+0.001
itten+0.001
 counter+0.001
Neg logit contributions
 culture-0.002
zo-0.002
 habitats-0.002
 galaxies-0.002
minded-0.002
Token,
Token ID11
Feature activation-3.72e-02

Pos logit contributions
zo+0.046
 culture+0.046
minded+0.045
 habitats+0.045
 galaxies+0.045
Neg logit contributions
 stool-0.025
cloth-0.019
 counter-0.017
 chair-0.016
box-0.016
Token Nem
Token ID22547
Feature activation-7.57e-02

Pos logit contributions
 languages+0.072
 habitats+0.071
 villains+0.069
zo+0.069
 culture+0.069
Neg logit contributions
 stool-0.061
 bowl-0.055
 spoon-0.054
cloth-0.054
 counter-0.053
Tokeno
Token ID78
Feature activation-8.30e-02

Pos logit contributions
 culture+0.171
 aliens+0.170
 galaxies+0.166
 planets+0.164
 extraordinary+0.164
Neg logit contributions
 stool-0.102
cloth-0.077
box-0.075
itten-0.074
 chair-0.067
Token knew
Token ID2993
Feature activation-3.84e-02

Pos logit contributions
 culture+0.175
 rewarding+0.170
 diversity+0.169
 victories+0.167
 cultures+0.166
Neg logit contributions
 stool-0.112
cloth-0.095
box-0.087
 chair-0.085
 bowl-0.083
Token that
Token ID326
Feature activation-3.87e-02

Pos logit contributions
zo+0.068
 solar+0.066
 culture+0.066
 habitats+0.066
minded+0.064
Neg logit contributions
 stool-0.068
box-0.056
cloth-0.053
 chair-0.053
 counter-0.052
Token even
Token ID772
Feature activation-1.63e-02

Pos logit contributions
zo+0.082
 habitats+0.081
minded+0.074
 culture+0.073
 victories+0.070
Neg logit contributions
 stool-0.056
 chair-0.045
cloth-0.045
 spoon-0.043
 bowl-0.042
Token though
Token ID996
Feature activation-3.51e-02

Pos logit contributions
 habitats+0.035
zo+0.034
 culture+0.034
 Friendship+0.033
 solar+0.032
Neg logit contributions
 stool-0.021
 wra-0.016
cloth-0.016
 chair-0.015
 tin-0.015
Token there
Token ID612
Feature activation-1.15e-02

Pos logit contributions
zo+0.066
minded+0.064
 habitats+0.062
 culture+0.062
 galaxies+0.060
Neg logit contributions
 stool-0.061
cloth-0.052
 counter-0.048
 wra-0.046
 chair-0.046
Token explore
Token ID7301
Feature activation-3.67e-02

Pos logit contributions
zo+0.043
 culture+0.042
 habitats+0.040
 galaxies+0.040
minded+0.040
Neg logit contributions
 stool-0.063
cloth-0.054
 counter-0.052
box-0.050
 chair-0.049
Token.
Token ID13
Feature activation-4.31e-02

Pos logit contributions
zo+0.077
 habitats+0.072
 villains+0.071
 culture+0.069
minded+0.066
Neg logit contributions
 stool-0.060
 chair-0.047
 counter-0.045
 bowl-0.045
 ladder-0.042
Token Will
Token ID2561
Feature activation-4.94e-02

Pos logit contributions
zo+0.079
 culture+0.075
 adventures+0.075
 habitats+0.073
 diversity+0.072
Neg logit contributions
 stool-0.082
 counter-0.067
cloth-0.066
box-0.060
 chair-0.060
Token you
Token ID345
Feature activation-4.27e-02

Pos logit contributions
 habitats+0.089
minded+0.083
 villains+0.082
zo+0.082
 galaxies+0.082
Neg logit contributions
 stool-0.088
cloth-0.073
 spoon-0.070
 chair-0.068
 counter-0.068
Token join
Token ID4654
Feature activation-5.22e-02

Pos logit contributions
 culture+0.080
 habitats+0.080
zo+0.078
minded+0.078
 diversity+0.077
Neg logit contributions
 stool-0.075
cloth-0.062
 counter-0.057
 bowl-0.056
box-0.056
Token us
Token ID514
Feature activation-8.30e-02

Pos logit contributions
zo+0.088
 rewarding+0.088
 culture+0.087
minded+0.085
 habitats+0.085
Neg logit contributions
 stool-0.099
cloth-0.087
 counter-0.079
 chair-0.078
box-0.078
Token?"
Token ID1701
Feature activation+1.10e-02

Pos logit contributions
zo+0.144
 culture+0.141
 habitats+0.139
 galaxies+0.137
 diversity+0.136
Neg logit contributions
 stool-0.159
cloth-0.131
 counter-0.124
box-0.122
 chair-0.120
Token \n (newline)
Token ID198
Feature activation+1.17e-03

Pos logit contributions
 stool+0.025
cloth+0.022
box+0.021
 counter+0.020
 wra+0.020
Neg logit contributions
 Friendship-0.015
 Earth-0.015
 adventures-0.015
zo-0.014
 culture-0.014
Token \n (newline)
Token ID198
Feature activation+1.90e-02

Pos logit contributions
 stool+0.003
cloth+0.002
 counter+0.002
box+0.002
 chair+0.002
Neg logit contributions
zo-0.002
 culture-0.002
 habitats-0.001
 adventures-0.001
 galaxies-0.001
TokenLu
Token ID25596
Feature activation+5.65e-03

Pos logit contributions
 stool+0.036
cloth+0.031
box+0.030
itten+0.028
 counter+0.028
Neg logit contributions
 Friendship-0.031
 adventures-0.031
 culture-0.031
 rewarding-0.031
 Earth-0.030
Tokency
Token ID948
Feature activation+1.39e-02

Pos logit contributions
 stool+0.011
cloth+0.009
 counter+0.009
box+0.008
 chair+0.008
Neg logit contributions
zo-0.010
 culture-0.009
minded-0.009
 habitats-0.009
 Friendship-0.009
Token you
Token ID345
Feature activation-4.85e-02

Pos logit contributions
 galaxies+0.038
minded+0.038
 habitats+0.037
zo+0.037
 rewarding+0.037
Neg logit contributions
 stool-0.033
cloth-0.027
 wra-0.026
 chair-0.025
 counter-0.025
Token want
Token ID765
Feature activation-4.66e-02

Pos logit contributions
 habitats+0.090
minded+0.089
 culture+0.089
 islands+0.089
 galaxies+0.088
Neg logit contributions
cloth-0.072
 stool-0.069
 brush-0.066
 spoon-0.061
coat-0.059
Token to
Token ID284
Feature activation+4.49e-04

Pos logit contributions
zo+0.107
 habitats+0.103
 galaxies+0.101
 rewarding+0.100
ieving+0.100
Neg logit contributions
 stool-0.057
 chair-0.045
 bowl-0.045
 door-0.044
 counter-0.044
Token be
Token ID307
Feature activation-4.49e-02

Pos logit contributions
 stool+0.001
cloth+0.001
 counter+0.001
 brush+0.001
 handle+0.001
Neg logit contributions
 galaxies-0.001
zo-0.001
 culture-0.001
minded-0.001
 habitats-0.001
Token my
Token ID616
Feature activation-4.09e-02

Pos logit contributions
 habitats+0.092
zo+0.092
 galaxies+0.091
minded+0.089
 culture+0.088
Neg logit contributions
 stool-0.058
cloth-0.051
 brush-0.047
 bowl-0.046
box-0.044
Token friend
Token ID1545
Feature activation-8.24e-02

Pos logit contributions
zo+0.087
 galaxies+0.086
minded+0.082
 battles+0.081
 habitats+0.081
Neg logit contributions
 stool-0.064
 chair-0.049
 bowl-0.046
 spoon-0.045
 counter-0.045
Token?"
Token ID1701
Feature activation-2.01e-02

Pos logit contributions
 galaxies+0.171
 habitats+0.171
 culture+0.171
zo+0.165
 rewarding+0.164
Neg logit contributions
 stool-0.129
 bowl-0.100
cloth-0.100
 chair-0.098
 counter-0.095
Token The
Token ID383
Feature activation+2.78e-03

Pos logit contributions
 culture+0.033
 rewarding+0.032
 habitats+0.032
zo+0.031
 galaxies+0.030
Neg logit contributions
 stool-0.042
 chair-0.034
 counter-0.034
cloth-0.033
 spoon-0.033
Token fly
Token ID6129
Feature activation+2.28e-03

Pos logit contributions
 stool+0.004
 chair+0.003
 counter+0.003
cloth+0.003
 spoon+0.003
Neg logit contributions
 habitats-0.006
zo-0.006
 culture-0.006
 galaxies-0.006
minded-0.006
Token was
Token ID373
Feature activation-1.85e-02

Pos logit contributions
 stool+0.003
cloth+0.003
 counter+0.003
 chair+0.002
box+0.002
Neg logit contributions
 culture-0.005
zo-0.005
 habitats-0.005
minded-0.005
 rewarding-0.005
Token happy
Token ID3772
Feature activation-3.35e-02

Pos logit contributions
zo+0.030
 culture+0.029
minded+0.028
 habitats+0.028
 galaxies+0.028
Neg logit contributions
 stool-0.039
cloth-0.033
 counter-0.032
box-0.030
 chair-0.030
Token Do
Token ID2141
Feature activation-2.37e-02

Pos logit contributions
zo+0.093
 targets+0.090
 adventures+0.089
 battles+0.087
 culture+0.086
Neg logit contributions
 stool-0.075
cloth-0.067
 counter-0.066
 Lee-0.057
 chair-0.056
Token you
Token ID345
Feature activation-5.24e-02

Pos logit contributions
 galaxies+0.043
zo+0.043
 targets+0.043
minded+0.043
 solar+0.042
Neg logit contributions
 stool-0.040
 wra-0.031
 counter-0.031
cloth-0.031
 chair-0.030
Token want
Token ID765
Feature activation-4.07e-02

Pos logit contributions
 habitats+0.090
minded+0.090
 culture+0.089
 seals+0.087
 targets+0.086
Neg logit contributions
 stool-0.093
cloth-0.090
 brush-0.078
 spoon-0.075
 knob-0.073
Token to
Token ID284
Feature activation+3.68e-03

Pos logit contributions
zo+0.081
 habitats+0.077
 rewarding+0.077
 victories+0.077
 culture+0.076
Neg logit contributions
 stool-0.067
 counter-0.058
 bowl-0.054
 chair-0.054
cloth-0.054
Token join
Token ID4654
Feature activation-5.05e-02

Pos logit contributions
 stool+0.007
cloth+0.006
 counter+0.006
 chair+0.006
 bowl+0.005
Neg logit contributions
zo-0.006
 culture-0.006
minded-0.006
 galaxies-0.006
 habitats-0.006
Token us
Token ID514
Feature activation-8.22e-02

Pos logit contributions
zo+0.076
 rewarding+0.076
 culture+0.073
 habitats+0.071
minded+0.071
Neg logit contributions
 stool-0.108
cloth-0.095
 counter-0.090
 chair-0.087
box-0.085
Token?"
Token ID1701
Feature activation-9.44e-03

Pos logit contributions
zo+0.158
 villains+0.154
 culture+0.153
 habitats+0.153
 galaxies+0.152
Neg logit contributions
 stool-0.139
cloth-0.113
 bowl-0.108
 counter-0.107
box-0.107
Token Sara
Token ID24799
Feature activation+2.19e-02

Pos logit contributions
zo+0.020
 culture+0.020
 habitats+0.019
 galaxies+0.019
 battles+0.019
Neg logit contributions
 stool-0.015
cloth-0.012
 counter-0.012
 chair-0.011
 spoon-0.011
Token asked
Token ID1965
Feature activation+3.34e-02

Pos logit contributions
 stool+0.042
cloth+0.035
 counter+0.033
box+0.031
 chair+0.031
Neg logit contributions
zo-0.039
 culture-0.039
 habitats-0.037
 galaxies-0.037
 rewarding-0.037
Token.
Token ID13
Feature activation+5.02e-03

Pos logit contributions
 stool+0.071
cloth+0.060
 counter+0.057
box+0.055
 chair+0.054
Neg logit contributions
zo-0.052
 culture-0.050
minded-0.050
 planets-0.049
 adventures-0.049
Token \n (newline)
Token ID198
Feature activation-3.87e-03

Pos logit contributions
 stool+0.011
cloth+0.010
 counter+0.009
box+0.009
 chair+0.009
Neg logit contributions
zo-0.007
 culture-0.007
 habitats-0.007
 galaxies-0.007
 planets-0.007
Token,
Token ID11
Feature activation+1.27e-02

Pos logit contributions
 stool+0.033
cloth+0.027
 counter+0.027
itten+0.027
box+0.026
Neg logit contributions
 culture-0.023
 marriage-0.023
zo-0.023
 villains-0.022
 sailors-0.022
Token "
Token ID366
Feature activation+1.58e-02

Pos logit contributions
 stool+0.023
cloth+0.021
box+0.019
itten+0.018
 wra+0.018
Neg logit contributions
zo-0.021
 Friendship-0.021
 culture-0.021
 sailors-0.021
 Earth-0.020
TokenCan
Token ID6090
Feature activation+1.35e-02

Pos logit contributions
 stool+0.028
cloth+0.023
 counter+0.021
box+0.020
 chair+0.020
Neg logit contributions
zo-0.030
 culture-0.030
 habitats-0.029
 galaxies-0.029
 planets-0.029
Token we
Token ID356
Feature activation-2.22e-02

Pos logit contributions
 stool+0.029
cloth+0.024
 counter+0.023
 chair+0.022
box+0.022
Neg logit contributions
zo-0.021
 culture-0.021
 habitats-0.020
 galaxies-0.020
 planets-0.020
Token be
Token ID307
Feature activation-3.69e-02

Pos logit contributions
zo+0.036
 culture+0.035
 galaxies+0.034
 habitats+0.034
 planets+0.034
Neg logit contributions
 stool-0.046
cloth-0.039
 counter-0.037
box-0.035
 chair-0.035
Token friends
Token ID2460
Feature activation-8.20e-02

Pos logit contributions
zo+0.059
 habitats+0.059
 culture+0.059
 galaxies+0.058
 Jupiter+0.055
Neg logit contributions
 stool-0.074
cloth-0.064
box-0.060
 chair-0.059
 bowl-0.058
Token and
Token ID290
Feature activation-2.48e-02

Pos logit contributions
 habitats+0.165
 galaxies+0.162
zo+0.162
 culture+0.157
 planets+0.157
Neg logit contributions
 stool-0.139
cloth-0.112
box-0.107
 chair-0.107
 bowl-0.106
Token play
Token ID711
Feature activation-3.71e-02

Pos logit contributions
zo+0.037
 culture+0.036
 habitats+0.034
 galaxies+0.034
minded+0.034
Neg logit contributions
 stool-0.055
cloth-0.047
 counter-0.045
 chair-0.043
box-0.043
Token together
Token ID1978
Feature activation-4.75e-02

Pos logit contributions
 habitats+0.080
zo+0.078
 culture+0.076
 galaxies+0.075
 cultures+0.074
Neg logit contributions
 stool-0.056
box-0.046
 bowl-0.045
 chair-0.044
cloth-0.044
Token again
Token ID757
Feature activation-5.65e-02

Pos logit contributions
zo+0.077
 culture+0.075
 galaxies+0.073
 habitats+0.073
minded+0.072
Neg logit contributions
 stool-0.098
cloth-0.083
 counter-0.079
 chair-0.077
box-0.077
Token next
Token ID1306
Feature activation-1.37e-02

Pos logit contributions
zo+0.104
 culture+0.100
 villains+0.100
 galaxies+0.099
 habitats+0.098
Neg logit contributions
 stool-0.104
 chair-0.081
cloth-0.080
 bowl-0.079
 counter-0.079
Token From
Token ID3574
Feature activation-4.55e-02

Pos logit contributions
zo+0.106
minded+0.101
 battles+0.101
 galaxies+0.100
ieve+0.100
Neg logit contributions
 stool-0.089
 counter-0.076
 ladder-0.071
 bowl-0.071
 chair-0.070
Token that
Token ID326
Feature activation-1.46e-02

Pos logit contributions
 villains+0.087
 battles+0.086
 victories+0.085
 habitats+0.083
zo+0.083
Neg logit contributions
 stool-0.078
 chair-0.060
 knob-0.058
cloth-0.058
 bowl-0.057
Token day
Token ID1110
Feature activation+1.13e-03

Pos logit contributions
 habitats+0.027
minded+0.027
 culture+0.026
 seals+0.025
 cultures+0.025
Neg logit contributions
 stool-0.029
 chair-0.024
 bowl-0.022
 ladder-0.022
box-0.022
Token on
Token ID319
Feature activation-1.72e-02

Pos logit contributions
 stool+0.002
cloth+0.002
box+0.001
 counter+0.001
itten+0.001
Neg logit contributions
 culture-0.002
zo-0.002
 habitats-0.002
 galaxies-0.002
minded-0.002
Token,
Token ID11
Feature activation-4.09e-02

Pos logit contributions
zo+0.038
 culture+0.037
 habitats+0.036
minded+0.036
 galaxies+0.036
Neg logit contributions
 stool-0.026
cloth-0.021
 counter-0.019
 chair-0.018
box-0.018
Token Bob
Token ID5811
Feature activation-8.14e-02

Pos logit contributions
zo+0.091
 culture+0.089
 habitats+0.088
minded+0.088
 villains+0.086
Neg logit contributions
 stool-0.056
 ladder-0.046
 bowl-0.042
 chair-0.041
 spoon-0.038
Token always
Token ID1464
Feature activation-3.11e-02

Pos logit contributions
 culture+0.177
 victories+0.170
 rewarding+0.169
 diversity+0.168
 aliens+0.168
Neg logit contributions
 stool-0.110
cloth-0.084
 chair-0.082
box-0.081
 slipped-0.079
Token used
Token ID973
Feature activation-2.84e-02

Pos logit contributions
zo+0.060
 culture+0.060
 habitats+0.059
 galaxies+0.057
 aliens+0.057
Neg logit contributions
 stool-0.052
cloth-0.043
box-0.039
 counter-0.038
 chair-0.038
Token his
Token ID465
Feature activation-2.77e-02

Pos logit contributions
 culture+0.053
zo+0.053
minded+0.052
 habitats+0.051
 rewarding+0.050
Neg logit contributions
 stool-0.050
cloth-0.041
box-0.039
 chair-0.038
 counter-0.038
Token big
Token ID1263
Feature activation-1.15e-02

Pos logit contributions
zo+0.044
minded+0.043
 Friendship+0.042
 culture+0.041
 habitats+0.041
Neg logit contributions
 stool-0.060
 chair-0.048
cloth-0.046
 knob-0.046
 broom-0.046
Token ears
Token ID11368
Feature activation-2.85e-02

Pos logit contributions
zo+0.022
 culture+0.022
 adventures+0.021
 habitats+0.021
 planets+0.021
Neg logit contributions
 stool-0.020
cloth-0.017
 counter-0.016
box-0.015
 chair-0.014
Token why
Token ID1521
Feature activation-3.71e-03

Pos logit contributions
zo+0.042
 culture+0.041
minded+0.040
 habitats+0.040
 galaxies+0.040
Neg logit contributions
 stool-0.056
cloth-0.047
 counter-0.045
 chair-0.043
box-0.043
Token don
Token ID836
Feature activation-1.18e-02

Pos logit contributions
zo+0.007
 culture+0.007
 habitats+0.006
 planets+0.006
 galaxies+0.006
Neg logit contributions
 stool-0.007
cloth-0.006
 counter-0.005
box-0.005
 chair-0.005
Token't
Token ID470
Feature activation-1.62e-02

Pos logit contributions
 habitats+0.027
 culture+0.027
zo+0.027
minded+0.026
 galaxies+0.026
Neg logit contributions
 stool-0.017
cloth-0.012
 counter-0.012
 chair-0.011
box-0.010
Token you
Token ID345
Feature activation-2.71e-02

Pos logit contributions
 habitats+0.030
 culture+0.029
 galaxies+0.029
minded+0.028
 solar+0.028
Neg logit contributions
 stool-0.028
 counter-0.024
cloth-0.023
 chair-0.022
 spoon-0.022
Token join
Token ID4654
Feature activation-5.11e-02

Pos logit contributions
zo+0.044
 culture+0.042
 habitats+0.041
 adventures+0.040
 galaxies+0.040
Neg logit contributions
 stool-0.057
cloth-0.048
 counter-0.046
box-0.044
 chair-0.044
Token us
Token ID514
Feature activation-8.12e-02

Pos logit contributions
zo+0.087
 rewarding+0.084
 culture+0.084
 galaxies+0.082
minded+0.082
Neg logit contributions
 stool-0.094
cloth-0.085
 counter-0.077
 spoon-0.074
box-0.074
Token and
Token ID290
Feature activation-3.34e-02

Pos logit contributions
zo+0.147
 culture+0.143
 habitats+0.141
 galaxies+0.140
 villains+0.139
Neg logit contributions
 stool-0.148
cloth-0.124
 counter-0.116
box-0.114
 bowl-0.113
Token try
Token ID1949
Feature activation-3.88e-02

Pos logit contributions
zo+0.052
 culture+0.050
 habitats+0.048
 galaxies+0.048
minded+0.048
Neg logit contributions
 stool-0.071
cloth-0.061
 counter-0.058
 chair-0.055
box-0.055
Token to
Token ID284
Feature activation+2.02e-02

Pos logit contributions
zo+0.073
 habitats+0.071
 culture+0.069
 villains+0.068
 galaxies+0.067
Neg logit contributions
 stool-0.071
 chair-0.055
 bowl-0.054
 counter-0.054
 spoon-0.054
Token forget
Token ID6044
Feature activation-5.13e-02

Pos logit contributions
 stool+0.039
cloth+0.033
 counter+0.032
 chair+0.030
 bowl+0.030
Neg logit contributions
 culture-0.034
zo-0.034
 habitats-0.033
 galaxies-0.033
minded-0.033
Token about
Token ID546
Feature activation-4.94e-02

Pos logit contributions
zo+0.092
 galaxies+0.085
 culture+0.085
 habitats+0.084
minded+0.084
Neg logit contributions
 stool-0.096
 counter-0.080
cloth-0.079
box-0.076
 chair-0.075
Token,
Token ID11
Feature activation-2.97e-02

Pos logit contributions
zo+0.037
 culture+0.037
 adventures+0.036
 planets+0.036
 galaxies+0.036
Neg logit contributions
 stool-0.029
cloth-0.024
 counter-0.022
box-0.021
 chair-0.020
Token the
Token ID262
Feature activation+3.71e-03

Pos logit contributions
 culture+0.064
 habitats+0.063
zo+0.062
minded+0.059
 languages+0.059
Neg logit contributions
 stool-0.046
 ladder-0.035
 bowl-0.034
 counter-0.033
 chair-0.033
Token mole
Token ID9411
Feature activation-6.23e-02

Pos logit contributions
 stool+0.007
cloth+0.006
 bowl+0.006
 counter+0.006
 lid+0.006
Neg logit contributions
zo-0.007
minded-0.006
 galaxies-0.006
 habitats-0.006
 culture-0.006
Token and
Token ID290
Feature activation-2.98e-02

Pos logit contributions
 culture+0.128
 diversity+0.121
 rewarding+0.121
 galaxies+0.121
 Friendship+0.120
Neg logit contributions
 stool-0.095
cloth-0.078
 chair-0.071
 bowl-0.070
 spoon-0.070
Token the
Token ID262
Feature activation+8.25e-03

Pos logit contributions
 culture+0.055
zo+0.055
 habitats+0.054
minded+0.053
 rewarding+0.053
Neg logit contributions
 stool-0.054
cloth-0.043
 chair-0.042
 counter-0.041
 bowl-0.040
Token duck
Token ID22045
Feature activation-8.10e-02

Pos logit contributions
 stool+0.017
cloth+0.014
 counter+0.014
 chair+0.013
 bowl+0.013
Neg logit contributions
zo-0.014
 culture-0.013
 galaxies-0.013
 habitats-0.013
minded-0.013
Token became
Token ID2627
Feature activation-3.93e-02

Pos logit contributions
 culture+0.177
 rewarding+0.169
minded+0.168
 galaxies+0.167
 Friendship+0.166
Neg logit contributions
 stool-0.094
cloth-0.083
 slipped-0.080
 bowl-0.076
 spoon-0.075
Token good
Token ID922
Feature activation-5.59e-02

Pos logit contributions
 culture+0.076
 habitats+0.075
zo+0.073
minded+0.071
 galaxies+0.070
Neg logit contributions
 stool-0.065
cloth-0.053
 bowl-0.050
 spoon-0.048
 ladder-0.047
Token friends
Token ID2460
Feature activation-6.59e-02

Pos logit contributions
 Jupiter+0.133
 culture+0.127
 generation+0.126
 Friendship+0.122
 diversity+0.120
Neg logit contributions
 stool-0.051
cloth-0.046
 spoon-0.045
 m-0.044
 bowl-0.043
Token and
Token ID290
Feature activation-6.65e-02

Pos logit contributions
 culture+0.167
 rewarding+0.162
 habitats+0.161
 galaxies+0.158
 cultures+0.157
Neg logit contributions
 stool-0.059
cloth-0.049
 chair-0.039
box-0.039
 bowl-0.036
Token they
Token ID484
Feature activation-6.35e-02

Pos logit contributions
minded+0.114
 culture+0.112
 galaxies+0.111
 rewarding+0.109
ieve+0.109
Neg logit contributions
 stool-0.111
cloth-0.092
 spoon-0.089
 brush-0.088
 chair-0.088
Token Do
Token ID2141
Feature activation-1.63e-02

Pos logit contributions
zo+0.058
 culture+0.057
 galaxies+0.055
 planets+0.055
 adventures+0.055
Neg logit contributions
 stool-0.078
cloth-0.066
 counter-0.064
box-0.061
 chair-0.061
Token you
Token ID345
Feature activation-4.54e-02

Pos logit contributions
 culture+0.027
 solar+0.026
 galaxies+0.026
minded+0.026
 habitats+0.026
Neg logit contributions
 stool-0.031
cloth-0.025
 counter-0.024
 chair-0.024
box-0.024
Token want
Token ID765
Feature activation-4.10e-02

Pos logit contributions
 culture+0.075
 habitats+0.074
minded+0.073
 galaxies+0.073
 planets+0.072
Neg logit contributions
 stool-0.081
cloth-0.077
 brush-0.068
 knob-0.066
 handle-0.066
Token to
Token ID284
Feature activation+1.46e-02

Pos logit contributions
zo+0.085
 habitats+0.082
 rewarding+0.082
 culture+0.081
 galaxies+0.081
Neg logit contributions
 stool-0.062
 counter-0.051
box-0.050
 chair-0.049
 spoon-0.048
Token be
Token ID307
Feature activation-4.12e-02

Pos logit contributions
 stool+0.029
cloth+0.025
 counter+0.024
 chair+0.023
box+0.023
Neg logit contributions
zo-0.024
 culture-0.024
 galaxies-0.023
minded-0.023
 habitats-0.023
Token friends
Token ID2460
Feature activation-8.08e-02

Pos logit contributions
zo+0.071
 habitats+0.071
 culture+0.070
 galaxies+0.069
minded+0.068
Neg logit contributions
 stool-0.075
cloth-0.066
box-0.060
 counter-0.059
 chair-0.059
Token?
Token ID30
Feature activation-3.48e-02

Pos logit contributions
 habitats+0.178
 galaxies+0.174
zo+0.172
 culture+0.170
 planets+0.170
Neg logit contributions
 stool-0.119
cloth-0.094
 chair-0.090
box-0.089
 bowl-0.087
Tokenâ
Token ID22940
Feature activation-2.09e-03

Pos logit contributions
zo+0.053
 culture+0.052
 habitats+0.050
 galaxies+0.050
 planets+0.049
Neg logit contributions
 stool-0.075
cloth-0.064
 counter-0.061
box-0.059
 chair-0.058
Token€
Token ID26391
Feature activation-3.35e-02

Pos logit contributions
 aliens+0.004
 galaxies+0.004
 planets+0.004
 islands+0.004
 culture+0.004
Neg logit contributions
 stool-0.004
cloth-0.003
 counter-0.003
box-0.003
 wrapper-0.003
Token ' ' (single space)
Token ID220
Feature activation-7.57e-04

Pos logit contributions
 culture+0.057
zo+0.057
 aliens+0.055
 galaxies+0.055
 habitats+0.055
Neg logit contributions
 stool-0.066
cloth-0.054
 counter-0.052
 chair-0.051
box-0.051
Token \n (newline)
Token ID198
Feature activation-1.71e-02

Pos logit contributions
zo+0.001
 culture+0.001
minded+0.001
 habitats+0.001
 galaxies+0.001
Neg logit contributions
 stool-0.002
cloth-0.002
 counter-0.001
box-0.001
 chair-0.001
Token music
Token ID2647
Feature activation-1.32e-02

Pos logit contributions
zo+0.021
 habitats+0.020
minded+0.020
 islands+0.020
 culture+0.020
Neg logit contributions
 stool-0.020
 chair-0.016
cloth-0.015
 counter-0.015
 knob-0.015
Token.
Token ID13
Feature activation-3.90e-02

Pos logit contributions
zo+0.020
 culture+0.020
 adventures+0.020
 planets+0.019
 aliens+0.019
Neg logit contributions
 stool-0.028
cloth-0.024
 counter-0.023
 wra-0.022
itten-0.022
Token Can
Token ID1680
Feature activation+1.38e-03

Pos logit contributions
zo+0.072
 adventures+0.071
 culture+0.069
 rewarding+0.068
minded+0.067
Neg logit contributions
 stool-0.072
cloth-0.059
 counter-0.057
 chair-0.055
box-0.052
Token we
Token ID356
Feature activation-2.29e-02

Pos logit contributions
 stool+0.002
cloth+0.002
 counter+0.002
 chair+0.002
box+0.002
Neg logit contributions
zo-0.003
 culture-0.003
 habitats-0.003
 galaxies-0.003
 rewarding-0.003
Token be
Token ID307
Feature activation-4.29e-02

Pos logit contributions
 habitats+0.038
 culture+0.038
 galaxies+0.037
minded+0.037
zo+0.037
Neg logit contributions
 stool-0.045
cloth-0.037
 brush-0.035
 counter-0.035
 chair-0.035
Token friends
Token ID2460
Feature activation-8.07e-02

Pos logit contributions
 habitats+0.073
zo+0.071
 galaxies+0.071
 culture+0.070
minded+0.068
Neg logit contributions
 stool-0.080
cloth-0.066
 brush-0.064
 chair-0.063
box-0.063
Token?"
Token ID1701
Feature activation+1.43e-02

Pos logit contributions
 habitats+0.179
 galaxies+0.175
zo+0.173
 culture+0.170
 rewarding+0.165
Neg logit contributions
 stool-0.117
 chair-0.089
 bowl-0.088
cloth-0.086
box-0.086
Token Tom
Token ID4186
Feature activation+1.15e-02

Pos logit contributions
 stool+0.027
cloth+0.023
 counter+0.021
box+0.020
 chair+0.020
Neg logit contributions
zo-0.026
 culture-0.026
 habitats-0.025
 adventures-0.025
minded-0.025
Token smiled
Token ID13541
Feature activation+2.06e-03

Pos logit contributions
 stool+0.019
cloth+0.015
 counter+0.014
box+0.014
 chair+0.013
Neg logit contributions
zo-0.023
 culture-0.023
 habitats-0.022
 galaxies-0.022
 rewarding-0.022
Token and
Token ID290
Feature activation+1.03e-02

Pos logit contributions
 stool+0.003
cloth+0.003
 chair+0.002
 counter+0.002
box+0.002
Neg logit contributions
 culture-0.004
zo-0.004
 habitats-0.004
 rewarding-0.004
 diversity-0.004
Token said
Token ID531
Feature activation+2.60e-02

Pos logit contributions
 stool+0.016
cloth+0.013
 counter+0.012
 chair+0.011
box+0.011
Neg logit contributions
zo-0.022
 culture-0.022
 habitats-0.021
 galaxies-0.021
 planets-0.021
Token,
Token ID11
Feature activation-1.38e-04

Pos logit contributions
zo+0.043
 culture+0.040
minded+0.039
 adventures+0.039
 habitats+0.039
Neg logit contributions
 stool-0.049
cloth-0.040
 counter-0.038
 wra-0.037
box-0.037
Token can
Token ID460
Feature activation-6.52e-04

Pos logit contributions
zo+0.000
 habitats+0.000
 culture+0.000
minded+0.000
 galaxies+0.000
Neg logit contributions
 stool-0.000
cloth-0.000
 counter-0.000
 chair-0.000
 knob-0.000
Token you
Token ID345
Feature activation-4.16e-02

Pos logit contributions
 habitats+0.001
 culture+0.001
zo+0.001
 galaxies+0.001
minded+0.001
Neg logit contributions
 stool-0.001
cloth-0.001
 counter-0.001
 chair-0.001
 spoon-0.001
Token whistle
Token ID16121
Feature activation-6.18e-02

Pos logit contributions
 habitats+0.081
 culture+0.080
zo+0.079
 galaxies+0.079
 planets+0.078
Neg logit contributions
 stool-0.071
cloth-0.057
 counter-0.055
 chair-0.054
box-0.053
Token for
Token ID329
Feature activation-3.37e-02

Pos logit contributions
 habitats+0.120
 galaxies+0.116
 culture+0.115
 rewarding+0.113
zo+0.113
Neg logit contributions
 stool-0.101
 chair-0.078
cloth-0.078
box-0.076
 counter-0.074
Token me
Token ID502
Feature activation-8.05e-02

Pos logit contributions
zo+0.072
 habitats+0.072
 culture+0.070
 galaxies+0.070
minded+0.070
Neg logit contributions
 stool-0.047
 counter-0.036
cloth-0.035
 chair-0.035
 spoon-0.034
Token?"
Token ID1701
Feature activation-2.14e-02

Pos logit contributions
 galaxies+0.167
 habitats+0.165
 culture+0.162
 planets+0.162
 diversity+0.160
Neg logit contributions
 stool-0.130
box-0.100
cloth-0.097
 chair-0.096
 bowl-0.094
Token The
Token ID383
Feature activation+1.67e-02

Pos logit contributions
 culture+0.041
 rewarding+0.040
 habitats+0.039
 galaxies+0.039
zo+0.039
Neg logit contributions
 stool-0.038
 counter-0.032
 chair-0.032
 bowl-0.030
cloth-0.029
Token bird
Token ID6512
Feature activation-7.01e-03

Pos logit contributions
 stool+0.037
cloth+0.032
 counter+0.031
 chair+0.029
box+0.029
Neg logit contributions
zo-0.025
 culture-0.024
 habitats-0.023
 galaxies-0.023
minded-0.023
Token whist
Token ID32262
Feature activation-1.15e-02

Pos logit contributions
 culture+0.015
zo+0.015
 habitats+0.014
minded+0.014
 rewarding+0.014
Neg logit contributions
 stool-0.011
cloth-0.009
 counter-0.008
 chair-0.008
box-0.007
Tokenled
Token ID992
Feature activation-4.92e-03

Pos logit contributions
 culture+0.022
 rewarding+0.021
 habitats+0.021
 adventures+0.021
zo+0.021
Neg logit contributions
 stool-0.019
cloth-0.017
box-0.014
 counter-0.014
 chair-0.014
Token that
Token ID326
Feature activation-1.13e-02

Pos logit contributions
 battles+0.084
 villains+0.084
zo+0.083
 victories+0.082
 Friendship+0.082
Neg logit contributions
 stool-0.081
 chair-0.065
 knob-0.062
cloth-0.062
 ladder-0.060
Token day
Token ID1110
Feature activation+7.61e-03

Pos logit contributions
 habitats+0.020
 culture+0.020
minded+0.019
 cultures+0.019
 aliens+0.019
Neg logit contributions
 stool-0.024
 chair-0.020
 bowl-0.018
cloth-0.018
 spoon-0.018
Token on
Token ID319
Feature activation-9.87e-03

Pos logit contributions
 stool+0.014
cloth+0.011
 counter+0.011
box+0.010
 chair+0.010
Neg logit contributions
zo-0.014
 culture-0.014
 habitats-0.014
 galaxies-0.014
minded-0.014
Token,
Token ID11
Feature activation-3.00e-02

Pos logit contributions
zo+0.020
 culture+0.019
 adventures+0.019
 planets+0.019
 galaxies+0.019
Neg logit contributions
 stool-0.017
cloth-0.014
 counter-0.013
box-0.012
 wra-0.012
Token the
Token ID262
Feature activation+8.87e-04

Pos logit contributions
 culture+0.067
 habitats+0.066
zo+0.064
 languages+0.062
minded+0.062
Neg logit contributions
 stool-0.045
 chair-0.036
 ladder-0.033
 bowl-0.032
 counter-0.032
Token cat
Token ID3797
Feature activation-8.05e-02

Pos logit contributions
 stool+0.002
cloth+0.001
 counter+0.001
 chair+0.001
box+0.001
Neg logit contributions
zo-0.002
 culture-0.002
minded-0.002
 habitats-0.002
 galaxies-0.002
Token wasn
Token ID2492
Feature activation-1.38e-02

Pos logit contributions
 culture+0.165
 diversity+0.157
 rewarding+0.157
 cultures+0.155
 Friendship+0.155
Neg logit contributions
 stool-0.114
 slipped-0.096
 chair-0.095
cloth-0.094
 bowl-0.088
Token't
Token ID470
Feature activation-1.59e-02

Pos logit contributions
 culture+0.029
zo+0.029
 habitats+0.029
 galaxies+0.028
 planets+0.028
Neg logit contributions
 stool-0.022
cloth-0.017
 counter-0.016
 chair-0.015
box-0.015
Token so
Token ID523
Feature activation-1.21e-02

Pos logit contributions
 culture+0.028
 habitats+0.028
 diversity+0.027
minded+0.027
zo+0.027
Neg logit contributions
 stool-0.030
cloth-0.025
 counter-0.023
 chair-0.022
 spoon-0.022
Token lazy
Token ID16931
Feature activation-3.32e-02

Pos logit contributions
 habitats+0.023
 culture+0.023
 Friendship+0.022
 targets+0.022
 cultures+0.022
Neg logit contributions
 stool-0.019
cloth-0.017
 slipped-0.016
 ladder-0.015
 chair-0.015
Token anymore
Token ID7471
Feature activation-5.05e-02

Pos logit contributions
 culture+0.052
zo+0.052
 diversity+0.050
 habitats+0.049
 Jupiter+0.049
Neg logit contributions
 stool-0.069
cloth-0.060
 counter-0.057
 chair-0.055
 wra-0.055
Token you
Token ID345
Feature activation-5.81e-02

Pos logit contributions
minded+0.031
 culture+0.031
zo+0.031
 galaxies+0.031
 habitats+0.031
Neg logit contributions
 stool-0.032
cloth-0.026
 chair-0.025
 wra-0.025
 counter-0.024
Token want
Token ID765
Feature activation-4.70e-02

Pos logit contributions
minded+0.126
 habitats+0.125
 culture+0.124
 galaxies+0.117
 diversity+0.116
Neg logit contributions
cloth-0.075
 stool-0.067
 spoon-0.062
 brush-0.062
 sponge-0.060
Token to
Token ID284
Feature activation-1.81e-02

Pos logit contributions
zo+0.112
 culture+0.110
 habitats+0.110
 rewarding+0.109
 cultures+0.108
Neg logit contributions
 stool-0.052
 bowl-0.044
 spoon-0.044
 cup-0.042
 chair-0.040
Token swim
Token ID9422
Feature activation-7.24e-02

Pos logit contributions
 galaxies+0.036
 culture+0.035
 habitats+0.035
 planets+0.034
minded+0.034
Neg logit contributions
 stool-0.028
 handle-0.024
 towel-0.024
 spoon-0.023
cloth-0.023
Token with
Token ID351
Feature activation-4.16e-02

Pos logit contributions
 culture+0.153
 villains+0.151
 galaxies+0.151
 habitats+0.149
zo+0.148
Neg logit contributions
 stool-0.104
cloth-0.090
 bowl-0.082
 container-0.079
box-0.076
Token me
Token ID502
Feature activation-8.04e-02

Pos logit contributions
zo+0.100
 culture+0.098
 rewarding+0.098
 habitats+0.095
 galaxies+0.095
Neg logit contributions
 stool-0.051
cloth-0.040
 counter-0.038
 chair-0.036
 bowl-0.036
Token?"
Token ID1701
Feature activation-1.47e-02

Pos logit contributions
 culture+0.149
zo+0.147
 rewarding+0.145
 galaxies+0.145
 habitats+0.144
Neg logit contributions
 stool-0.147
cloth-0.117
 chair-0.113
box-0.113
 bowl-0.110
Token The
Token ID383
Feature activation+1.37e-02

Pos logit contributions
 culture+0.024
 rewarding+0.023
 habitats+0.023
 galaxies+0.022
 diversity+0.022
Neg logit contributions
 stool-0.032
 counter-0.026
cloth-0.025
 chair-0.025
 bowl-0.024
Token mole
Token ID9411
Feature activation+1.94e-02

Pos logit contributions
 stool+0.029
cloth+0.024
 counter+0.023
 chair+0.022
box+0.022
Neg logit contributions
zo-0.022
 culture-0.022
 habitats-0.021
 galaxies-0.021
minded-0.021
Token said
Token ID531
Feature activation+2.66e-02

Pos logit contributions
 stool+0.032
cloth+0.026
 counter+0.025
box+0.023
 chair+0.023
Neg logit contributions
zo-0.039
 culture-0.039
 habitats-0.037
 galaxies-0.037
minded-0.037
Token,
Token ID11
Feature activation+8.68e-03

Pos logit contributions
 stool+0.045
cloth+0.036
 counter+0.034
box+0.032
 wra+0.031
Neg logit contributions
zo-0.053
 culture-0.052
minded-0.051
 Friendship-0.051
 galaxies-0.050
Token to
Token ID284
Feature activation-1.59e-02

Pos logit contributions
 culture+0.119
 habitats+0.118
 diversity+0.116
zo+0.116
 galaxies+0.115
Neg logit contributions
 stool-0.119
cloth-0.098
 counter-0.090
 chair-0.090
box-0.089
Token play
Token ID711
Feature activation-2.86e-02

Pos logit contributions
zo+0.027
 habitats+0.026
minded+0.026
 culture+0.026
 galaxies+0.025
Neg logit contributions
 stool-0.030
cloth-0.024
 brush-0.024
 counter-0.024
 chair-0.024
Token and
Token ID290
Feature activation-4.27e-02

Pos logit contributions
 culture+0.052
zo+0.051
 rewarding+0.051
 habitats+0.051
minded+0.050
Neg logit contributions
 stool-0.053
cloth-0.044
 chair-0.041
box-0.041
 counter-0.040
Token have
Token ID423
Feature activation-5.56e-02

Pos logit contributions
zo+0.069
minded+0.067
 culture+0.066
 habitats+0.066
 galaxies+0.065
Neg logit contributions
 stool-0.085
cloth-0.071
 chair-0.068
 counter-0.068
 spoon-0.065
Token fun
Token ID1257
Feature activation-2.56e-02

Pos logit contributions
 culture+0.096
minded+0.089
zo+0.089
 Jupiter+0.088
 habitats+0.087
Neg logit contributions
 stool-0.089
 cup-0.083
 bowl-0.082
 sugar-0.080
 wra-0.079
Token.
Token ID13
Feature activation-8.03e-02

Pos logit contributions
zo+0.052
 culture+0.052
 rewarding+0.050
 habitats+0.050
minded+0.049
Neg logit contributions
 stool-0.040
cloth-0.032
 chair-0.031
 spoon-0.030
box-0.030
Token<|endoftext|>
Token ID50256
Feature activation-3.61e-02

Pos logit contributions
 culture+0.201
ieve+0.193
minded+0.191
 rewarding+0.191
 galaxies+0.191
Neg logit contributions
 stool-0.068
 cup-0.055
itten-0.052
op-0.051
 Lee-0.049
TokenOnce
Token ID7454
Feature activation-3.31e-02

Pos logit contributions
 culture+0.075
 rewarding+0.072
zo+0.071
minded+0.070
 habitats+0.070
Neg logit contributions
 stool-0.060
cloth-0.049
 counter-0.047
 chair-0.047
box-0.046
Token upon
Token ID2402
Feature activation-1.40e-02

Pos logit contributions
 culture+0.085
zo+0.084
 habitats+0.083
minded+0.083
 galaxies+0.082
Neg logit contributions
 stool-0.038
cloth-0.027
 chair-0.023
 counter-0.023
box-0.022
Token a
Token ID257
Feature activation-2.45e-02

Pos logit contributions
 culture+0.027
zo+0.027
minded+0.027
 habitats+0.027
 diversity+0.026
Neg logit contributions
 stool-0.024
cloth-0.020
 counter-0.018
 chair-0.018
 wra-0.017
Token time
Token ID640
Feature activation+2.54e-03

Pos logit contributions
minded+0.042
 culture+0.042
 sailors+0.041
 galaxies+0.041
 aliens+0.040
Neg logit contributions
 stool-0.049
cloth-0.042
 chair-0.039
 bowl-0.039
itten-0.038
Token Do
Token ID2141
Feature activation-4.01e-03

Pos logit contributions
zo+0.046
 adventures+0.045
 culture+0.044
 targets+0.043
 galaxies+0.043
Neg logit contributions
 stool-0.044
cloth-0.038
 counter-0.035
 chair-0.033
box-0.033
Token you
Token ID345
Feature activation-3.94e-02

Pos logit contributions
zo+0.006
 galaxies+0.006
 culture+0.006
 habitats+0.006
minded+0.006
Neg logit contributions
 stool-0.008
cloth-0.007
 counter-0.006
 chair-0.006
box-0.006
Token want
Token ID765
Feature activation-3.63e-02

Pos logit contributions
 galaxies+0.070
 habitats+0.070
 culture+0.069
 planets+0.069
zo+0.068
Neg logit contributions
 stool-0.068
cloth-0.065
 brush-0.057
 spoon-0.054
 handle-0.054
Token to
Token ID284
Feature activation-3.67e-03

Pos logit contributions
zo+0.071
 habitats+0.069
 rewarding+0.068
 galaxies+0.068
 culture+0.068
Neg logit contributions
 stool-0.061
 counter-0.049
cloth-0.048
box-0.047
 chair-0.047
Token join
Token ID4654
Feature activation-4.30e-02

Pos logit contributions
 habitats+0.007
 galaxies+0.007
zo+0.007
 culture+0.006
 rewarding+0.006
Neg logit contributions
 stool-0.006
cloth-0.005
 counter-0.005
 brush-0.005
 handle-0.005
Token us
Token ID514
Feature activation-8.01e-02

Pos logit contributions
zo+0.074
 rewarding+0.071
 culture+0.071
 galaxies+0.069
 habitats+0.069
Neg logit contributions
 stool-0.084
cloth-0.073
 counter-0.067
 chair-0.065
box-0.065
Token and
Token ID290
Feature activation-3.11e-02

Pos logit contributions
 galaxies+0.154
zo+0.154
 culture+0.151
 habitats+0.151
 rewarding+0.149
Neg logit contributions
 stool-0.133
cloth-0.109
box-0.100
 counter-0.099
 chair-0.098
Token play
Token ID711
Feature activation-2.98e-02

Pos logit contributions
zo+0.045
 culture+0.043
 habitats+0.042
 galaxies+0.042
minded+0.041
Neg logit contributions
 stool-0.070
cloth-0.060
 counter-0.057
 chair-0.055
box-0.055
Token with
Token ID351
Feature activation-4.45e-02

Pos logit contributions
 habitats+0.061
zo+0.060
 rewarding+0.059
 galaxies+0.059
 culture+0.058
Neg logit contributions
 stool-0.048
cloth-0.040
 chair-0.037
 counter-0.037
box-0.036
Token us
Token ID514
Feature activation-7.13e-02

Pos logit contributions
zo+0.063
 culture+0.060
 rewarding+0.059
 habitats+0.058
 galaxies+0.058
Neg logit contributions
 stool-0.100
cloth-0.087
 counter-0.084
 chair-0.081
box-0.080
Token?"
Token ID1701
Feature activation-1.68e-02

Pos logit contributions
zo+0.122
 culture+0.120
 galaxies+0.117
 habitats+0.116
 rewarding+0.116
Neg logit contributions
 stool-0.139
cloth-0.116
 counter-0.108
box-0.107
 chair-0.107
Token a
Token ID257
Feature activation-8.40e-04

Pos logit contributions
zo+0.003
 habitats+0.003
 Earth+0.003
 adventures+0.003
 culture+0.003
Neg logit contributions
 stool-0.007
cloth-0.007
box-0.006
itten-0.006
 counter-0.006
Token candy
Token ID18550
Feature activation+3.67e-02

Pos logit contributions
 adventures+0.001
 culture+0.001
zo+0.001
 kingdom+0.001
 Earth+0.001
Neg logit contributions
 stool-0.002
cloth-0.002
 counter-0.001
itten-0.001
amas-0.001
Token bar
Token ID2318
Feature activation+1.05e-02

Pos logit contributions
 stool+0.078
cloth+0.068
apy+0.063
itten+0.063
inged+0.062
Neg logit contributions
 adventures-0.049
 kingdom-0.048
 seals-0.046
zo-0.046
 culture-0.046
Token and
Token ID290
Feature activation+7.76e-03

Pos logit contributions
 stool+0.024
cloth+0.020
 counter+0.019
 sponge+0.018
 broom+0.018
Neg logit contributions
zo-0.014
 culture-0.014
minded-0.014
 seals-0.014
 habitats-0.013
Token started
Token ID2067
Feature activation+1.66e-02

Pos logit contributions
 stool+0.017
itten+0.014
cloth+0.013
box+0.013
amas+0.013
Neg logit contributions
 culture-0.012
 adventures-0.012
 planets-0.011
 seals-0.011
 galaxies-0.011
Token to
Token ID284
Feature activation+5.09e-02

Pos logit contributions
 stool+0.041
cloth+0.036
itten+0.036
 knob+0.034
 lid+0.034
Neg logit contributions
 adventures-0.019
 culture-0.017
 habitats-0.017
 battles-0.016
 seals-0.016
Token chew
Token ID34722
Feature activation+2.36e-02

Pos logit contributions
 stool+0.114
itten+0.103
box+0.098
cloth+0.092
driver+0.089
Neg logit contributions
 adventures-0.074
 Earth-0.067
 culture-0.067
 battles-0.065
 planets-0.064
Token it
Token ID340
Feature activation+2.35e-02

Pos logit contributions
 stool+0.055
cloth+0.046
box+0.045
coat+0.045
 counter+0.044
Neg logit contributions
 adventures-0.028
zo-0.028
 targets-0.028
 seals-0.027
 planets-0.027
Token.
Token ID13
Feature activation-9.67e-03

Pos logit contributions
 stool+0.058
cloth+0.049
itten+0.048
 counter+0.047
box+0.047
Neg logit contributions
 adventures-0.028
zo-0.025
 extraordinary-0.024
 rewarding-0.024
 battles-0.024
Token\n
Token ID198
Feature activation-1.44e-02

Pos logit contributions
zo+0.013
 Friendship+0.013
 Earth+0.012
 habitats+0.012
 culture+0.012
Neg logit contributions
 stool-0.022
cloth-0.019
 wra-0.018
box-0.018
 knob-0.018
Token\n
Token ID198
Feature activation-3.19e-03

Pos logit contributions
zo+0.017
 Friendship+0.016
 culture+0.016
 adventures+0.015
 habitats+0.015
Neg logit contributions
 stool-0.036
cloth-0.032
 counter-0.030
 wra-0.029
box-0.029
Token and
Token ID290
Feature activation+1.28e-02

Pos logit contributions
 stool+0.001
cloth+0.001
box+0.001
 counter+0.001
itten+0.001
Neg logit contributions
 culture-0.001
 adventures-0.001
 planets-0.001
 habitats-0.001
zo-0.001
Token pressed
Token ID12070
Feature activation+2.12e-02

Pos logit contributions
 stool+0.025
itten+0.021
box+0.021
amas+0.020
cloth+0.020
Neg logit contributions
 adventures-0.021
minded-0.021
zo-0.021
 culture-0.020
 Earth-0.020
Token it
Token ID340
Feature activation+1.25e-02

Pos logit contributions
 stool+0.048
cloth+0.041
 counter+0.040
box+0.039
 chair+0.037
Neg logit contributions
zo-0.029
 culture-0.029
 adventures-0.029
 planets-0.028
 galaxies-0.028
Token.
Token ID13
Feature activation+1.52e-02

Pos logit contributions
 stool+0.028
cloth+0.024
 counter+0.023
box+0.022
 wra+0.022
Neg logit contributions
zo-0.017
 adventures-0.017
minded-0.017
 Jupiter-0.016
 culture-0.016
Token Suddenly
Token ID24975
Feature activation+4.61e-02

Pos logit contributions
 stool+0.030
cloth+0.026
itten+0.025
box+0.024
 knob+0.023
Neg logit contributions
 Earth-0.024
 habitats-0.024
 culture-0.024
minded-0.024
 Friendship-0.024
Token,
Token ID11
Feature activation+5.12e-02

Pos logit contributions
 stool+0.098
cloth+0.085
 counter+0.083
itten+0.080
box+0.078
Neg logit contributions
 Earth-0.069
 adventures-0.068
 aliens-0.068
 planets-0.066
 Jupiter-0.065
Token the
Token ID262
Feature activation+2.07e-02

Pos logit contributions
 stool+0.112
cloth+0.099
itten+0.099
box+0.097
coat+0.093
Neg logit contributions
 Earth-0.072
 adventures-0.070
 planets-0.069
 aliens-0.068
 galaxies-0.067
Token lab
Token ID2248
Feature activation+4.53e-02

Pos logit contributions
 stool+0.038
cloth+0.034
itten+0.031
 counter+0.030
 firmly+0.029
Neg logit contributions
zo-0.035
 planets-0.035
 adventures-0.035
 culture-0.035
 aliens-0.035
Token started
Token ID2067
Feature activation+2.87e-02

Pos logit contributions
 stool+0.082
itten+0.069
cloth+0.064
 counter+0.061
coat+0.059
Neg logit contributions
 aliens-0.081
zo-0.080
 villains-0.077
 battles-0.077
 culture-0.077
Token shaking
Token ID17275
Feature activation+1.14e-02

Pos logit contributions
 stool+0.070
cloth+0.061
 counter+0.059
box+0.057
itten+0.056
Neg logit contributions
zo-0.035
 adventures-0.034
 culture-0.033
 planets-0.032
 galaxies-0.032
Token and
Token ID290
Feature activation+2.24e-02

Pos logit contributions
 stool+0.028
cloth+0.024
itten+0.024
 counter+0.024
box+0.023
Neg logit contributions
 adventures-0.014
 planets-0.014
 galaxies-0.013
 aliens-0.013
 Earth-0.013
Token what
Token ID644
Feature activation+5.23e-04

Pos logit contributions
 stool+0.042
cloth+0.036
 counter+0.034
itten+0.033
box+0.033
Neg logit contributions
 culture-0.030
zo-0.029
 aliens-0.029
 sailors-0.029
 planets-0.029
Token is
Token ID318
Feature activation-1.84e-02

Pos logit contributions
 stool+0.001
cloth+0.001
 counter+0.001
box+0.001
 chair+0.001
Neg logit contributions
zo-0.001
 culture-0.001
 habitats-0.001
minded-0.001
 galaxies-0.001
Token that
Token ID326
Feature activation+7.21e-03

Pos logit contributions
zo+0.026
 culture+0.026
 habitats+0.025
 galaxies+0.025
minded+0.025
Neg logit contributions
 stool-0.042
cloth-0.036
 counter-0.035
box-0.033
 chair-0.033
Token?"
Token ID1701
Feature activation+4.90e-02

Pos logit contributions
 stool+0.017
cloth+0.015
coat+0.014
 wra+0.014
itten+0.014
Neg logit contributions
zo-0.009
 culture-0.009
 Earth-0.009
 rewarding-0.008
 aliens-0.008
Token Ben
Token ID3932
Feature activation+2.81e-02

Pos logit contributions
 stool+0.099
box+0.081
cloth+0.081
 counter+0.077
itten+0.077
Neg logit contributions
zo-0.077
 culture-0.077
 adventures-0.076
minded-0.075
 sailors-0.075
Token asks
Token ID7893
Feature activation+6.25e-02

Pos logit contributions
 stool+0.070
cloth+0.061
 counter+0.059
box+0.057
 chair+0.056
Neg logit contributions
zo-0.033
 culture-0.032
minded-0.031
 planets-0.031
 adventures-0.031
Token.
Token ID13
Feature activation+5.07e-02

Pos logit contributions
 stool+0.161
 counter+0.136
coat+0.135
itten+0.134
 wra+0.134
Neg logit contributions
 sailors-0.063
zo-0.063
minded-0.062
 aliens-0.061
 adventures-0.059
Token\n
Token ID198
Feature activation+2.11e-02

Pos logit contributions
 stool+0.137
cloth+0.124
 wra+0.123
coat+0.121
box+0.119
Neg logit contributions
 Friendship-0.040
zo-0.039
 Earth-0.039
 Jupiter-0.036
 Captain-0.036
Token\n
Token ID198
Feature activation+3.30e-02

Pos logit contributions
 stool+0.056
cloth+0.049
 counter+0.048
 wra+0.048
box+0.046
Neg logit contributions
zo-0.022
 Friendship-0.021
 Earth-0.019
minded-0.019
 rewarding-0.018
Token"
Token ID1
Feature activation+3.28e-02

Pos logit contributions
 stool+0.064
 wra+0.057
 counter+0.056
coat+0.054
 wrapper+0.053
Neg logit contributions
zo-0.054
 sailors-0.050
 rewarding-0.048
minded-0.048
 Friendship-0.048
TokenI
Token ID40
Feature activation+5.20e-03

Pos logit contributions
 stool+0.070
cloth+0.060
 counter+0.059
box+0.058
 chair+0.055
Neg logit contributions
zo-0.049
 culture-0.048
 rewarding-0.046
 diversity-0.045
 sailors-0.045
Token all
Token ID477
Feature activation-2.10e-03

Pos logit contributions
 stool+0.004
cloth+0.003
 counter+0.003
itten+0.003
 container+0.003
Neg logit contributions
 culture-0.001
zo-0.001
 planets-0.001
 adventures-0.001
 galaxies-0.001
Token her
Token ID607
Feature activation-8.94e-03

Pos logit contributions
zo+0.004
minded+0.004
 culture+0.004
 villains+0.004
 habitats+0.004
Neg logit contributions
 stool-0.003
cloth-0.003
 chair-0.003
 counter-0.002
 knob-0.002
Token secrets
Token ID13141
Feature activation-1.89e-03

Pos logit contributions
zo+0.013
 seals+0.012
minded+0.012
 islands+0.012
 sailors+0.012
Neg logit contributions
 stool-0.020
 chair-0.016
cloth-0.016
box-0.016
 knob-0.016
Token.
Token ID13
Feature activation-2.54e-03

Pos logit contributions
zo+0.003
 culture+0.003
 habitats+0.003
minded+0.003
 galaxies+0.003
Neg logit contributions
 stool-0.004
cloth-0.003
 counter-0.003
box-0.003
 chair-0.003
Token One
Token ID1881
Feature activation+1.93e-02

Pos logit contributions
zo+0.005
 culture+0.005
 habitats+0.004
 galaxies+0.004
 planets+0.004
Neg logit contributions
 stool-0.005
cloth-0.004
 counter-0.004
box-0.004
 chair-0.004
Token day
Token ID1110
Feature activation+6.14e-02

Pos logit contributions
 stool+0.046
 chair+0.039
cloth+0.038
 counter+0.037
 knob+0.037
Neg logit contributions
zo-0.026
minded-0.026
 culture-0.026
 habitats-0.026
 islands-0.025
Token,
Token ID11
Feature activation+6.51e-02

Pos logit contributions
 stool+0.142
 counter+0.123
cloth+0.123
box+0.117
 container+0.111
Neg logit contributions
 Earth-0.075
 adventures-0.073
 marriage-0.070
 culture-0.069
 Friendship-0.067
Token Lily
Token ID20037
Feature activation-8.11e-03

Pos logit contributions
 stool+0.137
box+0.129
itten+0.124
cloth+0.123
 counter+0.118
Neg logit contributions
 Earth-0.086
 adventures-0.084
 marriage-0.083
 adventurous-0.075
 planets-0.075
Token was
Token ID373
Feature activation+9.40e-03

Pos logit contributions
zo+0.015
 culture+0.014
 adventures+0.014
 planets+0.014
 galaxies+0.014
Neg logit contributions
 stool-0.015
cloth-0.013
 counter-0.012
box-0.011
 chair-0.011
Token feeling
Token ID4203
Feature activation+3.81e-03

Pos logit contributions
 stool+0.018
cloth+0.015
 counter+0.015
box+0.014
 chair+0.014
Neg logit contributions
zo-0.017
 culture-0.016
 habitats-0.016
 diversity-0.016
minded-0.016
Token gr
Token ID1036
Feature activation+6.26e-03

Pos logit contributions
 stool+0.007
cloth+0.006
 counter+0.006
 chair+0.006
 wra+0.005
Neg logit contributions
zo-0.007
 habitats-0.007
 culture-0.007
 Friendship-0.006
 diversity-0.006
Token couldn
Token ID3521
Feature activation+4.99e-02

Pos logit contributions
 stool+0.024
 counter+0.020
cloth+0.020
amas+0.018
 wra+0.017
Neg logit contributions
zo-0.017
 adventures-0.016
 experienced-0.014
 planets-0.014
 rewarding-0.014
Token't
Token ID470
Feature activation+3.55e-02

Pos logit contributions
 stool+0.115
cloth+0.107
box+0.098
pit+0.097
 counter+0.096
Neg logit contributions
 Earth-0.059
 Friendship-0.056
zo-0.054
 adventures-0.054
 Jupiter-0.053
Token find
Token ID1064
Feature activation+9.18e-04

Pos logit contributions
 stool+0.069
cloth+0.057
 counter+0.054
box+0.052
 chair+0.051
Neg logit contributions
zo-0.063
 culture-0.062
 habitats-0.059
 galaxies-0.059
minded-0.059
Token a
Token ID257
Feature activation+1.22e-02

Pos logit contributions
 stool+0.002
cloth+0.002
 counter+0.001
box+0.001
 chair+0.001
Neg logit contributions
zo-0.001
 culture-0.001
 planets-0.001
 adventures-0.001
 galaxies-0.001
Token way
Token ID835
Feature activation+1.50e-02

Pos logit contributions
cloth+0.022
 stool+0.021
amas+0.018
 wra+0.018
hetically+0.018
Neg logit contributions
 Earth-0.019
 adventures-0.019
 kingdom-0.019
 culture-0.019
 extraordinary-0.019
Token to
Token ID284
Feature activation+5.08e-02

Pos logit contributions
 stool+0.037
cloth+0.034
 wra+0.033
 counter+0.032
amas+0.031
Neg logit contributions
 Earth-0.016
 Jupiter-0.016
 Friendship-0.015
 aliens-0.015
 extraordinary-0.015
Token measure
Token ID3953
Feature activation+1.12e-02

Pos logit contributions
 stool+0.105
cloth+0.091
itten+0.088
box+0.087
coat+0.086
Neg logit contributions
 adventures-0.079
 Earth-0.077
 extraordinary-0.075
 rewarding-0.075
 Friendship-0.073
Token them
Token ID606
Feature activation-4.19e-03

Pos logit contributions
 stool+0.026
cloth+0.022
coat+0.022
amas+0.022
itten+0.022
Neg logit contributions
 adventures-0.014
 planets-0.014
 culture-0.013
 diversity-0.013
 extraordinary-0.013
Token!
Token ID0
Feature activation+7.84e-03

Pos logit contributions
 adventures+0.005
zo+0.005
 planets+0.005
 rewarding+0.005
minded+0.005
Neg logit contributions
 stool-0.010
cloth-0.009
itten-0.008
coat-0.008
 counter-0.008
Token\n
Token ID198
Feature activation-2.20e-02

Pos logit contributions
 stool+0.019
cloth+0.017
 counter+0.016
box+0.015
 chair+0.015
Neg logit contributions
 Friendship-0.010
 adventures-0.009
 habitats-0.009
minded-0.009
 Earth-0.009
Token\n
Token ID198
Feature activation+1.39e-02

Pos logit contributions
 culture+0.040
zo+0.039
 galaxies+0.039
 planets+0.038
 habitats+0.038
Neg logit contributions
 stool-0.041
cloth-0.034
 counter-0.033
 chair-0.032
itten-0.031
Token shore
Token ID15191
Feature activation+1.03e-02

Pos logit contributions
 stool+0.004
cloth+0.004
 counter+0.004
box+0.003
 chair+0.003
Neg logit contributions
zo-0.003
 culture-0.003
 planets-0.003
 habitats-0.003
 galaxies-0.003
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
 stool+0.025
itten+0.021
cloth+0.020
 wra+0.020
 counter+0.020
Neg logit contributions
zo-0.012
 sailors-0.012
minded-0.012
 islands-0.012
 seals-0.011
Token She
Token ID1375
Feature activation+7.17e-03

Pos logit contributions
 stool+0.023
cloth+0.020
box+0.018
 counter+0.018
itten+0.018
Neg logit contributions
 Friendship-0.020
 culture-0.019
 habitats-0.019
minded-0.019
zo-0.019
Token hugged
Token ID41130
Feature activation+1.45e-02

Pos logit contributions
 stool+0.012
cloth+0.010
 counter+0.010
box+0.009
 chair+0.009
Neg logit contributions
zo-0.014
 culture-0.014
 habitats-0.013
 planets-0.013
 galaxies-0.013
Token him
Token ID683
Feature activation+6.72e-03

Pos logit contributions
 stool+0.039
cloth+0.034
box+0.033
 counter+0.032
 container+0.031
Neg logit contributions
zo-0.014
minded-0.014
 adventures-0.014
 Friendship-0.013
 sailors-0.013
Token and
Token ID290
Feature activation+8.31e-03

Pos logit contributions
 stool+0.014
cloth+0.012
 counter+0.011
box+0.011
 chair+0.011
Neg logit contributions
zo-0.011
 culture-0.010
minded-0.010
 planets-0.010
 adventures-0.010
Token checked
Token ID10667
Feature activation+1.91e-02

Pos logit contributions
 stool+0.016
cloth+0.013
 counter+0.013
itten+0.012
box+0.012
Neg logit contributions
zo-0.014
 culture-0.014
 adventures-0.014
 sailors-0.014
 planets-0.014
Token if
Token ID611
Feature activation+1.60e-02

Pos logit contributions
 stool+0.046
 counter+0.039
cloth+0.039
amas+0.037
 pitcher+0.037
Neg logit contributions
minded-0.023
 adventures-0.022
zo-0.022
 sailors-0.022
 diversity-0.021
Token he
Token ID339
Feature activation+2.53e-03

Pos logit contributions
 stool+0.035
cloth+0.030
 counter+0.028
itten+0.027
box+0.027
Neg logit contributions
zo-0.024
 culture-0.024
 adventures-0.024
minded-0.023
 planets-0.023
Token was
Token ID373
Feature activation+6.88e-03

Pos logit contributions
 stool+0.004
cloth+0.003
 counter+0.003
box+0.003
 wra+0.003
Neg logit contributions
 culture-0.005
zo-0.005
 galaxies-0.005
 habitats-0.005
 rewarding-0.005
Token okay
Token ID8788
Feature activation-7.99e-03

Pos logit contributions
 stool+0.014
cloth+0.012
 counter+0.011
box+0.011
 chair+0.011
Neg logit contributions
zo-0.012
 culture-0.012
 habitats-0.011
 galaxies-0.011
minded-0.011
Token
Token ID220
Feature activation+1.73e-02

Pos logit contributions
zo+0.027
 culture+0.027
 galaxies+0.026
 habitats+0.026
minded+0.026
Neg logit contributions
 stool-0.035
cloth-0.030
 counter-0.028
box-0.027
 chair-0.027
Token\n
Token ID198
Feature activation-8.91e-03

Pos logit contributions
 stool+0.045
cloth+0.040
 counter+0.038
 wra+0.037
 knob+0.037
Neg logit contributions
zo-0.018
 Friendship-0.018
 rewarding-0.016
 adventures-0.016
 Earth-0.016
Token\n
Token ID198
Feature activation-6.49e-03

Pos logit contributions
 culture+0.014
zo+0.013
 habitats+0.013
minded+0.013
 rewarding+0.013
Neg logit contributions
 stool-0.019
cloth-0.016
 counter-0.016
 chair-0.015
box-0.015
TokenAlex
Token ID15309
Feature activation-9.69e-03

Pos logit contributions
zo+0.011
 culture+0.011
 galaxies+0.011
 planets+0.010
 adventures+0.010
Neg logit contributions
 stool-0.013
cloth-0.011
 counter-0.010
box-0.010
 wra-0.010
Token was
Token ID373
Feature activation-8.53e-03

Pos logit contributions
zo+0.017
 culture+0.016
 galaxies+0.016
 planets+0.016
 adventures+0.016
Neg logit contributions
 stool-0.020
cloth-0.016
 counter-0.016
 wra-0.015
itten-0.015
Token so
Token ID523
Feature activation+5.15e-03

Pos logit contributions
 rewarding+0.010
 culture+0.010
zo+0.009
 adventures+0.009
 extraordinary+0.009
Neg logit contributions
 stool-0.021
cloth-0.018
 counter-0.018
 wra-0.018
itten-0.017
Token happy
Token ID3772
Feature activation-1.78e-02

Pos logit contributions
 stool+0.015
 counter+0.012
cloth+0.012
box+0.012
coat+0.012
Neg logit contributions
 rewarding-0.005
 extraordinary-0.005
 culture-0.004
 Friendship-0.004
zo-0.004
Token to
Token ID284
Feature activation+1.74e-02

Pos logit contributions
zo+0.024
 culture+0.023
 adventures+0.023
 galaxies+0.022
minded+0.022
Neg logit contributions
 stool-0.042
cloth-0.036
 counter-0.034
box-0.033
itten-0.033
Token have
Token ID423
Feature activation-1.22e-02

Pos logit contributions
 stool+0.039
cloth+0.034
 counter+0.032
box+0.031
 chair+0.031
Neg logit contributions
zo-0.025
 culture-0.025
 adventures-0.023
 habitats-0.023
 rewarding-0.023
Token found
Token ID1043
Feature activation-3.02e-03

Pos logit contributions
zo+0.019
 culture+0.019
 habitats+0.018
 galaxies+0.018
minded+0.018
Neg logit contributions
 stool-0.026
cloth-0.022
 counter-0.021
box-0.020
 chair-0.020
Token a
Token ID257
Feature activation+7.24e-03

Pos logit contributions
zo+0.004
 culture+0.004
 adventures+0.004
 galaxies+0.004
 planets+0.004
Neg logit contributions
 stool-0.007
cloth-0.006
 counter-0.006
box-0.005
 chair-0.005
Token store
Token ID3650
Feature activation+1.77e-02

Pos logit contributions
 stool+0.020
itten+0.019
cloth+0.018
coat+0.016
apy+0.016
Neg logit contributions
 adventures-0.026
 marriage-0.026
 kingdom-0.026
 Earth-0.025
 culture-0.025
Token.
Token ID13
Feature activation+6.49e-03

Pos logit contributions
 stool+0.042
itten+0.039
cloth+0.036
 sponge+0.036
apy+0.035
Neg logit contributions
minded-0.020
 adventures-0.019
 culture-0.018
 sailors-0.017
 Friendship-0.017
Token Then
Token ID3244
Feature activation+3.41e-02

Pos logit contributions
 stool+0.016
cloth+0.013
box+0.013
itten+0.013
 knob+0.013
Neg logit contributions
 habitats-0.007
 culture-0.007
 seals-0.007
 Friendship-0.007
 Earth-0.007
Token they
Token ID484
Feature activation+7.38e-03

Pos logit contributions
 stool+0.080
box+0.068
 counter+0.067
itten+0.066
cloth+0.065
Neg logit contributions
 seals-0.042
 adventures-0.042
 marriage-0.041
 dolphins-0.040
 culture-0.040
Token sne
Token ID10505
Feature activation-7.94e-03

Pos logit contributions
 stool+0.017
itten+0.015
amas+0.014
cloth+0.014
 counter+0.014
Neg logit contributions
zo-0.009
 married-0.009
 seals-0.008
 sailors-0.008
 experienced-0.008
Tokenaked
Token ID4335
Feature activation+2.26e-02

Pos logit contributions
zo+0.012
 habitats+0.012
 adventures+0.011
minded+0.011
 culture+0.011
Neg logit contributions
 stool-0.018
 counter-0.014
cloth-0.014
 chair-0.014
box-0.013
Token to
Token ID284
Feature activation+1.34e-02

Pos logit contributions
 stool+0.051
itten+0.047
cloth+0.044
box+0.043
coat+0.041
Neg logit contributions
minded-0.027
 Earth-0.027
 sailors-0.027
 adventures-0.026
 islands-0.026
Token the
Token ID262
Feature activation+2.57e-02

Pos logit contributions
 stool+0.029
cloth+0.025
 counter+0.024
box+0.023
 chair+0.022
Neg logit contributions
zo-0.021
 culture-0.020
 habitats-0.019
 galaxies-0.019
minded-0.019
Token kitchen
Token ID9592
Feature activation+3.03e-02

Pos logit contributions
itten+0.040
 stool+0.035
cloth+0.032
coat+0.031
ifully+0.031
Neg logit contributions
 marriage-0.049
 kingdom-0.046
 Earth-0.046
 wedding-0.045
 culture-0.045
Token.
Token ID13
Feature activation-1.42e-03

Pos logit contributions
itten+0.075
 stool+0.069
 swollen+0.066
Uh+0.066
apy+0.064
Neg logit contributions
 adventures-0.027
 Earth-0.025
 about-0.024
 islands-0.023
minded-0.023
Token They
Token ID1119
Feature activation+1.31e-02

Pos logit contributions
 culture+0.002
 Friendship+0.002
 seals+0.002
 habitats+0.002
 Earth+0.002
Neg logit contributions
 stool-0.003
cloth-0.003
box-0.003
 counter-0.003
itten-0.003
Token of
Token ID286
Feature activation-2.24e-02

Pos logit contributions
zo+0.019
 sailors+0.018
 culture+0.018
 planets+0.018
 rewarding+0.018
Neg logit contributions
 stool-0.029
cloth-0.025
 counter-0.024
 wra-0.023
box-0.023
Token you
Token ID345
Feature activation-2.98e-02

Pos logit contributions
zo+0.026
 culture+0.026
 adventures+0.026
 planets+0.026
 rewarding+0.025
Neg logit contributions
 stool-0.055
cloth-0.048
 counter-0.046
itten-0.044
box-0.044
Token,
Token ID11
Feature activation-2.41e-02

Pos logit contributions
zo+0.046
 culture+0.045
 adventures+0.043
 planets+0.043
 aliens+0.043
Neg logit contributions
 stool-0.065
cloth-0.054
 counter-0.053
 wra-0.050
 chair-0.050
Token so
Token ID523
Feature activation-1.15e-02

Pos logit contributions
zo+0.035
 culture+0.035
 rewarding+0.034
 adventures+0.034
 planets+0.033
Neg logit contributions
 stool-0.053
cloth-0.045
 counter-0.043
box-0.041
 wra-0.041
Token you
Token ID345
Feature activation-2.74e-02

Pos logit contributions
zo+0.016
 culture+0.016
 rewarding+0.015
 adventures+0.015
 planets+0.015
Neg logit contributions
 stool-0.027
cloth-0.023
 counter-0.022
box-0.021
 wra-0.021
Token don
Token ID836
Feature activation+1.36e-02

Pos logit contributions
 culture+0.048
zo+0.048
 habitats+0.047
minded+0.046
 galaxies+0.046
Neg logit contributions
 stool-0.052
cloth-0.044
 counter-0.041
box-0.040
 chair-0.039
Token't
Token ID470
Feature activation+7.43e-04

Pos logit contributions
 stool+0.024
cloth+0.020
 counter+0.018
box+0.018
 chair+0.017
Neg logit contributions
zo-0.026
 culture-0.026
 adventures-0.025
 Friendship-0.025
 planets-0.025
Token get
Token ID651
Feature activation-1.94e-02

Pos logit contributions
 stool+0.002
cloth+0.001
 counter+0.001
 chair+0.001
itten+0.001
Neg logit contributions
 culture-0.001
zo-0.001
 galaxies-0.001
 habitats-0.001
 planets-0.001
Token soaked
Token ID33657
Feature activation-1.05e-02

Pos logit contributions
zo+0.034
 culture+0.034
 habitats+0.033
 galaxies+0.033
 diversity+0.032
Neg logit contributions
 stool-0.038
cloth-0.032
 counter-0.030
 chair-0.028
box-0.028
Token!"
Token ID2474
Feature activation-6.06e-03

Pos logit contributions
zo+0.015
 culture+0.014
 planets+0.014
minded+0.014
 adventures+0.014
Neg logit contributions
 stool-0.024
cloth-0.020
 counter-0.019
box-0.019
itten-0.019
Token␣
Token ID220
Feature activation+3.53e-02

Pos logit contributions
zo+0.007
 Friendship+0.007
 culture+0.007
minded+0.007
 sailors+0.007
Neg logit contributions
 stool-0.015
cloth-0.013
 counter-0.012
box-0.012
 wra-0.012
Token wouldn
Token ID3636
Feature activation+5.60e-02

Pos logit contributions
 stool+0.002
cloth+0.001
 counter+0.001
 lid+0.001
amas+0.001
Neg logit contributions
zo-0.001
 adventures-0.001
minded-0.001
 culture-0.001
 experienced-0.001
Token't
Token ID470
Feature activation+3.37e-02

Pos logit contributions
 stool+0.129
cloth+0.109
box+0.107
pit+0.107
 counter+0.106
Neg logit contributions
zo-0.071
 Earth-0.070
 villains-0.064
 Jupiter-0.063
 aliens-0.063
Token find
Token ID1064
Feature activation-5.37e-03

Pos logit contributions
 stool+0.076
cloth+0.062
itten+0.061
box+0.060
 counter+0.060
Neg logit contributions
zo-0.048
 adventures-0.047
 villains-0.046
 rewarding-0.046
 culture-0.046
Token out
Token ID503
Feature activation-1.14e-02

Pos logit contributions
zo+0.006
 culture+0.006
 planets+0.006
 adventures+0.006
 aliens+0.006
Neg logit contributions
 stool-0.014
cloth-0.012
 counter-0.011
box-0.011
itten-0.011
Token.
Token ID13
Feature activation+6.54e-04

Pos logit contributions
 culture+0.016
zo+0.016
 adventures+0.016
 planets+0.016
 galaxies+0.015
Neg logit contributions
 stool-0.026
cloth-0.022
 counter-0.021
box-0.020
itten-0.020
Token But
Token ID887
Feature activation+2.46e-02

Pos logit contributions
 stool+0.002
cloth+0.001
 knob+0.001
box+0.001
itten+0.001
Neg logit contributions
 Friendship-0.001
zo-0.001
 habitats-0.001
minded-0.001
 Earth-0.001
Token later
Token ID1568
Feature activation+3.15e-02

Pos logit contributions
 stool+0.054
cloth+0.045
 counter+0.045
itten+0.044
box+0.044
Neg logit contributions
 Earth-0.034
 adventures-0.034
 Jupiter-0.033
 culture-0.033
zo-0.032
Token that
Token ID326
Feature activation+1.27e-03

Pos logit contributions
 stool+0.065
cloth+0.054
 counter+0.050
 sponge+0.050
itten+0.050
Neg logit contributions
 adventures-0.048
 marriage-0.048
zo-0.047
 Earth-0.047
 culture-0.046
Token night
Token ID1755
Feature activation+3.22e-02

Pos logit contributions
 stool+0.003
cloth+0.003
 counter+0.003
box+0.003
 wra+0.003
Neg logit contributions
zo-0.002
 culture-0.001
 adventures-0.001
 planets-0.001
 rewarding-0.001
Token,
Token ID11
Feature activation+2.09e-02

Pos logit contributions
 stool+0.068
itten+0.057
cloth+0.056
box+0.055
pit+0.054
Neg logit contributions
 Earth-0.047
 adventures-0.044
zo-0.044
 marriage-0.043
 Jupiter-0.042
Token Lily
Token ID20037
Feature activation-2.25e-02

Pos logit contributions
 stool+0.043
cloth+0.036
itten+0.036
box+0.035
 counter+0.034
Neg logit contributions
 adventures-0.032
 Earth-0.032
 Jupiter-0.030
 culture-0.030
 planets-0.030
Token and
Token ID290
Feature activation-4.76e-03

Pos logit contributions
zo+0.010
 culture+0.010
 galaxies+0.010
 rewarding+0.010
 habitats+0.010
Neg logit contributions
 stool-0.014
cloth-0.012
 counter-0.012
 chair-0.011
 wra-0.011
Token say
Token ID910
Feature activation+6.89e-03

Pos logit contributions
zo+0.008
 culture+0.008
 habitats+0.008
 galaxies+0.008
minded+0.008
Neg logit contributions
 stool-0.009
cloth-0.008
 counter-0.007
box-0.007
 chair-0.007
Token,
Token ID11
Feature activation-1.06e-02

Pos logit contributions
 stool+0.011
cloth+0.009
 counter+0.008
box+0.008
 chair+0.008
Neg logit contributions
zo-0.015
 habitats-0.014
 galaxies-0.014
 culture-0.014
 planets-0.014
Token "
Token ID366
Feature activation+1.97e-02

Pos logit contributions
zo+0.027
 galaxies+0.026
 culture+0.026
 planets+0.026
 habitats+0.026
Neg logit contributions
 stool-0.013
cloth-0.009
 counter-0.009
 chair-0.008
 wra-0.007
TokenYou
Token ID1639
Feature activation-9.75e-04

Pos logit contributions
 stool+0.034
cloth+0.028
 counter+0.026
 wra+0.026
coat+0.025
Neg logit contributions
 Jupiter-0.038
zo-0.037
 habitats-0.037
 culture-0.037
 galaxies-0.037
Token are
Token ID389
Feature activation-1.17e-02

Pos logit contributions
 habitats+0.002
zo+0.002
 culture+0.002
 galaxies+0.002
minded+0.002
Neg logit contributions
 stool-0.001
cloth-0.001
box-0.001
 counter-0.001
 bowl-0.001
Token my
Token ID616
Feature activation-1.68e-02

Pos logit contributions
 habitats+0.024
zo+0.023
 galaxies+0.023
 culture+0.022
minded+0.022
Neg logit contributions
 stool-0.020
cloth-0.017
box-0.016
 bowl-0.016
 counter-0.016
Token friend
Token ID1545
Feature activation-3.10e-02

Pos logit contributions
zo+0.026
 culture+0.026
 galaxies+0.026
 habitats+0.025
 planets+0.025
Neg logit contributions
 stool-0.036
cloth-0.030
 counter-0.029
 chair-0.028
 bowl-0.028
Token.
Token ID13
Feature activation-2.68e-02

Pos logit contributions
zo+0.053
 culture+0.052
 galaxies+0.051
 habitats+0.051
 planets+0.050
Neg logit contributions
 stool-0.061
cloth-0.052
 counter-0.049
box-0.047
 chair-0.047
Token I
Token ID314
Feature activation-1.36e-02

Pos logit contributions
zo+0.057
 galaxies+0.056
 adventures+0.055
 battles+0.055
 targets+0.054
Neg logit contributions
 stool-0.038
cloth-0.036
 bowl-0.031
 sponge-0.031
 Lee-0.031
Token like
Token ID588
Feature activation-3.85e-02

Pos logit contributions
 galaxies+0.028
 planets+0.027
 culture+0.027
zo+0.027
 Jupiter+0.027
Neg logit contributions
 stool-0.020
cloth-0.019
 wra-0.016
 bowl-0.016
 spoon-0.016
Token\n
Token ID198
Feature activation-3.78e-02

Pos logit contributions
 stool+0.024
cloth+0.020
 counter+0.020
box+0.019
 chair+0.019
Neg logit contributions
 culture-0.018
zo-0.018
 galaxies-0.017
 habitats-0.017
 planets-0.017
Token\n
Token ID198
Feature activation-3.17e-02

Pos logit contributions
 culture+0.082
 rewarding+0.079
 habitats+0.079
minded+0.078
 cultures+0.077
Neg logit contributions
 stool-0.057
 counter-0.046
 chair-0.044
cloth-0.041
box-0.040
TokenThe
Token ID464
Feature activation+1.28e-03

Pos logit contributions
 culture+0.057
 habitats+0.056
 rewarding+0.056
minded+0.055
 impact+0.053
Neg logit contributions
 stool-0.055
 counter-0.046
cloth-0.044
 wra-0.043
amas-0.042
Token moral
Token ID6573
Feature activation+5.16e-04

Pos logit contributions
 stool+0.003
 counter+0.002
 chair+0.002
cloth+0.002
 bowl+0.002
Neg logit contributions
 habitats-0.002
minded-0.002
 culture-0.002
zo-0.002
 Friendship-0.002
Token of
Token ID286
Feature activation-3.32e-02

Pos logit contributions
 stool+0.001
cloth+0.001
box+0.001
 counter+0.000
 chair+0.000
Neg logit contributions
 habitats-0.001
 culture-0.001
zo-0.001
 galaxies-0.001
minded-0.001
Token the
Token ID262
Feature activation-5.28e-03

Pos logit contributions
 habitats+0.073
 culture+0.072
 villains+0.071
 victories+0.070
zo+0.070
Neg logit contributions
 stool-0.048
cloth-0.038
 chair-0.038
 counter-0.038
box-0.037
Token story
Token ID1621
Feature activation-1.19e-04

Pos logit contributions
 habitats+0.012
minded+0.011
 culture+0.011
 islands+0.011
 conflicts+0.011
Neg logit contributions
 stool-0.008
 lid-0.006
 chair-0.006
 counter-0.006
 therm-0.006
Token is
Token ID318
Feature activation-4.18e-02

Pos logit contributions
zo+0.000
 culture+0.000
 galaxies+0.000
 habitats+0.000
 sailors+0.000
Neg logit contributions
 stool-0.000
cloth-0.000
 counter-0.000
box-0.000
 chair-0.000
Token that
Token ID326
Feature activation-2.53e-02

Pos logit contributions
 planets+0.083
 galaxies+0.083
 victories+0.082
 habitats+0.081
 battles+0.081
Neg logit contributions
 stool-0.064
box-0.054
 door-0.050
cloth-0.048
 counter-0.048
Token sharing
Token ID7373
Feature activation-2.13e-02

Pos logit contributions
zo+0.046
 habitats+0.045
minded+0.042
 galaxies+0.041
 culture+0.041
Neg logit contributions
 stool-0.046
 spoon-0.037
cloth-0.037
 bowl-0.036
 counter-0.036
Token is
Token ID318
Feature activation-3.86e-02

Pos logit contributions
zo+0.043
 habitats+0.041
 culture+0.041
 galaxies+0.040
minded+0.040
Neg logit contributions
 stool-0.035
cloth-0.030
 counter-0.027
box-0.027
 wra-0.026
Token weak
Token ID4939
Feature activation+2.18e-02

Pos logit contributions
 stool+0.035
itten+0.031
cloth+0.029
 counter+0.028
box+0.027
Neg logit contributions
 adventures-0.030
 planets-0.029
 culture-0.029
 galaxies-0.028
 Jupiter-0.027
Token.
Token ID13
Feature activation-3.41e-03

Pos logit contributions
 stool+0.052
itten+0.043
cloth+0.041
box+0.041
 counter+0.040
Neg logit contributions
 aliens-0.029
 planets-0.029
 adventures-0.029
 galaxies-0.027
 culture-0.027
Token\n
Token ID198
Feature activation-1.08e-02

Pos logit contributions
zo+0.006
 culture+0.006
 habitats+0.006
 rewarding+0.006
minded+0.006
Neg logit contributions
 stool-0.007
cloth-0.006
 counter-0.005
 chair-0.005
box-0.005
Token\n
Token ID198
Feature activation+9.75e-03

Pos logit contributions
 culture+0.020
zo+0.020
 habitats+0.020
 sailors+0.020
minded+0.019
Neg logit contributions
 stool-0.019
 chair-0.015
 counter-0.015
cloth-0.015
box-0.014
TokenMay
Token ID6747
Feature activation-1.77e-02

Pos logit contributions
 stool+0.022
cloth+0.018
 counter+0.018
 chair+0.017
box+0.017
Neg logit contributions
zo-0.014
 culture-0.014
 habitats-0.014
minded-0.013
 sailors-0.013
Token wanted
Token ID2227
Feature activation-4.76e-03

Pos logit contributions
zo+0.030
 culture+0.029
minded+0.027
 adventures+0.027
 planets+0.027
Neg logit contributions
 stool-0.036
cloth-0.030
 counter-0.029
 wra-0.027
 chair-0.027
Token to
Token ID284
Feature activation+4.34e-02

Pos logit contributions
zo+0.008
 culture+0.008
 habitats+0.007
 galaxies+0.007
 planets+0.007
Neg logit contributions
 stool-0.010
cloth-0.009
 counter-0.008
 chair-0.008
box-0.008
Token make
Token ID787
Feature activation+2.10e-02

Pos logit contributions
 stool+0.096
cloth+0.082
 counter+0.078
box+0.076
 wra+0.075
Neg logit contributions
zo-0.063
 culture-0.062
 adventures-0.061
 Friendship-0.059
 planets-0.059
Token the
Token ID262
Feature activation+2.94e-02

Pos logit contributions
 stool+0.038
cloth+0.032
 counter+0.029
box+0.028
 wra+0.028
Neg logit contributions
zo-0.038
 culture-0.038
 adventures-0.038
 planets-0.037
 galaxies-0.037
Token dress
Token ID6576
Feature activation+3.50e-02

Pos logit contributions
 stool+0.048
cloth+0.039
 counter+0.036
box+0.034
itten+0.034
Neg logit contributions
 culture-0.059
 adventures-0.058
 planets-0.057
zo-0.057
 aliens-0.056
Token look
Token ID804
Feature activation+2.44e-02

Pos logit contributions
 stool+0.078
cloth+0.067
 wra+0.066
 counter+0.065
itten+0.065
Neg logit contributions
 extraordinary-0.046
 adventures-0.045
 Jupiter-0.044
 rewarding-0.044
zo-0.043
Token but
Token ID475
Feature activation-1.21e-02

Pos logit contributions
zo+0.076
 culture+0.071
 habitats+0.070
minded+0.070
 rewarding+0.069
Neg logit contributions
 stool-0.064
cloth-0.050
 counter-0.048
 chair-0.047
 ladder-0.045
Token he
Token ID339
Feature activation-4.08e-02

Pos logit contributions
zo+0.022
 culture+0.021
minded+0.021
 habitats+0.020
 Friendship+0.020
Neg logit contributions
 stool-0.023
cloth-0.019
 counter-0.018
 chair-0.018
box-0.017
Token was
Token ID373
Feature activation-2.86e-02

Pos logit contributions
 culture+0.091
 Friendship+0.089
 diversity+0.089
minded+0.089
 galaxies+0.089
Neg logit contributions
 stool-0.056
cloth-0.043
 chair-0.040
 slipped-0.039
 spoon-0.038
Token too
Token ID1165
Feature activation-2.45e-02

Pos logit contributions
 habitats+0.055
zo+0.054
 culture+0.054
minded+0.053
 cultures+0.052
Neg logit contributions
 stool-0.050
cloth-0.041
 chair-0.038
 counter-0.036
 bowl-0.036
Token scared
Token ID12008
Feature activation-3.30e-02

Pos logit contributions
 culture+0.040
zo+0.039
 habitats+0.039
 galaxies+0.038
 Friendship+0.038
Neg logit contributions
 stool-0.050
cloth-0.042
 counter-0.039
 chair-0.039
itten-0.037
Token.
Token ID13
Feature activation-3.83e-02

Pos logit contributions
 culture+0.064
zo+0.061
 habitats+0.061
 diversity+0.060
 galaxies+0.058
Neg logit contributions
 stool-0.058
cloth-0.048
 counter-0.044
 chair-0.044
box-0.044
Token He
Token ID679
Feature activation-2.47e-02

Pos logit contributions
zo+0.069
 culture+0.069
 rewarding+0.066
 aliens+0.066
 sailors+0.065
Neg logit contributions
 stool-0.072
 counter-0.058
 chair-0.056
cloth-0.056
box-0.054
Token told
Token ID1297
Feature activation-1.42e-02

Pos logit contributions
 Friendship+0.048
 diversity+0.048
 culture+0.048
 rewarding+0.048
minded+0.047
Neg logit contributions
 stool-0.040
cloth-0.032
 chair-0.030
 counter-0.029
 knob-0.029
Token his
Token ID465
Feature activation-1.46e-02

Pos logit contributions
 rewarding+0.027
 culture+0.026
zo+0.026
 Friendship+0.026
minded+0.026
Neg logit contributions
 stool-0.025
cloth-0.020
 counter-0.020
 chair-0.020
 knob-0.019
Token friend
Token ID1545
Feature activation-9.29e-03

Pos logit contributions
zo+0.030
minded+0.030
 galaxies+0.029
 culture+0.029
 Friendship+0.029
Neg logit contributions
 stool-0.024
 chair-0.018
 counter-0.017
 ladder-0.017
cloth-0.017
Token Lucy
Token ID22162
Feature activation-1.18e-02

Pos logit contributions
zo+0.017
 culture+0.017
 adventures+0.017
 aliens+0.016
 planets+0.016
Neg logit contributions
 stool-0.017
cloth-0.014
 counter-0.013
 wra-0.012
 chair-0.012
Token there
Token ID612
Feature activation+1.43e-02

Pos logit contributions
 stool+0.014
cloth+0.012
 counter+0.012
 wrapper+0.011
box+0.010
Neg logit contributions
 adventures-0.013
zo-0.012
 culture-0.012
 Jupiter-0.012
 Earth-0.012
Token was
Token ID373
Feature activation-1.44e-02

Pos logit contributions
 stool+0.029
cloth+0.023
 counter+0.023
box+0.021
 wra+0.021
Neg logit contributions
 adventures-0.025
zo-0.024
 Friendship-0.023
 planets-0.023
 Earth-0.023
Token a
Token ID257
Feature activation-2.10e-02

Pos logit contributions
 culture+0.028
zo+0.027
minded+0.027
 habitats+0.026
 diversity+0.026
Neg logit contributions
 stool-0.025
cloth-0.021
 counter-0.020
 chair-0.019
 wra-0.019
Token wise
Token ID10787
Feature activation+6.21e-03

Pos logit contributions
zo+0.040
 culture+0.040
 habitats+0.038
 galaxies+0.038
minded+0.038
Neg logit contributions
 stool-0.037
cloth-0.031
 counter-0.029
box-0.027
 chair-0.027
Token old
Token ID1468
Feature activation+2.02e-02

Pos logit contributions
 stool+0.013
cloth+0.010
 counter+0.010
 chair+0.010
box+0.009
Neg logit contributions
zo-0.010
 culture-0.010
 habitats-0.010
minded-0.010
 galaxies-0.010
Token grandfather
Token ID17695
Feature activation-7.24e-03

Pos logit contributions
 stool+0.042
cloth+0.040
box+0.035
 counter+0.034
 wra+0.034
Neg logit contributions
 culture-0.030
zo-0.030
 Earth-0.028
 adventures-0.028
 sailors-0.027
Token.
Token ID13
Feature activation+3.03e-03

Pos logit contributions
 culture+0.011
 habitats+0.011
 rewarding+0.011
 galaxies+0.011
zo+0.011
Neg logit contributions
 stool-0.015
cloth-0.013
 counter-0.013
box-0.012
 chair-0.012
Token He
Token ID679
Feature activation-1.96e-02

Pos logit contributions
 stool+0.006
cloth+0.005
 counter+0.005
 chair+0.005
box+0.005
Neg logit contributions
zo-0.005
 culture-0.005
 habitats-0.005
 galaxies-0.005
 rewarding-0.005
Token liked
Token ID8288
Feature activation-3.71e-02

Pos logit contributions
minded+0.031
 culture+0.031
zo+0.030
 diversity+0.030
 galaxies+0.030
Neg logit contributions
 stool-0.038
cloth-0.032
 counter-0.031
 chair-0.030
 wra-0.030
Token to
Token ID284
Feature activation+1.32e-02

Pos logit contributions
zo+0.068
 rewarding+0.065
 culture+0.065
minded+0.064
 diversity+0.063
Neg logit contributions
 stool-0.068
 counter-0.055
 chair-0.054
cloth-0.050
box-0.050
Token tell
Token ID1560
Feature activation-1.57e-02

Pos logit contributions
 stool+0.029
cloth+0.025
 counter+0.024
box+0.023
 wra+0.023
Neg logit contributions
zo-0.020
 culture-0.019
 adventures-0.019
 planets-0.019
 galaxies-0.018
Token air
Token ID1633
Feature activation+2.06e-02

Pos logit contributions
 stool+0.057
cloth+0.049
box+0.047
 counter+0.045
 wra+0.044
Neg logit contributions
zo-0.037
 culture-0.035
 adventures-0.035
 sailors-0.035
 seals-0.035
Token on
Token ID319
Feature activation+1.68e-02

Pos logit contributions
 stool+0.046
cloth+0.037
 counter+0.036
box+0.036
itten+0.035
Neg logit contributions
zo-0.031
 adventures-0.030
 planets-0.029
 culture-0.029
 galaxies-0.028
Token his
Token ID465
Feature activation-5.57e-03

Pos logit contributions
 stool+0.042
cloth+0.036
box+0.036
coat+0.035
itten+0.034
Neg logit contributions
zo-0.020
 adventures-0.019
 Earth-0.019
 planets-0.019
 culture-0.019
Token face
Token ID1986
Feature activation+9.28e-03

Pos logit contributions
minded+0.012
 Friendship+0.012
 culture+0.012
 rewarding+0.012
 habitats+0.011
Neg logit contributions
 stool-0.009
cloth-0.007
 counter-0.007
 chair-0.006
box-0.006
Token.
Token ID13
Feature activation-4.39e-02

Pos logit contributions
 stool+0.022
cloth+0.019
 counter+0.018
box+0.018
 wra+0.017
Neg logit contributions
zo-0.012
 adventures-0.012
 planets-0.011
 culture-0.011
 battles-0.011
Token\n
Token ID198
Feature activation-4.38e-02

Pos logit contributions
 culture+0.082
zo+0.078
 rewarding+0.077
 sailors+0.075
minded+0.075
Neg logit contributions
 stool-0.081
 counter-0.067
 chair-0.065
cloth-0.065
box-0.062
Token\n
Token ID198
Feature activation-3.01e-02

Pos logit contributions
 culture+0.093
 habitats+0.087
minded+0.087
 cultures+0.086
 galaxies+0.086
Neg logit contributions
 stool-0.073
 chair-0.057
 counter-0.055
cloth-0.054
box-0.053
TokenOne
Token ID3198
Feature activation-1.25e-02

Pos logit contributions
 culture+0.047
minded+0.045
 sailors+0.044
 aliens+0.044
 habitats+0.043
Neg logit contributions
 stool-0.064
 counter-0.052
cloth-0.052
 chair-0.051
box-0.049
Token day
Token ID1110
Feature activation+3.92e-02

Pos logit contributions
 culture+0.021
 habitats+0.021
minded+0.021
 Friendship+0.021
 galaxies+0.020
Neg logit contributions
 stool-0.026
 chair-0.022
 bowl-0.021
cloth-0.021
 spoon-0.020
Token the
Token ID262
Feature activation+2.83e-03

Pos logit contributions
 stool+0.074
cloth+0.063
 counter+0.058
box+0.057
itten+0.054
Neg logit contributions
zo-0.071
 adventures-0.068
 Earth-0.068
 culture-0.068
 planets-0.067
Token bald
Token ID28273
Feature activation+1.49e-02

Pos logit contributions
 stool+0.005
 counter+0.004
 chair+0.004
cloth+0.004
 spoon+0.003
Neg logit contributions
 culture-0.006
 habitats-0.006
 galaxies-0.006
 aliens-0.005
zo-0.005
Token everyone
Token ID2506
Feature activation-5.01e-02

Pos logit contributions
zo+0.006
 culture+0.006
 habitats+0.005
 galaxies+0.005
minded+0.005
Neg logit contributions
 stool-0.008
cloth-0.007
 counter-0.006
box-0.006
 chair-0.006
Token had
Token ID550
Feature activation-3.06e-02

Pos logit contributions
 habitats+0.117
 culture+0.114
zo+0.114
 rewarding+0.112
minded+0.112
Neg logit contributions
 stool-0.063
cloth-0.052
 chair-0.043
box-0.042
 spoon-0.042
Token fun
Token ID1257
Feature activation-4.45e-02

Pos logit contributions
zo+0.053
 culture+0.051
minded+0.051
 habitats+0.050
 diversity+0.050
Neg logit contributions
 stool-0.056
cloth-0.049
box-0.045
 counter-0.045
 chair-0.042
Token and
Token ID290
Feature activation-2.32e-02

Pos logit contributions
zo+0.087
minded+0.083
 habitats+0.082
 Jupiter+0.082
 culture+0.082
Neg logit contributions
 stool-0.068
cloth-0.059
box-0.051
 chair-0.050
 sponge-0.050
Token they
Token ID484
Feature activation-3.69e-02

Pos logit contributions
zo+0.039
 culture+0.038
 habitats+0.037
minded+0.036
 galaxies+0.036
Neg logit contributions
 stool-0.046
cloth-0.039
 counter-0.037
 chair-0.035
box-0.035
Token all
Token ID477
Feature activation-4.39e-02

Pos logit contributions
 culture+0.071
 rewarding+0.070
minded+0.069
 habitats+0.069
 diversity+0.069
Neg logit contributions
 stool-0.057
cloth-0.050
 slipped-0.042
 chair-0.042
 spoon-0.042
Token became
Token ID2627
Feature activation-2.67e-02

Pos logit contributions
 culture+0.080
minded+0.079
 habitats+0.078
zo+0.078
 diversity+0.078
Neg logit contributions
 stool-0.071
cloth-0.064
 chair-0.055
 counter-0.054
 slipped-0.052
Token friends
Token ID2460
Feature activation-5.36e-02

Pos logit contributions
zo+0.043
 habitats+0.042
 culture+0.042
minded+0.039
 galaxies+0.039
Neg logit contributions
 stool-0.053
cloth-0.046
 counter-0.041
 chair-0.041
 bowl-0.040
Token.
Token ID13
Feature activation-4.98e-02

Pos logit contributions
 culture+0.117
zo+0.115
 habitats+0.114
 galaxies+0.113
 rewarding+0.112
Neg logit contributions
 stool-0.077
cloth-0.068
 counter-0.057
 chair-0.057
box-0.056
Token<|endoftext|>
Token ID50256
Feature activation-3.94e-02

Pos logit contributions
zo+0.099
 culture+0.098
 battles+0.097
 galaxies+0.095
minded+0.095
Neg logit contributions
 stool-0.079
 counter-0.066
cloth-0.065
 bowl-0.062
 chair-0.061
TokenOnce
Token ID7454
Feature activation-2.95e-02

Pos logit contributions
 culture+0.078
zo+0.074
 rewarding+0.074
 habitats+0.072
minded+0.072
Neg logit contributions
 stool-0.070
cloth-0.058
 counter-0.056
 chair-0.055
box-0.054
Token room
Token ID2119
Feature activation+5.40e-03

Pos logit contributions
 stool+0.006
cloth+0.006
 counter+0.005
box+0.005
itten+0.005
Neg logit contributions
 culture-0.005
zo-0.004
 adventures-0.004
 planets-0.004
 habitats-0.004
Token in
Token ID287
Feature activation-1.91e-02

Pos logit contributions
 stool+0.011
cloth+0.009
 counter+0.009
box+0.008
 chair+0.008
Neg logit contributions
zo-0.009
 culture-0.009
 adventures-0.008
 habitats-0.008
 planets-0.008
Token the
Token ID262
Feature activation-2.04e-03

Pos logit contributions
zo+0.033
 culture+0.031
minded+0.030
 habitats+0.030
 galaxies+0.030
Neg logit contributions
 stool-0.039
cloth-0.033
 counter-0.031
 chair-0.030
box-0.029
Token palace
Token ID20562
Feature activation+7.53e-04

Pos logit contributions
zo+0.003
 culture+0.003
minded+0.003
 habitats+0.003
 targets+0.003
Neg logit contributions
 stool-0.004
 counter-0.004
 chair-0.004
cloth-0.004
 bowl-0.004
Token.
Token ID13
Feature activation-3.38e-02

Pos logit contributions
 stool+0.001
cloth+0.001
 counter+0.001
box+0.001
 chair+0.001
Neg logit contributions
zo-0.001
 culture-0.001
 galaxies-0.001
 habitats-0.001
minded-0.001
Token\n
Token ID198
Feature activation-4.54e-02

Pos logit contributions
zo+0.066
 culture+0.065
 rewarding+0.063
 battles+0.062
minded+0.062
Neg logit contributions
 stool-0.060
 counter-0.049
cloth-0.047
 chair-0.047
 bowl-0.045
Token\n
Token ID198
Feature activation-1.33e-02

Pos logit contributions
 culture+0.100
minded+0.097
 habitats+0.096
 sailors+0.095
zo+0.094
Neg logit contributions
 stool-0.071
 chair-0.056
cloth-0.051
 counter-0.050
 bowl-0.050
TokenOne
Token ID3198
Feature activation-1.02e-03

Pos logit contributions
zo+0.020
 culture+0.020
minded+0.019
 sailors+0.019
 galaxies+0.019
Neg logit contributions
 stool-0.029
cloth-0.025
 counter-0.024
 chair-0.023
coat-0.022
Token day
Token ID1110
Feature activation+4.48e-02

Pos logit contributions
minded+0.002
 targets+0.002
 habitats+0.002
 Friendship+0.002
 galaxies+0.002
Neg logit contributions
 stool-0.002
 chair-0.002
cloth-0.002
 bowl-0.002
 cup-0.002
Token,
Token ID11
Feature activation+5.26e-02

Pos logit contributions
 stool+0.084
cloth+0.070
 counter+0.067
box+0.064
itten+0.061
Neg logit contributions
zo-0.078
 culture-0.078
 adventures-0.077
 Earth-0.077
 aliens-0.076
Token Lily
Token ID20037
Feature activation-3.07e-02

Pos logit contributions
 stool+0.118
cloth+0.103
box+0.101
itten+0.099
 counter+0.099
Neg logit contributions
 Earth-0.070
 aliens-0.069
 adventures-0.069
 culture-0.069
 planets-0.069
Token,
Token ID11
Feature activation-4.19e-02

Pos logit contributions
zo+0.028
 culture+0.027
 Friendship+0.026
 planets+0.026
 adventures+0.026
Neg logit contributions
 stool-0.039
cloth-0.033
 counter-0.032
 wra-0.030
 chair-0.030
Token Fl
Token ID1610
Feature activation-2.59e-02

Pos logit contributions
zo+0.080
minded+0.079
 culture+0.078
 habitats+0.077
 diversity+0.075
Neg logit contributions
 stool-0.078
cloth-0.061
 counter-0.061
 chair-0.058
 wra-0.056
Tokenuffy
Token ID15352
Feature activation-4.17e-02

Pos logit contributions
zo+0.068
 culture+0.068
 galaxies+0.066
 habitats+0.066
minded+0.066
Neg logit contributions
 stool-0.027
cloth-0.019
 counter-0.017
box-0.015
 chair-0.015
Token.
Token ID13
Feature activation-4.54e-02

Pos logit contributions
zo+0.069
 culture+0.067
 habitats+0.065
 planets+0.064
 adventures+0.064
Neg logit contributions
 stool-0.085
cloth-0.072
 counter-0.069
box-0.065
 chair-0.065
Token Fl
Token ID1610
Feature activation-2.79e-02

Pos logit contributions
zo+0.079
 culture+0.079
 rewarding+0.077
minded+0.077
 galaxies+0.075
Neg logit contributions
 stool-0.090
 counter-0.074
cloth-0.074
 chair-0.072
 bowl-0.069
Tokenuffy
Token ID15352
Feature activation-5.27e-02

Pos logit contributions
zo+0.074
 culture+0.073
 galaxies+0.071
 habitats+0.071
minded+0.071
Neg logit contributions
 stool-0.029
cloth-0.020
 counter-0.018
box-0.016
 chair-0.016
Token was
Token ID373
Feature activation-3.64e-02

Pos logit contributions
 culture+0.088
 rewarding+0.087
minded+0.086
 galaxies+0.086
 diversity+0.085
Neg logit contributions
 stool-0.100
cloth-0.084
 chair-0.080
 counter-0.079
 bowl-0.077
Token happy
Token ID3772
Feature activation-3.81e-02

Pos logit contributions
zo+0.074
minded+0.073
 habitats+0.073
 culture+0.072
 galaxies+0.071
Neg logit contributions
 stool-0.061
cloth-0.050
 counter-0.046
 chair-0.046
 bowl-0.044
Token and
Token ID290
Feature activation-2.69e-02

Pos logit contributions
zo+0.078
 culture+0.077
 habitats+0.075
 diversity+0.074
 languages+0.073
Neg logit contributions
 stool-0.065
 chair-0.051
 bowl-0.049
cloth-0.049
 spoon-0.048
Token loved
Token ID6151
Feature activation-3.78e-02

Pos logit contributions
 targets+0.045
zo+0.044
minded+0.043
 galaxies+0.043
 aliens+0.043
Neg logit contributions
 stool-0.051
cloth-0.043
 chair-0.041
 counter-0.041
 bowl-0.039
Token to
Token ID284
Feature activation+8.22e-03

Pos logit contributions
zo+0.071
 culture+0.068
minded+0.067
 rewarding+0.067
 galaxies+0.066
Neg logit contributions
 stool-0.069
 chair-0.055
 counter-0.054
cloth-0.053
 wra-0.050
Token things
Token ID1243
Feature activation-2.23e-02

Pos logit contributions
zo+0.046
 rewarding+0.041
 villains+0.041
 Friendship+0.041
 galaxies+0.040
Neg logit contributions
 stool-0.042
cloth-0.035
 counter-0.034
 chair-0.034
 wra-0.033
Token for
Token ID329
Feature activation-5.57e-03

Pos logit contributions
zo+0.040
 culture+0.040
 habitats+0.040
 galaxies+0.039
 planets+0.038
Neg logit contributions
 stool-0.041
cloth-0.034
 counter-0.033
box-0.031
 wra-0.031
Token his
Token ID465
Feature activation-2.45e-02

Pos logit contributions
zo+0.009
 culture+0.009
minded+0.008
 habitats+0.008
 galaxies+0.008
Neg logit contributions
 stool-0.012
cloth-0.010
 counter-0.010
box-0.009
 chair-0.009
Token curious
Token ID11040
Feature activation-1.93e-02

Pos logit contributions
zo+0.039
 culture+0.038
 habitats+0.037
 adventures+0.037
 planets+0.036
Neg logit contributions
 stool-0.052
cloth-0.044
 counter-0.042
box-0.040
 chair-0.040
Token mind
Token ID2000
Feature activation-1.45e-02

Pos logit contributions
 culture+0.029
zo+0.029
 habitats+0.028
 adventures+0.028
 planets+0.028
Neg logit contributions
 stool-0.042
cloth-0.036
 counter-0.034
box-0.033
 wra-0.032
Token.
Token ID13
Feature activation-4.62e-02

Pos logit contributions
zo+0.021
 culture+0.020
 planets+0.019
 galaxies+0.019
 habitats+0.019
Neg logit contributions
 stool-0.033
cloth-0.028
 counter-0.027
 wra-0.026
itten-0.026
Token<|endoftext|>
Token ID50256
Feature activation-2.70e-02

Pos logit contributions
 culture+0.085
zo+0.080
 planets+0.079
 aliens+0.079
 galaxies+0.079
Neg logit contributions
 stool-0.085
cloth-0.070
 counter-0.068
 chair-0.067
box-0.066
TokenJohn
Token ID7554
Feature activation-2.96e-02

Pos logit contributions
 culture+0.043
zo+0.042
 rewarding+0.040
 habitats+0.040
minded+0.040
Neg logit contributions
 stool-0.057
cloth-0.049
 counter-0.047
box-0.045
 chair-0.045
Token was
Token ID373
Feature activation-1.71e-02

Pos logit contributions
zo+0.048
 adventures+0.045
 habitats+0.044
 planets+0.044
 culture+0.044
Neg logit contributions
 stool-0.064
cloth-0.053
 counter-0.051
 wra-0.048
box-0.047
Token three
Token ID1115
Feature activation-1.69e-03

Pos logit contributions
zo+0.024
 culture+0.023
 adventures+0.023
 planets+0.022
 rewarding+0.022
Neg logit contributions
 stool-0.039
cloth-0.033
 counter-0.032
 wra-0.031
box-0.030
Token years
Token ID812
Feature activation+6.03e-03

Pos logit contributions
zo+0.002
 culture+0.002
 Friendship+0.002
 galaxies+0.002
minded+0.002
Neg logit contributions
 stool-0.004
cloth-0.004
 counter-0.004
 chair-0.003
box-0.003