TOP ACTIVATIONS
MAX = 0.152

All the furniture in L izz ie ’ s
In the end , L izz ie was really pleased with
in a tree and L izz ie wanted to get closer
First , L izz ie designed a cozy house
One day , L izz ie was in the park
a new outfit . L izz ie wanted to look really
's help . <|endoftext|> F inn sat under the apple tree
wine . L izz ie listened carefully and learned
. L izz ie jumped into the dish
just begun ! <|endoftext|> F inn was fishing with his dad
into the bush . L izz ie was about to turn
again . But unb ek nown to him , the king
From then on , L izz ie knew to always be
Jack had brought L izz ie a present . It
L izz ie . L izz ie was only three years
exclaimed . L izz ie smiled back . She
bear . L izz ie was worried that the
L izz ie . L izz ie was only three years
ie . Thankfully , L izz ie was alright , but
Suddenly , L izz ie saw something wonderful .

MIN ACTIVATIONS
MIN = -0.243

my cow boys and cow girls . I love you ."
be cow boys and cow girls . They put on hats
play cow boys and cow girls , and pretend to ride
them away from their natural habitats . We should be kind
them away from their natural habitats . We should be friendly
They might not train their dogs well . You have to
Look at how happy our animals look together !"
that said : " No dogs allowed ". Jen and Ben
soon . They hoped the animals and plants would be okay
noises and pretended to be animals . They had a lot
is not only about knowing facts and words . It is
noises and pretending to be animals . Lily had a big
" I don 't hate dogs anymore , they 're so
noises and pretending to be animals . Lily had a soft
advertising a campaign to help animals that did not have homes
" Do not feed the animals ". <|endoftext|> M ia and
noises and pretending to be animals . Tom was a lion
farm . We love our animals . We love our barn
noises and pretending to be animals . Lily had a yellow
noises and pretending to be animals . Lily had a doll

INTERVAL 0.053 - 0.152
CONTAINS 6.029%

mountains and oceans until they reached a beautiful place .
opened the cup board and took out some pots and pans
's friend , Max , saw what happened and ran over
the store . They walked down the a is les and
it , and then he put on his buckle and took

INTERVAL -0.046 - 0.053
CONTAINS 85.456%

" Who did this ? Who ruined my barn ?" He
them to the living room . They started to build a
the little girl . When it was time
. <|endoftext|> L ily liked to play princess with her dolls
y ummy and everyone wanted to play with it . But

INTERVAL -0.144 - -0.046
CONTAINS 8.483%

okay to ask for help when she needs it . Together
shiny . Anna wants to play with it . She says
They decided to share their toys with Spot .
and Sue did not see the snake behind them . They
was smart enough to know when something was too dangerous .

INTERVAL -0.243 - -0.144
CONTAINS 0.031%

more careful and respectful of animals . He still liked to
a bird . They liked animals . " Look
let 's make more paper animals !" Sara said .
fun making noises and telling stories . But then
about how to make her friends happy . M

Token All
Token ID1439
Feature activation+9.63e-03

Pos logit contributions
eh+0.000
Aid+0.000
O+0.000
 aback+0.000
 ingredient+0.000
Neg logit contributions
 sizes-0.001
ances-0.001
girls-0.001
arians-0.001
 cultures-0.001
Token the
Token ID262
Feature activation-7.38e-03

Pos logit contributions
O+0.011
eh+0.010
 aback+0.010
Aid+0.009
M+0.009
Neg logit contributions
 sizes-0.019
ances-0.018
 cultures-0.018
 nations-0.018
 planets-0.017
Token furniture
Token ID13091
Feature activation-6.83e-03

Pos logit contributions
ances+0.011
 achieve+0.011
girls+0.011
 cultures+0.010
aults+0.010
Neg logit contributions
 Jane-0.011
 my-0.010
 some-0.010
 Lucy-0.010
 me-0.010
Token in
Token ID287
Feature activation+3.64e-02

Pos logit contributions
 sizes+0.012
ances+0.012
 cultures+0.012
girls+0.012
 nations+0.011
Neg logit contributions
O-0.008
 Jane-0.008
 aback-0.008
 scoop-0.007
uck-0.007
Token L
Token ID406
Feature activation+1.92e-02

Pos logit contributions
O+0.028
 aback+0.026
 Jane+0.025
uck+0.024
Aid+0.024
Neg logit contributions
 sizes-0.082
ances-0.080
 cultures-0.080
 nations-0.078
 planets-0.078
Tokenizz
Token ID6457
Feature activation+1.52e-01

Pos logit contributions
 aback+0.009
O+0.009
 Jane+0.007
Aid+0.007
 scoop+0.007
Neg logit contributions
ances-0.049
 sizes-0.047
 cultures-0.046
bles-0.046
 nations-0.046
Tokenie
Token ID494
Feature activation+4.07e-02

Pos logit contributions
 Jane+0.169
O+0.169
 aback+0.163
uck+0.162
 scoop+0.157
Neg logit contributions
 sizes-0.280
 cultures-0.275
ances-0.273
arians-0.262
girls-0.261
Tokenâ
Token ID22940
Feature activation+1.91e-02

Pos logit contributions
 Jane+0.035
 found+0.035
 scoop+0.033
 aback+0.033
uck+0.033
Neg logit contributions
ances-0.086
 sizes-0.085
 cultures-0.085
girls-0.081
 nations-0.080
Token
Token ID26391
Feature activation-1.02e-02

Pos logit contributions
O+0.027
Da+0.026
 Jane+0.025
M+0.025
 another+0.025
Neg logit contributions
ances-0.029
bows-0.029
 sizes-0.028
 cultures-0.027
 achieve-0.027
Token
Token ID8151
Feature activation+2.44e-02

Pos logit contributions
 sizes+0.016
 cultures+0.016
ances+0.016
 nations+0.015
girls+0.014
Neg logit contributions
uck-0.015
 Jane-0.015
O-0.015
 found-0.014
M-0.014
Tokens
Token ID82
Feature activation+1.75e-02

Pos logit contributions
 Jane+0.030
O+0.029
M+0.028
 aback+0.027
uck+0.027
Neg logit contributions
 cultures-0.047
 sizes-0.046
 planets-0.045
ances-0.045
arians-0.042
TokenIn
Token ID818
Feature activation-1.10e-02

Pos logit contributions
O+0.008
M+0.007
 Jane+0.007
 aback+0.007
 scoop+0.007
Neg logit contributions
 sizes-0.017
ances-0.017
 cultures-0.016
girls-0.016
 nations-0.016
Token the
Token ID262
Feature activation-9.25e-03

Pos logit contributions
 planets+0.021
 sizes+0.020
 cultures+0.019
 countries+0.019
 nations+0.019
Neg logit contributions
ray-0.012
O-0.012
faced-0.012
eh-0.012
 aback-0.012
Token end
Token ID886
Feature activation-2.83e-02

Pos logit contributions
 sizes+0.014
 cultures+0.013
 planets+0.013
ances+0.013
 nations+0.013
Neg logit contributions
O-0.014
 aback-0.014
 Jane-0.013
uck-0.013
inn-0.013
Token,
Token ID11
Feature activation-2.12e-02

Pos logit contributions
 sizes+0.048
ances+0.047
 cultures+0.047
arians+0.045
girls+0.045
Neg logit contributions
 Jane-0.040
O-0.039
M-0.037
 another-0.037
 my-0.036
Token L
Token ID406
Feature activation+1.39e-02

Pos logit contributions
 sizes+0.042
ances+0.041
 cultures+0.041
 nations+0.040
 planets+0.039
Neg logit contributions
O-0.022
 Jane-0.021
 aback-0.021
M-0.019
uck-0.019
Tokenizz
Token ID6457
Feature activation+1.48e-01

Pos logit contributions
 aback+0.007
O+0.006
 Jane+0.005
 Half+0.005
 another+0.005
Neg logit contributions
ances-0.035
 sizes-0.034
bles-0.034
 cultures-0.033
 nations-0.033
Tokenie
Token ID494
Feature activation+2.96e-02

Pos logit contributions
O+0.174
 aback+0.166
 Jane+0.159
Aid+0.152
 Half+0.150
Neg logit contributions
 sizes-0.273
ances-0.272
 cultures-0.265
 nations-0.259
 planets-0.258
Token was
Token ID373
Feature activation-6.81e-03

Pos logit contributions
O+0.034
 aback+0.031
 Jane+0.031
 scoop+0.030
M+0.030
Neg logit contributions
 sizes-0.056
ances-0.055
 cultures-0.054
 nations-0.052
 planets-0.052
Token really
Token ID1107
Feature activation+6.17e-02

Pos logit contributions
 sizes+0.009
ances+0.009
 planets+0.009
 cultures+0.009
aults+0.009
Neg logit contributions
O-0.011
 aback-0.010
ray-0.010
Aid-0.010
Da-0.010
Token pleased
Token ID10607
Feature activation-1.69e-02

Pos logit contributions
O+0.089
 aback+0.085
 Jane+0.084
M+0.081
 scoop+0.081
Neg logit contributions
 sizes-0.098
ances-0.096
 cultures-0.095
 nations-0.091
 planets-0.090
Token with
Token ID351
Feature activation+3.85e-02

Pos logit contributions
ances+0.038
 sizes+0.038
 cultures+0.037
girls+0.037
 nations+0.037
Neg logit contributions
 Jane-0.014
 another-0.013
uck-0.013
O-0.012
M-0.012
Token in
Token ID287
Feature activation+3.29e-02

Pos logit contributions
O+0.047
uck+0.044
 scoop+0.044
eh+0.043
 Jane+0.043
Neg logit contributions
 sizes-0.074
 cultures-0.073
ances-0.073
arians-0.070
 nations-0.069
Token a
Token ID257
Feature activation+1.70e-02

Pos logit contributions
ray+0.040
inn+0.040
uck+0.039
 aback+0.038
Aid+0.037
Neg logit contributions
 planets-0.060
 sizes-0.058
 nations-0.056
 countries-0.055
 cultures-0.055
Token tree
Token ID5509
Feature activation+4.24e-02

Pos logit contributions
 aback+0.028
 Jane+0.027
inn+0.027
uck+0.025
izz+0.025
Neg logit contributions
 planets-0.022
 sizes-0.021
 nations-0.021
aults-0.021
 achieve-0.021
Token and
Token ID290
Feature activation+2.67e-02

Pos logit contributions
 scoop+0.050
O+0.050
 Jane+0.050
uck+0.049
M+0.048
Neg logit contributions
 sizes-0.076
 cultures-0.076
ances-0.075
arians-0.073
girls-0.072
Token L
Token ID406
Feature activation+8.16e-03

Pos logit contributions
O+0.034
 aback+0.034
inn+0.033
uck+0.032
Aid+0.031
Neg logit contributions
 sizes-0.046
 cultures-0.045
ances-0.044
 planets-0.044
 nations-0.043
Tokenizz
Token ID6457
Feature activation+1.47e-01

Pos logit contributions
 aback+0.003
O+0.003
 Jane+0.003
Da+0.002
 another+0.002
Neg logit contributions
ances-0.021
 sizes-0.020
bles-0.020
 cultures-0.020
 nations-0.020
Tokenie
Token ID494
Feature activation+1.81e-02

Pos logit contributions
O+0.129
 Jane+0.121
eh+0.115
 scoop+0.114
M+0.114
Neg logit contributions
 sizes-0.315
 cultures-0.313
ances-0.305
arians-0.299
 nations-0.297
Token wanted
Token ID2227
Feature activation+9.44e-03

Pos logit contributions
O+0.016
 scoop+0.016
 found+0.016
M+0.016
 Jane+0.016
Neg logit contributions
 cultures-0.039
ances-0.038
 sizes-0.038
 nations-0.037
arians-0.037
Token to
Token ID284
Feature activation-1.66e-03

Pos logit contributions
inn+0.014
izz+0.013
O+0.013
plug+0.012
eh+0.012
Neg logit contributions
 cultures-0.014
 sizes-0.014
 planets-0.014
girls-0.014
 ze-0.013
Token get
Token ID651
Feature activation+5.09e-02

Pos logit contributions
 affect+0.002
 achieve+0.002
ances+0.002
 sizes+0.002
 chatter+0.002
Neg logit contributions
inn-0.003
 aback-0.002
O-0.002
 late-0.002
Da-0.002
Token closer
Token ID5699
Feature activation+4.83e-02

Pos logit contributions
Amy+0.068
inn+0.067
Lu+0.064
O+0.063
eh+0.062
Neg logit contributions
 sailors-0.081
 pilots-0.078
girls-0.075
 jets-0.074
 singing-0.074
Token
Token ID198
Feature activation+1.59e-02

Pos logit contributions
O+0.009
 aback+0.008
 Jane+0.008
eh+0.008
 scoop+0.008
Neg logit contributions
 sizes-0.019
ances-0.018
 cultures-0.018
 planets-0.018
 nations-0.017
Token
Token ID198
Feature activation+1.81e-02

Pos logit contributions
O+0.018
 aback+0.018
Aid+0.017
 scoop+0.017
 Jane+0.017
Neg logit contributions
 sizes-0.028
ances-0.028
 cultures-0.028
 planets-0.027
 nations-0.026
TokenFirst
Token ID5962
Feature activation-1.07e-03

Pos logit contributions
O+0.014
M+0.012
 Jane+0.010
inn+0.010
 aback+0.010
Neg logit contributions
ances-0.042
 sizes-0.042
 cultures-0.042
girls-0.041
arians-0.041
Token,
Token ID11
Feature activation-1.71e-02

Pos logit contributions
 sizes+0.001
ances+0.001
 cultures+0.001
 nations+0.001
 planets+0.001
Neg logit contributions
O-0.002
 aback-0.002
 Jane-0.002
Aid-0.002
 scoop-0.002
Token L
Token ID406
Feature activation+1.13e-02

Pos logit contributions
ances+0.037
 cultures+0.037
 sizes+0.037
 nations+0.036
arians+0.036
Neg logit contributions
O-0.016
 Jane-0.015
 another-0.014
M-0.014
 some-0.013
Tokenizz
Token ID6457
Feature activation+1.45e-01

Pos logit contributions
 aback+0.006
O+0.005
 Jane+0.005
Da+0.005
 Half+0.005
Neg logit contributions
ances-0.028
bles-0.027
 sizes-0.026
 ABC-0.026
 cultures-0.026
Tokenie
Token ID494
Feature activation+2.60e-02

Pos logit contributions
O+0.168
 aback+0.162
 Jane+0.156
Aid+0.148
 scoop+0.144
Neg logit contributions
 sizes-0.271
ances-0.271
 cultures-0.262
 nations-0.257
 planets-0.256
Token designed
Token ID3562
Feature activation+3.42e-02

Pos logit contributions
O+0.027
 aback+0.025
 Jane+0.025
Aid+0.023
uck+0.023
Neg logit contributions
 sizes-0.052
ances-0.051
 cultures-0.050
 planets-0.049
 nations-0.049
Token a
Token ID257
Feature activation-4.33e-03

Pos logit contributions
O+0.028
 aback+0.025
Aid+0.023
 Jane+0.023
uck+0.023
Neg logit contributions
 sizes-0.077
 planets-0.074
ances-0.074
 cultures-0.074
 nations-0.072
Token cozy
Token ID37438
Feature activation-3.90e-03

Pos logit contributions
 sizes+0.008
ances+0.008
 cultures+0.008
girls+0.007
 nations+0.007
Neg logit contributions
O-0.005
 Jane-0.005
 aback-0.005
 scoop-0.005
M-0.005
Token house
Token ID2156
Feature activation+1.10e-02

Pos logit contributions
 sizes+0.006
 planets+0.005
 cultures+0.005
ances+0.005
 nations+0.005
Neg logit contributions
O-0.006
 aback-0.006
 Jane-0.006
izz-0.006
uck-0.005
Token
Token ID198
Feature activation+1.38e-02

Pos logit contributions
O+0.013
 aback+0.013
 Jane+0.012
Aid+0.012
 scoop+0.012
Neg logit contributions
 sizes-0.021
ances-0.020
 cultures-0.020
 planets-0.019
 nations-0.019
TokenOne
Token ID3198
Feature activation+4.63e-02

Pos logit contributions
O+0.012
 aback+0.011
 Jane+0.011
uck+0.010
 scoop+0.010
Neg logit contributions
 sizes-0.030
ances-0.029
 cultures-0.029
 nations-0.028
 planets-0.028
Token day
Token ID1110
Feature activation+1.40e-03

Pos logit contributions
 aback+0.100
plug+0.092
O+0.089
ction+0.088
Gu+0.087
Neg logit contributions
 sizes-0.040
 ABC-0.038
ances-0.037
 planets-0.036
 cultures-0.036
Token,
Token ID11
Feature activation-1.31e-02

Pos logit contributions
O+0.002
 Jane+0.002
M+0.002
 aback+0.002
uck+0.002
Neg logit contributions
 sizes-0.002
ances-0.002
 cultures-0.002
girls-0.002
 nations-0.002
Token L
Token ID406
Feature activation+1.07e-02

Pos logit contributions
 sizes+0.027
ances+0.026
 cultures+0.026
 nations+0.026
girls+0.025
Neg logit contributions
O-0.013
 Jane-0.012
 aback-0.012
M-0.012
 scoop-0.011
Tokenizz
Token ID6457
Feature activation+1.45e-01

Pos logit contributions
 aback+0.005
O+0.004
 Jane+0.004
 Half+0.004
Da+0.003
Neg logit contributions
ances-0.027
 sizes-0.026
bles-0.026
 cultures-0.026
 nations-0.025
Tokenie
Token ID494
Feature activation+2.33e-02

Pos logit contributions
O+0.162
 aback+0.154
 Jane+0.149
Aid+0.141
 scoop+0.139
Neg logit contributions
 sizes-0.274
ances-0.273
 cultures-0.267
 nations-0.259
 planets-0.259
Token was
Token ID373
Feature activation-5.27e-03

Pos logit contributions
O+0.022
 aback+0.020
 Jane+0.020
Aid+0.019
M+0.018
Neg logit contributions
 sizes-0.048
ances-0.048
 cultures-0.047
 planets-0.046
 nations-0.046
Token in
Token ID287
Feature activation+4.32e-02

Pos logit contributions
 sizes+0.005
ances+0.005
 cultures+0.005
 planets+0.005
 ABC+0.005
Neg logit contributions
O-0.011
 aback-0.010
 Jane-0.010
ray-0.010
M-0.010
Token the
Token ID262
Feature activation-2.79e-03

Pos logit contributions
O+0.039
 aback+0.035
ray+0.034
uck+0.034
inn+0.033
Neg logit contributions
 sizes-0.089
 cultures-0.089
 planets-0.089
ances-0.087
 countries-0.087
Token park
Token ID3952
Feature activation-3.04e-02

Pos logit contributions
 planets+0.003
 cultures+0.002
 nations+0.002
 ABC+0.002
 countries+0.002
Neg logit contributions
 aback-0.006
inn-0.006
O-0.005
izz-0.005
 Jane-0.005
Token a
Token ID257
Feature activation-8.15e-03

Pos logit contributions
O+0.027
 aback+0.025
Aid+0.024
inn+0.023
ray+0.023
Neg logit contributions
 sizes-0.071
 planets-0.069
 cultures-0.067
ances-0.067
 nations-0.066
Token new
Token ID649
Feature activation+3.87e-03

Pos logit contributions
ances+0.015
 sizes+0.015
 cultures+0.015
girls+0.014
arians+0.014
Neg logit contributions
O-0.009
 scoop-0.009
 Jane-0.009
M-0.008
 another-0.008
Token outfit
Token ID16313
Feature activation+1.83e-02

Pos logit contributions
O+0.006
 aback+0.005
 Jane+0.005
Aid+0.005
M+0.005
Neg logit contributions
 sizes-0.006
ances-0.006
 cultures-0.006
 planets-0.006
 nations-0.006
Token.
Token ID13
Feature activation+6.89e-03

Pos logit contributions
O+0.019
 Jane+0.018
 aback+0.018
uck+0.017
 scoop+0.017
Neg logit contributions
 sizes-0.036
ances-0.035
 cultures-0.035
 nations-0.034
girls-0.034
Token L
Token ID406
Feature activation+9.63e-03

Pos logit contributions
O+0.005
 aback+0.005
 Jane+0.005
 scoop+0.004
M+0.004
Neg logit contributions
 sizes-0.015
ances-0.015
 cultures-0.015
 planets-0.015
 nations-0.015
Tokenizz
Token ID6457
Feature activation+1.44e-01

Pos logit contributions
 aback+0.005
O+0.004
 Jane+0.004
Da+0.004
 another+0.004
Neg logit contributions
ances-0.024
bles-0.023
 sizes-0.023
 cultures-0.022
 ABC-0.022
Tokenie
Token ID494
Feature activation+2.43e-02

Pos logit contributions
O+0.170
 aback+0.162
 Jane+0.155
Aid+0.148
 Half+0.145
Neg logit contributions
 sizes-0.269
ances-0.267
 cultures-0.261
 planets-0.254
 nations-0.254
Token wanted
Token ID2227
Feature activation+8.61e-03

Pos logit contributions
O+0.028
 aback+0.026
 Jane+0.026
M+0.024
uck+0.024
Neg logit contributions
 sizes-0.046
ances-0.045
 cultures-0.045
 nations-0.043
 planets-0.043
Token to
Token ID284
Feature activation+6.82e-04

Pos logit contributions
O+0.008
eh+0.008
 aback+0.007
Aid+0.007
inn+0.007
Neg logit contributions
 sizes-0.018
 cultures-0.018
 planets-0.017
 nations-0.017
 ABC-0.017
Token look
Token ID804
Feature activation+5.67e-02

Pos logit contributions
O+0.001
 aback+0.001
 Jane+0.001
eh+0.001
Da+0.001
Neg logit contributions
 sizes-0.001
ances-0.001
 cultures-0.001
 planets-0.001
 nations-0.001
Token really
Token ID1107
Feature activation+4.84e-02

Pos logit contributions
O+0.086
uck+0.079
eh+0.076
Da+0.075
ction+0.074
Neg logit contributions
 sizes-0.087
 cultures-0.087
 planets-0.086
ances-0.084
 nations-0.084
Token's
Token ID338
Feature activation-9.57e-03

Pos logit contributions
 sizes+0.017
ances+0.017
 cultures+0.017
 nations+0.016
 planets+0.016
Neg logit contributions
 Jane-0.012
O-0.012
 aback-0.012
 another-0.011
 scoop-0.011
Token help
Token ID1037
Feature activation-6.11e-02

Pos logit contributions
 sizes+0.018
ances+0.018
girls+0.017
 cultures+0.017
 nations+0.017
Neg logit contributions
 scoop-0.011
 another-0.011
O-0.011
 aback-0.011
 Jane-0.010
Token.
Token ID13
Feature activation-3.41e-03

Pos logit contributions
 sizes+0.107
 cultures+0.105
ances+0.103
 nations+0.102
girls+0.100
Neg logit contributions
 Jane-0.077
O-0.077
 aback-0.076
 scoop-0.072
 another-0.072
Token<|endoftext|>
Token ID50256
Feature activation-2.85e-02

Pos logit contributions
 sizes+0.006
 cultures+0.006
ances+0.006
 planets+0.006
girls+0.006
Neg logit contributions
O-0.004
 aback-0.004
eh-0.004
 scoop-0.003
Aid-0.003
TokenF
Token ID37
Feature activation+1.28e-02

Pos logit contributions
 sizes+0.061
ances+0.061
 cultures+0.060
 nations+0.058
arians+0.058
Neg logit contributions
O-0.026
 Jane-0.024
 aback-0.023
M-0.022
uck-0.022
Tokeninn
Token ID3732
Feature activation+1.44e-01

Pos logit contributions
O+0.012
 aback+0.012
Gu+0.011
uck+0.011
 another+0.011
Neg logit contributions
ances-0.025
 cultures-0.024
 sizes-0.024
 nations-0.024
 countries-0.023
Token sat
Token ID3332
Feature activation+3.97e-02

Pos logit contributions
O+0.155
 Jane+0.150
 scoop+0.141
 aback+0.141
uck+0.140
Neg logit contributions
 sizes-0.280
ances-0.274
 cultures-0.273
arians-0.263
girls-0.260
Token under
Token ID739
Feature activation+5.32e-02

Pos logit contributions
O+0.052
 aback+0.049
 ingredient+0.047
ction+0.047
ray+0.046
Neg logit contributions
 planets-0.065
bles-0.065
 cultures-0.065
ances-0.063
 sizes-0.062
Token the
Token ID262
Feature activation-2.49e-02

Pos logit contributions
O+0.044
 aback+0.044
ray+0.041
uck+0.040
Aid+0.040
Neg logit contributions
 sizes-0.113
 cultures-0.110
ances-0.109
 planets-0.109
 nations-0.107
Token apple
Token ID17180
Feature activation+4.65e-02

Pos logit contributions
 sizes+0.038
ances+0.037
 cultures+0.037
arians+0.035
girls+0.035
Neg logit contributions
O-0.038
 Jane-0.036
 scoop-0.036
 my-0.035
 another-0.035
Token tree
Token ID5509
Feature activation+4.08e-02

Pos logit contributions
O+0.064
 Jane+0.061
 aback+0.060
uck+0.059
M+0.059
Neg logit contributions
 sizes-0.076
ances-0.076
 cultures-0.075
 nations-0.071
girls-0.070
Token wine
Token ID8237
Feature activation-2.78e-02

Pos logit contributions
 sizes+0.010
ances+0.010
 cultures+0.010
 nations+0.009
 planets+0.009
Neg logit contributions
O-0.006
 aback-0.006
 Jane-0.005
M-0.005
uck-0.005
Token.
Token ID13
Feature activation+8.10e-04

Pos logit contributions
ances+0.048
 cultures+0.048
 sizes+0.047
 nations+0.045
 planets+0.045
Neg logit contributions
 Jane-0.035
O-0.033
 another-0.033
 aback-0.033
 scoop-0.032
Token
Token ID198
Feature activation+8.87e-03

Pos logit contributions
O+0.001
 aback+0.001
eh+0.001
 ingredient+0.001
Aid+0.001
Neg logit contributions
 sizes-0.001
ances-0.001
 cultures-0.001
girls-0.001
 planets-0.001
Token
Token ID198
Feature activation+1.08e-02

Pos logit contributions
O+0.009
 aback+0.009
 Jane+0.009
Aid+0.008
uck+0.008
Neg logit contributions
 sizes-0.017
ances-0.017
 cultures-0.017
girls-0.016
 nations-0.016
TokenL
Token ID43
Feature activation+3.05e-02

Pos logit contributions
O+0.010
 aback+0.010
 Jane+0.009
uck+0.009
Aid+0.009
Neg logit contributions
 sizes-0.023
ances-0.022
 cultures-0.022
 nations-0.021
 planets-0.021
Tokenizz
Token ID6457
Feature activation+1.43e-01

Pos logit contributions
 aback+0.017
O+0.016
 Half+0.015
 Jane+0.014
 scoop+0.014
Neg logit contributions
ances-0.074
 sizes-0.072
 cultures-0.070
bles-0.069
 nations-0.068
Tokenie
Token ID494
Feature activation+2.29e-02

Pos logit contributions
O+0.184
 aback+0.180
 Half+0.170
 Jane+0.167
Aid+0.166
Neg logit contributions
ances-0.246
 sizes-0.242
 cultures-0.233
 nations-0.227
bles-0.225
Token listened
Token ID16399
Feature activation-2.71e-02

Pos logit contributions
O+0.027
 aback+0.026
 Jane+0.026
 scoop+0.024
M+0.024
Neg logit contributions
 sizes-0.042
ances-0.041
 cultures-0.041
 nations-0.039
 planets-0.039
Token carefully
Token ID7773
Feature activation+3.93e-02

Pos logit contributions
ances+0.047
 sizes+0.047
 cultures+0.046
 planets+0.044
bles+0.043
Neg logit contributions
O-0.035
Aid-0.031
 aback-0.030
eh-0.030
 Jane-0.030
Token and
Token ID290
Feature activation+8.98e-03

Pos logit contributions
 aback+0.049
 Jane+0.048
 another+0.048
 scoop+0.047
 some+0.047
Neg logit contributions
 sizes-0.070
ances-0.068
 cultures-0.066
girls-0.065
bles-0.065
Token learned
Token ID4499
Feature activation-7.64e-02

Pos logit contributions
O+0.011
 aback+0.010
Aid+0.010
eh+0.010
inn+0.010
Neg logit contributions
 sizes-0.016
 cultures-0.016
ances-0.015
 planets-0.015
 nations-0.015
Token.
Token ID13
Feature activation+4.04e-03

Pos logit contributions
O+0.010
 aback+0.010
 Jane+0.009
M+0.009
Aid+0.009
Neg logit contributions
 sizes-0.012
ances-0.012
 cultures-0.012
 nations-0.011
 planets-0.011
Token
Token ID220
Feature activation+7.99e-03

Pos logit contributions
eh+0.005
 ingredient+0.005
O+0.005
 aback+0.005
 poisonous+0.005
Neg logit contributions
 sizes-0.006
ances-0.006
 ABC-0.006
bows-0.006
bles-0.006
Token
Token ID198
Feature activation+8.81e-03

Pos logit contributions
O+0.010
 aback+0.009
 scoop+0.009
Aid+0.009
 Jane+0.009
Neg logit contributions
ances-0.014
 sizes-0.014
bles-0.013
 cultures-0.013
girls-0.013
Token
Token ID198
Feature activation+1.08e-02

Pos logit contributions
O+0.010
 aback+0.009
 Jane+0.009
Aid+0.009
 scoop+0.009
Neg logit contributions
 sizes-0.016
ances-0.016
 cultures-0.016
bles-0.015
girls-0.015
TokenL
Token ID43
Feature activation+3.39e-02

Pos logit contributions
 aback+0.013
O+0.013
 Jane+0.013
Aid+0.012
uck+0.012
Neg logit contributions
 sizes-0.019
ances-0.019
 cultures-0.018
 nations-0.018
arians-0.018
Tokenizz
Token ID6457
Feature activation+1.42e-01

Pos logit contributions
 aback+0.022
O+0.019
 Half+0.019
 Jane+0.018
 scoop+0.018
Neg logit contributions
ances-0.081
 sizes-0.076
bles-0.076
 cultures-0.075
 nations-0.074
Tokenie
Token ID494
Feature activation+2.49e-02

Pos logit contributions
O+0.182
 aback+0.177
 Jane+0.164
Aid+0.162
 Half+0.161
Neg logit contributions
ances-0.249
 sizes-0.246
 cultures-0.238
 nations-0.235
 planets-0.233
Token jumped
Token ID11687
Feature activation-1.91e-03

Pos logit contributions
O+0.029
 aback+0.026
 Jane+0.026
Aid+0.025
inn+0.024
Neg logit contributions
 sizes-0.046
ances-0.045
 cultures-0.045
 planets-0.044
girls-0.044
Token into
Token ID656
Feature activation+3.58e-02

Pos logit contributions
 sizes+0.004
ances+0.003
bles+0.003
 cultures+0.003
 planets+0.003
Neg logit contributions
O-0.002
 Jane-0.002
 aback-0.002
Aid-0.002
ray-0.002
Token the
Token ID262
Feature activation-2.09e-02

Pos logit contributions
ray+0.041
uck+0.040
eh+0.040
O+0.039
Aid+0.039
Neg logit contributions
 sizes-0.065
 planets-0.063
ances-0.061
 dancers-0.061
 cultures-0.061
Token dish
Token ID9433
Feature activation+5.17e-02

Pos logit contributions
 sizes+0.035
 planets+0.032
 ABC+0.031
ances+0.031
 cultures+0.031
Neg logit contributions
O-0.029
Da-0.028
 Jane-0.027
 aback-0.026
Amy-0.025
Token just
Token ID655
Feature activation-1.78e-03

Pos logit contributions
 sizes+0.070
ances+0.069
 cultures+0.069
 nations+0.067
 planets+0.066
Neg logit contributions
O-0.022
 aback-0.019
 Jane-0.019
uck-0.017
Aid-0.017
Token begun
Token ID9258
Feature activation-6.41e-02

Pos logit contributions
 sizes+0.003
ances+0.003
 cultures+0.003
girls+0.003
arians+0.003
Neg logit contributions
O-0.002
 Jane-0.002
 aback-0.002
 another-0.002
 scoop-0.002
Token!
Token ID0
Feature activation-6.94e-03

Pos logit contributions
ances+0.134
 sizes+0.133
 cultures+0.130
girls+0.130
arians+0.129
Neg logit contributions
 Jane-0.057
 aback-0.055
O-0.055
 another-0.052
 my-0.050
Token<|endoftext|>
Token ID50256
Feature activation-3.40e-02

Pos logit contributions
 sizes+0.013
 cultures+0.013
ances+0.012
 planets+0.012
 nations+0.012
Neg logit contributions
O-0.008
 aback-0.007
 scoop-0.007
Aid-0.007
eh-0.007
TokenF
Token ID37
Feature activation+7.04e-03

Pos logit contributions
 sizes+0.074
ances+0.074
 cultures+0.072
 nations+0.070
arians+0.069
Neg logit contributions
O-0.030
 Jane-0.028
 aback-0.027
M-0.026
uck-0.025
Tokeninn
Token ID3732
Feature activation+1.42e-01

Pos logit contributions
O+0.006
 aback+0.006
 Jane+0.006
uck+0.006
Aid+0.006
Neg logit contributions
ances-0.014
 sizes-0.014
 cultures-0.014
 nations-0.014
girls-0.013
Token was
Token ID373
Feature activation+1.63e-02

Pos logit contributions
O+0.132
 Jane+0.131
uck+0.121
 scoop+0.119
M+0.117
Neg logit contributions
 sizes-0.296
 cultures-0.288
ances-0.287
arians-0.281
girls-0.277
Token fishing
Token ID12478
Feature activation-2.32e-03

Pos logit contributions
O+0.019
 aback+0.019
ray+0.018
Aid+0.018
 ingredient+0.017
Neg logit contributions
 sizes-0.028
 planets-0.027
ances-0.027
 nations-0.026
bles-0.026
Token with
Token ID351
Feature activation+7.11e-02

Pos logit contributions
 sizes+0.004
ances+0.004
 cultures+0.004
arians+0.004
girls+0.004
Neg logit contributions
O-0.003
 Jane-0.002
 aback-0.002
uck-0.002
M-0.002
Token his
Token ID465
Feature activation+1.68e-02

Pos logit contributions
ray+0.066
 aback+0.063
O+0.063
Aid+0.059
uck+0.056
Neg logit contributions
 sizes-0.151
 planets-0.144
 nations-0.140
 cultures-0.139
ances-0.138
Token dad
Token ID9955
Feature activation+4.15e-03

Pos logit contributions
O+0.021
 Jane+0.021
 another+0.021
 scoop+0.020
 my+0.019
Neg logit contributions
ances-0.030
 cultures-0.029
 sizes-0.029
bows-0.028
arians-0.028
Token into
Token ID656
Feature activation+4.18e-02

Pos logit contributions
O+0.017
 aback+0.015
Aid+0.015
ray+0.015
uck+0.015
Neg logit contributions
 sizes-0.029
ances-0.029
 cultures-0.028
girls-0.028
 planets-0.027
Token the
Token ID262
Feature activation-2.35e-02

Pos logit contributions
O+0.044
uck+0.042
ray+0.041
Aid+0.041
eh+0.039
Neg logit contributions
 sizes-0.081
 planets-0.079
 cultures-0.079
 countries-0.078
 nations-0.077
Token bush
Token ID24484
Feature activation+1.50e-02

Pos logit contributions
 sizes+0.029
 planets+0.028
 cultures+0.028
 nations+0.027
 countries+0.026
Neg logit contributions
izz-0.038
inn-0.038
 aback-0.038
O-0.038
 ingredient-0.037
Token.
Token ID13
Feature activation+1.17e-03

Pos logit contributions
O+0.021
 Jane+0.021
 aback+0.020
 scoop+0.020
 another+0.020
Neg logit contributions
 sizes-0.024
ances-0.024
 cultures-0.023
 nations-0.022
 planets-0.022
Token L
Token ID406
Feature activation+3.46e-03

Pos logit contributions
O+0.001
eh+0.001
 ingredient+0.001
Aid+0.001
uck+0.001
Neg logit contributions
 sizes-0.002
ances-0.002
 cultures-0.002
girls-0.002
arians-0.002
Tokenizz
Token ID6457
Feature activation+1.41e-01

Pos logit contributions
 aback+0.002
O+0.002
 another+0.001
 Half+0.001
 scoop+0.001
Neg logit contributions
ances-0.008
bles-0.008
 sizes-0.008
 cultures-0.008
 nations-0.008
Tokenie
Token ID494
Feature activation+2.09e-02

Pos logit contributions
O+0.182
 aback+0.182
 Half+0.174
Gu+0.168
 Jane+0.168
Neg logit contributions
ances-0.237
 sizes-0.236
 cultures-0.224
bles-0.222
 nations-0.220
Token was
Token ID373
Feature activation-9.94e-03

Pos logit contributions
O+0.027
 aback+0.025
 Jane+0.025
Aid+0.025
inn+0.024
Neg logit contributions
 sizes-0.035
 cultures-0.034
ances-0.034
girls-0.033
 planets-0.033
Token about
Token ID546
Feature activation+1.43e-02

Pos logit contributions
 sizes+0.015
ances+0.015
 cultures+0.014
 planets+0.014
girls+0.014
Neg logit contributions
O-0.015
ray-0.013
 aback-0.013
Aid-0.013
 Jane-0.013
Token to
Token ID284
Feature activation+1.23e-02

Pos logit contributions
O+0.007
 Jane+0.006
 aback+0.006
 another+0.005
 scoop+0.005
Neg logit contributions
ances-0.037
 sizes-0.037
 cultures-0.035
arians-0.035
girls-0.035
Token turn
Token ID1210
Feature activation+3.44e-02

Pos logit contributions
O+0.014
 Jane+0.014
 aback+0.014
 scoop+0.013
M+0.013
Neg logit contributions
 sizes-0.023
ances-0.023
 cultures-0.022
 nations-0.022
 planets-0.022
Token again
Token ID757
Feature activation+4.48e-02

Pos logit contributions
ances+0.041
 sizes+0.041
 cultures+0.040
arians+0.039
girls+0.039
Neg logit contributions
 Jane-0.027
 aback-0.027
O-0.026
 another-0.026
 my-0.025
Token.
Token ID13
Feature activation-6.41e-04

Pos logit contributions
 another+0.054
 Jane+0.053
 some+0.051
 aback+0.050
 late+0.050
Neg logit contributions
 sizes-0.080
ances-0.080
 cultures-0.078
arians-0.077
 nations-0.077
Token But
Token ID887
Feature activation-7.26e-03

Pos logit contributions
 sizes+0.001
 cultures+0.001
ances+0.001
 planets+0.001
girls+0.001
Neg logit contributions
O-0.001
 scoop-0.001
eh-0.001
 aback-0.001
Aid-0.001
Token unb
Token ID22619
Feature activation+3.38e-02

Pos logit contributions
 sizes+0.012
 cultures+0.012
ances+0.012
 planets+0.011
 nations+0.011
Neg logit contributions
O-0.010
 aback-0.009
eh-0.009
Aid-0.009
 Jane-0.009
Tokenek
Token ID988
Feature activation+3.73e-02

Pos logit contributions
 Jane+0.035
O+0.035
 aback+0.033
 ingredient+0.033
 scoop+0.032
Neg logit contributions
ances-0.064
bles-0.062
 cultures-0.062
aults-0.061
arians-0.061
Tokennown
Token ID3408
Feature activation+1.41e-01

Pos logit contributions
O+0.044
 Jane+0.041
 aback+0.041
Aid+0.039
 scoop+0.038
Neg logit contributions
 sizes-0.068
ances-0.068
 cultures-0.066
 planets-0.064
 nations-0.064
Token to
Token ID284
Feature activation+5.90e-02

Pos logit contributions
O+0.141
 Jane+0.138
 aback+0.127
 ingredient+0.124
 scoop+0.121
Neg logit contributions
 cultures-0.267
ances-0.264
 sizes-0.264
 planets-0.259
 nations-0.257
Token him
Token ID683
Feature activation+3.67e-03

Pos logit contributions
O+0.048
 aback+0.044
 Jane+0.044
M+0.040
 scoop+0.040
Neg logit contributions
 sizes-0.131
ances-0.129
 cultures-0.128
 nations-0.124
 planets-0.124
Token,
Token ID11
Feature activation-1.13e-02

Pos logit contributions
 another+0.004
 Jane+0.004
 aback+0.004
 some+0.004
O+0.004
Neg logit contributions
 sizes-0.007
ances-0.007
 cultures-0.007
bows-0.007
 nations-0.007
Token the
Token ID262
Feature activation-3.36e-02

Pos logit contributions
 cultures+0.019
 sizes+0.019
ances+0.019
 planets+0.018
 countries+0.018
Neg logit contributions
O-0.014
eh-0.013
uck-0.013
Aid-0.013
 aback-0.013
Token king
Token ID5822
Feature activation-1.02e-02

Pos logit contributions
ances+0.053
 sizes+0.052
girls+0.051
 cultures+0.050
arians+0.049
Neg logit contributions
 another-0.049
 Jane-0.048
 scoop-0.047
 half-0.046
 found-0.046
Token From
Token ID3574
Feature activation-3.40e-02

Pos logit contributions
O+0.003
 aback+0.003
 Jane+0.003
eh+0.003
 scoop+0.003
Neg logit contributions
 sizes-0.007
ances-0.007
 cultures-0.007
 planets-0.007
 nations-0.007
Token then
Token ID788
Feature activation+2.54e-02

Pos logit contributions
 sizes+0.060
ances+0.060
 cultures+0.059
 planets+0.057
 nations+0.057
Neg logit contributions
O-0.042
 aback-0.040
 Jane-0.039
M-0.037
 scoop-0.037
Token on
Token ID319
Feature activation+3.38e-02

Pos logit contributions
 Jane+0.036
O+0.036
 aback+0.035
eh+0.034
 ingredient+0.034
Neg logit contributions
ances-0.037
 sizes-0.037
 cultures-0.036
 achieve-0.035
 nations-0.035
Token,
Token ID11
Feature activation-1.82e-02

Pos logit contributions
O+0.052
 Jane+0.050
 aback+0.048
M+0.047
uck+0.047
Neg logit contributions
 sizes-0.052
ances-0.051
 cultures-0.051
 nations-0.048
girls-0.048
Token L
Token ID406
Feature activation+7.55e-03

Pos logit contributions
 sizes+0.037
ances+0.037
 cultures+0.037
 nations+0.036
girls+0.036
Neg logit contributions
O-0.018
 Jane-0.017
 another-0.016
M-0.016
 aback-0.016
Tokenizz
Token ID6457
Feature activation+1.41e-01

Pos logit contributions
 aback+0.004
O+0.003
 Jane+0.003
 Half+0.003
 scoop+0.003
Neg logit contributions
ances-0.019
 sizes-0.018
 cultures-0.018
bles-0.018
 nations-0.018
Tokenie
Token ID494
Feature activation+2.37e-02

Pos logit contributions
O+0.160
 aback+0.151
 Jane+0.146
Aid+0.139
 scoop+0.138
Neg logit contributions
 sizes-0.268
ances-0.266
 cultures-0.261
 nations-0.252
 planets-0.251
Token knew
Token ID2993
Feature activation+2.02e-02

Pos logit contributions
O+0.028
 Jane+0.027
 aback+0.026
M+0.026
 scoop+0.026
Neg logit contributions
ances-0.043
 sizes-0.043
 cultures-0.042
arians-0.040
 nations-0.040
Token to
Token ID284
Feature activation-1.71e-03

Pos logit contributions
O+0.019
 aback+0.017
 Jane+0.017
M+0.016
uck+0.016
Neg logit contributions
 sizes-0.042
ances-0.042
 cultures-0.041
 nations-0.040
 planets-0.040
Token always
Token ID1464
Feature activation-3.42e-02

Pos logit contributions
 sizes+0.003
ances+0.003
 cultures+0.003
 nations+0.003
 planets+0.003
Neg logit contributions
O-0.002
 aback-0.002
 Jane-0.002
uck-0.002
M-0.002
Token be
Token ID307
Feature activation-9.99e-03

Pos logit contributions
 sizes+0.058
 cultures+0.057
ances+0.057
 nations+0.054
 planets+0.053
Neg logit contributions
O-0.047
 aback-0.043
 Jane-0.042
eh-0.041
M-0.041
Token
Token ID198
Feature activation+1.10e-02

Pos logit contributions
 aback+0.010
O+0.010
 Jane+0.009
 ingredient+0.009
Aid+0.009
Neg logit contributions
 sizes-0.015
ances-0.014
 cultures-0.014
girls-0.014
bles-0.014
TokenJack
Token ID14295
Feature activation+3.82e-02

Pos logit contributions
 aback+0.011
 Jane+0.011
 ingredient+0.011
uck+0.010
ray+0.010
Neg logit contributions
 sizes-0.021
ances-0.020
aults-0.020
bles-0.020
 cultures-0.020
Token had
Token ID550
Feature activation+3.11e-02

Pos logit contributions
O+0.043
 Jane+0.041
 aback+0.041
 scoop+0.040
M+0.040
Neg logit contributions
ances-0.073
 sizes-0.073
 cultures-0.072
 nations-0.069
 planets-0.068
Token brought
Token ID3181
Feature activation+7.62e-02

Pos logit contributions
O+0.032
inn+0.031
Aid+0.031
uck+0.031
izz+0.030
Neg logit contributions
 sizes-0.061
 cultures-0.057
 planets-0.057
 nations-0.056
 countries-0.056
Token L
Token ID406
Feature activation+2.40e-03

Pos logit contributions
O+0.068
eh+0.065
ray+0.060
Gu+0.059
Aid+0.059
Neg logit contributions
 sizes-0.159
 cultures-0.151
ances-0.150
 planets-0.149
girls-0.147
Tokenizz
Token ID6457
Feature activation+1.41e-01

Pos logit contributions
 aback+0.002
 another+0.002
 Half+0.002
 my+0.002
 some+0.002
Neg logit contributions
ances-0.005
bles-0.005
bows-0.005
 sizes-0.005
 ABC-0.005
Tokenie
Token ID494
Feature activation+2.13e-02

Pos logit contributions
O+0.186
 aback+0.165
 Jane+0.165
 Half+0.165
Aid+0.162
Neg logit contributions
 sizes-0.243
ances-0.238
 cultures-0.233
bles-0.227
 nations-0.226
Token a
Token ID257
Feature activation+3.42e-03

Pos logit contributions
O+0.020
 aback+0.017
 Jane+0.017
inn+0.016
M+0.016
Neg logit contributions
 sizes-0.045
 cultures-0.044
ances-0.044
 nations-0.043
 planets-0.043
Token present
Token ID1944
Feature activation+3.85e-02

Pos logit contributions
O+0.004
 aback+0.004
 Jane+0.004
uck+0.004
M+0.004
Neg logit contributions
 sizes-0.006
 cultures-0.006
ances-0.006
 nations-0.006
 planets-0.006
Token.
Token ID13
Feature activation+4.66e-03

Pos logit contributions
 Jane+0.033
O+0.033
 aback+0.031
 scoop+0.030
 another+0.030
Neg logit contributions
 sizes-0.083
 cultures-0.082
ances-0.082
 nations-0.079
 planets-0.078
Token It
Token ID632
Feature activation+4.50e-02

Pos logit contributions
O+0.005
 aback+0.005
eh+0.005
uck+0.005
 ingredient+0.005
Neg logit contributions
 sizes-0.009
ances-0.008
 cultures-0.008
 planets-0.008
 ABC-0.008
Token L
Token ID406
Feature activation+9.59e-03

Pos logit contributions
ances+0.012
 sizes+0.011
bows+0.011
 ABC+0.011
 cultures+0.011
Neg logit contributions
 aback-0.006
Aid-0.006
O-0.006
plug-0.005
vice-0.005
Tokenizz
Token ID6457
Feature activation+1.33e-01

Pos logit contributions
O+0.010
 aback+0.010
 Jane+0.009
 scoop+0.009
M+0.009
Neg logit contributions
 sizes-0.019
ances-0.019
 cultures-0.018
 nations-0.018
 planets-0.018
Tokenie
Token ID494
Feature activation+1.54e-02

Pos logit contributions
O+0.102
 Jane+0.097
M+0.094
eh+0.093
uck+0.090
Neg logit contributions
 sizes-0.299
 cultures-0.296
ances-0.286
arians-0.284
girls-0.282
Token.
Token ID13
Feature activation+1.30e-02

Pos logit contributions
O+0.018
M+0.018
 another+0.018
 Jane+0.017
 scoop+0.017
Neg logit contributions
 sizes-0.029
 cultures-0.029
ances-0.028
arians-0.028
 nations-0.027
Token L
Token ID406
Feature activation+7.07e-03

Pos logit contributions
O+0.011
 Jane+0.010
 aback+0.009
M+0.009
uck+0.009
Neg logit contributions
 sizes-0.029
ances-0.029
 cultures-0.029
 nations-0.028
arians-0.028
Tokenizz
Token ID6457
Feature activation+1.41e-01

Pos logit contributions
 aback+0.003
O+0.003
 Jane+0.002
 Half+0.002
 another+0.002
Neg logit contributions
ances-0.018
 sizes-0.018
bles-0.017
 cultures-0.017
 nations-0.017
Tokenie
Token ID494
Feature activation+1.69e-02

Pos logit contributions
O+0.158
 aback+0.150
 Jane+0.145
Aid+0.137
 scoop+0.137
Neg logit contributions
 sizes-0.269
ances-0.266
 cultures-0.261
 nations-0.253
 planets-0.252
Token was
Token ID373
Feature activation-1.17e-03

Pos logit contributions
O+0.025
 aback+0.025
 Jane+0.024
uck+0.023
Aid+0.023
Neg logit contributions
 sizes-0.026
ances-0.025
 cultures-0.025
 planets-0.024
 nations-0.024
Token only
Token ID691
Feature activation+5.09e-02

Pos logit contributions
 sizes+0.002
ances+0.002
 cultures+0.002
 planets+0.002
 nations+0.002
Neg logit contributions
O-0.001
 aback-0.001
 Jane-0.001
ray-0.001
Aid-0.001
Token three
Token ID1115
Feature activation+7.31e-02

Pos logit contributions
O+0.046
 aback+0.042
 Jane+0.041
M+0.038
uck+0.038
Neg logit contributions
 sizes-0.108
ances-0.107
 cultures-0.106
 nations-0.102
 planets-0.102
Token years
Token ID812
Feature activation-6.76e-02

Pos logit contributions
 Jane+0.120
O+0.118
M+0.117
 scoop+0.115
 another+0.115
Neg logit contributions
hips-0.095
 affect-0.093
 cultures-0.093
 achieve-0.093
girls-0.092
Token exclaimed
Token ID33552
Feature activation+2.26e-02

Pos logit contributions
 sizes+0.022
ances+0.022
 cultures+0.022
 nations+0.021
 planets+0.021
Neg logit contributions
O-0.012
 aback-0.011
 Jane-0.011
M-0.010
 scoop-0.010
Token.
Token ID13
Feature activation-5.51e-03

Pos logit contributions
O+0.032
 Jane+0.032
 aback+0.031
 scoop+0.030
M+0.030
Neg logit contributions
 sizes-0.036
ances-0.036
 cultures-0.035
 nations-0.034
 planets-0.034
Token
Token ID198
Feature activation+1.77e-03

Pos logit contributions
 sizes+0.010
ances+0.010
 cultures+0.010
girls+0.009
 planets+0.009
Neg logit contributions
O-0.006
 aback-0.006
 Jane-0.006
eh-0.006
Aid-0.006
Token
Token ID198
Feature activation+3.96e-03

Pos logit contributions
 aback+0.002
O+0.002
 Jane+0.002
 ingredient+0.002
Aid+0.002
Neg logit contributions
 sizes-0.003
ances-0.003
 cultures-0.003
bles-0.003
girls-0.003
TokenL
Token ID43
Feature activation+2.70e-02

Pos logit contributions
O+0.003
 aback+0.003
 Jane+0.003
uck+0.003
Aid+0.003
Neg logit contributions
 sizes-0.009
ances-0.009
 cultures-0.008
 nations-0.008
 planets-0.008
Tokenizz
Token ID6457
Feature activation+1.41e-01

Pos logit contributions
 aback+0.014
 Half+0.012
 Jane+0.012
O+0.012
 Amy+0.011
Neg logit contributions
ances-0.066
 sizes-0.064
bles-0.063
 cultures-0.061
 nations-0.061
Tokenie
Token ID494
Feature activation+1.94e-02

Pos logit contributions
O+0.169
 aback+0.165
 Jane+0.155
 Half+0.153
Aid+0.151
Neg logit contributions
 sizes-0.252
ances-0.251
 cultures-0.244
 nations-0.239
 planets-0.237
Token smiled
Token ID13541
Feature activation-4.25e-02

Pos logit contributions
O+0.028
 Jane+0.027
 aback+0.027
 scoop+0.026
M+0.026
Neg logit contributions
 sizes-0.031
ances-0.030
 cultures-0.030
 nations-0.028
 planets-0.028
Token back
Token ID736
Feature activation+7.30e-02

Pos logit contributions
 sizes+0.075
ances+0.074
 cultures+0.073
arians+0.071
girls+0.070
Neg logit contributions
O-0.055
 aback-0.055
 another-0.054
 Jane-0.053
M-0.053
Token.
Token ID13
Feature activation+7.28e-04

Pos logit contributions
 another+0.082
uck+0.078
 Jane+0.077
 my+0.077
 scoop+0.076
Neg logit contributions
 sizes-0.151
ances-0.141
 cultures-0.140
 affect-0.135
bows-0.134
Token She
Token ID1375
Feature activation-8.13e-03

Pos logit contributions
O+0.001
 Jane+0.001
 aback+0.001
M+0.001
uck+0.001
Neg logit contributions
 sizes-0.001
ances-0.001
 cultures-0.001
 nations-0.001
 planets-0.001
Token bear
Token ID6842
Feature activation-5.00e-02

Pos logit contributions
O+0.038
 Jane+0.036
 another+0.035
 aback+0.035
 my+0.035
Neg logit contributions
ances-0.074
 cultures-0.073
 sizes-0.072
 nations-0.070
 achieve-0.069
Token.
Token ID13
Feature activation+2.40e-03

Pos logit contributions
 cultures+0.094
ances+0.093
 nations+0.088
 sizes+0.087
 countries+0.086
Neg logit contributions
 scoop-0.059
 another-0.059
 Jane-0.058
 found-0.058
O-0.057
Token
Token ID198
Feature activation+1.39e-02

Pos logit contributions
O+0.002
 aback+0.002
 Jane+0.002
Aid+0.002
 ingredient+0.002
Neg logit contributions
 sizes-0.005
ances-0.005
 cultures-0.005
 planets-0.004
girls-0.004
Token
Token ID198
Feature activation+1.50e-02

Pos logit contributions
 aback+0.018
Aid+0.017
 ingredient+0.017
 scoop+0.016
 Jane+0.016
Neg logit contributions
 sizes-0.022
ances-0.021
 cultures-0.021
bles-0.021
 achieve-0.021
TokenL
Token ID43
Feature activation+3.47e-02

Pos logit contributions
 aback+0.017
 Jane+0.016
 ingredient+0.015
Aid+0.014
ray+0.014
Neg logit contributions
 sizes-0.028
ances-0.028
 cultures-0.027
bles-0.026
 achieve-0.026
Tokenizz
Token ID6457
Feature activation+1.40e-01

Pos logit contributions
O+0.028
 aback+0.027
 Jane+0.026
 scoop+0.025
Aid+0.024
Neg logit contributions
ances-0.075
 sizes-0.074
 cultures-0.073
 nations-0.071
bles-0.071
Tokenie
Token ID494
Feature activation+2.34e-02

Pos logit contributions
O+0.128
 Jane+0.119
 aback+0.113
M+0.113
 scoop+0.111
Neg logit contributions
 sizes-0.295
 cultures-0.291
ances-0.289
arians-0.278
 nations-0.278
Token was
Token ID373
Feature activation+2.91e-04

Pos logit contributions
O+0.024
 Jane+0.023
 scoop+0.023
 found+0.023
M+0.022
Neg logit contributions
ances-0.047
 cultures-0.047
 sizes-0.047
 nations-0.044
arians-0.044
Token worried
Token ID7960
Feature activation+9.81e-03

Pos logit contributions
O+0.000
 aback+0.000
 Jane+0.000
M+0.000
uck+0.000
Neg logit contributions
 sizes-0.000
ances-0.000
 cultures-0.000
 nations-0.000
 planets-0.000
Token that
Token ID326
Feature activation+3.51e-02

Pos logit contributions
 Jane+0.012
O+0.012
 another+0.011
 aback+0.011
 some+0.011
Neg logit contributions
 sizes-0.018
 cultures-0.018
ances-0.018
girls-0.017
arians-0.017
Token the
Token ID262
Feature activation-4.77e-02

Pos logit contributions
O+0.032
 aback+0.030
 Jane+0.027
uck+0.027
Aid+0.027
Neg logit contributions
 sizes-0.075
ances-0.073
 cultures-0.073
 planets-0.071
 nations-0.070
Token L
Token ID406
Feature activation+1.10e-02

Pos logit contributions
ances+0.012
bows+0.012
 sizes+0.012
 ABC+0.011
 dancers+0.011
Neg logit contributions
 aback-0.008
Aid-0.007
vice-0.007
plug-0.006
O-0.006
Tokenizz
Token ID6457
Feature activation+1.31e-01

Pos logit contributions
O+0.013
 aback+0.013
 Jane+0.012
 scoop+0.011
Aid+0.011
Neg logit contributions
ances-0.020
 sizes-0.019
 cultures-0.019
 nations-0.019
bles-0.019
Tokenie
Token ID494
Feature activation+1.59e-02

Pos logit contributions
O+0.102
 Jane+0.098
M+0.094
eh+0.093
uck+0.091
Neg logit contributions
 sizes-0.293
 cultures-0.289
ances-0.281
arians-0.279
girls-0.277
Token.
Token ID13
Feature activation+1.51e-02

Pos logit contributions
M+0.017
O+0.017
 another+0.017
 scoop+0.017
 Jane+0.017
Neg logit contributions
 sizes-0.031
 cultures-0.031
ances-0.030
arians-0.030
girls-0.030
Token L
Token ID406
Feature activation+8.45e-03

Pos logit contributions
O+0.011
 Jane+0.010
 aback+0.010
M+0.009
uck+0.009
Neg logit contributions
ances-0.035
 cultures-0.034
 sizes-0.034
arians-0.034
girls-0.034
Tokenizz
Token ID6457
Feature activation+1.40e-01

Pos logit contributions
 aback+0.004
O+0.003
 Jane+0.003
 Half+0.003
Da+0.003
Neg logit contributions
ances-0.021
bles-0.020
 sizes-0.020
 cultures-0.020
 nations-0.020
Tokenie
Token ID494
Feature activation+1.70e-02

Pos logit contributions
O+0.162
 aback+0.156
 Jane+0.149
Aid+0.142
 scoop+0.141
Neg logit contributions
 sizes-0.262
ances-0.260
 cultures-0.254
 planets-0.247
 nations-0.246
Token was
Token ID373
Feature activation-1.61e-03

Pos logit contributions
O+0.025
 aback+0.024
 Jane+0.024
uck+0.023
Aid+0.023
Neg logit contributions
 sizes-0.026
ances-0.026
 cultures-0.025
 planets-0.024
 nations-0.024
Token only
Token ID691
Feature activation+4.83e-02

Pos logit contributions
 sizes+0.003
ances+0.003
 cultures+0.003
 planets+0.003
 nations+0.003
Neg logit contributions
O-0.002
 aback-0.002
 Jane-0.002
M-0.002
uck-0.002
Token three
Token ID1115
Feature activation+6.91e-02

Pos logit contributions
O+0.040
 Jane+0.037
 aback+0.037
 scoop+0.034
uck+0.034
Neg logit contributions
 sizes-0.106
ances-0.105
 cultures-0.104
 nations-0.101
 planets-0.100
Token years
Token ID812
Feature activation-7.04e-02

Pos logit contributions
 Jane+0.113
O+0.107
M+0.107
 scoop+0.107
sm+0.107
Neg logit contributions
hips-0.092
 affect-0.090
 achieve-0.090
 cultures-0.089
girls-0.088
Tokenie
Token ID494
Feature activation+5.22e-03

Pos logit contributions
O+0.150
 aback+0.142
 Jane+0.135
Aid+0.131
 Half+0.131
Neg logit contributions
ances-0.219
 sizes-0.217
 cultures-0.206
bles-0.204
 planets-0.201
Token.
Token ID13
Feature activation+3.91e-03

Pos logit contributions
O+0.006
 scoop+0.006
 aback+0.006
 Jane+0.006
M+0.005
Neg logit contributions
 cultures-0.010
ances-0.010
 sizes-0.010
 nations-0.010
 countries-0.009
Token Thankfully
Token ID31251
Feature activation+3.55e-02

Pos logit contributions
eh+0.005
O+0.004
Aid+0.004
 ingredient+0.004
 aback+0.004
Neg logit contributions
 sizes-0.007
ances-0.007
 cultures-0.006
bows-0.006
arians-0.006
Token,
Token ID11
Feature activation-1.53e-02

Pos logit contributions
Aid+0.076
Gu+0.075
eh+0.075
 ingredient+0.075
O+0.074
Neg logit contributions
 sizes-0.022
ances-0.022
bles-0.021
aults-0.021
 cultures-0.020
Token L
Token ID406
Feature activation+7.51e-03

Pos logit contributions
 sizes+0.027
ances+0.027
 cultures+0.026
 planets+0.025
bles+0.025
Neg logit contributions
 aback-0.018
eh-0.017
O-0.017
ray-0.017
Aid-0.016
Tokenizz
Token ID6457
Feature activation+1.40e-01

Pos logit contributions
 aback+0.004
O+0.003
 Half+0.003
 Jane+0.003
 another+0.003
Neg logit contributions
ances-0.018
bles-0.018
 sizes-0.017
 cultures-0.017
 nations-0.017
Tokenie
Token ID494
Feature activation+2.08e-02

Pos logit contributions
O+0.180
 aback+0.178
 Half+0.169
 Jane+0.168
Aid+0.164
Neg logit contributions
ances-0.238
 sizes-0.234
bles-0.224
 cultures-0.221
 nations-0.219
Token was
Token ID373
Feature activation-4.07e-03

Pos logit contributions
O+0.023
 aback+0.021
 Jane+0.021
 scoop+0.021
uck+0.020
Neg logit contributions
 sizes-0.042
ances-0.041
 cultures-0.041
 nations-0.039
 planets-0.038
Token alright
Token ID23036
Feature activation-1.65e-02

Pos logit contributions
 sizes+0.006
aults+0.006
bles+0.006
ances+0.006
 achieve+0.006
Neg logit contributions
O-0.006
ray-0.005
 aback-0.005
uck-0.005
Aid-0.005
Token,
Token ID11
Feature activation-2.75e-02

Pos logit contributions
 sizes+0.032
 cultures+0.031
ances+0.031
 nations+0.031
girls+0.030
Neg logit contributions
 Jane-0.020
 another-0.019
 her-0.018
 aback-0.018
 Sally-0.018
Token but
Token ID475
Feature activation-2.02e-03

Pos logit contributions
 sizes+0.043
ances+0.042
 cultures+0.042
 planets+0.040
 nations+0.040
Neg logit contributions
O-0.039
 aback-0.037
 Jane-0.036
uck-0.036
inn-0.035
Token
Token ID198
Feature activation+1.21e-02

Pos logit contributions
O+0.004
 Jane+0.004
 aback+0.004
M+0.004
uck+0.004
Neg logit contributions
ances-0.011
 sizes-0.011
 cultures-0.011
 nations-0.011
arians-0.011
Token
Token ID198
Feature activation+1.44e-02

Pos logit contributions
O+0.014
 aback+0.014
Aid+0.013
 Jane+0.013
 scoop+0.013
Neg logit contributions
 sizes-0.022
ances-0.021
 cultures-0.021
 planets-0.021
 nations-0.020
TokenSuddenly
Token ID38582
Feature activation+4.38e-02

Pos logit contributions
O+0.012
M+0.010
 Jane+0.009
 another+0.009
uck+0.009
Neg logit contributions
 sizes-0.033
ances-0.033
 cultures-0.033
girls-0.032
arians-0.032
Token,
Token ID11
Feature activation-7.52e-03

Pos logit contributions
O+0.071
 Jane+0.067
 aback+0.067
M+0.065
uck+0.064
Neg logit contributions
 sizes-0.063
ances-0.062
 cultures-0.061
 nations-0.058
girls-0.058
Token L
Token ID406
Feature activation+9.32e-03

Pos logit contributions
 sizes+0.017
ances+0.017
 cultures+0.017
 nations+0.016
girls+0.016
Neg logit contributions
O-0.006
 Jane-0.005
 aback-0.005
M-0.005
 another-0.005
Tokenizz
Token ID6457
Feature activation+1.39e-01

Pos logit contributions
 aback+0.004
O+0.004
 Jane+0.003
Da+0.003
 Half+0.003
Neg logit contributions
ances-0.024
 sizes-0.023
bles-0.023
 cultures-0.022
 nations-0.022
Tokenie
Token ID494
Feature activation+2.27e-02

Pos logit contributions
O+0.116
M+0.107
eh+0.104
 scoop+0.102
 Jane+0.102
Neg logit contributions
 sizes-0.310
 cultures-0.308
ances-0.300
arians-0.295
girls-0.290
Token saw
Token ID2497
Feature activation+6.69e-02

Pos logit contributions
O+0.019
M+0.017
 found+0.017
 scoop+0.017
uck+0.016
Neg logit contributions
 sizes-0.052
 cultures-0.051
ances-0.051
arians-0.048
 nations-0.048
Token something
Token ID1223
Feature activation-8.07e-03

Pos logit contributions
inn+0.070
 aback+0.065
 ingredient+0.060
 scoop+0.059
O+0.058
Neg logit contributions
 planets-0.131
 sizes-0.125
 nations-0.123
 ze-0.122
 cultures-0.121
Token wonderful
Token ID7932
Feature activation-6.62e-03

Pos logit contributions
 sizes+0.014
ances+0.014
 cultures+0.014
 nations+0.013
 planets+0.013
Neg logit contributions
O-0.010
 aback-0.010
 Jane-0.010
M-0.009
uck-0.009
Token.
Token ID13
Feature activation+9.40e-03

Pos logit contributions
 sizes+0.014
ances+0.014
arians+0.013
 cultures+0.013
girls+0.013
Neg logit contributions
 Jane-0.007
O-0.006
 another-0.006
 her-0.006
 found-0.006
Token my
Token ID616
Feature activation+1.01e-02

Pos logit contributions
ances+0.142
 cultures+0.141
 sizes+0.139
arians+0.137
 affect+0.137
Neg logit contributions
 Jane-0.044
 my-0.044
O-0.042
 scoop-0.039
 another-0.037
Token cow
Token ID9875
Feature activation-3.40e-02

Pos logit contributions
O+0.011
 Jane+0.010
 another+0.010
 scoop+0.010
 aback+0.010
Neg logit contributions
ances-0.020
 sizes-0.019
 cultures-0.019
 nations-0.019
girls-0.018
Tokenboys
Token ID13202
Feature activation-1.10e-01

Pos logit contributions
ances+0.059
 sizes+0.058
 cultures+0.058
 nations+0.055
 planets+0.055
Neg logit contributions
O-0.043
 Jane-0.042
 aback-0.041
 scoop-0.040
M-0.040
Token and
Token ID290
Feature activation-1.51e-02

Pos logit contributions
 sizes+0.191
 cultures+0.188
ances+0.187
 planets+0.179
 nations+0.179
Neg logit contributions
O-0.141
 aback-0.136
 Jane-0.136
 my-0.132
 another-0.131
Token cow
Token ID9875
Feature activation-2.59e-02

Pos logit contributions
ances+0.026
 sizes+0.026
 cultures+0.025
 nations+0.024
arians+0.024
Neg logit contributions
O-0.019
 Jane-0.018
 aback-0.018
 my-0.018
 scoop-0.018
Tokengirls
Token ID36960
Feature activation-2.43e-01

Pos logit contributions
ances+0.046
 cultures+0.046
 sizes+0.045
 nations+0.043
 planets+0.043
Neg logit contributions
 another-0.030
 Jane-0.030
 my-0.030
O-0.030
 scoop-0.029
Token.
Token ID13
Feature activation-1.60e-02

Pos logit contributions
 sizes+0.483
ances+0.481
 cultures+0.471
 nations+0.455
 planets+0.453
Neg logit contributions
 another-0.252
 my-0.242
 scoop-0.240
O-0.238
 some-0.237
Token I
Token ID314
Feature activation+6.11e-02

Pos logit contributions
ances+0.031
 sizes+0.031
 cultures+0.031
 nations+0.030
bles+0.030
Neg logit contributions
O-0.018
 Jane-0.016
 aback-0.016
M-0.016
 my-0.015
Token love
Token ID1842
Feature activation-2.96e-02

Pos logit contributions
O+0.087
 Jane+0.086
 aback+0.083
uck+0.082
inn+0.080
Neg logit contributions
 sizes-0.095
 cultures-0.091
ances-0.091
girls-0.089
bows-0.087
Token you
Token ID345
Feature activation-6.15e-03

Pos logit contributions
 sizes+0.056
ances+0.056
 cultures+0.055
 planets+0.053
 nations+0.053
Neg logit contributions
O-0.033
 aback-0.032
 Jane-0.031
M-0.029
Aid-0.029
Token."
Token ID526
Feature activation+2.06e-02

Pos logit contributions
 sizes+0.011
 cultures+0.011
ances+0.011
 nations+0.010
 planets+0.010
Neg logit contributions
O-0.007
M-0.007
 my-0.007
 Jane-0.007
 another-0.007
Token be
Token ID307
Feature activation-1.86e-02

Pos logit contributions
 sizes+0.023
ances+0.023
 cultures+0.023
arians+0.022
girls+0.022
Neg logit contributions
O-0.015
 Jane-0.015
 scoop-0.014
 another-0.014
 my-0.014
Token cow
Token ID9875
Feature activation-2.53e-02

Pos logit contributions
 sizes+0.040
ances+0.039
 cultures+0.039
 nations+0.037
arians+0.037
Neg logit contributions
O-0.016
 Jane-0.015
 aback-0.015
 scoop-0.014
M-0.014
Tokenboys
Token ID13202
Feature activation-9.97e-02

Pos logit contributions
 affect+0.028
 cultures+0.028
ances+0.028
 sizes+0.027
 achieve+0.027
Neg logit contributions
 another-0.047
 some-0.047
 Jane-0.046
 Amy-0.045
 Lucy-0.044
Token and
Token ID290
Feature activation-1.03e-02

Pos logit contributions
 sizes+0.183
 cultures+0.180
ances+0.176
 nations+0.169
 achieve+0.169
Neg logit contributions
 another-0.118
 some-0.114
 Jane-0.111
 found-0.109
uck-0.108
Token cow
Token ID9875
Feature activation-2.06e-02

Pos logit contributions
 sizes+0.014
ances+0.014
 cultures+0.014
 nations+0.013
 planets+0.013
Neg logit contributions
O-0.017
 aback-0.016
 Jane-0.016
uck-0.016
M-0.015
Tokengirls
Token ID36960
Feature activation-2.38e-01

Pos logit contributions
ances+0.018
 affect+0.017
 cultures+0.017
 achieve+0.017
 sizes+0.016
Neg logit contributions
 another-0.044
 some-0.044
 Amy-0.041
 Jane-0.041
 found-0.041
Token.
Token ID13
Feature activation+4.46e-03

Pos logit contributions
ances+0.497
 sizes+0.484
 cultures+0.475
 affect+0.454
 nations+0.444
Neg logit contributions
 some-0.249
 another-0.244
 found-0.228
 Jane-0.212
 over-0.204
Token They
Token ID1119
Feature activation-5.37e-02

Pos logit contributions
Aid+0.007
O+0.007
eh+0.007
 aback+0.006
ray+0.006
Neg logit contributions
 sizes-0.007
girls-0.006
 nations-0.006
ances-0.006
 cultures-0.006
Token put
Token ID1234
Feature activation+6.29e-02

Pos logit contributions
 sizes+0.107
ances+0.107
 cultures+0.106
bows+0.102
arians+0.102
Neg logit contributions
 scoop-0.054
O-0.051
 found-0.051
 Jane-0.049
 aback-0.048
Token on
Token ID319
Feature activation+4.15e-02

Pos logit contributions
O+0.083
 aback+0.079
 Jane+0.077
ray+0.074
uck+0.074
Neg logit contributions
 sizes-0.110
 planets-0.105
girls-0.105
 cultures-0.103
ances-0.102
Token hats
Token ID23910
Feature activation-4.89e-02

Pos logit contributions
O+0.048
 aback+0.046
 Jane+0.044
uck+0.043
Aid+0.043
Neg logit contributions
 sizes-0.079
 cultures-0.076
ances-0.076
 planets-0.074
 nations-0.074
Token play
Token ID711
Feature activation-6.05e-02

Pos logit contributions
 sizes+0.055
ances+0.055
 cultures+0.053
bles+0.052
arians+0.051
Neg logit contributions
 found-0.037
 scoop-0.036
O-0.036
 Jane-0.036
 aback-0.036
Token cow
Token ID9875
Feature activation-1.95e-02

Pos logit contributions
 sizes+0.097
 cultures+0.096
ances+0.095
 planets+0.095
 nations+0.094
Neg logit contributions
O-0.083
 aback-0.078
Aid-0.078
 Jane-0.077
izz-0.075
Tokenboys
Token ID13202
Feature activation-9.53e-02

Pos logit contributions
ances+0.031
 sizes+0.029
 cultures+0.029
 nations+0.027
 planets+0.027
Neg logit contributions
 some-0.028
 another-0.027
 Jane-0.027
 found-0.027
 scoop-0.026
Token and
Token ID290
Feature activation-4.97e-03

Pos logit contributions
 sizes+0.172
ances+0.171
 cultures+0.169
 nations+0.161
 achieve+0.159
Neg logit contributions
 found-0.111
 another-0.110
 Jane-0.109
 some-0.109
 aback-0.108
Token cow
Token ID9875
Feature activation-1.85e-02

Pos logit contributions
ances+0.007
 sizes+0.006
 cultures+0.006
bles+0.006
bows+0.006
Neg logit contributions
O-0.008
 Jane-0.008
 my-0.008
 scoop-0.008
 aback-0.008
Tokengirls
Token ID36960
Feature activation-2.26e-01

Pos logit contributions
ances+0.024
 cultures+0.023
 sizes+0.022
 achieve+0.020
aults+0.020
Neg logit contributions
 some-0.032
 another-0.032
 found-0.031
 away-0.029
 Amy-0.029
Token,
Token ID11
Feature activation-2.47e-02

Pos logit contributions
ances+0.463
 cultures+0.436
 sizes+0.435
 nations+0.414
agons+0.408
Neg logit contributions
 another-0.251
 some-0.250
 found-0.237
 one-0.213
 over-0.212
Token and
Token ID290
Feature activation+9.49e-04

Pos logit contributions
ances+0.043
 sizes+0.042
 cultures+0.041
bles+0.040
agons+0.040
Neg logit contributions
 Jane-0.033
 Sally-0.031
 another-0.030
 some-0.030
 Lucy-0.030
Token pretend
Token ID16614
Feature activation-3.25e-02

Pos logit contributions
 Jane+0.001
O+0.001
 another+0.001
 scoop+0.001
 some+0.001
Neg logit contributions
ances-0.001
 sizes-0.001
 cultures-0.001
arians-0.001
bles-0.001
Token to
Token ID284
Feature activation-1.85e-02

Pos logit contributions
ances+0.066
 sizes+0.065
 cultures+0.064
arians+0.063
bles+0.062
Neg logit contributions
 Jane-0.032
O-0.030
 another-0.028
 my-0.028
 some-0.028
Token ride
Token ID6594
Feature activation-1.69e-02

Pos logit contributions
 sizes+0.032
ances+0.032
 cultures+0.032
 nations+0.030
 planets+0.030
Neg logit contributions
O-0.023
 Jane-0.022
 aback-0.022
 scoop-0.021
 another-0.021
Token them
Token ID606
Feature activation-6.39e-04

Pos logit contributions
O+0.044
 aback+0.043
 Jane+0.041
 another+0.040
 my+0.039
Neg logit contributions
 sizes-0.077
ances-0.076
 cultures-0.074
arians-0.072
 planets-0.072
Token away
Token ID1497
Feature activation+2.10e-02

Pos logit contributions
 sizes+0.002
ances+0.002
 cultures+0.002
arians+0.001
 planets+0.001
Neg logit contributions
 aback-0.000
 another-0.000
O-0.000
 some-0.000
 Jane-0.000
Token from
Token ID422
Feature activation-3.32e-02

Pos logit contributions
 another+0.025
 Jane+0.024
 some+0.024
 aback+0.024
 my+0.024
Neg logit contributions
 sizes-0.037
ances-0.037
arians-0.036
 cultures-0.036
girls-0.035
Token their
Token ID511
Feature activation-1.96e-02

Pos logit contributions
 sizes+0.070
ances+0.068
 cultures+0.067
bows+0.065
arians+0.065
Neg logit contributions
O-0.029
 Jane-0.029
 another-0.028
 aback-0.027
 my-0.027
Token natural
Token ID3288
Feature activation-2.71e-02

Pos logit contributions
 sizes+0.034
ances+0.034
bles+0.033
bows+0.032
arians+0.032
Neg logit contributions
 my-0.023
 another-0.023
 home-0.022
 scoop-0.022
 Jane-0.022
Token habitats
Token ID35308
Feature activation-2.03e-01

Pos logit contributions
bles+0.037
aults+0.037
 sizes+0.035
acas+0.035
arians+0.035
Neg logit contributions
 another-0.041
 Jane-0.040
 some-0.040
 my-0.040
 aback-0.040
Token.
Token ID13
Feature activation-8.39e-03

Pos logit contributions
 sizes+0.395
 cultures+0.389
arians+0.381
girls+0.381
ances+0.380
Neg logit contributions
 another-0.203
 Jane-0.199
 some-0.194
out-0.180
 my-0.177
Token We
Token ID775
Feature activation+1.22e-02

Pos logit contributions
 sizes+0.017
ances+0.017
 cultures+0.016
 planets+0.016
 nations+0.016
Neg logit contributions
O-0.009
 Jane-0.008
 aback-0.008
M-0.008
uck-0.007
Token should
Token ID815
Feature activation-6.73e-02

Pos logit contributions
O+0.017
 aback+0.016
 Jane+0.015
 scoop+0.015
M+0.015
Neg logit contributions
 sizes-0.021
ances-0.020
 cultures-0.020
 planets-0.019
 nations-0.019
Token be
Token ID307
Feature activation-1.73e-02

Pos logit contributions
 sizes+0.129
ances+0.122
 cultures+0.121
 planets+0.121
arians+0.119
Neg logit contributions
O-0.075
 Jane-0.073
 scoop-0.072
 my-0.071
 another-0.069
Token kind
Token ID1611
Feature activation-3.94e-02

Pos logit contributions
 sizes+0.026
 cultures+0.026
 nations+0.025
ances+0.025
 planets+0.024
Neg logit contributions
O-0.026
 Jane-0.025
inn-0.025
izz-0.024
 ingredient-0.024
Token them
Token ID606
Feature activation-2.13e-03

Pos logit contributions
O+0.048
 aback+0.045
 Jane+0.045
M+0.043
uck+0.043
Neg logit contributions
 sizes-0.065
ances-0.064
 cultures-0.064
 nations-0.061
 planets-0.061
Token away
Token ID1497
Feature activation+1.69e-02

Pos logit contributions
 sizes+0.005
ances+0.005
 cultures+0.005
arians+0.005
 planets+0.005
Neg logit contributions
 aback-0.001
 another-0.001
 Jane-0.001
 some-0.001
 away-0.001
Token from
Token ID422
Feature activation-3.47e-02

Pos logit contributions
 another+0.019
 aback+0.019
 Jane+0.019
 my+0.019
 some+0.019
Neg logit contributions
 sizes-0.030
ances-0.030
arians-0.030
girls-0.030
 cultures-0.029
Token their
Token ID511
Feature activation-1.95e-02

Pos logit contributions
 sizes+0.068
ances+0.067
 cultures+0.066
girls+0.064
 nations+0.064
Neg logit contributions
O-0.036
 Jane-0.034
 aback-0.034
 scoop-0.032
 another-0.032
Token natural
Token ID3288
Feature activation-3.00e-02

Pos logit contributions
bows+0.036
 sizes+0.036
ances+0.036
bles+0.036
arians+0.035
Neg logit contributions
 my-0.020
 scoop-0.020
 another-0.020
 home-0.019
 away-0.019
Token habitats
Token ID35308
Feature activation-2.02e-01

Pos logit contributions
bles+0.043
arians+0.043
girls+0.042
aults+0.041
acas+0.041
Neg logit contributions
 aback-0.043
 another-0.042
 home-0.042
 Jane-0.042
 some-0.042
Token.
Token ID13
Feature activation-2.98e-03

Pos logit contributions
 cultures+0.394
 sizes+0.392
girls+0.391
arians+0.390
bles+0.383
Neg logit contributions
 Jane-0.199
 another-0.191
 some-0.180
out-0.171
 her-0.169
Token We
Token ID775
Feature activation+1.44e-02

Pos logit contributions
 sizes+0.006
 cultures+0.005
ances+0.005
 nations+0.005
 planets+0.005
Neg logit contributions
O-0.003
eh-0.003
 aback-0.003
 scoop-0.003
 Jane-0.003
Token should
Token ID815
Feature activation-6.44e-02

Pos logit contributions
O+0.022
 Jane+0.021
eh+0.021
 aback+0.021
M+0.020
Neg logit contributions
 sizes-0.021
 cultures-0.021
ances-0.021
 nations-0.020
girls-0.020
Token be
Token ID307
Feature activation-1.47e-02

Pos logit contributions
 sizes+0.114
ances+0.110
 cultures+0.109
 planets+0.107
girls+0.106
Neg logit contributions
O-0.081
 Jane-0.077
 scoop-0.076
 aback-0.076
uck-0.073
Token friendly
Token ID8030
Feature activation-4.64e-02

Pos logit contributions
 cultures+0.019
 sizes+0.019
 pilots+0.019
 sailors+0.019
 captains+0.018
Neg logit contributions
O-0.025
inn-0.024
 Jane-0.023
izz-0.023
ray-0.022
Token They
Token ID1119
Feature activation-6.57e-02

Pos logit contributions
 sizes+0.012
ances+0.012
 cultures+0.012
 planets+0.012
 nations+0.012
Neg logit contributions
O-0.009
 Jane-0.008
 aback-0.008
M-0.008
uck-0.008
Token might
Token ID1244
Feature activation-9.43e-02

Pos logit contributions
ances+0.101
 sizes+0.100
 cultures+0.099
 planets+0.094
 nations+0.094
Neg logit contributions
O-0.092
 aback-0.091
 scoop-0.090
 Jane-0.087
 found-0.087
Token not
Token ID407
Feature activation-1.08e-02

Pos logit contributions
 sizes+0.167
ances+0.163
 cultures+0.160
girls+0.154
 affect+0.154
Neg logit contributions
O-0.118
 aback-0.113
 Jane-0.111
uck-0.108
Aid-0.108
Token train
Token ID4512
Feature activation-4.12e-02

Pos logit contributions
 sizes+0.020
 cultures+0.020
ances+0.020
 planets+0.019
arians+0.019
Neg logit contributions
O-0.012
 scoop-0.012
 Jane-0.012
 aback-0.012
 another-0.011
Token their
Token ID511
Feature activation-4.40e-02

Pos logit contributions
 cultures+0.069
 sizes+0.068
ances+0.066
 planets+0.066
 nations+0.064
Neg logit contributions
 another-0.056
O-0.054
 aback-0.053
 some-0.052
 my-0.052
Token dogs
Token ID6844
Feature activation-1.80e-01

Pos logit contributions
 achieve+0.073
 sizes+0.071
aults+0.070
ances+0.068
bles+0.067
Neg logit contributions
 home-0.054
 Jane-0.053
 another-0.053
 found-0.053
 my-0.053
Token well
Token ID880
Feature activation-1.89e-02

Pos logit contributions
ances+0.308
 cultures+0.307
 sizes+0.306
 planets+0.295
 nations+0.287
Neg logit contributions
 another-0.237
 my-0.228
 found-0.226
 away-0.226
 some-0.223
Token.
Token ID13
Feature activation-4.65e-03

Pos logit contributions
 cultures+0.033
ances+0.033
 sizes+0.033
arians+0.032
 planets+0.032
Neg logit contributions
 another-0.022
 Jane-0.021
 my-0.021
 aback-0.021
 some-0.021
Token You
Token ID921
Feature activation-6.02e-03

Pos logit contributions
 sizes+0.008
ances+0.008
 cultures+0.008
 nations+0.008
 planets+0.008
Neg logit contributions
O-0.006
 Jane-0.006
 aback-0.006
M-0.005
 another-0.005
Token have
Token ID423
Feature activation+4.36e-02

Pos logit contributions
 sizes+0.009
ances+0.009
 cultures+0.009
 nations+0.009
 planets+0.009
Neg logit contributions
O-0.009
 aback-0.008
 Jane-0.008
 scoop-0.008
 another-0.008
Token to
Token ID284
Feature activation+1.63e-02

Pos logit contributions
O+0.039
inn+0.034
 aback+0.034
 Jane+0.034
izz+0.033
Neg logit contributions
 sizes-0.096
 cultures-0.093
ances-0.092
 nations-0.091
 planets-0.090
TokenLook
Token ID8567
Feature activation+5.53e-02

Pos logit contributions
 sizes+0.016
ances+0.016
 cultures+0.016
 nations+0.015
 planets+0.015
Neg logit contributions
O-0.007
 Jane-0.006
 aback-0.006
M-0.006
 scoop-0.006
Token at
Token ID379
Feature activation+2.92e-02

Pos logit contributions
O+0.079
 aback+0.076
 Jane+0.072
Aid+0.072
eh+0.071
Neg logit contributions
 sizes-0.087
ances-0.086
 cultures-0.085
 nations-0.082
 planets-0.082
Token how
Token ID703
Feature activation-2.35e-02

Pos logit contributions
O+0.023
 aback+0.021
 Jane+0.021
uck+0.019
M+0.019
Neg logit contributions
 sizes-0.065
ances-0.064
 cultures-0.064
 nations-0.062
 planets-0.061
Token happy
Token ID3772
Feature activation-1.21e-02

Pos logit contributions
 sizes+0.045
ances+0.045
 cultures+0.044
 nations+0.042
 planets+0.042
Neg logit contributions
O-0.026
 aback-0.024
 Jane-0.024
M-0.023
uck-0.023
Token our
Token ID674
Feature activation-5.29e-02

Pos logit contributions
ances+0.023
 sizes+0.023
arians+0.022
girls+0.022
 cultures+0.022
Neg logit contributions
 Jane-0.013
 my-0.013
 another-0.012
 some-0.012
O-0.012
Token animals
Token ID4695
Feature activation-1.79e-01

Pos logit contributions
 sizes+0.089
bles+0.087
ances+0.087
girls+0.086
 achieve+0.085
Neg logit contributions
 my-0.064
 Jane-0.062
 another-0.062
 scoop-0.062
 some-0.061
Token look
Token ID804
Feature activation+3.34e-02

Pos logit contributions
ances+0.261
 sizes+0.260
 cultures+0.255
girls+0.246
arians+0.243
Neg logit contributions
 aback-0.263
O-0.263
 scoop-0.260
 Jane-0.259
 found-0.254
Token together
Token ID1978
Feature activation-2.41e-02

Pos logit contributions
 Jane+0.038
 aback+0.038
O+0.037
 another+0.037
 my+0.036
Neg logit contributions
ances-0.060
 sizes-0.060
 cultures-0.059
girls-0.058
arians-0.058
Token!"
Token ID2474
Feature activation+2.20e-02

Pos logit contributions
ances+0.045
 sizes+0.044
 cultures+0.044
girls+0.044
arians+0.043
Neg logit contributions
 Jane-0.026
 my-0.026
 another-0.025
O-0.025
 scoop-0.025
Token\n
Token ID198
Feature activation-5.73e-03

Pos logit contributions
O+0.022
 aback+0.021
 Jane+0.020
M+0.019
uck+0.019
Neg logit contributions
 sizes-0.045
ances-0.044
 cultures-0.044
 nations-0.042
 planets-0.042
Token\n
Token ID198
Feature activation-1.18e-03

Pos logit contributions
 sizes+0.011
ances+0.011
 cultures+0.011
 planets+0.010
 nations+0.010
Neg logit contributions
O-0.006
 aback-0.006
 Jane-0.006
Aid-0.005
uck-0.005
Token that
Token ID326
Feature activation+5.57e-02

Pos logit contributions
 Jane+0.011
O+0.011
 another+0.010
 scoop+0.010
 aback+0.010
Neg logit contributions
ances-0.033
 sizes-0.033
 cultures-0.033
girls-0.032
arians-0.032
Token said
Token ID531
Feature activation-2.36e-02

Pos logit contributions
O+0.062
 aback+0.058
 Jane+0.056
Aid+0.054
uck+0.054
Neg logit contributions
 sizes-0.107
ances-0.105
 cultures-0.104
 planets-0.100
 nations-0.100
Token:
Token ID25
Feature activation-4.40e-03

Pos logit contributions
 sizes+0.046
ances+0.045
 cultures+0.045
 planets+0.043
 nations+0.043
Neg logit contributions
O-0.026
 Jane-0.026
 another-0.024
 aback-0.024
 Sally-0.023
Token "
Token ID366
Feature activation-1.24e-02

Pos logit contributions
 sizes+0.009
ances+0.009
 cultures+0.008
 planets+0.008
 nations+0.008
Neg logit contributions
O-0.005
 Jane-0.005
 another-0.004
M-0.004
 aback-0.004
TokenNo
Token ID2949
Feature activation+2.50e-02

Pos logit contributions
ances+0.023
 sizes+0.022
 cultures+0.022
girls+0.021
arians+0.021
Neg logit contributions
 Jane-0.014
 aback-0.014
 ingredient-0.013
eh-0.013
Aid-0.012
Token dogs
Token ID6844
Feature activation-1.79e-01

Pos logit contributions
 Jane+0.033
 my+0.032
O+0.032
 another+0.031
 Sally+0.031
Neg logit contributions
ances-0.044
 cultures-0.042
bles-0.042
 sizes-0.041
 planets-0.041
Token allowed
Token ID3142
Feature activation-1.37e-02

Pos logit contributions
 sizes+0.301
 cultures+0.291
ances+0.290
 planets+0.288
 nations+0.273
Neg logit contributions
O-0.236
 aback-0.229
 Jane-0.226
 scoop-0.225
uck-0.222
Token".
Token ID1911
Feature activation+1.84e-02

Pos logit contributions
 sizes+0.026
ances+0.026
 cultures+0.026
 nations+0.025
girls+0.025
Neg logit contributions
O-0.015
 aback-0.014
 Jane-0.014
Aid-0.013
M-0.013
Token Jen
Token ID13364
Feature activation+2.83e-02

Pos logit contributions
O+0.020
 aback+0.018
 Jane+0.018
Aid+0.017
M+0.017
Neg logit contributions
 sizes-0.036
ances-0.036
 cultures-0.035
 nations-0.034
 planets-0.034
Token and
Token ID290
Feature activation+2.05e-03

Pos logit contributions
O+0.037
 Jane+0.035
 scoop+0.035
 aback+0.035
 found+0.034
Neg logit contributions
 sizes-0.050
 cultures-0.049
ances-0.048
 nations-0.046
 planets-0.046
Token Ben
Token ID3932
Feature activation-1.12e-02

Pos logit contributions
O+0.002
 Jane+0.002
uck+0.002
 another+0.002
 aback+0.002
Neg logit contributions
 sizes-0.004
ances-0.004
 cultures-0.004
 nations-0.004
 planets-0.004
Token soon
Token ID2582
Feature activation+2.18e-03

Pos logit contributions
 sizes+0.015
 cultures+0.015
arians+0.015
ances+0.015
bows+0.015
Neg logit contributions
 another-0.008
 Jane-0.008
 some-0.008
 late-0.008
 scoop-0.008
Token.
Token ID13
Feature activation-2.69e-03

Pos logit contributions
 another+0.002
 scoop+0.002
 some+0.002
 Jane+0.002
 found+0.002
Neg logit contributions
 sizes-0.004
 cultures-0.004
bles-0.004
arians-0.004
 nations-0.004
Token They
Token ID1119
Feature activation-5.76e-02

Pos logit contributions
 sizes+0.005
ances+0.005
 cultures+0.005
 nations+0.004
 planets+0.004
Neg logit contributions
O-0.003
 Jane-0.003
 aback-0.003
M-0.003
uck-0.003
Token hoped
Token ID10719
Feature activation-1.67e-02

Pos logit contributions
 sizes+0.101
ances+0.099
 cultures+0.098
 nations+0.094
 planets+0.093
Neg logit contributions
O-0.074
 aback-0.070
 Jane-0.069
 scoop-0.067
M-0.066
Token the
Token ID262
Feature activation-5.92e-02

Pos logit contributions
 sizes+0.032
ances+0.032
 cultures+0.032
 planets+0.031
 nations+0.031
Neg logit contributions
O-0.018
 aback-0.017
Aid-0.016
eh-0.016
 Jane-0.016
Token animals
Token ID4695
Feature activation-1.78e-01

Pos logit contributions
 sizes+0.128
ances+0.128
 cultures+0.128
aults+0.123
arians+0.122
Neg logit contributions
O-0.048
 Jane-0.044
 scoop-0.044
 another-0.043
 aback-0.041
Token and
Token ID290
Feature activation-2.53e-03

Pos logit contributions
 sizes+0.302
ances+0.288
arians+0.284
 cultures+0.283
bles+0.278
Neg logit contributions
 found-0.228
 another-0.225
 Jane-0.222
 some-0.218
 scoop-0.216
Token plants
Token ID6134
Feature activation-1.08e-01

Pos logit contributions
 sizes+0.006
ances+0.006
 cultures+0.005
arians+0.005
bles+0.005
Neg logit contributions
O-0.002
 Jane-0.002
 aback-0.002
 scoop-0.002
 another-0.002
Token would
Token ID561
Feature activation-8.60e-02

Pos logit contributions
 sizes+0.191
arians+0.188
ances+0.187
 cultures+0.186
 nations+0.181
Neg logit contributions
 another-0.126
 found-0.125
 some-0.125
 scoop-0.122
 Jane-0.121
Token be
Token ID307
Feature activation-1.76e-02

Pos logit contributions
 sizes+0.170
 cultures+0.164
ances+0.164
arians+0.162
 nations+0.160
Neg logit contributions
 scoop-0.091
 another-0.088
O-0.087
 Jane-0.086
 some-0.081
Token okay
Token ID8788
Feature activation-3.73e-02

Pos logit contributions
 sizes+0.034
ances+0.034
 cultures+0.033
 nations+0.032
arians+0.032
Neg logit contributions
O-0.019
 aback-0.018
 Jane-0.018
 scoop-0.017
M-0.016
Token noises
Token ID26782
Feature activation-7.84e-02

Pos logit contributions
O+0.013
 Jane+0.012
 aback+0.012
uck+0.012
 scoop+0.012
Neg logit contributions
ances-0.020
 sizes-0.020
 cultures-0.020
arians-0.019
 nations-0.019
Token and
Token ID290
Feature activation+6.09e-04

Pos logit contributions
ances+0.185
agons+0.178
 cultures+0.177
arians+0.174
 sizes+0.166
Neg logit contributions
 another-0.082
 some-0.075
 one-0.062
 late-0.061
 over-0.058
Token pretended
Token ID36766
Feature activation-4.96e-02

Pos logit contributions
 some+0.001
 found+0.001
 another+0.001
 eaten+0.001
 Lily+0.001
Neg logit contributions
ances-0.001
agons-0.001
arians-0.001
bles-0.001
 cultures-0.001
Token to
Token ID284
Feature activation-2.50e-02

Pos logit contributions
ances+0.125
arians+0.121
 cultures+0.120
 sizes+0.118
agons+0.117
Neg logit contributions
 Jane-0.034
 some-0.030
 another-0.030
 Amy-0.027
 her-0.026
Token be
Token ID307
Feature activation-3.42e-02

Pos logit contributions
ances+0.047
 sizes+0.047
 cultures+0.046
arians+0.044
 nations+0.044
Neg logit contributions
O-0.029
 Jane-0.027
 scoop-0.026
 another-0.026
 aback-0.025
Token animals
Token ID4695
Feature activation-1.77e-01

Pos logit contributions
ances+0.067
 cultures+0.066
arians+0.066
 sizes+0.064
 affect+0.063
Neg logit contributions
 another-0.036
 Jane-0.035
 found-0.034
 some-0.034
 my-0.034
Token.
Token ID13
Feature activation-7.91e-03

Pos logit contributions
ances+0.406
 cultures+0.377
 sizes+0.377
agons+0.370
arians+0.366
Neg logit contributions
 another-0.185
 some-0.171
 away-0.149
 her-0.139
 found-0.138
Token They
Token ID1119
Feature activation-6.32e-02

Pos logit contributions
ances+0.015
 cultures+0.014
 sizes+0.014
arians+0.014
 nations+0.014
Neg logit contributions
O-0.010
 Jane-0.009
M-0.009
 another-0.009
 aback-0.009
Token had
Token ID550
Feature activation+1.73e-02

Pos logit contributions
ances+0.124
 cultures+0.120
arians+0.117
 sizes+0.114
agons+0.113
Neg logit contributions
 found-0.078
O-0.074
 another-0.070
 scoop-0.069
 Jane-0.068
Token a
Token ID257
Feature activation-3.31e-03

Pos logit contributions
 some+0.011
 another+0.011
 found+0.010
 Jane+0.009
 a+0.009
Neg logit contributions
ances-0.047
arians-0.045
 cultures-0.044
 affect-0.044
agons-0.043
Token lot
Token ID1256
Feature activation+1.65e-02

Pos logit contributions
ances+0.008
 cultures+0.007
 sizes+0.007
arians+0.007
sters+0.007
Neg logit contributions
O-0.003
 another-0.003
M-0.003
 found-0.003
 half-0.003
Token is
Token ID318
Feature activation-1.31e-02

Pos logit contributions
 sizes+0.041
ances+0.041
 cultures+0.041
 nations+0.039
girls+0.039
Neg logit contributions
O-0.035
 aback-0.034
 Jane-0.034
 scoop-0.033
 another-0.033
Token not
Token ID407
Feature activation-1.04e-02

Pos logit contributions
 sizes+0.025
ances+0.025
 cultures+0.025
 planets+0.024
 nations+0.024
Neg logit contributions
O-0.014
 aback-0.013
 Jane-0.013
uck-0.013
Aid-0.013
Token only
Token ID691
Feature activation+9.06e-03

Pos logit contributions
 sizes+0.020
ances+0.020
 cultures+0.020
 nations+0.019
 planets+0.019
Neg logit contributions
O-0.012
 aback-0.011
 Jane-0.010
uck-0.010
M-0.010
Token about
Token ID546
Feature activation-6.87e-03

Pos logit contributions
O+0.011
 aback+0.010
 Jane+0.010
M+0.009
uck+0.009
Neg logit contributions
 sizes-0.017
ances-0.017
 cultures-0.017
 nations-0.016
 planets-0.016
Token knowing
Token ID6970
Feature activation-1.28e-02

Pos logit contributions
ances+0.014
 sizes+0.013
 cultures+0.013
girls+0.013
arians+0.013
Neg logit contributions
O-0.007
 Jane-0.007
 another-0.007
 my-0.006
 aback-0.006
Token facts
Token ID6419
Feature activation-1.76e-01

Pos logit contributions
 sizes+0.027
ances+0.026
 cultures+0.026
girls+0.026
 nations+0.025
Neg logit contributions
 Jane-0.012
O-0.011
 my-0.011
 another-0.011
 aback-0.010
Token and
Token ID290
Feature activation-2.15e-02

Pos logit contributions
girls+0.337
ances+0.326
arians+0.323
aults+0.316
 cultures+0.314
Neg logit contributions
 my-0.194
 another-0.187
 some-0.185
 aback-0.174
 Jane-0.173
Token words
Token ID2456
Feature activation-9.56e-02

Pos logit contributions
ances+0.031
bles+0.030
aults+0.030
girls+0.030
arians+0.030
Neg logit contributions
 my-0.031
 aback-0.031
 Jane-0.031
 another-0.030
 some-0.030
Token.
Token ID13
Feature activation-1.42e-02

Pos logit contributions
ances+0.169
 cultures+0.166
girls+0.166
 sizes+0.165
arians+0.163
Neg logit contributions
 another-0.108
 my-0.108
 Jane-0.106
 aback-0.105
 some-0.104
Token It
Token ID632
Feature activation+3.29e-02

Pos logit contributions
 sizes+0.027
ances+0.027
 cultures+0.027
 nations+0.026
 planets+0.026
Neg logit contributions
O-0.015
 Jane-0.014
 aback-0.014
M-0.013
uck-0.013
Token is
Token ID318
Feature activation-1.74e-02

Pos logit contributions
O+0.044
 aback+0.041
 Jane+0.041
Aid+0.039
M+0.039
Neg logit contributions
 sizes-0.057
ances-0.055
 cultures-0.055
 planets-0.053
 nations-0.053
Token noises
Token ID26782
Feature activation-8.53e-02

Pos logit contributions
ances+0.044
 sizes+0.042
 cultures+0.042
arians+0.041
 affect+0.040
Neg logit contributions
 Jane-0.039
O-0.039
 another-0.037
 aback-0.037
 some-0.036
Token and
Token ID290
Feature activation-5.17e-03

Pos logit contributions
 cultures+0.187
ances+0.183
arians+0.176
 habitats+0.174
agons+0.172
Neg logit contributions
 another-0.088
 some-0.084
 Jane-0.075
 her-0.075
 one-0.072
Token pretending
Token ID23662
Feature activation-6.97e-02

Pos logit contributions
arians+0.008
ances+0.007
bles+0.007
agons+0.007
aults+0.007
Neg logit contributions
 some-0.009
 Jane-0.009
 her-0.008
 another-0.008
 scoop-0.008
Token to
Token ID284
Feature activation-3.03e-02

Pos logit contributions
ances+0.179
 cultures+0.173
arians+0.172
 sizes+0.169
bles+0.168
Neg logit contributions
 Jane-0.037
 some-0.033
 another-0.031
O-0.029
 Amy-0.028
Token be
Token ID307
Feature activation-3.79e-02

Pos logit contributions
 sizes+0.047
ances+0.045
 cultures+0.045
 planets+0.044
 nations+0.044
Neg logit contributions
 aback-0.047
O-0.045
Aid-0.043
uck-0.043
Da-0.042
Token animals
Token ID4695
Feature activation-1.75e-01

Pos logit contributions
ances+0.080
arians+0.078
 cultures+0.077
 affect+0.076
 sizes+0.075
Neg logit contributions
 another-0.033
 Jane-0.033
 my-0.032
 some-0.031
 found-0.031
Token.
Token ID13
Feature activation+9.48e-04

Pos logit contributions
ances+0.374
 cultures+0.360
 sizes+0.347
 ABC+0.339
arians+0.337
Neg logit contributions
 another-0.183
 some-0.171
 her-0.154
 away-0.151
 Jane-0.147
Token Lily
Token ID20037
Feature activation+4.17e-02

Pos logit contributions
 aback+0.001
O+0.001
Aid+0.001
ray+0.001
eh+0.001
Neg logit contributions
 sizes-0.002
girls-0.001
 cultures-0.001
 nations-0.001
 planets-0.001
Token had
Token ID550
Feature activation+2.38e-02

Pos logit contributions
 Jane+0.044
O+0.044
 aback+0.042
 scoop+0.042
 found+0.042
Neg logit contributions
ances-0.082
 sizes-0.079
 cultures-0.079
 nations-0.076
arians-0.075
Token a
Token ID257
Feature activation+1.70e-03

Pos logit contributions
O+0.014
 aback+0.013
 Jane+0.012
uck+0.011
Aid+0.011
Neg logit contributions
 sizes-0.058
ances-0.057
 cultures-0.057
 planets-0.056
 nations-0.055
Token big
Token ID1263
Feature activation+4.58e-02

Pos logit contributions
O+0.002
 aback+0.002
 Jane+0.002
M+0.002
 scoop+0.002
Neg logit contributions
 sizes-0.003
ances-0.003
 cultures-0.003
 nations-0.003
 planets-0.003
Token"
Token ID1
Feature activation+3.23e-02

Pos logit contributions
 aback+0.017
 ingredient+0.017
 Jane+0.016
Aid+0.015
uck+0.015
Neg logit contributions
ances-0.018
 sizes-0.018
 achieve-0.017
bles-0.017
aults-0.017
TokenI
Token ID40
Feature activation+4.37e-02

Pos logit contributions
 aback+0.039
 ingredient+0.039
 Jane+0.039
ray+0.038
uck+0.038
Neg logit contributions
 sizes-0.057
ances-0.053
 cultures-0.052
girls-0.051
 planets-0.050
Token don
Token ID836
Feature activation-5.81e-02

Pos logit contributions
O+0.058
Aid+0.055
uck+0.055
 Jane+0.054
inn+0.054
Neg logit contributions
 sizes-0.069
ances-0.066
girls-0.065
 affect-0.065
 cultures-0.065
Token't
Token ID470
Feature activation+4.83e-03

Pos logit contributions
 sizes+0.105
 cultures+0.103
ances+0.103
 planets+0.098
 nations+0.098
Neg logit contributions
O-0.075
 aback-0.069
 Jane-0.068
uck-0.067
M-0.066
Token hate
Token ID5465
Feature activation-3.19e-02

Pos logit contributions
O+0.007
uck+0.007
Aid+0.007
 Jane+0.007
M+0.007
Neg logit contributions
 sizes-0.007
ances-0.007
 cultures-0.007
 affect-0.007
 nations-0.007
Token dogs
Token ID6844
Feature activation-1.75e-01

Pos logit contributions
ances+0.068
 sizes+0.066
 cultures+0.066
 nations+0.063
bles+0.063
Neg logit contributions
O-0.031
 my-0.031
 Jane-0.029
 another-0.029
 aback-0.027
Token anymore
Token ID7471
Feature activation-3.49e-02

Pos logit contributions
 cultures+0.328
ances+0.326
 sizes+0.325
 planets+0.319
 nations+0.312
Neg logit contributions
O-0.196
 aback-0.195
 Jane-0.192
 another-0.186
 my-0.185
Token,
Token ID11
Feature activation-5.00e-02

Pos logit contributions
ances+0.071
 cultures+0.071
 sizes+0.071
 nations+0.069
 planets+0.068
Neg logit contributions
 Jane-0.033
O-0.033
 aback-0.031
 another-0.030
 my-0.030
Token they
Token ID484
Feature activation-3.38e-02

Pos logit contributions
 sizes+0.097
ances+0.096
 cultures+0.095
 nations+0.091
 planets+0.091
Neg logit contributions
O-0.054
 aback-0.051
 Jane-0.050
M-0.047
uck-0.047
Token're
Token ID821
Feature activation-4.33e-02

Pos logit contributions
 sizes+0.045
ances+0.045
 cultures+0.045
 nations+0.042
 planets+0.042
Neg logit contributions
O-0.056
 aback-0.054
 Jane-0.053
 scoop-0.052
uck-0.051
Token so
Token ID523
Feature activation-3.89e-03

Pos logit contributions
 sizes+0.073
ances+0.066
girls+0.065
 planets+0.065
 cultures+0.064
Neg logit contributions
O-0.060
ray-0.058
Aid-0.057
uck-0.056
M-0.056
Token noises
Token ID26782
Feature activation-8.63e-02

Pos logit contributions
ances+0.041
 cultures+0.039
 affect+0.038
arians+0.038
 sizes+0.038
Neg logit contributions
 Jane-0.042
 some-0.042
 another-0.042
O-0.040
 aback-0.040
Token and
Token ID290
Feature activation-3.56e-03

Pos logit contributions
 cultures+0.200
ances+0.195
 habitats+0.189
arians+0.188
agons+0.186
Neg logit contributions
 another-0.080
 some-0.079
 over-0.069
 one-0.068
 out-0.066
Token pretending
Token ID23662
Feature activation-6.92e-02

Pos logit contributions
arians+0.005
ances+0.005
agons+0.005
bles+0.005
aults+0.005
Neg logit contributions
 some-0.006
 Jane-0.006
 her-0.006
 found-0.005
 Amy-0.005
Token to
Token ID284
Feature activation-2.92e-02

Pos logit contributions
ances+0.187
 cultures+0.180
arians+0.180
bles+0.175
 sizes+0.175
Neg logit contributions
 Jane-0.029
 some-0.027
 another-0.024
 Amy-0.021
 her-0.020
Token be
Token ID307
Feature activation-3.70e-02

Pos logit contributions
 sizes+0.049
ances+0.047
 cultures+0.047
 nations+0.045
 planets+0.045
Neg logit contributions
O-0.040
 aback-0.040
Aid-0.037
 Jane-0.037
uck-0.037
Token animals
Token ID4695
Feature activation-1.74e-01

Pos logit contributions
ances+0.078
arians+0.076
 cultures+0.075
 affect+0.073
 achieve+0.072
Neg logit contributions
 another-0.034
 some-0.032
 found-0.032
 Jane-0.032
 my-0.032
Token.
Token ID13
Feature activation-1.29e-03

Pos logit contributions
ances+0.387
 cultures+0.372
 sizes+0.357
 ABC+0.353
arians+0.348
Neg logit contributions
 another-0.173
 some-0.159
 away-0.146
 her-0.143
 one-0.133
Token Lily
Token ID20037
Feature activation+4.25e-02

Pos logit contributions
 sizes+0.002
ances+0.002
girls+0.002
 cultures+0.002
 planets+0.002
Neg logit contributions
O-0.002
 aback-0.001
Aid-0.001
uck-0.001
 scoop-0.001
Token had
Token ID550
Feature activation+2.48e-02

Pos logit contributions
 Jane+0.042
 found+0.041
O+0.040
 scoop+0.039
 aback+0.039
Neg logit contributions
ances-0.088
 cultures-0.085
 sizes-0.084
 nations-0.082
 affect-0.081
Token a
Token ID257
Feature activation+2.82e-03

Pos logit contributions
O+0.014
 aback+0.013
 Jane+0.012
uck+0.011
M+0.011
Neg logit contributions
 sizes-0.061
ances-0.060
 cultures-0.060
 planets-0.058
 nations-0.058
Token soft
Token ID2705
Feature activation+2.07e-02

Pos logit contributions
O+0.004
 aback+0.003
 Jane+0.003
uck+0.003
M+0.003
Neg logit contributions
 sizes-0.005
ances-0.005
 cultures-0.005
 nations-0.005
 planets-0.005
Token advertising
Token ID8560
Feature activation-8.88e-03

Pos logit contributions
 sizes+0.053
ances+0.052
 planets+0.050
 cultures+0.049
 nations+0.048
Neg logit contributions
 aback-0.038
O-0.038
uck-0.037
Aid-0.036
ray-0.036
Token a
Token ID257
Feature activation-1.21e-02

Pos logit contributions
 sizes+0.022
ances+0.022
 cultures+0.022
girls+0.021
arians+0.021
Neg logit contributions
O-0.005
 Jane-0.004
 aback-0.004
 another-0.003
 scoop-0.003
Token campaign
Token ID1923
Feature activation-4.66e-03

Pos logit contributions
 sizes+0.023
ances+0.023
 cultures+0.022
girls+0.021
 planets+0.021
Neg logit contributions
O-0.014
 Jane-0.013
 aback-0.013
 scoop-0.012
M-0.012
Token to
Token ID284
Feature activation-3.35e-03

Pos logit contributions
 sizes+0.013
arians+0.013
aults+0.012
 affect+0.012
girls+0.012
Neg logit contributions
 Jane-0.002
uck-0.001
 Sally-0.001
out-0.001
 another-0.001
Token help
Token ID1037
Feature activation-7.43e-02

Pos logit contributions
ances+0.006
 sizes+0.006
 cultures+0.005
 nations+0.005
 planets+0.005
Neg logit contributions
O-0.004
 aback-0.004
 Jane-0.004
M-0.004
Aid-0.004
Token animals
Token ID4695
Feature activation-1.74e-01

Pos logit contributions
 sizes+0.126
ances+0.124
 cultures+0.123
girls+0.118
arians+0.118
Neg logit contributions
O-0.097
 Jane-0.095
 aback-0.091
 another-0.089
 scoop-0.088
Token that
Token ID326
Feature activation+2.34e-02

Pos logit contributions
 sizes+0.341
ances+0.327
 cultures+0.323
arians+0.320
 planets+0.318
Neg logit contributions
 another-0.185
 some-0.177
 Jane-0.171
 aback-0.171
 her-0.170
Token did
Token ID750
Feature activation-9.94e-02

Pos logit contributions
O+0.030
 Jane+0.029
 aback+0.028
 scoop+0.027
 another+0.026
Neg logit contributions
ances-0.040
 sizes-0.040
 cultures-0.039
arians-0.038
 nations-0.038
Token not
Token ID407
Feature activation-1.90e-02

Pos logit contributions
 sizes+0.174
ances+0.173
 cultures+0.169
girls+0.166
arians+0.165
Neg logit contributions
O-0.124
 Jane-0.119
 aback-0.116
 another-0.116
 scoop-0.113
Token have
Token ID423
Feature activation+2.53e-02

Pos logit contributions
 sizes+0.032
ances+0.032
 cultures+0.032
 nations+0.030
 planets+0.030
Neg logit contributions
O-0.026
 aback-0.024
 Jane-0.024
uck-0.023
Aid-0.023
Token homes
Token ID5682
Feature activation-9.14e-02

Pos logit contributions
O+0.026
 aback+0.024
Aid+0.023
 Jane+0.023
M+0.023
Neg logit contributions
 sizes-0.051
ances-0.049
 cultures-0.049
 nations-0.048
 planets-0.048
Token "
Token ID366
Feature activation-2.24e-02

Pos logit contributions
 sizes+0.064
ances+0.063
 cultures+0.062
arians+0.059
aults+0.058
Neg logit contributions
 another-0.030
 Jane-0.030
 some-0.029
O-0.027
 Amy-0.026
TokenDo
Token ID5211
Feature activation-4.39e-02

Pos logit contributions
 sizes+0.042
ances+0.042
 cultures+0.041
 nations+0.040
 planets+0.040
Neg logit contributions
O-0.026
 aback-0.024
 Jane-0.024
M-0.023
uck-0.022
Token not
Token ID407
Feature activation-1.67e-02

Pos logit contributions
 sizes+0.085
ances+0.081
 cultures+0.081
arians+0.078
 planets+0.076
Neg logit contributions
O-0.056
 another-0.051
 my-0.051
 Jane-0.050
 some-0.048
Token feed
Token ID3745
Feature activation-4.96e-02

Pos logit contributions
 sizes+0.031
ances+0.030
 cultures+0.030
arians+0.029
girls+0.029
Neg logit contributions
O-0.020
 Jane-0.019
 aback-0.018
 another-0.018
 scoop-0.018
Token the
Token ID262
Feature activation-6.93e-02

Pos logit contributions
 sizes+0.103
ances+0.102
 cultures+0.101
 nations+0.097
 planets+0.097
Neg logit contributions
O-0.047
 aback-0.043
 Jane-0.043
M-0.040
Aid-0.040
Token animals
Token ID4695
Feature activation-1.74e-01

Pos logit contributions
ances+0.120
 sizes+0.114
 cultures+0.112
aults+0.109
arians+0.106
Neg logit contributions
 another-0.093
 my-0.086
 some-0.086
 half-0.084
 scoop-0.084
Token".
Token ID1911
Feature activation+1.88e-02

Pos logit contributions
 sizes+0.312
ances+0.304
 cultures+0.301
 planets+0.291
 nations+0.286
Neg logit contributions
O-0.212
 Jane-0.201
 aback-0.200
 scoop-0.193
uck-0.192
Token<|endoftext|>
Token ID50256
Feature activation-1.88e-02

Pos logit contributions
O+0.016
 Jane+0.015
 aback+0.015
 another+0.014
M+0.014
Neg logit contributions
 sizes-0.042
ances-0.042
 cultures-0.041
 planets-0.039
 nations-0.039
TokenM
Token ID44
Feature activation+7.28e-02

Pos logit contributions
ances+0.046
 sizes+0.045
 cultures+0.044
arians+0.043
 nations+0.042
Neg logit contributions
O-0.014
M-0.012
 Jane-0.012
 aback-0.011
 another-0.011
Tokenia
Token ID544
Feature activation+1.27e-02

Pos logit contributions
O+0.081
 Jane+0.077
uck+0.072
M+0.072
 aback+0.071
Neg logit contributions
 sizes-0.145
 cultures-0.140
ances-0.139
 planets-0.134
arians-0.133
Token and
Token ID290
Feature activation+2.90e-02

Pos logit contributions
O+0.018
 aback+0.017
 Jane+0.017
M+0.016
 scoop+0.016
Neg logit contributions
 sizes-0.020
ances-0.020
 cultures-0.020
 nations-0.019
 planets-0.019
Token noises
Token ID26782
Feature activation-8.51e-02

Pos logit contributions
ances+0.040
 cultures+0.038
 sizes+0.037
 affect+0.037
arians+0.037
Neg logit contributions
 Jane-0.040
 another-0.039
 some-0.039
O-0.037
 aback-0.036
Token and
Token ID290
Feature activation-4.55e-03

Pos logit contributions
 cultures+0.195
ances+0.193
arians+0.187
 habitats+0.185
agons+0.184
Neg logit contributions
 another-0.083
 some-0.080
 over-0.069
 her-0.069
 one-0.068
Token pretending
Token ID23662
Feature activation-6.87e-02

Pos logit contributions
arians+0.007
ances+0.007
agons+0.007
bles+0.007
aults+0.006
Neg logit contributions
 some-0.007
 her-0.007
 Jane-0.007
 found-0.007
 his-0.007
Token to
Token ID284
Feature activation-2.97e-02

Pos logit contributions
ances+0.178
arians+0.173
 cultures+0.172
 sizes+0.168
bles+0.168
Neg logit contributions
 Jane-0.036
 some-0.033
 another-0.031
 her-0.028
 Amy-0.027
Token be
Token ID307
Feature activation-3.62e-02

Pos logit contributions
 sizes+0.051
ances+0.049
 cultures+0.049
 nations+0.047
 planets+0.047
Neg logit contributions
O-0.039
 aback-0.039
 Jane-0.037
uck-0.036
Aid-0.036
Token animals
Token ID4695
Feature activation-1.74e-01

Pos logit contributions
ances+0.078
arians+0.078
 cultures+0.076
agons+0.074
sters+0.074
Neg logit contributions
 another-0.031
 my-0.029
 some-0.029
 Jane-0.029
 found-0.028
Token.
Token ID13
Feature activation-3.86e-05

Pos logit contributions
ances+0.388
 cultures+0.370
 sizes+0.363
agons+0.352
arians+0.351
Neg logit contributions
 another-0.173
 some-0.158
 away-0.142
 her-0.141
 one-0.129
Token Tom
Token ID4186
Feature activation+6.16e-03

Pos logit contributions
 sizes+0.000
 cultures+0.000
ances+0.000
 nations+0.000
 ABC+0.000
Neg logit contributions
O-0.000
 aback-0.000
Aid-0.000
eh-0.000
ray-0.000
Token was
Token ID373
Feature activation-1.58e-02

Pos logit contributions
O+0.007
 Jane+0.007
 aback+0.007
 scoop+0.006
M+0.006
Neg logit contributions
ances-0.012
 sizes-0.011
 cultures-0.011
 nations-0.011
arians-0.011
Token a
Token ID257
Feature activation-6.58e-03

Pos logit contributions
 sizes+0.033
ances+0.032
 cultures+0.032
 nations+0.031
 planets+0.031
Neg logit contributions
O-0.015
 aback-0.014
 Jane-0.014
uck-0.013
M-0.013
Token lion
Token ID18744
Feature activation-4.87e-02

Pos logit contributions
 sizes+0.011
ances+0.011
 cultures+0.011
 nations+0.011
 planets+0.011
Neg logit contributions
O-0.009
 aback-0.008
 Jane-0.008
uck-0.008
M-0.008
Token farm
Token ID5318
Feature activation-7.30e-02

Pos logit contributions
ances+0.080
 sizes+0.077
 affect+0.074
bles+0.074
 cultures+0.073
Neg logit contributions
 Jane-0.053
 another-0.051
 my-0.050
 some-0.048
 half-0.048
Token.
Token ID13
Feature activation-1.65e-02

Pos logit contributions
ances+0.135
 affect+0.129
 cultures+0.128
 sizes+0.127
 achieve+0.125
Neg logit contributions
 Jane-0.090
 another-0.085
 some-0.082
 Amy-0.081
 my-0.080
Token We
Token ID775
Feature activation+1.53e-02

Pos logit contributions
 sizes+0.029
ances+0.029
 cultures+0.029
 nations+0.028
 planets+0.027
Neg logit contributions
O-0.021
 aback-0.019
 Jane-0.019
M-0.018
uck-0.018
Token love
Token ID1842
Feature activation-3.41e-02

Pos logit contributions
O+0.018
 aback+0.017
 scoop+0.017
 found+0.016
 Jane+0.016
Neg logit contributions
ances-0.028
 sizes-0.028
 cultures-0.028
 planets-0.027
arians-0.027
Token our
Token ID674
Feature activation-4.65e-02

Pos logit contributions
 sizes+0.070
ances+0.069
 cultures+0.068
 nations+0.066
 planets+0.065
Neg logit contributions
O-0.034
 aback-0.031
 Jane-0.031
M-0.029
uck-0.029
Token animals
Token ID4695
Feature activation-1.74e-01

Pos logit contributions
ances+0.088
 sizes+0.085
bles+0.084
 affect+0.083
 achieve+0.083
Neg logit contributions
 another-0.053
 Jane-0.053
 my-0.053
 scoop-0.049
 some-0.049
Token.
Token ID13
Feature activation-1.72e-02

Pos logit contributions
ances+0.331
 sizes+0.322
 cultures+0.321
arians+0.303
 nations+0.302
Neg logit contributions
 another-0.195
 Jane-0.188
 my-0.187
 some-0.185
O-0.176
Token We
Token ID775
Feature activation+1.35e-02

Pos logit contributions
ances+0.031
 sizes+0.031
 cultures+0.030
 nations+0.029
bles+0.029
Neg logit contributions
O-0.021
 Jane-0.020
 aback-0.019
M-0.019
 another-0.018
Token love
Token ID1842
Feature activation-3.51e-02

Pos logit contributions
O+0.015
 aback+0.014
 scoop+0.014
 found+0.014
 Jane+0.014
Neg logit contributions
ances-0.025
 sizes-0.025
 cultures-0.025
 planets-0.024
arians-0.024
Token our
Token ID674
Feature activation-4.75e-02

Pos logit contributions
 sizes+0.074
ances+0.073
 cultures+0.072
 nations+0.070
 planets+0.069
Neg logit contributions
O-0.033
 Jane-0.030
 aback-0.030
M-0.028
uck-0.028
Token barn
Token ID25203
Feature activation-3.59e-02

Pos logit contributions
ances+0.088
 sizes+0.086
bles+0.086
 achieve+0.083
 affect+0.083
Neg logit contributions
 another-0.055
 my-0.054
 Jane-0.054
 scoop-0.051
 some-0.051
Token noises
Token ID26782
Feature activation-8.52e-02

Pos logit contributions
ances+0.041
 cultures+0.040
 sizes+0.038
 affect+0.038
arians+0.038
Neg logit contributions
 Jane-0.039
 another-0.038
 some-0.038
O-0.036
 aback-0.036
Token and
Token ID290
Feature activation-4.35e-03

Pos logit contributions
 cultures+0.192
ances+0.190
arians+0.182
 habitats+0.182
agons+0.181
Neg logit contributions
 another-0.084
 some-0.081
 over-0.070
 one-0.070
 Jane-0.069
Token pretending
Token ID23662
Feature activation-6.87e-02

Pos logit contributions
arians+0.007
ances+0.006
agons+0.006
bles+0.006
aults+0.006
Neg logit contributions
 some-0.007
 Jane-0.007
 her-0.007
 found-0.007
 his-0.006
Token to
Token ID284
Feature activation-3.06e-02

Pos logit contributions
ances+0.181
 cultures+0.175
arians+0.175
 sizes+0.171
bles+0.170
Neg logit contributions
 Jane-0.033
 some-0.029
 another-0.027
 Amy-0.024
O-0.023
Token be
Token ID307
Feature activation-3.68e-02

Pos logit contributions
 sizes+0.051
ances+0.050
 cultures+0.049
 nations+0.048
 planets+0.048
Neg logit contributions
O-0.042
 aback-0.041
 Jane-0.039
Aid-0.038
uck-0.038
Token animals
Token ID4695
Feature activation-1.74e-01

Pos logit contributions
ances+0.080
arians+0.079
 cultures+0.077
 affect+0.075
 achieve+0.075
Neg logit contributions
 another-0.032
 Jane-0.030
 some-0.029
 found-0.029
 my-0.029
Token.
Token ID13
Feature activation-4.00e-04

Pos logit contributions
ances+0.385
 cultures+0.369
 sizes+0.354
 ABC+0.351
agons+0.347
Neg logit contributions
 another-0.174
 some-0.160
 away-0.145
 her-0.141
 one-0.134
Token Lily
Token ID20037
Feature activation+4.08e-02

Pos logit contributions
 sizes+0.001
girls+0.001
ances+0.001
 cultures+0.001
 planets+0.001
Neg logit contributions
O-0.000
 aback-0.000
Aid-0.000
uck-0.000
eh-0.000
Token had
Token ID550
Feature activation+2.42e-02

Pos logit contributions
 Jane+0.042
O+0.041
 found+0.040
 aback+0.040
 scoop+0.039
Neg logit contributions
ances-0.082
 cultures-0.079
 sizes-0.079
 nations-0.076
 affect-0.075
Token a
Token ID257
Feature activation+2.36e-03

Pos logit contributions
O+0.014
 aback+0.012
 Jane+0.012
uck+0.010
M+0.010
Neg logit contributions
 sizes-0.060
ances-0.059
 cultures-0.059
 nations-0.057
 planets-0.057
Token yellow
Token ID7872
Feature activation+4.84e-02

Pos logit contributions
O+0.003
 Jane+0.003
 aback+0.003
 scoop+0.003
M+0.003
Neg logit contributions
 sizes-0.004
ances-0.004
 cultures-0.004
 nations-0.004
 planets-0.004
Token noises
Token ID26782
Feature activation-8.55e-02

Pos logit contributions
ances+0.042
 cultures+0.040
 sizes+0.039
 affect+0.039
arians+0.039
Neg logit contributions
 Jane-0.038
 another-0.037
 some-0.036
O-0.035
 aback-0.035
Token and
Token ID290
Feature activation-4.90e-03

Pos logit contributions
 cultures+0.197
ances+0.194
arians+0.187
agons+0.187
 habitats+0.187
Neg logit contributions
 another-0.083
 some-0.079
 one-0.069
 over-0.068
 her-0.066
Token pretending
Token ID23662
Feature activation-6.85e-02

Pos logit contributions
arians+0.008
ances+0.007
agons+0.007
bles+0.007
aults+0.007
Neg logit contributions
 some-0.008
 Jane-0.008
 her-0.008
 found-0.007
 his-0.007
Token to
Token ID284
Feature activation-3.04e-02

Pos logit contributions
ances+0.181
arians+0.176
 cultures+0.175
 sizes+0.171
bles+0.170
Neg logit contributions
 Jane-0.032
 some-0.029
 another-0.027
 Amy-0.023
 her-0.023
Token be
Token ID307
Feature activation-3.68e-02

Pos logit contributions
 sizes+0.051
ances+0.050
 cultures+0.049
 nations+0.048
 planets+0.048
Neg logit contributions
O-0.041
 aback-0.040
 Jane-0.038
Aid-0.038
uck-0.038
Token animals
Token ID4695
Feature activation-1.74e-01

Pos logit contributions
ances+0.083
arians+0.082
 cultures+0.080
 affect+0.079
agons+0.078
Neg logit contributions
 another-0.029
 Jane-0.027
 some-0.026
 found-0.026
 my-0.025
Token.
Token ID13
Feature activation-6.79e-04

Pos logit contributions
ances+0.389
 cultures+0.372
 sizes+0.358
 ABC+0.353
agons+0.352
Neg logit contributions
 another-0.173
 some-0.157
 away-0.141
 her-0.139
 one-0.133
Token Lily
Token ID20037
Feature activation+4.07e-02

Pos logit contributions
 sizes+0.001
ances+0.001
 cultures+0.001
 nations+0.001
girls+0.001
Neg logit contributions
O-0.001
 aback-0.001
Aid-0.001
ray-0.001
uck-0.001
Token had
Token ID550
Feature activation+2.47e-02

Pos logit contributions
 Jane+0.040
O+0.040
 found+0.038
 scoop+0.038
 aback+0.038
Neg logit contributions
ances-0.083
 sizes-0.081
 cultures-0.081
 nations-0.077
 affect-0.077
Token a
Token ID257
Feature activation+1.80e-03

Pos logit contributions
O+0.015
 aback+0.013
 Jane+0.013
uck+0.012
Aid+0.012
Neg logit contributions
 sizes-0.060
ances-0.059
 cultures-0.059
 planets-0.057
 nations-0.057
Token doll
Token ID3654
Feature activation+2.45e-02

Pos logit contributions
O+0.002
 Jane+0.002
 aback+0.002
 scoop+0.002
M+0.002
Neg logit contributions
 sizes-0.003
ances-0.003
 cultures-0.003
 nations-0.003
 planets-0.003
Token mountains
Token ID12269
Feature activation-6.61e-02

Pos logit contributions
O+0.074
 aback+0.067
 Jane+0.064
uck+0.063
Aid+0.063
Neg logit contributions
 sizes-0.174
ances-0.172
 cultures-0.171
 planets-0.167
 nations-0.166
Token and
Token ID290
Feature activation+9.58e-03

Pos logit contributions
 cultures+0.148
ances+0.144
 sizes+0.143
agons+0.141
arians+0.141
Neg logit contributions
 another-0.055
 Jane-0.054
 some-0.052
 away-0.049
out-0.049
Token oceans
Token ID23744
Feature activation-8.09e-02

Pos logit contributions
O+0.013
 aback+0.012
inn+0.011
uck+0.011
Aid+0.011
Neg logit contributions
 sizes-0.017
 cultures-0.016
 planets-0.016
ances-0.016
 nations-0.016
Token until
Token ID1566
Feature activation-2.94e-03

Pos logit contributions
agons+0.172
 cultures+0.172
ances+0.170
arians+0.167
 sizes+0.163
Neg logit contributions
 another-0.077
 some-0.076
 Jane-0.074
 found-0.073
 away-0.071
Token they
Token ID484
Feature activation-2.62e-02

Pos logit contributions
 sizes+0.006
 planets+0.006
 cultures+0.006
ances+0.006
 nations+0.005
Neg logit contributions
O-0.003
 aback-0.003
Aid-0.003
ray-0.003
eh-0.003
Token reached
Token ID4251
Feature activation+9.30e-02

Pos logit contributions
 sizes+0.047
ances+0.047
 nations+0.046
 planets+0.046
 cultures+0.046
Neg logit contributions
O-0.029
 aback-0.029
Aid-0.028
 Jane-0.027
uck-0.027
Token a
Token ID257
Feature activation-1.34e-02

Pos logit contributions
O+0.099
Aid+0.088
 aback+0.088
inn+0.082
uck+0.082
Neg logit contributions
 planets-0.194
 sizes-0.186
 nations-0.183
 cultures-0.181
 countries-0.181
Token beautiful
Token ID4950
Feature activation+1.09e-02

Pos logit contributions
 sizes+0.023
 cultures+0.023
ances+0.023
 planets+0.022
 nations+0.022
Neg logit contributions
O-0.016
 aback-0.016
 Jane-0.016
uck-0.015
Aid-0.015
Token place
Token ID1295
Feature activation-2.09e-02

Pos logit contributions
O+0.018
 aback+0.018
 Jane+0.017
uck+0.017
Aid+0.017
Neg logit contributions
 sizes-0.015
 cultures-0.014
ances-0.014
 planets-0.014
 nations-0.014
Token.
Token ID13
Feature activation+2.06e-03

Pos logit contributions
 sizes+0.034
ances+0.033
 cultures+0.033
 nations+0.031
 planets+0.031
Neg logit contributions
O-0.029
 aback-0.028
 Jane-0.028
M-0.026
uck-0.026
Token\n
Token ID198
Feature activation+7.79e-03

Pos logit contributions
O+0.003
out+0.003
 ingredient+0.003
rous+0.003
eh+0.003
Neg logit contributions
bows-0.003
 sizes-0.003
 ABC-0.003
 planets-0.003
 Pole-0.003
Token opened
Token ID4721
Feature activation+4.32e-02

Pos logit contributions
 sizes+0.078
ances+0.075
 cultures+0.072
 nations+0.072
 planets+0.071
Neg logit contributions
 aback-0.074
O-0.073
uck-0.070
Aid-0.070
eh-0.067
Token the
Token ID262
Feature activation-1.74e-02

Pos logit contributions
O+0.050
eh+0.050
Aid+0.048
Da+0.045
ray+0.045
Neg logit contributions
 sizes-0.083
 planets-0.083
 nations-0.080
 countries-0.079
 cultures-0.079
Token cup
Token ID6508
Feature activation+6.92e-02

Pos logit contributions
 sizes+0.035
ances+0.034
 cultures+0.033
 nations+0.032
bles+0.032
Neg logit contributions
O-0.018
 Jane-0.017
 aback-0.017
Da-0.016
Aid-0.016
Tokenboard
Token ID3526
Feature activation+2.16e-02

Pos logit contributions
O+0.114
 aback+0.107
Aid+0.106
eh+0.102
Da+0.102
Neg logit contributions
 sizes-0.096
ances-0.092
bows-0.089
 cultures-0.089
 nations-0.088
Token and
Token ID290
Feature activation+2.28e-02

Pos logit contributions
O+0.037
 aback+0.035
Aid+0.034
 Jane+0.034
Da+0.033
Neg logit contributions
 sizes-0.029
ances-0.028
 cultures-0.027
 nations-0.026
 planets-0.026
Token took
Token ID1718
Feature activation+6.22e-02

Pos logit contributions
O+0.032
 aback+0.032
Aid+0.030
 single+0.029
uck+0.029
Neg logit contributions
 planets-0.036
 flags-0.035
 dancers-0.034
 nations-0.034
 countries-0.034
Token out
Token ID503
Feature activation+8.24e-02

Pos logit contributions
O+0.077
Da+0.073
eh+0.073
Aid+0.068
Gu+0.067
Neg logit contributions
 sizes-0.114
 planets-0.113
 flags-0.111
 nations-0.111
 countries-0.109
Token some
Token ID617
Feature activation+5.59e-02

Pos logit contributions
O+0.089
Aid+0.082
 aback+0.080
eh+0.077
uck+0.075
Neg logit contributions
 nations-0.166
 sizes-0.165
 planets-0.163
 countries-0.163
 cultures-0.162
Token pots
Token ID32195
Feature activation+2.74e-02

Pos logit contributions
O+0.077
 aback+0.072
 Jane+0.071
M+0.068
uck+0.068
Neg logit contributions
 sizes-0.094
ances-0.092
 cultures-0.090
 nations-0.087
 planets-0.086
Token and
Token ID290
Feature activation-5.53e-03

Pos logit contributions
O+0.038
 Jane+0.038
 aback+0.038
uck+0.036
 scoop+0.036
Neg logit contributions
 cultures-0.043
 sizes-0.043
ances-0.042
 planets-0.041
 nations-0.041
Token pans
Token ID36209
Feature activation+8.00e-04

Pos logit contributions
 sizes+0.007
ances+0.005
 planets+0.005
 flags+0.005
 cultures+0.005
Neg logit contributions
 aback-0.012
uck-0.011
O-0.011
 unavoidable-0.011
Aid-0.011
Token's
Token ID338
Feature activation+1.01e-02

Pos logit contributions
O+0.081
 aback+0.078
 Jane+0.077
Aid+0.074
inn+0.073
Neg logit contributions
 sizes-0.121
ances-0.117
 cultures-0.115
 planets-0.113
 nations-0.112
Token friend
Token ID1545
Feature activation-2.66e-02

Pos logit contributions
 another+0.010
O+0.009
 half+0.009
 scoop+0.009
uck+0.009
Neg logit contributions
 sizes-0.023
 cultures-0.021
ances-0.021
girls-0.021
bows-0.020
Token,
Token ID11
Feature activation-1.11e-02

Pos logit contributions
 sizes+0.038
ances+0.037
 cultures+0.036
 nations+0.036
girls+0.035
Neg logit contributions
O-0.042
 aback-0.039
Aid-0.039
eh-0.038
inn-0.038
Token Max
Token ID5436
Feature activation-3.50e-02

Pos logit contributions
 sizes+0.022
girls+0.022
ances+0.021
 cultures+0.021
 nations+0.021
Neg logit contributions
O-0.010
Aid-0.010
inn-0.009
 ingredient-0.009
uck-0.009
Token,
Token ID11
Feature activation-2.07e-02

Pos logit contributions
 sizes+0.059
ances+0.058
 cultures+0.058
 planets+0.054
 nations+0.054
Neg logit contributions
O-0.049
 Jane-0.049
 aback-0.048
 found-0.047
 scoop-0.046
Token saw
Token ID2497
Feature activation+6.93e-02

Pos logit contributions
 sizes+0.041
ances+0.040
 cultures+0.040
 planets+0.039
 nations+0.038
Neg logit contributions
O-0.021
 aback-0.019
Aid-0.019
inn-0.018
 Jane-0.018
Token what
Token ID644
Feature activation-6.14e-03

Pos logit contributions
O+0.058
 aback+0.053
 ingredient+0.052
 scoop+0.051
 Jane+0.051
Neg logit contributions
 sizes-0.147
 cultures-0.144
 planets-0.144
ances-0.143
 nations-0.143
Token happened
Token ID3022
Feature activation+7.36e-03

Pos logit contributions
 sizes+0.011
 cultures+0.010
ances+0.010
 planets+0.010
 nations+0.010
Neg logit contributions
O-0.008
 aback-0.008
Aid-0.007
 Jane-0.007
uck-0.007
Token and
Token ID290
Feature activation+3.81e-03

Pos logit contributions
 another+0.008
 some+0.008
 Jane+0.008
 late+0.008
uck+0.008
Neg logit contributions
 sizes-0.015
ances-0.014
 cultures-0.014
 nations-0.014
arians-0.014
Token ran
Token ID4966
Feature activation+3.64e-03

Pos logit contributions
O+0.004
 aback+0.004
 Jane+0.004
Aid+0.004
uck+0.004
Neg logit contributions
 sizes-0.007
ances-0.007
 cultures-0.007
 planets-0.007
girls-0.007
Token over
Token ID625
Feature activation+8.22e-02

Pos logit contributions
O+0.003
Aid+0.003
M+0.003
ray+0.002
 ingredient+0.002
Neg logit contributions
 sizes-0.008
ances-0.008
 cultures-0.008
girls-0.007
bows-0.007
Token the
Token ID262
Feature activation-5.43e-02

Pos logit contributions
 sizes+0.021
ances+0.020
 cultures+0.020
 nations+0.019
 planets+0.019
Neg logit contributions
O-0.008
 aback-0.007
 Jane-0.007
uck-0.007
M-0.007
Token store
Token ID3650
Feature activation-1.80e-02

Pos logit contributions
 sizes+0.099
ances+0.098
 cultures+0.095
 nations+0.092
girls+0.092
Neg logit contributions
O-0.067
 Jane-0.062
 aback-0.061
 scoop-0.060
 another-0.059
Token.
Token ID13
Feature activation-4.75e-03

Pos logit contributions
 sizes+0.034
ances+0.034
 cultures+0.033
girls+0.033
 planets+0.033
Neg logit contributions
 Jane-0.020
 another-0.020
 some-0.019
 aback-0.019
 found-0.018
Token They
Token ID1119
Feature activation-5.61e-02

Pos logit contributions
 sizes+0.008
girls+0.007
 cultures+0.007
ances+0.007
 ABC+0.006
Neg logit contributions
O-0.006
 ingredient-0.006
 aback-0.006
out-0.006
Aid-0.006
Token walked
Token ID6807
Feature activation-7.24e-03

Pos logit contributions
 sizes+0.092
ances+0.090
 cultures+0.088
 planets+0.086
 nations+0.085
Neg logit contributions
O-0.077
 aback-0.075
uck-0.073
Aid-0.070
 ingredient-0.069
Token down
Token ID866
Feature activation+5.89e-02

Pos logit contributions
 sizes+0.013
 cultures+0.013
ances+0.013
 planets+0.013
 nations+0.013
Neg logit contributions
O-0.009
Aid-0.008
 aback-0.008
 ingredient-0.007
ray-0.007
Token the
Token ID262
Feature activation-3.05e-02

Pos logit contributions
O+0.066
Aid+0.060
eh+0.058
 aback+0.055
Da+0.054
Neg logit contributions
 cultures-0.109
 nations-0.108
 countries-0.107
 sizes-0.107
ances-0.107
Token a
Token ID257
Feature activation-8.52e-03

Pos logit contributions
 sizes+0.031
ances+0.031
 cultures+0.030
 nations+0.028
 planets+0.028
Neg logit contributions
O-0.061
 Jane-0.059
 aback-0.059
 scoop-0.057
uck-0.057
Tokenis
Token ID271
Feature activation+1.36e-02

Pos logit contributions
 sizes+0.014
ances+0.014
 cultures+0.014
 nations+0.013
 planets+0.013
Neg logit contributions
O-0.011
 aback-0.011
 Jane-0.011
uck-0.010
M-0.010
Tokenles
Token ID829
Feature activation+1.86e-02

Pos logit contributions
eh+0.016
out+0.015
 scoop+0.015
O+0.014
 some+0.014
Neg logit contributions
 cultures-0.025
 sizes-0.025
girls-0.025
 planets-0.024
 achieve-0.024
Token and
Token ID290
Feature activation+1.49e-02

Pos logit contributions
 another+0.020
 some+0.019
 scoop+0.019
 Jane+0.019
 found+0.019
Neg logit contributions
 sizes-0.036
ances-0.036
 cultures-0.036
girls-0.034
 affect-0.033
Token it
Token ID340
Feature activation+7.98e-02

Pos logit contributions
 sizes+0.015
ances+0.015
 cultures+0.015
 nations+0.014
 planets+0.014
Neg logit contributions
O-0.004
 aback-0.004
 Jane-0.004
M-0.003
 scoop-0.003
Token,
Token ID11
Feature activation-3.24e-02

Pos logit contributions
O+0.090
 another+0.088
 aback+0.088
 Jane+0.087
 scoop+0.083
Neg logit contributions
 sizes-0.150
ances-0.149
 cultures-0.148
 nations-0.142
 planets-0.141
Token and
Token ID290
Feature activation+5.28e-03

Pos logit contributions
ances+0.057
 cultures+0.055
 sizes+0.055
 nations+0.054
girls+0.053
Neg logit contributions
 Jane-0.043
O-0.042
 scoop-0.041
 another-0.040
 some-0.040
Token then
Token ID788
Feature activation+2.28e-02

Pos logit contributions
O+0.007
 aback+0.007
Aid+0.006
uck+0.006
inn+0.006
Neg logit contributions
 sizes-0.009
ances-0.008
 cultures-0.008
 planets-0.008
 nations-0.008
Token he
Token ID339
Feature activation+5.01e-03

Pos logit contributions
O+0.029
 aback+0.028
 Jane+0.027
M+0.026
uck+0.026
Neg logit contributions
 sizes-0.040
ances-0.039
 cultures-0.039
 nations-0.037
 planets-0.037
Token put
Token ID1234
Feature activation+6.21e-02

Pos logit contributions
O+0.006
 aback+0.005
 Jane+0.005
Aid+0.005
uck+0.005
Neg logit contributions
 sizes-0.010
ances-0.009
 cultures-0.009
 nations-0.009
 planets-0.009
Token on
Token ID319
Feature activation+4.16e-02

Pos logit contributions
Da+0.090
O+0.082
Gu+0.081
ray+0.078
eh+0.078
Neg logit contributions
bles-0.099
 medals-0.092
 sizes-0.092
 planets-0.091
 flags-0.087
Token his
Token ID465
Feature activation-4.84e-03

Pos logit contributions
O+0.034
 aback+0.032
Da+0.031
Aid+0.030
ray+0.029
Neg logit contributions
 sizes-0.089
 cultures-0.086
 planets-0.086
 ABC-0.085
ances-0.085
Token buckle
Token ID44983
Feature activation-2.02e-03

Pos logit contributions
 sizes+0.008
ances+0.008
 cultures+0.008
 planets+0.008
 nations+0.008
Neg logit contributions
O-0.006
 aback-0.006
 Jane-0.006
uck-0.006
Da-0.006
Token and
Token ID290
Feature activation+7.92e-03

Pos logit contributions
 sizes+0.004
 cultures+0.004
ances+0.004
 nations+0.004
girls+0.004
Neg logit contributions
 another-0.002
 Jane-0.002
 aback-0.002
O-0.002
 some-0.002
Token took
Token ID1718
Feature activation+5.52e-02

Pos logit contributions
O+0.010
 aback+0.009
 Jane+0.009
uck+0.009
M+0.009
Neg logit contributions
 sizes-0.014
ances-0.014
 cultures-0.014
 planets-0.013
 nations-0.013
Token "
Token ID366
Feature activation-8.95e-03

Pos logit contributions
 sizes+0.093
ances+0.092
 cultures+0.091
 nations+0.088
 planets+0.087
Neg logit contributions
O-0.059
 Jane-0.055
 aback-0.054
M-0.053
 scoop-0.052
TokenWho
Token ID8241
Feature activation+5.67e-02

Pos logit contributions
 sizes+0.017
ances+0.017
 cultures+0.016
 nations+0.016
 countries+0.016
Neg logit contributions
 Jane-0.010
 aback-0.010
 ingredient-0.009
eh-0.009
Aid-0.009
Token did
Token ID750
Feature activation-6.94e-02

Pos logit contributions
O+0.091
 aback+0.088
 Jane+0.085
inn+0.084
Aid+0.083
Neg logit contributions
 sizes-0.079
ances-0.078
 cultures-0.078
 nations-0.077
 countries-0.074
Token this
Token ID428
Feature activation+2.65e-02

Pos logit contributions
 sizes+0.145
ances+0.143
 cultures+0.142
 nations+0.137
 planets+0.137
Neg logit contributions
O-0.065
 aback-0.060
 Jane-0.060
M-0.055
uck-0.055
Token?
Token ID30
Feature activation+1.44e-02

Pos logit contributions
O+0.031
 aback+0.029
Aid+0.028
 Jane+0.028
uck+0.027
Neg logit contributions
 sizes-0.050
ances-0.048
 cultures-0.048
 nations-0.047
 planets-0.046
Token Who
Token ID5338
Feature activation+4.36e-02

Pos logit contributions
O+0.015
 aback+0.014
 Jane+0.014
eh+0.014
Da+0.013
Neg logit contributions
 sizes-0.028
 cultures-0.027
ances-0.027
 planets-0.026
girls-0.026
Token ruined
Token ID20484
Feature activation-5.87e-03

Pos logit contributions
O+0.067
 aback+0.065
 Jane+0.063
Aid+0.062
inn+0.061
Neg logit contributions
 sizes-0.064
ances-0.063
 cultures-0.062
 nations-0.061
 planets-0.060
Token my
Token ID616
Feature activation+1.78e-02

Pos logit contributions
ances+0.012
 sizes+0.012
 nations+0.012
 planets+0.012
 cultures+0.012
Neg logit contributions
 aback-0.005
O-0.005
Aid-0.005
Da-0.005
eh-0.005
Token barn
Token ID25203
Feature activation-2.24e-02

Pos logit contributions
 Jane+0.021
 scoop+0.020
 another+0.020
 Sally+0.020
 found+0.019
Neg logit contributions
ances-0.032
 sizes-0.032
girls-0.032
arians-0.032
aults-0.031
Token?"
Token ID1701
Feature activation+5.43e-02

Pos logit contributions
 sizes+0.043
ances+0.043
 cultures+0.042
 nations+0.040
 planets+0.040
Neg logit contributions
O-0.025
 aback-0.023
 Jane-0.022
Aid-0.022
Da-0.021
Token He
Token ID679
Feature activation-3.44e-02

Pos logit contributions
O+0.053
 aback+0.049
 Jane+0.048
uck+0.046
Aid+0.046
Neg logit contributions
 sizes-0.111
ances-0.110
 cultures-0.109
 nations-0.105
 planets-0.105
Token them
Token ID606
Feature activation+1.04e-02

Pos logit contributions
O+0.028
eh+0.028
Aid+0.026
Da+0.025
ray+0.024
Neg logit contributions
 sizes-0.048
 cultures-0.046
 planets-0.046
 nations-0.046
girls-0.046
Token to
Token ID284
Feature activation+1.99e-02

Pos logit contributions
O+0.007
 aback+0.006
 Jane+0.006
M+0.005
uck+0.005
Neg logit contributions
 sizes-0.025
ances-0.024
 cultures-0.024
 nations-0.023
 planets-0.023
Token the
Token ID262
Feature activation-3.89e-02

Pos logit contributions
O+0.020
 aback+0.018
uck+0.018
Aid+0.017
ray+0.017
Neg logit contributions
 sizes-0.041
 cultures-0.041
ances-0.041
 planets-0.039
 nations-0.039
Token living
Token ID2877
Feature activation-3.69e-02

Pos logit contributions
 sizes+0.068
ances+0.068
 cultures+0.067
 nations+0.064
girls+0.064
Neg logit contributions
O-0.050
 Jane-0.047
 aback-0.047
M-0.045
 scoop-0.045
Token room
Token ID2119
Feature activation+1.79e-03

Pos logit contributions
 planets+0.047
 sizes+0.046
 nations+0.046
 countries+0.045
 cultures+0.045
Neg logit contributions
O-0.070
Aid-0.064
Da-0.064
 aback-0.063
eh-0.061
Token.
Token ID13
Feature activation+2.11e-03

Pos logit contributions
 Jane+0.002
 another+0.002
 scoop+0.002
 some+0.002
 aback+0.002
Neg logit contributions
 cultures-0.003
 sizes-0.003
ances-0.003
 planets-0.003
girls-0.003
Token They
Token ID1119
Feature activation-5.46e-02

Pos logit contributions
O+0.003
 aback+0.003
Aid+0.003
eh+0.003
ray+0.003
Neg logit contributions
 sizes-0.003
 cultures-0.003
ances-0.003
 nations-0.003
 countries-0.003
Token started
Token ID2067
Feature activation+2.07e-03

Pos logit contributions
 sizes+0.106
ances+0.105
 cultures+0.103
 nations+0.101
 planets+0.101
Neg logit contributions
O-0.060
 aback-0.056
 Jane-0.054
uck-0.053
Aid-0.052
Token to
Token ID284
Feature activation+2.78e-02

Pos logit contributions
O+0.001
 aback+0.001
 Jane+0.001
M+0.001
uck+0.001
Neg logit contributions
 sizes-0.005
ances-0.005
 cultures-0.005
 nations-0.005
 planets-0.005
Token build
Token ID1382
Feature activation-2.96e-02

Pos logit contributions
O+0.041
 aback+0.039
 Jane+0.038
uck+0.038
M+0.037
Neg logit contributions
 sizes-0.044
ances-0.043
 cultures-0.042
 planets-0.041
 nations-0.040
Token a
Token ID257
Feature activation-2.05e-02

Pos logit contributions
ances+0.070
 cultures+0.068
arians+0.068
 affect+0.067
 sizes+0.066
Neg logit contributions
 another-0.023
 some-0.023
 Jane-0.021
 half-0.019
 found-0.018
Token the
Token ID262
Feature activation-5.95e-02

Pos logit contributions
O+0.012
 aback+0.011
 Jane+0.010
uck+0.009
Aid+0.009
Neg logit contributions
 sizes-0.049
ances-0.048
 cultures-0.048
 planets-0.047
 nations-0.046
Token little
Token ID1310
Feature activation-2.63e-02

Pos logit contributions
ances+0.109
 sizes+0.107
 cultures+0.107
girls+0.107
arians+0.106
Neg logit contributions
O-0.071
 Jane-0.070
 another-0.067
 scoop-0.064
M-0.064
Token girl
Token ID2576
Feature activation-2.05e-02

Pos logit contributions
 sizes+0.038
 planets+0.036
 cultures+0.035
ances+0.035
bows+0.034
Neg logit contributions
O-0.043
 aback-0.041
inn-0.040
eh-0.040
Aid-0.039
Token.
Token ID13
Feature activation+2.48e-03

Pos logit contributions
ances+0.039
 cultures+0.038
 sizes+0.037
arians+0.037
agons+0.036
Neg logit contributions
 Jane-0.024
 another-0.023
 some-0.022
 her-0.021
 found-0.021
Token
Token ID220
Feature activation+1.40e-02

Pos logit contributions
O+0.003
 aback+0.003
eh+0.002
 scoop+0.002
Aid+0.002
Neg logit contributions
 sizes-0.005
 cultures-0.005
ances-0.005
 planets-0.004
 nations-0.004
Token\n
Token ID198
Feature activation+6.75e-03

Pos logit contributions
O+0.016
 aback+0.015
 scoop+0.014
Aid+0.014
 Jane+0.014
Neg logit contributions
 sizes-0.026
ances-0.025
 cultures-0.025
girls-0.024
bles-0.024
Token\n
Token ID198
Feature activation+7.97e-03

Pos logit contributions
O+0.006
 aback+0.006
 Jane+0.006
M+0.006
uck+0.006
Neg logit contributions
 sizes-0.014
ances-0.014
 cultures-0.014
 nations-0.013
 planets-0.013
TokenWhen
Token ID2215
Feature activation-4.23e-02

Pos logit contributions
O+0.010
 aback+0.010
 Jane+0.010
uck+0.009
 scoop+0.009
Neg logit contributions
 sizes-0.014
ances-0.014
 cultures-0.013
 nations-0.013
 planets-0.013
Token it
Token ID340
Feature activation+7.91e-02

Pos logit contributions
 planets+0.083
 sizes+0.083
 cultures+0.082
 nations+0.081
ances+0.081
Neg logit contributions
 aback-0.041
O-0.040
eh-0.038
Aid-0.037
ray-0.037
Token was
Token ID373
Feature activation-1.68e-02

Pos logit contributions
 aback+0.116
Aid+0.114
eh+0.109
O+0.108
ray+0.107
Neg logit contributions
 planets-0.105
 nations-0.103
 sizes-0.102
 achieve-0.102
bles-0.101
Token time
Token ID640
Feature activation-1.77e-02

Pos logit contributions
ances+0.022
 planets+0.021
 sizes+0.021
 nations+0.021
 cultures+0.020
Neg logit contributions
ray-0.027
O-0.027
 aback-0.026
Aid-0.026
 ingredient-0.025
Token.
Token ID13
Feature activation-2.68e-03

Pos logit contributions
 sizes+0.068
ances+0.068
 cultures+0.068
arians+0.065
 nations+0.065
Neg logit contributions
 Jane-0.039
O-0.038
 aback-0.036
 another-0.035
 scoop-0.035
Token<|endoftext|>
Token ID50256
Feature activation-2.88e-02

Pos logit contributions
 sizes+0.004
 cultures+0.004
 nations+0.004
 planets+0.004
girls+0.004
Neg logit contributions
eh-0.004
 scoop-0.004
out-0.004
Aid-0.003
 ingredient-0.003
TokenL
Token ID43
Feature activation+4.50e-02

Pos logit contributions
 sizes+0.062
ances+0.062
 cultures+0.061
 nations+0.059
arians+0.059
Neg logit contributions
O-0.026
 Jane-0.023
 aback-0.023
M-0.022
uck-0.022
Tokenily
Token ID813
Feature activation+6.60e-02

Pos logit contributions
 aback+0.053
O+0.053
 Jane+0.048
Aid+0.048
 Half+0.047
Neg logit contributions
ances-0.081
 sizes-0.079
 cultures-0.079
 nations-0.078
 countries-0.077
Token liked
Token ID8288
Feature activation+7.41e-03

Pos logit contributions
O+0.089
 Jane+0.086
 aback+0.085
 scoop+0.082
 found+0.082
Neg logit contributions
 sizes-0.111
ances-0.110
 cultures-0.108
arians-0.104
 nations-0.103
Token to
Token ID284
Feature activation+2.09e-02

Pos logit contributions
O+0.004
 Jane+0.004
 aback+0.004
 scoop+0.003
uck+0.003
Neg logit contributions
 sizes-0.018
ances-0.018
 cultures-0.018
arians-0.017
 nations-0.017
Token play
Token ID711
Feature activation-4.84e-02

Pos logit contributions
O+0.027
 Jane+0.025
 aback+0.025
 scoop+0.024
M+0.024
Neg logit contributions
 sizes-0.037
ances-0.036
 cultures-0.036
 nations-0.034
 planets-0.034
Token princess
Token ID21752
Feature activation-3.22e-02

Pos logit contributions
 planets+0.074
 nations+0.073
bles+0.073
 countries+0.071
 cultures+0.071
Neg logit contributions
 aback-0.069
O-0.066
 Jane-0.065
izz-0.063
ray-0.063
Token with
Token ID351
Feature activation+6.43e-02

Pos logit contributions
 sizes+0.062
ances+0.061
 cultures+0.060
 affect+0.056
 habitats+0.056
Neg logit contributions
 her-0.035
eh-0.035
 some-0.034
 another-0.034
 found-0.034
Token her
Token ID607
Feature activation+2.36e-02

Pos logit contributions
 aback+0.036
O+0.035
ray+0.029
uck+0.029
Aid+0.028
Neg logit contributions
 sizes-0.161
ances-0.157
 planets-0.157
 cultures-0.156
 nations-0.155
Token dolls
Token ID36062
Feature activation-5.06e-02

Pos logit contributions
 her+0.031
 another+0.031
 my+0.031
 Jane+0.030
 some+0.029
Neg logit contributions
ances-0.040
 sizes-0.039
 cultures-0.038
 achieve-0.038
arians-0.037
Token y
Token ID331
Feature activation+2.05e-02

Pos logit contributions
 aback+0.009
O+0.009
ray+0.008
 Jane+0.008
uck+0.008
Neg logit contributions
 sizes-0.013
ances-0.012
 cultures-0.012
 planets-0.012
bles-0.011
Tokenummy
Token ID13513
Feature activation+3.81e-02

Pos logit contributions
O+0.026
 Jane+0.025
 aback+0.025
uck+0.024
M+0.024
Neg logit contributions
 sizes-0.036
 cultures-0.035
ances-0.035
 planets-0.034
 nations-0.034
Token and
Token ID290
Feature activation+7.14e-03

Pos logit contributions
 Jane+0.047
O+0.043
 ingredient+0.043
 another+0.043
 some+0.043
Neg logit contributions
ances-0.069
 cultures-0.067
 achieve-0.067
 affect-0.066
arians-0.065
Token everyone
Token ID2506
Feature activation-2.02e-02

Pos logit contributions
 aback+0.009
O+0.009
ray+0.008
 Jane+0.008
inn+0.008
Neg logit contributions
 sizes-0.014
ances-0.012
 cultures-0.012
 planets-0.012
bles-0.012
Token wanted
Token ID2227
Feature activation+4.84e-03

Pos logit contributions
 sizes+0.029
 cultures+0.027
 planets+0.026
 nations+0.026
ances+0.026
Neg logit contributions
 aback-0.034
O-0.033
Aid-0.031
uck-0.031
Da-0.030
Token to
Token ID284
Feature activation-3.80e-03

Pos logit contributions
O+0.005
 aback+0.004
eh+0.004
inn+0.004
Aid+0.004
Neg logit contributions
 sizes-0.010
 cultures-0.010
 planets-0.010
ances-0.009
 nations-0.009
Token play
Token ID711
Feature activation-6.94e-02

Pos logit contributions
ances+0.006
 sizes+0.006
 achieve+0.006
girls+0.005
 cultures+0.005
Neg logit contributions
 aback-0.006
O-0.005
ray-0.005
Da-0.005
uck-0.005
Token with
Token ID351
Feature activation+3.35e-02

Pos logit contributions
 planets+0.093
girls+0.087
bles+0.086
 achieve+0.085
 songs+0.085
Neg logit contributions
 aback-0.114
ray-0.104
O-0.103
izz-0.102
Aid-0.101
Token it
Token ID340
Feature activation+7.15e-02

Pos logit contributions
 aback+0.032
inn+0.028
O+0.028
eh+0.027
Da+0.027
Neg logit contributions
 sizes-0.070
 planets-0.069
girls-0.067
ances-0.065
 cultures-0.065
Token.
Token ID13
Feature activation+3.33e-03

Pos logit contributions
O+0.114
 aback+0.109
 Jane+0.107
uck+0.103
Aid+0.103
Neg logit contributions
 sizes-0.104
ances-0.101
 cultures-0.099
 planets-0.095
 nations-0.095
Token But
Token ID887
Feature activation-5.14e-03

Pos logit contributions
O+0.004
 aback+0.004
Aid+0.004
eh+0.004
uck+0.004
Neg logit contributions
 sizes-0.006
 cultures-0.006
 planets-0.006
 nations-0.006
ances-0.006
Token okay
Token ID8788
Feature activation-3.11e-02

Pos logit contributions
ances+0.017
 sizes+0.017
 cultures+0.017
 planets+0.016
girls+0.016
Neg logit contributions
O-0.017
 aback-0.017
 Jane-0.017
 scoop-0.016
uck-0.016
Token to
Token ID284
Feature activation-6.19e-03

Pos logit contributions
 sizes+0.070
ances+0.069
 cultures+0.068
 nations+0.066
 planets+0.066
Neg logit contributions
O-0.025
 aback-0.023
 Jane-0.022
M-0.021
uck-0.020
Token ask
Token ID1265
Feature activation-4.15e-04

Pos logit contributions
ances+0.011
 sizes+0.010
 cultures+0.010
 affect+0.010
 achieve+0.010
Neg logit contributions
O-0.008
 aback-0.008
 Jane-0.008
eh-0.007
Aid-0.007
Token for
Token ID329
Feature activation+4.62e-02

Pos logit contributions
 sizes+0.001
ances+0.001
 cultures+0.001
 nations+0.001
 planets+0.001
Neg logit contributions
O-0.000
 aback-0.000
 Jane-0.000
uck-0.000
Aid-0.000
Token help
Token ID1037
Feature activation-6.30e-02

Pos logit contributions
inn+0.064
O+0.063
 aback+0.061
izz+0.059
uck+0.059
Neg logit contributions
 cultures-0.074
ances-0.072
 planets-0.072
 countries-0.072
 ABC-0.070
Token when
Token ID618
Feature activation-6.48e-02

Pos logit contributions
 sizes+0.107
 cultures+0.105
ances+0.102
 nations+0.102
girls+0.101
Neg logit contributions
O-0.081
 Jane-0.079
 aback-0.079
 another-0.079
uck-0.077
Token she
Token ID673
Feature activation-1.45e-03

Pos logit contributions
 sizes+0.126
ances+0.123
 cultures+0.120
girls+0.118
 nations+0.116
Neg logit contributions
O-0.070
 Jane-0.069
 another-0.065
 my-0.063
uck-0.063
Token needs
Token ID2476
Feature activation+1.88e-02

Pos logit contributions
 sizes+0.002
ances+0.002
 cultures+0.002
 planets+0.002
 nations+0.002
Neg logit contributions
O-0.002
 Jane-0.002
 aback-0.002
Da-0.002
inn-0.002
Token it
Token ID340
Feature activation+8.37e-02

Pos logit contributions
O+0.019
inn+0.019
eh+0.018
faced+0.018
izz+0.017
Neg logit contributions
 planets-0.035
 cultures-0.035
 countries-0.035
 sizes-0.034
ances-0.034
Token.
Token ID13
Feature activation+9.17e-07

Pos logit contributions
O+0.115
 aback+0.110
 Jane+0.109
M+0.104
uck+0.104
Neg logit contributions
 sizes-0.137
ances-0.135
 cultures-0.134
 nations-0.128
 planets-0.128
Token Together
Token ID17083
Feature activation-4.26e-02

Feature not active on previous token; no predictions were affected.
Token shiny
Token ID22441
Feature activation+4.36e-02

Pos logit contributions
O+0.011
 aback+0.010
 Jane+0.010
uck+0.010
Aid+0.010
Neg logit contributions
 sizes-0.014
ances-0.014
 cultures-0.014
 planets-0.013
 nations-0.013
Token.
Token ID13
Feature activation+4.41e-03

Pos logit contributions
 another+0.053
 some+0.052
O+0.050
 Jane+0.050
 my+0.050
Neg logit contributions
ances-0.078
 cultures-0.078
girls-0.076
arians-0.074
 nations-0.074
Token Anna
Token ID11735
Feature activation+3.25e-02

Pos logit contributions
 aback+0.005
O+0.005
eh+0.004
Aid+0.004
 scoop+0.004
Neg logit contributions
 sizes-0.008
 cultures-0.008
girls-0.008
ances-0.008
 nations-0.008
Token wants
Token ID3382
Feature activation+2.02e-02

Pos logit contributions
O+0.046
 aback+0.045
 Jane+0.044
 scoop+0.043
uck+0.043
Neg logit contributions
ances-0.052
 sizes-0.052
 cultures-0.050
 nations-0.048
girls-0.047
Token to
Token ID284
Feature activation-3.70e-03

Pos logit contributions
O+0.024
eh+0.024
inn+0.023
 aback+0.022
izz+0.022
Neg logit contributions
 sizes-0.037
 planets-0.037
 cultures-0.036
 nations-0.036
 ABC-0.035
Token play
Token ID711
Feature activation-6.46e-02

Pos logit contributions
ances+0.005
 sizes+0.005
 achieve+0.005
 affect+0.005
 cultures+0.005
Neg logit contributions
 aback-0.005
Da-0.005
O-0.005
eh-0.005
ray-0.005
Token with
Token ID351
Feature activation+3.83e-02

Pos logit contributions
 sizes+0.097
bles+0.096
 planets+0.094
ances+0.093
 cultures+0.093
Neg logit contributions
O-0.087
 aback-0.087
ray-0.085
Da-0.084
eh-0.084
Token it
Token ID340
Feature activation+6.94e-02

Pos logit contributions
O+0.024
 aback+0.022
 Jane+0.021
M+0.019
uck+0.019
Neg logit contributions
 sizes-0.092
ances-0.091
 cultures-0.090
 planets-0.088
 nations-0.088
Token.
Token ID13
Feature activation+8.79e-03

Pos logit contributions
O+0.097
 Jane+0.093
 aback+0.092
 scoop+0.088
M+0.088
Neg logit contributions
 sizes-0.113
ances-0.112
 cultures-0.111
 nations-0.105
 planets-0.105
Token She
Token ID1375
Feature activation+9.09e-06

Pos logit contributions
O+0.010
eh+0.010
 aback+0.009
Aid+0.009
 ingredient+0.009
Neg logit contributions
 sizes-0.016
 cultures-0.015
girls-0.015
ances-0.015
arians-0.015
Token says
Token ID1139
Feature activation-2.85e-03

Feature not active on previous token; no predictions were affected.
Token\n
Token ID198
Feature activation+8.65e-03

Pos logit contributions
 sizes+0.013
ances+0.013
 cultures+0.013
girls+0.012
 nations+0.012
Neg logit contributions
O-0.011
 aback-0.011
 Jane-0.010
eh-0.010
 scoop-0.010
Token\n
Token ID198
Feature activation+1.03e-02

Pos logit contributions
O+0.009
 aback+0.008
 Jane+0.008
 scoop+0.008
uck+0.008
Neg logit contributions
 sizes-0.017
ances-0.017
 cultures-0.017
girls-0.016
 planets-0.016
TokenThey
Token ID2990
Feature activation-4.38e-02

Pos logit contributions
 aback+0.018
 Jane+0.017
O+0.016
 ingredient+0.016
 Half+0.016
Neg logit contributions
ances-0.013
 sizes-0.013
 cultures-0.013
girls-0.012
 nations-0.012
Token decided
Token ID3066
Feature activation-1.56e-02

Pos logit contributions
 sizes+0.078
ances+0.077
 cultures+0.076
 nations+0.073
 planets+0.072
Neg logit contributions
O-0.055
 Jane-0.052
 aback-0.051
 scoop-0.049
M-0.049
Token to
Token ID284
Feature activation+7.59e-03

Pos logit contributions
 sizes+0.036
ances+0.036
 cultures+0.036
 nations+0.034
 planets+0.034
Neg logit contributions
O-0.011
 aback-0.010
 Jane-0.009
M-0.009
uck-0.009
Token share
Token ID2648
Feature activation-5.03e-02

Pos logit contributions
O+0.009
 Jane+0.008
 scoop+0.008
 aback+0.008
M+0.008
Neg logit contributions
 sizes-0.015
ances-0.014
 cultures-0.014
 planets-0.013
 nations-0.013
Token their
Token ID511
Feature activation-2.77e-02

Pos logit contributions
 sizes+0.095
 cultures+0.093
ances+0.091
 nations+0.090
 planets+0.090
Neg logit contributions
O-0.058
 aback-0.055
Aid-0.052
eh-0.052
uck-0.051
Token toys
Token ID14958
Feature activation-8.36e-02

Pos logit contributions
 sizes+0.042
ances+0.042
 cultures+0.040
arians+0.039
girls+0.038
Neg logit contributions
O-0.042
 Jane-0.040
 scoop-0.039
 another-0.038
 aback-0.038
Token with
Token ID351
Feature activation+4.22e-02

Pos logit contributions
 sizes+0.139
ances+0.138
 cultures+0.135
 planets+0.129
arians+0.128
Neg logit contributions
O-0.112
 Jane-0.110
 aback-0.107
 another-0.107
 scoop-0.104
Token Spot
Token ID15899
Feature activation-3.38e-02

Pos logit contributions
O+0.041
 aback+0.040
Aid+0.037
uck+0.036
eh+0.036
Neg logit contributions
 sizes-0.086
ances-0.085
 cultures-0.084
girls-0.082
 nations-0.082
Token.
Token ID13
Feature activation-5.11e-03

Pos logit contributions
ances+0.066
 cultures+0.064
 sizes+0.063
 nations+0.060
 planets+0.059
Neg logit contributions
 another-0.039
 some-0.037
 Jane-0.035
 half-0.035
O-0.034
Token and
Token ID290
Feature activation+8.61e-03

Pos logit contributions
O+0.033
 found+0.033
 scoop+0.033
M+0.032
 aback+0.031
Neg logit contributions
 cultures-0.050
 sizes-0.050
ances-0.050
 affect-0.047
 nations-0.046
Token Sue
Token ID26089
Feature activation+1.88e-02

Pos logit contributions
Aid+0.013
ction+0.013
rous+0.012
ray+0.012
izz+0.012
Neg logit contributions
 sizes-0.012
 nations-0.012
girls-0.011
 cultures-0.011
 languages-0.011
Token did
Token ID750
Feature activation-6.26e-02

Pos logit contributions
 found+0.022
 scoop+0.021
O+0.019
 another+0.019
M+0.019
Neg logit contributions
 sizes-0.039
 cultures-0.039
ances-0.039
arians-0.036
bows-0.036
Token not
Token ID407
Feature activation+8.05e-03

Pos logit contributions
 sizes+0.119
ances+0.116
 cultures+0.113
arians+0.110
 planets+0.110
Neg logit contributions
 another-0.079
O-0.076
 Jane-0.075
 some-0.074
 aback-0.072
Token see
Token ID766
Feature activation+6.31e-02

Pos logit contributions
O+0.010
 aback+0.009
 Jane+0.009
 scoop+0.008
M+0.008
Neg logit contributions
 sizes-0.015
ances-0.015
 cultures-0.014
 planets-0.014
 nations-0.014
Token the
Token ID262
Feature activation-4.97e-02

Pos logit contributions
O+0.056
 aback+0.051
 Jane+0.049
M+0.047
uck+0.047
Neg logit contributions
 sizes-0.136
ances-0.134
 cultures-0.133
 nations-0.129
 planets-0.129
Token snake
Token ID17522
Feature activation-1.75e-02

Pos logit contributions
 sizes+0.090
ances+0.089
 cultures+0.088
 nations+0.084
 planets+0.084
Neg logit contributions
O-0.061
 aback-0.057
 Jane-0.057
M-0.054
uck-0.054
Token behind
Token ID2157
Feature activation+4.97e-02

Pos logit contributions
 cultures+0.038
ances+0.038
 sizes+0.038
bows+0.036
arians+0.035
Neg logit contributions
 another-0.018
 scoop-0.017
 some-0.016
 away-0.016
 found-0.015
Token them
Token ID606
Feature activation+8.82e-03

Pos logit contributions
O+0.038
 Jane+0.036
 aback+0.033
 another+0.033
M+0.032
Neg logit contributions
 sizes-0.115
ances-0.114
 cultures-0.113
arians-0.108
 nations-0.107
Token.
Token ID13
Feature activation+4.06e-03

Pos logit contributions
 another+0.006
 some+0.006
 her+0.006
 over+0.006
 away+0.005
Neg logit contributions
 cultures-0.022
ances-0.022
 sizes-0.021
arians-0.021
bows-0.021
Token They
Token ID1119
Feature activation-5.24e-02

Pos logit contributions
O+0.006
 aback+0.006
 Jane+0.005
M+0.005
Aid+0.005
Neg logit contributions
 sizes-0.007
ances-0.006
 cultures-0.006
 nations-0.006
 planets-0.006
Token was
Token ID373
Feature activation-2.57e-02

Pos logit contributions
O+0.001
 aback+0.001
 Jane+0.001
uck+0.001
Aid+0.001
Neg logit contributions
 sizes-0.002
ances-0.002
 cultures-0.002
 nations-0.001
 planets-0.001
Token smart
Token ID4451
Feature activation-1.52e-02

Pos logit contributions
 sizes+0.034
 cultures+0.033
ances+0.032
 planets+0.032
 nations+0.031
Neg logit contributions
O-0.043
ray-0.040
Aid-0.040
uck-0.040
 aback-0.039
Token enough
Token ID1576
Feature activation+1.84e-02

Pos logit contributions
ances+0.031
girls+0.030
 cultures+0.030
 affect+0.030
agons+0.030
Neg logit contributions
 Jane-0.016
 aback-0.015
 another-0.015
 her-0.014
 some-0.014
Token to
Token ID284
Feature activation+1.29e-02

Pos logit contributions
 Jane+0.009
 aback+0.009
O+0.008
 another+0.008
 scoop+0.007
Neg logit contributions
 sizes-0.046
ances-0.045
 cultures-0.045
girls-0.045
 planets-0.045
Token know
Token ID760
Feature activation+1.06e-02

Pos logit contributions
 Jane+0.016
O+0.015
 scoop+0.015
 aback+0.015
 another+0.015
Neg logit contributions
 sizes-0.023
ances-0.023
 cultures-0.022
girls-0.022
arians-0.022
Token when
Token ID618
Feature activation-7.20e-02

Pos logit contributions
 Jane+0.010
O+0.009
 another+0.009
 aback+0.009
 some+0.008
Neg logit contributions
 sizes-0.022
ances-0.022
 cultures-0.021
girls-0.021
arians-0.021
Token something
Token ID1223
Feature activation-2.41e-02

Pos logit contributions
 sizes+0.154
ances+0.151
girls+0.148
 cultures+0.146
bles+0.144
Neg logit contributions
 Jane-0.068
O-0.063
 another-0.060
M-0.056
 Sally-0.056
Token was
Token ID373
Feature activation-1.98e-02

Pos logit contributions
 sizes+0.045
ances+0.044
 cultures+0.043
 nations+0.042
 planets+0.042
Neg logit contributions
O-0.029
 aback-0.027
 Jane-0.026
uck-0.025
M-0.025
Token too
Token ID1165
Feature activation-5.85e-03

Pos logit contributions
 sizes+0.035
ances+0.034
 cultures+0.034
 planets+0.033
 nations+0.032
Neg logit contributions
O-0.026
 aback-0.024
uck-0.023
Aid-0.023
M-0.023
Token dangerous
Token ID4923
Feature activation-1.46e-02

Pos logit contributions
 sizes+0.011
ances+0.011
 cultures+0.011
 nations+0.010
 planets+0.010
Neg logit contributions
O-0.007
 aback-0.006
 Jane-0.006
uck-0.006
Aid-0.006
Token.
Token ID13
Feature activation-5.75e-03

Pos logit contributions
 sizes+0.027
 cultures+0.027
ances+0.027
 nations+0.027
 planets+0.026
Neg logit contributions
 Jane-0.016
 aback-0.016
 another-0.015
 some-0.015
 her-0.015
Token more
Token ID517
Feature activation-2.46e-02

Pos logit contributions
 sizes+0.016
 pilots+0.015
 cultures+0.015
 goals+0.015
 nations+0.014
Neg logit contributions
izz-0.018
inn-0.018
 ingredient-0.018
 Jane-0.018
O-0.017
Token careful
Token ID8161
Feature activation-4.55e-02

Pos logit contributions
 sizes+0.043
ances+0.043
 cultures+0.042
 nations+0.041
 planets+0.040
Neg logit contributions
O-0.031
 Jane-0.030
 aback-0.029
uck-0.028
 scoop-0.028
Token and
Token ID290
Feature activation-1.43e-02

Pos logit contributions
arians+0.099
girls+0.099
 cultures+0.097
 nations+0.095
agons+0.095
Neg logit contributions
 some-0.042
 another-0.038
 over-0.038
 half-0.037
uck-0.036
Token respectful
Token ID25923
Feature activation-7.21e-02

Pos logit contributions
 sizes+0.024
 cultures+0.023
ances+0.023
 nations+0.022
 planets+0.022
Neg logit contributions
O-0.020
 aback-0.018
 Jane-0.018
inn-0.018
uck-0.018
Token of
Token ID286
Feature activation-1.89e-02

Pos logit contributions
ances+0.147
aults+0.143
arians+0.143
 cultures+0.142
 sizes+0.141
Neg logit contributions
 another-0.071
 some-0.068
 Jane-0.067
 aback-0.066
 half-0.065
Token animals
Token ID4695
Feature activation-1.50e-01

Pos logit contributions
 sizes+0.036
ances+0.035
 cultures+0.035
 nations+0.034
 planets+0.033
Neg logit contributions
O-0.022
 Jane-0.020
 aback-0.020
 scoop-0.019
 another-0.019
Token.
Token ID13
Feature activation-1.68e-03

Pos logit contributions
ances+0.299
 sizes+0.295
 cultures+0.291
arians+0.285
 ABC+0.282
Neg logit contributions
 another-0.156
 some-0.150
 away-0.143
 her-0.140
 aback-0.140
Token He
Token ID679
Feature activation-3.10e-02

Pos logit contributions
 sizes+0.003
ances+0.003
 cultures+0.003
 planets+0.003
 nations+0.003
Neg logit contributions
O-0.002
 aback-0.002
 Jane-0.002
 scoop-0.002
uck-0.002
Token still
Token ID991
Feature activation+1.46e-02

Pos logit contributions
 sizes+0.051
ances+0.050
 cultures+0.049
 nations+0.047
 planets+0.046
Neg logit contributions
O-0.044
 aback-0.041
 Jane-0.041
 scoop-0.040
M-0.040
Token liked
Token ID8288
Feature activation-3.21e-02

Pos logit contributions
O+0.020
 aback+0.019
 Jane+0.018
 scoop+0.018
M+0.018
Neg logit contributions
ances-0.024
 sizes-0.024
 cultures-0.024
 nations-0.023
arians-0.023
Token to
Token ID284
Feature activation-2.28e-03

Pos logit contributions
ances+0.080
 sizes+0.079
 cultures+0.078
aults+0.078
arians+0.077
Neg logit contributions
O-0.016
 Jane-0.016
 another-0.015
 my-0.014
 some-0.014
Token a
Token ID257
Feature activation-8.26e-03

Pos logit contributions
O+0.016
 aback+0.015
inn+0.014
uck+0.014
 Jane+0.014
Neg logit contributions
 sizes-0.053
 cultures-0.051
ances-0.051
 planets-0.051
 nations-0.049
Token bird
Token ID6512
Feature activation+2.16e-02

Pos logit contributions
ances+0.015
 sizes+0.015
 cultures+0.015
 nations+0.014
arians+0.014
Neg logit contributions
O-0.011
 scoop-0.010
 Jane-0.009
 another-0.009
M-0.009
Token.
Token ID13
Feature activation+5.44e-03

Pos logit contributions
 scoop+0.023
 another+0.022
 some+0.021
 found+0.021
O+0.021
Neg logit contributions
 cultures-0.043
ances-0.043
 sizes-0.042
arians-0.041
 nations-0.040
Token They
Token ID1119
Feature activation-4.98e-02

Pos logit contributions
O+0.005
 aback+0.005
eh+0.005
uck+0.005
Aid+0.005
Neg logit contributions
 sizes-0.011
ances-0.011
 cultures-0.010
girls-0.010
 planets-0.010
Token liked
Token ID8288
Feature activation-2.55e-02

Pos logit contributions
ances+0.089
 cultures+0.088
 sizes+0.087
arians+0.084
agons+0.082
Neg logit contributions
O-0.064
 scoop-0.062
 found-0.062
 Jane-0.061
 another-0.060
Token animals
Token ID4695
Feature activation-1.48e-01

Pos logit contributions
ances+0.061
arians+0.059
bles+0.059
aults+0.059
 achieve+0.058
Neg logit contributions
 Jane-0.017
 another-0.017
 some-0.017
 my-0.016
 her-0.016
Token.
Token ID13
Feature activation+4.87e-04

Pos logit contributions
ances+0.333
arians+0.312
 cultures+0.305
 sizes+0.296
 affect+0.294
Neg logit contributions
 another-0.148
 some-0.145
 her-0.131
 Jane-0.129
 my-0.123
Token\n
Token ID198
Feature activation+1.41e-02

Pos logit contributions
O+0.001
 Jane+0.001
 aback+0.001
M+0.000
uck+0.000
Neg logit contributions
 sizes-0.001
ances-0.001
 cultures-0.001
 nations-0.001
 planets-0.001
Token\n
Token ID198
Feature activation+1.55e-02

Pos logit contributions
O+0.013
 Jane+0.012
 aback+0.012
M+0.012
 scoop+0.011
Neg logit contributions
 sizes-0.030
ances-0.029
 cultures-0.029
 nations-0.028
 planets-0.028
Token"
Token ID1
Feature activation+3.48e-02

Pos logit contributions
O+0.014
M+0.013
 Jane+0.013
 aback+0.013
 scoop+0.012
Neg logit contributions
 sizes-0.033
ances-0.033
 cultures-0.032
 planets-0.031
 nations-0.031
TokenLook
Token ID8567
Feature activation+6.61e-02

Pos logit contributions
O+0.034
 aback+0.032
 Jane+0.031
uck+0.029
 scoop+0.029
Neg logit contributions
 sizes-0.072
ances-0.071
 cultures-0.070
 nations-0.068
 planets-0.068
Token let
Token ID1309
Feature activation+2.12e-02

Pos logit contributions
 sizes+0.097
ances+0.095
 cultures+0.094
 nations+0.091
 planets+0.091
Neg logit contributions
O-0.053
 aback-0.049
 Jane-0.049
M-0.046
uck-0.046
Token's
Token ID338
Feature activation-2.03e-02

Pos logit contributions
inn+0.036
 aback+0.035
year+0.034
Aid+0.034
 ingredient+0.034
Neg logit contributions
girls-0.026
aults-0.025
 countries-0.025
 nations-0.024
 ABC-0.023
Token make
Token ID787
Feature activation-2.31e-02

Pos logit contributions
 sizes+0.033
ances+0.033
 cultures+0.032
 planets+0.031
 nations+0.031
Neg logit contributions
O-0.028
 aback-0.027
 Jane-0.026
M-0.026
Aid-0.025
Token more
Token ID517
Feature activation-3.87e-02

Pos logit contributions
 planets+0.046
 countries+0.045
 nations+0.045
 sizes+0.044
girls+0.044
Neg logit contributions
O-0.023
 aback-0.023
inn-0.022
Aid-0.021
Da-0.021
Token paper
Token ID3348
Feature activation-4.62e-03

Pos logit contributions
ances+0.059
arians+0.059
 affect+0.055
 sizes+0.054
girls+0.053
Neg logit contributions
 Jane-0.057
 my-0.056
 some-0.055
 scoop-0.054
 another-0.053
Token animals
Token ID4695
Feature activation-1.55e-01

Pos logit contributions
ances+0.008
 sizes+0.008
 cultures+0.008
arians+0.007
 nations+0.007
Neg logit contributions
 Jane-0.006
O-0.005
 scoop-0.005
 aback-0.005
 another-0.005
Token!"
Token ID2474
Feature activation+3.62e-02

Pos logit contributions
 sizes+0.291
ances+0.290
 cultures+0.287
 planets+0.274
 nations+0.273
Neg logit contributions
O-0.167
 Jane-0.163
 aback-0.163
 scoop-0.159
 another-0.156
Token Sara
Token ID24799
Feature activation+4.03e-02

Pos logit contributions
O+0.037
 aback+0.035
 Jane+0.033
Aid+0.032
uck+0.032
Neg logit contributions
 sizes-0.071
ances-0.070
 cultures-0.070
 nations-0.068
 planets-0.068
Token said
Token ID531
Feature activation-1.10e-02

Pos logit contributions
O+0.043
 aback+0.040
 Jane+0.040
M+0.037
uck+0.037
Neg logit contributions
 sizes-0.079
ances-0.078
 cultures-0.078
 nations-0.075
 planets-0.075
Token.
Token ID13
Feature activation+1.10e-03

Pos logit contributions
 sizes+0.022
 cultures+0.022
 planets+0.021
 habitats+0.021
 nations+0.021
Neg logit contributions
 Jane-0.012
 another-0.012
uck-0.012
 aback-0.012
 some-0.011
Token\n
Token ID198
Feature activation+1.04e-02

Pos logit contributions
O+0.001
 aback+0.001
 Jane+0.001
 scoop+0.001
uck+0.001
Neg logit contributions
 sizes-0.002
ances-0.002
 cultures-0.002
 nations-0.002
 planets-0.002
Token fun
Token ID1257
Feature activation-2.66e-02

Pos logit contributions
O+0.029
 Jane+0.028
 aback+0.027
 scoop+0.027
 another+0.026
Neg logit contributions
ances-0.065
 sizes-0.064
 cultures-0.064
arians-0.062
 nations-0.061
Token making
Token ID1642
Feature activation-3.28e-02

Pos logit contributions
ances+0.062
 cultures+0.059
arians+0.058
 sizes+0.057
 affect+0.056
Neg logit contributions
 some-0.025
 another-0.024
 Jane-0.021
 its-0.020
sm-0.020
Token noises
Token ID26782
Feature activation-8.80e-02

Pos logit contributions
ances+0.060
 sizes+0.058
 cultures+0.058
arians+0.056
 affect+0.054
Neg logit contributions
 Jane-0.040
O-0.039
 another-0.038
 aback-0.038
 scoop-0.037
Token and
Token ID290
Feature activation-3.21e-03

Pos logit contributions
ances+0.208
 cultures+0.205
arians+0.203
agons+0.200
 habitats+0.194
Neg logit contributions
 another-0.084
 some-0.077
 over-0.067
 late-0.065
 scoop-0.064
Token telling
Token ID5149
Feature activation-3.66e-02

Pos logit contributions
ances+0.005
arians+0.005
agons+0.005
bles+0.005
aults+0.005
Neg logit contributions
 some-0.005
 another-0.005
 eaten-0.005
 found-0.005
 Jane-0.005
Token stories
Token ID3923
Feature activation-1.56e-01

Pos logit contributions
ances+0.057
 sizes+0.056
arians+0.055
 cultures+0.054
bles+0.054
Neg logit contributions
 Jane-0.054
O-0.052
 another-0.051
 some-0.050
 aback-0.049
Token.
Token ID13
Feature activation-9.98e-03

Pos logit contributions
ances+0.310
 cultures+0.302
 sizes+0.298
arians+0.296
agons+0.292
Neg logit contributions
 another-0.165
 Jane-0.163
 some-0.155
 her-0.144
 Amy-0.144
Token\n
Token ID198
Feature activation+8.47e-03

Pos logit contributions
 sizes+0.019
ances+0.018
 cultures+0.018
 nations+0.018
girls+0.017
Neg logit contributions
O-0.012
 aback-0.011
Aid-0.010
 Jane-0.010
eh-0.010
Token\n
Token ID198
Feature activation+1.14e-02

Pos logit contributions
O+0.008
 aback+0.008
 Jane+0.008
M+0.007
uck+0.007
Neg logit contributions
 sizes-0.017
ances-0.017
 cultures-0.017
 nations-0.016
 planets-0.016
TokenBut
Token ID1537
Feature activation+1.04e-02

Pos logit contributions
O+0.015
 aback+0.015
 Jane+0.015
Aid+0.014
uck+0.014
Neg logit contributions
 sizes-0.019
ances-0.019
 cultures-0.018
 nations-0.018
girls-0.018
Token then
Token ID788
Feature activation+2.10e-02

Pos logit contributions
 aback+0.013
O+0.013
Aid+0.013
ray+0.012
eh+0.012
Neg logit contributions
 sizes-0.018
ances-0.017
 planets-0.017
 cultures-0.017
 nations-0.017
Token about
Token ID546
Feature activation-1.08e-02

Pos logit contributions
 sizes+0.050
ances+0.050
 cultures+0.049
girls+0.048
arians+0.048
Neg logit contributions
O-0.025
 Jane-0.025
 aback-0.023
 another-0.023
M-0.022
Token how
Token ID703
Feature activation-2.39e-02

Pos logit contributions
ances+0.023
 sizes+0.022
 cultures+0.022
arians+0.022
girls+0.022
Neg logit contributions
O-0.010
 Jane-0.009
 another-0.009
 aback-0.009
 my-0.009
Token to
Token ID284
Feature activation-2.31e-02

Pos logit contributions
 sizes+0.051
ances+0.050
 cultures+0.050
 nations+0.048
 planets+0.048
Neg logit contributions
O-0.022
 Jane-0.020
 aback-0.020
M-0.018
uck-0.018
Token make
Token ID787
Feature activation-4.12e-02

Pos logit contributions
 sizes+0.037
ances+0.036
 cultures+0.035
 planets+0.034
 nations+0.034
Neg logit contributions
O-0.033
 aback-0.033
 Jane-0.032
Da-0.030
M-0.030
Token her
Token ID607
Feature activation-2.03e-02

Pos logit contributions
ances+0.072
 cultures+0.070
 sizes+0.070
arians+0.067
 nations+0.066
Neg logit contributions
O-0.053
 Jane-0.051
 another-0.051
 my-0.049
 aback-0.049
Token friends
Token ID2460
Feature activation-1.52e-01

Pos logit contributions
ances+0.037
arians+0.037
 cultures+0.035
 sizes+0.033
 countries+0.033
Neg logit contributions
 another-0.027
 some-0.026
 scoop-0.025
 Jane-0.025
 my-0.025
Token happy
Token ID3772
Feature activation-1.50e-02

Pos logit contributions
ances+0.269
arians+0.269
 cultures+0.263
 sizes+0.260
girls+0.259
Neg logit contributions
 scoop-0.190
 another-0.188
 Jane-0.187
 some-0.183
 my-0.181
Token.
Token ID13
Feature activation-2.37e-03

Pos logit contributions
ances+0.030
arians+0.030
girls+0.029
 cultures+0.029
 sizes+0.029
Neg logit contributions
 another-0.016
 Jane-0.016
 her-0.015
 some-0.015
 my-0.015
Token\n
Token ID198
Feature activation+1.04e-02

Pos logit contributions
 sizes+0.004
 cultures+0.004
 nations+0.004
 planets+0.004
girls+0.004
Neg logit contributions
O-0.003
 aback-0.003
Aid-0.003
plug-0.003
 scoop-0.003
Token\n
Token ID198
Feature activation+1.31e-02

Pos logit contributions
 aback+0.013
O+0.013
Aid+0.012
 scoop+0.012
 Jane+0.012
Neg logit contributions
 sizes-0.017
ances-0.017
 cultures-0.017
 nations-0.016
 planets-0.016
TokenM
Token ID44
Feature activation+6.59e-02

Pos logit contributions
 aback+0.015
Aid+0.014
ray+0.013
O+0.013
 Jane+0.013
Neg logit contributions
 sizes-0.025
ances-0.024
 nations-0.024
 cultures-0.024
 planets-0.023