TOP ACTIVATIONS
MAX = 0.123

into all the n ooks and cr ann ies .
to create little n ooks and cr ann ies , and
lots of small n ooks and cr ann ies to grab
wings hard and flew higher and higher until it felt so
ite flew higher and higher and he cheered with delight .
broke and the y olk and the white came out .
a book , a guitar , a dog , a cat
from them and floated higher and higher . With
Mia and Leo cl apped and laughed . They
's k ite flew higher and higher and he cheered with
. The man flew higher and higher until Sarah could not
some eggs , some flour , some sugar , some butter
a bed , a sofa , a table , a book
going . He crossed rivers , mountains and forests , and
rooms , chairs , tables , and toys . She thought
M um cl apped and said . " Thank You
added a handful of flour , a few eggs , and
some eggs , some flour , some sugar , and some
baby . Tommy cl apped and cheered - it was a
some eggs , some flour , some sugar , and some

MIN ACTIVATIONS
MIN = -0.131

a penchant for things she shouldn 't touch . One day
and not touch things she shouldn 't . <|endoftext|> One day
t you know you shouldn ’ t speed
to play with things she shouldn 't touch . <|endoftext|> Once
to play with things she shouldn 't touch . <|endoftext|> Once
She did many things she shouldn 't have done , like
did lots of things he shouldn 't do . One day
She takes out a magn ifying glass . She looks at
he had eaten something he shouldn 't have . Max 's
and getting into things he shouldn 't . One
ally , you know you shouldn 't keep the gum .
sometimes sneak into places he shouldn 't . One day Mac
inside the cup board that shouldn 't be there . He
, " No , we shouldn 't disturb the butterfly .
and said , " You shouldn 't take things that don
told Sarah , " You shouldn 't climb the tree ,
the shark , " You shouldn 't be selfish . We
When Daisy put the magn ifying glass down , the cater
, doesn 't mean you shouldn 't try it ! <|endoftext|>
" Me gan , you shouldn 't take things that are

INTERVAL 0.059 - 0.123
CONTAINS 6.357%

She shared it with everyone and they all said it was
They lived near the beach and loved playing in the sand
. The bear and mouse were very scared .
being seen ! He danced and played outside for the first
L ily dropped the flag and fell off the swing .

INTERVAL -0.004 - 0.059
CONTAINS 44.703%

a fort . They put the box on the grass and
if they could make a car out of it . They
so happy ! She took her new things to the park
loved to explore . One day , he was out playing
heart felt warm and the little boy smiled . He started

INTERVAL -0.068 - -0.004
CONTAINS 47.300%

to learn something new . The end . <|endoftext|>
something strange happened . The spaghetti monster vanished ! Po of
later ." " Okay ," Leo said . They
in her hair . She is happy . She says ,
asks . " No , I am not OK

INTERVAL -0.131 - -0.068
CONTAINS 1.640%

" No , you won 't ! I am a
her mom my if she could have a little bunny .
and said , " You did it all on your own
just a toy . You can get another one ." He
hands tight . " Do you think we will be
Token into
Token ID656
Feature activation+2.44e-02

Pos logit contributions
 cities+0.020
 forests+0.020
aris+0.020
 castles+0.019
cast+0.019
Neg logit contributions
 word-0.017
 pain-0.016
'.-0.015
 apology-0.015
 value-0.015
Token all
Token ID477
Feature activation+5.52e-02

Pos logit contributions
 castles+0.043
 cities+0.041
cast+0.040
 forests+0.040
 gardens+0.040
Neg logit contributions
 word-0.042
 apology-0.040
 pain-0.039
'.-0.039
 grant-0.038
Token the
Token ID262
Feature activation+3.73e-02

Pos logit contributions
 castles+0.097
cast+0.096
 cities+0.096
 forests+0.094
angers+0.094
Neg logit contributions
 word-0.094
 pain-0.089
'.-0.086
 apology-0.086
 hurts-0.085
Token n
Token ID299
Feature activation+3.45e-02

Pos logit contributions
 castles+0.057
 cities+0.055
cast+0.055
 forests+0.054
 oceans+0.054
Neg logit contributions
 word-0.073
 pain-0.070
 apology-0.069
 hurts-0.067
'.-0.067
Tokenooks
Token ID31085
Feature activation+5.71e-02

Pos logit contributions
 castles+0.055
 cities+0.053
 forests+0.052
cast+0.051
 oceans+0.051
Neg logit contributions
 word-0.064
'.-0.062
 apology-0.061
 pain-0.060
 grant-0.058
Token and
Token ID290
Feature activation+1.23e-01

Pos logit contributions
 castles+0.112
 cities+0.111
cast+0.108
 forests+0.108
 gardens+0.107
Neg logit contributions
 word-0.083
 pain-0.079
'.-0.077
 apology-0.074
 hurts-0.071
Token cr
Token ID1067
Feature activation+1.51e-02

Pos logit contributions
cast+0.226
aris+0.220
 castles+0.217
angers+0.217
 cities+0.214
Neg logit contributions
 word-0.205
 pain-0.196
 apology-0.190
 hurts-0.182
 grant-0.182
Tokenann
Token ID1236
Feature activation+7.95e-02

Pos logit contributions
 castles+0.021
 cities+0.021
cast+0.020
 forests+0.020
 gardens+0.020
Neg logit contributions
 word-0.031
'.-0.029
 apology-0.029
 pain-0.029
 hurts-0.028
Tokenies
Token ID444
Feature activation+4.13e-02

Pos logit contributions
 cities+0.080
 castles+0.080
 forests+0.077
 oceans+0.076
cast+0.076
Neg logit contributions
 word-0.191
'.-0.184
 pain-0.182
 apology-0.182
 hurts-0.180
Token.
Token ID13
Feature activation-7.70e-03

Pos logit contributions
 castles+0.099
 cities+0.098
cast+0.097
 forests+0.096
 oceans+0.096
Neg logit contributions
 word-0.043
'.-0.039
 pain-0.038
 apology-0.037
 grant-0.036
Token
Token ID198
Feature activation-8.49e-03

Pos logit contributions
 word+0.012
'.+0.011
 apology+0.011
 pain+0.010
 grant+0.010
Neg logit contributions
 castles-0.015
 cities-0.015
 gardens-0.014
cast-0.014
 forests-0.014
Token to
Token ID284
Feature activation-5.80e-03

Pos logit contributions
 word+0.010
'.+0.009
 pain+0.009
 shock+0.009
 hurts+0.008
Neg logit contributions
 castles-0.011
 cities-0.011
aris-0.011
cast-0.011
angers-0.011
Token create
Token ID2251
Feature activation-8.26e-03

Pos logit contributions
 word+0.011
 apology+0.010
'.+0.010
 pain+0.010
 hurts+0.010
Neg logit contributions
 castles-0.009
 cities-0.009
 forests-0.009
 gardens-0.009
cast-0.009
Token little
Token ID1310
Feature activation+3.53e-02

Pos logit contributions
 word+0.013
 pain+0.012
'.+0.012
 apology+0.012
 grant+0.012
Neg logit contributions
 castles-0.015
 cities-0.015
cast-0.015
 oceans-0.015
 forests-0.014
Token n
Token ID299
Feature activation+2.92e-02

Pos logit contributions
 castles+0.056
cast+0.054
 cities+0.054
 forests+0.052
 gardens+0.052
Neg logit contributions
 word-0.066
 pain-0.063
 apology-0.061
'.-0.061
 hurts-0.060
Tokenooks
Token ID31085
Feature activation+5.65e-02

Pos logit contributions
 castles+0.045
 cities+0.042
cast+0.042
 gardens+0.042
 forests+0.042
Neg logit contributions
 word-0.058
 apology-0.055
 pain-0.054
 hurts-0.053
 grant-0.053
Token and
Token ID290
Feature activation+1.18e-01

Pos logit contributions
 castles+0.111
 cities+0.109
cast+0.106
 forests+0.105
 gardens+0.105
Neg logit contributions
 word-0.083
 pain-0.078
'.-0.077
 apology-0.072
 grant-0.072
Token cr
Token ID1067
Feature activation+1.08e-02

Pos logit contributions
cast+0.197
 castles+0.193
aris+0.193
angers+0.192
 cities+0.189
Neg logit contributions
 word-0.213
 pain-0.203
 apology-0.197
 grant-0.191
 shock-0.190
Tokenann
Token ID1236
Feature activation+7.36e-02

Pos logit contributions
 castles+0.016
 cities+0.016
cast+0.016
 forests+0.015
 gardens+0.015
Neg logit contributions
 word-0.021
 apology-0.020
 pain-0.020
'.-0.020
 grant-0.019
Tokenies
Token ID444
Feature activation+3.81e-02

Pos logit contributions
 castles+0.069
 cities+0.069
 forests+0.066
 oceans+0.065
 mountains+0.065
Neg logit contributions
 word-0.181
'.-0.174
 pain-0.171
 apology-0.171
 hurts-0.170
Token,
Token ID11
Feature activation+1.05e-01

Pos logit contributions
 castles+0.090
 cities+0.089
cast+0.087
 forests+0.087
 oceans+0.087
Neg logit contributions
 word-0.040
'.-0.037
 pain-0.036
 grant-0.034
 apology-0.034
Token and
Token ID290
Feature activation+9.79e-02

Pos logit contributions
aris+0.177
cast+0.177
angers+0.175
 castles+0.172
 cities+0.168
Neg logit contributions
 word-0.181
 pain-0.176
 grant-0.166
 shock-0.163
 apology-0.163
Token lots
Token ID6041
Feature activation+7.66e-04

Pos logit contributions
angers+0.048
 castles+0.048
cast+0.047
 cities+0.047
aris+0.047
Neg logit contributions
 word-0.031
 pain-0.029
 shock-0.026
 apology-0.026
 value-0.026
Token of
Token ID286
Feature activation+7.62e-03

Pos logit contributions
 castles+0.002
 cities+0.001
cast+0.001
 forests+0.001
 gardens+0.001
Neg logit contributions
 word-0.001
 apology-0.001
 pain-0.001
'.-0.001
 grant-0.001
Token small
Token ID1402
Feature activation+1.20e-02

Pos logit contributions
 castles+0.012
 cities+0.012
cast+0.011
 forests+0.011
 gardens+0.011
Neg logit contributions
 word-0.015
 pain-0.014
 apology-0.014
'.-0.013
 hurts-0.013
Token n
Token ID299
Feature activation+2.61e-02

Pos logit contributions
 castles+0.018
cast+0.017
 cities+0.017
angers+0.017
 gardens+0.016
Neg logit contributions
 word-0.023
 pain-0.022
'.-0.021
 apology-0.021
 hurts-0.021
Tokenooks
Token ID31085
Feature activation+5.61e-02

Pos logit contributions
 castles+0.033
 cities+0.032
cast+0.031
 forests+0.030
 fields+0.030
Neg logit contributions
 word-0.057
'.-0.055
 apology-0.054
 pain-0.054
 grant-0.053
Token and
Token ID290
Feature activation+1.15e-01

Pos logit contributions
 castles+0.107
 cities+0.106
angers+0.102
cast+0.102
 gardens+0.102
Neg logit contributions
 word-0.085
 pain-0.080
'.-0.077
 apology-0.074
 grant-0.073
Token cr
Token ID1067
Feature activation+1.12e-02

Pos logit contributions
 castles+0.187
cast+0.185
 cities+0.183
angers+0.181
aris+0.181
Neg logit contributions
 word-0.212
 pain-0.198
 apology-0.195
 grant-0.191
 hurts-0.189
Tokenann
Token ID1236
Feature activation+7.28e-02

Pos logit contributions
 castles+0.017
 cities+0.016
cast+0.016
 forests+0.016
 gardens+0.016
Neg logit contributions
 word-0.022
 apology-0.021
 pain-0.021
'.-0.021
 grant-0.020
Tokenies
Token ID444
Feature activation+3.63e-02

Pos logit contributions
 castles+0.067
 cities+0.066
 forests+0.063
 oceans+0.062
cast+0.062
Neg logit contributions
 word-0.182
'.-0.174
 apology-0.173
 pain-0.173
 hurts-0.171
Token to
Token ID284
Feature activation+1.10e-03

Pos logit contributions
 castles+0.088
 cities+0.087
cast+0.085
 forests+0.085
 oceans+0.084
Neg logit contributions
 word-0.036
'.-0.033
 pain-0.032
 grant-0.030
 apology-0.029
Token grab
Token ID5552
Feature activation+1.85e-03

Pos logit contributions
 castles+0.002
 cities+0.002
 gardens+0.002
 forests+0.002
 fields+0.002
Neg logit contributions
 word-0.002
 apology-0.001
'.-0.001
 hurts-0.001
 pain-0.001
Token wings
Token ID12098
Feature activation+1.42e-02

Pos logit contributions
 castles+0.052
 forests+0.050
 cities+0.050
 fields+0.049
 oceans+0.049
Neg logit contributions
 word-0.057
 apology-0.055
'.-0.054
 owe-0.054
 pain-0.053
Token hard
Token ID1327
Feature activation-1.70e-02

Pos logit contributions
 castles+0.026
 cities+0.025
 gardens+0.024
cast+0.024
angers+0.024
Neg logit contributions
 word-0.024
'.-0.022
 pain-0.022
 apology-0.022
 grant-0.021
Token and
Token ID290
Feature activation+9.53e-02

Pos logit contributions
 word+0.033
 apology+0.032
 pain+0.031
 owe+0.031
'.+0.030
Neg logit contributions
 castles-0.026
cast-0.025
 cities-0.025
 forests-0.024
 oceans-0.024
Token flew
Token ID13112
Feature activation+5.91e-02

Pos logit contributions
angers+0.169
 castles+0.167
 cities+0.160
 ships+0.158
 gardens+0.157
Neg logit contributions
 word-0.168
 pain-0.159
 hurts-0.154
 grant-0.150
 shock-0.148
Token higher
Token ID2440
Feature activation+3.00e-02

Pos logit contributions
 castles+0.113
 cities+0.109
cast+0.107
 gardens+0.107
 forests+0.107
Neg logit contributions
 word-0.093
 apology-0.085
 pain-0.085
'.-0.085
 grant-0.083
Token and
Token ID290
Feature activation+1.14e-01

Pos logit contributions
 castles+0.054
 cities+0.052
cast+0.052
 forests+0.052
 gardens+0.051
Neg logit contributions
 word-0.050
 apology-0.047
 pain-0.047
'.-0.046
 grant-0.046
Token higher
Token ID2440
Feature activation+2.98e-02

Pos logit contributions
angers+0.205
aris+0.187
 castles+0.186
 ships+0.185
iked+0.182
Neg logit contributions
 word-0.197
 hurts-0.189
 grant-0.179
 pain-0.177
se-0.176
Token until
Token ID1566
Feature activation+6.40e-02

Pos logit contributions
 castles+0.055
 cities+0.054
cast+0.053
 forests+0.053
 gardens+0.052
Neg logit contributions
 word-0.048
 apology-0.044
 pain-0.044
'.-0.044
 grant-0.043
Token it
Token ID340
Feature activation-1.48e-03

Pos logit contributions
 castles+0.110
angers+0.107
cast+0.105
 cities+0.104
aris+0.103
Neg logit contributions
 word-0.112
 pain-0.109
'.-0.105
 shock-0.103
 grant-0.103
Token felt
Token ID2936
Feature activation+3.76e-03

Pos logit contributions
 word+0.002
 pain+0.002
 hurts+0.002
 grant+0.002
'.+0.002
Neg logit contributions
 castles-0.003
 cities-0.003
 ships-0.003
angers-0.003
 gardens-0.002
Token so
Token ID523
Feature activation+1.76e-02

Pos logit contributions
 castles+0.006
 cities+0.006
cast+0.006
 forests+0.006
 gardens+0.006
Neg logit contributions
 word-0.007
 pain-0.006
 apology-0.006
'.-0.006
 grant-0.006
Tokenite
Token ID578
Feature activation+1.18e-02

Pos logit contributions
 castles+0.037
 cities+0.036
cast+0.035
 forests+0.035
 gardens+0.035
Neg logit contributions
 word-0.041
'.-0.038
 apology-0.038
 pain-0.038
 hurts-0.038
Token flew
Token ID13112
Feature activation+4.67e-02

Pos logit contributions
 castles+0.024
 cities+0.023
 forests+0.022
cast+0.022
 gardens+0.022
Neg logit contributions
 word-0.017
 pain-0.016
 apology-0.016
'.-0.016
 hurts-0.015
Token higher
Token ID2440
Feature activation+2.76e-02

Pos logit contributions
 castles+0.087
angers+0.086
 cities+0.084
 gardens+0.084
 ships+0.083
Neg logit contributions
 word-0.074
se-0.066
 pain-0.066
'.-0.066
 shock-0.065
Token and
Token ID290
Feature activation+1.12e-01

Pos logit contributions
 castles+0.049
 cities+0.048
 gardens+0.047
angers+0.047
 ships+0.047
Neg logit contributions
 word-0.046
 pain-0.042
'.-0.042
 apology-0.041
 hurts-0.040
Token higher
Token ID2440
Feature activation+2.65e-02

Pos logit contributions
angers+0.210
aris+0.196
 castles+0.190
iked+0.190
 gardens+0.187
Neg logit contributions
 word-0.176
 hurts-0.171
se-0.164
 prove-0.163
 grant-0.161
Token and
Token ID290
Feature activation+1.14e-01

Pos logit contributions
 castles+0.045
 cities+0.044
 gardens+0.043
 forests+0.043
cast+0.043
Neg logit contributions
 word-0.046
 pain-0.043
'.-0.042
 apology-0.042
 hurts-0.041
Token he
Token ID339
Feature activation+2.28e-02

Pos logit contributions
angers+0.211
aris+0.195
iked+0.190
 castles+0.189
 gardens+0.186
Neg logit contributions
 word-0.184
 hurts-0.178
 prove-0.170
se-0.170
 grant-0.167
Token cheered
Token ID34767
Feature activation+3.46e-02

Pos logit contributions
angers+0.041
 castles+0.041
 cities+0.040
 gardens+0.039
aris+0.039
Neg logit contributions
 word-0.035
 hurts-0.034
se-0.034
 grant-0.034
 pain-0.033
Token with
Token ID351
Feature activation+2.74e-02

Pos logit contributions
 castles+0.065
 cities+0.064
 forests+0.063
 gardens+0.063
cast+0.062
Neg logit contributions
 word-0.055
'.-0.050
 apology-0.050
 pain-0.050
 grant-0.049
Token delight
Token ID10974
Feature activation+4.28e-02

Pos logit contributions
 castles+0.051
 cities+0.048
 forests+0.048
 gardens+0.047
 fields+0.047
Neg logit contributions
 word-0.046
 hurts-0.043
 grant-0.043
'.-0.042
 owe-0.042
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
 castles+0.103
 cities+0.100
cast+0.099
 forests+0.098
 gardens+0.098
Neg logit contributions
 word-0.046
 apology-0.041
 pain-0.041
'.-0.040
 grant-0.039
Token broke
Token ID6265
Feature activation-9.01e-04

Pos logit contributions
 castles+0.021
 cities+0.020
cast+0.020
 gardens+0.020
 forests+0.020
Neg logit contributions
 word-0.017
 apology-0.016
 pain-0.015
'.-0.015
 grant-0.015
Token and
Token ID290
Feature activation+8.11e-02

Pos logit contributions
 word+0.002
 apology+0.001
 pain+0.001
'.+0.001
 grant+0.001
Neg logit contributions
 castles-0.002
 cities-0.002
cast-0.002
 forests-0.002
 gardens-0.002
Token the
Token ID262
Feature activation+3.50e-02

Pos logit contributions
aris+0.154
 cities+0.152
 castles+0.149
 oceans+0.147
 forests+0.147
Neg logit contributions
 pain-0.128
 word-0.127
 hurts-0.122
 shock-0.118
 grant-0.118
Token y
Token ID331
Feature activation-3.69e-02

Pos logit contributions
 castles+0.062
 fields+0.059
 gardens+0.058
 cities+0.058
 forests+0.058
Neg logit contributions
 word-0.058
'.-0.057
 apology-0.056
 owe-0.055
 grant-0.055
Tokenolk
Token ID13597
Feature activation+3.47e-02

Pos logit contributions
 grant+0.075
 word+0.072
 apology+0.071
 owe+0.070
 prove+0.070
Neg logit contributions
 castles-0.057
 fields-0.054
 gardens-0.052
 mountains-0.051
cast-0.049
Token and
Token ID290
Feature activation+1.13e-01

Pos logit contributions
 castles+0.061
 cities+0.058
cast+0.058
 forests+0.057
 gardens+0.057
Neg logit contributions
 word-0.060
 apology-0.056
'.-0.055
 pain-0.055
 grant-0.054
Token the
Token ID262
Feature activation+5.22e-02

Pos logit contributions
 castles+0.197
 cities+0.194
cast+0.191
aris+0.189
 forests+0.188
Neg logit contributions
 word-0.194
 pain-0.185
 apology-0.180
 grant-0.177
 hurts-0.177
Token white
Token ID2330
Feature activation+1.88e-02

Pos logit contributions
 castles+0.088
 fields+0.083
 gardens+0.083
 mountains+0.081
 forests+0.081
Neg logit contributions
 word-0.093
'.-0.091
 apology-0.089
 grant-0.087
 owe-0.087
Token came
Token ID1625
Feature activation+1.94e-02

Pos logit contributions
 castles+0.034
 fields+0.033
 gardens+0.032
 mountains+0.032
 forests+0.032
Neg logit contributions
 apology-0.031
 word-0.031
 grant-0.030
 prove-0.029
 owe-0.029
Token out
Token ID503
Feature activation-4.71e-03

Pos logit contributions
 castles+0.036
 cities+0.036
cast+0.035
 ships+0.035
 forests+0.035
Neg logit contributions
 word-0.030
 pain-0.028
'.-0.028
 hurts-0.027
 grant-0.027
Token.
Token ID13
Feature activation-1.03e-02

Pos logit contributions
 word+0.006
 apology+0.005
 pain+0.005
'.+0.005
 grant+0.005
Neg logit contributions
 castles-0.011
 cities-0.010
cast-0.010
 forests-0.010
 gardens-0.010
Token a
Token ID257
Feature activation-1.09e-02

Pos logit contributions
cast+0.232
aris+0.230
 castles+0.227
ashing+0.226
angers+0.225
Neg logit contributions
 word-0.146
 grant-0.132
 pain-0.127
 apology-0.118
 shock-0.117
Token book
Token ID1492
Feature activation-1.97e-02

Pos logit contributions
 word+0.018
 apology+0.017
'.+0.017
 pain+0.017
 hurts+0.017
Neg logit contributions
 castles-0.019
 cities-0.019
 forests-0.018
 fields-0.018
 gardens-0.018
Token,
Token ID11
Feature activation+1.10e-01

Pos logit contributions
 word+0.031
'.+0.028
 grant+0.027
 pain+0.027
 hurts+0.026
Neg logit contributions
 castles-0.038
 cities-0.037
cast-0.036
 forests-0.036
 gardens-0.036
Token a
Token ID257
Feature activation-1.24e-02

Pos logit contributions
cast+0.227
 castles+0.223
aris+0.221
angers+0.220
 cities+0.219
Neg logit contributions
 word-0.153
 grant-0.138
 pain-0.134
 apology-0.128
 value-0.125
Token guitar
Token ID10047
Feature activation-1.72e-02

Pos logit contributions
 word+0.020
 apology+0.020
'.+0.019
 pain+0.019
 hurts+0.019
Neg logit contributions
 castles-0.022
 cities-0.022
 forests-0.021
 fields-0.021
 ships-0.021
Token,
Token ID11
Feature activation+1.13e-01

Pos logit contributions
 word+0.028
 apology+0.027
 pain+0.026
'.+0.026
 grant+0.026
Neg logit contributions
 castles-0.032
 cities-0.030
 forests-0.030
 gardens-0.030
cast-0.030
Token a
Token ID257
Feature activation-1.07e-02

Pos logit contributions
cast+0.237
 castles+0.232
aris+0.231
ashing+0.229
angers+0.229
Neg logit contributions
 word-0.153
 grant-0.139
 pain-0.135
 apology-0.125
 value-0.123
Token dog
Token ID3290
Feature activation-2.68e-02

Pos logit contributions
 word+0.017
 apology+0.017
'.+0.017
 pain+0.016
 hurts+0.016
Neg logit contributions
 castles-0.019
 cities-0.019
 fields-0.018
 ships-0.018
 forests-0.018
Token,
Token ID11
Feature activation+1.10e-01

Pos logit contributions
 word+0.045
 apology+0.042
 pain+0.042
'.+0.041
 grant+0.041
Neg logit contributions
 castles-0.048
 cities-0.046
 forests-0.045
cast-0.045
 gardens-0.045
Token a
Token ID257
Feature activation-1.46e-02

Pos logit contributions
cast+0.234
 castles+0.232
 cities+0.224
angers+0.224
aris+0.220
Neg logit contributions
 word-0.149
 grant-0.138
 pain-0.132
'.-0.129
 apology-0.125
Token cat
Token ID3797
Feature activation-1.79e-02

Pos logit contributions
 word+0.023
 apology+0.023
'.+0.022
 pain+0.022
 hurts+0.022
Neg logit contributions
 castles-0.026
 cities-0.025
 fields-0.025
 ships-0.025
 forests-0.025
Token from
Token ID422
Feature activation+3.67e-02

Pos logit contributions
 castles+0.040
 cities+0.038
cast+0.038
 forests+0.037
 gardens+0.037
Neg logit contributions
 word-0.039
 apology-0.036
 pain-0.036
'.-0.036
 grant-0.035
Token them
Token ID606
Feature activation+1.26e-02

Pos logit contributions
 castles+0.054
 cities+0.051
 forests+0.050
 gardens+0.050
 fields+0.050
Neg logit contributions
 word-0.074
'.-0.070
 apology-0.070
 grant-0.069
 owe-0.068
Token and
Token ID290
Feature activation+8.81e-02

Pos logit contributions
 castles+0.028
 cities+0.028
cast+0.027
 forests+0.027
 gardens+0.027
Neg logit contributions
 word-0.015
 apology-0.014
 pain-0.014
'.-0.014
 grant-0.013
Token floated
Token ID29742
Feature activation+5.75e-02

Pos logit contributions
 castles+0.141
 cities+0.141
angers+0.140
aris+0.139
 oceans+0.139
Neg logit contributions
 word-0.160
 pain-0.153
 hurts-0.148
 apology-0.146
 grant-0.145
Token higher
Token ID2440
Feature activation+3.34e-02

Pos logit contributions
 castles+0.137
 cities+0.134
cast+0.132
 forests+0.131
 gardens+0.131
Neg logit contributions
 word-0.063
 pain-0.056
 apology-0.056
'.-0.055
 hurts-0.053
Token and
Token ID290
Feature activation+1.13e-01

Pos logit contributions
 castles+0.063
cast+0.060
 cities+0.060
 forests+0.059
 oceans+0.059
Neg logit contributions
 word-0.055
 apology-0.054
 owe-0.052
 grant-0.052
 pain-0.051
Token higher
Token ID2440
Feature activation+3.24e-02

Pos logit contributions
angers+0.183
aris+0.175
 castles+0.172
iked+0.166
 ships+0.166
Neg logit contributions
 word-0.207
 hurts-0.201
 pain-0.193
 grant-0.189
 shock-0.186
Token.
Token ID13
Feature activation-1.54e-03

Pos logit contributions
 castles+0.064
 cities+0.061
cast+0.061
 forests+0.061
 gardens+0.060
Neg logit contributions
 word-0.049
 apology-0.046
 pain-0.045
'.-0.045
 grant-0.044
Token
Token ID198
Feature activation-1.25e-02

Pos logit contributions
'.+0.002
 word+0.002
 apology+0.002
 grant+0.002
 owe+0.002
Neg logit contributions
 castles-0.003
cast-0.003
 cities-0.003
 gardens-0.003
 oceans-0.003
Token
Token ID198
Feature activation-1.98e-02

Pos logit contributions
 word+0.019
 apology+0.018
'.+0.018
 pain+0.017
 grant+0.017
Neg logit contributions
 castles-0.025
 cities-0.024
cast-0.023
 gardens-0.023
 forests-0.023
TokenWith
Token ID3152
Feature activation+4.56e-02

Pos logit contributions
 word+0.028
'.+0.027
 apology+0.026
 pain+0.026
 grant+0.025
Neg logit contributions
 castles-0.041
cast-0.040
 cities-0.039
 gardens-0.038
 ships-0.038
Token Mia
Token ID32189
Feature activation+1.82e-03

Pos logit contributions
 word+0.013
 apology+0.012
 pain+0.012
'.+0.011
 grant+0.011
Neg logit contributions
 castles-0.018
 cities-0.018
cast-0.017
 forests-0.017
 gardens-0.017
Token and
Token ID290
Feature activation+8.87e-02

Pos logit contributions
 castles+0.003
 cities+0.003
angers+0.003
 gardens+0.003
cast+0.003
Neg logit contributions
 word-0.003
 pain-0.003
 hurts-0.003
 grant-0.003
'.-0.003
Token Leo
Token ID19632
Feature activation+8.52e-03

Pos logit contributions
cast+0.162
angers+0.162
 castles+0.159
aris+0.154
udd+0.154
Neg logit contributions
 word-0.152
 pain-0.141
 shock-0.136
se-0.133
 hurts-0.131
Token cl
Token ID537
Feature activation+2.77e-02

Pos logit contributions
 castles+0.013
angers+0.013
 cities+0.013
 gardens+0.013
cast+0.012
Neg logit contributions
 word-0.016
 pain-0.015
 grant-0.015
'.-0.015
 apology-0.015
Tokenapped
Token ID6320
Feature activation+4.01e-02

Pos logit contributions
 castles+0.027
 cities+0.024
cast+0.024
 ships+0.024
 forests+0.024
Neg logit contributions
 word-0.070
 apology-0.066
 pain-0.066
 grant-0.066
'.-0.065
Token and
Token ID290
Feature activation+1.12e-01

Pos logit contributions
 castles+0.068
 cities+0.066
cast+0.065
 forests+0.065
 gardens+0.065
Neg logit contributions
 word-0.072
 apology-0.067
 pain-0.067
'.-0.066
 grant-0.065
Token laughed
Token ID13818
Feature activation+2.83e-02

Pos logit contributions
angers+0.201
 cities+0.190
 castles+0.189
 gardens+0.186
aris+0.186
Neg logit contributions
 word-0.192
 pain-0.190
 hurts-0.179
 shock-0.178
 apology-0.174
Token.
Token ID13
Feature activation-8.41e-03

Pos logit contributions
 castles+0.065
 cities+0.063
cast+0.062
 forests+0.062
 fields+0.062
Neg logit contributions
 word-0.034
 apology-0.031
 pain-0.030
 grant-0.030
'.-0.030
Token
Token ID198
Feature activation-1.83e-02

Pos logit contributions
 word+0.013
'.+0.012
 apology+0.012
 pain+0.012
 grant+0.011
Neg logit contributions
 castles-0.016
cast-0.016
 cities-0.016
 gardens-0.016
 ships-0.015
Token
Token ID198
Feature activation-2.61e-02

Pos logit contributions
 word+0.030
 apology+0.028
 pain+0.027
'.+0.027
 grant+0.027
Neg logit contributions
 castles-0.034
 cities-0.033
cast-0.032
 forests-0.032
 gardens-0.032
TokenThey
Token ID2990
Feature activation-9.27e-03

Pos logit contributions
 word+0.032
'.+0.029
 pain+0.028
 apology+0.028
 grant+0.028
Neg logit contributions
 castles-0.059
cast-0.059
 cities-0.057
angers-0.057
 gardens-0.057
Token's
Token ID338
Feature activation+5.72e-03

Pos logit contributions
 word+0.008
'.+0.007
 hurts+0.007
 pain+0.007
 grant+0.007
Neg logit contributions
 castles-0.010
 cities-0.010
angers-0.010
 gardens-0.010
 oceans-0.010
Token k
Token ID479
Feature activation+2.25e-02

Pos logit contributions
 castles+0.010
 cities+0.010
cast+0.010
 forests+0.009
 gardens+0.009
Neg logit contributions
 word-0.010
 apology-0.009
 pain-0.009
'.-0.009
 grant-0.009
Tokenite
Token ID578
Feature activation+1.18e-02

Pos logit contributions
 castles+0.037
 cities+0.036
cast+0.035
 forests+0.035
 gardens+0.035
Neg logit contributions
 word-0.041
'.-0.038
 apology-0.038
 pain-0.038
 hurts-0.038
Token flew
Token ID13112
Feature activation+4.67e-02

Pos logit contributions
 castles+0.024
 cities+0.023
 forests+0.022
cast+0.022
 gardens+0.022
Neg logit contributions
 word-0.017
 pain-0.016
 apology-0.016
'.-0.016
 hurts-0.015
Token higher
Token ID2440
Feature activation+2.76e-02

Pos logit contributions
 castles+0.087
angers+0.086
 cities+0.084
 gardens+0.084
 ships+0.083
Neg logit contributions
 word-0.074
se-0.066
 pain-0.066
'.-0.066
 shock-0.065
Token and
Token ID290
Feature activation+1.12e-01

Pos logit contributions
 castles+0.049
 cities+0.048
 gardens+0.047
angers+0.047
 ships+0.047
Neg logit contributions
 word-0.046
 pain-0.042
'.-0.042
 apology-0.041
 hurts-0.040
Token higher
Token ID2440
Feature activation+2.65e-02

Pos logit contributions
angers+0.210
aris+0.196
 castles+0.190
iked+0.190
 gardens+0.187
Neg logit contributions
 word-0.176
 hurts-0.171
se-0.164
 prove-0.163
 grant-0.161
Token and
Token ID290
Feature activation+1.14e-01

Pos logit contributions
 castles+0.045
 cities+0.044
 gardens+0.043
 forests+0.043
cast+0.043
Neg logit contributions
 word-0.046
 pain-0.043
'.-0.042
 apology-0.042
 hurts-0.041
Token he
Token ID339
Feature activation+2.28e-02

Pos logit contributions
angers+0.211
aris+0.195
iked+0.190
 castles+0.189
 gardens+0.186
Neg logit contributions
 word-0.184
 hurts-0.178
 prove-0.170
se-0.170
 grant-0.167
Token cheered
Token ID34767
Feature activation+3.46e-02

Pos logit contributions
angers+0.041
 castles+0.041
 cities+0.040
 gardens+0.039
aris+0.039
Neg logit contributions
 word-0.035
 hurts-0.034
se-0.034
 grant-0.034
 pain-0.033
Token with
Token ID351
Feature activation+2.74e-02

Pos logit contributions
 castles+0.065
 cities+0.064
 forests+0.063
 gardens+0.063
cast+0.062
Neg logit contributions
 word-0.055
'.-0.050
 apology-0.050
 pain-0.050
 grant-0.049
Token.
Token ID13
Feature activation-8.73e-03

Pos logit contributions
 castles+0.104
 cities+0.102
cast+0.100
 forests+0.099
 gardens+0.099
Neg logit contributions
 word-0.040
'.-0.034
 pain-0.033
 grant-0.032
 apology-0.031
Token The
Token ID383
Feature activation+3.08e-02

Pos logit contributions
'.+0.013
 word+0.013
 owe+0.012
 apology+0.012
 grant+0.011
Neg logit contributions
 castles-0.019
 cities-0.017
 ships-0.017
cast-0.017
 oceans-0.017
Token man
Token ID582
Feature activation-2.08e-03

Pos logit contributions
 castles+0.055
 cities+0.054
 forests+0.053
 gardens+0.052
 fields+0.052
Neg logit contributions
 word-0.051
'.-0.048
 apology-0.048
 pain-0.047
 hurts-0.047
Token flew
Token ID13112
Feature activation+5.26e-02

Pos logit contributions
 word+0.003
 pain+0.003
 grant+0.003
 hurts+0.003
 apology+0.003
Neg logit contributions
 castles-0.004
angers-0.004
 cities-0.004
cast-0.004
 oceans-0.004
Token higher
Token ID2440
Feature activation+2.96e-02

Pos logit contributions
angers+0.126
 castles+0.125
 ships+0.123
 cities+0.123
 gardens+0.122
Neg logit contributions
 word-0.056
se-0.047
 pain-0.045
 shock-0.045
'.-0.044
Token and
Token ID290
Feature activation+1.12e-01

Pos logit contributions
 castles+0.055
angers+0.055
aris+0.055
 cities+0.054
 Hil+0.054
Neg logit contributions
 word-0.046
 pain-0.041
'.-0.039
 hurts-0.038
 shock-0.038
Token higher
Token ID2440
Feature activation+2.95e-02

Pos logit contributions
angers+0.224
aris+0.217
iked+0.213
ules+0.205
 ships+0.199
Neg logit contributions
 word-0.159
 hurts-0.157
 grant-0.147
bo-0.146
 prove-0.145
Token until
Token ID1566
Feature activation+5.86e-02

Pos logit contributions
 castles+0.045
 cities+0.044
aris+0.044
angers+0.043
 ships+0.043
Neg logit contributions
 word-0.056
 pain-0.052
'.-0.051
 hurts-0.050
 grant-0.049
Token Sarah
Token ID10490
Feature activation+5.14e-03

Pos logit contributions
aris+0.100
 castles+0.097
angers+0.097
ashing+0.094
udd+0.094
Neg logit contributions
 pain-0.104
 shock-0.099
 word-0.098
'.-0.095
 grant-0.093
Token could
Token ID714
Feature activation-5.60e-02

Pos logit contributions
 castles+0.009
 cities+0.009
 ships+0.008
 oceans+0.008
 gardens+0.008
Neg logit contributions
 word-0.009
 pain-0.009
'.-0.008
 hurts-0.008
 grant-0.008
Token not
Token ID407
Feature activation-5.27e-02

Pos logit contributions
 word+0.089
 apology+0.083
 pain+0.082
'.+0.082
 grant+0.080
Neg logit contributions
 castles-0.106
 cities-0.102
cast-0.101
 forests-0.100
 gardens-0.100
Token some
Token ID617
Feature activation-3.69e-03

Pos logit contributions
aris+0.202
 cities+0.202
 castles+0.201
cast+0.199
 forests+0.196
Neg logit contributions
 word-0.145
 pain-0.132
 apology-0.129
 grant-0.128
 shock-0.123
Token eggs
Token ID9653
Feature activation+3.22e-03

Pos logit contributions
 word+0.007
'.+0.007
 apology+0.007
 grant+0.007
se+0.007
Neg logit contributions
 castles-0.006
 gardens-0.005
 cities-0.005
 fields-0.005
 ships-0.005
Token,
Token ID11
Feature activation+1.04e-01

Pos logit contributions
 castles+0.005
 cities+0.005
 forests+0.005
cast+0.005
 oceans+0.005
Neg logit contributions
 word-0.006
 apology-0.005
 pain-0.005
'.-0.005
 grant-0.005
Token some
Token ID617
Feature activation-1.11e-03

Pos logit contributions
aris+0.197
 cities+0.197
cast+0.194
 castles+0.194
 forests+0.190
Neg logit contributions
 word-0.163
 pain-0.151
 grant-0.149
 apology-0.147
'.-0.141
Token flour
Token ID10601
Feature activation-1.16e-02

Pos logit contributions
 word+0.002
'.+0.002
 apology+0.002
 grant+0.002
 pain+0.002
Neg logit contributions
 castles-0.002
 cities-0.002
 gardens-0.002
 fields-0.002
 forests-0.002
Token,
Token ID11
Feature activation+1.12e-01

Pos logit contributions
 word+0.021
 apology+0.020
 pain+0.020
'.+0.020
 grant+0.019
Neg logit contributions
 castles-0.019
 cities-0.018
 forests-0.018
cast-0.018
 gardens-0.018
Token some
Token ID617
Feature activation+2.88e-03

Pos logit contributions
 cities+0.195
aris+0.194
 castles+0.193
cast+0.191
 forests+0.189
Neg logit contributions
 word-0.187
 pain-0.175
 apology-0.172
 grant-0.171
'.-0.168
Token sugar
Token ID7543
Feature activation-1.02e-02

Pos logit contributions
 castles+0.004
 gardens+0.004
 fields+0.004
 cities+0.004
 mountains+0.004
Neg logit contributions
 word-0.006
 apology-0.005
'.-0.005
 grant-0.005
se-0.005
Token,
Token ID11
Feature activation+1.05e-01

Pos logit contributions
 word+0.019
 apology+0.018
 pain+0.018
'.+0.018
 grant+0.017
Neg logit contributions
 castles-0.017
 cities-0.016
cast-0.016
 gardens-0.016
 forests-0.016
Token some
Token ID617
Feature activation-1.23e-03

Pos logit contributions
 cities+0.187
 castles+0.186
aris+0.182
cast+0.182
 forests+0.181
Neg logit contributions
 word-0.173
 pain-0.160
 apology-0.158
 grant-0.157
'.-0.155
Token butter
Token ID9215
Feature activation-1.35e-02

Pos logit contributions
 word+0.002
 apology+0.002
'.+0.002
 grant+0.002
 pain+0.002
Neg logit contributions
 castles-0.002
 gardens-0.002
 fields-0.002
 cities-0.002
 forests-0.002
Token a
Token ID257
Feature activation-1.34e-02

Pos logit contributions
cast+0.227
 castles+0.223
angers+0.222
aris+0.222
ashing+0.220
Neg logit contributions
 word-0.139
 grant-0.122
 pain-0.118
 value-0.111
 apology-0.111
Token bed
Token ID3996
Feature activation+1.06e-02

Pos logit contributions
 word+0.022
 apology+0.021
'.+0.021
 pain+0.021
 hurts+0.020
Neg logit contributions
 castles-0.024
 cities-0.023
 forests-0.023
 fields-0.023
 gardens-0.022
Token,
Token ID11
Feature activation+1.02e-01

Pos logit contributions
 castles+0.020
 cities+0.019
 forests+0.019
cast+0.019
 oceans+0.019
Neg logit contributions
 word-0.017
 pain-0.015
'.-0.015
 grant-0.015
 apology-0.015
Token a
Token ID257
Feature activation-1.12e-02

Pos logit contributions
cast+0.213
aris+0.210
 castles+0.209
angers+0.209
 cities+0.205
Neg logit contributions
 word-0.140
 grant-0.125
 pain-0.123
 apology-0.117
 shock-0.114
Token sofa
Token ID34902
Feature activation+1.52e-02

Pos logit contributions
 word+0.018
 apology+0.018
'.+0.018
 pain+0.017
 hurts+0.017
Neg logit contributions
 castles-0.020
 cities-0.019
 forests-0.019
 fields-0.019
 gardens-0.019
Token,
Token ID11
Feature activation+1.12e-01

Pos logit contributions
 castles+0.028
cast+0.028
 cities+0.028
 forests+0.027
 Hil+0.027
Neg logit contributions
 word-0.024
'.-0.021
 pain-0.021
 grant-0.021
 value-0.020
Token a
Token ID257
Feature activation-1.16e-02

Pos logit contributions
cast+0.240
aris+0.237
ashing+0.235
angers+0.234
 castles+0.233
Neg logit contributions
 word-0.144
 grant-0.131
 pain-0.126
 shock-0.115
 apology-0.115
Token table
Token ID3084
Feature activation+4.87e-03

Pos logit contributions
 word+0.019
 apology+0.018
'.+0.018
 pain+0.018
 hurts+0.018
Neg logit contributions
 castles-0.021
 cities-0.020
 forests-0.020
 fields-0.019
 gardens-0.019
Token,
Token ID11
Feature activation+1.10e-01

Pos logit contributions
 castles+0.009
 cities+0.009
cast+0.009
 forests+0.009
 oceans+0.009
Neg logit contributions
 word-0.008
 pain-0.007
'.-0.007
 apology-0.007
 grant-0.007
Token a
Token ID257
Feature activation-1.09e-02

Pos logit contributions
cast+0.232
aris+0.230
 castles+0.227
ashing+0.226
angers+0.225
Neg logit contributions
 word-0.146
 grant-0.132
 pain-0.127
 apology-0.118
 shock-0.117
Token book
Token ID1492
Feature activation-1.97e-02

Pos logit contributions
 word+0.018
 apology+0.017
'.+0.017
 pain+0.017
 hurts+0.017
Neg logit contributions
 castles-0.019
 cities-0.019
 forests-0.018
 fields-0.018
 gardens-0.018
Token going
Token ID1016
Feature activation+3.52e-03

Pos logit contributions
angers+0.133
aris+0.122
 castles+0.114
 ships+0.110
udd+0.110
Neg logit contributions
 pain-0.114
 word-0.114
 prove-0.113
 hurts-0.109
 shock-0.107
Token.
Token ID13
Feature activation-1.20e-02

Pos logit contributions
 castles+0.006
 cities+0.006
 gardens+0.006
 ships+0.006
 forests+0.006
Neg logit contributions
 word-0.006
'.-0.006
 apology-0.005
 pain-0.005
 grant-0.005
Token He
Token ID679
Feature activation+9.64e-03

Pos logit contributions
'.+0.019
 word+0.019
 apology+0.017
 grant+0.017
 owe+0.017
Neg logit contributions
 castles-0.024
 cities-0.022
cast-0.022
 gardens-0.022
 ships-0.021
Token crossed
Token ID12606
Feature activation+4.04e-03

Pos logit contributions
angers+0.019
 castles+0.018
aris+0.017
 cities+0.017
cast+0.017
Neg logit contributions
 word-0.015
 value-0.014
 pain-0.013
se-0.013
 grant-0.013
Token rivers
Token ID18180
Feature activation+4.69e-02

Pos logit contributions
angers+0.008
 castles+0.007
aris+0.007
cast+0.007
udd+0.007
Neg logit contributions
 word-0.006
 pain-0.006
'.-0.006
 shock-0.006
 grant-0.006
Token,
Token ID11
Feature activation+1.12e-01

Pos logit contributions
 castles+0.094
angers+0.093
 gardens+0.089
 ships+0.089
yles+0.089
Neg logit contributions
 word-0.063
'.-0.058
se-0.056
 pain-0.053
 grant-0.053
Token mountains
Token ID12269
Feature activation+5.60e-02

Pos logit contributions
angers+0.211
yles+0.182
aj+0.181
aris+0.181
 bedrooms+0.180
Neg logit contributions
 pain-0.156
se-0.152
 bite-0.142
 word-0.141
 shock-0.140
Token and
Token ID290
Feature activation+1.03e-01

Pos logit contributions
angers+0.112
 castles+0.111
yles+0.108
 Hil+0.107
 ships+0.106
Neg logit contributions
 word-0.076
'.-0.068
se-0.065
 pain-0.063
 grant-0.063
Token forests
Token ID17039
Feature activation+6.75e-02

Pos logit contributions
angers+0.140
aj+0.108
ashing+0.104
aris+0.102
iked+0.100
Neg logit contributions
 pain-0.231
 word-0.226
 shock-0.216
 value-0.215
 grant-0.211
Token,
Token ID11
Feature activation+1.06e-01

Pos logit contributions
angers+0.136
 castles+0.136
 ships+0.132
 cities+0.131
 Hil+0.131
Neg logit contributions
 word-0.089
'.-0.081
 pain-0.078
 grant-0.074
se-0.074
Token and
Token ID290
Feature activation+9.65e-02

Pos logit contributions
angers+0.201
aris+0.179
leys+0.177
 Hil+0.176
aj+0.176
Neg logit contributions
 pain-0.148
 word-0.145
 shock-0.136
 control-0.133
 grant-0.133
Token rooms
Token ID9519
Feature activation+3.33e-02

Pos logit contributions
angers+0.028
cast+0.026
aris+0.026
 castles+0.026
leys+0.025
Neg logit contributions
 word-0.035
 pain-0.033
 value-0.032
 apology-0.032
 shock-0.031
Token,
Token ID11
Feature activation+9.35e-02

Pos logit contributions
 Hil+0.070
cast+0.069
 oceans+0.069
 cities+0.069
aris+0.069
Neg logit contributions
 word-0.039
'.-0.033
 value-0.033
 pain-0.032
 am-0.031
Token chairs
Token ID18791
Feature activation+1.72e-02

Pos logit contributions
aris+0.162
angers+0.159
leys+0.158
iked+0.154
cast+0.149
Neg logit contributions
 word-0.161
 pain-0.161
 shock-0.142
 apology-0.141
 value-0.138
Token,
Token ID11
Feature activation+1.02e-01

Pos logit contributions
 cities+0.038
 ships+0.038
 Hil+0.038
aris+0.037
 castles+0.037
Neg logit contributions
 word-0.019
 pain-0.017
 bite-0.016
 shock-0.016
'.-0.015
Token tables
Token ID8893
Feature activation+1.53e-02

Pos logit contributions
aris+0.230
ashing+0.224
rolling+0.219
angers+0.218
yles+0.211
Neg logit contributions
 pain-0.124
 word-0.118
 bite-0.107
 shock-0.101
 hurts-0.100
Token,
Token ID11
Feature activation+1.12e-01

Pos logit contributions
 cities+0.033
aris+0.033
 ships+0.033
 Hil+0.032
 castles+0.032
Neg logit contributions
 word-0.018
 pain-0.016
 shock-0.015
'.-0.015
 bite-0.014
Token and
Token ID290
Feature activation+9.64e-02

Pos logit contributions
aris+0.253
ashing+0.245
arium+0.237
rolling+0.236
angers+0.235
Neg logit contributions
 pain-0.124
 word-0.118
 hurts-0.109
 bite-0.106
 shock-0.099
Token toys
Token ID14958
Feature activation-1.73e-03

Pos logit contributions
aris+0.156
cast+0.154
angers+0.154
ashing+0.152
ules+0.141
Neg logit contributions
 pain-0.173
 word-0.170
 value-0.164
 grant-0.157
 shock-0.156
Token.
Token ID13
Feature activation-1.45e-02

Pos logit contributions
 word+0.001
'.+0.001
 pain+0.001
 value+0.001
 hurts+0.001
Neg logit contributions
 cities-0.005
 castles-0.004
 oceans-0.004
 forests-0.004
cast-0.004
Token She
Token ID1375
Feature activation+9.92e-03

Pos logit contributions
 word+0.021
'.+0.021
 pain+0.019
 apology+0.019
 grant+0.018
Neg logit contributions
 castles-0.030
cast-0.029
 ships-0.027
 cities-0.027
leys-0.026
Token thought
Token ID1807
Feature activation-2.06e-02

Pos logit contributions
aris+0.020
angers+0.020
 cities+0.019
 oceans+0.019
cast+0.019
Neg logit contributions
 word-0.013
 value-0.013
 pain-0.012
ggy-0.012
 am-0.012
Token\n
Token ID198
Feature activation-1.67e-02

Pos logit contributions
 word+0.011
'.+0.010
 apology+0.010
 pain+0.010
 grant+0.010
Neg logit contributions
 castles-0.013
 cities-0.013
cast-0.013
 gardens-0.013
 forests-0.013
TokenM
Token ID44
Feature activation+5.58e-03

Pos logit contributions
 word+0.026
 apology+0.024
 pain+0.024
'.+0.024
 grant+0.023
Neg logit contributions
 castles-0.032
 cities-0.031
cast-0.031
 forests-0.031
 gardens-0.031
Tokenum
Token ID388
Feature activation+3.05e-02

Pos logit contributions
 castles+0.012
cast+0.011
 cities+0.011
 gardens+0.011
 forests+0.011
Neg logit contributions
 word-0.008
 pain-0.007
 grant-0.007
'.-0.007
 apology-0.007
Token cl
Token ID537
Feature activation+2.79e-02

Pos logit contributions
angers+0.060
 cities+0.059
 castles+0.058
 oceans+0.057
 gardens+0.057
Neg logit contributions
 word-0.045
se-0.042
ggy-0.040
 grant-0.040
ggie-0.040
Tokenapped
Token ID6320
Feature activation+3.99e-02

Pos logit contributions
 castles+0.031
 cities+0.029
cast+0.028
 forests+0.028
 gardens+0.028
Neg logit contributions
 word-0.067
 apology-0.064
 pain-0.063
 grant-0.062
'.-0.062
Token and
Token ID290
Feature activation+1.11e-01

Pos logit contributions
 castles+0.066
 cities+0.064
cast+0.064
 forests+0.063
 gardens+0.063
Neg logit contributions
 word-0.074
 pain-0.069
 apology-0.068
'.-0.068
 grant-0.066
Token said
Token ID531
Feature activation+2.83e-02

Pos logit contributions
angers+0.232
 castles+0.224
 cities+0.217
 gardens+0.216
cast+0.216
Neg logit contributions
 word-0.160
 pain-0.150
se-0.146
 shock-0.146
'.-0.142
Token.
Token ID13
Feature activation-1.40e-02

Pos logit contributions
 castles+0.047
 cities+0.045
cast+0.044
 forests+0.044
 gardens+0.044
Neg logit contributions
 word-0.052
 apology-0.049
'.-0.048
 pain-0.048
 grant-0.047
Token "
Token ID366
Feature activation-6.69e-03

Pos logit contributions
 word+0.025
'.+0.024
 apology+0.022
 grant+0.022
 pain+0.022
Neg logit contributions
 castles-0.024
 cities-0.023
 ships-0.023
cast-0.023
 forests-0.022
TokenThank
Token ID10449
Feature activation-2.51e-02

Pos logit contributions
 word+0.010
 apology+0.009
 pain+0.009
 hurts+0.009
'.+0.009
Neg logit contributions
 castles-0.013
 cities-0.013
 forests-0.013
 oceans-0.013
 fields-0.013
Token You
Token ID921
Feature activation-6.55e-02

Pos logit contributions
 word+0.048
 apology+0.046
 pain+0.046
 hurts+0.045
 owe+0.044
Neg logit contributions
 castles-0.038
 cities-0.037
 forests-0.036
 oceans-0.036
 mountains-0.036
Token added
Token ID2087
Feature activation+3.62e-02

Pos logit contributions
 castles+0.029
 cities+0.028
 oceans+0.028
 forests+0.027
cast+0.027
Neg logit contributions
 word-0.026
 grant-0.024
 pain-0.024
'.-0.024
 apology-0.024
Token a
Token ID257
Feature activation-1.68e-02

Pos logit contributions
 castles+0.060
 cities+0.058
cast+0.057
 forests+0.057
 gardens+0.057
Neg logit contributions
 word-0.065
 apology-0.061
 pain-0.061
'.-0.060
 grant-0.059
Token handful
Token ID10089
Feature activation-6.56e-03

Pos logit contributions
 apology+0.030
 pain+0.030
 word+0.029
 owe+0.029
 hurts+0.029
Neg logit contributions
 castles-0.026
 fields-0.024
 gardens-0.024
 cities-0.024
 upon-0.024
Token of
Token ID286
Feature activation+1.49e-02

Pos logit contributions
 grant+0.012
 word+0.012
 apology+0.012
 owe+0.011
 pain+0.011
Neg logit contributions
 castles-0.011
cast-0.010
 fields-0.010
 mountains-0.010
 gardens-0.010
Token flour
Token ID10601
Feature activation-1.52e-02

Pos logit contributions
 castles+0.025
 cities+0.024
 gardens+0.023
 fields+0.023
 forests+0.023
Neg logit contributions
 word-0.027
 apology-0.025
'.-0.025
 pain-0.025
 grant-0.025
Token,
Token ID11
Feature activation+1.11e-01

Pos logit contributions
 word+0.029
 apology+0.028
 pain+0.028
 grant+0.027
 owe+0.027
Neg logit contributions
 castles-0.024
cast-0.022
 cities-0.022
 fields-0.022
 gardens-0.022
Token a
Token ID257
Feature activation-1.18e-02

Pos logit contributions
 castles+0.184
 cities+0.183
aris+0.183
cast+0.180
 forests+0.178
Neg logit contributions
 word-0.196
 pain-0.183
 apology-0.179
 grant-0.178
'.-0.176
Token few
Token ID1178
Feature activation+1.85e-02

Pos logit contributions
 apology+0.020
 word+0.020
'.+0.020
 pain+0.019
 hurts+0.019
Neg logit contributions
 castles-0.019
 fields-0.018
 cities-0.018
 gardens-0.018
 forests-0.018
Token eggs
Token ID9653
Feature activation+2.17e-03

Pos logit contributions
 castles+0.029
 cities+0.028
 gardens+0.027
 forests+0.027
 fields+0.027
Neg logit contributions
 word-0.035
 apology-0.033
'.-0.033
 grant-0.033
 pain-0.032
Token,
Token ID11
Feature activation+1.02e-01

Pos logit contributions
 castles+0.003
cast+0.003
 cities+0.003
 gardens+0.003
 forests+0.003
Neg logit contributions
 word-0.004
 apology-0.004
 pain-0.004
'.-0.004
 grant-0.004
Token and
Token ID290
Feature activation+9.04e-02

Pos logit contributions
 castles+0.179
 cities+0.176
cast+0.174
aris+0.173
 forests+0.171
Neg logit contributions
 word-0.174
 pain-0.160
 grant-0.159
'.-0.158
 apology-0.158
Token some
Token ID617
Feature activation-6.37e-03

Pos logit contributions
 cities+0.185
 castles+0.185
cast+0.182
 forests+0.180
aris+0.178
Neg logit contributions
 word-0.122
 pain-0.112
 apology-0.108
 grant-0.108
 shock-0.104
Token eggs
Token ID9653
Feature activation-1.42e-05

Pos logit contributions
 word+0.013
'.+0.012
 apology+0.012
 grant+0.012
 pain+0.012
Neg logit contributions
 castles-0.009
 gardens-0.009
 cities-0.009
 fields-0.009
 mountains-0.008
Token,
Token ID11
Feature activation+9.87e-02

Pos logit contributions
 word+0.000
'.+0.000
 apology+0.000
 pain+0.000
 hurts+0.000
Neg logit contributions
 castles-0.000
 cities-0.000
 forests-0.000
 oceans-0.000
cast-0.000
Token some
Token ID617
Feature activation-1.75e-03

Pos logit contributions
 cities+0.195
aris+0.194
cast+0.193
 castles+0.190
 forests+0.188
Neg logit contributions
 word-0.143
 pain-0.132
 grant-0.130
 apology-0.130
'.-0.125
Token flour
Token ID10601
Feature activation-1.17e-02

Pos logit contributions
 word+0.004
 apology+0.003
'.+0.003
 pain+0.003
 grant+0.003
Neg logit contributions
 castles-0.003
 cities-0.002
 gardens-0.002
 fields-0.002
 forests-0.002
Token,
Token ID11
Feature activation+1.11e-01

Pos logit contributions
 word+0.020
'.+0.019
 pain+0.018
 apology+0.018
 hurts+0.018
Neg logit contributions
 castles-0.020
 cities-0.020
 forests-0.020
 oceans-0.019
 gardens-0.019
Token some
Token ID617
Feature activation+5.76e-03

Pos logit contributions
 cities+0.201
aris+0.200
 castles+0.196
cast+0.195
 forests+0.194
Neg logit contributions
 word-0.174
 pain-0.163
 apology-0.160
 grant-0.158
'.-0.158
Token sugar
Token ID7543
Feature activation-9.79e-03

Pos logit contributions
 castles+0.008
 gardens+0.008
 fields+0.007
 cities+0.007
 mountains+0.007
Neg logit contributions
 word-0.012
 apology-0.011
'.-0.011
 pain-0.011
 grant-0.011
Token,
Token ID11
Feature activation+1.05e-01

Pos logit contributions
 word+0.017
'.+0.016
 apology+0.016
 pain+0.016
 hurts+0.015
Neg logit contributions
 castles-0.017
 cities-0.016
 forests-0.016
cast-0.016
 oceans-0.016
Token and
Token ID290
Feature activation+9.15e-02

Pos logit contributions
 cities+0.201
aris+0.200
 castles+0.196
cast+0.195
 forests+0.195
Neg logit contributions
 word-0.157
 pain-0.144
 apology-0.142
'.-0.140
 grant-0.140
Token some
Token ID617
Feature activation+3.06e-04

Pos logit contributions
 castles+0.161
 cities+0.158
cast+0.158
 forests+0.154
 oceans+0.152
Neg logit contributions
 word-0.155
 pain-0.143
 apology-0.142
'.-0.141
 grant-0.139
Token baby
Token ID5156
Feature activation-1.41e-02

Pos logit contributions
 word+0.031
 apology+0.030
 grant+0.029
'.+0.029
 owe+0.029
Neg logit contributions
 castles-0.035
 cities-0.034
 forests-0.033
 fields-0.033
 gardens-0.033
Token.
Token ID13
Feature activation-7.07e-03

Pos logit contributions
 word+0.022
 apology+0.022
 pain+0.021
'.+0.021
 grant+0.021
Neg logit contributions
 castles-0.027
 cities-0.026
 gardens-0.025
 forests-0.025
 fields-0.025
Token Tommy
Token ID19919
Feature activation-1.09e-02

Pos logit contributions
 word+0.013
'.+0.012
 apology+0.012
 pain+0.012
 owe+0.012
Neg logit contributions
 castles-0.012
 cities-0.012
cast-0.011
 forests-0.011
 gardens-0.011
Token cl
Token ID537
Feature activation+2.40e-02

Pos logit contributions
 word+0.017
'.+0.016
 hurts+0.016
 grant+0.016
 pain+0.016
Neg logit contributions
 castles-0.021
 cities-0.020
 oceans-0.020
 forests-0.020
 gardens-0.020
Tokenapped
Token ID6320
Feature activation+4.27e-02

Pos logit contributions
 castles+0.027
 cities+0.025
cast+0.025
 forests+0.024
 gardens+0.024
Neg logit contributions
 word-0.057
 apology-0.054
 pain-0.054
'.-0.053
 grant-0.053
Token and
Token ID290
Feature activation+1.11e-01

Pos logit contributions
 castles+0.079
 cities+0.078
angers+0.078
 gardens+0.078
 forests+0.077
Neg logit contributions
 word-0.069
 pain-0.063
 apology-0.063
'.-0.061
 shock-0.060
Token cheered
Token ID34767
Feature activation+4.26e-02

Pos logit contributions
angers+0.196
 castles+0.195
 cities+0.191
 gardens+0.188
cast+0.186
Neg logit contributions
 word-0.189
 pain-0.180
 hurts-0.173
 apology-0.170
 shock-0.170
Token -
Token ID532
Feature activation+4.65e-02

Pos logit contributions
 castles+0.086
 cities+0.084
 forests+0.082
 gardens+0.082
cast+0.082
Neg logit contributions
 word-0.063
 apology-0.057
'.-0.057
 pain-0.057
 grant-0.055
Token it
Token ID340
Feature activation-1.31e-02

Pos logit contributions
 castles+0.075
 cities+0.072
 gardens+0.072
 forests+0.071
angers+0.070
Neg logit contributions
 word-0.088
'.-0.082
 apology-0.081
 grant-0.080
 pain-0.080
Token was
Token ID373
Feature activation-7.17e-03

Pos logit contributions
 word+0.020
 apology+0.020
 pain+0.019
'.+0.019
 grant+0.018
Neg logit contributions
 castles-0.025
cast-0.024
 cities-0.024
 forests-0.024
 gardens-0.023
Token a
Token ID257
Feature activation-3.62e-02

Pos logit contributions
 word+0.013
 pain+0.012
 apology+0.012
'.+0.012
 grant+0.011
Neg logit contributions
 castles-0.012
 cities-0.012
cast-0.012
 forests-0.012
 gardens-0.012
Token some
Token ID617
Feature activation-3.53e-03

Pos logit contributions
 castles+0.194
 cities+0.194
cast+0.191
 forests+0.189
aris+0.187
Neg logit contributions
 word-0.119
 pain-0.108
 grant-0.104
 apology-0.104
'.-0.101
Token eggs
Token ID9653
Feature activation+2.44e-03

Pos logit contributions
 word+0.007
'.+0.007
 apology+0.007
 grant+0.007
se+0.007
Neg logit contributions
 castles-0.005
 gardens-0.005
 fields-0.005
 cities-0.005
 ships-0.005
Token,
Token ID11
Feature activation+1.01e-01

Pos logit contributions
 castles+0.004
 cities+0.004
 forests+0.004
cast+0.004
 oceans+0.004
Neg logit contributions
 word-0.004
 apology-0.004
'.-0.004
 pain-0.004
 grant-0.004
Token some
Token ID617
Feature activation+1.44e-03

Pos logit contributions
 cities+0.198
aris+0.196
cast+0.196
 castles+0.194
 forests+0.191
Neg logit contributions
 word-0.148
 pain-0.137
 grant-0.135
 apology-0.134
'.-0.130
Token flour
Token ID10601
Feature activation-1.09e-02

Pos logit contributions
 castles+0.002
 cities+0.002
 gardens+0.002
 fields+0.002
 forests+0.002
Neg logit contributions
 word-0.003
 apology-0.003
'.-0.003
 pain-0.003
 grant-0.003
Token,
Token ID11
Feature activation+1.11e-01

Pos logit contributions
 word+0.019
'.+0.018
 pain+0.018
 apology+0.018
 hurts+0.017
Neg logit contributions
 castles-0.019
 cities-0.018
 forests-0.018
cast-0.018
 gardens-0.018
Token some
Token ID617
Feature activation+6.68e-03

Pos logit contributions
 cities+0.203
aris+0.203
 castles+0.198
cast+0.197
 forests+0.195
Neg logit contributions
 word-0.172
 pain-0.162
 apology-0.157
 grant-0.156
'.-0.155
Token sugar
Token ID7543
Feature activation-8.29e-03

Pos logit contributions
 castles+0.009
 gardens+0.008
 fields+0.008
 cities+0.008
 mountains+0.008
Neg logit contributions
 word-0.014
 apology-0.014
'.-0.013
 pain-0.013
 grant-0.013
Token,
Token ID11
Feature activation+1.05e-01

Pos logit contributions
 word+0.015
 apology+0.014
 pain+0.014
'.+0.014
 grant+0.013
Neg logit contributions
 castles-0.014
 cities-0.014
 forests-0.013
cast-0.013
 gardens-0.013
Token and
Token ID290
Feature activation+9.26e-02

Pos logit contributions
 cities+0.201
aris+0.199
 castles+0.197
cast+0.196
 forests+0.194
Neg logit contributions
 word-0.158
 pain-0.144
 apology-0.142
 grant-0.141
'.-0.141
Token some
Token ID617
Feature activation+2.12e-03

Pos logit contributions
 castles+0.164
 cities+0.161
cast+0.160
 forests+0.157
 oceans+0.155
Neg logit contributions
 word-0.156
 pain-0.144
 apology-0.144
'.-0.142
 grant-0.140
Token a
Token ID257
Feature activation-3.20e-02

Pos logit contributions
 castles+0.107
 cities+0.103
cast+0.102
 forests+0.101
 oceans+0.100
Neg logit contributions
 word-0.128
 apology-0.119
 pain-0.119
'.-0.118
 grant-0.117
Token penchant
Token ID41773
Feature activation-1.52e-03

Pos logit contributions
 word+0.056
 apology+0.054
'.+0.053
 pain+0.052
 grant+0.051
Neg logit contributions
 castles-0.053
 cities-0.052
 forests-0.051
 fields-0.051
 gardens-0.050
Token for
Token ID329
Feature activation-4.22e-02

Pos logit contributions
 word+0.003
 apology+0.003
'.+0.003
 pain+0.003
 grant+0.003
Neg logit contributions
 castles-0.002
 cities-0.002
 forests-0.002
 gardens-0.002
cast-0.002
Token things
Token ID1243
Feature activation-2.73e-02

Pos logit contributions
 word+0.070
 pain+0.066
 apology+0.065
'.+0.063
 grant+0.063
Neg logit contributions
 castles-0.077
cast-0.076
 cities-0.075
angers-0.073
 forests-0.072
Token she
Token ID673
Feature activation-2.70e-03

Pos logit contributions
 word+0.040
 apology+0.037
'.+0.036
 pain+0.036
 grant+0.035
Neg logit contributions
 castles-0.055
 cities-0.053
cast-0.052
 forests-0.052
 gardens-0.052
Token shouldn
Token ID6584
Feature activation-1.31e-01

Pos logit contributions
 word+0.005
 apology+0.005
 pain+0.005
'.+0.005
 grant+0.005
Neg logit contributions
 castles-0.004
 cities-0.004
cast-0.004
 gardens-0.004
 forests-0.004
Token't
Token ID470
Feature activation-6.13e-02

Pos logit contributions
 pain+0.198
 apology+0.196
 word+0.192
 trigger+0.191
 shock+0.183
Neg logit contributions
 upon-0.237
cast-0.219
 castles-0.218
 Hil-0.216
angers-0.215
Token touch
Token ID3638
Feature activation-2.00e-02

Pos logit contributions
 apology+0.085
 pain+0.084
 word+0.082
 trigger+0.082
 hurts+0.081
Neg logit contributions
 upon-0.111
 castles-0.110
 cities-0.109
cast-0.108
 gardens-0.108
Token.
Token ID13
Feature activation-2.86e-02

Pos logit contributions
 word+0.027
 apology+0.026
 grant+0.025
 pain+0.025
 owe+0.024
Neg logit contributions
 castles-0.042
 cities-0.041
 forests-0.041
 gardens-0.041
 fields-0.041
Token One
Token ID1881
Feature activation-4.34e-03

Pos logit contributions
'.+0.040
 word+0.037
 value+0.035
 apology+0.033
 grant+0.032
Neg logit contributions
 castles-0.062
cast-0.060
 cities-0.058
 gardens-0.055
 deserts-0.055
Token day
Token ID1110
Feature activation+1.06e-02

Pos logit contributions
 word+0.007
 pain+0.007
 apology+0.007
'.+0.006
 hurts+0.006
Neg logit contributions
 castles-0.008
cast-0.008
 cities-0.008
angers-0.008
 oceans-0.008
Token and
Token ID290
Feature activation+6.76e-02

Pos logit contributions
 word+0.002
 apology+0.002
'.+0.002
 pain+0.002
 grant+0.002
Neg logit contributions
 castles-0.003
 cities-0.003
cast-0.002
 forests-0.002
 gardens-0.002
Token not
Token ID407
Feature activation-5.26e-02

Pos logit contributions
cast+0.126
 forests+0.124
 castles+0.124
 oceans+0.124
 cities+0.124
Neg logit contributions
 word-0.110
 apology-0.103
 pain-0.102
 value-0.100
 hurts-0.100
Token touch
Token ID3638
Feature activation-1.08e-02

Pos logit contributions
 word+0.071
 apology+0.069
'.+0.066
 grant+0.064
 pain+0.063
Neg logit contributions
 castles-0.110
 cities-0.105
cast-0.103
 gardens-0.103
 fields-0.103
Token things
Token ID1243
Feature activation-2.39e-02

Pos logit contributions
 word+0.021
 apology+0.020
 grant+0.020
'.+0.019
 pain+0.019
Neg logit contributions
 castles-0.017
 cities-0.016
 fields-0.016
 gardens-0.016
 forests-0.016
Token she
Token ID673
Feature activation+4.93e-03

Pos logit contributions
 word+0.041
 apology+0.039
 pain+0.038
 grant+0.038
'.+0.037
Neg logit contributions
 castles-0.042
cast-0.040
 cities-0.040
 gardens-0.039
 forests-0.039
Token shouldn
Token ID6584
Feature activation-1.27e-01

Pos logit contributions
 castles+0.009
cast+0.008
 gardens+0.008
 cities+0.008
 fields+0.008
Neg logit contributions
 word-0.008
 apology-0.008
'.-0.008
 pain-0.008
 grant-0.008
Token't
Token ID470
Feature activation-6.28e-02

Pos logit contributions
 word+0.218
 pain+0.216
 apology+0.213
 trigger+0.209
 shock+0.206
Neg logit contributions
 upon-0.194
cast-0.188
 castles-0.186
 Hil-0.175
angers-0.171
Token.
Token ID13
Feature activation-1.69e-02

Pos logit contributions
 apology+0.092
 grant+0.092
 word+0.091
 pain+0.088
se+0.087
Neg logit contributions
 castles-0.113
cast-0.110
 upon-0.106
line-0.104
 they-0.103
Token<|endoftext|>
Token ID50256
Feature activation+6.18e-03

Pos logit contributions
 word+0.023
'.+0.023
 apology+0.022
 pain+0.021
 hurts+0.021
Neg logit contributions
 castles-0.037
cast-0.035
 cities-0.035
 oceans-0.034
 ships-0.034
TokenOne
Token ID3198
Feature activation+3.49e-03

Pos logit contributions
 upon+0.007
 forests+0.007
 mountains+0.007
 fields+0.007
cast+0.007
Neg logit contributions
 apology-0.014
 word-0.014
 trigger-0.013
 grant-0.013
 pain-0.013
Token day
Token ID1110
Feature activation+2.54e-02

Pos logit contributions
 castles+0.005
 cities+0.005
 fields+0.005
 gardens+0.005
 forests+0.005
Neg logit contributions
 word-0.007
 apology-0.006
 grant-0.006
'.-0.006
 pain-0.006
Token
Token ID8151
Feature activation-4.33e-02

Pos logit contributions
 word+0.101
'.+0.097
 pain+0.091
 hurts+0.091
 owe+0.091
Neg logit contributions
 castles-0.127
 cities-0.122
 forests-0.119
 gardens-0.119
 oceans-0.119
Tokent
Token ID83
Feature activation-7.25e-02

Pos logit contributions
 word+0.062
 pain+0.057
 apology+0.057
 grant+0.055
'.+0.054
Neg logit contributions
 castles-0.087
cast-0.085
 cities-0.085
 forests-0.083
 gardens-0.083
Token you
Token ID345
Feature activation-8.07e-02

Pos logit contributions
 word+0.098
 apology+0.089
 pain+0.086
 grant+0.085
'.+0.082
Neg logit contributions
 castles-0.150
 cities-0.146
cast-0.144
 ships-0.143
 forests-0.143
Token know
Token ID760
Feature activation-5.02e-02

Pos logit contributions
 word+0.120
 pain+0.114
 apology+0.108
 hurts+0.106
 shock+0.104
Neg logit contributions
 upon-0.151
 castles-0.147
cast-0.146
 cities-0.145
 ships-0.143
Token you
Token ID345
Feature activation-7.51e-02

Pos logit contributions
 word+0.087
'.+0.082
 apology+0.081
 pain+0.079
 hurts+0.078
Neg logit contributions
 castles-0.087
 cities-0.084
cast-0.084
 forests-0.083
 oceans-0.083
Token shouldn
Token ID6584
Feature activation-1.27e-01

Pos logit contributions
 word+0.115
 pain+0.113
 apology+0.112
 hurts+0.105
 shock+0.104
Neg logit contributions
 castles-0.136
 cities-0.134
cast-0.133
 upon-0.133
 ships-0.132
Tokenâ
Token ID22940
Feature activation-3.40e-02

Pos logit contributions
 apology+0.234
 word+0.219
 pain+0.219
 trigger+0.218
rum+0.209
Neg logit contributions
 Hil-0.171
 upon-0.166
 fields-0.166
 exploring-0.163
 mountains-0.162
Token
Token ID26391
Feature activation-6.88e-02

Pos logit contributions
 word+0.048
 shock+0.044
 pain+0.044
 trigger+0.043
 hurts+0.042
Neg logit contributions
 castles-0.062
cast-0.061
 gardens-0.060
 cities-0.060
line-0.060
Token
Token ID8151
Feature activation-4.67e-02

Pos logit contributions
 word+0.106
'.+0.102
 pain+0.096
 owe+0.095
 hurts+0.095
Neg logit contributions
 castles-0.135
 cities-0.129
 forests-0.127
 oceans-0.126
 gardens-0.126
Tokent
Token ID83
Feature activation-6.56e-02

Pos logit contributions
 word+0.063
 pain+0.059
 apology+0.059
 grant+0.057
 shock+0.055
Neg logit contributions
 castles-0.097
cast-0.095
 cities-0.094
 gardens-0.092
 forests-0.092
Token speed
Token ID2866
Feature activation+2.94e-03

Pos logit contributions
 word+0.084
 apology+0.083
 pain+0.082
 trigger+0.081
 grant+0.080
Neg logit contributions
 castles-0.130
 upon-0.127
cast-0.126
 cities-0.126
 gardens-0.125
Token to
Token ID284
Feature activation-3.00e-02

Pos logit contributions
 word+0.009
 apology+0.008
'.+0.008
 pain+0.008
 hurts+0.008
Neg logit contributions
 castles-0.009
 cities-0.009
cast-0.009
 forests-0.009
 oceans-0.009
Token play
Token ID711
Feature activation-7.04e-03

Pos logit contributions
 word+0.044
 apology+0.042
 hurts+0.041
 trigger+0.041
se+0.039
Neg logit contributions
 castles-0.059
 gardens-0.056
 planting-0.055
 fields-0.055
 cities-0.054
Token with
Token ID351
Feature activation-7.81e-03

Pos logit contributions
 word+0.012
 apology+0.011
 pain+0.011
 hurts+0.011
 owe+0.011
Neg logit contributions
 castles-0.013
 gardens-0.012
 cities-0.012
cast-0.012
 forests-0.012
Token things
Token ID1243
Feature activation-1.88e-02

Pos logit contributions
 word+0.014
 pain+0.013
 apology+0.013
'.+0.013
 grant+0.013
Neg logit contributions
 castles-0.013
 cities-0.013
cast-0.012
 forests-0.012
 oceans-0.012
Token she
Token ID673
Feature activation+8.80e-03

Pos logit contributions
 word+0.031
'.+0.029
 apology+0.029
 pain+0.029
 grant+0.028
Neg logit contributions
 castles-0.034
 cities-0.033
 forests-0.033
cast-0.033
 oceans-0.032
Token shouldn
Token ID6584
Feature activation-1.26e-01

Pos logit contributions
 castles+0.014
 cities+0.013
cast+0.013
 forests+0.013
 gardens+0.013
Neg logit contributions
 word-0.017
 apology-0.016
 pain-0.016
'.-0.016
 grant-0.015
Token't
Token ID470
Feature activation-6.01e-02

Pos logit contributions
 trigger+0.204
 word+0.203
 pain+0.203
 apology+0.196
 shock+0.192
Neg logit contributions
 upon-0.208
cast-0.189
 castles-0.189
 Hil-0.183
line-0.178
Token touch
Token ID3638
Feature activation-2.22e-02

Pos logit contributions
 word+0.090
 apology+0.088
 grant+0.087
 pain+0.084
 trigger+0.084
Neg logit contributions
 castles-0.105
 upon-0.101
cast-0.099
 they-0.098
line-0.098
Token.
Token ID13
Feature activation-1.98e-02

Pos logit contributions
 word+0.035
 apology+0.033
 pain+0.032
 grant+0.032
'.+0.032
Neg logit contributions
 castles-0.042
 cities-0.041
 gardens-0.040
cast-0.040
 forests-0.040
Token<|endoftext|>
Token ID50256
Feature activation+1.02e-02

Pos logit contributions
'.+0.026
 word+0.026
 apology+0.024
 value+0.023
 grant+0.022
Neg logit contributions
 castles-0.045
 cities-0.043
cast-0.043
 oceans-0.042
 forests-0.042
TokenOnce
Token ID7454
Feature activation+7.83e-02

Pos logit contributions
 forests+0.014
 fields+0.013
 upon+0.013
 mountains+0.013
 castles+0.013
Neg logit contributions
 apology-0.021
 word-0.021
 pain-0.020
 grant-0.020
 trigger-0.019
Token to
Token ID284
Feature activation-3.04e-02

Pos logit contributions
 word+0.010
 apology+0.010
'.+0.010
 pain+0.010
 grant+0.009
Neg logit contributions
 castles-0.011
 cities-0.010
cast-0.010
 forests-0.010
 oceans-0.010
Token play
Token ID711
Feature activation-7.39e-03

Pos logit contributions
 word+0.044
 apology+0.042
 trigger+0.042
 hurts+0.041
se+0.040
Neg logit contributions
 castles-0.060
 gardens-0.057
 planting-0.056
 fields-0.056
 cities-0.055
Token with
Token ID351
Feature activation-8.64e-03

Pos logit contributions
 word+0.013
 apology+0.012
 hurts+0.012
 owe+0.012
 pain+0.012
Neg logit contributions
 castles-0.013
 gardens-0.012
 cities-0.012
cast-0.012
 fields-0.012
Token things
Token ID1243
Feature activation-1.76e-02

Pos logit contributions
 word+0.016
 apology+0.015
 pain+0.015
'.+0.015
 grant+0.014
Neg logit contributions
 castles-0.014
 cities-0.014
cast-0.013
 forests-0.013
 oceans-0.013
Token she
Token ID673
Feature activation+9.31e-03

Pos logit contributions
 word+0.030
'.+0.028
 apology+0.028
 pain+0.028
 grant+0.027
Neg logit contributions
 castles-0.031
 cities-0.030
 forests-0.029
cast-0.029
 gardens-0.029
Token shouldn
Token ID6584
Feature activation-1.26e-01

Pos logit contributions
 castles+0.015
 cities+0.014
cast+0.014
 forests+0.014
 gardens+0.014
Neg logit contributions
 word-0.018
 apology-0.017
 pain-0.017
'.-0.017
 grant-0.016
Token't
Token ID470
Feature activation-5.80e-02

Pos logit contributions
 trigger+0.204
 word+0.203
 pain+0.203
 apology+0.195
 shock+0.195
Neg logit contributions
 upon-0.207
cast-0.189
 castles-0.188
 Hil-0.179
line-0.178
Token touch
Token ID3638
Feature activation-2.07e-02

Pos logit contributions
 word+0.088
 apology+0.085
 grant+0.085
se+0.081
 pain+0.081
Neg logit contributions
 castles-0.102
cast-0.098
 upon-0.098
line-0.095
 ships-0.094
Token.
Token ID13
Feature activation-1.85e-02

Pos logit contributions
 word+0.032
 apology+0.031
 grant+0.029
 pain+0.029
'.+0.029
Neg logit contributions
 castles-0.040
 cities-0.038
 gardens-0.038
 forests-0.038
cast-0.038
Token<|endoftext|>
Token ID50256
Feature activation+1.18e-02

Pos logit contributions
 word+0.026
'.+0.025
 apology+0.024
 grant+0.023
 value+0.023
Neg logit contributions
 castles-0.040
 cities-0.039
 oceans-0.038
cast-0.038
 forests-0.038
TokenOnce
Token ID7454
Feature activation+8.04e-02

Pos logit contributions
 forests+0.016
 fields+0.016
 upon+0.016
 mountains+0.016
 castles+0.015
Neg logit contributions
 apology-0.024
 word-0.024
 pain-0.023
 grant-0.023
 hurts-0.023
Token She
Token ID1375
Feature activation+1.46e-03

Pos logit contributions
 word+0.072
'.+0.067
 pain+0.066
 apology+0.065
 value+0.064
Neg logit contributions
 castles-0.062
cast-0.060
 cities-0.058
 ships-0.058
 oceans-0.056
Token did
Token ID750
Feature activation-4.68e-02

Pos logit contributions
 castles+0.002
 cities+0.002
cast+0.002
 gardens+0.002
 forests+0.002
Neg logit contributions
 word-0.003
 apology-0.003
'.-0.003
 pain-0.003
 grant-0.003
Token many
Token ID867
Feature activation+1.03e-02

Pos logit contributions
 word+0.066
 pain+0.061
 hurts+0.060
 value+0.057
ggy+0.057
Neg logit contributions
 cities-0.099
 oceans-0.098
 ships-0.097
 forests-0.097
cast-0.097
Token things
Token ID1243
Feature activation-2.56e-02

Pos logit contributions
 oceans+0.019
 cities+0.019
cast+0.019
 castles+0.018
 forests+0.018
Neg logit contributions
 word-0.017
 pain-0.016
 apology-0.016
 hurts-0.015
 value-0.015
Token she
Token ID673
Feature activation+3.61e-03

Pos logit contributions
 word+0.040
 pain+0.037
 hurts+0.036
'.+0.036
 apology+0.036
Neg logit contributions
 castles-0.049
 cities-0.048
 forests-0.047
 oceans-0.047
 fields-0.046
Token shouldn
Token ID6584
Feature activation-1.24e-01

Pos logit contributions
 castles+0.006
 cities+0.006
 forests+0.006
 oceans+0.006
cast+0.006
Neg logit contributions
 word-0.007
 pain-0.006
 hurts-0.006
 apology-0.006
 grant-0.006
Token't
Token ID470
Feature activation-5.96e-02

Pos logit contributions
 word+0.170
 apology+0.160
 pain+0.155
'.+0.146
 trigger+0.146
Neg logit contributions
 castles-0.253
cast-0.242
 upon-0.242
 gardens-0.238
 fields-0.237
Token have
Token ID423
Feature activation-6.30e-02

Pos logit contributions
 word+0.083
 apology+0.081
 pain+0.079
 grant+0.076
 trigger+0.076
Neg logit contributions
 castles-0.117
 gardens-0.111
cast-0.111
 cities-0.110
 fields-0.110
Token done
Token ID1760
Feature activation-5.97e-02

Pos logit contributions
 word+0.081
 pain+0.073
 apology+0.073
'.+0.073
 grant+0.071
Neg logit contributions
 castles-0.139
 cities-0.135
 forests-0.133
cast-0.133
 oceans-0.132
Token,
Token ID11
Feature activation+5.04e-02

Pos logit contributions
 word+0.093
'.+0.084
 pain+0.084
 hurts+0.081
 value+0.080
Neg logit contributions
 castles-0.114
 cities-0.113
 forests-0.113
 oceans-0.112
 fields-0.110
Token like
Token ID588
Feature activation-1.82e-02

Pos logit contributions
aris+0.084
 castles+0.081
cast+0.080
 cities+0.080
angers+0.080
Neg logit contributions
 word-0.092
 pain-0.089
 grant-0.083
 hurts-0.082
 shock-0.081
Token did
Token ID750
Feature activation-4.70e-02

Pos logit contributions
 word+0.016
 pain+0.015
 apology+0.015
'.+0.015
 hurts+0.015
Neg logit contributions
 castles-0.014
 cities-0.014
cast-0.013
 forests-0.013
 oceans-0.013
Token lots
Token ID6041
Feature activation-9.02e-03

Pos logit contributions
 word+0.074
 pain+0.068
 apology+0.066
 hurts+0.065
 grant+0.064
Neg logit contributions
 castles-0.090
 cities-0.090
cast-0.089
 oceans-0.088
 forests-0.088
Token of
Token ID286
Feature activation-6.06e-03

Pos logit contributions
 word+0.013
 apology+0.012
 pain+0.012
'.+0.012
 hurts+0.012
Neg logit contributions
 castles-0.018
 cities-0.018
 forests-0.018
cast-0.017
 gardens-0.017
Token things
Token ID1243
Feature activation-2.06e-02

Pos logit contributions
 word+0.011
 pain+0.010
 apology+0.010
 value+0.009
 hurts+0.009
Neg logit contributions
 castles-0.011
cast-0.011
 cities-0.010
 oceans-0.010
angers-0.010
Token he
Token ID339
Feature activation-1.14e-03

Pos logit contributions
 word+0.033
 pain+0.030
 hurts+0.030
'.+0.030
 apology+0.030
Neg logit contributions
 castles-0.038
 cities-0.038
 forests-0.037
 oceans-0.037
 gardens-0.037
Token shouldn
Token ID6584
Feature activation-1.24e-01

Pos logit contributions
 word+0.002
 pain+0.002
 apology+0.002
'.+0.002
 hurts+0.002
Neg logit contributions
 castles-0.002
 cities-0.002
 forests-0.002
 oceans-0.002
cast-0.002
Token't
Token ID470
Feature activation-5.94e-02

Pos logit contributions
 word+0.177
 apology+0.166
 pain+0.164
 trigger+0.155
 shock+0.152
Neg logit contributions
 castles-0.242
 upon-0.235
cast-0.233
 cities-0.226
 forests-0.225
Token do
Token ID466
Feature activation-7.37e-02

Pos logit contributions
 pain+0.083
 word+0.083
 apology+0.082
 trigger+0.080
 hurts+0.079
Neg logit contributions
 castles-0.109
 upon-0.105
 gardens-0.104
cast-0.104
 cities-0.104
Token.
Token ID13
Feature activation-3.18e-02

Pos logit contributions
 word+0.103
 apology+0.097
 pain+0.095
 grant+0.092
'.+0.092
Neg logit contributions
 castles-0.154
cast-0.147
 cities-0.146
 gardens-0.145
 forests-0.143
Token One
Token ID1881
Feature activation-5.29e-03

Pos logit contributions
 word+0.046
'.+0.042
 apology+0.040
 value+0.040
 pain+0.038
Neg logit contributions
 castles-0.067
cast-0.065
 cities-0.064
 ships-0.062
 gardens-0.062
Token day
Token ID1110
Feature activation+8.56e-03

Pos logit contributions
 word+0.010
 apology+0.009
 pain+0.009
'.+0.009
 grant+0.009
Neg logit contributions
 castles-0.009
 cities-0.009
cast-0.008
 forests-0.008
 gardens-0.008
Token She
Token ID1375
Feature activation+1.32e-02

Pos logit contributions
 word+0.019
'.+0.018
 apology+0.017
 grant+0.017
 pain+0.017
Neg logit contributions
 castles-0.019
cast-0.018
 gardens-0.018
 cities-0.018
 ships-0.017
Token takes
Token ID2753
Feature activation+3.00e-03

Pos logit contributions
 castles+0.026
 cities+0.025
cast+0.025
 forests+0.025
 gardens+0.025
Neg logit contributions
 word-0.020
 apology-0.018
 pain-0.018
'.-0.018
 grant-0.017
Token out
Token ID503
Feature activation+6.11e-03

Pos logit contributions
 castles+0.005
 cities+0.005
cast+0.005
 forests+0.005
 gardens+0.005
Neg logit contributions
 word-0.005
 apology-0.005
 pain-0.005
'.-0.005
 grant-0.005
Token a
Token ID257
Feature activation-1.95e-02

Pos logit contributions
cast+0.014
 castles+0.014
 cities+0.013
aris+0.013
ashing+0.013
Neg logit contributions
 word-0.008
 pain-0.007
 apology-0.007
 shock-0.007
 value-0.006
Token magn
Token ID7842
Feature activation-1.05e-03

Pos logit contributions
 word+0.032
 apology+0.030
'.+0.030
 pain+0.030
 grant+0.029
Neg logit contributions
 castles-0.035
 cities-0.034
 forests-0.033
 gardens-0.033
cast-0.033
Tokenifying
Token ID4035
Feature activation-1.23e-01

Pos logit contributions
 word+0.001
 apology+0.001
 pain+0.001
 hurts+0.001
 owe+0.001
Neg logit contributions
 castles-0.002
 cities-0.002
 oceans-0.002
cast-0.002
 forests-0.002
Token glass
Token ID5405
Feature activation+3.12e-04

Pos logit contributions
 apology+0.182
 word+0.175
'.+0.165
 prove+0.165
ggie+0.165
Neg logit contributions
 castles-0.235
 mountains-0.233
 fields-0.232
 gardens-0.229
 oceans-0.226
Token.
Token ID13
Feature activation-4.68e-03

Pos logit contributions
 castles+0.001
 cities+0.000
cast+0.000
 fields+0.000
 gardens+0.000
Neg logit contributions
 word-0.001
 apology-0.001
 grant-0.001
'.-0.001
 pain-0.001
Token She
Token ID1375
Feature activation+1.42e-02

Pos logit contributions
 word+0.007
'.+0.007
 apology+0.006
 grant+0.006
 pain+0.006
Neg logit contributions
 castles-0.010
 cities-0.009
 gardens-0.009
cast-0.009
 ships-0.009
Token looks
Token ID3073
Feature activation+7.46e-03

Pos logit contributions
 castles+0.029
 cities+0.028
 oceans+0.027
 forests+0.027
cast+0.027
Neg logit contributions
 word-0.021
 apology-0.019
 pain-0.019
 grant-0.019
 hurts-0.018
Token at
Token ID379
Feature activation+3.28e-02

Pos logit contributions
 castles+0.012
 cities+0.012
angers+0.012
 forests+0.012
aris+0.012
Neg logit contributions
 word-0.013
 pain-0.013
'.-0.012
 apology-0.012
 grant-0.012
Token he
Token ID339
Feature activation-1.61e-03

Pos logit contributions
 castles+0.004
cast+0.004
 cities+0.004
 forests+0.004
 gardens+0.004
Neg logit contributions
 word-0.005
 pain-0.005
 apology-0.005
 hurts-0.005
 shock-0.005
Token had
Token ID550
Feature activation-4.49e-02

Pos logit contributions
 word+0.003
 apology+0.003
'.+0.003
 grant+0.003
 pain+0.003
Neg logit contributions
 castles-0.003
cast-0.002
 fields-0.002
 gardens-0.002
 forests-0.002
Token eaten
Token ID17065
Feature activation-7.70e-03

Pos logit contributions
 word+0.073
 pain+0.068
 apology+0.068
'.+0.067
 hurts+0.066
Neg logit contributions
 castles-0.083
 cities-0.081
cast-0.080
 forests-0.079
 gardens-0.078
Token something
Token ID1223
Feature activation-4.73e-02

Pos logit contributions
 word+0.015
 apology+0.014
 grant+0.014
'.+0.014
 pain+0.014
Neg logit contributions
 castles-0.012
 cities-0.011
cast-0.011
 fields-0.011
 gardens-0.011
Token he
Token ID339
Feature activation+5.51e-03

Pos logit contributions
 word+0.086
 apology+0.082
'.+0.082
 grant+0.082
se+0.081
Neg logit contributions
 castles-0.076
cast-0.073
 ships-0.069
 gardens-0.069
 fields-0.068
Token shouldn
Token ID6584
Feature activation-1.23e-01

Pos logit contributions
 castles+0.008
cast+0.007
 gardens+0.007
 cities+0.007
 fields+0.007
Neg logit contributions
 word-0.011
 apology-0.011
'.-0.011
 pain-0.010
 grant-0.010
Token't
Token ID470
Feature activation-5.89e-02

Pos logit contributions
 word+0.211
 apology+0.199
 pain+0.194
 trigger+0.191
 shock+0.186
Neg logit contributions
 upon-0.208
cast-0.195
 castles-0.190
angers-0.184
 Hil-0.182
Token have
Token ID423
Feature activation-6.62e-02

Pos logit contributions
 apology+0.095
 word+0.094
 trigger+0.091
 phrase+0.090
 grant+0.088
Neg logit contributions
 upon-0.092
 castles-0.092
line-0.089
cast-0.089
 planting-0.088
Token.
Token ID13
Feature activation-2.06e-02

Pos logit contributions
 word+0.128
 apology+0.121
 pain+0.119
'.+0.117
 grant+0.117
Neg logit contributions
 castles-0.104
cast-0.098
 cities-0.097
 gardens-0.096
 forests-0.095
Token Max
Token ID5436
Feature activation-2.24e-02

Pos logit contributions
 word+0.030
 hurts+0.029
 apology+0.027
'.+0.027
 pain+0.027
Neg logit contributions
 castles-0.045
cast-0.043
 cities-0.042
 oceans-0.042
 deserts-0.041
Token's
Token ID338
Feature activation+3.50e-03

Pos logit contributions
 word+0.038
 pain+0.035
 apology+0.035
'.+0.034
 hurts+0.034
Neg logit contributions
 castles-0.040
 cities-0.039
cast-0.038
 forests-0.038
 oceans-0.038
Token and
Token ID290
Feature activation+6.70e-02

Pos logit contributions
 castles+0.052
angers+0.051
cast+0.050
 cities+0.050
aris+0.050
Neg logit contributions
 word-0.047
 pain-0.044
'.-0.044
 shock-0.041
 grant-0.041
Token getting
Token ID1972
Feature activation+7.62e-03

Pos logit contributions
angers+0.106
aris+0.102
 cities+0.096
 castles+0.096
 Hil+0.092
Neg logit contributions
 word-0.126
 hurts-0.125
 pain-0.123
 value-0.116
 shock-0.113
Token into
Token ID656
Feature activation+1.08e-02

Pos logit contributions
pit+0.012
 cities+0.012
aris+0.012
angers+0.012
arium+0.012
Neg logit contributions
 pain-0.014
 value-0.013
 word-0.013
 hurts-0.013
 shock-0.012
Token things
Token ID1243
Feature activation-1.32e-02

Pos logit contributions
 castles+0.021
 cities+0.021
cast+0.020
 forests+0.020
 gardens+0.020
Neg logit contributions
 word-0.016
 pain-0.015
 apology-0.015
'.-0.015
 grant-0.014
Token he
Token ID339
Feature activation+3.97e-03

Pos logit contributions
 word+0.017
'.+0.015
 pain+0.015
 apology+0.015
 hurts+0.015
Neg logit contributions
 castles-0.029
 cities-0.028
 forests-0.028
cast-0.028
 gardens-0.028
Token shouldn
Token ID6584
Feature activation-1.23e-01

Pos logit contributions
 castles+0.007
 cities+0.007
 oceans+0.007
angers+0.007
cast+0.007
Neg logit contributions
 word-0.007
 pain-0.006
 hurts-0.006
 value-0.006
'.-0.006
Token't
Token ID470
Feature activation-6.05e-02

Pos logit contributions
 word+0.197
 apology+0.190
 trigger+0.177
 pain+0.176
 shock+0.170
Neg logit contributions
 upon-0.217
cast-0.209
 castles-0.209
 Hil-0.203
 fields-0.203
Token.
Token ID13
Feature activation-3.18e-02

Pos logit contributions
 apology+0.093
 word+0.092
 trigger+0.087
 grant+0.086
 pain+0.085
Neg logit contributions
 castles-0.104
 upon-0.104
 gardens-0.102
cast-0.102
 fields-0.102
Token\n
Token ID198
Feature activation-2.06e-02

Pos logit contributions
'.+0.048
 word+0.048
 apology+0.044
 pain+0.043
 value+0.043
Neg logit contributions
 castles-0.064
 cities-0.060
cast-0.059
 gardens-0.058
angers-0.058
Token\n
Token ID198
Feature activation-2.54e-02

Pos logit contributions
 word+0.036
 apology+0.033
 pain+0.033
'.+0.032
 grant+0.032
Neg logit contributions
 castles-0.036
 cities-0.035
 forests-0.034
cast-0.034
 gardens-0.034
TokenOne
Token ID3198
Feature activation-4.92e-03

Pos logit contributions
 word+0.046
 apology+0.043
 pain+0.043
'.+0.043
 grant+0.042
Neg logit contributions
 castles-0.042
 cities-0.041
cast-0.040
 forests-0.040
 gardens-0.040
Tokenally
Token ID453
Feature activation-2.38e-02

Pos logit contributions
 castles+0.011
 cities+0.011
 ships+0.011
 forests+0.011
 oceans+0.011
Neg logit contributions
 word-0.014
'.-0.014
 apology-0.014
 pain-0.013
 hurts-0.013
Token,
Token ID11
Feature activation+4.67e-02

Pos logit contributions
 word+0.043
 apology+0.040
 pain+0.040
'.+0.040
 grant+0.039
Neg logit contributions
 castles-0.040
 cities-0.038
cast-0.037
 forests-0.037
 gardens-0.037
Token you
Token ID345
Feature activation-7.51e-02

Pos logit contributions
 castles+0.093
 cities+0.092
cast+0.091
 forests+0.091
angers+0.090
Neg logit contributions
 word-0.071
 apology-0.065
 grant-0.065
 pain-0.065
'.-0.063
Token know
Token ID760
Feature activation-4.38e-02

Pos logit contributions
 word+0.096
 apology+0.087
 pain+0.085
'.+0.085
 grant+0.082
Neg logit contributions
 castles-0.166
 cities-0.161
cast-0.159
 forests-0.159
 gardens-0.158
Token you
Token ID345
Feature activation-7.30e-02

Pos logit contributions
 word+0.073
'.+0.068
 hurts+0.067
 pain+0.066
 grant+0.066
Neg logit contributions
angers-0.077
cast-0.077
 cities-0.076
 castles-0.075
 forests-0.074
Token shouldn
Token ID6584
Feature activation-1.23e-01

Pos logit contributions
 word+0.107
 apology+0.100
 pain+0.097
'.+0.095
 grant+0.093
Neg logit contributions
 castles-0.147
 cities-0.141
cast-0.140
 gardens-0.140
 ships-0.139
Token't
Token ID470
Feature activation-6.45e-02

Pos logit contributions
 apology+0.220
 word+0.214
rum+0.197
 sentence+0.193
 trigger+0.193
Neg logit contributions
 planting-0.184
 Hil-0.181
 fields-0.176
 castles-0.174
 mountains-0.173
Token keep
Token ID1394
Feature activation-1.56e-02

Pos logit contributions
 apology+0.103
 word+0.098
',"+0.093
 trigger+0.091
 phrase+0.090
Neg logit contributions
 planting-0.108
 castles-0.107
 upon-0.105
 exploring-0.104
 fields-0.103
Token the
Token ID262
Feature activation+1.24e-03

Pos logit contributions
 word+0.031
 apology+0.029
'.+0.029
 pain+0.029
 grant+0.028
Neg logit contributions
 castles-0.024
 cities-0.022
 forests-0.022
cast-0.022
 gardens-0.022
Token gum
Token ID27142
Feature activation-7.33e-02

Pos logit contributions
 fields+0.002
 castles+0.002
 mountains+0.002
 upon+0.002
 planting+0.002
Neg logit contributions
'.-0.002
rum-0.002
 grant-0.002
 apology-0.002
se-0.002
Token.
Token ID13
Feature activation-3.90e-02

Pos logit contributions
 word+0.114
 apology+0.113
 pain+0.107
 grant+0.106
 shock+0.103
Neg logit contributions
 castles-0.137
cast-0.132
 gardens-0.129
 fields-0.127
 ships-0.126
Token sometimes
Token ID3360
Feature activation-1.20e-02

Pos logit contributions
 castles+0.075
 cities+0.073
cast+0.072
 forests+0.072
 oceans+0.071
Neg logit contributions
 word-0.053
 apology-0.049
 pain-0.048
'.-0.047
 grant-0.046
Token sneak
Token ID20528
Feature activation+3.81e-03

Pos logit contributions
 word+0.021
 pain+0.020
 apology+0.020
'.+0.020
 hurts+0.020
Neg logit contributions
 castles-0.020
 cities-0.020
cast-0.020
 forests-0.019
angers-0.019
Token into
Token ID656
Feature activation+1.51e-02

Pos logit contributions
 castles+0.007
 cities+0.007
cast+0.007
 forests+0.007
 gardens+0.007
Neg logit contributions
 word-0.006
 pain-0.005
 apology-0.005
'.-0.005
 hurts-0.005
Token places
Token ID4113
Feature activation+6.12e-03

Pos logit contributions
 castles+0.025
 cities+0.024
 forests+0.024
cast+0.024
 gardens+0.024
Neg logit contributions
 word-0.027
 apology-0.026
 pain-0.025
'.-0.025
 grant-0.025
Token he
Token ID339
Feature activation+8.41e-03

Pos logit contributions
 castles+0.015
 cities+0.014
cast+0.014
 forests+0.014
 gardens+0.014
Neg logit contributions
 word-0.007
'.-0.006
 pain-0.006
 apology-0.006
 hurts-0.006
Token shouldn
Token ID6584
Feature activation-1.23e-01

Pos logit contributions
 castles+0.014
 cities+0.014
cast+0.013
 forests+0.013
 gardens+0.013
Neg logit contributions
 word-0.015
 apology-0.014
 pain-0.014
'.-0.014
 hurts-0.014
Token't
Token ID470
Feature activation-5.75e-02

Pos logit contributions
 word+0.200
 apology+0.197
 pain+0.192
 trigger+0.189
 shock+0.180
Neg logit contributions
 castles-0.195
 upon-0.193
 fields-0.185
 Hil-0.184
cast-0.183
Token.
Token ID13
Feature activation-2.66e-02

Pos logit contributions
 apology+0.084
 word+0.084
 grant+0.080
 trigger+0.079
 pain+0.078
Neg logit contributions
 castles-0.107
 gardens-0.101
 fields-0.099
 cities-0.099
cast-0.098
Token One
Token ID1881
Feature activation-3.14e-03

Pos logit contributions
 word+0.038
'.+0.032
 apology+0.032
 hurts+0.031
se+0.031
Neg logit contributions
 castles-0.056
cast-0.055
 gardens-0.054
 deserts-0.054
 cities-0.054
Token day
Token ID1110
Feature activation+1.03e-02

Pos logit contributions
 word+0.006
 apology+0.005
 pain+0.005
'.+0.005
 grant+0.005
Neg logit contributions
 castles-0.005
 cities-0.005
cast-0.005
 forests-0.005
 gardens-0.005
Token Mac
Token ID4100
Feature activation+2.09e-03

Pos logit contributions
 castles+0.018
 cities+0.018
 gardens+0.018
cast+0.018
 forests+0.017
Neg logit contributions
 word-0.018
 pain-0.016
'.-0.016
 hurts-0.016
 apology-0.016
Token inside
Token ID2641
Feature activation+7.93e-03

Pos logit contributions
 word+0.022
 apology+0.021
'.+0.020
 pain+0.020
 grant+0.020
Neg logit contributions
 castles-0.020
 cities-0.019
cast-0.019
 forests-0.019
 gardens-0.019
Token the
Token ID262
Feature activation+4.33e-02

Pos logit contributions
 castles+0.014
 cities+0.014
cast+0.013
 forests+0.013
 gardens+0.013
Neg logit contributions
 word-0.014
 apology-0.013
 pain-0.013
'.-0.012
 grant-0.012
Token cup
Token ID6508
Feature activation+8.19e-03

Pos logit contributions
 castles+0.067
 cities+0.064
cast+0.063
 forests+0.062
 gardens+0.062
Neg logit contributions
 word-0.085
 apology-0.080
 pain-0.079
'.-0.079
 grant-0.077
Tokenboard
Token ID3526
Feature activation-2.23e-02

Pos logit contributions
 castles+0.013
 cities+0.013
 gardens+0.013
cast+0.013
 fields+0.013
Neg logit contributions
 word-0.015
 apology-0.014
 pain-0.014
 grant-0.014
'.-0.014
Token that
Token ID326
Feature activation+2.59e-02

Pos logit contributions
 word+0.028
 apology+0.026
 pain+0.025
'.+0.025
 grant+0.024
Neg logit contributions
 castles-0.050
 cities-0.048
cast-0.048
 forests-0.048
 gardens-0.047
Token shouldn
Token ID6584
Feature activation-1.22e-01

Pos logit contributions
angers+0.049
 castles+0.049
 cities+0.048
aris+0.047
 forests+0.047
Neg logit contributions
 word-0.042
 pain-0.039
 hurts-0.037
 value-0.037
 shock-0.036
Token't
Token ID470
Feature activation-6.05e-02

Pos logit contributions
 word+0.216
 apology+0.208
 pain+0.204
 trigger+0.199
rum+0.192
Neg logit contributions
 upon-0.178
 Hil-0.169
 castles-0.159
 they-0.154
 planting-0.152
Token be
Token ID307
Feature activation-1.82e-02

Pos logit contributions
 hurts+0.091
 word+0.090
 apology+0.090
 pain+0.088
se+0.088
Neg logit contributions
 castles-0.104
 ships-0.099
 upon-0.099
 Hil-0.098
 gardens-0.098
Token there
Token ID612
Feature activation-2.70e-02

Pos logit contributions
 word+0.033
 pain+0.032
'.+0.031
 owe+0.030
 apology+0.030
Neg logit contributions
 castles-0.030
 cities-0.029
cast-0.029
 forests-0.029
 fields-0.028
Token.
Token ID13
Feature activation-1.93e-02

Pos logit contributions
 word+0.031
 apology+0.029
 pain+0.028
 grant+0.027
 hurts+0.027
Neg logit contributions
 castles-0.063
 cities-0.061
cast-0.060
 gardens-0.060
 forests-0.060
Token He
Token ID679
Feature activation+1.76e-03

Pos logit contributions
 word+0.023
'.+0.022
 apology+0.020
 pain+0.020
 grant+0.020
Neg logit contributions
 castles-0.046
cast-0.044
 cities-0.043
 gardens-0.042
 forests-0.042
Token,
Token ID11
Feature activation+5.68e-02

Pos logit contributions
 castles+0.018
 cities+0.018
cast+0.018
 forests+0.017
 oceans+0.017
Neg logit contributions
 word-0.018
 pain-0.017
 apology-0.016
 hurts-0.016
'.-0.016
Token "
Token ID366
Feature activation-1.53e-02

Pos logit contributions
 castles+0.093
cast+0.090
 cities+0.089
 gardens+0.088
 forests+0.087
Neg logit contributions
 word-0.109
'.-0.101
 grant-0.100
 pain-0.096
 apology-0.095
TokenNo
Token ID2949
Feature activation-5.68e-02

Pos logit contributions
 word+0.022
 apology+0.021
 pain+0.020
 grant+0.020
'.+0.019
Neg logit contributions
 castles-0.030
 cities-0.029
 forests-0.029
 fields-0.029
 oceans-0.029
Token,
Token ID11
Feature activation+2.61e-02

Pos logit contributions
 grant+0.120
 owe+0.115
 word+0.115
'.+0.114
se+0.113
Neg logit contributions
 castles-0.071
 cities-0.063
 fields-0.063
 ships-0.062
 mountains-0.061
Token we
Token ID356
Feature activation-6.77e-02

Pos logit contributions
 castles+0.046
 cities+0.045
cast+0.044
 forests+0.044
 gardens+0.044
Neg logit contributions
 word-0.046
 pain-0.043
 apology-0.042
'.-0.041
 grant-0.041
Token shouldn
Token ID6584
Feature activation-1.22e-01

Pos logit contributions
 word+0.106
 apology+0.100
'.+0.093
 grant+0.093
 pain+0.093
Neg logit contributions
 castles-0.127
cast-0.120
 ships-0.120
 upon-0.120
 cities-0.120
Token't
Token ID470
Feature activation-6.86e-02

Pos logit contributions
 apology+0.244
 word+0.236
 phrase+0.220
rito+0.216
 pain+0.213
Neg logit contributions
 castles-0.160
 playing-0.153
 Hil-0.153
 fields-0.150
 mountains-0.149
Token disturb
Token ID17037
Feature activation-2.78e-02

Pos logit contributions
 apology+0.126
 word+0.116
 phrase+0.114
 pain+0.111
 grant+0.110
Neg logit contributions
 playing-0.095
 castles-0.095
 they-0.092
 Hil-0.090
 planting-0.087
Token the
Token ID262
Feature activation+5.66e-03

Pos logit contributions
 word+0.056
 apology+0.054
 owe+0.053
 grant+0.052
 pain+0.052
Neg logit contributions
 castles-0.042
 cities-0.039
 fields-0.039
 gardens-0.039
 forests-0.039
Token butterfly
Token ID35113
Feature activation-2.86e-02

Pos logit contributions
 castles+0.010
 cities+0.010
 fields+0.009
 mountains+0.009
 gardens+0.009
Neg logit contributions
 word-0.009
'.-0.009
 apology-0.009
 owe-0.009
 grant-0.009
Token.
Token ID13
Feature activation-3.31e-02

Pos logit contributions
 word+0.034
 apology+0.032
 pain+0.031
 grant+0.030
 owe+0.029
Neg logit contributions
 castles-0.065
 cities-0.062
cast-0.062
 gardens-0.062
 fields-0.061
Token and
Token ID290
Feature activation+7.47e-02

Pos logit contributions
 word+0.065
 apology+0.061
'.+0.060
 pain+0.060
 hurts+0.059
Neg logit contributions
 castles-0.059
 cities-0.057
cast-0.057
 oceans-0.056
 forests-0.056
Token said
Token ID531
Feature activation+1.24e-02

Pos logit contributions
 castles+0.116
 cities+0.114
 forests+0.112
 oceans+0.112
cast+0.111
Neg logit contributions
 word-0.145
 pain-0.138
 apology-0.137
 hurts-0.134
 owe-0.133
Token,
Token ID11
Feature activation+5.88e-02

Pos logit contributions
 castles+0.021
 cities+0.021
 forests+0.021
 oceans+0.021
 gardens+0.021
Neg logit contributions
 word-0.022
 pain-0.021
 apology-0.020
'.-0.020
 hurts-0.020
Token "
Token ID366
Feature activation-1.58e-02

Pos logit contributions
 castles+0.103
 cities+0.100
 gardens+0.098
 forests+0.097
 oceans+0.097
Neg logit contributions
 word-0.110
 grant-0.101
'.-0.101
 apology-0.100
 pain-0.098
TokenYou
Token ID1639
Feature activation-8.70e-02

Pos logit contributions
 word+0.027
 apology+0.026
 grant+0.024
'.+0.024
 pain+0.024
Neg logit contributions
 castles-0.028
 cities-0.027
 forests-0.027
 fields-0.027
 oceans-0.026
Token shouldn
Token ID6584
Feature activation-1.22e-01

Pos logit contributions
 word+0.124
 apology+0.113
'.+0.108
 sentence+0.106
 pain+0.105
Neg logit contributions
 castles-0.179
 ships-0.172
 cities-0.169
 gardens-0.168
cast-0.167
Token't
Token ID470
Feature activation-6.41e-02

Pos logit contributions
 word+0.217
 apology+0.215
 phrase+0.198
 sentence+0.194
'.+0.191
Neg logit contributions
 Hil-0.184
 castles-0.184
 planting-0.176
 fields-0.170
 upon-0.169
Token take
Token ID1011
Feature activation-4.67e-02

Pos logit contributions
 word+0.097
 apology+0.097
 phrase+0.088
',"+0.088
'.+0.087
Neg logit contributions
 castles-0.118
 planting-0.110
 gardens-0.110
 ships-0.109
 fields-0.107
Token things
Token ID1243
Feature activation-3.86e-02

Pos logit contributions
 word+0.084
 apology+0.079
'.+0.079
 grant+0.078
 pain+0.077
Neg logit contributions
 castles-0.078
 cities-0.075
 forests-0.074
 gardens-0.074
 fields-0.073
Token that
Token ID326
Feature activation-1.15e-02

Pos logit contributions
 word+0.071
 apology+0.067
 pain+0.065
 grant+0.064
 sentence+0.063
Neg logit contributions
 castles-0.063
cast-0.060
 cities-0.059
 ships-0.058
 gardens-0.058
Token don
Token ID836
Feature activation-8.18e-02

Pos logit contributions
 word+0.015
 apology+0.013
 pain+0.013
'.+0.013
 hurts+0.013
Neg logit contributions
 castles-0.025
 cities-0.025
 forests-0.024
cast-0.024
 oceans-0.024
Token told
Token ID1297
Feature activation+3.83e-03

Pos logit contributions
 cities+0.097
aris+0.097
cast+0.097
 oceans+0.096
angers+0.096
Neg logit contributions
 pain-0.064
 word-0.063
 hurts-0.063
 prove-0.058
ggy-0.057
Token Sarah
Token ID10490
Feature activation-2.24e-02

Pos logit contributions
cast+0.005
 castles+0.005
 oceans+0.005
 forests+0.005
 cities+0.005
Neg logit contributions
 word-0.009
 apology-0.008
 pain-0.008
 shock-0.008
'.-0.008
Token,
Token ID11
Feature activation+6.92e-02

Pos logit contributions
 word+0.036
'.+0.033
 apology+0.032
 pain+0.032
 hurts+0.032
Neg logit contributions
 castles-0.043
 cities-0.042
 oceans-0.042
cast-0.042
 forests-0.042
Token "
Token ID366
Feature activation-1.02e-02

Pos logit contributions
 castles+0.129
cast+0.125
 cities+0.122
 oceans+0.120
 gardens+0.120
Neg logit contributions
 word-0.127
'.-0.116
 grant-0.114
 apology-0.114
 pain-0.111
TokenYou
Token ID1639
Feature activation-8.93e-02

Pos logit contributions
 word+0.013
'.+0.012
 pain+0.012
 hurts+0.011
 apology+0.011
Neg logit contributions
 castles-0.023
 ships-0.023
cast-0.023
 cities-0.022
 forests-0.022
Token shouldn
Token ID6584
Feature activation-1.21e-01

Pos logit contributions
 word+0.109
 apology+0.099
 pain+0.098
'.+0.097
 grant+0.095
Neg logit contributions
 castles-0.203
 cities-0.197
cast-0.194
 forests-0.194
 gardens-0.193
Token't
Token ID470
Feature activation-6.20e-02

Pos logit contributions
 apology+0.208
 word+0.200
 trigger+0.190
 sentence+0.182
'.+0.181
Neg logit contributions
 Hil-0.187
 fields-0.183
 castles-0.182
 mountains-0.181
 upon-0.181
Token climb
Token ID12080
Feature activation-5.35e-03

Pos logit contributions
 apology+0.107
 trigger+0.103
 word+0.103
 grant+0.102
'.+0.100
Neg logit contributions
 castles-0.097
 fields-0.093
 cities-0.092
 gardens-0.092
 upon-0.090
Token the
Token ID262
Feature activation+2.49e-02

Pos logit contributions
 word+0.009
 pain+0.009
'.+0.008
 hurts+0.008
 apology+0.008
Neg logit contributions
 castles-0.009
 cities-0.009
cast-0.009
 gardens-0.009
 forests-0.009
Token tree
Token ID5509
Feature activation+1.37e-02

Pos logit contributions
 castles+0.045
 cities+0.043
cast+0.042
 forests+0.042
 gardens+0.042
Neg logit contributions
 word-0.042
 apology-0.040
 pain-0.039
'.-0.039
 grant-0.038
Token,
Token ID11
Feature activation+4.45e-02

Pos logit contributions
 castles+0.029
 cities+0.028
 forests+0.028
cast+0.028
 oceans+0.028
Neg logit contributions
 word-0.019
'.-0.017
 hurts-0.017
 pain-0.017
 apology-0.017
Token the
Token ID262
Feature activation+7.69e-03

Pos logit contributions
 castles+0.009
 cities+0.009
cast+0.009
 forests+0.009
 gardens+0.009
Neg logit contributions
 word-0.015
'.-0.014
 apology-0.014
 hurts-0.013
 pain-0.013
Token shark
Token ID21027
Feature activation-3.32e-02

Pos logit contributions
 castles+0.014
 ships+0.014
 mountains+0.013
 oceans+0.013
 fields+0.013
Neg logit contributions
'.-0.013
se-0.013
',"-0.013
rum-0.013
 grant-0.012
Token,
Token ID11
Feature activation+7.49e-02

Pos logit contributions
 word+0.058
 apology+0.054
 pain+0.053
'.+0.052
 hurts+0.052
Neg logit contributions
 castles-0.058
 cities-0.057
 forests-0.056
cast-0.055
 oceans-0.055
Token "
Token ID366
Feature activation-7.11e-03

Pos logit contributions
aris+0.142
 castles+0.141
 cities+0.137
 gardens+0.136
angers+0.136
Neg logit contributions
 word-0.128
 apology-0.120
 grant-0.118
 pain-0.116
'.-0.115
TokenYou
Token ID1639
Feature activation-8.72e-02

Pos logit contributions
 word+0.010
 apology+0.010
 pain+0.009
'.+0.009
 grant+0.009
Neg logit contributions
 castles-0.014
 cities-0.014
cast-0.014
 forests-0.014
 oceans-0.014
Token shouldn
Token ID6584
Feature activation-1.21e-01

Pos logit contributions
 word+0.122
 apology+0.112
'.+0.107
 sentence+0.104
 pain+0.104
Neg logit contributions
 castles-0.180
 ships-0.174
cast-0.170
 cities-0.170
 gardens-0.169
Token't
Token ID470
Feature activation-6.33e-02

Pos logit contributions
 apology+0.234
 word+0.232
rum+0.224
se+0.214
 sentence+0.214
Neg logit contributions
 planting-0.154
 castles-0.145
 Hil-0.143
 upon-0.143
 fields-0.140
Token be
Token ID307
Feature activation-2.55e-02

Pos logit contributions
 apology+0.096
antly+0.096
',"+0.095
ps+0.093
 word+0.093
Neg logit contributions
 planting-0.098
 castles-0.098
 upon-0.096
 ships-0.093
 exploring-0.092
Token selfish
Token ID20363
Feature activation-6.13e-02

Pos logit contributions
 word+0.048
 apology+0.045
'.+0.045
 grant+0.044
 pain+0.043
Neg logit contributions
 castles-0.041
 cities-0.039
 gardens-0.038
 forests-0.038
cast-0.038
Token.
Token ID13
Feature activation-4.33e-02

Pos logit contributions
 grant+0.119
 word+0.117
 apology+0.115
Ble+0.113
 shock+0.110
Neg logit contributions
 castles-0.088
cast-0.083
aris-0.082
 upon-0.082
 ships-0.081
Token We
Token ID775
Feature activation-6.43e-02

Pos logit contributions
 word+0.069
 pain+0.065
 apology+0.063
 shock+0.062
 hurts+0.060
Neg logit contributions
 castles-0.078
 cities-0.076
 ships-0.075
 forests-0.074
 oceans-0.074
Token When
Token ID1649
Feature activation+2.03e-02

Pos logit contributions
 word+0.018
'.+0.017
 apology+0.016
 grant+0.016
 owe+0.016
Neg logit contributions
 castles-0.021
cast-0.020
 cities-0.020
 gardens-0.020
 forests-0.020
Token Daisy
Token ID40355
Feature activation+1.68e-02

Pos logit contributions
 castles+0.026
 cities+0.026
cast+0.026
angers+0.026
aris+0.026
Neg logit contributions
 word-0.043
 pain-0.041
'.-0.040
 grant-0.040
 hurts-0.040
Token put
Token ID1234
Feature activation+4.40e-02

Pos logit contributions
 castles+0.028
 cities+0.027
cast+0.026
 forests+0.026
 gardens+0.026
Neg logit contributions
 word-0.030
 apology-0.028
 pain-0.028
'.-0.028
 grant-0.028
Token the
Token ID262
Feature activation+2.85e-02

Pos logit contributions
 castles+0.077
 cities+0.074
cast+0.072
 forests+0.072
 gardens+0.072
Neg logit contributions
 word-0.077
 apology-0.072
 pain-0.071
'.-0.071
 grant-0.070
Token magn
Token ID7842
Feature activation-8.79e-04

Pos logit contributions
 castles+0.051
 gardens+0.048
 fields+0.048
 mountains+0.048
 cities+0.048
Neg logit contributions
'.-0.045
 word-0.045
 owe-0.044
 apology-0.044
 grant-0.043
Tokenifying
Token ID4035
Feature activation-1.21e-01

Pos logit contributions
 word+0.001
 apology+0.001
 pain+0.001
 hurts+0.001
 owe+0.001
Neg logit contributions
 castles-0.002
 cities-0.002
 oceans-0.002
 forests-0.002
cast-0.002
Token glass
Token ID5405
Feature activation+1.36e-02

Pos logit contributions
 apology+0.202
 word+0.193
 pain+0.185
 prove+0.184
ggie+0.180
Neg logit contributions
 fields-0.214
 castles-0.212
 mountains-0.212
 gardens-0.209
 forests-0.204
Token down
Token ID866
Feature activation+1.35e-02

Pos logit contributions
 castles+0.023
 cities+0.022
cast+0.022
 forests+0.022
 gardens+0.022
Neg logit contributions
 word-0.024
 apology-0.023
 pain-0.023
'.-0.022
 grant-0.022
Token,
Token ID11
Feature activation+9.11e-02

Pos logit contributions
 castles+0.022
 cities+0.022
cast+0.022
 ships+0.021
 forests+0.021
Neg logit contributions
 word-0.025
 pain-0.023
'.-0.023
 apology-0.023
 grant-0.022
Token the
Token ID262
Feature activation+4.05e-02

Pos logit contributions
 castles+0.117
angers+0.116
cast+0.110
 cities+0.109
aris+0.108
Neg logit contributions
 word-0.198
 pain-0.194
'.-0.188
 shock-0.185
 grant-0.184
Token cater
Token ID20825
Feature activation+1.27e-02

Pos logit contributions
 castles+0.073
 cities+0.070
 gardens+0.070
 fields+0.069
 forests+0.069
Neg logit contributions
 word-0.065
'.-0.064
 apology-0.063
 owe-0.061
 grant-0.061
Token,
Token ID11
Feature activation+5.42e-02

Pos logit contributions
 word+0.088
 apology+0.085
 pain+0.082
 grant+0.082
 owe+0.081
Neg logit contributions
 castles-0.075
 cities-0.071
cast-0.070
 ships-0.069
 gardens-0.069
Token doesn
Token ID1595
Feature activation-9.66e-02

Pos logit contributions
 castles+0.086
angers+0.085
cast+0.083
 cities+0.082
 forests+0.081
Neg logit contributions
 word-0.104
 pain-0.098
 apology-0.094
'.-0.094
 hurts-0.092
Token't
Token ID470
Feature activation-5.83e-02

Pos logit contributions
 word+0.167
 apology+0.166
 pain+0.156
 sentence+0.149
Ble+0.146
Neg logit contributions
 Hil-0.142
 upon-0.140
cast-0.139
 mountains-0.135
 ships-0.134
Token mean
Token ID1612
Feature activation-5.27e-02

Pos logit contributions
 word+0.079
 apology+0.074
'.+0.073
 hurts+0.072
 pain+0.071
Neg logit contributions
 castles-0.119
 cities-0.116
 oceans-0.113
 ships-0.113
cast-0.113
Token you
Token ID345
Feature activation-8.05e-02

Pos logit contributions
 apology+0.097
 word+0.096
 grant+0.096
 owe+0.095
 pain+0.093
Neg logit contributions
 castles-0.080
 cities-0.076
 ships-0.073
cast-0.072
 fields-0.070
Token shouldn
Token ID6584
Feature activation-1.21e-01

Pos logit contributions
 word+0.120
 apology+0.115
 pain+0.114
 trigger+0.108
 shock+0.105
Neg logit contributions
 castles-0.148
 upon-0.147
 cities-0.144
cast-0.143
 gardens-0.143
Token't
Token ID470
Feature activation-6.13e-02

Pos logit contributions
 apology+0.263
rum+0.250
rito+0.248
 pain+0.248
 word+0.246
Neg logit contributions
cast-0.098
 upon-0.098
 planting-0.094
 Hil-0.094
 castles-0.094
Token try
Token ID1949
Feature activation-7.33e-02

Pos logit contributions
 apology+0.090
 pain+0.088
 trigger+0.087
 word+0.086
Ble+0.086
Neg logit contributions
 they-0.094
 castles-0.093
line-0.092
 upon-0.092
 exploring-0.092
Token it
Token ID340
Feature activation-3.99e-02

Pos logit contributions
 apology+0.129
 word+0.128
 owe+0.124
se+0.123
 grant+0.121
Neg logit contributions
 castles-0.126
cast-0.120
 cities-0.116
 gardens-0.115
 forests-0.114
Token!
Token ID0
Feature activation-2.67e-02

Pos logit contributions
 apology+0.060
 word+0.059
 pain+0.058
 grant+0.056
 owe+0.055
Neg logit contributions
 castles-0.078
cast-0.078
 gardens-0.071
 upon-0.071
 cities-0.071
Token<|endoftext|>
Token ID50256
Feature activation+1.09e-02

Pos logit contributions
 word+0.046
 pain+0.044
 apology+0.044
 hurts+0.042
 grant+0.042
Neg logit contributions
 castles-0.045
 cities-0.044
 forests-0.043
 oceans-0.043
 gardens-0.043
Token "
Token ID366
Feature activation-1.16e-02

Pos logit contributions
 castles+0.110
 cities+0.104
cast+0.103
 gardens+0.101
 oceans+0.101
Neg logit contributions
 word-0.132
'.-0.124
 grant-0.122
 apology-0.122
 pain-0.120
TokenMe
Token ID5308
Feature activation+1.32e-03

Pos logit contributions
 word+0.016
 apology+0.015
 pain+0.015
'.+0.015
 grant+0.014
Neg logit contributions
 castles-0.024
 cities-0.023
cast-0.023
 forests-0.023
 gardens-0.023
Tokengan
Token ID1030
Feature activation-1.83e-02

Pos logit contributions
 castles+0.002
cast+0.002
 upon+0.002
 gardens+0.001
 planting+0.001
Neg logit contributions
 word-0.003
 apology-0.003
 grant-0.003
 pain-0.003
 owe-0.003
Token,
Token ID11
Feature activation+4.16e-02

Pos logit contributions
 word+0.033
 apology+0.031
 pain+0.031
 grant+0.030
'.+0.030
Neg logit contributions
 castles-0.030
 cities-0.029
 gardens-0.028
cast-0.028
 forests-0.028
Token you
Token ID345
Feature activation-7.53e-02

Pos logit contributions
 castles+0.074
 cities+0.072
 forests+0.071
cast+0.071
angers+0.071
Neg logit contributions
 word-0.071
 pain-0.067
 apology-0.067
'.-0.066
 grant-0.065
Token shouldn
Token ID6584
Feature activation-1.21e-01

Pos logit contributions
 word+0.093
 apology+0.083
'.+0.080
 pain+0.080
 grant+0.077
Neg logit contributions
 castles-0.168
 cities-0.162
 gardens-0.160
 ships-0.160
cast-0.159
Token't
Token ID470
Feature activation-6.22e-02

Pos logit contributions
 word+0.217
 apology+0.215
 trigger+0.198
rum+0.194
 sentence+0.193
Neg logit contributions
 Hil-0.179
 planting-0.170
 castles-0.168
 upon-0.166
 fields-0.162
Token take
Token ID1011
Feature activation-4.49e-02

Pos logit contributions
 apology+0.099
 word+0.095
 trigger+0.093
',"+0.091
itch+0.089
Neg logit contributions
 castles-0.101
 planting-0.098
 exploring-0.098
 upon-0.095
 gardens-0.095
Token things
Token ID1243
Feature activation-3.72e-02

Pos logit contributions
 word+0.084
 apology+0.079
'.+0.078
 grant+0.078
 owe+0.076
Neg logit contributions
 castles-0.072
 cities-0.069
 forests-0.068
 gardens-0.068
 fields-0.068
Token that
Token ID326
Feature activation-1.01e-02

Pos logit contributions
 word+0.070
 apology+0.066
 grant+0.063
 pain+0.063
 sentence+0.062
Neg logit contributions
 castles-0.060
cast-0.058
 gardens-0.056
 cities-0.055
 fields-0.055
Token are
Token ID389
Feature activation-2.75e-02

Pos logit contributions
 word+0.013
 apology+0.012
 pain+0.012
'.+0.012
 hurts+0.012
Neg logit contributions
 castles-0.022
 cities-0.022
 forests-0.021
cast-0.021
 oceans-0.021
Token She
Token ID1375
Feature activation+1.67e-02

Pos logit contributions
 word+0.023
'.+0.023
 grant+0.021
 apology+0.021
se+0.020
Neg logit contributions
 castles-0.025
cast-0.023
 cities-0.023
 ships-0.022
 oceans-0.022
Token shared
Token ID4888
Feature activation+6.65e-03

Pos logit contributions
 castles+0.029
 cities+0.028
 forests+0.027
 oceans+0.027
 gardens+0.027
Neg logit contributions
 word-0.029
 apology-0.027
'.-0.027
 grant-0.027
 pain-0.027
Token it
Token ID340
Feature activation-3.81e-03

Pos logit contributions
 castles+0.012
 cities+0.012
cast+0.012
 forests+0.011
 oceans+0.011
Neg logit contributions
 word-0.011
 apology-0.010
 pain-0.010
'.-0.010
 grant-0.010
Token with
Token ID351
Feature activation+2.21e-02

Pos logit contributions
 word+0.006
'.+0.005
 apology+0.005
 pain+0.005
 grant+0.005
Neg logit contributions
 castles-0.007
 cities-0.007
 forests-0.007
 oceans-0.007
cast-0.007
Token everyone
Token ID2506
Feature activation+1.37e-02

Pos logit contributions
 castles+0.031
 cities+0.029
cast+0.029
 forests+0.029
 gardens+0.028
Neg logit contributions
 word-0.047
 apology-0.044
 pain-0.044
'.-0.044
 grant-0.043
Token and
Token ID290
Feature activation+9.16e-02

Pos logit contributions
 castles+0.024
 cities+0.023
cast+0.023
 forests+0.023
 gardens+0.022
Neg logit contributions
 word-0.024
 apology-0.022
'.-0.022
 pain-0.022
 grant-0.022
Token they
Token ID484
Feature activation+1.66e-02

Pos logit contributions
angers+0.153
aris+0.145
cast+0.144
 castles+0.142
 oceans+0.140
Neg logit contributions
 word-0.167
 pain-0.160
 apology-0.151
 value-0.151
 prove-0.149
Token all
Token ID477
Feature activation+4.05e-02

Pos logit contributions
angers+0.027
 castles+0.026
 cities+0.026
 oceans+0.025
aris+0.025
Neg logit contributions
 word-0.029
 grant-0.028
 value-0.028
 owe-0.028
'.-0.028
Token said
Token ID531
Feature activation+1.39e-02

Pos logit contributions
 castles+0.060
 cities+0.058
cast+0.057
 forests+0.056
 gardens+0.056
Neg logit contributions
 word-0.081
 apology-0.076
 pain-0.076
'.-0.075
 grant-0.074
Token it
Token ID340
Feature activation-1.74e-02

Pos logit contributions
 castles+0.022
 cities+0.021
cast+0.021
 forests+0.021
 gardens+0.021
Neg logit contributions
 word-0.026
 apology-0.025
'.-0.024
 pain-0.024
 grant-0.024
Token was
Token ID373
Feature activation-1.81e-02

Pos logit contributions
 word+0.027
 apology+0.026
 pain+0.026
'.+0.025
 grant+0.024
Neg logit contributions
 castles-0.033
 cities-0.032
cast-0.032
 gardens-0.032
 forests-0.031
Token They
Token ID1119
Feature activation-1.51e-02

Pos logit contributions
 word+0.054
'.+0.052
 pain+0.049
 apology+0.048
 shock+0.048
Neg logit contributions
 castles-0.061
 cities-0.060
 forests-0.058
cast-0.058
angers-0.057
Token lived
Token ID5615
Feature activation+2.27e-02

Pos logit contributions
 word+0.030
'.+0.029
 apology+0.029
 pain+0.028
 grant+0.028
Neg logit contributions
 castles-0.022
 cities-0.021
 forests-0.021
cast-0.021
 gardens-0.020
Token near
Token ID1474
Feature activation-1.13e-03

Pos logit contributions
 cities+0.049
 castles+0.048
 forests+0.047
 oceans+0.047
 ships+0.047
Neg logit contributions
 word-0.029
 pain-0.026
 value-0.025
'.-0.024
 apology-0.024
Token the
Token ID262
Feature activation+2.84e-02

Pos logit contributions
 word+0.002
 apology+0.001
 pain+0.001
'.+0.001
 grant+0.001
Neg logit contributions
 castles-0.002
cast-0.002
 cities-0.002
 forests-0.002
 gardens-0.002
Token beach
Token ID10481
Feature activation+5.47e-02

Pos logit contributions
 castles+0.040
 cities+0.038
cast+0.038
 forests+0.037
 gardens+0.037
Neg logit contributions
 word-0.059
 apology-0.055
 pain-0.055
'.-0.055
 grant-0.054
Token and
Token ID290
Feature activation+6.37e-02

Pos logit contributions
 Hil+0.129
 cities+0.126
angers+0.125
 forests+0.121
 deserts+0.120
Neg logit contributions
 word-0.058
'.-0.050
 value-0.047
".-0.046
 bite-0.044
Token loved
Token ID6151
Feature activation+2.72e-03

Pos logit contributions
angers+0.127
aris+0.117
 cities+0.114
 oceans+0.110
 forests+0.110
Neg logit contributions
 word-0.092
 value-0.089
 pain-0.086
 am-0.086
 grant-0.085
Token playing
Token ID2712
Feature activation-3.07e-03

Pos logit contributions
angers+0.006
 grounds+0.005
pit+0.005
leys+0.005
ways+0.005
Neg logit contributions
 word-0.004
 value-0.003
 pain-0.003
 shock-0.003
 bite-0.003
Token in
Token ID287
Feature activation-9.99e-03

Pos logit contributions
 word+0.006
 apology+0.005
 pain+0.005
'.+0.005
 grant+0.005
Neg logit contributions
 castles-0.005
cast-0.005
 cities-0.005
 gardens-0.005
 forests-0.005
Token the
Token ID262
Feature activation+2.89e-02

Pos logit contributions
 word+0.016
 pain+0.014
'.+0.014
 shock+0.014
 value+0.014
Neg logit contributions
angers-0.019
aris-0.018
cast-0.018
 castles-0.018
 cities-0.018
Token sand
Token ID6450
Feature activation+3.82e-02

Pos logit contributions
angers+0.041
aris+0.039
 cities+0.039
 castles+0.038
cast+0.037
Neg logit contributions
 word-0.057
 pain-0.054
 value-0.053
 hurts-0.052
'.-0.052
Token.
Token ID13
Feature activation-1.06e-02

Pos logit contributions
 word+0.007
'.+0.007
 pain+0.006
 apology+0.006
 hurts+0.006
Neg logit contributions
 castles-0.015
 cities-0.015
 forests-0.014
cast-0.014
 gardens-0.014
Token\n
Token ID198
Feature activation-6.58e-03

Pos logit contributions
 word+0.017
'.+0.016
 apology+0.016
 pain+0.016
 grant+0.015
Neg logit contributions
 castles-0.020
 cities-0.019
cast-0.019
 gardens-0.019
 forests-0.019
Token\n
Token ID198
Feature activation-1.51e-02

Pos logit contributions
 word+0.013
 pain+0.012
 apology+0.012
 hurts+0.012
 grant+0.012
Neg logit contributions
 castles-0.010
 forests-0.009
 cities-0.009
cast-0.009
 gardens-0.009
TokenThe
Token ID464
Feature activation+2.54e-02

Pos logit contributions
 word+0.026
 apology+0.024
 pain+0.024
 hurts+0.024
'.+0.023
Neg logit contributions
 castles-0.026
 cities-0.026
 forests-0.025
 gardens-0.025
 fields-0.025
Token bear
Token ID6842
Feature activation-6.07e-03

Pos logit contributions
 castles+0.054
 cities+0.052
 forests+0.051
cast+0.051
 gardens+0.051
Neg logit contributions
 word-0.035
 apology-0.032
'.-0.032
 pain-0.032
 grant-0.031
Token and
Token ID290
Feature activation+8.51e-02

Pos logit contributions
 word+0.010
 pain+0.009
 apology+0.009
'.+0.009
 hurts+0.009
Neg logit contributions
 castles-0.011
 cities-0.011
cast-0.011
 forests-0.010
 gardens-0.010
Token mouse
Token ID10211
Feature activation-6.63e-03

Pos logit contributions
 castles+0.145
cast+0.143
 cities+0.139
angers+0.137
 gardens+0.136
Neg logit contributions
 word-0.155
 pain-0.146
 apology-0.142
 shock-0.138
 grant-0.137
Token were
Token ID547
Feature activation+1.22e-02

Pos logit contributions
 word+0.012
 pain+0.012
 apology+0.011
'.+0.011
 grant+0.011
Neg logit contributions
angers-0.011
 castles-0.011
cast-0.010
 cities-0.010
 oceans-0.010
Token very
Token ID845
Feature activation+8.54e-03

Pos logit contributions
angers+0.021
 castles+0.021
cast+0.021
 cities+0.020
 oceans+0.020
Neg logit contributions
 word-0.021
 pain-0.021
 shock-0.020
 hurts-0.019
'.-0.019
Token scared
Token ID12008
Feature activation-1.85e-02

Pos logit contributions
 castles+0.015
angers+0.014
 cities+0.014
cast+0.014
 forests+0.014
Neg logit contributions
 word-0.015
 pain-0.014
 shock-0.014
 apology-0.014
 hurts-0.014
Token.
Token ID13
Feature activation-1.38e-02

Pos logit contributions
 word+0.022
 pain+0.019
'.+0.019
 apology+0.019
 hurts+0.018
Neg logit contributions
 castles-0.043
 cities-0.042
cast-0.042
 gardens-0.041
 forests-0.041
Token being
Token ID852
Feature activation-2.34e-02

Pos logit contributions
 castles+0.015
 cities+0.014
 forests+0.014
 fields+0.014
 gardens+0.014
Neg logit contributions
 word-0.015
 apology-0.014
'.-0.014
 grant-0.014
 owe-0.014
Token seen
Token ID1775
Feature activation-1.32e-02

Pos logit contributions
 word+0.042
 pain+0.040
 apology+0.039
'.+0.039
 hurts+0.039
Neg logit contributions
 castles-0.039
 cities-0.038
cast-0.037
 forests-0.037
 gardens-0.037
Token!
Token ID0
Feature activation-2.22e-02

Pos logit contributions
 word+0.016
 apology+0.015
 pain+0.014
 grant+0.014
'.+0.014
Neg logit contributions
 castles-0.030
 cities-0.029
 forests-0.029
cast-0.029
 gardens-0.029
Token He
Token ID679
Feature activation+2.61e-03

Pos logit contributions
 word+0.041
 pain+0.039
 apology+0.039
 grant+0.038
 hurts+0.037
Neg logit contributions
 castles-0.035
 cities-0.034
 forests-0.034
 fields-0.033
cast-0.033
Token danced
Token ID39480
Feature activation+3.42e-02

Pos logit contributions
 castles+0.005
 cities+0.005
cast+0.004
 gardens+0.004
 forests+0.004
Neg logit contributions
 word-0.004
 pain-0.004
 apology-0.004
'.-0.004
 grant-0.004
Token and
Token ID290
Feature activation+8.80e-02

Pos logit contributions
 castles+0.064
angers+0.063
 cities+0.063
 gardens+0.063
 planting+0.061
Neg logit contributions
 word-0.053
'.-0.048
 pain-0.047
se-0.046
 shock-0.046
Token played
Token ID2826
Feature activation+2.15e-02

Pos logit contributions
angers+0.159
aris+0.151
 castles+0.142
 cities+0.140
 gardens+0.139
Neg logit contributions
 pain-0.152
 word-0.149
 hurts-0.145
 shock-0.139
 value-0.135
Token outside
Token ID2354
Feature activation+1.18e-02

Pos logit contributions
 castles+0.032
 cities+0.031
 gardens+0.030
cast+0.030
 forests+0.030
Neg logit contributions
 word-0.042
 pain-0.040
 apology-0.039
'.-0.039
 grant-0.039
Token for
Token ID329
Feature activation-3.11e-02

Pos logit contributions
 castles+0.021
 cities+0.020
angers+0.020
cast+0.020
 gardens+0.020
Neg logit contributions
 word-0.020
 pain-0.018
'.-0.018
 grant-0.018
 apology-0.017
Token the
Token ID262
Feature activation+1.89e-02

Pos logit contributions
 word+0.067
 apology+0.064
 pain+0.063
'.+0.063
 grant+0.062
Neg logit contributions
 castles-0.042
 cities-0.040
cast-0.039
 forests-0.038
 gardens-0.038
Token first
Token ID717
Feature activation-2.57e-02

Pos logit contributions
 castles+0.034
 cities+0.033
cast+0.032
 forests+0.032
 gardens+0.032
Neg logit contributions
 word-0.032
 apology-0.030
 pain-0.030
'.-0.029
 grant-0.029
TokenL
Token ID43
Feature activation-5.03e-03

Pos logit contributions
 word+0.047
 apology+0.044
 pain+0.043
'.+0.043
 grant+0.042
Neg logit contributions
 castles-0.050
 cities-0.048
 forests-0.047
cast-0.047
 gardens-0.047
Tokenily
Token ID813
Feature activation-1.34e-02

Pos logit contributions
 word+0.008
'.+0.008
se+0.007
 shock+0.007
 grant+0.007
Neg logit contributions
 castles-0.009
angers-0.009
 upon-0.009
cast-0.009
 gardens-0.009
Token dropped
Token ID5710
Feature activation+2.18e-02

Pos logit contributions
 word+0.024
 apology+0.023
 pain+0.023
'.+0.023
 grant+0.022
Neg logit contributions
 castles-0.022
 cities-0.022
cast-0.021
 forests-0.021
 gardens-0.021
Token the
Token ID262
Feature activation+2.80e-02

Pos logit contributions
 castles+0.027
 fields+0.025
 cities+0.025
 mountains+0.025
 forests+0.025
Neg logit contributions
 word-0.048
 grant-0.047
'.-0.046
 apology-0.046
 owe-0.046
Token flag
Token ID6056
Feature activation+7.75e-03

Pos logit contributions
 castles+0.047
 cities+0.043
 ships+0.043
 mountains+0.042
cast+0.042
Neg logit contributions
'.-0.051
 word-0.049
 grant-0.048
 owe-0.047
ggie-0.047
Token and
Token ID290
Feature activation+8.76e-02

Pos logit contributions
 castles+0.013
 cities+0.012
cast+0.012
 forests+0.012
 gardens+0.012
Neg logit contributions
 word-0.014
 apology-0.013
'.-0.013
 pain-0.013
 grant-0.013
Token fell
Token ID3214
Feature activation+2.13e-02

Pos logit contributions
 cities+0.151
 castles+0.151
aris+0.149
angers+0.147
 forests+0.147
Neg logit contributions
 word-0.149
 pain-0.146
 hurts-0.143
 apology-0.137
 grant-0.136
Token off
Token ID572
Feature activation+6.30e-03

Pos logit contributions
 castles+0.036
 cities+0.034
cast+0.034
 forests+0.033
 fields+0.033
Neg logit contributions
 word-0.039
 apology-0.037
'.-0.036
 pain-0.036
 grant-0.036
Token the
Token ID262
Feature activation+4.32e-02

Pos logit contributions
 castles+0.012
 cities+0.011
 forests+0.011
cast+0.011
 fields+0.011
Neg logit contributions
 word-0.010
 apology-0.010
'.-0.009
 grant-0.009
 owe-0.009
Token swing
Token ID9628
Feature activation+1.48e-02

Pos logit contributions
 castles+0.069
 fields+0.063
 mountains+0.063
 gardens+0.061
 ships+0.060
Neg logit contributions
'.-0.084
 word-0.081
rum-0.080
se-0.079
 owe-0.078
Token.
Token ID13
Feature activation-1.14e-02

Pos logit contributions
 castles+0.033
cast+0.031
 cities+0.031
 fields+0.031
 gardens+0.030
Neg logit contributions
 word-0.018
 apology-0.018
 pain-0.017
 grant-0.017
'.-0.017
Token a
Token ID257
Feature activation-2.39e-02

Pos logit contributions
 word+0.038
 pain+0.037
'.+0.034
 shock+0.034
 apology+0.034
Neg logit contributions
 cities-0.058
aris-0.058
angers-0.057
cast-0.057
 oceans-0.056
Token fort
Token ID6285
Feature activation-6.77e-03

Pos logit contributions
 word+0.053
 apology+0.049
 pain+0.049
 shock+0.048
 grant+0.047
Neg logit contributions
 castles-0.030
 cities-0.030
 forests-0.029
 oceans-0.029
 gardens-0.028
Token.
Token ID13
Feature activation-2.07e-02

Pos logit contributions
 word+0.009
'.+0.008
 pain+0.008
 apology+0.008
 grant+0.008
Neg logit contributions
 castles-0.015
 cities-0.014
 forests-0.014
cast-0.014
 oceans-0.014
Token They
Token ID1119
Feature activation-8.74e-03

Pos logit contributions
 word+0.034
'.+0.032
 apology+0.032
 pain+0.031
 hurts+0.031
Neg logit contributions
 castles-0.038
 cities-0.037
cast-0.037
 forests-0.036
 gardens-0.036
Token put
Token ID1234
Feature activation+4.76e-02

Pos logit contributions
 word+0.013
 owe+0.013
 value+0.013
 grant+0.013
 bite+0.013
Neg logit contributions
 cities-0.016
 oceans-0.016
cast-0.016
angers-0.016
 castles-0.016
Token the
Token ID262
Feature activation+3.36e-02

Pos logit contributions
 cities+0.081
cast+0.081
 castles+0.081
aris+0.081
angers+0.079
Neg logit contributions
 word-0.080
 pain-0.075
 apology-0.072
 shock-0.071
'.-0.071
Token box
Token ID3091
Feature activation-6.14e-03

Pos logit contributions
cast+0.054
 castles+0.053
 cities+0.053
aris+0.052
angers+0.051
Neg logit contributions
 word-0.063
 pain-0.059
 apology-0.056
 hurts-0.056
 shock-0.056
Token on
Token ID319
Feature activation-2.73e-02

Pos logit contributions
 word+0.008
'.+0.007
 hurts+0.007
 pain+0.007
 apology+0.007
Neg logit contributions
 cities-0.013
 castles-0.012
 forests-0.012
cast-0.012
 oceans-0.012
Token the
Token ID262
Feature activation+4.79e-02

Pos logit contributions
 word+0.054
 apology+0.052
 pain+0.051
 owe+0.051
 hurts+0.051
Neg logit contributions
 castles-0.042
 fields-0.039
 cities-0.039
 forests-0.039
 gardens-0.039
Token grass
Token ID8701
Feature activation+3.16e-02

Pos logit contributions
 castles+0.069
 cities+0.064
 gardens+0.064
 forests+0.064
 fields+0.063
Neg logit contributions
 word-0.099
 apology-0.095
'.-0.093
 pain-0.093
 grant-0.091
Token and
Token ID290
Feature activation+9.10e-02

Pos logit contributions
 castles+0.059
 cities+0.058
 forests+0.057
cast+0.057
 oceans+0.056
Neg logit contributions
 word-0.051
'.-0.046
 apology-0.044
 pain-0.044
 hurts-0.043
Token if
Token ID611
Feature activation-8.31e-03

Pos logit contributions
angers+0.012
 oceans+0.012
 forests+0.012
cast+0.012
 castles+0.012
Neg logit contributions
 word-0.014
'.-0.013
 apology-0.013
 grant-0.013
 value-0.012
Token they
Token ID484
Feature activation-1.94e-02

Pos logit contributions
 word+0.016
 pain+0.015
 value+0.014
 apology+0.014
 grant+0.014
Neg logit contributions
cast-0.013
 castles-0.013
angers-0.013
 cities-0.012
 forests-0.012
Token could
Token ID714
Feature activation-7.17e-02

Pos logit contributions
 word+0.039
 apology+0.037
 pain+0.036
'.+0.036
 grant+0.035
Neg logit contributions
 castles-0.029
 cities-0.028
cast-0.027
 forests-0.027
 gardens-0.027
Token make
Token ID787
Feature activation-3.08e-02

Pos logit contributions
 apology+0.134
rum+0.133
 pain+0.133
'.+0.132
 word+0.131
Neg logit contributions
 castles-0.110
cast-0.095
 planting-0.095
 upon-0.093
 playing-0.093
Token a
Token ID257
Feature activation-2.41e-02

Pos logit contributions
 word+0.051
 apology+0.049
 pain+0.048
'.+0.048
 grant+0.047
Neg logit contributions
 castles-0.056
 cities-0.053
cast-0.052
 forests-0.052
 gardens-0.052
Token car
Token ID1097
Feature activation+7.29e-03

Pos logit contributions
 word+0.038
'.+0.038
 pain+0.037
 apology+0.037
se+0.037
Neg logit contributions
 castles-0.045
cast-0.042
 cities-0.042
 ships-0.041
 gardens-0.041
Token out
Token ID503
Feature activation-1.62e-02

Pos logit contributions
 castles+0.015
cast+0.014
 cities+0.014
 ships+0.014
 gardens+0.014
Neg logit contributions
 word-0.010
 apology-0.010
 pain-0.010
'.-0.010
 grant-0.009
Token of
Token ID286
Feature activation+8.24e-03

Pos logit contributions
 word+0.024
 apology+0.023
 pain+0.023
 grant+0.022
'.+0.022
Neg logit contributions
 castles-0.032
 cities-0.031
 ships-0.030
 forests-0.030
 gardens-0.030
Token it
Token ID340
Feature activation-1.26e-02

Pos logit contributions
 castles+0.014
 cities+0.013
 forests+0.013
 gardens+0.013
 oceans+0.013
Neg logit contributions
 word-0.015
 apology-0.014
'.-0.014
 pain-0.014
 hurts-0.014
Token.
Token ID13
Feature activation-1.41e-02

Pos logit contributions
 word+0.014
 apology+0.013
 pain+0.013
 grant+0.012
'.+0.012
Neg logit contributions
 castles-0.030
cast-0.029
 cities-0.029
 forests-0.028
 gardens-0.028
Token They
Token ID1119
Feature activation-5.03e-03

Pos logit contributions
 word+0.022
'.+0.021
 apology+0.020
 owe+0.019
 pain+0.019
Neg logit contributions
 castles-0.028
 cities-0.027
 forests-0.027
 oceans-0.027
cast-0.026
Token so
Token ID523
Feature activation-7.71e-03

Pos logit contributions
 word+0.030
 pain+0.028
 grant+0.027
 apology+0.027
 shock+0.027
Neg logit contributions
 castles-0.030
 cities-0.029
cast-0.029
 forests-0.028
 oceans-0.028
Token happy
Token ID3772
Feature activation+2.47e-03

Pos logit contributions
 word+0.017
 pain+0.017
 apology+0.016
'.+0.016
 hurts+0.016
Neg logit contributions
 castles-0.010
 cities-0.010
 forests-0.009
 oceans-0.009
 fields-0.009
Token!
Token ID0
Feature activation-2.39e-02

Pos logit contributions
 castles+0.005
 cities+0.005
 forests+0.005
 oceans+0.005
cast+0.005
Neg logit contributions
 word-0.004
'.-0.003
 pain-0.003
 hurts-0.003
 apology-0.003
Token She
Token ID1375
Feature activation+7.78e-03

Pos logit contributions
 word+0.044
'.+0.043
 apology+0.040
 pain+0.040
 grant+0.040
Neg logit contributions
 castles-0.041
 cities-0.039
cast-0.038
 ships-0.038
 oceans-0.037
Token took
Token ID1718
Feature activation+1.36e-03

Pos logit contributions
 castles+0.013
 cities+0.012
 forests+0.012
 oceans+0.012
 gardens+0.012
Neg logit contributions
 word-0.014
 grant-0.014
 pain-0.013
'.-0.013
 hurts-0.013
Token her
Token ID607
Feature activation+1.55e-02

Pos logit contributions
 castles+0.002
 cities+0.002
 forests+0.002
 fields+0.002
 gardens+0.002
Neg logit contributions
 word-0.003
 apology-0.002
'.-0.002
 grant-0.002
 owe-0.002
Token new
Token ID649
Feature activation+6.61e-03

Pos logit contributions
 castles+0.028
 cities+0.027
cast+0.027
 forests+0.027
 gardens+0.027
Neg logit contributions
 word-0.026
 apology-0.024
 pain-0.024
'.-0.024
 grant-0.023
Token things
Token ID1243
Feature activation-2.20e-02

Pos logit contributions
 castles+0.012
 cities+0.011
 forests+0.011
 ships+0.011
 fields+0.011
Neg logit contributions
 word-0.011
 apology-0.010
 pain-0.010
'.-0.010
 owe-0.010
Token to
Token ID284
Feature activation-2.63e-02

Pos logit contributions
 word+0.036
'.+0.033
 pain+0.032
 apology+0.032
 grant+0.031
Neg logit contributions
 castles-0.041
 cities-0.040
 forests-0.039
cast-0.039
 gardens-0.039
Token the
Token ID262
Feature activation+3.66e-02

Pos logit contributions
 word+0.036
 apology+0.033
 pain+0.032
'.+0.032
 grant+0.031
Neg logit contributions
 castles-0.056
 cities-0.054
cast-0.053
 forests-0.053
 gardens-0.053
Token park
Token ID3952
Feature activation+4.34e-02

Pos logit contributions
 castles+0.054
 cities+0.051
cast+0.050
 forests+0.050
 gardens+0.050
Neg logit contributions
 word-0.075
 apology-0.070
 pain-0.069
'.-0.069
 grant-0.068
Token loved
Token ID6151
Feature activation+4.81e-03

Pos logit contributions
 word+0.008
 apology+0.008
 trigger+0.008
'.+0.008
 grant+0.008
Neg logit contributions
 castles-0.004
 fields-0.004
 forests-0.004
 gardens-0.004
cast-0.003
Token to
Token ID284
Feature activation-3.89e-02

Pos logit contributions
cast+0.010
angers+0.010
leys+0.010
 cities+0.010
pit+0.010
Neg logit contributions
 word-0.006
 pain-0.006
 value-0.006
 hurts-0.006
 shock-0.005
Token explore
Token ID7301
Feature activation+2.71e-02

Pos logit contributions
 word+0.070
 apology+0.067
 pain+0.066
 hurts+0.065
'.+0.065
Neg logit contributions
 castles-0.065
 gardens-0.061
 fields-0.060
 ships-0.060
 cities-0.059
Token.
Token ID13
Feature activation-3.48e-02

Pos logit contributions
angers+0.062
aris+0.060
cast+0.059
 castles+0.059
pit+0.058
Neg logit contributions
 word-0.031
'.-0.028
 pain-0.027
 value-0.025
 shock-0.024
Token One
Token ID1881
Feature activation-3.51e-03

Pos logit contributions
 word+0.059
'.+0.055
 apology+0.054
 pain+0.054
 grant+0.052
Neg logit contributions
 castles-0.064
 cities-0.061
cast-0.060
 gardens-0.059
 forests-0.059
Token day
Token ID1110
Feature activation+9.64e-03

Pos logit contributions
 word+0.006
 apology+0.006
 pain+0.006
'.+0.006
 grant+0.006
Neg logit contributions
 castles-0.006
 cities-0.006
cast-0.005
 forests-0.005
 gardens-0.005
Token,
Token ID11
Feature activation+7.04e-02

Pos logit contributions
 castles+0.017
 cities+0.016
cast+0.016
 gardens+0.016
 forests+0.016
Neg logit contributions
 word-0.017
 pain-0.016
'.-0.016
 apology-0.016
 hurts-0.016
Token he
Token ID339
Feature activation+1.16e-02

Pos logit contributions
cast+0.116
 cities+0.116
aris+0.115
udd+0.115
angers+0.114
Neg logit contributions
 word-0.132
 pain-0.125
 shock-0.122
 grant-0.117
 bite-0.116
Token was
Token ID373
Feature activation-2.25e-02

Pos logit contributions
angers+0.023
aris+0.023
 cities+0.022
 castles+0.022
cast+0.021
Neg logit contributions
 am-0.016
 pain-0.016
 value-0.015
 word-0.014
 hurts-0.014
Token out
Token ID503
Feature activation-1.93e-02

Pos logit contributions
 pain+0.039
 hurt+0.035
 shock+0.035
ggy+0.033
 pinch+0.033
Neg logit contributions
cast-0.041
aris-0.041
 cities-0.041
 castles-0.039
 oceans-0.038
Token playing
Token ID2712
Feature activation+1.52e-03

Pos logit contributions
 pain+0.041
 word+0.040
 shock+0.038
 apology+0.038
 grant+0.038
Neg logit contributions
cast-0.026
 castles-0.026
 cities-0.025
aris-0.024
 oceans-0.024
Token heart
Token ID2612
Feature activation-8.90e-03

Pos logit contributions
 castles+0.035
 mountains+0.033
 forests+0.033
 gardens+0.033
 cities+0.033
Neg logit contributions
'.-0.044
 word-0.042
',"-0.041
se-0.040
rum-0.040
Token felt
Token ID2936
Feature activation+4.90e-03

Pos logit contributions
 word+0.014
 pain+0.013
 hurts+0.013
 grant+0.013
 apology+0.012
Neg logit contributions
 cities-0.017
 castles-0.017
cast-0.016
 forests-0.016
 oceans-0.016
Token warm
Token ID5814
Feature activation+9.25e-03

Pos logit contributions
 castles+0.006
 gardens+0.006
 fields+0.006
 cities+0.006
 forests+0.006
Neg logit contributions
 word-0.011
'.-0.010
 apology-0.010
 sentence-0.010
se-0.010
Token and
Token ID290
Feature activation+7.55e-02

Pos logit contributions
 castles+0.016
 cities+0.016
cast+0.015
 forests+0.015
 gardens+0.015
Neg logit contributions
 word-0.016
 pain-0.015
'.-0.015
 apology-0.014
 hurts-0.014
Token the
Token ID262
Feature activation+3.21e-02

Pos logit contributions
angers+0.161
aris+0.152
 cities+0.142
pit+0.141
cast+0.139
Neg logit contributions
 pain-0.122
 word-0.119
 value-0.112
 hurts-0.111
 shock-0.107
Token little
Token ID1310
Feature activation+1.81e-02

Pos logit contributions
 castles+0.054
 forests+0.052
 cities+0.052
 mountains+0.052
 gardens+0.052
Neg logit contributions
 word-0.056
'.-0.055
 apology-0.053
',"-0.051
se-0.051
Token boy
Token ID2933
Feature activation-2.75e-02

Pos logit contributions
 castles+0.027
 cities+0.025
 mountains+0.025
 forests+0.025
 oceans+0.025
Neg logit contributions
 word-0.036
'.-0.035
 apology-0.035
 grant-0.034
 owe-0.034
Token smiled
Token ID13541
Feature activation+2.66e-02

Pos logit contributions
 word+0.044
 pain+0.042
 grant+0.041
'.+0.041
 hurts+0.040
Neg logit contributions
 castles-0.052
 cities-0.051
cast-0.050
 oceans-0.049
 forests-0.049
Token.
Token ID13
Feature activation-2.21e-02

Pos logit contributions
 castles+0.055
cast+0.053
 cities+0.052
 fields+0.052
 forests+0.052
Neg logit contributions
 word-0.037
 apology-0.037
 owe-0.035
 grant-0.034
 pain-0.034
Token He
Token ID679
Feature activation+2.02e-03

Pos logit contributions
 word+0.030
'.+0.027
 apology+0.025
 value+0.024
 grant+0.024
Neg logit contributions
 castles-0.049
cast-0.047
 cities-0.047
 gardens-0.046
 forests-0.046
Token started
Token ID2067
Feature activation+1.24e-02

Pos logit contributions
 castles+0.004
 cities+0.004
angers+0.004
 oceans+0.003
 forests+0.003
Neg logit contributions
 word-0.003
 grant-0.003
 value-0.003
 pain-0.003
 owe-0.003
Token to
Token ID284
Feature activation-3.18e-02

Pos logit contributions
 castles+0.021
 forests+0.020
 cities+0.020
angers+0.020
 gardens+0.020
Neg logit contributions
 word-0.017
'.-0.015
 pain-0.015
 value-0.014
 apology-0.014
Token learn
Token ID2193
Feature activation-1.36e-02

Pos logit contributions
 hurts+0.055
 trigger+0.053
 word+0.053
 pain+0.052
 apology+0.051
Neg logit contributions
 castles-0.053
 cities-0.050
 discovering-0.049
 imagining-0.049
 gardens-0.049
Token something
Token ID1223
Feature activation-4.56e-02

Pos logit contributions
 word+0.025
 apology+0.024
'.+0.023
 pain+0.023
 grant+0.023
Neg logit contributions
 castles-0.022
 cities-0.021
cast-0.021
 forests-0.021
 gardens-0.021
Token new
Token ID649
Feature activation+7.45e-04

Pos logit contributions
 word+0.088
 apology+0.083
 pain+0.082
'.+0.082
 grant+0.080
Neg logit contributions
 castles-0.071
 cities-0.068
cast-0.066
 forests-0.066
 gardens-0.066
Token.
Token ID13
Feature activation-1.71e-02

Pos logit contributions
 castles+0.001
 cities+0.001
cast+0.001
 gardens+0.001
 ships+0.001
Neg logit contributions
 pain-0.001
 word-0.001
 apology-0.001
 grant-0.001
 owe-0.001
Token
Token ID198
Feature activation-1.10e-02

Pos logit contributions
 word+0.028
'.+0.027
 apology+0.026
 value+0.024
 owe+0.024
Neg logit contributions
 castles-0.032
cast-0.030
 cities-0.030
 gardens-0.030
angers-0.030
Token
Token ID198
Feature activation-1.83e-02

Pos logit contributions
 word+0.017
 apology+0.016
'.+0.016
 pain+0.015
 grant+0.015
Neg logit contributions
 castles-0.021
 cities-0.021
cast-0.020
 gardens-0.020
 forests-0.020
TokenThe
Token ID464
Feature activation+2.37e-02

Pos logit contributions
 word+0.035
'.+0.032
 apology+0.032
 pain+0.031
 grant+0.031
Neg logit contributions
 castles-0.029
 cities-0.028
cast-0.028
angers-0.028
 gardens-0.028
Token end
Token ID886
Feature activation-1.76e-02

Pos logit contributions
 castles+0.046
 cities+0.045
cast+0.044
 forests+0.044
 gardens+0.044
Neg logit contributions
 word-0.037
 apology-0.034
 pain-0.034
'.-0.033
 grant-0.033
Token.
Token ID13
Feature activation-2.48e-02

Pos logit contributions
 pain+0.022
 apology+0.022
 grant+0.022
 owe+0.022
 trigger+0.022
Neg logit contributions
 castles-0.037
 cities-0.036
 ships-0.035
 fields-0.035
 gardens-0.035
Token<|endoftext|>
Token ID50256
Feature activation+6.76e-03

Pos logit contributions
 word+0.041
'.+0.038
 apology+0.038
 pain+0.038
 grant+0.037
Neg logit contributions
 castles-0.046
 cities-0.044
cast-0.043
 forests-0.043
 gardens-0.043
Token something
Token ID1223
Feature activation-3.69e-02

Pos logit contributions
aris+0.170
angers+0.169
udd+0.160
ules+0.155
 cities+0.152
Neg logit contributions
 pain-0.103
 shock-0.101
 grant-0.100
 word-0.099
 bite-0.096
Token strange
Token ID6283
Feature activation-4.24e-03

Pos logit contributions
'.+0.066
 apology+0.066
 word+0.063
',"+0.062
 grant+0.062
Neg logit contributions
 castles-0.061
 fields-0.057
 ships-0.056
 gardens-0.056
 mountains-0.055
Token happened
Token ID3022
Feature activation-2.33e-02

Pos logit contributions
 apology+0.006
 word+0.006
'.+0.006
 grant+0.006
 owe+0.006
Neg logit contributions
 castles-0.008
 fields-0.008
 ships-0.008
 mountains-0.008
cast-0.008
Token.
Token ID13
Feature activation-1.99e-02

Pos logit contributions
 word+0.025
 apology+0.023
 pain+0.022
'.+0.021
 grant+0.021
Neg logit contributions
 castles-0.056
 cities-0.055
cast-0.054
 forests-0.054
 gardens-0.054
Token The
Token ID383
Feature activation+2.76e-02

Pos logit contributions
 word+0.022
'.+0.022
 grant+0.020
 pain+0.020
 shock+0.020
Neg logit contributions
 castles-0.049
cast-0.048
 cities-0.046
angers-0.044
udd-0.044
Token spaghetti
Token ID48241
Feature activation-4.76e-02

Pos logit contributions
 castles+0.051
 cities+0.049
 forests+0.048
cast+0.048
 gardens+0.048
Neg logit contributions
 word-0.045
 apology-0.042
'.-0.042
 pain-0.042
 grant-0.041
Token monster
Token ID9234
Feature activation-2.15e-02

Pos logit contributions
 word+0.082
 apology+0.077
 pain+0.076
'.+0.076
 grant+0.074
Neg logit contributions
 castles-0.083
 cities-0.080
cast-0.079
 forests-0.079
 gardens-0.078
Token vanished
Token ID23717
Feature activation+7.49e-03

Pos logit contributions
 word+0.037
 pain+0.034
 hurts+0.034
 grant+0.033
 shock+0.033
Neg logit contributions
 castles-0.038
 cities-0.038
angers-0.037
 oceans-0.036
 forests-0.036
Token!
Token ID0
Feature activation-2.05e-02

Pos logit contributions
 castles+0.015
 cities+0.014
 gardens+0.014
 forests+0.014
cast+0.014
Neg logit contributions
 word-0.011
 pain-0.010
'.-0.010
 apology-0.010
 grant-0.010
Token Po
Token ID7695
Feature activation+5.21e-03

Pos logit contributions
 word+0.024
'.+0.024
 pain+0.022
 grant+0.021
 shock+0.021
Neg logit contributions
 castles-0.049
cast-0.047
 cities-0.047
 gardens-0.045
angers-0.045
Tokenof
Token ID1659
Feature activation-5.16e-03

Pos logit contributions
 castles+0.009
 cities+0.009
 forests+0.009
 gardens+0.009
cast+0.009
Neg logit contributions
 word-0.009
'.-0.008
 pain-0.008
 grant-0.008
 apology-0.008
Token later
Token ID1568
Feature activation+2.06e-02

Pos logit contributions
 castles+0.022
 cities+0.021
cast+0.020
 forests+0.020
 gardens+0.020
Neg logit contributions
 word-0.022
'.-0.021
 apology-0.021
 pain-0.021
 grant-0.020
Token."
Token ID526
Feature activation+1.41e-02

Pos logit contributions
 castles+0.053
 cities+0.052
cast+0.051
 forests+0.051
 gardens+0.051
Neg logit contributions
 word-0.019
 apology-0.016
'.-0.016
 pain-0.016
 grant-0.015
Token
Token ID198
Feature activation-1.38e-02

Pos logit contributions
 castles+0.033
 cities+0.032
cast+0.031
 oceans+0.031
 gardens+0.031
Neg logit contributions
'.-0.016
 word-0.016
 apology-0.015
 grant-0.014
 hurts-0.014
Token
Token ID198
Feature activation-2.16e-02

Pos logit contributions
 word+0.024
 apology+0.022
 pain+0.022
'.+0.021
 grant+0.021
Neg logit contributions
 castles-0.024
 cities-0.024
 forests-0.023
cast-0.023
 gardens-0.023
Token"
Token ID1
Feature activation-4.11e-03

Pos logit contributions
 word+0.036
 apology+0.033
'.+0.033
 pain+0.033
 grant+0.032
Neg logit contributions
 castles-0.040
 cities-0.039
cast-0.038
 forests-0.038
 gardens-0.038
TokenOkay
Token ID16454
Feature activation-4.93e-02

Pos logit contributions
 word+0.006
 apology+0.006
'.+0.006
 pain+0.006
 hurts+0.006
Neg logit contributions
 castles-0.008
 cities-0.008
 forests-0.008
 oceans-0.008
cast-0.008
Token,"
Token ID553
Feature activation+3.76e-02

Pos logit contributions
 word+0.067
 apology+0.062
 pain+0.061
 grant+0.060
 owe+0.060
Neg logit contributions
 castles-0.105
 cities-0.100
cast-0.099
 gardens-0.098
 forests-0.098
Token Leo
Token ID19632
Feature activation+7.63e-03

Pos logit contributions
angers+0.060
cast+0.060
 cities+0.059
 castles+0.059
aris+0.058
Neg logit contributions
 word-0.072
 pain-0.068
 apology-0.068
 grant-0.065
 shock-0.065
Token said
Token ID531
Feature activation+3.95e-02

Pos logit contributions
 castles+0.013
 cities+0.012
 forests+0.012
 gardens+0.012
 oceans+0.012
Neg logit contributions
 word-0.014
 apology-0.013
'.-0.013
 pain-0.013
 grant-0.013
Token.
Token ID13
Feature activation-1.07e-02

Pos logit contributions
 castles+0.085
cast+0.081
 ships+0.079
 cities+0.079
 gardens+0.078
Neg logit contributions
 word-0.051
 pain-0.045
 owe-0.045
 grant-0.045
Ble-0.044
Token They
Token ID1119
Feature activation-5.49e-03

Pos logit contributions
 word+0.021
 apology+0.019
'.+0.019
 pain+0.018
 hurts+0.018
Neg logit contributions
 castles-0.017
 cities-0.017
 gardens-0.017
cast-0.017
 oceans-0.016
Token in
Token ID287
Feature activation+1.02e-02

Pos logit contributions
 apology+0.023
 word+0.021
'.+0.021
 owe+0.021
 prove+0.020
Neg logit contributions
 castles-0.019
cast-0.017
 fields-0.016
 forests-0.016
 oceans-0.016
Token her
Token ID607
Feature activation+3.80e-02

Pos logit contributions
 castles+0.012
 cities+0.011
 forests+0.011
 fields+0.011
 oceans+0.011
Neg logit contributions
 word-0.023
 apology-0.023
 owe-0.022
'.-0.022
 hurts-0.022
Token hair
Token ID4190
Feature activation-9.08e-03

Pos logit contributions
 castles+0.072
 gardens+0.069
 fields+0.068
 cities+0.067
 forests+0.067
Neg logit contributions
 word-0.059
'.-0.057
 apology-0.057
 owe-0.056
 grant-0.056
Token.
Token ID13
Feature activation-9.81e-03

Pos logit contributions
 apology+0.012
 word+0.012
 grant+0.011
'.+0.011
 owe+0.011
Neg logit contributions
 castles-0.019
cast-0.019
 fields-0.018
 mountains-0.018
 gardens-0.018
Token She
Token ID1375
Feature activation+1.40e-02

Pos logit contributions
 word+0.016
 pain+0.015
'.+0.014
 hurts+0.014
 apology+0.014
Neg logit contributions
 castles-0.019
 cities-0.018
 ships-0.018
cast-0.018
 gardens-0.017
Token is
Token ID318
Feature activation-1.46e-02

Pos logit contributions
 castles+0.028
 cities+0.027
 forests+0.026
 gardens+0.026
 oceans+0.026
Neg logit contributions
 word-0.021
 pain-0.019
 apology-0.019
 hurts-0.019
 grant-0.019
Token happy
Token ID3772
Feature activation+4.12e-03

Pos logit contributions
 pain+0.028
 word+0.027
 hurts+0.026
 shock+0.026
 value+0.025
Neg logit contributions
 cities-0.023
aris-0.023
 ships-0.023
 forests-0.022
 castles-0.022
Token.
Token ID13
Feature activation-1.40e-02

Pos logit contributions
 cities+0.009
 castles+0.009
 forests+0.009
 ships+0.009
aris+0.009
Neg logit contributions
 word-0.005
 pain-0.004
 hurts-0.004
'.-0.004
 apology-0.004
Token She
Token ID1375
Feature activation+8.62e-03

Pos logit contributions
 word+0.025
'.+0.023
 pain+0.023
 apology+0.023
 grant+0.023
Neg logit contributions
 castles-0.025
 cities-0.023
cast-0.023
 gardens-0.023
 ships-0.023
Token says
Token ID1139
Feature activation+1.32e-02

Pos logit contributions
 castles+0.016
 cities+0.016
cast+0.016
 forests+0.016
 gardens+0.016
Neg logit contributions
 word-0.014
 apology-0.013
'.-0.013
 pain-0.013
 grant-0.012
Token,
Token ID11
Feature activation+5.72e-02

Pos logit contributions
 castles+0.023
 cities+0.023
cast+0.022
 forests+0.022
 gardens+0.022
Neg logit contributions
 word-0.023
 apology-0.021
 pain-0.021
 hurts-0.021
 grant-0.021
Token asks
Token ID7893
Feature activation+3.59e-02

Pos logit contributions
 castles+0.012
 cities+0.011
cast+0.011
 forests+0.011
 gardens+0.011
Neg logit contributions
 word-0.011
 apology-0.010
 pain-0.010
'.-0.010
 grant-0.010
Token.
Token ID13
Feature activation-5.60e-03

Pos logit contributions
 castles+0.080
 cities+0.077
cast+0.076
 gardens+0.076
 forests+0.076
Neg logit contributions
 word-0.045
 apology-0.040
 grant-0.040
 owe-0.040
'.-0.039
Token
Token ID198
Feature activation-1.53e-02

Pos logit contributions
 word+0.011
 apology+0.010
 pain+0.010
'.+0.010
 grant+0.010
Neg logit contributions
 castles-0.008
 cities-0.008
 forests-0.008
 fields-0.008
 gardens-0.008
Token
Token ID198
Feature activation-2.24e-02

Pos logit contributions
 word+0.030
 apology+0.028
 pain+0.028
 grant+0.027
'.+0.027
Neg logit contributions
 castles-0.023
 forests-0.022
 cities-0.022
 fields-0.022
cast-0.021
Token"
Token ID1
Feature activation-5.47e-03

Pos logit contributions
 word+0.039
 apology+0.037
 grant+0.036
 pain+0.036
'.+0.035
Neg logit contributions
 castles-0.039
 cities-0.037
 forests-0.037
 gardens-0.037
 fields-0.037
TokenNo
Token ID2949
Feature activation-5.81e-02

Pos logit contributions
'.+0.008
 word+0.008
 apology+0.008
 pain+0.008
',"+0.008
Neg logit contributions
 castles-0.010
 cities-0.009
 forests-0.009
cast-0.009
 upon-0.009
Token,
Token ID11
Feature activation+3.01e-02

Pos logit contributions
 grant+0.138
'.+0.132
 owe+0.131
se+0.130
 trigger+0.127
Neg logit contributions
 castles-0.062
 upon-0.051
 fields-0.051
 mountains-0.049
 cities-0.049
Token I
Token ID314
Feature activation-8.48e-02

Pos logit contributions
 castles+0.042
 cities+0.037
 gardens+0.037
cast+0.036
 ships+0.035
Neg logit contributions
'.-0.061
 word-0.060
se-0.057
 grant-0.057
 apology-0.057
Token am
Token ID716
Feature activation-5.06e-02

Pos logit contributions
 word+0.123
 apology+0.115
'.+0.111
 trigger+0.111
 grant+0.110
Neg logit contributions
 upon-0.170
 castles-0.159
cast-0.158
udd-0.157
ashing-0.153
Token not
Token ID407
Feature activation-8.27e-02

Pos logit contributions
 word+0.098
 apology+0.092
'.+0.091
 grant+0.089
 trigger+0.088
Neg logit contributions
 castles-0.075
 planting-0.072
 imagining-0.070
cast-0.069
 gardens-0.068
Token OK
Token ID7477
Feature activation-7.16e-02

Pos logit contributions
 word+0.150
 grant+0.149
 trigger+0.147
 apology+0.146
'.+0.141
Neg logit contributions
 castles-0.128
 planting-0.120
 imagining-0.119
 upon-0.118
cast-0.114
Token
Token ID198
Feature activation-2.65e-02

Pos logit contributions
 word+0.032
 apology+0.030
 pain+0.030
'.+0.030
 grant+0.029
Neg logit contributions
 castles-0.036
 cities-0.034
cast-0.034
 forests-0.034
 gardens-0.033
Token"
Token ID1
Feature activation-9.38e-03

Pos logit contributions
 word+0.039
 apology+0.036
 pain+0.036
'.+0.036
 grant+0.035
Neg logit contributions
 castles-0.054
 cities-0.052
cast-0.051
 forests-0.051
 gardens-0.051
TokenNo
Token ID2949
Feature activation-5.67e-02

Pos logit contributions
 word+0.014
'.+0.013
 apology+0.013
',"+0.012
 owe+0.012
Neg logit contributions
 castles-0.018
 forests-0.018
 cities-0.018
 oceans-0.018
 fields-0.017
Token,
Token ID11
Feature activation+2.80e-02

Pos logit contributions
 word+0.116
 grant+0.115
'.+0.114
 owe+0.114
 apology+0.112
Neg logit contributions
 castles-0.077
 cities-0.070
 fields-0.068
 ships-0.068
 forests-0.068
Token you
Token ID345
Feature activation-8.26e-02

Pos logit contributions
 castles+0.048
 cities+0.046
cast+0.045
 forests+0.044
 gardens+0.044
Neg logit contributions
 word-0.049
'.-0.046
 apology-0.046
 pain-0.044
 hurts-0.044
Token won
Token ID1839
Feature activation-8.29e-02

Pos logit contributions
 word+0.127
 apology+0.119
'.+0.116
 grant+0.111
 pain+0.110
Neg logit contributions
 castles-0.160
cast-0.151
 cities-0.150
 ships-0.149
 gardens-0.148
Token't
Token ID470
Feature activation-6.98e-02

Pos logit contributions
 apology+0.158
 word+0.142
Ble+0.134
 owe+0.133
 value+0.131
Neg logit contributions
 castles-0.132
cast-0.127
 mountains-0.121
 planting-0.121
 playing-0.119
Token!
Token ID0
Feature activation-5.05e-02

Pos logit contributions
 apology+0.115
 word+0.107
antly+0.099
Ble+0.099
 phrase+0.098
Neg logit contributions
 castles-0.122
cast-0.114
 upon-0.108
 planting-0.106
 gardens-0.105
Token I
Token ID314
Feature activation-8.18e-02

Pos logit contributions
 word+0.080
 apology+0.074
 pain+0.072
 grant+0.070
'.+0.069
Neg logit contributions
 castles-0.090
 cities-0.089
 forests-0.087
 mountains-0.087
angers-0.086
Token am
Token ID716
Feature activation-4.09e-02

Pos logit contributions
 word+0.139
 apology+0.133
'.+0.127
 trigger+0.126
 sentence+0.125
Neg logit contributions
 upon-0.141
cast-0.133
 castles-0.128
 ships-0.120
ashing-0.119
Token a
Token ID257
Feature activation-5.50e-02

Pos logit contributions
 word+0.072
 apology+0.067
 pain+0.067
'.+0.066
 grant+0.065
Neg logit contributions
 castles-0.071
 cities-0.069
cast-0.067
 forests-0.067
 gardens-0.067
Token her
Token ID607
Feature activation+6.03e-03

Pos logit contributions
cast+0.042
 castles+0.041
 oceans+0.041
 cities+0.040
 ships+0.040
Neg logit contributions
 word-0.037
 apology-0.035
 pain-0.034
 value-0.034
 grant-0.033
Token mom
Token ID1995
Feature activation+1.18e-02

Pos logit contributions
 castles+0.010
 cities+0.009
 gardens+0.009
 fields+0.009
 forests+0.009
Neg logit contributions
 word-0.011
'.-0.010
 apology-0.010
 pain-0.010
 grant-0.010
Tokenmy
Token ID1820
Feature activation-6.11e-03

Pos logit contributions
 castles+0.020
 forests+0.019
 oceans+0.019
 cities+0.019
cast+0.019
Neg logit contributions
 word-0.022
'.-0.020
 value-0.020
 apology-0.019
 grant-0.019
Token if
Token ID611
Feature activation-3.47e-03

Pos logit contributions
 word+0.010
'.+0.010
 apology+0.009
 value+0.009
 grant+0.009
Neg logit contributions
 castles-0.011
 oceans-0.011
 cities-0.011
 forests-0.011
cast-0.011
Token she
Token ID673
Feature activation+5.29e-03

Pos logit contributions
 word+0.007
 pain+0.007
 value+0.007
 grant+0.007
 reward+0.007
Neg logit contributions
cast-0.005
aris-0.005
 forests-0.005
 cities-0.005
 oceans-0.005
Token could
Token ID714
Feature activation-7.10e-02

Pos logit contributions
 castles+0.005
cast+0.004
 ships+0.004
 upon+0.004
aris+0.004
Neg logit contributions
 apology-0.013
 word-0.013
 pain-0.012
 trigger-0.012
'.-0.012
Token have
Token ID423
Feature activation-4.26e-02

Pos logit contributions
 trigger+0.127
 pain+0.127
rum+0.122
 apology+0.121
 hurts+0.119
Neg logit contributions
 castles-0.101
 upon-0.101
 planting-0.100
 gardens-0.099
 parties-0.094
Token a
Token ID257
Feature activation-3.94e-02

Pos logit contributions
 word+0.070
 apology+0.068
'.+0.066
 grant+0.065
 hurts+0.063
Neg logit contributions
 castles-0.078
 cities-0.074
 fields-0.074
 gardens-0.074
 forests-0.074
Token little
Token ID1310
Feature activation-1.15e-02

Pos logit contributions
'.+0.055
 word+0.052
 pain+0.052
 apology+0.052
 owe+0.051
Neg logit contributions
 castles-0.076
 fields-0.074
 gardens-0.074
 mountains-0.072
 planting-0.072
Token bunny
Token ID44915
Feature activation-1.99e-02

Pos logit contributions
 word+0.017
 grant+0.016
'.+0.016
 owe+0.016
se+0.016
Neg logit contributions
 castles-0.022
 gardens-0.021
 fields-0.021
 ships-0.021
 cities-0.020
Token.
Token ID13
Feature activation-2.01e-02

Pos logit contributions
 word+0.029
 apology+0.028
 pain+0.027
 hurts+0.027
 grant+0.026
Neg logit contributions
 castles-0.039
 gardens-0.038
 fields-0.038
 ships-0.037
 cities-0.037
Token and
Token ID290
Feature activation+8.34e-02

Pos logit contributions
 castles+0.052
 cities+0.050
 gardens+0.049
cast+0.049
 forests+0.049
Neg logit contributions
 word-0.050
 pain-0.046
 apology-0.046
'.-0.046
 grant-0.044
Token said
Token ID531
Feature activation+2.82e-02

Pos logit contributions
 cities+0.139
 castles+0.139
angers+0.133
 forests+0.133
 gardens+0.133
Neg logit contributions
 word-0.152
 pain-0.141
 hurts-0.140
 apology-0.138
 grant-0.137
Token,
Token ID11
Feature activation+7.09e-02

Pos logit contributions
 castles+0.044
 cities+0.042
cast+0.041
 forests+0.041
 gardens+0.040
Neg logit contributions
 word-0.055
 apology-0.052
 pain-0.051
'.-0.051
 grant-0.050
Token "
Token ID366
Feature activation-5.89e-03

Pos logit contributions
 castles+0.133
 gardens+0.130
 cities+0.129
aris+0.128
 forests+0.124
Neg logit contributions
 word-0.118
 grant-0.116
'.-0.109
 apology-0.104
 owe-0.099
TokenYou
Token ID1639
Feature activation-7.91e-02

Pos logit contributions
 apology+0.009
 word+0.009
 pain+0.009
 hurts+0.008
 shock+0.008
Neg logit contributions
 castles-0.011
 forests-0.011
 oceans-0.011
 cities-0.011
cast-0.011
Token did
Token ID750
Feature activation-7.24e-02

Pos logit contributions
 word+0.133
 apology+0.125
 pain+0.123
'.+0.122
 grant+0.119
Neg logit contributions
 castles-0.142
 cities-0.137
cast-0.135
 forests-0.134
 gardens-0.134
Token it
Token ID340
Feature activation-4.80e-02

Pos logit contributions
 apology+0.125
 word+0.124
 owe+0.117
 grant+0.115
 trigger+0.114
Neg logit contributions
 castles-0.123
 cities-0.117
 forests-0.116
cast-0.114
 mountains-0.114
Token all
Token ID477
Feature activation+3.53e-03

Pos logit contributions
 word+0.066
 apology+0.063
 pain+0.061
'.+0.059
 grant+0.059
Neg logit contributions
 castles-0.101
cast-0.097
 cities-0.097
 forests-0.095
 gardens-0.095
Token on
Token ID319
Feature activation-5.90e-02

Pos logit contributions
 castles+0.007
 cities+0.006
cast+0.006
 forests+0.006
 gardens+0.006
Neg logit contributions
 word-0.006
 apology-0.005
 pain-0.005
'.-0.005
 grant-0.005
Token your
Token ID534
Feature activation+1.95e-03

Pos logit contributions
 word+0.104
 apology+0.101
 owe+0.094
 hurts+0.094
 trigger+0.093
Neg logit contributions
 castles-0.102
 forests-0.099
 fields-0.098
 cities-0.097
 mountains-0.097
Token own
Token ID898
Feature activation-1.30e-03

Pos logit contributions
 castles+0.003
 cities+0.003
 forests+0.003
 fields+0.003
 oceans+0.003
Neg logit contributions
 word-0.003
 apology-0.003
'.-0.003
 owe-0.003
 grant-0.003
Token just
Token ID655
Feature activation-8.93e-03

Pos logit contributions
 word+0.034
'.+0.033
 apology+0.033
 grant+0.031
se+0.031
Neg logit contributions
 castles-0.032
 cities-0.029
 ships-0.029
 gardens-0.029
 fields-0.029
Token a
Token ID257
Feature activation-4.13e-02

Pos logit contributions
 word+0.013
 pain+0.012
 apology+0.012
'.+0.012
 grant+0.012
Neg logit contributions
 castles-0.018
 cities-0.017
cast-0.017
 forests-0.017
 gardens-0.017
Token toy
Token ID13373
Feature activation-3.50e-02

Pos logit contributions
'.+0.065
',"+0.063
 apology+0.059
 grant+0.059
se+0.058
Neg logit contributions
 castles-0.072
 ships-0.071
 cities-0.067
cast-0.067
 mountains-0.066
Token.
Token ID13
Feature activation-2.43e-02

Pos logit contributions
 apology+0.057
 pain+0.055
 word+0.053
Ble+0.052
 owe+0.052
Neg logit contributions
 castles-0.060
 ships-0.060
 mountains-0.056
cast-0.055
 cities-0.055
Token You
Token ID921
Feature activation-7.10e-02

Pos logit contributions
 word+0.040
 pain+0.037
 apology+0.036
 shock+0.035
 bite+0.035
Neg logit contributions
 cities-0.040
 castles-0.039
 mountains-0.039
 forests-0.039
 ships-0.039
Token can
Token ID460
Feature activation-7.64e-02

Pos logit contributions
 word+0.104
 apology+0.097
'.+0.093
 sentence+0.090
 pain+0.088
Neg logit contributions
 castles-0.142
 ships-0.136
cast-0.134
 cities-0.133
 gardens-0.133
Token get
Token ID651
Feature activation-4.55e-03

Pos logit contributions
 word+0.137
 apology+0.128
 pain+0.124
 phrase+0.122
 sentence+0.122
Neg logit contributions
 castles-0.119
cast-0.111
 ships-0.108
 upon-0.103
 playing-0.103
Token another
Token ID1194
Feature activation-2.16e-02

Pos logit contributions
 word+0.008
 apology+0.007
'.+0.007
 grant+0.007
 sentence+0.007
Neg logit contributions
 castles-0.008
 ships-0.008
 cities-0.007
 gardens-0.007
 oceans-0.007
Token one
Token ID530
Feature activation-7.84e-03

Pos logit contributions
ggie+0.038
ggy+0.037
'.+0.036
Ble+0.036
kered+0.035
Neg logit contributions
 ships-0.031
 upon-0.031
 imagining-0.030
 castles-0.029
 hall-0.028
Token."
Token ID526
Feature activation+8.72e-03

Pos logit contributions
se+0.013
 apology+0.013
 grant+0.013
 pain+0.012
 owe+0.012
Neg logit contributions
 castles-0.012
 ships-0.012
 upon-0.011
 imagining-0.011
 waiting-0.011
Token He
Token ID679
Feature activation+1.28e-02

Pos logit contributions
 castles+0.017
 cities+0.017
 gardens+0.016
 forests+0.016
cast+0.016
Neg logit contributions
 word-0.013
 apology-0.013
'.-0.012
 pain-0.012
 hurts-0.012
Token hands
Token ID2832
Feature activation-3.25e-03

Pos logit contributions
 castles+0.048
 fields+0.047
 ships+0.046
 forests+0.046
 cities+0.045
Neg logit contributions
 word-0.062
 grant-0.060
 apology-0.060
 owe-0.059
 hurts-0.058
Token tight
Token ID5381
Feature activation-2.49e-02

Pos logit contributions
 word+0.004
 apology+0.004
 pain+0.004
'.+0.004
 grant+0.004
Neg logit contributions
 castles-0.007
 cities-0.007
cast-0.007
 forests-0.007
 gardens-0.007
Token.
Token ID13
Feature activation-1.71e-02

Pos logit contributions
 word+0.029
 apology+0.027
 grant+0.027
 pain+0.027
 owe+0.026
Neg logit contributions
 castles-0.057
 cities-0.055
 fields-0.055
 forests-0.054
 gardens-0.054
Token
Token ID198
Feature activation-1.62e-02

Pos logit contributions
'.+0.028
 word+0.027
 grant+0.025
 pain+0.024
 apology+0.024
Neg logit contributions
 castles-0.033
 cities-0.031
cast-0.031
 gardens-0.030
 oceans-0.029
Token"
Token ID1
Feature activation-2.56e-03

Pos logit contributions
 word+0.027
 apology+0.025
 pain+0.025
'.+0.024
 hurts+0.024
Neg logit contributions
 castles-0.030
 cities-0.029
 forests-0.028
cast-0.028
 gardens-0.028
TokenDo
Token ID5211
Feature activation-7.02e-02

Pos logit contributions
 word+0.004
 apology+0.004
 pain+0.004
 hurts+0.004
'.+0.004
Neg logit contributions
 castles-0.005
 cities-0.005
 forests-0.005
 oceans-0.005
cast-0.005
Token you
Token ID345
Feature activation-7.57e-02

Pos logit contributions
 word+0.122
 apology+0.115
 pain+0.114
'.+0.112
 grant+0.110
Neg logit contributions
 castles-0.123
 cities-0.118
 forests-0.115
 gardens-0.115
cast-0.115
Token think
Token ID892
Feature activation-3.16e-02

Pos logit contributions
 word+0.140
 hurts+0.129
 trigger+0.129
 apology+0.124
 purpose+0.123
Neg logit contributions
 upon-0.113
 castles-0.109
 cities-0.109
 deserts-0.108
 ships-0.107
Token we
Token ID356
Feature activation-6.14e-02

Pos logit contributions
 word+0.055
 apology+0.051
'.+0.050
 pain+0.050
 grant+0.049
Neg logit contributions
 castles-0.056
 cities-0.054
cast-0.053
 forests-0.053
 gardens-0.052
Token will
Token ID481
Feature activation-8.03e-02

Pos logit contributions
 word+0.090
 apology+0.085
 pain+0.080
 hurts+0.078
'.+0.077
Neg logit contributions
 castles-0.121
 cities-0.120
 forests-0.118
 fields-0.117
 gardens-0.116
Token be
Token ID307
Feature activation-1.47e-02

Pos logit contributions
 word+0.127
 apology+0.127
 hurts+0.118
 pain+0.118
 trigger+0.117
Neg logit contributions
 castles-0.142
cast-0.137
 gardens-0.135
 planting-0.135
 upon-0.135