TOP ACTIVATIONS
MAX = 13.195

grandma 's house for safe keeping . When
it to me for safe keeping . He told me to
my showed Billy his screw driver and they decided to build
He takes the screw driver and opens the hole .
can make us do things we may not be prepared for
says neigh . Aren 't they lovely , Daisy ?" Anna
the screw with this screw driver . Let 's find more
he got his little screw driver and started to screw in
we have to do things we don 't like , but
, too . They hope they can find a new toy
to the other side . Whoever gets there first is the
that they could do anything they set their minds to .
!", he said . " Whoever finishes their writing first wins
it ." Emma understood and they all played together happily .
question , and she courage ously answered it .
started it . He / she wants my car ." Mom
room and used a screw driver to open it .
. The robot does everything they ask . They have a
bring things to people that they need ."
can hop around it . Whoever finishes first , wins !"

MIN ACTIVATIONS
MIN = -18.127

she got rid of the intr uders . She did not
, he noticed that the Sm elly King was cruel and
try to grab the av oc ados . But the rock
box and share the ch ocol ates . They are happy
to see the rh in oc eros anymore . He is
and wrapped them around the gr umpy man . It said
a big box of ch ocol ates ! <|endoftext|> Once upon
food for the rh in oc eros and his friends .
had never seen the av oc ados . They wished they
have stopped the rh in oc eros . He is sleeping
her a box of ch ocol ates . Lily opened the
look for the rh in oc eros . They
play without going near the sm elly leaves . Tim and
?" asked the rh in oc eros . "
a big box of ch ocol ates !
trapped with the rh in oc eros . They looked for
had two rulers , the Sm elly King and the St
to roll around with the gr umpy shape and soon ,
and took the rh in oc eros home with her .
a big box of ch ocol ates to share with the

INTERVAL 5.365 - 13.195
CONTAINS 8.641%

it home with her . She put it in a jar
wash the mud off . He also liked to swing and
questions , but her daddy could not answer them all .
to roar and it made everyone scared . One day ,
your baseball ! ” She found a ladder and stood

INTERVAL -2.466 - 5.365
CONTAINS 68.417%

a stubborn lizard on a rock . The lizard did not
" This is my bell and I don 't want you
and he lived far away in a distant forest .
and jump around , especially when she was feeling happy .
her thumb and spread it around the paper . She gigg

INTERVAL -10.296 - -2.466
CONTAINS 21.989%

go back home ." But Tom did not listen
. He was playing with a toy and he didn 't
loved to play outside in the grass . One day ,
a time , there was a smart little dog named Max
daddy agreed and they found a spot to sit and watch

INTERVAL -18.127 - -10.296
CONTAINS 0.953%

. He liked to explore the woods near his hut ,
. They play with the pupp ets , wear the costumes
?" The elderly person smiled and nodded .
says they need to wear swe aters . "
animals got together and tied the tree down with rope to
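
NOTE ON INTERVAL BOUNDS
The four INTERVAL ranges above are consistent with splitting the span from MIN (-18.127) to MAX (13.195) into four equal-width bins of about 7.83 each. The sketch below reproduces the bounds under that assumption. The helper name `equal_width_edges` is hypothetical and is not taken from the tool that produced this dump, and the printed MIN/MAX are already rounded, so the last digit of a recomputed bound may differ from the listed one.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 bin edges from hi down to lo, equally spaced.

    Assumption: the dashboard bins activations into equal-width intervals.
    """
    width = (hi - lo) / n_bins
    return [hi - i * width for i in range(n_bins + 1)]


# MIN and MAX as printed in this dump (already rounded to 3 decimals).
edges = equal_width_edges(-18.127, 13.195, 4)

# Walk the edges pairwise, highest interval first, as in the listing above.
for upper, lower in zip(edges, edges[1:]):
    print(f"INTERVAL {lower:.3f} - {upper:.3f}")
```

The CONTAINS percentages listed for the four intervals (8.641, 68.417, 21.989, 0.953) sum to 100, which fits the reading that each one is the share of activations falling inside that bin.
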
Token grandma
Token ID49890
Feature activation+5.21e-01

Pos logit contributions
umm+0.115
andel+0.112
ickets+0.111
anes+0.107
pill+0.105
Neg logit contributions
 would-0.098
 never-0.098
 wouldn-0.096
 got-0.092
 didn-0.092
Token's
Token ID338
Feature activation-5.15e+00

Pos logit contributions
 wouldn+0.008
 would+0.008
 got+0.008
 never+0.007
 didn+0.007
Neg logit contributions
umm-0.014
andel-0.014
 sym-0.014
 films-0.014
 circles-0.013
Token house
Token ID2156
Feature activation+3.21e+00

Pos logit contributions
umm+0.101
andel+0.098
ickets+0.093
 sym+0.091
 films+0.089
Neg logit contributions
 would-0.122
 never-0.122
 wouldn-0.120
 got-0.115
 didn-0.114
Token for
Token ID329
Feature activation+9.03e-01

Pos logit contributions
 wouldn+0.057
 would+0.054
 got+0.053
 never+0.052
 didn+0.049
Neg logit contributions
 films-0.078
umm-0.077
 circles-0.077
andel-0.076
 sym-0.075
Token safe
Token ID3338
Feature activation-2.27e-01

Pos logit contributions
 wouldn+0.017
 would+0.017
 got+0.016
 never+0.016
 didn+0.015
Neg logit contributions
 films-0.021
umm-0.021
 circles-0.020
andel-0.020
 sym-0.020
Tokenkeeping
Token ID19934
Feature activation+1.32e+01

Pos logit contributions
 films+0.005
umm+0.005
 circles+0.005
andel+0.005
 sym+0.005
Neg logit contributions
 wouldn-0.005
 would-0.004
 got-0.004
 never-0.004
 didn-0.004
Token.
Token ID13
Feature activation-8.98e-01

Pos logit contributions
 wouldn+0.295
 would+0.281
 got+0.273
 never+0.267
 didn+0.265
Neg logit contributions
 circles-0.234
 films-0.227
 sym-0.221
 lions-0.218
umm-0.217
Token
Token ID220
Feature activation-1.65e+00

Pos logit contributions
 films+0.019
umm+0.019
 circles+0.019
andel+0.019
 sym+0.019
Neg logit contributions
 wouldn-0.018
 would-0.018
 got-0.017
 never-0.017
 didn-0.016
Token
Token ID198
Feature activation-3.34e+00

Pos logit contributions
 films+0.038
umm+0.038
 circles+0.037
andel+0.037
 sym+0.037
Neg logit contributions
 wouldn-0.031
 would-0.030
 got-0.029
 never-0.029
 didn-0.027
Token
Token ID198
Feature activation-3.68e+00

Pos logit contributions
 circles+0.070
 films+0.069
 sym+0.067
umm+0.067
andel+0.067
Neg logit contributions
 wouldn-0.071
 would-0.067
 got-0.065
 never-0.064
 didn-0.062
TokenWhen
Token ID2215
Feature activation+4.71e+00

Pos logit contributions
 circles+0.071
 films+0.070
 sym+0.069
umm+0.069
andel+0.068
Neg logit contributions
 wouldn-0.083
 would-0.079
 got-0.076
 never-0.075
 didn-0.072
Token it
Token ID340
Feature activation+5.21e+00

Pos logit contributions
 circles+0.038
 films+0.037
umm+0.037
 lions+0.037
andel+0.036
Neg logit contributions
 wouldn-0.020
 would-0.018
 got-0.018
 never-0.017
 didn-0.016
Token to
Token ID284
Feature activation+4.01e+00

Pos logit contributions
 wouldn+0.070
 would+0.065
 got+0.063
 never+0.062
 didn+0.058
Neg logit contributions
umm-0.149
 films-0.149
 circles-0.147
andel-0.146
 sym-0.145
Token me
Token ID502
Feature activation+1.36e+00

Pos logit contributions
 wouldn+0.073
 never+0.072
 would+0.071
 got+0.070
 didn+0.066
Neg logit contributions
umm-0.097
andel-0.094
 sym-0.094
 films-0.092
 circles-0.091
Token for
Token ID329
Feature activation+7.87e-01

Pos logit contributions
 wouldn+0.025
 would+0.024
 got+0.023
 never+0.023
 didn+0.022
Neg logit contributions
 films-0.032
umm-0.032
 circles-0.032
andel-0.031
 sym-0.031
Token safe
Token ID3338
Feature activation-6.61e-01

Pos logit contributions
 wouldn+0.016
 would+0.015
 got+0.015
 never+0.015
 didn+0.014
Neg logit contributions
umm-0.017
 films-0.017
 circles-0.017
andel-0.017
 sym-0.017
Tokenkeeping
Token ID19934
Feature activation+1.21e+01

Pos logit contributions
umm+0.014
 films+0.014
 circles+0.014
andel+0.014
 sym+0.014
Neg logit contributions
 wouldn-0.014
 would-0.013
 got-0.013
 never-0.013
 didn-0.012
Token.
Token ID13
Feature activation-1.99e+00

Pos logit contributions
 wouldn+0.273
 would+0.259
 got+0.252
 didn+0.244
 never+0.243
Neg logit contributions
 circles-0.223
 films-0.219
 lions-0.211
umm-0.207
 sym-0.206
Token He
Token ID679
Feature activation+7.17e+00

Pos logit contributions
andel+0.047
umm+0.047
 films+0.046
 sym+0.045
 circles+0.045
Neg logit contributions
 wouldn-0.037
 would-0.035
 never-0.035
 got-0.035
 didn-0.033
Token told
Token ID1297
Feature activation-3.43e-01

Pos logit contributions
 wouldn+0.115
 would+0.101
 got+0.101
 never+0.099
 managed+0.097
Neg logit contributions
 lions-0.166
 films-0.166
 circles-0.166
 elderly-0.162
 elephants-0.161
Token me
Token ID502
Feature activation+1.49e+00

Pos logit contributions
umm+0.009
 films+0.008
 circles+0.008
andel+0.008
 sym+0.008
Neg logit contributions
 wouldn-0.006
 would-0.006
 got-0.006
 never-0.005
 didn-0.005
Token to
Token ID284
Feature activation+4.00e+00

Pos logit contributions
 wouldn+0.028
 got+0.026
 would+0.025
 never+0.024
 didn+0.024
Neg logit contributions
 films-0.033
 circles-0.033
 lions-0.032
andel-0.031
 monkeys-0.031
Tokenmy
Token ID1820
Feature activation+6.37e+00

Pos logit contributions
 wouldn+0.054
 never+0.045
 got+0.044
 would+0.044
 fell+0.042
Neg logit contributions
 circles-0.101
 elderly-0.095
 lions-0.093
 sym-0.093
 films-0.093
Token showed
Token ID3751
Feature activation-7.73e-01

Pos logit contributions
 wouldn+0.081
use+0.069
 would+0.065
 never+0.065
 got+0.062
Neg logit contributions
 circles-0.166
 films-0.160
 sym-0.159
 lions-0.157
 elderly-0.155
Token Billy
Token ID15890
Feature activation+4.24e+00

Pos logit contributions
umm+0.022
 sym+0.021
 films+0.021
andel+0.021
 Wheels+0.020
Neg logit contributions
 wouldn-0.011
 got-0.011
 would-0.011
 never-0.010
 didn-0.010
Token his
Token ID465
Feature activation-3.75e+00

Pos logit contributions
 wouldn+0.063
 would+0.062
 got+0.061
 never+0.059
 didn+0.056
Neg logit contributions
umm-0.123
andel-0.116
 sym-0.115
 films-0.113
 orchestra-0.111
Token screw
Token ID9580
Feature activation+7.32e-01

Pos logit contributions
umm+0.079
andel+0.074
 films+0.071
 sym+0.068
 Wheels+0.066
Neg logit contributions
 wouldn-0.090
 would-0.088
 got-0.085
 didn-0.083
 never-0.082
Tokendriver
Token ID26230
Feature activation+1.06e+01

Pos logit contributions
 wouldn+0.012
 would+0.011
 never+0.011
 forgot+0.011
 got+0.011
Neg logit contributions
 circles-0.018
umm-0.016
 squares-0.016
andel-0.016
 films-0.016
Token and
Token ID290
Feature activation+5.96e+00

Pos logit contributions
 wouldn+0.168
 would+0.156
 got+0.150
 never+0.150
 forgot+0.146
Neg logit contributions
 circles-0.265
 films-0.254
 elderly-0.249
andel-0.247
 squares-0.246
Token they
Token ID484
Feature activation+8.45e+00

Pos logit contributions
 wouldn+0.085
 would+0.079
 never+0.078
 got+0.074
 didn+0.068
Neg logit contributions
 circles-0.158
 films-0.151
 squares-0.147
 sym-0.146
 elderly-0.146
Token decided
Token ID3066
Feature activation+4.93e+00

Pos logit contributions
 wouldn+0.109
use+0.104
 never+0.101
 would+0.095
 got+0.089
Neg logit contributions
 circles-0.214
 squares-0.206
 sym-0.201
 array-0.196
 elephants-0.195
Token to
Token ID284
Feature activation+4.46e+00

Pos logit contributions
 wouldn+0.075
 got+0.070
 would+0.067
 fell+0.066
use+0.065
Neg logit contributions
 circles-0.116
 films-0.113
 lions-0.113
 amusement-0.113
 sym-0.110
Token build
Token ID1382
Feature activation+2.42e+00

Pos logit contributions
 wouldn+0.082
 would+0.076
 got+0.073
 never+0.071
 didn+0.069
Neg logit contributions
 circles-0.102
umm-0.099
 films-0.099
 amusement-0.096
 sym-0.095
Token
Token ID198
Feature activation-2.83e+00

Pos logit contributions
umm+0.062
 films+0.060
andel+0.059
 circles+0.059
 sym+0.058
Neg logit contributions
 wouldn-0.049
 never-0.047
 would-0.047
 got-0.047
 didn-0.044
TokenHe
Token ID1544
Feature activation+7.13e+00

Pos logit contributions
umm+0.068
 films+0.066
andel+0.065
 circles+0.064
 sym+0.062
Neg logit contributions
 never-0.051
 wouldn-0.051
 would-0.050
 got-0.050
 didn-0.048
Token takes
Token ID2753
Feature activation+6.25e-01

Pos logit contributions
 wouldn+0.082
 would+0.070
 never+0.069
 got+0.067
 didn+0.061
Neg logit contributions
 circles-0.203
 films-0.197
 lions-0.194
 sym-0.193
 Wheels-0.191
Token the
Token ID262
Feature activation-9.37e+00

Pos logit contributions
 wouldn+0.011
 got+0.011
 never+0.010
 would+0.010
 didn+0.009
Neg logit contributions
umm-0.016
 sym-0.015
 films-0.015
andel-0.015
 array-0.015
Token screw
Token ID9580
Feature activation-5.72e-01

Pos logit contributions
umm+0.179
 sym+0.173
 films+0.167
opl+0.166
andel+0.165
Neg logit contributions
 would-0.221
 wouldn-0.220
 got-0.208
 never-0.198
 will-0.195
Tokendriver
Token ID26230
Feature activation+9.49e+00

Pos logit contributions
 circles+0.014
umm+0.014
andel+0.014
 films+0.013
 lions+0.013
Neg logit contributions
 wouldn-0.010
 would-0.009
 never-0.009
 got-0.009
 didn-0.009
Token and
Token ID290
Feature activation+5.64e+00

Pos logit contributions
 wouldn+0.168
 would+0.159
 didn+0.150
 forgot+0.149
 never+0.148
Neg logit contributions
 circles-0.219
andel-0.210
 elderly-0.209
 films-0.209
 lions-0.200
Token opens
Token ID9808
Feature activation+2.29e+00

Pos logit contributions
 wouldn+0.097
 would+0.094
 never+0.092
'd+0.085
 didn+0.085
Neg logit contributions
 circles-0.139
 squares-0.127
 films-0.126
 buildings-0.125
 Wheels-0.122
Token the
Token ID262
Feature activation-8.60e+00

Pos logit contributions
 wouldn+0.029
 would+0.027
 got+0.026
 never+0.026
 didn+0.024
Neg logit contributions
umm-0.067
 films-0.067
 circles-0.066
andel-0.066
 sym-0.066
Token hole
Token ID7604
Feature activation+1.98e+00

Pos logit contributions
 films+0.159
 sym+0.155
andel+0.152
umm+0.151
 orchestra+0.145
Neg logit contributions
 wouldn-0.207
 would-0.206
 got-0.199
 never-0.192
 will-0.186
Token.
Token ID13
Feature activation-1.63e+00

Pos logit contributions
 wouldn+0.037
 would+0.035
 got+0.034
 never+0.034
 didn+0.032
Neg logit contributions
 films-0.047
umm-0.047
andel-0.046
 sym-0.046
 circles-0.045
Token can
Token ID460
Feature activation+6.22e+00

Pos logit contributions
 circles+0.020
 films+0.020
umm+0.020
 sym+0.019
 lions+0.019
Neg logit contributions
 wouldn-0.012
 got-0.011
 never-0.011
 would-0.011
 managed-0.010
Token make
Token ID787
Feature activation+9.30e-01

Pos logit contributions
 wouldn+0.099
 got+0.089
 would+0.088
 managed+0.083
 didn+0.082
Neg logit contributions
 circles-0.154
 films-0.150
 elderly-0.146
 sym-0.145
 sunrise-0.144
Token us
Token ID514
Feature activation+4.39e+00

Pos logit contributions
 wouldn+0.018
 would+0.017
 never+0.016
 got+0.016
 didn+0.016
Neg logit contributions
umm-0.022
andel-0.021
 films-0.021
 sym-0.021
 circles-0.021
Token do
Token ID466
Feature activation+3.02e+00

Pos logit contributions
 wouldn+0.071
 would+0.067
 got+0.065
 never+0.065
 didn+0.061
Neg logit contributions
 films-0.113
umm-0.113
 circles-0.112
andel-0.110
 sym-0.109
Token things
Token ID1243
Feature activation+5.63e+00

Pos logit contributions
 wouldn+0.061
 would+0.058
 got+0.057
 never+0.055
 didn+0.053
Neg logit contributions
 films-0.065
 circles-0.065
 sym-0.062
umm-0.061
andel-0.060
Token we
Token ID356
Feature activation+9.49e+00

Pos logit contributions
 wouldn+0.099
 got+0.094
 would+0.093
 never+0.091
 didn+0.086
Neg logit contributions
 circles-0.129
 films-0.128
 sym-0.124
umm-0.124
 elderly-0.122
Token may
Token ID743
Feature activation+8.70e+00

Pos logit contributions
 wouldn+0.118
 never+0.107
 got+0.106
 fell+0.103
 would+0.103
Neg logit contributions
 circles-0.245
 films-0.242
 sym-0.241
 elderly-0.238
 orchestra-0.234
Token not
Token ID407
Feature activation+6.15e+00

Pos logit contributions
 wouldn+0.134
 got+0.119
 would+0.119
 didn+0.108
 never+0.108
Neg logit contributions
 circles-0.227
 films-0.221
 sym-0.219
umm-0.216
 elderly-0.213
Token be
Token ID307
Feature activation+1.83e+00

Pos logit contributions
 wouldn+0.095
 would+0.085
 got+0.084
 never+0.081
 didn+0.077
Neg logit contributions
 circles-0.156
 films-0.155
 sym-0.152
umm-0.151
 elderly-0.150
Token prepared
Token ID5597
Feature activation+3.66e+00

Pos logit contributions
 wouldn+0.035
 would+0.033
 got+0.032
 never+0.032
 didn+0.030
Neg logit contributions
 films-0.042
 circles-0.041
umm-0.041
andel-0.040
 sym-0.040
Token for
Token ID329
Feature activation+7.83e-01

Pos logit contributions
 wouldn+0.069
 would+0.066
 got+0.065
 never+0.064
 didn+0.061
Neg logit contributions
 films-0.083
umm-0.083
 circles-0.083
andel-0.081
 sym-0.081
Token says
Token ID1139
Feature activation+1.03e+00

Pos logit contributions
 wouldn+0.115
 would+0.102
 got+0.102
 never+0.101
 didn+0.099
Neg logit contributions
 films-0.159
 lions-0.158
 circles-0.156
 elephants-0.151
 monkeys-0.151
Token neigh
Token ID3422
Feature activation-2.44e+00

Pos logit contributions
 wouldn+0.018
 would+0.018
 got+0.017
 never+0.017
 didn+0.016
Neg logit contributions
umm-0.026
 films-0.025
andel-0.025
 circles-0.025
 sym-0.025
Token.
Token ID13
Feature activation-1.79e+00

Pos logit contributions
 films+0.058
 circles+0.057
umm+0.057
andel+0.056
 sym+0.056
Neg logit contributions
 wouldn-0.044
 would-0.041
 got-0.041
 never-0.040
 didn-0.038
Token Aren
Token ID9843
Feature activation+4.20e-02

Pos logit contributions
umm+0.044
andel+0.044
 sym+0.043
 films+0.043
 circles+0.042
Neg logit contributions
 wouldn-0.031
 would-0.030
 got-0.029
 never-0.029
 could-0.028
Token't
Token ID470
Feature activation+5.08e+00

Pos logit contributions
 wouldn+0.001
 would+0.001
 got+0.001
 never+0.001
 didn+0.000
Neg logit contributions
 films-0.001
 circles-0.001
umm-0.001
andel-0.001
 Wheels-0.001
Token they
Token ID484
Feature activation+9.48e+00

Pos logit contributions
 wouldn+0.079
 would+0.072
 got+0.070
 never+0.068
 didn+0.065
Neg logit contributions
 films-0.137
umm-0.133
 circles-0.132
 lions-0.132
andel-0.129
Token lovely
Token ID14081
Feature activation-3.52e+00

Pos logit contributions
 wouldn+0.178
 would+0.168
 got+0.164
 never+0.162
 didn+0.155
Neg logit contributions
 films-0.220
 circles-0.214
umm-0.213
 sym-0.209
andel-0.209
Token,
Token ID11
Feature activation-1.07e+00

Pos logit contributions
umm+0.096
 films+0.096
 circles+0.095
andel+0.094
 sym+0.094
Neg logit contributions
 wouldn-0.052
 would-0.049
 got-0.047
 never-0.047
 didn-0.044
Token Daisy
Token ID40355
Feature activation+3.49e+00

Pos logit contributions
umm+0.024
 circles+0.024
andel+0.024
 films+0.024
 sym+0.024
Neg logit contributions
 wouldn-0.021
 would-0.020
 never-0.020
 got-0.020
 didn-0.019
Token?"
Token ID1701
Feature activation-1.63e+00

Pos logit contributions
 wouldn+0.064
 managed+0.057
 got+0.056
 never+0.055
 didn+0.054
Neg logit contributions
 films-0.082
 lions-0.077
 elderly-0.076
 circles-0.076
 monkeys-0.073
Token Anna
Token ID11735
Feature activation+3.02e+00

Pos logit contributions
umm+0.042
andel+0.041
 films+0.040
 sym+0.040
 circles+0.039
Neg logit contributions
 wouldn-0.027
 would-0.027
 got-0.026
 never-0.026
 didn-0.025
Token the
Token ID262
Feature activation-8.93e+00

Pos logit contributions
 wouldn+0.037
 would+0.035
 got+0.034
 never+0.034
 didn+0.032
Neg logit contributions
umm-0.064
 films-0.063
andel-0.062
 circles-0.062
 sym-0.062
Token screw
Token ID9580
Feature activation-8.33e-02

Pos logit contributions
umm+0.157
 films+0.156
 sym+0.154
andel+0.153
ums+0.149
Neg logit contributions
 wouldn-0.221
 would-0.219
 got-0.206
 never-0.202
 will-0.196
Token with
Token ID351
Feature activation+3.49e-01

Pos logit contributions
 circles+0.002
umm+0.002
 films+0.002
 lions+0.002
andel+0.002
Neg logit contributions
 wouldn-0.001
 never-0.001
 didn-0.001
 would-0.001
 got-0.001
Token this
Token ID428
Feature activation-4.37e+00

Pos logit contributions
 wouldn+0.006
 never+0.006
 would+0.006
 got+0.006
 didn+0.006
Neg logit contributions
umm-0.008
 films-0.008
andel-0.008
 sym-0.008
 circles-0.008
Token screw
Token ID9580
Feature activation+2.91e-01

Pos logit contributions
 circles+0.091
 films+0.091
umm+0.091
andel+0.089
 sym+0.088
Neg logit contributions
 wouldn-0.090
 would-0.086
 got-0.085
 never-0.084
 didn-0.081
Tokendriver
Token ID26230
Feature activation+9.44e+00

Pos logit contributions
 wouldn+0.005
 never+0.005
 would+0.005
 got+0.005
 didn+0.005
Neg logit contributions
 circles-0.006
umm-0.006
 films-0.006
 lions-0.006
andel-0.006
Token.
Token ID13
Feature activation-1.36e+00

Pos logit contributions
 wouldn+0.182
 would+0.171
 got+0.168
 never+0.167
 didn+0.166
Neg logit contributions
 circles-0.196
 films-0.185
 elderly-0.182
 lions-0.181
umm-0.179
Token Let
Token ID3914
Feature activation+5.30e+00

Pos logit contributions
umm+0.031
andel+0.031
 films+0.031
 sym+0.030
 circles+0.030
Neg logit contributions
 wouldn-0.026
 would-0.024
 never-0.024
 got-0.024
 could-0.023
Token's
Token ID338
Feature activation+8.43e-01

Pos logit contributions
 wouldn+0.098
 would+0.090
 never+0.089
 got+0.085
 managed+0.085
Neg logit contributions
 circles-0.121
 films-0.119
 Wheels-0.114
umm-0.112
 lions-0.112
Token find
Token ID1064
Feature activation+3.18e+00

Pos logit contributions
 wouldn+0.014
 would+0.013
 got+0.012
 didn+0.012
 never+0.012
Neg logit contributions
 circles-0.021
 films-0.020
 Wheels-0.019
umm-0.019
 amusement-0.019
Token more
Token ID517
Feature activation-2.93e+00

Pos logit contributions
 wouldn+0.049
 would+0.046
 got+0.045
 never+0.044
 didn+0.041
Neg logit contributions
umm-0.086
 films-0.085
andel-0.084
 sym-0.083
 circles-0.083
Token he
Token ID339
Feature activation+8.49e+00

Pos logit contributions
 wouldn+0.021
 would+0.020
 got+0.018
 never+0.018
 didn+0.017
Neg logit contributions
 circles-0.043
umm-0.042
 films-0.042
 sym-0.041
andel-0.041
Token got
Token ID1392
Feature activation+1.98e+00

Pos logit contributions
 wouldn+0.089
use+0.086
 never+0.082
 would+0.078
 don+0.074
Neg logit contributions
 elderly-0.223
 circles-0.216
 Wheels-0.211
 films-0.211
 monkeys-0.210
Token his
Token ID465
Feature activation-3.27e+00

Pos logit contributions
 wouldn+0.037
 would+0.035
 never+0.034
 got+0.033
 didn+0.032
Neg logit contributions
 circles-0.046
 films-0.046
 elderly-0.044
andel-0.043
 lions-0.043
Token little
Token ID1310
Feature activation-6.58e+00

Pos logit contributions
umm+0.063
 sym+0.061
 films+0.060
andel+0.060
 circles+0.060
Neg logit contributions
 wouldn-0.079
 would-0.076
 got-0.074
 never-0.072
 didn-0.071
Token screw
Token ID9580
Feature activation+5.50e-01

Pos logit contributions
opl+0.131
umm+0.131
andel+0.130
 sym+0.130
 films+0.128
Neg logit contributions
 wouldn-0.149
 would-0.144
 got-0.143
 tried-0.133
 didn-0.133
Tokendriver
Token ID26230
Feature activation+9.38e+00

Pos logit contributions
 wouldn+0.009
 would+0.009
 never+0.009
 got+0.009
 didn+0.008
Neg logit contributions
 circles-0.013
umm-0.013
 films-0.013
andel-0.013
 lions-0.013
Token and
Token ID290
Feature activation+5.98e+00

Pos logit contributions
 wouldn+0.178
 would+0.170
 never+0.161
 got+0.160
 didn+0.158
Neg logit contributions
 circles-0.208
 films-0.204
 elderly-0.200
andel-0.199
umm-0.197
Token started
Token ID2067
Feature activation+3.26e+00

Pos logit contributions
 wouldn+0.076
 would+0.071
 never+0.071
use+0.068
'd+0.063
Neg logit contributions
 circles-0.159
 elderly-0.152
 films-0.151
 squares-0.149
 buildings-0.147
Token to
Token ID284
Feature activation+4.12e+00

Pos logit contributions
 wouldn+0.048
 would+0.045
 got+0.045
 never+0.044
 didn+0.041
Neg logit contributions
umm-0.090
 films-0.089
andel-0.088
 circles-0.087
 sym-0.087
Token screw
Token ID9580
Feature activation+1.19e+00

Pos logit contributions
 wouldn+0.075
 would+0.072
 never+0.070
 got+0.068
 didn+0.064
Neg logit contributions
 circles-0.097
 films-0.095
umm-0.092
 sym-0.091
 elderly-0.090
Token in
Token ID287
Feature activation-1.68e-01

Pos logit contributions
 wouldn+0.022
 would+0.021
 got+0.021
 never+0.021
 didn+0.020
Neg logit contributions
 circles-0.028
 films-0.027
umm-0.027
andel-0.027
 sym-0.026
Token we
Token ID356
Feature activation+7.18e+00

Pos logit contributions
 wouldn+0.059
 would+0.055
 got+0.055
 never+0.053
 managed+0.051
Neg logit contributions
 films-0.074
 circles-0.074
umm-0.069
 sym-0.069
 lions-0.068
Token have
Token ID423
Feature activation+5.11e+00

Pos logit contributions
 wouldn+0.094
 managed+0.084
 got+0.084
 would+0.081
 listened+0.081
Neg logit contributions
 circles-0.176
 films-0.168
 Animals-0.167
 Wheels-0.167
 Birds-0.164
Token to
Token ID284
Feature activation+4.78e+00

Pos logit contributions
 wouldn+0.076
 would+0.069
 got+0.067
 never+0.064
 didn+0.062
Neg logit contributions
 circles-0.137
 films-0.137
 lions-0.132
 sym-0.132
umm-0.132
Token do
Token ID466
Feature activation+3.07e+00

Pos logit contributions
 wouldn+0.073
 would+0.067
 got+0.065
 never+0.061
 didn+0.059
Neg logit contributions
 circles-0.127
 films-0.126
umm-0.123
 sym-0.122
andel-0.121
Token things
Token ID1243
Feature activation+5.32e+00

Pos logit contributions
 wouldn+0.067
 would+0.064
 got+0.063
 never+0.062
 didn+0.060
Neg logit contributions
 films-0.062
umm-0.062
 circles-0.061
andel-0.060
 sym-0.060
Token we
Token ID356
Feature activation+9.35e+00

Pos logit contributions
 wouldn+0.104
 would+0.097
 got+0.097
 never+0.095
 managed+0.092
Neg logit contributions
 films-0.113
 circles-0.111
 elderly-0.108
umm-0.105
 sym-0.103
Token don
Token ID836
Feature activation+4.52e+00

Pos logit contributions
 wouldn+0.124
 never+0.115
 fell+0.111
 got+0.107
 would+0.105
Neg logit contributions
 elderly-0.223
 circles-0.222
 films-0.219
 Animals-0.215
 teams-0.214
Token't
Token ID470
Feature activation+7.52e+00

Pos logit contributions
 wouldn+0.064
 would+0.060
 got+0.056
 never+0.055
 didn+0.052
Neg logit contributions
 films-0.126
 circles-0.125
umm-0.123
andel-0.121
 Wheels-0.120
Token like
Token ID588
Feature activation+2.97e-01

Pos logit contributions
 wouldn+0.120
 would+0.108
 got+0.099
 managed+0.098
 never+0.096
Neg logit contributions
 circles-0.186
 films-0.185
 elderly-0.185
andel-0.182
umm-0.180
Token,
Token ID11
Feature activation+9.91e-01

Pos logit contributions
 wouldn+0.005
 would+0.005
 never+0.005
 got+0.005
 didn+0.004
Neg logit contributions
umm-0.008
andel-0.008
 sym-0.008
 circles-0.007
 films-0.007
Token but
Token ID475
Feature activation+4.73e+00

Pos logit contributions
 wouldn+0.017
 would+0.016
 never+0.015
 got+0.015
 didn+0.015
Neg logit contributions
umm-0.026
andel-0.025
 films-0.025
 sym-0.025
 circles-0.025
Token,
Token ID11
Feature activation+1.89e+00

Pos logit contributions
 wouldn+0.077
 would+0.073
 got+0.071
 never+0.071
 didn+0.067
Neg logit contributions
 films-0.090
umm-0.090
 circles-0.089
andel-0.088
 sym-0.087
Token too
Token ID1165
Feature activation+3.11e+00

Pos logit contributions
 wouldn+0.037
 would+0.035
 got+0.034
 never+0.033
 didn+0.032
Neg logit contributions
 films-0.042
 circles-0.042
umm-0.041
 sym-0.040
andel-0.040
Token.
Token ID13
Feature activation-1.29e+00

Pos logit contributions
 wouldn+0.057
 would+0.055
 never+0.054
 got+0.053
 didn+0.052
Neg logit contributions
umm-0.077
andel-0.075
 films-0.075
 circles-0.074
 sym-0.072
Token They
Token ID1119
Feature activation+8.31e+00

Pos logit contributions
umm+0.031
andel+0.031
 films+0.031
 circles+0.030
 sym+0.029
Neg logit contributions
 wouldn-0.023
 never-0.022
 would-0.022
 got-0.021
 didn-0.021
Token hope
Token ID2911
Feature activation+2.18e+00

Pos logit contributions
 wouldn+0.134
 would+0.120
 got+0.119
 never+0.115
 didn+0.111
Neg logit contributions
 circles-0.194
 films-0.191
 amusement-0.191
 lions-0.189
 elephants-0.187
Token they
Token ID484
Feature activation+9.32e+00

Pos logit contributions
 wouldn+0.035
 would+0.033
 never+0.032
 got+0.032
 didn+0.030
Neg logit contributions
umm-0.057
 films-0.057
 circles-0.056
andel-0.056
 sym-0.056
Token can
Token ID460
Feature activation+6.79e+00

Pos logit contributions
 wouldn+0.086
 got+0.083
 managed+0.082
 fell+0.081
 refused+0.080
Neg logit contributions
 lions-0.234
 elephants-0.232
 monkeys-0.227
 films-0.222
 circles-0.220
Token find
Token ID1064
Feature activation+2.08e+00

Pos logit contributions
 wouldn+0.119
 would+0.103
 got+0.102
 didn+0.099
 managed+0.096
Neg logit contributions
 circles-0.165
 films-0.158
 sym-0.154
 elderly-0.154
 amusement-0.153
Token a
Token ID257
Feature activation-3.42e+00

Pos logit contributions
 wouldn+0.037
 would+0.036
 never+0.035
 got+0.035
 didn+0.032
Neg logit contributions
umm-0.052
andel-0.051
 sym-0.050
 films-0.050
 Wheels-0.049
Token new
Token ID649
Feature activation-8.26e+00

Pos logit contributions
umm+0.056
andel+0.053
 films+0.053
 circles+0.050
ement+0.049
Neg logit contributions
 would-0.094
 wouldn-0.093
 never-0.091
 got-0.090
 didn-0.085
Token toy
Token ID13373
Feature activation-1.78e+00

Pos logit contributions
umm+0.153
andel+0.146
 films+0.144
 Wheels+0.134
 circles+0.133
Neg logit contributions
 would-0.210
 wouldn-0.205
 never-0.202
 got-0.199
 didn-0.189
Token to
Token ID284
Feature activation+4.38e+00

Pos logit contributions
 wouldn+0.034
 would+0.033
 never+0.031
 got+0.031
 didn+0.030
Neg logit contributions
 circles-0.040
 films-0.038
umm-0.037
 elderly-0.037
andel-0.037
Token the
Token ID262
Feature activation-9.52e+00

Pos logit contributions
 wouldn+0.071
 never+0.068
 got+0.067
 would+0.067
 didn+0.062
Neg logit contributions
umm-0.119
andel-0.113
 films-0.113
 sym-0.109
ement-0.109
Token other
Token ID584
Feature activation-7.01e+00

Pos logit contributions
andel+0.162
 films+0.159
umm+0.159
ement+0.154
 sym+0.153
Neg logit contributions
 wouldn-0.244
 would-0.242
 never-0.233
 got-0.227
 will-0.217
Token side
Token ID1735
Feature activation-5.12e-01

Pos logit contributions
andel+0.126
 sym+0.123
umm+0.122
igsaw+0.119
 films+0.118
Neg logit contributions
 would-0.169
 wouldn-0.167
 got-0.164
 never-0.158
 didn-0.151
Token.
Token ID13
Feature activation-1.03e+00

Pos logit contributions
 films+0.011
umm+0.011
andel+0.011
 circles+0.011
 sym+0.011
Neg logit contributions
 wouldn-0.010
 would-0.010
 got-0.010
 never-0.009
 didn-0.009
Token Whoever
Token ID37807
Feature activation+9.29e+00

Pos logit contributions
andel+0.026
umm+0.026
 films+0.026
 sym+0.025
 orchestra+0.024
Neg logit contributions
 wouldn-0.017
 would-0.017
 never-0.016
 got-0.016
 could-0.016
Token gets
Token ID3011
Feature activation+9.43e-01

Pos logit contributions
 wouldn+0.139
 never+0.125
 would+0.121
 got+0.120
 didn+0.120
Neg logit contributions
 circles-0.232
 lions-0.222
 films-0.220
 Wheels-0.216
 squares-0.213
Token there
Token ID612
Feature activation+5.61e+00

Pos logit contributions
 wouldn+0.018
 would+0.017
 got+0.017
 never+0.017
 didn+0.016
Neg logit contributions
 circles-0.021
 films-0.020
umm-0.020
andel-0.020
 sym-0.020
Token first
Token ID717
Feature activation-8.96e-01

Pos logit contributions
 wouldn+0.079
 got+0.073
 would+0.071
 never+0.070
 didn+0.067
Neg logit contributions
 circles-0.150
 films-0.148
umm-0.146
 lions-0.144
 sym-0.143
Token is
Token ID318
Feature activation+2.16e-01

Pos logit contributions
umm+0.021
 films+0.021
andel+0.021
 circles+0.021
 sym+0.021
Neg logit contributions
 wouldn-0.016
 would-0.016
 got-0.015
 never-0.015
 didn-0.014
Token the
Token ID262
Feature activation-8.64e+00

Pos logit contributions
 wouldn+0.004
 would+0.004
 never+0.004
 got+0.004
 didn+0.003
Neg logit contributions
umm-0.005
 films-0.005
andel-0.005
 circles-0.005
 sym-0.005
Token that
Token ID326
Feature activation+1.74e+00

Pos logit contributions
 never+0.029
 would+0.028
 wouldn+0.028
 got+0.027
 listened+0.025
Neg logit contributions
umm-0.056
andel-0.053
 circles-0.050
 sym-0.050
opl-0.050
Token they
Token ID484
Feature activation+8.59e+00

Pos logit contributions
 wouldn+0.030
 would+0.029
 never+0.028
 got+0.028
 didn+0.027
Neg logit contributions
umm-0.044
andel-0.044
 sym-0.042
 films-0.042
 circles-0.042
Token could
Token ID714
Feature activation+7.22e+00

Pos logit contributions
 wouldn+0.084
 fell+0.068
use+0.067
ried+0.063
 managed+0.063
Neg logit contributions
 elderly-0.228
 dancers-0.215
 circles-0.215
 orchestra-0.214
 Animals-0.211
Token do
Token ID466
Feature activation+2.05e+00

Pos logit contributions
 wouldn+0.116
 got+0.104
 would+0.103
 didn+0.095
 fell+0.093
Neg logit contributions
 circles-0.187
 films-0.187
 sym-0.179
umm-0.177
 amusement-0.177
Token anything
Token ID1997
Feature activation+2.17e+00

Pos logit contributions
 wouldn+0.046
 would+0.044
 got+0.043
 never+0.042
 didn+0.041
Neg logit contributions
 films-0.040
 circles-0.040
umm-0.038
 sym-0.037
andel-0.037
Token they
Token ID484
Feature activation+9.29e+00

Pos logit contributions
 wouldn+0.038
 would+0.035
 got+0.035
 never+0.034
 didn+0.032
Neg logit contributions
 films-0.053
 circles-0.053
umm-0.052
 sym-0.051
andel-0.051
Token set
Token ID900
Feature activation+1.91e+00

Pos logit contributions
 wouldn+0.117
 fell+0.098
 got+0.096
 never+0.096
 would+0.092
Neg logit contributions
 elderly-0.239
 circles-0.229
 films-0.229
 orchestra-0.227
 elephants-0.225
Token their
Token ID511
Feature activation-5.41e+00

Pos logit contributions
 wouldn+0.033
 would+0.031
 never+0.030
 got+0.030
 didn+0.028
Neg logit contributions
umm-0.048
 films-0.047
andel-0.047
 circles-0.046
 sym-0.046
Token minds
Token ID9017
Feature activation+3.25e+00

Pos logit contributions
umm+0.102
andel+0.098
 sym+0.093
 circles+0.092
opl+0.091
Neg logit contributions
 would-0.131
 wouldn-0.130
 never-0.127
 got-0.123
 didn-0.121
Token to
Token ID284
Feature activation+3.78e+00

Pos logit contributions
 wouldn+0.059
 would+0.055
 got+0.055
 never+0.053
 didn+0.052
Neg logit contributions
 circles-0.077
 films-0.076
 sym-0.073
umm-0.073
andel-0.073
Token.
Token ID13
Feature activation-1.14e+00

Pos logit contributions
 wouldn+0.059
 would+0.057
 never+0.056
 got+0.055
 didn+0.052
Neg logit contributions
umm-0.101
andel-0.099
 films-0.098
 circles-0.097
 sym-0.097
Token!",
Token ID40754
Feature activation-1.08e+00

Pos logit contributions
 wouldn+0.014
 would+0.013
 got+0.013
 never+0.013
 didn+0.012
Neg logit contributions
umm-0.019
 films-0.019
andel-0.019
 circles-0.019
 sym-0.018
Token he
Token ID339
Feature activation+8.70e+00

Pos logit contributions
umm+0.028
 films+0.027
andel+0.027
 sym+0.027
 circles+0.026
Neg logit contributions
 wouldn-0.018
 got-0.018
 would-0.017
 never-0.017
 didn-0.017
Token said
Token ID531
Feature activation+1.94e+00

Pos logit contributions
 wouldn+0.131
 would+0.122
 never+0.121
 got+0.118
 didn+0.109
Neg logit contributions
 circles-0.227
 films-0.225
umm-0.219
 sym-0.219
andel-0.218
Token.
Token ID13
Feature activation-1.52e+00

Pos logit contributions
 wouldn+0.038
 would+0.036
 got+0.035
 never+0.035
 didn+0.033
Neg logit contributions
 films-0.044
umm-0.043
 circles-0.043
andel-0.043
 sym-0.042
Token "
Token ID366
Feature activation-4.75e-01

Pos logit contributions
umm+0.041
 films+0.040
andel+0.039
 array+0.038
 sym+0.038
Neg logit contributions
 wouldn-0.023
 got-0.023
 would-0.022
 never-0.022
 could-0.021
TokenWhoever
Token ID47896
Feature activation+9.28e+00

Pos logit contributions
 circles+0.010
umm+0.010
 films+0.010
andel+0.010
 sym+0.010
Neg logit contributions
 wouldn-0.010
 would-0.009
 got-0.009
 never-0.009
 didn-0.008
Token finishes
Token ID20271
Feature activation+3.00e+00

Pos logit contributions
 wouldn+0.120
 never+0.111
 would+0.106
 got+0.103
 didn+0.103
Neg logit contributions
 circles-0.251
 films-0.250
 lions-0.245
 Wheels-0.239
 elephants-0.238
Token their
Token ID511
Feature activation-6.01e+00

Pos logit contributions
 wouldn+0.058
 would+0.056
 got+0.055
 never+0.055
 didn+0.052
Neg logit contributions
umm-0.068
 films-0.067
andel-0.066
 circles-0.065
 sym-0.065
Token writing
Token ID3597
Feature activation+1.03e-01

Pos logit contributions
umm+0.126
andel+0.126
 films+0.121
 sym+0.120
 circles+0.119
Neg logit contributions
 wouldn-0.128
 would-0.126
 got-0.121
 never-0.118
 didn-0.115
Token first
Token ID717
Feature activation-1.34e+00

Pos logit contributions
 wouldn+0.002
 would+0.002
 got+0.002
 never+0.002
 didn+0.002
Neg logit contributions
 films-0.002
 circles-0.002
umm-0.002
andel-0.002
 sym-0.002
Token wins
Token ID7864
Feature activation+3.86e+00

Pos logit contributions
umm+0.030
 films+0.030
 circles+0.030
andel+0.030
 sym+0.029
Neg logit contributions
 wouldn-0.026
 would-0.025
 got-0.024
 never-0.024
 didn-0.023
Token it
Token ID340
Feature activation+6.45e+00

Pos logit contributions
 wouldn+0.023
 would+0.022
 got+0.021
 never+0.020
 didn+0.019
Neg logit contributions
 films-0.044
 circles-0.043
umm-0.043
andel-0.042
 sym-0.042
Token."
Token ID526
Feature activation-1.68e+00

Pos logit contributions
 wouldn+0.128
 would+0.122
 got+0.121
 didn+0.116
 never+0.113
Neg logit contributions
 films-0.131
 circles-0.131
 elderly-0.129
umm-0.126
andel-0.124
Token Emma
Token ID18966
Feature activation+5.56e+00

Pos logit contributions
umm+0.039
 films+0.039
andel+0.038
 circles+0.038
 sym+0.037
Neg logit contributions
 wouldn-0.032
 would-0.031
 got-0.030
 never-0.030
 didn-0.029
Token understood
Token ID7247
Feature activation+4.01e+00

Pos logit contributions
 wouldn+0.080
 would+0.074
 never+0.070
 got+0.067
 fell+0.062
Neg logit contributions
 circles-0.143
 sym-0.138
 films-0.136
 elderly-0.135
 Wheels-0.133
Token and
Token ID290
Feature activation+5.52e+00

Pos logit contributions
 wouldn+0.090
 would+0.085
 got+0.085
 never+0.084
 managed+0.080
Neg logit contributions
 circles-0.075
 films-0.074
ement-0.069
 elderly-0.069
andel-0.069
Token they
Token ID484
Feature activation+9.28e+00

Pos logit contributions
use+0.071
 wouldn+0.070
 would+0.068
'd+0.059
w+0.058
Neg logit contributions
 elderly-0.132
 circles-0.128
 performers-0.121
 amaz-0.121
 monkeys-0.120
Token all
Token ID477
Feature activation+4.73e+00

Pos logit contributions
 wouldn+0.115
 would+0.107
 never+0.102
use+0.101
 got+0.098
Neg logit contributions
 circles-0.225
 amusement-0.222
 array-0.220
 lions-0.220
 sym-0.219
Token played
Token ID2826
Feature activation-8.38e-01

Pos logit contributions
 wouldn+0.055
 would+0.048
 never+0.046
 got+0.043
 didn+0.040
Neg logit contributions
 circles-0.137
 films-0.137
 sym-0.135
 lions-0.133
umm-0.132
Token together
Token ID1978
Feature activation+4.59e+00

Pos logit contributions
umm+0.022
andel+0.022
 films+0.021
 sym+0.021
 circles+0.021
Neg logit contributions
 wouldn-0.014
 would-0.013
 got-0.013
 never-0.013
 didn-0.012
Token happily
Token ID18177
Feature activation+4.46e+00

Pos logit contributions
 wouldn+0.075
 would+0.073
 never+0.072
 got+0.071
 didn+0.065
Neg logit contributions
umm-0.122
andel-0.119
 films-0.118
 sym-0.116
 Wheels-0.114
Token.
Token ID13
Feature activation-6.68e-01

Pos logit contributions
 wouldn+0.083
 would+0.079
 got+0.077
 never+0.076
 didn+0.073
Neg logit contributions
 films-0.104
 circles-0.103
umm-0.103
andel-0.101
 sym-0.100
Token question
Token ID1808
Feature activation+1.84e+00

Pos logit contributions
umm+0.067
 films+0.063
opl+0.063
andel+0.062
ement+0.062
Neg logit contributions
 would-0.074
 got-0.071
 wouldn-0.069
 never-0.067
 didn-0.065
Token,
Token ID11
Feature activation+2.76e+00

Pos logit contributions
 wouldn+0.032
 would+0.031
 never+0.029
 got+0.029
 didn+0.027
Neg logit contributions
umm-0.047
 films-0.046
andel-0.046
 sym-0.045
 circles-0.045
Token and
Token ID290
Feature activation+4.53e+00

Pos logit contributions
 wouldn+0.062
 would+0.059
 got+0.058
 never+0.058
 didn+0.055
Neg logit contributions
 films-0.054
 circles-0.053
umm-0.053
andel-0.052
 sym-0.051
Token she
Token ID673
Feature activation+8.09e+00

Pos logit contributions
 wouldn+0.067
 would+0.063
 never+0.060
 got+0.060
 didn+0.056
Neg logit contributions
 circles-0.119
 films-0.118
umm-0.115
 lions-0.114
 sym-0.114
Token courage
Token ID11917
Feature activation+3.69e+00

Pos logit contributions
 wouldn+0.106
 never+0.093
 would+0.093
 got+0.089
 fell+0.082
Neg logit contributions
 circles-0.210
 elderly-0.204
 sym-0.204
 lions-0.203
 films-0.201
Tokenously
Token ID3481
Feature activation+9.27e+00

Pos logit contributions
 wouldn+0.069
 would+0.066
 never+0.065
 got+0.064
 didn+0.061
Neg logit contributions
 circles-0.081
 films-0.081
umm-0.079
andel-0.079
 sym-0.078
Token answered
Token ID9373
Feature activation+7.74e-01

Pos logit contributions
 wouldn+0.137
 would+0.129
 never+0.124
 got+0.119
 didn+0.114
Neg logit contributions
 circles-0.234
 sym-0.229
 films-0.226
 array-0.226
andel-0.224
Token it
Token ID340
Feature activation+5.27e+00

Pos logit contributions
 wouldn+0.013
 would+0.012
 never+0.012
 got+0.011
 didn+0.011
Neg logit contributions
umm-0.021
 films-0.021
andel-0.020
 sym-0.020
 Wheels-0.020
Token.
Token ID13
Feature activation-1.41e+00

Pos logit contributions
 wouldn+0.100
 would+0.096
 got+0.092
 never+0.091
 didn+0.087
Neg logit contributions
umm-0.126
 films-0.125
andel-0.123
 circles-0.122
 sym-0.121
Token
Token ID220
Feature activation-1.38e+00

Pos logit contributions
 films+0.036
umm+0.035
andel+0.035
 orchestra+0.034
 sym+0.034
Neg logit contributions
 wouldn-0.023
 got-0.023
 would-0.023
 never-0.022
 could-0.022
Token
Token ID198
Feature activation-3.27e+00

Pos logit contributions
umm+0.032
 films+0.032
 circles+0.032
andel+0.032
 sym+0.031
Neg logit contributions
 wouldn-0.026
 would-0.025
 got-0.024
 never-0.024
 didn-0.023
Token started
Token ID2067
Feature activation+3.15e+00

Pos logit contributions
 wouldn+0.111
 would+0.102
 never+0.098
 got+0.093
 didn+0.086
Neg logit contributions
 circles-0.249
 lions-0.247
 films-0.247
 sym-0.242
 Wheels-0.240
Token it
Token ID340
Feature activation+6.46e+00

Pos logit contributions
 wouldn+0.042
 would+0.039
 never+0.038
 got+0.038
 didn+0.035
Neg logit contributions
umm-0.092
 films-0.091
andel-0.090
 circles-0.089
 sym-0.089
Token.
Token ID13
Feature activation-1.03e+00

Pos logit contributions
 wouldn+0.122
 would+0.116
 got+0.113
 never+0.112
 didn+0.107
Neg logit contributions
 films-0.147
 circles-0.147
umm-0.146
andel-0.143
 sym-0.143
Token He
Token ID679
Feature activation+7.67e+00

Pos logit contributions
andel+0.024
umm+0.024
 films+0.023
 sym+0.023
 circles+0.023
Neg logit contributions
 wouldn-0.020
 would-0.019
 never-0.018
 got-0.018
 didn-0.017
Token/
Token ID14
Feature activation-3.27e+00

Pos logit contributions
 wouldn+0.085
 would+0.077
 never+0.072
 got+0.072
 managed+0.064
Neg logit contributions
 circles-0.210
 lions-0.200
 sym-0.199
 films-0.198
 Wheels-0.198
Tokenshe
Token ID7091
Feature activation+9.27e+00

Pos logit contributions
 Wheels+0.053
 circles+0.053
 films+0.052
andel+0.052
opl+0.050
Neg logit contributions
 wouldn-0.077
 would-0.075
 never-0.073
 managed-0.068
 got-0.068
Token wants
Token ID3382
Feature activation+3.14e+00

Pos logit contributions
 wouldn+0.093
 would+0.083
 never+0.077
 got+0.073
 didn+0.068
Neg logit contributions
 circles-0.266
 lions-0.259
 sym-0.258
 monkeys-0.255
 Wheels-0.255
Token my
Token ID616
Feature activation-5.94e+00

Pos logit contributions
 wouldn+0.041
 would+0.038
 got+0.036
 never+0.035
 didn+0.033
Neg logit contributions
 films-0.091
 circles-0.091
umm-0.090
andel-0.089
 sym-0.088
Token car
Token ID1097
Feature activation+1.21e+00

Pos logit contributions
umm+0.117
andel+0.117
 sym+0.107
 films+0.106
opl+0.103
Neg logit contributions
 would-0.142
 wouldn-0.141
 got-0.133
 never-0.131
 didn-0.129
Token."
Token ID526
Feature activation-1.86e+00

Pos logit contributions
 wouldn+0.023
 would+0.022
 got+0.021
 never+0.021
 didn+0.021
Neg logit contributions
 circles-0.026
 films-0.026
umm-0.026
 lions-0.025
 sym-0.025
Token Mom
Token ID11254
Feature activation+2.97e+00

Pos logit contributions
umm+0.047
 films+0.046
andel+0.046
 sym+0.045
 circles+0.044
Neg logit contributions
 wouldn-0.032
 got-0.031
 would-0.031
 never-0.030
 didn-0.029
Token room
Token ID2119
Feature activation+2.85e+00

Pos logit contributions
andel+0.089
umm+0.088
pill+0.083
anes+0.081
ickets+0.081
Neg logit contributions
 never-0.084
 would-0.081
 got-0.079
 wouldn-0.077
 found-0.075
Token and
Token ID290
Feature activation+6.00e+00

Pos logit contributions
 wouldn+0.047
 would+0.045
 got+0.043
 never+0.043
 didn+0.040
Neg logit contributions
 films-0.072
umm-0.072
 circles-0.072
andel-0.071
 sym-0.070
Token used
Token ID973
Feature activation+3.37e+00

Pos logit contributions
 wouldn+0.076
 would+0.070
 never+0.065
 got+0.063
use+0.062
Neg logit contributions
 circles-0.162
 films-0.155
 orchestra-0.150
 amaz-0.149
 buildings-0.149
Token a
Token ID257
Feature activation-1.25e+00

Pos logit contributions
 wouldn+0.040
 would+0.037
 got+0.036
 never+0.035
 didn+0.033
Neg logit contributions
 circles-0.102
 films-0.100
umm-0.098
andel-0.098
 sym-0.097
Token screw
Token ID9580
Feature activation+1.03e+00

Pos logit contributions
umm+0.024
 films+0.023
andel+0.022
 circles+0.022
 lions+0.022
Neg logit contributions
 wouldn-0.030
 would-0.029
 got-0.029
 never-0.028
 didn-0.027
Tokendriver
Token ID26230
Feature activation+9.27e+00

Pos logit contributions
 wouldn+0.016
 would+0.015
 never+0.015
 got+0.015
 didn+0.014
Neg logit contributions
 circles-0.027
andel-0.026
umm-0.026
 films-0.026
 lions-0.025
Token to
Token ID284
Feature activation+4.12e+00

Pos logit contributions
 wouldn+0.144
 would+0.134
 never+0.128
 got+0.127
 didn+0.127
Neg logit contributions
 circles-0.239
 films-0.231
andel-0.229
umm-0.225
 lions-0.225
Token open
Token ID1280
Feature activation+2.85e-02

Pos logit contributions
 wouldn+0.070
 would+0.066
 got+0.064
 never+0.063
 didn+0.060
Neg logit contributions
 films-0.102
 circles-0.102
umm-0.101
andel-0.100
 sym-0.099
Token it
Token ID340
Feature activation+4.09e+00

Pos logit contributions
 wouldn+0.000
 would+0.000
 got+0.000
 never+0.000
 didn+0.000
Neg logit contributions
umm-0.001
 films-0.001
andel-0.001
 sym-0.001
 Wheels-0.001
Token.
Token ID13
Feature activation-1.44e+00

Pos logit contributions
 wouldn+0.080
 would+0.076
 got+0.075
 never+0.074
 didn+0.071
Neg logit contributions
 films-0.091
umm-0.091
 circles-0.091
andel-0.089
 sym-0.088
Token
Token ID198
Feature activation-3.08e+00

Pos logit contributions
 films+0.035
umm+0.035
andel+0.034
 sym+0.034
 orchestra+0.033
Neg logit contributions
 wouldn-0.026
 would-0.025
 never-0.025
 got-0.024
 didn-0.023
Token.
Token ID13
Feature activation-1.52e+00

Pos logit contributions
 would+0.009
 wouldn+0.009
 got+0.008
 never+0.008
 could+0.008
Neg logit contributions
umm-0.014
andel-0.014
 films-0.013
 sym-0.013
 array-0.013
Token The
Token ID383
Feature activation-8.92e+00

Pos logit contributions
umm+0.045
andel+0.044
opl+0.042
 array+0.041
 flock+0.040
Neg logit contributions
 never-0.022
 could-0.022
 would-0.022
 wouldn-0.021
 got-0.021
Token robot
Token ID9379
Feature activation+1.68e+00

Pos logit contributions
andel+0.193
umm+0.182
 sym+0.169
pill+0.166
ums+0.162
Neg logit contributions
 would-0.206
 had-0.198
 got-0.197
 never-0.197
 tried-0.193
Token does
Token ID857
Feature activation+2.81e+00

Pos logit contributions
 wouldn+0.032
 would+0.030
 got+0.030
 never+0.029
 didn+0.028
Neg logit contributions
 films-0.039
umm-0.039
 circles-0.038
andel-0.038
 sym-0.037
Token everything
Token ID2279
Feature activation+4.65e+00

Pos logit contributions
 wouldn+0.026
 would+0.023
 got+0.022
 never+0.020
 didn+0.019
Neg logit contributions
 films-0.092
 circles-0.092
 sym-0.089
andel-0.089
 Wheels-0.088
Token they
Token ID484
Feature activation+9.25e+00

Pos logit contributions
 wouldn+0.083
 would+0.079
 got+0.077
 never+0.076
 didn+0.072
Neg logit contributions
 films-0.112
umm-0.112
 circles-0.111
andel-0.110
 sym-0.108
Token ask
Token ID1265
Feature activation+3.50e+00

Pos logit contributions
 wouldn+0.141
 never+0.127
 got+0.125
 would+0.124
 fell+0.115
Neg logit contributions
 circles-0.236
 films-0.235
 sym-0.228
 lions-0.225
 orchestra-0.224
Token.
Token ID13
Feature activation-1.17e+00

Pos logit contributions
 wouldn+0.058
 would+0.056
 never+0.054
 got+0.052
 didn+0.051
Neg logit contributions
umm-0.098
 sym-0.092
andel-0.091
opl-0.090
 array-0.089
Token They
Token ID1119
Feature activation+7.51e+00

Pos logit contributions
umm+0.034
andel+0.033
 array+0.032
 flock+0.032
 sym+0.031
Neg logit contributions
 could-0.016
 never-0.016
 would-0.016
 wouldn-0.015
 got-0.015
Token have
Token ID423
Feature activation+3.75e+00

Pos logit contributions
 wouldn+0.130
 would+0.121
 got+0.119
 never+0.117
 didn+0.112
Neg logit contributions
 circles-0.183
 films-0.183
umm-0.176
 sym-0.176
andel-0.176
Token a
Token ID257
Feature activation-3.19e+00

Pos logit contributions
 wouldn+0.087
 would+0.086
 never+0.084
 got+0.084
 didn+0.080
Neg logit contributions
umm-0.073
andel-0.070
 films-0.070
 sym-0.068
 circles-0.067
Token bring
Token ID2222
Feature activation+1.39e+00

Pos logit contributions
 wouldn+0.075
 listened+0.063
 never+0.062
 got+0.061
 fell+0.059
Neg logit contributions
 sym-0.155
 circles-0.155
 amaz-0.147
 orchestra-0.144
 zoo-0.144
Token things
Token ID1243
Feature activation+4.29e+00

Pos logit contributions
 wouldn+0.026
 would+0.025
 got+0.025
 never+0.025
 didn+0.023
Neg logit contributions
umm-0.032
 films-0.032
 circles-0.031
andel-0.031
 sym-0.031
Token to
Token ID284
Feature activation+4.80e+00

Pos logit contributions
 wouldn+0.078
 would+0.072
 got+0.071
 didn+0.070
 never+0.069
Neg logit contributions
 circles-0.099
 films-0.098
umm-0.095
andel-0.095
 elderly-0.093
Token people
Token ID661
Feature activation+3.85e+00

Pos logit contributions
 never+0.076
 wouldn+0.074
 would+0.074
 got+0.072
 didn+0.068
Neg logit contributions
umm-0.129
 sym-0.123
andel-0.122
ement-0.121
 circles-0.119
Token that
Token ID326
Feature activation+2.67e+00

Pos logit contributions
 wouldn+0.070
 would+0.064
 got+0.063
 never+0.061
 didn+0.060
Neg logit contributions
 films-0.092
 circles-0.090
 sym-0.086
umm-0.086
 lions-0.086
Token they
Token ID484
Feature activation+9.24e+00

Pos logit contributions
 wouldn+0.040
 never+0.037
 would+0.036
 got+0.036
 managed+0.034
Neg logit contributions
 films-0.068
 circles-0.067
umm-0.066
 lions-0.065
 Animals-0.064
Token need
Token ID761
Feature activation+4.28e+00

Pos logit contributions
 wouldn+0.109
 fell+0.097
 got+0.094
 never+0.093
 managed+0.093
Neg logit contributions
 films-0.230
 elderly-0.223
 Animals-0.222
 circles-0.220
 elephants-0.219
Token."
Token ID526
Feature activation-1.90e+00

Pos logit contributions
 wouldn+0.075
 got+0.068
 would+0.068
 never+0.066
 didn+0.065
Neg logit contributions
 films-0.101
 circles-0.100
 lions-0.096
 sym-0.095
andel-0.094
Token
Token ID220
Feature activation-2.32e+00

Pos logit contributions
 films+0.056
umm+0.055
andel+0.054
 orchestra+0.053
 array+0.052
Neg logit contributions
 got-0.025
 wouldn-0.024
 never-0.023
 would-0.023
 didn-0.022
Token
Token ID198
Feature activation-4.03e+00

Pos logit contributions
umm+0.061
 films+0.060
andel+0.059
 circles+0.058
 sym+0.058
Neg logit contributions
 wouldn-0.038
 would-0.036
 never-0.035
 got-0.035
 didn-0.033
Token
Token ID198
Feature activation-4.24e+00

Pos logit contributions
umm+0.114
 films+0.112
andel+0.108
 array+0.107
 orchestra+0.106
Neg logit contributions
 never-0.058
 would-0.057
 got-0.056
 wouldn-0.055
 didn-0.051
Token can
Token ID460
Feature activation+6.88e+00

Pos logit contributions
 wouldn+0.081
 listened+0.069
 never+0.068
 didn+0.063
 got+0.063
Neg logit contributions
 circles-0.191
 sym-0.189
opl-0.177
 Wheels-0.176
 rhythm-0.176
Token hop
Token ID1725
Feature activation+1.65e+00

Pos logit contributions
 wouldn+0.140
 would+0.130
 got+0.128
 didn+0.123
 never+0.121
Neg logit contributions
 circles-0.150
 films-0.146
 sym-0.141
umm-0.140
 lions-0.139
Token around
Token ID1088
Feature activation+1.78e+00

Pos logit contributions
 wouldn+0.031
 would+0.028
 got+0.028
 never+0.027
 didn+0.027
Neg logit contributions
 circles-0.039
 films-0.038
andel-0.037
 sym-0.037
umm-0.036
Token it
Token ID340
Feature activation+4.03e+00

Pos logit contributions
 wouldn+0.026
 never+0.026
 would+0.026
 got+0.025
 didn+0.023
Neg logit contributions
umm-0.049
andel-0.047
 sym-0.046
 films-0.046
 array-0.045
Token.
Token ID13
Feature activation-1.46e+00

Pos logit contributions
 wouldn+0.077
 would+0.072
 got+0.071
 never+0.070
 didn+0.067
Neg logit contributions
 circles-0.092
 films-0.091
umm-0.089
 sym-0.088
andel-0.087
Token Whoever
Token ID37807
Feature activation+9.24e+00

Pos logit contributions
umm+0.035
andel+0.035
 films+0.034
 sym+0.034
 array+0.033
Neg logit contributions
 wouldn-0.026
 never-0.025
 would-0.025
 got-0.024
 could-0.023
Token finishes
Token ID20271
Feature activation+2.56e+00

Pos logit contributions
 wouldn+0.134
 never+0.121
 didn+0.113
 would+0.113
 got+0.112
Neg logit contributions
 circles-0.242
 lions-0.229
 films-0.227
 dancers-0.223
 Wheels-0.223
Token first
Token ID717
Feature activation+1.68e-01

Pos logit contributions
 wouldn+0.048
 would+0.046
 got+0.045
 never+0.044
 didn+0.042
Neg logit contributions
umm-0.060
 films-0.059
 circles-0.058
andel-0.058
 sym-0.057
Token,
Token ID11
Feature activation+2.12e+00

Pos logit contributions
 wouldn+0.003
 would+0.003
 got+0.003
 never+0.003
 didn+0.003
Neg logit contributions
 films-0.004
umm-0.004
 circles-0.004
andel-0.004
 sym-0.004
Token wins
Token ID7864
Feature activation+3.71e+00

Pos logit contributions
 wouldn+0.035
 would+0.033
 got+0.032
 never+0.032
 didn+0.030
Neg logit contributions
umm-0.054
 films-0.054
andel-0.053
 circles-0.053
 sym-0.052
Token!"
Token ID2474
Feature activation-2.08e+00

Pos logit contributions
 wouldn+0.066
 would+0.064
 got+0.062
 never+0.062
 didn+0.058
Neg logit contributions
umm-0.092
 films-0.091
andel-0.089
 sym-0.088
 circles-0.087
Token she
Token ID673
Feature activation+7.49e+00

Pos logit contributions
 wouldn+0.032
 would+0.031
 never+0.030
 got+0.030
 didn+0.028
Neg logit contributions
umm-0.049
andel-0.049
 films-0.048
 sym-0.048
 circles-0.046
Token got
Token ID1392
Feature activation+1.17e+00

Pos logit contributions
 wouldn+0.065
use+0.052
 would+0.052
wed+0.048
 got+0.044
Neg logit contributions
 elderly-0.207
 circles-0.206
 sym-0.205
 orchestra-0.203
 dancers-0.201
Token rid
Token ID5755
Feature activation-4.46e+00

Pos logit contributions
 wouldn+0.019
 would+0.018
 got+0.017
 never+0.016
 didn+0.016
Neg logit contributions
 circles-0.030
 films-0.030
andel-0.029
umm-0.029
 sym-0.029
Token of
Token ID286
Feature activation-1.47e+00

Pos logit contributions
umm+0.079
 films+0.078
 sym+0.076
 circles+0.075
andel+0.074
Neg logit contributions
 wouldn-0.110
 got-0.106
 would-0.106
 never-0.103
 didn-0.100
Token the
Token ID262
Feature activation-1.05e+01

Pos logit contributions
umm+0.041
 sym+0.040
andel+0.039
 films+0.038
 array+0.038
Neg logit contributions
 never-0.023
 wouldn-0.023
 would-0.022
 got-0.021
 didn-0.020
Token intr
Token ID9913
Feature activation-1.81e+01

Pos logit contributions
umm+0.169
 films+0.167
andel+0.164
 sym+0.163
 circles+0.161
Neg logit contributions
 wouldn-0.273
 would-0.270
 never-0.262
 got-0.262
 didn-0.252
Tokenuders
Token ID48739
Feature activation+1.08e+00

Pos logit contributions
 circles+0.250
 sym+0.249
andel+0.248
 films+0.246
umm+0.246
Neg logit contributions
 wouldn-0.481
 never-0.477
 got-0.463
 would-0.463
 didn-0.450
Token.
Token ID13
Feature activation-1.08e+00

Pos logit contributions
 wouldn+0.020
 would+0.019
 got+0.018
 never+0.018
 didn+0.017
Neg logit contributions
umm-0.026
 films-0.026
 circles-0.025
andel-0.025
 sym-0.025
Token She
Token ID1375
Feature activation+7.57e+00

Pos logit contributions
 films+0.028
umm+0.027
andel+0.027
 sym+0.026
 array+0.026
Neg logit contributions
 never-0.018
 wouldn-0.018
 would-0.017
 got-0.017
 could-0.017
Token did
Token ID750
Feature activation+4.98e+00

Pos logit contributions
 wouldn+0.088
 would+0.073
use+0.071
 got+0.071
 never+0.064
Neg logit contributions
 circles-0.202
 elderly-0.199
 sym-0.196
 lions-0.195
 films-0.193
Token not
Token ID407
Feature activation+5.10e+00

Pos logit contributions
 wouldn+0.053
 would+0.049
 got+0.046
 never+0.041
 didn+0.039
Neg logit contributions
 circles-0.155
 films-0.153
 sym-0.148
andel-0.147
 lions-0.147
Token,
Token ID11
Feature activation+8.34e-01

Pos logit contributions
 wouldn+0.020
 would+0.019
 got+0.019
 never+0.018
 didn+0.017
Neg logit contributions
umm-0.027
 films-0.027
andel-0.026
 circles-0.026
 sym-0.026
Token he
Token ID339
Feature activation+6.62e+00

Pos logit contributions
 wouldn+0.011
 got+0.011
 would+0.010
 never+0.010
 didn+0.009
Neg logit contributions
umm-0.025
 sym-0.024
andel-0.024
 films-0.024
opl-0.024
Token noticed
Token ID6810
Feature activation-1.59e+00

Pos logit contributions
 wouldn+0.072
 would+0.062
 never+0.058
 got+0.058
 didn+0.051
Neg logit contributions
 circles-0.196
 sym-0.188
 films-0.186
 lions-0.186
 monkeys-0.182
Token that
Token ID326
Feature activation+5.84e-01

Pos logit contributions
umm+0.045
andel+0.042
 sym+0.042
opl+0.042
 films+0.041
Neg logit contributions
 wouldn-0.025
 would-0.025
 never-0.025
 got-0.025
 didn-0.023
Token the
Token ID262
Feature activation-1.09e+01

Pos logit contributions
 wouldn+0.010
 would+0.010
 got+0.010
 never+0.010
 didn+0.009
Neg logit contributions
umm-0.015
andel-0.015
 sym-0.015
opl-0.014
 films-0.014
Token Sm
Token ID2439
Feature activation-1.74e+01

Pos logit contributions
umm+0.183
andel+0.178
 sym+0.177
 films+0.173
 circles+0.169
Neg logit contributions
 never-0.276
 would-0.274
 wouldn-0.271
 got-0.270
 didn-0.256
Tokenelly
Token ID6148
Feature activation-7.34e+00

Pos logit contributions
 films+0.311
 circles+0.310
umm+0.308
andel+0.301
 sym+0.300
Neg logit contributions
 wouldn-0.415
 would-0.400
 got-0.394
 never-0.391
 didn-0.375
Token King
Token ID2677
Feature activation-1.66e+00

Pos logit contributions
umm+0.103
 films+0.101
 circles+0.100
andel+0.100
 sym+0.098
Neg logit contributions
 wouldn-0.205
 would-0.200
 got-0.198
 never-0.195
 didn-0.190
Token was
Token ID373
Feature activation+1.22e+00

Pos logit contributions
umm+0.054
 films+0.054
 circles+0.054
andel+0.054
 sym+0.053
Neg logit contributions
 wouldn-0.016
 would-0.014
 got-0.014
 never-0.013
 didn-0.012
Token cruel
Token ID12177
Feature activation-4.35e+00

Pos logit contributions
 never+0.023
 wouldn+0.023
 got+0.022
 would+0.022
 didn+0.021
Neg logit contributions
umm-0.030
andel-0.028
 films-0.028
 sym-0.028
 circles-0.027
Token and
Token ID290
Feature activation+3.63e+00

Pos logit contributions
umm+0.110
 films+0.109
andel+0.108
 circles+0.108
 sym+0.107
Neg logit contributions
 wouldn-0.074
 would-0.071
 got-0.068
 never-0.067
 didn-0.064
Token try
Token ID1949
Feature activation+5.62e+00

Pos logit contributions
 wouldn+0.123
 would+0.116
 never+0.112
 got+0.110
 didn+0.109
Neg logit contributions
 circles-0.121
 films-0.114
 Wheels-0.108
 squares-0.108
 sym-0.107
Token to
Token ID284
Feature activation+4.51e+00

Pos logit contributions
 wouldn+0.070
 would+0.065
 got+0.063
 never+0.062
 didn+0.057
Neg logit contributions
umm-0.167
 films-0.166
 circles-0.164
andel-0.164
 sym-0.162
Token grab
Token ID5552
Feature activation+3.91e-01

Pos logit contributions
 wouldn+0.084
 would+0.078
 got+0.074
 never+0.073
 didn+0.072
Neg logit contributions
 circles-0.107
 films-0.103
andel-0.102
umm-0.101
 sym-0.100
Token the
Token ID262
Feature activation-1.12e+01

Pos logit contributions
 wouldn+0.006
 would+0.006
 got+0.006
 never+0.006
 didn+0.005
Neg logit contributions
umm-0.010
 sym-0.010
 films-0.010
andel-0.010
 orchestra-0.010
Token av
Token ID1196
Feature activation-8.86e+00

Pos logit contributions
 films+0.196
umm+0.191
 circles+0.190
 sym+0.190
andel+0.188
Neg logit contributions
 wouldn-0.278
 would-0.270
 got-0.264
 never-0.259
 didn-0.251
Tokenoc
Token ID420
Feature activation-1.62e+01

Pos logit contributions
umm+0.147
 circles+0.146
andel+0.142
 films+0.140
 sym+0.136
Neg logit contributions
 wouldn-0.229
 would-0.217
 never-0.212
 got-0.211
 didn-0.209
Tokenados
Token ID22484
Feature activation+1.56e+00

Pos logit contributions
 films+0.200
 circles+0.196
umm+0.191
 sym+0.189
andel+0.188
Neg logit contributions
 wouldn-0.480
 would-0.464
 got-0.459
 never-0.456
 didn-0.441
Token.
Token ID13
Feature activation-1.55e+00

Pos logit contributions
 wouldn+0.030
 would+0.029
 got+0.028
 never+0.028
 didn+0.027
Neg logit contributions
 films-0.035
umm-0.035
 circles-0.034
andel-0.034
 sym-0.034
Token But
Token ID887
Feature activation+1.11e+00

Pos logit contributions
umm+0.040
 orchestra+0.039
 sym+0.038
opl+0.038
 films+0.038
Neg logit contributions
 wouldn-0.026
 would-0.025
 never-0.025
 got-0.025
 could-0.024
Token the
Token ID262
Feature activation-8.54e+00

Pos logit contributions
 wouldn+0.020
 would+0.019
 got+0.018
 never+0.018
'd+0.018
Neg logit contributions
 circles-0.024
 lions-0.024
 films-0.024
 monkeys-0.023
 elephants-0.023
Token rock
Token ID3881
Feature activation+1.65e+00

Pos logit contributions
umm+0.128
 films+0.126
 circles+0.125
andel+0.121
 sym+0.118
Neg logit contributions
 wouldn-0.232
 would-0.223
 got-0.220
 never-0.219
 didn-0.212
Token box
Token ID3091
Feature activation+1.81e+00

Pos logit contributions
 sym+0.215
umm+0.214
 films+0.212
andel+0.208
opl+0.199
Neg logit contributions
 would-0.213
 wouldn-0.209
 got-0.204
 never-0.193
 will-0.190
Token and
Token ID290
Feature activation+5.38e+00

Pos logit contributions
 wouldn+0.028
 would+0.027
 got+0.026
 never+0.026
 didn+0.024
Neg logit contributions
umm-0.048
 films-0.048
andel-0.047
 sym-0.047
 circles-0.046
Token share
Token ID2648
Feature activation+1.56e+00

Pos logit contributions
 wouldn+0.088
 would+0.080
 never+0.079
 didn+0.075
 got+0.072
Neg logit contributions
 circles-0.139
 films-0.132
 amaz-0.130
 amusement-0.128
 squares-0.127
Token the
Token ID262
Feature activation-1.08e+01

Pos logit contributions
 got+0.022
 would+0.022
 never+0.021
 wouldn+0.020
 had+0.019
Neg logit contributions
umm-0.049
andel-0.045
 sym-0.045
opl-0.044
 orchestra-0.043
Token ch
Token ID442
Feature activation-1.35e+01

Pos logit contributions
andel+0.209
umm+0.205
 films+0.201
 sym+0.198
 circles+0.189
Neg logit contributions
 would-0.249
 wouldn-0.246
 got-0.239
 never-0.235
 didn-0.227
Tokenocol
Token ID4668
Feature activation-1.60e+01

Pos logit contributions
 films+0.117
umm+0.115
 circles+0.113
andel+0.109
 sym+0.106
Neg logit contributions
 wouldn-0.455
 would-0.442
 got-0.437
 never-0.434
 didn-0.423
Tokenates
Token ID689
Feature activation+1.59e-01

Pos logit contributions
 circles+0.128
 films+0.126
umm+0.124
andel+0.123
 lions+0.109
Neg logit contributions
 wouldn-0.552
 would-0.532
 got-0.524
 never-0.519
 didn-0.517
Token.
Token ID13
Feature activation-1.73e+00

Pos logit contributions
 wouldn+0.003
 would+0.003
 never+0.002
 got+0.002
 could+0.002
Neg logit contributions
umm-0.004
 sym-0.004
andel-0.004
 films-0.004
 orchestra-0.004
Token They
Token ID1119
Feature activation+7.42e+00

Pos logit contributions
umm+0.047
opl+0.046
andel+0.045
 orchestra+0.044
ement+0.044
Neg logit contributions
 would-0.027
 got-0.027
 wouldn-0.026
 could-0.026
 never-0.026
Token are
Token ID389
Feature activation+1.25e+00

Pos logit contributions
 wouldn+0.124
 would+0.111
 got+0.107
 never+0.105
 didn+0.104
Neg logit contributions
 circles-0.186
 films-0.179
 Wheels-0.174
 lions-0.173
 squares-0.173
Token happy
Token ID3772
Feature activation-1.13e+00

Pos logit contributions
 never+0.022
 wouldn+0.021
 would+0.021
 got+0.021
 didn+0.019
Neg logit contributions
umm-0.034
andel-0.033
 sym-0.031
 films-0.031
ement-0.030
Token to
Token ID284
Feature activation+3.81e+00

Pos logit contributions
 would+0.022
 wouldn+0.022
 never+0.022
 got+0.021
 didn+0.017
Neg logit contributions
umm-0.080
andel-0.078
 sym-0.078
opl-0.077
 array-0.076
Token see
Token ID766
Feature activation+1.56e+00

Pos logit contributions
 wouldn+0.056
 would+0.052
 got+0.050
 never+0.048
 didn+0.047
Neg logit contributions
 films-0.104
 circles-0.103
umm-0.102
 sym-0.101
andel-0.101
Token the
Token ID262
Feature activation-1.11e+01

Pos logit contributions
 wouldn+0.025
 would+0.024
 never+0.024
 got+0.023
 didn+0.021
Neg logit contributions
umm-0.043
andel-0.041
 sym-0.040
 circles-0.039
 array-0.039
Token rh
Token ID9529
Feature activation-1.34e+01

Pos logit contributions
igsaw+0.192
andel+0.178
 sym+0.174
ums+0.172
ement+0.171
Neg logit contributions
 never-0.285
 would-0.279
 got-0.265
 will-0.264
 wouldn-0.263
Tokenin
Token ID259
Feature activation-6.11e+00

Pos logit contributions
 films+0.300
umm+0.299
 circles+0.297
andel+0.293
 sym+0.290
Neg logit contributions
 wouldn-0.263
 would-0.252
 got-0.246
 never-0.243
 didn-0.232
Tokenoc
Token ID420
Feature activation-1.59e+01

Pos logit contributions
umm+0.118
 films+0.117
 circles+0.116
andel+0.116
 sym+0.114
Neg logit contributions
 wouldn-0.139
 would-0.135
 got-0.133
 never-0.131
 didn-0.126
Tokeneros
Token ID27498
Feature activation-7.23e-01

Pos logit contributions
 circles+0.177
 films+0.175
 sym+0.172
andel+0.168
 orchestra+0.157
Neg logit contributions
 wouldn-0.491
 never-0.471
 would-0.470
 got-0.468
 didn-0.449
Token anymore
Token ID7471
Feature activation+1.43e+00

Pos logit contributions
umm+0.017
 films+0.017
 circles+0.017
andel+0.016
 sym+0.016
Neg logit contributions
 wouldn-0.014
 would-0.013
 got-0.013
 never-0.013
 didn-0.012
Token.
Token ID13
Feature activation-1.87e+00

Pos logit contributions
 wouldn+0.029
 would+0.027
 got+0.027
 never+0.027
 didn+0.025
Neg logit contributions
 films-0.031
umm-0.031
 circles-0.031
andel-0.031
 sym-0.030
Token He
Token ID679
Feature activation+6.58e+00

Pos logit contributions
andel+0.044
umm+0.044
 circles+0.043
 films+0.043
 sym+0.043
Neg logit contributions
 wouldn-0.034
 would-0.033
 never-0.032
 got-0.032
 could-0.031
Token is
Token ID318
Feature activation+1.85e+00

Pos logit contributions
 wouldn+0.090
 got+0.079
 managed+0.079
 would+0.078
 never+0.078
Neg logit contributions
 lions-0.159
 films-0.156
 circles-0.154
 Wheels-0.152
 monkeys-0.152
Token and
Token ID290
Feature activation+4.81e+00

Pos logit contributions
 wouldn+0.027
 would+0.025
 got+0.024
 didn+0.024
 never+0.024
Neg logit contributions
 circles-0.033
 sym-0.031
andel-0.031
 films-0.030
umm-0.030
Token wrapped
Token ID12908
Feature activation-2.44e+00

Pos logit contributions
 wouldn+0.071
 would+0.065
 never+0.060
 got+0.060
 didn+0.059
Neg logit contributions
 circles-0.129
 monkeys-0.122
 lions-0.122
 elephants-0.121
 films-0.119
Token them
Token ID606
Feature activation+4.77e+00

Pos logit contributions
 films+0.060
umm+0.060
 circles+0.060
andel+0.059
 sym+0.058
Neg logit contributions
 wouldn-0.042
 would-0.040
 got-0.039
 never-0.039
 didn-0.037
Token around
Token ID1088
Feature activation+7.37e-01

Pos logit contributions
 wouldn+0.091
 would+0.088
 got+0.085
 didn+0.081
 forgot+0.081
Neg logit contributions
 circles-0.103
 squares-0.093
 films-0.093
 lions-0.092
 sym-0.092
Token the
Token ID262
Feature activation-1.18e+01

Pos logit contributions
 wouldn+0.013
 would+0.012
 never+0.012
 got+0.012
 didn+0.011
Neg logit contributions
umm-0.019
andel-0.018
 sym-0.018
 films-0.018
 circles-0.018
Token gr
Token ID1036
Feature activation-1.59e+01

Pos logit contributions
 films+0.216
umm+0.213
andel+0.209
 sym+0.207
opl+0.197
Neg logit contributions
 wouldn-0.288
 would-0.285
 never-0.282
 got-0.281
 didn-0.266
Tokenumpy
Token ID32152
Feature activation-1.09e+01

Pos logit contributions
 films+0.238
 circles+0.234
umm+0.231
 sym+0.227
andel+0.218
Neg logit contributions
 wouldn-0.425
 got-0.411
 would-0.410
 never-0.409
 didn-0.392
Token man
Token ID582
Feature activation+1.23e+00

Pos logit contributions
 films+0.197
umm+0.195
 sym+0.190
 circles+0.190
andel+0.186
Neg logit contributions
 wouldn-0.266
 got-0.259
 would-0.257
 never-0.250
 didn-0.244
Token.
Token ID13
Feature activation-1.75e+00

Pos logit contributions
 wouldn+0.022
 would+0.021
 got+0.020
 never+0.020
 didn+0.019
Neg logit contributions
 films-0.030
umm-0.030
 circles-0.029
andel-0.029
 sym-0.029
Token It
Token ID632
Feature activation+6.29e+00

Pos logit contributions
umm+0.045
andel+0.044
 sym+0.043
 array+0.043
 films+0.043
Neg logit contributions
 wouldn-0.028
 never-0.028
 would-0.028
 got-0.027
 could-0.026
Token said
Token ID531
Feature activation+9.35e-01

Pos logit contributions
 wouldn+0.071
 would+0.062
 got+0.058
 never+0.057
use+0.056
Neg logit contributions
 circles-0.181
 monkeys-0.173
 lions-0.173
 sym-0.172
 elephants-0.169
Token a
Token ID257
Feature activation-5.00e+00

Pos logit contributions
umm+0.050
andel+0.049
 films+0.049
 sym+0.049
 Wheels+0.047
Neg logit contributions
 wouldn-0.035
 never-0.034
 would-0.034
 got-0.033
 didn-0.031
Token big
Token ID1263
Feature activation-8.85e+00

Pos logit contributions
umm+0.099
 films+0.098
andel+0.094
 circles+0.093
 Wheels+0.090
Neg logit contributions
 would-0.115
 wouldn-0.115
 got-0.113
 never-0.109
 didn-0.104
Token box
Token ID3091
Feature activation-4.66e-01

Pos logit contributions
umm+0.200
 films+0.199
andel+0.193
 circles+0.192
 sym+0.190
Neg logit contributions
 wouldn-0.172
 would-0.168
 got-0.167
 never-0.160
 didn-0.154
Token of
Token ID286
Feature activation-6.79e+00

Pos logit contributions
 films+0.010
umm+0.010
andel+0.010
 circles+0.010
 sym+0.010
Neg logit contributions
 wouldn-0.010
 would-0.010
 got-0.009
 never-0.009
 fell-0.009
Token ch
Token ID442
Feature activation-1.09e+01

Pos logit contributions
umm+0.132
andel+0.127
 sym+0.120
ums+0.120
opl+0.120
Neg logit contributions
 would-0.160
 wouldn-0.155
 got-0.154
 never-0.149
 tried-0.147
Tokenocol
Token ID4668
Feature activation-1.58e+01

Pos logit contributions
 sym+0.126
umm+0.126
 films+0.125
 lions+0.116
 Wheels+0.116
Neg logit contributions
 wouldn-0.317
 got-0.312
 never-0.302
 would-0.301
 fell-0.296
Tokenates
Token ID689
Feature activation-8.98e-01

Pos logit contributions
 films+0.150
umm+0.148
 circles+0.146
andel+0.141
 sym+0.138
Neg logit contributions
 wouldn-0.512
 would-0.498
 got-0.491
 never-0.488
 didn-0.475
Token!
Token ID0
Feature activation-1.65e+00

Pos logit contributions
 films+0.020
umm+0.020
 circles+0.020
andel+0.020
 sym+0.019
Neg logit contributions
 wouldn-0.017
 would-0.017
 got-0.016
 never-0.016
 didn-0.015
Token<|endoftext|>
Token ID50256
Feature activation-5.51e+00

Pos logit contributions
 films+0.043
umm+0.042
andel+0.042
 circles+0.041
 sym+0.041
Neg logit contributions
 wouldn-0.026
 got-0.025
 would-0.025
 never-0.025
 could-0.024
TokenOnce
Token ID7454
Feature activation-2.85e+00

Pos logit contributions
umm+0.117
 films+0.117
andel+0.116
 sym+0.112
 orchestra+0.111
Neg logit contributions
 wouldn-0.113
 would-0.113
 got-0.111
 never-0.107
 didn-0.103
Token upon
Token ID2402
Feature activation-2.07e+00

Pos logit contributions
 circles+0.055
 films+0.053
andel+0.053
ement+0.053
 elderly+0.052
Neg logit contributions
 wouldn-0.060
 never-0.055
 got-0.054
 would-0.054
 listened-0.053
Token food
Token ID2057
Feature activation-3.87e-01

Pos logit contributions
 wouldn+0.025
 would+0.023
 got+0.023
 never+0.022
 didn+0.022
Neg logit contributions
 films-0.024
 circles-0.023
umm-0.023
andel-0.023
 lions-0.022
Token for
Token ID329
Feature activation-9.53e-01

Pos logit contributions
 films+0.010
 circles+0.010
umm+0.010
andel+0.010
 lions+0.010
Neg logit contributions
 wouldn-0.006
 would-0.005
 got-0.005
 never-0.005
 didn-0.005
Token the
Token ID262
Feature activation-1.05e+01

Pos logit contributions
umm+0.022
 films+0.022
 circles+0.022
 sym+0.021
andel+0.021
Neg logit contributions
 wouldn-0.018
 would-0.017
 never-0.017
 got-0.017
 didn-0.016
Token rh
Token ID9529
Feature activation-1.24e+01

Pos logit contributions
andel+0.145
 films+0.144
 circles+0.144
 sym+0.143
umm+0.140
Neg logit contributions
 would-0.294
 wouldn-0.294
 never-0.292
 got-0.289
 didn-0.279
Tokenin
Token ID259
Feature activation-5.30e+00

Pos logit contributions
 films+0.261
umm+0.260
 circles+0.257
andel+0.254
 sym+0.249
Neg logit contributions
 wouldn-0.259
 would-0.246
 got-0.241
 never-0.238
 didn-0.230
Tokenoc
Token ID420
Feature activation-1.57e+01

Pos logit contributions
 films+0.094
umm+0.093
 circles+0.092
andel+0.090
 sym+0.089
Neg logit contributions
 wouldn-0.129
 would-0.124
 got-0.121
 never-0.121
 didn-0.117
Tokeneros
Token ID27498
Feature activation-2.85e-01

Pos logit contributions
 films+0.171
 circles+0.170
 sym+0.167
andel+0.157
 orchestra+0.156
Neg logit contributions
 wouldn-0.487
 never-0.470
 would-0.468
 got-0.466
 didn-0.447
Token and
Token ID290
Feature activation+4.13e+00

Pos logit contributions
 films+0.006
 circles+0.006
umm+0.006
andel+0.006
 lions+0.006
Neg logit contributions
 wouldn-0.006
 would-0.005
 got-0.005
 never-0.005
 didn-0.005
Token his
Token ID465
Feature activation-3.92e+00

Pos logit contributions
 wouldn+0.073
 got+0.064
 would+0.063
 fell+0.061
t+0.061
Neg logit contributions
 lions-0.093
 films-0.092
 elephants-0.092
 monkeys-0.092
 circles-0.090
Token friends
Token ID2460
Feature activation+2.43e+00

Pos logit contributions
andel+0.058
 sym+0.057
 films+0.057
 circles+0.056
umm+0.056
Neg logit contributions
 wouldn-0.108
 would-0.108
 never-0.105
 got-0.103
 didn-0.101
Token.
Token ID13
Feature activation-1.23e+00

Pos logit contributions
 wouldn+0.052
 got+0.047
 never+0.047
 would+0.047
 didn+0.046
Neg logit contributions
 films-0.047
 circles-0.047
 lions-0.045
 sym-0.045
umm-0.045
Token had
Token ID550
Feature activation+5.83e+00

Pos logit contributions
use+0.094
 wouldn+0.071
wed+0.064
ws+0.063
w+0.061
Neg logit contributions
 lions-0.182
 amusement-0.179
 monkeys-0.170
aws-0.170
 unison-0.168
Token never
Token ID1239
Feature activation+6.61e+00

Pos logit contributions
 wouldn+0.074
 would+0.064
 got+0.058
 didn+0.056
 fell+0.055
Neg logit contributions
 lions-0.156
 circles-0.153
 amusement-0.152
 elderly-0.152
 monkeys-0.150
Token seen
Token ID1775
Feature activation+1.90e+00

Pos logit contributions
 wouldn+0.087
 would+0.072
 never+0.072
 didn+0.062
 don+0.058
Neg logit contributions
 circles-0.184
 elderly-0.179
 amusement-0.179
 lions-0.177
 monkeys-0.174
Token the
Token ID262
Feature activation-1.06e+01

Pos logit contributions
 wouldn+0.032
 would+0.031
 never+0.031
 got+0.030
 didn+0.028
Neg logit contributions
umm-0.048
 sym-0.048
andel-0.047
 films-0.046
 orchestra-0.046
Token av
Token ID1196
Feature activation-1.05e+01

Pos logit contributions
andel+0.188
 circles+0.187
 films+0.186
 sym+0.185
umm+0.180
Neg logit contributions
 wouldn-0.263
 would-0.262
 never-0.254
 got-0.250
 didn-0.237
Tokenoc
Token ID420
Feature activation-1.57e+01

Pos logit contributions
umm+0.170
 circles+0.161
 films+0.157
andel+0.157
opl+0.155
Neg logit contributions
 wouldn-0.279
 would-0.265
 never-0.259
 got-0.258
 didn-0.255
Tokenados
Token ID22484
Feature activation+6.78e-01

Pos logit contributions
 films+0.187
 circles+0.183
umm+0.182
andel+0.177
 sym+0.175
Neg logit contributions
 wouldn-0.472
 would-0.457
 got-0.452
 never-0.449
 didn-0.435
Token.
Token ID13
Feature activation-1.34e+00

Pos logit contributions
 wouldn+0.013
 would+0.012
 got+0.012
 never+0.012
 didn+0.012
Neg logit contributions
 circles-0.015
umm-0.015
 films-0.014
 lions-0.014
andel-0.014
Token They
Token ID1119
Feature activation+7.43e+00

Pos logit contributions
umm+0.033
andel+0.033
 circles+0.032
 films+0.032
 sym+0.031
Neg logit contributions
 wouldn-0.024
 never-0.023
 would-0.023
 got-0.022
 didn-0.022
Token wished
Token ID16555
Feature activation+1.68e+00

Pos logit contributions
 wouldn+0.102
 would+0.093
use+0.091
t+0.084
 got+0.081
Neg logit contributions
 circles-0.173
 amusement-0.170
 unison-0.168
 elderly-0.168
 lions-0.167
Token they
Token ID484
Feature activation+8.04e+00

Pos logit contributions
 wouldn+0.019
 would+0.018
 got+0.017
 never+0.017
 didn+0.015
Neg logit contributions
 films-0.051
umm-0.051
 circles-0.051
andel-0.050
 sym-0.050
Token have
Token ID423
Feature activation+3.17e+00

Pos logit contributions
 wouldn+0.039
 would+0.035
 never+0.034
 got+0.034
 didn+0.032
Neg logit contributions
 films-0.061
 circles-0.061
 lions-0.059
 sym-0.058
 orchestra-0.058
Token stopped
Token ID5025
Feature activation-8.91e-02

Pos logit contributions
 wouldn+0.060
 would+0.059
 got+0.058
 never+0.058
 didn+0.053
Neg logit contributions
umm-0.075
andel-0.073
 films-0.072
 sym-0.071
 circles-0.071
Token the
Token ID262
Feature activation-1.04e+01

Pos logit contributions
umm+0.002
 films+0.002
 circles+0.002
andel+0.002
 sym+0.002
Neg logit contributions
 wouldn-0.002
 would-0.002
 got-0.002
 never-0.002
 didn-0.001
Token rh
Token ID9529
Feature activation-1.31e+01

Pos logit contributions
ement+0.159
andel+0.157
ums+0.155
 sym+0.150
umm+0.149
Neg logit contributions
 would-0.276
 never-0.275
 wouldn-0.269
 got-0.267
 had-0.255
Tokenin
Token ID259
Feature activation-6.31e+00

Pos logit contributions
 films+0.296
umm+0.296
 circles+0.295
 sym+0.289
andel+0.289
Neg logit contributions
 wouldn-0.252
 would-0.242
 got-0.238
 never-0.234
 didn-0.222
Tokenoc
Token ID420
Feature activation-1.57e+01

Pos logit contributions
umm+0.126
 films+0.125
 circles+0.125
andel+0.124
 sym+0.121
Neg logit contributions
 wouldn-0.139
 would-0.136
 got-0.135
 never-0.132
 didn-0.126
Tokeneros
Token ID27498
Feature activation-1.50e+00

Pos logit contributions
 circles+0.177
 films+0.176
 sym+0.173
andel+0.164
 orchestra+0.158
Neg logit contributions
 wouldn-0.479
 got-0.459
 would-0.457
 never-0.457
 didn-0.437
Token.
Token ID13
Feature activation-1.74e+00

Pos logit contributions
umm+0.035
andel+0.035
 films+0.035
 circles+0.034
 sym+0.034
Neg logit contributions
 wouldn-0.028
 would-0.027
 got-0.026
 never-0.026
 didn-0.025
Token He
Token ID679
Feature activation+6.75e+00

Pos logit contributions
andel+0.041
umm+0.041
 sym+0.040
 circles+0.040
 films+0.040
Neg logit contributions
 wouldn-0.031
 would-0.030
 got-0.030
 never-0.030
 could-0.029
Token is
Token ID318
Feature activation+2.04e+00

Pos logit contributions
 wouldn+0.099
 would+0.087
 never+0.085
 got+0.085
 managed+0.080
Neg logit contributions
 films-0.173
 lions-0.171
 circles-0.170
 elephants-0.166
 Wheels-0.165
Token sleeping
Token ID11029
Feature activation-2.10e+00

Pos logit contributions
 wouldn+0.035
 never+0.035
 got+0.033
 would+0.033
 didn+0.031
Neg logit contributions
umm-0.054
andel-0.051
 sym-0.050
 films-0.050
 circles-0.049
Token her
Token ID607
Feature activation-2.98e+00

Pos logit contributions
 films+0.043
 circles+0.043
 elderly+0.038
 lions+0.038
andel+0.037
Neg logit contributions
 wouldn-0.051
 never-0.047
 forgot-0.046
'd-0.046
 would-0.046
Token a
Token ID257
Feature activation-4.08e+00

Pos logit contributions
umm+0.084
 sym+0.078
opl+0.077
andel+0.076
 films+0.074
Neg logit contributions
 got-0.048
 wouldn-0.047
 would-0.047
 never-0.045
 didn-0.043
Token box
Token ID3091
Feature activation+4.05e-01

Pos logit contributions
umm+0.075
 circles+0.069
 films+0.069
opl+0.068
andel+0.067
Neg logit contributions
 would-0.103
 wouldn-0.102
 got-0.101
 never-0.094
 didn-0.094
Token of
Token ID286
Feature activation-6.86e+00

Pos logit contributions
 wouldn+0.008
 would+0.008
 got+0.008
 never+0.008
 didn+0.008
Neg logit contributions
 films-0.009
umm-0.008
 circles-0.008
andel-0.008
 sym-0.008
Token ch
Token ID442
Feature activation-1.06e+01

Pos logit contributions
umm+0.147
opl+0.135
 sym+0.132
andel+0.126
ement+0.123
Neg logit contributions
 would-0.159
 got-0.154
 wouldn-0.152
 tried-0.149
 didn-0.146
Tokenocol
Token ID4668
Feature activation-1.57e+01

Pos logit contributions
 films+0.120
andel+0.119
 circles+0.118
umm+0.116
 sym+0.107
Neg logit contributions
 wouldn-0.333
 would-0.325
 never-0.317
 got-0.315
 didn-0.307
Tokenates
Token ID689
Feature activation-2.39e-01

Pos logit contributions
 circles+0.143
 films+0.140
umm+0.140
andel+0.135
 sym+0.123
Neg logit contributions
 wouldn-0.522
 would-0.506
 got-0.496
 never-0.492
 didn-0.487
Token.
Token ID13
Feature activation-1.76e+00

Pos logit contributions
 films+0.005
 circles+0.005
umm+0.005
andel+0.005
 sym+0.005
Neg logit contributions
 wouldn-0.004
 would-0.004
 got-0.004
 never-0.004
 didn-0.004
Token Lily
Token ID20037
Feature activation+4.00e+00

Pos logit contributions
umm+0.039
 films+0.039
 circles+0.039
andel+0.039
 sym+0.038
Neg logit contributions
 wouldn-0.035
 would-0.034
 got-0.033
 never-0.032
 didn-0.031
Token opened
Token ID4721
Feature activation+7.48e-01

Pos logit contributions
 wouldn+0.055
use+0.050
 never+0.045
 would+0.044
 managed+0.041
Neg logit contributions
 elderly-0.100
 circles-0.099
 films-0.099
 array-0.095
 Wheels-0.094
Token the
Token ID262
Feature activation-9.65e+00

Pos logit contributions
 wouldn+0.013
 would+0.012
 got+0.012
 never+0.012
 didn+0.011
Neg logit contributions
umm-0.019
 films-0.019
 circles-0.018
 sym-0.018
andel-0.018
Token look
Token ID804
Feature activation+3.83e+00

Pos logit contributions
 wouldn+0.069
 would+0.060
 managed+0.058
wed+0.057
w+0.056
Neg logit contributions
 amusement-0.096
 playground-0.096
 films-0.095
 circles-0.092
 stage-0.092
Token for
Token ID329
Feature activation+1.31e-01

Pos logit contributions
 wouldn+0.068
 would+0.065
 never+0.064
 got+0.063
 didn+0.060
Neg logit contributions
 films-0.092
 circles-0.090
andel-0.088
 sym-0.088
umm-0.087
Token the
Token ID262
Feature activation-9.56e+00

Pos logit contributions
 never+0.002
 got+0.002
 wouldn+0.002
 would+0.002
 tried+0.002
Neg logit contributions
umm-0.004
 sym-0.003
andel-0.003
opl-0.003
 films-0.003
Token rh
Token ID9529
Feature activation-1.23e+01

Pos logit contributions
andel+0.180
 sym+0.175
igsaw+0.174
umm+0.167
 films+0.164
Neg logit contributions
 would-0.242
 never-0.238
 wouldn-0.229
 got-0.227
 will-0.220
Tokenin
Token ID259
Feature activation-5.47e+00

Pos logit contributions
 films+0.278
umm+0.277
 circles+0.276
andel+0.272
 sym+0.271
Neg logit contributions
 wouldn-0.238
 would-0.229
 got-0.224
 never-0.221
 didn-0.210
Tokenoc
Token ID420
Feature activation-1.56e+01

Pos logit contributions
umm+0.107
 films+0.106
andel+0.105
 circles+0.105
 sym+0.104
Neg logit contributions
 wouldn-0.124
 would-0.121
 got-0.120
 never-0.117
 didn-0.112
Tokeneros
Token ID27498
Feature activation-1.19e-01

Pos logit contributions
 films+0.178
 circles+0.177
 sym+0.176
andel+0.166
 orchestra+0.163
Neg logit contributions
 wouldn-0.472
 never-0.454
 would-0.453
 got-0.451
 didn-0.428
Token.
Token ID13
Feature activation-1.43e+00

Pos logit contributions
 films+0.003
 circles+0.003
umm+0.003
andel+0.003
 sym+0.003
Neg logit contributions
 wouldn-0.002
 would-0.002
 got-0.002
 never-0.002
 didn-0.002
Token\n
Token ID198
Feature activation-3.80e+00

Pos logit contributions
 films+0.032
umm+0.032
 circles+0.032
andel+0.031
 sym+0.031
Neg logit contributions
 wouldn-0.028
 would-0.027
 got-0.026
 never-0.026
 didn-0.025
Token\n
Token ID198
Feature activation-4.16e+00

Pos logit contributions
 circles+0.077
 films+0.076
 sym+0.073
andel+0.073
 Wheels+0.072
Neg logit contributions
 wouldn-0.085
 would-0.078
 got-0.076
 never-0.074
 didn-0.073
TokenThey
Token ID2990
Feature activation+7.00e+00

Pos logit contributions
 films+0.088
 circles+0.088
 Wheels+0.084
 sym+0.084
andel+0.083
Neg logit contributions
 wouldn-0.089
 would-0.081
 got-0.078
 never-0.076
 didn-0.075
Token play
Token ID711
Feature activation-1.04e+00

Pos logit contributions
 wouldn+0.105
 would+0.098
 got+0.094
 didn+0.087
 never+0.085
Neg logit contributions
 circles-0.156
 films-0.155
 lions-0.147
 elderly-0.147
 sym-0.147
Token without
Token ID1231
Feature activation-3.59e-01

Pos logit contributions
umm+0.028
andel+0.027
 films+0.027
 sym+0.026
 circles+0.026
Neg logit contributions
 wouldn-0.017
 would-0.016
 got-0.016
 never-0.016
 didn-0.015
Token going
Token ID1016
Feature activation+1.37e+00

Pos logit contributions
 films+0.008
umm+0.008
 circles+0.008
andel+0.008
 sym+0.008
Neg logit contributions
 wouldn-0.007
 would-0.007
 got-0.007
 never-0.007
 didn-0.006
Token near
Token ID1474
Feature activation-2.66e+00

Pos logit contributions
 wouldn+0.022
 would+0.021
 got+0.021
 never+0.020
 didn+0.019
Neg logit contributions
 films-0.035
 circles-0.035
umm-0.035
andel-0.034
 sym-0.034
Token the
Token ID262
Feature activation-1.22e+01

Pos logit contributions
umm+0.077
 sym+0.073
andel+0.073
 orchestra+0.072
opl+0.070
Neg logit contributions
 wouldn-0.040
 would-0.040
 never-0.040
 got-0.038
 didn-0.035
Token sm
Token ID895
Feature activation-1.56e+01

Pos logit contributions
andel+0.223
ement+0.222
 films+0.220
ums+0.217
umm+0.216
Neg logit contributions
 would-0.302
 never-0.295
 got-0.289
 wouldn-0.286
 will-0.281
Tokenelly
Token ID6148
Feature activation-1.10e+01

Pos logit contributions
 films+0.186
 circles+0.178
 sym+0.173
umm+0.168
 orchestra+0.167
Neg logit contributions
 wouldn-0.456
 got-0.451
 would-0.450
 never-0.442
 fell-0.424
Token leaves
Token ID5667
Feature activation-8.80e-01

Pos logit contributions
 films+0.167
umm+0.166
andel+0.165
 circles+0.161
 sym+0.159
Neg logit contributions
 wouldn-0.295
 would-0.291
 got-0.287
 never-0.279
 didn-0.271
Token.
Token ID13
Feature activation-1.53e+00

Pos logit contributions
umm+0.021
 films+0.021
andel+0.021
 sym+0.020
 circles+0.020
Neg logit contributions
 wouldn-0.016
 would-0.016
 got-0.015
 never-0.015
 didn-0.014
Token Tim
Token ID5045
Feature activation+4.07e+00

Pos logit contributions
 orchestra+0.040
andel+0.039
opl+0.039
umm+0.039
 films+0.039
Neg logit contributions
 wouldn-0.024
 got-0.023
 never-0.023
 would-0.023
 could-0.022
Token and
Token ID290
Feature activation+2.92e+00

Pos logit contributions
 wouldn+0.067
 would+0.060
 never+0.059
 got+0.058
 didn+0.055
Neg logit contributions
 circles-0.101
 lions-0.094
 films-0.094
 elderly-0.093
 sym-0.093
Token?"
Token ID1701
Feature activation-8.47e-01

Pos logit contributions
 wouldn+0.045
 would+0.042
 got+0.041
 never+0.041
 didn+0.038
Neg logit contributions
 films-0.085
 circles-0.084
umm-0.084
andel-0.082
 sym-0.082
Token asked
Token ID1965
Feature activation+1.62e+00

Pos logit contributions
umm+0.025
 films+0.024
andel+0.024
 circles+0.024
 sym+0.024
Neg logit contributions
 wouldn-0.011
 got-0.011
 would-0.011
 never-0.010
 didn-0.010
Token the
Token ID262
Feature activation-1.04e+01

Pos logit contributions
 got+0.025
 wouldn+0.025
 would+0.025
 tried+0.023
 never+0.023
Neg logit contributions
umm-0.047
andel-0.044
 sym-0.044
opl-0.043
 array-0.042
Token rh
Token ID9529
Feature activation-1.30e+01

Pos logit contributions
umm+0.209
igsaw+0.205
andel+0.200
 sym+0.197
ement+0.184
Neg logit contributions
 got-0.253
 never-0.248
 would-0.239
 had-0.231
 tried-0.230
Tokenin
Token ID259
Feature activation-5.05e+00

Pos logit contributions
 films+0.273
umm+0.269
 circles+0.268
andel+0.264
 lions+0.259
Neg logit contributions
 wouldn-0.274
 would-0.259
 got-0.254
 never-0.251
 didn-0.242
Tokenoc
Token ID420
Feature activation-1.55e+01

Pos logit contributions
umm+0.106
 films+0.103
 circles+0.100
 sym+0.100
andel+0.100
Neg logit contributions
 wouldn-0.109
 would-0.107
 got-0.107
 never-0.104
 didn-0.101
Tokeneros
Token ID27498
Feature activation-5.05e-01

Pos logit contributions
 films+0.163
 circles+0.161
 sym+0.157
andel+0.154
umm+0.149
Neg logit contributions
 wouldn-0.485
 would-0.469
 never-0.469
 got-0.466
 didn-0.449
Token.
Token ID13
Feature activation-2.19e+00

Pos logit contributions
umm+0.012
 films+0.012
andel+0.011
 circles+0.011
 sym+0.011
Neg logit contributions
 wouldn-0.010
 would-0.009
 got-0.009
 never-0.009
 didn-0.009
Token\n
Token ID198
Feature activation-4.46e+00

Pos logit contributions
umm+0.049
andel+0.047
 films+0.047
 sym+0.046
 orchestra+0.045
Neg logit contributions
 got-0.045
 wouldn-0.045
 would-0.044
 never-0.042
 didn-0.042
Token\n
Token ID198
Feature activation-4.77e+00

Pos logit contributions
 circles+0.090
 films+0.087
 sym+0.085
andel+0.084
umm+0.083
Neg logit contributions
 wouldn-0.100
 would-0.091
 got-0.089
 never-0.088
 didn-0.086
Token"
Token ID1
Feature activation-7.70e-01

Pos logit contributions
 circles+0.123
 films+0.123
umm+0.121
 sym+0.120
andel+0.120
Neg logit contributions
 wouldn-0.078
 would-0.071
 got-0.068
 never-0.068
 didn-0.065
Token a
Token ID257
Feature activation-4.79e+00

Pos logit contributions
 wouldn+0.035
 would+0.033
 got+0.033
 never+0.032
 didn+0.031
Neg logit contributions
 circles-0.042
 films-0.042
umm-0.041
 lions-0.040
andel-0.040
Token big
Token ID1263
Feature activation-8.40e+00

Pos logit contributions
 films+0.091
umm+0.090
andel+0.088
opl+0.086
 circles+0.086
Neg logit contributions
 would-0.114
 wouldn-0.113
 got-0.111
 never-0.107
 didn-0.104
Token box
Token ID3091
Feature activation-1.07e-01

Pos logit contributions
 films+0.160
umm+0.158
andel+0.155
 circles+0.154
 sym+0.152
Neg logit contributions
 wouldn-0.194
 would-0.189
 got-0.187
 never-0.182
 didn-0.176
Token of
Token ID286
Feature activation-7.56e+00

Pos logit contributions
 films+0.002
umm+0.002
andel+0.002
 circles+0.002
 sym+0.002
Neg logit contributions
 wouldn-0.002
 would-0.002
 got-0.002
 never-0.002
 didn-0.002
Token ch
Token ID442
Feature activation-1.05e+01

Pos logit contributions
umm+0.137
andel+0.134
 sym+0.134
 films+0.131
opl+0.126
Neg logit contributions
 would-0.182
 wouldn-0.181
 got-0.178
 never-0.175
 didn-0.168
Tokenocol
Token ID4668
Feature activation-1.55e+01

Pos logit contributions
 films+0.089
umm+0.087
 circles+0.085
 sym+0.082
andel+0.079
Neg logit contributions
 wouldn-0.352
 would-0.342
 got-0.340
 never-0.336
 didn-0.328
Tokenates
Token ID689
Feature activation-8.58e-01

Pos logit contributions
 circles+0.139
umm+0.137
 films+0.133
andel+0.132
 sym+0.120
Neg logit contributions
 wouldn-0.519
 would-0.505
 got-0.494
 never-0.489
 didn-0.485
Token!
Token ID0
Feature activation-1.45e+00

Pos logit contributions
 circles+0.018
 films+0.018
umm+0.017
andel+0.017
 sym+0.017
Neg logit contributions
 wouldn-0.018
 would-0.017
 got-0.017
 never-0.017
 didn-0.016
Token
Token ID220
Feature activation-2.00e+00

Pos logit contributions
 films+0.034
 sym+0.033
umm+0.033
andel+0.033
 circles+0.033
Neg logit contributions
 wouldn-0.027
 got-0.027
 would-0.026
 never-0.025
 didn-0.025
Token\n
Token ID198
Feature activation-3.57e+00

Pos logit contributions
 circles+0.043
 films+0.043
umm+0.043
andel+0.042
 sym+0.042
Neg logit contributions
 wouldn-0.040
 would-0.038
 got-0.037
 never-0.037
 didn-0.035
Token\n
Token ID198
Feature activation-3.87e+00

Pos logit contributions
 circles+0.074
 films+0.072
andel+0.070
 sym+0.069
umm+0.069
Neg logit contributions
 wouldn-0.079
 would-0.073
 got-0.071
 never-0.069
 didn-0.068
Token trapped
Token ID13640
Feature activation-1.55e+00

Pos logit contributions
 wouldn+0.011
 would+0.010
 got+0.010
 never+0.010
 didn+0.009
Neg logit contributions
 films-0.013
 circles-0.013
umm-0.013
andel-0.013
 sym-0.013
Token with
Token ID351
Feature activation-2.82e+00

Pos logit contributions
 circles+0.038
 films+0.037
 lions+0.036
umm+0.036
 sym+0.036
Neg logit contributions
 wouldn-0.027
 would-0.025
 got-0.025
 didn-0.024
 never-0.023
Token the
Token ID262
Feature activation-9.96e+00

Pos logit contributions
umm+0.071
 sym+0.069
andel+0.068
 films+0.068
 circles+0.067
Neg logit contributions
 wouldn-0.050
 never-0.048
 would-0.048
 got-0.047
 didn-0.044
Token rh
Token ID9529
Feature activation-1.25e+01

Pos logit contributions
umm+0.180
 sym+0.175
igsaw+0.171
andel+0.169
 films+0.168
Neg logit contributions
 never-0.259
 would-0.254
 wouldn-0.249
 got-0.247
 had-0.236
Tokenin
Token ID259
Feature activation-5.97e+00

Pos logit contributions
umm+0.282
 films+0.281
 circles+0.279
 sym+0.277
andel+0.275
Neg logit contributions
 wouldn-0.242
 would-0.234
 got-0.229
 never-0.226
 didn-0.213
Tokenoc
Token ID420
Feature activation-1.55e+01

Pos logit contributions
 films+0.111
umm+0.110
 circles+0.109
andel+0.107
 sym+0.106
Neg logit contributions
 wouldn-0.141
 would-0.136
 got-0.134
 never-0.133
 didn-0.127
Tokeneros
Token ID27498
Feature activation-7.98e-01

Pos logit contributions
 films+0.175
 circles+0.172
 sym+0.170
andel+0.160
 Wheels+0.155
Neg logit contributions
 wouldn-0.475
 never-0.458
 would-0.453
 got-0.452
 didn-0.432
Token.
Token ID13
Feature activation-1.93e+00

Pos logit contributions
umm+0.020
 films+0.020
andel+0.019
 circles+0.019
 sym+0.019
Neg logit contributions
 wouldn-0.014
 would-0.014
 got-0.013
 never-0.013
 didn-0.012
Token They
Token ID1119
Feature activation+6.51e+00

Pos logit contributions
umm+0.048
andel+0.047
 sym+0.046
opl+0.045
 array+0.045
Neg logit contributions
 wouldn-0.033
 never-0.033
 would-0.032
 got-0.032
 could-0.031
Token looked
Token ID3114
Feature activation+4.47e+00

Pos logit contributions
use+0.067
 wouldn+0.067
 would+0.061
 managed+0.050
 listened+0.050
Neg logit contributions
 circles-0.166
 monkeys-0.165
 lions-0.163
 elephants-0.160
 Wheels-0.156
Token for
Token ID329
Feature activation-8.95e-02

Pos logit contributions
 wouldn+0.085
 would+0.082
 got+0.079
 never+0.079
 didn+0.075
Neg logit contributions
 films-0.100
 circles-0.098
 lions-0.095
 sym-0.095
andel-0.094
Token had
Token ID550
Feature activation+3.30e+00

Pos logit contributions
 circles+0.048
 films+0.047
umm+0.046
ement+0.046
 lions+0.046
Neg logit contributions
 wouldn-0.016
 got-0.014
 would-0.014
 never-0.013
 fell-0.012
Token two
Token ID734
Feature activation-5.29e+00

Pos logit contributions
 wouldn+0.051
 would+0.049
 got+0.048
 never+0.048
 didn+0.044
Neg logit contributions
umm-0.089
 films-0.089
andel-0.087
 circles-0.087
 sym-0.086
Token rulers
Token ID24925
Feature activation-8.64e-01

Pos logit contributions
umm+0.092
 sym+0.089
 films+0.089
andel+0.086
 circles+0.086
Neg logit contributions
 wouldn-0.132
 would-0.131
 got-0.130
 never-0.127
 didn-0.122
Token,
Token ID11
Feature activation+1.25e+00

Pos logit contributions
 films+0.018
 circles+0.018
umm+0.018
andel+0.018
 sym+0.017
Neg logit contributions
 wouldn-0.018
 would-0.017
 got-0.017
 never-0.016
 didn-0.016
Token the
Token ID262
Feature activation-8.89e+00

Pos logit contributions
 wouldn+0.023
 would+0.022
 never+0.022
 got+0.022
 didn+0.021
Neg logit contributions
umm-0.030
andel-0.030
 films-0.029
 sym-0.029
 circles-0.028
Token Sm
Token ID2439
Feature activation-1.55e+01

Pos logit contributions
umm+0.165
andel+0.159
 sym+0.156
 films+0.152
opl+0.150
Neg logit contributions
 never-0.215
 would-0.214
 wouldn-0.212
 got-0.208
 didn-0.199
Tokenelly
Token ID6148
Feature activation-5.43e+00

Pos logit contributions
 films+0.258
umm+0.256
 circles+0.254
andel+0.249
 sym+0.246
Neg logit contributions
 wouldn-0.391
 would-0.377
 got-0.371
 never-0.368
 didn-0.355
Token King
Token ID2677
Feature activation-2.01e+00

Pos logit contributions
umm+0.078
 films+0.077
 circles+0.077
andel+0.075
 sym+0.074
Neg logit contributions
 wouldn-0.150
 would-0.146
 got-0.144
 never-0.142
 didn-0.138
Token and
Token ID290
Feature activation+3.87e+00

Pos logit contributions
umm+0.060
 films+0.059
andel+0.059
 circles+0.058
 sym+0.058
Neg logit contributions
 wouldn-0.026
 would-0.025
 got-0.024
 never-0.023
 didn-0.022
Token the
Token ID262
Feature activation-9.05e+00

Pos logit contributions
 wouldn+0.061
 would+0.057
 got+0.056
 never+0.055
 didn+0.052
Neg logit contributions
 films-0.101
umm-0.101
 circles-0.100
andel-0.099
 sym-0.098
Token St
Token ID520
Feature activation-1.19e+01

Pos logit contributions
umm+0.166
andel+0.161
 sym+0.156
 films+0.153
opl+0.150
Neg logit contributions
 never-0.220
 would-0.219
 wouldn-0.217
 got-0.214
 didn-0.205
Token to
Token ID284
Feature activation+3.78e+00

Pos logit contributions
 wouldn+0.031
 would+0.029
 got+0.028
 never+0.027
 didn+0.026
Neg logit contributions
 circles-0.046
 films-0.045
 sym-0.044
umm-0.043
andel-0.043
Token roll
Token ID4836
Feature activation+3.24e-01

Pos logit contributions
 wouldn+0.058
 would+0.054
 got+0.053
 never+0.050
 didn+0.048
Neg logit contributions
 circles-0.100
 films-0.100
umm-0.097
 sym-0.096
andel-0.096
Token around
Token ID1088
Feature activation+1.71e+00

Pos logit contributions
 wouldn+0.005
 would+0.005
 got+0.005
 never+0.005
 didn+0.004
Neg logit contributions
umm-0.009
 films-0.009
andel-0.008
 sym-0.008
 circles-0.008
Token with
Token ID351
Feature activation-1.31e+00

Pos logit contributions
 wouldn+0.027
 would+0.026
 never+0.025
 got+0.025
 didn+0.024
Neg logit contributions
umm-0.048
andel-0.046
 films-0.044
opl-0.044
 sym-0.044
Token the
Token ID262
Feature activation-9.99e+00

Pos logit contributions
umm+0.038
andel+0.036
 sym+0.035
opl+0.034
 films+0.034
Neg logit contributions
 wouldn-0.020
 never-0.020
 would-0.020
 got-0.020
 didn-0.019
Token gr
Token ID1036
Feature activation-1.54e+01

Pos logit contributions
opl+0.191
umm+0.186
 films+0.185
andel+0.184
 sym+0.183
Neg logit contributions
 wouldn-0.231
 got-0.230
 would-0.227
 tried-0.221
 didn-0.221
Tokenumpy
Token ID32152
Feature activation-1.04e+01

Pos logit contributions
 films+0.226
 circles+0.220
umm+0.218
 sym+0.216
 lions+0.215
Neg logit contributions
 wouldn-0.410
 got-0.399
 would-0.396
 never-0.391
 didn-0.382
Token shape
Token ID5485
Feature activation-1.80e+00

Pos logit contributions
 films+0.182
umm+0.176
opl+0.175
 lions+0.175
 sym+0.175
Neg logit contributions
 wouldn-0.253
 got-0.249
 would-0.241
 didn-0.239
 never-0.233
Token and
Token ID290
Feature activation+4.46e+00

Pos logit contributions
umm+0.044
 films+0.044
andel+0.043
 sym+0.043
 circles+0.042
Neg logit contributions
 wouldn-0.032
 would-0.031
 got-0.030
 never-0.030
 didn-0.028
Token soon
Token ID2582
Feature activation+4.81e+00

Pos logit contributions
 wouldn+0.072
 would+0.068
 got+0.066
 never+0.065
 didn+0.061
Neg logit contributions
 circles-0.114
 films-0.114
umm-0.113
 sym-0.111
andel-0.110
Token,
Token ID11
Feature activation+2.27e+00

Pos logit contributions
 wouldn+0.071
 would+0.067
 got+0.066
 never+0.064
 didn+0.060
Neg logit contributions
umm-0.132
 films-0.132
andel-0.130
 circles-0.130
 sym-0.128
Token and
Token ID290
Feature activation+3.42e+00

Pos logit contributions
umm+0.053
andel+0.052
 films+0.051
ement+0.050
 elephants+0.050
Neg logit contributions
 got-0.027
 never-0.025
 would-0.024
 fell-0.024
 wouldn-0.024
Token took
Token ID1718
Feature activation+6.37e-01

Pos logit contributions
 wouldn+0.037
 would+0.037
'd+0.034
use+0.032
 never+0.030
Neg logit contributions
 circles-0.086
 elderly-0.086
 amaz-0.085
 films-0.085
 lions-0.084
Token the
Token ID262
Feature activation-9.49e+00

Pos logit contributions
 wouldn+0.013
 would+0.012
 never+0.011
 didn+0.011
 got+0.011
Neg logit contributions
 circles-0.014
 films-0.014
andel-0.014
umm-0.014
 lions-0.013
Token rh
Token ID9529
Feature activation-1.18e+01

Pos logit contributions
 sym+0.154
 circles+0.150
umm+0.150
andel+0.148
 films+0.147
Neg logit contributions
 got-0.254
 would-0.250
 never-0.248
 wouldn-0.241
 had-0.230
Tokenin
Token ID259
Feature activation-5.78e+00

Pos logit contributions
 films+0.244
umm+0.239
 circles+0.237
andel+0.235
 lions+0.232
Neg logit contributions
 wouldn-0.255
 would-0.239
 got-0.234
 never-0.232
 didn-0.226
Tokenoc
Token ID420
Feature activation-1.54e+01

Pos logit contributions
 films+0.108
umm+0.108
 circles+0.107
andel+0.105
 sym+0.104
Neg logit contributions
 wouldn-0.135
 would-0.130
 got-0.129
 never-0.127
 didn-0.122
Tokeneros
Token ID27498
Feature activation+6.19e-02

Pos logit contributions
 films+0.140
 circles+0.137
umm+0.134
andel+0.130
 sym+0.129
Neg logit contributions
 wouldn-0.507
 would-0.492
 got-0.487
 never-0.485
 didn-0.470
Token home
Token ID1363
Feature activation+2.16e+00

Pos logit contributions
 wouldn+0.001
 would+0.001
 got+0.001
 never+0.001
 didn+0.001
Neg logit contributions
umm-0.001
 films-0.001
 circles-0.001
andel-0.001
 sym-0.001
Token with
Token ID351
Feature activation-1.82e+00

Pos logit contributions
 wouldn+0.044
 would+0.042
 got+0.041
 never+0.041
 didn+0.039
Neg logit contributions
 films-0.046
 circles-0.046
umm-0.046
andel-0.045
 sym-0.044
Token her
Token ID607
Feature activation-4.45e+00

Pos logit contributions
umm+0.048
andel+0.047
 films+0.047
 sym+0.047
 circles+0.045
Neg logit contributions
 wouldn-0.029
 never-0.029
 would-0.028
 got-0.028
 didn-0.026
Token.
Token ID13
Feature activation-2.00e+00

Pos logit contributions
umm+0.104
 films+0.102
andel+0.102
 sym+0.099
 orchestra+0.098
Neg logit contributions
 wouldn-0.092
 would-0.091
 got-0.087
 never-0.087
 didn-0.082
Token a
Token ID257
Feature activation-4.22e+00

Pos logit contributions
 wouldn+0.014
 would+0.013
 got+0.013
 never+0.013
 didn+0.012
Neg logit contributions
 films-0.015
 circles-0.015
umm-0.015
andel-0.014
 sym-0.014
Token big
Token ID1263
Feature activation-7.85e+00

Pos logit contributions
umm+0.077
 circles+0.072
 films+0.071
andel+0.071
opl+0.070
Neg logit contributions
 would-0.108
 got-0.105
 wouldn-0.105
 never-0.101
 didn-0.097
Token box
Token ID3091
Feature activation+5.55e-01

Pos logit contributions
umm+0.147
 films+0.142
 circles+0.142
 sym+0.141
andel+0.140
Neg logit contributions
 got-0.182
 wouldn-0.180
 would-0.180
 never-0.170
 didn-0.167
Token of
Token ID286
Feature activation-6.69e+00

Pos logit contributions
 wouldn+0.011
 would+0.011
 got+0.011
 never+0.010
 didn+0.010
Neg logit contributions
 films-0.012
 circles-0.012
umm-0.012
andel-0.012
 sym-0.011
Token ch
Token ID442
Feature activation-1.05e+01

Pos logit contributions
umm+0.141
andel+0.134
 sym+0.134
opl+0.128
ums+0.127
Neg logit contributions
 would-0.146
 got-0.140
 wouldn-0.139
 never-0.136
 found-0.133
Tokenocol
Token ID4668
Feature activation-1.54e+01

Pos logit contributions
umm+0.114
 films+0.114
 sym+0.112
 circles+0.112
 Wheels+0.102
Neg logit contributions
 wouldn-0.319
 got-0.310
 would-0.309
 never-0.305
 didn-0.296
Tokenates
Token ID689
Feature activation-5.26e-01

Pos logit contributions
 circles+0.144
 films+0.144
umm+0.143
andel+0.138
 sym+0.130
Neg logit contributions
 wouldn-0.506
 would-0.490
 got-0.483
 never-0.478
 didn-0.471
Token to
Token ID284
Feature activation+1.56e+00

Pos logit contributions
 films+0.013
 circles+0.013
umm+0.013
andel+0.013
 sym+0.013
Neg logit contributions
 wouldn-0.009
 would-0.008
 got-0.008
 never-0.008
 didn-0.008
Token share
Token ID2648
Feature activation-2.27e-01

Pos logit contributions
 wouldn+0.025
 would+0.024
 got+0.024
 never+0.023
 didn+0.022
Neg logit contributions
umm-0.040
 films-0.040
 circles-0.040
andel-0.039
 sym-0.039
Token with
Token ID351
Feature activation-2.76e+00

Pos logit contributions
umm+0.007
andel+0.006
 sym+0.006
 films+0.006
 orchestra+0.006
Neg logit contributions
 would-0.003
 never-0.003
 wouldn-0.003
 got-0.003
 didn-0.003
Token the
Token ID262
Feature activation-1.01e+01

Pos logit contributions
umm+0.071
 films+0.070
 circles+0.070
andel+0.070
 sym+0.069
Neg logit contributions
 wouldn-0.046
 would-0.044
 got-0.043
 never-0.043
 didn-0.040
Token it
Token ID340
Feature activation+4.06e+00

Pos logit contributions
 wouldn+0.002
 would+0.002
 got+0.002
 didn+0.002
 never+0.002
Neg logit contributions
 circles-0.004
 films-0.004
andel-0.004
umm-0.003
 lions-0.003
Token home
Token ID1363
Feature activation+2.94e+00

Pos logit contributions
 wouldn+0.085
 would+0.081
 got+0.077
 didn+0.075
 never+0.075
Neg logit contributions
 circles-0.087
 films-0.082
andel-0.082
 sym-0.079
umm-0.079
Token with
Token ID351
Feature activation-1.31e+00

Pos logit contributions
 wouldn+0.065
 would+0.062
 got+0.060
 never+0.058
 didn+0.058
Neg logit contributions
 circles-0.058
 films-0.057
andel-0.055
 elderly-0.054
 sym-0.054
Token her
Token ID607
Feature activation-4.47e+00

Pos logit contributions
 circles+0.033
 films+0.033
umm+0.032
andel+0.032
 sym+0.032
Neg logit contributions
 wouldn-0.022
 would-0.021
 got-0.020
 never-0.020
 didn-0.019
Token.
Token ID13
Feature activation-1.26e+00

Pos logit contributions
 films+0.112
umm+0.112
andel+0.109
 sym+0.108
 Wheels+0.105
Neg logit contributions
 wouldn-0.084
 would-0.081
 never-0.078
 got-0.077
 didn-0.074
Token She
Token ID1375
Feature activation+7.22e+00

Pos logit contributions
 films+0.027
umm+0.027
 circles+0.027
andel+0.026
 sym+0.026
Neg logit contributions
 wouldn-0.026
 would-0.025
 got-0.024
 never-0.024
 didn-0.023
Token put
Token ID1234
Feature activation-1.46e-01

Pos logit contributions
use+0.083
t+0.081
 wouldn+0.077
 don+0.074
w+0.066
Neg logit contributions
 elderly-0.181
 circles-0.178
 lions-0.171
 ze-0.170
 monkeys-0.169
Token it
Token ID340
Feature activation+6.38e+00

Pos logit contributions
 circles+0.004
 films+0.003
andel+0.003
 squares+0.003
 sunrise+0.003
Neg logit contributions
 wouldn-0.003
 would-0.003
 never-0.002
 forgot-0.002
 got-0.002
Token in
Token ID287
Feature activation+2.91e-02

Pos logit contributions
 wouldn+0.122
 would+0.113
 forgot+0.110
 didn+0.108
 got+0.107
Neg logit contributions
 circles-0.140
 films-0.122
 squares-0.121
 elderly-0.121
andel-0.121
Token a
Token ID257
Feature activation-4.82e+00

Pos logit contributions
 never+0.000
 wouldn+0.000
 got+0.000
 would+0.000
 didn+0.000
Neg logit contributions
umm-0.001
 films-0.001
andel-0.001
 sym-0.001
 lions-0.001
Token jar
Token ID17379
Feature activation-3.42e-01

Pos logit contributions
 films+0.083
umm+0.083
 circles+0.081
andel+0.080
 sym+0.079
Neg logit contributions
 wouldn-0.120
 would-0.117
 got-0.115
 never-0.113
 didn-0.109
Token wash
Token ID13502
Feature activation+2.93e-01

Pos logit contributions
 wouldn+0.053
 would+0.049
 got+0.046
 never+0.045
 didn+0.044
Neg logit contributions
 circles-0.087
 films-0.086
andel-0.083
umm-0.082
 lions-0.082
Token the
Token ID262
Feature activation-1.03e+01

Pos logit contributions
 wouldn+0.005
 would+0.005
 got+0.005
 never+0.005
 didn+0.005
Neg logit contributions
umm-0.007
 films-0.007
 circles-0.007
andel-0.007
 sym-0.007
Token mud
Token ID17492
Feature activation-2.09e+00

Pos logit contributions
andel+0.183
umm+0.181
 films+0.174
 sym+0.174
ement+0.167
Neg logit contributions
 got-0.247
 would-0.241
 will-0.231
 wouldn-0.230
 had-0.229
Token off
Token ID572
Feature activation+1.01e+00

Pos logit contributions
 films+0.046
umm+0.045
 circles+0.045
andel+0.044
 sym+0.044
Neg logit contributions
 wouldn-0.042
 would-0.040
 got-0.039
 never-0.039
 didn-0.037
Token.
Token ID13
Feature activation-8.81e-01

Pos logit contributions
 wouldn+0.017
 would+0.017
 never+0.016
 got+0.016
 didn+0.015
Neg logit contributions
umm-0.026
 sym-0.025
andel-0.025
 films-0.025
 circles-0.024
Token He
Token ID679
Feature activation+8.02e+00

Pos logit contributions
 films+0.021
umm+0.021
andel+0.021
 sym+0.021
 circles+0.021
Neg logit contributions
 wouldn-0.016
 would-0.015
 got-0.015
 never-0.015
 didn-0.014
Token also
Token ID635
Feature activation+6.54e+00

Pos logit contributions
 wouldn+0.098
use+0.085
 never+0.080
 would+0.080
 don+0.076
Neg logit contributions
 circles-0.213
 elderly-0.212
 monkeys-0.206
 lions-0.204
 Wheels-0.202
Token liked
Token ID8288
Feature activation-1.64e+00

Pos logit contributions
 wouldn+0.085
t+0.077
 never+0.070
 would+0.068
 don+0.063
Neg logit contributions
 elderly-0.169
 circles-0.164
 squares-0.157
 amaz-0.155
 array-0.155
Token to
Token ID284
Feature activation+2.85e+00

Pos logit contributions
umm+0.044
andel+0.043
 sym+0.042
 films+0.041
ement+0.041
Neg logit contributions
 wouldn-0.027
 would-0.026
 never-0.026
 got-0.025
 didn-0.023
Token swing
Token ID9628
Feature activation+1.30e+00

Pos logit contributions
 wouldn+0.054
 would+0.050
 got+0.049
 never+0.047
 didn+0.046
Neg logit contributions
 circles-0.065
 films-0.064
 sym-0.062
andel-0.061
 orchestra-0.060
Token and
Token ID290
Feature activation+5.26e+00

Pos logit contributions
 wouldn+0.021
 would+0.020
 got+0.019
 never+0.019
 didn+0.018
Neg logit contributions
umm-0.035
 films-0.034
andel-0.034
 sym-0.034
 circles-0.033
Token questions
Token ID2683
Feature activation+9.41e-01

Pos logit contributions
umm+0.084
 sym+0.073
andel+0.072
ums+0.070
 Wheels+0.068
Neg logit contributions
 would-0.090
 wouldn-0.089
 never-0.088
 got-0.087
 tried-0.085
Token,
Token ID11
Feature activation+2.41e+00

Pos logit contributions
 wouldn+0.016
 would+0.016
 got+0.015
 never+0.015
 didn+0.014
Neg logit contributions
umm-0.024
 films-0.023
andel-0.023
 sym-0.023
 circles-0.023
Token but
Token ID475
Feature activation+3.95e+00

Pos logit contributions
 wouldn+0.045
 would+0.043
 got+0.041
 never+0.041
 didn+0.039
Neg logit contributions
umm-0.057
 films-0.057
andel-0.056
 circles-0.056
 sym-0.055
Token her
Token ID607
Feature activation-2.50e+00

Pos logit contributions
 wouldn+0.060
 got+0.057
 would+0.057
 never+0.052
 fell+0.051
Neg logit contributions
 circles-0.099
 films-0.097
umm-0.094
 lions-0.094
 amaz-0.093
Token daddy
Token ID35822
Feature activation+4.67e+00

Pos logit contributions
 films+0.044
umm+0.044
 circles+0.044
andel+0.043
 sym+0.043
Neg logit contributions
 wouldn-0.061
 would-0.059
 got-0.058
 never-0.057
 didn-0.055
Token could
Token ID714
Feature activation+7.34e+00

Pos logit contributions
 wouldn+0.060
 got+0.058
 never+0.056
 would+0.056
 fell+0.055
Neg logit contributions
ement-0.111
 elderly-0.110
 circles-0.110
 lions-0.107
 sym-0.106
Token not
Token ID407
Feature activation+4.50e+00

Pos logit contributions
 wouldn+0.106
 would+0.095
 got+0.094
 didn+0.087
 fell+0.083
Neg logit contributions
 circles-0.193
 films-0.191
 elderly-0.185
 amusement-0.184
 sunrise-0.184
Token answer
Token ID3280
Feature activation+2.17e+00

Pos logit contributions
 wouldn+0.084
 would+0.075
 got+0.073
 didn+0.071
 fell+0.068
Neg logit contributions
 circles-0.101
 films-0.099
andel-0.096
 elderly-0.096
 orchestra-0.096
Token them
Token ID606
Feature activation+3.59e+00

Pos logit contributions
 wouldn+0.039
 would+0.038
 never+0.036
 got+0.035
 didn+0.033
Neg logit contributions
umm-0.055
andel-0.053
 sym-0.053
 films-0.053
 circles-0.051
Token all
Token ID477
Feature activation+3.94e+00

Pos logit contributions
 wouldn+0.067
 would+0.065
 never+0.061
 got+0.060
 didn+0.058
Neg logit contributions
umm-0.089
andel-0.087
 films-0.087
 sym-0.085
 circles-0.085
Token.
Token ID13
Feature activation-9.86e-01

Pos logit contributions
 wouldn+0.075
 would+0.073
 got+0.070
 never+0.070
 didn+0.066
Neg logit contributions
umm-0.094
andel-0.092
 films-0.091
 sym-0.089
 circles-0.089
Token to
Token ID284
Feature activation+2.32e+00

Pos logit contributions
umm+0.083
andel+0.081
 sym+0.080
 films+0.080
 circles+0.080
Neg logit contributions
 wouldn-0.032
 would-0.031
 never-0.030
 got-0.029
 didn-0.026
Token roar
Token ID34532
Feature activation-3.55e-01

Pos logit contributions
 wouldn+0.045
 would+0.041
 got+0.040
 never+0.039
 didn+0.038
Neg logit contributions
 films-0.053
 circles-0.053
 sym-0.051
 lions-0.051
umm-0.050
Token and
Token ID290
Feature activation+4.70e+00

Pos logit contributions
umm+0.010
andel+0.009
 sym+0.009
 films+0.009
 circles+0.009
Neg logit contributions
 wouldn-0.005
 would-0.005
 got-0.005
 never-0.005
 didn-0.005
Token it
Token ID340
Feature activation+5.63e+00

Pos logit contributions
 wouldn+0.096
 would+0.087
 got+0.087
 never+0.085
 didn+0.083
Neg logit contributions
 circles-0.101
 films-0.100
 lions-0.096
 monkeys-0.095
 elephants-0.095
Token made
Token ID925
Feature activation+1.48e+00

Pos logit contributions
 wouldn+0.065
 never+0.052
 would+0.052
 got+0.051
use+0.049
Neg logit contributions
 circles-0.161
 films-0.158
 lions-0.157
 sym-0.156
 monkeys-0.155
Token everyone
Token ID2506
Feature activation+5.85e+00

Pos logit contributions
 never+0.022
 wouldn+0.022
 would+0.022
 got+0.021
 didn+0.020
Neg logit contributions
umm-0.041
andel-0.039
 sym-0.039
 array-0.038
 amusement-0.038
Token scared
Token ID12008
Feature activation-1.80e+00

Pos logit contributions
 wouldn+0.116
 would+0.112
 got+0.109
 never+0.108
 didn+0.103
Neg logit contributions
umm-0.131
 films-0.129
 circles-0.128
andel-0.128
 sym-0.125
Token.
Token ID13
Feature activation-1.13e+00

Pos logit contributions
umm+0.052
andel+0.051
 sym+0.050
 films+0.050
 circles+0.049
Neg logit contributions
 wouldn-0.025
 would-0.024
 got-0.023
 never-0.023
 didn-0.021
Token One
Token ID1881
Feature activation-2.78e+00

Pos logit contributions
andel+0.030
umm+0.030
 films+0.029
 sym+0.028
 orchestra+0.028
Neg logit contributions
 would-0.018
 never-0.017
 wouldn-0.017
 got-0.016
 could-0.016
Token day
Token ID1110
Feature activation+2.46e+00

Pos logit contributions
umm+0.067
 circles+0.064
andel+0.064
 films+0.063
 sym+0.063
Neg logit contributions
 would-0.053
 wouldn-0.053
 got-0.051
 never-0.050
 didn-0.048
Token,
Token ID11
Feature activation+4.07e-02

Pos logit contributions
 wouldn+0.041
 would+0.039
 got+0.038
 never+0.037
 didn+0.036
Neg logit contributions
umm-0.063
 films-0.063
 circles-0.062
andel-0.062
 sym-0.061
Token your
Token ID534
Feature activation-5.86e+00

Pos logit contributions
 wouldn+0.006
 got+0.006
 never+0.006
 would+0.006
 didn+0.005
Neg logit contributions
umm-0.009
 sym-0.009
 films-0.009
andel-0.009
 circles-0.009
Token baseball
Token ID9283
Feature activation-6.18e-01

Pos logit contributions
andel+0.110
umm+0.109
 films+0.106
 sym+0.106
 orchestra+0.102
Neg logit contributions
 wouldn-0.139
 would-0.136
 got-0.133
 never-0.132
 didn-0.127
Token!
Token ID0
Feature activation-7.88e-01

Pos logit contributions
 films+0.014
umm+0.014
andel+0.014
 circles+0.014
 sym+0.013
Neg logit contributions
 wouldn-0.012
 would-0.012
 got-0.011
 never-0.011
 didn-0.011
Tokenâ
Token ID22940
Feature activation-5.35e+00

Pos logit contributions
 films+0.017
umm+0.017
 circles+0.017
andel+0.017
 sym+0.016
Neg logit contributions
 wouldn-0.016
 would-0.015
 got-0.015
 never-0.015
 didn-0.014
Token€
Token ID26391
Feature activation-5.28e+00

Pos logit contributions
umm+0.092
opl+0.090
 circles+0.090
 sym+0.090
 orchestra+0.087
Neg logit contributions
 wouldn-0.127
 would-0.126
 got-0.120
 never-0.119
 could-0.114
Token She
Token ID1375
Feature activation+7.47e+00

Pos logit contributions
 circles+0.106
 films+0.103
 sym+0.102
umm+0.100
 Wheels+0.100
Neg logit contributions
 wouldn-0.111
 would-0.107
 got-0.101
 never-0.099
 didn-0.096
Token found
Token ID1043
Feature activation+1.52e+00

Pos logit contributions
 wouldn+0.068
 never+0.064
 would+0.062
 got+0.060
 listened+0.057
Neg logit contributions
 circles-0.201
 elderly-0.201
 array-0.197
umm-0.196
 sym-0.196
Token a
Token ID257
Feature activation-4.42e+00

Pos logit contributions
 wouldn+0.021
 would+0.020
 got+0.019
 never+0.018
 didn+0.018
Neg logit contributions
 circles-0.042
 films-0.042
umm-0.042
andel-0.041
 sym-0.041
Token ladder
Token ID18002
Feature activation+3.66e-01

Pos logit contributions
umm+0.082
 films+0.080
andel+0.080
opl+0.075
ement+0.075
Neg logit contributions
 would-0.110
 wouldn-0.108
 got-0.106
 never-0.102
 didn-0.099
Token and
Token ID290
Feature activation+4.89e+00

Pos logit contributions
 wouldn+0.007
 would+0.006
 got+0.006
 never+0.006
 didn+0.006
Neg logit contributions
 circles-0.009
umm-0.009
 films-0.009
andel-0.008
 sym-0.008
Token stood
Token ID6204
Feature activation+6.08e-02

Pos logit contributions
 wouldn+0.060
 would+0.056
 never+0.054
 got+0.049
 didn+0.048
Neg logit contributions
 circles-0.137
 elderly-0.134
 films-0.131
umm-0.130
 array-0.129
Token a
Token ID257
Feature activation-5.74e+00

Pos logit contributions
umm+0.057
andel+0.055
 sym+0.053
ement+0.051
opl+0.051
Neg logit contributions
 would-0.029
 got-0.029
 never-0.028
 wouldn-0.026
 didn-0.025
Token stubborn
Token ID24444
Feature activation-6.27e+00

Pos logit contributions
umm+0.123
andel+0.118
ement+0.117
 circles+0.117
 films+0.116
Neg logit contributions
 would-0.131
 got-0.128
 wouldn-0.118
 had-0.117
 never-0.117
Token lizard
Token ID42406
Feature activation-1.03e+00

Pos logit contributions
umm+0.115
 circles+0.112
 films+0.111
 sym+0.110
andel+0.108
Neg logit contributions
 got-0.152
 would-0.147
 wouldn-0.145
 didn-0.138
 fell-0.138
Token on
Token ID319
Feature activation-8.81e-01

Pos logit contributions
 films+0.024
 circles+0.024
umm+0.023
 lions+0.023
 sym+0.023
Neg logit contributions
 wouldn-0.019
 would-0.018
 got-0.018
 never-0.017
 didn-0.017
Token a
Token ID257
Feature activation-5.23e+00

Pos logit contributions
umm+0.028
 sym+0.027
ement+0.026
andel+0.025
 orchestra+0.025
Neg logit contributions
 never-0.012
 got-0.011
 would-0.010
 wouldn-0.010
 didn-0.010
Token rock
Token ID3881
Feature activation+7.76e-02

Pos logit contributions
umm+0.085
 films+0.085
 circles+0.083
andel+0.081
 sym+0.080
Neg logit contributions
 wouldn-0.136
 would-0.133
 got-0.131
 never-0.129
 didn-0.124
Token.
Token ID13
Feature activation-1.62e+00

Pos logit contributions
 wouldn+0.002
 would+0.001
 got+0.001
 never+0.001
 didn+0.001
Neg logit contributions
 films-0.002
umm-0.002
 circles-0.002
andel-0.002
 sym-0.002
Token The
Token ID383
Feature activation-8.62e+00

Pos logit contributions
umm+0.035
 films+0.035
andel+0.035
 circles+0.034
 sym+0.034
Neg logit contributions
 wouldn-0.034
 would-0.033
 got-0.033
 never-0.031
 didn-0.031
Token lizard
Token ID42406
Feature activation+7.42e-01

Pos logit contributions
umm+0.128
 films+0.125
 sym+0.121
andel+0.121
 array+0.113
Neg logit contributions
 would-0.238
 wouldn-0.238
 got-0.238
 never-0.231
 didn-0.226
Token did
Token ID750
Feature activation+4.25e+00

Pos logit contributions
 wouldn+0.009
use+0.008
 managed+0.008
 never+0.008
wed+0.007
Neg logit contributions
 lions-0.018
 monkeys-0.018
 circles-0.018
 elephants-0.018
 jungle-0.017
Token not
Token ID407
Feature activation+4.57e+00

Pos logit contributions
 wouldn+0.065
use+0.059
 would+0.058
 got+0.052
'd+0.051
Neg logit contributions
 circles-0.105
 monkeys-0.102
 lions-0.100
 rhythm-0.099
 elderly-0.098
Token "
Token ID366
Feature activation-7.31e-01

Pos logit contributions
 films+0.024
umm+0.024
 circles+0.024
andel+0.024
 sym+0.023
Neg logit contributions
 wouldn-0.028
 would-0.027
 got-0.026
 never-0.026
 didn-0.025
TokenThis
Token ID1212
Feature activation-2.67e+00

Pos logit contributions
 circles+0.015
umm+0.015
 films+0.014
 Wheels+0.014
andel+0.014
Neg logit contributions
 wouldn-0.016
 would-0.015
 got-0.014
 fell-0.013
 didn-0.013
Token is
Token ID318
Feature activation-2.06e-02

Pos logit contributions
 circles+0.058
 lions+0.057
opl+0.057
 monkeys+0.056
 films+0.056
Neg logit contributions
 wouldn-0.051
 would-0.046
 got-0.046
 forgot-0.045
 never-0.044
Token my
Token ID616
Feature activation-6.06e+00

Pos logit contributions
 films+0.000
 circles+0.000
umm+0.000
andel+0.000
 sym+0.000
Neg logit contributions
 wouldn-0.000
 would-0.000
 got-0.000
 never-0.000
 didn-0.000
Token bell
Token ID8966
Feature activation+8.41e-01

Pos logit contributions
 films+0.110
andel+0.109
umm+0.108
 circles+0.104
 sym+0.101
Neg logit contributions
 wouldn-0.152
 would-0.148
 never-0.146
 got-0.142
 didn-0.137
Token and
Token ID290
Feature activation+4.69e+00

Pos logit contributions
 wouldn+0.016
 would+0.015
 got+0.015
 never+0.015
 didn+0.014
Neg logit contributions
 films-0.019
 circles-0.019
umm-0.018
 lions-0.018
 sym-0.018
Token I
Token ID314
Feature activation+6.02e+00

Pos logit contributions
 wouldn+0.071
 got+0.067
 would+0.066
 never+0.062
 managed+0.061
Neg logit contributions
 circles-0.119
 films-0.117
 lions-0.114
 elephants-0.112
 elderly-0.112
Token don
Token ID836
Feature activation+2.85e+00

Pos logit contributions
 wouldn+0.069
 listened+0.064
 never+0.063
 would+0.057
 got+0.057
Neg logit contributions
 sym-0.145
 circles-0.137
 elderly-0.137
opl-0.135
 zoo-0.134
Token't
Token ID470
Feature activation+4.48e+00

Pos logit contributions
 wouldn+0.041
 would+0.039
 got+0.036
 never+0.035
 didn+0.034
Neg logit contributions
 films-0.078
 circles-0.078
umm-0.076
andel-0.076
 Wheels-0.075
Token want
Token ID765
Feature activation+2.16e+00

Pos logit contributions
 wouldn+0.069
 would+0.063
 could+0.053
 never+0.053
 didn+0.053
Neg logit contributions
 circles-0.112
andel-0.109
 monkeys-0.109
 orchestra-0.109
 elderly-0.108
Token you
Token ID345
Feature activation+5.25e+00

Pos logit contributions
 wouldn+0.026
 would+0.024
 got+0.022
 never+0.022
 didn+0.021
Neg logit contributions
 circles-0.065
 films-0.065
umm-0.064
andel-0.064
 sym-0.063
Token and
Token ID290
Feature activation+3.99e+00

Pos logit contributions
 wouldn+0.023
 would+0.022
 never+0.021
 got+0.021
 didn+0.020
Neg logit contributions
umm-0.031
 films-0.031
andel-0.030
 circles-0.030
 sym-0.030
Token he
Token ID339
Feature activation+7.85e+00

Pos logit contributions
 wouldn+0.061
 would+0.056
 got+0.056
 never+0.053
 didn+0.051
Neg logit contributions
 circles-0.102
 films-0.101
 lions-0.099
 monkeys-0.098
umm-0.098
Token lived
Token ID5615
Feature activation-1.11e+00

Pos logit contributions
 wouldn+0.077
 would+0.062
 got+0.061
 never+0.057
use+0.057
Neg logit contributions
 circles-0.230
 lions-0.229
 monkeys-0.227
 sym-0.226
 films-0.226
Token far
Token ID1290
Feature activation-1.54e+00

Pos logit contributions
umm+0.030
 films+0.030
andel+0.029
 circles+0.029
 sym+0.029
Neg logit contributions
 wouldn-0.017
 would-0.016
 never-0.016
 got-0.016
 didn-0.014
Token away
Token ID1497
Feature activation-4.53e-01

Pos logit contributions
umm+0.048
 films+0.048
 circles+0.047
 sym+0.047
andel+0.047
Neg logit contributions
 wouldn-0.017
 would-0.015
 got-0.015
 never-0.015
 didn-0.013
Token in
Token ID287
Feature activation-2.44e+00

Pos logit contributions
umm+0.012
andel+0.012
 films+0.011
 sym+0.011
 Wheels+0.011
Neg logit contributions
 never-0.007
 wouldn-0.007
 would-0.007
 got-0.007
 didn-0.007
Token a
Token ID257
Feature activation-5.56e+00

Pos logit contributions
umm+0.062
andel+0.060
 sym+0.060
 films+0.058
 array+0.057
Neg logit contributions
 never-0.043
 got-0.041
 would-0.040
 wouldn-0.039
 tried-0.037
Token distant
Token ID12899
Feature activation-6.71e+00

Pos logit contributions
umm+0.106
andel+0.102
ums+0.100
 films+0.097
ement+0.097
Neg logit contributions
 would-0.134
 never-0.134
 got-0.132
 wouldn-0.125
 fell-0.121
Token forest
Token ID8222
Feature activation+7.54e-02

Pos logit contributions
umm+0.111
andel+0.105
 films+0.104
ums+0.103
 Wheels+0.101
Neg logit contributions
 would-0.174
 got-0.172
 never-0.171
 wouldn-0.165
 didn-0.158
Token.
Token ID13
Feature activation-1.29e+00

Pos logit contributions
 never+0.001
 wouldn+0.001
 would+0.001
 got+0.001
 didn+0.001
Neg logit contributions
umm-0.002
andel-0.002
 films-0.002
 sym-0.002
 Wheels-0.002
Token (space)
Token ID220
Feature activation-1.65e+00

Pos logit contributions
umm+0.035
andel+0.034
 films+0.033
 sym+0.033
 orchestra+0.032
Neg logit contributions
 never-0.020
 wouldn-0.019
 would-0.019
 got-0.019
 could-0.018
Token and
Token ID290
Feature activation+4.80e+00

Pos logit contributions
umm+0.015
andel+0.014
opl+0.014
 films+0.013
 Wheels+0.013
Neg logit contributions
 would-0.007
 wouldn-0.007
 got-0.007
 never-0.007
 didn-0.006
Token jump
Token ID4391
Feature activation+1.61e+00

Pos logit contributions
 wouldn+0.101
 got+0.093
 would+0.092
 never+0.091
 didn+0.087
Neg logit contributions
 circles-0.101
 films-0.099
 sym-0.095
 orchestra-0.093
umm-0.093
Token around
Token ID1088
Feature activation+1.92e+00

Pos logit contributions
 would+0.022
 wouldn+0.022
 got+0.021
 never+0.020
 didn+0.019
Neg logit contributions
umm-0.047
andel-0.045
 films-0.045
opl-0.044
 circles-0.044
Token,
Token ID11
Feature activation+1.82e+00

Pos logit contributions
 would+0.026
 never+0.024
 wouldn+0.024
 got+0.023
 didn+0.022
Neg logit contributions
umm-0.057
andel-0.055
opl-0.053
 Wheels-0.053
ums-0.053
Token especially
Token ID2592
Feature activation+8.22e-01

Pos logit contributions
 wouldn+0.037
 would+0.036
 got+0.035
 never+0.034
 didn+0.033
Neg logit contributions
 films-0.039
 circles-0.039
umm-0.038
andel-0.038
 sym-0.038
Token when
Token ID618
Feature activation+3.30e+00

Pos logit contributions
 wouldn+0.013
 would+0.013
 got+0.013
 never+0.012
 didn+0.012
Neg logit contributions
umm-0.022
andel-0.021
 films-0.021
ement-0.021
 sym-0.021
Token she
Token ID673
Feature activation+6.50e+00

Pos logit contributions
 wouldn+0.046
 would+0.044
 got+0.042
 never+0.042
 didn+0.039
Neg logit contributions
umm-0.093
andel-0.092
 films-0.092
 circles-0.090
 sym-0.090
Token was
Token ID373
Feature activation+2.27e+00

Pos logit contributions
 wouldn+0.079
 never+0.068
 would+0.066
 got+0.062
 didn+0.058
Neg logit contributions
 circles-0.189
 sym-0.185
 films-0.184
 dancers-0.179
 Wheels-0.178
Token feeling
Token ID4203
Feature activation+1.83e+00

Pos logit contributions
 never+0.051
 got+0.050
 wouldn+0.050
 would+0.049
 didn+0.046
Neg logit contributions
umm-0.050
andel-0.044
 films-0.044
opl-0.043
ement-0.043
Token happy
Token ID3772
Feature activation-1.54e+00

Pos logit contributions
 wouldn+0.036
 would+0.035
 got+0.035
 never+0.034
 didn+0.033
Neg logit contributions
umm-0.042
 films-0.040
andel-0.040
 circles-0.040
 sym-0.039
Token.
Token ID13
Feature activation-8.77e-01

Pos logit contributions
umm+0.039
andel+0.039
 films+0.039
 Wheels+0.037
 circles+0.037
Neg logit contributions
 wouldn-0.026
 would-0.025
 got-0.025
 never-0.024
 didn-0.023
Token her
Token ID607
Feature activation-4.99e+00

Pos logit contributions
 never+0.019
 would+0.018
 wouldn+0.018
 got+0.018
 didn+0.016
Neg logit contributions
umm-0.039
 sym-0.038
 films-0.038
opl-0.038
 lions-0.037
Token thumb
Token ID15683
Feature activation-8.04e-01

Pos logit contributions
umm+0.097
 films+0.096
andel+0.092
 sym+0.091
 lions+0.091
Neg logit contributions
 wouldn-0.115
 would-0.112
 got-0.110
 never-0.110
 didn-0.103
Token and
Token ID290
Feature activation+4.90e+00

Pos logit contributions
 films+0.019
umm+0.019
andel+0.019
 sym+0.019
 circles+0.018
Neg logit contributions
 wouldn-0.015
 would-0.015
 got-0.014
 never-0.014
 didn-0.013
Token spread
Token ID4104
Feature activation+1.16e+00

Pos logit contributions
 wouldn+0.067
 would+0.063
 never+0.061
 got+0.060
 didn+0.056
Neg logit contributions
 circles-0.134
 films-0.127
 squares-0.125
 sym-0.125
umm-0.123
Token it
Token ID340
Feature activation+5.13e+00

Pos logit contributions
 wouldn+0.013
 would+0.013
 got+0.012
 never+0.012
 didn+0.011
Neg logit contributions
umm-0.036
 films-0.036
 sym-0.035
andel-0.035
 circles-0.034
Token around
Token ID1088
Feature activation+8.64e-01

Pos logit contributions
 wouldn+0.085
 would+0.080
 got+0.078
 never+0.078
 didn+0.076
Neg logit contributions
 circles-0.129
umm-0.125
 films-0.125
andel-0.124
 sym-0.123
Token the
Token ID262
Feature activation-1.14e+01

Pos logit contributions
 would+0.011
 wouldn+0.011
 never+0.010
 got+0.010
 didn+0.009
Neg logit contributions
umm-0.026
opl-0.025
 films-0.025
andel-0.025
 Wheels-0.025
Token paper
Token ID3348
Feature activation-1.76e+00

Pos logit contributions
ement+0.186
 films+0.186
umm+0.181
andel+0.178
opl+0.176
Neg logit contributions
 would-0.282
 got-0.282
 never-0.280
 had-0.271
 wouldn-0.268
Token.
Token ID13
Feature activation-1.77e+00

Pos logit contributions
 films+0.047
andel+0.046
umm+0.046
 amusement+0.045
 sym+0.045
Neg logit contributions
 would-0.028
 wouldn-0.028
 got-0.026
 never-0.026
 could-0.024
Token She
Token ID1375
Feature activation+7.19e+00

Pos logit contributions
 films+0.044
 flock+0.043
 sym+0.043
 amusement+0.042
 lions+0.042
Neg logit contributions
 wouldn-0.030
 got-0.029
 would-0.029
 could-0.027
 never-0.027
Token gigg
Token ID30442
Feature activation-2.79e+00

Pos logit contributions
 wouldn+0.088
use+0.078
 never+0.077
 would+0.076
 got+0.073
Neg logit contributions
 circles-0.198
 sym-0.187
 squares-0.187
 array-0.183
 films-0.182
Token go
Token ID467
Feature activation+2.08e+00

Pos logit contributions
 wouldn+0.086
 would+0.080
 got+0.078
 didn+0.071
 managed+0.071
Neg logit contributions
 circles-0.145
 films-0.142
 amusement-0.140
 elderly-0.139
 lions-0.139
Token back
Token ID736
Feature activation+8.17e-01

Pos logit contributions
 wouldn+0.042
 would+0.038
 didn+0.037
 got+0.035
 forgot+0.034
Neg logit contributions
 circles-0.048
 elderly-0.045
 films-0.044
 lions-0.044
 monkeys-0.044
Token home
Token ID1363
Feature activation+2.72e+00

Pos logit contributions
 wouldn+0.012
 would+0.011
 got+0.011
 never+0.011
 didn+0.011
Neg logit contributions
 circles-0.022
 films-0.022
umm-0.021
andel-0.021
 lions-0.021
Token."
Token ID526
Feature activation-1.62e+00

Pos logit contributions
 wouldn+0.041
 would+0.038
 got+0.037
 never+0.036
 didn+0.035
Neg logit contributions
 films-0.073
 circles-0.072
umm-0.071
andel-0.070
 sym-0.070
Token \n
Token ID198
Feature activation-4.50e+00

Pos logit contributions
umm+0.039
 films+0.039
andel+0.038
 circles+0.038
 sym+0.038
Neg logit contributions
 wouldn-0.029
 would-0.028
 got-0.028
 never-0.027
 didn-0.026
Token \n
Token ID198
Feature activation-4.73e+00

Pos logit contributions
umm+0.122
 films+0.117
andel+0.116
 sym+0.112
 circles+0.112
Neg logit contributions
 never-0.069
 wouldn-0.069
 would-0.069
 got-0.068
 didn-0.064
TokenBut
Token ID1537
Feature activation+6.96e-01

Pos logit contributions
umm+0.112
 films+0.111
andel+0.109
 circles+0.105
 sym+0.104
Neg logit contributions
 wouldn-0.089
 never-0.088
 would-0.087
 got-0.087
 didn-0.082
Token Tom
Token ID4186
Feature activation+2.62e+00

Pos logit contributions
 wouldn+0.013
 would+0.013
 got+0.013
 never+0.012
 didn+0.012
Neg logit contributions
 films-0.016
umm-0.016
 circles-0.016
andel-0.016
 sym-0.015
Token did
Token ID750
Feature activation+4.48e+00

Pos logit contributions
 wouldn+0.025
 would+0.023
 never+0.022
 got+0.021
use+0.021
Neg logit contributions
 circles-0.075
ement-0.073
 elderly-0.073
 lions-0.072
 monkeys-0.072
Token not
Token ID407
Feature activation+4.70e+00

Pos logit contributions
 wouldn+0.051
 would+0.051
 got+0.044
 managed+0.041
 could+0.041
Neg logit contributions
 films-0.126
 circles-0.126
 amusement-0.123
 lions-0.122
 elderly-0.121
Token listen
Token ID6004
Feature activation+5.25e-01

Pos logit contributions
 wouldn+0.075
 would+0.067
 didn+0.062
 never+0.061
 could+0.060
Neg logit contributions
 circles-0.114
 films-0.113
 dancers-0.112
andel-0.111
 orchestra-0.111
Token.
Token ID13
Feature activation-1.08e+00

Pos logit contributions
 wouldn+0.003
 would+0.003
 got+0.003
 never+0.003
 didn+0.003
Neg logit contributions
 films-0.004
 circles-0.004
umm-0.004
andel-0.004
 sym-0.004
Token He
Token ID679
Feature activation+7.11e+00

Pos logit contributions
 orchestra+0.026
 sym+0.026
andel+0.026
umm+0.026
opl+0.026
Neg logit contributions
 wouldn-0.019
 got-0.019
 didn-0.018
 could-0.018
 would-0.018
Token was
Token ID373
Feature activation+1.33e+00

Pos logit contributions
 wouldn+0.068
 never+0.059
use+0.057
 would+0.056
 managed+0.053
Neg logit contributions
 lions-0.192
 circles-0.190
 monkeys-0.189
 films-0.188
 Wheels-0.185
Token playing
Token ID2712
Feature activation-8.01e-01

Pos logit contributions
 wouldn+0.023
 never+0.023
 got+0.022
 would+0.022
 didn+0.021
Neg logit contributions
umm-0.034
 films-0.033
andel-0.033
 sym-0.032
 circles-0.032
Token with
Token ID351
Feature activation-2.01e+00

Pos logit contributions
umm+0.023
 films+0.022
andel+0.022
 circles+0.022
 sym+0.022
Neg logit contributions
 wouldn-0.011
 would-0.011
 got-0.011
 never-0.011
 didn-0.010
Token a
Token ID257
Feature activation-4.36e+00

Pos logit contributions
umm+0.055
 sym+0.053
andel+0.053
 films+0.052
 orchestra+0.052
Neg logit contributions
 wouldn-0.032
 never-0.031
 would-0.031
 got-0.030
 didn-0.028
Token toy
Token ID13373
Feature activation-2.99e+00

Pos logit contributions
umm+0.089
 films+0.084
andel+0.082
opl+0.080
ums+0.080
Neg logit contributions
 would-0.103
 got-0.101
 wouldn-0.101
 never-0.096
 didn-0.093
Token and
Token ID290
Feature activation+3.41e+00

Pos logit contributions
umm+0.063
andel+0.061
 films+0.061
 sym+0.061
 circles+0.060
Neg logit contributions
 wouldn-0.066
 would-0.065
 got-0.063
 never-0.061
 could-0.059
Token he
Token ID339
Feature activation+6.67e+00

Pos logit contributions
 wouldn+0.041
 would+0.037
 got+0.035
 never+0.033
use+0.031
Neg logit contributions
 circles-0.096
 lions-0.094
 films-0.093
 monkeys-0.092
 elephants-0.092
Token didn
Token ID1422
Feature activation+3.74e+00

Pos logit contributions
 wouldn+0.055
use+0.050
 would+0.047
 never+0.043
 got+0.043
Neg logit contributions
 circles-0.197
 lions-0.191
 sym-0.189
 films-0.189
 monkeys-0.188
Token't
Token ID470
Feature activation+5.04e+00

Pos logit contributions
 wouldn+0.049
 would+0.046
 got+0.044
 never+0.042
 didn+0.040
Neg logit contributions
 films-0.107
 circles-0.107
umm-0.106
andel-0.105
 sym-0.104
Token loved
Token ID6151
Feature activation-3.35e+00

Pos logit contributions
 wouldn+0.125
use+0.111
 managed+0.108
ried+0.104
 refused+0.099
Neg logit contributions
 films-0.129
 elderly-0.127
 music-0.126
 sym-0.125
 lions-0.123
Token to
Token ID284
Feature activation+2.03e+00

Pos logit contributions
umm+0.104
andel+0.102
 circles+0.101
 films+0.101
 sym+0.101
Neg logit contributions
 wouldn-0.037
 would-0.036
 got-0.035
 never-0.034
 didn-0.031
Token play
Token ID711
Feature activation-1.68e+00

Pos logit contributions
 wouldn+0.036
 would+0.034
 got+0.033
 never+0.033
 didn+0.031
Neg logit contributions
 films-0.049
umm-0.049
 circles-0.049
andel-0.048
 sym-0.048
Token outside
Token ID2354
Feature activation+4.54e-01

Pos logit contributions
umm+0.043
andel+0.041
 sym+0.040
 films+0.040
 circles+0.040
Neg logit contributions
 wouldn-0.029
 would-0.028
 got-0.028
 never-0.027
 didn-0.026
Token in
Token ID287
Feature activation-1.49e+00

Pos logit contributions
 wouldn+0.006
 would+0.006
 got+0.006
 never+0.006
 didn+0.005
Neg logit contributions
umm-0.013
 films-0.013
andel-0.013
 sym-0.013
 circles-0.012
Token the
Token ID262
Feature activation-9.42e+00

Pos logit contributions
umm+0.040
andel+0.038
 sym+0.038
 films+0.037
ement+0.037
Neg logit contributions
 got-0.023
 never-0.023
 would-0.023
 wouldn-0.022
 tried-0.021
Token grass
Token ID8701
Feature activation-1.91e+00

Pos logit contributions
umm+0.143
andel+0.142
ement+0.141
 films+0.141
 orchestra+0.130
Neg logit contributions
 got-0.255
 would-0.253
 never-0.245
 fell-0.235
 found-0.234
Token.
Token ID13
Feature activation-2.00e+00

Pos logit contributions
umm+0.048
 films+0.047
andel+0.047
 sym+0.047
 orchestra+0.046
Neg logit contributions
 would-0.033
 wouldn-0.032
 never-0.032
 got-0.032
 didn-0.029
Token One
Token ID1881
Feature activation-3.25e+00

Pos logit contributions
umm+0.049
 films+0.047
andel+0.047
 sym+0.047
 orchestra+0.046
Neg logit contributions
 would-0.037
 wouldn-0.035
 got-0.035
 never-0.033
 didn-0.032
Token day
Token ID1110
Feature activation+1.80e+00

Pos logit contributions
 films+0.064
umm+0.063
 circles+0.063
andel+0.062
 sym+0.062
Neg logit contributions
 wouldn-0.073
 would-0.069
 got-0.068
 never-0.067
 didn-0.065
Token,
Token ID11
Feature activation-5.18e-01

Pos logit contributions
 wouldn+0.039
 never+0.037
 would+0.036
 managed+0.034
 got+0.033
Neg logit contributions
 circles-0.035
 films-0.035
andel-0.035
 sym-0.033
 lions-0.033
Token a
Token ID257
Feature activation-5.25e+00

Pos logit contributions
 films+0.037
 circles+0.037
 sym+0.035
umm+0.035
 lions+0.034
Neg logit contributions
 wouldn-0.040
 would-0.037
 got-0.036
 never-0.036
 didn-0.035
Token time
Token ID640
Feature activation-1.89e-01

Pos logit contributions
 films+0.097
 circles+0.095
 sym+0.094
umm+0.094
andel+0.093
Neg logit contributions
 wouldn-0.124
 would-0.116
 got-0.114
 never-0.112
 didn-0.109
Token,
Token ID11
Feature activation-2.40e+00

Pos logit contributions
 circles+0.004
 films+0.004
umm+0.004
 sym+0.004
andel+0.004
Neg logit contributions
 wouldn-0.004
 would-0.003
 got-0.003
 never-0.003
 didn-0.003
Token there
Token ID612
Feature activation+3.86e+00

Pos logit contributions
 circles+0.050
 films+0.049
 lions+0.049
 Wheels+0.048
 sym+0.047
Neg logit contributions
 wouldn-0.048
 would-0.041
 never-0.040
 promised-0.040
 forgot-0.039
Token was
Token ID373
Feature activation-2.03e+00

Pos logit contributions
 wouldn+0.050
 would+0.042
 never+0.042
 got+0.041
 managed+0.039
Neg logit contributions
 circles-0.109
 films-0.106
 sym-0.105
umm-0.104
 lions-0.104
Token a
Token ID257
Feature activation-5.17e+00

Pos logit contributions
umm+0.060
andel+0.057
 orchestra+0.056
ement+0.055
opl+0.055
Neg logit contributions
 never-0.028
 got-0.027
 would-0.026
 didn-0.025
 found-0.024
Token smart
Token ID4451
Feature activation-6.35e+00

Pos logit contributions
umm+0.107
opl+0.103
 circles+0.103
andel+0.100
ement+0.099
Neg logit contributions
 would-0.119
 got-0.116
 wouldn-0.111
 never-0.111
 didn-0.110
Token little
Token ID1310
Feature activation-6.91e+00

Pos logit contributions
umm+0.126
 circles+0.125
opl+0.120
ement+0.119
 amaz+0.118
Neg logit contributions
 would-0.147
 got-0.146
 wouldn-0.138
 didn-0.135
 never-0.132
Token dog
Token ID3290
Feature activation-1.21e+00

Pos logit contributions
opl+0.141
 circles+0.140
ement+0.137
umm+0.136
 sym+0.135
Neg logit contributions
 would-0.152
 got-0.150
 wouldn-0.141
 fell-0.138
 didn-0.132
Token named
Token ID3706
Feature activation-3.78e+00

Pos logit contributions
umm+0.031
andel+0.030
 films+0.030
 circles+0.030
 sym+0.029
Neg logit contributions
 wouldn-0.021
 would-0.020
 got-0.020
 never-0.020
 didn-0.018
Token Max
Token ID5436
Feature activation+1.81e+00

Pos logit contributions
umm+0.109
andel+0.107
ement+0.106
 orchestra+0.103
 sym+0.102
Neg logit contributions
 would-0.058
 got-0.056
 wouldn-0.056
 never-0.055
 didn-0.052
Token daddy
Token ID35822
Feature activation+4.93e+00

Pos logit contributions
 wouldn+0.116
 would+0.109
 never+0.108
 got+0.102
 managed+0.102
Neg logit contributions
 films-0.079
 elderly-0.075
 lions-0.073
 dancers-0.072
 circles-0.071
Token agreed
Token ID4987
Feature activation+1.63e+00

Pos logit contributions
 wouldn+0.056
 never+0.053
 would+0.046
 managed+0.046
 fell+0.045
Neg logit contributions
 films-0.128
 circles-0.127
 lions-0.124
 sym-0.124
 dancers-0.122
Token and
Token ID290
Feature activation+5.49e+00

Pos logit contributions
 wouldn+0.030
 would+0.028
 got+0.027
 never+0.026
 managed+0.026
Neg logit contributions
 circles-0.036
 films-0.036
 sym-0.034
umm-0.034
 elderly-0.033
Token they
Token ID484
Feature activation+8.53e+00

Pos logit contributions
 wouldn+0.067
 would+0.063
 never+0.059
 got+0.056
'd+0.053
Neg logit contributions
 films-0.145
 circles-0.144
 elderly-0.143
umm-0.139
 sym-0.139
Token found
Token ID1043
Feature activation+1.88e+00

Pos logit contributions
 wouldn+0.083
 never+0.079
use+0.075
 would+0.071
 got+0.065
Neg logit contributions
 sym-0.217
 elderly-0.217
 dancers-0.214
 circles-0.214
 films-0.213
Token a
Token ID257
Feature activation-3.46e+00

Pos logit contributions
 wouldn+0.034
 would+0.032
 got+0.031
 never+0.031
 didn+0.029
Neg logit contributions
umm-0.048
 sym-0.046
 films-0.046
andel-0.046
opl-0.045
Token spot
Token ID4136
Feature activation+4.63e-01

Pos logit contributions
umm+0.063
andel+0.060
 films+0.059
opl+0.058
 circles+0.058
Neg logit contributions
 would-0.089
 wouldn-0.087
 got-0.085
 never-0.083
 didn-0.079
Token to
Token ID284
Feature activation+2.31e+00

Pos logit contributions
 wouldn+0.008
 would+0.007
 got+0.007
 never+0.007
 didn+0.007
Neg logit contributions
 circles-0.012
 films-0.011
umm-0.011
andel-0.011
 sym-0.011
Token sit
Token ID1650
Feature activation-1.74e+00

Pos logit contributions
 wouldn+0.041
 would+0.038
 got+0.037
 never+0.037
 didn+0.035
Neg logit contributions
 films-0.056
 circles-0.056
umm-0.055
andel-0.055
 sym-0.055
Token and
Token ID290
Feature activation+5.25e+00

Pos logit contributions
 films+0.043
 circles+0.042
umm+0.042
andel+0.041
 sym+0.041
Neg logit contributions
 wouldn-0.030
 would-0.028
 got-0.027
 never-0.027
 didn-0.026
Token watch
Token ID2342
Feature activation+1.21e+00

Pos logit contributions
 wouldn+0.090
 would+0.080
 never+0.077
 didn+0.075
 got+0.074
Neg logit contributions
 circles-0.121
 amusement-0.117
 sym-0.116
 films-0.116
 playground-0.116
Token.
Token ID13
Feature activation-1.58e+00

Pos logit contributions
 wouldn+0.027
 would+0.026
 got+0.025
 never+0.024
 didn+0.021
Neg logit contributions
umm-0.078
 films-0.077
andel-0.077
 sym-0.076
 circles-0.076
Token He
Token ID679
Feature activation+7.18e+00

Pos logit contributions
 films+0.039
andel+0.038
umm+0.038
 sym+0.038
 circles+0.037
Neg logit contributions
 wouldn-0.028
 would-0.027
 got-0.027
 never-0.026
 didn-0.025
Token liked
Token ID8288
Feature activation-2.36e+00

Pos logit contributions
 wouldn+0.080
 never+0.065
use+0.064
 don+0.064
 would+0.061
Neg logit contributions
 elderly-0.190
 circles-0.187
 lions-0.185
 Wheels-0.185
 monkeys-0.182
Token to
Token ID284
Feature activation+2.28e+00

Pos logit contributions
umm+0.077
andel+0.075
 sym+0.075
ement+0.072
 films+0.072
Neg logit contributions
 would-0.024
 wouldn-0.024
 never-0.023
 got-0.023
 didn-0.020
Token explore
Token ID7301
Feature activation+8.65e-01

Pos logit contributions
 wouldn+0.040
 would+0.037
 got+0.036
 never+0.035
 didn+0.034
Neg logit contributions
 films-0.056
 circles-0.056
umm-0.054
andel-0.054
 sym-0.054
Token the
Token ID262
Feature activation-1.05e+01

Pos logit contributions
 would+0.014
 got+0.013
 wouldn+0.013
 never+0.013
 didn+0.012
Neg logit contributions
umm-0.025
andel-0.023
 sym-0.022
ums-0.022
 Wheels-0.022
Token woods
Token ID16479
Feature activation+4.44e-01

Pos logit contributions
ement+0.173
umm+0.166
andel+0.166
ums+0.158
 unison+0.155
Neg logit contributions
 would-0.284
 never-0.266
 will-0.260
 got-0.255
 wouldn-0.247
Token near
Token ID1474
Feature activation-2.77e+00

Pos logit contributions
 wouldn+0.008
 would+0.007
 never+0.007
 got+0.007
 didn+0.007
Neg logit contributions
umm-0.011
andel-0.011
 films-0.011
 sym-0.011
 circles-0.011
Token his
Token ID465
Feature activation-6.69e+00

Pos logit contributions
umm+0.092
 sym+0.089
andel+0.086
pill+0.083
 orchestra+0.083
Neg logit contributions
 never-0.031
 got-0.030
 would-0.029
 wouldn-0.026
 didn-0.026
Token hut
Token ID40812
Feature activation-1.88e-01

Pos logit contributions
andel+0.150
umm+0.147
ement+0.147
 sym+0.146
pill+0.144
Neg logit contributions
 would-0.135
 never-0.133
 got-0.133
 fell-0.122
 wouldn-0.121
Token,
Token ID11
Feature activation+1.31e+00

Pos logit contributions
andel+0.005
umm+0.005
 films+0.005
 sym+0.005
 orchestra+0.004
Neg logit contributions
 would-0.003
 wouldn-0.003
 never-0.003
 got-0.003
 could-0.003
Token.
Token ID13
Feature activation-1.35e+00

Pos logit contributions
 wouldn+0.059
 would+0.056
 got+0.055
 never+0.054
 didn+0.052
Neg logit contributions
 films-0.060
 circles-0.060
umm-0.059
andel-0.058
 sym-0.057
Token They
Token ID1119
Feature activation+7.64e+00

Pos logit contributions
umm+0.037
 flock+0.036
andel+0.036
 sym+0.036
 films+0.036
Neg logit contributions
 would-0.019
 wouldn-0.019
 could-0.018
 got-0.018
 never-0.017
Token play
Token ID711
Feature activation-6.34e-01

Pos logit contributions
 wouldn+0.125
 would+0.112
 got+0.111
 never+0.110
 didn+0.107
Neg logit contributions
 circles-0.196
 films-0.193
umm-0.186
 lions-0.185
andel-0.185
Token with
Token ID351
Feature activation-1.28e+00

Pos logit contributions
umm+0.016
andel+0.015
 sym+0.015
 films+0.015
opl+0.015
Neg logit contributions
 wouldn-0.011
 would-0.011
 got-0.011
 never-0.011
 didn-0.010
Token the
Token ID262
Feature activation-9.58e+00

Pos logit contributions
umm+0.035
 sym+0.034
andel+0.033
opl+0.032
 orchestra+0.031
Neg logit contributions
 got-0.022
 wouldn-0.022
 would-0.022
 never-0.021
 didn-0.020
Token pupp
Token ID13595
Feature activation-1.08e+01

Pos logit contributions
umm+0.193
andel+0.176
 sym+0.170
ums+0.170
opl+0.161
Neg logit contributions
 would-0.242
 wouldn-0.227
 got-0.221
 will-0.220
 had-0.209
Tokenets
Token ID1039
Feature activation+5.13e+00

Pos logit contributions
umm+0.296
 amusement+0.287
 elderly+0.286
 sym+0.282
 amaz+0.280
Neg logit contributions
 would-0.157
 got-0.148
 wouldn-0.144
 never-0.140
 could-0.137
Token,
Token ID11
Feature activation+1.61e+00

Pos logit contributions
 would+0.080
 wouldn+0.080
 never+0.077
 got+0.076
 didn+0.068
Neg logit contributions
umm-0.144
andel-0.139
 films-0.137
 sym-0.133
 Wheels-0.131
Token wear
Token ID5806
Feature activation+1.88e+00

Pos logit contributions
 wouldn+0.036
 would+0.035
 got+0.034
 never+0.034
 didn+0.032
Neg logit contributions
umm-0.034
andel-0.032
 sym-0.031
 films-0.031
ement-0.031
Token the
Token ID262
Feature activation-9.41e+00

Pos logit contributions
 would+0.042
 got+0.041
 wouldn+0.041
 never+0.040
 didn+0.038
Neg logit contributions
umm-0.042
andel-0.037
 sym-0.036
ement-0.036
opl-0.035
Token costumes
Token ID24138
Feature activation+1.55e+00

Pos logit contributions
umm+0.191
andel+0.158
opl+0.157
 sym+0.156
ums+0.156
Neg logit contributions
 would-0.237
 wouldn-0.227
 got-0.222
 will-0.219
 had-0.212
Token?"
Token ID1701
Feature activation-2.23e+00

Pos logit contributions
 wouldn+0.062
 never+0.057
 would+0.056
 got+0.055
 didn+0.053
Neg logit contributions
 films-0.092
 circles-0.092
 sym-0.089
 elderly-0.088
umm-0.088
Token
Token ID220
Feature activation-3.27e+00

Pos logit contributions
 films+0.054
umm+0.054
andel+0.053
 circles+0.053
 sym+0.052
Neg logit contributions
 wouldn-0.040
 would-0.038
 got-0.038
 never-0.037
 didn-0.035
Token\n
Token ID198
Feature activation-5.13e+00

Pos logit contributions
 films+0.083
umm+0.083
andel+0.081
 sym+0.080
 circles+0.080
Neg logit contributions
 wouldn-0.055
 would-0.052
 never-0.051
 got-0.051
 didn-0.048
Token\n
Token ID198
Feature activation-5.22e+00

Pos logit contributions
umm+0.141
 films+0.138
andel+0.134
 sym+0.130
 array+0.130
Neg logit contributions
 never-0.079
 got-0.077
 would-0.077
 wouldn-0.076
 didn-0.069
TokenThe
Token ID464
Feature activation-1.02e+01

Pos logit contributions
umm+0.111
 films+0.110
andel+0.108
 circles+0.103
 sym+0.102
Neg logit contributions
 never-0.111
 got-0.109
 wouldn-0.108
 would-0.108
 didn-0.102
Token elderly
Token ID13830
Feature activation-1.26e+01

Pos logit contributions
 sym+0.190
umm+0.184
opl+0.177
ement+0.165
ickets+0.163
Neg logit contributions
 got-0.257
 never-0.254
 tried-0.243
 had-0.228
 took-0.227
Token person
Token ID1048
Feature activation+2.01e+00

Pos logit contributions
umm+0.241
 sym+0.237
 films+0.229
andel+0.229
 circles+0.226
Neg logit contributions
 got-0.296
 never-0.291
 would-0.282
 wouldn-0.280
 tried-0.271
Token smiled
Token ID13541
Feature activation+3.26e-01

Pos logit contributions
 wouldn+0.031
 never+0.029
 would+0.029
 got+0.028
 fell+0.026
Neg logit contributions
 circles-0.050
 films-0.049
 sym-0.048
ement-0.048
umm-0.048
Token and
Token ID290
Feature activation+4.86e+00

Pos logit contributions
 wouldn+0.006
 would+0.005
 never+0.005
 got+0.005
 didn+0.005
Neg logit contributions
 circles-0.008
 films-0.008
umm-0.008
 sym-0.008
andel-0.008
Token nodded
Token ID14464
Feature activation+2.95e+00

Pos logit contributions
 wouldn+0.074
 would+0.071
 never+0.069
 got+0.066
 didn+0.062
Neg logit contributions
 circles-0.123
 films-0.122
 sym-0.122
umm-0.121
andel-0.119
Token.
Token ID13
Feature activation-1.36e+00

Pos logit contributions
 wouldn+0.061
 would+0.058
 got+0.057
 never+0.057
 didn+0.054
Neg logit contributions
 circles-0.061
 films-0.061
umm-0.059
 sym-0.059
andel-0.058
Token says
Token ID1139
Feature activation+7.59e-01

Pos logit contributions
 wouldn+0.048
 never+0.044
 managed+0.043
 got+0.041
 didn+0.041
Neg logit contributions
 films-0.052
 circles-0.051
 lions-0.049
 elderly-0.049
 Wheels-0.048
Token they
Token ID484
Feature activation+6.84e+00

Pos logit contributions
 wouldn+0.014
 would+0.012
 never+0.012
 got+0.012
 managed+0.012
Neg logit contributions
 films-0.018
 circles-0.017
 lions-0.017
 Wheels-0.017
umm-0.017
Token need
Token ID761
Feature activation+1.34e+00

Pos logit contributions
 wouldn+0.095
 managed+0.086
 never+0.080
ried+0.079
 listened+0.079
Neg logit contributions
 films-0.152
 Wheels-0.150
 circles-0.148
 dancers-0.144
 teams-0.144
Token to
Token ID284
Feature activation+3.33e+00

Pos logit contributions
 wouldn+0.020
 would+0.017
 got+0.017
 never+0.017
 didn+0.016
Neg logit contributions
 films-0.036
 circles-0.036
 lions-0.034
andel-0.034
 elderly-0.034
Token wear
Token ID5806
Feature activation+1.86e+00

Pos logit contributions
 wouldn+0.056
 would+0.053
 got+0.049
 never+0.047
 could+0.046
Neg logit contributions
 films-0.082
 circles-0.082
 amusement-0.079
andel-0.078
umm-0.078
Token swe
Token ID3490
Feature activation-1.09e+01

Pos logit contributions
 wouldn+0.041
 would+0.039
 got+0.039
 never+0.038
 didn+0.037
Neg logit contributions
 films-0.037
 circles-0.036
andel-0.035
 sym-0.035
umm-0.035
Tokenaters
Token ID8605
Feature activation+2.88e+00

Pos logit contributions
umm+0.051
 films+0.048
 circles+0.047
andel+0.040
 sym+0.038
Neg logit contributions
 wouldn-0.414
 would-0.403
 never-0.397
 got-0.397
 didn-0.387
Token.
Token ID13
Feature activation-2.00e+00

Pos logit contributions
 wouldn+0.053
 got+0.049
 never+0.049
 would+0.048
 didn+0.047
Neg logit contributions
 circles-0.063
 films-0.063
umm-0.061
andel-0.060
 lions-0.060
Token\n
Token ID198
Feature activation-3.85e+00

Pos logit contributions
umm+0.049
andel+0.048
 films+0.048
 sym+0.048
 circles+0.047
Neg logit contributions
 wouldn-0.035
 got-0.034
 would-0.034
 never-0.032
 didn-0.032
Token\n
Token ID198
Feature activation-4.36e+00

Pos logit contributions
 films+0.086
 circles+0.085
umm+0.085
andel+0.083
 sym+0.083
Neg logit contributions
 wouldn-0.076
 would-0.072
 got-0.070
 never-0.070
 didn-0.067
Token"
Token ID1
Feature activation-1.73e+00

Pos logit contributions
 films+0.105
 circles+0.105
umm+0.104
andel+0.102
 sym+0.102
Neg logit contributions
 wouldn-0.078
 would-0.073
 got-0.071
 never-0.070
 didn-0.067
Token animals
Token ID4695
Feature activation+1.21e+00

Pos logit contributions
umm+0.144
andel+0.142
 films+0.141
 sym+0.140
 circles+0.135
Neg logit contributions
 wouldn-0.201
 would-0.199
 never-0.197
 got-0.196
 didn-0.189
Token got
Token ID1392
Feature activation+8.13e-01

Pos logit contributions
 wouldn+0.015
 never+0.013
 would+0.013
 got+0.012
 forgot+0.012
Neg logit contributions
 circles-0.032
 films-0.032
 lions-0.032
 monkeys-0.031
 elephants-0.031
Token together
Token ID1978
Feature activation+4.73e+00

Pos logit contributions
 wouldn+0.014
 would+0.013
 never+0.012
 got+0.012
 didn+0.012
Neg logit contributions
 films-0.020
 circles-0.020
 lions-0.019
 elderly-0.019
 monkeys-0.019
Token and
Token ID290
Feature activation+4.76e+00

Pos logit contributions
 wouldn+0.073
 would+0.068
 got+0.066
 never+0.065
 didn+0.062
Neg logit contributions
 films-0.126
 circles-0.125
umm-0.125
andel-0.123
 sym-0.122
Token tied
Token ID8165
Feature activation+3.29e-01

Pos logit contributions
 wouldn+0.063
 would+0.057
 never+0.056
 got+0.054
 didn+0.051
Neg logit contributions
 circles-0.131
 films-0.131
 lions-0.127
 sym-0.127
 monkeys-0.124
Token the
Token ID262
Feature activation-1.03e+01

Pos logit contributions
 wouldn+0.006
 would+0.006
 never+0.006
 got+0.006
 didn+0.005
Neg logit contributions
umm-0.008
 films-0.008
andel-0.008
 sym-0.008
 orchestra-0.008
Token tree
Token ID5509
Feature activation+6.19e-01

Pos logit contributions
 circles+0.152
 films+0.151
umm+0.151
andel+0.148
 sym+0.144
Neg logit contributions
 wouldn-0.278
 would-0.268
 got-0.264
 never-0.262
 didn-0.254
Token down
Token ID866
Feature activation-2.27e+00

Pos logit contributions
 wouldn+0.009
 would+0.009
 got+0.008
 never+0.008
 didn+0.008
Neg logit contributions
 circles-0.017
 films-0.016
umm-0.016
andel-0.016
 sym-0.016
Token with
Token ID351
Feature activation-1.94e+00

Pos logit contributions
umm+0.057
 films+0.057
andel+0.056
 circles+0.055
 sym+0.055
Neg logit contributions
 wouldn-0.039
 would-0.037
 got-0.036
 never-0.036
 didn-0.034
Token rope
Token ID17182
Feature activation+1.34e+00

Pos logit contributions
umm+0.052
andel+0.049
 sym+0.048
 films+0.047
 orchestra+0.047
Neg logit contributions
 never-0.033
 wouldn-0.032
 got-0.032
 would-0.032
 didn-0.029
Token to
Token ID284
Feature activation+2.72e+00

Pos logit contributions
 wouldn+0.022
 would+0.021
 got+0.021
 never+0.020
 didn+0.019
Neg logit contributions
umm-0.035
 films-0.035
andel-0.034
 circles-0.034
 sym-0.033