TOP ACTIVATIONS
MAX = 0.257

was three - years - old . She had a lot
a three - year - old who found a chair .
a 3 - year - old boy who had a new
was three - years - old and she loved to explore
little 3 - year - old boy . He asked the
was three - years - old and loved to act and
a 3 - year - old little boy . He had
was three - years - old . She wanted to go
was three - years - old and very excited to go
was three - years - old . Every day she would
a three - year - old girl called Joy . She
a three - year - old named Peter . Peter loved
was three - years - old and loved to play in
only three - years - old , but she already knew
was three - years - old . She had a kind
average three - year - old boy . He loved playing
was three - years - old . R ia was very
eager 3 - year - old boy who loved to explore
a three - year - old boy was very excited .
little 3 - year - old girl was on the playground

MIN ACTIVATIONS
MIN = -0.304

never played with the or nam ents again . They still
the rest of the or nam ents onto the tree .
he especially liked the or nam ents he kept around .
a box of shiny or nam ents that spark led in
All of these lovely or nam ents reminded him of the
tree with lights and or nam ents . They also saw
He quickly hung colorful or nam ents and tw ink ling
can we have some or nam ents for our snow man
to me . The or nam ents were very important to
at all the different or nam ents and adm iring the
to look at the or nam ents on the tree .
to help sort the or nam ents . She picked up
who liked to collect or nam ents . Everywhere she went
many colors . Some or nam ents were round , some
time playing with her or nam ents and making her special
went she searched for or nam ents , and the more
park filled with colourful or nam ents . Jimmy looked out
. They were called or nam ents and they were very
a box of pretty or nam ents that she wanted to
and lots of shiny or nam ents . L ila wanted

INTERVAL 0.117 - 0.257
CONTAINS 1.554%

Jane was so proud of him . The moral
want to touch the tube anymore because it was scary .
robe and went to the park to play . She saw
Ben . Tru cks are fun . Doll s are boring
decided to go find a new home . But where should

INTERVAL -0.023 - 0.117
CONTAINS 71.663%

hearts . Then , Mom my said , "
when she saw all the fruits and vegetables spread out around
looked around and saw other kids touching their toes . Mia
. She asked Sara and Ben to help her . Sara
Lily said . They waved to the ostr ich

INTERVAL -0.163 - -0.023
CONTAINS 26.543%

if you put your heart into it . Not only that
lifted the box . He peek ed inside , and he
ugly and it made a lot of noise . The cell
farmer . He has a big farm with many animals .
the bench and opened the jar . L ily

INTERVAL -0.304 - -0.163
CONTAINS 0.240%

They drove for a long time and soon they were
hand . They followed the long , curly cable all the
" Exc use me squ ir rell y , I was
that her tie was too long and was dragging on the
loved to play all day long . He had many friends
Token was
Token ID373
Feature activation+5.31e-02

Pos logit contributions
 antenna+0.039
nam+0.038
yt+0.038
 magn+0.036
 nowhere+0.036
Neg logit contributions
old-0.030
Here-0.024
five-0.024
aid-0.024
 doctor-0.024
Token three
Token ID1115
Feature activation-4.03e-02

Pos logit contributions
old+0.077
 him+0.065
 better+0.063
 safer+0.063
 doctor+0.062
Neg logit contributions
 antenna-0.107
nam-0.101
yt-0.101
rys-0.099
 stub-0.098
Token-
Token ID12
Feature activation-3.36e-02

Pos logit contributions
 antenna+0.058
nam+0.054
yt+0.053
rys+0.052
 stub+0.051
Neg logit contributions
old-0.082
 him-0.073
 better-0.072
 safer-0.071
 doctor-0.071
Tokenyears
Token ID19002
Feature activation-1.46e-01

Pos logit contributions
 antenna+0.044
nam+0.041
yt+0.039
rys+0.038
 stub+0.038
Neg logit contributions
old-0.071
 him-0.064
 safer-0.064
 better-0.064
 doctor-0.063
Token-
Token ID12
Feature activation-4.25e-02

Pos logit contributions
 antenna+0.288
yt+0.269
nam+0.266
rys+0.262
 tentacles+0.255
Neg logit contributions
old-0.206
 him-0.186
 safer-0.183
 better-0.181
 doctor-0.179
Tokenold
Token ID727
Feature activation+2.24e-01

Pos logit contributions
 antenna+0.109
nam+0.107
yt+0.101
 stub+0.101
chie+0.101
Neg logit contributions
 safer-0.029
 better-0.027
 welcome-0.025
old-0.025
 doctor-0.024
Token.
Token ID13
Feature activation+3.32e-02

Pos logit contributions
old+0.295
 safer+0.278
 better+0.261
Here+0.258
 welcome+0.253
Neg logit contributions
 antenna-0.455
nam-0.425
yt-0.422
 stub-0.408
 nowhere-0.395
Token She
Token ID1375
Feature activation+1.31e-02

Pos logit contributions
old+0.041
 safer+0.034
 better+0.034
 him+0.033
 doctor+0.032
Neg logit contributions
 antenna-0.073
nam-0.070
yt-0.068
rys-0.067
 stub-0.067
Token had
Token ID550
Feature activation-1.42e-01

Pos logit contributions
old+0.019
 him+0.016
 better+0.015
 doctor+0.015
 ice+0.014
Neg logit contributions
rys-0.027
nam-0.027
 antenna-0.027
yt-0.026
ifying-0.025
Token a
Token ID257
Feature activation-1.72e-02

Pos logit contributions
 antenna+0.296
yt+0.294
rys+0.289
nam+0.282
ho+0.271
Neg logit contributions
old-0.194
 him-0.169
 better-0.162
 doctor-0.156
 ice-0.155
Token lot
Token ID1256
Feature activation-3.89e-02

Pos logit contributions
 antenna+0.033
nam+0.032
yt+0.031
rys+0.031
 stub+0.030
Neg logit contributions
old-0.026
 him-0.022
 better-0.022
 safer-0.022
 doctor-0.022
Token a
Token ID257
Feature activation-1.13e-02

Pos logit contributions
old+0.040
 safer+0.033
 doctor+0.033
 better+0.032
 Doctor+0.031
Neg logit contributions
 antenna-0.055
nam-0.051
 stub-0.051
rys-0.050
 nowhere-0.050
Token three
Token ID1115
Feature activation-5.59e-02

Pos logit contributions
 antenna+0.021
yt+0.020
nam+0.020
 stub+0.019
rys+0.019
Neg logit contributions
old-0.018
 him-0.016
 safer-0.015
 better-0.014
five-0.014
Token-
Token ID12
Feature activation-4.63e-02

Pos logit contributions
 antenna+0.101
nam+0.095
yt+0.094
rys+0.092
 stub+0.091
Neg logit contributions
old-0.093
 him-0.080
 better-0.078
 safer-0.078
 doctor-0.077
Tokenyear
Token ID1941
Feature activation-6.32e-02

Pos logit contributions
 antenna+0.078
nam+0.075
 stub+0.071
yt+0.071
rys+0.070
Neg logit contributions
old-0.076
 safer-0.070
 better-0.070
 him-0.069
 doctor-0.067
Token-
Token ID12
Feature activation-5.18e-02

Pos logit contributions
 antenna+0.134
nam+0.130
 stub+0.125
ifying+0.125
rys+0.125
Neg logit contributions
old-0.089
 him-0.069
 doctor-0.068
 better-0.067
 ice-0.067
Tokenold
Token ID727
Feature activation+2.22e-01

Pos logit contributions
 antenna+0.152
nam+0.152
yt+0.146
 stub+0.145
rys+0.142
Neg logit contributions
 safer-0.012
 better-0.011
old-0.008
 Doctor-0.007
 welcome-0.007
Token who
Token ID508
Feature activation+7.72e-02

Pos logit contributions
old+0.279
 safer+0.262
 better+0.246
Here+0.241
 him+0.235
Neg logit contributions
 antenna-0.468
nam-0.441
yt-0.440
 stub-0.427
 magn-0.411
Token found
Token ID1043
Feature activation+2.22e-02

Pos logit contributions
old+0.131
 safer+0.115
 better+0.113
 him+0.112
 doctor+0.111
Neg logit contributions
 antenna-0.134
nam-0.125
 stub-0.122
yt-0.122
rys-0.117
Token a
Token ID257
Feature activation-2.31e-02

Pos logit contributions
old+0.032
 safer+0.027
 better+0.026
 welcome+0.026
five+0.026
Neg logit contributions
 antenna-0.045
nam-0.042
 nowhere-0.041
 stub-0.040
 tentacles-0.040
Token chair
Token ID5118
Feature activation-8.75e-02

Pos logit contributions
 antenna+0.046
nam+0.043
rys+0.043
yt+0.043
 stub+0.041
Neg logit contributions
old-0.033
 better-0.029
 doctor-0.029
 safer-0.028
 him-0.028
Token.
Token ID13
Feature activation+2.38e-02

Pos logit contributions
 antenna+0.197
nam+0.188
yt+0.187
 stub+0.183
rys+0.183
Neg logit contributions
old-0.104
 him-0.084
 better-0.081
 safer-0.081
 doctor-0.079
Token a
Token ID257
Feature activation+1.94e-02

Pos logit contributions
old+0.035
 safer+0.030
 better+0.030
 him+0.029
 doctor+0.028
Neg logit contributions
 antenna-0.051
nam-0.048
yt-0.048
 stub-0.047
rys-0.046
Token 3
Token ID513
Feature activation-5.45e-02

Pos logit contributions
old+0.029
 him+0.025
 better+0.024
 safer+0.024
 doctor+0.024
Neg logit contributions
 antenna-0.038
nam-0.036
yt-0.036
rys-0.035
 stub-0.035
Token-
Token ID12
Feature activation-3.08e-02

Pos logit contributions
 antenna+0.081
nam+0.078
yt+0.074
rys+0.074
 stub+0.072
Neg logit contributions
old-0.111
 him-0.096
 better-0.093
 doctor-0.093
 welcome-0.092
Tokenyear
Token ID1941
Feature activation-5.97e-02

Pos logit contributions
 antenna+0.051
nam+0.048
yt+0.046
rys+0.045
 stub+0.045
Neg logit contributions
old-0.054
 safer-0.048
 him-0.048
 better-0.047
 doctor-0.047
Token-
Token ID12
Feature activation-4.56e-02

Pos logit contributions
 antenna+0.123
nam+0.118
yt+0.115
 stub+0.114
rys+0.114
Neg logit contributions
old-0.086
 him-0.071
 better-0.068
 doctor-0.068
 safer-0.067
Tokenold
Token ID727
Feature activation+2.21e-01

Pos logit contributions
 antenna+0.118
nam+0.115
yt+0.110
 stub+0.109
rys+0.108
Neg logit contributions
old-0.030
 safer-0.029
 welcome-0.027
 better-0.027
 doctor-0.025
Token boy
Token ID2933
Feature activation+9.58e-02

Pos logit contributions
old+0.306
 safer+0.303
 better+0.281
Here+0.279
 welcome+0.269
Neg logit contributions
 antenna-0.418
yt-0.391
nam-0.388
 stub-0.376
 magn-0.359
Token who
Token ID508
Feature activation+7.44e-02

Pos logit contributions
old+0.143
 safer+0.140
Here+0.135
 better+0.128
orting+0.127
Neg logit contributions
 antenna-0.162
 stub-0.150
yt-0.149
 reapp-0.145
nam-0.144
Token had
Token ID550
Feature activation-1.44e-01

Pos logit contributions
old+0.124
 safer+0.111
 him+0.107
 better+0.107
 doctor+0.107
Neg logit contributions
 antenna-0.127
nam-0.119
yt-0.117
 stub-0.115
rys-0.112
Token a
Token ID257
Feature activation-2.19e-02

Pos logit contributions
 antenna+0.305
nam+0.289
yt+0.287
rys+0.283
 stub+0.280
Neg logit contributions
old-0.194
 him-0.163
 better-0.158
 safer-0.156
 doctor-0.155
Token new
Token ID649
Feature activation+1.38e-01

Pos logit contributions
 antenna+0.043
nam+0.041
yt+0.040
rys+0.040
 stub+0.039
Neg logit contributions
old-0.033
 him-0.028
 better-0.027
 safer-0.027
 doctor-0.026
Token was
Token ID373
Feature activation+2.27e-02

Pos logit contributions
old+0.043
 him+0.037
 better+0.036
 doctor+0.036
 safer+0.035
Neg logit contributions
 antenna-0.052
nam-0.051
yt-0.049
rys-0.049
 stub-0.047
Token three
Token ID1115
Feature activation-5.16e-02

Pos logit contributions
old+0.036
 him+0.030
 safer+0.029
 better+0.029
 doctor+0.029
Neg logit contributions
 antenna-0.043
nam-0.041
 stub-0.039
yt-0.039
rys-0.039
Token-
Token ID12
Feature activation-4.17e-02

Pos logit contributions
 antenna+0.064
nam+0.058
yt+0.057
rys+0.056
 stub+0.054
Neg logit contributions
old-0.115
 him-0.103
 better-0.101
 safer-0.101
 doctor-0.101
Tokenyears
Token ID19002
Feature activation-1.57e-01

Pos logit contributions
 antenna+0.037
nam+0.035
yt+0.031
rys+0.031
 stub+0.030
Neg logit contributions
old-0.102
 him-0.096
 safer-0.096
 better-0.096
 doctor-0.095
Token-
Token ID12
Feature activation-4.47e-02

Pos logit contributions
 antenna+0.305
yt+0.285
nam+0.284
rys+0.278
 stub+0.271
Neg logit contributions
old-0.230
 him-0.204
 safer-0.199
 better-0.198
 doctor-0.196
Tokenold
Token ID727
Feature activation+2.19e-01

Pos logit contributions
 antenna+0.126
nam+0.125
ho+0.119
yt+0.119
 stub+0.118
Neg logit contributions
 safer-0.017
 better-0.015
 doctor-0.012
 welcome-0.011
 him-0.011
Token and
Token ID290
Feature activation+1.41e-02

Pos logit contributions
old+0.285
 safer+0.249
Here+0.239
 better+0.237
 doctor+0.231
Neg logit contributions
 antenna-0.462
nam-0.433
yt-0.430
 stub-0.412
ho-0.405
Token she
Token ID673
Feature activation-3.74e-02

Pos logit contributions
old+0.021
 him+0.017
 better+0.017
 safer+0.017
 doctor+0.017
Neg logit contributions
 antenna-0.028
nam-0.027
yt-0.026
rys-0.026
 stub-0.026
Token loved
Token ID6151
Feature activation+6.54e-03

Pos logit contributions
 antenna+0.071
nam+0.067
yt+0.066
rys+0.066
 stub+0.064
Neg logit contributions
old-0.059
 him-0.050
 better-0.049
 safer-0.048
 doctor-0.048
Token to
Token ID284
Feature activation-2.13e-02

Pos logit contributions
old+0.010
 safer+0.008
 better+0.008
 him+0.008
 doctor+0.008
Neg logit contributions
 antenna-0.013
nam-0.012
yt-0.012
 stub-0.011
rys-0.011
Token explore
Token ID7301
Feature activation-2.86e-02

Pos logit contributions
 antenna+0.046
nam+0.043
yt+0.043
rys+0.042
 stub+0.042
Neg logit contributions
old-0.028
 safer-0.023
 better-0.022
 him-0.022
 doctor-0.021
Token little
Token ID1310
Feature activation+1.44e-02

Pos logit contributions
 antenna+0.029
nam+0.027
yt+0.027
rys+0.027
 stub+0.026
Neg logit contributions
old-0.021
 better-0.018
 doctor-0.018
 safer-0.018
 him-0.018
Token 3
Token ID513
Feature activation-7.76e-02

Pos logit contributions
old+0.019
 him+0.015
 safer+0.015
 better+0.015
 doctor+0.014
Neg logit contributions
 antenna-0.031
nam-0.030
yt-0.030
rys-0.029
 stub-0.029
Token-
Token ID12
Feature activation-4.45e-02

Pos logit contributions
 antenna+0.125
nam+0.121
ifying+0.115
rys+0.113
yt+0.112
Neg logit contributions
old-0.152
 him-0.128
 doctor-0.126
 ice-0.124
 better-0.124
Tokenyear
Token ID1941
Feature activation-6.84e-02

Pos logit contributions
 antenna+0.068
nam+0.067
 stub+0.062
yt+0.061
rys+0.060
Neg logit contributions
old-0.077
 safer-0.074
 better-0.073
 him-0.072
 welcome-0.072
Token-
Token ID12
Feature activation-4.95e-02

Pos logit contributions
 antenna+0.145
nam+0.141
 stub+0.136
yt+0.135
rys+0.135
Neg logit contributions
old-0.097
 him-0.076
 better-0.074
 doctor-0.074
 safer-0.072
Tokenold
Token ID727
Feature activation+2.19e-01

Pos logit contributions
nam+0.143
 antenna+0.142
 stub+0.135
ho+0.135
yt+0.134
Neg logit contributions
 safer-0.014
 welcome-0.012
 better-0.012
 doctor-0.008
 Doctor-0.008
Token boy
Token ID2933
Feature activation+9.06e-02

Pos logit contributions
old+0.287
 safer+0.279
Here+0.262
 better+0.259
 welcome+0.245
Neg logit contributions
 antenna-0.437
yt-0.415
nam-0.413
 stub-0.400
 magn-0.383
Token.
Token ID13
Feature activation+2.88e-02

Pos logit contributions
old+0.162
 safer+0.153
 better+0.147
 Doctor+0.146
Here+0.146
Neg logit contributions
 antenna-0.134
 stub-0.127
nam-0.125
yt-0.124
 nowhere-0.120
Token He
Token ID679
Feature activation+5.93e-02

Pos logit contributions
old+0.034
 better+0.027
 safer+0.027
 him+0.027
 doctor+0.026
Neg logit contributions
 antenna-0.065
nam-0.063
yt-0.061
 stub-0.060
rys-0.060
Token asked
Token ID1965
Feature activation+4.41e-02

Pos logit contributions
old+0.083
 him+0.072
 better+0.067
 doctor+0.065
 ice+0.063
Neg logit contributions
rys-0.125
nam-0.125
 antenna-0.123
yt-0.123
ifying-0.117
Token the
Token ID262
Feature activation+3.24e-02

Pos logit contributions
old+0.055
 safer+0.048
Here+0.048
 better+0.043
 cleaner+0.043
Neg logit contributions
 antenna-0.091
nam-0.090
 nowhere-0.083
yt-0.082
 stub-0.080
Token was
Token ID373
Feature activation+1.69e-02

Pos logit contributions
 antenna+0.042
yt+0.039
 stub+0.038
 tentacles+0.037
 nowhere+0.037
Neg logit contributions
old-0.037
 safer-0.034
Here-0.033
 doctor-0.033
 him-0.032
Token three
Token ID1115
Feature activation-5.73e-02

Pos logit contributions
old+0.029
five+0.024
orting+0.024
 him+0.023
Here+0.023
Neg logit contributions
nam-0.030
 antenna-0.029
 stub-0.027
 nowhere-0.026
rys-0.026
Token-
Token ID12
Feature activation-4.75e-02

Pos logit contributions
 antenna+0.107
nam+0.101
rys+0.099
 stub+0.098
yt+0.097
Neg logit contributions
old-0.094
 him-0.080
 better-0.078
 safer-0.077
 doctor-0.076
Tokenyears
Token ID19002
Feature activation-1.54e-01

Pos logit contributions
nam+0.061
 antenna+0.060
 stub+0.054
rys+0.054
yt+0.054
Neg logit contributions
old-0.094
 safer-0.091
 better-0.090
 doctor-0.089
 welcome-0.088
Token-
Token ID12
Feature activation-4.71e-02

Pos logit contributions
 antenna+0.309
nam+0.294
yt+0.289
rys+0.285
 stub+0.283
Neg logit contributions
old-0.224
 him-0.188
 better-0.183
 safer-0.182
 doctor-0.181
Tokenold
Token ID727
Feature activation+2.18e-01

Pos logit contributions
nam+0.127
 antenna+0.125
ho+0.122
rys+0.122
yt+0.121
Neg logit contributions
 safer-0.021
 better-0.020
 doctor-0.018
 cleaner-0.017
 welcome-0.016
Token and
Token ID290
Feature activation+2.55e-02

Pos logit contributions
old+0.308
 safer+0.279
Here+0.279
 choose+0.262
 better+0.260
Neg logit contributions
 antenna-0.432
nam-0.403
yt-0.402
 tentacles-0.383
ho-0.379
Token loved
Token ID6151
Feature activation+1.71e-02

Pos logit contributions
old+0.041
 safer+0.034
 doctor+0.034
 Doctor+0.033
 better+0.033
Neg logit contributions
 antenna-0.047
nam-0.046
yt-0.043
 stub-0.042
 tentacles-0.041
Token to
Token ID284
Feature activation-1.23e-02

Pos logit contributions
old+0.027
 safer+0.022
 better+0.022
 doctor+0.022
 him+0.022
Neg logit contributions
 antenna-0.033
nam-0.031
yt-0.030
 stub-0.030
rys-0.030
Token act
Token ID719
Feature activation-3.27e-02

Pos logit contributions
 antenna+0.029
nam+0.027
rys+0.027
 stub+0.027
yt+0.027
Neg logit contributions
old-0.014
 safer-0.011
 Doctor-0.010
 better-0.010
Here-0.010
Token and
Token ID290
Feature activation+2.49e-02

Pos logit contributions
 antenna+0.068
yt+0.065
nam+0.065
 tentacles+0.063
 beating+0.062
Neg logit contributions
old-0.044
Here-0.037
 Doctor-0.035
 doctor-0.034
 him-0.034
Token a
Token ID257
Feature activation+5.22e-03

Pos logit contributions
old+0.057
 him+0.047
 better+0.045
 safer+0.045
 doctor+0.045
Neg logit contributions
 antenna-0.092
nam-0.087
yt-0.086
rys-0.085
 stub-0.084
Token 3
Token ID513
Feature activation-6.46e-02

Pos logit contributions
old+0.007
 better+0.006
 doctor+0.006
 him+0.006
 safer+0.006
Neg logit contributions
 antenna-0.011
nam-0.010
yt-0.010
rys-0.010
 stub-0.010
Token-
Token ID12
Feature activation-3.73e-02

Pos logit contributions
 antenna+0.102
nam+0.100
yt+0.093
rys+0.093
ifying+0.092
Neg logit contributions
old-0.127
 him-0.108
 better-0.105
 doctor-0.105
 welcome-0.105
Tokenyear
Token ID1941
Feature activation-6.40e-02

Pos logit contributions
 antenna+0.064
nam+0.061
yt+0.058
rys+0.058
 stub+0.058
Neg logit contributions
old-0.060
 safer-0.055
 him-0.054
 better-0.054
 doctor-0.053
Token-
Token ID12
Feature activation-4.91e-02

Pos logit contributions
 antenna+0.134
nam+0.129
yt+0.125
 stub+0.125
rys+0.124
Neg logit contributions
old-0.091
 him-0.074
 better-0.071
 doctor-0.070
 safer-0.070
Tokenold
Token ID727
Feature activation+2.18e-01

Pos logit contributions
 antenna+0.139
nam+0.137
yt+0.131
 stub+0.130
rys+0.129
Neg logit contributions
old-0.018
 safer-0.017
 better-0.016
 welcome-0.014
 doctor-0.014
Token little
Token ID1310
Feature activation+9.68e-03

Pos logit contributions
old+0.306
 safer+0.297
 better+0.279
Here+0.274
 welcome+0.264
Neg logit contributions
 antenna-0.414
yt-0.390
nam-0.383
 stub-0.375
rys-0.357
Token boy
Token ID2933
Feature activation+9.11e-02

Pos logit contributions
old+0.013
 safer+0.011
 him+0.011
Here+0.010
 better+0.010
Neg logit contributions
 antenna-0.021
nam-0.019
yt-0.019
 stub-0.018
rys-0.018
Token.
Token ID13
Feature activation+3.18e-02

Pos logit contributions
old+0.142
 safer+0.134
Here+0.128
 better+0.125
 doctor+0.123
Neg logit contributions
 antenna-0.156
yt-0.144
 stub-0.144
nam-0.139
 reapp-0.136
Token He
Token ID679
Feature activation+6.11e-02

Pos logit contributions
old+0.038
 safer+0.032
 better+0.032
 him+0.031
 doctor+0.030
Neg logit contributions
 antenna-0.070
nam-0.067
yt-0.065
rys-0.064
 stub-0.064
Token had
Token ID550
Feature activation-1.45e-01

Pos logit contributions
old+0.090
 him+0.076
 better+0.073
 doctor+0.072
 safer+0.069
Neg logit contributions
 antenna-0.124
nam-0.122
rys-0.119
yt-0.118
 stub-0.114
Token was
Token ID373
Feature activation+8.76e-03

Pos logit contributions
 antenna+0.059
nam+0.057
yt+0.056
rys+0.056
 stub+0.054
Neg logit contributions
old-0.046
 him-0.039
 better-0.038
 doctor-0.038
 safer-0.037
Token three
Token ID1115
Feature activation-5.73e-02

Pos logit contributions
old+0.013
 him+0.011
 better+0.011
 safer+0.011
 doctor+0.011
Neg logit contributions
 antenna-0.017
nam-0.016
yt-0.016
rys-0.016
 stub-0.016
Token-
Token ID12
Feature activation-5.06e-02

Pos logit contributions
 antenna+0.078
nam+0.072
yt+0.070
rys+0.069
 stub+0.068
Neg logit contributions
old-0.122
 him-0.108
 better-0.106
 safer-0.106
 doctor-0.105
Tokenyears
Token ID19002
Feature activation-1.59e-01

Pos logit contributions
 antenna+0.048
nam+0.043
yt+0.041
rys+0.039
 stub+0.039
Neg logit contributions
old-0.126
 him-0.116
 better-0.114
 safer-0.114
 doctor-0.114
Token-
Token ID12
Feature activation-5.21e-02

Pos logit contributions
 antenna+0.315
nam+0.295
yt+0.294
rys+0.289
 stub+0.282
Neg logit contributions
old-0.229
 him-0.201
 safer-0.195
 better-0.195
 doctor-0.193
Tokenold
Token ID727
Feature activation+2.17e-01

Pos logit contributions
 antenna+0.152
nam+0.150
yt+0.143
 stub+0.142
rys+0.142
Neg logit contributions
 safer-0.016
 better-0.014
 doctor-0.012
 Doctor-0.011
 welcome-0.009
Token.
Token ID13
Feature activation+2.59e-02

Pos logit contributions
old+0.266
 safer+0.234
Here+0.221
 better+0.221
 doctor+0.219
Neg logit contributions
 antenna-0.468
nam-0.445
yt-0.436
 stub-0.422
 tentacles-0.412
Token She
Token ID1375
Feature activation+8.87e-03

Pos logit contributions
old+0.037
 him+0.031
 better+0.030
 safer+0.030
 doctor+0.030
Neg logit contributions
 antenna-0.052
nam-0.050
yt-0.049
rys-0.048
 stub-0.048
Token wanted
Token ID2227
Feature activation-7.16e-03

Pos logit contributions
old+0.013
 him+0.011
 better+0.010
 doctor+0.010
 ice+0.010
Neg logit contributions
 antenna-0.018
nam-0.018
rys-0.017
yt-0.017
ifying-0.016
Token to
Token ID284
Feature activation-2.23e-02

Pos logit contributions
 antenna+0.014
nam+0.013
yt+0.013
 stub+0.013
rys+0.013
Neg logit contributions
old-0.011
 him-0.009
 safer-0.009
 better-0.009
 doctor-0.009
Token go
Token ID467
Feature activation+8.23e-02

Pos logit contributions
 antenna+0.051
nam+0.048
yt+0.048
rys+0.047
 stub+0.047
Neg logit contributions
old-0.027
 him-0.022
 better-0.021
 safer-0.021
 doctor-0.020
Token was
Token ID373
Feature activation+8.68e-03

Pos logit contributions
 antenna+0.194
nam+0.185
yt+0.183
rys+0.181
 stub+0.177
Neg logit contributions
old-0.137
 him-0.116
 better-0.112
 safer-0.110
 doctor-0.110
Token three
Token ID1115
Feature activation-6.26e-02

Pos logit contributions
old+0.013
 him+0.011
 safer+0.010
 better+0.010
 doctor+0.010
Neg logit contributions
 antenna-0.017
nam-0.017
yt-0.016
 stub-0.016
rys-0.016
Token-
Token ID12
Feature activation-5.15e-02

Pos logit contributions
 antenna+0.096
nam+0.089
yt+0.087
rys+0.086
 stub+0.086
Neg logit contributions
old-0.123
 him-0.107
 better-0.105
 safer-0.105
 doctor-0.104
Tokenyears
Token ID19002
Feature activation-1.60e-01

Pos logit contributions
 antenna+0.045
nam+0.041
yt+0.038
rys+0.037
 stub+0.037
Neg logit contributions
old-0.132
 him-0.121
 better-0.120
 safer-0.120
 doctor-0.119
Token-
Token ID12
Feature activation-5.13e-02

Pos logit contributions
 antenna+0.323
nam+0.307
yt+0.303
rys+0.299
 stub+0.297
Neg logit contributions
old-0.232
 him-0.194
 better-0.189
 safer-0.188
 doctor-0.186
Tokenold
Token ID727
Feature activation+2.16e-01

Pos logit contributions
nam+0.148
 antenna+0.145
rys+0.140
 stub+0.140
yt+0.139
Neg logit contributions
 safer-0.016
 better-0.015
 him-0.013
 doctor-0.012
old-0.011
Token and
Token ID290
Feature activation+2.09e-02

Pos logit contributions
old+0.275
 safer+0.229
Here+0.225
 him+0.222
 better+0.221
Neg logit contributions
 antenna-0.469
nam-0.443
yt-0.440
 stub-0.423
rys-0.420
Token very
Token ID845
Feature activation-3.85e-02

Pos logit contributions
old+0.029
 him+0.024
 safer+0.024
 better+0.024
 doctor+0.023
Neg logit contributions
 antenna-0.043
nam-0.041
yt-0.040
 stub-0.040
rys-0.040
Token excited
Token ID6568
Feature activation-3.87e-02

Pos logit contributions
 antenna+0.072
nam+0.072
yt+0.068
 stub+0.067
rys+0.066
Neg logit contributions
old-0.058
 him-0.052
 safer-0.049
 doctor-0.048
Here-0.048
Token to
Token ID284
Feature activation-1.90e-02

Pos logit contributions
 antenna+0.088
nam+0.084
yt+0.083
rys+0.082
 stub+0.081
Neg logit contributions
old-0.046
 him-0.037
 safer-0.036
 better-0.036
 doctor-0.036
Token go
Token ID467
Feature activation+8.61e-02

Pos logit contributions
 antenna+0.046
nam+0.044
yt+0.043
rys+0.043
 stub+0.043
Neg logit contributions
old-0.020
 him-0.015
 safer-0.015
 better-0.015
 doctor-0.014
Token was
Token ID373
Feature activation+1.43e-02

Pos logit contributions
old+0.025
 him+0.022
 better+0.020
 doctor+0.020
 ice+0.020
Neg logit contributions
 antenna-0.045
nam-0.044
rys-0.043
yt-0.042
ifying-0.042
Token three
Token ID1115
Feature activation-5.97e-02

Pos logit contributions
old+0.023
 him+0.020
 safer+0.019
 better+0.019
 doctor+0.019
Neg logit contributions
 antenna-0.026
nam-0.025
yt-0.024
 stub-0.024
rys-0.024
Token-
Token ID12
Feature activation-5.02e-02

Pos logit contributions
 antenna+0.092
nam+0.086
rys+0.083
 stub+0.083
yt+0.081
Neg logit contributions
old-0.119
 him-0.103
 better-0.100
 safer-0.099
 ice-0.099
Tokenyears
Token ID19002
Feature activation-1.56e-01

Pos logit contributions
 antenna+0.055
nam+0.050
yt+0.049
rys+0.048
 stub+0.047
Neg logit contributions
old-0.118
 him-0.107
 better-0.106
 safer-0.106
 doctor-0.105
Token-
Token ID12
Feature activation-4.97e-02

Pos logit contributions
 antenna+0.317
nam+0.303
yt+0.296
rys+0.293
 stub+0.293
Neg logit contributions
old-0.224
 him-0.186
 better-0.181
 safer-0.179
 doctor-0.178
Tokenold
Token ID727
Feature activation+2.16e-01

Pos logit contributions
 antenna+0.144
nam+0.143
rys+0.140
yt+0.139
 stub+0.138
Neg logit contributions
 safer-0.014
 better-0.013
old-0.011
 doctor-0.011
 him-0.011
Token.
Token ID13
Feature activation+2.17e-02

Pos logit contributions
old+0.282
 safer+0.236
 better+0.228
 him+0.226
Here+0.225
Neg logit contributions
 antenna-0.462
nam-0.437
yt-0.435
 stub-0.417
rys-0.415
Token Every
Token ID3887
Feature activation+7.63e-02

Pos logit contributions
old+0.028
 him+0.023
 better+0.022
 safer+0.022
 doctor+0.022
Neg logit contributions
 antenna-0.047
nam-0.045
yt-0.045
rys-0.044
 stub-0.044
Token day
Token ID1110
Feature activation-7.90e-03

Pos logit contributions
old+0.093
 him+0.077
 better+0.075
 doctor+0.074
 welcome+0.072
Neg logit contributions
 antenna-0.175
nam-0.172
yt-0.164
rys-0.160
 stub-0.159
Token she
Token ID673
Feature activation-3.03e-02

Pos logit contributions
 antenna+0.018
nam+0.017
yt+0.017
rys+0.017
 stub+0.017
Neg logit contributions
old-0.009
 him-0.008
 better-0.007
 welcome-0.007
 safer-0.007
Token would
Token ID561
Feature activation+5.01e-02

Pos logit contributions
 antenna+0.055
nam+0.051
yt+0.051
rys+0.050
 stub+0.050
Neg logit contributions
old-0.050
 him-0.043
 better-0.042
 safer-0.042
 doctor-0.042
Token a
Token ID257
Feature activation-1.91e-02

Pos logit contributions
old+0.023
 doctor+0.018
 safer+0.018
 him+0.018
 better+0.018
Neg logit contributions
 antenna-0.031
nam-0.029
 nowhere-0.028
 stub-0.028
yt-0.028
Token three
Token ID1115
Feature activation-6.12e-02

Pos logit contributions
 antenna+0.034
yt+0.033
nam+0.033
 stub+0.032
rys+0.032
Neg logit contributions
old-0.033
 him-0.028
five-0.025
Here-0.025
 safer-0.025
Token-
Token ID12
Feature activation-5.02e-02

Pos logit contributions
 antenna+0.114
nam+0.108
 stub+0.104
rys+0.104
yt+0.103
Neg logit contributions
old-0.101
 doctor-0.084
 him-0.083
 better-0.083
 welcome-0.082
Tokenyear
Token ID1941
Feature activation-6.78e-02

Pos logit contributions
 antenna+0.083
nam+0.080
yt+0.076
 stub+0.076
rys+0.075
Neg logit contributions
old-0.086
 him-0.078
 safer-0.077
 better-0.077
 doctor-0.076
Token-
Token ID12
Feature activation-5.00e-02

Pos logit contributions
 antenna+0.149
nam+0.148
ifying+0.142
 stub+0.141
 nowhere+0.139
Neg logit contributions
old-0.089
 doctor-0.068
 ice-0.068
 welcome-0.067
 him-0.066
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.142
 antenna+0.142
yt+0.137
 stub+0.137
rys+0.136
Neg logit contributions
old-0.016
 safer-0.015
 him-0.015
 better-0.014
 doctor-0.012
Token girl
Token ID2576
Feature activation+3.46e-02

Pos logit contributions
old+0.274
 safer+0.246
 better+0.229
 him+0.227
Here+0.227
Neg logit contributions
 antenna-0.452
yt-0.433
nam-0.433
 stub-0.413
rys-0.410
Token called
Token ID1444
Feature activation+5.18e-02

Pos logit contributions
old+0.049
 him+0.039
 doctor+0.038
 better+0.038
 welcome+0.037
Neg logit contributions
 antenna-0.075
nam-0.071
yt-0.069
 tentacles-0.068
rys-0.068
Token Joy
Token ID14087
Feature activation-6.55e-02

Pos logit contributions
old+0.055
 him+0.043
 welcome+0.042
 doctor+0.040
 Doctor+0.039
Neg logit contributions
 antenna-0.125
yt-0.119
nam-0.118
rys-0.118
 stub-0.118
Token.
Token ID13
Feature activation+2.09e-02

Pos logit contributions
 antenna+0.169
nam+0.165
yt+0.160
rys+0.160
 stub+0.160
Neg logit contributions
old-0.060
 him-0.043
 better-0.043
 doctor-0.040
 welcome-0.040
Token She
Token ID1375
Feature activation+2.15e-02

Pos logit contributions
old+0.027
 him+0.023
 welcome+0.022
 better+0.022
 safer+0.021
Neg logit contributions
 antenna-0.044
nam-0.044
yt-0.043
rys-0.043
 stub-0.043
Token a
Token ID257
Feature activation-1.75e-02

Pos logit contributions
old+0.023
 safer+0.017
five+0.017
orting+0.017
 choice+0.016
Neg logit contributions
nam-0.041
 antenna-0.041
 magn-0.039
 stub-0.038
 nowhere-0.038
Token three
Token ID1115
Feature activation-5.89e-02

Pos logit contributions
 antenna+0.033
nam+0.032
 stub+0.031
yt+0.031
rys+0.030
Neg logit contributions
old-0.028
 him-0.024
 safer-0.022
 better-0.021
five-0.021
Token-
Token ID12
Feature activation-4.95e-02

Pos logit contributions
 antenna+0.108
nam+0.102
yt+0.100
rys+0.099
 stub+0.098
Neg logit contributions
old-0.097
 him-0.082
 better-0.081
 doctor-0.080
 safer-0.080
Tokenyear
Token ID1941
Feature activation-6.74e-02

Pos logit contributions
nam+0.067
 antenna+0.063
 stub+0.061
rys+0.059
yt+0.058
Neg logit contributions
 safer-0.090
 better-0.088
 Doctor-0.087
 doctor-0.086
old-0.086
Token-
Token ID12
Feature activation-4.93e-02

Pos logit contributions
 antenna+0.142
nam+0.138
 stub+0.132
rys+0.132
yt+0.132
Neg logit contributions
old-0.093
 him-0.074
 doctor-0.073
 better-0.072
 welcome-0.072
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.138
 antenna+0.133
yt+0.132
 stub+0.131
ho+0.129
Neg logit contributions
 safer-0.016
 better-0.014
 Doctor-0.013
 cleaner-0.011
 doctor-0.011
Token named
Token ID3706
Feature activation+2.17e-02

Pos logit contributions
old+0.268
 safer+0.247
Here+0.227
orting+0.219
 choose+0.216
Neg logit contributions
 antenna-0.457
yt-0.430
nam-0.424
 stub-0.416
 magn-0.410
Token Peter
Token ID5613
Feature activation-1.02e-01

Pos logit contributions
old+0.022
 safer+0.018
 better+0.018
 doctor+0.017
 hospital+0.016
Neg logit contributions
nam-0.051
 antenna-0.050
yt-0.048
 magn-0.047
 stub-0.046
Token.
Token ID13
Feature activation+2.53e-02

Pos logit contributions
 antenna+0.199
yt+0.193
 stub+0.186
 tentacles+0.184
rys+0.182
Neg logit contributions
old-0.138
 safer-0.129
Here-0.124
 Doctor-0.120
five-0.117
Token Peter
Token ID5613
Feature activation-1.02e-01

Pos logit contributions
old+0.036
 safer+0.030
 him+0.030
 better+0.030
 doctor+0.029
Neg logit contributions
 antenna-0.052
nam-0.049
yt-0.048
rys-0.047
 stub-0.047
Token loved
Token ID6151
Feature activation+1.51e-02

Pos logit contributions
 antenna+0.186
nam+0.174
yt+0.172
 stub+0.170
rys+0.168
Neg logit contributions
old-0.164
 him-0.141
 safer-0.139
 better-0.138
 doctor-0.137
Token was
Token ID373
Feature activation+1.23e-02

Pos logit contributions
old+0.027
 him+0.023
 better+0.022
 doctor+0.021
 ice+0.021
Neg logit contributions
 antenna-0.050
nam-0.048
yt-0.048
rys-0.047
ifying-0.046
Token three
Token ID1115
Feature activation-6.02e-02

Pos logit contributions
old+0.019
 him+0.016
 safer+0.016
 better+0.016
 doctor+0.016
Neg logit contributions
 antenna-0.023
nam-0.022
yt-0.022
rys-0.022
 stub-0.022
Token-
Token ID12
Feature activation-5.23e-02

Pos logit contributions
 antenna+0.093
nam+0.085
rys+0.084
 stub+0.084
yt+0.082
Neg logit contributions
old-0.120
 him-0.103
 better-0.101
 ice-0.100
 doctor-0.100
Tokenyears
Token ID19002
Feature activation-1.59e-01

Pos logit contributions
 antenna+0.045
nam+0.041
yt+0.038
rys+0.037
 stub+0.037
Neg logit contributions
old-0.134
 him-0.123
 better-0.122
 safer-0.122
 doctor-0.121
Token-
Token ID12
Feature activation-5.19e-02

Pos logit contributions
 antenna+0.324
nam+0.309
yt+0.302
rys+0.300
 stub+0.299
Neg logit contributions
old-0.229
 him-0.188
 better-0.183
 doctor-0.181
 safer-0.181
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.148
 antenna+0.144
 stub+0.140
rys+0.139
yt+0.138
Neg logit contributions
 safer-0.019
 better-0.018
 him-0.015
 doctor-0.014
old-0.014
Token and
Token ID290
Feature activation+2.08e-02

Pos logit contributions
old+0.286
 safer+0.248
Here+0.242
 better+0.236
 him+0.230
Neg logit contributions
 antenna-0.451
nam-0.431
yt-0.425
 stub-0.403
rys-0.398
Token loved
Token ID6151
Feature activation+1.45e-02

Pos logit contributions
old+0.032
 safer+0.026
 doctor+0.026
 him+0.026
 better+0.025
Neg logit contributions
 antenna-0.040
nam-0.040
 stub-0.037
yt-0.037
 beating-0.036
Token to
Token ID284
Feature activation-1.65e-02

Pos logit contributions
old+0.021
 him+0.018
 better+0.017
 safer+0.017
 doctor+0.017
Neg logit contributions
 antenna-0.029
nam-0.028
yt-0.028
rys-0.027
 stub-0.027
Token play
Token ID711
Feature activation+1.12e-01

Pos logit contributions
 antenna+0.039
nam+0.037
rys+0.037
yt+0.037
 stub+0.036
Neg logit contributions
old-0.018
 safer-0.014
 better-0.013
 him-0.013
 doctor-0.013
Token in
Token ID287
Feature activation-7.71e-02

Pos logit contributions
old+0.183
 safer+0.141
 Doctor+0.135
 welcome+0.135
five+0.134
Neg logit contributions
 antenna-0.211
nam-0.204
yt-0.200
rys-0.197
 stub-0.197
Token only
Token ID691
Feature activation+3.81e-02

Pos logit contributions
old+0.010
 him+0.008
 safer+0.008
 better+0.008
 doctor+0.008
Neg logit contributions
 antenna-0.013
nam-0.013
yt-0.012
rys-0.012
 stub-0.012
Token three
Token ID1115
Feature activation-6.51e-02

Pos logit contributions
old+0.086
 him+0.078
 safer+0.075
Here+0.075
orting+0.074
Neg logit contributions
 antenna-0.048
nam-0.047
rys-0.041
yt-0.040
 stub-0.040
Token-
Token ID12
Feature activation-5.35e-02

Pos logit contributions
 antenna+0.088
nam+0.081
yt+0.079
rys+0.078
 stub+0.077
Neg logit contributions
old-0.138
 him-0.123
 better-0.121
 safer-0.120
 doctor-0.120
Tokenyears
Token ID19002
Feature activation-1.63e-01

Pos logit contributions
 antenna+0.040
nam+0.038
 stub+0.032
rys+0.032
yt+0.032
Neg logit contributions
old-0.140
 him-0.131
 safer-0.131
 better-0.131
 welcome-0.129
Token-
Token ID12
Feature activation-5.33e-02

Pos logit contributions
 antenna+0.326
nam+0.309
yt+0.305
rys+0.300
 stub+0.298
Neg logit contributions
old-0.237
 him-0.199
 better-0.194
 safer-0.193
 doctor-0.191
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.145
 antenna+0.142
 stub+0.136
rys+0.136
yt+0.134
Neg logit contributions
 safer-0.026
 better-0.024
 him-0.021
 doctor-0.020
 cleaner-0.020
Token,
Token ID11
Feature activation-8.52e-03

Pos logit contributions
old+0.290
Here+0.260
 safer+0.259
 better+0.242
 him+0.235
Neg logit contributions
 antenna-0.439
nam-0.421
yt-0.413
 stub-0.391
ho-0.385
Token but
Token ID475
Feature activation+1.46e-02

Pos logit contributions
 antenna+0.018
nam+0.017
yt+0.017
rys+0.017
 stub+0.016
Neg logit contributions
old-0.012
 him-0.010
 better-0.009
 safer-0.009
 doctor-0.009
Token she
Token ID673
Feature activation-3.42e-02

Pos logit contributions
old+0.019
 him+0.016
 better+0.016
 safer+0.015
 doctor+0.015
Neg logit contributions
 antenna-0.032
nam-0.030
yt-0.030
rys-0.030
 stub-0.029
Token already
Token ID1541
Feature activation-1.76e-03

Pos logit contributions
 antenna+0.066
nam+0.062
 stub+0.061
yt+0.060
rys+0.060
Neg logit contributions
old-0.053
 him-0.045
 safer-0.044
 better-0.043
 doctor-0.043
Token knew
Token ID2993
Feature activation-5.59e-02

Pos logit contributions
 antenna+0.003
nam+0.003
yt+0.003
rys+0.003
ifying+0.003
Neg logit contributions
old-0.003
 him-0.002
 better-0.002
 doctor-0.002
 welcome-0.002
Token was
Token ID373
Feature activation+1.44e-02

Pos logit contributions
old+0.132
 him+0.109
 safer+0.108
 doctor+0.108
 Doctor+0.107
Neg logit contributions
 antenna-0.148
nam-0.142
 stub-0.140
yt-0.137
rys-0.137
Token three
Token ID1115
Feature activation-6.16e-02

Pos logit contributions
old+0.022
 him+0.019
 better+0.018
 safer+0.018
 doctor+0.018
Neg logit contributions
 antenna-0.028
nam-0.026
yt-0.026
rys-0.026
 stub-0.025
Token-
Token ID12
Feature activation-5.19e-02

Pos logit contributions
 antenna+0.100
nam+0.093
yt+0.091
rys+0.091
 stub+0.091
Neg logit contributions
old-0.115
 him-0.100
 better-0.098
 safer-0.097
 doctor-0.097
Tokenyears
Token ID19002
Feature activation-1.58e-01

Pos logit contributions
 antenna+0.059
nam+0.057
 stub+0.052
rys+0.052
yt+0.051
Neg logit contributions
old-0.116
 him-0.107
 safer-0.107
 better-0.107
 doctor-0.105
Token-
Token ID12
Feature activation-5.17e-02

Pos logit contributions
 antenna+0.325
nam+0.312
yt+0.304
 stub+0.302
rys+0.302
Neg logit contributions
old-0.227
 him-0.185
 better-0.180
 doctor-0.178
 safer-0.177
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.148
 antenna+0.147
rys+0.143
 stub+0.142
yt+0.141
Neg logit contributions
 safer-0.015
 better-0.014
 him-0.013
old-0.012
 doctor-0.011
Token.
Token ID13
Feature activation+2.06e-02

Pos logit contributions
old+0.266
 safer+0.224
Here+0.215
 him+0.212
 better+0.211
Neg logit contributions
 antenna-0.472
nam-0.448
yt-0.442
 stub-0.427
rys-0.423
Token She
Token ID1375
Feature activation+2.22e-02

Pos logit contributions
old+0.030
 him+0.025
 better+0.025
 safer+0.025
 doctor+0.024
Neg logit contributions
 antenna-0.041
nam-0.039
yt-0.038
 stub-0.038
rys-0.038
Token had
Token ID550
Feature activation-1.36e-01

Pos logit contributions
old+0.030
 him+0.026
 better+0.025
 doctor+0.024
 safer+0.024
Neg logit contributions
 antenna-0.047
nam-0.045
yt-0.045
rys-0.044
ifying-0.043
Token a
Token ID257
Feature activation-1.90e-02

Pos logit contributions
 antenna+0.327
yt+0.317
rys+0.312
nam+0.307
 tentacles+0.298
Neg logit contributions
old-0.142
 him-0.120
 better-0.115
 doctor-0.112
 safer-0.110
Token kind
Token ID1611
Feature activation-7.31e-03

Pos logit contributions
 antenna+0.039
nam+0.037
yt+0.037
rys+0.036
 stub+0.035
Neg logit contributions
old-0.026
 better-0.022
 him-0.022
 doctor-0.022
 safer-0.022
Token average
Token ID2811
Feature activation+5.59e-02

Pos logit contributions
nam+0.191
 antenna+0.188
rys+0.185
yt+0.177
 stub+0.169
Neg logit contributions
old-0.142
 him-0.129
Here-0.128
 safer-0.126
 better-0.123
Token three
Token ID1115
Feature activation-5.89e-02

Pos logit contributions
old+0.086
 him+0.072
 safer+0.071
 better+0.069
Here+0.068
Neg logit contributions
 antenna-0.110
yt-0.103
nam-0.101
rys-0.100
 stub-0.099
Token-
Token ID12
Feature activation-5.09e-02

Pos logit contributions
 antenna+0.097
nam+0.091
yt+0.089
rys+0.088
 stub+0.087
Neg logit contributions
old-0.108
 him-0.094
 better-0.092
 safer-0.091
 doctor-0.091
Tokenyear
Token ID1941
Feature activation-6.88e-02

Pos logit contributions
 antenna+0.080
nam+0.078
yt+0.074
 stub+0.073
rys+0.072
Neg logit contributions
old-0.090
 safer-0.082
 him-0.082
 better-0.082
 doctor-0.080
Token-
Token ID12
Feature activation-5.44e-02

Pos logit contributions
 antenna+0.144
nam+0.138
 stub+0.134
yt+0.134
rys+0.134
Neg logit contributions
old-0.098
 him-0.078
 doctor-0.076
 better-0.076
 safer-0.075
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.161
 antenna+0.160
yt+0.156
 stub+0.155
rys+0.152
Neg logit contributions
 safer-0.011
 better-0.009
old-0.007
 welcome-0.007
 him-0.006
Token boy
Token ID2933
Feature activation+9.08e-02

Pos logit contributions
old+0.266
 safer+0.240
 better+0.226
Here+0.222
 him+0.217
Neg logit contributions
 antenna-0.459
nam-0.437
yt-0.433
 stub-0.420
rys-0.407
Token.
Token ID13
Feature activation+2.14e-02

Pos logit contributions
old+0.117
 safer+0.101
 better+0.097
 welcome+0.097
 him+0.095
Neg logit contributions
 antenna-0.187
nam-0.178
 stub-0.176
yt-0.175
rys-0.171
Token He
Token ID679
Feature activation+6.26e-02

Pos logit contributions
old+0.023
 safer+0.019
 better+0.019
 doctor+0.018
 him+0.018
Neg logit contributions
 antenna-0.049
yt-0.048
nam-0.047
rys-0.046
 stub-0.046
Token loved
Token ID6151
Feature activation+5.22e-03

Pos logit contributions
old+0.098
 him+0.083
 better+0.081
 doctor+0.080
 safer+0.080
Neg logit contributions
 antenna-0.119
nam-0.114
yt-0.112
rys-0.111
 stub-0.108
Token playing
Token ID2712
Feature activation+8.63e-02

Pos logit contributions
old+0.008
 him+0.007
 better+0.006
 safer+0.006
 doctor+0.006
Neg logit contributions
 antenna-0.010
nam-0.010
yt-0.010
rys-0.010
 stub-0.010
Token was
Token ID373
Feature activation+1.44e-02

Pos logit contributions
 antenna+0.058
 stub+0.054
nam+0.054
yt+0.053
rys+0.053
Neg logit contributions
old-0.049
 him-0.041
 safer-0.041
 doctor-0.040
 Doctor-0.040
Token three
Token ID1115
Feature activation-6.09e-02

Pos logit contributions
old+0.023
 him+0.020
 safer+0.019
 better+0.019
 doctor+0.019
Neg logit contributions
 antenna-0.026
nam-0.026
 stub-0.025
rys-0.024
yt-0.024
Token-
Token ID12
Feature activation-4.97e-02

Pos logit contributions
 antenna+0.091
nam+0.085
yt+0.083
rys+0.082
 stub+0.081
Neg logit contributions
old-0.120
 him-0.106
 better-0.104
 safer-0.103
 doctor-0.103
Tokenyears
Token ID19002
Feature activation-1.57e-01

Pos logit contributions
 antenna+0.043
nam+0.040
yt+0.036
 stub+0.036
rys+0.036
Neg logit contributions
old-0.126
 him-0.117
 safer-0.116
 better-0.116
 doctor-0.115
Token-
Token ID12
Feature activation-4.96e-02

Pos logit contributions
 antenna+0.313
nam+0.296
yt+0.293
rys+0.289
 stub+0.286
Neg logit contributions
old-0.228
 him-0.193
 better-0.188
 safer-0.187
 doctor-0.185
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.127
 antenna+0.122
 stub+0.119
rys+0.118
yt+0.117
Neg logit contributions
 safer-0.032
 better-0.031
 him-0.030
old-0.030
 doctor-0.029
Token.
Token ID13
Feature activation+2.16e-02

Pos logit contributions
old+0.272
Here+0.226
 safer+0.225
 him+0.213
 better+0.213
Neg logit contributions
 antenna-0.467
nam-0.442
yt-0.439
 stub-0.419
rys-0.415
Token R
Token ID371
Feature activation-5.43e-02

Pos logit contributions
old+0.032
 him+0.027
 better+0.026
 safer+0.026
 doctor+0.026
Neg logit contributions
 antenna-0.043
nam-0.041
yt-0.040
rys-0.040
 stub-0.039
Tokenia
Token ID544
Feature activation-1.52e-02

Pos logit contributions
 antenna+0.104
yt+0.097
rys+0.096
nam+0.095
 stub+0.094
Neg logit contributions
old-0.084
 him-0.073
 better-0.072
 safer-0.071
 doctor-0.070
Token was
Token ID373
Feature activation+1.47e-02

Pos logit contributions
 antenna+0.032
nam+0.031
yt+0.030
rys+0.030
ifying+0.029
Neg logit contributions
old-0.021
 him-0.018
 better-0.017
 doctor-0.017
 safer-0.016
Token very
Token ID845
Feature activation-3.51e-02

Pos logit contributions
old+0.024
 him+0.020
 safer+0.019
 better+0.019
five+0.019
Neg logit contributions
 antenna-0.027
nam-0.027
 stub-0.025
rys-0.025
yt-0.025
Token eager
Token ID11069
Feature activation-3.59e-02

Pos logit contributions
nam+0.207
 antenna+0.207
rys+0.198
yt+0.198
 stub+0.189
Neg logit contributions
old-0.152
 him-0.131
 safer-0.130
 better-0.127
Here-0.127
Token 3
Token ID513
Feature activation-7.76e-02

Pos logit contributions
 antenna+0.069
yt+0.065
nam+0.062
 stub+0.062
rys+0.062
Neg logit contributions
old-0.055
 safer-0.048
Here-0.046
 him-0.046
 better-0.045
Token-
Token ID12
Feature activation-5.01e-02

Pos logit contributions
 antenna+0.120
nam+0.112
yt+0.109
rys+0.108
 stub+0.107
Neg logit contributions
old-0.151
 him-0.132
 better-0.129
 doctor-0.128
 safer-0.128
Tokenyear
Token ID1941
Feature activation-6.83e-02

Pos logit contributions
nam+0.114
 antenna+0.107
 stub+0.107
yt+0.104
rys+0.104
Neg logit contributions
 safer-0.046
 better-0.043
 him-0.042
 doctor-0.041
 welcome-0.041
Token-
Token ID12
Feature activation-4.95e-02

Pos logit contributions
 antenna+0.142
nam+0.136
yt+0.132
 stub+0.132
rys+0.131
Neg logit contributions
old-0.097
 him-0.079
 better-0.077
 doctor-0.076
 safer-0.075
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.142
 antenna+0.134
 stub+0.134
yt+0.134
spe+0.132
Neg logit contributions
 safer-0.015
 better-0.011
 cleaner-0.009
 doctor-0.009
 Doctor-0.009
Token boy
Token ID2933
Feature activation+8.89e-02

Pos logit contributions
old+0.268
 safer+0.260
Here+0.247
 better+0.232
orting+0.228
Neg logit contributions
 antenna-0.440
nam-0.425
yt-0.424
 stub-0.402
spe-0.384
Token who
Token ID508
Feature activation+8.29e-02

Pos logit contributions
old+0.112
 safer+0.099
 better+0.092
 Doctor+0.092
Here+0.092
Neg logit contributions
 antenna-0.184
nam-0.175
 stub-0.172
yt-0.172
rys-0.170
Token loved
Token ID6151
Feature activation+1.33e-02

Pos logit contributions
old+0.144
 safer+0.144
 Doctor+0.135
orting+0.135
 cleaner+0.131
Neg logit contributions
 antenna-0.113
 stub-0.108
nam-0.106
 magn-0.100
 stumbled-0.099
Token to
Token ID284
Feature activation-1.49e-02

Pos logit contributions
old+0.020
 safer+0.017
 better+0.017
 doctor+0.016
 welcome+0.016
Neg logit contributions
 antenna-0.026
nam-0.025
yt-0.024
 stub-0.024
rys-0.023
Token explore
Token ID7301
Feature activation-2.41e-02

Pos logit contributions
 antenna+0.035
nam+0.034
yt+0.033
rys+0.033
 stub+0.033
Neg logit contributions
old-0.016
 safer-0.012
 him-0.012
 better-0.012
 doctor-0.012
Token a
Token ID257
Feature activation-1.91e-02

Pos logit contributions
 antenna+0.012
nam+0.011
 stub+0.011
yt+0.011
rys+0.011
Neg logit contributions
old-0.007
 safer-0.005
 better-0.005
 doctor-0.005
 him-0.005
Token three
Token ID1115
Feature activation-6.03e-02

Pos logit contributions
 antenna+0.035
yt+0.033
nam+0.033
 stub+0.033
rys+0.032
Neg logit contributions
old-0.032
 him-0.027
 safer-0.025
five-0.024
 better-0.024
Token-
Token ID12
Feature activation-5.04e-02

Pos logit contributions
 antenna+0.119
nam+0.113
ifying+0.108
 stub+0.107
rys+0.107
Neg logit contributions
old-0.096
 doctor-0.078
 him-0.078
 ice-0.078
 welcome-0.076
Tokenyear
Token ID1941
Feature activation-6.85e-02

Pos logit contributions
nam+0.071
 antenna+0.068
 stub+0.066
yt+0.064
rys+0.063
Neg logit contributions
 safer-0.088
old-0.087
 better-0.085
 doctor-0.085
 Doctor-0.085
Token-
Token ID12
Feature activation-5.02e-02

Pos logit contributions
 antenna+0.153
nam+0.153
ifying+0.147
 stub+0.144
 nowhere+0.142
Neg logit contributions
old-0.091
 ice-0.067
 doctor-0.067
 him-0.066
 welcome-0.065
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.146
 antenna+0.142
yt+0.142
 stub+0.142
rys+0.138
Neg logit contributions
 safer-0.010
 Doctor-0.008
 better-0.007
 doctor-0.006
 welcome-0.005
Token boy
Token ID2933
Feature activation+8.94e-02

Pos logit contributions
old+0.271
 safer+0.254
 better+0.233
Here+0.221
 him+0.219
Neg logit contributions
 antenna-0.439
yt-0.429
 stub-0.421
nam-0.413
rys-0.402
Token was
Token ID373
Feature activation+1.37e-02

Pos logit contributions
old+0.163
 safer+0.157
 better+0.149
 doctor+0.146
 Doctor+0.146
Neg logit contributions
 antenna-0.127
 stub-0.125
yt-0.122
rys-0.120
 magn-0.117
Token very
Token ID845
Feature activation-3.55e-02

Pos logit contributions
old+0.021
 him+0.018
 safer+0.018
 better+0.017
 doctor+0.017
Neg logit contributions
 antenna-0.026
nam-0.025
 stub-0.024
yt-0.024
rys-0.024
Token excited
Token ID6568
Feature activation-3.60e-02

Pos logit contributions
nam+0.059
 antenna+0.058
 stub+0.057
yt+0.056
rys+0.054
Neg logit contributions
old-0.060
 him-0.054
 safer-0.052
Here-0.052
orting-0.051
Token.
Token ID13
Feature activation+2.20e-02

Pos logit contributions
 antenna+0.074
nam+0.070
yt+0.070
rys+0.068
 stub+0.068
Neg logit contributions
old-0.051
 him-0.042
 safer-0.041
 better-0.041
 doctor-0.041
Token little
Token ID1310
Feature activation+1.71e-02

Pos logit contributions
 antenna+0.037
nam+0.035
 stub+0.035
yt+0.035
rys+0.035
Neg logit contributions
old-0.030
 him-0.026
 safer-0.023
 choose-0.023
 better-0.023
Token 3
Token ID513
Feature activation-7.76e-02

Pos logit contributions
old+0.020
 him+0.016
 safer+0.015
 better+0.015
 doctor+0.014
Neg logit contributions
 antenna-0.039
nam-0.038
yt-0.037
rys-0.037
 stub-0.037
Token-
Token ID12
Feature activation-4.97e-02

Pos logit contributions
 antenna+0.148
nam+0.143
ifying+0.136
rys+0.133
yt+0.133
Neg logit contributions
old-0.129
 him-0.108
 doctor-0.104
 better-0.101
 welcome-0.101
Tokenyear
Token ID1941
Feature activation-6.79e-02

Pos logit contributions
nam+0.073
 antenna+0.073
 stub+0.069
rys+0.067
ho+0.067
Neg logit contributions
 safer-0.082
 better-0.079
old-0.079
 him-0.078
 doctor-0.077
Token-
Token ID12
Feature activation-4.91e-02

Pos logit contributions
 antenna+0.151
nam+0.150
ifying+0.146
 stub+0.141
rys+0.139
Neg logit contributions
old-0.091
 him-0.069
 doctor-0.068
 ice-0.067
 better-0.066
Tokenold
Token ID727
Feature activation+2.15e-01

Pos logit contributions
nam+0.136
 antenna+0.135
ho+0.134
 stub+0.134
yt+0.131
Neg logit contributions
 safer-0.014
 better-0.011
 welcome-0.009
 cleaner-0.009
 Doctor-0.009
Token girl
Token ID2576
Feature activation+3.45e-02

Pos logit contributions
old+0.264
 safer+0.253
 better+0.228
Here+0.225
 choose+0.219
Neg logit contributions
 antenna-0.443
yt-0.429
 stub-0.425
nam-0.408
rys-0.402
Token was
Token ID373
Feature activation+1.46e-02

Pos logit contributions
old+0.057
 safer+0.050
 him+0.050
 better+0.049
 doctor+0.048
Neg logit contributions
 antenna-0.061
yt-0.057
 stub-0.056
nam-0.056
rys-0.056
Token on
Token ID319
Feature activation-1.10e-01

Pos logit contributions
old+0.023
 him+0.020
 doctor+0.020
 better+0.020
 safer+0.019
Neg logit contributions
 antenna-0.028
yt-0.027
nam-0.026
rys-0.026
 tentacles-0.025
Token the
Token ID262
Feature activation+3.68e-02

Pos logit contributions
 antenna+0.257
nam+0.245
yt+0.244
rys+0.239
 stub+0.238
Neg logit contributions
old-0.123
 him-0.101
 better-0.096
 doctor-0.095
 safer-0.094
Token playground
Token ID24817
Feature activation+4.80e-02

Pos logit contributions
old+0.053
 him+0.044
 better+0.043
 safer+0.043
 doctor+0.042
Neg logit contributions
 antenna-0.076
nam-0.072
yt-0.071
rys-0.070
 stub-0.069
Token never
Token ID1239
Feature activation-1.38e-01

Pos logit contributions
old+0.000
 him+0.000
 better+0.000
 safer+0.000
 doctor+0.000
Neg logit contributions
 antenna-0.000
yt-0.000
nam-0.000
rys-0.000
 stub-0.000
Token played
Token ID2826
Feature activation+6.05e-02

Pos logit contributions
 antenna+0.249
nam+0.237
yt+0.235
rys+0.231
ifying+0.224
Neg logit contributions
old-0.225
 better-0.199
 him-0.196
 safer-0.189
 doctor-0.187
Token with
Token ID351
Feature activation-5.61e-02

Pos logit contributions
old+0.116
 say+0.088
Here+0.085
five+0.084
 him+0.083
Neg logit contributions
 antenna-0.095
nam-0.090
 nowhere-0.087
rys-0.085
 stub-0.085
Token the
Token ID262
Feature activation+1.34e-02

Pos logit contributions
nam+0.116
 antenna+0.116
 tentacles+0.106
 nowhere+0.106
 stub+0.103
Neg logit contributions
old-0.078
five-0.061
 welcome-0.059
Here-0.057
 doctor-0.056
Token or
Token ID393
Feature activation+4.10e-02

Pos logit contributions
old+0.019
 him+0.015
five+0.014
 welcome+0.013
 Doctor+0.013
Neg logit contributions
 antenna-0.029
nam-0.027
 stub-0.026
 magn-0.026
yt-0.026
Tokennam
Token ID7402
Feature activation-3.04e-01

Pos logit contributions
old+0.052
 better+0.047
 him+0.047
 safer+0.046
 doctor+0.045
Neg logit contributions
 antenna-0.088
yt-0.086
rys-0.082
 tentacles-0.080
 stub-0.079
Tokenents
Token ID658
Feature activation-5.31e-02

Pos logit contributions
 antenna+0.559
rys+0.535
 magn+0.531
 stub+0.512
yt+0.496
Neg logit contributions
old-0.525
 him-0.421
 better-0.399
 choose-0.387
 safer-0.387
Token again
Token ID757
Feature activation+3.35e-02

Pos logit contributions
 antenna+0.105
nam+0.101
yt+0.100
 stub+0.098
rys+0.096
Neg logit contributions
old-0.077
Here-0.061
 Doctor-0.061
 welcome-0.060
 him-0.060
Token.
Token ID13
Feature activation+7.21e-03

Pos logit contributions
old+0.037
 him+0.028
 safer+0.027
 better+0.027
 doctor+0.027
Neg logit contributions
 antenna-0.080
nam-0.076
yt-0.076
rys-0.074
 stub-0.074
Token They
Token ID1119
Feature activation+1.59e-02

Pos logit contributions
old+0.010
 better+0.008
 him+0.008
 safer+0.008
 cream+0.008
Neg logit contributions
 antenna-0.015
nam-0.014
yt-0.014
rys-0.014
 stub-0.014
Token still
Token ID991
Feature activation+1.16e-02

Pos logit contributions
old+0.025
 him+0.021
 better+0.021
 safer+0.021
 doctor+0.021
Neg logit contributions
 antenna-0.031
yt-0.029
nam-0.029
rys-0.028
 stub-0.028
Token the
Token ID262
Feature activation+1.81e-02

Pos logit contributions
old+0.037
 doctor+0.029
 safer+0.028
 him+0.028
 better+0.028
Neg logit contributions
 antenna-0.062
nam-0.059
yt-0.057
rys-0.057
 tentacles-0.056
Token rest
Token ID1334
Feature activation+8.61e-02

Pos logit contributions
old+0.030
 better+0.027
 him+0.026
 safer+0.026
 doctor+0.026
Neg logit contributions
 antenna-0.031
nam-0.030
yt-0.030
rys-0.029
 stub-0.028
Token of
Token ID286
Feature activation+8.48e-02

Pos logit contributions
old+0.079
 him+0.059
 better+0.056
 safer+0.056
 doctor+0.055
Neg logit contributions
 antenna-0.219
nam-0.210
yt-0.208
rys-0.205
 stub-0.204
Token the
Token ID262
Feature activation+1.82e-02

Pos logit contributions
old+0.099
 him+0.079
 better+0.076
 safer+0.076
 doctor+0.075
Neg logit contributions
 antenna-0.196
nam-0.187
yt-0.184
rys-0.182
 stub-0.181
Token or
Token ID393
Feature activation+4.29e-02

Pos logit contributions
old+0.029
 better+0.026
 doctor+0.025
 safer+0.025
 him+0.025
Neg logit contributions
 antenna-0.033
nam-0.031
yt-0.031
rys-0.030
 stub-0.030
Tokennam
Token ID7402
Feature activation-3.02e-01

Pos logit contributions
old+0.063
 him+0.057
 better+0.056
 safer+0.055
 doctor+0.053
Neg logit contributions
 antenna-0.084
yt-0.080
rys-0.079
 stub-0.076
nam-0.076
Tokenents
Token ID658
Feature activation-4.87e-02

Pos logit contributions
 antenna+0.585
rys+0.582
 magn+0.573
 stub+0.553
yt+0.529
Neg logit contributions
old-0.493
 him-0.374
 better-0.353
 ice-0.342
 choose-0.338
Token onto
Token ID4291
Feature activation-9.19e-02

Pos logit contributions
 antenna+0.113
nam+0.108
yt+0.106
rys+0.105
 stub+0.104
Neg logit contributions
old-0.055
 him-0.044
 better-0.043
 safer-0.043
 doctor-0.042
Token the
Token ID262
Feature activation+1.68e-02

Pos logit contributions
 antenna+0.208
nam+0.197
yt+0.195
rys+0.192
 stub+0.191
Neg logit contributions
old-0.111
 him-0.086
 safer-0.085
 better-0.085
 doctor-0.084
Token tree
Token ID5509
Feature activation-4.84e-02

Pos logit contributions
old+0.028
 better+0.025
 him+0.025
 doctor+0.024
 safer+0.024
Neg logit contributions
 antenna-0.029
yt-0.028
nam-0.028
rys-0.028
 stub-0.027
Token.
Token ID13
Feature activation+1.19e-02

Pos logit contributions
 antenna+0.117
nam+0.112
yt+0.111
rys+0.110
 stub+0.109
Neg logit contributions
old-0.050
 him-0.040
 better-0.038
 safer-0.038
 doctor-0.037
Token he
Token ID339
Feature activation+4.51e-03

Pos logit contributions
old+0.006
 him+0.005
 better+0.005
 doctor+0.005
 safer+0.005
Neg logit contributions
 antenna-0.010
nam-0.010
yt-0.010
rys-0.010
 stub-0.010
Token especially
Token ID2592
Feature activation+2.94e-02

Pos logit contributions
old+0.007
 him+0.006
 better+0.006
 safer+0.006
 doctor+0.006
Neg logit contributions
 antenna-0.008
nam-0.008
yt-0.008
rys-0.008
 stub-0.008
Token liked
Token ID8288
Feature activation+8.02e-03

Pos logit contributions
old+0.050
 him+0.043
 doctor+0.043
 safer+0.043
 Doctor+0.042
Neg logit contributions
 antenna-0.050
nam-0.049
 stub-0.046
yt-0.044
rys-0.044
Token the
Token ID262
Feature activation+2.51e-02

Pos logit contributions
old+0.013
 safer+0.010
 better+0.010
 welcome+0.010
 doctor+0.010
Neg logit contributions
 antenna-0.015
nam-0.014
yt-0.014
 stub-0.014
rys-0.013
Token or
Token ID393
Feature activation+4.64e-02

Pos logit contributions
old+0.041
 him+0.033
five+0.032
 safer+0.031
orting+0.031
Neg logit contributions
 antenna-0.047
nam-0.046
yt-0.043
 stub-0.043
 magn-0.043
Tokennam
Token ID7402
Feature activation-3.00e-01

Pos logit contributions
old+0.068
 him+0.059
 better+0.058
 safer+0.057
 doctor+0.056
Neg logit contributions
 antenna-0.092
yt-0.087
rys-0.086
nam-0.085
 stub-0.084
Tokenents
Token ID658
Feature activation-4.60e-02

Pos logit contributions
 antenna+0.557
rys+0.539
 magn+0.533
 stub+0.510
yt+0.486
Neg logit contributions
old-0.512
 him-0.416
 better-0.386
 choose-0.379
 ice-0.378
Token he
Token ID339
Feature activation+4.47e-03

Pos logit contributions
 antenna+0.097
nam+0.094
yt+0.092
 stub+0.090
ifying+0.088
Neg logit contributions
old-0.059
 Doctor-0.047
Here-0.047
 safer-0.046
 doctor-0.046
Token kept
Token ID4030
Feature activation+2.66e-02

Pos logit contributions
old+0.008
 him+0.007
 safer+0.007
 doctor+0.007
 better+0.007
Neg logit contributions
 antenna-0.007
nam-0.006
 stub-0.006
yt-0.006
rys-0.006
Token around
Token ID1088
Feature activation-2.94e-02

Pos logit contributions
old+0.039
 him+0.032
 safer+0.032
 doctor+0.031
 better+0.031
Neg logit contributions
 antenna-0.053
nam-0.051
yt-0.050
rys-0.049
 stub-0.049
Token.
Token ID13
Feature activation+1.66e-02

Pos logit contributions
 antenna+0.068
nam+0.065
yt+0.064
rys+0.063
 stub+0.062
Neg logit contributions
old-0.034
 him-0.027
 better-0.026
 safer-0.026
 doctor-0.026
Token a
Token ID257
Feature activation-3.44e-02

Pos logit contributions
 antenna+0.319
rys+0.309
yt+0.309
nam+0.304
 stub+0.295
Neg logit contributions
old-0.220
 him-0.190
 better-0.188
 safer-0.184
 ice-0.181
Token box
Token ID3091
Feature activation-4.08e-02

Pos logit contributions
 antenna+0.069
nam+0.065
yt+0.065
rys+0.063
 stub+0.063
Neg logit contributions
old-0.051
 him-0.043
 better-0.041
 safer-0.041
 doctor-0.040
Token of
Token ID286
Feature activation+8.33e-02

Pos logit contributions
 antenna+0.096
nam+0.092
yt+0.091
rys+0.090
 stub+0.089
Neg logit contributions
old-0.045
 him-0.036
 better-0.035
 safer-0.034
 doctor-0.034
Token shiny
Token ID22441
Feature activation-3.14e-02

Pos logit contributions
old+0.133
 him+0.108
 Doctor+0.107
 safer+0.107
 doctor+0.106
Neg logit contributions
 antenna-0.157
nam-0.155
yt-0.141
 magn-0.138
 tentacles-0.137
Token or
Token ID393
Feature activation+4.53e-02

Pos logit contributions
 antenna+0.057
nam+0.053
yt+0.053
rys+0.051
 stub+0.051
Neg logit contributions
old-0.053
 him-0.045
 safer-0.044
 better-0.044
 doctor-0.043
Tokennam
Token ID7402
Feature activation-3.00e-01

Pos logit contributions
 better+0.095
old+0.093
 ice+0.092
 safer+0.091
 him+0.090
Neg logit contributions
rys-0.058
 antenna-0.056
yt-0.054
 stub-0.052
ho-0.050
Tokenents
Token ID658
Feature activation-4.59e-02

Pos logit contributions
 antenna+0.589
rys+0.586
 magn+0.572
 stub+0.560
 tentacles+0.521
Neg logit contributions
old-0.468
 him-0.359
 better-0.344
 choose-0.334
 ice-0.333
Token that
Token ID326
Feature activation+3.37e-02

Pos logit contributions
 antenna+0.100
nam+0.095
yt+0.094
rys+0.093
 stub+0.092
Neg logit contributions
old-0.059
 him-0.049
 better-0.048
 safer-0.047
 doctor-0.046
Token spark
Token ID9009
Feature activation-9.24e-02

Pos logit contributions
old+0.050
 him+0.042
 safer+0.041
 better+0.041
 doctor+0.040
Neg logit contributions
 antenna-0.067
nam-0.063
yt-0.062
rys-0.061
 stub-0.061
Tokenled
Token ID992
Feature activation-9.91e-02

Pos logit contributions
 antenna+0.139
yt+0.130
nam+0.126
rys+0.121
ifying+0.119
Neg logit contributions
old-0.178
 safer-0.159
 welcome-0.158
 Doctor-0.157
 doctor-0.156
Token in
Token ID287
Feature activation-8.96e-02

Pos logit contributions
 antenna+0.197
nam+0.186
yt+0.184
rys+0.182
 stub+0.180
Neg logit contributions
old-0.146
 him-0.124
 better-0.121
 safer-0.120
 doctor-0.119
Token All
Token ID1439
Feature activation-2.80e-03

Pos logit contributions
old+0.025
 better+0.022
 safer+0.022
 cleaner+0.021
 cream+0.021
Neg logit contributions
 antenna-0.030
nam-0.029
 nowhere-0.028
 stub-0.027
 tentacles-0.027
Token of
Token ID286
Feature activation+8.68e-02

Pos logit contributions
 antenna+0.006
nam+0.006
rys+0.006
 stub+0.006
yt+0.006
Neg logit contributions
old-0.003
 safer-0.002
 doctor-0.002
 better-0.002
 Doctor-0.002
Token these
Token ID777
Feature activation-1.54e-02

Pos logit contributions
old+0.112
 safer+0.088
 better+0.086
five+0.085
 doctor+0.085
Neg logit contributions
 antenna-0.188
nam-0.181
 nowhere-0.173
yt-0.172
 tentacles-0.171
Token lovely
Token ID14081
Feature activation+3.93e-02

Pos logit contributions
 antenna+0.030
nam+0.029
yt+0.027
 stub+0.027
rys+0.026
Neg logit contributions
old-0.023
 Doctor-0.019
 safer-0.019
 doctor-0.019
 him-0.019
Token or
Token ID393
Feature activation+4.92e-02

Pos logit contributions
old+0.046
 him+0.037
 better+0.035
 doctor+0.035
 safer+0.034
Neg logit contributions
 antenna-0.090
nam-0.086
yt-0.085
rys-0.085
 stub-0.084
Tokennam
Token ID7402
Feature activation-2.97e-01

Pos logit contributions
old+0.125
 him+0.117
 better+0.115
 safer+0.114
 doctor+0.113
Neg logit contributions
 antenna-0.044
rys-0.039
yt-0.039
 stub-0.037
nam-0.034
Tokenents
Token ID658
Feature activation-4.41e-02

Pos logit contributions
 antenna+0.579
rys+0.573
 magn+0.563
 stub+0.544
yt+0.508
Neg logit contributions
old-0.489
 him-0.381
 better-0.342
 ice-0.340
 choose-0.338
Token reminded
Token ID14516
Feature activation-6.26e-02

Pos logit contributions
 antenna+0.084
nam+0.079
yt+0.078
 stub+0.076
rys+0.075
Neg logit contributions
old-0.068
 safer-0.056
 doctor-0.056
 Doctor-0.056
 him-0.056
Token him
Token ID683
Feature activation+1.24e-01

Pos logit contributions
 antenna+0.170
nam+0.163
 tentacles+0.157
 nowhere+0.156
rys+0.156
Neg logit contributions
old-0.042
 safer-0.029
 doctor-0.025
 better-0.025
five-0.025
Token of
Token ID286
Feature activation+8.81e-02

Pos logit contributions
old+0.140
Here+0.117
 safer+0.116
 him+0.112
 Doctor+0.112
Neg logit contributions
 antenna-0.275
nam-0.259
yt-0.255
 stub-0.252
rys-0.251
Token the
Token ID262
Feature activation+2.71e-02

Pos logit contributions
old+0.110
 safer+0.087
 better+0.085
 doctor+0.085
 him+0.084
Neg logit contributions
 antenna-0.196
nam-0.188
yt-0.180
rys-0.178
 stub-0.176
Token tree
Token ID5509
Feature activation-4.70e-02

Pos logit contributions
 antenna+0.074
nam+0.070
yt+0.069
rys+0.068
 stub+0.067
Neg logit contributions
old-0.063
 him-0.054
 better-0.052
 safer-0.052
 doctor-0.052
Token with
Token ID351
Feature activation-5.27e-02

Pos logit contributions
 antenna+0.113
nam+0.109
rys+0.106
yt+0.105
 stub+0.105
Neg logit contributions
old-0.050
 him-0.040
 better-0.040
 safer-0.039
 doctor-0.038
Token lights
Token ID7588
Feature activation+1.71e-02

Pos logit contributions
 antenna+0.103
yt+0.098
rys+0.097
nam+0.097
 stub+0.095
Neg logit contributions
old-0.078
 him-0.069
 better-0.066
 safer-0.065
 doctor-0.065
Token and
Token ID290
Feature activation-1.85e-03

Pos logit contributions
old+0.020
 him+0.016
 better+0.016
 safer+0.015
 doctor+0.015
Neg logit contributions
 antenna-0.039
nam-0.038
yt-0.037
rys-0.037
 stub-0.036
Token or
Token ID393
Feature activation+4.38e-02

Pos logit contributions
 antenna+0.004
rys+0.004
yt+0.004
nam+0.004
 stub+0.004
Neg logit contributions
old-0.003
 him-0.002
 better-0.002
 safer-0.002
 ice-0.002
Tokennam
Token ID7402
Feature activation-2.97e-01

Pos logit contributions
old+0.060
 him+0.052
 better+0.051
 safer+0.050
 doctor+0.049
Neg logit contributions
 antenna-0.090
yt-0.086
rys-0.085
nam-0.083
 stub-0.083
Tokenents
Token ID658
Feature activation-4.66e-02

Pos logit contributions
rys+0.581
 antenna+0.576
 magn+0.574
 stub+0.554
ho+0.509
Neg logit contributions
old-0.457
 him-0.368
 ice-0.333
 better-0.332
 choose-0.326
Token.
Token ID13
Feature activation+1.19e-02

Pos logit contributions
 antenna+0.104
nam+0.099
yt+0.098
rys+0.097
 stub+0.096
Neg logit contributions
old-0.057
 him-0.046
 safer-0.044
 better-0.044
 doctor-0.044
Token They
Token ID1119
Feature activation+2.45e-02

Pos logit contributions
old+0.016
 better+0.013
 safer+0.013
 him+0.012
 welcome+0.012
Neg logit contributions
 antenna-0.026
nam-0.025
yt-0.024
 stub-0.023
rys-0.023
Token also
Token ID635
Feature activation+1.74e-02

Pos logit contributions
old+0.031
 him+0.029
 like+0.027
 say+0.027
 choose+0.027
Neg logit contributions
rys-0.056
 antenna-0.051
yt-0.050
ifying-0.050
 stub-0.049
Token saw
Token ID2497
Feature activation+3.01e-02

Pos logit contributions
old+0.022
 like+0.019
 him+0.018
 say+0.018
 better+0.018
Neg logit contributions
rys-0.040
ifying-0.037
 antenna-0.037
yt-0.037
 tentacles-0.036
Token He
Token ID679
Feature activation+4.88e-02

Pos logit contributions
old+0.011
 better+0.009
 safer+0.009
 him+0.009
 ice+0.009
Neg logit contributions
 antenna-0.015
nam-0.014
 stub-0.014
 nowhere-0.013
yt-0.013
Token quickly
Token ID2952
Feature activation-5.66e-02

Pos logit contributions
old+0.077
 him+0.064
 better+0.064
 doctor+0.063
 ice+0.063
Neg logit contributions
rys-0.093
nam-0.090
 antenna-0.090
yt-0.086
ifying-0.086
Token hung
Token ID9174
Feature activation+5.79e-02

Pos logit contributions
 antenna+0.098
nam+0.096
rys+0.092
yt+0.090
 stub+0.089
Neg logit contributions
old-0.098
 him-0.084
 better-0.083
 safer-0.082
 doctor-0.081
Token colorful
Token ID20239
Feature activation+1.82e-03

Pos logit contributions
old+0.079
 safer+0.061
 him+0.060
 better+0.060
 doctor+0.060
Neg logit contributions
 antenna-0.125
nam-0.119
yt-0.117
rys-0.114
 stub-0.113
Token or
Token ID393
Feature activation+4.00e-02

Pos logit contributions
old+0.003
 him+0.003
 safer+0.003
 welcome+0.003
 doctor+0.003
Neg logit contributions
 antenna-0.003
yt-0.003
nam-0.003
rys-0.003
 tentacles-0.003
Tokennam
Token ID7402
Feature activation-2.96e-01

Pos logit contributions
 better+0.090
old+0.088
 safer+0.088
 him+0.085
 like+0.083
Neg logit contributions
rys-0.043
 antenna-0.042
yt-0.039
 stub-0.038
ho-0.036
Tokenents
Token ID658
Feature activation-4.79e-02

Pos logit contributions
 antenna+0.538
rys+0.532
 stub+0.511
 magn+0.510
nam+0.483
Neg logit contributions
old-0.493
 him-0.413
 better-0.391
 ice-0.375
 safer-0.368
Token and
Token ID290
Feature activation-1.74e-03

Pos logit contributions
 antenna+0.085
yt+0.080
nam+0.078
 stub+0.074
ifying+0.074
Neg logit contributions
old-0.082
 Doctor-0.068
 doctor-0.067
Here-0.067
 welcome-0.067
Token tw
Token ID665
Feature activation-1.46e-01

Pos logit contributions
rys+0.003
 antenna+0.003
yt+0.003
 stub+0.003
nam+0.003
Neg logit contributions
old-0.003
 him-0.003
 better-0.003
 ice-0.002
 safer-0.002
Tokenink
Token ID676
Feature activation-1.89e-01

Pos logit contributions
 antenna+0.156
nam+0.144
 stub+0.136
yt+0.135
rys+0.134
Neg logit contributions
old-0.353
 him-0.321
 better-0.309
 safer-0.308
 doctor-0.306
Tokenling
Token ID1359
Feature activation-8.79e-02

Pos logit contributions
 antenna+0.320
nam+0.306
rys+0.297
yt+0.296
 stub+0.289
Neg logit contributions
old-0.334
 him-0.295
 better-0.289
 safer-0.282
 doctor-0.281
Token can
Token ID460
Feature activation+7.88e-02

Pos logit contributions
 antenna+0.052
nam+0.049
yt+0.049
rys+0.048
 stub+0.047
Neg logit contributions
old-0.029
 him-0.024
 better-0.023
 safer-0.023
 doctor-0.023
Token we
Token ID356
Feature activation-7.84e-02

Pos logit contributions
old+0.109
 him+0.099
 doctor+0.094
 better+0.093
 welcome+0.091
Neg logit contributions
 antenna-0.164
yt-0.155
rys-0.150
 stub-0.149
nam-0.149
Token have
Token ID423
Feature activation-1.66e-02

Pos logit contributions
 antenna+0.180
nam+0.172
yt+0.170
rys+0.168
 stub+0.166
Neg logit contributions
old-0.092
 him-0.074
 better-0.071
 safer-0.071
 doctor-0.070
Token some
Token ID617
Feature activation-2.64e-02

Pos logit contributions
 antenna+0.038
yt+0.037
rys+0.036
 stub+0.036
nam+0.035
Neg logit contributions
old-0.019
 him-0.017
 better-0.017
 ice-0.016
 safer-0.016
Token or
Token ID393
Feature activation+4.04e-02

Pos logit contributions
 antenna+0.057
nam+0.054
yt+0.053
rys+0.052
 stub+0.052
Neg logit contributions
old-0.035
 him-0.029
 safer-0.027
 doctor-0.027
 better-0.027
Tokennam
Token ID7402
Feature activation-2.96e-01

Pos logit contributions
old+0.067
 ice+0.067
 him+0.066
 better+0.066
 doctor+0.065
Neg logit contributions
 antenna-0.068
yt-0.067
rys-0.066
 stub-0.064
ho-0.061
Tokenents
Token ID658
Feature activation-4.78e-02

Pos logit contributions
 antenna+0.594
rys+0.589
 magn+0.585
 stub+0.558
 tentacles+0.525
Neg logit contributions
old-0.449
 him-0.355
 better-0.330
 choose-0.325
 ice-0.322
Token for
Token ID329
Feature activation+2.26e-02

Pos logit contributions
 antenna+0.102
nam+0.097
yt+0.095
rys+0.094
 stub+0.093
Neg logit contributions
old-0.063
 him-0.053
 better-0.052
 safer-0.051
 doctor-0.051
Token our
Token ID674
Feature activation-2.24e-02

Pos logit contributions
old+0.025
 him+0.024
 better+0.022
 ice+0.021
 doctor+0.021
Neg logit contributions
 antenna-0.052
yt-0.051
rys-0.050
 stub-0.050
 magn-0.048
Token snow
Token ID6729
Feature activation+1.10e-01

Pos logit contributions
 antenna+0.045
yt+0.043
rys+0.043
nam+0.043
 stub+0.042
Neg logit contributions
old-0.031
 him-0.027
 doctor-0.026
 better-0.026
 safer-0.026
Tokenman
Token ID805
Feature activation-6.96e-02

Pos logit contributions
old+0.157
 doctor+0.150
 him+0.147
 better+0.141
 ice+0.139
Neg logit contributions
 antenna-0.215
nam-0.210
rys-0.203
 tentacles-0.193
 stub-0.192
Token to
Token ID284
Feature activation-2.93e-02

Pos logit contributions
 antenna+0.067
nam+0.063
yt+0.062
rys+0.061
 stub+0.061
Neg logit contributions
old-0.049
 him-0.041
 safer-0.040
 better-0.039
 doctor-0.039
Token me
Token ID502
Feature activation+3.65e-02

Pos logit contributions
 antenna+0.063
nam+0.060
yt+0.058
rys+0.058
 stub+0.057
Neg logit contributions
old-0.039
 safer-0.030
 better-0.030
 him-0.030
 doctor-0.030
Token.
Token ID13
Feature activation+2.40e-02

Pos logit contributions
old+0.046
Here+0.038
 him+0.036
 safer+0.035
 doctor+0.034
Neg logit contributions
 antenna-0.078
nam-0.075
yt-0.072
rys-0.071
 stub-0.070
Token The
Token ID383
Feature activation+7.90e-04

Pos logit contributions
old+0.031
 him+0.026
 better+0.025
 safer+0.025
 doctor+0.024
Neg logit contributions
 antenna-0.052
nam-0.049
rys-0.048
yt-0.048
 stub-0.048
Token or
Token ID393
Feature activation+4.86e-02

Pos logit contributions
old+0.001
 him+0.001
 better+0.001
 safer+0.001
 doctor+0.001
Neg logit contributions
 antenna-0.002
nam-0.002
yt-0.002
rys-0.001
 stub-0.001
Tokennam
Token ID7402
Feature activation-2.95e-01

Pos logit contributions
old+0.046
 better+0.045
 safer+0.042
 him+0.042
 ice+0.040
Neg logit contributions
yt-0.119
 antenna-0.117
 stub-0.112
rys-0.112
 tentacles-0.110
Tokenents
Token ID658
Feature activation-4.80e-02

Pos logit contributions
 antenna+0.546
 magn+0.523
rys+0.522
 stub+0.507
yt+0.503
Neg logit contributions
old-0.510
 him-0.410
 better-0.387
 choose-0.380
 doctor-0.374
Token were
Token ID547
Feature activation+8.69e-03

Pos logit contributions
 antenna+0.105
nam+0.098
yt+0.097
rys+0.096
 stub+0.095
Neg logit contributions
old-0.060
 Doctor-0.047
Here-0.045
 him-0.045
 safer-0.045
Token very
Token ID845
Feature activation-5.28e-02

Pos logit contributions
old+0.012
 better+0.011
 safer+0.010
 him+0.010
 doctor+0.010
Neg logit contributions
 antenna-0.018
yt-0.017
rys-0.017
nam-0.017
 stub-0.017
Token important
Token ID1593
Feature activation+5.17e-03

Pos logit contributions
 antenna+0.110
yt+0.105
nam+0.103
rys+0.103
 stub+0.102
Neg logit contributions
old-0.073
 better-0.063
 safer-0.062
 welcome-0.059
 him-0.059
Token to
Token ID284
Feature activation-2.95e-02

Pos logit contributions
old+0.007
 safer+0.005
 him+0.005
Here+0.005
 better+0.005
Neg logit contributions
 antenna-0.011
nam-0.011
yt-0.011
 tentacles-0.010
rys-0.010
Token at
Token ID379
Feature activation-2.45e-02

Pos logit contributions
 antenna+0.069
yt+0.064
nam+0.064
rys+0.062
 stub+0.060
Neg logit contributions
old-0.044
 doctor-0.036
Here-0.036
 him-0.035
 Doctor-0.035
Token all
Token ID477
Feature activation-6.42e-02

Pos logit contributions
 antenna+0.050
nam+0.048
 nowhere+0.046
 tentacles+0.044
yt+0.043
Neg logit contributions
old-0.036
 doctor-0.028
five-0.028
 safer-0.027
 better-0.027
Token the
Token ID262
Feature activation+2.54e-02

Pos logit contributions
 antenna+0.122
nam+0.116
yt+0.115
rys+0.114
 nowhere+0.111
Neg logit contributions
old-0.087
 doctor-0.080
 Doctor-0.076
Here-0.075
 safer-0.075
Token different
Token ID1180
Feature activation+7.35e-02

Pos logit contributions
old+0.038
 him+0.032
 Doctor+0.031
five+0.030
 safer+0.030
Neg logit contributions
 antenna-0.050
nam-0.048
 magn-0.047
yt-0.047
rys-0.045
Token or
Token ID393
Feature activation+4.79e-02

Pos logit contributions
old+0.105
 Doctor+0.088
 safer+0.087
 him+0.087
 doctor+0.087
Neg logit contributions
 antenna-0.150
nam-0.142
yt-0.140
rys-0.134
 tentacles-0.133
Tokennam
Token ID7402
Feature activation-2.95e-01

Pos logit contributions
old+0.118
 him+0.110
 better+0.108
 safer+0.107
 doctor+0.106
Neg logit contributions
 antenna-0.046
yt-0.041
rys-0.041
 stub-0.039
nam-0.038
Tokenents
Token ID658
Feature activation-4.44e-02

Pos logit contributions
 antenna+0.551
rys+0.540
 magn+0.530
 stub+0.517
yt+0.493
Neg logit contributions
old-0.502
 him-0.395
 better-0.365
 ice-0.360
 choose-0.356
Token and
Token ID290
Feature activation+4.09e-03

Pos logit contributions
 antenna+0.095
nam+0.090
yt+0.089
 stub+0.086
rys+0.086
Neg logit contributions
old-0.058
 Doctor-0.046
 him-0.046
 doctor-0.046
 safer-0.045
Token adm
Token ID6178
Feature activation-3.75e-02

Pos logit contributions
old+0.007
 him+0.006
 better+0.006
 safer+0.006
 doctor+0.006
Neg logit contributions
 antenna-0.007
rys-0.007
yt-0.007
 stub-0.007
nam-0.007
Tokeniring
Token ID3428
Feature activation+1.07e-01

Pos logit contributions
 antenna+0.071
nam+0.069
yt+0.069
ifying+0.067
rys+0.066
Neg logit contributions
old-0.056
 Doctor-0.048
 safer-0.048
 say-0.048
five-0.048
Token the
Token ID262
Feature activation+2.27e-02

Pos logit contributions
old+0.138
 him+0.110
 better+0.108
 safer+0.108
 doctor+0.107
Neg logit contributions
 antenna-0.234
nam-0.224
yt-0.220
rys-0.216
 stub-0.213
Token to
Token ID284
Feature activation-2.59e-02

Pos logit contributions
old+0.008
 him+0.006
 safer+0.006
 better+0.006
 doctor+0.006
Neg logit contributions
 antenna-0.012
nam-0.011
yt-0.011
rys-0.011
 stub-0.011
Token look
Token ID804
Feature activation-1.33e-02

Pos logit contributions
 antenna+0.058
nam+0.055
yt+0.055
rys+0.054
 stub+0.054
Neg logit contributions
old-0.032
 him-0.025
 safer-0.025
 better-0.025
 doctor-0.024
Token at
Token ID379
Feature activation-2.60e-02

Pos logit contributions
 antenna+0.024
nam+0.023
yt+0.023
rys+0.022
ifying+0.021
Neg logit contributions
old-0.021
Here-0.018
 safer-0.018
 doctor-0.017
five-0.017
Token the
Token ID262
Feature activation+2.69e-02

Pos logit contributions
 antenna+0.058
nam+0.056
yt+0.054
rys+0.054
 stub+0.053
Neg logit contributions
old-0.032
 safer-0.025
 better-0.025
 him-0.025
 doctor-0.024
Token or
Token ID393
Feature activation+5.25e-02

Pos logit contributions
old+0.045
 him+0.038
 safer+0.038
 better+0.037
 welcome+0.036
Neg logit contributions
 antenna-0.048
nam-0.046
yt-0.044
rys-0.044
 stub-0.043
Tokennam
Token ID7402
Feature activation-2.94e-01

Pos logit contributions
old+0.120
 him+0.120
 better+0.116
 say+0.113
 choose+0.113
Neg logit contributions
 antenna-0.054
yt-0.054
rys-0.049
 tentacles-0.047
chie-0.044
Tokenents
Token ID658
Feature activation-4.10e-02

Pos logit contributions
 antenna+0.548
rys+0.525
 magn+0.518
 stub+0.501
yt+0.490
Neg logit contributions
old-0.505
 him-0.410
 better-0.376
 choose-0.371
 ice-0.368
Token on
Token ID319
Feature activation-1.18e-01

Pos logit contributions
 antenna+0.089
nam+0.085
yt+0.084
rys+0.083
 stub+0.082
Neg logit contributions
old-0.052
 him-0.043
 better-0.041
 safer-0.041
 doctor-0.041
Token the
Token ID262
Feature activation+2.76e-02

Pos logit contributions
 antenna+0.274
nam+0.260
yt+0.260
rys+0.258
 stub+0.256
Neg logit contributions
old-0.133
 him-0.112
 better-0.105
 doctor-0.104
 safer-0.103
Token tree
Token ID5509
Feature activation-4.40e-02

Pos logit contributions
old+0.045
 him+0.041
 better+0.040
 doctor+0.040
 safer+0.040
Neg logit contributions
 antenna-0.048
rys-0.046
yt-0.046
nam-0.045
 stub-0.045
Token.
Token ID13
Feature activation+1.49e-02

Pos logit contributions
 antenna+0.109
yt+0.104
nam+0.104
rys+0.103
 stub+0.102
Neg logit contributions
old-0.043
 him-0.034
 better-0.033
 safer-0.032
 doctor-0.032
Token to
Token ID284
Feature activation-2.92e-02

Pos logit contributions
old+0.059
 safer+0.048
Here+0.046
 better+0.046
 welcome+0.046
Neg logit contributions
 antenna-0.092
yt-0.087
nam-0.085
 stub-0.084
rys-0.083
Token help
Token ID1037
Feature activation+5.87e-02

Pos logit contributions
 antenna+0.068
rys+0.064
yt+0.064
 stub+0.064
nam+0.063
Neg logit contributions
old-0.033
 safer-0.025
 him-0.025
 better-0.024
 doctor-0.024
Token sort
Token ID3297
Feature activation+5.14e-03

Pos logit contributions
old+0.088
Here+0.080
five+0.075
 safer+0.073
Deal+0.072
Neg logit contributions
 antenna-0.112
nam-0.106
rys-0.101
 beating-0.099
yt-0.099
Token the
Token ID262
Feature activation+2.16e-02

Pos logit contributions
old+0.008
 doctor+0.007
 Doctor+0.007
Deal+0.006
 safer+0.006
Neg logit contributions
nam-0.010
 antenna-0.009
yt-0.009
rys-0.009
 magn-0.008
Token or
Token ID393
Feature activation+4.74e-02

Pos logit contributions
old+0.034
 him+0.029
 better+0.028
 safer+0.028
 doctor+0.028
Neg logit contributions
 antenna-0.041
nam-0.039
yt-0.038
rys-0.038
 stub-0.037
Tokennam
Token ID7402
Feature activation-2.94e-01

Pos logit contributions
old+0.102
 him+0.092
 better+0.091
 safer+0.090
 doctor+0.090
Neg logit contributions
 antenna-0.062
yt-0.057
rys-0.056
nam-0.055
 stub-0.054
Tokenents
Token ID658
Feature activation-4.48e-02

Pos logit contributions
 antenna+0.544
rys+0.531
 magn+0.522
 stub+0.512
yt+0.492
Neg logit contributions
old-0.503
 him-0.407
 better-0.387
 choose-0.379
 ice-0.369
Token.
Token ID13
Feature activation+1.38e-02

Pos logit contributions
 antenna+0.097
nam+0.093
yt+0.092
rys+0.089
 stub+0.089
Neg logit contributions
old-0.057
 him-0.045
 Doctor-0.044
 safer-0.044
 doctor-0.044
Token She
Token ID1375
Feature activation+3.90e-03

Pos logit contributions
old+0.020
 him+0.016
 safer+0.016
 better+0.016
 ice+0.016
Neg logit contributions
 antenna-0.028
nam-0.026
yt-0.026
rys-0.025
 stub-0.025
Token picked
Token ID6497
Feature activation+3.51e-03

Pos logit contributions
old+0.006
 better+0.005
 him+0.005
 doctor+0.005
 ice+0.005
Neg logit contributions
 antenna-0.008
rys-0.007
nam-0.007
ifying-0.007
yt-0.007
Token up
Token ID510
Feature activation+6.86e-02

Pos logit contributions
old+0.006
 safer+0.005
 welcome+0.005
 doctor+0.005
 Doctor+0.004
Neg logit contributions
 antenna-0.006
nam-0.006
yt-0.006
 nowhere-0.006
 tentacles-0.005
Token who
Token ID508
Feature activation+7.47e-02

Pos logit contributions
old+0.037
 him+0.030
 doctor+0.030
 better+0.029
 welcome+0.028
Neg logit contributions
 antenna-0.057
nam-0.054
yt-0.053
 tentacles-0.052
rys-0.052
Token liked
Token ID8288
Feature activation+1.82e-02

Pos logit contributions
old+0.122
 Doctor+0.104
 doctor+0.103
 safer+0.102
 him+0.101
Neg logit contributions
 antenna-0.134
 stub-0.130
nam-0.124
rys-0.121
 magn-0.121
Token to
Token ID284
Feature activation-2.29e-02

Pos logit contributions
old+0.024
 him+0.021
 doctor+0.020
 better+0.020
 ice+0.019
Neg logit contributions
 antenna-0.039
yt-0.037
nam-0.036
rys-0.036
 stub-0.035
Token collect
Token ID2824
Feature activation-1.41e-02

Pos logit contributions
 antenna+0.054
nam+0.051
yt+0.051
rys+0.050
 stub+0.050
Neg logit contributions
old-0.026
 him-0.020
 better-0.020
 safer-0.019
 doctor-0.019
Token or
Token ID393
Feature activation+5.54e-02

Pos logit contributions
 antenna+0.028
nam+0.028
yt+0.027
rys+0.026
 tentacles+0.026
Neg logit contributions
old-0.020
 Doctor-0.017
 doctor-0.016
 safer-0.016
 him-0.016
Tokennam
Token ID7402
Feature activation-2.93e-01

Pos logit contributions
old+0.155
 him+0.154
 better+0.151
 choose+0.150
 ice+0.148
Neg logit contributions
 antenna-0.031
yt-0.027
rys-0.025
 stub-0.021
ifying-0.020
Tokenents
Token ID658
Feature activation-3.85e-02

Pos logit contributions
 antenna+0.541
rys+0.515
 magn+0.510
 stub+0.501
yt+0.476
Neg logit contributions
old-0.502
 him-0.422
 better-0.394
 choose-0.379
 ice-0.377
Token.
Token ID13
Feature activation+2.00e-02

Pos logit contributions
 antenna+0.086
nam+0.082
yt+0.081
rys+0.080
 stub+0.079
Neg logit contributions
old-0.047
 him-0.037
 safer-0.037
 doctor-0.036
 better-0.036
Token Everywhere
Token ID48201
Feature activation+1.49e-02

Pos logit contributions
old+0.036
 safer+0.030
 him+0.030
 better+0.029
 cream+0.028
Neg logit contributions
 antenna-0.033
nam-0.032
rys-0.030
 stem-0.029
yt-0.029
Token she
Token ID673
Feature activation-3.62e-02

Pos logit contributions
old+0.020
 him+0.016
 better+0.016
 safer+0.016
 doctor+0.016
Neg logit contributions
 antenna-0.032
nam-0.031
yt-0.030
rys-0.030
 stub-0.030
Token went
Token ID1816
Feature activation-2.67e-03

Pos logit contributions
 antenna+0.060
nam+0.056
yt+0.055
rys+0.055
 stub+0.054
Neg logit contributions
old-0.065
 him-0.057
 better-0.056
 safer-0.055
 doctor-0.055
Token many
Token ID867
Feature activation-5.97e-02

Pos logit contributions
 antenna+0.302
rys+0.287
yt+0.287
nam+0.285
 stub+0.282
Neg logit contributions
old-0.200
 him-0.172
 better-0.168
 safer-0.166
 doctor-0.164
Token colors
Token ID7577
Feature activation-1.26e-02

Pos logit contributions
 antenna+0.114
nam+0.107
yt+0.106
rys+0.105
 stub+0.104
Neg logit contributions
old-0.093
 him-0.079
 better-0.077
 safer-0.077
 doctor-0.076
Token.
Token ID13
Feature activation+1.39e-02

Pos logit contributions
 antenna+0.028
yt+0.027
nam+0.026
rys+0.026
 tentacles+0.025
Neg logit contributions
old-0.016
 Doctor-0.012
 safer-0.012
 him-0.012
 welcome-0.012
Token Some
Token ID2773
Feature activation+1.00e-01

Pos logit contributions
old+0.030
five+0.025
aid+0.025
 cream+0.025
 safer+0.025
Neg logit contributions
 antenna-0.017
nam-0.017
 nowhere-0.016
 stub-0.016
yt-0.015
Token or
Token ID393
Feature activation+4.92e-02

Pos logit contributions
old+0.140
 him+0.117
 better+0.114
 safer+0.113
 doctor+0.112
Neg logit contributions
 antenna-0.207
nam-0.196
yt-0.194
rys-0.191
 stub-0.189
Tokennam
Token ID7402
Feature activation-2.93e-01

Pos logit contributions
old+0.100
 him+0.099
 better+0.098
 ice+0.096
 safer+0.093
Neg logit contributions
 antenna-0.063
yt-0.061
rys-0.060
 stub-0.060
 tentacles-0.053
Tokenents
Token ID658
Feature activation-4.38e-02

Pos logit contributions
rys+0.555
 antenna+0.554
 magn+0.547
 stub+0.536
yt+0.500
Neg logit contributions
old-0.464
 him-0.383
 better-0.347
 choose-0.346
 ice-0.332
Token were
Token ID547
Feature activation+1.49e-02

Pos logit contributions
 antenna+0.082
nam+0.077
yt+0.077
 stub+0.074
rys+0.074
Neg logit contributions
old-0.069
 Doctor-0.057
 safer-0.057
 doctor-0.057
 him-0.057
Token round
Token ID2835
Feature activation-7.60e-02

Pos logit contributions
old+0.020
 better+0.019
 safer+0.018
 him+0.017
 doctor+0.017
Neg logit contributions
 antenna-0.030
rys-0.030
yt-0.029
 stub-0.029
 magn-0.028
Token,
Token ID11
Feature activation-1.47e-02

Pos logit contributions
 antenna+0.169
nam+0.160
yt+0.159
rys+0.155
 stub+0.153
Neg logit contributions
old-0.096
 him-0.076
 safer-0.075
 better-0.073
 doctor-0.073
Token some
Token ID617
Feature activation-2.08e-02

Pos logit contributions
 antenna+0.030
nam+0.029
yt+0.028
rys+0.028
 stub+0.028
Neg logit contributions
old-0.021
 him-0.017
 safer-0.016
 better-0.016
 doctor-0.016
Token time
Token ID640
Feature activation+6.47e-02

Pos logit contributions
old+0.118
 him+0.099
 better+0.097
 safer+0.096
 doctor+0.096
Neg logit contributions
 antenna-0.172
nam-0.164
yt-0.161
rys-0.159
 stub-0.157
Token playing
Token ID2712
Feature activation+9.42e-02

Pos logit contributions
old+0.092
 him+0.077
 better+0.075
 safer+0.074
 doctor+0.074
Neg logit contributions
 antenna-0.132
nam-0.126
yt-0.124
rys-0.122
 stub-0.121
Token with
Token ID351
Feature activation-3.97e-02

Pos logit contributions
old+0.151
 safer+0.123
 better+0.120
 Doctor+0.118
 him+0.118
Neg logit contributions
 antenna-0.177
nam-0.166
yt-0.163
rys-0.161
 stub-0.159
Token her
Token ID607
Feature activation-2.11e-02

Pos logit contributions
 antenna+0.077
nam+0.073
yt+0.069
 nowhere+0.069
 stub+0.068
Neg logit contributions
old-0.060
 safer-0.050
five-0.049
 better-0.049
 doctor-0.049
Token or
Token ID393
Feature activation+5.70e-02

Pos logit contributions
 antenna+0.043
nam+0.040
yt+0.039
rys+0.039
 stub+0.039
Neg logit contributions
old-0.031
 him-0.026
 safer-0.025
 better-0.025
 doctor-0.025
Tokennam
Token ID7402
Feature activation-2.92e-01

Pos logit contributions
old+0.074
 him+0.071
 better+0.067
 doctor+0.065
 safer+0.064
Neg logit contributions
 antenna-0.118
rys-0.114
yt-0.113
ifying-0.109
rac-0.108
Tokenents
Token ID658
Feature activation-3.56e-02

Pos logit contributions
 antenna+0.560
rys+0.553
 magn+0.552
 stub+0.520
AAAA+0.506
Neg logit contributions
old-0.483
 him-0.372
 choose-0.344
 better-0.337
 ice-0.331
Token and
Token ID290
Feature activation+1.33e-02

Pos logit contributions
 antenna+0.082
nam+0.078
yt+0.077
rys+0.077
 stub+0.076
Neg logit contributions
old-0.041
 him-0.033
 better-0.032
 safer-0.031
 doctor-0.031
Token making
Token ID1642
Feature activation+3.23e-02

Pos logit contributions
old+0.020
 better+0.018
 him+0.018
 doctor+0.017
 safer+0.017
Neg logit contributions
rys-0.026
 antenna-0.025
ifying-0.024
yt-0.024
 stub-0.024
Token her
Token ID607
Feature activation-2.63e-02

Pos logit contributions
old+0.046
 safer+0.038
 doctor+0.037
 welcome+0.037
 Doctor+0.037
Neg logit contributions
 antenna-0.065
nam-0.063
yt-0.057
 tentacles-0.056
 stub-0.056
Token special
Token ID2041
Feature activation+2.50e-02

Pos logit contributions
 antenna+0.055
nam+0.052
yt+0.052
rys+0.052
 stub+0.051
Neg logit contributions
old-0.035
 him-0.030
 better-0.029
 safer-0.028
 doctor-0.028
Token went
Token ID1816
Feature activation-2.67e-03

Pos logit contributions
 antenna+0.060
nam+0.056
yt+0.055
rys+0.055
 stub+0.054
Neg logit contributions
old-0.065
 him-0.057
 better-0.056
 safer-0.055
 doctor-0.055
Token she
Token ID673
Feature activation-3.89e-02

Pos logit contributions
 antenna+0.005
nam+0.005
yt+0.005
rys+0.005
 stub+0.005
Neg logit contributions
old-0.004
 safer-0.003
 better-0.003
 him-0.003
Here-0.003
Token searched
Token ID16499
Feature activation-2.67e-02

Pos logit contributions
 antenna+0.065
nam+0.061
yt+0.060
 stub+0.059
rys+0.059
Neg logit contributions
old-0.069
 him-0.060
 safer-0.059
 better-0.059
 doctor-0.058
Token for
Token ID329
Feature activation+3.50e-02

Pos logit contributions
 antenna+0.052
nam+0.050
yt+0.049
 magn+0.047
rys+0.046
Neg logit contributions
old-0.041
 doctor-0.034
 Doctor-0.033
five-0.032
Here-0.032
Token or
Token ID393
Feature activation+5.50e-02

Pos logit contributions
old+0.046
 him+0.038
 better+0.037
 safer+0.037
 doctor+0.036
Neg logit contributions
 antenna-0.075
nam-0.071
yt-0.070
rys-0.070
 stub-0.069
Tokennam
Token ID7402
Feature activation-2.91e-01

Pos logit contributions
 better+0.151
 him+0.150
old+0.149
 safer+0.148
 choose+0.145
Neg logit contributions
yt-0.032
 antenna-0.031
rys-0.030
ifying-0.025
 tentacles-0.022
Tokenents
Token ID658
Feature activation-3.68e-02

Pos logit contributions
 antenna+0.552
rys+0.540
 magn+0.536
 stub+0.515
yt+0.488
Neg logit contributions
old-0.482
 him-0.388
 better-0.361
 choose-0.344
 ice-0.343
Token,
Token ID11
Feature activation-2.37e-03

Pos logit contributions
 antenna+0.076
nam+0.072
yt+0.072
 stub+0.069
rys+0.068
Neg logit contributions
old-0.051
 Doctor-0.041
 doctor-0.040
Here-0.040
 safer-0.040
Token and
Token ID290
Feature activation+1.53e-02

Pos logit contributions
 antenna+0.005
nam+0.005
yt+0.005
 stub+0.004
rys+0.004
Neg logit contributions
old-0.003
 him-0.003
 safer-0.003
 better-0.003
 doctor-0.003
Token the
Token ID262
Feature activation+3.27e-02

Pos logit contributions
old+0.018
 him+0.017
 better+0.017
 doctor+0.016
 safer+0.016
Neg logit contributions
 antenna-0.034
rys-0.033
yt-0.032
ifying-0.032
 tentacles-0.031
Token more
Token ID517
Feature activation+1.76e-01

Pos logit contributions
old+0.041
 better+0.038
 doctor+0.038
 him+0.037
 safer+0.036
Neg logit contributions
 antenna-0.068
rys-0.066
yt-0.065
 stub-0.064
 tentacles-0.063
Token park
Token ID3952
Feature activation+1.18e-01

Pos logit contributions
 antenna+0.057
nam+0.054
yt+0.054
rys+0.053
 stub+0.053
Neg logit contributions
old-0.043
 him-0.037
 better-0.034
 safer-0.034
 doctor-0.034
Token filled
Token ID5901
Feature activation-5.81e-03

Pos logit contributions
old+0.134
 him+0.105
 better+0.102
 safer+0.102
 doctor+0.100
Neg logit contributions
 antenna-0.273
nam-0.261
yt-0.259
 stub-0.253
rys-0.253
Token with
Token ID351
Feature activation-4.88e-02

Pos logit contributions
 antenna+0.012
nam+0.012
yt+0.012
rys+0.012
 stub+0.011
Neg logit contributions
old-0.008
 him-0.007
 better-0.006
 safer-0.006
 doctor-0.006
Token colourful
Token ID44799
Feature activation-1.86e-03

Pos logit contributions
 antenna+0.098
nam+0.097
 tentacles+0.090
yt+0.089
ifying+0.085
Neg logit contributions
old-0.070
 Doctor-0.060
 doctor-0.059
orting-0.054
Deal-0.054
Token or
Token ID393
Feature activation+4.88e-02

Pos logit contributions
 antenna+0.003
yt+0.003
nam+0.003
 tentacles+0.003
 magn+0.003
Neg logit contributions
old-0.003
 Doctor-0.003
 safer-0.003
 doctor-0.003
 him-0.003
Tokennam
Token ID7402
Feature activation-2.90e-01

Pos logit contributions
old+0.083
 better+0.076
 him+0.075
 safer+0.075
 doctor+0.073
Neg logit contributions
 antenna-0.083
rys-0.078
yt-0.077
 stub-0.075
nam-0.073
Tokenents
Token ID658
Feature activation-4.25e-02

Pos logit contributions
 antenna+0.565
rys+0.562
 magn+0.551
 stub+0.530
nam+0.496
Neg logit contributions
old-0.460
 him-0.364
 ice-0.338
 better-0.335
up-0.316
Token.
Token ID13
Feature activation+1.56e-02

Pos logit contributions
 antenna+0.096
nam+0.091
yt+0.091
rys+0.088
 stub+0.088
Neg logit contributions
old-0.051
 him-0.040
 safer-0.040
 doctor-0.039
 better-0.039
Token Jimmy
Token ID12963
Feature activation+3.29e-02

Pos logit contributions
old+0.023
aid+0.019
 better+0.018
 safer+0.018
 say+0.018
Neg logit contributions
 antenna-0.029
nam-0.028
ifying-0.028
 stub-0.027
 tentacles-0.027
Token looked
Token ID3114
Feature activation-8.18e-02

Pos logit contributions
old+0.054
 him+0.046
 better+0.044
 doctor+0.044
 safer+0.043
Neg logit contributions
 antenna-0.061
nam-0.059
rys-0.059
yt-0.057
ifying-0.055
Token out
Token ID503
Feature activation+1.23e-02

Pos logit contributions
 antenna+0.173
nam+0.163
yt+0.161
rys+0.161
 stub+0.159
Neg logit contributions
old-0.111
 him-0.095
 better-0.092
 safer-0.090
 doctor-0.088
Token.
Token ID13
Feature activation+1.29e-02

Pos logit contributions
old+0.002
 him+0.002
 safer+0.002
 doctor+0.002
 better+0.002
Neg logit contributions
 antenna-0.004
nam-0.004
yt-0.004
rys-0.004
 stub-0.004
Token They
Token ID1119
Feature activation+3.08e-02

Pos logit contributions
old+0.025
 better+0.021
aid+0.021
 him+0.021
 safer+0.021
Neg logit contributions
 antenna-0.019
nam-0.018
 stub-0.017
rys-0.017
yt-0.017
Token were
Token ID547
Feature activation+1.27e-02

Pos logit contributions
old+0.044
 him+0.039
 better+0.037
 choose+0.037
 say+0.036
Neg logit contributions
 antenna-0.061
rys-0.061
yt-0.060
ifying-0.056
 stub-0.056
Token called
Token ID1444
Feature activation+4.68e-02

Pos logit contributions
old+0.022
 him+0.018
 doctor+0.018
five+0.018
Here+0.018
Neg logit contributions
 antenna-0.023
nam-0.022
yt-0.020
 stub-0.020
rys-0.020
Token or
Token ID393
Feature activation+4.80e-02

Pos logit contributions
old+0.065
 him+0.053
 better+0.052
 safer+0.052
 doctor+0.051
Neg logit contributions
 antenna-0.098
nam-0.093
yt-0.091
rys-0.090
 stub-0.088
Tokennam
Token ID7402
Feature activation-2.90e-01

Pos logit contributions
 like+0.124
 Tom+0.123
 ice+0.122
 him+0.122
 fun+0.121
Neg logit contributions
yt-0.041
lor-0.040
rys-0.038
 tal-0.033
 antenna-0.030
Tokenents
Token ID658
Feature activation-4.27e-02

Pos logit contributions
 antenna+0.556
rys+0.550
 magn+0.536
 stub+0.533
yt+0.493
Neg logit contributions
old-0.464
 him-0.379
 better-0.344
 ice-0.339
 choose-0.332
Token and
Token ID290
Feature activation+6.02e-03

Pos logit contributions
 antenna+0.087
yt+0.082
nam+0.082
ho+0.078
rys+0.078
Neg logit contributions
old-0.060
Here-0.049
 safer-0.048
 Doctor-0.047
 doctor-0.047
Token they
Token ID484
Feature activation-5.15e-02

Pos logit contributions
old+0.009
 him+0.008
 ice+0.008
 better+0.008
 doctor+0.008
Neg logit contributions
 antenna-0.011
rys-0.011
yt-0.011
 stub-0.010
 tentacles-0.010
Token were
Token ID547
Feature activation+1.40e-02

Pos logit contributions
 antenna+0.098
rys+0.094
yt+0.093
nam+0.091
 stub+0.090
Neg logit contributions
old-0.079
 him-0.069
 better-0.067
 doctor-0.066
 safer-0.065
Token very
Token ID845
Feature activation-4.44e-02

Pos logit contributions
old+0.022
 him+0.018
 better+0.018
 safer+0.018
 doctor+0.018
Neg logit contributions
 antenna-0.027
nam-0.026
yt-0.025
rys-0.025
 stub-0.025
Token a
Token ID257
Feature activation-2.38e-02

Pos logit contributions
 antenna+0.341
yt+0.339
rys+0.329
nam+0.326
ifying+0.313
Neg logit contributions
old-0.149
 him-0.123
 better-0.120
 doctor-0.118
 ice-0.116
Token box
Token ID3091
Feature activation-3.77e-02

Pos logit contributions
 antenna+0.046
nam+0.044
yt+0.043
rys+0.043
 stub+0.042
Neg logit contributions
old-0.035
 better-0.031
 doctor-0.031
 him-0.030
 safer-0.030
Token of
Token ID286
Feature activation+8.99e-02

Pos logit contributions
 antenna+0.090
nam+0.086
yt+0.085
rys+0.085
 stub+0.084
Neg logit contributions
old-0.040
 him-0.032
 better-0.031
 safer-0.030
 doctor-0.030
Token pretty
Token ID2495
Feature activation+8.94e-03

Pos logit contributions
old+0.156
five+0.132
 Doctor+0.130
orting+0.129
 safer+0.128
Neg logit contributions
nam-0.160
 antenna-0.157
 magn-0.144
 tentacles-0.134
 sleeves-0.133
Token or
Token ID393
Feature activation+5.31e-02

Pos logit contributions
old+0.013
 him+0.011
 better+0.011
 doctor+0.011
 safer+0.011
Neg logit contributions
 antenna-0.018
nam-0.017
yt-0.017
rys-0.017
 stub-0.017
Tokennam
Token ID7402
Feature activation-2.90e-01

Pos logit contributions
old+0.105
 better+0.098
 him+0.097
 safer+0.096
 ice+0.095
Neg logit contributions
 antenna-0.075
yt-0.073
rys-0.072
 stub-0.068
ifying-0.065
Tokenents
Token ID658
Feature activation-3.84e-02

Pos logit contributions
 antenna+0.571
rys+0.554
 magn+0.550
 stub+0.534
 tentacles+0.506
Neg logit contributions
old-0.458
 him-0.349
 better-0.331
 choose-0.325
up-0.315
Token that
Token ID326
Feature activation+3.92e-02

Pos logit contributions
 antenna+0.076
nam+0.072
yt+0.071
rys+0.070
 stub+0.069
Neg logit contributions
old-0.057
 him-0.047
 safer-0.046
 doctor-0.046
 better-0.046
Token she
Token ID673
Feature activation-3.95e-02

Pos logit contributions
old+0.055
 him+0.046
 better+0.045
 safer+0.045
 doctor+0.044
Neg logit contributions
 antenna-0.081
nam-0.077
yt-0.076
rys-0.075
 stub-0.074
Token wanted
Token ID2227
Feature activation-6.65e-03

Pos logit contributions
 antenna+0.071
yt+0.069
rys+0.068
nam+0.067
ifying+0.065
Neg logit contributions
old-0.066
 him-0.056
 better-0.056
 doctor-0.055
 safer-0.054
Token to
Token ID284
Feature activation-2.31e-02

Pos logit contributions
 antenna+0.013
nam+0.012
yt+0.012
 stub+0.012
rys+0.012
Neg logit contributions
old-0.010
 him-0.008
 safer-0.008
 better-0.008
 doctor-0.008
Token and
Token ID290
Feature activation+8.02e-03

Pos logit contributions
 antenna+0.226
nam+0.221
rys+0.220
 stub+0.215
yt+0.209
Neg logit contributions
old-0.103
 him-0.087
 better-0.085
 ice-0.079
 safer-0.079
Token lots
Token ID6041
Feature activation-9.82e-02

Pos logit contributions
old+0.013
 him+0.011
 better+0.011
 safer+0.011
 ice+0.011
Neg logit contributions
 antenna-0.015
rys-0.014
yt-0.014
 stub-0.014
nam-0.013
Token of
Token ID286
Feature activation+8.95e-02

Pos logit contributions
 antenna+0.206
yt+0.200
 tentacles+0.186
rys+0.180
 beating+0.179
Neg logit contributions
old-0.116
 Doctor-0.114
Deal-0.111
 doctor-0.108
Here-0.108
Token shiny
Token ID22441
Feature activation-2.43e-02

Pos logit contributions
old+0.140
 Doctor+0.126
 doctor+0.122
 safer+0.118
orting+0.114
Neg logit contributions
 antenna-0.167
nam-0.163
yt-0.155
 tentacles-0.151
 magn-0.147
Token or
Token ID393
Feature activation+4.88e-02

Pos logit contributions
 antenna+0.041
nam+0.038
yt+0.038
rys+0.037
 stub+0.037
Neg logit contributions
old-0.043
 him-0.038
 better-0.037
 safer-0.037
 doctor-0.037
Tokennam
Token ID7402
Feature activation-2.90e-01

Pos logit contributions
old+0.074
 better+0.071
 him+0.069
 safer+0.068
 ice+0.067
Neg logit contributions
 antenna-0.089
rys-0.087
 stub-0.084
yt-0.083
rac-0.080
Tokenents
Token ID658
Feature activation-4.15e-02

Pos logit contributions
 antenna+0.555
rys+0.548
 magn+0.529
 stub+0.524
AAAA+0.484
Neg logit contributions
old-0.457
 him-0.374
 better-0.341
 choose-0.333
 ice-0.332
Token.
Token ID13
Feature activation+1.76e-02

Pos logit contributions
 antenna+0.087
yt+0.083
nam+0.082
rys+0.080
 stub+0.080
Neg logit contributions
old-0.056
 him-0.045
 doctor-0.045
 safer-0.045
 Doctor-0.045
Token L
Token ID406
Feature activation+2.32e-02

Pos logit contributions
old+0.025
 him+0.021
 better+0.021
 safer+0.021
 welcome+0.020
Neg logit contributions
 antenna-0.035
yt-0.033
nam-0.033
 stub-0.033
rys-0.033
Tokenila
Token ID10102
Feature activation+3.56e-02

Pos logit contributions
old+0.025
 him+0.019
 better+0.018
 doctor+0.018
 safer+0.018
Neg logit contributions
 antenna-0.056
nam-0.054
yt-0.053
rys-0.052
 stub-0.052
Token wanted
Token ID2227
Feature activation-1.17e-02

Pos logit contributions
old+0.051
 him+0.041
 better+0.039
 doctor+0.038
 choose+0.037
Neg logit contributions
nam-0.074
rys-0.074
 antenna-0.073
ifying-0.072
yt-0.071
Token Jane
Token ID12091
Feature activation-1.37e-02

Pos logit contributions
old+0.030
 safer+0.025
 better+0.024
 him+0.024
 choose+0.023
Neg logit contributions
 antenna-0.051
rys-0.046
 stub-0.046
 tentacles-0.045
nam-0.045
Token was
Token ID373
Feature activation+4.97e-03

Pos logit contributions
 antenna+0.027
nam+0.027
yt+0.026
rys+0.026
 stub+0.025
Neg logit contributions
old-0.021
 him-0.017
 better-0.017
 doctor-0.016
 safer-0.016
Token so
Token ID523
Feature activation-1.41e-02

Pos logit contributions
old+0.007
 him+0.006
 better+0.006
 safer+0.006
 doctor+0.006
Neg logit contributions
 antenna-0.010
nam-0.010
yt-0.009
rys-0.009
 stub-0.009
Token proud
Token ID6613
Feature activation+9.95e-03

Pos logit contributions
 antenna+0.033
nam+0.032
yt+0.031
 stub+0.031
rys+0.030
Neg logit contributions
old-0.016
 him-0.013
 doctor-0.012
 safer-0.012
 Doctor-0.012
Token of
Token ID286
Feature activation+9.34e-02

Pos logit contributions
old+0.012
Here+0.011
 safer+0.011
five+0.010
 earn+0.010
Neg logit contributions
yt-0.021
 antenna-0.021
 nowhere-0.019
rys-0.019
 tentacles-0.019
Token him
Token ID683
Feature activation+1.27e-01

Pos logit contributions
old+0.132
five+0.109
 doctor+0.101
 safer+0.100
 cleaner+0.099
Neg logit contributions
 antenna-0.192
nam-0.183
 nowhere-0.179
rys-0.178
 tentacles-0.172
Token.
Token ID13
Feature activation+2.43e-02

Pos logit contributions
old+0.169
Here+0.142
 him+0.134
 doctor+0.133
 safer+0.133
Neg logit contributions
 antenna-0.265
yt-0.246
nam-0.245
rys-0.243
 stub-0.238
Token
Token ID198
Feature activation-9.33e-03

Pos logit contributions
old+0.031
 safer+0.027
 better+0.026
 doctor+0.025
 ice+0.025
Neg logit contributions
 antenna-0.052
nam-0.047
rys-0.047
yt-0.047
 stub-0.047
Token
Token ID198
Feature activation-1.03e-02

Pos logit contributions
 antenna+0.018
 stub+0.016
rys+0.016
 nowhere+0.016
yt+0.016
Neg logit contributions
old-0.012
 safer-0.012
 cleaner-0.011
 better-0.011
 doctor-0.011
TokenThe
Token ID464
Feature activation-2.82e-02

Pos logit contributions
 antenna+0.017
nam+0.015
rys+0.015
yt+0.015
 stub+0.015
Neg logit contributions
old-0.018
 him-0.016
 safer-0.016
 better-0.016
 doctor-0.016
Token moral
Token ID6573
Feature activation-7.55e-02

Pos logit contributions
 antenna+0.052
nam+0.049
yt+0.048
rys+0.047
 stub+0.047
Neg logit contributions
old-0.046
 him-0.040
 better-0.039
 safer-0.039
 doctor-0.039
Token want
Token ID765
Feature activation+4.71e-02

Pos logit contributions
 antenna+0.247
yt+0.239
rys+0.236
 stub+0.234
nam+0.226
Neg logit contributions
old-0.151
Here-0.141
 safer-0.119
 Doctor-0.117
 doctor-0.115
Token to
Token ID284
Feature activation-3.02e-02

Pos logit contributions
old+0.081
Here+0.067
Deal+0.066
five+0.066
 safer+0.066
Neg logit contributions
 antenna-0.081
 stub-0.075
rys-0.074
 nowhere-0.073
nam-0.073
Token touch
Token ID3638
Feature activation+1.94e-03

Pos logit contributions
 antenna+0.062
 stub+0.060
yt+0.059
rys+0.059
 reapp+0.057
Neg logit contributions
old-0.043
Here-0.036
 safer-0.033
 Doctor-0.031
Deal-0.030
Token the
Token ID262
Feature activation+2.23e-02

Pos logit contributions
old+0.002
 him+0.002
 better+0.002
 safer+0.002
 doctor+0.002
Neg logit contributions
 antenna-0.004
nam-0.004
yt-0.004
rys-0.004
 stub-0.004
Token tube
Token ID12403
Feature activation-7.33e-02

Pos logit contributions
old+0.037
 him+0.031
 safer+0.030
 better+0.030
 doctor+0.029
Neg logit contributions
 antenna-0.041
nam-0.039
yt-0.038
rys-0.038
 stub-0.038
Token anymore
Token ID7471
Feature activation+1.21e-01

Pos logit contributions
 antenna+0.163
nam+0.155
yt+0.154
rys+0.151
 stub+0.150
Neg logit contributions
old-0.091
 him-0.074
 better-0.071
 safer-0.071
 doctor-0.070
Token because
Token ID780
Feature activation-4.05e-02

Pos logit contributions
old+0.140
 him+0.111
 better+0.108
 safer+0.107
 doctor+0.106
Neg logit contributions
 antenna-0.281
nam-0.268
yt-0.266
rys-0.261
 stub-0.260
Token it
Token ID340
Feature activation-2.05e-04

Pos logit contributions
 antenna+0.090
nam+0.085
yt+0.084
rys+0.083
 stub+0.082
Neg logit contributions
old-0.051
 him-0.042
 better-0.041
 safer-0.040
 doctor-0.040
Token was
Token ID373
Feature activation-7.26e-04

Pos logit contributions
nam+0.000
 antenna+0.000
yt+0.000
rys+0.000
ifying+0.000
Neg logit contributions
old-0.000
 better-0.000
 him-0.000
 doctor-0.000
 safer-0.000
Token scary
Token ID14343
Feature activation+9.94e-03

Pos logit contributions
 antenna+0.002
yt+0.002
nam+0.001
rys+0.001
rac+0.001
Neg logit contributions
 better-0.001
 safer-0.001
old-0.001
 him-0.001
 welcome-0.001
Token.
Token ID13
Feature activation+1.43e-02

Pos logit contributions
old+0.013
Here+0.010
 doctor+0.010
 safer+0.010
 Doctor+0.010
Neg logit contributions
 antenna-0.021
yt-0.020
 stub-0.020
nam-0.020
rys-0.019
Tokenrobe
Token ID25481
Feature activation-1.11e-01

Pos logit contributions
 antenna+0.033
nam+0.030
yt+0.030
rys+0.029
 stub+0.028
Neg logit contributions
old-0.051
 him-0.046
 better-0.045
 doctor-0.045
 safer-0.044
Token and
Token ID290
Feature activation+2.19e-02

Pos logit contributions
 antenna+0.251
nam+0.239
yt+0.236
rys+0.234
 stub+0.230
Neg logit contributions
old-0.134
 him-0.111
 better-0.107
 doctor-0.106
 safer-0.105
Token went
Token ID1816
Feature activation+1.02e-03

Pos logit contributions
old+0.032
 him+0.030
 better+0.029
 doctor+0.028
 ice+0.028
Neg logit contributions
 antenna-0.044
rys-0.041
yt-0.039
chie-0.039
ifying-0.039
Token to
Token ID284
Feature activation-1.72e-02

Pos logit contributions
old+0.002
 him+0.001
 safer+0.001
 better+0.001
Here+0.001
Neg logit contributions
 antenna-0.002
yt-0.002
rys-0.002
 stub-0.002
nam-0.002
Token the
Token ID262
Feature activation+3.59e-02

Pos logit contributions
 antenna+0.038
nam+0.036
 stub+0.035
rys+0.035
yt+0.035
Neg logit contributions
old-0.022
 safer-0.017
 better-0.016
 him-0.016
 doctor-0.016
Token park
Token ID3952
Feature activation+1.21e-01

Pos logit contributions
old+0.055
five+0.041
 him+0.039
aid+0.038
 safer+0.037
Neg logit contributions
 stub-0.070
 antenna-0.068
nam-0.067
ifying-0.067
rys-0.066
Token to
Token ID284
Feature activation-1.68e-02

Pos logit contributions
old+0.172
 him+0.144
 better+0.141
 safer+0.140
 doctor+0.138
Neg logit contributions
 antenna-0.247
nam-0.235
yt-0.233
rys-0.228
 stub-0.228
Token play
Token ID711
Feature activation+1.07e-01

Pos logit contributions
rys+0.035
 stub+0.035
 antenna+0.035
nam+0.035
 magn+0.033
Neg logit contributions
old-0.022
 Doctor-0.015
 safer-0.015
 him-0.015
 doctor-0.014
Token.
Token ID13
Feature activation+1.91e-02

Pos logit contributions
old+0.181
 Doctor+0.146
 safer+0.138
 doctor+0.135
 welcome+0.134
Neg logit contributions
 antenna-0.190
 stub-0.181
rys-0.179
 nowhere-0.176
nam-0.175
Token She
Token ID1375
Feature activation+1.67e-02

Pos logit contributions
old+0.028
 safer+0.023
 him+0.023
 better+0.023
 doctor+0.022
Neg logit contributions
 antenna-0.038
 stub-0.036
nam-0.036
yt-0.035
rys-0.035
Token saw
Token ID2497
Feature activation+4.42e-02

Pos logit contributions
old+0.024
 him+0.021
 better+0.020
 doctor+0.020
 safer+0.020
Neg logit contributions
 antenna-0.034
nam-0.032
rys-0.032
ifying-0.031
yt-0.031
Token Ben
Token ID3932
Feature activation+7.81e-03

Pos logit contributions
old+0.166
 him+0.139
 better+0.134
 safer+0.132
 doctor+0.132
Neg logit contributions
 antenna-0.290
nam-0.275
yt-0.275
rys-0.270
 stub-0.268
Token.
Token ID13
Feature activation+1.59e-02

Pos logit contributions
old+0.011
Here+0.009
 safer+0.009
 Doctor+0.009
 him+0.008
Neg logit contributions
 antenna-0.016
nam-0.015
 stub-0.015
yt-0.015
 tentacles-0.014
Token Tru
Token ID13616
Feature activation-7.69e-02

Pos logit contributions
old+0.024
 safer+0.020
 better+0.019
 him+0.019
 doctor+0.018
Neg logit contributions
 antenna-0.031
nam-0.030
ifying-0.029
 stub-0.028
rys-0.028
Tokencks
Token ID4657
Feature activation-9.94e-02

Pos logit contributions
 antenna+0.101
yt+0.096
nam+0.093
 stub+0.090
rys+0.089
Neg logit contributions
old-0.166
 him-0.153
 better-0.145
 safer-0.142
 welcome-0.141
Token are
Token ID389
Feature activation+5.07e-02

Pos logit contributions
 antenna+0.213
nam+0.195
yt+0.194
rys+0.193
 tentacles+0.188
Neg logit contributions
old-0.133
 safer-0.106
 doctor-0.105
 Doctor-0.105
Here-0.101
Token fun
Token ID1257
Feature activation+1.48e-01

Pos logit contributions
old+0.071
 him+0.058
 better+0.056
 safer+0.056
 doctor+0.056
Neg logit contributions
 antenna-0.105
nam-0.101
yt-0.098
rys-0.097
 stub-0.096
Token.
Token ID13
Feature activation+1.56e-02

Pos logit contributions
old+0.186
 him+0.147
 safer+0.144
 better+0.141
 doctor+0.141
Neg logit contributions
 antenna-0.329
nam-0.313
yt-0.312
 stub-0.303
rys-0.302
Token Doll
Token ID22209
Feature activation-4.24e-02

Pos logit contributions
old+0.025
 him+0.021
 better+0.021
 safer+0.021
 doctor+0.020
Neg logit contributions
 antenna-0.028
nam-0.027
rys-0.026
 stub-0.026
yt-0.026
Tokens
Token ID82
Feature activation-9.21e-02

Pos logit contributions
 antenna+0.067
nam+0.062
yt+0.061
rys+0.059
 stub+0.059
Neg logit contributions
old-0.076
Here-0.067
 safer-0.066
 Doctor-0.066
 doctor-0.066
Token are
Token ID389
Feature activation+4.97e-02

Pos logit contributions
 antenna+0.228
nam+0.219
yt+0.217
rys+0.215
 stub+0.213
Neg logit contributions
old-0.090
 him-0.070
 better-0.068
 safer-0.066
 doctor-0.065
Token boring
Token ID14262
Feature activation+5.57e-03

Pos logit contributions
old+0.067
 him+0.056
 better+0.055
 safer+0.055
 doctor+0.054
Neg logit contributions
 antenna-0.105
nam-0.099
yt-0.099
rys-0.097
 stub-0.097
Token decided
Token ID3066
Feature activation-2.62e-03

Pos logit contributions
old+0.025
 better+0.023
 him+0.023
 doctor+0.021
 ice+0.021
Neg logit contributions
 antenna-0.037
nam-0.036
rys-0.035
ifying-0.035
 fists-0.034
Token to
Token ID284
Feature activation-1.83e-02

Pos logit contributions
 antenna+0.006
nam+0.005
yt+0.005
rys+0.005
 stub+0.005
Neg logit contributions
old-0.004
 him-0.003
 better-0.003
 doctor-0.003
 safer-0.003
Token go
Token ID467
Feature activation+8.72e-02

Pos logit contributions
 antenna+0.038
nam+0.038
yt+0.037
ifying+0.036
rys+0.035
Neg logit contributions
old-0.024
 him-0.022
 better-0.021
 doctor-0.021
 welcome-0.021
Token find
Token ID1064
Feature activation+1.55e-02

Pos logit contributions
old+0.131
 him+0.109
 safer+0.106
 doctor+0.106
 better+0.105
Neg logit contributions
 antenna-0.171
nam-0.162
yt-0.160
rys-0.159
 stub-0.157
Token a
Token ID257
Feature activation-2.17e-02

Pos logit contributions
old+0.024
Deal+0.022
 earn+0.021
five+0.021
 choose+0.020
Neg logit contributions
 antenna-0.028
nam-0.025
 nowhere-0.025
 tentacles-0.025
 magn-0.024
Token new
Token ID649
Feature activation+1.40e-01

Pos logit contributions
 antenna+0.048
nam+0.046
yt+0.045
rys+0.045
ifying+0.044
Neg logit contributions
old-0.025
 better-0.024
 safer-0.023
 doctor-0.023
 welcome-0.022
Token home
Token ID1363
Feature activation+3.32e-02

Pos logit contributions
 doctor+0.154
 better+0.150
 ice+0.137
 welcome+0.134
old+0.133
Neg logit contributions
ifying-0.336
nam-0.323
 antenna-0.307
rac-0.296
 fists-0.296
Token.
Token ID13
Feature activation+2.14e-02

Pos logit contributions
old+0.047
Here+0.042
 Doctor+0.038
 safer+0.037
five+0.037
Neg logit contributions
 antenna-0.064
nam-0.063
yt-0.062
 nowhere-0.060
ifying-0.059
Token But
Token ID887
Feature activation+2.47e-02

Pos logit contributions
old+0.028
 safer+0.024
 better+0.024
 cleaner+0.022
 doctor+0.022
Neg logit contributions
 antenna-0.045
nam-0.042
 stub-0.041
 tentacles-0.040
yt-0.040
Token where
Token ID810
Feature activation+8.27e-02

Pos logit contributions
old+0.024
 him+0.020
 better+0.019
 doctor+0.019
 safer+0.019
Neg logit contributions
 antenna-0.061
yt-0.058
nam-0.058
rys-0.058
 stub-0.056
Token should
Token ID815
Feature activation+4.07e-02

Pos logit contributions
old+0.106
Here+0.083
 safer+0.082
five+0.081
 him+0.081
Neg logit contributions
 antenna-0.175
nam-0.174
yt-0.168
 stub-0.166
rys-0.160
Token hearts
Token ID11954
Feature activation-3.08e-02

Pos logit contributions
 antenna+0.026
nam+0.024
yt+0.023
rys+0.023
 stub+0.023
Neg logit contributions
old-0.030
 him-0.026
 better-0.026
 safer-0.025
 doctor-0.025
Token.
Token ID13
Feature activation+1.89e-02

Pos logit contributions
 antenna+0.075
nam+0.073
rys+0.072
 stub+0.071
yt+0.070
Neg logit contributions
old-0.030
 better-0.025
 him-0.025
 safer-0.024
 welcome-0.023
Token
Token ID198
Feature activation-1.74e-02

Pos logit contributions
old+0.028
 him+0.024
 safer+0.023
 better+0.023
 doctor+0.023
Neg logit contributions
 antenna-0.036
nam-0.034
ifying-0.033
yt-0.033
rys-0.033
Token
Token ID198
Feature activation-1.79e-02

Pos logit contributions
 antenna+0.032
 stub+0.030
nam+0.029
yt+0.029
rys+0.029
Neg logit contributions
old-0.026
 him-0.023
 safer-0.022
 better-0.022
 doctor-0.022
TokenThen
Token ID6423
Feature activation+2.30e-02

Pos logit contributions
 antenna+0.032
nam+0.030
yt+0.030
rys+0.029
 stub+0.029
Neg logit contributions
old-0.030
 him-0.026
 better-0.025
 safer-0.025
 doctor-0.025
Token,
Token ID11
Feature activation-1.24e-02

Pos logit contributions
old+0.032
 him+0.026
 safer+0.025
 better+0.025
 doctor+0.025
Neg logit contributions
 antenna-0.048
nam-0.046
yt-0.045
rys-0.044
 stub-0.044
Token Mom
Token ID11254
Feature activation-1.16e-02

Pos logit contributions
 antenna+0.025
nam+0.024
yt+0.024
rys+0.023
 stub+0.023
Neg logit contributions
old-0.018
 him-0.014
 safer-0.014
 doctor-0.014
 better-0.014
Tokenmy
Token ID1820
Feature activation+8.50e-02

Pos logit contributions
 antenna+0.024
yt+0.023
nam+0.023
 stub+0.023
rys+0.022
Neg logit contributions
old-0.015
 safer-0.013
 him-0.012
 better-0.012
 doctor-0.012
Token said
Token ID531
Feature activation+4.16e-02

Pos logit contributions
old+0.115
 him+0.096
 better+0.094
 doctor+0.092
 safer+0.090
Neg logit contributions
 antenna-0.179
nam-0.174
rys-0.168
yt-0.167
 stub-0.164
Token,
Token ID11
Feature activation-1.12e-02

Pos logit contributions
old+0.056
 him+0.046
 safer+0.045
 better+0.045
 doctor+0.044
Neg logit contributions
 antenna-0.088
nam-0.084
yt-0.083
rys-0.082
 stub-0.081
Token �
Token ID6184
Feature activation-9.07e-03

Pos logit contributions
 antenna+0.024
nam+0.024
yt+0.023
rys+0.023
 stub+0.023
Neg logit contributions
old-0.014
 him-0.012
 better-0.012
 doctor-0.012
 Doctor-0.011
Token when
Token ID618
Feature activation+7.51e-02

Pos logit contributions
 antenna+0.124
nam+0.117
yt+0.117
rys+0.113
 stub+0.113
Neg logit contributions
old-0.072
 him-0.056
 safer-0.056
 better-0.055
 doctor-0.055
Token she
Token ID673
Feature activation-3.74e-02

Pos logit contributions
old+0.096
 him+0.082
 better+0.078
 doctor+0.077
 safer+0.077
Neg logit contributions
 antenna-0.164
yt-0.155
nam-0.155
rys-0.154
 stub-0.151
Token saw
Token ID2497
Feature activation+4.45e-02

Pos logit contributions
 antenna+0.075
nam+0.073
rys+0.071
yt+0.071
 stub+0.068
Neg logit contributions
old-0.055
 him-0.047
 better-0.045
 doctor-0.045
 safer-0.044
Token all
Token ID477
Feature activation-5.92e-02

Pos logit contributions
old+0.070
five+0.060
 welcome+0.056
orting+0.056
 safer+0.056
Neg logit contributions
nam-0.083
 antenna-0.080
 nowhere-0.076
yt-0.069
 magn-0.068
Token the
Token ID262
Feature activation+3.43e-02

Pos logit contributions
 antenna+0.118
nam+0.115
rys+0.111
yt+0.109
 nowhere+0.106
Neg logit contributions
old-0.079
 doctor-0.068
 him-0.068
 safer-0.066
Here-0.065
Token fruits
Token ID15921
Feature activation+2.07e-02

Pos logit contributions
old+0.052
 doctor+0.042
 Doctor+0.041
five+0.041
 him+0.041
Neg logit contributions
 antenna-0.068
nam-0.068
yt-0.065
 magn-0.063
 sleeves-0.061
Token and
Token ID290
Feature activation+1.73e-02

Pos logit contributions
old+0.031
 doctor+0.025
 Doctor+0.024
 him+0.024
 safer+0.024
Neg logit contributions
 antenna-0.041
yt-0.039
nam-0.039
 stub-0.037
rys-0.037
Token vegetables
Token ID13701
Feature activation-1.74e-02

Pos logit contributions
old+0.026
 him+0.021
 doctor+0.021
 safer+0.021
 better+0.021
Neg logit contributions
 antenna-0.035
nam-0.033
yt-0.032
rys-0.031
 stub-0.031
Token spread
Token ID4104
Feature activation-2.50e-02

Pos logit contributions
 antenna+0.037
yt+0.035
nam+0.035
 stub+0.033
rys+0.033
Neg logit contributions
old-0.024
 Doctor-0.019
 doctor-0.019
 him-0.019
 safer-0.018
Token out
Token ID503
Feature activation+1.82e-02

Pos logit contributions
 antenna+0.054
nam+0.051
yt+0.050
rys+0.049
 tentacles+0.048
Neg logit contributions
old-0.033
 doctor-0.026
 Doctor-0.025
 him-0.025
 safer-0.025
Token around
Token ID1088
Feature activation-2.46e-02

Pos logit contributions
old+0.024
 doctor+0.019
 him+0.019
 Doctor+0.018
 safer+0.018
Neg logit contributions
 antenna-0.039
nam-0.037
yt-0.037
rys-0.036
ifying-0.035
Token looked
Token ID3114
Feature activation-7.51e-02

Pos logit contributions
old+0.011
 him+0.010
 doctor+0.010
 better+0.010
 ice+0.009
Neg logit contributions
 antenna-0.013
nam-0.013
yt-0.013
rys-0.013
ifying-0.013
Token around
Token ID1088
Feature activation-2.70e-02

Pos logit contributions
 antenna+0.153
nam+0.146
yt+0.144
rys+0.142
 stub+0.140
Neg logit contributions
old-0.108
 him-0.090
 better-0.088
 safer-0.087
 doctor-0.087
Token and
Token ID290
Feature activation+1.19e-02

Pos logit contributions
 antenna+0.061
nam+0.057
yt+0.057
rys+0.056
 stub+0.055
Neg logit contributions
old-0.034
 safer-0.026
 him-0.026
 better-0.026
 doctor-0.025
Token saw
Token ID2497
Feature activation+4.02e-02

Pos logit contributions
old+0.019
 him+0.016
 better+0.016
 safer+0.016
 doctor+0.015
Neg logit contributions
 antenna-0.022
nam-0.021
yt-0.021
rys-0.020
 stub-0.020
Token other
Token ID584
Feature activation+1.30e-01

Pos logit contributions
old+0.057
 welcome+0.046
 safer+0.046
five+0.045
 better+0.045
Neg logit contributions
 antenna-0.083
nam-0.080
 stub-0.074
 nowhere-0.074
 tentacles-0.073
Token kids
Token ID3988
Feature activation+1.13e-01

Pos logit contributions
old+0.161
Here+0.145
aid+0.131
 him+0.128
 safer+0.124
Neg logit contributions
 antenna-0.285
yt-0.270
nam-0.270
 magn-0.265
 tentacles-0.254
Token touching
Token ID15241
Feature activation+6.91e-03

Pos logit contributions
old+0.197
Here+0.179
 Doctor+0.170
five+0.168
 safer+0.167
Neg logit contributions
 antenna-0.183
nam-0.179
yt-0.173
 stub-0.171
 magn-0.163
Token their
Token ID511
Feature activation-1.07e-02

Pos logit contributions
old+0.010
 doctor+0.007
 safer+0.007
Here+0.007
five+0.007
Neg logit contributions
 antenna-0.014
nam-0.014
yt-0.013
rys-0.013
 tentacles-0.013
Token toes
Token ID23932
Feature activation-1.40e-01

Pos logit contributions
 antenna+0.015
nam+0.014
yt+0.014
rys+0.014
 stub+0.013
Neg logit contributions
old-0.021
 him-0.020
 better-0.019
 safer-0.019
 ice-0.019
Token.
Token ID13
Feature activation+1.75e-02

Pos logit contributions
 antenna+0.363
nam+0.350
rys+0.346
yt+0.345
 stub+0.341
Neg logit contributions
old-0.116
 him-0.097
 better-0.092
 safer-0.085
 doctor-0.084
Token Mia
Token ID32189
Feature activation+2.89e-03

Pos logit contributions
old+0.022
 him+0.018
 better+0.018
 safer+0.018
 doctor+0.017
Neg logit contributions
 antenna-0.039
nam-0.036
yt-0.036
rys-0.036
 stub-0.035
Token.
Token ID13
Feature activation+1.70e-02

Pos logit contributions
yt+0.058
 antenna+0.057
nam+0.054
 nowhere+0.052
ifying+0.051
Neg logit contributions
old-0.042
Here-0.036
 safer-0.034
 him-0.033
 welcome-0.032
Token She
Token ID1375
Feature activation+2.46e-03

Pos logit contributions
old+0.025
 safer+0.021
 better+0.021
 him+0.020
aid+0.018
Neg logit contributions
 antenna-0.034
nam-0.032
rys-0.031
 stub-0.031
yt-0.031
Token asked
Token ID1965
Feature activation+3.99e-02

Pos logit contributions
old+0.004
 him+0.003
 better+0.003
 doctor+0.003
 safer+0.003
Neg logit contributions
 antenna-0.005
nam-0.005
rys-0.004
yt-0.004
 stub-0.004
Token Sara
Token ID24799
Feature activation+6.91e-02

Pos logit contributions
old+0.055
 safer+0.050
Here+0.047
aid+0.046
 cleaner+0.045
Neg logit contributions
 antenna-0.077
nam-0.074
 eight-0.069
 nowhere-0.069
ho-0.069
Token and
Token ID290
Feature activation+8.05e-03

Pos logit contributions
old+0.094
Here+0.092
 safer+0.088
 ice+0.081
 cleaner+0.081
Neg logit contributions
 antenna-0.123
nam-0.121
 stub-0.114
 magn-0.112
ho-0.110
Token Ben
Token ID3932
Feature activation+1.15e-02

Pos logit contributions
old+0.008
 him+0.006
 better+0.006
 safer+0.006
 doctor+0.006
Neg logit contributions
 antenna-0.020
nam-0.019
yt-0.019
rys-0.018
 stub-0.018
Token to
Token ID284
Feature activation-2.42e-02

Pos logit contributions
old+0.018
 safer+0.016
Here+0.016
 Doctor+0.015
 ice+0.015
Neg logit contributions
 antenna-0.020
nam-0.020
 stub-0.018
 magn-0.018
yt-0.018
Token help
Token ID1037
Feature activation+6.29e-02

Pos logit contributions
 antenna+0.056
nam+0.055
rys+0.052
yt+0.051
 stub+0.051
Neg logit contributions
old-0.028
 safer-0.021
 better-0.019
 Doctor-0.019
Here-0.018
Token her
Token ID607
Feature activation-2.68e-02

Pos logit contributions
old+0.098
 safer+0.084
Here+0.084
five+0.076
Deal+0.074
Neg logit contributions
 antenna-0.116
nam-0.116
 stub-0.104
yt-0.103
 magn-0.102
Token.
Token ID13
Feature activation+1.36e-02

Pos logit contributions
nam+0.052
 antenna+0.051
yt+0.048
 magn+0.047
 reapp+0.047
Neg logit contributions
old-0.038
Here-0.034
 safer-0.031
orting-0.031
 him-0.030
Token Sara
Token ID24799
Feature activation+7.04e-02

Pos logit contributions
old+0.019
 safer+0.015
 better+0.015
 him+0.015
 welcome+0.014
Neg logit contributions
 antenna-0.028
nam-0.027
yt-0.026
 stub-0.026
rys-0.026
Token Lily
Token ID20037
Feature activation+5.24e-02

Pos logit contributions
old+0.001
 him+0.001
 doctor+0.001
 better+0.001
 Doctor+0.001
Neg logit contributions
 antenna-0.003
yt-0.003
nam-0.003
ifying-0.003
rys-0.003
Token said
Token ID531
Feature activation+3.56e-02

Pos logit contributions
old+0.035
 safer+0.023
 him+0.022
 doctor+0.021
 better+0.020
Neg logit contributions
 antenna-0.145
yt-0.139
nam-0.138
 stub-0.134
rys-0.134
Token.
Token ID13
Feature activation+7.34e-03

Pos logit contributions
old+0.041
 him+0.032
 safer+0.031
 doctor+0.031
 better+0.031
Neg logit contributions
 antenna-0.082
nam-0.079
yt-0.078
rys-0.076
 stub-0.076
Token
Token ID198
Feature activation-2.87e-02

Pos logit contributions
old+0.007
 him+0.006
 better+0.005
 doctor+0.005
 safer+0.005
Neg logit contributions
 antenna-0.018
nam-0.018
yt-0.017
rys-0.017
 stub-0.017
Token
Token ID198
Feature activation-2.75e-02

Pos logit contributions
 antenna+0.058
yt+0.052
 stub+0.052
 nowhere+0.051
nam+0.051
Neg logit contributions
old-0.038
 safer-0.032
 better-0.032
 doctor-0.031
 cleaner-0.031
TokenThey
Token ID2990
Feature activation+1.20e-02

Pos logit contributions
 antenna+0.062
nam+0.059
rys+0.059
yt+0.058
 stub+0.058
Neg logit contributions
old-0.035
 better-0.028
 safer-0.028
 doctor-0.028
 him-0.028
Token waved
Token ID26834
Feature activation-6.27e-02

Pos logit contributions
old+0.020
 him+0.018
 say+0.017
 better+0.017
 choose+0.017
Neg logit contributions
rys-0.021
 antenna-0.021
nam-0.020
yt-0.020
 stub-0.020
Token to
Token ID284
Feature activation-3.10e-02

Pos logit contributions
 antenna+0.118
nam+0.109
yt+0.108
rys+0.104
 stub+0.103
Neg logit contributions
old-0.102
 doctor-0.084
 Doctor-0.083
 safer-0.083
 better-0.082
Token the
Token ID262
Feature activation+1.58e-02

Pos logit contributions
 antenna+0.066
nam+0.063
rys+0.060
yt+0.059
 magn+0.058
Neg logit contributions
old-0.043
 safer-0.033
 doctor-0.033
 better-0.033
Here-0.032
Token ostr
Token ID42387
Feature activation-1.69e-02

Pos logit contributions
old+0.029
 him+0.025
 safer+0.024
 better+0.024
 doctor+0.023
Neg logit contributions
 antenna-0.026
yt-0.024
nam-0.024
rys-0.023
ho-0.023
Tokenich
Token ID488
Feature activation-1.11e-01

Pos logit contributions
 antenna+0.020
nam+0.018
rys+0.017
yt+0.017
 stub+0.017
Neg logit contributions
old-0.039
 him-0.035
 doctor-0.035
 better-0.034
 safer-0.034
Token if
Token ID611
Feature activation+9.23e-02

Pos logit contributions
old+0.068
Here+0.056
 him+0.054
 safer+0.053
five+0.053
Neg logit contributions
 antenna-0.101
yt-0.094
nam-0.092
rys-0.090
 magn-0.090
Token you
Token ID345
Feature activation+7.26e-03

Pos logit contributions
old+0.113
 him+0.095
 better+0.093
 doctor+0.090
 safer+0.090
Neg logit contributions
 antenna-0.206
yt-0.196
nam-0.195
rys-0.193
 stub-0.191
Token put
Token ID1234
Feature activation+9.86e-02

Pos logit contributions
old+0.009
 him+0.007
 doctor+0.007
 safer+0.007
 Doctor+0.007
Neg logit contributions
 antenna-0.016
nam-0.015
rys-0.015
yt-0.015
 stub-0.015
Token your
Token ID534
Feature activation+9.27e-03

Pos logit contributions
old+0.140
 doctor+0.109
 him+0.107
 safer+0.106
 Doctor+0.105
Neg logit contributions
 antenna-0.203
nam-0.197
rys-0.188
yt-0.184
 magn-0.183
Token heart
Token ID2612
Feature activation-9.63e-02

Pos logit contributions
old+0.017
 him+0.015
 better+0.015
 doctor+0.015
 safer+0.015
Neg logit contributions
 antenna-0.015
yt-0.014
nam-0.014
rys-0.014
 stub-0.014
Token into
Token ID656
Feature activation-8.06e-02

Pos logit contributions
 antenna+0.205
yt+0.197
rys+0.195
nam+0.189
 stub+0.188
Neg logit contributions
old-0.121
 better-0.119
 safer-0.107
 him-0.105
 welcome-0.104
Token it
Token ID340
Feature activation+6.73e-03

Pos logit contributions
 antenna+0.178
nam+0.169
yt+0.169
rys+0.167
 stub+0.165
Neg logit contributions
old-0.099
 him-0.083
 better-0.081
 safer-0.080
 doctor-0.078
Token.
Token ID13
Feature activation+2.29e-02

Pos logit contributions
old+0.008
 better+0.006
 him+0.006
 safer+0.006
 doctor+0.006
Neg logit contributions
 antenna-0.015
nam-0.015
yt-0.015
rys-0.015
 stub-0.014
Token Not
Token ID1892
Feature activation-2.90e-02

Pos logit contributions
old+0.030
 him+0.025
 safer+0.025
 better+0.024
 ice+0.024
Neg logit contributions
 antenna-0.048
nam-0.045
rys-0.045
yt-0.044
 stub-0.044
Token only
Token ID691
Feature activation+3.63e-02

Pos logit contributions
 antenna+0.065
yt+0.062
rys+0.060
 stub+0.060
nam+0.060
Neg logit contributions
old-0.035
 better-0.031
 him-0.030
 safer-0.029
 doctor-0.028
Token that
Token ID326
Feature activation+4.03e-02

Pos logit contributions
old+0.051
 him+0.043
 better+0.043
 safer+0.042
 doctor+0.041
Neg logit contributions
 antenna-0.074
nam-0.071
yt-0.070
rys-0.069
 stub-0.068
Token lifted
Token ID13663
Feature activation+1.83e-02

Pos logit contributions
old+0.007
 him+0.006
 safer+0.006
 better+0.006
 doctor+0.006
Neg logit contributions
 antenna-0.006
nam-0.006
yt-0.006
rys-0.006
 stub-0.006
Token the
Token ID262
Feature activation+3.23e-02

Pos logit contributions
old+0.026
 safer+0.021
 doctor+0.020
Deal+0.020
Here+0.019
Neg logit contributions
 antenna-0.038
nam-0.036
yt-0.033
rys-0.033
 tentacles-0.033
Token box
Token ID3091
Feature activation-3.27e-02

Pos logit contributions
old+0.055
 better+0.049
 doctor+0.049
 safer+0.048
 him+0.048
Neg logit contributions
 antenna-0.055
nam-0.053
yt-0.053
rys-0.052
 stub-0.050
Token.
Token ID13
Feature activation+1.77e-02

Pos logit contributions
 antenna+0.076
nam+0.072
yt+0.071
rys+0.071
 stub+0.070
Neg logit contributions
old-0.037
 him-0.030
 better-0.029
 safer-0.029
 doctor-0.028
Token He
Token ID679
Feature activation+5.71e-02

Pos logit contributions
old+0.024
 safer+0.020
 better+0.020
 doctor+0.019
 him+0.019
Neg logit contributions
 antenna-0.037
nam-0.034
ifying-0.032
 tentacles-0.032
 nowhere-0.032
Token peek
Token ID27185
Feature activation-3.04e-02

Pos logit contributions
old+0.092
 him+0.081
 better+0.080
 doctor+0.077
 safer+0.077
Neg logit contributions
 antenna-0.102
nam-0.102
rys-0.101
yt-0.098
ifying-0.095
Tokened
Token ID276
Feature activation-8.44e-02

Pos logit contributions
 antenna+0.043
yt+0.039
nam+0.039
rys+0.038
 stub+0.037
Neg logit contributions
old-0.063
 him-0.055
 doctor-0.055
 safer-0.055
 better-0.055
Token inside
Token ID2641
Feature activation+6.55e-02

Pos logit contributions
 antenna+0.156
yt+0.141
nam+0.139
 magn+0.137
 tentacles+0.136
Neg logit contributions
old-0.145
 doctor-0.118
 Doctor-0.116
 safer-0.114
Here-0.113
Token,
Token ID11
Feature activation-6.46e-03

Pos logit contributions
old+0.096
Here+0.075
 safer+0.075
 doctor+0.073
aid+0.073
Neg logit contributions
 antenna-0.135
yt-0.125
nam-0.122
 nowhere-0.121
 stub-0.119
Token and
Token ID290
Feature activation+1.26e-02

Pos logit contributions
 antenna+0.013
nam+0.013
yt+0.013
rys+0.013
 stub+0.012
Neg logit contributions
old-0.009
 him-0.008
 better-0.007
 safer-0.007
 doctor-0.007
Token he
Token ID339
Feature activation+9.82e-03

Pos logit contributions
old+0.019
 him+0.015
 safer+0.015
 better+0.015
 doctor+0.015
Neg logit contributions
 antenna-0.025
nam-0.024
yt-0.023
 stub-0.023
rys-0.023
Token ugly
Token ID13400
Feature activation-7.54e-04

Pos logit contributions
 antenna+0.080
nam+0.075
yt+0.075
rys+0.073
 stub+0.073
Neg logit contributions
old-0.057
 him-0.048
 better-0.047
 safer-0.046
 doctor-0.046
Token and
Token ID290
Feature activation+1.77e-02

Pos logit contributions
 antenna+0.002
yt+0.002
nam+0.002
 stub+0.002
rys+0.001
Neg logit contributions
old-0.001
Here-0.001
 him-0.001
 safer-0.001
 welcome-0.001
Token it
Token ID340
Feature activation+7.35e-03

Pos logit contributions
old+0.022
 him+0.022
 better+0.021
 ice+0.020
 safer+0.020
Neg logit contributions
 antenna-0.037
yt-0.037
rys-0.035
 stub-0.033
 magn-0.033
Token made
Token ID925
Feature activation-1.05e-02

Pos logit contributions
old+0.011
 him+0.010
 better+0.010
 doctor+0.010
 safer+0.010
Neg logit contributions
 antenna-0.014
yt-0.013
nam-0.013
rys-0.013
ifying-0.012
Token a
Token ID257
Feature activation-2.52e-02

Pos logit contributions
 antenna+0.020
nam+0.019
 beating+0.018
 nowhere+0.018
 stub+0.018
Neg logit contributions
old-0.016
five-0.013
Here-0.013
Deal-0.013
 safer-0.013
Token lot
Token ID1256
Feature activation-3.30e-02

Pos logit contributions
 antenna+0.052
nam+0.050
rys+0.049
yt+0.049
rac+0.048
Neg logit contributions
old-0.031
 better-0.030
 doctor-0.029
 safer-0.029
 ice-0.028
Token of
Token ID286
Feature activation+9.34e-02

Pos logit contributions
 antenna+0.078
nam+0.072
rys+0.072
ifying+0.072
yt+0.071
Neg logit contributions
old-0.033
 him-0.028
 Doctor-0.027
Here-0.027
Deal-0.026
Token noise
Token ID7838
Feature activation+3.31e-02

Pos logit contributions
old+0.141
five+0.113
 Doctor+0.111
orting+0.110
 him+0.109
Neg logit contributions
 antenna-0.183
nam-0.174
 stub-0.170
 tentacles-0.166
rys-0.162
Token.
Token ID13
Feature activation+1.36e-02

Pos logit contributions
old+0.047
Here+0.040
 welcome+0.037
 Doctor+0.037
 safer+0.036
Neg logit contributions
 antenna-0.066
yt-0.063
nam-0.062
 stub-0.062
 beating-0.061
Token The
Token ID383
Feature activation+1.29e-02

Pos logit contributions
old+0.018
 him+0.014
 better+0.014
 safer+0.014
 doctor+0.014
Neg logit contributions
 antenna-0.029
nam-0.028
yt-0.027
 stub-0.027
rys-0.027
Token cell
Token ID2685
Feature activation-3.57e-02

Pos logit contributions
old+0.026
 him+0.022
 safer+0.022
five+0.021
 better+0.021
Neg logit contributions
 antenna-0.019
yt-0.018
 stub-0.018
nam-0.018
rys-0.018
Token farmer
Token ID18739
Feature activation-2.50e-02

Pos logit contributions
 antenna+0.055
nam+0.052
yt+0.052
 stub+0.051
rys+0.051
Neg logit contributions
old-0.046
 him-0.039
 safer-0.037
 better-0.036
Here-0.035
Token.
Token ID13
Feature activation+8.91e-03

Pos logit contributions
 antenna+0.053
nam+0.049
 stub+0.049
yt+0.049
rys+0.049
Neg logit contributions
old-0.031
 safer-0.026
Here-0.025
 Doctor-0.025
 him-0.025
Token He
Token ID679
Feature activation+5.35e-02

Pos logit contributions
old+0.011
 safer+0.010
 better+0.009
 cleaner+0.009
 doctor+0.009
Neg logit contributions
 antenna-0.018
yt-0.018
nam-0.018
 stub-0.017
ifying-0.017
Token has
Token ID468
Feature activation-1.12e-01

Pos logit contributions
old+0.093
 safer+0.080
 Doctor+0.078
 him+0.077
 doctor+0.077
Neg logit contributions
 antenna-0.087
 stub-0.080
yt-0.078
 tentacles-0.077
rys-0.077
Token a
Token ID257
Feature activation-2.93e-02

Pos logit contributions
 antenna+0.245
nam+0.233
yt+0.229
rys+0.226
 stub+0.226
Neg logit contributions
old-0.144
 him-0.117
 better-0.114
 safer-0.113
 doctor-0.112
Token big
Token ID1263
Feature activation-4.13e-02

Pos logit contributions
 antenna+0.057
nam+0.054
yt+0.053
rys+0.053
 stub+0.052
Neg logit contributions
old-0.043
 better-0.038
 doctor-0.038
 safer-0.037
 him-0.037
Token farm
Token ID5318
Feature activation-3.40e-02

Pos logit contributions
 antenna+0.079
nam+0.076
rys+0.074
yt+0.074
 stub+0.072
Neg logit contributions
old-0.062
 him-0.054
 better-0.054
 doctor-0.054
 welcome-0.053
Token with
Token ID351
Feature activation-4.99e-02

Pos logit contributions
 antenna+0.077
nam+0.073
yt+0.072
rys+0.071
 stub+0.071
Neg logit contributions
old-0.040
 him-0.033
 better-0.032
 safer-0.031
 doctor-0.031
Token many
Token ID867
Feature activation-6.48e-02

Pos logit contributions
 antenna+0.098
nam+0.093
 tentacles+0.088
yt+0.088
 stub+0.087
Neg logit contributions
old-0.073
 safer-0.061
 better-0.060
 doctor-0.059
 welcome-0.057
Token animals
Token ID4695
Feature activation+8.07e-02

Pos logit contributions
 antenna+0.138
nam+0.129
yt+0.129
 tentacles+0.124
rys+0.124
Neg logit contributions
old-0.084
 him-0.070
 doctor-0.068
 better-0.067
 safer-0.067
Token.
Token ID13
Feature activation+5.52e-03

Pos logit contributions
old+0.098
 Doctor+0.080
 safer+0.080
 doctor+0.079
 better+0.078
Neg logit contributions
 antenna-0.176
nam-0.165
yt-0.164
rys-0.161
 stub-0.160
Token the
Token ID262
Feature activation+2.27e-02

Pos logit contributions
 antenna+0.056
nam+0.053
 magn+0.050
 stub+0.049
 nowhere+0.049
Neg logit contributions
old-0.049
Deal-0.041
orting-0.041
Here-0.040
 safer-0.039
Token bench
Token ID7624
Feature activation-5.14e-02

Pos logit contributions
old+0.045
 him+0.039
 better+0.038
 safer+0.038
 doctor+0.038
Neg logit contributions
 antenna-0.035
nam-0.032
yt-0.032
rys-0.031
 stub-0.031
Token and
Token ID290
Feature activation-1.30e-03

Pos logit contributions
 antenna+0.111
yt+0.105
nam+0.105
 stub+0.102
 nowhere+0.100
Neg logit contributions
old-0.066
 him-0.053
 safer-0.052
 better-0.052
 welcome-0.051
Token opened
Token ID4721
Feature activation-1.48e-02

Pos logit contributions
 antenna+0.002
nam+0.002
yt+0.002
rys+0.002
 stub+0.002
Neg logit contributions
old-0.002
 him-0.002
 better-0.002
 safer-0.002
 doctor-0.002
Token the
Token ID262
Feature activation+2.14e-02

Pos logit contributions
 antenna+0.033
nam+0.031
yt+0.031
rys+0.031
 stub+0.030
Neg logit contributions
old-0.018
 him-0.015
 safer-0.014
 better-0.014
 doctor-0.014
Token jar
Token ID17379
Feature activation-1.28e-01

Pos logit contributions
old+0.044
 better+0.040
 him+0.039
 safer+0.039
 doctor+0.039
Neg logit contributions
 antenna-0.030
nam-0.028
yt-0.027
rys-0.027
 stub-0.026
Token.
Token ID13
Feature activation+8.37e-03

Pos logit contributions
 antenna+0.314
nam+0.302
yt+0.297
rys+0.296
 stub+0.293
Neg logit contributions
old-0.128
 him-0.099
 better-0.096
 safer-0.094
 doctor-0.093
Token\n
Token ID198
Feature activation-2.39e-02

Pos logit contributions
old+0.014
 better+0.011
 him+0.011
 safer+0.011
 choice+0.011
Neg logit contributions
 antenna-0.016
nam-0.014
 stub-0.014
yt-0.014
 tentacles-0.014
Token\n
Token ID198
Feature activation-2.31e-02

Pos logit contributions
 antenna+0.045
 stub+0.042
yt+0.042
 nowhere+0.040
rac+0.039
Neg logit contributions
old-0.033
 ice-0.029
 safer-0.029
 better-0.028
 doctor-0.028
TokenL
Token ID43
Feature activation-5.07e-02

Pos logit contributions
 antenna+0.037
nam+0.034
yt+0.034
 stub+0.033
rys+0.033
Neg logit contributions
old-0.042
 him-0.038
 better-0.037
 safer-0.036
 doctor-0.036
Tokenily
Token ID813
Feature activation+5.05e-02

Pos logit contributions
 antenna+0.081
rys+0.081
 nowhere+0.078
alk+0.078
yt+0.077
Neg logit contributions
 better-0.075
 safer-0.074
 healthier-0.072
 Doctor-0.070
Here-0.069
Token\n
Token ID198
Feature activation-1.90e-02

Pos logit contributions
 antenna+0.037
 stub+0.035
 nowhere+0.033
yt+0.033
nam+0.033
Neg logit contributions
old-0.024
 safer-0.021
 better-0.021
 ice-0.020
 doctor-0.020
TokenThey
Token ID2990
Feature activation+1.96e-02

Pos logit contributions
 antenna+0.035
nam+0.033
yt+0.032
rys+0.032
 stub+0.032
Neg logit contributions
old-0.031
 him-0.027
 better-0.026
 safer-0.026
 doctor-0.025
Token drove
Token ID10357
Feature activation+3.42e-02

Pos logit contributions
old+0.032
 him+0.029
 better+0.028
 ice+0.027
 choose+0.026
Neg logit contributions
 antenna-0.036
rys-0.036
yt-0.034
nam-0.033
 stub-0.033
Token for
Token ID329
Feature activation+2.55e-02

Pos logit contributions
old+0.053
 doctor+0.043
 safer+0.043
 him+0.043
 Doctor+0.043
Neg logit contributions
 antenna-0.066
nam-0.063
yt-0.061
 stub-0.059
rys-0.059
Token a
Token ID257
Feature activation-3.53e-02

Pos logit contributions
old+0.036
 him+0.030
 doctor+0.029
 better+0.029
 safer+0.029
Neg logit contributions
 antenna-0.052
nam-0.050
yt-0.048
rys-0.048
 stub-0.047
Token long
Token ID890
Feature activation-1.88e-01

Pos logit contributions
 antenna+0.051
nam+0.046
rys+0.046
yt+0.044
 stub+0.043
Neg logit contributions
old-0.071
 better-0.065
 safer-0.064
 him-0.064
 doctor-0.063
Token time
Token ID640
Feature activation+4.77e-02

Pos logit contributions
nam+0.441
 antenna+0.437
rys+0.416
ifying+0.415
rac+0.397
Neg logit contributions
old-0.208
 him-0.180
 ice-0.165
 better-0.163
 safer-0.159
Token and
Token ID290
Feature activation+1.14e-03

Pos logit contributions
old+0.065
 doctor+0.054
 him+0.054
Here+0.053
 Doctor+0.053
Neg logit contributions
 antenna-0.099
yt-0.095
nam-0.092
 stub-0.089
rys-0.089
Token soon
Token ID2582
Feature activation-2.12e-02

Pos logit contributions
old+0.002
 him+0.002
 better+0.002
 safer+0.002
 doctor+0.002
Neg logit contributions
 antenna-0.002
nam-0.002
yt-0.002
rys-0.002
 stub-0.002
Token they
Token ID484
Feature activation-5.62e-02

Pos logit contributions
 antenna+0.040
yt+0.038
 stub+0.036
nam+0.036
 tentacles+0.036
Neg logit contributions
old-0.033
 doctor-0.027
Here-0.027
 welcome-0.027
 safer-0.027
Token were
Token ID547
Feature activation+8.47e-03

Pos logit contributions
 antenna+0.107
rys+0.105
nam+0.103
yt+0.099
 stub+0.096
Neg logit contributions
old-0.086
 him-0.080
 better-0.075
 safer-0.071
 ice-0.070
Token hand
Token ID1021
Feature activation-7.27e-02

Pos logit contributions
 antenna+0.059
yt+0.057
nam+0.056
rys+0.056
 stub+0.055
Neg logit contributions
old-0.068
 him-0.061
 doctor-0.060
 better-0.059
 welcome-0.059
Token.
Token ID13
Feature activation+1.74e-02

Pos logit contributions
 antenna+0.172
nam+0.165
yt+0.163
rys+0.161
 stub+0.160
Neg logit contributions
old-0.079
 him-0.062
 better-0.060
 safer-0.059
 doctor-0.059
Token They
Token ID1119
Feature activation+3.66e-02

Pos logit contributions
old+0.025
 safer+0.022
 better+0.021
 doctor+0.020
 him+0.020
Neg logit contributions
 antenna-0.034
nam-0.032
yt-0.031
 stub-0.031
ifying-0.030
Token followed
Token ID3940
Feature activation-3.24e-02

Pos logit contributions
old+0.063
 him+0.057
 better+0.053
 ice+0.052
 say+0.052
Neg logit contributions
rys-0.066
 antenna-0.065
yt-0.061
nam-0.060
 stub-0.059
Token the
Token ID262
Feature activation+3.42e-02

Pos logit contributions
 antenna+0.072
nam+0.069
yt+0.067
rys+0.067
 stub+0.066
Neg logit contributions
old-0.040
 safer-0.032
 better-0.031
 doctor-0.031
 welcome-0.031
Token long
Token ID890
Feature activation-1.73e-01

Pos logit contributions
old+0.068
 him+0.058
 safer+0.057
 better+0.056
 welcome+0.055
Neg logit contributions
 antenna-0.053
nam-0.047
yt-0.046
 stub-0.045
 magn-0.045
Token,
Token ID11
Feature activation-4.17e-03

Pos logit contributions
 antenna+0.290
nam+0.286
yt+0.274
rys+0.273
 stub+0.270
Neg logit contributions
old-0.304
 him-0.267
 better-0.262
 doctor-0.261
 safer-0.259
Token curly
Token ID45731
Feature activation-1.36e-01

Pos logit contributions
 antenna+0.008
yt+0.008
rys+0.008
nam+0.008
 stub+0.007
Neg logit contributions
old-0.006
 safer-0.005
 better-0.005
 him-0.005
 ice-0.005
Token cable
Token ID7862
Feature activation-4.35e-02

Pos logit contributions
 antenna+0.228
nam+0.205
yt+0.204
rys+0.200
 stub+0.196
Neg logit contributions
old-0.246
 safer-0.213
 him-0.212
 better-0.212
 welcome-0.207
Token all
Token ID477
Feature activation-5.97e-02

Pos logit contributions
 antenna+0.092
nam+0.087
yt+0.086
rys+0.085
 stub+0.084
Neg logit contributions
old-0.059
 him-0.049
 better-0.048
 safer-0.047
 doctor-0.047
Token the
Token ID262
Feature activation+3.51e-02

Pos logit contributions
 antenna+0.103
yt+0.097
nam+0.096
rys+0.093
 stub+0.092
Neg logit contributions
old-0.103
 doctor-0.090
 Doctor-0.087
 safer-0.086
 him-0.086
Token "
Token ID366
Feature activation-5.97e-02

Pos logit contributions
 antenna+0.008
yt+0.008
rys+0.008
nam+0.008
 stub+0.008
Neg logit contributions
old-0.004
 him-0.004
 better-0.003
 doctor-0.003
 Doctor-0.003
TokenExc
Token ID40127
Feature activation-6.13e-02

Pos logit contributions
 antenna+0.108
nam+0.103
yt+0.100
rys+0.100
 stub+0.099
Neg logit contributions
old-0.098
 safer-0.083
 better-0.082
 him-0.082
 doctor-0.080
Tokenuse
Token ID1904
Feature activation-7.93e-02

Pos logit contributions
 antenna+0.063
nam+0.061
yt+0.059
rys+0.057
 stub+0.056
Neg logit contributions
old-0.144
 safer-0.129
 doctor-0.129
 him-0.127
 Doctor-0.127
Token me
Token ID502
Feature activation+4.49e-02

Pos logit contributions
 antenna+0.134
nam+0.126
yt+0.125
rys+0.121
 stub+0.120
Neg logit contributions
old-0.137
 safer-0.119
 better-0.116
 doctor-0.114
 him-0.114
Token squ
Token ID2809
Feature activation-9.53e-02

Pos logit contributions
old+0.063
 him+0.053
Here+0.052
 safer+0.051
 better+0.050
Neg logit contributions
 antenna-0.091
yt-0.085
nam-0.085
rys-0.083
 stub-0.083
Tokenir
Token ID343
Feature activation-1.77e-01

Pos logit contributions
 antenna+0.175
rys+0.168
yt+0.165
nam+0.165
 stub+0.158
Neg logit contributions
old-0.149
 safer-0.130
 doctor-0.129
 better-0.129
 him-0.128
Tokenrell
Token ID11252
Feature activation-4.86e-02

Pos logit contributions
nam+0.263
 antenna+0.245
yt+0.232
 stub+0.226
 nowhere+0.224
Neg logit contributions
old-0.364
 him-0.333
 welcome-0.324
 doctor-0.317
 Doctor-0.307
Tokeny
Token ID88
Feature activation-1.39e-01

Pos logit contributions
 antenna+0.078
nam+0.073
yt+0.073
rys+0.071
 stub+0.069
Neg logit contributions
old-0.088
 him-0.079
 better-0.078
 safer-0.078
 doctor-0.076
Token,
Token ID11
Feature activation-3.58e-03

Pos logit contributions
 antenna+0.287
nam+0.275
yt+0.269
rys+0.267
 stub+0.263
Neg logit contributions
old-0.194
 him-0.163
 better-0.158
 doctor-0.158
 safer-0.155
Token I
Token ID314
Feature activation-5.19e-02

Pos logit contributions
 antenna+0.007
nam+0.007
yt+0.007
rys+0.007
 stub+0.006
Neg logit contributions
old-0.005
 him-0.004
 safer-0.004
five-0.004
Here-0.004
Token was
Token ID373
Feature activation+1.30e-02

Pos logit contributions
 antenna+0.103
 stub+0.098
yt+0.097
rys+0.096
nam+0.094
Neg logit contributions
old-0.074
 him-0.062
 safer-0.061
 Doctor-0.060
 doctor-0.059
Token that
Token ID326
Feature activation+4.42e-02

Pos logit contributions
 antenna+0.142
nam+0.136
yt+0.135
rys+0.133
 stub+0.132
Neg logit contributions
old-0.060
 him-0.047
 better-0.045
 safer-0.045
 doctor-0.044
Token her
Token ID607
Feature activation-2.28e-02

Pos logit contributions
old+0.050
 him+0.045
 better+0.044
 doctor+0.042
 safer+0.041
Neg logit contributions
 antenna-0.103
yt-0.098
rys-0.096
ifying-0.096
 tentacles-0.095
Token tie
Token ID9839
Feature activation-8.97e-03

Pos logit contributions
 antenna+0.041
nam+0.040
yt+0.039
rys+0.038
ifying+0.037
Neg logit contributions
old-0.036
 better-0.032
 doctor-0.031
 him-0.031
 safer-0.031
Token was
Token ID373
Feature activation+7.99e-03

Pos logit contributions
 antenna+0.018
nam+0.018
yt+0.017
rys+0.017
 stub+0.017
Neg logit contributions
old-0.013
 him-0.011
 better-0.011
 doctor-0.010
 safer-0.010
Token too
Token ID1165
Feature activation+4.07e-02

Pos logit contributions
old+0.012
 better+0.011
 safer+0.010
 him+0.010
 doctor+0.010
Neg logit contributions
 antenna-0.016
yt-0.015
nam-0.015
rys-0.014
 stub-0.014
Token long
Token ID890
Feature activation-1.76e-01

Pos logit contributions
old+0.054
 better+0.051
 safer+0.047
 fun+0.044
 ice+0.043
Neg logit contributions
 antenna-0.087
nam-0.082
yt-0.081
 satellites-0.081
fly-0.080
Token and
Token ID290
Feature activation+1.56e-02

Pos logit contributions
 antenna+0.398
yt+0.376
nam+0.376
 stub+0.368
rys+0.367
Neg logit contributions
old-0.210
 him-0.165
 safer-0.163
 doctor-0.163
 better-0.161
Token was
Token ID373
Feature activation+8.50e-03

Pos logit contributions
old+0.022
 better+0.021
 safer+0.021
 him+0.020
 doctor+0.019
Neg logit contributions
 antenna-0.031
yt-0.030
rys-0.029
 stub-0.028
nam-0.028
Token dragging
Token ID25046
Feature activation+1.70e-02

Pos logit contributions
old+0.012
 better+0.012
 safer+0.011
 him+0.011
 ice+0.011
Neg logit contributions
 antenna-0.016
yt-0.016
nam-0.015
rys-0.015
 stub-0.015
Token on
Token ID319
Feature activation-1.16e-01

Pos logit contributions
old+0.023
 him+0.019
 safer+0.019
 doctor+0.019
 better+0.019
Neg logit contributions
 antenna-0.036
nam-0.034
yt-0.033
rys-0.033
 stub-0.033
Token the
Token ID262
Feature activation+3.38e-02

Pos logit contributions
 antenna+0.259
yt+0.249
nam+0.248
rys+0.244
 stub+0.241
Neg logit contributions
old-0.137
 him-0.120
 better-0.113
 safer-0.111
 doctor-0.110
Token loved
Token ID6151
Feature activation+8.71e-03

Pos logit contributions
old+0.012
 him+0.011
 better+0.010
 doctor+0.010
 safer+0.010
Neg logit contributions
 antenna-0.017
nam-0.016
yt-0.016
rys-0.016
ifying-0.015
Token to
Token ID284
Feature activation-2.33e-02

Pos logit contributions
old+0.012
 him+0.010
 better+0.010
 doctor+0.010
 safer+0.010
Neg logit contributions
 antenna-0.018
nam-0.017
yt-0.017
rys-0.017
 stub-0.017
Token play
Token ID711
Feature activation+1.08e-01

Pos logit contributions
 antenna+0.054
nam+0.052
yt+0.051
rys+0.051
 stub+0.050
Neg logit contributions
old-0.026
 him-0.022
 better-0.021
 doctor-0.020
 safer-0.020
Token all
Token ID477
Feature activation-5.81e-02

Pos logit contributions
old+0.184
 safer+0.146
 better+0.142
 Doctor+0.141
 doctor+0.140
Neg logit contributions
 antenna-0.195
yt-0.177
nam-0.177
 stub-0.175
rys-0.174
Token day
Token ID1110
Feature activation-7.55e-03

Pos logit contributions
 antenna+0.110
rys+0.103
yt+0.102
nam+0.102
 stub+0.100
Neg logit contributions
old-0.087
 doctor-0.073
 safer-0.073
 better-0.071
 Doctor-0.071
Token long
Token ID890
Feature activation-1.78e-01

Pos logit contributions
rys+0.016
chie+0.015
ifying+0.015
fly+0.015
 antenna+0.015
Neg logit contributions
 him-0.010
old-0.009
 about-0.009
 fun-0.009
 better-0.009
Token.
Token ID13
Feature activation+2.04e-02

Pos logit contributions
 antenna+0.450
nam+0.439
yt+0.437
rys+0.433
 stub+0.426
Neg logit contributions
old-0.164
 him-0.139
 better-0.122
 safer-0.117
 doctor-0.116
Token He
Token ID679
Feature activation+6.31e-02

Pos logit contributions
old+0.039
 cleaner+0.032
 safer+0.032
 cream+0.032
 better+0.032
Neg logit contributions
 antenna-0.032
 stub-0.029
nam-0.028
 magn-0.027
 tentacles-0.027
Token had
Token ID550
Feature activation-1.42e-01

Pos logit contributions
old+0.083
 him+0.076
 better+0.075
 doctor+0.072
 ice+0.071
Neg logit contributions
ifying-0.133
nam-0.132
rys-0.132
 antenna-0.131
yt-0.128
Token many
Token ID867
Feature activation-5.43e-02

Pos logit contributions
 antenna+0.326
yt+0.316
rys+0.313
nam+0.313
 stub+0.300
Neg logit contributions
old-0.163
 him-0.141
 better-0.135
 safer-0.133
 doctor-0.129
Token friends
Token ID2460
Feature activation+5.94e-02

Pos logit contributions
 antenna+0.123
nam+0.116
yt+0.115
rys+0.113
 stub+0.112
Neg logit contributions
old-0.066
 him-0.053
 better-0.051
 doctor-0.051
 safer-0.051