TOP ACTIVATIONS
MAX = 26.413

still stubborn but he was determined to figure it all out
she was brave and was determined to reach the other side
But even though Jack was determined , he hit the ball
sunny day , he was determined to reach the top of
scared , but she was determined to try it out .
T eddy was determined . He knew he could
The cats were still determined to get the mouse .
But Ted was determined to try it . He
But she was determined to learn how to read
quite ignorant but she was determined to take care of the
game , but he was determined to still play fair .
kept cleaning . She was determined to finish the job .
small towel . He was determined to succeed in iron ing
and started building again , determined to make an even longer
Max nodded , determined to get better . He
Lucy kept trying and was determined to increase her knowledge .
a promise . She was determined to find Sarah and deliver
on her head and was determined to succeed . She went
. Sam was determined . He threw on more
frustrated , but she was determined to find a way to

MIN ACTIVATIONS
MIN = -27.089

clown s and the ac rob ats and the lions and
elephants , lions and ac rob ats . They are amazed
of fun watching the ac rob ats and clown s .
clown s juggling and ac rob ats doing flips . But
thought they looked like ac rob ats . She enjoyed the
the animals , the ac rob ats and the clown s
saw clown s and ac rob ats and lions and elephants
a black cat named Wh isk ers . Wh isk ers
day , Lily and Wh isk ers were playing with their
walked , Mr . Wh isk ers told the girl about
saw her searching . Wh isk ers liked to play with
B uddy and Wh isk ers started to fight over
free ." Wh isk ers didn 't listen to
" Thank you , Wh isk ers !" The
a quiet mouse named Wh isk ers . Wh isk ers
others ." Wh isk ers listened to the owl
got back inside , Wh isk ers saw a teaspoon on
Lucy and Mr . Wh isk ers were very happy .
's pet cat , Wh isk ers , jumped up and
Wh isk ers . Wh isk ers loved to walk around

INTERVAL 13.038 - 26.413
CONTAINS 1.109%

Tom was happy and grateful . He opened the box
The boy was glad he had decided to be
Tim my was surprised and scared , but also excited .
Everyone in the class was surprised . Was it okay to
look ! The bird is scared . Can we help it

INTERVAL -0.338 - 13.038
CONTAINS 55.613%

. She looked everywhere but couldn 't find it .
it is too hard . She can 't cut it .
felt happy and proud of herself . They continued to play
him . She was only three years old and was excited
The mummy pig and daddy pig enjoyed their milk

INTERVAL -13.713 - -0.338
CONTAINS 42.218%

at Ben and said : " I 'm sorry
to be a surprise for the village and not something that
ila and Ben went to the market with Mom . They
Ben does not want to go . He wants to watch
but he knew that it was time for bed .

INTERVAL -27.089 - -13.713
CONTAINS 1.059%

take it home and show our parents !" Lucy said .
made ." S ara promised . She helped her
. OK ?" L ily nodded . She said
too dangerous . L ily felt sad , but
excited again . <|endoftext|> Tom is a big boy who likes
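
The four INTERVAL boundaries above are consistent with splitting the range from MIN to MAX into four equal-width bins. The following is a minimal sketch under that assumption; the helper name `equal_width_edges` is hypothetical and is not taken from whatever tool produced this dump. It recomputes the bin edges from the reported MIN and MAX and checks that the reported CONTAINS percentages add up to roughly 100%.

```python
def equal_width_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 edges that split [lo, hi] into equal-width bins."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX activation values as reported in the dump.
edges = equal_width_edges(-27.089, 26.413)

# CONTAINS percentages as reported, listed from the highest interval down.
shares = [1.109, 55.613, 42.218, 1.059]

# Print each bin as "low to high", lowest bin first.
for low, high in zip(edges[:-1], edges[1:]):
    print(f"{low:.3f} to {high:.3f}")

# The shares should cover (almost) all tokens; rounding leaves a small gap.
print(f"total share: {sum(shares):.3f}%")
```

Under this reading, the two middle bins hold nearly all tokens, while the outermost bins (where the TOP and MIN ACTIVATIONS examples live) each hold about 1%.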
Token still
Token ID991
Feature activation+1.19e+01

Pos logit contributions
 curious+0.112
 nervous+0.111
 determined+0.110
 hungry+0.108
 confused+0.108
Neg logit contributions
eds-0.112
w-0.104
ara-0.102
ating-0.101
icer-0.098
Token stubborn
Token ID24444
Feature activation+1.55e+01

Pos logit contributions
 curious+0.574
 nervous+0.565
 determined+0.564
 hungry+0.551
 confused+0.548
Neg logit contributions
eds-0.744
w-0.701
ara-0.683
ating-0.680
of-0.659
Token but
Token ID475
Feature activation+1.09e+01

Pos logit contributions
 curious+0.779
 nervous+0.771
 determined+0.763
 hungry+0.755
 confused+0.745
Neg logit contributions
eds-0.948
w-0.892
ara-0.872
ating-0.871
icer-0.857
Token he
Token ID339
Feature activation+1.44e+01

Pos logit contributions
 curious+0.434
 nervous+0.430
 determined+0.424
 confused+0.413
 hungry+0.413
Neg logit contributions
eds-0.754
w-0.733
ara-0.716
ating-0.705
of-0.695
Token was
Token ID373
Feature activation-6.51e+00

Pos logit contributions
 curious+0.777
 determined+0.767
 nervous+0.764
 hungry+0.748
 confused+0.744
Neg logit contributions
eds-0.828
w-0.770
ara-0.754
ating-0.752
icer-0.726
Token determined
Token ID5295
Feature activation+2.47e+01

Pos logit contributions
eds+0.432
 us+0.404
ating+0.400
w+0.397
 is+0.397
Neg logit contributions
 nervous-0.186
 curious-0.180
 hungry-0.175
 thirsty-0.174
 confused-0.172
Token to
Token ID284
Feature activation+4.02e+00

Pos logit contributions
 curious+1.107
 nervous+1.088
 determined+1.088
 hungry+1.061
 confused+1.053
Neg logit contributions
eds-1.636
w-1.545
ara-1.517
ating-1.507
of-1.463
Token figure
Token ID3785
Feature activation+8.28e+00

Pos logit contributions
 curious+0.229
 determined+0.226
 nervous+0.225
 hungry+0.220
 confused+0.219
Neg logit contributions
eds-0.220
w-0.206
ara-0.201
ating-0.198
icer-0.193
Token it
Token ID340
Feature activation+4.88e+00

Pos logit contributions
 curious+0.380
 nervous+0.372
 determined+0.371
 confused+0.360
 hungry+0.358
Neg logit contributions
eds-0.543
w-0.512
ara-0.509
ating-0.499
icer-0.493
Token all
Token ID477
Feature activation+3.83e+00

Pos logit contributions
 curious+0.259
 nervous+0.256
 determined+0.255
 hungry+0.251
 confused+0.249
Neg logit contributions
eds-0.281
w-0.263
ating-0.257
ara-0.257
of-0.248
Token out
Token ID503
Feature activation+1.26e+00

Pos logit contributions
 determined+0.202
 curious+0.202
 nervous+0.201
 confused+0.195
 excited+0.193
Neg logit contributions
eds-0.225
ara-0.218
w-0.211
icer-0.209
ucks-0.201
Token she
Token ID673
Feature activation+1.59e+01

Pos logit contributions
 curious+1.055
 determined+1.051
 nervous+1.037
 hungry+1.020
 confused+1.018
Neg logit contributions
eds-0.974
w-0.888
ara-0.873
ating-0.865
of-0.849
Token was
Token ID373
Feature activation-5.34e+00

Pos logit contributions
 curious+0.834
 nervous+0.816
 determined+0.816
 hungry+0.804
 confused+0.797
Neg logit contributions
eds-0.958
w-0.881
ating-0.875
ara-0.863
icer-0.854
Token brave
Token ID14802
Feature activation+1.57e+01

Pos logit contributions
eds+0.376
w+0.347
ating+0.347
ee+0.336
ma+0.336
Neg logit contributions
 nervous-0.165
 determined-0.165
 curious-0.163
 unsure-0.160
 confused-0.154
Token and
Token ID290
Feature activation+3.24e+00

Pos logit contributions
 determined+0.904
 nervous+0.884
 thirsty+0.884
 curious+0.883
 confused+0.876
Neg logit contributions
eds-0.745
ara-0.714
of-0.686
st-0.681
ent-0.657
Token was
Token ID373
Feature activation-6.51e+00

Pos logit contributions
 nervous+0.093
 sorry+0.092
 unsure+0.092
 confused+0.091
 thirsty+0.087
Neg logit contributions
 is-0.208
w-0.207
eds-0.206
ating-0.201
st-0.198
Token determined
Token ID5295
Feature activation+2.47e+01

Pos logit contributions
eds+0.436
w+0.416
ating+0.412
ee+0.399
st+0.399
Neg logit contributions
 nervous-0.224
 confused-0.218
 curious-0.214
 hungry-0.211
 unsure-0.209
Token to
Token ID284
Feature activation+4.02e+00

Pos logit contributions
 curious+1.042
 nervous+1.024
 determined+1.022
 hungry+0.998
 confused+0.990
Neg logit contributions
eds-1.698
w-1.608
ara-1.578
ating-1.570
of-1.528
Token reach
Token ID3151
Feature activation+1.67e+00

Pos logit contributions
 curious+0.229
 determined+0.226
 nervous+0.225
 hungry+0.221
 confused+0.219
Neg logit contributions
eds-0.219
w-0.204
ara-0.200
ating-0.197
icer-0.191
Token the
Token ID262
Feature activation-2.27e+00

Pos logit contributions
 curious+0.081
 nervous+0.080
 determined+0.080
 hungry+0.078
 confused+0.077
Neg logit contributions
eds-0.105
w-0.099
ara-0.097
ating-0.096
icer-0.093
Token other
Token ID584
Feature activation-3.14e+00

Pos logit contributions
eds+0.129
w+0.120
ara+0.117
ating+0.117
of+0.112
Neg logit contributions
 curious-0.125
 nervous-0.122
 determined-0.122
 hungry-0.120
 confused-0.119
Token side
Token ID1735
Feature activation-5.00e+00

Pos logit contributions
eds+0.167
w+0.160
ating+0.151
ara+0.150
ucks+0.143
Neg logit contributions
 curious-0.193
 nervous-0.186
 hungry-0.183
 determined-0.183
 excited-0.179
TokenBut
Token ID1537
Feature activation+6.70e+00

Pos logit contributions
 curious+0.237
 nervous+0.237
 determined+0.237
 confused+0.232
 hungry+0.230
Neg logit contributions
eds-0.290
ara-0.268
w-0.268
ating-0.265
st-0.255
Token even
Token ID772
Feature activation+1.02e+01

Pos logit contributions
 curious+0.289
 nervous+0.283
 determined+0.283
 hungry+0.276
 confused+0.274
Neg logit contributions
eds-0.456
w-0.431
ara-0.423
ating-0.421
of-0.408
Token though
Token ID996
Feature activation+8.25e+00

Pos logit contributions
 curious+0.446
 nervous+0.444
 determined+0.442
 hungry+0.431
 confused+0.427
Neg logit contributions
eds-0.645
w-0.628
ara-0.608
ating-0.607
of-0.594
Token Jack
Token ID3619
Feature activation+9.65e+00

Pos logit contributions
 curious+0.300
 determined+0.294
 nervous+0.294
 hungry+0.285
 confused+0.282
Neg logit contributions
eds-0.616
w-0.584
ara-0.575
ating-0.572
of-0.557
Token was
Token ID373
Feature activation-6.51e+00

Pos logit contributions
 curious+0.559
 determined+0.558
 nervous+0.554
 confused+0.547
 hungry+0.541
Neg logit contributions
eds-0.530
ating-0.475
ucks-0.468
w-0.466
ara-0.452
Token determined
Token ID5295
Feature activation+2.47e+01

Pos logit contributions
eds+0.424
ating+0.398
w+0.397
 is+0.385
st+0.384
Neg logit contributions
 nervous-0.215
 curious-0.211
 unsure-0.209
 determined-0.207
 confused-0.202
Token,
Token ID11
Feature activation+4.27e+00

Pos logit contributions
 curious+1.188
 determined+1.170
 nervous+1.170
 hungry+1.142
 confused+1.133
Neg logit contributions
eds-1.562
w-1.467
ara-1.438
ating-1.425
icer-1.395
Token he
Token ID339
Feature activation+1.34e+01

Pos logit contributions
 nervous+0.135
 curious+0.131
 confused+0.129
 determined+0.127
 hungry+0.126
Neg logit contributions
eds-0.288
ating-0.288
w-0.287
st-0.286
ara-0.283
Token hit
Token ID2277
Feature activation+1.90e+00

Pos logit contributions
 curious+0.690
 nervous+0.685
 determined+0.676
 hungry+0.672
 confused+0.661
Neg logit contributions
eds-0.776
w-0.739
ara-0.722
ating-0.708
of-0.695
Token the
Token ID262
Feature activation-2.41e+00

Pos logit contributions
 curious+0.083
 nervous+0.082
 determined+0.082
 hungry+0.080
 confused+0.079
Neg logit contributions
eds-0.128
w-0.121
ara-0.119
ating-0.118
of-0.115
Token ball
Token ID2613
Feature activation-1.07e+00

Pos logit contributions
eds+0.139
w+0.130
ara+0.128
ating+0.126
of+0.122
Neg logit contributions
 curious-0.128
 determined-0.126
 nervous-0.126
 hungry-0.123
 confused-0.122
Token sunny
Token ID27737
Feature activation+2.55e+00

Pos logit contributions
eds+0.134
w+0.129
st+0.122
of+0.117
ating+0.114
Neg logit contributions
 determined-0.181
 curious-0.175
 sorry-0.174
 nervous-0.168
 unsure-0.168
Token day
Token ID1110
Feature activation+3.82e+00

Pos logit contributions
 curious+0.134
 hungry+0.131
 nervous+0.130
 excited+0.128
 determined+0.127
Neg logit contributions
eds-0.148
w-0.138
ara-0.135
of-0.134
icer-0.133
Token,
Token ID11
Feature activation+5.44e+00

Pos logit contributions
 curious+0.196
 determined+0.194
 nervous+0.194
 confused+0.189
 hungry+0.188
Neg logit contributions
eds-0.227
w-0.213
ara-0.209
ating-0.207
of-0.201
Token he
Token ID339
Feature activation+1.44e+01

Pos logit contributions
 determined+0.228
 curious+0.227
 nervous+0.227
 confused+0.221
 hungry+0.218
Neg logit contributions
eds-0.362
ara-0.347
w-0.343
of-0.334
ating-0.332
Token was
Token ID373
Feature activation-6.51e+00

Pos logit contributions
 curious+0.809
 determined+0.798
 nervous+0.795
 hungry+0.781
 confused+0.775
Neg logit contributions
eds-0.799
w-0.745
ating-0.731
ara-0.723
icer-0.701
Token determined
Token ID5295
Feature activation+2.47e+01

Pos logit contributions
eds+0.465
ee+0.427
w+0.422
ating+0.414
st+0.412
Neg logit contributions
 curious-0.174
 unsure-0.173
 nervous-0.171
 confused-0.165
 determined-0.165
Token to
Token ID284
Feature activation+4.02e+00

Pos logit contributions
 curious+1.070
 nervous+1.052
 determined+1.043
 hungry+1.026
 confused+1.022
Neg logit contributions
eds-1.665
w-1.571
ara-1.536
ating-1.535
of-1.496
Token reach
Token ID3151
Feature activation+1.67e+00

Pos logit contributions
 curious+0.229
 determined+0.226
 nervous+0.226
 hungry+0.221
 confused+0.219
Neg logit contributions
eds-0.219
w-0.205
ara-0.200
ating-0.197
icer-0.192
Token the
Token ID262
Feature activation-2.27e+00

Pos logit contributions
 curious+0.079
 nervous+0.078
 determined+0.078
 hungry+0.077
 confused+0.076
Neg logit contributions
eds-0.106
w-0.101
ara-0.099
ating-0.098
icer-0.095
Token top
Token ID1353
Feature activation+3.43e+00

Pos logit contributions
eds+0.134
w+0.125
ara+0.122
ating+0.122
of+0.117
Neg logit contributions
 curious-0.120
 nervous-0.118
 determined-0.117
 hungry-0.115
 confused-0.114
Token of
Token ID286
Feature activation-4.89e+00

Pos logit contributions
 curious+0.188
 nervous+0.187
 determined+0.185
 hungry+0.182
 confused+0.181
Neg logit contributions
eds-0.193
w-0.181
ara-0.176
ating-0.174
icer-0.168
Token scared
Token ID12008
Feature activation+2.13e+01

Pos logit contributions
 curious+0.338
 determined+0.336
 nervous+0.323
 confused+0.314
 hungry+0.310
Neg logit contributions
eds-0.414
w-0.396
ara-0.379
of-0.370
ating-0.367
Token,
Token ID11
Feature activation+5.44e+00

Pos logit contributions
 curious+1.139
 determined+1.130
 nervous+1.121
 hungry+1.097
 confused+1.093
Neg logit contributions
eds-1.226
w-1.146
ara-1.123
ating-1.110
of-1.081
Token but
Token ID475
Feature activation+1.06e+01

Pos logit contributions
 determined+0.202
 curious+0.194
 nervous+0.193
 unsure+0.188
 confused+0.185
Neg logit contributions
eds-0.349
ara-0.333
w-0.330
 is-0.326
 us-0.325
Token she
Token ID673
Feature activation+1.45e+01

Pos logit contributions
 nervous+0.418
 curious+0.417
 determined+0.411
 confused+0.400
 hungry+0.393
Neg logit contributions
w-0.698
eds-0.693
ara-0.665
of-0.654
ating-0.650
Token was
Token ID373
Feature activation-6.79e+00

Pos logit contributions
 curious+0.764
 determined+0.752
 nervous+0.751
 hungry+0.735
 confused+0.731
Neg logit contributions
eds-0.850
w-0.794
ara-0.777
ating-0.775
icer-0.747
Token determined
Token ID5295
Feature activation+2.45e+01

Pos logit contributions
eds+0.455
e+0.411
 us+0.407
ating+0.407
ma+0.406
Neg logit contributions
 nervous-0.196
 hungry-0.188
 curious-0.185
 unsure-0.183
 confused-0.183
Token to
Token ID284
Feature activation+3.84e+00

Pos logit contributions
 curious+1.114
 nervous+1.100
 determined+1.089
 hungry+1.076
 confused+1.067
Neg logit contributions
eds-1.600
w-1.509
ating-1.475
ara-1.475
of-1.436
Token try
Token ID1949
Feature activation+6.18e+00

Pos logit contributions
 curious+0.218
 nervous+0.215
 determined+0.215
 hungry+0.211
 confused+0.210
Neg logit contributions
eds-0.209
w-0.195
ara-0.191
ating-0.189
of-0.182
Token it
Token ID340
Feature activation+4.74e+00

Pos logit contributions
 curious+0.308
 nervous+0.303
 determined+0.303
 hungry+0.297
 confused+0.295
Neg logit contributions
eds-0.379
w-0.356
ara-0.349
ating-0.347
of-0.336
Token out
Token ID503
Feature activation+1.26e+00

Pos logit contributions
 curious+0.254
 nervous+0.251
 determined+0.250
 hungry+0.245
 confused+0.243
Neg logit contributions
eds-0.275
w-0.258
ara-0.255
ating-0.248
icer-0.244
Token.
Token ID13
Feature activation+1.66e+00

Pos logit contributions
 curious+0.066
 determined+0.065
 nervous+0.065
 hungry+0.064
 confused+0.063
Neg logit contributions
eds-0.073
w-0.069
ara-0.068
ating-0.067
icer-0.065
Token\n
Token ID198
Feature activation+6.45e-01

Pos logit contributions
eds+0.118
ara+0.116
w+0.112
of+0.109
ating+0.105
Neg logit contributions
 determined-0.108
 curious-0.107
 nervous-0.105
 confused-0.102
 hungry-0.102
Token\n
Token ID198
Feature activation+1.79e-01

Pos logit contributions
 curious+0.030
 determined+0.028
 nervous+0.028
 excited+0.028
 hungry+0.028
Neg logit contributions
eds-0.042
ating-0.041
w-0.040
ators-0.039
icer-0.038
TokenT
Token ID51
Feature activation-3.27e+00

Pos logit contributions
 curious+0.011
 determined+0.011
 nervous+0.011
 excited+0.011
 hungry+0.011
Neg logit contributions
eds-0.009
ating-0.008
w-0.008
icer-0.007
ators-0.007
Tokeneddy
Token ID21874
Feature activation-1.18e+01

Pos logit contributions
eds+0.115
w+0.099
ating+0.097
ators+0.091
icer+0.088
Neg logit contributions
 curious-0.243
 nervous-0.239
 confused-0.235
 hungry-0.232
 excited-0.231
Token was
Token ID373
Feature activation-6.79e+00

Pos logit contributions
eds+0.730
w+0.686
ating+0.676
icer+0.659
icy+0.655
Neg logit contributions
 curious-0.586
 confused-0.564
 hungry-0.558
 nervous-0.555
 determined-0.546
Token determined
Token ID5295
Feature activation+2.45e+01

Pos logit contributions
eds+0.465
ma+0.427
w+0.425
ating+0.421
e+0.418
Neg logit contributions
 nervous-0.171
 unsure-0.166
 curious-0.164
 determined-0.163
 hungry-0.158
Token.
Token ID13
Feature activation+2.18e+00

Pos logit contributions
 curious+1.152
 nervous+1.128
 determined+1.122
 hungry+1.103
 confused+1.099
Neg logit contributions
eds-1.567
w-1.480
ara-1.454
ating-1.439
of-1.415
Token He
Token ID679
Feature activation+1.13e+01

Pos logit contributions
 determined+0.080
 thirsty+0.079
 hungry+0.079
 curious+0.079
 nervous+0.074
Neg logit contributions
ara-0.150
eds-0.144
of-0.144
icer-0.138
ucks-0.138
Token knew
Token ID2993
Feature activation+7.53e+00

Pos logit contributions
 curious+0.559
 nervous+0.555
 determined+0.548
 hungry+0.544
 confused+0.536
Neg logit contributions
eds-0.680
w-0.642
ara-0.633
ating-0.621
of-0.612
Token he
Token ID339
Feature activation+1.30e+01

Pos logit contributions
 nervous+0.317
 curious+0.309
 hungry+0.306
 determined+0.304
 confused+0.301
Neg logit contributions
eds-0.504
ating-0.481
w-0.479
ara-0.475
st-0.467
Token could
Token ID714
Feature activation+4.47e+00

Pos logit contributions
 curious+0.763
 nervous+0.756
 determined+0.752
 hungry+0.742
 confused+0.736
Neg logit contributions
eds-0.682
w-0.637
ara-0.620
ating-0.616
of-0.594
Token\n
Token ID198
Feature activation+6.45e-01

Pos logit contributions
 curious+0.052
 determined+0.052
 nervous+0.050
 hungry+0.050
 excited+0.049
Neg logit contributions
eds-0.052
ating-0.049
w-0.048
ara-0.046
icer-0.046
TokenThe
Token ID464
Feature activation-5.32e+00

Pos logit contributions
 determined+0.040
 curious+0.040
 nervous+0.038
 excited+0.038
 hungry+0.038
Neg logit contributions
eds-0.032
ating-0.030
w-0.029
icer-0.028
ara-0.027
Token cats
Token ID11875
Feature activation-1.90e+00

Pos logit contributions
eds+0.310
w+0.292
ara+0.287
ating+0.282
of+0.274
Neg logit contributions
 curious-0.274
 determined-0.272
 nervous-0.272
 hungry-0.265
 confused-0.264
Token were
Token ID547
Feature activation-2.53e+00

Pos logit contributions
eds+0.118
w+0.109
ara+0.108
ating+0.107
icer+0.104
Neg logit contributions
 curious-0.095
 determined-0.094
 nervous-0.094
 confused-0.091
 hungry-0.091
Token still
Token ID991
Feature activation+1.02e+01

Pos logit contributions
eds+0.184
ating+0.167
 us+0.166
w+0.165
ee+0.160
Neg logit contributions
 nervous-0.073
 unsure-0.072
 determined-0.071
 curious-0.071
 confident-0.067
Token determined
Token ID5295
Feature activation+2.45e+01

Pos logit contributions
 determined+0.360
 curious+0.350
 nervous+0.349
 unsure+0.343
 sorry+0.333
Neg logit contributions
eds-0.732
w-0.664
ating-0.658
st-0.648
ee-0.642
Token to
Token ID284
Feature activation+3.84e+00

Pos logit contributions
 curious+1.109
 nervous+1.094
 determined+1.083
 hungry+1.063
 confused+1.059
Neg logit contributions
eds-1.607
w-1.513
ating-1.482
ara-1.481
of-1.440
Token get
Token ID651
Feature activation-1.16e+00

Pos logit contributions
 curious+0.218
 determined+0.215
 nervous+0.213
 hungry+0.209
 confused+0.208
Neg logit contributions
eds-0.212
w-0.198
ara-0.195
ating-0.191
icer-0.188
Token the
Token ID262
Feature activation-2.41e+00

Pos logit contributions
eds+0.066
w+0.062
ating+0.060
st+0.058
of+0.057
Neg logit contributions
 curious-0.058
 nervous-0.058
 determined-0.058
 confused-0.056
 unsure-0.056
Token mouse
Token ID10211
Feature activation+4.44e+00

Pos logit contributions
eds+0.132
w+0.123
ara+0.120
ating+0.119
of+0.115
Neg logit contributions
 curious-0.134
 determined-0.133
 nervous-0.132
 hungry-0.130
 confused-0.129
Token.
Token ID13
Feature activation+1.66e+00

Pos logit contributions
 curious+0.220
 nervous+0.218
 determined+0.217
 confused+0.212
 hungry+0.212
Neg logit contributions
eds-0.269
w-0.250
ating-0.245
ara-0.244
of-0.240
Token\n
Token ID198
Feature activation+1.79e-01

Pos logit contributions
eds+0.448
ara+0.444
w+0.441
of+0.422
ee+0.408
Neg logit contributions
 determined-0.375
 curious-0.370
 nervous-0.362
 confused-0.353
 hungry-0.352
Token\n
Token ID198
Feature activation-2.02e-01

Pos logit contributions
 curious+0.009
 determined+0.009
 nervous+0.008
 excited+0.008
 hungry+0.008
Neg logit contributions
eds-0.011
ating-0.011
w-0.011
ators-0.010
icer-0.010
TokenBut
Token ID1537
Feature activation+5.24e+00

Pos logit contributions
eds+0.010
ating+0.009
w+0.009
icer+0.009
ara+0.009
Neg logit contributions
 curious-0.013
 determined-0.013
 nervous-0.012
 excited-0.012
 hungry-0.012
Token Ted
Token ID11396
Feature activation+8.54e+00

Pos logit contributions
 curious+0.244
 nervous+0.243
 determined+0.240
 hungry+0.233
 confused+0.232
Neg logit contributions
eds-0.326
w-0.316
ara-0.313
ating-0.301
of-0.297
Token was
Token ID373
Feature activation-6.97e+00

Pos logit contributions
 curious+0.448
 determined+0.443
 nervous+0.441
 confused+0.432
 hungry+0.432
Neg logit contributions
eds-0.509
w-0.469
ating-0.463
ara-0.459
icer-0.452
Token determined
Token ID5295
Feature activation+2.43e+01

Pos logit contributions
eds+0.448
ile+0.409
 us+0.407
e+0.404
w+0.401
Neg logit contributions
 curious-0.184
 nervous-0.184
 hungry-0.178
 hopeful-0.176
 thirsty-0.174
Token to
Token ID284
Feature activation+3.69e+00

Pos logit contributions
 curious+1.160
 nervous+1.139
 determined+1.132
 hungry+1.117
 confused+1.108
Neg logit contributions
eds-1.537
w-1.451
ara-1.424
ating-1.412
of-1.381
Token try
Token ID1949
Feature activation+6.04e+00

Pos logit contributions
 curious+0.211
 nervous+0.209
 determined+0.208
 hungry+0.205
 confused+0.204
Neg logit contributions
eds-0.199
w-0.186
ara-0.181
ating-0.181
of-0.174
Token it
Token ID340
Feature activation+4.62e+00

Pos logit contributions
 curious+0.300
 nervous+0.297
 determined+0.295
 hungry+0.289
 confused+0.288
Neg logit contributions
eds-0.372
w-0.340
ating-0.338
ara-0.337
st-0.327
Token.
Token ID13
Feature activation+1.66e+00

Pos logit contributions
 curious+0.257
 determined+0.254
 nervous+0.252
 hungry+0.249
 confused+0.247
Neg logit contributions
eds-0.255
w-0.239
ating-0.234
ara-0.231
of-0.225
Token He
Token ID679
Feature activation+1.08e+01

Pos logit contributions
 curious+0.058
 hungry+0.058
 determined+0.058
 nervous+0.057
 thirsty+0.057
Neg logit contributions
eds-0.117
ara-0.116
of-0.113
icer-0.110
st-0.109
Token\n
Token ID198
Feature activation+1.79e-01

Pos logit contributions
 curious+0.173
 determined+0.172
 nervous+0.170
 hungry+0.166
 confused+0.165
Neg logit contributions
eds-0.264
ara-0.251
w-0.250
ating-0.243
of-0.240
Token\n
Token ID198
Feature activation-2.02e-01

Pos logit contributions
 curious+0.008
 determined+0.008
 nervous+0.008
 excited+0.008
 hungry+0.008
Neg logit contributions
eds-0.012
ating-0.011
w-0.011
ators-0.011
st-0.010
TokenBut
Token ID1537
Feature activation+5.24e+00

Pos logit contributions
eds+0.010
ating+0.009
w+0.009
icer+0.008
ators+0.008
Neg logit contributions
 curious-0.013
 determined-0.013
 nervous-0.012
 excited-0.012
 hungry-0.012
Token she
Token ID673
Feature activation+1.42e+01

Pos logit contributions
 curious+0.243
 nervous+0.241
 determined+0.239
 hungry+0.231
 confused+0.231
Neg logit contributions
eds-0.328
w-0.317
ara-0.314
ating-0.305
of-0.299
Token was
Token ID373
Feature activation-6.97e+00

Pos logit contributions
 curious+0.712
 determined+0.701
 nervous+0.696
 confused+0.684
 hungry+0.683
Neg logit contributions
eds-0.887
w-0.817
ating-0.812
ara-0.790
icer-0.784
Token determined
Token ID5295
Feature activation+2.43e+01

Pos logit contributions
eds+0.439
w+0.402
 us+0.401
ile+0.401
ma+0.400
Neg logit contributions
 nervous-0.200
 curious-0.194
 unsure-0.189
 hungry-0.187
 hopeful-0.184
Token to
Token ID284
Feature activation+3.69e+00

Pos logit contributions
 curious+1.118
 nervous+1.100
 determined+1.098
 hungry+1.074
 confused+1.066
Neg logit contributions
eds-1.584
w-1.496
ara-1.469
ating-1.458
of-1.418
Token learn
Token ID2193
Feature activation+2.02e+00

Pos logit contributions
 curious+0.212
 determined+0.209
 nervous+0.209
 hungry+0.205
 confused+0.204
Neg logit contributions
eds-0.200
w-0.186
ara-0.182
ating-0.180
of-0.174
Token how
Token ID703
Feature activation-1.93e+00

Pos logit contributions
 curious+0.095
 nervous+0.093
 determined+0.092
 confused+0.090
 hungry+0.090
Neg logit contributions
eds-0.131
w-0.126
ara-0.122
ating-0.120
icer-0.119
Token to
Token ID284
Feature activation+3.32e+00

Pos logit contributions
eds+0.118
ating+0.106
w+0.106
st+0.105
of+0.104
Neg logit contributions
 nervous-0.091
 confused-0.090
 unsure-0.089
 curious-0.089
 determined-0.089
Token read
Token ID1100
Feature activation+1.04e+00

Pos logit contributions
 curious+0.185
 determined+0.182
 nervous+0.180
 confused+0.175
 hungry+0.175
Neg logit contributions
eds-0.191
w-0.177
ara-0.173
icer-0.170
ating-0.167
Token quite
Token ID2407
Feature activation+1.15e+01

Pos logit contributions
 determined+0.044
 unsure+0.041
 curious+0.041
 nervous+0.040
 hungry+0.039
Neg logit contributions
 us-0.097
 is-0.096
eds-0.093
w-0.090
ating-0.090
Token ignorant
Token ID20109
Feature activation+1.45e+01

Pos logit contributions
 determined+0.212
 unsure+0.179
 curious+0.171
 nervous+0.163
 hungry+0.145
Neg logit contributions
w-0.873
eds-0.857
 is-0.844
ating-0.830
 us-0.824
Token but
Token ID475
Feature activation+9.96e+00

Pos logit contributions
 determined+0.834
 curious+0.825
 hungry+0.811
 nervous+0.810
 thirsty+0.804
Neg logit contributions
eds-0.756
w-0.688
ating-0.683
ara-0.675
of-0.667
Token she
Token ID673
Feature activation+1.40e+01

Pos logit contributions
 nervous+0.384
 hungry+0.365
 curious+0.361
 thirsty+0.353
 sorry+0.351
Neg logit contributions
w-0.600
eds-0.598
ara-0.578
 is-0.575
of-0.574
Token was
Token ID373
Feature activation-7.15e+00

Pos logit contributions
 curious+0.738
 nervous+0.734
 determined+0.727
 hungry+0.722
 confused+0.709
Neg logit contributions
eds-0.801
w-0.760
ara-0.744
ating-0.733
of-0.709
Token determined
Token ID5295
Feature activation+2.42e+01

Pos logit contributions
eds+0.425
 us+0.411
ma+0.407
 is+0.407
e+0.404
Neg logit contributions
 hungry-0.189
 thirsty-0.188
 nervous-0.187
dream-0.167
 bored-0.164
Token to
Token ID284
Feature activation+3.55e+00

Pos logit contributions
 curious+1.118
 nervous+1.099
 determined+1.093
 hungry+1.077
 confused+1.067
Neg logit contributions
eds-1.558
w-1.473
ara-1.442
ating-1.435
of-1.398
Token take
Token ID1011
Feature activation-5.53e+00

Pos logit contributions
 curious+0.203
 determined+0.200
 nervous+0.200
 hungry+0.196
 confused+0.195
Neg logit contributions
eds-0.193
w-0.180
ara-0.176
ating-0.174
of-0.168
Token care
Token ID1337
Feature activation+4.24e+00

Pos logit contributions
eds+0.346
w+0.329
ating+0.322
ara+0.320
of+0.314
Neg logit contributions
 curious-0.258
 determined-0.254
 nervous-0.254
 hungry-0.250
 confused-0.248
Token of
Token ID286
Feature activation-5.13e+00

Pos logit contributions
 curious+0.257
 nervous+0.253
 determined+0.253
 hungry+0.249
 confused+0.247
Neg logit contributions
eds-0.215
w-0.199
ara-0.195
ating-0.193
of-0.185
Token the
Token ID262
Feature activation-2.85e+00

Pos logit contributions
eds+0.316
w+0.296
ara+0.290
ating+0.290
of+0.280
Neg logit contributions
 curious-0.252
 nervous-0.249
 determined-0.249
 hungry-0.244
 confused-0.242
Token game
Token ID983
Feature activation-1.95e-01

Pos logit contributions
eds+0.043
w+0.040
ara+0.039
ating+0.039
of+0.037
Neg logit contributions
 curious-0.044
 determined-0.044
 nervous-0.044
 hungry-0.043
 confused-0.042
Token,
Token ID11
Feature activation+4.73e+00

Pos logit contributions
eds+0.012
w+0.011
ara+0.011
ating+0.010
icer+0.010
Neg logit contributions
 curious-0.010
 nervous-0.010
 determined-0.010
 hungry-0.010
 confused-0.010
Token but
Token ID475
Feature activation+9.96e+00

Pos logit contributions
 curious+0.193
 nervous+0.192
 determined+0.191
 hungry+0.188
 confused+0.186
Neg logit contributions
eds-0.309
ara-0.293
w-0.291
ating-0.286
of-0.285
Token he
Token ID339
Feature activation+1.36e+01

Pos logit contributions
 curious+0.417
 nervous+0.411
 determined+0.406
 hungry+0.398
 confused+0.395
Neg logit contributions
eds-0.666
w-0.647
ara-0.631
ating-0.625
of-0.611
Token was
Token ID373
Feature activation-7.15e+00

Pos logit contributions
 curious+0.767
 determined+0.760
 nervous+0.754
 hungry+0.737
 confused+0.737
Neg logit contributions
eds-0.755
w-0.698
ara-0.683
ating-0.681
icer-0.660
Token determined
Token ID5295
Feature activation+2.42e+01

Pos logit contributions
eds+0.470
ating+0.447
 is+0.445
w+0.441
 us+0.433
Neg logit contributions
 nervous-0.205
 curious-0.204
 thirsty-0.204
 hungry-0.204
 confused-0.191
Token to
Token ID284
Feature activation+3.55e+00

Pos logit contributions
 curious+1.134
 nervous+1.115
 determined+1.110
 hungry+1.092
 confused+1.082
Neg logit contributions
eds-1.545
w-1.458
ara-1.429
ating-1.424
of-1.383
Token still
Token ID991
Feature activation+9.43e+00

Pos logit contributions
 curious+0.194
 determined+0.193
 nervous+0.190
 confused+0.184
 hungry+0.182
Neg logit contributions
eds-0.207
w-0.194
ara-0.190
icer-0.188
ating-0.184
Token play
Token ID711
Feature activation-2.18e-01

Pos logit contributions
 curious+0.541
 determined+0.536
 nervous+0.532
 confused+0.517
 hungry+0.516
Neg logit contributions
eds-0.516
w-0.485
ara-0.472
ating-0.460
icer-0.459
Token fair
Token ID3148
Feature activation+4.41e+00

Pos logit contributions
eds+0.013
w+0.012
ara+0.012
ating+0.012
of+0.012
Neg logit contributions
 curious-0.011
 nervous-0.011
 determined-0.011
 hungry-0.011
 confused-0.011
Token.
Token ID13
Feature activation+1.46e+00

Pos logit contributions
 curious+0.234
 nervous+0.229
 determined+0.226
 hungry+0.221
 excited+0.220
Neg logit contributions
eds-0.259
w-0.248
ara-0.238
icer-0.235
ating-0.229
Token kept
Token ID4030
Feature activation+2.78e+00

Pos logit contributions
 curious+0.039
 determined+0.038
 nervous+0.038
 confused+0.037
 hungry+0.036
Neg logit contributions
eds-0.042
ating-0.037
ucks-0.037
w-0.036
icer-0.036
Token cleaning
Token ID12724
Feature activation+6.83e+00

Pos logit contributions
 curious+0.122
 nervous+0.122
 thirsty+0.118
 confused+0.116
 determined+0.116
Neg logit contributions
eds-0.168
w-0.154
ating-0.154
st-0.150
ma-0.147
Token.
Token ID13
Feature activation+2.54e+00

Pos logit contributions
 curious+0.329
 nervous+0.324
 determined+0.323
 hungry+0.316
 confused+0.314
Neg logit contributions
eds-0.431
w-0.406
ara-0.398
ating-0.393
of-0.384
Token She
Token ID1375
Feature activation+1.26e+01

Pos logit contributions
 determined+0.090
 curious+0.089
 nervous+0.086
 hungry+0.086
 confused+0.084
Neg logit contributions
eds-0.186
ara-0.183
w-0.175
icer-0.175
of-0.174
Token was
Token ID373
Feature activation-7.15e+00

Pos logit contributions
 curious+0.728
 nervous+0.724
 determined+0.722
 hungry+0.714
 confused+0.702
Neg logit contributions
eds-0.654
w-0.612
ara-0.602
ating-0.589
of-0.569
Token determined
Token ID5295
Feature activation+2.42e+01

Pos logit contributions
eds+0.424
ma+0.393
ating+0.391
 us+0.386
w+0.375
Neg logit contributions
 unsure-0.220
 nervous-0.216
 hungry-0.215
 determined-0.212
 confident-0.212
Token to
Token ID284
Feature activation+3.55e+00

Pos logit contributions
 curious+1.032
 nervous+1.013
 determined+1.012
 hungry+0.988
 confused+0.979
Neg logit contributions
eds-1.657
w-1.570
ara-1.542
ating-1.532
of-1.490
Token finish
Token ID5461
Feature activation+3.99e+00

Pos logit contributions
 curious+0.192
 nervous+0.189
 determined+0.189
 hungry+0.184
 confused+0.183
Neg logit contributions
eds-0.205
w-0.193
ara-0.189
ating-0.185
icer-0.181
Token the
Token ID262
Feature activation-2.65e+00

Pos logit contributions
 curious+0.190
 nervous+0.188
 determined+0.187
 hungry+0.183
 confused+0.182
Neg logit contributions
eds-0.252
w-0.237
ara-0.233
ating-0.233
icer-0.224
Token job
Token ID1693
Feature activation+1.75e+00

Pos logit contributions
eds+0.135
w+0.126
ara+0.122
ating+0.122
icer+0.117
Neg logit contributions
 curious-0.158
 nervous-0.156
 determined-0.156
 hungry-0.153
 confused-0.152
Token.
Token ID13
Feature activation+1.46e+00

Pos logit contributions
 curious+0.093
 determined+0.092
 nervous+0.092
 hungry+0.091
 confused+0.090
Neg logit contributions
eds-0.097
ating-0.090
w-0.090
ara-0.088
of-0.086
Token small
Token ID1402
Feature activation+7.32e+00

Pos logit contributions
 curious+0.310
 nervous+0.305
 determined+0.305
 hungry+0.299
 confused+0.297
Neg logit contributions
eds-0.369
w-0.346
ara-0.339
ating-0.337
of-0.327
Token towel
Token ID24808
Feature activation+3.94e+00

Pos logit contributions
 curious+0.428
 determined+0.424
 nervous+0.421
 hungry+0.408
 confused+0.407
Neg logit contributions
eds-0.390
w-0.376
ating-0.356
ara-0.356
icer-0.333
Token.
Token ID13
Feature activation+2.54e+00

Pos logit contributions
 curious+0.195
 nervous+0.193
 determined+0.191
 hungry+0.187
 confused+0.186
Neg logit contributions
eds-0.243
w-0.227
ara-0.223
ating-0.220
icer-0.217
Token He
Token ID679
Feature activation+1.16e+01

Pos logit contributions
 curious+0.114
 determined+0.113
 nervous+0.111
 hungry+0.109
 confused+0.108
Neg logit contributions
eds-0.165
ara-0.158
w-0.155
ating-0.151
of-0.151
Token was
Token ID373
Feature activation-7.15e+00

Pos logit contributions
 curious+0.617
 determined+0.612
 nervous+0.601
 confused+0.590
 excited+0.583
Neg logit contributions
eds-0.692
w-0.636
ating-0.627
icer-0.619
ara-0.619
Token determined
Token ID5295
Feature activation+2.42e+01

Pos logit contributions
eds+0.448
ating+0.437
ee+0.424
e+0.424
w+0.420
Neg logit contributions
 ashamed-0.205
 thirsty-0.202
 nervous-0.202
 hungry-0.198
 unsure-0.198
Token to
Token ID284
Feature activation+3.55e+00

Pos logit contributions
 curious+1.041
 nervous+1.020
 determined+1.017
 hungry+0.997
 confused+0.988
Neg logit contributions
eds-1.644
w-1.558
ara-1.527
ating-1.521
of-1.480
Token succeed
Token ID6758
Feature activation+6.69e+00

Pos logit contributions
 curious+0.202
 determined+0.200
 nervous+0.199
 hungry+0.193
 confused+0.193
Neg logit contributions
eds-0.196
w-0.183
ara-0.180
ating-0.175
icer-0.174
Token in
Token ID287
Feature activation-8.37e-01

Pos logit contributions
 curious+0.362
 determined+0.357
 nervous+0.357
 hungry+0.349
 confused+0.347
Neg logit contributions
eds-0.382
w-0.358
ara-0.350
ating-0.347
of-0.335
Token iron
Token ID6953
Feature activation+3.58e+00

Pos logit contributions
eds+0.053
w+0.050
ara+0.049
ating+0.049
of+0.047
Neg logit contributions
 curious-0.040
 nervous-0.039
 determined-0.039
 hungry-0.038
 confused-0.038
Tokening
Token ID278
Feature activation-1.21e+01

Pos logit contributions
 curious+0.195
 nervous+0.192
 determined+0.192
 hungry+0.187
 confused+0.187
Neg logit contributions
eds-0.203
w-0.190
ara-0.187
ating-0.183
of-0.178
Token and
Token ID290
Feature activation+3.24e+00

Pos logit contributions
 curious+0.063
 determined+0.062
 nervous+0.062
 sorry+0.060
 confused+0.060
Neg logit contributions
eds-0.068
w-0.063
ating-0.062
ara-0.061
st-0.061
Token started
Token ID2067
Feature activation+6.48e+00

Pos logit contributions
 curious+0.144
 nervous+0.143
 determined+0.143
 sorry+0.136
 hungry+0.136
Neg logit contributions
eds-0.203
w-0.194
ara-0.189
ating-0.188
st-0.183
Token building
Token ID2615
Feature activation+6.85e+00

Pos logit contributions
 curious+0.312
 nervous+0.312
 unsure+0.304
 sorry+0.303
 determined+0.301
Neg logit contributions
eds-0.375
st-0.333
ating-0.329
w-0.319
ee-0.310
Token again
Token ID757
Feature activation-4.23e+00

Pos logit contributions
 curious+0.364
 determined+0.362
 nervous+0.357
 hungry+0.352
 confused+0.348
Neg logit contributions
eds-0.387
ating-0.363
w-0.351
ara-0.349
icer-0.341
Token,
Token ID11
Feature activation+4.09e+00

Pos logit contributions
eds+0.226
ating+0.206
w+0.206
ara+0.203
st+0.197
Neg logit contributions
 curious-0.240
 determined-0.238
 nervous-0.238
 confused-0.232
 hungry-0.232
Token determined
Token ID5295
Feature activation+2.42e+01

Pos logit contributions
 curious+0.163
 nervous+0.162
 hungry+0.160
 sorry+0.156
 thirsty+0.154
Neg logit contributions
eds-0.250
ating-0.236
 is-0.229
st-0.227
w-0.226
Token to
Token ID284
Feature activation+3.55e+00

Pos logit contributions
 curious+1.035
 nervous+1.016
 determined+1.015
 hungry+0.990
 confused+0.982
Neg logit contributions
eds-1.654
w-1.565
ara-1.537
ating-1.529
of-1.485
Token make
Token ID787
Feature activation-3.46e+00

Pos logit contributions
 curious+0.208
 determined+0.205
 nervous+0.205
 hungry+0.201
 confused+0.200
Neg logit contributions
eds-0.187
w-0.174
ara-0.170
ating-0.168
of-0.162
Token an
Token ID281
Feature activation+2.04e+00

Pos logit contributions
eds+0.211
ating+0.195
w+0.193
ara+0.191
ee+0.186
Neg logit contributions
 curious-0.168
 determined-0.167
 nervous-0.166
 hungry-0.164
 confused-0.162
Token even
Token ID772
Feature activation+7.72e+00

Pos logit contributions
 hungry+0.105
 curious+0.105
 nervous+0.105
 determined+0.105
 confused+0.102
Neg logit contributions
eds-0.114
ating-0.105
ara-0.102
icer-0.101
ee-0.101
Token longer
Token ID2392
Feature activation+7.07e+00

Pos logit contributions
 determined+0.365
 curious+0.363
 hungry+0.356
 nervous+0.356
 confused+0.350
Neg logit contributions
eds-0.427
ating-0.412
w-0.407
st-0.404
of-0.392
Token\n
Token ID198
Feature activation-5.29e-01

Pos logit contributions
eds+0.182
ara+0.174
w+0.172
of+0.166
ating+0.165
Neg logit contributions
 curious-0.169
 determined-0.168
 nervous-0.166
 hungry-0.163
 confused-0.162
Token\n
Token ID198
Feature activation-8.01e-01

Pos logit contributions
eds+0.035
w+0.033
ating+0.033
ators+0.031
icer+0.031
Neg logit contributions
 curious-0.025
 determined-0.024
 excited-0.023
 nervous-0.023
 hungry-0.023
TokenMax
Token ID11518
Feature activation-4.37e+00

Pos logit contributions
eds+0.041
w+0.037
ating+0.037
icer+0.036
ara+0.035
Neg logit contributions
 determined-0.049
 curious-0.049
 nervous-0.047
 excited-0.047
 hungry-0.046
Token nodded
Token ID14464
Feature activation+5.96e+00

Pos logit contributions
eds+0.247
w+0.231
ara+0.226
ating+0.224
of+0.216
Neg logit contributions
 curious-0.239
 nervous-0.236
 determined-0.236
 hungry-0.231
 confused-0.229
Token,
Token ID11
Feature activation+3.95e+00

Pos logit contributions
 thirsty+0.338
 determined+0.328
 hungry+0.327
 nervous+0.326
 curious+0.318
Neg logit contributions
eds-0.280
ating-0.268
ara-0.257
of-0.257
en-0.255
Token determined
Token ID5295
Feature activation+2.40e+01

Pos logit contributions
 thirsty+0.162
 bored+0.161
 nervous+0.158
 hungry+0.157
 sorry+0.153
Neg logit contributions
of-0.185
st-0.180
eds-0.178
ara-0.177
ma-0.175
Token to
Token ID284
Feature activation+3.44e+00

Pos logit contributions
 curious+1.180
 nervous+1.170
 hungry+1.162
 determined+1.146
 thirsty+1.139
Neg logit contributions
eds-1.461
w-1.389
ating-1.376
ara-1.347
of-1.338
Token get
Token ID651
Feature activation-1.54e+00

Pos logit contributions
 curious+0.193
 determined+0.191
 nervous+0.190
 hungry+0.185
 confused+0.185
Neg logit contributions
eds-0.191
w-0.179
ara-0.175
ating-0.172
icer-0.168
Token better
Token ID1365
Feature activation+7.77e+00

Pos logit contributions
eds+0.089
w+0.084
ating+0.084
st+0.082
of+0.079
Neg logit contributions
 curious-0.073
 nervous-0.071
 determined-0.071
 hungry-0.070
 confused-0.069
Token.
Token ID13
Feature activation+1.46e+00

Pos logit contributions
 curious+0.445
 nervous+0.439
 hungry+0.436
 determined+0.435
 thirsty+0.432
Neg logit contributions
eds-0.391
ating-0.366
w-0.358
st-0.355
ara-0.346
Token He
Token ID679
Feature activation+1.06e+01

Pos logit contributions
 curious+0.056
 hungry+0.055
 thirsty+0.055
 nervous+0.055
 determined+0.055
Neg logit contributions
ara-0.098
eds-0.096
of-0.093
icer-0.090
st-0.089
Token Lucy
Token ID22162
Feature activation+2.96e+00

Pos logit contributions
 curious+0.324
 nervous+0.320
 determined+0.315
 confused+0.304
 hungry+0.298
Neg logit contributions
eds-0.803
w-0.800
ara-0.793
of-0.765
ating-0.765
Token kept
Token ID4030
Feature activation+2.18e+00

Pos logit contributions
 curious+0.171
 determined+0.169
 nervous+0.168
 hungry+0.165
 confused+0.164
Neg logit contributions
eds-0.158
w-0.147
ara-0.143
ating-0.143
icer-0.137
Token trying
Token ID2111
Feature activation+1.73e+01

Pos logit contributions
 curious+0.096
 nervous+0.095
 confused+0.091
 determined+0.091
 unsure+0.090
Neg logit contributions
eds-0.137
w-0.125
ating-0.124
st-0.121
ma-0.120
Token and
Token ID290
Feature activation+2.28e+00

Pos logit contributions
 curious+0.869
 nervous+0.860
 determined+0.855
 hungry+0.840
 confused+0.836
Neg logit contributions
eds-1.053
w-0.979
ara-0.961
ating-0.959
of-0.933
Token was
Token ID373
Feature activation-7.30e+00

Pos logit contributions
 curious+0.105
 nervous+0.104
 hungry+0.101
 sorry+0.099
 confused+0.099
Neg logit contributions
eds-0.138
w-0.135
ating-0.134
st-0.133
of-0.129
Token determined
Token ID5295
Feature activation+2.40e+01

Pos logit contributions
eds+0.482
w+0.462
ating+0.460
st+0.447
ma+0.441
Neg logit contributions
 curious-0.229
 nervous-0.227
 hungry-0.223
 thirsty-0.217
 confused-0.216
Token to
Token ID284
Feature activation+3.44e+00

Pos logit contributions
 curious+1.176
 determined+1.158
 nervous+1.158
 hungry+1.131
 confused+1.123
Neg logit contributions
eds-1.498
w-1.410
ara-1.382
ating-1.372
of-1.329
Token increase
Token ID2620
Feature activation+1.55e+00

Pos logit contributions
 curious+0.197
 determined+0.194
 nervous+0.194
 hungry+0.190
 confused+0.189
Neg logit contributions
eds-0.187
w-0.174
ara-0.170
ating-0.168
icer-0.163
Token her
Token ID607
Feature activation+1.41e+00

Pos logit contributions
 curious+0.076
 nervous+0.074
 determined+0.074
 hungry+0.073
 confused+0.072
Neg logit contributions
eds-0.097
w-0.091
ara-0.089
ating-0.089
of-0.086
Token knowledge
Token ID3725
Feature activation-1.38e+00

Pos logit contributions
 curious+0.074
 nervous+0.074
 determined+0.073
 hungry+0.072
 confused+0.072
Neg logit contributions
eds-0.080
w-0.075
ating-0.073
ara-0.073
of-0.070
Token.
Token ID13
Feature activation+1.36e+00

Pos logit contributions
eds+0.080
w+0.074
ara+0.073
ating+0.072
of+0.070
Neg logit contributions
 curious-0.074
 nervous-0.073
 determined-0.073
 hungry-0.071
 confused-0.071
Token a
Token ID257
Feature activation+5.41e+00

Pos logit contributions
 curious+0.165
 determined+0.164
 nervous+0.164
 confused+0.160
 hungry+0.160
Neg logit contributions
eds-0.249
w-0.235
ating-0.230
ara-0.230
of-0.224
Token promise
Token ID6991
Feature activation-1.03e+00

Pos logit contributions
 curious+0.250
 nervous+0.250
 determined+0.249
 confused+0.244
 unsure+0.243
Neg logit contributions
eds-0.329
ara-0.303
w-0.301
of-0.294
ee-0.290
Token.
Token ID13
Feature activation+2.36e+00

Pos logit contributions
eds+0.065
w+0.061
ara+0.060
ating+0.059
of+0.058
Neg logit contributions
 curious-0.048
 nervous-0.048
 determined-0.047
 hungry-0.047
 confused-0.047
Token She
Token ID1375
Feature activation+1.24e+01

Pos logit contributions
 curious+0.107
 determined+0.107
 nervous+0.106
 confused+0.103
 hungry+0.103
Neg logit contributions
eds-0.150
ara-0.144
w-0.140
ating-0.138
of-0.137
Token was
Token ID373
Feature activation-7.30e+00

Pos logit contributions
 curious+0.673
 determined+0.665
 nervous+0.656
 confused+0.642
 hungry+0.642
Neg logit contributions
eds-0.723
w-0.667
ating-0.654
ara-0.647
icer-0.641
Token determined
Token ID5295
Feature activation+2.40e+01

Pos logit contributions
eds+0.516
w+0.484
ma+0.482
ating+0.481
e+0.480
Neg logit contributions
 nervous-0.193
 curious-0.187
 unsure-0.181
 confused-0.178
 hungry-0.178
Token to
Token ID284
Feature activation+3.44e+00

Pos logit contributions
 curious+1.042
 nervous+1.022
 determined+1.015
 hungry+0.997
 confused+0.990
Neg logit contributions
eds-1.628
w-1.544
ara-1.513
ating-1.506
of-1.468
Token find
Token ID1064
Feature activation+3.31e-01

Pos logit contributions
 curious+0.196
 determined+0.194
 nervous+0.192
 hungry+0.188
 confused+0.187
Neg logit contributions
eds-0.189
w-0.177
ara-0.173
ating-0.169
icer-0.167
Token Sarah
Token ID10490
Feature activation+2.15e+00

Pos logit contributions
 curious+0.015
 determined+0.015
 nervous+0.015
 hungry+0.014
 confused+0.014
Neg logit contributions
eds-0.022
w-0.021
ara-0.020
ating-0.020
of-0.020
Token and
Token ID290
Feature activation+1.55e+00

Pos logit contributions
 curious+0.129
 nervous+0.127
 determined+0.127
 hungry+0.125
 confused+0.124
Neg logit contributions
eds-0.111
w-0.103
ara-0.101
ating-0.100
of-0.096
Token deliver
Token ID5203
Feature activation-1.93e+00

Pos logit contributions
 nervous+0.074
 curious+0.072
 hungry+0.072
 confused+0.071
 determined+0.071
Neg logit contributions
eds-0.093
w-0.089
ating-0.087
ara-0.086
of-0.085
Token on
Token ID319
Feature activation-3.54e-02

Pos logit contributions
 curious+0.237
 determined+0.236
 nervous+0.234
 hungry+0.229
 confused+0.228
Neg logit contributions
eds-0.239
w-0.228
ara-0.220
ating-0.219
st-0.212
Token her
Token ID607
Feature activation+2.41e+00

Pos logit contributions
eds+0.002
w+0.002
ating+0.002
ara+0.002
of+0.002
Neg logit contributions
 curious-0.002
 nervous-0.002
 determined-0.002
 hungry-0.002
 confused-0.002
Token head
Token ID1182
Feature activation+3.14e+00

Pos logit contributions
 curious+0.127
 nervous+0.126
 determined+0.126
 hungry+0.123
 confused+0.122
Neg logit contributions
eds-0.140
w-0.130
ara-0.128
ating-0.126
of-0.123
Token and
Token ID290
Feature activation+2.28e+00

Pos logit contributions
 curious+0.171
 nervous+0.169
 determined+0.168
 hungry+0.165
 confused+0.164
Neg logit contributions
eds-0.177
w-0.165
ara-0.162
ating-0.160
icer-0.155
Token was
Token ID373
Feature activation-7.30e+00

Pos logit contributions
 determined+0.104
 nervous+0.103
 curious+0.101
 sorry+0.100
 confused+0.099
Neg logit contributions
eds-0.130
w-0.129
ara-0.125
of-0.121
st-0.121
Token determined
Token ID5295
Feature activation+2.40e+01

Pos logit contributions
eds+0.504
w+0.468
ee+0.464
ating+0.463
ara+0.451
Neg logit contributions
 nervous-0.227
 determined-0.224
 unsure-0.224
 confused-0.221
 curious-0.221
Token to
Token ID284
Feature activation+3.44e+00

Pos logit contributions
 curious+1.054
 nervous+1.036
 determined+1.030
 hungry+1.011
 confused+1.005
Neg logit contributions
eds-1.613
w-1.527
ara-1.494
ating-1.489
of-1.452
Token succeed
Token ID6758
Feature activation+6.57e+00

Pos logit contributions
 curious+0.194
 determined+0.192
 nervous+0.190
 confused+0.184
 hungry+0.184
Neg logit contributions
eds-0.194
w-0.181
ara-0.178
ating-0.174
icer-0.172
Token.
Token ID13
Feature activation+1.54e+00

Pos logit contributions
 curious+0.348
 determined+0.343
 nervous+0.341
 hungry+0.333
 confused+0.330
Neg logit contributions
eds-0.386
w-0.362
ara-0.361
icer-0.347
ating-0.346
Token She
Token ID1375
Feature activation+1.17e+01

Pos logit contributions
 curious+0.073
 determined+0.072
 nervous+0.072
 hungry+0.071
 confused+0.070
Neg logit contributions
eds-0.095
ara-0.091
w-0.089
of-0.088
ating-0.086
Token went
Token ID1816
Feature activation+1.60e+00

Pos logit contributions
 determined+0.599
 curious+0.599
 nervous+0.584
 confused+0.572
 excited+0.563
Neg logit contributions
eds-0.721
w-0.662
ara-0.650
ating-0.645
icer-0.639
Token.
Token ID13
Feature activation+2.54e+00

Pos logit contributions
 curious+0.519
 determined+0.513
 nervous+0.512
 hungry+0.501
 confused+0.499
Neg logit contributions
eds-0.561
w-0.524
ating-0.515
ara-0.515
of-0.492
Token\n
Token ID198
Feature activation-9.87e-01

Pos logit contributions
 determined+0.109
 curious+0.109
 nervous+0.107
 hungry+0.105
 confused+0.103
Neg logit contributions
eds-0.165
ara-0.162
w-0.155
of-0.153
ating-0.151
Token\n
Token ID198
Feature activation-1.16e+00

Pos logit contributions
eds+0.064
w+0.062
ating+0.061
ators+0.057
icer+0.057
Neg logit contributions
 curious-0.047
 determined-0.046
 nervous-0.045
 excited-0.044
 hungry-0.043
TokenSam
Token ID16305
Feature activation-4.29e+00

Pos logit contributions
eds+0.058
w+0.053
ating+0.052
ara+0.050
icer+0.049
Neg logit contributions
 curious-0.073
 determined-0.073
 nervous-0.071
 hungry-0.070
 excited-0.069
Token was
Token ID373
Feature activation-7.44e+00

Pos logit contributions
eds+0.260
w+0.242
ating+0.236
ara+0.231
icer+0.231
Neg logit contributions
 curious-0.225
 determined-0.221
 nervous-0.218
 confused-0.217
 hungry-0.212
Token determined
Token ID5295
Feature activation+2.39e+01

Pos logit contributions
eds+0.490
e+0.484
ating+0.470
 is+0.459
ma+0.457
Neg logit contributions
 nervous-0.163
 hungry-0.163
 thirsty-0.157
 unsure-0.155
 curious-0.155
Token.
Token ID13
Feature activation+1.66e+00

Pos logit contributions
 curious+1.129
 nervous+1.106
 determined+1.096
 hungry+1.086
 confused+1.075
Neg logit contributions
eds-1.512
w-1.433
ara-1.404
ating-1.401
of-1.363
Token He
Token ID679
Feature activation+1.08e+01

Pos logit contributions
 hungry+0.056
 thirsty+0.056
 determined+0.055
 curious+0.055
 nervous+0.054
Neg logit contributions
ara-0.116
eds-0.115
of-0.111
icer-0.110
ating-0.108
Token threw
Token ID9617
Feature activation+2.33e+00

Pos logit contributions
 curious+0.525
 nervous+0.521
 determined+0.515
 hungry+0.510
 confused+0.503
Neg logit contributions
eds-0.661
w-0.626
ara-0.613
ating-0.606
of-0.591
Token on
Token ID319
Feature activation-1.49e+00

Pos logit contributions
 curious+0.112
 nervous+0.110
 determined+0.110
 hungry+0.108
 confused+0.107
Neg logit contributions
eds-0.148
w-0.139
ara-0.137
ating-0.136
of-0.131
Token more
Token ID517
Feature activation+2.06e+00

Pos logit contributions
eds+0.084
w+0.079
ara+0.077
ating+0.077
of+0.075
Neg logit contributions
 curious-0.080
 nervous-0.080
 determined-0.080
 hungry-0.078
 confused-0.078
Token frustrated
Token ID14718
Feature activation+1.80e+01

Pos logit contributions
 determined+0.167
 curious+0.161
 nervous+0.139
 hungry+0.138
 sorry+0.137
Neg logit contributions
w-0.365
ee-0.356
st-0.356
ara-0.354
eds-0.352
Token,
Token ID11
Feature activation+4.27e+00

Pos logit contributions
 curious+1.004
 determined+0.999
 nervous+0.978
 hungry+0.977
 confused+0.962
Neg logit contributions
eds-0.927
ating-0.886
ara-0.884
w-0.876
st-0.872
Token but
Token ID475
Feature activation+9.60e+00

Pos logit contributions
 sorry+0.173
 curious+0.172
 hungry+0.167
 nervous+0.164
 determined+0.163
Neg logit contributions
eds-0.235
st-0.233
of-0.231
w-0.229
ara-0.228
Token she
Token ID673
Feature activation+1.37e+01

Pos logit contributions
 nervous+0.371
 curious+0.368
 hungry+0.356
 determined+0.350
 confused+0.346
Neg logit contributions
w-0.612
ara-0.604
eds-0.592
of-0.588
ating-0.587
Token was
Token ID373
Feature activation-7.44e+00

Pos logit contributions
 curious+0.725
 nervous+0.720
 determined+0.711
 hungry+0.708
 confused+0.697
Neg logit contributions
eds-0.771
w-0.736
ara-0.724
ating-0.709
of-0.691
Token determined
Token ID5295
Feature activation+2.39e+01

Pos logit contributions
eds+0.421
ma+0.420
e+0.410
ating+0.409
st+0.401
Neg logit contributions
 nervous-0.224
 hungry-0.215
 curious-0.211
 floppy-0.205
 thirsty-0.205
Token to
Token ID284
Feature activation+3.32e+00

Pos logit contributions
 curious+1.186
 nervous+1.169
 determined+1.152
 hungry+1.148
 confused+1.136
Neg logit contributions
eds-1.437
w-1.363
ara-1.336
ating-1.335
of-1.299
Token find
Token ID1064
Feature activation+2.14e-01

Pos logit contributions
 curious+0.190
 determined+0.188
 nervous+0.188
 hungry+0.184
 confused+0.183
Neg logit contributions
eds-0.179
w-0.167
ara-0.163
ating-0.162
of-0.156
Token a
Token ID257
Feature activation+4.05e+00

Pos logit contributions
 curious+0.010
 determined+0.010
 nervous+0.010
 hungry+0.010
 confused+0.009
Neg logit contributions
eds-0.014
w-0.013
ara-0.013
ating-0.013
of-0.012
Token way
Token ID835
Feature activation+2.30e+00

Pos logit contributions
 curious+0.229
 nervous+0.229
 determined+0.227
 confused+0.226
 unsure+0.223
Neg logit contributions
eds-0.209
w-0.188
st-0.188
ara-0.187
of-0.182
Token to
Token ID284
Feature activation+2.95e+00

Pos logit contributions
 curious+0.121
 nervous+0.119
 hungry+0.118
 determined+0.118
 confused+0.117
Neg logit contributions
eds-0.134
w-0.119
of-0.118
st-0.118
ating-0.116
Token clown
Token ID25573
Feature activation-5.52e-01

Pos logit contributions
eds+0.278
w+0.262
ara+0.256
ating+0.253
icer+0.245
Neg logit contributions
 curious-0.246
 determined-0.245
 nervous-0.244
 hungry-0.238
 confused-0.237
Tokens
Token ID82
Feature activation-1.37e+01

Pos logit contributions
eds+0.031
w+0.029
ara+0.029
ating+0.028
icer+0.027
Neg logit contributions
 curious-0.030
 nervous-0.030
 determined-0.030
 hungry-0.029
 confused-0.029
Token and
Token ID290
Feature activation-3.19e-01

Pos logit contributions
eds+0.721
ating+0.664
w+0.662
ara+0.638
st+0.615
Neg logit contributions
 determined-0.743
 nervous-0.738
 curious-0.727
 unsure-0.712
 hungry-0.706
Token the
Token ID262
Feature activation-4.72e+00

Pos logit contributions
eds+0.017
ara+0.015
w+0.015
of+0.015
ating+0.015
Neg logit contributions
 determined-0.017
 nervous-0.017
 curious-0.017
 unsure-0.016
 hungry-0.016
Token ac
Token ID936
Feature activation-7.39e+00

Pos logit contributions
eds+0.268
w+0.250
ara+0.245
ating+0.243
of+0.235
Neg logit contributions
 curious-0.258
 nervous-0.254
 determined-0.254
 hungry-0.249
 confused-0.247
Tokenrob
Token ID22609
Feature activation-2.71e+01

Pos logit contributions
 occasions+0.205
ac+0.204
eds+0.201
ators+0.184
anish+0.183
Neg logit contributions
 hungry-0.525
 curious-0.511
 nervous-0.503
 determined-0.500
 thirsty-0.497
Tokenats
Token ID1381
Feature activation-1.94e+01

Pos logit contributions
eds+0.495
ara+0.475
of+0.399
st+0.398
ators+0.370
Neg logit contributions
 determined-2.392
 curious-2.363
 hungry-2.355
 nervous-2.320
 excited-2.310
Token and
Token ID290
Feature activation-3.24e-01

Pos logit contributions
eds+1.045
w+0.968
ara+0.947
ating+0.946
of+0.926
Neg logit contributions
 curious-1.088
 determined-1.084
 nervous-1.068
 hungry-1.048
 confused-1.038
Token the
Token ID262
Feature activation-4.73e+00

Pos logit contributions
eds+0.018
w+0.016
ara+0.016
ating+0.016
of+0.016
Neg logit contributions
 determined-0.018
 curious-0.018
 nervous-0.017
 unsure-0.017
 hungry-0.017
Token lions
Token ID31420
Feature activation-6.82e+00

Pos logit contributions
eds+0.271
w+0.254
ara+0.248
ating+0.248
of+0.238
Neg logit contributions
 curious-0.257
 nervous-0.252
 determined-0.251
 hungry-0.248
 confused-0.246
Token and
Token ID290
Feature activation-3.13e-01

Pos logit contributions
eds+0.382
w+0.357
ara+0.350
ating+0.346
of+0.334
Neg logit contributions
 curious-0.375
 nervous-0.370
 determined-0.370
 hungry-0.363
 confused-0.360
Token elephants
Token ID25368
Feature activation-6.11e+00

Pos logit contributions
 determined+0.075
 nervous+0.073
 curious+0.072
 unsure+0.071
 confused+0.070
Neg logit contributions
eds-0.090
w-0.084
ara-0.082
of-0.081
ating-0.079
Token,
Token ID11
Feature activation+1.51e+00

Pos logit contributions
eds+0.389
w+0.367
ara+0.360
ating+0.357
of+0.346
Neg logit contributions
 curious-0.290
 nervous-0.286
 determined-0.285
 hungry-0.279
 confused-0.277
Token lions
Token ID31420
Feature activation-6.82e+00

Pos logit contributions
 curious+0.077
 nervous+0.075
 determined+0.075
 hungry+0.074
 confused+0.073
Neg logit contributions
eds-0.091
w-0.086
ara-0.084
ating-0.084
icer-0.081
Token and
Token ID290
Feature activation-3.01e-01

Pos logit contributions
eds+0.440
w+0.416
ara+0.407
ating+0.404
of+0.392
Neg logit contributions
 curious-0.319
 nervous-0.312
 determined-0.312
 hungry-0.306
 confused-0.303
Token ac
Token ID936
Feature activation-7.37e+00

Pos logit contributions
eds+0.016
w+0.015
ara+0.014
of+0.014
ating+0.014
Neg logit contributions
 determined-0.017
 nervous-0.017
 curious-0.017
 confused-0.016
 unsure-0.016
Tokenrob
Token ID22609
Feature activation-2.71e+01

Pos logit contributions
ac+0.206
eds+0.199
ators+0.186
ara+0.186
anish+0.180
Neg logit contributions
 hungry-0.536
 curious-0.526
 excited-0.504
 thirsty-0.500
 nervous-0.498
Tokenats
Token ID1381
Feature activation-1.94e+01

Pos logit contributions
eds+0.547
ara+0.518
st+0.477
ators+0.444
of+0.427
Neg logit contributions
 determined-2.321
 curious-2.304
 hungry-2.280
 excited-2.256
 nervous-2.249
Token.
Token ID13
Feature activation-3.65e-01

Pos logit contributions
eds+1.149
w+1.097
ara+1.073
ating+1.054
icer+1.035
Neg logit contributions
 curious-1.005
 nervous-0.988
 hungry-0.973
 determined-0.964
 confused-0.957
Token They
Token ID1119
Feature activation+4.78e+00

Pos logit contributions
ara+0.019
eds+0.017
of+0.017
 is+0.017
ucks+0.016
Neg logit contributions
 determined-0.021
 unsure-0.019
 curious-0.019
 nervous-0.018
 hungry-0.018
Token are
Token ID389
Feature activation-1.33e+01

Pos logit contributions
 determined+0.274
 curious+0.274
 nervous+0.273
 hungry+0.268
 confused+0.265
Neg logit contributions
eds-0.248
w-0.231
ara-0.227
of-0.222
ating-0.222
Token amazed
Token ID24562
Feature activation+1.44e+01

Pos logit contributions
eds+0.838
 us+0.776
 is+0.760
ating+0.749
ile+0.724
Neg logit contributions
 unsure-0.455
 determined-0.454
 confident-0.421
 ashamed-0.404
 nervous-0.402
Token of
Token ID286
Feature activation-7.03e+00

Pos logit contributions
eds+0.260
w+0.241
ara+0.235
ating+0.233
icer+0.223
Neg logit contributions
 curious-0.314
 nervous-0.310
 determined-0.309
 hungry-0.304
 confused-0.302
Token fun
Token ID1257
Feature activation+7.30e+00

Pos logit contributions
eds+0.532
w+0.509
ara+0.502
ating+0.498
of+0.486
Neg logit contributions
 curious-0.244
 nervous-0.241
 determined-0.240
 hungry-0.233
 confused-0.232
Token watching
Token ID4964
Feature activation+4.28e+00

Pos logit contributions
 curious+0.385
 determined+0.378
 nervous+0.376
 hungry+0.371
 confused+0.364
Neg logit contributions
eds-0.431
w-0.406
ara-0.400
of-0.386
ating-0.384
Token the
Token ID262
Feature activation-4.66e+00

Pos logit contributions
 curious+0.218
 nervous+0.215
 determined+0.212
 hungry+0.210
 confused+0.208
Neg logit contributions
eds-0.259
w-0.246
ara-0.239
ating-0.236
icer-0.229
Token ac
Token ID936
Feature activation-7.34e+00

Pos logit contributions
eds+0.284
w+0.256
ating+0.249
ara+0.243
icer+0.241
Neg logit contributions
 curious-0.260
 hungry-0.252
 nervous-0.247
 excited-0.241
 determined-0.239
Tokenrob
Token ID22609
Feature activation-2.70e+01

Pos logit contributions
eds+0.234
ac+0.229
anish+0.206
 occasions+0.203
ators+0.200
Neg logit contributions
 hungry-0.521
 nervous-0.506
 confident-0.500
 curious-0.497
 determined-0.494
Tokenats
Token ID1381
Feature activation-1.93e+01

Pos logit contributions
eds+0.600
ara+0.542
w+0.474
ators+0.447
st+0.440
Neg logit contributions
 determined-2.324
 hungry-2.297
 curious-2.275
 nervous-2.256
 excited-2.241
Token and
Token ID290
Feature activation-2.48e-01

Pos logit contributions
eds+1.112
w+1.043
ara+1.021
ating+1.011
icer+0.976
Neg logit contributions
 curious-1.032
 nervous-1.019
 determined-1.016
 hungry-0.997
 confused-0.990
Token clown
Token ID25573
Feature activation-4.47e-01

Pos logit contributions
eds+0.013
w+0.012
ara+0.012
ating+0.012
of+0.012
Neg logit contributions
 determined-0.014
 curious-0.014
 nervous-0.014
 confused-0.013
 hungry-0.013
Tokens
Token ID82
Feature activation-1.36e+01

Pos logit contributions
eds+0.018
w+0.017
ara+0.016
ating+0.016
icer+0.016
Neg logit contributions
 curious-0.031
 nervous-0.031
 hungry-0.031
 determined-0.031
 confused-0.030
Token.
Token ID13
Feature activation-3.13e-01

Pos logit contributions
eds+0.730
w+0.674
ating+0.672
ara+0.664
of+0.644
Neg logit contributions
 curious-0.749
 determined-0.747
 nervous-0.742
 hungry-0.724
 confused-0.724
Token clown
Token ID25573
Feature activation-3.28e-01

Pos logit contributions
eds+0.167
w+0.158
ara+0.156
of+0.154
ating+0.152
Neg logit contributions
 determined-0.135
 nervous-0.130
 unsure-0.127
 hungry-0.127
 curious-0.126
Tokens
Token ID82
Feature activation-1.35e+01

Pos logit contributions
eds+0.012
w+0.011
ara+0.011
ating+0.011
icer+0.010
Neg logit contributions
 curious-0.024
 nervous-0.024
 determined-0.024
 hungry-0.024
 confused-0.023
Token juggling
Token ID48097
Feature activation+2.92e+00

Pos logit contributions
eds+0.816
w+0.763
ara+0.759
ating+0.753
of+0.731
Neg logit contributions
 determined-0.663
 curious-0.661
 nervous-0.658
 hungry-0.641
 confused-0.636
Token and
Token ID290
Feature activation-1.55e-01

Pos logit contributions
 curious+0.144
 nervous+0.141
 determined+0.139
 hungry+0.138
 confused+0.136
Neg logit contributions
eds-0.184
w-0.173
ara-0.170
ating-0.170
icer-0.164
Token ac
Token ID936
Feature activation-7.26e+00

Pos logit contributions
eds+0.010
w+0.009
ara+0.009
of+0.009
ating+0.009
Neg logit contributions
 determined-0.007
 nervous-0.007
 curious-0.007
 sorry-0.007
 confused-0.007
Tokenrob
Token ID22609
Feature activation-2.70e+01

Pos logit contributions
eds+0.196
ara+0.185
 occasions+0.177
ators+0.171
ac+0.168
Neg logit contributions
 hungry-0.537
 determined-0.536
 curious-0.533
 excited-0.532
 nervous-0.527
Tokenats
Token ID1381
Feature activation-1.93e+01

Pos logit contributions
eds+0.597
ara+0.542
st+0.469
of+0.466
w+0.440
Neg logit contributions
 determined-2.320
 curious-2.280
 hungry-2.256
 nervous-2.249
 excited-2.238
Token doing
Token ID1804
Feature activation+3.08e+00

Pos logit contributions
eds+0.991
w+0.920
ating+0.906
of+0.897
st+0.868
Neg logit contributions
 determined-1.083
 curious-1.082
 nervous-1.059
 hungry-1.042
 sorry-1.037
Token flips
Token ID45971
Feature activation-2.77e+00

Pos logit contributions
 curious+0.149
 nervous+0.148
 determined+0.147
 hungry+0.144
 confused+0.144
Neg logit contributions
eds-0.187
ating-0.172
w-0.171
ara-0.169
of-0.168
Token.
Token ID13
Feature activation-3.18e-01

Pos logit contributions
eds+0.157
w+0.146
ating+0.144
ara+0.143
of+0.142
Neg logit contributions
 determined-0.146
 curious-0.146
 nervous-0.143
 hungry-0.141
 confused-0.140
Token But
Token ID887
Feature activation+7.36e+00

Pos logit contributions
eds+0.019
ara+0.019
w+0.018
of+0.018
ating+0.017
Neg logit contributions
 determined-0.016
 curious-0.016
 nervous-0.015
 hungry-0.015
 confused-0.015
Token thought
Token ID1807
Feature activation+8.07e+00

Pos logit contributions
eds+0.020
w+0.019
ara+0.018
ating+0.018
icer+0.018
Neg logit contributions
 curious-0.020
 determined-0.020
 nervous-0.020
 confused-0.019
 hungry-0.019
Token they
Token ID484
Feature activation+6.04e+00

Pos logit contributions
 curious+0.412
 hungry+0.408
 determined+0.405
 nervous+0.404
 thirsty+0.397
Neg logit contributions
eds-0.460
ating-0.428
ara-0.428
w-0.426
of-0.424
Token looked
Token ID3114
Feature activation+5.05e+00

Pos logit contributions
 curious+0.355
 determined+0.355
 nervous+0.353
 hungry+0.352
 sorry+0.346
Neg logit contributions
eds-0.303
w-0.287
ating-0.274
ara-0.273
of-0.271
Token like
Token ID588
Feature activation+5.57e+00

Pos logit contributions
 determined+0.192
 thirsty+0.184
 sorry+0.184
 hungry+0.181
 nervous+0.179
Neg logit contributions
eds-0.300
 is-0.291
 us-0.284
st-0.284
of-0.279
Token ac
Token ID936
Feature activation-7.13e+00

Pos logit contributions
 determined+0.253
 confused+0.250
 unsure+0.249
 nervous+0.248
 curious+0.245
Neg logit contributions
eds-0.329
ara-0.318
w-0.311
ating-0.310
of-0.310
Tokenrob
Token ID22609
Feature activation-2.68e+01

Pos logit contributions
eds+0.195
ara+0.168
ators+0.168
 occasions+0.165
ucks+0.158
Neg logit contributions
 nervous-0.552
 hungry-0.550
 curious-0.542
 determined-0.539
 confident-0.538
Tokenats
Token ID1381
Feature activation-1.91e+01

Pos logit contributions
eds+0.655
ara+0.564
ators+0.504
w+0.490
of+0.489
Neg logit contributions
 determined-2.261
 curious-2.228
 hungry-2.225
 nervous-2.216
 excited-2.181
Token.
Token ID13
Feature activation-1.19e-01

Pos logit contributions
eds+1.132
w+1.069
ating+1.042
ara+1.039
of+1.028
Neg logit contributions
 curious-0.962
 determined-0.958
 nervous-0.936
 hungry-0.921
 confused-0.917
Token She
Token ID1375
Feature activation+1.01e+01

Pos logit contributions
ara+0.008
eds+0.008
of+0.007
ating+0.007
 is+0.007
Neg logit contributions
 determined-0.005
 hungry-0.005
 curious-0.004
 thirsty-0.004
 sorry-0.004
Token enjoyed
Token ID8359
Feature activation-1.25e+00

Pos logit contributions
 curious+0.483
 nervous+0.476
 determined+0.475
 hungry+0.466
 confused+0.461
Neg logit contributions
eds-0.639
w-0.602
ara-0.592
ating-0.586
of-0.570
Token the
Token ID262
Feature activation-4.38e+00

Pos logit contributions
eds+0.078
ating+0.072
w+0.071
ara+0.070
st+0.069
Neg logit contributions
 nervous-0.059
 curious-0.058
 confused-0.058
 unsure-0.058
 determined-0.058
Token the
Token ID262
Feature activation-4.36e+00

Pos logit contributions
 curious+0.102
 hungry+0.095
 nervous+0.095
 excited+0.092
 confused+0.091
Neg logit contributions
eds-0.149
w-0.146
ating-0.138
ara-0.136
icer-0.132
Token animals
Token ID4695
Feature activation-6.39e+00

Pos logit contributions
eds+0.270
w+0.251
ating+0.242
ara+0.239
of+0.234
Neg logit contributions
 curious-0.241
 hungry-0.227
 nervous-0.225
 excited-0.218
 confused-0.213
Token,
Token ID11
Feature activation+1.87e+00

Pos logit contributions
eds+0.357
ating+0.331
w+0.327
ara+0.322
of+0.317
Neg logit contributions
 determined-0.339
 curious-0.337
 nervous-0.336
 confused-0.329
 unsure-0.325
Token the
Token ID262
Feature activation-4.32e+00

Pos logit contributions
 determined+0.085
 unsure+0.080
 nervous+0.080
 curious+0.079
 confused+0.078
Neg logit contributions
eds-0.112
of-0.109
ara-0.108
w-0.102
 is-0.101
Token ac
Token ID936
Feature activation-7.00e+00

Pos logit contributions
eds+0.233
w+0.217
ating+0.211
ara+0.210
of+0.203
Neg logit contributions
 curious-0.255
 nervous-0.248
 hungry-0.246
 determined-0.246
 confused-0.242
Tokenrob
Token ID22609
Feature activation-2.67e+01

Pos logit contributions
eds+0.181
ac+0.171
 occasions+0.171
ators+0.166
anish+0.164
Neg logit contributions
 hungry-0.535
 curious-0.515
 nervous-0.512
 thirsty-0.512
 excited-0.497
Tokenats
Token ID1381
Feature activation-1.90e+01

Pos logit contributions
eds+0.564
ara+0.534
ators+0.463
of+0.457
st+0.456
Neg logit contributions
 hungry-2.237
 determined-2.229
 curious-2.192
 excited-2.175
 thirsty-2.171
Token and
Token ID290
Feature activation+9.73e-02

Pos logit contributions
eds+1.184
w+1.115
ara+1.094
ating+1.087
of+1.053
Neg logit contributions
 curious-0.918
 determined-0.905
 nervous-0.904
 hungry-0.883
 confused-0.877
Token the
Token ID262
Feature activation-4.27e+00

Pos logit contributions
 determined+0.006
 nervous+0.005
 curious+0.005
 unsure+0.005
 confused+0.005
Neg logit contributions
eds-0.005
ara-0.005
w-0.005
of-0.005
ating-0.004
Token clown
Token ID25573
Feature activation-8.43e-02

Pos logit contributions
eds+0.234
w+0.219
ating+0.214
ara+0.213
of+0.203
Neg logit contributions
 curious-0.247
 nervous-0.242
 determined-0.239
 hungry-0.238
 confused-0.235
Tokens
Token ID82
Feature activation-1.32e+01

Pos logit contributions
eds+0.004
w+0.003
ara+0.003
ating+0.003
icer+0.003
Neg logit contributions
 curious-0.006
 nervous-0.006
 hungry-0.006
 determined-0.006
 confused-0.006
Token saw
Token ID2497
Feature activation-2.49e+00

Pos logit contributions
 curious+0.542
 determined+0.536
 nervous+0.536
 hungry+0.526
 confused+0.522
Neg logit contributions
eds-0.495
w-0.460
ara-0.452
ating-0.446
of-0.430
Token clown
Token ID25573
Feature activation+3.86e-02

Pos logit contributions
eds+0.133
ara+0.124
w+0.121
ating+0.120
of+0.119
Neg logit contributions
 determined-0.140
 curious-0.134
 nervous-0.134
 unsure-0.131
 hungry-0.129
Tokens
Token ID82
Feature activation-1.31e+01

Pos logit contributions
 curious+0.003
 hungry+0.003
 nervous+0.003
 confused+0.003
 determined+0.003
Neg logit contributions
eds-0.002
w-0.001
ating-0.001
ara-0.001
icer-0.001
Token and
Token ID290
Feature activation+2.78e-01

Pos logit contributions
eds+0.718
w+0.660
ating+0.659
ara+0.653
of+0.623
Neg logit contributions
 determined-0.710
 nervous-0.703
 curious-0.697
 unsure-0.682
 confused-0.679
Token ac
Token ID936
Feature activation-6.80e+00

Pos logit contributions
 determined+0.015
 nervous+0.015
 unsure+0.015
 curious+0.015
 confused+0.014
Neg logit contributions
eds-0.015
w-0.014
ara-0.013
of-0.013
ating-0.013
Tokenrob
Token ID22609
Feature activation-2.65e+01

Pos logit contributions
ac+0.176
 occasions+0.171
ara+0.168
eds+0.162
anish+0.162
Neg logit contributions
 hungry-0.506
 curious-0.488
 thirsty-0.479
 determined-0.473
 excited-0.472
Tokenats
Token ID1381
Feature activation-1.88e+01

Pos logit contributions
eds+0.487
ara+0.435
st+0.386
of+0.371
w+0.365
Neg logit contributions
 determined-2.349
 curious-2.332
 hungry-2.315
 nervous-2.289
 excited-2.273
Token and
Token ID290
Feature activation+3.27e-01

Pos logit contributions
eds+1.066
w+0.991
ara+0.971
ating+0.966
of+0.941
Neg logit contributions
 curious-1.011
 determined-1.004
 nervous-0.995
 hungry-0.973
 confused-0.969
Token lions
Token ID31420
Feature activation-6.17e+00

Pos logit contributions
 determined+0.018
 nervous+0.018
 curious+0.018
 unsure+0.018
 confused+0.017
Neg logit contributions
eds-0.018
w-0.016
ara-0.016
ating-0.015
of-0.015
Token and
Token ID290
Feature activation+3.44e-01

Pos logit contributions
eds+0.350
w+0.328
ara+0.321
ating+0.318
of+0.307
Neg logit contributions
 curious-0.335
 nervous-0.330
 determined-0.329
 hungry-0.323
 confused-0.321
Token elephants
Token ID25368
Feature activation-5.42e+00

Pos logit contributions
 determined+0.020
 curious+0.020
 nervous+0.020
 confused+0.019
 unsure+0.019
Neg logit contributions
eds-0.018
w-0.016
ara-0.016
ating-0.015
of-0.015
Token a
Token ID257
Feature activation+2.14e+00

Pos logit contributions
w+0.594
ara+0.591
eds+0.590
icer+0.564
ee+0.563
Neg logit contributions
 determined-0.446
 nervous-0.431
 unsure-0.431
 confused-0.429
 curious-0.419
Token black
Token ID2042
Feature activation+3.42e+00

Pos logit contributions
 unsure+0.102
 determined+0.101
 confused+0.096
 nervous+0.095
 ashamed+0.091
Neg logit contributions
ara-0.130
eds-0.124
ee-0.118
w-0.118
icer-0.116
Token cat
Token ID3797
Feature activation-2.67e+00

Pos logit contributions
 curious+0.184
 nervous+0.179
 hungry+0.178
 confused+0.172
 determined+0.170
Neg logit contributions
eds-0.201
w-0.187
ating-0.186
of-0.174
ara-0.173
Token named
Token ID3706
Feature activation+1.64e+00

Pos logit contributions
eds+0.167
w+0.158
ara+0.155
ating+0.153
of+0.149
Neg logit contributions
 curious-0.130
 nervous-0.127
 determined-0.127
 hungry-0.125
 confused-0.124
Token Wh
Token ID854
Feature activation-8.37e+00

Pos logit contributions
 curious+0.076
 nervous+0.074
 determined+0.073
 hungry+0.073
 confused+0.071
Neg logit contributions
eds-0.108
w-0.102
ating-0.100
ara-0.099
icer-0.096
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
w+0.321
eds+0.310
ating+0.299
 occasions+0.292
ucks+0.285
Neg logit contributions
 curious-0.534
 nervous-0.508
 hungry-0.507
 confused-0.494
 determined-0.488
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
eds+0.903
icer+0.849
w+0.836
ara+0.827
ee+0.743
Neg logit contributions
 curious-1.896
 excited-1.832
 determined-1.810
 hungry-1.779
 nervous-1.770
Token.
Token ID13
Feature activation-4.37e-01

Pos logit contributions
eds+0.872
w+0.816
ara+0.794
ating+0.788
icer+0.762
Neg logit contributions
 curious-0.821
 nervous-0.798
 determined-0.794
 hungry-0.786
 confused-0.777
Token Wh
Token ID854
Feature activation-8.36e+00

Pos logit contributions
eds+0.024
w+0.023
ara+0.023
ating+0.022
of+0.022
Neg logit contributions
 determined-0.024
 curious-0.024
 nervous-0.023
 confused-0.023
 hungry-0.023
Tokenisk
Token ID1984
Feature activation-2.64e+01

Pos logit contributions
w+0.329
eds+0.325
 occasions+0.318
ee+0.311
ators+0.300
Neg logit contributions
 curious-0.520
 hungry-0.514
 nervous-0.510
 confused-0.495
 thirsty-0.490
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
eds+0.878
ara+0.862
icer+0.835
w+0.818
of+0.783
Neg logit contributions
 curious-1.922
 excited-1.849
 determined-1.818
 nervous-1.788
 hungry-1.779
Token day
Token ID1110
Feature activation-5.98e-01

Pos logit contributions
eds+0.418
w+0.384
ating+0.381
ara+0.373
icer+0.368
Neg logit contributions
 curious-0.413
 nervous-0.405
 hungry-0.401
 determined-0.398
 confused-0.388
Token,
Token ID11
Feature activation+1.49e+00

Pos logit contributions
eds+0.043
ating+0.039
w+0.039
icer+0.039
ara+0.039
Neg logit contributions
 curious-0.025
 nervous-0.024
 hungry-0.024
 determined-0.023
 sorry-0.023
Token Lily
Token ID20037
Feature activation-5.86e-01

Pos logit contributions
 determined+0.073
 curious+0.070
 unsure+0.069
 nervous+0.068
 confused+0.068
Neg logit contributions
w-0.089
ara-0.089
eds-0.089
of-0.083
ating-0.082
Token and
Token ID290
Feature activation-3.37e-01

Pos logit contributions
eds+0.036
ating+0.034
w+0.033
icer+0.033
ators+0.033
Neg logit contributions
 curious-0.030
 nervous-0.030
 hungry-0.029
 excited-0.029
 confused-0.029
Token Wh
Token ID854
Feature activation-8.35e+00

Pos logit contributions
eds+0.020
w+0.019
ara+0.019
ating+0.019
icer+0.018
Neg logit contributions
 curious-0.017
 nervous-0.017
 determined-0.017
 hungry-0.017
 confused-0.016
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
eds+0.339
w+0.327
ee+0.312
ating+0.311
ara+0.306
Neg logit contributions
 nervous-0.507
 curious-0.503
 hungry-0.501
 sorry-0.489
 confused-0.486
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
ara+0.906
icer+0.887
eds+0.883
w+0.767
ee+0.754
Neg logit contributions
 curious-1.880
 excited-1.851
 determined-1.827
 nervous-1.804
 hungry-1.791
Token were
Token ID547
Feature activation-5.79e+00

Pos logit contributions
eds+0.747
w+0.725
ara+0.689
ating+0.672
of+0.670
Neg logit contributions
 determined-0.872
 curious-0.867
 nervous-0.861
 unsure-0.849
 confused-0.837
Token playing
Token ID2712
Feature activation+3.82e+00

Pos logit contributions
eds+0.319
w+0.302
ating+0.292
st+0.284
ara+0.271
Neg logit contributions
 unsure-0.288
 determined-0.283
 confident-0.281
 hopeful-0.276
 curious-0.270
Token with
Token ID351
Feature activation-3.51e+00

Pos logit contributions
 curious+0.198
 nervous+0.193
 determined+0.192
 hungry+0.189
 confused+0.186
Neg logit contributions
eds-0.232
w-0.216
ara-0.213
ating-0.208
of-0.207
Token their
Token ID511
Feature activation-3.82e+00

Pos logit contributions
eds+0.215
w+0.202
ara+0.198
ating+0.196
of+0.190
Neg logit contributions
 curious-0.177
 determined-0.175
 nervous-0.174
 hungry-0.170
 confused-0.169
Token walked
Token ID6807
Feature activation+2.80e-01

Pos logit contributions
 curious+0.281
 determined+0.277
 nervous+0.274
 hungry+0.272
 confused+0.269
Neg logit contributions
eds-0.366
w-0.341
ara-0.336
ating-0.330
icer-0.321
Token,
Token ID11
Feature activation+1.49e+00

Pos logit contributions
 determined+0.018
 curious+0.018
 nervous+0.017
 unsure+0.017
 confused+0.017
Neg logit contributions
eds-0.012
ating-0.011
of-0.011
st-0.011
ara-0.011
Token Mr
Token ID1770
Feature activation-4.04e+00

Pos logit contributions
 determined+0.076
 curious+0.076
 unsure+0.072
 nervous+0.071
 confused+0.071
Neg logit contributions
ara-0.084
w-0.082
eds-0.082
icer-0.077
of-0.077
Token.
Token ID13
Feature activation-4.29e-01

Pos logit contributions
eds+0.245
w+0.229
ating+0.222
icer+0.215
of+0.214
Neg logit contributions
 curious-0.209
 nervous-0.205
 hungry-0.204
 confused-0.199
 determined-0.196
Token Wh
Token ID854
Feature activation-8.35e+00

Pos logit contributions
eds+0.023
ara+0.022
w+0.021
ating+0.021
icer+0.021
Neg logit contributions
 curious-0.024
 determined-0.024
 nervous-0.024
 confused-0.023
 unsure-0.023
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
eds+0.321
w+0.305
 occasions+0.286
ating+0.286
ee+0.282
Neg logit contributions
 nervous-0.558
 hungry-0.553
 curious-0.550
 confused-0.545
 ashamed-0.541
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
eds+0.885
icer+0.859
ara+0.799
w+0.762
ucks+0.739
Neg logit contributions
 determined-1.933
 curious-1.926
 excited-1.877
 nervous-1.840
 hungry-1.823
Token told
Token ID1297
Feature activation+2.37e+00

Pos logit contributions
eds+0.819
w+0.774
ara+0.762
ating+0.754
of+0.738
Neg logit contributions
 curious-0.806
 nervous-0.796
 determined-0.792
 hungry-0.778
 confused-0.766
Token the
Token ID262
Feature activation-4.75e+00

Pos logit contributions
 curious+0.116
 determined+0.115
 nervous+0.113
 confused+0.110
 hungry+0.110
Neg logit contributions
eds-0.146
ara-0.138
w-0.137
ating-0.134
of-0.132
Token girl
Token ID2576
Feature activation-1.76e+00

Pos logit contributions
ara+0.252
eds+0.248
w+0.231
of+0.229
st+0.228
Neg logit contributions
 determined-0.254
 unsure-0.244
 curious-0.243
 nervous-0.237
 confused-0.237
Token about
Token ID546
Feature activation+9.76e-01

Pos logit contributions
 is+0.090
eds+0.090
of+0.088
ating+0.084
ara+0.083
Neg logit contributions
 nervous-0.088
 determined-0.085
 thirsty-0.084
 confused-0.084
 unsure-0.083
Token saw
Token ID2497
Feature activation-3.05e+00

Pos logit contributions
 determined+0.083
 nervous+0.083
 curious+0.083
 sorry+0.080
 confused+0.080
Neg logit contributions
eds-0.079
w-0.073
ara-0.071
of-0.070
ating-0.069
Token her
Token ID607
Feature activation-5.84e-01

Pos logit contributions
eds+0.196
w+0.183
ara+0.181
ating+0.179
of+0.175
Neg logit contributions
 determined-0.145
 curious-0.143
 nervous-0.141
 hungry-0.137
 confused-0.136
Token searching
Token ID10342
Feature activation+1.55e+01

Pos logit contributions
eds+0.030
ara+0.027
w+0.027
ating+0.026
of+0.026
Neg logit contributions
 determined-0.034
 curious-0.033
 nervous-0.033
 hungry-0.033
 unsure-0.032
Token.
Token ID13
Feature activation-4.46e-01

Pos logit contributions
 determined+0.803
 curious+0.793
 nervous+0.783
 confused+0.762
 unsure+0.755
Neg logit contributions
eds-0.897
w-0.837
ating-0.814
ara-0.786
st-0.781
Token Wh
Token ID854
Feature activation-8.38e+00

Pos logit contributions
 is+0.026
ara+0.026
eds+0.025
of+0.024
icer+0.023
Neg logit contributions
 determined-0.021
 confident-0.019
 unsure-0.019
 thirsty-0.018
 curious-0.018
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
eds+0.343
w+0.338
ee+0.332
 occasions+0.309
of+0.308
Neg logit contributions
 nervous-0.484
 curious-0.480
 hungry-0.474
 confused-0.469
 unsure-0.466
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
eds+0.864
icer+0.850
ara+0.845
w+0.773
ee+0.754
Neg logit contributions
 curious-1.950
 determined-1.902
 excited-1.874
 nervous-1.838
 hungry-1.806
Token liked
Token ID8288
Feature activation-3.29e-01

Pos logit contributions
eds+0.690
w+0.660
ating+0.655
of+0.638
ara+0.629
Neg logit contributions
 nervous-0.876
 determined-0.865
 curious-0.859
 hungry-0.855
 confident-0.849
Token to
Token ID284
Feature activation+1.22e+00

Pos logit contributions
eds+0.019
ara+0.016
st+0.016
of+0.016
w+0.016
Neg logit contributions
 determined-0.017
 curious-0.017
 unsure-0.017
 nervous-0.017
 confused-0.016
Token play
Token ID711
Feature activation-2.32e+00

Pos logit contributions
 determined+0.075
 curious+0.074
 nervous+0.074
 hungry+0.073
 confused+0.072
Neg logit contributions
eds-0.059
w-0.053
ara-0.053
of-0.053
ating-0.051
Token with
Token ID351
Feature activation-3.52e+00

Pos logit contributions
eds+0.120
ating+0.111
w+0.109
ara+0.107
st+0.104
Neg logit contributions
 determined-0.135
 curious-0.133
 nervous-0.133
 confused-0.131
 hungry-0.131
Token
Token ID198
Feature activation-3.77e+00

Pos logit contributions
eds+0.258
ating+0.249
ators+0.246
w+0.245
ara+0.240
Neg logit contributions
 curious-0.157
 determined-0.152
 nervous-0.151
 excited-0.149
 hungry-0.146
TokenB
Token ID33
Feature activation-5.20e+00

Pos logit contributions
eds+0.187
ara+0.172
ating+0.167
icer+0.166
w+0.166
Neg logit contributions
 determined-0.236
 curious-0.235
 nervous-0.228
 hungry-0.225
 confused-0.219
Tokenuddy
Token ID21584
Feature activation-4.98e+00

Pos logit contributions
eds+0.294
ating+0.284
ators+0.281
w+0.277
ara+0.273
Neg logit contributions
 curious-0.268
 nervous-0.266
 determined-0.259
 hungry-0.257
 excited-0.252
Token and
Token ID290
Feature activation-3.37e-01

Pos logit contributions
eds+0.274
w+0.259
ara+0.257
ating+0.248
of+0.245
Neg logit contributions
 nervous-0.263
 curious-0.263
 determined-0.263
 hungry-0.258
 sorry-0.256
Token Wh
Token ID854
Feature activation-8.37e+00

Pos logit contributions
eds+0.018
w+0.016
ara+0.016
ating+0.016
of+0.016
Neg logit contributions
 determined-0.019
 curious-0.019
 nervous-0.019
 confused-0.018
 unsure-0.018
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
eds+0.323
ara+0.317
w+0.317
ating+0.306
 occasions+0.301
Neg logit contributions
 nervous-0.524
 curious-0.524
 hungry-0.514
 confused-0.508
 determined-0.504
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
ara+0.992
icer+0.934
eds+0.894
ucks+0.814
ee+0.811
Neg logit contributions
 curious-1.853
 determined-1.823
 excited-1.816
 nervous-1.756
 hungry-1.736
Token started
Token ID2067
Feature activation+3.23e+00

Pos logit contributions
w+0.714
eds+0.712
of+0.674
ating+0.655
ara+0.651
Neg logit contributions
 nervous-0.855
 curious-0.855
 determined-0.845
 hungry-0.838
 sorry-0.829
Token to
Token ID284
Feature activation+1.21e+00

Pos logit contributions
 confident+0.168
 curious+0.168
 nervous+0.167
 sorry+0.167
 unsure+0.166
Neg logit contributions
eds-0.170
st-0.154
ating-0.138
 is-0.136
en-0.135
Token fight
Token ID1907
Feature activation+2.34e+00

Pos logit contributions
 curious+0.070
 determined+0.070
 nervous+0.070
 hungry+0.068
 confused+0.068
Neg logit contributions
eds-0.060
w-0.055
of-0.054
ara-0.054
ating-0.054
Token over
Token ID625
Feature activation-3.68e+00

Pos logit contributions
 curious+0.125
 nervous+0.124
 determined+0.123
 hungry+0.120
 confused+0.120
Neg logit contributions
eds-0.137
w-0.129
ara-0.127
ating-0.125
icer-0.122
Token free
Token ID1479
Feature activation+2.95e+00

Pos logit contributions
 curious+0.037
 nervous+0.036
 hungry+0.035
 excited+0.034
 confused+0.034
Neg logit contributions
eds-0.051
ara-0.050
ating-0.049
icer-0.049
w-0.049
Token."
Token ID526
Feature activation-6.79e+00

Pos logit contributions
 curious+0.159
 nervous+0.154
 hungry+0.153
 excited+0.149
 confused+0.147
Neg logit contributions
eds-0.168
ara-0.165
icer-0.160
w-0.160
ating-0.159
Token
Token ID198
Feature activation-3.80e+00

Pos logit contributions
eds+0.415
w+0.390
ating+0.383
ara+0.377
ee+0.364
Neg logit contributions
 curious-0.342
 nervous-0.341
 determined-0.333
 hungry-0.331
 confused-0.330
Token
Token ID198
Feature activation-3.78e+00

Pos logit contributions
eds+0.273
ating+0.268
ators+0.263
w+0.261
icer+0.252
Neg logit contributions
 curious-0.141
 nervous-0.140
 excited-0.137
 determined-0.132
 hungry-0.130
TokenWh
Token ID1199
Feature activation-1.17e+01

Pos logit contributions
eds+0.201
ating+0.183
w+0.182
ara+0.182
ee+0.180
Neg logit contributions
 determined-0.217
 curious-0.215
 nervous-0.213
 confused-0.205
 hungry-0.201
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
eds+0.527
ators+0.516
 occasions+0.505
ating+0.472
w+0.464
Neg logit contributions
 nervous-0.592
 determined-0.592
 hungry-0.574
 curious-0.561
 confused-0.561
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
icer+0.993
eds+0.967
ara+0.958
ee+0.894
w+0.878
Neg logit contributions
 curious-1.773
 determined-1.745
 excited-1.692
 confident-1.616
 nervous-1.606
Token didn
Token ID1422
Feature activation+8.80e-01

Pos logit contributions
eds+0.870
w+0.831
ara+0.811
ating+0.805
of+0.782
Neg logit contributions
 curious-0.778
 nervous-0.772
 determined-0.768
 hungry-0.760
 confused-0.744
Token't
Token ID470
Feature activation+9.66e-01

Pos logit contributions
 curious+0.048
 nervous+0.048
 determined+0.048
 hungry+0.047
 confused+0.046
Neg logit contributions
eds-0.049
w-0.046
ara-0.045
ating-0.045
of-0.043
Token listen
Token ID6004
Feature activation-6.27e-01

Pos logit contributions
 determined+0.056
 confused+0.056
 curious+0.055
 unsure+0.055
 hungry+0.055
Neg logit contributions
of-0.045
w-0.045
ent-0.042
le-0.042
eds-0.042
Token to
Token ID284
Feature activation+1.22e+00

Pos logit contributions
eds+0.027
of+0.024
ating+0.024
st+0.024
ent+0.024
Neg logit contributions
 thirsty-0.039
 nervous-0.038
 curious-0.038
 determined-0.037
 hungry-0.037
Token "
Token ID366
Feature activation+4.15e+00

Pos logit contributions
 thirsty+0.070
 determined+0.065
 nervous+0.065
 unsure+0.065
 bored+0.065
Neg logit contributions
 is-0.068
ara-0.068
eds-0.067
st-0.067
of-0.067
TokenThank
Token ID10449
Feature activation-1.31e+00

Pos logit contributions
 determined+0.171
 curious+0.168
 nervous+0.166
 hungry+0.160
 excited+0.156
Neg logit contributions
eds-0.300
ating-0.263
icer-0.262
ators-0.260
ara-0.257
Token you
Token ID345
Feature activation-1.15e+01

Pos logit contributions
eds+0.075
ators+0.069
angles+0.066
icer+0.065
ara+0.063
Neg logit contributions
 sorry-0.074
 curious-0.072
 hungry-0.070
 determined-0.068
 nervous-0.067
Token,
Token ID11
Feature activation+1.46e+00

Pos logit contributions
eds+0.743
w+0.702
ara+0.688
ating+0.686
icer+0.661
Neg logit contributions
 curious-0.548
 nervous-0.539
 determined-0.534
 hungry-0.529
 confused-0.520
Token Wh
Token ID854
Feature activation-8.37e+00

Pos logit contributions
 curious+0.071
 determined+0.070
 nervous+0.069
 confused+0.068
 hungry+0.068
Neg logit contributions
eds-0.091
w-0.086
ara-0.085
ating-0.083
of-0.082
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
eds+0.356
ators+0.352
 occasions+0.350
ee+0.319
ating+0.311
Neg logit contributions
 hungry-0.471
 sorry-0.468
 curious-0.463
 nervous-0.457
 ashamed-0.438
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
eds+0.912
icer+0.900
ara+0.827
ucks+0.751
ators+0.745
Neg logit contributions
 curious-1.944
 determined-1.853
 excited-1.842
 sorry-1.841
 hungry-1.816
Token!"
Token ID2474
Feature activation-1.22e+01

Pos logit contributions
eds+0.873
w+0.803
ara+0.800
icer+0.786
ating+0.784
Neg logit contributions
 curious-0.824
 hungry-0.804
 nervous-0.795
 sorry-0.777
 determined-0.758
Token
Token ID198
Feature activation-3.78e+00

Pos logit contributions
eds+0.658
ara+0.629
w+0.615
of+0.602
ating+0.586
Neg logit contributions
 determined-0.673
 curious-0.668
 nervous-0.644
 confused-0.636
 unsure-0.633
Token
Token ID198
Feature activation-3.78e+00

Pos logit contributions
eds+0.275
ators+0.273
ating+0.272
w+0.265
els+0.262
Neg logit contributions
 curious-0.132
 excited-0.132
 nervous-0.128
 hungry-0.125
 determined-0.122
TokenThe
Token ID464
Feature activation-9.28e+00

Pos logit contributions
eds+0.212
ee+0.190
icer+0.189
els+0.189
ators+0.189
Neg logit contributions
 determined-0.209
 curious-0.200
 nervous-0.192
 excited-0.191
 hungry-0.188
Token a
Token ID257
Feature activation+2.16e+00

Pos logit contributions
eds+0.479
w+0.456
ara+0.448
ating+0.446
of+0.434
Neg logit contributions
 curious-0.233
 determined-0.228
 nervous-0.228
 hungry-0.221
 confused-0.219
Token quiet
Token ID5897
Feature activation+6.07e+00

Pos logit contributions
 unsure+0.102
 determined+0.100
 confused+0.096
 nervous+0.094
 curious+0.092
Neg logit contributions
ara-0.124
eds-0.122
st-0.115
w-0.114
of-0.112
Token mouse
Token ID10211
Feature activation+2.23e+00

Pos logit contributions
 determined+0.360
 curious+0.357
 nervous+0.353
 hungry+0.347
 confused+0.346
Neg logit contributions
eds-0.311
ara-0.289
w-0.287
ating-0.279
of-0.270
Token named
Token ID3706
Feature activation+1.64e+00

Pos logit contributions
 curious+0.114
 nervous+0.113
 determined+0.112
 hungry+0.110
 confused+0.109
Neg logit contributions
eds-0.134
w-0.126
ara-0.123
ating-0.122
of-0.118
Token Wh
Token ID854
Feature activation-8.38e+00

Pos logit contributions
 determined+0.077
 curious+0.074
 nervous+0.074
 confused+0.073
 hungry+0.072
Neg logit contributions
eds-0.105
w-0.098
ara-0.097
of-0.095
ating-0.095
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
w+0.333
 occasions+0.308
ucks+0.302
eds+0.301
ating+0.299
Neg logit contributions
 curious-0.504
 nervous-0.485
 hungry-0.472
 confused-0.465
 fearful-0.463
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
icer+0.866
eds+0.842
ara+0.824
w+0.780
ee+0.705
Neg logit contributions
 curious-1.964
 excited-1.889
 determined-1.884
 nervous-1.840
 hungry-1.833
Token.
Token ID13
Feature activation-4.31e-01

Pos logit contributions
eds+0.789
w+0.732
ara+0.713
ating+0.710
of+0.688
Neg logit contributions
 curious-0.873
 determined-0.871
 nervous-0.869
 hungry-0.847
 confused-0.846
Token Wh
Token ID854
Feature activation-8.36e+00

Pos logit contributions
eds+0.026
w+0.024
ara+0.024
ating+0.023
of+0.023
Neg logit contributions
 determined-0.022
 curious-0.022
 nervous-0.021
 confused-0.021
 hungry-0.021
Tokenisk
Token ID1984
Feature activation-2.64e+01

Pos logit contributions
w+0.338
 occasions+0.337
ara+0.331
ee+0.315
eds+0.314
Neg logit contributions
 curious-0.468
 nervous-0.458
 fearful-0.447
 confused-0.442
 hungry-0.440
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
ara+0.961
icer+0.929
eds+0.896
w+0.848
ee+0.825
Neg logit contributions
 curious-1.837
 excited-1.770
 determined-1.744
 nervous-1.691
 hungry-1.662
Token others
Token ID1854
Feature activation-8.16e+00

Pos logit contributions
eds+0.206
ara+0.195
w+0.195
ating+0.188
of+0.185
Neg logit contributions
 curious-0.242
 determined-0.237
 nervous-0.236
 hungry-0.235
 confused-0.232
Token."
Token ID526
Feature activation-6.80e+00

Pos logit contributions
eds+0.433
w+0.406
ara+0.403
icer+0.392
ating+0.384
Neg logit contributions
 curious-0.495
 nervous-0.475
 determined-0.470
 hungry-0.466
 confused-0.460
Token
Token ID198
Feature activation-3.79e+00

Pos logit contributions
eds+0.419
w+0.393
ara+0.383
ating+0.383
icer+0.368
Neg logit contributions
 curious-0.341
 nervous-0.338
 determined-0.334
 hungry-0.329
 confused-0.327
Token
Token ID198
Feature activation-3.80e+00

Pos logit contributions
eds+0.273
ating+0.261
ators+0.260
w+0.255
icer+0.250
Neg logit contributions
 curious-0.146
 nervous-0.143
 excited-0.140
 determined-0.136
 hungry-0.134
TokenWh
Token ID1199
Feature activation-1.17e+01

Pos logit contributions
eds+0.203
w+0.181
ating+0.180
ara+0.180
ee+0.180
Neg logit contributions
 curious-0.223
 determined-0.223
 nervous-0.221
 hungry-0.212
 confused-0.211
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
eds+0.524
 occasions+0.514
ators+0.500
w+0.481
ucks+0.475
Neg logit contributions
 determined-0.594
 nervous-0.588
 hungry-0.580
 curious-0.576
 confused-0.560
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
icer+0.947
ara+0.943
eds+0.936
ee+0.861
ucks+0.840
Neg logit contributions
 curious-1.860
 determined-1.829
 excited-1.763
 confident-1.687
 nervous-1.681
Token listened
Token ID16399
Feature activation+3.40e+00

Pos logit contributions
eds+0.893
w+0.852
ara+0.834
ating+0.830
of+0.802
Neg logit contributions
 curious-0.755
 nervous-0.750
 determined-0.746
 hungry-0.736
 confused-0.723
Token to
Token ID284
Feature activation+1.23e+00

Pos logit contributions
 thirsty+0.197
 sorry+0.190
 hungry+0.181
 determined+0.181
 ashamed+0.180
Neg logit contributions
eds-0.148
st-0.147
ating-0.140
of-0.140
w-0.136
Token the
Token ID262
Feature activation-4.72e+00

Pos logit contributions
 curious+0.062
 nervous+0.061
 determined+0.061
 hungry+0.060
 confused+0.060
Neg logit contributions
eds-0.075
w-0.071
ara-0.069
ating-0.068
of-0.066
Token owl
Token ID39610
Feature activation+1.38e+00

Pos logit contributions
eds+0.296
w+0.283
ating+0.277
st+0.268
ara+0.268
Neg logit contributions
 determined-0.211
 curious-0.209
 nervous-0.209
 thirsty-0.205
 hungry-0.205
Token got
Token ID1392
Feature activation-4.45e+00

Pos logit contributions
 curious+0.286
 nervous+0.285
 determined+0.282
 hungry+0.282
 confused+0.279
Neg logit contributions
eds-0.365
ating-0.331
w-0.331
ara-0.327
icer-0.320
Token back
Token ID736
Feature activation-2.22e+00

Pos logit contributions
eds+0.212
w+0.200
ating+0.195
ma+0.192
st+0.191
Neg logit contributions
 determined-0.253
 curious-0.247
 confident-0.247
 unsure-0.247
 nervous-0.242
Token inside
Token ID2641
Feature activation+2.70e+00

Pos logit contributions
eds+0.126
w+0.117
ating+0.117
ara+0.115
st+0.111
Neg logit contributions
 determined-0.120
 curious-0.119
 nervous-0.117
 unsure-0.114
 confused-0.114
Token,
Token ID11
Feature activation+1.46e+00

Pos logit contributions
 determined+0.148
 curious+0.148
 nervous+0.144
 unsure+0.141
 hungry+0.141
Neg logit contributions
eds-0.144
ara-0.138
ating-0.136
w-0.133
of-0.131
Token Wh
Token ID854
Feature activation-8.37e+00

Pos logit contributions
 determined+0.084
 curious+0.081
 unsure+0.079
 confused+0.077
 nervous+0.077
Neg logit contributions
w-0.067
ara-0.067
eds-0.065
ating-0.062
of-0.062
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
 occasions+0.328
eds+0.326
w+0.319
ating+0.317
ee+0.313
Neg logit contributions
 hungry-0.516
 nervous-0.511
 confused-0.498
 curious-0.496
 ashamed-0.489
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
icer+0.916
eds+0.892
ara+0.840
w+0.822
ee+0.792
Neg logit contributions
 curious-1.843
 determined-1.788
 excited-1.779
 hungry-1.697
 nervous-1.690
Token saw
Token ID2497
Feature activation-3.06e+00

Pos logit contributions
eds+0.870
w+0.815
ara+0.799
ating+0.792
of+0.765
Neg logit contributions
 curious-0.802
 nervous-0.791
 determined-0.791
 hungry-0.774
 confused-0.769
Token a
Token ID257
Feature activation+2.16e+00

Pos logit contributions
eds+0.186
ara+0.173
w+0.173
ating+0.170
of+0.167
Neg logit contributions
 determined-0.153
 curious-0.145
 nervous-0.145
 unsure-0.141
 confused-0.140
Token teaspoon
Token ID22326
Feature activation+2.98e+00

Pos logit contributions
 determined+0.104
 unsure+0.102
 nervous+0.102
 curious+0.101
 confused+0.099
Neg logit contributions
eds-0.125
ara-0.121
w-0.119
st-0.113
ee-0.111
Token on
Token ID319
Feature activation-3.27e+00

Pos logit contributions
 nervous+0.167
 curious+0.167
 determined+0.167
 confused+0.163
 hungry+0.161
Neg logit contributions
eds-0.161
w-0.150
ating-0.146
st-0.144
ara-0.144
Token Lucy
Token ID22162
Feature activation-3.08e-01

Pos logit contributions
ara+0.024
eds+0.023
icer+0.022
ating+0.022
 is+0.021
Neg logit contributions
 determined-0.023
 curious-0.021
 unsure-0.021
 nervous-0.021
 confused-0.021
Token and
Token ID290
Feature activation-3.46e-01

Pos logit contributions
eds+0.019
w+0.018
ating+0.017
ara+0.017
icer+0.017
Neg logit contributions
 curious-0.016
 nervous-0.015
 determined-0.015
 hungry-0.015
 confused-0.015
Token Mr
Token ID1770
Feature activation-4.06e+00

Pos logit contributions
eds+0.020
ara+0.020
w+0.020
ating+0.019
of+0.019
Neg logit contributions
 determined-0.017
 curious-0.017
 nervous-0.017
 confused-0.016
 unsure-0.016
Token.
Token ID13
Feature activation-4.54e-01

Pos logit contributions
eds+0.241
w+0.226
ating+0.218
ara+0.216
icer+0.212
Neg logit contributions
 curious-0.215
 nervous-0.211
 hungry-0.209
 determined-0.205
 confused-0.204
Token Wh
Token ID854
Feature activation-8.37e+00

Pos logit contributions
eds+0.027
w+0.025
ara+0.025
ating+0.025
of+0.024
Neg logit contributions
 curious-0.024
 nervous-0.023
 determined-0.023
 hungry-0.023
 confused-0.023
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
eds+0.319
w+0.307
icer+0.274
ating+0.272
ee+0.271
Neg logit contributions
 curious-0.576
 nervous-0.573
 hungry-0.567
 sorry-0.564
 confused-0.558
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
eds+0.883
icer+0.883
ara+0.831
w+0.770
ucks+0.733
Neg logit contributions
 curious-1.925
 determined-1.900
 excited-1.889
 nervous-1.831
 hungry-1.830
Token were
Token ID547
Feature activation-5.78e+00

Pos logit contributions
eds+0.644
of+0.612
w+0.595
ating+0.584
ara+0.578
Neg logit contributions
 nervous-0.936
 curious-0.932
 determined-0.930
 sorry-0.914
 hungry-0.913
Token very
Token ID845
Feature activation+1.23e+01

Pos logit contributions
eds+0.393
ating+0.380
 is+0.367
 us+0.355
ee+0.351
Neg logit contributions
 unsure-0.184
 determined-0.182
 nervous-0.169
dream-0.165
 confident-0.164
Token happy
Token ID3772
Feature activation+1.13e+01

Pos logit contributions
 unsure+0.354
 determined+0.344
dream+0.288
 confused+0.287
 nervous+0.279
Neg logit contributions
eds-0.844
ara-0.829
ee-0.820
ating-0.810
w-0.806
Token.
Token ID13
Feature activation-4.17e-01

Pos logit contributions
 determined+0.709
 nervous+0.693
 curious+0.688
 confused+0.685
 unsure+0.671
Neg logit contributions
eds-0.539
ara-0.496
ating-0.492
st-0.486
w-0.476
Token's
Token ID338
Feature activation-9.50e+00

Pos logit contributions
eds+0.016
w+0.015
ara+0.014
ating+0.014
of+0.014
Neg logit contributions
 curious-0.018
 determined-0.017
 nervous-0.017
 hungry-0.017
 confused-0.017
Token pet
Token ID4273
Feature activation+1.24e+00

Pos logit contributions
eds+0.556
ara+0.532
w+0.524
ee+0.501
ating+0.500
Neg logit contributions
 determined-0.451
 unsure-0.441
 nervous-0.441
 curious-0.431
 hungry-0.424
Token cat
Token ID3797
Feature activation-2.66e+00

Pos logit contributions
 curious+0.067
 nervous+0.067
 hungry+0.066
 confused+0.065
 determined+0.065
Neg logit contributions
eds-0.072
ating-0.065
w-0.065
ara-0.063
icer-0.063
Token,
Token ID11
Feature activation+1.47e+00

Pos logit contributions
eds+0.149
w+0.140
ara+0.137
ating+0.135
of+0.131
Neg logit contributions
 curious-0.144
 determined-0.144
 nervous-0.143
 confused-0.139
 hungry-0.139
Token Wh
Token ID854
Feature activation-8.36e+00

Pos logit contributions
 determined+0.066
 nervous+0.063
 curious+0.063
 unsure+0.062
 confused+0.061
Neg logit contributions
eds-0.092
w-0.088
ara-0.088
of-0.085
st-0.083
Tokenisk
Token ID1984
Feature activation-2.64e+01

Pos logit contributions
eds+0.314
w+0.298
icer+0.281
ucks+0.276
ating+0.274
Neg logit contributions
 curious-0.547
 nervous-0.527
 sorry-0.520
 confused-0.516
 hungry-0.514
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
eds+0.825
icer+0.795
ara+0.755
w+0.737
ucks+0.674
Neg logit contributions
 curious-2.011
 determined-1.964
 excited-1.939
 nervous-1.902
 hungry-1.889
Token,
Token ID11
Feature activation+1.50e+00

Pos logit contributions
eds+0.849
w+0.789
ating+0.763
ara+0.763
of+0.755
Neg logit contributions
 curious-0.796
 determined-0.793
 nervous-0.786
 hungry-0.762
 unsure-0.762
Token jumped
Token ID11687
Feature activation+3.05e+00

Pos logit contributions
 determined+0.075
 curious+0.075
 nervous+0.074
 confused+0.072
 unsure+0.072
Neg logit contributions
eds-0.088
w-0.084
ara-0.081
of-0.078
ating-0.078
Token up
Token ID510
Feature activation-2.73e+00

Pos logit contributions
 determined+0.177
 curious+0.174
 nervous+0.170
 unsure+0.168
 confused+0.166
Neg logit contributions
eds-0.155
w-0.143
ara-0.138
st-0.136
ating-0.135
Token and
Token ID290
Feature activation-3.15e-01

Pos logit contributions
eds+0.120
st+0.117
w+0.110
ara+0.109
ating+0.106
Neg logit contributions
 determined-0.165
 curious-0.164
 sorry-0.160
 nervous-0.157
 ashamed-0.156
Token Wh
Token ID854
Feature activation-8.37e+00

Pos logit contributions
 curious+0.076
 nervous+0.074
 determined+0.073
 hungry+0.073
 confused+0.071
Neg logit contributions
eds-0.108
w-0.102
ating-0.100
ara-0.099
icer-0.096
Tokenisk
Token ID1984
Feature activation-2.65e+01

Pos logit contributions
w+0.321
eds+0.310
ating+0.299
 occasions+0.292
ucks+0.285
Neg logit contributions
 curious-0.534
 nervous-0.508
 hungry-0.507
 confused-0.494
 determined-0.488
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
eds+0.903
icer+0.849
w+0.836
ara+0.827
ee+0.743
Neg logit contributions
 curious-1.896
 excited-1.832
 determined-1.810
 hungry-1.779
 nervous-1.770
Token.
Token ID13
Feature activation-4.37e-01

Pos logit contributions
eds+0.872
w+0.816
ara+0.794
ating+0.788
icer+0.762
Neg logit contributions
 curious-0.821
 nervous-0.798
 determined-0.794
 hungry-0.786
 confused-0.777
Token Wh
Token ID854
Feature activation-8.36e+00

Pos logit contributions
eds+0.024
w+0.023
ara+0.023
ating+0.022
of+0.022
Neg logit contributions
 determined-0.024
 curious-0.024
 nervous-0.023
 confused-0.023
 hungry-0.023
Tokenisk
Token ID1984
Feature activation-2.64e+01

Pos logit contributions
w+0.329
eds+0.325
 occasions+0.318
ee+0.311
ators+0.300
Neg logit contributions
 curious-0.520
 hungry-0.514
 nervous-0.510
 confused-0.495
 thirsty-0.490
Tokeners
Token ID364
Feature activation-1.51e+01

Pos logit contributions
eds+0.878
ara+0.862
icer+0.835
w+0.818
of+0.783
Neg logit contributions
 curious-1.922
 excited-1.849
 determined-1.818
 nervous-1.788
 hungry-1.779
Token loved
Token ID6151
Feature activation+2.50e-01

Pos logit contributions
eds+0.978
w+0.875
ara+0.865
ating+0.854
icer+0.853
Neg logit contributions
 curious-0.778
 nervous-0.726
 determined-0.724
 hungry-0.720
 excited-0.717
Token to
Token ID284
Feature activation+1.24e+00

Pos logit contributions
 determined+0.013
 unsure+0.013
 confused+0.012
 nervous+0.012
 curious+0.012
Neg logit contributions
eds-0.015
w-0.014
st-0.013
ara-0.013
ating-0.013
Token walk
Token ID2513
Feature activation-2.62e+00

Pos logit contributions
 determined+0.078
 curious+0.078
 nervous+0.077
 confused+0.076
 unsure+0.076
Neg logit contributions
eds-0.059
w-0.054
ara-0.053
ating-0.052
of-0.052
Token around
Token ID1088
Feature activation+7.81e-01

Pos logit contributions
eds+0.145
w+0.137
ating+0.133
ara+0.131
of+0.129
Neg logit contributions
 determined-0.146
 nervous-0.143
 curious-0.143
 confused-0.141
 unsure-0.140
Token\n
Token ID198
Feature activation-3.17e+00

Pos logit contributions
eds+0.208
w+0.202
ating+0.200
ators+0.194
icer+0.192
Neg logit contributions
 curious-0.146
 excited-0.138
 determined-0.137
 nervous-0.137
 hungry-0.136
TokenTom
Token ID13787
Feature activation-9.58e+00

Pos logit contributions
eds+0.161
ating+0.146
w+0.146
icer+0.141
st+0.139
Neg logit contributions
 determined-0.193
 curious-0.192
 excited-0.182
 hungry-0.181
 nervous-0.180
Token was
Token ID373
Feature activation-9.12e+00

Pos logit contributions
eds+0.645
w+0.610
ating+0.595
icer+0.584
ara+0.570
Neg logit contributions
 curious-0.442
 nervous-0.428
 confused-0.423
 determined-0.421
 hungry-0.413
Token happy
Token ID3772
Feature activation+1.20e+01

Pos logit contributions
eds+0.637
 is+0.635
ate+0.602
ating+0.602
ara+0.595
Neg logit contributions
dream-0.216
 determined-0.201
 unsure-0.193
 nervous-0.185
 hungry-0.183
Token and
Token ID290
Feature activation+3.40e-01

Pos logit contributions
 determined+0.766
 nervous+0.740
 curious+0.737
 thirsty+0.721
 unsure+0.718
Neg logit contributions
eds-0.541
ara-0.509
of-0.486
st-0.480
 is-0.470
Token grateful
Token ID14066
Feature activation+1.38e+01

Pos logit contributions
 unsure+0.013
 determined+0.013
dream+0.013
 wondering+0.012
 nervous+0.012
Neg logit contributions
 is-0.019
ara-0.018
eds-0.018
of-0.017
w-0.017
Token.
Token ID13
Feature activation+2.69e-01

Pos logit contributions
 determined+0.825
 curious+0.818
 nervous+0.814
 hungry+0.793
 confused+0.793
Neg logit contributions
eds-0.716
w-0.663
ara-0.660
ating-0.636
of-0.631
Token He
Token ID679
Feature activation+9.51e+00

Pos logit contributions
 determined+0.011
 thirsty+0.010
 curious+0.010
 nervous+0.010
 hungry+0.009
Neg logit contributions
ara-0.019
eds-0.018
ucks-0.018
 is-0.018
of-0.017
Token opened
Token ID4721
Feature activation+7.00e+00

Pos logit contributions
 determined+0.482
 curious+0.480
 nervous+0.478
 hungry+0.467
 confused+0.462
Neg logit contributions
eds-0.552
ara-0.531
w-0.523
ating-0.503
of-0.501
Token the
Token ID262
Feature activation-3.99e+00

Pos logit contributions
 curious+0.337
 determined+0.337
 nervous+0.331
 hungry+0.322
 confused+0.319
Neg logit contributions
eds-0.441
ara-0.410
w-0.410
ating-0.404
icer-0.395
Token box
Token ID3091
Feature activation+9.52e-01

Pos logit contributions
eds+0.216
ating+0.206
w+0.203
ara+0.193
of+0.191
Neg logit contributions
 curious-0.239
 hungry-0.232
 confused-0.231
 nervous-0.231
 determined-0.226
Token\n
Token ID198
Feature activation-3.47e+00

Pos logit contributions
 curious+0.171
 nervous+0.165
 excited+0.163
 hungry+0.160
 determined+0.160
Neg logit contributions
eds-0.250
w-0.242
ating-0.230
ara-0.228
icer-0.223
Token\n
Token ID198
Feature activation-3.46e+00

Pos logit contributions
eds+0.233
w+0.225
ating+0.219
ara+0.217
ators+0.215
Neg logit contributions
 curious-0.153
 excited-0.149
 determined-0.149
 nervous-0.146
 hungry-0.143
TokenThe
Token ID464
Feature activation-8.94e+00

Pos logit contributions
eds+0.208
w+0.192
ara+0.190
ating+0.186
icer+0.183
Neg logit contributions
 determined-0.184
 curious-0.176
 excited-0.170
 nervous-0.166
 confident-0.163
Token boy
Token ID2933
Feature activation-7.66e-01

Pos logit contributions
eds+0.425
w+0.400
ara+0.394
ating+0.383
icer+0.371
Neg logit contributions
 nervous-0.550
 curious-0.549
 determined-0.544
 hungry-0.535
 confused-0.533
Token was
Token ID373
Feature activation-9.39e+00

Pos logit contributions
eds+0.041
w+0.038
ating+0.037
ara+0.037
icer+0.036
Neg logit contributions
 curious-0.044
 determined-0.044
 nervous-0.044
 hungry-0.043
 confused-0.043
Token glad
Token ID9675
Feature activation+1.41e+01

Pos logit contributions
eds+0.661
ma+0.626
e+0.619
ee+0.618
ating+0.613
Neg logit contributions
 nervous-0.250
 curious-0.249
 unsure-0.243
 hungry-0.232
 confused-0.231
Token he
Token ID339
Feature activation+1.12e+01

Pos logit contributions
 nervous+0.627
 determined+0.621
 curious+0.619
 thirsty+0.601
 confused+0.598
Neg logit contributions
eds-0.892
ara-0.859
of-0.837
st-0.837
ating-0.832
Token had
Token ID550
Feature activation-1.48e+00

Pos logit contributions
 curious+0.565
 determined+0.559
 nervous+0.552
 hungry+0.535
 confused+0.532
Neg logit contributions
eds-0.705
ara-0.641
icer-0.641
w-0.639
ating-0.627
Token decided
Token ID3066
Feature activation+9.66e+00

Pos logit contributions
eds+0.090
w+0.085
ara+0.083
ating+0.082
of+0.080
Neg logit contributions
 curious-0.076
 determined-0.074
 nervous-0.074
 hungry-0.073
 confused-0.072
Token to
Token ID284
Feature activation+1.65e+00

Pos logit contributions
 nervous+0.469
 curious+0.462
 hungry+0.454
 confused+0.453
 thirsty+0.451
Neg logit contributions
eds-0.573
ating-0.540
w-0.530
st-0.521
ara-0.519
Token be
Token ID307
Feature activation-9.99e+00

Pos logit contributions
 curious+0.091
 determined+0.089
 nervous+0.088
 hungry+0.086
 sorry+0.085
Neg logit contributions
eds-0.095
w-0.088
ara-0.086
icer-0.085
ating-0.083
Token Tim
Token ID5045
Feature activation+5.76e-01

Pos logit contributions
eds+0.321
ara+0.310
w+0.303
icer+0.294
of+0.291
Neg logit contributions
 determined-0.297
 curious-0.286
 nervous-0.276
 unsure-0.274
 confused-0.272
Tokenmy
Token ID1820
Feature activation-1.21e+01

Pos logit contributions
 curious+0.040
 nervous+0.040
 confused+0.039
 hungry+0.039
 determined+0.039
Neg logit contributions
ating-0.022
eds-0.022
w-0.021
 occasions-0.021
ara-0.021
Token was
Token ID373
Feature activation-9.67e+00

Pos logit contributions
eds+0.512
w+0.464
ara+0.448
ating+0.432
of+0.426
Neg logit contributions
 determined-0.817
 curious-0.814
 nervous-0.813
 hungry-0.805
 sorry-0.793
Token surprised
Token ID6655
Feature activation+1.34e+01

Pos logit contributions
eds+0.629
 is+0.590
e+0.561
 us+0.536
el+0.526
Neg logit contributions
 confident-0.281
dream-0.278
 determined-0.271
 energetic-0.248
 unsure-0.244
Token and
Token ID290
Feature activation-2.48e-01

Pos logit contributions
 determined+0.778
 curious+0.742
 sorry+0.738
 nervous+0.735
 thirsty+0.729
Neg logit contributions
eds-0.723
ara-0.686
of-0.677
st-0.663
w-0.658
Token scared
Token ID12008
Feature activation+1.69e+01

Pos logit contributions
eds+0.010
 is+0.010
 us+0.010
ie+0.010
w+0.009
Neg logit contributions
 confident-0.012
dream-0.012
 unsure-0.012
 determined-0.012
 nervous-0.011
Token,
Token ID11
Feature activation+1.59e+00

Pos logit contributions
 determined+1.289
 thirsty+1.199
 hopeful+1.174
 confident+1.170
 puzzled+1.168
Neg logit contributions
eds-0.556
of-0.534
 us-0.527
st-0.496
ie-0.496
Token but
Token ID475
Feature activation+7.11e+00

Pos logit contributions
 determined+0.069
 curious+0.068
 confident+0.067
 hopeful+0.065
 thirsty+0.065
Neg logit contributions
 is-0.072
 us-0.071
eds-0.070
w-0.063
st-0.063
Token also
Token ID635
Feature activation+7.43e+00

Pos logit contributions
 nervous+0.295
 confident+0.288
 unsure+0.285
 determined+0.279
 puzzled+0.274
Neg logit contributions
w-0.360
 is-0.353
ie-0.347
eds-0.345
 us-0.330
Token excited
Token ID6568
Feature activation+1.94e+01

Pos logit contributions
 determined+0.230
 nervous+0.214
 unsure+0.209
dream+0.207
 sorry+0.207
Neg logit contributions
eds-0.459
w-0.456
e-0.437
 is-0.436
st-0.435
Token.
Token ID13
Feature activation-3.01e-01

Pos logit contributions
 determined+1.074
 curious+1.041
 nervous+1.029
 confused+1.008
 sorry+1.002
Neg logit contributions
eds-1.090
w-1.017
ara-1.008
ating-1.001
of-0.994
TokenEveryone
Token ID16190
Feature activation-1.19e+01

Pos logit contributions
eds+0.168
w+0.153
ating+0.147
icer+0.146
ara+0.141
Neg logit contributions
 curious-0.223
 determined-0.222
 nervous-0.213
 excited-0.210
 confused-0.208
Token in
Token ID287
Feature activation-2.61e+00

Pos logit contributions
eds+0.610
w+0.573
ating+0.570
ara+0.565
of+0.559
Neg logit contributions
 nervous-0.668
 curious-0.667
 hungry-0.665
 determined-0.659
 sorry-0.647
Token the
Token ID262
Feature activation-4.44e+00

Pos logit contributions
eds+0.149
w+0.140
ara+0.139
ating+0.136
of+0.134
Neg logit contributions
 curious-0.137
 determined-0.136
 nervous-0.136
 hungry-0.133
 confused-0.131
Token class
Token ID1398
Feature activation-3.42e+00

Pos logit contributions
eds+0.236
w+0.219
ating+0.213
ara+0.213
icer+0.203
Neg logit contributions
 curious-0.263
 nervous-0.259
 determined-0.258
 hungry-0.254
 confused-0.253
Token was
Token ID373
Feature activation-9.49e+00

Pos logit contributions
eds+0.181
w+0.171
ara+0.169
ating+0.168
of+0.162
Neg logit contributions
 curious-0.191
 nervous-0.189
 determined-0.188
 hungry-0.187
 confused-0.183
Token surprised
Token ID6655
Feature activation+1.36e+01

Pos logit contributions
eds+0.715
ating+0.696
ara+0.674
ma+0.669
ee+0.669
Neg logit contributions
 hungry-0.202
 nervous-0.196
 thirsty-0.193
 determined-0.192
 curious-0.192
Token.
Token ID13
Feature activation-1.92e-01

Pos logit contributions
 determined+0.801
 hungry+0.789
 thirsty+0.786
 curious+0.784
 nervous+0.783
Neg logit contributions
eds-0.673
ating-0.656
ara-0.646
of-0.633
ent-0.618
Token Was
Token ID8920
Feature activation-2.26e+00

Pos logit contributions
eds+0.011
ara+0.011
of+0.010
ating+0.010
w+0.010
Neg logit contributions
 hungry-0.009
 determined-0.009
 curious-0.009
 nervous-0.009
 thirsty-0.009
Token it
Token ID340
Feature activation+2.63e+00

Pos logit contributions
eds+0.151
w+0.138
ara+0.136
ating+0.136
icer+0.134
Neg logit contributions
 curious-0.104
 determined-0.103
 nervous-0.102
 hungry-0.098
 confused-0.098
Token okay
Token ID8788
Feature activation+5.28e+00

Pos logit contributions
 curious+0.123
 determined+0.121
 nervous+0.121
 hungry+0.118
 confused+0.117
Neg logit contributions
eds-0.169
w-0.159
ara-0.156
ating-0.155
of-0.150
Token to
Token ID284
Feature activation+1.42e+00

Pos logit contributions
 curious+0.371
 nervous+0.360
 determined+0.357
 hungry+0.354
 excited+0.351
Neg logit contributions
eds-0.227
w-0.214
icer-0.200
ara-0.198
of-0.192
Token look
Token ID804
Feature activation+2.44e+00

Pos logit contributions
 curious+0.102
 determined+0.101
 nervous+0.101
 hungry+0.099
 confused+0.098
Neg logit contributions
eds-0.104
w-0.098
ara-0.096
ating-0.095
of-0.091
Token!
Token ID0
Feature activation-5.36e+00

Pos logit contributions
 curious+0.122
 determined+0.120
 nervous+0.120
 hungry+0.117
 confused+0.117
Neg logit contributions
eds-0.150
w-0.141
ara-0.138
ating-0.137
of-0.132
Token The
Token ID383
Feature activation-2.22e+00

Pos logit contributions
eds+0.324
ara+0.307
w+0.306
of+0.293
ating+0.292
Neg logit contributions
 determined-0.265
 curious-0.257
 nervous-0.254
 hungry-0.249
 unsure-0.247
Token bird
Token ID6512
Feature activation-1.46e+00

Pos logit contributions
eds+0.107
w+0.101
ara+0.096
ating+0.094
st+0.091
Neg logit contributions
 determined-0.137
 curious-0.136
 nervous-0.136
 unsure-0.132
 hungry-0.132
Token is
Token ID318
Feature activation-1.57e+01

Pos logit contributions
eds+0.044
ara+0.042
ators+0.039
icer+0.039
ating+0.038
Neg logit contributions
 curious-0.119
 nervous-0.118
 hungry-0.117
 confused-0.116
 determined-0.116
Token scared
Token ID12008
Feature activation+1.73e+01

Pos logit contributions
eds+1.093
w+0.976
 us+0.973
st+0.951
 is+0.926
Neg logit contributions
 determined-0.548
 confident-0.509
 unsure-0.501
 curious-0.472
 nervous-0.470
Token.
Token ID13
Feature activation+1.84e-02

Pos logit contributions
 curious+1.005
 nervous+0.994
 determined+0.977
 hungry+0.973
 confused+0.963
Neg logit contributions
eds-0.925
w-0.866
ara-0.850
ating-0.847
icer-0.819
Token Can
Token ID1680
Feature activation+1.75e+00

Pos logit contributions
 determined+0.001
 confident+0.001
 unsure+0.001
 nervous+0.001
 hungry+0.001
Neg logit contributions
eds-0.001
ara-0.001
of-0.001
 is-0.001
w-0.001
Token we
Token ID356
Feature activation-6.26e+00

Pos logit contributions
 curious+0.102
 nervous+0.101
 determined+0.100
 hungry+0.099
 confused+0.099
Neg logit contributions
eds-0.095
ating-0.086
w-0.085
ara-0.084
icer-0.082
Token help
Token ID1037
Feature activation+3.75e+00

Pos logit contributions
eds+0.333
w+0.309
ara+0.302
ating+0.300
icer+0.288
Neg logit contributions
 curious-0.367
 nervous-0.362
 determined-0.361
 hungry-0.355
 confused-0.353
Token it
Token ID340
Feature activation+2.90e+00

Pos logit contributions
 curious+0.197
 nervous+0.197
 hungry+0.193
 confused+0.187
 excited+0.187
Neg logit contributions
icer-0.220
ara-0.210
ating-0.210
ase-0.209
eds-0.208
Token.
Token ID13
Feature activation-4.39e-01

Pos logit contributions
eds+0.024
w+0.022
ara+0.022
ating+0.021
st+0.021
Neg logit contributions
 determined-0.027
 curious-0.027
 nervous-0.027
 hungry-0.026
 confused-0.026
Token She
Token ID1375
Feature activation+9.77e+00

Pos logit contributions
ara+0.029
eds+0.028
of+0.027
 is+0.026
w+0.026
Neg logit contributions
 determined-0.019
 curious-0.017
 hungry-0.017
 unsure-0.017
 confident-0.017
Token looked
Token ID3114
Feature activation+4.75e+00

Pos logit contributions
 determined+0.511
 curious+0.501
 nervous+0.499
 sorry+0.495
 hungry+0.495
Neg logit contributions
eds-0.553
w-0.522
ara-0.511
of-0.484
st-0.481
Token everywhere
Token ID8347
Feature activation+5.62e+00

Pos logit contributions
 determined+0.284
 ashamed+0.270
 sorry+0.270
 nervous+0.269
 unsure+0.268
Neg logit contributions
eds-0.216
ie-0.192
 us-0.186
 is-0.182
w-0.182
Token but
Token ID475
Feature activation+6.98e+00

Pos logit contributions
 determined+0.295
 curious+0.284
 unsure+0.280
 confused+0.278
 nervous+0.278
Neg logit contributions
eds-0.309
w-0.289
of-0.286
st-0.279
ara-0.271
Token couldn
Token ID3521
Feature activation+4.24e+00

Pos logit contributions
 curious+0.297
 determined+0.294
 nervous+0.292
 hungry+0.288
 confused+0.280
Neg logit contributions
eds-0.458
w-0.449
ara-0.433
ating-0.424
of-0.422
Token't
Token ID470
Feature activation+9.81e-01

Pos logit contributions
 nervous+0.199
 curious+0.196
 determined+0.196
 hungry+0.191
 confused+0.190
Neg logit contributions
eds-0.277
w-0.255
ating-0.254
ara-0.248
icer-0.243
Token find
Token ID1064
Feature activation-1.76e+00

Pos logit contributions
 determined+0.062
 curious+0.061
 hungry+0.060
 nervous+0.060
 confused+0.060
Neg logit contributions
eds-0.045
ara-0.041
of-0.040
w-0.040
st-0.039
Token it
Token ID340
Feature activation+2.43e+00

Pos logit contributions
eds+0.111
w+0.104
ara+0.102
ating+0.102
of+0.098
Neg logit contributions
 curious-0.085
 nervous-0.083
 determined-0.083
 hungry-0.082
 confused-0.081
Token.
Token ID13
Feature activation-4.17e-01

Pos logit contributions
 curious+0.137
 determined+0.135
 nervous+0.135
 hungry+0.132
 confused+0.132
Neg logit contributions
eds-0.133
w-0.124
ara-0.122
ating-0.121
of-0.117
Token
Token ID220
Feature activation+3.48e+00

Pos logit contributions
ara+0.023
eds+0.023
w+0.021
of+0.021
icer+0.021
Neg logit contributions
 determined-0.022
 curious-0.021
 hungry-0.021
 nervous-0.021
 confused-0.021
Token it
Token ID340
Feature activation+2.40e+00

Pos logit contributions
 curious+0.405
 determined+0.402
 nervous+0.396
 hungry+0.387
 confused+0.383
Neg logit contributions
w-0.368
eds-0.359
ara-0.353
ating-0.334
of-0.330
Token is
Token ID318
Feature activation-1.61e+01

Pos logit contributions
 curious+0.200
 nervous+0.199
 hungry+0.197
 confused+0.197
 determined+0.196
Neg logit contributions
eds-0.069
ating-0.057
 occasions-0.054
w-0.053
ent-0.053
Token too
Token ID1165
Feature activation+5.23e+00

Pos logit contributions
eds+0.949
w+0.937
ara+0.878
 is+0.876
icer+0.862
Neg logit contributions
 curious-0.691
 determined-0.688
 confident-0.669
 nervous-0.665
 unsure-0.651
Token hard
Token ID1327
Feature activation+1.04e+01

Pos logit contributions
 determined+0.184
 curious+0.181
 nervous+0.176
 unsure+0.176
 sorry+0.173
Neg logit contributions
eds-0.386
w-0.386
ara-0.372
st-0.356
ee-0.355
Token.
Token ID13
Feature activation-4.29e-01

Pos logit contributions
 curious+0.627
 determined+0.623
 nervous+0.615
 confused+0.606
 hungry+0.605
Neg logit contributions
eds-0.525
w-0.481
ating-0.468
ara-0.465
st-0.458
Token She
Token ID1375
Feature activation+9.80e+00

Pos logit contributions
ara+0.030
eds+0.028
w+0.028
icer+0.027
of+0.027
Neg logit contributions
 determined-0.018
 curious-0.017
 hungry-0.016
 nervous-0.016
 confused-0.016
Token can
Token ID460
Feature activation-2.45e+00

Pos logit contributions
 curious+0.617
 nervous+0.610
 determined+0.603
 confused+0.600
 hungry+0.600
Neg logit contributions
eds-0.486
ating-0.443
w-0.437
ara-0.424
of-0.410
Token't
Token ID470
Feature activation+1.01e+00

Pos logit contributions
eds+0.126
w+0.123
ara+0.119
of+0.116
ating+0.113
Neg logit contributions
 curious-0.141
 determined-0.140
 nervous-0.137
 hungry-0.135
 confused-0.135
Token cut
Token ID2005
Feature activation-9.20e-01

Pos logit contributions
 curious+0.056
 determined+0.055
 nervous+0.055
 hungry+0.054
 confused+0.054
Neg logit contributions
eds-0.056
w-0.053
ara-0.052
ating-0.051
of-0.050
Token it
Token ID340
Feature activation+2.43e+00

Pos logit contributions
eds+0.062
w+0.057
ating+0.057
ara+0.057
of+0.055
Neg logit contributions
 curious-0.041
 nervous-0.041
 hungry-0.040
 confused-0.039
 determined-0.039
Token.
Token ID13
Feature activation-4.17e-01

Pos logit contributions
 curious+0.134
 determined+0.133
 nervous+0.132
 hungry+0.129
 confused+0.129
Neg logit contributions
eds-0.136
w-0.127
ara-0.124
ating-0.123
of-0.119
Token felt
Token ID2936
Feature activation-2.40e+00

Pos logit contributions
 curious+0.089
 nervous+0.088
 determined+0.088
 hungry+0.087
 confused+0.086
Neg logit contributions
eds-0.078
w-0.072
ara-0.072
ating-0.071
of-0.069
Token happy
Token ID3772
Feature activation+1.13e+01

Pos logit contributions
 is+0.153
 us+0.139
of+0.139
eds+0.137
e+0.136
Neg logit contributions
 determined-0.069
 curious-0.069
 unsure-0.067
 hungry-0.066
 thirsty-0.062
Token and
Token ID290
Feature activation-3.46e-01

Pos logit contributions
 determined+0.748
 thirsty+0.744
 curious+0.733
 nervous+0.732
 hungry+0.722
Neg logit contributions
 is-0.372
st-0.366
eds-0.350
ara-0.344
ating-0.337
Token proud
Token ID6613
Feature activation+1.21e+01

Pos logit contributions
 is+0.022
ating+0.021
eds+0.021
w+0.021
of+0.020
Neg logit contributions
 unsure-0.011
 determined-0.011
 hungry-0.010
 nervous-0.010
 curious-0.010
Token of
Token ID286
Feature activation-7.12e+00

Pos logit contributions
 determined+0.825
 thirsty+0.810
 curious+0.801
 nervous+0.796
 confused+0.795
Neg logit contributions
ating-0.429
st-0.429
ara-0.424
eds-0.421
of-0.409
Token herself
Token ID5223
Feature activation+1.21e+00

Pos logit contributions
eds+0.364
ating+0.363
ara+0.334
 occasions+0.328
ma+0.326
Neg logit contributions
 nervous-0.390
 curious-0.383
 hungry-0.383
 determined-0.382
 thirsty-0.381
Token.
Token ID13
Feature activation-4.37e-01

Pos logit contributions
 thirsty+0.080
 nervous+0.078
 confused+0.078
 hungry+0.077
 determined+0.077
Neg logit contributions
eds-0.045
ating-0.043
st-0.042
of-0.039
 is-0.039
Token They
Token ID1119
Feature activation+4.72e+00

Pos logit contributions
eds+0.029
ara+0.027
w+0.026
ating+0.026
of+0.026
Neg logit contributions
 curious-0.019
 determined-0.019
 nervous-0.019
 hungry-0.019
 confused-0.018
Token continued
Token ID3767
Feature activation+1.93e+00

Pos logit contributions
 curious+0.252
 nervous+0.250
 determined+0.247
 hungry+0.243
 confused+0.242
Neg logit contributions
eds-0.267
w-0.254
ara-0.247
ating-0.244
of-0.244
Token to
Token ID284
Feature activation+1.23e+00

Pos logit contributions
 curious+0.099
 nervous+0.099
 unsure+0.098
 confused+0.097
 determined+0.096
Neg logit contributions
eds-0.113
ating-0.103
st-0.099
w-0.096
ee-0.094
Token play
Token ID711
Feature activation-2.29e+00

Pos logit contributions
 curious+0.075
 nervous+0.074
 determined+0.074
 hungry+0.073
 confused+0.073
Neg logit contributions
eds-0.061
w-0.057
ara-0.055
ating-0.055
of-0.053
Token him
Token ID683
Feature activation-2.99e+00

Pos logit contributions
eds+0.163
w+0.156
ee+0.151
of+0.150
ara+0.146
Neg logit contributions
 determined-0.137
 unsure-0.129
 nervous-0.129
 curious-0.128
 confused-0.126
Token.
Token ID13
Feature activation+2.69e-01

Pos logit contributions
eds+0.161
w+0.150
ara+0.145
ating+0.145
of+0.140
Neg logit contributions
 curious-0.168
 determined-0.168
 nervous-0.166
 hungry-0.162
 confused-0.162
Token She
Token ID1375
Feature activation+1.05e+01

Pos logit contributions
 determined+0.011
 curious+0.010
 nervous+0.010
 sorry+0.010
 unsure+0.010
Neg logit contributions
ara-0.018
eds-0.018
of-0.018
ating-0.017
w-0.017
Token was
Token ID373
Feature activation-9.03e+00

Pos logit contributions
 curious+0.545
 nervous+0.533
 determined+0.533
 confused+0.521
 hungry+0.521
Neg logit contributions
eds-0.640
w-0.597
ating-0.587
ara-0.577
icer-0.571
Token only
Token ID691
Feature activation+9.62e+00

Pos logit contributions
eds+0.522
e+0.514
 is+0.502
ma+0.486
ee+0.484
Neg logit contributions
dream-0.259
 determined-0.238
 confident-0.232
 unsure-0.228
 he-0.227
Token three
Token ID1115
Feature activation+2.05e+00

Pos logit contributions
 determined+0.432
 sorry+0.420
 hungry+0.400
 unsure+0.398
 nervous+0.398
Neg logit contributions
ee-0.525
eds-0.525
st-0.517
w-0.503
ara-0.497
Token years
Token ID812
Feature activation-1.15e+00

Pos logit contributions
 curious+0.123
 determined+0.122
 nervous+0.122
 hungry+0.120
 confused+0.119
Neg logit contributions
eds-0.103
w-0.093
ara-0.093
ating-0.092
of-0.088
Token old
Token ID1468
Feature activation+2.29e+00

Pos logit contributions
eds+0.076
ara+0.075
ating+0.070
of+0.068
ee+0.067
Neg logit contributions
 determined-0.047
 unsure-0.045
 curious-0.044
 nervous-0.042
 hungry-0.042
Token and
Token ID290
Feature activation+4.65e-01

Pos logit contributions
 determined+0.139
 nervous+0.133
 sorry+0.131
 curious+0.131
 hungry+0.130
Neg logit contributions
eds-0.106
ara-0.105
ating-0.099
st-0.097
w-0.096
Token was
Token ID373
Feature activation-8.94e+00

Pos logit contributions
 determined+0.019
 thirsty+0.019
 nervous+0.019
 unsure+0.019
 hungry+0.019
Neg logit contributions
 is-0.026
w-0.025
ating-0.024
ee-0.024
eds-0.024
Token excited
Token ID6568
Feature activation+2.01e+01

Pos logit contributions
eds+0.620
 is+0.583
ee+0.578
e+0.572
ating+0.570
Neg logit contributions
 determined-0.201
 unsure-0.192
 sorry-0.190
dream-0.187
 confident-0.178
Token\n
Token ID198
Feature activation-3.64e+00

Pos logit contributions
eds+0.017
ara+0.016
w+0.015
ating+0.015
icer+0.015
Neg logit contributions
 determined-0.014
 curious-0.014
 nervous-0.014
 confused-0.013
 hungry-0.013
Token\n
Token ID198
Feature activation-3.64e+00

Pos logit contributions
eds+0.238
w+0.229
ating+0.227
ators+0.220
ara+0.217
Neg logit contributions
 curious-0.167
 determined-0.166
 excited-0.164
 nervous-0.161
 hungry-0.158
TokenThe
Token ID464
Feature activation-9.16e+00

Pos logit contributions
eds+0.203
w+0.185
ating+0.181
ara+0.178
icer+0.174
Neg logit contributions
 determined-0.211
 curious-0.203
 excited-0.198
 nervous-0.196
 hungry-0.193
Token mummy
Token ID38567
Feature activation+5.51e+00

Pos logit contributions
eds+0.467
w+0.438
ara+0.434
ating+0.423
icer+0.408
Neg logit contributions
 curious-0.542
 nervous-0.539
 determined-0.536
 confused-0.526
 hungry-0.525
Token pig
Token ID12967
Feature activation-1.79e+00

Pos logit contributions
 curious+0.256
 determined+0.248
 nervous+0.246
 hungry+0.244
 excited+0.242
Neg logit contributions
eds-0.361
w-0.344
ating-0.330
icer-0.327
ara-0.323
Token and
Token ID290
Feature activation-2.41e-01

Pos logit contributions
eds+0.097
w+0.092
ara+0.088
ating+0.087
icer+0.087
Neg logit contributions
 curious-0.102
 determined-0.100
 nervous-0.098
 hungry-0.098
 confused-0.097
Token daddy
Token ID35822
Feature activation+3.86e+00

Pos logit contributions
eds+0.015
w+0.014
ara+0.014
ating+0.014
of+0.013
Neg logit contributions
 determined-0.012
 curious-0.012
 nervous-0.011
 confused-0.011
 hungry-0.011
Token pig
Token ID12967
Feature activation-1.79e+00

Pos logit contributions
 curious+0.200
 determined+0.196
 nervous+0.194
 hungry+0.194
 excited+0.192
Neg logit contributions
eds-0.232
w-0.214
ara-0.211
ating-0.211
icer-0.206
Token enjoyed
Token ID8359
Feature activation-1.52e+00

Pos logit contributions
eds+0.100
w+0.095
ara+0.095
icer+0.092
ating+0.091
Neg logit contributions
 curious-0.098
 determined-0.095
 hungry-0.095
 nervous-0.095
 confused-0.093
Token their
Token ID511
Feature activation-3.77e+00

Pos logit contributions
eds+0.080
ating+0.076
w+0.071
ara+0.070
st+0.069
Neg logit contributions
 nervous-0.086
 confused-0.084
 unsure-0.083
 curious-0.083
 determined-0.081
Token milk
Token ID7545
Feature activation-3.58e+00

Pos logit contributions
eds+0.225
w+0.208
ara+0.202
ating+0.199
of+0.197
Neg logit contributions
 curious-0.203
 determined-0.199
 nervous-0.199
 hungry-0.198
 excited-0.192
Token at
Token ID379
Feature activation-1.35e+00

Pos logit contributions
 determined+0.315
 nervous+0.297
 curious+0.294
 thirsty+0.291
 unsure+0.286
Neg logit contributions
eds-0.298
w-0.273
ara-0.269
 is-0.264
les-0.262
Token Ben
Token ID3932
Feature activation+5.56e+00

Pos logit contributions
eds+0.088
w+0.083
ating+0.081
ara+0.080
st+0.078
Neg logit contributions
 curious-0.063
 nervous-0.062
 determined-0.061
 hungry-0.060
 confused-0.060
Token and
Token ID290
Feature activation+6.33e-01

Pos logit contributions
 curious+0.327
 nervous+0.324
 determined+0.323
 hungry+0.317
 confused+0.314
Neg logit contributions
eds-0.287
w-0.261
ating-0.257
ara-0.257
of-0.249
Token said
Token ID531
Feature activation+2.35e+00

Pos logit contributions
 determined+0.035
 nervous+0.033
 curious+0.033
 unsure+0.031
 excited+0.031
Neg logit contributions
ara-0.032
eds-0.032
w-0.032
of-0.029
 is-0.029
Token:
Token ID25
Feature activation+1.01e+00

Pos logit contributions
 determined+0.130
 nervous+0.127
 unsure+0.126
 thirsty+0.125
 curious+0.125
Neg logit contributions
w-0.111
 is-0.109
ara-0.107
eds-0.104
ating-0.104
Token
Token ID198
Feature activation-2.93e+00

Pos logit contributions
 determined+0.039
 nervous+0.039
 curious+0.039
 unsure+0.037
 confused+0.037
Neg logit contributions
eds-0.070
ara-0.068
w-0.067
ating-0.065
of-0.064
Token
Token ID198
Feature activation-2.96e+00

Pos logit contributions
eds+0.195
ating+0.184
w+0.183
ators+0.180
icer+0.179
Neg logit contributions
 curious-0.133
 nervous-0.128
 determined-0.127
 sorry-0.126
 hungry-0.126
Token"
Token ID1
Feature activation-8.12e+00

Pos logit contributions
eds+0.126
ating+0.113
ators+0.111
icer+0.109
w+0.109
Neg logit contributions
 curious-0.207
 determined-0.202
 hungry-0.200
 nervous-0.200
 sorry-0.196
TokenI
Token ID40
Feature activation-5.62e+00

Pos logit contributions
eds+0.380
w+0.345
ating+0.334
ara+0.333
icer+0.320
Neg logit contributions
 curious-0.536
 nervous-0.530
 determined-0.529
 hungry-0.520
 confused-0.517
Token'm
Token ID1101
Feature activation-7.18e+00

Pos logit contributions
eds+0.286
ating+0.260
w+0.249
icer+0.244
ara+0.243
Neg logit contributions
 curious-0.360
 nervous-0.356
 determined-0.350
 hungry-0.349
 confused-0.346
Token sorry
Token ID7926
Feature activation+1.79e+01

Pos logit contributions
 us+0.482
 is+0.477
eds+0.469
w+0.469
e+0.463
Neg logit contributions
 determined-0.169
 unsure-0.148
 confident-0.142
dream-0.141
 desperate-0.138
Token to
Token ID284
Feature activation+1.56e+00

Pos logit contributions
eds+0.123
w+0.117
ating+0.115
ara+0.112
of+0.112
Neg logit contributions
 curious-0.098
 determined-0.098
 nervous-0.096
 confused-0.094
 hungry-0.094
Token be
Token ID307
Feature activation-1.00e+01

Pos logit contributions
 curious+0.086
 nervous+0.085
 hungry+0.084
 sorry+0.084
 determined+0.083
Neg logit contributions
eds-0.089
w-0.083
ara-0.080
icer-0.080
ating-0.079
Token a
Token ID257
Feature activation+2.53e+00

Pos logit contributions
eds+0.692
ating+0.655
w+0.628
st+0.596
ats+0.592
Neg logit contributions
 determined-0.346
 nervous-0.344
 unsure-0.342
 curious-0.331
 confused-0.325
Token surprise
Token ID5975
Feature activation+9.03e-01

Pos logit contributions
 unsure+0.110
 confused+0.109
 nervous+0.108
 determined+0.108
 curious+0.107
Neg logit contributions
eds-0.149
ara-0.145
w-0.144
st-0.141
ee-0.137
Token for
Token ID329
Feature activation-8.22e+00

Pos logit contributions
 curious+0.048
 determined+0.048
 nervous+0.048
 confused+0.047
 hungry+0.047
Neg logit contributions
eds-0.051
w-0.048
ara-0.047
ating-0.047
of-0.045
Token the
Token ID262
Feature activation-4.35e+00

Pos logit contributions
eds+0.523
w+0.492
ating+0.483
ara+0.482
of+0.465
Neg logit contributions
 curious-0.391
 determined-0.386
 nervous-0.386
 hungry-0.375
 confused-0.374
Token village
Token ID7404
Feature activation-9.13e-01

Pos logit contributions
eds+0.224
w+0.222
ara+0.218
ating+0.217
icer+0.208
Neg logit contributions
 nervous-0.221
 determined-0.220
 curious-0.217
 confused-0.216
 unsure-0.214
Token and
Token ID290
Feature activation+8.13e-02

Pos logit contributions
eds+0.047
ating+0.043
w+0.042
ara+0.042
st+0.041
Neg logit contributions
 determined-0.051
 curious-0.051
 nervous-0.051
 confused-0.050
 hungry-0.049
Token not
Token ID407
Feature activation+9.82e+00

Pos logit contributions
 nervous+0.004
 determined+0.004
 curious+0.004
 confused+0.004
 hungry+0.004
Neg logit contributions
eds-0.005
w-0.004
ating-0.004
 is-0.004
st-0.004
Token something
Token ID1223
Feature activation-2.61e+00

Pos logit contributions
 determined+0.489
 curious+0.483
 nervous+0.481
 confused+0.471
 unsure+0.468
Neg logit contributions
eds-0.559
w-0.512
st-0.502
ating-0.491
of-0.482
Token that
Token ID326
Feature activation+5.25e+00

Pos logit contributions
eds+0.160
w+0.152
ating+0.147
ara+0.147
st+0.144
Neg logit contributions
 determined-0.118
 nervous-0.118
 curious-0.117
 confused-0.115
 unsure-0.115
Tokenila
Token ID10102
Feature activation-9.56e+00

Pos logit contributions
eds+0.750
w+0.739
ating+0.733
st+0.683
ators+0.680
Neg logit contributions
 excited-0.836
 curious-0.834
 nervous-0.814
 hungry-0.813
 confused-0.803
Token and
Token ID290
Feature activation-3.26e-01

Pos logit contributions
eds+0.637
w+0.574
ating+0.561
icer+0.560
ators+0.554
Neg logit contributions
 curious-0.461
 excited-0.447
 nervous-0.437
 hungry-0.433
 confused-0.427
Token Ben
Token ID3932
Feature activation+4.55e+00

Pos logit contributions
w+0.018
ara+0.018
eds+0.017
ee+0.017
of+0.017
Neg logit contributions
 determined-0.017
 nervous-0.016
 unsure-0.016
 confused-0.016
 curious-0.016
Token went
Token ID1816
Feature activation-1.80e-01

Pos logit contributions
 curious+0.254
 nervous+0.244
 hungry+0.244
 excited+0.242
 confused+0.239
Neg logit contributions
eds-0.275
ating-0.241
w-0.240
icer-0.232
st-0.225
Token to
Token ID284
Feature activation+1.24e+00

Pos logit contributions
eds+0.011
w+0.010
ara+0.010
ating+0.010
st+0.010
Neg logit contributions
 determined-0.009
 curious-0.009
 nervous-0.009
 unsure-0.009
 confused-0.009
Token the
Token ID262
Feature activation-4.73e+00

Pos logit contributions
 curious+0.058
 nervous+0.057
 hungry+0.056
 determined+0.055
 confused+0.054
Neg logit contributions
eds-0.083
w-0.080
ating-0.078
ara-0.075
icer-0.073
Token market
Token ID1910
Feature activation+9.00e-01

Pos logit contributions
eds+0.227
w+0.211
ating+0.202
ara+0.200
of+0.193
Neg logit contributions
 curious-0.305
 nervous-0.299
 determined-0.296
 hungry-0.296
 confused-0.292
Token with
Token ID351
Feature activation-3.51e+00

Pos logit contributions
 determined+0.055
 nervous+0.054
 sorry+0.053
 curious+0.053
 confused+0.053
Neg logit contributions
eds-0.044
ara-0.041
ating-0.040
w-0.039
st-0.039
Token Mom
Token ID11254
Feature activation+3.59e+00

Pos logit contributions
eds+0.185
ara+0.180
w+0.169
ating+0.168
of+0.167
Neg logit contributions
 determined-0.199
 nervous-0.193
 curious-0.192
 unsure-0.191
 confused-0.188
Token.
Token ID13
Feature activation-4.46e-01

Pos logit contributions
 curious+0.183
 nervous+0.178
 hungry+0.177
 determined+0.176
 excited+0.173
Neg logit contributions
eds-0.221
w-0.213
ating-0.201
ara-0.199
icer-0.193
Token They
Token ID1119
Feature activation+4.71e+00

Pos logit contributions
ara+0.027
eds+0.026
icer+0.025
ucks+0.024
of+0.024
Neg logit contributions
 determined-0.023
 unsure-0.021
 curious-0.021
 sorry-0.021
 confident-0.021
Token Ben
Token ID3932
Feature activation+4.56e+00

Pos logit contributions
ara+0.021
 is+0.017
ucks+0.017
of+0.016
eds+0.016
Neg logit contributions
 determined-0.027
 curious-0.024
 confident-0.024
 unsure-0.023
 sorry-0.022
Token does
Token ID857
Feature activation-3.04e+00

Pos logit contributions
 curious+0.355
 nervous+0.349
 determined+0.348
 confused+0.343
 hungry+0.343
Neg logit contributions
eds-0.164
w-0.146
ating-0.139
ara-0.132
of-0.128
Token not
Token ID407
Feature activation+9.43e+00

Pos logit contributions
eds+0.209
ating+0.201
w+0.200
ara+0.200
 is+0.195
Neg logit contributions
 hungry-0.100
 nervous-0.098
 unsure-0.097
 confused-0.097
 curious-0.096
Token want
Token ID765
Feature activation+2.99e+00

Pos logit contributions
 determined+0.444
 curious+0.443
 confused+0.430
 nervous+0.430
 hungry+0.428
Neg logit contributions
eds-0.580
w-0.560
ara-0.543
of-0.536
ee-0.532
Token to
Token ID284
Feature activation+1.27e+00

Pos logit contributions
 curious+0.125
 determined+0.123
 nervous+0.122
 hungry+0.119
 confused+0.118
Neg logit contributions
eds-0.208
w-0.197
ara-0.194
ating-0.192
of-0.188
Token go
Token ID467
Feature activation-4.88e+00

Pos logit contributions
 curious+0.075
 determined+0.074
 nervous+0.074
 hungry+0.073
 confused+0.072
Neg logit contributions
eds-0.066
w-0.061
ara-0.060
ating-0.059
of-0.057
Token.
Token ID13
Feature activation-3.84e-01

Pos logit contributions
eds+0.284
ara+0.257
w+0.257
ating+0.256
st+0.248
Neg logit contributions
 determined-0.256
 curious-0.255
 nervous-0.245
 unsure-0.243
 confused-0.242
Token He
Token ID679
Feature activation+8.84e+00

Pos logit contributions
ara+0.027
eds+0.024
ucks+0.024
 is+0.024
icer+0.023
Neg logit contributions
 determined-0.016
 curious-0.014
 confident-0.014
 unsure-0.014
 thirsty-0.014
Token wants
Token ID3382
Feature activation-5.15e+00

Pos logit contributions
 curious+0.615
 nervous+0.608
 determined+0.608
 hungry+0.598
 confused+0.596
Neg logit contributions
eds-0.375
w-0.339
ating-0.328
ara-0.326
of-0.310
Token to
Token ID284
Feature activation+1.29e+00

Pos logit contributions
eds+0.332
ara+0.306
w+0.304
ating+0.298
of+0.296
Neg logit contributions
 determined-0.240
 curious-0.237
 nervous-0.232
 unsure-0.227
 confused-0.226
Token watch
Token ID2342
Feature activation-3.72e+00

Pos logit contributions
 curious+0.074
 nervous+0.073
 determined+0.072
 hungry+0.071
 confused+0.070
Neg logit contributions
eds-0.071
w-0.067
ating-0.064
ara-0.064
icer-0.063
Token but
Token ID475
Feature activation+7.27e+00

Pos logit contributions
 nervous+0.086
 determined+0.085
 curious+0.085
 confused+0.083
 hungry+0.082
Neg logit contributions
eds-0.097
ating-0.090
w-0.089
ara-0.088
st-0.086
Token he
Token ID339
Feature activation+1.11e+01

Pos logit contributions
 nervous+0.279
 curious+0.273
 hungry+0.268
 determined+0.266
 confused+0.263
Neg logit contributions
eds-0.485
w-0.482
ating-0.472
ara-0.456
st-0.455
Token knew
Token ID2993
Feature activation+5.49e+00

Pos logit contributions
 curious+0.629
 determined+0.621
 nervous+0.619
 hungry+0.606
 confused+0.604
Neg logit contributions
eds-0.608
w-0.566
ara-0.553
ating-0.549
icer-0.530
Token that
Token ID326
Feature activation+5.10e+00

Pos logit contributions
 nervous+0.225
 curious+0.223
 determined+0.220
 hungry+0.217
 confused+0.215
Neg logit contributions
eds-0.377
w-0.359
ating-0.356
ara-0.352
st-0.343
Token it
Token ID340
Feature activation+2.72e+00

Pos logit contributions
 curious+0.216
 nervous+0.214
 determined+0.213
 confused+0.207
 hungry+0.206
Neg logit contributions
eds-0.345
w-0.328
ating-0.321
ara-0.320
of-0.310
Token was
Token ID373
Feature activation-9.45e+00

Pos logit contributions
 curious+0.188
 determined+0.186
 nervous+0.185
 confused+0.182
 hungry+0.181
Neg logit contributions
eds-0.120
w-0.109
ara-0.107
ating-0.102
icer-0.101
Token time
Token ID640
Feature activation-3.83e+00

Pos logit contributions
eds+0.604
ating+0.572
w+0.572
ara+0.555
st+0.547
Neg logit contributions
 nervous-0.400
 curious-0.400
 determined-0.389
 confused-0.384
 hungry-0.379
Token for
Token ID329
Feature activation-8.27e+00

Pos logit contributions
eds+0.224
w+0.210
ating+0.198
ara+0.197
of+0.196
Neg logit contributions
 nervous-0.191
 curious-0.189
 determined-0.187
 confused-0.185
 unsure-0.185
Token bed
Token ID3996
Feature activation+5.45e+00

Pos logit contributions
eds+0.489
w+0.456
ara+0.448
ating+0.446
of+0.431
Neg logit contributions
 curious-0.427
 determined-0.422
 nervous-0.421
 confused-0.410
 hungry-0.409
Token.
Token ID13
Feature activation-6.07e-02

Pos logit contributions
 curious+0.296
 determined+0.292
 nervous+0.292
 hungry+0.286
 confused+0.284
Neg logit contributions
eds-0.309
w-0.289
ara-0.283
ating-0.281
of-0.271
Token
Token ID198
Feature activation-3.40e+00

Pos logit contributions
eds+0.004
ara+0.003
w+0.003
ating+0.003
of+0.003
Neg logit contributions
 curious-0.003
 determined-0.003
 nervous-0.003
 hungry-0.003
 confused-0.003
Token take
Token ID1011
Feature activation-7.69e+00

Pos logit contributions
eds+0.514
w+0.476
ara+0.467
ating+0.461
icer+0.443
Neg logit contributions
 curious-0.546
 determined-0.536
 nervous-0.536
 hungry-0.525
 confused-0.522
Token it
Token ID340
Feature activation+2.43e+00

Pos logit contributions
eds+0.507
ara+0.463
w+0.461
ating+0.459
icer+0.453
Neg logit contributions
 curious-0.361
 determined-0.352
 nervous-0.351
 hungry-0.342
 excited-0.341
Token home
Token ID1363
Feature activation-2.39e+00

Pos logit contributions
 curious+0.121
 nervous+0.120
 determined+0.119
 hungry+0.118
 confused+0.115
Neg logit contributions
eds-0.147
ara-0.139
w-0.137
ating-0.136
icer-0.133
Token and
Token ID290
Feature activation-3.05e-01

Pos logit contributions
eds+0.118
w+0.108
ara+0.107
ating+0.105
icer+0.102
Neg logit contributions
 curious-0.149
 nervous-0.146
 determined-0.146
 hungry-0.144
 confused-0.143
Token show
Token ID905
Feature activation-1.13e+00

Pos logit contributions
eds+0.015
w+0.014
ara+0.014
ating+0.014
of+0.013
Neg logit contributions
 curious-0.018
 nervous-0.018
 determined-0.018
 hungry-0.018
 confused-0.018
Token our
Token ID674
Feature activation-1.49e+01

Pos logit contributions
eds+0.076
icer+0.074
w+0.070
ara+0.070
ating+0.069
Neg logit contributions
 curious-0.052
 sorry-0.051
 determined-0.049
 excited-0.049
 nervous-0.048
Token parents
Token ID3397
Feature activation+1.92e+00

Pos logit contributions
eds+0.912
w+0.862
ara+0.846
ating+0.834
of+0.809
Neg logit contributions
 curious-0.767
 determined-0.756
 nervous-0.749
 hungry-0.734
 confused-0.729
Token!"
Token ID2474
Feature activation-1.21e+01

Pos logit contributions
 curious+0.115
 determined+0.111
 excited+0.110
 nervous+0.110
 hungry+0.110
Neg logit contributions
eds-0.099
w-0.093
ara-0.091
icer-0.089
ating-0.089
Token Lucy
Token ID22162
Feature activation-2.79e-01

Pos logit contributions
ara+0.700
w+0.690
of+0.672
eds+0.648
ee+0.646
Neg logit contributions
 determined-0.559
 curious-0.551
 nervous-0.540
 hungry-0.534
 confused-0.524
Token said
Token ID531
Feature activation+1.43e+00

Pos logit contributions
eds+0.015
w+0.015
of+0.015
ara+0.015
ating+0.014
Neg logit contributions
 hungry-0.014
 curious-0.014
 nervous-0.014
 sorry-0.014
 determined-0.014
Token.
Token ID13
Feature activation-4.20e-01

Pos logit contributions
 thirsty+0.092
 hungry+0.091
 bored+0.088
 nervous+0.086
 fearful+0.086
Neg logit contributions
ating-0.046
of-0.046
eds-0.045
ent-0.045
e-0.044
Token made
Token ID925
Feature activation+2.39e-01

Pos logit contributions
eds+0.640
w+0.598
ating+0.595
ara+0.588
icer+0.579
Neg logit contributions
 curious-0.672
 nervous-0.661
 hungry-0.656
 determined-0.647
 confused-0.642
Token."
Token ID526
Feature activation-6.76e+00

Pos logit contributions
 curious+0.014
 nervous+0.014
 hungry+0.014
 sorry+0.014
 excited+0.014
Neg logit contributions
eds-0.013
ara-0.012
w-0.012
icer-0.012
of-0.012
Token
Token ID198
Feature activation-3.76e+00

Pos logit contributions
eds+0.327
ara+0.327
w+0.304
of+0.297
ating+0.288
Neg logit contributions
 determined-0.415
 curious-0.410
 nervous-0.402
 confused-0.396
 hungry-0.394
Token
Token ID198
Feature activation-3.76e+00

Pos logit contributions
eds+0.243
ating+0.235
w+0.235
ators+0.230
icer+0.224
Neg logit contributions
 curious-0.181
 nervous-0.174
 excited-0.174
 hungry-0.174
 determined-0.170
TokenS
Token ID50
Feature activation-1.28e+01

Pos logit contributions
eds+0.200
ating+0.186
ators+0.184
w+0.177
icer+0.175
Neg logit contributions
 curious-0.224
 determined-0.220
 hungry-0.218
 excited-0.217
 nervous-0.215
Tokenara
Token ID3301
Feature activation-2.11e+01

Pos logit contributions
eds+0.444
ators+0.376
achers+0.331
rob+0.330
ating+0.319
Neg logit contributions
 hungry-0.903
 tired-0.865
 nervous-0.865
 thirsty-0.848
 excited-0.841
Token promised
Token ID8072
Feature activation+6.90e-01

Pos logit contributions
eds+1.329
w+1.251
ara+1.220
ating+1.219
icer+1.179
Neg logit contributions
 curious-1.028
 nervous-1.010
 determined-1.008
 hungry-0.985
 confused-0.981
Token.
Token ID13
Feature activation-4.31e-01

Pos logit contributions
 nervous+0.034
 curious+0.034
 determined+0.034
 confused+0.034
 unsure+0.034
Neg logit contributions
eds-0.040
ara-0.040
of-0.039
w-0.038
ating-0.037
Token She
Token ID1375
Feature activation+9.79e+00

Pos logit contributions
ara+0.030
ucks+0.027
eds+0.027
 is+0.026
of+0.026
Neg logit contributions
 determined-0.019
 unsure-0.017
 curious-0.017
 thirsty-0.017
 hungry-0.016
Token helped
Token ID4193
Feature activation+5.84e+00

Pos logit contributions
 curious+0.499
 nervous+0.498
 determined+0.495
 hungry+0.488
 confused+0.483
Neg logit contributions
eds-0.573
w-0.544
ara-0.543
ating-0.526
of-0.515
Token her
Token ID607
Feature activation-5.63e-01

Pos logit contributions
 curious+0.212
 nervous+0.210
 hungry+0.205
 confused+0.200
 excited+0.199
Neg logit contributions
eds-0.444
w-0.422
ating-0.414
icer-0.405
st-0.402
Token.
Token ID13
Feature activation-4.29e-01

Pos logit contributions
eds+0.718
w+0.672
icer+0.660
ara+0.659
ating+0.651
Neg logit contributions
 curious-0.583
 nervous-0.565
 hungry-0.560
 determined-0.543
 sorry-0.542
Token OK
Token ID7477
Feature activation+2.75e+00

Pos logit contributions
eds+0.026
w+0.024
ara+0.024
ating+0.024
of+0.023
Neg logit contributions
 curious-0.022
 determined-0.021
 nervous-0.021
 hungry-0.021
 confused-0.021
Token?"
Token ID1701
Feature activation-1.40e+01

Pos logit contributions
 curious+0.187
 nervous+0.185
 hungry+0.184
 sorry+0.181
 confused+0.179
Neg logit contributions
eds-0.125
w-0.113
icer-0.113
of-0.110
ating-0.110
Token
Token ID198
Feature activation-3.75e+00

Pos logit contributions
ara+0.651
eds+0.650
w+0.603
of+0.599
ating+0.571
Neg logit contributions
 determined-0.879
 curious-0.867
 nervous-0.848
 confused-0.837
 hungry-0.832
Token
Token ID198
Feature activation-3.75e+00

Pos logit contributions
eds+0.268
w+0.266
ating+0.266
ators+0.265
icer+0.254
Neg logit contributions
 curious-0.152
 nervous-0.146
 excited-0.145
 hungry-0.145
 sorry-0.141
TokenL
Token ID43
Feature activation-1.51e+01

Pos logit contributions
eds+0.183
ating+0.173
ators+0.170
w+0.167
icer+0.165
Neg logit contributions
 curious-0.241
 determined-0.231
 excited-0.230
 nervous-0.230
 hungry-0.229
Tokenily
Token ID813
Feature activation-1.34e+00

Pos logit contributions
eds+1.098
w+0.999
ating+0.988
icer+0.972
ators+0.956
Neg logit contributions
 sorry-0.584
 curious-0.578
 nervous-0.569
 ashamed-0.551
 excited-0.535
Token nodded
Token ID14464
Feature activation+3.36e+00

Pos logit contributions
eds+0.087
w+0.082
ara+0.082
ating+0.080
of+0.078
Neg logit contributions
 curious-0.061
 nervous-0.060
 determined-0.060
 hungry-0.059
 confused-0.058
Token.
Token ID13
Feature activation-4.05e-01

Pos logit contributions
 determined+0.209
 curious+0.201
 nervous+0.199
 thirsty+0.199
 unsure+0.197
Neg logit contributions
eds-0.152
ara-0.148
w-0.137
of-0.136
ating-0.136
Token She
Token ID1375
Feature activation+9.80e+00

Pos logit contributions
ara+0.028
eds+0.026
of+0.025
ucks+0.025
 is+0.025
Neg logit contributions
 determined-0.017
 unsure-0.015
 curious-0.015
 hungry-0.015
 thirsty-0.015
Token said
Token ID531
Feature activation+1.46e+00

Pos logit contributions
 nervous+0.505
 determined+0.503
 curious+0.502
 hungry+0.495
 confused+0.489
Neg logit contributions
eds-0.556
ara-0.538
w-0.532
ating-0.513
of-0.509
Token too
Token ID1165
Feature activation+5.81e+00

Pos logit contributions
eds+0.612
w+0.591
ating+0.577
ara+0.567
st+0.558
Neg logit contributions
 curious-0.362
 nervous-0.353
 determined-0.350
 confused-0.341
 unsure-0.337
Token dangerous
Token ID4923
Feature activation+2.52e-01

Pos logit contributions
 determined+0.206
 curious+0.200
 unsure+0.199
 nervous+0.195
 confused+0.191
Neg logit contributions
eds-0.412
w-0.406
ara-0.395
ating-0.385
st-0.382
Token.
Token ID13
Feature activation+8.39e-02

Pos logit contributions
 curious+0.015
 determined+0.015
 nervous+0.015
 confused+0.014
 hungry+0.014
Neg logit contributions
eds-0.013
w-0.012
ating-0.012
ara-0.012
st-0.011
Token
Token ID198
Feature activation-3.27e+00

Pos logit contributions
 curious+0.004
 determined+0.004
 nervous+0.004
 confused+0.004
 hungry+0.004
Neg logit contributions
ara-0.005
eds-0.005
w-0.004
icer-0.004
of-0.004
Token
Token ID198
Feature activation-3.31e+00

Pos logit contributions
eds+0.235
ating+0.224
w+0.224
ators+0.216
ee+0.208
Neg logit contributions
 curious-0.129
 excited-0.127
 determined-0.126
 nervous-0.126
 hungry-0.124
TokenL
Token ID43
Feature activation-1.47e+01

Pos logit contributions
eds+0.177
ating+0.157
w+0.157
ators+0.149
ee+0.148
Neg logit contributions
 determined-0.201
 curious-0.197
 nervous-0.191
 hungry-0.189
 excited-0.189
Tokenily
Token ID813
Feature activation-9.28e-01

Pos logit contributions
eds+0.930
w+0.854
ating+0.825
icer+0.807
st+0.800
Neg logit contributions
 curious-0.718
 nervous-0.704
 determined-0.695
 hungry-0.695
 excited-0.684
Token felt
Token ID2936
Feature activation-1.99e+00

Pos logit contributions
eds+0.049
w+0.046
ara+0.045
ating+0.045
of+0.043
Neg logit contributions
 curious-0.053
 nervous-0.053
 determined-0.052
 hungry-0.052
 confused-0.051
Token sad
Token ID6507
Feature activation+1.28e+01

Pos logit contributions
 is+0.133
 us+0.130
w+0.127
eds+0.122
le+0.122
Neg logit contributions
 determined-0.046
 curious-0.045
 unsure-0.042
 nervous-0.041
 eager-0.039
Token,
Token ID11
Feature activation+1.86e+00

Pos logit contributions
 determined+0.788
 curious+0.762
 thirsty+0.758
 hungry+0.741
 nervous+0.732
Neg logit contributions
eds-0.563
ara-0.561
 is-0.541
w-0.534
st-0.532
Token but
Token ID475
Feature activation+7.34e+00

Pos logit contributions
 curious+0.078
 determined+0.075
 thirsty+0.072
 hungry+0.072
 nervous+0.071
Neg logit contributions
 is-0.088
ara-0.088
 us-0.086
w-0.086
of-0.082
Token excited
Token ID6568
Feature activation+1.93e+01

Pos logit contributions
eds+0.021
 is+0.021
w+0.021
ating+0.020
ee+0.020
Neg logit contributions
 unsure-0.008
 determined-0.008
 confused-0.007
 nervous-0.007
 hungry-0.006
Token again
Token ID757
Feature activation-6.96e+00

Pos logit contributions
 curious+1.065
 nervous+1.050
 determined+1.045
 hungry+1.029
 confused+1.022
Neg logit contributions
eds-1.097
w-1.028
ara-1.001
ating-0.994
icer-0.963
Token.
Token ID13
Feature activation-3.60e-01

Pos logit contributions
eds+0.382
w+0.356
ara+0.348
ating+0.345
of+0.333
Neg logit contributions
 curious-0.393
 determined-0.388
 nervous-0.388
 hungry-0.380
 confused-0.378
Token<|endoftext|>
Token ID50256
Feature activation-4.77e+00

Pos logit contributions
eds+0.021
ara+0.020
w+0.020
ating+0.019
of+0.019
Neg logit contributions
 determined-0.019
 curious-0.019
 nervous-0.018
 hungry-0.018
 confused-0.018
TokenTom
Token ID13787
Feature activation-1.01e+01

Pos logit contributions
eds+0.262
angles+0.240
w+0.230
 occasions+0.227
ation+0.227
Neg logit contributions
 hungry-0.274
 curious-0.263
 nervous-0.259
 excited-0.259
 thirsty-0.242
Token is
Token ID318
Feature activation-1.61e+01

Pos logit contributions
eds+0.477
 occasions+0.458
works+0.453
ymes+0.452
ating+0.448
Neg logit contributions
 hungry-0.630
 curious-0.622
 excited-0.612
 nervous-0.603
 eager-0.594
Token a
Token ID257
Feature activation+2.26e+00

Pos logit contributions
ee+0.860
w+0.826
eds+0.798
ara+0.792
ating+0.787
Neg logit contributions
 determined-0.701
 ashamed-0.701
 unsure-0.692
 nervous-0.657
 confused-0.656
Token big
Token ID1263
Feature activation+7.05e-01

Pos logit contributions
 unsure+0.104
 determined+0.102
 confused+0.096
 nervous+0.095
 ashamed+0.093
Neg logit contributions
ara-0.136
ee-0.128
eds-0.126
w-0.122
icer-0.121
Token boy
Token ID2933
Feature activation-1.02e+00

Pos logit contributions
 determined+0.041
 curious+0.040
 nervous+0.040
 unsure+0.039
 confused+0.039
Neg logit contributions
eds-0.035
ara-0.034
w-0.033
ee-0.032
icer-0.032
Token who
Token ID508
Feature activation-1.14e+00

Pos logit contributions
eds+0.062
w+0.058
ara+0.056
ating+0.056
icer+0.055
Neg logit contributions
 curious-0.053
 nervous-0.051
 hungry-0.051
 determined-0.051
 confused-0.050
Token likes
Token ID7832
Feature activation-9.36e+00

Pos logit contributions
eds+0.050
w+0.046
ara+0.045
ating+0.044
of+0.042
Neg logit contributions
 curious-0.076
 nervous-0.075
 determined-0.075
 hungry-0.074
 confused-0.074