TOP ACTIVATIONS
MAX = 16.674

and forth , back and forth . She had so much
He shook it back and forth and watched as the birds
. She rocks back and forth , back and forth ,
and forth , back and forth , and makes horse noises
bouncing the ball back and forth . They laughed and played
. She goes back and forth and makes many lines .
take the soup back and forth to the kitchen . From
started to sway back and forth and the junk began to
swing would go back and forth in a fun motion .
and rocks it back and forth . She thinks it is
hug . From that day forth , Molly and Daisy were
moved the stick back and forth and soon he was striking
He moved it back and forth and he made strange patterns
began to swing back and forth . The two friends had
rock the ship back and forth , but the pirate was
. It hopped back and forth , collecting a few items
and he swings back and forth . He feels the wind
and Bunny rocked back and forth together and made a big
roll a ball back and forth . The candle wanted to
threw the ball back and forth until they were too tired

MIN ACTIVATIONS
MIN = -21.172

and said it was their fault . Mom said they were
it was Tim my 's fault but Tim my felt guilty
awn 's fault . The
felt like it was her fault , like she had done
told her it was her fault because she dropped her toy
" It 's your fault !" Mia blamed Sam .
realised it wasn 't her fault after all . <|endoftext|> Once
saying that it was his fault that he forgot the folder
it was all Joe 's fault . Joe tried to talk
. It 's not your fault . The rope was old
said it was not their fault . They said the man
realized it was his own fault for not val uing her
them it was all their fault . His friends were so
knew it was his own fault . He would have to
" It 's your own fault , you should 've been
and it was his own fault . Johnny smiled
He realized it was his fault and he felt regret .
" It 's not your fault , honey . Ben made
No , it 's your fault , Lily !" Ben said
" It 's not my fault . I can 't help

INTERVAL 7.213 - 16.674
CONTAINS 4.617%

must promise me not to take any away ."
the store . <|endoftext|> Once upon a time , there was
dinner together . <|endoftext|> Once upon a time , there was
girl was so en vious of the creatures enjoying themselves in
away . Later that day , the mouse

INTERVAL -2.249 - 7.213
CONTAINS 69.102%

about the distant shark . From that day on , Tim
. She wanted to stay dry so she started to cry
shall we go ?" and her dog bark ed yes .
said , " Ben , I am glad . You are
he found the egg . He knew it was the right

INTERVAL -11.710 - -2.249
CONTAINS 25.832%

, her brother poured the oil on the wheels of the
sunny day . <|endoftext|> Jack was walking in the park .
?" " Of course , Tim my . But
. She fell to the ground . Her dress got dirty
time . <|endoftext|> Once upon a time , there was a

INTERVAL -21.172 - -11.710
CONTAINS 0.449%

ice palace . Please help me
the mother : Let '
go on the road without an adult . The automobiles can
found a big jar of pepper on the counter . They
He would bring food to an old mouse who lived down
Token and
Token ID290
Feature activation-5.74e-01

Pos logit contributions
 agree+0.049
 agreement+0.047
ies+0.042
ie+0.041
reci+0.040
Neg logit contributions
 fault-0.057
uch-0.053
ses-0.052
sed-0.052
paste-0.051
Token forth
Token ID6071
Feature activation+1.67e+01

Pos logit contributions
 fault+0.031
ses+0.029
sed+0.029
uch+0.029
paste+0.029
Neg logit contributions
 agreement-0.018
 agree-0.018
ies-0.016
ie-0.016
ashing-0.016
Token,
Token ID11
Feature activation+3.00e-01

Pos logit contributions
 agree+0.611
 agreement+0.561
ies+0.543
ie+0.516
 happy+0.503
Neg logit contributions
 fault-0.880
uch-0.842
ses-0.834
paste-0.824
sed-0.821
Token back
Token ID736
Feature activation+8.51e-01

Pos logit contributions
 agree+0.012
 agreement+0.011
ies+0.011
ie+0.011
reci+0.010
Neg logit contributions
 fault-0.015
sed-0.014
ses-0.014
paste-0.013
 tresp-0.013
Token and
Token ID290
Feature activation-7.09e-01

Pos logit contributions
 agree+0.035
 agreement+0.034
ies+0.030
ie+0.030
reci+0.029
Neg logit contributions
 fault-0.041
ses-0.037
sed-0.037
uch-0.036
paste-0.036
Token forth
Token ID6071
Feature activation+1.64e+01

Pos logit contributions
 fault+0.041
sed+0.039
ses+0.039
uch+0.038
 nothing+0.038
Neg logit contributions
 agree-0.021
 agreement-0.020
ie-0.018
ies-0.018
reci-0.018
Token.
Token ID13
Feature activation+9.91e-02

Pos logit contributions
 agree+0.604
 agreement+0.555
ies+0.538
ie+0.510
 happy+0.498
Neg logit contributions
 fault-0.862
uch-0.828
ses-0.820
paste-0.809
sed-0.806
Token She
Token ID1375
Feature activation+3.01e+00

Pos logit contributions
 agree+0.004
ies+0.004
 agreement+0.003
ie+0.003
 forth+0.003
Neg logit contributions
 fault-0.005
ses-0.005
sed-0.005
OP-0.005
paste-0.005
Token had
Token ID550
Feature activation+2.55e+00

Pos logit contributions
 agree+0.127
 agreement+0.114
ies+0.112
ie+0.108
 happy+0.105
Neg logit contributions
 fault-0.146
uch-0.139
ses-0.137
paste-0.137
sed-0.135
Token so
Token ID523
Feature activation-9.52e+00

Pos logit contributions
 happy+0.114
 agree+0.107
 play+0.100
 agreement+0.099
 family+0.094
Neg logit contributions
osaurus-0.128
uch-0.123
ses-0.123
sed-0.118
paste-0.117
Token much
Token ID881
Feature activation-4.70e-01

Pos logit contributions
 fault+0.382
uch+0.358
ses+0.339
gh+0.338
paste+0.336
Neg logit contributions
 agreement-0.462
 agree-0.462
ies-0.435
ie-0.427
 forth-0.396
Token He
Token ID679
Feature activation+1.86e+00

Pos logit contributions
 fault+0.201
ses+0.188
uch+0.188
paste+0.188
sed+0.188
Neg logit contributions
 agree-0.153
ies-0.147
 agreement-0.144
ie-0.135
ashing-0.125
Token shook
Token ID14682
Feature activation-6.01e+00

Pos logit contributions
 agree+0.074
 agreement+0.067
ies+0.065
ie+0.063
 happy+0.061
Neg logit contributions
 fault-0.094
uch-0.090
ses-0.087
paste-0.087
sed-0.086
Token it
Token ID340
Feature activation-3.00e+00

Pos logit contributions
uch+0.316
 fault+0.307
ses+0.294
paste+0.294
sed+0.289
Neg logit contributions
 agree-0.237
 agreement-0.232
 happy-0.214
ies-0.210
ie-0.205
Token back
Token ID736
Feature activation+9.10e-01

Pos logit contributions
 fault+0.179
uch+0.176
ses+0.173
 tresp+0.172
sed+0.169
Neg logit contributions
 agree-0.090
 agreement-0.083
ies-0.078
 happy-0.077
ie-0.076
Token and
Token ID290
Feature activation-5.48e-01

Pos logit contributions
 agree+0.030
 agreement+0.028
ies+0.027
ie+0.025
 happy+0.025
Neg logit contributions
 fault-0.051
uch-0.050
ses-0.049
paste-0.048
sed-0.048
Token forth
Token ID6071
Feature activation+1.64e+01

Pos logit contributions
 fault+0.026
sed+0.025
ses+0.025
uch+0.025
paste+0.025
Neg logit contributions
 agree-0.022
 agreement-0.021
ies-0.020
ie-0.019
ashing-0.018
Token and
Token ID290
Feature activation-6.73e-01

Pos logit contributions
 agree+0.513
 agreement+0.457
ies+0.450
 happy+0.430
ie+0.419
Neg logit contributions
 fault-0.951
uch-0.937
ses-0.919
sed-0.892
 tresp-0.891
Token watched
Token ID7342
Feature activation+1.54e+00

Pos logit contributions
 fault+0.031
sed+0.030
ses+0.029
uch+0.029
paste+0.029
Neg logit contributions
 agree-0.028
 agreement-0.027
ies-0.026
ie-0.025
ashing-0.023
Token as
Token ID355
Feature activation+3.75e+00

Pos logit contributions
 agree+0.063
 agreement+0.060
ies+0.056
ie+0.054
 happy+0.053
Neg logit contributions
 fault-0.075
uch-0.072
ses-0.070
sed-0.069
paste-0.069
Token the
Token ID262
Feature activation+1.35e+00

Pos logit contributions
 agree+0.152
 agreement+0.142
ies+0.135
ie+0.130
 happy+0.125
Neg logit contributions
 fault-0.184
uch-0.176
ses-0.173
sed-0.172
paste-0.171
Token birds
Token ID10087
Feature activation+8.52e-01

Pos logit contributions
 agree+0.058
 agreement+0.055
ies+0.052
ie+0.050
 happy+0.049
Neg logit contributions
 fault-0.063
uch-0.060
ses-0.059
paste-0.058
sed-0.058
Token.
Token ID13
Feature activation+8.57e-01

Pos logit contributions
 fault+0.032
uch+0.031
ses+0.030
paste+0.030
sed+0.030
Neg logit contributions
 agree-0.026
 agreement-0.024
ies-0.023
ie-0.022
 happy-0.021
Token She
Token ID1375
Feature activation+3.30e+00

Pos logit contributions
 agree+0.031
ies+0.030
 agreement+0.028
ie+0.027
 forth+0.025
Neg logit contributions
 fault-0.045
ses-0.042
 Ow-0.041
paste-0.041
 anything-0.040
Token rocks
Token ID12586
Feature activation-2.60e+00

Pos logit contributions
 agree+0.122
 agreement+0.109
ies+0.108
ie+0.104
 happy+0.101
Neg logit contributions
 fault-0.177
uch-0.167
ses-0.164
sed-0.163
paste-0.163
Token back
Token ID736
Feature activation+5.25e-01

Pos logit contributions
 fault+0.120
uch+0.113
paste+0.113
ses+0.112
sed+0.111
Neg logit contributions
 agree-0.111
 agreement-0.107
ies-0.091
ashing-0.088
 family-0.088
Token and
Token ID290
Feature activation-1.72e-01

Pos logit contributions
 agree+0.024
 agreement+0.022
ies+0.021
ie+0.020
 happy+0.020
Neg logit contributions
 fault-0.024
uch-0.022
ses-0.022
sed-0.022
paste-0.021
Token forth
Token ID6071
Feature activation+1.64e+01

Pos logit contributions
 teeth+0.009
paste+0.009
 anything+0.009
ses+0.009
 ger+0.009
Neg logit contributions
 agree-0.005
 agreement-0.005
ashing-0.005
ogged-0.005
utter-0.004
Token,
Token ID11
Feature activation+5.59e-01

Pos logit contributions
 agree+0.569
 agreement+0.521
ies+0.501
ie+0.475
 happy+0.454
Neg logit contributions
 fault-0.907
uch-0.865
ses-0.856
sed-0.847
paste-0.845
Token back
Token ID736
Feature activation+5.19e-01

Pos logit contributions
 agree+0.023
 agreement+0.021
ies+0.020
ie+0.019
reci+0.018
Neg logit contributions
 fault-0.027
ses-0.026
sed-0.026
paste-0.025
uch-0.025
Token and
Token ID290
Feature activation-1.98e-01

Pos logit contributions
 agree+0.027
 agreement+0.027
ies+0.024
ie+0.024
reci+0.023
Neg logit contributions
 fault-0.019
ses-0.017
sed-0.017
paste-0.017
uch-0.017
Token forth
Token ID6071
Feature activation+1.64e+01

Pos logit contributions
 anything+0.012
 nothing+0.012
 fault+0.012
ses+0.012
sed+0.011
Neg logit contributions
 agree-0.005
 agreement-0.005
ogged-0.004
reci-0.004
ies-0.004
Token,
Token ID11
Feature activation+5.44e-01

Pos logit contributions
 agree+0.609
 agreement+0.563
ies+0.541
ie+0.514
 happy+0.492
Neg logit contributions
 fault-0.861
uch-0.821
ses-0.812
sed-0.804
paste-0.801
Token and
Token ID290
Feature activation-1.72e-01

Pos logit contributions
 agree+0.024
 agreement+0.022
ies+0.021
ie+0.020
 happy+0.020
Neg logit contributions
 fault-0.024
uch-0.022
ses-0.022
sed-0.022
paste-0.021
Token forth
Token ID6071
Feature activation+1.64e+01

Pos logit contributions
 teeth+0.009
paste+0.009
 anything+0.009
ses+0.009
 ger+0.009
Neg logit contributions
 agree-0.005
 agreement-0.005
ashing-0.005
ogged-0.005
utter-0.004
Token,
Token ID11
Feature activation+5.59e-01

Pos logit contributions
 agree+0.569
 agreement+0.521
ies+0.501
ie+0.475
 happy+0.454
Neg logit contributions
 fault-0.907
uch-0.865
ses-0.856
sed-0.847
paste-0.845
Token back
Token ID736
Feature activation+5.19e-01

Pos logit contributions
 agree+0.023
 agreement+0.021
ies+0.020
ie+0.019
reci+0.018
Neg logit contributions
 fault-0.027
ses-0.026
sed-0.026
paste-0.025
uch-0.025
Token and
Token ID290
Feature activation-1.98e-01

Pos logit contributions
 agree+0.027
 agreement+0.027
ies+0.024
ie+0.024
reci+0.023
Neg logit contributions
 fault-0.019
ses-0.017
sed-0.017
paste-0.017
uch-0.017
Token forth
Token ID6071
Feature activation+1.64e+01

Pos logit contributions
 anything+0.012
 nothing+0.012
 fault+0.012
ses+0.012
sed+0.011
Neg logit contributions
 agree-0.005
 agreement-0.005
ogged-0.004
reci-0.004
ies-0.004
Token,
Token ID11
Feature activation+5.44e-01

Pos logit contributions
 agree+0.609
 agreement+0.563
ies+0.541
ie+0.514
 happy+0.492
Neg logit contributions
 fault-0.861
uch-0.821
ses-0.812
sed-0.804
paste-0.801
Token and
Token ID290
Feature activation-1.95e-01

Pos logit contributions
 agree+0.023
 agreement+0.021
ies+0.020
ie+0.020
reci+0.019
Neg logit contributions
 fault-0.026
ses-0.024
sed-0.024
uch-0.023
paste-0.023
Token makes
Token ID1838
Feature activation-3.01e+00

Pos logit contributions
 fault+0.009
ses+0.009
paste+0.009
sed+0.009
 nothing+0.008
Neg logit contributions
 agree-0.008
 agreement-0.007
ies-0.007
ie-0.007
ogged-0.007
Token horse
Token ID8223
Feature activation-6.30e-01

Pos logit contributions
uch+0.153
 fault+0.151
ses+0.148
sed+0.148
gency+0.145
Neg logit contributions
 agree-0.117
 agreement-0.108
ies-0.107
ie-0.103
 happy-0.102
Token noises
Token ID26782
Feature activation-1.38e+00

Pos logit contributions
 fault+0.028
paste+0.026
uch+0.025
 tresp+0.025
ze+0.025
Neg logit contributions
 agree-0.027
 agreement-0.026
reci-0.023
ie-0.023
 particular-0.023
Token bouncing
Token ID30713
Feature activation+1.64e+00

Pos logit contributions
 fault+0.066
uch+0.065
ses+0.064
sed+0.064
paste+0.063
Neg logit contributions
 agree-0.043
 agreement-0.039
ies-0.038
ie-0.036
 happy-0.035
Token the
Token ID262
Feature activation+1.86e+00

Pos logit contributions
 agree+0.050
ie+0.049
ies+0.049
 agreement+0.047
 happy+0.045
Neg logit contributions
 fault-0.097
uch-0.094
 tresp-0.092
gency-0.091
paste-0.090
Token ball
Token ID2613
Feature activation-1.74e+00

Pos logit contributions
 agree+0.073
ies+0.071
 agreement+0.071
ie+0.068
 happy+0.067
Neg logit contributions
 fault-0.089
uch-0.084
gency-0.084
paste-0.084
OP-0.083
Token back
Token ID736
Feature activation+9.04e-01

Pos logit contributions
 fault+0.091
gency+0.088
 tresp+0.087
paste+0.087
ses+0.087
Neg logit contributions
 agree-0.063
ie-0.058
ies-0.057
 agreement-0.055
 happy-0.053
Token and
Token ID290
Feature activation-6.55e-01

Pos logit contributions
 agree+0.036
 agreement+0.033
ies+0.032
ie+0.031
 happy+0.029
Neg logit contributions
 fault-0.045
uch-0.043
ses-0.042
paste-0.042
sed-0.042
Token forth
Token ID6071
Feature activation+1.64e+01

Pos logit contributions
 fault+0.039
sed+0.037
ses+0.037
uch+0.036
 tresp+0.036
Neg logit contributions
 agree-0.018
 agreement-0.018
ies-0.015
reci-0.015
 particular-0.014
Token.
Token ID13
Feature activation+1.49e-01

Pos logit contributions
 agree+0.581
 agreement+0.527
ies+0.514
 happy+0.490
ie+0.484
Neg logit contributions
 fault-0.880
uch-0.856
ses-0.845
paste-0.839
 tresp-0.834
Token They
Token ID1119
Feature activation+3.83e+00

Pos logit contributions
 agree+0.006
ies+0.005
 agreement+0.005
ie+0.005
reci+0.004
Neg logit contributions
 fault-0.008
sed-0.007
ses-0.007
paste-0.007
OP-0.007
Token laughed
Token ID13818
Feature activation-3.12e+00

Pos logit contributions
 agree+0.138
 agreement+0.136
ies+0.132
ie+0.123
ashing+0.118
Neg logit contributions
 fault-0.192
uch-0.182
ses-0.181
sed-0.180
 tresp-0.179
Token and
Token ID290
Feature activation-5.72e-01

Pos logit contributions
 fault+0.183
uch+0.180
ses+0.173
paste+0.173
 tresp+0.173
Neg logit contributions
 agree-0.100
 agreement-0.091
ies-0.085
 happy-0.081
ie-0.079
Token played
Token ID2826
Feature activation+4.30e+00

Pos logit contributions
 fault+0.029
ses+0.028
sed+0.028
uch+0.027
paste+0.027
Neg logit contributions
 agree-0.022
 agreement-0.021
ies-0.019
ie-0.019
reci-0.017
Token.
Token ID13
Feature activation+5.04e-01

Pos logit contributions
 agree+0.150
 agreement+0.136
ies+0.130
ie+0.127
 happy+0.121
Neg logit contributions
 fault-0.215
uch-0.208
ses-0.206
sed-0.205
paste-0.201
Token She
Token ID1375
Feature activation+3.29e+00

Pos logit contributions
 agree+0.016
 agreement+0.014
ies+0.014
ie+0.013
 happy+0.012
Neg logit contributions
 fault-0.029
ses-0.028
uch-0.028
sed-0.028
paste-0.027
Token goes
Token ID2925
Feature activation+1.67e+00

Pos logit contributions
 agree+0.159
 agreement+0.141
ies+0.139
 play+0.135
 happy+0.135
Neg logit contributions
 fault-0.139
uch-0.135
gh-0.134
paste-0.128
osaurus-0.128
Token back
Token ID736
Feature activation+5.05e-01

Pos logit contributions
 agree+0.081
 agreement+0.077
ies+0.073
ie+0.071
reci+0.068
Neg logit contributions
 fault-0.069
uch-0.064
ses-0.063
sed-0.063
paste-0.062
Token and
Token ID290
Feature activation-5.18e-01

Pos logit contributions
 agree+0.025
 agreement+0.024
ies+0.023
ie+0.022
 happy+0.021
Neg logit contributions
 fault-0.020
uch-0.019
ses-0.019
sed-0.018
paste-0.018
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 fault+0.021
ses+0.021
sed+0.021
gency+0.020
 tresp+0.020
Neg logit contributions
 agree-0.024
 agreement-0.023
ies-0.021
ie-0.021
ashing-0.021
Token and
Token ID290
Feature activation-5.92e-01

Pos logit contributions
 agree+0.541
 agreement+0.494
ies+0.471
ie+0.443
 happy+0.428
Neg logit contributions
 fault-0.922
uch-0.887
ses-0.873
paste-0.865
sed-0.864
Token makes
Token ID1838
Feature activation-2.82e+00

Pos logit contributions
 fault+0.030
ses+0.030
sed+0.029
gency+0.028
 tresp+0.028
Neg logit contributions
 agree-0.021
 agreement-0.020
ies-0.019
ie-0.019
ashing-0.018
Token many
Token ID867
Feature activation+3.60e+00

Pos logit contributions
uch+0.147
 fault+0.142
sed+0.138
gency+0.138
ses+0.138
Neg logit contributions
 agree-0.108
 agreement-0.100
ies-0.095
 happy-0.095
ie-0.091
Token lines
Token ID3951
Feature activation-2.93e+00

Pos logit contributions
 agree+0.170
 agreement+0.157
ashing+0.153
eline+0.151
ogged+0.148
Neg logit contributions
 fault-0.137
 tresp-0.132
 teeth-0.130
 ger-0.129
 anything-0.129
Token.
Token ID13
Feature activation+5.06e-01

Pos logit contributions
 fault+0.146
uch+0.139
ses+0.138
sed+0.137
paste+0.136
Neg logit contributions
 agree-0.115
 agreement-0.107
ies-0.103
ie-0.098
 happy-0.094
Token take
Token ID1011
Feature activation+9.36e+00

Pos logit contributions
 fault+0.291
uch+0.284
 tresp+0.276
ses+0.274
gh+0.274
Neg logit contributions
 agree-0.170
ies-0.159
 agreement-0.158
ie-0.145
ashing-0.144
Token the
Token ID262
Feature activation+1.59e+00

Pos logit contributions
 agree+0.337
 happy+0.336
 agreement+0.322
 play+0.316
 family+0.312
Neg logit contributions
ses-0.517
sed-0.511
 fault-0.484
uch-0.469
gency-0.466
Token soup
Token ID17141
Feature activation-8.34e+00

Pos logit contributions
 agree+0.064
 agreement+0.062
ies+0.059
 happy+0.059
ie+0.059
Neg logit contributions
 fault-0.076
ses-0.073
gency-0.072
uch-0.072
sed-0.072
Token back
Token ID736
Feature activation+6.45e-01

Pos logit contributions
 fault+0.406
uch+0.384
paste+0.377
ses+0.374
sed+0.372
Neg logit contributions
 agree-0.338
 agreement-0.318
ies-0.299
ie-0.286
 happy-0.274
Token and
Token ID290
Feature activation-5.26e-01

Pos logit contributions
 agree+0.028
 agreement+0.026
ies+0.025
ie+0.024
 happy+0.023
Neg logit contributions
 fault-0.030
uch-0.028
ses-0.028
sed-0.028
paste-0.028
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 fault+0.027
uch+0.026
paste+0.025
sed+0.025
ses+0.025
Neg logit contributions
 agree-0.019
 agreement-0.018
ies-0.016
ashing-0.016
ie-0.016
Token to
Token ID284
Feature activation-5.36e+00

Pos logit contributions
 agree+0.575
 agreement+0.528
ies+0.516
ie+0.489
 happy+0.487
Neg logit contributions
 fault-0.886
uch-0.857
ses-0.856
sed-0.836
 tresp-0.820
Token the
Token ID262
Feature activation+1.61e+00

Pos logit contributions
 fault+0.260
 anything+0.240
 tresp+0.238
uch+0.237
ses+0.236
Neg logit contributions
 agree-0.207
ies-0.201
 agreement-0.188
ashing-0.183
ie-0.178
Token kitchen
Token ID9592
Feature activation+5.56e-01

Pos logit contributions
 agree+0.060
 agreement+0.057
ies+0.053
ie+0.051
 happy+0.050
Neg logit contributions
 fault-0.084
uch-0.081
ses-0.079
paste-0.079
sed-0.078
Token.
Token ID13
Feature activation+2.27e-01

Pos logit contributions
 agree+0.022
 agreement+0.021
ies+0.020
ie+0.019
 happy+0.018
Neg logit contributions
 fault-0.028
uch-0.026
ses-0.026
paste-0.026
sed-0.026
Token From
Token ID3574
Feature activation+4.72e+00

Pos logit contributions
 agree+0.009
ies+0.009
 agreement+0.008
ie+0.008
 forth+0.007
Neg logit contributions
 fault-0.012
ses-0.011
OP-0.011
paste-0.011
uch-0.011
Token started
Token ID2067
Feature activation+2.75e+00

Pos logit contributions
 agree+0.280
 agreement+0.260
ies+0.250
ie+0.240
 happy+0.227
Neg logit contributions
 fault-0.383
uch-0.362
ses-0.358
sed-0.355
paste-0.353
Token to
Token ID284
Feature activation-5.40e+00

Pos logit contributions
ies+0.120
 agree+0.119
 happy+0.113
 agreement+0.113
ie+0.111
Neg logit contributions
uch-0.127
 fault-0.121
ses-0.115
osaurus-0.114
gency-0.112
Token sway
Token ID20009
Feature activation-1.19e+00

Pos logit contributions
 fault+0.263
uch+0.248
ses+0.245
sed+0.244
paste+0.242
Neg logit contributions
 agree-0.223
 agreement-0.209
ies-0.199
ie-0.191
 happy-0.184
Token back
Token ID736
Feature activation+5.14e-01

Pos logit contributions
 fault+0.062
uch+0.059
ses+0.059
sed+0.059
 tresp+0.058
Neg logit contributions
 agree-0.044
 agreement-0.042
ies-0.041
ie-0.040
 happy-0.038
Token and
Token ID290
Feature activation-5.22e-01

Pos logit contributions
 agree+0.019
 agreement+0.018
ies+0.017
ie+0.016
 happy+0.016
Neg logit contributions
 fault-0.027
uch-0.026
ses-0.025
sed-0.025
paste-0.025
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 fault+0.028
sed+0.028
ses+0.028
paste+0.027
 nothing+0.027
Neg logit contributions
 agree-0.017
 agreement-0.015
ies-0.014
ogged-0.014
ie-0.014
Token and
Token ID290
Feature activation-5.18e-01

Pos logit contributions
 agree+0.513
ies+0.471
 agreement+0.463
 happy+0.453
ie+0.431
Neg logit contributions
 fault-0.954
uch-0.928
ses-0.905
 tresp-0.881
osaurus-0.881
Token the
Token ID262
Feature activation+1.51e+00

Pos logit contributions
 fault+0.024
ses+0.023
sed+0.023
paste+0.023
uch+0.022
Neg logit contributions
 agree-0.022
 agreement-0.020
ies-0.020
ie-0.019
ashing-0.018
Token junk
Token ID18556
Feature activation-7.56e-01

Pos logit contributions
 agree+0.066
 agreement+0.061
ie+0.060
ies+0.059
ogged+0.056
Neg logit contributions
 fault-0.067
sed-0.066
ses-0.066
 tresp-0.063
uch-0.063
Token began
Token ID2540
Feature activation+3.43e+00

Pos logit contributions
 fault+0.039
uch+0.037
ses+0.036
osaurus+0.035
gency+0.035
Neg logit contributions
 agree-0.029
 agreement-0.027
ies-0.026
ie-0.025
 happy-0.024
Token to
Token ID284
Feature activation-5.40e+00

Pos logit contributions
 agree+0.152
ies+0.150
 agreement+0.146
 happy+0.143
ie+0.142
Neg logit contributions
uch-0.154
 fault-0.144
ses-0.143
osaurus-0.139
sed-0.137
Token swing
Token ID9628
Feature activation+4.34e+00

Pos logit contributions
 agree+0.048
 agreement+0.045
ies+0.043
ie+0.041
 happy+0.039
Neg logit contributions
 fault-0.064
uch-0.061
ses-0.060
sed-0.060
paste-0.060
Token would
Token ID561
Feature activation-2.33e+00

Pos logit contributions
 agree+0.184
 agreement+0.168
ies+0.168
ie+0.164
 happy+0.157
Neg logit contributions
 fault-0.207
ses-0.192
uch-0.191
sed-0.190
paste-0.188
Token go
Token ID467
Feature activation+6.51e+00

Pos logit contributions
 fault+0.125
uch+0.119
ses+0.118
sed+0.117
paste+0.117
Neg logit contributions
 agree-0.082
 agreement-0.076
ies-0.073
ie-0.069
 happy-0.065
Token back
Token ID736
Feature activation+7.86e-01

Pos logit contributions
 agree+0.270
 agreement+0.252
ies+0.248
ie+0.244
 happy+0.236
Neg logit contributions
 fault-0.315
ses-0.301
uch-0.300
sed-0.299
paste-0.295
Token and
Token ID290
Feature activation-2.70e-01

Pos logit contributions
 agree+0.034
 agreement+0.032
ies+0.031
ie+0.030
 happy+0.029
Neg logit contributions
 fault-0.036
uch-0.034
ses-0.034
paste-0.033
sed-0.033
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 fault+0.015
ses+0.014
uch+0.014
sed+0.014
paste+0.014
Neg logit contributions
 agree-0.008
 agreement-0.008
ies-0.007
ashing-0.007
ie-0.007
Token in
Token ID287
Feature activation+2.00e+00

Pos logit contributions
 agree+0.536
 agreement+0.484
ies+0.475
ie+0.448
 happy+0.445
Neg logit contributions
 fault-0.920
uch-0.888
ses-0.882
sed-0.865
paste-0.861
Token a
Token ID257
Feature activation-8.04e+00

Pos logit contributions
 agree+0.078
 agreement+0.073
ies+0.068
ie+0.066
 happy+0.063
Neg logit contributions
 fault-0.102
uch-0.097
ses-0.096
sed-0.096
paste-0.095
Token fun
Token ID1257
Feature activation-1.05e+00

Pos logit contributions
 fault+0.368
gh+0.351
uch+0.347
 tresp+0.345
paste+0.343
Neg logit contributions
 agree-0.330
ies-0.313
 agreement-0.297
ie-0.293
ashing-0.277
Token motion
Token ID6268
Feature activation-1.91e+00

Pos logit contributions
 fault+0.051
gh+0.050
uch+0.045
 tresp+0.045
ze+0.045
Neg logit contributions
 agree-0.039
 agreement-0.037
reci-0.036
ashing-0.034
ies-0.034
Token.
Token ID13
Feature activation+5.26e-01

Pos logit contributions
 fault+0.098
uch+0.093
ses+0.092
sed+0.091
paste+0.091
Neg logit contributions
 agree-0.073
 agreement-0.068
ies-0.065
ie-0.062
 happy-0.059
Token and
Token ID290
Feature activation-6.68e-01

Pos logit contributions
 fault+0.506
uch+0.490
sed+0.486
ses+0.485
gency+0.479
Neg logit contributions
 agree-0.307
 agreement-0.277
ies-0.269
ie-0.255
 happy-0.246
Token rocks
Token ID12586
Feature activation-2.45e+00

Pos logit contributions
 fault+0.033
gh+0.033
uch+0.032
lor+0.029
osaurus+0.029
Neg logit contributions
 agree-0.028
ies-0.026
 play-0.025
 agreement-0.025
 forth-0.024
Token it
Token ID340
Feature activation-3.53e+00

Pos logit contributions
 fault+0.122
uch+0.114
ses+0.113
paste+0.113
sed+0.112
Neg logit contributions
 agree-0.097
 agreement-0.092
ies-0.086
ie-0.083
 happy-0.079
Token back
Token ID736
Feature activation+5.60e-01

Pos logit contributions
gency+0.153
 fault+0.152
uch+0.147
 tresp+0.146
osaurus+0.144
Neg logit contributions
 agree-0.153
ies-0.150
 happy-0.145
 play-0.141
ie-0.141
Token and
Token ID290
Feature activation-6.78e-01

Pos logit contributions
 agree+0.021
ies+0.019
 agreement+0.019
ie+0.018
 happy+0.017
Neg logit contributions
uch-0.029
 fault-0.028
paste-0.028
ses-0.027
 tresp-0.027
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 fault+0.037
ses+0.035
sed+0.035
uch+0.034
paste+0.034
Neg logit contributions
 agree-0.024
 agreement-0.022
ies-0.021
ie-0.020
ashing-0.019
Token.
Token ID13
Feature activation+6.12e-01

Pos logit contributions
 agree+0.522
ies+0.466
 agreement+0.464
 happy+0.436
ie+0.423
Neg logit contributions
 fault-0.923
uch-0.917
ses-0.898
sed-0.883
paste-0.879
Token She
Token ID1375
Feature activation+3.09e+00

Pos logit contributions
 agree+0.015
 agreement+0.014
ies+0.012
ie+0.011
 happy+0.011
Neg logit contributions
 fault-0.039
uch-0.038
ses-0.037
paste-0.037
sed-0.037
Token thinks
Token ID6834
Feature activation-6.01e+00

Pos logit contributions
 agree+0.140
ies+0.122
 play+0.121
 agreement+0.117
 ask+0.116
Neg logit contributions
gh-0.145
uch-0.135
gency-0.133
 fault-0.132
OP-0.129
Token it
Token ID340
Feature activation-3.56e+00

Pos logit contributions
uch+0.249
 fault+0.249
ses+0.235
osaurus+0.234
sed+0.233
Neg logit contributions
 agree-0.288
 agreement-0.269
ies-0.263
 happy-0.258
ie-0.255
Token is
Token ID318
Feature activation-6.41e+00

Pos logit contributions
gency+0.171
sed+0.168
 tresp+0.166
ifier+0.164
paste+0.162
Neg logit contributions
 play-0.128
 agree-0.126
ie-0.124
ies-0.123
 happy-0.121
Token hug
Token ID16225
Feature activation+2.05e+00

Pos logit contributions
 agree+0.023
 agreement+0.021
ies+0.018
ie+0.017
 happy+0.016
Neg logit contributions
 fault-0.057
uch-0.055
paste-0.054
ses-0.054
sed-0.053
Token.
Token ID13
Feature activation+3.28e-02

Pos logit contributions
 agree+0.070
 agreement+0.065
ies+0.061
ie+0.059
 happy+0.056
Neg logit contributions
 fault-0.114
uch-0.113
ses-0.110
sed-0.108
 tresp-0.107
Token From
Token ID3574
Feature activation+4.63e+00

Pos logit contributions
 agree+0.001
ies+0.001
 agreement+0.001
ie+0.001
 forth+0.001
Neg logit contributions
 fault-0.002
sed-0.002
ses-0.002
paste-0.001
 Ow-0.001
Token that
Token ID326
Feature activation-1.06e+00

Pos logit contributions
 agree+0.223
 agreement+0.211
ies+0.202
ie+0.194
 happy+0.187
Neg logit contributions
 fault-0.192
uch-0.181
ses-0.178
sed-0.177
paste-0.175
Token day
Token ID1110
Feature activation+5.53e+00

Pos logit contributions
 fault+0.069
uch+0.069
gh+0.068
 tresp+0.066
ses+0.066
Neg logit contributions
 agree-0.025
ies-0.021
 agreement-0.020
ashing-0.017
ogged-0.017
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 agree+0.282
 agreement+0.263
ies+0.256
ie+0.245
 happy+0.234
Neg logit contributions
 fault-0.210
uch-0.209
paste-0.199
gh-0.199
 tresp-0.196
Token,
Token ID11
Feature activation+7.12e-02

Pos logit contributions
 agree+0.576
 agreement+0.531
ies+0.493
ie+0.467
 happy+0.443
Neg logit contributions
 fault-0.882
uch-0.838
paste-0.829
ses-0.827
sed-0.825
Token Molly
Token ID30236
Feature activation-2.65e+00

Pos logit contributions
 agree+0.003
 agreement+0.003
ies+0.003
ie+0.003
reci+0.003
Neg logit contributions
 fault-0.003
paste-0.003
 nothing-0.003
 anything-0.003
uch-0.003
Token and
Token ID290
Feature activation-6.55e-01

Pos logit contributions
 fault+0.136
ses+0.129
uch+0.129
sed+0.127
paste+0.126
Neg logit contributions
 agree-0.102
 agreement-0.092
ies-0.091
ie-0.088
 happy-0.084
Token Daisy
Token ID40355
Feature activation+1.74e+00

Pos logit contributions
 fault+0.025
paste+0.023
 tresp+0.023
ses+0.023
 nothing+0.022
Neg logit contributions
 agree-0.032
 agreement-0.029
ashing-0.028
ies-0.028
reci-0.027
Token were
Token ID547
Feature activation+2.78e-01

Pos logit contributions
 agreement+0.071
 agree+0.069
ies+0.061
ashing+0.060
ogging+0.058
Neg logit contributions
 fault-0.081
uch-0.078
 tresp-0.077
sed-0.075
osaurus-0.075
Token moved
Token ID3888
Feature activation-1.94e+00

Pos logit contributions
 agree+0.035
 agreement+0.030
ies+0.030
ie+0.028
 happy+0.028
Neg logit contributions
 fault-0.054
uch-0.051
ses-0.050
gh-0.049
paste-0.049
Token the
Token ID262
Feature activation+1.47e+00

Pos logit contributions
 fault+0.109
uch+0.107
ses+0.104
sed+0.104
 tresp+0.101
Neg logit contributions
 agree-0.069
 agreement-0.066
ies-0.060
 happy-0.058
ie-0.057
Token stick
Token ID4859
Feature activation-9.19e-01

Pos logit contributions
 agree+0.060
 agreement+0.059
 happy+0.054
ies+0.052
 family+0.051
Neg logit contributions
 fault-0.069
uch-0.067
paste-0.067
osaurus-0.066
ses-0.066
Token back
Token ID736
Feature activation+6.12e-01

Pos logit contributions
 fault+0.044
uch+0.042
paste+0.041
ses+0.041
sed+0.041
Neg logit contributions
 agree-0.037
 agreement-0.035
ies-0.033
ie-0.032
reci-0.030
Token and
Token ID290
Feature activation-5.67e-01

Pos logit contributions
 agree+0.024
 agreement+0.022
ies+0.021
ie+0.020
 happy+0.019
Neg logit contributions
 fault-0.031
uch-0.030
ses-0.029
sed-0.029
paste-0.029
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 fault+0.035
sed+0.034
paste+0.034
ses+0.034
uch+0.034
Neg logit contributions
 agree-0.014
 agreement-0.013
ies-0.011
ie-0.011
ashing-0.010
Token and
Token ID290
Feature activation-5.78e-01

Pos logit contributions
 agree+0.515
 agreement+0.460
ies+0.457
 happy+0.446
ie+0.423
Neg logit contributions
 fault-0.952
uch-0.930
ses-0.917
 tresp-0.888
sed-0.888
Token soon
Token ID2582
Feature activation+1.58e+00

Pos logit contributions
 fault+0.029
sed+0.028
ses+0.028
uch+0.028
paste+0.028
Neg logit contributions
 agree-0.021
 agreement-0.020
ies-0.019
ie-0.018
ashing-0.017
Token he
Token ID339
Feature activation+3.17e+00

Pos logit contributions
 agree+0.058
 agreement+0.053
ies+0.052
ie+0.049
 happy+0.048
Neg logit contributions
 fault-0.083
uch-0.081
ses-0.080
sed-0.079
paste-0.078
Token was
Token ID373
Feature activation-3.66e+00

Pos logit contributions
 agree+0.120
ies+0.107
 happy+0.102
ie+0.101
 play+0.098
Neg logit contributions
 fault-0.174
ses-0.162
gency-0.160
uch-0.155
 tresp-0.155
Token striking
Token ID8871
Feature activation-2.92e+00

Pos logit contributions
 fault+0.171
ses+0.164
uch+0.163
sed+0.161
paste+0.160
Neg logit contributions
 agree-0.156
 agreement-0.146
ies-0.140
 happy-0.139
ie-0.133
Token He
Token ID679
Feature activation+9.80e-01

Pos logit contributions
 agree+0.018
 agreement+0.017
ies+0.017
ie+0.016
reci+0.014
Neg logit contributions
 fault-0.025
ses-0.024
sed-0.024
paste-0.024
gency-0.024
Token moved
Token ID3888
Feature activation-2.01e+00

Pos logit contributions
 agree+0.037
 agreement+0.034
ies+0.033
ie+0.032
 happy+0.030
Neg logit contributions
 fault-0.052
uch-0.049
ses-0.048
paste-0.048
sed-0.048
Token it
Token ID340
Feature activation-3.35e+00

Pos logit contributions
 fault+0.114
uch+0.110
ses+0.108
sed+0.107
paste+0.104
Neg logit contributions
 agree-0.071
 agreement-0.068
ies-0.062
ie-0.060
 happy-0.060
Token back
Token ID736
Feature activation+5.64e-01

Pos logit contributions
 fault+0.170
uch+0.162
ses+0.160
sed+0.159
paste+0.158
Neg logit contributions
 agree-0.128
 agreement-0.119
ies-0.113
ie-0.108
 happy-0.102
Token and
Token ID290
Feature activation-3.44e-01

Pos logit contributions
 agree+0.023
 agreement+0.021
ies+0.021
ie+0.020
 happy+0.019
Neg logit contributions
 fault-0.028
uch-0.026
ses-0.026
paste-0.026
sed-0.026
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 fault+0.019
sed+0.018
ses+0.018
uch+0.018
paste+0.018
Neg logit contributions
 agree-0.011
 agreement-0.010
ies-0.009
ie-0.009
reci-0.008
Token and
Token ID290
Feature activation-3.55e-01

Pos logit contributions
 agree+0.469
 agreement+0.412
ies+0.410
 happy+0.391
ie+0.379
Neg logit contributions
 fault-1.007
uch-0.976
ses-0.960
 tresp-0.932
sed-0.930
Token he
Token ID339
Feature activation+3.16e+00

Pos logit contributions
 fault+0.017
sed+0.016
ses+0.016
uch+0.016
paste+0.016
Neg logit contributions
 agree-0.014
 agreement-0.013
ies-0.013
ie-0.012
reci-0.011
Token made
Token ID925
Feature activation-5.03e+00

Pos logit contributions
 agree+0.119
ies+0.106
 happy+0.102
 agreement+0.101
ie+0.101
Neg logit contributions
 fault-0.174
ses-0.160
uch-0.159
paste-0.157
 tresp-0.156
Token strange
Token ID6283
Feature activation-9.90e-01

Pos logit contributions
uch+0.206
 fault+0.196
ses+0.192
sed+0.189
gency+0.186
Neg logit contributions
 agree-0.257
 happy-0.249
 agreement-0.247
ies-0.232
ie-0.225
Token patterns
Token ID7572
Feature activation+1.24e+00

Pos logit contributions
 fault+0.045
uch+0.044
ses+0.042
sed+0.042
paste+0.041
Neg logit contributions
 agree-0.044
 agreement-0.042
ies-0.041
ie-0.039
 happy-0.038
Token began
Token ID2540
Feature activation+3.57e+00

Pos logit contributions
 fault+0.008
paste+0.008
uch+0.008
sed+0.008
ses+0.008
Neg logit contributions
 agree-0.007
 agreement-0.006
ies-0.006
ie-0.006
ashing-0.005
Token to
Token ID284
Feature activation-4.78e+00

Pos logit contributions
 agree+0.187
 agreement+0.178
ies+0.171
ie+0.166
 happy+0.159
Neg logit contributions
 fault-0.133
uch-0.122
ses-0.120
sed-0.120
paste-0.119
Token swing
Token ID9628
Feature activation+3.86e+00

Pos logit contributions
 fault+0.256
uch+0.245
ses+0.239
gh+0.238
 tresp+0.237
Neg logit contributions
 agree-0.152
 agreement-0.145
ies-0.145
ashing-0.137
ie-0.133
Token back
Token ID736
Feature activation+8.89e-01

Pos logit contributions
 agree+0.131
 agreement+0.120
ies+0.119
ie+0.118
 happy+0.115
Neg logit contributions
 fault-0.212
ses-0.207
uch-0.207
sed-0.205
 tresp-0.202
Token and
Token ID290
Feature activation-5.55e-01

Pos logit contributions
 agree+0.037
 agreement+0.035
ies+0.032
ie+0.031
 happy+0.029
Neg logit contributions
 fault-0.043
uch-0.040
ses-0.040
sed-0.039
paste-0.039
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 fault+0.034
ses+0.033
sed+0.033
uch+0.033
 nothing+0.032
Neg logit contributions
 agree-0.013
 agreement-0.013
ies-0.011
ie-0.011
ashing-0.011
Token.
Token ID13
Feature activation+1.15e-01

Pos logit contributions
 agree+0.589
 agreement+0.541
ies+0.524
ie+0.499
 happy+0.491
Neg logit contributions
 fault-0.864
uch-0.832
ses-0.827
sed-0.812
paste-0.812
Token The
Token ID383
Feature activation-2.16e-01

Pos logit contributions
 agree+0.004
ies+0.004
 agreement+0.004
ie+0.004
 forth+0.003
Neg logit contributions
 fault-0.006
ses-0.006
sed-0.006
paste-0.005
uch-0.005
Token two
Token ID734
Feature activation+1.10e+00

Pos logit contributions
 fault+0.011
uch+0.011
ses+0.011
sed+0.011
paste+0.011
Neg logit contributions
 agree-0.008
 agreement-0.007
ies-0.007
ie-0.007
ashing-0.006
Token friends
Token ID2460
Feature activation+1.02e+01

Pos logit contributions
 agree+0.039
 agreement+0.037
ashing+0.033
reci+0.031
eline+0.029
Neg logit contributions
 tricked-0.052
 tresp-0.052
 happens-0.052
 fault-0.051
 nothing-0.051
Token had
Token ID550
Feature activation+2.60e+00

Pos logit contributions
 agreement+0.423
 agree+0.405
ies+0.377
 particular+0.361
ashing+0.357
Neg logit contributions
 fault-0.435
uch-0.431
 tresp-0.412
sed-0.409
 tricked-0.408
Token rock
Token ID3881
Feature activation-2.88e+00

Pos logit contributions
 fault+0.260
uch+0.249
ses+0.244
paste+0.242
sed+0.241
Neg logit contributions
 agree-0.215
 agreement-0.201
ies-0.194
ie-0.183
ashing-0.175
Token the
Token ID262
Feature activation+1.39e+00

Pos logit contributions
 fault+0.170
uch+0.166
ses+0.164
sed+0.162
gency+0.160
Neg logit contributions
 agree-0.086
ies-0.081
ie-0.076
 agreement-0.075
 happy-0.072
Token ship
Token ID4074
Feature activation-1.31e+00

Pos logit contributions
 agree+0.047
 agreement+0.045
 family+0.045
 happy+0.043
ies+0.042
Neg logit contributions
 fault-0.074
paste-0.073
uch-0.071
osaurus-0.070
gh-0.070
Token back
Token ID736
Feature activation+6.03e-01

Pos logit contributions
 fault+0.073
uch+0.069
ses+0.069
sed+0.068
paste+0.068
Neg logit contributions
 agree-0.045
 agreement-0.041
ies-0.039
ie-0.037
 happy-0.036
Token and
Token ID290
Feature activation-5.19e-01

Pos logit contributions
 agree+0.026
 agreement+0.024
ies+0.022
ie+0.021
reci+0.021
Neg logit contributions
 fault-0.029
uch-0.027
ses-0.027
sed-0.026
paste-0.026
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
 fault+0.029
ses+0.028
sed+0.028
uch+0.028
paste+0.028
Neg logit contributions
 agree-0.016
 agreement-0.015
ies-0.014
ie-0.013
ashing-0.013
Token,
Token ID11
Feature activation+2.77e-01

Pos logit contributions
 agree+0.499
ies+0.455
 agreement+0.441
 happy+0.435
ie+0.416
Neg logit contributions
 fault-0.945
uch-0.926
ses-0.917
sed-0.891
paste-0.887
Token but
Token ID475
Feature activation+6.67e-01

Pos logit contributions
 agree+0.011
 agreement+0.010
ies+0.010
ie+0.010
 happy+0.009
Neg logit contributions
 fault-0.014
uch-0.013
ses-0.013
sed-0.013
paste-0.013
Token the
Token ID262
Feature activation+1.40e+00

Pos logit contributions
 agree+0.027
 agreement+0.025
ies+0.024
ie+0.023
 happy+0.022
Neg logit contributions
 fault-0.033
uch-0.032
ses-0.031
sed-0.031
paste-0.031
Token pirate
Token ID25868
Feature activation-3.45e+00

Pos logit contributions
 agree+0.051
 agreement+0.048
 family+0.047
 happy+0.046
ies+0.044
Neg logit contributions
 fault-0.074
paste-0.071
uch-0.071
gh-0.070
osaurus-0.070
Token was
Token ID373
Feature activation-3.55e+00

Pos logit contributions
 fault+0.164
ses+0.155
gency+0.151
paste+0.149
uch+0.148
Neg logit contributions
 agree-0.149
ies-0.140
ie-0.134
 agreement-0.131
 happy-0.131
Token.
Token ID13
Feature activation+2.99e-01

Pos logit contributions
 fault+0.043
uch+0.041
ses+0.040
sed+0.039
paste+0.039
Neg logit contributions
 agree-0.031
 agreement-0.028
ies-0.027
ie-0.026
 happy-0.025
Token It
Token ID632
Feature activation-4.95e+00

Pos logit contributions
 agree+0.011
 agreement+0.010
ies+0.009
ie+0.009
 happy+0.008
Neg logit contributions
 fault-0.016
uch-0.016
ses-0.015
sed-0.015
paste-0.015
Token hopped
Token ID45230
Feature activation+3.66e+00

Pos logit contributions
 fault+0.237
paste+0.231
sed+0.231
uch+0.228
ses+0.228
Neg logit contributions
 agree-0.194
 agreement-0.183
ies-0.172
ie-0.162
ashing-0.158
Token back
Token ID736
Feature activation+5.84e-01

Pos logit contributions
 agreement+0.129
 agree+0.128
ies+0.114
 happy+0.114
ie+0.114
Neg logit contributions
 fault-0.206
uch-0.197
ses-0.193
sed-0.190
osaurus-0.187
Token and
Token ID290
Feature activation-4.16e-01

Pos logit contributions
 agree+0.027
 agreement+0.026
ies+0.024
ie+0.024
 happy+0.023
Neg logit contributions
 fault-0.025
uch-0.023
ses-0.023
sed-0.023
paste-0.023
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
sed+0.020
uch+0.020
ses+0.019
paste+0.019
gency+0.019
Neg logit contributions
 agree-0.016
 agreement-0.016
ies-0.014
reci-0.013
ie-0.013
Token,
Token ID11
Feature activation+3.55e-01

Pos logit contributions
 agree+0.498
ies+0.451
 agreement+0.444
 happy+0.428
ie+0.414
Neg logit contributions
 fault-0.963
uch-0.935
ses-0.917
 tresp-0.897
paste-0.891
Token collecting
Token ID13157
Feature activation+5.55e+00

Pos logit contributions
 agree+0.015
 agreement+0.014
ies+0.013
ie+0.013
reci+0.012
Neg logit contributions
 fault-0.017
sed-0.016
ses-0.016
uch-0.016
paste-0.016
Token a
Token ID257
Feature activation-8.08e+00

Pos logit contributions
 agree+0.217
 agreement+0.212
ies+0.208
 happy+0.201
ie+0.198
Neg logit contributions
uch-0.281
 fault-0.280
ses-0.268
sed-0.267
gency-0.261
Token few
Token ID1178
Feature activation+8.35e+00

Pos logit contributions
 fault+0.386
uch+0.364
ses+0.363
sed+0.359
paste+0.353
Neg logit contributions
 agree-0.341
 agreement-0.321
ies-0.307
ie-0.294
 happy-0.286
Token items
Token ID3709
Feature activation+3.50e-01

Pos logit contributions
 agree+0.377
 agreement+0.355
ies+0.341
ie+0.328
 happy+0.316
Neg logit contributions
 fault-0.372
uch-0.351
ses-0.346
sed-0.343
paste-0.341
Token and
Token ID290
Feature activation-1.96e-01

Pos logit contributions
 agree+0.043
 agreement+0.039
ies+0.036
 happy+0.035
ie+0.033
Neg logit contributions
 fault-0.074
uch-0.073
sed-0.071
ses-0.071
 tresp-0.070
Token he
Token ID339
Feature activation+3.07e+00

Pos logit contributions
 fault+0.008
uch+0.008
gh+0.008
sed+0.008
ses+0.008
Neg logit contributions
 agree-0.009
 agreement-0.009
ies-0.008
 happy-0.008
ie-0.008
Token swings
Token ID26728
Feature activation-1.95e+00

Pos logit contributions
 agree+0.144
ies+0.128
 agreement+0.126
 play+0.123
 happy+0.123
Neg logit contributions
 fault-0.130
gh-0.127
uch-0.127
sed-0.122
 tresp-0.122
Token back
Token ID736
Feature activation+5.59e-01

Pos logit contributions
uch+0.111
 fault+0.111
paste+0.105
ses+0.105
sed+0.104
Neg logit contributions
 agree-0.066
 agreement-0.061
ies-0.056
 happy-0.056
ie-0.054
Token and
Token ID290
Feature activation-2.26e-01

Pos logit contributions
 agree+0.023
 agreement+0.021
ies+0.020
ie+0.019
 happy+0.018
Neg logit contributions
 fault-0.027
uch-0.026
ses-0.026
sed-0.026
paste-0.025
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
ses+0.013
 happens+0.013
 fault+0.013
sed+0.013
paste+0.013
Neg logit contributions
 agree-0.005
 agreement-0.005
ashing-0.004
ies-0.004
ie-0.004
Token.
Token ID13
Feature activation+9.05e-01

Pos logit contributions
 agree+0.520
 agreement+0.463
ies+0.456
 happy+0.448
ie+0.411
Neg logit contributions
 fault-0.936
uch-0.920
ses-0.904
paste-0.881
sed-0.879
Token He
Token ID679
Feature activation+1.04e+00

Pos logit contributions
 agree+0.038
 agreement+0.036
ies+0.036
ie+0.036
reci+0.031
Neg logit contributions
 fault-0.043
ses-0.041
sed-0.040
paste-0.039
uch-0.038
Token feels
Token ID5300
Feature activation-3.15e+00

Pos logit contributions
 agree+0.049
 agreement+0.044
ies+0.043
 happy+0.042
ie+0.041
Neg logit contributions
 fault-0.044
uch-0.043
gh-0.043
paste-0.041
gency-0.041
Token the
Token ID262
Feature activation+1.29e+00

Pos logit contributions
 fault+0.180
uch+0.172
ses+0.170
sed+0.170
paste+0.167
Neg logit contributions
 agree-0.105
 agreement-0.096
ies-0.089
 happy-0.084
ie-0.084
Token wind
Token ID2344
Feature activation-2.86e+00

Pos logit contributions
 agree+0.044
 agreement+0.042
ies+0.039
 happy+0.039
 family+0.036
Neg logit contributions
 fault-0.070
osaurus-0.068
paste-0.067
uch-0.066
gency-0.066
Token and
Token ID290
Feature activation-5.85e-01

Pos logit contributions
 fault+0.124
uch+0.118
ses+0.116
sed+0.115
paste+0.114
Neg logit contributions
 agree-0.088
 agreement-0.080
ies-0.077
ie-0.074
 happy-0.071
Token Bunny
Token ID28683
Feature activation-3.97e+00

Pos logit contributions
 nothing+0.017
sed+0.015
 Ow+0.015
ses+0.015
 fault+0.015
Neg logit contributions
 agree-0.034
ashing-0.031
reci-0.030
 agreement-0.029
eline-0.029
Token rocked
Token ID36872
Feature activation-1.33e+00

Pos logit contributions
 fault+0.184
uch+0.177
sed+0.175
ses+0.174
paste+0.172
Neg logit contributions
 agree-0.159
 agreement-0.157
ies-0.146
ie-0.138
 particular-0.133
Token back
Token ID736
Feature activation+5.98e-01

Pos logit contributions
 fault+0.067
ses+0.064
sed+0.063
paste+0.063
uch+0.062
Neg logit contributions
 agree-0.050
 agreement-0.046
ies-0.044
ie-0.042
reci-0.041
Token and
Token ID290
Feature activation-5.91e-01

Pos logit contributions
 agree+0.028
 agreement+0.027
ies+0.024
ie+0.023
reci+0.023
Neg logit contributions
 fault-0.026
ses-0.024
sed-0.024
uch-0.024
paste-0.023
Token forth
Token ID6071
Feature activation+1.63e+01

Pos logit contributions
sed+0.034
ses+0.034
 fault+0.034
 nothing+0.033
paste+0.033
Neg logit contributions
 agreement-0.016
 agree-0.016
reci-0.014
 particular-0.013
eline-0.012
Token together
Token ID1978
Feature activation+5.44e+00

Pos logit contributions
 agree+0.468
 agreement+0.412
ies+0.400
 happy+0.377
ie+0.367
Neg logit contributions
 fault-1.003
uch-0.976
ses-0.946
 tresp-0.938
paste-0.934
Token and
Token ID290
Feature activation-5.98e-01

Pos logit contributions
 agree+0.204
 agreement+0.199
ies+0.188
ie+0.180
reci+0.178
Neg logit contributions
 fault-0.274
gh-0.256
sed-0.255
ses-0.252
paste-0.250
Token made
Token ID925
Feature activation-4.92e+00

Pos logit contributions
 fault+0.027
sed+0.026
ses+0.026
uch+0.025
paste+0.025
Neg logit contributions
 agree-0.026
 agreement-0.025
ies-0.023
reci-0.022
ie-0.022
Token a
Token ID257
Feature activation-8.20e+00

Pos logit contributions
uch+0.193
 fault+0.191
sed+0.183
ses+0.183
gency+0.182
Neg logit contributions
 agree-0.248
 agreement-0.234
ies-0.228
 happy-0.224
ie-0.221
Token big
Token ID1263
Feature activation+8.21e-01

Pos logit contributions
 fault+0.388
sed+0.360
ses+0.359
uch+0.353
 tresp+0.352
Neg logit contributions
 agree-0.318
 agreement-0.299
ie-0.298
ies-0.292
ogged-0.289
Token roll
Token ID4836
Feature activation+4.54e+00

Pos logit contributions
 fault+0.313
uch+0.300
ses+0.296
sed+0.294
paste+0.292
Neg logit contributions
 agree-0.165
 agreement-0.154
ies-0.148
ie-0.139
 happy-0.127
Token a
Token ID257
Feature activation-8.14e+00

Pos logit contributions
 agree+0.129
ies+0.118
 happy+0.115
ie+0.113
 agreement+0.112
Neg logit contributions
sed-0.287
 fault-0.286
uch-0.283
ses-0.279
lor-0.278
Token ball
Token ID2613
Feature activation-2.25e+00

Pos logit contributions
 fault+0.419
ses+0.395
sed+0.388
uch+0.387
ps+0.386
Neg logit contributions
 agree-0.314
 agreement-0.295
 happy-0.293
ies-0.286
ie-0.279
Token back
Token ID736
Feature activation+5.66e-01

Pos logit contributions
 fault+0.122
uch+0.117
ses+0.116
sed+0.116
paste+0.115
Neg logit contributions
 agree-0.082
 agreement-0.073
ie-0.070
ies-0.069
 happy-0.066
Token and
Token ID290
Feature activation-4.54e-01

Pos logit contributions
 agree+0.021
 agreement+0.019
ies+0.018
ie+0.017
 happy+0.016
Neg logit contributions
 fault-0.030
uch-0.029
ses-0.028
sed-0.028
paste-0.028
Token forth
Token ID6071
Feature activation+1.62e+01

Pos logit contributions
 fault+0.024
ses+0.024
sed+0.024
uch+0.023
gency+0.023
Neg logit contributions
 agreement-0.014
 agree-0.014
ies-0.013
ie-0.013
reci-0.012
Token.
Token ID13
Feature activation+4.77e-01

Pos logit contributions
 agree+0.522
ies+0.463
 agreement+0.454
 happy+0.454
ie+0.417
Neg logit contributions
 fault-0.942
uch-0.938
paste-0.911
ses-0.906
 tresp-0.889
Token The
Token ID383
Feature activation+7.49e-02

Pos logit contributions
 agree+0.014
 agreement+0.013
ies+0.011
ie+0.011
 happy+0.010
Neg logit contributions
 fault-0.029
uch-0.029
gh-0.028
paste-0.028
 tresp-0.028
Token candle
Token ID26839
Feature activation-8.32e+00

Pos logit contributions
 agree+0.002
 agreement+0.002
ies+0.002
ie+0.002
ashing+0.001
Neg logit contributions
 fault-0.005
uch-0.004
ses-0.004
sed-0.004
paste-0.004
Token wanted
Token ID2227
Feature activation+2.41e+00

Pos logit contributions
 fault+0.401
ses+0.375
uch+0.371
gency+0.371
 tresp+0.369
Neg logit contributions
 agree-0.351
ies-0.325
 agreement-0.314
ie-0.313
 happy-0.296
Token to
Token ID284
Feature activation-5.39e+00

Pos logit contributions
 happy+0.074
 play+0.073
 family+0.073
 agree+0.069
 whenever+0.065
Neg logit contributions
uch-0.154
osaurus-0.144
ses-0.144
paste-0.143
lor-0.143
Token threw
Token ID9617
Feature activation-1.75e+00

Pos logit contributions
 agreement+0.043
 agree+0.042
ies+0.040
 particular+0.038
ie+0.037
Neg logit contributions
uch-0.049
 fault-0.048
sed-0.047
paste-0.047
 tresp-0.047
Token the
Token ID262
Feature activation+1.61e+00

Pos logit contributions
uch+0.098
 fault+0.097
sed+0.093
paste+0.092
ses+0.092
Neg logit contributions
 agree-0.063
 agreement-0.060
 happy-0.054
ies-0.053
ie-0.051
Token ball
Token ID2613
Feature activation-2.20e+00

Pos logit contributions
 agree+0.073
 agreement+0.070
ies+0.066
 happy+0.063
ie+0.063
Neg logit contributions
 fault-0.070
uch-0.067
paste-0.066
gh-0.065
ses-0.065
Token back
Token ID736
Feature activation+6.94e-01

Pos logit contributions
 fault+0.107
uch+0.102
ses+0.100
paste+0.099
sed+0.099
Neg logit contributions
 agree-0.090
 agreement-0.084
ies-0.080
ie-0.077
 happy-0.074
Token and
Token ID290
Feature activation-6.83e-01

Pos logit contributions
 agree+0.029
 agreement+0.027
ies+0.025
ie+0.025
 happy+0.023
Neg logit contributions
 fault-0.033
uch-0.031
ses-0.031
sed-0.031
paste-0.030
Token forth
Token ID6071
Feature activation+1.62e+01

Pos logit contributions
sed+0.043
ses+0.042
 fault+0.042
paste+0.041
 tresp+0.041
Neg logit contributions
 agree-0.016
 agreement-0.015
 particular-0.013
reci-0.012
ie-0.012
Token until
Token ID1566
Feature activation+7.08e+00

Pos logit contributions
 agree+0.525
ies+0.472
 agreement+0.468
 happy+0.453
ie+0.427
Neg logit contributions
 fault-0.932
uch-0.915
ses-0.891
paste-0.877
gency-0.874
Token they
Token ID484
Feature activation+5.33e+00

Pos logit contributions
 agree+0.317
 agreement+0.300
ies+0.290
ie+0.280
 happy+0.273
Neg logit contributions
 fault-0.321
uch-0.302
ses-0.298
sed-0.295
paste-0.292
Token were
Token ID547
Feature activation+3.22e-01

Pos logit contributions
 agree+0.220
 agreement+0.201
ies+0.195
ie+0.188
 happy+0.184
Neg logit contributions
 fault-0.264
ses-0.247
uch-0.245
paste-0.244
sed-0.242
Token too
Token ID1165
Feature activation-6.92e+00

Pos logit contributions
 agree+0.015
 agreement+0.014
ies+0.013
ie+0.013
reci+0.012
Neg logit contributions
 fault-0.014
uch-0.013
ses-0.013
sed-0.013
paste-0.013
Token tired
Token ID10032
Feature activation+2.94e+00

Pos logit contributions
 fault+0.327
uch+0.314
 tresp+0.298
ses+0.296
sed+0.294
Neg logit contributions
 agree-0.284
 agreement-0.283
ie-0.259
ies-0.252
reci-0.235
Token and
Token ID290
Feature activation-8.57e-01

Pos logit contributions
 agree+0.153
 agreement+0.144
ies+0.135
ie+0.131
 happy+0.123
Neg logit contributions
 fault-0.206
uch-0.192
ses-0.192
sed-0.191
paste-0.190
Token said
Token ID531
Feature activation-6.74e-01

Pos logit contributions
 fault+0.043
sed+0.042
ses+0.041
paste+0.041
uch+0.041
Neg logit contributions
 agree-0.032
 agreement-0.030
ies-0.028
ie-0.028
reci-0.026
Token it
Token ID340
Feature activation-3.69e+00

Pos logit contributions
uch+0.040
 fault+0.040
ses+0.039
osaurus+0.038
gency+0.038
Neg logit contributions
 agree-0.021
 agreement-0.018
 happy-0.018
ies-0.018
ie-0.016
Token was
Token ID373
Feature activation-3.86e+00

Pos logit contributions
 fault+0.211
gency+0.209
uch+0.204
ses+0.201
 tresp+0.198
Neg logit contributions
 agree-0.120
 happy-0.110
ies-0.105
ie-0.102
 play-0.099
Token their
Token ID511
Feature activation+1.96e+00

Pos logit contributions
uch+0.175
paste+0.167
gency+0.166
 fault+0.165
ses+0.163
Neg logit contributions
 agree-0.174
 happy-0.172
 agreement-0.169
ies-0.162
 family-0.155
Token fault
Token ID8046
Feature activation-2.12e+01

Pos logit contributions
 agree+0.072
 agreement+0.065
ies+0.062
 happy+0.059
 family+0.059
Neg logit contributions
paste-0.104
uch-0.102
gency-0.100
osaurus-0.099
 fault-0.099
Token.
Token ID13
Feature activation-2.01e-01

Pos logit contributions
 fault+1.062
sed+0.981
ses+0.980
uch+0.970
paste+0.964
Neg logit contributions
 agree-0.838
 agreement-0.792
ies-0.724
ie-0.699
reci-0.687
Token Mom
Token ID11254
Feature activation-1.47e+00

Pos logit contributions
 fault+0.013
uch+0.013
paste+0.012
 tresp+0.012
ses+0.012
Neg logit contributions
 agree-0.005
 agreement-0.004
ies-0.004
 happy-0.004
ie-0.003
Token said
Token ID531
Feature activation-7.00e-01

Pos logit contributions
 fault+0.075
uch+0.072
ses+0.070
sed+0.070
paste+0.070
Neg logit contributions
 agree-0.056
 agreement-0.053
ies-0.049
ie-0.047
 happy-0.045
Token they
Token ID484
Feature activation+5.14e+00

Pos logit contributions
uch+0.042
osaurus+0.041
 fault+0.041
ses+0.041
gency+0.040
Neg logit contributions
 agree-0.022
 happy-0.021
ies-0.019
 agreement-0.018
 family-0.018
Token were
Token ID547
Feature activation+2.65e-01

Pos logit contributions
 agree+0.202
 agreement+0.182
ies+0.178
ie+0.170
 happy+0.164
Neg logit contributions
 fault-0.262
ses-0.249
uch-0.248
sed-0.246
paste-0.245
Token it
Token ID340
Feature activation-3.31e+00

Pos logit contributions
 fault+0.023
ses+0.022
osaurus+0.022
uch+0.022
sed+0.021
Neg logit contributions
 agree-0.011
 happy-0.010
ies-0.010
ie-0.010
 agreement-0.009
Token was
Token ID373
Feature activation-3.53e+00

Pos logit contributions
lor+0.211
 fault+0.210
gency+0.209
ses+0.208
paste+0.202
Neg logit contributions
ie-0.080
my-0.076
 play-0.070
 happy-0.069
ies-0.067
Token Tim
Token ID5045
Feature activation-3.03e-01

Pos logit contributions
lor+0.187
ses+0.173
gency+0.173
paste+0.172
sed+0.171
Neg logit contributions
 happy-0.141
ies-0.130
ie-0.129
 family-0.125
 agree-0.125
Tokenmy
Token ID1820
Feature activation+7.75e+00

Pos logit contributions
 fault+0.022
uch+0.021
ses+0.021
sed+0.020
paste+0.020
Neg logit contributions
 agree-0.005
 agreement-0.005
ies-0.004
ie-0.003
 happy-0.003
Token's
Token ID338
Feature activation+1.15e+00

Pos logit contributions
 agree+0.314
ies+0.287
 happy+0.280
ie+0.279
 agreement+0.274
Neg logit contributions
 fault-0.371
ses-0.368
paste-0.362
sed-0.362
uch-0.358
Token fault
Token ID8046
Feature activation-2.10e+01

Pos logit contributions
 agree+0.043
ies+0.042
 happy+0.041
 agreement+0.041
 family+0.041
Neg logit contributions
gency-0.055
OP-0.055
 tresp-0.055
uch-0.055
paste-0.054
Token but
Token ID475
Feature activation+5.77e-01

Pos logit contributions
 fault+1.074
ses+0.991
sed+0.991
paste+0.984
uch+0.981
Neg logit contributions
 agree-0.818
 agreement-0.765
ies-0.699
ie-0.669
reci-0.665
Token Tim
Token ID5045
Feature activation-2.81e-01

Pos logit contributions
 agree+0.019
 agreement+0.018
ies+0.016
ashing+0.016
reci+0.016
Neg logit contributions
 fault-0.032
sed-0.030
 tresp-0.030
 nothing-0.030
ses-0.030
Tokenmy
Token ID1820
Feature activation+7.76e+00

Pos logit contributions
 fault+0.021
ses+0.021
paste+0.021
uch+0.021
sed+0.020
Neg logit contributions
 agree-0.004
 agreement-0.003
ies-0.003
ie-0.003
 happy-0.002
Token felt
Token ID2936
Feature activation-2.07e+00

Pos logit contributions
 agree+0.328
ies+0.293
ie+0.288
 happy+0.280
 agreement+0.279
Neg logit contributions
 fault-0.374
paste-0.363
ses-0.360
gency-0.352
sed-0.350
Token guilty
Token ID6717
Feature activation-4.27e+00

Pos logit contributions
 fault+0.095
sed+0.091
ses+0.091
osaurus+0.090
uch+0.090
Neg logit contributions
 agree-0.093
 happy-0.089
 agreement-0.088
ies-0.078
 family-0.078
Tokenawn
Token ID3832
Feature activation-6.21e+00

Pos logit contributions
 fault+0.511
uch+0.482
ses+0.475
sed+0.471
gh+0.461
Neg logit contributions
 agree-0.353
 agreement-0.335
ies-0.310
ie-0.291
 happy-0.287
Tokenâ
Token ID22940
Feature activation-8.33e+00

Pos logit contributions
 fault+0.278
uch+0.264
sed+0.259
ses+0.259
paste+0.258
Neg logit contributions
 agree-0.276
 agreement-0.262
ies-0.243
ie-0.234
 happy-0.226
Token (non-printing character)
Token ID26391
Feature activation-1.29e+01

Pos logit contributions
 fault+0.311
uch+0.302
ses+0.286
sed+0.282
gh+0.280
Neg logit contributions
 agree-0.428
 agreement-0.407
ies-0.401
ie-0.379
 happy-0.373
Token (non-printing character)
Token ID8151
Feature activation-1.20e+01

Pos logit contributions
 fault+0.583
uch+0.556
sed+0.541
ses+0.540
paste+0.534
Neg logit contributions
 agree-0.563
 agreement-0.542
ies-0.508
ie-0.495
 happy-0.467
Tokens
Token ID82
Feature activation+2.90e+00

Pos logit contributions
 fault+0.376
uch+0.358
gh+0.331
sed+0.326
paste+0.323
Neg logit contributions
 agree-0.696
 agreement-0.675
ies-0.642
ie-0.616
reci-0.601
Token fault
Token ID8046
Feature activation-2.10e+01

Pos logit contributions
 agree+0.131
 agreement+0.123
ies+0.119
ie+0.114
 happy+0.112
Neg logit contributions
 fault-0.126
uch-0.122
ses-0.118
paste-0.118
sed-0.117
Token.
Token ID13
Feature activation-1.40e-01

Pos logit contributions
 fault+1.023
sed+0.947
uch+0.946
ses+0.946
paste+0.934
Neg logit contributions
 agree-0.861
 agreement-0.807
ies-0.754
ie-0.725
reci-0.691
Token (space)
Token ID220
Feature activation+1.66e+00

Pos logit contributions
 fault+0.007
ses+0.007
uch+0.007
paste+0.007
sed+0.006
Neg logit contributions
 agree-0.006
ies-0.005
 agreement-0.005
ie-0.005
 happy-0.004
Token (newline)
Token ID198
Feature activation+6.43e-01

Pos logit contributions
 agree+0.052
 agreement+0.049
ies+0.046
ie+0.043
 happy+0.039
Neg logit contributions
 fault-0.097
uch-0.092
paste-0.090
ses-0.090
sed-0.089
Token (newline)
Token ID198
Feature activation+6.17e-01

Pos logit contributions
ies+0.024
 agree+0.024
 agreement+0.022
ie+0.022
 forth+0.021
Neg logit contributions
 fault-0.031
uch-0.031
gh-0.031
paste-0.030
 tresp-0.030
TokenThe
Token ID464
Feature activation+3.12e+00

Pos logit contributions
 agree+0.027
ies+0.027
 agreement+0.025
ie+0.025
 forth+0.023
Neg logit contributions
uch-0.026
 tresp-0.025
gh-0.025
paste-0.025
 fault-0.025
Token felt
Token ID2936
Feature activation-1.94e+00

Pos logit contributions
 agree+0.112
 agreement+0.100
ies+0.096
ie+0.094
 happy+0.091
Neg logit contributions
 fault-0.158
uch-0.149
ses-0.147
paste-0.144
gh-0.144
Token like
Token ID588
Feature activation-5.46e+00

Pos logit contributions
 fault+0.095
uch+0.091
ses+0.090
sed+0.090
paste+0.088
Neg logit contributions
 agree-0.079
 agreement-0.074
ies-0.069
ie-0.067
 happy-0.067
Token it
Token ID340
Feature activation-3.46e+00

Pos logit contributions
 fault+0.293
uch+0.292
ses+0.282
osaurus+0.281
sed+0.279
Neg logit contributions
 agree-0.190
 agreement-0.175
 happy-0.170
ies-0.168
ie-0.160
Token was
Token ID373
Feature activation-3.84e+00

Pos logit contributions
 fault+0.160
ses+0.155
sed+0.154
uch+0.152
gency+0.150
Neg logit contributions
 agree-0.146
ie-0.137
 agreement-0.136
ies-0.136
 happy-0.134
Token her
Token ID607
Feature activation+2.85e+00

Pos logit contributions
uch+0.174
 fault+0.172
gency+0.169
ses+0.169
osaurus+0.168
Neg logit contributions
 agree-0.161
 agreement-0.156
 happy-0.153
ies-0.150
ie-0.146
Token fault
Token ID8046
Feature activation-2.10e+01

Pos logit contributions
 agree+0.103
 agreement+0.095
ies+0.091
 happy+0.088
 family+0.086
Neg logit contributions
uch-0.150
paste-0.147
 fault-0.147
gency-0.145
osaurus-0.144
Token,
Token ID11
Feature activation-2.87e-02

Pos logit contributions
 fault+1.033
sed+0.944
ses+0.944
uch+0.942
paste+0.936
Neg logit contributions
 agree-0.864
 agreement-0.808
ies-0.741
reci-0.708
ie-0.707
Token like
Token ID588
Feature activation-5.46e+00

Pos logit contributions
 fault+0.001
uch+0.001
ses+0.001
sed+0.001
paste+0.001
Neg logit contributions
 agree-0.001
 agreement-0.001
ies-0.001
ie-0.001
 happy-0.001
Token she
Token ID673
Feature activation+5.35e+00

Pos logit contributions
uch+0.266
 fault+0.255
osaurus+0.254
ses+0.253
sed+0.250
Neg logit contributions
 agree-0.213
 happy-0.201
 agreement-0.201
ies-0.195
 family-0.193
Token had
Token ID550
Feature activation+2.46e+00

Pos logit contributions
 agree+0.242
ies+0.219
 agreement+0.219
ie+0.217
 happy+0.209
Neg logit contributions
 fault-0.240
ses-0.226
sed-0.224
paste-0.221
uch-0.220
Token done
Token ID1760
Feature activation-2.52e+00

Pos logit contributions
 agree+0.114
 happy+0.111
ies+0.110
 family+0.105
 play+0.105
Neg logit contributions
ses-0.100
osaurus-0.100
gency-0.097
sed-0.095
uch-0.094
Token told
Token ID1297
Feature activation+1.67e+00

Pos logit contributions
 agree+0.098
ies+0.086
ie+0.083
 agreement+0.081
 happy+0.081
Neg logit contributions
 fault-0.137
ses-0.130
paste-0.125
uch-0.124
sed-0.124
Token her
Token ID607
Feature activation+3.09e+00

Pos logit contributions
 agree+0.040
 agreement+0.037
ie+0.035
ies+0.033
 happy+0.029
Neg logit contributions
 fault-0.110
uch-0.108
sed-0.106
ses-0.106
osaurus-0.104
Token it
Token ID340
Feature activation-3.36e+00

Pos logit contributions
 agree+0.120
ies+0.110
 agreement+0.109
ie+0.104
 happy+0.101
Neg logit contributions
 fault-0.155
uch-0.153
ses-0.150
sed-0.148
osaurus-0.146
Token was
Token ID373
Feature activation-3.48e+00

Pos logit contributions
gency+0.228
ses+0.224
osaurus+0.220
lor+0.219
angs+0.216
Neg logit contributions
ie-0.069
 happy-0.060
 agree-0.059
my-0.059
ies-0.058
Token her
Token ID607
Feature activation+3.08e+00

Pos logit contributions
ses+0.151
 fault+0.150
osaurus+0.150
sed+0.150
paste+0.149
Neg logit contributions
 agree-0.152
 agreement-0.146
ies-0.146
ie-0.144
 happy-0.144
Token fault
Token ID8046
Feature activation-2.10e+01

Pos logit contributions
 agree+0.177
 happy+0.168
ies+0.164
 family+0.164
 agreement+0.159
Neg logit contributions
osaurus-0.102
uch-0.100
RR-0.098
paste-0.098
gency-0.097
Token because
Token ID780
Feature activation-2.84e+00

Pos logit contributions
 fault+1.065
sed+0.986
ses+0.985
uch+0.980
paste+0.978
Neg logit contributions
 agree-0.822
 agreement-0.771
ies-0.710
ie-0.679
reci-0.673
Token she
Token ID673
Feature activation+5.44e+00

Pos logit contributions
 fault+0.154
uch+0.146
paste+0.145
ses+0.144
sed+0.143
Neg logit contributions
 agree-0.103
 agreement-0.095
ies-0.088
ie-0.083
 happy-0.078
Token dropped
Token ID5710
Feature activation-3.79e+00

Pos logit contributions
 agree+0.184
ie+0.183
ies+0.177
 family+0.169
 happy+0.168
Neg logit contributions
 fault-0.302
gency-0.285
ses-0.284
OP-0.278
gh-0.276
Token her
Token ID607
Feature activation+3.08e+00

Pos logit contributions
uch+0.272
sed+0.255
gh+0.255
lor+0.252
OP+0.251
Neg logit contributions
 agreement-0.073
ies-0.070
ie-0.070
 happy-0.069
 play-0.063
Token toy
Token ID13373
Feature activation-2.44e+00

Pos logit contributions
ies+0.114
 happy+0.110
 family+0.106
 agree+0.105
 play+0.102
Neg logit contributions
gency-0.151
OP-0.149
uch-0.147
RR-0.147
paste-0.145
Token (newline)
Token ID198
Feature activation+5.23e-01

Pos logit contributions
 agree+0.022
 agreement+0.021
ies+0.021
 forth+0.020
ashing+0.020
Neg logit contributions
 fault-0.026
gh-0.025
OP-0.025
 Ow-0.024
paste-0.024
Token"
Token ID1
Feature activation+5.46e+00

Pos logit contributions
 agree+0.020
ies+0.020
 agreement+0.019
ie+0.019
ashing+0.019
Neg logit contributions
gh-0.023
RR-0.022
paste-0.022
OP-0.022
uch-0.022
TokenIt
Token ID1026
Feature activation-2.78e+00

Pos logit contributions
 agree+0.233
 agreement+0.215
ies+0.199
 happy+0.193
ie+0.187
Neg logit contributions
 fault-0.265
uch-0.245
sed-0.243
ses-0.243
paste-0.239
Token's
Token ID338
Feature activation+9.29e-01

Pos logit contributions
gency+0.168
 fault+0.167
ses+0.165
gh+0.162
sed+0.162
Neg logit contributions
ie-0.075
ies-0.069
my-0.067
 play-0.063
 agree-0.063
Token your
Token ID534
Feature activation-1.04e+00

Pos logit contributions
 happy+0.042
 family+0.038
 sunny+0.036
 agree+0.035
 friends+0.034
Neg logit contributions
lor-0.045
osaurus-0.043
gency-0.042
paste-0.041
uch-0.040
Token fault
Token ID8046
Feature activation-2.09e+01

Pos logit contributions
uch+0.047
paste+0.046
gh+0.045
gency+0.045
 happens+0.044
Neg logit contributions
 agree-0.044
 family-0.042
ies-0.041
 happy-0.040
 agreement-0.040
Token!"
Token ID2474
Feature activation-7.61e+00

Pos logit contributions
 fault+0.837
uch+0.792
ses+0.776
paste+0.768
sed+0.766
Neg logit contributions
 agree-1.034
 agreement-0.971
ies-0.954
ie-0.918
 happy-0.881
Token Mia
Token ID32189
Feature activation-2.04e+00

Pos logit contributions
 fault+0.368
ses+0.345
uch+0.345
sed+0.343
paste+0.342
Neg logit contributions
 agree-0.311
 agreement-0.291
ies-0.281
ie-0.270
 happy-0.253
Token blamed
Token ID13772
Feature activation-3.20e-01

Pos logit contributions
 fault+0.109
uch+0.106
gh+0.103
paste+0.103
ses+0.102
Neg logit contributions
 agree-0.076
 agreement-0.068
ies-0.064
ie-0.062
 happy-0.060
Token Sam
Token ID3409
Feature activation-3.01e+00

Pos logit contributions
 fault+0.016
ses+0.015
uch+0.015
sed+0.015
paste+0.015
Neg logit contributions
 agree-0.013
 agreement-0.012
ies-0.011
ie-0.011
 happy-0.010
Token.
Token ID13
Feature activation-5.72e-02

Pos logit contributions
 fault+0.124
paste+0.113
ps+0.112
ses+0.109
ze+0.108
Neg logit contributions
 agree-0.134
 agreement-0.131
reci-0.123
 particular-0.123
ies-0.122
Token realised
Token ID19169
Feature activation-1.47e+00

Pos logit contributions
 agree+0.058
 agreement+0.057
ies+0.050
ashing+0.049
ie+0.048
Neg logit contributions
 fault-0.069
uch-0.064
 happens-0.063
paste-0.063
 tresp-0.063
Token it
Token ID340
Feature activation-3.47e+00

Pos logit contributions
 fault+0.058
uch+0.056
ses+0.055
sed+0.054
paste+0.053
Neg logit contributions
 agree-0.074
 agreement-0.069
ies-0.068
ie-0.066
 happy-0.064
Token wasn
Token ID2492
Feature activation-6.67e+00

Pos logit contributions
 fault+0.168
ses+0.167
sed+0.164
uch+0.158
gency+0.158
Neg logit contributions
 agree-0.139
ie-0.134
ies-0.132
 agreement-0.126
 happy-0.125
Token't
Token ID470
Feature activation-1.06e+01

Pos logit contributions
 fault+0.213
ses+0.207
sed+0.207
gency+0.184
osaurus+0.184
Neg logit contributions
 agree-0.374
 agreement-0.367
 happy-0.354
ie-0.350
 particular-0.348
Token her
Token ID607
Feature activation+3.03e+00

Pos logit contributions
 fault+0.435
uch+0.434
ses+0.427
sed+0.425
paste+0.411
Neg logit contributions
 agree-0.498
 agreement-0.478
ies-0.472
ie-0.459
 happy-0.456
Token fault
Token ID8046
Feature activation-2.09e+01

Pos logit contributions
 agree+0.111
 agreement+0.102
ies+0.100
 happy+0.096
 family+0.094
Neg logit contributions
uch-0.164
paste-0.155
gency-0.152
osaurus-0.152
OP-0.152
Token after
Token ID706
Feature activation+3.22e+00

Pos logit contributions
 fault+1.069
uch+0.989
paste+0.981
ses+0.981
sed+0.978
Neg logit contributions
 agree-0.814
 agreement-0.765
ies-0.710
ie-0.679
reci-0.646
Token all
Token ID477
Feature activation+3.52e+00

Pos logit contributions
 agree+0.115
 agreement+0.105
ies+0.104
ie+0.100
 happy+0.095
Neg logit contributions
 fault-0.172
uch-0.165
ses-0.165
sed-0.163
paste-0.161
Token.
Token ID13
Feature activation-2.19e-01

Pos logit contributions
 agree+0.157
 agreement+0.151
ashing+0.137
ies+0.137
ogged+0.130
Neg logit contributions
 fault-0.159
 tresp-0.142
 happens-0.139
ses-0.134
paste-0.134
Token<|endoftext|>
Token ID50256
Feature activation+3.67e+00

Pos logit contributions
 fault+0.012
ses+0.011
paste+0.011
sed+0.011
uch+0.011
Neg logit contributions
 agree-0.008
ies-0.007
 agreement-0.007
ie-0.007
 forth-0.006
TokenOnce
Token ID7454
Feature activation+3.81e+00

Pos logit contributions
 agree+0.136
 agreement+0.126
ies+0.119
ie+0.115
 happy+0.109
Neg logit contributions
 fault-0.192
uch-0.183
ses-0.181
sed-0.180
paste-0.179
Token saying
Token ID2282
Feature activation-3.64e-01

Pos logit contributions
 fault+0.005
ses+0.005
sed+0.005
uch+0.005
paste+0.005
Neg logit contributions
 agree-0.005
 agreement-0.004
ies-0.004
ie-0.004
 happy-0.004
Token that
Token ID326
Feature activation-1.17e+00

Pos logit contributions
uch+0.023
 fault+0.023
ses+0.023
sed+0.022
ASH+0.021
Neg logit contributions
 agree-0.010
 happy-0.009
 agreement-0.009
ies-0.008
ie-0.008
Token it
Token ID340
Feature activation-3.52e+00

Pos logit contributions
 fault+0.063
uch+0.061
ses+0.060
sed+0.060
paste+0.059
Neg logit contributions
 agree-0.041
 agreement-0.038
ies-0.036
ie-0.034
 happy-0.033
Token was
Token ID373
Feature activation-3.85e+00

Pos logit contributions
ses+0.183
 fault+0.183
gency+0.178
sed+0.175
paste+0.173
Neg logit contributions
ie-0.121
 agree-0.118
ies-0.114
 happy-0.112
 play-0.107
Token his
Token ID465
Feature activation+3.41e-01

Pos logit contributions
uch+0.162
ses+0.160
paste+0.159
lor+0.158
sed+0.155
Neg logit contributions
 happy-0.178
 agreement-0.173
 agree-0.171
 family-0.163
ies-0.161
Token fault
Token ID8046
Feature activation-2.09e+01

Pos logit contributions
 agree+0.013
 agreement+0.012
ies+0.012
 family+0.012
 happy+0.012
Neg logit contributions
uch-0.017
paste-0.017
osaurus-0.017
gency-0.016
gh-0.016
Token that
Token ID326
Feature activation-1.16e+00

Pos logit contributions
 fault+1.092
uch+1.039
ses+1.026
sed+1.019
paste+1.017
Neg logit contributions
 agree-0.777
 agreement-0.722
ies-0.686
ie-0.655
 happy-0.623
Token he
Token ID339
Feature activation+2.96e+00

Pos logit contributions
 fault+0.059
ses+0.057
uch+0.057
sed+0.057
paste+0.057
Neg logit contributions
 agree-0.043
 agreement-0.038
ies-0.038
ie-0.037
 happy-0.037
Token forgot
Token ID16453
Feature activation+4.84e+00

Pos logit contributions
 agree+0.114
ies+0.102
ie+0.101
 happy+0.096
 agreement+0.095
Neg logit contributions
 fault-0.154
ses-0.146
paste-0.144
sed-0.142
gency-0.142
Token the
Token ID262
Feature activation+1.40e+00

Pos logit contributions
 agree+0.221
ies+0.205
 agreement+0.203
ie+0.198
 happy+0.197
Neg logit contributions
ses-0.207
uch-0.205
 fault-0.204
sed-0.201
paste-0.195
Token folder
Token ID9483
Feature activation-3.20e+00

Pos logit contributions
 agree+0.060
 agreement+0.058
ies+0.054
 family+0.053
 happy+0.053
Neg logit contributions
osaurus-0.066
uch-0.064
paste-0.063
gency-0.061
OP-0.061
Token it
Token ID340
Feature activation-3.36e+00

Pos logit contributions
 fault+0.111
uch+0.108
ses+0.106
sed+0.105
paste+0.104
Neg logit contributions
 agree-0.081
 agreement-0.074
ies-0.071
ie-0.069
 happy-0.067
Token was
Token ID373
Feature activation-3.76e+00

Pos logit contributions
ses+0.189
 fault+0.184
sed+0.183
uch+0.177
paste+0.176
Neg logit contributions
ie-0.106
 agree-0.101
 happy-0.101
 family-0.097
 day-0.096
Token all
Token ID477
Feature activation+3.53e+00

Pos logit contributions
ses+0.176
paste+0.175
uch+0.174
sed+0.174
osaurus+0.171
Neg logit contributions
 happy-0.156
 agree-0.153
 agreement-0.153
 family-0.148
ie-0.147
Token Joe
Token ID5689
Feature activation-4.84e+00

Pos logit contributions
 agree+0.160
 agreement+0.150
 happy+0.148
ies+0.147
ie+0.146
Neg logit contributions
uch-0.147
ses-0.147
sed-0.146
paste-0.146
 fault-0.143
Token's
Token ID338
Feature activation+1.08e+00

Pos logit contributions
 fault+0.226
ses+0.221
uch+0.219
sed+0.217
paste+0.213
Neg logit contributions
 agree-0.207
ies-0.188
 agreement-0.186
ie-0.183
 happy-0.179
Token fault
Token ID8046
Feature activation-2.09e+01

Pos logit contributions
 agree+0.037
 agreement+0.034
ies+0.032
 happy+0.031
ie+0.030
Neg logit contributions
uch-0.058
 fault-0.058
paste-0.057
osaurus-0.056
ses-0.056
Token.
Token ID13
Feature activation+1.62e-01

Pos logit contributions
 fault+1.034
uch+0.960
ses+0.959
sed+0.958
paste+0.949
Neg logit contributions
 agree-0.840
 agreement-0.785
ies-0.736
ie-0.706
 happy-0.671
Token Joe
Token ID5689
Feature activation-4.82e+00

Pos logit contributions
 agree+0.005
 agreement+0.005
ies+0.005
ie+0.004
 happy+0.004
Neg logit contributions
 fault-0.009
uch-0.009
ses-0.009
sed-0.009
paste-0.009
Token tried
Token ID3088
Feature activation-6.51e-01

Pos logit contributions
 fault+0.249
ses+0.236
uch+0.235
sed+0.231
paste+0.230
Neg logit contributions
 agree-0.189
 agreement-0.172
ies-0.167
ie-0.164
 happy-0.156
Token to
Token ID284
Feature activation-5.46e+00

Pos logit contributions
uch+0.026
ses+0.025
 fault+0.025
sed+0.025
paste+0.024
Neg logit contributions
 agree-0.033
 agreement-0.031
ies-0.029
 happy-0.029
ie-0.028
Token talk
Token ID1561
Feature activation+1.05e+00

Pos logit contributions
 fault+0.273
uch+0.263
 tresp+0.259
paste+0.254
gh+0.254
Neg logit contributions
 agree-0.196
 agreement-0.185
ies-0.184
ashing-0.178
ie-0.168
Token.
Token ID13
Feature activation+2.00e-01

Pos logit contributions
 agree+0.082
ies+0.077
ie+0.074
 agreement+0.074
 happy+0.067
Neg logit contributions
 fault-0.131
uch-0.129
ses-0.129
sed-0.128
gency-0.124
Token It
Token ID632
Feature activation-5.07e+00

Pos logit contributions
 agree+0.009
 agreement+0.009
ies+0.009
ie+0.008
 happy+0.008
Neg logit contributions
 fault-0.008
uch-0.008
 tresp-0.008
gh-0.008
paste-0.007
Token's
Token ID338
Feature activation+1.07e+00

Pos logit contributions
gency+0.271
gh+0.263
 fault+0.263
ses+0.254
sed+0.252
Neg logit contributions
 agree-0.176
ie-0.172
ies-0.170
 happy-0.164
 play-0.153
Token not
Token ID407
Feature activation-7.37e+00

Pos logit contributions
 happy+0.049
 agree+0.045
 agreement+0.044
ies+0.043
ie+0.042
Neg logit contributions
gency-0.048
lor-0.047
paste-0.046
OP-0.046
uch-0.046
Token your
Token ID534
Feature activation-1.06e+00

Pos logit contributions
 fault+0.337
uch+0.335
paste+0.330
gency+0.328
sed+0.326
Neg logit contributions
 agree-0.319
ies-0.297
 happy-0.292
 agreement-0.286
ie-0.279
Token fault
Token ID8046
Feature activation-2.09e+01

Pos logit contributions
uch+0.043
paste+0.043
osaurus+0.040
gency+0.040
OP+0.039
Neg logit contributions
 agree-0.053
 agreement-0.049
ies-0.047
 family-0.047
 happy-0.047
Token.
Token ID13
Feature activation+3.20e-01

Pos logit contributions
 fault+0.973
uch+0.919
ses+0.907
sed+0.900
paste+0.897
Neg logit contributions
 agree-0.896
 agreement-0.841
ies-0.804
ie-0.773
 happy-0.741
Token The
Token ID383
Feature activation+7.32e-02

Pos logit contributions
 agree+0.015
 agreement+0.015
ies+0.013
 happy+0.013
 family+0.013
Neg logit contributions
 fault-0.013
uch-0.013
gh-0.012
 tresp-0.012
gency-0.012
Token rope
Token ID17182
Feature activation-6.64e+00

Pos logit contributions
 agree+0.003
 agreement+0.003
 family+0.003
 happy+0.003
ies+0.003
Neg logit contributions
uch-0.003
 fault-0.003
OP-0.003
osaurus-0.003
paste-0.003
Token was
Token ID373
Feature activation-3.65e+00

Pos logit contributions
 fault+0.236
gency+0.230
paste+0.219
gh+0.219
osaurus+0.217
Neg logit contributions
 agree-0.348
ies-0.333
ie-0.328
 agreement-0.313
 happy-0.313
Token old
Token ID1468
Feature activation+4.16e+00

Pos logit contributions
uch+0.167
lor+0.166
gency+0.163
osaurus+0.163
OP+0.157
Neg logit contributions
 happy-0.163
ies-0.139
 agreement-0.138
 sunny-0.138
 agree-0.137
Token said
Token ID531
Feature activation-5.59e-01

Pos logit contributions
 fault+0.029
 tresp+0.028
ses+0.028
sed+0.028
paste+0.027
Neg logit contributions
 agree-0.023
 agreement-0.023
ies-0.021
ie-0.021
ashing-0.021
Token it
Token ID340
Feature activation-3.39e+00

Pos logit contributions
uch+0.034
 fault+0.034
ses+0.033
sed+0.033
osaurus+0.033
Neg logit contributions
 agree-0.016
 agreement-0.014
ies-0.014
 happy-0.014
ie-0.013
Token was
Token ID373
Feature activation-3.80e+00

Pos logit contributions
gency+0.217
 fault+0.211
ses+0.207
uch+0.204
 tresp+0.202
Neg logit contributions
 happy-0.077
 agree-0.075
 play-0.075
ies-0.069
ie-0.065
Token not
Token ID407
Feature activation-7.22e+00

Pos logit contributions
uch+0.180
gency+0.177
 fault+0.177
paste+0.176
ses+0.176
Neg logit contributions
 agree-0.154
 happy-0.152
 agreement-0.148
ies-0.143
 family-0.139
Token their
Token ID511
Feature activation+2.14e+00

Pos logit contributions
uch+0.330
 fault+0.329
sed+0.318
ses+0.316
paste+0.314
Neg logit contributions
 agree-0.317
 agreement-0.290
ies-0.289
 happy-0.281
ie-0.270
Token fault
Token ID8046
Feature activation-2.08e+01

Pos logit contributions
 agree+0.170
 agreement+0.163
ies+0.160
 happy+0.156
 family+0.156
Neg logit contributions
uch-0.025
paste-0.023
gency-0.016
gh-0.016
osaurus-0.015
Token.
Token ID13
Feature activation+4.02e-01

Pos logit contributions
 fault+0.967
paste+0.844
sed+0.836
ses+0.836
gh+0.823
Neg logit contributions
 agree-0.906
 agreement-0.889
reci-0.806
ies-0.782
ie-0.762
Token They
Token ID1119
Feature activation+3.79e+00

Pos logit contributions
 agree+0.010
 agreement+0.009
ies+0.008
ie+0.008
 happy+0.007
Neg logit contributions
 fault-0.026
uch-0.025
ses-0.025
sed-0.025
paste-0.025
Token said
Token ID531
Feature activation-5.66e-01

Pos logit contributions
 agree+0.146
 agreement+0.144
ies+0.138
ie+0.131
ashing+0.130
Neg logit contributions
 fault-0.178
 tresp-0.170
sed-0.170
ses-0.170
uch-0.169
Token the
Token ID262
Feature activation+1.46e+00

Pos logit contributions
 fault+0.035
uch+0.035
ses+0.035
sed+0.034
osaurus+0.034
Neg logit contributions
 agree-0.015
 agreement-0.013
ies-0.013
 happy-0.012
ie-0.012
Token man
Token ID582
Feature activation-2.10e+00

Pos logit contributions
 agree+0.059
 agreement+0.059
 family+0.054
 happy+0.053
ies+0.051
Neg logit contributions
uch-0.071
paste-0.068
osaurus-0.066
gency-0.064
ses-0.064
Token realized
Token ID6939
Feature activation-1.54e+00

Pos logit contributions
 agree+0.029
ie+0.026
 agreement+0.026
ies+0.025
 happy+0.025
Neg logit contributions
 fault-0.049
ses-0.045
uch-0.044
paste-0.043
sed-0.043
Token it
Token ID340
Feature activation-3.44e+00

Pos logit contributions
uch+0.072
ses+0.071
 fault+0.071
sed+0.070
gency+0.065
Neg logit contributions
 agree-0.065
ies-0.061
 agreement-0.061
ie-0.060
 happy-0.058
Token was
Token ID373
Feature activation-3.76e+00

Pos logit contributions
ses+0.191
sed+0.186
 fault+0.184
gency+0.179
paste+0.177
Neg logit contributions
ie-0.119
ies-0.108
 agree-0.106
 happy-0.098
 family-0.097
Token his
Token ID465
Feature activation+4.94e-01

Pos logit contributions
ses+0.161
sed+0.159
paste+0.155
uch+0.154
osaurus+0.154
Neg logit contributions
 happy-0.172
 agreement-0.165
ie-0.162
 agree-0.160
 family-0.159
Token own
Token ID898
Feature activation+5.61e+00

Pos logit contributions
 agree+0.026
 agreement+0.024
 family+0.024
 happy+0.024
ies+0.024
Neg logit contributions
paste-0.018
uch-0.018
osaurus-0.017
gency-0.017
RR-0.016
Token fault
Token ID8046
Feature activation-2.08e+01

Pos logit contributions
 agree+0.227
 agreement+0.211
ies+0.200
 happy+0.199
 family+0.197
Neg logit contributions
uch-0.270
paste-0.266
osaurus-0.261
ses-0.255
gency-0.255
Token for
Token ID329
Feature activation+6.98e+00

Pos logit contributions
 fault+1.005
paste+0.913
uch+0.910
sed+0.910
ses+0.908
Neg logit contributions
 agree-0.863
 agreement-0.810
ies-0.751
reci-0.714
ie-0.713
Token not
Token ID407
Feature activation-7.11e+00

Pos logit contributions
 agree+0.322
 agreement+0.305
ies+0.287
 happy+0.285
ie+0.281
Neg logit contributions
 fault-0.303
uch-0.295
ses-0.288
sed-0.285
paste-0.282
Token val
Token ID1188
Feature activation-2.16e+00

Pos logit contributions
 fault+0.284
 tresp+0.254
uch+0.252
paste+0.248
ses+0.248
Neg logit contributions
 agree-0.353
 agreement-0.340
ies-0.322
ie-0.315
ashing-0.298
Tokenuing
Token ID4250
Feature activation+1.14e+01

Pos logit contributions
 fault+0.144
ses+0.143
sed+0.141
uch+0.140
paste+0.139
Neg logit contributions
 agree-0.050
 agreement-0.043
ies-0.042
ie-0.038
 happy-0.032
Token her
Token ID607
Feature activation+2.98e+00

Pos logit contributions
 agree+0.413
ies+0.386
 agreement+0.373
ie+0.365
 happy+0.359
Neg logit contributions
uch-0.601
ses-0.599
sed-0.598
 fault-0.576
paste-0.566
Token them
Token ID606
Feature activation+1.07e+00

Pos logit contributions
 agree+0.146
 agreement+0.137
ies+0.128
ie+0.125
 happy+0.121
Neg logit contributions
 fault-0.174
uch-0.170
ses-0.167
sed-0.165
paste-0.162
Token it
Token ID340
Feature activation-3.50e+00

Pos logit contributions
 agree+0.044
 agreement+0.041
ies+0.040
ie+0.038
 happy+0.037
Neg logit contributions
 fault-0.052
uch-0.051
ses-0.050
sed-0.049
paste-0.049
Token was
Token ID373
Feature activation-3.75e+00

Pos logit contributions
uch+0.175
ses+0.174
paste+0.170
gency+0.170
sed+0.169
Neg logit contributions
 agree-0.132
ies-0.124
ie-0.124
 happy-0.122
 family-0.114
Token all
Token ID477
Feature activation+3.50e+00

Pos logit contributions
ses+0.183
paste+0.181
sed+0.178
uch+0.176
gency+0.171
Neg logit contributions
 agreement-0.152
 agree-0.150
 happy-0.149
ie-0.141
ies-0.139
Token their
Token ID511
Feature activation+2.02e+00

Pos logit contributions
 agree+0.171
 agreement+0.163
ies+0.158
 happy+0.157
ie+0.155
Neg logit contributions
uch-0.135
ses-0.133
paste-0.133
 fault-0.133
sed-0.132
Token fault
Token ID8046
Feature activation-2.08e+01

Pos logit contributions
 agree+0.078
 agreement+0.072
ies+0.070
ie+0.067
 happy+0.065
Neg logit contributions
 fault-0.101
uch-0.099
paste-0.097
ses-0.097
sed-0.096
Token.
Token ID13
Feature activation+7.93e-02

Pos logit contributions
 fault+1.011
sed+0.923
ses+0.918
paste+0.915
 tresp+0.910
Neg logit contributions
 agree-0.854
 agreement-0.813
ies-0.733
reci-0.710
ie-0.709
Token His
Token ID2399
Feature activation+4.37e-02

Pos logit contributions
 agree+0.003
 agreement+0.002
ies+0.002
ie+0.002
 happy+0.002
Neg logit contributions
 fault-0.004
uch-0.004
ses-0.004
sed-0.004
paste-0.004
Token friends
Token ID2460
Feature activation+1.01e+01

Pos logit contributions
 agree+0.001
 agreement+0.001
ies+0.000
ie+0.000
 happy+0.000
Neg logit contributions
 fault-0.003
uch-0.003
paste-0.003
ses-0.003
gency-0.003
Token were
Token ID547
Feature activation+2.41e-01

Pos logit contributions
 agree+0.394
 agreement+0.378
ies+0.364
ie+0.339
ashing+0.334
Neg logit contributions
 fault-0.471
uch-0.454
 tresp-0.449
sed-0.446
gh-0.438
Token so
Token ID523
Feature activation-9.57e+00

Pos logit contributions
 agree+0.012
 agreement+0.011
ies+0.011
 happy+0.010
ie+0.010
Neg logit contributions
 fault-0.010
uch-0.009
ses-0.009
paste-0.009
sed-0.009
Token knew
Token ID2993
Feature activation-2.07e+00

Pos logit contributions
 agree+0.120
 agreement+0.107
ies+0.100
ie+0.099
 happy+0.099
Neg logit contributions
 fault-0.175
ses-0.164
uch-0.164
sed-0.160
paste-0.159
Token it
Token ID340
Feature activation-3.46e+00

Pos logit contributions
 fault+0.106
uch+0.106
ses+0.105
sed+0.104
paste+0.100
Neg logit contributions
 agree-0.078
 agreement-0.071
ies-0.068
ie-0.066
 happy-0.065
Token was
Token ID373
Feature activation-3.74e+00

Pos logit contributions
ses+0.177
 fault+0.176
sed+0.172
uch+0.171
paste+0.166
Neg logit contributions
 agree-0.131
ie-0.119
 happy-0.119
 agreement-0.115
ies-0.113
Token his
Token ID465
Feature activation+3.81e-01

Pos logit contributions
ses+0.170
sed+0.169
uch+0.169
 fault+0.169
paste+0.167
Neg logit contributions
 agree-0.163
 agreement-0.157
 happy-0.147
ies-0.144
ie-0.143
Token own
Token ID898
Feature activation+5.46e+00

Pos logit contributions
 agree+0.013
 agreement+0.012
 happy+0.012
 family+0.011
ies+0.011
Neg logit contributions
uch-0.021
paste-0.021
osaurus-0.021
angs-0.020
RR-0.020
Token fault
Token ID8046
Feature activation-2.08e+01

Pos logit contributions
 agree+0.198
 agreement+0.185
ies+0.171
 happy+0.165
ie+0.161
Neg logit contributions
uch-0.286
 fault-0.284
ses-0.279
paste-0.278
sed-0.277
Token.
Token ID13
Feature activation-7.79e-02

Pos logit contributions
 fault+1.012
paste+0.919
uch+0.919
ses+0.915
sed+0.915
Neg logit contributions
 agree-0.852
 agreement-0.803
ies-0.754
ie-0.716
reci-0.706
Token He
Token ID679
Feature activation+8.23e-01

Pos logit contributions
 fault+0.004
paste+0.004
ses+0.004
sed+0.004
uch+0.004
Neg logit contributions
 agree-0.003
ies-0.003
 agreement-0.003
ie-0.003
 happy-0.002
Token would
Token ID561
Feature activation-2.82e+00

Pos logit contributions
 agree+0.031
 agreement+0.029
ies+0.027
ie+0.026
 happy+0.026
Neg logit contributions
 fault-0.043
uch-0.041
ses-0.041
sed-0.040
paste-0.040
Token have
Token ID423
Feature activation+8.37e+00

Pos logit contributions
 fault+0.108
uch+0.098
ses+0.096
 tresp+0.096
gh+0.095
Neg logit contributions
 agree-0.139
ies-0.134
 agreement-0.134
ie-0.127
ashing-0.125
Token to
Token ID284
Feature activation-5.45e+00

Pos logit contributions
 agree+0.379
 happy+0.356
 family+0.353
 play+0.334
 agreement+0.333
Neg logit contributions
ses-0.386
uch-0.384
sed-0.374
osaurus-0.368
RR-0.368
Token "
Token ID366
Feature activation+1.47e+00

Pos logit contributions
uch+0.037
 fault+0.037
ses+0.036
osaurus+0.036
paste+0.035
Neg logit contributions
 agree-0.024
 agreement-0.022
 happy-0.021
ies-0.021
ie-0.020
TokenIt
Token ID1026
Feature activation-2.43e+00

Pos logit contributions
 agree+0.062
 agreement+0.057
ies+0.053
ie+0.052
 happy+0.051
Neg logit contributions
 fault-0.072
ses-0.068
sed-0.067
 tresp-0.066
uch-0.066
Token's
Token ID338
Feature activation+1.32e+00

Pos logit contributions
 fault+0.108
ses+0.105
gency+0.103
gh+0.102
sed+0.102
Neg logit contributions
 agree-0.106
ies-0.100
ie-0.098
 happy-0.097
 agreement-0.092
Token your
Token ID534
Feature activation-6.39e-01

Pos logit contributions
 agree+0.058
 happy+0.058
 agreement+0.057
 family+0.053
ies+0.053
Neg logit contributions
osaurus-0.057
paste-0.056
 fault-0.055
ses-0.055
uch-0.055
Token own
Token ID898
Feature activation+6.08e+00

Pos logit contributions
uch+0.033
paste+0.033
osaurus+0.032
gency+0.032
gh+0.032
Neg logit contributions
 agree-0.023
 agreement-0.021
 happy-0.021
 family-0.021
ies-0.020
Token fault
Token ID8046
Feature activation-2.08e+01

Pos logit contributions
 agree+0.251
 agreement+0.236
ies+0.224
 happy+0.223
 family+0.222
Neg logit contributions
paste-0.278
uch-0.276
 fault-0.273
ses-0.271
osaurus-0.269
Token,
Token ID11
Feature activation+1.21e-01

Pos logit contributions
 fault+0.846
uch+0.831
ses+0.815
sed+0.800
paste+0.799
Neg logit contributions
 agree-0.997
ies-0.930
 agreement-0.924
ie-0.895
 happy-0.854
Token you
Token ID345
Feature activation-1.36e+00

Pos logit contributions
 agree+0.005
 agreement+0.005
ies+0.005
ie+0.005
 happy+0.004
Neg logit contributions
 fault-0.006
uch-0.005
ses-0.005
sed-0.005
paste-0.005
Token should
Token ID815
Feature activation-6.85e+00

Pos logit contributions
 fault+0.062
ses+0.060
sed+0.060
gh+0.059
paste+0.058
Neg logit contributions
 agree-0.061
ies-0.054
 agreement-0.054
ie-0.051
 happy-0.051
Token've
Token ID1053
Feature activation+7.88e-01

Pos logit contributions
 fault+0.300
uch+0.282
ses+0.276
sed+0.274
paste+0.273
Neg logit contributions
 agree-0.312
 agreement-0.296
ies-0.284
ie-0.273
 happy-0.262
Token been
Token ID587
Feature activation-6.34e+00

Pos logit contributions
 agree+0.042
 agreement+0.038
 happy+0.037
 play+0.036
ies+0.036
Neg logit contributions
ses-0.029
 fault-0.029
paste-0.028
sed-0.028
osaurus-0.028
Token and
Token ID290
Feature activation-4.86e-01

Pos logit contributions
 agree+0.003
 agreement+0.003
ies+0.003
ie+0.003
 happy+0.003
Neg logit contributions
 fault-0.005
uch-0.005
ses-0.005
sed-0.004
paste-0.004
Token it
Token ID340
Feature activation-3.39e+00

Pos logit contributions
 fault+0.024
uch+0.022
ses+0.022
sed+0.022
paste+0.022
Neg logit contributions
 agree-0.020
 agreement-0.018
ies-0.018
 happy-0.017
ie-0.017
Token was
Token ID373
Feature activation-3.54e+00

Pos logit contributions
 fault+0.159
ses+0.155
gency+0.149
sed+0.149
uch+0.146
Neg logit contributions
 agree-0.137
ie-0.130
ies-0.130
 happy-0.124
 agreement-0.121
Token his
Token ID465
Feature activation+4.77e-01

Pos logit contributions
uch+0.155
ses+0.152
 fault+0.152
sed+0.150
paste+0.149
Neg logit contributions
 agree-0.154
 agreement-0.152
 happy-0.147
ies-0.139
ie-0.137
Token own
Token ID898
Feature activation+5.57e+00

Pos logit contributions
 agree+0.016
 agreement+0.014
ies+0.014
 family+0.014
 happy+0.014
Neg logit contributions
uch-0.026
paste-0.026
osaurus-0.026
gency-0.025
 fault-0.025
Token fault
Token ID8046
Feature activation-2.08e+01

Pos logit contributions
 agree+0.194
 agreement+0.182
ies+0.172
 happy+0.159
ie+0.159
Neg logit contributions
uch-0.297
 fault-0.294
ses-0.288
paste-0.287
sed-0.284
Token.
Token ID13
Feature activation+3.61e-01

Pos logit contributions
 fault+0.981
sed+0.885
paste+0.878
uch+0.876
ses+0.876
Neg logit contributions
 agree-0.892
 agreement-0.833
ies-0.774
reci-0.763
ie-0.738
Token
Token ID198
Feature activation+1.05e+00

Pos logit contributions
 agree+0.014
ies+0.013
 agreement+0.013
ie+0.012
 forth+0.011
Neg logit contributions
 fault-0.019
gency-0.017
uch-0.017
OP-0.017
paste-0.017
Token
Token ID198
Feature activation+1.05e+00

Pos logit contributions
ies+0.041
 agree+0.040
 agreement+0.040
ie+0.036
 forth+0.035
Neg logit contributions
 fault-0.051
gh-0.049
uch-0.049
OP-0.047
gency-0.046
TokenJohnny
Token ID44960
Feature activation-1.32e+00

Pos logit contributions
 agree+0.045
ies+0.045
 agreement+0.043
ie+0.041
ashing+0.038
Neg logit contributions
gh-0.046
uch-0.045
OP-0.043
 fault-0.043
rain-0.042
Token smiled
Token ID13541
Feature activation-8.75e-01

Pos logit contributions
 fault+0.067
ses+0.063
uch+0.062
paste+0.061
sed+0.061
Neg logit contributions
 agree-0.054
 agreement-0.049
ies-0.048
ie-0.046
 happy-0.045
Token He
Token ID679
Feature activation+9.29e-01

Pos logit contributions
 agree+0.003
 agreement+0.003
ies+0.002
ie+0.002
 happy+0.002
Neg logit contributions
 fault-0.005
uch-0.005
paste-0.005
ses-0.005
 tresp-0.005
Token realized
Token ID6939
Feature activation-1.25e+00

Pos logit contributions
 agree+0.032
 agreement+0.028
ie+0.028
 happy+0.027
ies+0.026
Neg logit contributions
 fault-0.056
uch-0.052
paste-0.051
ses-0.050
osaurus-0.049
Token it
Token ID340
Feature activation-3.24e+00

Pos logit contributions
 fault+0.065
uch+0.064
ses+0.064
sed+0.063
OP+0.059
Neg logit contributions
 agree-0.047
ie-0.044
ies-0.044
 happy-0.043
 agreement-0.043
Token was
Token ID373
Feature activation-3.58e+00

Pos logit contributions
ses+0.199
 fault+0.193
sed+0.193
paste+0.192
gency+0.192
Neg logit contributions
ie-0.104
ies-0.084
my-0.082
 play-0.075
 day-0.074
Token his
Token ID465
Feature activation+6.90e-01

Pos logit contributions
lor+0.179
paste+0.173
OP+0.171
gency+0.169
uch+0.168
Neg logit contributions
 happy-0.161
 family-0.140
 day-0.140
ie-0.140
 agreement-0.136
Token fault
Token ID8046
Feature activation-2.08e+01

Pos logit contributions
 agree+0.028
 family+0.027
 happy+0.027
ies+0.025
 agreement+0.025
Neg logit contributions
paste-0.035
uch-0.033
gency-0.032
osaurus-0.032
ñ-0.032
Token and
Token ID290
Feature activation-5.01e-01

Pos logit contributions
 fault+1.071
uch+1.010
ses+1.003
sed+0.999
paste+0.992
Neg logit contributions
 agree-0.793
 agreement-0.737
ies-0.695
ie-0.663
 happy-0.631
Token he
Token ID339
Feature activation+3.26e+00

Pos logit contributions
 fault+0.028
uch+0.027
ses+0.026
paste+0.026
sed+0.026
Neg logit contributions
 agree-0.017
 agreement-0.016
ies-0.016
ie-0.015
 happy-0.014
Token felt
Token ID2936
Feature activation-2.13e+00

Pos logit contributions
 agree+0.116
ie+0.112
ies+0.105
 happy+0.103
 family+0.098
Neg logit contributions
 fault-0.181
gency-0.172
ses-0.171
paste-0.167
gh-0.165
Token regret
Token ID13721
Feature activation-2.89e+00

Pos logit contributions
 fault+0.102
osaurus+0.101
uch+0.101
sed+0.100
ses+0.100
Neg logit contributions
 happy-0.089
 agree-0.088
 agreement-0.087
ie-0.075
 family-0.075
Token.
Token ID13
Feature activation+9.18e-02

Pos logit contributions
 fault+0.157
uch+0.157
ses+0.152
OP+0.150
sed+0.149
Neg logit contributions
 agree-0.094
ies-0.092
 agreement-0.087
ie-0.086
 happy-0.086
Token "
Token ID366
Feature activation+1.16e+00

Pos logit contributions
 agree+0.000
 agreement+0.000
ies+0.000
ie+0.000
ashing+0.000
Neg logit contributions
 fault-0.000
sed-0.000
paste-0.000
ses-0.000
uch-0.000
TokenIt
Token ID1026
Feature activation-2.47e+00

Pos logit contributions
 agree+0.062
 agreement+0.057
ies+0.056
ie+0.054
ashing+0.052
Neg logit contributions
paste-0.040
 fault-0.039
uch-0.039
 happens-0.036
ses-0.036
Token's
Token ID338
Feature activation+1.13e+00

Pos logit contributions
gency+0.144
 fault+0.143
ses+0.141
gh+0.141
sed+0.141
Neg logit contributions
ie-0.077
ies-0.071
 happy-0.070
 agree-0.069
 family-0.066
Token not
Token ID407
Feature activation-7.20e+00

Pos logit contributions
 agree+0.047
 agreement+0.046
 happy+0.044
ies+0.044
ie+0.043
Neg logit contributions
 fault-0.053
sed-0.052
ses-0.052
gency-0.051
osaurus-0.051
Token your
Token ID534
Feature activation-9.79e-01

Pos logit contributions
 fault+0.329
uch+0.323
sed+0.319
ses+0.319
paste+0.317
Neg logit contributions
 agree-0.309
ies-0.289
 agreement-0.289
 happy-0.284
ie-0.273
Token fault
Token ID8046
Feature activation-2.07e+01

Pos logit contributions
uch+0.019
gency+0.018
paste+0.018
osaurus+0.017
RR+0.017
Neg logit contributions
 agree-0.069
ies-0.067
 family-0.066
 happy-0.065
 agreement-0.064
Token,
Token ID11
Feature activation+8.65e-03

Pos logit contributions
 fault+1.050
ses+0.957
sed+0.954
uch+0.953
paste+0.952
Neg logit contributions
 agree-0.818
 agreement-0.778
ies-0.698
reci-0.676
ie-0.675
Token honey
Token ID12498
Feature activation-2.03e+00

Pos logit contributions
 agree+0.000
 agreement+0.000
ies+0.000
ie+0.000
 happy+0.000
Neg logit contributions
 fault-0.000
uch-0.000
ses-0.000
sed-0.000
paste-0.000
Token.
Token ID13
Feature activation+1.76e-01

Pos logit contributions
 fault+0.107
uch+0.104
ses+0.103
sed+0.102
gency+0.101
Neg logit contributions
 agree-0.072
ies-0.066
 agreement-0.065
ie-0.062
 happy-0.058
Token Ben
Token ID3932
Feature activation-3.02e+00

Pos logit contributions
 agree+0.007
 agreement+0.007
ies+0.007
 family+0.007
ie+0.006
Neg logit contributions
gh-0.008
uch-0.008
 fault-0.007
osaurus-0.007
 tresp-0.007
Token made
Token ID925
Feature activation-5.02e+00

Pos logit contributions
 fault+0.117
ses+0.110
sed+0.108
paste+0.108
uch+0.107
Neg logit contributions
 agree-0.154
ies-0.143
 agreement-0.140
ie-0.139
 happy-0.136
TokenNo
Token ID2949
Feature activation-6.19e+00

Pos logit contributions
 agree+0.239
 agreement+0.220
 happy+0.202
ies+0.196
 particular+0.189
Neg logit contributions
 fault-0.285
uch-0.257
 tresp-0.256
ses-0.256
gency-0.255
Token,
Token ID11
Feature activation-8.40e-02

Pos logit contributions
paste+0.441
uch+0.426
angs+0.423
ses+0.417
 fault+0.414
Neg logit contributions
 agree-0.143
ies-0.115
 agreement-0.112
ie-0.103
 family-0.092
Token it
Token ID340
Feature activation-3.53e+00

Pos logit contributions
 fault+0.005
uch+0.005
osaurus+0.005
OP+0.004
gency+0.004
Neg logit contributions
 agree-0.003
 agreement-0.003
 happy-0.003
 family-0.002
ies-0.002
Token's
Token ID338
Feature activation+9.42e-01

Pos logit contributions
gency+0.193
 fault+0.192
paste+0.183
ses+0.182
ifier+0.177
Neg logit contributions
ie-0.121
ies-0.117
 agree-0.110
 happy-0.108
 play-0.105
Token your
Token ID534
Feature activation-1.11e+00

Pos logit contributions
 happy+0.044
 forth+0.038
ies+0.038
 agree+0.037
 friends+0.036
Neg logit contributions
lor-0.049
paste-0.044
gency-0.042
osaurus-0.041
flake-0.040
Token fault
Token ID8046
Feature activation-2.07e+01

Pos logit contributions
paste+0.041
uch+0.039
gency+0.038
 happens+0.037
lor+0.036
Neg logit contributions
 agree-0.058
ies-0.054
 agreement-0.054
 happy-0.054
 family-0.054
Token,
Token ID11
Feature activation-7.12e-02

Pos logit contributions
 fault+0.688
uch+0.662
paste+0.638
ses+0.633
gency+0.630
Neg logit contributions
 agree-1.169
ies-1.115
 agreement-1.087
ie-1.067
 happy-1.022
Token Lily
Token ID20037
Feature activation+2.09e+00

Pos logit contributions
uch+0.003
 fault+0.003
gency+0.003
OP+0.003
osaurus+0.003
Neg logit contributions
 agree-0.003
 happy-0.003
 agreement-0.003
ies-0.003
ie-0.003
Token!"
Token ID2474
Feature activation-7.54e+00

Pos logit contributions
 agree+0.095
 agreement+0.087
ies+0.086
ie+0.083
 happy+0.079
Neg logit contributions
 fault-0.093
uch-0.088
ses-0.088
sed-0.087
paste-0.087
Token Ben
Token ID3932
Feature activation-3.08e+00

Pos logit contributions
 fault+0.328
ses+0.308
sed+0.304
uch+0.300
paste+0.300
Neg logit contributions
 agree-0.344
 agreement-0.328
ies-0.319
ie-0.310
ashing-0.288
Token said
Token ID531
Feature activation-6.58e-01

Pos logit contributions
 fault+0.208
uch+0.193
ses+0.189
gh+0.187
gency+0.186
Neg logit contributions
 agree-0.092
 agreement-0.073
ies-0.071
 happy-0.068
ie-0.065
Token "
Token ID366
Feature activation+1.37e+00

Pos logit contributions
 agree+0.010
 agreement+0.009
ies+0.009
ie+0.008
 happy+0.008
Neg logit contributions
 fault-0.012
uch-0.011
ses-0.011
sed-0.011
paste-0.011
TokenIt
Token ID1026
Feature activation-2.22e+00

Pos logit contributions
 agree+0.063
 agreement+0.059
ies+0.056
ie+0.055
 happy+0.053
Neg logit contributions
 fault-0.061
ses-0.057
sed-0.057
uch-0.056
 tresp-0.055
Token's
Token ID338
Feature activation+1.22e+00

Pos logit contributions
gh+0.106
gency+0.104
 fault+0.104
ses+0.102
sed+0.101
Neg logit contributions
ie-0.088
ies-0.088
 agree-0.084
 happy-0.078
 ask-0.075
Token not
Token ID407
Feature activation-7.00e+00

Pos logit contributions
 happy+0.053
 family+0.050
ies+0.048
ie+0.048
 day+0.047
Neg logit contributions
lor-0.054
osaurus-0.052
 fault-0.051
gency-0.051
paste-0.050
Token my
Token ID616
Feature activation-1.90e+00

Pos logit contributions
uch+0.347
sed+0.343
ses+0.339
 fault+0.338
paste+0.337
Neg logit contributions
 agree-0.275
 happy-0.267
ies-0.258
 agreement-0.250
ie-0.239
Token fault
Token ID8046
Feature activation-2.07e+01

Pos logit contributions
paste+0.090
osaurus+0.088
uch+0.086
RR+0.083
gency+0.082
Neg logit contributions
 agree-0.084
ies-0.079
 family-0.078
 happy-0.071
 play-0.071
Token.
Token ID13
Feature activation+2.77e-01

Pos logit contributions
 fault+0.905
uch+0.879
ses+0.864
sed+0.852
paste+0.851
Neg logit contributions
 agree-0.933
ies-0.870
 agreement-0.867
ie-0.832
 happy-0.795
Token I
Token ID314
Feature activation-1.85e+00

Pos logit contributions
 agree+0.014
 agreement+0.013
ies+0.012
ie+0.012
 happy+0.012
Neg logit contributions
 fault-0.011
uch-0.010
gh-0.010
sed-0.010
ses-0.010
Token can
Token ID460
Feature activation-5.17e+00

Pos logit contributions
 fault+0.085
ses+0.081
paste+0.081
sed+0.080
gh+0.080
Neg logit contributions
 agree-0.082
ies-0.072
 agreement-0.072
ie-0.070
 happy-0.068
Token't
Token ID470
Feature activation-1.04e+01

Pos logit contributions
 fault+0.157
ses+0.141
sed+0.138
paste+0.136
uch+0.134
Neg logit contributions
 agree-0.317
 agreement-0.297
ie-0.286
ies-0.286
 happy-0.277
Token help
Token ID1037
Feature activation-8.22e-01

Pos logit contributions
uch+0.560
 fault+0.547
paste+0.527
gh+0.522
 tresp+0.520
Neg logit contributions
 agreement-0.326
 agree-0.317
ies-0.306
ashing-0.270
reci-0.267
Token must
Token ID1276
Feature activation-9.40e+00

Pos logit contributions
 fault+0.060
sed+0.058
ses+0.058
gh+0.057
paste+0.056
Neg logit contributions
 agree-0.067
 agreement-0.057
ies-0.055
 happy-0.054
ie-0.054
Token promise
Token ID6991
Feature activation+5.96e+00

Pos logit contributions
 fault+0.469
uch+0.446
gh+0.425
 tresp+0.420
ses+0.417
Neg logit contributions
 agree-0.350
 agreement-0.342
ies-0.339
ie-0.316
ashing-0.312
Token me
Token ID502
Feature activation+7.14e-01

Pos logit contributions
 agree+0.246
ie+0.219
 ask+0.215
ies+0.213
 play+0.212
Neg logit contributions
ses-0.298
paste-0.293
sed-0.285
 fault-0.283
osaurus-0.280
Token not
Token ID407
Feature activation-7.16e+00

Pos logit contributions
 agree+0.033
ie+0.030
ies+0.030
 agreement+0.030
 happy+0.029
Neg logit contributions
ses-0.032
sed-0.031
 fault-0.030
osaurus-0.029
 CA-0.028
Token to
Token ID284
Feature activation-5.49e+00

Pos logit contributions
 fault+0.255
ses+0.240
uch+0.240
sed+0.237
paste+0.236
Neg logit contributions
 agree-0.387
 agreement-0.360
ies-0.354
ie-0.341
 happy-0.335
Token take
Token ID1011
Feature activation+9.23e+00

Pos logit contributions
 fault+0.296
uch+0.283
ses+0.279
sed+0.277
paste+0.276
Neg logit contributions
 agree-0.195
 agreement-0.181
ies-0.172
ie-0.163
 happy-0.155
Token any
Token ID597
Feature activation-4.41e+00

Pos logit contributions
 agree+0.385
 happy+0.367
 agreement+0.357
 family+0.340
ie+0.339
Neg logit contributions
sed-0.449
ses-0.447
 fault-0.422
tr-0.415
osaurus-0.410
Token away
Token ID1497
Feature activation-6.80e-01

Pos logit contributions
 fault+0.195
ses+0.185
uch+0.184
sed+0.184
paste+0.179
Neg logit contributions
 agree-0.200
 agreement-0.188
ies-0.182
ie-0.176
 happy-0.170
Token."
Token ID526
Feature activation-4.65e+00

Pos logit contributions
 fault+0.031
ses+0.029
sed+0.029
uch+0.029
paste+0.028
Neg logit contributions
 agree-0.030
 agreement-0.028
ies-0.028
ie-0.027
 happy-0.026
Token
Token ID220
Feature activation+1.69e+00

Pos logit contributions
 fault+0.217
paste+0.209
uch+0.209
gency+0.207
OP+0.207
Neg logit contributions
ies-0.174
 agreement-0.164
 agree-0.163
ashing-0.156
ie-0.154
Token
Token ID198
Feature activation+8.32e-01

Pos logit contributions
 agree+0.052
 agreement+0.050
ies+0.046
ie+0.043
 happy+0.039
Neg logit contributions
 fault-0.099
uch-0.093
ses-0.092
paste-0.092
sed-0.091
Token the
Token ID262
Feature activation+1.51e+00

Pos logit contributions
 agree+0.168
 agreement+0.157
ies+0.149
ie+0.143
 happy+0.136
Neg logit contributions
 fault-0.216
uch-0.205
ses-0.202
sed-0.201
paste-0.200
Token store
Token ID3650
Feature activation+5.09e+00

Pos logit contributions
 agree+0.061
 agreement+0.056
ies+0.055
ie+0.052
 happy+0.049
Neg logit contributions
 fault-0.075
uch-0.070
ses-0.070
sed-0.069
 tresp-0.068
Token.
Token ID13
Feature activation-1.24e-01

Pos logit contributions
 agree+0.215
 agreement+0.209
ies+0.191
ie+0.185
reci+0.185
Neg logit contributions
 fault-0.237
 tresp-0.212
sed-0.208
uch-0.208
gh-0.207
Token<|endoftext|>
Token ID50256
Feature activation+3.66e+00

Pos logit contributions
 fault+0.006
OP+0.005
 Ow+0.005
ses+0.005
 CA+0.005
Neg logit contributions
ies-0.006
 agree-0.005
ie-0.005
 forth-0.005
 agreement-0.005
TokenOnce
Token ID7454
Feature activation+3.70e+00

Pos logit contributions
 agree+0.114
 agreement+0.104
ies+0.097
ie+0.093
 happy+0.087
Neg logit contributions
 fault-0.214
uch-0.204
ses-0.203
sed-0.202
paste-0.201
Token upon
Token ID2402
Feature activation+9.81e+00

Pos logit contributions
 agreement+0.127
 agree+0.124
 forth+0.106
reci+0.101
acking+0.100
Neg logit contributions
 fault-0.212
 nothing-0.181
 wrong-0.180
 an-0.176
gh-0.174
Token a
Token ID257
Feature activation-8.21e+00

Pos logit contributions
 agree+0.544
 happy+0.534
 agreement+0.515
 play+0.492
 family+0.480
Neg logit contributions
 fault-0.359
sed-0.339
ses-0.332
uch-0.324
gency-0.319
Token time
Token ID640
Feature activation-2.06e+00

Pos logit contributions
 fault+0.361
uch+0.344
gh+0.336
ses+0.328
sed+0.327
Neg logit contributions
 agree-0.363
 agreement-0.348
ies-0.344
ie-0.321
reci-0.305
Token,
Token ID11
Feature activation+9.00e-02

Pos logit contributions
 fault+0.109
uch+0.104
sed+0.103
ses+0.102
gh+0.102
Neg logit contributions
 agree-0.075
 agreement-0.071
ies-0.065
ie-0.061
reci-0.059
Token there
Token ID612
Feature activation+6.82e+00

Pos logit contributions
 agree+0.005
 agreement+0.004
ies+0.004
reci+0.004
ogged+0.004
Neg logit contributions
 fault-0.003
uch-0.003
sed-0.003
ses-0.003
gh-0.003
Token was
Token ID373
Feature activation-3.72e+00

Pos logit contributions
 agree+0.322
 agreement+0.303
ies+0.292
ie+0.287
 happy+0.277
Neg logit contributions
 fault-0.290
uch-0.271
ses-0.270
paste-0.269
sed-0.267
Token dinner
Token ID8073
Feature activation-3.09e-01

Pos logit contributions
 agree+0.387
 happy+0.371
 family+0.370
 agreement+0.363
ie+0.348
Neg logit contributions
ses-0.380
sed-0.370
 fault-0.370
uch-0.352
gency-0.351
Token together
Token ID1978
Feature activation+5.41e+00

Pos logit contributions
 fault+0.018
ses+0.017
uch+0.017
sed+0.017
 tresp+0.017
Neg logit contributions
 agree-0.010
 agreement-0.009
ies-0.008
ie-0.008
 happy-0.008
Token.
Token ID13
Feature activation+1.51e-01

Pos logit contributions
 agree+0.205
 agreement+0.189
ies+0.180
ie+0.172
 happy+0.165
Neg logit contributions
 fault-0.283
uch-0.270
ses-0.267
sed-0.265
paste-0.263
Token<|endoftext|>
Token ID50256
Feature activation+4.00e+00

Pos logit contributions
 agree+0.007
ies+0.007
 agreement+0.007
 forth+0.007
ie+0.006
Neg logit contributions
 fault-0.006
 Ow-0.006
 CA-0.006
tr-0.006
ses-0.006
TokenOnce
Token ID7454
Feature activation+3.96e+00

Pos logit contributions
 agree+0.092
 agreement+0.080
ies+0.072
ie+0.070
 happy+0.066
Neg logit contributions
 fault-0.267
ses-0.256
uch-0.255
sed-0.255
paste-0.253
Token upon
Token ID2402
Feature activation+1.01e+01

Pos logit contributions
 agreement+0.142
 agree+0.135
 forth+0.114
reci+0.107
 towns+0.107
Neg logit contributions
 fault-0.224
 happens-0.192
 Ow-0.190
 nothing-0.188
 wrong-0.187
Token a
Token ID257
Feature activation-8.08e+00

Pos logit contributions
 agree+0.576
 happy+0.556
 agreement+0.543
 play+0.519
ies+0.514
Neg logit contributions
 fault-0.357
sed-0.331
ses-0.327
gency-0.315
uch-0.315
Token time
Token ID640
Feature activation-1.91e+00

Pos logit contributions
 fault+0.346
uch+0.328
gh+0.322
sed+0.311
ses+0.310
Neg logit contributions
 agree-0.363
 agreement-0.353
ies-0.348
ie-0.324
 forth-0.316
Token,
Token ID11
Feature activation+4.04e-01

Pos logit contributions
 fault+0.100
uch+0.095
sed+0.093
ses+0.093
gh+0.093
Neg logit contributions
 agree-0.072
 agreement-0.068
ies-0.062
ie-0.059
reci-0.057
Token there
Token ID612
Feature activation+6.87e+00

Pos logit contributions
 agree+0.021
 agreement+0.020
ies+0.018
reci+0.018
ogged+0.017
Neg logit contributions
uch-0.015
 fault-0.014
sed-0.014
ses-0.014
 nothing-0.014
Token was
Token ID373
Feature activation-3.40e+00

Pos logit contributions
 agree+0.311
 agreement+0.289
ies+0.281
ie+0.280
 happy+0.270
Neg logit contributions
 fault-0.308
paste-0.288
ses-0.286
uch-0.284
sed-0.282
Token girl
Token ID2576
Feature activation+7.34e+00

Pos logit contributions
 agree+0.150
 agreement+0.119
ies+0.099
ogged+0.097
ashing+0.095
Neg logit contributions
 fault-0.483
ses-0.470
sed-0.468
paste-0.464
uch-0.456
Token was
Token ID373
Feature activation-3.68e+00

Pos logit contributions
 agree+0.303
 agreement+0.285
ies+0.271
ie+0.260
 happy+0.248
Neg logit contributions
 fault-0.351
uch-0.333
ses-0.329
sed-0.328
paste-0.326
Token so
Token ID523
Feature activation-9.68e+00

Pos logit contributions
 fault+0.169
uch+0.163
ses+0.159
paste+0.159
sed+0.157
Neg logit contributions
 agree-0.162
 agreement-0.153
 happy-0.144
ies-0.143
ie-0.137
Token en
Token ID551
Feature activation+1.13e+00

Pos logit contributions
 fault+0.554
uch+0.529
sed+0.526
ses+0.523
 tricked+0.512
Neg logit contributions
 agreement-0.276
 agree-0.263
ie-0.261
ies-0.247
 forth-0.225
Tokenvious
Token ID1442
Feature activation-6.88e+00

Pos logit contributions
 agree+0.069
 agreement+0.068
ies+0.065
 family+0.063
ie+0.061
Neg logit contributions
 fault-0.028
uch-0.028
 happens-0.026
paste-0.025
 tricked-0.024
Token of
Token ID286
Feature activation+1.00e+01

Pos logit contributions
 fault+0.415
uch+0.394
ses+0.391
paste+0.391
sed+0.390
Neg logit contributions
 agree-0.203
 agreement-0.187
ies-0.171
ie-0.160
 happy-0.151
Token the
Token ID262
Feature activation+1.42e+00

Pos logit contributions
 agree+0.311
 agreement+0.296
ie+0.271
ies+0.268
 happy+0.255
Neg logit contributions
 fault-0.584
uch-0.573
ses-0.558
sed-0.558
paste-0.557
Token creatures
Token ID8109
Feature activation+3.33e+00

Pos logit contributions
 agree+0.061
 agreement+0.058
ies+0.056
ie+0.053
 happy+0.052
Neg logit contributions
 fault-0.067
uch-0.063
paste-0.062
ses-0.062
sed-0.061
Token enjoying
Token ID13226
Feature activation+4.32e+00

Pos logit contributions
 agree+0.136
 agreement+0.129
ies+0.120
ie+0.113
reci+0.110
Neg logit contributions
 fault-0.158
paste-0.147
uch-0.146
 tresp-0.146
sed-0.144
Token themselves
Token ID2405
Feature activation+1.82e+00

Pos logit contributions
 agree+0.142
 agreement+0.133
ies+0.123
 happy+0.121
ie+0.116
Neg logit contributions
 fault-0.249
uch-0.246
paste-0.240
ses-0.234
sed-0.233
Token in
Token ID287
Feature activation+1.52e+00

Pos logit contributions
 agree+0.074
 agreement+0.070
ies+0.065
ie+0.062
ogged+0.061
Neg logit contributions
 fault-0.086
uch-0.080
paste-0.079
gh-0.078
 tresp-0.078
Token away
Token ID1497
Feature activation-6.39e-01

Pos logit contributions
 fault+0.054
 tresp+0.051
 nothing+0.051
gh+0.051
ze+0.050
Neg logit contributions
 agree-0.055
reci-0.050
ie-0.050
ies-0.049
 agreement-0.049
Token.
Token ID13
Feature activation+1.42e-01

Pos logit contributions
 fault+0.030
uch+0.028
paste+0.028
ses+0.028
 tresp+0.028
Neg logit contributions
 agree-0.027
 agreement-0.025
ies-0.024
ie-0.023
reci-0.022
Token
Token ID220
Feature activation+1.75e+00

Pos logit contributions
ies+0.007
 agree+0.006
ie+0.006
 agreement+0.006
 forth+0.006
Neg logit contributions
 fault-0.006
 Ow-0.006
OP-0.006
 nothing-0.006
ses-0.006
Token
Token ID198
Feature activation+8.97e-01

Pos logit contributions
 agreement+0.061
 agree+0.060
ies+0.057
ie+0.055
 forth+0.051
Neg logit contributions
 fault-0.098
uch-0.089
ses-0.088
sed-0.087
OP-0.087
Token
Token ID198
Feature activation+8.87e-01

Pos logit contributions
ies+0.042
 forth+0.039
 agreement+0.038
ie+0.038
 agree+0.037
Neg logit contributions
gh-0.037
OP-0.035
uch-0.035
 fault-0.034
 Ow-0.034
TokenLater
Token ID18602
Feature activation+1.08e+01

Pos logit contributions
ies+0.039
 agree+0.036
ie+0.036
 agreement+0.034
 forth+0.032
Neg logit contributions
gh-0.040
uch-0.040
OP-0.038
paste-0.038
ze-0.038
Token that
Token ID326
Feature activation-1.10e+00

Pos logit contributions
 agree+0.570
 agreement+0.542
ies+0.526
ie+0.514
 happy+0.495
Neg logit contributions
 fault-0.404
uch-0.368
ses-0.368
sed-0.362
paste-0.359
Token day
Token ID1110
Feature activation+5.63e+00

Pos logit contributions
uch+0.065
 fault+0.063
 happens+0.062
gh+0.061
 burns+0.059
Neg logit contributions
 agree-0.032
 agreement-0.030
ies-0.027
ashing-0.025
ogged-0.024
Token,
Token ID11
Feature activation+7.60e-02

Pos logit contributions
 agree+0.218
 agreement+0.203
ies+0.184
ie+0.177
reci+0.168
Neg logit contributions
 fault-0.282
uch-0.278
paste-0.272
gh-0.272
 tresp-0.270
Token the
Token ID262
Feature activation+1.51e+00

Pos logit contributions
 agree+0.003
 agreement+0.003
ies+0.003
ie+0.003
reci+0.002
Neg logit contributions
 fault-0.004
uch-0.004
sed-0.004
ses-0.004
paste-0.004
Token mouse
Token ID10211
Feature activation-6.03e+00

Pos logit contributions
 agree+0.085
 agreement+0.079
ies+0.078
ie+0.075
ogged+0.074
Neg logit contributions
 fault-0.050
ses-0.048
sed-0.048
uch-0.048
paste-0.046
Token about
Token ID546
Feature activation+5.53e+00

Pos logit contributions
 agree+0.175
 agreement+0.170
ies+0.164
ashing+0.156
rolling+0.154
Neg logit contributions
 fault-0.145
 tresp-0.139
 happens-0.136
ps-0.124
 electricity-0.123
Token the
Token ID262
Feature activation+1.58e+00

Pos logit contributions
 agree+0.200
 agreement+0.187
ies+0.174
ie+0.166
 happy+0.166
Neg logit contributions
 fault-0.294
uch-0.290
paste-0.283
ses-0.281
sed-0.280
Token distant
Token ID12899
Feature activation+4.22e+00

Pos logit contributions
 agree+0.067
 agreement+0.063
ies+0.060
 happy+0.058
ie+0.057
Neg logit contributions
 fault-0.075
paste-0.072
uch-0.071
ses-0.070
sed-0.070
Token shark
Token ID21027
Feature activation-1.04e+01

Pos logit contributions
 agree+0.225
 agreement+0.213
ies+0.206
ie+0.200
 happy+0.193
Neg logit contributions
 fault-0.155
uch-0.144
ses-0.141
sed-0.140
paste-0.140
Token.
Token ID13
Feature activation+1.91e-01

Pos logit contributions
 fault+0.503
uch+0.476
ses+0.469
sed+0.467
paste+0.466
Neg logit contributions
 agree-0.422
 agreement-0.397
ies-0.376
ie-0.360
 happy-0.343
Token From
Token ID3574
Feature activation+4.79e+00

Pos logit contributions
 agree+0.009
ies+0.009
 agreement+0.008
ie+0.008
 forth+0.008
Neg logit contributions
 fault-0.008
 Ow-0.008
 nothing-0.008
 CA-0.008
ses-0.007
Token that
Token ID326
Feature activation-9.46e-01

Pos logit contributions
 agree+0.258
 agreement+0.246
ies+0.237
ie+0.229
 happy+0.222
Neg logit contributions
 fault-0.171
uch-0.160
ses-0.157
sed-0.155
paste-0.154
Token day
Token ID1110
Feature activation+5.61e+00

Pos logit contributions
uch+0.061
 fault+0.060
gh+0.059
 tresp+0.058
ses+0.057
Neg logit contributions
 agree-0.023
ies-0.020
 agreement-0.019
eline-0.016
ogged-0.016
Token on
Token ID319
Feature activation-1.59e-01

Pos logit contributions
 agree+0.327
 agreement+0.307
ies+0.295
ie+0.285
reci+0.274
Neg logit contributions
uch-0.181
gh-0.173
 fault-0.170
 tresp-0.165
paste-0.164
Token,
Token ID11
Feature activation+1.63e-01

Pos logit contributions
 fault+0.009
uch+0.009
ses+0.009
sed+0.009
paste+0.009
Neg logit contributions
 agree-0.005
 agreement-0.005
ies-0.004
ie-0.004
 happy-0.004
Token Tim
Token ID5045
Feature activation-4.72e-01

Pos logit contributions
 agree+0.009
 agreement+0.009
ies+0.008
ashing+0.008
eline+0.007
Neg logit contributions
 anything-0.006
 nothing-0.006
 fault-0.006
 no-0.006
 an-0.006
Token.
Token ID13
Feature activation-1.27e-01

Pos logit contributions
uch+0.091
 fault+0.091
ses+0.089
sed+0.089
gency+0.087
Neg logit contributions
 agree-0.056
ies-0.054
 agreement-0.053
ie-0.051
 happy-0.047
Token She
Token ID1375
Feature activation+2.86e+00

Pos logit contributions
 fault+0.007
ses+0.007
sed+0.007
uch+0.006
paste+0.006
Neg logit contributions
 agree-0.004
ies-0.004
 agreement-0.004
ie-0.004
 forth-0.003
Token wanted
Token ID2227
Feature activation+2.08e+00

Pos logit contributions
 agree+0.109
 agreement+0.100
ies+0.096
ie+0.093
 happy+0.090
Neg logit contributions
 fault-0.151
uch-0.143
ses-0.141
sed-0.138
paste-0.138
Token to
Token ID284
Feature activation-5.59e+00

Pos logit contributions
 agree+0.089
 agreement+0.083
 happy+0.083
 family+0.082
ie+0.080
Neg logit contributions
uch-0.098
osaurus-0.095
ses-0.095
sed-0.093
gency-0.090
Token stay
Token ID2652
Feature activation+3.32e+00

Pos logit contributions
 fault+0.317
uch+0.300
 tresp+0.295
ses+0.292
sed+0.290
Neg logit contributions
 agree-0.175
 agreement-0.165
ies-0.162
ie-0.147
ashing-0.146
Token dry
Token ID5894
Feature activation+4.28e+00

Pos logit contributions
 agree+0.143
 agreement+0.129
ies+0.122
reci+0.121
ie+0.120
Neg logit contributions
 fault-0.152
uch-0.136
 tresp-0.136
sed-0.135
ses-0.134
Token so
Token ID523
Feature activation-9.71e+00

Pos logit contributions
 agree+0.139
 agreement+0.128
ies+0.125
ie+0.120
 happy+0.113
Neg logit contributions
 fault-0.242
uch-0.236
ses-0.234
sed-0.231
paste-0.228
Token she
Token ID673
Feature activation+5.27e+00

Pos logit contributions
 fault+0.364
 nothing+0.364
uch+0.348
 anything+0.329
sed+0.323
Neg logit contributions
 agree-0.483
 agreement-0.467
ies-0.422
reci-0.415
acking-0.406
Token started
Token ID2067
Feature activation+2.51e+00

Pos logit contributions
 agree+0.235
ies+0.216
 agreement+0.215
ie+0.212
 happy+0.203
Neg logit contributions
 fault-0.243
ses-0.225
uch-0.224
osaurus-0.219
sed-0.218
Token to
Token ID284
Feature activation-5.60e+00

Pos logit contributions
 agree+0.128
 agreement+0.121
ies+0.118
ie+0.114
 happy+0.112
Neg logit contributions
 fault-0.095
uch-0.092
ses-0.090
sed-0.088
paste-0.088
Token cry
Token ID3960
Feature activation-1.47e+00

Pos logit contributions
 fault+0.291
 tresp+0.268
uch+0.267
ses+0.264
sed+0.262
Neg logit contributions
 agree-0.203
 agreement-0.194
ies-0.187
ie-0.176
ashing-0.169
Tokenshall
Token ID49271
Feature activation-3.59e+00

Pos logit contributions
 agree+0.058
 agreement+0.055
ies+0.051
ie+0.050
 happy+0.047
Neg logit contributions
 fault-0.077
ses-0.070
sed-0.070
uch-0.068
paste-0.068
Token we
Token ID356
Feature activation+2.03e+00

Pos logit contributions
 fault+0.203
osaurus+0.200
ps+0.193
angs+0.191
ses+0.190
Neg logit contributions
 agree-0.123
 happy-0.118
 family-0.115
 play-0.113
ie-0.110
Token go
Token ID467
Feature activation+6.09e+00

Pos logit contributions
 agree+0.066
 agreement+0.061
ies+0.057
ie+0.054
 happy+0.051
Neg logit contributions
 fault-0.115
uch-0.111
ses-0.109
sed-0.108
paste-0.108
Token?"
Token ID1701
Feature activation-7.75e+00

Pos logit contributions
 agree+0.264
 agreement+0.246
ies+0.238
ie+0.231
 happy+0.221
Neg logit contributions
 fault-0.287
ses-0.269
uch-0.268
sed-0.266
paste-0.265
Token and
Token ID290
Feature activation-6.91e-01

Pos logit contributions
 fault+0.380
sed+0.363
ses+0.362
paste+0.355
uch+0.353
Neg logit contributions
 agree-0.290
 agreement-0.281
ies-0.275
ie-0.257
reci-0.247
Token her
Token ID607
Feature activation+2.91e+00

Pos logit contributions
 fault+0.033
sed+0.033
ses+0.032
paste+0.032
uch+0.032
Neg logit contributions
 agree-0.026
 agreement-0.025
ies-0.024
ie-0.022
reci-0.022
Token dog
Token ID3290
Feature activation-3.72e+00

Pos logit contributions
 agree+0.075
ies+0.065
 agreement+0.064
 family+0.062
 happy+0.059
Neg logit contributions
 fault-0.184
uch-0.184
paste-0.180
gency-0.177
OP-0.175
Token bark
Token ID21405
Feature activation-3.96e+00

Pos logit contributions
 fault+0.189
uch+0.177
ses+0.173
paste+0.173
sed+0.169
Neg logit contributions
 agree-0.153
 agreement-0.140
ies-0.134
ie-0.129
 happy-0.125
Tokened
Token ID276
Feature activation-5.84e+00

Pos logit contributions
 fault+0.171
uch+0.158
gency+0.150
ses+0.147
 tresp+0.142
Neg logit contributions
 agree-0.201
 agreement-0.191
 happy-0.187
ies-0.183
ie-0.177
Token yes
Token ID3763
Feature activation-4.31e+00

Pos logit contributions
 fault+0.317
uch+0.300
ses+0.298
sed+0.297
paste+0.296
Neg logit contributions
 agree-0.205
 agreement-0.189
ies-0.180
ie-0.171
 happy-0.160
Token.
Token ID13
Feature activation+8.70e-02

Pos logit contributions
 fault+0.252
ses+0.252
sed+0.247
uch+0.245
paste+0.242
Neg logit contributions
 agree-0.143
ies-0.124
 agreement-0.121
ie-0.121
 happy-0.111
Token said
Token ID531
Feature activation-6.17e-01

Pos logit contributions
 agree+0.110
 agreement+0.095
ies+0.093
 happy+0.088
ie+0.087
Neg logit contributions
 fault-0.182
uch-0.174
gh-0.169
ses-0.168
gency-0.168
Token,
Token ID11
Feature activation+1.54e-01

Pos logit contributions
 fault+0.035
uch+0.034
ses+0.033
sed+0.033
paste+0.033
Neg logit contributions
 agree-0.020
 agreement-0.018
ies-0.018
ie-0.017
 happy-0.016
Token "
Token ID366
Feature activation+1.22e+00

Pos logit contributions
 agree+0.007
ies+0.007
 agreement+0.007
ie+0.007
ogged+0.006
Neg logit contributions
 fault-0.006
sed-0.006
ses-0.006
paste-0.006
gh-0.006
TokenBen
Token ID11696
Feature activation-1.70e+00

Pos logit contributions
 agree+0.060
 agreement+0.058
ies+0.057
ie+0.055
ashing+0.051
Neg logit contributions
 fault-0.046
uch-0.045
paste-0.045
ses-0.043
sed-0.042
Token,
Token ID11
Feature activation+1.59e-01

Pos logit contributions
 fault+0.105
paste+0.103
uch+0.103
ses+0.102
gency+0.101
Neg logit contributions
 agree-0.048
ies-0.041
 agreement-0.040
ie-0.038
 happy-0.037
Token I
Token ID314
Feature activation-2.06e+00

Pos logit contributions
 agree+0.009
 agreement+0.008
ies+0.008
ie+0.008
 happy+0.007
Neg logit contributions
 fault-0.006
uch-0.005
ses-0.005
sed-0.005
paste-0.005
Token am
Token ID716
Feature activation-1.00e+01

Pos logit contributions
 fault+0.107
uch+0.102
ses+0.102
sed+0.102
paste+0.101
Neg logit contributions
 agree-0.081
 agreement-0.072
ies-0.069
ie-0.065
 happy-0.065
Token glad
Token ID9675
Feature activation+7.52e+00

Pos logit contributions
 fault+0.584
uch+0.563
ses+0.553
paste+0.553
sed+0.550
Neg logit contributions
 agree-0.322
 agreement-0.297
ies-0.269
 happy-0.265
ie-0.254
Token.
Token ID13
Feature activation+3.33e-01

Pos logit contributions
 agree+0.344
ies+0.320
 agreement+0.319
ie+0.315
 happy+0.301
Neg logit contributions
 fault-0.337
uch-0.325
ses-0.315
sed-0.309
paste-0.304
Token You
Token ID921
Feature activation-4.12e+00

Pos logit contributions
 agree+0.019
 agreement+0.018
ies+0.018
ie+0.017
 happy+0.016
Neg logit contributions
 fault-0.011
ses-0.010
uch-0.010
sed-0.010
paste-0.010
Token are
Token ID389
Feature activation-1.66e+00

Pos logit contributions
 fault+0.216
ses+0.202
gh+0.202
sed+0.202
paste+0.201
Neg logit contributions
 agree-0.166
 agreement-0.145
ies-0.139
 happy-0.136
ie-0.131
Token he
Token ID339
Feature activation+3.19e+00

Pos logit contributions
 agree+0.207
 agreement+0.187
ies+0.187
ie+0.177
 happy+0.171
Neg logit contributions
 fault-0.397
uch-0.387
ses-0.383
sed-0.379
paste-0.376
Token found
Token ID1043
Feature activation+2.71e+00

Pos logit contributions
 agree+0.123
ie+0.105
ies+0.104
 play+0.100
 happy+0.098
Neg logit contributions
 fault-0.174
paste-0.161
ses-0.160
gency-0.158
sed-0.156
Token the
Token ID262
Feature activation+1.61e+00

Pos logit contributions
 play+0.092
 family+0.091
ie+0.088
 agree+0.086
 friends+0.085
Neg logit contributions
uch-0.148
ses-0.140
sed-0.139
 fault-0.138
paste-0.138
Token egg
Token ID5935
Feature activation-9.37e+00

Pos logit contributions
 family+0.065
 day+0.059
 happy+0.058
 agreement+0.057
 girl+0.053
Neg logit contributions
paste-0.085
OP-0.081
angs-0.080
 happens-0.080
ñ-0.078
Token.
Token ID13
Feature activation+4.69e-01

Pos logit contributions
 fault+0.517
uch+0.499
ses+0.497
sed+0.491
gency+0.491
Neg logit contributions
 agree-0.315
ies-0.296
ie-0.282
 agreement-0.277
 happy-0.260
Token He
Token ID679
Feature activation+1.01e+00

Pos logit contributions
 agree+0.017
 agreement+0.016
ies+0.015
ie+0.015
 happy+0.014
Neg logit contributions
 fault-0.025
uch-0.024
ses-0.023
paste-0.023
sed-0.023
Token knew
Token ID2993
Feature activation-2.12e+00

Pos logit contributions
 agree+0.033
ies+0.028
 agreement+0.028
ie+0.027
 happy+0.026
Neg logit contributions
 fault-0.062
uch-0.058
ses-0.056
gh-0.056
paste-0.056
Token it
Token ID340
Feature activation-3.30e+00

Pos logit contributions
uch+0.108
 fault+0.106
ses+0.103
sed+0.100
paste+0.100
Neg logit contributions
 agree-0.081
ies-0.074
 agreement-0.073
 family-0.072
 happy-0.072
Token was
Token ID373
Feature activation-3.57e+00

Pos logit contributions
 fault+0.195
ses+0.188
sed+0.180
paste+0.180
angs+0.178
Neg logit contributions
ie-0.102
 agree-0.097
 happy-0.094
 family-0.094
ies-0.094
Token the
Token ID262
Feature activation+1.61e+00

Pos logit contributions
paste+0.184
uch+0.182
ses+0.180
sed+0.179
gency+0.176
Neg logit contributions
 happy-0.135
 agree-0.132
 agreement-0.128
 family-0.126
ies-0.126
Token right
Token ID826
Feature activation-4.15e+00

Pos logit contributions
 family+0.053
 happy+0.049
 agreement+0.049
 day+0.049
 agree+0.047
Neg logit contributions
paste-0.094
OP-0.092
angs-0.091
gency-0.089
 happens-0.088
Token,
Token ID11
Feature activation+7.50e-02

Pos logit contributions
 agree+0.151
 agreement+0.139
ies+0.131
ie+0.125
 happy+0.118
Neg logit contributions
 fault-0.234
uch-0.222
ses-0.220
sed-0.219
paste-0.217
Token her
Token ID607
Feature activation+2.94e+00

Pos logit contributions
 agree+0.003
 agreement+0.003
ie+0.002
ies+0.002
ogged+0.002
Neg logit contributions
 fault-0.004
 nothing-0.004
sed-0.004
paste-0.004
ses-0.004
Token brother
Token ID3956
Feature activation+1.76e+00

Pos logit contributions
 agree+0.098
 agreement+0.090
ies+0.084
ie+0.081
 happy+0.074
Neg logit contributions
 fault-0.168
uch-0.158
ses-0.158
sed-0.157
paste-0.156
Token poured
Token ID19036
Feature activation-6.45e-01

Pos logit contributions
 agree+0.067
 agreement+0.062
ies+0.059
ie+0.057
 happy+0.054
Neg logit contributions
 fault-0.091
uch-0.086
ses-0.085
sed-0.085
paste-0.084
Token the
Token ID262
Feature activation+1.54e+00

Pos logit contributions
sed+0.035
ses+0.034
 fault+0.034
uch+0.034
 tresp+0.033
Neg logit contributions
 agree-0.023
ies-0.022
 agreement-0.021
ie-0.021
 happy-0.021
Token oil
Token ID3056
Feature activation-1.01e+01

Pos logit contributions
 agree+0.082
ies+0.081
 agreement+0.079
 happy+0.078
 family+0.077
Neg logit contributions
 fault-0.049
ses-0.048
sed-0.048
osaurus-0.046
gh-0.046
Token on
Token ID319
Feature activation-1.66e-01

Pos logit contributions
ses+0.533
 fault+0.532
sed+0.526
gency+0.520
 tresp+0.516
Neg logit contributions
 agree-0.347
ies-0.334
ie-0.324
 happy-0.303
 agreement-0.303
Token the
Token ID262
Feature activation+1.57e+00

Pos logit contributions
uch+0.011
 fault+0.011
sed+0.011
ses+0.011
angs+0.010
Neg logit contributions
 agreement-0.004
 agree-0.004
 happy-0.004
ie-0.004
 whenever-0.003
Token wheels
Token ID13666
Feature activation-6.68e+00

Pos logit contributions
 agree+0.055
 agreement+0.054
 family+0.050
 happy+0.049
 play+0.049
Neg logit contributions
 fault-0.081
uch-0.079
paste-0.077
osaurus-0.077
gh-0.076
Token of
Token ID286
Feature activation+1.00e+01

Pos logit contributions
 fault+0.367
uch+0.352
ses+0.350
sed+0.348
 tresp+0.344
Neg logit contributions
 agree-0.230
 agreement-0.209
ies-0.202
ie-0.192
 happy-0.183
Token the
Token ID262
Feature activation+1.52e+00

Pos logit contributions
 agree+0.325
 agreement+0.314
ies+0.283
ie+0.281
 happy+0.281
Neg logit contributions
uch-0.566
 fault-0.560
sed-0.548
ses-0.542
paste-0.532
Token sunny
Token ID27737
Feature activation+9.15e+00

Pos logit contributions
 fault+0.023
uch+0.022
ses+0.021
sed+0.021
paste+0.021
Neg logit contributions
 agree-0.021
 agreement-0.019
 happy-0.019
ies-0.019
 family-0.018
Token day
Token ID1110
Feature activation+5.90e+00

Pos logit contributions
 agree+0.217
 agreement+0.193
ies+0.177
ie+0.164
 happy+0.150
Neg logit contributions
 fault-0.603
uch-0.579
ses-0.574
sed-0.571
paste-0.570
Token.
Token ID13
Feature activation+6.94e-02

Pos logit contributions
 agree+0.232
 agreement+0.216
ies+0.206
ie+0.197
 happy+0.190
Neg logit contributions
 fault-0.297
uch-0.282
ses-0.279
sed-0.277
paste-0.275
Token<|endoftext|>
Token ID50256
Feature activation+4.18e+00

Pos logit contributions
 agree+0.003
ies+0.003
 agreement+0.002
ie+0.002
 forth+0.002
Neg logit contributions
 fault-0.004
ses-0.003
paste-0.003
sed-0.003
osaurus-0.003
TokenJack
Token ID14295
Feature activation-9.80e-02

Pos logit contributions
 agree+0.120
 agreement+0.108
ies+0.101
ie+0.096
 happy+0.091
Neg logit contributions
 fault-0.254
uch-0.242
ses-0.241
sed-0.241
paste-0.239
Token was
Token ID373
Feature activation-3.22e+00

Pos logit contributions
 fault+0.005
uch+0.005
ses+0.004
sed+0.004
osaurus+0.004
Neg logit contributions
 agree-0.004
 agreement-0.004
ies-0.003
ashing-0.003
 happy-0.003
Token walking
Token ID6155
Feature activation+6.67e+00

Pos logit contributions
 fault+0.163
ses+0.153
uch+0.153
sed+0.153
 tresp+0.148
Neg logit contributions
 agree-0.129
 agreement-0.117
ies-0.114
ie-0.108
reci-0.103
Token in
Token ID287
Feature activation+1.86e+00

Pos logit contributions
 agree+0.355
reci+0.317
 agreement+0.316
ies+0.281
 forget+0.280
Neg logit contributions
 an-0.229
 fault-0.218
 nothing-0.204
ASH-0.200
ps-0.199
Token the
Token ID262
Feature activation+1.93e+00

Pos logit contributions
 agree+0.087
ies+0.079
 agreement+0.077
ashing+0.075
ie+0.074
Neg logit contributions
 fault-0.079
 tresp-0.073
ses-0.072
gency-0.070
sed-0.070
Token park
Token ID3952
Feature activation+2.50e+00

Pos logit contributions
 agree+0.072
 agreement+0.064
ies+0.063
ie+0.057
ashing+0.053
Neg logit contributions
 fault-0.101
uch-0.098
ses-0.095
 tresp-0.094
sed-0.094
Token.
Token ID13
Feature activation+2.87e-01

Pos logit contributions
 agree+0.118
 agreement+0.115
reci+0.115
ies+0.108
ie+0.104
Neg logit contributions
 fault-0.090
 an-0.089
 nothing-0.087
gh-0.086
uch-0.086
Token?"
Token ID1701
Feature activation-7.52e+00

Pos logit contributions
 fault+0.034
uch+0.032
ses+0.031
paste+0.031
sed+0.030
Neg logit contributions
 agree-0.045
 agreement-0.043
ies-0.041
ie-0.040
 happy-0.039
Token
Token ID198
Feature activation+1.05e+00

Pos logit contributions
 an+0.229
 nothing+0.225
gh+0.220
 teeth+0.214
 Bang+0.205
Neg logit contributions
reci-0.387
ies-0.367
 agreement-0.367
ashing-0.350
 agree-0.349
Token
Token ID198
Feature activation+1.05e+00

Pos logit contributions
 agreement+0.048
 agree+0.046
ies+0.046
 forth+0.045
ashing+0.043
Neg logit contributions
gh-0.040
 fault-0.039
 Ow-0.038
uch-0.038
OP-0.037
Token"
Token ID1
Feature activation+6.04e+00

Pos logit contributions
 agreement+0.044
ies+0.043
 agree+0.043
reci+0.041
ie+0.040
Neg logit contributions
gh-0.044
OP-0.042
osaurus-0.042
uch-0.040
RR-0.040
TokenOf
Token ID5189
Feature activation+1.77e+00

Pos logit contributions
 agree+0.189
 agreement+0.176
ies+0.173
ie+0.163
 happy+0.143
Neg logit contributions
 fault-0.340
uch-0.334
paste-0.330
sed-0.324
ses-0.323
Token course
Token ID1781
Feature activation-6.13e+00

Pos logit contributions
 agree+0.087
 agreement+0.084
ie+0.080
ies+0.079
 happy+0.074
Neg logit contributions
 fault-0.074
ses-0.069
sed-0.069
paste-0.068
uch-0.066
Token,
Token ID11
Feature activation+2.48e-01

Pos logit contributions
 fault+0.338
ses+0.320
uch+0.319
paste+0.315
sed+0.313
Neg logit contributions
 agree-0.221
 agreement-0.198
ies-0.196
ie-0.189
 happy-0.181
Token Tim
Token ID5045
Feature activation-3.59e-01

Pos logit contributions
 agree+0.010
 agreement+0.010
ies+0.009
ie+0.008
ashing+0.008
Neg logit contributions
 fault-0.012
uch-0.011
paste-0.011
ses-0.011
sed-0.011
Tokenmy
Token ID1820
Feature activation+7.82e+00

Pos logit contributions
 fault+0.026
uch+0.025
ses+0.024
paste+0.024
sed+0.024
Neg logit contributions
 agree-0.006
 agreement-0.005
ies-0.004
ie-0.004
 happy-0.004
Token.
Token ID13
Feature activation+3.76e-01

Pos logit contributions
 agree+0.285
ies+0.258
ie+0.245
 agreement+0.243
 happy+0.224
Neg logit contributions
 fault-0.419
uch-0.407
ses-0.404
sed-0.403
paste-0.401
Token But
Token ID887
Feature activation-2.53e+00

Pos logit contributions
 agree+0.019
 agreement+0.018
ies+0.018
ie+0.017
 happy+0.017
Neg logit contributions
 fault-0.014
ses-0.013
uch-0.013
sed-0.013
paste-0.013
Token.
Token ID13
Feature activation+6.05e-01

Pos logit contributions
 fault+0.316
uch+0.301
ses+0.297
sed+0.295
paste+0.294
Neg logit contributions
 agree-0.246
 agreement-0.229
ies-0.218
ie-0.209
 happy-0.199
Token She
Token ID1375
Feature activation+3.19e+00

Pos logit contributions
 agree+0.019
 agreement+0.017
ies+0.015
 happy+0.015
ie+0.014
Neg logit contributions
 fault-0.036
uch-0.035
 tresp-0.033
ses-0.033
sed-0.033
Token fell
Token ID3214
Feature activation-5.79e+00

Pos logit contributions
 agree+0.117
ies+0.103
 happy+0.101
 play+0.096
 agreement+0.091
Neg logit contributions
 fault-0.183
gency-0.168
uch-0.167
gh-0.166
osaurus-0.162
Token to
Token ID284
Feature activation-5.46e+00

Pos logit contributions
uch+0.319
 fault+0.312
gh+0.300
sed+0.298
ses+0.297
Neg logit contributions
ies-0.196
 agree-0.190
 agreement-0.184
ie-0.183
 happy-0.177
Token the
Token ID262
Feature activation+1.34e+00

Pos logit contributions
 fault+0.285
uch+0.268
ses+0.267
sed+0.264
paste+0.264
Neg logit contributions
 agree-0.204
 agreement-0.190
ies-0.182
ie-0.172
 happy-0.163
Token ground
Token ID2323
Feature activation-2.65e+00

Pos logit contributions
 agreement+0.048
 agree+0.045
ies+0.043
 happy+0.042
 family+0.042
Neg logit contributions
 fault-0.070
uch-0.068
osaurus-0.068
OP-0.066
gh-0.065
Token.
Token ID13
Feature activation+6.11e-01

Pos logit contributions
 fault+0.152
uch+0.149
ses+0.145
sed+0.144
gency+0.143
Neg logit contributions
 agree-0.087
 agreement-0.079
ies-0.074
ie-0.071
 happy-0.071
Token Her
Token ID2332
Feature activation+2.51e+00

Pos logit contributions
 agree+0.015
 agreement+0.014
 happy+0.012
ies+0.011
ie+0.011
Neg logit contributions
uch-0.039
 fault-0.039
 tresp-0.037
osaurus-0.037
gh-0.037
Token dress
Token ID6576
Feature activation-1.34e+00

Pos logit contributions
 agree+0.087
ies+0.084
 agreement+0.079
 happy+0.075
 family+0.074
Neg logit contributions
uch-0.132
osaurus-0.131
 fault-0.131
gency-0.130
 tresp-0.128
Token got
Token ID1392
Feature activation-3.15e-01

Pos logit contributions
gency+0.074
 fault+0.073
 tresp+0.071
osaurus+0.068
RR+0.068
Neg logit contributions
ies-0.049
ie-0.044
 agree-0.041
 happy-0.041
 play-0.037
Token dirty
Token ID11841
Feature activation-2.01e+00

Pos logit contributions
 fault+0.014
uch+0.014
ses+0.014
gency+0.014
sed+0.013
Neg logit contributions
ies-0.014
 happy-0.013
 agreement-0.013
 agree-0.013
ie-0.013
Token time
Token ID640
Feature activation-1.62e+00

Pos logit contributions
 agree+0.077
 agreement+0.072
ies+0.068
ie+0.065
 happy+0.061
Neg logit contributions
 fault-0.102
uch-0.095
ses-0.094
sed-0.093
 tresp-0.093
Token.
Token ID13
Feature activation+3.72e-02

Pos logit contributions
 fault+0.079
uch+0.073
paste+0.072
gh+0.072
sed+0.072
Neg logit contributions
 agree-0.065
 agreement-0.063
ies-0.058
ie-0.056
reci-0.052
Token<|endoftext|>
Token ID50256
Feature activation+3.97e+00

Pos logit contributions
 agree+0.001
ies+0.001
 agreement+0.001
ie+0.001
 forth+0.001
Neg logit contributions
 fault-0.002
ses-0.002
sed-0.002
OP-0.002
paste-0.002
TokenOnce
Token ID7454
Feature activation+3.92e+00

Pos logit contributions
 agree+0.106
 agreement+0.095
ies+0.086
ie+0.083
 happy+0.079
Neg logit contributions
 fault-0.250
uch-0.238
ses-0.238
sed-0.238
paste-0.237
Token upon
Token ID2402
Feature activation+1.02e+01

Pos logit contributions
 agree+0.131
 agreement+0.127
 forth+0.096
reci+0.095
 embrace+0.092
Neg logit contributions
 fault-0.232
uch-0.204
 nothing-0.204
 wrong-0.203
 Ow-0.200
Token a
Token ID257
Feature activation-7.95e+00

Pos logit contributions
 happy+0.540
 agree+0.533
 agreement+0.505
 play+0.491
 family+0.485
Neg logit contributions
 fault-0.393
sed-0.375
gency-0.369
ses-0.365
uch-0.354
Token time
Token ID640
Feature activation-1.63e+00

Pos logit contributions
 fault+0.335
uch+0.324
gh+0.316
sed+0.306
ses+0.305
Neg logit contributions
 agree-0.362
 agreement-0.348
ies-0.344
ie-0.321
reci-0.310
Token,
Token ID11
Feature activation+3.83e-01

Pos logit contributions
 fault+0.082
uch+0.080
gh+0.077
sed+0.077
ses+0.076
Neg logit contributions
 agree-0.064
 agreement-0.061
ies-0.055
ie-0.052
reci-0.052
Token there
Token ID612
Feature activation+7.27e+00

Pos logit contributions
 agree+0.019
 agreement+0.017
ies+0.016
reci+0.015
ogged+0.015
Neg logit contributions
 fault-0.016
uch-0.016
sed-0.015
ses-0.015
gh-0.015
Token was
Token ID373
Feature activation-3.35e+00

Pos logit contributions
 agree+0.349
 agreement+0.328
ies+0.318
ie+0.310
 happy+0.299
Neg logit contributions
 fault-0.303
uch-0.283
ses-0.281
paste-0.279
sed-0.278
Token a
Token ID257
Feature activation-7.87e+00

Pos logit contributions
 fault+0.110
gency+0.104
sed+0.098
uch+0.098
paste+0.098
Neg logit contributions
 agree-0.189
 happy-0.185
 agreement-0.183
 family-0.175
ie-0.173
Token ice
Token ID4771
Feature activation-7.14e+00

Pos logit contributions
 agree+0.055
 agreement+0.053
ies+0.048
 happy+0.047
ie+0.046
Neg logit contributions
 fault-0.070
uch-0.066
gh-0.064
paste-0.064
ses-0.063
Token palace
Token ID20562
Feature activation+2.16e+00

Pos logit contributions
 fault+0.341
uch+0.320
ses+0.318
sed+0.316
paste+0.314
Neg logit contributions
 agree-0.303
 agreement-0.282
ies-0.277
ie-0.266
 happy-0.251
Token.
Token ID13
Feature activation+1.70e-01

Pos logit contributions
 agree+0.091
 agreement+0.085
ies+0.079
ie+0.076
 happy+0.073
Neg logit contributions
 fault-0.102
uch-0.097
sed-0.096
ses-0.096
paste-0.095
Token
Token ID6184
Feature activation-5.31e+00

Pos logit contributions
 agree+0.007
 agreement+0.007
 happy+0.006
ies+0.006
 family+0.006
Neg logit contributions
 fault-0.008
uch-0.008
paste-0.008
gh-0.008
osaurus-0.007
Token
Token ID95
Feature activation-8.26e+00

Pos logit contributions
 fault+0.211
paste+0.184
ses+0.178
sed+0.176
uch+0.174
Neg logit contributions
 agree-0.283
 agreement-0.258
ies-0.245
ie-0.243
 happy-0.238
Token
Token ID26391
Feature activation-1.27e+01

Pos logit contributions
 fault+0.358
paste+0.322
sed+0.304
ses+0.302
osaurus+0.296
Neg logit contributions
 agree-0.412
 agreement-0.391
 play-0.341
 embrace-0.338
ie-0.337
Token
Token ID129
Feature activation-8.18e+00

Pos logit contributions
 fault+0.652
ses+0.608
paste+0.604
uch+0.602
sed+0.600
Neg logit contributions
 agree-0.490
 agreement-0.438
ies-0.435
ie-0.407
 happy-0.401
Token
Token ID241
Feature activation+6.43e-01

Pos logit contributions
 fault+0.352
uch+0.330
paste+0.327
ses+0.322
sed+0.320
Neg logit contributions
 agree-0.385
 agreement-0.363
ies-0.343
ie-0.333
 happy-0.326
TokenPlease
Token ID5492
Feature activation-4.62e+00

Pos logit contributions
 agree+0.024
 agreement+0.021
 happy+0.020
ies+0.020
ie+0.019
Neg logit contributions
 fault-0.034
uch-0.033
ses-0.032
sed-0.032
paste-0.032
Token help
Token ID1037
Feature activation-8.88e-01

Pos logit contributions
 fault+0.225
ses+0.209
sed+0.207
uch+0.206
paste+0.205
Neg logit contributions
 agree-0.199
 agreement-0.183
ies-0.175
ie-0.172
 happy-0.163
Token me
Token ID502
Feature activation+8.34e-01

Pos logit contributions
 fault+0.038
ses+0.035
sed+0.035
uch+0.035
paste+0.034
Neg logit contributions
 agree-0.043
 agreement-0.040
ies-0.039
ie-0.038
 happy-0.036
Token the
Token ID262
Feature activation+1.62e+00

Pos logit contributions
 fault+0.272
uch+0.260
ses+0.257
sed+0.255
 tresp+0.253
Neg logit contributions
 agree-0.208
 agreement-0.191
ies-0.191
ie-0.177
ashing-0.169
Token mother
Token ID2802
Feature activation+2.45e+00

Pos logit contributions
 agree+0.042
 agreement+0.038
ies+0.035
ie+0.033
ashing+0.029
Neg logit contributions
 fault-0.103
uch-0.098
ses-0.097
sed-0.097
 tresp-0.096
Token:
Token ID25
Feature activation-6.13e+00

Pos logit contributions
 agree+0.096
 agreement+0.091
ies+0.079
ashing+0.076
reci+0.076
Neg logit contributions
 fault-0.123
paste-0.116
 tresp-0.113
ses-0.112
sed-0.112
Token
Token ID6184
Feature activation-5.43e+00

Pos logit contributions
 fault+0.264
sed+0.249
ses+0.248
gh+0.247
uch+0.246
Neg logit contributions
 agree-0.271
 agreement-0.259
ies-0.253
ie-0.247
reci-0.230
Token
Token ID95
Feature activation-8.37e+00

Pos logit contributions
 fault+0.213
uch+0.204
ses+0.198
sed+0.196
paste+0.196
Neg logit contributions
 agree-0.269
 agreement-0.257
ies-0.249
ie-0.240
 happy-0.231
Token
Token ID26391
Feature activation-1.28e+01

Pos logit contributions
 fault+0.335
sed+0.305
ses+0.305
uch+0.303
paste+0.300
Neg logit contributions
 agree-0.427
 agreement-0.400
ies-0.371
ie-0.364
 happy-0.358
Token
Token ID129
Feature activation-8.31e+00

Pos logit contributions
 fault+0.593
uch+0.568
ses+0.553
sed+0.551
paste+0.548
Neg logit contributions
 agree-0.546
 agreement-0.523
ies-0.491
ie-0.478
 happy-0.452
Token
Token ID241
Feature activation+4.53e-01

Pos logit contributions
 fault+0.349
uch+0.331
ses+0.328
sed+0.323
gh+0.320
Neg logit contributions
 agree-0.384
 agreement-0.366
ies-0.363
ie-0.348
 family-0.323
TokenLet
Token ID5756
Feature activation-1.09e+00

Pos logit contributions
 agree+0.019
 agreement+0.017
 happy+0.016
ies+0.016
 family+0.015
Neg logit contributions
 fault-0.022
uch-0.021
sed-0.021
ses-0.021
paste-0.020
Tokenâ
Token ID22940
Feature activation-8.40e+00

Pos logit contributions
 fault+0.053
ses+0.052
sed+0.051
paste+0.049
uch+0.048
Neg logit contributions
 agree-0.046
 agreement-0.044
ie-0.042
ies-0.042
 happy-0.039
Token
Token ID26391
Feature activation-1.28e+01

Pos logit contributions
uch+0.258
 fault+0.252
gh+0.236
ses+0.229
paste+0.228
Neg logit contributions
 agree-0.473
ies-0.465
 agreement-0.457
ie-0.438
 happy-0.426
Token go
Token ID467
Feature activation+6.27e+00

Pos logit contributions
 fault+0.399
ses+0.368
sed+0.367
uch+0.367
 tresp+0.362
Neg logit contributions
 agree-0.353
 agreement-0.343
ies-0.316
ie-0.305
ogged-0.291
Token on
Token ID319
Feature activation+3.36e-02

Pos logit contributions
 agree+0.293
 agreement+0.273
ies+0.270
ie+0.261
 happy+0.255
Neg logit contributions
 fault-0.272
ses-0.257
uch-0.254
sed-0.253
paste-0.252
Token the
Token ID262
Feature activation+1.79e+00

Pos logit contributions
 agree+0.001
 happy+0.001
 agreement+0.001
 play+0.001
ies+0.001
Neg logit contributions
ses-0.002
sed-0.002
uch-0.002
paste-0.002
 fault-0.002
Token road
Token ID2975
Feature activation+2.01e+00

Pos logit contributions
 agree+0.064
 agreement+0.062
 happy+0.056
 family+0.054
 play+0.054
Neg logit contributions
 fault-0.094
paste-0.093
uch-0.092
osaurus-0.090
angs-0.089
Token without
Token ID1231
Feature activation-1.55e+00

Pos logit contributions
 agree+0.079
 agreement+0.071
ies+0.070
ie+0.068
 happy+0.065
Neg logit contributions
 fault-0.101
ses-0.098
sed-0.098
uch-0.097
paste-0.096
Token an
Token ID281
Feature activation-1.38e+01

Pos logit contributions
 fault+0.072
uch+0.072
paste+0.071
ses+0.069
osaurus+0.069
Neg logit contributions
 agree-0.064
 agreement-0.062
ies-0.060
 happy-0.059
ie-0.057
Token adult
Token ID4044
Feature activation+1.13e+00

Pos logit contributions
 fault+0.681
ses+0.633
sed+0.613
uch+0.608
 tresp+0.605
Neg logit contributions
 agree-0.545
ies-0.525
 agreement-0.499
ie-0.480
ashing-0.457
Token.
Token ID13
Feature activation+4.41e-01

Pos logit contributions
 agree+0.043
 agreement+0.039
ies+0.038
ie+0.036
 happy+0.035
Neg logit contributions
 fault-0.058
uch-0.056
ses-0.056
sed-0.056
paste-0.055
Token The
Token ID383
Feature activation+9.39e-02

Pos logit contributions
 agree+0.020
 agreement+0.019
ies+0.018
ie+0.018
 happy+0.017
Neg logit contributions
 fault-0.019
uch-0.018
ses-0.018
sed-0.018
paste-0.018
Token automobiles
Token ID44820
Feature activation+8.33e-01

Pos logit contributions
 agree+0.003
 agreement+0.003
ies+0.003
ie+0.003
 happy+0.002
Neg logit contributions
 fault-0.005
uch-0.005
ses-0.005
paste-0.005
sed-0.005
Token can
Token ID460
Feature activation-5.07e+00

Pos logit contributions
 agree+0.038
 agreement+0.036
ies+0.035
ie+0.034
 happy+0.032
Neg logit contributions
 fault-0.036
uch-0.034
ses-0.033
sed-0.033
paste-0.033
Token found
Token ID1043
Feature activation+2.65e+00

Pos logit contributions
 agreement+0.232
 agree+0.230
ogging+0.220
ies+0.210
reci+0.209
Neg logit contributions
 tricked-0.219
 tresp-0.213
uch-0.212
sed-0.209
OP-0.206
Token a
Token ID257
Feature activation-8.14e+00

Pos logit contributions
 family+0.105
ie+0.101
 happy+0.099
 day+0.096
ies+0.094
Neg logit contributions
ñ-0.130
osaurus-0.129
uch-0.128
gency-0.125
gh-0.125
Token big
Token ID1263
Feature activation+8.83e-01

Pos logit contributions
 fault+0.472
ps+0.454
uch+0.452
ses+0.449
paste+0.438
Neg logit contributions
 happy-0.254
 agree-0.254
 family-0.242
ies-0.239
 agreement-0.235
Token jar
Token ID17379
Feature activation-7.11e+00

Pos logit contributions
 agree+0.036
 agreement+0.035
ies+0.034
 happy+0.033
ie+0.033
Neg logit contributions
 fault-0.044
uch-0.041
paste-0.040
ses-0.039
sed-0.039
Token of
Token ID286
Feature activation+1.02e+01

Pos logit contributions
uch+0.402
 fault+0.396
paste+0.395
gh+0.391
sed+0.382
Neg logit contributions
 agree-0.220
 agreement-0.209
ies-0.181
reci-0.168
ie-0.162
Token pepper
Token ID13385
Feature activation-1.18e+01

Pos logit contributions
 agree+0.487
 agreement+0.469
ies+0.467
 happy+0.464
ie+0.453
Neg logit contributions
 fault-0.431
uch-0.404
ses-0.397
sed-0.394
osaurus-0.387
Token on
Token ID319
Feature activation-2.75e-01

Pos logit contributions
 fault+0.670
ses+0.628
gency+0.628
sed+0.609
uch+0.609
Neg logit contributions
ies-0.420
ie-0.390
 agree-0.381
 family-0.331
 day-0.321
Token the
Token ID262
Feature activation+1.80e+00

Pos logit contributions
uch+0.019
sed+0.019
 fault+0.019
ses+0.019
paste+0.018
Neg logit contributions
 agree-0.006
 agreement-0.006
 particular-0.006
ie-0.005
 family-0.005
Token counter
Token ID3753
Feature activation-5.05e+00

Pos logit contributions
 agree+0.045
 family+0.043
 agreement+0.043
ie+0.040
 happy+0.039
Neg logit contributions
 fault-0.113
ses-0.106
 happens-0.105
uch-0.105
paste-0.104
Token.
Token ID13
Feature activation+6.86e-01

Pos logit contributions
 fault+0.260
uch+0.247
ses+0.244
sed+0.242
paste+0.242
Neg logit contributions
 agree-0.193
 agreement-0.179
ies-0.171
ie-0.163
 happy-0.156
Token They
Token ID1119
Feature activation+4.17e+00

Pos logit contributions
 agree+0.029
 agreement+0.027
ies+0.027
ie+0.026
 happy+0.023
Neg logit contributions
 fault-0.032
ses-0.030
sed-0.030
paste-0.030
uch-0.030
Token He
Token ID679
Feature activation+8.33e-01

Pos logit contributions
 fault+0.000
uch+0.000
ses+0.000
sed+0.000
paste+0.000
Neg logit contributions
 agree-0.000
 agreement-0.000
ies-0.000
ie-0.000
 happy-0.000
Token would
Token ID561
Feature activation-2.92e+00

Pos logit contributions
 agree+0.032
 agreement+0.029
ies+0.028
ie+0.027
 happy+0.027
Neg logit contributions
 fault-0.045
uch-0.042
ses-0.042
sed-0.040
gh-0.040
Token bring
Token ID2222
Feature activation+9.95e+00

Pos logit contributions
 fault+0.156
uch+0.149
ses+0.146
paste+0.146
sed+0.146
Neg logit contributions
 agree-0.100
 agreement-0.094
ies-0.092
ie-0.086
ashing-0.081
Token food
Token ID2057
Feature activation-1.87e+00

Pos logit contributions
 agree+0.389
 agreement+0.363
 happy+0.362
ie+0.353
ies+0.352
Neg logit contributions
uch-0.495
ses-0.494
sed-0.483
 fault-0.480
paste-0.451
Token to
Token ID284
Feature activation-5.47e+00

Pos logit contributions
 fault+0.099
uch+0.096
ses+0.095
sed+0.094
paste+0.091
Neg logit contributions
 agree-0.070
 agreement-0.063
ies-0.062
ie-0.058
 happy-0.056
Token an
Token ID281
Feature activation-1.39e+01

Pos logit contributions
 fault+0.265
 tresp+0.246
uch+0.245
ses+0.244
paste+0.242
Neg logit contributions
 agree-0.217
 agreement-0.204
ies-0.201
ie-0.189
ashing-0.184
Token old
Token ID1468
Feature activation+4.27e+00

Pos logit contributions
 fault+0.647
 tresp+0.605
ses+0.601
uch+0.598
sed+0.595
Neg logit contributions
 agree-0.569
ies-0.530
 agreement-0.522
ie-0.486
ashing-0.471
Token mouse
Token ID10211
Feature activation-5.98e+00

Pos logit contributions
 agree+0.156
 agreement+0.144
ies+0.136
ie+0.129
 happy+0.125
Neg logit contributions
 fault-0.229
uch-0.221
ses-0.215
paste-0.215
sed-0.213
Token who
Token ID508
Feature activation+6.65e+00

Pos logit contributions
 fault+0.294
uch+0.276
paste+0.273
ses+0.273
sed+0.271
Neg logit contributions
 agree-0.237
 agreement-0.223
ies-0.213
ie-0.202
 happy-0.193
Token lived
Token ID5615
Feature activation+2.99e+00

Pos logit contributions
 agree+0.296
 agreement+0.277
ies+0.267
ie+0.258
 happy+0.248
Neg logit contributions
 fault-0.300
uch-0.282
ses-0.279
sed-0.277
paste-0.275
Token down
Token ID866
Feature activation+6.31e+00

Pos logit contributions
 agree+0.117
 agreement+0.109
ies+0.104
ie+0.100
 happy+0.098
Neg logit contributions
 fault-0.152
uch-0.144
ses-0.144
sed-0.142
paste-0.141