TOP ACTIVATIONS
MAX = 0.120

nodded and put on her mask . She hoped the bad
! He grabbed his baseball bat and ran out of the
the kids all grabbed some pants and carefully hung them up
. He took off his mask and cape and said ,
Sam took off his mask and looked sorry . He
We will share our sp ades and buckets next time !"
They try to take the vest . The man is angry
girl took off her high pants and put them away .
, took off his white cap and jumped out of bed
arm chair and put a crown on his head .
. He puts on his mask . He thanks Sam .
put on his favorite blue cap and hopped in the car
They rolled big balls of snow and stacked them on top
saw a big box of Leg os that he really wanted
He put on a golden crown and felt like a king
and took off his heavy vest . He hung it up
. They picked up their sunglasses . They ran to the
He grabbed hold of the mask and used it to protect
Leo and gave him a crown and a cape . She
. He puts on his mask and plays with his dinosaur

MIN ACTIVATIONS
MIN = -0.158

robe or the towel or the hug . I won 't
around the waves , feeling such joy and freedom .
a sque aky reply from the tree . " Me ,"
you the bath robe or the towel or the hug .
water and the sand and the shells . Maybe we will
can use the sand and the toys ," she says .
can do great things that the weak cannot do alone .
" Max , you 're such a creative boy !"
why can 't I color the pictures ? They are boring
old or too silly for the park . She
sne ez ed loudly and the tower fell down with a
the y ummy fruit and the pretty sunset . <|endoftext|> Once
being nice to me and the spider . Maybe one day
The more the ants and the mole played together , the
, much too high for the farmer to reach . They
my explained that it meant the food wasn 't good anymore
we will be careful on the road ." The taxi driver
don 't think it 's a good idea to climb the
of luck and good things would happen to her .
a shiny new bell that the goose could wear around its

INTERVAL 0.051 - 0.120
CONTAINS 8.689%

distance . It was very high and looked very dangerous .
and the birds and the sun . She feels warm and
he had never met before . He wrote a
, but Max was smart . Max knew he needed to
a nap . Molly fell asleep on the couch and enjoyed

INTERVAL -0.019 - 0.051
CONTAINS 57.165%

she started cl apping and smiling so he could see her
disappointed . B obby thought and thought about what
felt its weight . Then , with a lively smile on
. She always asked her mom to make pasta for dinner
. Mia liked the images . She pointed at them and

INTERVAL -0.088 - -0.019
CONTAINS 33.697%

, Lily , what did you do ? Come here ,
was shy and didn 't like to talk to other kids
, she was playing in the garden with her friends .
tools in the garage . You can hurt yourself or break
She has a doll , a t eddy bear , a

INTERVAL -0.158 - -0.088
CONTAINS 0.449%

each other . They felt the heat and the smoke .
searched and searched , but the jelly was nowhere to be
Tim my was adventurous and wanted to learn
second question : What is the name of the animal that
the left looked over at the friend on the right and
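
The four INTERVAL headers above look like an equal-width split of the range from MIN to MAX. The sketch below checks that reading against the printed numbers. It is an assumption about how the intervals were built, not something the dashboard states, and `interval_edges` is a hypothetical helper name. The printed edges can differ from the headers in the last digit because MIN and MAX are themselves rounded.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# Values copied from the MIN and MAX headers above.
edges = interval_edges(-0.158, 0.120)

# Shares copied from the four CONTAINS lines, highest interval first.
shares = [8.689, 57.165, 33.697, 0.449]
total_share = sum(shares)

# Pair each interval's upper and lower edge, highest interval first.
for hi_edge, lo_edge, share in zip(edges[:0:-1], edges[-2::-1], shares):
    print(f"{lo_edge:+.3f} to {hi_edge:+.3f}: {share:.3f}%")

print(f"total: {total_share:.3f}%")
```

Under this reading the shares add up to 100%, so the four intervals cover every sampled token, with most activations sitting near zero.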
Token nodded
Token ID14464
Feature activation+1.79e-02

Pos logit contributions
es+0.033
ing+0.032
ed+0.031
oped+0.031
med+0.030
Neg logit contributions
 pap-0.031
 not-0.030
 plan-0.029
 help-0.028
 mistake-0.028
Token and
Token ID290
Feature activation-2.07e-02

Pos logit contributions
oped+0.033
es+0.032
ane+0.031
ing+0.031
hed+0.030
Neg logit contributions
 not-0.033
 pap-0.031
 help-0.031
 grown-0.030
 be-0.030
Token put
Token ID1234
Feature activation+3.52e-02

Pos logit contributions
 ours+0.035
 plan+0.035
 fact+0.034
 mistake+0.034
 disl+0.034
Neg logit contributions
ed-0.038
es-0.038
ing-0.034
med-0.033
oped-0.033
Token on
Token ID319
Feature activation+1.25e-02

Pos logit contributions
oped+0.078
es+0.077
ing+0.077
ane+0.074
ed+0.073
Neg logit contributions
 help-0.043
 not-0.042
 pap-0.042
 fun-0.038
 permission-0.038
Token her
Token ID607
Feature activation-1.76e-02

Pos logit contributions
es+0.029
ed+0.028
ing+0.028
med+0.027
oped+0.027
Neg logit contributions
 pap-0.015
 plan-0.014
 not-0.014
 mistake-0.013
 majesty-0.013
Token mask
Token ID9335
Feature activation+1.20e-01

Pos logit contributions
 pap+0.023
 not+0.021
 gard+0.021
 mistake+0.021
 ours+0.021
Neg logit contributions
es-0.037
ed-0.036
med-0.035
ing-0.034
s-0.033
Token.
Token ID13
Feature activation+4.54e-02

Pos logit contributions
oped+0.240
ays+0.226
med+0.220
ing+0.220
acked+0.217
Neg logit contributions
 not-0.203
 help-0.169
 old-0.165
 sc-0.163
 special-0.161
Token She
Token ID1375
Feature activation-1.99e-02

Pos logit contributions
oped+0.085
es+0.083
med+0.082
ing+0.082
hed+0.078
Neg logit contributions
 not-0.081
 pap-0.078
 grown-0.076
 dad-0.072
 old-0.071
Token hoped
Token ID10719
Feature activation+1.04e-02

Pos logit contributions
 pap+0.029
 ours+0.028
 plan+0.027
 not+0.027
 mistake+0.027
Neg logit contributions
es-0.041
ed-0.039
oped-0.038
ing-0.038
med-0.037
Token the
Token ID262
Feature activation-2.80e-02

Pos logit contributions
es+0.021
ing+0.021
ed+0.020
oped+0.020
med+0.020
Neg logit contributions
 pap-0.015
 not-0.015
 plan-0.014
 help-0.014
 mistake-0.013
Token bad
Token ID2089
Feature activation+1.76e-02

Pos logit contributions
 pap+0.038
 ours+0.035
 disl+0.032
 majesty+0.032
 not+0.031
Neg logit contributions
ing-0.058
es-0.057
ed-0.057
med-0.053
ane-0.053
Token!
Token ID0
Feature activation+4.56e-02

Pos logit contributions
oped+0.087
es+0.081
med+0.079
ed+0.077
ays+0.076
Neg logit contributions
 not-0.070
 grown-0.067
 fun-0.065
 help-0.065
 be-0.064
Token He
Token ID679
Feature activation-2.09e-02

Pos logit contributions
es+0.065
ing+0.059
oped+0.058
med+0.057
ed+0.057
Neg logit contributions
 not-0.100
 pap-0.097
 plan-0.097
 help-0.095
 grown-0.095
Token grabbed
Token ID13176
Feature activation-6.73e-04

Pos logit contributions
 pap+0.035
 permission+0.031
 mistake+0.031
 ours+0.031
 disl+0.031
Neg logit contributions
ed-0.039
es-0.039
ing-0.038
oped-0.037
med-0.037
Token his
Token ID465
Feature activation-2.57e-02

Pos logit contributions
 pap+0.001
 plan+0.001
 not+0.001
 majesty+0.001
 mistake+0.001
Neg logit contributions
es-0.001
ed-0.001
ing-0.001
med-0.001
oped-0.001
Token baseball
Token ID9283
Feature activation+8.28e-02

Pos logit contributions
 pap+0.039
 majesty+0.038
 gard+0.037
 disl+0.037
 mistake+0.036
Neg logit contributions
ed-0.049
ing-0.049
s-0.049
es-0.048
med-0.047
Token bat
Token ID7365
Feature activation+1.18e-01

Pos logit contributions
es+0.134
ing+0.129
oped+0.128
ed+0.127
med+0.124
Neg logit contributions
 pap-0.153
 not-0.152
 plan-0.148
 help-0.145
 grown-0.141
Token and
Token ID290
Feature activation-1.89e-02

Pos logit contributions
es+0.192
oped+0.190
med+0.180
ays+0.176
ane+0.164
Neg logit contributions
 not-0.205
 help-0.191
 grown-0.189
 water-0.183
 fun-0.181
Token ran
Token ID4966
Feature activation+3.28e-02

Pos logit contributions
 pap+0.031
 mistake+0.031
 majesty+0.029
 gard+0.028
 disl+0.028
Neg logit contributions
ed-0.036
es-0.035
med-0.034
hed-0.033
ing-0.033
Token out
Token ID503
Feature activation+1.69e-02

Pos logit contributions
es+0.054
ing+0.052
ed+0.051
oped+0.051
med+0.049
Neg logit contributions
 pap-0.060
 not-0.060
 plan-0.058
 help-0.057
 grown-0.056
Token of
Token ID286
Feature activation-3.64e-02

Pos logit contributions
es+0.030
oped+0.029
ed+0.029
ing+0.028
hed+0.028
Neg logit contributions
 help-0.028
 not-0.028
 pap-0.028
 plan-0.027
 grown-0.027
Token the
Token ID262
Feature activation-3.17e-02

Pos logit contributions
 majesty+0.060
 pap+0.060
 gard+0.055
 disl+0.053
 not+0.052
Neg logit contributions
med-0.070
es-0.066
ing-0.065
ays-0.063
hed-0.063
Token the
Token ID262
Feature activation-4.92e-02

Pos logit contributions
 majesty+0.033
 mistake+0.031
 fact+0.031
 lookout+0.031
 disl+0.029
Neg logit contributions
ing-0.071
ed-0.070
med-0.065
es-0.065
ted-0.063
Token kids
Token ID3988
Feature activation+1.93e-02

Pos logit contributions
 disl+0.063
 ours+0.057
 not+0.055
 majesty+0.053
 fact+0.049
Neg logit contributions
ed-0.105
ing-0.104
es-0.099
s-0.094
med-0.091
Token all
Token ID477
Feature activation-1.94e-02

Pos logit contributions
es+0.041
ing+0.040
med+0.039
ed+0.039
oped+0.038
Neg logit contributions
 not-0.026
 plan-0.026
 pap-0.026
 help-0.025
 grown-0.024
Token grabbed
Token ID13176
Feature activation+7.82e-03

Pos logit contributions
 pap+0.030
 not+0.029
 mistake+0.027
 plan+0.027
 ours+0.027
Neg logit contributions
es-0.038
ed-0.038
ing-0.037
oped-0.036
med-0.036
Token some
Token ID617
Feature activation-4.42e-03

Pos logit contributions
es+0.019
ing+0.019
ed+0.019
oped+0.018
med+0.018
Neg logit contributions
 pap-0.008
 not-0.008
 plan-0.007
 mistake-0.007
 help-0.007
Token pants
Token ID12581
Feature activation+1.17e-01

Pos logit contributions
 pap+0.005
 not+0.005
 plan+0.005
 mistake+0.005
 majesty+0.005
Neg logit contributions
es-0.010
ing-0.010
ed-0.010
med-0.009
oped-0.009
Token and
Token ID290
Feature activation-1.30e-02

Pos logit contributions
es+0.181
oped+0.181
med+0.174
ing+0.168
ed+0.168
Neg logit contributions
 not-0.216
 help-0.208
 pap-0.201
 fun-0.196
 grown-0.196
Token carefully
Token ID7773
Feature activation+8.79e-03

Pos logit contributions
 mistake+0.019
 majesty+0.019
 pap+0.019
 ours+0.018
 not+0.018
Neg logit contributions
es-0.026
ed-0.026
med-0.025
ing-0.024
ted-0.024
Token hung
Token ID9174
Feature activation+4.35e-02

Pos logit contributions
es+0.021
ing+0.021
ed+0.020
oped+0.020
ane+0.020
Neg logit contributions
 pap-0.009
 not-0.009
 plan-0.009
 help-0.009
 grown-0.008
Token them
Token ID606
Feature activation+4.47e-02

Pos logit contributions
es+0.099
oped+0.098
ing+0.095
ed+0.094
hed+0.093
Neg logit contributions
 not-0.051
 pap-0.050
 help-0.047
 grown-0.045
 plan-0.045
Token up
Token ID510
Feature activation+1.97e-02

Pos logit contributions
oped+0.090
ane+0.085
es+0.083
umping+0.082
ing+0.080
Neg logit contributions
 not-0.075
 work-0.070
 special-0.067
 help-0.067
 plan-0.066
Token.
Token ID13
Feature activation+4.00e-02

Pos logit contributions
oped+0.081
es+0.080
med+0.077
ane+0.074
ing+0.071
Neg logit contributions
 not-0.077
 help-0.074
 mistake-0.074
 permission-0.071
 pap-0.071
Token He
Token ID679
Feature activation-1.34e-02

Pos logit contributions
es+0.055
ing+0.054
med+0.053
ed+0.053
oped+0.052
Neg logit contributions
 not-0.087
 pap-0.086
 help-0.081
 disl-0.080
 grown-0.080
Token took
Token ID1718
Feature activation+5.56e-03

Pos logit contributions
 pap+0.021
 ours+0.020
 plan+0.020
 disl+0.020
 not+0.019
Neg logit contributions
es-0.026
ed-0.025
ing-0.024
oped-0.024
med-0.024
Token off
Token ID572
Feature activation+2.72e-02

Pos logit contributions
ed+0.011
es+0.011
ing+0.011
med+0.011
oped+0.010
Neg logit contributions
 pap-0.008
 plan-0.008
 not-0.008
 majesty-0.008
 mistake-0.008
Token his
Token ID465
Feature activation-1.99e-02

Pos logit contributions
es+0.061
ed+0.058
ing+0.058
med+0.057
oped+0.057
Neg logit contributions
 pap-0.035
 plan-0.033
 not-0.033
 mistake-0.031
 majesty-0.031
Token mask
Token ID9335
Feature activation+1.16e-01

Pos logit contributions
 pap+0.027
 not+0.025
 disl+0.024
 majesty+0.024
 gard+0.023
Neg logit contributions
es-0.041
ed-0.040
ing-0.038
med-0.038
s-0.036
Token and
Token ID290
Feature activation-1.19e-02

Pos logit contributions
oped+0.203
med+0.202
es+0.194
ing+0.192
ays+0.189
Neg logit contributions
 not-0.218
 help-0.187
 mistake-0.185
 sc-0.181
 plan-0.179
Token cape
Token ID36023
Feature activation+9.77e-02

Pos logit contributions
 pap+0.019
 majesty+0.019
 fact+0.019
 disl+0.019
 ours+0.019
Neg logit contributions
es-0.022
ed-0.021
med-0.019
ted-0.019
oped-0.018
Token and
Token ID290
Feature activation-1.25e-02

Pos logit contributions
ays+0.156
oped+0.153
ing+0.152
ane+0.151
es+0.150
Neg logit contributions
 not-0.179
 help-0.170
 sc-0.152
 be-0.151
 grown-0.151
Token said
Token ID531
Feature activation+1.35e-02

Pos logit contributions
 pap+0.021
 ours+0.019
 mistake+0.019
 majesty+0.019
 fact+0.019
Neg logit contributions
es-0.024
ed-0.023
med-0.021
ted-0.020
oped-0.020
Token,
Token ID11
Feature activation+2.30e-02

Pos logit contributions
es+0.023
oped+0.023
ing+0.023
med+0.023
ed+0.022
Neg logit contributions
 not-0.024
 pap-0.023
 help-0.022
 plan-0.022
 grown-0.021
Token\n (newline)
Token ID198
Feature activation+3.50e-02

Pos logit contributions
es+0.052
ed+0.051
ing+0.051
med+0.050
oped+0.049
Neg logit contributions
 not-0.064
 pap-0.062
 help-0.060
 plan-0.060
 mistake-0.059
TokenSam
Token ID16305
Feature activation+3.48e-02

Pos logit contributions
es+0.055
ing+0.053
ed+0.052
oped+0.051
med+0.050
Neg logit contributions
 pap-0.068
 not-0.066
 plan-0.065
 help-0.063
 mistake-0.063
Token took
Token ID1718
Feature activation+1.40e-02

Pos logit contributions
es+0.053
ing+0.053
oped+0.051
med+0.050
ed+0.049
Neg logit contributions
 not-0.069
 pap-0.066
 grown-0.064
 plan-0.063
 mistake-0.063
Token off
Token ID572
Feature activation+3.03e-02

Pos logit contributions
es+0.030
ed+0.030
ing+0.029
med+0.028
oped+0.027
Neg logit contributions
 pap-0.019
 not-0.018
 plan-0.018
 majesty-0.017
 mistake-0.017
Token his
Token ID465
Feature activation-2.23e-02

Pos logit contributions
es+0.072
ing+0.070
ed+0.069
oped+0.068
med+0.067
Neg logit contributions
 pap-0.034
 not-0.033
 plan-0.032
 help-0.031
 mistake-0.030
Token mask
Token ID9335
Feature activation+1.16e-01

Pos logit contributions
 pap+0.030
 not+0.028
 majesty+0.027
 disl+0.027
 fact+0.026
Neg logit contributions
es-0.046
ed-0.045
ing-0.043
med-0.043
s-0.041
Token and
Token ID290
Feature activation-1.41e-02

Pos logit contributions
oped+0.220
med+0.216
ing+0.204
es+0.204
ays+0.200
Neg logit contributions
 not-0.210
 help-0.178
 mistake-0.176
 sc-0.171
 plan-0.165
Token looked
Token ID3114
Feature activation-1.91e-04

Pos logit contributions
 pap+0.022
 fact+0.022
 disl+0.022
 majesty+0.021
 plan+0.021
Neg logit contributions
es-0.028
ed-0.026
med-0.025
ing-0.023
ted-0.023
Token sorry
Token ID7926
Feature activation+3.26e-02

Pos logit contributions
 pap+0.000
 not+0.000
 plan+0.000
 help+0.000
 mistake+0.000
Neg logit contributions
es-0.000
ing-0.000
ed-0.000
oped-0.000
med-0.000
Token.
Token ID13
Feature activation+3.87e-02

Pos logit contributions
oped+0.076
med+0.071
hed+0.068
acked+0.067
es+0.066
Neg logit contributions
 not-0.053
 help-0.047
 fun-0.047
 be-0.044
 sweet-0.044
Token He
Token ID679
Feature activation-2.20e-02

Pos logit contributions
es+0.059
med+0.057
oped+0.057
ing+0.057
ed+0.055
Neg logit contributions
 not-0.083
 pap-0.080
 grown-0.076
 help-0.076
 fun-0.075
TokenWe
Token ID1135
Feature activation-3.74e-02

Pos logit contributions
es+0.054
ing+0.051
oped+0.050
med+0.050
ol+0.049
Neg logit contributions
 not-0.040
 help-0.037
 be-0.035
 think-0.035
 fun-0.034
Token will
Token ID481
Feature activation-3.74e-02

Pos logit contributions
 pap+0.063
 not+0.056
 mistake+0.055
 majesty+0.055
 elderly+0.054
Neg logit contributions
es-0.070
ed-0.068
oped-0.067
ing-0.065
hed-0.064
Token share
Token ID2648
Feature activation+3.15e-02

Pos logit contributions
 pap+0.050
 seaw+0.040
 creepy+0.039
 tresp+0.038
 fact+0.038
Neg logit contributions
es-0.076
s-0.072
oped-0.071
hed-0.069
ed-0.069
Token our
Token ID674
Feature activation-2.48e-02

Pos logit contributions
es+0.069
ing+0.066
ed+0.065
med+0.064
oped+0.063
Neg logit contributions
 pap-0.043
 plan-0.040
 not-0.039
 help-0.037
 grown-0.037
Token sp
Token ID599
Feature activation+5.41e-02

Pos logit contributions
 pap+0.043
 not+0.037
 gard+0.037
 majesty+0.035
 fact+0.034
Neg logit contributions
es-0.043
s-0.042
ed-0.042
ing-0.041
oped-0.039
Tokenades
Token ID2367
Feature activation+1.16e-01

Pos logit contributions
es+0.086
ing+0.083
ed+0.082
oped+0.079
med+0.078
Neg logit contributions
 pap-0.104
 not-0.100
 plan-0.098
 help-0.097
 grown-0.095
Token and
Token ID290
Feature activation-2.38e-02

Pos logit contributions
oped+0.204
es+0.191
ing+0.189
med+0.189
ed+0.187
Neg logit contributions
 not-0.203
 pap-0.191
 plan-0.188
 fun-0.188
 help-0.187
Token buckets
Token ID38674
Feature activation+7.82e-02

Pos logit contributions
 pap+0.031
 fact+0.030
 supposed+0.029
 allergic+0.029
 gard+0.028
Neg logit contributions
es-0.044
s-0.042
med-0.040
 again-0.040
ed-0.039
Token next
Token ID1306
Feature activation+7.99e-03

Pos logit contributions
es+0.113
ing+0.108
ed+0.107
oped+0.105
med+0.102
Neg logit contributions
 pap-0.160
 not-0.157
 plan-0.153
 help-0.150
 mistake-0.149
Token time
Token ID640
Feature activation+3.04e-02

Pos logit contributions
es+0.017
ing+0.017
ed+0.017
oped+0.017
med+0.016
Neg logit contributions
 pap-0.010
 not-0.010
 plan-0.010
 help-0.010
 mistake-0.010
Token!"
Token ID2474
Feature activation+4.95e-02

Pos logit contributions
es+0.061
oped+0.058
ing+0.058
ed+0.058
med+0.057
Neg logit contributions
 not-0.045
 pap-0.044
 plan-0.042
 help-0.042
 mistake-0.041
Token They
Token ID1119
Feature activation-2.86e-02

Pos logit contributions
es+0.107
ing+0.106
oped+0.100
med+0.100
ed+0.097
Neg logit contributions
 not-0.073
 help-0.069
 pap-0.066
 be-0.065
 fun-0.063
Token try
Token ID1949
Feature activation+2.47e-02

Pos logit contributions
 pap+0.033
 mistake+0.031
 ours+0.029
 disl+0.029
 not+0.029
Neg logit contributions
es-0.065
oped-0.064
ed-0.062
hed-0.061
med-0.060
Token to
Token ID284
Feature activation-3.53e-02

Pos logit contributions
oped+0.055
es+0.054
ed+0.054
hed+0.052
med+0.052
Neg logit contributions
 not-0.035
 pap-0.030
 grown-0.030
 mistake-0.030
 help-0.029
Token take
Token ID1011
Feature activation+2.55e-02

Pos logit contributions
 pap+0.047
 majesty+0.044
 disl+0.040
 fact+0.039
 ours+0.038
Neg logit contributions
es-0.075
oped-0.072
med-0.072
ed-0.071
ing-0.070
Token the
Token ID262
Feature activation-2.39e-02

Pos logit contributions
es+0.066
oped+0.065
ing+0.065
ed+0.063
med+0.062
Neg logit contributions
 not-0.022
 pap-0.022
 help-0.022
 plan-0.020
 fun-0.020
Token vest
Token ID19750
Feature activation+1.16e-01

Pos logit contributions
 pap+0.036
 not+0.035
 plan+0.034
 help+0.034
 grown+0.033
Neg logit contributions
es-0.049
ing-0.046
ed-0.046
oped-0.046
med-0.045
Token.
Token ID13
Feature activation+5.80e-02

Pos logit contributions
med+0.231
oped+0.230
es+0.221
ted+0.212
hed+0.210
Neg logit contributions
 not-0.204
 help-0.176
 plan-0.172
 work-0.166
 be-0.165
Token The
Token ID383
Feature activation-2.76e-02

Pos logit contributions
es+0.139
ing+0.136
med+0.133
oped+0.132
ane+0.130
Neg logit contributions
 not-0.081
 help-0.075
 fun-0.070
 be-0.070
 grown-0.064
Token man
Token ID582
Feature activation+4.61e-02

Pos logit contributions
 pap+0.043
 not+0.042
 plan+0.040
 help+0.039
 mistake+0.039
Neg logit contributions
es-0.054
ing-0.052
ed-0.052
oped-0.051
med-0.050
Token is
Token ID318
Feature activation-2.81e-02

Pos logit contributions
ing+0.101
es+0.100
ed+0.100
oped+0.099
med+0.096
Neg logit contributions
 not-0.058
 pap-0.057
 plan-0.055
 help-0.055
 mistake-0.050
Token angry
Token ID7954
Feature activation+5.21e-02

Pos logit contributions
 pap+0.030
 majesty+0.028
 disl+0.028
 plan+0.027
 mistake+0.026
Neg logit contributions
es-0.067
ing-0.065
ed-0.065
med-0.065
oped-0.062
Token girl
Token ID2576
Feature activation+2.04e-02

Pos logit contributions
 pap+0.055
 not+0.053
 disl+0.051
 ours+0.050
 majesty+0.050
Neg logit contributions
ing-0.065
ed-0.063
es-0.061
oped-0.059
med-0.058
Token took
Token ID1718
Feature activation+1.65e-02

Pos logit contributions
es+0.040
ing+0.039
ed+0.038
oped+0.038
med+0.038
Neg logit contributions
 pap-0.031
 not-0.031
 plan-0.029
 help-0.029
 grown-0.028
Token off
Token ID572
Feature activation+2.57e-02

Pos logit contributions
es+0.039
ing+0.038
oped+0.038
ed+0.037
med+0.037
Neg logit contributions
 pap-0.019
 not-0.018
 help-0.017
 plan-0.017
 grown-0.016
Token her
Token ID607
Feature activation-1.65e-02

Pos logit contributions
es+0.062
ing+0.061
ed+0.060
oped+0.059
med+0.058
Neg logit contributions
 not-0.028
 pap-0.028
 help-0.026
 plan-0.025
 grown-0.025
Token high
Token ID1029
Feature activation+1.64e-02

Pos logit contributions
 pap+0.021
 majesty+0.018
 not+0.018
 ours+0.018
 mistake+0.018
Neg logit contributions
ed-0.036
ing-0.035
es-0.035
med-0.034
oped-0.034
Token pants
Token ID12581
Feature activation+1.16e-01

Pos logit contributions
ing+0.035
es+0.035
ed+0.034
oped+0.033
med+0.033
Neg logit contributions
 pap-0.022
 not-0.021
 ours-0.019
 plan-0.019
 majesty-0.019
Token and
Token ID290
Feature activation-1.11e-02

Pos logit contributions
es+0.192
med+0.190
oped+0.189
ed+0.179
ing+0.177
Neg logit contributions
 not-0.215
 help-0.193
 grown-0.187
 sc-0.185
 fun-0.184
Token put
Token ID1234
Feature activation+3.16e-02

Pos logit contributions
 pap+0.018
 mistake+0.018
 plan+0.017
 disl+0.017
 fact+0.017
Neg logit contributions
es-0.021
ed-0.020
ing-0.019
med-0.019
oped-0.019
Token them
Token ID606
Feature activation+4.57e-02

Pos logit contributions
es+0.066
ing+0.064
oped+0.063
ed+0.062
med+0.061
Neg logit contributions
 not-0.043
 pap-0.043
 help-0.041
 plan-0.040
 fun-0.039
Token away
Token ID1497
Feature activation+3.36e-02

Pos logit contributions
es+0.070
ing+0.068
oped+0.068
ane+0.064
ays+0.064
Neg logit contributions
 not-0.091
 fun-0.083
 help-0.083
 plan-0.080
 sc-0.079
Token.
Token ID13
Feature activation+5.36e-02

Pos logit contributions
oped+0.059
es+0.056
med+0.054
ing+0.053
ed+0.053
Neg logit contributions
 not-0.063
 help-0.058
 pap-0.057
 fun-0.056
 grown-0.056
Token,
Token ID11
Feature activation-7.19e-03

Pos logit contributions
oped+0.051
es+0.050
ays+0.047
ane+0.047
med+0.046
Neg logit contributions
 not-0.050
 pap-0.050
 sweet-0.048
 help-0.047
 be-0.046
Token took
Token ID1718
Feature activation+1.30e-02

Pos logit contributions
 plan+0.010
 pap+0.010
 mistake+0.010
 ours+0.010
 help+0.010
Neg logit contributions
es-0.014
ed-0.014
ing-0.014
med-0.013
oped-0.013
Token off
Token ID572
Feature activation+2.03e-02

Pos logit contributions
es+0.030
oped+0.030
ing+0.029
ed+0.029
med+0.028
Neg logit contributions
 pap-0.015
 not-0.014
 help-0.014
 plan-0.014
 grown-0.013
Token his
Token ID465
Feature activation-1.54e-02

Pos logit contributions
es+0.046
ing+0.045
ed+0.045
oped+0.044
med+0.043
Neg logit contributions
 pap-0.025
 not-0.025
 plan-0.023
 help-0.023
 grown-0.022
Token white
Token ID2330
Feature activation+4.09e-03

Pos logit contributions
 pap+0.022
 not+0.020
 ours+0.020
 gard+0.019
 plan+0.019
Neg logit contributions
es-0.030
ed-0.030
med-0.030
ing-0.030
oped-0.029
Token cap
Token ID1451
Feature activation+1.15e-01

Pos logit contributions
es+0.008
ed+0.008
ing+0.008
oped+0.008
med+0.008
Neg logit contributions
 pap-0.006
 not-0.006
 ours-0.006
 plan-0.005
 mistake-0.005
Token and
Token ID290
Feature activation-1.72e-02

Pos logit contributions
oped+0.190
ays+0.185
med+0.183
es+0.182
ing+0.178
Neg logit contributions
 not-0.216
 help-0.195
 pap-0.191
 plan-0.187
 be-0.184
Token jumped
Token ID11687
Feature activation+4.90e-02

Pos logit contributions
 mistake+0.027
 pap+0.027
 plan+0.027
 ours+0.026
 fact+0.026
Neg logit contributions
es-0.033
ed-0.032
med-0.030
ing-0.030
oped-0.029
Token out
Token ID503
Feature activation+2.16e-02

Pos logit contributions
es+0.088
oped+0.087
ane+0.082
ing+0.081
med+0.077
Neg logit contributions
 not-0.085
 help-0.081
 grown-0.080
 pap-0.078
 fun-0.078
Token of
Token ID286
Feature activation-3.03e-02

Pos logit contributions
es+0.035
oped+0.034
ed+0.033
ing+0.033
ane+0.032
Neg logit contributions
 help-0.041
 not-0.040
 pap-0.040
 plan-0.039
 grown-0.039
Token bed
Token ID3996
Feature activation+7.40e-02

Pos logit contributions
 majesty+0.044
 gard+0.042
 pap+0.041
 disl+0.041
 plan+0.039
Neg logit contributions
med-0.058
ted-0.055
 again-0.053
es-0.052
ays-0.052
Token arm
Token ID3211
Feature activation+7.08e-02

Pos logit contributions
 pap+0.028
 disl+0.026
 plan+0.024
 not+0.024
 dads+0.024
Neg logit contributions
es-0.049
ed-0.047
ing-0.046
med-0.045
s-0.045
Tokenchair
Token ID16337
Feature activation+7.51e-02

Pos logit contributions
oped+0.096
es+0.087
ing+0.085
med+0.083
ays+0.082
Neg logit contributions
 not-0.147
 help-0.144
 old-0.142
 grown-0.139
 be-0.139
Token and
Token ID290
Feature activation-3.51e-03

Pos logit contributions
oped+0.135
med+0.131
ed+0.130
ing+0.129
es+0.128
Neg logit contributions
 not-0.135
 help-0.124
 plan-0.122
 be-0.116
 pap-0.116
Token put
Token ID1234
Feature activation+3.30e-02

Pos logit contributions
 pap+0.005
 not+0.005
 plan+0.004
 help+0.004
 grown+0.004
Neg logit contributions
es-0.008
ing-0.007
ed-0.007
oped-0.007
med-0.007
Token a
Token ID257
Feature activation-1.94e-02

Pos logit contributions
es+0.075
ing+0.074
oped+0.073
ed+0.071
ane+0.070
Neg logit contributions
 not-0.040
 pap-0.039
 help-0.038
 plan-0.036
 fun-0.034
Token crown
Token ID12389
Feature activation+1.15e-01

Pos logit contributions
 pap+0.023
 disl+0.023
 not+0.022
 permission+0.021
 fact+0.021
Neg logit contributions
es-0.042
ing-0.041
ed-0.041
oped-0.040
med-0.040
Token on
Token ID319
Feature activation+4.16e-03

Pos logit contributions
oped+0.179
acked+0.162
es+0.159
umping+0.159
med+0.158
Neg logit contributions
 not-0.219
 help-0.202
 big-0.189
 sc-0.189
 be-0.188
Token his
Token ID465
Feature activation-1.80e-02

Pos logit contributions
es+0.010
ing+0.010
oped+0.010
ed+0.010
med+0.009
Neg logit contributions
 not-0.005
 pap-0.005
 help-0.004
 plan-0.004
 grown-0.004
Token head
Token ID1182
Feature activation+7.37e-02

Pos logit contributions
 disl+0.019
 pap+0.019
 ours+0.017
 not+0.017
 fact+0.017
Neg logit contributions
es-0.042
ed-0.041
ing-0.039
med-0.039
s-0.038
Token.
Token ID13
Feature activation+5.95e-02

Pos logit contributions
oped+0.140
med+0.132
acked+0.132
ays+0.131
hed+0.130
Neg logit contributions
 not-0.118
 be-0.103
 help-0.100
 pap-0.097
 permission-0.095
Token\n (newline)
Token ID198
Feature activation+4.63e-02

Pos logit contributions
es+0.113
ing+0.108
oped+0.107
ane+0.106
med+0.105
Neg logit contributions
 not-0.101
 grown-0.098
 pap-0.097
 help-0.091
 plan-0.091
Token.
Token ID13
Feature activation+4.46e-02

Pos logit contributions
oped+0.050
ane+0.046
es+0.045
or+0.045
ays+0.044
Neg logit contributions
 not-0.036
 help-0.033
 be-0.033
 fun-0.032
 grown-0.032
Token He
Token ID679
Feature activation-1.47e-02

Pos logit contributions
es+0.076
oped+0.074
med+0.073
ing+0.071
ed+0.070
Neg logit contributions
 not-0.085
 pap-0.082
 help-0.082
 grown-0.080
 welcome-0.080
Token puts
Token ID7584
Feature activation+2.53e-02

Pos logit contributions
 pap+0.023
 not+0.022
 plan+0.021
 ours+0.021
 permission+0.020
Neg logit contributions
es-0.029
ed-0.028
ing-0.027
med-0.026
oped-0.026
Token on
Token ID319
Feature activation+1.22e-02

Pos logit contributions
es+0.065
oped+0.063
ing+0.063
ed+0.061
ane+0.060
Neg logit contributions
 not-0.023
 pap-0.022
 help-0.022
 plan-0.020
 fun-0.020
Token his
Token ID465
Feature activation-2.16e-02

Pos logit contributions
es+0.027
ing+0.026
ed+0.026
med+0.025
oped+0.025
Neg logit contributions
 pap-0.016
 not-0.015
 plan-0.015
 help-0.014
 mistake-0.014
Token mask
Token ID9335
Feature activation+1.15e-01

Pos logit contributions
 pap+0.032
 not+0.030
 ours+0.030
 disl+0.029
 mistake+0.029
Neg logit contributions
es-0.041
ed-0.039
med-0.038
ing-0.038
oped-0.036
Token.
Token ID13
Feature activation+5.00e-02

Pos logit contributions
oped+0.236
ing+0.219
med+0.218
ays+0.217
acked+0.206
Neg logit contributions
 not-0.192
 old-0.157
 help-0.155
 sc-0.155
 fun-0.149
Token He
Token ID679
Feature activation-1.32e-02

Pos logit contributions
es+0.085
ing+0.082
oped+0.081
med+0.079
ed+0.079
Neg logit contributions
 not-0.098
 help-0.093
 fun-0.092
 pap-0.091
 grown-0.091
Token thanks
Token ID5176
Feature activation+2.28e-02

Pos logit contributions
 pap+0.020
 not+0.019
 plan+0.019
 ours+0.019
 permission+0.019
Neg logit contributions
es-0.026
ed-0.024
ing-0.024
med-0.024
oped-0.023
Token Sam
Token ID3409
Feature activation+5.21e-02

Pos logit contributions
es+0.045
ing+0.044
ed+0.044
med+0.042
ted+0.041
Neg logit contributions
 pap-0.033
 plan-0.032
 fact-0.031
 not-0.030
 mistake-0.030
Token.
Token ID13
Feature activation+4.69e-02

Pos logit contributions
oped+0.112
med+0.101
ays+0.101
acked+0.100
ane+0.100
Neg logit contributions
 help-0.076
 not-0.075
 pap-0.072
 permission-0.071
 work-0.070
Token put
Token ID1234
Feature activation+2.11e-02

Pos logit contributions
 pap+0.029
 not+0.027
 mistake+0.027
 disl+0.026
 plan+0.026
Neg logit contributions
es-0.037
ed-0.036
oped-0.035
ing-0.035
med-0.035
Token on
Token ID319
Feature activation+6.61e-03

Pos logit contributions
es+0.043
ing+0.042
ed+0.041
oped+0.041
med+0.040
Neg logit contributions
 pap-0.031
 not-0.030
 plan-0.029
 help-0.028
 mistake-0.027
Token his
Token ID465
Feature activation-2.66e-02

Pos logit contributions
es+0.013
med+0.013
ed+0.013
ing+0.013
hed+0.012
Neg logit contributions
 pap-0.010
 plan-0.009
 gard-0.009
 mistake-0.009
 majesty-0.009
Token favorite
Token ID4004
Feature activation-1.23e-02

Pos logit contributions
 pap+0.029
 gard+0.028
 disl+0.028
 mistake+0.027
 not+0.026
Neg logit contributions
ed-0.059
es-0.059
med-0.057
ing-0.056
oped-0.055
Token blue
Token ID4171
Feature activation+3.21e-03

Pos logit contributions
 pap+0.015
 not+0.014
 gard+0.013
 mistake+0.013
 majesty+0.013
Neg logit contributions
es-0.028
ed-0.027
ing-0.027
med-0.027
oped-0.026
Token cap
Token ID1451
Feature activation+1.15e-01

Pos logit contributions
es+0.008
ed+0.007
ing+0.007
med+0.007
oped+0.007
Neg logit contributions
 pap-0.004
 not-0.003
 mistake-0.003
 plan-0.003
 disl-0.003
Token and
Token ID290
Feature activation-1.52e-02

Pos logit contributions
oped+0.194
ays+0.189
med+0.185
es+0.182
ing+0.180
Neg logit contributions
 not-0.202
 sc-0.172
 pap-0.171
 grown-0.168
 dad-0.167
Token hopped
Token ID45230
Feature activation+4.41e-02

Pos logit contributions
 pap+0.022
 mistake+0.021
 plan+0.021
 not+0.021
 help+0.020
Neg logit contributions
es-0.031
ed-0.031
ing-0.029
med-0.029
oped-0.029
Token in
Token ID287
Feature activation+4.63e-03

Pos logit contributions
es+0.071
oped+0.066
ing+0.065
ane+0.061
ed+0.061
Neg logit contributions
 not-0.084
 help-0.082
 grown-0.080
 pap-0.080
 plan-0.079
Token the
Token ID262
Feature activation-3.33e-02

Pos logit contributions
es+0.011
ing+0.010
ed+0.010
oped+0.010
med+0.010
Neg logit contributions
 pap-0.006
 not-0.005
 plan-0.005
 help-0.005
 grown-0.005
Token car
Token ID1097
Feature activation+7.38e-02

Pos logit contributions
 pap+0.047
 not+0.046
 plan+0.044
 mistake+0.043
 help+0.042
Neg logit contributions
es-0.069
ing-0.067
ed-0.067
oped-0.065
med-0.065
Token They
Token ID1119
Feature activation-2.51e-02

Pos logit contributions
es+0.129
med+0.117
ing+0.116
oped+0.115
ane+0.112
Neg logit contributions
 not-0.086
 fun-0.082
 grown-0.079
 be-0.076
 dad-0.075
Token rolled
Token ID11686
Feature activation+4.75e-02

Pos logit contributions
 pap+0.035
 not+0.033
 mistake+0.033
 plan+0.032
 disl+0.032
Neg logit contributions
es-0.053
ed-0.052
oped-0.051
ing-0.051
med-0.050
Token big
Token ID1263
Feature activation+1.42e-02

Pos logit contributions
es+0.082
oped+0.080
ing+0.079
ed+0.077
med+0.076
Neg logit contributions
 not-0.084
 pap-0.083
 plan-0.080
 help-0.079
 fun-0.078
Token balls
Token ID11333
Feature activation+9.98e-02

Pos logit contributions
es+0.035
ing+0.034
ed+0.033
oped+0.033
med+0.033
Neg logit contributions
 pap-0.015
 not-0.014
 plan-0.014
 help-0.013
 grown-0.013
Token of
Token ID286
Feature activation-4.19e-02

Pos logit contributions
es+0.163
oped+0.163
med+0.158
ing+0.152
ed+0.151
Neg logit contributions
 not-0.187
 help-0.173
 plan-0.164
 fun-0.163
 be-0.160
Token snow
Token ID6729
Feature activation+1.15e-01

Pos logit contributions
 pap+0.057
 majesty+0.056
 mistake+0.055
 disl+0.055
 not+0.055
Neg logit contributions
es-0.085
med-0.080
ing-0.080
ed-0.077
ted-0.076
Token and
Token ID290
Feature activation-1.09e-02

Pos logit contributions
oped+0.213
med+0.197
ane+0.186
 onwards+0.185
 onward+0.184
Neg logit contributions
 not-0.197
 pap-0.175
 help-0.171
 be-0.168
 fun-0.166
Token stacked
Token ID24167
Feature activation+5.13e-02

Pos logit contributions
 pap+0.017
 mistake+0.017
 plan+0.016
 not+0.016
 disl+0.016
Neg logit contributions
es-0.021
ed-0.021
ing-0.020
med-0.020
oped-0.020
Token them
Token ID606
Feature activation+6.13e-02

Pos logit contributions
oped+0.122
es+0.120
ing+0.117
ed+0.116
ane+0.116
Neg logit contributions
 not-0.059
 pap-0.054
 fun-0.054
 help-0.051
 mud-0.050
Token on
Token ID319
Feature activation+6.50e-03

Pos logit contributions
oped+0.130
ane+0.120
umping+0.119
ays+0.113
med+0.111
Neg logit contributions
 not-0.111
 fun-0.089
 be-0.087
 help-0.087
 big-0.086
Token top
Token ID1353
Feature activation+6.07e-02

Pos logit contributions
es+0.016
ing+0.016
ed+0.016
oped+0.016
med+0.015
Neg logit contributions
 pap-0.006
 not-0.006
 plan-0.006
 help-0.006
 mistake-0.006
Token saw
Token ID2497
Feature activation+3.82e-03

Pos logit contributions
es+0.050
ing+0.049
ed+0.048
med+0.048
oped+0.047
Neg logit contributions
 not-0.024
 pap-0.024
 plan-0.023
 grown-0.022
 help-0.021
Token a
Token ID257
Feature activation-2.16e-02

Pos logit contributions
ing+0.008
es+0.008
ed+0.008
med+0.008
ted+0.007
Neg logit contributions
 pap-0.005
 majesty-0.005
 plan-0.005
 promise-0.005
 disl-0.005
Token big
Token ID1263
Feature activation-1.86e-03

Pos logit contributions
 pap+0.021
 permission+0.020
 not+0.019
 disl+0.019
 ours+0.017
Neg logit contributions
ing-0.053
es-0.051
oped-0.050
ed-0.050
s-0.049
Token box
Token ID3091
Feature activation+7.52e-02

Pos logit contributions
 pap+0.003
 not+0.003
 disl+0.002
 permission+0.002
 majesty+0.002
Neg logit contributions
ing-0.004
es-0.004
ed-0.003
oped-0.003
med-0.003
Token of
Token ID286
Feature activation-3.47e-02

Pos logit contributions
es+0.135
ed+0.127
oped+0.124
med+0.122
ing+0.120
Neg logit contributions
 not-0.138
 help-0.124
 plan-0.124
 grown-0.123
 pap-0.119
Token Leg
Token ID3564
Feature activation+1.15e-01

Pos logit contributions
 pap+0.043
 not+0.042
 majesty+0.039
 mistake+0.038
 disl+0.037
Neg logit contributions
es-0.075
ing-0.075
ed-0.071
med-0.070
ted-0.069
Tokenos
Token ID418
Feature activation+7.46e-02

Pos logit contributions
ted+0.135
ing+0.134
es+0.125
 onward+0.125
ed+0.123
Neg logit contributions
 not-0.243
 help-0.220
 plan-0.220
 been-0.217
 be-0.217
Token that
Token ID326
Feature activation+2.08e-02

Pos logit contributions
ed+0.125
oped+0.123
antly+0.118
es+0.117
 onward+0.115
Neg logit contributions
 work-0.123
 sc-0.123
 help-0.122
 not-0.119
 water-0.117
Token he
Token ID339
Feature activation+1.02e-03

Pos logit contributions
es+0.038
ing+0.037
ed+0.036
oped+0.036
med+0.035
Neg logit contributions
 pap-0.035
 not-0.035
 plan-0.033
 help-0.033
 grown-0.032
Token really
Token ID1107
Feature activation-2.88e-02

Pos logit contributions
es+0.002
ed+0.002
ing+0.002
oped+0.002
med+0.002
Neg logit contributions
 pap-0.001
 not-0.001
 plan-0.001
 help-0.001
 mistake-0.001
Token wanted
Token ID2227
Feature activation+2.15e-02

Pos logit contributions
 pap+0.044
 not+0.038
 mistake+0.038
 plan+0.038
 help+0.038
Neg logit contributions
ed-0.057
es-0.057
ing-0.055
oped-0.054
hed-0.053
Token He
Token ID679
Feature activation-1.11e-02

Pos logit contributions
oped+0.085
es+0.084
ing+0.082
med+0.079
ane+0.076
Neg logit contributions
 not-0.097
 grown-0.097
 pap-0.095
 help-0.090
 plan-0.090
Token put
Token ID1234
Feature activation+2.63e-02

Pos logit contributions
 pap+0.015
 not+0.015
 plan+0.015
 help+0.014
 grown+0.014
Neg logit contributions
es-0.023
ing-0.023
ed-0.022
oped-0.022
med-0.022
Token on
Token ID319
Feature activation+1.62e-02

Pos logit contributions
es+0.057
ing+0.056
oped+0.055
ed+0.054
med+0.053
Neg logit contributions
 not-0.034
 pap-0.034
 help-0.032
 plan-0.032
 fun-0.031
Token a
Token ID257
Feature activation-1.90e-02

Pos logit contributions
es+0.038
ed+0.037
ing+0.037
med+0.036
oped+0.035
Neg logit contributions
 pap-0.019
 not-0.017
 plan-0.017
 mistake-0.016
 help-0.016
Token golden
Token ID10861
Feature activation+2.56e-03

Pos logit contributions
 pap+0.022
 not+0.020
 mistake+0.018
 permission+0.018
 disl+0.018
Neg logit contributions
es-0.044
ing-0.043
ed-0.043
oped-0.043
med-0.043
Token crown
Token ID12389
Feature activation+1.15e-01

Pos logit contributions
es+0.005
ed+0.005
ing+0.005
med+0.005
oped+0.005
Neg logit contributions
 pap-0.003
 not-0.003
 disl-0.003
 mistake-0.003
 ours-0.003
Token and
Token ID290
Feature activation-1.58e-02

Pos logit contributions
oped+0.229
acked+0.212
med+0.210
es+0.207
hed+0.199
Neg logit contributions
 not-0.180
 help-0.151
 be-0.149
 sc-0.149
 turn-0.144
Token felt
Token ID2936
Feature activation-2.63e-02

Pos logit contributions
 pap+0.023
 plan+0.022
 mistake+0.022
 not+0.021
 ours+0.020
Neg logit contributions
es-0.033
ed-0.032
med-0.031
ing-0.031
oped-0.030
Token like
Token ID588
Feature activation-9.99e-03

Pos logit contributions
 pap+0.035
 mistake+0.033
 plan+0.032
 fact+0.031
 not+0.031
Neg logit contributions
es-0.057
ed-0.056
med-0.055
ing-0.054
oped-0.051
Token a
Token ID257
Feature activation-1.00e-02

Pos logit contributions
 pap+0.011
 not+0.011
 plan+0.010
 help+0.010
 mistake+0.010
Neg logit contributions
es-0.024
ing-0.023
ed-0.023
oped-0.023
med-0.022
Token king
Token ID5822
Feature activation+7.79e-02

Pos logit contributions
 pap+0.012
 not+0.012
 ours+0.011
 plan+0.011
 permission+0.011
Neg logit contributions
es-0.022
ing-0.022
ed-0.021
oped-0.021
med-0.021
Token and
Token ID290
Feature activation-1.65e-02

Pos logit contributions
oped+0.173
es+0.167
hed+0.162
med+0.161
ane+0.159
Neg logit contributions
 not-0.138
 old-0.121
 help-0.119
 be-0.116
 work-0.114
Token took
Token ID1718
Feature activation+1.38e-02

Pos logit contributions
 pap+0.025
 disl+0.024
 plan+0.024
 fact+0.024
 ours+0.024
Neg logit contributions
es-0.032
ed-0.032
ing-0.030
oped-0.029
med-0.029
Token off
Token ID572
Feature activation+2.79e-02

Pos logit contributions
es+0.028
ed+0.028
ing+0.027
med+0.027
oped+0.026
Neg logit contributions
 pap-0.020
 not-0.019
 plan-0.019
 mistake-0.018
 fact-0.018
Token his
Token ID465
Feature activation-1.96e-02

Pos logit contributions
es+0.068
ing+0.067
ed+0.066
oped+0.066
med+0.064
Neg logit contributions
 not-0.031
 pap-0.029
 help-0.028
 plan-0.027
 grown-0.027
Token heavy
Token ID4334
Feature activation-1.11e-02

Pos logit contributions
 pap+0.033
 not+0.031
 plan+0.030
 ours+0.030
 mistake+0.029
Neg logit contributions
es-0.034
ed-0.034
ing-0.033
med-0.032
oped-0.032
Token vest
Token ID19750
Feature activation+1.15e-01

Pos logit contributions
 pap+0.019
 not+0.019
 plan+0.018
 ours+0.017
 grown+0.017
Neg logit contributions
es-0.019
ing-0.019
ed-0.019
oped-0.018
med-0.018
Token.
Token ID13
Feature activation+5.67e-02

Pos logit contributions
med+0.235
oped+0.232
ays+0.218
es+0.216
acked+0.213
Neg logit contributions
 not-0.200
 help-0.166
 sc-0.156
 work-0.155
 be-0.153
Token He
Token ID679
Feature activation-1.16e-02

Pos logit contributions
es+0.094
ing+0.091
oped+0.088
med+0.088
ed+0.087
Neg logit contributions
 not-0.108
 pap-0.105
 plan-0.100
 grown-0.100
 help-0.100
Token hung
Token ID9174
Feature activation+4.12e-02

Pos logit contributions
 pap+0.018
 ours+0.017
 disl+0.017
 not+0.017
 plan+0.017
Neg logit contributions
es-0.022
ed-0.021
oped-0.020
ing-0.020
med-0.020
Token it
Token ID340
Feature activation+4.84e-02

Pos logit contributions
es+0.090
oped+0.088
ing+0.086
ed+0.084
hed+0.083
Neg logit contributions
 not-0.056
 pap-0.052
 help-0.050
 plan-0.048
 grown-0.048
Token up
Token ID510
Feature activation+2.38e-02

Pos logit contributions
oped+0.096
umping+0.092
es+0.091
ays+0.091
ane+0.090
Neg logit contributions
 not-0.093
 work-0.078
 big-0.078
 dry-0.078
 be-0.077
Token.
Token ID13
Feature activation+4.31e-02

Pos logit contributions
es+0.122
oped+0.121
med+0.121
ing+0.117
ed+0.116
Neg logit contributions
 not-0.153
 plan-0.144
 mistake-0.144
 pap-0.143
 help-0.142
Token They
Token ID1119
Feature activation-2.37e-02

Pos logit contributions
ing+0.090
es+0.088
oped+0.085
med+0.082
ed+0.081
Neg logit contributions
 not-0.070
 help-0.066
 fun-0.065
 grown-0.064
 pap-0.064
Token picked
Token ID6497
Feature activation+2.90e-02

Pos logit contributions
 pap+0.040
 disl+0.039
 majesty+0.038
 ours+0.038
 mistake+0.037
Neg logit contributions
es-0.043
ed-0.041
oped-0.041
hed-0.039
s-0.038
Token up
Token ID510
Feature activation+3.39e-02

Pos logit contributions
es+0.051
oped+0.049
ing+0.048
ed+0.047
med+0.047
Neg logit contributions
 pap-0.050
 not-0.050
 plan-0.048
 help-0.047
 mistake-0.046
Token their
Token ID511
Feature activation-2.50e-02

Pos logit contributions
es+0.085
ing+0.083
ed+0.083
oped+0.081
med+0.080
Neg logit contributions
 pap-0.034
 not-0.032
 plan-0.031
 help-0.029
 mistake-0.029
Token sunglasses
Token ID36251
Feature activation+1.14e-01

Pos logit contributions
 pap+0.030
 not+0.028
 majesty+0.026
 fact+0.025
 gard+0.024
Neg logit contributions
ed-0.056
s-0.055
es-0.054
ing-0.054
ted-0.053
Token.
Token ID13
Feature activation+4.44e-02

Pos logit contributions
oped+0.224
med+0.218
es+0.209
acked+0.207
ed+0.203
Neg logit contributions
 not-0.189
 help-0.163
 sc-0.159
 be-0.155
 water-0.151
Token They
Token ID1119
Feature activation-2.44e-02

Pos logit contributions
ing+0.100
es+0.097
oped+0.096
med+0.093
ane+0.091
Neg logit contributions
 not-0.067
 fun-0.062
 help-0.060
 grown-0.060
 pap-0.058
Token ran
Token ID4966
Feature activation+3.60e-02

Pos logit contributions
 pap+0.040
 majesty+0.038
 ours+0.038
 disl+0.038
 mistake+0.038
Neg logit contributions
es-0.046
ed-0.044
oped-0.043
hed-0.042
med-0.041
Token to
Token ID284
Feature activation-7.15e-03

Pos logit contributions
es+0.080
oped+0.077
med+0.075
ing+0.075
ane+0.073
Neg logit contributions
 not-0.049
 help-0.047
 fun-0.046
 grown-0.044
 plan-0.044
Token the
Token ID262
Feature activation-3.68e-02

Pos logit contributions
 majesty+0.013
 pap+0.013
 vitamins+0.012
 fact+0.012
 gard+0.011
Neg logit contributions
 again-0.011
es-0.011
s-0.010
ed-0.010
med-0.010
Token He
Token ID679
Feature activation-1.68e-02

Pos logit contributions
oped+0.085
es+0.084
ing+0.084
med+0.080
ays+0.078
Neg logit contributions
 not-0.121
 pap-0.119
 help-0.113
 grown-0.112
 plan-0.111
Token grabbed
Token ID13176
Feature activation+7.89e-03

Pos logit contributions
 pap+0.023
 ours+0.022
 not+0.021
 plan+0.021
 majesty+0.021
Neg logit contributions
es-0.036
ed-0.035
ing-0.034
oped-0.034
med-0.033
Token hold
Token ID1745
Feature activation+4.01e-02

Pos logit contributions
es+0.018
ed+0.018
ing+0.017
med+0.017
oped+0.017
Neg logit contributions
 pap-0.010
 not-0.009
 plan-0.009
 mistake-0.009
 majesty-0.009
Token of
Token ID286
Feature activation-1.47e-02

Pos logit contributions
es+0.064
oped+0.061
ane+0.057
hed+0.056
med+0.056
Neg logit contributions
 help-0.075
 pap-0.072
 not-0.071
 permission-0.069
 welcome-0.068
Token the
Token ID262
Feature activation-2.79e-02

Pos logit contributions
 disl+0.024
 majesty+0.024
 pap+0.023
 rightful+0.023
 mistake+0.023
Neg logit contributions
es-0.025
med-0.025
ed-0.025
ted-0.025
ing-0.024
Token mask
Token ID9335
Feature activation+1.14e-01

Pos logit contributions
 pap+0.045
 not+0.043
 disl+0.042
 ours+0.041
 mistake+0.040
Neg logit contributions
ing-0.053
es-0.052
ed-0.051
med-0.048
s-0.047
Token and
Token ID290
Feature activation-7.02e-03

Pos logit contributions
oped+0.193
es+0.190
med+0.187
ays+0.185
ed+0.180
Neg logit contributions
 not-0.216
 help-0.192
 plan-0.187
 pap-0.186
 special-0.182
Token used
Token ID973
Feature activation-2.02e-03

Pos logit contributions
 pap+0.011
 plan+0.011
 mistake+0.010
 not+0.010
 majesty+0.010
Neg logit contributions
es-0.014
ed-0.013
ing-0.013
med-0.013
oped-0.013
Token it
Token ID340
Feature activation+3.99e-02

Pos logit contributions
 pap+0.004
 mistake+0.003
 plan+0.003
 majesty+0.003
 not+0.003
Neg logit contributions
es-0.004
ed-0.003
ing-0.003
med-0.003
oped-0.003
Token to
Token ID284
Feature activation-2.58e-02

Pos logit contributions
es+0.084
ays+0.084
oped+0.084
ane+0.084
ing+0.081
Neg logit contributions
 not-0.056
 help-0.054
 fun-0.049
 grown-0.048
 permission-0.048
Token protect
Token ID1805
Feature activation+3.77e-02

Pos logit contributions
 pap+0.032
 ours+0.029
 majesty+0.028
 fact+0.026
 disl+0.026
Neg logit contributions
es-0.057
oped-0.054
med-0.053
ed-0.053
ing-0.053
Token Leo
Token ID19632
Feature activation+6.06e-02

Pos logit contributions
oped+0.085
ing+0.085
hed+0.082
ed+0.081
ays+0.080
Neg logit contributions
 pap-0.049
 not-0.047
 help-0.042
 plan-0.040
 grown-0.040
Token and
Token ID290
Feature activation-6.30e-03

Pos logit contributions
ing+0.112
oped+0.111
ed+0.109
hed+0.107
ays+0.106
Neg logit contributions
 pap-0.101
 not-0.098
 help-0.092
 plan-0.089
 be-0.087
Token gave
Token ID2921
Feature activation+2.47e-02

Pos logit contributions
 mistake+0.009
 plan+0.008
 not+0.008
 pap+0.008
 ours+0.008
Neg logit contributions
es-0.014
ed-0.013
med-0.012
ing-0.012
oped-0.012
Token him
Token ID683
Feature activation+3.37e-02

Pos logit contributions
es+0.049
ing+0.047
ed+0.047
med+0.045
oped+0.045
Neg logit contributions
 pap-0.037
 not-0.036
 plan-0.036
 mistake-0.035
 grown-0.034
Token a
Token ID257
Feature activation-2.81e-02

Pos logit contributions
es+0.080
ed+0.077
ing+0.075
med+0.074
ted+0.070
Neg logit contributions
 pap-0.038
 mistake-0.038
 plan-0.037
 not-0.036
 fact-0.036
Token crown
Token ID12389
Feature activation+1.14e-01

Pos logit contributions
 disl+0.033
 pap+0.033
 not+0.033
 mistake+0.032
 permission+0.031
Neg logit contributions
s-0.058
es-0.057
ing-0.057
ed-0.056
oped-0.056
Token and
Token ID290
Feature activation-2.21e-02

Pos logit contributions
oped+0.215
med+0.204
hed+0.201
es+0.194
ed+0.193
Neg logit contributions
 not-0.190
 pap-0.177
 help-0.176
 sweet-0.164
 dad-0.160
Token a
Token ID257
Feature activation-3.61e-02

Pos logit contributions
 mistake+0.029
 plan+0.026
 pap+0.025
 fact+0.025
 disl+0.025
Neg logit contributions
es-0.050
ed-0.048
med-0.045
ted-0.044
s-0.043
Token cape
Token ID36023
Feature activation+9.90e-02

Pos logit contributions
 not+0.045
 pap+0.045
 disl+0.045
 permission+0.043
 mistake+0.043
Neg logit contributions
es-0.073
ed-0.070
ing-0.070
s-0.069
oped-0.069
Token.
Token ID13
Feature activation+6.01e-02

Pos logit contributions
oped+0.175
ing+0.170
med+0.167
es+0.165
ays+0.161
Neg logit contributions
 not-0.170
 help-0.163
 be-0.147
 pap-0.147
 plan-0.147
Token She
Token ID1375
Feature activation-1.27e-02

Pos logit contributions
ing+0.131
ed+0.128
oped+0.128
es+0.126
med+0.124
Neg logit contributions
 pap-0.085
 not-0.084
 plan-0.075
 help-0.075
 fun-0.074
Token.
Token ID13
Feature activation+4.66e-02

Pos logit contributions
oped+0.102
ing+0.101
es+0.099
med+0.097
ed+0.096
Neg logit contributions
 not-0.117
 help-0.110
 pap-0.108
 plan-0.107
 grown-0.107
Token He
Token ID679
Feature activation-1.91e-02

Pos logit contributions
es+0.068
ing+0.066
oped+0.064
ed+0.064
med+0.063
Neg logit contributions
 pap-0.099
 not-0.098
 grown-0.092
 plan-0.091
 help-0.090
Token puts
Token ID7584
Feature activation+1.72e-02

Pos logit contributions
 pap+0.029
 ours+0.028
 permission+0.028
 not+0.028
 plan+0.027
Neg logit contributions
es-0.038
ed-0.034
oped-0.033
ing-0.033
s-0.033
Token on
Token ID319
Feature activation+9.42e-03

Pos logit contributions
es+0.039
ing+0.039
oped+0.038
ed+0.038
med+0.037
Neg logit contributions
 pap-0.021
 not-0.020
 plan-0.019
 help-0.019
 grown-0.018
Token his
Token ID465
Feature activation-2.40e-02

Pos logit contributions
es+0.020
ed+0.019
med+0.019
ing+0.018
oped+0.018
Neg logit contributions
 pap-0.013
 plan-0.013
 not-0.012
 mistake-0.012
 ours-0.012
Token mask
Token ID9335
Feature activation+1.14e-01

Pos logit contributions
 pap+0.030
 not+0.027
 gard+0.027
 disl+0.027
 plan+0.027
Neg logit contributions
es-0.051
ed-0.049
med-0.049
ing-0.047
s-0.046
Token and
Token ID290
Feature activation-1.63e-02

Pos logit contributions
oped+0.227
ays+0.217
ing+0.213
med+0.209
acked+0.202
Neg logit contributions
 not-0.189
 help-0.149
 sc-0.148
 grown-0.147
 fun-0.145
Token plays
Token ID5341
Feature activation+4.71e-02

Pos logit contributions
 pap+0.024
 plan+0.024
 not+0.023
 mistake+0.023
 ours+0.023
Neg logit contributions
es-0.032
ed-0.030
med-0.029
ing-0.028
oped-0.028
Token with
Token ID351
Feature activation-5.53e-03

Pos logit contributions
ing+0.083
oped+0.083
ed+0.079
es+0.079
ane+0.078
Neg logit contributions
 not-0.085
 help-0.081
 fun-0.079
 grown-0.079
 plan-0.079
Token his
Token ID465
Feature activation-1.60e-02

Pos logit contributions
 pap+0.009
 plan+0.009
 gard+0.009
 disl+0.008
 mistake+0.008
Neg logit contributions
es-0.010
ed-0.010
med-0.009
ing-0.009
ted-0.009
Token dinosaur
Token ID28257
Feature activation+6.81e-02

Pos logit contributions
 pap+0.018
 not+0.017
 disl+0.016
 ours+0.016
 gard+0.016
Neg logit contributions
es-0.037
ed-0.035
ing-0.035
med-0.034
oped-0.034
Tokenrobe
Token ID25481
Feature activation+2.51e-02

Pos logit contributions
es+0.025
ing+0.024
ed+0.024
oped+0.023
med+0.023
Neg logit contributions
 pap-0.026
 not-0.026
 plan-0.025
 help-0.024
 mistake-0.024
Token or
Token ID393
Feature activation-7.83e-02

Pos logit contributions
es+0.047
oped+0.045
ing+0.044
med+0.044
ed+0.044
Neg logit contributions
 not-0.041
 pap-0.039
 help-0.038
 plan-0.038
 grown-0.037
Token the
Token ID262
Feature activation-1.16e-01

Pos logit contributions
 pap+0.096
 fact+0.096
 gard+0.092
 mistake+0.090
 majesty+0.087
Neg logit contributions
es-0.158
ed-0.147
s-0.144
ing-0.141
med-0.138
Token towel
Token ID24808
Feature activation+2.35e-02

Pos logit contributions
 not+0.128
 pap+0.123
 disl+0.122
 ours+0.120
 majesty+0.110
Neg logit contributions
ing-0.247
es-0.243
ed-0.236
med-0.226
ane-0.222
Token or
Token ID393
Feature activation-7.70e-02

Pos logit contributions
es+0.039
ing+0.037
oped+0.037
ed+0.037
med+0.036
Neg logit contributions
 not-0.042
 pap-0.042
 plan-0.040
 help-0.040
 mistake-0.039
Token the
Token ID262
Feature activation-1.17e-01

Pos logit contributions
 pap+0.093
 fact+0.088
 gard+0.087
 mistake+0.085
 plan+0.084
Neg logit contributions
es-0.161
ed-0.153
ing-0.145
s-0.144
med-0.143
Token hug
Token ID16225
Feature activation+1.85e-02

Pos logit contributions
 not+0.130
 pap+0.124
 disl+0.124
 ours+0.122
 majesty+0.112
Neg logit contributions
ing-0.249
es-0.242
ed-0.237
med-0.228
oped-0.224
Token.
Token ID13
Feature activation+3.44e-05

Pos logit contributions
oped+0.036
med+0.033
ed+0.032
es+0.032
hed+0.032
Neg logit contributions
 not-0.030
 be-0.027
 pap-0.027
 help-0.027
 grown-0.026
Token I
Token ID314
Feature activation-6.02e-02

Pos logit contributions
ing+0.000
ed+0.000
es+0.000
oped+0.000
med+0.000
Neg logit contributions
 pap-0.000
 not-0.000
 welcome-0.000
 disl-0.000
 help-0.000
Token won
Token ID1839
Feature activation-2.58e-02

Pos logit contributions
 pap+0.078
 ours+0.069
 not+0.068
 majesty+0.067
 mistake+0.066
Neg logit contributions
es-0.133
ed-0.129
oped-0.127
ing-0.126
hed-0.123
Token't
Token ID470
Feature activation-8.14e-02

Pos logit contributions
 pap+0.040
 not+0.039
 plan+0.038
 help+0.037
 mistake+0.036
Neg logit contributions
es-0.050
ing-0.049
ed-0.048
oped-0.047
med-0.047
Token around
Token ID1088
Feature activation-3.99e-02

Pos logit contributions
 not+0.017
 plan+0.015
 grown+0.015
 help+0.015
 pap+0.015
Neg logit contributions
es-0.014
oped-0.014
ing-0.014
med-0.013
ed-0.013
Token the
Token ID262
Feature activation-9.38e-02

Pos logit contributions
 pap+0.056
 not+0.053
 plan+0.051
 help+0.050
 mistake+0.049
Neg logit contributions
es-0.084
ing-0.081
ed-0.081
oped-0.079
med-0.078
Token waves
Token ID9813
Feature activation+2.12e-02

Pos logit contributions
 pap+0.112
 not+0.104
 disl+0.099
 ours+0.099
 help+0.097
Neg logit contributions
ing-0.205
ed-0.193
es-0.190
s-0.185
hed-0.185
Token,
Token ID11
Feature activation-5.41e-02

Pos logit contributions
es+0.034
oped+0.033
ing+0.031
med+0.031
ed+0.031
Neg logit contributions
 not-0.039
 help-0.037
 plan-0.037
 pap-0.036
 grown-0.036
Token feeling
Token ID4203
Feature activation-6.53e-02

Pos logit contributions
 pap+0.092
 plan+0.088
 majesty+0.085
 help+0.085
 not+0.085
Neg logit contributions
es-0.094
ing-0.093
ed-0.092
med-0.089
oped-0.087
Token such
Token ID884
Feature activation-1.17e-01

Pos logit contributions
 pap+0.079
 majesty+0.074
 ours+0.074
 mistake+0.074
 plan+0.073
Neg logit contributions
ed-0.147
es-0.145
ing-0.145
med-0.144
hed-0.136
Token joy
Token ID8716
Feature activation+4.66e-03

Pos logit contributions
 pap+0.173
 majesty+0.158
 mistake+0.155
 dads+0.152
 plan+0.139
Neg logit contributions
ed-0.234
es-0.211
med-0.209
ing-0.205
hed-0.203
Token and
Token ID290
Feature activation-6.92e-02

Pos logit contributions
es+0.007
ed+0.007
oped+0.007
ing+0.007
med+0.007
Neg logit contributions
 pap-0.009
 not-0.009
 plan-0.009
 help-0.008
 grown-0.008
Token freedom
Token ID4925
Feature activation+1.69e-02

Pos logit contributions
 pap+0.073
 mistake+0.067
 majesty+0.066
 tresp+0.066
 ours+0.062
Neg logit contributions
ed-0.156
ing-0.152
es-0.152
med-0.145
 onwards-0.144
Token.
Token ID13
Feature activation+1.24e-04

Pos logit contributions
es+0.027
oped+0.026
ed+0.025
med+0.024
ing+0.024
Neg logit contributions
 not-0.032
 pap-0.031
 help-0.030
 grown-0.030
 plan-0.030
Token
Token ID220
Feature activation-1.83e-03

Pos logit contributions
es+0.000
ing+0.000
ed+0.000
med+0.000
oped+0.000
Neg logit contributions
 pap-0.000
 not-0.000
 grown-0.000
 plan-0.000
 help-0.000
Token a
Token ID257
Feature activation-9.06e-02

Pos logit contributions
 pap+0.165
 disl+0.151
 majesty+0.150
 dads+0.144
 gard+0.142
Neg logit contributions
ed-0.247
med-0.245
es-0.240
ing-0.240
hed-0.227
Token sque
Token ID35776
Feature activation-3.10e-03

Pos logit contributions
 permission+0.101
 disl+0.099
 ours+0.094
 not+0.093
 pap+0.087
Neg logit contributions
ing-0.190
oped-0.177
es-0.174
s-0.173
ed-0.171
Tokenaky
Token ID15492
Feature activation-1.87e-02

Pos logit contributions
 pap+0.005
 not+0.005
 plan+0.005
 help+0.004
 mistake+0.004
Neg logit contributions
es-0.006
ing-0.006
ed-0.006
oped-0.006
med-0.006
Token reply
Token ID10971
Feature activation+5.26e-03

Pos logit contributions
 not+0.022
 pap+0.021
 ours+0.021
 disl+0.020
 permission+0.019
Neg logit contributions
ing-0.042
es-0.042
ed-0.040
oped-0.039
med-0.038
Token from
Token ID422
Feature activation-7.20e-02

Pos logit contributions
oped+0.008
es+0.008
med+0.008
ing+0.008
ed+0.008
Neg logit contributions
 pap-0.010
 not-0.010
 plan-0.009
 grown-0.009
 help-0.009
Token the
Token ID262
Feature activation-1.16e-01

Pos logit contributions
 pap+0.095
 plan+0.091
 majesty+0.085
 not+0.084
 disl+0.083
Neg logit contributions
es-0.147
med-0.140
ed-0.139
ing-0.138
ted-0.132
Token tree
Token ID5509
Feature activation+1.73e-02

Pos logit contributions
 not+0.138
 disl+0.137
 pap+0.132
 ours+0.127
 plan+0.119
Neg logit contributions
ing-0.237
es-0.227
ed-0.223
ane-0.216
med-0.215
Token.
Token ID13
Feature activation+1.83e-03

Pos logit contributions
es+0.027
ed+0.027
ing+0.026
oped+0.026
med+0.026
Neg logit contributions
 not-0.033
 pap-0.032
 plan-0.031
 help-0.030
 mistake-0.030
Token "
Token ID366
Feature activation+8.40e-03

Pos logit contributions
ing+0.003
es+0.003
ed+0.003
oped+0.003
med+0.003
Neg logit contributions
 pap-0.003
 not-0.003
 grown-0.003
 be-0.003
 welcome-0.003
TokenMe
Token ID5308
Feature activation-3.05e-02

Pos logit contributions
es+0.017
ing+0.017
ed+0.016
ays+0.015
oped+0.015
Neg logit contributions
 pap-0.013
 not-0.012
 disl-0.012
 help-0.012
 be-0.012
Token,"
Token ID553
Feature activation+1.40e-02

Pos logit contributions
 pap+0.046
 not+0.044
 plan+0.043
 help+0.042
 mistake+0.042
Neg logit contributions
es-0.061
ing-0.060
ed-0.059
oped-0.058
med-0.057
Token you
Token ID345
Feature activation-6.52e-02

Pos logit contributions
 pap+0.055
 plan+0.052
 not+0.050
 mistake+0.048
 majesty+0.047
Neg logit contributions
es-0.095
ing-0.094
ed-0.094
med-0.089
oped-0.087
Token the
Token ID262
Feature activation-1.07e-01

Pos logit contributions
 pap+0.091
 fact+0.080
 majesty+0.080
 plan+0.080
 gard+0.078
Neg logit contributions
ing-0.133
es-0.133
ed-0.132
med-0.124
s-0.121
Token bath
Token ID7837
Feature activation+1.47e-02

Pos logit contributions
 pap+0.119
 not+0.116
 disl+0.106
 majesty+0.102
 ours+0.100
Neg logit contributions
ing-0.233
es-0.224
ed-0.221
ane-0.217
hed-0.208
Tokenrobe
Token ID25481
Feature activation+2.51e-02

Pos logit contributions
es+0.025
ing+0.024
ed+0.024
oped+0.023
med+0.023
Neg logit contributions
 pap-0.026
 not-0.026
 plan-0.025
 help-0.024
 mistake-0.024
Token or
Token ID393
Feature activation-7.83e-02

Pos logit contributions
es+0.047
oped+0.045
ing+0.044
med+0.044
ed+0.044
Neg logit contributions
 not-0.041
 pap-0.039
 help-0.038
 plan-0.038
 grown-0.037
Token the
Token ID262
Feature activation-1.16e-01

Pos logit contributions
 pap+0.096
 fact+0.096
 gard+0.092
 mistake+0.090
 majesty+0.087
Neg logit contributions
es-0.158
ed-0.147
s-0.144
ing-0.141
med-0.138
Token towel
Token ID24808
Feature activation+2.35e-02

Pos logit contributions
 not+0.128
 pap+0.123
 disl+0.122
 ours+0.120
 majesty+0.110
Neg logit contributions
ing-0.247
es-0.243
ed-0.236
med-0.226
ane-0.222
Token or
Token ID393
Feature activation-7.70e-02

Pos logit contributions
es+0.039
ing+0.037
oped+0.037
ed+0.037
med+0.036
Neg logit contributions
 not-0.042
 pap-0.042
 plan-0.040
 help-0.040
 mistake-0.039
Token the
Token ID262
Feature activation-1.17e-01

Pos logit contributions
 pap+0.093
 fact+0.088
 gard+0.087
 mistake+0.085
 plan+0.084
Neg logit contributions
es-0.161
ed-0.153
ing-0.145
s-0.144
med-0.143
Token hug
Token ID16225
Feature activation+1.85e-02

Pos logit contributions
 not+0.130
 pap+0.124
 disl+0.124
 ours+0.122
 majesty+0.112
Neg logit contributions
ing-0.249
es-0.242
ed-0.237
med-0.228
oped-0.224
Token.
Token ID13
Feature activation+3.44e-05

Pos logit contributions
oped+0.036
med+0.033
ed+0.032
es+0.032
hed+0.032
Neg logit contributions
 not-0.030
 be-0.027
 pap-0.027
 help-0.027
 grown-0.026
Token water
Token ID1660
Feature activation-3.71e-02

Pos logit contributions
oped+0.162
med+0.159
es+0.158
ed+0.155
ing+0.152
Neg logit contributions
 not-0.150
 work-0.136
 help-0.136
 sc-0.134
 be-0.133
Token and
Token ID290
Feature activation-8.79e-02

Pos logit contributions
 pap+0.074
 not+0.072
 plan+0.071
 help+0.069
 mistake+0.069
Neg logit contributions
es-0.055
ing-0.053
ed-0.053
oped-0.051
med-0.050
Token the
Token ID262
Feature activation-1.21e-01

Pos logit contributions
 pap+0.105
 majesty+0.093
 mistake+0.092
 fact+0.091
 plan+0.091
Neg logit contributions
es-0.191
ed-0.190
ing-0.180
med-0.178
hed-0.174
Token sand
Token ID6450
Feature activation+8.07e-03

Pos logit contributions
 not+0.125
 pap+0.125
 ours+0.117
 disl+0.117
 plan+0.110
Neg logit contributions
ing-0.271
ed-0.263
es-0.251
hed-0.242
med-0.242
Token and
Token ID290
Feature activation-7.49e-02

Pos logit contributions
es+0.016
ing+0.015
ed+0.015
oped+0.015
med+0.015
Neg logit contributions
 pap-0.012
 not-0.012
 plan-0.011
 help-0.011
 mistake-0.011
Token the
Token ID262
Feature activation-1.16e-01

Pos logit contributions
 pap+0.085
 mistake+0.077
 plan+0.077
 disl+0.076
 majesty+0.075
Neg logit contributions
es-0.165
ed-0.164
ing-0.154
med-0.154
hed-0.148
Token shells
Token ID19679
Feature activation+1.77e-02

Pos logit contributions
 pap+0.116
 not+0.111
 disl+0.106
 ours+0.103
 plan+0.099
Neg logit contributions
ing-0.259
ed-0.256
es-0.236
s-0.233
med-0.233
Token.
Token ID13
Feature activation-4.80e-03

Pos logit contributions
es+0.028
ing+0.027
ed+0.026
oped+0.026
med+0.026
Neg logit contributions
 not-0.033
 pap-0.033
 plan-0.032
 help-0.031
 grown-0.031
Token Maybe
Token ID6674
Feature activation-3.84e-02

Pos logit contributions
 pap+0.009
 not+0.009
 grown+0.008
 help+0.008
 welcome+0.008
Neg logit contributions
es-0.008
ing-0.008
ed-0.008
med-0.008
oped-0.008
Token we
Token ID356
Feature activation-6.24e-02

Pos logit contributions
 pap+0.066
 plan+0.065
 mistake+0.063
 majesty+0.063
 disl+0.062
Neg logit contributions
es-0.064
ing-0.063
ed-0.062
med-0.059
oped-0.057
Token will
Token ID481
Feature activation-7.34e-02

Pos logit contributions
 pap+0.102
 not+0.095
 mistake+0.092
 plan+0.091
 majesty+0.090
Neg logit contributions
es-0.118
ed-0.114
ing-0.112
oped-0.112
med-0.106
Token can
Token ID460
Feature activation-7.16e-02

Pos logit contributions
 pap+0.113
 not+0.102
 mistake+0.100
 majesty+0.097
 plan+0.096
Neg logit contributions
es-0.185
ed-0.182
ing-0.179
oped-0.175
med-0.171
Token use
Token ID779
Feature activation-5.52e-02

Pos logit contributions
 pap+0.071
 majesty+0.060
 ours+0.060
 mistake+0.060
 gard+0.057
Neg logit contributions
es-0.175
ing-0.171
oped-0.167
med-0.166
ed-0.165
Token the
Token ID262
Feature activation-9.70e-02

Pos logit contributions
 pap+0.073
 gard+0.072
 plan+0.070
 majesty+0.068
 not+0.067
Neg logit contributions
es-0.116
ing-0.112
med-0.112
ed-0.107
s-0.103
Token sand
Token ID6450
Feature activation+2.68e-02

Pos logit contributions
 not+0.111
 disl+0.109
 pap+0.106
 majesty+0.097
 dads+0.092
Neg logit contributions
ing-0.207
es-0.198
ed-0.194
med-0.186
ane-0.185
Token and
Token ID290
Feature activation-7.39e-02

Pos logit contributions
es+0.049
ing+0.048
ed+0.047
oped+0.046
med+0.046
Neg logit contributions
 pap-0.044
 not-0.043
 plan-0.042
 help-0.040
 mistake-0.040
Token the
Token ID262
Feature activation-1.16e-01

Pos logit contributions
 mistake+0.088
 pap+0.087
 disl+0.083
 fact+0.083
 tresp+0.080
Neg logit contributions
es-0.143
med-0.137
ed-0.135
ing-0.129
 again-0.128
Token toys
Token ID14958
Feature activation+1.74e-02

Pos logit contributions
 pap+0.115
 disl+0.112
 not+0.108
 supposed+0.093
 punish+0.090
Neg logit contributions
ing-0.247
ed-0.239
es-0.224
s-0.222
med-0.219
Token,"
Token ID553
Feature activation+1.01e-03

Pos logit contributions
es+0.030
ing+0.029
ed+0.029
oped+0.029
med+0.028
Neg logit contributions
 not-0.030
 pap-0.029
 plan-0.028
 help-0.028
 grown-0.028
Token she
Token ID673
Feature activation-4.56e-02

Pos logit contributions
es+0.002
ing+0.002
ed+0.002
oped+0.002
med+0.002
Neg logit contributions
 pap-0.001
 not-0.001
 plan-0.001
 help-0.001
 grown-0.001
Token says
Token ID1139
Feature activation-9.39e-03

Pos logit contributions
 ours+0.073
 pap+0.072
 fact+0.069
 not+0.069
 mistake+0.069
Neg logit contributions
es-0.085
ed-0.081
ing-0.078
med-0.075
oped-0.074
Token.
Token ID13
Feature activation+8.19e-03

Pos logit contributions
 not+0.017
 help+0.017
 pap+0.016
 permission+0.015
 grown+0.015
Neg logit contributions
oped-0.016
ays-0.016
ane-0.016
acked-0.016
es-0.016
Token can
Token ID460
Feature activation-7.69e-02

Pos logit contributions
 pap+0.099
 not+0.090
 plan+0.086
 mistake+0.085
 ours+0.083
Neg logit contributions
es-0.174
ed-0.168
ing-0.168
oped-0.163
med-0.161
Token do
Token ID466
Feature activation-5.75e-02

Pos logit contributions
 pap+0.084
 ours+0.073
 majesty+0.070
 dads+0.068
 tresp+0.067
Neg logit contributions
es-0.182
ing-0.177
oped-0.168
med-0.167
ed-0.165
Token great
Token ID1049
Feature activation-4.86e-02

Pos logit contributions
 pap+0.084
 gard+0.075
 plan+0.073
 dads+0.071
 majesty+0.070
Neg logit contributions
es-0.121
ing-0.115
med-0.110
ed-0.110
hed-0.103
Token things
Token ID1243
Feature activation-1.48e-02

Pos logit contributions
 pap+0.069
 not+0.062
 ours+0.059
 plan+0.058
 fact+0.057
Neg logit contributions
es-0.103
ing-0.102
ed-0.100
med-0.096
hed-0.093
Token that
Token ID326
Feature activation-6.18e-02

Pos logit contributions
 pap+0.025
 not+0.025
 plan+0.024
 help+0.024
 mistake+0.023
Neg logit contributions
es-0.026
ing-0.025
ed-0.025
oped-0.025
med-0.024
Token the
Token ID262
Feature activation-1.15e-01

Pos logit contributions
 pap+0.078
 majesty+0.069
 gard+0.066
 mistake+0.065
 permission+0.063
Neg logit contributions
es-0.137
ing-0.134
ed-0.129
hed-0.125
s-0.124
Token weak
Token ID4939
Feature activation-4.77e-02

Pos logit contributions
 pap+0.124
 not+0.120
 ours+0.116
 disl+0.113
 majesty+0.106
Neg logit contributions
ing-0.248
es-0.237
ed-0.234
ane-0.227
hed-0.221
Token cannot
Token ID2314
Feature activation-7.42e-02

Pos logit contributions
 pap+0.065
 not+0.063
 plan+0.060
 help+0.058
 ours+0.057
Neg logit contributions
es-0.100
ing-0.099
ed-0.097
oped-0.094
med-0.093
Token do
Token ID466
Feature activation-4.80e-02

Pos logit contributions
 pap+0.092
 majesty+0.078
 dads+0.072
 gard+0.070
 disl+0.068
Neg logit contributions
es-0.168
ing-0.166
ed-0.157
oped-0.156
s-0.155
Token alone
Token ID3436
Feature activation-1.40e-02

Pos logit contributions
 not+0.080
 pap+0.078
 plan+0.076
 help+0.075
 mistake+0.074
Neg logit contributions
es-0.087
ing-0.084
ed-0.084
oped-0.083
med-0.081
Token.
Token ID13
Feature activation-2.85e-03

Pos logit contributions
 not+0.026
 help+0.025
 plan+0.024
 pap+0.024
 mistake+0.024
Neg logit contributions
oped-0.023
es-0.023
ed-0.022
ing-0.022
med-0.022
Token "
Token ID366
Feature activation-1.04e-02

Pos logit contributions
 pap+0.061
 not+0.061
 grown+0.055
 be+0.055
 help+0.055
Neg logit contributions
ing-0.060
es-0.059
ed-0.057
med-0.055
oped-0.055
TokenMax
Token ID11518
Feature activation+4.02e-03

Pos logit contributions
 not+0.016
 pap+0.016
 disl+0.016
 grown+0.015
 be+0.015
Neg logit contributions
es-0.020
ing-0.020
ed-0.019
oped-0.018
ays-0.018
Token,
Token ID11
Feature activation-5.42e-02

Pos logit contributions
es+0.007
ing+0.007
ed+0.007
oped+0.006
med+0.006
Neg logit contributions
 pap-0.007
 not-0.007
 plan-0.007
 help-0.007
 mistake-0.007
Token you
Token ID345
Feature activation-6.60e-02

Pos logit contributions
 pap+0.068
 rightful+0.060
 plan+0.059
 majesty+0.059
 elderly+0.059
Neg logit contributions
es-0.116
ing-0.113
ed-0.110
med-0.106
oped-0.104
Token're
Token ID821
Feature activation-7.63e-02

Pos logit contributions
 pap+0.103
 rightful+0.089
 mistake+0.088
 disl+0.087
 elderly+0.086
Neg logit contributions
es-0.137
ed-0.129
oped-0.125
ing-0.125
hed-0.122
Token such
Token ID884
Feature activation-1.15e-01

Pos logit contributions
 disl+0.115
 majesty+0.113
 pap+0.108
 dads+0.103
 rightful+0.102
Neg logit contributions
es-0.136
med-0.133
ing-0.133
 on-0.129
 onwards-0.124
Token a
Token ID257
Feature activation-8.20e-02

Pos logit contributions
 pap+0.176
 porch+0.168
 majesty+0.160
 dads+0.160
 mistake+0.157
Neg logit contributions
ed-0.175
es-0.169
 on-0.161
 against-0.160
ing-0.158
Token creative
Token ID7325
Feature activation-4.63e-02

Pos logit contributions
 pap+0.085
 permission+0.076
 ours+0.073
 disl+0.069
 not+0.068
Neg logit contributions
ing-0.174
ed-0.162
es-0.160
oped-0.157
s-0.156
Token boy
Token ID2933
Feature activation+1.56e-02

Pos logit contributions
 pap+0.058
 not+0.056
 ours+0.055
 disl+0.052
 grown+0.048
Neg logit contributions
ing-0.097
es-0.095
ed-0.093
hed-0.086
ane-0.086
Token!"
Token ID2474
Feature activation+1.73e-02

Pos logit contributions
es+0.024
ing+0.024
ed+0.024
oped+0.023
med+0.023
Neg logit contributions
 pap-0.029
 not-0.029
 plan-0.028
 help-0.028
 mistake-0.027
Token\n
Token ID198
Feature activation+8.18e-03

Pos logit contributions
ing+0.034
es+0.034
ed+0.034
oped+0.034
med+0.033
Neg logit contributions
 not-0.026
 pap-0.026
 help-0.024
 plan-0.023
 grown-0.022
Token why
Token ID1521
Feature activation-6.32e-02

Pos logit contributions
 pap+0.124
 plan+0.117
 not+0.116
 mistake+0.115
 help+0.113
Neg logit contributions
es-0.145
ing-0.141
ed-0.139
oped-0.134
med-0.133
Token can
Token ID460
Feature activation-7.08e-02

Pos logit contributions
 pap+0.097
 permission+0.087
 gard+0.086
 plan+0.085
 majesty+0.084
Neg logit contributions
es-0.128
ing-0.120
ed-0.114
s-0.110
oped-0.109
Token't
Token ID470
Feature activation-7.96e-02

Pos logit contributions
 majesty+0.105
 gard+0.103
 pap+0.103
 dads+0.102
 mistake+0.101
Neg logit contributions
es-0.135
s-0.128
ing-0.124
med-0.122
oped-0.116
Token I
Token ID314
Feature activation-7.98e-02

Pos logit contributions
 pap+0.097
 majesty+0.094
 fact+0.086
 disl+0.084
 mistake+0.082
Neg logit contributions
ing-0.166
es-0.166
s-0.157
oped-0.154
ane-0.152
Token color
Token ID3124
Feature activation+1.38e-03

Pos logit contributions
 pap+0.088
 majesty+0.072
 ours+0.072
 mistake+0.071
 not+0.071
Neg logit contributions
es-0.188
oped-0.184
ed-0.183
ing-0.182
s-0.175
Token the
Token ID262
Feature activation-1.14e-01

Pos logit contributions
es+0.003
oped+0.003
ed+0.003
ing+0.003
med+0.003
Neg logit contributions
 not-0.002
 pap-0.002
 help-0.002
 plan-0.002
 ours-0.002
Token pictures
Token ID5986
Feature activation+2.12e-02

Pos logit contributions
 pap+0.124
 not+0.120
 disl+0.117
 majesty+0.105
 ours+0.104
Neg logit contributions
ing-0.246
ed-0.234
es-0.231
ane-0.228
hed-0.223
Token?
Token ID30
Feature activation-9.38e-03

Pos logit contributions
es+0.040
oped+0.040
ed+0.039
ing+0.039
med+0.037
Neg logit contributions
 not-0.034
 help-0.032
 plan-0.030
 be-0.030
 fun-0.030
Token They
Token ID1119
Feature activation-6.56e-02

Pos logit contributions
 pap+0.013
 not+0.013
 help+0.012
 be+0.012
 welcome+0.012
Neg logit contributions
ing-0.020
ed-0.019
es-0.018
hed-0.018
med-0.018
Token are
Token ID389
Feature activation-7.41e-02

Pos logit contributions
 pap+0.091
 mistake+0.082
 majesty+0.081
 disl+0.079
 elderly+0.079
Neg logit contributions
es-0.139
oped-0.130
ed-0.129
ing-0.126
s-0.125
Token boring
Token ID14262
Feature activation-4.33e-03

Pos logit contributions
 disl+0.088
 pap+0.084
 plan+0.083
 lookout+0.082
 majesty+0.081
Neg logit contributions
es-0.151
med-0.147
ing-0.138
ed-0.135
 on-0.133
Token old
Token ID1468
Feature activation-3.30e-02

Pos logit contributions
 pap+0.108
 plan+0.097
 help+0.096
 majesty+0.094
 not+0.094
Neg logit contributions
ing-0.206
es-0.204
ed-0.201
oped-0.196
med-0.191
Token or
Token ID393
Feature activation-8.78e-02

Pos logit contributions
 pap+0.066
 not+0.065
 plan+0.063
 mistake+0.062
 help+0.062
Neg logit contributions
es-0.049
ing-0.047
ed-0.047
oped-0.047
med-0.046
Token too
Token ID1165
Feature activation-7.44e-02

Pos logit contributions
 pap+0.099
 majesty+0.095
 plan+0.093
 gard+0.092
 mistake+0.091
Neg logit contributions
es-0.195
ing-0.192
ed-0.192
med-0.184
oped-0.180
Token silly
Token ID14397
Feature activation-9.26e-03

Pos logit contributions
 pap+0.071
 not+0.067
 plan+0.064
 help+0.061
 mistake+0.060
Neg logit contributions
es-0.188
ing-0.185
ed-0.183
oped-0.180
med-0.178
Token for
Token ID329
Feature activation-5.10e-02

Pos logit contributions
 not+0.016
 help+0.015
 pap+0.015
 grown+0.015
 plan+0.015
Neg logit contributions
oped-0.016
es-0.016
med-0.015
ing-0.015
ed-0.015
Token the
Token ID262
Feature activation-1.14e-01

Pos logit contributions
 pap+0.066
 majesty+0.059
 plan+0.057
 gard+0.056
 disl+0.055
Neg logit contributions
es-0.109
med-0.103
ing-0.102
ed-0.100
oped-0.097
Token park
Token ID3952
Feature activation+1.32e-02

Pos logit contributions
 pap+0.129
 not+0.122
 disl+0.113
 ours+0.111
 majesty+0.109
Neg logit contributions
ing-0.249
ed-0.238
es-0.237
ane-0.228
med-0.225
Token.
Token ID13
Feature activation+3.50e-04

Pos logit contributions
es+0.021
oped+0.021
ing+0.021
med+0.021
ed+0.020
Neg logit contributions
 not-0.025
 pap-0.023
 plan-0.023
 help-0.023
 grown-0.023
Token\n
Token ID198
Feature activation-9.70e-03

Pos logit contributions
ing+0.001
es+0.001
ed+0.001
med+0.001
oped+0.001
Neg logit contributions
 not-0.001
 pap-0.001
 grown-0.001
 fun-0.001
 help-0.001
Token\n
Token ID198
Feature activation-8.71e-03

Pos logit contributions
 not+0.017
 pap+0.017
 plan+0.016
 grown+0.016
 mistake+0.016
Neg logit contributions
es-0.017
ed-0.017
ing-0.016
med-0.016
oped-0.016
TokenShe
Token ID3347
Feature activation-5.21e-02

Pos logit contributions
 not+0.016
 pap+0.016
 plan+0.015
 grown+0.015
 help+0.015
Neg logit contributions
es-0.014
ed-0.014
ing-0.014
oped-0.013
med-0.013
Token sne
Token ID10505
Feature activation-1.65e-02

Pos logit contributions
oped+0.161
med+0.159
es+0.157
ed+0.156
ing+0.153
Neg logit contributions
 not-0.111
 pap-0.104
 help-0.103
 plan-0.101
 fun-0.100
Tokenez
Token ID8471
Feature activation+2.53e-02

Pos logit contributions
 not+0.031
 pap+0.031
 help+0.030
 plan+0.030
 mistake+0.029
Neg logit contributions
es-0.025
oped-0.024
ing-0.024
med-0.023
ed-0.023
Tokened
Token ID276
Feature activation-4.35e-02

Pos logit contributions
oped+0.014
 onward+0.010
med+0.010
ane+0.010
 onwards+0.010
Neg logit contributions
 not-0.070
 be-0.065
 help-0.064
 grown-0.064
 pap-0.064
Token loudly
Token ID23112
Feature activation-2.17e-02

Pos logit contributions
 not+0.080
 pap+0.079
 plan+0.076
 help+0.075
 mistake+0.074
Neg logit contributions
es-0.072
oped-0.068
ing-0.068
ed-0.067
med-0.066
Token and
Token ID290
Feature activation-7.32e-02

Pos logit contributions
 not+0.040
 pap+0.039
 help+0.038
 plan+0.038
 grown+0.038
Neg logit contributions
es-0.034
oped-0.034
ing-0.033
med-0.032
ed-0.031
Token the
Token ID262
Feature activation-1.14e-01

Pos logit contributions
 pap+0.100
 plan+0.093
 ours+0.093
 disl+0.091
 mistake+0.091
Neg logit contributions
es-0.155
ed-0.151
ing-0.145
med-0.142
oped-0.141
Token tower
Token ID10580
Feature activation+9.56e-03

Pos logit contributions
 ours+0.137
 disl+0.134
 pap+0.133
 not+0.132
 plan+0.122
Neg logit contributions
ing-0.242
ed-0.234
es-0.229
med-0.221
hed-0.216
Token fell
Token ID3214
Feature activation-1.83e-02

Pos logit contributions
es+0.019
ing+0.019
oped+0.018
med+0.018
ed+0.018
Neg logit contributions
 not-0.015
 plan-0.013
 help-0.013
 grown-0.013
 be-0.013
Token down
Token ID866
Feature activation-1.99e-02

Pos logit contributions
 not+0.036
 help+0.032
 sc+0.032
 pap+0.032
 fun+0.032
Neg logit contributions
es-0.029
oped-0.027
ane-0.026
med-0.026
ing-0.025
Token with
Token ID351
Feature activation-7.25e-02

Pos logit contributions
 not+0.038
 help+0.035
 mistake+0.035
 plan+0.034
 pap+0.034
Neg logit contributions
es-0.033
oped-0.032
ing-0.031
ed-0.031
med-0.031
Token a
Token ID257
Feature activation-5.30e-02

Pos logit contributions
 majesty+0.098
 pap+0.097
 rightful+0.096
 disl+0.093
 mistake+0.093
Neg logit contributions
es-0.149
med-0.145
ed-0.144
hed-0.135
ing-0.135
Token the
Token ID262
Feature activation-1.07e-01

Pos logit contributions
 pap+0.129
 plan+0.122
 majesty+0.120
 gard+0.117
 promise+0.117
Neg logit contributions
es-0.187
ing-0.180
med-0.171
ed-0.169
hed-0.159
Token y
Token ID331
Feature activation-1.92e-02

Pos logit contributions
 pap+0.125
 not+0.119
 ours+0.109
 disl+0.107
 majesty+0.107
Neg logit contributions
ing-0.237
es-0.227
ed-0.222
ane-0.217
med-0.214
Tokenummy
Token ID13513
Feature activation-4.62e-02

Pos logit contributions
 pap+0.034
 not+0.033
 plan+0.033
 help+0.032
 mistake+0.032
Neg logit contributions
es-0.033
ing-0.031
ed-0.031
oped-0.031
med-0.030
Token fruit
Token ID8234
Feature activation+2.36e-02

Pos logit contributions
 pap+0.045
 not+0.044
 plan+0.040
 mistake+0.038
 ours+0.038
Neg logit contributions
es-0.114
ing-0.113
ed-0.111
med-0.107
oped-0.106
Token and
Token ID290
Feature activation-7.47e-02

Pos logit contributions
es+0.039
oped+0.039
med+0.038
ing+0.038
ed+0.038
Neg logit contributions
 not-0.042
 pap-0.041
 plan-0.039
 help-0.039
 grown-0.039
Token the
Token ID262
Feature activation-1.13e-01

Pos logit contributions
 pap+0.096
 mistake+0.088
 plan+0.087
 tresp+0.087
 permission+0.085
Neg logit contributions
es-0.150
ed-0.148
ing-0.137
ted-0.135
med-0.134
Token pretty
Token ID2495
Feature activation-5.59e-02

Pos logit contributions
 not+0.111
 pap+0.111
 ours+0.107
 disl+0.106
 plan+0.100
Neg logit contributions
ing-0.250
ed-0.246
es-0.242
med-0.228
hed-0.226
Token sunset
Token ID26428
Feature activation+3.59e-02

Pos logit contributions
 pap+0.062
 not+0.058
 plan+0.053
 help+0.052
 ours+0.051
Neg logit contributions
es-0.131
ing-0.131
ed-0.130
oped-0.124
med-0.123
Token.
Token ID13
Feature activation-8.73e-04

Pos logit contributions
oped+0.061
es+0.059
med+0.055
ed+0.055
ing+0.054
Neg logit contributions
 not-0.060
 fun-0.058
 pap-0.057
 grown-0.056
 plan-0.056
Token<|endoftext|>
Token ID50256
Feature activation+1.90e-02

Pos logit contributions
 pap+0.002
 not+0.001
 grown+0.001
 plan+0.001
 be+0.001
Neg logit contributions
es-0.002
ing-0.002
ed-0.002
med-0.001
oped-0.001
TokenOnce
Token ID7454
Feature activation-2.45e-02

Pos logit contributions
ing+0.039
cases+0.039
oped+0.039
med+0.038
es+0.038
Neg logit contributions
 welcome-0.024
 not-0.022
 fun-0.022
 grown-0.021
 be-0.020
Token being
Token ID852
Feature activation-7.64e-02

Pos logit contributions
 pap+0.116
 plan+0.106
 promise+0.102
 gard+0.099
 ours+0.098
Neg logit contributions
es-0.187
med-0.169
ed-0.166
ing-0.165
 again-0.156
Token nice
Token ID3621
Feature activation-2.97e-02

Pos logit contributions
 pap+0.088
 plan+0.080
 majesty+0.075
 fact+0.073
 mistake+0.072
Neg logit contributions
es-0.181
ed-0.180
ing-0.178
med-0.173
hed-0.162
Token to
Token ID284
Feature activation-7.16e-02

Pos logit contributions
 pap+0.059
 not+0.058
 plan+0.056
 help+0.055
 mistake+0.055
Neg logit contributions
es-0.045
ing-0.043
ed-0.042
oped-0.041
med-0.041
Token me
Token ID502
Feature activation-2.30e-02

Pos logit contributions
 pap+0.092
 plan+0.077
 majesty+0.076
 fact+0.075
 gard+0.074
Neg logit contributions
es-0.155
ing-0.148
med-0.144
ed-0.141
hed-0.137
Token and
Token ID290
Feature activation-6.20e-02

Pos logit contributions
 not+0.044
 pap+0.044
 plan+0.043
 help+0.042
 mistake+0.041
Neg logit contributions
es-0.035
ed-0.034
ing-0.034
oped-0.034
med-0.033
Token the
Token ID262
Feature activation-1.13e-01

Pos logit contributions
 pap+0.082
 fact+0.075
 mistake+0.074
 permission+0.073
 plan+0.072
Neg logit contributions
es-0.125
ed-0.120
ing-0.116
s-0.113
med-0.110
Token spider
Token ID19230
Feature activation-6.51e-03

Pos logit contributions
 not+0.125
 pap+0.124
 disl+0.120
 ours+0.114
 majesty+0.105
Neg logit contributions
ing-0.239
ed-0.227
es-0.227
ane-0.218
hed-0.213
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
 pap+0.011
 not+0.011
 plan+0.011
 help+0.010
 mistake+0.010
Neg logit contributions
es-0.011
ing-0.011
ed-0.011
oped-0.011
med-0.010
Token Maybe
Token ID6674
Feature activation-3.91e-02

Pos logit contributions
 pap+0.021
 not+0.020
 help+0.018
 disl+0.018
 plan+0.018
Neg logit contributions
es-0.018
ing-0.018
ed-0.017
med-0.016
oped-0.016
Token one
Token ID530
Feature activation-4.36e-02

Pos logit contributions
 pap+0.058
 plan+0.056
 not+0.055
 mistake+0.054
 majesty+0.052
Neg logit contributions
es-0.077
ing-0.074
ed-0.073
med-0.071
oped-0.070
Token day
Token ID1110
Feature activation-7.31e-03

Pos logit contributions
 pap+0.051
 not+0.049
 plan+0.048
 help+0.046
 mistake+0.045
Neg logit contributions
es-0.103
ing-0.099
ed-0.099
oped-0.098
med-0.097
TokenThe
Token ID464
Feature activation-1.13e-01

Pos logit contributions
 pap+0.080
 not+0.079
 plan+0.076
 help+0.074
 grown+0.074
Neg logit contributions
es-0.065
ing-0.063
ed-0.063
oped-0.060
med-0.059
Token more
Token ID517
Feature activation-7.13e-02

Pos logit contributions
 disl+0.137
 not+0.135
 ours+0.129
 pap+0.126
 supposed+0.115
Neg logit contributions
ing-0.234
es-0.217
ed-0.216
ane-0.210
med-0.208
Token the
Token ID262
Feature activation-1.07e-01

Pos logit contributions
 pap+0.078
 not+0.074
 plan+0.068
 ours+0.068
 help+0.068
Neg logit contributions
es-0.172
ing-0.167
ed-0.166
oped-0.161
med-0.160
Token ants
Token ID27842
Feature activation-1.14e-02

Pos logit contributions
 disl+0.123
 ours+0.119
 pap+0.118
 not+0.117
 majesty+0.101
Neg logit contributions
ing-0.225
ed-0.212
es-0.212
ane-0.205
med-0.202
Token and
Token ID290
Feature activation-5.80e-02

Pos logit contributions
 pap+0.016
 not+0.016
 plan+0.015
 help+0.015
 mistake+0.015
Neg logit contributions
es-0.023
ing-0.023
ed-0.023
oped-0.022
med-0.022
Token the
Token ID262
Feature activation-1.12e-01

Pos logit contributions
 pap+0.073
 not+0.071
 mistake+0.071
 ours+0.071
 plan+0.068
Neg logit contributions
es-0.128
ed-0.121
ing-0.116
med-0.114
oped-0.109
Token mole
Token ID9411
Feature activation-1.77e-02

Pos logit contributions
 ours+0.124
 disl+0.121
 not+0.121
 pap+0.109
 supposed+0.101
Neg logit contributions
ing-0.245
ed-0.232
es-0.227
med-0.211
s-0.208
Token played
Token ID2826
Feature activation+1.01e-02

Pos logit contributions
 pap+0.027
 not+0.026
 plan+0.025
 help+0.025
 grown+0.024
Neg logit contributions
es-0.035
ing-0.034
ed-0.033
oped-0.033
med-0.033
Token together
Token ID1978
Feature activation-9.17e-03

Pos logit contributions
es+0.013
ing+0.013
oped+0.013
ed+0.013
med+0.012
Neg logit contributions
 not-0.022
 pap-0.021
 plan-0.021
 help-0.021
 fun-0.020
Token,
Token ID11
Feature activation-4.08e-02

Pos logit contributions
 not+0.016
 pap+0.016
 plan+0.016
 help+0.016
 grown+0.015
Neg logit contributions
es-0.015
oped-0.015
ed-0.015
ing-0.015
med-0.014
Token the
Token ID262
Feature activation-9.02e-02

Pos logit contributions
 pap+0.060
 plan+0.057
 not+0.055
 help+0.055
 mistake+0.055
Neg logit contributions
es-0.081
ing-0.079
ed-0.078
med-0.076
oped-0.076
Token,
Token ID11
Feature activation-6.40e-02

Pos logit contributions
 pap+0.192
 plan+0.173
 majesty+0.171
 help+0.170
 fact+0.170
Neg logit contributions
es-0.111
ed-0.104
ing-0.103
hed-0.091
oped-0.091
Token much
Token ID881
Feature activation-5.59e-02

Pos logit contributions
 pap+0.109
 plan+0.103
 not+0.103
 help+0.100
 mistake+0.100
Neg logit contributions
es-0.112
ing-0.108
ed-0.107
oped-0.104
med-0.103
Token too
Token ID1165
Feature activation-6.81e-02

Pos logit contributions
 pap+0.088
 not+0.081
 plan+0.080
 majesty+0.078
 fact+0.078
Neg logit contributions
es-0.106
ing-0.106
ed-0.102
med-0.097
oped-0.096
Token high
Token ID1029
Feature activation+3.74e-03

Pos logit contributions
 pap+0.118
 not+0.112
 plan+0.111
 help+0.108
 mistake+0.107
Neg logit contributions
es-0.120
ing-0.116
ed-0.114
oped-0.111
med-0.109
Token for
Token ID329
Feature activation-4.80e-02

Pos logit contributions
es+0.006
oped+0.006
med+0.006
ing+0.005
ane+0.005
Neg logit contributions
 not-0.007
 mistake-0.006
 pap-0.006
 help-0.006
 fact-0.006
Token the
Token ID262
Feature activation-1.12e-01

Pos logit contributions
 pap+0.080
 plan+0.074
 disl+0.073
 majesty+0.073
 mistake+0.070
Neg logit contributions
es-0.086
med-0.079
ing-0.078
ed-0.077
hed-0.074
Token farmer
Token ID18739
Feature activation+1.05e-02

Pos logit contributions
 pap+0.126
 disl+0.122
 not+0.113
 majesty+0.108
 ours+0.107
Neg logit contributions
ing-0.241
ed-0.226
es-0.223
ane-0.222
hed-0.216
Token to
Token ID284
Feature activation-7.68e-02

Pos logit contributions
es+0.017
ing+0.016
ed+0.016
oped+0.016
med+0.016
Neg logit contributions
 pap-0.020
 not-0.019
 plan-0.019
 help-0.018
 mistake-0.018
Token reach
Token ID3151
Feature activation-1.78e-02

Pos logit contributions
 pap+0.094
 majesty+0.079
 ours+0.074
 disl+0.073
 vitamins+0.071
Neg logit contributions
es-0.161
hed-0.153
he-0.152
oped-0.150
med-0.149
Token.
Token ID13
Feature activation-2.19e-03

Pos logit contributions
 pap+0.027
 not+0.026
 plan+0.025
 help+0.024
 mistake+0.024
Neg logit contributions
es-0.035
ing-0.034
ed-0.034
oped-0.032
med-0.032
Token They
Token ID1119
Feature activation-7.39e-02

Pos logit contributions
 pap+0.004
 not+0.004
 grown+0.004
 plan+0.004
 help+0.004
Neg logit contributions
es-0.004
ing-0.004
ed-0.004
med-0.003
oped-0.003
Tokenmy
Token ID1820
Feature activation-2.19e-02

Pos logit contributions
 pap+0.044
 not+0.044
 plan+0.042
 help+0.042
 grown+0.040
Neg logit contributions
es-0.050
ing-0.049
ed-0.048
oped-0.048
med-0.047
Token explained
Token ID4893
Feature activation-3.12e-02

Pos logit contributions
 pap+0.035
 not+0.035
 plan+0.033
 help+0.033
 grown+0.032
Neg logit contributions
es-0.040
ing-0.039
oped-0.039
ed-0.039
med-0.038
Token that
Token ID326
Feature activation-4.15e-02

Pos logit contributions
 pap+0.050
 plan+0.048
 not+0.047
 majesty+0.045
 mistake+0.045
Neg logit contributions
es-0.059
ing-0.058
ed-0.057
med-0.054
oped-0.053
Token it
Token ID340
Feature activation-2.91e-02

Pos logit contributions
 pap+0.065
 majesty+0.065
 plan+0.064
 ours+0.062
 fact+0.061
Neg logit contributions
es-0.077
ing-0.074
ed-0.070
med-0.067
s-0.064
Token meant
Token ID4001
Feature activation-4.55e-02

Pos logit contributions
 pap+0.052
 not+0.051
 plan+0.049
 help+0.048
 mistake+0.048
Neg logit contributions
es-0.050
ing-0.048
ed-0.048
oped-0.046
med-0.046
Token the
Token ID262
Feature activation-1.12e-01

Pos logit contributions
 pap+0.066
 plan+0.064
 majesty+0.062
 gard+0.060
 fact+0.059
Neg logit contributions
es-0.091
ing-0.089
ed-0.086
med-0.082
oped-0.078
Token food
Token ID2057
Feature activation+2.57e-02

Pos logit contributions
 pap+0.115
 not+0.111
 disl+0.107
 ours+0.104
 majesty+0.103
Neg logit contributions
ing-0.247
ed-0.238
es-0.235
ane-0.227
med-0.221
Token wasn
Token ID2492
Feature activation-6.32e-02

Pos logit contributions
es+0.046
ing+0.044
ed+0.044
oped+0.043
med+0.042
Neg logit contributions
 pap-0.044
 not-0.042
 plan-0.041
 help-0.040
 mistake-0.040
Token't
Token ID470
Feature activation-8.41e-02

Pos logit contributions
 pap+0.117
 majesty+0.111
 plan+0.109
 disl+0.107
 mistake+0.107
Neg logit contributions
es-0.104
ing-0.101
oped-0.099
med-0.099
ed-0.098
Token good
Token ID922
Feature activation-3.97e-02

Pos logit contributions
 pap+0.112
 majesty+0.109
 disl+0.101
 dads+0.097
 plan+0.093
Neg logit contributions
ing-0.175
es-0.173
ed-0.166
med-0.155
hed-0.152
Token anymore
Token ID7471
Feature activation-2.51e-02

Pos logit contributions
 pap+0.066
 not+0.064
 plan+0.063
 help+0.061
 mistake+0.060
Neg logit contributions
es-0.073
ing-0.070
ed-0.070
oped-0.069
med-0.068
Token we
Token ID356
Feature activation-1.19e-01

Pos logit contributions
es+0.040
ing+0.039
ed+0.039
oped+0.039
med+0.038
Neg logit contributions
 not-0.031
 pap-0.030
 plan-0.029
 help-0.028
 mistake-0.028
Token will
Token ID481
Feature activation-9.99e-02

Pos logit contributions
 pap+0.165
 majesty+0.141
 mistake+0.139
 disl+0.138
 fact+0.137
Neg logit contributions
es-0.250
ed-0.248
ing-0.240
oped-0.234
hed-0.231
Token be
Token ID307
Feature activation-7.75e-02

Pos logit contributions
 pap+0.113
 ours+0.095
 creepy+0.094
 majesty+0.093
 fact+0.092
Neg logit contributions
es-0.216
ing-0.203
 again-0.198
med-0.197
s-0.196
Token careful
Token ID8161
Feature activation-2.91e-02

Pos logit contributions
 pap+0.095
 fact+0.088
 plan+0.083
 gard+0.083
 seaw+0.082
Neg logit contributions
es-0.170
ing-0.163
med-0.162
ed-0.161
 again-0.147
Token on
Token ID319
Feature activation-5.46e-02

Pos logit contributions
 pap+0.057
 not+0.056
 plan+0.054
 help+0.053
 mistake+0.053
Neg logit contributions
es-0.045
ing-0.043
ed-0.042
oped-0.041
med-0.041
Token the
Token ID262
Feature activation-1.12e-01

Pos logit contributions
 pap+0.065
 plan+0.060
 gard+0.058
 majesty+0.056
 not+0.056
Neg logit contributions
es-0.127
ing-0.123
med-0.116
ed-0.114
s-0.110
Token road
Token ID2975
Feature activation+1.57e-02

Pos logit contributions
 pap+0.142
 disl+0.137
 not+0.136
 majesty+0.123
 grown+0.116
Neg logit contributions
ing-0.216
es-0.200
ane-0.195
ed-0.194
 again-0.190
Token."
Token ID526
Feature activation-2.23e-03

Pos logit contributions
es+0.026
oped+0.026
ed+0.025
med+0.025
ing+0.025
Neg logit contributions
 not-0.029
 plan-0.027
 pap-0.027
 help-0.027
 mistake-0.027
Token The
Token ID383
Feature activation-7.83e-02

Pos logit contributions
 pap+0.004
 not+0.004
 help+0.004
 plan+0.004
 be+0.003
Neg logit contributions
ing-0.004
es-0.004
ed-0.004
med-0.004
oped-0.004
Token taxi
Token ID17536
Feature activation+3.33e-02

Pos logit contributions
 not+0.104
 pap+0.100
 ours+0.099
 disl+0.097
 fact+0.089
Neg logit contributions
ing-0.158
es-0.151
ed-0.148
oped-0.144
med-0.142
Token driver
Token ID4639
Feature activation+1.84e-02

Pos logit contributions
es+0.070
ing+0.069
med+0.068
oped+0.067
ays+0.065
Neg logit contributions
 not-0.047
 plan-0.043
 pap-0.043
 be-0.042
 help-0.041
Token don
Token ID836
Feature activation-5.23e-02

Pos logit contributions
 pap+0.139
 not+0.121
 ours+0.120
 mistake+0.118
 majesty+0.118
Neg logit contributions
es-0.235
ed-0.234
ing-0.228
oped-0.228
hed-0.219
Token't
Token ID470
Feature activation-8.68e-02

Pos logit contributions
 pap+0.077
 not+0.077
 plan+0.073
 help+0.072
 mistake+0.070
Neg logit contributions
es-0.104
ed-0.101
ing-0.101
oped-0.099
med-0.097
Token think
Token ID892
Feature activation-4.68e-02

Pos logit contributions
 pap+0.102
 not+0.087
 plan+0.085
 ours+0.084
 majesty+0.084
Neg logit contributions
es-0.199
ing-0.199
ed-0.193
oped-0.192
med-0.188
Token it
Token ID340
Feature activation-5.32e-02

Pos logit contributions
 not+0.064
 pap+0.061
 help+0.058
 plan+0.057
 fun+0.056
Neg logit contributions
es-0.100
oped-0.100
ed-0.099
ing-0.096
med-0.095
Token's
Token ID338
Feature activation-9.42e-02

Pos logit contributions
 not+0.083
 pap+0.081
 plan+0.079
 help+0.076
 mistake+0.075
Neg logit contributions
es-0.102
oped-0.099
ing-0.099
ed-0.099
med-0.098
Token a
Token ID257
Feature activation-1.11e-01

Pos logit contributions
 gard+0.096
 pap+0.092
 majesty+0.087
 disl+0.085
 dads+0.081
Neg logit contributions
es-0.214
ing-0.211
ed-0.198
s-0.193
med-0.191
Token good
Token ID922
Feature activation-7.39e-02

Pos logit contributions
 pap+0.108
 not+0.101
 disl+0.098
 permission+0.096
 ours+0.093
Neg logit contributions
ing-0.249
es-0.233
s-0.232
oped-0.230
ane-0.229
Token idea
Token ID2126
Feature activation-1.60e-02

Pos logit contributions
 pap+0.083
 not+0.077
 ours+0.072
 disl+0.070
 fact+0.068
Neg logit contributions
ing-0.167
ed-0.162
es-0.158
ane-0.149
oped-0.149
Token to
Token ID284
Feature activation-8.31e-02

Pos logit contributions
 not+0.029
 pap+0.028
 help+0.027
 plan+0.027
 mistake+0.026
Neg logit contributions
es-0.027
oped-0.026
ing-0.026
ed-0.026
med-0.025
Token climb
Token ID12080
Feature activation+8.31e-03

Pos logit contributions
 pap+0.096
 majesty+0.085
 ours+0.083
 fact+0.082
 freshly+0.082
Neg logit contributions
es-0.172
ing-0.163
le-0.158
med-0.157
oped-0.157
Token the
Token ID262
Feature activation-9.50e-02

Pos logit contributions
es+0.018
oped+0.017
ed+0.017
ing+0.017
med+0.016
Neg logit contributions
 not-0.011
 pap-0.011
 help-0.010
 ours-0.010
 fun-0.010
Token of
Token ID286
Feature activation-8.65e-02

Pos logit contributions
 pap+0.149
 not+0.141
 plan+0.141
 mistake+0.137
 majesty+0.137
Neg logit contributions
es-0.062
ing-0.061
ed-0.058
med-0.054
oped-0.053
Token luck
Token ID8458
Feature activation+2.49e-02

Pos logit contributions
 pap+0.113
 majesty+0.091
 not+0.089
 ours+0.089
 gard+0.088
Neg logit contributions
es-0.199
ing-0.187
med-0.185
ed-0.184
hed-0.176
Token and
Token ID290
Feature activation-7.21e-02

Pos logit contributions
es+0.037
ing+0.035
ed+0.035
oped+0.034
med+0.034
Neg logit contributions
 pap-0.050
 not-0.049
 plan-0.047
 help-0.046
 mistake-0.046
Token good
Token ID922
Feature activation-4.03e-02

Pos logit contributions
 pap+0.104
 ours+0.096
 mistake+0.093
 fact+0.090
 gard+0.089
Neg logit contributions
es-0.146
ed-0.144
ing-0.136
med-0.131
hed-0.129
Token things
Token ID1243
Feature activation-1.12e-02

Pos logit contributions
 pap+0.050
 not+0.048
 plan+0.045
 mistake+0.044
 ours+0.043
Neg logit contributions
es-0.091
ing-0.090
ed-0.089
oped-0.085
med-0.085
Token would
Token ID561
Feature activation-1.11e-01

Pos logit contributions
 pap+0.021
 not+0.020
 plan+0.020
 mistake+0.019
 help+0.019
Neg logit contributions
es-0.018
ing-0.018
ed-0.018
oped-0.017
med-0.017
Token happen
Token ID1645
Feature activation-1.16e-02

Pos logit contributions
 pap+0.149
 majesty+0.121
 gard+0.111
ses+0.110
 elderly+0.108
Neg logit contributions
es-0.212
 again-0.205
he-0.202
s-0.201
hed-0.201
Token to
Token ID284
Feature activation-5.98e-02

Pos logit contributions
 not+0.022
 pap+0.022
 plan+0.021
 help+0.021
 mistake+0.020
Neg logit contributions
es-0.018
oped-0.018
ed-0.018
ing-0.018
med-0.017
Token her
Token ID607
Feature activation-6.94e-02

Pos logit contributions
 pap+0.076
 plan+0.071
 majesty+0.068
 gard+0.068
 mistake+0.066
Neg logit contributions
es-0.125
ing-0.120
med-0.113
ed-0.112
hed-0.105
Token.
Token ID13
Feature activation+3.20e-03

Pos logit contributions
 pap+0.089
 not+0.085
 plan+0.081
 help+0.079
 ours+0.078
Neg logit contributions
es-0.151
ing-0.146
ed-0.144
oped-0.140
med-0.139
Token\n
Token ID198
Feature activation-1.10e-02

Pos logit contributions
ing+0.005
es+0.005
ed+0.005
med+0.005
oped+0.005
Neg logit contributions
 pap-0.006
 not-0.006
 plan-0.005
 help-0.005
 grown-0.005
Token a
Token ID257
Feature activation-1.04e-01

Pos logit contributions
 pap+0.095
 plan+0.089
 mistake+0.087
 not+0.086
 majesty+0.086
Neg logit contributions
es-0.123
ing-0.118
ed-0.117
med-0.111
oped-0.110
Token shiny
Token ID22441
Feature activation-6.76e-02

Pos logit contributions
 disl+0.095
 permission+0.093
 pap+0.091
 ours+0.091
 not+0.088
Neg logit contributions
ing-0.229
es-0.220
ed-0.215
oped-0.214
s-0.213
Token new
Token ID649
Feature activation-7.72e-02

Pos logit contributions
 pap+0.093
 not+0.087
 disl+0.082
 dads+0.079
 permission+0.079
Neg logit contributions
ing-0.137
ed-0.132
es-0.132
ane-0.126
oped-0.126
Token bell
Token ID8966
Feature activation+2.44e-02

Pos logit contributions
 pap+0.097
 not+0.091
 disl+0.089
 ours+0.084
 permission+0.083
Neg logit contributions
ing-0.160
es-0.157
ed-0.154
ane-0.149
oped-0.147
Token that
Token ID326
Feature activation-4.27e-02

Pos logit contributions
es+0.043
oped+0.041
ed+0.041
ing+0.041
med+0.040
Neg logit contributions
 not-0.041
 pap-0.039
 help-0.038
 plan-0.038
 be-0.038
Token the
Token ID262
Feature activation-1.11e-01

Pos logit contributions
 pap+0.066
 not+0.062
 plan+0.062
 mistake+0.060
 help+0.059
Neg logit contributions
es-0.083
ing-0.081
ed-0.080
oped-0.077
med-0.077
Token goose
Token ID37246
Feature activation+1.01e-02

Pos logit contributions
 pap+0.131
 not+0.125
 disl+0.123
 ours+0.117
 plan+0.116
Neg logit contributions
ing-0.230
ed-0.221
ane-0.219
es-0.216
med-0.213
Token could
Token ID714
Feature activation-8.06e-02

Pos logit contributions
es+0.020
ing+0.019
ed+0.019
oped+0.019
med+0.019
Neg logit contributions
 not-0.015
 pap-0.014
 plan-0.014
 help-0.014
 grown-0.014
Token wear
Token ID5806
Feature activation-1.60e-02

Pos logit contributions
 pap+0.090
 majesty+0.075
 mistake+0.073
 ours+0.073
 dads+0.072
Neg logit contributions
es-0.189
oped-0.181
ed-0.180
med-0.178
ing-0.177
Token around
Token ID1088
Feature activation-1.87e-02

Pos logit contributions
 pap+0.025
 not+0.025
 plan+0.023
 help+0.023
 grown+0.023
Neg logit contributions
es-0.031
ing-0.030
ed-0.029
oped-0.029
med-0.028
Token its
Token ID663
Feature activation-6.87e-02

Pos logit contributions
 pap+0.028
 not+0.027
 plan+0.026
 help+0.025
 mistake+0.025
Neg logit contributions
es-0.038
ing-0.037
ed-0.036
oped-0.036
med-0.035
Token distance
Token ID5253
Feature activation+4.57e-02

Pos logit contributions
 pap+0.047
 disl+0.047
 majesty+0.045
 plan+0.044
 permission+0.044
Neg logit contributions
ing-0.044
ed-0.043
hed-0.041
es-0.040
s-0.039
Token.
Token ID13
Feature activation+7.04e-02

Pos logit contributions
oped+0.080
es+0.080
med+0.076
ed+0.075
ing+0.072
Neg logit contributions
 not-0.084
 pap-0.077
 be-0.076
 help-0.076
 mistake-0.075
Token It
Token ID632
Feature activation+6.84e-03

Pos logit contributions
es+0.105
ing+0.104
oped+0.103
med+0.101
ed+0.098
Neg logit contributions
 not-0.141
 pap-0.137
 plan-0.134
 help-0.133
 grown-0.133
Token was
Token ID373
Feature activation-3.67e-02

Pos logit contributions
es+0.011
ing+0.011
ed+0.011
oped+0.010
med+0.010
Neg logit contributions
 pap-0.013
 not-0.012
 plan-0.012
 help-0.012
 mistake-0.012
Token very
Token ID845
Feature activation-5.34e-02

Pos logit contributions
 majesty+0.059
 disl+0.056
 permission+0.050
 dads+0.049
 promise+0.048
Neg logit contributions
ing-0.070
ed-0.067
es-0.066
med-0.064
 against-0.063
Token high
Token ID1029
Feature activation+5.08e-02

Pos logit contributions
 permission+0.076
 pap+0.072
 majesty+0.071
 help+0.068
 fact+0.065
Neg logit contributions
ing-0.109
s-0.105
ed-0.101
oped-0.096
 onwards-0.095
Token and
Token ID290
Feature activation-1.45e-02

Pos logit contributions
oped+0.089
es+0.088
med+0.083
ed+0.079
ane+0.077
Neg logit contributions
 not-0.092
 pap-0.085
 fun-0.083
 dangerous-0.083
 fact-0.082
Token looked
Token ID3114
Feature activation-2.56e-02

Pos logit contributions
 pap+0.024
 permission+0.022
 majesty+0.022
 plan+0.022
 mistake+0.022
Neg logit contributions
es-0.027
ed-0.026
ing-0.026
med-0.025
oped-0.024
Token very
Token ID845
Feature activation-4.68e-02

Pos logit contributions
 pap+0.037
 majesty+0.034
 plan+0.032
 dads+0.031
 mistake+0.031
Neg logit contributions
ing-0.053
ed-0.052
es-0.051
med-0.049
s-0.048
Token dangerous
Token ID4923
Feature activation+4.62e-02

Pos logit contributions
 pap+0.056
 permission+0.053
 majesty+0.050
 mistake+0.048
 help+0.048
Neg logit contributions
ing-0.107
ed-0.102
s-0.102
es-0.097
oped-0.096
Token.
Token ID13
Feature activation+6.74e-02

Pos logit contributions
oped+0.087
med+0.083
es+0.082
ane+0.078
acked+0.077
Neg logit contributions
 not-0.080
 fun-0.073
 help-0.072
 dangerous-0.070
 ours-0.070
Token and
Token ID290
Feature activation-2.93e-02

Pos logit contributions
ed+0.099
es+0.098
oped+0.097
ing+0.095
med+0.093
Neg logit contributions
 not-0.102
 pap-0.095
 help-0.094
 plan-0.094
 grown-0.093
Token the
Token ID262
Feature activation-4.81e-02

Pos logit contributions
vern+0.043
 tresp+0.043
 mistake+0.043
 ours+0.042
 punish+0.042
Neg logit contributions
es-0.054
ed-0.052
med-0.050
ted-0.050
s-0.049
Token birds
Token ID10087
Feature activation+5.06e-02

Pos logit contributions
 not+0.064
 pap+0.063
 disl+0.061
 ours+0.059
 mistake+0.058
Neg logit contributions
ing-0.093
ed-0.091
es-0.089
med-0.084
s-0.083
Token and
Token ID290
Feature activation-2.84e-02

Pos logit contributions
ed+0.094
es+0.092
oped+0.090
ing+0.088
med+0.085
Neg logit contributions
 not-0.082
 help-0.077
 be-0.075
 plan-0.075
 grown-0.075
Token the
Token ID262
Feature activation-4.96e-02

Pos logit contributions
 mistake+0.039
 pap+0.038
 not+0.038
 plan+0.038
 tresp+0.037
Neg logit contributions
es-0.056
ed-0.055
med-0.052
ted-0.052
ing-0.052
Token sun
Token ID4252
Feature activation+5.94e-02

Pos logit contributions
 not+0.065
 pap+0.064
 ours+0.060
 disl+0.060
 plan+0.059
Neg logit contributions
ing-0.100
ed-0.098
es-0.098
med-0.092
s-0.090
Token.
Token ID13
Feature activation+5.44e-02

Pos logit contributions
oped+0.112
es+0.106
ed+0.102
ane+0.100
hed+0.099
Neg logit contributions
 not-0.090
 be-0.090
 smell-0.086
 water-0.085
 work-0.084
Token She
Token ID1375
Feature activation-2.28e-02

Pos logit contributions
oped+0.105
med+0.103
ing+0.102
es+0.100
ane+0.098
Neg logit contributions
 not-0.090
 pap-0.089
 grown-0.086
 old-0.082
 dad-0.081
Token feels
Token ID5300
Feature activation-3.12e-02

Pos logit contributions
 pap+0.037
 ours+0.037
 permission+0.036
 disl+0.036
 help+0.035
Neg logit contributions
es-0.042
ed-0.039
ing-0.039
s-0.038
med-0.037
Token warm
Token ID5814
Feature activation+4.21e-02

Pos logit contributions
 mistake+0.036
 pap+0.035
 majesty+0.034
 plan+0.034
 tresp+0.033
Neg logit contributions
es-0.069
ed-0.068
ing-0.067
med-0.067
s-0.063
Token and
Token ID290
Feature activation-4.30e-02

Pos logit contributions
oped+0.074
es+0.070
ane+0.067
med+0.064
ing+0.064
Neg logit contributions
 not-0.074
 pap-0.071
 sweet-0.069
 fun-0.069
 grown-0.068
Token he
Token ID339
Feature activation-1.87e-02

Pos logit contributions
es+0.088
oped+0.087
ed+0.086
ing+0.085
med+0.083
Neg logit contributions
 not-0.087
 pap-0.086
 help-0.083
 plan-0.082
 grown-0.080
Token had
Token ID550
Feature activation-4.11e-02

Pos logit contributions
 pap+0.022
 not+0.022
 plan+0.021
 help+0.021
 grown+0.020
Neg logit contributions
es-0.043
ing-0.042
ed-0.041
oped-0.040
med-0.040
Token never
Token ID1239
Feature activation-5.28e-02

Pos logit contributions
 pap+0.060
 plan+0.055
 not+0.054
 mistake+0.053
 disl+0.052
Neg logit contributions
es-0.086
ed-0.083
ing-0.081
med-0.079
oped-0.079
Token met
Token ID1138
Feature activation+2.99e-02

Pos logit contributions
 pap+0.062
 not+0.054
 majesty+0.053
 plan+0.053
 disl+0.051
Neg logit contributions
es-0.125
ed-0.123
ing-0.120
med-0.118
oped-0.117
Token before
Token ID878
Feature activation-1.55e-02

Pos logit contributions
oped+0.056
es+0.051
ays+0.051
ed+0.051
acked+0.050
Neg logit contributions
 not-0.050
 grown-0.047
 fun-0.046
 help-0.046
 pap-0.046
Token.
Token ID13
Feature activation+6.43e-02

Pos logit contributions
 not+0.029
 grown+0.026
 pap+0.026
 fun+0.026
 help+0.026
Neg logit contributions
ed-0.026
oped-0.026
es-0.026
ing-0.026
med-0.025
Token
Token ID198
Feature activation+5.43e-02

Pos logit contributions
es+0.115
ing+0.109
med+0.108
oped+0.107
ed+0.103
Neg logit contributions
 not-0.114
 pap-0.113
 help-0.110
 plan-0.108
 grown-0.108
Token
Token ID198
Feature activation+4.74e-02

Pos logit contributions
ays+0.115
ed+0.114
med+0.113
es+0.112
oped+0.109
Neg logit contributions
 not-0.077
 help-0.068
 plan-0.068
 fun-0.067
 grown-0.065
TokenHe
Token ID1544
Feature activation-1.48e-02

Pos logit contributions
ays+0.091
ed+0.087
es+0.085
oped+0.082
ing+0.082
Neg logit contributions
 not-0.084
 grown-0.080
 plan-0.079
 help-0.077
 gentle-0.077
Token wrote
Token ID2630
Feature activation+4.36e-02

Pos logit contributions
 pap+0.021
 not+0.020
 plan+0.020
 grown+0.019
 help+0.019
Neg logit contributions
es-0.031
ing-0.030
ed-0.030
oped-0.029
med-0.029
Token a
Token ID257
Feature activation-3.40e-02

Pos logit contributions
oped+0.097
es+0.092
ays+0.092
ane+0.090
hed+0.089
Neg logit contributions
 not-0.059
 pap-0.054
 grown-0.052
 help-0.050
 plan-0.050
Token,
Token ID11
Feature activation-2.03e-02

Pos logit contributions
med+0.147
oped+0.146
es+0.142
ays+0.138
ing+0.137
Neg logit contributions
 not-0.128
 help-0.112
 plan-0.106
 fun-0.106
 be-0.103
Token but
Token ID475
Feature activation+9.24e-03

Pos logit contributions
 majesty+0.051
 pap+0.051
 rightful+0.046
 plan+0.046
 disl+0.046
Neg logit contributions
es-0.017
ing-0.015
ed-0.015
ted-0.014
s-0.014
Token Max
Token ID5436
Feature activation+3.40e-02

Pos logit contributions
es+0.020
ing+0.019
ed+0.019
oped+0.019
med+0.018
Neg logit contributions
 pap-0.013
 not-0.013
 plan-0.012
 help-0.012
 grown-0.011
Token was
Token ID373
Feature activation-4.99e-02

Pos logit contributions
ing+0.071
es+0.070
med+0.069
oped+0.068
ed+0.067
Neg logit contributions
 not-0.049
 pap-0.046
 plan-0.045
 help-0.043
 grown-0.043
Token smart
Token ID4451
Feature activation+2.78e-02

Pos logit contributions
 pap+0.085
 disl+0.084
 majesty+0.082
 plan+0.076
 dads+0.074
Neg logit contributions
es-0.092
ed-0.085
med-0.084
ing-0.084
oped-0.081
Token.
Token ID13
Feature activation+6.05e-02

Pos logit contributions
oped+0.046
es+0.045
med+0.044
ed+0.044
ing+0.043
Neg logit contributions
 not-0.052
 plan-0.049
 mistake-0.049
 help-0.049
 fun-0.048
Token Max
Token ID5436
Feature activation+2.77e-02

Pos logit contributions
ing+0.109
oped+0.109
es+0.108
ane+0.104
ed+0.103
Neg logit contributions
 not-0.111
 help-0.103
 fun-0.101
 grown-0.100
 plan-0.098
Token knew
Token ID2993
Feature activation-2.40e-03

Pos logit contributions
es+0.057
ing+0.056
ed+0.054
oped+0.054
med+0.054
Neg logit contributions
 pap-0.040
 not-0.040
 plan-0.038
 help-0.036
 grown-0.036
Token he
Token ID339
Feature activation-4.47e-03

Pos logit contributions
 pap+0.004
 not+0.004
 plan+0.004
 help+0.004
 mistake+0.004
Neg logit contributions
es-0.004
ing-0.004
ed-0.004
oped-0.004
med-0.004
Token needed
Token ID2622
Feature activation-6.39e-03

Pos logit contributions
 not+0.006
 pap+0.006
 plan+0.006
 help+0.006
 grown+0.006
Neg logit contributions
es-0.009
ing-0.009
ed-0.009
med-0.009
oped-0.009
Token to
Token ID284
Feature activation-3.62e-02

Pos logit contributions
 pap+0.009
 majesty+0.009
 disl+0.009
 plan+0.008
 fact+0.008
Neg logit contributions
es-0.013
ed-0.012
ing-0.012
med-0.012
ted-0.012
Token a
Token ID257
Feature activation-2.30e-02

Pos logit contributions
es+0.046
ing+0.045
oped+0.044
ed+0.044
med+0.043
Neg logit contributions
 pap-0.014
 not-0.014
 help-0.013
 grown-0.013
 plan-0.013
Token nap
Token ID25422
Feature activation+6.98e-02

Pos logit contributions
 pap+0.024
 not+0.023
 plan+0.021
 mistake+0.021
 majesty+0.021
Neg logit contributions
ing-0.055
es-0.054
oped-0.054
ed-0.053
med-0.052
Token.
Token ID13
Feature activation+5.36e-02

Pos logit contributions
oped+0.107
es+0.105
ed+0.102
ing+0.101
med+0.101
Neg logit contributions
 not-0.136
 pap-0.132
 plan-0.129
 help-0.128
 grown-0.126
Token Molly
Token ID30236
Feature activation+2.33e-02

Pos logit contributions
es+0.078
ing+0.075
oped+0.072
med+0.072
ed+0.070
Neg logit contributions
 pap-0.111
 not-0.109
 grown-0.106
 plan-0.104
 help-0.103
Token fell
Token ID3214
Feature activation+2.55e-02

Pos logit contributions
es+0.048
ing+0.046
ed+0.046
oped+0.045
med+0.045
Neg logit contributions
 pap-0.033
 not-0.033
 plan-0.031
 help-0.031
 mistake-0.030
Token asleep
Token ID16039
Feature activation+6.07e-02

Pos logit contributions
es+0.049
oped+0.045
med+0.044
ays+0.042
hed+0.042
Neg logit contributions
 not-0.042
 pap-0.039
 grown-0.038
 sc-0.038
 help-0.037
Token on
Token ID319
Feature activation+7.62e-03

Pos logit contributions
es+0.116
ays+0.115
ing+0.114
oped+0.113
ane+0.111
Neg logit contributions
 not-0.098
 pap-0.093
 help-0.091
 grown-0.090
 plan-0.086
Token the
Token ID262
Feature activation-3.17e-02

Pos logit contributions
es+0.019
ing+0.018
ed+0.018
oped+0.018
med+0.018
Neg logit contributions
 pap-0.008
 not-0.008
 plan-0.007
 help-0.007
 mistake-0.007
Token couch
Token ID18507
Feature activation+6.76e-02

Pos logit contributions
 pap+0.036
 disl+0.032
 not+0.032
 plan+0.031
 majesty+0.029
Neg logit contributions
ing-0.074
ed-0.073
es-0.071
med-0.070
hed-0.070
Token and
Token ID290
Feature activation-1.53e-02

Pos logit contributions
es+0.112
oped+0.107
ed+0.106
ing+0.106
med+0.104
Neg logit contributions
 not-0.124
 pap-0.119
 grown-0.116
 plan-0.115
 help-0.114
Token enjoyed
Token ID8359
Feature activation-1.19e-02

Pos logit contributions
 pap+0.023
 not+0.022
 plan+0.022
 help+0.021
 mistake+0.021
Neg logit contributions
es-0.030
ed-0.029
ing-0.029
oped-0.029
med-0.028
Token she
Token ID673
Feature activation-1.88e-02

Pos logit contributions
 pap+0.006
 not+0.006
 plan+0.006
 mistake+0.006
 help+0.006
Neg logit contributions
es-0.005
ing-0.005
ed-0.005
oped-0.005
med-0.005
Token started
Token ID2067
Feature activation-1.83e-02

Pos logit contributions
 pap+0.028
 ours+0.027
 plan+0.026
 disl+0.026
 not+0.026
Neg logit contributions
es-0.038
ed-0.037
ing-0.036
oped-0.036
med-0.035
Token cl
Token ID537
Feature activation+3.70e-02

Pos logit contributions
 pap+0.030
 ours+0.029
 majesty+0.028
 plan+0.027
 not+0.027
Neg logit contributions
ing-0.034
es-0.034
ed-0.033
med-0.033
oped-0.031
Tokenapping
Token ID5912
Feature activation+2.59e-02

Pos logit contributions
es+0.048
oped+0.044
ed+0.043
ing+0.042
ted+0.041
Neg logit contributions
 pap-0.080
 not-0.077
 plan-0.075
 grown-0.074
 help-0.074
Token and
Token ID290
Feature activation-3.11e-02

Pos logit contributions
oped+0.049
es+0.049
ing+0.047
ed+0.047
ane+0.046
Neg logit contributions
 not-0.042
 pap-0.040
 plan-0.039
 grown-0.038
 help-0.038
Token smiling
Token ID16755
Feature activation+2.76e-02

Pos logit contributions
 pap+0.040
 ours+0.038
 not+0.037
 plan+0.037
 majesty+0.037
Neg logit contributions
es-0.068
ing-0.066
med-0.066
ed-0.065
oped-0.063
Token so
Token ID523
Feature activation-1.01e-02

Pos logit contributions
oped+0.060
ane+0.054
ays+0.054
es+0.052
acked+0.050
Neg logit contributions
 not-0.042
 sweet-0.041
 fun-0.040
 grown-0.040
 special-0.038
Token he
Token ID339
Feature activation-1.55e-02

Pos logit contributions
 not+0.010
 grown+0.009
 pap+0.009
 fun+0.009
 help+0.009
Neg logit contributions
es-0.026
oped-0.026
ed-0.025
ing-0.025
ays-0.025
Token could
Token ID714
Feature activation-4.26e-02

Pos logit contributions
 pap+0.023
 not+0.022
 plan+0.022
 help+0.021
 mistake+0.021
Neg logit contributions
es-0.031
ing-0.030
ed-0.030
oped-0.029
med-0.029
Token see
Token ID766
Feature activation+3.32e-04

Pos logit contributions
 pap+0.055
 ours+0.048
 mistake+0.047
 tresp+0.047
 permission+0.045
Neg logit contributions
es-0.091
ed-0.089
med-0.089
ing-0.086
oped-0.086
Token her
Token ID607
Feature activation-1.72e-02

Pos logit contributions
es+0.001
oped+0.001
ed+0.001
ing+0.001
med+0.001
Neg logit contributions
 pap-0.000
 not-0.000
 plan-0.000
 help-0.000
 grown-0.000
Token disappointed
Token ID11679
Feature activation+1.67e-02

Pos logit contributions
 pap+0.087
 majesty+0.078
 permission+0.078
 ours+0.076
 mistake+0.075
Neg logit contributions
ed-0.125
es-0.122
ing-0.119
oped-0.116
hed-0.113
Token.
Token ID13
Feature activation+4.01e-02

Pos logit contributions
es+0.029
oped+0.029
med+0.028
ed+0.027
hed+0.026
Neg logit contributions
 not-0.031
 pap-0.029
 mistake-0.028
 plan-0.028
 help-0.028
Token
Token ID198
Feature activation+3.08e-02

Pos logit contributions
es+0.064
ing+0.060
med+0.060
ed+0.058
oped+0.058
Neg logit contributions
 pap-0.078
 not-0.078
 help-0.074
 plan-0.073
 grown-0.073
Token
Token ID198
Feature activation+2.54e-02

Pos logit contributions
es+0.055
ed+0.054
ing+0.053
med+0.052
oped+0.051
Neg logit contributions
 not-0.055
 help-0.051
 pap-0.049
 grown-0.049
 plan-0.049
TokenB
Token ID33
Feature activation+3.90e-02

Pos logit contributions
es+0.036
ed+0.035
ing+0.034
oped+0.033
med+0.032
Neg logit contributions
 not-0.053
 pap-0.052
 plan-0.051
 help-0.051
 grown-0.050
Tokenobby
Token ID11369
Feature activation+9.18e-03

Pos logit contributions
es+0.069
med+0.064
ed+0.061
oped+0.060
ing+0.059
Neg logit contributions
 pap-0.066
 not-0.064
 plan-0.064
 mistake-0.061
 help-0.061
Token thought
Token ID1807
Feature activation-6.23e-03

Pos logit contributions
es+0.018
ing+0.018
ed+0.018
oped+0.018
med+0.017
Neg logit contributions
 pap-0.013
 not-0.013
 plan-0.013
 help-0.012
 grown-0.012
Token and
Token ID290
Feature activation-2.80e-02

Pos logit contributions
 pap+0.010
 not+0.010
 plan+0.010
 help+0.009
 mistake+0.009
Neg logit contributions
es-0.011
ing-0.011
ed-0.011
oped-0.011
med-0.011
Token thought
Token ID1807
Feature activation+8.53e-03

Pos logit contributions
 ours+0.038
 pap+0.038
 plan+0.035
 majesty+0.035
 help+0.034
Neg logit contributions
es-0.058
ed-0.057
ing-0.056
med-0.053
oped-0.053
Token about
Token ID546
Feature activation-6.51e-03

Pos logit contributions
es+0.016
oped+0.016
ed+0.016
hed+0.016
ing+0.016
Neg logit contributions
 not-0.014
 help-0.013
 plan-0.013
 pap-0.013
 grown-0.012
Token what
Token ID644
Feature activation-2.66e-03

Pos logit contributions
 pap+0.010
 majesty+0.009
 plan+0.009
 ours+0.009
 gard+0.009
Neg logit contributions
es-0.013
ing-0.012
med-0.012
ed-0.011
oped-0.011
Token felt
Token ID2936
Feature activation-5.98e-02

Pos logit contributions
 mistake+0.042
 disl+0.041
 fact+0.041
 ours+0.041
 plan+0.041
Neg logit contributions
es-0.067
ed-0.066
ing-0.063
oped-0.061
med-0.060
Token its
Token ID663
Feature activation-5.76e-02

Pos logit contributions
 majesty+0.083
 pap+0.083
 mistake+0.076
 dads+0.075
 plan+0.073
Neg logit contributions
ed-0.122
es-0.120
med-0.117
ing-0.116
hed-0.115
Token weight
Token ID3463
Feature activation+4.44e-02

Pos logit contributions
 disl+0.061
 dads+0.061
 pap+0.059
 ours+0.056
 majesty+0.052
Neg logit contributions
ed-0.131
ing-0.130
es-0.121
med-0.119
oped-0.119
Token.
Token ID13
Feature activation+3.65e-02

Pos logit contributions
es+0.070
oped+0.066
ed+0.065
med+0.064
ing+0.064
Neg logit contributions
 not-0.084
 pap-0.081
 help-0.081
 plan-0.079
 mistake-0.077
Token Then
Token ID3244
Feature activation-1.66e-03

Pos logit contributions
es+0.063
ing+0.059
med+0.058
oped+0.057
ed+0.057
Neg logit contributions
 not-0.068
 pap-0.067
 grown-0.064
 plan-0.062
 help-0.062
Token,
Token ID11
Feature activation-1.09e-02

Pos logit contributions
 pap+0.003
 not+0.003
 plan+0.003
 help+0.003
 grown+0.003
Neg logit contributions
es-0.003
ing-0.003
oped-0.003
ed-0.003
med-0.003
Token with
Token ID351
Feature activation-4.75e-02

Pos logit contributions
 pap+0.017
 not+0.017
 plan+0.016
 help+0.016
 mistake+0.016
Neg logit contributions
es-0.021
ing-0.020
ed-0.020
oped-0.020
med-0.019
Token a
Token ID257
Feature activation-4.07e-02

Pos logit contributions
 majesty+0.058
 pap+0.056
 disl+0.055
 mistake+0.052
 plan+0.051
Neg logit contributions
ed-0.105
med-0.104
es-0.103
ing-0.101
oped-0.099
Token lively
Token ID29696
Feature activation-3.43e-02

Pos logit contributions
 ours+0.041
 not+0.038
 disl+0.038
 pap+0.037
 permission+0.036
Neg logit contributions
ing-0.095
oped-0.094
es-0.094
ed-0.091
med-0.091
Token smile
Token ID8212
Feature activation+3.43e-02

Pos logit contributions
 ours+0.043
 pap+0.040
 not+0.040
 disl+0.037
 majesty+0.036
Neg logit contributions
ing-0.073
oped-0.068
ed-0.068
med-0.068
es-0.067
Token on
Token ID319
Feature activation-1.15e-02

Pos logit contributions
es+0.052
ays+0.050
oped+0.049
acked+0.048
med+0.047
Neg logit contributions
 not-0.062
 pap-0.062
 help-0.061
 be-0.061
 grown-0.060
Token.
Token ID13
Feature activation+4.39e-02

Pos logit contributions
oped+0.122
med+0.110
ing+0.106
ane+0.106
es+0.105
Neg logit contributions
 not-0.112
 grown-0.103
 help-0.102
 pap-0.100
 fun-0.099
Token She
Token ID1375
Feature activation-2.88e-02

Pos logit contributions
es+0.083
ed+0.080
oped+0.080
med+0.078
ing+0.077
Neg logit contributions
 pap-0.077
 not-0.076
 help-0.076
 fun-0.074
 plan-0.073
Token always
Token ID1464
Feature activation-3.14e-02

Pos logit contributions
 ours+0.050
 permission+0.048
 pap+0.047
 disl+0.047
 mistake+0.047
Neg logit contributions
es-0.051
ed-0.048
s-0.044
oped-0.043
hed-0.043
Token asked
Token ID1965
Feature activation+1.04e-02

Pos logit contributions
 ours+0.048
 pap+0.047
 mistake+0.046
 permission+0.045
 plan+0.045
Neg logit contributions
es-0.059
ed-0.057
oped-0.054
hed-0.053
ing-0.052
Token her
Token ID607
Feature activation-2.23e-02

Pos logit contributions
oped+0.026
es+0.026
ed+0.025
ing+0.025
med+0.025
Neg logit contributions
 pap-0.011
 not-0.011
 help-0.010
 grown-0.010
 plan-0.010
Token mom
Token ID1995
Feature activation+3.27e-02

Pos logit contributions
 pap+0.016
 not+0.016
 plan+0.015
 fact+0.014
 mistake+0.014
Neg logit contributions
es-0.060
ing-0.059
ed-0.058
med-0.056
oped-0.056
Token to
Token ID284
Feature activation-4.20e-02

Pos logit contributions
oped+0.073
hed+0.068
ed+0.068
med+0.068
es+0.067
Neg logit contributions
 not-0.048
 pap-0.046
 help-0.045
 plan-0.043
 grown-0.042
Token make
Token ID787
Feature activation+5.68e-03

Pos logit contributions
 majesty+0.055
 disl+0.052
ses+0.052
 ours+0.052
 tresp+0.052
Neg logit contributions
es-0.076
he-0.075
med-0.074
s-0.073
ing-0.072
Token pasta
Token ID26296
Feature activation+7.48e-02

Pos logit contributions
es+0.014
ing+0.014
ed+0.013
med+0.013
s+0.012
Neg logit contributions
 pap-0.006
 plan-0.006
 majesty-0.006
 not-0.006
 fact-0.005
Token for
Token ID329
Feature activation-1.27e-02

Pos logit contributions
oped+0.164
ed+0.154
med+0.153
ing+0.152
es+0.152
Neg logit contributions
 not-0.109
 pap-0.101
 plan-0.100
 help-0.099
 fun-0.095
Token dinner
Token ID8073
Feature activation+5.75e-02

Pos logit contributions
 pap+0.014
 majesty+0.014
 disl+0.014
 tresp+0.014
 plan+0.013
Neg logit contributions
es-0.029
med-0.027
ing-0.026
ed-0.026
s-0.024
Token.
Token ID13
Feature activation+3.88e-03

Pos logit contributions
oped+0.050
ed+0.047
ing+0.047
es+0.046
med+0.046
Neg logit contributions
 not-0.053
 help-0.049
 pap-0.048
 grown-0.048
 plan-0.047
Token Mia
Token ID32189
Feature activation-8.89e-03

Pos logit contributions
ing+0.006
es+0.006
ed+0.006
oped+0.006
med+0.006
Neg logit contributions
 pap-0.007
 not-0.007
 help-0.007
 grown-0.007
 plan-0.007
Token liked
Token ID8288
Feature activation-5.47e-02

Pos logit contributions
 pap+0.014
 not+0.014
 plan+0.013
 help+0.013
 grown+0.013
Neg logit contributions
es-0.017
ing-0.016
ed-0.016
oped-0.016
med-0.016
Token the
Token ID262
Feature activation-6.43e-02

Pos logit contributions
 pap+0.096
 promise+0.093
 plan+0.092
 mistake+0.091
 fact+0.090
Neg logit contributions
es-0.090
ing-0.088
med-0.083
ed-0.083
s-0.077
Token images
Token ID4263
Feature activation+3.47e-02

Pos logit contributions
 pap+0.087
 ours+0.078
 permission+0.078
 not+0.077
 disl+0.076
Neg logit contributions
ing-0.126
es-0.126
ed-0.120
ane-0.119
s-0.115
Token.
Token ID13
Feature activation+1.54e-02

Pos logit contributions
oped+0.059
ed+0.056
es+0.056
ing+0.055
med+0.054
Neg logit contributions
 not-0.064
 help-0.061
 pap-0.060
 plan-0.060
 fun-0.059
Token She
Token ID1375
Feature activation-4.54e-02

Pos logit contributions
ing+0.026
ed+0.026
es+0.025
oped+0.025
med+0.025
Neg logit contributions
 not-0.028
 pap-0.028
 help-0.027
 plan-0.026
 welcome-0.026
Token pointed
Token ID6235
Feature activation-9.00e-03

Pos logit contributions
 pap+0.070
 plan+0.065
 not+0.065
 ours+0.065
 mistake+0.065
Neg logit contributions
es-0.090
ed-0.086
ing-0.083
oped-0.083
med-0.082
Token at
Token ID379
Feature activation-3.33e-02

Pos logit contributions
 not+0.014
 pap+0.014
 plan+0.013
 grown+0.013
 help+0.013
Neg logit contributions
es-0.017
oped-0.017
ing-0.016
med-0.016
ed-0.016
Token them
Token ID606
Feature activation+2.15e-02

Pos logit contributions
 pap+0.044
 plan+0.041
 not+0.041
 mistake+0.040
 majesty+0.039
Neg logit contributions
es-0.073
ing-0.070
ed-0.069
med-0.067
oped-0.064
Token and
Token ID290
Feature activation-4.02e-02

Pos logit contributions
oped+0.044
ed+0.040
hed+0.040
acked+0.039
med+0.039
Neg logit contributions
 not-0.038
 plan-0.032
 be-0.031
 help-0.031
 pap-0.031
Token,
Token ID11
Feature activation-7.37e-03

Pos logit contributions
 pap+0.037
 not+0.035
 plan+0.034
 ours+0.034
 disl+0.034
Neg logit contributions
ing-0.022
es-0.022
ed-0.020
oped-0.019
med-0.019
Token Lily
Token ID20037
Feature activation+3.02e-02

Pos logit contributions
 rightful+0.013
 pap+0.012
 plan+0.011
 lookout+0.011
 beauty+0.011
Neg logit contributions
es-0.012
ing-0.011
ed-0.011
med-0.010
chair-0.010
Token,
Token ID11
Feature activation-4.39e-03

Pos logit contributions
med+0.050
es+0.050
oped+0.050
ing+0.049
ed+0.048
Neg logit contributions
 not-0.055
 pap-0.052
 help-0.051
 plan-0.051
 mistake-0.050
Token what
Token ID644
Feature activation-9.43e-03

Pos logit contributions
 rightful+0.008
 pap+0.007
 plan+0.007
 beauty+0.007
 majesty+0.007
Neg logit contributions
es-0.007
ing-0.007
ed-0.006
med-0.006
 again-0.006
Token did
Token ID750
Feature activation-1.25e-02

Pos logit contributions
 pap+0.016
 plan+0.014
 mistake+0.014
 fact+0.014
 majesty+0.014
Neg logit contributions
es-0.018
ed-0.017
ing-0.017
med-0.016
oped-0.016
Token you
Token ID345
Feature activation-3.65e-02

Pos logit contributions
 pap+0.017
 gard+0.017
 plan+0.017
 mistake+0.016
 majesty+0.016
Neg logit contributions
es-0.026
ing-0.024
ed-0.023
s-0.022
med-0.022
Token do
Token ID466
Feature activation+4.35e-03

Pos logit contributions
 majesty+0.059
 pap+0.057
 rightful+0.050
 disl+0.048
 dads+0.048
Neg logit contributions
es-0.065
he-0.065
ing-0.065
hed-0.062
ed-0.062
Token?
Token ID30
Feature activation+1.81e-02

Pos logit contributions
es+0.009
ing+0.009
ed+0.009
oped+0.009
med+0.009
Neg logit contributions
 pap-0.006
 not-0.006
 plan-0.006
 help-0.005
 mistake-0.005
Token Come
Token ID7911
Feature activation+2.81e-02

Pos logit contributions
ing+0.030
oped+0.029
es+0.029
med+0.028
ed+0.028
Neg logit contributions
 pap-0.034
 not-0.033
 help-0.031
 disl-0.030
 grown-0.029
Token here
Token ID994
Feature activation+3.02e-02

Pos logit contributions
es+0.049
ing+0.047
ed+0.047
oped+0.046
hed+0.046
Neg logit contributions
 not-0.052
 help-0.051
 plan-0.048
 pap-0.048
 welcome-0.046
Token,
Token ID11
Feature activation-2.10e-04

Pos logit contributions
es+0.057
ing+0.055
ed+0.055
oped+0.054
med+0.053
Neg logit contributions
 pap-0.049
 not-0.049
 plan-0.047
 help-0.046
 mistake-0.045
Token was
Token ID373
Feature activation-3.55e-02

Pos logit contributions
es+0.034
ing+0.033
ed+0.033
oped+0.032
med+0.032
Neg logit contributions
 pap-0.018
 not-0.018
 plan-0.017
 help-0.016
 mistake-0.016
Token shy
Token ID15800
Feature activation+1.61e-02

Pos logit contributions
 majesty+0.050
 mistake+0.050
 disl+0.050
 ours+0.046
 seaw+0.045
Neg logit contributions
ing-0.068
ed-0.068
es-0.066
med-0.064
oped-0.061
Token and
Token ID290
Feature activation-2.35e-02

Pos logit contributions
oped+0.031
es+0.029
med+0.029
ed+0.027
acked+0.027
Neg logit contributions
 not-0.028
 fun-0.026
 grown-0.026
 pap-0.026
 help-0.026
Token didn
Token ID1422
Feature activation-3.84e-03

Pos logit contributions
 mistake+0.038
 pap+0.037
 permission+0.037
 plan+0.035
 majesty+0.035
Neg logit contributions
es-0.042
ed-0.041
ing-0.040
s-0.038
oped-0.037
Token't
Token ID470
Feature activation-4.05e-02

Pos logit contributions
 not+0.005
 pap+0.005
 help+0.005
 plan+0.005
 grown+0.005
Neg logit contributions
es-0.008
ed-0.008
ing-0.008
oped-0.008
med-0.008
Token like
Token ID588
Feature activation-1.91e-02

Pos logit contributions
 pap+0.043
 not+0.041
 plan+0.039
 help+0.038
 mistake+0.037
Neg logit contributions
es-0.098
ing-0.096
ed-0.095
oped-0.093
med-0.092
Token to
Token ID284
Feature activation-4.56e-02

Pos logit contributions
 pap+0.030
 plan+0.029
 majesty+0.028
 ours+0.027
 mistake+0.027
Neg logit contributions
es-0.035
ing-0.035
ed-0.033
med-0.032
oped-0.031
Token talk
Token ID1561
Feature activation+3.41e-02

Pos logit contributions
 pap+0.054
 ours+0.048
 majesty+0.046
 fact+0.045
 not+0.045
Neg logit contributions
es-0.103
ing-0.099
ed-0.097
oped-0.097
med-0.094
Token to
Token ID284
Feature activation-6.70e-03

Pos logit contributions
oped+0.084
med+0.076
hed+0.076
ays+0.076
acked+0.076
Neg logit contributions
 not-0.049
 fun-0.044
 help-0.042
 grown-0.042
 sweet-0.039
Token other
Token ID584
Feature activation-1.01e-02

Pos logit contributions
 pap+0.008
 not+0.008
 plan+0.007
 help+0.007
 mistake+0.007
Neg logit contributions
es-0.015
ing-0.015
ed-0.015
oped-0.014
med-0.014
Token kids
Token ID3988
Feature activation+4.90e-02

Pos logit contributions
 pap+0.010
 not+0.010
 plan+0.009
 help+0.008
 mistake+0.008
Neg logit contributions
es-0.025
ing-0.025
ed-0.025
med-0.024
oped-0.024
Token,
Token ID11
Feature activation+1.89e-02

Pos logit contributions
es+0.063
oped+0.059
ing+0.058
med+0.058
ed+0.057
Neg logit contributions
 not-0.062
 pap-0.060
 grown-0.059
 help-0.058
 plan-0.057
Token she
Token ID673
Feature activation-1.90e-02

Pos logit contributions
es+0.034
ed+0.034
ing+0.034
oped+0.032
med+0.032
Neg logit contributions
 pap-0.032
 plan-0.030
 not-0.030
 mistake-0.029
 help-0.029
Token was
Token ID373
Feature activation-4.08e-02

Pos logit contributions
 pap+0.029
 ours+0.026
 not+0.026
 help+0.026
 plan+0.026
Neg logit contributions
ed-0.039
es-0.039
ing-0.037
oped-0.036
med-0.035
Token playing
Token ID2712
Feature activation+3.86e-02

Pos logit contributions
 disl+0.067
 pap+0.062
 majesty+0.061
 seaw+0.060
 ours+0.059
Neg logit contributions
ing-0.079
ed-0.079
es-0.077
med-0.074
oped-0.070
Token in
Token ID287
Feature activation-1.25e-02

Pos logit contributions
es+0.054
ing+0.051
ed+0.051
oped+0.048
med+0.048
Neg logit contributions
 pap-0.082
 not-0.079
 plan-0.078
 mistake-0.076
 help-0.076
Token the
Token ID262
Feature activation-4.01e-02

Pos logit contributions
 majesty+0.022
 pap+0.021
 promise+0.020
 ours+0.019
 disl+0.019
Neg logit contributions
es-0.021
med-0.020
ing-0.020
ed-0.019
hed-0.019
Token garden
Token ID11376
Feature activation+4.83e-02

Pos logit contributions
 pap+0.048
 tresp+0.042
 mistake+0.041
 majesty+0.041
 permission+0.041
Neg logit contributions
ing-0.098
ed-0.092
es-0.086
s-0.086
hed-0.086
Token with
Token ID351
Feature activation-2.35e-02

Pos logit contributions
es+0.066
ing+0.064
ed+0.063
oped+0.059
med+0.059
Neg logit contributions
 pap-0.103
 not-0.098
 plan-0.098
 mistake-0.096
 help-0.095
Token her
Token ID607
Feature activation-3.34e-02

Pos logit contributions
 mistake+0.049
 majesty+0.049
 promise+0.047
 inconvenience+0.046
 tresp+0.046
Neg logit contributions
ed-0.035
es-0.034
med-0.034
ing-0.032
ted-0.030
Token friends
Token ID2460
Feature activation+4.50e-02

Pos logit contributions
 pap+0.034
 mistake+0.034
 disl+0.033
 not+0.032
 ours+0.032
Neg logit contributions
es-0.080
ed-0.080
ing-0.078
s-0.075
med-0.072
Token.
Token ID13
Feature activation+5.59e-02

Pos logit contributions
oped+0.082
es+0.082
ed+0.079
ane+0.078
med+0.078
Neg logit contributions
 not-0.074
 help-0.073
 permission-0.069
 grown-0.069
 pap-0.069
Token tools
Token ID4899
Feature activation+7.25e-02

Pos logit contributions
 pap+0.004
 plan+0.004
 majesty+0.004
 not+0.004
 gard+0.004
Neg logit contributions
es-0.009
ing-0.009
ed-0.009
med-0.009
oped-0.008
Token in
Token ID287
Feature activation-1.20e-02

Pos logit contributions
oped+0.122
es+0.121
ed+0.119
med+0.116
ing+0.113
Neg logit contributions
 not-0.131
 help-0.121
 mistake-0.120
 pap-0.119
 plan-0.119
Token the
Token ID262
Feature activation-4.36e-02

Pos logit contributions
 pap+0.013
 not+0.012
 plan+0.012
 majesty+0.012
 gard+0.011
Neg logit contributions
es-0.029
ing-0.028
ed-0.028
med-0.027
oped-0.027
Token garage
Token ID15591
Feature activation+6.26e-02

Pos logit contributions
 pap+0.060
 not+0.056
 disl+0.053
 majesty+0.052
 ours+0.051
Neg logit contributions
ing-0.091
es-0.090
ed-0.087
hed-0.083
oped-0.082
Token.
Token ID13
Feature activation+2.50e-02

Pos logit contributions
es+0.100
oped+0.098
ed+0.097
med+0.096
ing+0.095
Neg logit contributions
 not-0.118
 pap-0.113
 plan-0.112
 mistake-0.112
 help-0.109
Token You
Token ID921
Feature activation-2.92e-02

Pos logit contributions
es+0.042
ing+0.041
oped+0.041
ed+0.041
med+0.039
Neg logit contributions
 pap-0.045
 not-0.045
 help-0.043
 grown-0.041
 ours-0.041
Token can
Token ID460
Feature activation-3.22e-02

Pos logit contributions
 pap+0.043
 mistake+0.039
 majesty+0.038
 plan+0.037
 disl+0.036
Neg logit contributions
es-0.063
ed-0.058
ing-0.058
oped-0.057
hed-0.055
Token hurt
Token ID5938
Feature activation+1.23e-02

Pos logit contributions
 pap+0.045
 majesty+0.039
 ours+0.038
 disl+0.038
 plan+0.037
Neg logit contributions
es-0.069
ing-0.066
oped-0.065
ed-0.064
med-0.063
Token yourself
Token ID3511
Feature activation+4.49e-02

Pos logit contributions
es+0.031
ing+0.030
ed+0.029
med+0.028
oped+0.027
Neg logit contributions
 pap-0.013
 plan-0.012
 majesty-0.011
 not-0.011
 help-0.010
Token or
Token ID393
Feature activation-4.80e-02

Pos logit contributions
oped+0.075
ed+0.073
med+0.071
es+0.069
acked+0.069
Neg logit contributions
 not-0.086
 mistake-0.078
 ours-0.077
 help-0.077
 plan-0.077
Token break
Token ID2270
Feature activation+2.99e-02

Pos logit contributions
 pap+0.060
 majesty+0.055
 fact+0.054
 plan+0.053
 gard+0.052
Neg logit contributions
es-0.101
ing-0.097
ed-0.097
med-0.094
s-0.089
Token She
Token ID1375
Feature activation-1.48e-02

Pos logit contributions
med+0.063
es+0.062
oped+0.062
ing+0.060
ed+0.059
Neg logit contributions
 not-0.070
 pap-0.070
 grown-0.066
 help-0.065
 gard-0.064
Token has
Token ID468
Feature activation-2.31e-02

Pos logit contributions
 pap+0.025
 ours+0.024
 permission+0.024
 plan+0.023
 disl+0.023
Neg logit contributions
es-0.027
ed-0.025
ing-0.024
med-0.024
s-0.024
Token a
Token ID257
Feature activation-2.63e-02

Pos logit contributions
 pap+0.034
 majesty+0.031
 plan+0.029
 promise+0.029
 gard+0.029
Neg logit contributions
es-0.043
ed-0.041
med-0.040
 again-0.038
hed-0.036
Token doll
Token ID3654
Feature activation+7.00e-02

Pos logit contributions
 pap+0.034
 disl+0.034
 permission+0.034
 ours+0.032
 elderly+0.030
Neg logit contributions
oped-0.048
es-0.048
s-0.048
ing-0.047
ane-0.045
Token,
Token ID11
Feature activation-1.27e-02

Pos logit contributions
oped+0.116
ing+0.115
ed+0.114
med+0.111
es+0.109
Neg logit contributions
 not-0.131
 help-0.118
 pap-0.116
 be-0.116
 plan-0.114
Token a
Token ID257
Feature activation-3.00e-02

Pos logit contributions
 plan+0.015
 pap+0.014
 mistake+0.014
 disl+0.013
 majesty+0.013
Neg logit contributions
es-0.030
med-0.027
ed-0.027
ing-0.026
oped-0.026
Token t
Token ID256
Feature activation+7.24e-02

Pos logit contributions
 pap+0.043
 not+0.042
 disl+0.040
 permission+0.040
 ours+0.040
Neg logit contributions
es-0.057
oped-0.055
ing-0.055
ed-0.054
med-0.053
Tokeneddy
Token ID21874
Feature activation+5.55e-02

Pos logit contributions
es+0.062
med+0.060
oped+0.059
hed+0.056
ed+0.055
Neg logit contributions
 pap-0.179
 not-0.177
 plan-0.172
 help-0.172
 mistake-0.171
Token bear
Token ID6842
Feature activation+9.29e-02

Pos logit contributions
oped+0.134
ing+0.134
es+0.132
ed+0.132
med+0.129
Neg logit contributions
 pap-0.057
 not-0.056
 help-0.053
 plan-0.051
 mistake-0.049
Token,
Token ID11
Feature activation-2.68e-03

Pos logit contributions
ing+0.160
oped+0.158
ed+0.156
med+0.155
es+0.149
Neg logit contributions
 not-0.162
 pap-0.155
 help-0.150
 be-0.142
 plan-0.142
Token a
Token ID257
Feature activation-3.48e-02

Pos logit contributions
 pap+0.003
 plan+0.003
 not+0.003
 mistake+0.003
 disl+0.003
Neg logit contributions
es-0.006
ed-0.006
ing-0.006
med-0.006
oped-0.005
Token each
Token ID1123
Feature activation-7.04e-02

Pos logit contributions
 pap+0.045
 not+0.044
 plan+0.042
 help+0.041
 mistake+0.040
Neg logit contributions
es-0.074
ing-0.072
ed-0.071
oped-0.070
med-0.069
Token other
Token ID584
Feature activation-3.19e-02

Pos logit contributions
 pap+0.070
 not+0.062
 disl+0.060
 majesty+0.059
 plan+0.056
Neg logit contributions
ing-0.176
ed-0.175
es-0.173
hed-0.167
med-0.165
Token.
Token ID13
Feature activation-1.41e-02

Pos logit contributions
 pap+0.069
 not+0.066
 plan+0.065
 grown+0.063
 majesty+0.063
Neg logit contributions
es-0.043
ing-0.042
ed-0.041
med-0.038
oped-0.038
Token They
Token ID1119
Feature activation-7.18e-02

Pos logit contributions
 pap+0.024
 not+0.024
 majesty+0.022
 disl+0.022
 grown+0.022
Neg logit contributions
ing-0.026
es-0.025
ed-0.025
med-0.024
oped-0.023
Token felt
Token ID2936
Feature activation-7.30e-02

Pos logit contributions
 pap+0.123
 ours+0.117
 majesty+0.117
 disl+0.115
 mistake+0.115
Neg logit contributions
es-0.129
ed-0.124
oped-0.120
hed-0.117
ing-0.113
Token the
Token ID262
Feature activation-9.07e-02

Pos logit contributions
 pap+0.095
 majesty+0.092
 mistake+0.088
 plan+0.086
 fact+0.080
Neg logit contributions
ed-0.156
es-0.148
med-0.146
ing-0.143
hed-0.143
Token heat
Token ID4894
Feature activation+2.66e-02

Pos logit contributions
 disl+0.116
 pap+0.102
 ours+0.095
 majesty+0.095
 not+0.094
Neg logit contributions
ing-0.189
ed-0.173
ane-0.169
hed-0.168
he-0.160
Token and
Token ID290
Feature activation-6.24e-02

Pos logit contributions
es+0.044
oped+0.042
ed+0.040
med+0.039
ane+0.038
Neg logit contributions
 not-0.049
 help-0.047
 pap-0.047
 plan-0.046
 be-0.045
Token the
Token ID262
Feature activation-8.84e-02

Pos logit contributions
 majesty+0.089
 plan+0.084
 gard+0.084
 pap+0.084
 ours+0.083
Neg logit contributions
ed-0.119
es-0.117
med-0.114
ted-0.111
ing-0.109
Token smoke
Token ID7523
Feature activation+2.78e-02

Pos logit contributions
 pap+0.094
 not+0.094
 disl+0.091
 ours+0.088
 plan+0.081
Neg logit contributions
ing-0.195
ed-0.189
es-0.183
hed-0.178
med-0.177
Token.
Token ID13
Feature activation+7.54e-03

Pos logit contributions
es+0.051
oped+0.048
ed+0.047
ing+0.047
med+0.046
Neg logit contributions
 not-0.046
 help-0.044
 pap-0.043
 grown-0.043
 be-0.043
Token searched
Token ID16499
Feature activation-4.03e-02

Pos logit contributions
 pap+0.094
 not+0.088
 plan+0.087
 ours+0.087
 permission+0.086
Neg logit contributions
es-0.101
ed-0.096
ing-0.094
oped-0.091
med-0.091
Token and
Token ID290
Feature activation-6.83e-02

Pos logit contributions
 pap+0.077
 not+0.072
 plan+0.071
 disl+0.070
 mistake+0.070
Neg logit contributions
es-0.064
ing-0.062
ed-0.061
oped-0.058
med-0.058
Token searched
Token ID16499
Feature activation-2.75e-02

Pos logit contributions
 pap+0.097
 not+0.094
 plan+0.091
 help+0.088
 mistake+0.087
Neg logit contributions
es-0.143
ing-0.139
ed-0.137
oped-0.135
med-0.133
Token,
Token ID11
Feature activation-4.25e-02

Pos logit contributions
 pap+0.059
 not+0.057
 plan+0.056
 mistake+0.055
 help+0.055
Neg logit contributions
es-0.037
ing-0.035
ed-0.035
oped-0.033
med-0.033
Token but
Token ID475
Feature activation-5.28e-02

Pos logit contributions
 pap+0.086
 plan+0.083
 not+0.081
 mistake+0.080
 help+0.080
Neg logit contributions
es-0.061
ing-0.059
ed-0.059
oped-0.056
med-0.056
Token the
Token ID262
Feature activation-8.94e-02

Pos logit contributions
 pap+0.085
 plan+0.082
 not+0.080
 mistake+0.079
 help+0.078
Neg logit contributions
es-0.099
ed-0.095
ing-0.095
oped-0.091
med-0.090
Token jelly
Token ID27644
Feature activation+2.19e-02

Pos logit contributions
 disl+0.098
 ours+0.093
 not+0.092
 pap+0.090
 plan+0.085
Neg logit contributions
ing-0.200
ed-0.194
es-0.189
med-0.180
s-0.179
Token was
Token ID373
Feature activation-8.91e-02

Pos logit contributions
es+0.039
ing+0.037
ed+0.037
oped+0.036
med+0.035
Neg logit contributions
 pap-0.039
 not-0.036
 plan-0.036
 help-0.035
 mistake-0.035
Token nowhere
Token ID12062
Feature activation-2.74e-02

Pos logit contributions
 disl+0.153
 majesty+0.150
 dads+0.147
 pap+0.139
 plan+0.132
Neg logit contributions
es-0.152
ed-0.148
ing-0.148
med-0.144
 again-0.143
Token to
Token ID284
Feature activation-6.27e-02

Pos logit contributions
 pap+0.044
 not+0.043
 plan+0.042
 help+0.041
 mistake+0.040
Neg logit contributions
es-0.052
ing-0.050
ed-0.050
oped-0.049
med-0.048
Token be
Token ID307
Feature activation-4.03e-02

Pos logit contributions
 pap+0.052
 majesty+0.048
 dads+0.041
 ours+0.041
 gard+0.038
Neg logit contributions
es-0.156
he-0.149
ed-0.149
oped-0.149
ing-0.147
Token
Token ID220
Feature activation-2.99e-02

Pos logit contributions
 not+0.025
 pap+0.025
 disl+0.023
 majesty+0.022
 plan+0.021
Neg logit contributions
ing-0.054
es-0.052
ed-0.051
s-0.049
hed-0.048
Token
Token ID198
Feature activation-2.64e-02

Pos logit contributions
 not+0.054
 pap+0.050
 plan+0.048
 grown+0.048
 fun+0.047
Neg logit contributions
es-0.050
oped-0.048
med-0.047
ing-0.046
ed-0.046
Token
Token ID198
Feature activation-2.10e-02

Pos logit contributions
 not+0.048
 pap+0.047
 plan+0.045
 grown+0.044
 help+0.044
Neg logit contributions
es-0.044
ed-0.043
ing-0.043
oped-0.041
med-0.040
TokenTim
Token ID14967
Feature activation-8.60e-03

Pos logit contributions
 not+0.038
 pap+0.038
 plan+0.036
 grown+0.036
 help+0.035
Neg logit contributions
es-0.035
ed-0.035
ing-0.034
oped-0.032
med-0.032
Tokenmy
Token ID1820
Feature activation-2.23e-02

Pos logit contributions
 pap+0.013
 not+0.012
 plan+0.012
 mistake+0.012
 be+0.012
Neg logit contributions
es-0.017
ing-0.016
oped-0.016
med-0.016
ed-0.016
Token was
Token ID373
Feature activation-8.94e-02

Pos logit contributions
 pap+0.036
 not+0.035
 plan+0.034
 help+0.033
 mistake+0.033
Neg logit contributions
es-0.042
ing-0.040
ed-0.040
oped-0.039
med-0.039
Token adventurous
Token ID39770
Feature activation-2.47e-02

Pos logit contributions
 disl+0.131
 pap+0.127
 majesty+0.126
 seaw+0.121
 ours+0.119
Neg logit contributions
med-0.172
es-0.167
ed-0.166
ing-0.160
 onwards-0.160
Token and
Token ID290
Feature activation-4.09e-02

Pos logit contributions
 pap+0.049
 not+0.048
 plan+0.047
 help+0.046
 mistake+0.046
Neg logit contributions
es-0.038
ing-0.036
ed-0.036
oped-0.035
med-0.034
Token wanted
Token ID2227
Feature activation-3.17e-02

Pos logit contributions
 mistake+0.070
 pap+0.067
 permission+0.063
 ours+0.063
 fact+0.063
Neg logit contributions
ed-0.074
es-0.071
ing-0.071
med-0.066
s-0.065
Token to
Token ID284
Feature activation-7.27e-02

Pos logit contributions
 pap+0.060
 plan+0.058
 majesty+0.058
 mistake+0.056
 ours+0.056
Neg logit contributions
ing-0.050
es-0.049
ed-0.048
med-0.045
ted-0.042
Token learn
Token ID2193
Feature activation-1.63e-02

Pos logit contributions
 pap+0.096
 ours+0.089
 mistake+0.086
 fact+0.085
 freshly+0.084
Neg logit contributions
es-0.142
med-0.127
ane-0.126
ed-0.125
oped-0.125
Token second
Token ID1218
Feature activation-3.02e-02

Pos logit contributions
 disl+0.103
 pap+0.103
 not+0.095
 majesty+0.091
 ours+0.088
Neg logit contributions
ing-0.188
ane-0.177
es-0.177
ed-0.174
hed-0.167
Token question
Token ID1808
Feature activation+4.46e-02

Pos logit contributions
 pap+0.040
 not+0.039
 plan+0.037
 help+0.036
 mistake+0.036
Neg logit contributions
es-0.065
ing-0.063
ed-0.063
oped-0.062
med-0.061
Token:
Token ID25
Feature activation-2.55e-02

Pos logit contributions
oped+0.082
med+0.077
es+0.076
ing+0.075
ed+0.075
Neg logit contributions
 not-0.075
 help-0.071
 be-0.068
 pap-0.067
 plan-0.066
Token What
Token ID1867
Feature activation-3.71e-02

Pos logit contributions
 not+0.034
 pap+0.034
 help+0.031
 plan+0.031
 grown+0.031
Neg logit contributions
es-0.055
oped-0.054
ing-0.054
ed-0.054
med-0.052
Token is
Token ID318
Feature activation-5.18e-02

Pos logit contributions
 pap+0.054
 not+0.053
 plan+0.051
 help+0.049
 mistake+0.049
Neg logit contributions
es-0.075
ing-0.074
ed-0.073
oped-0.072
med-0.071
Token the
Token ID262
Feature activation-8.88e-02

Pos logit contributions
 pap+0.057
 plan+0.054
 not+0.052
 mistake+0.051
 help+0.049
Neg logit contributions
es-0.122
ing-0.120
ed-0.118
med-0.116
oped-0.114
Token name
Token ID1438
Feature activation+3.42e-02

Pos logit contributions
 pap+0.096
 disl+0.091
 not+0.090
 majesty+0.083
 ours+0.082
Neg logit contributions
ing-0.195
es-0.190
ed-0.184
ane-0.182
med-0.176
Token of
Token ID286
Feature activation-4.39e-02

Pos logit contributions
oped+0.077
ed+0.073
es+0.072
ing+0.072
med+0.070
Neg logit contributions
 not-0.043
 help-0.041
 welcome-0.040
 my-0.039
 be-0.039
Token the
Token ID262
Feature activation-7.94e-02

Pos logit contributions
 pap+0.046
 not+0.043
 plan+0.043
 mistake+0.041
 majesty+0.040
Neg logit contributions
es-0.106
ing-0.102
ed-0.099
med-0.099
oped-0.097
Token animal
Token ID5044
Feature activation+1.05e-02

Pos logit contributions
 pap+0.088
 not+0.087
 disl+0.078
 ours+0.078
 plan+0.077
Neg logit contributions
es-0.175
ing-0.174
ed-0.168
med-0.163
ane-0.162
Token that
Token ID326
Feature activation-3.91e-02

Pos logit contributions
oped+0.019
ed+0.019
es+0.019
ing+0.019
med+0.018
Neg logit contributions
 not-0.017
 pap-0.017
 plan-0.016
 help-0.016
 grown-0.016
Token the
Token ID262
Feature activation-1.32e-01

Pos logit contributions
es+0.139
ed+0.136
ing+0.135
oped+0.133
med+0.132
Neg logit contributions
 pap-0.068
 not-0.068
 help-0.063
 plan-0.062
 mistake-0.060
Token left
Token ID1364
Feature activation-3.40e-02

Pos logit contributions
 pap+0.146
 not+0.143
 disl+0.138
 ours+0.131
 help+0.124
Neg logit contributions
ing-0.286
es-0.272
ed-0.271
ane-0.260
med-0.256
Token looked
Token ID3114
Feature activation-6.73e-02

Pos logit contributions
 pap+0.061
 not+0.059
 plan+0.058
 help+0.056
 mistake+0.056
Neg logit contributions
es-0.057
ing-0.055
ed-0.055
oped-0.053
med-0.053
Token over
Token ID625
Feature activation-5.73e-02

Pos logit contributions
 pap+0.096
 not+0.089
 plan+0.087
 help+0.085
 mistake+0.084
Neg logit contributions
es-0.138
ing-0.137
ed-0.136
med-0.130
oped-0.130
Token at
Token ID379
Feature activation-5.57e-02

Pos logit contributions
 pap+0.115
 not+0.111
 plan+0.109
 help+0.105
 majesty+0.105
Neg logit contributions
es-0.084
ing-0.083
ed-0.082
med-0.077
oped-0.077
Token the
Token ID262
Feature activation-9.45e-02

Pos logit contributions
 pap+0.076
 plan+0.072
 not+0.068
 majesty+0.068
 gard+0.066
Neg logit contributions
es-0.117
ing-0.114
ed-0.113
med-0.108
oped-0.105
Token friend
Token ID1545
Feature activation-2.12e-03

Pos logit contributions
 pap+0.104
 not+0.100
 disl+0.095
 plan+0.089
 majesty+0.088
Neg logit contributions
ing-0.216
es-0.206
ed-0.205
med-0.198
ane-0.196
Token on
Token ID319
Feature activation-4.54e-02

Pos logit contributions
 not+0.004
 pap+0.004
 plan+0.004
 help+0.004
 grown+0.003
Neg logit contributions
es-0.004
ed-0.004
ing-0.003
oped-0.003
med-0.003
Token the
Token ID262
Feature activation-9.23e-02

Pos logit contributions
 pap+0.063
 not+0.058
 plan+0.058
 help+0.055
 majesty+0.055
Neg logit contributions
es-0.096
ing-0.093
ed-0.091
oped-0.089
med-0.089
Token right
Token ID826
Feature activation-4.62e-02

Pos logit contributions
 pap+0.105
 not+0.098
 disl+0.097
 majesty+0.089
 ours+0.086
Neg logit contributions
ing-0.198
es-0.188
ed-0.186
med-0.183
ane-0.181
Token and
Token ID290
Feature activation-5.87e-02

Pos logit contributions
 pap+0.059
 not+0.050
 ours+0.049
 disl+0.048
 help+0.048
Neg logit contributions
ing-0.101
ed-0.101
es-0.098
med-0.094
oped-0.093