TOP ACTIVATIONS
MAX = 0.092

students decided to take the youngest one with them on a
One day an unexpected guest came to visit the garden
a classroom , where there were many desks and chairs ,
a cafeteria , where there were many tables and chairs ,
angry . She asked the other children where Anna and Ben
could not hear what the other kids said to her .
a library , where there were many books and computers .
was a number . The number was open . It wanted
quiz . The teacher gives them the quiz .
When she arrived , the teacher taught them about rivers .
boring . The teacher collects the papers and looks
day , the three university students decided to take the youngest
number . Finally , the mystery was solved !
pass their folders to the front of the class . She
teacher many questions . The teacher is happy and answers them
to the teacher . The teacher checks the quiz . She
his teacher standing at the front of the room . She
had ordered them and the delivery man had brought them to
, she goes to the front of the room . She
and then saying the biggest number of all : zero !

MIN ACTIVATIONS
MIN = -0.098

worm . She felt it wr iggle and squ irm in
. The steak tried to wr iggle away .
pet him . George felt com fy and started to forget
in bed and felt so com fy . After a few
smiled . It felt so com fy and warm .
favourite dress . She felt com fy and content and loved
made Tim my feel so com fy and cozy .
started to move . It wr igg led and w igg
started to clean . It wr igg led around the cave
them . He felt so com fy that he decided to
happy and his body felt com fy and relaxed on the
not like that . It wr igg led and bit Mia
the rain and take a com fy nap . The tiger
The branch was so com fy and gentle that the
, and the thief felt com fy and warm . They
The little girl felt so com fy and happy . The
bed . It was so com fy and tight . He
. He could never get com fy . One
laughed . It was so com fy and fun . She
couch that was just as com fy as the old one

INTERVAL 0.044 - 0.092
CONTAINS 0.550%

day , the principal asked the teachers to count how many
this table !" Then she heard a voice say , "
to the cliff to sleep ?" Her mom smiled and said
stickers and other things she found . One day , she
and said that they had done a really good job .

INTERVAL -0.003 - 0.044
CONTAINS 58.010%

he saw something ! It was a small boat with a
said he wanted to be able to spring .
with a grape . Chris wanted to hold the grape ,
bur row at the top of a hill . One day
remembered was the peaceful river . <|endoftext|> Once upon a time

INTERVAL -0.051 - -0.003
CONTAINS 41.169%

" Don 't touch it , little child ,"
and tired . As Max walked back home ,
One day , the farmer was too busy to
said , " Let 's sit down and take a break
you are hurting me and my things !" the fairy said

INTERVAL -0.098 - -0.051
CONTAINS 0.271%

end . <|endoftext|> Once upon a time there was a nice
knee . It made him feel very sad and he started
woke up , it felt much better . The w affle
floating in the water . D ucky thought it was a
hand . She feels it tick le her palm . She
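
The four INTERVAL buckets above appear to split the range from MIN to MAX into equal-width bins, and the CONTAINS percentages should together account for all activations. The Python sketch below sanity-checks both points. It assumes equal-width bins (the dump does not say how the buckets were chosen) and uses only the rounded values printed above. The names `equal_width_edges`, `printed_edges`, and `printed_shares` are illustrative and are not part of the tool that produced this dump.

```python
# Values copied from the dump above (rounded to three decimals there).
ACT_MIN, ACT_MAX = -0.098, 0.092
N_BINS = 4  # assumption: four equal-width buckets

# Printed interval edges, lowest to highest.
printed_edges = [-0.098, -0.051, -0.003, 0.044, 0.092]
# Printed share of activations per interval (percent), lowest to highest.
printed_shares = [0.271, 41.169, 58.010, 0.550]


def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


computed_edges = equal_width_edges(ACT_MIN, ACT_MAX, N_BINS)
# Largest absolute difference between a computed edge and the printed one.
max_edge_gap = max(abs(c - p) for c, p in zip(computed_edges, printed_edges))
total_share = sum(printed_shares)

print("computed edges:", [round(e, 4) for e in computed_edges])
print("largest gap to printed edge:", round(max_edge_gap, 4))
print("total share (%):", round(total_share, 3))
```

A largest gap below 0.001 is what rounding to three decimals would allow, so the printed edges are consistent with equal-width bins, although this does not prove that is how they were produced.
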
Token students
Token ID2444
Feature activation+7.98e-02

Pos logit contributions
amer+0.149
old+0.136
ful+0.129
od+0.128
ena+0.126
Neg logit contributions
 wouldn-0.106
 may-0.100
suit-0.085
rated-0.084
 don-0.082
Token decided
Token ID3066
Feature activation+2.95e-02

Pos logit contributions
amer+0.164
old+0.146
ena+0.138
ful+0.137
 office+0.135
Neg logit contributions
 wouldn-0.123
 may-0.116
 don-0.099
rated-0.098
suit-0.096
Token to
Token ID284
Feature activation+5.28e-02

Pos logit contributions
amer+0.058
old+0.052
ena+0.049
od+0.048
 student+0.048
Neg logit contributions
 wouldn-0.049
 may-0.048
 steep-0.042
 don-0.042
 spine-0.041
Token take
Token ID1011
Feature activation+9.06e-03

Pos logit contributions
amer+0.114
old+0.107
ena+0.100
od+0.099
 twins+0.098
Neg logit contributions
 may-0.073
 wouldn-0.069
 don-0.065
rated-0.059
 steep-0.056
Token the
Token ID262
Feature activation+3.48e-02

Pos logit contributions
amer+0.014
old+0.013
ena+0.012
od+0.011
ful+0.011
Neg logit contributions
 may-0.017
 wouldn-0.017
suit-0.016
rated-0.016
 spine-0.016
Token youngest
Token ID18887
Feature activation+9.09e-02

Pos logit contributions
amer+0.055
old+0.052
ena+0.047
od+0.046
ful+0.045
Neg logit contributions
 may-0.067
 wouldn-0.066
 spine-0.063
rated-0.062
suit-0.061
Token one
Token ID530
Feature activation+1.79e-02

Pos logit contributions
amer+0.098
old+0.085
Her+0.083
ful+0.079
od+0.074
Neg logit contributions
 may-0.203
 wouldn-0.199
suit-0.189
 spine-0.188
 doesn-0.184
Token with
Token ID351
Feature activation+3.36e-02

Pos logit contributions
amer+0.032
old+0.029
od+0.028
ena+0.027
 twins+0.026
Neg logit contributions
 may-0.032
 wouldn-0.031
 spine-0.029
 steep-0.028
suit-0.028
Token them
Token ID606
Feature activation+2.52e-02

Pos logit contributions
amer+0.053
old+0.049
od+0.044
 office+0.043
ena+0.042
Neg logit contributions
 may-0.064
 wouldn-0.063
 steep-0.057
 don-0.057
suit-0.055
Token on
Token ID319
Feature activation+3.51e-02

Pos logit contributions
amer+0.041
old+0.038
ena+0.035
od+0.034
 student+0.033
Neg logit contributions
 wouldn-0.048
 may-0.048
 spine-0.044
suit-0.043
rated-0.043
Token a
Token ID257
Feature activation+9.48e-03

Pos logit contributions
amer+0.059
old+0.059
od+0.051
ful+0.049
ena+0.049
Neg logit contributions
 may-0.060
 wouldn-0.058
 steep-0.057
 spine-0.054
suit-0.054
Token\n
Token ID198
Feature activation+1.02e-02

Pos logit contributions
amer+0.036
old+0.031
 singers+0.031
 student+0.031
 office+0.030
Neg logit contributions
 may-0.028
 wouldn-0.026
 steep-0.025
 don-0.024
 spine-0.024
TokenOne
Token ID3198
Feature activation+8.03e-03

Pos logit contributions
amer+0.020
old+0.018
od+0.017
ful+0.017
 priest+0.017
Neg logit contributions
 may-0.015
 steep-0.015
 wouldn-0.014
rated-0.014
 spine-0.013
Token day
Token ID1110
Feature activation-2.06e-02

Pos logit contributions
amer+0.016
old+0.015
od+0.014
ena+0.014
 office+0.013
Neg logit contributions
 may-0.013
 wouldn-0.012
 spine-0.012
rated-0.012
 steep-0.011
Token an
Token ID281
Feature activation+1.05e-03

Pos logit contributions
rated+0.040
 spine+0.038
 wouldn+0.037
 may+0.037
suit+0.036
Neg logit contributions
old-0.031
amer-0.029
ena-0.028
 office-0.028
 twins-0.028
Token unexpected
Token ID10059
Feature activation+1.91e-02

Pos logit contributions
amer+0.002
old+0.002
od+0.002
ena+0.002
Jul+0.001
Neg logit contributions
rated-0.002
 wouldn-0.002
 spine-0.002
 may-0.002
 steep-0.002
Token guest
Token ID8319
Feature activation+8.45e-02

Pos logit contributions
amer+0.032
old+0.031
ena+0.028
od+0.028
 office+0.027
Neg logit contributions
 may-0.035
 wouldn-0.034
 spine-0.032
rated-0.032
suit-0.032
Token came
Token ID1625
Feature activation+3.49e-02

Pos logit contributions
amer+0.175
old+0.163
od+0.153
reating+0.153
Her+0.150
Neg logit contributions
 wouldn-0.122
 may-0.121
suit-0.099
 doesn-0.097
 don-0.096
Token to
Token ID284
Feature activation+3.22e-02

Pos logit contributions
amer+0.061
old+0.058
ena+0.053
 student+0.052
od+0.052
Neg logit contributions
 wouldn-0.060
 may-0.059
 spine-0.056
rated-0.055
suit-0.054
Token visit
Token ID3187
Feature activation+1.93e-02

Pos logit contributions
amer+0.062
old+0.059
od+0.055
ena+0.054
 student+0.052
Neg logit contributions
 may-0.050
 wouldn-0.049
 don-0.046
 spine-0.045
rated-0.044
Token the
Token ID262
Feature activation+3.06e-02

Pos logit contributions
amer+0.027
old+0.026
ena+0.024
od+0.023
 office+0.023
Neg logit contributions
 may-0.040
 wouldn-0.039
 spine-0.038
rated-0.037
suit-0.037
Token garden
Token ID11376
Feature activation+5.98e-02

Pos logit contributions
old+0.048
amer+0.048
od+0.044
ful+0.042
ena+0.042
Neg logit contributions
 may-0.057
 wouldn-0.055
 spine-0.053
 steep-0.052
 don-0.051
Token a
Token ID257
Feature activation+1.85e-02

Pos logit contributions
amer+0.059
eper+0.047
ena+0.046
old+0.045
 student+0.045
Neg logit contributions
 may-0.047
 wouldn-0.043
 don-0.039
 doesn-0.039
 might-0.037
Token classroom
Token ID15806
Feature activation+6.24e-02

Pos logit contributions
amer+0.032
old+0.030
ena+0.027
od+0.027
ful+0.026
Neg logit contributions
 may-0.034
 wouldn-0.032
 spine-0.031
suit-0.031
 steep-0.030
Token,
Token ID11
Feature activation-3.41e-03

Pos logit contributions
amer+0.137
old+0.123
eper+0.116
od+0.114
daughter+0.113
Neg logit contributions
 may-0.082
 wouldn-0.077
suit-0.077
 spine-0.070
 steep-0.070
Token where
Token ID810
Feature activation+3.56e-02

Pos logit contributions
 may+0.006
 wouldn+0.006
 spine+0.006
 steep+0.006
suit+0.006
Neg logit contributions
amer-0.006
old-0.005
od-0.005
 student-0.004
ena-0.004
Token there
Token ID612
Feature activation+2.95e-02

Pos logit contributions
amer+0.069
old+0.060
eper+0.055
ena+0.054
od+0.052
Neg logit contributions
 may-0.061
 wouldn-0.060
 don-0.055
 steep-0.053
 doesn-0.051
Token were
Token ID547
Feature activation+8.32e-02

Pos logit contributions
amer+0.065
old+0.061
ena+0.056
 student+0.055
ful+0.055
Neg logit contributions
 may-0.044
 wouldn-0.041
 don-0.035
 doesn-0.034
 might-0.032
Token many
Token ID867
Feature activation+5.66e-02

Pos logit contributions
amer+0.137
old+0.126
od+0.105
eper+0.102
daughter+0.101
Neg logit contributions
 may-0.153
 steep-0.152
 wouldn-0.152
suit-0.135
 don-0.133
Token desks
Token ID44301
Feature activation+4.14e-02

Pos logit contributions
amer+0.094
old+0.087
ena+0.079
ful+0.078
od+0.077
Neg logit contributions
 may-0.105
 wouldn-0.100
 steep-0.095
 spine-0.094
suit-0.094
Token and
Token ID290
Feature activation+3.38e-02

Pos logit contributions
amer+0.088
old+0.084
 student+0.077
ena+0.077
od+0.076
Neg logit contributions
 may-0.057
 wouldn-0.056
 spine-0.051
suit-0.050
 steep-0.048
Token chairs
Token ID18791
Feature activation+2.71e-02

Pos logit contributions
amer+0.073
old+0.068
eper+0.062
od+0.060
ful+0.058
Neg logit contributions
 may-0.047
 wouldn-0.042
 steep-0.042
 don-0.041
suit-0.041
Token,
Token ID11
Feature activation-1.05e-03

Pos logit contributions
amer+0.051
old+0.050
ena+0.045
od+0.044
 student+0.044
Neg logit contributions
 may-0.043
 wouldn-0.043
 spine-0.040
suit-0.039
 steep-0.038
Token a
Token ID257
Feature activation+2.15e-02

Pos logit contributions
amer+0.061
eper+0.049
ena+0.048
 student+0.047
old+0.046
Neg logit contributions
 may-0.043
 wouldn-0.039
 don-0.035
 might-0.034
 doesn-0.033
Token cafeteria
Token ID43075
Feature activation+6.47e-02

Pos logit contributions
amer+0.037
old+0.035
ena+0.032
od+0.031
ful+0.030
Neg logit contributions
 may-0.039
 wouldn-0.038
 spine-0.036
suit-0.036
 steep-0.035
Token,
Token ID11
Feature activation-4.44e-03

Pos logit contributions
amer+0.128
old+0.119
od+0.109
ena+0.107
ful+0.106
Neg logit contributions
 may-0.098
suit-0.095
 wouldn-0.093
 spine-0.087
 steep-0.084
Token where
Token ID810
Feature activation+4.22e-02

Pos logit contributions
 may+0.008
 wouldn+0.008
 spine+0.008
suit+0.007
rated+0.007
Neg logit contributions
amer-0.007
old-0.007
od-0.006
ena-0.006
 student-0.006
Token there
Token ID612
Feature activation+3.03e-02

Pos logit contributions
amer+0.085
old+0.072
eper+0.068
ena+0.067
od+0.064
Neg logit contributions
 may-0.069
 wouldn-0.067
 don-0.063
 steep-0.059
 doesn-0.057
Token were
Token ID547
Feature activation+8.31e-02

Pos logit contributions
amer+0.067
old+0.062
ena+0.057
 student+0.056
ful+0.055
Neg logit contributions
 may-0.046
 wouldn-0.043
 don-0.037
 doesn-0.035
 might-0.034
Token many
Token ID867
Feature activation+5.07e-02

Pos logit contributions
amer+0.141
old+0.129
od+0.110
 student+0.106
ful+0.105
Neg logit contributions
 may-0.153
 wouldn-0.150
 steep-0.149
suit-0.134
 don-0.131
Token tables
Token ID8893
Feature activation+2.56e-02

Pos logit contributions
amer+0.090
old+0.084
ena+0.077
ful+0.076
od+0.076
Neg logit contributions
 may-0.089
 wouldn-0.084
 steep-0.079
 spine-0.079
suit-0.079
Token and
Token ID290
Feature activation+3.06e-02

Pos logit contributions
amer+0.048
old+0.046
ena+0.043
 office+0.042
od+0.042
Neg logit contributions
 may-0.040
 wouldn-0.040
 spine-0.038
rated-0.038
suit-0.036
Token chairs
Token ID18791
Feature activation+2.59e-02

Pos logit contributions
amer+0.064
old+0.060
eper+0.054
od+0.054
 student+0.054
Neg logit contributions
 may-0.044
 wouldn-0.041
 steep-0.040
 don-0.038
suit-0.038
Token,
Token ID11
Feature activation-3.30e-03

Pos logit contributions
amer+0.047
old+0.046
ena+0.041
od+0.041
 student+0.041
Neg logit contributions
 may-0.043
 wouldn-0.042
 spine-0.040
suit-0.039
rated-0.038
Token angry
Token ID7954
Feature activation+6.64e-03

Pos logit contributions
 may+0.012
 spine+0.011
 wouldn+0.011
suit+0.011
rated+0.011
Neg logit contributions
old-0.013
amer-0.013
ena-0.012
 office-0.011
 student-0.011
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
amer+0.012
old+0.012
 priest+0.011
daughter+0.010
ena+0.010
Neg logit contributions
 wouldn-0.011
 may-0.011
 don-0.010
 doesn-0.010
 spine-0.010
Token She
Token ID1375
Feature activation+9.40e-03

Pos logit contributions
amer+0.017
old+0.016
ena+0.014
od+0.014
 office+0.014
Neg logit contributions
 may-0.025
 wouldn-0.024
 spine-0.023
suit-0.023
rated-0.023
Token asked
Token ID1965
Feature activation+1.77e-02

Pos logit contributions
old+0.016
ena+0.015
amer+0.015
 student+0.014
 office+0.014
Neg logit contributions
 spine-0.016
 may-0.016
rated-0.015
suit-0.015
icol-0.015
Token the
Token ID262
Feature activation+7.20e-02

Pos logit contributions
amer+0.024
old+0.024
ena+0.021
od+0.020
 student+0.020
Neg logit contributions
 may-0.037
 wouldn-0.037
 spine-0.035
rated-0.035
suit-0.034
Token other
Token ID584
Feature activation+8.24e-02

Pos logit contributions
amer+0.106
old+0.092
od+0.084
ful+0.083
ena+0.076
Neg logit contributions
 may-0.146
 wouldn-0.142
 spine-0.138
suit-0.136
 don-0.134
Token children
Token ID1751
Feature activation+3.18e-02

Pos logit contributions
amer+0.117
old+0.088
ful+0.085
armac+0.083
od+0.081
Neg logit contributions
 may-0.165
 wouldn-0.163
 don-0.158
suit-0.151
 doesn-0.150
Token where
Token ID810
Feature activation+3.21e-02

Pos logit contributions
amer+0.059
old+0.054
ena+0.051
od+0.050
 student+0.049
Neg logit contributions
 may-0.054
 wouldn-0.054
 spine-0.049
 don-0.048
 steep-0.047
Token Anna
Token ID11735
Feature activation+1.48e-02

Pos logit contributions
amer+0.062
old+0.054
od+0.050
ena+0.050
 office+0.048
Neg logit contributions
 wouldn-0.056
 may-0.055
 don-0.050
 steep-0.047
 doesn-0.046
Token and
Token ID290
Feature activation+1.50e-02

Pos logit contributions
old+0.025
amer+0.024
ena+0.023
 office+0.022
 student+0.021
Neg logit contributions
 spine-0.025
 may-0.025
rated-0.025
icol-0.024
 steep-0.024
Token Ben
Token ID3932
Feature activation+1.81e-02

Pos logit contributions
amer+0.029
old+0.027
od+0.024
 office+0.024
ena+0.024
Neg logit contributions
 wouldn-0.024
 may-0.024
 spine-0.022
 don-0.021
suit-0.021
Token could
Token ID714
Feature activation+2.69e-02

Pos logit contributions
amer+0.022
old+0.022
ena+0.021
 student+0.020
od+0.020
Neg logit contributions
 spine-0.020
 may-0.020
rated-0.019
 wouldn-0.019
suit-0.018
Token not
Token ID407
Feature activation+2.74e-02

Pos logit contributions
amer+0.059
old+0.056
od+0.052
ful+0.050
ena+0.049
Neg logit contributions
 may-0.035
 wouldn-0.035
 don-0.033
 spine-0.030
 steep-0.028
Token hear
Token ID3285
Feature activation+2.61e-02

Pos logit contributions
amer+0.052
old+0.051
 twins+0.046
 office+0.046
od+0.045
Neg logit contributions
 may-0.042
 wouldn-0.041
 spine-0.039
 don-0.037
rated-0.036
Token what
Token ID644
Feature activation+2.32e-02

Pos logit contributions
amer+0.033
old+0.030
ena+0.026
od+0.026
 student+0.026
Neg logit contributions
 may-0.058
 wouldn-0.058
 spine-0.054
rated-0.053
 steep-0.053
Token the
Token ID262
Feature activation+5.70e-02

Pos logit contributions
amer+0.038
old+0.033
ena+0.030
od+0.029
eper+0.029
Neg logit contributions
 may-0.046
 wouldn-0.046
 steep-0.040
 don-0.040
rated-0.040
Token other
Token ID584
Feature activation+8.14e-02

Pos logit contributions
amer+0.075
old+0.070
ful+0.058
od+0.058
ena+0.056
Neg logit contributions
 may-0.125
 wouldn-0.122
 spine-0.120
suit-0.116
rated-0.115
Token kids
Token ID3988
Feature activation+4.96e-02

Pos logit contributions
amer+0.125
old+0.103
ful+0.097
od+0.091
eper+0.090
Neg logit contributions
 wouldn-0.165
 may-0.164
 doesn-0.148
 don-0.148
suit-0.146
Token said
Token ID531
Feature activation+1.90e-02

Pos logit contributions
amer+0.087
old+0.085
ena+0.078
od+0.076
 student+0.075
Neg logit contributions
 may-0.083
 wouldn-0.082
 spine-0.079
rated-0.078
suit-0.077
Token to
Token ID284
Feature activation+2.50e-02

Pos logit contributions
amer+0.029
old+0.028
ena+0.025
 student+0.024
od+0.024
Neg logit contributions
 may-0.037
 wouldn-0.037
 spine-0.034
rated-0.034
suit-0.033
Token her
Token ID607
Feature activation+2.79e-02

Pos logit contributions
amer+0.045
old+0.042
od+0.037
ena+0.037
 student+0.037
Neg logit contributions
 may-0.044
 wouldn-0.043
 spine-0.040
 don-0.039
rated-0.039
Token.
Token ID13
Feature activation+1.01e-02

Pos logit contributions
amer+0.047
old+0.044
ena+0.039
od+0.038
ful+0.037
Neg logit contributions
 may-0.051
 wouldn-0.051
 spine-0.048
suit-0.046
 don-0.046
Token a
Token ID257
Feature activation+1.51e-02

Pos logit contributions
amer+0.052
ena+0.040
eper+0.040
old+0.040
 student+0.039
Neg logit contributions
 may-0.045
 wouldn-0.042
 don-0.038
 doesn-0.037
 steep-0.036
Token library
Token ID5888
Feature activation+4.87e-02

Pos logit contributions
amer+0.026
old+0.025
ena+0.023
od+0.022
 office+0.022
Neg logit contributions
 may-0.027
 wouldn-0.027
 spine-0.025
suit-0.025
rated-0.025
Token,
Token ID11
Feature activation-3.39e-03

Pos logit contributions
amer+0.100
old+0.094
od+0.086
ena+0.085
eper+0.083
Neg logit contributions
 may-0.070
 wouldn-0.068
suit-0.067
 spine-0.063
 steep-0.063
Token where
Token ID810
Feature activation+3.82e-02

Pos logit contributions
 may+0.007
 wouldn+0.006
 spine+0.006
suit+0.006
 steep+0.006
Neg logit contributions
amer-0.005
old-0.005
od-0.004
ena-0.004
 student-0.004
Token there
Token ID612
Feature activation+2.73e-02

Pos logit contributions
amer+0.072
old+0.062
ena+0.057
eper+0.057
od+0.055
Neg logit contributions
 may-0.067
 wouldn-0.066
 don-0.061
 steep-0.059
 doesn-0.056
Token were
Token ID547
Feature activation+8.08e-02

Pos logit contributions
amer+0.057
old+0.053
ena+0.049
 student+0.048
od+0.047
Neg logit contributions
 may-0.043
 wouldn-0.042
 don-0.036
 doesn-0.034
 steep-0.033
Token many
Token ID867
Feature activation+4.92e-02

Pos logit contributions
amer+0.129
old+0.115
od+0.100
ena+0.096
 student+0.095
Neg logit contributions
 may-0.157
 wouldn-0.157
 steep-0.154
suit-0.137
 don-0.136
Token books
Token ID3835
Feature activation+2.17e-02

Pos logit contributions
amer+0.082
old+0.077
ena+0.070
od+0.069
ful+0.068
Neg logit contributions
 may-0.090
 wouldn-0.087
 spine-0.082
 steep-0.081
suit-0.081
Token and
Token ID290
Feature activation+1.97e-02

Pos logit contributions
amer+0.042
old+0.041
ena+0.037
 student+0.037
od+0.036
Neg logit contributions
 may-0.034
 wouldn-0.033
 spine-0.031
suit-0.030
 steep-0.030
Token computers
Token ID9061
Feature activation+3.35e-02

Pos logit contributions
amer+0.040
old+0.037
eper+0.034
od+0.033
 student+0.033
Neg logit contributions
 may-0.031
 wouldn-0.029
 steep-0.027
 don-0.027
 doesn-0.026
Token.
Token ID13
Feature activation+1.99e-02

Pos logit contributions
amer+0.060
old+0.057
ena+0.052
od+0.051
 student+0.051
Neg logit contributions
 may-0.058
 wouldn-0.057
 spine-0.053
suit-0.052
 steep-0.051
Token was
Token ID373
Feature activation-9.82e-03

Pos logit contributions
amer+0.001
old+0.001
ena+0.001
od+0.001
 student+0.001
Neg logit contributions
 may-0.001
 wouldn-0.001
 spine-0.001
rated-0.001
suit-0.001
Token a
Token ID257
Feature activation-1.23e-02

Pos logit contributions
 may+0.018
 wouldn+0.018
 steep+0.017
 spine+0.017
 don+0.017
Neg logit contributions
amer-0.016
old-0.015
 office-0.013
 priest-0.013
eper-0.013
Token number
Token ID1271
Feature activation+6.88e-02

Pos logit contributions
 wouldn+0.023
 may+0.023
rated+0.023
 spine+0.022
suit+0.021
Neg logit contributions
amer-0.019
old-0.018
 student-0.018
 twins-0.018
 office-0.017
Token.
Token ID13
Feature activation+1.01e-02

Pos logit contributions
Kat+0.119
amer+0.115
old+0.114
Jul+0.111
Maria+0.106
Neg logit contributions
 wouldn-0.105
 may-0.098
suit-0.097
 bit-0.086
 while-0.084
Token The
Token ID383
Feature activation+4.79e-02

Pos logit contributions
amer+0.020
old+0.018
eper+0.017
ful+0.016
 priest+0.016
Neg logit contributions
 may-0.016
 wouldn-0.015
 spine-0.015
suit-0.014
 steep-0.014
Token number
Token ID1271
Feature activation+8.06e-02

Pos logit contributions
amer+0.081
old+0.077
ful+0.071
od+0.065
ena+0.064
Neg logit contributions
 may-0.088
 wouldn-0.084
 spine-0.081
suit-0.080
 steep-0.078
Token was
Token ID373
Feature activation+3.96e-02

Pos logit contributions
amer+0.198
Kat+0.195
reating+0.193
ful+0.186
old+0.183
Neg logit contributions
 wouldn-0.082
 may-0.078
 might-0.061
 doesn-0.059
 didn-0.056
Token open
Token ID1280
Feature activation+3.27e-02

Pos logit contributions
amer+0.063
old+0.060
od+0.052
ena+0.051
 student+0.049
Neg logit contributions
 wouldn-0.077
 may-0.074
 steep-0.071
 spine-0.068
suit-0.068
Token.
Token ID13
Feature activation+1.69e-02

Pos logit contributions
amer+0.055
old+0.053
ena+0.048
od+0.047
 student+0.046
Neg logit contributions
 may-0.059
 wouldn-0.058
 spine-0.055
suit-0.054
rated-0.054
Token It
Token ID632
Feature activation+1.52e-02

Pos logit contributions
amer+0.031
old+0.029
ena+0.026
od+0.026
 office+0.026
Neg logit contributions
 may-0.028
 wouldn-0.027
 spine-0.026
suit-0.025
 steep-0.025
Token wanted
Token ID2227
Feature activation+2.93e-02

Pos logit contributions
amer+0.032
old+0.030
od+0.027
ful+0.027
ena+0.027
Neg logit contributions
 wouldn-0.024
 may-0.023
 steep-0.019
rated-0.019
suit-0.019
Token quiz
Token ID38964
Feature activation+4.51e-02

Pos logit contributions
amer+0.073
old+0.071
ena+0.064
od+0.063
 office+0.061
Neg logit contributions
 may-0.078
 wouldn-0.077
 spine-0.073
rated-0.072
suit-0.071
Token.
Token ID13
Feature activation-4.79e-03

Pos logit contributions
amer+0.082
old+0.076
eper+0.068
Kat+0.066
ena+0.066
Neg logit contributions
 may-0.076
 wouldn-0.074
suit-0.070
 steep-0.068
 spine-0.066
Token\n
Token ID198
Feature activation+1.69e-02

Pos logit contributions
 may+0.009
 wouldn+0.009
 spine+0.008
suit+0.008
 steep+0.008
Neg logit contributions
amer-0.008
old-0.007
ena-0.006
 office-0.006
od-0.006
Token\n
Token ID198
Feature activation+4.84e-03

Pos logit contributions
amer+0.034
old+0.030
 office+0.029
eper+0.029
 student+0.029
Neg logit contributions
 may-0.025
 steep-0.023
 wouldn-0.022
 don-0.022
suit-0.021
TokenThe
Token ID464
Feature activation+4.76e-02

Pos logit contributions
amer+0.009
old+0.008
od+0.008
 office+0.008
 priest+0.007
Neg logit contributions
 may-0.008
 wouldn-0.008
 steep-0.008
suit-0.007
rated-0.007
Token teacher
Token ID4701
Feature activation+8.01e-02

Pos logit contributions
amer+0.091
old+0.087
ful+0.080
od+0.076
ena+0.075
Neg logit contributions
 may-0.075
 wouldn-0.071
 spine-0.068
 steep-0.066
suit-0.066
Token gives
Token ID3607
Feature activation+1.00e-02

Pos logit contributions
amer+0.154
ful+0.135
old+0.134
Her+0.126
od+0.126
Neg logit contributions
 may-0.133
 wouldn-0.127
 doesn-0.115
 don-0.111
suit-0.108
Token them
Token ID606
Feature activation+3.23e-02

Pos logit contributions
amer+0.013
old+0.013
ena+0.011
od+0.011
 student+0.011
Neg logit contributions
 may-0.022
 wouldn-0.021
 spine-0.020
rated-0.020
suit-0.020
Token the
Token ID262
Feature activation+4.91e-02

Pos logit contributions
amer+0.054
old+0.049
od+0.042
 office+0.042
 priest+0.042
Neg logit contributions
 may-0.058
 wouldn-0.056
suit-0.054
fy-0.053
 steep-0.052
Token quiz
Token ID38964
Feature activation+4.58e-02

Pos logit contributions
amer+0.086
old+0.082
ena+0.074
od+0.073
ful+0.072
Neg logit contributions
 may-0.086
 wouldn-0.084
 spine-0.080
suit-0.078
rated-0.078
Token.
Token ID13
Feature activation+1.75e-03

Pos logit contributions
amer+0.087
old+0.078
eper+0.072
od+0.071
ena+0.069
Neg logit contributions
 may-0.072
suit-0.071
 wouldn-0.071
 spine-0.066
 steep-0.065
TokenWhen
Token ID2215
Feature activation+3.45e-02

Pos logit contributions
amer+0.119
eper+0.103
 twins+0.099
 singers+0.097
 priest+0.097
Neg logit contributions
 may-0.055
 wouldn-0.054
 don-0.052
rated-0.049
 steep-0.049
Token she
Token ID673
Feature activation+1.25e-02

Pos logit contributions
amer+0.067
old+0.060
eper+0.053
ena+0.052
 priest+0.050
Neg logit contributions
 wouldn-0.062
 may-0.061
 don-0.055
 steep-0.051
 spine-0.051
Token arrived
Token ID5284
Feature activation+3.78e-02

Pos logit contributions
amer+0.027
old+0.026
ena+0.024
od+0.023
 student+0.023
Neg logit contributions
 may-0.017
 wouldn-0.017
 spine-0.015
rated-0.015
suit-0.015
Token,
Token ID11
Feature activation+3.61e-02

Pos logit contributions
amer+0.073
old+0.067
ena+0.062
od+0.061
 student+0.060
Neg logit contributions
 wouldn-0.060
 may-0.060
 spine-0.056
suit-0.054
 steep-0.052
Token the
Token ID262
Feature activation+5.12e-02

Pos logit contributions
amer+0.062
old+0.055
ena+0.047
eper+0.047
 student+0.046
Neg logit contributions
 wouldn-0.072
 may-0.069
 don-0.064
rated-0.061
 spine-0.061
Token teacher
Token ID4701
Feature activation+8.01e-02

Pos logit contributions
amer+0.066
old+0.060
od+0.052
ena+0.051
ful+0.051
Neg logit contributions
 may-0.113
 wouldn-0.112
 spine-0.108
suit-0.106
rated-0.104
Token taught
Token ID7817
Feature activation+5.03e-02

Pos logit contributions
amer+0.164
old+0.144
ful+0.136
od+0.132
ena+0.132
Neg logit contributions
 wouldn-0.127
 may-0.122
suit-0.101
 doesn-0.101
 don-0.100
Token them
Token ID606
Feature activation+3.85e-02

Pos logit contributions
amer+0.078
 priest+0.061
eper+0.061
old+0.058
 student+0.057
Neg logit contributions
 wouldn-0.098
 may-0.096
 don-0.091
 doesn-0.087
rated-0.087
Token about
Token ID546
Feature activation+4.96e-02

Pos logit contributions
amer+0.073
eper+0.057
 priest+0.055
old+0.054
ena+0.053
Neg logit contributions
 may-0.072
 wouldn-0.071
 don-0.064
 doesn-0.064
idents-0.058
Token rivers
Token ID18180
Feature activation+1.28e-02

Pos logit contributions
amer+0.078
old+0.068
eper+0.058
v+0.056
ena+0.056
Neg logit contributions
 may-0.103
 wouldn-0.101
 don-0.093
 doesn-0.090
 steep-0.088
Token.
Token ID13
Feature activation+3.61e-02

Pos logit contributions
amer+0.024
old+0.023
ena+0.021
 student+0.021
od+0.021
Neg logit contributions
 may-0.020
 wouldn-0.020
 spine-0.019
suit-0.018
rated-0.018
Token boring
Token ID14262
Feature activation+1.86e-02

Pos logit contributions
amer+0.051
old+0.049
 office+0.042
ena+0.042
od+0.042
Neg logit contributions
 may-0.045
 wouldn-0.045
 steep-0.040
 don-0.039
 doesn-0.039
Token.
Token ID13
Feature activation+3.37e-03

Pos logit contributions
amer+0.032
old+0.031
ena+0.027
 priest+0.027
od+0.027
Neg logit contributions
 may-0.033
 wouldn-0.032
 doesn-0.029
suit-0.029
 spine-0.029
Token\n
Token ID198
Feature activation+1.36e-02

Pos logit contributions
amer+0.006
old+0.006
od+0.005
 office+0.005
ena+0.005
Neg logit contributions
 may-0.006
 wouldn-0.005
 spine-0.005
suit-0.005
 steep-0.005
Token\n
Token ID198
Feature activation+1.09e-02

Pos logit contributions
amer+0.025
old+0.022
 student+0.021
od+0.021
 singers+0.021
Neg logit contributions
 may-0.023
 steep-0.021
 wouldn-0.021
 don-0.020
 spine-0.019
TokenThe
Token ID464
Feature activation+2.97e-02

Pos logit contributions
amer+0.018
od+0.017
old+0.016
 singers+0.016
 invitations+0.016
Neg logit contributions
 may-0.019
 steep-0.018
 don-0.017
 wouldn-0.017
suit-0.017
Token teacher
Token ID4701
Feature activation+7.99e-02

Pos logit contributions
amer+0.048
old+0.047
od+0.041
ena+0.040
ful+0.039
Neg logit contributions
 may-0.056
 wouldn-0.054
 spine-0.053
rated-0.051
suit-0.051
Token collects
Token ID26609
Feature activation+3.18e-02

Pos logit contributions
amer+0.163
ful+0.144
old+0.142
od+0.135
Kat+0.133
Neg logit contributions
 may-0.122
 wouldn-0.117
 doesn-0.107
 don-0.102
suit-0.094
Token the
Token ID262
Feature activation+3.21e-02

Pos logit contributions
amer+0.049
old+0.047
ena+0.040
od+0.040
 office+0.040
Neg logit contributions
 may-0.063
 wouldn-0.061
suit-0.057
 spine-0.057
 steep-0.056
Token papers
Token ID9473
Feature activation+5.81e-02

Pos logit contributions
amer+0.045
old+0.043
ena+0.038
od+0.037
 student+0.036
Neg logit contributions
 may-0.067
 wouldn-0.066
 spine-0.063
rated-0.062
suit-0.062
Token and
Token ID290
Feature activation-6.31e-04

Pos logit contributions
amer+0.115
old+0.113
Kat+0.100
daughter+0.100
ena+0.099
Neg logit contributions
 wouldn-0.086
 may-0.084
 spine-0.080
suit-0.075
 steep-0.074
Token looks
Token ID3073
Feature activation-1.94e-04

Pos logit contributions
 may+0.001
 wouldn+0.001
 don+0.001
 doesn+0.001
 steep+0.001
Neg logit contributions
amer-0.001
old-0.001
od-0.001
ful-0.001
ena-0.001
Token day
Token ID1110
Feature activation-2.35e-03

Pos logit contributions
amer+0.067
old+0.061
od+0.059
eper+0.058
ena+0.057
Neg logit contributions
 may-0.043
 wouldn-0.041
 steep-0.039
rated-0.038
 spine-0.038
Token,
Token ID11
Feature activation-5.43e-03

Pos logit contributions
 may+0.004
 wouldn+0.004
rated+0.004
 spine+0.004
suit+0.003
Neg logit contributions
amer-0.004
old-0.004
ena-0.004
od-0.004
 student-0.004
Token the
Token ID262
Feature activation+3.49e-02

Pos logit contributions
 may+0.010
 wouldn+0.010
 spine+0.010
rated+0.010
suit+0.010
Neg logit contributions
amer-0.008
old-0.008
ena-0.007
od-0.007
 student-0.007
Token three
Token ID1115
Feature activation+6.49e-02

Pos logit contributions
amer+0.050
old+0.049
ena+0.043
od+0.042
ful+0.040
Neg logit contributions
 may-0.071
 wouldn-0.070
 spine-0.068
rated-0.066
suit-0.066
Token university
Token ID6403
Feature activation+7.04e-02

Pos logit contributions
amer+0.131
reating+0.116
ful+0.114
 Friendship+0.112
old+0.109
Neg logit contributions
 may-0.088
 wouldn-0.083
fy-0.081
rated-0.080
suit-0.079
Token students
Token ID2444
Feature activation+7.98e-02

Pos logit contributions
amer+0.149
old+0.136
ful+0.129
od+0.128
ena+0.126
Neg logit contributions
 wouldn-0.106
 may-0.100
suit-0.085
rated-0.084
 don-0.082
Token decided
Token ID3066
Feature activation+2.95e-02

Pos logit contributions
amer+0.164
old+0.146
ena+0.138
ful+0.137
 office+0.135
Neg logit contributions
 wouldn-0.123
 may-0.116
 don-0.099
rated-0.098
suit-0.096
Token to
Token ID284
Feature activation+5.28e-02

Pos logit contributions
amer+0.058
old+0.052
ena+0.049
od+0.048
 student+0.048
Neg logit contributions
 wouldn-0.049
 may-0.048
 steep-0.042
 don-0.042
 spine-0.041
Token take
Token ID1011
Feature activation+9.06e-03

Pos logit contributions
amer+0.114
old+0.107
ena+0.100
od+0.099
 twins+0.098
Neg logit contributions
 may-0.073
 wouldn-0.069
 don-0.065
rated-0.059
 steep-0.056
Token the
Token ID262
Feature activation+3.48e-02

Pos logit contributions
amer+0.014
old+0.013
ena+0.012
od+0.011
ful+0.011
Neg logit contributions
 may-0.017
 wouldn-0.017
suit-0.016
rated-0.016
 spine-0.016
Token youngest
Token ID18887
Feature activation+9.09e-02

Pos logit contributions
amer+0.055
old+0.052
ena+0.047
od+0.046
ful+0.045
Neg logit contributions
 may-0.067
 wouldn-0.066
 spine-0.063
rated-0.062
suit-0.061
Token number
Token ID1271
Feature activation+7.12e-02

Pos logit contributions
amer+0.054
old+0.051
ena+0.046
od+0.045
ful+0.044
Neg logit contributions
 may-0.075
 wouldn-0.074
 spine-0.071
rated-0.070
suit-0.069
Token.
Token ID13
Feature activation+1.82e-02

Pos logit contributions
amer+0.119
old+0.112
Kat+0.108
 twins+0.101
Her+0.100
Neg logit contributions
 may-0.126
 wouldn-0.126
 steep-0.103
 doesn-0.102
 don-0.102
Token Finally
Token ID9461
Feature activation+5.29e-03

Pos logit contributions
amer+0.033
old+0.030
ena+0.028
eper+0.028
 priest+0.028
Neg logit contributions
 may-0.031
 wouldn-0.029
 spine-0.028
suit-0.027
�-0.026
Token,
Token ID11
Feature activation+1.04e-02

Pos logit contributions
amer+0.011
old+0.011
ena+0.010
od+0.010
 office+0.010
Neg logit contributions
 may-0.007
 wouldn-0.007
 spine-0.006
rated-0.006
suit-0.006
Token the
Token ID262
Feature activation+4.18e-02

Pos logit contributions
amer+0.017
old+0.016
ena+0.014
od+0.014
 office+0.014
Neg logit contributions
 wouldn-0.020
 may-0.020
 spine-0.018
rated-0.018
 steep-0.018
Token mystery
Token ID10715
Feature activation+7.94e-02

Pos logit contributions
amer+0.066
old+0.062
ena+0.055
od+0.055
ful+0.055
Neg logit contributions
 may-0.080
 wouldn-0.079
 spine-0.076
rated-0.074
suit-0.073
Token was
Token ID373
Feature activation+3.02e-02

Pos logit contributions
amer+0.141
old+0.131
ful+0.120
ena+0.119
Her+0.119
Neg logit contributions
 wouldn-0.143
 may-0.141
 doesn-0.116
suit-0.115
 steep-0.115
Token solved
Token ID16019
Feature activation+3.59e-02

Pos logit contributions
amer+0.054
old+0.052
ena+0.047
od+0.046
 student+0.046
Neg logit contributions
 wouldn-0.052
 may-0.051
rated-0.047
 spine-0.047
 steep-0.047
Token!
Token ID0
Feature activation+2.65e-02

Pos logit contributions
amer+0.062
old+0.060
ena+0.054
od+0.053
 office+0.053
Neg logit contributions
 may-0.062
 wouldn-0.062
 spine-0.058
rated-0.058
suit-0.057
Token "\n"
Token ID198
Feature activation+2.38e-02

Pos logit contributions
amer+0.052
old+0.044
eper+0.044
 priest+0.043
ena+0.043
Neg logit contributions
 may-0.042
 wouldn-0.038
 spine-0.037
�-0.037
suit-0.036
Token "\n"
Token ID198
Feature activation+2.17e-02

Pos logit contributions
amer+0.052
 office+0.045
 student+0.045
 singers+0.044
ena+0.044
Neg logit contributions
 may-0.032
 don-0.027
 wouldn-0.026
 steep-0.026
�-0.026
Token pass
Token ID1208
Feature activation+2.61e-02

Pos logit contributions
amer+0.087
old+0.084
od+0.076
ena+0.076
 student+0.074
Neg logit contributions
 may-0.069
 wouldn-0.069
 spine-0.064
rated-0.062
suit-0.061
Token their
Token ID511
Feature activation+2.76e-02

Pos logit contributions
amer+0.038
old+0.036
od+0.032
ena+0.031
 office+0.030
Neg logit contributions
 wouldn-0.052
 may-0.052
suit-0.049
 spine-0.048
rated-0.048
Token folders
Token ID24512
Feature activation+4.12e-02

Pos logit contributions
old+0.035
amer+0.035
 student+0.034
 office+0.033
ena+0.032
Neg logit contributions
 wouldn-0.057
 may-0.057
rated-0.056
 spine-0.054
 steep-0.053
Token to
Token ID284
Feature activation+3.87e-02

Pos logit contributions
amer+0.069
old+0.067
ena+0.061
od+0.060
 office+0.059
Neg logit contributions
 may-0.073
 wouldn-0.072
 spine-0.068
rated-0.067
suit-0.066
Token the
Token ID262
Feature activation+3.76e-02

Pos logit contributions
amer+0.068
old+0.062
od+0.055
eper+0.054
ena+0.051
Neg logit contributions
 may-0.067
 don-0.065
 wouldn-0.065
 spine-0.060
 steep-0.059
Token front
Token ID2166
Feature activation+7.78e-02

Pos logit contributions
amer+0.056
old+0.054
od+0.047
ena+0.046
ful+0.045
Neg logit contributions
 may-0.075
 wouldn-0.073
 spine-0.071
suit-0.070
 steep-0.069
Token of
Token ID286
Feature activation+3.92e-02

Pos logit contributions
amer+0.124
old+0.109
od+0.104
eper+0.099
ful+0.097
Neg logit contributions
 may-0.148
 spine-0.142
 wouldn-0.142
suit-0.135
 steep-0.134
Token the
Token ID262
Feature activation+4.57e-02

Pos logit contributions
amer+0.066
old+0.062
od+0.054
ena+0.051
eper+0.051
Neg logit contributions
 may-0.070
 wouldn-0.068
 don-0.065
 spine-0.063
 steep-0.063
Token class
Token ID1398
Feature activation+6.28e-02

Pos logit contributions
amer+0.063
old+0.060
ena+0.052
od+0.052
ful+0.050
Neg logit contributions
 may-0.096
 wouldn-0.094
 spine-0.091
suit-0.089
rated-0.089
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
amer+0.114
old+0.100
od+0.092
eper+0.090
ful+0.089
Neg logit contributions
 may-0.111
 wouldn-0.110
suit-0.099
 spine-0.098
 don-0.096
Token She
Token ID1375
Feature activation+2.50e-02

Pos logit contributions
amer+0.020
old+0.019
ena+0.017
od+0.017
 office+0.016
Neg logit contributions
 may-0.022
 wouldn-0.021
 spine-0.020
rated-0.020
suit-0.020
Token teacher
Token ID4701
Feature activation+5.34e-02

Pos logit contributions
old+0.107
amer+0.103
ful+0.096
od+0.089
ena+0.087
Neg logit contributions
 may-0.141
suit-0.129
 spine-0.128
 don-0.128
 steep-0.127
Token many
Token ID867
Feature activation+4.40e-02

Pos logit contributions
amer+0.107
old+0.094
ena+0.087
od+0.086
Kat+0.086
Neg logit contributions
 may-0.081
 wouldn-0.078
 don-0.071
suit-0.069
 doesn-0.068
Token questions
Token ID2683
Feature activation+4.66e-02

Pos logit contributions
amer+0.069
old+0.063
ena+0.058
ful+0.058
 office+0.057
Neg logit contributions
 may-0.085
 wouldn-0.077
 don-0.074
 steep-0.074
 doesn-0.072
Token.
Token ID13
Feature activation+1.79e-02

Pos logit contributions
amer+0.090
old+0.086
Kat+0.076
od+0.076
 office+0.075
Neg logit contributions
 may-0.072
 wouldn-0.069
 don-0.062
 spine-0.062
 steep-0.061
Token The
Token ID383
Feature activation+7.53e-02

Pos logit contributions
amer+0.035
old+0.033
od+0.031
ful+0.030
eper+0.029
Neg logit contributions
 may-0.027
 wouldn-0.026
 spine-0.026
suit-0.025
 steep-0.024
Token teacher
Token ID4701
Feature activation+7.77e-02

Pos logit contributions
amer+0.116
old+0.113
ful+0.105
v+0.099
od+0.099
Neg logit contributions
 may-0.138
suit-0.128
 spine-0.127
 wouldn-0.125
 don-0.124
Token is
Token ID318
Feature activation+5.60e-02

Pos logit contributions
amer+0.133
old+0.119
ful+0.115
od+0.108
ena+0.107
Neg logit contributions
 may-0.145
 wouldn-0.139
 doesn-0.125
 don-0.124
suit-0.120
Token happy
Token ID3772
Feature activation+3.47e-02

Pos logit contributions
amer+0.109
old+0.104
rolley+0.098
od+0.096
armac+0.095
Neg logit contributions
 wouldn-0.075
 steep-0.074
 don-0.071
 low-0.068
 may-0.066
Token and
Token ID290
Feature activation+1.88e-02

Pos logit contributions
amer+0.073
old+0.070
od+0.062
ena+0.061
Kat+0.061
Neg logit contributions
 may-0.050
 wouldn-0.049
 don-0.045
 doesn-0.045
 spine-0.041
Token answers
Token ID7429
Feature activation+3.21e-02

Pos logit contributions
amer+0.034
old+0.031
od+0.029
ena+0.029
ful+0.029
Neg logit contributions
 may-0.033
 wouldn-0.032
 don-0.029
 steep-0.029
 doesn-0.028
Token them
Token ID606
Feature activation+2.38e-02

Pos logit contributions
amer+0.056
old+0.053
 office+0.047
od+0.046
ena+0.046
Neg logit contributions
 may-0.058
 wouldn-0.056
 don-0.050
suit-0.048
 doesn-0.048
Token to
Token ID284
Feature activation+2.64e-02

Pos logit contributions
amer+0.056
old+0.050
eper+0.046
 twins+0.045
ena+0.045
Neg logit contributions
 may-0.044
 wouldn-0.043
suit-0.042
 steep-0.040
fy-0.039
Token the
Token ID262
Feature activation+5.69e-02

Pos logit contributions
amer+0.046
old+0.043
od+0.038
 office+0.036
eper+0.036
Neg logit contributions
 may-0.046
 wouldn-0.046
 don-0.043
 spine-0.042
 steep-0.041
Token teacher
Token ID4701
Feature activation+5.60e-02

Pos logit contributions
amer+0.078
old+0.074
ful+0.066
od+0.063
Kat+0.058
Neg logit contributions
 may-0.115
 wouldn-0.112
 spine-0.111
suit-0.107
 don-0.106
Token.
Token ID13
Feature activation+7.03e-03

Pos logit contributions
amer+0.120
Kat+0.101
armac+0.100
eper+0.097
old+0.097
Neg logit contributions
 may-0.072
 wouldn-0.071
 don-0.066
 doesn-0.064
suit-0.063
Token The
Token ID383
Feature activation+6.16e-02

Pos logit contributions
amer+0.014
old+0.013
od+0.012
 office+0.012
eper+0.012
Neg logit contributions
 may-0.011
 wouldn-0.010
suit-0.010
 steep-0.010
 don-0.009
Token teacher
Token ID4701
Feature activation+7.74e-02

Pos logit contributions
amer+0.090
old+0.080
ful+0.073
od+0.072
eper+0.065
Neg logit contributions
 may-0.124
 wouldn-0.120
 spine-0.118
suit-0.115
 don-0.111
Token checks
Token ID8794
Feature activation+2.04e-03

Pos logit contributions
amer+0.144
old+0.127
ful+0.126
od+0.117
Her+0.115
Neg logit contributions
 may-0.135
 wouldn-0.129
 doesn-0.117
 don-0.114
suit-0.107
Token the
Token ID262
Feature activation+4.17e-02

Pos logit contributions
amer+0.003
old+0.003
ena+0.002
 office+0.002
od+0.002
Neg logit contributions
 may-0.004
 wouldn-0.004
 spine-0.004
rated-0.004
suit-0.004
Token quiz
Token ID38964
Feature activation+4.61e-02

Pos logit contributions
amer+0.068
old+0.066
ena+0.059
od+0.058
ful+0.056
Neg logit contributions
 may-0.077
 wouldn-0.075
 spine-0.072
rated-0.070
suit-0.070
Token.
Token ID13
Feature activation+8.43e-03

Pos logit contributions
amer+0.078
old+0.073
od+0.065
ena+0.064
ful+0.062
Neg logit contributions
 may-0.085
 wouldn-0.084
suit-0.076
 spine-0.076
 steep-0.075
Token She
Token ID1375
Feature activation+2.81e-02

Pos logit contributions
amer+0.015
old+0.014
od+0.013
ena+0.013
 office+0.013
Neg logit contributions
 may-0.015
 wouldn-0.014
 spine-0.013
suit-0.013
 steep-0.013
Token his
Token ID465
Feature activation+2.53e-02

Pos logit contributions
amer+0.055
old+0.046
od+0.045
 student+0.044
anny+0.044
Neg logit contributions
 wouldn-0.047
 may-0.046
 steep-0.045
 don-0.040
 Nut-0.039
Token teacher
Token ID4701
Feature activation+4.67e-02

Pos logit contributions
amer+0.032
old+0.030
ena+0.026
od+0.026
ful+0.025
Neg logit contributions
 may-0.056
 wouldn-0.056
 spine-0.054
suit-0.053
rated-0.052
Token standing
Token ID5055
Feature activation+1.90e-02

Pos logit contributions
amer+0.093
old+0.079
 priest+0.074
od+0.074
Kat+0.073
Neg logit contributions
 wouldn-0.071
 may-0.070
 don-0.063
 spine-0.063
suit-0.061
Token at
Token ID379
Feature activation+3.18e-02

Pos logit contributions
amer+0.032
old+0.031
ena+0.028
od+0.028
 student+0.027
Neg logit contributions
 may-0.034
 wouldn-0.034
 spine-0.032
rated-0.031
suit-0.031
Token the
Token ID262
Feature activation+4.05e-02

Pos logit contributions
amer+0.060
old+0.059
od+0.054
Her+0.050
ful+0.049
Neg logit contributions
 spine-0.043
 don-0.043
 may-0.041
 wouldn-0.041
icol-0.039
Token front
Token ID2166
Feature activation+7.74e-02

Pos logit contributions
amer+0.067
old+0.065
od+0.057
ful+0.055
eper+0.054
Neg logit contributions
 spine-0.072
 may-0.071
suit-0.067
 steep-0.067
 wouldn-0.067
Token of
Token ID286
Feature activation+3.22e-02

Pos logit contributions
amer+0.133
old+0.115
od+0.113
eper+0.110
ful+0.104
Neg logit contributions
 spine-0.137
 may-0.130
 steep-0.124
 wouldn-0.122
suit-0.122
Token the
Token ID262
Feature activation+4.57e-02

Pos logit contributions
amer+0.055
old+0.053
od+0.046
ena+0.045
eper+0.044
Neg logit contributions
 may-0.050
 wouldn-0.048
 don-0.048
 spine-0.048
 steep-0.046
Token room
Token ID2119
Feature activation+5.09e-02

Pos logit contributions
amer+0.066
old+0.064
od+0.055
ful+0.054
ena+0.052
Neg logit contributions
 may-0.090
 spine-0.090
 wouldn-0.087
suit-0.086
 steep-0.084
Token.
Token ID13
Feature activation-1.40e-03

Pos logit contributions
amer+0.097
old+0.093
od+0.083
eper+0.081
Kat+0.081
Neg logit contributions
 wouldn-0.079
 spine-0.077
 may-0.076
suit-0.071
 steep-0.070
Token She
Token ID1375
Feature activation+1.60e-02

Pos logit contributions
 may+0.002
 wouldn+0.002
 spine+0.002
suit+0.002
rated+0.002
Neg logit contributions
amer-0.002
old-0.002
ena-0.002
od-0.002
 student-0.002
Token had
Token ID550
Feature activation+2.99e-02

Pos logit contributions
amer+0.046
old+0.043
od+0.041
ena+0.040
 student+0.040
Neg logit contributions
 wouldn-0.035
 may-0.035
rated-0.030
 don-0.029
 spine-0.029
Token ordered
Token ID6149
Feature activation+4.00e-02

Pos logit contributions
amer+0.055
old+0.052
ena+0.047
od+0.047
 student+0.046
Neg logit contributions
 wouldn-0.051
 may-0.051
 spine-0.046
rated-0.045
suit-0.045
Token them
Token ID606
Feature activation+2.00e-02

Pos logit contributions
amer+0.050
old+0.046
ena+0.040
od+0.039
 student+0.039
Neg logit contributions
 may-0.090
 wouldn-0.089
 spine-0.084
rated-0.083
suit-0.083
Token and
Token ID290
Feature activation+6.22e-03

Pos logit contributions
amer+0.035
old+0.034
ena+0.031
od+0.030
 student+0.030
Neg logit contributions
 may-0.034
 wouldn-0.034
 spine-0.032
rated-0.031
suit-0.031
Token the
Token ID262
Feature activation+3.57e-02

Pos logit contributions
amer+0.011
old+0.010
od+0.010
ena+0.010
 student+0.009
Neg logit contributions
 wouldn-0.011
 may-0.011
 don-0.010
 doesn-0.009
 steep-0.009
Token delivery
Token ID7585
Feature activation+7.70e-02

Pos logit contributions
amer+0.055
old+0.052
ena+0.047
od+0.046
 student+0.045
Neg logit contributions
 may-0.070
 wouldn-0.069
 spine-0.066
rated-0.065
suit-0.064
Token man
Token ID582
Feature activation+3.93e-02

Pos logit contributions
amer+0.129
old+0.115
od+0.114
Her+0.112
Jul+0.110
Neg logit contributions
 wouldn-0.146
 may-0.139
suit-0.119
 doesn-0.119
 don-0.116
Token had
Token ID550
Feature activation+3.88e-02

Pos logit contributions
amer+0.078
old+0.070
od+0.067
ena+0.066
 student+0.066
Neg logit contributions
 wouldn-0.064
 may-0.062
suit-0.052
 don-0.052
 doesn-0.051
Token brought
Token ID3181
Feature activation+2.42e-02

Pos logit contributions
amer+0.068
old+0.063
od+0.057
ena+0.057
 student+0.056
Neg logit contributions
 wouldn-0.070
 may-0.069
 spine-0.061
 steep-0.061
suit-0.061
Token them
Token ID606
Feature activation+6.21e-03

Pos logit contributions
amer+0.035
old+0.032
ena+0.029
od+0.028
 student+0.028
Neg logit contributions
 may-0.049
 wouldn-0.049
 spine-0.046
rated-0.046
suit-0.046
Token to
Token ID284
Feature activation+3.07e-02

Pos logit contributions
amer+0.010
old+0.010
ena+0.009
od+0.009
 office+0.009
Neg logit contributions
 may-0.011
 wouldn-0.011
 spine-0.011
rated-0.011
suit-0.010
Token,
Token ID11
Feature activation+9.42e-05

Pos logit contributions
amer+0.036
old+0.035
ena+0.033
od+0.032
 student+0.032
Neg logit contributions
 may-0.032
 wouldn-0.032
 spine-0.030
rated-0.030
suit-0.029
Token she
Token ID673
Feature activation+1.96e-02

Pos logit contributions
amer+0.000
old+0.000
ena+0.000
od+0.000
 office+0.000
Neg logit contributions
 may-0.000
 wouldn-0.000
 spine-0.000
rated-0.000
suit-0.000
Token goes
Token ID2925
Feature activation+1.40e-02

Pos logit contributions
old+0.033
amer+0.033
ena+0.030
 office+0.029
od+0.029
Neg logit contributions
 spine-0.034
rated-0.033
 wouldn-0.033
 may-0.032
suit-0.032
Token to
Token ID284
Feature activation+4.20e-02

Pos logit contributions
amer+0.024
old+0.023
ena+0.021
od+0.021
 office+0.021
Neg logit contributions
 may-0.025
 wouldn-0.025
 spine-0.023
rated-0.023
suit-0.023
Token the
Token ID262
Feature activation+3.41e-02

Pos logit contributions
amer+0.069
old+0.063
od+0.055
eper+0.054
ena+0.054
Neg logit contributions
 may-0.078
 spine-0.075
 wouldn-0.075
 don-0.073
 steep-0.070
Token front
Token ID2166
Feature activation+7.68e-02

Pos logit contributions
amer+0.054
old+0.051
od+0.046
ful+0.043
ena+0.043
Neg logit contributions
 may-0.065
 wouldn-0.062
suit-0.062
 spine-0.062
 steep-0.060
Token of
Token ID286
Feature activation+4.28e-02

Pos logit contributions
amer+0.130
od+0.112
eper+0.107
old+0.105
ful+0.100
Neg logit contributions
 spine-0.134
 may-0.127
 steep-0.124
suit-0.122
fy-0.117
Token the
Token ID262
Feature activation+4.59e-02

Pos logit contributions
amer+0.074
old+0.070
od+0.062
eper+0.058
v+0.057
Neg logit contributions
 may-0.069
 don-0.067
 spine-0.066
 wouldn-0.066
 steep-0.063
Token room
Token ID2119
Feature activation+6.16e-02

Pos logit contributions
amer+0.064
old+0.060
od+0.052
ena+0.050
ful+0.050
Neg logit contributions
 may-0.097
 wouldn-0.093
 spine-0.092
suit-0.091
 steep-0.089
Token.
Token ID13
Feature activation+1.43e-02

Pos logit contributions
amer+0.121
old+0.111
od+0.104
eper+0.100
daughter+0.098
Neg logit contributions
 wouldn-0.093
 may-0.093
 spine-0.090
suit-0.086
 steep-0.083
Token She
Token ID1375
Feature activation+1.79e-02

Pos logit contributions
amer+0.018
old+0.017
ena+0.015
od+0.015
 office+0.014
Neg logit contributions
 may-0.032
 wouldn-0.031
 spine-0.030
rated-0.030
suit-0.029
Token and
Token ID290
Feature activation+1.18e-02

Pos logit contributions
amer+0.038
old+0.034
ena+0.032
 priest+0.031
Kat+0.031
Neg logit contributions
 may-0.044
 wouldn-0.043
 steep-0.042
 spine-0.041
fy-0.040
Token then
Token ID788
Feature activation+2.60e-02

Pos logit contributions
amer+0.021
old+0.019
 office+0.017
od+0.017
ena+0.017
Neg logit contributions
 wouldn-0.022
 may-0.021
 don-0.019
 steep-0.018
 spine-0.018
Token saying
Token ID2282
Feature activation+2.71e-02

Pos logit contributions
amer+0.050
old+0.046
ena+0.042
 twins+0.041
 office+0.041
Neg logit contributions
 wouldn-0.044
 may-0.043
 steep-0.037
 don-0.036
rated-0.036
Token the
Token ID262
Feature activation+3.59e-02

Pos logit contributions
amer+0.057
old+0.053
ena+0.049
 office+0.047
 student+0.047
Neg logit contributions
 may-0.040
 wouldn-0.039
 don-0.034
 steep-0.034
suit-0.033
Token biggest
Token ID4094
Feature activation+4.01e-02

Pos logit contributions
amer+0.060
old+0.058
ena+0.051
ful+0.050
od+0.049
Neg logit contributions
 may-0.067
 wouldn-0.065
 spine-0.061
 steep-0.060
rated-0.059
Token number
Token ID1271
Feature activation+7.67e-02

Pos logit contributions
amer+0.074
old+0.069
ena+0.063
ful+0.060
eper+0.059
Neg logit contributions
 may-0.071
 wouldn-0.068
 spine-0.064
 steep-0.062
rated-0.060
Token of
Token ID286
Feature activation+4.29e-02

Pos logit contributions
amer+0.128
old+0.120
Kat+0.113
 twins+0.108
armac+0.104
Neg logit contributions
 wouldn-0.145
 may-0.140
suit-0.117
 might-0.115
 doesn-0.114
Token all
Token ID477
Feature activation+2.24e-02

Pos logit contributions
amer+0.077
old+0.067
ful+0.060
ena+0.060
eper+0.058
Neg logit contributions
 may-0.077
 wouldn-0.072
 steep-0.069
 don-0.068
 spine-0.064
Token:
Token ID25
Feature activation+3.77e-02

Pos logit contributions
amer+0.041
old+0.036
 priest+0.035
ena+0.034
eper+0.034
Neg logit contributions
 may-0.039
 wouldn-0.039
rated-0.035
 steep-0.035
 doesn-0.034
Token zero
Token ID6632
Feature activation+3.56e-02

Pos logit contributions
amer+0.061
old+0.056
ena+0.048
 office+0.047
 student+0.047
Neg logit contributions
 may-0.074
 wouldn-0.072
 steep-0.066
 don-0.065
 doesn-0.062
Token!
Token ID0
Feature activation+2.53e-02

Pos logit contributions
amer+0.064
old+0.061
ena+0.055
 office+0.054
 student+0.053
Neg logit contributions
 may-0.062
 wouldn-0.061
suit-0.053
 steep-0.053
rated-0.053
Token worm
Token ID23849
Feature activation-1.16e-02

Pos logit contributions
 wouldn+0.011
 may+0.011
icol+0.011
rated+0.011
fy+0.010
Neg logit contributions
amer-0.009
old-0.009
 twins-0.008
 office-0.008
ful-0.008
Token.
Token ID13
Feature activation-1.09e-03

Pos logit contributions
 may+0.022
 wouldn+0.022
 spine+0.021
suit+0.020
rated+0.020
Neg logit contributions
amer-0.019
old-0.018
ena-0.016
 student-0.016
od-0.016
Token She
Token ID1375
Feature activation-1.33e-02

Pos logit contributions
rated+0.003
 wouldn+0.003
 may+0.003
 spine+0.003
fy+0.003
Neg logit contributions
old-0.001
amer-0.001
ena-0.001
Her-0.001
od-0.001
Token felt
Token ID2936
Feature activation-8.64e-03

Pos logit contributions
 spine+0.023
rated+0.022
 may+0.021
 steep+0.020
icol+0.020
Neg logit contributions
ena-0.023
old-0.023
amer-0.023
eper-0.020
od-0.020
Token it
Token ID340
Feature activation-3.00e-02

Pos logit contributions
 may+0.018
 wouldn+0.018
 spine+0.017
rated+0.017
suit+0.016
Neg logit contributions
amer-0.012
old-0.012
ena-0.010
od-0.010
 student-0.010
Token wr
Token ID1319
Feature activation-9.84e-02

Pos logit contributions
icol+0.054
 may+0.052
 spine+0.050
rated+0.049
 wouldn+0.049
Neg logit contributions
amer-0.046
ful-0.045
old-0.043
ena-0.042
od-0.041
Tokeniggle
Token ID24082
Feature activation-2.59e-02

Pos logit contributions
 spine+0.183
 may+0.181
 wouldn+0.174
rated+0.167
icol+0.167
Neg logit contributions
ful-0.138
old-0.137
od-0.136
amer-0.133
ily-0.131
Token and
Token ID290
Feature activation-4.02e-02

Pos logit contributions
 may+0.049
 wouldn+0.048
 spine+0.047
rated+0.046
icol+0.045
Neg logit contributions
amer-0.040
old-0.038
ena-0.036
ful-0.035
od-0.034
Token squ
Token ID2809
Feature activation-6.77e-02

Pos logit contributions
rated+0.065
 may+0.063
 spine+0.061
fy+0.059
suit+0.059
Neg logit contributions
ful-0.065
amer-0.064
ena-0.063
old-0.062
 twins-0.061
Tokenirm
Token ID2533
Feature activation-4.12e-02

Pos logit contributions
 may+0.109
 spine+0.104
 wouldn+0.099
rated+0.093
 steep+0.092
Neg logit contributions
old-0.132
amer-0.121
ena-0.116
v-0.115
ful-0.114
Token in
Token ID287
Feature activation-6.18e-03

Pos logit contributions
 may+0.090
 spine+0.087
 wouldn+0.086
icol+0.084
rated+0.084
Neg logit contributions
amer-0.052
old-0.048
ena-0.046
ful-0.045
od-0.044
Token.
Token ID13
Feature activation-1.43e-02

Pos logit contributions
 may+0.032
 wouldn+0.031
 spine+0.030
rated+0.030
suit+0.029
Neg logit contributions
amer-0.020
old-0.019
ena-0.017
od-0.017
 office-0.017
Token The
Token ID383
Feature activation+9.23e-03

Pos logit contributions
 wouldn+0.028
 may+0.028
rated+0.028
 spine+0.027
suit+0.026
Neg logit contributions
old-0.020
amer-0.020
 office-0.018
ena-0.018
 twins-0.017
Token steak
Token ID26320
Feature activation-2.65e-03

Pos logit contributions
amer+0.012
old+0.012
 office+0.011
 twins+0.011
ena+0.011
Neg logit contributions
 wouldn-0.019
 may-0.019
 spine-0.019
rated-0.018
suit-0.018
Token tried
Token ID3088
Feature activation-1.27e-02

Pos logit contributions
 spine+0.005
icol+0.004
 may+0.004
rated+0.004
suit+0.004
Neg logit contributions
amer-0.004
old-0.004
ena-0.004
 office-0.004
od-0.004
Token to
Token ID284
Feature activation-5.86e-04

Pos logit contributions
rated+0.028
suit+0.028
 may+0.027
 spine+0.027
fy+0.027
Neg logit contributions
amer-0.014
ful-0.014
old-0.014
 office-0.013
 twins-0.013
Token wr
Token ID1319
Feature activation-9.65e-02

Pos logit contributions
 may+0.001
suit+0.001
 wouldn+0.001
icol+0.001
 spine+0.001
Neg logit contributions
amer-0.001
ful-0.001
old-0.001
ena-0.001
 office-0.001
Tokeniggle
Token ID24082
Feature activation-3.27e-02

Pos logit contributions
 spine+0.186
use+0.159
suit+0.158
rated+0.156
icol+0.152
Neg logit contributions
od-0.144
ful-0.143
old-0.142
ily-0.133
ate-0.128
Token away
Token ID1497
Feature activation-2.96e-02

Pos logit contributions
 spine+0.062
 may+0.060
rated+0.060
icol+0.059
 wouldn+0.057
Neg logit contributions
amer-0.047
ful-0.045
old-0.044
od-0.043
ena-0.042
Token.
Token ID13
Feature activation-1.49e-02

Pos logit contributions
 may+0.050
 wouldn+0.048
rated+0.047
icol+0.046
 spine+0.046
Neg logit contributions
amer-0.050
ful-0.047
old-0.046
od-0.045
ena-0.045
Token " "
Token ID220
Feature activation-2.46e-02

Pos logit contributions
rated+0.030
 wouldn+0.029
 may+0.028
 spine+0.028
suit+0.027
Neg logit contributions
old-0.021
amer-0.019
 office-0.018
ena-0.018
 twins-0.018
Token "\n"
Token ID198
Feature activation-3.51e-03

Pos logit contributions
 may+0.054
 wouldn+0.054
 spine+0.052
rated+0.052
suit+0.051
Neg logit contributions
amer-0.030
old-0.030
ena-0.026
od-0.025
 office-0.025
Token pet
Token ID4273
Feature activation-1.08e-02

Pos logit contributions
old+0.026
amer+0.026
 office+0.024
ena+0.023
 student+0.023
Neg logit contributions
 may-0.026
 wouldn-0.026
suit-0.025
 spine-0.025
rated-0.024
Token him
Token ID683
Feature activation-7.22e-03

Pos logit contributions
 may+0.023
 wouldn+0.023
 spine+0.023
rated+0.022
 steep+0.022
Neg logit contributions
amer-0.014
old-0.014
ena-0.012
od-0.011
 student-0.011
Token.
Token ID13
Feature activation-4.45e-03

Pos logit contributions
 may+0.015
 wouldn+0.015
 spine+0.014
rated+0.014
suit+0.014
Neg logit contributions
amer-0.010
old-0.010
ena-0.009
od-0.009
 office-0.008
Token George
Token ID4502
Feature activation+4.20e-03

Pos logit contributions
 may+0.009
 wouldn+0.008
 spine+0.008
rated+0.008
suit+0.008
Neg logit contributions
amer-0.007
old-0.007
ena-0.006
od-0.006
 office-0.006
Token felt
Token ID2936
Feature activation-2.50e-02

Pos logit contributions
amer+0.008
old+0.007
ena+0.007
od+0.006
 office+0.006
Neg logit contributions
 may-0.007
 wouldn-0.007
 spine-0.007
rated-0.006
suit-0.006
Token com
Token ID401
Feature activation-9.56e-02

Pos logit contributions
 may+0.047
 wouldn+0.046
 spine+0.046
rated+0.044
suit+0.044
Neg logit contributions
old-0.039
amer-0.039
ena-0.033
od-0.033
ful-0.033
Tokenfy
Token ID24928
Feature activation-3.97e-02

Pos logit contributions
 may+0.075
 wouldn+0.074
 steep+0.064
 footing+0.061
 spine+0.059
Neg logit contributions
old-0.256
amer-0.241
ful-0.228
v-0.227
year-0.223
Token and
Token ID290
Feature activation-4.01e-02

Pos logit contributions
 may+0.083
rated+0.081
 wouldn+0.076
 AW+0.075
suit+0.074
Neg logit contributions
ful-0.052
amer-0.052
 office-0.050
old-0.049
ena-0.046
Token started
Token ID2067
Feature activation+1.16e-02

Pos logit contributions
 spine+0.074
 may+0.072
suit+0.071
rated+0.070
 wouldn+0.068
Neg logit contributions
old-0.060
amer-0.056
 twins-0.053
ful-0.053
 office-0.052
Token to
Token ID284
Feature activation-2.39e-03

Pos logit contributions
amer+0.018
old+0.018
ena+0.016
 office+0.016
od+0.016
Neg logit contributions
 may-0.022
 wouldn-0.021
rated-0.021
 spine-0.021
suit-0.020
Token forget
Token ID6044
Feature activation+2.09e-02

Pos logit contributions
suit+0.004
 may+0.004
 spine+0.004
 wouldn+0.004
fy+0.004
Neg logit contributions
old-0.004
 office-0.004
Her-0.003
ful-0.003
amer-0.003
Token in
Token ID287
Feature activation+2.95e-03

Pos logit contributions
 may+0.055
 spine+0.052
 wouldn+0.052
rated+0.052
icol+0.051
Neg logit contributions
amer-0.031
ful-0.027
old-0.027
ena-0.027
 twins-0.027
Token bed
Token ID3996
Feature activation-1.78e-02

Pos logit contributions
amer+0.002
old+0.002
ena+0.002
 office+0.002
 twins+0.002
Neg logit contributions
 wouldn-0.008
 may-0.008
 spine-0.007
rated-0.007
suit-0.007
Token and
Token ID290
Feature activation-4.04e-02

Pos logit contributions
 may+0.033
rated+0.031
 wouldn+0.031
 steep+0.030
suit+0.029
Neg logit contributions
amer-0.028
old-0.026
 office-0.026
ena-0.026
od-0.025
Token felt
Token ID2936
Feature activation-1.76e-02

Pos logit contributions
 AW+0.071
 may+0.069
 spine+0.068
suit+0.068
�+0.067
Neg logit contributions
ena-0.056
ful-0.056
 her-0.054
 office-0.054
 living-0.053
Token so
Token ID523
Feature activation-3.49e-02

Pos logit contributions
 spine+0.031
 may+0.031
 wouldn+0.031
rated+0.030
suit+0.030
Neg logit contributions
old-0.028
amer-0.027
ful-0.025
 twins-0.024
 office-0.024
Token com
Token ID401
Feature activation-9.55e-02

Pos logit contributions
 spine+0.059
 wouldn+0.059
suit+0.058
icol+0.057
 may+0.057
Neg logit contributions
old-0.056
amer-0.052
ful-0.052
 twins-0.050
Her-0.049
Tokenfy
Token ID24928
Feature activation-4.66e-02

Pos logit contributions
 may+0.083
 wouldn+0.076
 footing+0.067
 steep+0.063
rated+0.062
Neg logit contributions
old-0.237
amer-0.223
ful-0.219
v-0.218
year-0.206
Token.
Token ID13
Feature activation+9.05e-03

Pos logit contributions
 may+0.105
 wouldn+0.100
rated+0.100
 spine+0.096
suit+0.096
Neg logit contributions
amer-0.055
old-0.051
ful-0.050
 office-0.047
ena-0.046
Token After
Token ID2293
Feature activation+1.58e-02

Pos logit contributions
old+0.011
amer+0.010
ena+0.010
 office+0.010
od+0.010
Neg logit contributions
 wouldn-0.020
 may-0.020
rated-0.019
suit-0.019
 spine-0.019
Token a
Token ID257
Feature activation-3.08e-02

Pos logit contributions
amer+0.023
old+0.022
ena+0.019
 student+0.019
od+0.019
Neg logit contributions
 may-0.033
 wouldn-0.033
 spine-0.031
 steep-0.029
rated-0.029
Token few
Token ID1178
Feature activation-1.17e-02

Pos logit contributions
 wouldn+0.048
rated+0.045
 may+0.044
 spine+0.042
 doesn+0.041
Neg logit contributions
 student-0.054
old-0.053
amer-0.051
 priest-0.050
Her-0.049
Token smiled
Token ID13541
Feature activation-1.23e-02

Pos logit contributions
 may+0.032
 wouldn+0.031
 spine+0.030
rated+0.029
suit+0.029
Neg logit contributions
amer-0.033
old-0.032
ena-0.030
od-0.029
 student-0.029
Token.
Token ID13
Feature activation+1.15e-02

Pos logit contributions
 wouldn+0.022
 may+0.022
 spine+0.021
rated+0.020
suit+0.020
Neg logit contributions
amer-0.021
old-0.021
ena-0.018
 office-0.018
 student-0.018
Token It
Token ID632
Feature activation-1.08e-02

Pos logit contributions
amer+0.018
old+0.017
ena+0.016
od+0.015
 student+0.015
Neg logit contributions
 may-0.022
 wouldn-0.022
 spine-0.021
rated-0.020
suit-0.020
Token felt
Token ID2936
Feature activation-2.17e-02

Pos logit contributions
 may+0.019
 wouldn+0.018
 spine+0.018
rated+0.017
suit+0.017
Neg logit contributions
amer-0.019
old-0.018
ena-0.017
od-0.016
 student-0.016
Token so
Token ID523
Feature activation-3.63e-02

Pos logit contributions
 spine+0.041
 may+0.040
 wouldn+0.040
rated+0.039
suit+0.038
Neg logit contributions
old-0.033
amer-0.032
ful-0.029
ena-0.028
 twins-0.028
Token com
Token ID401
Feature activation-9.54e-02

Pos logit contributions
 spine+0.073
 wouldn+0.071
 may+0.068
suit+0.067
rated+0.067
Neg logit contributions
old-0.052
amer-0.047
od-0.045
ful-0.045
 twins-0.044
Tokenfy
Token ID24928
Feature activation-4.35e-02

Pos logit contributions
 may+0.080
 wouldn+0.076
 footing+0.071
 steep+0.064
 spine+0.064
Neg logit contributions
old-0.236
amer-0.220
v-0.218
ful-0.214
year-0.207
Token and
Token ID290
Feature activation-3.68e-02

Pos logit contributions
 may+0.099
rated+0.096
 wouldn+0.096
 spine+0.093
suit+0.092
Neg logit contributions
amer-0.049
old-0.046
ful-0.043
ena-0.041
 office-0.041
Token warm
Token ID5814
Feature activation-4.25e-02

Pos logit contributions
 spine+0.065
 may+0.062
suit+0.060
rated+0.059
icol+0.058
Neg logit contributions
old-0.055
amer-0.055
ena-0.053
ful-0.052
 twins-0.051
Token.
Token ID13
Feature activation+3.65e-03

Pos logit contributions
 may+0.097
 wouldn+0.093
 spine+0.092
rated+0.091
 steep+0.089
Neg logit contributions
amer-0.048
old-0.044
ful-0.042
od-0.039
ena-0.039
Token
Token ID220
Feature activation-1.42e-02

Pos logit contributions
amer+0.006
old+0.006
ena+0.005
od+0.005
 office+0.005
Neg logit contributions
 may-0.007
 wouldn-0.007
 spine-0.007
rated-0.006
suit-0.006
Token favourite
Token ID12507
Feature activation-2.99e-02

Pos logit contributions
rated+0.055
 spine+0.051
 steep+0.050
icol+0.050
 wouldn+0.049
Neg logit contributions
 twins-0.041
ena-0.037
 office-0.037
 living-0.036
old-0.035
Token dress
Token ID6576
Feature activation-4.50e-03

Pos logit contributions
 spine+0.056
 wouldn+0.056
rated+0.055
 may+0.054
 steep+0.054
Neg logit contributions
old-0.041
amer-0.041
 twins-0.039
 office-0.038
ena-0.038
Token.
Token ID13
Feature activation+2.69e-02

Pos logit contributions
 may+0.009
 wouldn+0.008
rated+0.008
 spine+0.008
 steep+0.008
Neg logit contributions
amer-0.007
old-0.006
ena-0.006
 office-0.006
od-0.006
Token She
Token ID1375
Feature activation-1.04e-02

Pos logit contributions
amer+0.045
old+0.040
od+0.038
ena+0.037
eper+0.037
Neg logit contributions
 may-0.050
 wouldn-0.048
 spine-0.046
 steep-0.044
icol-0.044
Token felt
Token ID2936
Feature activation-1.86e-02

Pos logit contributions
 spine+0.018
 may+0.018
rated+0.017
 wouldn+0.017
suit+0.017
Neg logit contributions
old-0.017
amer-0.017
ena-0.016
 student-0.015
od-0.015
Token com
Token ID401
Feature activation-9.52e-02

Pos logit contributions
 may+0.037
 wouldn+0.036
 spine+0.036
rated+0.035
suit+0.034
Neg logit contributions
old-0.027
amer-0.027
ena-0.024
od-0.023
ful-0.023
Tokenfy
Token ID24928
Feature activation-3.68e-02

Pos logit contributions
 wouldn+0.077
 may+0.077
 steep+0.064
 footing+0.062
 spine+0.060
Neg logit contributions
old-0.251
amer-0.235
v-0.227
ful-0.222
year-0.220
Token and
Token ID290
Feature activation-3.83e-02

Pos logit contributions
 may+0.076
rated+0.075
 wouldn+0.070
 steep+0.069
idents+0.068
Neg logit contributions
amer-0.048
ful-0.048
 office-0.046
old-0.046
ena-0.045
Token content
Token ID2695
Feature activation-7.71e-03

Pos logit contributions
 spine+0.067
 may+0.064
suit+0.062
rated+0.061
icol+0.060
Neg logit contributions
old-0.060
amer-0.056
ful-0.055
ena-0.055
 twins-0.054
Token and
Token ID290
Feature activation-2.90e-02

Pos logit contributions
 may+0.015
 wouldn+0.015
rated+0.014
 spine+0.014
suit+0.014
Neg logit contributions
amer-0.012
old-0.011
ena-0.010
 office-0.010
od-0.010
Token loved
Token ID6151
Feature activation+1.69e-02

Pos logit contributions
 may+0.047
 spine+0.047
suit+0.046
 wouldn+0.045
rated+0.044
Neg logit contributions
old-0.048
amer-0.048
ena-0.044
ful-0.044
 office-0.043
Token made
Token ID925
Feature activation-1.18e-02

Pos logit contributions
 spine+0.044
 may+0.040
rated+0.040
icol+0.040
suit+0.039
Neg logit contributions
amer-0.028
old-0.027
ena-0.026
ful-0.024
 twins-0.024
Token Tim
Token ID5045
Feature activation-2.87e-02

Pos logit contributions
 may+0.018
 wouldn+0.017
rated+0.016
 spine+0.015
 steep+0.015
Neg logit contributions
amer-0.023
old-0.022
ena-0.021
 student-0.021
 office-0.021
Tokenmy
Token ID1820
Feature activation-3.89e-02

Pos logit contributions
 may+0.050
 wouldn+0.050
 spine+0.047
rated+0.047
suit+0.046
Neg logit contributions
amer-0.049
old-0.047
ena-0.043
od-0.043
 student-0.042
Token feel
Token ID1254
Feature activation-3.50e-02

Pos logit contributions
 may+0.051
 wouldn+0.050
 spine+0.046
rated+0.045
suit+0.044
Neg logit contributions
amer-0.085
old-0.082
ena-0.077
od-0.076
 student-0.075
Token so
Token ID523
Feature activation-4.04e-02

Pos logit contributions
 spine+0.066
 may+0.064
 wouldn+0.064
rated+0.063
suit+0.062
Neg logit contributions
old-0.053
amer-0.052
ful-0.047
 twins-0.046
ena-0.045
Token com
Token ID401
Feature activation-9.43e-02

Pos logit contributions
 wouldn+0.083
 spine+0.082
 may+0.081
suit+0.079
icol+0.079
Neg logit contributions
old-0.054
amer-0.052
 twins-0.045
ful-0.045
od-0.044
Tokenfy
Token ID24928
Feature activation-4.96e-02

Pos logit contributions
 may+0.078
 wouldn+0.076
 footing+0.072
 steep+0.066
 spine+0.063
Neg logit contributions
old-0.235
amer-0.222
v-0.220
ful-0.215
od-0.207
Token and
Token ID290
Feature activation-4.23e-02

Pos logit contributions
 may+0.108
rated+0.104
 wouldn+0.104
 spine+0.102
suit+0.100
Neg logit contributions
amer-0.062
old-0.058
ful-0.055
 office-0.053
ena-0.052
Token cozy
Token ID37438
Feature activation-5.08e-02

Pos logit contributions
 spine+0.077
 may+0.075
suit+0.074
rated+0.072
icol+0.072
Neg logit contributions
old-0.062
amer-0.062
 twins-0.056
ful-0.056
ena-0.055
Token.
Token ID13
Feature activation-1.83e-02

Pos logit contributions
 may+0.113
 wouldn+0.110
rated+0.105
 spine+0.104
suit+0.102
Neg logit contributions
amer-0.064
old-0.059
ful-0.053
 office-0.053
ena-0.053
Token\n
Token ID198
Feature activation-4.38e-03

Pos logit contributions
 may+0.039
 wouldn+0.039
 spine+0.038
rated+0.038
suit+0.037
Neg logit contributions
old-0.023
amer-0.023
ena-0.020
 office-0.020
od-0.019
Token started
Token ID2067
Feature activation+1.07e-02

Pos logit contributions
amer+0.067
old+0.065
ena+0.060
od+0.059
 office+0.059
Neg logit contributions
 may-0.053
 wouldn-0.052
 spine-0.052
rated-0.050
suit-0.049
Token to
Token ID284
Feature activation+3.16e-03

Pos logit contributions
amer+0.016
old+0.016
od+0.014
ena+0.014
 office+0.014
Neg logit contributions
 may-0.021
 wouldn-0.020
rated-0.020
 spine-0.020
suit-0.019
Token move
Token ID1445
Feature activation-1.29e-02

Pos logit contributions
amer+0.006
old+0.006
 office+0.005
ena+0.005
ful+0.005
Neg logit contributions
 may-0.005
 wouldn-0.005
suit-0.005
 spine-0.005
fy-0.004
Token.
Token ID13
Feature activation-5.71e-03

Pos logit contributions
 spine+0.026
 may+0.026
icol+0.025
 wouldn+0.025
rated+0.024
Neg logit contributions
amer-0.018
old-0.017
ful-0.016
od-0.015
ena-0.015
Token It
Token ID632
Feature activation-1.74e-02

Pos logit contributions
 wouldn+0.011
 may+0.011
rated+0.011
 spine+0.011
suit+0.011
Neg logit contributions
amer-0.008
old-0.008
ena-0.007
od-0.007
 office-0.007
Token wr
Token ID1319
Feature activation-9.43e-02

Pos logit contributions
 spine+0.031
icol+0.029
 may+0.029
suit+0.028
rated+0.028
Neg logit contributions
amer-0.028
old-0.027
ena-0.025
od-0.024
 student-0.024
Tokenigg
Token ID6950
Feature activation-7.86e-02

Pos logit contributions
 spine+0.185
 may+0.171
 wouldn+0.168
use+0.167
rated+0.163
Neg logit contributions
od-0.132
ful-0.125
old-0.124
amer-0.120
ily-0.115
Tokenled
Token ID992
Feature activation-3.88e-02

Pos logit contributions
 spine+0.146
�+0.141
 may+0.135
icol+0.132
 headache+0.132
Neg logit contributions
amer-0.124
old-0.120
od-0.116
v-0.107
ful-0.107
Token and
Token ID290
Feature activation-3.97e-02

Pos logit contributions
 spine+0.087
�+0.084
use+0.084
 headache+0.083
 may+0.083
Neg logit contributions
amer-0.041
ful-0.039
 who-0.038
od-0.036
 twins-0.034
Token w
Token ID266
Feature activation-4.70e-02

Pos logit contributions
 spine+0.078
 may+0.076
suit+0.074
rated+0.074
fy+0.074
Neg logit contributions
amer-0.048
ful-0.046
ena-0.046
old-0.044
 twins-0.044
Tokenigg
Token ID6950
Feature activation-7.12e-02

Pos logit contributions
 may+0.072
 wouldn+0.070
 spine+0.069
suit+0.066
rated+0.066
Neg logit contributions
old-0.091
amer-0.090
od-0.083
ena-0.082
ful-0.080
Token started
Token ID2067
Feature activation+1.98e-02

Pos logit contributions
 spine+0.008
 may+0.007
suit+0.007
icol+0.007
rated+0.007
Neg logit contributions
amer-0.009
old-0.008
ena-0.008
od-0.007
 office-0.007
Token to
Token ID284
Feature activation+4.44e-03

Pos logit contributions
amer+0.033
old+0.032
ena+0.029
od+0.028
 office+0.028
Neg logit contributions
 may-0.035
 wouldn-0.035
 spine-0.033
rated-0.033
suit-0.032
Token clean
Token ID3424
Feature activation-1.80e-02

Pos logit contributions
amer+0.009
old+0.009
 office+0.008
ena+0.008
od+0.008
Neg logit contributions
 may-0.006
 wouldn-0.006
suit-0.006
 spine-0.006
rated-0.006
Token.
Token ID13
Feature activation+6.63e-04

Pos logit contributions
 may+0.034
 wouldn+0.034
 spine+0.033
rated+0.032
suit+0.032
Neg logit contributions
amer-0.027
old-0.026
 office-0.024
ena-0.024
od-0.024
Token It
Token ID632
Feature activation-9.45e-03

Pos logit contributions
amer+0.001
old+0.001
ena+0.001
od+0.001
 student+0.001
Neg logit contributions
 may-0.001
 wouldn-0.001
 spine-0.001
rated-0.001
suit-0.001
Token wr
Token ID1319
Feature activation-9.25e-02

Pos logit contributions
 may+0.016
 wouldn+0.015
 spine+0.015
rated+0.014
suit+0.014
Neg logit contributions
amer-0.017
old-0.017
ena-0.015
od-0.015
 office-0.015
Tokenigg
Token ID6950
Feature activation-7.40e-02

Pos logit contributions
 spine+0.182
 may+0.163
use+0.161
 wouldn+0.158
suit+0.155
Neg logit contributions
od-0.131
old-0.125
ful-0.125
ily-0.122
amer-0.118
Tokenled
Token ID992
Feature activation-2.72e-02

Pos logit contributions
 spine+0.135
�+0.126
 may+0.125
 headache+0.122
icol+0.122
Neg logit contributions
old-0.118
amer-0.117
od-0.111
ful-0.105
 kinds-0.103
Token around
Token ID1088
Feature activation-2.21e-02

Pos logit contributions
 spine+0.055
rated+0.051
 may+0.051
suit+0.050
 wouldn+0.050
Neg logit contributions
amer-0.036
old-0.034
ful-0.033
od-0.033
 twins-0.031
Token the
Token ID262
Feature activation-1.74e-02

Pos logit contributions
rated+0.044
 may+0.044
 wouldn+0.043
 spine+0.043
suit+0.043
Neg logit contributions
amer-0.029
old-0.028
od-0.026
ful-0.026
ena-0.026
Token cave
Token ID11527
Feature activation+8.44e-04

Pos logit contributions
 wouldn+0.034
 may+0.033
rated+0.031
 spine+0.031
suit+0.030
Neg logit contributions
amer-0.027
old-0.025
 office-0.024
 student-0.024
 twins-0.023
Token them
Token ID606
Feature activation-4.44e-03

Pos logit contributions
 may+0.007
 wouldn+0.007
 spine+0.007
rated+0.007
suit+0.007
Neg logit contributions
amer-0.005
old-0.005
ena-0.004
od-0.004
 office-0.004
Token.
Token ID13
Feature activation-1.03e-02

Pos logit contributions
 may+0.009
 wouldn+0.009
 spine+0.009
rated+0.009
suit+0.009
Neg logit contributions
amer-0.006
old-0.005
od-0.005
ena-0.005
 office-0.005
Token He
Token ID679
Feature activation-1.56e-02

Pos logit contributions
 wouldn+0.019
 may+0.019
rated+0.018
 spine+0.018
suit+0.017
Neg logit contributions
old-0.016
amer-0.016
ena-0.015
 office-0.014
od-0.014
Token felt
Token ID2936
Feature activation-2.88e-02

Pos logit contributions
 may+0.025
 spine+0.024
 wouldn+0.023
rated+0.023
suit+0.023
Neg logit contributions
amer-0.028
old-0.028
ena-0.026
od-0.025
 office-0.025
Token so
Token ID523
Feature activation-3.78e-02

Pos logit contributions
 spine+0.049
 may+0.048
 wouldn+0.048
suit+0.048
rated+0.047
Neg logit contributions
old-0.049
amer-0.047
ful-0.043
 twins-0.042
 office-0.042
Token com
Token ID401
Feature activation-9.19e-02

Pos logit contributions
suit+0.070
icol+0.070
 spine+0.068
 wouldn+0.068
 may+0.066
Neg logit contributions
old-0.060
amer-0.054
 twins-0.051
ful-0.051
 office-0.048
Tokenfy
Token ID24928
Feature activation-4.51e-02

Pos logit contributions
 may+0.079
 wouldn+0.074
 footing+0.064
 steep+0.063
rated+0.063
Neg logit contributions
old-0.238
amer-0.217
ful-0.211
v-0.209
year-0.206
Token that
Token ID326
Feature activation-2.13e-02

Pos logit contributions
 may+0.102
rated+0.101
 wouldn+0.096
suit+0.095
fy+0.094
Neg logit contributions
amer-0.050
ful-0.049
old-0.048
 office-0.045
ena-0.042
Token he
Token ID339
Feature activation-2.27e-02

Pos logit contributions
suit+0.048
icol+0.047
 spine+0.047
rated+0.046
 may+0.045
Neg logit contributions
amer-0.022
old-0.021
 office-0.020
ful-0.019
 twins-0.019
Token decided
Token ID3066
Feature activation-2.97e-03

Pos logit contributions
 spine+0.037
icol+0.036
rated+0.032
 steep+0.032
 AW+0.032
Neg logit contributions
amer-0.038
old-0.038
ful-0.036
ena-0.036
 who-0.036
Token to
Token ID284
Feature activation+1.13e-02

Pos logit contributions
rated+0.006
 may+0.006
 spine+0.006
suit+0.006
fy+0.006
Neg logit contributions
old-0.003
amer-0.003
 office-0.003
ful-0.003
ena-0.003
Token happy
Token ID3772
Feature activation+1.15e-02

Pos logit contributions
amer+0.020
old+0.018
od+0.017
 student+0.017
reating+0.016
Neg logit contributions
 wouldn-0.016
 may-0.015
 steep-0.015
rated-0.014
 spine-0.014
Token and
Token ID290
Feature activation-1.78e-02

Pos logit contributions
amer+0.024
old+0.022
ena+0.021
od+0.020
 student+0.020
Neg logit contributions
 wouldn-0.017
 may-0.017
 spine-0.016
 don-0.015
suit-0.015
Token his
Token ID465
Feature activation+3.12e-03

Pos logit contributions
 may+0.031
 wouldn+0.031
 spine+0.030
rated+0.029
suit+0.029
Neg logit contributions
amer-0.030
old-0.029
ena-0.026
 office-0.026
 student-0.026
Token body
Token ID1767
Feature activation+9.95e-03

Pos logit contributions
 office+0.003
 twins+0.003
old+0.003
amer+0.003
 student+0.003
Neg logit contributions
 may-0.007
 wouldn-0.007
rated-0.007
suit-0.007
 spine-0.007
Token felt
Token ID2936
Feature activation-1.91e-02

Pos logit contributions
amer+0.019
old+0.018
ena+0.017
od+0.016
 office+0.016
Neg logit contributions
 may-0.016
 wouldn-0.015
 spine-0.015
rated-0.014
suit-0.014
Token com
Token ID401
Feature activation-9.18e-02

Pos logit contributions
 spine+0.036
 may+0.035
 wouldn+0.035
rated+0.034
suit+0.033
Neg logit contributions
old-0.030
amer-0.029
ful-0.026
ena-0.025
 twins-0.025
Tokenfy
Token ID24928
Feature activation-3.64e-02

Pos logit contributions
 may+0.081
 wouldn+0.079
 footing+0.069
 spine+0.061
 steep+0.061
Neg logit contributions
old-0.235
amer-0.219
v-0.218
ful-0.204
year-0.199
Token and
Token ID290
Feature activation-3.73e-02

Pos logit contributions
 may+0.083
 wouldn+0.078
rated+0.078
 spine+0.075
 steep+0.074
Neg logit contributions
amer-0.042
old-0.038
ful-0.038
 office-0.037
ena-0.035
Token relaxed
Token ID18397
Feature activation-2.37e-02

Pos logit contributions
 spine+0.059
 may+0.059
suit+0.056
rated+0.055
 wouldn+0.055
Neg logit contributions
old-0.063
amer-0.062
 office-0.058
ful-0.057
ena-0.057
Token on
Token ID319
Feature activation-1.41e-02

Pos logit contributions
 may+0.050
 wouldn+0.049
 spine+0.047
rated+0.046
suit+0.045
Neg logit contributions
amer-0.033
old-0.031
ena-0.028
od-0.027
 office-0.027
Token the
Token ID262
Feature activation-3.31e-02

Pos logit contributions
 may+0.030
 wouldn+0.029
 spine+0.028
rated+0.027
 steep+0.027
Neg logit contributions
amer-0.020
old-0.019
ena-0.016
od-0.016
 student-0.016
Token not
Token ID407
Feature activation+1.39e-02

Pos logit contributions
amer+0.019
old+0.018
ena+0.018
od+0.017
ful+0.016
Neg logit contributions
 spine-0.025
 may-0.025
rated-0.024
 wouldn-0.024
suit-0.024
Token like
Token ID588
Feature activation+1.92e-02

Pos logit contributions
amer+0.025
old+0.025
ena+0.023
od+0.023
 student+0.023
Neg logit contributions
 may-0.022
 wouldn-0.022
icol-0.021
 spine-0.021
rated-0.021
Token that
Token ID326
Feature activation+1.03e-02

Pos logit contributions
amer+0.031
old+0.029
ena+0.026
od+0.026
 office+0.025
Neg logit contributions
 may-0.037
 wouldn-0.036
 spine-0.034
rated-0.033
 steep-0.033
Token.
Token ID13
Feature activation+3.17e-03

Pos logit contributions
amer+0.018
old+0.017
ena+0.015
od+0.015
 office+0.015
Neg logit contributions
 may-0.019
 wouldn-0.019
 spine-0.017
rated-0.017
suit-0.017
Token It
Token ID632
Feature activation-1.33e-02

Pos logit contributions
amer+0.005
old+0.004
ena+0.004
od+0.004
 student+0.004
Neg logit contributions
 wouldn-0.006
 may-0.006
 spine-0.006
rated-0.006
suit-0.006
Token wr
Token ID1319
Feature activation-9.10e-02

Pos logit contributions
 spine+0.022
icol+0.021
 may+0.021
rated+0.021
 wouldn+0.020
Neg logit contributions
amer-0.023
ena-0.021
old-0.021
od-0.020
eper-0.020
Tokenigg
Token ID6950
Feature activation-6.29e-02

Pos logit contributions
 spine+0.178
 wouldn+0.159
 may+0.158
rated+0.158
icol+0.153
Neg logit contributions
od-0.129
old-0.127
ily-0.122
amer-0.121
ful-0.120
Tokenled
Token ID992
Feature activation-1.77e-02

Pos logit contributions
 spine+0.110
�+0.103
 may+0.103
icol+0.102
 headache+0.099
Neg logit contributions
amer-0.109
old-0.103
od-0.098
ena-0.096
ful-0.093
Token and
Token ID290
Feature activation-5.22e-02

Pos logit contributions
 spine+0.036
 may+0.036
rated+0.035
icol+0.035
 wouldn+0.034
Neg logit contributions
amer-0.023
old-0.022
ful-0.022
ena-0.020
 twins-0.020
Token bit
Token ID1643
Feature activation-1.87e-02

Pos logit contributions
 spine+0.086
rated+0.085
 may+0.080
icol+0.079
acket+0.078
Neg logit contributions
ful-0.078
amer-0.077
ena-0.076
Her-0.071
 twins-0.071
Token Mia
Token ID32189
Feature activation-1.61e-02

Pos logit contributions
 may+0.039
 wouldn+0.038
 spine+0.037
rated+0.037
suit+0.036
Neg logit contributions
amer-0.026
old-0.025
ena-0.023
od-0.022
ful-0.022
Token the
Token ID262
Feature activation-3.18e-02

Pos logit contributions
 may+0.008
 wouldn+0.008
 spine+0.008
rated+0.008
suit+0.008
Neg logit contributions
amer-0.006
old-0.006
ena-0.005
od-0.005
 student-0.005
Token rain
Token ID6290
Feature activation-2.04e-02

Pos logit contributions
 wouldn+0.056
 may+0.055
rated+0.051
 spine+0.050
suit+0.050
Neg logit contributions
old-0.051
amer-0.051
 office-0.051
 twins-0.048
 living-0.046
Token and
Token ID290
Feature activation-2.98e-02

Pos logit contributions
 may+0.040
 wouldn+0.038
rated+0.035
 spine+0.035
suit+0.034
Neg logit contributions
amer-0.031
old-0.030
 office-0.028
ena-0.027
ful-0.027
Token take
Token ID1011
Feature activation-3.51e-02

Pos logit contributions
 may+0.048
suit+0.046
 spine+0.046
rated+0.046
 wouldn+0.045
Neg logit contributions
old-0.049
amer-0.049
 office-0.046
ful-0.045
ena-0.045
Token a
Token ID257
Feature activation-5.19e-02

Pos logit contributions
 spine+0.087
 wouldn+0.086
 may+0.086
rated+0.086
idents+0.083
Neg logit contributions
old-0.032
amer-0.029
ena-0.026
ful-0.026
od-0.025
Token com
Token ID401
Feature activation-9.07e-02

Pos logit contributions
 wouldn+0.076
rated+0.072
 doesn+0.066
 may+0.066
idents+0.065
Neg logit contributions
old-0.096
ful-0.090
 student-0.088
 priest-0.087
 of-0.086
Tokenfy
Token ID24928
Feature activation-5.11e-02

Pos logit contributions
 wouldn+0.077
 may+0.072
 Di+0.061
icol+0.059
 hips+0.054
Neg logit contributions
old-0.238
ful-0.215
amer-0.214
v-0.207
year-0.201
Token nap
Token ID25422
Feature activation-1.98e-02

Pos logit contributions
 wouldn+0.090
rated+0.086
 may+0.085
idents+0.082
 doesn+0.078
Neg logit contributions
 office-0.081
old-0.081
ful-0.077
 living-0.075
 of-0.074
Token.
Token ID13
Feature activation-8.86e-03

Pos logit contributions
 may+0.040
 wouldn+0.039
 spine+0.038
rated+0.038
suit+0.037
Neg logit contributions
amer-0.029
old-0.027
ena-0.024
od-0.024
 office-0.024
Token The
Token ID383
Feature activation-5.92e-03

Pos logit contributions
 wouldn+0.017
 may+0.017
rated+0.017
 spine+0.016
suit+0.016
Neg logit contributions
old-0.013
amer-0.013
ena-0.011
 office-0.011
 student-0.011
Token tiger
Token ID26241
Feature activation+4.85e-04

Pos logit contributions
 wouldn+0.012
 may+0.012
 spine+0.012
suit+0.012
icol+0.011
Neg logit contributions
amer-0.008
old-0.008
 twins-0.008
 office-0.007
ena-0.007
Token\n
Token ID198
Feature activation-3.67e-03

Pos logit contributions
amer+0.014
old+0.012
 student+0.012
 office+0.012
ena+0.012
Neg logit contributions
 may-0.013
 wouldn-0.012
 steep-0.011
 don-0.011
rated-0.011
TokenThe
Token ID464
Feature activation-5.56e-03

Pos logit contributions
 may+0.006
 steep+0.005
 wouldn+0.005
rated+0.005
suit+0.005
Neg logit contributions
amer-0.007
old-0.007
od-0.006
 office-0.006
ena-0.006
Token branch
Token ID8478
Feature activation+2.23e-03

Pos logit contributions
 wouldn+0.010
 may+0.010
suit+0.009
 spine+0.009
rated+0.009
Neg logit contributions
amer-0.009
old-0.009
ena-0.008
 office-0.008
 student-0.008
Token was
Token ID373
Feature activation-7.67e-03

Pos logit contributions
amer+0.004
old+0.004
ena+0.003
od+0.003
 office+0.003
Neg logit contributions
 may-0.004
 wouldn-0.004
 spine-0.004
rated-0.004
suit-0.003
Token so
Token ID523
Feature activation-3.14e-02

Pos logit contributions
 may+0.016
suit+0.015
 spine+0.015
 wouldn+0.015
icol+0.015
Neg logit contributions
amer-0.010
old-0.009
ena-0.009
 office-0.009
ful-0.009
Token com
Token ID401
Feature activation-9.01e-02

Pos logit contributions
 spine+0.056
 may+0.054
 wouldn+0.053
suit+0.053
icol+0.053
Neg logit contributions
amer-0.051
old-0.049
ful-0.046
od-0.044
 priest-0.044
Tokenfy
Token ID24928
Feature activation-4.50e-02

Pos logit contributions
 wouldn+0.075
 may+0.072
 footing+0.066
 hips+0.066
rated+0.063
Neg logit contributions
old-0.225
amer-0.216
ful-0.201
v-0.200
od-0.191
Token and
Token ID290
Feature activation-4.60e-02

Pos logit contributions
 may+0.104
rated+0.103
 wouldn+0.100
 spine+0.097
suit+0.096
Neg logit contributions
amer-0.048
ful-0.045
old-0.044
 office-0.040
ena-0.039
Token gentle
Token ID10296
Feature activation-1.91e-02

Pos logit contributions
 spine+0.094
suit+0.089
icol+0.087
use+0.086
 may+0.085
Neg logit contributions
amer-0.059
ful-0.056
 kinds-0.054
old-0.054
ena-0.052
Token that
Token ID326
Feature activation-1.92e-02

Pos logit contributions
 may+0.039
 wouldn+0.038
 spine+0.037
rated+0.037
suit+0.036
Neg logit contributions
amer-0.027
old-0.026
ena-0.023
ful-0.023
od-0.023
Token the
Token ID262
Feature activation-4.76e-03

Pos logit contributions
 spine+0.040
suit+0.039
 may+0.038
rated+0.038
icol+0.038
Neg logit contributions
amer-0.025
old-0.024
 office-0.022
od-0.021
ena-0.021
Token,
Token ID11
Feature activation-4.60e-02

Pos logit contributions
 may+0.071
 wouldn+0.070
 spine+0.068
rated+0.068
 AW+0.064
Neg logit contributions
amer-0.054
old-0.050
 office-0.049
 twins-0.048
ful-0.047
Token and
Token ID290
Feature activation-2.19e-02

Pos logit contributions
 may+0.086
rated+0.084
 spine+0.083
 wouldn+0.083
suit+0.080
Neg logit contributions
amer-0.066
 twins-0.061
 office-0.061
old-0.061
ena-0.060
Token the
Token ID262
Feature activation-2.92e-03

Pos logit contributions
 may+0.038
 wouldn+0.037
 spine+0.036
rated+0.036
suit+0.035
Neg logit contributions
amer-0.037
old-0.036
ena-0.033
od-0.032
 office-0.032
Token thief
Token ID25906
Feature activation+2.13e-02

Pos logit contributions
 wouldn+0.005
 may+0.005
rated+0.005
icol+0.005
suit+0.005
Neg logit contributions
amer-0.004
 twins-0.004
 office-0.004
old-0.004
 student-0.004
Token felt
Token ID2936
Feature activation-1.39e-02

Pos logit contributions
amer+0.040
old+0.039
ena+0.036
od+0.035
 office+0.035
Neg logit contributions
 may-0.033
 wouldn-0.033
 spine-0.031
rated-0.031
suit-0.030
Token com
Token ID401
Feature activation-8.99e-02

Pos logit contributions
 may+0.025
 wouldn+0.025
 spine+0.025
rated+0.024
suit+0.023
Neg logit contributions
amer-0.022
old-0.022
ena-0.019
 office-0.019
ful-0.019
Tokenfy
Token ID24928
Feature activation-3.67e-02

Pos logit contributions
 may+0.076
 wouldn+0.073
 footing+0.061
 steep+0.059
rated+0.053
Neg logit contributions
old-0.231
amer-0.218
v-0.214
year-0.205
ful-0.205
Token and
Token ID290
Feature activation-3.05e-02

Pos logit contributions
 may+0.078
 wouldn+0.074
rated+0.073
 spine+0.071
suit+0.070
Neg logit contributions
amer-0.048
old-0.044
 office-0.043
ful-0.043
ena-0.041
Token warm
Token ID5814
Feature activation-3.94e-02

Pos logit contributions
 spine+0.049
 may+0.049
suit+0.047
rated+0.047
 wouldn+0.047
Neg logit contributions
old-0.049
amer-0.049
 office-0.046
 twins-0.046
ena-0.045
Token.
Token ID13
Feature activation-7.24e-03

Pos logit contributions
 may+0.089
 wouldn+0.086
 spine+0.082
rated+0.082
 steep+0.081
Neg logit contributions
amer-0.048
old-0.043
ful-0.039
 twins-0.039
 office-0.039
Token They
Token ID1119
Feature activation-1.19e-02

Pos logit contributions
 may+0.013
 wouldn+0.013
 spine+0.013
suit+0.012
rated+0.012
Neg logit contributions
amer-0.012
old-0.011
ena-0.010
od-0.010
 student-0.010
Token The
Token ID383
Feature activation+7.43e-03

Pos logit contributions
 may+0.013
 wouldn+0.013
 spine+0.012
suit+0.012
rated+0.012
Neg logit contributions
amer-0.011
old-0.010
ena-0.010
od-0.009
 office-0.009
Token little
Token ID1310
Feature activation-3.54e-03

Pos logit contributions
amer+0.012
old+0.012
 office+0.011
 student+0.011
 twins+0.011
Neg logit contributions
 may-0.013
 wouldn-0.013
rated-0.012
 spine-0.012
 steep-0.012
Token girl
Token ID2576
Feature activation+6.07e-03

Pos logit contributions
 may+0.008
 wouldn+0.008
 spine+0.008
rated+0.008
suit+0.007
Neg logit contributions
amer-0.004
old-0.004
ena-0.004
 student-0.003
 office-0.003
Token felt
Token ID2936
Feature activation-1.43e-02

Pos logit contributions
amer+0.011
old+0.011
ena+0.010
od+0.010
 office+0.010
Neg logit contributions
 may-0.010
 wouldn-0.010
 spine-0.009
rated-0.009
suit-0.009
Token so
Token ID523
Feature activation-2.79e-02

Pos logit contributions
 may+0.028
 wouldn+0.027
 spine+0.027
rated+0.026
suit+0.026
Neg logit contributions
amer-0.022
old-0.021
ena-0.019
od-0.018
 office-0.018
Token com
Token ID401
Feature activation-8.99e-02

Pos logit contributions
 wouldn+0.060
 may+0.059
 spine+0.058
rated+0.057
suit+0.057
Neg logit contributions
old-0.036
amer-0.034
ena-0.029
od-0.029
 twins-0.029
Tokenfy
Token ID24928
Feature activation-3.46e-02

Pos logit contributions
 may+0.069
 wouldn+0.066
 steep+0.058
 spine+0.056
 footing+0.056
Neg logit contributions
old-0.241
amer-0.225
v-0.212
ful-0.212
year-0.206
Token and
Token ID290
Feature activation-3.08e-02

Pos logit contributions
 may+0.070
rated+0.067
 wouldn+0.066
 spine+0.064
suit+0.063
Neg logit contributions
amer-0.049
old-0.047
ful-0.044
 office-0.044
ena-0.044
Token happy
Token ID3772
Feature activation+7.71e-03

Pos logit contributions
 spine+0.055
 may+0.054
suit+0.052
rated+0.052
 wouldn+0.051
Neg logit contributions
old-0.048
amer-0.046
ena-0.043
 office-0.042
ful-0.042
Token.
Token ID13
Feature activation+3.69e-03

Pos logit contributions
amer+0.014
old+0.013
ena+0.012
 student+0.012
daughter+0.012
Neg logit contributions
 wouldn-0.013
 may-0.013
 spine-0.012
suit-0.012
rated-0.012
Token The
Token ID383
Feature activation+4.71e-03

Pos logit contributions
amer+0.004
old+0.004
ena+0.004
od+0.004
 office+0.004
Neg logit contributions
 may-0.008
 wouldn-0.008
 spine-0.008
rated-0.008
suit-0.008
Token bed
Token ID3996
Feature activation-9.91e-03

Pos logit contributions
 may+0.080
 wouldn+0.079
rated+0.074
 don+0.070
suit+0.070
Neg logit contributions
amer-0.082
old-0.080
 office-0.078
 twins-0.074
ena-0.074
Token.
Token ID13
Feature activation-6.34e-03

Pos logit contributions
 may+0.019
 wouldn+0.018
 spine+0.017
rated+0.017
suit+0.017
Neg logit contributions
amer-0.016
old-0.015
ena-0.014
 office-0.013
od-0.013
Token It
Token ID632
Feature activation-2.78e-02

Pos logit contributions
 wouldn+0.011
 may+0.011
rated+0.010
 spine+0.010
suit+0.010
Neg logit contributions
old-0.010
amer-0.010
ena-0.009
 student-0.009
 office-0.009
Token was
Token ID373
Feature activation-2.12e-03

Pos logit contributions
 spine+0.053
 may+0.051
icol+0.051
suit+0.049
idents+0.048
Neg logit contributions
amer-0.039
old-0.038
ena-0.035
ful-0.035
 office-0.033
Token so
Token ID523
Feature activation-2.68e-02

Pos logit contributions
 may+0.004
 spine+0.004
 wouldn+0.004
suit+0.004
rated+0.004
Neg logit contributions
amer-0.003
old-0.003
ena-0.003
 office-0.003
ful-0.003
Token com
Token ID401
Feature activation-8.99e-02

Pos logit contributions
 wouldn+0.042
 spine+0.042
 may+0.042
suit+0.039
rated+0.039
Neg logit contributions
amer-0.049
old-0.048
ful-0.043
ena-0.043
 twins-0.043
Tokenfy
Token ID24928
Feature activation-4.84e-02

Pos logit contributions
 wouldn+0.083
 may+0.081
 footing+0.078
 hips+0.076
 headache+0.068
Neg logit contributions
old-0.217
amer-0.197
v-0.193
ful-0.186
year-0.183
Token and
Token ID290
Feature activation-4.52e-02

Pos logit contributions
 may+0.097
rated+0.095
 wouldn+0.094
 spine+0.091
suit+0.088
Neg logit contributions
amer-0.066
old-0.062
ful-0.060
 office-0.058
ena-0.056
Token tight
Token ID5381
Feature activation-2.57e-02

Pos logit contributions
 spine+0.084
 may+0.077
suit+0.076
icol+0.075
use+0.075
Neg logit contributions
amer-0.065
old-0.063
ful-0.062
 kinds-0.060
 twins-0.059
Token.
Token ID13
Feature activation-1.03e-02

Pos logit contributions
 may+0.054
 spine+0.053
rated+0.052
 wouldn+0.051
 steep+0.050
Neg logit contributions
amer-0.031
old-0.030
ful-0.027
ena-0.027
 office-0.026
Token He
Token ID679
Feature activation-2.15e-02

Pos logit contributions
 wouldn+0.019
 may+0.018
rated+0.018
 spine+0.017
suit+0.017
Neg logit contributions
old-0.017
amer-0.016
ena-0.015
od-0.015
 student-0.015
Token.
Token ID13
Feature activation-1.36e-02

Pos logit contributions
 may+0.004
icol+0.004
rated+0.004
 spine+0.004
 wouldn+0.004
Neg logit contributions
amer-0.002
old-0.002
ful-0.002
ena-0.002
 office-0.002
Token He
Token ID679
Feature activation-2.46e-02

Pos logit contributions
 wouldn+0.028
rated+0.027
 may+0.027
fy+0.026
suit+0.026
Neg logit contributions
old-0.019
amer-0.018
 office-0.016
 twins-0.016
ena-0.016
Token could
Token ID714
Feature activation-1.79e-02

Pos logit contributions
 spine+0.043
icol+0.042
 may+0.040
suit+0.040
rated+0.040
Neg logit contributions
old-0.040
amer-0.038
ena-0.037
 office-0.034
ful-0.034
Token never
Token ID1239
Feature activation-1.40e-02

Pos logit contributions
rated+0.027
suit+0.027
 may+0.026
icol+0.026
 wouldn+0.025
Neg logit contributions
amer-0.032
old-0.032
ena-0.030
 office-0.030
ful-0.029
Token get
Token ID651
Feature activation-8.33e-03

Pos logit contributions
icol+0.022
suit+0.022
 wouldn+0.021
 may+0.021
fy+0.020
Neg logit contributions
old-0.025
amer-0.023
 student-0.022
od-0.022
ena-0.022
Token com
Token ID401
Feature activation-8.98e-02

Pos logit contributions
 may+0.015
 spine+0.014
icol+0.014
rated+0.014
suit+0.014
Neg logit contributions
old-0.014
amer-0.014
ful-0.013
 twins-0.012
 office-0.012
Tokenfy
Token ID24928
Feature activation-4.11e-02

Pos logit contributions
 wouldn+0.102
 may+0.102
 footing+0.094
rated+0.089
 hips+0.086
Neg logit contributions
old-0.200
v-0.182
ful-0.167
amer-0.164
off-0.157
Token.
Token ID13
Feature activation-7.67e-03

Pos logit contributions
 may+0.084
rated+0.084
 AW+0.080
 wouldn+0.078
idents+0.077
Neg logit contributions
ful-0.052
amer-0.048
 office-0.048
 of-0.045
 twins-0.045
Token\n
Token ID198
Feature activation+1.23e-02

Pos logit contributions
 wouldn+0.016
 may+0.016
rated+0.015
 spine+0.015
suit+0.015
Neg logit contributions
amer-0.011
old-0.011
ena-0.009
 office-0.009
od-0.009
Token\n
Token ID198
Feature activation+2.60e-03

Pos logit contributions
amer+0.022
old+0.019
 student+0.019
ena+0.019
od+0.018
Neg logit contributions
 may-0.021
 wouldn-0.020
 steep-0.019
 spine-0.018
 don-0.018
TokenOne
Token ID3198
Feature activation-1.65e-04

Pos logit contributions
amer+0.004
old+0.004
od+0.004
ena+0.004
 student+0.004
Neg logit contributions
 may-0.005
 steep-0.004
 wouldn-0.004
rated-0.004
 spine-0.004
Token laughed
Token ID13818
Feature activation-2.23e-02

Pos logit contributions
 may+0.049
 spine+0.048
rated+0.047
 wouldn+0.046
suit+0.045
Neg logit contributions
amer-0.046
old-0.045
ena-0.044
od-0.041
 student-0.041
Token.
Token ID13
Feature activation+1.68e-02

Pos logit contributions
 may+0.043
 wouldn+0.042
 spine+0.040
rated+0.040
suit+0.039
Neg logit contributions
amer-0.035
old-0.034
ena-0.030
od-0.029
 office-0.029
Token It
Token ID632
Feature activation-1.57e-02

Pos logit contributions
amer+0.024
old+0.024
ena+0.021
od+0.021
 office+0.020
Neg logit contributions
 may-0.034
 wouldn-0.034
 spine-0.032
rated-0.032
suit-0.031
Token was
Token ID373
Feature activation+5.16e-04

Pos logit contributions
 may+0.028
 spine+0.027
 wouldn+0.027
rated+0.026
suit+0.026
Neg logit contributions
amer-0.026
old-0.025
ena-0.023
od-0.023
 student-0.022
Token so
Token ID523
Feature activation-2.38e-02

Pos logit contributions
amer+0.001
old+0.001
ena+0.001
od+0.001
 student+0.001
Neg logit contributions
 may-0.001
 wouldn-0.001
 spine-0.001
rated-0.001
suit-0.001
Token com
Token ID401
Feature activation-8.97e-02

Pos logit contributions
 wouldn+0.047
 may+0.047
 spine+0.046
rated+0.044
suit+0.044
Neg logit contributions
amer-0.035
old-0.034
ena-0.030
od-0.030
 student-0.029
Tokenfy
Token ID24928
Feature activation-3.74e-02

Pos logit contributions
 wouldn+0.079
 may+0.078
 footing+0.072
 spine+0.066
 hips+0.066
Neg logit contributions
old-0.224
amer-0.204
v-0.200
ful-0.192
od-0.187
Token and
Token ID290
Feature activation-3.39e-02

Pos logit contributions
 may+0.074
 wouldn+0.073
rated+0.071
 spine+0.070
suit+0.068
Neg logit contributions
amer-0.054
old-0.052
ena-0.047
 office-0.046
od-0.046
Token fun
Token ID1257
Feature activation-3.42e-03

Pos logit contributions
 spine+0.061
 may+0.058
suit+0.056
rated+0.055
 wouldn+0.055
Neg logit contributions
amer-0.052
old-0.051
ena-0.049
ful-0.048
od-0.046
Token.
Token ID13
Feature activation+1.70e-02

Pos logit contributions
 wouldn+0.005
 may+0.005
 spine+0.005
 steep+0.004
suit+0.004
Neg logit contributions
amer-0.007
old-0.007
ena-0.006
eper-0.006
 priest-0.006
Token She
Token ID1375
Feature activation-1.04e-02

Pos logit contributions
amer+0.028
old+0.027
ena+0.024
 student+0.024
od+0.024
Neg logit contributions
 may-0.031
 wouldn-0.031
 spine-0.029
rated-0.029
 steep-0.028
Token couch
Token ID18507
Feature activation+3.48e-02

Pos logit contributions
amer+0.035
old+0.033
od+0.030
ena+0.030
ful+0.029
Neg logit contributions
 may-0.030
 wouldn-0.029
 spine-0.028
 steep-0.028
suit-0.027
Token that
Token ID326
Feature activation+1.26e-03

Pos logit contributions
amer+0.062
old+0.060
ena+0.053
od+0.053
eper+0.052
Neg logit contributions
 wouldn-0.060
suit-0.056
 may-0.056
 spine-0.055
 steep-0.053
Token was
Token ID373
Feature activation+2.71e-02

Pos logit contributions
amer+0.002
old+0.002
ena+0.002
od+0.002
 office+0.002
Neg logit contributions
 may-0.002
 wouldn-0.002
 spine-0.002
rated-0.002
suit-0.002
Token just
Token ID655
Feature activation-1.67e-02

Pos logit contributions
amer+0.049
old+0.048
od+0.043
ena+0.043
 student+0.042
Neg logit contributions
 wouldn-0.046
 may-0.045
 steep-0.042
 spine-0.041
rated-0.041
Token as
Token ID355
Feature activation-1.10e-02

Pos logit contributions
 may+0.031
 wouldn+0.030
 spine+0.029
rated+0.028
suit+0.028
Neg logit contributions
amer-0.027
old-0.026
ena-0.024
 office-0.023
od-0.023
Token com
Token ID401
Feature activation-8.96e-02

Pos logit contributions
 may+0.018
 wouldn+0.018
 spine+0.017
rated+0.017
suit+0.017
Neg logit contributions
amer-0.019
old-0.019
ena-0.017
od-0.017
 office-0.017
Tokenfy
Token ID24928
Feature activation-2.95e-02

Pos logit contributions
 may+0.116
 wouldn+0.105
 steep+0.100
rated+0.098
 footing+0.097
Neg logit contributions
old-0.196
v-0.163
amer-0.163
ful-0.162
eared-0.153
Token as
Token ID355
Feature activation-1.47e-02

Pos logit contributions
 may+0.054
rated+0.052
 wouldn+0.051
 spine+0.049
 steep+0.048
Neg logit contributions
amer-0.047
old-0.045
ful-0.042
 office-0.042
ena-0.042
Token the
Token ID262
Feature activation+1.37e-02

Pos logit contributions
 may+0.029
 wouldn+0.029
 spine+0.028
rated+0.028
suit+0.027
Neg logit contributions
amer-0.021
old-0.021
ena-0.019
od-0.018
 office-0.018
Token old
Token ID1468
Feature activation+3.90e-02

Pos logit contributions
amer+0.024
old+0.023
ena+0.021
od+0.020
 office+0.020
Neg logit contributions
 may-0.024
 wouldn-0.024
 spine-0.023
rated-0.022
suit-0.022
Token one
Token ID530
Feature activation+1.25e-02

Pos logit contributions
amer+0.072
old+0.066
od+0.062
ful+0.060
ena+0.057
Neg logit contributions
 wouldn-0.068
 may-0.068
rated-0.063
 spine-0.062
 steep-0.061
Token day
Token ID1110
Feature activation-1.47e-02

Pos logit contributions
amer+0.051
old+0.046
od+0.044
eper+0.044
 priest+0.042
Neg logit contributions
 may-0.032
 wouldn-0.032
suit-0.031
 steep-0.030
rated-0.030
Token,
Token ID11
Feature activation+4.29e-03

Pos logit contributions
rated+0.024
 wouldn+0.024
 spine+0.024
 may+0.023
icol+0.022
Neg logit contributions
old-0.025
amer-0.025
ena-0.023
od-0.023
 student-0.023
Token the
Token ID262
Feature activation+4.71e-02

Pos logit contributions
amer+0.008
old+0.007
od+0.006
ena+0.006
 office+0.006
Neg logit contributions
 may-0.008
 wouldn-0.008
 don-0.007
suit-0.007
 steep-0.007
Token principal
Token ID10033
Feature activation+5.10e-02

Pos logit contributions
amer+0.055
old+0.051
od+0.043
ena+0.041
ful+0.041
Neg logit contributions
 may-0.109
 wouldn-0.107
 spine-0.104
suit-0.103
rated-0.100
Token asked
Token ID1965
Feature activation+1.41e-02

Pos logit contributions
amer+0.108
old+0.093
ful+0.090
od+0.088
 priest+0.087
Neg logit contributions
 wouldn-0.076
 may-0.073
 don-0.063
suit-0.062
 doesn-0.061
Token the
Token ID262
Feature activation+6.28e-02

Pos logit contributions
amer+0.020
old+0.018
ena+0.016
od+0.016
 student+0.016
Neg logit contributions
 wouldn-0.029
 may-0.028
rated-0.027
 don-0.026
 spine-0.026
Token teachers
Token ID7799
Feature activation+4.11e-02

Pos logit contributions
amer+0.088
old+0.078
ful+0.070
od+0.069
eper+0.063
Neg logit contributions
 may-0.128
 wouldn-0.125
suit-0.122
 spine-0.120
 don-0.119
Token to
Token ID284
Feature activation+5.05e-02

Pos logit contributions
amer+0.090
old+0.077
Kat+0.074
ena+0.073
od+0.073
Neg logit contributions
 wouldn-0.057
 may-0.055
 don-0.051
 spine-0.046
suit-0.044
Token count
Token ID954
Feature activation+6.57e-02

Pos logit contributions
amer+0.103
old+0.097
od+0.090
ena+0.088
eper+0.085
Neg logit contributions
 may-0.073
 wouldn-0.072
 don-0.067
 spine-0.062
 steep-0.060
Token how
Token ID703
Feature activation+4.58e-02

Pos logit contributions
old+0.131
amer+0.131
Kat+0.117
armac+0.113
 student+0.111
Neg logit contributions
 may-0.085
 steep-0.084
 wouldn-0.083
 don-0.079
 might-0.075
Token many
Token ID867
Feature activation+5.92e-02

Pos logit contributions
amer+0.080
old+0.067
ena+0.066
mother+0.064
eper+0.063
Neg logit contributions
 may-0.081
 wouldn-0.080
 steep-0.079
 don-0.075
rated-0.069
Token this
Token ID428
Feature activation+1.14e-02

Pos logit contributions
 may+0.004
 wouldn+0.004
 spine+0.004
rated+0.004
suit+0.004
Neg logit contributions
amer-0.003
old-0.003
ena-0.002
od-0.002
 office-0.002
Token table
Token ID3084
Feature activation+3.04e-02

Pos logit contributions
amer+0.019
old+0.018
 office+0.017
ena+0.017
 student+0.017
Neg logit contributions
 may-0.021
 wouldn-0.021
 spine-0.019
rated-0.019
suit-0.019
Token!"
Token ID2474
Feature activation+2.57e-02

Pos logit contributions
amer+0.053
old+0.051
ena+0.047
od+0.046
 office+0.046
Neg logit contributions
 may-0.052
 wouldn-0.051
 spine-0.049
rated-0.048
suit-0.047
Token Then
Token ID3244
Feature activation+1.52e-02

Pos logit contributions
amer+0.043
old+0.039
ena+0.036
 office+0.035
eper+0.035
Neg logit contributions
 may-0.049
 wouldn-0.048
 spine-0.046
rated-0.044
suit-0.043
Token she
Token ID673
Feature activation+1.67e-02

Pos logit contributions
amer+0.032
old+0.030
ena+0.028
 office+0.027
od+0.027
Neg logit contributions
 may-0.023
 wouldn-0.023
 don-0.019
 spine-0.019
rated-0.019
Token heard
Token ID2982
Feature activation+4.56e-02

Pos logit contributions
amer+0.030
old+0.029
ena+0.028
 student+0.027
od+0.027
Neg logit contributions
 may-0.027
 spine-0.026
 wouldn-0.026
rated-0.026
 steep-0.024
Token a
Token ID257
Feature activation+3.66e-04

Pos logit contributions
amer+0.079
old+0.064
eper+0.063
 office+0.058
v+0.058
Neg logit contributions
 may-0.084
 wouldn-0.080
 don-0.073
 spine-0.072
icol-0.071
Token voice
Token ID3809
Feature activation+1.17e-02

Pos logit contributions
amer+0.001
old+0.001
ena+0.001
od+0.001
 office+0.001
Neg logit contributions
 may-0.001
 wouldn-0.001
 spine-0.001
rated-0.001
suit-0.001
Token say
Token ID910
Feature activation+3.69e-03

Pos logit contributions
amer+0.021
old+0.019
 student+0.019
ena+0.018
 office+0.018
Neg logit contributions
 wouldn-0.019
 may-0.019
 spine-0.018
rated-0.017
 don-0.017
Token,
Token ID11
Feature activation-2.29e-02

Pos logit contributions
amer+0.008
old+0.007
ena+0.007
 student+0.007
 office+0.007
Neg logit contributions
 may-0.005
 wouldn-0.005
 spine-0.005
suit-0.005
rated-0.005
Token "
Token ID366
Feature activation+2.98e-02

Pos logit contributions
 may+0.049
 wouldn+0.048
rated+0.048
fy+0.047
 spine+0.047
Neg logit contributions
amer-0.028
old-0.026
ena-0.025
od-0.024
ful-0.024
Token to
Token ID284
Feature activation+1.04e-02

Pos logit contributions
 may+0.059
rated+0.058
 wouldn+0.058
 spine+0.056
icol+0.055
Neg logit contributions
old-0.035
amer-0.035
 office-0.034
ful-0.032
ena-0.031
Token the
Token ID262
Feature activation+4.48e-03

Pos logit contributions
amer+0.016
old+0.015
ena+0.014
 office+0.014
od+0.013
Neg logit contributions
 may-0.020
 wouldn-0.020
 spine-0.019
rated-0.019
suit-0.019
Token cliff
Token ID19516
Feature activation+9.43e-03

Pos logit contributions
amer+0.007
old+0.007
od+0.006
ena+0.006
 student+0.005
Neg logit contributions
 may-0.009
 wouldn-0.009
 spine-0.008
 steep-0.008
suit-0.008
Token to
Token ID284
Feature activation-1.81e-03

Pos logit contributions
amer+0.015
old+0.015
 office+0.014
ena+0.013
 student+0.013
Neg logit contributions
 may-0.017
 wouldn-0.017
 spine-0.016
rated-0.016
icol-0.016
Token sleep
Token ID3993
Feature activation-1.39e-02

Pos logit contributions
 wouldn+0.003
 may+0.003
rated+0.003
suit+0.003
 spine+0.003
Neg logit contributions
amer-0.003
old-0.003
 office-0.003
ena-0.003
od-0.003
Token?"
Token ID1701
Feature activation+4.46e-02

Pos logit contributions
 may+0.029
 wouldn+0.028
 spine+0.027
rated+0.026
icol+0.026
Neg logit contributions
amer-0.020
old-0.018
 office-0.018
ena-0.018
 student-0.017
Token Her
Token ID2332
Feature activation+2.50e-02

Pos logit contributions
amer+0.084
eper+0.068
old+0.067
reating+0.067
ena+0.064
Neg logit contributions
 wouldn-0.078
 may-0.078
�-0.070
 don-0.070
 spine-0.069
Token mom
Token ID1995
Feature activation+1.31e-02

Pos logit contributions
amer+0.038
old+0.037
ena+0.032
od+0.032
 student+0.031
Neg logit contributions
 may-0.049
 wouldn-0.049
 spine-0.046
rated-0.045
suit-0.045
Token smiled
Token ID13541
Feature activation-1.37e-02

Pos logit contributions
amer+0.023
old+0.022
ena+0.020
od+0.020
 office+0.020
Neg logit contributions
 may-0.023
 wouldn-0.023
 spine-0.021
rated-0.021
suit-0.021
Token and
Token ID290
Feature activation-4.50e-03

Pos logit contributions
 may+0.022
 wouldn+0.021
 spine+0.020
rated+0.020
suit+0.019
Neg logit contributions
amer-0.026
old-0.025
ena-0.023
od-0.023
 office-0.022
Token said
Token ID531
Feature activation+1.65e-03

Pos logit contributions
 wouldn+0.009
 may+0.009
 spine+0.008
rated+0.008
suit+0.008
Neg logit contributions
amer-0.007
old-0.007
ena-0.006
od-0.006
 office-0.006
Token stickers
Token ID28568
Feature activation+3.26e-03

Pos logit contributions
 wouldn+0.055
rated+0.055
 may+0.055
 spine+0.054
suit+0.051
Neg logit contributions
amer-0.050
old-0.048
ena-0.046
od-0.046
 twins-0.044
Token and
Token ID290
Feature activation-1.66e-03

Pos logit contributions
amer+0.007
old+0.007
ena+0.006
 student+0.006
 office+0.006
Neg logit contributions
 may-0.004
 wouldn-0.004
suit-0.004
 spine-0.004
 steep-0.003
Token other
Token ID584
Feature activation+2.43e-02

Pos logit contributions
 may+0.003
 wouldn+0.003
 steep+0.003
suit+0.003
 spine+0.003
Neg logit contributions
amer-0.003
old-0.003
ena-0.003
 office-0.003
eper-0.002
Token things
Token ID1243
Feature activation+4.94e-03

Pos logit contributions
amer+0.040
old+0.038
ena+0.035
od+0.034
 office+0.033
Neg logit contributions
 may-0.045
 wouldn-0.044
 spine-0.042
rated-0.041
suit-0.041
Token she
Token ID673
Feature activation+1.38e-02

Pos logit contributions
amer+0.009
old+0.008
ena+0.008
 office+0.007
 student+0.007
Neg logit contributions
 may-0.009
 wouldn-0.009
suit-0.008
 don-0.008
 steep-0.008
Token found
Token ID1043
Feature activation+5.00e-02

Pos logit contributions
amer+0.030
old+0.028
 office+0.026
ful+0.026
ena+0.025
Neg logit contributions
 wouldn-0.020
 may-0.020
suit-0.017
 don-0.016
 doesn-0.016
Token.
Token ID13
Feature activation+2.17e-02

Pos logit contributions
amer+0.090
old+0.086
 student+0.077
ena+0.076
 priest+0.074
Neg logit contributions
 wouldn-0.086
 may-0.084
suit-0.077
 steep-0.071
 doesn-0.071
Token One
Token ID1881
Feature activation+1.94e-02

Pos logit contributions
amer+0.037
old+0.035
ena+0.032
od+0.031
 student+0.031
Neg logit contributions
 may-0.039
 wouldn-0.038
 spine-0.037
suit-0.036
rated-0.036
Token day
Token ID1110
Feature activation-5.64e-03

Pos logit contributions
amer+0.044
old+0.040
od+0.038
ena+0.037
eper+0.037
Neg logit contributions
 may-0.026
 wouldn-0.026
rated-0.023
suit-0.023
 steep-0.023
Token,
Token ID11
Feature activation+6.53e-03

Pos logit contributions
rated+0.009
 wouldn+0.009
 may+0.009
 spine+0.009
icol+0.008
Neg logit contributions
old-0.010
amer-0.009
ena-0.009
od-0.009
 twins-0.009
Token she
Token ID673
Feature activation+3.49e-02

Pos logit contributions
amer+0.008
old+0.008
ena+0.007
od+0.006
 student+0.006
Neg logit contributions
 may-0.015
 wouldn-0.014
 spine-0.014
rated-0.014
suit-0.014
Token and
Token ID290
Feature activation-1.11e-02

Pos logit contributions
 may+0.009
 wouldn+0.009
 spine+0.008
suit+0.008
rated+0.008
Neg logit contributions
amer-0.012
old-0.012
ena-0.011
 office-0.011
od-0.011
Token said
Token ID531
Feature activation+1.33e-02

Pos logit contributions
 may+0.019
 wouldn+0.018
 spine+0.018
rated+0.017
suit+0.017
Neg logit contributions
amer-0.020
old-0.019
ena-0.018
od-0.017
 student-0.017
Token that
Token ID326
Feature activation+3.03e-02

Pos logit contributions
amer+0.028
old+0.026
ena+0.024
 office+0.024
 student+0.023
Neg logit contributions
 may-0.020
 wouldn-0.020
 don-0.017
 steep-0.017
suit-0.016
Token they
Token ID484
Feature activation+6.50e-03

Pos logit contributions
amer+0.060
old+0.055
ena+0.049
eper+0.047
 office+0.047
Neg logit contributions
 may-0.051
 wouldn-0.051
 don-0.047
 doesn-0.043
 steep-0.042
Token had
Token ID550
Feature activation+3.22e-02

Pos logit contributions
amer+0.013
old+0.013
ena+0.012
 office+0.012
 student+0.011
Neg logit contributions
 may-0.010
 wouldn-0.010
rated-0.009
 spine-0.009
suit-0.009
Token done
Token ID1760
Feature activation+4.88e-02

Pos logit contributions
amer+0.057
old+0.053
ena+0.047
 office+0.046
 student+0.046
Neg logit contributions
 wouldn-0.059
 may-0.058
 spine-0.052
 steep-0.052
suit-0.052
Token a
Token ID257
Feature activation-2.49e-03

Pos logit contributions
amer+0.080
old+0.075
 student+0.064
ena+0.064
 office+0.062
Neg logit contributions
 may-0.094
 wouldn-0.093
 doesn-0.083
 don-0.082
 steep-0.082
Token really
Token ID1107
Feature activation+1.34e-04

Pos logit contributions
 wouldn+0.004
 may+0.004
 spine+0.004
rated+0.004
suit+0.003
Neg logit contributions
amer-0.005
old-0.005
 student-0.004
od-0.004
ena-0.004
Token good
Token ID922
Feature activation+2.31e-02

Pos logit contributions
amer+0.000
old+0.000
ena+0.000
od+0.000
 office+0.000
Neg logit contributions
 may-0.000
 wouldn-0.000
rated-0.000
 spine-0.000
suit-0.000
Token job
Token ID1693
Feature activation+4.54e-02

Pos logit contributions
amer+0.047
old+0.045
ena+0.042
od+0.041
ful+0.040
Neg logit contributions
 may-0.034
 wouldn-0.033
 spine-0.031
rated-0.031
suit-0.030
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
amer+0.082
old+0.079
ena+0.069
eper+0.067
 student+0.067
Neg logit contributions
 wouldn-0.076
 may-0.076
 spine-0.068
suit-0.067
 don-0.067
Token he
Token ID339
Feature activation+3.65e-03

Pos logit contributions
 may+0.025
 wouldn+0.025
 spine+0.024
rated+0.023
suit+0.023
Neg logit contributions
amer-0.030
old-0.029
ena-0.027
 office-0.027
od-0.026
Token saw
Token ID2497
Feature activation+2.31e-02

Pos logit contributions
amer+0.005
old+0.005
ena+0.005
eper+0.005
od+0.005
Neg logit contributions
 spine-0.007
 may-0.007
rated-0.007
idents-0.007
icol-0.006
Token something
Token ID1223
Feature activation-1.66e-02

Pos logit contributions
amer+0.038
old+0.034
 student+0.033
ena+0.033
od+0.032
Neg logit contributions
 wouldn-0.044
 may-0.043
 steep-0.039
 don-0.038
 spine-0.037
Token!
Token ID0
Feature activation+8.05e-03

Pos logit contributions
 may+0.030
 wouldn+0.029
 spine+0.029
rated+0.028
suit+0.027
Neg logit contributions
amer-0.027
old-0.027
ena-0.024
od-0.024
 office-0.023
Token It
Token ID632
Feature activation-1.89e-02

Pos logit contributions
amer+0.013
old+0.012
ena+0.011
od+0.011
 office+0.011
Neg logit contributions
 may-0.015
 wouldn-0.015
 spine-0.014
rated-0.014
suit-0.014
Token was
Token ID373
Feature activation+1.11e-02

Pos logit contributions
 spine+0.039
 may+0.037
 wouldn+0.036
icol+0.035
suit+0.035
Neg logit contributions
amer-0.026
old-0.024
ena-0.023
od-0.023
ful-0.022
Token a
Token ID257
Feature activation-1.71e-02

Pos logit contributions
amer+0.013
old+0.012
ena+0.011
od+0.011
 office+0.011
Neg logit contributions
 may-0.025
 wouldn-0.025
 spine-0.024
rated-0.024
suit-0.024
Token small
Token ID1402
Feature activation-1.50e-02

Pos logit contributions
 wouldn+0.033
rated+0.032
 spine+0.032
 may+0.032
fy+0.030
Neg logit contributions
amer-0.025
 student-0.023
 priest-0.022
old-0.022
 office-0.022
Token boat
Token ID8848
Feature activation+4.88e-03

Pos logit contributions
 wouldn+0.029
 may+0.028
rated+0.027
 spine+0.027
fy+0.026
Neg logit contributions
amer-0.023
old-0.021
 office-0.020
 student-0.020
 twins-0.019
Token with
Token ID351
Feature activation+1.01e-02

Pos logit contributions
amer+0.008
old+0.007
ena+0.007
od+0.007
 office+0.007
Neg logit contributions
 may-0.009
 wouldn-0.009
 spine-0.009
rated-0.008
suit-0.008
Token a
Token ID257
Feature activation-1.07e-02

Pos logit contributions
old+0.017
amer+0.017
 office+0.016
 student+0.016
ena+0.015
Neg logit contributions
 may-0.018
 wouldn-0.017
 steep-0.017
suit-0.016
icol-0.015
Token said
Token ID531
Feature activation+1.30e-02

Pos logit contributions
amer+0.034
old+0.032
ena+0.029
 office+0.029
 student+0.029
Neg logit contributions
 wouldn-0.038
 may-0.038
rated-0.034
suit-0.033
 spine-0.033
Token he
Token ID339
Feature activation+8.53e-03

Pos logit contributions
amer+0.021
old+0.019
ena+0.017
 student+0.017
 office+0.017
Neg logit contributions
 may-0.025
 wouldn-0.025
 spine-0.023
rated-0.023
suit-0.023
Token wanted
Token ID2227
Feature activation+2.62e-02

Pos logit contributions
amer+0.017
old+0.016
ena+0.015
od+0.015
 student+0.015
Neg logit contributions
 may-0.013
 wouldn-0.013
 spine-0.012
rated-0.012
suit-0.012
Token to
Token ID284
Feature activation+3.01e-02

Pos logit contributions
amer+0.045
old+0.043
ena+0.039
od+0.038
 student+0.038
Neg logit contributions
 may-0.047
 wouldn-0.046
 spine-0.043
rated-0.042
suit-0.042
Token be
Token ID307
Feature activation+2.63e-02

Pos logit contributions
amer+0.063
old+0.060
ena+0.055
od+0.055
 twins+0.054
Neg logit contributions
 may-0.045
 wouldn-0.042
 spine-0.038
 don-0.038
rated-0.036
Token able
Token ID1498
Feature activation+2.06e-02

Pos logit contributions
amer+0.045
old+0.043
od+0.038
ena+0.038
daughter+0.037
Neg logit contributions
 wouldn-0.045
 may-0.042
 don-0.041
 steep-0.039
suit-0.039
Token to
Token ID284
Feature activation+2.25e-02

Pos logit contributions
amer+0.041
old+0.039
ena+0.038
od+0.036
 student+0.036
Neg logit contributions
 may-0.030
 wouldn-0.030
 spine-0.028
 steep-0.027
 don-0.026
Token spring
Token ID6076
Feature activation-3.63e-03

Pos logit contributions
amer+0.046
old+0.045
ena+0.042
od+0.040
 student+0.040
Neg logit contributions
 may-0.033
 wouldn-0.031
 spine-0.029
 don-0.028
rated-0.028
Token.
Token ID13
Feature activation-8.63e-03

Pos logit contributions
 may+0.007
 wouldn+0.007
 spine+0.006
rated+0.006
suit+0.006
Neg logit contributions
amer-0.006
old-0.006
ena-0.005
od-0.005
 office-0.005
Token
Token ID220
Feature activation-1.41e-02

Pos logit contributions
 may+0.015
 wouldn+0.014
 spine+0.014
rated+0.013
suit+0.013
Neg logit contributions
amer-0.015
old-0.015
ena-0.014
od-0.013
 student-0.013
Token
Token ID198
Feature activation-1.84e-03

Pos logit contributions
 may+0.026
 wouldn+0.025
 spine+0.023
 steep+0.023
suit+0.023
Neg logit contributions
amer-0.024
old-0.021
 student-0.020
ena-0.020
 office-0.020
Token with
Token ID351
Feature activation+1.56e-02

Pos logit contributions
 may+0.033
 wouldn+0.033
 spine+0.031
rated+0.031
suit+0.031
Neg logit contributions
amer-0.026
old-0.025
ena-0.022
od-0.022
 office-0.021
Token a
Token ID257
Feature activation-5.08e-03

Pos logit contributions
amer+0.030
old+0.028
 priest+0.026
 office+0.026
card+0.025
Neg logit contributions
 may-0.023
 don-0.021
 steep-0.021
 wouldn-0.020
suit-0.019
Token grape
Token ID30777
Feature activation+2.53e-02

Pos logit contributions
 may+0.008
 wouldn+0.007
suit+0.007
 spine+0.007
 steep+0.007
Neg logit contributions
amer-0.010
old-0.010
ena-0.009
od-0.009
 office-0.009
Token.
Token ID13
Feature activation+2.01e-02

Pos logit contributions
amer+0.047
old+0.046
ena+0.042
daughter+0.041
Kat+0.040
Neg logit contributions
 may-0.041
 wouldn-0.040
suit-0.037
 steep-0.037
 spine-0.037
Token Chris
Token ID5180
Feature activation+6.82e-03

Pos logit contributions
amer+0.039
old+0.037
ful+0.036
 priest+0.034
eper+0.033
Neg logit contributions
 may-0.034
 wouldn-0.030
 spine-0.028
 steep-0.028
 doesn-0.028
Token wanted
Token ID2227
Feature activation+2.37e-02

Pos logit contributions
amer+0.012
old+0.012
ena+0.011
od+0.010
 office+0.010
Neg logit contributions
 may-0.012
 wouldn-0.011
 spine-0.011
rated-0.011
suit-0.010
Token to
Token ID284
Feature activation+2.25e-02

Pos logit contributions
amer+0.053
old+0.049
ena+0.045
 office+0.045
 student+0.044
Neg logit contributions
 may-0.031
 wouldn-0.030
 steep-0.025
 doesn-0.024
 don-0.024
Token hold
Token ID1745
Feature activation-1.13e-02

Pos logit contributions
amer+0.050
old+0.047
ena+0.045
 office+0.044
od+0.043
Neg logit contributions
 may-0.030
 wouldn-0.028
 don-0.025
 spine-0.025
rated-0.024
Token the
Token ID262
Feature activation+1.93e-02

Pos logit contributions
 may+0.021
 wouldn+0.021
suit+0.019
 spine+0.019
rated+0.019
Neg logit contributions
amer-0.018
old-0.018
ena-0.016
 office-0.016
 student-0.016
Token grape
Token ID30777
Feature activation-1.33e-02

Pos logit contributions
amer+0.034
old+0.034
ena+0.030
 office+0.029
od+0.029
Neg logit contributions
 may-0.033
 wouldn-0.032
 spine-0.031
rated-0.030
suit-0.030
Token,
Token ID11
Feature activation-4.20e-02

Pos logit contributions
 may+0.021
 wouldn+0.020
 spine+0.019
rated+0.019
suit+0.019
Neg logit contributions
amer-0.025
old-0.024
ena-0.022
od-0.022
 student-0.022
Token bur
Token ID4356
Feature activation-4.66e-02

Pos logit contributions
 wouldn+0.071
 may+0.070
rated+0.065
 spine+0.063
suit+0.061
Neg logit contributions
amer-0.081
old-0.077
 office-0.073
ena-0.071
 student-0.071
Tokenrow
Token ID808
Feature activation+1.14e-02

Pos logit contributions
 may+0.088
 wouldn+0.087
rated+0.080
 spine+0.079
suit+0.078
Neg logit contributions
amer-0.076
old-0.073
 office-0.064
ena-0.064
ful-0.063
Token at
Token ID379
Feature activation+4.65e-03

Pos logit contributions
amer+0.019
old+0.019
ena+0.018
od+0.017
 student+0.017
Neg logit contributions
 wouldn-0.019
 may-0.019
 spine-0.019
suit-0.018
rated-0.018
Token the
Token ID262
Feature activation-1.61e-02

Pos logit contributions
amer+0.008
old+0.008
ena+0.007
od+0.007
 student+0.007
Neg logit contributions
 may-0.008
 wouldn-0.008
 spine-0.007
 steep-0.007
rated-0.007
Token top
Token ID1353
Feature activation+1.72e-02

Pos logit contributions
 may+0.028
 wouldn+0.027
 spine+0.026
rated+0.026
suit+0.025
Neg logit contributions
amer-0.028
old-0.027
ena-0.025
od-0.024
 student-0.024
Token of
Token ID286
Feature activation+4.63e-03

Pos logit contributions
amer+0.019
old+0.018
ena+0.017
od+0.016
eper+0.015
Neg logit contributions
 may-0.040
 steep-0.039
 spine-0.038
 wouldn-0.038
fy-0.037
Token a
Token ID257
Feature activation-2.13e-02

Pos logit contributions
amer+0.007
old+0.007
ena+0.006
od+0.006
 student+0.006
Neg logit contributions
 may-0.009
 wouldn-0.009
 spine-0.008
 steep-0.008
rated-0.008
Token hill
Token ID12788
Feature activation+1.97e-02

Pos logit contributions
 wouldn+0.044
rated+0.043
 spine+0.043
 may+0.041
icol+0.041
Neg logit contributions
 office-0.026
amer-0.026
 student-0.026
 priest-0.025
 twins-0.024
Token.
Token ID13
Feature activation-1.30e-02

Pos logit contributions
amer+0.033
old+0.033
ena+0.030
od+0.030
 student+0.029
Neg logit contributions
 wouldn-0.034
 may-0.034
 spine-0.032
 steep-0.032
suit-0.032
Token One
Token ID1881
Feature activation-1.48e-02

Pos logit contributions
 wouldn+0.025
 may+0.025
rated+0.024
 spine+0.024
suit+0.023
Neg logit contributions
amer-0.020
old-0.019
ena-0.017
 office-0.017
od-0.017
Token day
Token ID1110
Feature activation-4.55e-02

Pos logit contributions
 wouldn+0.030
 may+0.030
icol+0.028
 spine+0.028
suit+0.028
Neg logit contributions
old-0.021
amer-0.021
ena-0.019
 office-0.019
 student-0.018
Token remembered
Token ID12086
Feature activation+2.48e-02

Pos logit contributions
amer+0.008
old+0.008
ena+0.008
 office+0.007
od+0.007
Neg logit contributions
 spine-0.007
 may-0.006
rated-0.006
icol-0.006
suit-0.006
Token was
Token ID373
Feature activation+1.83e-02

Pos logit contributions
amer+0.035
old+0.034
ena+0.031
 office+0.030
od+0.030
Neg logit contributions
 may-0.049
 wouldn-0.048
 spine-0.048
rated-0.047
suit-0.047
Token the
Token ID262
Feature activation-4.16e-03

Pos logit contributions
amer+0.029
old+0.028
ena+0.025
od+0.024
 office+0.024
Neg logit contributions
 may-0.035
 wouldn-0.035
 spine-0.033
rated-0.033
suit-0.032
Token peaceful
Token ID12309
Feature activation-1.08e-02

Pos logit contributions
 may+0.008
 wouldn+0.008
 spine+0.007
rated+0.007
suit+0.007
Neg logit contributions
amer-0.007
old-0.006
 office-0.006
 student-0.006
 twins-0.006
Token river
Token ID7850
Feature activation+7.50e-03

Pos logit contributions
 wouldn+0.019
 may+0.019
fy+0.018
 spine+0.018
suit+0.018
Neg logit contributions
amer-0.017
old-0.016
 office-0.016
 twins-0.015
 student-0.015
Token.
Token ID13
Feature activation+6.22e-03

Pos logit contributions
amer+0.012
old+0.011
ena+0.010
od+0.010
 office+0.010
Neg logit contributions
 may-0.014
 wouldn-0.014
 spine-0.013
rated-0.013
suit-0.013
Token<|endoftext|>
Token ID50256
Feature activation+2.80e-03

Pos logit contributions
amer+0.010
old+0.010
ena+0.009
od+0.009
 student+0.009
Neg logit contributions
 may-0.012
 wouldn-0.011
 spine-0.011
rated-0.011
suit-0.010
TokenOnce
Token ID7454
Feature activation-1.52e-02

Pos logit contributions
amer+0.004
old+0.004
ena+0.004
od+0.004
 office+0.004
Neg logit contributions
 wouldn-0.005
 may-0.005
 spine-0.005
rated-0.005
suit-0.005
Token upon
Token ID2402
Feature activation+1.32e-03

Pos logit contributions
 may+0.015
 wouldn+0.015
 spine+0.014
rated+0.013
suit+0.013
Neg logit contributions
amer-0.037
old-0.036
ena-0.034
od-0.034
 office-0.033
Token a
Token ID257
Feature activation-4.06e-02

Pos logit contributions
amer+0.002
old+0.002
od+0.002
 student+0.002
 office+0.002
Neg logit contributions
 may-0.003
 wouldn-0.002
 don-0.002
 steep-0.002
rated-0.002
Token time
Token ID640
Feature activation-3.58e-02

Pos logit contributions
 wouldn+0.065
 may+0.061
suit+0.056
 spine+0.055
rated+0.054
Neg logit contributions
old-0.078
ena-0.073
 student-0.069
amer-0.068
 twins-0.067
Token
Token ID198
Feature activation+1.67e-02

Pos logit contributions
old+0.003
amer+0.003
ena+0.003
 office+0.003
od+0.003
Neg logit contributions
 may-0.004
 wouldn-0.004
rated-0.004
 spine-0.003
suit-0.003
Token
Token ID198
Feature activation+1.41e-02

Pos logit contributions
amer+0.028
old+0.024
 student+0.024
od+0.023
ena+0.023
Neg logit contributions
 may-0.031
 wouldn-0.030
 steep-0.028
suit-0.027
rated-0.027
Token"
Token ID1
Feature activation+1.40e-02

Pos logit contributions
amer+0.025
od+0.021
old+0.021
 student+0.021
 office+0.021
Neg logit contributions
 may-0.024
 wouldn-0.023
 steep-0.023
fy-0.022
rated-0.022
TokenDon
Token ID3987
Feature activation-2.36e-02

Pos logit contributions
amer+0.024
old+0.023
ena+0.021
od+0.021
 office+0.021
Neg logit contributions
 may-0.024
 wouldn-0.024
 spine-0.023
rated-0.023
suit-0.022
Token't
Token ID470
Feature activation-2.33e-02

Pos logit contributions
 may+0.038
 wouldn+0.038
 spine+0.037
rated+0.036
suit+0.035
Neg logit contributions
amer-0.043
old-0.042
ena-0.038
od-0.038
 office-0.037
Token touch
Token ID3638
Feature activation-2.64e-02

Pos logit contributions
suit+0.040
 wouldn+0.039
 may+0.039
icol+0.038
 spine+0.037
Neg logit contributions
amer-0.038
old-0.037
ful-0.035
od-0.034
 office-0.033
Token it
Token ID340
Feature activation-2.97e-02

Pos logit contributions
 may+0.053
 wouldn+0.053
 spine+0.051
rated+0.050
suit+0.050
Neg logit contributions
amer-0.038
old-0.036
ena-0.032
od-0.032
 office-0.032
Token,
Token ID11
Feature activation-1.73e-02

Pos logit contributions
 may+0.053
 wouldn+0.051
 spine+0.051
icol+0.050
suit+0.050
Neg logit contributions
amer-0.049
old-0.047
od-0.043
 office-0.043
ful-0.042
Token little
Token ID1310
Feature activation+2.65e-03

Pos logit contributions
 may+0.031
 wouldn+0.031
 spine+0.031
suit+0.030
rated+0.030
Neg logit contributions
amer-0.027
old-0.027
ena-0.025
 office-0.024
od-0.024
Token child
Token ID1200
Feature activation-2.17e-03

Pos logit contributions
amer+0.003
old+0.003
ena+0.003
od+0.003
 student+0.003
Neg logit contributions
 may-0.006
 wouldn-0.006
 spine-0.006
rated-0.006
suit-0.006
Token,"
Token ID553
Feature activation-2.22e-03

Pos logit contributions
 may+0.004
 wouldn+0.004
rated+0.003
 spine+0.003
suit+0.003
Neg logit contributions
amer-0.004
old-0.004
ena-0.003
 student-0.003
od-0.003
Token and
Token ID290
Feature activation-4.03e-02

Pos logit contributions
 may+0.088
 wouldn+0.082
rated+0.081
suit+0.080
fy+0.079
Neg logit contributions
amer-0.048
old-0.047
ful-0.045
 office-0.041
ena-0.040
Token tired
Token ID10032
Feature activation-3.14e-02

Pos logit contributions
 spine+0.076
icol+0.075
suit+0.071
 may+0.068
rated+0.067
Neg logit contributions
ful-0.056
old-0.054
amer-0.052
 twins-0.051
 who-0.049
Token.
Token ID13
Feature activation-2.06e-02

Pos logit contributions
 may+0.071
 wouldn+0.068
rated+0.067
 spine+0.066
suit+0.066
Neg logit contributions
amer-0.037
old-0.036
ful-0.032
ena-0.030
 office-0.030
Token
Token ID198
Feature activation-7.95e-03

Pos logit contributions
rated+0.045
 wouldn+0.044
 may+0.044
 spine+0.044
suit+0.042
Neg logit contributions
old-0.026
amer-0.024
ena-0.022
 office-0.021
 twins-0.021
Token
Token ID198
Feature activation-1.62e-02

Pos logit contributions
 wouldn+0.017
 may+0.017
 spine+0.017
rated+0.017
suit+0.016
Neg logit contributions
amer-0.010
old-0.010
ena-0.009
od-0.008
 office-0.008
TokenAs
Token ID1722
Feature activation-2.63e-02

Pos logit contributions
 may+0.027
 wouldn+0.027
 spine+0.026
rated+0.025
suit+0.025
Neg logit contributions
amer-0.029
old-0.028
ena-0.026
 student-0.025
 office-0.025
Token Max
Token ID5436
Feature activation-2.82e-02

Pos logit contributions
rated+0.048
 wouldn+0.046
 spine+0.046
 may+0.046
suit+0.046
Neg logit contributions
amer-0.042
old-0.042
ena-0.038
 student-0.038
 office-0.038
Token walked
Token ID6807
Feature activation-4.90e-02

Pos logit contributions
 spine+0.056
rated+0.054
idents+0.052
 may+0.052
icol+0.050
Neg logit contributions
old-0.040
ena-0.037
amer-0.037
 who-0.035
ful-0.033
Token back
Token ID736
Feature activation-2.50e-02

Pos logit contributions
 may+0.087
 wouldn+0.084
rated+0.081
icol+0.080
 spine+0.079
Neg logit contributions
amer-0.085
old-0.083
ful-0.073
eper-0.072
 office-0.072
Token home
Token ID1363
Feature activation-1.13e-02

Pos logit contributions
rated+0.057
 wouldn+0.055
 may+0.055
fy+0.052
 spine+0.052
Neg logit contributions
old-0.032
amer-0.028
 office-0.028
ful-0.024
 student-0.023
Token,
Token ID11
Feature activation-1.59e-02

Pos logit contributions
 may+0.020
rated+0.019
icol+0.018
 wouldn+0.018
suit+0.018
Neg logit contributions
old-0.019
amer-0.018
 office-0.018
ful-0.017
ena-0.017
Token
Token ID198
Feature activation+4.06e-03

Pos logit contributions
 may+0.025
 wouldn+0.024
 spine+0.023
rated+0.023
suit+0.023
Neg logit contributions
amer-0.020
old-0.019
ena-0.017
od-0.017
 student-0.017
Token
Token ID198
Feature activation-8.06e-03

Pos logit contributions
amer+0.008
old+0.007
 office+0.007
ena+0.007
 student+0.006
Neg logit contributions
 may-0.007
 wouldn-0.006
 steep-0.006
 don-0.006
 spine-0.006
TokenOne
Token ID3198
Feature activation-8.99e-03

Pos logit contributions
 may+0.013
 steep+0.012
 wouldn+0.012
rated+0.011
 don+0.011
Neg logit contributions
amer-0.015
old-0.015
ena-0.013
od-0.013
 office-0.013
Token day
Token ID1110
Feature activation-3.09e-02

Pos logit contributions
 may+0.016
 wouldn+0.016
 spine+0.015
rated+0.015
suit+0.014
Neg logit contributions
amer-0.015
old-0.015
ena-0.014
od-0.013
 office-0.013
Token,
Token ID11
Feature activation-2.53e-02

Pos logit contributions
 spine+0.066
rated+0.066
suit+0.062
 wouldn+0.061
icol+0.061
Neg logit contributions
old-0.039
 twins-0.036
amer-0.036
 who-0.036
 of-0.034
Token the
Token ID262
Feature activation-9.08e-03

Pos logit contributions
rated+0.066
 spine+0.064
icol+0.064
suit+0.063
fy+0.063
Neg logit contributions
amer-0.021
 twins-0.018
old-0.017
 who-0.017
 student-0.016
Token farmer
Token ID18739
Feature activation+3.42e-02

Pos logit contributions
 may+0.020
 wouldn+0.020
 spine+0.020
rated+0.020
suit+0.019
Neg logit contributions
amer-0.011
old-0.010
 office-0.009
 twins-0.009
ena-0.009
Token was
Token ID373
Feature activation+1.98e-02

Pos logit contributions
amer+0.054
old+0.053
 office+0.048
ena+0.048
od+0.047
Neg logit contributions
 may-0.063
 spine-0.063
 wouldn-0.060
icol-0.060
rated-0.059
Token too
Token ID1165
Feature activation-7.70e-05

Pos logit contributions
amer+0.036
old+0.035
ena+0.032
od+0.031
 student+0.031
Neg logit contributions
 wouldn-0.032
 may-0.032
 spine-0.030
rated-0.030
suit-0.029
Token busy
Token ID8179
Feature activation+1.91e-02

Pos logit contributions
 may+0.000
 wouldn+0.000
icol+0.000
 spine+0.000
rated+0.000
Neg logit contributions
amer-0.000
old-0.000
ful-0.000
ena-0.000
 student-0.000
Token to
Token ID284
Feature activation+2.45e-02

Pos logit contributions
amer+0.028
old+0.028
ena+0.024
 office+0.024
od+0.024
Neg logit contributions
 may-0.038
 wouldn-0.037
rated-0.036
 spine-0.036
suit-0.035
Token said
Token ID531
Feature activation-1.16e-02

Pos logit contributions
amer+0.023
old+0.022
ena+0.020
od+0.020
 office+0.020
Neg logit contributions
 may-0.025
 wouldn-0.024
 spine-0.023
rated-0.023
suit-0.022
Token,
Token ID11
Feature activation-2.03e-02

Pos logit contributions
rated+0.019
 spine+0.019
 may+0.018
 steep+0.018
suit+0.018
Neg logit contributions
old-0.020
ful-0.019
amer-0.018
od-0.018
ena-0.018
Token "
Token ID366
Feature activation+4.53e-03

Pos logit contributions
 may+0.041
 wouldn+0.040
 spine+0.040
rated+0.040
suit+0.039
Neg logit contributions
amer-0.028
old-0.027
ena-0.025
od-0.024
 office-0.024
TokenLet
Token ID5756
Feature activation-4.47e-02

Pos logit contributions
amer+0.008
old+0.008
 office+0.008
od+0.008
ena+0.007
Neg logit contributions
 wouldn-0.007
 may-0.007
rated-0.007
suit-0.006
 don-0.006
Token's
Token ID338
Feature activation-8.65e-03

Pos logit contributions
 spine+0.106
 may+0.103
suit+0.103
 wouldn+0.102
 steep+0.100
Neg logit contributions
old-0.048
amer-0.045
ena-0.039
 office-0.039
od-0.038
Token sit
Token ID1650
Feature activation-3.78e-02

Pos logit contributions
 may+0.014
 spine+0.014
 wouldn+0.014
suit+0.014
rated+0.014
Neg logit contributions
amer-0.015
old-0.014
 office-0.014
ena-0.013
ful-0.013
Token down
Token ID866
Feature activation-1.34e-02

Pos logit contributions
 may+0.079
 spine+0.078
rated+0.076
 wouldn+0.075
icol+0.072
Neg logit contributions
amer-0.057
 office-0.048
eper-0.047
ena-0.046
ful-0.045
Token and
Token ID290
Feature activation-3.83e-02

Pos logit contributions
 may+0.025
 wouldn+0.025
 spine+0.024
rated+0.024
suit+0.024
Neg logit contributions
amer-0.021
old-0.019
ena-0.018
 office-0.018
ful-0.018
Token take
Token ID1011
Feature activation-3.34e-02

Pos logit contributions
suit+0.073
rated+0.072
 AW+0.069
 may+0.067
 wouldn+0.066
Neg logit contributions
old-0.051
 office-0.051
amer-0.050
 who-0.049
 priest-0.048
Token a
Token ID257
Feature activation-5.02e-02

Pos logit contributions
 may+0.071
 wouldn+0.071
 spine+0.069
rated+0.068
suit+0.067
Neg logit contributions
amer-0.044
old-0.043
ena-0.037
od-0.036
 office-0.036
Token break
Token ID2270
Feature activation-1.65e-02

Pos logit contributions
 wouldn+0.071
rated+0.068
 may+0.064
icol+0.063
idents+0.063
Neg logit contributions
old-0.098
 priest-0.091
 student-0.089
amer-0.086
ful-0.086
Token you
Token ID345
Feature activation-1.87e-02

Pos logit contributions
 may+0.034
 wouldn+0.034
rated+0.033
 spine+0.033
suit+0.032
Neg logit contributions
amer-0.029
old-0.028
ena-0.026
 office-0.025
 student-0.025
Token are
Token ID389
Feature activation+1.31e-02

Pos logit contributions
icol+0.030
 spine+0.028
suit+0.026
rated+0.026
 steep+0.024
Neg logit contributions
old-0.034
amer-0.034
 twins-0.032
od-0.031
eper-0.031
Token hurting
Token ID21530
Feature activation+5.79e-04

Pos logit contributions
amer+0.023
old+0.022
ena+0.021
od+0.020
 office+0.020
Neg logit contributions
 may-0.022
 wouldn-0.022
 spine-0.021
rated-0.021
suit-0.020
Token me
Token ID502
Feature activation-4.77e-02

Pos logit contributions
amer+0.001
old+0.001
od+0.001
ena+0.001
 student+0.001
Neg logit contributions
 may-0.001
 wouldn-0.001
rated-0.001
 spine-0.001
 steep-0.001
Token and
Token ID290
Feature activation-2.55e-02

Pos logit contributions
 may+0.089
 wouldn+0.087
 spine+0.083
suit+0.081
icol+0.080
Neg logit contributions
amer-0.078
old-0.075
ena-0.068
 office-0.067
od-0.066
Token my
Token ID616
Feature activation-1.38e-02

Pos logit contributions
 may+0.048
 wouldn+0.047
 spine+0.047
rated+0.047
suit+0.046
Neg logit contributions
amer-0.038
old-0.037
ena-0.034
 office-0.033
 student-0.033
Token things
Token ID1243
Feature activation-2.16e-02

Pos logit contributions
 may+0.028
 wouldn+0.028
 spine+0.026
rated+0.026
suit+0.026
Neg logit contributions
amer-0.020
old-0.019
ena-0.017
 office-0.017
 student-0.017
Token!"
Token ID2474
Feature activation+8.57e-03

Pos logit contributions
 may+0.040
 wouldn+0.040
 spine+0.038
rated+0.038
suit+0.037
Neg logit contributions
amer-0.034
old-0.033
ena-0.030
od-0.029
 office-0.029
Token the
Token ID262
Feature activation+2.32e-02

Pos logit contributions
amer+0.016
old+0.014
od+0.014
reating+0.013
 student+0.013
Neg logit contributions
 may-0.014
 wouldn-0.014
 don-0.013
 spine-0.013
 doesn-0.012
Token fairy
Token ID25607
Feature activation+2.89e-02

Pos logit contributions
amer+0.035
old+0.034
od+0.029
ena+0.029
ful+0.027
Neg logit contributions
 may-0.046
 wouldn-0.044
 spine-0.043
suit-0.042
 don-0.041
Token said
Token ID531
Feature activation+1.44e-05

Pos logit contributions
amer+0.054
old+0.049
reating+0.048
 office+0.046
 student+0.046
Neg logit contributions
 wouldn-0.049
 may-0.046
rated-0.041
 don-0.040
suit-0.040
Token end
Token ID886
Feature activation-2.16e-03

Pos logit contributions
old+0.000
amer+0.000
 office+0.000
 twins+0.000
 student+0.000
Neg logit contributions
 wouldn-0.000
 may-0.000
rated-0.000
 spine-0.000
icol-0.000
Token.
Token ID13
Feature activation-1.54e-03

Pos logit contributions
 may+0.004
 wouldn+0.004
 spine+0.004
rated+0.004
 steep+0.004
Neg logit contributions
amer-0.003
old-0.003
ena-0.003
od-0.003
 student-0.003
Token<|endoftext|>
Token ID50256
Feature activation+3.05e-03

Pos logit contributions
 may+0.003
 wouldn+0.002
 spine+0.002
 doesn+0.002
 don+0.002
Neg logit contributions
amer-0.003
old-0.003
 priest-0.002
ena-0.002
 student-0.002
TokenOnce
Token ID7454
Feature activation-1.55e-02

Pos logit contributions
amer+0.004
old+0.004
ena+0.003
od+0.003
 office+0.003
Neg logit contributions
 may-0.006
 wouldn-0.006
 spine-0.006
rated-0.006
suit-0.006
Token upon
Token ID2402
Feature activation-5.03e-03

Pos logit contributions
 wouldn+0.017
 may+0.017
 spine+0.016
rated+0.015
suit+0.015
Neg logit contributions
amer-0.036
old-0.036
ena-0.033
od-0.033
 student-0.032
Token a
Token ID257
Feature activation-5.46e-02

Pos logit contributions
 may+0.010
 wouldn+0.010
 don+0.009
rated+0.009
 spine+0.009
Neg logit contributions
amer-0.008
old-0.007
 office-0.006
od-0.006
 student-0.006
Token time
Token ID640
Feature activation-4.07e-02

Pos logit contributions
 wouldn+0.092
 may+0.086
 spine+0.081
suit+0.077
drops+0.077
Neg logit contributions
old-0.098
ena-0.094
 student-0.087
amer-0.086
 twins-0.086
Token there
Token ID612
Feature activation-1.36e-02

Pos logit contributions
rated+0.080
 wouldn+0.078
 spine+0.077
icol+0.076
fy+0.076
Neg logit contributions
old-0.060
od-0.056
ful-0.055
amer-0.054
 twins-0.053
Token was
Token ID373
Feature activation-1.74e-02

Pos logit contributions
 may+0.025
 wouldn+0.025
 spine+0.024
rated+0.023
suit+0.023
Neg logit contributions
amer-0.022
old-0.021
ena-0.019
od-0.019
 office-0.018
Token a
Token ID257
Feature activation-2.45e-02

Pos logit contributions
 may+0.035
 wouldn+0.035
 spine+0.033
 steep+0.033
rated+0.033
Neg logit contributions
amer-0.025
old-0.023
 office-0.020
ena-0.020
 student-0.020
Token nice
Token ID3621
Feature activation+8.76e-04

Pos logit contributions
 wouldn+0.047
 may+0.046
rated+0.045
 spine+0.045
icol+0.042
Neg logit contributions
amer-0.037
 student-0.035
old-0.035
 twins-0.035
 office-0.034
Token knee
Token ID10329
Feature activation-4.14e-02

Pos logit contributions
 may+0.081
 wouldn+0.077
rated+0.074
icol+0.072
suit+0.072
Neg logit contributions
amer-0.058
 office-0.054
old-0.054
 twins-0.053
ena-0.052
Token.
Token ID13
Feature activation-2.43e-02

Pos logit contributions
 may+0.103
icol+0.102
suit+0.099
 spine+0.097
rated+0.096
Neg logit contributions
amer-0.031
ful-0.031
old-0.029
 office-0.029
Her-0.026
Token It
Token ID632
Feature activation-5.22e-02

Pos logit contributions
rated+0.057
fy+0.056
 may+0.056
suit+0.055
 wouldn+0.055
Neg logit contributions
old-0.023
 office-0.021
Her-0.021
amer-0.020
 twins-0.019
Token made
Token ID925
Feature activation-2.09e-02

Pos logit contributions
 spine+0.095
icol+0.091
suit+0.091
 may+0.087
fy+0.084
Neg logit contributions
amer-0.079
old-0.073
ful-0.072
ena-0.069
Her-0.069
Token him
Token ID683
Feature activation-5.90e-02

Pos logit contributions
 may+0.041
 wouldn+0.040
 spine+0.038
rated+0.037
 steep+0.037
Neg logit contributions
amer-0.032
old-0.031
ena-0.028
 student-0.027
od-0.027
Token feel
Token ID1254
Feature activation-5.30e-02

Pos logit contributions
 may+0.093
icol+0.089
suit+0.088
 spine+0.088
 wouldn+0.087
Neg logit contributions
old-0.104
amer-0.102
ful-0.094
 twins-0.094
 office-0.093
Token very
Token ID845
Feature activation-6.14e-02

Pos logit contributions
 spine+0.096
suit+0.094
icol+0.090
rated+0.088
 may+0.087
Neg logit contributions
old-0.085
ful-0.080
amer-0.080
 twins-0.078
Her-0.072
Token sad
Token ID6507
Feature activation-2.58e-02

Pos logit contributions
icol+0.121
suit+0.121
 spine+0.113
 may+0.112
 hips+0.106
Neg logit contributions
old-0.087
 twins-0.081
ful-0.079
Her-0.075
amer-0.075
Token and
Token ID290
Feature activation-4.38e-02

Pos logit contributions
 may+0.050
 wouldn+0.048
suit+0.047
rated+0.047
 spine+0.047
Neg logit contributions
amer-0.038
old-0.037
ful-0.034
 twins-0.034
 office-0.033
Token he
Token ID339
Feature activation-2.44e-02

Pos logit contributions
suit+0.070
 spine+0.070
 may+0.069
icol+0.066
rated+0.064
Neg logit contributions
amer-0.074
old-0.073
 twins-0.068
 office-0.067
ful-0.067
Token started
Token ID2067
Feature activation+2.59e-03

Pos logit contributions
 spine+0.039
suit+0.036
icol+0.036
 steep+0.036
rated+0.035
Neg logit contributions
old-0.041
amer-0.040
 twins-0.039
ena-0.038
 who-0.036
Token woke
Token ID19092
Feature activation-6.16e-02

Pos logit contributions
 spine+0.067
icol+0.063
suit+0.063
inters+0.063
 steep+0.061
Neg logit contributions
old-0.042
amer-0.040
ena-0.038
 office-0.038
 of-0.038
Token up
Token ID510
Feature activation-2.72e-02

Pos logit contributions
suit+0.123
 spine+0.121
 may+0.120
 wouldn+0.112
 steep+0.111
Neg logit contributions
old-0.082
amer-0.079
od-0.076
 office-0.075
Her-0.074
Token,
Token ID11
Feature activation-1.99e-02

Pos logit contributions
rated+0.059
suit+0.059
 may+0.056
 wouldn+0.056
 spine+0.055
Neg logit contributions
 of-0.030
 her-0.030
old-0.029
 twins-0.028
ful-0.027
Token it
Token ID340
Feature activation-1.84e-02

Pos logit contributions
rated+0.050
 spine+0.050
suit+0.049
fy+0.048
icol+0.047
Neg logit contributions
 twins-0.016
old-0.015
 student-0.014
 office-0.014
amer-0.014
Token felt
Token ID2936
Feature activation-3.79e-02

Pos logit contributions
 spine+0.039
suit+0.036
icol+0.035
inters+0.035
 AW+0.034
Neg logit contributions
amer-0.022
old-0.021
ena-0.021
 office-0.020
 twins-0.019
Token much
Token ID881
Feature activation-5.28e-02

Pos logit contributions
 spine+0.081
suit+0.076
icol+0.075
rated+0.074
 may+0.073
Neg logit contributions
old-0.045
ful-0.045
 twins-0.042
Her-0.042
amer-0.041
Token better
Token ID1365
Feature activation-1.70e-02

Pos logit contributions
suit+0.116
 spine+0.115
 wouldn+0.110
 may+0.109
icol+0.108
Neg logit contributions
old-0.057
 twins-0.054
 of-0.051
amer-0.051
 office-0.047
Token.
Token ID13
Feature activation-1.17e-02

Pos logit contributions
 may+0.036
 wouldn+0.035
rated+0.034
suit+0.034
 spine+0.033
Neg logit contributions
amer-0.023
old-0.021
ena-0.020
ful-0.020
 office-0.019
Token The
Token ID383
Feature activation-8.89e-03

Pos logit contributions
 may+0.020
 wouldn+0.020
 spine+0.019
rated+0.019
suit+0.018
Neg logit contributions
amer-0.020
old-0.020
ena-0.018
od-0.018
 office-0.017
Token w
Token ID266
Feature activation-2.99e-02

Pos logit contributions
 wouldn+0.019
 spine+0.019
 may+0.018
icol+0.018
rated+0.018
Neg logit contributions
 twins-0.012
amer-0.010
 office-0.010
 student-0.010
old-0.010
Tokenaffle
Token ID30697
Feature activation+5.09e-03

Pos logit contributions
 wouldn+0.058
 may+0.057
 spine+0.056
 steep+0.055
rated+0.055
Neg logit contributions
old-0.047
amer-0.045
 twins-0.040
od-0.039
ena-0.039
Token floating
Token ID12462
Feature activation-9.73e-03

Pos logit contributions
amer+0.036
old+0.036
ena+0.033
 student+0.032
od+0.031
Neg logit contributions
 wouldn-0.034
 may-0.034
 spine-0.031
 steep-0.030
suit-0.030
Token in
Token ID287
Feature activation-5.46e-03

Pos logit contributions
 may+0.020
 wouldn+0.019
 spine+0.019
icol+0.018
rated+0.018
Neg logit contributions
amer-0.014
old-0.012
ful-0.012
ena-0.012
od-0.011
Token the
Token ID262
Feature activation-3.88e-02

Pos logit contributions
 may+0.009
 steep+0.008
 wouldn+0.008
 don+0.008
rated+0.007
Neg logit contributions
amer-0.011
old-0.010
ena-0.009
od-0.009
 student-0.009
Token water
Token ID1660
Feature activation+6.81e-03

Pos logit contributions
 wouldn+0.066
 may+0.065
 spine+0.062
rated+0.061
suit+0.059
Neg logit contributions
amer-0.068
old-0.065
 office-0.060
ena-0.059
 twins-0.058
Token.
Token ID13
Feature activation-2.57e-02

Pos logit contributions
amer+0.010
old+0.010
ena+0.009
od+0.009
 student+0.009
Neg logit contributions
 may-0.014
 wouldn-0.013
 spine-0.013
rated-0.013
suit-0.012
Token D
Token ID360
Feature activation-5.18e-02

Pos logit contributions
rated+0.059
 spine+0.057
 wouldn+0.056
fy+0.055
 may+0.055
Neg logit contributions
old-0.024
Her-0.024
amer-0.023
 twins-0.022
ful-0.021
Tokenucky
Token ID5309
Feature activation-1.76e-02

Pos logit contributions
 steep+0.113
 wouldn+0.112
icol+0.111
 spine+0.111
rated+0.107
Neg logit contributions
old-0.060
 twins-0.055
ily-0.053
amer-0.053
ful-0.052
Token thought
Token ID1807
Feature activation-2.63e-02

Pos logit contributions
 may+0.032
 spine+0.032
 wouldn+0.031
rated+0.030
suit+0.030
Neg logit contributions
amer-0.027
old-0.027
ena-0.024
od-0.024
 twins-0.024
Token it
Token ID340
Feature activation-5.12e-02

Pos logit contributions
rated+0.062
suit+0.060
 spine+0.060
icol+0.059
fy+0.056
Neg logit contributions
old-0.029
ful-0.026
od-0.025
 of-0.023
 office-0.022
Token was
Token ID373
Feature activation-1.29e-03

Pos logit contributions
 spine+0.113
icol+0.098
 injuries+0.097
fy+0.097
aj+0.096
Neg logit contributions
od-0.059
amer-0.059
old-0.054
 kinds-0.053
 who-0.053
Token a
Token ID257
Feature activation-2.68e-02

Pos logit contributions
 spine+0.003
 may+0.003
 wouldn+0.003
rated+0.003
fy+0.003
Neg logit contributions
amer-0.002
old-0.001
ena-0.001
 twins-0.001
od-0.001
Token hand
Token ID1021
Feature activation-1.71e-02

Pos logit contributions
rated+0.069
 wouldn+0.068
 may+0.067
 steep+0.063
 spine+0.062
Neg logit contributions
amer-0.048
old-0.047
ena-0.044
 office-0.044
 student-0.043
Token.
Token ID13
Feature activation-5.80e-03

Pos logit contributions
 may+0.037
 wouldn+0.036
rated+0.035
 spine+0.034
suit+0.034
Neg logit contributions
amer-0.022
old-0.021
ena-0.019
 office-0.018
ful-0.018
Token She
Token ID1375
Feature activation-8.29e-03

Pos logit contributions
rated+0.015
 wouldn+0.015
 may+0.014
 spine+0.014
suit+0.014
Neg logit contributions
old-0.005
amer-0.005
ena-0.004
 student-0.004
od-0.004
Token feels
Token ID5300
Feature activation-1.53e-02

Pos logit contributions
 spine+0.015
rated+0.015
suit+0.014
 wouldn+0.014
 may+0.013
Neg logit contributions
old-0.014
amer-0.013
ena-0.013
 twins-0.012
 office-0.012
Token it
Token ID340
Feature activation-2.89e-02

Pos logit contributions
 may+0.029
 wouldn+0.029
 spine+0.028
rated+0.027
suit+0.027
Neg logit contributions
amer-0.024
old-0.023
ena-0.021
od-0.020
 office-0.020
Token tick
Token ID4378
Feature activation-5.96e-02

Pos logit contributions
icol+0.038
 may+0.038
 spine+0.037
 wouldn+0.036
rated+0.035
Neg logit contributions
amer-0.057
old-0.055
ful-0.053
ena-0.052
od-0.052
Tokenle
Token ID293
Feature activation-1.07e-02

Pos logit contributions
 spine+0.114
 may+0.109
 steep+0.105
icol+0.104
 wouldn+0.104
Neg logit contributions
amer-0.097
ful-0.090
old-0.089
ena-0.087
od-0.087
Token her
Token ID607
Feature activation-5.06e-02

Pos logit contributions
 may+0.027
 wouldn+0.027
 spine+0.026
rated+0.026
suit+0.025
Neg logit contributions
amer-0.010
old-0.009
ena-0.008
od-0.007
 student-0.007
Token palm
Token ID18057
Feature activation-1.49e-02

Pos logit contributions
 may+0.089
rated+0.087
 wouldn+0.086
icol+0.085
use+0.082
Neg logit contributions
amer-0.080
ena-0.077
ful-0.075
 twins-0.075
 student-0.071
Token.
Token ID13
Feature activation-5.13e-03

Pos logit contributions
 may+0.032
 wouldn+0.032
rated+0.031
 spine+0.031
icol+0.030
Neg logit contributions
amer-0.019
old-0.018
ena-0.016
od-0.015
 office-0.015
Token She
Token ID1375
Feature activation-1.60e-02

Pos logit contributions
rated+0.014
 wouldn+0.014
 may+0.013
 spine+0.013
suit+0.013
Neg logit contributions
old-0.004
amer-0.003
ena-0.003
Her-0.003
od-0.003