TOP ACTIVATIONS
MAX = 0.094

tells her the ones she doesn 't know . They have
the story is that it doesn 't take fancy things to
he also realized that it doesn 't matter , because we
a lot of things he didn 't know . He was
uncomfortable playing with someone she didn 't know . She politely
They both learned that it doesn 't matter what others say
ara is selfish . She doesn 't pass the ball to
olly , this k ite doesn 't have much weight .
I didn 't know you didn 't like the water ,"
was old things that they didn 't need anymore . Lily
of old things that she didn 't need . Her mom
And Betty learned that it doesn 't matter what you look
with things that were not hers . <|endoftext|> S ara and
try to approach dogs she doesn 't know . No matter
to talk to people she didn 't know . One day
right to take something that doesn 't belong to you .
happy that she had resisted temptation and went on this big
, " I know it doesn 't taste good , but
and Cat learned that it doesn 't matter if your tail
to talk to people she didn 't know . One day

MIN ACTIVATIONS
MIN = -0.104

decided to speed . Faster and faster he went , until
began to spin . Faster and faster it went . Round
She explored further and further , until she came
together to create even brighter and more beautiful pictures . <|endoftext|>
At the end of the day , Ad eline
started to climb . Higher and higher he went , until
started to spin . Faster and faster it went until it
Suddenly , one of the pieces of ice broke
in the circle . Faster and faster they went until Sarah
around in circles . Faster and faster it went . Then
jewel . Then , one of them saw a hole in
So , one of the cows offered to carry
At the end of the day , the best
Then , one of the children said : "
But then , one of the children came running up
At the end of the day , they said
around and around . Faster and faster she went , until
be speeding up ! Faster and faster it went . Jo
started to bounce . Higher and higher he went until he
. Every day , more and more people would come to

INTERVAL 0.045 - 0.094
CONTAINS 5.483%

But Sam was scared and said , " No , thank
Sam felt sorry and ashamed . They told Mom and Dad
went to play together . <|endoftext|> L ily liked to tie
a spoon and a bowl . She scoop ed some soup
happened and said , " Don 't worry , Tiny .

INTERVAL -0.005 - 0.045
CONTAINS 53.061%

the children . <|endoftext|> Once upon a time , there was
was so excited and praised the golf ball " You are
her for dinner ! But the little duck had an idea
ready for a big adventure . When she was
help when he needed it . <|endoftext|> Once upon a time

INTERVAL -0.054 - -0.005
CONTAINS 36.414%

cat who liked to chase mice . He saw a mouse
looked up from the microscope and pointed to the sky .
<|endoftext|> Jim was a fire man . He was very strong
and it was bleeding . She quickly called an ambulance to
goes !", said Max . The butterfly flew up

INTERVAL -0.104 - -0.054
CONTAINS 5.041%

for it . When the bear came to life
They walked to the bag and pulled the zipper . Inside
When she got home , Polly laid the note out
the mouse were best friends and always had fun together .
Tim my felt bad and apologized again . The bear
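
The four INTERVAL blocks above look like equal-width bins spanning the reported MIN and MAX. The sketch below is a check of that reading, not the dashboard's own code: the helper name `equal_width_edges` and the choice of four bins are assumptions, while the MIN, MAX, and CONTAINS figures are copied from the dump.

```python
# Minimal sketch: rebuild the interval boundaries from MIN and MAX,
# assuming four equal-width bins (an inference from the dump, not a documented fact).

def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced bin edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

MIN_ACT, MAX_ACT = -0.104, 0.094           # values reported in the dump
edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)

# CONTAINS percentages from the dump, ordered from the lowest bin to the highest.
shares = [5.041, 36.414, 53.061, 5.483]

for (left, right), share in zip(zip(edges, edges[1:]), shares):
    print(f"{left:+.4f} to {right:+.4f}: {share:.3f}%")

total = sum(shares)
print(f"total: {total:.3f}%")
```

Under this assumption the edges fall at roughly -0.104, -0.0545, -0.005, 0.0445, and 0.094, which agrees with the displayed boundaries to within display rounding. The shares sum to about 99.999%, so the four bins appear to cover essentially all sampled activations.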
Token tells
Token ID4952
Feature activation+1.66e-02

Pos logit contributions
 consequences+0.016
 aback+0.016
 truth+0.015
ives+0.015
 Give+0.014
Neg logit contributions
 boarded-0.019
ep-0.019
cker-0.018
aster-0.018
 sailed-0.018
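
The record above, like those that follow, seems to use a fixed layout: a label fused directly to its value (`Token`, `Token ID`, `Feature activation`), then two lists of token/value pairs. A leading space in a token string appears to be part of the token itself. The sketch below reads one such record under those assumptions; `TokenRecord` and `parse_record` are hypothetical names, and the sample lines are abridged from the first record.

```python
import re
from dataclasses import dataclass, field

# A contribution line is a token string followed by a signed decimal, e.g. " aback+0.016".
# Assumption: the token itself does not end in something that looks like a signed decimal.
_CONTRIB = re.compile(r"^(.*?)([+-]\d+\.\d+)$")

@dataclass
class TokenRecord:
    token: str            # kept verbatim, including any leading space
    token_id: int
    activation: float
    pos: list = field(default_factory=list)   # (token, contribution) pairs
    neg: list = field(default_factory=list)

def parse_record(lines):
    """Parse one tooltip record; the first three non-blank lines are positional."""
    lines = [ln for ln in lines if ln.strip()]
    rec = TokenRecord(
        token=lines[0][len("Token"):],
        token_id=int(lines[1][len("Token ID"):]),
        activation=float(lines[2][len("Feature activation"):]),
    )
    bucket = None
    for ln in lines[3:]:
        if ln == "Pos logit contributions":
            bucket = rec.pos
        elif ln == "Neg logit contributions":
            bucket = rec.neg
        else:
            m = _CONTRIB.match(ln)
            if m and bucket is not None:
                bucket.append((m.group(1), float(m.group(2))))
    return rec

example = [
    "Token tells",
    "Token ID4952",
    "Feature activation+1.66e-02",
    "",
    "Pos logit contributions",
    " consequences+0.016",
    "ives+0.015",
    "Neg logit contributions",
    " boarded-0.019",
    "ep-0.019",
]
record = parse_record(example)
print(record)
```

Parsing the header lines by position rather than by prefix avoids confusing a `Token` line with the `Token ID` line that follows it.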
Token her
Token ID607
Feature activation+5.68e-02

Pos logit contributions
ives+0.033
 aback+0.032
 consequences+0.031
 Give+0.030
 truth+0.030
Neg logit contributions
aster-0.023
 boarded-0.023
ep-0.023
 wheat-0.022
str-0.022
Token the
Token ID262
Feature activation+3.44e-02

Pos logit contributions
 aback+0.115
ives+0.114
 consequences+0.113
 Give+0.108
 truth+0.107
Neg logit contributions
 boarded-0.079
ep-0.075
str-0.074
aster-0.072
field-0.072
Token ones
Token ID3392
Feature activation-1.71e-03

Pos logit contributions
 aback+0.076
ives+0.076
 Give+0.068
 agrees+0.066
 consequences+0.064
Neg logit contributions
 field-0.044
field-0.043
 sunny-0.043
 wheat-0.043
 sudden-0.042
Token she
Token ID673
Feature activation+4.22e-02

Pos logit contributions
 boarded+0.003
ep+0.003
str+0.003
aster+0.003
field+0.003
Neg logit contributions
 aback-0.003
 consequences-0.003
ives-0.003
 Give-0.003
 truth-0.003
Token doesn
Token ID1595
Feature activation+9.41e-02

Pos logit contributions
 aback+0.076
 consequences+0.072
 apology+0.069
 truth+0.068
 fault+0.068
Neg logit contributions
 boarded-0.071
ep-0.071
 sailed-0.064
cker-0.064
 flew-0.062
Token't
Token ID470
Feature activation+2.64e-02

Pos logit contributions
 aback+0.151
 consequences+0.149
ives+0.146
 agrees+0.135
 Give+0.134
Neg logit contributions
aster-0.171
ep-0.166
 boarded-0.166
str-0.162
field-0.160
Token know
Token ID760
Feature activation+2.94e-02

Pos logit contributions
ives+0.041
 aback+0.040
 consequences+0.040
 Give+0.039
 truth+0.039
Neg logit contributions
 boarded-0.049
ep-0.047
aster-0.047
str-0.047
field-0.045
Token.
Token ID13
Feature activation+5.42e-02

Pos logit contributions
ives+0.048
 aback+0.048
 consequences+0.047
 truth+0.046
 Give+0.046
Neg logit contributions
 boarded-0.052
ep-0.050
str-0.049
field-0.048
aster-0.048
Token They
Token ID1119
Feature activation-7.15e-03

Pos logit contributions
 consequences+0.090
ives+0.089
 aback+0.089
 truth+0.086
 Give+0.084
Neg logit contributions
 boarded-0.095
ep-0.090
str-0.090
aster-0.089
 sudden-0.089
Token have
Token ID423
Feature activation+2.27e-03

Pos logit contributions
 boarded+0.013
ep+0.012
str+0.012
field+0.012
aster+0.012
Neg logit contributions
ives-0.012
 aback-0.012
 consequences-0.012
 truth-0.011
 Give-0.011
Token the
Token ID262
Feature activation-1.20e-02

Pos logit contributions
 boarded+0.091
 sudden+0.088
cker+0.085
ep+0.081
field+0.080
Neg logit contributions
 truth-0.126
 aback-0.125
 consequences-0.122
 Give-0.119
 apology-0.115
Token story
Token ID1621
Feature activation-7.66e-03

Pos logit contributions
 boarded+0.021
str+0.020
aster+0.020
ep+0.020
 sudden+0.019
Neg logit contributions
 aback-0.021
ives-0.020
 consequences-0.019
 Give-0.019
 agrees-0.019
Token is
Token ID318
Feature activation-1.58e-03

Pos logit contributions
 boarded+0.015
ep+0.015
field+0.015
str+0.015
lf+0.015
Neg logit contributions
 aback-0.010
ives-0.010
 consequences-0.010
 begs-0.009
 Give-0.008
Token that
Token ID326
Feature activation-1.50e-02

Pos logit contributions
 boarded+0.002
ep+0.002
 sudden+0.002
aster+0.002
str+0.002
Neg logit contributions
ives-0.003
 consequences-0.003
 aback-0.003
 truth-0.003
 Give-0.003
Token it
Token ID340
Feature activation+3.70e-02

Pos logit contributions
 boarded+0.020
ep+0.019
str+0.019
field+0.019
 sudden+0.019
Neg logit contributions
ives-0.031
 aback-0.030
 consequences-0.030
 truth-0.030
 Give-0.029
Token doesn
Token ID1595
Feature activation+8.94e-02

Pos logit contributions
ives+0.063
 aback+0.063
 consequences+0.061
 Give+0.060
 apology+0.059
Neg logit contributions
str-0.056
 boarded-0.056
ep-0.053
field-0.053
 blowing-0.052
Token't
Token ID470
Feature activation+1.55e-02

Pos logit contributions
ives+0.142
 aback+0.141
 consequences+0.141
 agrees+0.132
 Give+0.131
Neg logit contributions
 boarded-0.158
aster-0.156
ep-0.154
str-0.153
After-0.151
Token take
Token ID1011
Feature activation+4.15e-02

Pos logit contributions
ives+0.026
 consequences+0.026
 aback+0.026
 truth+0.025
 Give+0.025
Neg logit contributions
 boarded-0.027
ep-0.026
str-0.025
field-0.025
 sudden-0.025
Token fancy
Token ID14996
Feature activation+2.23e-02

Pos logit contributions
ives+0.076
 Give+0.072
 truth+0.071
 aback+0.070
 agrees+0.070
Neg logit contributions
 boarded-0.061
 wheat-0.060
 sudden-0.060
ep-0.060
 sunny-0.058
Token things
Token ID1243
Feature activation+5.27e-03

Pos logit contributions
ives+0.038
 Give+0.038
 truth+0.038
 aback+0.038
 consequences+0.036
Neg logit contributions
 boarded-0.034
 sudden-0.034
ep-0.033
aster-0.033
 wheat-0.032
Token to
Token ID284
Feature activation+1.30e-02

Pos logit contributions
ives+0.011
 consequences+0.011
 aback+0.010
 truth+0.010
 Give+0.010
Neg logit contributions
 boarded-0.007
ep-0.007
str-0.007
field-0.007
aster-0.007
Token he
Token ID339
Feature activation+1.17e-02

Pos logit contributions
 boarded+0.060
ep+0.058
aster+0.057
str+0.056
pron+0.055
Neg logit contributions
 truth-0.098
 consequences-0.098
 aback-0.097
ives-0.095
 Give-0.094
Token also
Token ID635
Feature activation-2.43e-02

Pos logit contributions
ives+0.021
 aback+0.021
 consequences+0.021
 Give+0.020
 truth+0.020
Neg logit contributions
 boarded-0.019
ep-0.018
field-0.018
str-0.018
 sudden-0.017
Token realized
Token ID6939
Feature activation+9.46e-04

Pos logit contributions
 boarded+0.038
str+0.035
ep+0.035
field+0.035
 sailed+0.034
Neg logit contributions
ives-0.044
 consequences-0.043
 aback-0.042
 Give-0.042
 truth-0.041
Token that
Token ID326
Feature activation-6.71e-03

Pos logit contributions
 truth+0.002
 consequences+0.002
 aback+0.002
 Give+0.002
ives+0.002
Neg logit contributions
 boarded-0.001
pron-0.001
ep-0.001
str-0.001
aster-0.001
Token it
Token ID340
Feature activation+3.90e-02

Pos logit contributions
 boarded+0.009
ep+0.009
str+0.009
aster+0.008
field+0.008
Neg logit contributions
 consequences-0.014
 truth-0.014
 aback-0.013
ives-0.013
 Give-0.013
Token doesn
Token ID1595
Feature activation+8.82e-02

Pos logit contributions
ives+0.072
 aback+0.071
 consequences+0.069
 Give+0.069
 apology+0.068
Neg logit contributions
 boarded-0.056
ep-0.056
str-0.056
field-0.054
 blowing-0.054
Token't
Token ID470
Feature activation+1.28e-02

Pos logit contributions
ives+0.147
 aback+0.147
 consequences+0.145
 agrees+0.137
 Give+0.135
Neg logit contributions
aster-0.148
ep-0.147
 boarded-0.147
str-0.145
After-0.144
Token matter
Token ID2300
Feature activation+2.40e-02

Pos logit contributions
ives+0.022
 aback+0.021
 consequences+0.021
 Give+0.020
 truth+0.020
Neg logit contributions
 boarded-0.022
ep-0.021
str-0.021
aster-0.021
field-0.021
Token,
Token ID11
Feature activation-4.00e-02

Pos logit contributions
ives+0.053
 consequences+0.053
 aback+0.053
 truth+0.052
 Give+0.052
Neg logit contributions
 boarded-0.029
ep-0.027
str-0.026
field-0.026
 sudden-0.026
Token because
Token ID780
Feature activation-4.32e-02

Pos logit contributions
 boarded+0.060
ep+0.058
str+0.057
pron+0.055
aster+0.055
Neg logit contributions
 consequences-0.078
 truth-0.077
 aback-0.076
 Give-0.074
ives-0.074
Token we
Token ID356
Feature activation+2.84e-02

Pos logit contributions
 boarded+0.064
ep+0.063
str+0.063
pron+0.060
aster+0.060
Neg logit contributions
 truth-0.085
 consequences-0.083
 Give-0.079
 aback-0.079
ives-0.077
Token a
Token ID257
Feature activation+1.15e-02

Pos logit contributions
 consequences+0.010
ives+0.010
 aback+0.010
 truth+0.010
 Give+0.009
Neg logit contributions
 boarded-0.007
ep-0.006
str-0.006
field-0.006
aster-0.006
Token lot
Token ID1256
Feature activation-2.23e-02

Pos logit contributions
 truth+0.021
 Give+0.020
 consequences+0.020
 aback+0.019
ives+0.019
Neg logit contributions
 boarded-0.018
aster-0.018
ep-0.017
cker-0.016
field-0.016
Token of
Token ID286
Feature activation-3.29e-02

Pos logit contributions
 boarded+0.025
ep+0.024
 wheat+0.023
 sunny+0.023
str+0.023
Neg logit contributions
 consequences-0.050
 aback-0.049
ives-0.048
 fault-0.048
 Give-0.048
Token things
Token ID1243
Feature activation-6.43e-03

Pos logit contributions
 boarded+0.047
 sudden+0.046
ep+0.045
 wheat+0.044
str+0.044
Neg logit contributions
ives-0.065
 aback-0.064
 consequences-0.062
 Give-0.061
 truth-0.060
Token he
Token ID339
Feature activation+3.64e-02

Pos logit contributions
 boarded+0.008
ep+0.008
 blowing+0.008
field+0.007
str+0.007
Neg logit contributions
 consequences-0.014
 aback-0.013
ives-0.013
 truth-0.013
 Give-0.013
Token didn
Token ID1422
Feature activation+8.80e-02

Pos logit contributions
ives+0.082
 aback+0.081
 consequences+0.081
 Give+0.078
 truth+0.078
Neg logit contributions
 boarded-0.044
ep-0.041
str-0.040
 sudden-0.039
field-0.039
Token't
Token ID470
Feature activation+3.74e-02

Pos logit contributions
 consequences+0.165
 aback+0.163
ives+0.155
 agrees+0.155
 fault+0.149
Neg logit contributions
 blowing-0.130
aster-0.130
ep-0.130
 sunny-0.124
lf-0.123
Token know
Token ID760
Feature activation+5.06e-02

Pos logit contributions
 consequences+0.059
ives+0.058
 aback+0.058
 Give+0.057
 truth+0.055
Neg logit contributions
 boarded-0.068
ep-0.066
 sunny-0.065
aster-0.065
str-0.064
Token.
Token ID13
Feature activation+3.48e-02

Pos logit contributions
 aback+0.089
ives+0.088
 consequences+0.086
 Give+0.082
 truth+0.081
Neg logit contributions
 boarded-0.082
ep-0.081
str-0.079
aster-0.078
field-0.077
Token He
Token ID679
Feature activation+1.26e-02

Pos logit contributions
 consequences+0.058
ives+0.056
 fault+0.053
 aback+0.053
 truth+0.052
Neg logit contributions
 boarded-0.062
 sudden-0.059
After-0.058
ep-0.057
 field-0.057
Token was
Token ID373
Feature activation+1.69e-02

Pos logit contributions
 truth+0.020
 Give+0.020
 aback+0.020
 consequences+0.020
ives+0.019
Neg logit contributions
aster-0.022
 boarded-0.021
field-0.021
str-0.020
pron-0.020
Token uncomfortable
Token ID12916
Feature activation+3.77e-02

Pos logit contributions
 boarded+0.005
 sudden+0.005
 sunny+0.005
ep+0.005
aster+0.005
Neg logit contributions
ives-0.007
 aback-0.007
 Give-0.007
 consequences-0.007
 truth-0.007
Token playing
Token ID2712
Feature activation+1.78e-02

Pos logit contributions
ives+0.079
 aback+0.077
 agrees+0.075
 consequences+0.073
 Give+0.073
Neg logit contributions
 blowing-0.051
 which-0.050
 sunny-0.049
 when-0.046
ep-0.044
Token with
Token ID351
Feature activation-1.88e-02

Pos logit contributions
ives+0.039
 fault+0.037
 aback+0.037
 agrees+0.035
 consequences+0.035
Neg logit contributions
 sunny-0.023
ep-0.023
lf-0.020
 which-0.020
aster-0.020
Token someone
Token ID2130
Feature activation+1.33e-02

Pos logit contributions
 boarded+0.027
ep+0.025
str+0.025
field+0.024
 sudden+0.024
Neg logit contributions
 consequences-0.038
 truth-0.037
 aback-0.037
ives-0.037
 Give-0.036
Token she
Token ID673
Feature activation+2.99e-02

Pos logit contributions
ives+0.025
 aback+0.025
 consequences+0.025
 truth+0.024
 Give+0.024
Neg logit contributions
 boarded-0.020
ep-0.019
str-0.019
 sudden-0.018
 sunny-0.018
Token didn
Token ID1422
Feature activation+8.80e-02

Pos logit contributions
 aback+0.066
 consequences+0.065
ives+0.064
 truth+0.063
 Give+0.061
Neg logit contributions
 boarded-0.039
ep-0.036
str-0.034
 sudden-0.034
field-0.034
Token't
Token ID470
Feature activation+3.64e-02

Pos logit contributions
 aback+0.185
 consequences+0.185
ives+0.176
 agrees+0.175
 fault+0.173
Neg logit contributions
 blowing-0.114
aster-0.113
ep-0.113
lf-0.108
 sunny-0.103
Token know
Token ID760
Feature activation+3.63e-02

Pos logit contributions
 consequences+0.066
 aback+0.065
ives+0.065
 agrees+0.062
 Give+0.062
Neg logit contributions
 sunny-0.055
 boarded-0.053
ep-0.053
aster-0.051
While-0.050
Token.
Token ID13
Feature activation+3.10e-02

Pos logit contributions
 aback+0.056
ives+0.056
 consequences+0.054
 Give+0.051
 truth+0.051
Neg logit contributions
 boarded-0.066
ep-0.065
aster-0.064
str-0.063
field-0.062
Token She
Token ID1375
Feature activation-5.75e-03

Pos logit contributions
 consequences+0.055
ives+0.053
 aback+0.052
 truth+0.051
 fault+0.050
Neg logit contributions
 boarded-0.050
After-0.048
str-0.048
 sudden-0.048
field-0.046
Token politely
Token ID34313
Feature activation-1.63e-02

Pos logit contributions
 boarded+0.008
aster+0.008
str+0.008
field+0.008
ep+0.008
Neg logit contributions
 aback-0.011
 consequences-0.011
ives-0.011
 truth-0.011
 Give-0.011
Token They
Token ID1119
Feature activation-1.32e-02

Pos logit contributions
 consequences+0.044
 aback+0.043
ives+0.043
 truth+0.042
 agrees+0.040
Neg logit contributions
 boarded-0.050
str-0.046
aster-0.046
ep-0.046
field-0.046
Token both
Token ID1111
Feature activation-2.25e-02

Pos logit contributions
aster+0.021
pron+0.020
 sudden+0.020
 wheat+0.019
cker+0.019
Neg logit contributions
ives-0.024
 agrees-0.022
 aback-0.022
 truth-0.022
 Give-0.022
Token learned
Token ID4499
Feature activation+1.86e-02

Pos logit contributions
aster+0.035
pron+0.035
ep+0.034
 sudden+0.034
field+0.034
Neg logit contributions
 aback-0.041
ives-0.041
 agrees-0.039
 consequences-0.039
 Give-0.038
Token that
Token ID326
Feature activation-8.93e-03

Pos logit contributions
 consequences+0.041
 aback+0.041
ives+0.041
 truth+0.041
 Give+0.040
Neg logit contributions
 boarded-0.022
ep-0.021
str-0.021
field-0.021
aster-0.021
Token it
Token ID340
Feature activation+4.04e-02

Pos logit contributions
 boarded+0.014
ep+0.013
str+0.013
field+0.013
aster+0.013
Neg logit contributions
 consequences-0.017
 truth-0.017
 aback-0.016
 Give-0.016
ives-0.016
Token doesn
Token ID1595
Feature activation+8.76e-02

Pos logit contributions
 apology+0.078
 aback+0.078
ives+0.077
 consequences+0.076
 Give+0.073
Neg logit contributions
ep-0.054
str-0.052
 boarded-0.052
 blew-0.050
 blowing-0.050
Token't
Token ID470
Feature activation+1.16e-02

Pos logit contributions
ives+0.137
 aback+0.136
 consequences+0.135
 agrees+0.126
 Give+0.125
Neg logit contributions
 boarded-0.158
ep-0.156
aster-0.155
str-0.154
After-0.151
Token matter
Token ID2300
Feature activation+2.56e-02

Pos logit contributions
ives+0.022
 consequences+0.022
 aback+0.022
 truth+0.022
 Give+0.021
Neg logit contributions
 boarded-0.017
ep-0.016
str-0.016
field-0.016
 sudden-0.016
Token what
Token ID644
Feature activation+2.11e-02

Pos logit contributions
 consequences+0.056
 aback+0.056
ives+0.055
 truth+0.055
 Give+0.054
Neg logit contributions
 boarded-0.032
str-0.029
ep-0.029
field-0.029
 sudden-0.029
Token others
Token ID1854
Feature activation+6.51e-03

Pos logit contributions
ives+0.039
 aback+0.039
 consequences+0.037
 Give+0.036
 truth+0.036
Neg logit contributions
 boarded-0.033
str-0.032
ep-0.032
 sudden-0.031
field-0.031
Token say
Token ID910
Feature activation+3.78e-02

Pos logit contributions
 aback+0.010
 consequences+0.010
ives+0.009
 Give+0.009
 truth+0.009
Neg logit contributions
 boarded-0.012
ep-0.011
str-0.011
field-0.011
 blowing-0.011
Tokenara
Token ID3301
Feature activation+1.66e-02

Pos logit contributions
 aback+0.047
 truth+0.046
 consequences+0.046
ives+0.045
 Give+0.044
Neg logit contributions
 boarded-0.032
str-0.031
aster-0.030
ep-0.030
 wheat-0.028
Token is
Token ID318
Feature activation+1.80e-02

Pos logit contributions
ives+0.034
 aback+0.034
 consequences+0.034
 truth+0.033
 Give+0.031
Neg logit contributions
ep-0.022
 boarded-0.021
str-0.021
cker-0.019
aster-0.019
Token selfish
Token ID20363
Feature activation+5.12e-02

Pos logit contributions
ives+0.030
 aback+0.029
 consequences+0.029
 truth+0.029
 Give+0.028
Neg logit contributions
 boarded-0.032
ep-0.030
str-0.030
field-0.030
aster-0.030
Token.
Token ID13
Feature activation+7.03e-02

Pos logit contributions
 aback+0.116
ives+0.112
 apology+0.111
 truth+0.110
 consequences+0.109
Neg logit contributions
 when-0.043
 which-0.040
 while-0.039
ep-0.037
 field-0.037
Token She
Token ID1375
Feature activation+2.18e-02

Pos logit contributions
 consequences+0.099
 aback+0.097
ives+0.094
 truth+0.093
 fault+0.092
Neg logit contributions
 boarded-0.131
aster-0.131
After-0.124
Un-0.124
 sunny-0.123
Token doesn
Token ID1595
Feature activation+8.76e-02

Pos logit contributions
 aback+0.033
 consequences+0.031
ives+0.031
 truth+0.030
 fault+0.028
Neg logit contributions
ep-0.041
 boarded-0.041
str-0.038
 sailed-0.037
cker-0.037
Token't
Token ID470
Feature activation+1.95e-02

Pos logit contributions
 aback+0.142
 consequences+0.140
ives+0.138
 agrees+0.129
 truth+0.126
Neg logit contributions
aster-0.154
 boarded-0.154
ep-0.152
str-0.150
After-0.148
Token pass
Token ID1208
Feature activation+2.71e-02

Pos logit contributions
ives+0.031
 aback+0.030
 consequences+0.030
 truth+0.029
 Give+0.029
Neg logit contributions
 boarded-0.035
ep-0.035
aster-0.034
str-0.034
 sunny-0.033
Token the
Token ID262
Feature activation+2.23e-02

Pos logit contributions
ives+0.054
 consequences+0.052
 aback+0.052
 truth+0.052
 Give+0.051
Neg logit contributions
ep-0.036
 boarded-0.035
str-0.034
aster-0.033
 wheat-0.032
Token ball
Token ID2613
Feature activation+7.31e-03

Pos logit contributions
ives+0.050
 aback+0.049
 Give+0.045
 agrees+0.045
 begs+0.044
Neg logit contributions
 sunny-0.026
 field-0.026
 boarded-0.026
 sudden-0.025
 wheat-0.025
Token to
Token ID284
Feature activation-2.97e-03

Pos logit contributions
 aback+0.012
 consequences+0.012
ives+0.012
 truth+0.012
 Give+0.011
Neg logit contributions
ep-0.012
 boarded-0.012
str-0.011
aster-0.011
field-0.011
Tokenolly
Token ID5098
Feature activation+7.08e-03

Pos logit contributions
 aback+0.017
ives+0.016
 Give+0.016
 truth+0.016
 consequences+0.016
Neg logit contributions
aster-0.012
After-0.012
pron-0.012
str-0.011
field-0.011
Token,
Token ID11
Feature activation-1.10e-02

Pos logit contributions
ives+0.015
 consequences+0.015
 aback+0.014
 truth+0.014
 Give+0.014
Neg logit contributions
str-0.009
 boarded-0.009
ep-0.009
field-0.008
 wheat-0.008
Token this
Token ID428
Feature activation+3.26e-02

Pos logit contributions
 boarded+0.019
ep+0.018
str+0.018
field+0.018
 sudden+0.018
Neg logit contributions
 consequences-0.019
 truth-0.018
 aback-0.018
ives-0.018
 Give-0.018
Token k
Token ID479
Feature activation-3.06e-02

Pos logit contributions
 truth+0.061
 consequences+0.061
 aback+0.059
 Give+0.059
ives+0.057
Neg logit contributions
aster-0.051
 boarded-0.049
 sudden-0.048
chard-0.047
eh-0.046
Tokenite
Token ID578
Feature activation-2.80e-02

Pos logit contributions
 boarded+0.025
ep+0.023
str+0.022
aster+0.022
 sudden+0.022
Neg logit contributions
ives-0.080
 aback-0.079
 consequences-0.079
 Give-0.078
 truth-0.078
Token doesn
Token ID1595
Feature activation+8.73e-02

Pos logit contributions
 boarded+0.065
ep+0.064
str+0.063
field+0.062
aster+0.061
Neg logit contributions
ives-0.030
 aback-0.030
 consequences-0.030
 truth-0.028
 Give-0.028
Token't
Token ID470
Feature activation+1.46e-02

Pos logit contributions
 consequences+0.147
 aback+0.147
ives+0.146
 agrees+0.136
 fault+0.134
Neg logit contributions
aster-0.147
 boarded-0.144
ep-0.143
str-0.143
After-0.140
Token have
Token ID423
Feature activation+3.73e-02

Pos logit contributions
 consequences+0.022
 aback+0.021
 truth+0.021
ives+0.020
 Give+0.019
Neg logit contributions
 boarded-0.028
 sudden-0.028
field-0.028
ep-0.027
pron-0.027
Token much
Token ID881
Feature activation-1.16e-04

Pos logit contributions
ives+0.070
 aback+0.069
 consequences+0.068
 Give+0.067
 truth+0.066
Neg logit contributions
 boarded-0.057
ep-0.055
str-0.054
aster-0.053
 sudden-0.052
Token weight
Token ID3463
Feature activation-6.22e-03

Pos logit contributions
 boarded+0.000
 sudden+0.000
 wheat+0.000
ep+0.000
str+0.000
Neg logit contributions
ives-0.000
 aback-0.000
 Give-0.000
 consequences-0.000
 truth-0.000
Token.
Token ID13
Feature activation+4.88e-02

Pos logit contributions
 boarded+0.011
ep+0.011
str+0.011
aster+0.010
field+0.010
Neg logit contributions
ives-0.010
 aback-0.010
 consequences-0.009
 truth-0.009
 Give-0.009
Token I
Token ID314
Feature activation+2.42e-02

Pos logit contributions
 boarded+0.027
ep+0.026
 sudden+0.026
str+0.026
field+0.026
Neg logit contributions
 truth-0.018
 consequences-0.018
ives-0.018
 Give-0.018
 aback-0.018
Token didn
Token ID1422
Feature activation+7.39e-02

Pos logit contributions
ives+0.051
 consequences+0.051
 aback+0.051
 Give+0.048
 truth+0.048
Neg logit contributions
 boarded-0.032
ep-0.030
str-0.030
 sudden-0.029
After-0.028
Token't
Token ID470
Feature activation+2.94e-02

Pos logit contributions
 aback+0.130
 consequences+0.130
ives+0.128
 agrees+0.123
 fault+0.119
Neg logit contributions
aster-0.123
ep-0.117
 boarded-0.116
 blowing-0.115
After-0.114
Token know
Token ID760
Feature activation+2.12e-02

Pos logit contributions
ives+0.060
 agrees+0.056
 aback+0.055
 consequences+0.052
 Give+0.052
Neg logit contributions
 sunny-0.044
ep-0.043
aster-0.039
str-0.038
 blowing-0.037
Token you
Token ID345
Feature activation+7.59e-02

Pos logit contributions
ives+0.038
 aback+0.037
 consequences+0.036
 truth+0.035
 Give+0.035
Neg logit contributions
 boarded-0.034
ep-0.034
aster-0.033
str-0.033
 sudden-0.032
Token didn
Token ID1422
Feature activation+8.71e-02

Pos logit contributions
 aback+0.144
 consequences+0.141
ives+0.140
 Give+0.135
 truth+0.135
Neg logit contributions
 boarded-0.118
ep-0.109
str-0.106
aster-0.105
field-0.103
Token't
Token ID470
Feature activation+4.07e-02

Pos logit contributions
 consequences+0.167
 aback+0.166
 agrees+0.163
ives+0.161
 fault+0.153
Neg logit contributions
aster-0.136
ep-0.127
 blowing-0.126
After-0.121
lf-0.120
Token like
Token ID588
Feature activation+2.13e-02

Pos logit contributions
ives+0.079
 aback+0.075
 agrees+0.075
 consequences+0.073
 Give+0.071
Neg logit contributions
ep-0.060
 sunny-0.059
 boarded-0.058
 blowing-0.058
str-0.055
Token the
Token ID262
Feature activation+2.41e-02

Pos logit contributions
ives+0.048
 aback+0.048
 consequences+0.047
 truth+0.046
 Give+0.046
Neg logit contributions
 boarded-0.025
ep-0.023
aster-0.023
str-0.023
 sudden-0.023
Token water
Token ID1660
Feature activation+6.04e-03

Pos logit contributions
ives+0.052
 aback+0.051
 Give+0.048
 agrees+0.046
 consequences+0.045
Neg logit contributions
 sunny-0.031
ep-0.030
pron-0.030
 sudden-0.030
 boarded-0.030
Token,"
Token ID553
Feature activation-3.64e-02

Pos logit contributions
ives+0.010
 aback+0.010
 consequences+0.009
 Give+0.009
 truth+0.009
Neg logit contributions
ep-0.011
 boarded-0.011
str-0.010
field-0.010
aster-0.010
Token was
Token ID373
Feature activation+2.96e-02

Pos logit contributions
ives+0.121
 aback+0.117
 consequences+0.114
 apology+0.113
 agrees+0.110
Neg logit contributions
str-0.052
 boarded-0.049
ep-0.048
 flew-0.047
 blowing-0.046
Token old
Token ID1468
Feature activation+2.24e-02

Pos logit contributions
 consequences+0.056
 truth+0.055
 aback+0.053
ives+0.051
 Give+0.051
Neg logit contributions
 boarded-0.047
ep-0.044
 sudden-0.044
str-0.044
field-0.044
Token things
Token ID1243
Feature activation+6.60e-03

Pos logit contributions
ives+0.048
 aback+0.046
 Give+0.045
 consequences+0.045
 truth+0.044
Neg logit contributions
 boarded-0.029
 wheat-0.028
ep-0.027
 sunny-0.026
field-0.026
Token that
Token ID326
Feature activation-3.73e-03

Pos logit contributions
ives+0.013
 aback+0.013
 consequences+0.013
 Give+0.013
 agrees+0.012
Neg logit contributions
 boarded-0.009
str-0.009
ep-0.009
 sunny-0.009
 wheat-0.008
Token they
Token ID484
Feature activation+2.53e-02

Pos logit contributions
 boarded+0.006
ep+0.006
 sudden+0.005
str+0.005
aster+0.005
Neg logit contributions
ives-0.007
 consequences-0.007
 aback-0.007
 truth-0.007
 Give-0.007
Token didn
Token ID1422
Feature activation+8.70e-02

Pos logit contributions
ives+0.045
 consequences+0.044
 aback+0.044
 truth+0.043
 Give+0.042
Neg logit contributions
 boarded-0.041
ep-0.040
str-0.039
aster-0.039
field-0.039
Token't
Token ID470
Feature activation+3.36e-02

Pos logit contributions
 aback+0.154
 consequences+0.153
ives+0.149
 agrees+0.146
 fault+0.139
Neg logit contributions
ep-0.142
aster-0.142
 boarded-0.136
 blowing-0.136
After-0.134
Token need
Token ID761
Feature activation+6.12e-02

Pos logit contributions
ives+0.055
 aback+0.055
 consequences+0.055
 Give+0.052
 truth+0.052
Neg logit contributions
 boarded-0.060
ep-0.057
aster-0.056
str-0.056
 sunny-0.055
Token anymore
Token ID7471
Feature activation+2.74e-02

Pos logit contributions
 aback+0.119
ives+0.116
 agrees+0.115
 Give+0.109
 begs+0.108
Neg logit contributions
 boarded-0.086
 wheat-0.084
aster-0.084
 sunny-0.083
ep-0.079
Token.
Token ID13
Feature activation+2.99e-02

Pos logit contributions
 aback+0.050
ives+0.047
 consequences+0.046
 Give+0.045
 agrees+0.044
Neg logit contributions
 boarded-0.042
ep-0.042
 blowing-0.041
 sudden-0.040
 sunny-0.040
Token Lily
Token ID20037
Feature activation+3.01e-03

Pos logit contributions
 consequences+0.065
ives+0.063
 fault+0.063
 aback+0.061
 truth+0.060
Neg logit contributions
After-0.036
 which-0.033
 they-0.033
 sudden-0.033
At-0.032
Token of
Token ID286
Feature activation-3.90e-02

Pos logit contributions
 consequences+0.040
 aback+0.040
ives+0.040
 truth+0.039
 Give+0.039
Neg logit contributions
 boarded-0.027
ep-0.025
str-0.024
aster-0.024
 sudden-0.024
Token old
Token ID1468
Feature activation-1.04e-02

Pos logit contributions
 sudden+0.048
 boarded+0.047
 sunny+0.047
 wheat+0.046
ep+0.045
Neg logit contributions
ives-0.086
 aback-0.081
 agrees-0.080
 begs-0.079
 fault-0.079
Token things
Token ID1243
Feature activation-1.42e-02

Pos logit contributions
 sudden+0.013
 wheat+0.013
 boarded+0.012
ep+0.012
 field+0.012
Neg logit contributions
ives-0.023
 aback-0.022
 truth-0.021
 fault-0.021
 agrees-0.021
Token that
Token ID326
Feature activation-1.43e-02

Pos logit contributions
 boarded+0.018
 which+0.018
 blowing+0.018
 wheat+0.017
 sunny+0.017
Neg logit contributions
 consequences-0.029
ives-0.029
 aback-0.028
 apology-0.028
 Give-0.028
Token she
Token ID673
Feature activation+2.44e-02

Pos logit contributions
 boarded+0.019
ep+0.018
str+0.018
field+0.018
 sudden+0.018
Neg logit contributions
ives-0.029
 consequences-0.029
 aback-0.029
 truth-0.029
 Give-0.028
Token didn
Token ID1422
Feature activation+8.70e-02

Pos logit contributions
 aback+0.054
ives+0.054
 consequences+0.053
 fault+0.051
 Give+0.051
Neg logit contributions
 boarded-0.032
ep-0.030
 sudden-0.029
str-0.028
field-0.027
Token't
Token ID470
Feature activation+3.89e-02

Pos logit contributions
 consequences+0.164
 aback+0.161
 agrees+0.156
ives+0.154
 fault+0.146
Neg logit contributions
aster-0.133
ep-0.133
 blowing-0.128
lf-0.127
 boarded-0.126
Token need
Token ID761
Feature activation+6.27e-02

Pos logit contributions
ives+0.060
 consequences+0.060
 aback+0.060
 truth+0.057
 Give+0.057
Neg logit contributions
 boarded-0.073
ep-0.070
aster-0.069
str-0.068
 sunny-0.067
Token.
Token ID13
Feature activation+2.93e-02

Pos logit contributions
 agrees+0.127
 aback+0.123
ives+0.121
 Give+0.113
 begs+0.113
Neg logit contributions
aster-0.082
 which-0.080
 boarded-0.079
 sunny-0.078
 wheat-0.076
Token Her
Token ID2332
Feature activation+2.51e-02

Pos logit contributions
 consequences+0.064
ives+0.062
 fault+0.060
 aback+0.059
 agrees+0.058
Neg logit contributions
 sudden-0.036
After-0.036
 boarded-0.035
str-0.034
At-0.033
Token mom
Token ID1995
Feature activation-3.10e-02

Pos logit contributions
ives+0.063
 aback+0.062
 consequences+0.060
 Give+0.059
 agrees+0.059
Neg logit contributions
 boarded-0.025
 sudden-0.024
ep-0.023
str-0.023
pron-0.022
Token And
Token ID843
Feature activation-4.47e-02

Pos logit contributions
 consequences+0.056
ives+0.055
 aback+0.053
 truth+0.053
 Give+0.051
Neg logit contributions
 boarded-0.045
str-0.044
field-0.043
ep-0.042
 sudden-0.042
Token Betty
Token ID29504
Feature activation-1.34e-02

Pos logit contributions
ep+0.069
pron+0.068
eh+0.065
aster+0.064
 boarded+0.061
Neg logit contributions
 aback-0.093
 truth-0.086
 consequences-0.085
 Give-0.084
ives-0.083
Token learned
Token ID4499
Feature activation+1.88e-02

Pos logit contributions
 boarded+0.017
ep+0.016
str+0.016
field+0.016
 sailed+0.015
Neg logit contributions
 consequences-0.029
ives-0.029
 aback-0.028
 Give-0.028
 truth-0.028
Token that
Token ID326
Feature activation-1.05e-02

Pos logit contributions
ives+0.048
 aback+0.047
 consequences+0.047
 Give+0.046
 agrees+0.045
Neg logit contributions
 boarded-0.016
ep-0.015
str-0.015
After-0.014
 sudden-0.014
Token it
Token ID340
Feature activation+3.99e-02

Pos logit contributions
 boarded+0.013
ep+0.012
 sudden+0.012
str+0.012
 sunny+0.012
Neg logit contributions
ives-0.023
 aback-0.022
 consequences-0.022
 Give-0.021
 truth-0.021
Token doesn
Token ID1595
Feature activation+8.70e-02

Pos logit contributions
ives+0.083
 apology+0.083
 aback+0.080
 Give+0.078
 consequences+0.077
Neg logit contributions
ep-0.046
 blew-0.046
 flew-0.045
 they-0.044
 blowing-0.044
Token't
Token ID470
Feature activation+1.37e-02

Pos logit contributions
ives+0.146
 aback+0.145
 consequences+0.143
 agrees+0.136
 Give+0.133
Neg logit contributions
ep-0.145
aster-0.145
 boarded-0.145
str-0.143
After-0.143
Token matter
Token ID2300
Feature activation+2.52e-02

Pos logit contributions
ives+0.027
 aback+0.026
 consequences+0.025
 Give+0.025
 truth+0.025
Neg logit contributions
 boarded-0.020
ep-0.019
str-0.019
aster-0.018
 sunny-0.018
Token what
Token ID644
Feature activation+2.04e-02

Pos logit contributions
ives+0.059
 aback+0.058
 consequences+0.057
 truth+0.056
 Give+0.056
Neg logit contributions
 boarded-0.027
ep-0.026
str-0.025
aster-0.025
 sudden-0.025
Token you
Token ID345
Feature activation+3.52e-02

Pos logit contributions
ives+0.043
 aback+0.043
 consequences+0.040
 Give+0.040
 agrees+0.039
Neg logit contributions
 boarded-0.027
ep-0.026
str-0.026
 sudden-0.025
 sunny-0.025
Token look
Token ID804
Feature activation+5.40e-02

Pos logit contributions
 aback+0.059
 consequences+0.058
 fault+0.056
ives+0.056
 Give+0.056
Neg logit contributions
 boarded-0.061
 sailed-0.055
aster-0.055
ep-0.055
field-0.055
Token with
Token ID351
Feature activation-2.09e-02

Pos logit contributions
ives+0.043
 aback+0.042
 agrees+0.041
 Give+0.040
 fault+0.040
Neg logit contributions
ep-0.019
 sunny-0.018
aster-0.017
 wheat-0.016
lf-0.016
Token things
Token ID1243
Feature activation-9.19e-03

Pos logit contributions
 boarded+0.030
str+0.028
field+0.028
ep+0.028
 sudden+0.027
Neg logit contributions
 consequences-0.043
 truth-0.042
 aback-0.041
ives-0.040
 Give-0.039
Token that
Token ID326
Feature activation-9.85e-04

Pos logit contributions
 boarded+0.009
 which+0.009
 sunny+0.009
ep+0.009
aster+0.009
Neg logit contributions
 aback-0.021
 consequences-0.021
ives-0.021
 Give-0.020
 agrees-0.020
Token were
Token ID547
Feature activation+2.63e-03

Pos logit contributions
 boarded+0.002
str+0.002
ep+0.002
field+0.002
aster+0.002
Neg logit contributions
ives-0.001
 aback-0.001
 consequences-0.001
 Give-0.001
 truth-0.001
Token not
Token ID407
Feature activation+1.26e-02

Pos logit contributions
 consequences+0.005
 truth+0.004
 fault+0.004
 aback+0.004
ives+0.004
Neg logit contributions
 boarded-0.005
 sudden-0.005
aster-0.004
field-0.004
str-0.004
Token hers
Token ID25144
Feature activation+8.68e-02

Pos logit contributions
 consequences+0.021
 truth+0.020
 fault+0.019
 aback+0.019
ives+0.018
Neg logit contributions
field-0.024
 sudden-0.024
 boarded-0.023
str-0.023
aster-0.023
Token.
Token ID13
Feature activation+5.62e-02

Pos logit contributions
 aback+0.184
 apology+0.172
 agrees+0.171
ives+0.170
 consequences+0.167
Neg logit contributions
 which-0.108
 when-0.105
 they-0.102
 blowing-0.099
field-0.098
Token<|endoftext|>
Token ID50256
Feature activation+4.57e-02

Pos logit contributions
 consequences+0.099
 aback+0.097
ives+0.096
 agrees+0.095
 apology+0.089
Neg logit contributions
 boarded-0.087
str-0.083
aster-0.083
After-0.082
Un-0.081
TokenS
Token ID50
Feature activation+3.12e-03

Pos logit contributions
 consequences+0.110
 apology+0.109
 aback+0.108
 fault+0.108
 agrees+0.106
Neg logit contributions
 boarded-0.043
field-0.043
 field-0.041
At-0.041
aster-0.040
Tokenara
Token ID3301
Feature activation+7.48e-03

Pos logit contributions
ives+0.005
 truth+0.005
 aback+0.005
 consequences+0.005
 Give+0.005
Neg logit contributions
 boarded-0.005
 sudden-0.005
ep-0.005
cker-0.005
 sunny-0.005
Token and
Token ID290
Feature activation-5.46e-02

Pos logit contributions
 consequences+0.017
 aback+0.016
ives+0.016
 truth+0.016
 apology+0.016
Neg logit contributions
 boarded-0.009
ep-0.009
field-0.008
str-0.008
aster-0.008
Token try
Token ID1949
Feature activation+5.74e-02

Pos logit contributions
 boarded+0.013
str+0.012
ep+0.012
aster+0.012
 sudden+0.012
Neg logit contributions
ives-0.018
 aback-0.018
 consequences-0.017
 truth-0.017
 Give-0.017
Token to
Token ID284
Feature activation+1.22e-02

Pos logit contributions
ives+0.087
 consequences+0.087
 aback+0.086
 truth+0.084
 Give+0.082
Neg logit contributions
 boarded-0.109
ep-0.105
str-0.103
field-0.102
aster-0.102
Token approach
Token ID3164
Feature activation+1.20e-02

Pos logit contributions
ives+0.024
 aback+0.024
 consequences+0.024
 Give+0.023
 truth+0.023
Neg logit contributions
 boarded-0.018
ep-0.017
str-0.017
aster-0.016
field-0.016
Token dogs
Token ID6844
Feature activation-1.44e-02

Pos logit contributions
ives+0.027
 aback+0.027
 consequences+0.026
 Give+0.026
 truth+0.026
Neg logit contributions
 boarded-0.014
ep-0.013
str-0.013
 sudden-0.013
aster-0.013
Token she
Token ID673
Feature activation+3.55e-02

Pos logit contributions
 boarded+0.023
field+0.022
str+0.021
ep+0.021
aster+0.021
Neg logit contributions
 consequences-0.025
ives-0.025
 aback-0.025
 Give-0.024
 truth-0.024
Token doesn
Token ID1595
Feature activation+8.67e-02

Pos logit contributions
 aback+0.060
ives+0.060
 consequences+0.058
 truth+0.056
 Give+0.056
Neg logit contributions
 boarded-0.060
str-0.056
ep-0.056
 sudden-0.054
field-0.054
Token't
Token ID470
Feature activation+1.69e-02

Pos logit contributions
 consequences+0.143
 aback+0.142
ives+0.139
 agrees+0.133
 fault+0.131
Neg logit contributions
aster-0.148
str-0.146
 boarded-0.146
ep-0.145
After-0.141
Token know
Token ID760
Feature activation+2.98e-02

Pos logit contributions
ives+0.026
 aback+0.026
 consequences+0.026
 Give+0.025
 truth+0.025
Neg logit contributions
 boarded-0.031
str-0.030
ep-0.029
aster-0.029
 sunny-0.028
Token.
Token ID13
Feature activation+3.07e-02

Pos logit contributions
ives+0.057
 aback+0.057
 consequences+0.057
 truth+0.055
 Give+0.055
Neg logit contributions
 boarded-0.044
ep-0.042
str-0.041
field-0.041
 sudden-0.040
Token No
Token ID1400
Feature activation-1.57e-02

Pos logit contributions
 consequences+0.055
ives+0.053
 truth+0.051
 aback+0.051
 Give+0.050
Neg logit contributions
 boarded-0.050
str-0.048
field-0.048
aster-0.047
 sudden-0.047
Token matter
Token ID2300
Feature activation+8.99e-03

Pos logit contributions
 boarded+0.021
ep+0.020
field+0.020
str+0.019
aster+0.019
Neg logit contributions
 consequences-0.033
 aback-0.033
ives-0.033
 truth-0.032
 Give-0.032
Token to
Token ID284
Feature activation+7.85e-03

Pos logit contributions
ives+0.025
 aback+0.025
 consequences+0.025
 truth+0.024
 Give+0.024
Neg logit contributions
 boarded-0.023
ep-0.022
str-0.021
field-0.021
 sudden-0.021
Token talk
Token ID1561
Feature activation+3.24e-02

Pos logit contributions
ives+0.015
 aback+0.015
 consequences+0.015
 Give+0.014
 truth+0.014
Neg logit contributions
 boarded-0.012
ep-0.011
aster-0.011
str-0.011
field-0.011
Token to
Token ID284
Feature activation+2.01e-03

Pos logit contributions
 aback+0.056
ives+0.055
 consequences+0.054
 Give+0.053
 truth+0.051
Neg logit contributions
 boarded-0.053
ep-0.051
 sunny-0.050
str-0.049
 sudden-0.049
Token people
Token ID661
Feature activation-1.50e-02

Pos logit contributions
ives+0.004
 aback+0.004
 consequences+0.004
 Give+0.004
 truth+0.004
Neg logit contributions
 boarded-0.003
ep-0.003
aster-0.003
str-0.003
 sudden-0.003
Token she
Token ID673
Feature activation+3.34e-02

Pos logit contributions
 boarded+0.027
ep+0.026
str+0.026
field+0.026
aster+0.026
Neg logit contributions
ives-0.024
 aback-0.024
 consequences-0.024
 truth-0.023
 Give-0.022
Token didn
Token ID1422
Feature activation+8.66e-02

Pos logit contributions
 aback+0.069
ives+0.068
 consequences+0.068
 truth+0.066
 Give+0.066
Neg logit contributions
 boarded-0.046
ep-0.044
str-0.043
field-0.042
 sudden-0.042
Token't
Token ID470
Feature activation+3.71e-02

Pos logit contributions
 aback+0.160
 consequences+0.156
ives+0.156
 agrees+0.149
 fault+0.146
Neg logit contributions
ep-0.136
aster-0.133
 blowing-0.132
After-0.129
 boarded-0.126
Token know
Token ID760
Feature activation+3.69e-02

Pos logit contributions
ives+0.060
 aback+0.060
 consequences+0.059
 Give+0.057
 agrees+0.056
Neg logit contributions
 boarded-0.066
ep-0.064
str-0.062
aster-0.062
 sunny-0.061
Token.
Token ID13
Feature activation+2.48e-02

Pos logit contributions
ives+0.060
 consequences+0.060
 aback+0.059
 truth+0.058
 Give+0.057
Neg logit contributions
 boarded-0.065
ep-0.062
str-0.062
field-0.061
 sudden-0.061
Token One
Token ID1881
Feature activation-2.56e-02

Pos logit contributions
ives+0.051
 aback+0.050
 consequences+0.050
 agrees+0.046
 fault+0.046
Neg logit contributions
 sunny-0.033
After-0.033
 boarded-0.032
While-0.032
At-0.031
Token day
Token ID1110
Feature activation-3.49e-02

Pos logit contributions
ep+0.041
 boarded+0.039
 sudden+0.039
field+0.039
cking+0.037
Neg logit contributions
 truth-0.046
 Give-0.044
 consequences-0.043
 apology-0.040
ives-0.039
Token right
Token ID826
Feature activation+5.55e-02

Pos logit contributions
 consequences+0.022
 aback+0.022
 truth+0.021
ives+0.021
 Give+0.020
Neg logit contributions
 boarded-0.021
field-0.020
ep-0.020
str-0.020
aster-0.020
Token to
Token ID284
Feature activation+1.43e-02

Pos logit contributions
ives+0.112
 Give+0.103
 aback+0.101
 agrees+0.098
 truth+0.093
Neg logit contributions
 boarded-0.081
 wheat-0.079
ep-0.078
field-0.076
cker-0.074
Token take
Token ID1011
Feature activation+3.49e-02

Pos logit contributions
ives+0.025
 aback+0.025
 consequences+0.024
 Give+0.024
 truth+0.023
Neg logit contributions
 boarded-0.023
ep-0.022
str-0.022
aster-0.022
field-0.022
Token something
Token ID1223
Feature activation+5.28e-02

Pos logit contributions
ives+0.066
 Give+0.061
 agrees+0.060
 aback+0.059
 truth+0.059
Neg logit contributions
 boarded-0.052
 wheat-0.052
 sudden-0.051
ep-0.050
aster-0.049
Token that
Token ID326
Feature activation+2.09e-03

Pos logit contributions
ives+0.102
 aback+0.099
 Give+0.098
 consequences+0.097
 agrees+0.096
Neg logit contributions
 boarded-0.076
 sudden-0.074
str-0.073
 wheat-0.071
ep-0.070
Token doesn
Token ID1595
Feature activation+8.66e-02

Pos logit contributions
ives+0.003
 aback+0.003
 consequences+0.003
 Give+0.003
 truth+0.002
Neg logit contributions
 boarded-0.004
str-0.004
ep-0.004
field-0.004
aster-0.004
Token't
Token ID470
Feature activation+7.64e-03

Pos logit contributions
 aback+0.141
ives+0.138
 consequences+0.138
 agrees+0.130
 Give+0.128
Neg logit contributions
aster-0.150
 boarded-0.148
ep-0.146
str-0.146
field-0.143
Token belong
Token ID5594
Feature activation+4.80e-02

Pos logit contributions
 consequences+0.012
 truth+0.011
 aback+0.011
ives+0.011
 Give+0.011
Neg logit contributions
 boarded-0.015
ep-0.014
 sudden-0.014
field-0.014
str-0.014
Token to
Token ID284
Feature activation+1.00e-02

Pos logit contributions
ives+0.104
 Give+0.103
 apology+0.102
 begs+0.100
 agrees+0.099
Neg logit contributions
 which-0.053
 they-0.052
 boarded-0.052
 when-0.049
 wheat-0.047
Token you
Token ID345
Feature activation+5.46e-02

Pos logit contributions
ives+0.018
 aback+0.018
 Give+0.017
 consequences+0.017
 agrees+0.016
Neg logit contributions
 boarded-0.016
aster-0.015
ep-0.015
str-0.015
 wheat-0.015
Token.
Token ID13
Feature activation+3.58e-02

Pos logit contributions
 aback+0.087
ives+0.086
 consequences+0.085
 Give+0.082
 truth+0.082
Neg logit contributions
 boarded-0.097
field-0.092
str-0.092
ep-0.092
aster-0.091
Token happy
Token ID3772
Feature activation-4.30e-03

Pos logit contributions
 boarded+0.025
field+0.025
aster+0.024
ep+0.024
cker+0.024
Neg logit contributions
 aback-0.028
ives-0.028
 truth-0.027
 agrees-0.027
 consequences-0.027
Token that
Token ID326
Feature activation-2.70e-02

Pos logit contributions
 boarded+0.006
ep+0.006
str+0.006
field+0.006
 sudden+0.006
Neg logit contributions
ives-0.008
 aback-0.008
 consequences-0.008
 truth-0.008
 Give-0.008
Token she
Token ID673
Feature activation+1.67e-02

Pos logit contributions
 boarded+0.031
ep+0.029
str+0.029
field+0.028
 sudden+0.028
Neg logit contributions
ives-0.061
 consequences-0.061
 aback-0.061
 truth-0.060
 Give-0.059
Token had
Token ID550
Feature activation+6.73e-03

Pos logit contributions
ives+0.035
 aback+0.034
 consequences+0.034
 truth+0.034
 Give+0.033
Neg logit contributions
 boarded-0.022
ep-0.021
str-0.021
field-0.020
aster-0.020
Token resisted
Token ID26643
Feature activation-4.92e-04

Pos logit contributions
 consequences+0.012
 aback+0.012
ives+0.012
 truth+0.012
 Give+0.012
Neg logit contributions
str-0.010
field-0.010
 boarded-0.010
cker-0.010
aster-0.010
Token temptation
Token ID29062
Feature activation+8.66e-02

Pos logit contributions
 boarded+0.001
ep+0.001
str+0.001
field+0.001
 sudden+0.001
Neg logit contributions
ives-0.001
 consequences-0.001
 aback-0.001
 truth-0.001
 Give-0.001
Token and
Token ID290
Feature activation-6.96e-02

Pos logit contributions
 aback+0.156
 consequences+0.151
ives+0.150
 Give+0.147
 agrees+0.144
Neg logit contributions
 boarded-0.133
ep-0.124
str-0.124
 sunny-0.123
aster-0.122
Token went
Token ID1816
Feature activation+1.83e-02

Pos logit contributions
str+0.097
 boarded+0.096
ep+0.095
field+0.093
 sudden+0.090
Neg logit contributions
 consequences-0.140
 truth-0.137
 aback-0.136
ives-0.135
 Give-0.132
Token on
Token ID319
Feature activation-1.99e-02

Pos logit contributions
ives+0.031
 consequences+0.031
 aback+0.031
 truth+0.030
 Give+0.030
Neg logit contributions
 boarded-0.032
ep-0.030
str-0.029
field-0.029
aster-0.029
Token this
Token ID428
Feature activation+3.21e-02

Pos logit contributions
 boarded+0.027
 sudden+0.025
ep+0.025
aster+0.025
field+0.024
Neg logit contributions
ives-0.042
 aback-0.041
 consequences-0.040
 Give-0.040
 truth-0.039
Token big
Token ID1263
Feature activation-1.22e-02

Pos logit contributions
 consequences+0.063
ives+0.062
 aback+0.062
 truth+0.062
 Give+0.060
Neg logit contributions
 boarded-0.047
ep-0.044
str-0.043
 sudden-0.043
aster-0.042
Token,
Token ID11
Feature activation-6.97e-03

Pos logit contributions
ives+0.106
 aback+0.104
 consequences+0.104
 truth+0.100
 Give+0.099
Neg logit contributions
 boarded-0.057
str-0.056
 sudden-0.056
aster-0.055
pron-0.054
Token "
Token ID366
Feature activation+2.57e-02

Pos logit contributions
 boarded+0.014
ep+0.013
field+0.013
aster+0.013
str+0.013
Neg logit contributions
 Give-0.010
 truth-0.010
ives-0.010
 aback-0.010
 consequences-0.009
TokenI
Token ID40
Feature activation+1.84e-02

Pos logit contributions
ives+0.044
 consequences+0.043
 aback+0.043
 truth+0.043
 Give+0.043
Neg logit contributions
 boarded-0.045
field-0.042
ep-0.042
str-0.042
aster-0.041
Token know
Token ID760
Feature activation+1.37e-02

Pos logit contributions
 consequences+0.032
ives+0.030
 Give+0.029
 aback+0.028
 truth+0.028
Neg logit contributions
 boarded-0.032
str-0.031
aster-0.030
After-0.030
field-0.029
Token it
Token ID340
Feature activation+4.31e-02

Pos logit contributions
ives+0.026
 consequences+0.026
 aback+0.026
 truth+0.025
 Give+0.025
Neg logit contributions
 boarded-0.021
ep-0.020
str-0.019
field-0.019
 sudden-0.019
Token doesn
Token ID1595
Feature activation+8.63e-02

Pos logit contributions
 aback+0.079
 consequences+0.077
ives+0.077
 Give+0.073
 apology+0.073
Neg logit contributions
ep-0.063
str-0.063
 boarded-0.061
 wheat-0.061
field-0.061
Token't
Token ID470
Feature activation+1.85e-02

Pos logit contributions
 aback+0.145
 consequences+0.143
ives+0.141
 agrees+0.136
 fault+0.131
Neg logit contributions
aster-0.149
 boarded-0.143
str-0.141
 blowing-0.140
After-0.140
Token taste
Token ID6938
Feature activation+2.39e-02

Pos logit contributions
ives+0.030
 aback+0.030
 consequences+0.030
 truth+0.029
 Give+0.029
Neg logit contributions
 boarded-0.033
ep-0.032
str-0.031
aster-0.031
field-0.031
Token good
Token ID922
Feature activation+1.83e-02

Pos logit contributions
ives+0.041
 aback+0.040
 Give+0.039
 consequences+0.039
 truth+0.038
Neg logit contributions
 boarded-0.039
ep-0.038
str-0.038
aster-0.038
 sudden-0.038
Token,
Token ID11
Feature activation-1.13e-02

Pos logit contributions
ives+0.040
 aback+0.039
 agrees+0.038
 Give+0.037
 truth+0.037
Neg logit contributions
 when-0.019
 while-0.019
 as-0.019
 blowing-0.018
 they-0.018
Token but
Token ID475
Feature activation-4.07e-02

Pos logit contributions
 boarded+0.018
ep+0.017
 sudden+0.017
str+0.017
aster+0.016
Neg logit contributions
ives-0.021
 aback-0.021
 consequences-0.020
 truth-0.020
 Give-0.019
Token and
Token ID290
Feature activation-4.91e-02

Pos logit contributions
 consequences+0.024
ives+0.024
 aback+0.024
 truth+0.024
 Give+0.023
Neg logit contributions
 boarded-0.017
ep-0.016
str-0.016
field-0.016
 sudden-0.016
Token Cat
Token ID5181
Feature activation-1.67e-02

Pos logit contributions
 boarded+0.090
pron+0.089
 sudden+0.084
field+0.083
str+0.082
Neg logit contributions
 consequences-0.078
 aback-0.078
 Give-0.076
 begs-0.074
 truth-0.074
Token learned
Token ID4499
Feature activation+1.76e-02

Pos logit contributions
 boarded+0.022
ep+0.022
field+0.021
str+0.021
aster+0.021
Neg logit contributions
ives-0.034
 consequences-0.033
 aback-0.033
 truth-0.032
 agrees-0.032
Token that
Token ID326
Feature activation-8.59e-03

Pos logit contributions
ives+0.041
 aback+0.040
 consequences+0.040
 truth+0.039
 Give+0.039
Neg logit contributions
 boarded-0.020
ep-0.018
str-0.018
field-0.018
 sudden-0.018
Token it
Token ID340
Feature activation+4.18e-02

Pos logit contributions
 boarded+0.013
ep+0.012
str+0.012
field+0.012
aster+0.011
Neg logit contributions
 consequences-0.017
ives-0.017
 truth-0.017
 aback-0.016
 Give-0.016
Token doesn
Token ID1595
Feature activation+8.63e-02

Pos logit contributions
 aback+0.082
ives+0.081
 apology+0.080
 consequences+0.080
 Give+0.077
Neg logit contributions
ep-0.054
 blowing-0.052
str-0.052
 flew-0.051
 they-0.050
Token't
Token ID470
Feature activation+1.37e-02

Pos logit contributions
ives+0.138
 aback+0.138
 consequences+0.135
 agrees+0.127
 Give+0.126
Neg logit contributions
 boarded-0.152
ep-0.152
aster-0.151
str-0.149
After-0.147
Token matter
Token ID2300
Feature activation+2.40e-02

Pos logit contributions
ives+0.026
 consequences+0.026
 aback+0.026
 truth+0.025
 Give+0.025
Neg logit contributions
 boarded-0.021
ep-0.020
str-0.019
field-0.019
 sudden-0.019
Token if
Token ID611
Feature activation-1.23e-02

Pos logit contributions
ives+0.054
 consequences+0.054
 aback+0.054
 truth+0.053
 Give+0.052
Neg logit contributions
 boarded-0.028
ep-0.026
str-0.025
field-0.025
 sudden-0.025
Token your
Token ID534
Feature activation+5.20e-02

Pos logit contributions
 boarded+0.015
 sudden+0.015
 wheat+0.014
ep+0.014
 blowing+0.014
Neg logit contributions
ives-0.027
 aback-0.027
 Give-0.025
 consequences-0.025
 truth-0.024
Token tail
Token ID7894
Feature activation-1.13e-02

Pos logit contributions
ives+0.116
 aback+0.116
 consequences+0.113
 truth+0.112
 Give+0.111
Neg logit contributions
 boarded-0.060
ep-0.057
 sudden-0.056
str-0.055
field-0.054
Token to
Token ID284
Feature activation+8.05e-03

Pos logit contributions
ives+0.027
 aback+0.027
 consequences+0.027
 truth+0.026
 Give+0.026
Neg logit contributions
 boarded-0.025
ep-0.024
str-0.024
field-0.023
aster-0.023
Token talk
Token ID1561
Feature activation+3.39e-02

Pos logit contributions
ives+0.016
 aback+0.016
 consequences+0.016
 Give+0.015
 truth+0.015
Neg logit contributions
 boarded-0.012
ep-0.011
aster-0.011
str-0.011
field-0.011
Token to
Token ID284
Feature activation+1.98e-03

Pos logit contributions
 aback+0.058
ives+0.057
 consequences+0.056
 Give+0.055
 truth+0.053
Neg logit contributions
 boarded-0.057
ep-0.055
 sunny-0.053
str-0.052
 sudden-0.052
Token people
Token ID661
Feature activation-1.42e-02

Pos logit contributions
ives+0.004
 aback+0.004
 consequences+0.004
 Give+0.004
 truth+0.004
Neg logit contributions
 boarded-0.003
ep-0.003
aster-0.003
str-0.003
 sudden-0.003
Token she
Token ID673
Feature activation+3.42e-02

Pos logit contributions
 boarded+0.027
ep+0.026
str+0.025
field+0.025
aster+0.025
Neg logit contributions
ives-0.022
 aback-0.022
 consequences-0.022
 truth-0.021
 Give-0.021
Token didn
Token ID1422
Feature activation+8.60e-02

Pos logit contributions
 aback+0.071
ives+0.071
 consequences+0.070
 truth+0.068
 Give+0.068
Neg logit contributions
 boarded-0.047
ep-0.044
str-0.043
field-0.042
 sudden-0.042
Token't
Token ID470
Feature activation+3.71e-02

Pos logit contributions
 aback+0.160
 consequences+0.157
ives+0.155
 agrees+0.150
 fault+0.147
Neg logit contributions
aster-0.131
ep-0.131
 blowing-0.129
After-0.126
 boarded-0.124
Token know
Token ID760
Feature activation+3.82e-02

Pos logit contributions
 aback+0.061
ives+0.060
 consequences+0.059
 Give+0.058
 truth+0.056
Neg logit contributions
 boarded-0.065
ep-0.064
str-0.062
aster-0.061
 sunny-0.060
Token.
Token ID13
Feature activation+2.64e-02

Pos logit contributions
ives+0.061
 consequences+0.061
 aback+0.060
 truth+0.059
 Give+0.058
Neg logit contributions
 boarded-0.069
ep-0.066
str-0.065
field-0.064
 sudden-0.064
Token One
Token ID1881
Feature activation-2.46e-02

Pos logit contributions
 aback+0.054
ives+0.054
 consequences+0.054
 agrees+0.049
 fault+0.049
Neg logit contributions
 sunny-0.035
After-0.035
 boarded-0.034
While-0.034
At-0.033
Token day
Token ID1110
Feature activation-3.47e-02

Pos logit contributions
ep+0.040
 boarded+0.038
 sudden+0.038
field+0.037
cking+0.036
Neg logit contributions
 truth-0.044
 Give-0.042
 consequences-0.041
 apology-0.039
ives-0.038
Token decided
Token ID3066
Feature activation+2.65e-02

Pos logit contributions
 aback+0.018
ives+0.018
 consequences+0.018
 truth+0.018
 Give+0.018
Neg logit contributions
aster-0.012
str-0.011
 boarded-0.011
field-0.011
 sudden-0.011
Token to
Token ID284
Feature activation-8.34e-04

Pos logit contributions
ives+0.044
 aback+0.043
 consequences+0.043
 truth+0.041
 Give+0.041
Neg logit contributions
 boarded-0.047
ep-0.045
str-0.045
field-0.044
 sudden-0.044
Token speed
Token ID2866
Feature activation-2.59e-02

Pos logit contributions
 boarded+0.001
str+0.001
 sudden+0.001
ep+0.001
pron+0.001
Neg logit contributions
 truth-0.001
 aback-0.001
 consequences-0.001
 Give-0.001
ives-0.001
Token.
Token ID13
Feature activation+2.33e-02

Pos logit contributions
 boarded+0.043
ep+0.041
str+0.041
 sudden+0.040
field+0.040
Neg logit contributions
 aback-0.045
ives-0.045
 consequences-0.045
 truth-0.044
 Give-0.043
Token Faster
Token ID38996
Feature activation-1.25e-02

Pos logit contributions
 consequences+0.039
ives+0.036
 truth+0.034
 fault+0.034
 aback+0.033
Neg logit contributions
 boarded-0.042
After-0.040
 sudden-0.040
field-0.040
aster-0.039
Token and
Token ID290
Feature activation-1.04e-01

Pos logit contributions
 boarded+0.021
pron+0.020
str+0.019
ep+0.019
aster+0.019
Neg logit contributions
 aback-0.023
ives-0.022
 consequences-0.021
 Give-0.021
 agrees-0.020
Token faster
Token ID5443
Feature activation-4.94e-02

Pos logit contributions
 boarded+0.205
pron+0.202
field+0.197
str+0.196
ep+0.194
Neg logit contributions
 aback-0.149
 consequences-0.141
ives-0.137
 Give-0.136
 agrees-0.136
Token he
Token ID339
Feature activation-1.20e-03

Pos logit contributions
 boarded+0.063
str+0.061
ep+0.059
field+0.058
pron+0.058
Neg logit contributions
 aback-0.106
 consequences-0.105
ives-0.104
 Give-0.101
 truth-0.101
Token went
Token ID1816
Feature activation+1.17e-02

Pos logit contributions
 boarded+0.001
aster+0.001
str+0.001
pron+0.001
field+0.001
Neg logit contributions
ives-0.003
 aback-0.003
 consequences-0.003
 truth-0.003
 Give-0.003
Token,
Token ID11
Feature activation-5.17e-02

Pos logit contributions
ives+0.022
 consequences+0.022
 aback+0.022
 truth+0.022
 Give+0.021
Neg logit contributions
 boarded-0.018
ep-0.017
str-0.016
field-0.016
 sudden-0.016
Token until
Token ID1566
Feature activation-6.83e-02

Pos logit contributions
 boarded+0.073
str+0.070
ep+0.070
pron+0.068
aster+0.068
Neg logit contributions
 aback-0.104
ives-0.101
 consequences-0.100
 Give-0.099
 truth-0.097
Token began
Token ID2540
Feature activation-9.33e-04

Pos logit contributions
 boarded+0.058
aster+0.055
str+0.054
ep+0.054
field+0.053
Neg logit contributions
ives-0.072
 consequences-0.071
 truth-0.070
 aback-0.070
 Give-0.069
Token to
Token ID284
Feature activation-8.91e-03

Pos logit contributions
 boarded+0.002
str+0.002
ep+0.002
field+0.002
aster+0.002
Neg logit contributions
 aback-0.001
 consequences-0.001
 truth-0.001
ives-0.001
 Give-0.001
Token spin
Token ID7906
Feature activation+3.75e-03

Pos logit contributions
 boarded+0.015
str+0.015
ep+0.014
 sudden+0.014
cle+0.014
Neg logit contributions
 truth-0.015
 consequences-0.015
 aback-0.015
ives-0.014
 Give-0.014
Token.
Token ID13
Feature activation+2.33e-02

Pos logit contributions
ives+0.007
 consequences+0.007
 Give+0.007
 truth+0.007
 aback+0.007
Neg logit contributions
ep-0.005
 boarded-0.005
 sunny-0.005
 sudden-0.005
After-0.005
Token Faster
Token ID38996
Feature activation-1.13e-02

Pos logit contributions
 consequences+0.037
ives+0.037
 aback+0.036
 truth+0.035
 Give+0.034
Neg logit contributions
 boarded-0.042
ep-0.040
str-0.040
aster-0.040
 sudden-0.040
Token and
Token ID290
Feature activation-1.03e-01

Pos logit contributions
 boarded+0.020
field+0.018
pron+0.018
aster+0.018
ep+0.018
Neg logit contributions
 aback-0.020
ives-0.019
 Give-0.019
 consequences-0.019
 truth-0.018
Token faster
Token ID5443
Feature activation-5.02e-02

Pos logit contributions
 boarded+0.184
field+0.174
ep+0.171
str+0.170
pron+0.169
Neg logit contributions
 aback-0.171
 consequences-0.168
ives-0.166
 Give-0.163
 truth-0.161
Token it
Token ID340
Feature activation+5.20e-03

Pos logit contributions
 boarded+0.071
str+0.066
ep+0.066
field+0.065
pron+0.064
Neg logit contributions
 consequences-0.101
 aback-0.101
ives-0.100
 Give-0.098
 truth-0.097
Token went
Token ID1816
Feature activation+1.05e-02

Pos logit contributions
ives+0.012
 aback+0.012
 consequences+0.012
 truth+0.012
 Give+0.011
Neg logit contributions
 boarded-0.006
str-0.005
ep-0.005
field-0.005
aster-0.005
Token.
Token ID13
Feature activation+1.96e-02

Pos logit contributions
ives+0.022
 consequences+0.021
 aback+0.021
 truth+0.020
 fault+0.020
Neg logit contributions
ep-0.014
 boarded-0.014
str-0.013
 sunny-0.013
field-0.013
Token Round
Token ID10485
Feature activation+4.41e-04

Pos logit contributions
 consequences+0.032
ives+0.032
 aback+0.031
 truth+0.030
 Give+0.030
Neg logit contributions
 boarded-0.035
ep-0.033
str-0.033
aster-0.033
 sudden-0.033
Token
Token ID198
Feature activation-3.69e-02

Pos logit contributions
ives+0.026
 aback+0.026
 Give+0.025
 consequences+0.025
 truth+0.025
Neg logit contributions
 boarded-0.036
ep-0.035
str-0.035
 sudden-0.034
aster-0.034
Token
Token ID198
Feature activation-4.27e-02

Pos logit contributions
 boarded+0.058
str+0.054
field+0.053
ep+0.053
pron+0.052
Neg logit contributions
 truth-0.068
ives-0.066
 aback-0.066
 Give-0.066
 consequences-0.065
TokenShe
Token ID3347
Feature activation-1.57e-02

Pos logit contributions
 boarded+0.055
field+0.052
str+0.051
 blowing+0.051
cker+0.049
Neg logit contributions
 Give-0.088
ives-0.087
 truth-0.086
 aback-0.085
 agrees-0.082
Token explored
Token ID18782
Feature activation-1.77e-02

Pos logit contributions
 boarded+0.021
str+0.020
ep+0.020
aster+0.020
field+0.019
Neg logit contributions
ives-0.032
 aback-0.032
 consequences-0.031
 Give-0.031
 truth-0.031
Token further
Token ID2252
Feature activation-4.66e-02

Pos logit contributions
 boarded+0.023
str+0.022
ep+0.021
aster+0.021
pron+0.020
Neg logit contributions
 aback-0.039
 consequences-0.038
 truth-0.037
ives-0.037
 agrees-0.036
Token and
Token ID290
Feature activation-1.02e-01

Pos logit contributions
 boarded+0.080
pron+0.080
str+0.077
aster+0.077
ep+0.075
Neg logit contributions
 aback-0.082
 consequences-0.080
ives-0.075
 agrees-0.074
 truth-0.074
Token further
Token ID2252
Feature activation-4.59e-02

Pos logit contributions
 boarded+0.180
pron+0.177
str+0.175
ep+0.173
aster+0.172
Neg logit contributions
 aback-0.168
 consequences-0.163
ives-0.155
 agrees-0.152
 truth-0.151
Token,
Token ID11
Feature activation-6.89e-02

Pos logit contributions
 boarded+0.075
str+0.071
pron+0.071
aster+0.071
ep+0.071
Neg logit contributions
 aback-0.083
 consequences-0.081
ives-0.080
 truth-0.078
 agrees-0.076
Token until
Token ID1566
Feature activation-7.12e-02

Pos logit contributions
 boarded+0.102
pron+0.102
ep+0.101
str+0.098
aster+0.097
Neg logit contributions
 aback-0.135
 consequences-0.128
ives-0.126
 agrees-0.119
 truth-0.118
Token she
Token ID673
Feature activation-6.76e-03

Pos logit contributions
ep+0.117
pron+0.107
cker+0.104
 boarded+0.103
str+0.102
Neg logit contributions
 aback-0.132
 consequences-0.129
 truth-0.124
ives-0.118
 Give-0.116
Token came
Token ID1625
Feature activation+2.41e-02

Pos logit contributions
 boarded+0.009
ep+0.009
str+0.009
aster+0.009
 sudden+0.009
Neg logit contributions
ives-0.013
 aback-0.013
 consequences-0.013
 truth-0.013
 Give-0.013
Token together
Token ID1978
Feature activation-2.28e-02

Pos logit contributions
 boarded+0.002
ep+0.002
 sudden+0.002
str+0.002
aster+0.002
Neg logit contributions
ives-0.005
 Give-0.005
 consequences-0.005
 aback-0.005
 truth-0.005
Token to
Token ID284
Feature activation-1.53e-02

Pos logit contributions
 boarded+0.041
ep+0.039
str+0.039
 sudden+0.039
aster+0.038
Neg logit contributions
ives-0.037
 consequences-0.036
 aback-0.036
 truth-0.036
 Give-0.035
Token create
Token ID2251
Feature activation-4.03e-03

Pos logit contributions
 boarded+0.030
str+0.028
 sudden+0.028
ep+0.027
eh+0.027
Neg logit contributions
 truth-0.023
 consequences-0.023
 aback-0.022
ives-0.021
 Give-0.020
Token even
Token ID772
Feature activation-7.33e-02

Pos logit contributions
 boarded+0.006
ep+0.005
str+0.005
 sudden+0.005
aster+0.005
Neg logit contributions
ives-0.008
 aback-0.008
 consequences-0.008
 Give-0.008
 truth-0.008
Token brighter
Token ID26083
Feature activation-3.59e-02

Pos logit contributions
field+0.142
str+0.141
 boarded+0.140
ep+0.134
ittens+0.132
Neg logit contributions
 consequences-0.113
 aback-0.109
 truth-0.107
ives-0.103
 agrees-0.099
Token and
Token ID290
Feature activation-1.01e-01

Pos logit contributions
 boarded+0.065
str+0.062
ep+0.060
aster+0.059
field+0.059
Neg logit contributions
 consequences-0.060
 aback-0.057
ives-0.057
 truth-0.057
 Give-0.054
Token more
Token ID517
Feature activation-4.57e-02

Pos logit contributions
 boarded+0.178
ep+0.172
field+0.171
str+0.171
aster+0.168
Neg logit contributions
 consequences-0.167
 aback-0.166
 truth-0.162
ives-0.160
 Give-0.154
Token beautiful
Token ID4950
Feature activation-2.65e-02

Pos logit contributions
 boarded+0.076
ep+0.072
str+0.072
field+0.071
aster+0.070
Neg logit contributions
ives-0.081
 consequences-0.080
 aback-0.080
 truth-0.078
 Give-0.077
Token pictures
Token ID5986
Feature activation-3.97e-02

Pos logit contributions
 boarded+0.043
ep+0.042
 sudden+0.041
str+0.041
field+0.040
Neg logit contributions
ives-0.047
 aback-0.047
 consequences-0.046
 Give-0.045
 truth-0.045
Token.
Token ID13
Feature activation+2.22e-02

Pos logit contributions
 boarded+0.072
ep+0.069
str+0.068
field+0.068
aster+0.067
Neg logit contributions
 consequences-0.062
 aback-0.062
ives-0.062
 truth-0.060
 Give-0.059
Token<|endoftext|>
Token ID50256
Feature activation+3.91e-02

Pos logit contributions
 consequences+0.042
ives+0.042
 aback+0.042
 Give+0.040
 truth+0.040
Neg logit contributions
 boarded-0.033
str-0.032
aster-0.031
field-0.030
ep-0.030
Token
Token ID220
Feature activation+3.80e-02

Pos logit contributions
 boarded+0.049
ep+0.045
field+0.045
str+0.044
 sudden+0.042
Neg logit contributions
ives-0.064
 consequences-0.064
 truth-0.063
 aback-0.062
 Give-0.061
Token
Token ID198
Feature activation-3.80e-02

Pos logit contributions
 consequences+0.072
 aback+0.071
ives+0.071
 Give+0.069
 truth+0.068
Neg logit contributions
 boarded-0.056
str-0.054
ep-0.054
 sudden-0.053
field-0.053
TokenAt
Token ID2953
Feature activation-6.45e-02

Pos logit contributions
 boarded+0.064
field+0.060
ep+0.060
str+0.060
aster+0.058
Neg logit contributions
ives-0.065
 aback-0.064
 truth-0.064
 Give-0.064
 consequences-0.062
Token the
Token ID262
Feature activation-9.25e-03

Pos logit contributions
ep+0.105
 boarded+0.103
 sudden+0.101
str+0.100
cker+0.097
Neg logit contributions
 aback-0.112
 fault-0.111
 consequences-0.110
 truth-0.109
 apology-0.097
Token end
Token ID886
Feature activation-5.20e-03

Pos logit contributions
 boarded+0.014
ep+0.014
ittens+0.013
cker+0.012
After+0.012
Neg logit contributions
 truth-0.018
 consequences-0.017
 Give-0.015
 fault-0.015
ives-0.015
Token of
Token ID286
Feature activation-1.01e-01

Pos logit contributions
 boarded+0.008
ep+0.008
str+0.008
field+0.008
 sudden+0.007
Neg logit contributions
ives-0.010
 aback-0.010
 truth-0.009
 consequences-0.009
 Give-0.009
Token the
Token ID262
Feature activation-2.34e-02

Pos logit contributions
 boarded+0.166
str+0.157
ep+0.154
pron+0.154
 sudden+0.147
Neg logit contributions
 aback-0.173
 truth-0.171
 consequences-0.166
 Give-0.160
 fault-0.158
Token day
Token ID1110
Feature activation-2.41e-02

Pos logit contributions
 boarded+0.032
ep+0.031
cker+0.027
After+0.027
str+0.027
Neg logit contributions
 truth-0.048
 consequences-0.048
 Give-0.044
 apology-0.044
ives-0.043
Token,
Token ID11
Feature activation-5.61e-02

Pos logit contributions
ep+0.039
 boarded+0.038
str+0.038
aster+0.036
field+0.036
Neg logit contributions
ives-0.043
 aback-0.042
 consequences-0.041
 Give-0.041
 truth-0.041
Token Ad
Token ID1215
Feature activation-3.01e-02

Pos logit contributions
ep+0.107
str+0.103
 boarded+0.102
cker+0.102
field+0.099
Neg logit contributions
 aback-0.082
 truth-0.079
 Give-0.078
ives-0.076
 consequences-0.076
Tokeneline
Token ID4470
Feature activation-2.35e-02

Pos logit contributions
 boarded+0.040
ep+0.038
field+0.038
aster+0.038
 sudden+0.038
Neg logit contributions
ives-0.062
 aback-0.061
 consequences-0.060
 Give-0.059
 truth-0.059
Token started
Token ID2067
Feature activation+5.12e-03

Pos logit contributions
aster+0.003
field+0.002
str+0.002
 boarded+0.002
chard+0.002
Neg logit contributions
 aback-0.005
ives-0.005
 agrees-0.005
 Give-0.005
 consequences-0.005
Token to
Token ID284
Feature activation-5.34e-03

Pos logit contributions
 aback+0.008
 consequences+0.008
 truth+0.007
 Give+0.007
 demands+0.007
Neg logit contributions
 boarded-0.010
pron-0.009
ep-0.009
field-0.009
str-0.009
Token climb
Token ID12080
Feature activation-1.13e-02

Pos logit contributions
 boarded+0.007
ep+0.007
pron+0.007
 sudden+0.007
str+0.007
Neg logit contributions
 aback-0.011
 truth-0.010
 consequences-0.010
 agrees-0.010
 Give-0.010
Token.
Token ID13
Feature activation+2.60e-02

Pos logit contributions
 boarded+0.016
ep+0.015
str+0.015
aster+0.015
 sudden+0.015
Neg logit contributions
 aback-0.022
 consequences-0.022
ives-0.022
 truth-0.022
 Give-0.021
Token Higher
Token ID16038
Feature activation-1.99e-02

Pos logit contributions
ives+0.039
 consequences+0.039
 aback+0.037
 truth+0.036
 Give+0.036
Neg logit contributions
 boarded-0.051
field-0.048
 sudden-0.048
aster-0.048
ep-0.047
Token and
Token ID290
Feature activation-1.00e-01

Pos logit contributions
 boarded+0.038
pron+0.035
aster+0.034
str+0.034
ep+0.034
Neg logit contributions
 aback-0.032
ives-0.030
 consequences-0.030
 truth-0.029
 Give-0.029
Token higher
Token ID2440
Feature activation-7.10e-02

Pos logit contributions
 boarded+0.154
pron+0.143
ep+0.141
aster+0.137
field+0.137
Neg logit contributions
 aback-0.194
 consequences-0.184
 agrees-0.182
ives-0.179
 Give-0.179
Token he
Token ID339
Feature activation-4.37e-03

Pos logit contributions
 boarded+0.110
pron+0.103
str+0.101
ep+0.100
 sudden+0.096
Neg logit contributions
 aback-0.136
 consequences-0.134
ives-0.127
 truth-0.127
 agrees-0.126
Token went
Token ID1816
Feature activation+4.89e-03

Pos logit contributions
 boarded+0.006
pron+0.005
aster+0.005
str+0.005
ep+0.005
Neg logit contributions
ives-0.009
 aback-0.009
 consequences-0.008
 truth-0.008
 agrees-0.008
Token,
Token ID11
Feature activation-5.20e-02

Pos logit contributions
ives+0.009
 consequences+0.009
 aback+0.009
 truth+0.008
 Give+0.008
Neg logit contributions
 boarded-0.008
str-0.007
ep-0.007
field-0.007
aster-0.007
Token until
Token ID1566
Feature activation-5.74e-02

Pos logit contributions
 boarded+0.077
ep+0.073
str+0.073
pron+0.071
field+0.070
Neg logit contributions
 aback-0.103
ives-0.099
 consequences-0.099
 agrees-0.097
 Give-0.096
Token started
Token ID2067
Feature activation+9.13e-03

Pos logit contributions
 boarded+0.048
str+0.045
ep+0.045
aster+0.044
 sudden+0.044
Neg logit contributions
ives-0.072
 aback-0.071
 consequences-0.071
 truth-0.069
 Give-0.069
Token to
Token ID284
Feature activation-3.29e-03

Pos logit contributions
ives+0.016
 consequences+0.015
 aback+0.015
 truth+0.015
 Give+0.015
Neg logit contributions
 boarded-0.015
ep-0.015
 sudden-0.015
str-0.015
aster-0.014
Token spin
Token ID7906
Feature activation+8.28e-03

Pos logit contributions
 boarded+0.005
str+0.005
ep+0.005
 sudden+0.005
field+0.005
Neg logit contributions
 aback-0.006
 consequences-0.006
 truth-0.006
ives-0.006
 Give-0.006
Token.
Token ID13
Feature activation+2.75e-02

Pos logit contributions
ives+0.016
 consequences+0.015
 truth+0.015
 apology+0.015
 Give+0.015
Neg logit contributions
ep-0.013
 sunny-0.012
 sudden-0.012
 boarded-0.011
 blowing-0.011
Token Faster
Token ID38996
Feature activation-1.09e-02

Pos logit contributions
 consequences+0.042
ives+0.041
 aback+0.041
 truth+0.040
 Give+0.039
Neg logit contributions
 boarded-0.052
ep-0.049
str-0.049
 sudden-0.049
aster-0.049
Token and
Token ID290
Feature activation-1.00e-01

Pos logit contributions
 boarded+0.019
str+0.017
field+0.017
pron+0.017
ep+0.017
Neg logit contributions
 aback-0.019
ives-0.019
 consequences-0.019
 Give-0.018
 truth-0.018
Token faster
Token ID5443
Feature activation-4.82e-02

Pos logit contributions
 boarded+0.195
field+0.185
str+0.183
ep+0.182
pron+0.180
Neg logit contributions
 aback-0.149
 consequences-0.146
ives-0.145
 Give-0.140
 truth-0.138
Token it
Token ID340
Feature activation+9.87e-03

Pos logit contributions
 boarded+0.064
str+0.061
field+0.060
ep+0.059
pron+0.059
Neg logit contributions
 aback-0.101
 consequences-0.101
ives-0.099
 Give-0.097
 truth-0.096
Token went
Token ID1816
Feature activation+1.08e-02

Pos logit contributions
ives+0.022
 aback+0.022
 consequences+0.022
 truth+0.022
 Give+0.022
Neg logit contributions
 boarded-0.011
str-0.010
field-0.010
ep-0.010
aster-0.010
Token until
Token ID1566
Feature activation-6.16e-02

Pos logit contributions
ives+0.021
 consequences+0.021
 truth+0.020
 aback+0.020
 fault+0.020
Neg logit contributions
ep-0.015
 boarded-0.015
str-0.015
 sunny-0.014
 sudden-0.014
Token it
Token ID340
Feature activation+2.70e-02

Pos logit contributions
ep+0.091
 boarded+0.091
str+0.090
cker+0.088
pron+0.086
Neg logit contributions
 consequences-0.115
 aback-0.114
 truth-0.113
 Give-0.109
 fault-0.106
Token
Token ID198
Feature activation-3.49e-02

Pos logit contributions
ives+0.024
 consequences+0.024
 aback+0.024
 truth+0.023
 Give+0.022
Neg logit contributions
 boarded-0.031
ep-0.030
str-0.030
field-0.030
 sudden-0.030
Token
Token ID198
Feature activation-4.34e-02

Pos logit contributions
 boarded+0.060
ep+0.057
str+0.056
field+0.056
aster+0.055
Neg logit contributions
ives-0.060
 aback-0.059
 truth-0.058
 consequences-0.058
 Give-0.057
TokenSuddenly
Token ID38582
Feature activation-6.00e-02

Pos logit contributions
cker+0.051
field+0.050
 boarded+0.050
str+0.047
ep+0.046
Neg logit contributions
ives-0.097
 Give-0.096
 aback-0.094
 truth-0.092
 agrees-0.088
Token,
Token ID11
Feature activation-7.45e-02

Pos logit contributions
 boarded+0.094
ep+0.091
str+0.089
field+0.088
aster+0.087
Neg logit contributions
 aback-0.111
 consequences-0.110
ives-0.109
 truth-0.107
 Give-0.105
Token one
Token ID530
Feature activation-2.92e-02

Pos logit contributions
ep+0.122
cker+0.117
 boarded+0.116
str+0.114
field+0.111
Neg logit contributions
 aback-0.138
 truth-0.134
 consequences-0.133
 Give-0.125
ives-0.125
Token of
Token ID286
Feature activation-1.00e-01

Pos logit contributions
ep+0.053
 boarded+0.053
field+0.052
str+0.050
After+0.048
Neg logit contributions
 aback-0.045
 consequences-0.045
 truth-0.044
ives-0.044
 Give-0.043
Token the
Token ID262
Feature activation-8.15e-03

Pos logit contributions
str+0.164
 boarded+0.163
field+0.158
cker+0.155
ep+0.152
Neg logit contributions
 aback-0.188
 truth-0.184
 consequences-0.174
 Give-0.167
 apology-0.160
Token pieces
Token ID5207
Feature activation-5.94e-02

Pos logit contributions
 boarded+0.012
ep+0.012
 sudden+0.011
str+0.011
field+0.011
Neg logit contributions
ives-0.016
 aback-0.016
 consequences-0.015
 Give-0.015
 truth-0.015
Token of
Token ID286
Feature activation-6.11e-02

Pos logit contributions
 boarded+0.084
aster+0.083
cle+0.083
str+0.082
field+0.081
Neg logit contributions
ives-0.115
 aback-0.109
 consequences-0.108
 truth-0.108
 Give-0.104
Token ice
Token ID4771
Feature activation-3.38e-02

Pos logit contributions
 boarded+0.093
field+0.092
str+0.088
ep+0.087
pron+0.086
Neg logit contributions
 truth-0.118
 consequences-0.116
 aback-0.115
 Give-0.109
ives-0.108
Token broke
Token ID6265
Feature activation+1.25e-02

Pos logit contributions
 boarded+0.052
aster+0.050
 sudden+0.049
After+0.048
str+0.048
Neg logit contributions
ives-0.063
 consequences-0.063
 aback-0.062
 truth-0.059
 fault-0.059
Token in
Token ID287
Feature activation-4.28e-02

Pos logit contributions
 boarded+0.045
ep+0.043
str+0.043
field+0.042
 sudden+0.042
Neg logit contributions
ives-0.062
 aback-0.061
 consequences-0.060
 truth-0.059
 Give-0.059
Token the
Token ID262
Feature activation-2.47e-03

Pos logit contributions
 boarded+0.046
ep+0.043
str+0.043
field+0.042
aster+0.041
Neg logit contributions
ives-0.099
 aback-0.099
 consequences-0.099
 truth-0.097
 Give-0.096
Token circle
Token ID9197
Feature activation-7.13e-03

Pos logit contributions
 boarded+0.003
str+0.003
ep+0.003
aster+0.003
field+0.003
Neg logit contributions
 consequences-0.005
 truth-0.005
 aback-0.005
ives-0.005
 Give-0.005
Token.
Token ID13
Feature activation+2.44e-02

Pos logit contributions
 boarded+0.014
str+0.014
ep+0.013
 sudden+0.013
aster+0.013
Neg logit contributions
ives-0.010
 consequences-0.010
 aback-0.010
 truth-0.010
 Give-0.009
Token Faster
Token ID38996
Feature activation-1.44e-02

Pos logit contributions
ives+0.039
 aback+0.038
 consequences+0.038
 Give+0.037
 truth+0.037
Neg logit contributions
 boarded-0.044
ep-0.043
str-0.042
field-0.041
aster-0.041
Token and
Token ID290
Feature activation-1.00e-01

Pos logit contributions
 boarded+0.026
pron+0.024
field+0.024
aster+0.024
ep+0.023
Neg logit contributions
 aback-0.024
ives-0.024
 consequences-0.023
 Give-0.023
 agrees-0.022
Token faster
Token ID5443
Feature activation-4.66e-02

Pos logit contributions
 boarded+0.159
pron+0.151
field+0.150
ep+0.147
aster+0.145
Neg logit contributions
 aback-0.185
 consequences-0.180
ives-0.175
 agrees-0.174
 Give-0.174
Token they
Token ID484
Feature activation-4.65e-03

Pos logit contributions
 boarded+0.065
str+0.061
field+0.061
ep+0.061
pron+0.060
Neg logit contributions
 aback-0.094
 consequences-0.094
ives-0.093
 Give-0.090
 truth-0.089
Token went
Token ID1816
Feature activation+1.86e-02

Pos logit contributions
pron+0.006
aster+0.006
str+0.006
 wheat+0.006
field+0.006
Neg logit contributions
ives-0.009
 aback-0.009
 consequences-0.009
 agrees-0.009
 truth-0.009
Token until
Token ID1566
Feature activation-6.22e-02

Pos logit contributions
ives+0.038
 consequences+0.037
 aback+0.037
 truth+0.036
 Give+0.036
Neg logit contributions
 boarded-0.026
ep-0.025
str-0.024
field-0.024
 sudden-0.024
Token Sarah
Token ID10490
Feature activation+1.60e-02

Pos logit contributions
ep+0.093
cker+0.087
eh+0.085
 boarded+0.085
pron+0.084
Neg logit contributions
 consequences-0.119
 aback-0.118
 truth-0.116
 fault-0.113
 Give-0.110
Token around
Token ID1088
Feature activation-2.57e-02

Pos logit contributions
aster+0.096
pron+0.095
 boarded+0.094
ittens+0.094
str+0.093
Neg logit contributions
 aback-0.131
 consequences-0.123
 Give-0.122
 truth-0.120
ives-0.120
Token in
Token ID287
Feature activation-4.45e-02

Pos logit contributions
 boarded+0.026
ep+0.025
 sudden+0.025
field+0.025
aster+0.024
Neg logit contributions
ives-0.062
 aback-0.059
 consequences-0.059
 Give-0.059
 truth-0.058
Token circles
Token ID13332
Feature activation-9.56e-03

Pos logit contributions
 boarded+0.047
ep+0.044
str+0.044
field+0.042
aster+0.042
Neg logit contributions
ives-0.105
 aback-0.104
 consequences-0.104
 truth-0.102
 Give-0.101
Token.
Token ID13
Feature activation+2.95e-02

Pos logit contributions
 boarded+0.016
ep+0.016
str+0.015
field+0.015
aster+0.015
Neg logit contributions
ives-0.016
 aback-0.016
 consequences-0.016
 truth-0.016
 Give-0.015
Token Faster
Token ID38996
Feature activation-7.77e-03

Pos logit contributions
 consequences+0.046
ives+0.045
 aback+0.044
 truth+0.043
 Give+0.042
Neg logit contributions
 boarded-0.055
aster-0.052
 sudden-0.051
field-0.051
ep-0.051
Token and
Token ID290
Feature activation-9.99e-02

Pos logit contributions
 boarded+0.013
aster+0.012
pron+0.012
str+0.012
field+0.012
Neg logit contributions
 aback-0.014
ives-0.013
 consequences-0.013
 Give-0.013
 truth-0.013
Token faster
Token ID5443
Feature activation-4.69e-02

Pos logit contributions
 boarded+0.189
field+0.181
pron+0.180
aster+0.178
str+0.177
Neg logit contributions
 aback-0.152
 consequences-0.148
ives-0.143
 Give-0.141
 truth-0.141
Token it
Token ID340
Feature activation+8.29e-03

Pos logit contributions
 boarded+0.066
str+0.064
pron+0.064
field+0.062
ep+0.062
Neg logit contributions
 consequences-0.093
 aback-0.092
ives-0.091
 Give-0.089
 truth-0.088
Token went
Token ID1816
Feature activation+1.09e-02

Pos logit contributions
ives+0.018
 aback+0.018
 consequences+0.018
 truth+0.018
 Give+0.017
Neg logit contributions
 boarded-0.009
aster-0.009
str-0.009
pron-0.009
field-0.009
Token.
Token ID13
Feature activation+2.33e-02

Pos logit contributions
ives+0.022
 consequences+0.021
 aback+0.021
 truth+0.020
 Give+0.020
Neg logit contributions
 boarded-0.016
ep-0.015
str-0.015
field-0.015
 sudden-0.015
Token Then
Token ID3244
Feature activation-5.39e-02

Pos logit contributions
ives+0.036
 consequences+0.036
 aback+0.036
 truth+0.035
 Give+0.034
Neg logit contributions
 boarded-0.043
ep-0.041
str-0.040
aster-0.040
field-0.040
Token jewel
Token ID16927
Feature activation-1.97e-02

Pos logit contributions
ives+0.044
 aback+0.043
 Give+0.042
 consequences+0.042
 agrees+0.041
Neg logit contributions
 boarded-0.028
ep-0.027
str-0.026
field-0.026
 sudden-0.026
Token.
Token ID13
Feature activation+3.86e-02

Pos logit contributions
 boarded+0.038
ep+0.037
field+0.036
str+0.036
aster+0.036
Neg logit contributions
ives-0.029
 aback-0.029
 consequences-0.029
 truth-0.027
 Give-0.027
Token Then
Token ID3244
Feature activation-5.40e-02

Pos logit contributions
 consequences+0.071
ives+0.069
 aback+0.068
 agrees+0.064
 truth+0.064
Neg logit contributions
 boarded-0.062
 sudden-0.058
aster-0.057
str-0.057
After-0.057
Token,
Token ID11
Feature activation-6.90e-02

Pos logit contributions
 boarded+0.055
ep+0.052
str+0.051
field+0.049
aster+0.048
Neg logit contributions
 consequences-0.128
 aback-0.127
ives-0.126
 truth-0.125
 Give-0.125
Token one
Token ID530
Feature activation-3.16e-02

Pos logit contributions
str+0.087
ep+0.086
 boarded+0.082
field+0.079
pron+0.079
Neg logit contributions
 truth-0.140
 consequences-0.140
 aback-0.140
 Give-0.140
ives-0.134
Token of
Token ID286
Feature activation-9.97e-02

Pos logit contributions
 boarded+0.049
field+0.048
ep+0.048
str+0.047
 sudden+0.046
Neg logit contributions
 consequences-0.057
 truth-0.057
ives-0.056
 Give-0.056
 aback-0.055
Token them
Token ID606
Feature activation+1.02e-02

Pos logit contributions
str+0.157
 boarded+0.156
field+0.148
pron+0.144
 sudden+0.137
Neg logit contributions
 truth-0.185
 aback-0.184
 consequences-0.183
 Give-0.173
 apology-0.169
Token saw
Token ID2497
Feature activation-5.60e-03

Pos logit contributions
ives+0.022
 consequences+0.021
 aback+0.021
 truth+0.021
 Give+0.021
Neg logit contributions
 boarded-0.013
ep-0.012
str-0.012
aster-0.012
field-0.012
Token a
Token ID257
Feature activation+5.03e-03

Pos logit contributions
 boarded+0.007
str+0.007
ep+0.007
field+0.007
aster+0.007
Neg logit contributions
ives-0.012
 consequences-0.012
 aback-0.012
 truth-0.012
 Give-0.011
Token hole
Token ID7604
Feature activation-1.12e-03

Pos logit contributions
 truth+0.010
 consequences+0.009
 aback+0.009
 Give+0.009
ives+0.009
Neg logit contributions
 boarded-0.008
ep-0.007
aster-0.007
field-0.007
str-0.007
Token in
Token ID287
Feature activation-2.69e-02

Pos logit contributions
 boarded+0.002
str+0.002
 sudden+0.002
aster+0.002
ep+0.002
Neg logit contributions
ives-0.002
 consequences-0.002
 aback-0.002
 truth-0.002
 Give-0.002
Token
Token ID198
Feature activation-2.90e-02

Pos logit contributions
 consequences+0.069
ives+0.066
 aback+0.062
 fault+0.061
 truth+0.061
Neg logit contributions
 sudden-0.044
 boarded-0.044
After-0.043
At-0.040
str-0.040
Token
Token ID198
Feature activation-4.07e-02

Pos logit contributions
 boarded+0.043
ep+0.041
str+0.041
 sudden+0.041
aster+0.040
Neg logit contributions
 consequences-0.056
ives-0.056
 aback-0.056
 truth-0.053
 Give-0.053
TokenSo
Token ID2396
Feature activation-4.33e-02

Pos logit contributions
 boarded+0.050
field+0.047
str+0.045
cker+0.043
ep+0.043
Neg logit contributions
ives-0.089
 Give-0.088
 aback-0.087
 truth-0.087
 consequences-0.087
Token,
Token ID11
Feature activation-6.26e-02

Pos logit contributions
 boarded+0.058
ep+0.058
str+0.055
aster+0.054
field+0.054
Neg logit contributions
 consequences-0.089
 aback-0.088
 truth-0.087
 Give-0.086
ives-0.085
Token one
Token ID530
Feature activation-3.61e-02

Pos logit contributions
ep+0.096
str+0.092
 boarded+0.092
cker+0.090
pron+0.089
Neg logit contributions
 aback-0.116
 truth-0.115
 consequences-0.115
 Give-0.113
ives-0.111
Token of
Token ID286
Feature activation-9.93e-02

Pos logit contributions
 boarded+0.049
str+0.047
field+0.047
ep+0.046
 sudden+0.043
Neg logit contributions
 consequences-0.074
 truth-0.074
 Give-0.072
ives-0.072
 aback-0.071
Token the
Token ID262
Feature activation-1.03e-02

Pos logit contributions
str+0.161
 boarded+0.160
field+0.156
pron+0.151
cker+0.145
Neg logit contributions
 truth-0.181
 aback-0.179
 consequences-0.177
 Give-0.169
 apology-0.164
Token cows
Token ID22575
Feature activation-6.81e-02

Pos logit contributions
 sudden+0.014
 boarded+0.014
 sunny+0.013
ep+0.013
 wheat+0.013
Neg logit contributions
ives-0.021
 aback-0.021
 Give-0.020
 consequences-0.019
 agrees-0.019
Token offered
Token ID4438
Feature activation+1.42e-02

Pos logit contributions
 boarded+0.093
ep+0.090
str+0.089
aster+0.088
 sudden+0.088
Neg logit contributions
ives-0.137
 aback-0.134
 consequences-0.134
 truth-0.132
 Give-0.130
Token to
Token ID284
Feature activation-2.10e-02

Pos logit contributions
ives+0.028
 aback+0.028
 consequences+0.027
 Give+0.027
 truth+0.027
Neg logit contributions
 boarded-0.019
 wheat-0.018
ep-0.018
aster-0.018
str-0.018
Token carry
Token ID3283
Feature activation-1.83e-02

Pos logit contributions
 boarded+0.044
str+0.042
ep+0.042
field+0.042
 sudden+0.041
Neg logit contributions
 truth-0.028
 consequences-0.027
 aback-0.027
ives-0.027
 Give-0.026
Token
Token ID198
Feature activation-3.49e-02

Pos logit contributions
 consequences+0.053
ives+0.053
 aback+0.052
 truth+0.051
 Give+0.050
Neg logit contributions
 boarded-0.045
ep-0.043
str-0.043
field-0.042
aster-0.042
Token
Token ID198
Feature activation-4.33e-02

Pos logit contributions
 boarded+0.060
ep+0.057
str+0.057
field+0.057
aster+0.055
Neg logit contributions
ives-0.058
 aback-0.057
 truth-0.057
 Give-0.056
 consequences-0.056
TokenAt
Token ID2953
Feature activation-6.55e-02

Pos logit contributions
 boarded+0.052
field+0.050
str+0.049
cker+0.048
ep+0.047
Neg logit contributions
 Give-0.093
ives-0.092
 truth-0.090
 aback-0.089
 agrees-0.086
Token the
Token ID262
Feature activation-1.36e-02

Pos logit contributions
ep+0.111
 boarded+0.109
str+0.107
 sudden+0.107
cker+0.104
Neg logit contributions
 aback-0.106
 consequences-0.106
 truth-0.104
 fault-0.101
 advice-0.097
Token end
Token ID886
Feature activation-7.85e-03

Pos logit contributions
 boarded+0.023
ep+0.022
cker+0.021
str+0.020
ittens+0.020
Neg logit contributions
 truth-0.024
 consequences-0.024
 Give-0.021
ives-0.021
 fault-0.020
Token of
Token ID286
Feature activation-9.92e-02

Pos logit contributions
 boarded+0.013
ep+0.012
str+0.012
field+0.012
 sudden+0.012
Neg logit contributions
ives-0.014
 aback-0.014
 truth-0.014
 Give-0.014
 consequences-0.014
Token the
Token ID262
Feature activation-2.50e-02

Pos logit contributions
 boarded+0.159
ep+0.154
str+0.152
pron+0.148
 sudden+0.143
Neg logit contributions
 aback-0.181
 truth-0.171
 consequences-0.168
 Give-0.162
 fault-0.162
Token day
Token ID1110
Feature activation-2.78e-02

Pos logit contributions
 boarded+0.031
ep+0.030
str+0.027
cker+0.026
aster+0.026
Neg logit contributions
 truth-0.055
 consequences-0.054
 Give-0.051
 apology-0.050
ives-0.049
Token,
Token ID11
Feature activation-5.90e-02

Pos logit contributions
ep+0.045
 boarded+0.045
str+0.044
field+0.043
aster+0.043
Neg logit contributions
ives-0.049
 aback-0.047
 consequences-0.047
 Give-0.047
 truth-0.046
Token the
Token ID262
Feature activation-8.18e-03

Pos logit contributions
ep+0.109
str+0.104
cker+0.103
 boarded+0.103
aster+0.101
Neg logit contributions
 aback-0.090
 truth-0.087
 Give-0.085
 consequences-0.083
ives-0.081
Token best
Token ID1266
Feature activation-3.22e-02

Pos logit contributions
 boarded+0.012
ep+0.012
str+0.011
aster+0.011
field+0.011
Neg logit contributions
 consequences-0.016
 truth-0.016
ives-0.015
 aback-0.015
 Give-0.015
Token
Token ID198
Feature activation-3.03e-02

Pos logit contributions
 consequences+0.039
ives+0.039
 aback+0.039
 truth+0.038
 Give+0.037
Neg logit contributions
 boarded-0.037
ep-0.034
str-0.034
field-0.034
 sudden-0.034
Token
Token ID198
Feature activation-3.82e-02

Pos logit contributions
 boarded+0.050
ep+0.048
str+0.047
field+0.046
aster+0.046
Neg logit contributions
ives-0.054
 aback-0.053
 consequences-0.053
 truth-0.052
 Give-0.051
TokenThen
Token ID6423
Feature activation-5.71e-02

Pos logit contributions
 boarded+0.034
field+0.034
cker+0.033
str+0.032
 wheat+0.031
Neg logit contributions
ives-0.094
 Give-0.093
 aback-0.092
 truth-0.091
 agrees-0.088
Token,
Token ID11
Feature activation-7.06e-02

Pos logit contributions
ep+0.085
str+0.084
 boarded+0.084
pron+0.081
eh+0.080
Neg logit contributions
 aback-0.108
 consequences-0.106
 truth-0.103
 Give-0.102
 agrees-0.100
Token one
Token ID530
Feature activation-3.06e-02

Pos logit contributions
ep+0.096
str+0.094
pron+0.092
 boarded+0.091
eh+0.089
Neg logit contributions
 aback-0.144
 consequences-0.135
 truth-0.134
 Give-0.130
 agrees-0.126
Token of
Token ID286
Feature activation-9.90e-02

Pos logit contributions
ep+0.055
 boarded+0.053
field+0.053
str+0.052
chard+0.051
Neg logit contributions
 aback-0.046
 truth-0.045
 Give-0.045
ives-0.044
 consequences-0.044
Token the
Token ID262
Feature activation-1.04e-02

Pos logit contributions
str+0.155
 boarded+0.150
field+0.145
pron+0.143
cker+0.137
Neg logit contributions
 aback-0.188
 truth-0.182
 consequences-0.178
 Give-0.171
 apology-0.165
Token children
Token ID1751
Feature activation-3.68e-02

Pos logit contributions
 boarded+0.017
ep+0.016
str+0.016
field+0.016
aster+0.016
Neg logit contributions
 consequences-0.018
ives-0.018
 aback-0.018
 truth-0.018
 Give-0.018
Token said
Token ID531
Feature activation+4.49e-02

Pos logit contributions
 boarded+0.048
ep+0.047
str+0.046
aster+0.045
field+0.045
Neg logit contributions
ives-0.076
 aback-0.075
 consequences-0.074
 truth-0.073
 Give-0.073
Token:
Token ID25
Feature activation+1.38e-03

Pos logit contributions
ives+0.089
 consequences+0.087
 aback+0.085
 fault+0.082
 truth+0.082
Neg logit contributions
 boarded-0.064
str-0.060
ep-0.060
 sudden-0.059
 wheat-0.058
Token "
Token ID366
Feature activation+2.97e-02

Pos logit contributions
ives+0.002
 aback+0.002
 truth+0.002
 consequences+0.002
 Give+0.002
Neg logit contributions
 boarded-0.002
ep-0.002
str-0.002
 sudden-0.002
field-0.002
Token
Token ID198
Feature activation-3.75e-02

Pos logit contributions
 boarded+0.051
ep+0.049
str+0.048
field+0.048
aster+0.047
Neg logit contributions
ives-0.053
 aback-0.053
 consequences-0.052
 truth-0.051
 Give-0.051
TokenBut
Token ID1537
Feature activation-4.04e-02

Pos logit contributions
 boarded+0.043
field+0.042
 blowing+0.039
cker+0.038
pron+0.038
Neg logit contributions
ives-0.083
 Give-0.083
 aback-0.081
 truth-0.079
 consequences-0.078
Token then
Token ID788
Feature activation-6.42e-02

Pos logit contributions
ep+0.057
pron+0.056
str+0.055
 boarded+0.055
eh+0.052
Neg logit contributions
 aback-0.075
 Give-0.072
 consequences-0.071
 begs-0.067
 truth-0.067
Token,
Token ID11
Feature activation-7.01e-02

Pos logit contributions
 boarded+0.098
ep+0.095
str+0.093
field+0.092
pron+0.091
Neg logit contributions
 aback-0.120
 consequences-0.120
 truth-0.116
 Give-0.115
ives-0.114
Token one
Token ID530
Feature activation-3.29e-02

Pos logit contributions
ep+0.111
str+0.109
 boarded+0.106
pron+0.104
field+0.103
Neg logit contributions
 aback-0.125
 consequences-0.122
 truth-0.120
 Give-0.117
ives-0.108
Token of
Token ID286
Feature activation-9.90e-02

Pos logit contributions
 boarded+0.051
field+0.051
ep+0.050
str+0.048
After+0.047
Neg logit contributions
 consequences-0.060
 truth-0.059
 aback-0.058
ives-0.057
 Give-0.057
Token the
Token ID262
Feature activation-5.98e-03

Pos logit contributions
str+0.165
 boarded+0.161
field+0.159
pron+0.157
cker+0.143
Neg logit contributions
 aback-0.177
 truth-0.175
 consequences-0.175
 Give-0.160
 apology-0.157
Token children
Token ID1751
Feature activation-3.73e-02

Pos logit contributions
 boarded+0.009
 sudden+0.009
ep+0.009
str+0.009
 sunny+0.009
Neg logit contributions
 aback-0.011
ives-0.011
 consequences-0.010
 Give-0.010
 truth-0.010
Token came
Token ID1625
Feature activation+3.14e-02

Pos logit contributions
 boarded+0.049
ep+0.047
str+0.046
aster+0.046
field+0.046
Neg logit contributions
ives-0.077
 aback-0.076
 consequences-0.076
 truth-0.075
 Give-0.074
Token running
Token ID2491
Feature activation+3.14e-02

Pos logit contributions
ives+0.060
 consequences+0.058
 aback+0.056
 truth+0.056
 Give+0.056
Neg logit contributions
 boarded-0.048
ep-0.047
 sudden-0.045
field-0.044
 blowing-0.044
Token up
Token ID510
Feature activation-1.98e-02

Pos logit contributions
ives+0.066
 consequences+0.065
 truth+0.063
 Give+0.061
 apology+0.061
Neg logit contributions
 boarded-0.040
ep-0.039
 sunny-0.038
 blowing-0.037
 wheat-0.037
Token
Token ID198
Feature activation-4.26e-02

Pos logit contributions
 boarded+0.046
ep+0.045
str+0.043
field+0.043
 sudden+0.042
Neg logit contributions
 truth-0.064
ives-0.064
 aback-0.064
 consequences-0.063
 Give-0.062
Token
Token ID198
Feature activation-4.78e-02

Pos logit contributions
 boarded+0.066
field+0.062
str+0.062
 wheat+0.060
 blowing+0.060
Neg logit contributions
 Give-0.077
ives-0.075
 aback-0.075
 truth-0.074
 consequences-0.071
TokenAt
Token ID2953
Feature activation-6.36e-02

Pos logit contributions
 boarded+0.062
field+0.060
 blowing+0.059
str+0.059
cker+0.058
Neg logit contributions
 Give-0.098
ives-0.095
 aback-0.094
 truth-0.093
 agrees-0.090
Token the
Token ID262
Feature activation-9.74e-03

Pos logit contributions
str+0.103
ep+0.101
 sudden+0.100
 boarded+0.099
cker+0.098
Neg logit contributions
 aback-0.110
 consequences-0.110
 truth-0.107
 fault-0.106
 apology-0.096
Token end
Token ID886
Feature activation-2.88e-03

Pos logit contributions
 boarded+0.015
ep+0.015
ittens+0.014
str+0.014
cker+0.014
Neg logit contributions
 truth-0.018
 consequences-0.018
 Give-0.016
 fault-0.016
 apology-0.015
Token of
Token ID286
Feature activation-9.90e-02

Pos logit contributions
 boarded+0.004
ep+0.004
str+0.004
field+0.004
aster+0.004
Neg logit contributions
ives-0.005
 aback-0.005
 consequences-0.005
 truth-0.005
 Give-0.005
Token the
Token ID262
Feature activation-2.41e-02

Pos logit contributions
 boarded+0.162
str+0.159
pron+0.154
ep+0.149
 sudden+0.147
Neg logit contributions
 aback-0.170
 truth-0.169
 consequences-0.164
 Give-0.161
 fault-0.157
Token day
Token ID1110
Feature activation-2.73e-02

Pos logit contributions
 boarded+0.031
ep+0.031
str+0.027
After+0.027
cker+0.027
Neg logit contributions
 truth-0.051
 consequences-0.051
 Give-0.047
 apology-0.047
 rules-0.045
Token,
Token ID11
Feature activation-5.82e-02

Pos logit contributions
ep+0.044
 boarded+0.044
str+0.043
aster+0.042
field+0.042
Neg logit contributions
ives-0.049
 aback-0.047
 consequences-0.046
 Give-0.046
 truth-0.046
Token they
Token ID484
Feature activation-1.01e-02

Pos logit contributions
ep+0.111
str+0.107
 boarded+0.107
cker+0.106
aster+0.105
Neg logit contributions
 aback-0.085
 truth-0.082
 Give-0.082
 consequences-0.079
ives-0.079
Token said
Token ID531
Feature activation+4.04e-02

Pos logit contributions
 sudden+0.014
ep+0.014
str+0.014
pron+0.014
 blowing+0.013
Neg logit contributions
ives-0.019
 truth-0.019
 aback-0.019
 agrees-0.018
 consequences-0.018
Token around
Token ID1088
Feature activation-1.15e-02

Pos logit contributions
ives+0.003
 consequences+0.002
 aback+0.002
 truth+0.002
 Give+0.002
Neg logit contributions
 boarded-0.002
ep-0.002
str-0.002
 sudden-0.002
aster-0.002
Token and
Token ID290
Feature activation-6.19e-02

Pos logit contributions
 boarded+0.016
ep+0.015
aster+0.015
field+0.015
 sudden+0.015
Neg logit contributions
ives-0.023
 aback-0.023
 consequences-0.023
 Give-0.022
 truth-0.022
Token around
Token ID1088
Feature activation-2.39e-02

Pos logit contributions
str+0.107
ep+0.101
 boarded+0.099
 wheat+0.099
pron+0.098
Neg logit contributions
 aback-0.105
 Give-0.101
ives-0.100
 consequences-0.099
 truth-0.098
Token.
Token ID13
Feature activation+2.78e-02

Pos logit contributions
 boarded+0.035
ep+0.035
 sunny+0.034
aster+0.034
field+0.034
Neg logit contributions
ives-0.047
 aback-0.045
 consequences-0.044
 Give-0.043
 truth-0.042
Token Faster
Token ID38996
Feature activation-7.13e-03

Pos logit contributions
 consequences+0.045
ives+0.044
 aback+0.044
 truth+0.043
 Give+0.041
Neg logit contributions
 boarded-0.050
str-0.047
 sudden-0.047
field-0.047
aster-0.046
Token and
Token ID290
Feature activation-9.90e-02

Pos logit contributions
 boarded+0.012
ep+0.011
aster+0.011
str+0.011
field+0.011
Neg logit contributions
 aback-0.013
ives-0.012
 consequences-0.012
 Give-0.012
 truth-0.012
Token faster
Token ID5443
Feature activation-4.33e-02

Pos logit contributions
 boarded+0.180
field+0.171
str+0.168
ep+0.167
aster+0.167
Neg logit contributions
 aback-0.158
ives-0.155
 consequences-0.155
 truth-0.150
 Give-0.149
Token she
Token ID673
Feature activation+2.12e-03

Pos logit contributions
 boarded+0.061
str+0.058
ep+0.058
field+0.056
pron+0.055
Neg logit contributions
 consequences-0.087
ives-0.087
 aback-0.087
 truth-0.084
 Give-0.084
Token went
Token ID1816
Feature activation+8.88e-03

Pos logit contributions
ives+0.004
 aback+0.004
 consequences+0.004
 truth+0.004
 Give+0.004
Neg logit contributions
 boarded-0.003
ep-0.003
str-0.003
field-0.003
aster-0.003
Token,
Token ID11
Feature activation-5.30e-02

Pos logit contributions
ives+0.019
 consequences+0.018
 aback+0.018
 truth+0.018
 fault+0.018
Neg logit contributions
 boarded-0.012
ep-0.012
field-0.011
 sunny-0.011
str-0.011
Token until
Token ID1566
Feature activation-6.04e-02

Pos logit contributions
 boarded+0.074
str+0.071
ep+0.070
field+0.069
 sudden+0.068
Neg logit contributions
 aback-0.106
ives-0.106
 consequences-0.105
 truth-0.102
 Give-0.102
Token be
Token ID307
Feature activation+6.05e-03

Pos logit contributions
 aback+0.006
 truth+0.006
 consequences+0.006
ives+0.006
 Give+0.006
Neg logit contributions
 boarded-0.006
ep-0.005
str-0.005
 sudden-0.005
aster-0.005
Token speeding
Token ID26347
Feature activation+9.46e-03

Pos logit contributions
 aback+0.011
 consequences+0.011
ives+0.010
 truth+0.010
 fault+0.010
Neg logit contributions
aster-0.009
field-0.009
 boarded-0.009
str-0.009
cle-0.008
Token up
Token ID510
Feature activation-2.14e-02

Pos logit contributions
ives+0.019
 consequences+0.018
 truth+0.018
 Give+0.017
 aback+0.017
Neg logit contributions
 boarded-0.014
ep-0.013
str-0.013
field-0.013
 sudden-0.013
Token!
Token ID0
Feature activation+2.81e-02

Pos logit contributions
 boarded+0.032
ep+0.030
str+0.030
aster+0.029
field+0.029
Neg logit contributions
 aback-0.041
ives-0.041
 consequences-0.041
 truth-0.040
 Give-0.039
Token Faster
Token ID38996
Feature activation-7.02e-03

Pos logit contributions
ives+0.051
 consequences+0.051
 aback+0.050
 truth+0.049
 Give+0.049
Neg logit contributions
 boarded-0.045
ep-0.043
str-0.042
 sudden-0.042
field-0.041
Token and
Token ID290
Feature activation-9.89e-02

Pos logit contributions
 boarded+0.012
ep+0.011
aster+0.011
pron+0.011
field+0.011
Neg logit contributions
 aback-0.013
ives-0.012
 consequences-0.012
 Give-0.012
 truth-0.012
Token faster
Token ID5443
Feature activation-4.55e-02

Pos logit contributions
 boarded+0.186
field+0.177
pron+0.176
ep+0.174
aster+0.174
Neg logit contributions
 aback-0.154
 consequences-0.148
ives-0.146
 Give-0.144
 agrees-0.143
Token it
Token ID340
Feature activation+9.50e-03

Pos logit contributions
 boarded+0.065
pron+0.061
ep+0.061
str+0.061
field+0.060
Neg logit contributions
 aback-0.092
 consequences-0.091
ives-0.089
 Give-0.087
 truth-0.086
Token went
Token ID1816
Feature activation+7.90e-03

Pos logit contributions
ives+0.021
 aback+0.021
 consequences+0.021
 truth+0.020
 Give+0.020
Neg logit contributions
 boarded-0.011
str-0.010
aster-0.010
field-0.010
pron-0.010
Token.
Token ID13
Feature activation+2.07e-02

Pos logit contributions
ives+0.015
 consequences+0.015
 truth+0.015
 aback+0.015
 fault+0.014
Neg logit contributions
 boarded-0.011
ep-0.011
str-0.011
field-0.011
 sudden-0.011
Token Jo
Token ID5302
Feature activation+5.48e-03

Pos logit contributions
ives+0.037
 consequences+0.036
 aback+0.036
 truth+0.035
 Give+0.035
Neg logit contributions
 boarded-0.034
ep-0.032
str-0.032
field-0.032
 sudden-0.032
Token started
Token ID2067
Feature activation+1.66e-03

Pos logit contributions
str+0.082
aster+0.081
field+0.080
 boarded+0.079
cker+0.079
Neg logit contributions
 aback-0.107
 Give-0.104
ives-0.100
 agrees-0.099
 consequences-0.094
Token to
Token ID284
Feature activation-6.04e-03

Pos logit contributions
ives+0.003
 aback+0.003
 consequences+0.003
 truth+0.003
 Give+0.003
Neg logit contributions
 boarded-0.003
ep-0.003
str-0.003
field-0.003
aster-0.003
Token bounce
Token ID20110
Feature activation+3.59e-03

Pos logit contributions
 boarded+0.008
ep+0.008
str+0.008
field+0.008
aster+0.008
Neg logit contributions
ives-0.012
 aback-0.012
 consequences-0.012
 truth-0.012
 Give-0.012
Token.
Token ID13
Feature activation+2.29e-02

Pos logit contributions
ives+0.008
 consequences+0.008
 truth+0.008
 apology+0.008
 fault+0.007
Neg logit contributions
ep-0.004
 sunny-0.004
 field-0.004
 which-0.004
After-0.004
Token Higher
Token ID16038
Feature activation-2.19e-02

Pos logit contributions
 consequences+0.045
ives+0.044
 truth+0.042
 aback+0.041
 fault+0.041
Neg logit contributions
 sudden-0.034
 boarded-0.033
After-0.033
aster-0.032
field-0.031
Token and
Token ID290
Feature activation-9.89e-02

Pos logit contributions
 boarded+0.043
pron+0.041
aster+0.039
str+0.038
ep+0.038
Neg logit contributions
 aback-0.034
 consequences-0.032
ives-0.032
 Give-0.031
 agrees-0.030
Token higher
Token ID2440
Feature activation-6.95e-02

Pos logit contributions
 boarded+0.161
pron+0.148
aster+0.146
field+0.146
ep+0.144
Neg logit contributions
 aback-0.182
 consequences-0.172
 Give-0.169
 agrees-0.168
ives-0.168
Token he
Token ID339
Feature activation-8.51e-04

Pos logit contributions
 boarded+0.111
pron+0.107
str+0.104
ep+0.100
field+0.098
Neg logit contributions
 aback-0.128
 consequences-0.128
ives-0.121
 Give-0.120
 truth-0.119
Token went
Token ID1816
Feature activation+7.00e-03

Pos logit contributions
 boarded+0.001
str+0.001
ep+0.001
aster+0.001
field+0.001
Neg logit contributions
ives-0.002
 aback-0.002
 consequences-0.002
 truth-0.002
 Give-0.002
Token until
Token ID1566
Feature activation-5.46e-02

Pos logit contributions
ives+0.015
 consequences+0.015
 aback+0.014
 truth+0.014
 fault+0.014
Neg logit contributions
str-0.008
 sunny-0.008
ep-0.008
 boarded-0.008
field-0.008
Token he
Token ID339
Feature activation+2.09e-02

Pos logit contributions
ep+0.082
str+0.082
 boarded+0.081
cker+0.077
pron+0.075
Neg logit contributions
 consequences-0.104
 aback-0.103
 truth-0.100
 Give-0.097
ives-0.095
Token.
Token ID13
Feature activation+1.92e-02

Pos logit contributions
 boarded+0.005
ep+0.005
str+0.005
 sudden+0.005
field+0.005
Neg logit contributions
 aback-0.009
 consequences-0.009
ives-0.009
 truth-0.009
 Give-0.009
Token Every
Token ID3887
Feature activation-3.08e-02

Pos logit contributions
ives+0.029
 consequences+0.029
 aback+0.028
 truth+0.028
 Give+0.027
Neg logit contributions
 boarded-0.037
ep-0.035
str-0.035
field-0.034
 sudden-0.034
Token day
Token ID1110
Feature activation-4.17e-02

Pos logit contributions
 boarded+0.033
ep+0.032
str+0.031
 sudden+0.029
After+0.029
Neg logit contributions
 truth-0.072
 consequences-0.071
 aback-0.069
 Give-0.068
ives-0.067
Token,
Token ID11
Feature activation-6.96e-02

Pos logit contributions
 boarded+0.055
ep+0.055
str+0.053
field+0.053
cker+0.050
Neg logit contributions
 aback-0.086
ives-0.085
 consequences-0.084
 truth-0.083
 Give-0.083
Token more
Token ID517
Feature activation-6.50e-02

Pos logit contributions
 boarded+0.101
ep+0.101
str+0.099
field+0.094
cker+0.092
Neg logit contributions
 aback-0.132
 truth-0.130
 consequences-0.129
 Give-0.126
ives-0.125
Token and
Token ID290
Feature activation-9.88e-02

Pos logit contributions
 boarded+0.103
str+0.100
field+0.100
ep+0.098
aster+0.097
Neg logit contributions
 consequences-0.119
 truth-0.116
 aback-0.116
ives-0.116
 Give-0.112
Token more
Token ID517
Feature activation-6.07e-02

Pos logit contributions
str+0.142
 boarded+0.141
field+0.140
pron+0.136
ep+0.136
Neg logit contributions
 consequences-0.192
 aback-0.190
 truth-0.181
 agrees-0.178
 Give-0.177
Token people
Token ID661
Feature activation-4.38e-02

Pos logit contributions
 boarded+0.094
str+0.092
field+0.091
ep+0.089
aster+0.088
Neg logit contributions
 consequences-0.113
 aback-0.111
ives-0.111
 truth-0.109
 Give-0.106
Token would
Token ID561
Feature activation+2.18e-03

Pos logit contributions
aster+0.062
 sudden+0.060
 boarded+0.060
str+0.059
After+0.059
Neg logit contributions
ives-0.085
 aback-0.083
 truth-0.081
 consequences-0.080
 agrees-0.078
Token come
Token ID1282
Feature activation+3.87e-02

Pos logit contributions
 truth+0.004
ives+0.004
 aback+0.004
 consequences+0.004
 Give+0.004
Neg logit contributions
 boarded-0.004
str-0.004
ep-0.003
 sudden-0.003
aster-0.003
Token to
Token ID284
Feature activation-6.15e-03

Pos logit contributions
ives+0.063
 consequences+0.063
 aback+0.063
 truth+0.061
 Give+0.060
Neg logit contributions
 boarded-0.069
ep-0.066
str-0.065
field-0.064
 sudden-0.064
Token But
Token ID887
Feature activation-3.48e-02

Pos logit contributions
 consequences+0.057
ives+0.056
 aback+0.053
 truth+0.051
 agrees+0.049
Neg logit contributions
 boarded-0.081
aster-0.077
ep-0.075
str-0.075
field-0.075
Token Sam
Token ID3409
Feature activation+5.98e-03

Pos logit contributions
str+0.061
ep+0.058
 sudden+0.057
field+0.056
cker+0.056
Neg logit contributions
 Give-0.057
 aback-0.056
 consequences-0.053
 truth-0.052
 begs-0.050
Token was
Token ID373
Feature activation-4.03e-05

Pos logit contributions
ives+0.012
 consequences+0.012
 aback+0.012
 truth+0.012
 Give+0.011
Neg logit contributions
 boarded-0.008
ep-0.008
str-0.008
field-0.008
aster-0.008
Token scared
Token ID12008
Feature activation+2.26e-02

Pos logit contributions
 boarded+0.000
field+0.000
str+0.000
 sudden+0.000
aster+0.000
Neg logit contributions
 aback-0.000
 truth-0.000
 consequences-0.000
ives-0.000
 Give-0.000
Token and
Token ID290
Feature activation-5.39e-02

Pos logit contributions
ives+0.048
 consequences+0.047
 aback+0.047
 fault+0.046
 truth+0.045
Neg logit contributions
 sudden-0.027
ep-0.026
 wheat-0.026
 sunny-0.026
 which-0.025
Token said
Token ID531
Feature activation+5.83e-02

Pos logit contributions
 boarded+0.084
str+0.082
field+0.081
aster+0.080
ep+0.080
Neg logit contributions
 aback-0.096
 consequences-0.095
 Give-0.095
 truth-0.093
ives-0.093
Token,
Token ID11
Feature activation+7.38e-03

Pos logit contributions
ives+0.133
 aback+0.129
 consequences+0.127
 truth+0.123
 agrees+0.121
Neg logit contributions
str-0.066
 boarded-0.063
aster-0.061
ep-0.061
 wheat-0.059
Token "
Token ID366
Feature activation+4.29e-02

Pos logit contributions
ives+0.013
 aback+0.013
 consequences+0.013
 truth+0.012
 Give+0.012
Neg logit contributions
 boarded-0.012
ep-0.012
str-0.012
field-0.012
aster-0.012
TokenNo
Token ID2949
Feature activation+2.82e-03

Pos logit contributions
 aback+0.077
 consequences+0.076
ives+0.075
 Give+0.074
 truth+0.073
Neg logit contributions
 boarded-0.071
field-0.066
ep-0.066
str-0.066
pron-0.065
Token,
Token ID11
Feature activation-6.97e-03

Pos logit contributions
ives+0.007
 aback+0.006
 consequences+0.006
 truth+0.006
 Give+0.006
Neg logit contributions
str-0.003
aster-0.003
 sudden-0.003
ep-0.003
 wheat-0.003
Token thank
Token ID5875
Feature activation-2.95e-03

Pos logit contributions
 boarded+0.013
field+0.012
ep+0.012
 sudden+0.012
str+0.012
Neg logit contributions
 Give-0.011
 consequences-0.011
 truth-0.011
 aback-0.011
ives-0.011
Token Sam
Token ID3409
Feature activation-2.12e-03

Pos logit contributions
 boarded+0.091
str+0.088
field+0.088
ep+0.087
 sudden+0.086
Neg logit contributions
 aback-0.085
 consequences-0.085
ives-0.083
 Give-0.083
 truth-0.082
Token felt
Token ID2936
Feature activation-1.43e-03

Pos logit contributions
 boarded+0.003
ep+0.003
str+0.003
field+0.003
 sudden+0.003
Neg logit contributions
ives-0.004
 consequences-0.004
 aback-0.004
 truth-0.004
 Give-0.004
Token sorry
Token ID7926
Feature activation+5.29e-02

Pos logit contributions
 boarded+0.002
 sudden+0.002
 wheat+0.002
ep+0.002
 sunny+0.002
Neg logit contributions
ives-0.003
 Give-0.002
 aback-0.002
 consequences-0.002
 fault-0.002
Token and
Token ID290
Feature activation-5.12e-02

Pos logit contributions
 aback+0.117
ives+0.113
 Give+0.107
 consequences+0.104
 agrees+0.104
Neg logit contributions
After-0.061
 blowing-0.058
 they-0.058
 sunny-0.057
ep-0.057
Token ashamed
Token ID22461
Feature activation+3.81e-02

Pos logit contributions
 boarded+0.097
str+0.092
ep+0.092
 sudden+0.090
aster+0.088
Neg logit contributions
ives-0.080
 consequences-0.078
 aback-0.077
 truth-0.075
 Give-0.074
Token.
Token ID13
Feature activation+5.94e-02

Pos logit contributions
 aback+0.084
 fault+0.082
ives+0.081
 agrees+0.077
 Give+0.076
Neg logit contributions
 they-0.043
 when-0.040
 which-0.039
 while-0.037
 blowing-0.037
Token They
Token ID1119
Feature activation-3.48e-05

Pos logit contributions
ives+0.108
 aback+0.105
 consequences+0.105
 truth+0.099
 Give+0.096
Neg logit contributions
 boarded-0.098
 sudden-0.092
ep-0.091
 wheat-0.091
After-0.091
Token told
Token ID1297
Feature activation+2.85e-02

Pos logit contributions
 boarded+0.000
ep+0.000
str+0.000
field+0.000
aster+0.000
Neg logit contributions
ives-0.000
 aback-0.000
 consequences-0.000
 truth-0.000
 Give-0.000
Token Mom
Token ID11254
Feature activation-6.61e-03

Pos logit contributions
ives+0.068
 aback+0.068
 consequences+0.067
 truth+0.066
 Give+0.066
Neg logit contributions
 boarded-0.030
ep-0.028
str-0.027
aster-0.027
field-0.026
Token and
Token ID290
Feature activation-5.21e-02

Pos logit contributions
 boarded+0.009
ep+0.009
str+0.009
aster+0.008
field+0.008
Neg logit contributions
ives-0.013
 aback-0.013
 consequences-0.013
 truth-0.013
 Give-0.013
Token Dad
Token ID17415
Feature activation-2.11e-02

Pos logit contributions
ep+0.084
 boarded+0.080
field+0.079
str+0.078
 sudden+0.078
Neg logit contributions
 aback-0.095
ives-0.094
 consequences-0.093
 truth-0.091
 fault-0.089
Token went
Token ID1816
Feature activation+2.96e-02

Pos logit contributions
aster+0.078
ep+0.077
str+0.076
field+0.076
 boarded+0.076
Neg logit contributions
 aback-0.086
ives-0.086
 consequences-0.086
 agrees-0.084
 truth-0.083
Token to
Token ID284
Feature activation-9.72e-03

Pos logit contributions
 consequences+0.050
 aback+0.049
ives+0.049
 truth+0.048
 Give+0.047
Neg logit contributions
 boarded-0.053
ep-0.050
 sudden-0.048
aster-0.048
 wheat-0.047
Token play
Token ID711
Feature activation+1.07e-02

Pos logit contributions
 boarded+0.013
aster+0.012
ep+0.012
field+0.012
str+0.012
Neg logit contributions
ives-0.021
 consequences-0.021
 aback-0.020
 Give-0.020
 truth-0.020
Token together
Token ID1978
Feature activation-1.07e-02

Pos logit contributions
 aback+0.023
ives+0.023
 fault+0.023
 consequences+0.023
 Give+0.022
Neg logit contributions
ep-0.013
aster-0.012
 sunny-0.012
 sudden-0.011
pron-0.011
Token.
Token ID13
Feature activation+4.97e-02

Pos logit contributions
 boarded+0.018
aster+0.018
ep+0.018
field+0.018
 sunny+0.018
Neg logit contributions
 aback-0.018
ives-0.018
 consequences-0.017
 Give-0.017
 truth-0.016
Token<|endoftext|>
Token ID50256
Feature activation+4.54e-02

Pos logit contributions
 aback+0.087
 consequences+0.087
ives+0.087
 truth+0.082
 Give+0.081
Neg logit contributions
 boarded-0.083
str-0.078
ep-0.077
 sudden-0.077
field-0.077
TokenL
Token ID43
Feature activation-1.47e-02

Pos logit contributions
 consequences+0.101
 aback+0.099
 fault+0.099
 apology+0.099
ives+0.097
Neg logit contributions
 boarded-0.052
field-0.050
ep-0.049
 field-0.048
str-0.047
Tokenily
Token ID813
Feature activation+9.29e-03

Pos logit contributions
 boarded+0.022
ep+0.022
aster+0.022
str+0.020
field+0.020
Neg logit contributions
 aback-0.028
 consequences-0.028
ives-0.028
 Give-0.027
 truth-0.027
Token liked
Token ID8288
Feature activation-2.62e-03

Pos logit contributions
 consequences+0.020
ives+0.020
 aback+0.020
 truth+0.019
 Give+0.019
Neg logit contributions
 boarded-0.012
ep-0.011
str-0.011
aster-0.011
field-0.011
Token to
Token ID284
Feature activation+8.83e-04

Pos logit contributions
 boarded+0.006
ep+0.006
field+0.006
str+0.006
cker+0.006
Neg logit contributions
 truth-0.003
ives-0.003
 consequences-0.002
 Give-0.002
 fault-0.002
Token tie
Token ID9839
Feature activation-1.20e-02

Pos logit contributions
 truth+0.001
ives+0.001
 Give+0.001
 aback+0.001
 consequences+0.001
Neg logit contributions
 boarded-0.002
str-0.002
ep-0.001
field-0.001
 sudden-0.001
Token a
Token ID257
Feature activation+7.23e-03

Pos logit contributions
ives+0.002
 consequences+0.002
 aback+0.002
 truth+0.002
 Give+0.002
Neg logit contributions
 boarded-0.001
aster-0.001
ep-0.001
 sudden-0.001
 wheat-0.001
Token spoon
Token ID24556
Feature activation-6.20e-03

Pos logit contributions
ives+0.012
 aback+0.012
 truth+0.012
 Give+0.011
 apology+0.011
Neg logit contributions
 boarded-0.012
ep-0.012
field-0.011
ths-0.011
 sailed-0.011
Token and
Token ID290
Feature activation-6.78e-02

Pos logit contributions
 boarded+0.010
 sudden+0.010
str+0.010
ep+0.009
field+0.009
Neg logit contributions
ives-0.011
 aback-0.011
 consequences-0.011
 truth-0.011
 Give-0.011
Token a
Token ID257
Feature activation-5.60e-04

Pos logit contributions
str+0.114
field+0.103
ep+0.100
 boarded+0.098
ich+0.097
Neg logit contributions
ives-0.120
 aback-0.117
 agrees-0.114
 consequences-0.114
 Give-0.113
Token bowl
Token ID9396
Feature activation-1.79e-02

Pos logit contributions
 boarded+0.001
ep+0.001
str+0.001
field+0.001
aster+0.001
Neg logit contributions
 truth-0.001
ives-0.001
 aback-0.001
 consequences-0.001
 Give-0.001
Token.
Token ID13
Feature activation+5.70e-02

Pos logit contributions
 boarded+0.028
ep+0.027
aster+0.027
field+0.026
str+0.026
Neg logit contributions
 consequences-0.033
 aback-0.032
ives-0.032
 truth-0.031
 Give-0.031
Token She
Token ID1375
Feature activation-1.04e-02

Pos logit contributions
 consequences+0.093
 aback+0.088
ives+0.086
 truth+0.084
 Give+0.081
Neg logit contributions
 boarded-0.102
aster-0.098
After-0.097
 wheat-0.096
field-0.096
Token scoop
Token ID23529
Feature activation-3.79e-04

Pos logit contributions
 boarded+0.013
str+0.013
ep+0.013
field+0.013
 sudden+0.013
Neg logit contributions
ives-0.022
 aback-0.021
 consequences-0.021
 truth-0.021
 Give-0.021
Tokened
Token ID276
Feature activation-1.77e-03

Pos logit contributions
 boarded+0.001
ep+0.001
str+0.001
aster+0.001
 sudden+0.001
Neg logit contributions
ives-0.000
 consequences-0.000
 aback-0.000
 truth-0.000
 Give-0.000
Token some
Token ID617
Feature activation-4.51e-03

Pos logit contributions
 boarded+0.002
ep+0.002
str+0.002
field+0.002
aster+0.002
Neg logit contributions
ives-0.004
 consequences-0.004
 aback-0.004
 truth-0.004
 Give-0.004
Token soup
Token ID17141
Feature activation-1.09e-02

Pos logit contributions
 boarded+0.006
str+0.006
ep+0.006
field+0.006
aster+0.006
Neg logit contributions
ives-0.009
 aback-0.009
 consequences-0.009
 truth-0.009
 Give-0.009
Token happened
Token ID3022
Feature activation+5.65e-02

Pos logit contributions
 consequences+0.029
 aback+0.029
ives+0.028
 truth+0.028
 Give+0.027
Neg logit contributions
 boarded-0.028
field-0.027
ep-0.027
str-0.026
aster-0.026
Token and
Token ID290
Feature activation-5.65e-02

Pos logit contributions
ives+0.125
 aback+0.122
 consequences+0.122
 truth+0.118
 Give+0.118
Neg logit contributions
 boarded-0.066
ep-0.066
str-0.063
field-0.062
 sudden-0.062
Token said
Token ID531
Feature activation+5.28e-02

Pos logit contributions
pron+0.089
 boarded+0.089
field+0.087
aster+0.086
str+0.084
Neg logit contributions
 aback-0.101
 consequences-0.099
 Give-0.098
 agrees-0.097
 truth-0.097
Token,
Token ID11
Feature activation-8.60e-03

Pos logit contributions
ives+0.135
 aback+0.126
 consequences+0.126
 truth+0.122
 fault+0.122
Neg logit contributions
str-0.048
 they-0.044
apy-0.042
 boarded-0.040
ep-0.039
Token "
Token ID366
Feature activation+3.25e-02

Pos logit contributions
 boarded+0.018
aster+0.017
field+0.017
str+0.017
ep+0.017
Neg logit contributions
 Give-0.011
 truth-0.011
 aback-0.011
 consequences-0.011
ives-0.011
TokenDon
Token ID3987
Feature activation+4.99e-02

Pos logit contributions
 aback+0.053
 consequences+0.053
 Give+0.051
ives+0.051
 truth+0.050
Neg logit contributions
 boarded-0.059
field-0.056
str-0.054
ep-0.054
aster-0.054
Token't
Token ID470
Feature activation-5.72e-03

Pos logit contributions
ives+0.076
 consequences+0.075
 aback+0.075
 fault+0.072
 truth+0.070
Neg logit contributions
 boarded-0.093
ep-0.091
str-0.091
aster-0.087
 sudden-0.087
Token worry
Token ID5490
Feature activation+5.57e-02

Pos logit contributions
 boarded+0.009
str+0.009
ep+0.009
 sudden+0.009
aster+0.009
Neg logit contributions
ives-0.010
 aback-0.010
 consequences-0.010
 truth-0.009
 Give-0.009
Token,
Token ID11
Feature activation-5.49e-03

Pos logit contributions
ives+0.132
 aback+0.127
 consequences+0.125
 agrees+0.120
 truth+0.119
Neg logit contributions
str-0.058
ep-0.054
 sunny-0.053
 wheat-0.052
 sudden-0.050
Token Tiny
Token ID20443
Feature activation+1.33e-02

Pos logit contributions
 boarded+0.009
ep+0.009
field+0.009
str+0.009
aster+0.009
Neg logit contributions
 consequences-0.010
 truth-0.009
 aback-0.009
ives-0.009
 Give-0.009
Token.
Token ID13
Feature activation+5.05e-02

Pos logit contributions
ives+0.028
 aback+0.026
 consequences+0.026
 truth+0.025
 apology+0.025
Neg logit contributions
str-0.018
 flew-0.017
ep-0.017
cker-0.016
 wheat-0.016
Token the
Token ID262
Feature activation+1.62e-02

Pos logit contributions
str+0.088
ep+0.087
 boarded+0.087
 sudden+0.085
pron+0.084
Neg logit contributions
 consequences-0.099
 truth-0.099
 aback-0.098
ives-0.096
 Give-0.095
Token children
Token ID1751
Feature activation-2.65e-02

Pos logit contributions
ives+0.030
 aback+0.030
 Give+0.028
 consequences+0.028
 agrees+0.028
Neg logit contributions
 boarded-0.025
 sudden-0.024
ep-0.024
field-0.024
 wheat-0.024
Token.
Token ID13
Feature activation+4.24e-02

Pos logit contributions
 boarded+0.048
ep+0.046
str+0.046
 sudden+0.046
aster+0.045
Neg logit contributions
ives-0.042
 aback-0.041
 consequences-0.041
 truth-0.040
 Give-0.039
Token<|endoftext|>
Token ID50256
Feature activation+4.27e-02

Pos logit contributions
 consequences+0.075
ives+0.074
 aback+0.073
 truth+0.070
 Give+0.069
Neg logit contributions
 boarded-0.070
str-0.066
ep-0.066
field-0.066
aster-0.065
TokenOnce
Token ID7454
Feature activation-2.74e-02

Pos logit contributions
 consequences+0.090
 aback+0.087
ives+0.087
 fault+0.086
 apology+0.086
Neg logit contributions
 boarded-0.055
field-0.053
str-0.051
ep-0.051
 field-0.050
Token upon
Token ID2402
Feature activation+1.51e-03

Pos logit contributions
 boarded+0.031
ep+0.029
 sudden+0.028
str+0.028
field+0.028
Neg logit contributions
ives-0.062
 aback-0.061
 consequences-0.061
 truth-0.061
 Give-0.060
Token a
Token ID257
Feature activation-3.67e-02

Pos logit contributions
 truth+0.003
 consequences+0.003
ives+0.003
 aback+0.003
 Give+0.003
Neg logit contributions
 boarded-0.002
str-0.002
field-0.002
ep-0.002
cker-0.002
Token time
Token ID640
Feature activation-2.29e-02

Pos logit contributions
 boarded+0.053
ep+0.049
field+0.048
str+0.048
 sudden+0.046
Neg logit contributions
 consequences-0.072
ives-0.072
 aback-0.071
 truth-0.071
 Give-0.069
Token,
Token ID11
Feature activation-5.52e-02

Pos logit contributions
 boarded+0.032
str+0.030
ep+0.030
aster+0.029
 sudden+0.029
Neg logit contributions
ives-0.046
 consequences-0.046
 aback-0.045
 truth-0.044
 Give-0.044
Token there
Token ID612
Feature activation-1.25e-02

Pos logit contributions
 boarded+0.077
field+0.073
aster+0.073
str+0.069
ep+0.069
Neg logit contributions
 Give-0.110
 aback-0.110
 truth-0.107
 consequences-0.106
ives-0.101
Token was
Token ID373
Feature activation-1.40e-02

Pos logit contributions
 boarded+0.020
ep+0.019
str+0.019
field+0.019
 sudden+0.019
Neg logit contributions
ives-0.022
 consequences-0.022
 aback-0.022
 truth-0.021
 Give-0.021
Token was
Token ID373
Feature activation+5.54e-03

Pos logit contributions
aster+0.019
 boarded+0.019
str+0.019
ep+0.018
 sudden+0.018
Neg logit contributions
ives-0.022
 agrees-0.021
 aback-0.021
 Give-0.021
 truth-0.021
Token so
Token ID523
Feature activation-1.33e-02

Pos logit contributions
 aback+0.009
 truth+0.008
 consequences+0.008
 fault+0.008
 Give+0.008
Neg logit contributions
field-0.010
 boarded-0.009
str-0.009
cker-0.009
aster-0.009
Token excited
Token ID6568
Feature activation+2.73e-03

Pos logit contributions
field+0.023
aster+0.022
cker+0.021
ep+0.021
 boarded+0.021
Neg logit contributions
 aback-0.024
ives-0.022
 agrees-0.022
 Give-0.022
 truth-0.022
Token and
Token ID290
Feature activation-6.50e-02

Pos logit contributions
ives+0.005
 aback+0.005
 consequences+0.005
 truth+0.005
 Give+0.005
Neg logit contributions
 boarded-0.004
ep-0.004
str-0.004
field-0.004
aster-0.004
Token praised
Token ID15342
Feature activation-1.19e-02

Pos logit contributions
ep+0.104
aster+0.103
cker+0.098
field+0.097
str+0.097
Neg logit contributions
 Give-0.113
 aback-0.112
 agrees-0.111
ives-0.106
 demands-0.104
Token the
Token ID262
Feature activation+1.59e-02

Pos logit contributions
 boarded+0.015
ep+0.014
str+0.014
field+0.013
 sudden+0.013
Neg logit contributions
ives-0.026
 aback-0.026
 consequences-0.026
 truth-0.025
 Give-0.025
Token golf
Token ID13126
Feature activation-2.62e-03

Pos logit contributions
ives+0.034
 aback+0.033
 consequences+0.032
 Give+0.031
 fault+0.031
Neg logit contributions
 boarded-0.020
str-0.020
 sunny-0.020
 sudden-0.019
 field-0.019
Token ball
Token ID2613
Feature activation-1.63e-02

Pos logit contributions
 boarded+0.004
str+0.004
ep+0.004
 field+0.004
 sunny+0.003
Neg logit contributions
 consequences-0.005
 aback-0.005
 fault-0.005
 truth-0.005
ives-0.005
Token "
Token ID366
Feature activation+3.03e-02

Pos logit contributions
 boarded+0.025
ep+0.024
str+0.023
field+0.023
aster+0.023
Neg logit contributions
 consequences-0.030
ives-0.030
 aback-0.030
 truth-0.029
 Give-0.029
TokenYou
Token ID1639
Feature activation+4.77e-02

Pos logit contributions
 aback+0.051
ives+0.051
 Give+0.048
 consequences+0.045
 agrees+0.044
Neg logit contributions
 boarded-0.050
field-0.050
aster-0.049
lf-0.049
str-0.049
Token are
Token ID389
Feature activation+1.50e-02

Pos logit contributions
ives+0.076
 aback+0.075
 consequences+0.075
 truth+0.073
 Give+0.072
Neg logit contributions
 boarded-0.087
ep-0.083
str-0.083
field-0.081
aster-0.081
Token her
Token ID607
Feature activation+3.73e-02

Pos logit contributions
 boarded+0.016
str+0.014
ep+0.014
field+0.014
 sudden+0.014
Neg logit contributions
 consequences-0.025
 aback-0.025
 truth-0.025
ives-0.025
 Give-0.024
Token for
Token ID329
Feature activation-1.73e-02

Pos logit contributions
ives+0.068
 consequences+0.068
 aback+0.068
 truth+0.067
 Give+0.066
Neg logit contributions
 boarded-0.059
ep-0.056
str-0.055
aster-0.054
field-0.054
Token dinner
Token ID8073
Feature activation+8.14e-03

Pos logit contributions
 boarded+0.027
str+0.026
field+0.025
ep+0.025
cker+0.024
Neg logit contributions
 consequences-0.034
 truth-0.033
 aback-0.032
ives-0.031
 Give-0.031
Token!
Token ID0
Feature activation+2.78e-02

Pos logit contributions
ives+0.016
 aback+0.016
 consequences+0.016
 Give+0.015
 truth+0.015
Neg logit contributions
 boarded-0.012
field-0.011
ep-0.011
aster-0.011
str-0.011
Token But
Token ID887
Feature activation-3.35e-02

Pos logit contributions
 consequences+0.044
ives+0.043
 aback+0.042
 truth+0.041
 Give+0.040
Neg logit contributions
 boarded-0.052
str-0.049
ep-0.049
field-0.049
 sudden-0.049
Token the
Token ID262
Feature activation-5.22e-04

Pos logit contributions
ep+0.055
 boarded+0.054
pron+0.053
str+0.053
cker+0.050
Neg logit contributions
 aback-0.056
 consequences-0.054
 truth-0.052
 Give-0.051
 right-0.048
Token little
Token ID1310
Feature activation-2.06e-02

Pos logit contributions
 boarded+0.001
ep+0.001
str+0.001
 sudden+0.001
field+0.001
Neg logit contributions
ives-0.001
 aback-0.001
 consequences-0.001
 truth-0.001
 Give-0.001
Token duck
Token ID22045
Feature activation-1.56e-02

Pos logit contributions
 boarded+0.030
ep+0.028
str+0.028
field+0.028
aster+0.027
Neg logit contributions
ives-0.041
 consequences-0.040
 aback-0.040
 truth-0.039
 Give-0.039
Token had
Token ID550
Feature activation+6.19e-03

Pos logit contributions
 boarded+0.025
aster+0.023
ep+0.023
str+0.022
field+0.022
Neg logit contributions
 aback-0.028
ives-0.028
 consequences-0.028
 truth-0.028
 Give-0.027
Token an
Token ID281
Feature activation+3.34e-02

Pos logit contributions
ives+0.011
 aback+0.011
 consequences+0.011
 Give+0.010
 truth+0.010
Neg logit contributions
 boarded-0.010
ep-0.010
str-0.010
 sudden-0.010
field-0.010
Token idea
Token ID2126
Feature activation+3.95e-02

Pos logit contributions
ives+0.050
 aback+0.046
 Give+0.046
 consequences+0.045
 truth+0.044
Neg logit contributions
ep-0.064
 boarded-0.064
aster-0.063
 wheat-0.062
str-0.062
Token ready
Token ID3492
Feature activation+3.63e-02

Pos logit contributions
 aback+0.036
 consequences+0.035
 truth+0.035
ives+0.034
 Give+0.033
Neg logit contributions
str-0.025
field-0.024
aster-0.023
 boarded-0.023
cker-0.023
Token for
Token ID329
Feature activation-1.31e-02

Pos logit contributions
ives+0.052
 consequences+0.052
 aback+0.052
 truth+0.052
 Give+0.049
Neg logit contributions
 boarded-0.071
str-0.069
ep-0.068
 sudden-0.067
field-0.067
Token a
Token ID257
Feature activation+1.28e-02

Pos logit contributions
 boarded+0.018
str+0.017
ep+0.017
cker+0.017
field+0.016
Neg logit contributions
 consequences-0.027
 truth-0.027
 aback-0.026
ives-0.026
 Give-0.025
Token big
Token ID1263
Feature activation-4.83e-03

Pos logit contributions
 truth+0.025
ives+0.024
 consequences+0.024
 aback+0.024
 Give+0.024
Neg logit contributions
ep-0.018
 boarded-0.017
cker-0.017
field-0.016
str-0.016
Token adventure
Token ID8855
Feature activation-7.45e-03

Pos logit contributions
 boarded+0.006
ep+0.006
str+0.006
field+0.006
 sudden+0.006
Neg logit contributions
 consequences-0.010
 aback-0.010
 truth-0.010
ives-0.010
 Give-0.010
Token.
Token ID13
Feature activation+2.36e-02

Pos logit contributions
 boarded+0.015
ep+0.014
str+0.014
 sudden+0.014
aster+0.014
Neg logit contributions
ives-0.011
 aback-0.010
 consequences-0.010
 truth-0.010
 Give-0.010
Token\n
Token ID198
Feature activation-2.16e-02

Pos logit contributions
 aback+0.036
ives+0.036
 Give+0.035
 truth+0.034
 consequences+0.034
Neg logit contributions
ep-0.045
 boarded-0.045
aster-0.043
pron-0.043
field-0.043
Token\n
Token ID198
Feature activation-3.12e-02

Pos logit contributions
 boarded+0.040
ep+0.038
field+0.038
str+0.037
aster+0.037
Neg logit contributions
ives-0.033
 aback-0.033
 truth-0.033
 Give-0.032
 consequences-0.031
TokenWhen
Token ID2215
Feature activation-6.76e-02

Pos logit contributions
chard+0.036
field+0.036
cker+0.035
pron+0.033
 boarded+0.033
Neg logit contributions
 Give-0.069
 truth-0.067
ives-0.066
 aback-0.065
 consequences-0.062
Token she
Token ID673
Feature activation-6.27e-03

Pos logit contributions
ep+0.125
pron+0.124
cker+0.122
 boarded+0.116
chard+0.116
Neg logit contributions
 aback-0.102
 consequences-0.101
 truth-0.100
 Give-0.090
 right-0.085
Token was
Token ID373
Feature activation-2.47e-03

Pos logit contributions
 wheat+0.009
cker+0.009
chard+0.009
str+0.009
ep+0.009
Neg logit contributions
 aback-0.012
ives-0.011
 Give-0.011
 truth-0.011
 consequences-0.010
Token help
Token ID1037
Feature activation+2.01e-02

Pos logit contributions
 boarded+0.044
str+0.043
field+0.042
ep+0.042
cker+0.040
Neg logit contributions
 consequences-0.033
 truth-0.032
 aback-0.031
ives-0.031
 Give-0.029
Token when
Token ID618
Feature activation-3.58e-02

Pos logit contributions
ives+0.039
 aback+0.039
 consequences+0.038
 truth+0.037
 Give+0.037
Neg logit contributions
 boarded-0.029
ep-0.028
str-0.028
aster-0.027
 sudden-0.027
Token he
Token ID339
Feature activation+2.71e-02

Pos logit contributions
 boarded+0.043
ep+0.040
str+0.040
field+0.038
aster+0.037
Neg logit contributions
 consequences-0.080
ives-0.078
 aback-0.078
 truth-0.078
 Give-0.077
Token needed
Token ID2622
Feature activation+1.48e-02

Pos logit contributions
ives+0.054
 aback+0.054
 consequences+0.054
 truth+0.053
 Give+0.052
Neg logit contributions
 boarded-0.038
ep-0.036
str-0.035
field-0.035
aster-0.035
Token it
Token ID340
Feature activation+4.38e-02

Pos logit contributions
ives+0.032
 aback+0.031
 consequences+0.031
 Give+0.030
 truth+0.030
Neg logit contributions
 boarded-0.019
aster-0.018
ep-0.018
 sudden-0.018
str-0.017
Token.
Token ID13
Feature activation+2.83e-02

Pos logit contributions
ives+0.067
 aback+0.066
 consequences+0.065
 Give+0.064
 truth+0.063
Neg logit contributions
 boarded-0.082
ep-0.080
field-0.078
str-0.078
aster-0.077
Token<|endoftext|>
Token ID50256
Feature activation+4.15e-02

Pos logit contributions
 consequences+0.056
ives+0.054
 aback+0.054
 truth+0.052
 agrees+0.051
Neg logit contributions
 boarded-0.041
str-0.040
aster-0.038
After-0.038
field-0.037
TokenOnce
Token ID7454
Feature activation-2.17e-02

Pos logit contributions
 consequences+0.100
ives+0.097
 agrees+0.095
 apology+0.095
 fault+0.095
Neg logit contributions
 boarded-0.042
field-0.040
str-0.038
At-0.038
 sudden-0.038
Token upon
Token ID2402
Feature activation+2.88e-03

Pos logit contributions
 boarded+0.027
ep+0.026
field+0.025
 sudden+0.024
aster+0.024
Neg logit contributions
 aback-0.046
ives-0.046
 consequences-0.046
 truth-0.046
 Give-0.044
Token a
Token ID257
Feature activation-3.50e-02

Pos logit contributions
 truth+0.005
 consequences+0.005
ives+0.005
 aback+0.005
 Give+0.005
Neg logit contributions
 boarded-0.004
str-0.004
field-0.004
ep-0.004
 sudden-0.004
Token time
Token ID640
Feature activation-2.09e-02

Pos logit contributions
 boarded+0.046
ep+0.043
str+0.042
field+0.042
 sudden+0.041
Neg logit contributions
ives-0.074
 consequences-0.073
 aback-0.073
 truth-0.072
 Give-0.071
Token cat
Token ID3797
Feature activation-4.64e-03

Pos logit contributions
 boarded+0.004
ep+0.004
str+0.004
 sudden+0.004
After+0.004
Neg logit contributions
 truth-0.004
 consequences-0.004
 Give-0.004
ives-0.004
 aback-0.004
Token who
Token ID508
Feature activation-1.83e-02

Pos logit contributions
 boarded+0.010
str+0.010
 sudden+0.010
ep+0.010
aster+0.010
Neg logit contributions
ives-0.005
 truth-0.005
 Give-0.005
 consequences-0.005
 aback-0.005
Token liked
Token ID8288
Feature activation+6.54e-03

Pos logit contributions
 boarded+0.023
 sudden+0.022
ep+0.022
str+0.022
field+0.021
Neg logit contributions
ives-0.039
 aback-0.038
 consequences-0.038
 truth-0.038
 Give-0.037
Token to
Token ID284
Feature activation-3.46e-03

Pos logit contributions
 truth+0.009
 consequences+0.009
ives+0.009
 aback+0.008
 Give+0.008
Neg logit contributions
 boarded-0.014
ep-0.013
str-0.013
field-0.013
 sudden-0.013
Token chase
Token ID15505
Feature activation+1.03e-02

Pos logit contributions
 boarded+0.005
str+0.005
ep+0.005
 sudden+0.005
field+0.005
Neg logit contributions
ives-0.006
 truth-0.006
 consequences-0.006
 aback-0.006
 Give-0.006
Token mice
Token ID10693
Feature activation-2.64e-02

Pos logit contributions
ives+0.024
 aback+0.024
 consequences+0.024
 truth+0.023
 Give+0.023
Neg logit contributions
 boarded-0.011
ep-0.010
aster-0.010
 sudden-0.010
str-0.010
Token.
Token ID13
Feature activation+2.87e-02

Pos logit contributions
ep+0.042
field+0.042
 boarded+0.042
 field+0.041
aster+0.040
Neg logit contributions
 aback-0.045
 consequences-0.045
ives-0.044
 Give-0.043
 truth-0.043
Token He
Token ID679
Feature activation-8.98e-04

Pos logit contributions
ives+0.047
 consequences+0.047
 aback+0.046
 truth+0.044
 Give+0.043
Neg logit contributions
 boarded-0.050
aster-0.049
 sunny-0.048
ep-0.048
str-0.047
Token saw
Token ID2497
Feature activation-2.39e-03

Pos logit contributions
 boarded+0.001
ep+0.001
str+0.001
field+0.001
aster+0.001
Neg logit contributions
 aback-0.002
ives-0.002
 consequences-0.002
 truth-0.002
 Give-0.002
Token a
Token ID257
Feature activation+7.96e-03

Pos logit contributions
 boarded+0.003
field+0.003
str+0.003
ep+0.003
aster+0.003
Neg logit contributions
 consequences-0.005
ives-0.005
 truth-0.005
 aback-0.005
 Give-0.005
Token mouse
Token ID10211
Feature activation+1.84e-03

Pos logit contributions
 truth+0.014
 Give+0.012
 consequences+0.012
ives+0.011
 aback+0.011
Neg logit contributions
 boarded-0.014
aster-0.013
ep-0.013
cker-0.013
 sailed-0.012
Token looked
Token ID3114
Feature activation+4.54e-02

Pos logit contributions
 boarded+0.001
ep+0.001
str+0.001
aster+0.001
field+0.001
Neg logit contributions
ives-0.002
 aback-0.002
 consequences-0.002
 truth-0.002
 Give-0.002
Token up
Token ID510
Feature activation-1.12e-02

Pos logit contributions
ives+0.091
 consequences+0.089
 truth+0.087
 Give+0.086
 aback+0.084
Neg logit contributions
 boarded-0.066
 sudden-0.062
ep-0.061
field-0.061
 wheat-0.060
Token from
Token ID422
Feature activation-1.66e-02

Pos logit contributions
ep+0.019
 boarded+0.019
str+0.019
aster+0.018
pron+0.018
Neg logit contributions
 aback-0.019
 consequences-0.018
 truth-0.018
ives-0.018
 agrees-0.017
Token the
Token ID262
Feature activation+2.78e-02

Pos logit contributions
 boarded+0.020
ep+0.019
str+0.019
field+0.019
aster+0.018
Neg logit contributions
ives-0.037
 consequences-0.036
 aback-0.036
 truth-0.035
 Give-0.035
Token microscope
Token ID36396
Feature activation+1.42e-02

Pos logit contributions
ives+0.051
 aback+0.050
 agrees+0.049
 consequences+0.048
 fault+0.048
Neg logit contributions
 boarded-0.043
field-0.043
 sudden-0.042
 field-0.042
 sunny-0.042
Token and
Token ID290
Feature activation-5.30e-02

Pos logit contributions
ives+0.028
 consequences+0.028
 aback+0.028
 truth+0.027
 Give+0.027
Neg logit contributions
 boarded-0.020
ep-0.019
str-0.019
field-0.019
 sudden-0.018
Token pointed
Token ID6235
Feature activation+1.76e-02

Pos logit contributions
aster+0.076
str+0.076
ep+0.074
 boarded+0.073
pron+0.072
Neg logit contributions
 aback-0.106
 Give-0.096
 truth-0.093
 consequences-0.093
ives-0.093
Token to
Token ID284
Feature activation-1.78e-02

Pos logit contributions
ives+0.041
 consequences+0.040
 truth+0.038
 fault+0.037
 Give+0.036
Neg logit contributions
 boarded-0.020
 wheat-0.019
ep-0.018
 which-0.018
 sunny-0.018
Token the
Token ID262
Feature activation+3.21e-02

Pos logit contributions
 boarded+0.015
 wheat+0.015
 sudden+0.014
 sunny+0.014
ep+0.014
Neg logit contributions
ives-0.047
 consequences-0.045
 agrees-0.043
 Give-0.043
 aback-0.043
Token sky
Token ID6766
Feature activation-1.68e-02

Pos logit contributions
ives+0.065
 aback+0.064
 agrees+0.064
 begs+0.063
 fault+0.063
Neg logit contributions
 sudden-0.043
 field-0.043
field-0.042
 sunny-0.042
 wheat-0.040
Token.
Token ID13
Feature activation+2.85e-02

Pos logit contributions
 boarded+0.036
str+0.035
ep+0.034
aster+0.034
 sudden+0.034
Neg logit contributions
ives-0.021
 aback-0.021
 consequences-0.021
 truth-0.021
 Give-0.020
Token<|endoftext|>
Token ID50256
Feature activation+3.53e-02

Pos logit contributions
 consequences+0.034
ives+0.032
 aback+0.032
 truth+0.031
 fault+0.030
Neg logit contributions
 boarded-0.024
field-0.023
After-0.023
 sudden-0.023
str-0.023
TokenJim
Token ID18050
Feature activation+1.62e-02

Pos logit contributions
 consequences+0.083
ives+0.079
 apology+0.079
 fault+0.079
 aback+0.078
Neg logit contributions
 boarded-0.038
field-0.037
str-0.035
ep-0.035
 field-0.034
Token was
Token ID373
Feature activation+5.84e-03

Pos logit contributions
ives+0.029
 aback+0.029
 consequences+0.028
 truth+0.028
 Give+0.028
Neg logit contributions
 boarded-0.025
 sudden-0.024
str-0.024
aster-0.024
ep-0.024
Token a
Token ID257
Feature activation-9.26e-03

Pos logit contributions
 aback+0.010
 consequences+0.010
 truth+0.009
 fault+0.009
 Give+0.009
Neg logit contributions
str-0.010
field-0.009
 boarded-0.009
cle-0.009
cker-0.009
Token fire
Token ID2046
Feature activation-1.41e-02

Pos logit contributions
 boarded+0.017
ep+0.016
 blowing+0.016
After+0.015
field+0.015
Neg logit contributions
 truth-0.013
 Give-0.011
 agrees-0.011
 consequences-0.011
 aback-0.011
Tokenman
Token ID805
Feature activation-5.62e-03

Pos logit contributions
 boarded+0.024
 sudden+0.023
After+0.023
aster+0.023
str+0.022
Neg logit contributions
 truth-0.024
 consequences-0.023
ives-0.023
 aback-0.023
 Give-0.022
Token.
Token ID13
Feature activation+2.38e-02

Pos logit contributions
 boarded+0.013
 sudden+0.013
str+0.013
After+0.012
ep+0.012
Neg logit contributions
 Give-0.006
ives-0.006
 truth-0.006
 consequences-0.006
 aback-0.006
Token He
Token ID679
Feature activation-4.31e-03

Pos logit contributions
ives+0.030
 consequences+0.029
 aback+0.029
 truth+0.028
 Give+0.027
Neg logit contributions
 boarded-0.052
ep-0.050
aster-0.049
str-0.049
field-0.049
Token was
Token ID373
Feature activation+4.99e-03

Pos logit contributions
 boarded+0.006
str+0.006
 sudden+0.006
field+0.006
aster+0.006
Neg logit contributions
ives-0.008
 consequences-0.008
 truth-0.008
 aback-0.008
 Give-0.008
Token very
Token ID845
Feature activation+3.18e-03

Pos logit contributions
 truth+0.007
 aback+0.007
 consequences+0.007
 fault+0.006
 Give+0.006
Neg logit contributions
str-0.009
field-0.009
 boarded-0.009
aman-0.009
After-0.009
Token strong
Token ID1913
Feature activation+8.27e-03

Pos logit contributions
 truth+0.005
 aback+0.005
 consequences+0.005
ives+0.005
 Give+0.005
Neg logit contributions
 boarded-0.005
field-0.005
aster-0.005
 sudden-0.005
str-0.005
Token and
Token ID290
Feature activation-6.64e-02

Pos logit contributions
 aback+0.010
ives+0.010
 consequences+0.010
 truth+0.010
 Give+0.010
Neg logit contributions
 boarded-0.011
str-0.011
field-0.010
aster-0.010
ep-0.010
Token it
Token ID340
Feature activation+1.50e-02

Pos logit contributions
 boarded+0.101
str+0.101
field+0.100
aster+0.095
ep+0.094
Neg logit contributions
 aback-0.122
 consequences-0.121
 truth-0.121
 Give-0.118
ives-0.116
Token was
Token ID373
Feature activation+1.07e-02

Pos logit contributions
ives+0.030
 consequences+0.029
 aback+0.029
 Give+0.028
 truth+0.028
Neg logit contributions
 boarded-0.022
ep-0.021
str-0.021
 sudden-0.020
field-0.020
Token bleeding
Token ID16832
Feature activation+2.10e-02

Pos logit contributions
 consequences+0.019
 truth+0.018
 aback+0.018
 fault+0.017
 Give+0.016
Neg logit contributions
 boarded-0.019
field-0.018
str-0.018
aster-0.017
 sudden-0.016
Token.
Token ID13
Feature activation+2.30e-02

Pos logit contributions
ives+0.033
 aback+0.032
 consequences+0.031
 Give+0.031
 truth+0.031
Neg logit contributions
 boarded-0.039
ep-0.039
 sudden-0.037
str-0.036
field-0.036
Token She
Token ID1375
Feature activation-1.74e-02

Pos logit contributions
 consequences+0.044
ives+0.043
 aback+0.041
 agrees+0.040
 truth+0.040
Neg logit contributions
str-0.034
 sudden-0.034
 boarded-0.034
After-0.033
field-0.032
Token quickly
Token ID2952
Feature activation-1.11e-02

Pos logit contributions
field+0.029
aster+0.028
 boarded+0.028
str+0.028
 sunny+0.028
Neg logit contributions
 aback-0.030
 truth-0.028
 Give-0.028
 consequences-0.028
ives-0.028
Token called
Token ID1444
Feature activation-1.32e-03

Pos logit contributions
field+0.018
 boarded+0.016
str+0.016
 sunny+0.016
aster+0.016
Neg logit contributions
 aback-0.020
ives-0.020
 Give-0.019
 consequences-0.019
 truth-0.019
Token an
Token ID281
Feature activation+2.85e-02

Pos logit contributions
 boarded+0.002
aster+0.002
ep+0.002
 sunny+0.002
 wheat+0.002
Neg logit contributions
ives-0.003
 consequences-0.003
 agrees-0.003
 aback-0.003
 truth-0.003
Token ambulance
Token ID22536
Feature activation-6.64e-03

Pos logit contributions
ives+0.059
 aback+0.055
 fault+0.052
 Give+0.052
 consequences+0.052
Neg logit contributions
ep-0.040
 wheat-0.039
 sunny-0.037
 boarded-0.037
aster-0.037
Token to
Token ID284
Feature activation-1.55e-02

Pos logit contributions
 boarded+0.011
ep+0.010
str+0.010
field+0.010
aster+0.010
Neg logit contributions
ives-0.012
 consequences-0.012
 aback-0.012
 truth-0.012
 Give-0.011
Token goes
Token ID2925
Feature activation+1.48e-02

Pos logit contributions
ives+0.031
 consequences+0.031
 aback+0.031
 truth+0.030
 Give+0.030
Neg logit contributions
 boarded-0.038
ep-0.037
str-0.036
aster-0.036
field-0.036
Token!",
Token ID40754
Feature activation-4.78e-02

Pos logit contributions
ives+0.026
 aback+0.026
 consequences+0.026
 truth+0.025
 Give+0.025
Neg logit contributions
 boarded-0.024
ep-0.023
str-0.023
field-0.023
aster-0.023
Token said
Token ID531
Feature activation+1.49e-02

Pos logit contributions
 boarded+0.080
str+0.079
ep+0.077
field+0.077
aster+0.075
Neg logit contributions
 aback-0.082
ives-0.080
 consequences-0.080
 Give-0.080
 truth-0.078
Token Max
Token ID5436
Feature activation+4.26e-02

Pos logit contributions
 aback+0.024
 consequences+0.024
ives+0.023
 truth+0.023
 Give+0.023
Neg logit contributions
 boarded-0.026
ep-0.026
str-0.026
field-0.025
 sudden-0.025
Token.
Token ID13
Feature activation+3.19e-02

Pos logit contributions
 consequences+0.072
ives+0.069
 truth+0.067
 aback+0.065
 Give+0.064
Neg logit contributions
 boarded-0.075
ep-0.073
str-0.072
 sudden-0.070
aster-0.070
Token
Token ID198
Feature activation-2.51e-02

Pos logit contributions
 consequences+0.060
ives+0.060
 aback+0.058
 truth+0.057
 Give+0.055
Neg logit contributions
 boarded-0.050
str-0.047
ep-0.047
 sudden-0.046
field-0.045
Token
Token ID198
Feature activation-3.47e-02

Pos logit contributions
 boarded+0.038
ep+0.037
str+0.036
 sudden+0.036
aster+0.035
Neg logit contributions
 consequences-0.048
ives-0.047
 aback-0.047
 truth-0.045
 Give-0.045
TokenThe
Token ID464
Feature activation-3.30e-03

Pos logit contributions
 boarded+0.047
str+0.044
field+0.044
ep+0.043
aster+0.043
Neg logit contributions
ives-0.072
 aback-0.071
 consequences-0.070
 Give-0.070
 truth-0.069
Token butterfly
Token ID35113
Feature activation-1.37e-02

Pos logit contributions
 boarded+0.004
ep+0.004
 sudden+0.004
str+0.004
 wheat+0.004
Neg logit contributions
ives-0.008
 aback-0.007
 consequences-0.007
 Give-0.007
 agrees-0.007
Token flew
Token ID13112
Feature activation-1.38e-03

Pos logit contributions
aster+0.019
 boarded+0.018
cle+0.017
str+0.017
ep+0.017
Neg logit contributions
 agrees-0.027
 aback-0.026
ives-0.026
 truth-0.026
 Give-0.026
Token up
Token ID510
Feature activation-2.06e-02

Pos logit contributions
 boarded+0.002
ep+0.002
str+0.002
field+0.002
 sudden+0.002
Neg logit contributions
ives-0.002
 consequences-0.002
 aback-0.002
 truth-0.002
 Give-0.002
Token for
Token ID329
Feature activation-1.26e-02

Pos logit contributions
ives+0.036
 aback+0.035
 consequences+0.035
 truth+0.034
 Give+0.034
Neg logit contributions
 boarded-0.051
ep-0.049
str-0.049
field-0.048
aster-0.048
Token it
Token ID340
Feature activation+4.03e-02

Pos logit contributions
aster+0.015
 sudden+0.014
 boarded+0.014
 sunny+0.014
ep+0.014
Neg logit contributions
ives-0.029
 aback-0.027
 Give-0.027
 consequences-0.026
 fault-0.026
Token.
Token ID13
Feature activation+2.96e-02

Pos logit contributions
 consequences+0.067
 aback+0.067
ives+0.066
 Give+0.063
 truth+0.063
Neg logit contributions
 boarded-0.069
ep-0.067
 blowing-0.065
field-0.064
 sunny-0.064
Token
Token ID198
Feature activation-1.83e-02

Pos logit contributions
 consequences+0.067
 fault+0.063
ives+0.061
 aback+0.061
 truth+0.060
Neg logit contributions
After-0.036
 boarded-0.034
 sudden-0.034
str-0.032
At-0.032
Token
Token ID198
Feature activation-2.72e-02

Pos logit contributions
 boarded+0.025
ep+0.024
str+0.024
After+0.024
 sudden+0.024
Neg logit contributions
 consequences-0.038
ives-0.036
 aback-0.036
 fault-0.035
 truth-0.035
TokenWhen
Token ID2215
Feature activation-6.27e-02

Pos logit contributions
 boarded+0.040
field+0.039
str+0.038
ep+0.037
cker+0.036
Neg logit contributions
ives-0.054
 aback-0.053
 Give-0.052
 consequences-0.052
 truth-0.051
Token the
Token ID262
Feature activation-5.20e-03

Pos logit contributions
 boarded+0.110
ep+0.110
str+0.107
cker+0.104
field+0.102
Neg logit contributions
 consequences-0.105
 truth-0.104
 aback-0.102
 Give-0.100
ives-0.098
Token bear
Token ID6842
Feature activation-2.25e-02

Pos logit contributions
 sudden+0.007
 sunny+0.007
 boarded+0.007
lf+0.007
 shining+0.007
Neg logit contributions
ives-0.011
 aback-0.011
 fault-0.011
 agrees-0.010
 consequences-0.010
Token came
Token ID1625
Feature activation+3.21e-02

Pos logit contributions
 boarded+0.034
ep+0.033
str+0.032
aster+0.032
field+0.032
Neg logit contributions
ives-0.042
 aback-0.041
 truth-0.040
 consequences-0.040
 Give-0.040
Token to
Token ID284
Feature activation-4.97e-03

Pos logit contributions
ives+0.058
 consequences+0.057
 truth+0.054
 aback+0.054
 Give+0.054
Neg logit contributions
 boarded-0.053
ep-0.050
 sunny-0.049
 sudden-0.049
field-0.049
Token life
Token ID1204
Feature activation-1.34e-02

Pos logit contributions
 boarded+0.006
aster+0.006
ep+0.006
field+0.005
 sunny+0.005
Neg logit contributions
ives-0.011
 consequences-0.011
 aback-0.011
 Give-0.010
 truth-0.010
TokenThey
Token ID2990
Feature activation-1.06e-02

Pos logit contributions
 boarded+0.053
field+0.050
str+0.049
pron+0.048
aster+0.048
Neg logit contributions
 Give-0.083
 aback-0.083
ives-0.082
 consequences-0.080
 truth-0.079
Token walked
Token ID6807
Feature activation+2.36e-02

Pos logit contributions
str+0.016
field+0.016
aster+0.016
pron+0.016
cker+0.015
Neg logit contributions
 aback-0.020
ives-0.019
 agrees-0.018
 truth-0.018
 consequences-0.018
Token to
Token ID284
Feature activation-2.51e-02

Pos logit contributions
ives+0.041
 consequences+0.041
 aback+0.040
 truth+0.040
 Give+0.039
Neg logit contributions
 boarded-0.040
ep-0.038
str-0.037
field-0.037
 sudden-0.037
Token the
Token ID262
Feature activation+1.59e-02

Pos logit contributions
 boarded+0.028
ep+0.028
str+0.028
field+0.027
pron+0.026
Neg logit contributions
 aback-0.058
 truth-0.056
 consequences-0.055
ives-0.054
 Give-0.054
Token bag
Token ID6131
Feature activation-1.19e-02

Pos logit contributions
ives+0.030
 aback+0.030
 consequences+0.029
 truth+0.029
 Give+0.028
Neg logit contributions
 boarded-0.025
ep-0.023
str-0.023
field-0.023
 sudden-0.023
Token and
Token ID290
Feature activation-6.25e-02

Pos logit contributions
 boarded+0.016
ep+0.015
field+0.015
str+0.015
aster+0.015
Neg logit contributions
 consequences-0.025
ives-0.024
 aback-0.024
 truth-0.023
 Give-0.023
Token pulled
Token ID5954
Feature activation+2.72e-03

Pos logit contributions
str+0.106
ep+0.101
ittens+0.100
pron+0.100
field+0.099
Neg logit contributions
 aback-0.114
ives-0.102
 Give-0.099
 agrees-0.096
 truth-0.093
Token the
Token ID262
Feature activation+2.01e-02

Pos logit contributions
ives+0.006
 consequences+0.006
 truth+0.006
 aback+0.006
 Give+0.006
Neg logit contributions
 boarded-0.003
ep-0.003
 wheat-0.003
aster-0.003
 sudden-0.003
Token zipper
Token ID48992
Feature activation-1.35e-03

Pos logit contributions
ives+0.038
 aback+0.038
 agrees+0.036
 consequences+0.036
 Give+0.036
Neg logit contributions
 boarded-0.030
 wheat-0.029
ep-0.028
 sudden-0.028
str-0.028
Token.
Token ID13
Feature activation+4.29e-02

Pos logit contributions
 boarded+0.002
ep+0.002
field+0.002
 blowing+0.002
str+0.002
Neg logit contributions
 consequences-0.002
ives-0.002
 aback-0.002
 truth-0.002
 Give-0.002
Token Inside
Token ID14384
Feature activation-4.88e-02

Pos logit contributions
ives+0.064
 aback+0.063
 consequences+0.063
 truth+0.061
 Give+0.061
Neg logit contributions
 boarded-0.083
ep-0.080
str-0.079
field-0.078
aster-0.078
Token
Token ID198
Feature activation-3.38e-02

Pos logit contributions
 boarded+0.041
ep+0.039
str+0.038
field+0.038
aster+0.038
Neg logit contributions
ives-0.046
 consequences-0.045
 aback-0.045
 truth-0.044
 Give-0.044
TokenWhen
Token ID2215
Feature activation-6.27e-02

Pos logit contributions
 boarded+0.030
ep+0.027
cker+0.027
str+0.027
field+0.027
Neg logit contributions
ives-0.088
 aback-0.084
 truth-0.084
 Give-0.083
 consequences-0.083
Token she
Token ID673
Feature activation+1.14e-03

Pos logit contributions
 boarded+0.096
ep+0.092
str+0.090
field+0.088
aster+0.088
Neg logit contributions
ives-0.119
 aback-0.118
 consequences-0.118
 truth-0.115
 Give-0.114
Token got
Token ID1392
Feature activation+1.32e-03

Pos logit contributions
ives+0.002
 aback+0.002
 truth+0.002
 Give+0.002
 consequences+0.002
Neg logit contributions
 wheat-0.002
ep-0.002
cker-0.002
 sunny-0.002
str-0.002
Token home
Token ID1363
Feature activation-1.50e-03

Pos logit contributions
 aback+0.002
ives+0.002
 consequences+0.002
 truth+0.002
 Give+0.002
Neg logit contributions
 boarded-0.002
ep-0.002
str-0.002
field-0.002
aster-0.002
Token,
Token ID11
Feature activation-6.36e-02

Pos logit contributions
 boarded+0.002
ep+0.002
str+0.002
field+0.002
aster+0.002
Neg logit contributions
ives-0.003
 aback-0.003
 consequences-0.003
 truth-0.003
 Give-0.003
Token Polly
Token ID36898
Feature activation+1.34e-02

Pos logit contributions
 boarded+0.096
ep+0.094
str+0.091
aster+0.088
field+0.088
Neg logit contributions
 aback-0.121
ives-0.121
 consequences-0.120
 truth-0.118
 Give-0.116
Token laid
Token ID8104
Feature activation-2.81e-02

Pos logit contributions
ives+0.026
 aback+0.026
 consequences+0.025
 truth+0.025
 Give+0.025
Neg logit contributions
 boarded-0.019
ep-0.018
str-0.018
aster-0.018
 sudden-0.018
Token the
Token ID262
Feature activation+2.36e-02

Pos logit contributions
 boarded+0.048
str+0.047
ep+0.046
aster+0.045
 sudden+0.045
Neg logit contributions
 aback-0.051
 consequences-0.047
 truth-0.046
ives-0.045
 Give-0.045
Token note
Token ID3465
Feature activation+4.48e-03

Pos logit contributions
ives+0.036
 aback+0.035
 consequences+0.034
 Give+0.033
 agrees+0.033
Neg logit contributions
 boarded-0.045
str-0.043
ep-0.043
 sudden-0.043
field-0.042
Token out
Token ID503
Feature activation-1.16e-02

Pos logit contributions
 aback+0.007
ives+0.007
 truth+0.007
 consequences+0.007
 Give+0.006
Neg logit contributions
str-0.008
 boarded-0.008
aster-0.008
 sudden-0.008
field-0.008
Token the
Token ID262
Feature activation+5.80e-03

Pos logit contributions
pron+0.093
 boarded+0.091
field+0.090
aman+0.090
str+0.089
Neg logit contributions
 consequences-0.090
 truth-0.085
 aback-0.081
 apology-0.081
 Give-0.081
Token mouse
Token ID10211
Feature activation-3.40e-02

Pos logit contributions
 consequences+0.009
 truth+0.009
 aback+0.008
ives+0.008
 Give+0.008
Neg logit contributions
 boarded-0.012
aster-0.011
ep-0.011
field-0.011
str-0.011
Token were
Token ID547
Feature activation-2.24e-03

Pos logit contributions
 boarded+0.044
str+0.042
ep+0.042
aster+0.042
 sudden+0.041
Neg logit contributions
ives-0.071
 consequences-0.069
 aback-0.069
 truth-0.068
 agrees-0.067
Token best
Token ID1266
Feature activation-1.14e-02

Pos logit contributions
field+0.004
 boarded+0.004
aster+0.004
pron+0.004
aman+0.004
Neg logit contributions
 consequences-0.004
 agrees-0.004
 truth-0.004
 fault-0.004
 aback-0.004
Token friends
Token ID2460
Feature activation-2.16e-02

Pos logit contributions
 boarded+0.015
ep+0.014
 sudden+0.014
str+0.013
aster+0.013
Neg logit contributions
 aback-0.024
ives-0.024
 consequences-0.024
 truth-0.023
 Give-0.023
Token and
Token ID290
Feature activation-6.74e-02

Pos logit contributions
 boarded+0.030
 sunny+0.029
ep+0.028
field+0.028
 they+0.028
Neg logit contributions
ives-0.041
 consequences-0.041
 aback-0.041
 begs-0.039
 Give-0.039
Token always
Token ID1464
Feature activation-3.10e-02

Pos logit contributions
ep+0.090
 boarded+0.089
field+0.086
pron+0.086
str+0.086
Neg logit contributions
 consequences-0.134
 truth-0.133
ives-0.131
 Give-0.128
 agrees-0.128
Token had
Token ID550
Feature activation-7.46e-03

Pos logit contributions
 boarded+0.041
ep+0.039
str+0.038
field+0.037
aster+0.037
Neg logit contributions
ives-0.065
 aback-0.064
 consequences-0.064
 truth-0.063
 Give-0.062
Token fun
Token ID1257
Feature activation-7.64e-03

Pos logit contributions
 boarded+0.012
field+0.012
str+0.012
aster+0.011
ep+0.011
Neg logit contributions
 consequences-0.013
 aback-0.013
ives-0.013
 truth-0.013
 agrees-0.012
Token together
Token ID1978
Feature activation-1.20e-02

Pos logit contributions
 boarded+0.013
str+0.012
ep+0.012
field+0.012
aster+0.012
Neg logit contributions
 consequences-0.013
ives-0.013
 aback-0.013
 truth-0.013
 Give-0.012
Token.
Token ID13
Feature activation+2.23e-02

Pos logit contributions
 boarded+0.025
ep+0.024
str+0.024
aster+0.024
 sudden+0.024
Neg logit contributions
ives-0.016
 consequences-0.016
 aback-0.016
 truth-0.015
 Give-0.015
Token
Token ID198
Feature activation-2.89e-02

Pos logit contributions
 boarded+0.040
field+0.038
ep+0.037
aster+0.037
str+0.036
Neg logit contributions
ives-0.034
 aback-0.034
 truth-0.034
 Give-0.033
 consequences-0.033
TokenTim
Token ID14967
Feature activation+5.43e-03

Pos logit contributions
field+0.043
 boarded+0.043
aster+0.037
pron+0.037
 blowing+0.037
Neg logit contributions
 Give-0.055
 aback-0.054
ives-0.053
 truth-0.052
 consequences-0.052
Tokenmy
Token ID1820
Feature activation-1.03e-02

Pos logit contributions
ives+0.011
 consequences+0.011
 aback+0.011
 truth+0.011
 Give+0.011
Neg logit contributions
 boarded-0.007
ep-0.007
str-0.007
 sudden-0.007
field-0.007
Token felt
Token ID2936
Feature activation-3.12e-03

Pos logit contributions
 boarded+0.015
ep+0.014
str+0.014
field+0.014
aster+0.014
Neg logit contributions
ives-0.020
 aback-0.020
 consequences-0.020
 truth-0.019
 Give-0.019
Token bad
Token ID2089
Feature activation+5.13e-02

Pos logit contributions
 sudden+0.005
 boarded+0.005
 sunny+0.004
 wheat+0.004
aster+0.004
Neg logit contributions
ives-0.006
 fault-0.006
 Give-0.006
 consequences-0.006
 agrees-0.005
Token and
Token ID290
Feature activation-5.65e-02

Pos logit contributions
 aback+0.126
ives+0.125
 agrees+0.120
 truth+0.119
 consequences+0.117
Neg logit contributions
 they-0.042
 when-0.039
 blowing-0.038
 which-0.038
 while-0.035
Token apologized
Token ID22893
Feature activation+4.71e-02

Pos logit contributions
 boarded+0.091
field+0.088
ep+0.088
str+0.087
aster+0.086
Neg logit contributions
 aback-0.101
 consequences-0.100
ives-0.098
 truth-0.097
 Give-0.097
Token again
Token ID757
Feature activation-5.40e-03

Pos logit contributions
ives+0.101
 consequences+0.096
 aback+0.096
 Give+0.095
 truth+0.092
Neg logit contributions
 which-0.057
 they-0.056
 blowing-0.052
 when-0.052
 wheat-0.051
Token.
Token ID13
Feature activation+2.89e-02

Pos logit contributions
 boarded+0.012
ep+0.011
str+0.011
field+0.011
aster+0.011
Neg logit contributions
ives-0.007
 consequences-0.007
 aback-0.007
 truth-0.007
 Give-0.006
Token The
Token ID383
Feature activation-2.96e-03

Pos logit contributions
ives+0.045
 consequences+0.045
 aback+0.043
 truth+0.042
 Give+0.040
Neg logit contributions
 boarded-0.054
str-0.051
field-0.051
ep-0.051
 sudden-0.051
Token bear
Token ID6842
Feature activation-2.16e-02

Pos logit contributions
 sudden+0.004
 boarded+0.004
 sunny+0.004
ep+0.004
lf+0.004
Neg logit contributions
ives-0.006
 aback-0.006
 fault-0.006
 agrees-0.006
 Give-0.006