TOP ACTIVATIONS
MAX = 0.153

. Sharing is caring . Friends are more important than toys
'm happy to help . Friends always help each other .
is our new law . Friends must share and be nice
But we 're friends ! Friends don 't keep secrets from
different tastes . But most importantly , she learned to love
come back and apologize . Friends sometimes forget , but they
more enjoyable and meaningful . Friends should share , help and
, treats , and most importantly , Lily 's love .
You are my friend . Friends help each other . Now
need it . And most importantly , she learned that a
" You 're welcome . Friends help each other ." <|endoftext|>
beautiful things and , most importantly , they had a lot
we can be friends . Friends share and speak and play
are my friend too . Friends forgive and help each other
You are my friend . Friends share and take turns ,"
valuable than collecting things . Friends should help each other when
're welcome , Tom . Friends help each other ." From
Friends help each other . Friends also listen to each other
the owner smiled . " Looks like you two are friends
friends should not quarrel . Friends should be kind to each
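Each context line above appears to be an 11-token window centered on the peak token. In the first line, "Friends" is the sixth of eleven tokens, and the same holds for "importantly" in "different tastes . But most importantly , she learned to love". The sketch below shows how such windows could be extracted and ranked. It assumes per-document token and activation lists, a radius of 5, and one window per document; none of these details are documented by the tool. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def peak_window(tokens: Sequence[str], acts: Sequence[float],
                radius: int = 5, use_min: bool = False) -> Tuple[float, str]:
    """Return (extreme activation, tokens within `radius` of it, space-joined).

    Assumption: windows are clipped at the document boundaries.
    """
    if not tokens or len(tokens) != len(acts):
        raise ValueError("need equally long, non-empty token and activation lists")
    pick = min if use_min else max
    i = pick(range(len(acts)), key=lambda j: acts[j])
    lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
    return acts[i], " ".join(tokens[lo:hi])


def extreme_windows(docs: Sequence[Tuple[Sequence[str], Sequence[float]]],
                    k: int, use_min: bool = False) -> List[Tuple[float, str]]:
    """Rank documents by their peak (or trough) activation and keep the top k."""
    windows = [peak_window(t, a, use_min=use_min) for t, a in docs]
    return sorted(windows, key=lambda w: w[0], reverse=not use_min)[:k]


# Hypothetical document: the first two tokens and all activations are invented,
# except that the 0.153 peak on "Friends" echoes MAX above.
toks = ["Mom", "said", ".", "Sharing", "is", "caring", ".",
        "Friends", "are", "more", "important", "than", "toys", "."]
acts = [0.0] * len(toks)
acts[toks.index("Friends")] = 0.153
print(peak_window(toks, acts))
# (0.153, '. Sharing is caring . Friends are more important than toys')
```

Passing `use_min=True` gives the analogous ranking for the MIN ACTIVATIONS list.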

MIN ACTIVATIONS
MIN = -0.218

Where â s my jelly ?
Please don â t eat me ,
saw something floating near the bottom of the tub . It
Let â s have a stretching
Let â s explore and have
Let â s not think about
Let â s play a joke
Why don â t you come with
until it stopped at the bottom . Tim my was happy
please â and
s why it â s called a smooth
Who â s there ? â
Let â s go visit Gr
Let â s spin together with
Let â s jump over it
her doll sink to the bottom of the pool . She
What â s up there ?
don â t worry ! I
to break into people â s homes . Sam
T IM â . It looked nice and
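Most MIN ACTIVATIONS contexts contain a stray "â" where an apostrophe belongs ("Let â s", "don â t"). This pattern is consistent with a curly apostrophe (U+2019) whose UTF-8 bytes E2 80 99 were decoded as Latin-1. Under that decoding, E2 becomes "â" and the other two bytes become invisible C1 control characters. The sketch reproduces that mechanism and shows one common repair. The dump alone cannot tell us whether the corruption is in the underlying text or only in this viewer. The `repair` helper is our own illustration.

```python
# The curly apostrophe (U+2019) is three bytes in UTF-8.
curly = "\u2019"
raw = curly.encode("utf-8")
print(raw)  # b'\xe2\x80\x99'

# Decoded as Latin-1, the bytes become "â" plus two invisible C1 control
# characters, so only "â" is visible in a rendered context.
garbled = raw.decode("latin-1")
print(repr(garbled))  # 'â\x80\x99'

# Decoded as Windows-1252 instead, the same bytes give the familiar "â€™".
print(raw.decode("cp1252"))  # â€™


def repair(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up; return the text unchanged if that fails."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair("Let" + garbled + "s play"))  # Let’s play
```

Text that is already correct, such as "café", fails the UTF-8 decode step and is returned unchanged.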

INTERVAL 0.060 to 0.153
CONTAINS 9.022% of activations

not gr umpy anymore . They became good friends and admired
my and his mom worked together to build the tower .
little boy named Luke . He found an orange job and
book from the shelf . She says it is a book
couldn 't believe it . Will was so happy

INTERVAL -0.032 to 0.060
CONTAINS 65.512% of activations

bring . At first , nothing seemed to happen
beautiful girl and a handsome boy . They loved each other
, who know what is best for us and want to
with the toy car . Tom thought of a plan to
clumsy sometimes , he could still be helpful to his friends

INTERVAL -0.125 to -0.032
CONTAINS 25.351% of activations

pleased . What a special night it had been - he
excited to see Joey had found the key to unlock it
picked it up and put it in his pocket . He
but then she started to get used to it . After
We love you , mom !" They hugged their

INTERVAL -0.218 to -0.125
CONTAINS 0.115% of activations

to find out what â s there . â
hug , and they both started to plan their next race
t believe you â re here ! â
mother said , There , there
up her special pen and started writing down all the fun
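The four INTERVAL rows look like equal-width bins between MIN and MAX. The span is 0.153 - (-0.218) = 0.371, and 0.371 / 4 = 0.09275. That puts the inner edges near -0.125, -0.032 and 0.060, matching the rows above, and the four CONTAINS shares add up to 100%. The following is a minimal sketch of that binning; the equal-width rule is inferred from the numbers, not documented by the tool.

```python
from typing import List, Sequence


def bin_edges(lo: float, hi: float, n: int) -> List[float]:
    """Edges of n equal-width bins covering [lo, hi] (n + 1 values)."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n)] + [hi]


def bin_shares(acts: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Percentage of activations per bin.

    Bins are [a, b) except the last, which is [a, b]; values outside
    [edges[0], edges[-1]] are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for a in acts:
        for i in range(len(counts)):
            in_last = i == len(counts) - 1 and a == edges[-1]
            if edges[i] <= a < edges[i + 1] or in_last:
                counts[i] += 1
                break
    total = len(acts) or 1  # avoid dividing by zero on an empty sample
    return [100.0 * c / total for c in counts]


# MIN and MAX from the dump; the equal-width binning is our inference.
edges = bin_edges(-0.218, 0.153, 4)
print([f"{e:.4f}" for e in edges])

# The four CONTAINS shares reported above, lowest interval first.
reported = [0.115, 25.351, 65.512, 9.022]
print(f"{sum(reported):.3f}")  # 100.000
```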
Token.
Token ID13
Feature activation-4.37e-03

Pos logit contributions
 matter+0.127
 unity+0.126
Fin+0.125
 bond+0.123
 course+0.122
Neg logit contributions
angers-0.051
 office-0.048
elt-0.043
 bathroom-0.040
 upstairs-0.038
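Each per-token block lists the five largest positive and the five most negative logit contributions. The sketch below shows how that selection could be made, assuming the tool starts from a full vocabulary-to-contribution mapping that the dump does not show. The example reuses the ten values from the "." block above.

Keys keep their leading space. The token IDs in this dump match the GPT-2 BPE vocabulary (13 is ".", 0 is "!", 50256 is <|endoftext|>), and in that vocabulary " matter" and "matter" are distinct tokens. This is why some entries appear indented and others do not.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_contributions(contrib: Dict[str, float], k: int = 5) -> Tuple[List[Pair], List[Pair]]:
    """Return the k largest positive and the k most negative (token, value) pairs."""
    pos = sorted((p for p in contrib.items() if p[1] > 0), key=lambda p: p[1], reverse=True)
    neg = sorted((p for p in contrib.items() if p[1] < 0), key=lambda p: p[1])
    return pos[:k], neg[:k]


# Values copied from the first "." block (token ID 13); leading spaces are part of the tokens.
contrib = {
    " matter": 0.127, " unity": 0.126, "Fin": 0.125, " bond": 0.123, " course": 0.122,
    "angers": -0.051, " office": -0.048, "elt": -0.043, " bathroom": -0.040, " upstairs": -0.038,
}
pos, neg = top_contributions(contrib, k=2)
print(pos)  # [(' matter', 0.127), (' unity', 0.126)]
print(neg)  # [('angers', -0.051), (' office', -0.048)]
```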
Token Sharing
Token ID36644
Feature activation+5.80e-02

Pos logit contributions
elt+0.006
angers+0.006
 office+0.006
 bathroom+0.005
uck+0.005
Neg logit contributions
 matter-0.008
 course-0.008
leness-0.008
 unity-0.008
 bond-0.007
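In this dump, each label is fused to its value ("Token ID13", "Feature activation-4.37e-03", " matter+0.127"). A token's leading space is also significant ("Token Sharing" is the token " Sharing", while "Token." is "."), so the lines cannot simply be stripped or split on whitespace. The following is a minimal parser sketch that assumes exactly the well-formed line layout seen here. The `TokenBlock` dataclass and the pattern names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Order matters: "Token ID13" must be tried before the generic "Token<text>" pattern.
ID_RE = re.compile(r"^Token ID(\d+)$")
ACT_RE = re.compile(r"^Feature activation([+-]\d+(?:\.\d+)?e[+-]\d+)$")
TOKEN_RE = re.compile(r"^Token(.*)$")
CONTRIB_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")  # shortest token prefix, then a signed decimal


@dataclass
class TokenBlock:
    token: str
    token_id: int = -1
    activation: float = 0.0
    pos: List[Tuple[str, float]] = field(default_factory=list)
    neg: List[Tuple[str, float]] = field(default_factory=list)


def parse_blocks(lines: List[str]) -> List[TokenBlock]:
    """Parse dump lines into blocks; lines are not stripped, to keep leading spaces."""
    blocks: List[TokenBlock] = []
    section: Optional[str] = None
    for line in lines:
        if not line.strip():
            continue
        if m := ID_RE.match(line):
            blocks[-1].token_id = int(m.group(1))
        elif m := ACT_RE.match(line):
            blocks[-1].activation = float(m.group(1))
        elif line.startswith("Pos logit contributions"):
            section = "pos"
        elif line.startswith("Neg logit contributions"):
            section = "neg"
        elif m := TOKEN_RE.match(line):
            blocks.append(TokenBlock(token=m.group(1)))
            section = None
        elif section and (m := CONTRIB_RE.match(line)):
            getattr(blocks[-1], section).append((m.group(1), float(m.group(2))))
    return blocks
```

Feeding it the lines of the "." and " Sharing" blocks above yields two `TokenBlock` objects, with IDs 13 and 36644.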
Token is
Token ID318
Feature activation-3.36e-02

Pos logit contributions
 matter+0.127
Fin+0.126
 unity+0.125
 souven+0.122
oc+0.120
Neg logit contributions
angers-0.059
 office-0.058
 bathroom-0.053
elt-0.052
 door-0.047
Token caring
Token ID18088
Feature activation+9.18e-02

Pos logit contributions
 office+0.050
 room+0.041
 up+0.041
 outside+0.040
:+0.040
Neg logit contributions
ited-0.062
oc-0.056
leness-0.056
forts-0.055
of-0.052
Token.
Token ID13
Feature activation-6.05e-03

Pos logit contributions
 matter+0.178
 unity+0.177
 course+0.175
Fin+0.172
 bond+0.168
Neg logit contributions
angers-0.128
 office-0.116
elt-0.114
 bathroom-0.104
tub-0.097
Token Friends
Token ID14213
Feature activation+1.53e-01

Pos logit contributions
elt+0.009
angers+0.008
 office+0.008
 bathroom+0.007
uck+0.007
Neg logit contributions
 matter-0.011
 course-0.011
leness-0.011
 unity-0.010
 bond-0.010
Token are
Token ID389
Feature activation-8.12e-03

Pos logit contributions
 memorable+0.248
 souven+0.246
oc+0.241
 unity+0.235
 lasting+0.233
Neg logit contributions
 office-0.206
 bathroom-0.198
 door-0.191
:-0.187
 upstairs-0.187
Token more
Token ID517
Feature activation+3.99e-02

Pos logit contributions
 office+0.011
 room+0.009
 bathroom+0.008
 up+0.008
 outside+0.008
Neg logit contributions
ited-0.017
oc-0.017
of-0.016
leness-0.015
forts-0.015
Token important
Token ID1593
Feature activation+7.87e-02

Pos logit contributions
 course+0.062
oc+0.059
Fin+0.058
 matter+0.058
of+0.057
Neg logit contributions
 office-0.065
angers-0.061
 bathroom-0.059
 upstairs-0.057
 room-0.056
Token than
Token ID621
Feature activation+8.29e-02

Pos logit contributions
 matter+0.108
 unity+0.107
 course+0.107
Fin+0.102
 bond+0.100
Neg logit contributions
angers-0.154
 office-0.143
elt-0.143
 bathroom-0.132
 salon-0.127
Token toys
Token ID14958
Feature activation+6.35e-02

Pos logit contributions
oc+0.146
 course+0.142
 unity+0.140
Fin+0.139
 bond+0.139
Neg logit contributions
angers-0.127
elt-0.119
 office-0.117
 bathroom-0.110
 door-0.102
Token'm
Token ID1101
Feature activation-4.04e-02

Pos logit contributions
 unity+0.103
 course+0.100
 matter+0.099
Fin+0.094
leness+0.093
Neg logit contributions
angers-0.108
elt-0.106
 office-0.105
 bathroom-0.099
tub-0.094
Token happy
Token ID3772
Feature activation+5.02e-02

Pos logit contributions
angers+0.081
 office+0.076
elt+0.075
 bathroom+0.070
 salon+0.067
Neg logit contributions
 matter-0.055
 unity-0.054
 course-0.053
 bond-0.051
Fin-0.050
Token to
Token ID284
Feature activation+3.32e-02

Pos logit contributions
 unity+0.112
 matter+0.112
leness+0.109
oc+0.109
Fin+0.108
Neg logit contributions
angers-0.052
elt-0.052
 office-0.048
 bathroom-0.042
uck-0.040
Token help
Token ID1037
Feature activation+2.68e-02

Pos logit contributions
 unity+0.047
 matter+0.047
 course+0.045
 bond+0.044
Fin+0.043
Neg logit contributions
angers-0.059
 office-0.058
elt-0.054
 bathroom-0.053
 upstairs-0.050
Token.
Token ID13
Feature activation-4.27e-03

Pos logit contributions
 unity+0.055
 bond+0.054
forts+0.053
 course+0.053
Fin+0.053
Neg logit contributions
 office-0.033
angers-0.033
 bathroom-0.028
 door-0.027
 upstairs-0.026
Token Friends
Token ID14213
Feature activation+1.52e-01

Pos logit contributions
angers+0.006
elt+0.006
 office+0.006
 bathroom+0.005
uck+0.005
Neg logit contributions
 matter-0.008
 unity-0.008
 course-0.008
leness-0.008
 bond-0.007
Token always
Token ID1464
Feature activation+8.35e-02

Pos logit contributions
 unity+0.231
oc+0.229
 souven+0.229
 memorable+0.224
 matter+0.224
Neg logit contributions
 office-0.220
 upstairs-0.213
 door-0.211
 bathroom-0.208
angers-0.200
Token help
Token ID1037
Feature activation+4.04e-02

Pos logit contributions
Fin+0.130
 unity+0.128
 course+0.128
oc+0.127
From+0.123
Neg logit contributions
 office-0.130
elt-0.125
angers-0.121
 bathroom-0.119
 upstairs-0.111
Token each
Token ID1123
Feature activation+4.78e-02

Pos logit contributions
 unity+0.062
 course+0.062
 bond+0.059
Fin+0.059
 souven+0.058
Neg logit contributions
 office-0.067
angers-0.066
elt-0.060
 upstairs-0.060
 bathroom-0.060
Token other
Token ID584
Feature activation+2.32e-02

Pos logit contributions
forts+0.099
 unity+0.099
leness+0.095
 bond+0.095
 spe+0.094
Neg logit contributions
 office-0.058
 door-0.047
 bathroom-0.044
 kitchen-0.041
 upstairs-0.041
Token.
Token ID13
Feature activation-5.49e-04

Pos logit contributions
 course+0.045
 matter+0.044
 unity+0.044
Fin+0.044
 bond+0.042
Neg logit contributions
angers-0.032
 office-0.031
elt-0.027
 bathroom-0.027
 upstairs-0.025
Token is
Token ID318
Feature activation-4.06e-02

Pos logit contributions
 course+0.109
 matter+0.107
 unity+0.106
 bond+0.100
 kindness+0.100
Neg logit contributions
angers-0.110
elt-0.101
 office-0.095
 bathroom-0.088
amas-0.085
Token our
Token ID674
Feature activation+3.79e-02

Pos logit contributions
 office+0.055
angers+0.050
 bathroom+0.047
 door+0.047
 upstairs+0.046
Neg logit contributions
oc-0.077
leness-0.076
 bond-0.074
forts-0.073
 course-0.072
Token new
Token ID649
Feature activation+7.36e-02

Pos logit contributions
 matter+0.062
Fin+0.062
oc+0.061
leness+0.061
 course+0.060
Neg logit contributions
 office-0.062
angers-0.060
 bathroom-0.055
 salon-0.054
elt-0.053
Token law
Token ID1099
Feature activation+3.79e-02

Pos logit contributions
 unity+0.120
 matter+0.120
 course+0.119
Fin+0.115
 bond+0.113
Neg logit contributions
angers-0.123
 office-0.116
elt-0.113
 bathroom-0.105
 salon-0.101
Token.
Token ID13
Feature activation-3.88e-03

Pos logit contributions
 matter+0.078
 unity+0.078
 course+0.077
Fin+0.075
 bond+0.074
Neg logit contributions
angers-0.048
 office-0.043
elt-0.043
 bathroom-0.038
 salon-0.036
Token Friends
Token ID14213
Feature activation+1.52e-01

Pos logit contributions
angers+0.005
elt+0.005
 office+0.005
 bathroom+0.004
azes+0.004
Neg logit contributions
 matter-0.008
 bond-0.007
 unity-0.007
 course-0.007
leness-0.007
Token must
Token ID1276
Feature activation+5.69e-02

Pos logit contributions
 souven+0.236
 unity+0.229
 memorable+0.226
oc+0.224
 matter+0.222
Neg logit contributions
 office-0.223
 bathroom-0.205
 upstairs-0.203
 door-0.200
angers-0.197
Token share
Token ID2648
Feature activation+4.48e-02

Pos logit contributions
oc+0.091
ience+0.082
 lasting+0.081
 unity+0.081
 souven+0.081
Neg logit contributions
 office-0.092
 bathroom-0.086
angers-0.079
 door-0.078
 salon-0.077
Token and
Token ID290
Feature activation+7.15e-02

Pos logit contributions
 matter+0.092
Fin+0.092
 unity+0.089
 course+0.088
 bond+0.088
Neg logit contributions
 office-0.049
angers-0.044
 bathroom-0.041
 door-0.040
 upstairs-0.038
Token be
Token ID307
Feature activation-9.28e-03

Pos logit contributions
oc+0.102
Fin+0.102
 course+0.099
 unity+0.094
 matter+0.094
Neg logit contributions
 office-0.120
 bathroom-0.112
elt-0.110
angers-0.110
 door-0.106
Token nice
Token ID3621
Feature activation+7.77e-02

Pos logit contributions
 office+0.015
elt+0.013
angers+0.013
 upstairs+0.012
 bathroom+0.012
Neg logit contributions
 course-0.015
 matter-0.015
Fin-0.014
oc-0.014
affe-0.013
TokenBut
Token ID1537
Feature activation+2.54e-02

Pos logit contributions
 unity+0.037
 course+0.034
 kindness+0.034
 matter+0.034
 bond+0.034
Neg logit contributions
elt-0.032
angers-0.030
 office-0.027
uck-0.026
 bathroom-0.026
Token we
Token ID356
Feature activation+6.96e-02

Pos logit contributions
 unity+0.044
leness+0.043
 matter+0.043
 bond+0.043
 course+0.042
Neg logit contributions
angers-0.038
 office-0.038
elt-0.037
 bathroom-0.034
 door-0.032
Token're
Token ID821
Feature activation-1.79e-02

Pos logit contributions
 unity+0.104
oc+0.100
 souven+0.100
leness+0.099
 course+0.099
Neg logit contributions
 office-0.110
 upstairs-0.109
 bathroom-0.108
angers-0.107
elt-0.107
Token friends
Token ID2460
Feature activation+7.38e-02

Pos logit contributions
angers+0.031
elt+0.028
 office+0.027
 bathroom+0.025
 salon+0.025
Neg logit contributions
 unity-0.030
 matter-0.029
 course-0.029
Fin-0.028
 kindness-0.028
Token!
Token ID0
Feature activation-8.70e-03

Pos logit contributions
 unity+0.160
 matter+0.155
Fin+0.150
leness+0.150
 bond+0.148
Neg logit contributions
 office-0.082
angers-0.082
elt-0.080
 bathroom-0.071
 salon-0.067
Token Friends
Token ID14213
Feature activation+1.51e-01

Pos logit contributions
angers+0.014
elt+0.014
 office+0.012
 bathroom+0.011
amas+0.011
Neg logit contributions
 matter-0.016
 course-0.015
 unity-0.015
 bond-0.015
leness-0.015
Token don
Token ID836
Feature activation+2.10e-02

Pos logit contributions
oc+0.245
 souven+0.240
 unity+0.238
 memorable+0.238
Fin+0.223
Neg logit contributions
 upstairs-0.204
 office-0.199
 outside-0.198
 bathroom-0.197
 door-0.193
Token't
Token ID470
Feature activation+8.30e-02

Pos logit contributions
 unity+0.038
 matter+0.038
 course+0.037
 bond+0.036
Fin+0.036
Neg logit contributions
angers-0.032
elt-0.029
 office-0.029
 bathroom-0.026
 salon-0.025
Token keep
Token ID1394
Feature activation+1.92e-04

Pos logit contributions
 unity+0.136
 course+0.134
oc+0.129
 bond+0.124
 teamwork+0.123
Neg logit contributions
 office-0.122
angers-0.116
elt-0.114
 upstairs-0.114
 bathroom-0.114
Token secrets
Token ID13141
Feature activation+5.80e-02

Pos logit contributions
oc+0.000
 bond+0.000
 course+0.000
Fin+0.000
 matter+0.000
Neg logit contributions
 office-0.000
angers-0.000
 bathroom-0.000
 upstairs-0.000
elt-0.000
Token from
Token ID422
Feature activation+2.98e-03

Pos logit contributions
 course+0.128
Fin+0.127
 matter+0.121
 unity+0.120
From+0.118
Neg logit contributions
angers-0.063
 office-0.060
 door-0.058
 upstairs-0.057
tub-0.053
Token different
Token ID1180
Feature activation+9.22e-02

Pos logit contributions
oc+0.083
Fin+0.075
From+0.072
 course+0.072
 unity+0.070
Neg logit contributions
elt-0.075
 office-0.072
angers-0.071
 bathroom-0.070
 door-0.064
Token tastes
Token ID18221
Feature activation+3.71e-03

Pos logit contributions
 unity+0.134
 course+0.133
 matter+0.132
Fin+0.129
 bond+0.126
Neg logit contributions
angers-0.170
elt-0.160
 office-0.158
 bathroom-0.146
 salon-0.140
Token.
Token ID13
Feature activation-1.00e-02

Pos logit contributions
 unity+0.007
oc+0.007
 bond+0.007
Fin+0.007
 matter+0.007
Neg logit contributions
angers-0.005
elt-0.005
 bathroom-0.004
 office-0.004
tub-0.004
Token But
Token ID887
Feature activation+3.68e-02

Pos logit contributions
angers+0.012
elt+0.012
 office+0.011
 bathroom+0.010
uck+0.009
Neg logit contributions
 matter-0.021
 unity-0.021
leness-0.020
Fin-0.020
 souven-0.019
Token most
Token ID749
Feature activation+1.04e-01

Pos logit contributions
 unity+0.061
Fin+0.060
leness+0.058
 course+0.058
forts+0.058
Neg logit contributions
angers-0.056
 office-0.055
elt-0.054
 salon-0.050
 bathroom-0.050
Token importantly
Token ID11003
Feature activation+1.51e-01

Pos logit contributions
bit+0.128
 unity+0.128
Fin+0.128
 matter+0.127
 course+0.126
Neg logit contributions
elt-0.196
angers-0.194
 office-0.185
azes-0.172
uddled-0.167
Token,
Token ID11
Feature activation-3.63e-02

Pos logit contributions
 unity+0.321
 matter+0.318
 course+0.311
Fin+0.311
 bond+0.307
Neg logit contributions
angers-0.176
elt-0.158
 office-0.156
 bathroom-0.138
 salon-0.131
Token she
Token ID673
Feature activation+2.15e-02

Pos logit contributions
 office+0.046
angers+0.045
 salon+0.043
elt+0.043
 bathroom+0.043
Neg logit contributions
forts-0.070
leness-0.070
oc-0.070
Fin-0.069
Ar-0.069
Token learned
Token ID4499
Feature activation+3.25e-02

Pos logit contributions
 unity+0.033
 matter+0.033
 course+0.033
Fin+0.031
 bond+0.031
Neg logit contributions
angers-0.038
elt-0.035
 office-0.035
 bathroom-0.032
 salon-0.031
Token to
Token ID284
Feature activation+4.26e-02

Pos logit contributions
 course+0.048
 unity+0.047
 matter+0.047
 caring+0.045
 kindness+0.045
Neg logit contributions
angers-0.063
elt-0.057
 office-0.056
 bathroom-0.052
amas-0.051
Token love
Token ID1842
Feature activation+7.11e-02

Pos logit contributions
oc+0.071
 lasting+0.071
 unity+0.071
leness+0.069
Fin+0.069
Neg logit contributions
 office-0.062
 bathroom-0.058
angers-0.056
elt-0.052
 salon-0.052
Token come
Token ID1282
Feature activation-2.90e-02

Pos logit contributions
 unity+0.021
oc+0.021
 matter+0.020
 teamwork+0.020
leness+0.020
Neg logit contributions
 office-0.014
 bathroom-0.013
angers-0.013
 door-0.012
 salon-0.012
Token back
Token ID736
Feature activation-7.89e-02

Pos logit contributions
angers+0.024
 office+0.021
elt+0.021
 bathroom+0.017
 upstairs+0.016
Neg logit contributions
 matter-0.073
 course-0.072
 unity-0.072
leness-0.071
 bond-0.069
Token and
Token ID290
Feature activation+4.05e-02

Pos logit contributions
angers+0.138
 office+0.127
elt+0.127
 bathroom+0.116
 salon+0.112
Neg logit contributions
 matter-0.125
 unity-0.125
 course-0.124
Fin-0.118
 bond-0.118
Token apologize
Token ID16521
Feature activation+5.75e-02

Pos logit contributions
 unity+0.057
 course+0.057
 matter+0.057
Fin+0.055
leness+0.054
Neg logit contributions
 office-0.072
angers-0.072
elt-0.071
 bathroom-0.067
 salon-0.065
Token.
Token ID13
Feature activation-9.42e-03

Pos logit contributions
 matter+0.107
 unity+0.107
 course+0.106
 bond+0.102
 kindness+0.101
Neg logit contributions
angers-0.084
elt-0.076
 office-0.075
 bathroom-0.068
 salon-0.065
Token Friends
Token ID14213
Feature activation+1.51e-01

Pos logit contributions
angers+0.015
elt+0.014
 office+0.014
 bathroom+0.013
amas+0.012
Neg logit contributions
 matter-0.016
 unity-0.016
 course-0.016
 bond-0.016
Fin-0.015
Token sometimes
Token ID3360
Feature activation+2.40e-02

Pos logit contributions
 unity+0.221
 memorable+0.216
 souven+0.214
oc+0.214
 matter+0.211
Neg logit contributions
 office-0.223
 upstairs-0.217
 door-0.212
 bathroom-0.211
 outside-0.207
Token forget
Token ID6044
Feature activation+6.39e-02

Pos logit contributions
Fin+0.036
oc+0.036
 course+0.036
 unity+0.036
 bond+0.034
Neg logit contributions
 office-0.039
 bathroom-0.035
angers-0.035
 salon-0.034
 upstairs-0.034
Token,
Token ID11
Feature activation-3.06e-02

Pos logit contributions
Fin+0.122
 unity+0.122
 matter+0.121
 course+0.121
oc+0.120
Neg logit contributions
 office-0.081
angers-0.081
elt-0.073
 bathroom-0.070
 salon-0.065
Token but
Token ID475
Feature activation+5.42e-02

Pos logit contributions
 office+0.047
angers+0.044
elt+0.042
 bathroom+0.041
 door+0.040
Neg logit contributions
leness-0.053
forts-0.051
Fin-0.051
Mic-0.050
 bond-0.050
Token they
Token ID484
Feature activation+6.12e-02

Pos logit contributions
oc+0.084
leness+0.083
Fin+0.083
forts+0.082
 unity+0.080
Neg logit contributions
 office-0.088
angers-0.082
elt-0.082
 bathroom-0.081
 door-0.079
Token more
Token ID517
Feature activation+3.74e-02

Pos logit contributions
Fin+0.080
 course+0.075
Mic+0.072
Fred+0.072
From+0.070
Neg logit contributions
 office-0.079
angers-0.072
 upstairs-0.070
 door-0.069
 bathroom-0.069
Token enjoyable
Token ID20050
Feature activation+7.97e-02

Pos logit contributions
Fin+0.056
 course+0.055
Fred+0.052
Mic+0.051
Blue+0.050
Neg logit contributions
 office-0.062
angers-0.061
elt-0.057
 door-0.057
 upstairs-0.056
Token and
Token ID290
Feature activation+4.90e-02

Pos logit contributions
Fin+0.171
 matter+0.169
 unity+0.166
 course+0.165
Blue+0.161
Neg logit contributions
angers-0.087
 office-0.084
elt-0.074
 bathroom-0.071
 upstairs-0.067
Token meaningful
Token ID11570
Feature activation+8.88e-02

Pos logit contributions
 course+0.068
Fin+0.066
oc+0.065
affe+0.063
Mic+0.062
Neg logit contributions
 office-0.084
 upstairs-0.078
elt-0.078
 door-0.077
angers-0.076
Token.
Token ID13
Feature activation-1.10e-02

Pos logit contributions
Fin+0.191
 course+0.189
 matter+0.188
 unity+0.184
Fred+0.180
Neg logit contributions
angers-0.097
 office-0.093
elt-0.084
 bathroom-0.077
 door-0.073
Token Friends
Token ID14213
Feature activation+1.50e-01

Pos logit contributions
angers+0.014
elt+0.014
 office+0.014
 bathroom+0.012
omm+0.012
Neg logit contributions
 matter-0.022
 course-0.020
leness-0.020
 unity-0.020
Fin-0.020
Token should
Token ID815
Feature activation+6.54e-02

Pos logit contributions
oc+0.236
Fin+0.236
 souven+0.226
 memorable+0.224
Fred+0.221
Neg logit contributions
 office-0.210
 door-0.202
 upstairs-0.199
:-0.195
 bathroom-0.194
Token share
Token ID2648
Feature activation+4.73e-02

Pos logit contributions
oc+0.129
ience+0.107
Fin+0.107
Mic+0.107
able+0.106
Neg logit contributions
 office-0.087
 bathroom-0.082
 door-0.072
 salon-0.071
 kitchen-0.069
Token,
Token ID11
Feature activation-2.70e-02

Pos logit contributions
Fin+0.105
 matter+0.101
Mic+0.099
Tweet+0.097
 course+0.097
Neg logit contributions
 office-0.045
 door-0.037
 upstairs-0.035
 bathroom-0.035
angers-0.034
Token help
Token ID1037
Feature activation+3.58e-02

Pos logit contributions
 office+0.033
 to+0.030
 bathroom+0.030
 door+0.029
 room+0.028
Neg logit contributions
leness-0.057
oc-0.053
Fin-0.052
 course-0.050
Mic-0.050
Token and
Token ID290
Feature activation+6.11e-02

Pos logit contributions
Fin+0.069
 unity+0.066
oc+0.066
Ar+0.065
 course+0.065
Neg logit contributions
 office-0.044
 upstairs-0.038
angers-0.038
 outside-0.038
 door-0.036
Token,
Token ID11
Feature activation-3.36e-02

Pos logit contributions
 matter+0.110
 unity+0.110
 course+0.107
Fin+0.105
 bond+0.104
Neg logit contributions
angers-0.086
elt-0.078
 office-0.078
 bathroom-0.070
 salon-0.068
Token treats
Token ID18432
Feature activation+7.50e-02

Pos logit contributions
 office+0.031
angers+0.031
elt+0.030
 bathroom+0.028
 door+0.025
Neg logit contributions
leness-0.076
 unity-0.075
oc-0.074
Fin-0.073
 matter-0.072
Token,
Token ID11
Feature activation-3.25e-02

Pos logit contributions
 matter+0.147
 unity+0.146
 course+0.140
Fin+0.139
 bond+0.137
Neg logit contributions
angers-0.104
 office-0.095
elt-0.095
 bathroom-0.084
 salon-0.081
Token and
Token ID290
Feature activation+5.84e-02

Pos logit contributions
angers+0.032
 office+0.031
elt+0.031
 bathroom+0.027
 salon+0.025
Neg logit contributions
 unity-0.073
leness-0.073
 matter-0.071
Fin-0.071
oc-0.071
Token most
Token ID749
Feature activation+1.02e-01

Pos logit contributions
oc+0.106
leness+0.103
Fin+0.103
forts+0.099
 unity+0.099
Neg logit contributions
 office-0.082
 bathroom-0.077
angers-0.076
elt-0.073
 door-0.071
Token importantly
Token ID11003
Feature activation+1.50e-01

Pos logit contributions
leness+0.166
Fin+0.164
 course+0.163
 matter+0.163
forts+0.160
Neg logit contributions
 office-0.160
angers-0.159
elt-0.155
 bathroom-0.141
 door-0.137
Token,
Token ID11
Feature activation-2.94e-02

Pos logit contributions
 matter+0.272
 course+0.272
 unity+0.272
 bond+0.260
Fin+0.256
Neg logit contributions
angers-0.226
elt-0.206
 office-0.204
 bathroom-0.184
 salon-0.178
Token Lily
Token ID20037
Feature activation+9.67e-03

Pos logit contributions
angers+0.037
 office+0.035
 bathroom+0.033
elt+0.032
 salon+0.030
Neg logit contributions
oc-0.059
leness-0.058
 unity-0.058
Fin-0.058
forts-0.058
Token's
Token ID338
Feature activation-8.44e-03

Pos logit contributions
 unity+0.016
 matter+0.016
 course+0.016
Fin+0.015
 bond+0.015
Neg logit contributions
angers-0.016
 office-0.015
elt-0.015
 bathroom-0.014
 salon-0.013
Token love
Token ID1842
Feature activation+7.06e-02

Pos logit contributions
 office+0.013
 bathroom+0.012
angers+0.012
 door+0.011
 knee+0.011
Neg logit contributions
Fin-0.015
oc-0.014
leness-0.014
 matter-0.014
 course-0.014
Token.
Token ID13
Feature activation-1.12e-02

Pos logit contributions
 unity+0.149
 matter+0.148
Fin+0.147
 course+0.143
oc+0.143
Neg logit contributions
angers-0.082
 office-0.078
elt-0.075
 bathroom-0.067
 door-0.062
Token You
Token ID921
Feature activation+8.81e-02

Pos logit contributions
angers+0.006
elt+0.006
 office+0.006
 bathroom+0.005
tub+0.005
Neg logit contributions
 matter-0.007
 unity-0.007
 course-0.007
 bond-0.007
leness-0.007
Token are
Token ID389
Feature activation-5.26e-03

Pos logit contributions
 unity+0.141
Fin+0.139
oc+0.136
 matter+0.134
leness+0.133
Neg logit contributions
 office-0.131
elt-0.131
angers-0.128
 bathroom-0.124
 upstairs-0.121
Token my
Token ID616
Feature activation+1.95e-02

Pos logit contributions
 office+0.007
angers+0.007
elt+0.007
 bathroom+0.007
 upstairs+0.006
Neg logit contributions
 matter-0.010
oc-0.009
 unity-0.009
 bond-0.009
 course-0.009
Token friend
Token ID1545
Feature activation+6.36e-02

Pos logit contributions
 unity+0.033
leness+0.033
 matter+0.033
From+0.033
 course+0.033
Neg logit contributions
 office-0.031
angers-0.031
 bathroom-0.028
elt-0.027
 knee-0.026
Token.
Token ID13
Feature activation-4.67e-03

Pos logit contributions
 unity+0.146
 matter+0.142
leness+0.141
 bond+0.140
Fin+0.139
Neg logit contributions
 office-0.063
elt-0.061
angers-0.055
 door-0.050
 upstairs-0.049
Token Friends
Token ID14213
Feature activation+1.50e-01

Pos logit contributions
angers+0.007
elt+0.007
 office+0.006
 bathroom+0.006
azes+0.005
Neg logit contributions
 matter-0.009
 unity-0.009
 course-0.008
leness-0.008
 bond-0.008
Token help
Token ID1037
Feature activation+3.37e-02

Pos logit contributions
 unity+0.220
 souven+0.214
 memorable+0.213
 matter+0.209
Fin+0.205
Neg logit contributions
 office-0.233
 bathroom-0.223
 upstairs-0.217
angers-0.216
 outside-0.211
Token each
Token ID1123
Feature activation+4.34e-02

Pos logit contributions
 unity+0.048
 course+0.046
 bond+0.045
 matter+0.044
leness+0.044
Neg logit contributions
angers-0.062
 office-0.060
elt-0.058
 bathroom-0.056
uck-0.053
Token other
Token ID584
Feature activation+1.70e-02

Pos logit contributions
 unity+0.085
forts+0.081
Fin+0.080
leness+0.080
 souven+0.079
Neg logit contributions
 office-0.061
 bathroom-0.052
angers-0.048
 salon-0.047
 door-0.047
Token.
Token ID13
Feature activation-3.38e-03

Pos logit contributions
 unity+0.033
 course+0.033
 matter+0.033
Fin+0.033
 bond+0.032
Neg logit contributions
angers-0.022
 office-0.021
elt-0.020
 bathroom-0.019
 salon-0.018
Token Now
Token ID2735
Feature activation+3.87e-02

Pos logit contributions
elt+0.004
angers+0.004
 office+0.004
 bathroom+0.004
amas+0.003
Neg logit contributions
 matter-0.007
 course-0.006
leness-0.006
 unity-0.006
 bond-0.006
Token need
Token ID761
Feature activation+6.36e-02

Pos logit contributions
 unity+0.032
 matter+0.031
 course+0.031
Fin+0.029
 bond+0.029
Neg logit contributions
angers-0.049
elt-0.047
 office-0.046
 bathroom-0.043
 salon-0.042
Token it
Token ID340
Feature activation-6.58e-02

Pos logit contributions
 unity+0.106
oc+0.106
 matter+0.106
Fin+0.105
 course+0.104
Neg logit contributions
angers-0.098
 office-0.094
elt-0.091
 bathroom-0.090
 door-0.083
Token.
Token ID13
Feature activation-1.24e-02

Pos logit contributions
angers+0.087
 office+0.078
elt+0.078
 bathroom+0.069
 salon+0.065
Neg logit contributions
 unity-0.132
 matter-0.132
 course-0.131
Fin-0.126
 bond-0.126
Token And
Token ID843
Feature activation+9.49e-02

Pos logit contributions
angers+0.017
elt+0.016
 office+0.016
 bathroom+0.014
amas+0.013
Neg logit contributions
 matter-0.025
 unity-0.024
 course-0.024
Fin-0.023
 bond-0.023
Token most
Token ID749
Feature activation+9.83e-02

Pos logit contributions
Fin+0.195
Ar+0.185
leness+0.184
oc+0.184
 unity+0.183
Neg logit contributions
 office-0.119
angers-0.114
elt-0.106
 salon-0.101
 bathroom-0.097
Token importantly
Token ID11003
Feature activation+1.49e-01

Pos logit contributions
Fin+0.122
 matter+0.119
 unity+0.116
 course+0.115
bit+0.115
Neg logit contributions
angers-0.191
elt-0.191
 office-0.187
 bathroom-0.169
 upstairs-0.167
Token,
Token ID11
Feature activation-3.67e-02

Pos logit contributions
 unity+0.311
 matter+0.310
 course+0.304
Fin+0.301
 bond+0.297
Neg logit contributions
angers-0.183
 office-0.163
elt-0.162
 bathroom-0.144
 salon-0.136
Token she
Token ID673
Feature activation+2.13e-02

Pos logit contributions
 office+0.048
angers+0.047
 bathroom+0.043
 salon+0.042
elt+0.041
Neg logit contributions
Fin-0.073
forts-0.071
leness-0.071
oc-0.071
Ar-0.069
Token learned
Token ID4499
Feature activation+2.72e-02

Pos logit contributions
 matter+0.033
 unity+0.033
 course+0.032
Fin+0.031
 bond+0.031
Neg logit contributions
angers-0.037
 office-0.035
elt-0.034
 bathroom-0.032
 salon-0.030
Token that
Token ID326
Feature activation+2.95e-02

Pos logit contributions
 unity+0.046
 matter+0.046
 course+0.045
Fin+0.043
 bond+0.043
Neg logit contributions
angers-0.045
elt-0.041
 office-0.041
 bathroom-0.037
 salon-0.036
Token a
Token ID257
Feature activation+5.07e-02

Pos logit contributions
Fin+0.054
oc+0.052
forts+0.050
affe+0.050
leness+0.050
Neg logit contributions
 office-0.045
 bathroom-0.041
angers-0.040
 upstairs-0.038
elt-0.038
Token "
Token ID366
Feature activation+1.95e-02

Pos logit contributions
angers+0.068
 office+0.060
elt+0.057
 bathroom+0.053
 salon+0.050
Neg logit contributions
 course-0.103
 unity-0.101
 caring-0.100
 matter-0.099
From-0.098
TokenYou
Token ID1639
Feature activation+7.80e-02

Pos logit contributions
 unity+0.039
 matter+0.038
 teamwork+0.037
 kindness+0.037
 shields+0.037
Neg logit contributions
elt-0.026
angers-0.022
uck-0.019
ed-0.018
amas-0.018
Token're
Token ID821
Feature activation-5.55e-03

Pos logit contributions
 unity+0.122
 matter+0.118
 course+0.118
oc+0.116
Fin+0.116
Neg logit contributions
angers-0.130
elt-0.129
 office-0.124
 bathroom-0.118
tub-0.114
Token welcome
Token ID7062
Feature activation+2.93e-02

Pos logit contributions
angers+0.012
elt+0.011
 office+0.011
 bathroom+0.010
 salon+0.010
Neg logit contributions
 unity-0.007
 matter-0.007
 course-0.007
Fin-0.007
 bond-0.007
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
 unity+0.058
 matter+0.058
 course+0.057
 bond+0.055
Fin+0.055
Neg logit contributions
angers-0.040
elt-0.036
 office-0.036
 bathroom-0.032
 salon-0.030
Token Friends
Token ID14213
Feature activation+1.48e-01

Pos logit contributions
elt+0.015
angers+0.015
 office+0.014
 bathroom+0.013
uck+0.012
Neg logit contributions
 matter-0.019
 unity-0.018
leness-0.018
 course-0.018
 lasting-0.017
Token help
Token ID1037
Feature activation+3.19e-02

Pos logit contributions
 unity+0.221
 souven+0.218
oc+0.217
 memorable+0.212
Fin+0.210
Neg logit contributions
 office-0.225
 bathroom-0.221
 upstairs-0.218
 door-0.213
angers-0.211
Token each
Token ID1123
Feature activation+3.97e-02

Pos logit contributions
 unity+0.046
 course+0.044
Fin+0.043
 souven+0.042
 bond+0.042
Neg logit contributions
angers-0.057
 office-0.056
elt-0.054
 bathroom-0.052
 upstairs-0.051
Token other
Token ID584
Feature activation+1.33e-02

Pos logit contributions
 unity+0.075
leness+0.070
forts+0.069
 bond+0.068
Fin+0.068
Neg logit contributions
 office-0.058
 bathroom-0.051
angers-0.049
 door-0.049
 salon-0.048
Token."
Token ID526
Feature activation-3.31e-02

Pos logit contributions
 unity+0.023
 course+0.023
 matter+0.023
Fin+0.022
 bond+0.022
Neg logit contributions
angers-0.021
 office-0.019
elt-0.019
 bathroom-0.018
 salon-0.017
Token<|endoftext|>
Token ID50256
Feature activation-2.11e-02

Pos logit contributions
angers+0.060
elt+0.055
 office+0.055
 bathroom+0.050
 salon+0.049
Neg logit contributions
 unity-0.051
 matter-0.050
 course-0.050
Fin-0.048
 bond-0.048
Token beautiful
Token ID4950
Feature activation+7.62e-02

Pos logit contributions
 office+0.044
elt+0.041
angers+0.041
 bathroom+0.038
 door+0.037
Neg logit contributions
 matter-0.053
oc-0.051
Fin-0.051
leness-0.050
forts-0.050
Token things
Token ID1243
Feature activation+9.59e-04

Pos logit contributions
 unity+0.124
 matter+0.124
 course+0.123
 bond+0.117
Fin+0.117
Neg logit contributions
angers-0.131
elt-0.120
 office-0.119
 bathroom-0.109
 salon-0.106
Token and
Token ID290
Feature activation+5.71e-02

Pos logit contributions
 matter+0.002
oc+0.002
Fin+0.002
 unity+0.002
 course+0.002
Neg logit contributions
angers-0.001
elt-0.001
 office-0.001
 bathroom-0.001
 upstairs-0.001
Token,
Token ID11
Feature activation-3.63e-02

Pos logit contributions
 course+0.094
 matter+0.090
oc+0.090
Fin+0.089
 unity+0.089
Neg logit contributions
angers-0.091
 office-0.088
elt-0.087
 bathroom-0.079
tub-0.076
Token most
Token ID749
Feature activation+1.00e-01

Pos logit contributions
 office+0.049
angers+0.048
elt+0.044
 bathroom+0.043
 salon+0.040
Neg logit contributions
Fin-0.068
oc-0.068
 matter-0.068
forts-0.067
Mic-0.066
Token importantly
Token ID11003
Feature activation+1.48e-01

Pos logit contributions
 course+0.137
 matter+0.135
forts+0.135
Fin+0.130
leness+0.129
Neg logit contributions
angers-0.184
elt-0.182
 office-0.182
tub-0.166
 bathroom-0.164
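Across the blocks, the same token strings recur with very similar activations. Every " Friends" block falls between 0.148 and 0.153, and every " importantly" block falls between 0.148 and 0.151. Every "." block stays slightly negative. The following sketch groups activations by token string, using values copied from the blocks above; the choice of tokens to include is ours.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_by_token(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the feature activation over every occurrence of each token string."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for token, act in pairs:
        groups[token].append(act)
    return {token: mean(acts) for token, acts in groups.items()}


# Activations copied from the blocks above (all " Friends" and " importantly"
# blocks, plus the first five "." blocks).
observed = (
    [(" Friends", a) for a in (0.153, 0.152, 0.152, 0.151, 0.151, 0.150, 0.150, 0.148)]
    + [(" importantly", a) for a in (0.151, 0.150, 0.149, 0.148)]
    + [(".", a) for a in (-4.37e-3, -6.05e-3, -4.27e-3, -5.49e-4, -3.88e-3)]
)
for token, avg in sorted(mean_by_token(observed).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token!r:>15} {avg:+.4f}")
```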
Token,
Token ID11
Feature activation-3.16e-02

Pos logit contributions
 matter+0.283
 unity+0.283
 course+0.281
Fin+0.273
 bond+0.269
Neg logit contributions
angers-0.206
 office-0.186
elt-0.185
 bathroom-0.166
 salon-0.158
Token they
Token ID484
Feature activation+6.12e-02

Pos logit contributions
 office+0.047
angers+0.046
elt+0.042
 bathroom+0.042
 door+0.039
Neg logit contributions
forts-0.056
oc-0.056
Mic-0.055
Fin-0.054
leness-0.054
Token had
Token ID550
Feature activation+2.48e-02

Pos logit contributions
oc+0.115
 unity+0.112
 matter+0.111
 course+0.109
Fin+0.108
Neg logit contributions
angers-0.077
elt-0.075
 office-0.075
 upstairs-0.070
ed-0.068
Token a
Token ID257
Feature activation+5.46e-02

Pos logit contributions
 matter+0.044
 unity+0.042
oc+0.042
 course+0.042
leness+0.041
Neg logit contributions
angers-0.038
 office-0.036
elt-0.036
 bathroom-0.032
tub-0.030
Token lot
Token ID1256
Feature activation+9.96e-02

Pos logit contributions
leness+0.080
forts+0.076
ck+0.074
Fin+0.074
From+0.074
Neg logit contributions
 office-0.101
 bathroom-0.097
 door-0.091
 salon-0.086
 kitchen-0.086
Token we
Token ID356
Feature activation+6.34e-02

Pos logit contributions
 salon+0.045
 door+0.044
 office+0.043
 bathroom+0.040
 knee+0.039
Neg logit contributions
leness-0.072
oc-0.069
 unity-0.065
 bond-0.064
Ar-0.063
Token can
Token ID460
Feature activation+1.72e-03

Pos logit contributions
 unity+0.116
 harmony+0.113
leness+0.113
 matter+0.110
 lasting+0.109
Neg logit contributions
angers-0.088
 office-0.082
elt-0.082
 bathroom-0.078
 door-0.077
Token be
Token ID307
Feature activation-2.66e-02

Pos logit contributions
 unity+0.003
 matter+0.003
 course+0.003
 bond+0.003
oc+0.003
Neg logit contributions
angers-0.002
elt-0.002
 office-0.002
 bathroom-0.002
 salon-0.002
Token friends
Token ID2460
Feature activation+6.82e-02

Pos logit contributions
angers+0.044
 office+0.041
elt+0.040
 bathroom+0.037
 salon+0.034
Neg logit contributions
 matter-0.046
 course-0.045
 unity-0.044
oc-0.043
 bond-0.043
Token.
Token ID13
Feature activation-8.74e-03

Pos logit contributions
 unity+0.140
 matter+0.140
 course+0.135
Fin+0.134
 bond+0.133
Neg logit contributions
angers-0.085
 office-0.079
elt-0.076
 bathroom-0.069
 salon-0.067
Token Friends
Token ID14213
Feature activation+1.48e-01

Pos logit contributions
angers+0.013
elt+0.012
 office+0.011
 bathroom+0.010
uck+0.010
Neg logit contributions
 matter-0.017
leness-0.016
 course-0.016
 unity-0.016
 lasting-0.015
Token share
Token ID2648
Feature activation+4.49e-02

Pos logit contributions
oc+0.232
 unity+0.232
 memorable+0.231
 lasting+0.227
 souven+0.225
Neg logit contributions
 office-0.191
angers-0.188
 bathroom-0.186
our-0.177
 outside-0.177
Token and
Token ID290
Feature activation+6.48e-02

Pos logit contributions
 matter+0.099
 unity+0.094
Fin+0.094
 bond+0.091
leness+0.091
Neg logit contributions
 office-0.042
angers-0.042
 bathroom-0.039
 door-0.036
elt-0.036
Token speak
Token ID2740
Feature activation+3.77e-02

Pos logit contributions
oc+0.096
Fin+0.094
 course+0.092
 unity+0.091
 matter+0.090
Neg logit contributions
elt-0.100
 office-0.100
 bathroom-0.100
angers-0.098
 salon-0.093
Token and
Token ID290
Feature activation+6.10e-02

Pos logit contributions
 course+0.073
 matter+0.073
 unity+0.070
Fin+0.070
oc+0.069
Neg logit contributions
angers-0.051
elt-0.046
 office-0.046
 bathroom-0.041
 door-0.039
Token play
Token ID711
Feature activation+8.75e-03

Pos logit contributions
oc+0.090
 course+0.088
Fin+0.086
 matter+0.082
leness+0.080
Neg logit contributions
angers-0.097
elt-0.097
 office-0.097
 bathroom-0.095
 salon-0.091
Token are
Token ID389
Feature activation-1.17e-02

Pos logit contributions
 unity+0.134
oc+0.133
 matter+0.128
 harmony+0.127
 bond+0.127
Neg logit contributions
elt-0.127
angers-0.120
 office-0.119
 bathroom-0.116
ed-0.107
Token my
Token ID616
Feature activation+1.82e-02

Pos logit contributions
angers+0.019
 office+0.019
elt+0.019
 bathroom+0.018
 upstairs+0.016
Neg logit contributions
 matter-0.018
oc-0.018
 unity-0.018
 bond-0.018
 course-0.017
Token friend
Token ID1545
Feature activation+5.99e-02

Pos logit contributions
leness+0.036
 course+0.035
 unity+0.035
 matter+0.035
From+0.035
Neg logit contributions
 office-0.025
angers-0.024
elt-0.023
 bathroom-0.023
 knee-0.021
Token too
Token ID1165
Feature activation+7.22e-03

Pos logit contributions
 unity+0.146
 matter+0.142
leness+0.141
Fin+0.140
 bond+0.138
Neg logit contributions
 office-0.051
elt-0.048
angers-0.045
 bathroom-0.039
 door-0.037
Token.
Token ID13
Feature activation-1.15e-02

Pos logit contributions
 matter+0.014
 unity+0.014
 course+0.013
Fin+0.013
 bond+0.013
Neg logit contributions
angers-0.011
elt-0.010
 office-0.010
 bathroom-0.009
 salon-0.008
Token Friends
Token ID14213
Feature activation+1.45e-01

Pos logit contributions
angers+0.017
elt+0.017
 office+0.016
 bathroom+0.015
uck+0.013
Neg logit contributions
 matter-0.021
 unity-0.021
 course-0.020
 bond-0.019
Fin-0.019
Token forgive
Token ID20927
Feature activation+7.73e-02

Pos logit contributions
 unity+0.226
 souven+0.220
oc+0.215
 memorable+0.215
 matter+0.209
Neg logit contributions
 office-0.217
 bathroom-0.212
 upstairs-0.200
uck-0.199
angers-0.196
Token and
Token ID290
Feature activation+6.84e-02

Pos logit contributions
 unity+0.133
Fin+0.128
 bond+0.127
 souven+0.126
 course+0.125
Neg logit contributions
angers-0.112
 office-0.110
elt-0.107
 bathroom-0.103
uck-0.097
Token help
Token ID1037
Feature activation+2.77e-02

Pos logit contributions
oc+0.100
Fin+0.099
 matter+0.096
 unity+0.093
 course+0.093
Neg logit contributions
 office-0.114
elt-0.111
 bathroom-0.109
angers-0.103
 salon-0.103
Token each
Token ID1123
Feature activation+3.85e-02

Pos logit contributions
 unity+0.035
 course+0.035
 matter+0.034
Fin+0.033
 bond+0.033
Neg logit contributions
angers-0.056
 office-0.053
elt-0.053
 bathroom-0.050
 salon-0.048
Token other
Token ID584
Feature activation+1.20e-02

Pos logit contributions
 unity+0.067
oc+0.062
Fin+0.062
 bond+0.062
leness+0.061
Neg logit contributions
 office-0.061
 bathroom-0.055
angers-0.054
 salon-0.052
 door-0.051
Token You
Token ID921
Feature activation+7.98e-02

Pos logit contributions
angers+0.024
elt+0.022
 office+0.022
 bathroom+0.020
 salon+0.019
Neg logit contributions
 matter-0.024
 unity-0.024
 course-0.024
 bond-0.023
Fin-0.022
Token are
Token ID389
Feature activation-1.61e-02

Pos logit contributions
 unity+0.134
leness+0.128
 harmony+0.126
 bond+0.124
 matter+0.121
Neg logit contributions
elt-0.122
 office-0.115
angers-0.112
 upstairs-0.109
 bathroom-0.109
Token my
Token ID616
Feature activation+1.11e-02

Pos logit contributions
angers+0.026
 office+0.025
elt+0.025
 bathroom+0.023
 upstairs+0.022
Neg logit contributions
 matter-0.026
 unity-0.026
 course-0.026
 bond-0.026
oc-0.026
Token friend
Token ID1545
Feature activation+5.68e-02

Pos logit contributions
leness+0.022
 course+0.021
 unity+0.021
From+0.021
 matter+0.021
Neg logit contributions
 office-0.015
angers-0.015
elt-0.014
 bathroom-0.014
 knee-0.013
Token.
Token ID13
Feature activation-1.34e-02

Pos logit contributions
 unity+0.130
 matter+0.125
Fin+0.124
 bond+0.123
leness+0.123
Neg logit contributions
 office-0.058
elt-0.054
angers-0.053
 bathroom-0.045
 knee-0.045
Token Friends
Token ID14213
Feature activation+1.44e-01

Pos logit contributions
angers+0.021
elt+0.020
 office+0.020
 bathroom+0.018
azes+0.017
Neg logit contributions
 matter-0.023
 unity-0.023
 course-0.023
 bond-0.022
Fin-0.022
Token share
Token ID2648
Feature activation+4.20e-02

Pos logit contributions
 unity+0.229
 souven+0.218
oc+0.214
 memorable+0.212
 matter+0.210
Neg logit contributions
 office-0.215
 bathroom-0.209
elt-0.203
 upstairs-0.201
angers-0.198
Token and
Token ID290
Feature activation+6.63e-02

Pos logit contributions
 matter+0.083
 unity+0.082
Fin+0.081
 bond+0.079
leness+0.077
Neg logit contributions
 office-0.049
angers-0.046
 bathroom-0.044
elt-0.044
 knee-0.041
Token take
Token ID1011
Feature activation+4.17e-02

Pos logit contributions
oc+0.095
 unity+0.095
 matter+0.094
Fin+0.094
 course+0.094
Neg logit contributions
 office-0.110
elt-0.110
angers-0.106
 bathroom-0.105
 salon-0.098
Token turns
Token ID4962
Feature activation-2.06e-02

Pos logit contributions
 matter+0.044
 unity+0.042
Fin+0.039
 bond+0.038
 course+0.038
Neg logit contributions
elt-0.089
angers-0.089
 office-0.087
uck-0.085
ed-0.084
Token,"
Token ID553
Feature activation-1.57e-02

Pos logit contributions
angers+0.029
elt+0.026
 office+0.025
 bathroom+0.023
 salon+0.022
Neg logit contributions
 unity-0.040
 matter-0.040
 course-0.040
 bond-0.038
Fin-0.038
Token valuable
Token ID8119
Feature activation+9.39e-02

Pos logit contributions
 course+0.058
Fin+0.056
oc+0.054
 matter+0.053
Fred+0.052
Neg logit contributions
 office-0.060
angers-0.057
elt-0.052
 upstairs-0.052
 bathroom-0.052
Token than
Token ID621
Feature activation+6.93e-02

Pos logit contributions
 matter+0.119
 course+0.117
 unity+0.115
Fin+0.113
 bond+0.104
Neg logit contributions
angers-0.192
 office-0.182
elt-0.178
 bathroom-0.163
tub-0.159
Token collecting
Token ID13157
Feature activation+2.36e-02

Pos logit contributions
oc+0.125
 course+0.117
Fin+0.117
From+0.114
 bond+0.114
Neg logit contributions
angers-0.102
elt-0.097
 office-0.094
 bathroom-0.089
 door-0.084
Token things
Token ID1243
Feature activation-5.56e-04

Pos logit contributions
 matter+0.042
 unity+0.042
 course+0.041
Fin+0.041
 bond+0.040
Neg logit contributions
angers-0.036
 office-0.033
elt-0.031
 bathroom-0.030
tub-0.028
Token.
Token ID13
Feature activation-1.37e-02

Pos logit contributions
angers+0.001
 office+0.001
elt+0.001
 bathroom+0.000
 upstairs+0.000
Neg logit contributions
 matter-0.001
 unity-0.001
Fin-0.001
 course-0.001
 bond-0.001
Token Friends
Token ID14213
Feature activation+1.44e-01

Pos logit contributions
angers+0.017
elt+0.017
 office+0.017
 bathroom+0.015
azes+0.015
Neg logit contributions
 matter-0.028
leness-0.026
 unity-0.025
 course-0.025
 bond-0.025
Token should
Token ID815
Feature activation+5.98e-02

Pos logit contributions
oc+0.230
 souven+0.229
Fin+0.226
 memorable+0.226
 matter+0.214
Neg logit contributions
 office-0.198
 door-0.187
 upstairs-0.186
:-0.183
 bathroom-0.180
Token help
Token ID1037
Feature activation+3.18e-02

Pos logit contributions
oc+0.118
ience+0.100
able+0.099
Fin+0.099
 souven+0.098
Neg logit contributions
 office-0.079
 bathroom-0.074
 door-0.067
 salon-0.063
 kitchen-0.063
Token each
Token ID1123
Feature activation+3.54e-02

Pos logit contributions
oc+0.053
 course+0.052
Fin+0.052
haw+0.051
 unity+0.051
Neg logit contributions
 office-0.047
angers-0.043
 upstairs-0.041
 outside-0.041
 door-0.040
Token other
Token ID584
Feature activation+9.04e-03

Pos logit contributions
forts+0.073
Fin+0.072
Rust+0.072
fy+0.071
From+0.070
Neg logit contributions
 office-0.044
 door-0.033
 bathroom-0.031
 kitchen-0.031
 building-0.029
Token when
Token ID618
Feature activation-3.40e-02

Pos logit contributions
Fin+0.018
 course+0.018
 matter+0.017
 unity+0.017
Fred+0.017
Neg logit contributions
 office-0.012
angers-0.011
 bathroom-0.010
 upstairs-0.010
elt-0.009
Token're
Token ID821
Feature activation-1.02e-02

Pos logit contributions
 unity+0.131
 matter+0.130
oc+0.129
leness+0.126
 bond+0.125
Neg logit contributions
elt-0.113
angers-0.104
 office-0.103
ed-0.099
 bathroom-0.096
Token welcome
Token ID7062
Feature activation+2.50e-02

Pos logit contributions
angers+0.020
 office+0.019
elt+0.019
 bathroom+0.018
azes+0.017
Neg logit contributions
 matter-0.013
 unity-0.013
 course-0.013
oc-0.013
 bond-0.012
Token,
Token ID11
Feature activation-4.01e-02

Pos logit contributions
 course+0.048
 matter+0.048
 unity+0.048
Fin+0.046
 bond+0.045
Neg logit contributions
angers-0.036
elt-0.032
 office-0.032
 bathroom-0.028
 salon-0.028
Token Tom
Token ID4186
Feature activation+6.59e-04

Pos logit contributions
 office+0.045
 door+0.040
 salon+0.038
 bathroom+0.037
 knee+0.037
Neg logit contributions
leness-0.087
 bond-0.085
forts-0.083
 unity-0.083
Fin-0.083
Token.
Token ID13
Feature activation-1.00e-02

Pos logit contributions
 unity+0.002
 matter+0.001
 course+0.001
 bond+0.001
Fin+0.001
Neg logit contributions
angers-0.001
 office-0.001
elt-0.001
 bathroom-0.001
 door-0.000
Token Friends
Token ID14213
Feature activation+1.44e-01

Pos logit contributions
elt+0.014
angers+0.013
 office+0.012
 bathroom+0.011
uck+0.011
Neg logit contributions
 matter-0.020
 unity-0.019
leness-0.018
 lasting-0.018
 bond-0.018
Token help
Token ID1037
Feature activation+2.74e-02

Pos logit contributions
 memorable+0.223
 souven+0.220
 unity+0.219
 lasting+0.214
 matter+0.213
Neg logit contributions
 office-0.202
 bathroom-0.192
 upstairs-0.189
 outside-0.187
 door-0.185
Token each
Token ID1123
Feature activation+3.53e-02

Pos logit contributions
 unity+0.041
 course+0.040
 bond+0.040
leness+0.039
 matter+0.039
Neg logit contributions
 office-0.046
angers-0.046
elt-0.043
 bathroom-0.042
uck-0.040
Token other
Token ID584
Feature activation+7.32e-03

Pos logit contributions
 unity+0.076
 souven+0.071
 memorable+0.071
 pleasure+0.071
 lasting+0.071
Neg logit contributions
 office-0.037
 door-0.031
 bathroom-0.030
 kitchen-0.029
 outside-0.027
Token."
Token ID526
Feature activation-3.21e-02

Pos logit contributions
 course+0.016
 unity+0.016
 matter+0.015
Fin+0.015
 bond+0.015
Neg logit contributions
angers-0.008
 office-0.008
 bathroom-0.007
elt-0.007
tub-0.006
Token From
Token ID3574
Feature activation+9.69e-02

Pos logit contributions
angers+0.059
elt+0.054
 office+0.054
 bathroom+0.050
 salon+0.048
Neg logit contributions
 unity-0.048
 course-0.048
 matter-0.048
Fin-0.046
 bond-0.045
Token Friends
Token ID14213
Feature activation+1.41e-01

Pos logit contributions
elt+0.025
angers+0.024
 office+0.023
 bathroom+0.021
azes+0.019
Neg logit contributions
 matter-0.035
 unity-0.034
 course-0.033
leness-0.033
 bond-0.032
Token help
Token ID1037
Feature activation+2.49e-02

Pos logit contributions
 unity+0.222
 matter+0.220
 souven+0.219
 memorable+0.217
 lasting+0.210
Neg logit contributions
 office-0.204
 bathroom-0.194
 upstairs-0.188
 outside-0.183
 door-0.179
Token each
Token ID1123
Feature activation+3.58e-02

Pos logit contributions
 unity+0.035
 course+0.033
leness+0.033
 bond+0.033
 souven+0.033
Neg logit contributions
 office-0.045
angers-0.045
elt-0.042
 bathroom-0.041
 upstairs-0.039
Token other
Token ID584
Feature activation+8.41e-03

Pos logit contributions
 unity+0.072
leness+0.071
ience+0.070
forts+0.069
 spe+0.069
Neg logit contributions
 office-0.044
 bathroom-0.037
 door-0.036
 kitchen-0.034
 upstairs-0.031
Token.
Token ID13
Feature activation-1.62e-02

Pos logit contributions
 course+0.018
 unity+0.018
 matter+0.018
Fin+0.018
 bond+0.017
Neg logit contributions
angers-0.009
 office-0.009
 bathroom-0.008
elt-0.008
 upstairs-0.007
Token Friends
Token ID14213
Feature activation+1.43e-01

Pos logit contributions
elt+0.022
angers+0.021
 office+0.020
 bathroom+0.018
uck+0.017
Neg logit contributions
 matter-0.031
leness-0.029
 course-0.029
 unity-0.028
 lasting-0.028
Token also
Token ID635
Feature activation+6.78e-02

Pos logit contributions
 unity+0.219
 memorable+0.218
 souven+0.216
 matter+0.215
 lasting+0.208
Neg logit contributions
 office-0.209
 bathroom-0.200
 upstairs-0.197
 outside-0.192
 door-0.189
Token listen
Token ID6004
Feature activation-1.02e-02

Pos logit contributions
 matter+0.089
Fin+0.089
 course+0.088
oc+0.086
 unity+0.085
Neg logit contributions
 office-0.121
elt-0.114
 bathroom-0.114
angers-0.110
 door-0.104
Token to
Token ID284
Feature activation+3.17e-02

Pos logit contributions
angers+0.010
elt+0.008
 office+0.008
 bathroom+0.007
 salon+0.006
Neg logit contributions
 matter-0.024
 course-0.024
 unity-0.024
Fin-0.023
 bond-0.023
Token each
Token ID1123
Feature activation+3.42e-02

Pos logit contributions
leness+0.047
oc+0.046
forts+0.046
Fin+0.045
 bond+0.045
Neg logit contributions
 office-0.053
 bathroom-0.047
angers-0.047
 upstairs-0.044
elt-0.043
Token other
Token ID584
Feature activation+8.98e-03

Pos logit contributions
 unity+0.068
 spe+0.066
Fin+0.065
leness+0.064
 souven+0.064
Neg logit contributions
 office-0.043
 bathroom-0.035
 door-0.034
 kitchen-0.033
angers-0.030
Token the
Token ID262
Feature activation-3.26e-02

Pos logit contributions
 unity+0.079
 course+0.078
 matter+0.077
 bond+0.074
oc+0.073
Neg logit contributions
angers-0.072
elt-0.072
 office-0.066
 bathroom-0.063
ed-0.061
Token owner
Token ID4870
Feature activation+2.63e-02

Pos logit contributions
angers+0.049
elt+0.049
 office+0.049
 bathroom+0.046
 door+0.043
Neg logit contributions
 unity-0.056
 matter-0.055
 course-0.054
oc-0.053
leness-0.052
Token smiled
Token ID13541
Feature activation-2.45e-02

Pos logit contributions
 unity+0.049
 matter+0.048
leness+0.047
 harmony+0.046
 lasting+0.045
Neg logit contributions
elt-0.038
angers-0.037
 office-0.035
ed-0.033
 salon-0.032
Token.
Token ID13
Feature activation-1.27e-02

Pos logit contributions
angers+0.038
 office+0.034
elt+0.032
 salon+0.031
 bathroom+0.031
Neg logit contributions
 matter-0.044
 unity-0.044
 course-0.043
 caring-0.043
 kindness-0.043
Token "
Token ID366
Feature activation+2.28e-02

Pos logit contributions
angers+0.022
 office+0.020
elt+0.019
 salon+0.018
 bathroom+0.018
Neg logit contributions
 unity-0.020
 course-0.020
 matter-0.019
Fin-0.019
oc-0.019
TokenLooks
Token ID41102
Feature activation+1.43e-01

Pos logit contributions
 unity+0.040
 teamwork+0.038
 kindness+0.038
 shields+0.038
 cooperation+0.037
Neg logit contributions
elt-0.035
angers-0.025
pee-0.024
uck-0.023
ulk-0.023
Token like
Token ID588
Feature activation+5.98e-02

Pos logit contributions
 unity+0.229
 course+0.222
Frank+0.220
 bond+0.217
oc+0.208
Neg logit contributions
elt-0.219
ed-0.201
 hair-0.183
angers-0.178
 knee-0.178
Token you
Token ID345
Feature activation+4.91e-02

Pos logit contributions
oc+0.116
leness+0.107
forts+0.106
 bond+0.104
 unity+0.103
Neg logit contributions
 office-0.080
elt-0.076
 bathroom-0.074
angers-0.072
 door-0.067
Token two
Token ID734
Feature activation+7.75e-02

Pos logit contributions
 unity+0.075
 course+0.071
 matter+0.070
 bond+0.068
Fin+0.068
Neg logit contributions
angers-0.086
elt-0.084
 office-0.081
 bathroom-0.076
 salon-0.073
Token are
Token ID389
Feature activation-1.77e-02

Pos logit contributions
 unity+0.141
 bond+0.135
From+0.132
 matter+0.130
Fin+0.128
Neg logit contributions
elt-0.103
 office-0.099
angers-0.098
ed-0.089
 bathroom-0.089
Token friends
Token ID2460
Feature activation+7.10e-02

Pos logit contributions
angers+0.020
 office+0.019
elt+0.019
 bathroom+0.016
 door+0.015
Neg logit contributions
 matter-0.037
 bond-0.037
 unity-0.037
 course-0.037
oc-0.036
Token friends
Token ID2460
Feature activation+8.29e-02

Pos logit contributions
oc+0.036
 unity+0.036
 bond+0.035
 course+0.035
leness+0.035
Neg logit contributions
angers-0.038
 office-0.037
elt-0.036
 bathroom-0.035
 salon-0.034
Token should
Token ID815
Feature activation+5.89e-02

Pos logit contributions
oc+0.139
Fin+0.136
 unity+0.136
 souven+0.134
Ar+0.134
Neg logit contributions
 office-0.120
elt-0.117
angers-0.115
 bathroom-0.112
 upstairs-0.110
Token not
Token ID407
Feature activation+5.91e-02

Pos logit contributions
oc+0.117
 souven+0.106
 unity+0.104
 lasting+0.103
leness+0.102
Neg logit contributions
 office-0.076
 bathroom-0.076
 salon-0.067
angers-0.066
 door-0.065
Token quarrel
Token ID41998
Feature activation-1.08e-02

Pos logit contributions
 course+0.088
 unity+0.087
oc+0.085
 bond+0.081
 lasting+0.081
Neg logit contributions
angers-0.100
elt-0.098
 office-0.097
 bathroom-0.091
 salon-0.087
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
angers+0.018
elt+0.017
 office+0.017
 bathroom+0.015
 salon+0.015
Neg logit contributions
 matter-0.018
 unity-0.017
 course-0.017
Fin-0.016
 kindness-0.016
Token Friends
Token ID14213
Feature activation+1.43e-01

Pos logit contributions
angers+0.028
elt+0.027
 office+0.026
 bathroom+0.024
amas+0.023
Neg logit contributions
 matter-0.017
 unity-0.017
 course-0.017
 bond-0.016
Fin-0.016
Token should
Token ID815
Feature activation+5.98e-02

Pos logit contributions
oc+0.249
 souven+0.237
 memorable+0.232
 unity+0.231
Ar+0.227
Neg logit contributions
 upstairs-0.179
 office-0.178
elt-0.176
 outside-0.171
 door-0.168
Token be
Token ID307
Feature activation-1.47e-02

Pos logit contributions
oc+0.113
 souven+0.099
 unity+0.096
Ar+0.095
 harmony+0.095
Neg logit contributions
 office-0.085
 bathroom-0.082
 salon-0.075
 door-0.073
angers-0.073
Token kind
Token ID1611
Feature activation+9.32e-02

Pos logit contributions
elt+0.024
 office+0.024
angers+0.023
 bathroom+0.021
 upstairs+0.021
Neg logit contributions
oc-0.022
 course-0.022
 unity-0.021
 matter-0.021
 bond-0.020
Token to
Token ID284
Feature activation+3.79e-02

Pos logit contributions
 unity+0.196
 matter+0.195
Fin+0.193
 course+0.192
leness+0.187
Neg logit contributions
angers-0.107
elt-0.106
 office-0.106
ed-0.089
 door-0.086
Token each
Token ID1123
Feature activation+4.05e-02

Pos logit contributions
 course+0.053
 unity+0.052
oc+0.051
 bond+0.050
 matter+0.050
Neg logit contributions
angers-0.069
 office-0.067
elt-0.063
 bathroom-0.060
 upstairs-0.059
Token
Token ID26391
Feature activation-1.62e-01

Pos logit contributions
elt+0.015
angers+0.015
 bathroom+0.013
pit+0.013
 office+0.013
Neg logit contributions
 course-0.011
From-0.010
oc-0.010
Fin-0.010
 teaches-0.009
Token
Token ID129
Feature activation-3.96e-02

Pos logit contributions
angers+0.330
elt+0.303
 office+0.300
 bathroom+0.285
 upstairs+0.275
Neg logit contributions
oc-0.206
 course-0.202
Fin-0.193
From-0.191
 unity-0.184
Token
Token ID241
Feature activation-9.56e-02

Pos logit contributions
angers+0.067
elt+0.067
 office+0.059
 bathroom+0.058
tub+0.056
Neg logit contributions
From-0.065
 course-0.065
oc-0.062
Fin-0.062
 unity-0.061
TokenWhere
Token ID8496
Feature activation-5.15e-02

Pos logit contributions
angers+0.191
 office+0.176
elt+0.175
 bathroom+0.163
 salon+0.159
Neg logit contributions
 matter-0.128
 course-0.127
 unity-0.127
Fin-0.121
 bond-0.119
Tokenâ
Token ID22940
Feature activation-3.55e-02

Pos logit contributions
angers+0.107
elt+0.097
 office+0.092
amas+0.088
 bathroom+0.087
Neg logit contributions
 matter-0.071
 course-0.070
 unity-0.067
 kindness-0.065
 love-0.063
Token
Token ID26391
Feature activation-1.74e-01

Pos logit contributions
angers+0.059
elt+0.052
 office+0.045
 bathroom+0.045
uddled+0.044
Neg logit contributions
 course-0.071
 matter-0.066
 unity-0.061
Fin-0.060
leness-0.060
Token
Token ID8151
Feature activation-5.25e-02

Pos logit contributions
angers+0.338
elt+0.306
 office+0.300
 bathroom+0.287
 salon+0.273
Neg logit contributions
oc-0.239
 course-0.232
Fin-0.231
 matter-0.223
Fred-0.213
Tokens
Token ID82
Feature activation-5.25e-02

Pos logit contributions
angers+0.097
 office+0.089
elt+0.086
 bathroom+0.084
 floors+0.077
Neg logit contributions
 course-0.079
oc-0.074
Fin-0.073
 matter-0.073
 love-0.068
Token my
Token ID616
Feature activation-3.44e-02

Pos logit contributions
angers+0.100
elt+0.092
 office+0.085
amas+0.081
 bathroom+0.079
Neg logit contributions
 matter-0.080
 course-0.079
 unity-0.077
 caring-0.073
 kindness-0.073
Token jelly
Token ID27644
Feature activation-3.25e-02

Pos logit contributions
angers+0.058
elt+0.053
 office+0.052
 bathroom+0.048
 salon+0.046
Neg logit contributions
 matter-0.056
 unity-0.056
 course-0.056
 bond-0.053
Fin-0.053
Token?
Token ID30
Feature activation-6.69e-02

Pos logit contributions
angers+0.052
elt+0.048
 office+0.042
ulk+0.040
azes+0.040
Neg logit contributions
 course-0.061
 matter-0.057
 remind-0.053
 bond-0.053
 unity-0.052
Token
Token ID129
Feature activation-4.92e-02

Pos logit contributions
 course+0.185
 matter+0.184
 unity+0.182
Fin+0.176
 bond+0.174
Neg logit contributions
angers-0.150
 office-0.138
elt-0.130
 bathroom-0.123
 salon-0.122
Token
Token ID241
Feature activation-1.02e-01

Pos logit contributions
angers+0.092
elt+0.089
 bathroom+0.080
 cabinet+0.079
tub+0.078
Neg logit contributions
From-0.076
 course-0.073
oc-0.070
Fin-0.067
 unity-0.067
TokenPlease
Token ID5492
Feature activation-3.70e-02

Pos logit contributions
angers+0.199
 office+0.183
elt+0.181
 bathroom+0.169
 salon+0.164
Neg logit contributions
 matter-0.140
 course-0.140
 unity-0.138
Fin-0.134
 bond-0.129
Token don
Token ID836
Feature activation-4.51e-02

Pos logit contributions
angers+0.064
elt+0.059
 office+0.059
 bathroom+0.054
 salon+0.052
Neg logit contributions
 unity-0.059
 matter-0.059
 course-0.059
 bond-0.056
Fin-0.056
Tokenâ
Token ID22940
Feature activation-3.83e-02

Pos logit contributions
angers+0.085
elt+0.079
 office+0.077
 bathroom+0.071
pit+0.070
Neg logit contributions
 course-0.069
 matter-0.066
 unity-0.064
Fin-0.062
From-0.062
Token
Token ID26391
Feature activation-1.74e-01

Pos logit contributions
angers+0.060
elt+0.054
 office+0.051
 bathroom+0.048
tub+0.045
Neg logit contributions
 course-0.072
 matter-0.071
 unity-0.070
Fin-0.067
 bond-0.067
Token
Token ID8151
Feature activation-5.47e-02

Pos logit contributions
angers+0.305
elt+0.276
 office+0.274
 bathroom+0.259
 door+0.239
Neg logit contributions
 course-0.267
oc-0.266
Fin-0.264
 matter-0.263
 unity-0.262
Tokent
Token ID83
Feature activation+7.77e-03

Pos logit contributions
angers+0.101
 office+0.096
elt+0.091
 bathroom+0.090
azes+0.082
Neg logit contributions
 course-0.080
oc-0.076
Fin-0.076
 matter-0.076
 unity-0.073
Token eat
Token ID4483
Feature activation-2.35e-02

Pos logit contributions
 unity+0.011
 course+0.011
 matter+0.011
Fin+0.010
 bond+0.010
Neg logit contributions
angers-0.015
 office-0.014
elt-0.014
 bathroom-0.013
 salon-0.013
Token me
Token ID502
Feature activation-1.00e-01

Pos logit contributions
angers+0.042
elt+0.039
 office+0.039
 bathroom+0.036
 salon+0.035
Neg logit contributions
 unity-0.036
 matter-0.035
 course-0.035
 bond-0.033
Fin-0.033
Token,
Token ID11
Feature activation-8.73e-02

Pos logit contributions
angers+0.170
elt+0.152
 office+0.150
 bathroom+0.138
 salon+0.134
Neg logit contributions
 course-0.165
 matter-0.163
 unity-0.162
 kindness-0.157
 bond-0.155
Token saw
Token ID2497
Feature activation-7.89e-02

Pos logit contributions
angers+0.055
elt+0.050
 office+0.050
 bathroom+0.045
 salon+0.044
Neg logit contributions
 matter-0.053
 unity-0.053
 course-0.053
 bond-0.050
Fin-0.050
Token something
Token ID1223
Feature activation-6.81e-02

Pos logit contributions
angers+0.126
elt+0.115
 office+0.115
 bathroom+0.105
 salon+0.101
Neg logit contributions
 unity-0.136
 matter-0.136
 course-0.134
 bond-0.129
Fin-0.129
Token floating
Token ID12462
Feature activation-3.62e-02

Pos logit contributions
angers+0.120
 office+0.112
elt+0.112
 bathroom+0.104
 salon+0.100
Neg logit contributions
 unity-0.105
 matter-0.103
 course-0.102
 bond-0.099
Fin-0.099
Token near
Token ID1474
Feature activation-2.89e-02

Pos logit contributions
angers+0.053
 office+0.046
elt+0.046
 bathroom+0.041
 salon+0.041
Neg logit contributions
 course-0.071
 matter-0.069
 unity-0.067
Fin-0.066
 bond-0.065
Token the
Token ID262
Feature activation-9.09e-02

Pos logit contributions
angers+0.042
 office+0.041
elt+0.039
 bathroom+0.038
 door+0.035
Neg logit contributions
 unity-0.052
 matter-0.051
 bond-0.050
Fin-0.050
leness-0.049
Token bottom
Token ID4220
Feature activation-1.73e-01

Pos logit contributions
angers+0.149
elt+0.135
 office+0.131
 bathroom+0.118
 salon+0.116
Neg logit contributions
 course-0.156
 unity-0.155
 matter-0.155
 bond-0.147
Fin-0.147
Token of
Token ID286
Feature activation+2.97e-02

Pos logit contributions
angers+0.370
elt+0.342
 office+0.337
 bathroom+0.315
 salon+0.309
Neg logit contributions
 course-0.213
 matter-0.208
 unity-0.202
Fin-0.191
 bond-0.185
Token the
Token ID262
Feature activation-8.78e-02

Pos logit contributions
 unity+0.051
oc+0.051
 matter+0.051
 bond+0.051
Fin+0.050
Neg logit contributions
angers-0.044
 office-0.043
elt-0.042
 bathroom-0.041
 door-0.038
Token tub
Token ID12202
Feature activation-1.11e-01

Pos logit contributions
angers+0.136
elt+0.123
 office+0.122
 bathroom+0.110
 salon+0.106
Neg logit contributions
 matter-0.156
 unity-0.156
 course-0.156
 bond-0.148
Fin-0.148
Token.
Token ID13
Feature activation-4.46e-02

Pos logit contributions
angers+0.189
elt+0.168
 office+0.163
azes+0.155
 bathroom+0.150
Neg logit contributions
 course-0.179
 unity-0.166
 matter-0.164
 bond-0.161
Fin-0.160
Token It
Token ID632
Feature activation-1.53e-02

Pos logit contributions
angers+0.077
 office+0.071
elt+0.066
 salon+0.064
 bathroom+0.062
Neg logit contributions
 course-0.068
Fin-0.067
From-0.066
 unity-0.065
Fred-0.065
Token
Token ID26391
Feature activation-2.03e-01

Pos logit contributions
 matter+0.007
 unity+0.006
 course+0.006
Fin+0.006
oc+0.006
Neg logit contributions
angers-0.009
elt-0.008
 office-0.008
 bathroom-0.007
amas-0.007
Token
Token ID129
Feature activation-6.19e-02

Pos logit contributions
angers+0.390
elt+0.346
 office+0.340
 bathroom+0.316
 upstairs+0.314
Neg logit contributions
 course-0.292
oc-0.285
From-0.277
 unity-0.274
Fin-0.268
Token
Token ID241
Feature activation-1.05e-01

Pos logit contributions
angers+0.115
elt+0.108
 office+0.098
 bathroom+0.095
tub+0.095
Neg logit contributions
 course-0.097
From-0.095
oc-0.090
 unity-0.088
leness-0.088
TokenLet
Token ID5756
Feature activation+1.22e-03

Pos logit contributions
angers+0.209
 office+0.193
elt+0.191
 bathroom+0.178
 salon+0.173
Neg logit contributions
 matter-0.140
 course-0.139
 unity-0.137
Fin-0.132
 bond-0.128
Tokenâ
Token ID22940
Feature activation-3.60e-02

Pos logit contributions
 matter+0.002
 course+0.002
 unity+0.002
Fin+0.002
 bond+0.002
Neg logit contributions
angers-0.002
elt-0.002
 office-0.002
 bathroom-0.002
 salon-0.002
Token
Token ID26391
Feature activation-1.71e-01

Pos logit contributions
angers+0.053
elt+0.048
 office+0.046
 bathroom+0.042
 salon+0.040
Neg logit contributions
 course-0.070
 matter-0.069
 unity-0.068
Fin-0.065
 bond-0.065
Token
Token ID8151
Feature activation-5.42e-02

Pos logit contributions
angers+0.301
elt+0.277
 office+0.273
 bathroom+0.258
 salon+0.243
Neg logit contributions
 course-0.262
oc-0.259
Fin-0.259
 unity-0.256
 matter-0.253
Tokens
Token ID82
Feature activation-4.38e-02

Pos logit contributions
angers+0.098
 office+0.094
elt+0.087
 bathroom+0.087
azes+0.079
Neg logit contributions
 course-0.082
oc-0.077
Fin-0.077
 matter-0.074
 unity-0.074
Token have
Token ID423
Feature activation+4.45e-03

Pos logit contributions
angers+0.081
elt+0.075
 office+0.074
 bathroom+0.068
 salon+0.066
Neg logit contributions
 unity-0.065
 matter-0.065
 course-0.064
Fin-0.061
 bond-0.061
Token a
Token ID257
Feature activation+3.11e-03

Pos logit contributions
 unity+0.008
 matter+0.008
 course+0.008
 bond+0.008
Fin+0.008
Neg logit contributions
angers-0.006
elt-0.006
 office-0.006
 bathroom-0.005
 salon-0.005
Token stretching
Token ID20880
Feature activation-2.70e-02

Pos logit contributions
 unity+0.004
 course+0.004
Fin+0.004
oc+0.004
From+0.004
Neg logit contributions
 office-0.006
angers-0.006
 bathroom-0.005
elt-0.005
 salon-0.005
Token
Token ID26391
Feature activation-2.03e-01

Pos logit contributions
 unity+0.056
 matter+0.056
 course+0.056
Fin+0.054
 bond+0.054
Neg logit contributions
angers-0.036
 office-0.032
elt-0.032
 bathroom-0.028
 salon-0.027
Token
Token ID129
Feature activation-6.19e-02

Pos logit contributions
angers+0.390
elt+0.346
 office+0.340
 bathroom+0.316
 upstairs+0.314
Neg logit contributions
 course-0.292
oc-0.285
From-0.277
 unity-0.274
Fin-0.268
Token
Token ID241
Feature activation-1.05e-01

Pos logit contributions
angers+0.115
elt+0.108
 office+0.098
 bathroom+0.095
tub+0.095
Neg logit contributions
 course-0.097
From-0.095
oc-0.090
 unity-0.088
leness-0.088
TokenLet
Token ID5756
Feature activation+1.22e-03

Pos logit contributions
angers+0.209
 office+0.193
elt+0.191
 bathroom+0.178
 salon+0.173
Neg logit contributions
 matter-0.140
 course-0.139
 unity-0.137
Fin-0.132
 bond-0.128
Tokenâ
Token ID22940
Feature activation-3.60e-02

Pos logit contributions
 matter+0.002
 course+0.002
 unity+0.002
Fin+0.002
 bond+0.002
Neg logit contributions
angers-0.002
elt-0.002
 office-0.002
 bathroom-0.002
 salon-0.002
Token
Token ID26391
Feature activation-1.71e-01

Pos logit contributions
angers+0.053
elt+0.048
 office+0.046
 bathroom+0.042
 salon+0.040
Neg logit contributions
 course-0.070
 matter-0.069
 unity-0.068
Fin-0.065
 bond-0.065
Token
Token ID8151
Feature activation-5.42e-02

Pos logit contributions
angers+0.301
elt+0.277
 office+0.273
 bathroom+0.258
 salon+0.243
Neg logit contributions
 course-0.262
oc-0.259
Fin-0.259
 unity-0.256
 matter-0.253
Tokens
Token ID82
Feature activation-4.38e-02

Pos logit contributions
angers+0.098
 office+0.094
elt+0.087
 bathroom+0.087
azes+0.079
Neg logit contributions
 course-0.082
oc-0.077
Fin-0.077
 matter-0.074
 unity-0.074
Token explore
Token ID7301
Feature activation-6.35e-02

Pos logit contributions
angers+0.081
elt+0.075
 office+0.074
 bathroom+0.068
 salon+0.066
Neg logit contributions
 unity-0.065
 matter-0.065
 course-0.064
Fin-0.061
 bond-0.061
Token and
Token ID290
Feature activation+1.42e-02

Pos logit contributions
angers+0.105
 office+0.100
elt+0.095
 bathroom+0.092
 salon+0.086
Neg logit contributions
 matter-0.103
 unity-0.102
Fin-0.098
 course-0.097
 bond-0.097
Token have
Token ID423
Feature activation+1.76e-02

Pos logit contributions
 unity+0.021
 matter+0.021
 course+0.021
 bond+0.020
Fin+0.019
Neg logit contributions
angers-0.025
 office-0.024
elt-0.024
 bathroom-0.023
 salon-0.022
Token
Token ID26391
Feature activation-2.03e-01

Pos logit contributions
angers+0.074
 office+0.065
elt+0.063
ooter+0.059
 salon+0.058
Neg logit contributions
 matter-0.063
 course-0.062
 unity-0.059
 love-0.059
 souven-0.058
Token
Token ID129
Feature activation-6.19e-02

Pos logit contributions
angers+0.390
elt+0.346
 office+0.340
 bathroom+0.316
 upstairs+0.314
Neg logit contributions
 course-0.292
oc-0.285
From-0.277
 unity-0.274
Fin-0.268
Token
Token ID241
Feature activation-1.05e-01

Pos logit contributions
angers+0.115
elt+0.108
 office+0.098
 bathroom+0.095
tub+0.095
Neg logit contributions
 course-0.097
From-0.095
oc-0.090
 unity-0.088
leness-0.088
TokenLet
Token ID5756
Feature activation+1.22e-03

Pos logit contributions
angers+0.209
 office+0.193
elt+0.191
 bathroom+0.178
 salon+0.173
Neg logit contributions
 matter-0.140
 course-0.139
 unity-0.137
Fin-0.132
 bond-0.128
Tokenâ
Token ID22940
Feature activation-3.60e-02

Pos logit contributions
 matter+0.002
 course+0.002
 unity+0.002
Fin+0.002
 bond+0.002
Neg logit contributions
angers-0.002
elt-0.002
 office-0.002
 bathroom-0.002
 salon-0.002
Token
Token ID26391
Feature activation-1.71e-01

Pos logit contributions
angers+0.053
elt+0.048
 office+0.046
 bathroom+0.042
 salon+0.040
Neg logit contributions
 course-0.070
 matter-0.069
 unity-0.068
Fin-0.065
 bond-0.065
Token
Token ID8151
Feature activation-5.42e-02

Pos logit contributions
angers+0.301
elt+0.277
 office+0.273
 bathroom+0.258
 salon+0.243
Neg logit contributions
 course-0.262
oc-0.259
Fin-0.259
 unity-0.256
 matter-0.253
Tokens
Token ID82
Feature activation-4.38e-02

Pos logit contributions
angers+0.098
 office+0.094
elt+0.087
 bathroom+0.087
azes+0.079
Neg logit contributions
 course-0.082
oc-0.077
Fin-0.077
 matter-0.074
 unity-0.074
Token not
Token ID407
Feature activation+1.07e-02

Pos logit contributions
angers+0.081
elt+0.075
 office+0.074
 bathroom+0.068
 salon+0.066
Neg logit contributions
 unity-0.065
 matter-0.065
 course-0.064
Fin-0.061
 bond-0.061
Token think
Token ID892
Feature activation-5.64e-02

Pos logit contributions
 course+0.015
 unity+0.015
 bond+0.014
oc+0.014
 matter+0.014
Neg logit contributions
angers-0.020
 office-0.019
elt-0.019
 bathroom-0.018
 salon-0.017
Token about
Token ID546
Feature activation-1.31e-02

Pos logit contributions
angers+0.104
 office+0.096
elt+0.096
 bathroom+0.088
 salon+0.086
Neg logit contributions
 unity-0.083
 matter-0.083
 course-0.082
 bond-0.079
Fin-0.078
Token
Token ID26391
Feature activation-2.03e-01

Pos logit contributions
 matter+0.003
 course+0.003
 love+0.002
oc+0.002
Fin+0.002
Neg logit contributions
angers-0.003
elt-0.003
 office-0.003
anger-0.003
amas-0.003
Token
Token ID129
Feature activation-6.19e-02

Pos logit contributions
angers+0.390
elt+0.346
 office+0.340
 bathroom+0.316
 upstairs+0.314
Neg logit contributions
 course-0.292
oc-0.285
From-0.277
 unity-0.274
Fin-0.268
Token
Token ID241
Feature activation-1.05e-01

Pos logit contributions
angers+0.115
elt+0.108
 office+0.098
 bathroom+0.095
tub+0.095
Neg logit contributions
 course-0.097
From-0.095
oc-0.090
 unity-0.088
leness-0.088
TokenLet
Token ID5756
Feature activation+1.22e-03

Pos logit contributions
angers+0.209
 office+0.193
elt+0.191
 bathroom+0.178
 salon+0.173
Neg logit contributions
 matter-0.140
 course-0.139
 unity-0.137
Fin-0.132
 bond-0.128
Tokenâ
Token ID22940
Feature activation-3.60e-02

Pos logit contributions
 matter+0.002
 course+0.002
 unity+0.002
Fin+0.002
 bond+0.002
Neg logit contributions
angers-0.002
elt-0.002
 office-0.002
 bathroom-0.002
 salon-0.002
Token
Token ID26391
Feature activation-1.71e-01

Pos logit contributions
angers+0.053
elt+0.048
 office+0.046
 bathroom+0.042
 salon+0.040
Neg logit contributions
 course-0.070
 matter-0.069
 unity-0.068
Fin-0.065
 bond-0.065
Token
Token ID8151
Feature activation-5.42e-02

Pos logit contributions
angers+0.301
elt+0.277
 office+0.273
 bathroom+0.258
 salon+0.243
Neg logit contributions
 course-0.262
oc-0.259
Fin-0.259
 unity-0.256
 matter-0.253
Tokens
Token ID82
Feature activation-4.38e-02

Pos logit contributions
angers+0.098
 office+0.094
elt+0.087
 bathroom+0.087
azes+0.079
Neg logit contributions
 course-0.082
oc-0.077
Fin-0.077
 matter-0.074
 unity-0.074
Token play
Token ID711
Feature activation-4.88e-02

Pos logit contributions
angers+0.081
elt+0.075
 office+0.074
 bathroom+0.068
 salon+0.066
Neg logit contributions
 unity-0.065
 matter-0.065
 course-0.064
Fin-0.061
 bond-0.061
Token a
Token ID257
Feature activation-1.25e-02

Pos logit contributions
angers+0.082
elt+0.075
 office+0.074
 bathroom+0.067
 salon+0.065
Neg logit contributions
 course-0.082
 unity-0.081
 matter-0.081
Fin-0.077
 bond-0.077
Token joke
Token ID9707
Feature activation-2.05e-02

Pos logit contributions
angers+0.022
 office+0.022
 bathroom+0.022
 salon+0.021
elt+0.021
Neg logit contributions
 unity-0.018
 course-0.016
 matter-0.016
leness-0.016
Fin-0.016
Token
Token ID129
Feature activation-4.92e-02

Pos logit contributions
angers+0.030
 office+0.028
elt+0.027
 bathroom+0.025
 salon+0.025
Neg logit contributions
 course-0.029
 unity-0.029
 matter-0.029
Fin-0.028
 bond-0.027
Token
Token ID241
Feature activation-1.02e-01

Pos logit contributions
angers+0.092
elt+0.089
 bathroom+0.080
 cabinet+0.079
tub+0.078
Neg logit contributions
From-0.076
 course-0.073
oc-0.070
Fin-0.067
 unity-0.067
TokenWhy
Token ID5195
Feature activation-4.03e-02

Pos logit contributions
angers+0.199
 office+0.183
elt+0.181
 bathroom+0.169
 salon+0.164
Neg logit contributions
 matter-0.140
 course-0.140
 unity-0.138
Fin-0.134
 bond-0.129
Token don
Token ID836
Feature activation-4.08e-02

Pos logit contributions
angers+0.076
 office+0.071
elt+0.071
 bathroom+0.067
 salon+0.065
Neg logit contributions
 unity-0.057
 course-0.056
 matter-0.056
Fin-0.053
 bond-0.053
Tokenâ
Token ID22940
Feature activation-3.40e-02

Pos logit contributions
angers+0.070
elt+0.065
 office+0.064
 bathroom+0.058
 salon+0.056
Neg logit contributions
 course-0.067
 matter-0.066
 unity-0.065
Fin-0.062
 caring-0.062
Token
Token ID26391
Feature activation-1.71e-01

Pos logit contributions
angers+0.058
elt+0.051
 office+0.047
 bathroom+0.045
 door+0.043
Neg logit contributions
 course-0.063
 matter-0.062
 unity-0.058
 remind-0.056
 bond-0.056
Token
Token ID8151
Feature activation-4.74e-02

Pos logit contributions
angers+0.310
elt+0.277
 office+0.274
 bathroom+0.262
 upstairs+0.251
Neg logit contributions
oc-0.258
 course-0.250
Fin-0.248
 matter-0.245
 unity-0.237
Tokent
Token ID83
Feature activation+8.48e-03

Pos logit contributions
angers+0.087
 office+0.083
 bathroom+0.077
elt+0.077
 door+0.071
Neg logit contributions
 course-0.070
 matter-0.067
oc-0.067
Fin-0.066
 unity-0.063
Token you
Token ID345
Feature activation-6.54e-03

Pos logit contributions
 unity+0.013
 matter+0.013
 course+0.013
 bond+0.012
Fin+0.012
Neg logit contributions
angers-0.015
elt-0.014
 office-0.014
 bathroom-0.013
 salon-0.012
Token come
Token ID1282
Feature activation-8.37e-02

Pos logit contributions
angers+0.012
elt+0.012
 office+0.012
 bathroom+0.011
 salon+0.011
Neg logit contributions
 unity-0.009
 course-0.009
 matter-0.009
 bond-0.008
Fin-0.008
Token with
Token ID351
Feature activation+2.81e-03

Pos logit contributions
angers+0.115
elt+0.104
 office+0.099
 salon+0.088
 bathroom+0.088
Neg logit contributions
 matter-0.162
 unity-0.161
 course-0.159
 kindness-0.155
 remind-0.154
Token until
Token ID1566
Feature activation-5.01e-02

Pos logit contributions
angers+0.163
elt+0.148
 office+0.145
 bathroom+0.131
 salon+0.127
Neg logit contributions
 matter-0.198
 unity-0.198
 course-0.197
Fin-0.187
 bond-0.187
Token it
Token ID340
Feature activation-1.15e-01

Pos logit contributions
angers+0.106
elt+0.098
 office+0.098
 bathroom+0.091
 salon+0.089
Neg logit contributions
 matter-0.062
 unity-0.062
 course-0.061
Fin-0.057
 bond-0.057
Token stopped
Token ID5025
Feature activation-1.30e-01

Pos logit contributions
angers+0.183
 office+0.164
elt+0.158
 bathroom+0.142
azes+0.140
Neg logit contributions
 matter-0.209
 course-0.203
 unity-0.200
Fin-0.200
 bond-0.192
Token at
Token ID379
Feature activation-2.46e-02

Pos logit contributions
angers+0.210
 office+0.181
elt+0.180
 salon+0.168
anger+0.164
Neg logit contributions
 matter-0.234
 unity-0.231
 course-0.226
 kindness-0.219
Fin-0.217
Token the
Token ID262
Feature activation-9.22e-02

Pos logit contributions
angers+0.043
elt+0.040
 office+0.040
 bathroom+0.036
 salon+0.035
Neg logit contributions
 unity-0.039
 matter-0.039
 course-0.039
Fin-0.037
 bond-0.037
Token bottom
Token ID4220
Feature activation-1.70e-01

Pos logit contributions
angers+0.128
 office+0.119
elt+0.116
 bathroom+0.107
 salon+0.102
Neg logit contributions
 unity-0.175
 matter-0.175
 course-0.173
Fin-0.168
 bond-0.166
Token.
Token ID13
Feature activation-5.01e-02

Pos logit contributions
angers+0.357
elt+0.329
 office+0.320
 bathroom+0.302
 salon+0.295
Neg logit contributions
 course-0.215
 matter-0.210
 unity-0.203
 bond-0.187
Fin-0.187
Token Tim
Token ID5045
Feature activation-1.50e-02

Pos logit contributions
angers+0.086
 office+0.079
elt+0.078
 bathroom+0.072
 salon+0.070
Neg logit contributions
 unity-0.080
 matter-0.080
 course-0.080
Fin-0.076
 bond-0.075
Tokenmy
Token ID1820
Feature activation-3.44e-02

Pos logit contributions
angers+0.024
elt+0.022
 office+0.022
 bathroom+0.020
 salon+0.019
Neg logit contributions
 matter-0.026
 unity-0.026
 course-0.026
Fin-0.024
 bond-0.024
Token was
Token ID373
Feature activation-7.55e-02

Pos logit contributions
angers+0.046
elt+0.042
 office+0.042
 bathroom+0.037
 salon+0.036
Neg logit contributions
 unity-0.068
 matter-0.068
 course-0.067
 bond-0.065
Fin-0.065
Token happy
Token ID3772
Feature activation+2.46e-02

Pos logit contributions
angers+0.153
elt+0.133
anger+0.127
 office+0.127
amas+0.122
Neg logit contributions
 unity-0.105
 caring-0.105
 matter-0.104
 course-0.102
 joy-0.101
Token
Token ID26391
Feature activation-2.03e-01

Pos logit contributions
angers+0.077
 office+0.073
 bathroom+0.069
elt+0.068
 door+0.065
Neg logit contributions
forts-0.084
 course-0.084
 matter-0.083
leness-0.083
 unity-0.080
Token
Token ID129
Feature activation-6.19e-02

Pos logit contributions
angers+0.390
elt+0.346
 office+0.340
 bathroom+0.316
 upstairs+0.314
Neg logit contributions
 course-0.292
oc-0.285
From-0.277
 unity-0.274
Fin-0.268
Token
Token ID241
Feature activation-1.05e-01

Pos logit contributions
angers+0.115
elt+0.108
 office+0.098
 bathroom+0.095
tub+0.095
Neg logit contributions
 course-0.097
From-0.095
oc-0.090
 unity-0.088
leness-0.088
Tokenplease
Token ID29688
Feature activation-1.50e-02

Pos logit contributions
angers+0.209
 office+0.193
elt+0.191
 bathroom+0.178
 salon+0.173
Neg logit contributions
 matter-0.140
 course-0.139
 unity-0.137
Fin-0.132
 bond-0.128
Tokenâ
Token ID22940
Feature activation-3.82e-02

Pos logit contributions
angers+0.027
elt+0.025
 office+0.024
 bathroom+0.022
 salon+0.022
Neg logit contributions
 unity-0.023
 matter-0.023
 course-0.023
 bond-0.022
Fin-0.022
Token
Token ID26391
Feature activation-1.70e-01

Pos logit contributions
angers+0.056
elt+0.049
 office+0.047
 bathroom+0.044
 salon+0.042
Neg logit contributions
 course-0.076
 matter-0.074
 unity-0.072
Fin-0.070
 bond-0.070
Token and
Token ID290
Feature activation+9.87e-03

Pos logit contributions
angers+0.309
elt+0.284
 office+0.281
 bathroom+0.264
 salon+0.250
Neg logit contributions
 course-0.249
oc-0.249
Fin-0.247
 unity-0.244
 matter-0.241
Token
Token ID6184
Feature activation-5.49e-03

Pos logit contributions
 unity+0.014
 matter+0.014
 course+0.014
 bond+0.014
Fin+0.014
Neg logit contributions
angers-0.018
 office-0.017
elt-0.017
 bathroom-0.016
 salon-0.015
Token
Token ID95
Feature activation-1.41e-02

Pos logit contributions
angers+0.010
elt+0.009
 office+0.009
 bathroom+0.008
tub+0.008
Neg logit contributions
 course-0.009
Fin-0.008
 unity-0.008
 matter-0.008
 caring-0.008
Token
Token ID26391
Feature activation-1.48e-01

Pos logit contributions
angers+0.025
elt+0.024
 office+0.022
 bathroom+0.020
 door+0.020
Neg logit contributions
 course-0.023
 matter-0.021
 unity-0.021
From-0.021
Fin-0.021
Token
Token ID129
Feature activation-2.57e-02

Pos logit contributions
angers+0.273
elt+0.255
 office+0.249
 bathroom+0.238
 salon+0.224
Neg logit contributions
 course-0.213
oc-0.211
Fin-0.208
 unity-0.201
 matter-0.197
Token
Token ID8151
Feature activation-4.49e-02

Pos logit contributions
angers+0.302
elt+0.278
 office+0.275
 bathroom+0.259
 salon+0.247
Neg logit contributions
 course-0.233
 unity-0.228
oc-0.225
Fin-0.225
 matter-0.223
Tokens
Token ID82
Feature activation-2.20e-02

Pos logit contributions
angers+0.076
 office+0.070
elt+0.066
 bathroom+0.064
 door+0.059
Neg logit contributions
 course-0.076
 matter-0.072
 unity-0.071
Fin-0.070
oc-0.070
Token why
Token ID1521
Feature activation-7.60e-02

Pos logit contributions
angers+0.042
elt+0.038
 office+0.037
 bathroom+0.034
 salon+0.034
Neg logit contributions
 course-0.033
 matter-0.033
 unity-0.033
 kindness-0.030
 bond-0.030
Token it
Token ID340
Feature activation-1.20e-01

Pos logit contributions
angers+0.152
elt+0.141
 office+0.141
 bathroom+0.131
 salon+0.127
Neg logit contributions
 matter-0.100
 unity-0.100
 course-0.100
Fin-0.094
 bond-0.093
Tokenâ
Token ID22940
Feature activation-3.12e-02

Pos logit contributions
angers+0.206
elt+0.183
 office+0.178
 bathroom+0.159
 salon+0.157
Neg logit contributions
 matter-0.207
 course-0.203
 unity-0.198
 kindness-0.189
 caring-0.188
Token
Token ID26391
Feature activation-1.70e-01

Pos logit contributions
angers+0.048
elt+0.042
 office+0.039
 bathroom+0.037
 door+0.036
Neg logit contributions
 course-0.063
 matter-0.060
 unity-0.057
Fin-0.056
leness-0.056
Token
Token ID8151
Feature activation-4.25e-02

Pos logit contributions
angers+0.298
elt+0.273
 office+0.272
 bathroom+0.252
 upstairs+0.242
Neg logit contributions
 course-0.253
Fin-0.252
oc-0.252
 unity-0.244
 matter-0.240
Tokens
Token ID82
Feature activation-3.34e-02

Pos logit contributions
angers+0.073
 office+0.068
elt+0.064
 bathroom+0.062
 door+0.058
Neg logit contributions
 course-0.069
 matter-0.065
Fin-0.065
oc-0.064
 unity-0.063
Token called
Token ID1444
Feature activation-2.84e-02

Pos logit contributions
angers+0.060
elt+0.055
 office+0.053
 bathroom+0.048
 salon+0.048
Neg logit contributions
 matter-0.053
 course-0.052
 unity-0.052
 kindness-0.049
 bond-0.049
Token a
Token ID257
Feature activation-1.02e-02

Pos logit contributions
angers+0.054
elt+0.049
 office+0.047
azes+0.045
pit+0.043
Neg logit contributions
 course-0.042
 matter-0.041
 unity-0.040
 kindness-0.039
Fin-0.038
Token smooth
Token ID7209
Feature activation+1.75e-02

Pos logit contributions
angers+0.018
 office+0.017
elt+0.017
 bathroom+0.016
 salon+0.015
Neg logit contributions
 unity-0.016
 matter-0.016
 course-0.015
Fin-0.015
 bond-0.015
Token
Token ID26391
Feature activation-2.03e-01

Pos logit contributions
angers+0.012
elt+0.012
 office+0.011
 bathroom+0.011
 salon+0.010
Neg logit contributions
 course-0.008
 matter-0.008
 unity-0.008
 bond-0.008
Fin-0.008
Token
Token ID129
Feature activation-6.19e-02

Pos logit contributions
angers+0.390
elt+0.346
 office+0.340
 bathroom+0.316
 upstairs+0.314
Neg logit contributions
 course-0.292
oc-0.285
From-0.277
 unity-0.274
Fin-0.268
Token
Token ID241
Feature activation-1.05e-01

Pos logit contributions
angers+0.115
elt+0.108
 office+0.098
 bathroom+0.095
tub+0.095
Neg logit contributions
 course-0.097
From-0.095
oc-0.090
 unity-0.088
leness-0.088
TokenWho
Token ID8241
Feature activation-2.95e-02

Pos logit contributions
angers+0.209
 office+0.193
elt+0.191
 bathroom+0.178
 salon+0.173
Neg logit contributions
 matter-0.140
 course-0.139
 unity-0.137
Fin-0.132
 bond-0.128
Tokenâ
Token ID22940
Feature activation-3.55e-02

Pos logit contributions
angers+0.054
 office+0.050
elt+0.050
 bathroom+0.046
 salon+0.045
Neg logit contributions
 unity-0.044
 matter-0.043
 course-0.043
 bond-0.041
Fin-0.041
Token
Token ID26391
Feature activation-1.69e-01

Pos logit contributions
angers+0.062
elt+0.053
 bathroom+0.048
anger+0.047
 office+0.047
Neg logit contributions
 course-0.068
 matter-0.064
 unity-0.058
leness-0.058
 remind-0.057
Token
Token ID8151
Feature activation-4.86e-02

Pos logit contributions
angers+0.329
elt+0.299
 office+0.291
 bathroom+0.279
 salon+0.266
Neg logit contributions
oc-0.230
 course-0.227
Fin-0.223
 matter-0.216
 unity-0.210
Tokens
Token ID82
Feature activation-4.34e-02

Pos logit contributions
angers+0.083
 office+0.078
elt+0.073
 bathroom+0.073
 knee+0.066
Neg logit contributions
 course-0.078
oc-0.072
Fin-0.072
 matter-0.071
 love-0.068
Token there
Token ID612
Feature activation-3.81e-02

Pos logit contributions
angers+0.071
elt+0.065
amas+0.054
 office+0.053
uddled+0.053
Neg logit contributions
 course-0.078
 matter-0.077
 unity-0.074
 remind-0.072
 kindness-0.072
Token?
Token ID30
Feature activation-6.74e-02

Pos logit contributions
angers+0.062
elt+0.055
 office+0.050
azes+0.046
ulk+0.046
Neg logit contributions
 course-0.069
 matter-0.067
 unity-0.064
 remind-0.063
 kindness-0.062
Tokenâ
Token ID22940
Feature activation-2.12e-02

Pos logit contributions
angers+0.118
 office+0.108
elt+0.108
 bathroom+0.099
 salon+0.097
Neg logit contributions
 course-0.103
 unity-0.103
 matter-0.102
Fin-0.099
oc-0.097
Token
Token ID26391
Feature activation-1.59e-01

Pos logit contributions
elt+0.024
angers+0.023
pit+0.020
 bathroom+0.020
 office+0.020
Neg logit contributions
 course-0.018
From-0.018
oc-0.017
 teaches-0.017
Fin-0.016
Token
Token ID129
Feature activation-3.79e-02

Pos logit contributions
angers+0.323
elt+0.303
 office+0.300
 bathroom+0.287
 upstairs+0.275
Neg logit contributions
oc-0.200
 course-0.189
Fin-0.186
From-0.185
 love-0.172
Token
Token ID241
Feature activation-9.53e-02

Pos logit contributions
elt+0.065
angers+0.064
 office+0.057
 bathroom+0.055
tub+0.054
Neg logit contributions
From-0.062
 course-0.061
oc-0.059
Fin-0.059
 unity-0.057
TokenLet
Token ID5756
Feature activation+2.90e-03

Pos logit contributions
angers+0.186
 office+0.172
elt+0.172
 bathroom+0.159
 salon+0.155
Neg logit contributions
 matter-0.131
 unity-0.130
 course-0.130
Fin-0.123
 bond-0.122
Tokenâ
Token ID22940
Feature activation-3.46e-02

Pos logit contributions
 matter+0.005
 course+0.005
 unity+0.005
Fin+0.005
 kindness+0.005
Neg logit contributions
angers-0.005
elt-0.004
 office-0.004
 bathroom-0.004
 salon-0.004
Token
Token ID26391
Feature activation-1.68e-01

Pos logit contributions
angers+0.051
elt+0.046
 office+0.044
 bathroom+0.041
 salon+0.039
Neg logit contributions
 course-0.066
 matter-0.065
 unity-0.065
Fin-0.062
 bond-0.062
Token
Token ID8151
Feature activation-4.99e-02

Pos logit contributions
angers+0.292
elt+0.271
 office+0.268
 bathroom+0.253
 salon+0.239
Neg logit contributions
oc-0.257
 course-0.256
Fin-0.254
 unity-0.248
 matter-0.244
Tokens
Token ID82
Feature activation-4.32e-02

Pos logit contributions
angers+0.093
 office+0.089
elt+0.084
 bathroom+0.084
amas+0.076
Neg logit contributions
 course-0.071
oc-0.068
Fin-0.067
 unity-0.064
 matter-0.064
Token go
Token ID467
Feature activation-8.16e-02

Pos logit contributions
angers+0.080
 office+0.074
elt+0.074
 bathroom+0.069
 salon+0.066
Neg logit contributions
 unity-0.063
 matter-0.063
 course-0.062
 bond-0.059
Fin-0.059
Token visit
Token ID3187
Feature activation-1.54e-02

Pos logit contributions
angers+0.132
elt+0.125
 office+0.117
azes+0.108
 salon+0.107
Neg logit contributions
 course-0.136
 matter-0.131
 unity-0.131
Fin-0.130
 remind-0.128
Token Gr
Token ID1902
Feature activation-2.54e-02

Pos logit contributions
angers+0.027
 office+0.025
elt+0.025
 bathroom+0.023
 salon+0.022
Neg logit contributions
 matter-0.024
 unity-0.024
 course-0.024
 bond-0.023
Fin-0.023
Token
Token ID26391
Feature activation-1.62e-01

Pos logit contributions
elt+0.015
angers+0.015
 bathroom+0.013
pit+0.013
 office+0.013
Neg logit contributions
 course-0.011
From-0.010
oc-0.010
Fin-0.010
 teaches-0.009
Token
Token ID129
Feature activation-3.96e-02

Pos logit contributions
angers+0.330
elt+0.303
 office+0.300
 bathroom+0.285
 upstairs+0.275
Neg logit contributions
oc-0.206
 course-0.202
Fin-0.193
From-0.191
 unity-0.184
Token
Token ID241
Feature activation-9.56e-02

Pos logit contributions
angers+0.067
elt+0.067
 office+0.059
 bathroom+0.058
tub+0.056
Neg logit contributions
From-0.065
 course-0.065
oc-0.062
Fin-0.062
 unity-0.061
TokenLet
Token ID5756
Feature activation+3.56e-03

Pos logit contributions
angers+0.191
 office+0.176
elt+0.175
 bathroom+0.163
 salon+0.159
Neg logit contributions
 matter-0.128
 course-0.127
 unity-0.127
Fin-0.121
 bond-0.119
Tokenâ
Token ID22940
Feature activation-3.25e-02

Pos logit contributions
 matter+0.006
 course+0.006
 unity+0.006
Fin+0.006
 bond+0.005
Neg logit contributions
angers-0.006
elt-0.006
 office-0.005
 bathroom-0.005
 salon-0.005
Token
Token ID26391
Feature activation-1.68e-01

Pos logit contributions
angers+0.048
elt+0.043
 office+0.041
 bathroom+0.038
 salon+0.036
Neg logit contributions
 course-0.063
 matter-0.062
 unity-0.061
Fin-0.059
 bond-0.059
Token
Token ID8151
Feature activation-4.98e-02

Pos logit contributions
angers+0.295
elt+0.273
 office+0.269
 bathroom+0.255
 salon+0.241
Neg logit contributions
 course-0.254
oc-0.253
Fin-0.251
 unity-0.247
 matter-0.242
Tokens
Token ID82
Feature activation-4.31e-02

Pos logit contributions
angers+0.092
 office+0.088
elt+0.083
 bathroom+0.082
amas+0.075
Neg logit contributions
 course-0.072
oc-0.069
Fin-0.069
 unity-0.066
 matter-0.066
Token spin
Token ID7906
Feature activation-2.93e-02

Pos logit contributions
angers+0.080
 office+0.074
elt+0.074
 bathroom+0.068
 salon+0.066
Neg logit contributions
 unity-0.063
 matter-0.063
 course-0.063
 bond-0.059
Fin-0.059
Token together
Token ID1978
Feature activation+2.49e-02

Pos logit contributions
angers+0.050
 office+0.047
elt+0.046
 bathroom+0.041
azes+0.041
Neg logit contributions
 course-0.048
 matter-0.047
 unity-0.046
Fin-0.045
 bond-0.044
Token with
Token ID351
Feature activation+1.45e-02

Pos logit contributions
 unity+0.046
 matter+0.045
 course+0.045
Fin+0.044
 bond+0.043
Neg logit contributions
angers-0.037
 office-0.034
elt-0.033
 bathroom-0.031
 salon-0.029
Token
Token ID26391
Feature activation-1.62e-01

Pos logit contributions
elt+0.015
angers+0.015
 bathroom+0.013
pit+0.013
 office+0.013
Neg logit contributions
 course-0.011
From-0.010
oc-0.010
Fin-0.010
 teaches-0.009
Token
Token ID129
Feature activation-3.96e-02

Pos logit contributions
angers+0.330
elt+0.303
 office+0.300
 bathroom+0.285
 upstairs+0.275
Neg logit contributions
oc-0.206
 course-0.202
Fin-0.193
From-0.191
 unity-0.184
Token
Token ID241
Feature activation-9.56e-02

Pos logit contributions
angers+0.067
elt+0.067
 office+0.059
 bathroom+0.058
tub+0.056
Neg logit contributions
From-0.065
 course-0.065
oc-0.062
Fin-0.062
 unity-0.061
TokenLet
Token ID5756
Feature activation+3.56e-03

Pos logit contributions
angers+0.191
 office+0.176
elt+0.175
 bathroom+0.163
 salon+0.159
Neg logit contributions
 matter-0.128
 course-0.127
 unity-0.127
Fin-0.121
 bond-0.119
Tokenâ
Token ID22940
Feature activation-3.25e-02

Pos logit contributions
 matter+0.006
 course+0.006
 unity+0.006
Fin+0.006
 bond+0.005
Neg logit contributions
angers-0.006
elt-0.006
 office-0.005
 bathroom-0.005
 salon-0.005
Token
Token ID26391
Feature activation-1.68e-01

Pos logit contributions
angers+0.048
elt+0.043
 office+0.041
 bathroom+0.038
 salon+0.036
Neg logit contributions
 course-0.063
 matter-0.062
 unity-0.061
Fin-0.059
 bond-0.059
Token
Token ID8151
Feature activation-4.98e-02

Pos logit contributions
angers+0.295
elt+0.273
 office+0.269
 bathroom+0.255
 salon+0.241
Neg logit contributions
 course-0.254
oc-0.253
Fin-0.251
 unity-0.247
 matter-0.242
Tokens
Token ID82
Feature activation-4.31e-02

Pos logit contributions
angers+0.092
 office+0.088
elt+0.083
 bathroom+0.082
amas+0.075
Neg logit contributions
 course-0.072
oc-0.069
Fin-0.069
 unity-0.066
 matter-0.066
Token jump
Token ID4391
Feature activation-5.30e-02

Pos logit contributions
angers+0.080
 office+0.074
elt+0.074
 bathroom+0.068
 salon+0.066
Neg logit contributions
 unity-0.063
 matter-0.063
 course-0.063
 bond-0.059
Fin-0.059
Token over
Token ID625
Feature activation-9.15e-02

Pos logit contributions
angers+0.083
 office+0.078
elt+0.076
azes+0.070
 salon+0.066
Neg logit contributions
 course-0.094
Fin-0.090
 matter-0.088
 unity-0.088
 souven-0.085
Token it
Token ID340
Feature activation-1.30e-01

Pos logit contributions
angers+0.158
 office+0.146
elt+0.145
 bathroom+0.135
 salon+0.129
Neg logit contributions
 matter-0.146
 unity-0.146
 course-0.141
 bond-0.139
Fin-0.138
Token her
Token ID607
Feature activation-9.46e-02

Pos logit contributions
angers+0.122
 office+0.115
elt+0.113
 bathroom+0.104
 salon+0.097
Neg logit contributions
 unity-0.142
 matter-0.141
 bond-0.137
 course-0.137
Fin-0.136
Token doll
Token ID3654
Feature activation-2.32e-02

Pos logit contributions
angers+0.142
 office+0.130
elt+0.128
 bathroom+0.117
 salon+0.113
Neg logit contributions
 unity-0.173
 matter-0.172
 course-0.171
Fin-0.166
 bond-0.164
Token sink
Token ID14595
Feature activation-1.06e-01

Pos logit contributions
angers+0.038
elt+0.036
 office+0.032
azes+0.030
ulk+0.030
Neg logit contributions
 matter-0.040
 course-0.040
 unity-0.038
 remind-0.036
 bond-0.036
Token to
Token ID284
Feature activation-1.21e-02

Pos logit contributions
angers+0.186
elt+0.164
 office+0.158
azes+0.157
ulk+0.145
Neg logit contributions
 matter-0.164
 course-0.163
 unity-0.153
 love-0.151
 kindness-0.149
Token the
Token ID262
Feature activation-8.74e-02

Pos logit contributions
angers+0.023
elt+0.022
 office+0.022
 bathroom+0.020
 salon+0.019
Neg logit contributions
 unity-0.017
 matter-0.017
 course-0.017
Fin-0.016
 bond-0.016
Token bottom
Token ID4220
Feature activation-1.66e-01

Pos logit contributions
angers+0.112
elt+0.101
azes+0.086
 office+0.084
ulk+0.077
Neg logit contributions
 course-0.184
 matter-0.181
 unity-0.180
 bond-0.175
 kindness-0.173
Token of
Token ID286
Feature activation+3.29e-02

Pos logit contributions
angers+0.316
elt+0.292
 office+0.283
 bathroom+0.263
 salon+0.256
Neg logit contributions
 course-0.239
 matter-0.236
 unity-0.231
 bond-0.216
Fin-0.216
Token the
Token ID262
Feature activation-8.71e-02

Pos logit contributions
 unity+0.051
 matter+0.051
 course+0.050
Fin+0.049
 bond+0.049
Neg logit contributions
angers-0.058
 office-0.054
elt-0.053
 bathroom-0.050
 salon-0.048
Token pool
Token ID5933
Feature activation-5.23e-02

Pos logit contributions
angers+0.110
elt+0.099
 office+0.086
azes+0.080
 salon+0.074
Neg logit contributions
 course-0.183
 matter-0.182
 unity-0.180
 bond-0.175
 kindness-0.171
Token.
Token ID13
Feature activation-4.16e-02

Pos logit contributions
angers+0.072
elt+0.064
 office+0.063
 bathroom+0.056
 salon+0.054
Neg logit contributions
 course-0.102
 matter-0.102
 unity-0.102
 bond-0.097
Fin-0.097
Token She
Token ID1375
Feature activation+3.95e-02

Pos logit contributions
angers+0.065
 office+0.059
elt+0.057
 bathroom+0.052
 salon+0.051
Neg logit contributions
 course-0.072
 unity-0.070
 matter-0.069
Fin-0.068
From-0.067
Token
Token ID26391
Feature activation-1.62e-01

Pos logit contributions
elt+0.015
angers+0.015
 bathroom+0.013
pit+0.013
 office+0.013
Neg logit contributions
 course-0.011
From-0.010
oc-0.010
Fin-0.010
 teaches-0.009
Token
Token ID129
Feature activation-3.96e-02

Pos logit contributions
angers+0.330
elt+0.303
 office+0.300
 bathroom+0.285
 upstairs+0.275
Neg logit contributions
oc-0.206
 course-0.202
Fin-0.193
From-0.191
 unity-0.184
Token
Token ID241
Feature activation-9.56e-02

Pos logit contributions
angers+0.067
elt+0.067
 office+0.059
 bathroom+0.058
tub+0.056
Neg logit contributions
From-0.065
 course-0.065
oc-0.062
Fin-0.062
 unity-0.061
TokenWhat
Token ID2061
Feature activation-4.39e-02

Pos logit contributions
angers+0.191
 office+0.176
elt+0.175
 bathroom+0.163
 salon+0.159
Neg logit contributions
 matter-0.128
 course-0.127
 unity-0.127
Fin-0.121
 bond-0.119
Tokenâ
Token ID22940
Feature activation-2.90e-02

Pos logit contributions
angers+0.081
elt+0.074
 office+0.073
 bathroom+0.067
 salon+0.065
Neg logit contributions
 matter-0.067
 unity-0.066
 course-0.066
 bond-0.062
Fin-0.062
Token
Token ID26391
Feature activation-1.65e-01

Pos logit contributions
angers+0.047
elt+0.041
 office+0.037
 bathroom+0.036
 door+0.034
Neg logit contributions
 course-0.058
 matter-0.055
 unity-0.051
Fin-0.050
leness-0.050
Token
Token ID8151
Feature activation-4.31e-02

Pos logit contributions
angers+0.313
elt+0.284
 office+0.278
 bathroom+0.264
 salon+0.256
Neg logit contributions
oc-0.233
 course-0.230
Fin-0.227
 matter-0.220
 unity-0.212
Tokens
Token ID82
Feature activation-4.36e-02

Pos logit contributions
angers+0.077
 office+0.070
elt+0.068
 bathroom+0.065
 floors+0.060
Neg logit contributions
 course-0.069
 matter-0.064
oc-0.063
Fin-0.062
 love-0.059
Token up
Token ID510
Feature activation-9.80e-02

Pos logit contributions
angers+0.072
elt+0.067
amas+0.057
ed+0.055
uddled+0.054
Neg logit contributions
 matter-0.082
 course-0.081
 unity-0.076
 caring-0.075
 remind-0.074
Token there
Token ID612
Feature activation-4.25e-02

Pos logit contributions
angers+0.144
elt+0.129
 office+0.121
 bathroom+0.111
amas+0.109
Neg logit contributions
 matter-0.191
 course-0.187
 unity-0.184
 bond-0.176
Fin-0.176
Token?
Token ID30
Feature activation-6.76e-02

Pos logit contributions
angers+0.068
elt+0.059
 office+0.052
azes+0.049
ulk+0.049
Neg logit contributions
 course-0.080
 matter-0.079
 unity-0.074
 remind-0.073
Fin-0.071
Token
Token ID26391
Feature activation-1.56e-01

Pos logit contributions
angers+0.035
elt+0.035
 office+0.031
 bathroom+0.030
 salon+0.029
Neg logit contributions
 course-0.034
 unity-0.032
From-0.032
 matter-0.031
oc-0.031
Token
Token ID129
Feature activation-3.48e-02

Pos logit contributions
angers+0.307
elt+0.294
 office+0.289
 bathroom+0.275
 floors+0.261
Neg logit contributions
oc-0.199
Fin-0.193
 course-0.193
From-0.184
 unity-0.181
Token
Token ID241
Feature activation-9.25e-02

Pos logit contributions
elt+0.058
angers+0.058
 office+0.051
 bathroom+0.049
tub+0.049
Neg logit contributions
 course-0.056
From-0.056
 unity-0.055
Fin-0.055
oc-0.054
Tokendon
Token ID9099
Feature activation-3.82e-02

Pos logit contributions
angers+0.169
 office+0.156
elt+0.154
 bathroom+0.143
 salon+0.139
Neg logit contributions
 matter-0.137
 course-0.137
 unity-0.137
Fin-0.132
 bond-0.129
Tokenâ
Token ID22940
Feature activation-3.00e-02

Pos logit contributions
angers+0.074
elt+0.069
 office+0.065
 bathroom+0.062
pit+0.062
Neg logit contributions
 course-0.056
 matter-0.054
From-0.052
Fin-0.052
 unity-0.051
Token
Token ID26391
Feature activation-1.65e-01

Pos logit contributions
angers+0.042
elt+0.037
 office+0.035
 bathroom+0.033
 salon+0.030
Neg logit contributions
 course-0.062
 matter-0.060
 unity-0.059
Fin-0.057
oc-0.057
Token
Token ID8151
Feature activation-4.68e-02

Pos logit contributions
angers+0.295
elt+0.277
 office+0.274
 bathroom+0.260
 upstairs+0.242
Neg logit contributions
oc-0.242
Fin-0.236
 course-0.234
 unity-0.223
 matter-0.219
Tokent
Token ID83
Feature activation+1.39e-02

Pos logit contributions
angers+0.084
 office+0.080
elt+0.076
 bathroom+0.075
 door+0.069
Neg logit contributions
 course-0.071
oc-0.068
Fin-0.067
 matter-0.065
 unity-0.064
Token worry
Token ID5490
Feature activation-2.02e-02

Pos logit contributions
 unity+0.020
 course+0.019
 matter+0.019
 bond+0.018
Fin+0.018
Neg logit contributions
angers-0.027
 office-0.025
elt-0.025
 bathroom-0.023
 salon-0.022
Token!
Token ID0
Feature activation-5.52e-02

Pos logit contributions
angers+0.031
elt+0.029
 office+0.028
 bathroom+0.025
 salon+0.024
Neg logit contributions
 matter-0.037
 unity-0.036
 course-0.036
Fin-0.034
 kindness-0.034
Token I
Token ID314
Feature activation+1.37e-02

Pos logit contributions
angers+0.092
elt+0.085
 office+0.084
 bathroom+0.076
 salon+0.073
Neg logit contributions
 matter-0.092
 unity-0.092
 course-0.091
 bond-0.088
Fin-0.086
Token to
Token ID284
Feature activation-2.62e-02

Pos logit contributions
angers+0.131
elt+0.116
 office+0.109
azes+0.101
 bathroom+0.101
Neg logit contributions
 course-0.171
 matter-0.166
 unity-0.162
oc-0.157
 bond-0.156
Token break
Token ID2270
Feature activation-2.72e-02

Pos logit contributions
angers+0.046
 office+0.046
 bathroom+0.043
elt+0.042
 door+0.040
Neg logit contributions
 unity-0.039
 course-0.037
 bond-0.037
Fin-0.037
 matter-0.037
Token into
Token ID656
Feature activation-7.52e-02

Pos logit contributions
angers+0.045
 office+0.042
elt+0.041
 bathroom+0.039
 door+0.036
Neg logit contributions
 unity-0.045
 matter-0.044
 course-0.044
 bond-0.044
Fin-0.043
Token people
Token ID661
Feature activation-2.84e-02

Pos logit contributions
angers+0.118
 office+0.116
 bathroom+0.107
elt+0.102
 door+0.100
Neg logit contributions
Fin-0.129
 unity-0.128
 course-0.126
 bond-0.126
 matter-0.126
Tokenâ
Token ID22940
Feature activation-3.15e-02

Pos logit contributions
angers+0.039
elt+0.035
 office+0.034
 bathroom+0.031
 salon+0.029
Neg logit contributions
 matter-0.055
 unity-0.055
 course-0.055
 bond-0.053
Fin-0.053
Token
Token ID26391
Feature activation-1.65e-01

Pos logit contributions
angers+0.058
elt+0.052
 bathroom+0.047
tub+0.046
anger+0.046
Neg logit contributions
 course-0.053
 matter-0.052
 unity-0.049
 bond-0.047
leness-0.047
Token
Token ID8151
Feature activation-4.66e-02

Pos logit contributions
angers+0.286
elt+0.267
 office+0.248
 bathroom+0.242
anger+0.228
Neg logit contributions
oc-0.254
Fin-0.240
 course-0.238
 unity-0.233
 harmony-0.229
Tokens
Token ID82
Feature activation-2.84e-02

Pos logit contributions
angers+0.082
 office+0.076
elt+0.075
 bathroom+0.072
azes+0.066
Neg logit contributions
 course-0.071
Fin-0.067
oc-0.067
 unity-0.064
 matter-0.064
Token homes
Token ID5682
Feature activation-3.38e-02

Pos logit contributions
angers+0.042
 office+0.039
elt+0.039
 bathroom+0.035
 salon+0.034
Neg logit contributions
 unity-0.052
 matter-0.051
 course-0.051
Fin-0.050
 bond-0.049
Token.
Token ID13
Feature activation-3.81e-02

Pos logit contributions
angers+0.052
elt+0.048
 office+0.045
azes+0.041
 bathroom+0.041
Neg logit contributions
 matter-0.060
 course-0.059
 unity-0.059
 kindness-0.057
 bond-0.056
Token Sam
Token ID3409
Feature activation-1.46e-02

Pos logit contributions
angers+0.064
elt+0.059
 office+0.059
 bathroom+0.054
 salon+0.052
Neg logit contributions
 matter-0.063
 unity-0.063
 course-0.062
 bond-0.060
Fin-0.058
Token
Token ID129
Feature activation-4.92e-02

Pos logit contributions
angers+0.052
 office+0.048
elt+0.046
 bathroom+0.043
 upstairs+0.041
Neg logit contributions
 matter-0.074
 unity-0.073
 course-0.072
Fin-0.072
 bond-0.071
Token
Token ID241
Feature activation-1.02e-01

Pos logit contributions
angers+0.092
elt+0.089
 bathroom+0.080
 cabinet+0.079
tub+0.078
Neg logit contributions
From-0.076
 course-0.073
oc-0.070
Fin-0.067
 unity-0.067
TokenT
Token ID51
Feature activation-1.08e-02

Pos logit contributions
angers+0.199
 office+0.183
elt+0.181
 bathroom+0.169
 salon+0.164
Neg logit contributions
 matter-0.140
 course-0.140
 unity-0.138
Fin-0.134
 bond-0.129
TokenIM
Token ID3955
Feature activation-1.80e-02

Pos logit contributions
angers+0.018
elt+0.016
 office+0.016
 bathroom+0.015
 salon+0.015
Neg logit contributions
 matter-0.018
 course-0.017
oc-0.017
 bond-0.016
 kindness-0.016
Tokenâ
Token ID22940
Feature activation-2.78e-02

Pos logit contributions
angers+0.037
elt+0.034
 office+0.034
 bathroom+0.031
 salon+0.031
Neg logit contributions
 matter-0.023
 course-0.023
 unity-0.023
Fin-0.022
 bond-0.022
Token
Token ID26391
Feature activation-1.64e-01

Pos logit contributions
angers+0.040
elt+0.035
 office+0.033
 bathroom+0.031
 salon+0.029
Neg logit contributions
 course-0.056
 matter-0.055
 unity-0.054
Fin-0.052
 bond-0.052
Token.
Token ID13
Feature activation-5.30e-02

Pos logit contributions
angers+0.285
elt+0.263
 office+0.259
 bathroom+0.242
 salon+0.229
Neg logit contributions
 course-0.256
 unity-0.251
Fin-0.250
oc-0.249
 matter-0.249
Token It
Token ID632
Feature activation-2.59e-02

Pos logit contributions
angers+0.089
elt+0.082
 office+0.081
 bathroom+0.074
 salon+0.070
Neg logit contributions
 matter-0.089
 unity-0.089
 course-0.087
 bond-0.085
Fin-0.082
Token looked
Token ID3114
Feature activation-9.55e-02

Pos logit contributions
angers+0.038
 office+0.035
elt+0.035
 bathroom+0.032
 salon+0.031
Neg logit contributions
 unity-0.047
 matter-0.047
 course-0.046
 bond-0.045
Fin-0.045
Token nice
Token ID3621
Feature activation+3.36e-02

Pos logit contributions
angers+0.160
elt+0.150
 office+0.147
 bathroom+0.138
 door+0.131
Neg logit contributions
 unity-0.154
 course-0.152
 matter-0.149
Fin-0.147
 bond-0.147
Token and
Token ID290
Feature activation+1.48e-02

Pos logit contributions
 matter+0.059
 unity+0.058
 course+0.058
 bond+0.056
Fin+0.055
Neg logit contributions
angers-0.053
elt-0.048
 office-0.048
 bathroom-0.044
 salon-0.042
Token not
Token ID407
Feature activation+6.43e-02

Pos logit contributions
angers+0.089
elt+0.082
 office+0.081
 bathroom+0.075
 salon+0.074
Neg logit contributions
 unity-0.058
 matter-0.057
 course-0.057
 caring-0.053
 bond-0.053
Token gr
Token ID1036
Feature activation+2.40e-02

Pos logit contributions
 matter+0.106
 unity+0.105
 course+0.103
 bond+0.098
 caring+0.098
Neg logit contributions
angers-0.110
elt-0.100
 office-0.099
 bathroom-0.090
 salon-0.088
Tokenumpy
Token ID32152
Feature activation+4.16e-02

Pos logit contributions
 matter+0.039
 unity+0.038
 course+0.037
oc+0.036
Fin+0.036
Neg logit contributions
angers-0.041
 office-0.039
 bathroom-0.037
elt-0.037
 salon-0.035
Token anymore
Token ID7471
Feature activation+1.12e-02

Pos logit contributions
 unity+0.075
 matter+0.075
 course+0.075
Fin+0.072
 bond+0.071
Neg logit contributions
angers-0.063
elt-0.058
 office-0.057
 bathroom-0.052
 salon-0.050
Token.
Token ID13
Feature activation-1.32e-02

Pos logit contributions
 unity+0.026
Fin+0.025
 matter+0.025
 course+0.025
 bond+0.025
Neg logit contributions
angers-0.011
 office-0.010
elt-0.009
 bathroom-0.008
 upstairs-0.008
Token They
Token ID1119
Feature activation+1.00e-01

Pos logit contributions
angers+0.021
elt+0.020
 office+0.019
 bathroom+0.018
 salon+0.017
Neg logit contributions
 matter-0.023
 unity-0.023
 course-0.023
 bond-0.022
Fin-0.021
Token became
Token ID2627
Feature activation-5.08e-02

Pos logit contributions
oc+0.203
 matter+0.196
leness+0.196
 unity+0.195
Fin+0.195
Neg logit contributions
elt-0.100
ed-0.094
outed-0.090
 office-0.089
 upstairs-0.089
Token good
Token ID922
Feature activation+6.45e-02

Pos logit contributions
 office+0.062
 bathroom+0.058
angers+0.058
 upstairs+0.056
elt+0.052
Neg logit contributions
forts-0.102
oc-0.101
 course-0.099
leness-0.099
affe-0.097
Token friends
Token ID2460
Feature activation+7.74e-02

Pos logit contributions
 matter+0.107
 course+0.107
 unity+0.106
Fin+0.105
 bond+0.100
Neg logit contributions
angers-0.109
 office-0.102
elt-0.101
 bathroom-0.090
azes-0.087
Token and
Token ID290
Feature activation+5.61e-02

Pos logit contributions
 matter+0.186
Fin+0.178
 unity+0.177
leness+0.170
Ar+0.169
Neg logit contributions
 office-0.070
angers-0.066
elt-0.061
 upstairs-0.058
ed-0.054
Token admired
Token ID29382
Feature activation+1.65e-02

Pos logit contributions
 course+0.091
 matter+0.091
Fin+0.090
oc+0.088
 unity+0.087
Neg logit contributions
 office-0.086
angers-0.086
elt-0.085
 bathroom-0.078
 salon-0.073
Tokenmy
Token ID1820
Feature activation-2.40e-02

Pos logit contributions
 matter+0.018
 course+0.018
 unity+0.017
 bond+0.017
Fin+0.016
Neg logit contributions
angers-0.026
elt-0.023
 office-0.023
 bathroom-0.021
 salon-0.021
Token and
Token ID290
Feature activation+4.11e-02

Pos logit contributions
angers+0.048
elt+0.041
 office+0.041
 bathroom+0.038
anger+0.037
Neg logit contributions
 matter-0.035
 course-0.035
 unity-0.034
 caring-0.034
oc-0.034
Token his
Token ID465
Feature activation-3.57e-02

Pos logit contributions
 matter+0.057
 unity+0.053
 course+0.053
 kindness+0.050
 bond+0.050
Neg logit contributions
angers-0.087
elt-0.080
 office-0.076
anger-0.071
 bathroom-0.071
Token mom
Token ID1995
Feature activation-4.13e-02

Pos logit contributions
angers+0.056
elt+0.051
 office+0.049
 bathroom+0.045
 salon+0.044
Neg logit contributions
 matter-0.065
 unity-0.064
 course-0.063
 bond-0.061
 kindness-0.061
Token worked
Token ID3111
Feature activation-3.42e-02

Pos logit contributions
angers+0.062
elt+0.057
 office+0.057
 bathroom+0.051
 salon+0.049
Neg logit contributions
 unity-0.075
 matter-0.074
 course-0.074
Fin-0.071
 bond-0.071
Token together
Token ID1978
Feature activation+6.98e-02

Pos logit contributions
angers+0.068
 office+0.063
elt+0.063
 bathroom+0.058
 salon+0.056
Neg logit contributions
 unity-0.046
 course-0.045
 matter-0.045
Fin-0.043
 bond-0.043
Token to
Token ID284
Feature activation+1.74e-02

Pos logit contributions
 matter+0.144
 unity+0.144
 course+0.143
 bond+0.138
Fin+0.137
Neg logit contributions
angers-0.088
elt-0.079
 office-0.078
 bathroom-0.069
 salon-0.065
Token build
Token ID1382
Feature activation-3.52e-02

Pos logit contributions
 unity+0.027
 course+0.027
 matter+0.027
Fin+0.026
 bond+0.026
Neg logit contributions
angers-0.030
 office-0.029
elt-0.028
 bathroom-0.026
 salon-0.025
Token the
Token ID262
Feature activation-5.13e-02

Pos logit contributions
angers+0.063
elt+0.058
 office+0.057
 bathroom+0.053
 salon+0.051
Neg logit contributions
 unity-0.055
 matter-0.055
 course-0.054
Fin-0.051
 bond-0.051
Token tower
Token ID10580
Feature activation-4.02e-02

Pos logit contributions
angers+0.086
elt+0.079
 office+0.069
anger+0.065
ulk+0.065
Neg logit contributions
 matter-0.090
 course-0.089
 unity-0.088
 kindness-0.082
 bond-0.082
Token.
Token ID13
Feature activation-2.12e-02

Pos logit contributions
angers+0.065
elt+0.060
 office+0.057
 bathroom+0.053
 salon+0.051
Neg logit contributions
 matter-0.069
 course-0.068
 unity-0.068
 bond-0.064
oc-0.064
Token little
Token ID1310
Feature activation+4.42e-02

Pos logit contributions
 unity+0.047
 course+0.046
 matter+0.046
leness+0.044
 kindness+0.043
Neg logit contributions
angers-0.040
 office-0.040
elt-0.037
 bathroom-0.037
 salon-0.035
Token boy
Token ID2933
Feature activation-1.10e-02

Pos logit contributions
 unity+0.083
 matter+0.081
 course+0.079
leness+0.077
 lasting+0.077
Neg logit contributions
angers-0.063
elt-0.063
 office-0.062
 bathroom-0.057
 salon-0.054
Token named
Token ID3706
Feature activation+2.87e-02

Pos logit contributions
angers+0.020
 office+0.019
elt+0.019
 bathroom+0.017
 salon+0.017
Neg logit contributions
 unity-0.017
 matter-0.016
 course-0.016
Fin-0.015
 bond-0.015
Token Luke
Token ID11336
Feature activation+6.85e-03

Pos logit contributions
 matter+0.034
 unity+0.034
 course+0.034
Fin+0.032
 bond+0.031
Neg logit contributions
angers-0.062
elt-0.058
 office-0.058
 bathroom-0.054
 salon-0.053
Token.
Token ID13
Feature activation-5.54e-03

Pos logit contributions
 unity+0.014
 matter+0.014
 course+0.014
Fin+0.014
 bond+0.014
Neg logit contributions
angers-0.009
elt-0.008
 office-0.008
 bathroom-0.007
 salon-0.006
Token He
Token ID679
Feature activation+6.89e-02

Pos logit contributions
angers+0.009
elt+0.009
 office+0.009
 bathroom+0.008
azes+0.008
Neg logit contributions
 unity-0.009
 matter-0.009
 course-0.009
 bond-0.008
Fin-0.008
Token found
Token ID1043
Feature activation-2.62e-02

Pos logit contributions
 matter+0.078
Fin+0.075
 love+0.074
 course+0.074
From+0.072
Neg logit contributions
angers-0.166
elt-0.143
 office-0.142
anger-0.141
ulk-0.139
Token an
Token ID281
Feature activation+4.49e-02

Pos logit contributions
angers+0.046
 office+0.041
elt+0.040
 bathroom+0.038
 salon+0.038
Neg logit contributions
 course-0.043
 unity-0.042
 matter-0.041
Fin-0.040
 kindness-0.040
Token orange
Token ID10912
Feature activation+3.24e-02

Pos logit contributions
 matter+0.065
 course+0.063
 unity+0.062
 kindness+0.059
 caring+0.058
Neg logit contributions
angers-0.088
elt-0.082
ulk-0.074
 office-0.073
 bathroom-0.072
Token job
Token ID1693
Feature activation-6.88e-03

Pos logit contributions
 course+0.045
 matter+0.042
 bond+0.041
oc+0.040
 unity+0.040
Neg logit contributions
angers-0.067
elt-0.060
 office-0.059
azes-0.057
ulk-0.057
Token and
Token ID290
Feature activation+5.45e-02

Pos logit contributions
angers+0.010
elt+0.009
 office+0.009
 bathroom+0.008
 salon+0.008
Neg logit contributions
 matter-0.013
 unity-0.013
 course-0.013
 bond-0.012
 kindness-0.012
Token book
Token ID1492
Feature activation-1.36e-02

Pos logit contributions
 unity+0.096
 matter+0.094
 course+0.092
Fin+0.091
 bond+0.088
Neg logit contributions
angers-0.110
 office-0.105
elt-0.103
 bathroom-0.097
 salon-0.094
Token from
Token ID422
Feature activation-1.86e-03

Pos logit contributions
angers+0.022
elt+0.020
 office+0.020
 bathroom+0.018
 salon+0.018
Neg logit contributions
 course-0.023
 matter-0.023
 unity-0.023
 bond-0.022
Fin-0.021
Token the
Token ID262
Feature activation-5.24e-02

Pos logit contributions
angers+0.003
elt+0.003
 office+0.003
 bathroom+0.003
 salon+0.003
Neg logit contributions
 course-0.003
 matter-0.003
 unity-0.003
Fin-0.003
 kindness-0.003
Token shelf
Token ID18316
Feature activation-5.38e-02

Pos logit contributions
angers+0.078
elt+0.073
azes+0.064
outed+0.062
ed+0.061
Neg logit contributions
 course-0.100
 matter-0.097
 kindness-0.094
 bond-0.093
 unity-0.093
Token.
Token ID13
Feature activation-1.83e-02

Pos logit contributions
angers+0.088
elt+0.081
 office+0.073
 bathroom+0.071
amas+0.071
Neg logit contributions
 course-0.090
 unity-0.086
 matter-0.085
 kindness-0.083
 bond-0.083
Token She
Token ID1375
Feature activation+6.23e-02

Pos logit contributions
angers+0.032
 office+0.031
elt+0.029
pit+0.027
uddled+0.027
Neg logit contributions
 thanks-0.024
 kindness-0.024
 unity-0.024
From-0.023
 joy-0.023
Token says
Token ID1139
Feature activation+6.17e-05

Pos logit contributions
 matter+0.093
 course+0.093
 unity+0.093
 bond+0.087
Fin+0.086
Neg logit contributions
angers-0.115
elt-0.106
 office-0.104
 bathroom-0.096
 salon-0.093
Token it
Token ID340
Feature activation-6.13e-02

Pos logit contributions
 matter+0.000
 unity+0.000
 course+0.000
 bond+0.000
Fin+0.000
Neg logit contributions
angers-0.000
elt-0.000
 office-0.000
 bathroom-0.000
 salon-0.000
Token is
Token ID318
Feature activation-5.43e-02

Pos logit contributions
angers+0.104
elt+0.095
 office+0.093
 bathroom+0.085
 salon+0.083
Neg logit contributions
 matter-0.101
 course-0.100
 unity-0.100
 kindness-0.095
oc-0.095
Token a
Token ID257
Feature activation+2.95e-02

Pos logit contributions
angers+0.086
 office+0.081
elt+0.078
 bathroom+0.075
 door+0.069
Neg logit contributions
 course-0.094
 unity-0.092
Fin-0.089
oc-0.089
 bond-0.089
Token book
Token ID1492
Feature activation-7.70e-03

Pos logit contributions
 unity+0.048
Fin+0.047
 course+0.047
leness+0.046
 matter+0.045
Neg logit contributions
 office-0.050
angers-0.047
 bathroom-0.046
 salon-0.044
 door-0.044
Token couldn
Token ID3521
Feature activation-3.82e-02

Pos logit contributions
 unity+0.071
 matter+0.071
 course+0.070
Fin+0.067
 bond+0.067
Neg logit contributions
angers-0.055
elt-0.051
 office-0.050
 bathroom-0.045
 salon-0.043
Token't
Token ID470
Feature activation+6.69e-02

Pos logit contributions
angers+0.081
elt+0.076
 office+0.072
pit+0.068
outed+0.067
Neg logit contributions
 course-0.051
From-0.046
oc-0.044
 matter-0.044
Fin-0.044
Token believe
Token ID1975
Feature activation+1.49e-02

Pos logit contributions
 unity+0.115
leness+0.110
 course+0.109
Fin+0.109
 harmony+0.106
Neg logit contributions
angers-0.094
 upstairs-0.093
 office-0.092
elt-0.090
 bathroom-0.089
Token it
Token ID340
Feature activation-6.18e-02

Pos logit contributions
 matter+0.020
 unity+0.019
 course+0.019
Fin+0.018
 bond+0.018
Neg logit contributions
angers-0.030
elt-0.029
 office-0.028
 bathroom-0.026
 salon-0.026
Token.
Token ID13
Feature activation-1.87e-02

Pos logit contributions
angers+0.090
 office+0.084
elt+0.081
 bathroom+0.074
 door+0.069
Neg logit contributions
 unity-0.116
 matter-0.112
 course-0.111
 bond-0.111
Fin-0.110
Token \n
Token ID198
Feature activation+6.39e-02

Pos logit contributions
angers+0.034
 office+0.031
elt+0.031
 salon+0.029
 bathroom+0.028
Neg logit contributions
 unity-0.027
 course-0.027
oc-0.027
 matter-0.026
Fin-0.026
Token \n
Token ID198
Feature activation+6.45e-02

Pos logit contributions
 matter+0.122
 unity+0.120
 course+0.117
 bond+0.114
 kindness+0.111
Neg logit contributions
angers-0.094
 office-0.085
elt-0.083
 bathroom-0.077
tub-0.072
TokenWill
Token ID8743
Feature activation+6.63e-02

Pos logit contributions
 matter+0.091
 unity+0.091
 kindness+0.088
 bond+0.088
forts+0.088
Neg logit contributions
angers-0.123
 office-0.113
 bathroom-0.108
amas-0.104
elt-0.103
Token was
Token ID373
Feature activation-6.18e-02

Pos logit contributions
 matter+0.115
 unity+0.114
 course+0.114
Fin+0.109
 bond+0.108
Neg logit contributions
angers-0.107
elt-0.097
 office-0.096
 bathroom-0.088
 salon-0.085
Token so
Token ID523
Feature activation-4.41e-03

Pos logit contributions
angers+0.095
 office+0.088
elt+0.087
 bathroom+0.079
 salon+0.075
Neg logit contributions
 unity-0.110
 matter-0.110
 course-0.110
Fin-0.105
 bond-0.105
Token happy
Token ID3772
Feature activation+3.40e-02

Pos logit contributions
angers+0.008
elt+0.008
 office+0.008
 bathroom+0.007
 salon+0.007
Neg logit contributions
 unity-0.007
 matter-0.006
 course-0.006
Fin-0.006
 bond-0.006
Token bring
Token ID2222
Feature activation+3.44e-02

Pos logit contributions
angers+0.045
 office+0.045
 door+0.043
 bathroom+0.043
elt+0.041
Neg logit contributions
Fin-0.058
 unity-0.057
oc-0.056
Mic-0.056
leness-0.055
Token.
Token ID13
Feature activation-2.44e-02

Pos logit contributions
Fin+0.071
oc+0.070
 unity+0.069
 matter+0.068
leness+0.068
Neg logit contributions
angers-0.040
 office-0.038
elt-0.036
 door-0.033
 bathroom-0.033
Token \n
Token ID198
Feature activation+6.20e-02

Pos logit contributions
angers+0.044
 office+0.040
elt+0.039
 bathroom+0.037
 salon+0.036
Neg logit contributions
 course-0.038
 unity-0.037
 matter-0.036
 bond-0.035
Fin-0.035
Token \n
Token ID198
Feature activation+6.34e-02

Pos logit contributions
 matter+0.115
 unity+0.113
 course+0.110
 bond+0.107
Fin+0.106
Neg logit contributions
angers-0.094
elt-0.085
 office-0.085
 bathroom-0.077
 salon-0.072
TokenAt
Token ID2953
Feature activation+3.76e-02

Pos logit contributions
 unity+0.098
 matter+0.096
 kindness+0.095
leness+0.093
forts+0.093
Neg logit contributions
angers-0.109
 office-0.102
elt-0.101
 bathroom-0.098
uck-0.095
Token first
Token ID717
Feature activation-3.19e-02

Pos logit contributions
 course+0.053
 unity+0.053
 matter+0.052
 bond+0.049
 kindness+0.048
Neg logit contributions
angers-0.073
elt-0.068
 office-0.067
 bathroom-0.062
 salon-0.060
Token,
Token ID11
Feature activation-5.34e-02

Pos logit contributions
angers+0.037
 office+0.035
elt+0.032
 bathroom+0.030
 salon+0.029
Neg logit contributions
 unity-0.068
Fin-0.067
 matter-0.066
 course-0.066
 bond-0.065
Token nothing
Token ID2147
Feature activation-1.81e-02

Pos logit contributions
angers+0.084
elt+0.078
 office+0.078
 bathroom+0.072
 salon+0.069
Neg logit contributions
 unity-0.092
 course-0.090
 matter-0.090
Fin-0.089
 bond-0.087
Token seemed
Token ID3947
Feature activation-3.43e-02

Pos logit contributions
angers+0.025
elt+0.024
 office+0.023
 bathroom+0.021
 salon+0.020
Neg logit contributions
 unity-0.034
Fin-0.033
oc-0.032
 course-0.031
 bond-0.031
Token to
Token ID284
Feature activation+3.43e-03

Pos logit contributions
angers+0.045
elt+0.042
 office+0.041
 bathroom+0.038
 salon+0.034
Neg logit contributions
 course-0.067
 unity-0.066
 matter-0.065
Fin-0.065
oc-0.065
Token happen
Token ID1645
Feature activation-3.85e-02

Pos logit contributions
oc+0.007
Fin+0.006
ience+0.006
Mic+0.006
haw+0.006
Neg logit contributions
 door-0.005
 bathroom-0.004
 office-0.004
 upstairs-0.004
ble-0.004
Token beautiful
Token ID4950
Feature activation+7.57e-02

Pos logit contributions
 unity+0.047
 course+0.046
 matter+0.046
leness+0.045
forts+0.044
Neg logit contributions
 office-0.041
angers-0.040
 bathroom-0.038
elt-0.037
 salon-0.035
Token girl
Token ID2576
Feature activation-9.12e-03

Pos logit contributions
 matter+0.101
 unity+0.101
 course+0.101
 bond+0.095
Fin+0.094
Neg logit contributions
angers-0.151
elt-0.140
 office-0.139
 bathroom-0.128
 salon-0.125
Token and
Token ID290
Feature activation+4.52e-02

Pos logit contributions
angers+0.018
 office+0.017
elt+0.017
 bathroom+0.015
 salon+0.015
Neg logit contributions
 unity-0.013
 matter-0.012
 course-0.012
Fin-0.012
 bond-0.012
Token a
Token ID257
Feature activation+3.41e-02

Pos logit contributions
 unity+0.081
 matter+0.079
 course+0.079
Fin+0.077
oc+0.077
Neg logit contributions
angers-0.066
elt-0.064
 office-0.063
 bathroom-0.058
 salon-0.054
Token handsome
Token ID22665
Feature activation+4.69e-02

Pos logit contributions
 unity+0.054
 course+0.054
 matter+0.053
leness+0.050
Fin+0.050
Neg logit contributions
angers-0.057
 office-0.056
elt-0.055
 bathroom-0.053
 salon-0.049
Token boy
Token ID2933
Feature activation-6.00e-03

Pos logit contributions
 matter+0.060
 unity+0.058
 course+0.058
 bond+0.057
oc+0.055
Neg logit contributions
angers-0.096
elt-0.090
 office-0.087
 bathroom-0.082
ulk-0.081
Token.
Token ID13
Feature activation-9.54e-03

Pos logit contributions
angers+0.011
elt+0.010
 office+0.010
 bathroom+0.009
 salon+0.009
Neg logit contributions
 course-0.009
 matter-0.009
 unity-0.009
 bond-0.009
Fin-0.009
Token They
Token ID1119
Feature activation+9.89e-02

Pos logit contributions
angers+0.015
elt+0.014
 office+0.014
 bathroom+0.012
 salon+0.012
Neg logit contributions
 matter-0.017
 unity-0.016
 course-0.016
 bond-0.016
Fin-0.016
Token loved
Token ID6151
Feature activation+2.48e-02

Pos logit contributions
 unity+0.175
 matter+0.172
 course+0.169
oc+0.165
 bond+0.163
Neg logit contributions
angers-0.151
elt-0.142
 office-0.137
 bathroom-0.127
 salon-0.120
Token each
Token ID1123
Feature activation+4.48e-02

Pos logit contributions
 unity+0.054
 matter+0.053
 course+0.053
 caring+0.051
 bond+0.051
Neg logit contributions
angers-0.029
elt-0.025
 office-0.025
 bathroom-0.022
 salon-0.021
Token other
Token ID584
Feature activation+1.98e-02

Pos logit contributions
 unity+0.066
 course+0.062
Fin+0.061
 matter+0.061
leness+0.060
Neg logit contributions
angers-0.082
 office-0.081
elt-0.075
 bathroom-0.074
 salon-0.070
Token,
Token ID11
Feature activation-4.43e-02

Pos logit contributions
 matter+0.138
Fin+0.136
 unity+0.133
 course+0.132
Fred+0.127
Neg logit contributions
angers-0.072
 office-0.069
elt-0.064
 bathroom-0.059
 salon-0.054
Token who
Token ID508
Feature activation+8.26e-02

Pos logit contributions
 office+0.058
angers+0.053
 bathroom+0.051
elt+0.050
 salon+0.048
Neg logit contributions
leness-0.089
oc-0.087
Fin-0.084
forts-0.084
 matter-0.083
Token know
Token ID760
Feature activation+3.23e-02

Pos logit contributions
Fin+0.129
 souven+0.128
 matter+0.127
Mic+0.127
Fred+0.125
Neg logit contributions
 office-0.138
angers-0.118
elt-0.115
 bathroom-0.114
uck-0.110
Token what
Token ID644
Feature activation-1.83e-02

Pos logit contributions
Fin+0.054
oc+0.054
 matter+0.053
 unity+0.052
 bond+0.051
Neg logit contributions
 office-0.049
angers-0.049
elt-0.044
 bathroom-0.044
 door-0.041
Token is
Token ID318
Feature activation-6.77e-02

Pos logit contributions
angers+0.028
 office+0.027
elt+0.027
 bathroom+0.025
 salon+0.023
Neg logit contributions
 matter-0.031
 unity-0.031
 course-0.030
Fin-0.030
oc-0.029
Token best
Token ID1266
Feature activation+5.63e-02

Pos logit contributions
angers+0.114
 office+0.108
elt+0.103
 bathroom+0.097
 upstairs+0.090
Neg logit contributions
 course-0.110
 matter-0.108
 unity-0.107
oc-0.105
Fin-0.105
Token for
Token ID329
Feature activation+2.92e-02

Pos logit contributions
 unity+0.095
 matter+0.095
 course+0.095
Fin+0.090
 bond+0.090
Neg logit contributions
angers-0.093
elt-0.085
 office-0.084
 bathroom-0.077
 salon-0.074
Token us
Token ID514
Feature activation-2.35e-02

Pos logit contributions
 matter+0.046
 unity+0.045
oc+0.045
 course+0.044
Fin+0.044
Neg logit contributions
angers-0.050
 office-0.047
elt-0.044
 bathroom-0.043
 upstairs-0.041
Token and
Token ID290
Feature activation+5.71e-02

Pos logit contributions
angers+0.031
 office+0.029
elt+0.029
 bathroom+0.026
 salon+0.024
Neg logit contributions
 matter-0.046
 unity-0.045
 course-0.045
Fin-0.044
 bond-0.043
Token want
Token ID765
Feature activation+3.90e-02

Pos logit contributions
Fin+0.084
 course+0.084
oc+0.084
 matter+0.082
 unity+0.078
Neg logit contributions
 office-0.100
angers-0.097
elt-0.093
 bathroom-0.091
 door-0.085
Token to
Token ID284
Feature activation+1.55e-02

Pos logit contributions
oc+0.095
Fin+0.093
 matter+0.092
 unity+0.092
 course+0.092
Neg logit contributions
angers-0.029
 office-0.028
elt-0.027
 bathroom-0.024
 door-0.023
Token with
Token ID351
Feature activation+4.47e-02

Pos logit contributions
angers+0.092
elt+0.084
 office+0.084
 bathroom+0.075
 salon+0.075
Neg logit contributions
 matter-0.073
 course-0.072
 unity-0.071
 kindness-0.068
 souven-0.067
Token the
Token ID262
Feature activation-3.01e-02

Pos logit contributions
 bond+0.091
 matter+0.089
oc+0.088
 unity+0.088
leness+0.086
Neg logit contributions
elt-0.052
 office-0.050
 door-0.048
 bathroom-0.044
angers-0.044
Token toy
Token ID13373
Feature activation+2.75e-02

Pos logit contributions
 office+0.038
 door+0.037
 bathroom+0.036
angers+0.036
 knee+0.036
Neg logit contributions
 unity-0.059
oc-0.058
Fin-0.058
From-0.058
leness-0.057
Token car
Token ID1097
Feature activation+7.47e-03

Pos logit contributions
 course+0.052
 unity+0.052
 matter+0.052
 bond+0.050
Fin+0.050
Neg logit contributions
angers-0.039
elt-0.036
 office-0.035
 bathroom-0.032
 salon-0.030
Token.
Token ID13
Feature activation-1.72e-02

Pos logit contributions
 course+0.014
 matter+0.014
 unity+0.014
 kindness+0.013
 bond+0.013
Neg logit contributions
angers-0.011
elt-0.010
 office-0.009
 bathroom-0.009
 salon-0.008
Token Tom
Token ID4186
Feature activation-3.11e-03

Pos logit contributions
angers+0.038
 office+0.034
elt+0.033
tub+0.032
 bathroom+0.032
Neg logit contributions
 course-0.020
 matter-0.020
 unity-0.019
From-0.019
Fin-0.018
Token thought
Token ID1807
Feature activation-1.67e-02

Pos logit contributions
angers+0.005
elt+0.004
 office+0.004
 bathroom+0.004
 salon+0.004
Neg logit contributions
 unity-0.006
 matter-0.006
 course-0.006
 bond-0.005
Fin-0.005
Token of
Token ID286
Feature activation+8.89e-02

Pos logit contributions
angers+0.028
 office+0.026
elt+0.025
 bathroom+0.023
 salon+0.023
Neg logit contributions
 unity-0.028
 matter-0.028
 bond-0.027
Fin-0.027
 course-0.026
Token a
Token ID257
Feature activation+4.85e-02

Pos logit contributions
forts+0.174
Fin+0.173
 matter+0.172
 unity+0.172
leness+0.171
Neg logit contributions
 office-0.110
angers-0.100
elt-0.096
 bathroom-0.094
azes-0.084
Token plan
Token ID1410
Feature activation+9.54e-03

Pos logit contributions
leness+0.076
 unity+0.075
oc+0.074
Fin+0.074
 course+0.073
Neg logit contributions
 office-0.076
 bathroom-0.073
 door-0.070
 knee-0.064
 kitchen-0.064
Token to
Token ID284
Feature activation+2.52e-02

Pos logit contributions
 course+0.019
 matter+0.019
 unity+0.019
 bond+0.018
Fin+0.018
Neg logit contributions
angers-0.013
elt-0.012
 office-0.011
 bathroom-0.010
 salon-0.010
Token clumsy
Token ID39120
Feature activation+6.17e-02

Pos logit contributions
angers+0.114
 office+0.111
elt+0.109
 bathroom+0.102
 salon+0.094
Neg logit contributions
 course-0.111
 unity-0.109
oc-0.108
 matter-0.108
 bond-0.106
Token sometimes
Token ID3360
Feature activation+1.33e-03

Pos logit contributions
 unity+0.127
 matter+0.125
Fin+0.123
 course+0.122
 bond+0.122
Neg logit contributions
angers-0.077
elt-0.070
 office-0.070
 bathroom-0.063
tub-0.057
Token,
Token ID11
Feature activation-5.29e-02

Pos logit contributions
 unity+0.003
 matter+0.003
 course+0.003
Fin+0.003
 bond+0.003
Neg logit contributions
angers-0.002
 office-0.001
elt-0.001
 bathroom-0.001
 upstairs-0.001
Token he
Token ID339
Feature activation+1.47e-02

Pos logit contributions
angers+0.071
 office+0.071
elt+0.069
 bathroom+0.067
tub+0.059
Neg logit contributions
forts-0.094
oc-0.093
 bond-0.091
 unity-0.090
leness-0.088
Token could
Token ID714
Feature activation-1.35e-02

Pos logit contributions
 unity+0.026
 matter+0.026
 souven+0.025
Fin+0.025
 bond+0.025
Neg logit contributions
angers-0.021
elt-0.021
 office-0.020
 bathroom-0.018
uck-0.017
Token still
Token ID991
Feature activation+3.46e-02

Pos logit contributions
 office+0.021
angers+0.020
 bathroom+0.019
elt+0.019
 salon+0.018
Neg logit contributions
 unity-0.023
 matter-0.022
Fin-0.021
 bond-0.021
 course-0.021
Token be
Token ID307
Feature activation-3.10e-02

Pos logit contributions
 unity+0.052
 matter+0.051
 course+0.051
Fin+0.050
leness+0.049
Neg logit contributions
 office-0.058
angers-0.057
elt-0.054
 bathroom-0.053
 knee-0.050
Token helpful
Token ID7613
Feature activation+9.05e-02

Pos logit contributions
angers+0.060
elt+0.057
 office+0.057
 bathroom+0.052
 salon+0.050
Neg logit contributions
 matter-0.042
 course-0.042
 unity-0.042
 bond-0.040
Fin-0.040
Token to
Token ID284
Feature activation+3.23e-02

Pos logit contributions
 unity+0.187
 matter+0.186
 course+0.185
Fin+0.182
 bond+0.178
Neg logit contributions
angers-0.112
elt-0.105
 office-0.104
 bathroom-0.089
tub-0.084
Token his
Token ID465
Feature activation-3.75e-02

Pos logit contributions
 unity+0.052
 course+0.051
Fin+0.051
 matter+0.050
 bond+0.050
Neg logit contributions
angers-0.051
 office-0.051
 bathroom-0.046
elt-0.045
 upstairs-0.043
Token friends
Token ID2460
Feature activation+6.98e-02

Pos logit contributions
 office+0.053
angers+0.051
 bathroom+0.047
 knee+0.045
elt+0.044
Neg logit contributions
 course-0.070
 matter-0.069
From-0.067
Fin-0.066
Blue-0.066
Token pleased
Token ID10607
Feature activation+1.65e-02

Pos logit contributions
angers+0.011
elt+0.011
 office+0.011
 bathroom+0.010
 salon+0.010
Neg logit contributions
 matter-0.006
 unity-0.006
 course-0.006
Fin-0.006
 bond-0.006
Token.
Token ID13
Feature activation-1.64e-02

Pos logit contributions
 course+0.026
 unity+0.026
 matter+0.026
Fin+0.025
 bond+0.025
Neg logit contributions
angers-0.029
elt-0.026
 office-0.026
 bathroom-0.024
 salon-0.024
Token What
Token ID1867
Feature activation-2.37e-03

Pos logit contributions
angers+0.031
 office+0.029
elt+0.028
 salon+0.027
 bathroom+0.027
Neg logit contributions
 unity-0.023
 course-0.022
 matter-0.022
From-0.021
 kindness-0.021
Token a
Token ID257
Feature activation+3.86e-02

Pos logit contributions
 office+0.002
elt+0.002
 upstairs+0.002
 bathroom+0.002
ed+0.002
Neg logit contributions
leness-0.005
forts-0.005
 unity-0.005
Fin-0.005
 bond-0.005
Token special
Token ID2041
Feature activation+7.14e-02

Pos logit contributions
Fin+0.053
leness+0.052
 unity+0.049
forts+0.048
 matter+0.045
Neg logit contributions
 office-0.073
 door-0.068
 bathroom-0.068
 cabinet-0.064
 kitchen-0.064
Token night
Token ID1755
Feature activation-8.33e-02

Pos logit contributions
Fin+0.109
leness+0.101
 unity+0.100
forts+0.097
Fred+0.095
Neg logit contributions
 office-0.127
 bathroom-0.117
elt-0.116
 door-0.115
angers-0.115
Token it
Token ID340
Feature activation-7.10e-02

Pos logit contributions
angers+0.147
elt+0.135
 office+0.133
 bathroom+0.123
 salon+0.119
Neg logit contributions
 matter-0.131
 unity-0.131
 course-0.130
 bond-0.123
Fin-0.123
Token had
Token ID550
Feature activation+2.08e-03

Pos logit contributions
angers+0.080
 office+0.079
elt+0.079
 bathroom+0.071
tub+0.066
Neg logit contributions
 unity-0.149
 bond-0.142
 matter-0.142
Fin-0.141
 harmony-0.139
Token been
Token ID587
Feature activation-5.28e-02

Pos logit contributions
 unity+0.004
 matter+0.004
 course+0.004
Fin+0.003
 bond+0.003
Neg logit contributions
angers-0.003
elt-0.003
 office-0.003
 bathroom-0.003
 salon-0.003
Token-
Token ID12
Feature activation-1.24e-02

Pos logit contributions
angers+0.088
elt+0.078
 office+0.077
 bathroom+0.071
 salon+0.070
Neg logit contributions
 course-0.090
 matter-0.089
 unity-0.088
 kindness-0.084
 caring-0.084
Token he
Token ID339
Feature activation+1.61e-02

Pos logit contributions
angers+0.019
elt+0.018
 office+0.017
 bathroom+0.015
 salon+0.015
Neg logit contributions
 unity-0.022
 matter-0.022
 course-0.022
 bond-0.021
Fin-0.021
Token excited
Token ID6568
Feature activation+3.24e-02

Pos logit contributions
angers+0.011
elt+0.010
 office+0.010
 bathroom+0.009
azes+0.009
Neg logit contributions
 matter-0.009
 course-0.008
 unity-0.008
 bond-0.008
Fin-0.008
Token to
Token ID284
Feature activation+1.48e-02

Pos logit contributions
 matter+0.064
 course+0.064
 unity+0.063
Fin+0.060
 kindness+0.060
Neg logit contributions
angers-0.045
 office-0.040
elt-0.039
 bathroom-0.036
 salon-0.035
Token see
Token ID766
Feature activation-3.06e-02

Pos logit contributions
 unity+0.022
 matter+0.022
 course+0.022
Fin+0.022
 bond+0.022
Neg logit contributions
 office-0.025
angers-0.025
 bathroom-0.022
elt-0.022
 upstairs-0.022
Token Joey
Token ID26154
Feature activation-1.47e-02

Pos logit contributions
 office+0.036
angers+0.031
 upstairs+0.030
 outside+0.030
elt+0.030
Neg logit contributions
Ar-0.062
leness-0.062
 matter-0.062
 bond-0.062
forts-0.061
Token had
Token ID550
Feature activation+1.38e-02

Pos logit contributions
angers+0.022
 office+0.021
elt+0.020
 bathroom+0.018
 salon+0.018
Neg logit contributions
 unity-0.027
 matter-0.027
 course-0.026
Fin-0.026
 bond-0.025
Token found
Token ID1043
Feature activation-3.80e-02

Pos logit contributions
 unity+0.022
 matter+0.022
 course+0.022
Fin+0.021
 bond+0.021
Neg logit contributions
angers-0.023
 office-0.022
elt-0.022
 bathroom-0.020
 salon-0.019
Token the
Token ID262
Feature activation-3.55e-02

Pos logit contributions
 office+0.055
angers+0.053
elt+0.051
 door+0.047
 bathroom+0.047
Neg logit contributions
 matter-0.070
Fin-0.068
leness-0.067
 unity-0.067
 bond-0.066
Token key
Token ID1994
Feature activation-4.80e-03

Pos logit contributions
 office+0.052
angers+0.048
 bathroom+0.047
 door+0.046
elt+0.044
Neg logit contributions
oc-0.065
leness-0.065
Fin-0.064
 unity-0.063
From-0.063
Token to
Token ID284
Feature activation+2.64e-02

Pos logit contributions
angers+0.007
 office+0.006
elt+0.006
 bathroom+0.005
 salon+0.005
Neg logit contributions
 unity-0.009
 matter-0.009
 course-0.009
Fin-0.009
 bond-0.009
Token unlock
Token ID12116
Feature activation+1.23e-02

Pos logit contributions
Fin+0.040
leness+0.040
 unity+0.040
forts+0.039
 bond+0.038
Neg logit contributions
 office-0.045
angers-0.041
 door-0.039
 bathroom-0.039
 upstairs-0.038
Token it
Token ID340
Feature activation-8.52e-02

Pos logit contributions
 unity+0.021
Fin+0.021
 matter+0.021
 bond+0.021
forts+0.021
Neg logit contributions
angers-0.018
 office-0.018
 bathroom-0.016
 door-0.016
elt-0.016
Token picked
Token ID6497
Feature activation-2.61e-02

Pos logit contributions
 matter+0.078
 unity+0.075
 course+0.075
Fin+0.073
 bond+0.071
Neg logit contributions
angers-0.086
 office-0.077
elt-0.077
 bathroom-0.071
 salon-0.069
Token it
Token ID340
Feature activation-6.89e-02

Pos logit contributions
angers+0.026
elt+0.022
 office+0.022
 bathroom+0.019
 salon+0.017
Neg logit contributions
 matter-0.061
 unity-0.061
 course-0.061
Fin-0.059
 bond-0.059
Token up
Token ID510
Feature activation-4.15e-02

Pos logit contributions
angers+0.067
elt+0.058
 office+0.053
 bathroom+0.046
 salon+0.045
Neg logit contributions
 matter-0.165
 course-0.163
 unity-0.163
 kindness-0.158
 bond-0.157
Token and
Token ID290
Feature activation+3.81e-02

Pos logit contributions
angers+0.062
 office+0.058
elt+0.057
 bathroom+0.052
 salon+0.049
Neg logit contributions
 unity-0.075
 matter-0.074
 course-0.074
Fin-0.072
 bond-0.071
Token put
Token ID1234
Feature activation-1.10e-03

Pos logit contributions
 matter+0.062
 unity+0.061
 course+0.061
Fin+0.057
 bond+0.057
Neg logit contributions
angers-0.067
elt-0.060
 office-0.060
 bathroom-0.055
 salon-0.053
Token it
Token ID340
Feature activation-7.14e-02

Pos logit contributions
angers+0.002
 office+0.002
elt+0.002
 bathroom+0.002
 door+0.002
Neg logit contributions
 unity-0.002
 matter-0.002
Fin-0.002
 bond-0.002
 course-0.002
Token in
Token ID287
Feature activation-3.97e-03

Pos logit contributions
angers+0.102
elt+0.091
 office+0.091
 salon+0.081
 bathroom+0.081
Neg logit contributions
 matter-0.134
 course-0.134
 unity-0.132
 kindness-0.128
 souven-0.127
Token his
Token ID465
Feature activation-3.80e-02

Pos logit contributions
 office+0.005
 bathroom+0.005
angers+0.005
 door+0.005
 kitchen+0.004
Neg logit contributions
Fin-0.008
 unity-0.008
leness-0.008
forts-0.008
ience-0.007
Token pocket
Token ID10000
Feature activation-4.59e-02

Pos logit contributions
angers+0.067
elt+0.060
ulk+0.054
 salon+0.052
azes+0.052
Neg logit contributions
 course-0.063
 bond-0.062
 unity-0.062
 kindness-0.061
 matter-0.060
Token.
Token ID13
Feature activation-1.89e-02

Pos logit contributions
angers+0.070
elt+0.068
 office+0.061
azes+0.058
 bathroom+0.057
Neg logit contributions
 course-0.078
 unity-0.077
 matter-0.076
 kindness-0.076
oc-0.074
Token He
Token ID679
Feature activation+4.96e-02

Pos logit contributions
angers+0.039
 salon+0.034
 office+0.033
elt+0.033
anger+0.032
Neg logit contributions
 unity-0.023
 matter-0.022
 kindness-0.022
 course-0.022
 teamwork-0.022
Token but
Token ID475
Feature activation+2.27e-02

Pos logit contributions
angers+0.113
 office+0.104
elt+0.104
 bathroom+0.100
 salon+0.094
Neg logit contributions
 unity-0.126
 matter-0.124
 course-0.122
 bond-0.122
oc-0.121
Token then
Token ID788
Feature activation-4.06e-02

Pos logit contributions
 unity+0.038
 course+0.036
Fin+0.036
forts+0.036
oc+0.036
Neg logit contributions
angers-0.035
 office-0.034
elt-0.033
 bathroom-0.032
 salon-0.032
Token she
Token ID673
Feature activation-1.99e-02

Pos logit contributions
angers+0.062
 office+0.058
elt+0.057
 bathroom+0.053
 salon+0.051
Neg logit contributions
 unity-0.071
 matter-0.070
 course-0.070
 bond-0.068
Fin-0.068
Token started
Token ID2067
Feature activation-1.33e-01

Pos logit contributions
angers+0.031
elt+0.029
 office+0.028
 bathroom+0.026
 salon+0.025
Neg logit contributions
 matter-0.035
 unity-0.035
 course-0.035
Fin-0.033
 bond-0.033
Token to
Token ID284
Feature activation-1.61e-02

Pos logit contributions
angers+0.197
 office+0.170
elt+0.160
anger+0.153
amas+0.149
Neg logit contributions
 caring-0.249
 matter-0.247
 unity-0.244
 kindness-0.243
 course-0.239
Token get
Token ID651
Feature activation-8.35e-02

Pos logit contributions
angers+0.029
elt+0.027
 office+0.027
 bathroom+0.025
 salon+0.024
Neg logit contributions
 unity-0.024
 course-0.024
 matter-0.024
Fin-0.023
 bond-0.023
Token used
Token ID973
Feature activation+1.89e-02

Pos logit contributions
angers+0.128
elt+0.124
 office+0.117
 bathroom+0.112
 salon+0.102
Neg logit contributions
 unity-0.147
 course-0.147
 matter-0.145
oc-0.142
 bond-0.142
Token to
Token ID284
Feature activation+1.22e-02

Pos logit contributions
oc+0.041
Fin+0.040
 unity+0.040
 matter+0.039
leness+0.038
Neg logit contributions
angers-0.021
elt-0.020
 office-0.019
 bathroom-0.018
 door-0.017
Token it
Token ID340
Feature activation-8.97e-02

Pos logit contributions
 unity+0.018
 course+0.017
 matter+0.017
Fin+0.017
 bond+0.016
Neg logit contributions
angers-0.022
 office-0.021
elt-0.021
 bathroom-0.020
 salon-0.019
Token.
Token ID13
Feature activation-3.84e-02

Pos logit contributions
angers+0.111
 office+0.099
elt+0.098
 bathroom+0.087
 salon+0.081
Neg logit contributions
 unity-0.187
 matter-0.185
 course-0.185
Fin-0.180
 bond-0.178
Token After
Token ID2293
Feature activation+6.16e-03

Pos logit contributions
angers+0.063
 office+0.058
elt+0.055
 bathroom+0.051
 salon+0.051
Neg logit contributions
 course-0.064
 unity-0.063
 matter-0.063
Fin-0.061
 kindness-0.060
Token We
Token ID775
Feature activation+9.14e-02

Pos logit contributions
elt+0.013
angers+0.012
 office+0.012
 bathroom+0.011
 knee+0.010
Neg logit contributions
 matter-0.018
 course-0.017
leness-0.016
 unity-0.016
 bond-0.016
Token love
Token ID1842
Feature activation+7.27e-02

Pos logit contributions
 unity+0.148
 harmony+0.147
 lasting+0.139
leness+0.139
 souven+0.138
Neg logit contributions
uck-0.129
elt-0.127
 office-0.126
 bathroom-0.123
our-0.123
Token you
Token ID345
Feature activation+5.15e-02

Pos logit contributions
 matter+0.095
 unity+0.093
 course+0.092
Fin+0.091
oc+0.090
Neg logit contributions
angers-0.145
 office-0.139
elt-0.138
 bathroom-0.131
 salon-0.125
Token,
Token ID11
Feature activation-2.81e-02

Pos logit contributions
 matter+0.112
 unity+0.111
Fin+0.108
 course+0.107
 bond+0.105
Neg logit contributions
angers-0.056
 office-0.054
elt-0.050
 bathroom-0.048
 door-0.044
Token mom
Token ID1995
Feature activation-2.06e-02

Pos logit contributions
 office+0.021
 bathroom+0.021
 knee+0.019
 salon+0.019
 door+0.019
Neg logit contributions
leness-0.077
oc-0.073
Ar-0.072
 unity-0.072
Fin-0.071
Token!"
Token ID2474
Feature activation-3.83e-02

Pos logit contributions
angers+0.027
 office+0.025
elt+0.024
 bathroom+0.022
 salon+0.021
Neg logit contributions
 unity-0.041
 matter-0.041
 course-0.041
Fin-0.040
 bond-0.040
Token
Token ID198
Feature activation+6.99e-02

Pos logit contributions
angers+0.076
 office+0.072
elt+0.070
 bathroom+0.065
 salon+0.064
Neg logit contributions
 unity-0.050
 course-0.050
 matter-0.048
 bond-0.047
 kindness-0.047
Token
Token ID198
Feature activation+7.26e-02

Pos logit contributions
 unity+0.114
 matter+0.113
 course+0.113
Fin+0.108
 bond+0.108
Neg logit contributions
angers-0.119
elt-0.109
 office-0.108
 bathroom-0.099
 salon-0.096
TokenThey
Token ID2990
Feature activation+8.28e-02

Pos logit contributions
 matter+0.122
 unity+0.113
 course+0.112
leness+0.104
 bond+0.103
Neg logit contributions
angers-0.124
 office-0.122
 bathroom-0.117
elt-0.114
 salon-0.110
Token hugged
Token ID41130
Feature activation+4.10e-02

Pos logit contributions
 unity+0.143
 matter+0.143
 harmony+0.133
 course+0.132
 bond+0.132
Neg logit contributions
angers-0.131
elt-0.124
 office-0.124
 bathroom-0.116
 salon-0.113
Token their
Token ID511
Feature activation+3.96e-03

Pos logit contributions
 unity+0.067
 matter+0.067
 course+0.067
Fin+0.064
 bond+0.062
Neg logit contributions
angers-0.070
 office-0.064
elt-0.063
 bathroom-0.060
 salon-0.057
Token to
Token ID284
Feature activation+1.32e-02

Pos logit contributions
 unity+0.082
 matter+0.082
 course+0.081
oc+0.080
 bond+0.080
Neg logit contributions
angers-0.034
elt-0.031
 office-0.031
 bathroom-0.027
 salon-0.024
Token find
Token ID1064
Feature activation-5.39e-02

Pos logit contributions
 unity+0.021
 course+0.020
 matter+0.020
 bond+0.020
Fin+0.019
Neg logit contributions
angers-0.022
 office-0.021
 bathroom-0.020
elt-0.020
 salon-0.019
Token out
Token ID503
Feature activation-3.53e-02

Pos logit contributions
angers+0.071
elt+0.068
 office+0.068
 bathroom+0.062
 door+0.056
Neg logit contributions
 unity-0.104
 matter-0.104
 bond-0.101
leness-0.100
oc-0.100
Token what
Token ID644
Feature activation-2.52e-02

Pos logit contributions
angers+0.058
 office+0.056
elt+0.053
 bathroom+0.050
 upstairs+0.048
Neg logit contributions
 matter-0.057
 unity-0.057
 course-0.055
Fin-0.055
 bond-0.055
Tokenâ
Token ID22940
Feature activation+4.14e-03

Pos logit contributions
angers+0.051
elt+0.046
 office+0.045
 bathroom+0.042
 salon+0.041
Neg logit contributions
 matter-0.035
 unity-0.034
 course-0.034
 kindness-0.032
Fin-0.032
Token
Token ID26391
Feature activation-1.32e-01

Pos logit contributions
 course+0.008
 matter+0.008
 unity+0.007
Fin+0.007
 bond+0.007
Neg logit contributions
angers-0.007
elt-0.006
 office-0.006
 bathroom-0.005
 door-0.005
Token
Token ID8151
Feature activation+2.92e-03

Pos logit contributions
angers+0.215
elt+0.202
 office+0.201
 salon+0.188
 bathroom+0.188
Neg logit contributions
Fin-0.207
oc-0.206
 course-0.205
 unity-0.197
 harmony-0.193
Tokens
Token ID82
Feature activation-1.11e-02

Pos logit contributions
 course+0.004
Fin+0.004
 matter+0.004
oc+0.003
 unity+0.003
Neg logit contributions
angers-0.006
 office-0.006
 bathroom-0.005
elt-0.005
 salon-0.005
Token there
Token ID612
Feature activation-3.63e-03

Pos logit contributions
angers+0.018
elt+0.017
 office+0.015
amas+0.014
 salon+0.014
Neg logit contributions
 matter-0.020
 unity-0.019
 course-0.019
 kindness-0.018
 remind-0.018
Token.
Token ID13
Feature activation-1.83e-02

Pos logit contributions
angers+0.006
elt+0.006
 office+0.006
 bathroom+0.005
 salon+0.005
Neg logit contributions
 course-0.006
 matter-0.006
 unity-0.006
Fin-0.006
 bond-0.005
Tokenâ
Token ID22940
Feature activation+1.29e-02

Pos logit contributions
angers+0.031
elt+0.030
 office+0.028
 bathroom+0.026
amas+0.025
Neg logit contributions
 matter-0.031
 unity-0.030
 course-0.030
 bond-0.029
leness-0.028
Token hug
Token ID16225
Feature activation-9.93e-03

Pos logit contributions
 unity+0.005
Fin+0.005
 course+0.005
leness+0.005
 matter+0.005
Neg logit contributions
 office-0.006
 bathroom-0.006
elt-0.006
angers-0.006
 door-0.006
Token,
Token ID11
Feature activation-7.86e-02

Pos logit contributions
angers+0.015
elt+0.014
 office+0.014
 bathroom+0.013
 salon+0.012
Neg logit contributions
 unity-0.018
 matter-0.018
 course-0.017
Fin-0.017
 bond-0.017
Token and
Token ID290
Feature activation+1.67e-02

Pos logit contributions
elt+0.117
angers+0.116
 office+0.109
 bathroom+0.104
 salon+0.096
Neg logit contributions
 unity-0.141
 matter-0.139
 course-0.137
Fin-0.135
oc-0.134
Token they
Token ID484
Feature activation+2.47e-02

Pos logit contributions
 unity+0.027
 course+0.027
 matter+0.027
 bond+0.026
Fin+0.026
Neg logit contributions
angers-0.028
elt-0.026
 office-0.026
 bathroom-0.024
ed-0.023
Token both
Token ID1111
Feature activation+1.83e-02

Pos logit contributions
 unity+0.047
 matter+0.047
 course+0.045
Fin+0.044
oc+0.044
Neg logit contributions
angers-0.032
elt-0.030
 office-0.029
ed-0.028
 bathroom-0.028
Token started
Token ID2067
Feature activation-1.32e-01

Pos logit contributions
 unity+0.033
 matter+0.032
 course+0.032
Fin+0.031
 caring+0.030
Neg logit contributions
angers-0.026
elt-0.026
 office-0.024
 bathroom-0.023
ed-0.022
Token to
Token ID284
Feature activation-1.27e-02

Pos logit contributions
angers+0.160
elt+0.138
 office+0.137
 bathroom+0.118
 salon+0.115
Neg logit contributions
 unity-0.282
 matter-0.280
 course-0.278
 bond-0.270
 kindness-0.269
Token plan
Token ID1410
Feature activation-2.67e-02

Pos logit contributions
angers+0.021
 office+0.021
 bathroom+0.021
elt+0.020
 salon+0.019
Neg logit contributions
 unity-0.019
 course-0.018
Fin-0.018
 matter-0.018
oc-0.018
Token their
Token ID511
Feature activation-2.62e-02

Pos logit contributions
angers+0.037
 office+0.036
elt+0.032
 bathroom+0.031
 upstairs+0.030
Neg logit contributions
 matter-0.050
 course-0.050
Fin-0.050
 unity-0.049
oc-0.049
Token next
Token ID1306
Feature activation-8.21e-03

Pos logit contributions
 office+0.043
angers+0.041
 bathroom+0.040
elt+0.037
 salon+0.037
Neg logit contributions
 course-0.043
Fin-0.042
oc-0.042
leness-0.042
 matter-0.041
Token race
Token ID3234
Feature activation+7.88e-03

Pos logit contributions
 office+0.015
angers+0.015
elt+0.014
 bathroom+0.013
 door+0.013
Neg logit contributions
 unity-0.012
leness-0.012
Fin-0.012
From-0.012
 course-0.011
Token
Token ID8151
Feature activation-1.30e-02

Pos logit contributions
angers+0.250
elt+0.227
 office+0.225
 bathroom+0.216
 salon+0.205
Neg logit contributions
oc-0.217
 course-0.216
 unity-0.215
Fin-0.213
 matter-0.205
Tokent
Token ID83
Feature activation+3.65e-02

Pos logit contributions
angers+0.024
 office+0.022
elt+0.021
 bathroom+0.021
azes+0.019
Neg logit contributions
 course-0.019
Fin-0.019
oc-0.018
 matter-0.018
 unity-0.018
Token believe
Token ID1975
Feature activation-1.83e-03

Pos logit contributions
 unity+0.042
 course+0.037
 bond+0.037
 harmony+0.036
oc+0.036
Neg logit contributions
angers-0.076
 office-0.075
elt-0.072
 bathroom-0.070
 upstairs-0.068
Token you
Token ID345
Feature activation+2.56e-02

Pos logit contributions
angers+0.004
elt+0.003
 office+0.003
 bathroom+0.003
 salon+0.003
Neg logit contributions
 unity-0.003
 matter-0.003
 course-0.003
 bond-0.002
Fin-0.002
Tokenâ
Token ID22940
Feature activation-7.14e-03

Pos logit contributions
 unity+0.039
Fin+0.038
 matter+0.038
 bond+0.037
 caring+0.036
Neg logit contributions
angers-0.043
 office-0.042
elt-0.040
 bathroom-0.039
 door-0.038
Token
Token ID26391
Feature activation-1.41e-01

Pos logit contributions
angers+0.009
elt+0.008
 office+0.008
 bathroom+0.007
 salon+0.007
Neg logit contributions
 matter-0.015
 course-0.015
 unity-0.015
Fin-0.014
 bond-0.014
Token
Token ID8151
Feature activation-1.32e-02

Pos logit contributions
angers+0.259
elt+0.240
 office+0.237
 bathroom+0.227
 salon+0.216
Neg logit contributions
 course-0.203
Fin-0.200
oc-0.200
 unity-0.199
 matter-0.191
Tokenre
Token ID260
Feature activation-3.51e-02

Pos logit contributions
angers+0.022
 office+0.021
elt+0.020
 bathroom+0.019
azes+0.018
Neg logit contributions
 course-0.022
Fin-0.021
 matter-0.020
 unity-0.020
oc-0.020
Token here
Token ID994
Feature activation-1.64e-02

Pos logit contributions
angers+0.061
 office+0.056
elt+0.056
 bathroom+0.051
 salon+0.049
Neg logit contributions
 matter-0.056
 unity-0.056
 course-0.055
Fin-0.053
 bond-0.053
Token!
Token ID0
Feature activation-3.99e-02

Pos logit contributions
angers+0.023
 office+0.022
elt+0.021
 bathroom+0.020
 salon+0.018
Neg logit contributions
 unity-0.030
 matter-0.030
 course-0.030
Fin-0.029
 bond-0.029
Tokenâ
Token ID22940
Feature activation-1.57e-03

Pos logit contributions
angers+0.068
elt+0.065
 office+0.063
 bathroom+0.058
 salon+0.055
Neg logit contributions
 matter-0.065
 unity-0.064
 course-0.062
 bond-0.062
Fin-0.059
Token mother
Token ID2802
Feature activation-5.53e-02

Pos logit contributions
angers+0.070
 office+0.067
elt+0.064
 bathroom+0.059
 salon+0.056
Neg logit contributions
 course-0.094
 unity-0.094
 matter-0.094
Fin-0.091
 bond-0.089
Token said
Token ID531
Feature activation-5.06e-02

Pos logit contributions
angers+0.086
 office+0.084
elt+0.082
 bathroom+0.074
 salon+0.072
Neg logit contributions
 unity-0.095
 matter-0.093
Fin-0.092
 course-0.090
 bond-0.089
Token,
Token ID11
Feature activation-8.07e-02

Pos logit contributions
angers+0.084
elt+0.075
 office+0.074
 bathroom+0.068
 salon+0.066
Neg logit contributions
 course-0.086
 unity-0.086
 matter-0.085
 kindness-0.082
 caring-0.081
Token
Token ID6184
Feature activation+1.28e-04

Pos logit contributions
angers+0.088
elt+0.079
 office+0.078
 bathroom+0.067
 salon+0.062
Neg logit contributions
 matter-0.180
 unity-0.180
 course-0.179
 bond-0.174
Fin-0.173
Token
Token ID95
Feature activation-8.14e-03

Pos logit contributions
 course+0.000
 unity+0.000
Fin+0.000
 matter+0.000
oc+0.000
Neg logit contributions
angers-0.000
elt-0.000
 office-0.000
 bathroom-0.000
 salon-0.000
Token
Token ID26391
Feature activation-1.42e-01

Pos logit contributions
angers+0.014
elt+0.013
 office+0.012
 bathroom+0.012
 salon+0.012
Neg logit contributions
 course-0.013
 unity-0.013
oc-0.012
 matter-0.012
Fin-0.012
Token
Token ID129
Feature activation-2.16e-02

Pos logit contributions
angers+0.273
elt+0.256
 office+0.247
 bathroom+0.239
 salon+0.227
Neg logit contributions
oc-0.196
Fin-0.185
 course-0.184
 unity-0.173
From-0.172
Token
Token ID241
Feature activation-7.76e-02

Pos logit contributions
elt+0.037
angers+0.037
 office+0.032
 bathroom+0.031
tub+0.031
Neg logit contributions
 course-0.034
 unity-0.034
From-0.034
Fin-0.033
leness-0.033
TokenThere
Token ID1858
Feature activation+5.12e-02

Pos logit contributions
angers+0.153
elt+0.143
 office+0.142
 bathroom+0.132
 salon+0.128
Neg logit contributions
 unity-0.105
 matter-0.105
 course-0.104
 bond-0.098
Fin-0.097
Token,
Token ID11
Feature activation-6.95e-02

Pos logit contributions
 unity+0.090
 matter+0.089
 course+0.088
Fin+0.085
 bond+0.085
Neg logit contributions
angers-0.080
 office-0.073
elt-0.073
 bathroom-0.067
 salon-0.064
Token there
Token ID612
Feature activation-1.86e-02

Pos logit contributions
angers+0.120
 office+0.115
elt+0.109
 bathroom+0.107
 salon+0.102
Neg logit contributions
 unity-0.109
 matter-0.107
 course-0.105
 bond-0.105
leness-0.103
Token up
Token ID510
Feature activation-4.71e-02

Pos logit contributions
angers+0.056
 office+0.052
elt+0.051
ed+0.047
 door+0.046
Neg logit contributions
 matter-0.073
 unity-0.072
Fin-0.071
 bond-0.071
 course-0.069
Token her
Token ID607
Feature activation-7.08e-02

Pos logit contributions
angers+0.057
 office+0.057
 bathroom+0.050
elt+0.050
 door+0.049
Neg logit contributions
 matter-0.097
 bond-0.095
Fin-0.095
 unity-0.095
leness-0.094
Token special
Token ID2041
Feature activation+4.91e-02

Pos logit contributions
angers+0.108
 office+0.107
elt+0.098
 bathroom+0.095
 door+0.093
Neg logit contributions
Fin-0.125
 unity-0.122
 matter-0.120
 course-0.118
leness-0.116
Token pen
Token ID3112
Feature activation-3.28e-02

Pos logit contributions
Fin+0.085
 unity+0.083
 course+0.081
 matter+0.080
 caring+0.078
Neg logit contributions
 office-0.075
angers-0.075
elt-0.069
 bathroom-0.067
 door-0.066
Token and
Token ID290
Feature activation+3.69e-02

Pos logit contributions
angers+0.055
elt+0.050
 office+0.050
 bathroom+0.046
 salon+0.044
Neg logit contributions
 matter-0.054
 unity-0.054
 course-0.054
Fin-0.052
 bond-0.051
Token started
Token ID2067
Feature activation-1.28e-01

Pos logit contributions
 unity+0.061
 course+0.061
 matter+0.060
Fin+0.060
 bond+0.058
Neg logit contributions
angers-0.061
 office-0.057
elt-0.056
 bathroom-0.052
 salon-0.050
Token writing
Token ID3597
Feature activation+5.13e-04

Pos logit contributions
angers+0.163
elt+0.147
 office+0.136
 bathroom+0.123
amas+0.121
Neg logit contributions
 unity-0.260
 matter-0.257
 course-0.255
 kindness-0.252
 caring-0.251
Token down
Token ID866
Feature activation-8.42e-02

Pos logit contributions
 unity+0.001
 matter+0.001
 course+0.001
Fin+0.001
 bond+0.001
Neg logit contributions
angers-0.001
 office-0.001
elt-0.001
 bathroom-0.001
 salon-0.001
Token all
Token ID477
Feature activation-9.96e-03

Pos logit contributions
angers+0.125
 office+0.123
 door+0.107
 bathroom+0.105
elt+0.104
Neg logit contributions
Fin-0.158
 unity-0.149
 matter-0.147
 course-0.146
 bond-0.146
Token the
Token ID262
Feature activation-5.42e-02

Pos logit contributions
 office+0.015
angers+0.015
elt+0.013
 bathroom+0.013
 door+0.012
Neg logit contributions
Fin-0.018
 unity-0.017
oc-0.017
Blue-0.016
 matter-0.016
Token fun
Token ID1257
Feature activation+4.67e-02

Pos logit contributions
 office+0.085
angers+0.075
 door+0.073
 kitchen+0.071
 bathroom+0.071
Neg logit contributions
Fin-0.102
 unity-0.090
From-0.090
Fred-0.088
Blue-0.086
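Every record above has the same layout:

- a `Token` line, with the token text appended directly and any leading space kept;
- a `Token ID` line;
- a `Feature activation` line;
- a `Pos logit contributions` list and a `Neg logit contributions` list, each with five entries in this dump. Each entry is a token string immediately followed by a signed value.

The sketch below loads such a dump into structured records so the activations can be sorted or filtered. It is a minimal sketch and assumes the text is exactly as rendered here:

- one field per line;
- newline and partial-byte tokens show up as a bare `Token` line with no blank line after it;
- no token string itself begins with ` ID`.

`TokenRecord` and `parse_records` are hypothetical names and are not part of any dashboard tool.

```python
import re
from dataclasses import dataclass, field

# One logit-contribution entry: token text (leading space kept) immediately
# followed by a signed decimal value, e.g. " office+0.071" or "angers-0.112".
ENTRY_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")


@dataclass
class TokenRecord:
    token: str          # token text exactly as rendered (may be empty)
    token_id: int
    activation: float
    pos_logits: list[tuple[str, float]] = field(default_factory=list)
    neg_logits: list[tuple[str, float]] = field(default_factory=list)


def parse_records(text: str) -> list[TokenRecord]:
    """Parse a dump of per-token records into TokenRecord objects.

    Assumption: lines before the first complete "Token"/"Token ID" header
    (e.g. the tail of a record cut off at a chunk boundary) are ignored.
    """
    lines = text.split("\n")
    records: list[TokenRecord] = []
    current = None
    section = None
    i = 0
    while i < len(lines):
        line = lines[i]
        # A record header is a "Token<text>" line directly followed by "Token ID<n>".
        if (line.startswith("Token") and i + 1 < len(lines)
                and lines[i + 1].startswith("Token ID")):
            current = TokenRecord(
                token=line[len("Token"):],
                token_id=int(lines[i + 1][len("Token ID"):]),
                activation=float("nan"),  # set by the "Feature activation" line
            )
            records.append(current)
            section = None
            i += 2
            continue
        if current is not None:
            if line.startswith("Feature activation"):
                current.activation = float(line[len("Feature activation"):])
            elif line == "Pos logit contributions":
                section = current.pos_logits
            elif line == "Neg logit contributions":
                section = current.neg_logits
            elif section is not None:
                m = ENTRY_RE.match(line)
                if m:
                    section.append((m.group(1), float(m.group(2))))
        i += 1
    return records
```

For example, `sorted(parse_records(dump), key=lambda r: r.activation)[:5]` returns the five records on which this feature is most negative. Note that a record cut off at the end of a chunk keeps `activation` as NaN if its activation line is missing.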