TOP ACTIVATIONS
MAX = 8.338

I am going to give you a
I am here to deliver your present
show everyone how special I am !" She practiced dancing and
I am stronger than you ! "
't you moving like I am ?" the infant wondered .
I am the symbol of the forest
is my castle and I am the queen !"
said , " Oh c 'm on Bill , just this
, " No , I am a very patient man ,
't you know that I am hungry too ? You should
I am scared of this pitch and
I am a powerful leaf creature !
I am looking for a serious star
I am strong , I create ,
I am Ty the Water Cat .
fly around like the wheel was doing .
patient man , but I am not fond of people coming
I am waiting for someone to pick
I am racing the wind ! "
I am called a squirrel . I

MIN ACTIVATIONS
MIN = -8.979

They came back late in the evening , tired
. But it was too late . The ice was too
But it is too late . The truck hits Tom
laughed . Everyone celebrated until late into the night !
. " It 's too late now and the peach is
John went outside late , when it was dark
. But it was too late . The man grabbed Ben
cry . It was too late - he had already been
But it was too late . The fence collapsed and
. But it was too late . The candy was gone
Tom and Sue were too late . They could not run
But it was too late . The dog bit Tom
But it was too late . The snow man was
But it was too late . Sam reached the shiny
two friends stayed out until late that evening , looking at
But it was too late . No one heard her
laughed , and talked until late at night . <|endoftext|> Once
." But it was too late . The poison had damaged
But it was too late . Ben touched the w
But it was too late . The man never got
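The dashboard does not say how the TOP and MIN examples are chosen. The sketch below assumes each entry is one of the k highest (or lowest) token activations, shown with a few tokens of context on either side. The function name `extreme_contexts`, the window sizes, and the toy input are illustrative assumptions. The toy activations are copied from the per-token records for the snippet "show everyone how special I am !" She practiced dancing and", so the result is a ranking within that one snippet, not the corpus-wide ranking the lists above come from.

```python
from typing import List, Tuple

def extreme_contexts(
    tokens: List[str],
    acts: List[float],
    k: int = 3,
    left: int = 4,   # hypothetical: tokens of context before the position
    right: int = 2,  # hypothetical: tokens of context after the position
) -> Tuple[List[Tuple[float, str]], List[Tuple[float, str]]]:
    """Return the k highest- and k lowest-activating positions with context."""
    if len(tokens) != len(acts):
        raise ValueError("tokens and acts must have the same length")
    k = max(0, min(k, len(acts)))

    def window(i: int) -> str:
        lo = max(0, i - left)
        hi = min(len(tokens), i + right + 1)
        return "".join(tokens[lo:hi])

    # Positions sorted by activation, lowest first.
    order = sorted(range(len(acts)), key=lambda i: acts[i])
    top = [(acts[i], window(i)) for i in reversed(order[len(order) - k:])]
    bottom = [(acts[i], window(i)) for i in order[:k]]
    return top, bottom

# Activations taken from the per-token records for this snippet.
tokens = [" show", " everyone", " how", " special", " I", " am", '!"',
          " She", " practiced", " dancing", " and"]
acts = [0.653, -0.0829, 2.16, -5.24, 0.578, 8.11, 0.0792,
        -2.00, -2.16, -3.61, -4.83]

top, bottom = extreme_contexts(tokens, acts, k=2)
for act, ctx in top + bottom:
    print(f"{act:+.2f}  {ctx}")
```

Within this snippet the " am" after "I" is the peak, matching the "I am" pattern that dominates the TOP list.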

INTERVAL 4.009 - 8.338
CONTAINS 6.526%

dog named Max . Max was very strong . Lucy told
" You did a good thing . You rescued this bird
down the mountain , it was too dark . She could
exciting and new , but a little scary too . He
happy train . The train was cheerful and loved to travel

INTERVAL -0.321 - 4.009
CONTAINS 49.470%

was a little girl called Pearl . Pearl loved playing outside
color to match the water . The storm went away ,
time there was a little girl . She had a plastic
. They like to play in the park . One day
her . Even though the rain was gone , the day

INTERVAL -4.650 - -0.321
CONTAINS 41.065%

a perfect place to just relax . He walked and walked
nap . They dream of the river . <|endoftext|> L ily
. At the party , there was a big
goodbye to their friends . They get in the car .
." They bought some food for the man and gave it

INTERVAL -8.979 - -4.650
CONTAINS 2.939%

fun together . They are happy . <|endoftext|> L ily liked
are hurt . They are sad . The j
crib where she would sit and knit all day long .
<|endoftext|> Once upon a time there were two friends , their
, Lily held her family and friends close to her heart
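The four interval bounds are consistent with splitting the range from MIN to MAX into four equal-width bins: (8.338 - (-8.979)) / 4 = 4.329, and -8.979 + 4.329 = -4.650, -4.650 + 4.329 = -0.321, -0.321 + 4.329 ≈ 4.009. The four CONTAINS percentages (6.526 + 49.470 + 41.065 + 2.939) also sum to 100%. A sketch under that equal-width assumption; the helper names are hypothetical, and because MIN and MAX are themselves rounded, the printed bounds should agree with the dashboard only to within about 0.001.

```python
from typing import List, Tuple

def equal_width_bins(lo: float, hi: float, n: int = 4) -> List[Tuple[float, float]]:
    """Split [lo, hi] into n equal-width (start, end) intervals."""
    width = (hi - lo) / n
    edges = [lo + i * width for i in range(n)] + [hi]
    return list(zip(edges[:-1], edges[1:]))

def bin_percentages(acts: List[float], lo: float, hi: float, n: int = 4) -> List[float]:
    """Percentage of activations falling in each equal-width bin, lowest bin first."""
    width = (hi - lo) / n
    counts = [0] * n
    for a in acts:
        # Clamp so the maximum value lands in the last bin instead of bin n.
        i = min(max(int((a - lo) / width), 0), n - 1)
        counts[i] += 1
    return [100.0 * c / len(acts) for c in counts]

# Bounds from MIN and MAX; printed highest bin first, as in the dashboard.
for start, end in reversed(equal_width_bins(-8.979, 8.338)):
    print(f"INTERVAL {start:.3f} - {end:.3f}")

# The reported CONTAINS percentages should account for every activation.
print(round(6.526 + 49.470 + 41.065 + 2.939, 3))
```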
Token
Token ID95
Feature activation-7.17e-01

Pos logit contributions
 seeker+0.009
 desk+0.008
ird+0.007
 laptop+0.007
 liar+0.007
Neg logit contributions
reci-0.008
les-0.008
orting-0.007
ling-0.007
From-0.007
Token
Token ID26391
Feature activation+6.60e+00

Pos logit contributions
reci+0.022
les+0.021
orting+0.018
ling+0.018
From+0.017
Neg logit contributions
 seeker-0.023
 desk-0.022
ird-0.020
 laptop-0.020
 liar-0.020
Token
Token ID129
Feature activation+4.36e-01

Pos logit contributions
 seeker+0.228
 desk+0.218
ird+0.202
 laptop+0.200
 liar+0.194
Neg logit contributions
reci-0.184
les-0.179
orting-0.153
ling-0.148
 Nem-0.139
Token
Token ID241
Feature activation+1.84e+00

Pos logit contributions
 seeker+0.014
 desk+0.013
 laptop+0.012
ird+0.012
 principal+0.012
Neg logit contributions
reci-0.013
les-0.013
orting-0.011
ling-0.010
elling-0.010
TokenI
Token ID40
Feature activation+1.85e+00

Pos logit contributions
 seeker+0.069
 desk+0.067
 laptop+0.063
ird+0.063
 principal+0.060
Neg logit contributions
reci-0.044
les-0.043
orting-0.035
ling-0.034
From-0.034
Token am
Token ID716
Feature activation+8.24e+00

Pos logit contributions
 seeker+0.045
 desk+0.043
 laptop+0.038
ird+0.037
 principal+0.036
Neg logit contributions
reci-0.070
les-0.069
orting-0.061
ling-0.060
 Nem-0.057
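Each per-token record concatenates a label with its value. `Token` is followed directly by the token string, so `TokenI` is token ID 40 ("I" with no leading space) while `Token I` is ID 314 (" I"); a bare `Token` line is a token with no printable text, such as ID 198, the newline token in the GPT-2 vocabulary these IDs come from. `Token ID` is followed by the integer ID, `Feature activation` by a signed float, and each record then lists five positive and five negative logit contributions as token/value pairs. A minimal parser sketch, assuming exactly this layout; `TokenRecord` and `parse_records` are hypothetical names:

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A logit line is a token string immediately followed by a signed decimal.
LOGIT_RE = re.compile(r"^(.*?)([+-]\d+\.\d+)$")

@dataclass
class TokenRecord:
    token: str = ""
    token_id: int = -1
    activation: float = 0.0
    pos: List[Tuple[str, float]] = field(default_factory=list)
    neg: List[Tuple[str, float]] = field(default_factory=list)

def parse_records(text: str) -> List[TokenRecord]:
    records: List[TokenRecord] = []
    section: Optional[str] = None  # "pos", "neg", or None outside a logit list
    for line in text.splitlines():
        if line.startswith("Token ID"):
            records[-1].token_id = int(line[len("Token ID"):])
        elif line.startswith("Token"):
            # New record; keep the leading space that separates " am" from "am".
            records.append(TokenRecord(token=line[len("Token"):]))
            section = None
        elif line.startswith("Feature activation"):
            records[-1].activation = float(line[len("Feature activation"):])
        elif line == "Pos logit contributions":
            section = "pos"
        elif line == "Neg logit contributions":
            section = "neg"
        elif section and (m := LOGIT_RE.match(line)):
            getattr(records[-1], section).append((m.group(1), float(m.group(2))))
    return records

SAMPLE = """Token am
Token ID716
Feature activation+8.24e+00

Pos logit contributions
 seeker+0.045
 desk+0.043
Neg logit contributions
reci-0.070
les-0.069"""

rec = parse_records(SAMPLE)[0]
print(repr(rec.token), rec.token_id, rec.activation, rec.pos[0], rec.neg[0])
```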
Token going
Token ID1016
Feature activation+2.33e-01

Pos logit contributions
 seeker+0.234
 desk+0.228
 laptop+0.209
ird+0.205
 principal+0.198
Neg logit contributions
reci-0.266
les-0.261
 Nem-0.231
orting-0.230
ling-0.227
Token to
Token ID284
Feature activation+2.63e+00

Pos logit contributions
 seeker+0.007
 desk+0.007
ird+0.007
 laptop+0.007
 liar+0.006
Neg logit contributions
reci-0.007
les-0.007
orting-0.006
ling-0.006
 Nem-0.005
Token give
Token ID1577
Feature activation+2.00e-01

Pos logit contributions
 seeker+0.079
 desk+0.075
 laptop+0.071
ird+0.071
 liar+0.069
Neg logit contributions
reci-0.078
les-0.071
 Nem-0.067
orting-0.064
 them-0.064
Token you
Token ID345
Feature activation+1.16e+00

Pos logit contributions
 seeker+0.006
 ha+0.006
 desk+0.006
 x+0.005
 principal+0.005
Neg logit contributions
reci-0.006
les-0.006
lights-0.005
orting-0.005
ling-0.005
Token a
Token ID257
Feature activation+6.22e+00

Pos logit contributions
 seeker+0.025
 desk+0.023
ird+0.020
 laptop+0.019
 ha+0.019
Neg logit contributions
reci-0.047
les-0.046
orting-0.041
ling-0.041
From-0.039
Token
Token ID95
Feature activation-1.19e+00

Pos logit contributions
reci+0.004
les+0.004
orting+0.004
ling+0.004
From+0.003
Neg logit contributions
 seeker-0.005
 desk-0.004
ird-0.004
 laptop-0.004
 liar-0.004
Token
Token ID26391
Feature activation+6.34e+00

Pos logit contributions
reci+0.036
les+0.036
ling+0.030
orting+0.030
 Nem+0.028
Neg logit contributions
 seeker-0.038
 desk-0.036
ird-0.034
 laptop-0.033
 office-0.032
Token
Token ID129
Feature activation-3.81e-02

Pos logit contributions
 seeker+0.222
 desk+0.215
ird+0.199
 laptop+0.196
 liar+0.188
Neg logit contributions
reci-0.177
les-0.171
orting-0.145
ling-0.141
elling-0.133
Token
Token ID241
Feature activation+1.83e+00

Pos logit contributions
reci+0.001
les+0.001
orting+0.001
ling+0.001
ted+0.001
Neg logit contributions
 seeker-0.001
 desk-0.001
 laptop-0.001
 x-0.001
 office-0.001
TokenI
Token ID40
Feature activation+1.97e+00

Pos logit contributions
 seeker+0.062
 desk+0.060
 laptop+0.055
ird+0.055
 liar+0.053
Neg logit contributions
reci-0.051
les-0.050
orting-0.043
ling-0.042
From-0.040
Token am
Token ID716
Feature activation+8.12e+00

Pos logit contributions
 seeker+0.049
 desk+0.046
ird+0.041
 laptop+0.041
 liar+0.039
Neg logit contributions
reci-0.074
les-0.072
orting-0.065
ling-0.063
From-0.060
Token here
Token ID994
Feature activation-6.69e-01

Pos logit contributions
 seeker+0.254
 desk+0.243
 laptop+0.226
ird+0.218
 liar+0.217
Neg logit contributions
reci-0.240
les-0.235
 Nem-0.203
orting-0.202
ling-0.197
Token to
Token ID284
Feature activation+3.67e+00

Pos logit contributions
reci+0.019
les+0.019
orting+0.017
ling+0.016
 Nem+0.016
Neg logit contributions
 seeker-0.021
 desk-0.020
ird-0.019
 laptop-0.019
 liar-0.018
Token deliver
Token ID5203
Feature activation-1.21e+00

Pos logit contributions
 seeker+0.107
 desk+0.100
 laptop+0.092
ird+0.091
 liar+0.089
Neg logit contributions
reci-0.120
les-0.115
orting-0.103
 Nem-0.099
From-0.098
Token your
Token ID534
Feature activation+8.39e-02

Pos logit contributions
reci+0.035
les+0.033
orting+0.029
ling+0.029
From+0.027
Neg logit contributions
 seeker-0.040
 desk-0.040
ird-0.036
 laptop-0.036
 ha-0.035
Token present
Token ID1944
Feature activation+1.35e+00

Pos logit contributions
 seeker+0.003
 desk+0.002
ird+0.002
 liar+0.002
 principal+0.002
Neg logit contributions
reci-0.003
les-0.003
orting-0.002
ling-0.002
elling-0.002
Token show
Token ID905
Feature activation+6.53e-01

Pos logit contributions
 seeker+0.080
 desk+0.075
 laptop+0.069
ird+0.069
 liar+0.067
Neg logit contributions
reci-0.101
les-0.099
orting-0.087
ling-0.086
 Nem-0.082
Token everyone
Token ID2506
Feature activation-8.29e-02

Pos logit contributions
 seeker+0.024
 desk+0.024
ird+0.022
 laptop+0.022
 ha+0.021
Neg logit contributions
reci-0.016
les-0.015
orting-0.013
ling-0.013
From-0.012
Token how
Token ID703
Feature activation+2.16e+00

Pos logit contributions
reci+0.003
les+0.002
orting+0.002
From+0.002
ling+0.002
Neg logit contributions
 seeker-0.002
 desk-0.002
ird-0.002
 laptop-0.002
 office-0.002
Token special
Token ID2041
Feature activation-5.24e+00

Pos logit contributions
 seeker+0.062
 desk+0.060
 liar+0.054
 ha+0.053
ird+0.053
Neg logit contributions
reci-0.069
les-0.069
orting-0.063
ling-0.057
From-0.055
Token I
Token ID314
Feature activation+5.78e-01

Pos logit contributions
reci+0.153
les+0.144
orting+0.128
ling+0.124
From+0.120
Neg logit contributions
 seeker-0.168
 desk-0.162
ird-0.151
 ha-0.147
 laptop-0.147
Token am
Token ID716
Feature activation+8.11e+00

Pos logit contributions
 seeker+0.017
 desk+0.016
ird+0.014
 laptop+0.014
 liar+0.014
Neg logit contributions
reci-0.019
les-0.019
orting-0.017
ling-0.016
From-0.015
Token!"
Token ID2474
Feature activation+7.92e-02

Pos logit contributions
 seeker+0.260
 desk+0.246
ird+0.227
 laptop+0.224
 liar+0.218
Neg logit contributions
reci-0.245
les-0.240
orting-0.207
ling-0.202
 Nem-0.190
Token She
Token ID1375
Feature activation-2.00e+00

Pos logit contributions
 seeker+0.003
 desk+0.003
ird+0.003
 ha+0.002
 laptop+0.002
Neg logit contributions
reci-0.002
les-0.002
ling-0.002
orting-0.002
elling-0.002
Token practiced
Token ID19893
Feature activation-2.16e+00

Pos logit contributions
les+0.061
reci+0.060
ling+0.050
orting+0.049
From+0.048
Neg logit contributions
 seeker-0.065
 desk-0.060
 laptop-0.056
ird-0.055
 liar-0.055
Token dancing
Token ID15360
Feature activation-3.61e+00

Pos logit contributions
reci+0.065
les+0.064
orting+0.056
ling+0.055
From+0.052
Neg logit contributions
 seeker-0.070
 desk-0.065
ird-0.061
 laptop-0.058
 liar-0.058
Token and
Token ID290
Feature activation-4.83e+00

Pos logit contributions
reci+0.099
les+0.093
orting+0.080
ling+0.077
elling+0.075
Neg logit contributions
 seeker-0.127
 desk-0.122
 laptop-0.111
ird-0.111
 liar-0.107
Token
Token ID95
Feature activation-1.17e+00

Pos logit contributions
reci+0.006
les+0.006
orting+0.005
ling+0.005
elling+0.005
Neg logit contributions
 seeker-0.007
 desk-0.007
ird-0.006
 laptop-0.006
 liar-0.006
Token
Token ID26391
Feature activation+6.42e+00

Pos logit contributions
reci+0.036
les+0.035
orting+0.031
ling+0.029
elling+0.029
Neg logit contributions
 seeker-0.037
 desk-0.036
ird-0.032
 laptop-0.031
 liar-0.031
Token
Token ID129
Feature activation+5.78e-03

Pos logit contributions
 seeker+0.224
 desk+0.215
ird+0.197
 laptop+0.197
 liar+0.192
Neg logit contributions
reci-0.175
les-0.170
orting-0.146
ling-0.140
From-0.132
Token
Token ID241
Feature activation+1.77e+00

Pos logit contributions
 seeker+0.000
 desk+0.000
 laptop+0.000
ird+0.000
 principal+0.000
Neg logit contributions
reci-0.000
les-0.000
orting-0.000
ling-0.000
 Nem-0.000
TokenI
Token ID40
Feature activation+1.87e+00

Pos logit contributions
 seeker+0.066
 desk+0.064
ird+0.059
 laptop+0.059
 liar+0.057
Neg logit contributions
reci-0.044
les-0.043
orting-0.036
ling-0.035
From-0.033
Token am
Token ID716
Feature activation+8.10e+00

Pos logit contributions
 seeker+0.046
 desk+0.043
ird+0.038
 laptop+0.038
 liar+0.036
Neg logit contributions
reci-0.071
les-0.069
orting-0.062
ling-0.060
From-0.058
Token stronger
Token ID7387
Feature activation-2.80e+00

Pos logit contributions
 seeker+0.247
 desk+0.240
 laptop+0.218
ird+0.214
 principal+0.206
Neg logit contributions
reci-0.242
les-0.237
orting-0.207
 Nem-0.206
ling-0.203
Token than
Token ID621
Feature activation+1.20e+00

Pos logit contributions
les+0.076
reci+0.073
ling+0.063
orting+0.062
 Nem+0.056
Neg logit contributions
 seeker-0.095
 desk-0.093
ird-0.086
 laptop-0.085
 principal-0.082
Token you
Token ID345
Feature activation+2.31e+00

Pos logit contributions
 seeker+0.046
 desk+0.044
ird+0.041
 laptop+0.041
 liar+0.040
Neg logit contributions
reci-0.029
les-0.028
orting-0.023
ling-0.023
From-0.021
Token!
Token ID0
Feature activation+1.50e+00

Pos logit contributions
 seeker+0.073
 desk+0.069
 laptop+0.064
ird+0.062
 liar+0.061
Neg logit contributions
reci-0.070
les-0.069
orting-0.059
ling-0.058
From-0.055
Tokenâ
Token ID22940
Feature activation-2.02e+00

Pos logit contributions
 seeker+0.049
ird+0.046
 desk+0.045
 ha+0.042
 laptop+0.042
Neg logit contributions
les-0.046
reci-0.045
ling-0.040
orting-0.037
led-0.035
Token't
Token ID470
Feature activation+5.36e-01

Pos logit contributions
 seeker+0.101
 desk+0.099
ird+0.090
 ha+0.086
 laptop+0.080
Neg logit contributions
reci-0.222
les-0.217
orting-0.199
ling-0.195
 Nem-0.179
Token you
Token ID345
Feature activation+2.56e+00

Pos logit contributions
 seeker+0.021
 desk+0.021
ird+0.019
 laptop+0.019
 liar+0.019
Neg logit contributions
reci-0.012
les-0.012
orting-0.010
ling-0.009
From-0.009
Token moving
Token ID3867
Feature activation-8.50e-01

Pos logit contributions
 seeker+0.075
 desk+0.071
ird+0.065
 laptop+0.064
 liar+0.062
Neg logit contributions
reci-0.086
les-0.083
orting-0.074
ling-0.072
 Nem-0.068
Token like
Token ID588
Feature activation+1.54e+00

Pos logit contributions
reci+0.026
les+0.026
orting+0.022
ling+0.022
 Nem+0.020
Neg logit contributions
 seeker-0.027
 desk-0.026
ird-0.024
 laptop-0.023
 liar-0.022
Token I
Token ID314
Feature activation+9.97e-01

Pos logit contributions
 seeker+0.061
 desk+0.059
ird+0.055
 laptop+0.054
 liar+0.053
Neg logit contributions
reci-0.035
les-0.034
orting-0.028
ling-0.027
From-0.025
Token am
Token ID716
Feature activation+8.08e+00

Pos logit contributions
 seeker+0.027
 desk+0.025
 laptop+0.023
ird+0.022
 liar+0.022
Neg logit contributions
reci-0.035
les-0.035
orting-0.031
ling-0.030
 Nem-0.029
Token?"
Token ID1701
Feature activation+3.53e-01

Pos logit contributions
 seeker+0.215
 desk+0.202
ird+0.184
 laptop+0.181
 liar+0.173
Neg logit contributions
reci-0.285
les-0.279
orting-0.249
ling-0.245
 Nem-0.237
Token the
Token ID262
Feature activation-8.53e-01

Pos logit contributions
 seeker+0.013
 desk+0.012
 laptop+0.011
ird+0.011
 liar+0.011
Neg logit contributions
reci-0.009
les-0.009
orting-0.007
ling-0.007
 Nem-0.007
Token infant
Token ID11212
Feature activation+2.73e+00

Pos logit contributions
reci+0.028
les+0.028
ling+0.024
orting+0.024
 Nem+0.023
Neg logit contributions
 seeker-0.025
 desk-0.023
ird-0.022
 laptop-0.021
 liar-0.020
Token wondered
Token ID14028
Feature activation-7.65e-01

Pos logit contributions
 seeker+0.103
 desk+0.092
 laptop+0.091
ird+0.089
 liar+0.088
Neg logit contributions
les-0.069
reci-0.065
ling-0.061
 Nem-0.053
orting-0.052
Token.
Token ID13
Feature activation+1.22e+00

Pos logit contributions
les+0.021
reci+0.021
orting+0.019
ling+0.019
 them+0.017
Neg logit contributions
 seeker-0.026
 desk-0.024
 laptop-0.022
 liar-0.022
 principal-0.022
Token
Token ID95
Feature activation-1.15e+00

Pos logit contributions
reci+0.003
les+0.003
orting+0.003
ling+0.003
From+0.003
Neg logit contributions
 seeker-0.003
 desk-0.003
ird-0.003
 laptop-0.003
 liar-0.003
Token
Token ID26391
Feature activation+6.49e+00

Pos logit contributions
reci+0.036
les+0.035
orting+0.031
ling+0.029
From+0.029
Neg logit contributions
 seeker-0.036
 desk-0.034
ird-0.031
 laptop-0.031
 liar-0.030
Token
Token ID129
Feature activation+2.78e-02

Pos logit contributions
 seeker+0.238
 desk+0.227
ird+0.211
 laptop+0.209
 liar+0.205
Neg logit contributions
reci-0.166
les-0.161
orting-0.137
ling-0.130
From-0.123
Token
Token ID241
Feature activation+1.97e+00

Pos logit contributions
 seeker+0.001
 desk+0.001
 laptop+0.001
ird+0.001
 office+0.001
Neg logit contributions
reci-0.001
les-0.001
orting-0.001
ling-0.001
 Nem-0.001
TokenI
Token ID40
Feature activation+1.88e+00

Pos logit contributions
 seeker+0.073
 desk+0.070
ird+0.065
 laptop+0.065
 liar+0.063
Neg logit contributions
reci-0.049
les-0.048
orting-0.040
ling-0.039
From-0.037
Token am
Token ID716
Feature activation+8.08e+00

Pos logit contributions
 seeker+0.044
 desk+0.041
ird+0.036
 laptop+0.036
 liar+0.034
Neg logit contributions
reci-0.074
les-0.072
orting-0.065
ling-0.063
From-0.061
Token the
Token ID262
Feature activation+5.66e-01

Pos logit contributions
 seeker+0.232
 desk+0.220
 laptop+0.200
ird+0.198
 liar+0.191
Neg logit contributions
reci-0.270
les-0.263
orting-0.231
ling-0.226
 Nem-0.221
Token symbol
Token ID6194
Feature activation+2.47e+00

Pos logit contributions
ird+0.015
 seeker+0.015
 desk+0.015
 laptop+0.014
 slammed+0.013
Neg logit contributions
reci-0.018
les-0.017
orting-0.016
ling-0.015
 Nem-0.015
Token of
Token ID286
Feature activation+9.14e-01

Pos logit contributions
 seeker+0.089
 desk+0.086
ird+0.080
 laptop+0.078
 liar+0.077
Neg logit contributions
reci-0.065
les-0.062
orting-0.054
ling-0.051
From-0.048
Token the
Token ID262
Feature activation-8.63e-01

Pos logit contributions
 seeker+0.029
 desk+0.028
 laptop+0.025
ird+0.025
 liar+0.025
Neg logit contributions
reci-0.028
les-0.028
orting-0.024
ling-0.023
elling-0.021
Token forest
Token ID8222
Feature activation+2.95e+00

Pos logit contributions
reci+0.032
les+0.030
orting+0.027
ling+0.027
elling+0.026
Neg logit contributions
 seeker-0.022
 desk-0.021
ird-0.020
 laptop-0.019
 liar-0.017
Token is
Token ID318
Feature activation+6.86e+00

Pos logit contributions
les+0.061
reci+0.061
orting+0.051
ling+0.050
 Nem+0.047
Neg logit contributions
 seeker-0.073
 desk-0.069
ird-0.065
 laptop-0.065
 liar-0.063
Token my
Token ID616
Feature activation+3.28e+00

Pos logit contributions
 seeker+0.236
 desk+0.227
ird+0.215
 laptop+0.213
 liar+0.208
Neg logit contributions
reci-0.185
les-0.177
orting-0.158
From-0.149
 Nem-0.148
Token castle
Token ID16669
Feature activation+3.02e+00

Pos logit contributions
 seeker+0.093
ird+0.089
 desk+0.087
 laptop+0.082
 liar+0.080
Neg logit contributions
les-0.102
reci-0.101
orting-0.089
ling-0.086
 Nem-0.083
Token and
Token ID290
Feature activation-3.17e+00

Pos logit contributions
 seeker+0.099
 desk+0.091
 liar+0.088
 laptop+0.087
ird+0.085
Neg logit contributions
les-0.083
reci-0.077
orting-0.070
ling-0.067
s-0.063
Token I
Token ID314
Feature activation+1.13e+00

Pos logit contributions
reci+0.084
les+0.082
orting+0.070
ling+0.067
elling+0.062
Neg logit contributions
 seeker-0.115
 desk-0.111
ird-0.101
 laptop-0.101
 liar-0.098
Token am
Token ID716
Feature activation+8.05e+00

Pos logit contributions
 seeker+0.035
 desk+0.033
 laptop+0.030
ird+0.030
 principal+0.029
Neg logit contributions
reci-0.036
les-0.035
orting-0.030
ling-0.029
 Nem-0.028
Token the
Token ID262
Feature activation+1.24e+00

Pos logit contributions
 seeker+0.254
 desk+0.249
 laptop+0.241
ird+0.237
 liar+0.225
Neg logit contributions
reci-0.218
les-0.209
 Nem-0.195
 them-0.184
orting-0.179
Token queen
Token ID16599
Feature activation+5.46e+00

Pos logit contributions
ird+0.034
 desk+0.033
 laptop+0.032
 seeker+0.031
 slammed+0.030
Neg logit contributions
reci-0.037
les-0.036
 Nem-0.033
orting-0.033
ling-0.032
Token!"
Token ID2474
Feature activation-1.12e-01

Pos logit contributions
 seeker+0.194
 desk+0.183
 laptop+0.174
 principal+0.168
ird+0.167
Neg logit contributions
les-0.141
reci-0.138
ling-0.118
orting-0.115
 Nem-0.104
Token
Token ID198
Feature activation-2.78e+00

Pos logit contributions
reci+0.003
les+0.003
ling+0.002
orting+0.002
elling+0.002
Neg logit contributions
 seeker-0.004
 desk-0.004
ird-0.004
 laptop-0.004
 principal-0.004
Token
Token ID198
Feature activation-2.82e+00

Pos logit contributions
reci+0.079
les+0.077
orting+0.065
ling+0.065
ience+0.064
Neg logit contributions
 desk-0.089
 seeker-0.087
ird-0.080
 laptop-0.079
ena-0.078
Token said
Token ID531
Feature activation+1.10e+00

Pos logit contributions
reci+0.161
les+0.156
orting+0.137
ling+0.132
elling+0.126
Neg logit contributions
 seeker-0.155
 desk-0.148
ird-0.135
 laptop-0.133
 liar-0.129
Token,
Token ID11
Feature activation+2.89e+00

Pos logit contributions
 seeker+0.032
 desk+0.030
ird+0.028
 laptop+0.027
 liar+0.026
Neg logit contributions
reci-0.037
les-0.036
orting-0.032
ling-0.031
From-0.029
Token "
Token ID366
Feature activation-2.49e-01

Pos logit contributions
 desk+0.071
 seeker+0.069
ird+0.064
 laptop+0.062
 ha+0.062
Neg logit contributions
reci-0.110
les-0.104
elling-0.091
orting-0.089
ling-0.088
TokenOh
Token ID5812
Feature activation+5.42e+00

Pos logit contributions
reci+0.008
les+0.008
orting+0.007
 Nem+0.007
ling+0.006
Neg logit contributions
 seeker-0.007
 ha-0.006
 desk-0.006
ird-0.006
ena-0.006
Token c
Token ID269
Feature activation+1.86e+00

Pos logit contributions
 seeker+0.153
 desk+0.147
 laptop+0.139
 principal+0.135
ird+0.127
Neg logit contributions
reci-0.171
les-0.169
orting-0.152
ling-0.145
 Nem-0.143
Token'm
Token ID1101
Feature activation+8.04e+00

Pos logit contributions
 seeker+0.059
 desk+0.054
ird+0.051
 liar+0.050
 ha+0.049
Neg logit contributions
reci-0.056
les-0.049
 Nem-0.046
From-0.043
orting-0.041
Tokenon
Token ID261
Feature activation+1.88e+00

Pos logit contributions
 seeker+0.192
 desk+0.179
ird+0.160
 laptop+0.156
 liar+0.150
Neg logit contributions
reci-0.314
les-0.308
orting-0.276
ling-0.270
From-0.257
Token Bill
Token ID3941
Feature activation+2.21e+00

Pos logit contributions
 seeker+0.058
 desk+0.055
 laptop+0.051
ird+0.050
 principal+0.049
Neg logit contributions
les-0.058
reci-0.057
ling-0.051
orting-0.050
elling-0.045
Token,
Token ID11
Feature activation+5.00e+00

Pos logit contributions
 seeker+0.065
 desk+0.062
ird+0.056
 laptop+0.056
 liar+0.054
Neg logit contributions
reci-0.073
les-0.071
orting-0.063
ling-0.061
From-0.058
Token just
Token ID655
Feature activation-7.37e-01

Pos logit contributions
 seeker+0.161
 desk+0.155
ird+0.142
 ha+0.142
 laptop+0.136
Neg logit contributions
reci-0.157
les-0.153
ling-0.128
orting-0.128
elling-0.119
Token this
Token ID428
Feature activation-2.75e-02

Pos logit contributions
reci+0.024
les+0.023
orting+0.020
ling+0.020
 Nem+0.019
Neg logit contributions
 seeker-0.022
 desk-0.021
 laptop-0.019
ird-0.019
 principal-0.018
Token,
Token ID11
Feature activation+2.06e+00

Pos logit contributions
 seeker+0.178
 laptop+0.173
 desk+0.170
 client+0.169
 liar+0.160
Neg logit contributions
les-0.129
ling-0.119
reci-0.115
 Nem-0.115
 them-0.111
Token "
Token ID366
Feature activation-1.95e+00

Pos logit contributions
 seeker+0.069
 desk+0.066
ird+0.061
 laptop+0.061
 liar+0.059
Neg logit contributions
reci-0.059
les-0.057
orting-0.049
ling-0.048
From-0.045
TokenNo
Token ID2949
Feature activation+9.51e-03

Pos logit contributions
les+0.066
reci+0.061
ling+0.055
orting+0.054
ls+0.052
Neg logit contributions
ird-0.054
 seeker-0.054
 ha-0.053
 desk-0.050
ena-0.048
Token,
Token ID11
Feature activation+4.79e+00

Pos logit contributions
 seeker+0.000
 desk+0.000
 laptop+0.000
ird+0.000
 principal+0.000
Neg logit contributions
reci-0.000
les-0.000
orting-0.000
ling-0.000
 Nem-0.000
Token I
Token ID314
Feature activation+9.51e-01

Pos logit contributions
 seeker+0.159
 desk+0.153
ird+0.139
 laptop+0.137
 ha+0.137
Neg logit contributions
reci-0.146
les-0.142
orting-0.121
ling-0.120
elling-0.112
Token am
Token ID716
Feature activation+8.01e+00

Pos logit contributions
 seeker+0.030
 desk+0.030
 laptop+0.027
 principal+0.026
 office+0.026
Neg logit contributions
reci-0.029
les-0.028
ling-0.024
orting-0.023
s-0.022
Token a
Token ID257
Feature activation+7.34e+00

Pos logit contributions
 desk+0.226
 seeker+0.226
ird+0.201
 laptop+0.201
 principal+0.193
Neg logit contributions
reci-0.241
les-0.239
 Nem-0.218
 them-0.214
ling-0.213
Token very
Token ID845
Feature activation+7.03e+00

Pos logit contributions
ena+0.224
 desk+0.223
ird+0.222
 seeker+0.221
 laptop+0.215
Neg logit contributions
les-0.177
reci-0.167
 Nem-0.166
ling-0.158
 them-0.140
Token patient
Token ID5827
Feature activation-7.23e-01

Pos logit contributions
 seeker+0.204
 desk+0.193
ird+0.177
 laptop+0.176
 principal+0.167
Neg logit contributions
reci-0.229
les-0.226
orting-0.196
ling-0.196
 Nem-0.187
Token man
Token ID582
Feature activation+3.58e+00

Pos logit contributions
les+0.021
reci+0.019
ling+0.018
orting+0.016
 Nem+0.016
Neg logit contributions
 seeker-0.023
 desk-0.022
ird-0.020
 laptop-0.020
 principal-0.020
Token,
Token ID11
Feature activation+3.86e+00

Pos logit contributions
 seeker+0.111
 desk+0.105
 laptop+0.094
ird+0.091
 principal+0.091
Neg logit contributions
les-0.110
reci-0.109
orting-0.095
ling-0.094
 Nem-0.088
Token't
Token ID470
Feature activation+6.44e-01

Pos logit contributions
 seeker+0.070
 desk+0.066
 laptop+0.060
ird+0.059
 liar+0.058
Neg logit contributions
reci-0.083
les-0.081
orting-0.072
ling-0.070
From-0.067
Token you
Token ID345
Feature activation+2.38e+00

Pos logit contributions
 seeker+0.019
 desk+0.018
 laptop+0.016
ird+0.016
 principal+0.015
Neg logit contributions
reci-0.021
les-0.020
orting-0.018
ling-0.018
 Nem-0.017
Token know
Token ID760
Feature activation+6.88e-01

Pos logit contributions
 seeker+0.068
 desk+0.065
 laptop+0.060
ird+0.059
 principal+0.057
Neg logit contributions
reci-0.080
les-0.078
orting-0.068
ling-0.067
 Nem-0.064
Token that
Token ID326
Feature activation-2.84e+00

Pos logit contributions
 seeker+0.024
 desk+0.024
ird+0.022
 laptop+0.021
 ha+0.021
Neg logit contributions
reci-0.019
les-0.018
orting-0.015
ling-0.014
From-0.014
Token I
Token ID314
Feature activation+9.73e-01

Pos logit contributions
reci+0.082
les+0.080
orting+0.069
ling+0.066
 Nem+0.062
Neg logit contributions
 seeker-0.095
 desk-0.090
ird-0.084
 laptop-0.083
 liar-0.080
Token am
Token ID716
Feature activation+8.01e+00

Pos logit contributions
 seeker+0.028
 desk+0.027
 laptop+0.024
ird+0.024
 liar+0.023
Neg logit contributions
reci-0.033
les-0.032
orting-0.028
ling-0.027
 Nem-0.026
Token hungry
Token ID14720
Feature activation-1.66e+00

Pos logit contributions
 seeker+0.232
 desk+0.225
ird+0.209
 laptop+0.204
 principal+0.193
Neg logit contributions
reci-0.257
les-0.249
orting-0.220
 Nem-0.220
ling-0.220
Token too
Token ID1165
Feature activation+2.55e+00

Pos logit contributions
les+0.047
reci+0.046
orting+0.040
ling+0.039
 Nem+0.037
Neg logit contributions
 seeker-0.055
 desk-0.053
ird-0.049
 laptop-0.049
 liar-0.047
Token?
Token ID30
Feature activation+2.89e+00

Pos logit contributions
 seeker+0.076
 desk+0.072
ird+0.066
 laptop+0.065
 principal+0.063
Neg logit contributions
reci-0.083
les-0.081
orting-0.071
ling-0.071
 Nem-0.066
Token You
Token ID921
Feature activation+6.73e-01

Pos logit contributions
 seeker+0.092
 desk+0.087
ird+0.083
 laptop+0.080
 ha+0.078
Neg logit contributions
les-0.089
reci-0.089
ling-0.077
orting-0.075
From-0.067
Token should
Token ID815
Feature activation+1.37e+00

Pos logit contributions
 seeker+0.020
 desk+0.019
ird+0.017
 laptop+0.017
 liar+0.016
Neg logit contributions
reci-0.022
les-0.022
orting-0.020
ling-0.019
From-0.018
Token
Token ID95
Feature activation-8.98e-01

Pos logit contributions
 seeker+0.002
 desk+0.002
 laptop+0.002
ird+0.002
 liar+0.002
Neg logit contributions
reci-0.002
les-0.002
orting-0.002
ling-0.002
From-0.002
Token
Token ID26391
Feature activation+6.37e+00

Pos logit contributions
reci+0.027
les+0.026
orting+0.024
From+0.022
ling+0.022
Neg logit contributions
 seeker-0.028
 desk-0.027
ird-0.024
 laptop-0.024
 liar-0.024
Token
Token ID129
Feature activation+1.19e-01

Pos logit contributions
 seeker+0.216
 desk+0.203
 laptop+0.187
 liar+0.185
ird+0.184
Neg logit contributions
reci-0.173
les-0.169
orting-0.148
From-0.138
ling-0.138
Token
Token ID241
Feature activation+1.71e+00

Pos logit contributions
 seeker+0.004
 desk+0.004
 laptop+0.003
ird+0.003
 principal+0.003
Neg logit contributions
reci-0.003
les-0.003
orting-0.003
ling-0.003
 Nem-0.003
TokenI
Token ID40
Feature activation+1.91e+00

Pos logit contributions
 seeker+0.068
 desk+0.065
 laptop+0.061
ird+0.060
 principal+0.059
Neg logit contributions
reci-0.036
les-0.035
From-0.029
orting-0.028
ling-0.027
Token am
Token ID716
Feature activation+8.00e+00

Pos logit contributions
 seeker+0.042
 desk+0.039
ird+0.035
 laptop+0.034
 liar+0.032
Neg logit contributions
reci-0.077
les-0.076
orting-0.068
ling-0.067
From-0.064
Token scared
Token ID12008
Feature activation-1.61e+00

Pos logit contributions
 seeker+0.228
 desk+0.221
 laptop+0.199
ird+0.191
 principal+0.190
Neg logit contributions
reci-0.259
les-0.253
orting-0.222
ling-0.221
 Nem-0.220
Token of
Token ID286
Feature activation+1.72e+00

Pos logit contributions
reci+0.045
les+0.043
orting+0.037
ling+0.036
From+0.034
Neg logit contributions
 seeker-0.056
 desk-0.053
ird-0.049
 laptop-0.049
 liar-0.047
Token this
Token ID428
Feature activation+9.33e-01

Pos logit contributions
 seeker+0.068
 desk+0.065
ird+0.061
 laptop+0.061
 liar+0.059
Neg logit contributions
reci-0.039
les-0.038
orting-0.032
ling-0.030
 Nem-0.028
Token pitch
Token ID7078
Feature activation-3.19e-01

Pos logit contributions
 seeker+0.027
 desk+0.026
ird+0.024
 laptop+0.023
 principal+0.023
Neg logit contributions
reci-0.031
les-0.030
orting-0.027
ling-0.026
From-0.024
Token and
Token ID290
Feature activation-3.78e+00

Pos logit contributions
reci+0.010
les+0.010
orting+0.009
ling+0.009
 Nem+0.008
Neg logit contributions
 seeker-0.009
 desk-0.009
ird-0.008
 laptop-0.008
 liar-0.008
Token
Token ID95
Feature activation-1.07e+00

Pos logit contributions
reci+0.001
les+0.001
orting+0.001
ling+0.001
From+0.001
Neg logit contributions
 seeker-0.001
 desk-0.001
ird-0.001
 laptop-0.001
 liar-0.001
Token
Token ID26391
Feature activation+6.43e+00

Pos logit contributions
reci+0.033
les+0.031
orting+0.028
ling+0.026
From+0.026
Neg logit contributions
 seeker-0.034
 desk-0.032
ird-0.030
 laptop-0.029
 liar-0.029
Token
Token ID129
Feature activation+1.01e-01

Pos logit contributions
 seeker+0.201
 desk+0.191
ird+0.176
 laptop+0.176
 liar+0.172
Neg logit contributions
reci-0.194
les-0.189
orting-0.168
ling-0.158
From-0.154
Token
Token ID241
Feature activation+1.77e+00

Pos logit contributions
 seeker+0.003
 desk+0.003
 laptop+0.003
ird+0.003
 principal+0.003
Neg logit contributions
reci-0.003
les-0.003
orting-0.003
ling-0.002
 Nem-0.002
Token I
Token ID314
Feature activation+1.21e+00

Pos logit contributions
 seeker+0.063
 desk+0.061
 laptop+0.057
ird+0.057
 principal+0.054
Neg logit contributions
reci-0.045
les-0.044
orting-0.037
From-0.036
ling-0.035
Token am
Token ID716
Feature activation+8.00e+00

Pos logit contributions
 seeker+0.029
 desk+0.029
 laptop+0.026
 principal+0.025
ird+0.024
Neg logit contributions
reci-0.045
les-0.045
ling-0.038
orting-0.038
 Nem-0.037
Token a
Token ID257
Feature activation+6.81e+00

Pos logit contributions
 seeker+0.234
 desk+0.228
 laptop+0.208
ird+0.205
 principal+0.196
Neg logit contributions
reci-0.250
les-0.246
 Nem-0.217
ling-0.212
orting-0.212
Token powerful
Token ID3665
Feature activation-1.35e+00

Pos logit contributions
ird+0.223
ena+0.217
 laptop+0.214
 desk+0.211
 principal+0.204
Neg logit contributions
les-0.153
reci-0.141
ling-0.140
 Nem-0.134
 refreshing-0.122
Token leaf
Token ID12835
Feature activation+1.47e+00

Pos logit contributions
les+0.041
reci+0.038
ling+0.036
 Nem+0.031
From+0.030
Neg logit contributions
ird-0.040
 laptop-0.039
 principal-0.039
 desk-0.038
 seeker-0.036
Token creature
Token ID7185
Feature activation+3.37e+00

Pos logit contributions
 seeker+0.044
 desk+0.042
ird+0.039
 laptop+0.039
 principal+0.038
Neg logit contributions
les-0.046
reci-0.045
ling-0.040
orting-0.039
 Nem-0.036
Token!
Token ID0
Feature activation+1.56e+00

Pos logit contributions
 seeker+0.104
 desk+0.099
 laptop+0.091
ird+0.090
 principal+0.088
Neg logit contributions
les-0.103
reci-0.102
orting-0.088
ling-0.087
 Nem-0.082
Token
Token ID95
Feature activation-9.74e-01

Pos logit contributions
reci+0.002
les+0.002
orting+0.002
From+0.002
ling+0.002
Neg logit contributions
 seeker-0.002
 desk-0.002
 laptop-0.002
ird-0.002
 liar-0.002
Token
Token ID26391
Feature activation+6.39e+00

Pos logit contributions
reci+0.031
les+0.030
orting+0.027
From+0.026
elling+0.025
Neg logit contributions
 seeker-0.029
 desk-0.028
 laptop-0.026
 liar-0.026
ird-0.025
Token
Token ID129
Feature activation+1.27e-01

Pos logit contributions
 seeker+0.200
 desk+0.189
 laptop+0.180
 liar+0.176
 x+0.173
Neg logit contributions
reci-0.184
les-0.182
orting-0.159
s-0.150
From-0.149
Token
Token ID241
Feature activation+1.72e+00

Pos logit contributions
 seeker+0.004
 desk+0.004
 laptop+0.004
ird+0.004
 liar+0.004
Neg logit contributions
reci-0.004
les-0.004
orting-0.003
ling-0.003
 Nem-0.003
TokenI
Token ID40
Feature activation+1.84e+00

Pos logit contributions
 seeker+0.062
 desk+0.061
 laptop+0.057
ird+0.056
 principal+0.054
Neg logit contributions
reci-0.041
les-0.040
From-0.034
orting-0.032
 Nem-0.032
Token am
Token ID716
Feature activation+8.00e+00

Pos logit contributions
 seeker+0.041
 desk+0.038
 laptop+0.034
ird+0.033
 principal+0.032
Neg logit contributions
reci-0.074
les-0.073
orting-0.065
ling-0.064
From-0.061
Token looking
Token ID2045
Feature activation+7.00e-01

Pos logit contributions
 seeker+0.242
 desk+0.232
 laptop+0.214
ird+0.208
 liar+0.205
Neg logit contributions
reci-0.242
les-0.239
orting-0.203
 Nem-0.202
ling-0.201
Token for
Token ID329
Feature activation+1.40e-02

Pos logit contributions
 seeker+0.024
 desk+0.022
ird+0.021
 laptop+0.021
 liar+0.020
Neg logit contributions
reci-0.020
les-0.020
orting-0.017
ling-0.016
From-0.015
Token a
Token ID257
Feature activation+6.22e+00

Pos logit contributions
 seeker+0.000
 desk+0.000
 laptop+0.000
ird+0.000
 liar+0.000
Neg logit contributions
les-0.000
reci-0.000
orting-0.000
 Nem-0.000
ling-0.000
Token serious
Token ID2726
Feature activation-7.56e-01

Pos logit contributions
 seeker+0.188
ena+0.184
ird+0.183
 desk+0.181
 laptop+0.181
Neg logit contributions
les-0.161
reci-0.157
 Nem-0.138
ling-0.136
orting-0.125
Token star
Token ID3491
Feature activation+7.67e-01

Pos logit contributions
les+0.024
reci+0.022
ling+0.020
 Nem+0.018
fulness+0.017
Neg logit contributions
 seeker-0.022
ird-0.022
 laptop-0.021
 principal-0.020
 desk-0.020
Token
Token ID95
Feature activation-1.26e+00

Pos logit contributions
reci+0.013
les+0.013
ling+0.011
orting+0.010
ted+0.010
Neg logit contributions
 seeker-0.013
 desk-0.012
ird-0.012
 laptop-0.011
 ha-0.011
Token
Token ID26391
Feature activation+6.20e+00

Pos logit contributions
les+0.040
reci+0.040
ling+0.034
orting+0.032
ted+0.032
Neg logit contributions
 seeker-0.037
 desk-0.035
ird-0.034
 laptop-0.032
 office-0.032
Token
Token ID129
Feature activation-8.60e-02

Pos logit contributions
 seeker+0.196
 desk+0.192
ird+0.180
 laptop+0.172
 office+0.168
Neg logit contributions
reci-0.189
les-0.182
orting-0.157
ling-0.152
elling-0.152
Token
Token ID241
Feature activation+1.69e+00

Pos logit contributions
reci+0.003
les+0.003
orting+0.002
ted+0.002
ling+0.002
Neg logit contributions
 seeker-0.002
 desk-0.002
 laptop-0.002
ena-0.002
 x-0.002
TokenI
Token ID40
Feature activation+1.79e+00

Pos logit contributions
 seeker+0.061
 desk+0.059
 laptop+0.055
ird+0.054
 liar+0.052
Neg logit contributions
reci-0.044
les-0.042
orting-0.035
ling-0.034
From-0.033
Token am
Token ID716
Feature activation+8.00e+00

Pos logit contributions
 seeker+0.041
 desk+0.038
ird+0.034
 laptop+0.033
 liar+0.032
Neg logit contributions
reci-0.070
les-0.069
orting-0.063
ling-0.061
elling-0.059
Token strong
Token ID1913
Feature activation-3.39e+00

Pos logit contributions
 seeker+0.233
 desk+0.223
 laptop+0.206
ird+0.202
 liar+0.196
Neg logit contributions
reci-0.260
les-0.255
orting-0.221
ling-0.218
 Nem-0.215
Token,
Token ID11
Feature activation+4.20e+00

Pos logit contributions
reci+0.094
les+0.094
orting+0.079
ling+0.078
 Nem+0.073
Neg logit contributions
 seeker-0.114
 desk-0.110
ird-0.102
 laptop-0.101
 principal-0.098
Token I
Token ID314
Feature activation+8.37e-01

Pos logit contributions
 seeker+0.128
 desk+0.121
ird+0.109
 laptop+0.105
 ha+0.104
Neg logit contributions
reci-0.139
les-0.135
orting-0.118
ling-0.116
elling-0.109
Token create
Token ID2251
Feature activation+9.56e-01

Pos logit contributions
 seeker+0.021
 desk+0.020
 laptop+0.018
ird+0.017
 principal+0.017
Neg logit contributions
reci-0.031
les-0.031
orting-0.026
ling-0.026
From-0.025
Token,
Token ID11
Feature activation+3.67e+00

Pos logit contributions
 seeker+0.024
 desk+0.023
ird+0.021
 laptop+0.020
 liar+0.019
Neg logit contributions
reci-0.035
les-0.034
orting-0.031
ling-0.030
From-0.029
Token
Token ID95
Feature activation-1.29e+00

Pos logit contributions
reci+0.011
les+0.011
orting+0.010
ling+0.009
From+0.009
Neg logit contributions
 seeker-0.011
 desk-0.011
ird-0.010
 laptop-0.010
 liar-0.009
Token
Token ID26391
Feature activation+6.26e+00

Pos logit contributions
reci+0.039
les+0.038
orting+0.034
ling+0.031
From+0.031
Neg logit contributions
 seeker-0.041
 desk-0.039
ird-0.036
 laptop-0.035
 liar-0.035
Token
Token ID129
Feature activation-7.63e-02

Pos logit contributions
 seeker+0.193
 desk+0.182
 laptop+0.166
ird+0.165
 liar+0.162
Neg logit contributions
reci-0.194
les-0.189
orting-0.168
ling-0.160
From-0.155
Token
Token ID241
Feature activation+1.68e+00

Pos logit contributions
reci+0.002
les+0.002
orting+0.002
ling+0.002
 Nem+0.002
Neg logit contributions
 seeker-0.002
 desk-0.002
 laptop-0.002
ird-0.002
 office-0.002
TokenI
Token ID40
Feature activation+1.91e+00

Pos logit contributions
 seeker+0.063
 desk+0.061
 laptop+0.057
ird+0.057
 liar+0.055
Neg logit contributions
reci-0.041
les-0.039
orting-0.033
ling-0.031
From-0.030
Token am
Token ID716
Feature activation+8.00e+00

Pos logit contributions
 seeker+0.042
 desk+0.038
ird+0.035
 laptop+0.033
 liar+0.032
Neg logit contributions
reci-0.077
les-0.075
orting-0.069
ling-0.067
elling-0.065
Token Ty
Token ID7039
Feature activation+3.44e+00

Pos logit contributions
 seeker+0.216
 desk+0.205
ird+0.184
 laptop+0.183
 liar+0.175
Neg logit contributions
reci-0.283
les-0.277
orting-0.246
ling-0.240
 Nem-0.230
Token the
Token ID262
Feature activation+4.85e-01

Pos logit contributions
 seeker+0.106
 desk+0.099
 laptop+0.091
ird+0.088
 principal+0.088
Neg logit contributions
les-0.107
reci-0.107
ling-0.093
orting-0.093
 Nem-0.086
Token Water
Token ID5638
Feature activation+2.75e-01

Pos logit contributions
ird+0.014
 desk+0.014
 seeker+0.013
 laptop+0.013
 slammed+0.013
Neg logit contributions
reci-0.013
les-0.012
orting-0.011
ling-0.011
elling-0.011
Token Cat
Token ID5181
Feature activation+2.56e+00

Pos logit contributions
 seeker+0.008
 desk+0.007
ird+0.007
 laptop+0.007
 ha+0.006
Neg logit contributions
reci-0.009
les-0.009
orting-0.008
From-0.008
ling-0.008
Token.
Token ID13
Feature activation+1.53e+00

Pos logit contributions
 seeker+0.073
 desk+0.070
ird+0.065
 laptop+0.063
 ha+0.061
Neg logit contributions
reci-0.085
les-0.081
orting-0.073
ling-0.069
From-0.068
Token fly
Token ID6129
Feature activation-2.35e+00

Pos logit contributions
 seeker+0.111
 desk+0.098
ird+0.093
 laptop+0.091
 principal+0.089
Neg logit contributions
reci-0.128
les-0.122
orting-0.107
 them-0.106
 Nem-0.105
Token around
Token ID1088
Feature activation-3.06e+00

Pos logit contributions
reci+0.077
les+0.075
orting+0.066
ling+0.063
From+0.061
Neg logit contributions
 seeker-0.070
 desk-0.066
ird-0.060
 laptop-0.059
 liar-0.057
Token like
Token ID588
Feature activation+1.33e+00

Pos logit contributions
reci+0.093
les+0.092
orting+0.080
ling+0.078
 Nem+0.074
Neg logit contributions
 seeker-0.096
 desk-0.091
ird-0.085
 laptop-0.083
 liar-0.081
Token the
Token ID262
Feature activation+5.92e-01

Pos logit contributions
 seeker+0.044
 desk+0.041
ird+0.039
 laptop+0.038
 liar+0.037
Neg logit contributions
reci-0.039
les-0.038
orting-0.033
ling-0.032
 Nem-0.031
Token wheel
Token ID7825
Feature activation+1.01e+00

Pos logit contributions
 seeker+0.019
 desk+0.018
ird+0.017
 laptop+0.016
 liar+0.016
Neg logit contributions
reci-0.017
les-0.017
ling-0.015
orting-0.014
 Nem-0.014
Token was
Token ID373
Feature activation+8.00e+00

Pos logit contributions
 seeker+0.030
 desk+0.028
ird+0.025
 laptop+0.025
 liar+0.024
Neg logit contributions
reci-0.033
les-0.033
orting-0.029
ling-0.028
 Nem-0.026
Token doing
Token ID1804
Feature activation+2.03e+00

Pos logit contributions
 seeker+0.222
 desk+0.208
ird+0.190
 laptop+0.187
 liar+0.180
Neg logit contributions
reci-0.276
les-0.271
orting-0.239
ling-0.237
 Nem-0.224
Token.
Token ID13
Feature activation+1.94e+00

Pos logit contributions
 seeker+0.060
 desk+0.056
ird+0.052
 laptop+0.051
 liar+0.050
Neg logit contributions
reci-0.066
les-0.066
orting-0.057
ling-0.057
 Nem-0.053
Token
Token ID220
Feature activation+6.85e-01

Pos logit contributions
 seeker+0.067
 desk+0.067
ird+0.063
 laptop+0.060
 principal+0.060
Neg logit contributions
reci-0.053
les-0.053
ling-0.045
orting-0.043
elling-0.040
Token
Token ID198
Feature activation-2.59e+00

Pos logit contributions
 desk+0.022
 seeker+0.021
ird+0.021
 laptop+0.019
ena+0.019
Neg logit contributions
reci-0.019
les-0.019
ling-0.016
orting-0.016
elling-0.015
Token
Token ID198
Feature activation-2.53e+00

Pos logit contributions
reci+0.071
les+0.071
ling+0.058
orting+0.058
elling+0.057
Neg logit contributions
 desk-0.085
 seeker-0.084
ird-0.082
ena-0.078
 principal-0.077
Token patient
Token ID5827
Feature activation-7.23e-01

Pos logit contributions
 seeker+0.204
 desk+0.193
ird+0.177
 laptop+0.176
 principal+0.167
Neg logit contributions
reci-0.229
les-0.226
orting-0.196
ling-0.196
 Nem-0.187
Token man
Token ID582
Feature activation+3.58e+00

Pos logit contributions
les+0.021
reci+0.019
ling+0.018
orting+0.016
 Nem+0.016
Neg logit contributions
 seeker-0.023
 desk-0.022
ird-0.020
 laptop-0.020
 principal-0.020
Token,
Token ID11
Feature activation+3.86e+00

Pos logit contributions
 seeker+0.111
 desk+0.105
 laptop+0.094
ird+0.091
 principal+0.091
Neg logit contributions
les-0.110
reci-0.109
orting-0.095
ling-0.094
 Nem-0.088
Token but
Token ID475
Feature activation-4.66e-01

Pos logit contributions
 seeker+0.122
 desk+0.117
ird+0.107
 laptop+0.105
 ha+0.102
Neg logit contributions
reci-0.121
les-0.117
orting-0.101
ling-0.100
elling-0.094
Token I
Token ID314
Feature activation+4.94e-01

Pos logit contributions
reci+0.014
les+0.014
orting+0.012
ling+0.011
elling+0.011
Neg logit contributions
 seeker-0.016
 desk-0.015
ird-0.014
 ha-0.013
 laptop-0.013
Token am
Token ID716
Feature activation+7.99e+00

Pos logit contributions
 seeker+0.014
 desk+0.014
 laptop+0.012
 principal+0.012
ird+0.012
Neg logit contributions
reci-0.016
les-0.016
orting-0.014
ling-0.013
 Nem-0.013
Token not
Token ID407
Feature activation+1.27e+00

Pos logit contributions
 seeker+0.222
 desk+0.214
 laptop+0.190
ird+0.189
 principal+0.179
Neg logit contributions
reci-0.264
les-0.258
orting-0.228
ling-0.228
 Nem-0.225
Token fond
Token ID16245
Feature activation-1.97e+00

Pos logit contributions
 seeker+0.040
 desk+0.038
 laptop+0.035
ird+0.034
 principal+0.034
Neg logit contributions
les-0.037
reci-0.037
ling-0.033
 Nem-0.031
orting-0.030
Token of
Token ID286
Feature activation+1.87e+00

Pos logit contributions
les+0.045
reci+0.045
orting+0.036
ling+0.034
elling+0.032
Neg logit contributions
 seeker-0.079
 desk-0.076
 laptop-0.072
ird-0.071
 principal-0.068
Token people
Token ID661
Feature activation+3.35e+00

Pos logit contributions
 seeker+0.066
 desk+0.063
ird+0.059
 laptop+0.058
 liar+0.057
Neg logit contributions
reci-0.050
les-0.049
orting-0.042
ling-0.040
From-0.038
Token coming
Token ID2406
Feature activation+3.32e+00

Pos logit contributions
 seeker+0.088
 desk+0.084
ird+0.077
 laptop+0.076
 liar+0.072
Neg logit contributions
reci-0.119
les-0.115
orting-0.104
ling-0.099
From-0.098
Token
Token ID95
Feature activation-9.98e-01

Pos logit contributions
reci+0.005
les+0.005
orting+0.005
ling+0.004
From+0.004
Neg logit contributions
 seeker-0.006
 desk-0.006
 laptop-0.006
ird-0.006
 liar-0.006
Token
Token ID26391
Feature activation+6.29e+00

Pos logit contributions
reci+0.032
les+0.031
orting+0.028
ling+0.027
From+0.026
Neg logit contributions
 seeker-0.030
 desk-0.028
 laptop-0.026
ird-0.026
 liar-0.025
Token
Token ID129
Feature activation+9.30e-02

Pos logit contributions
 seeker+0.216
 desk+0.207
 laptop+0.192
ird+0.190
 liar+0.186
Neg logit contributions
reci-0.171
les-0.166
orting-0.145
ling-0.135
From-0.132
Token
Token ID241
Feature activation+1.55e+00

Pos logit contributions
 seeker+0.003
 desk+0.003
 laptop+0.003
ird+0.003
 office+0.003
Neg logit contributions
reci-0.003
les-0.003
orting-0.002
ling-0.002
ted-0.002
TokenI
Token ID40
Feature activation+1.86e+00

Pos logit contributions
 seeker+0.056
 desk+0.054
 laptop+0.051
ird+0.051
 principal+0.048
Neg logit contributions
reci-0.039
les-0.038
orting-0.031
From-0.030
ling-0.030
Token am
Token ID716
Feature activation+7.97e+00

Pos logit contributions
 seeker+0.037
 desk+0.035
ird+0.030
 laptop+0.030
 liar+0.028
Neg logit contributions
reci-0.079
les-0.078
orting-0.070
ling-0.069
From-0.066
Token waiting
Token ID4953
Feature activation-1.52e-01

Pos logit contributions
 seeker+0.235
 desk+0.228
 laptop+0.208
ird+0.206
 principal+0.197
Neg logit contributions
reci-0.254
les-0.249
orting-0.218
ling-0.212
 Nem-0.211
Token for
Token ID329
Feature activation+3.71e-01

Pos logit contributions
reci+0.003
les+0.003
orting+0.003
ling+0.002
 Nem+0.002
Neg logit contributions
 seeker-0.006
 desk-0.006
ird-0.006
 laptop-0.005
 liar-0.005
Token someone
Token ID2130
Feature activation+2.45e+00

Pos logit contributions
 seeker+0.013
 desk+0.013
ird+0.012
 laptop+0.012
 liar+0.011
Neg logit contributions
reci-0.010
les-0.010
orting-0.008
ling-0.008
 Nem-0.008
Token to
Token ID284
Feature activation+3.28e+00

Pos logit contributions
 seeker+0.076
 desk+0.072
 laptop+0.065
ird+0.065
 liar+0.063
Neg logit contributions
reci-0.076
les-0.075
orting-0.065
ling-0.064
 Nem-0.060
Token pick
Token ID2298
Feature activation-9.95e-01

Pos logit contributions
 seeker+0.097
 desk+0.092
ird+0.086
 laptop+0.086
 liar+0.082
Neg logit contributions
reci-0.103
les-0.098
orting-0.089
 Nem-0.085
 them-0.084
Token
Token ID95
Feature activation-1.05e+00

Pos logit contributions
reci+0.009
les+0.009
orting+0.008
ling+0.007
 Nem+0.007
Neg logit contributions
 seeker-0.010
 desk-0.010
ird-0.009
 laptop-0.009
 liar-0.009
Token
Token ID26391
Feature activation+6.38e+00

Pos logit contributions
reci+0.030
les+0.029
orting+0.025
ling+0.024
From+0.023
Neg logit contributions
 seeker-0.036
 desk-0.034
ird-0.031
 laptop-0.031
 liar-0.030
Token
Token ID129
Feature activation+6.33e-02

Pos logit contributions
 seeker+0.226
 desk+0.214
 laptop+0.198
ird+0.198
 liar+0.194
Neg logit contributions
reci-0.169
les-0.165
orting-0.142
ling-0.134
From-0.128
Token
Token ID241
Feature activation+1.57e+00

Pos logit contributions
 seeker+0.002
 desk+0.002
 laptop+0.002
ird+0.002
 principal+0.002
Neg logit contributions
reci-0.002
les-0.002
orting-0.002
ling-0.001
elling-0.001
TokenI
Token ID40
Feature activation+1.76e+00

Pos logit contributions
 seeker+0.060
 desk+0.058
 laptop+0.054
ird+0.054
 principal+0.052
Neg logit contributions
reci-0.036
les-0.035
orting-0.029
ling-0.028
From-0.027
Token am
Token ID716
Feature activation+7.97e+00

Pos logit contributions
 seeker+0.039
 desk+0.037
 laptop+0.032
ird+0.032
 liar+0.030
Neg logit contributions
reci-0.071
les-0.069
orting-0.062
ling-0.061
 Nem-0.059
Token racing
Token ID11717
Feature activation-8.56e-01

Pos logit contributions
 seeker+0.236
 desk+0.226
 laptop+0.206
ird+0.203
 liar+0.195
Neg logit contributions
reci-0.251
les-0.246
 Nem-0.215
orting-0.214
ling-0.211
Token the
Token ID262
Feature activation-7.03e-01

Pos logit contributions
les+0.025
reci+0.025
orting+0.022
ling+0.022
 Nem+0.020
Neg logit contributions
 seeker-0.027
 desk-0.026
ird-0.024
 laptop-0.024
 liar-0.023
Token wind
Token ID2344
Feature activation-7.65e-01

Pos logit contributions
reci+0.021
les+0.021
ling+0.017
orting+0.017
 Nem+0.017
Neg logit contributions
 seeker-0.021
ird-0.020
 desk-0.020
 laptop-0.019
ena-0.019
Token!
Token ID0
Feature activation+1.58e+00

Pos logit contributions
reci+0.024
les+0.024
orting+0.021
ling+0.021
 Nem+0.019
Neg logit contributions
 seeker-0.023
 desk-0.022
ird-0.020
 laptop-0.020
 liar-0.019
Tokenâ
Token ID22940
Feature activation-2.03e+00

Pos logit contributions
 seeker+0.053
 desk+0.050
ird+0.050
 ha+0.046
 laptop+0.046
Neg logit contributions
les-0.045
reci-0.045
ling-0.039
orting-0.037
led-0.034
Token
Token ID95
Feature activation-8.56e-01

Pos logit contributions
reci+0.001
les+0.001
orting+0.001
ling+0.001
 Nem+0.000
Neg logit contributions
 seeker-0.001
 desk-0.001
ird-0.001
 laptop-0.001
 liar-0.001
Token
Token ID26391
Feature activation+6.69e+00

Pos logit contributions
reci+0.027
les+0.026
orting+0.023
ling+0.022
From+0.021
Neg logit contributions
 seeker-0.026
 desk-0.025
ird-0.023
 laptop-0.023
 liar-0.022
Token
Token ID129
Feature activation+2.82e-01

Pos logit contributions
 seeker+0.220
 desk+0.210
ird+0.193
 laptop+0.191
 liar+0.184
Neg logit contributions
reci-0.200
les-0.194
orting-0.167
ling-0.163
elling-0.154
Token
Token ID241
Feature activation+1.88e+00

Pos logit contributions
 seeker+0.009
 desk+0.009
 laptop+0.008
 office+0.008
 principal+0.008
Neg logit contributions
reci-0.008
les-0.008
orting-0.007
ling-0.007
elling-0.006
TokenI
Token ID40
Feature activation+1.72e+00

Pos logit contributions
 seeker+0.068
 desk+0.065
ird+0.060
 laptop+0.060
 liar+0.058
Neg logit contributions
reci-0.049
les-0.048
orting-0.041
ling-0.039
elling-0.036
Token am
Token ID716
Feature activation+7.96e+00

Pos logit contributions
 seeker+0.038
 desk+0.036
 laptop+0.031
ird+0.031
 principal+0.029
Neg logit contributions
reci-0.070
les-0.068
orting-0.061
ling-0.060
 Nem-0.058
Token called
Token ID1444
Feature activation+5.27e+00

Pos logit contributions
 seeker+0.231
 desk+0.222
ird+0.205
 laptop+0.204
 liar+0.193
Neg logit contributions
reci-0.253
les-0.248
 Nem-0.220
orting-0.216
ling-0.212
Token a
Token ID257
Feature activation+6.43e+00

Pos logit contributions
 seeker+0.177
 desk+0.167
ird+0.157
 laptop+0.155
 liar+0.151
Neg logit contributions
reci-0.148
les-0.144
orting-0.126
 Nem-0.118
ling-0.118
Token squirrel
Token ID33039
Feature activation+3.79e+00

Pos logit contributions
 seeker+0.197
 desk+0.191
 elbows+0.190
ird+0.189
 laptop+0.184
Neg logit contributions
les-0.161
reci-0.152
 Nem-0.142
ling-0.132
s-0.132
Token.
Token ID13
Feature activation+1.61e+00

Pos logit contributions
 seeker+0.116
 desk+0.110
 laptop+0.101
ird+0.100
 principal+0.097
Neg logit contributions
les-0.118
reci-0.117
orting-0.103
ling-0.101
 Nem-0.095
Token I
Token ID314
Feature activation+6.88e-01

Pos logit contributions
 seeker+0.050
 desk+0.048
ird+0.045
 laptop+0.043
 ha+0.043
Neg logit contributions
les-0.051
reci-0.049
ling-0.044
orting-0.042
led-0.039
Token
Token ID198
Feature activation-2.74e+00

Pos logit contributions
 seeker+0.022
 desk+0.022
ird+0.021
 laptop+0.020
 office+0.020
Neg logit contributions
reci-0.019
les-0.019
ling-0.016
orting-0.015
elling-0.015
Token
Token ID198
Feature activation-2.69e+00

Pos logit contributions
les+0.092
reci+0.090
elling+0.082
ling+0.077
orting+0.077
Neg logit contributions
 desk-0.074
 seeker-0.073
ird-0.071
ena-0.067
 principal-0.066
TokenThey
Token ID2990
Feature activation-3.20e+00

Pos logit contributions
les+0.082
reci+0.077
elling+0.073
ling+0.064
orting+0.064
Neg logit contributions
 seeker-0.084
ird-0.082
 desk-0.081
 ha-0.077
ggy-0.077
Token came
Token ID1625
Feature activation+3.28e+00

Pos logit contributions
reci+0.111
les+0.107
orting+0.100
elling+0.095
ling+0.092
Neg logit contributions
 seeker-0.089
 desk-0.084
ird-0.079
 ha-0.072
 laptop-0.070
Token back
Token ID736
Feature activation-2.95e+00

Pos logit contributions
 seeker+0.081
 desk+0.075
 office+0.065
 laptop+0.062
 upstairs+0.060
Neg logit contributions
reci-0.128
les-0.120
orting-0.111
elling-0.104
From-0.103
Token late
Token ID2739
Feature activation-8.62e+00

Pos logit contributions
reci+0.099
les+0.096
orting+0.085
ling+0.082
elling+0.078
Neg logit contributions
 seeker-0.087
 desk-0.083
 laptop-0.074
ird-0.074
 principal-0.071
Token in
Token ID287
Feature activation+1.32e+00

Pos logit contributions
reci+0.248
les+0.241
orting+0.208
ling+0.200
From+0.189
Neg logit contributions
 seeker-0.291
 desk-0.278
ird-0.255
 laptop-0.254
 liar-0.245
Token the
Token ID262
Feature activation-2.21e+00

Pos logit contributions
 seeker+0.050
 desk+0.048
ird+0.046
 laptop+0.044
 liar+0.044
Neg logit contributions
reci-0.031
les-0.030
orting-0.025
ling-0.024
 Nem-0.023
Token evening
Token ID6180
Feature activation-1.17e+00

Pos logit contributions
reci+0.073
les+0.071
orting+0.066
ted+0.063
From+0.063
Neg logit contributions
 desk-0.063
 seeker-0.063
 office-0.057
 closet-0.056
 laptop-0.052
Token,
Token ID11
Feature activation+2.50e+00

Pos logit contributions
reci+0.040
les+0.038
orting+0.034
ling+0.033
From+0.033
Neg logit contributions
 desk-0.033
 seeker-0.033
ird-0.029
 laptop-0.028
 office-0.027
Token tired
Token ID10032
Feature activation-2.70e+00

Pos logit contributions
 seeker+0.073
 desk+0.071
 laptop+0.062
 office+0.060
 locked+0.059
Neg logit contributions
reci-0.086
les-0.082
orting-0.073
elling-0.073
ling-0.066
Token.
Token ID13
Feature activation+1.29e+00

Pos logit contributions
 seeker+0.018
 desk+0.016
ird+0.016
 laptop+0.016
 liar+0.015
Neg logit contributions
les-0.013
reci-0.012
 Nem-0.012
ling-0.012
orting-0.011
Token But
Token ID887
Feature activation-4.92e-01

Pos logit contributions
 seeker+0.045
 desk+0.044
ird+0.041
 laptop+0.040
 principal+0.039
Neg logit contributions
reci-0.036
les-0.035
orting-0.029
ling-0.029
elling-0.026
Token it
Token ID340
Feature activation-3.66e+00

Pos logit contributions
reci+0.014
les+0.013
orting+0.011
ling+0.011
elling+0.010
Neg logit contributions
 seeker-0.018
 desk-0.017
ird-0.015
 laptop-0.015
 ha-0.015
Token was
Token ID373
Feature activation+5.18e+00

Pos logit contributions
reci+0.133
les+0.126
orting+0.118
elling+0.107
ling+0.107
Neg logit contributions
 seeker-0.095
 desk-0.091
ird-0.082
 laptop-0.079
 ha-0.078
Token too
Token ID1165
Feature activation+7.38e-01

Pos logit contributions
 seeker+0.154
 desk+0.146
ird+0.133
 laptop+0.132
 liar+0.127
Neg logit contributions
reci-0.170
les-0.166
orting-0.146
ling-0.143
From-0.135
Token late
Token ID2739
Feature activation-8.60e+00

Pos logit contributions
 seeker+0.024
 desk+0.023
 locked+0.020
 laptop+0.020
 liar+0.020
Neg logit contributions
reci-0.022
les-0.021
orting-0.020
ted-0.017
elling-0.016
Token.
Token ID13
Feature activation+1.02e+00

Pos logit contributions
reci+0.281
les+0.278
orting+0.249
ling+0.240
 Nem+0.228
Neg logit contributions
 seeker-0.253
 desk-0.235
ird-0.218
 laptop-0.214
 liar-0.210
Token The
Token ID383
Feature activation-4.86e+00

Pos logit contributions
 seeker+0.038
 desk+0.037
ird+0.035
 laptop+0.034
 principal+0.033
Neg logit contributions
reci-0.026
les-0.026
ling-0.021
orting-0.021
elling-0.018
Token ice
Token ID4771
Feature activation-1.54e+00

Pos logit contributions
reci+0.141
les+0.138
orting+0.118
ling+0.116
 Nem+0.110
Neg logit contributions
 seeker-0.162
 desk-0.155
ird-0.145
 laptop-0.142
 liar-0.137
Token was
Token ID373
Feature activation+5.42e+00

Pos logit contributions
reci+0.053
les+0.049
orting+0.045
From+0.043
elling+0.042
Neg logit contributions
 seeker-0.044
 desk-0.043
 laptop-0.037
ird-0.037
 ha-0.037
Token too
Token ID1165
Feature activation+1.16e+00

Pos logit contributions
 seeker+0.143
 desk+0.134
ird+0.121
 laptop+0.120
 liar+0.115
Neg logit contributions
reci-0.196
les-0.191
orting-0.171
ling-0.166
From-0.159
Token
Token ID198
Feature activation-2.65e+00

Pos logit contributions
reci+0.070
les+0.068
ling+0.056
orting+0.056
elling+0.053
Neg logit contributions
 seeker-0.092
 desk-0.089
ird-0.083
 laptop-0.082
 ha-0.081
TokenBut
Token ID1537
Feature activation-4.75e-01

Pos logit contributions
reci+0.071
les+0.069
orting+0.057
ling+0.056
elling+0.053
Neg logit contributions
 seeker-0.096
 desk-0.092
ird-0.086
 laptop-0.084
 ha-0.084
Token it
Token ID340
Feature activation-3.59e+00

Pos logit contributions
les+0.010
reci+0.010
orting+0.009
 Nem+0.008
From+0.008
Neg logit contributions
 seeker-0.018
 desk-0.018
ird-0.017
 laptop-0.017
 liar-0.017
Token is
Token ID318
Feature activation+4.62e+00

Pos logit contributions
reci+0.100
les+0.099
orting+0.083
ling+0.082
From+0.076
Neg logit contributions
 seeker-0.124
 desk-0.119
ird-0.111
 laptop-0.108
 liar-0.106
Token too
Token ID1165
Feature activation+7.10e-01

Pos logit contributions
 desk+0.165
ird+0.161
 seeker+0.160
 principal+0.149
 laptop+0.146
Neg logit contributions
reci-0.108
les-0.104
ling-0.095
 them-0.093
orting-0.091
Token late
Token ID2739
Feature activation-8.60e+00

Pos logit contributions
 seeker+0.025
 desk+0.024
 laptop+0.022
ird+0.022
 liar+0.021
Neg logit contributions
reci-0.020
les-0.019
orting-0.016
ling-0.015
elling-0.014
Token.
Token ID13
Feature activation+1.38e+00

Pos logit contributions
les+0.238
reci+0.234
orting+0.217
ling+0.200
 them+0.200
Neg logit contributions
 seeker-0.295
 desk-0.267
ird-0.262
 liar-0.249
 principal-0.243
Token The
Token ID383
Feature activation-4.51e+00

Pos logit contributions
 seeker+0.055
 desk+0.053
ird+0.050
 laptop+0.049
 liar+0.048
Neg logit contributions
reci-0.031
les-0.030
orting-0.025
ling-0.023
From-0.022
Token truck
Token ID7779
Feature activation+1.45e+00

Pos logit contributions
reci+0.141
les+0.138
orting+0.121
ling+0.117
 Nem+0.111
Neg logit contributions
 seeker-0.140
 desk-0.132
ird-0.123
 laptop-0.119
 liar-0.116
Token hits
Token ID7127
Feature activation-2.14e+00

Pos logit contributions
 seeker+0.049
 desk+0.047
ird+0.044
 laptop+0.042
 liar+0.041
Neg logit contributions
les-0.041
reci-0.041
orting-0.034
ling-0.034
From-0.032
Token Tom
Token ID4186
Feature activation+1.74e+00

Pos logit contributions
reci+0.062
les+0.062
orting+0.053
ling+0.051
 Nem+0.049
Neg logit contributions
 seeker-0.071
 desk-0.067
ird-0.063
 laptop-0.062
 liar-0.060
Token laughed
Token ID13818
Feature activation-1.45e+00

Pos logit contributions
reci+0.133
les+0.129
orting+0.111
 Nem+0.110
ling+0.109
Neg logit contributions
 seeker-0.151
 desk-0.144
 laptop-0.135
ird-0.133
 liar-0.128
Token.
Token ID13
Feature activation+8.62e-01

Pos logit contributions
reci+0.051
les+0.049
orting+0.044
ling+0.042
elling+0.041
Neg logit contributions
 seeker-0.040
 desk-0.037
ird-0.033
 laptop-0.033
 liar-0.032
Token Everyone
Token ID11075
Feature activation-2.02e+00

Pos logit contributions
 desk+0.021
 seeker+0.019
 closet+0.019
 classroom+0.018
 bedroom+0.018
Neg logit contributions
les-0.030
ling-0.030
reci-0.027
ler-0.026
ls-0.025
Token celebrated
Token ID13943
Feature activation-3.37e+00

Pos logit contributions
reci+0.076
les+0.071
orting+0.070
elling+0.064
 Nem+0.061
Neg logit contributions
 desk-0.044
 seeker-0.043
ird-0.039
 ha-0.039
 figure-0.038
Token until
Token ID1566
Feature activation-2.44e+00

Pos logit contributions
reci+0.110
les+0.106
elling+0.090
orting+0.089
ling+0.085
Neg logit contributions
 seeker-0.097
 desk-0.096
 laptop-0.085
ena-0.082
 office-0.082
Token late
Token ID2739
Feature activation-8.59e+00

Pos logit contributions
reci+0.065
les+0.062
orting+0.053
ling+0.051
elling+0.048
Neg logit contributions
 seeker-0.091
 desk-0.086
ird-0.078
 laptop-0.077
 ha-0.077
Token into
Token ID656
Feature activation+2.45e+00

Pos logit contributions
reci+0.269
les+0.252
orting+0.220
elling+0.209
ling+0.209
Neg logit contributions
 seeker-0.273
 desk-0.259
ird-0.233
 laptop-0.230
 ha-0.228
Token the
Token ID262
Feature activation-2.01e+00

Pos logit contributions
 seeker+0.094
 desk+0.089
ird+0.085
 laptop+0.083
 liar+0.082
Neg logit contributions
reci-0.058
les-0.056
orting-0.046
ling-0.043
 Nem-0.043
Token night
Token ID1755
Feature activation+1.69e-01

Pos logit contributions
reci+0.064
les+0.064
orting+0.055
ling+0.054
elling+0.052
Neg logit contributions
 seeker-0.058
ird-0.055
 liar-0.052
 desk-0.052
 laptop-0.048
Token!
Token ID0
Feature activation+6.21e-01

Pos logit contributions
 seeker+0.005
 desk+0.005
ird+0.005
 laptop+0.005
 principal+0.005
Neg logit contributions
reci-0.005
les-0.005
orting-0.004
ling-0.004
From-0.004
Token
Token ID198
Feature activation-2.71e+00

Pos logit contributions
 seeker+0.018
 desk+0.018
ird+0.017
 laptop+0.015
ena+0.015
Neg logit contributions
les-0.021
reci-0.020
ling-0.019
orting-0.017
ience-0.016
Token.
Token ID13
Feature activation+1.13e+00

Pos logit contributions
 seeker+0.101
 desk+0.087
 laptop+0.083
 liar+0.081
 principal+0.080
Neg logit contributions
les-0.079
reci-0.073
ling-0.073
orting-0.069
s-0.066
Token "
Token ID366
Feature activation-1.09e+00

Pos logit contributions
 seeker+0.039
 desk+0.037
ird+0.034
 laptop+0.034
 liar+0.033
Neg logit contributions
reci-0.032
les-0.031
orting-0.027
ling-0.026
elling-0.024
TokenIt
Token ID1026
Feature activation-4.90e+00

Pos logit contributions
les+0.035
reci+0.035
orting+0.031
elling+0.028
ience+0.028
Neg logit contributions
 seeker-0.031
 desk-0.030
ena-0.028
ird-0.028
 principal-0.028
Token's
Token ID338
Feature activation+2.29e+00

Pos logit contributions
reci+0.176
les+0.167
orting+0.159
 Nem+0.147
From+0.144
Neg logit contributions
 seeker-0.118
 desk-0.113
ird-0.108
 ha-0.105
 x-0.101
Token too
Token ID1165
Feature activation+2.94e-01

Pos logit contributions
 seeker+0.069
 desk+0.067
 laptop+0.060
ird+0.060
 principal+0.059
Neg logit contributions
reci-0.073
les-0.070
orting-0.063
ling-0.061
 Nem-0.059
Token late
Token ID2739
Feature activation-8.58e+00

Pos logit contributions
 seeker+0.009
 desk+0.008
ird+0.007
 laptop+0.007
 liar+0.007
Neg logit contributions
reci-0.010
les-0.010
orting-0.008
ling-0.008
From-0.008
Token now
Token ID783
Feature activation-4.41e+00

Pos logit contributions
reci+0.256
les+0.254
orting+0.227
ling+0.214
 Nem+0.204
Neg logit contributions
 seeker-0.269
 desk-0.257
ird-0.236
 laptop-0.234
 liar-0.230
Token and
Token ID290
Feature activation-4.20e+00

Pos logit contributions
reci+0.148
les+0.144
orting+0.127
ling+0.123
From+0.117
Neg logit contributions
 seeker-0.128
 desk-0.121
ird-0.110
 laptop-0.109
 liar-0.105
Token the
Token ID262
Feature activation-2.03e+00

Pos logit contributions
reci+0.114
les+0.111
orting+0.096
ling+0.091
From+0.085
Neg logit contributions
 seeker-0.148
 desk-0.142
ird-0.131
 laptop-0.129
 liar-0.126
Token peach
Token ID47565
Feature activation-8.82e-01

Pos logit contributions
reci+0.059
les+0.058
ling+0.051
 Nem+0.048
orting+0.047
Neg logit contributions
 seeker-0.063
ird-0.060
 desk-0.059
 laptop-0.058
ena-0.054
Token is
Token ID318
Feature activation+4.99e+00

Pos logit contributions
reci+0.027
les+0.026
orting+0.023
ling+0.022
From+0.021
Neg logit contributions
 seeker-0.028
 desk-0.027
ird-0.024
 laptop-0.024
 liar-0.023
Token
Token ID198
Feature activation-3.10e+00

Pos logit contributions
 seeker+0.028
 desk+0.028
ird+0.027
 laptop+0.025
 principal+0.025
Neg logit contributions
reci-0.021
les-0.021
ling-0.017
orting-0.017
elling-0.016
Token
Token ID198
Feature activation-3.00e+00

Pos logit contributions
reci+0.078
les+0.075
orting+0.062
ling+0.060
elling+0.058
Neg logit contributions
 seeker-0.115
 desk-0.113
ird-0.108
 laptop-0.103
 principal-0.101
TokenJohn
Token ID7554
Feature activation-2.64e-02

Pos logit contributions
reci+0.090
les+0.088
orting+0.076
ling+0.072
elling+0.069
Neg logit contributions
 seeker-0.098
 desk-0.095
ird-0.089
 laptop-0.086
 principal-0.083
Token went
Token ID1816
Feature activation+7.23e-01

Pos logit contributions
les+0.001
reci+0.001
ling+0.001
orting+0.001
 Nem+0.001
Neg logit contributions
 seeker-0.001
 desk-0.001
 laptop-0.001
ird-0.001
 liar-0.001
Token outside
Token ID2354
Feature activation-3.47e-01

Pos logit contributions
 seeker+0.025
ird+0.024
 liar+0.024
 desk+0.023
 laptop+0.022
Neg logit contributions
les-0.019
reci-0.017
orting-0.016
ling-0.016
 them-0.015
Token late
Token ID2739
Feature activation-8.57e+00

Pos logit contributions
les+0.008
reci+0.008
ling+0.007
orting+0.006
 Nem+0.006
Neg logit contributions
 seeker-0.013
 desk-0.012
ird-0.012
 liar-0.012
 laptop-0.011
Token,
Token ID11
Feature activation+1.93e+00

Pos logit contributions
les+0.224
reci+0.213
orting+0.193
ling+0.193
 them+0.171
Neg logit contributions
 seeker-0.304
ird-0.284
 liar-0.282
 desk-0.268
 principal-0.252
Token when
Token ID618
Feature activation-6.30e-02

Pos logit contributions
 seeker+0.081
 desk+0.078
ird+0.073
 laptop+0.073
 liar+0.071
Neg logit contributions
reci-0.040
les-0.038
orting-0.031
ling-0.029
From-0.027
Token it
Token ID340
Feature activation-4.58e+00

Pos logit contributions
reci+0.001
les+0.001
orting+0.001
ling+0.001
 Nem+0.001
Neg logit contributions
 seeker-0.003
 desk-0.003
ird-0.003
 laptop-0.003
 liar-0.003
Token was
Token ID373
Feature activation+4.35e+00

Pos logit contributions
les+0.137
reci+0.132
ling+0.115
orting+0.110
s+0.105
Neg logit contributions
 seeker-0.152
 desk-0.142
 liar-0.135
 laptop-0.135
ird-0.133
Token dark
Token ID3223
Feature activation-1.25e+00

Pos logit contributions
 seeker+0.138
 desk+0.136
 client+0.133
 liar+0.132
ird+0.130
Neg logit contributions
reci-0.116
les-0.115
 them-0.102
ling-0.102
orting-0.102
Token.
Token ID13
Feature activation+1.72e+00

Pos logit contributions
les+0.016
reci+0.016
orting+0.014
ling+0.014
s+0.013
Neg logit contributions
 seeker-0.019
 desk-0.018
 liar-0.016
ird-0.016
 laptop-0.016
Token But
Token ID887
Feature activation-4.09e-01

Pos logit contributions
 seeker+0.058
 desk+0.055
 laptop+0.050
ird+0.049
 principal+0.049
Neg logit contributions
reci-0.051
les-0.051
ling-0.044
orting-0.043
elling-0.040
Token it
Token ID340
Feature activation-3.51e+00

Pos logit contributions
reci+0.011
les+0.011
ling+0.009
orting+0.009
elling+0.008
Neg logit contributions
 seeker-0.016
 desk-0.014
 ha-0.013
ird-0.013
 laptop-0.013
Token was
Token ID373
Feature activation+5.67e+00

Pos logit contributions
les+0.098
reci+0.092
ling+0.085
orting+0.077
s+0.075
Neg logit contributions
 seeker-0.120
 desk-0.117
ird-0.109
 laptop-0.107
 liar-0.104
Token too
Token ID1165
Feature activation+9.50e-01

Pos logit contributions
ird+0.209
 desk+0.206
 seeker+0.197
 principal+0.191
 liar+0.188
Neg logit contributions
reci-0.116
 them-0.116
les-0.115
 Nem-0.108
ling-0.103
Token late
Token ID2739
Feature activation-8.57e+00

Pos logit contributions
 seeker+0.037
 desk+0.035
 laptop+0.033
ird+0.032
 liar+0.032
Neg logit contributions
reci-0.023
les-0.022
orting-0.019
ling-0.017
elling-0.016
Token.
Token ID13
Feature activation+1.83e+00

Pos logit contributions
les+0.233
reci+0.227
orting+0.214
ling+0.196
 them+0.196
Neg logit contributions
 seeker-0.296
 desk-0.269
ird-0.267
 liar-0.256
 ha-0.250
Token The
Token ID383
Feature activation-4.30e+00

Pos logit contributions
 seeker+0.067
 desk+0.064
ird+0.060
 laptop+0.059
 liar+0.057
Neg logit contributions
les-0.048
reci-0.048
ling-0.041
orting-0.039
elling-0.035
Token man
Token ID582
Feature activation+2.94e+00

Pos logit contributions
reci+0.121
les+0.119
orting+0.104
ling+0.099
 Nem+0.096
Neg logit contributions
 seeker-0.143
 desk-0.135
ird-0.132
 laptop-0.123
 liar-0.120
Token grabbed
Token ID13176
Feature activation-9.37e-01

Pos logit contributions
 seeker+0.101
 desk+0.090
ird+0.081
 liar+0.081
 laptop+0.081
Neg logit contributions
les-0.089
reci-0.082
ling-0.076
orting-0.071
s-0.071
Token Ben
Token ID3932
Feature activation+2.09e+00

Pos logit contributions
reci+0.028
les+0.026
orting+0.023
ling+0.022
From+0.021
Neg logit contributions
 seeker-0.032
 desk-0.030
 laptop-0.027
ird-0.027
 ha-0.026
Token cry
Token ID3960
Feature activation-2.03e+00

Pos logit contributions
 seeker+0.047
 desk+0.041
ird+0.040
ena+0.037
 principal+0.037
Neg logit contributions
reci-0.080
les-0.079
ling-0.073
 Nem-0.070
 them-0.067
Token.
Token ID13
Feature activation+1.15e+00

Pos logit contributions
les+0.060
reci+0.058
ling+0.054
orting+0.052
 Nem+0.049
Neg logit contributions
 seeker-0.067
 desk-0.061
 laptop-0.058
ird-0.058
 principal-0.057
Token It
Token ID632
Feature activation-4.31e+00

Pos logit contributions
 seeker+0.044
 desk+0.043
ird+0.040
 laptop+0.040
 liar+0.039
Neg logit contributions
reci-0.028
les-0.027
orting-0.022
ling-0.021
elling-0.019
Token was
Token ID373
Feature activation+5.05e+00

Pos logit contributions
reci+0.138
les+0.136
ling+0.118
orting+0.116
From+0.111
Neg logit contributions
 seeker-0.133
 desk-0.125
ird-0.115
 laptop-0.114
 principal-0.109
Token too
Token ID1165
Feature activation+8.06e-01

Pos logit contributions
 seeker+0.148
 desk+0.141
ird+0.130
 laptop+0.125
 principal+0.123
Neg logit contributions
reci-0.165
les-0.161
ling-0.145
 Nem-0.141
orting-0.141
Token late
Token ID2739
Feature activation-8.56e+00

Pos logit contributions
 seeker+0.029
 desk+0.028
 laptop+0.026
ird+0.026
 liar+0.026
Neg logit contributions
reci-0.021
les-0.020
orting-0.018
ling-0.015
From-0.015
Token -
Token ID532
Feature activation+3.00e+00

Pos logit contributions
les+0.237
reci+0.234
orting+0.212
ling+0.199
 Nem+0.188
Neg logit contributions
 seeker-0.290
 desk-0.269
ird-0.259
 laptop-0.249
 liar-0.249
Token he
Token ID339
Feature activation-3.37e+00

Pos logit contributions
 seeker+0.112
 desk+0.108
ird+0.100
 laptop+0.099
 liar+0.097
Neg logit contributions
reci-0.076
les-0.074
orting-0.062
ling-0.060
elling-0.055
Token had
Token ID550
Feature activation-6.57e-01

Pos logit contributions
les+0.106
reci+0.101
ling+0.091
 Nem+0.081
ience+0.081
Neg logit contributions
 seeker-0.109
 desk-0.100
 laptop-0.094
ird-0.093
 principal-0.089
Token already
Token ID1541
Feature activation-3.65e+00

Pos logit contributions
reci+0.023
les+0.022
 Nem+0.020
orting+0.019
ling+0.019
Neg logit contributions
 seeker-0.018
 desk-0.016
 laptop-0.015
ird-0.015
 principal-0.014
Token been
Token ID587
Feature activation+2.31e+00

Pos logit contributions
les+0.113
reci+0.108
ling+0.098
 Nem+0.091
led+0.089
Neg logit contributions
 seeker-0.113
 desk-0.104
 laptop-0.097
 principal-0.095
 client-0.094
Token\n
Token ID198
Feature activation-2.88e+00

Pos logit contributions
reci+0.086
les+0.085
ling+0.071
elling+0.069
orting+0.069
Neg logit contributions
 desk-0.092
 seeker-0.090
ird-0.085
 laptop-0.084
ena-0.082
TokenBut
Token ID1537
Feature activation-5.38e-01

Pos logit contributions
reci+0.085
les+0.084
elling+0.069
 Nem+0.068
orting+0.068
Neg logit contributions
 seeker-0.093
 desk-0.093
 laptop-0.086
ird-0.085
 ha-0.084
Token it
Token ID340
Feature activation-3.58e+00

Pos logit contributions
reci+0.014
les+0.014
orting+0.012
ling+0.012
From+0.011
Neg logit contributions
 seeker-0.019
 desk-0.018
ird-0.017
 laptop-0.017
 liar-0.016
Token was
Token ID373
Feature activation+5.50e+00

Pos logit contributions
les+0.098
reci+0.089
ling+0.085
From+0.076
s+0.075
Neg logit contributions
 seeker-0.128
 desk-0.123
ird-0.117
 laptop-0.116
 principal-0.111
Token too
Token ID1165
Feature activation+9.50e-01

Pos logit contributions
 desk+0.205
 seeker+0.202
ird+0.198
 laptop+0.189
 principal+0.185
Neg logit contributions
reci-0.120
les-0.114
orting-0.102
 Nem-0.101
 them-0.100
Token late
Token ID2739
Feature activation-8.56e+00

Pos logit contributions
 seeker+0.036
 desk+0.034
 laptop+0.031
ird+0.031
 liar+0.030
Neg logit contributions
reci-0.024
les-0.023
orting-0.020
ling-0.018
ted-0.017
Token.
Token ID13
Feature activation+1.42e+00

Pos logit contributions
les+0.248
reci+0.240
orting+0.225
ling+0.209
 them+0.205
Neg logit contributions
 seeker-0.282
 desk-0.256
ird-0.253
 liar-0.240
 laptop-0.234
Token The
Token ID383
Feature activation-4.40e+00

Pos logit contributions
 seeker+0.053
 desk+0.051
ird+0.048
 laptop+0.047
 liar+0.046
Neg logit contributions
reci-0.036
les-0.036
orting-0.029
ling-0.029
elling-0.026
Token fence
Token ID13990
Feature activation-6.98e-02

Pos logit contributions
reci+0.140
les+0.134
orting+0.122
ling+0.117
From+0.111
Neg logit contributions
ird-0.121
 seeker-0.121
 desk-0.114
 principal-0.106
ena-0.104
Token collapsed
Token ID14707
Feature activation-2.91e+00

Pos logit contributions
les+0.002
reci+0.002
ling+0.002
orting+0.002
ted+0.002
Neg logit contributions
 seeker-0.002
 desk-0.002
ird-0.002
 laptop-0.002
 liar-0.002
Token and
Token ID290
Feature activation-5.27e+00

Pos logit contributions
les+0.080
reci+0.077
ling+0.068
orting+0.067
ted+0.062
Neg logit contributions
 seeker-0.101
 desk-0.096
ird-0.092
 laptop-0.088
 liar-0.087
Token.
Token ID13
Feature activation+1.26e+00

Pos logit contributions
 seeker+0.137
 desk+0.120
 laptop+0.115
ird+0.108
 principal+0.107
Neg logit contributions
les-0.109
reci-0.105
ling-0.096
orting-0.093
s-0.087
Token But
Token ID887
Feature activation-5.29e-01

Pos logit contributions
 desk+0.030
 seeker+0.027
ird+0.027
 principal+0.026
 laptop+0.026
Neg logit contributions
reci-0.048
les-0.046
ling-0.041
orting-0.039
ience-0.038
Token it
Token ID340
Feature activation-3.70e+00

Pos logit contributions
reci+0.016
les+0.015
orting+0.013
ling+0.012
elling+0.012
Neg logit contributions
 seeker-0.018
 desk-0.017
ird-0.016
 ha-0.015
 laptop-0.015
Token was
Token ID373
Feature activation+5.12e+00

Pos logit contributions
reci+0.112
les+0.112
ling+0.097
orting+0.095
From+0.090
Neg logit contributions
 seeker-0.119
 desk-0.112
ird-0.104
 laptop-0.103
 liar-0.100
Token too
Token ID1165
Feature activation+7.58e-01

Pos logit contributions
 desk+0.174
 seeker+0.173
ird+0.162
 laptop+0.160
 principal+0.156
Neg logit contributions
reci-0.135
les-0.128
 Nem-0.116
ling-0.116
orting-0.113
Token late
Token ID2739
Feature activation-8.56e+00

Pos logit contributions
 seeker+0.030
 desk+0.029
 laptop+0.027
ird+0.027
 liar+0.026
Neg logit contributions
reci-0.017
les-0.017
orting-0.014
ling-0.013
ted-0.012
Token.
Token ID13
Feature activation+1.22e+00

Pos logit contributions
les+0.251
reci+0.244
orting+0.229
ling+0.211
 them+0.203
Neg logit contributions
 seeker-0.279
 desk-0.256
ird-0.245
 liar-0.239
 laptop-0.237
Token The
Token ID383
Feature activation-4.77e+00

Pos logit contributions
 seeker+0.039
 desk+0.038
ird+0.035
 laptop+0.034
 principal+0.034
Neg logit contributions
les-0.038
reci-0.037
ling-0.032
orting-0.031
ience-0.028
Token candy
Token ID18550
Feature activation+4.12e-01

Pos logit contributions
reci+0.131
les+0.127
orting+0.108
ling+0.106
 Nem+0.104
Neg logit contributions
 seeker-0.161
ird-0.149
 desk-0.144
ena-0.137
 laptop-0.135
Token was
Token ID373
Feature activation+5.53e+00

Pos logit contributions
 seeker+0.012
 desk+0.011
ird+0.011
 laptop+0.010
 liar+0.010
Neg logit contributions
les-0.014
reci-0.013
ling-0.011
orting-0.011
s-0.011
Token gone
Token ID3750
Feature activation-4.69e+00

Pos logit contributions
 seeker+0.156
 desk+0.146
ird+0.136
 laptop+0.131
 principal+0.128
Neg logit contributions
reci-0.185
les-0.181
ling-0.161
orting-0.160
 Nem-0.155
TokenTom
Token ID13787
Feature activation+3.89e-01

Pos logit contributions
les+0.093
reci+0.091
ling+0.076
orting+0.074
elling+0.072
Neg logit contributions
 seeker-0.092
 desk-0.087
ird-0.085
 ha-0.084
 liar-0.082
Token and
Token ID290
Feature activation-3.19e+00

Pos logit contributions
 seeker+0.016
 desk+0.015
ird+0.014
 laptop+0.014
 liar+0.014
Neg logit contributions
les-0.009
reci-0.008
ling-0.007
orting-0.007
From-0.006
Token Sue
Token ID26089
Feature activation+1.18e+00

Pos logit contributions
reci+0.076
les+0.075
 them+0.067
From+0.067
 Nem+0.067
Neg logit contributions
 seeker-0.109
 desk-0.103
ird-0.100
 laptop-0.099
 liar-0.092
Token were
Token ID547
Feature activation+4.47e+00

Pos logit contributions
 seeker+0.041
 desk+0.038
 laptop+0.035
 principal+0.035
ird+0.034
Neg logit contributions
les-0.033
reci-0.033
ling-0.028
orting-0.027
s-0.026
Token too
Token ID1165
Feature activation+1.02e+00

Pos logit contributions
 seeker+0.128
 desk+0.124
ird+0.110
 principal+0.105
 laptop+0.105
Neg logit contributions
les-0.144
ling-0.142
reci-0.138
 them-0.127
led-0.124
Token late
Token ID2739
Feature activation-8.56e+00

Pos logit contributions
 seeker+0.027
 desk+0.025
ird+0.023
 laptop+0.022
 principal+0.021
Neg logit contributions
reci-0.037
les-0.036
ling-0.034
s-0.031
orting-0.031
Token.
Token ID13
Feature activation+1.46e+00

Pos logit contributions
les+0.255
reci+0.254
orting+0.231
ling+0.217
 them+0.210
Neg logit contributions
 seeker-0.276
 desk-0.251
ird-0.242
 liar-0.229
 laptop-0.227
Token They
Token ID1119
Feature activation-3.51e+00

Pos logit contributions
 seeker+0.060
 desk+0.057
 laptop+0.053
ird+0.053
 liar+0.052
Neg logit contributions
les-0.033
reci-0.032
ling-0.027
orting-0.025
elling-0.022
Token could
Token ID714
Feature activation+3.56e+00

Pos logit contributions
les+0.099
reci+0.094
ling+0.080
From+0.080
ted+0.079
Neg logit contributions
 seeker-0.115
 desk-0.106
 laptop-0.100
 office-0.099
ird-0.097
Token not
Token ID407
Feature activation-1.11e+00

Pos logit contributions
 seeker+0.079
 desk+0.073
ird+0.067
 laptop+0.062
 liar+0.061
Neg logit contributions
reci-0.139
les-0.136
orting-0.123
ling-0.121
From-0.121
Token run
Token ID1057
Feature activation-4.34e-02

Pos logit contributions
reci+0.035
les+0.035
ling+0.030
orting+0.030
 Nem+0.029
Neg logit contributions
 seeker-0.034
 desk-0.031
 liar-0.029
 principal-0.029
 laptop-0.029
Token\n
Token ID198
Feature activation-2.99e+00

Pos logit contributions
reci+0.091
les+0.090
ling+0.076
elling+0.073
orting+0.073
Neg logit contributions
 seeker-0.092
 desk-0.091
ena-0.084
ird-0.084
 principal-0.083
TokenBut
Token ID1537
Feature activation-5.88e-01

Pos logit contributions
reci+0.085
les+0.083
orting+0.068
ling+0.067
elling+0.066
Neg logit contributions
 seeker-0.103
 desk-0.099
 ha-0.093
 principal-0.091
ird-0.091
Token it
Token ID340
Feature activation-3.85e+00

Pos logit contributions
reci+0.016
les+0.015
orting+0.013
ling+0.013
 Nem+0.012
Neg logit contributions
 seeker-0.021
 desk-0.020
ird-0.019
 laptop-0.019
 liar-0.018
Token was
Token ID373
Feature activation+5.43e+00

Pos logit contributions
les+0.105
reci+0.096
ling+0.091
From+0.082
ted+0.079
Neg logit contributions
 seeker-0.137
 desk-0.134
ird-0.128
 laptop-0.125
 liar-0.119
Token too
Token ID1165
Feature activation+8.05e-01

Pos logit contributions
 desk+0.199
ird+0.197
 seeker+0.188
 laptop+0.180
 principal+0.174
Neg logit contributions
reci-0.120
les-0.117
 Nem-0.112
 them-0.111
ling-0.106
Token late
Token ID2739
Feature activation-8.56e+00

Pos logit contributions
 seeker+0.031
 desk+0.029
 laptop+0.027
 liar+0.026
ird+0.026
Neg logit contributions
reci-0.020
les-0.019
orting-0.017
ling-0.015
ted-0.014
Token.
Token ID13
Feature activation+1.46e+00

Pos logit contributions
les+0.252
reci+0.247
orting+0.229
ling+0.213
 them+0.208
Neg logit contributions
 seeker-0.279
 desk-0.256
ird-0.249
 liar-0.234
 laptop-0.231
Token The
Token ID383
Feature activation-4.60e+00

Pos logit contributions
 seeker+0.059
 desk+0.057
ird+0.053
 laptop+0.053
 liar+0.051
Neg logit contributions
reci-0.032
les-0.031
orting-0.026
ling-0.024
From-0.023
Token dog
Token ID3290
Feature activation+2.34e+00

Pos logit contributions
les+0.147
reci+0.146
ling+0.126
orting+0.126
From+0.118
Neg logit contributions
 seeker-0.128
ird-0.127
 desk-0.122
ena-0.109
 principal-0.109
Token bit
Token ID1643
Feature activation+8.84e-01

Pos logit contributions
 seeker+0.081
 desk+0.075
ird+0.072
 laptop+0.070
 office+0.067
Neg logit contributions
les-0.068
reci-0.058
ling-0.058
ted-0.057
s-0.056
Token Tom
Token ID4186
Feature activation+1.07e+00

Pos logit contributions
 seeker+0.026
 desk+0.025
 laptop+0.022
 principal+0.022
 x+0.022
Neg logit contributions
reci-0.030
les-0.026
orting-0.025
ience-0.024
From-0.023
Token\n
Token ID198
Feature activation-2.93e+00

Pos logit contributions
reci+0.084
les+0.082
ling+0.070
ience+0.066
elling+0.066
Neg logit contributions
 desk-0.093
 seeker-0.092
ird-0.085
 laptop-0.084
ena-0.084
TokenBut
Token ID1537
Feature activation-3.89e-01

Pos logit contributions
reci+0.096
les+0.094
ling+0.080
orting+0.078
elling+0.077
Neg logit contributions
 seeker-0.089
 desk-0.086
ird-0.079
 laptop-0.078
 ha-0.076
Token it
Token ID340
Feature activation-3.62e+00

Pos logit contributions
reci+0.011
les+0.011
orting+0.010
ling+0.009
 Nem+0.009
Neg logit contributions
 seeker-0.013
 desk-0.012
ird-0.011
 laptop-0.011
 liar-0.011
Token was
Token ID373
Feature activation+5.42e+00

Pos logit contributions
les+0.107
reci+0.104
ling+0.092
orting+0.085
From+0.083
Neg logit contributions
 seeker-0.122
 desk-0.116
ird-0.108
 laptop-0.106
 principal-0.104
Token too
Token ID1165
Feature activation+7.83e-01

Pos logit contributions
 desk+0.190
 seeker+0.189
ird+0.181
 principal+0.172
 laptop+0.171
Neg logit contributions
reci-0.136
les-0.130
ling-0.116
orting-0.113
 Nem-0.112
Token late
Token ID2739
Feature activation-8.55e+00

Pos logit contributions
 seeker+0.028
 desk+0.027
 laptop+0.025
ird+0.024
 liar+0.024
Neg logit contributions
reci-0.021
les-0.020
orting-0.017
ling-0.015
 Nem-0.015
Token.
Token ID13
Feature activation+1.27e+00

Pos logit contributions
les+0.236
reci+0.226
orting+0.218
 them+0.198
ling+0.197
Neg logit contributions
 seeker-0.290
 desk-0.263
ird-0.260
 liar-0.251
 laptop-0.242
Token The
Token ID383
Feature activation-4.59e+00

Pos logit contributions
 seeker+0.044
 desk+0.042
ird+0.039
 laptop+0.039
 liar+0.038
Neg logit contributions
reci-0.035
les-0.035
orting-0.029
ling-0.029
elling-0.026
Token snow
Token ID6729
Feature activation-5.58e-01

Pos logit contributions
reci+0.118
les+0.115
orting+0.097
ling+0.093
From+0.087
Neg logit contributions
 seeker-0.170
 desk-0.163
ird-0.150
 laptop-0.150
 liar-0.145
Tokenman
Token ID805
Feature activation+2.63e+00

Pos logit contributions
reci+0.019
les+0.018
orting+0.016
From+0.015
ling+0.015
Neg logit contributions
 seeker-0.016
 desk-0.015
 laptop-0.014
ird-0.013
 liar-0.013
Token was
Token ID373
Feature activation+5.66e+00

Pos logit contributions
 seeker+0.084
 desk+0.075
 laptop+0.069
ird+0.068
 liar+0.067
Neg logit contributions
les-0.084
reci-0.082
ling-0.072
orting-0.069
ted-0.067
Token\n
Token ID198
Feature activation-2.60e+00

Pos logit contributions
reci+0.075
les+0.073
orting+0.060
ling+0.059
elling+0.058
Neg logit contributions
 seeker-0.083
 desk-0.080
ird-0.075
 ha-0.073
ena-0.072
TokenBut
Token ID1537
Feature activation-4.20e-01

Pos logit contributions
reci+0.082
les+0.081
orting+0.066
ling+0.065
elling+0.064
Neg logit contributions
 seeker-0.083
 desk-0.079
ird-0.073
 ha-0.072
 liar-0.070
Token it
Token ID340
Feature activation-3.49e+00

Pos logit contributions
reci+0.011
les+0.011
orting+0.010
ling+0.009
 Nem+0.009
Neg logit contributions
 seeker-0.015
 desk-0.014
ird-0.013
 laptop-0.013
 liar-0.013
Token was
Token ID373
Feature activation+5.37e+00

Pos logit contributions
les+0.100
reci+0.091
ling+0.089
From+0.080
s+0.079
Neg logit contributions
 seeker-0.119
 desk-0.115
ird-0.109
 laptop-0.108
 principal-0.104
Token too
Token ID1165
Feature activation+9.43e-01

Pos logit contributions
 desk+0.194
 seeker+0.190
ird+0.187
 laptop+0.176
 principal+0.174
Neg logit contributions
reci-0.126
les-0.119
 them-0.108
orting-0.108
 Nem-0.108
Token late
Token ID2739
Feature activation-8.55e+00

Pos logit contributions
 seeker+0.035
 desk+0.033
 laptop+0.030
ird+0.030
 liar+0.030
Neg logit contributions
reci-0.025
les-0.024
orting-0.021
ling-0.019
ted-0.018
Token.
Token ID13
Feature activation+1.35e+00

Pos logit contributions
les+0.253
reci+0.246
orting+0.230
ling+0.214
 them+0.207
Neg logit contributions
 seeker-0.277
 desk-0.254
ird-0.248
 liar-0.235
 laptop-0.231
Token Sam
Token ID3409
Feature activation+1.61e+00

Pos logit contributions
 seeker+0.052
 desk+0.049
ird+0.046
 laptop+0.046
 liar+0.045
Neg logit contributions
reci-0.033
les-0.033
ling-0.026
orting-0.026
elling-0.023
Token reached
Token ID4251
Feature activation-1.38e+00

Pos logit contributions
 seeker+0.056
 desk+0.051
 laptop+0.048
ird+0.047
 liar+0.046
Neg logit contributions
les-0.047
reci-0.040
ling-0.040
s-0.037
ted-0.037
Token the
Token ID262
Feature activation-1.36e+00

Pos logit contributions
reci+0.036
les+0.035
orting+0.029
ling+0.028
From+0.026
Neg logit contributions
 seeker-0.051
 desk-0.049
ird-0.045
 laptop-0.045
 liar-0.044
Token shiny
Token ID22441
Feature activation-3.64e+00

Pos logit contributions
les+0.035
reci+0.034
ling+0.030
orting+0.029
 them+0.028
Neg logit contributions
ird-0.045
 seeker-0.044
 desk-0.040
 liar-0.040
 laptop-0.039
Token two
Token ID734
Feature activation-3.65e-01

Pos logit contributions
reci+0.119
les+0.116
orting+0.097
ling+0.096
From+0.090
Neg logit contributions
 seeker-0.153
 desk-0.144
ird-0.136
 laptop-0.134
 liar-0.131
Token friends
Token ID2460
Feature activation+1.71e+00

Pos logit contributions
reci+0.012
les+0.011
orting+0.010
ling+0.009
 Nem+0.009
Neg logit contributions
 seeker-0.011
 desk-0.011
ird-0.010
 ha-0.009
 laptop-0.009
Token stayed
Token ID9658
Feature activation-8.92e-01

Pos logit contributions
 seeker+0.049
 desk+0.047
ird+0.044
 laptop+0.042
 ha+0.040
Neg logit contributions
reci-0.057
les-0.055
orting-0.049
ling-0.047
elling-0.045
Token out
Token ID503
Feature activation+3.08e+00

Pos logit contributions
reci+0.029
les+0.029
orting+0.025
ling+0.024
From+0.023
Neg logit contributions
 seeker-0.027
 desk-0.025
ird-0.023
 laptop-0.023
 liar-0.022
Token until
Token ID1566
Feature activation-2.51e+00

Pos logit contributions
 seeker+0.106
 desk+0.103
 laptop+0.094
ird+0.093
 office+0.090
Neg logit contributions
reci-0.085
les-0.082
orting-0.071
ling-0.065
elling-0.064
Token late
Token ID2739
Feature activation-8.54e+00

Pos logit contributions
reci+0.051
les+0.049
orting+0.040
ling+0.037
From+0.034
Neg logit contributions
 seeker-0.106
 desk-0.102
ird-0.096
 laptop-0.095
 liar-0.093
Token that
Token ID326
Feature activation-4.86e+00

Pos logit contributions
reci+0.272
les+0.268
orting+0.235
ling+0.229
 Nem+0.216
Neg logit contributions
 seeker-0.260
 desk-0.246
ird-0.226
 laptop-0.224
 liar-0.218
Token evening
Token ID6180
Feature activation-2.52e+00

Pos logit contributions
reci+0.143
les+0.138
orting+0.122
ling+0.115
From+0.110
Neg logit contributions
 seeker-0.162
 desk-0.156
ird-0.142
 laptop-0.141
 liar-0.135
Token,
Token ID11
Feature activation+2.29e+00

Pos logit contributions
reci+0.079
les+0.078
orting+0.068
ling+0.066
 Nem+0.062
Neg logit contributions
 seeker-0.078
 desk-0.074
ird-0.067
 laptop-0.067
 liar-0.065
Token looking
Token ID2045
Feature activation-1.20e+00

Pos logit contributions
 seeker+0.082
 desk+0.078
ird+0.071
 laptop+0.070
 ha+0.069
Neg logit contributions
reci-0.064
les-0.060
orting-0.052
ling-0.050
elling-0.047
Token at
Token ID379
Feature activation-1.60e+00

Pos logit contributions
reci+0.041
les+0.037
orting+0.035
From+0.034
elling+0.032
Neg logit contributions
 seeker-0.034
 desk-0.032
 laptop-0.027
ird-0.027
 office-0.027
Token\n
Token ID198
Feature activation-2.88e+00

Pos logit contributions
reci+0.070
les+0.068
orting+0.056
ling+0.054
elling+0.051
Neg logit contributions
 seeker-0.106
 desk-0.105
ird-0.098
 laptop-0.097
 principal-0.094
TokenBut
Token ID1537
Feature activation-3.88e-01

Pos logit contributions
reci+0.092
les+0.090
orting+0.079
ling+0.076
From+0.073
Neg logit contributions
 seeker-0.089
 desk-0.084
ird-0.077
 laptop-0.076
 liar-0.074
Token it
Token ID340
Feature activation-3.62e+00

Pos logit contributions
reci+0.009
les+0.009
orting+0.008
ling+0.008
 Nem+0.008
Neg logit contributions
 seeker-0.014
 desk-0.014
 laptop-0.013
ird-0.013
 liar-0.012
Token was
Token ID373
Feature activation+5.38e+00

Pos logit contributions
les+0.106
reci+0.102
ling+0.092
orting+0.085
From+0.084
Neg logit contributions
 seeker-0.121
 desk-0.116
ird-0.109
 laptop-0.108
 principal-0.104
Token too
Token ID1165
Feature activation+7.42e-01

Pos logit contributions
 desk+0.202
 seeker+0.195
ird+0.189
 laptop+0.185
 principal+0.183
Neg logit contributions
reci-0.123
les-0.116
ling-0.105
orting-0.105
 Nem-0.103
Token late
Token ID2739
Feature activation-8.54e+00

Pos logit contributions
 seeker+0.027
 desk+0.025
 laptop+0.023
 liar+0.023
 locked+0.023
Neg logit contributions
reci-0.020
les-0.019
orting-0.016
ling-0.015
ted-0.014
Token.
Token ID13
Feature activation+1.48e+00

Pos logit contributions
les+0.243
reci+0.236
orting+0.223
ling+0.205
 them+0.204
Neg logit contributions
 seeker-0.285
 desk-0.260
ird-0.252
 liar-0.243
 principal-0.238
Token No
Token ID1400
Feature activation-1.01e+00

Pos logit contributions
 seeker+0.058
 desk+0.055
 laptop+0.051
ird+0.051
 liar+0.050
Neg logit contributions
reci-0.034
les-0.032
orting-0.028
From-0.025
ling-0.025
Token one
Token ID530
Feature activation+6.19e-01

Pos logit contributions
les+0.022
reci+0.021
orting+0.017
ling+0.016
 them+0.015
Neg logit contributions
 seeker-0.040
 desk-0.039
 laptop-0.037
 principal-0.037
ird-0.036
Token heard
Token ID2982
Feature activation+2.74e+00

Pos logit contributions
 seeker+0.020
 desk+0.018
 laptop+0.017
ird+0.017
 principal+0.016
Neg logit contributions
les-0.019
reci-0.018
ling-0.016
ted-0.015
s-0.015
Token her
Token ID607
Feature activation+2.95e+00

Pos logit contributions
 seeker+0.098
 desk+0.092
ird+0.085
 laptop+0.085
 liar+0.083
Neg logit contributions
les-0.074
reci-0.073
orting-0.062
ling-0.059
 them-0.057
Token laughed
Token ID13818
Feature activation-1.67e+00

Pos logit contributions
 desk+0.056
 seeker+0.055
 laptop+0.050
ird+0.048
 slammed+0.047
Neg logit contributions
reci-0.072
les-0.068
elling-0.060
orting-0.060
ling-0.057
Token,
Token ID11
Feature activation+1.55e+00

Pos logit contributions
reci+0.062
les+0.055
elling+0.050
orting+0.049
leness+0.046
Neg logit contributions
 seeker-0.044
 desk-0.042
ird-0.036
 laptop-0.036
 hide-0.035
Token and
Token ID290
Feature activation-4.84e+00

Pos logit contributions
 desk+0.036
 seeker+0.033
 finger+0.032
 reading+0.031
 laptop+0.031
Neg logit contributions
reci-0.061
les-0.055
elling-0.053
orting-0.051
ling-0.046
Token talked
Token ID6619
Feature activation-2.56e+00

Pos logit contributions
reci+0.143
les+0.137
orting+0.120
ling+0.113
elling+0.112
Neg logit contributions
 seeker-0.161
 desk-0.153
ird-0.142
 laptop-0.139
 liar-0.135
Token until
Token ID1566
Feature activation-2.58e+00

Pos logit contributions
reci+0.085
les+0.080
orting+0.069
elling+0.068
ling+0.066
Neg logit contributions
 seeker-0.075
 desk-0.073
 laptop-0.066
ird-0.065
 liar-0.063
Token late
Token ID2739
Feature activation-8.53e+00

Pos logit contributions
reci+0.067
les+0.064
orting+0.053
ling+0.052
elling+0.049
Neg logit contributions
 seeker-0.097
 desk-0.092
ird-0.086
 laptop-0.084
 ha-0.083
Token at
Token ID379
Feature activation-6.31e-01

Pos logit contributions
reci+0.283
les+0.274
orting+0.242
ling+0.234
From+0.224
Neg logit contributions
 seeker-0.252
 desk-0.239
ird-0.217
 laptop-0.215
 liar-0.206
Token night
Token ID1755
Feature activation-1.58e+00

Pos logit contributions
reci+0.014
les+0.013
orting+0.011
ling+0.011
 Nem+0.010
Neg logit contributions
 seeker-0.025
 desk-0.024
ird-0.023
 laptop-0.022
 liar-0.022
Token.
Token ID13
Feature activation+1.15e+00

Pos logit contributions
reci+0.054
les+0.052
orting+0.046
ling+0.045
From+0.043
Neg logit contributions
 seeker-0.044
 desk-0.043
ird-0.038
 laptop-0.038
 liar-0.036
Token<|endoftext|>
Token ID50256
Feature activation-3.43e+00

Pos logit contributions
 seeker+0.041
 desk+0.039
ird+0.037
 principal+0.036
 laptop+0.035
Neg logit contributions
les-0.032
reci-0.031
ling-0.028
orting-0.025
elling-0.024
TokenOnce
Token ID7454
Feature activation-2.40e+00

Pos logit contributions
reci+0.074
les+0.071
orting+0.058
ling+0.055
From+0.051
Neg logit contributions
 seeker-0.140
 desk-0.134
ird-0.126
 laptop-0.124
 liar-0.121
Token."
Token ID526
Feature activation-1.59e-01

Pos logit contributions
 seeker+0.055
 desk+0.053
ird+0.048
 laptop+0.048
 liar+0.046
Neg logit contributions
les-0.048
reci-0.048
orting-0.043
ling-0.041
 Nem-0.038
Token But
Token ID887
Feature activation-3.33e-01

Pos logit contributions
reci+0.004
les+0.004
orting+0.003
ling+0.003
From+0.003
Neg logit contributions
 seeker-0.006
 desk-0.006
ird-0.005
 laptop-0.005
 liar-0.005
Token it
Token ID340
Feature activation-3.52e+00

Pos logit contributions
reci+0.009
les+0.009
orting+0.008
From+0.007
ling+0.007
Neg logit contributions
 seeker-0.011
 desk-0.010
ird-0.010
 laptop-0.010
 liar-0.010
Token was
Token ID373
Feature activation+5.22e+00

Pos logit contributions
les+0.110
reci+0.110
ling+0.095
orting+0.093
From+0.089
Neg logit contributions
 seeker-0.110
 desk-0.104
ird-0.096
 laptop-0.094
 liar-0.091
Token too
Token ID1165
Feature activation+8.61e-01

Pos logit contributions
 desk+0.176
 seeker+0.172
ird+0.166
 laptop+0.158
 principal+0.154
Neg logit contributions
reci-0.139
les-0.134
ling-0.122
orting-0.118
 Nem-0.118
Token late
Token ID2739
Feature activation-8.53e+00

Pos logit contributions
 seeker+0.034
 desk+0.032
 laptop+0.030
ird+0.029
 liar+0.029
Neg logit contributions
reci-0.020
les-0.020
orting-0.017
ling-0.015
ted-0.014
Token.
Token ID13
Feature activation+1.44e+00

Pos logit contributions
les+0.260
reci+0.255
orting+0.236
ling+0.221
 them+0.209
Neg logit contributions
 seeker-0.269
 desk-0.248
ird-0.237
 liar-0.224
 laptop-0.224
Token The
Token ID383
Feature activation-4.70e+00

Pos logit contributions
 seeker+0.054
 desk+0.051
ird+0.048
 laptop+0.047
 liar+0.046
Neg logit contributions
reci-0.036
les-0.034
orting-0.029
ling-0.028
From-0.027
Token poison
Token ID8764
Feature activation-5.50e-01

Pos logit contributions
les+0.133
reci+0.132
orting+0.119
ling+0.111
 them+0.108
Neg logit contributions
ird-0.131
 seeker-0.126
 desk-0.120
ena-0.112
 supposed-0.111
Token had
Token ID550
Feature activation+6.50e-02

Pos logit contributions
les+0.018
reci+0.018
orting+0.015
ling+0.015
 Nem+0.014
Neg logit contributions
 seeker-0.016
 desk-0.015
ird-0.014
 laptop-0.014
 liar-0.013
Token damaged
Token ID9694
Feature activation-3.25e+00

Pos logit contributions
 seeker+0.002
 desk+0.002
ird+0.002
 laptop+0.001
 principal+0.001
Neg logit contributions
reci-0.002
les-0.002
orting-0.002
ling-0.002
 Nem-0.002
Token\n
Token ID198
Feature activation-3.05e+00

Pos logit contributions
reci+0.098
les+0.096
ling+0.084
elling+0.081
orting+0.079
Neg logit contributions
 seeker-0.092
 desk-0.092
ena-0.084
 laptop-0.082
ird-0.082
TokenBut
Token ID1537
Feature activation-4.54e-01

Pos logit contributions
reci+0.096
les+0.095
elling+0.080
ling+0.079
orting+0.077
Neg logit contributions
 seeker-0.098
 desk-0.093
 ha-0.086
 laptop-0.084
ena-0.083
Token it
Token ID340
Feature activation-3.74e+00

Pos logit contributions
reci+0.012
les+0.012
orting+0.010
ling+0.010
From+0.009
Neg logit contributions
 seeker-0.016
 desk-0.016
ird-0.015
 laptop-0.014
 liar-0.014
Token was
Token ID373
Feature activation+5.58e+00

Pos logit contributions
les+0.095
ling+0.084
reci+0.084
From+0.073
led+0.073
Neg logit contributions
 seeker-0.139
 desk-0.135
ird-0.130
 laptop-0.127
 liar-0.122
Token too
Token ID1165
Feature activation+8.45e-01

Pos logit contributions
 desk+0.209
ird+0.206
 seeker+0.204
 laptop+0.191
 principal+0.189
Neg logit contributions
reci-0.118
les-0.113
 Nem-0.104
 them-0.103
ling-0.098
Token late
Token ID2739
Feature activation-8.53e+00

Pos logit contributions
 seeker+0.032
 desk+0.030
 laptop+0.028
 liar+0.027
ird+0.027
Neg logit contributions
reci-0.021
les-0.020
orting-0.018
ling-0.016
ted-0.015
Token.
Token ID13
Feature activation+1.65e+00

Pos logit contributions
les+0.247
reci+0.241
orting+0.224
ling+0.208
 them+0.204
Neg logit contributions
 seeker-0.283
 desk-0.257
ird-0.252
 liar-0.238
 laptop-0.233
Token Ben
Token ID3932
Feature activation+1.13e+00

Pos logit contributions
 seeker+0.062
 desk+0.060
ird+0.056
 laptop+0.055
 liar+0.054
Neg logit contributions
reci-0.041
les-0.040
orting-0.033
ling-0.032
From-0.030
Token touched
Token ID12615
Feature activation-2.46e+00

Pos logit contributions
 seeker+0.039
 desk+0.035
ird+0.033
 laptop+0.033
 liar+0.032
Neg logit contributions
les-0.034
reci-0.030
ling-0.028
s-0.027
ted-0.026
Token the
Token ID262
Feature activation-1.03e+00

Pos logit contributions
reci+0.069
les+0.067
orting+0.058
ling+0.055
From+0.052
Neg logit contributions
 seeker-0.085
 desk-0.081
ird-0.075
 laptop-0.074
 liar-0.072
Token w
Token ID266
Feature activation+3.14e+00

Pos logit contributions
les+0.033
reci+0.031
ling+0.028
orting+0.027
 Nem+0.025
Neg logit contributions
 seeker-0.031
ird-0.030
 desk-0.028
 liar-0.027
ena-0.027
Token
Token ID198
Feature activation-2.89e+00

Pos logit contributions
les+0.092
reci+0.091
elling+0.080
orting+0.079
ling+0.076
Neg logit contributions
 desk-0.086
 seeker-0.085
ird-0.078
 laptop-0.077
 principal-0.077
TokenBut
Token ID1537
Feature activation-4.47e-01

Pos logit contributions
les+0.086
reci+0.080
elling+0.074
orting+0.069
ling+0.067
Neg logit contributions
 seeker-0.091
 desk-0.090
ird-0.087
 ha-0.085
 principal-0.084
Token it
Token ID340
Feature activation-3.92e+00

Pos logit contributions
reci+0.013
les+0.012
orting+0.010
ling+0.010
elling+0.010
Neg logit contributions
 seeker-0.016
 desk-0.015
ird-0.014
 laptop-0.014
 ha-0.014
Token was
Token ID373
Feature activation+5.13e+00

Pos logit contributions
reci+0.129
les+0.126
orting+0.111
ling+0.108
From+0.102
Neg logit contributions
 seeker-0.117
 desk-0.110
ird-0.101
 laptop-0.100
 liar-0.096
Token too
Token ID1165
Feature activation+8.45e-01

Pos logit contributions
 seeker+0.168
 desk+0.163
ird+0.151
 laptop+0.148
 principal+0.144
Neg logit contributions
reci-0.148
les-0.142
ling-0.124
orting-0.123
 Nem-0.119
Token late
Token ID2739
Feature activation-8.53e+00

Pos logit contributions
 seeker+0.033
 desk+0.032
 laptop+0.030
ird+0.030
 liar+0.029
Neg logit contributions
reci-0.019
les-0.019
orting-0.016
ling-0.014
From-0.013
Token.
Token ID13
Feature activation+1.14e+00

Pos logit contributions
les+0.265
reci+0.263
orting+0.237
ling+0.226
 Nem+0.217
Neg logit contributions
 seeker-0.261
 desk-0.244
ird-0.229
 laptop-0.224
 liar-0.220
Token The
Token ID383
Feature activation-4.97e+00

Pos logit contributions
 seeker+0.046
 desk+0.045
ird+0.042
 laptop+0.042
 liar+0.041
Neg logit contributions
reci-0.025
les-0.024
orting-0.020
ling-0.019
From-0.017
Token man
Token ID582
Feature activation+2.55e+00

Pos logit contributions
les+0.162
reci+0.159
ling+0.142
orting+0.139
 Nem+0.126
Neg logit contributions
 seeker-0.130
ird-0.122
 desk-0.120
ena-0.107
 laptop-0.106
Token never
Token ID1239
Feature activation-5.50e+00

Pos logit contributions
 seeker+0.082
 desk+0.075
 laptop+0.069
ird+0.067
 liar+0.066
Neg logit contributions
les-0.079
reci-0.078
ling-0.068
orting-0.066
From-0.062
Token got
Token ID1392
Feature activation+3.38e+00

Pos logit contributions
reci+0.172
les+0.172
ling+0.145
orting+0.143
 Nem+0.136
Neg logit contributions
 seeker-0.173
 desk-0.162
 laptop-0.152
ird-0.145
 liar-0.142
Token dog
Token ID3290
Feature activation+3.12e+00

Pos logit contributions
les+0.038
reci+0.033
ling+0.031
 Nem+0.031
orting+0.031
Neg logit contributions
 seeker-0.039
ird-0.039
 supposed-0.038
ena-0.036
 principal-0.036
Token named
Token ID3706
Feature activation+2.05e+00

Pos logit contributions
 seeker+0.082
 desk+0.075
 laptop+0.067
 principal+0.066
 liar+0.065
Neg logit contributions
les-0.111
reci-0.111
orting-0.102
ling-0.098
 Nem-0.095
Token Max
Token ID5436
Feature activation+1.88e+00

Pos logit contributions
 seeker+0.078
 desk+0.075
 laptop+0.070
ird+0.070
 liar+0.069
Neg logit contributions
reci-0.046
les-0.044
 Nem-0.039
orting-0.039
ted-0.035
Token.
Token ID13
Feature activation+1.21e+00

Pos logit contributions
 seeker+0.053
 desk+0.050
ird+0.045
 laptop+0.045
 liar+0.044
Neg logit contributions
reci-0.064
les-0.063
orting-0.057
ling-0.055
 Nem-0.053
Token Max
Token ID5436
Feature activation+1.31e+00

Pos logit contributions
 seeker+0.036
 desk+0.036
 laptop+0.033
ird+0.032
 principal+0.031
Neg logit contributions
reci-0.039
les-0.039
ling-0.033
orting-0.033
elling-0.031
Token was
Token ID373
Feature activation+5.48e+00

Pos logit contributions
 seeker+0.035
 desk+0.034
ird+0.030
 laptop+0.030
 ha+0.029
Neg logit contributions
reci-0.047
les-0.045
orting-0.041
ling-0.039
elling-0.038
Token very
Token ID845
Feature activation+3.49e+00

Pos logit contributions
 desk+0.148
 seeker+0.144
ird+0.133
 laptop+0.129
 principal+0.123
Neg logit contributions
reci-0.174
les-0.173
ling-0.165
 Nem-0.161
 them-0.158
Token strong
Token ID1913
Feature activation-4.29e+00

Pos logit contributions
 seeker+0.091
 desk+0.084
ird+0.076
 liar+0.074
 laptop+0.074
Neg logit contributions
reci-0.129
les-0.127
orting-0.114
ling-0.106
elling-0.105
Token.
Token ID13
Feature activation+1.24e+00

Pos logit contributions
les+0.096
reci+0.089
ling+0.078
orting+0.076
 Nem+0.066
Neg logit contributions
 seeker-0.166
 desk-0.163
ird-0.154
 laptop-0.152
 principal-0.149
Token Lucy
Token ID22162
Feature activation+1.26e+00

Pos logit contributions
 desk+0.043
 seeker+0.042
 laptop+0.040
ird+0.040
 slammed+0.038
Neg logit contributions
reci-0.035
les-0.034
ling-0.028
orting-0.028
elling-0.025
Token told
Token ID1297
Feature activation-2.26e+00

Pos logit contributions
 seeker+0.037
 desk+0.035
ird+0.032
 laptop+0.031
 liar+0.030
Neg logit contributions
reci-0.042
les-0.041
orting-0.036
ling-0.035
From-0.033
Token "
Token ID366
Feature activation-1.00e-01

Pos logit contributions
 desk+0.084
 seeker+0.082
 laptop+0.075
ird+0.072
 principal+0.071
Neg logit contributions
reci-0.116
les-0.116
ling-0.099
orting-0.099
elling-0.097
TokenYou
Token ID1639
Feature activation+6.65e-01

Pos logit contributions
les+0.003
reci+0.003
ling+0.003
orting+0.002
lights+0.002
Neg logit contributions
 seeker-0.003
 ha-0.003
ird-0.003
 desk-0.003
ena-0.003
Token did
Token ID750
Feature activation+2.87e+00

Pos logit contributions
 seeker+0.020
 desk+0.019
 laptop+0.018
ird+0.017
 principal+0.017
Neg logit contributions
reci-0.021
les-0.021
orting-0.018
ling-0.017
 Nem-0.017
Token a
Token ID257
Feature activation+5.67e+00

Pos logit contributions
 seeker+0.094
 desk+0.090
ird+0.089
 laptop+0.085
 principal+0.083
Neg logit contributions
reci-0.077
les-0.075
 them-0.068
orting-0.067
 Nem-0.066
Token good
Token ID922
Feature activation-7.43e-01

Pos logit contributions
 seeker+0.208
 desk+0.204
ird+0.195
 laptop+0.191
 principal+0.187
Neg logit contributions
reci-0.132
les-0.131
ling-0.111
orting-0.107
 Nem-0.103
Token thing
Token ID1517
Feature activation+4.04e+00

Pos logit contributions
les+0.024
reci+0.024
ling+0.021
orting+0.021
 Nem+0.019
Neg logit contributions
 desk-0.020
 seeker-0.019
ird-0.019
 principal-0.018
 laptop-0.017
Token.
Token ID13
Feature activation+3.11e+00

Pos logit contributions
 seeker+0.125
 desk+0.119
 laptop+0.108
ird+0.108
 liar+0.104
Neg logit contributions
les-0.122
reci-0.116
orting-0.108
ling-0.103
 Nem-0.098
Token You
Token ID921
Feature activation+3.77e-01

Pos logit contributions
 seeker+0.106
 desk+0.096
ird+0.093
 ha+0.089
 laptop+0.088
Neg logit contributions
les-0.094
reci-0.091
ling-0.083
orting-0.076
led-0.068
Token rescued
Token ID19868
Feature activation-7.38e-01

Pos logit contributions
 seeker+0.011
 desk+0.010
ird+0.009
 laptop+0.009
 liar+0.009
Neg logit contributions
reci-0.013
les-0.013
orting-0.011
ling-0.011
From-0.010
Token this
Token ID428
Feature activation+4.53e-01

Pos logit contributions
reci+0.018
les+0.017
orting+0.014
ling+0.013
From+0.012
Neg logit contributions
 seeker-0.029
 desk-0.028
 laptop-0.025
ird-0.025
 ha-0.025
Token bird
Token ID6512
Feature activation+1.52e+00

Pos logit contributions
 seeker+0.014
 desk+0.013
ird+0.012
 laptop+0.011
 principal+0.011
Neg logit contributions
les-0.015
reci-0.014
orting-0.012
ling-0.012
 Nem-0.011
Token down
Token ID866
Feature activation-1.82e+00

Pos logit contributions
reci+0.078
les+0.077
orting+0.067
ling+0.064
 Nem+0.060
Neg logit contributions
 seeker-0.093
 desk-0.089
ird-0.083
 laptop-0.082
 liar-0.081
Token the
Token ID262
Feature activation-1.48e+00

Pos logit contributions
les+0.046
reci+0.046
orting+0.039
ling+0.037
 Nem+0.036
Neg logit contributions
 seeker-0.065
 desk-0.062
ird-0.059
 laptop-0.059
 liar-0.058
Token mountain
Token ID8598
Feature activation-5.32e-01

Pos logit contributions
reci+0.049
les+0.048
orting+0.044
elling+0.040
ted+0.040
Neg logit contributions
 seeker-0.044
 desk-0.043
 laptop-0.037
 ha-0.036
 office-0.036
Token,
Token ID11
Feature activation+2.28e+00

Pos logit contributions
reci+0.017
les+0.017
orting+0.015
ling+0.014
 Nem+0.014
Neg logit contributions
 seeker-0.016
 desk-0.015
ird-0.014
 laptop-0.014
 liar-0.013
Token it
Token ID340
Feature activation-3.48e+00

Pos logit contributions
 seeker+0.088
 desk+0.083
 laptop+0.079
ird+0.079
 liar+0.077
Neg logit contributions
reci-0.051
les-0.050
orting-0.041
ling-0.040
 Nem-0.039
Token was
Token ID373
Feature activation+5.21e+00

Pos logit contributions
reci+0.113
les+0.112
orting+0.097
ling+0.096
From+0.090
Neg logit contributions
 seeker-0.104
 desk-0.098
ird-0.090
 laptop-0.090
 liar-0.087
Token too
Token ID1165
Feature activation+1.37e+00

Pos logit contributions
 seeker+0.163
 desk+0.158
 laptop+0.147
ird+0.146
 principal+0.143
Neg logit contributions
reci-0.154
les-0.152
ling-0.132
orting-0.131
 Nem-0.130
Token dark
Token ID3223
Feature activation-1.52e+00

Pos logit contributions
 seeker+0.043
 desk+0.040
ird+0.036
 laptop+0.036
 liar+0.035
Neg logit contributions
reci-0.043
les-0.041
orting-0.037
ling-0.034
elling-0.033
Token.
Token ID13
Feature activation+1.25e+00

Pos logit contributions
les+0.043
reci+0.042
orting+0.037
ling+0.036
ted+0.033
Neg logit contributions
 seeker-0.052
 desk-0.049
ird-0.046
 laptop-0.045
 liar-0.045
Token She
Token ID1375
Feature activation-2.70e+00

Pos logit contributions
 seeker+0.049
 desk+0.046
 laptop+0.043
ird+0.043
 liar+0.043
Neg logit contributions
reci-0.028
les-0.027
orting-0.023
From-0.022
 Nem-0.021
Token could
Token ID714
Feature activation+3.94e+00

Pos logit contributions
les+0.084
reci+0.082
ling+0.071
orting+0.068
From+0.067
Neg logit contributions
 seeker-0.087
 desk-0.079
 laptop-0.075
 liar-0.072
ird-0.072
Token exciting
Token ID7895
Feature activation-5.10e+00

Pos logit contributions
 seeker+0.126
 desk+0.121
ird+0.112
 laptop+0.108
 liar+0.106
Neg logit contributions
reci-0.195
les-0.191
ling-0.173
orting-0.171
 Nem-0.165
Token and
Token ID290
Feature activation-3.59e+00

Pos logit contributions
reci+0.154
les+0.152
orting+0.132
ling+0.128
 Nem+0.120
Neg logit contributions
 seeker-0.163
 desk-0.154
ird-0.143
 laptop-0.141
 liar-0.137
Token new
Token ID649
Feature activation-2.85e+00

Pos logit contributions
reci+0.122
les+0.120
orting+0.106
elling+0.099
ling+0.098
Neg logit contributions
 seeker-0.102
 desk-0.099
ird-0.084
 office-0.083
 laptop-0.083
Token,
Token ID11
Feature activation+2.91e+00

Pos logit contributions
les+0.078
reci+0.073
s+0.065
ling+0.065
orting+0.064
Neg logit contributions
 seeker-0.092
 desk-0.088
ird-0.087
 laptop-0.085
 liar-0.082
Token but
Token ID475
Feature activation-9.27e-01

Pos logit contributions
 seeker+0.085
 desk+0.079
 laptop+0.070
ird+0.069
 liar+0.067
Neg logit contributions
reci-0.102
les-0.097
orting-0.086
ling-0.082
elling-0.081
Token a
Token ID257
Feature activation+5.26e+00

Pos logit contributions
reci+0.025
les+0.024
orting+0.020
ling+0.018
elling+0.018
Neg logit contributions
 seeker-0.035
 desk-0.033
ird-0.030
 laptop-0.030
 ha-0.030
Token little
Token ID1310
Feature activation-1.25e-01

Pos logit contributions
 seeker+0.190
ird+0.178
ena+0.175
 desk+0.174
 laptop+0.172
Neg logit contributions
les-0.116
reci-0.116
ling-0.101
 Nem-0.096
From-0.087
Token scary
Token ID14343
Feature activation-5.61e-01

Pos logit contributions
les+0.003
reci+0.003
ling+0.003
 Nem+0.003
orting+0.003
Neg logit contributions
 seeker-0.004
ird-0.004
 desk-0.004
 laptop-0.004
ena-0.004
Token too
Token ID1165
Feature activation+2.51e+00

Pos logit contributions
les+0.018
reci+0.015
ling+0.015
 Nem+0.014
orting+0.013
Neg logit contributions
ird-0.017
 seeker-0.017
 desk-0.016
nam-0.016
 laptop-0.015
Token.
Token ID13
Feature activation+1.33e+00

Pos logit contributions
 seeker+0.071
 desk+0.067
ird+0.061
 laptop+0.060
 liar+0.058
Neg logit contributions
reci-0.086
les-0.084
orting-0.074
ling-0.072
From-0.069
Token He
Token ID679
Feature activation-3.05e+00

Pos logit contributions
 seeker+0.052
 desk+0.050
ird+0.047
 laptop+0.046
 principal+0.045
Neg logit contributions
reci-0.031
les-0.031
orting-0.025
ling-0.025
From-0.022
Token happy
Token ID3772
Feature activation-3.74e+00

Pos logit contributions
 seeker+0.180
ird+0.173
 desk+0.169
 laptop+0.165
 principal+0.158
Neg logit contributions
reci-0.119
les-0.119
ling-0.106
 Nem-0.102
From-0.095
Token train
Token ID4512
Feature activation+4.27e-01

Pos logit contributions
les+0.109
reci+0.107
 Nem+0.094
ling+0.091
orting+0.091
Neg logit contributions
 seeker-0.117
ird-0.109
 desk-0.108
 laptop-0.104
 liar-0.100
Token.
Token ID13
Feature activation+8.42e-01

Pos logit contributions
 seeker+0.013
 desk+0.012
ird+0.011
 laptop+0.011
 liar+0.011
Neg logit contributions
les-0.013
reci-0.013
orting-0.012
ling-0.012
 Nem-0.011
Token The
Token ID383
Feature activation-4.40e+00

Pos logit contributions
 seeker+0.035
 desk+0.034
ird+0.032
 laptop+0.032
 liar+0.031
Neg logit contributions
reci-0.018
les-0.017
orting-0.014
ling-0.013
From-0.012
Token train
Token ID4512
Feature activation-1.80e-01

Pos logit contributions
reci+0.160
les+0.155
orting+0.139
ling+0.137
elling+0.131
Neg logit contributions
 seeker-0.117
 desk-0.112
 laptop-0.099
ird-0.097
 office-0.095
Token was
Token ID373
Feature activation+5.99e+00

Pos logit contributions
reci+0.006
les+0.006
orting+0.006
ling+0.005
elling+0.005
Neg logit contributions
 seeker-0.005
 desk-0.005
ird-0.004
 laptop-0.004
 ha-0.004
Token cheerful
Token ID37999
Feature activation-5.07e+00

Pos logit contributions
 seeker+0.144
 desk+0.137
ird+0.129
 laptop+0.123
 liar+0.122
Neg logit contributions
reci-0.219
les-0.214
ling-0.193
orting-0.190
 Nem-0.189
Token and
Token ID290
Feature activation-3.51e+00

Pos logit contributions
les+0.101
reci+0.088
ling+0.077
orting+0.077
 Nem+0.075
Neg logit contributions
 seeker-0.205
ird-0.197
 desk-0.196
 liar-0.192
 laptop-0.190
Token loved
Token ID6151
Feature activation+1.02e+00

Pos logit contributions
reci+0.125
les+0.123
orting+0.112
ling+0.104
elling+0.102
Neg logit contributions
 seeker-0.095
 desk-0.091
ird-0.080
 laptop-0.079
 ha-0.075
Token to
Token ID284
Feature activation+3.29e+00

Pos logit contributions
 seeker+0.034
 desk+0.032
ird+0.030
 laptop+0.029
 liar+0.029
Neg logit contributions
reci-0.029
les-0.029
orting-0.025
ling-0.024
 Nem-0.024
Token travel
Token ID3067
Feature activation-1.79e+00

Pos logit contributions
 seeker+0.099
 desk+0.093
ird+0.086
 laptop+0.085
 liar+0.083
Neg logit contributions
reci-0.106
les-0.103
orting-0.090
ling-0.087
From-0.084
Token was
Token ID373
Feature activation+4.10e+00

Pos logit contributions
reci+0.137
les+0.135
orting+0.114
ling+0.113
 Nem+0.110
Neg logit contributions
 seeker-0.153
 desk-0.144
 laptop-0.135
ird-0.132
 liar-0.131
Token a
Token ID257
Feature activation+4.70e+00

Pos logit contributions
 seeker+0.097
 desk+0.093
 laptop+0.082
ird+0.082
 principal+0.079
Neg logit contributions
reci-0.156
les-0.153
orting-0.137
ling-0.134
 Nem-0.133
Token little
Token ID1310
Feature activation-8.45e-01

Pos logit contributions
 seeker+0.165
ird+0.158
 desk+0.156
 laptop+0.152
 principal+0.146
Neg logit contributions
reci-0.112
les-0.111
ling-0.098
 Nem-0.095
From-0.089
Token girl
Token ID2576
Feature activation+3.51e+00

Pos logit contributions
reci+0.025
les+0.025
orting+0.021
ling+0.021
 Nem+0.020
Neg logit contributions
 seeker-0.027
 desk-0.026
ird-0.024
 laptop-0.024
 principal-0.023
Token called
Token ID1444
Feature activation+4.41e+00

Pos logit contributions
 seeker+0.112
 liar+0.097
 desk+0.097
 principal+0.096
 laptop+0.094
Neg logit contributions
les-0.100
reci-0.098
ling-0.093
s-0.087
orting-0.085
Token Pearl
Token ID18482
Feature activation+3.57e+00

Pos logit contributions
 seeker+0.170
 desk+0.163
 liar+0.161
 laptop+0.159
 principal+0.153
Neg logit contributions
 Nem-0.075
reci-0.075
les-0.070
 them-0.065
orting-0.064
Token.
Token ID13
Feature activation+5.91e-01

Pos logit contributions
 seeker+0.107
 desk+0.097
ird+0.090
 laptop+0.089
 liar+0.087
Neg logit contributions
les-0.119
reci-0.118
ling-0.103
orting-0.102
 Nem-0.095
Token Pearl
Token ID18482
Feature activation+2.22e+00

Pos logit contributions
 seeker+0.020
 desk+0.019
ird+0.018
 laptop+0.018
 liar+0.017
Neg logit contributions
reci-0.017
les-0.016
orting-0.014
ling-0.013
From-0.013
Token loved
Token ID6151
Feature activation+4.57e-01

Pos logit contributions
 seeker+0.072
 desk+0.067
ird+0.062
 laptop+0.062
 liar+0.060
Neg logit contributions
reci-0.068
les-0.066
orting-0.057
ling-0.056
From-0.053
Token playing
Token ID2712
Feature activation+3.46e-01

Pos logit contributions
 seeker+0.015
ird+0.014
 desk+0.014
 liar+0.013
 laptop+0.013
Neg logit contributions
reci-0.013
les-0.013
orting-0.011
 Nem-0.011
ling-0.010
Token outside
Token ID2354
Feature activation+7.27e-01

Pos logit contributions
 seeker+0.010
 desk+0.010
ird+0.009
 liar+0.009
 laptop+0.009
Neg logit contributions
les-0.011
reci-0.010
orting-0.010
ling-0.009
 Nem-0.009
Token color
Token ID3124
Feature activation+1.31e+00

Pos logit contributions
reci+0.052
les+0.052
orting+0.046
ling+0.043
From+0.043
Neg logit contributions
 seeker-0.040
 desk-0.038
 laptop-0.034
ird-0.033
 liar-0.032
Token to
Token ID284
Feature activation+2.21e+00

Pos logit contributions
 desk+0.040
 seeker+0.040
 laptop+0.036
ird+0.035
 office+0.033
Neg logit contributions
reci-0.041
les-0.039
orting-0.035
From-0.033
elling-0.032
Token match
Token ID2872
Feature activation-1.65e+00

Pos logit contributions
 seeker+0.065
 desk+0.062
ird+0.056
 laptop+0.056
 liar+0.054
Neg logit contributions
reci-0.073
les-0.072
orting-0.063
ling-0.061
elling-0.058
Token the
Token ID262
Feature activation-1.66e+00

Pos logit contributions
reci+0.050
les+0.048
orting+0.042
ling+0.040
From+0.039
Neg logit contributions
 seeker-0.052
 desk-0.051
 laptop-0.046
ird-0.045
 liar-0.044
Token water
Token ID1660
Feature activation+4.72e-01

Pos logit contributions
reci+0.046
les+0.045
orting+0.039
ling+0.037
From+0.036
Neg logit contributions
 seeker-0.057
 desk-0.055
 laptop-0.050
ird-0.049
 liar-0.049
Token.
Token ID13
Feature activation+1.43e+00

Pos logit contributions
 seeker+0.013
 desk+0.013
 laptop+0.011
ird+0.011
 office+0.011
Neg logit contributions
reci-0.016
les-0.015
orting-0.014
From-0.013
elling-0.013
Token The
Token ID383
Feature activation-4.79e+00

Pos logit contributions
 seeker+0.042
ird+0.042
 desk+0.041
 principal+0.037
 laptop+0.037
Neg logit contributions
les-0.043
ling-0.042
reci-0.042
orting-0.034
led-0.033
Token storm
Token ID6388
Feature activation-1.80e+00

Pos logit contributions
reci+0.130
les+0.125
orting+0.110
From+0.100
ling+0.099
Neg logit contributions
 seeker-0.172
 desk-0.164
 laptop-0.150
 liar-0.147
 ha-0.147
Token went
Token ID1816
Feature activation+9.02e-01

Pos logit contributions
reci+0.065
les+0.060
orting+0.058
elling+0.054
From+0.053
Neg logit contributions
 seeker-0.045
 desk-0.042
ird-0.038
 laptop-0.037
 ha-0.037
Token away
Token ID1497
Feature activation-1.98e+00

Pos logit contributions
 seeker+0.027
 desk+0.025
 laptop+0.022
ird+0.022
 liar+0.022
Neg logit contributions
reci-0.030
les-0.028
orting-0.025
elling-0.024
From-0.024
Token,
Token ID11
Feature activation+2.15e+00

Pos logit contributions
reci+0.060
les+0.059
orting+0.051
ling+0.050
From+0.047
Neg logit contributions
 seeker-0.063
 desk-0.060
ird-0.055
 laptop-0.054
 liar-0.053
Token time
Token ID640
Feature activation-1.67e+00

Pos logit contributions
 seeker+0.098
 desk+0.093
ird+0.085
 laptop+0.084
 liar+0.081
Neg logit contributions
reci-0.105
les-0.102
orting-0.090
ling-0.088
From-0.083
Token there
Token ID612
Feature activation-4.64e+00

Pos logit contributions
reci+0.048
les+0.047
orting+0.042
ling+0.040
 Nem+0.040
Neg logit contributions
 seeker-0.054
 desk-0.051
 laptop-0.048
ird-0.047
 liar-0.046
Token was
Token ID373
Feature activation+4.12e+00

Pos logit contributions
reci+0.119
les+0.119
ling+0.098
 Nem+0.097
orting+0.095
Neg logit contributions
 seeker-0.166
 desk-0.157
 laptop-0.152
 liar-0.148
ird-0.146
Token a
Token ID257
Feature activation+4.83e+00

Pos logit contributions
 seeker+0.106
 desk+0.106
 laptop+0.096
ird+0.096
 principal+0.096
Neg logit contributions
reci-0.143
les-0.139
 Nem-0.125
orting-0.124
From-0.123
Token little
Token ID1310
Feature activation-6.66e-01

Pos logit contributions
 seeker+0.169
ird+0.163
 desk+0.159
 laptop+0.155
 principal+0.149
Neg logit contributions
reci-0.114
les-0.113
ling-0.099
 Nem-0.098
From-0.091
Token girl
Token ID2576
Feature activation+3.59e+00

Pos logit contributions
les+0.019
reci+0.019
ling+0.016
orting+0.016
 Nem+0.016
Neg logit contributions
 seeker-0.021
 desk-0.020
ird-0.019
 laptop-0.019
 principal-0.018
Token.
Token ID13
Feature activation+6.29e-01

Pos logit contributions
 seeker+0.112
 desk+0.098
 liar+0.096
 laptop+0.095
 principal+0.094
Neg logit contributions
les-0.105
reci-0.103
ling-0.095
orting-0.091
s-0.090
Token She
Token ID1375
Feature activation-2.77e+00

Pos logit contributions
 seeker+0.025
 desk+0.024
 laptop+0.024
 liar+0.024
ird+0.023
Neg logit contributions
reci-0.013
les-0.012
orting-0.010
From-0.009
elling-0.009
Token had
Token ID550
Feature activation+2.49e-01

Pos logit contributions
les+0.067
reci+0.065
ling+0.054
From+0.054
 Nem+0.050
Neg logit contributions
 seeker-0.106
 desk-0.099
ird-0.097
 laptop-0.096
 principal-0.094
Token a
Token ID257
Feature activation+5.76e+00

Pos logit contributions
 laptop+0.009
 code+0.009
 desk+0.008
ird+0.008
 principal+0.008
Neg logit contributions
 them-0.005
les-0.004
There-0.004
Next-0.004
From-0.004
Token plastic
Token ID7309
Feature activation-1.48e-01

Pos logit contributions
ird+0.217
ena+0.206
 principal+0.192
keeping+0.192
 seeker+0.191
Neg logit contributions
 refreshing-0.084
 Nem-0.084
reci-0.084
les-0.081
s-0.078
Token.
Token ID13
Feature activation+9.85e-01

Pos logit contributions
 seeker+0.057
 desk+0.054
ird+0.048
 laptop+0.048
 liar+0.046
Neg logit contributions
reci-0.081
les-0.079
orting-0.071
ling-0.069
From-0.066
Token They
Token ID1119
Feature activation-3.51e+00

Pos logit contributions
 seeker+0.040
 desk+0.037
ird+0.035
 laptop+0.035
ena+0.034
Neg logit contributions
les-0.024
reci-0.023
ling-0.019
orting-0.018
ience-0.017
Token like
Token ID588
Feature activation+2.41e-01

Pos logit contributions
reci+0.126
les+0.123
orting+0.110
ling+0.107
elling+0.103
Neg logit contributions
 seeker-0.094
 desk-0.087
ird-0.079
 laptop-0.077
 liar-0.075
Token to
Token ID284
Feature activation+2.85e+00

Pos logit contributions
 seeker+0.008
 desk+0.007
ird+0.007
 laptop+0.007
 liar+0.006
Neg logit contributions
reci-0.007
les-0.007
orting-0.006
ling-0.006
 Nem-0.006
Token play
Token ID711
Feature activation+3.80e-01

Pos logit contributions
 seeker+0.103
 desk+0.096
 liar+0.094
 principal+0.093
ird+0.093
Neg logit contributions
reci-0.068
les-0.063
orting-0.054
 Nem-0.054
 them-0.052
Token in
Token ID287
Feature activation+2.27e+00

Pos logit contributions
 seeker+0.012
 desk+0.011
ird+0.010
 laptop+0.010
 liar+0.010
Neg logit contributions
reci-0.012
les-0.012
orting-0.010
ling-0.010
 Nem-0.009
Token the
Token ID262
Feature activation-1.15e+00

Pos logit contributions
 seeker+0.087
 desk+0.083
ird+0.078
 laptop+0.077
 liar+0.075
Neg logit contributions
reci-0.055
les-0.053
orting-0.045
ling-0.043
From-0.040
Token park
Token ID3952
Feature activation-4.98e-01

Pos logit contributions
reci+0.042
les+0.042
orting+0.037
ling+0.036
From+0.034
Neg logit contributions
 seeker-0.030
 desk-0.029
 laptop-0.025
ird-0.024
 office-0.024
Token.
Token ID13
Feature activation+1.18e+00

Pos logit contributions
les+0.013
reci+0.013
orting+0.012
ling+0.012
s+0.011
Neg logit contributions
 seeker-0.017
 desk-0.016
ird-0.015
 liar-0.015
 principal-0.015
Token One
Token ID1881
Feature activation-3.17e+00

Pos logit contributions
 seeker+0.041
 desk+0.038
ird+0.035
 laptop+0.035
 closet+0.034
Neg logit contributions
les-0.035
reci-0.034
ling-0.030
orting-0.028
ience-0.027
Token day
Token ID1110
Feature activation-2.88e+00

Pos logit contributions
reci+0.082
les+0.079
orting+0.067
ling+0.065
From+0.060
Neg logit contributions
 seeker-0.117
 desk-0.112
ird-0.105
 laptop-0.103
 liar-0.101
Token her
Token ID607
Feature activation+3.32e+00

Pos logit contributions
les+0.018
reci+0.018
orting+0.015
ling+0.015
 them+0.015
Neg logit contributions
 seeker-0.020
 desk-0.018
ird-0.018
 liar-0.017
 laptop-0.017
Token.
Token ID13
Feature activation+1.33e+00

Pos logit contributions
 seeker+0.099
 desk+0.087
 liar+0.086
ird+0.086
 principal+0.083
Neg logit contributions
les-0.108
reci-0.107
ling-0.095
orting-0.094
 Nem-0.088
Token Even
Token ID3412
Feature activation-4.06e+00

Pos logit contributions
 seeker+0.040
 desk+0.040
 laptop+0.035
ird+0.035
 principal+0.033
Neg logit contributions
reci-0.042
les-0.041
ling-0.037
orting-0.036
 Nem-0.033
Token though
Token ID996
Feature activation-4.63e+00

Pos logit contributions
reci+0.100
les+0.095
orting+0.083
ling+0.073
elling+0.072
Neg logit contributions
 seeker-0.154
 desk-0.145
ird-0.134
 laptop-0.133
 ha-0.131
Token the
Token ID262
Feature activation-2.15e+00

Pos logit contributions
les+0.099
reci+0.099
orting+0.077
ling+0.075
From+0.072
Neg logit contributions
 seeker-0.184
 desk-0.175
ird-0.170
 laptop-0.168
 liar-0.167
Token rain
Token ID6290
Feature activation-1.17e-02

Pos logit contributions
reci+0.062
les+0.060
orting+0.052
ling+0.050
From+0.047
Neg logit contributions
 seeker-0.072
 desk-0.069
ird-0.064
 laptop-0.063
 liar-0.062
Token was
Token ID373
Feature activation+5.93e+00

Pos logit contributions
reci+0.000
les+0.000
orting+0.000
ling+0.000
 Nem+0.000
Neg logit contributions
 seeker-0.000
 desk-0.000
ird-0.000
 laptop-0.000
 ha-0.000
Token gone
Token ID3750
Feature activation-4.37e+00

Pos logit contributions
 seeker+0.164
 desk+0.155
ird+0.140
 laptop+0.139
 liar+0.134
Neg logit contributions
reci-0.206
les-0.201
orting-0.179
ling-0.173
From-0.165
Token,
Token ID11
Feature activation+2.36e+00

Pos logit contributions
reci+0.143
les+0.140
orting+0.124
ling+0.121
 Nem+0.113
Neg logit contributions
 seeker-0.129
 desk-0.122
ird-0.112
 laptop-0.111
 liar-0.107
Token the
Token ID262
Feature activation-2.25e+00

Pos logit contributions
 seeker+0.077
 desk+0.073
ird+0.068
 laptop+0.067
 liar+0.066
Neg logit contributions
reci-0.070
les-0.068
orting-0.059
ling-0.057
From-0.054
Token day
Token ID1110
Feature activation-9.82e-01

Pos logit contributions
reci+0.064
les+0.062
orting+0.053
ling+0.052
From+0.048
Neg logit contributions
 seeker-0.076
 desk-0.073
ird-0.068
 laptop-0.067
 liar-0.066
Token a
Token ID257
Feature activation+5.63e+00

Pos logit contributions
 seeker+0.043
 desk+0.040
ird+0.036
 laptop+0.035
 ha+0.035
Neg logit contributions
reci-0.055
les-0.053
orting-0.048
ling-0.047
From-0.045
Token perfect
Token ID2818
Feature activation-6.62e-01

Pos logit contributions
ena+0.180
 seeker+0.180
ird+0.178
 desk+0.178
 principal+0.170
Neg logit contributions
reci-0.131
les-0.130
 Nem-0.118
s-0.115
orting-0.113
Token place
Token ID1295
Feature activation+3.48e+00

Pos logit contributions
les+0.017
reci+0.015
 Nem+0.015
ling+0.014
orting+0.014
Neg logit contributions
ird-0.020
ena-0.020
 principal-0.020
 desk-0.019
nam-0.019
Token to
Token ID284
Feature activation+3.03e+00

Pos logit contributions
 seeker+0.114
 desk+0.107
ird+0.102
 laptop+0.100
 liar+0.099
Neg logit contributions
reci-0.097
les-0.097
orting-0.086
ling-0.082
 Nem-0.081
Token just
Token ID655
Feature activation-1.68e+00

Pos logit contributions
 seeker+0.099
 desk+0.093
 laptop+0.090
 liar+0.090
 principal+0.089
Neg logit contributions
reci-0.083
les-0.082
orting-0.069
 Nem-0.069
ling-0.067
Token relax
Token ID8960
Feature activation-3.24e+00

Pos logit contributions
reci+0.054
les+0.053
orting+0.047
ling+0.046
 Nem+0.044
Neg logit contributions
 seeker-0.049
 desk-0.046
 laptop-0.043
ird-0.043
 liar-0.042
Token.
Token ID13
Feature activation+9.80e-01

Pos logit contributions
les+0.106
reci+0.106
ling+0.093
orting+0.093
 Nem+0.088
Neg logit contributions
 seeker-0.093
 desk-0.088
ird-0.082
 laptop-0.080
 liar-0.079
Token He
Token ID679
Feature activation-2.99e+00

Pos logit contributions
 desk+0.029
 seeker+0.028
ird+0.025
 office+0.024
 laptop+0.024
Neg logit contributions
reci-0.034
les-0.033
ling-0.029
elling-0.028
orting-0.027
Token walked
Token ID6807
Feature activation-6.47e-01

Pos logit contributions
les+0.095
reci+0.095
ling+0.082
orting+0.080
 Nem+0.078
Neg logit contributions
 seeker-0.091
 desk-0.085
 laptop-0.079
ird-0.078
 liar-0.076
Token and
Token ID290
Feature activation-4.88e+00

Pos logit contributions
reci+0.023
les+0.020
elling+0.018
orting+0.018
ience+0.017
Neg logit contributions
 seeker-0.020
 desk-0.018
 hide-0.016
 upstairs-0.015
 laptop-0.015
Token walked
Token ID6807
Feature activation-9.38e-01

Pos logit contributions
reci+0.179
les+0.171
orting+0.155
elling+0.149
ling+0.144
Neg logit contributions
 seeker-0.133
 desk-0.125
ird-0.110
 ha-0.108
 laptop-0.107
Token nap
Token ID25422
Feature activation-2.59e-01

Pos logit contributions
 seeker+0.167
 desk+0.156
ird+0.152
ena+0.150
 laptop+0.145
Neg logit contributions
reci-0.128
les-0.122
orting-0.106
 Nem-0.104
ling-0.100
Token.
Token ID13
Feature activation+2.18e+00

Pos logit contributions
les+0.008
reci+0.008
ling+0.007
orting+0.007
 Nem+0.006
Neg logit contributions
 seeker-0.008
 desk-0.007
 laptop-0.007
 principal-0.007
ird-0.007
Token They
Token ID1119
Feature activation-2.99e+00

Pos logit contributions
 seeker+0.082
 desk+0.077
ird+0.073
 laptop+0.072
ena+0.069
Neg logit contributions
les-0.059
reci-0.056
ling-0.051
orting-0.047
led-0.042
Token dream
Token ID4320
Feature activation-2.07e+00

Pos logit contributions
reci+0.108
les+0.106
orting+0.094
ling+0.092
From+0.088
Neg logit contributions
 seeker-0.079
 desk-0.074
ird-0.067
 laptop-0.066
 liar-0.063
Token of
Token ID286
Feature activation+3.76e-01

Pos logit contributions
reci+0.053
les+0.050
orting+0.042
ling+0.041
 Nem+0.040
Neg logit contributions
 seeker-0.075
 desk-0.074
ird-0.068
 laptop-0.068
 principal-0.065
Token the
Token ID262
Feature activation-1.15e+00

Pos logit contributions
 seeker+0.011
 desk+0.010
ird+0.010
 laptop+0.009
 liar+0.009
Neg logit contributions
reci-0.012
les-0.012
orting-0.011
ling-0.010
 Nem-0.010
Token river
Token ID7850
Feature activation-6.83e-01

Pos logit contributions
reci+0.034
les+0.034
ling+0.029
orting+0.028
 Nem+0.028
Neg logit contributions
 seeker-0.035
 desk-0.034
ird-0.034
 laptop-0.031
 liar-0.030
Token.
Token ID13
Feature activation+2.25e+00

Pos logit contributions
reci+0.022
les+0.021
orting+0.018
ling+0.017
From+0.017
Neg logit contributions
 seeker-0.021
 desk-0.020
 laptop-0.018
ird-0.018
 liar-0.017
Token<|endoftext|>
Token ID50256
Feature activation-3.21e+00

Pos logit contributions
 seeker+0.084
 desk+0.079
ird+0.074
 laptop+0.073
ena+0.071
Neg logit contributions
les-0.063
reci-0.058
ling-0.054
orting-0.049
led-0.045
TokenL
Token ID43
Feature activation+1.84e+00

Pos logit contributions
reci+0.074
les+0.071
orting+0.060
ling+0.057
From+0.053
Neg logit contributions
 seeker-0.126
 desk-0.119
ird-0.112
 laptop-0.111
 ha-0.108
Tokenily
Token ID813
Feature activation+5.74e-01

Pos logit contributions
 seeker+0.056
 desk+0.052
ird+0.048
 laptop+0.048
 liar+0.047
Neg logit contributions
les-0.059
reci-0.059
orting-0.051
ling-0.049
 Nem-0.047
Token.
Token ID13
Feature activation+1.45e+00

Pos logit contributions
reci+0.122
les+0.115
orting+0.105
ling+0.100
From+0.099
Neg logit contributions
 seeker-0.090
 desk-0.085
 laptop-0.076
ird-0.075
 liar-0.072
Token
Token ID198
Feature activation-2.94e+00

Pos logit contributions
 desk+0.038
 seeker+0.037
ird+0.035
 principal+0.033
 classroom+0.033
Neg logit contributions
les-0.050
reci-0.048
ling-0.047
orting-0.041
ience-0.041
Token\n
Token ID198
Feature activation-2.84e+00

Pos logit contributions
reci+0.079
les+0.076
orting+0.065
elling+0.063
ling+0.062
Neg logit contributions
 seeker-0.102
 desk-0.100
ird-0.092
 laptop-0.090
 principal-0.088
TokenAt
Token ID2953
Feature activation-3.17e+00

Pos logit contributions
reci+0.082
les+0.081
orting+0.070
ling+0.066
elling+0.066
Neg logit contributions
 seeker-0.095
 desk-0.090
 laptop-0.082
ird-0.082
 liar-0.081
Token the
Token ID262
Feature activation-1.80e+00

Pos logit contributions
reci+0.077
les+0.075
orting+0.063
ling+0.060
 Nem+0.057
Neg logit contributions
 seeker-0.118
 desk-0.113
ird-0.108
 laptop-0.105
 liar-0.104
Token party
Token ID2151
Feature activation-3.74e-01

Pos logit contributions
reci+0.056
les+0.055
orting+0.048
ling+0.046
 them+0.045
Neg logit contributions
 seeker-0.053
 desk-0.049
ird-0.047
 liar-0.047
 laptop-0.045
Token,
Token ID11
Feature activation+2.48e+00

Pos logit contributions
les+0.011
reci+0.011
orting+0.009
ling+0.009
 Nem+0.009
Neg logit contributions
 seeker-0.012
 desk-0.011
 liar-0.010
 laptop-0.010
ird-0.010
Token there
Token ID612
Feature activation-4.29e+00

Pos logit contributions
 seeker+0.086
 desk+0.082
ird+0.076
 laptop+0.076
 liar+0.073
Neg logit contributions
reci-0.069
les-0.067
orting-0.057
ling-0.055
 Nem-0.052
Token was
Token ID373
Feature activation+4.79e+00

Pos logit contributions
reci+0.129
les+0.127
orting+0.108
ling+0.106
 Nem+0.101
Neg logit contributions
 seeker-0.140
 desk-0.131
 laptop-0.122
ird-0.121
 liar-0.119
Token a
Token ID257
Feature activation+5.39e+00

Pos logit contributions
 seeker+0.140
ird+0.135
 desk+0.135
 laptop+0.127
 client+0.123
Neg logit contributions
les-0.131
reci-0.127
 Nem-0.127
 them-0.124
From-0.117
Token big
Token ID1263
Feature activation-4.98e-01

Pos logit contributions
 seeker+0.185
ird+0.183
ena+0.174
ow+0.164
 elbows+0.164
Neg logit contributions
reci-0.115
les-0.113
 Nem-0.103
ling-0.097
 refreshing-0.096
Token goodbye
Token ID24829
Feature activation-2.35e+00

Pos logit contributions
 seeker+0.047
 desk+0.045
ird+0.042
 laptop+0.041
 liar+0.040
Neg logit contributions
reci-0.036
les-0.035
orting-0.029
ling-0.029
elling-0.027
Token to
Token ID284
Feature activation+2.03e+00

Pos logit contributions
reci+0.074
les+0.072
orting+0.064
ling+0.061
From+0.058
Neg logit contributions
 seeker-0.073
 desk-0.069
ird-0.063
 laptop-0.062
 liar-0.060
Token their
Token ID511
Feature activation+4.37e-01

Pos logit contributions
 seeker+0.070
 desk+0.069
 laptop+0.063
ird+0.062
 principal+0.061
Neg logit contributions
reci-0.057
les-0.054
orting-0.047
ling-0.046
elling-0.043
Token friends
Token ID2460
Feature activation+6.32e-01

Pos logit contributions
 seeker+0.012
 desk+0.011
ird+0.011
 liar+0.010
 laptop+0.010
Neg logit contributions
reci-0.015
les-0.015
orting-0.013
ling-0.013
 Nem-0.012
Token.
Token ID13
Feature activation+1.77e+00

Pos logit contributions
 seeker+0.020
 desk+0.019
ird+0.017
 laptop+0.017
 principal+0.016
Neg logit contributions
reci-0.020
les-0.019
orting-0.017
ling-0.016
elling-0.016
Token They
Token ID1119
Feature activation-3.20e+00

Pos logit contributions
 seeker+0.068
 desk+0.065
ird+0.061
 laptop+0.060
 principal+0.058
Neg logit contributions
les-0.046
reci-0.045
ling-0.039
orting-0.035
elling-0.033
Token get
Token ID651
Feature activation+1.38e+00

Pos logit contributions
reci+0.104
les+0.100
orting+0.088
ling+0.084
From+0.083
Neg logit contributions
 seeker-0.093
 desk-0.089
 laptop-0.082
ird-0.081
 liar-0.078
Token in
Token ID287
Feature activation+1.52e+00

Pos logit contributions
 seeker+0.042
 desk+0.040
ird+0.037
 laptop+0.037
 liar+0.035
Neg logit contributions
reci-0.044
les-0.042
orting-0.037
ling-0.036
From-0.035
Token the
Token ID262
Feature activation-1.35e+00

Pos logit contributions
 seeker+0.064
 desk+0.061
ird+0.060
 liar+0.059
 principal+0.056
Neg logit contributions
reci-0.028
les-0.026
 them-0.022
orting-0.021
From-0.019
Token car
Token ID1097
Feature activation-7.43e-01

Pos logit contributions
reci+0.051
les+0.051
orting+0.044
ling+0.044
From+0.043
Neg logit contributions
 seeker-0.033
 desk-0.033
 laptop-0.029
 office-0.029
 principal-0.027
Token.
Token ID13
Feature activation+2.04e+00

Pos logit contributions
reci+0.021
les+0.021
orting+0.018
ling+0.017
From+0.016
Neg logit contributions
 seeker-0.025
 desk-0.024
ird-0.022
 liar-0.022
 laptop-0.022
Token."
Token ID526
Feature activation-5.21e-01

Pos logit contributions
 seeker+0.032
 desk+0.030
ird+0.027
 laptop+0.027
 liar+0.025
Neg logit contributions
reci-0.045
les-0.044
orting-0.039
ling-0.038
From-0.037
Token They
Token ID1119
Feature activation-3.46e+00

Pos logit contributions
reci+0.019
les+0.018
ling+0.018
ler+0.016
ience+0.016
Neg logit contributions
 desk-0.012
ird-0.012
 closet-0.011
 seeker-0.010
 slammed-0.010
Token bought
Token ID5839
Feature activation-2.70e+00

Pos logit contributions
reci+0.104
les+0.102
orting+0.088
ling+0.086
 Nem+0.081
Neg logit contributions
 seeker-0.112
 desk-0.106
 laptop-0.098
ird-0.097
 liar-0.095
Token some
Token ID617
Feature activation-1.32e+00

Pos logit contributions
reci+0.094
les+0.084
orting+0.081
From+0.078
ling+0.074
Neg logit contributions
 desk-0.071
 seeker-0.068
 x-0.063
 office-0.062
 laptop-0.061
Token food
Token ID2057
Feature activation+1.26e+00

Pos logit contributions
reci+0.042
les+0.040
orting+0.036
ling+0.034
From+0.034
Neg logit contributions
 seeker-0.040
 desk-0.039
 laptop-0.035
ird-0.035
 liar-0.033
Token for
Token ID329
Feature activation-3.27e-01

Pos logit contributions
 seeker+0.040
 desk+0.039
 laptop+0.035
ird+0.035
 office+0.034
Neg logit contributions
reci-0.040
les-0.037
orting-0.031
ling-0.031
From-0.030
Token the
Token ID262
Feature activation-7.26e-01

Pos logit contributions
reci+0.008
les+0.008
orting+0.007
ling+0.006
elling+0.006
Neg logit contributions
 seeker-0.012
 desk-0.012
ird-0.011
 laptop-0.011
 liar-0.011
Token man
Token ID582
Feature activation+3.82e+00

Pos logit contributions
reci+0.022
les+0.021
orting+0.018
ling+0.018
 Nem+0.017
Neg logit contributions
 seeker-0.023
 desk-0.022
ird-0.021
 laptop-0.020
ena-0.019
Token and
Token ID290
Feature activation-4.30e+00

Pos logit contributions
 seeker+0.119
 desk+0.115
ird+0.106
 laptop+0.105
 liar+0.100
Neg logit contributions
reci-0.119
les-0.114
orting-0.100
ling-0.096
From-0.093
Token gave
Token ID2921
Feature activation-1.30e+00

Pos logit contributions
reci+0.124
les+0.118
orting+0.104
ling+0.099
elling+0.094
Neg logit contributions
 seeker-0.145
 desk-0.142
ird-0.130
 laptop-0.128
 ha-0.124
Token it
Token ID340
Feature activation-3.02e+00

Pos logit contributions
reci+0.033
les+0.028
orting+0.028
From+0.025
ling+0.023
Neg logit contributions
 desk-0.045
 seeker-0.045
ird-0.042
 ha-0.041
 x-0.041
Token fun
Token ID1257
Feature activation-3.22e+00

Pos logit contributions
reci+0.066
les+0.064
orting+0.055
 Nem+0.052
From+0.051
Neg logit contributions
 seeker-0.080
 desk-0.076
ird-0.072
 laptop-0.070
 liar-0.068
Token together
Token ID1978
Feature activation-3.56e+00

Pos logit contributions
reci+0.088
les+0.088
orting+0.075
ling+0.073
From+0.067
Neg logit contributions
 seeker-0.113
 desk-0.106
ird-0.100
 laptop-0.098
 liar-0.097
Token.
Token ID13
Feature activation+1.99e+00

Pos logit contributions
reci+0.093
les+0.093
orting+0.080
ling+0.077
From+0.071
Neg logit contributions
 seeker-0.127
 desk-0.120
ird-0.113
 laptop-0.111
 liar-0.109
Token They
Token ID1119
Feature activation-3.11e+00

Pos logit contributions
 seeker+0.082
 desk+0.077
 laptop+0.073
ird+0.072
 principal+0.070
Neg logit contributions
les-0.047
reci-0.045
ling-0.039
orting-0.035
elling-0.032
Token are
Token ID389
Feature activation+4.32e+00

Pos logit contributions
reci+0.100
les+0.094
orting+0.082
ling+0.078
From+0.077
Neg logit contributions
 seeker-0.092
 desk-0.089
ird-0.083
 laptop-0.083
 liar-0.080
Token happy
Token ID3772
Feature activation-5.72e+00

Pos logit contributions
 desk+0.143
 seeker+0.142
ird+0.137
 laptop+0.133
 liar+0.126
Neg logit contributions
reci-0.110
les-0.108
ling-0.096
 Nem-0.092
orting-0.092
Token.
Token ID13
Feature activation+2.27e+00

Pos logit contributions
reci+0.202
les+0.192
orting+0.170
ling+0.165
elling+0.162
Neg logit contributions
 seeker-0.163
 desk-0.156
 laptop-0.137
 principal-0.133
ird-0.133
Token<|endoftext|>
Token ID50256
Feature activation-3.27e+00

Pos logit contributions
 seeker+0.089
 desk+0.082
ird+0.077
 laptop+0.077
ena+0.074
Neg logit contributions
les-0.059
reci-0.055
ling-0.050
orting-0.045
elling-0.042
TokenL
Token ID43
Feature activation+1.65e+00

Pos logit contributions
reci+0.068
les+0.065
orting+0.053
ling+0.050
From+0.045
Neg logit contributions
 seeker-0.136
 desk-0.131
ird-0.123
 laptop-0.122
 liar-0.120
Tokenily
Token ID813
Feature activation+5.11e-01

Pos logit contributions
 seeker+0.052
 desk+0.050
ird+0.045
 laptop+0.045
 liar+0.043
Neg logit contributions
reci-0.052
les-0.050
orting-0.044
ling-0.043
From-0.041
Token liked
Token ID8288
Feature activation-7.05e-02

Pos logit contributions
 seeker+0.017
 desk+0.016
ird+0.014
 laptop+0.014
 ha+0.014
Neg logit contributions
reci-0.015
les-0.014
orting-0.013
ling-0.012
elling-0.012
Token are
Token ID389
Feature activation+4.40e+00

Pos logit contributions
reci+0.102
les+0.099
ling+0.084
From+0.082
orting+0.082
Neg logit contributions
 seeker-0.084
 desk-0.082
ird-0.079
 office-0.076
 principal-0.075
Token hurt
Token ID5938
Feature activation-7.45e-01

Pos logit contributions
 seeker+0.119
 desk+0.116
ird+0.114
 laptop+0.100
 principal+0.099
Neg logit contributions
les-0.144
reci-0.142
ling-0.136
 them-0.127
 Nem-0.124
Token.
Token ID13
Feature activation+2.18e+00

Pos logit contributions
les+0.021
reci+0.020
ling+0.019
 them+0.018
orting+0.018
Neg logit contributions
 seeker-0.025
 desk-0.025
 laptop-0.023
ird-0.023
 principal-0.022
Token They
Token ID1119
Feature activation-2.95e+00

Pos logit contributions
 seeker+0.094
 desk+0.090
ird+0.085
 laptop+0.084
 liar+0.082
Neg logit contributions
reci-0.043
les-0.041
orting-0.033
ling-0.031
From-0.028
Token are
Token ID389
Feature activation+4.56e+00

Pos logit contributions
reci+0.089
les+0.086
ling+0.072
From+0.069
 them+0.069
Neg logit contributions
 seeker-0.091
 desk-0.089
ird-0.084
 office-0.082
 principal-0.081
Token sad
Token ID6507
Feature activation-5.27e+00

Pos logit contributions
 seeker+0.118
 desk+0.114
ird+0.108
 laptop+0.098
 principal+0.096
Neg logit contributions
reci-0.157
les-0.156
ling-0.146
 them-0.136
 Nem-0.135
Token.
Token ID13
Feature activation+2.25e+00

Pos logit contributions
reci+0.212
les+0.204
orting+0.185
ling+0.179
From+0.176
Neg logit contributions
 seeker-0.117
 desk-0.112
 laptop-0.099
ird-0.093
 liar-0.092
Token\n
Token ID198
Feature activation-2.85e+00

Pos logit contributions
 seeker+0.094
 desk+0.091
ird+0.085
 laptop+0.085
 liar+0.083
Neg logit contributions
reci-0.047
les-0.046
orting-0.036
ling-0.035
elling-0.031
Token\n
Token ID198
Feature activation-2.78e+00

Pos logit contributions
les+0.081
reci+0.079
orting+0.067
ling+0.066
elling+0.062
Neg logit contributions
 desk-0.094
 seeker-0.094
 laptop-0.087
 principal-0.083
ird-0.083
TokenThe
Token ID464
Feature activation-4.33e+00

Pos logit contributions
les+0.085
reci+0.082
elling+0.068
orting+0.068
ling+0.067
Neg logit contributions
 seeker-0.088
 desk-0.084
 ha-0.081
 laptop-0.080
ird-0.079
Token j
Token ID474
Feature activation-7.72e-01

Pos logit contributions
reci+0.123
les+0.121
orting+0.103
ling+0.101
From+0.095
Neg logit contributions
ird-0.137
 seeker-0.136
 desk-0.130
 laptop-0.118
 liar-0.118
Token crib
Token ID48083
Feature activation-1.39e-01

Pos logit contributions
reci+0.093
les+0.092
orting+0.082
ling+0.080
 Nem+0.076
Neg logit contributions
 seeker-0.063
 desk-0.059
ird-0.054
 laptop-0.052
 liar-0.052
Token where
Token ID810
Feature activation+1.11e+00

Pos logit contributions
les+0.003
reci+0.003
orting+0.003
ling+0.003
s+0.002
Neg logit contributions
 seeker-0.005
 desk-0.005
ird-0.005
 liar-0.005
 principal-0.005
Token she
Token ID673
Feature activation-2.51e+00

Pos logit contributions
 seeker+0.038
 desk+0.036
ird+0.034
 laptop+0.033
 liar+0.032
Neg logit contributions
reci-0.031
les-0.030
orting-0.026
ling-0.025
 Nem-0.024
Token would
Token ID561
Feature activation+1.68e+00

Pos logit contributions
reci+0.077
les+0.077
ling+0.065
orting+0.064
From+0.061
Neg logit contributions
 seeker-0.080
 desk-0.075
 laptop-0.068
ird-0.067
 liar-0.067
Token sit
Token ID1650
Feature activation-8.30e-01

Pos logit contributions
 seeker+0.047
 desk+0.043
ird+0.039
 laptop+0.039
 liar+0.038
Neg logit contributions
reci-0.058
les-0.056
orting-0.050
ling-0.048
From-0.047
Token and
Token ID290
Feature activation-4.69e+00

Pos logit contributions
reci+0.026
les+0.026
orting+0.023
ling+0.023
 Nem+0.021
Neg logit contributions
 seeker-0.025
 desk-0.024
ird-0.022
 laptop-0.022
 liar-0.021
Token knit
Token ID30495
Feature activation-1.42e+00

Pos logit contributions
reci+0.169
les+0.165
orting+0.148
ling+0.143
From+0.137
Neg logit contributions
 seeker-0.124
 desk-0.117
ird-0.105
 laptop-0.104
 liar-0.099
Token all
Token ID477
Feature activation-3.55e+00

Pos logit contributions
reci+0.048
les+0.046
orting+0.041
ling+0.040
elling+0.038
Neg logit contributions
 seeker-0.041
 desk-0.039
ird-0.035
 laptop-0.035
 liar-0.034
Token day
Token ID1110
Feature activation-2.89e+00

Pos logit contributions
reci+0.113
orting+0.108
les+0.106
ience+0.098
 Nem+0.097
Neg logit contributions
 desk-0.092
 seeker-0.089
ird-0.084
 office-0.081
 laptop-0.076
Token long
Token ID890
Feature activation-3.46e+00

Pos logit contributions
les+0.095
reci+0.095
ling+0.083
orting+0.083
 Nem+0.077
Neg logit contributions
 seeker-0.085
 desk-0.076
 liar-0.070
ird-0.070
 principal-0.069
Token.
Token ID13
Feature activation+1.49e+00

Pos logit contributions
les+0.118
reci+0.117
orting+0.102
ling+0.100
 Nem+0.095
Neg logit contributions
 seeker-0.099
 desk-0.092
ird-0.082
 laptop-0.082
 liar-0.081
Token<|endoftext|>
Token ID50256
Feature activation-3.80e+00

Pos logit contributions
 desk+0.025
 seeker+0.025
ird+0.023
 classroom+0.022
 laptop+0.022
Neg logit contributions
les-0.031
reci-0.029
ling-0.028
orting-0.026
 Nem-0.026
TokenOnce
Token ID7454
Feature activation-2.50e+00

Pos logit contributions
reci+0.081
les+0.078
orting+0.064
ling+0.060
From+0.055
Neg logit contributions
 seeker-0.156
 desk-0.150
ird-0.140
 laptop-0.139
 liar-0.136
Token upon
Token ID2402
Feature activation-1.13e+00

Pos logit contributions
reci+0.082
les+0.080
orting+0.073
From+0.068
elling+0.068
Neg logit contributions
 seeker-0.072
 desk-0.072
ird-0.062
 laptop-0.060
 principal-0.057
Token a
Token ID257
Feature activation+2.94e+00

Pos logit contributions
les+0.034
reci+0.033
From+0.030
 them+0.030
ling+0.029
Neg logit contributions
 seeker-0.035
 desk-0.032
 laptop-0.030
 principal-0.030
ird-0.029
Token time
Token ID640
Feature activation-1.87e+00

Pos logit contributions
 seeker+0.084
 desk+0.079
ird+0.072
 laptop+0.071
 liar+0.068
Neg logit contributions
reci-0.100
les-0.097
orting-0.086
ling-0.083
elling-0.080
Token there
Token ID612
Feature activation-4.69e+00

Pos logit contributions
reci+0.057
les+0.056
orting+0.050
ling+0.048
 Nem+0.047
Neg logit contributions
 seeker-0.057
 desk-0.055
 laptop-0.050
ird-0.050
 liar-0.049
Token were
Token ID547
Feature activation+3.47e+00

Pos logit contributions
reci+0.130
les+0.130
ling+0.108
orting+0.107
 Nem+0.106
Neg logit contributions
 seeker-0.160
 desk-0.151
 laptop-0.143
 liar-0.139
ird-0.139
Token two
Token ID734
Feature activation-5.54e-01

Pos logit contributions
 seeker+0.097
 desk+0.088
ird+0.082
 ha+0.078
 laptop+0.076
Neg logit contributions
reci-0.122
les-0.121
orting-0.107
ling-0.104
From-0.096
Token friends
Token ID2460
Feature activation+9.11e-01

Pos logit contributions
reci+0.017
les+0.017
ling+0.015
orting+0.015
 Nem+0.014
Neg logit contributions
 seeker-0.017
 desk-0.016
ird-0.015
 laptop-0.015
 liar-0.014
Token,
Token ID11
Feature activation+2.23e+00

Pos logit contributions
 seeker+0.026
 desk+0.024
 laptop+0.021
ird+0.021
 liar+0.021
Neg logit contributions
reci-0.031
les-0.031
orting-0.028
ling-0.027
 Nem-0.026
Token their
Token ID511
Feature activation+7.78e-01

Pos logit contributions
 seeker+0.074
 desk+0.073
 laptop+0.068
ird+0.066
 liar+0.065
Neg logit contributions
reci-0.056
les-0.055
 Nem-0.052
orting-0.048
From-0.047
Token,
Token ID11
Feature activation+2.21e+00

Pos logit contributions
reci+0.037
les+0.036
orting+0.032
ling+0.031
elling+0.030
Neg logit contributions
 seeker-0.028
 desk-0.026
ird-0.024
 laptop-0.023
 ha-0.022
Token Lily
Token ID20037
Feature activation-1.06e-01

Pos logit contributions
 seeker+0.074
 desk+0.070
ird+0.064
 laptop+0.062
 ha+0.062
Neg logit contributions
reci-0.067
les-0.065
orting-0.056
ling-0.055
elling-0.053
Token held
Token ID2714
Feature activation-2.08e+00

Pos logit contributions
reci+0.003
les+0.003
orting+0.003
ling+0.003
elling+0.003
Neg logit contributions
 seeker-0.003
 desk-0.003
ird-0.003
 laptop-0.003
 liar-0.003
Token her
Token ID607
Feature activation+2.79e+00

Pos logit contributions
reci+0.067
les+0.066
orting+0.057
ling+0.056
elling+0.053
Neg logit contributions
 seeker-0.063
 desk-0.060
ird-0.055
 laptop-0.054
 liar-0.053
Token family
Token ID1641
Feature activation-1.82e-01

Pos logit contributions
 seeker+0.074
 desk+0.066
ird+0.065
 liar+0.061
 laptop+0.060
Neg logit contributions
reci-0.099
les-0.098
orting-0.086
ling-0.086
 Nem-0.081
Token and
Token ID290
Feature activation-4.84e+00

Pos logit contributions
reci+0.006
les+0.006
orting+0.005
ling+0.005
elling+0.005
Neg logit contributions
 seeker-0.006
 desk-0.005
 laptop-0.005
ird-0.005
 ha-0.005
Token friends
Token ID2460
Feature activation+3.12e-01

Pos logit contributions
reci+0.130
les+0.128
orting+0.110
ling+0.106
 Nem+0.102
Neg logit contributions
 seeker-0.167
 desk-0.158
ird-0.148
 laptop-0.148
 liar-0.145
Token close
Token ID1969
Feature activation-4.04e+00

Pos logit contributions
 seeker+0.010
 desk+0.009
 laptop+0.009
ird+0.008
 liar+0.008
Neg logit contributions
reci-0.010
les-0.009
ling-0.008
orting-0.008
 Nem-0.007
Token to
Token ID284
Feature activation+1.57e+00

Pos logit contributions
reci+0.135
les+0.130
orting+0.116
ling+0.112
From+0.109
Neg logit contributions
 seeker-0.117
 desk-0.112
ird-0.100
 laptop-0.100
 principal-0.095
Token her
Token ID607
Feature activation+2.34e+00

Pos logit contributions
 seeker+0.062
 desk+0.057
ird+0.056
 laptop+0.054
 liar+0.054
Neg logit contributions
reci-0.036
les-0.034
 them-0.030
orting-0.028
ling-0.027
Token heart
Token ID2612
Feature activation-2.54e+00

Pos logit contributions
 seeker+0.071
ird+0.061
 liar+0.059
 desk+0.058
 ha+0.055
Neg logit contributions
reci-0.076
les-0.075
orting-0.065
ling-0.065
s-0.061
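Every record in this dump follows the same fixed layout: a `Token` line holding the token text (leading space kept), a `Token ID` line, a `Feature activation` line, and then the top-5 `Pos logit contributions` and `Neg logit contributions`, each written as token text immediately followed by a signed decimal. Below is a minimal Python sketch for loading the records into structured form so they can be sorted or filtered. It assumes the layout holds for every record, that a token line never itself ends in a signed decimal, and that a record cut off at the start of a chunk (as at the top of this one) can simply be skipped. `TokenRecord` and `parse_records` are hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

# A contribution line is "<token text><signed decimal>", e.g. " seeker-0.079".
# The token text may begin with a space, so everything before the trailing
# signed number is kept verbatim.
CONTRIB = re.compile(r"^(.*?)([+-]\d+\.\d+(?:e[+-]?\d+)?)$")


@dataclass
class TokenRecord:
    token: str                        # token text, leading space preserved
    token_id: Optional[int] = None
    activation: Optional[float] = None
    pos: List[Tuple[str, float]] = field(default_factory=list)
    neg: List[Tuple[str, float]] = field(default_factory=list)


def parse_records(lines: Iterable[str]) -> List[TokenRecord]:
    records: List[TokenRecord] = []
    current: Optional[TokenRecord] = None
    section: Optional[str] = None     # "pos", "neg", or None
    expect_id = False
    for raw in lines:
        line = raw.rstrip("\n")
        if expect_id:
            # The line right after a token line carries its vocabulary ID.
            current.token_id = int(line[len("Token ID"):])
            expect_id = False
        elif line.startswith("Feature activation"):
            if current is not None:
                current.activation = float(line[len("Feature activation"):])
        elif line == "Pos logit contributions":
            section = "pos"
        elif line == "Neg logit contributions":
            section = "neg"
        elif section and current is not None and (m := CONTRIB.match(line)):
            # Append (token text, contribution) to the active list.
            getattr(current, section).append((m.group(1), float(m.group(2))))
        elif line.startswith("Token"):
            # A new record starts; lines seen before the first one are ignored.
            current = TokenRecord(token=line[len("Token"):])
            records.append(current)
            section = None
            expect_id = True
    return records


sample = """Token was
Token ID373
Feature activation+4.79e+00

Pos logit contributions
reci+0.129
les+0.127
orting+0.108
ling+0.106
 Nem+0.101
Neg logit contributions
 seeker-0.140
 desk-0.131
 laptop-0.122
ird-0.121
 liar-0.119
"""

rec = parse_records(sample.splitlines())[0]
print(repr(rec.token), rec.token_id, rec.activation, rec.pos[0], rec.neg[0])
```

Keeping the token text verbatim, including its leading space, preserves the distinction the dump itself makes between word-initial tokens such as ` seeker` and word-internal pieces such as `ird`.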