TOP ACTIVATIONS
MAX = 9.639

cleaned up the broken ash tr ay and Lily learned to
smoking and change the ash tr ay for a toy instead
on blowing until the ash tr ay was so dry that
the cloth onto the ash tr ay . He worked hard
wanted to keep the ash tr ay . So he tried
your fingers in the ash tr ay , so Jon had
not played with the ash tr ay . They wished they
using her toy trains to decor ate . She also learned
helped Lily to tie the rib bons on the door .
suggested . He tied the rib bons onto the stick and
. Jack put the ash tr ay on the kitchen table
throw away the broken ash tr ay and soon forgotten about
the tree and tie the rib bons ." They
can use this broken ash tr ay as a puzzle !"
ite . They tied the rib bons and the feathers to
should never touch an ash tr ay ." S
will not touch the ash tr ay again ," Tim says
tried to fix the ash tr ay , but it was
children helped the girl put rib bons in her hair .
't have all these ch ocol ates ! Now , go

MIN ACTIVATIONS
MIN = -9.013

time , there was a curious girl named Jill . She
time , there was a curious little jelly fish . Everything
time , there was a curious little girl called Maria .
time , there was a curious little cat named Kitty .
a time there was a curious little girl called Daisy .
time , there was a curious little boy named Tim .
time , there was a curious boy . He was walking
a time there was a curious little girl . She was
a time there was a curious little girl named Lucy .
Mom my had a great suggestion . She wanted to go
time , there was a curious little boy . He wanted
a time there was a curious girl . She wanted to
time , there was a curious little boy called Bob .
time , there was a curious cater pillar . He went
time , there was a curious little bird named Twe ety
time , there was a curious little girl called Lucy .
time , there was a curious little boy named Tim .
time , there was a curious pet . The pet had
One day , Mom my said it was time to go
time , there was a curious boy named Tim . Tim

INTERVAL 4.976 - 9.639
CONTAINS 0.653%

back home . When he got there ,
on an exciting adventure . The journey was hard
the horn some more . At the end of the day
something so wonderful . Everyone in the garden
they went to sleep . The next day ,

INTERVAL 0.313 - 4.976
CONTAINS 46.192%

Tom ran to help him . He grabbed a
beautiful , unique necklace made out of gl immer ing jewels
held the gem in their hands . They knocked on the
angry . They decide to take the car without paying .
L ily nodded . She smiled and hugged Tom .

INTERVAL -4.350 - 0.313
CONTAINS 49.227%

for another one , but her mom said she had to
and singing and laughing . And that 's how the dead
his way home , Bob met a new friend , a
, " No , I 'm sorry , that 's not
van , so Daddy helped her walk over to it .

INTERVAL -9.013 - -4.350
CONTAINS 3.928%

, " I 'm here , Ben ! I found a
. The waiter suggested he give a tip ,
finished , Ally smiled and said , " We did it
before ! <|endoftext|> One day , Tim my went to the
, there was a little boy named Jack . Jack was
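
The four INTERVAL ranges above are consistent with splitting the span from MIN = -9.013 to MAX = 9.639 into four equal-width bins, and the four CONTAINS percentages sum to 100%. The sketch below is a minimal illustration of that reading, not the dashboard's actual code: the function names `equal_width_edges` and `bin_percentages` are hypothetical, and the equal-width rule is an inference from the numbers shown, not something the dump states.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 edges that split [lo, hi] into equal-width bins."""
    width = (hi - lo) / n_bins
    # Round to three decimals to match the precision used in the dump.
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


def bin_percentages(values, edges):
    """Percentage of values in each bin; the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = len(values)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]


# MIN and MAX as reported in the dump.
edges = equal_width_edges(-9.013, 9.639, 4)

# CONTAINS values as reported, ordered from the lowest interval to the highest.
reported = [3.928, 49.227, 46.192, 0.653]

# Made-up activations, only to show how bin_percentages is called.
toy_percentages = bin_percentages([-9.0, 0.0, 0.1, 9.639], edges)
```

Under this reading, almost all tokens (about 95%) fall in the two middle intervals, and only 0.653% reach the top interval that contains the "ash tr ay" and "rib bons" examples.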
Token cleaned
Token ID20750
Feature activation+2.81e+00

Pos logit contributions
lights+0.073
lo+0.072
vation+0.062
aro+0.061
opl+0.061
Neg logit contributions
 enough-0.042
 indeed-0.040
 like-0.040
 accidentally-0.036
 please-0.036
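
Each per-token record in this dump fuses a label with its value: `Token` is followed directly by the token string (so a token without a leading space appears as `Tokentr`), `Token ID` by the integer ID, `Feature activation` by a signed float, and each contribution line ends in a signed number. The following is a rough sketch of how such records could be split into fields, assuming this layout holds throughout. The `TokenRecord` class and `parse_records` function are hypothetical names introduced here for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TokenRecord:
    token: str
    token_id: Optional[int] = None
    activation: Optional[float] = None
    pos: List[Tuple[str, float]] = field(default_factory=list)
    neg: List[Tuple[str, float]] = field(default_factory=list)


# Assumed line shapes, inferred from the dump itself.
_ID_LINE = re.compile(r"^Token ID(\d+)$")
_CONTRIB = re.compile(r"^(.*?)([+-]\d+\.\d+)$")


def parse_records(lines):
    """Split fused 'label+value' lines into TokenRecord objects."""
    records = []
    current = None
    bucket = None
    for raw in lines:
        line = raw.rstrip("\n")
        id_match = _ID_LINE.match(line)
        if id_match and current is not None and current.token_id is None:
            current.token_id = int(id_match.group(1))
        elif line.startswith("Token"):
            # Everything after the label is the token, leading space included.
            current = TokenRecord(token=line[len("Token"):])
            records.append(current)
            bucket = None
        elif current is None:
            continue  # skip anything before the first record
        elif line.startswith("Feature activation"):
            current.activation = float(line[len("Feature activation"):])
        elif line == "Pos logit contributions":
            bucket = current.pos
        elif line == "Neg logit contributions":
            bucket = current.neg
        elif line and bucket is not None:
            m = _CONTRIB.match(line)
            if m:
                bucket.append((m.group(1), float(m.group(2))))
    return records


# Lines copied from two records in this dump.
sample = [
    "Tokentr", "Token ID2213", "Feature activation+9.64e+00", "",
    "Pos logit contributions", "lo+0.088", "lights+0.085",
    "Neg logit contributions", " like-0.088", ",-0.077",
    "Token learned", "Token ID4499", "Feature activation-1.43e+00",
]
records = parse_records(sample)
```

One known limitation of this sketch: a contribution line whose token text itself begins with `Token` would be mistaken for the start of a new record.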
Token up
Token ID510
Feature activation+3.34e+00

Pos logit contributions
lights+0.074
lo+0.073
vation+0.065
aro+0.063
opl+0.062
Neg logit contributions
 enough-0.049
 like-0.046
 indeed-0.046
 accidentally-0.042
 Can-0.041
Token the
Token ID262
Feature activation+3.44e+00

Pos logit contributions
lights+0.107
lo+0.106
vation+0.105
opl+0.102
aro+0.097
Neg logit contributions
 enough-0.042
 like-0.039
 indeed-0.032
 three-0.032
 a-0.032
Token broken
Token ID5445
Feature activation+2.64e+00

Pos logit contributions
lights+0.085
lo+0.083
aro+0.070
vation+0.069
irl+0.067
Neg logit contributions
 indeed-0.066
 enough-0.065
 like-0.064
 Can-0.061
 accidentally-0.058
Token ash
Token ID12530
Feature activation+4.06e+00

Pos logit contributions
lights+0.058
lo+0.056
aro+0.044
eds+0.043
opl+0.043
Neg logit contributions
 indeed-0.056
 enough-0.055
 like-0.054
 wanna-0.053
 Can-0.053
Tokentr
Token ID2213
Feature activation+9.64e+00

Pos logit contributions
lo+0.088
lights+0.085
irl+0.083
aro+0.082
vation+0.080
Neg logit contributions
 like-0.088
 enough-0.087
 indeed-0.085
,-0.077
 She-0.075
Tokenay
Token ID323
Feature activation+5.58e+00

Pos logit contributions
lo+0.219
aro+0.207
vation+0.203
lights+0.198
irl+0.196
Neg logit contributions
 indeed-0.188
 enough-0.186
 like-0.168
 please-0.167
 She-0.161
Token and
Token ID290
Feature activation+2.96e+00

Pos logit contributions
lights+0.210
lo+0.208
irl+0.195
vation+0.194
aro+0.194
Neg logit contributions
 indeed-0.029
 enough-0.028
 like-0.028
 accidentally-0.014
,-0.013
Token Lily
Token ID20037
Feature activation+2.14e+00

Pos logit contributions
lo+0.094
lights+0.090
opl+0.081
vation+0.080
aro+0.080
Neg logit contributions
 enough-0.041
 like-0.040
 indeed-0.040
 accidentally-0.037
 wanted-0.035
Token learned
Token ID4499
Feature activation-1.43e+00

Pos logit contributions
lights+0.064
lo+0.063
aro+0.054
vation+0.053
opl+0.052
Neg logit contributions
 enough-0.031
 indeed-0.031
 like-0.030
 accidentally-0.028
 wanted-0.026
Token to
Token ID284
Feature activation+3.01e+00

Pos logit contributions
 indeed+0.019
 enough+0.018
 Can+0.018
 like+0.017
 wanna+0.017
Neg logit contributions
lights-0.043
lo-0.042
aro-0.035
irl-0.035
ups-0.035
Token smoking
Token ID9216
Feature activation+2.09e+00

Pos logit contributions
 indeed+0.005
 enough+0.004
 Can+0.004
 like+0.004
 wanna+0.004
Neg logit contributions
lights-0.008
lo-0.008
aro-0.007
we-0.007
irl-0.007
Token and
Token ID290
Feature activation+1.56e+00

Pos logit contributions
lights+0.064
lo+0.063
aro+0.053
vation+0.053
irl+0.052
Neg logit contributions
 indeed-0.026
 enough-0.025
 Can-0.024
 like-0.024
 wanna-0.023
Token change
Token ID1487
Feature activation+3.54e+00

Pos logit contributions
lights+0.043
lo+0.042
aro+0.035
vation+0.035
irl+0.034
Neg logit contributions
 indeed-0.025
 enough-0.025
 like-0.024
 Can-0.023
 please-0.022
Token the
Token ID262
Feature activation+3.66e+00

Pos logit contributions
lights+0.106
lo+0.106
vation+0.094
aro+0.092
opl+0.090
Neg logit contributions
 enough-0.048
 like-0.047
 indeed-0.046
 Can-0.040
 ok-0.039
Token ash
Token ID12530
Feature activation+3.74e+00

Pos logit contributions
lights+0.095
lo+0.092
aro+0.078
vation+0.077
irl+0.076
Neg logit contributions
 indeed-0.066
 enough-0.065
 like-0.065
 Can-0.060
 ok-0.058
Tokentr
Token ID2213
Feature activation+9.44e+00

Pos logit contributions
lights+0.078
lo+0.076
irl+0.071
vation+0.068
aro+0.066
Neg logit contributions
 like-0.086
 indeed-0.084
 enough-0.083
 please-0.076
 just-0.073
Tokenay
Token ID323
Feature activation+5.21e+00

Pos logit contributions
lo+0.215
aro+0.202
lights+0.198
vation+0.197
irl+0.194
Neg logit contributions
 indeed-0.183
 enough-0.181
 please-0.167
 like-0.163
 She-0.152
Token for
Token ID329
Feature activation+3.41e+00

Pos logit contributions
lights+0.147
lo+0.143
irl+0.126
vation+0.126
aro+0.124
Neg logit contributions
 indeed-0.080
 like-0.079
 enough-0.079
 ok-0.067
 please-0.067
Token a
Token ID257
Feature activation-4.70e-01

Pos logit contributions
lights+0.110
lo+0.108
vation+0.094
aro+0.093
opl+0.091
Neg logit contributions
 indeed-0.040
 enough-0.039
 like-0.037
 Can-0.033
 please-0.033
Token toy
Token ID13373
Feature activation+2.79e-01

Pos logit contributions
 indeed+0.008
 enough+0.008
 like+0.007
 Can+0.007
 gonna+0.007
Neg logit contributions
lights-0.012
lo-0.012
vation-0.010
aro-0.010
irl-0.010
Token instead
Token ID2427
Feature activation+1.11e+00

Pos logit contributions
lights+0.007
lo+0.007
aro+0.006
vation+0.005
opl+0.005
Neg logit contributions
 indeed-0.005
 enough-0.005
 Can-0.005
 like-0.005
 wanna-0.005
Token on
Token ID319
Feature activation+1.44e+00

Pos logit contributions
lights+0.017
lo+0.016
vation+0.014
aro+0.014
opl+0.013
Neg logit contributions
 enough-0.008
 indeed-0.008
 like-0.008
 please-0.007
 accidentally-0.007
Token blowing
Token ID19280
Feature activation+4.00e+00

Pos logit contributions
lights+0.039
lo+0.039
aro+0.033
vation+0.032
opl+0.031
Neg logit contributions
 enough-0.025
 indeed-0.024
 like-0.023
 accidentally-0.021
 ok-0.021
Token until
Token ID1566
Feature activation+9.39e-01

Pos logit contributions
lights+0.120
lo+0.119
vation+0.111
aro+0.107
opl+0.104
Neg logit contributions
 like-0.055
 enough-0.053
 indeed-0.050
 ok-0.041
 please-0.041
Token the
Token ID262
Feature activation+2.21e+00

Pos logit contributions
lo+0.032
lights+0.032
aro+0.028
opl+0.027
cks+0.027
Neg logit contributions
 enough-0.011
 indeed-0.010
 like-0.010
 accidentally-0.009
 please-0.008
Token ash
Token ID12530
Feature activation+3.35e+00

Pos logit contributions
lights+0.061
lo+0.059
aro+0.051
irl+0.051
vation+0.051
Neg logit contributions
 indeed-0.037
 like-0.036
 enough-0.035
 Can-0.033
 please-0.032
Tokentr
Token ID2213
Feature activation+9.40e+00

Pos logit contributions
lo+0.070
lights+0.068
aro+0.063
irl+0.063
vation+0.060
Neg logit contributions
 like-0.075
 indeed-0.073
 enough-0.072
 wanted-0.068
 accidentally-0.066
Tokenay
Token ID323
Feature activation+4.95e+00

Pos logit contributions
lo+0.210
aro+0.199
lights+0.192
vation+0.188
irl+0.187
Neg logit contributions
 indeed-0.191
 enough-0.188
 like-0.176
 please-0.173
 She-0.166
Token was
Token ID373
Feature activation-6.34e-01

Pos logit contributions
lights+0.143
lo+0.138
aro+0.120
vation+0.120
irl+0.117
Neg logit contributions
 indeed-0.074
 enough-0.073
 like-0.072
 accidentally-0.063
 wanted-0.062
Token so
Token ID523
Feature activation-3.87e-01

Pos logit contributions
 indeed+0.012
 enough+0.011
 like+0.011
 Can+0.010
 please+0.010
Neg logit contributions
lights-0.016
lo-0.016
aro-0.013
vation-0.013
opl-0.013
Token dry
Token ID5894
Feature activation+1.06e+00

Pos logit contributions
 like+0.007
 indeed+0.007
 enough+0.006
 please+0.006
 ok+0.006
Neg logit contributions
lo-0.011
lights-0.010
aro-0.009
rob-0.009
opl-0.009
Token that
Token ID326
Feature activation+7.47e-01

Pos logit contributions
lights+0.029
lo+0.028
vation+0.023
aro+0.023
opl+0.022
Neg logit contributions
 indeed-0.017
 enough-0.017
 like-0.016
 Can-0.016
 please-0.015
Token the
Token ID262
Feature activation+2.58e+00

Pos logit contributions
lights+0.086
lo+0.085
vation+0.078
aro+0.076
opl+0.075
Neg logit contributions
 like-0.033
 enough-0.032
 indeed-0.030
 Can-0.026
 accidentally-0.025
Token cloth
Token ID16270
Feature activation+4.02e+00

Pos logit contributions
lights+0.066
lo+0.064
aro+0.054
vation+0.054
irl+0.053
Neg logit contributions
 indeed-0.047
 enough-0.046
 like-0.046
 Can-0.043
 please-0.041
Token onto
Token ID4291
Feature activation+2.94e+00

Pos logit contributions
lo+0.123
lights+0.121
aro+0.108
vation+0.106
opl+0.104
Neg logit contributions
 enough-0.055
 like-0.054
 indeed-0.052
 please-0.043
 accidentally-0.042
Token the
Token ID262
Feature activation+3.16e+00

Pos logit contributions
lights+0.089
lo+0.086
vation+0.076
aro+0.075
ites+0.073
Neg logit contributions
 enough-0.041
 like-0.040
 indeed-0.040
 Can-0.035
 accidentally-0.034
Token ash
Token ID12530
Feature activation+2.60e+00

Pos logit contributions
lights+0.078
lo+0.076
aro+0.063
vation+0.063
irl+0.061
Neg logit contributions
 indeed-0.060
 enough-0.059
 like-0.058
 Can-0.055
 please-0.053
Tokentr
Token ID2213
Feature activation+9.39e+00

Pos logit contributions
lo+0.052
lights+0.051
irl+0.047
aro+0.047
vation+0.045
Neg logit contributions
 like-0.061
 enough-0.060
 indeed-0.058
 pad-0.053
 please-0.053
Tokenay
Token ID323
Feature activation+4.32e+00

Pos logit contributions
lo+0.210
aro+0.202
lights+0.194
vation+0.188
irl+0.186
Neg logit contributions
 indeed-0.186
 enough-0.183
 like-0.171
 please-0.170
 She-0.164
Token.
Token ID13
Feature activation+1.62e-02

Pos logit contributions
lights+0.128
lo+0.123
aro+0.109
vation+0.109
irl+0.107
Neg logit contributions
 indeed-0.061
 like-0.060
 enough-0.059
 please-0.050
 Can-0.049
Token He
Token ID679
Feature activation+2.04e+00

Pos logit contributions
lights+0.001
lo+0.001
aro+0.000
vation+0.000
irl+0.000
Neg logit contributions
 indeed-0.000
 enough-0.000
 like-0.000
 Can-0.000
 please-0.000
Token worked
Token ID3111
Feature activation+2.15e+00

Pos logit contributions
lo+0.060
lights+0.060
aro+0.051
vation+0.051
opl+0.050
Neg logit contributions
 indeed-0.029
 like-0.029
 enough-0.029
 accidentally-0.027
 wanted-0.026
Token hard
Token ID1327
Feature activation+2.60e+00

Pos logit contributions
lights+0.058
lo+0.057
aro+0.048
vation+0.048
opl+0.046
Neg logit contributions
 indeed-0.036
 enough-0.036
 like-0.035
 Can-0.032
 please-0.031
Token wanted
Token ID2227
Feature activation-7.81e-01

Pos logit contributions
lights+0.028
lo+0.027
irl+0.023
opl+0.023
eds+0.023
Neg logit contributions
 enough-0.015
 detectives-0.014
 indeed-0.014
 Can-0.013
 please-0.013
Token to
Token ID284
Feature activation+1.84e+00

Pos logit contributions
 indeed+0.008
 Can+0.007
 like+0.007
 enough+0.007
 wanna+0.007
Neg logit contributions
lights-0.025
lo-0.025
aro-0.021
ups-0.021
vation-0.020
Token keep
Token ID1394
Feature activation+2.44e+00

Pos logit contributions
lights+0.048
lo+0.046
aro+0.039
vation+0.038
irl+0.037
Neg logit contributions
 indeed-0.033
 enough-0.032
 like-0.032
 Can-0.030
 please-0.029
Token the
Token ID262
Feature activation+2.81e+00

Pos logit contributions
lights+0.072
lo+0.071
aro+0.060
vation+0.060
irl+0.059
Neg logit contributions
 indeed-0.035
 enough-0.034
 like-0.033
 Can-0.031
 please-0.029
Token ash
Token ID12530
Feature activation+3.12e+00

Pos logit contributions
lights+0.068
lo+0.067
aro+0.054
vation+0.054
irl+0.052
Neg logit contributions
 indeed-0.054
 enough-0.054
 like-0.052
 Can-0.050
 please-0.048
Tokentr
Token ID2213
Feature activation+9.24e+00

Pos logit contributions
lights+0.071
lo+0.071
irl+0.061
aro+0.061
vation+0.061
Neg logit contributions
 like-0.066
 enough-0.065
 indeed-0.064
 please-0.059
 ok-0.058
Tokenay
Token ID323
Feature activation+4.62e+00

Pos logit contributions
lo+0.213
aro+0.200
lights+0.195
irl+0.189
vation+0.188
Neg logit contributions
 indeed-0.180
 please-0.173
 enough-0.172
 like-0.164
 She-0.155
Token.
Token ID13
Feature activation+1.51e-01

Pos logit contributions
lights+0.128
lo+0.126
aro+0.107
vation+0.106
irl+0.105
Neg logit contributions
 indeed-0.074
 enough-0.073
 like-0.071
 Can-0.065
 please-0.064
Token So
Token ID1406
Feature activation-1.07e+00

Pos logit contributions
lights+0.005
lo+0.005
aro+0.004
vation+0.004
irl+0.004
Neg logit contributions
 indeed-0.001
 enough-0.001
 like-0.001
 Can-0.001
 please-0.001
Token he
Token ID339
Feature activation+1.56e+00

Pos logit contributions
 indeed+0.009
 enough+0.008
 like+0.008
 Can+0.008
 wanna+0.008
Neg logit contributions
lights-0.037
lo-0.036
aro-0.031
vation-0.031
irl-0.031
Token tried
Token ID3088
Feature activation-1.31e+00

Pos logit contributions
lights+0.037
lo+0.037
irl+0.032
ups+0.031
aro+0.031
Neg logit contributions
 Can-0.027
 indeed-0.026
 like-0.026
 wanna-0.025
 enough-0.025
Token your
Token ID534
Feature activation+2.50e+00

Pos logit contributions
lights+0.141
lo+0.135
vation+0.121
aro+0.115
irl+0.114
Neg logit contributions
 enough-0.080
 like-0.079
 indeed-0.076
 please-0.067
 Can-0.066
Token fingers
Token ID9353
Feature activation+4.65e+00

Pos logit contributions
lights+0.070
lo+0.066
vation+0.060
irl+0.059
aro+0.057
Neg logit contributions
 like-0.040
 indeed-0.040
 enough-0.039
 dear-0.036
 Can-0.036
Token in
Token ID287
Feature activation+4.11e+00

Pos logit contributions
lights+0.127
lo+0.126
vation+0.106
aro+0.106
opl+0.103
Neg logit contributions
 enough-0.075
 indeed-0.074
 like-0.074
 please-0.066
 Can-0.066
Token the
Token ID262
Feature activation+3.79e+00

Pos logit contributions
lights+0.123
lo+0.121
vation+0.106
aro+0.103
irl+0.100
Neg logit contributions
 enough-0.059
 indeed-0.059
 like-0.058
 Can-0.050
 please-0.050
Token ash
Token ID12530
Feature activation+2.62e+00

Pos logit contributions
lights+0.109
lo+0.103
aro+0.091
vation+0.090
irl+0.090
Neg logit contributions
 like-0.061
 indeed-0.061
 enough-0.060
 Can-0.057
 please-0.053
Tokentr
Token ID2213
Feature activation+9.14e+00

Pos logit contributions
lights+0.053
lo+0.052
irl+0.048
aro+0.047
vation+0.046
Neg logit contributions
 like-0.060
 indeed-0.059
 enough-0.058
 please-0.055
 Can-0.053
Tokenay
Token ID323
Feature activation+4.51e+00

Pos logit contributions
lo+0.207
aro+0.197
lights+0.190
vation+0.188
irl+0.184
Neg logit contributions
 enough-0.177
 indeed-0.177
 please-0.167
 like-0.157
 She-0.151
Token,
Token ID11
Feature activation-1.78e+00

Pos logit contributions
lights+0.120
lo+0.117
aro+0.100
vation+0.099
irl+0.098
Neg logit contributions
 indeed-0.076
 enough-0.075
 like-0.074
 Can-0.068
 please-0.067
Token so
Token ID523
Feature activation+7.88e-01

Pos logit contributions
 indeed+0.018
 enough+0.018
 Can+0.016
 like+0.015
 wanna+0.014
Neg logit contributions
lights-0.058
lo-0.055
aro-0.049
vation-0.047
irl-0.047
Token Jon
Token ID5966
Feature activation+7.20e-01

Pos logit contributions
lights+0.023
lo+0.023
aro+0.020
vation+0.019
irl+0.019
Neg logit contributions
 indeed-0.011
 enough-0.011
 like-0.010
 Can-0.010
 please-0.009
Token had
Token ID550
Feature activation+5.73e-02

Pos logit contributions
lo+0.018
lights+0.017
ups+0.016
vation+0.015
aro+0.015
Neg logit contributions
 indeed-0.011
 Can-0.011
 please-0.010
 detectives-0.010
 enough-0.010
Token not
Token ID407
Feature activation+2.85e+00

Pos logit contributions
lights+0.056
lo+0.054
vation+0.046
opl+0.046
aro+0.046
Neg logit contributions
 enough-0.023
 indeed-0.022
 like-0.022
 ok-0.019
 accidentally-0.019
Token played
Token ID2826
Feature activation+2.07e+00

Pos logit contributions
lights+0.080
lo+0.079
vation+0.066
aro+0.066
irl+0.064
Neg logit contributions
 indeed-0.045
 enough-0.044
 like-0.043
 Can-0.040
 please-0.038
Token with
Token ID351
Feature activation+3.73e+00

Pos logit contributions
lights+0.056
lo+0.055
aro+0.046
vation+0.046
irl+0.045
Neg logit contributions
 indeed-0.034
 enough-0.034
 like-0.033
 Can-0.031
 please-0.030
Token the
Token ID262
Feature activation+2.55e+00

Pos logit contributions
lights+0.104
lo+0.103
vation+0.097
opl+0.090
aro+0.087
Neg logit contributions
 enough-0.059
 like-0.057
 indeed-0.056
 accidentally-0.051
 please-0.049
Token ash
Token ID12530
Feature activation+2.48e+00

Pos logit contributions
lights+0.068
lo+0.065
vation+0.056
aro+0.055
irl+0.055
Neg logit contributions
 indeed-0.044
 like-0.044
 enough-0.043
 Can-0.040
 accidentally-0.039
Tokentr
Token ID2213
Feature activation+9.09e+00

Pos logit contributions
lights+0.050
lo+0.049
irl+0.045
vation+0.043
aro+0.042
Neg logit contributions
 like-0.059
 indeed-0.058
 enough-0.057
 please-0.053
 Can-0.053
Tokenay
Token ID323
Feature activation+4.42e+00

Pos logit contributions
lo+0.209
aro+0.200
vation+0.193
lights+0.192
irl+0.188
Neg logit contributions
 indeed-0.177
 enough-0.172
 please-0.160
 like-0.157
 Can-0.149
Token.
Token ID13
Feature activation-9.78e-01

Pos logit contributions
lights+0.123
lo+0.120
vation+0.108
irl+0.106
aro+0.104
Neg logit contributions
 indeed-0.068
 enough-0.068
 like-0.067
 Can-0.060
 please-0.059
Token They
Token ID1119
Feature activation+1.48e+00

Pos logit contributions
 indeed+0.015
 enough+0.014
 like+0.014
 Can+0.013
 please+0.013
Neg logit contributions
lights-0.028
lo-0.027
aro-0.023
vation-0.023
irl-0.023
Token wished
Token ID16555
Feature activation-1.73e+00

Pos logit contributions
lights+0.049
lo+0.048
vation+0.042
aro+0.042
opl+0.041
Neg logit contributions
 indeed-0.016
 like-0.016
 enough-0.016
 accidentally-0.014
 wanted-0.013
Token they
Token ID484
Feature activation+3.55e+00

Pos logit contributions
 indeed+0.026
 enough+0.026
 like+0.026
 please+0.023
 three+0.022
Neg logit contributions
lo-0.049
lights-0.047
vation-0.046
opl-0.045
asis-0.044
Token using
Token ID1262
Feature activation+4.49e+00

Pos logit contributions
lights+0.010
lo+0.010
vation+0.008
aro+0.008
irl+0.008
Neg logit contributions
 indeed-0.005
 enough-0.005
 like-0.005
 Can-0.004
 please-0.004
Token her
Token ID607
Feature activation+2.30e+00

Pos logit contributions
lights+0.145
lo+0.142
vation+0.127
aro+0.124
asis+0.121
Neg logit contributions
 enough-0.055
 indeed-0.052
 like-0.052
 please-0.043
 Can-0.043
Token toy
Token ID13373
Feature activation+3.11e+00

Pos logit contributions
lo+0.062
lights+0.062
vation+0.056
asis+0.053
aro+0.053
Neg logit contributions
 enough-0.038
 like-0.038
 indeed-0.037
 daughter-0.036
 three-0.035
Token trains
Token ID13404
Feature activation+4.72e+00

Pos logit contributions
lights+0.081
lo+0.081
vation+0.072
irl+0.070
opl+0.068
Neg logit contributions
 enough-0.053
 like-0.052
 indeed-0.051
 Can-0.047
 please-0.046
Token to
Token ID284
Feature activation+4.95e+00

Pos logit contributions
lights+0.141
lo+0.137
aro+0.119
vation+0.118
irl+0.117
Neg logit contributions
 like-0.070
 enough-0.069
 indeed-0.067
 please-0.059
 Can-0.057
Token decor
Token ID11705
Feature activation+9.07e+00

Pos logit contributions
lights+0.143
lo+0.133
vation+0.118
aro+0.117
ites+0.113
Neg logit contributions
 like-0.092
 please-0.081
 indeed-0.078
 enough-0.075
 accidentally-0.072
Tokenate
Token ID378
Feature activation+4.42e+00

Pos logit contributions
aro+0.211
lights+0.190
lo+0.187
vation+0.177
asis+0.164
Neg logit contributions
 enough-0.179
 like-0.170
 indeed-0.161
 here-0.152
 a-0.149
Token.
Token ID13
Feature activation+2.68e-01

Pos logit contributions
lights+0.129
lo+0.127
vation+0.112
aro+0.109
opl+0.107
Neg logit contributions
 enough-0.063
 like-0.062
 indeed-0.062
 Can-0.053
 please-0.053
Token She
Token ID1375
Feature activation+1.74e+00

Pos logit contributions
lights+0.008
lo+0.008
vation+0.007
opl+0.007
aro+0.007
Neg logit contributions
 indeed-0.003
 enough-0.003
 like-0.003
 Can-0.003
 please-0.003
Token also
Token ID635
Feature activation+4.92e-01

Pos logit contributions
lights+0.046
lo+0.044
irl+0.038
aro+0.037
vation+0.037
Neg logit contributions
 indeed-0.027
 Can-0.027
 enough-0.027
 like-0.027
 wanna-0.026
Token learned
Token ID4499
Feature activation-1.77e+00

Pos logit contributions
lights+0.012
lo+0.012
irl+0.010
vation+0.010
opl+0.010
Neg logit contributions
 indeed-0.008
 enough-0.008
 Can-0.008
 like-0.007
 please-0.007
Token helped
Token ID4193
Feature activation+5.45e-01

Pos logit contributions
lights+0.060
lo+0.059
aro+0.050
vation+0.050
opl+0.049
Neg logit contributions
 indeed-0.031
 enough-0.030
 like-0.030
 Can-0.027
 accidentally-0.026
Token Lily
Token ID20037
Feature activation+3.63e-01

Pos logit contributions
lights+0.016
lo+0.015
aro+0.013
irl+0.013
ups+0.013
Neg logit contributions
 indeed-0.008
 enough-0.008
 Can-0.007
 like-0.007
 wanna-0.007
Token to
Token ID284
Feature activation+3.16e+00

Pos logit contributions
lights+0.010
lo+0.010
aro+0.008
vation+0.008
opl+0.008
Neg logit contributions
 like-0.006
 enough-0.006
 indeed-0.006
 please-0.005
 Can-0.005
Token tie
Token ID9839
Feature activation+3.73e+00

Pos logit contributions
lights+0.089
lo+0.085
aro+0.076
opl+0.075
vation+0.073
Neg logit contributions
 like-0.058
 please-0.051
 indeed-0.050
 three-0.050
 enough-0.049
Token the
Token ID262
Feature activation+1.73e+00

Pos logit contributions
lights+0.119
lo+0.118
vation+0.107
aro+0.104
asis+0.102
Neg logit contributions
 enough-0.046
 like-0.044
 indeed-0.041
 three-0.034
 Can-0.034
Token rib
Token ID12183
Feature activation+9.07e+00

Pos logit contributions
lights+0.043
lo+0.042
aro+0.034
vation+0.034
opl+0.033
Neg logit contributions
 enough-0.033
 indeed-0.032
 like-0.032
 Can-0.031
 please-0.029
Tokenbons
Token ID23461
Feature activation+4.60e+00

Pos logit contributions
lights+0.237
lo+0.204
aro+0.191
vation+0.186
kins+0.178
Neg logit contributions
 indeed-0.160
 enough-0.159
 like-0.157
 pad-0.144
 three-0.140
Token on
Token ID319
Feature activation+3.56e+00

Pos logit contributions
lights+0.149
lo+0.148
vation+0.132
aro+0.130
opl+0.129
Neg logit contributions
 enough-0.053
 like-0.052
 indeed-0.049
 please-0.040
 Can-0.038
Token the
Token ID262
Feature activation+2.88e+00

Pos logit contributions
lights+0.105
lo+0.102
vation+0.092
aro+0.089
opl+0.087
Neg logit contributions
 enough-0.051
 like-0.051
 indeed-0.050
 Can-0.044
 please-0.043
Token door
Token ID3420
Feature activation+2.46e+00

Pos logit contributions
lights+0.066
lo+0.065
aro+0.055
irl+0.053
vation+0.052
Neg logit contributions
 like-0.059
 indeed-0.058
 enough-0.057
 Can-0.055
 daughter-0.053
Token.
Token ID13
Feature activation-7.29e-01

Pos logit contributions
lo+0.080
lights+0.079
aro+0.072
vation+0.070
irl+0.070
Neg logit contributions
 like-0.029
 enough-0.029
 indeed-0.028
 please-0.023
 Can-0.022
Token suggested
Token ID5220
Feature activation-2.79e+00

Pos logit contributions
lights+0.071
lo+0.069
aro+0.060
vation+0.060
irl+0.058
Neg logit contributions
 indeed-0.024
 enough-0.024
 like-0.024
 Can-0.021
 please-0.020
Token.
Token ID13
Feature activation+2.41e-01

Pos logit contributions
 Can+0.052
 enough+0.052
 indeed+0.051
 wanna+0.049
 accidentally+0.048
Neg logit contributions
lights-0.065
lo-0.063
aro-0.050
we-0.050
vation-0.049
Token He
Token ID679
Feature activation+1.27e+00

Pos logit contributions
lights+0.007
lo+0.007
aro+0.006
vation+0.006
opl+0.006
Neg logit contributions
 indeed-0.003
 enough-0.003
 like-0.003
 Can-0.003
 please-0.002
Token tied
Token ID8165
Feature activation+3.50e+00

Pos logit contributions
lights+0.036
lo+0.035
aro+0.029
vation+0.029
irl+0.029
Neg logit contributions
 indeed-0.020
 enough-0.019
 like-0.019
 Can-0.018
 please-0.017
Token the
Token ID262
Feature activation+2.98e+00

Pos logit contributions
lights+0.101
lo+0.099
vation+0.087
aro+0.084
opl+0.083
Neg logit contributions
 enough-0.052
 indeed-0.050
 like-0.050
 Can-0.044
 please-0.043
Token rib
Token ID12183
Feature activation+9.06e+00

Pos logit contributions
lights+0.065
lo+0.064
aro+0.051
vation+0.051
irl+0.049
Neg logit contributions
 indeed-0.064
 enough-0.063
 like-0.061
 Can-0.060
 please-0.058
Tokenbons
Token ID23461
Feature activation+5.18e+00

Pos logit contributions
lights+0.214
lo+0.186
aro+0.162
vation+0.161
kins+0.150
Neg logit contributions
 indeed-0.193
 enough-0.191
 like-0.188
 pad-0.178
 three-0.168
Token onto
Token ID4291
Feature activation+3.39e+00

Pos logit contributions
lights+0.141
lo+0.138
vation+0.117
aro+0.116
irl+0.114
Neg logit contributions
 enough-0.084
 indeed-0.084
 like-0.082
 Can-0.075
 please-0.073
Token the
Token ID262
Feature activation+3.41e+00

Pos logit contributions
lights+0.099
lo+0.097
vation+0.083
aro+0.083
irl+0.081
Neg logit contributions
 indeed-0.048
 enough-0.048
 like-0.046
 Can-0.043
 please-0.041
Token stick
Token ID4859
Feature activation+3.40e+00

Pos logit contributions
lights+0.079
lo+0.077
aro+0.062
vation+0.061
irl+0.060
Neg logit contributions
 indeed-0.071
 enough-0.070
 like-0.068
 Can-0.065
 please-0.063
Token and
Token ID290
Feature activation+1.36e+00

Pos logit contributions
lights+0.119
lo+0.117
aro+0.103
vation+0.103
irl+0.101
Neg logit contributions
 enough-0.030
 indeed-0.029
 like-0.029
 Can-0.023
 please-0.022
Token.
Token ID13
Feature activation+4.93e-01

Pos logit contributions
lights+0.008
lo+0.008
aro+0.007
vation+0.006
we+0.006
Neg logit contributions
 indeed-0.006
 Can-0.006
 enough-0.006
 wanna-0.006
 please-0.005
Token Jack
Token ID3619
Feature activation+1.02e+00

Pos logit contributions
lights+0.013
lo+0.013
vation+0.011
opl+0.011
aro+0.011
Neg logit contributions
 enough-0.008
 indeed-0.008
 like-0.008
 please-0.007
 thief-0.007
Token put
Token ID1234
Feature activation+3.77e+00

Pos logit contributions
lights+0.024
lo+0.024
irl+0.020
vation+0.020
aro+0.019
Neg logit contributions
 Can-0.018
 enough-0.017
 indeed-0.017
 like-0.017
 please-0.017
Token the
Token ID262
Feature activation+2.46e+00

Pos logit contributions
lights+0.122
lo+0.120
aro+0.106
vation+0.106
opl+0.103
Neg logit contributions
 enough-0.049
 like-0.047
 indeed-0.043
 please-0.038
 three-0.037
Token ash
Token ID12530
Feature activation+3.92e+00

Pos logit contributions
lights+0.062
lo+0.060
aro+0.050
vation+0.050
irl+0.049
Neg logit contributions
 indeed-0.046
 enough-0.045
 like-0.044
 Can-0.042
 please-0.040
Tokentr
Token ID2213
Feature activation+9.05e+00

Pos logit contributions
lights+0.080
lo+0.078
irl+0.068
vation+0.067
aro+0.065
Neg logit contributions
 like-0.091
 enough-0.089
 indeed-0.089
 please-0.082
 Can-0.080
Tokenay
Token ID323
Feature activation+5.99e+00

Pos logit contributions
lo+0.202
lights+0.192
vation+0.177
irl+0.177
aro+0.176
Neg logit contributions
 indeed-0.186
 enough-0.184
 please-0.172
 like-0.171
 Can-0.161
Token on
Token ID319
Feature activation+3.48e+00

Pos logit contributions
lights+0.165
lo+0.158
vation+0.138
irl+0.137
aro+0.134
Neg logit contributions
 indeed-0.098
 enough-0.097
 like-0.095
 Can-0.084
 please-0.084
Token the
Token ID262
Feature activation+2.97e+00

Pos logit contributions
lights+0.095
lo+0.091
vation+0.075
irl+0.075
opl+0.074
Neg logit contributions
 indeed-0.054
 enough-0.052
 Can-0.051
 wanna-0.050
 like-0.049
Token kitchen
Token ID9592
Feature activation+4.48e-01

Pos logit contributions
lights+0.064
lo+0.062
aro+0.050
irl+0.049
vation+0.049
Neg logit contributions
 indeed-0.067
 enough-0.065
 like-0.065
 Can-0.062
 please-0.060
Token table
Token ID3084
Feature activation+2.38e+00

Pos logit contributions
lights+0.014
lo+0.014
aro+0.012
vation+0.012
irl+0.012
Neg logit contributions
 indeed-0.005
 enough-0.005
 like-0.005
 Can-0.005
 please-0.004
Token throw
Token ID3714
Feature activation+3.53e+00

Pos logit contributions
lights+0.060
lo+0.060
irl+0.049
vation+0.048
aro+0.048
Neg logit contributions
 indeed-0.050
 Can-0.048
 enough-0.048
 like-0.046
 Please-0.044
Token away
Token ID1497
Feature activation+1.32e+00

Pos logit contributions
lights+0.098
lo+0.095
vation+0.083
aro+0.081
opl+0.079
Neg logit contributions
 enough-0.059
 like-0.058
 indeed-0.056
 Can-0.051
 ok-0.049
Token the
Token ID262
Feature activation+3.44e+00

Pos logit contributions
lights+0.040
lo+0.039
vation+0.035
opl+0.034
irl+0.034
Neg logit contributions
 enough-0.019
 indeed-0.018
 like-0.018
 ok-0.015
 Can-0.015
Token broken
Token ID5445
Feature activation+2.06e+00

Pos logit contributions
lights+0.083
lo+0.082
aro+0.067
vation+0.066
irl+0.064
Neg logit contributions
 indeed-0.066
 enough-0.066
 like-0.064
 Can-0.062
 please-0.060
Token ash
Token ID12530
Feature activation+2.86e+00

Pos logit contributions
lights+0.044
lo+0.043
aro+0.034
opl+0.033
eds+0.032
Neg logit contributions
 indeed-0.044
 enough-0.044
 wanna-0.043
 Can-0.043
 like-0.043
Tokentr
Token ID2213
Feature activation+9.00e+00

Pos logit contributions
lights+0.056
lo+0.056
irl+0.050
vation+0.048
aro+0.047
Neg logit contributions
 like-0.068
 enough-0.068
 indeed-0.067
 please-0.060
 pad-0.060
Tokenay
Token ID323
Feature activation+4.73e+00

Pos logit contributions
lo+0.205
lights+0.190
aro+0.188
irl+0.187
vation+0.187
Neg logit contributions
 indeed-0.178
 enough-0.172
 please-0.162
 like-0.158
 She-0.150
Token and
Token ID290
Feature activation+1.48e+00

Pos logit contributions
lights+0.154
lo+0.151
vation+0.133
aro+0.132
irl+0.131
Neg logit contributions
 indeed-0.052
 enough-0.051
 like-0.049
 Can-0.042
 please-0.040
Token soon
Token ID2582
Feature activation-1.08e+00

Pos logit contributions
lights+0.043
lo+0.043
vation+0.036
aro+0.036
opl+0.035
Neg logit contributions
 indeed-0.021
 enough-0.021
 like-0.021
 Can-0.019
 please-0.018
Token forgotten
Token ID11564
Feature activation-2.29e+00

Pos logit contributions
 indeed+0.017
 enough+0.017
 like+0.016
 Can+0.016
 please+0.015
Neg logit contributions
lights-0.030
lo-0.029
aro-0.025
vation-0.024
irl-0.024
Token about
Token ID546
Feature activation+2.26e+00

Pos logit contributions
 indeed+0.030
 enough+0.028
 Can+0.028
 like+0.028
 wanna+0.026
Neg logit contributions
lights-0.069
lo-0.066
aro-0.057
vation-0.055
irl-0.054
Token the
Token ID262
Feature activation+4.21e+00

Pos logit contributions
lights+0.098
lo+0.096
vation+0.082
aro+0.081
irl+0.078
Neg logit contributions
 enough-0.058
 indeed-0.057
 like-0.056
 Can-0.052
 please-0.050
Token tree
Token ID5509
Feature activation+2.74e+00

Pos logit contributions
lights+0.099
lo+0.090
vation+0.077
aro+0.076
irl+0.073
Neg logit contributions
 enough-0.091
 indeed-0.089
 like-0.088
 Can-0.085
 please-0.082
Token and
Token ID290
Feature activation+1.11e+00

Pos logit contributions
lights+0.086
lo+0.084
vation+0.074
aro+0.073
irl+0.072
Neg logit contributions
 enough-0.034
 indeed-0.034
 like-0.033
 Can-0.029
 please-0.028
Token tie
Token ID9839
Feature activation+3.08e+00

Pos logit contributions
lights+0.030
lo+0.029
aro+0.024
vation+0.024
opl+0.023
Neg logit contributions
 indeed-0.019
 enough-0.019
 like-0.019
 Can-0.018
 please-0.017
Token the
Token ID262
Feature activation+3.10e+00

Pos logit contributions
lights+0.091
lo+0.089
vation+0.080
opl+0.076
aro+0.075
Neg logit contributions
 enough-0.045
 like-0.044
 indeed-0.043
 Can-0.038
 please-0.037
Token rib
Token ID12183
Feature activation+9.00e+00

Pos logit contributions
lights+0.079
lo+0.075
vation+0.063
aro+0.062
irl+0.060
Neg logit contributions
 enough-0.059
 indeed-0.058
 like-0.057
 Can-0.054
 please-0.052
Tokenbons
Token ID23461
Feature activation+4.03e+00

Pos logit contributions
lights+0.233
lo+0.194
vation+0.185
aro+0.183
kins+0.166
Neg logit contributions
 indeed-0.170
 enough-0.170
 like-0.166
 pad-0.153
 three-0.144
Token."
Token ID526
Feature activation+6.60e-01

Pos logit contributions
lights+0.117
lo+0.116
vation+0.099
aro+0.098
opl+0.098
Neg logit contributions
 enough-0.058
 indeed-0.058
 like-0.058
 Can-0.050
 please-0.050
Token\n
Token ID198
Feature activation+4.66e+00

Pos logit contributions
lights+0.020
lo+0.019
vation+0.018
opl+0.017
eds+0.017
Neg logit contributions
 like-0.009
 enough-0.009
 indeed-0.009
 Can-0.008
 please-0.008
Token\n
Token ID198
Feature activation+4.16e+00

Pos logit contributions
lights+0.162
lo+0.150
vation+0.144
opl+0.138
aro+0.136
Neg logit contributions
 like-0.044
 please-0.041
 She-0.040
 indeed-0.039
 Can-0.037
TokenThey
Token ID2990
Feature activation+1.64e+00

Pos logit contributions
lights+0.109
lo+0.104
aro+0.088
vation+0.087
opl+0.086
Neg logit contributions
 indeed-0.076
 enough-0.075
 like-0.074
 Can-0.069
 please-0.068
Token can
Token ID460
Feature activation+8.76e-01

Pos logit contributions
lo+0.045
lights+0.042
irl+0.038
eds+0.036
ites+0.036
Neg logit contributions
 indeed-0.029
 Can-0.028
 enough-0.028
 Heidi-0.026
 Robert-0.026
Token use
Token ID779
Feature activation+2.38e+00

Pos logit contributions
lights+0.022
lo+0.022
aro+0.018
vation+0.018
opl+0.017
Neg logit contributions
 indeed-0.016
 like-0.016
 enough-0.016
 please-0.015
 Can-0.015
Token this
Token ID428
Feature activation+2.28e+00

Pos logit contributions
lights+0.074
lo+0.073
vation+0.065
aro+0.062
opl+0.062
Neg logit contributions
 enough-0.033
 indeed-0.030
 like-0.030
 Can-0.026
 please-0.026
Token broken
Token ID5445
Feature activation+2.13e+00

Pos logit contributions
lights+0.060
lo+0.059
vation+0.050
aro+0.050
irl+0.049
Neg logit contributions
 enough-0.040
 indeed-0.040
 like-0.039
 Can-0.036
 please-0.035
Token ash
Token ID12530
Feature activation+2.42e+00

Pos logit contributions
lights+0.048
lo+0.047
aro+0.037
opl+0.036
vation+0.036
Neg logit contributions
 indeed-0.044
 enough-0.043
 like-0.042
 Can-0.041
 wanna-0.041
Tokentr
Token ID2213
Feature activation+8.99e+00

Pos logit contributions
lights+0.045
lo+0.044
irl+0.036
vation+0.036
aro+0.035
Neg logit contributions
 enough-0.061
 like-0.060
 indeed-0.060
 please-0.056
 Can-0.056
Tokenay
Token ID323
Feature activation+3.85e+00

Pos logit contributions
lo+0.200
lights+0.188
aro+0.182
vation+0.178
irl+0.173
Neg logit contributions
 indeed-0.185
 enough-0.184
 please-0.174
 like-0.170
 Can-0.160
Token as
Token ID355
Feature activation+1.32e+00

Pos logit contributions
lights+0.111
lo+0.109
aro+0.093
vation+0.092
irl+0.090
Neg logit contributions
 indeed-0.056
 enough-0.055
 like-0.053
 Can-0.051
 please-0.048
Token a
Token ID257
Feature activation-1.03e+00

Pos logit contributions
lights+0.042
lo+0.041
aro+0.036
irl+0.035
vation+0.035
Neg logit contributions
 indeed-0.014
 enough-0.014
 like-0.013
 Can-0.013
 wanna-0.012
Token puzzle
Token ID15027
Feature activation+1.85e+00

Pos logit contributions
 indeed+0.019
 enough+0.019
 like+0.018
 Can+0.017
 wanna+0.017
Neg logit contributions
lights-0.025
lo-0.024
aro-0.020
vation-0.020
irl-0.019
Token!"
Token ID2474
Feature activation+1.68e+00

Pos logit contributions
lights+0.051
lo+0.051
aro+0.043
vation+0.043
irl+0.042
Neg logit contributions
 indeed-0.029
 enough-0.029
 like-0.028
 Can-0.026
 please-0.025
Tokenite
Token ID578
Feature activation+2.55e+00

Pos logit contributions
lights+0.106
lo+0.103
aro+0.084
vation+0.081
irl+0.080
Neg logit contributions
 indeed-0.122
 enough-0.120
 like-0.117
 pad-0.117
 accidentally-0.112
Token.
Token ID13
Feature activation-8.38e-01

Pos logit contributions
lights+0.076
lo+0.076
vation+0.067
aro+0.066
irl+0.065
Neg logit contributions
 like-0.038
 enough-0.035
 indeed-0.034
 please-0.029
 accidentally-0.028
Token They
Token ID1119
Feature activation+1.82e+00

Pos logit contributions
 indeed+0.012
 enough+0.011
 like+0.011
 Can+0.011
 please+0.010
Neg logit contributions
lights-0.025
lo-0.025
vation-0.021
aro-0.021
irl-0.020
Token tied
Token ID8165
Feature activation+3.48e+00

Pos logit contributions
lights+0.055
lo+0.055
vation+0.047
aro+0.047
opl+0.046
Neg logit contributions
 like-0.027
 enough-0.025
 indeed-0.024
 wanted-0.022
 accidentally-0.022
Token the
Token ID262
Feature activation+2.69e+00

Pos logit contributions
lights+0.100
lo+0.099
vation+0.090
opl+0.086
aro+0.086
Neg logit contributions
 enough-0.053
 like-0.050
 indeed-0.050
 three-0.043
 Can-0.042
Token rib
Token ID12183
Feature activation+8.97e+00

Pos logit contributions
lights+0.065
lo+0.064
aro+0.052
vation+0.052
irl+0.050
Neg logit contributions
 indeed-0.052
 enough-0.052
 like-0.051
 Can-0.048
 please-0.047
Tokenbons
Token ID23461
Feature activation+4.38e+00

Pos logit contributions
lights+0.202
lo+0.184
aro+0.165
vation+0.155
irl+0.145
Neg logit contributions
 indeed-0.195
 like-0.193
 enough-0.189
 pad-0.179
 three-0.168
Token and
Token ID290
Feature activation+2.00e+00

Pos logit contributions
lights+0.139
lo+0.137
vation+0.120
aro+0.119
opl+0.117
Neg logit contributions
 enough-0.053
 indeed-0.051
 like-0.051
 Can-0.042
 please-0.042
Token the
Token ID262
Feature activation+1.79e+00

Pos logit contributions
lo+0.060
lights+0.057
opl+0.052
vation+0.051
asis+0.050
Neg logit contributions
 enough-0.032
 like-0.032
 indeed-0.028
 three-0.027
 wanted-0.027
Token feathers
Token ID27400
Feature activation+3.98e+00

Pos logit contributions
lights+0.042
lo+0.041
aro+0.033
vation+0.033
irl+0.032
Neg logit contributions
 indeed-0.036
 enough-0.035
 like-0.035
 Can-0.033
 please-0.032
Token to
Token ID284
Feature activation+3.06e+00

Pos logit contributions
lights+0.125
lo+0.125
vation+0.111
aro+0.109
opl+0.108
Neg logit contributions
 like-0.050
 enough-0.049
 indeed-0.046
 please-0.038
 Can-0.035
Token should
Token ID815
Feature activation-1.23e-01

Pos logit contributions
lights+0.082
lo+0.081
aro+0.069
vation+0.069
irl+0.067
Neg logit contributions
 indeed-0.037
 enough-0.036
 like-0.034
 Can-0.033
 please-0.031
Token never
Token ID1239
Feature activation+4.00e+00

Pos logit contributions
 indeed+0.002
 enough+0.002
 like+0.002
 Can+0.002
 please+0.002
Neg logit contributions
lights-0.003
lo-0.003
aro-0.003
vation-0.003
irl-0.003
Token touch
Token ID3638
Feature activation+4.33e+00

Pos logit contributions
lights+0.115
lo+0.111
aro+0.093
vation+0.093
opl+0.089
Neg logit contributions
 like-0.065
 indeed-0.062
 enough-0.062
 please-0.058
 Can-0.057
Token an
Token ID281
Feature activation+2.14e+00

Pos logit contributions
lights+0.127
lo+0.125
vation+0.110
aro+0.106
irl+0.105
Neg logit contributions
 enough-0.062
 indeed-0.061
 like-0.061
 Can-0.055
 please-0.052
Token ash
Token ID12530
Feature activation+1.93e+00

Pos logit contributions
lights+0.046
lo+0.046
opl+0.037
aro+0.036
eds+0.034
Neg logit contributions
 indeed-0.044
 enough-0.044
 Can-0.043
 like-0.043
 Please-0.042
Tokentr
Token ID2213
Feature activation+8.96e+00

Pos logit contributions
lights+0.035
lo+0.034
aro+0.027
vation+0.026
irl+0.026
Neg logit contributions
 indeed-0.050
 enough-0.049
 like-0.049
 Can-0.047
 please-0.046
Tokenay
Token ID323
Feature activation+3.31e+00

Pos logit contributions
lo+0.199
lights+0.188
aro+0.182
vation+0.173
irl+0.167
Neg logit contributions
 enough-0.189
 indeed-0.187
 please-0.178
 like-0.176
 Can-0.166
Token."
Token ID526
Feature activation+4.00e-02

Pos logit contributions
lights+0.088
lo+0.086
aro+0.072
vation+0.072
opl+0.069
Neg logit contributions
 indeed-0.056
 enough-0.055
 like-0.053
 Can-0.051
 wanna-0.049
Token\n
Token ID198
Feature activation+4.28e+00

Pos logit contributions
lights+0.001
lo+0.001
aro+0.001
ites+0.001
irl+0.001
Neg logit contributions
 indeed-0.001
 enough-0.001
 like-0.001
 ok-0.001
 wanna-0.001
Token\n
Token ID198
Feature activation+4.09e+00

Pos logit contributions
lights+0.141
lo+0.133
vation+0.126
aro+0.118
irl+0.114
Neg logit contributions
 like-0.046
 indeed-0.045
 She-0.042
 enough-0.042
 Can-0.041
TokenS
Token ID50
Feature activation+3.43e+00

Pos logit contributions
lights+0.113
lo+0.108
vation+0.093
aro+0.091
irl+0.089
Neg logit contributions
 indeed-0.068
 enough-0.066
 like-0.065
 Can-0.062
 please-0.060
Token will
Token ID481
Feature activation-5.88e-01

Pos logit contributions
lo+0.046
lights+0.045
aro+0.039
irl+0.038
vation+0.038
Neg logit contributions
 enough-0.025
 indeed-0.024
 Can-0.022
 gonna-0.021
 Robert-0.021
Token not
Token ID407
Feature activation+1.83e+00

Pos logit contributions
 indeed+0.010
 like+0.010
 enough+0.010
 Can+0.009
 please+0.009
Neg logit contributions
lights-0.016
lo-0.016
aro-0.013
vation-0.013
opl-0.012
Token touch
Token ID3638
Feature activation+3.28e+00

Pos logit contributions
lights+0.046
lo+0.045
aro+0.037
vation+0.037
irl+0.036
Neg logit contributions
 indeed-0.033
 enough-0.033
 like-0.031
 Can-0.030
 please-0.029
Token the
Token ID262
Feature activation+2.79e+00

Pos logit contributions
lights+0.098
lo+0.096
vation+0.086
aro+0.081
irl+0.081
Neg logit contributions
 enough-0.048
 like-0.046
 indeed-0.045
 Can-0.042
 please-0.039
Token ash
Token ID12530
Feature activation+2.52e+00

Pos logit contributions
lights+0.073
lo+0.071
vation+0.060
aro+0.059
irl+0.059
Neg logit contributions
 indeed-0.050
 like-0.049
 enough-0.049
 Can-0.046
 accidentally-0.043
Tokentr
Token ID2213
Feature activation+8.94e+00

Pos logit contributions
lights+0.049
lo+0.049
irl+0.041
aro+0.040
vation+0.040
Neg logit contributions
 enough-0.061
 like-0.061
 indeed-0.061
 please-0.057
 Can-0.056
Tokenay
Token ID323
Feature activation+4.47e+00

Pos logit contributions
lo+0.205
aro+0.194
lights+0.192
vation+0.183
irl+0.180
Neg logit contributions
 indeed-0.176
 enough-0.174
 please-0.164
 like-0.156
 Can-0.153
Token again
Token ID757
Feature activation+2.87e+00

Pos logit contributions
lights+0.128
lo+0.124
vation+0.108
aro+0.108
irl+0.106
Neg logit contributions
 enough-0.068
 indeed-0.067
 like-0.065
 Can-0.061
 please-0.060
Token,"
Token ID553
Feature activation+6.26e-01

Pos logit contributions
lights+0.078
lo+0.076
aro+0.064
vation+0.064
irl+0.062
Neg logit contributions
 indeed-0.047
 enough-0.046
 like-0.045
 Can-0.043
 please-0.041
Token Tim
Token ID5045
Feature activation+1.17e+00

Pos logit contributions
lights+0.020
lo+0.020
aro+0.017
vation+0.016
irl+0.016
Neg logit contributions
 indeed-0.008
 enough-0.007
 like-0.007
 Can-0.007
 wanna-0.006
Token says
Token ID1139
Feature activation-2.82e+00

Pos logit contributions
lights+0.023
lo+0.023
ups+0.019
opl+0.018
ites+0.017
Neg logit contributions
 indeed-0.027
 Can-0.025
 accidentally-0.024
 wanna-0.024
 gonna-0.024
Token tried
Token ID3088
Feature activation-1.34e+00

Pos logit contributions
lights+0.014
lo+0.014
aro+0.012
vation+0.011
irl+0.011
Neg logit contributions
 indeed-0.008
 enough-0.008
 like-0.008
 Can-0.008
 wanna-0.007
Token to
Token ID284
Feature activation+2.22e+00

Pos logit contributions
 indeed+0.014
 Can+0.013
 enough+0.013
 like+0.012
 Please+0.012
Neg logit contributions
lights-0.043
lo-0.042
aro-0.036
irl-0.036
eds-0.036
Token fix
Token ID4259
Feature activation+3.48e+00

Pos logit contributions
lights+0.054
lo+0.053
aro+0.043
vation+0.043
irl+0.042
Neg logit contributions
 indeed-0.043
 enough-0.042
 like-0.041
 Can-0.039
 please-0.038
Token the
Token ID262
Feature activation+3.16e+00

Pos logit contributions
lights+0.114
lo+0.109
vation+0.101
aro+0.096
irl+0.095
Neg logit contributions
 enough-0.042
 like-0.039
 indeed-0.039
 Can-0.032
 ok-0.031
Token ash
Token ID12530
Feature activation+2.99e+00

Pos logit contributions
lights+0.076
lo+0.075
aro+0.061
vation+0.061
irl+0.059
Neg logit contributions
 indeed-0.061
 enough-0.060
 like-0.059
 Can-0.056
 please-0.055
Tokentr
Token ID2213
Feature activation+8.91e+00

Pos logit contributions
lights+0.059
lo+0.058
irl+0.052
vation+0.050
aro+0.049
Neg logit contributions
 like-0.071
 enough-0.071
 indeed-0.071
 please-0.064
 Can-0.062
Tokenay
Token ID323
Feature activation+4.82e+00

Pos logit contributions
lo+0.199
lights+0.187
aro+0.183
vation+0.180
irl+0.179
Neg logit contributions
 indeed-0.182
 enough-0.176
 please-0.166
 like-0.162
 She-0.153
Token,
Token ID11
Feature activation-1.91e+00

Pos logit contributions
lights+0.148
lo+0.142
vation+0.126
irl+0.125
aro+0.124
Neg logit contributions
 indeed-0.064
 enough-0.063
 like-0.062
 please-0.053
 Can-0.052
Token but
Token ID475
Feature activation+5.49e-01

Pos logit contributions
 indeed+0.016
 enough+0.016
 Can+0.014
 like+0.014
 wanna+0.013
Neg logit contributions
lights-0.065
lo-0.062
aro-0.055
vation-0.054
irl-0.053
Token it
Token ID340
Feature activation+3.41e+00

Pos logit contributions
lights+0.017
lo+0.016
aro+0.014
vation+0.014
irl+0.014
Neg logit contributions
 indeed-0.007
 enough-0.007
 like-0.007
 Can-0.006
 please-0.006
Token was
Token ID373
Feature activation+4.23e-02

Pos logit contributions
lights+0.100
lo+0.098
vation+0.085
aro+0.084
opl+0.083
Neg logit contributions
 indeed-0.050
 enough-0.049
 like-0.047
 accidentally-0.043
 Can-0.042
Token children
Token ID1751
Feature activation-6.42e-01

Pos logit contributions
lights+0.027
lo+0.026
aro+0.022
vation+0.022
irl+0.022
Neg logit contributions
 indeed-0.013
 enough-0.013
 like-0.012
 Can-0.011
 please-0.011
Token helped
Token ID4193
Feature activation+4.95e-01

Pos logit contributions
 indeed+0.011
 Can+0.010
 dear+0.010
 darling+0.009
 enough+0.009
Neg logit contributions
lo-0.016
lights-0.015
ites-0.013
aro-0.013
ups-0.013
Token the
Token ID262
Feature activation+3.91e-01

Pos logit contributions
lights+0.015
lo+0.015
vation+0.013
aro+0.013
opl+0.013
Neg logit contributions
 enough-0.006
 like-0.006
 indeed-0.006
 Can-0.005
 please-0.005
Token girl
Token ID2576
Feature activation-1.50e+00

Pos logit contributions
lights+0.011
lo+0.011
vation+0.009
aro+0.009
irl+0.009
Neg logit contributions
 indeed-0.006
 enough-0.006
 like-0.006
 Can-0.005
 please-0.005
Token put
Token ID1234
Feature activation+4.07e+00

Pos logit contributions
 indeed+0.027
 enough+0.027
 like+0.026
 Can+0.025
 please+0.024
Neg logit contributions
lights-0.037
lo-0.037
aro-0.030
vation-0.030
irl-0.030
Token rib
Token ID12183
Feature activation+8.89e+00

Pos logit contributions
lights+0.138
lo+0.134
vation+0.123
aro+0.117
eds+0.115
Neg logit contributions
 enough-0.047
 like-0.045
 indeed-0.040
 please-0.033
 three-0.031
Tokenbons
Token ID23461
Feature activation+4.03e+00

Pos logit contributions
lights+0.215
lo+0.192
aro+0.172
vation+0.165
kins+0.154
Neg logit contributions
 enough-0.188
 indeed-0.177
 like-0.171
 pad-0.154
 wanna-0.151
Token in
Token ID287
Feature activation+3.58e+00

Pos logit contributions
lights+0.109
lo+0.109
vation+0.092
aro+0.091
opl+0.090
Neg logit contributions
 enough-0.066
 indeed-0.065
 like-0.065
 Can-0.057
 please-0.056
Token her
Token ID607
Feature activation+1.90e+00

Pos logit contributions
lights+0.113
lo+0.111
vation+0.096
aro+0.096
irl+0.093
Neg logit contributions
 indeed-0.044
 enough-0.043
 like-0.042
 Can-0.038
 please-0.036
Token hair
Token ID4190
Feature activation+2.72e+00

Pos logit contributions
lo+0.053
lights+0.051
aro+0.046
vation+0.045
we+0.043
Neg logit contributions
 enough-0.031
 like-0.030
 indeed-0.030
 daughter-0.029
 Can-0.028
Token.
Token ID13
Feature activation-6.96e-01

Pos logit contributions
lo+0.085
lights+0.083
aro+0.076
vation+0.075
irl+0.074
Neg logit contributions
 like-0.037
 enough-0.036
 indeed-0.033
 please-0.028
 pad-0.027
Token't
Token ID470
Feature activation+4.44e-01

Pos logit contributions
lights+0.019
lo+0.018
aro+0.016
vation+0.015
opl+0.015
Neg logit contributions
 indeed-0.011
 like-0.011
 enough-0.011
 please-0.010
 Can-0.010
Token have
Token ID423
Feature activation+1.39e+00

Pos logit contributions
lights+0.011
lo+0.011
vation+0.009
aro+0.009
irl+0.009
Neg logit contributions
 indeed-0.008
 enough-0.008
 like-0.007
 Can-0.007
 accidentally-0.007
Token all
Token ID477
Feature activation+3.60e+00

Pos logit contributions
lights+0.040
lo+0.040
aro+0.034
vation+0.033
irl+0.033
Neg logit contributions
 indeed-0.019
 like-0.017
 Can-0.017
 enough-0.017
 please-0.016
Token these
Token ID777
Feature activation+2.93e+00

Pos logit contributions
lights+0.096
lo+0.095
aro+0.080
vation+0.079
opl+0.077
Neg logit contributions
 indeed-0.059
 enough-0.057
 like-0.055
 Can-0.054
 please-0.052
Token ch
Token ID442
Feature activation+4.62e+00

Pos logit contributions
lights+0.075
lo+0.074
aro+0.061
vation+0.060
irl+0.059
Neg logit contributions
 indeed-0.052
 enough-0.051
 like-0.050
 Can-0.048
 please-0.046
Tokenocol
Token ID4668
Feature activation+8.89e+00

Pos logit contributions
lights+0.127
lo+0.121
vation+0.109
aro+0.102
irl+0.100
Neg logit contributions
 enough-0.073
 indeed-0.073
 like-0.071
 wanna-0.067
 please-0.063
Tokenates
Token ID689
Feature activation+3.80e+00

Pos logit contributions
lights+0.182
lo+0.155
vation+0.137
irl+0.129
aro+0.123
Neg logit contributions
 indeed-0.211
 enough-0.206
 like-0.196
 pad-0.183
 ok-0.181
Token!
Token ID0
Feature activation-3.74e-01

Pos logit contributions
lights+0.106
lo+0.105
aro+0.089
vation+0.087
irl+0.085
Neg logit contributions
 indeed-0.055
 enough-0.054
 Can-0.052
 like-0.051
 accidentally-0.048
Token Now
Token ID2735
Feature activation-1.67e+00

Pos logit contributions
 enough+0.005
 indeed+0.005
 like+0.005
 accidentally+0.004
 thief+0.004
Neg logit contributions
lo-0.011
lights-0.010
aro-0.009
opl-0.009
vation-0.009
Token,
Token ID11
Feature activation-1.54e+00

Pos logit contributions
 indeed+0.027
 Can+0.025
 enough+0.025
 like+0.025
 wanna+0.023
Neg logit contributions
lights-0.043
lo-0.042
vation-0.036
aro-0.036
irl-0.035
Token go
Token ID467
Feature activation+7.12e-01

Pos logit contributions
 indeed+0.025
 enough+0.025
 like+0.023
 Can+0.023
 wanna+0.022
Neg logit contributions
lights-0.040
lo-0.039
aro-0.033
vation-0.033
irl-0.032
Token time
Token ID640
Feature activation-5.14e+00

Pos logit contributions
 indeed+0.084
 enough+0.083
 like+0.079
 Can+0.078
 please+0.074
Neg logit contributions
lights-0.114
lo-0.111
vation-0.091
irl-0.091
aro-0.090
Token,
Token ID11
Feature activation-5.03e+00

Pos logit contributions
 indeed+0.064
 enough+0.062
 Can+0.057
 wanna+0.057
 Please+0.055
Neg logit contributions
lo-0.157
lights-0.152
aro-0.136
irl-0.128
opl-0.121
Token there
Token ID612
Feature activation-1.60e-01

Pos logit contributions
 indeed+0.091
 wanna+0.091
 enough+0.087
 declined+0.084
 robber+0.082
Neg logit contributions
lights-0.116
lo-0.116
irl-0.098
aro-0.095
cks-0.091
Token was
Token ID373
Feature activation-6.31e+00

Pos logit contributions
 indeed+0.003
 enough+0.003
 like+0.003
 Can+0.003
 wanna+0.003
Neg logit contributions
lo-0.004
lights-0.004
aro-0.003
irl-0.003
vation-0.003
Token a
Token ID257
Feature activation-5.86e+00

Pos logit contributions
 wanna+0.111
 Please+0.109
 Can+0.108
 declined+0.106
 please+0.104
Neg logit contributions
lights-0.134
lo-0.128
irl-0.127
hes-0.114
 tracks-0.110
Token curious
Token ID11040
Feature activation-9.01e+00

Pos logit contributions
 enough+0.085
 indeed+0.080
 like+0.070
 wanna+0.069
 ok+0.068
Neg logit contributions
lights-0.165
lo-0.163
irl-0.143
aro-0.138
vation-0.135
Token girl
Token ID2576
Feature activation-6.26e+00

Pos logit contributions
 enough+0.147
 indeed+0.145
 ok+0.142
 gonna+0.138
 wanna+0.133
Neg logit contributions
lo-0.215
lights-0.210
 hill-0.176
asis-0.173
 tracks-0.167
Token named
Token ID3706
Feature activation-5.54e+00

Pos logit contributions
 indeed+0.070
 like+0.068
 enough+0.068
 Can+0.061
 please+0.057
Neg logit contributions
lights-0.205
lo-0.201
vation-0.176
aro-0.173
irl-0.170
Token Jill
Token ID22772
Feature activation-5.87e+00

Pos logit contributions
 indeed+0.089
 enough+0.083
 wanna+0.076
 like+0.076
 Can+0.074
Neg logit contributions
lo-0.151
lights-0.147
aro-0.123
vation-0.117
irl-0.115
Token.
Token ID13
Feature activation-1.89e+00

Pos logit contributions
 like+0.085
 indeed+0.075
 enough+0.074
 wanted+0.072
 Can+0.063
Neg logit contributions
lights-0.183
lo-0.175
vation-0.165
aro-0.155
irl-0.151
Token She
Token ID1375
Feature activation-1.45e+00

Pos logit contributions
 indeed+0.014
 enough+0.013
 like+0.011
 Can+0.010
 ok+0.010
Neg logit contributions
lights-0.069
lo-0.067
irl-0.058
vation-0.058
aro-0.058
Token time
Token ID640
Feature activation-5.20e+00

Pos logit contributions
 indeed+0.079
 enough+0.078
 like+0.075
 Can+0.074
 please+0.070
Neg logit contributions
lights-0.111
lo-0.108
vation-0.089
irl-0.088
aro-0.088
Token,
Token ID11
Feature activation-4.80e+00

Pos logit contributions
 indeed+0.068
 enough+0.065
 wanna+0.062
 Can+0.061
 Please+0.061
Neg logit contributions
lo-0.153
lights-0.147
aro-0.132
irl-0.124
ves-0.117
Token there
Token ID612
Feature activation-1.82e-01

Pos logit contributions
 wanna+0.087
 indeed+0.087
 enough+0.083
 declined+0.081
 dear+0.079
Neg logit contributions
lights-0.111
lo-0.109
irl-0.092
aro-0.090
cks-0.085
Token was
Token ID373
Feature activation-6.26e+00

Pos logit contributions
 indeed+0.003
 enough+0.003
 like+0.003
 Can+0.003
 wanna+0.003
Neg logit contributions
lo-0.004
lights-0.004
aro-0.004
irl-0.004
vation-0.003
Token a
Token ID257
Feature activation-5.81e+00

Pos logit contributions
 wanna+0.107
 Please+0.106
 declined+0.105
 Can+0.105
 please+0.101
Neg logit contributions
lights-0.133
lo-0.128
irl-0.128
hes-0.114
 weekends-0.113
Token curious
Token ID11040
Feature activation-8.88e+00

Pos logit contributions
 enough+0.084
 indeed+0.079
 like+0.069
 wanna+0.068
 gonna+0.067
Neg logit contributions
lights-0.163
lo-0.161
irl-0.141
aro-0.136
vation-0.134
Token little
Token ID1310
Feature activation-5.07e+00

Pos logit contributions
 enough+0.141
 indeed+0.140
 ok+0.134
 gonna+0.131
 wanna+0.126
Neg logit contributions
lo-0.215
lights-0.213
 hill-0.175
asis-0.173
 tracks-0.168
Token jelly
Token ID27644
Feature activation-2.91e+00

Pos logit contributions
 indeed+0.079
 enough+0.076
 ok+0.072
 gonna+0.070
 like+0.069
Neg logit contributions
lights-0.137
lo-0.135
aro-0.115
ites-0.112
asis-0.106
Tokenfish
Token ID11084
Feature activation-3.52e+00

Pos logit contributions
 like+0.053
 indeed+0.053
 enough+0.048
 wanted+0.047
 Can+0.046
Neg logit contributions
lights-0.076
lo-0.072
vation-0.063
opl-0.061
cks-0.059
Token.
Token ID13
Feature activation-2.27e+00

Pos logit contributions
 indeed+0.050
 like+0.048
 enough+0.048
 Can+0.044
 please+0.042
Neg logit contributions
lights-0.105
lo-0.102
vation-0.088
aro-0.086
irl-0.084
Token Everything
Token ID11391
Feature activation+7.27e-01

Pos logit contributions
 enough+0.039
 indeed+0.038
 ok+0.036
 wanna+0.035
 daughter+0.034
Neg logit contributions
lights-0.058
lo-0.058
opl-0.045
irl-0.045
ites-0.045
Token time
Token ID640
Feature activation-5.07e+00

Pos logit contributions
 indeed+0.083
 enough+0.082
 like+0.078
 Can+0.077
 please+0.073
Neg logit contributions
lights-0.114
lo-0.111
vation-0.091
irl-0.091
aro-0.090
Token,
Token ID11
Feature activation-4.91e+00

Pos logit contributions
 indeed+0.060
 enough+0.059
 wanna+0.054
 Can+0.053
 daughter+0.052
Neg logit contributions
lo-0.157
lights-0.153
aro-0.136
irl-0.129
opl-0.123
Token there
Token ID612
Feature activation-1.41e-01

Pos logit contributions
 indeed+0.086
 wanna+0.084
 enough+0.084
 dear+0.078
 declined+0.078
Neg logit contributions
lights-0.118
lo-0.116
irl-0.101
aro-0.095
cks-0.092
Token was
Token ID373
Feature activation-6.24e+00

Pos logit contributions
 indeed+0.003
 enough+0.002
 like+0.002
 Can+0.002
 please+0.002
Neg logit contributions
lights-0.004
lo-0.004
aro-0.003
irl-0.003
vation-0.003
Token a
Token ID257
Feature activation-5.77e+00

Pos logit contributions
 Please+0.105
 wanna+0.105
 declined+0.105
 Can+0.104
 please+0.101
Neg logit contributions
lights-0.133
irl-0.129
lo-0.128
hes-0.117
 weekends-0.112
Token curious
Token ID11040
Feature activation-8.83e+00

Pos logit contributions
 enough+0.082
 indeed+0.079
 like+0.070
 wanna+0.067
 ok+0.066
Neg logit contributions
lights-0.164
lo-0.162
irl-0.141
aro-0.136
vation-0.135
Token little
Token ID1310
Feature activation-4.89e+00

Pos logit contributions
 enough+0.140
 indeed+0.139
 ok+0.134
 gonna+0.130
 wanna+0.126
Neg logit contributions
lo-0.216
lights-0.212
asis-0.174
 hill-0.173
 tracks-0.166
Token girl
Token ID2576
Feature activation-6.15e+00

Pos logit contributions
 indeed+0.075
 enough+0.072
 ok+0.068
 like+0.067
 gonna+0.066
Neg logit contributions
lights-0.133
lo-0.131
aro-0.112
ites-0.109
asis-0.104
Token called
Token ID1444
Feature activation-6.53e+00

Pos logit contributions
 indeed+0.072
 enough+0.070
 like+0.069
 Can+0.063
 please+0.059
Neg logit contributions
lights-0.197
lo-0.194
vation-0.168
aro-0.167
irl-0.163
Token Maria
Token ID14200
Feature activation-5.83e+00

Pos logit contributions
 indeed+0.107
 enough+0.095
 wanna+0.092
 like+0.091
 gonna+0.089
Neg logit contributions
lo-0.180
lights-0.164
aro-0.149
vation-0.133
ves-0.130
Token.
Token ID13
Feature activation-1.54e+00

Pos logit contributions
 indeed+0.118
 enough+0.115
 Can+0.111
 like+0.110
 please+0.108
Neg logit contributions
lights-0.133
lo-0.131
aro-0.104
vation-0.100
irl-0.100
Token time
Token ID640
Feature activation-5.20e+00

Pos logit contributions
 indeed+0.069
 enough+0.068
 like+0.067
 Can+0.063
 please+0.061
Neg logit contributions
lights-0.106
lo-0.104
aro-0.087
vation-0.087
irl-0.084
Token,
Token ID11
Feature activation-4.88e+00

Pos logit contributions
 indeed+0.051
 enough+0.049
 Can+0.044
 like+0.044
 wanna+0.042
Neg logit contributions
lo-0.171
lights-0.170
aro-0.149
irl-0.146
opl-0.141
Token there
Token ID612
Feature activation-1.53e-01

Pos logit contributions
 indeed+0.081
 enough+0.080
 wanna+0.078
 hesitate+0.071
 pad+0.071
Neg logit contributions
lights-0.123
lo-0.119
irl-0.105
cks-0.101
aro-0.098
Token was
Token ID373
Feature activation-6.19e+00

Pos logit contributions
 indeed+0.003
 enough+0.003
 like+0.003
 Can+0.003
 please+0.002
Neg logit contributions
lights-0.004
lo-0.004
aro-0.003
irl-0.003
vation-0.003
Token a
Token ID257
Feature activation-5.70e+00

Pos logit contributions
 wanna+0.109
 declined+0.102
 hesitate+0.101
 Please+0.101
 Can+0.097
Neg logit contributions
lights-0.135
lo-0.129
irl-0.127
hes-0.118
 weekends-0.111
Token curious
Token ID11040
Feature activation-8.83e+00

Pos logit contributions
 enough+0.091
 indeed+0.085
 wanna+0.076
 ok+0.074
 gonna+0.074
Neg logit contributions
lights-0.150
lo-0.150
irl-0.132
aro-0.124
asis-0.123
Token little
Token ID1310
Feature activation-4.98e+00

Pos logit contributions
 ok+0.148
 enough+0.148
 indeed+0.145
 gonna+0.140
 wanna+0.138
Neg logit contributions
lo-0.208
lights-0.203
asis-0.168
 hill-0.168
loo-0.163
Token cat
Token ID3797
Feature activation-5.24e+00

Pos logit contributions
 indeed+0.080
 enough+0.078
 ok+0.076
 gonna+0.072
 wanna+0.072
Neg logit contributions
lights-0.132
lo-0.131
aro-0.110
ites-0.108
asis-0.104
Token named
Token ID3706
Feature activation-5.49e+00

Pos logit contributions
 indeed+0.065
 enough+0.064
 like+0.061
 Can+0.057
 please+0.054
Neg logit contributions
lights-0.163
lo-0.160
aro-0.138
vation-0.137
irl-0.134
Token Kitty
Token ID21168
Feature activation-4.26e+00

Pos logit contributions
 indeed+0.088
 enough+0.085
 like+0.075
 wanna+0.072
 Can+0.072
Neg logit contributions
lo-0.149
lights-0.141
aro-0.123
irl-0.117
vation-0.114
Token.
Token ID13
Feature activation-1.93e+00

Pos logit contributions
 enough+0.091
 indeed+0.090
 Can+0.086
 wanna+0.085
 ok+0.084
Neg logit contributions
lo-0.091
lights-0.089
aro-0.073
eds-0.070
cks-0.068
Token a
Token ID257
Feature activation-4.32e+00

Pos logit contributions
 enough+0.011
 Can+0.011
 indeed+0.011
 Please+0.010
 gonna+0.010
Neg logit contributions
lights-0.021
lo-0.020
irl-0.019
we-0.018
ves-0.017
Token time
Token ID640
Feature activation-5.14e+00

Pos logit contributions
 indeed+0.082
 enough+0.081
 Can+0.077
 like+0.076
 please+0.072
Neg logit contributions
lights-0.102
lo-0.100
irl-0.083
vation-0.081
opl-0.080
Token there
Token ID612
Feature activation+1.96e-01

Pos logit contributions
 indeed+0.062
 enough+0.060
 Can+0.055
 wanna+0.054
 Please+0.052
Neg logit contributions
lo-0.159
lights-0.154
aro-0.138
irl-0.131
opl-0.125
Token was
Token ID373
Feature activation-6.29e+00

Pos logit contributions
lights+0.005
lo+0.005
aro+0.004
vation+0.004
irl+0.004
Neg logit contributions
 indeed-0.003
 enough-0.003
 like-0.003
 Can-0.003
 please-0.003
Token a
Token ID257
Feature activation-5.72e+00

Pos logit contributions
 Please+0.103
 Can+0.101
 declined+0.101
 wanna+0.098
 Could+0.097
Neg logit contributions
irl-0.137
lights-0.137
lo-0.134
 weekends-0.117
hes-0.116
Token curious
Token ID11040
Feature activation-8.81e+00

Pos logit contributions
 enough+0.071
 indeed+0.069
 like+0.061
 Can+0.057
 wanna+0.056
Neg logit contributions
lights-0.175
lo-0.174
irl-0.150
aro-0.148
vation-0.147
Token little
Token ID1310
Feature activation-5.05e+00

Pos logit contributions
 enough+0.135
 indeed+0.134
 ok+0.130
 gonna+0.127
 wanna+0.125
Neg logit contributions
lo-0.224
lights-0.220
asis-0.179
 hill-0.176
aro-0.175
Token girl
Token ID2576
Feature activation-5.85e+00

Pos logit contributions
 indeed+0.077
 enough+0.075
 ok+0.071
 gonna+0.068
 wanna+0.068
Neg logit contributions
lo-0.138
lights-0.138
aro-0.118
ites-0.114
asis-0.109
Token called
Token ID1444
Feature activation-6.57e+00

Pos logit contributions
 indeed+0.068
 enough+0.066
 like+0.065
 Can+0.059
 please+0.055
Neg logit contributions
lights-0.189
lo-0.185
vation-0.162
aro-0.159
irl-0.156
Token Daisy
Token ID40355
Feature activation-5.21e+00

Pos logit contributions
 indeed+0.107
 enough+0.096
 wanna+0.092
 like+0.091
 Can+0.088
Neg logit contributions
lo-0.183
lights-0.166
aro-0.154
vation-0.135
we-0.133
Token.
Token ID13
Feature activation-1.41e+00

Pos logit contributions
 indeed+0.114
 enough+0.113
 Can+0.109
 wanna+0.108
 gonna+0.108
Neg logit contributions
lo-0.104
lights-0.103
aro-0.080
eds-0.077
we-0.076
Token time
Token ID640
Feature activation-5.41e+00

Pos logit contributions
 indeed+0.067
 like+0.066
 enough+0.066
 Can+0.059
 please+0.059
Neg logit contributions
lights-0.116
lo-0.114
aro-0.098
vation-0.095
opl-0.091
Token,
Token ID11
Feature activation-5.03e+00

Pos logit contributions
 indeed+0.049
 enough+0.047
 like+0.043
 Can+0.042
 wanna+0.039
Neg logit contributions
lo-0.183
lights-0.182
aro-0.160
irl-0.156
opl-0.152
Token there
Token ID612
Feature activation-3.73e-01

Pos logit contributions
 indeed+0.080
 enough+0.077
 wanna+0.074
 Can+0.069
 daughter+0.069
Neg logit contributions
lights-0.134
lo-0.131
irl-0.113
cks-0.109
aro-0.109
Token was
Token ID373
Feature activation-6.49e+00

Pos logit contributions
 indeed+0.006
 enough+0.006
 like+0.006
 Can+0.006
 please+0.005
Neg logit contributions
lights-0.010
lo-0.010
aro-0.008
vation-0.008
irl-0.008
Token a
Token ID257
Feature activation-5.82e+00

Pos logit contributions
 wanna+0.101
 declined+0.098
 Please+0.096
 indeed+0.094
 hesitate+0.094
Neg logit contributions
lights-0.154
lo-0.150
irl-0.144
hes-0.130
opl-0.128
Token curious
Token ID11040
Feature activation-8.77e+00

Pos logit contributions
 enough+0.087
 indeed+0.085
 like+0.076
 wanna+0.073
 ok+0.072
Neg logit contributions
lights-0.162
lo-0.160
irl-0.138
aro-0.134
vation-0.133
Token little
Token ID1310
Feature activation-4.83e+00

Pos logit contributions
 enough+0.141
 indeed+0.140
 ok+0.138
 gonna+0.130
 wanna+0.128
Neg logit contributions
lo-0.221
lights-0.215
asis-0.177
aro-0.171
 hill-0.171
Token boy
Token ID2933
Feature activation-6.31e+00

Pos logit contributions
 indeed+0.073
 enough+0.071
 ok+0.067
 like+0.065
 wanna+0.064
Neg logit contributions
lights-0.134
lo-0.132
aro-0.112
ites-0.109
asis-0.106
Token named
Token ID3706
Feature activation-5.76e+00

Pos logit contributions
 indeed+0.074
 enough+0.072
 like+0.071
 Can+0.065
 please+0.061
Neg logit contributions
lights-0.202
lo-0.197
vation-0.171
aro-0.170
irl-0.166
Token Tim
Token ID5045
Feature activation-3.93e+00

Pos logit contributions
 indeed+0.076
 enough+0.073
 like+0.062
 ok+0.060
 wanna+0.059
Neg logit contributions
lo-0.176
lights-0.172
aro-0.148
vation-0.139
irl-0.139
Token.
Token ID13
Feature activation-2.16e+00

Pos logit contributions
 indeed+0.074
 enough+0.073
 like+0.070
 Can+0.068
 please+0.066
Neg logit contributions
lights-0.097
lo-0.095
aro-0.078
vation-0.077
irl-0.075
Token time
Token ID640
Feature activation-5.08e+00

Pos logit contributions
 indeed+0.080
 enough+0.079
 Can+0.075
 like+0.075
 please+0.070
Neg logit contributions
lights-0.104
lo-0.101
irl-0.084
vation-0.083
aro-0.081
Token,
Token ID11
Feature activation-4.96e+00

Pos logit contributions
 indeed+0.067
 enough+0.065
 wanna+0.061
 Can+0.061
 Please+0.060
Neg logit contributions
lo-0.149
lights-0.144
aro-0.128
irl-0.120
ves-0.113
Token there
Token ID612
Feature activation+1.50e-01

Pos logit contributions
 indeed+0.090
 wanna+0.089
 enough+0.088
 declined+0.083
 robber+0.082
Neg logit contributions
lights-0.115
lo-0.112
irl-0.096
aro-0.092
cks-0.088
Token was
Token ID373
Feature activation-6.28e+00

Pos logit contributions
lo+0.004
lights+0.004
aro+0.003
irl+0.003
vation+0.003
Neg logit contributions
 enough-0.003
 indeed-0.003
 like-0.003
 Can-0.003
 wanna-0.003
Token a
Token ID257
Feature activation-5.61e+00

Pos logit contributions
 wanna+0.114
 Please+0.114
 declined+0.112
 Can+0.112
 Could+0.106
Neg logit contributions
lights-0.125
irl-0.121
lo-0.118
hes-0.108
 weekends-0.106
Token curious
Token ID11040
Feature activation-8.72e+00

Pos logit contributions
 enough+0.082
 indeed+0.077
 like+0.067
 wanna+0.066
 gonna+0.066
Neg logit contributions
lights-0.156
lo-0.155
irl-0.137
aro-0.130
vation-0.128
Token boy
Token ID2933
Feature activation-6.03e+00

Pos logit contributions
 enough+0.141
 indeed+0.139
 ok+0.137
 gonna+0.134
 wanna+0.129
Neg logit contributions
lo-0.206
lights-0.203
 hill-0.170
asis-0.168
 tracks-0.161
Token.
Token ID13
Feature activation-2.30e+00

Pos logit contributions
 indeed+0.071
 enough+0.070
 like+0.068
 Can+0.062
 please+0.059
Neg logit contributions
lights-0.192
lo-0.188
vation-0.163
aro-0.162
irl-0.158
Token He
Token ID679
Feature activation-1.21e+00

Pos logit contributions
 indeed+0.023
 enough+0.023
 like+0.020
 ok+0.018
 Can+0.018
Neg logit contributions
lights-0.078
lo-0.077
aro-0.065
vation-0.065
opl-0.065
Token was
Token ID373
Feature activation-4.31e+00

Pos logit contributions
 enough+0.023
 indeed+0.021
 ok+0.020
 please+0.020
 Can+0.020
Neg logit contributions
lo-0.030
lights-0.028
irl-0.025
ves-0.024
aro-0.024
Token walking
Token ID6155
Feature activation-2.58e+00

Pos logit contributions
 indeed+0.078
 enough+0.076
 pad+0.075
 Can+0.074
 declined+0.073
Neg logit contributions
lo-0.095
lights-0.083
irl-0.081
aro-0.078
 tracks-0.073
Token a
Token ID257
Feature activation-4.38e+00

Pos logit contributions
 enough+0.009
 Can+0.009
 indeed+0.009
 Please+0.008
 gonna+0.008
Neg logit contributions
lights-0.017
lo-0.017
irl-0.015
ves-0.014
we-0.014
Token time
Token ID640
Feature activation-5.22e+00

Pos logit contributions
 indeed+0.080
 enough+0.079
 like+0.076
 Can+0.075
 please+0.071
Neg logit contributions
lights-0.109
lo-0.107
irl-0.088
vation-0.088
aro-0.087
Token there
Token ID612
Feature activation+2.35e-01

Pos logit contributions
 indeed+0.066
 enough+0.064
 wanna+0.059
 Can+0.059
 daughter+0.058
Neg logit contributions
lo-0.157
lights-0.151
aro-0.136
irl-0.129
opl-0.122
Token was
Token ID373
Feature activation-6.35e+00

Pos logit contributions
lo+0.006
lights+0.006
aro+0.005
irl+0.004
vation+0.004
Neg logit contributions
 enough-0.004
 indeed-0.004
 like-0.004
 Can-0.004
 wanna-0.004
Token a
Token ID257
Feature activation-5.71e+00

Pos logit contributions
 declined+0.108
 Please+0.107
 Can+0.106
 wanna+0.104
 Could+0.103
Neg logit contributions
irl-0.134
lo-0.133
lights-0.132
 weekends-0.116
 tracks-0.112
Token curious
Token ID11040
Feature activation-8.72e+00

Pos logit contributions
 enough+0.075
 indeed+0.073
 like+0.065
 Can+0.061
 wanna+0.060
Neg logit contributions
lights-0.169
lo-0.168
irl-0.145
aro-0.143
vation-0.141
Token little
Token ID1310
Feature activation-4.96e+00

Pos logit contributions
 indeed+0.133
 enough+0.133
 ok+0.127
 gonna+0.124
 wanna+0.121
Neg logit contributions
lo-0.221
lights-0.218
asis-0.177
 hill-0.173
aro-0.172
Token girl
Token ID2576
Feature activation-6.02e+00

Pos logit contributions
 indeed+0.077
 enough+0.074
 ok+0.071
 like+0.068
 gonna+0.068
Neg logit contributions
lights-0.135
lo-0.133
aro-0.113
ites-0.109
asis-0.105
Token.
Token ID13
Feature activation-1.91e+00

Pos logit contributions
 indeed+0.072
 enough+0.070
 like+0.069
 Can+0.063
 please+0.060
Neg logit contributions
lights-0.191
lo-0.188
vation-0.162
aro-0.161
irl-0.158
Token She
Token ID1375
Feature activation-1.22e+00

Pos logit contributions
 enough+0.014
 indeed+0.014
 like+0.011
 ok+0.010
 accidentally+0.010
Neg logit contributions
lights-0.068
lo-0.068
vation-0.058
opl-0.057
aro-0.057
Token was
Token ID373
Feature activation-4.48e+00

Pos logit contributions
 enough+0.022
 indeed+0.021
 ok+0.019
 Can+0.019
 please+0.019
Neg logit contributions
lo-0.031
lights-0.030
irl-0.027
aro-0.025
ves-0.025
Token a
Token ID257
Feature activation-4.40e+00

Pos logit contributions
 enough+0.011
 Can+0.011
 indeed+0.011
 Please+0.010
 gonna+0.010
Neg logit contributions
lights-0.020
lo-0.020
irl-0.019
ves-0.018
we-0.017
Token time
Token ID640
Feature activation-5.26e+00

Pos logit contributions
 indeed+0.079
 enough+0.078
 like+0.075
 Can+0.074
 please+0.070
Neg logit contributions
lights-0.111
lo-0.108
vation-0.089
aro-0.088
irl-0.088
Token there
Token ID612
Feature activation+1.71e-01

Pos logit contributions
 indeed+0.068
 enough+0.067
 wanna+0.064
 Can+0.060
 Please+0.060
Neg logit contributions
lo-0.156
lights-0.150
aro-0.136
irl-0.126
opl-0.119
Token was
Token ID373
Feature activation-6.33e+00

Pos logit contributions
lo+0.004
lights+0.004
aro+0.003
irl+0.003
vation+0.003
Neg logit contributions
 indeed-0.003
 enough-0.003
 like-0.003
 Can-0.003
 wanna-0.003
Token a
Token ID257
Feature activation-5.74e+00

Pos logit contributions
 Please+0.102
 Can+0.100
 declined+0.099
 wanna+0.099
 please+0.097
Neg logit contributions
lights-0.141
lo-0.139
irl-0.138
 weekends-0.120
opl-0.120
Token curious
Token ID11040
Feature activation-8.71e+00

Pos logit contributions
 enough+0.077
 indeed+0.074
 like+0.066
 Can+0.062
 wanna+0.062
Neg logit contributions
lights-0.169
lo-0.168
irl-0.145
aro-0.143
vation-0.141
Token little
Token ID1310
Feature activation-4.98e+00

Pos logit contributions
 indeed+0.134
 enough+0.133
 ok+0.129
 gonna+0.126
 wanna+0.123
Neg logit contributions
lo-0.219
lights-0.218
asis-0.175
 hill-0.174
aro-0.172
Token girl
Token ID2576
Feature activation-5.97e+00

Pos logit contributions
 indeed+0.078
 enough+0.075
 ok+0.072
 gonna+0.069
 wanna+0.069
Neg logit contributions
lights-0.134
lo-0.132
aro-0.113
ites-0.109
asis-0.103
Token named
Token ID3706
Feature activation-5.06e+00

Pos logit contributions
 indeed+0.075
 enough+0.073
 like+0.070
 Can+0.066
 please+0.062
Neg logit contributions
lights-0.185
lo-0.182
aro-0.156
vation-0.155
irl-0.152
Token Lucy
Token ID22162
Feature activation-3.97e+00

Pos logit contributions
 indeed+0.078
 enough+0.075
 like+0.066
 wanna+0.064
 Can+0.063
Neg logit contributions
lo-0.141
lights-0.136
aro-0.114
vation-0.110
irl-0.109
Token.
Token ID13
Feature activation-1.24e+00

Pos logit contributions
 enough+0.085
 indeed+0.085
 Can+0.081
 please+0.080
 ok+0.079
Neg logit contributions
lo-0.083
lights-0.081
aro-0.064
eds-0.063
we-0.062
TokenMom
Token ID29252
Feature activation-1.75e+00

Pos logit contributions
lights+0.055
lo+0.052
vation+0.045
aro+0.044
irl+0.043
Neg logit contributions
 indeed-0.033
 like-0.032
 enough-0.032
 Can-0.031
 accidentally-0.029
Tokenmy
Token ID1820
Feature activation-3.76e+00

Pos logit contributions
 indeed+0.035
 thief+0.032
 ok+0.032
 enough+0.032
 Can+0.032
Neg logit contributions
lo-0.039
lights-0.038
eds-0.032
ups-0.031
opl-0.031
Token had
Token ID550
Feature activation-4.54e+00

Pos logit contributions
 indeed+0.049
 ok+0.044
 enough+0.043
 like+0.042
 Can+0.042
Neg logit contributions
lo-0.105
lights-0.103
ites-0.092
we-0.091
vation-0.091
Token a
Token ID257
Feature activation-5.69e+00

Pos logit contributions
 indeed+0.051
 Can+0.049
 enough+0.046
 like+0.045
 thief+0.044
Neg logit contributions
lights-0.137
lo-0.136
irl-0.120
ups-0.116
ites-0.116
Token great
Token ID1049
Feature activation-6.75e+00

Pos logit contributions
 indeed+0.121
 enough+0.111
 Can+0.110
 like+0.109
 Please+0.103
Neg logit contributions
lo-0.121
lights-0.120
irl-0.102
opl-0.095
ites-0.094
Token suggestion
Token ID13052
Feature activation-8.70e+00

Pos logit contributions
 indeed+0.134
 Can+0.129
 enough+0.127
 gonna+0.125
 Please+0.121
Neg logit contributions
lights-0.137
lo-0.136
irl-0.127
opl-0.114
 skills-0.110
Token.
Token ID13
Feature activation-2.61e+00

Pos logit contributions
 indeed+0.151
 enough+0.150
 Can+0.144
 like+0.142
 ok+0.136
Neg logit contributions
lights-0.218
lo-0.213
aro-0.177
vation-0.175
opl-0.169
Token She
Token ID1375
Feature activation-1.48e+00

Pos logit contributions
 enough+0.032
 indeed+0.028
 like+0.026
 thief+0.025
 ok+0.024
Neg logit contributions
lights-0.082
lo-0.081
opl-0.069
aro-0.068
ites-0.065
Token wanted
Token ID2227
Feature activation-3.20e+00

Pos logit contributions
 enough+0.026
 indeed+0.024
 ok+0.024
 Can+0.023
 thief+0.022
Neg logit contributions
lights-0.037
lo-0.036
irl-0.031
aro-0.029
ves-0.029
Token to
Token ID284
Feature activation-1.03e-01

Pos logit contributions
 indeed+0.032
 enough+0.029
 Can+0.029
 like+0.028
 wanna+0.027
Neg logit contributions
lights-0.104
lo-0.103
ups-0.088
aro-0.088
vation-0.087
Token go
Token ID467
Feature activation-9.77e-01

Pos logit contributions
 indeed+0.002
 enough+0.002
 like+0.002
 Can+0.002
 please+0.002
Neg logit contributions
lights-0.002
lo-0.002
aro-0.002
vation-0.002
irl-0.002
Token time
Token ID640
Feature activation-4.87e+00

Pos logit contributions
 indeed+0.082
 enough+0.081
 Can+0.077
 like+0.076
 please+0.073
Neg logit contributions
lights-0.103
lo-0.101
irl-0.084
vation-0.082
opl-0.081
Token,
Token ID11
Feature activation-4.45e+00

Pos logit contributions
 indeed+0.063
 enough+0.061
 wanna+0.058
 Please+0.056
 Can+0.056
Neg logit contributions
lo-0.146
lights-0.140
aro-0.126
irl-0.117
opl-0.110
Token there
Token ID612
Feature activation+2.22e-01

Pos logit contributions
 wanna+0.081
 enough+0.080
 indeed+0.079
 robber+0.076
 declined+0.075
Neg logit contributions
lights-0.104
lo-0.099
irl-0.088
aro-0.082
cks-0.080
Token was
Token ID373
Feature activation-5.89e+00

Pos logit contributions
lights+0.005
lo+0.005
aro+0.004
irl+0.004
vation+0.004
Neg logit contributions
 enough-0.004
 indeed-0.004
 like-0.004
 Can-0.004
 wanna-0.004
Token a
Token ID257
Feature activation-5.63e+00

Pos logit contributions
 wanna+0.105
 Please+0.103
 declined+0.101
 please+0.100
 Can+0.100
Neg logit contributions
lights-0.125
lo-0.118
irl-0.114
 weekends-0.103
 tracks-0.102
Token curious
Token ID11040
Feature activation-8.70e+00

Pos logit contributions
 enough+0.081
 indeed+0.075
 like+0.067
 wanna+0.065
 ok+0.064
Neg logit contributions
lights-0.160
lo-0.158
irl-0.138
aro-0.134
vation-0.132
Token little
Token ID1310
Feature activation-4.53e+00

Pos logit contributions
 enough+0.140
 indeed+0.138
 ok+0.134
 gonna+0.130
 wanna+0.125
Neg logit contributions
lo-0.211
lights-0.207
 hill-0.170
asis-0.170
 tracks-0.163
Token boy
Token ID2933
Feature activation-5.80e+00

Pos logit contributions
 indeed+0.071
 enough+0.068
 ok+0.066
 gonna+0.063
 wanna+0.062
Neg logit contributions
lights-0.122
lo-0.119
aro-0.102
ites-0.099
asis-0.094
Token.
Token ID13
Feature activation-1.99e+00

Pos logit contributions
 indeed+0.072
 enough+0.071
 like+0.068
 Can+0.063
 please+0.060
Neg logit contributions
lights-0.181
lo-0.178
aro-0.153
vation-0.152
irl-0.149
Token He
Token ID679
Feature activation-1.11e+00

Pos logit contributions
 indeed+0.022
 enough+0.022
 like+0.019
 ok+0.018
 accidentally+0.018
Neg logit contributions
lights-0.064
lo-0.064
opl-0.054
aro-0.052
vation-0.052
Token wanted
Token ID2227
Feature activation-2.83e+00

Pos logit contributions
 enough+0.022
 indeed+0.020
 ok+0.020
 please+0.019
 wanna+0.019
Neg logit contributions
lo-0.025
lights-0.023
irl-0.022
ves-0.020
aro-0.020
Token a
Token ID257
Feature activation-4.56e+00

Pos logit contributions
 enough+0.006
 Can+0.006
 indeed+0.006
 wanna+0.006
 gonna+0.006
Neg logit contributions
lights-0.012
lo-0.012
irl-0.011
we-0.010
ves-0.010
Token time
Token ID640
Feature activation-5.12e+00

Pos logit contributions
 indeed+0.079
 enough+0.078
 like+0.076
 Can+0.072
 please+0.070
Neg logit contributions
lights-0.119
lo-0.117
aro-0.097
vation-0.097
irl-0.094
Token there
Token ID612
Feature activation-6.73e-01

Pos logit contributions
 enough+0.060
 indeed+0.060
 wanna+0.055
 Can+0.054
 daughter+0.053
Neg logit contributions
lo-0.157
lights-0.153
aro-0.135
irl-0.129
opl-0.124
Token was
Token ID373
Feature activation-6.47e+00

Pos logit contributions
 indeed+0.011
 enough+0.011
 like+0.011
 Can+0.010
 please+0.010
Neg logit contributions
lights-0.018
lo-0.018
aro-0.015
vation-0.015
irl-0.014
Token a
Token ID257
Feature activation-5.95e+00

Pos logit contributions
 Can+0.090
 indeed+0.083
 declined+0.083
 Could+0.082
 Please+0.081
Neg logit contributions
lo-0.167
lights-0.166
irl-0.162
opl-0.145
cks-0.145
Token curious
Token ID11040
Feature activation-8.69e+00

Pos logit contributions
 enough+0.078
 indeed+0.075
 like+0.064
 gonna+0.063
 wanna+0.062
Neg logit contributions
lo-0.176
lights-0.176
irl-0.153
aro-0.149
vation-0.146
Token girl
Token ID2576
Feature activation-5.92e+00

Pos logit contributions
 indeed+0.137
 enough+0.135
 gonna+0.133
 ok+0.132
 wanna+0.127
Neg logit contributions
lo-0.224
lights-0.218
aro-0.175
asis-0.174
 hill-0.169
Token.
Token ID13
Feature activation-2.54e+00

Pos logit contributions
 indeed+0.072
 enough+0.070
 like+0.068
 Can+0.063
 please+0.059
Neg logit contributions
lights-0.187
lo-0.183
vation-0.158
aro-0.158
irl-0.154
Token She
Token ID1375
Feature activation-1.66e+00

Pos logit contributions
 enough+0.022
 indeed+0.021
 accidentally+0.017
 ok+0.017
 wanna+0.016
Neg logit contributions
lights-0.088
lo-0.088
opl-0.073
aro-0.073
vation-0.072
Token wanted
Token ID2227
Feature activation-3.27e+00

Pos logit contributions
 enough+0.029
 indeed+0.028
 Can+0.025
 ok+0.025
 please+0.025
Neg logit contributions
lo-0.043
lights-0.042
irl-0.036
aro-0.035
vation-0.035
Token to
Token ID284
Feature activation-2.45e-01

Pos logit contributions
 indeed+0.038
 Can+0.033
 wanna+0.033
 like+0.033
 enough+0.033
Neg logit contributions
lo-0.100
lights-0.099
ups-0.086
aro-0.086
vation-0.083
Token time
Token ID640
Feature activation-4.92e+00

Pos logit contributions
 indeed+0.079
 enough+0.078
 like+0.074
 Can+0.074
 please+0.070
Neg logit contributions
lights-0.105
lo-0.102
irl-0.084
vation-0.084
aro-0.082
Token,
Token ID11
Feature activation-4.57e+00

Pos logit contributions
 indeed+0.064
 enough+0.062
 wanna+0.059
 Please+0.058
 Can+0.057
Neg logit contributions
lo-0.145
lights-0.139
aro-0.125
irl-0.117
ves-0.110
Token there
Token ID612
Feature activation+3.02e-01

Pos logit contributions
 wanna+0.086
 indeed+0.084
 enough+0.083
 dear+0.080
 robber+0.080
Neg logit contributions
lights-0.103
lo-0.099
irl-0.085
aro-0.081
cks-0.077
Token was
Token ID373
Feature activation-5.93e+00

Pos logit contributions
lo+0.007
lights+0.007
aro+0.006
irl+0.006
opl+0.005
Neg logit contributions
 indeed-0.006
 enough-0.006
 like-0.005
 Can-0.005
 wanna-0.005
Token a
Token ID257
Feature activation-5.56e+00

Pos logit contributions
 wanna+0.104
 Please+0.103
 declined+0.100
 Can+0.099
 please+0.097
Neg logit contributions
lights-0.127
lo-0.120
irl-0.116
 weekends-0.106
 tracks-0.103
Token curious
Token ID11040
Feature activation-8.64e+00

Pos logit contributions
 enough+0.080
 indeed+0.075
 like+0.067
 wanna+0.065
 ok+0.064
Neg logit contributions
lights-0.157
lo-0.155
irl-0.136
aro-0.131
vation-0.129
Token little
Token ID1310
Feature activation-4.60e+00

Pos logit contributions
 enough+0.138
 indeed+0.137
 ok+0.134
 gonna+0.130
 wanna+0.125
Neg logit contributions
lo-0.207
lights-0.204
 hill-0.169
asis-0.168
 tracks-0.161
Token boy
Token ID2933
Feature activation-5.84e+00

Pos logit contributions
 indeed+0.072
 enough+0.069
 ok+0.067
 gonna+0.064
 wanna+0.063
Neg logit contributions
lights-0.124
lo-0.121
aro-0.104
ites-0.100
asis-0.096
Token called
Token ID1444
Feature activation-6.36e+00

Pos logit contributions
 indeed+0.070
 enough+0.068
 like+0.066
 Can+0.061
 please+0.057
Neg logit contributions
lights-0.186
lo-0.182
vation-0.158
aro-0.157
irl-0.153
Token Bob
Token ID5811
Feature activation-3.28e+00

Pos logit contributions
 indeed+0.085
 enough+0.077
 like+0.072
 wanna+0.069
 Can+0.067
Neg logit contributions
lo-0.192
lights-0.186
aro-0.164
vation-0.152
irl-0.147
Token.
Token ID13
Feature activation-1.46e+00

Pos logit contributions
 indeed+0.071
 enough+0.070
 Can+0.069
 Please+0.067
 please+0.066
Neg logit contributions
lo-0.069
lights-0.067
aro-0.054
eds-0.053
cks-0.051
Token time
Token ID640
Feature activation-5.02e+00

Pos logit contributions
 indeed+0.076
 enough+0.075
 like+0.071
 Can+0.071
 please+0.067
Neg logit contributions
lights-0.102
lo-0.100
irl-0.081
vation-0.081
aro-0.080
Token,
Token ID11
Feature activation-4.57e+00

Pos logit contributions
 indeed+0.067
 enough+0.065
 wanna+0.061
 Please+0.059
 Can+0.059
Neg logit contributions
lo-0.147
lights-0.141
aro-0.127
irl-0.117
ves-0.112
Token there
Token ID612
Feature activation+2.45e-01

Pos logit contributions
 wanna+0.089
 enough+0.088
 indeed+0.088
 robber+0.086
 declined+0.084
Neg logit contributions
lights-0.098
lo-0.094
irl-0.081
aro-0.077
we-0.072
Token was
Token ID373
Feature activation-6.01e+00

Pos logit contributions
lo+0.006
lights+0.006
aro+0.005
irl+0.005
vation+0.004
Neg logit contributions
 enough-0.005
 indeed-0.005
 like-0.004
 Can-0.004
 wanna-0.004
Token a
Token ID257
Feature activation-5.55e+00

Pos logit contributions
 wanna+0.104
 Please+0.103
 Can+0.100
 declined+0.099
 please+0.097
Neg logit contributions
lights-0.133
lo-0.123
irl-0.121
 weekends-0.108
hes-0.106
Token curious
Token ID11040
Feature activation-8.64e+00

Pos logit contributions
 enough+0.078
 indeed+0.073
 like+0.065
 wanna+0.062
 ok+0.062
Neg logit contributions
lights-0.159
lo-0.157
irl-0.138
aro-0.133
vation-0.131
Token cater
Token ID20825
Feature activation-2.79e+00

Pos logit contributions
 enough+0.139
 indeed+0.137
 ok+0.133
 gonna+0.130
 wanna+0.125
Neg logit contributions
lo-0.209
lights-0.206
 hill-0.169
asis-0.167
 tracks-0.161
Tokenpillar
Token ID41643
Feature activation-4.57e+00

Pos logit contributions
 indeed+0.034
 enough+0.033
 like+0.033
 Can+0.030
 please+0.028
Neg logit contributions
lights-0.088
lo-0.086
vation-0.074
aro-0.073
irl-0.072
Token.
Token ID13
Feature activation-2.49e+00

Pos logit contributions
 indeed+0.061
 like+0.060
 enough+0.058
 Can+0.053
 accidentally+0.051
Neg logit contributions
lights-0.141
lo-0.137
vation-0.120
aro-0.115
irl-0.114
Token He
Token ID679
Feature activation-8.07e-01

Pos logit contributions
 indeed+0.041
 enough+0.041
 ok+0.039
 wanna+0.037
 daughter+0.036
Neg logit contributions
lo-0.065
lights-0.063
opl-0.053
irl-0.049
ites-0.048
Token went
Token ID1816
Feature activation-2.17e+00

Pos logit contributions
 enough+0.015
 indeed+0.014
 ok+0.014
 Can+0.013
 wanna+0.013
Neg logit contributions
lo-0.019
lights-0.018
irl-0.017
aro-0.016
ves-0.015
Token time
Token ID640
Feature activation-5.29e+00

Pos logit contributions
 indeed+0.075
 enough+0.074
 like+0.072
 Can+0.069
 please+0.066
Neg logit contributions
lights-0.113
lo-0.111
aro-0.092
vation-0.092
irl-0.089
Token,
Token ID11
Feature activation-4.65e+00

Pos logit contributions
 indeed+0.054
 enough+0.052
 Can+0.047
 like+0.045
 wanna+0.044
Neg logit contributions
lo-0.173
lights-0.170
aro-0.150
irl-0.146
opl-0.141
Token there
Token ID612
Feature activation+1.37e-01

Pos logit contributions
 indeed+0.076
 enough+0.074
 wanna+0.074
 Can+0.068
 dear+0.067
Neg logit contributions
lights-0.118
lo-0.114
irl-0.099
cks-0.095
aro-0.094
Token was
Token ID373
Feature activation-6.08e+00

Pos logit contributions
lights+0.003
lo+0.003
aro+0.003
irl+0.003
vation+0.003
Neg logit contributions
 indeed-0.002
 enough-0.002
 like-0.002
 Can-0.002
 wanna-0.002
Token a
Token ID257
Feature activation-5.71e+00

Pos logit contributions
 wanna+0.108
 declined+0.105
 Please+0.105
 Can+0.102
 please+0.099
Neg logit contributions
lights-0.128
irl-0.123
lo-0.122
hes-0.111
 weekends-0.107
Token curious
Token ID11040
Feature activation-8.61e+00

Pos logit contributions
 enough+0.086
 indeed+0.081
 like+0.071
 wanna+0.071
 ok+0.069
Neg logit contributions
lights-0.156
lo-0.155
irl-0.137
aro-0.129
vation-0.128
Token little
Token ID1310
Feature activation-4.87e+00

Pos logit contributions
 enough+0.140
 indeed+0.138
 ok+0.136
 gonna+0.131
 wanna+0.126
Neg logit contributions
lo-0.209
lights-0.204
asis-0.169
 hill-0.167
aro-0.161
Token bird
Token ID6512
Feature activation-3.14e+00

Pos logit contributions
 indeed+0.076
 enough+0.073
 ok+0.070
 like+0.067
 gonna+0.067
Neg logit contributions
lights-0.132
lo-0.131
aro-0.111
ites-0.108
asis-0.104
Token named
Token ID3706
Feature activation-4.92e+00

Pos logit contributions
 like+0.033
 indeed+0.030
 enough+0.028
 wanted+0.025
 Can+0.024
Neg logit contributions
lights-0.108
lo-0.106
vation-0.096
aro-0.090
irl-0.089
Token Twe
Token ID24205
Feature activation+3.17e-02

Pos logit contributions
 indeed+0.068
 enough+0.063
 wanna+0.060
 like+0.059
 Can+0.056
Neg logit contributions
lo-0.142
lights-0.137
aro-0.123
irl-0.115
cks-0.114
Tokenety
Token ID2963
Feature activation-4.24e+00

Pos logit contributions
lights+0.001
lo+0.001
aro+0.001
vation+0.001
opl+0.001
Neg logit contributions
 indeed-0.000
 enough-0.000
 like-0.000
 Can-0.000
 please-0.000
Token time
Token ID640
Feature activation-5.02e+00

Pos logit contributions
 indeed+0.071
 enough+0.070
 like+0.068
 Can+0.066
 please+0.063
Neg logit contributions
lights-0.101
lo-0.098
vation-0.081
aro-0.080
irl-0.080
Token,
Token ID11
Feature activation-4.50e+00

Pos logit contributions
 indeed+0.064
 enough+0.062
 wanna+0.057
 Can+0.056
 Please+0.056
Neg logit contributions
lo-0.151
lights-0.146
aro-0.131
irl-0.122
opl-0.116
Token there
Token ID612
Feature activation+1.64e-01

Pos logit contributions
 enough+0.080
 indeed+0.080
 wanna+0.079
 robber+0.074
 declined+0.074
Neg logit contributions
lights-0.107
lo-0.104
irl-0.088
aro-0.086
cks-0.080
Token was
Token ID373
Feature activation-6.01e+00

Pos logit contributions
lo+0.004
lights+0.004
aro+0.003
irl+0.003
vation+0.003
Neg logit contributions
 indeed-0.003
 enough-0.003
 like-0.003
 Can-0.003
 wanna-0.003
Token a
Token ID257
Feature activation-5.54e+00

Pos logit contributions
 wanna+0.106
 Please+0.104
 declined+0.102
 Can+0.102
 hesitate+0.098
Neg logit contributions
lights-0.127
lo-0.121
irl-0.120
hes-0.106
 weekends-0.105
Token curious
Token ID11040
Feature activation-8.60e+00

Pos logit contributions
 enough+0.078
 indeed+0.074
 like+0.065
 wanna+0.063
 gonna+0.062
Neg logit contributions
lights-0.158
lo-0.156
irl-0.137
aro-0.133
vation-0.130
Token little
Token ID1310
Feature activation-4.90e+00

Pos logit contributions
 indeed+0.136
 enough+0.136
 ok+0.130
 gonna+0.128
 wanna+0.123
Neg logit contributions
lo-0.208
lights-0.207
 hill-0.170
asis-0.169
 tracks-0.163
Token girl
Token ID2576
Feature activation-5.85e+00

Pos logit contributions
 indeed+0.078
 enough+0.074
 ok+0.072
 gonna+0.070
 wanna+0.068
Neg logit contributions
lights-0.130
lo-0.128
aro-0.109
ites-0.106
 hill-0.101
Token called
Token ID1444
Feature activation-6.28e+00

Pos logit contributions
 indeed+0.069
 enough+0.067
 like+0.066
 Can+0.060
 please+0.056
Neg logit contributions
lights-0.187
lo-0.184
vation-0.159
aro-0.158
irl-0.155
Token Lucy
Token ID22162
Feature activation-3.97e+00

Pos logit contributions
 indeed+0.100
 enough+0.091
 wanna+0.086
 like+0.085
 gonna+0.084
Neg logit contributions
lo-0.174
lights-0.160
aro-0.144
vation-0.130
ves-0.127
Token.
Token ID13
Feature activation-1.32e+00

Pos logit contributions
 indeed+0.089
 enough+0.089
 gonna+0.086
 Can+0.086
 ok+0.086
Neg logit contributions
lo-0.075
lights-0.073
aro-0.057
we-0.056
eds-0.056
Token time
Token ID640
Feature activation-5.17e+00

Pos logit contributions
 indeed+0.068
 enough+0.067
 like+0.067
 Can+0.061
 please+0.060
Neg logit contributions
lights-0.114
lo-0.112
aro-0.096
vation-0.094
opl-0.090
Token,
Token ID11
Feature activation-4.91e+00

Pos logit contributions
 indeed+0.047
 enough+0.046
 like+0.040
 Can+0.040
 wanna+0.037
Neg logit contributions
lo-0.174
lights-0.173
aro-0.152
irl-0.149
opl-0.145
Token there
Token ID612
Feature activation-1.74e-01

Pos logit contributions
 indeed+0.076
 enough+0.074
 wanna+0.070
 Can+0.066
 daughter+0.065
Neg logit contributions
lights-0.131
lo-0.129
irl-0.113
cks-0.108
aro-0.108
Token was
Token ID373
Feature activation-6.27e+00

Pos logit contributions
 indeed+0.003
 enough+0.003
 like+0.003
 Can+0.003
 please+0.003
Neg logit contributions
lights-0.004
lo-0.004
aro-0.004
vation-0.004
irl-0.004
Token a
Token ID257
Feature activation-5.66e+00

Pos logit contributions
 declined+0.106
 wanna+0.106
 Please+0.102
 Can+0.101
 Could+0.100
Neg logit contributions
irl-0.132
lights-0.130
lo-0.129
hes-0.117
 weekends-0.113
Token curious
Token ID11040
Feature activation-8.59e+00

Pos logit contributions
 enough+0.085
 indeed+0.082
 like+0.072
 wanna+0.071
 ok+0.069
Neg logit contributions
lights-0.157
lo-0.156
irl-0.135
aro-0.131
vation-0.128
Token little
Token ID1310
Feature activation-4.90e+00

Pos logit contributions
 enough+0.141
 indeed+0.138
 ok+0.136
 gonna+0.131
 wanna+0.127
Neg logit contributions
lo-0.211
lights-0.203
asis-0.169
 hill-0.166
opl-0.161
Token boy
Token ID2933
Feature activation-6.23e+00

Pos logit contributions
 indeed+0.075
 enough+0.073
 ok+0.069
 like+0.067
 wanna+0.066
Neg logit contributions
lights-0.135
lo-0.134
aro-0.113
ites-0.110
asis-0.106
Token named
Token ID3706
Feature activation-5.46e+00

Pos logit contributions
 like+0.064
 indeed+0.064
 enough+0.061
 Can+0.055
 please+0.051
Neg logit contributions
lights-0.212
lo-0.206
vation-0.184
aro-0.176
irl-0.174
Token Tim
Token ID5045
Feature activation-3.70e+00

Pos logit contributions
 indeed+0.071
 enough+0.068
 like+0.058
 Can+0.056
 ok+0.056
Neg logit contributions
lo-0.168
lights-0.164
aro-0.140
irl-0.133
vation-0.133
Token.
Token ID13
Feature activation-2.13e+00

Pos logit contributions
 indeed+0.065
 enough+0.064
 like+0.064
 Can+0.059
 please+0.057
Neg logit contributions
lights-0.097
lo-0.094
vation-0.080
aro-0.079
irl-0.076
Token time
Token ID640
Feature activation-5.66e+00

Pos logit contributions
 indeed+0.086
 enough+0.085
 like+0.082
 Can+0.080
 please+0.076
Neg logit contributions
lights-0.120
lo-0.117
vation-0.096
irl-0.096
aro-0.095
Token,
Token ID11
Feature activation-5.10e+00

Pos logit contributions
 indeed+0.056
 enough+0.055
 Can+0.049
 like+0.048
 wanna+0.045
Neg logit contributions
lo-0.188
lights-0.184
aro-0.164
irl-0.159
opl-0.154
Token there
Token ID612
Feature activation-4.18e-01

Pos logit contributions
 indeed+0.083
 wanna+0.080
 enough+0.079
 Can+0.074
 hesitate+0.072
Neg logit contributions
lo-0.133
lights-0.133
irl-0.113
cks-0.111
aro-0.109
Token was
Token ID373
Feature activation-6.03e+00

Pos logit contributions
 indeed+0.007
 enough+0.007
 like+0.007
 Can+0.007
 please+0.007
Neg logit contributions
lights-0.011
lo-0.011
aro-0.009
vation-0.009
irl-0.009
Token a
Token ID257
Feature activation-5.78e+00

Pos logit contributions
 wanna+0.097
 declined+0.095
 Can+0.093
 Please+0.093
 hesitate+0.090
Neg logit contributions
lights-0.140
lo-0.138
irl-0.132
hes-0.120
 weekends-0.117
Token curious
Token ID11040
Feature activation-8.59e+00

Pos logit contributions
 enough+0.087
 indeed+0.081
 wanna+0.071
 like+0.071
 ok+0.070
Neg logit contributions
lights-0.159
lo-0.158
irl-0.139
aro-0.132
asis-0.130
Token pet
Token ID4273
Feature activation-3.65e+00

Pos logit contributions
 enough+0.138
 indeed+0.138
 ok+0.134
 gonna+0.131
 wanna+0.125
Neg logit contributions
lo-0.216
lights-0.207
asis-0.170
 hill-0.168
aro-0.166
Token.
Token ID13
Feature activation-2.57e+00

Pos logit contributions
 indeed+0.055
 enough+0.055
 like+0.051
 ok+0.050
 wanna+0.050
Neg logit contributions
lo-0.102
lights-0.100
aro-0.088
vation-0.081
ites-0.081
Token The
Token ID383
Feature activation-7.56e-01

Pos logit contributions
 enough+0.043
 indeed+0.038
 ok+0.038
 accidentally+0.037
 convinced+0.036
Neg logit contributions
lo-0.069
lights-0.067
opl-0.055
aro-0.053
irl-0.052
Token pet
Token ID4273
Feature activation-4.67e-01

Pos logit contributions
 indeed+0.011
 enough+0.011
 like+0.010
 Can+0.010
 accidentally+0.009
Neg logit contributions
lo-0.022
lights-0.021
aro-0.018
vation-0.018
irl-0.018
Token had
Token ID550
Feature activation-2.67e+00

Pos logit contributions
 indeed+0.007
 enough+0.007
 like+0.007
 Can+0.007
 wanna+0.007
Neg logit contributions
lo-0.013
lights-0.013
aro-0.011
vation-0.010
irl-0.010
TokenOne
Token ID3198
Feature activation+2.09e+00

Pos logit contributions
lights+0.066
lo+0.064
aro+0.053
vation+0.053
irl+0.051
Neg logit contributions
 indeed-0.052
 enough-0.051
 like-0.050
 Can-0.048
 please-0.046
Token day
Token ID1110
Feature activation-4.47e+00

Pos logit contributions
lights+0.042
lo+0.040
aro+0.032
vation+0.031
irl+0.029
Neg logit contributions
 indeed-0.050
 enough-0.050
 like-0.050
 Can-0.046
 please-0.046
Token,
Token ID11
Feature activation-5.41e+00

Pos logit contributions
 indeed+0.067
 Can+0.060
 enough+0.058
 like+0.056
 pad+0.056
Neg logit contributions
lo-0.117
lights-0.114
aro-0.106
irl-0.102
ves-0.100
Token Mom
Token ID11254
Feature activation-1.47e+00

Pos logit contributions
 enough+0.097
 indeed+0.089
 servant+0.088
 detective+0.087
 reader+0.085
Neg logit contributions
lo-0.117
lights-0.109
aro-0.102
irl-0.101
ves-0.093
Tokenmy
Token ID1820
Feature activation-3.00e+00

Pos logit contributions
 detectives+0.032
 hesitate+0.032
 Please+0.032
 indeed+0.031
 ok+0.031
Neg logit contributions
lo-0.025
lights-0.025
eds-0.021
opl-0.020
ating-0.019
Token said
Token ID531
Feature activation-8.59e+00

Pos logit contributions
 ok+0.054
 indeed+0.053
 blonde+0.052
 hesitate+0.052
 Can+0.051
Neg logit contributions
lo-0.060
lights-0.057
eds-0.056
we-0.053
irl-0.052
Token it
Token ID340
Feature activation-1.96e+00

Pos logit contributions
 pad+0.147
die+0.147
 detective+0.140
 knight+0.140
 detectives+0.136
Neg logit contributions
lights-0.185
lo-0.177
ves-0.169
we-0.151
vation-0.150
Token was
Token ID373
Feature activation-4.36e+00

Pos logit contributions
 enough+0.044
 indeed+0.041
 detectives+0.041
 like+0.040
 Can+0.039
Neg logit contributions
lights-0.038
lo-0.037
vation-0.030
we-0.030
irl-0.029
Token time
Token ID640
Feature activation-4.44e+00

Pos logit contributions
 indeed+0.095
 enough+0.092
 Can+0.089
 thief+0.086
 hello+0.086
Neg logit contributions
lights-0.084
lo-0.078
irl-0.067
opl-0.061
vation-0.060
Token to
Token ID284
Feature activation+5.98e-01

Pos logit contributions
 indeed+0.056
 thief+0.054
 Can+0.053
 enough+0.052
 knight+0.051
Neg logit contributions
lights-0.122
lo-0.121
we-0.105
aro-0.103
eds-0.101
Token go
Token ID467
Feature activation-1.12e+00

Pos logit contributions
lights+0.014
lo+0.014
aro+0.011
vation+0.011
irl+0.011
Neg logit contributions
 indeed-0.012
 enough-0.012
 like-0.011
 Can-0.011
 please-0.011
Token time
Token ID640
Feature activation-5.43e+00

Pos logit contributions
 indeed+0.071
 enough+0.070
 like+0.069
 Can+0.064
 please+0.063
Neg logit contributions
lights-0.115
lo-0.113
aro-0.095
vation-0.094
opl-0.090
Token,
Token ID11
Feature activation-4.97e+00

Pos logit contributions
 indeed+0.053
 enough+0.052
 Can+0.046
 like+0.045
 wanna+0.043
Neg logit contributions
lo-0.180
lights-0.178
aro-0.157
irl-0.153
opl-0.148
Token there
Token ID612
Feature activation-2.66e-01

Pos logit contributions
 indeed+0.080
 enough+0.078
 wanna+0.074
 Can+0.071
 daughter+0.068
Neg logit contributions
lights-0.131
lo-0.131
irl-0.112
aro-0.108
cks-0.107
Token was
Token ID373
Feature activation-6.25e+00

Pos logit contributions
 indeed+0.005
 enough+0.005
 like+0.004
 Can+0.004
 please+0.004
Neg logit contributions
lights-0.007
lo-0.007
aro-0.006
vation-0.006
irl-0.006
Token a
Token ID257
Feature activation-5.72e+00

Pos logit contributions
 declined+0.105
 wanna+0.104
 Please+0.103
 Can+0.103
 Could+0.098
Neg logit contributions
lights-0.136
lo-0.134
irl-0.133
hes-0.120
 weekends-0.113
Token curious
Token ID11040
Feature activation-8.58e+00

Pos logit contributions
 enough+0.083
 indeed+0.080
 like+0.070
 wanna+0.068
 ok+0.067
Neg logit contributions
lights-0.162
lo-0.161
irl-0.139
aro-0.135
vation-0.133
Token boy
Token ID2933
Feature activation-6.16e+00

Pos logit contributions
 indeed+0.137
 enough+0.137
 ok+0.132
 gonna+0.128
 wanna+0.124
Neg logit contributions
lo-0.215
lights-0.209
asis-0.172
 hill-0.168
aro-0.166
Token named
Token ID3706
Feature activation-5.35e+00

Pos logit contributions
 indeed+0.070
 enough+0.068
 like+0.068
 Can+0.061
 please+0.057
Neg logit contributions
lights-0.201
lo-0.196
vation-0.171
aro-0.168
irl-0.165
Token Tim
Token ID5045
Feature activation-3.36e+00

Pos logit contributions
 indeed+0.066
 enough+0.062
 like+0.055
 Can+0.053
 wanna+0.052
Neg logit contributions
lo-0.168
lights-0.167
aro-0.141
irl-0.137
vation-0.136
Token.
Token ID13
Feature activation-2.24e+00

Pos logit contributions
 indeed+0.056
 enough+0.056
 like+0.055
 Can+0.051
 please+0.049
Neg logit contributions
lights-0.091
lo-0.088
vation-0.075
aro-0.075
irl-0.072
Token Tim
Token ID5045
Feature activation-1.46e+00

Pos logit contributions
 indeed+0.021
 enough+0.021
 like+0.019
 Can+0.018
 please+0.017
Neg logit contributions
lights-0.076
lo-0.075
aro-0.065
vation-0.064
opl-0.064
Token back
Token ID736
Feature activation+1.79e+00

Pos logit contributions
lights+0.004
lo+0.004
aro+0.003
vation+0.003
irl+0.003
Neg logit contributions
 indeed-0.003
 enough-0.003
 like-0.003
 Can-0.003
 please-0.003
Token home
Token ID1363
Feature activation+2.19e+00

Pos logit contributions
lights+0.058
lo+0.055
vation+0.050
irl+0.048
aro+0.048
Neg logit contributions
 enough-0.021
 like-0.020
 indeed-0.020
 Can-0.017
 please-0.016
Token.
Token ID13
Feature activation+1.48e+00

Pos logit contributions
lights+0.065
lo+0.064
vation+0.054
aro+0.054
irl+0.054
Neg logit contributions
 indeed-0.031
 enough-0.031
 like-0.030
 Can-0.026
 please-0.026
Token
Token ID220
Feature activation+3.70e+00

Pos logit contributions
lights+0.044
lo+0.042
vation+0.036
aro+0.036
irl+0.036
Neg logit contributions
 indeed-0.022
 Can-0.021
 enough-0.021
 like-0.021
 Please-0.019
Token
Token ID198
Feature activation+6.39e+00

Pos logit contributions
lights+0.110
lo+0.103
vation+0.089
aro+0.089
irl+0.087
Neg logit contributions
 indeed-0.053
 like-0.053
 enough-0.052
 Can-0.049
 ok-0.048
Token
Token ID198
Feature activation+5.33e+00

Pos logit contributions
lights+0.234
lo+0.202
vation+0.183
aro+0.179
ites+0.178
Neg logit contributions
 like-0.060
 indeed-0.056
 She-0.052
 It-0.051
 Jerry-0.051
TokenWhen
Token ID2215
Feature activation+1.33e+00

Pos logit contributions
lights+0.151
lo+0.137
aro+0.112
cks+0.110
vation+0.108
Neg logit contributions
 indeed-0.095
 like-0.090
 wanna-0.084
 please-0.084
 Can-0.084
Token he
Token ID339
Feature activation+1.63e+00

Pos logit contributions
lights+0.042
lo+0.041
vation+0.036
aro+0.036
opl+0.035
Neg logit contributions
 enough-0.017
 indeed-0.017
 like-0.015
 Can-0.014
 accidentally-0.014
Token got
Token ID1392
Feature activation+1.01e+00

Pos logit contributions
lights+0.051
lo+0.049
aro+0.042
vation+0.042
opl+0.041
Neg logit contributions
 enough-0.022
 indeed-0.022
 like-0.021
 Can-0.019
 accidentally-0.018
Token there
Token ID612
Feature activation+2.34e+00

Pos logit contributions
lights+0.022
lo+0.022
aro+0.018
ites+0.018
irl+0.017
Neg logit contributions
 indeed-0.021
 Can-0.020
 like-0.020
 dear-0.019
 wanna-0.019
Token,
Token ID11
Feature activation-1.30e+00

Pos logit contributions
lo+0.074
lights+0.073
aro+0.062
vation+0.061
irl+0.059
Neg logit contributions
 indeed-0.026
 Can-0.024
 wanna-0.024
 like-0.022
 Please-0.022
Token on
Token ID319
Feature activation+1.72e+00

Pos logit contributions
lights+0.001
lo+0.001
vation+0.001
aro+0.001
irl+0.001
Neg logit contributions
 indeed-0.001
 enough-0.001
 like-0.001
 Can-0.001
 please-0.001
Token an
Token ID281
Feature activation+2.97e-01

Pos logit contributions
lights+0.048
lo+0.047
aro+0.040
vation+0.040
irl+0.039
Neg logit contributions
 indeed-0.027
 enough-0.027
 like-0.026
 Can-0.024
 please-0.023
Token exciting
Token ID7895
Feature activation-1.22e+00

Pos logit contributions
lights+0.007
lo+0.007
vation+0.005
aro+0.005
opl+0.005
Neg logit contributions
 indeed-0.006
 enough-0.006
 like-0.006
 Can-0.006
 accidentally-0.005
Token adventure
Token ID8855
Feature activation+1.06e+00

Pos logit contributions
 enough+0.027
 indeed+0.027
 like+0.026
 Can+0.024
 please+0.024
Neg logit contributions
lights-0.026
lo-0.026
aro-0.021
vation-0.021
ites-0.020
Token.
Token ID13
Feature activation+9.14e-01

Pos logit contributions
lights+0.029
lo+0.029
aro+0.024
irl+0.024
vation+0.024
Neg logit contributions
 enough-0.017
 indeed-0.017
 like-0.017
 Can-0.014
 please-0.014
Token
Token ID198
Feature activation+5.23e+00

Pos logit contributions
lights+0.027
lo+0.026
aro+0.022
vation+0.022
irl+0.021
Neg logit contributions
 indeed-0.013
 enough-0.013
 like-0.013
 Can-0.012
 please-0.011
Token
Token ID198
Feature activation+4.60e+00

Pos logit contributions
lights+0.164
lo+0.149
vation+0.132
aro+0.130
opl+0.126
Neg logit contributions
 indeed-0.070
 like-0.067
 enough-0.064
 please-0.061
 Can-0.060
TokenThe
Token ID464
Feature activation+9.68e-01

Pos logit contributions
lights+0.117
lo+0.111
aro+0.091
vation+0.089
irl+0.088
Neg logit contributions
 indeed-0.088
 enough-0.084
 like-0.084
 Can-0.080
 please-0.078
Token journey
Token ID7002
Feature activation+3.63e+00

Pos logit contributions
lights+0.027
lo+0.027
aro+0.023
vation+0.022
irl+0.022
Neg logit contributions
 indeed-0.015
 enough-0.015
 like-0.014
 Can-0.014
 please-0.013
Token was
Token ID373
Feature activation-1.22e+00

Pos logit contributions
lights+0.097
lo+0.095
aro+0.079
vation+0.079
irl+0.077
Neg logit contributions
 indeed-0.061
 enough-0.061
 like-0.059
 Can-0.056
 please-0.054
Token hard
Token ID1327
Feature activation-1.51e+00

Pos logit contributions
 Can+0.016
 indeed+0.016
 enough+0.016
 Please+0.015
 like+0.015
Neg logit contributions
lights-0.035
lo-0.034
vation-0.028
aro-0.028
irl-0.028
Token the
Token ID262
Feature activation+2.19e+00

Pos logit contributions
lights+0.051
lo+0.051
vation+0.046
opl+0.044
aro+0.043
Neg logit contributions
 enough-0.027
 like-0.027
 indeed-0.026
 Can-0.023
 accidentally-0.022
Token horn
Token ID12718
Feature activation+2.89e+00

Pos logit contributions
lights+0.053
lo+0.051
vation+0.042
aro+0.042
irl+0.041
Neg logit contributions
 indeed-0.043
 like-0.043
 enough-0.043
 Can-0.040
 please-0.039
Token some
Token ID617
Feature activation+2.31e+00

Pos logit contributions
lights+0.093
lo+0.091
aro+0.079
vation+0.079
irl+0.077
Neg logit contributions
 indeed-0.033
 enough-0.032
 like-0.031
 Can-0.029
 please-0.027
Token more
Token ID517
Feature activation+1.84e+00

Pos logit contributions
lo+0.061
lights+0.061
vation+0.052
aro+0.051
opl+0.049
Neg logit contributions
 like-0.042
 enough-0.042
 indeed-0.041
 Can-0.035
 please-0.035
Token.
Token ID13
Feature activation+1.85e-01

Pos logit contributions
lo+0.056
lights+0.056
vation+0.053
aro+0.049
we+0.048
Neg logit contributions
 like-0.025
 indeed-0.025
 enough-0.024
 pad-0.018
 wanted-0.018
Token At
Token ID1629
Feature activation+6.17e+00

Pos logit contributions
lo+0.006
lights+0.006
vation+0.005
aro+0.005
opl+0.005
Neg logit contributions
 indeed-0.002
 enough-0.002
 like-0.002
 Can-0.002
 Please-0.002
Token the
Token ID262
Feature activation+2.71e+00

Pos logit contributions
lights+0.206
lo+0.203
vation+0.188
ups+0.186
we+0.183
Neg logit contributions
 enough-0.081
 like-0.074
 indeed-0.061
 a-0.059
,-0.057
Token end
Token ID886
Feature activation+2.86e-01

Pos logit contributions
lights+0.067
lo+0.064
aro+0.057
ites+0.052
we+0.050
Neg logit contributions
 like-0.059
 enough-0.056
 indeed-0.055
 three-0.051
 accidentally-0.050
Token of
Token ID286
Feature activation+5.03e+00

Pos logit contributions
lights+0.008
lo+0.008
aro+0.006
vation+0.006
irl+0.006
Neg logit contributions
 indeed-0.005
 enough-0.005
 like-0.004
 Can-0.004
 please-0.004
Token the
Token ID262
Feature activation+3.57e+00

Pos logit contributions
lo+0.160
lights+0.159
vation+0.146
opl+0.140
ups+0.138
Neg logit contributions
 enough-0.068
 like-0.064
 indeed-0.064
 accidentally-0.053
 three-0.053
Token day
Token ID1110
Feature activation+5.02e-01

Pos logit contributions
lights+0.108
lo+0.103
aro+0.094
cks+0.093
ups+0.092
Neg logit contributions
 like-0.048
 three-0.046
 puppy-0.045
 enough-0.045
,-0.045
Token something
Token ID1223
Feature activation-1.33e+00

Pos logit contributions
lights+0.086
lo+0.084
aro+0.072
vation+0.072
irl+0.070
Neg logit contributions
 indeed-0.043
 enough-0.043
 like-0.041
 Can-0.039
 please-0.037
Token so
Token ID523
Feature activation-5.33e-01

Pos logit contributions
 indeed+0.022
 enough+0.021
 Can+0.020
 like+0.020
 wanna+0.020
Neg logit contributions
lights-0.035
lo-0.034
aro-0.029
vation-0.028
irl-0.028
Token wonderful
Token ID7932
Feature activation-5.68e-01

Pos logit contributions
 indeed+0.009
 enough+0.008
 like+0.008
 Can+0.008
 please+0.007
Neg logit contributions
lights-0.015
lo-0.015
vation-0.012
aro-0.012
opl-0.012
Token.
Token ID13
Feature activation+4.96e-01

Pos logit contributions
 ok+0.009
 wanna+0.009
 Can+0.009
 please+0.009
 indeed+0.009
Neg logit contributions
lights-0.014
lo-0.014
we-0.011
aro-0.011
irl-0.011
Token
Token ID220
Feature activation+2.92e+00

Pos logit contributions
lights+0.014
lo+0.014
aro+0.012
vation+0.011
irl+0.011
Neg logit contributions
 indeed-0.008
 enough-0.008
 like-0.007
 Can-0.007
 please-0.007
Token
Token ID198
Feature activation+5.34e+00

Pos logit contributions
lights+0.090
lo+0.084
aro+0.074
vation+0.074
eds+0.071
Neg logit contributions
 like-0.039
 indeed-0.039
 enough-0.038
 Can-0.036
 ok-0.035
Token
Token ID198
Feature activation+4.20e+00

Pos logit contributions
lights+0.169
lo+0.153
aro+0.136
vation+0.134
opl+0.132
Neg logit contributions
 indeed-0.069
 like-0.069
 Can-0.064
 wanna-0.061
 please-0.060
TokenEveryone
Token ID16190
Feature activation+2.52e+00

Pos logit contributions
lights+0.100
lo+0.094
aro+0.078
vation+0.075
opl+0.075
Neg logit contributions
 indeed-0.087
 like-0.084
 enough-0.081
 Can-0.080
 wanna-0.078
Token in
Token ID287
Feature activation+2.60e+00

Pos logit contributions
lights+0.069
lo+0.068
aro+0.058
vation+0.057
irl+0.056
Neg logit contributions
 indeed-0.040
 enough-0.039
 like-0.038
 Can-0.036
 please-0.035
Token the
Token ID262
Feature activation+3.22e+00

Pos logit contributions
lights+0.076
lo+0.075
vation+0.065
aro+0.063
ups+0.062
Neg logit contributions
 enough-0.039
 indeed-0.039
 like-0.038
 Can-0.035
 three-0.033
Token garden
Token ID11376
Feature activation+1.66e+00

Pos logit contributions
lights+0.084
lo+0.082
aro+0.070
vation+0.065
opl+0.064
Neg logit contributions
 enough-0.061
 indeed-0.060
 like-0.059
 three-0.057
 Can-0.056
Token they
Token ID484
Feature activation+8.58e-01

Pos logit contributions
 wanna+0.056
 indeed+0.052
 pad+0.051
 gonna+0.049
 Can+0.047
Neg logit contributions
lights-0.060
lo-0.054
aro-0.047
vation-0.046
we-0.045
Token went
Token ID1816
Feature activation-1.27e-01

Pos logit contributions
lights+0.022
lo+0.021
aro+0.018
vation+0.018
irl+0.017
Neg logit contributions
 indeed-0.015
 enough-0.015
 like-0.014
 Can-0.014
 wanna-0.013
Token to
Token ID284
Feature activation+2.40e+00

Pos logit contributions
 indeed+0.002
 enough+0.002
 like+0.002
 Can+0.002
 please+0.001
Neg logit contributions
lights-0.004
lo-0.004
aro-0.003
vation-0.003
irl-0.003
Token sleep
Token ID3993
Feature activation+3.99e-01

Pos logit contributions
lights+0.055
lo+0.054
vation+0.044
aro+0.043
irl+0.043
Neg logit contributions
 indeed-0.049
 enough-0.047
 like-0.045
 Can-0.045
 wanna-0.042
Token.
Token ID13
Feature activation+3.63e-01

Pos logit contributions
lights+0.013
lo+0.013
aro+0.011
vation+0.011
opl+0.011
Neg logit contributions
 indeed-0.005
 enough-0.005
 like-0.005
 Can-0.004
 please-0.004
Token
Token ID198
Feature activation+5.27e+00

Pos logit contributions
lights+0.010
lo+0.010
vation+0.008
opl+0.008
aro+0.008
Neg logit contributions
 indeed-0.005
 enough-0.005
 like-0.005
 wanna-0.005
 please-0.004
Token
Token ID198
Feature activation+4.34e+00

Pos logit contributions
lights+0.161
lo+0.157
aro+0.136
vation+0.135
opl+0.131
Neg logit contributions
 like-0.070
 indeed-0.068
 accidentally-0.063
 enough-0.062
 please-0.062
TokenThe
Token ID464
Feature activation+1.23e+00

Pos logit contributions
lights+0.103
lo+0.101
vation+0.083
aro+0.082
irl+0.080
Neg logit contributions
 enough-0.085
 indeed-0.085
 like-0.082
 Can-0.079
 please-0.076
Token next
Token ID1306
Feature activation+1.02e+00

Pos logit contributions
lights+0.034
lo+0.033
aro+0.028
eds+0.028
ites+0.027
Neg logit contributions
 indeed-0.021
 like-0.021
 enough-0.021
 Can-0.019
 thief-0.019
Token day
Token ID1110
Feature activation-2.16e-01

Pos logit contributions
lights+0.021
lo+0.021
aro+0.017
vation+0.016
opl+0.015
Neg logit contributions
 enough-0.024
 indeed-0.024
 like-0.023
 Can-0.022
 please-0.022
Token,
Token ID11
Feature activation-2.67e+00

Pos logit contributions
 wanna+0.003
 indeed+0.003
 pad+0.003
 gonna+0.003
 like+0.003
Neg logit contributions
lights-0.006
lo-0.006
vation-0.005
ves-0.005
ups-0.005
Token
Token ID198
Feature activation+3.63e+00

Pos logit contributions
 indeed+0.021
 enough+0.021
 like+0.021
 Can+0.020
 please+0.019
Neg logit contributions
lights-0.044
lo-0.043
vation-0.037
aro-0.036
irl-0.036
Token
Token ID198
Feature activation+3.24e+00

Pos logit contributions
lights+0.130
lo+0.118
vation+0.117
asis+0.108
ites+0.107
Neg logit contributions
 like-0.032
 It-0.030
 She-0.030
 please-0.029
 indeed-0.029
TokenTom
Token ID13787
Feature activation+7.27e-01

Pos logit contributions
lights+0.099
lo+0.092
vation+0.081
irl+0.079
opl+0.079
Neg logit contributions
 indeed-0.046
 like-0.045
 accidentally-0.044
 please-0.044
 Can-0.044
Token ran
Token ID4966
Feature activation-9.57e-01

Pos logit contributions
lights+0.020
lo+0.019
aro+0.016
eds+0.016
ups+0.016
Neg logit contributions
 indeed-0.011
 enough-0.011
 Can-0.011
 like-0.011
 please-0.010
Token to
Token ID284
Feature activation+7.70e-01

Pos logit contributions
 indeed+0.015
 dear+0.014
 Can+0.014
 blonde+0.013
 ok+0.013
Neg logit contributions
lo-0.025
lights-0.025
ups-0.022
aro-0.021
 directions-0.021
Token help
Token ID1037
Feature activation+5.61e-01

Pos logit contributions
lights+0.026
lo+0.024
vation+0.022
ites+0.021
opl+0.021
Neg logit contributions
 like-0.009
 please-0.009
 enough-0.009
 indeed-0.008
 Can-0.008
Token him
Token ID683
Feature activation-1.96e+00

Pos logit contributions
lights+0.019
lo+0.018
vation+0.016
opl+0.016
aro+0.016
Neg logit contributions
 like-0.006
 enough-0.006
 indeed-0.006
 please-0.006
 Can-0.005
Token.
Token ID13
Feature activation-1.97e+00

Pos logit contributions
 enough+0.031
 like+0.031
 indeed+0.031
 please+0.028
 Can+0.027
Neg logit contributions
lights-0.055
lo-0.054
vation-0.045
aro-0.045
opl-0.044
Token He
Token ID679
Feature activation+1.08e+00

Pos logit contributions
 indeed+0.019
 enough+0.018
 like+0.017
 Can+0.015
 please+0.014
Neg logit contributions
lights-0.067
lo-0.066
aro-0.058
vation-0.057
irl-0.056
Token grabbed
Token ID13176
Feature activation+7.52e-01

Pos logit contributions
lights+0.030
lo+0.030
aro+0.026
vation+0.025
irl+0.025
Neg logit contributions
 indeed-0.016
 enough-0.015
 like-0.015
 Can-0.014
 please-0.014
Token a
Token ID257
Feature activation-3.19e-02

Pos logit contributions
lights+0.024
lo+0.024
vation+0.021
opl+0.020
aro+0.020
Neg logit contributions
 enough-0.009
 like-0.009
 indeed-0.008
 Can-0.007
 please-0.007
Token beautiful
Token ID4950
Feature activation-1.93e+00

Pos logit contributions
 enough+0.042
 indeed+0.040
 like+0.039
 wanna+0.037
 Can+0.037
Neg logit contributions
lights-0.052
lo-0.050
vation-0.043
irl-0.042
aro-0.041
Token,
Token ID11
Feature activation-2.51e+00

Pos logit contributions
 enough+0.039
 wanna+0.038
 indeed+0.037
 gonna+0.037
 ok+0.036
Neg logit contributions
lights-0.041
lo-0.037
irl-0.033
vation-0.033
asis-0.031
Token unique
Token ID3748
Feature activation-1.23e+00

Pos logit contributions
 enough+0.048
 indeed+0.044
 wanna+0.043
 please+0.042
 Can+0.041
Neg logit contributions
lights-0.059
lo-0.052
irl-0.045
vation-0.045
aro-0.043
Token necklace
Token ID38987
Feature activation-4.29e-01

Pos logit contributions
 enough+0.023
 wanna+0.022
 indeed+0.022
 gonna+0.022
 Can+0.022
Neg logit contributions
lights-0.027
lo-0.025
irl-0.022
vation-0.022
aro-0.021
Token made
Token ID925
Feature activation+1.04e+00

Pos logit contributions
 Can+0.006
 enough+0.006
 indeed+0.006
 wanna+0.006
 Please+0.006
Neg logit contributions
lights-0.012
lo-0.011
vation-0.010
aro-0.010
we-0.009
Token out
Token ID503
Feature activation+3.92e-01

Pos logit contributions
lo+0.032
lights+0.032
aro+0.028
vation+0.028
opl+0.027
Neg logit contributions
 indeed-0.013
 enough-0.013
 like-0.012
 please-0.011
 ok-0.010
Token of
Token ID286
Feature activation+2.91e+00

Pos logit contributions
lights+0.010
lo+0.010
aro+0.008
vation+0.008
opl+0.008
Neg logit contributions
 enough-0.007
 indeed-0.007
 Can-0.006
 accidentally-0.006
 like-0.006
Token gl
Token ID1278
Feature activation+1.06e+00

Pos logit contributions
lights+0.079
lo+0.076
aro+0.064
vation+0.062
irl+0.062
Neg logit contributions
 indeed-0.046
 enough-0.045
 like-0.044
 Can-0.044
 wanna-0.042
Tokenimmer
Token ID10957
Feature activation+1.28e+00

Pos logit contributions
lights+0.023
lo+0.022
aro+0.018
vation+0.018
opl+0.017
Neg logit contributions
 indeed-0.023
 enough-0.023
 like-0.023
 Can-0.021
 please-0.021
Tokening
Token ID278
Feature activation+1.38e+00

Pos logit contributions
lights+0.030
lo+0.027
ups+0.021
we+0.021
vation+0.021
Neg logit contributions
 accidentally-0.025
 Can-0.025
 wanna-0.025
 Please-0.025
 indeed-0.025
Token jewels
Token ID42701
Feature activation+1.88e+00

Pos logit contributions
lights+0.038
lo+0.036
aro+0.030
vation+0.029
irl+0.029
Neg logit contributions
 indeed-0.022
 Can-0.021
 enough-0.021
 wanna-0.021
 please-0.021
Token held
Token ID2714
Feature activation+2.34e+00

Pos logit contributions
lights+0.054
lo+0.053
vation+0.046
aro+0.045
opl+0.045
Neg logit contributions
 enough-0.024
 like-0.024
 indeed-0.023
 accidentally-0.021
 Can-0.020
Token the
Token ID262
Feature activation+1.81e+00

Pos logit contributions
lights+0.061
lo+0.060
vation+0.050
aro+0.050
opl+0.049
Neg logit contributions
 enough-0.041
 indeed-0.041
 like-0.040
 Can-0.037
 please-0.036
Token gem
Token ID16840
Feature activation+2.61e+00

Pos logit contributions
lo+0.040
lights+0.040
aro+0.032
vation+0.032
opl+0.030
Neg logit contributions
 indeed-0.037
 enough-0.036
 like-0.035
 Can-0.034
 accidentally-0.034
Token in
Token ID287
Feature activation+3.65e+00

Pos logit contributions
lo+0.080
lights+0.079
vation+0.069
aro+0.069
opl+0.069
Neg logit contributions
 enough-0.034
 indeed-0.034
 like-0.033
 please-0.027
 Can-0.026
Token their
Token ID511
Feature activation+2.10e+00

Pos logit contributions
lights+0.086
lo+0.083
aro+0.066
irl+0.066
vation+0.065
Neg logit contributions
 indeed-0.071
 enough-0.069
 Can-0.067
 like-0.065
 accidentally-0.065
Token hands
Token ID2832
Feature activation+3.05e+00

Pos logit contributions
lights+0.056
lo+0.056
vation+0.051
aro+0.049
opl+0.047
Neg logit contributions
 enough-0.037
 like-0.035
 three-0.034
 daughter-0.034
 indeed-0.033
Token.
Token ID13
Feature activation-5.27e-01

Pos logit contributions
lights+0.073
lo+0.070
aro+0.056
vation+0.056
irl+0.054
Neg logit contributions
 indeed-0.057
 Can-0.056
 enough-0.055
 wanna-0.054
 like-0.053
Token They
Token ID1119
Feature activation+1.94e+00

Pos logit contributions
 indeed+0.009
 enough+0.009
 wanna+0.008
 ok+0.008
 like+0.008
Neg logit contributions
lights-0.014
lo-0.013
irl-0.011
aro-0.010
vation-0.010
Token knocked
Token ID13642
Feature activation+2.91e+00

Pos logit contributions
lights+0.060
lo+0.058
vation+0.052
opl+0.051
aro+0.050
Neg logit contributions
 like-0.027
 enough-0.026
 indeed-0.025
 wanted-0.023
 accidentally-0.023
Token on
Token ID319
Feature activation+2.24e+00

Pos logit contributions
lights+0.080
lo+0.077
vation+0.068
aro+0.066
irl+0.065
Neg logit contributions
 enough-0.048
 indeed-0.047
 like-0.047
 Can-0.043
 please-0.042
Token the
Token ID262
Feature activation+2.69e+00

Pos logit contributions
lights+0.061
lo+0.060
vation+0.050
aro+0.050
irl+0.049
Neg logit contributions
 indeed-0.037
 enough-0.037
 like-0.036
 Can-0.033
 please-0.032
Token angry
Token ID7954
Feature activation-3.11e+00

Pos logit contributions
lights+0.017
lo+0.015
aro+0.013
irl+0.013
cks+0.013
Neg logit contributions
 indeed-0.009
 enough-0.008
 Can-0.008
 like-0.008
 wanna-0.008
Token.
Token ID13
Feature activation-2.62e+00

Pos logit contributions
 indeed+0.066
 enough+0.064
 wanna+0.064
 accidentally+0.064
 Can+0.063
Neg logit contributions
lights-0.066
lo-0.061
aro-0.049
we-0.047
ups-0.046
Token They
Token ID1119
Feature activation+8.61e-02

Pos logit contributions
 indeed+0.036
 enough+0.035
 like+0.034
 Can+0.033
 please+0.031
Neg logit contributions
lights-0.078
lo-0.077
vation-0.066
aro-0.066
irl-0.064
Token decide
Token ID5409
Feature activation-4.00e+00

Pos logit contributions
lo+0.002
lights+0.002
irl+0.002
aro+0.002
cks+0.002
Neg logit contributions
 indeed-0.002
 enough-0.001
 Can-0.001
 ok-0.001
 3-0.001
Token to
Token ID284
Feature activation+1.50e+00

Pos logit contributions
 indeed+0.040
 enough+0.037
 Can+0.035
 like+0.034
 wanna+0.033
Neg logit contributions
lights-0.131
lo-0.128
aro-0.114
ups-0.110
we-0.108
Token take
Token ID1011
Feature activation+2.71e+00

Pos logit contributions
lights+0.040
lo+0.038
aro+0.032
vation+0.032
opl+0.031
Neg logit contributions
 like-0.027
 indeed-0.026
 enough-0.026
 please-0.025
 accidentally-0.024
Token the
Token ID262
Feature activation+8.59e-01

Pos logit contributions
lights+0.087
lo+0.079
vation+0.078
opl+0.073
asis+0.072
Neg logit contributions
 enough-0.042
 like-0.036
 three-0.031
 indeed-0.031
 a-0.030
Token car
Token ID1097
Feature activation+2.03e+00

Pos logit contributions
lights+0.019
lo+0.018
vation+0.015
aro+0.015
irl+0.014
Neg logit contributions
 enough-0.019
 indeed-0.018
 like-0.018
 Can-0.017
 please-0.017
Token without
Token ID1231
Feature activation+1.59e+00

Pos logit contributions
lights+0.068
lo+0.066
irl+0.059
vation+0.059
aro+0.058
Neg logit contributions
 enough-0.023
 like-0.023
 indeed-0.021
 please-0.018
 thief-0.016
Token paying
Token ID5989
Feature activation+1.81e+00

Pos logit contributions
lights+0.051
lo+0.048
vation+0.043
opl+0.041
asis+0.040
Neg logit contributions
 enough-0.024
 indeed-0.021
 like-0.020
 accidentally-0.019
 please-0.018
Token.
Token ID13
Feature activation-4.73e-01

Pos logit contributions
lights+0.053
lo+0.050
vation+0.047
opl+0.046
irl+0.045
Neg logit contributions
 enough-0.031
 like-0.028
 indeed-0.026
 please-0.024
 ok-0.024
Token
Token ID198
Feature activation+3.36e+00

Pos logit contributions
lights+0.117
lo+0.113
vation+0.106
aro+0.102
asis+0.100
Neg logit contributions
 like-0.035
 Can-0.034
 indeed-0.034
 She-0.033
 please-0.033
TokenL
Token ID43
Feature activation+2.05e+00

Pos logit contributions
lights+0.087
lo+0.084
vation+0.070
aro+0.070
opl+0.068
Neg logit contributions
 indeed-0.061
 like-0.059
 enough-0.059
 Can-0.058
 please-0.056
Tokenily
Token ID813
Feature activation+3.16e-01

Pos logit contributions
lo+0.056
lights+0.056
vation+0.046
aro+0.046
opl+0.046
Neg logit contributions
 enough-0.031
 indeed-0.031
 Can-0.030
 like-0.029
 Please-0.028
Token nodded
Token ID14464
Feature activation-5.67e-01

Pos logit contributions
lights+0.008
lo+0.008
irl+0.007
vation+0.007
ups+0.006
Neg logit contributions
 indeed-0.005
 enough-0.005
 Can-0.005
 like-0.005
 wanna-0.005
Token.
Token ID13
Feature activation-2.04e+00

Pos logit contributions
 indeed+0.011
 enough+0.011
 wanna+0.011
 gonna+0.011
 Can+0.011
Neg logit contributions
lights-0.013
lo-0.011
we-0.009
irl-0.009
eds-0.009
Token She
Token ID1375
Feature activation+3.78e-01

Pos logit contributions
 indeed+0.019
 enough+0.018
 like+0.016
 wanna+0.015
 ok+0.014
Neg logit contributions
lights-0.069
lo-0.068
aro-0.057
opl-0.057
ites-0.057
Token smiled
Token ID13541
Feature activation-2.01e+00

Pos logit contributions
lights+0.010
lo+0.009
irl+0.008
ups+0.008
aro+0.008
Neg logit contributions
 enough-0.006
 indeed-0.006
 Can-0.006
 wanna-0.006
 like-0.006
Token and
Token ID290
Feature activation+2.43e-02

Pos logit contributions
 wanna+0.038
 accidentally+0.037
 gonna+0.036
 indeed+0.035
 Can+0.034
Neg logit contributions
lights-0.049
lo-0.039
ups-0.034
 directions-0.033
we-0.032
Token hugged
Token ID41130
Feature activation+3.72e-01

Pos logit contributions
lights+0.001
lo+0.001
aro+0.001
vation+0.001
opl+0.001
Neg logit contributions
 indeed-0.000
 enough-0.000
 like-0.000
 Can-0.000
 please-0.000
Token Tom
Token ID4186
Feature activation-9.62e-01

Pos logit contributions
lights+0.013
lo+0.013
vation+0.012
aro+0.011
irl+0.011
Neg logit contributions
 enough-0.004
 like-0.003
 indeed-0.003
 Can-0.003
 please-0.003
Token.
Token ID13
Feature activation-1.79e+00

Pos logit contributions
 indeed+0.019
 Can+0.018
 enough+0.018
 wanna+0.018
 like+0.018
Neg logit contributions
lights-0.022
lo-0.021
aro-0.017
vation-0.016
eds-0.016
Token for
Token ID329
Feature activation+2.01e+00

Pos logit contributions
 indeed+0.020
 wanna+0.020
 Please+0.019
 Can+0.019
 blonde+0.018
Neg logit contributions
lights-0.037
lo-0.035
ups-0.032
eds-0.029
 directions-0.029
Token another
Token ID1194
Feature activation+4.28e-01

Pos logit contributions
lights+0.052
lo+0.051
aro+0.043
vation+0.043
irl+0.042
Neg logit contributions
 indeed-0.034
 enough-0.033
 like-0.033
 Can-0.032
 wanna-0.031
Token one
Token ID530
Feature activation+7.09e-01

Pos logit contributions
lo+0.009
lights+0.009
vation+0.008
irl+0.008
aro+0.008
Neg logit contributions
 indeed-0.008
 enough-0.008
 Can-0.008
 like-0.007
 gonna-0.007
Token,
Token ID11
Feature activation-2.09e+00

Pos logit contributions
lights+0.018
lo+0.018
aro+0.015
vation+0.015
irl+0.015
Neg logit contributions
 indeed-0.011
 enough-0.011
 Can-0.010
 wanna-0.010
 like-0.010
Token but
Token ID475
Feature activation-7.13e-01

Pos logit contributions
 indeed+0.022
 enough+0.021
 thief+0.020
 wanna+0.019
 Robert+0.019
Neg logit contributions
lights-0.060
lo-0.057
aro-0.051
we-0.051
ites-0.050
Token her
Token ID607
Feature activation-1.55e+00

Pos logit contributions
 wanna+0.012
 Can+0.011
 indeed+0.010
 pad+0.010
 like+0.010
Neg logit contributions
lights-0.018
lo-0.016
aro-0.015
irl-0.014
we-0.014
Token mom
Token ID1995
Feature activation+1.62e+00

Pos logit contributions
 indeed+0.032
 enough+0.032
 like+0.031
 Can+0.030
 wanna+0.029
Neg logit contributions
lights-0.034
lo-0.033
vation-0.026
aro-0.026
irl-0.026
Token said
Token ID531
Feature activation-4.90e+00

Pos logit contributions
lights+0.046
lo+0.045
aro+0.038
vation+0.038
irl+0.038
Neg logit contributions
 indeed-0.023
 enough-0.023
 Can-0.022
 like-0.022
 wanna-0.020
Token she
Token ID673
Feature activation+6.76e-01

Pos logit contributions
 wanna+0.078
 pad+0.073
 detectives+0.072
 gonna+0.068
 enough+0.067
Neg logit contributions
lights-0.124
lo-0.116
aro-0.106
ves-0.105
we-0.102
Token had
Token ID550
Feature activation-9.74e-01

Pos logit contributions
lights+0.016
lo+0.016
we+0.013
ups+0.013
irl+0.013
Neg logit contributions
 enough-0.011
 Can-0.011
 like-0.011
 indeed-0.011
 wanna-0.011
Token to
Token ID284
Feature activation+2.08e+00

Pos logit contributions
 Can+0.012
 indeed+0.011
 like+0.011
 wanna+0.011
 detective+0.010
Neg logit contributions
lights-0.026
lo-0.026
ups-0.023
irl-0.023
vation-0.022
Token and
Token ID290
Feature activation+2.45e+00

Pos logit contributions
lights+0.024
lo+0.024
aro+0.020
vation+0.020
irl+0.020
Neg logit contributions
 indeed-0.009
 enough-0.009
 like-0.008
 Can-0.008
 please-0.007
Token singing
Token ID13777
Feature activation+1.81e+00

Pos logit contributions
lights+0.064
lo+0.063
vation+0.053
aro+0.052
opl+0.051
Neg logit contributions
 indeed-0.043
 like-0.043
 enough-0.042
 Can-0.039
 accidentally-0.038
Token and
Token ID290
Feature activation+1.77e+00

Pos logit contributions
lights+0.050
lo+0.049
aro+0.041
vation+0.040
irl+0.039
Neg logit contributions
 indeed-0.029
 enough-0.028
 like-0.027
 Can-0.026
 please-0.025
Token laughing
Token ID14376
Feature activation+4.84e-02

Pos logit contributions
lights+0.046
lo+0.045
vation+0.038
aro+0.037
irl+0.037
Neg logit contributions
 indeed-0.031
 like-0.031
 enough-0.031
 Can-0.029
 accidentally-0.027
Token.
Token ID13
Feature activation+1.14e+00

Pos logit contributions
lights+0.001
lo+0.001
aro+0.001
vation+0.001
irl+0.001
Neg logit contributions
 indeed-0.001
 enough-0.001
 Can-0.001
 please-0.001
 wanna-0.001
Token And
Token ID843
Feature activation+2.00e-01

Pos logit contributions
lights+0.034
lo+0.033
vation+0.029
aro+0.029
irl+0.028
Neg logit contributions
 indeed-0.015
 like-0.015
 enough-0.015
 Can-0.014
 Please-0.013
Token that
Token ID326
Feature activation+1.08e+00

Pos logit contributions
lights+0.005
lo+0.005
aro+0.004
vation+0.004
irl+0.004
Neg logit contributions
 indeed-0.003
 enough-0.003
 Can-0.003
 like-0.003
 wanna-0.003
Token's
Token ID338
Feature activation-2.53e-01

Pos logit contributions
lights+0.031
lo+0.031
aro+0.027
vation+0.026
opl+0.025
Neg logit contributions
 enough-0.016
 indeed-0.016
 like-0.016
 Can-0.014
 accidentally-0.014
Token how
Token ID703
Feature activation+1.50e+00

Pos logit contributions
 indeed+0.004
 enough+0.003
 Can+0.003
 like+0.003
 please+0.003
Neg logit contributions
lights-0.007
lo-0.007
aro-0.006
irl-0.006
vation-0.006
Token the
Token ID262
Feature activation+1.89e+00

Pos logit contributions
lights+0.041
lo+0.040
aro+0.033
vation+0.033
irl+0.032
Neg logit contributions
 indeed-0.025
 enough-0.024
 like-0.023
 Can-0.022
 please-0.021
Token dead
Token ID2636
Feature activation+2.85e+00

Pos logit contributions
lights+0.045
lo+0.044
aro+0.036
vation+0.036
irl+0.035
Neg logit contributions
 indeed-0.038
 enough-0.037
 like-0.036
 Can-0.035
 please-0.033
Token his
Token ID465
Feature activation+1.72e+00

Pos logit contributions
lights+0.077
lo+0.077
vation+0.070
we+0.069
ites+0.066
Neg logit contributions
 enough-0.033
 indeed-0.027
 like-0.026
 three-0.023
 a-0.022
Token way
Token ID835
Feature activation-1.76e+00

Pos logit contributions
lights+0.042
lo+0.041
aro+0.034
vation+0.033
ites+0.032
Neg logit contributions
 enough-0.036
 indeed-0.033
 three-0.032
 like-0.032
 daughter-0.031
Token home
Token ID1363
Feature activation+1.49e+00

Pos logit contributions
 indeed+0.031
 wanna+0.029
 dear+0.029
 Can+0.029
 blonde+0.028
Neg logit contributions
lo-0.043
lights-0.042
aro-0.038
ites-0.035
eds-0.034
Token,
Token ID11
Feature activation-3.07e+00

Pos logit contributions
lights+0.055
lo+0.054
aro+0.048
vation+0.047
irl+0.046
Neg logit contributions
 indeed-0.010
 enough-0.010
 like-0.009
 Can-0.008
 wanna-0.007
Token Bob
Token ID5811
Feature activation-3.48e-01

Pos logit contributions
 indeed+0.044
 pad+0.043
 wanna+0.042
 enough+0.041
 Can+0.039
Neg logit contributions
lo-0.085
lights-0.083
aro-0.075
irl-0.066
ves-0.065
Token met
Token ID1138
Feature activation-2.64e+00

Pos logit contributions
 indeed+0.006
 Can+0.005
 dear+0.005
 please+0.005
 wanna+0.005
Neg logit contributions
lo-0.009
lights-0.008
ups-0.008
eds-0.008
irl-0.007
Token a
Token ID257
Feature activation-3.91e+00

Pos logit contributions
 indeed+0.029
 wanna+0.028
 Can+0.027
 Please+0.026
 enough+0.025
Neg logit contributions
lo-0.082
lights-0.080
aro-0.070
irl-0.068
opl-0.067
Token new
Token ID649
Feature activation-2.29e+00

Pos logit contributions
 enough+0.075
 indeed+0.074
 wanna+0.067
 like+0.067
 gonna+0.065
Neg logit contributions
lo-0.085
lights-0.083
irl-0.072
aro-0.069
vation-0.068
Token friend
Token ID1545
Feature activation-3.94e+00

Pos logit contributions
 indeed+0.041
 enough+0.040
 wanna+0.039
 ok+0.039
 gonna+0.038
Neg logit contributions
lo-0.054
lights-0.053
irl-0.045
aro-0.045
asis-0.043
Token,
Token ID11
Feature activation-4.66e+00

Pos logit contributions
 indeed+0.083
 wanna+0.083
 Can+0.079
 dear+0.079
 enough+0.078
Neg logit contributions
lo-0.076
lights-0.073
aro-0.065
we-0.058
ites-0.058
Token a
Token ID257
Feature activation-4.15e+00

Pos logit contributions
 indeed+0.060
 enough+0.060
 wanna+0.056
 Can+0.054
 Please+0.052
Neg logit contributions
lo-0.117
lights-0.114
aro-0.105
we-0.100
 tracks-0.099
Token,
Token ID11
Feature activation-4.29e+00

Pos logit contributions
 wanna+0.063
 pad+0.060
 indeed+0.059
 seen+0.056
 enough+0.056
Neg logit contributions
lo-0.138
lights-0.133
aro-0.124
ves-0.113
we-0.112
Token "
Token ID366
Feature activation+1.38e+00

Pos logit contributions
 enough+0.053
 indeed+0.049
 like+0.045
 wanna+0.043
 detectives+0.040
Neg logit contributions
lights-0.128
lo-0.128
aro-0.112
we-0.108
vation-0.108
TokenNo
Token ID2949
Feature activation-1.18e+00

Pos logit contributions
lights+0.037
lo+0.036
aro+0.030
vation+0.030
irl+0.030
Neg logit contributions
 indeed-0.024
 like-0.023
 Can-0.023
 enough-0.022
 Please-0.021
Token,
Token ID11
Feature activation-3.77e+00

Pos logit contributions
 indeed+0.014
 like+0.013
 Can+0.013
 Please+0.012
 detectives+0.012
Neg logit contributions
lo-0.036
lights-0.035
aro-0.031
opl-0.030
vation-0.029
Token I
Token ID314
Feature activation+5.62e-01

Pos logit contributions
 indeed+0.050
 enough+0.049
 like+0.045
 accidentally+0.045
 wanna+0.043
Neg logit contributions
lights-0.107
lo-0.107
aro-0.094
vation-0.088
irl-0.086
Token'm
Token ID1101
Feature activation-1.92e+00

Pos logit contributions
lo+0.016
lights+0.015
irl+0.014
aro+0.013
vation+0.013
Neg logit contributions
 indeed-0.008
 enough-0.008
 Can-0.007
 Please-0.007
 Robert-0.006
Token sorry
Token ID7926
Feature activation-5.19e+00

Pos logit contributions
 indeed+0.024
 Can+0.023
 enough+0.023
 like+0.022
 pad+0.021
Neg logit contributions
lo-0.055
lights-0.054
aro-0.048
irl-0.047
opl-0.046
Token,
Token ID11
Feature activation-4.08e+00

Pos logit contributions
 pad+0.087
 accidentally+0.086
 blonde+0.085
 seen+0.083
 declined+0.083
Neg logit contributions
lights-0.126
lo-0.122
aro-0.098
we-0.095
ups-0.094
Token that
Token ID326
Feature activation-1.47e+00

Pos logit contributions
 enough+0.050
 indeed+0.048
 accidentally+0.045
 detectives+0.044
 like+0.043
Neg logit contributions
lights-0.119
lo-0.115
aro-0.103
vation-0.095
opl-0.093
Token's
Token ID338
Feature activation-1.49e+00

Pos logit contributions
 enough+0.021
 indeed+0.021
 like+0.021
 Can+0.020
 please+0.019
Neg logit contributions
lights-0.044
lo-0.043
vation-0.036
aro-0.036
irl-0.035
Token not
Token ID407
Feature activation+2.01e-01

Pos logit contributions
 indeed+0.019
 Can+0.018
 enough+0.018
 like+0.017
 wanna+0.017
Neg logit contributions
lights-0.044
lo-0.043
irl-0.037
aro-0.037
opl-0.036
Token van
Token ID5719
Feature activation+1.29e+00

Pos logit contributions
lights+0.045
lo+0.044
aro+0.036
vation+0.035
irl+0.034
Neg logit contributions
 like-0.052
 enough-0.052
 indeed-0.051
 Can-0.048
 please-0.047
Token,
Token ID11
Feature activation-2.93e+00

Pos logit contributions
lights+0.036
lo+0.035
aro+0.030
vation+0.030
irl+0.029
Neg logit contributions
 indeed-0.021
 enough-0.021
 like-0.020
 Can-0.018
 please-0.018
Token so
Token ID523
Feature activation-1.32e+00

Pos logit contributions
 indeed+0.034
 enough+0.034
 thief+0.031
 detective+0.030
 accidentally+0.029
Neg logit contributions
lights-0.084
lo-0.077
aro-0.068
irl-0.068
 tracks-0.066
Token Daddy
Token ID26926
Feature activation-5.09e-01

Pos logit contributions
 indeed+0.022
 like+0.021
 Can+0.020
 enough+0.020
 robber+0.020
Neg logit contributions
lights-0.036
lo-0.033
irl-0.028
aro-0.027
vation-0.027
Token helped
Token ID4193
Feature activation+4.24e-01

Pos logit contributions
 indeed+0.009
 enough+0.008
 Can+0.008
 like+0.008
 Please+0.008
Neg logit contributions
lights-0.013
lo-0.013
aro-0.011
ups-0.010
vation-0.010
Token her
Token ID607
Feature activation-7.51e-01

Pos logit contributions
lights+0.012
lo+0.012
vation+0.010
we+0.010
irl+0.010
Neg logit contributions
 indeed-0.006
 enough-0.006
 Can-0.005
 pad-0.005
 like-0.005
Token walk
Token ID2513
Feature activation+1.80e+00

Pos logit contributions
 indeed+0.013
 enough+0.013
 like+0.013
 Can+0.012
 please+0.012
Neg logit contributions
lights-0.020
lo-0.019
aro-0.016
vation-0.016
irl-0.015
Token over
Token ID625
Feature activation+2.95e+00

Pos logit contributions
lights+0.048
lo+0.047
vation+0.040
aro+0.039
opl+0.039
Neg logit contributions
 enough-0.031
 indeed-0.030
 like-0.030
 Can-0.027
 please-0.027
Token to
Token ID284
Feature activation+1.94e+00

Pos logit contributions
lights+0.081
lo+0.081
aro+0.067
vation+0.064
irl+0.064
Neg logit contributions
 indeed-0.047
 enough-0.045
 like-0.043
 Can-0.042
 ok-0.042
Token it
Token ID340
Feature activation+1.07e+00

Pos logit contributions
lights+0.059
lo+0.056
vation+0.048
aro+0.048
opl+0.048
Neg logit contributions
 like-0.029
 enough-0.028
 indeed-0.027
 please-0.026
 Can-0.025
Token.
Token ID13
Feature activation-6.94e-01

Pos logit contributions
lights+0.029
lo+0.029
aro+0.024
vation+0.024
irl+0.023
Neg logit contributions
 enough-0.018
 indeed-0.018
 like-0.018
 Can-0.016
 please-0.016
Token,
Token ID11
Feature activation-3.10e+00

Pos logit contributions
 indeed+0.040
 enough+0.037
 Can+0.036
 like+0.035
 wanna+0.034
Neg logit contributions
lights-0.152
lo-0.148
aro-0.131
vation-0.127
we-0.125
Token "
Token ID366
Feature activation+9.05e-01

Pos logit contributions
 enough+0.039
 indeed+0.037
 wanna+0.033
 like+0.033
 Can+0.032
Neg logit contributions
lights-0.091
lo-0.088
aro-0.079
vation-0.077
we-0.076
TokenI
Token ID40
Feature activation+8.40e-02

Pos logit contributions
lights+0.029
lo+0.027
eds+0.023
irl+0.022
aro+0.022
Neg logit contributions
 like-0.013
 Can-0.013
 Please-0.012
 indeed-0.012
 please-0.011
Token'm
Token ID1101
Feature activation-2.37e+00

Pos logit contributions
lo+0.002
lights+0.002
irl+0.002
vation+0.002
aro+0.002
Neg logit contributions
 indeed-0.001
 enough-0.001
 Can-0.001
 Robert-0.001
 Please-0.001
Token here
Token ID994
Feature activation-3.37e+00

Pos logit contributions
 indeed+0.035
 Can+0.034
 enough+0.033
 wanna+0.031
 pad+0.031
Neg logit contributions
lo-0.063
lights-0.061
aro-0.056
irl-0.054
cks-0.052
Token,
Token ID11
Feature activation-4.63e+00

Pos logit contributions
 enough+0.051
 accidentally+0.050
 thief+0.047
 blonde+0.046
 declined+0.046
Neg logit contributions
lo-0.081
lights-0.079
aro-0.067
we-0.067
vation-0.065
Token Ben
Token ID3932
Feature activation-3.88e+00

Pos logit contributions
 enough+0.063
 indeed+0.062
 wanna+0.058
 accidentally+0.055
 Can+0.054
Neg logit contributions
lights-0.126
lo-0.121
aro-0.107
irl-0.104
vation-0.103
Token!
Token ID0
Feature activation-3.00e+00

Pos logit contributions
 indeed+0.087
 wanna+0.084
 Can+0.083
 gonna+0.082
 enough+0.082
Neg logit contributions
lo-0.068
lights-0.066
eds-0.054
cks-0.052
opl-0.052
Token I
Token ID314
Feature activation+9.83e-01

Pos logit contributions
 enough+0.043
 indeed+0.040
 like+0.036
 accidentally+0.035
 ok+0.032
Neg logit contributions
lo-0.086
lights-0.085
aro-0.076
opl-0.074
vation-0.073
Token found
Token ID1043
Feature activation-2.84e-01

Pos logit contributions
lo+0.026
lights+0.025
irl+0.022
aro+0.022
vation+0.022
Neg logit contributions
 indeed-0.015
 enough-0.015
 Can-0.014
 Please-0.013
 ok-0.013
Token a
Token ID257
Feature activation-1.37e+00

Pos logit contributions
 indeed+0.004
 like+0.004
 wanna+0.004
 enough+0.004
 Can+0.004
Neg logit contributions
lo-0.008
lights-0.008
aro-0.007
opl-0.006
cks-0.006
Token.
Token ID13
Feature activation+2.44e-03

Pos logit contributions
lights+0.048
lo+0.047
aro+0.040
vation+0.039
ups+0.039
Neg logit contributions
 indeed-0.024
 Can-0.023
 enough-0.022
 like-0.022
 wanna-0.021
Token\n
Token ID198
Feature activation+4.40e+00

Pos logit contributions
lights+0.000
lo+0.000
aro+0.000
vation+0.000
opl+0.000
Neg logit contributions
 indeed-0.000
 enough-0.000
 like-0.000
 Can-0.000
 please-0.000
Token\n
Token ID198
Feature activation+3.68e+00

Pos logit contributions
lights+0.134
lo+0.122
vation+0.109
aro+0.105
irl+0.103
Neg logit contributions
 indeed-0.062
 like-0.059
 enough-0.058
 accidentally-0.056
 please-0.055
TokenThe
Token ID464
Feature activation+3.49e-01

Pos logit contributions
lights+0.092
lo+0.089
aro+0.073
vation+0.072
irl+0.070
Neg logit contributions
 indeed-0.070
 enough-0.068
 like-0.067
 Can-0.064
 please-0.063
Token waiter
Token ID46612
Feature activation+7.69e-02

Pos logit contributions
lights+0.009
lo+0.009
aro+0.007
vation+0.007
irl+0.007
Neg logit contributions
 indeed-0.006
 enough-0.006
 like-0.006
 Can-0.006
 please-0.006
Token suggested
Token ID5220
Feature activation-5.21e+00

Pos logit contributions
lights+0.002
lo+0.002
aro+0.002
vation+0.002
ups+0.002
Neg logit contributions
 Can-0.001
 indeed-0.001
 enough-0.001
 like-0.001
 wanna-0.001
Token he
Token ID339
Feature activation-3.36e-01

Pos logit contributions
 pad+0.101
 indeed+0.098
 Can+0.098
 wanna+0.096
 enough+0.092
Neg logit contributions
lo-0.112
lights-0.111
aro-0.095
ves-0.086
we-0.086
Token give
Token ID1577
Feature activation+1.38e+00

Pos logit contributions
 indeed+0.006
 Can+0.006
 enough+0.006
 pad+0.005
 like+0.005
Neg logit contributions
lo-0.008
lights-0.008
ups-0.006
aro-0.006
vation-0.006
Token a
Token ID257
Feature activation-1.21e+00

Pos logit contributions
lights+0.038
lo+0.038
aro+0.032
vation+0.032
irl+0.031
Neg logit contributions
 indeed-0.021
 enough-0.020
 like-0.020
 Can-0.019
 please-0.018
Token tip
Token ID8171
Feature activation-2.99e-01

Pos logit contributions
 indeed+0.024
 enough+0.023
 like+0.022
 Can+0.022
 Please+0.020
Neg logit contributions
lights-0.028
lo-0.028
vation-0.023
aro-0.022
irl-0.022
Token,
Token ID11
Feature activation-3.07e+00

Pos logit contributions
 indeed+0.004
 enough+0.003
 like+0.003
 Can+0.003
 accidentally+0.003
Neg logit contributions
lights-0.009
lo-0.009
aro-0.008
vation-0.008
irl-0.008
Token finished
Token ID5201
Feature activation-1.89e+00

Pos logit contributions
lights+0.007
lo+0.007
aro+0.006
vation+0.006
irl+0.006
Neg logit contributions
 indeed-0.004
 enough-0.004
 like-0.003
 Can-0.003
 please-0.003
Token,
Token ID11
Feature activation-2.89e+00

Pos logit contributions
 indeed+0.028
 enough+0.026
 Can+0.026
 like+0.026
 wanna+0.025
Neg logit contributions
lights-0.054
lo-0.052
irl-0.044
aro-0.043
vation-0.043
Token Ally
Token ID43039
Feature activation-2.17e+00

Pos logit contributions
 indeed+0.059
 enough+0.056
 Can+0.055
 like+0.054
 wanna+0.053
Neg logit contributions
lights-0.067
lo-0.064
aro-0.053
vation-0.051
irl-0.050
Token smiled
Token ID13541
Feature activation-3.54e+00

Pos logit contributions
 indeed+0.036
 Can+0.036
 like+0.034
 enough+0.034
 please+0.034
Neg logit contributions
lights-0.055
lo-0.054
vation-0.046
ups-0.046
aro-0.045
Token and
Token ID290
Feature activation-9.70e-01

Pos logit contributions
 wanna+0.059
 gonna+0.056
 Can+0.056
 robber+0.055
 accidentally+0.055
Neg logit contributions
lights-0.098
lo-0.074
we-0.065
vation-0.064
 directions-0.064
Token said
Token ID531
Feature activation-5.82e+00

Pos logit contributions
 indeed+0.015
 enough+0.015
 like+0.014
 Can+0.013
 please+0.013
Neg logit contributions
lights-0.028
lo-0.027
vation-0.023
aro-0.023
opl-0.022
Token,
Token ID11
Feature activation-4.45e+00

Pos logit contributions
 wanna+0.081
 pad+0.080
 indeed+0.073
 gonna+0.071
 seen+0.071
Neg logit contributions
lights-0.161
lo-0.159
aro-0.141
ves-0.136
irl-0.126
Token "
Token ID366
Feature activation+2.23e+00

Pos logit contributions
 enough+0.050
 indeed+0.047
 like+0.043
 wanna+0.040
 thief+0.039
Neg logit contributions
lights-0.140
lo-0.136
aro-0.121
vation-0.117
we-0.115
TokenWe
Token ID1135
Feature activation+4.44e-01

Pos logit contributions
lights+0.059
lo+0.057
aro+0.048
vation+0.046
irl+0.046
Neg logit contributions
 indeed-0.040
 like-0.038
 enough-0.038
 Can-0.037
 Please-0.035
Token did
Token ID750
Feature activation-3.60e+00

Pos logit contributions
lo+0.011
lights+0.010
irl+0.009
vation+0.009
aro+0.009
Neg logit contributions
 Can-0.007
 indeed-0.007
 enough-0.007
 Heidi-0.007
 Please-0.006
Token it
Token ID340
Feature activation+4.49e-01

Pos logit contributions
 pad+0.052
 Can+0.051
 indeed+0.050
 thief+0.049
 detective+0.047
Neg logit contributions
lights-0.096
lo-0.093
irl-0.086
vation-0.078
ves-0.078
Token before
Token ID878
Feature activation+1.81e-01

Pos logit contributions
lights+0.071
lo+0.070
vation+0.056
we+0.056
opl+0.056
Neg logit contributions
 please-0.062
 indeed-0.061
 Can-0.061
 Please-0.059
 wanna-0.059
Token!
Token ID0
Feature activation+6.79e-01

Pos logit contributions
lights+0.005
lo+0.004
aro+0.004
vation+0.004
irl+0.004
Neg logit contributions
 indeed-0.003
 enough-0.003
 like-0.003
 Can-0.003
 please-0.003
Token<|endoftext|>
Token ID50256
Feature activation+2.87e+00

Pos logit contributions
lights+0.019
lo+0.019
aro+0.016
vation+0.016
irl+0.015
Neg logit contributions
 indeed-0.011
 enough-0.010
 like-0.010
 Can-0.010
 please-0.009
TokenOne
Token ID3198
Feature activation+2.12e+00

Pos logit contributions
lights+0.081
lo+0.076
aro+0.066
vation+0.066
irl+0.064
Neg logit contributions
 indeed-0.046
 enough-0.045
 like-0.045
 Can-0.042
 accidentally-0.039
Token day
Token ID1110
Feature activation-4.83e+00

Pos logit contributions
lights+0.043
lo+0.042
aro+0.033
vation+0.033
opl+0.030
Neg logit contributions
 indeed-0.051
 enough-0.050
 like-0.050
 Can-0.047
 please-0.047
Token,
Token ID11
Feature activation-5.48e+00

Pos logit contributions
 indeed+0.075
 Can+0.069
 enough+0.064
 knight+0.064
 like+0.063
Neg logit contributions
lo-0.121
lights-0.118
irl-0.112
aro-0.107
ves-0.106
Token Tim
Token ID5045
Feature activation-2.02e+00

Pos logit contributions
 enough+0.099
 indeed+0.098
 detective+0.095
 servant+0.095
 hesitate+0.093
Neg logit contributions
lo-0.108
irl-0.098
lights-0.096
 middle-0.090
aro-0.089
Tokenmy
Token ID1820
Feature activation-4.04e+00

Pos logit contributions
 indeed+0.041
 please+0.039
 Can+0.039
 dear+0.039
 Please+0.039
Neg logit contributions
lo-0.041
lights-0.040
eds-0.038
opl-0.036
irl-0.035
Token went
Token ID1816
Feature activation-4.02e+00

Pos logit contributions
 indeed+0.080
 enough+0.075
 please+0.075
 ok+0.075
 Can+0.071
Neg logit contributions
lights-0.084
lo-0.083
eds-0.075
irl-0.074
opl-0.074
Token to
Token ID284
Feature activation-6.27e-02

Pos logit contributions
 indeed+0.050
 enough+0.045
 like+0.043
 Can+0.043
 ok+0.042
Neg logit contributions
lights-0.124
lo-0.122
aro-0.106
vation-0.104
irl-0.104
Token the
Token ID262
Feature activation-6.64e-01

Pos logit contributions
 indeed+0.001
 enough+0.001
 ok+0.001
 like+0.001
 Can+0.001
Neg logit contributions
lo-0.001
lights-0.001
irl-0.001
aro-0.001
vation-0.001
Token,
Token ID11
Feature activation-4.86e+00

Pos logit contributions
 indeed+0.066
 enough+0.065
 wanna+0.061
 daughter+0.061
 Can+0.061
Neg logit contributions
lo-0.142
lights-0.135
aro-0.120
irl-0.115
opl-0.111
Token there
Token ID612
Feature activation-2.46e-01

Pos logit contributions
 wanna+0.087
 indeed+0.086
 hesitate+0.079
 enough+0.079
 Can+0.078
Neg logit contributions
lights-0.114
lo-0.111
irl-0.097
cks-0.093
aro-0.089
Token was
Token ID373
Feature activation-6.45e+00

Pos logit contributions
 indeed+0.005
 enough+0.005
 Can+0.005
 like+0.004
 Please+0.004
Neg logit contributions
lo-0.006
lights-0.006
irl-0.005
aro-0.004
opl-0.004
Token a
Token ID257
Feature activation-5.89e+00

Pos logit contributions
 wanna+0.106
 declined+0.106
 Can+0.103
 Please+0.100
 Could+0.100
Neg logit contributions
lo-0.139
lights-0.138
irl-0.137
hes-0.123
 weekends-0.122
Token little
Token ID1310
Feature activation-5.09e+00

Pos logit contributions
 enough+0.086
 indeed+0.080
 like+0.070
 ok+0.069
 wanna+0.069
Neg logit contributions
lo-0.163
lights-0.162
irl-0.144
aro-0.136
vation-0.134
Token boy
Token ID2933
Feature activation-6.48e+00

Pos logit contributions
 indeed+0.079
 enough+0.077
 ok+0.073
 gonna+0.071
 wanna+0.070
Neg logit contributions
lo-0.137
lights-0.137
ites-0.116
aro-0.116
irl-0.109
Token named
Token ID3706
Feature activation-4.85e+00

Pos logit contributions
 like+0.066
 indeed+0.066
 enough+0.062
 Can+0.056
 accidentally+0.053
Neg logit contributions
lights-0.223
lo-0.216
vation-0.192
aro-0.186
irl-0.182
Token Jack
Token ID3619
Feature activation-5.13e+00

Pos logit contributions
 indeed+0.062
 enough+0.058
 like+0.052
 wanna+0.050
 Can+0.049
Neg logit contributions
lights-0.149
lo-0.149
aro-0.126
irl-0.122
eds-0.122
Token.
Token ID13
Feature activation-2.07e+00

Pos logit contributions
 indeed+0.097
 enough+0.095
 like+0.092
 Can+0.089
 please+0.086
Neg logit contributions
lights-0.126
lo-0.124
aro-0.101
vation-0.100
irl-0.098
Token Jack
Token ID3619
Feature activation-3.07e+00

Pos logit contributions
 enough+0.025
 indeed+0.023
 daughter+0.021
 dear+0.021
 ok+0.020
Neg logit contributions
lights-0.066
lo-0.062
opl-0.054
irl-0.053
ites-0.051
Token was
Token ID373
Feature activation-5.12e+00

Pos logit contributions
 enough+0.056
 indeed+0.053
 please+0.051
 Can+0.051
 wanna+0.050
Neg logit contributions
lo-0.073
lights-0.072
irl-0.063
opl-0.060
eds-0.059