TOP ACTIVATIONS
MAX = 0.075

thought about the night  ´ s adventure as she drifted
" Hello , welcome to the Star Hotel .
Sam said goodbye to their camps ite and started the drive
and says " Hello , welcome to the original bus .
and say " Hello , welcome to our show !" They
said , " Welcome , welcome , my friends ! I
dog named Spot was very exc ite . He was going
<|endoftext|> K ara was a depend able girl . She knew
smiled and said , " Nice to meet you , L
The two combatants stopped fighting and looked at the
Lily and Tom felt a little better . They thanked the
together . Spot was a depend able friend and Sam loved
beach . Sue was very exc ite to play in the
smiled and said : " Nice job , Elliot !" Elliot
smiled and said , " Nice job , Jack !" From
" You are welcome , welcome ! You are cute !"
, little ostr ich , welcome to the zoo ," the
. <|endoftext|> Ben was very exc ite to go to school
cat named Tom was very exc ite . He saw a
ily and Ben were a little sad . They wanted to

MIN ACTIVATIONS
MIN = -0.093

had to know all about microsc opes .
if she could talk about microsc opes . Her
. It said , " Pract ice balancing every day ,
wall . It said " DO NOT T OU CH ".
. He pulled the c ork and a puff of smoke
an old , rusty do ork nob and the do ork
so she could run fr ree ly . The
shiny and had a c ork on top .
red button that said " DO NOT PRESS ." She couldn
her to polish the do ork nob . Lily didn 't
to be restored to its former glory ."
pol ishing all the do ork n obs .
ork nob and the do ork nob sh ined ! She
present ?" B ree said , " Well ,
said , " Ant if ree ze . Poison . Do
she had polished the do ork nob earlier . She thought
. Jake had never had por ridge before and he was
was getting ready to make por ridge and pancakes . He
with tools . They have hamm ers , saw s ,
Sara said , " C rick et , is this your

INTERVAL 0.033 - 0.075
CONTAINS 1.703%

. She said they were good friends and gave them a
they were happy and they hugged Ben and Lily . They
to town . It was a little mouse . The mouse
museum and the people who loved it . <|endoftext|> Once upon
From then on everyone in the me adow was happy

INTERVAL -0.009 - 0.033
CONTAINS 72.027%

down . He was so happy that he was able to
they reached their mom . She hugged them and asked them
off she went . She had a great adventure
me repair the hole in my wall ?" asked Tom .
wild . Sally was embarrassed and apologized to her mom .

INTERVAL -0.051 - -0.009
CONTAINS 26.195%

blue ranch in the countryside . It was full of animals
that day on , Lily blew bubbles every day and became
dinner . Her mom said no , because it was too
swing . The bird thought it would be fun to swing
says , " I know you both like different shows .

INTERVAL -0.093 - -0.051
CONTAINS 0.076%

rot , carrot , you make me strong ." She picks
not . They only used simple words , but their words
my water . It will make your bottom feel better ."
that they were going to make something special . Inside the
store and buy some more cel ery !" Jane was so
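
NOTE ON INTERVALS
The four interval boundaries listed above are consistent with splitting the range from MIN = -0.093 to MAX = 0.075 into four equal-width bins (width 0.042). The sketch below reproduces those boundaries and shows one way a CONTAINS percentage could be computed. The function names, the bin count of 4, and the sample activations are illustrative assumptions, not taken from the tool that produced this dump.

```python
def interval_edges(lo, hi, n_bins=4):
    """Equal-width bin edges from lo to hi, rounded to 3 decimals.

    n_bins=4 is an assumption inferred from the four INTERVAL blocks.
    """
    width = (hi - lo) / n_bins
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


def percent_per_interval(activations, edges):
    """Percentage of activations in each bin, in ascending bin order.

    Bins are half-open [edge_i, edge_{i+1}), except the last bin,
    which also includes the upper edge so the maximum is counted.
    """
    counts = [0] * (len(edges) - 1)
    for a in activations:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= a < edges[i + 1] or (is_last and a == edges[-1]):
                counts[i] += 1
                break
    total = len(activations)
    return [100.0 * c / total for c in counts]


edges = interval_edges(-0.093, 0.075)

# Hypothetical sample of four activations, one per bin.
sample = [0.0749, 0.0087, -0.0132, -0.06]
shares = percent_per_interval(sample, edges)
```

The dump lists intervals from highest to lowest, while this sketch returns them lowest first; reverse the result to match the order shown above.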
Token thought
Token ID1807
Feature activation+8.71e-03

Pos logit contributions
roach+0.003
 aids+0.003
 phrase+0.003
 poses+0.003
eness+0.003
Neg logit contributions
ll-0.002
 together-0.002
 far-0.002
 here-0.002
We-0.002
Token about
Token ID546
Feature activation+7.21e-03

Pos logit contributions
 far+0.011
 together+0.011
ll+0.011
 here+0.009
 glad+0.008
Neg logit contributions
roach-0.020
eness-0.019
 aids-0.019
tern-0.019
 gloss-0.019
Token the
Token ID262
Feature activation-5.96e-03

Pos logit contributions
 together+0.009
ll+0.009
 far+0.009
 glad+0.008
 here+0.007
Neg logit contributions
roach-0.016
paste-0.016
 aids-0.016
est-0.016
eness-0.016
Token night
Token ID1755
Feature activation-1.32e-02

Pos logit contributions
 aids+0.010
roach+0.010
paste+0.010
 gloss+0.010
eness+0.010
Neg logit contributions
ll-0.011
 far-0.010
 together-0.010
 glad-0.009
 here-0.009
TokenÂ
Token ID5523
Feature activation+3.11e-02

Pos logit contributions
roach+0.030
eness+0.028
 phrase+0.028
 aids+0.028
paste+0.027
Neg logit contributions
 together-0.018
 far-0.018
ll-0.015
 they-0.015
 here-0.014
Token´
Token ID18265
Feature activation+7.49e-02

Pos logit contributions
ll+0.051
 together+0.048
 far+0.047
 here+0.041
 glad+0.041
Neg logit contributions
 aids-0.059
roach-0.058
paste-0.058
eness-0.057
 gloss-0.057
Tokens
Token ID82
Feature activation-4.45e-03

Pos logit contributions
ll+0.094
 far+0.086
 together+0.086
 our+0.079
 glad+0.077
Neg logit contributions
 gloss-0.154
paste-0.153
roach-0.150
est-0.149
 aids-0.147
Token adventure
Token ID8855
Feature activation+1.21e-02

Pos logit contributions
roach+0.009
paste+0.008
 gloss+0.008
eness+0.008
 aids+0.008
Neg logit contributions
 far-0.007
 together-0.007
ll-0.007
 glad-0.006
 here-0.006
Token as
Token ID355
Feature activation+1.75e-02

Pos logit contributions
 together+0.015
ll+0.015
 far+0.015
 here+0.011
 they+0.011
Neg logit contributions
 aids-0.028
roach-0.028
paste-0.027
 gloss-0.027
 phrase-0.027
Token she
Token ID673
Feature activation-9.09e-03

Pos logit contributions
ll+0.024
 together+0.023
 far+0.023
 glad+0.019
 here+0.018
Neg logit contributions
 aids-0.038
roach-0.037
eness-0.037
paste-0.036
est-0.036
Token drifted
Token ID38648
Feature activation-6.14e-03

Pos logit contributions
 aids+0.020
roach+0.020
eness+0.019
est+0.019
 sensation+0.019
Neg logit contributions
ll-0.014
 far-0.012
 together-0.012
We-0.010
 here-0.009
Token
Token ID198
Feature activation+7.08e-03

Pos logit contributions
ll+0.014
 together+0.010
 far+0.009
prise+0.009
 glad+0.008
Neg logit contributions
 aids-0.019
paste-0.019
 gloss-0.018
 phrase-0.018
roach-0.018
Token
Token ID198
Feature activation+1.43e-02

Pos logit contributions
ll+0.013
 ours+0.011
prise+0.011
 married+0.010
 Enjoy+0.010
Neg logit contributions
 aids-0.010
 phrase-0.009
 lines-0.009
paste-0.009
 motions-0.009
Token"
Token ID1
Feature activation+1.71e-02

Pos logit contributions
ll+0.028
 together+0.022
 Enjoy+0.021
 far+0.020
prise+0.020
Neg logit contributions
paste-0.021
 aids-0.020
 gloss-0.020
 bolts-0.019
 strands-0.019
TokenHello
Token ID15496
Feature activation+2.78e-02

Pos logit contributions
ll+0.028
 together+0.026
 far+0.025
 glad+0.022
 here+0.021
Neg logit contributions
 aids-0.032
roach-0.032
paste-0.032
est-0.032
 gloss-0.031
Token,
Token ID11
Feature activation+2.30e-02

Pos logit contributions
ll+0.050
 together+0.043
 far+0.043
 glad+0.039
 Enjoy+0.037
Neg logit contributions
est-0.049
 aids-0.047
paste-0.046
roach-0.046
eness-0.046
Token welcome
Token ID7062
Feature activation+6.73e-02

Pos logit contributions
ll+0.041
 together+0.038
 far+0.034
We+0.030
 glad+0.029
Neg logit contributions
est-0.040
 aids-0.039
roach-0.038
paste-0.038
 phrase-0.038
Token to
Token ID284
Feature activation+1.16e-02

Pos logit contributions
ll+0.091
 together+0.073
 far+0.070
 glad+0.066
arl+0.058
Neg logit contributions
est-0.153
 aids-0.143
paste-0.141
eness-0.141
 gloss-0.139
Token the
Token ID262
Feature activation+1.73e-02

Pos logit contributions
ll+0.021
 together+0.017
 far+0.015
We+0.014
arl+0.014
Neg logit contributions
 aids-0.021
 movements-0.019
roach-0.019
 sentences-0.018
est-0.018
Token Star
Token ID2907
Feature activation+1.43e-02

Pos logit contributions
ll+0.032
 together+0.029
 far+0.027
 glad+0.024
 here+0.024
Neg logit contributions
 aids-0.030
roach-0.030
 phrase-0.030
est-0.029
 poses-0.029
Token Hotel
Token ID12696
Feature activation+9.13e-03

Pos logit contributions
ll+0.024
 far+0.021
arl+0.020
 glad+0.020
 Enjoy+0.020
Neg logit contributions
est-0.025
paste-0.024
 aids-0.023
eness-0.023
 gloss-0.022
Token.
Token ID13
Feature activation+2.91e-02

Pos logit contributions
ll+0.014
 together+0.012
 far+0.012
 glad+0.011
arl+0.011
Neg logit contributions
 aids-0.017
est-0.017
paste-0.017
roach-0.017
 phrase-0.016
Token Sam
Token ID3409
Feature activation-3.73e-03

Pos logit contributions
ll+0.013
 together+0.013
 far+0.012
 here+0.010
 glad+0.010
Neg logit contributions
roach-0.021
 aids-0.021
paste-0.021
 gloss-0.021
eness-0.021
Token said
Token ID531
Feature activation-1.42e-02

Pos logit contributions
est+0.007
roach+0.007
 aids+0.007
paste+0.007
eness+0.007
Neg logit contributions
ll-0.006
 together-0.006
 far-0.005
 glad-0.005
arl-0.005
Token goodbye
Token ID24829
Feature activation+1.82e-02

Pos logit contributions
roach+0.026
 aids+0.025
eness+0.025
 gloss+0.024
paste+0.024
Neg logit contributions
ll-0.024
 together-0.024
 far-0.023
 glad-0.020
 here-0.020
Token to
Token ID284
Feature activation+1.41e-02

Pos logit contributions
ll+0.022
 far+0.019
 together+0.018
 glad+0.016
 here+0.015
Neg logit contributions
 aids-0.041
est-0.041
paste-0.041
roach-0.041
eness-0.041
Token their
Token ID511
Feature activation+1.70e-02

Pos logit contributions
ll+0.022
 together+0.017
 far+0.017
arl+0.015
We+0.014
Neg logit contributions
 aids-0.030
roach-0.028
 poses-0.028
est-0.028
 movements-0.028
Token camps
Token ID11665
Feature activation+6.60e-02

Pos logit contributions
ll+0.028
 together+0.025
 far+0.023
We+0.022
arl+0.021
Neg logit contributions
est-0.031
 aids-0.030
 movements-0.030
 phrase-0.030
 poses-0.029
Tokenite
Token ID578
Feature activation+1.08e-02

Pos logit contributions
ll+0.097
 together+0.095
 far+0.093
 glad+0.092
 our+0.087
Neg logit contributions
eness-0.115
roach-0.111
paste-0.108
est-0.108
 gloss-0.106
Token and
Token ID290
Feature activation+1.67e-02

Pos logit contributions
ll+0.016
 together+0.014
 far+0.014
 glad+0.012
 here+0.011
Neg logit contributions
 aids-0.022
roach-0.022
est-0.022
paste-0.022
eness-0.022
Token started
Token ID2067
Feature activation+1.77e-02

Pos logit contributions
ll+0.026
 far+0.023
 together+0.022
arl+0.021
 glad+0.020
Neg logit contributions
est-0.031
paste-0.031
 aids-0.031
roach-0.031
 gloss-0.030
Token the
Token ID262
Feature activation-7.50e-03

Pos logit contributions
ll+0.022
 together+0.018
 far+0.018
 glad+0.015
 here+0.014
Neg logit contributions
 aids-0.041
paste-0.040
roach-0.040
 gloss-0.039
est-0.039
Token drive
Token ID3708
Feature activation+2.90e-02

Pos logit contributions
est+0.015
roach+0.015
 phrase+0.014
 sensation+0.014
 poses+0.014
Neg logit contributions
ll-0.011
 together-0.010
 far-0.009
 glad-0.009
arl-0.008
Token and
Token ID290
Feature activation+1.12e-02

Pos logit contributions
ll+0.029
 together+0.028
 far+0.027
 glad+0.022
 here+0.021
Neg logit contributions
roach-0.052
 aids-0.052
paste-0.050
 gloss-0.050
eness-0.050
Token says
Token ID1139
Feature activation-9.86e-03

Pos logit contributions
 together+0.017
ll+0.017
 far+0.017
 glad+0.014
 here+0.014
Neg logit contributions
roach-0.023
 aids-0.022
eness-0.022
est-0.021
 sensation-0.021
Token "
Token ID366
Feature activation+2.83e-02

Pos logit contributions
roach+0.022
 aids+0.021
eness+0.020
 gloss+0.020
 sensation+0.020
Neg logit contributions
ll-0.014
 together-0.014
 far-0.013
 glad-0.011
 here-0.011
TokenHello
Token ID15496
Feature activation+2.21e-02

Pos logit contributions
ll+0.041
 together+0.037
 far+0.036
 glad+0.032
 here+0.031
Neg logit contributions
 aids-0.058
roach-0.058
est-0.058
paste-0.057
eness-0.057
Token,
Token ID11
Feature activation+1.84e-02

Pos logit contributions
ll+0.034
 together+0.031
 far+0.030
 glad+0.026
 here+0.025
Neg logit contributions
 aids-0.045
roach-0.044
paste-0.043
eness-0.043
est-0.043
Token welcome
Token ID7062
Feature activation+6.57e-02

Pos logit contributions
ll+0.029
 together+0.026
 far+0.025
 glad+0.021
We+0.021
Neg logit contributions
 aids-0.036
est-0.036
paste-0.035
 phrase-0.035
 poses-0.035
Token to
Token ID284
Feature activation+1.05e-02

Pos logit contributions
ll+0.086
 far+0.070
 together+0.069
 glad+0.063
arl+0.054
Neg logit contributions
est-0.147
 aids-0.146
paste-0.144
roach-0.142
 phrase-0.141
Token the
Token ID262
Feature activation+1.01e-02

Pos logit contributions
ll+0.018
 together+0.014
 far+0.013
We+0.012
arl+0.012
Neg logit contributions
 aids-0.020
 movements-0.019
 phrase-0.019
 poses-0.019
est-0.018
Token original
Token ID2656
Feature activation-2.20e-04

Pos logit contributions
ll+0.016
 together+0.015
 far+0.015
 glad+0.013
 here+0.012
Neg logit contributions
 aids-0.020
roach-0.020
paste-0.019
eness-0.019
 gloss-0.019
Token bus
Token ID1323
Feature activation+1.98e-02

Pos logit contributions
paste+0.000
 gloss+0.000
roach+0.000
 aids+0.000
eness+0.000
Neg logit contributions
 far-0.000
ll-0.000
 together-0.000
 here-0.000
 glad-0.000
Token.
Token ID13
Feature activation+2.99e-02

Pos logit contributions
ll+0.029
 together+0.025
 far+0.025
 glad+0.022
 here+0.020
Neg logit contributions
 aids-0.041
est-0.040
roach-0.040
paste-0.040
eness-0.039
Token and
Token ID290
Feature activation+2.41e-02

Pos logit contributions
ll+0.027
 Enjoy+0.022
 far+0.022
 together+0.021
 glad+0.021
Neg logit contributions
est-0.028
 aids-0.028
eness-0.027
 phrase-0.027
 poses-0.027
Token say
Token ID910
Feature activation-1.31e-02

Pos logit contributions
ll+0.032
 together+0.026
 far+0.025
 glad+0.022
arl+0.020
Neg logit contributions
 aids-0.055
paste-0.053
roach-0.053
 poses-0.052
 gloss-0.052
Token "
Token ID366
Feature activation+2.09e-02

Pos logit contributions
roach+0.028
eness+0.027
 sensation+0.027
 aids+0.027
est+0.027
Neg logit contributions
 together-0.018
ll-0.018
 far-0.017
 glad-0.015
 here-0.015
TokenHello
Token ID15496
Feature activation+2.49e-02

Pos logit contributions
ll+0.031
 far+0.027
 together+0.027
 glad+0.025
 here+0.023
Neg logit contributions
est-0.042
 aids-0.042
paste-0.042
roach-0.042
eness-0.042
Token,
Token ID11
Feature activation+1.98e-02

Pos logit contributions
ll+0.039
 together+0.035
 far+0.035
 glad+0.030
 here+0.029
Neg logit contributions
 aids-0.049
roach-0.049
paste-0.048
est-0.048
eness-0.047
Token welcome
Token ID7062
Feature activation+6.54e-02

Pos logit contributions
ll+0.034
 together+0.030
 far+0.028
We+0.025
 glad+0.025
Neg logit contributions
 aids-0.035
est-0.034
paste-0.033
 phrase-0.033
 poses-0.033
Token to
Token ID284
Feature activation+1.28e-02

Pos logit contributions
ll+0.084
 far+0.066
 together+0.065
 glad+0.060
arl+0.052
Neg logit contributions
est-0.146
 aids-0.146
paste-0.142
 phrase-0.141
 poses-0.140
Token our
Token ID674
Feature activation+2.72e-02

Pos logit contributions
ll+0.023
 together+0.017
 far+0.016
arl+0.015
We+0.014
Neg logit contributions
 aids-0.025
 movements-0.023
 poses-0.023
 sentences-0.022
 phrase-0.022
Token show
Token ID905
Feature activation+1.87e-03

Pos logit contributions
ll+0.045
 together+0.038
 far+0.036
 glad+0.034
We+0.033
Neg logit contributions
 aids-0.051
 poses-0.050
est-0.050
 movements-0.050
 phrase-0.050
Token!"
Token ID2474
Feature activation+5.21e-04

Pos logit contributions
ll+0.003
 together+0.002
 far+0.002
 here+0.002
 glad+0.002
Neg logit contributions
 aids-0.004
roach-0.004
paste-0.004
 gloss-0.004
eness-0.004
Token They
Token ID1119
Feature activation+1.30e-02

Pos logit contributions
ll+0.001
 together+0.000
 far+0.000
 glad+0.000
 here+0.000
Neg logit contributions
 aids-0.001
roach-0.001
paste-0.001
 phrase-0.001
 gloss-0.001
Token said
Token ID531
Feature activation-7.93e-03

Pos logit contributions
ll+0.008
 together+0.007
 far+0.007
 glad+0.006
 here+0.006
Neg logit contributions
 aids-0.012
roach-0.012
paste-0.011
est-0.011
eness-0.011
Token,
Token ID11
Feature activation-2.83e-02

Pos logit contributions
roach+0.017
 aids+0.017
eness+0.017
est+0.016
 phrase+0.016
Neg logit contributions
ll-0.011
 together-0.011
 far-0.010
 glad-0.009
 here-0.009
Token "
Token ID366
Feature activation+2.57e-02

Pos logit contributions
roach+0.053
 aids+0.053
est+0.053
eness+0.052
paste+0.052
Neg logit contributions
ll-0.047
 together-0.044
 far-0.043
 glad-0.038
 here-0.036
TokenWelcome
Token ID14618
Feature activation+3.03e-02

Pos logit contributions
ll+0.043
 together+0.040
 far+0.039
 glad+0.034
 here+0.033
Neg logit contributions
 aids-0.047
roach-0.047
est-0.046
paste-0.046
eness-0.046
Token,
Token ID11
Feature activation+2.00e-02

Pos logit contributions
ll+0.045
 together+0.040
 far+0.040
 glad+0.034
 here+0.033
Neg logit contributions
 aids-0.063
roach-0.062
paste-0.061
est-0.061
eness-0.061
Token welcome
Token ID7062
Feature activation+6.46e-02

Pos logit contributions
ll+0.031
 together+0.029
 far+0.027
 glad+0.023
We+0.023
Neg logit contributions
 aids-0.039
est-0.039
roach-0.038
paste-0.038
eness-0.038
Token,
Token ID11
Feature activation+2.21e-02

Pos logit contributions
ll+0.081
 together+0.069
 far+0.066
 glad+0.058
We+0.050
Neg logit contributions
est-0.147
 aids-0.143
paste-0.142
roach-0.141
 gloss-0.141
Token my
Token ID616
Feature activation+1.19e-02

Pos logit contributions
ll+0.035
 together+0.032
 far+0.031
 glad+0.026
We+0.026
Neg logit contributions
 aids-0.042
est-0.042
roach-0.041
paste-0.041
eness-0.041
Token friends
Token ID2460
Feature activation+3.19e-02

Pos logit contributions
ll+0.018
 together+0.016
 far+0.016
 glad+0.013
 here+0.013
Neg logit contributions
 aids-0.025
roach-0.024
paste-0.024
eness-0.024
 gloss-0.024
Token!
Token ID0
Feature activation+2.74e-02

Pos logit contributions
ll+0.061
 together+0.056
 far+0.055
 glad+0.051
arl+0.049
Neg logit contributions
est-0.047
eness-0.045
 aids-0.045
roach-0.045
 phrase-0.044
Token I
Token ID314
Feature activation+2.48e-02

Pos logit contributions
ll+0.051
 together+0.046
 far+0.046
 glad+0.038
 here+0.036
Neg logit contributions
 aids-0.048
roach-0.046
est-0.045
paste-0.045
 gloss-0.045
Token dog
Token ID3290
Feature activation+1.20e-02

Pos logit contributions
ll+0.055
 together+0.045
 far+0.041
 glad+0.039
 here+0.037
Neg logit contributions
est-0.053
roach-0.052
 gloss-0.049
paste-0.048
 sensation-0.048
Token named
Token ID3706
Feature activation+1.08e-02

Pos logit contributions
ll+0.016
 together+0.012
arl+0.011
 far+0.011
 ours+0.010
Neg logit contributions
est-0.027
 aids-0.027
roach-0.026
 gloss-0.026
 streaks-0.025
Token Spot
Token ID15899
Feature activation+1.96e-02

Pos logit contributions
ll+0.019
 together+0.016
 far+0.015
 glad+0.014
We+0.013
Neg logit contributions
 aids-0.020
roach-0.019
est-0.019
 gloss-0.019
paste-0.019
Token was
Token ID373
Feature activation+1.15e-02

Pos logit contributions
ll+0.040
arl+0.035
 together+0.034
 ours+0.032
 glad+0.031
Neg logit contributions
est-0.030
roach-0.026
 streaks-0.025
nob-0.025
 aids-0.024
Token very
Token ID845
Feature activation+9.58e-03

Pos logit contributions
ll+0.022
prise+0.016
arl+0.016
use+0.016
 together+0.016
Neg logit contributions
ze-0.018
 aids-0.017
 gloss-0.017
 ankle-0.017
 sensation-0.017
Token exc
Token ID2859
Feature activation+6.41e-02

Pos logit contributions
ll+0.017
 together+0.016
We+0.014
prise+0.014
 Enjoy+0.013
Neg logit contributions
 gloss-0.013
 scalp-0.013
 ankle-0.012
 movements-0.012
est-0.012
Tokenite
Token ID578
Feature activation-3.43e-03

Pos logit contributions
 together+0.096
ll+0.096
 glad+0.092
 far+0.090
 here+0.086
Neg logit contributions
 gloss-0.109
oga-0.107
roach-0.101
ol-0.101
eness-0.101
Token.
Token ID13
Feature activation-9.12e-03

Pos logit contributions
roach+0.008
paste+0.008
 aids+0.008
eness+0.008
 phrase+0.008
Neg logit contributions
ll-0.004
 together-0.004
 far-0.004
 here-0.003
 glad-0.002
Token He
Token ID679
Feature activation+1.85e-04

Pos logit contributions
 aids+0.021
roach+0.021
paste+0.021
eness+0.021
 gloss+0.020
Neg logit contributions
ll-0.011
 together-0.010
 far-0.010
 glad-0.008
 here-0.008
Token was
Token ID373
Feature activation-1.19e-04

Pos logit contributions
ll+0.000
 together+0.000
 far+0.000
 glad+0.000
 here+0.000
Neg logit contributions
 aids-0.000
roach-0.000
paste-0.000
 gloss-0.000
est-0.000
Token going
Token ID1016
Feature activation+1.90e-02

Pos logit contributions
 aids+0.000
 gloss+0.000
paste+0.000
roach+0.000
est+0.000
Neg logit contributions
ll-0.000
 together-0.000
 far-0.000
We-0.000
arl-0.000
Token<|endoftext|>
Token ID50256
Feature activation+6.26e-03

Pos logit contributions
ll+0.006
 far+0.005
 together+0.005
 here+0.004
prise+0.004
Neg logit contributions
 aids-0.010
roach-0.010
est-0.009
eness-0.009
 gloss-0.009
TokenK
Token ID42
Feature activation+8.66e-03

Pos logit contributions
ll+0.008
 far+0.007
 together+0.007
 glad+0.006
 here+0.005
Neg logit contributions
roach-0.014
 aids-0.014
est-0.014
paste-0.013
 phrase-0.013
Tokenara
Token ID3301
Feature activation-5.26e-03

Pos logit contributions
 together+0.014
 far+0.013
ll+0.013
We+0.012
 here+0.012
Neg logit contributions
est-0.015
roach-0.015
eness-0.014
oline-0.014
ol-0.014
Token was
Token ID373
Feature activation+1.32e-03

Pos logit contributions
 aids+0.011
roach+0.011
est+0.010
paste+0.010
eness+0.010
Neg logit contributions
ll-0.008
 together-0.007
 far-0.007
 glad-0.006
 here-0.006
Token a
Token ID257
Feature activation+3.51e-02

Pos logit contributions
ll+0.002
 together+0.002
 far+0.002
arl+0.002
 Enjoy+0.001
Neg logit contributions
 aids-0.003
 sensation-0.002
 gloss-0.002
est-0.002
 brist-0.002
Token depend
Token ID4745
Feature activation+6.32e-02

Pos logit contributions
ll+0.059
 together+0.052
 far+0.045
arl+0.042
 here+0.041
Neg logit contributions
roach-0.064
est-0.064
 aids-0.064
 gloss-0.063
 phrase-0.063
Tokenable
Token ID540
Feature activation+2.64e-02

Pos logit contributions
ll+0.099
 together+0.083
 far+0.077
 our+0.071
 glad+0.071
Neg logit contributions
est-0.130
roach-0.127
 sensation-0.120
 aids-0.119
 gloss-0.119
Token girl
Token ID2576
Feature activation+2.28e-02

Pos logit contributions
ll+0.040
 together+0.036
 far+0.035
 glad+0.030
 here+0.029
Neg logit contributions
 aids-0.054
roach-0.053
est-0.052
paste-0.052
 gloss-0.052
Token.
Token ID13
Feature activation-4.53e-03

Pos logit contributions
ll+0.029
 together+0.026
 far+0.026
 glad+0.021
 here+0.021
Neg logit contributions
 aids-0.051
roach-0.051
paste-0.050
est-0.050
eness-0.050
Token She
Token ID1375
Feature activation-8.70e-03

Pos logit contributions
paste+0.011
roach+0.011
 aids+0.011
eness+0.010
 gloss+0.010
Neg logit contributions
ll-0.005
 far-0.005
 together-0.005
 glad-0.004
 here-0.004
Token knew
Token ID2993
Feature activation+1.19e-02

Pos logit contributions
 aids+0.021
eness+0.021
roach+0.021
paste+0.021
 phrase+0.020
Neg logit contributions
ll-0.010
 far-0.009
 together-0.009
 glad-0.007
 here-0.007
Token smiled
Token ID13541
Feature activation+3.15e-02

Pos logit contributions
roach+0.049
 aids+0.047
eness+0.046
paste+0.045
 sensation+0.045
Neg logit contributions
ll-0.032
 together-0.029
 far-0.029
 who-0.024
 close-0.022
Token and
Token ID290
Feature activation+8.90e-03

Pos logit contributions
 together+0.033
 far+0.031
ll+0.030
 glad+0.024
 here+0.024
Neg logit contributions
roach-0.084
 aids-0.081
paste-0.078
 gloss-0.078
 phrase-0.077
Token said
Token ID531
Feature activation-1.09e-02

Pos logit contributions
ll+0.012
 together+0.011
 far+0.011
 glad+0.009
 here+0.008
Neg logit contributions
 aids-0.020
roach-0.020
eness-0.020
est-0.019
 phrase-0.019
Token,
Token ID11
Feature activation-2.28e-02

Pos logit contributions
roach+0.024
 aids+0.024
eness+0.023
 phrase+0.023
paste+0.023
Neg logit contributions
ll-0.015
 together-0.014
 far-0.014
 glad-0.011
 here-0.011
Token "
Token ID366
Feature activation+2.91e-02

Pos logit contributions
 aids+0.043
roach+0.043
eness+0.043
est+0.042
paste+0.042
Neg logit contributions
ll-0.037
 together-0.035
 far-0.034
 glad-0.030
 here-0.029
TokenNice
Token ID35284
Feature activation+6.30e-02

Pos logit contributions
ll+0.036
 together+0.032
 far+0.031
 glad+0.026
 here+0.025
Neg logit contributions
 aids-0.068
roach-0.067
paste-0.066
eness-0.066
est-0.065
Token to
Token ID284
Feature activation+1.32e-03

Pos logit contributions
ll+0.081
 together+0.071
 far+0.068
 glad+0.058
We+0.056
Neg logit contributions
est-0.141
 aids-0.141
roach-0.139
paste-0.139
 gloss-0.137
Token meet
Token ID1826
Feature activation+2.23e-02

Pos logit contributions
ll+0.002
 together+0.002
 far+0.001
 glad+0.001
We+0.001
Neg logit contributions
 aids-0.003
roach-0.003
est-0.003
eness-0.003
 gloss-0.003
Token you
Token ID345
Feature activation+9.48e-03

Pos logit contributions
ll+0.018
 together+0.015
 far+0.014
 glad+0.010
 here+0.009
Neg logit contributions
 aids-0.061
roach-0.061
paste-0.060
est-0.060
eness-0.060
Token,
Token ID11
Feature activation+1.62e-02

Pos logit contributions
ll+0.016
 far+0.014
 together+0.014
 glad+0.012
We+0.012
Neg logit contributions
 aids-0.018
est-0.018
eness-0.017
roach-0.017
paste-0.017
Token L
Token ID406
Feature activation+1.92e-02

Pos logit contributions
ll+0.026
 together+0.024
 far+0.023
 glad+0.020
 here+0.020
Neg logit contributions
 aids-0.032
roach-0.032
paste-0.031
eness-0.031
est-0.031
Token
Token ID198
Feature activation-3.18e-03

Pos logit contributions
ll+0.019
 far+0.018
 together+0.016
 glad+0.014
prise+0.014
Neg logit contributions
 aids-0.042
paste-0.040
roach-0.040
 gloss-0.040
eness-0.040
Token
Token ID198
Feature activation-7.39e-04

Pos logit contributions
 aids+0.007
 phrase+0.007
paste+0.007
 gloss+0.007
roach+0.007
Neg logit contributions
ll-0.004
 far-0.003
 together-0.003
 ours-0.003
 glad-0.003
TokenThe
Token ID464
Feature activation+1.51e-05

Pos logit contributions
 aids+0.001
roach+0.001
paste+0.001
 gloss+0.001
 phrase+0.001
Neg logit contributions
ll-0.001
 together-0.001
 far-0.001
 glad-0.001
 here-0.001
Token two
Token ID734
Feature activation+5.30e-03

Pos logit contributions
ll+0.000
 far+0.000
 together+0.000
 glad+0.000
 here+0.000
Neg logit contributions
 aids-0.000
eness-0.000
roach-0.000
paste-0.000
 gloss-0.000
Token combatants
Token ID40711
Feature activation-1.93e-03

Pos logit contributions
ll+0.006
 together+0.006
 far+0.005
 glad+0.004
 here+0.004
Neg logit contributions
 aids-0.013
roach-0.012
paste-0.012
eness-0.012
est-0.012
Token stopped
Token ID5025
Feature activation+6.29e-02

Pos logit contributions
 aids+0.004
roach+0.004
 gloss+0.004
paste+0.004
 poses+0.004
Neg logit contributions
ll-0.003
 together-0.003
 far-0.003
 here-0.002
 glad-0.002
Token fighting
Token ID4330
Feature activation+4.35e-03

Pos logit contributions
ll+0.097
 Enjoy+0.080
 far+0.079
arl+0.077
 together+0.076
Neg logit contributions
 movements-0.120
 sensation-0.114
eness-0.113
 markings-0.112
 phrase-0.112
Token and
Token ID290
Feature activation+2.05e-02

Pos logit contributions
 together+0.005
ll+0.005
 far+0.005
 here+0.004
 glad+0.004
Neg logit contributions
roach-0.010
paste-0.010
 aids-0.010
 gloss-0.010
est-0.010
Token looked
Token ID3114
Feature activation+2.46e-02

Pos logit contributions
ll+0.030
 together+0.028
 far+0.027
 glad+0.023
 here+0.022
Neg logit contributions
 aids-0.043
roach-0.043
paste-0.042
est-0.042
 gloss-0.042
Token at
Token ID379
Feature activation+7.65e-03

Pos logit contributions
ll+0.033
 together+0.029
 far+0.028
 glad+0.023
 here+0.023
Neg logit contributions
 aids-0.054
roach-0.053
est-0.053
paste-0.053
eness-0.052
Token the
Token ID262
Feature activation-4.82e-03

Pos logit contributions
ll+0.010
 together+0.008
 far+0.008
 glad+0.007
 here+0.007
Neg logit contributions
 aids-0.017
roach-0.017
eness-0.017
est-0.017
paste-0.017
Token Lily
Token ID20037
Feature activation-4.92e-03

Pos logit contributions
ll+0.012
 far+0.009
 together+0.009
prise+0.008
 here+0.008
Neg logit contributions
 aids-0.018
 gloss-0.017
roach-0.016
paste-0.016
der-0.016
Token and
Token ID290
Feature activation+1.36e-02

Pos logit contributions
roach+0.012
 aids+0.012
eness+0.012
paste+0.011
 sensation+0.011
Neg logit contributions
ll-0.006
 together-0.005
 far-0.005
 glad-0.004
 here-0.004
Token Tom
Token ID4186
Feature activation+1.18e-02

Pos logit contributions
 together+0.011
ll+0.011
 far+0.011
 their+0.008
 glad+0.008
Neg logit contributions
roach-0.037
 phrase-0.037
eness-0.037
est-0.037
 aids-0.036
Token felt
Token ID2936
Feature activation+7.68e-03

Pos logit contributions
ll+0.019
 together+0.017
 far+0.017
 glad+0.014
 here+0.014
Neg logit contributions
 aids-0.023
roach-0.023
paste-0.023
eness-0.022
 gloss-0.022
Token a
Token ID257
Feature activation+4.01e-02

Pos logit contributions
 far+0.006
 together+0.006
ll+0.006
 glad+0.005
 here+0.004
Neg logit contributions
roach-0.021
 aids-0.020
eness-0.020
 gloss-0.020
paste-0.020
Token little
Token ID1310
Feature activation+6.28e-02

Pos logit contributions
ll+0.066
 together+0.058
 far+0.053
 here+0.048
 Enjoy+0.047
Neg logit contributions
est-0.077
 aids-0.076
 phrase-0.076
 sensation-0.075
roach-0.075
Token better
Token ID1365
Feature activation+4.24e-02

Pos logit contributions
ll+0.092
 together+0.085
 far+0.084
 glad+0.071
 here+0.069
Neg logit contributions
 aids-0.130
roach-0.129
paste-0.126
eness-0.125
 gloss-0.125
Token.
Token ID13
Feature activation+3.23e-03

Pos logit contributions
ll+0.071
 together+0.062
 far+0.060
 Enjoy+0.053
 glad+0.053
Neg logit contributions
 aids-0.079
est-0.078
paste-0.078
 phrase-0.078
eness-0.077
Token They
Token ID1119
Feature activation+1.61e-02

Pos logit contributions
ll+0.004
 far+0.003
 together+0.003
prise+0.003
 here+0.003
Neg logit contributions
 aids-0.007
 gloss-0.007
 poses-0.007
paste-0.007
roach-0.007
Token thanked
Token ID26280
Feature activation+3.46e-02

Pos logit contributions
ll+0.025
 together+0.023
 far+0.022
 here+0.019
 glad+0.019
Neg logit contributions
 aids-0.031
roach-0.031
paste-0.031
 gloss-0.030
eness-0.030
Token the
Token ID262
Feature activation+1.79e-02

Pos logit contributions
ll+0.055
 together+0.047
 far+0.047
 glad+0.040
 here+0.039
Neg logit contributions
 aids-0.070
roach-0.068
eness-0.066
est-0.066
 phrase-0.066
Token together
Token ID1978
Feature activation+1.62e-02

Pos logit contributions
ll+0.033
 far+0.029
 together+0.028
 glad+0.026
 Enjoy+0.024
Neg logit contributions
 aids-0.039
est-0.038
paste-0.038
roach-0.037
eness-0.037
Token.
Token ID13
Feature activation+7.67e-03

Pos logit contributions
ll+0.027
 together+0.024
 far+0.024
 glad+0.021
 here+0.020
Neg logit contributions
 aids-0.031
roach-0.030
paste-0.030
est-0.030
eness-0.029
Token Spot
Token ID15899
Feature activation-1.92e-03

Pos logit contributions
ll+0.010
 far+0.009
 together+0.009
 glad+0.007
 here+0.007
Neg logit contributions
 aids-0.017
roach-0.017
 gloss-0.017
est-0.016
paste-0.016
Token was
Token ID373
Feature activation+4.16e-03

Pos logit contributions
 aids+0.004
roach+0.004
paste+0.004
eness+0.004
 gloss+0.004
Neg logit contributions
ll-0.003
 together-0.003
 far-0.003
 glad-0.002
 here-0.002
Token a
Token ID257
Feature activation+4.41e-02

Pos logit contributions
ll+0.005
 together+0.005
 far+0.005
 glad+0.004
 here+0.004
Neg logit contributions
 aids-0.009
roach-0.009
paste-0.009
eness-0.009
est-0.009
Token depend
Token ID4745
Feature activation+6.27e-02

Pos logit contributions
ll+0.066
 together+0.058
 far+0.052
arl+0.046
 here+0.045
Neg logit contributions
roach-0.089
 aids-0.088
est-0.088
 phrase-0.088
 gloss-0.086
Tokenable
Token ID540
Feature activation+2.10e-02

Pos logit contributions
ll+0.092
 together+0.080
 far+0.077
 glad+0.069
 our+0.067
Neg logit contributions
est-0.131
roach-0.130
paste-0.124
 sensation-0.124
 gloss-0.122
Token friend
Token ID1545
Feature activation+1.99e-02

Pos logit contributions
ll+0.031
 together+0.028
 far+0.027
 glad+0.023
 here+0.022
Neg logit contributions
 aids-0.044
roach-0.043
est-0.042
paste-0.042
eness-0.042
Token and
Token ID290
Feature activation+1.79e-02

Pos logit contributions
ll+0.027
 together+0.025
 far+0.024
 glad+0.020
 here+0.020
Neg logit contributions
 aids-0.043
roach-0.043
paste-0.042
eness-0.042
est-0.042
Token Sam
Token ID3409
Feature activation-7.23e-03

Pos logit contributions
ll+0.025
 together+0.023
 far+0.022
 glad+0.019
 here+0.018
Neg logit contributions
 aids-0.038
roach-0.038
paste-0.037
est-0.037
eness-0.037
Token loved
Token ID6151
Feature activation+4.65e-02

Pos logit contributions
 aids+0.014
roach+0.014
est+0.014
eness+0.014
paste+0.014
Neg logit contributions
ll-0.011
 together-0.010
 far-0.010
 glad-0.009
 here-0.009
Token beach
Token ID10481
Feature activation+3.88e-02

Pos logit contributions
 aids+0.011
paste+0.011
roach+0.011
eness+0.011
 gloss+0.010
Neg logit contributions
ll-0.007
 together-0.006
 far-0.006
 here-0.005
 glad-0.005
Token.
Token ID13
Feature activation-1.30e-02

Pos logit contributions
ll+0.073
arl+0.069
 glad+0.066
 Enjoy+0.061
 ours+0.059
Neg logit contributions
est-0.055
 poses-0.053
 gloss-0.051
 streaks-0.051
 lines-0.049
Token Sue
Token ID26089
Feature activation-1.27e-02

Pos logit contributions
 aids+0.028
est+0.028
roach+0.027
 gloss+0.027
 phrase+0.027
Neg logit contributions
ll-0.018
 together-0.015
 far-0.015
 glad-0.012
 here-0.012
Token was
Token ID373
Feature activation-9.24e-04

Pos logit contributions
est+0.020
 streaks+0.019
oline+0.019
 markings+0.018
 brist+0.018
Neg logit contributions
arl-0.023
ll-0.020
 together-0.019
 Enjoy-0.018
prise-0.018
Token very
Token ID845
Feature activation+4.50e-03

Pos logit contributions
 sensation+0.002
 aids+0.002
est+0.002
 gloss+0.002
 streaks+0.002
Neg logit contributions
ll-0.002
 together-0.001
arl-0.001
 Enjoy-0.001
prise-0.001
Token exc
Token ID2859
Feature activation+6.22e-02

Pos logit contributions
ll+0.008
use+0.007
 together+0.006
We+0.006
 Enjoy+0.006
Neg logit contributions
 scalp-0.006
 sensation-0.006
 streaks-0.006
 movements-0.006
 brist-0.006
Tokenite
Token ID578
Feature activation-1.55e-03

Pos logit contributions
ll+0.099
 together+0.094
 far+0.094
 our+0.091
 glad+0.090
Neg logit contributions
 gloss-0.104
oga-0.096
ol-0.095
 streaks-0.091
roach-0.090
Token to
Token ID284
Feature activation-3.38e-03

Pos logit contributions
 aids+0.004
roach+0.004
eness+0.004
paste+0.004
 phrase+0.004
Neg logit contributions
ll-0.002
 together-0.002
 far-0.002
 here-0.001
 glad-0.001
Token play
Token ID711
Feature activation+1.73e-02

Pos logit contributions
 streaks+0.007
est+0.006
 aids+0.006
 phrase+0.006
 exercises+0.006
Neg logit contributions
ll-0.005
 together-0.004
 glad-0.004
arl-0.004
 far-0.004
Token in
Token ID287
Feature activation+2.16e-02

Pos logit contributions
ll+0.020
 together+0.017
 far+0.017
 glad+0.014
 here+0.013
Neg logit contributions
 aids-0.041
roach-0.041
paste-0.040
est-0.040
 gloss-0.040
Token the
Token ID262
Feature activation+4.68e-03

Pos logit contributions
ll+0.049
arl+0.040
 together+0.036
 glad+0.036
 Enjoy+0.036
Neg logit contributions
 streaks-0.028
est-0.027
 strands-0.027
 poses-0.026
 lines-0.026
Token smiled
Token ID13541
Feature activation+2.66e-02

Pos logit contributions
ll+0.015
 together+0.014
 far+0.014
 glad+0.012
 here+0.011
Neg logit contributions
 aids-0.022
roach-0.022
paste-0.022
eness-0.022
est-0.022
Token and
Token ID290
Feature activation+3.01e-03

Pos logit contributions
 together+0.032
ll+0.030
 far+0.030
 glad+0.024
 here+0.024
Neg logit contributions
roach-0.066
 aids-0.064
paste-0.063
 gloss-0.062
 poses-0.061
Token said
Token ID531
Feature activation-1.40e-02

Pos logit contributions
ll+0.004
 together+0.004
 far+0.004
 glad+0.003
 here+0.003
Neg logit contributions
 aids-0.007
roach-0.007
eness-0.007
est-0.007
paste-0.007
Token:
Token ID25
Feature activation-1.14e-02

Pos logit contributions
roach+0.029
 aids+0.029
eness+0.028
paste+0.028
est+0.028
Neg logit contributions
ll-0.020
 together-0.019
 far-0.018
 glad-0.016
 here-0.015
Token "
Token ID366
Feature activation+2.68e-02

Pos logit contributions
 aids+0.020
roach+0.020
paste+0.020
eness+0.020
est+0.020
Neg logit contributions
ll-0.020
 together-0.018
 far-0.018
 glad-0.016
 here-0.015
TokenNice
Token ID35284
Feature activation+6.21e-02

Pos logit contributions
ll+0.039
 together+0.035
 far+0.035
 glad+0.030
 here+0.029
Neg logit contributions
 aids-0.055
roach-0.055
est-0.054
eness-0.054
paste-0.054
Token job
Token ID1693
Feature activation-1.72e-02

Pos logit contributions
ll+0.082
 together+0.077
 far+0.075
 here+0.062
 glad+0.062
Neg logit contributions
 aids-0.137
roach-0.136
paste-0.134
eness-0.133
 gloss-0.133
Token,
Token ID11
Feature activation+8.40e-03

Pos logit contributions
roach+0.040
 aids+0.039
 gloss+0.039
paste+0.038
 sensation+0.038
Neg logit contributions
 together-0.022
ll-0.021
 far-0.019
 here-0.018
 glad-0.016
Token Elliot
Token ID30451
Feature activation+2.53e-02

Pos logit contributions
ll+0.014
 far+0.013
 together+0.012
 glad+0.011
We+0.011
Neg logit contributions
est-0.016
eness-0.015
 aids-0.015
 phrase-0.015
roach-0.015
Token!"
Token ID2474
Feature activation-5.93e-03

Pos logit contributions
ll+0.047
 far+0.044
 together+0.043
 glad+0.039
arl+0.037
Neg logit contributions
est-0.041
roach-0.040
eness-0.040
 aids-0.039
paste-0.039
Token Elliot
Token ID30451
Feature activation+1.59e-02

Pos logit contributions
 aids+0.013
eness+0.013
est+0.013
roach+0.013
 gloss+0.013
Neg logit contributions
ll-0.008
 far-0.007
 together-0.006
 here-0.005
 glad-0.005
Token smiled
Token ID13541
Feature activation+2.57e-02

Pos logit contributions
ll+0.029
 far+0.026
 together+0.026
arl+0.023
 here+0.023
Neg logit contributions
roach-0.036
 aids-0.035
 gloss-0.035
est-0.034
eness-0.034
Token and
Token ID290
Feature activation+3.04e-03

Pos logit contributions
 together+0.031
ll+0.031
 far+0.029
 glad+0.024
 here+0.023
Neg logit contributions
roach-0.061
 aids-0.060
paste-0.059
 gloss-0.059
 poses-0.057
Token said
Token ID531
Feature activation-1.83e-02

Pos logit contributions
ll+0.004
 together+0.004
 far+0.004
 glad+0.003
 here+0.003
Neg logit contributions
 aids-0.007
roach-0.007
paste-0.006
eness-0.006
est-0.006
Token,
Token ID11
Feature activation-2.22e-02

Pos logit contributions
 aids+0.036
roach+0.036
paste+0.035
est+0.035
eness+0.035
Neg logit contributions
ll-0.029
 together-0.026
 far-0.025
 glad-0.022
 here-0.021
Token "
Token ID366
Feature activation+2.99e-02

Pos logit contributions
 aids+0.037
roach+0.036
paste+0.035
 phrase+0.035
 gloss+0.035
Neg logit contributions
ll-0.041
 far-0.038
 together-0.037
 glad-0.033
 here-0.033
TokenNice
Token ID35284
Feature activation+6.21e-02

Pos logit contributions
ll+0.044
 together+0.040
 far+0.040
 glad+0.034
 here+0.033
Neg logit contributions
 aids-0.061
roach-0.061
est-0.060
eness-0.060
paste-0.060
Token job
Token ID1693
Feature activation-1.52e-02

Pos logit contributions
ll+0.078
 together+0.071
 far+0.069
 glad+0.056
 here+0.055
Neg logit contributions
 aids-0.142
roach-0.141
paste-0.139
eness-0.138
 gloss-0.138
Token,
Token ID11
Feature activation+9.08e-03

Pos logit contributions
roach+0.035
 aids+0.034
 gloss+0.034
paste+0.034
eness+0.033
Neg logit contributions
 together-0.020
ll-0.020
 far-0.018
 here-0.016
 our-0.015
Token Jack
Token ID3619
Feature activation+1.21e-02

Pos logit contributions
ll+0.015
 together+0.014
 far+0.014
 glad+0.012
We+0.011
Neg logit contributions
 aids-0.017
roach-0.017
est-0.016
eness-0.016
 phrase-0.016
Token!"
Token ID2474
Feature activation-5.61e-03

Pos logit contributions
ll+0.020
 together+0.019
 far+0.018
 glad+0.016
 here+0.016
Neg logit contributions
 aids-0.023
roach-0.023
paste-0.022
 gloss-0.022
eness-0.022
Token From
Token ID3574
Feature activation+1.35e-02

Pos logit contributions
 aids+0.013
roach+0.013
eness+0.012
est+0.012
 gloss+0.012
Neg logit contributions
ll-0.007
 far-0.006
 together-0.006
 here-0.005
 glad-0.005
Token "
Token ID366
Feature activation+2.42e-02

Pos logit contributions
 aids+0.033
roach+0.033
paste+0.032
 gloss+0.032
 phrase+0.032
Neg logit contributions
ll-0.039
 far-0.036
 together-0.035
 here-0.031
 glad-0.031
TokenYou
Token ID1639
Feature activation+1.32e-03

Pos logit contributions
ll+0.029
 far+0.024
 together+0.023
 glad+0.019
 here+0.019
Neg logit contributions
 gloss-0.057
roach-0.057
paste-0.057
est-0.057
 aids-0.057
Token are
Token ID389
Feature activation+2.72e-02

Pos logit contributions
ll+0.002
 together+0.002
 far+0.002
 glad+0.001
 here+0.001
Neg logit contributions
 aids-0.003
roach-0.003
paste-0.003
est-0.003
eness-0.003
Token welcome
Token ID7062
Feature activation+4.93e-02

Pos logit contributions
ll+0.031
 together+0.025
 far+0.024
We+0.018
arl+0.018
Neg logit contributions
 aids-0.065
roach-0.063
est-0.063
paste-0.062
eness-0.062
Token,
Token ID11
Feature activation+1.58e-02

Pos logit contributions
ll+0.091
 far+0.079
 together+0.077
 glad+0.071
 Enjoy+0.067
Neg logit contributions
est-0.081
paste-0.079
 aids-0.079
 phrase-0.078
eness-0.078
Token welcome
Token ID7062
Feature activation+6.19e-02

Pos logit contributions
ll+0.026
 together+0.024
 far+0.023
 glad+0.020
 here+0.019
Neg logit contributions
 aids-0.030
roach-0.030
paste-0.029
est-0.029
 gloss-0.029
Token!
Token ID0
Feature activation+2.68e-02

Pos logit contributions
ll+0.101
 together+0.087
 far+0.086
 glad+0.077
 Enjoy+0.070
Neg logit contributions
est-0.116
 aids-0.115
paste-0.114
 gloss-0.113
 phrase-0.113
Token You
Token ID921
Feature activation+1.51e-03

Pos logit contributions
ll+0.044
 far+0.039
 together+0.038
 glad+0.032
 here+0.031
Neg logit contributions
 aids-0.052
roach-0.051
est-0.051
paste-0.050
 phrase-0.050
Token are
Token ID389
Feature activation+2.57e-02

Pos logit contributions
ll+0.002
 together+0.002
 far+0.002
 glad+0.001
 here+0.001
Neg logit contributions
 aids-0.003
roach-0.003
paste-0.003
est-0.003
eness-0.003
Token cute
Token ID13779
Feature activation+1.57e-02

Pos logit contributions
ll+0.039
 together+0.031
 far+0.029
arl+0.027
We+0.025
Neg logit contributions
 aids-0.052
est-0.050
roach-0.050
paste-0.050
 phrase-0.049
Token!"
Token ID2474
Feature activation+1.71e-03

Pos logit contributions
ll+0.025
 far+0.022
 together+0.022
 glad+0.019
 Enjoy+0.018
Neg logit contributions
est-0.031
 aids-0.031
paste-0.030
roach-0.030
eness-0.030
Token,
Token ID11
Feature activation+2.43e-02

Pos logit contributions
ll+0.045
 far+0.040
 together+0.040
 glad+0.037
We+0.035
Neg logit contributions
est-0.043
 aids-0.041
paste-0.041
eness-0.040
 phrase-0.040
Token little
Token ID1310
Feature activation+3.19e-02

Pos logit contributions
ll+0.044
 together+0.041
 far+0.038
 glad+0.034
We+0.034
Neg logit contributions
est-0.040
 phrase-0.036
 aids-0.036
der-0.036
roach-0.035
Token ostr
Token ID42387
Feature activation+1.70e-02

Pos logit contributions
ll+0.059
 together+0.052
 far+0.051
 glad+0.048
 Enjoy+0.044
Neg logit contributions
est-0.056
eness-0.053
roach-0.052
 phrase-0.052
 aids-0.051
Tokenich
Token ID488
Feature activation+1.36e-02

Pos logit contributions
ll+0.024
 far+0.023
 together+0.023
 glad+0.019
 here+0.019
Neg logit contributions
 aids-0.034
paste-0.034
eness-0.034
roach-0.033
est-0.033
Token,
Token ID11
Feature activation+1.87e-02

Pos logit contributions
ll+0.019
 together+0.017
 far+0.017
 glad+0.014
We+0.013
Neg logit contributions
 aids-0.029
est-0.028
roach-0.028
paste-0.028
eness-0.028
Token welcome
Token ID7062
Feature activation+6.18e-02

Pos logit contributions
ll+0.027
 together+0.024
 far+0.021
 glad+0.018
arl+0.018
Neg logit contributions
est-0.038
 aids-0.037
roach-0.036
 phrase-0.036
paste-0.035
Token to
Token ID284
Feature activation+1.03e-02

Pos logit contributions
ll+0.091
 together+0.075
 far+0.069
 glad+0.069
arl+0.060
Neg logit contributions
est-0.131
eness-0.119
 aids-0.118
paste-0.118
 markings-0.117
Token the
Token ID262
Feature activation+1.49e-02

Pos logit contributions
ll+0.018
 together+0.014
We+0.013
 far+0.012
arl+0.012
Neg logit contributions
 aids-0.019
 movements-0.017
 sentences-0.017
roach-0.017
 phrase-0.017
Token zoo
Token ID26626
Feature activation+2.51e-02

Pos logit contributions
ll+0.028
 together+0.026
 far+0.023
 glad+0.021
arl+0.020
Neg logit contributions
est-0.025
 phrase-0.024
 aids-0.024
roach-0.024
 markings-0.024
Token,"
Token ID553
Feature activation+1.97e-02

Pos logit contributions
ll+0.040
 together+0.033
 far+0.033
 glad+0.031
arl+0.031
Neg logit contributions
 aids-0.047
est-0.047
paste-0.045
eness-0.045
 phrase-0.044
Token the
Token ID262
Feature activation+1.58e-02

Pos logit contributions
ll+0.037
 together+0.030
 far+0.029
arl+0.028
 glad+0.026
Neg logit contributions
 aids-0.035
paste-0.032
 poses-0.032
 phrase-0.032
 movements-0.032
Token.
Token ID13
Feature activation+9.07e-03

Pos logit contributions
ll+0.018
 far+0.017
 together+0.017
 glad+0.015
 here+0.014
Neg logit contributions
 aids-0.021
est-0.021
roach-0.021
eness-0.021
paste-0.021
Token<|endoftext|>
Token ID50256
Feature activation+4.58e-03

Pos logit contributions
ll+0.010
 far+0.008
 together+0.008
 here+0.006
prise+0.006
Neg logit contributions
 aids-0.022
 gloss-0.022
roach-0.022
paste-0.021
 poses-0.021
TokenBen
Token ID11696
Feature activation+6.05e-03

Pos logit contributions
ll+0.007
 far+0.006
 together+0.006
 glad+0.005
 ours+0.005
Neg logit contributions
roach-0.009
 aids-0.009
est-0.009
paste-0.009
 gloss-0.009
Token was
Token ID373
Feature activation-3.25e-03

Pos logit contributions
ll+0.009
 together+0.009
 far+0.008
 glad+0.007
arl+0.007
Neg logit contributions
roach-0.012
est-0.012
 aids-0.012
paste-0.011
 gloss-0.011
Token very
Token ID845
Feature activation-3.32e-03

Pos logit contributions
 aids+0.007
 gloss+0.006
 sensation+0.006
est+0.006
roach+0.006
Neg logit contributions
ll-0.005
 together-0.004
 far-0.004
We-0.003
arl-0.003
Token exc
Token ID2859
Feature activation+6.18e-02

Pos logit contributions
 aids+0.006
 sensation+0.006
 scalp+0.006
 gloss+0.006
 streaks+0.005
Neg logit contributions
ll-0.005
 together-0.004
arl-0.004
We-0.004
use-0.004
Tokenite
Token ID578
Feature activation+1.21e-03

Pos logit contributions
ll+0.085
 together+0.084
 far+0.083
 glad+0.081
 here+0.076
Neg logit contributions
 gloss-0.108
oga-0.105
ol-0.103
 aids-0.102
roach-0.102
Token to
Token ID284
Feature activation+3.37e-03

Pos logit contributions
 together+0.002
ll+0.002
 far+0.001
 here+0.001
 who+0.001
Neg logit contributions
paste-0.003
 aids-0.003
roach-0.003
 phrase-0.003
 brist-0.003
Token go
Token ID467
Feature activation+2.89e-02

Pos logit contributions
ll+0.004
 together+0.003
 far+0.003
 glad+0.002
 here+0.002
Neg logit contributions
 aids-0.008
roach-0.008
paste-0.008
est-0.008
eness-0.008
Token to
Token ID284
Feature activation+1.08e-02

Pos logit contributions
ll+0.045
We+0.035
 glad+0.035
 far+0.035
prise+0.033
Neg logit contributions
est-0.052
 gloss-0.052
roach-0.051
 exercises-0.049
oga-0.049
Token school
Token ID1524
Feature activation+8.24e-03

Pos logit contributions
ll+0.017
 together+0.015
 far+0.014
 glad+0.013
We+0.012
Neg logit contributions
 aids-0.021
roach-0.021
est-0.020
 poses-0.020
 phrase-0.020
Token cat
Token ID3797
Feature activation-1.46e-03

Pos logit contributions
ll+0.017
 together+0.016
 far+0.013
 here+0.013
 ours+0.012
Neg logit contributions
roach-0.018
 gloss-0.018
est-0.017
 aids-0.017
paste-0.017
Token named
Token ID3706
Feature activation+8.62e-03

Pos logit contributions
 aids+0.004
roach+0.004
est+0.004
 gloss+0.004
paste+0.004
Neg logit contributions
ll-0.001
 together-0.001
 far-0.001
 glad-0.001
arl-0.001
Token Tom
Token ID4186
Feature activation+2.49e-03

Pos logit contributions
ll+0.014
 together+0.012
 far+0.011
 glad+0.010
We+0.010
Neg logit contributions
 aids-0.017
roach-0.017
 gloss-0.017
paste-0.017
est-0.016
Token was
Token ID373
Feature activation+1.17e-02

Pos logit contributions
ll+0.005
 together+0.004
arl+0.004
 far+0.004
 glad+0.004
Neg logit contributions
est-0.004
roach-0.004
 aids-0.004
 markings-0.004
paste-0.004
Token very
Token ID845
Feature activation+8.73e-03

Pos logit contributions
ll+0.021
 together+0.016
arl+0.016
prise+0.015
 Enjoy+0.015
Neg logit contributions
 aids-0.019
 gloss-0.019
ze-0.019
 brist-0.018
 ankle-0.018
Token exc
Token ID2859
Feature activation+6.16e-02

Pos logit contributions
ll+0.015
 together+0.014
We+0.013
They+0.011
 Enjoy+0.011
Neg logit contributions
 scalp-0.013
 gloss-0.013
 brist-0.013
paste-0.012
 aids-0.012
Tokenite
Token ID578
Feature activation-5.83e-03

Pos logit contributions
ll+0.089
 together+0.089
 far+0.083
 glad+0.082
 here+0.078
Neg logit contributions
 gloss-0.114
roach-0.108
oga-0.107
paste-0.104
ol-0.103
Token.
Token ID13
Feature activation-1.05e-02

Pos logit contributions
roach+0.014
paste+0.014
eness+0.014
 phrase+0.014
 aids+0.014
Neg logit contributions
 together-0.007
 far-0.007
ll-0.006
 here-0.005
 they-0.004
Token He
Token ID679
Feature activation-8.19e-04

Pos logit contributions
eness+0.025
paste+0.025
roach+0.025
 aids+0.024
 phrase+0.024
Neg logit contributions
 far-0.011
ll-0.011
 together-0.011
 glad-0.009
 Enjoy-0.009
Token saw
Token ID2497
Feature activation-1.66e-02

Pos logit contributions
 aids+0.002
roach+0.002
paste+0.002
 gloss+0.002
est+0.002
Neg logit contributions
ll-0.001
 together-0.001
 far-0.001
 glad-0.001
 here-0.001
Token a
Token ID257
Feature activation+1.97e-02

Pos logit contributions
eness+0.038
 phrase+0.038
roach+0.038
 aids+0.037
paste+0.037
Neg logit contributions
 far-0.021
 together-0.020
ll-0.019
 glad-0.017
 many-0.016
Tokenily
Token ID813
Feature activation-1.31e-02

Pos logit contributions
 aids+0.005
roach+0.005
paste+0.005
 gloss+0.005
eness+0.005
Neg logit contributions
ll-0.005
 together-0.005
 far-0.005
 here-0.004
 glad-0.004
Token and
Token ID290
Feature activation+1.47e-02

Pos logit contributions
roach+0.035
 aids+0.034
eness+0.034
paste+0.033
 sensation+0.033
Neg logit contributions
ll-0.013
 together-0.012
 far-0.011
 who-0.008
 here-0.008
Token Ben
Token ID3932
Feature activation+6.34e-03

Pos logit contributions
ll+0.019
 far+0.018
 together+0.017
 glad+0.014
 here+0.014
Neg logit contributions
 aids-0.033
roach-0.033
est-0.033
paste-0.032
 phrase-0.032
Token were
Token ID547
Feature activation+8.02e-03

Pos logit contributions
ll+0.010
 together+0.009
 far+0.009
 glad+0.008
 here+0.007
Neg logit contributions
 aids-0.013
roach-0.013
paste-0.012
 gloss-0.012
est-0.012
Token a
Token ID257
Feature activation+4.85e-02

Pos logit contributions
ll+0.010
 together+0.010
 far+0.010
 glad+0.008
 here+0.008
Neg logit contributions
roach-0.018
 aids-0.018
paste-0.018
est-0.017
eness-0.017
Token little
Token ID1310
Feature activation+6.14e-02

Pos logit contributions
ll+0.090
 together+0.076
 far+0.070
arl+0.066
 Enjoy+0.066
Neg logit contributions
 phrase-0.083
 aids-0.082
 movements-0.081
paste-0.081
est-0.080
Token sad
Token ID6507
Feature activation+1.89e-02

Pos logit contributions
ll+0.111
 together+0.091
 far+0.088
 Enjoy+0.081
arl+0.080
Neg logit contributions
paste-0.110
 aids-0.105
est-0.105
 phrase-0.105
 poses-0.105
Token.
Token ID13
Feature activation+1.15e-02

Pos logit contributions
ll+0.027
 together+0.025
 far+0.024
 glad+0.021
 here+0.020
Neg logit contributions
 aids-0.040
roach-0.040
paste-0.039
 gloss-0.039
eness-0.039
Token They
Token ID1119
Feature activation+1.51e-02

Pos logit contributions
ll+0.015
 far+0.011
 together+0.010
 glad+0.008
prise+0.008
Neg logit contributions
 aids-0.028
roach-0.027
 phrase-0.027
 poses-0.027
 movements-0.026
Token wanted
Token ID2227
Feature activation+1.39e-02

Pos logit contributions
ll+0.024
 together+0.022
 far+0.022
 glad+0.019
 here+0.018
Neg logit contributions
 aids-0.030
roach-0.029
paste-0.029
est-0.028
eness-0.028
Token to
Token ID284
Feature activation-1.97e-02

Pos logit contributions
ll+0.014
 together+0.012
 far+0.012
 glad+0.009
 here+0.009
Neg logit contributions
 aids-0.035
roach-0.035
paste-0.035
est-0.035
eness-0.035
Token had
Token ID550
Feature activation-7.41e-03

Pos logit contributions
ll+0.001
 together+0.001
 far+0.001
 glad+0.001
 here+0.001
Neg logit contributions
roach-0.001
 aids-0.001
eness-0.001
paste-0.001
est-0.001
Token to
Token ID284
Feature activation-1.54e-02

Pos logit contributions
roach+0.019
eness+0.018
est+0.018
 sensation+0.018
paste+0.018
Neg logit contributions
 far-0.009
 together-0.008
ll-0.007
 here-0.007
 many-0.006
Token know
Token ID760
Feature activation-1.91e-02

Pos logit contributions
roach+0.037
paste+0.036
 gloss+0.036
est+0.035
eness+0.035
Neg logit contributions
 far-0.018
 together-0.016
ll-0.016
 here-0.014
 close-0.013
Token all
Token ID477
Feature activation+1.54e-02

Pos logit contributions
roach+0.046
 aids+0.044
 gloss+0.043
 sensation+0.043
paste+0.043
Neg logit contributions
 together-0.024
 far-0.023
ll-0.021
 here-0.018
 glad-0.017
Token about
Token ID546
Feature activation+2.39e-03

Pos logit contributions
 far+0.017
 together+0.017
ll+0.016
 here+0.013
 glad+0.013
Neg logit contributions
paste-0.038
 gloss-0.037
 scalp-0.037
roach-0.036
tern-0.036
Token microsc
Token ID21145
Feature activation-9.34e-02

Pos logit contributions
 together+0.003
 far+0.003
ll+0.003
 here+0.003
 glad+0.002
Neg logit contributions
roach-0.005
paste-0.005
 aids-0.005
eness-0.005
 gloss-0.005
Tokenopes
Token ID13920
Feature activation-1.37e-02

Pos logit contributions
paste+0.126
 gloss+0.126
 aids+0.125
 sensation+0.124
roach+0.121
Neg logit contributions
ll-0.215
 together-0.197
 far-0.184
 here-0.164
 our-0.156
Token.
Token ID13
Feature activation-3.10e-03

Pos logit contributions
 aids+0.032
roach+0.032
 gloss+0.031
 phrase+0.031
paste+0.031
Neg logit contributions
ll-0.017
 together-0.017
 far-0.016
 here-0.013
 glad-0.011
Token
Token ID220
Feature activation+1.24e-02

Pos logit contributions
 aids+0.007
est+0.006
roach+0.006
 phrase+0.006
paste+0.006
Neg logit contributions
ll-0.005
 together-0.004
 far-0.004
 glad-0.003
 here-0.003
Token
Token ID198
Feature activation-2.53e-03

Pos logit contributions
ll+0.012
 far+0.010
 together+0.010
 glad+0.008
 here+0.007
Neg logit contributions
 aids-0.031
paste-0.030
roach-0.030
 gloss-0.029
est-0.029
Token
Token ID198
Feature activation+4.26e-04

Pos logit contributions
 aids+0.006
 phrase+0.005
est+0.005
 poses+0.005
der+0.005
Neg logit contributions
ll-0.003
 far-0.002
 together-0.002
 ours-0.002
arl-0.002
Token if
Token ID611
Feature activation+3.24e-02

Pos logit contributions
ll+0.024
 together+0.021
 far+0.021
 glad+0.017
We+0.017
Neg logit contributions
 aids-0.036
roach-0.035
est-0.035
eness-0.035
paste-0.035
Token she
Token ID673
Feature activation-1.89e-02

Pos logit contributions
ll+0.049
 together+0.043
 far+0.042
 glad+0.036
arl+0.035
Neg logit contributions
 aids-0.065
paste-0.064
roach-0.063
est-0.062
 gloss-0.062
Token could
Token ID714
Feature activation-6.48e-04

Pos logit contributions
 aids+0.040
roach+0.039
paste+0.039
eness+0.039
est+0.038
Neg logit contributions
ll-0.027
 together-0.025
 far-0.024
 glad-0.021
 here-0.020
Token talk
Token ID1561
Feature activation-1.30e-02

Pos logit contributions
 aids+0.001
roach+0.001
paste+0.001
eness+0.001
est+0.001
Neg logit contributions
ll-0.001
 together-0.001
 far-0.001
 here-0.001
 glad-0.001
Token about
Token ID546
Feature activation-1.17e-03

Pos logit contributions
roach+0.036
 aids+0.035
paste+0.034
 gloss+0.034
 scalp+0.034
Neg logit contributions
 together-0.013
 far-0.011
ll-0.009
 here-0.009
 who-0.007
Token microsc
Token ID21145
Feature activation-9.04e-02

Pos logit contributions
roach+0.003
paste+0.003
 aids+0.003
eness+0.003
est+0.003
Neg logit contributions
 far-0.001
 together-0.001
ll-0.001
 here-0.001
 who-0.001
Tokenopes
Token ID13920
Feature activation-9.37e-03

Pos logit contributions
paste+0.126
 gloss+0.124
 aids+0.122
 sensation+0.120
roach+0.119
Neg logit contributions
ll-0.205
 together-0.185
 far-0.175
 here-0.159
 glad-0.149
Token.
Token ID13
Feature activation-1.63e-03

Pos logit contributions
roach+0.022
 aids+0.021
 phrase+0.021
paste+0.021
 gloss+0.021
Neg logit contributions
 together-0.012
ll-0.012
 far-0.011
 here-0.009
 glad-0.008
Token
Token ID198
Feature activation+9.02e-04

Pos logit contributions
 aids+0.003
 phrase+0.003
est+0.003
roach+0.003
 gloss+0.003
Neg logit contributions
ll-0.003
 far-0.002
 together-0.002
 glad-0.002
 here-0.002
Token
Token ID198
Feature activation+5.72e-03

Pos logit contributions
ll+0.001
 ours+0.001
 far+0.001
 together+0.001
arl+0.001
Neg logit contributions
 aids-0.002
 phrase-0.002
est-0.002
 gloss-0.002
 poses-0.002
TokenHer
Token ID9360
Feature activation+3.87e-03

Pos logit contributions
ll+0.011
 far+0.009
 together+0.009
 glad+0.008
arl+0.008
Neg logit contributions
 aids-0.010
 phrase-0.010
est-0.009
eness-0.009
paste-0.009
Token.
Token ID13
Feature activation-6.02e-03

Pos logit contributions
 together+0.001
 far+0.001
ll+0.001
 here+0.000
 who+0.000
Neg logit contributions
roach-0.001
 aids-0.001
paste-0.001
eness-0.001
 gloss-0.001
Token It
Token ID632
Feature activation-9.11e-03

Pos logit contributions
 aids+0.010
roach+0.010
 gloss+0.009
 phrase+0.009
est+0.009
Neg logit contributions
ll-0.012
 far-0.010
 together-0.010
 glad-0.009
 here-0.009
Token said
Token ID531
Feature activation-2.23e-02

Pos logit contributions
roach+0.021
 aids+0.020
eness+0.020
paste+0.020
 poses+0.020
Neg logit contributions
 together-0.012
ll-0.012
 far-0.012
 here-0.009
We-0.009
Token,
Token ID11
Feature activation-3.47e-02

Pos logit contributions
roach+0.046
eness+0.044
 sensation+0.043
 poses+0.043
 aids+0.043
Neg logit contributions
 together-0.034
 far-0.032
ll-0.031
 here-0.029
 glad-0.028
Token "
Token ID366
Feature activation+2.07e-02

Pos logit contributions
roach+0.066
 aids+0.066
eness+0.064
est+0.064
paste+0.064
Neg logit contributions
ll-0.057
 together-0.054
 far-0.052
 glad-0.046
 here-0.045
TokenPract
Token ID49515
Feature activation-8.95e-02

Pos logit contributions
ll+0.032
 together+0.029
 far+0.029
 glad+0.025
 here+0.024
Neg logit contributions
 aids-0.041
roach-0.040
 gloss-0.040
eness-0.040
est-0.040
Tokenice
Token ID501
Feature activation-2.01e-02

Pos logit contributions
 aids+0.111
paste+0.110
 sensation+0.110
roach+0.102
 phrase+0.102
Neg logit contributions
ll-0.211
 together-0.186
 far-0.177
 who-0.167
 here-0.163
Token balancing
Token ID22486
Feature activation-2.23e-02

Pos logit contributions
roach+0.044
 aids+0.043
 sensation+0.043
eness+0.042
paste+0.042
Neg logit contributions
ll-0.028
 together-0.027
 far-0.025
 here-0.022
 glad-0.020
Token every
Token ID790
Feature activation+1.35e-02

Pos logit contributions
 aids+0.051
roach+0.049
 sensation+0.049
eness+0.049
 gloss+0.048
Neg logit contributions
 together-0.030
 far-0.028
ll-0.027
 here-0.024
 our-0.022
Token day
Token ID1110
Feature activation-1.56e-03

Pos logit contributions
 far+0.017
 together+0.015
ll+0.015
 here+0.014
 home+0.012
Neg logit contributions
eness-0.031
 aids-0.031
paste-0.030
roach-0.030
 gloss-0.029
Token,
Token ID11
Feature activation-8.08e-03

Pos logit contributions
roach+0.003
 aids+0.003
paste+0.003
eness+0.003
 gloss+0.003
Neg logit contributions
ll-0.002
 together-0.002
 far-0.002
 here-0.002
 glad-0.001
Token wall
Token ID3355
Feature activation-3.19e-02

Pos logit contributions
eness+0.039
paste+0.039
 gloss+0.039
 aids+0.038
roach+0.037
Neg logit contributions
 far-0.049
 together-0.045
ll-0.045
 here-0.039
 our-0.039
Token.
Token ID13
Feature activation-7.60e-03

Pos logit contributions
roach+0.079
eness+0.078
paste+0.076
 aids+0.075
est+0.075
Neg logit contributions
 together-0.036
 far-0.035
ll-0.032
 here-0.025
 who-0.024
Token It
Token ID632
Feature activation-1.48e-02

Pos logit contributions
 aids+0.018
roach+0.017
paste+0.017
est+0.017
 gloss+0.017
Neg logit contributions
ll-0.009
 together-0.008
 far-0.008
 glad-0.007
 here-0.006
Token said
Token ID531
Feature activation-2.84e-02

Pos logit contributions
eness+0.032
roach+0.032
paste+0.031
 aids+0.031
 sensation+0.031
Neg logit contributions
 together-0.021
ll-0.021
 far-0.020
We-0.017
 they-0.016
Token "
Token ID366
Feature activation+9.53e-03

Pos logit contributions
roach+0.069
eness+0.068
 sensation+0.067
 scalp+0.065
 strands+0.065
Neg logit contributions
 together-0.034
 far-0.032
ll-0.029
 here-0.028
 they-0.028
TokenDO
Token ID18227
Feature activation-8.69e-02

Pos logit contributions
ll+0.015
 together+0.013
 far+0.013
 glad+0.011
 here+0.011
Neg logit contributions
 aids-0.019
roach-0.019
est-0.019
paste-0.019
eness-0.018
Token NOT
Token ID5626
Feature activation-2.39e-02

Pos logit contributions
 sensation+0.127
eness+0.127
 gloss+0.126
paste+0.125
est+0.124
Neg logit contributions
 far-0.173
ll-0.170
 together-0.164
 here-0.158
 glad-0.152
Token T
Token ID309
Feature activation-1.82e-02

Pos logit contributions
paste+0.043
 sensation+0.042
eness+0.042
 aids+0.042
est+0.042
Neg logit contributions
 far-0.040
 together-0.038
ll-0.037
 here-0.035
 our-0.034
TokenOU
Token ID2606
Feature activation+1.18e-02

Pos logit contributions
 sensation+0.028
 aids+0.027
paste+0.027
roach+0.027
 gloss+0.027
Neg logit contributions
ll-0.036
 far-0.032
 together-0.031
arl-0.030
 here-0.030
TokenCH
Token ID3398
Feature activation-2.35e-02

Pos logit contributions
ll+0.025
 together+0.023
 far+0.023
 glad+0.021
 here+0.020
Neg logit contributions
 aids-0.017
roach-0.017
paste-0.016
est-0.016
eness-0.016
Token".
Token ID1911
Feature activation-4.60e-03

Pos logit contributions
roach+0.045
 aids+0.044
 sensation+0.044
 poses+0.044
est+0.044
Neg logit contributions
ll-0.037
 together-0.036
 far-0.036
 here-0.032
arl-0.030
Token.
Token ID13
Feature activation-2.57e-02

Pos logit contributions
roach+0.035
 aids+0.034
eness+0.033
paste+0.033
 gloss+0.033
Neg logit contributions
 together-0.020
ll-0.019
 far-0.018
 glad-0.015
 here-0.015
Token He
Token ID679
Feature activation-9.50e-03

Pos logit contributions
 aids+0.046
paste+0.046
 gloss+0.045
roach+0.044
est+0.044
Neg logit contributions
ll-0.046
 far-0.041
 together-0.039
 glad-0.036
 here-0.034
Token pulled
Token ID5954
Feature activation-1.32e-02

Pos logit contributions
 aids+0.022
roach+0.022
eness+0.022
est+0.021
 sensation+0.021
Neg logit contributions
ll-0.013
 together-0.012
 far-0.011
 here-0.009
We-0.009
Token the
Token ID262
Feature activation-2.69e-02

Pos logit contributions
 aids+0.030
roach+0.029
paste+0.029
 gloss+0.028
eness+0.028
Neg logit contributions
ll-0.017
 together-0.017
 far-0.016
 here-0.013
 glad-0.013
Token c
Token ID269
Feature activation-1.03e-02

Pos logit contributions
 aids+0.042
roach+0.041
eness+0.039
 poses+0.039
 gloss+0.039
Neg logit contributions
ll-0.053
 far-0.051
 together-0.051
 here-0.045
 glad-0.044
Tokenork
Token ID967
Feature activation-8.57e-02

Pos logit contributions
 aids+0.015
 sensation+0.014
 phrase+0.014
 movements+0.014
 poses+0.014
Neg logit contributions
ll-0.022
 together-0.020
 far-0.020
arl-0.019
 here-0.017
Token and
Token ID290
Feature activation-2.73e-02

Pos logit contributions
roach+0.170
 aids+0.168
eness+0.162
 gloss+0.161
paste+0.160
Neg logit contributions
ll-0.135
 together-0.133
 far-0.129
 here-0.108
 glad-0.099
Token a
Token ID257
Feature activation+1.74e-02

Pos logit contributions
 aids+0.065
eness+0.063
roach+0.063
est+0.060
 phrase+0.059
Neg logit contributions
 together-0.040
 far-0.037
ll-0.033
 they-0.029
 here-0.028
Token puff
Token ID33679
Feature activation-4.20e-02

Pos logit contributions
 far+0.028
 together+0.027
ll+0.027
 glad+0.023
 here+0.022
Neg logit contributions
 aids-0.033
roach-0.033
eness-0.032
est-0.031
 poses-0.031
Token of
Token ID286
Feature activation-1.42e-02

Pos logit contributions
 aids+0.115
roach+0.114
 poses+0.113
 phrase+0.109
 exercises+0.107
Neg logit contributions
 together-0.038
 far-0.033
ll-0.023
 they-0.019
 here-0.018
Token smoke
Token ID7523
Feature activation-2.07e-02

Pos logit contributions
 aids+0.025
 gloss+0.025
roach+0.025
eness+0.024
 poses+0.024
Neg logit contributions
 far-0.024
 together-0.023
ll-0.022
 glad-0.021
 our-0.020
Token an
Token ID281
Feature activation-2.61e-02

Pos logit contributions
 aids+0.029
roach+0.029
 phrase+0.028
 gloss+0.028
paste+0.028
Neg logit contributions
 together-0.014
 far-0.013
 their-0.010
 here-0.010
 our-0.009
Token old
Token ID1468
Feature activation+2.18e-03

Pos logit contributions
paste+0.051
 gloss+0.049
 aids+0.048
 poses+0.048
 movements+0.048
Neg logit contributions
 far-0.041
 together-0.039
ll-0.037
 here-0.036
 glad-0.034
Token,
Token ID11
Feature activation-1.78e-02

Pos logit contributions
 far+0.003
 together+0.003
 here+0.003
 home+0.003
 who+0.003
Neg logit contributions
paste-0.004
eness-0.004
roach-0.004
 gloss-0.004
 aids-0.004
Token rusty
Token ID47855
Feature activation-1.84e-02

Pos logit contributions
roach+0.039
eness+0.039
 aids+0.038
est+0.037
 phrase+0.037
Neg logit contributions
 far-0.025
 together-0.024
 glad-0.021
ll-0.020
 here-0.019
Token do
Token ID466
Feature activation-3.97e-02

Pos logit contributions
paste+0.035
 gloss+0.035
 aids+0.033
 echoed+0.033
roach+0.032
Neg logit contributions
 far-0.030
 together-0.026
 home-0.024
 here-0.023
 close-0.022
Tokenork
Token ID967
Feature activation-8.50e-02

Pos logit contributions
roach+0.081
tern+0.078
eness+0.077
 scalp+0.076
paste+0.075
Neg logit contributions
 together-0.061
 far-0.061
ll-0.054
 here-0.049
 who-0.046
Tokennob
Token ID34952
Feature activation-2.98e-02

Pos logit contributions
roach+0.090
 aids+0.087
 scalp+0.084
 sensation+0.080
 gloss+0.079
Neg logit contributions
 together-0.213
ll-0.211
 far-0.209
 here-0.187
 who-0.175
Token and
Token ID290
Feature activation+3.35e-03

Pos logit contributions
roach+0.073
 aids+0.072
 phrase+0.069
paste+0.068
 gloss+0.068
Neg logit contributions
 together-0.037
 far-0.034
ll-0.032
 here-0.026
 they-0.022
Token the
Token ID262
Feature activation-3.69e-03

Pos logit contributions
 together+0.005
 far+0.005
ll+0.005
 here+0.004
 glad+0.004
Neg logit contributions
roach-0.007
 aids-0.007
eness-0.007
est-0.007
 phrase-0.007
Token do
Token ID466
Feature activation-3.56e-02

Pos logit contributions
eness+0.007
 aids+0.007
roach+0.007
paste+0.007
 gloss+0.007
Neg logit contributions
 far-0.005
 together-0.005
ll-0.005
 here-0.004
 glad-0.004
Tokenork
Token ID967
Feature activation-7.92e-02

Pos logit contributions
roach+0.059
eness+0.056
tern+0.055
 brist+0.055
 scalp+0.054
Neg logit contributions
 far-0.067
 together-0.064
ll-0.063
 here-0.056
 there-0.053
Token so
Token ID523
Feature activation-7.43e-03

Pos logit contributions
ll+0.003
 together+0.003
 far+0.003
 here+0.002
 they+0.002
Neg logit contributions
 aids-0.006
roach-0.006
 phrase-0.006
eness-0.006
paste-0.006
Token she
Token ID673
Feature activation-8.59e-03

Pos logit contributions
 aids+0.017
roach+0.016
eness+0.016
paste+0.016
est+0.016
Neg logit contributions
ll-0.010
 together-0.009
 far-0.009
 glad-0.007
 here-0.007
Token could
Token ID714
Feature activation+1.07e-02

Pos logit contributions
 aids+0.016
paste+0.016
roach+0.016
est+0.016
 gloss+0.016
Neg logit contributions
ll-0.013
 together-0.012
 far-0.012
 glad-0.011
 here-0.010
Token run
Token ID1057
Feature activation+3.67e-03

Pos logit contributions
ll+0.015
 together+0.014
 far+0.013
 glad+0.012
 here+0.011
Neg logit contributions
 aids-0.022
roach-0.022
paste-0.022
 gloss-0.022
est-0.022
Token fr
Token ID1216
Feature activation-6.78e-03

Pos logit contributions
ll+0.005
 together+0.005
 far+0.005
 glad+0.004
 here+0.004
Neg logit contributions
 aids-0.008
roach-0.007
paste-0.007
est-0.007
eness-0.007
Tokenree
Token ID631
Feature activation-8.49e-02

Pos logit contributions
 aids+0.012
roach+0.011
paste+0.011
est+0.011
 phrase+0.011
Neg logit contributions
ll-0.013
 together-0.012
 far-0.011
 here-0.010
 glad-0.010
Tokenly
Token ID306
Feature activation-1.36e-02

Pos logit contributions
roach+0.134
 sensation+0.125
 aids+0.122
eness+0.120
est+0.113
Neg logit contributions
ll-0.177
 together-0.174
 far-0.168
 here-0.146
 glad-0.138
Token.
Token ID13
Feature activation-6.32e-03

Pos logit contributions
roach+0.028
 aids+0.027
eness+0.027
 phrase+0.026
paste+0.025
Neg logit contributions
 far-0.021
 together-0.020
ll-0.018
 here-0.018
 home-0.016
Token
Token ID198
Feature activation-3.15e-03

Pos logit contributions
 aids+0.016
roach+0.016
 gloss+0.015
est+0.015
 phrase+0.015
Neg logit contributions
ll-0.007
 far-0.006
 together-0.006
 glad-0.004
 here-0.004
Token
Token ID198
Feature activation-5.65e-03

Pos logit contributions
 aids+0.005
 phrase+0.005
 poses+0.005
est+0.005
 gloss+0.005
Neg logit contributions
ll-0.005
 ours-0.004
arl-0.004
 together-0.004
 far-0.004
TokenThe
Token ID464
Feature activation-4.35e-03

Pos logit contributions
 aids+0.013
roach+0.012
paste+0.012
est+0.012
 gloss+0.012
Neg logit contributions
ll-0.008
 together-0.007
 far-0.007
 glad-0.006
 here-0.005
Token shiny
Token ID22441
Feature activation-1.38e-02

Pos logit contributions
eness+0.063
roach+0.061
est+0.061
 aids+0.061
paste+0.058
Neg logit contributions
 far-0.033
 together-0.031
ll-0.027
 here-0.026
 glad-0.025
Token and
Token ID290
Feature activation-2.58e-02

Pos logit contributions
 aids+0.038
roach+0.036
 phrase+0.035
eness+0.035
 gloss+0.035
Neg logit contributions
 together-0.013
 far-0.013
ll-0.011
 here-0.010
 who-0.008
Token had
Token ID550
Feature activation-1.68e-02

Pos logit contributions
 aids+0.058
eness+0.057
roach+0.056
est+0.055
 phrase+0.055
Neg logit contributions
 together-0.034
ll-0.034
 far-0.034
 glad-0.027
 here-0.026
Token a
Token ID257
Feature activation+1.18e-02

Pos logit contributions
roach+0.043
eness+0.042
 aids+0.041
 gloss+0.041
 sensation+0.040
Neg logit contributions
 far-0.018
 together-0.016
ll-0.014
 many-0.014
 our-0.012
Token c
Token ID269
Feature activation+4.95e-04

Pos logit contributions
ll+0.020
 far+0.019
 together+0.019
 glad+0.016
 here+0.016
Neg logit contributions
roach-0.022
 aids-0.022
eness-0.022
est-0.021
paste-0.021
Tokenork
Token ID967
Feature activation-8.42e-02

Pos logit contributions
ll+0.001
 far+0.001
 together+0.001
arl+0.001
 here+0.001
Neg logit contributions
 aids-0.001
 sensation-0.001
eness-0.001
 phrase-0.001
est-0.001
Token on
Token ID319
Feature activation-2.55e-02

Pos logit contributions
roach+0.177
 aids+0.173
eness+0.172
 poses+0.165
 movements+0.164
Neg logit contributions
 far-0.119
 together-0.114
ll-0.110
 here-0.097
 close-0.085
Token top
Token ID1353
Feature activation-1.27e-02

Pos logit contributions
eness+0.066
roach+0.064
 gloss+0.064
paste+0.064
 aids+0.064
Neg logit contributions
 far-0.027
 together-0.025
 their-0.021
 our-0.021
 they-0.020
Token.
Token ID13
Feature activation-1.66e-02

Pos logit contributions
roach+0.032
 aids+0.032
eness+0.032
 phrase+0.031
paste+0.031
Neg logit contributions
 far-0.013
 together-0.013
ll-0.011
 here-0.010
 they-0.008
Token
Token ID198
Feature activation-7.43e-03

Pos logit contributions
 aids+0.038
 gloss+0.037
roach+0.037
paste+0.037
est+0.036
Neg logit contributions
ll-0.021
 far-0.018
 together-0.017
 glad-0.015
 here-0.014
Token
Token ID198
Feature activation-6.32e-03

Pos logit contributions
 aids+0.011
 gloss+0.011
 phrase+0.011
oline+0.010
paste+0.010
Neg logit contributions
ll-0.012
 ours-0.011
 glad-0.010
 dads-0.010
 far-0.010
Token red
Token ID2266
Feature activation-1.24e-02

Pos logit contributions
paste+0.030
eness+0.029
aces+0.029
 degrees+0.029
 aids+0.028
Neg logit contributions
 far-0.017
 together-0.016
 here-0.014
 home-0.012
 glad-0.012
Token button
Token ID4936
Feature activation-2.98e-02

Pos logit contributions
aces+0.025
 aids+0.025
paste+0.025
eness+0.025
 poses+0.024
Neg logit contributions
 together-0.017
 far-0.017
 home-0.014
 here-0.014
 play-0.014
Token that
Token ID326
Feature activation+2.69e-03

Pos logit contributions
roach+0.083
 aids+0.081
eness+0.080
 phrase+0.079
 poses+0.077
Neg logit contributions
 together-0.028
 far-0.026
ll-0.019
 here-0.018
 who-0.015
Token said
Token ID531
Feature activation-2.88e-02

Pos logit contributions
 far+0.003
 together+0.003
 they+0.003
ll+0.003
 there+0.002
Neg logit contributions
eness-0.007
roach-0.007
 aids-0.007
 phrase-0.006
est-0.006
Token "
Token ID366
Feature activation+8.12e-03

Pos logit contributions
roach+0.075
eness+0.073
 sensation+0.071
 poses+0.071
 oils+0.070
Neg logit contributions
 together-0.030
 far-0.030
 they-0.027
 here-0.025
 there-0.025
TokenDO
Token ID18227
Feature activation-8.42e-02

Pos logit contributions
ll+0.012
 together+0.011
 far+0.011
We+0.009
 here+0.009
Neg logit contributions
 aids-0.017
roach-0.016
paste-0.016
eness-0.016
 phrase-0.016
Token NOT
Token ID5626
Feature activation-2.14e-02

Pos logit contributions
 sensation+0.125
roach+0.124
eness+0.123
paste+0.123
 gloss+0.123
Neg logit contributions
 far-0.164
ll-0.161
 together-0.156
 here-0.153
arl-0.145
Token PRESS
Token ID44310
Feature activation-6.89e-02

Pos logit contributions
paste+0.040
eness+0.040
 sensation+0.039
est+0.039
 aids+0.039
Neg logit contributions
 far-0.035
 together-0.032
ll-0.031
 here-0.031
 our-0.030
Token."
Token ID526
Feature activation-1.37e-02

Pos logit contributions
 gloss+0.138
 sensation+0.135
paste+0.134
 aids+0.134
roach+0.133
Neg logit contributions
 together-0.101
ll-0.099
 far-0.094
 here-0.086
 our-0.074
Token She
Token ID1375
Feature activation-1.69e-02

Pos logit contributions
eness+0.034
paste+0.034
roach+0.034
 phrase+0.034
est+0.033
Neg logit contributions
 together-0.013
 far-0.012
ll-0.012
 They-0.009
 home-0.009
Token couldn
Token ID3521
Feature activation-7.38e-03

Pos logit contributions
roach+0.043
eness+0.043
 aids+0.042
est+0.042
 sensation+0.041
Neg logit contributions
ll-0.018
 together-0.018
 far-0.017
 who-0.014
 close-0.013
Token her
Token ID607
Feature activation-1.56e-02

Pos logit contributions
roach+0.014
 aids+0.014
paste+0.014
 phrase+0.014
eness+0.014
Neg logit contributions
ll-0.007
 together-0.007
 far-0.006
 here-0.005
 glad-0.005
Token to
Token ID284
Feature activation-1.61e-02

Pos logit contributions
roach+0.040
eness+0.040
 aids+0.039
est+0.039
 sensation+0.039
Neg logit contributions
ll-0.015
 together-0.014
 far-0.013
 glad-0.010
 here-0.010
Token polish
Token ID25245
Feature activation-4.44e-03

Pos logit contributions
 aids+0.034
roach+0.033
est+0.033
paste+0.033
eness+0.032
Neg logit contributions
ll-0.023
 together-0.021
 far-0.020
 glad-0.018
 here-0.017
Token the
Token ID262
Feature activation-1.62e-02

Pos logit contributions
 aids+0.013
 phrase+0.012
tern+0.012
eness+0.012
roach+0.012
Neg logit contributions
 together-0.004
 far-0.004
 their-0.003
 your-0.003
 our-0.003
Token do
Token ID466
Feature activation-5.50e-02

Pos logit contributions
 aids+0.029
eness+0.028
roach+0.028
paste+0.027
 gloss+0.027
Neg logit contributions
ll-0.029
 far-0.028
 together-0.027
 here-0.024
 glad-0.023
Tokenork
Token ID967
Feature activation-8.40e-02

Pos logit contributions
roach+0.089
tern+0.088
eness+0.087
 winds+0.083
 streaks+0.081
Neg logit contributions
 together-0.108
 far-0.105
ll-0.105
 they-0.088
 here-0.087
Tokennob
Token ID34952
Feature activation-3.62e-02

Pos logit contributions
 aids+0.085
roach+0.083
eness+0.080
 phrase+0.078
 sensation+0.076
Neg logit contributions
 together-0.217
ll-0.215
 far-0.213
 here-0.192
 close-0.179
Token.
Token ID13
Feature activation-1.57e-02

Pos logit contributions
 aids+0.088
roach+0.087
 phrase+0.085
eness+0.084
paste+0.084
Neg logit contributions
 together-0.043
 far-0.041
ll-0.041
 here-0.031
 glad-0.026
Token Lily
Token ID20037
Feature activation-2.54e-02

Pos logit contributions
 aids+0.035
 gloss+0.035
roach+0.035
paste+0.034
est+0.034
Neg logit contributions
ll-0.021
 far-0.018
 together-0.017
 glad-0.015
 here-0.014
Token didn
Token ID1422
Feature activation-1.90e-02

Pos logit contributions
roach+0.055
 aids+0.055
eness+0.053
paste+0.052
 phrase+0.051
Neg logit contributions
ll-0.040
 together-0.036
 far-0.033
 glad-0.027
 here-0.026
Token't
Token ID470
Feature activation-8.35e-03

Pos logit contributions
 aids+0.046
roach+0.046
est+0.046
eness+0.046
 phrase+0.045
Neg logit contributions
ll-0.021
 together-0.019
 far-0.018
 here-0.014
 glad-0.014
Token to
Token ID284
Feature activation-1.97e-02

Pos logit contributions
 aids+0.056
roach+0.056
est+0.055
paste+0.055
eness+0.055
Neg logit contributions
ll-0.034
 together-0.030
 far-0.029
 glad-0.026
We-0.024
Token be
Token ID307
Feature activation-1.54e-02

Pos logit contributions
 aids+0.041
roach+0.040
eness+0.039
est+0.039
paste+0.039
Neg logit contributions
ll-0.029
 together-0.025
 far-0.023
 glad-0.022
We-0.020
Token restored
Token ID15032
Feature activation+5.59e-04

Pos logit contributions
 aids+0.034
roach+0.034
paste+0.033
est+0.033
 gloss+0.033
Neg logit contributions
ll-0.021
 together-0.019
 far-0.018
 glad-0.015
 here-0.015
Token to
Token ID284
Feature activation-1.41e-02

Pos logit contributions
ll+0.001
 together+0.001
 far+0.001
 glad+0.001
We+0.001
Neg logit contributions
 aids-0.001
est-0.001
roach-0.001
eness-0.001
paste-0.001
Token its
Token ID663
Feature activation-1.69e-02

Pos logit contributions
 aids+0.029
roach+0.028
est+0.028
eness+0.028
paste+0.027
Neg logit contributions
ll-0.022
 together-0.018
 far-0.018
 glad-0.016
We-0.015
Token former
Token ID1966
Feature activation-8.20e-02

Pos logit contributions
 aids+0.034
roach+0.034
paste+0.033
 gloss+0.033
eness+0.033
Neg logit contributions
ll-0.026
 together-0.024
 far-0.024
 glad-0.020
 here-0.020
Token glory
Token ID13476
Feature activation-2.80e-02

Pos logit contributions
 gloss+0.117
paste+0.116
roach+0.111
 aids+0.103
 sensation+0.096
Neg logit contributions
ll-0.174
 far-0.170
 together-0.169
 home-0.151
 our-0.149
Token."
Token ID526
Feature activation-2.01e-03

Pos logit contributions
 aids+0.066
roach+0.066
 gloss+0.065
paste+0.064
eness+0.064
Neg logit contributions
ll-0.033
 together-0.031
 far-0.030
 here-0.024
 glad-0.023
Token
Token ID220
Feature activation+1.91e-02

Pos logit contributions
 aids+0.005
est+0.005
 gloss+0.004
roach+0.004
 phrase+0.004
Neg logit contributions
ll-0.002
 far-0.002
 together-0.002
 here-0.002
 glad-0.002
Token
Token ID198
Feature activation+3.05e-03

Pos logit contributions
ll+0.021
 far+0.020
 together+0.018
prise+0.016
 glad+0.015
Neg logit contributions
 aids-0.043
paste-0.041
 gloss-0.041
 phrase-0.040
est-0.040
Token
Token ID198
Feature activation+4.65e-03

Pos logit contributions
 ours+0.004
ll+0.004
 far+0.004
 together+0.003
 here+0.003
Neg logit contributions
 gloss-0.006
 phrase-0.006
 aids-0.006
 brist-0.006
est-0.006
Token pol
Token ID755
Feature activation+1.32e-02

Pos logit contributions
 together+0.004
ll+0.004
 far+0.004
 glad+0.003
 here+0.003
Neg logit contributions
roach-0.008
 aids-0.007
paste-0.007
eness-0.007
 phrase-0.007
Tokenishing
Token ID3929
Feature activation-1.75e-02

Pos logit contributions
ll+0.020
 together+0.019
 far+0.018
 here+0.015
 glad+0.014
Neg logit contributions
 aids-0.028
 phrase-0.027
roach-0.027
 gloss-0.027
paste-0.027
Token all
Token ID477
Feature activation+2.64e-02

Pos logit contributions
 aids+0.041
roach+0.040
paste+0.039
 gloss+0.039
 scalp+0.039
Neg logit contributions
 together-0.023
 far-0.021
 their-0.016
ll-0.016
 here-0.015
Token the
Token ID262
Feature activation-2.67e-02

Pos logit contributions
ll+0.033
 together+0.032
 far+0.031
 here+0.024
 glad+0.024
Neg logit contributions
paste-0.061
 aids-0.061
roach-0.060
 gloss-0.060
eness-0.059
Token do
Token ID466
Feature activation-4.46e-02

Pos logit contributions
 aids+0.050
paste+0.049
roach+0.048
eness+0.048
 gloss+0.048
Neg logit contributions
ll-0.045
 far-0.043
 together-0.042
 here-0.036
 glad-0.035
Tokenork
Token ID967
Feature activation-8.01e-02

Pos logit contributions
roach+0.100
tern+0.094
paste+0.094
eness+0.094
 gloss+0.093
Neg logit contributions
 together-0.063
 far-0.059
ll-0.056
 here-0.046
 they-0.043
Tokenn
Token ID77
Feature activation-4.33e-02

Pos logit contributions
roach+0.078
 aids+0.075
 scalp+0.070
 gloss+0.068
 sensation+0.067
Neg logit contributions
 together-0.210
ll-0.210
 far-0.202
 here-0.180
We-0.171
Tokenobs
Token ID8158
Feature activation-1.86e-02

Pos logit contributions
roach+0.087
 aids+0.087
paste+0.086
 phrase+0.085
 gloss+0.084
Neg logit contributions
ll-0.067
 together-0.061
 far-0.059
 glad-0.050
 here-0.049
Token.
Token ID13
Feature activation-1.13e-03

Pos logit contributions
 aids+0.043
roach+0.043
 phrase+0.042
 poses+0.040
paste+0.040
Neg logit contributions
 together-0.024
ll-0.023
 far-0.022
 here-0.017
 glad-0.016
Token
Token ID220
Feature activation+1.48e-02

Pos logit contributions
 aids+0.002
roach+0.002
paste+0.002
eness+0.002
est+0.002
Neg logit contributions
ll-0.002
 together-0.002
 far-0.002
 glad-0.001
 here-0.001
Token
Token ID198
Feature activation+3.63e-03

Pos logit contributions
ll+0.015
 far+0.013
 together+0.012
 glad+0.009
 here+0.009
Neg logit contributions
 aids-0.037
 gloss-0.036
paste-0.036
roach-0.036
eness-0.035
Tokenork
Token ID967
Feature activation-8.50e-02

Pos logit contributions
roach+0.081
tern+0.078
eness+0.077
 scalp+0.076
paste+0.075
Neg logit contributions
 together-0.061
 far-0.061
ll-0.054
 here-0.049
 who-0.046
Tokennob
Token ID34952
Feature activation-2.98e-02

Pos logit contributions
roach+0.090
 aids+0.087
 scalp+0.084
 sensation+0.080
 gloss+0.079
Neg logit contributions
 together-0.213
ll-0.211
 far-0.209
 here-0.187
 who-0.175
Token and
Token ID290
Feature activation+3.35e-03

Pos logit contributions
roach+0.073
 aids+0.072
 phrase+0.069
paste+0.068
 gloss+0.068
Neg logit contributions
 together-0.037
 far-0.034
ll-0.032
 here-0.026
 they-0.022
Token the
Token ID262
Feature activation-3.69e-03

Pos logit contributions
 together+0.005
 far+0.005
ll+0.005
 here+0.004
 glad+0.004
Neg logit contributions
roach-0.007
 aids-0.007
eness-0.007
est-0.007
 phrase-0.007
Token do
Token ID466
Feature activation-3.56e-02

Pos logit contributions
eness+0.007
 aids+0.007
roach+0.007
paste+0.007
 gloss+0.007
Neg logit contributions
 far-0.005
 together-0.005
ll-0.005
 here-0.004
 glad-0.004
Tokenork
Token ID967
Feature activation-7.92e-02

Pos logit contributions
roach+0.059
eness+0.056
tern+0.055
 brist+0.055
 scalp+0.054
Neg logit contributions
 far-0.067
 together-0.064
ll-0.063
 here-0.056
 there-0.053
Tokennob
Token ID34952
Feature activation-3.10e-02

Pos logit contributions
roach+0.090
 aids+0.087
 scalp+0.082
 sensation+0.080
 gloss+0.080
Neg logit contributions
ll-0.192
 together-0.192
 far-0.187
 here-0.167
 who-0.156
Token sh
Token ID427
Feature activation-2.45e-02

Pos logit contributions
roach+0.072
 aids+0.072
 phrase+0.069
 sensation+0.067
paste+0.067
Neg logit contributions
 together-0.042
 far-0.040
ll-0.036
 here-0.029
 they-0.027
Tokenined
Token ID1389
Feature activation+1.36e-02

Pos logit contributions
 movements+0.043
 poses+0.042
paste+0.041
 aids+0.040
tern+0.040
Neg logit contributions
ll-0.045
 together-0.038
 far-0.035
arl-0.034
 here-0.032
Token!
Token ID0
Feature activation-7.27e-03

Pos logit contributions
 together+0.018
 far+0.016
 here+0.012
ll+0.012
 they+0.010
Neg logit contributions
 aids-0.034
roach-0.033
 phrase-0.032
 poses-0.032
 gloss-0.032
Token She
Token ID1375
Feature activation-5.44e-03

Pos logit contributions
 aids+0.016
roach+0.016
paste+0.016
eness+0.016
est+0.015
Neg logit contributions
ll-0.010
 together-0.009
 far-0.009
 glad-0.007
 here-0.007
Token present
Token ID1944
Feature activation-7.99e-03

Pos logit contributions
 aids+0.038
roach+0.037
eness+0.036
paste+0.036
der+0.035
Neg logit contributions
 together-0.025
 far-0.024
ll-0.023
 here-0.022
 home-0.021
Token?"
Token ID1701
Feature activation-1.43e-02

Pos logit contributions
 aids+0.019
roach+0.019
paste+0.019
eness+0.019
 phrase+0.019
Neg logit contributions
ll-0.009
 together-0.008
 far-0.008
 here-0.007
 glad-0.006
Token
Token ID198
Feature activation-4.53e-03

Pos logit contributions
 aids+0.029
roach+0.028
paste+0.028
 sensation+0.028
eness+0.028
Neg logit contributions
ll-0.022
 together-0.022
 far-0.020
 glad-0.017
We-0.017
Token
Token ID198
Feature activation-1.94e-03

Pos logit contributions
 gloss+0.011
 aids+0.011
 phrase+0.011
roach+0.011
est+0.010
Neg logit contributions
ll-0.005
 far-0.004
 together-0.004
 glad-0.003
 here-0.003
TokenB
Token ID33
Feature activation+1.67e-02

Pos logit contributions
 aids+0.004
roach+0.004
paste+0.004
est+0.004
eness+0.004
Neg logit contributions
ll-0.003
 together-0.003
 far-0.003
 here-0.002
 glad-0.002
Tokenree
Token ID631
Feature activation-7.90e-02

Pos logit contributions
ll+0.031
 together+0.029
 far+0.028
 here+0.024
 glad+0.024
Neg logit contributions
 aids-0.029
roach-0.028
 sensation-0.028
est-0.028
paste-0.027
Token said
Token ID531
Feature activation-1.94e-02

Pos logit contributions
 sensation+0.160
roach+0.158
eness+0.156
est+0.155
 aids+0.154
Neg logit contributions
ll-0.126
 together-0.122
 far-0.101
 who-0.092
 named-0.091
Token,
Token ID11
Feature activation-2.94e-02

Pos logit contributions
roach+0.047
 aids+0.045
eness+0.045
 sensation+0.045
 poses+0.044
Neg logit contributions
 together-0.024
ll-0.022
 far-0.022
 here-0.018
 glad-0.017
Token "
Token ID366
Feature activation+1.85e-02

Pos logit contributions
roach+0.063
 aids+0.062
eness+0.062
paste+0.061
est+0.061
Neg logit contributions
 together-0.042
ll-0.040
 far-0.039
 glad-0.034
 here-0.033
TokenWell
Token ID5779
Feature activation-5.49e-03

Pos logit contributions
ll+0.018
 together+0.016
 far+0.016
We+0.014
 glad+0.011
Neg logit contributions
 aids-0.048
roach-0.047
eness-0.046
paste-0.046
est-0.046
Token,
Token ID11
Feature activation+9.66e-04

Pos logit contributions
 aids+0.010
est+0.010
roach+0.010
eness+0.010
paste+0.010
Neg logit contributions
ll-0.009
 together-0.008
 far-0.008
We-0.007
 Enjoy-0.007
Token said
Token ID531
Feature activation-1.50e-02

Pos logit contributions
eness+0.009
 aids+0.008
roach+0.008
est+0.008
 movements+0.008
Neg logit contributions
ll-0.005
 together-0.005
 far-0.004
 they-0.004
We-0.004
Token,
Token ID11
Feature activation-2.40e-02

Pos logit contributions
roach+0.036
eness+0.035
 phrase+0.034
 aids+0.034
 sensation+0.034
Neg logit contributions
 together-0.018
ll-0.017
 far-0.016
 here-0.015
 glad-0.014
Token "
Token ID366
Feature activation+2.08e-02

Pos logit contributions
roach+0.047
est+0.047
 aids+0.047
eness+0.047
paste+0.045
Neg logit contributions
ll-0.037
 together-0.036
 far-0.034
 glad-0.031
 here-0.029
TokenAnt
Token ID13217
Feature activation+4.60e-03

Pos logit contributions
ll+0.032
 together+0.029
 far+0.029
 glad+0.025
 here+0.024
Neg logit contributions
 aids-0.042
roach-0.042
paste-0.041
 gloss-0.041
est-0.041
Tokenif
Token ID361
Feature activation+4.75e-02

Pos logit contributions
ll+0.006
 together+0.006
 far+0.006
 glad+0.005
 here+0.005
Neg logit contributions
 aids-0.010
roach-0.010
paste-0.009
est-0.009
 gloss-0.009
Tokenree
Token ID631
Feature activation-7.90e-02

Pos logit contributions
ll+0.047
 together+0.041
 far+0.036
 glad+0.033
 Enjoy+0.030
Neg logit contributions
paste-0.116
roach-0.114
 aids-0.114
 gloss-0.111
eness-0.111
Tokenze
Token ID2736
Feature activation-1.84e-02

Pos logit contributions
 sensation+0.106
roach+0.105
 aids+0.104
 phrase+0.101
eness+0.098
Neg logit contributions
ll-0.168
 together-0.166
 far-0.155
 here-0.145
 our-0.141
Token.
Token ID13
Feature activation+2.01e-02

Pos logit contributions
 phrase+0.043
 aids+0.043
roach+0.043
 sensation+0.043
eness+0.043
Neg logit contributions
 together-0.021
ll-0.021
 far-0.020
 here-0.017
 our-0.016
Token Poison
Token ID19357
Feature activation-2.36e-02

Pos logit contributions
ll+0.037
 together+0.034
 far+0.033
 glad+0.029
 here+0.028
Neg logit contributions
 aids-0.035
roach-0.035
paste-0.034
 gloss-0.034
est-0.034
Token.
Token ID13
Feature activation+2.04e-02

Pos logit contributions
 aids+0.052
 sensation+0.050
 phrase+0.050
 gloss+0.050
 movements+0.049
Neg logit contributions
ll-0.032
 together-0.030
 far-0.028
 here-0.025
arl-0.023
Token Do
Token ID2141
Feature activation-1.46e-02

Pos logit contributions
ll+0.037
 together+0.034
 far+0.034
 glad+0.029
 here+0.028
Neg logit contributions
 aids-0.036
roach-0.035
paste-0.035
 gloss-0.034
eness-0.034
Token she
Token ID673
Feature activation-3.77e-03

Pos logit contributions
 together+0.005
 far+0.005
ll+0.004
 they+0.004
 here+0.004
Neg logit contributions
roach-0.012
eness-0.012
 phrase-0.012
 aids-0.012
est-0.012
Token had
Token ID550
Feature activation+2.24e-03

Pos logit contributions
 aids+0.007
roach+0.007
paste+0.007
 gloss+0.007
est+0.007
Neg logit contributions
ll-0.006
 together-0.006
 far-0.006
 glad-0.005
 here-0.005
Token polished
Token ID23895
Feature activation-1.70e-02

Pos logit contributions
 together+0.003
ll+0.003
 far+0.003
 here+0.002
 their+0.002
Neg logit contributions
roach-0.005
 phrase-0.005
est-0.005
eness-0.005
 aids-0.005
Token the
Token ID262
Feature activation-2.77e-02

Pos logit contributions
 phrase+0.044
 aids+0.044
roach+0.041
tern+0.041
eness+0.041
Neg logit contributions
 together-0.020
 far-0.016
 their-0.013
 our-0.013
ll-0.013
Token do
Token ID466
Feature activation-5.42e-02

Pos logit contributions
eness+0.049
paste+0.048
 gloss+0.047
roach+0.047
 aids+0.047
Neg logit contributions
 far-0.050
 together-0.048
ll-0.047
 here-0.042
 glad-0.041
Tokenork
Token ID967
Feature activation-7.89e-02

Pos logit contributions
roach+0.087
eness+0.087
 winds+0.084
tern+0.083
der+0.083
Neg logit contributions
 together-0.103
 far-0.098
ll-0.095
 here-0.083
 they-0.081
Tokennob
Token ID34952
Feature activation-2.76e-02

Pos logit contributions
roach+0.099
 aids+0.096
 phrase+0.093
eness+0.091
 scalp+0.091
Neg logit contributions
 together-0.184
ll-0.181
 far-0.176
 here-0.156
 close-0.145
Token earlier
Token ID2961
Feature activation-4.14e-03

Pos logit contributions
 aids+0.063
roach+0.063
 phrase+0.061
paste+0.060
eness+0.060
Neg logit contributions
 together-0.036
ll-0.035
 far-0.034
 here-0.027
 glad-0.024
Token.
Token ID13
Feature activation-2.03e-03

Pos logit contributions
 aids+0.010
roach+0.010
 phrase+0.010
eness+0.010
est+0.010
Neg logit contributions
 together-0.005
 far-0.005
ll-0.004
 here-0.004
 they-0.003
Token She
Token ID1375
Feature activation-7.01e-03

Pos logit contributions
 aids+0.004
 gloss+0.004
roach+0.004
paste+0.004
est+0.004
Neg logit contributions
ll-0.003
 far-0.003
 together-0.003
 here-0.002
 glad-0.002
Token thought
Token ID1807
Feature activation-4.47e-03

Pos logit contributions
 aids+0.015
roach+0.015
eness+0.015
est+0.014
paste+0.014
Neg logit contributions
ll-0.010
 together-0.009
 far-0.009
 glad-0.008
 here-0.007
Token.
Token ID13
Feature activation-9.86e-03

Pos logit contributions
 aids+0.069
roach+0.067
 phrase+0.066
 gloss+0.065
eness+0.065
Neg logit contributions
ll-0.036
 together-0.035
 far-0.033
 here-0.026
 they-0.023
Token Jake
Token ID14757
Feature activation-1.89e-02

Pos logit contributions
roach+0.019
eness+0.019
paste+0.019
est+0.019
 sensation+0.018
Neg logit contributions
 together-0.015
ll-0.015
 far-0.014
 glad-0.013
We-0.012
Token had
Token ID550
Feature activation-2.05e-03

Pos logit contributions
roach+0.042
eness+0.041
est+0.040
 aids+0.040
 sensation+0.040
Neg logit contributions
ll-0.027
 together-0.025
 far-0.024
 who-0.020
 here-0.019
Token never
Token ID1239
Feature activation-1.28e-02

Pos logit contributions
est+0.005
eness+0.005
roach+0.005
 aids+0.005
 phrase+0.004
Neg logit contributions
ll-0.003
 far-0.003
 together-0.002
 here-0.002
 glad-0.002
Token had
Token ID550
Feature activation+2.74e-03

Pos logit contributions
roach+0.029
eness+0.029
 phrase+0.029
 aids+0.029
paste+0.029
Neg logit contributions
ll-0.016
 far-0.015
 together-0.014
 glad-0.012
 here-0.012
Token por
Token ID16964
Feature activation-7.83e-02

Pos logit contributions
 far+0.003
 together+0.003
ll+0.003
 here+0.002
 many+0.002
Neg logit contributions
eness-0.007
 aids-0.006
 phrase-0.006
roach-0.006
est-0.006
Tokenridge
Token ID12818
Feature activation-3.43e-02

Pos logit contributions
 aids+0.091
 eerie+0.090
 sensation+0.088
est+0.085
 phrase+0.080
Neg logit contributions
ll-0.195
 together-0.163
 Enjoy-0.163
 far-0.147
arl-0.146
Token before
Token ID878
Feature activation-7.90e-03

Pos logit contributions
 aids+0.060
roach+0.057
 markings+0.056
 streaks+0.055
eness+0.055
Neg logit contributions
ll-0.064
 together-0.063
 far-0.062
 here-0.057
 glad-0.050
Token and
Token ID290
Feature activation-4.72e-03

Pos logit contributions
roach+0.018
 aids+0.018
 gloss+0.018
paste+0.018
 phrase+0.018
Neg logit contributions
ll-0.010
 together-0.010
 far-0.009
 here-0.008
 glad-0.007
Token he
Token ID339
Feature activation+3.63e-03

Pos logit contributions
eness+0.011
roach+0.011
 phrase+0.011
 aids+0.011
est+0.011
Neg logit contributions
 together-0.006
ll-0.005
 far-0.005
 they-0.005
 here-0.004
Token was
Token ID373
Feature activation-2.63e-03

Pos logit contributions
ll+0.005
 far+0.004
 together+0.004
 glad+0.004
We+0.003
Neg logit contributions
roach-0.008
eness-0.008
 aids-0.008
est-0.008
 phrase-0.008
Token was
Token ID373
Feature activation-4.67e-03

Pos logit contributions
 aids+0.010
roach+0.010
paste+0.010
eness+0.010
est+0.010
Neg logit contributions
ll-0.007
 together-0.006
 far-0.006
 glad-0.005
 here-0.005
Token getting
Token ID1972
Feature activation-9.57e-03

Pos logit contributions
 aids+0.010
roach+0.010
paste+0.010
eness+0.010
est+0.010
Neg logit contributions
ll-0.006
 together-0.006
 far-0.005
 glad-0.005
 here-0.004
Token ready
Token ID3492
Feature activation-7.68e-03

Pos logit contributions
 aids+0.022
roach+0.022
paste+0.022
 gloss+0.022
est+0.022
Neg logit contributions
ll-0.012
 together-0.010
 far-0.010
 glad-0.008
 here-0.007
Token to
Token ID284
Feature activation-3.71e-03

Pos logit contributions
 aids+0.020
roach+0.020
 gloss+0.020
paste+0.020
eness+0.020
Neg logit contributions
ll-0.007
 together-0.007
 far-0.006
 here-0.005
 glad-0.004
Token make
Token ID787
Feature activation-4.35e-02

Pos logit contributions
 aids+0.008
roach+0.008
paste+0.008
eness+0.008
 gloss+0.008
Neg logit contributions
ll-0.005
 together-0.004
 far-0.004
 here-0.004
 glad-0.004
Token por
Token ID16964
Feature activation-7.79e-02

Pos logit contributions
roach+0.094
eness+0.088
 phrase+0.088
 aids+0.088
 gloss+0.088
Neg logit contributions
 together-0.068
 far-0.061
ll-0.054
 our-0.053
 here-0.052
Tokenridge
Token ID12818
Feature activation-2.96e-02

Pos logit contributions
 sensation+0.085
est+0.081
roach+0.078
 aids+0.078
 phrase+0.073
Neg logit contributions
ll-0.199
 together-0.177
 far-0.165
 here-0.155
 Enjoy-0.155
Token and
Token ID290
Feature activation-1.04e-02

Pos logit contributions
roach+0.072
 aids+0.072
 phrase+0.069
 gloss+0.069
eness+0.069
Neg logit contributions
 together-0.035
ll-0.034
 far-0.032
 here-0.025
 our-0.022
Token pancakes
Token ID47008
Feature activation-9.00e-03

Pos logit contributions
 aids+0.022
roach+0.022
eness+0.021
est+0.021
 phrase+0.021
Neg logit contributions
ll-0.015
 together-0.015
 far-0.014
 glad-0.012
 here-0.012
Token.
Token ID13
Feature activation-4.40e-03

Pos logit contributions
 aids+0.021
roach+0.020
 phrase+0.019
eness+0.019
 gloss+0.019
Neg logit contributions
ll-0.012
 together-0.012
 far-0.011
 here-0.009
 glad-0.008
Token He
Token ID679
Feature activation-6.26e-03

Pos logit contributions
roach+0.009
 aids+0.009
paste+0.009
 gloss+0.009
eness+0.009
Neg logit contributions
ll-0.007
 together-0.006
 far-0.006
 glad-0.005
 here-0.005
Token with
Token ID351
Feature activation+3.02e-02

Pos logit contributions
ll+0.010
 together+0.008
 far+0.008
 glad+0.006
 here+0.006
Neg logit contributions
 aids-0.024
roach-0.023
paste-0.023
est-0.023
eness-0.023
Token tools
Token ID4899
Feature activation-5.08e-03

Pos logit contributions
ll+0.033
 together+0.033
 far+0.032
 here+0.024
 glad+0.024
Neg logit contributions
 aids-0.073
roach-0.072
eness-0.072
paste-0.071
 phrase-0.071
Token.
Token ID13
Feature activation+3.09e-04

Pos logit contributions
 aids+0.010
roach+0.010
paste+0.010
est+0.010
eness+0.010
Neg logit contributions
ll-0.008
 together-0.007
 far-0.007
 glad-0.006
 here-0.006
Token They
Token ID1119
Feature activation+1.11e-02

Pos logit contributions
ll+0.000
 together+0.000
 far+0.000
 glad+0.000
 here+0.000
Neg logit contributions
 aids-0.001
roach-0.001
paste-0.001
est-0.001
 gloss-0.001
Token have
Token ID423
Feature activation-1.46e-02

Pos logit contributions
ll+0.019
 together+0.017
arl+0.015
 glad+0.015
 here+0.014
Neg logit contributions
roach-0.020
 brist-0.019
 markings-0.019
 bolts-0.018
paste-0.018
Token hamm
Token ID25755
Feature activation-7.68e-02

Pos logit contributions
 aids+0.030
roach+0.030
paste+0.030
est+0.029
eness+0.029
Neg logit contributions
ll-0.022
 together-0.020
 far-0.020
 glad-0.017
 here-0.016
Tokeners
Token ID364
Feature activation+8.47e-03

Pos logit contributions
 aids+0.098
 phrase+0.092
 gloss+0.092
 scalp+0.091
paste+0.089
Neg logit contributions
ll-0.181
 together-0.172
 far-0.167
 here-0.143
arl-0.140
Token,
Token ID11
Feature activation-2.70e-02

Pos logit contributions
ll+0.012
 together+0.011
 far+0.011
 glad+0.009
 here+0.009
Neg logit contributions
 aids-0.017
paste-0.017
roach-0.017
est-0.017
 gloss-0.017
Token saw
Token ID2497
Feature activation-2.56e-02

Pos logit contributions
 aids+0.037
roach+0.036
paste+0.035
est+0.035
eness+0.035
Neg logit contributions
ll-0.059
 together-0.055
 far-0.055
 glad-0.049
 here-0.048
Tokens
Token ID82
Feature activation-7.52e-03

Pos logit contributions
 aids+0.053
roach+0.052
 phrase+0.050
paste+0.050
 gloss+0.050
Neg logit contributions
ll-0.038
 together-0.037
 far-0.037
We-0.029
 glad-0.029
Token,
Token ID11
Feature activation-3.04e-02

Pos logit contributions
 aids+0.016
roach+0.016
paste+0.015
 phrase+0.015
 gloss+0.015
Neg logit contributions
ll-0.011
 together-0.010
 far-0.010
 here-0.008
 glad-0.008
Token Sara
Token ID24799
Feature activation-1.49e-02

Pos logit contributions
 aids+0.010
roach+0.010
paste+0.009
 gloss+0.009
est+0.009
Neg logit contributions
ll-0.006
 together-0.005
 far-0.005
 glad-0.004
 here-0.004
Token said
Token ID531
Feature activation-1.56e-02

Pos logit contributions
roach+0.031
 aids+0.031
eness+0.030
paste+0.030
est+0.030
Neg logit contributions
ll-0.023
 together-0.021
 far-0.020
 glad-0.017
 here-0.016
Token,
Token ID11
Feature activation-2.66e-02

Pos logit contributions
roach+0.036
 aids+0.035
eness+0.034
 sensation+0.034
paste+0.034
Neg logit contributions
 together-0.020
ll-0.020
 far-0.019
 here-0.016
 glad-0.016
Token "
Token ID366
Feature activation+1.47e-02

Pos logit contributions
roach+0.055
 aids+0.054
eness+0.054
est+0.054
paste+0.053
Neg logit contributions
ll-0.039
 together-0.039
 far-0.037
 glad-0.033
 here-0.032
TokenC
Token ID34
Feature activation+4.39e-03

Pos logit contributions
ll+0.018
 together+0.016
 far+0.016
We+0.015
 here+0.012
Neg logit contributions
 aids-0.034
roach-0.033
paste-0.033
eness-0.033
 phrase-0.032
Tokenrick
Token ID5557
Feature activation-7.62e-02

Pos logit contributions
ll+0.008
 together+0.008
 far+0.008
 glad+0.007
 here+0.007
Neg logit contributions
 aids-0.007
roach-0.007
est-0.007
eness-0.007
paste-0.007
Tokenet
Token ID316
Feature activation-2.13e-02

Pos logit contributions
 aids+0.107
paste+0.106
 gloss+0.106
 sensation+0.101
 streaks+0.100
Neg logit contributions
ll-0.160
 together-0.151
 far-0.147
 here-0.136
 our-0.133
Token,
Token ID11
Feature activation-1.03e-02

Pos logit contributions
 aids+0.046
paste+0.044
 sensation+0.044
roach+0.043
 phrase+0.043
Neg logit contributions
ll-0.033
 together-0.029
 far-0.027
 here-0.024
 our-0.023
Token is
Token ID318
Feature activation-2.78e-03

Pos logit contributions
roach+0.023
 aids+0.023
eness+0.022
est+0.022
 phrase+0.022
Neg logit contributions
ll-0.013
 far-0.013
 together-0.012
 here-0.011
 glad-0.010
Token this
Token ID428
Feature activation-1.05e-02

Pos logit contributions
eness+0.007
est+0.007
roach+0.007
 aids+0.007
 phrase+0.007
Neg logit contributions
 far-0.003
 together-0.003
ll-0.003
 here-0.003
 glad-0.002
Token your
Token ID534
Feature activation-7.33e-03

Pos logit contributions
eness+0.023
 scalp+0.022
 aids+0.022
paste+0.022
roach+0.021
Neg logit contributions
ll-0.015
 together-0.014
 far-0.014
 here-0.013
 our-0.012
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
ll+0.023
 far+0.020
 together+0.020
 glad+0.017
 here+0.017
Neg logit contributions
 aids-0.032
roach-0.032
eness-0.031
est-0.031
paste-0.031
Token She
Token ID1375
Feature activation+2.14e-02

Pos logit contributions
ll+0.021
 together+0.017
 far+0.017
prise+0.016
 here+0.015
Neg logit contributions
 aids-0.017
der-0.017
 gloss-0.017
roach-0.016
 brist-0.016
Token said
Token ID531
Feature activation+9.40e-05

Pos logit contributions
ll+0.033
 far+0.030
 together+0.030
arl+0.027
 Enjoy+0.026
Neg logit contributions
 gloss-0.039
paste-0.039
 aids-0.038
roach-0.038
est-0.037
Token they
Token ID484
Feature activation+2.97e-03

Pos logit contributions
ll+0.000
 far+0.000
 together+0.000
We+0.000
 glad+0.000
Neg logit contributions
paste-0.000
 gloss-0.000
 aids-0.000
est-0.000
roach-0.000
Token were
Token ID547
Feature activation+2.06e-02

Pos logit contributions
ll+0.005
 far+0.005
 together+0.004
arl+0.004
 here+0.004
Neg logit contributions
 gloss-0.005
roach-0.005
paste-0.005
est-0.005
eness-0.005
Token good
Token ID922
Feature activation+3.46e-02

Pos logit contributions
ll+0.028
 together+0.022
 far+0.022
We+0.019
arl+0.019
Neg logit contributions
 aids-0.043
roach-0.042
paste-0.041
 gloss-0.041
 phrase-0.041
Token friends
Token ID2460
Feature activation+2.37e-02

Pos logit contributions
ll+0.054
 far+0.051
 together+0.046
 glad+0.042
 Enjoy+0.040
Neg logit contributions
eness-0.066
est-0.065
 gloss-0.065
 aids-0.064
paste-0.063
Token and
Token ID290
Feature activation+2.39e-02

Pos logit contributions
ll+0.038
 together+0.033
 far+0.033
 glad+0.029
 here+0.028
Neg logit contributions
 aids-0.046
paste-0.045
roach-0.045
est-0.045
eness-0.045
Token gave
Token ID2921
Feature activation+3.86e-02

Pos logit contributions
ll+0.031
 far+0.026
 together+0.025
arl+0.021
 here+0.021
Neg logit contributions
 aids-0.053
 gloss-0.053
paste-0.053
roach-0.052
est-0.051
Token them
Token ID606
Feature activation+7.33e-03

Pos logit contributions
ll+0.055
 far+0.044
 glad+0.041
 together+0.041
arl+0.040
Neg logit contributions
 aids-0.081
est-0.081
 gloss-0.080
roach-0.079
paste-0.079
Token a
Token ID257
Feature activation+2.11e-02

Pos logit contributions
ll+0.011
 together+0.010
 far+0.010
 here+0.008
 glad+0.008
Neg logit contributions
 aids-0.016
roach-0.015
eness-0.015
 phrase-0.015
est-0.015
Token they
Token ID484
Feature activation+5.47e-03

Pos logit contributions
roach+0.002
 aids+0.002
eness+0.002
est+0.002
 phrase+0.002
Neg logit contributions
ll-0.000
 together-0.000
 far-0.000
 glad-0.000
 here-0.000
Token were
Token ID547
Feature activation+2.26e-02

Pos logit contributions
ll+0.008
 together+0.007
 far+0.007
 here+0.006
 glad+0.006
Neg logit contributions
 aids-0.011
roach-0.011
paste-0.011
est-0.011
eness-0.011
Token happy
Token ID3772
Feature activation+2.88e-02

Pos logit contributions
ll+0.032
 together+0.025
 far+0.024
arl+0.021
We+0.020
Neg logit contributions
 aids-0.049
paste-0.047
roach-0.046
 phrase-0.046
est-0.045
Token and
Token ID290
Feature activation+1.62e-02

Pos logit contributions
ll+0.045
 far+0.041
 together+0.036
arl+0.035
 glad+0.034
Neg logit contributions
 aids-0.054
paste-0.053
der-0.053
 poses-0.053
est-0.052
Token they
Token ID484
Feature activation+2.97e-02

Pos logit contributions
ll+0.021
 far+0.016
 together+0.016
arl+0.014
 here+0.013
Neg logit contributions
 aids-0.036
paste-0.035
 gloss-0.035
est-0.035
eness-0.034
Token hugged
Token ID41130
Feature activation+4.78e-02

Pos logit contributions
ll+0.044
 together+0.039
 far+0.039
 here+0.034
 glad+0.034
Neg logit contributions
roach-0.058
est-0.058
 aids-0.058
paste-0.057
eness-0.057
Token Ben
Token ID3932
Feature activation+4.64e-03

Pos logit contributions
ll+0.051
 together+0.049
 far+0.046
 glad+0.035
 here+0.035
Neg logit contributions
roach-0.117
 aids-0.117
paste-0.115
 gloss-0.114
est-0.114
Token and
Token ID290
Feature activation+4.28e-02

Pos logit contributions
ll+0.006
 together+0.006
 far+0.006
 glad+0.005
 here+0.005
Neg logit contributions
 aids-0.010
roach-0.010
paste-0.010
 gloss-0.010
 phrase-0.010
Token Lily
Token ID20037
Feature activation+1.48e-02

Pos logit contributions
ll+0.062
 far+0.053
 together+0.052
 here+0.043
 glad+0.043
Neg logit contributions
 aids-0.091
paste-0.089
roach-0.089
 gloss-0.089
est-0.087
Token.
Token ID13
Feature activation+2.11e-02

Pos logit contributions
ll+0.022
 together+0.020
 far+0.019
 glad+0.016
 here+0.016
Neg logit contributions
 aids-0.030
roach-0.030
paste-0.030
est-0.029
eness-0.029
Token They
Token ID1119
Feature activation+2.37e-02

Pos logit contributions
ll+0.029
 far+0.023
 together+0.021
prise+0.019
 here+0.018
Neg logit contributions
 aids-0.047
der-0.045
 gloss-0.045
paste-0.045
 scalp-0.044
Token to
Token ID284
Feature activation+9.50e-03

Pos logit contributions
ll+0.011
 far+0.009
 together+0.009
 glad+0.008
 Enjoy+0.008
Neg logit contributions
est-0.019
 phrase-0.018
 aids-0.018
eness-0.018
 movements-0.018
Token town
Token ID3240
Feature activation+1.61e-02

Pos logit contributions
ll+0.015
 together+0.013
 far+0.012
 glad+0.011
arl+0.011
Neg logit contributions
 aids-0.019
roach-0.019
est-0.018
 movements-0.018
 phrase-0.018
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
ll+0.024
 together+0.020
 far+0.020
 glad+0.018
 Enjoy+0.017
Neg logit contributions
 aids-0.033
est-0.032
 phrase-0.032
roach-0.032
eness-0.031
Token It
Token ID632
Feature activation-2.16e-03

Pos logit contributions
eness+0.023
est+0.023
roach+0.023
paste+0.023
 gloss+0.023
Neg logit contributions
 far-0.013
 together-0.012
 Enjoy-0.010
ll-0.010
 here-0.010
Token was
Token ID373
Feature activation-1.66e-02

Pos logit contributions
roach+0.005
eness+0.005
 aids+0.005
paste+0.005
 gloss+0.004
Neg logit contributions
ll-0.003
 far-0.003
 together-0.003
 here-0.002
 glad-0.002
Token a
Token ID257
Feature activation+3.79e-02

Pos logit contributions
eness+0.037
roach+0.037
paste+0.037
est+0.037
 gloss+0.037
Neg logit contributions
 far-0.022
 together-0.021
ll-0.020
 glad-0.018
 here-0.018
Token little
Token ID1310
Feature activation+4.67e-02

Pos logit contributions
ll+0.058
 together+0.052
 far+0.050
 glad+0.043
 here+0.042
Neg logit contributions
 aids-0.077
roach-0.076
paste-0.074
 gloss-0.074
est-0.074
Token mouse
Token ID10211
Feature activation-4.82e-03

Pos logit contributions
 together+0.063
ll+0.062
 far+0.062
 here+0.051
 glad+0.050
Neg logit contributions
 aids-0.101
roach-0.100
eness-0.098
paste-0.097
 phrase-0.096
Token.
Token ID13
Feature activation-7.53e-03

Pos logit contributions
 aids+0.011
roach+0.010
paste+0.010
est+0.010
eness+0.010
Neg logit contributions
ll-0.007
 together-0.006
 far-0.006
 glad-0.005
 here-0.005
Token The
Token ID383
Feature activation+1.72e-02

Pos logit contributions
roach+0.016
est+0.015
eness+0.015
paste+0.015
 gloss+0.015
Neg logit contributions
 together-0.010
 far-0.010
ll-0.010
 glad-0.009
 here-0.009
Token mouse
Token ID10211
Feature activation-5.85e-03

Pos logit contributions
ll+0.032
 together+0.029
 far+0.029
 glad+0.025
 here+0.025
Neg logit contributions
 aids-0.030
roach-0.029
est-0.029
paste-0.028
eness-0.028
Token museum
Token ID13257
Feature activation+1.18e-02

Pos logit contributions
ll+0.000
 together+0.000
 far+0.000
 glad+0.000
 here+0.000
Neg logit contributions
 aids-0.000
roach-0.000
est-0.000
paste-0.000
 phrase-0.000
Token and
Token ID290
Feature activation+8.35e-03

Pos logit contributions
ll+0.017
 together+0.015
 far+0.015
 glad+0.012
 here+0.012
Neg logit contributions
roach-0.025
 aids-0.025
paste-0.024
 gloss-0.024
eness-0.024
Token the
Token ID262
Feature activation-8.63e-05

Pos logit contributions
ll+0.012
 together+0.011
 far+0.011
 glad+0.009
 here+0.009
Neg logit contributions
 aids-0.018
roach-0.018
paste-0.017
eness-0.017
est-0.017
Token people
Token ID661
Feature activation+2.44e-02

Pos logit contributions
 aids+0.000
roach+0.000
paste+0.000
est+0.000
eness+0.000
Neg logit contributions
ll-0.000
 together-0.000
 far-0.000
 glad-0.000
 here-0.000
Token who
Token ID508
Feature activation+4.39e-02

Pos logit contributions
ll+0.035
 together+0.028
 far+0.028
arl+0.027
 Enjoy+0.027
Neg logit contributions
 aids-0.050
est-0.048
eness-0.048
paste-0.048
 sensation-0.047
Token loved
Token ID6151
Feature activation+4.78e-02

Pos logit contributions
ll+0.075
arl+0.067
 together+0.065
 far+0.061
 Enjoy+0.058
Neg logit contributions
 aids-0.074
roach-0.070
eness-0.069
est-0.068
 sensation-0.068
Token it
Token ID340
Feature activation+6.11e-03

Pos logit contributions
ll+0.074
 together+0.060
 far+0.059
arl+0.053
 glad+0.052
Neg logit contributions
 aids-0.097
roach-0.093
 poses-0.092
 movements-0.092
 phrase-0.092
Token.
Token ID13
Feature activation+6.76e-03

Pos logit contributions
ll+0.009
 together+0.008
 far+0.008
 glad+0.007
 here+0.006
Neg logit contributions
 aids-0.013
roach-0.013
paste-0.013
 gloss-0.012
eness-0.012
Token<|endoftext|>
Token ID50256
Feature activation+5.56e-04

Pos logit contributions
ll+0.008
 far+0.007
 together+0.007
 here+0.005
arl+0.005
Neg logit contributions
 aids-0.016
roach-0.015
eness-0.015
est-0.015
 gloss-0.015
TokenOnce
Token ID7454
Feature activation+2.80e-02

Pos logit contributions
ll+0.001
 far+0.000
 together+0.000
 glad+0.000
 here+0.000
Neg logit contributions
 aids-0.001
roach-0.001
est-0.001
paste-0.001
 phrase-0.001
Token upon
Token ID2402
Feature activation-2.50e-02

Pos logit contributions
 Enjoy+0.037
ll+0.035
We+0.033
 together+0.032
prise+0.031
Neg logit contributions
est-0.051
 aids-0.050
ement-0.047
oline-0.047
roach-0.046
Token
Token ID198
Feature activation+2.06e-03

Pos logit contributions
 aids+0.001
 phrase+0.001
 gloss+0.001
roach+0.001
paste+0.001
Neg logit contributions
ll-0.001
 far-0.001
 together-0.001
 ours-0.001
 here-0.001
TokenFrom
Token ID4863
Feature activation+2.37e-03

Pos logit contributions
ll+0.004
 far+0.003
 together+0.003
 here+0.003
arl+0.003
Neg logit contributions
 aids-0.004
 gloss-0.003
 phrase-0.003
paste-0.003
roach-0.003
Token then
Token ID788
Feature activation-7.90e-03

Pos logit contributions
ll+0.004
 together+0.004
 far+0.004
 glad+0.003
 here+0.003
Neg logit contributions
 aids-0.004
roach-0.004
paste-0.004
eness-0.004
est-0.004
Token on
Token ID319
Feature activation-2.39e-02

Pos logit contributions
 aids+0.019
roach+0.019
paste+0.018
eness+0.018
 gloss+0.018
Neg logit contributions
ll-0.009
 together-0.009
 far-0.008
 glad-0.007
 here-0.006
Token everyone
Token ID2506
Feature activation+3.97e-02

Pos logit contributions
roach+0.053
 aids+0.052
paste+0.051
eness+0.051
 gloss+0.051
Neg logit contributions
ll-0.032
 together-0.031
 far-0.030
 glad-0.025
 here-0.024
Token in
Token ID287
Feature activation+3.62e-02

Pos logit contributions
ll+0.065
 together+0.060
 far+0.059
 glad+0.051
 here+0.050
Neg logit contributions
 aids-0.075
roach-0.074
paste-0.073
est-0.072
eness-0.072
Token the
Token ID262
Feature activation+1.65e-02

Pos logit contributions
ll+0.061
 together+0.049
 far+0.048
We+0.042
 glad+0.042
Neg logit contributions
 aids-0.072
est-0.068
roach-0.068
eness-0.067
 sensation-0.067
Token me
Token ID502
Feature activation+2.97e-02

Pos logit contributions
ll+0.028
 together+0.026
 far+0.025
 glad+0.022
 here+0.022
Neg logit contributions
 aids-0.030
roach-0.030
est-0.029
eness-0.029
 phrase-0.029
Tokenadow
Token ID4584
Feature activation+3.04e-02

Pos logit contributions
ll+0.055
 together+0.051
 far+0.050
 glad+0.044
 here+0.043
Neg logit contributions
 aids-0.050
roach-0.050
paste-0.049
est-0.048
eness-0.048
Token was
Token ID373
Feature activation+6.89e-03

Pos logit contributions
ll+0.048
 together+0.045
 far+0.043
 glad+0.037
 here+0.036
Neg logit contributions
roach-0.060
 aids-0.060
paste-0.059
 gloss-0.058
 phrase-0.058
Token happy
Token ID3772
Feature activation+1.94e-02

Pos logit contributions
ll+0.009
 together+0.008
 far+0.008
 glad+0.007
 here+0.006
Neg logit contributions
roach-0.015
 aids-0.015
paste-0.015
eness-0.015
 gloss-0.015
Token down
Token ID866
Feature activation+1.43e-02

Pos logit contributions
roach+0.015
 aids+0.015
paste+0.015
 gloss+0.015
eness+0.015
Neg logit contributions
ll-0.009
 together-0.009
 far-0.009
 here-0.007
 glad-0.007
Token.
Token ID13
Feature activation-7.40e-03

Pos logit contributions
 together+0.019
ll+0.018
 far+0.018
 here+0.014
 glad+0.014
Neg logit contributions
 aids-0.032
roach-0.032
paste-0.031
 gloss-0.031
eness-0.031
Token He
Token ID679
Feature activation-5.13e-03

Pos logit contributions
 aids+0.016
roach+0.015
 gloss+0.015
paste+0.015
est+0.015
Neg logit contributions
ll-0.011
 far-0.009
 together-0.009
 here-0.008
 glad-0.008
Token was
Token ID373
Feature activation-5.86e-03

Pos logit contributions
roach+0.011
 aids+0.011
eness+0.011
est+0.011
paste+0.010
Neg logit contributions
ll-0.008
 together-0.008
 far-0.007
 glad-0.006
We-0.006
Token so
Token ID523
Feature activation-9.30e-03

Pos logit contributions
roach+0.014
eness+0.014
 aids+0.014
paste+0.013
est+0.013
Neg logit contributions
 together-0.007
 far-0.007
ll-0.007
 glad-0.007
 here-0.006
Token happy
Token ID3772
Feature activation+2.02e-02

Pos logit contributions
 aids+0.021
 sensation+0.020
 gloss+0.020
paste+0.020
est+0.020
Neg logit contributions
ll-0.012
 together-0.009
 far-0.008
 Enjoy-0.008
arl-0.007
Token that
Token ID326
Feature activation+6.10e-03

Pos logit contributions
ll+0.026
 together+0.025
 far+0.023
 here+0.019
 glad+0.018
Neg logit contributions
roach-0.048
 aids-0.048
eness-0.046
paste-0.046
est-0.046
Token he
Token ID339
Feature activation+2.25e-03

Pos logit contributions
ll+0.008
 together+0.008
 far+0.008
 glad+0.006
 here+0.006
Neg logit contributions
 aids-0.014
roach-0.013
paste-0.013
eness-0.013
 phrase-0.013
Token was
Token ID373
Feature activation+1.11e-02

Pos logit contributions
ll+0.004
 together+0.003
 far+0.003
 glad+0.003
 here+0.003
Neg logit contributions
 aids-0.004
roach-0.004
paste-0.004
est-0.004
 gloss-0.004
Token able
Token ID1498
Feature activation+3.56e-02

Pos logit contributions
ll+0.019
 together+0.017
 far+0.016
arl+0.015
We+0.014
Neg logit contributions
 aids-0.020
est-0.019
paste-0.019
 gloss-0.019
roach-0.019
Token to
Token ID284
Feature activation-6.37e-03

Pos logit contributions
ll+0.047
 together+0.039
arl+0.038
 far+0.037
 Enjoy+0.035
Neg logit contributions
 aids-0.073
 movements-0.072
paste-0.071
 gloss-0.071
 phrase-0.070
Token they
Token ID484
Feature activation+4.16e-03

Pos logit contributions
 together+0.008
 far+0.008
ll+0.008
 here+0.006
 glad+0.006
Neg logit contributions
roach-0.019
 aids-0.019
eness-0.019
paste-0.019
est-0.019
Token reached
Token ID4251
Feature activation-2.60e-02

Pos logit contributions
ll+0.007
 together+0.006
 far+0.006
 glad+0.006
 here+0.005
Neg logit contributions
 aids-0.008
paste-0.007
roach-0.007
eness-0.007
est-0.007
Token their
Token ID511
Feature activation-1.06e-02

Pos logit contributions
 aids+0.056
roach+0.055
paste+0.055
est+0.054
eness+0.054
Neg logit contributions
ll-0.036
 together-0.032
 far-0.031
 glad-0.027
 here-0.025
Token mom
Token ID1995
Feature activation+1.53e-02

Pos logit contributions
 aids+0.019
 movements+0.019
 phrase+0.019
 poses+0.019
est+0.018
Neg logit contributions
ll-0.018
 together-0.016
 far-0.015
 glad-0.014
 Enjoy-0.014
Token.
Token ID13
Feature activation+8.77e-03

Pos logit contributions
ll+0.029
 far+0.026
 together+0.025
 glad+0.025
 Enjoy+0.023
Neg logit contributions
 aids-0.024
est-0.024
paste-0.023
 gloss-0.023
eness-0.023
Token She
Token ID1375
Feature activation+9.91e-03

Pos logit contributions
ll+0.011
 together+0.009
 far+0.008
 glad+0.007
 here+0.007
Neg logit contributions
 aids-0.021
roach-0.021
paste-0.020
 gloss-0.020
 phrase-0.020
Token hugged
Token ID41130
Feature activation+4.48e-02

Pos logit contributions
ll+0.015
 together+0.014
 far+0.014
 glad+0.012
 here+0.012
Neg logit contributions
 aids-0.020
roach-0.019
paste-0.019
est-0.019
 gloss-0.019
Token them
Token ID606
Feature activation+1.24e-02

Pos logit contributions
ll+0.058
 together+0.046
 far+0.044
 glad+0.038
 here+0.034
Neg logit contributions
 aids-0.105
roach-0.103
eness-0.102
est-0.102
 phrase-0.101
Token and
Token ID290
Feature activation+1.89e-02

Pos logit contributions
ll+0.019
 far+0.016
 together+0.016
 glad+0.014
We+0.014
Neg logit contributions
 aids-0.024
est-0.024
eness-0.024
roach-0.023
paste-0.023
Token asked
Token ID1965
Feature activation+2.51e-02

Pos logit contributions
ll+0.025
 together+0.023
 far+0.023
 glad+0.018
 here+0.018
Neg logit contributions
roach-0.043
 aids-0.043
est-0.042
eness-0.041
paste-0.041
Token them
Token ID606
Feature activation+6.16e-03

Pos logit contributions
ll+0.027
 together+0.023
 far+0.022
 glad+0.017
 here+0.016
Neg logit contributions
 aids-0.063
roach-0.062
paste-0.061
est-0.061
eness-0.061
Token off
Token ID572
Feature activation-6.41e-03

Pos logit contributions
 aids+0.007
roach+0.007
eness+0.007
est+0.007
paste+0.007
Neg logit contributions
ll-0.005
 together-0.005
 far-0.005
 glad-0.004
 here-0.004
Token she
Token ID673
Feature activation-2.24e-03

Pos logit contributions
 aids+0.014
roach+0.013
paste+0.013
est+0.013
 gloss+0.013
Neg logit contributions
ll-0.009
 together-0.008
 far-0.008
 glad-0.007
 here-0.007
Token went
Token ID1816
Feature activation+2.64e-02

Pos logit contributions
 aids+0.005
roach+0.004
paste+0.004
est+0.004
 gloss+0.004
Neg logit contributions
ll-0.003
 together-0.003
 far-0.003
 glad-0.003
 here-0.002
Token.
Token ID13
Feature activation-1.25e-02

Pos logit contributions
ll+0.039
 together+0.032
 far+0.032
 glad+0.029
We+0.027
Neg logit contributions
 aids-0.055
roach-0.054
paste-0.054
 gloss-0.054
est-0.054
Token
Token ID198
Feature activation-3.41e-03

Pos logit contributions
 aids+0.025
roach+0.025
paste+0.025
est+0.024
eness+0.024
Neg logit contributions
ll-0.019
 together-0.018
 far-0.017
 glad-0.015
 here-0.014
Token
Token ID198
Feature activation-4.90e-03

Pos logit contributions
 aids+0.007
 gloss+0.006
roach+0.006
 phrase+0.006
paste+0.006
Neg logit contributions
ll-0.005
 ours-0.004
 together-0.004
 far-0.004
 our-0.004
TokenShe
Token ID3347
Feature activation-1.06e-02

Pos logit contributions
 aids+0.010
roach+0.009
paste+0.009
 gloss+0.009
est+0.009
Neg logit contributions
ll-0.008
 together-0.007
 far-0.007
 glad-0.006
 here-0.006
Token had
Token ID550
Feature activation-5.75e-03

Pos logit contributions
 aids+0.021
roach+0.021
paste+0.021
est+0.021
 gloss+0.021
Neg logit contributions
ll-0.016
 together-0.015
 far-0.014
 glad-0.012
 here-0.012
Token a
Token ID257
Feature activation+2.37e-02

Pos logit contributions
roach+0.012
 aids+0.012
eness+0.012
est+0.012
 phrase+0.012
Neg logit contributions
ll-0.008
 together-0.008
 far-0.008
 glad-0.006
 here-0.006
Token great
Token ID1049
Feature activation+5.85e-03

Pos logit contributions
ll+0.037
 together+0.033
 far+0.030
 here+0.027
arl+0.026
Neg logit contributions
 aids-0.046
 gloss-0.046
roach-0.046
 phrase-0.046
paste-0.045
Token adventure
Token ID8855
Feature activation+2.23e-02

Pos logit contributions
ll+0.011
 together+0.010
 far+0.010
 glad+0.009
 here+0.009
Neg logit contributions
 aids-0.010
roach-0.010
paste-0.010
est-0.010
eness-0.010
Token me
Token ID502
Feature activation-3.47e-03

Pos logit contributions
 aids+0.022
roach+0.021
est+0.021
paste+0.021
eness+0.021
Neg logit contributions
ll-0.020
 together-0.018
 far-0.018
 glad-0.016
 here-0.015
Token repair
Token ID9185
Feature activation-2.32e-02

Pos logit contributions
 aids+0.007
roach+0.007
est+0.007
paste+0.007
eness+0.007
Neg logit contributions
ll-0.005
 together-0.004
 far-0.004
 glad-0.004
We-0.004
Token the
Token ID262
Feature activation-1.77e-02

Pos logit contributions
 gloss+0.062
paste+0.061
 phrase+0.061
eness+0.061
 aids+0.061
Neg logit contributions
 together-0.022
 far-0.022
 our-0.020
 here-0.019
ll-0.017
Token hole
Token ID7604
Feature activation-2.62e-02

Pos logit contributions
eness+0.031
 aids+0.031
roach+0.030
 gloss+0.030
est+0.029
Neg logit contributions
 far-0.033
ll-0.031
 together-0.030
 here-0.027
 glad-0.026
Token in
Token ID287
Feature activation+7.27e-03

Pos logit contributions
 aids+0.058
roach+0.058
est+0.058
paste+0.057
eness+0.057
Neg logit contributions
ll-0.033
 together-0.029
 far-0.029
 glad-0.025
 Enjoy-0.023
Token my
Token ID616
Feature activation-8.56e-03

Pos logit contributions
ll+0.013
 together+0.011
 far+0.010
 glad+0.010
 Enjoy+0.009
Neg logit contributions
 aids-0.013
est-0.013
roach-0.013
eness-0.013
 streaks-0.013
Token wall
Token ID3355
Feature activation-2.38e-02

Pos logit contributions
 aids+0.016
roach+0.016
eness+0.016
paste+0.015
 gloss+0.015
Neg logit contributions
ll-0.014
 far-0.014
 together-0.014
 here-0.012
 glad-0.012
Token?"
Token ID1701
Feature activation-5.65e-03

Pos logit contributions
roach+0.059
 aids+0.059
paste+0.058
eness+0.058
 gloss+0.058
Neg logit contributions
ll-0.025
 together-0.024
 far-0.023
 here-0.018
 glad-0.016
Token asked
Token ID1965
Feature activation+6.09e-03

Pos logit contributions
 aids+0.011
 phrase+0.011
 gloss+0.011
est+0.011
roach+0.011
Neg logit contributions
ll-0.009
 far-0.008
 together-0.008
arl-0.007
 glad-0.006
Token Tom
Token ID4186
Feature activation-1.78e-02

Pos logit contributions
ll+0.010
 together+0.010
 far+0.009
 glad+0.008
 here+0.008
Neg logit contributions
 aids-0.011
roach-0.011
paste-0.011
eness-0.011
 gloss-0.011
Token.
Token ID13
Feature activation-1.28e-02

Pos logit contributions
roach+0.037
 aids+0.037
paste+0.036
eness+0.036
est+0.036
Neg logit contributions
ll-0.026
 together-0.024
 far-0.023
 glad-0.019
 here-0.019
Token wild
Token ID4295
Feature activation-3.31e-03

Pos logit contributions
paste+0.012
eness+0.012
 aids+0.012
 gloss+0.012
roach+0.012
Neg logit contributions
 far-0.010
ll-0.009
 together-0.008
 here-0.008
 home-0.007
Token.
Token ID13
Feature activation+1.03e-04

Pos logit contributions
paste+0.007
 gloss+0.007
roach+0.007
 aids+0.007
eness+0.007
Neg logit contributions
 far-0.004
 together-0.004
ll-0.004
 here-0.004
 home-0.003
Token Sally
Token ID25737
Feature activation-1.38e-02

Pos logit contributions
ll+0.000
 far+0.000
 together+0.000
 here+0.000
 glad+0.000
Neg logit contributions
 aids-0.000
 gloss-0.000
roach-0.000
est-0.000
 phrase-0.000
Token was
Token ID373
Feature activation-1.36e-03

Pos logit contributions
roach+0.031
 aids+0.029
eness+0.029
paste+0.029
 sensation+0.028
Neg logit contributions
ll-0.021
 together-0.018
 far-0.018
 glad-0.015
 who-0.015
Token embarrassed
Token ID21100
Feature activation+7.87e-03

Pos logit contributions
roach+0.003
eness+0.003
paste+0.003
 gloss+0.003
est+0.003
Neg logit contributions
 far-0.002
 together-0.002
ll-0.001
 glad-0.001
 here-0.001
Token and
Token ID290
Feature activation+4.12e-03

Pos logit contributions
 together+0.009
ll+0.009
 far+0.009
 here+0.007
 glad+0.007
Neg logit contributions
roach-0.019
 aids-0.019
paste-0.018
 phrase-0.018
 gloss-0.018
Token apologized
Token ID22893
Feature activation+2.31e-02

Pos logit contributions
ll+0.006
 together+0.006
 far+0.006
 glad+0.005
 here+0.005
Neg logit contributions
roach-0.008
 aids-0.008
paste-0.008
eness-0.008
est-0.008
Token to
Token ID284
Feature activation+1.19e-02

Pos logit contributions
 together+0.021
ll+0.019
 far+0.018
 here+0.012
 glad+0.012
Neg logit contributions
roach-0.065
 aids-0.064
tern-0.063
 streaks-0.062
est-0.062
Token her
Token ID607
Feature activation+1.53e-02

Pos logit contributions
ll+0.016
 together+0.015
 far+0.015
 glad+0.012
 here+0.012
Neg logit contributions
 aids-0.026
roach-0.026
paste-0.025
est-0.025
eness-0.025
Token mom
Token ID1995
Feature activation+2.64e-02

Pos logit contributions
ll+0.019
 together+0.018
 far+0.017
 glad+0.014
 here+0.014
Neg logit contributions
roach-0.036
 aids-0.035
paste-0.035
eness-0.034
est-0.034
Token.
Token ID13
Feature activation+1.20e-03

Pos logit contributions
ll+0.037
 together+0.034
 far+0.033
 glad+0.028
 here+0.027
Neg logit contributions
 aids-0.057
roach-0.057
paste-0.056
eness-0.055
 phrase-0.055
Token blue
Token ID4171
Feature activation+1.25e-02

Pos logit contributions
ll+0.027
 together+0.023
 far+0.019
 Enjoy+0.019
arl+0.019
Neg logit contributions
roach-0.021
 gloss-0.021
est-0.020
 brist-0.020
 phrase-0.020
Token ranch
Token ID28752
Feature activation+2.24e-02

Pos logit contributions
ll+0.020
 far+0.019
 together+0.018
 here+0.015
 glad+0.015
Neg logit contributions
 aids-0.024
paste-0.024
roach-0.024
eness-0.024
 phrase-0.023
Token in
Token ID287
Feature activation+5.12e-03

Pos logit contributions
ll+0.028
 together+0.026
 far+0.025
 glad+0.020
 here+0.020
Neg logit contributions
 aids-0.051
roach-0.050
paste-0.050
eness-0.049
est-0.049
Token the
Token ID262
Feature activation+1.06e-03

Pos logit contributions
ll+0.009
 together+0.008
 far+0.007
 glad+0.007
 here+0.006
Neg logit contributions
 aids-0.010
roach-0.010
est-0.009
paste-0.009
eness-0.009
Token countryside
Token ID25708
Feature activation+1.55e-02

Pos logit contributions
ll+0.002
 together+0.002
 glad+0.001
prise+0.001
arl+0.001
Neg logit contributions
est-0.002
 aids-0.002
 streaks-0.002
roach-0.002
 movements-0.002
Token.
Token ID13
Feature activation-1.40e-02

Pos logit contributions
ll+0.023
 together+0.021
 far+0.020
 glad+0.017
 here+0.017
Neg logit contributions
 aids-0.032
roach-0.032
paste-0.032
eness-0.031
 gloss-0.031
Token It
Token ID632
Feature activation-1.02e-02

Pos logit contributions
eness+0.028
paste+0.028
roach+0.027
 aids+0.027
 phrase+0.027
Neg logit contributions
 together-0.020
 far-0.020
ll-0.020
 glad-0.017
 here-0.017
Token was
Token ID373
Feature activation-1.75e-02

Pos logit contributions
 aids+0.016
est+0.016
roach+0.015
paste+0.015
 phrase+0.015
Neg logit contributions
ll-0.020
 together-0.019
 far-0.018
 glad-0.016
arl-0.016
Token full
Token ID1336
Feature activation-2.04e-02

Pos logit contributions
 aids+0.037
roach+0.037
est+0.036
paste+0.036
 gloss+0.036
Neg logit contributions
ll-0.025
 together-0.022
 far-0.022
 glad-0.018
 here-0.018
Token of
Token ID286
Feature activation-4.75e-03

Pos logit contributions
 aids+0.047
roach+0.047
paste+0.046
 gloss+0.046
eness+0.046
Neg logit contributions
ll-0.025
 together-0.024
 far-0.023
 glad-0.018
 here-0.018
Token animals
Token ID4695
Feature activation+1.07e-02

Pos logit contributions
est+0.008
 aids+0.008
roach+0.007
 markings+0.007
 streaks+0.007
Neg logit contributions
ll-0.010
 together-0.008
 far-0.008
arl-0.007
We-0.007
Token that
Token ID326
Feature activation-4.51e-03

Pos logit contributions
ll+0.016
 together+0.014
 far+0.014
 Enjoy+0.013
 glad+0.012
Neg logit contributions
roach-0.017
 aids-0.017
 gloss-0.016
est-0.016
paste-0.016
Token day
Token ID1110
Feature activation-1.71e-02

Pos logit contributions
 aids+0.010
est+0.010
roach+0.010
 gloss+0.009
 phrase+0.009
Neg logit contributions
ll-0.006
 together-0.006
 far-0.005
 glad-0.005
 here-0.004
Token on
Token ID319
Feature activation-1.77e-02

Pos logit contributions
 aids+0.036
 gloss+0.036
paste+0.036
roach+0.036
est+0.036
Neg logit contributions
ll-0.024
 far-0.021
 together-0.020
 glad-0.017
 here-0.017
Token,
Token ID11
Feature activation+1.26e-03

Pos logit contributions
 aids+0.040
roach+0.040
paste+0.039
 phrase+0.039
eness+0.039
Neg logit contributions
ll-0.024
 together-0.022
 far-0.021
 here-0.018
 glad-0.017
Token Lily
Token ID20037
Feature activation-6.69e-03

Pos logit contributions
ll+0.002
 far+0.002
 together+0.002
arl+0.002
 glad+0.001
Neg logit contributions
 gloss-0.002
est-0.002
 aids-0.002
paste-0.002
 poses-0.002
Token blew
Token ID17948
Feature activation-1.48e-02

Pos logit contributions
 aids+0.015
roach+0.015
paste+0.014
eness+0.014
 phrase+0.014
Neg logit contributions
ll-0.010
 together-0.009
 far-0.008
 glad-0.007
We-0.007
Token bubbles
Token ID25037
Feature activation-2.30e-02

Pos logit contributions
 aids+0.037
roach+0.036
 phrase+0.034
paste+0.034
eness+0.034
Neg logit contributions
 together-0.020
 far-0.017
ll-0.014
 home-0.013
 many-0.013
Token every
Token ID790
Feature activation+7.73e-03

Pos logit contributions
 aids+0.057
roach+0.056
 phrase+0.055
eness+0.054
paste+0.054
Neg logit contributions
 together-0.026
ll-0.026
 far-0.024
 here-0.018
 glad-0.017
Token day
Token ID1110
Feature activation-5.50e-03

Pos logit contributions
 far+0.009
 together+0.009
ll+0.009
 here+0.007
 glad+0.007
Neg logit contributions
 aids-0.018
eness-0.018
roach-0.018
paste-0.017
 markings-0.017
Token and
Token ID290
Feature activation-8.50e-03

Pos logit contributions
 aids+0.013
roach+0.013
eness+0.013
 phrase+0.013
est+0.013
Neg logit contributions
ll-0.007
 together-0.006
 far-0.006
 glad-0.005
 here-0.005
Token became
Token ID2627
Feature activation+2.63e-02

Pos logit contributions
 aids+0.018
roach+0.018
eness+0.018
 phrase+0.018
paste+0.018
Neg logit contributions
ll-0.012
 together-0.012
 far-0.011
 glad-0.009
 they-0.009
Token dinner
Token ID8073
Feature activation-1.73e-02

Pos logit contributions
 aids+0.055
roach+0.054
paste+0.053
 gloss+0.052
eness+0.052
Neg logit contributions
ll-0.060
 together-0.055
 far-0.054
 glad-0.048
 here-0.047
Token.
Token ID13
Feature activation-9.70e-03

Pos logit contributions
 aids+0.033
roach+0.033
paste+0.033
est+0.032
 gloss+0.032
Neg logit contributions
ll-0.027
 far-0.024
 together-0.024
 glad-0.022
 here-0.020
Token Her
Token ID2332
Feature activation+6.14e-04

Pos logit contributions
 aids+0.018
est+0.018
 phrase+0.018
 gloss+0.017
roach+0.017
Neg logit contributions
ll-0.016
 far-0.015
 together-0.014
 glad-0.012
 here-0.012
Token mom
Token ID1995
Feature activation+7.03e-03

Pos logit contributions
ll+0.001
 together+0.001
 far+0.001
 glad+0.001
 here+0.001
Neg logit contributions
 aids-0.001
roach-0.001
paste-0.001
est-0.001
 gloss-0.001
Token said
Token ID531
Feature activation-2.29e-02

Pos logit contributions
ll+0.010
 far+0.009
 together+0.009
 glad+0.007
 here+0.007
Neg logit contributions
 aids-0.015
roach-0.014
est-0.014
paste-0.014
 gloss-0.014
Token no
Token ID645
Feature activation-1.05e-02

Pos logit contributions
roach+0.052
 aids+0.051
eness+0.050
 phrase+0.050
 poses+0.049
Neg logit contributions
ll-0.030
 together-0.029
 far-0.028
 glad-0.024
 here-0.023
Token,
Token ID11
Feature activation-2.44e-02

Pos logit contributions
 aids+0.023
roach+0.022
paste+0.022
 phrase+0.022
eness+0.021
Neg logit contributions
ll-0.016
 together-0.015
 far-0.013
 glad-0.011
 here-0.011
Token because
Token ID780
Feature activation+4.51e-03

Pos logit contributions
roach+0.054
 aids+0.053
eness+0.052
est+0.051
 phrase+0.051
Neg logit contributions
ll-0.033
 together-0.033
 far-0.031
 glad-0.026
 here-0.026
Token it
Token ID340
Feature activation-8.76e-03

Pos logit contributions
ll+0.006
 together+0.006
 far+0.005
 glad+0.005
 here+0.004
Neg logit contributions
 aids-0.010
roach-0.010
paste-0.010
est-0.010
eness-0.010
Token was
Token ID373
Feature activation-1.24e-02

Pos logit contributions
 aids+0.017
roach+0.017
eness+0.017
paste+0.016
 phrase+0.016
Neg logit contributions
ll-0.015
 together-0.014
 far-0.013
We-0.011
 here-0.011
Token too
Token ID1165
Feature activation-1.27e-03

Pos logit contributions
 aids+0.026
eness+0.026
roach+0.026
est+0.026
paste+0.026
Neg logit contributions
 far-0.017
 together-0.017
ll-0.017
 here-0.014
 glad-0.014
Token swing
Token ID9628
Feature activation+9.00e-03

Pos logit contributions
roach+0.023
 aids+0.022
paste+0.022
eness+0.022
 phrase+0.022
Neg logit contributions
ll-0.017
 together-0.017
 far-0.016
 here-0.013
 glad-0.013
Token.
Token ID13
Feature activation-9.51e-03

Pos logit contributions
 together+0.011
 far+0.011
ll+0.010
 here+0.008
 home+0.007
Neg logit contributions
roach-0.022
 aids-0.022
paste-0.021
 phrase-0.021
eness-0.021
Token The
Token ID383
Feature activation+1.09e-02

Pos logit contributions
roach+0.022
eness+0.022
 aids+0.022
paste+0.022
est+0.022
Neg logit contributions
ll-0.010
 together-0.010
 far-0.010
 glad-0.008
 here-0.008
Token bird
Token ID6512
Feature activation+5.85e-03

Pos logit contributions
ll+0.016
 far+0.016
 together+0.015
 glad+0.013
 here+0.013
Neg logit contributions
 aids-0.022
roach-0.022
paste-0.022
eness-0.022
 phrase-0.021
Token thought
Token ID1807
Feature activation+5.44e-03

Pos logit contributions
ll+0.009
 together+0.009
 far+0.008
 glad+0.007
 here+0.007
Neg logit contributions
roach-0.012
 aids-0.012
paste-0.011
eness-0.011
 gloss-0.011
Token it
Token ID340
Feature activation-2.13e-02

Pos logit contributions
ll+0.007
 far+0.007
 together+0.006
 here+0.005
 glad+0.005
Neg logit contributions
roach-0.013
eness-0.013
 aids-0.012
 phrase-0.012
est-0.012
Token would
Token ID561
Feature activation+2.17e-02

Pos logit contributions
roach+0.054
 aids+0.054
paste+0.052
eness+0.052
 scalp+0.051
Neg logit contributions
 together-0.026
 far-0.023
ll-0.023
 they-0.016
 who-0.016
Token be
Token ID307
Feature activation-9.71e-04

Pos logit contributions
ll+0.032
 far+0.030
 together+0.029
 here+0.025
 glad+0.024
Neg logit contributions
roach-0.046
 aids-0.045
paste-0.045
 gloss-0.044
eness-0.044
Token fun
Token ID1257
Feature activation+6.16e-03

Pos logit contributions
roach+0.002
 aids+0.002
paste+0.002
eness+0.002
 gloss+0.002
Neg logit contributions
 together-0.001
 far-0.001
ll-0.001
 glad-0.001
 here-0.001
Token to
Token ID284
Feature activation+2.13e-03

Pos logit contributions
ll+0.007
 together+0.007
 far+0.006
 here+0.005
 glad+0.005
Neg logit contributions
 aids-0.015
roach-0.015
paste-0.014
eness-0.014
 gloss-0.014
Token swing
Token ID9628
Feature activation+1.49e-02

Pos logit contributions
ll+0.003
 far+0.003
 together+0.003
 here+0.002
 glad+0.002
Neg logit contributions
roach-0.005
 aids-0.005
paste-0.005
eness-0.005
est-0.005
Token says
Token ID1139
Feature activation+3.41e-04

Pos logit contributions
ll+0.025
 together+0.023
 far+0.022
 glad+0.019
 here+0.019
Neg logit contributions
 aids-0.035
roach-0.035
paste-0.034
est-0.034
eness-0.034
Token,
Token ID11
Feature activation-9.84e-03

Pos logit contributions
ll+0.000
 together+0.000
 far+0.000
 glad+0.000
 here+0.000
Neg logit contributions
roach-0.001
 aids-0.001
eness-0.001
est-0.001
 phrase-0.001
Token "
Token ID366
Feature activation+2.81e-02

Pos logit contributions
 aids+0.019
roach+0.019
est+0.018
eness+0.018
paste+0.018
Neg logit contributions
ll-0.016
 together-0.015
 far-0.015
 glad-0.013
 here-0.013
TokenI
Token ID40
Feature activation+3.24e-02

Pos logit contributions
ll+0.040
 together+0.037
 far+0.036
 glad+0.030
 here+0.030
Neg logit contributions
 aids-0.057
roach-0.057
paste-0.057
 gloss-0.057
eness-0.056
Token know
Token ID760
Feature activation-4.35e-03

Pos logit contributions
ll+0.043
 far+0.040
 together+0.040
 here+0.032
 glad+0.032
Neg logit contributions
 gloss-0.067
paste-0.066
 aids-0.066
roach-0.066
 phrase-0.065
Token you
Token ID345
Feature activation-2.89e-02

Pos logit contributions
roach+0.012
 aids+0.012
 phrase+0.012
 gloss+0.012
 sensation+0.012
Neg logit contributions
 together-0.003
ll-0.003
 far-0.003
 here-0.002
 our-0.002
Token both
Token ID1111
Feature activation+1.06e-02

Pos logit contributions
 aids+0.067
roach+0.066
paste+0.065
 gloss+0.065
est+0.064
Neg logit contributions
ll-0.037
 together-0.033
 far-0.032
 glad-0.026
 here-0.025
Token like
Token ID588
Feature activation+1.67e-02

Pos logit contributions
ll+0.010
 together+0.009
 far+0.009
 glad+0.007
 here+0.007
Neg logit contributions
roach-0.028
paste-0.028
 gloss-0.027
 aids-0.027
eness-0.027
Token different
Token ID1180
Feature activation-2.43e-02

Pos logit contributions
ll+0.022
 together+0.021
 far+0.020
 glad+0.017
 here+0.017
Neg logit contributions
 aids-0.036
roach-0.036
paste-0.036
eness-0.036
est-0.036
Token shows
Token ID2523
Feature activation-6.16e-03

Pos logit contributions
 gloss+0.050
paste+0.050
roach+0.049
 scalp+0.047
 sensation+0.046
Neg logit contributions
 together-0.041
 far-0.034
ll-0.034
 our-0.033
 here-0.032
Token.
Token ID13
Feature activation+1.90e-02

Pos logit contributions
roach+0.014
 aids+0.014
 gloss+0.014
 sensation+0.014
paste+0.014
Neg logit contributions
 together-0.008
ll-0.008
 far-0.007
 here-0.006
 glad-0.006
Tokenrot
Token ID10599
Feature activation+8.98e-03

Pos logit contributions
ll+0.018
 together+0.017
 far+0.017
 here+0.015
 glad+0.014
Neg logit contributions
 aids-0.016
roach-0.016
paste-0.015
eness-0.015
 sensation-0.015
Token,
Token ID11
Feature activation-1.28e-02

Pos logit contributions
ll+0.013
 together+0.012
 far+0.011
 here+0.009
 glad+0.009
Neg logit contributions
roach-0.019
 aids-0.019
eness-0.019
paste-0.019
 gloss-0.019
Token carrot
Token ID40562
Feature activation+1.75e-02

Pos logit contributions
 aids+0.029
roach+0.028
paste+0.028
eness+0.028
est+0.028
Neg logit contributions
ll-0.017
 together-0.015
 far-0.015
 glad-0.012
 here-0.012
Token,
Token ID11
Feature activation-1.21e-02

Pos logit contributions
ll+0.026
 together+0.024
 far+0.023
 here+0.019
 glad+0.019
Neg logit contributions
 aids-0.037
roach-0.036
paste-0.035
eness-0.035
 gloss-0.035
Token you
Token ID345
Feature activation-1.66e-02

Pos logit contributions
 aids+0.026
roach+0.026
paste+0.025
est+0.025
eness+0.025
Neg logit contributions
ll-0.017
 together-0.015
 far-0.015
 glad-0.012
 here-0.012
Token make
Token ID787
Feature activation-5.30e-02

Pos logit contributions
 aids+0.038
roach+0.037
 gloss+0.037
eness+0.037
paste+0.037
Neg logit contributions
ll-0.021
 together-0.019
 far-0.019
 glad-0.015
 here-0.015
Token me
Token ID502
Feature activation-5.94e-03

Pos logit contributions
roach+0.119
 gloss+0.113
paste+0.113
eness+0.112
 sensation+0.110
Neg logit contributions
 together-0.071
 far-0.068
ll-0.066
 our-0.060
 here-0.059
Token strong
Token ID1913
Feature activation-1.92e-02

Pos logit contributions
roach+0.015
 aids+0.015
eness+0.014
paste+0.014
 phrase+0.014
Neg logit contributions
ll-0.006
 together-0.006
 far-0.006
 glad-0.005
 here-0.004
Token."
Token ID526
Feature activation+3.45e-03

Pos logit contributions
roach+0.047
 aids+0.046
 gloss+0.045
paste+0.044
 phrase+0.044
Neg logit contributions
 together-0.021
ll-0.021
 far-0.020
 here-0.017
 glad-0.015
Token She
Token ID1375
Feature activation-7.84e-03

Pos logit contributions
ll+0.004
 together+0.004
 far+0.004
 glad+0.003
 here+0.003
Neg logit contributions
 aids-0.008
roach-0.008
paste-0.008
eness-0.008
est-0.008
Token picks
Token ID11103
Feature activation-1.67e-02

Pos logit contributions
roach+0.016
eness+0.016
 sensation+0.015
 aids+0.015
est+0.015
Neg logit contributions
 together-0.012
ll-0.012
 far-0.012
 here-0.010
 our-0.010
Token not
Token ID407
Feature activation-2.77e-02

Pos logit contributions
ll+0.004
 together+0.004
 far+0.003
 Enjoy+0.003
 glad+0.003
Neg logit contributions
 aids-0.004
 poses-0.004
 phrase-0.004
 gloss-0.004
 movements-0.004
Token.
Token ID13
Feature activation+4.27e-03

Pos logit contributions
 aids+0.059
roach+0.059
paste+0.058
est+0.058
 gloss+0.057
Neg logit contributions
ll-0.039
 together-0.036
 far-0.035
 glad-0.029
 here-0.029
Token They
Token ID1119
Feature activation+1.43e-03

Pos logit contributions
ll+0.005
 far+0.004
 together+0.004
 glad+0.003
 here+0.003
Neg logit contributions
 aids-0.010
roach-0.010
paste-0.010
 gloss-0.010
 phrase-0.010
Token only
Token ID691
Feature activation-1.01e-02

Pos logit contributions
ll+0.002
 far+0.002
 together+0.002
 glad+0.001
We+0.001
Neg logit contributions
 aids-0.003
roach-0.003
est-0.003
paste-0.003
eness-0.003
Token used
Token ID973
Feature activation-6.60e-04

Pos logit contributions
 aids+0.022
roach+0.022
paste+0.021
est+0.021
eness+0.021
Neg logit contributions
ll-0.014
 together-0.013
 far-0.013
 glad-0.011
We-0.010
Token simple
Token ID2829
Feature activation-5.18e-02

Pos logit contributions
roach+0.002
 aids+0.002
est+0.002
paste+0.002
 gloss+0.002
Neg logit contributions
 together-0.001
ll-0.001
 far-0.001
 here-0.000
 glad-0.000
Token words
Token ID2456
Feature activation-1.74e-02

Pos logit contributions
roach+0.078
 gloss+0.074
paste+0.074
 sensation+0.069
 scalp+0.069
Neg logit contributions
 together-0.106
 far-0.095
 their-0.089
 our-0.088
 love-0.087
Token,
Token ID11
Feature activation-4.01e-02

Pos logit contributions
roach+0.041
 aids+0.040
est+0.039
 sensation+0.039
paste+0.039
Neg logit contributions
ll-0.021
 together-0.021
 far-0.020
 here-0.016
 glad-0.015
Token but
Token ID475
Feature activation+3.85e-03

Pos logit contributions
roach+0.089
est+0.084
 aids+0.083
eness+0.083
 phrase+0.083
Neg logit contributions
 together-0.055
 far-0.052
ll-0.050
 glad-0.044
 here-0.043
Token their
Token ID511
Feature activation-1.87e-02

Pos logit contributions
ll+0.003
 together+0.003
 far+0.003
 glad+0.002
 here+0.002
Neg logit contributions
roach-0.011
est-0.010
 aids-0.010
eness-0.010
 phrase-0.010
Token words
Token ID2456
Feature activation-1.13e-02

Pos logit contributions
roach+0.033
paste+0.032
eness+0.031
 aids+0.031
est+0.031
Neg logit contributions
ll-0.033
 together-0.032
 far-0.032
 glad-0.027
 here-0.027
Token my
Token ID616
Feature activation-2.88e-03

Pos logit contributions
 aids+0.036
roach+0.036
paste+0.035
 gloss+0.035
eness+0.035
Neg logit contributions
ll-0.027
 together-0.026
 far-0.025
 here-0.021
 glad-0.021
Token water
Token ID1660
Feature activation+2.07e-02

Pos logit contributions
 aids+0.006
roach+0.006
paste+0.006
eness+0.006
 gloss+0.006
Neg logit contributions
ll-0.004
 together-0.004
 far-0.004
 glad-0.003
 here-0.003
Token.
Token ID13
Feature activation+2.29e-02

Pos logit contributions
ll+0.032
 far+0.028
 together+0.028
 glad+0.026
 Enjoy+0.024
Neg logit contributions
est-0.040
paste-0.039
 aids-0.039
roach-0.039
eness-0.039
Token It
Token ID632
Feature activation+7.59e-04

Pos logit contributions
ll+0.036
 together+0.032
 far+0.032
 glad+0.027
 here+0.026
Neg logit contributions
 aids-0.046
roach-0.045
paste-0.044
eness-0.044
est-0.044
Token will
Token ID481
Feature activation+1.41e-02

Pos logit contributions
ll+0.001
 together+0.001
 far+0.001
 glad+0.001
 here+0.001
Neg logit contributions
 aids-0.002
roach-0.002
paste-0.002
eness-0.002
est-0.002
Token make
Token ID787
Feature activation-5.29e-02

Pos logit contributions
ll+0.025
 together+0.023
 far+0.022
 glad+0.020
We+0.019
Neg logit contributions
 aids-0.024
paste-0.023
roach-0.023
est-0.023
eness-0.023
Token your
Token ID534
Feature activation-2.07e-02

Pos logit contributions
 aids+0.145
roach+0.142
paste+0.141
est+0.141
eness+0.140
Neg logit contributions
ll-0.044
 together-0.035
 far-0.034
 glad-0.024
 here-0.022
Token bottom
Token ID4220
Feature activation-9.92e-04

Pos logit contributions
 aids+0.044
roach+0.043
paste+0.043
eness+0.042
 gloss+0.042
Neg logit contributions
ll-0.030
 together-0.027
 far-0.027
 glad-0.023
 here-0.022
Token feel
Token ID1254
Feature activation-1.01e-02

Pos logit contributions
 aids+0.002
roach+0.002
paste+0.002
est+0.002
eness+0.002
Neg logit contributions
ll-0.001
 together-0.001
 far-0.001
 glad-0.001
 here-0.001
Token better
Token ID1365
Feature activation+4.80e-03

Pos logit contributions
 aids+0.020
paste+0.020
est+0.019
roach+0.019
eness+0.019
Neg logit contributions
ll-0.016
 together-0.014
 far-0.014
 glad-0.012
 here-0.011
Token."
Token ID526
Feature activation+1.25e-02

Pos logit contributions
ll+0.008
 far+0.007
 together+0.006
 Enjoy+0.006
 glad+0.006
Neg logit contributions
est-0.009
paste-0.009
eness-0.009
der-0.008
 aids-0.008
Token that
Token ID326
Feature activation+6.45e-03

Pos logit contributions
roach+0.051
eness+0.049
 sensation+0.049
 aids+0.048
 poses+0.048
Neg logit contributions
 together-0.026
 far-0.024
ll-0.024
 here-0.020
 they-0.019
Token they
Token ID484
Feature activation-7.67e-03

Pos logit contributions
ll+0.007
 together+0.007
 far+0.006
 here+0.005
 glad+0.005
Neg logit contributions
roach-0.016
 aids-0.016
eness-0.015
 phrase-0.015
paste-0.015
Token were
Token ID547
Feature activation+7.59e-03

Pos logit contributions
 aids+0.014
est+0.014
paste+0.014
eness+0.014
roach+0.014
Neg logit contributions
ll-0.013
 far-0.011
 together-0.011
 glad-0.010
arl-0.010
Token going
Token ID1016
Feature activation+1.95e-02

Pos logit contributions
ll+0.012
 together+0.009
 far+0.009
arl+0.009
We+0.009
Neg logit contributions
 aids-0.014
roach-0.014
est-0.014
 markings-0.014
paste-0.014
Token to
Token ID284
Feature activation-4.90e-03

Pos logit contributions
ll+0.027
 far+0.021
 glad+0.021
arl+0.020
 together+0.019
Neg logit contributions
roach-0.039
 gloss-0.038
paste-0.038
est-0.038
 markings-0.038
Token make
Token ID787
Feature activation-5.29e-02

Pos logit contributions
 aids+0.011
roach+0.010
est+0.010
 phrase+0.010
 movements+0.010
Neg logit contributions
ll-0.007
 together-0.006
 far-0.006
 glad-0.005
arl-0.005
Token something
Token ID1223
Feature activation-3.63e-03

Pos logit contributions
roach+0.113
 aids+0.109
eness+0.108
paste+0.108
 phrase+0.108
Neg logit contributions
 together-0.079
ll-0.073
 far-0.073
 here-0.060
 our-0.060
Token special
Token ID2041
Feature activation-8.12e-03

Pos logit contributions
 aids+0.008
roach+0.008
eness+0.008
paste+0.008
est+0.008
Neg logit contributions
ll-0.005
 together-0.005
 far-0.004
 glad-0.004
 here-0.004
Token.
Token ID13
Feature activation-1.57e-02

Pos logit contributions
 aids+0.019
roach+0.019
paste+0.018
eness+0.018
 gloss+0.018
Neg logit contributions
ll-0.010
 together-0.009
 far-0.009
 here-0.007
 glad-0.007
Token Inside
Token ID14384
Feature activation+6.25e-03

Pos logit contributions
roach+0.034
paste+0.033
eness+0.033
 aids+0.033
est+0.033
Neg logit contributions
ll-0.021
 together-0.021
 far-0.020
 glad-0.016
 here-0.016
Token the
Token ID262
Feature activation-6.86e-03

Pos logit contributions
ll+0.010
 together+0.009
 far+0.009
arl+0.009
 glad+0.009
Neg logit contributions
est-0.011
roach-0.011
 aids-0.011
 streaks-0.010
eness-0.010
Token store
Token ID3650
Feature activation+1.06e-02

Pos logit contributions
 aids+0.015
roach+0.015
paste+0.014
 gloss+0.014
est+0.014
Neg logit contributions
ll-0.013
 together-0.012
 far-0.012
 glad-0.010
 here-0.010
Token and
Token ID290
Feature activation+9.29e-03

Pos logit contributions
ll+0.017
 glad+0.013
arl+0.013
 far+0.013
 Enjoy+0.013
Neg logit contributions
est-0.019
eness-0.018
 aids-0.018
paste-0.017
der-0.017
Token buy
Token ID2822
Feature activation-1.57e-02

Pos logit contributions
ll+0.014
 together+0.011
arl+0.010
 far+0.010
 glad+0.009
Neg logit contributions
 aids-0.018
paste-0.017
 gloss-0.017
roach-0.017
 movements-0.017
Token some
Token ID617
Feature activation-3.30e-02

Pos logit contributions
 aids+0.029
roach+0.028
eness+0.027
 phrase+0.027
paste+0.027
Neg logit contributions
ll-0.027
 together-0.026
 far-0.025
 glad-0.022
 here-0.022
Token more
Token ID517
Feature activation-2.56e-02

Pos logit contributions
 aids+0.056
roach+0.052
 phrase+0.052
eness+0.052
 sensation+0.051
Neg logit contributions
ll-0.064
 together-0.062
 far-0.060
 here-0.053
 our-0.052
Token cel
Token ID18725
Feature activation-5.73e-02

Pos logit contributions
 aids+0.044
roach+0.042
eness+0.041
 phrase+0.041
 sensation+0.040
Neg logit contributions
ll-0.049
 together-0.048
 far-0.046
 here-0.041
 our-0.040
Tokenery
Token ID1924
Feature activation-1.74e-02

Pos logit contributions
 aids+0.070
roach+0.069
paste+0.068
est+0.067
eness+0.067
Neg logit contributions
ll-0.132
 together-0.124
 far-0.124
 glad-0.113
 here-0.110
Token!"
Token ID2474
Feature activation-1.26e-02

Pos logit contributions
 aids+0.040
roach+0.039
paste+0.039
est+0.038
eness+0.038
Neg logit contributions
ll-0.022
 together-0.020
 far-0.019
 glad-0.016
 here-0.015
Token Jane
Token ID12091
Feature activation-1.43e-02

Pos logit contributions
 aids+0.024
roach+0.024
paste+0.023
eness+0.023
est+0.023
Neg logit contributions
ll-0.021
 together-0.019
 far-0.019
 glad-0.016
 here-0.016
Token was
Token ID373
Feature activation-1.02e-02

Pos logit contributions
roach+0.031
 aids+0.030
paste+0.029
eness+0.029
 poses+0.029
Neg logit contributions
ll-0.022
 together-0.020
 far-0.018
 close-0.015
 who-0.015
Token so
Token ID523
Feature activation-8.21e-03

Pos logit contributions
roach+0.026
 aids+0.025
eness+0.024
paste+0.024
 gloss+0.024
Neg logit contributions
 together-0.011
ll-0.011
 far-0.011
 glad-0.009
 here-0.009