TOP ACTIVATIONS
MAX = 3.737

one by one . Inside of each one were plenty of
made something special with the tw ig . â
Lily accidentally knocked over the te ap ot and spilled tea
. After jour ne ying for many days ,
ily was a big girl now . She was going to
big picture of a sun with her orange cr ay on
. He picked up the tw ig and showed it to
they slept happily under the tw ink ling stars and moons
them how to polish the tw igs until they spark led
helping me polish the silver ware and pick the strawberries .
you and play with the tw ig too ?" asked Ch
't have to keep the tw ig if you don 't
anything . She knew the te ap ot and the cups
closer look . Under one of the trees , she saw
the zero . The one with more lines wins ."
, I gave you the tw ig as a gift !"
The End . <|endoftext|> With his fingers tightly gripping the
she drew a big sun with a smile y face .
soon as she stepped outside with the egg , she heard
ily was a big girl now . She had a new

MIN ACTIVATIONS
MIN = -2.827

pe aches on Mondays , Wed nes days , and Fridays
did something special . On Wed nes days she liked to
<|endoftext|> Anna and Ben were best friends . They liked to
upon a time , two best friends were discussing their plans
the sunlight . But , best of all , there was
L ily and Ben were best friends . They liked to
and cars and made a story . They had fun and
S ara and Ben are best friends . They like to
<|endoftext|> John and Sarah were best friends . They always played
<|endoftext|> One day , two best friends decided to go on
S ara and Ben were best friends . They liked to
L ily and Ben were best friends . They liked to
<|endoftext|> Ben and Lily were best friends . They liked to
Wed nes days because on Wed nes days she did something
L ily and Ben were best friends . They liked to
S ara and Ben were best friends . They liked to
<|endoftext|> Anna and Ben were best friends . They liked to
S ara and Mia are best friends . They like to
<|endoftext|> Anna and Ben were best friends . They liked to
top to make it y umm ier . Sarah was still

INTERVAL 2.096 - 3.737
CONTAINS 0.001%

was drawing a big sun with yellow chalk when her little
poured some water from the te ap ot and pretended it
back the village to its former state . With the help
big picture of a sun with her orange cr ay on
soon as she stepped outside with the egg , she heard

INTERVAL 0.455 - 2.096
CONTAINS 12.041%

away . He did not say sorry . J
strong . She poured some milk into the cup
" Aw , mom , can we play a little
Tim and Jen were playing in the park . They saw
me some food ?", the voice said . Lu

INTERVAL -1.186 - 0.455
CONTAINS 87.490%

you have to try new things . Sometimes you might find
't come in ," Lily said . " You are not
searching for treasures and laughing at all the strange things they
his puzzle and thanked Wh isk ers for helping him .
the ground next to her ! She felt so happy and

INTERVAL -2.827 - -1.186
CONTAINS 0.469%

will have a new tie for you ." That
he couldn 't find a way out . Finally
the chain , it was even prett ier than she had
home . The squirrel learned that life is not always fair
want to make a house for their toys .
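The four INTERVAL bands above all have the same width, 1.641, which is (3.737 - (-2.827)) / 4. This suggests the dashboard splits the range [MIN, MAX] into four equal-width bins. The CONTAINS percentages sum to 100.001, which is consistent with rounding. The sketch below reproduces that binning under stated assumptions. The function names are hypothetical. The dashboard does not say which bin a value sitting exactly on an edge belongs to, so the sketch assumes left-closed intervals, with the maximum counted in the top bin. It also assumes the percentages are taken over all sampled activations.

```python
import bisect


def equal_width_edges(lo: float, hi: float, n_bins: int = 4) -> list[float]:
    """Hypothetical helper: split [lo, hi] into n_bins equal-width intervals.

    Returns n_bins + 1 edges. The last edge is set to `hi` exactly so that
    floating-point error cannot push the maximum outside the top bin.
    """
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]


def percent_per_bin(values: list[float], edges: list[float]) -> list[float]:
    """Hypothetical helper: percentage of `values` falling in each interval.

    Assumes left-closed bins [edges[i], edges[i+1]), except that the top
    bin also includes edges[-1]. Values outside [edges[0], edges[-1]] are
    not counted in any bin.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    if not values:
        return [0.0] * n_bins
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the plotted range
        # bisect_right gives the index of the first edge greater than v;
        # clamp so that v == edges[-1] lands in the last bin.
        i = min(bisect.bisect_right(edges, v) - 1, n_bins - 1)
        counts[i] += 1
    return [100.0 * c / len(values) for c in counts]


# Recover the interval boundaries shown above from MIN and MAX.
edges = equal_width_edges(-2.827, 3.737)
print([round(e, 3) for e in edges])  # [-2.827, -1.186, 0.455, 2.096, 3.737]
```

The check on the printed edges uses only numbers that already appear in the dump. The real CONTAINS figures would need the full activation sample, which is not shown here.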
Token one
Token ID530
Feature activation+8.35e-01

Pos logit contributions
adoes+0.027
outing+0.024
 fears+0.024
ows+0.023
inters+0.023
Neg logit contributions
 se-0.013
Ar-0.013
 oct-0.012
 pl-0.012
 sw-0.011
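Each per-token block lists five positive and five negative logit contributions. A leading space in a token string (" fears" vs "adoes") marks a word-initial BPE token. The dump does not say how the full contribution vector is computed. The signs also flip between positions even when the activation is positive: " one" at +0.835 has adoes at +0.027, while " by" at +0.907 has adoes at -0.032. So the sketch below covers only the final selection step and makes no claim about the attribution method. It assumes a mapping from token string to contribution and keeps the k largest positive and k most negative entries. The function name is hypothetical.

```python
import heapq


def top_logit_contributions(contribs: dict[str, float], k: int = 5):
    """Hypothetical helper: the k most positive and k most negative entries.

    Positive and negative lists are filtered separately, so a vocabulary with
    fewer than k positive values never pads the "Pos" list with negatives.
    Ties keep their original insertion order.
    """
    pos = heapq.nlargest(
        k, ((t, v) for t, v in contribs.items() if v > 0), key=lambda tv: tv[1]
    )
    neg = heapq.nsmallest(
        k, ((t, v) for t, v in contribs.items() if v < 0), key=lambda tv: tv[1]
    )
    return pos, neg


# The ten values listed for the " one" token above, truncated to k = 3.
one = {
    "adoes": 0.027, "outing": 0.024, " fears": 0.024, "ows": 0.023, "inters": 0.023,
    " se": -0.013, "Ar": -0.013, " oct": -0.012, " pl": -0.012, " sw": -0.011,
}
pos, neg = top_logit_contributions(one, k=3)
print(pos)
print(neg)
```

With only the ten displayed values as input, the top-3 selection keeps the first three entries of each list as printed in the dump.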
Token by
Token ID416
Feature activation+9.07e-01

Pos logit contributions
Ar+0.023
 se+0.022
 oct+0.021
 sw+0.020
 tort+0.019
Neg logit contributions
adoes-0.032
outing-0.029
ows-0.029
 fears-0.029
 anymore-0.029
Token one
Token ID530
Feature activation+1.09e+00

Pos logit contributions
Ar+0.027
 se+0.027
 oct+0.025
 sur+0.025
rob+0.025
Neg logit contributions
adoes-0.030
 anymore-0.030
 appearance-0.027
 fears-0.027
outing-0.026
Token.
Token ID13
Feature activation+2.19e-01

Pos logit contributions
Ar+0.028
 se+0.028
 oct+0.026
 sw+0.025
 tort+0.024
Neg logit contributions
adoes-0.043
outing-0.040
ows-0.039
 fears-0.039
 anymore-0.039
Token Inside
Token ID14384
Feature activation+1.54e+00

Pos logit contributions
 se+0.007
Ar+0.007
 oct+0.007
 pl+0.007
 tort+0.007
Neg logit contributions
adoes-0.007
outing-0.006
 fears-0.006
ows-0.006
 anymore-0.006
Token of
Token ID286
Feature activation+2.33e+00

Pos logit contributions
Ar+0.042
 se+0.040
 oct+0.038
 tort+0.036
 sur+0.035
Neg logit contributions
adoes-0.058
 anymore-0.053
outing-0.053
ows-0.053
cases-0.051
Token each
Token ID1123
Feature activation+4.85e-01

Pos logit contributions
Ar+0.065
 se+0.061
 oct+0.058
 sur+0.057
 tort+0.056
Neg logit contributions
adoes-0.089
 anymore-0.080
outing-0.078
ows-0.077
 fears-0.076
Token one
Token ID530
Feature activation+1.37e+00

Pos logit contributions
Ar+0.016
 se+0.016
 oct+0.015
 tort+0.014
 sw+0.014
Neg logit contributions
adoes-0.016
 fears-0.014
outing-0.014
ows-0.014
 anymore-0.014
Token were
Token ID547
Feature activation+1.28e-01

Pos logit contributions
Ar+0.036
 se+0.035
 oct+0.033
 sw+0.031
 tort+0.031
Neg logit contributions
adoes-0.054
outing-0.049
ows-0.049
 fears-0.048
 anymore-0.048
Token plenty
Token ID6088
Feature activation-1.46e-01

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.003
 sur+0.003
 tort+0.003
Neg logit contributions
adoes-0.005
ows-0.005
outing-0.005
 anymore-0.004
 fears-0.004
Token of
Token ID286
Feature activation-6.22e-01

Pos logit contributions
ows+0.005
adoes+0.005
 anymore+0.005
outing+0.005
 fears+0.005
Neg logit contributions
Ar-0.004
 se-0.004
 oct-0.004
 tort-0.004
rob-0.004
Token made
Token ID925
Feature activation+1.96e-01

Pos logit contributions
 tort+0.010
 oct+0.010
 Rib+0.010
rob+0.010
 cuc+0.010
Neg logit contributions
ows-0.012
 fears-0.012
outed-0.011
adoes-0.011
 anymore-0.011
Token something
Token ID1223
Feature activation+9.00e-01

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.005
 sw+0.004
 tort+0.004
Neg logit contributions
adoes-0.008
outing-0.007
ows-0.007
 fears-0.007
 anymore-0.007
Token special
Token ID2041
Feature activation+2.52e-01

Pos logit contributions
Ar+0.025
 se+0.022
 oct+0.020
thank+0.020
 tort+0.020
Neg logit contributions
adoes-0.034
ows-0.032
 anymore-0.032
outing-0.032
 fears-0.031
Token with
Token ID351
Feature activation+1.31e+00

Pos logit contributions
Ar+0.008
 se+0.007
 tort+0.007
 oct+0.007
 sw+0.007
Neg logit contributions
adoes-0.009
 anymore-0.008
ows-0.008
outing-0.007
 fears-0.007
Token the
Token ID262
Feature activation-9.78e-02

Pos logit contributions
Ar+0.035
 se+0.034
 oct+0.031
 tort+0.030
 sur+0.030
Neg logit contributions
adoes-0.051
outing-0.045
 fears-0.045
ows-0.045
 anymore-0.044
Token tw
Token ID665
Feature activation+2.31e+00

Pos logit contributions
adoes+0.004
 fears+0.004
 anymore+0.003
outing+0.003
 appearance+0.003
Neg logit contributions
Ar-0.002
rob-0.002
 se-0.002
thank-0.002
 oct-0.002
Tokenig
Token ID328
Feature activation+4.58e-01

Pos logit contributions
Ar+0.093
 se+0.087
 oct+0.086
 tort+0.085
thank+0.084
Neg logit contributions
adoes-0.054
 fears-0.050
ows-0.049
outing-0.047
 anymore-0.042
Token.
Token ID13
Feature activation+2.78e-01

Pos logit contributions
Ar+0.011
 se+0.011
 oct+0.010
 sw+0.010
 tort+0.009
Neg logit contributions
adoes-0.019
outing-0.017
ows-0.017
 fears-0.017
 anymore-0.017
Tokenâ
Token ID22940
Feature activation-1.33e-01

Pos logit contributions
 se+0.010
Ar+0.010
 oct+0.009
 sw+0.009
 pl+0.009
Neg logit contributions
adoes-0.008
ows-0.007
cases-0.007
 911-0.007
outing-0.007
Token
Token ID26391
Feature activation-4.80e-02

Pos logit contributions
adoes+0.004
ows+0.004
outing+0.004
 anymore+0.004
 fears+0.004
Neg logit contributions
 se-0.004
Ar-0.004
 oct-0.004
 sw-0.004
 pl-0.004
Token\n
Token ID198
Feature activation+4.01e-01

Pos logit contributions
adoes+0.002
outing+0.002
 fears+0.002
 anymore+0.002
outed+0.002
Neg logit contributions
Ar-0.001
 se-0.001
 oct-0.001
 pl-0.001
 sw-0.001
Token Lily
Token ID20037
Feature activation-2.15e-01

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.002
outing-0.002
ows-0.001
 fears-0.001
 anymore-0.001
Token accidentally
Token ID14716
Feature activation+2.66e-01

Pos logit contributions
adoes+0.009
outing+0.008
ows+0.008
 fears+0.008
 anymore+0.008
Neg logit contributions
 se-0.006
Ar-0.005
 oct-0.005
 sw-0.005
 pl-0.005
Token knocked
Token ID13642
Feature activation+4.31e-01

Pos logit contributions
 se+0.008
Ar+0.008
 oct+0.008
 sw+0.007
 pl+0.007
Neg logit contributions
adoes-0.009
outing-0.008
ows-0.008
 fears-0.008
 anymore-0.008
Token over
Token ID625
Feature activation-7.26e-02

Pos logit contributions
 se+0.011
Ar+0.011
 oct+0.011
 sw+0.010
 pl+0.010
Neg logit contributions
adoes-0.017
outing-0.016
ows-0.015
 fears-0.015
 anymore-0.015
Token the
Token ID262
Feature activation+3.68e-02

Pos logit contributions
adoes+0.003
outing+0.003
ows+0.003
 fears+0.003
 anymore+0.003
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.001
 sw-0.001
 pl-0.001
Token te
Token ID573
Feature activation+2.29e+00

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
 fears-0.001
 anymore-0.001
Tokenap
Token ID499
Feature activation+5.30e-01

Pos logit contributions
 se+0.085
Ar+0.084
 oct+0.080
 sw+0.077
 pl+0.076
Neg logit contributions
adoes-0.066
outing-0.058
ows-0.056
 fears-0.055
 anymore-0.055
Tokenot
Token ID313
Feature activation+2.64e-01

Pos logit contributions
 se+0.016
 oct+0.014
 sw+0.014
 pl+0.014
Ar+0.014
Neg logit contributions
adoes-0.019
 fears-0.017
outing-0.017
 anymore-0.017
cases-0.017
Token and
Token ID290
Feature activation-2.05e-01

Pos logit contributions
 se+0.006
Ar+0.006
 oct+0.006
 sw+0.005
 pl+0.005
Neg logit contributions
adoes-0.011
outing-0.010
 fears-0.010
ows-0.010
 anymore-0.010
Token spilled
Token ID31896
Feature activation+1.70e-01

Pos logit contributions
adoes+0.009
outing+0.008
ows+0.008
 fears+0.008
 anymore+0.008
Neg logit contributions
Ar-0.005
 se-0.005
 oct-0.004
 sw-0.004
 tort-0.004
Token tea
Token ID8887
Feature activation+1.86e-01

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.007
outing-0.006
ows-0.006
 fears-0.006
cases-0.006
Token.
Token ID13
Feature activation+3.11e-01

Pos logit contributions
adoes+0.005
 anymore+0.005
ows+0.005
cases+0.005
outing+0.004
Neg logit contributions
Ar-0.005
 se-0.004
 oct-0.004
 sur-0.004
 sw-0.004
Token\n
Token ID198
Feature activation+1.93e-01

Pos logit contributions
 se+0.006
Ar+0.006
 oct+0.005
 sw+0.005
 pl+0.005
Neg logit contributions
adoes-0.015
outing-0.014
ows-0.013
 fears-0.013
 anymore-0.013
Token\n
Token ID198
Feature activation-1.81e-01

Pos logit contributions
 se+0.004
 oct+0.004
 tort+0.004
Ar+0.004
 cuc+0.004
Neg logit contributions
adoes-0.008
ows-0.008
 anymore-0.008
outing-0.008
 fears-0.007
TokenAfter
Token ID3260
Feature activation+8.91e-01

Pos logit contributions
adoes+0.007
outing+0.006
ows+0.006
 fears+0.006
 anymore+0.006
Neg logit contributions
Ar-0.005
 se-0.005
 oct-0.004
 sw-0.004
 pl-0.004
Token jour
Token ID30068
Feature activation+7.23e-01

Pos logit contributions
Ar+0.028
 se+0.025
 sur+0.025
 tort+0.024
 oct+0.024
Neg logit contributions
adoes-0.028
 anymore-0.027
cases-0.025
outing-0.024
 fears-0.024
Tokenne
Token ID710
Feature activation+2.29e+00

Pos logit contributions
 se+0.026
Ar+0.025
 oct+0.024
 sw+0.023
 pl+0.023
Neg logit contributions
adoes-0.021
ows-0.019
 anymore-0.019
outing-0.019
iders-0.018
Tokenying
Token ID1112
Feature activation+1.67e-01

Pos logit contributions
Ar+0.070
 se+0.069
 oct+0.064
 tort+0.061
 sw+0.060
Neg logit contributions
adoes-0.080
outing-0.072
ows-0.071
 fears-0.070
 anymore-0.070
Token for
Token ID329
Feature activation-6.36e-01

Pos logit contributions
 se+0.003
 oct+0.003
 sw+0.003
 pl+0.003
Ar+0.003
Neg logit contributions
adoes-0.008
outing-0.007
ows-0.007
 fears-0.007
outed-0.007
Token many
Token ID867
Feature activation+4.41e-01

Pos logit contributions
adoes+0.025
 anymore+0.022
 fears+0.022
outing+0.021
ows+0.021
Neg logit contributions
Ar-0.017
 se-0.016
 oct-0.015
 sur-0.015
 tort-0.015
Token days
Token ID1528
Feature activation+8.14e-02

Pos logit contributions
Ar+0.011
 se+0.011
 oct+0.010
 tort+0.010
 sur+0.010
Neg logit contributions
adoes-0.017
 fears-0.016
 anymore-0.016
ows-0.015
outing-0.015
Token,
Token ID11
Feature activation+4.65e-01

Pos logit contributions
Ar+0.003
 se+0.002
 oct+0.002
 tort+0.002
 sw+0.002
Neg logit contributions
adoes-0.003
 anymore-0.003
ows-0.003
outing-0.002
cases-0.002
Tokenily
Token ID813
Feature activation-1.12e-01

Pos logit contributions
 se+0.025
Ar+0.025
 oct+0.024
 sw+0.022
 pl+0.022
Neg logit contributions
adoes-0.030
outing-0.026
ows-0.025
 fears-0.025
 anymore-0.025
Token was
Token ID373
Feature activation+1.92e-01

Pos logit contributions
adoes+0.003
ows+0.003
 fears+0.003
 anymore+0.003
outing+0.003
Neg logit contributions
Ar-0.004
 se-0.004
 oct-0.004
 tort-0.004
 sur-0.004
Token a
Token ID257
Feature activation+2.18e-01

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.005
 tort+0.005
 sw+0.005
Neg logit contributions
adoes-0.007
outing-0.007
ows-0.006
 fears-0.006
 anymore-0.006
Token big
Token ID1263
Feature activation+1.49e-02

Pos logit contributions
Ar+0.007
 se+0.006
rob+0.006
 oct+0.006
thank+0.006
Neg logit contributions
adoes-0.007
outing-0.007
 fears-0.006
 anymore-0.006
ows-0.006
Token girl
Token ID2576
Feature activation-2.13e-01

Pos logit contributions
Ar+0.000
 se+0.000
 oct+0.000
 sw+0.000
 pl+0.000
Neg logit contributions
adoes-0.001
outing-0.000
 anymore-0.000
 fears-0.000
ows-0.000
Token now
Token ID783
Feature activation+2.24e+00

Pos logit contributions
 anymore+0.006
adoes+0.006
outing+0.006
ows+0.006
 slams+0.005
Neg logit contributions
Ar-0.008
 se-0.007
 oct-0.007
 tort-0.006
rob-0.006
Token.
Token ID13
Feature activation-9.06e-02

Pos logit contributions
Ar+0.064
 se+0.059
 oct+0.057
 Bub+0.054
 tort+0.054
Neg logit contributions
adoes-0.083
 anymore-0.077
outing-0.076
ows-0.072
 fears-0.072
Token She
Token ID1375
Feature activation-1.44e-01

Pos logit contributions
adoes+0.003
outing+0.003
ows+0.002
 fears+0.002
cases+0.002
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 Bub-0.003
 tort-0.003
Token was
Token ID373
Feature activation-2.75e-02

Pos logit contributions
adoes+0.005
ows+0.005
outing+0.005
 anymore+0.005
 fears+0.005
Neg logit contributions
Ar-0.004
 se-0.004
 oct-0.004
 tort-0.004
 sur-0.003
Token going
Token ID1016
Feature activation-2.39e-01

Pos logit contributions
adoes+0.001
outing+0.001
ows+0.001
 fears+0.001
 anymore+0.001
Neg logit contributions
Ar-0.001
 se-0.001
 oct-0.001
 sw-0.001
 tort-0.001
Token to
Token ID284
Feature activation+9.05e-01

Pos logit contributions
adoes+0.009
outing+0.008
ows+0.008
 anymore+0.008
cases+0.007
Neg logit contributions
Ar-0.007
 se-0.007
 oct-0.007
 tort-0.006
 sw-0.006
Token big
Token ID1263
Feature activation+2.26e-02

Pos logit contributions
adoes+0.007
 anymore+0.006
ows+0.006
outing+0.006
 fears+0.006
Neg logit contributions
Ar-0.005
 se-0.005
 oct-0.005
 sur-0.004
 sw-0.004
Token picture
Token ID4286
Feature activation-3.30e-01

Pos logit contributions
 se+0.001
 oct+0.001
Ar+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
cases-0.001
 fears-0.001
Token of
Token ID286
Feature activation+6.56e-01

Pos logit contributions
adoes+0.014
outing+0.013
 fears+0.013
ows+0.013
 anymore+0.012
Neg logit contributions
 se-0.007
Ar-0.007
 oct-0.007
 sw-0.006
 pl-0.006
Token a
Token ID257
Feature activation+2.91e-01

Pos logit contributions
Ar+0.017
 se+0.016
 oct+0.015
 sur+0.015
 tort+0.015
Neg logit contributions
adoes-0.026
 anymore-0.023
outing-0.023
ows-0.022
 fears-0.022
Token sun
Token ID4252
Feature activation-8.43e-02

Pos logit contributions
Ar+0.009
 se+0.008
 oct+0.008
 sur+0.007
thank+0.007
Neg logit contributions
adoes-0.010
 anymore-0.009
outing-0.009
 fears-0.009
ows-0.009
Token with
Token ID351
Feature activation+2.23e+00

Pos logit contributions
adoes+0.004
outing+0.003
ows+0.003
 fears+0.003
 anymore+0.003
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.002
 sw-0.002
 pl-0.002
Token her
Token ID607
Feature activation+8.96e-02

Pos logit contributions
Ar+0.054
 se+0.052
 oct+0.049
 tort+0.047
 sur+0.046
Neg logit contributions
adoes-0.092
outing-0.082
ows-0.081
 fears-0.081
 anymore-0.080
Token orange
Token ID10912
Feature activation+3.84e-01

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.002
 tort+0.002
 sw+0.002
Neg logit contributions
adoes-0.003
outing-0.003
ows-0.003
 fears-0.003
 anymore-0.003
Token cr
Token ID1067
Feature activation+7.12e-01

Pos logit contributions
 se+0.011
Ar+0.011
 oct+0.010
 sw+0.009
 pl+0.009
Neg logit contributions
adoes-0.014
outing-0.013
ows-0.013
 fears-0.013
 anymore-0.012
Tokenay
Token ID323
Feature activation-3.40e-01

Pos logit contributions
 se+0.025
Ar+0.025
 oct+0.024
 sw+0.023
 pl+0.023
Neg logit contributions
adoes-0.021
outing-0.019
ows-0.018
 fears-0.018
 anymore-0.018
Tokenon
Token ID261
Feature activation+5.41e-01

Pos logit contributions
adoes+0.012
outing+0.011
ows+0.011
 fears+0.010
 anymore+0.010
Neg logit contributions
Ar-0.010
 se-0.010
 oct-0.010
 sw-0.009
 tort-0.009
Token.
Token ID13
Feature activation+3.98e-01

Pos logit contributions
Ar+0.006
 se+0.006
 oct+0.006
 sur+0.005
 sw+0.005
Neg logit contributions
adoes-0.008
 anymore-0.007
ows-0.007
 fears-0.007
outing-0.007
Token He
Token ID679
Feature activation+4.14e-01

Pos logit contributions
Ar+0.010
 se+0.009
 oct+0.009
 sw+0.008
 pl+0.008
Neg logit contributions
adoes-0.017
outing-0.015
ows-0.015
 fears-0.015
 anymore-0.015
Token picked
Token ID6497
Feature activation+8.39e-01

Pos logit contributions
Ar+0.011
rob+0.010
 oct+0.010
thank+0.010
 se+0.010
Neg logit contributions
 fears-0.013
adoes-0.013
ows-0.013
cases-0.013
outed-0.012
Token up
Token ID510
Feature activation-1.75e-01

Pos logit contributions
 se+0.028
Ar+0.027
 oct+0.026
 sw+0.025
 pl+0.025
Neg logit contributions
adoes-0.028
outing-0.025
ows-0.024
 fears-0.024
 anymore-0.023
Token the
Token ID262
Feature activation+3.49e-01

Pos logit contributions
adoes+0.009
outing+0.008
ows+0.008
 fears+0.008
inters+0.008
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.002
 pl-0.002
Token tw
Token ID665
Feature activation+2.23e+00

Pos logit contributions
Ar+0.010
 se+0.010
 oct+0.009
 sw+0.008
 tort+0.008
Neg logit contributions
adoes-0.013
outing-0.012
ows-0.012
 fears-0.012
 anymore-0.011
Tokenig
Token ID328
Feature activation+1.83e-01

Pos logit contributions
Ar+0.085
 se+0.082
 oct+0.079
 sw+0.077
 tort+0.076
Neg logit contributions
adoes-0.061
 fears-0.054
outing-0.054
ows-0.053
 anymore-0.048
Token and
Token ID290
Feature activation-2.06e-01

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.004
 sw+0.003
 pl+0.003
Neg logit contributions
adoes-0.008
outing-0.007
 fears-0.007
ows-0.007
 anymore-0.007
Token showed
Token ID3751
Feature activation-1.96e-01

Pos logit contributions
adoes+0.009
ows+0.008
cases+0.007
 fears+0.007
 slams+0.007
Neg logit contributions
Ar-0.005
 oct-0.005
 se-0.004
 tort-0.004
 sur-0.004
Token it
Token ID340
Feature activation-1.04e+00

Pos logit contributions
adoes+0.008
outing+0.007
ows+0.007
 fears+0.007
 anymore+0.007
Neg logit contributions
 se-0.005
Ar-0.005
 oct-0.004
 sw-0.004
 pl-0.004
Token to
Token ID284
Feature activation+4.75e-01

Pos logit contributions
adoes+0.047
outing+0.045
outed+0.042
ows+0.042
iders+0.042
Neg logit contributions
 se-0.021
 pl-0.019
 sw-0.019
 oct-0.018
Ar-0.018
Token they
Token ID484
Feature activation+9.88e-02

Pos logit contributions
Ar+0.009
 se+0.009
 oct+0.009
 pl+0.008
 sw+0.008
Neg logit contributions
adoes-0.009
 anymore-0.008
outing-0.008
ows-0.007
 fears-0.007
Token slept
Token ID21256
Feature activation-9.09e-02

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.003
 tort+0.002
 sur+0.002
Neg logit contributions
adoes-0.003
ows-0.003
 fears-0.003
 anymore-0.003
outing-0.003
Token happily
Token ID18177
Feature activation+8.79e-01

Pos logit contributions
adoes+0.004
ows+0.003
 anymore+0.003
outing+0.003
 fears+0.003
Neg logit contributions
Ar-0.002
 se-0.002
 oct-0.002
 tort-0.002
 sur-0.002
Token under
Token ID739
Feature activation+3.58e-01

Pos logit contributions
Ar+0.018
 se+0.017
 oct+0.017
 tort+0.015
 cuc+0.015
Neg logit contributions
adoes-0.038
ows-0.037
 anymore-0.037
outing-0.035
 fears-0.035
Token the
Token ID262
Feature activation+2.68e-01

Pos logit contributions
Ar+0.009
 se+0.009
 sur+0.008
 tort+0.008
 oct+0.008
Neg logit contributions
adoes-0.014
 anymore-0.013
 fears-0.013
ows-0.012
outing-0.012
Token tw
Token ID665
Feature activation+2.23e+00

Pos logit contributions
 se+0.007
Ar+0.007
 oct+0.006
 sw+0.006
 pl+0.006
Neg logit contributions
adoes-0.011
outing-0.010
ows-0.010
 fears-0.010
 anymore-0.010
Tokenink
Token ID676
Feature activation+1.11e+00

Pos logit contributions
 se+0.071
Ar+0.069
 oct+0.065
 pl+0.063
 sw+0.062
Neg logit contributions
adoes-0.075
outing-0.068
 anymore-0.066
ows-0.065
 fears-0.063
Tokenling
Token ID1359
Feature activation-8.93e-01

Pos logit contributions
 se+0.039
 oct+0.036
Ar+0.036
 sw+0.036
 pl+0.035
Neg logit contributions
adoes-0.035
outing-0.032
ows-0.029
outed-0.029
 fears-0.029
Token stars
Token ID5788
Feature activation+2.49e-01

Pos logit contributions
adoes+0.038
outing+0.035
ows+0.035
 fears+0.034
 anymore+0.033
Neg logit contributions
 se-0.021
Ar-0.020
 oct-0.019
 sw-0.018
 pl-0.018
Token and
Token ID290
Feature activation-3.22e-01

Pos logit contributions
Ar+0.006
 se+0.006
 oct+0.006
 sw+0.005
 tort+0.005
Neg logit contributions
adoes-0.010
outing-0.009
ows-0.009
 anymore-0.009
 fears-0.009
Token moons
Token ID31093
Feature activation-6.69e-01

Pos logit contributions
adoes+0.013
 fears+0.012
ows+0.012
 anymore+0.012
outing+0.011
Neg logit contributions
Ar-0.008
 se-0.007
 oct-0.007
 tort-0.007
 sur-0.007
Token them
Token ID606
Feature activation-3.18e-01

Pos logit contributions
adoes+0.018
 fears+0.016
ows+0.015
 anymore+0.015
outing+0.015
Neg logit contributions
Ar-0.017
 se-0.017
thank-0.016
 tort-0.015
 sur-0.015
Token how
Token ID703
Feature activation-3.63e-01

Pos logit contributions
adoes+0.012
outing+0.011
ows+0.010
outed+0.010
iders+0.010
Neg logit contributions
 se-0.009
 oct-0.009
Ar-0.009
 sw-0.008
 pl-0.008
Token to
Token ID284
Feature activation-2.49e-01

Pos logit contributions
adoes+0.013
outing+0.012
 fears+0.012
ows+0.011
 anymore+0.011
Neg logit contributions
Ar-0.011
 se-0.010
 oct-0.010
 tort-0.009
thank-0.009
Token polish
Token ID25245
Feature activation+4.07e-01

Pos logit contributions
adoes+0.009
 fears+0.008
 anymore+0.008
ows+0.008
 appearance+0.008
Neg logit contributions
Ar-0.007
rob-0.006
 oct-0.006
 se-0.006
 sur-0.006
Token the
Token ID262
Feature activation+4.79e-01

Pos logit contributions
Ar+0.011
 se+0.011
 oct+0.010
thank+0.010
 tort+0.010
Neg logit contributions
adoes-0.016
ows-0.014
 anymore-0.013
 fears-0.013
outing-0.013
Token tw
Token ID665
Feature activation+2.23e+00

Pos logit contributions
Ar+0.008
 se+0.007
 oct+0.007
thank+0.007
rob+0.006
Neg logit contributions
adoes-0.022
 fears-0.022
 anymore-0.020
 appearance-0.020
ows-0.020
Tokenigs
Token ID9235
Feature activation+3.82e-01

Pos logit contributions
Ar+0.091
 se+0.087
 oct+0.086
 sw+0.083
 tort+0.082
Neg logit contributions
adoes-0.053
ows-0.047
 fears-0.047
outing-0.045
 anymore-0.041
Token until
Token ID1566
Feature activation-1.15e-01

Pos logit contributions
Ar+0.010
 se+0.010
 oct+0.009
 tort+0.008
 sw+0.008
Neg logit contributions
adoes-0.015
ows-0.014
outing-0.014
 anymore-0.014
 fears-0.013
Token they
Token ID484
Feature activation+2.25e-01

Pos logit contributions
adoes+0.004
 anymore+0.004
outing+0.004
 fears+0.004
cases+0.003
Neg logit contributions
Ar-0.003
 se-0.003
 tort-0.003
 oct-0.003
rob-0.003
Token spark
Token ID9009
Feature activation+7.16e-01

Pos logit contributions
Ar+0.007
 tort+0.006
 oct+0.006
rob+0.006
 se+0.006
Neg logit contributions
adoes-0.007
 anymore-0.007
ows-0.007
 fears-0.007
outing-0.007
Tokenled
Token ID992
Feature activation-3.38e-01

Pos logit contributions
 se+0.027
Ar+0.026
 oct+0.025
 sw+0.025
 pl+0.024
Neg logit contributions
adoes-0.020
outing-0.018
 fears-0.017
ows-0.017
 anymore-0.017
Token helping
Token ID5742
Feature activation-1.30e+00

Pos logit contributions
adoes+0.008
 911+0.007
outing+0.007
 aids+0.007
 slams+0.007
Neg logit contributions
Ar-0.009
rob-0.007
 sur-0.007
 se-0.007
 tort-0.007
Token me
Token ID502
Feature activation+3.09e-01

Pos logit contributions
adoes+0.047
outing+0.043
ows+0.042
 fears+0.041
 anymore+0.041
Neg logit contributions
 se-0.039
Ar-0.038
 oct-0.036
 sw-0.034
 pl-0.034
Token polish
Token ID25245
Feature activation+6.26e-01

Pos logit contributions
Ar+0.009
 se+0.008
 oct+0.007
rob+0.007
 tort+0.007
Neg logit contributions
adoes-0.012
ows-0.011
outing-0.011
 anymore-0.011
 fears-0.011
Token the
Token ID262
Feature activation+3.90e-01

Pos logit contributions
Ar+0.016
 se+0.016
 oct+0.015
 sw+0.014
 tort+0.014
Neg logit contributions
adoes-0.025
outing-0.023
ows-0.023
 anymore-0.022
 fears-0.022
Token silver
Token ID8465
Feature activation+7.03e-01

Pos logit contributions
 se+0.010
Ar+0.010
 oct+0.009
 sw+0.009
 pl+0.009
Neg logit contributions
adoes-0.016
outing-0.014
ows-0.014
 fears-0.014
 anymore-0.014
Tokenware
Token ID1574
Feature activation+2.23e+00

Pos logit contributions
Ar+0.018
 se+0.018
 oct+0.017
 sw+0.016
 tort+0.016
Neg logit contributions
adoes-0.028
outing-0.025
ows-0.025
 anymore-0.025
 fears-0.024
Token and
Token ID290
Feature activation-1.63e-01

Pos logit contributions
Ar+0.069
 se+0.062
 oct+0.059
 tort+0.057
 sur+0.057
Neg logit contributions
adoes-0.080
ows-0.078
 anymore-0.073
outing-0.071
cases-0.071
Token pick
Token ID2298
Feature activation+4.81e-01

Pos logit contributions
adoes+0.006
 fears+0.005
outing+0.005
 anymore+0.005
ows+0.005
Neg logit contributions
Ar-0.005
 se-0.004
 oct-0.004
rob-0.004
 sur-0.004
Token the
Token ID262
Feature activation+3.25e-01

Pos logit contributions
Ar+0.015
 se+0.014
 oct+0.014
 tort+0.013
 sur+0.013
Neg logit contributions
adoes-0.016
 anymore-0.015
ows-0.014
outing-0.014
 fears-0.014
Token strawberries
Token ID36973
Feature activation+3.67e-01

Pos logit contributions
Ar+0.009
 se+0.008
 oct+0.008
 sur+0.008
 Bub+0.008
Neg logit contributions
adoes-0.012
 anymore-0.011
outing-0.011
 fears-0.011
 appearance-0.010
Token.
Token ID13
Feature activation-1.25e-01

Pos logit contributions
Ar+0.012
 se+0.011
 oct+0.010
 tort+0.010
 sur+0.010
Neg logit contributions
adoes-0.012
ows-0.012
 anymore-0.012
outing-0.011
cases-0.011
Token you
Token ID345
Feature activation+2.02e-01

Pos logit contributions
 se+0.017
Ar+0.016
 oct+0.016
 sw+0.015
 pl+0.014
Neg logit contributions
adoes-0.023
outing-0.022
ows-0.021
 fears-0.021
cases-0.020
Token and
Token ID290
Feature activation+5.77e-02

Pos logit contributions
 se+0.005
Ar+0.005
 oct+0.005
 sw+0.005
 pl+0.005
Neg logit contributions
adoes-0.008
outing-0.007
ows-0.007
 fears-0.007
 anymore-0.007
Token play
Token ID711
Feature activation+3.29e-01

Pos logit contributions
Ar+0.001
 se+0.001
 oct+0.001
thank+0.001
 pl+0.001
Neg logit contributions
adoes-0.002
 fears-0.002
ows-0.002
 anymore-0.002
outing-0.002
Token with
Token ID351
Feature activation+3.67e-01

Pos logit contributions
Ar+0.010
 se+0.009
 oct+0.009
 sur+0.008
 tort+0.008
Neg logit contributions
adoes-0.012
 anymore-0.011
ows-0.010
 fears-0.010
iders-0.009
Token the
Token ID262
Feature activation+1.13e-01

Pos logit contributions
 se+0.010
Ar+0.010
 oct+0.010
 sw+0.009
 pl+0.009
Neg logit contributions
adoes-0.014
outing-0.013
ows-0.012
 fears-0.012
cases-0.012
Token tw
Token ID665
Feature activation+2.21e+00

Pos logit contributions
Ar+0.002
thank+0.002
 se+0.001
 cuc+0.001
 oct+0.001
Neg logit contributions
adoes-0.005
 fears-0.005
 anymore-0.005
outing-0.005
 appearance-0.005
Tokenig
Token ID328
Feature activation+2.84e-01

Pos logit contributions
Ar+0.087
 se+0.084
 oct+0.082
 tort+0.080
thank+0.079
Neg logit contributions
adoes-0.055
ows-0.049
 fears-0.048
outing-0.048
 anymore-0.045
Token too
Token ID1165
Feature activation-2.16e-01

Pos logit contributions
 se+0.008
Ar+0.007
 oct+0.007
 sw+0.007
 pl+0.006
Neg logit contributions
adoes-0.011
outing-0.010
ows-0.010
 fears-0.010
 anymore-0.010
Token?"
Token ID1701
Feature activation+2.69e-01

Pos logit contributions
adoes+0.010
outing+0.009
ows+0.009
 fears+0.009
cases+0.008
Neg logit contributions
 se-0.005
Ar-0.004
 oct-0.004
 sw-0.004
 pl-0.004
Token asked
Token ID1965
Feature activation+4.47e-01

Pos logit contributions
Ar+0.008
 se+0.007
thank+0.007
 oct+0.006
 cuc+0.006
Neg logit contributions
adoes-0.010
outing-0.008
 slams-0.008
ows-0.008
cases-0.008
Token Ch
Token ID609
Feature activation+6.58e-01

Pos logit contributions
 se+0.013
Ar+0.013
 oct+0.012
thank+0.011
 sur+0.011
Neg logit contributions
adoes-0.016
outing-0.014
 anymore-0.014
ows-0.014
 fears-0.014
Token't
Token ID470
Feature activation+1.41e-01

Pos logit contributions
Ar+0.014
 se+0.013
 oct+0.013
 sw+0.012
 sur+0.012
Neg logit contributions
adoes-0.013
outing-0.012
 anymore-0.011
ows-0.011
 fears-0.011
Token have
Token ID423
Feature activation+1.56e-01

Pos logit contributions
 se+0.004
Ar+0.003
 oct+0.003
 sw+0.003
 tort+0.003
Neg logit contributions
adoes-0.006
cases-0.006
ows-0.006
outing-0.006
outed-0.005
Token to
Token ID284
Feature activation+2.32e-01

Pos logit contributions
 se+0.003
 sw+0.002
 oct+0.002
 pl+0.002
 fish+0.002
Neg logit contributions
ows-0.008
adoes-0.008
iders-0.008
cases-0.007
outing-0.007
Token keep
Token ID1394
Feature activation-1.76e-02

Pos logit contributions
Ar+0.006
 se+0.005
 oct+0.005
 sw+0.005
 tort+0.005
Neg logit contributions
adoes-0.010
outing-0.009
 anymore-0.009
 fears-0.009
ows-0.008
Token the
Token ID262
Feature activation+4.28e-01

Pos logit contributions
adoes+0.001
outing+0.001
ows+0.001
 fears+0.001
 anymore+0.001
Neg logit contributions
Ar-0.000
 se-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Token tw
Token ID665
Feature activation+2.20e+00

Pos logit contributions
Ar+0.012
 se+0.012
 oct+0.011
 sw+0.010
 pl+0.010
Neg logit contributions
adoes-0.017
outing-0.015
ows-0.015
 fears-0.014
 anymore-0.014
Tokenig
Token ID328
Feature activation+2.63e-01

Pos logit contributions
Ar+0.079
 se+0.077
 oct+0.075
 sw+0.072
 tort+0.071
Neg logit contributions
adoes-0.066
ows-0.058
outing-0.058
 fears-0.057
 anymore-0.055
Token if
Token ID611
Feature activation+6.87e-01

Pos logit contributions
 se+0.006
Ar+0.005
 oct+0.005
 pl+0.005
 sw+0.005
Neg logit contributions
adoes-0.011
outing-0.010
 fears-0.010
outed-0.010
 appearance-0.010
Token you
Token ID345
Feature activation+2.56e-02

Pos logit contributions
 se+0.017
Ar+0.015
 sw+0.015
 oct+0.014
 pl+0.014
Neg logit contributions
adoes-0.029
outing-0.027
ows-0.026
 fears-0.026
cases-0.025
Token don
Token ID836
Feature activation+1.33e-01

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
 fears-0.001
 anymore-0.001
Token't
Token ID470
Feature activation-5.10e-02

Pos logit contributions
Ar+0.005
 se+0.004
 oct+0.004
 sw+0.004
 sur+0.004
Neg logit contributions
adoes-0.004
 anymore-0.004
outing-0.004
ows-0.004
 fears-0.003
Token anything
Token ID1997
Feature activation+2.89e-01

Pos logit contributions
adoes+0.014
cases+0.013
 fears+0.013
outing+0.013
ows+0.013
Neg logit contributions
 se-0.007
 oct-0.006
 pl-0.006
 sw-0.006
Ar-0.006
Token.
Token ID13
Feature activation-2.73e-01

Pos logit contributions
 se+0.008
Ar+0.007
 oct+0.007
 sw+0.007
 pl+0.007
Neg logit contributions
adoes-0.011
outing-0.010
ows-0.010
 fears-0.010
cases-0.010
Token She
Token ID1375
Feature activation-3.98e-01

Pos logit contributions
adoes+0.010
outing+0.009
 fears+0.009
ows+0.009
 anymore+0.009
Neg logit contributions
Ar-0.008
 se-0.008
 oct-0.007
 pl-0.007
 tort-0.007
Token knew
Token ID2993
Feature activation-1.18e-01

Pos logit contributions
adoes+0.014
ows+0.013
outing+0.013
 fears+0.013
outed+0.013
Neg logit contributions
Ar-0.012
 tort-0.010
 oct-0.010
 se-0.010
rob-0.010
Token the
Token ID262
Feature activation+2.64e-01

Pos logit contributions
adoes+0.004
outing+0.004
 anymore+0.004
ows+0.003
 fears+0.003
Neg logit contributions
Ar-0.004
 se-0.004
 oct-0.003
 sw-0.003
 tort-0.003
Token te
Token ID573
Feature activation+2.19e+00

Pos logit contributions
Ar+0.008
 se+0.007
 oct+0.007
 sur+0.007
 sw+0.006
Neg logit contributions
adoes-0.009
 anymore-0.009
outing-0.009
 fears-0.008
ows-0.008
Tokenap
Token ID499
Feature activation+2.49e-01

Pos logit contributions
 se+0.069
Ar+0.067
 oct+0.065
 sw+0.062
 pl+0.062
Neg logit contributions
adoes-0.074
outing-0.064
cases-0.064
ows-0.062
 anymore-0.062
Tokenot
Token ID313
Feature activation+8.62e-02

Pos logit contributions
 se+0.007
 oct+0.007
 sw+0.007
 pl+0.006
Ar+0.006
Neg logit contributions
adoes-0.009
outing-0.008
 fears-0.008
cases-0.008
 slams-0.008
Token and
Token ID290
Feature activation-5.16e-02

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.002
 sw+0.002
 pl+0.002
Neg logit contributions
adoes-0.003
outing-0.003
ows-0.003
 fears-0.003
 anymore-0.003
Token the
Token ID262
Feature activation-6.17e-02

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
 fears+0.002
outed+0.002
Neg logit contributions
Ar-0.001
 oct-0.001
 tort-0.001
 se-0.001
rob-0.001
Token cups
Token ID14180
Feature activation+3.93e-01

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
 fears+0.002
 anymore+0.002
Neg logit contributions
Ar-0.002
 se-0.002
 oct-0.001
 sur-0.001
 sw-0.001
Token closer
Token ID5699
Feature activation+2.10e-01

Pos logit contributions
adoes+0.000
outing+0.000
ows+0.000
 anymore+0.000
 fears+0.000
Neg logit contributions
 se-0.000
Ar-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Token look
Token ID804
Feature activation+5.78e-01

Pos logit contributions
 se+0.007
Ar+0.007
 oct+0.007
 sw+0.007
 pl+0.007
Neg logit contributions
adoes-0.007
outing-0.006
 fears-0.006
outed-0.006
ows-0.006
Token.
Token ID13
Feature activation+1.66e-01

Pos logit contributions
Ar+0.017
 se+0.016
 oct+0.015
 tort+0.014
 sur+0.014
Neg logit contributions
adoes-0.022
outing-0.020
ows-0.020
 anymore-0.019
 fears-0.019
Token Under
Token ID4698
Feature activation+8.51e-01

Pos logit contributions
 se+0.005
Ar+0.004
 pl+0.004
 tort+0.004
 oct+0.004
Neg logit contributions
adoes-0.006
outing-0.006
 fears-0.005
 anymore-0.005
ows-0.005
Token one
Token ID530
Feature activation+1.00e+00

Pos logit contributions
Ar+0.022
 se+0.022
 oct+0.020
 sw+0.019
 tort+0.019
Neg logit contributions
adoes-0.034
outing-0.031
ows-0.030
 fears-0.030
 anymore-0.030
Token of
Token ID286
Feature activation+2.18e+00

Pos logit contributions
 se+0.023
Ar+0.023
 oct+0.021
 sw+0.019
 pl+0.019
Neg logit contributions
adoes-0.043
outing-0.040
ows-0.039
 fears-0.039
 anymore-0.038
Token the
Token ID262
Feature activation+6.39e-02

Pos logit contributions
Ar+0.053
 se+0.051
 oct+0.047
 sur+0.044
 tort+0.044
Neg logit contributions
adoes-0.092
outing-0.082
 anymore-0.082
ows-0.080
 fears-0.080
Token trees
Token ID7150
Feature activation+7.58e-01

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.002
 sw+0.002
 pl+0.002
Neg logit contributions
adoes-0.002
outing-0.002
ows-0.002
 fears-0.002
 anymore-0.002
Token,
Token ID11
Feature activation+2.50e-01

Pos logit contributions
 se+0.019
Ar+0.018
 oct+0.017
 sw+0.016
 pl+0.016
Neg logit contributions
adoes-0.032
outing-0.029
 fears-0.028
ows-0.028
 anymore-0.027
Token she
Token ID673
Feature activation-1.23e-01

Pos logit contributions
 se+0.006
 sw+0.005
 oct+0.005
 pl+0.005
 tw+0.005
Neg logit contributions
adoes-0.011
outing-0.010
 fears-0.010
ows-0.010
 appearance-0.010
Token saw
Token ID2497
Feature activation+1.42e-01

Pos logit contributions
adoes+0.004
outing+0.004
ows+0.004
 fears+0.004
 anymore+0.004
Neg logit contributions
Ar-0.003
 se-0.003
 oct-0.003
 tort-0.003
thank-0.003
Token the
Token ID262
Feature activation-1.37e-02

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.001
 sw+0.001
 tort+0.001
Neg logit contributions
adoes-0.003
outing-0.003
 anymore-0.002
ows-0.002
 fears-0.002
Token zero
Token ID6632
Feature activation+7.60e-01

Pos logit contributions
adoes+0.000
 anymore+0.000
outing+0.000
 fears+0.000
ows+0.000
Neg logit contributions
Ar-0.000
 se-0.000
 oct-0.000
 sw-0.000
 Bub-0.000
Token.
Token ID13
Feature activation-2.91e-01

Pos logit contributions
Ar+0.019
 se+0.019
 oct+0.018
 sw+0.017
 tort+0.016
Neg logit contributions
adoes-0.031
outing-0.028
ows-0.028
 anymore-0.027
 fears-0.027
Token The
Token ID383
Feature activation-2.48e-02

Pos logit contributions
adoes+0.008
ows+0.006
outing+0.006
 anymore+0.006
 fears+0.006
Neg logit contributions
 se-0.011
 oct-0.011
Ar-0.011
 pl-0.010
thank-0.010
Token one
Token ID530
Feature activation+1.02e+00

Pos logit contributions
adoes+0.001
 anymore+0.001
 fears+0.001
outing+0.001
ows+0.001
Neg logit contributions
Ar-0.001
 se-0.001
 oct-0.001
 sw-0.001
 Bub-0.001
Token with
Token ID351
Feature activation+2.18e+00

Pos logit contributions
Ar+0.033
 se+0.033
 oct+0.031
 sw+0.029
 tort+0.029
Neg logit contributions
adoes-0.033
outing-0.030
ows-0.029
 fears-0.029
 anymore-0.029
Token more
Token ID517
Feature activation+9.71e-01

Pos logit contributions
Ar+0.053
 se+0.053
 oct+0.049
 sw+0.046
 tort+0.046
Neg logit contributions
adoes-0.089
outing-0.081
ows-0.079
 fears-0.079
 anymore-0.078
Token lines
Token ID3951
Feature activation+3.58e-01

Pos logit contributions
 se+0.029
Ar+0.029
 oct+0.027
 sw+0.026
 pl+0.025
Neg logit contributions
adoes-0.035
outing-0.031
ows-0.031
 fears-0.030
 anymore-0.030
Token wins
Token ID7864
Feature activation+1.15e-01

Pos logit contributions
 se+0.009
Ar+0.009
 oct+0.008
 sw+0.008
 pl+0.008
Neg logit contributions
adoes-0.015
outing-0.013
ows-0.013
 fears-0.013
 anymore-0.013
Token."
Token ID526
Feature activation+7.93e-02

Pos logit contributions
 se+0.003
Ar+0.003
 oct+0.003
 sw+0.002
 tort+0.002
Neg logit contributions
adoes-0.005
outing-0.004
ows-0.004
 anymore-0.004
 fears-0.004
Token
Token ID198
Feature activation+2.28e-01

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.002
 sw+0.002
 tort+0.002
Neg logit contributions
adoes-0.003
 anymore-0.003
ows-0.003
outing-0.003
 fears-0.003
Token,
Token ID11
Feature activation-3.08e-01

Pos logit contributions
 se+0.007
Ar+0.006
 oct+0.006
 sw+0.006
 pl+0.006
Neg logit contributions
adoes-0.013
outing-0.012
ows-0.011
 fears-0.011
cases-0.011
Token I
Token ID314
Feature activation+1.51e-01

Pos logit contributions
adoes+0.012
outing+0.012
cases+0.012
ows+0.012
outed+0.011
Neg logit contributions
 se-0.009
 sw-0.008
 oct-0.008
 pl-0.007
 show-0.007
Token gave
Token ID2921
Feature activation+4.34e-01

Pos logit contributions
Ar+0.005
 se+0.004
 tort+0.004
 oct+0.004
rob+0.004
Neg logit contributions
adoes-0.005
 fears-0.005
 anymore-0.005
outing-0.005
ows-0.005
Token you
Token ID345
Feature activation+2.51e-02

Pos logit contributions
 se+0.009
 oct+0.008
 sw+0.008
 pl+0.008
Ar+0.007
Neg logit contributions
adoes-0.020
ows-0.019
cases-0.019
outing-0.019
iders-0.018
Token the
Token ID262
Feature activation+3.21e-01

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.000
 sw+0.000
 pl+0.000
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
 fears-0.001
cases-0.001
Token tw
Token ID665
Feature activation+2.18e+00

Pos logit contributions
Ar+0.006
thank+0.006
rob+0.006
 se+0.005
 oct+0.005
Neg logit contributions
 fears-0.013
adoes-0.013
 anymore-0.012
 appearance-0.012
outing-0.012
Tokenig
Token ID328
Feature activation+1.64e-01

Pos logit contributions
Ar+0.089
 se+0.086
 oct+0.084
 tort+0.083
thank+0.082
Neg logit contributions
adoes-0.051
ows-0.047
 fears-0.046
outing-0.045
iders-0.041
Token as
Token ID355
Feature activation+6.23e-01

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.007
outing-0.006
ows-0.006
 fears-0.006
 anymore-0.006
Token a
Token ID257
Feature activation-2.72e-02

Pos logit contributions
Ar+0.016
 se+0.016
 tort+0.014
 oct+0.014
 sur+0.014
Neg logit contributions
adoes-0.025
outing-0.022
ows-0.022
 fears-0.022
 anymore-0.022
Token gift
Token ID6979
Feature activation-2.79e-01

Pos logit contributions
 fears+0.001
adoes+0.001
 anymore+0.001
 appearance+0.001
ows+0.001
Neg logit contributions
rob-0.001
Ar-0.001
 se-0.001
thank-0.001
 oct-0.001
Token!"
Token ID2474
Feature activation+2.24e-01

Pos logit contributions
adoes+0.010
ows+0.010
 anymore+0.010
outing+0.009
cases+0.009
Neg logit contributions
Ar-0.008
 se-0.008
 oct-0.007
 sur-0.007
 sw-0.007
Token
Token ID198
Feature activation+2.14e-01

Pos logit contributions
 se+0.019
 oct+0.018
 tort+0.018
 cuc+0.018
 sw+0.017
Neg logit contributions
adoes-0.014
 fears-0.014
ows-0.014
 anymore-0.014
outing-0.013
TokenThe
Token ID464
Feature activation+6.50e-02

Pos logit contributions
 oct+0.006
 se+0.006
 sw+0.006
 tort+0.006
 cuc+0.006
Neg logit contributions
adoes-0.007
outing-0.006
 fears-0.006
iders-0.006
 anymore-0.006
Token End
Token ID5268
Feature activation+6.95e-01

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 sw+0.002
 sur+0.002
Neg logit contributions
adoes-0.002
 fears-0.002
outing-0.002
 anymore-0.002
ows-0.002
Token.
Token ID13
Feature activation+4.10e-01

Pos logit contributions
 se+0.019
Ar+0.019
 oct+0.018
 tort+0.018
 sw+0.017
Neg logit contributions
adoes-0.025
 anymore-0.024
ows-0.023
outing-0.022
 fears-0.022
Token<|endoftext|>
Token ID50256
Feature activation+4.65e-01

Pos logit contributions
 se+0.012
 sw+0.011
 oct+0.011
 pl+0.011
Ar+0.010
Neg logit contributions
adoes-0.014
 anymore-0.013
 fears-0.012
ows-0.012
 911-0.012
TokenWith
Token ID3152
Feature activation+2.18e+00

Pos logit contributions
 se+0.013
 oct+0.012
Ar+0.012
 tort+0.012
 pl+0.012
Neg logit contributions
adoes-0.016
 anymore-0.015
ows-0.015
outing-0.014
uffy-0.014
Token his
Token ID465
Feature activation-5.15e-02

Pos logit contributions
Ar+0.060
 se+0.055
 sur+0.054
 tort+0.054
 oct+0.051
Neg logit contributions
adoes-0.077
 fears-0.069
 anymore-0.067
outing-0.066
eting-0.063
Token fingers
Token ID9353
Feature activation+1.06e+00

Pos logit contributions
adoes+0.002
 fears+0.002
outing+0.002
 anymore+0.001
ows+0.001
Neg logit contributions
Ar-0.002
 se-0.001
thank-0.001
 tort-0.001
 oct-0.001
Token tightly
Token ID17707
Feature activation+1.01e+00

Pos logit contributions
 se+0.025
 sw+0.022
 oct+0.022
 pl+0.021
Ar+0.020
Neg logit contributions
adoes-0.046
iders-0.042
outing-0.042
ows-0.041
 fears-0.040
Token gripping
Token ID39931
Feature activation+4.94e-01

Pos logit contributions
 se+0.024
Ar+0.023
 oct+0.022
 sw+0.021
 pl+0.021
Neg logit contributions
adoes-0.042
outing-0.039
ows-0.038
 fears-0.037
 anymore-0.037
Token the
Token ID262
Feature activation+5.33e-01

Pos logit contributions
 se+0.010
Ar+0.009
 oct+0.009
 sw+0.008
 pl+0.008
Neg logit contributions
adoes-0.023
outing-0.022
ows-0.021
 fears-0.020
 anymore-0.020
Token she
Token ID673
Feature activation-7.36e-02

Pos logit contributions
adoes+0.013
ows+0.012
 fears+0.012
outing+0.012
cases+0.011
Neg logit contributions
 se-0.008
Ar-0.007
 oct-0.007
 pl-0.007
 sw-0.007
Token drew
Token ID9859
Feature activation+2.68e-01

Pos logit contributions
adoes+0.003
outing+0.002
 fears+0.002
outed+0.002
 anymore+0.002
Neg logit contributions
Ar-0.002
 oct-0.002
 se-0.002
 tort-0.002
 sur-0.002
Token a
Token ID257
Feature activation+4.41e-01

Pos logit contributions
 se+0.005
Ar+0.005
 oct+0.005
 sw+0.005
 pl+0.004
Neg logit contributions
adoes-0.012
outing-0.011
ows-0.011
 fears-0.011
 anymore-0.011
Token big
Token ID1263
Feature activation-4.78e-02

Pos logit contributions
Ar+0.014
 se+0.013
 oct+0.012
thank+0.012
rob+0.012
Neg logit contributions
adoes-0.014
outing-0.013
 fears-0.013
 anymore-0.013
ows-0.012
Token sun
Token ID4252
Feature activation-6.92e-01

Pos logit contributions
adoes+0.002
 anymore+0.001
 fears+0.001
outing+0.001
ows+0.001
Neg logit contributions
Ar-0.002
 se-0.001
 tort-0.001
thank-0.001
 oct-0.001
Token with
Token ID351
Feature activation+2.17e+00

Pos logit contributions
adoes+0.028
 anymore+0.025
outing+0.025
ows+0.024
 fears+0.023
Neg logit contributions
Ar-0.019
 se-0.017
 oct-0.016
 sur-0.015
 tort-0.015
Token a
Token ID257
Feature activation+4.31e-01

Pos logit contributions
Ar+0.058
 se+0.053
 tort+0.051
 sur+0.051
 oct+0.051
Neg logit contributions
adoes-0.083
outing-0.074
 fears-0.073
 anymore-0.073
ows-0.071
Token smile
Token ID8212
Feature activation+8.30e-02

Pos logit contributions
Ar+0.013
 se+0.012
 oct+0.011
thank+0.011
 tort+0.011
Neg logit contributions
adoes-0.015
outing-0.014
 fears-0.013
 anymore-0.013
ows-0.013
Tokeny
Token ID88
Feature activation-1.11e+00

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 sw+0.002
 tort+0.002
Neg logit contributions
adoes-0.004
outing-0.003
ows-0.003
 anymore-0.003
 fears-0.003
Token face
Token ID1986
Feature activation+1.38e-01

Pos logit contributions
adoes+0.036
outing+0.031
ows+0.031
 fears+0.030
 anymore+0.030
Neg logit contributions
Ar-0.037
 se-0.036
 oct-0.034
 sur-0.032
 tort-0.032
Token.
Token ID13
Feature activation+7.35e-02

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.003
 sur+0.003
 tort+0.003
Neg logit contributions
adoes-0.005
 anymore-0.005
outing-0.005
ows-0.005
cases-0.004
Token soon
Token ID2582
Feature activation+2.63e-01

Pos logit contributions
Ar+0.032
 se+0.031
 oct+0.029
 tort+0.028
 sw+0.027
Neg logit contributions
adoes-0.044
 anymore-0.040
outing-0.040
ows-0.039
 fears-0.039
Token as
Token ID355
Feature activation+1.16e+00

Pos logit contributions
Ar+0.008
 se+0.008
 oct+0.008
 sw+0.007
 tort+0.007
Neg logit contributions
adoes-0.009
outing-0.008
 anymore-0.008
ows-0.007
 fears-0.007
Token she
Token ID673
Feature activation+3.38e-01

Pos logit contributions
Ar+0.038
 se+0.037
 oct+0.036
 tort+0.035
 sw+0.034
Neg logit contributions
adoes-0.036
 anymore-0.033
outing-0.030
 fears-0.030
cases-0.029
Token stepped
Token ID10764
Feature activation+8.69e-01

Pos logit contributions
Ar+0.011
 cuc+0.010
 oct+0.010
rob+0.010
 tort+0.010
Neg logit contributions
adoes-0.010
outing-0.009
outed-0.009
 fears-0.009
ows-0.009
Token outside
Token ID2354
Feature activation+1.09e+00

Pos logit contributions
Ar+0.027
 se+0.027
 oct+0.025
 sw+0.024
 tort+0.023
Neg logit contributions
adoes-0.030
outing-0.027
 anymore-0.027
ows-0.026
 fears-0.026
Token with
Token ID351
Feature activation+2.17e+00

Pos logit contributions
 se+0.026
Ar+0.024
 oct+0.023
 sw+0.022
 pl+0.022
Neg logit contributions
adoes-0.046
outing-0.043
ows-0.042
 fears-0.041
cases-0.040
Token the
Token ID262
Feature activation-3.83e-02

Pos logit contributions
 se+0.050
Ar+0.050
 oct+0.046
 sw+0.043
 pl+0.042
Neg logit contributions
adoes-0.093
outing-0.085
ows-0.084
 fears-0.083
 anymore-0.082
Token egg
Token ID5935
Feature activation+8.36e-01

Pos logit contributions
adoes+0.001
outing+0.001
 fears+0.001
ows+0.001
 anymore+0.001
Neg logit contributions
Ar-0.001
 se-0.001
 sur-0.001
 oct-0.001
thank-0.001
Token,
Token ID11
Feature activation+5.19e-01

Pos logit contributions
 se+0.019
 oct+0.016
 sw+0.016
 pl+0.016
Ar+0.016
Neg logit contributions
adoes-0.037
outing-0.034
 fears-0.033
ows-0.033
 anymore-0.032
Token she
Token ID673
Feature activation-1.19e-01

Pos logit contributions
 se+0.011
 oct+0.009
 pl+0.009
 sw+0.009
Ar+0.009
Neg logit contributions
adoes-0.024
outing-0.023
ows-0.023
 fears-0.022
inters-0.021
Token heard
Token ID2982
Feature activation+2.62e-01

Pos logit contributions
adoes+0.005
outing+0.004
ows+0.004
 fears+0.004
 anymore+0.004
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.003
 pl-0.003
Tokenily
Token ID813
Feature activation+2.16e-03

Pos logit contributions
Ar+0.022
 se+0.022
 tort+0.021
thank+0.020
 oct+0.020
Neg logit contributions
adoes-0.023
ows-0.022
outing-0.022
 anymore-0.021
 fears-0.020
Token was
Token ID373
Feature activation+1.29e-01

Pos logit contributions
Ar+0.000
 se+0.000
 oct+0.000
 tort+0.000
 cuc+0.000
Neg logit contributions
adoes-0.000
ows-0.000
 fears-0.000
 anymore-0.000
 slams-0.000
Token a
Token ID257
Feature activation-5.80e-02

Pos logit contributions
Ar+0.004
 se+0.003
 oct+0.003
 tort+0.003
 sw+0.003
Neg logit contributions
adoes-0.005
outing-0.004
ows-0.004
 fears-0.004
 anymore-0.004
Token big
Token ID1263
Feature activation-1.34e-01

Pos logit contributions
adoes+0.002
outing+0.002
 fears+0.002
 anymore+0.002
ows+0.002
Neg logit contributions
Ar-0.002
 se-0.002
 oct-0.002
rob-0.001
thank-0.001
Token girl
Token ID2576
Feature activation-9.74e-02

Pos logit contributions
adoes+0.005
outing+0.004
ows+0.004
 fears+0.004
 anymore+0.004
Neg logit contributions
 se-0.004
Ar-0.004
 oct-0.004
 sw-0.003
 pl-0.003
Token now
Token ID783
Feature activation+2.15e+00

Pos logit contributions
 anymore+0.003
adoes+0.003
outing+0.003
ows+0.002
 fears+0.002
Neg logit contributions
Ar-0.004
 se-0.003
 oct-0.003
rob-0.003
 tort-0.003
Token.
Token ID13
Feature activation-1.84e-01

Pos logit contributions
Ar+0.064
 se+0.060
 oct+0.058
 sw+0.054
 Bub+0.054
Neg logit contributions
adoes-0.077
 anymore-0.071
outing-0.070
 fears-0.068
ows-0.067
Token She
Token ID1375
Feature activation-1.91e-01

Pos logit contributions
adoes+0.006
outing+0.006
cases+0.005
 fears+0.005
ows+0.005
Neg logit contributions
 se-0.006
Ar-0.006
 oct-0.006
 Bub-0.005
 tort-0.005
Token had
Token ID550
Feature activation+9.62e-01

Pos logit contributions
adoes+0.006
ows+0.006
 anymore+0.006
outing+0.006
 fears+0.006
Neg logit contributions
Ar-0.006
 se-0.005
 oct-0.005
 tort-0.005
 cuc-0.005
Token a
Token ID257
Feature activation+2.61e-01

Pos logit contributions
 se+0.021
Ar+0.021
 oct+0.019
 sw+0.018
 pl+0.017
Neg logit contributions
adoes-0.042
outing-0.039
ows-0.038
 fears-0.038
 anymore-0.037
Token new
Token ID649
Feature activation-6.50e-02

Pos logit contributions
Ar+0.007
 se+0.007
 oct+0.006
 sw+0.006
 pl+0.006
Neg logit contributions
adoes-0.010
outing-0.009
ows-0.009
 fears-0.009
 anymore-0.009
Token pe
Token ID613
Feature activation+9.47e-01

Pos logit contributions
adoes+0.001
 anymore+0.001
outing+0.001
 fears+0.001
ows+0.001
Neg logit contributions
Ar-0.001
 se-0.001
 oct-0.001
 sw-0.001
 tort-0.001
Tokenaches
Token ID3694
Feature activation-1.17e-01

Pos logit contributions
Ar+0.035
 se+0.034
 oct+0.033
 tort+0.031
 sw+0.031
Neg logit contributions
adoes-0.027
outing-0.025
ows-0.024
 fears-0.023
 anymore-0.022
Token on
Token ID319
Feature activation-2.25e-01

Pos logit contributions
adoes+0.004
outing+0.004
ows+0.004
 anymore+0.004
 fears+0.004
Neg logit contributions
Ar-0.003
 se-0.003
 oct-0.003
 sw-0.003
 tort-0.003
Token Mondays
Token ID49748
Feature activation+2.25e-02

Pos logit contributions
adoes+0.010
outing+0.009
ows+0.009
 fears+0.009
 anymore+0.008
Neg logit contributions
 se-0.005
Ar-0.005
 oct-0.005
 sw-0.004
 pl-0.004
Token,
Token ID11
Feature activation-8.52e-03

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.000
 sw+0.000
 pl+0.000
Neg logit contributions
adoes-0.001
outing-0.001
 fears-0.001
ows-0.001
 anymore-0.001
Token Wed
Token ID3300
Feature activation-2.83e+00

Pos logit contributions
adoes+0.000
outing+0.000
ows+0.000
 fears+0.000
 anymore+0.000
Neg logit contributions
Ar-0.000
 se-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Tokennes
Token ID2516
Feature activation+6.91e-01

Pos logit contributions
adoes+0.086
outing+0.077
ows+0.076
 anymore+0.073
 fears+0.072
Neg logit contributions
 se-0.099
Ar-0.097
 oct-0.094
 sw-0.090
 pl-0.088
Tokendays
Token ID12545
Feature activation-6.85e-01

Pos logit contributions
Ar+0.025
 se+0.025
 oct+0.023
 tort+0.023
 sw+0.022
Neg logit contributions
adoes-0.020
outing-0.018
ows-0.018
 anymore-0.018
 fears-0.017
Token,
Token ID11
Feature activation-1.38e-01

Pos logit contributions
adoes+0.031
outing+0.029
 fears+0.029
outed+0.027
 slams+0.027
Neg logit contributions
 se-0.013
Ar-0.012
 pl-0.011
thank-0.011
 oct-0.011
Token and
Token ID290
Feature activation+1.47e-01

Pos logit contributions
adoes+0.006
outing+0.005
ows+0.005
 fears+0.005
 anymore+0.005
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.003
 pl-0.003
Token Fridays
Token ID45011
Feature activation+5.53e-01

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.004
 sw+0.004
 tort+0.004
Neg logit contributions
adoes-0.005
outing-0.005
ows-0.005
 fears-0.005
 anymore-0.005
Token did
Token ID750
Feature activation-1.45e-01

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 sw+0.002
 tort+0.002
Neg logit contributions
adoes-0.003
outing-0.003
ows-0.003
 fears-0.003
 anymore-0.003
Token something
Token ID1223
Feature activation+1.18e+00

Pos logit contributions
adoes+0.005
outing+0.005
 fears+0.005
ows+0.005
outed+0.005
Neg logit contributions
 se-0.004
 pl-0.004
Ar-0.004
 show-0.004
 sw-0.004
Token special
Token ID2041
Feature activation+7.34e-02

Pos logit contributions
 se+0.025
Ar+0.025
 oct+0.023
 sw+0.021
 pl+0.021
Neg logit contributions
adoes-0.052
outing-0.048
ows-0.047
 fears-0.046
 anymore-0.046
Token.
Token ID13
Feature activation+4.85e-02

Pos logit contributions
Ar+0.003
 se+0.002
 oct+0.002
 tort+0.002
 sur+0.002
Neg logit contributions
 anymore-0.002
adoes-0.002
ows-0.002
outing-0.002
cases-0.002
Token On
Token ID1550
Feature activation+1.52e+00

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.002
outing-0.002
ows-0.002
 fears-0.002
 anymore-0.002
Token Wed
Token ID3300
Feature activation-2.33e+00

Pos logit contributions
Ar+0.039
 se+0.039
 oct+0.036
 sw+0.034
 tort+0.034
Neg logit contributions
adoes-0.060
outing-0.054
ows-0.053
 anymore-0.053
 fears-0.052
Tokennes
Token ID2516
Feature activation+1.25e+00

Pos logit contributions
adoes+0.061
outing+0.057
ows+0.056
 anymore+0.053
cases+0.052
Neg logit contributions
 se-0.086
 oct-0.085
Ar-0.083
 tort-0.080
 sw-0.079
Tokendays
Token ID12545
Feature activation+5.29e-01

Pos logit contributions
Ar+0.045
 se+0.044
 oct+0.042
 tort+0.041
 sw+0.039
Neg logit contributions
adoes-0.036
outing-0.033
ows-0.032
 fears-0.032
 anymore-0.032
Token she
Token ID673
Feature activation+1.92e-01

Pos logit contributions
 se+0.012
Ar+0.011
 oct+0.011
 sw+0.010
 pl+0.010
Neg logit contributions
adoes-0.023
outing-0.021
 fears-0.021
ows-0.021
 anymore-0.020
Token liked
Token ID8288
Feature activation+2.97e-01

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.005
 tort+0.005
 sw+0.004
Neg logit contributions
adoes-0.007
outing-0.006
 fears-0.006
ows-0.006
 anymore-0.006
Token to
Token ID284
Feature activation-3.04e-01

Pos logit contributions
 se+0.008
Ar+0.007
 oct+0.007
 sw+0.007
 pl+0.006
Neg logit contributions
adoes-0.012
outing-0.011
ows-0.011
 fears-0.011
 anymore-0.010
Token<|endoftext|>
Token ID50256
Feature activation-3.07e-01

Pos logit contributions
adoes+0.004
outing+0.004
ows+0.004
 anymore+0.004
 fears+0.004
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.002
 pl-0.002
 sw-0.002
TokenAnna
Token ID31160
Feature activation+1.95e-03

Pos logit contributions
adoes+0.010
 anymore+0.010
ows+0.010
outing+0.009
uffy+0.009
Neg logit contributions
 se-0.009
Ar-0.008
 tort-0.008
 oct-0.008
thank-0.008
Token and
Token ID290
Feature activation+3.07e-02

Pos logit contributions
 se+0.000
Ar+0.000
 oct+0.000
 tort+0.000
 sur+0.000
Neg logit contributions
adoes-0.000
ows-0.000
 anymore-0.000
 fears-0.000
outing-0.000
Token Ben
Token ID3932
Feature activation+9.01e-02

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
cases-0.001
 anymore-0.001
Token were
Token ID547
Feature activation+5.41e-02

Pos logit contributions
 se+0.003
Ar+0.003
 oct+0.003
 sw+0.002
 pl+0.002
Neg logit contributions
adoes-0.003
outing-0.003
ows-0.003
 fears-0.003
 anymore-0.003
Token best
Token ID1266
Feature activation-2.26e+00

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.002
outing-0.002
ows-0.002
 fears-0.002
 anymore-0.002
Token friends
Token ID2460
Feature activation-8.84e-01

Pos logit contributions
adoes+0.096
 fears+0.089
outing+0.089
 anymore+0.088
ows+0.087
Neg logit contributions
Ar-0.052
 se-0.047
 oct-0.045
 tort-0.044
thank-0.043
Token.
Token ID13
Feature activation-5.71e-02

Pos logit contributions
adoes+0.036
outing+0.032
ows+0.032
 anymore+0.032
 fears+0.031
Neg logit contributions
Ar-0.022
 se-0.022
 oct-0.021
 sw-0.019
 tort-0.019
Token They
Token ID1119
Feature activation+3.41e-01

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
 fears+0.002
cases+0.002
Neg logit contributions
Ar-0.002
 se-0.002
 oct-0.002
 sw-0.002
 pl-0.002
Token liked
Token ID8288
Feature activation-1.20e-01

Pos logit contributions
Ar+0.011
 se+0.010
 oct+0.010
 tort+0.010
 sur+0.010
Neg logit contributions
adoes-0.011
ows-0.010
 anymore-0.010
outing-0.010
 fears-0.010
Token to
Token ID284
Feature activation-4.59e-02

Pos logit contributions
adoes+0.005
outing+0.004
ows+0.004
 fears+0.004
 anymore+0.004
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.003
 pl-0.003
Token upon
Token ID2402
Feature activation-6.86e-01

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
 fears+0.002
iders+0.002
Neg logit contributions
 se-0.002
 oct-0.002
Ar-0.002
 sw-0.002
 Bub-0.002
Token a
Token ID257
Feature activation+1.60e-01

Pos logit contributions
adoes+0.030
outing+0.028
ows+0.027
 fears+0.027
 anymore+0.027
Neg logit contributions
 se-0.014
Ar-0.014
 oct-0.013
 sw-0.012
 tort-0.012
Token time
Token ID640
Feature activation-6.19e-01

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.003
 sw+0.003
 pl+0.003
Neg logit contributions
adoes-0.007
outing-0.006
cases-0.006
ows-0.006
 fears-0.006
Token,
Token ID11
Feature activation-7.30e-01

Pos logit contributions
adoes+0.030
outing+0.027
 fears+0.027
ows+0.027
inters+0.026
Neg logit contributions
 se-0.011
 oct-0.011
Ar-0.010
 sw-0.010
 Bub-0.010
Token two
Token ID734
Feature activation-9.97e-02

Pos logit contributions
adoes+0.029
outing+0.026
 fears+0.026
ows+0.026
cases+0.025
Neg logit contributions
 se-0.019
Ar-0.019
 sw-0.018
 oct-0.018
 Bub-0.017
Token best
Token ID1266
Feature activation-2.25e+00

Pos logit contributions
adoes+0.005
ows+0.005
outing+0.005
iders+0.005
cases+0.005
Neg logit contributions
 se-0.002
 oct-0.001
 sw-0.001
Ar-0.001
 pl-0.001
Token friends
Token ID2460
Feature activation-5.71e-01

Pos logit contributions
adoes+0.116
outing+0.107
ows+0.107
cases+0.105
 anymore+0.103
Neg logit contributions
 se-0.035
Ar-0.032
 oct-0.030
 sw-0.028
 pl-0.025
Token were
Token ID547
Feature activation+5.81e-01

Pos logit contributions
adoes+0.024
outing+0.022
ows+0.022
 fears+0.022
cases+0.021
Neg logit contributions
 se-0.014
Ar-0.013
 oct-0.013
 sw-0.012
 pl-0.011
Token discussing
Token ID11142
Feature activation-5.41e-02

Pos logit contributions
Ar+0.018
 se+0.017
 oct+0.016
thank+0.015
 tort+0.015
Neg logit contributions
adoes-0.020
outing-0.018
 anymore-0.018
ows-0.018
 fears-0.017
Token their
Token ID511
Feature activation+1.47e-01

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
 fears+0.002
cases+0.002
Neg logit contributions
 se-0.001
 oct-0.001
 sw-0.001
 pl-0.001
Ar-0.001
Token plans
Token ID3352
Feature activation+4.62e-02

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.005
outing-0.005
ows-0.005
 anymore-0.005
 fears-0.005
Token the
Token ID262
Feature activation-1.29e-01

Pos logit contributions
Ar+0.002
rob+0.002
 tort+0.002
 se+0.002
 sur+0.002
Neg logit contributions
adoes-0.002
 anymore-0.002
outing-0.002
 fears-0.002
 appearance-0.002
Token sunlight
Token ID19606
Feature activation+1.41e-02

Pos logit contributions
adoes+0.005
 fears+0.004
 anymore+0.004
outing+0.004
ows+0.004
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
thank-0.003
rob-0.003
Token.
Token ID13
Feature activation+2.87e-01

Pos logit contributions
Ar+0.000
 se+0.000
 oct+0.000
 tort+0.000
 sw+0.000
Neg logit contributions
adoes-0.001
ows-0.000
outing-0.000
 anymore-0.000
cases-0.000
Token But
Token ID887
Feature activation+5.00e-01

Pos logit contributions
 se+0.009
 oct+0.008
Ar+0.008
 pl+0.008
 cuc+0.008
Neg logit contributions
adoes-0.010
outing-0.009
 fears-0.009
ows-0.009
 anymore-0.009
Token,
Token ID11
Feature activation-3.92e-01

Pos logit contributions
 se+0.012
Ar+0.011
 oct+0.011
 sw+0.010
 pl+0.010
Neg logit contributions
adoes-0.021
outing-0.019
ows-0.019
 fears-0.019
 anymore-0.018
Token best
Token ID1266
Feature activation-2.23e+00

Pos logit contributions
adoes+0.016
outing+0.015
ows+0.014
 fears+0.014
cases+0.014
Neg logit contributions
 se-0.010
 oct-0.009
Ar-0.009
 sw-0.009
 pl-0.009
Token of
Token ID286
Feature activation-9.59e-01

Pos logit contributions
adoes+0.097
outing+0.090
ows+0.089
cases+0.086
 fears+0.085
Neg logit contributions
 se-0.051
Ar-0.049
 oct-0.047
 sw-0.044
 pl-0.042
Token all
Token ID477
Feature activation-5.88e-01

Pos logit contributions
adoes+0.035
outing+0.031
 anymore+0.030
 fears+0.030
ows+0.030
Neg logit contributions
Ar-0.029
 se-0.028
 oct-0.026
 tort-0.026
 pl-0.025
Token,
Token ID11
Feature activation+2.78e-01

Pos logit contributions
adoes+0.028
outing+0.026
ows+0.025
cases+0.025
iders+0.025
Neg logit contributions
 se-0.011
 oct-0.010
 sw-0.010
 pl-0.009
 tw-0.009
Token there
Token ID612
Feature activation-6.00e-01

Pos logit contributions
 se+0.007
 oct+0.007
Ar+0.006
 sw+0.006
 pl+0.006
Neg logit contributions
adoes-0.011
outing-0.010
ows-0.010
cases-0.010
 fears-0.010
Token was
Token ID373
Feature activation+3.16e-01

Pos logit contributions
adoes+0.033
outing+0.031
inters+0.030
cases+0.030
ows+0.029
Neg logit contributions
 se-0.007
 sw-0.007
 pl-0.006
 tw-0.006
 cl-0.005
TokenL
Token ID43
Feature activation+5.75e-01

Pos logit contributions
 anymore+0.012
adoes+0.011
ows+0.011
uffy+0.010
outing+0.010
Neg logit contributions
 se-0.010
Ar-0.010
 tort-0.010
 oct-0.010
 sur-0.009
Tokenily
Token ID813
Feature activation-9.41e-02

Pos logit contributions
Ar+0.017
 se+0.017
 oct+0.016
 tort+0.015
 sw+0.015
Neg logit contributions
adoes-0.021
outing-0.019
ows-0.019
 fears-0.018
 anymore-0.018
Token and
Token ID290
Feature activation-9.03e-03

Pos logit contributions
adoes+0.003
ows+0.003
outing+0.003
 fears+0.003
 anymore+0.003
Neg logit contributions
Ar-0.003
 se-0.003
 oct-0.003
 tort-0.003
 pl-0.003
Token Ben
Token ID3932
Feature activation+1.70e-01

Pos logit contributions
adoes+0.000
outing+0.000
ows+0.000
cases+0.000
 fears+0.000
Neg logit contributions
 se-0.000
Ar-0.000
 sw-0.000
 oct-0.000
 pl-0.000
Token were
Token ID547
Feature activation+1.97e-01

Pos logit contributions
 se+0.005
Ar+0.005
 oct+0.005
 sw+0.005
 tort+0.005
Neg logit contributions
adoes-0.006
outing-0.005
ows-0.005
 fears-0.005
 anymore-0.005
Token best
Token ID1266
Feature activation-2.23e+00

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.005
 tort+0.004
 sw+0.004
Neg logit contributions
adoes-0.008
outing-0.007
ows-0.007
 fears-0.007
 anymore-0.007
Token friends
Token ID2460
Feature activation-7.88e-01

Pos logit contributions
adoes+0.097
outing+0.091
 fears+0.090
ows+0.089
 anymore+0.088
Neg logit contributions
Ar-0.049
 se-0.046
 oct-0.043
 tort-0.040
thank-0.040
Token.
Token ID13
Feature activation+1.03e-01

Pos logit contributions
adoes+0.032
outing+0.029
ows+0.029
 anymore+0.028
 fears+0.028
Neg logit contributions
Ar-0.020
 se-0.020
 oct-0.018
 sw-0.017
 tort-0.017
Token They
Token ID1119
Feature activation+3.84e-01

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.004
 pl+0.004
 sw+0.004
Neg logit contributions
adoes-0.003
outing-0.002
ows-0.002
 anymore-0.002
 fears-0.002
Token liked
Token ID8288
Feature activation+6.20e-02

Pos logit contributions
Ar+0.012
 se+0.011
 oct+0.011
 tort+0.011
 sur+0.011
Neg logit contributions
adoes-0.012
ows-0.011
outing-0.011
 anymore-0.011
 fears-0.011
Token to
Token ID284
Feature activation-4.20e-02

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.002
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.002
outing-0.002
ows-0.002
 fears-0.002
 anymore-0.002
Token and
Token ID290
Feature activation+2.18e-02

Pos logit contributions
Ar+0.010
 se+0.009
 oct+0.009
 tort+0.009
 sur+0.008
Neg logit contributions
adoes-0.013
ows-0.012
 anymore-0.012
outing-0.011
cases-0.011
Token cars
Token ID5006
Feature activation-5.74e-01

Pos logit contributions
Ar+0.001
 sur+0.001
 oct+0.001
 tort+0.001
 se+0.001
Neg logit contributions
adoes-0.001
ows-0.001
 fears-0.001
cases-0.001
 anymore-0.001
Token and
Token ID290
Feature activation+1.44e-02

Pos logit contributions
adoes+0.023
ows+0.021
outing+0.021
 anymore+0.021
 fears+0.020
Neg logit contributions
Ar-0.014
 se-0.014
 oct-0.013
 tort-0.012
 sw-0.012
Token made
Token ID925
Feature activation-3.06e-01

Pos logit contributions
Ar+0.000
 sur+0.000
 oct+0.000
 se+0.000
 tort+0.000
Neg logit contributions
adoes-0.001
ows-0.000
cases-0.000
 anymore-0.000
 fears-0.000
Token a
Token ID257
Feature activation-5.41e-01

Pos logit contributions
adoes+0.012
outing+0.011
ows+0.011
 fears+0.011
iders+0.011
Neg logit contributions
 se-0.008
Ar-0.007
 oct-0.007
 pl-0.007
 sw-0.007
Token story
Token ID1621
Feature activation-2.22e+00

Pos logit contributions
adoes+0.018
 fears+0.016
outing+0.016
ows+0.016
 anymore+0.016
Neg logit contributions
Ar-0.016
 se-0.016
 oct-0.015
 tort-0.015
 sur-0.014
Token.
Token ID13
Feature activation+1.93e-01

Pos logit contributions
adoes+0.081
 fears+0.073
outing+0.072
 appearance+0.067
inters+0.067
Neg logit contributions
 se-0.066
Ar-0.063
 pl-0.058
 show-0.058
 sw-0.056
Token They
Token ID1119
Feature activation-3.40e-01

Pos logit contributions
Ar+0.006
 se+0.006
 oct+0.005
 sw+0.005
 pl+0.005
Neg logit contributions
adoes-0.007
outing-0.006
 fears-0.006
ows-0.006
cases-0.006
Token had
Token ID550
Feature activation+2.29e-01

Pos logit contributions
ows+0.010
outed+0.010
outing+0.010
 anymore+0.009
adoes+0.009
Neg logit contributions
Ar-0.011
 tort-0.010
 oct-0.010
 sur-0.010
 Rib-0.009
Token fun
Token ID1257
Feature activation+4.31e-01

Pos logit contributions
Ar+0.007
 tort+0.006
 oct+0.006
 sur+0.006
 se+0.006
Neg logit contributions
adoes-0.008
 anymore-0.008
ows-0.007
outing-0.007
 fears-0.007
Token and
Token ID290
Feature activation-2.46e-01

Pos logit contributions
Ar+0.013
 tort+0.012
 se+0.012
 oct+0.012
 sur+0.011
Neg logit contributions
adoes-0.016
 anymore-0.015
outing-0.014
 fears-0.013
ows-0.013
TokenS
Token ID50
Feature activation+3.21e-01

Pos logit contributions
adoes+0.005
 anymore+0.005
ows+0.004
uffy+0.004
outing+0.004
Neg logit contributions
 se-0.005
 tort-0.004
 oct-0.004
Ar-0.004
 pl-0.004
Tokenara
Token ID3301
Feature activation+7.39e-02

Pos logit contributions
 se+0.011
Ar+0.011
 oct+0.010
 sw+0.010
 pl+0.010
Neg logit contributions
adoes-0.010
outing-0.009
ows-0.009
 fears-0.009
 anymore-0.009
Token and
Token ID290
Feature activation+2.37e-01

Pos logit contributions
Ar+0.002
 se+0.002
 tort+0.002
 oct+0.002
 cuc+0.002
Neg logit contributions
adoes-0.002
ows-0.002
 fears-0.002
 anymore-0.002
 slams-0.002
Token Ben
Token ID3932
Feature activation+1.95e-01

Pos logit contributions
 se+0.006
Ar+0.006
 oct+0.006
 sw+0.005
 pl+0.005
Neg logit contributions
adoes-0.010
outing-0.009
ows-0.009
 fears-0.009
 anymore-0.009
Token are
Token ID389
Feature activation+1.59e-01

Pos logit contributions
Ar+0.006
 se+0.006
 oct+0.006
 tort+0.005
 sur+0.005
Neg logit contributions
adoes-0.007
ows-0.006
outing-0.006
 fears-0.006
 anymore-0.006
Token best
Token ID1266
Feature activation-2.22e+00

Pos logit contributions
Ar+0.005
 se+0.005
 tort+0.005
 oct+0.005
thank+0.004
Neg logit contributions
adoes-0.005
outing-0.005
outed-0.005
 fears-0.005
 anymore-0.005
Token friends
Token ID2460
Feature activation-6.63e-01

Pos logit contributions
adoes+0.095
 fears+0.088
outing+0.088
ows+0.087
 anymore+0.086
Neg logit contributions
Ar-0.050
 se-0.046
 oct-0.043
 tort-0.042
thank-0.042
Token.
Token ID13
Feature activation+4.64e-02

Pos logit contributions
adoes+0.025
 anymore+0.023
outing+0.023
ows+0.023
 fears+0.022
Neg logit contributions
Ar-0.018
 se-0.018
 oct-0.017
 tort-0.016
 pl-0.016
Token They
Token ID1119
Feature activation+4.21e-01

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.002
 pl+0.002
 tort+0.002
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
 anymore-0.001
 fears-0.001
Token like
Token ID588
Feature activation-2.68e-01

Pos logit contributions
Ar+0.012
 se+0.012
 oct+0.011
 tort+0.011
 sur+0.011
Neg logit contributions
adoes-0.014
ows-0.013
 fears-0.013
 anymore-0.013
outing-0.013
Token to
Token ID284
Feature activation-1.59e-01

Pos logit contributions
adoes+0.009
 anymore+0.007
ows+0.007
outing+0.007
 fears+0.007
Neg logit contributions
Ar-0.009
 se-0.008
 oct-0.008
 tort-0.008
 sur-0.007
Token<|endoftext|>
Token ID50256
Feature activation-1.36e-01

Pos logit contributions
adoes+0.021
outing+0.019
 fears+0.019
 anymore+0.018
ows+0.018
Neg logit contributions
Ar-0.010
 se-0.010
 pl-0.009
 oct-0.009
 sur-0.008
TokenJohn
Token ID7554
Feature activation+2.82e-01

Pos logit contributions
adoes+0.005
 anymore+0.005
ows+0.005
outing+0.005
uffy+0.004
Neg logit contributions
 se-0.004
Ar-0.004
 oct-0.003
 tort-0.003
 pl-0.003
Token and
Token ID290
Feature activation+1.03e-03

Pos logit contributions
 se+0.008
Ar+0.008
 oct+0.007
 sw+0.007
 pl+0.007
Neg logit contributions
adoes-0.011
outing-0.010
ows-0.010
 fears-0.010
 anymore-0.009
Token Sarah
Token ID10490
Feature activation-4.91e-02

Pos logit contributions
Ar+0.000
 se+0.000
 oct+0.000
 tort+0.000
 pl+0.000
Neg logit contributions
adoes-0.000
outing-0.000
 fears-0.000
 anymore-0.000
ows-0.000
Token were
Token ID547
Feature activation+3.53e-01

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
 fears+0.002
iders+0.002
Neg logit contributions
 se-0.001
Ar-0.001
 oct-0.001
 sw-0.001
 pl-0.001
Token best
Token ID1266
Feature activation-2.22e+00

Pos logit contributions
Ar+0.009
 se+0.009
 oct+0.008
 sw+0.008
 pl+0.008
Neg logit contributions
adoes-0.014
outing-0.013
ows-0.013
 fears-0.013
 anymore-0.013
Token friends
Token ID2460
Feature activation-6.28e-01

Pos logit contributions
adoes+0.099
outing+0.093
 fears+0.092
ows+0.091
 anymore+0.090
Neg logit contributions
Ar-0.046
 se-0.043
 oct-0.040
 sw-0.038
 tort-0.038
Token.
Token ID13
Feature activation-2.69e-02

Pos logit contributions
adoes+0.025
outing+0.023
ows+0.023
 anymore+0.023
 fears+0.022
Neg logit contributions
Ar-0.016
 se-0.016
 oct-0.015
 sw-0.014
 pl-0.014
Token They
Token ID1119
Feature activation+2.73e-01

Pos logit contributions
adoes+0.001
outing+0.001
ows+0.001
 fears+0.001
cases+0.001
Neg logit contributions
Ar-0.001
 se-0.001
 oct-0.001
 sw-0.001
 tort-0.001
Token always
Token ID1464
Feature activation+5.45e-01

Pos logit contributions
Ar+0.007
 se+0.007
 oct+0.007
 tort+0.006
 sur+0.006
Neg logit contributions
adoes-0.010
outing-0.009
ows-0.009
 anymore-0.009
 fears-0.009
Token played
Token ID2826
Feature activation+1.29e-02

Pos logit contributions
Ar+0.013
 se+0.013
 oct+0.012
 tort+0.011
 pl+0.011
Neg logit contributions
adoes-0.022
outing-0.020
ows-0.020
 fears-0.020
 anymore-0.019
Token<|endoftext|>
Token ID50256
Feature activation-7.57e-02

Pos logit contributions
 se+0.008
Ar+0.008
 oct+0.008
 sw+0.007
 pl+0.007
Neg logit contributions
adoes-0.018
outing-0.017
ows-0.017
 fears-0.016
 anymore-0.016
TokenOne
Token ID3198
Feature activation+1.05e+00

Pos logit contributions
adoes+0.003
 anymore+0.003
ows+0.003
outing+0.002
uffy+0.002
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.002
 tort-0.002
 pl-0.002
Token day
Token ID1110
Feature activation-1.37e-01

Pos logit contributions
 se+0.016
 oct+0.014
Ar+0.013
 sw+0.013
 pl+0.012
Neg logit contributions
adoes-0.056
outing-0.051
ows-0.051
cases-0.049
iders-0.049
Token,
Token ID11
Feature activation-6.45e-01

Pos logit contributions
adoes+0.006
outing+0.005
ows+0.005
 fears+0.005
 anymore+0.005
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.003
 pl-0.003
Token two
Token ID734
Feature activation-2.34e-01

Pos logit contributions
adoes+0.030
outing+0.027
ows+0.027
 fears+0.027
cases+0.026
Neg logit contributions
Ar-0.013
 se-0.013
 oct-0.012
 sw-0.011
 pl-0.011
Token best
Token ID1266
Feature activation-2.22e+00

Pos logit contributions
adoes+0.012
ows+0.011
outing+0.011
iders+0.011
cases+0.011
Neg logit contributions
 se-0.004
 sw-0.003
 oct-0.003
 pl-0.003
Ar-0.003
Token friends
Token ID2460
Feature activation-5.68e-01

Pos logit contributions
adoes+0.114
ows+0.106
outing+0.106
cases+0.102
 anymore+0.102
Neg logit contributions
 se-0.034
Ar-0.032
 oct-0.029
 sw-0.027
 pl-0.025
Token decided
Token ID3066
Feature activation-2.49e-01

Pos logit contributions
adoes+0.025
outing+0.024
ows+0.023
 fears+0.022
cases+0.022
Neg logit contributions
 se-0.013
Ar-0.012
 oct-0.012
 sw-0.011
 pl-0.010
Token to
Token ID284
Feature activation+5.88e-01

Pos logit contributions
adoes+0.011
outing+0.011
ows+0.011
 fears+0.010
iders+0.010
Neg logit contributions
 se-0.006
 sw-0.005
 oct-0.005
Ar-0.004
 pl-0.004
Token go
Token ID467
Feature activation+3.05e-01

Pos logit contributions
Ar+0.016
 se+0.015
 oct+0.014
rob+0.013
 sur+0.013
Neg logit contributions
adoes-0.023
 anymore-0.020
 fears-0.020
ows-0.019
outing-0.019
Token on
Token ID319
Feature activation+2.55e-02

Pos logit contributions
Ar+0.010
 se+0.009
thank+0.009
 oct+0.009
 tort+0.009
Neg logit contributions
adoes-0.010
ows-0.009
 anymore-0.009
cases-0.009
inters-0.008
TokenS
Token ID50
Feature activation+3.85e-01

Pos logit contributions
 anymore+0.008
adoes+0.008
ows+0.007
uffy+0.007
outing+0.007
Neg logit contributions
 se-0.007
 tort-0.007
Ar-0.007
 oct-0.007
thank-0.007
Tokenara
Token ID3301
Feature activation-4.71e-02

Pos logit contributions
Ar+0.014
 se+0.014
 oct+0.013
 tort+0.013
 sw+0.012
Neg logit contributions
adoes-0.011
outing-0.010
ows-0.010
 anymore-0.010
 fears-0.010
Token and
Token ID290
Feature activation+1.06e-01

Pos logit contributions
adoes+0.002
ows+0.002
 anymore+0.002
 fears+0.001
outing+0.001
Neg logit contributions
Ar-0.001
 se-0.001
 tort-0.001
 oct-0.001
 sur-0.001
Token Ben
Token ID3932
Feature activation+9.30e-02

Pos logit contributions
 se+0.003
Ar+0.003
 oct+0.003
 sw+0.003
 pl+0.002
Neg logit contributions
adoes-0.004
outing-0.004
ows-0.004
 fears-0.004
 anymore-0.004
Token were
Token ID547
Feature activation+8.70e-02

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 tort+0.002
 sur+0.002
Neg logit contributions
adoes-0.004
ows-0.003
outing-0.003
 anymore-0.003
 fears-0.003
Token best
Token ID1266
Feature activation-2.22e+00

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 tort+0.002
 sur+0.002
Neg logit contributions
adoes-0.003
outing-0.003
 fears-0.003
ows-0.003
 anymore-0.003
Token friends
Token ID2460
Feature activation-6.63e-01

Pos logit contributions
adoes+0.092
 fears+0.087
outing+0.086
 anymore+0.084
ows+0.084
Neg logit contributions
Ar-0.052
 se-0.047
 oct-0.044
thank-0.044
 tort-0.044
Token.
Token ID13
Feature activation+1.75e-01

Pos logit contributions
adoes+0.024
 anymore+0.023
outing+0.022
ows+0.022
 fears+0.021
Neg logit contributions
Ar-0.019
 se-0.018
 oct-0.017
 tort-0.017
 pl-0.016
Token They
Token ID1119
Feature activation+4.05e-01

Pos logit contributions
 se+0.006
Ar+0.006
 oct+0.006
 sw+0.006
 pl+0.006
Neg logit contributions
adoes-0.005
outing-0.005
ows-0.005
 fears-0.004
 anymore-0.004
Token liked
Token ID8288
Feature activation+1.06e-01

Pos logit contributions
Ar+0.014
 se+0.013
rob+0.012
 tort+0.012
 oct+0.012
Neg logit contributions
adoes-0.011
ows-0.011
outing-0.010
 anymore-0.010
outed-0.010
Token to
Token ID284
Feature activation-1.40e-02

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.003
 tort+0.003
 sw+0.003
Neg logit contributions
adoes-0.004
outing-0.004
 anymore-0.003
 fears-0.003
ows-0.003
TokenL
Token ID43
Feature activation+6.75e-01

Pos logit contributions
adoes+0.010
 anymore+0.010
ows+0.010
outing+0.009
uffy+0.009
Neg logit contributions
 se-0.009
Ar-0.009
 tort-0.008
 oct-0.008
thank-0.008
Tokenily
Token ID813
Feature activation-1.29e-01

Pos logit contributions
Ar+0.020
 se+0.020
 oct+0.019
 tort+0.018
 sw+0.018
Neg logit contributions
adoes-0.024
outing-0.021
ows-0.021
 anymore-0.021
 fears-0.021
Token and
Token ID290
Feature activation-4.98e-03

Pos logit contributions
adoes+0.004
ows+0.003
 fears+0.003
 anymore+0.003
outing+0.003
Neg logit contributions
Ar-0.005
 se-0.005
 oct-0.004
 tort-0.004
 sur-0.004
Token Ben
Token ID3932
Feature activation+1.86e-01

Pos logit contributions
adoes+0.000
outing+0.000
ows+0.000
 fears+0.000
 anymore+0.000
Neg logit contributions
 se-0.000
Ar-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Token were
Token ID547
Feature activation+1.53e-01

Pos logit contributions
 se+0.006
Ar+0.006
 oct+0.006
 tort+0.005
 sw+0.005
Neg logit contributions
adoes-0.006
outing-0.005
ows-0.005
 fears-0.005
 anymore-0.005
Token best
Token ID1266
Feature activation-2.21e+00

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.004
 tort+0.003
thank+0.003
Neg logit contributions
adoes-0.006
outing-0.006
 fears-0.005
ows-0.005
 anymore-0.005
Token friends
Token ID2460
Feature activation-7.95e-01

Pos logit contributions
adoes+0.093
outing+0.087
 fears+0.087
 anymore+0.085
ows+0.084
Neg logit contributions
Ar-0.050
 se-0.046
 oct-0.044
 tort-0.043
thank-0.043
Token.
Token ID13
Feature activation+8.72e-02

Pos logit contributions
adoes+0.030
 anymore+0.028
outing+0.027
ows+0.027
 fears+0.026
Neg logit contributions
Ar-0.022
 se-0.022
 oct-0.020
 tort-0.020
 pl-0.019
Token They
Token ID1119
Feature activation+4.24e-01

Pos logit contributions
 se+0.004
Ar+0.003
 oct+0.003
 pl+0.003
 sur+0.003
Neg logit contributions
adoes-0.002
 anymore-0.002
outing-0.002
ows-0.002
 fears-0.002
Token liked
Token ID8288
Feature activation-1.28e-02

Pos logit contributions
Ar+0.014
 se+0.013
 oct+0.013
rob+0.013
 tort+0.013
Neg logit contributions
adoes-0.012
ows-0.011
 anymore-0.011
outing-0.011
 fears-0.011
Token to
Token ID284
Feature activation-8.26e-03

Pos logit contributions
adoes+0.000
outing+0.000
 anymore+0.000
ows+0.000
 fears+0.000
Neg logit contributions
Ar-0.000
 se-0.000
 oct-0.000
 tort-0.000
 sw-0.000
Token<|endoftext|>
Token ID50256
Feature activation-2.06e-01

Pos logit contributions
adoes+0.008
ows+0.007
 anymore+0.007
outing+0.007
 fears+0.007
Neg logit contributions
 se-0.007
Ar-0.006
 oct-0.006
 sw-0.006
 pl-0.006
TokenBen
Token ID11696
Feature activation+4.35e-01

Pos logit contributions
adoes+0.007
 anymore+0.007
ows+0.006
outing+0.006
uffy+0.006
Neg logit contributions
 se-0.006
Ar-0.006
 oct-0.006
 tort-0.006
 sur-0.005
Token and
Token ID290
Feature activation+5.55e-02

Pos logit contributions
Ar+0.012
 se+0.012
 oct+0.011
 tort+0.011
 sur+0.010
Neg logit contributions
adoes-0.016
ows-0.015
 fears-0.014
outing-0.014
 anymore-0.014
Token Lily
Token ID20037
Feature activation+2.74e-01

Pos logit contributions
Ar+0.001
 se+0.001
 oct+0.001
 tort+0.001
 sur+0.001
Neg logit contributions
adoes-0.002
 fears-0.002
ows-0.002
outing-0.002
 anymore-0.002
Token were
Token ID547
Feature activation+1.94e-01

Pos logit contributions
Ar+0.007
 se+0.007
 oct+0.007
 tort+0.006
 sw+0.006
Neg logit contributions
adoes-0.010
outing-0.009
ows-0.009
 fears-0.009
 anymore-0.009
Token best
Token ID1266
Feature activation-2.21e+00

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.005
 tort+0.004
 sw+0.004
Neg logit contributions
adoes-0.008
outing-0.007
ows-0.007
 fears-0.007
 anymore-0.007
Token friends
Token ID2460
Feature activation-7.97e-01

Pos logit contributions
adoes+0.094
outing+0.088
 fears+0.087
ows+0.086
 anymore+0.085
Neg logit contributions
Ar-0.050
 se-0.046
 oct-0.044
 tort-0.043
thank-0.042
Token.
Token ID13
Feature activation-6.83e-03

Pos logit contributions
adoes+0.031
outing+0.028
ows+0.028
 anymore+0.028
 fears+0.027
Neg logit contributions
Ar-0.021
 se-0.021
 oct-0.019
 tort-0.018
 sw-0.018
Token They
Token ID1119
Feature activation+3.67e-01

Pos logit contributions
adoes+0.000
outing+0.000
ows+0.000
 fears+0.000
 anymore+0.000
Neg logit contributions
 se-0.000
Ar-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Token liked
Token ID8288
Feature activation+3.25e-02

Pos logit contributions
Ar+0.012
 se+0.011
 oct+0.011
rob+0.011
 tort+0.011
Neg logit contributions
adoes-0.011
ows-0.010
 anymore-0.010
outing-0.010
 fears-0.010
Token to
Token ID284
Feature activation-2.69e-02

Pos logit contributions
Ar+0.001
 se+0.001
 oct+0.001
 sw+0.001
 tort+0.001
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
 fears-0.001
 anymore-0.001
Token Wed
Token ID3300
Feature activation-1.61e+00

Pos logit contributions
Ar+0.001
 se+0.001
 oct+0.001
 sw+0.001
 tort+0.001
Neg logit contributions
adoes-0.002
outing-0.001
 fears-0.001
 anymore-0.001
ows-0.001
Tokennes
Token ID2516
Feature activation+7.16e-01

Pos logit contributions
adoes+0.050
outing+0.045
ows+0.044
 anymore+0.042
 fears+0.042
Neg logit contributions
 se-0.055
Ar-0.055
 oct-0.053
 sw-0.050
 pl-0.049
Tokendays
Token ID12545
Feature activation-6.63e-02

Pos logit contributions
Ar+0.025
 se+0.025
 oct+0.023
 sw+0.022
 tort+0.022
Neg logit contributions
adoes-0.022
outing-0.020
ows-0.019
 anymore-0.019
 fears-0.019
Token because
Token ID780
Feature activation-4.75e-01

Pos logit contributions
adoes+0.002
outing+0.002
 fears+0.002
ows+0.002
cases+0.002
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.002
 sw-0.002
 pl-0.002
Token on
Token ID319
Feature activation+3.75e-01

Pos logit contributions
adoes+0.017
outing+0.015
 anymore+0.014
ows+0.014
 fears+0.014
Neg logit contributions
Ar-0.015
 se-0.014
 oct-0.013
 sw-0.013
 tort-0.013
Token Wed
Token ID3300
Feature activation-2.21e+00

Pos logit contributions
 se+0.009
 oct+0.008
Ar+0.008
 pl+0.007
 sw+0.007
Neg logit contributions
adoes-0.016
outing-0.016
ows-0.015
 fears-0.015
outed-0.014
Tokennes
Token ID2516
Feature activation+7.73e-01

Pos logit contributions
adoes+0.067
outing+0.061
ows+0.059
 anymore+0.057
 fears+0.057
Neg logit contributions
 se-0.076
Ar-0.075
 oct-0.073
 sw-0.069
 pl-0.068
Tokendays
Token ID12545
Feature activation+2.54e-02

Pos logit contributions
 se+0.024
Ar+0.024
 oct+0.023
 sw+0.022
 pl+0.022
Neg logit contributions
adoes-0.027
outing-0.023
ows-0.023
 fears-0.023
cases-0.022
Token she
Token ID673
Feature activation+8.10e-02

Pos logit contributions
 se+0.000
thank+0.000
 pl+0.000
Ar+0.000
 oct+0.000
Neg logit contributions
adoes-0.001
outing-0.001
 fears-0.001
ows-0.001
iders-0.001
Token did
Token ID750
Feature activation-1.45e-01

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 sw+0.002
 tort+0.002
Neg logit contributions
adoes-0.003
outing-0.003
ows-0.003
 fears-0.003
 anymore-0.003
Token something
Token ID1223
Feature activation+1.18e+00

Pos logit contributions
adoes+0.005
outing+0.005
 fears+0.005
ows+0.005
outed+0.005
Neg logit contributions
 se-0.004
 pl-0.004
Ar-0.004
 show-0.004
 sw-0.004
TokenL
Token ID43
Feature activation+7.20e-01

Pos logit contributions
adoes+0.007
 anymore+0.007
ows+0.006
uffy+0.006
outing+0.006
Neg logit contributions
 se-0.006
Ar-0.006
 oct-0.006
 tort-0.006
 pl-0.006
Tokenily
Token ID813
Feature activation-6.59e-02

Pos logit contributions
 se+0.020
Ar+0.020
 oct+0.018
 pl+0.017
 sw+0.017
Neg logit contributions
adoes-0.028
outing-0.025
 fears-0.024
ows-0.024
 anymore-0.023
Token and
Token ID290
Feature activation-1.14e-01

Pos logit contributions
adoes+0.003
ows+0.002
outing+0.002
 fears+0.002
 anymore+0.002
Neg logit contributions
Ar-0.002
 se-0.002
 oct-0.001
 tort-0.001
 sw-0.001
Token Ben
Token ID3932
Feature activation+1.94e-01

Pos logit contributions
adoes+0.004
outing+0.004
ows+0.004
 fears+0.004
 anymore+0.004
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.003
 pl-0.003
Token were
Token ID547
Feature activation+1.75e-01

Pos logit contributions
 se+0.005
Ar+0.005
 oct+0.005
 sw+0.004
 tort+0.004
Neg logit contributions
adoes-0.008
outing-0.007
ows-0.007
 fears-0.007
 anymore-0.007
Token best
Token ID1266
Feature activation-2.21e+00

Pos logit contributions
Ar+0.005
 se+0.004
 oct+0.004
 tort+0.004
 sw+0.004
Neg logit contributions
adoes-0.007
outing-0.006
ows-0.006
 fears-0.006
 anymore-0.006
Token friends
Token ID2460
Feature activation-7.15e-01

Pos logit contributions
adoes+0.092
 fears+0.086
outing+0.086
 anymore+0.084
ows+0.084
Neg logit contributions
Ar-0.052
 se-0.047
 oct-0.045
 tort-0.044
thank-0.043
Token.
Token ID13
Feature activation+2.14e-01

Pos logit contributions
adoes+0.027
 anymore+0.025
outing+0.025
ows+0.025
 fears+0.024
Neg logit contributions
Ar-0.020
 se-0.019
 oct-0.018
 tort-0.017
 pl-0.017
Token They
Token ID1119
Feature activation+5.12e-01

Pos logit contributions
 se+0.008
Ar+0.008
 oct+0.008
 sw+0.007
 pl+0.007
Neg logit contributions
adoes-0.006
outing-0.005
ows-0.005
 anymore-0.005
 fears-0.005
Token liked
Token ID8288
Feature activation+1.60e-01

Pos logit contributions
Ar+0.017
 se+0.015
 oct+0.015
 tort+0.015
rob+0.014
Neg logit contributions
adoes-0.016
ows-0.015
outing-0.015
 anymore-0.014
 fears-0.014
Token to
Token ID284
Feature activation+4.12e-02

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.006
outing-0.006
ows-0.006
 fears-0.005
 anymore-0.005
TokenS
Token ID50
Feature activation+4.39e-01

Pos logit contributions
adoes+0.007
 anymore+0.007
ows+0.006
outing+0.006
uffy+0.006
Neg logit contributions
 se-0.006
Ar-0.005
 tort-0.005
 oct-0.005
 pl-0.005
Tokenara
Token ID3301
Feature activation+4.65e-02

Pos logit contributions
 se+0.016
Ar+0.016
 oct+0.015
 sw+0.015
 pl+0.014
Neg logit contributions
adoes-0.013
outing-0.011
ows-0.011
 fears-0.011
 anymore-0.011
Token and
Token ID290
Feature activation+8.71e-02

Pos logit contributions
Ar+0.001
 se+0.001
 tort+0.001
 oct+0.001
 cuc+0.001
Neg logit contributions
adoes-0.002
ows-0.001
 anymore-0.001
outing-0.001
 fears-0.001
Token Ben
Token ID3932
Feature activation+1.51e-01

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.002
 sw+0.002
 pl+0.002
Neg logit contributions
adoes-0.004
outing-0.003
ows-0.003
 fears-0.003
 anymore-0.003
Token were
Token ID547
Feature activation+1.90e-01

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.004
 tort+0.004
 sur+0.004
Neg logit contributions
adoes-0.005
ows-0.005
 anymore-0.005
outing-0.005
 fears-0.005
Token best
Token ID1266
Feature activation-2.20e+00

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.004
 tort+0.004
 sw+0.004
Neg logit contributions
adoes-0.008
outing-0.007
ows-0.007
 fears-0.007
 anymore-0.007
Token friends
Token ID2460
Feature activation-6.14e-01

Pos logit contributions
adoes+0.095
outing+0.088
 fears+0.088
ows+0.086
 anymore+0.086
Neg logit contributions
Ar-0.049
 se-0.045
 oct-0.042
 tort-0.041
thank-0.039
Token.
Token ID13
Feature activation+1.59e-01

Pos logit contributions
adoes+0.022
 anymore+0.021
ows+0.021
outing+0.021
 fears+0.019
Neg logit contributions
Ar-0.018
 se-0.017
 oct-0.016
 tort-0.016
 pl-0.015
Token They
Token ID1119
Feature activation+4.60e-01

Pos logit contributions
 se+0.006
Ar+0.006
 oct+0.005
 sw+0.005
 pl+0.005
Neg logit contributions
adoes-0.005
outing-0.004
ows-0.004
 fears-0.004
 anymore-0.004
Token liked
Token ID8288
Feature activation+6.27e-02

Pos logit contributions
Ar+0.016
 se+0.015
rob+0.015
 tort+0.015
 oct+0.014
Neg logit contributions
adoes-0.012
ows-0.012
 anymore-0.011
outing-0.011
outed-0.011
Token to
Token ID284
Feature activation+3.76e-03

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 tort+0.002
 pl+0.002
Neg logit contributions
adoes-0.002
outing-0.002
 anymore-0.002
ows-0.002
 fears-0.002
Token<|endoftext|>
Token ID50256
Feature activation-2.06e-01

Pos logit contributions
 se+0.003
Ar+0.003
 oct+0.003
 sw+0.003
 pl+0.003
Neg logit contributions
adoes-0.005
outing-0.005
ows-0.005
 fears-0.005
 anymore-0.004
TokenAnna
Token ID31160
Feature activation+5.12e-02

Pos logit contributions
adoes+0.007
 anymore+0.007
ows+0.007
outing+0.006
uffy+0.006
Neg logit contributions
 se-0.006
Ar-0.006
 oct-0.006
 tort-0.005
 pl-0.005
Token and
Token ID290
Feature activation-4.15e-02

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.002
 tort+0.002
 cuc+0.002
Neg logit contributions
adoes-0.002
ows-0.001
 anymore-0.001
 fears-0.001
 slams-0.001
Token Ben
Token ID3932
Feature activation+1.75e-01

Pos logit contributions
adoes+0.001
outing+0.001
ows+0.001
 fears+0.001
cases+0.001
Neg logit contributions
 se-0.001
Ar-0.001
 oct-0.001
 sw-0.001
 pl-0.001
Token were
Token ID547
Feature activation+1.66e-01

Pos logit contributions
 se+0.005
Ar+0.005
 oct+0.005
 sw+0.004
 tort+0.004
Neg logit contributions
adoes-0.006
outing-0.006
ows-0.006
 fears-0.006
 anymore-0.006
Token best
Token ID1266
Feature activation-2.19e+00

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.003
 sw+0.003
 pl+0.003
Neg logit contributions
adoes-0.007
outing-0.007
ows-0.007
 fears-0.006
 anymore-0.006
Token friends
Token ID2460
Feature activation-7.29e-01

Pos logit contributions
adoes+0.102
outing+0.094
ows+0.093
 fears+0.093
 anymore+0.092
Neg logit contributions
Ar-0.042
 se-0.041
 oct-0.037
 tort-0.034
 sw-0.034
Token.
Token ID13
Feature activation+1.61e-01

Pos logit contributions
adoes+0.030
outing+0.027
ows+0.027
 fears+0.027
 anymore+0.026
Neg logit contributions
 se-0.018
Ar-0.018
 oct-0.016
 sw-0.015
 pl-0.015
Token They
Token ID1119
Feature activation+4.17e-01

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.005
 sw+0.005
 pl+0.005
Neg logit contributions
adoes-0.005
outing-0.005
ows-0.005
 fears-0.005
cases-0.004
Token liked
Token ID8288
Feature activation+1.09e-02

Pos logit contributions
Ar+0.013
 se+0.013
 oct+0.012
 tort+0.012
rob+0.012
Neg logit contributions
adoes-0.013
ows-0.012
outing-0.012
 anymore-0.012
 fears-0.012
Token to
Token ID284
Feature activation+3.76e-02

Pos logit contributions
 se+0.000
Ar+0.000
 oct+0.000
 sw+0.000
 pl+0.000
Neg logit contributions
adoes-0.000
outing-0.000
ows-0.000
 fears-0.000
 anymore-0.000
TokenS
Token ID50
Feature activation+3.43e-01

Pos logit contributions
adoes+0.007
 anymore+0.006
ows+0.006
outing+0.006
uffy+0.006
Neg logit contributions
 se-0.006
Ar-0.006
 tort-0.005
 oct-0.005
thank-0.005
Tokenara
Token ID3301
Feature activation+3.98e-02

Pos logit contributions
Ar+0.012
 se+0.012
 oct+0.011
 tort+0.011
 sw+0.011
Neg logit contributions
adoes-0.010
outing-0.009
ows-0.009
 anymore-0.009
 fears-0.009
Token and
Token ID290
Feature activation+5.99e-02

Pos logit contributions
Ar+0.001
 se+0.001
 tort+0.001
 oct+0.001
 cuc+0.001
Neg logit contributions
adoes-0.001
ows-0.001
 anymore-0.001
 fears-0.001
outing-0.001
Token Mia
Token ID32189
Feature activation-2.98e-01

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.002
outing-0.002
ows-0.002
 fears-0.002
 anymore-0.002
Token are
Token ID389
Feature activation+2.69e-02

Pos logit contributions
adoes+0.011
ows+0.010
 anymore+0.010
outing+0.010
 fears+0.010
Neg logit contributions
Ar-0.008
 se-0.008
 oct-0.008
 tort-0.007
 sur-0.007
Token best
Token ID1266
Feature activation-2.19e+00

Pos logit contributions
Ar+0.001
 se+0.001
 oct+0.001
 tort+0.001
rob+0.001
Neg logit contributions
adoes-0.001
outing-0.001
 anymore-0.001
 fears-0.001
outed-0.001
Token friends
Token ID2460
Feature activation-6.88e-01

Pos logit contributions
adoes+0.092
 fears+0.086
outing+0.085
 anymore+0.084
ows+0.083
Neg logit contributions
Ar-0.050
 se-0.046
 oct-0.044
 tort-0.043
thank-0.042
Token.
Token ID13
Feature activation+1.85e-01

Pos logit contributions
adoes+0.026
 anymore+0.024
outing+0.023
ows+0.023
 fears+0.022
Neg logit contributions
Ar-0.020
 se-0.019
 oct-0.018
 tort-0.017
 pl-0.017
Token They
Token ID1119
Feature activation+4.43e-01

Pos logit contributions
 se+0.006
Ar+0.006
 oct+0.006
 sw+0.006
 pl+0.006
Neg logit contributions
adoes-0.006
outing-0.005
ows-0.005
 fears-0.005
 anymore-0.005
Token like
Token ID588
Feature activation-3.57e-01

Pos logit contributions
Ar+0.014
 se+0.013
 oct+0.012
 tort+0.012
rob+0.012
Neg logit contributions
adoes-0.014
ows-0.014
 anymore-0.013
 fears-0.013
outing-0.013
Token to
Token ID284
Feature activation+4.51e-02

Pos logit contributions
adoes+0.011
 anymore+0.010
ows+0.010
outing+0.010
 fears+0.009
Neg logit contributions
Ar-0.012
 se-0.011
 oct-0.011
 tort-0.010
 sur-0.010
Token<|endoftext|>
Token ID50256
Feature activation-1.93e-01

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.004
 sw+0.004
 tort+0.004
Neg logit contributions
adoes-0.004
outing-0.004
ows-0.004
 fears-0.004
 anymore-0.004
TokenAnna
Token ID31160
Feature activation+2.10e-01

Pos logit contributions
 anymore+0.006
adoes+0.006
ows+0.006
uffy+0.005
outing+0.005
Neg logit contributions
 se-0.006
 tort-0.005
Ar-0.005
 oct-0.005
 pl-0.005
Token and
Token ID290
Feature activation-5.50e-02

Pos logit contributions
Ar+0.006
 se+0.006
 tort+0.005
 oct+0.005
 cuc+0.005
Neg logit contributions
adoes-0.008
ows-0.007
 anymore-0.007
 fears-0.007
 slams-0.006
Token Ben
Token ID3932
Feature activation+1.99e-01

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
 fears+0.002
 anymore+0.002
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.002
 sw-0.001
 pl-0.001
Token were
Token ID547
Feature activation+1.17e-01

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.005
 tort+0.005
 sur+0.004
Neg logit contributions
adoes-0.008
ows-0.007
outing-0.007
 fears-0.007
 anymore-0.007
Token best
Token ID1266
Feature activation-2.19e+00

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.003
 sw+0.002
 tort+0.002
Neg logit contributions
adoes-0.005
outing-0.004
ows-0.004
 fears-0.004
 anymore-0.004
Token friends
Token ID2460
Feature activation-7.12e-01

Pos logit contributions
adoes+0.096
outing+0.089
 fears+0.089
ows+0.088
 anymore+0.087
Neg logit contributions
Ar-0.047
 se-0.044
 oct-0.041
 tort-0.040
 sw-0.038
Token.
Token ID13
Feature activation+1.18e-01

Pos logit contributions
adoes+0.028
outing+0.026
ows+0.025
 anymore+0.025
 fears+0.025
Neg logit contributions
Ar-0.019
 se-0.019
 oct-0.017
 tort-0.016
 sw-0.016
Token They
Token ID1119
Feature activation+4.09e-01

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.004
 sw+0.003
 pl+0.003
Neg logit contributions
adoes-0.004
outing-0.004
 fears-0.003
ows-0.003
cases-0.003
Token liked
Token ID8288
Feature activation+5.79e-02

Pos logit contributions
Ar+0.014
 se+0.013
 oct+0.013
 tort+0.013
rob+0.012
Neg logit contributions
adoes-0.012
ows-0.011
outing-0.011
 anymore-0.011
 fears-0.010
Token to
Token ID284
Feature activation-8.22e-03

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.001
 sw+0.001
 tort+0.001
Neg logit contributions
adoes-0.002
outing-0.002
ows-0.002
 fears-0.002
 anymore-0.002
Token top
Token ID1353
Feature activation-6.95e-01

Pos logit contributions
adoes+0.006
outing+0.006
ows+0.006
 fears+0.006
cases+0.005
Neg logit contributions
 se-0.004
Ar-0.003
 oct-0.003
 pl-0.003
 sw-0.003
Token to
Token ID284
Feature activation-4.72e-01

Pos logit contributions
adoes+0.032
outing+0.029
 fears+0.029
ows+0.029
cases+0.027
Neg logit contributions
 se-0.014
Ar-0.013
 oct-0.013
 pl-0.012
 sw-0.011
Token make
Token ID787
Feature activation-2.01e-01

Pos logit contributions
adoes+0.017
 anymore+0.016
ows+0.015
outing+0.015
 fears+0.015
Neg logit contributions
Ar-0.014
 se-0.012
 oct-0.012
 tort-0.012
 sur-0.012
Token it
Token ID340
Feature activation-5.07e-01

Pos logit contributions
adoes+0.009
outing+0.008
ows+0.008
 fears+0.008
cases+0.008
Neg logit contributions
 se-0.004
Ar-0.004
 pl-0.004
 oct-0.004
 sw-0.004
Token y
Token ID331
Feature activation+2.03e-02

Pos logit contributions
adoes+0.022
outing+0.020
ows+0.019
 fears+0.019
outed+0.019
Neg logit contributions
 se-0.012
Ar-0.011
 oct-0.011
 pl-0.010
 sw-0.010
Tokenumm
Token ID13929
Feature activation-2.19e+00

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.001
 pl+0.001
thank+0.001
Neg logit contributions
adoes-0.001
ows-0.001
outing-0.001
 fears-0.001
cases-0.001
Tokenier
Token ID959
Feature activation+1.13e-01

Pos logit contributions
adoes+0.079
outing+0.071
ows+0.071
 anymore+0.069
 fears+0.068
Neg logit contributions
Ar-0.065
 se-0.064
 oct-0.060
 sw-0.057
 tort-0.057
Token.
Token ID13
Feature activation+2.85e-01

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.003
 sw+0.002
 pl+0.002
Neg logit contributions
adoes-0.005
outing-0.004
ows-0.004
 anymore-0.004
 fears-0.004
Token Sarah
Token ID10490
Feature activation-3.69e-01

Pos logit contributions
 se+0.007
Ar+0.007
 pl+0.006
 oct+0.006
 tort+0.006
Neg logit contributions
adoes-0.011
outing-0.011
ows-0.010
 fears-0.010
 anymore-0.010
Token was
Token ID373
Feature activation-2.75e-01

Pos logit contributions
adoes+0.014
ows+0.013
outing+0.013
 fears+0.013
cases+0.012
Neg logit contributions
Ar-0.010
 se-0.009
 oct-0.009
 tort-0.008
rob-0.008
Token still
Token ID991
Feature activation+4.25e-01

Pos logit contributions
adoes+0.010
 anymore+0.009
outing+0.009
ows+0.009
 fears+0.009
Neg logit contributions
Ar-0.008
 se-0.008
 oct-0.007
rob-0.007
 tort-0.007
Token was
Token ID373
Feature activation-1.26e-01

Pos logit contributions
adoes+0.006
outing+0.005
 fears+0.005
ows+0.005
 anymore+0.005
Neg logit contributions
Ar-0.004
 se-0.004
 oct-0.003
 tort-0.003
 sur-0.003
Token drawing
Token ID8263
Feature activation-8.29e-02

Pos logit contributions
adoes+0.005
outing+0.005
ows+0.005
 fears+0.005
 anymore+0.005
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.003
 pl-0.003
Token a
Token ID257
Feature activation+4.44e-01

Pos logit contributions
adoes+0.004
outing+0.004
 fears+0.004
ows+0.004
 appearance+0.004
Neg logit contributions
 se-0.001
 oct-0.001
 pl-0.001
Ar-0.001
 tw-0.001
Token big
Token ID1263
Feature activation-9.85e-02

Pos logit contributions
Ar+0.014
 se+0.013
 oct+0.012
 sw+0.012
 tort+0.012
Neg logit contributions
adoes-0.015
outing-0.014
 fears-0.014
 anymore-0.013
ows-0.013
Token sun
Token ID4252
Feature activation-8.14e-01

Pos logit contributions
adoes+0.004
outing+0.003
ows+0.003
 fears+0.003
 anymore+0.003
Neg logit contributions
Ar-0.003
 se-0.003
 oct-0.003
 sw-0.002
 pl-0.002
Token with
Token ID351
Feature activation+2.14e+00

Pos logit contributions
adoes+0.034
outing+0.031
ows+0.031
 fears+0.030
 anymore+0.030
Neg logit contributions
 se-0.019
Ar-0.019
 oct-0.018
 sw-0.017
 pl-0.016
Token yellow
Token ID7872
Feature activation+3.21e-01

Pos logit contributions
Ar+0.052
 se+0.050
 oct+0.047
 tort+0.044
 sur+0.044
Neg logit contributions
adoes-0.088
outing-0.079
 fears-0.077
ows-0.077
 anymore-0.077
Token chalk
Token ID30860
Feature activation-7.39e-01

Pos logit contributions
Ar+0.009
 se+0.009
 oct+0.008
 sw+0.008
 tort+0.007
Neg logit contributions
adoes-0.012
outing-0.011
ows-0.011
 anymore-0.011
 fears-0.011
Token when
Token ID618
Feature activation+3.44e-01

Pos logit contributions
adoes+0.031
outing+0.028
ows+0.028
 anymore+0.028
 fears+0.027
Neg logit contributions
Ar-0.018
 se-0.017
 oct-0.016
 sw-0.015
 tort-0.015
Token her
Token ID607
Feature activation-1.64e-01

Pos logit contributions
 se+0.008
 tw+0.006
 sw+0.006
 pl+0.006
 oct+0.006
Neg logit contributions
ows-0.016
outing-0.016
adoes-0.015
 fears-0.015
inters-0.014
Token little
Token ID1310
Feature activation-7.13e-01

Pos logit contributions
adoes+0.006
outing+0.005
ows+0.005
 anymore+0.005
 fears+0.005
Neg logit contributions
 se-0.005
Ar-0.005
 oct-0.005
 sw-0.004
 pl-0.004
Token poured
Token ID19036
Feature activation+8.32e-01

Pos logit contributions
adoes+0.016
ows+0.015
outing+0.015
cases+0.015
 fears+0.015
Neg logit contributions
Ar-0.011
 se-0.010
 oct-0.010
 tort-0.010
 sur-0.009
Token some
Token ID617
Feature activation+7.99e-01

Pos logit contributions
 se+0.019
Ar+0.018
 oct+0.017
 sw+0.016
 pl+0.016
Neg logit contributions
adoes-0.036
outing-0.033
 fears-0.032
ows-0.032
 anymore-0.031
Token water
Token ID1660
Feature activation-1.85e-02

Pos logit contributions
Ar+0.023
 se+0.022
 oct+0.021
 tort+0.021
 sur+0.020
Neg logit contributions
adoes-0.029
ows-0.027
outing-0.026
 anymore-0.025
 fears-0.025
Token from
Token ID422
Feature activation+3.48e-01

Pos logit contributions
adoes+0.001
outing+0.001
 fears+0.001
ows+0.001
 anymore+0.001
Neg logit contributions
 se-0.000
Ar-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Token the
Token ID262
Feature activation+9.50e-02

Pos logit contributions
 se+0.007
Ar+0.007
 oct+0.007
 sw+0.006
 pl+0.006
Neg logit contributions
adoes-0.016
outing-0.015
ows-0.014
 fears-0.014
 anymore-0.014
Token te
Token ID573
Feature activation+2.13e+00

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.003
 sw+0.002
 tort+0.002
Neg logit contributions
adoes-0.003
outing-0.003
ows-0.003
 anymore-0.003
 fears-0.003
Tokenap
Token ID499
Feature activation+3.34e-01

Pos logit contributions
 se+0.084
Ar+0.083
 oct+0.079
 pl+0.077
 sw+0.077
Neg logit contributions
adoes-0.056
outing-0.048
 anymore-0.045
cases-0.045
ows-0.045
Tokenot
Token ID313
Feature activation-9.13e-03

Pos logit contributions
 se+0.013
Ar+0.012
 oct+0.012
 sw+0.012
 pl+0.011
Neg logit contributions
adoes-0.010
outing-0.008
 fears-0.008
 anymore-0.008
cases-0.008
Token and
Token ID290
Feature activation-1.52e-02

Pos logit contributions
adoes+0.000
outing+0.000
ows+0.000
 fears+0.000
 anymore+0.000
Neg logit contributions
 se-0.000
Ar-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Token pretended
Token ID36766
Feature activation-6.57e-01

Pos logit contributions
adoes+0.001
ows+0.001
outing+0.001
cases+0.001
 fears+0.001
Neg logit contributions
Ar-0.000
 oct-0.000
 se-0.000
 tort-0.000
 sur-0.000
Token it
Token ID340
Feature activation-7.03e-01

Pos logit contributions
outing+0.030
adoes+0.030
 fears+0.029
urious+0.028
outed+0.028
Neg logit contributions
 se-0.013
 sw-0.011
 pl-0.011
 oct-0.010
 show-0.010
Token back
Token ID736
Feature activation+1.61e-01

Pos logit contributions
adoes+0.008
 fears+0.007
 anymore+0.007
ows+0.007
outing+0.007
Neg logit contributions
Ar-0.006
 se-0.005
 oct-0.005
 tort-0.005
 sur-0.005
Token the
Token ID262
Feature activation+2.95e-01

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.004
 tort+0.003
 sur+0.003
Neg logit contributions
adoes-0.007
ows-0.006
 fears-0.006
outing-0.006
 anymore-0.006
Token village
Token ID7404
Feature activation+2.03e-01

Pos logit contributions
Ar+0.009
 se+0.008
 oct+0.008
rob+0.007
 Bub+0.007
Neg logit contributions
adoes-0.010
 fears-0.009
outing-0.009
 anymore-0.009
 appearance-0.009
Token to
Token ID284
Feature activation+2.64e-01

Pos logit contributions
 se+0.005
Ar+0.005
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.009
outing-0.008
ows-0.008
 fears-0.008
 anymore-0.007
Token its
Token ID663
Feature activation+7.67e-01

Pos logit contributions
Ar+0.008
rob+0.007
 se+0.007
 oct+0.007
 sur+0.007
Neg logit contributions
adoes-0.009
 anymore-0.009
 fears-0.008
 appearance-0.008
 911-0.008
Token former
Token ID1966
Feature activation+2.11e+00

Pos logit contributions
Ar+0.021
 se+0.020
 oct+0.019
rob+0.019
 sur+0.018
Neg logit contributions
adoes-0.030
 fears-0.027
 anymore-0.026
 appearance-0.026
outing-0.026
Token state
Token ID1181
Feature activation+4.77e-01

Pos logit contributions
 se+0.062
Ar+0.061
 oct+0.058
 sur+0.055
 sw+0.055
Neg logit contributions
adoes-0.077
 fears-0.069
ows-0.068
outing-0.067
 anymore-0.067
Token.
Token ID13
Feature activation+6.11e-01

Pos logit contributions
Ar+0.013
 se+0.012
 oct+0.011
 sw+0.011
 tort+0.011
Neg logit contributions
adoes-0.019
ows-0.017
outing-0.017
 anymore-0.017
 fears-0.016
Token With
Token ID2080
Feature activation+1.81e+00

Pos logit contributions
 se+0.015
Ar+0.014
 oct+0.014
 sw+0.013
 pl+0.013
Neg logit contributions
adoes-0.025
 anymore-0.023
ows-0.023
 fears-0.022
outing-0.022
Token the
Token ID262
Feature activation-2.84e-01

Pos logit contributions
Ar+0.052
 se+0.048
rob+0.047
 sur+0.047
 tort+0.046
Neg logit contributions
adoes-0.061
 fears-0.057
outing-0.053
 anymore-0.052
 fists-0.050
Token help
Token ID1037
Feature activation+9.78e-01

Pos logit contributions
adoes+0.011
 fears+0.011
outing+0.010
 anymore+0.010
ows+0.010
Neg logit contributions
Ar-0.007
 se-0.006
rob-0.006
 oct-0.006
 Rib-0.006
Token big
Token ID1263
Feature activation+2.26e-02

Pos logit contributions
adoes+0.007
 anymore+0.006
ows+0.006
outing+0.006
 fears+0.006
Neg logit contributions
Ar-0.005
 se-0.005
 oct-0.005
 sur-0.004
 sw-0.004
Token picture
Token ID4286
Feature activation-3.30e-01

Pos logit contributions
 se+0.001
 oct+0.001
Ar+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
cases-0.001
 fears-0.001
Token of
Token ID286
Feature activation+6.56e-01

Pos logit contributions
adoes+0.014
outing+0.013
 fears+0.013
ows+0.013
 anymore+0.012
Neg logit contributions
 se-0.007
Ar-0.007
 oct-0.007
 sw-0.006
 pl-0.006
Token a
Token ID257
Feature activation+2.91e-01

Pos logit contributions
Ar+0.017
 se+0.016
 oct+0.015
 sur+0.015
 tort+0.015
Neg logit contributions
adoes-0.026
 anymore-0.023
outing-0.023
ows-0.022
 fears-0.022
Token sun
Token ID4252
Feature activation-8.43e-02

Pos logit contributions
Ar+0.009
 se+0.008
 oct+0.008
 sur+0.007
thank+0.007
Neg logit contributions
adoes-0.010
 anymore-0.009
outing-0.009
 fears-0.009
ows-0.009
Token with
Token ID351
Feature activation+2.23e+00

Pos logit contributions
adoes+0.004
outing+0.003
ows+0.003
 fears+0.003
 anymore+0.003
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.002
 sw-0.002
 pl-0.002
Token her
Token ID607
Feature activation+8.96e-02

Pos logit contributions
Ar+0.054
 se+0.052
 oct+0.049
 tort+0.047
 sur+0.046
Neg logit contributions
adoes-0.092
outing-0.082
ows-0.081
 fears-0.081
 anymore-0.080
Token orange
Token ID10912
Feature activation+3.84e-01

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.002
 tort+0.002
 sw+0.002
Neg logit contributions
adoes-0.003
outing-0.003
ows-0.003
 fears-0.003
 anymore-0.003
Token cr
Token ID1067
Feature activation+7.12e-01

Pos logit contributions
 se+0.011
Ar+0.011
 oct+0.010
 sw+0.009
 pl+0.009
Neg logit contributions
adoes-0.014
outing-0.013
ows-0.013
 fears-0.013
 anymore-0.012
Tokenay
Token ID323
Feature activation-3.40e-01

Pos logit contributions
 se+0.025
Ar+0.025
 oct+0.024
 sw+0.023
 pl+0.023
Neg logit contributions
adoes-0.021
outing-0.019
ows-0.018
 fears-0.018
 anymore-0.018
Tokenon
Token ID261
Feature activation+5.41e-01

Pos logit contributions
adoes+0.012
outing+0.011
ows+0.011
 fears+0.010
 anymore+0.010
Neg logit contributions
Ar-0.010
 se-0.010
 oct-0.010
 sw-0.009
 tort-0.009
Token soon
Token ID2582
Feature activation+2.63e-01

Pos logit contributions
Ar+0.032
 se+0.031
 oct+0.029
 tort+0.028
 sw+0.027
Neg logit contributions
adoes-0.044
 anymore-0.040
outing-0.040
ows-0.039
 fears-0.039
Token as
Token ID355
Feature activation+1.16e+00

Pos logit contributions
Ar+0.008
 se+0.008
 oct+0.008
 sw+0.007
 tort+0.007
Neg logit contributions
adoes-0.009
outing-0.008
 anymore-0.008
ows-0.007
 fears-0.007
Token she
Token ID673
Feature activation+3.38e-01

Pos logit contributions
Ar+0.038
 se+0.037
 oct+0.036
 tort+0.035
 sw+0.034
Neg logit contributions
adoes-0.036
 anymore-0.033
outing-0.030
 fears-0.030
cases-0.029
Token stepped
Token ID10764
Feature activation+8.69e-01

Pos logit contributions
Ar+0.011
 cuc+0.010
 oct+0.010
rob+0.010
 tort+0.010
Neg logit contributions
adoes-0.010
outing-0.009
outed-0.009
 fears-0.009
ows-0.009
Token outside
Token ID2354
Feature activation+1.09e+00

Pos logit contributions
Ar+0.027
 se+0.027
 oct+0.025
 sw+0.024
 tort+0.023
Neg logit contributions
adoes-0.030
outing-0.027
 anymore-0.027
ows-0.026
 fears-0.026
Token with
Token ID351
Feature activation+2.17e+00

Pos logit contributions
 se+0.026
Ar+0.024
 oct+0.023
 sw+0.022
 pl+0.022
Neg logit contributions
adoes-0.046
outing-0.043
ows-0.042
 fears-0.041
cases-0.040
Token the
Token ID262
Feature activation-3.83e-02

Pos logit contributions
 se+0.050
Ar+0.050
 oct+0.046
 sw+0.043
 pl+0.042
Neg logit contributions
adoes-0.093
outing-0.085
ows-0.084
 fears-0.083
 anymore-0.082
Token egg
Token ID5935
Feature activation+8.36e-01

Pos logit contributions
adoes+0.001
outing+0.001
 fears+0.001
ows+0.001
 anymore+0.001
Neg logit contributions
Ar-0.001
 se-0.001
 sur-0.001
 oct-0.001
thank-0.001
Token,
Token ID11
Feature activation+5.19e-01

Pos logit contributions
 se+0.019
 oct+0.016
 sw+0.016
 pl+0.016
Ar+0.016
Neg logit contributions
adoes-0.037
outing-0.034
 fears-0.033
ows-0.033
 anymore-0.032
Token she
Token ID673
Feature activation-1.19e-01

Pos logit contributions
 se+0.011
 oct+0.009
 pl+0.009
 sw+0.009
Ar+0.009
Neg logit contributions
adoes-0.024
outing-0.023
ows-0.023
 fears-0.022
inters-0.021
Token heard
Token ID2982
Feature activation+2.62e-01

Pos logit contributions
adoes+0.005
outing+0.004
ows+0.004
 fears+0.004
 anymore+0.004
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.003
 pl-0.003
Token away
Token ID1497
Feature activation-3.88e-01

Pos logit contributions
adoes+0.021
outing+0.020
ows+0.019
 fears+0.019
 anymore+0.018
Neg logit contributions
Ar-0.014
 se-0.014
 oct-0.013
 pl-0.012
 sw-0.012
Token.
Token ID13
Feature activation-1.11e-01

Pos logit contributions
adoes+0.016
 anymore+0.014
ows+0.014
outing+0.014
 fears+0.014
Neg logit contributions
Ar-0.010
 se-0.010
 oct-0.009
 sw-0.008
 tort-0.008
Token He
Token ID679
Feature activation-1.92e-01

Pos logit contributions
adoes+0.004
outing+0.004
ows+0.004
 anymore+0.004
 fears+0.004
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.003
 pl-0.003
Token did
Token ID750
Feature activation+7.33e-03

Pos logit contributions
adoes+0.007
ows+0.007
outing+0.007
 fears+0.007
outed+0.006
Neg logit contributions
Ar-0.005
 se-0.005
 oct-0.005
 tort-0.004
thank-0.004
Token not
Token ID407
Feature activation+1.59e-01

Pos logit contributions
 se+0.000
Ar+0.000
 pl+0.000
 sw+0.000
 show+0.000
Neg logit contributions
adoes-0.000
outing-0.000
ows-0.000
outed-0.000
iders-0.000
Token say
Token ID910
Feature activation+9.53e-01

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.006
outing-0.006
ows-0.006
cases-0.005
outed-0.005
Token sorry
Token ID7926
Feature activation+8.88e-03

Pos logit contributions
 se+0.030
 oct+0.028
 sw+0.027
 pl+0.026
 Bub+0.026
Neg logit contributions
adoes-0.034
outing-0.032
urious-0.031
outed-0.031
iders-0.030
Token.
Token ID13
Feature activation-1.08e-01

Pos logit contributions
 se+0.000
 oct+0.000
 sw+0.000
 show+0.000
 fish+0.000
Neg logit contributions
adoes-0.000
outing-0.000
outed-0.000
cases-0.000
 slams-0.000
Token
Token ID198
Feature activation+2.96e-01

Pos logit contributions
adoes+0.004
outing+0.004
ows+0.004
 anymore+0.004
 fears+0.004
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 sw-0.003
 pl-0.002
Token
Token ID198
Feature activation-1.47e-01

Pos logit contributions
 se+0.007
 oct+0.006
 tort+0.006
 cuc+0.006
Ar+0.006
Neg logit contributions
adoes-0.012
outing-0.011
ows-0.011
 anymore-0.011
 fears-0.011
TokenJ
Token ID41
Feature activation-2.04e-01

Pos logit contributions
adoes+0.005
outing+0.005
ows+0.005
 fears+0.005
iders+0.005
Neg logit contributions
 se-0.004
 oct-0.004
Ar-0.004
 sw-0.004
 tort-0.004
Token strong
Token ID1913
Feature activation+2.44e-02

Pos logit contributions
adoes+0.008
ows+0.008
outing+0.008
outed+0.007
 anymore+0.007
Neg logit contributions
Ar-0.008
 se-0.007
 oct-0.007
 tort-0.007
rob-0.007
Token.
Token ID13
Feature activation+2.03e-02

Pos logit contributions
Ar+0.001
 se+0.001
 oct+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
 anymore-0.001
 fears-0.001
Token
Token ID198
Feature activation+4.28e-01

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.000
 sw+0.000
 pl+0.000
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
 fears-0.001
 anymore-0.001
Token
Token ID198
Feature activation+2.26e-01

Pos logit contributions
 se+0.009
Ar+0.009
 tort+0.009
 oct+0.008
 cuc+0.008
Neg logit contributions
adoes-0.018
ows-0.017
outing-0.017
 anymore-0.017
 fears-0.016
TokenShe
Token ID3347
Feature activation-1.94e-01

Pos logit contributions
Ar+0.007
 se+0.007
 oct+0.006
 sw+0.006
 pl+0.006
Neg logit contributions
adoes-0.008
outing-0.007
ows-0.007
 fears-0.007
 anymore-0.007
Token poured
Token ID19036
Feature activation+8.63e-01

Pos logit contributions
ows+0.006
adoes+0.006
outed+0.006
 slams+0.006
outing+0.006
Neg logit contributions
Ar-0.006
 tort-0.005
 oct-0.005
rob-0.005
 Bub-0.005
Token some
Token ID617
Feature activation+5.89e-01

Pos logit contributions
 se+0.019
Ar+0.019
 oct+0.017
 sw+0.016
 pl+0.016
Neg logit contributions
adoes-0.038
outing-0.035
ows-0.034
 fears-0.034
 anymore-0.033
Token milk
Token ID7545
Feature activation+9.07e-02

Pos logit contributions
Ar+0.018
 se+0.017
 oct+0.017
 sur+0.017
 tort+0.016
Neg logit contributions
adoes-0.020
ows-0.018
outing-0.018
 anymore-0.018
 fears-0.017
Token into
Token ID656
Feature activation-6.69e-01

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 tort+0.002
 sur+0.002
Neg logit contributions
adoes-0.004
ows-0.003
outing-0.003
 anymore-0.003
 fears-0.003
Token the
Token ID262
Feature activation-2.11e-01

Pos logit contributions
adoes+0.029
outing+0.027
ows+0.026
 fears+0.026
 anymore+0.026
Neg logit contributions
Ar-0.015
 se-0.014
 oct-0.013
 sw-0.012
 tort-0.012
Token cup
Token ID6508
Feature activation+1.42e-02

Pos logit contributions
adoes+0.008
outing+0.007
ows+0.007
 fears+0.007
 anymore+0.007
Neg logit contributions
 se-0.006
Ar-0.006
 oct-0.006
 sw-0.005
 pl-0.005
Token
Token ID198
Feature activation-1.93e-01

Pos logit contributions
 se+0.000
Ar+0.000
 oct+0.000
 tort+0.000
 sw+0.000
Neg logit contributions
adoes-0.000
ows-0.000
outing-0.000
 anymore-0.000
 fears-0.000
Token"
Token ID1
Feature activation+6.46e-01

Pos logit contributions
adoes+0.008
outing+0.007
ows+0.007
 fears+0.007
 anymore+0.007
Neg logit contributions
 se-0.005
Ar-0.005
 oct-0.004
 sw-0.004
 pl-0.004
TokenAw
Token ID23155
Feature activation-4.35e-01

Pos logit contributions
 se+0.020
 oct+0.020
Ar+0.020
 sw+0.018
 pl+0.018
Neg logit contributions
adoes-0.022
outing-0.020
 anymore-0.018
ows-0.018
 fears-0.018
Token,
Token ID11
Feature activation+2.98e-01

Pos logit contributions
adoes+0.022
outing+0.020
 fears+0.020
inters+0.019
outed+0.019
Neg logit contributions
 se-0.007
 oct-0.006
 sw-0.006
 Bub-0.006
Ar-0.005
Token mom
Token ID1995
Feature activation-2.66e-01

Pos logit contributions
 se+0.009
 oct+0.008
Ar+0.008
 sw+0.008
 pl+0.007
Neg logit contributions
adoes-0.011
outing-0.010
ows-0.010
 fears-0.010
cases-0.010
Token,
Token ID11
Feature activation+5.28e-01

Pos logit contributions
adoes+0.013
 fears+0.012
outing+0.011
ows+0.011
inters+0.011
Neg logit contributions
 se-0.005
 oct-0.004
 pl-0.004
 sw-0.004
Ar-0.004
Token can
Token ID460
Feature activation-7.64e-02

Pos logit contributions
 se+0.015
 oct+0.014
Ar+0.013
 sw+0.013
 pl+0.012
Neg logit contributions
adoes-0.020
outing-0.019
ows-0.019
 fears-0.018
cases-0.018
Token we
Token ID356
Feature activation+4.61e-01

Pos logit contributions
adoes+0.003
ows+0.003
outing+0.003
inters+0.003
 fears+0.003
Neg logit contributions
 se-0.002
 show-0.001
 sw-0.001
 oct-0.001
 pl-0.001
Token play
Token ID711
Feature activation-1.27e-01

Pos logit contributions
Ar+0.013
 oct+0.013
 se+0.012
rob+0.012
 cuc+0.012
Neg logit contributions
adoes-0.016
 anymore-0.015
 fears-0.015
ows-0.014
outing-0.014
Token a
Token ID257
Feature activation+3.52e-01

Pos logit contributions
adoes+0.005
 anymore+0.005
ows+0.004
outing+0.004
 fears+0.004
Neg logit contributions
Ar-0.003
 se-0.003
 oct-0.003
 tort-0.003
 sw-0.003
Token little
Token ID1310
Feature activation-1.16e+00

Pos logit contributions
 se+0.010
Ar+0.010
 oct+0.010
 pl+0.009
 tort+0.009
Neg logit contributions
adoes-0.012
 anymore-0.011
outing-0.011
 fears-0.011
ows-0.011
TokenTim
Token ID14967
Feature activation+4.42e-01

Pos logit contributions
adoes+0.008
 anymore+0.008
ows+0.007
outing+0.007
uffy+0.007
Neg logit contributions
 se-0.005
 oct-0.005
Ar-0.005
 tort-0.005
 pl-0.005
Token and
Token ID290
Feature activation+1.33e-01

Pos logit contributions
Ar+0.016
 se+0.016
 oct+0.015
 tort+0.015
 sur+0.014
Neg logit contributions
adoes-0.013
ows-0.011
outing-0.011
 fears-0.011
 anymore-0.011
Token Jen
Token ID13364
Feature activation+1.35e-01

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.003
 tort+0.003
 pl+0.003
Neg logit contributions
adoes-0.005
outing-0.005
ows-0.005
 fears-0.005
 anymore-0.005
Token were
Token ID547
Feature activation+2.85e-01

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.005
 tort+0.004
 sw+0.004
Neg logit contributions
adoes-0.004
ows-0.003
outing-0.003
 anymore-0.003
 fears-0.003
Token playing
Token ID2712
Feature activation-1.16e-01

Pos logit contributions
 se+0.006
Ar+0.006
 oct+0.005
 sw+0.005
 pl+0.005
Neg logit contributions
adoes-0.013
outing-0.012
ows-0.012
 fears-0.011
 anymore-0.011
Token in
Token ID287
Feature activation+4.85e-01

Pos logit contributions
adoes+0.004
 anymore+0.004
ows+0.004
outing+0.004
 fears+0.004
Neg logit contributions
Ar-0.003
 se-0.003
 oct-0.003
 tort-0.003
 sur-0.003
Token the
Token ID262
Feature activation+3.12e-01

Pos logit contributions
Ar+0.012
 se+0.011
 oct+0.011
 tort+0.011
 sur+0.010
Neg logit contributions
adoes-0.020
 anymore-0.017
outing-0.017
 fears-0.017
 slams-0.017
Token park
Token ID3952
Feature activation-3.69e-01

Pos logit contributions
Ar+0.009
 se+0.009
 oct+0.008
 tort+0.008
 sur+0.008
Neg logit contributions
adoes-0.012
outing-0.010
 fears-0.010
ows-0.010
 anymore-0.010
Token.
Token ID13
Feature activation-5.53e-02

Pos logit contributions
adoes+0.016
outing+0.015
ows+0.014
 fears+0.014
 anymore+0.014
Neg logit contributions
 se-0.008
Ar-0.008
 oct-0.008
 sw-0.007
 pl-0.007
Token They
Token ID1119
Feature activation+6.16e-02

Pos logit contributions
adoes+0.002
outing+0.002
 fears+0.002
ows+0.002
 anymore+0.002
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.002
 sw-0.002
 pl-0.002
Token saw
Token ID2497
Feature activation+3.18e-01

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 tort+0.002
 sur+0.001
Neg logit contributions
adoes-0.002
ows-0.002
outing-0.002
 anymore-0.002
 fears-0.002
Token me
Token ID502
Feature activation+2.44e-01

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
cases+0.002
 fears+0.002
Neg logit contributions
 se-0.001
Ar-0.001
 oct-0.001
 pl-0.001
 sw-0.001
Token some
Token ID617
Feature activation-1.51e-01

Pos logit contributions
 se+0.005
Ar+0.005
 oct+0.005
 sw+0.005
 pl+0.005
Neg logit contributions
adoes-0.011
outing-0.010
ows-0.010
 fears-0.009
 anymore-0.009
Token food
Token ID2057
Feature activation-2.04e-01

Pos logit contributions
adoes+0.005
 anymore+0.005
 fears+0.005
ows+0.005
outing+0.004
Neg logit contributions
Ar-0.005
 se-0.004
 sur-0.004
 tort-0.004
 oct-0.004
Token?",
Token ID35379
Feature activation+2.40e-01

Pos logit contributions
adoes+0.008
ows+0.007
 anymore+0.007
outing+0.007
 fears+0.007
Neg logit contributions
Ar-0.005
 se-0.005
 oct-0.005
 sw-0.005
 sur-0.005
Token the
Token ID262
Feature activation-5.11e-01

Pos logit contributions
Ar+0.008
 se+0.007
 oct+0.007
rob+0.007
 sw+0.006
Neg logit contributions
adoes-0.008
 anymore-0.007
outing-0.007
 fears-0.007
ows-0.007
Token voice
Token ID3809
Feature activation+6.60e-01

Pos logit contributions
adoes+0.020
outing+0.018
ows+0.018
 anymore+0.018
 fears+0.018
Neg logit contributions
 se-0.014
Ar-0.013
 oct-0.013
 sw-0.012
 pl-0.012
Token said
Token ID531
Feature activation-5.14e-01

Pos logit contributions
Ar+0.015
 se+0.012
 oct+0.012
 tort+0.011
 cuc+0.011
Neg logit contributions
adoes-0.029
outing-0.027
ows-0.027
 anymore-0.027
outed-0.026
Token.
Token ID13
Feature activation+4.84e-01

Pos logit contributions
adoes+0.020
ows+0.018
outing+0.018
 anymore+0.018
 fears+0.017
Neg logit contributions
Ar-0.014
 se-0.014
 oct-0.013
 tort-0.012
 sw-0.012
Token
Token ID198
Feature activation-2.01e-01

Pos logit contributions
 se+0.010
 oct+0.010
Ar+0.009
 sw+0.008
 sur+0.008
Neg logit contributions
adoes-0.021
ows-0.020
outing-0.019
 anymore-0.019
cases-0.019
Token
Token ID198
Feature activation-5.28e-01

Pos logit contributions
adoes+0.008
ows+0.008
 anymore+0.008
outing+0.007
cases+0.007
Neg logit contributions
 oct-0.005
 se-0.005
Ar-0.004
 sw-0.004
 cuc-0.004
TokenLu
Token ID25596
Feature activation-2.09e-01

Pos logit contributions
adoes+0.020
outing+0.018
ows+0.017
 anymore+0.017
 fears+0.017
Neg logit contributions
 se-0.015
Ar-0.015
 oct-0.015
 sw-0.013
 pl-0.013
Token you
Token ID345
Feature activation-2.08e-02

Pos logit contributions
adoes+0.012
outing+0.011
ows+0.011
cases+0.011
outed+0.010
Neg logit contributions
 se-0.008
 oct-0.008
 sw-0.008
 pl-0.007
Ar-0.007
Token have
Token ID423
Feature activation+3.86e-01

Pos logit contributions
adoes+0.001
outing+0.001
ows+0.001
cases+0.001
iders+0.001
Neg logit contributions
 se-0.001
Ar-0.001
 oct-0.001
 sw-0.001
 pl-0.000
Token to
Token ID284
Feature activation+6.20e-01

Pos logit contributions
 se+0.010
Ar+0.010
 oct+0.009
 sw+0.009
 pl+0.008
Neg logit contributions
adoes-0.016
outing-0.014
ows-0.014
 fears-0.014
 anymore-0.014
Token try
Token ID1949
Feature activation-4.84e-01

Pos logit contributions
Ar+0.017
 se+0.017
 oct+0.016
 sw+0.015
 tort+0.014
Neg logit contributions
adoes-0.023
ows-0.021
outing-0.021
 fears-0.021
 anymore-0.021
Token new
Token ID649
Feature activation-6.82e-02

Pos logit contributions
adoes+0.020
outing+0.019
ows+0.018
 fears+0.018
outed+0.018
Neg logit contributions
 se-0.012
Ar-0.011
 oct-0.011
 pl-0.010
 sw-0.010
Token things
Token ID1243
Feature activation-1.58e-01

Pos logit contributions
adoes+0.002
 anymore+0.002
 fears+0.002
ows+0.002
outing+0.002
Neg logit contributions
Ar-0.003
 se-0.003
 oct-0.002
 sw-0.002
 tort-0.002
Token.
Token ID13
Feature activation-6.76e-01

Pos logit contributions
adoes+0.006
outing+0.006
 fears+0.006
ows+0.006
 anymore+0.006
Neg logit contributions
 se-0.004
Ar-0.004
 oct-0.004
 sw-0.003
 pl-0.003
Token Sometimes
Token ID8975
Feature activation+3.19e-01

Pos logit contributions
adoes+0.021
outing+0.018
ows+0.018
 fears+0.018
 anymore+0.017
Neg logit contributions
 se-0.024
Ar-0.023
 oct-0.022
 sw-0.021
 pl-0.021
Token you
Token ID345
Feature activation+2.65e-03

Pos logit contributions
 se+0.008
 oct+0.007
Ar+0.007
 sw+0.007
 pl+0.007
Neg logit contributions
adoes-0.013
outing-0.013
ows-0.012
outed-0.012
 fears-0.012
Token might
Token ID1244
Feature activation+2.85e-01

Pos logit contributions
 se+0.000
Ar+0.000
 oct+0.000
 sw+0.000
 pl+0.000
Neg logit contributions
adoes-0.000
outing-0.000
ows-0.000
 fears-0.000
 anymore-0.000
Token find
Token ID1064
Feature activation+5.15e-02

Pos logit contributions
Ar+0.008
 se+0.008
 oct+0.007
 sur+0.007
 sw+0.007
Neg logit contributions
adoes-0.011
 anymore-0.010
outing-0.010
ows-0.010
 fears-0.010
Token't
Token ID470
Feature activation+2.06e-01

Pos logit contributions
 se+0.018
Ar+0.018
 oct+0.017
 sw+0.016
 pl+0.016
Neg logit contributions
adoes-0.020
outing-0.018
ows-0.018
 anymore-0.018
 fears-0.018
Token come
Token ID1282
Feature activation+3.74e-01

Pos logit contributions
 se+0.005
Ar+0.005
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.009
outing-0.008
ows-0.008
 fears-0.008
outed-0.008
Token in
Token ID287
Feature activation-4.66e-01

Pos logit contributions
 se+0.011
Ar+0.011
 oct+0.010
 sw+0.009
 pl+0.009
Neg logit contributions
adoes-0.014
outing-0.012
ows-0.012
 anymore-0.012
 fears-0.012
Token,"
Token ID553
Feature activation+5.06e-01

Pos logit contributions
adoes+0.019
outing+0.017
 anymore+0.017
ows+0.017
 fears+0.017
Neg logit contributions
 se-0.012
Ar-0.012
 oct-0.011
 sw-0.010
 pl-0.010
Token Lily
Token ID20037
Feature activation+3.88e-01

Pos logit contributions
Ar+0.018
 se+0.017
thank+0.016
 oct+0.016
 tort+0.016
Neg logit contributions
adoes-0.015
outing-0.013
 anymore-0.012
 fears-0.012
cases-0.012
Token said
Token ID531
Feature activation+1.74e-01

Pos logit contributions
 se+0.006
Ar+0.005
 oct+0.005
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.020
outing-0.018
 fears-0.018
ows-0.018
 anymore-0.018
Token.
Token ID13
Feature activation+1.76e-01

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.003
 sw+0.003
 pl+0.003
Neg logit contributions
adoes-0.008
outing-0.007
 fears-0.007
ows-0.007
 anymore-0.007
Token "
Token ID366
Feature activation+4.30e-01

Pos logit contributions
Ar+0.004
 se+0.004
 tort+0.003
 oct+0.003
 pl+0.003
Neg logit contributions
adoes-0.008
 fears-0.007
outing-0.007
 anymore-0.007
ows-0.007
TokenYou
Token ID1639
Feature activation-4.20e-01

Pos logit contributions
Ar+0.009
 se+0.009
 oct+0.008
 sw+0.008
 pl+0.008
Neg logit contributions
adoes-0.019
ows-0.018
outing-0.017
 fears-0.017
 anymore-0.017
Token are
Token ID389
Feature activation-7.47e-03

Pos logit contributions
adoes+0.017
outing+0.016
ows+0.015
iders+0.015
 fears+0.014
Neg logit contributions
 se-0.011
Ar-0.011
 sw-0.010
 oct-0.010
 pl-0.010
Token not
Token ID407
Feature activation+2.22e-02

Pos logit contributions
adoes+0.000
outing+0.000
ows+0.000
 fears+0.000
cases+0.000
Neg logit contributions
 se-0.000
Ar-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Token searching
Token ID10342
Feature activation-3.39e-01

Pos logit contributions
adoes+0.008
outing+0.007
ows+0.007
 fears+0.007
 anymore+0.007
Neg logit contributions
Ar-0.005
 se-0.005
 oct-0.005
 tort-0.005
 sw-0.005
Token for
Token ID329
Feature activation-1.59e-01

Pos logit contributions
adoes+0.014
outing+0.013
ows+0.012
 fears+0.012
 anymore+0.012
Neg logit contributions
 se-0.008
Ar-0.008
 oct-0.008
 sw-0.007
 pl-0.007
Token treasures
Token ID29561
Feature activation+1.97e-01

Pos logit contributions
adoes+0.006
 anymore+0.005
ows+0.005
 fears+0.005
outing+0.005
Neg logit contributions
Ar-0.004
 se-0.004
 oct-0.004
thank-0.004
 sur-0.004
Token and
Token ID290
Feature activation+8.19e-02

Pos logit contributions
Ar+0.006
 se+0.005
 oct+0.005
 tort+0.005
 sur+0.005
Neg logit contributions
adoes-0.007
ows-0.007
 anymore-0.007
outing-0.007
cases-0.006
Token laughing
Token ID14376
Feature activation+4.60e-01

Pos logit contributions
Ar+0.002
 se+0.002
 oct+0.002
 sur+0.002
 tort+0.002
Neg logit contributions
adoes-0.003
ows-0.003
 fears-0.003
 anymore-0.003
cases-0.003
Token at
Token ID379
Feature activation-5.70e-01

Pos logit contributions
 se+0.011
Ar+0.011
 oct+0.010
 sw+0.010
 pl+0.009
Neg logit contributions
adoes-0.019
outing-0.018
ows-0.017
 fears-0.017
 anymore-0.017
Token all
Token ID477
Feature activation-1.09e-01

Pos logit contributions
adoes+0.022
outing+0.020
ows+0.019
 fears+0.019
cases+0.019
Neg logit contributions
 se-0.016
Ar-0.015
 oct-0.015
 sw-0.014
 pl-0.014
Token the
Token ID262
Feature activation+1.62e-01

Pos logit contributions
adoes+0.005
outing+0.004
ows+0.004
 anymore+0.004
 fears+0.004
Neg logit contributions
Ar-0.002
 se-0.002
 oct-0.002
 sw-0.002
 tort-0.002
Token strange
Token ID6283
Feature activation+5.55e-02

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.006
outing-0.006
ows-0.006
 fears-0.006
 anymore-0.005
Token things
Token ID1243
Feature activation+1.32e-01

Pos logit contributions
 se+0.002
 oct+0.002
Ar+0.002
 sw+0.002
 pl+0.001
Neg logit contributions
adoes-0.002
outing-0.002
ows-0.002
 fears-0.002
outed-0.002
Token they
Token ID484
Feature activation+4.58e-01

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.003
 sw+0.003
 tort+0.003
Neg logit contributions
adoes-0.005
outing-0.005
ows-0.005
 anymore-0.005
 fears-0.005
Token his
Token ID465
Feature activation-1.85e-01

Pos logit contributions
Ar+0.019
 se+0.015
 tort+0.015
thank+0.015
 sur+0.015
Neg logit contributions
adoes-0.020
outing-0.017
 anymore-0.017
 slams-0.017
 911-0.016
Token puzzle
Token ID15027
Feature activation+4.67e-01

Pos logit contributions
adoes+0.006
outing+0.005
ows+0.005
 fears+0.005
 anymore+0.005
Neg logit contributions
 se-0.007
Ar-0.007
 oct-0.006
 sw-0.006
 pl-0.006
Token and
Token ID290
Feature activation-2.52e-01

Pos logit contributions
Ar+0.013
 se+0.012
 oct+0.012
 sw+0.011
 sur+0.011
Neg logit contributions
adoes-0.017
ows-0.016
 anymore-0.016
 fears-0.015
outing-0.015
Token thanked
Token ID26280
Feature activation-7.58e-03

Pos logit contributions
adoes+0.009
 slams+0.007
cases+0.007
ouncing+0.007
ows+0.007
Neg logit contributions
Ar-0.008
 oblig-0.006
 oct-0.006
 sur-0.006
 se-0.006
Token Wh
Token ID854
Feature activation+9.02e-01

Pos logit contributions
adoes+0.000
 anymore+0.000
 911+0.000
 fears+0.000
uffy+0.000
Neg logit contributions
 se-0.000
Ar-0.000
 sur-0.000
 oct-0.000
 cuc-0.000
Tokenisk
Token ID1984
Feature activation+1.78e-01

Pos logit contributions
Ar+0.036
 se+0.034
 oct+0.034
 sw+0.033
 Bub+0.033
Neg logit contributions
adoes-0.023
 fears-0.019
outing-0.018
cases-0.018
 slams-0.017
Tokeners
Token ID364
Feature activation+8.74e-01

Pos logit contributions
 se+0.005
Ar+0.005
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.007
outing-0.006
ows-0.006
 fears-0.006
 anymore-0.006
Token for
Token ID329
Feature activation-5.11e-01

Pos logit contributions
Ar+0.025
 se+0.025
 oct+0.022
 tort+0.021
 sur+0.021
Neg logit contributions
adoes-0.032
ows-0.030
 anymore-0.030
outing-0.029
 fears-0.029
Token helping
Token ID5742
Feature activation-9.84e-01

Pos logit contributions
adoes+0.016
 911+0.014
 anymore+0.014
 slams+0.013
outing+0.013
Neg logit contributions
Ar-0.016
 sur-0.014
 oct-0.014
 se-0.014
 Oy-0.014
Token him
Token ID683
Feature activation+1.47e-01

Pos logit contributions
adoes+0.037
 anymore+0.033
outing+0.032
ows+0.032
 fears+0.032
Neg logit contributions
 se-0.028
Ar-0.028
 oct-0.026
 sur-0.024
 sw-0.024
Token.
Token ID13
Feature activation+2.63e-01

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.003
 sw+0.003
 tort+0.003
Neg logit contributions
adoes-0.006
outing-0.005
ows-0.005
 anymore-0.005
 fears-0.005
Token the
Token ID262
Feature activation-1.23e-02

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.001
 sw+0.001
 pl+0.001
Neg logit contributions
adoes-0.003
outing-0.003
ows-0.003
 fears-0.003
 anymore-0.003
Token ground
Token ID2323
Feature activation-1.18e-01

Pos logit contributions
adoes+0.000
outing+0.000
ows+0.000
 anymore+0.000
 fears+0.000
Neg logit contributions
Ar-0.000
 se-0.000
 oct-0.000
 sw-0.000
 sur-0.000
Token next
Token ID1306
Feature activation-1.28e-01

Pos logit contributions
adoes+0.005
outing+0.005
 fears+0.005
ows+0.005
cases+0.005
Neg logit contributions
 se-0.002
 oct-0.002
Ar-0.002
 pl-0.002
 sw-0.002
Token to
Token ID284
Feature activation+2.66e-01

Pos logit contributions
adoes+0.005
outing+0.005
ows+0.005
cases+0.005
 fears+0.005
Neg logit contributions
 se-0.003
Ar-0.003
 oct-0.003
 pl-0.003
 sw-0.003
Token her
Token ID607
Feature activation+2.91e-01

Pos logit contributions
Ar+0.008
 se+0.008
 oct+0.007
 tort+0.007
 sw+0.007
Neg logit contributions
adoes-0.010
outing-0.008
 anymore-0.008
ows-0.008
 fears-0.008
Token!
Token ID0
Feature activation+5.41e-02

Pos logit contributions
Ar+0.008
 se+0.008
 oct+0.007
 sw+0.007
 pl+0.007
Neg logit contributions
adoes-0.011
outing-0.010
ows-0.010
 fears-0.010
 anymore-0.010
Token She
Token ID1375
Feature activation-3.59e-01

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.002
 sw+0.002
 pl+0.002
Neg logit contributions
adoes-0.002
outing-0.001
ows-0.001
 fears-0.001
 anymore-0.001
Token felt
Token ID2936
Feature activation+4.25e-01

Pos logit contributions
adoes+0.011
 fears+0.011
outed+0.011
ows+0.010
outing+0.010
Neg logit contributions
rob-0.011
Ar-0.010
thank-0.010
 oct-0.010
 tort-0.010
Token so
Token ID523
Feature activation-3.70e-01

Pos logit contributions
Ar+0.013
rob+0.013
 se+0.012
 oct+0.012
 tort+0.011
Neg logit contributions
adoes-0.014
 anymore-0.014
 fears-0.013
ows-0.013
 slams-0.012
Token happy
Token ID3772
Feature activation-2.81e-01

Pos logit contributions
adoes+0.016
ows+0.014
 fears+0.014
outing+0.014
 anymore+0.014
Neg logit contributions
Ar-0.009
 se-0.008
 oct-0.008
 tort-0.007
 sw-0.007
Token and
Token ID290
Feature activation+2.02e-01

Pos logit contributions
adoes+0.012
outing+0.011
ows+0.011
 fears+0.011
 anymore+0.010
Neg logit contributions
 se-0.007
Ar-0.006
 oct-0.006
 sw-0.006
 pl-0.006
Token will
Token ID481
Feature activation+2.63e-01

Pos logit contributions
 oct+0.005
Ar+0.005
 se+0.005
rob+0.005
 cuc+0.004
Neg logit contributions
adoes-0.003
 fears-0.003
ows-0.003
 anymore-0.003
outing-0.003
Token have
Token ID423
Feature activation-1.56e-01

Pos logit contributions
Ar+0.008
rob+0.007
 oct+0.007
 se+0.007
 sur+0.006
Neg logit contributions
adoes-0.009
 anymore-0.009
 fears-0.008
ows-0.008
 appearance-0.008
Token a
Token ID257
Feature activation+2.12e-01

Pos logit contributions
adoes+0.006
 anymore+0.005
 fears+0.005
outing+0.005
ows+0.005
Neg logit contributions
Ar-0.004
 se-0.004
 oct-0.004
 tort-0.004
 sur-0.004
Token new
Token ID649
Feature activation-1.60e-01

Pos logit contributions
Ar+0.005
 se+0.005
 oct+0.005
 sw+0.004
 sur+0.004
Neg logit contributions
adoes-0.008
outing-0.008
 fears-0.008
ows-0.007
 anymore-0.007
Token tie
Token ID9839
Feature activation+5.92e-01

Pos logit contributions
adoes+0.006
 anymore+0.005
 fears+0.005
ows+0.005
 appearance+0.005
Neg logit contributions
Ar-0.005
 oct-0.005
 se-0.004
 polar-0.004
us-0.004
Token for
Token ID329
Feature activation-1.46e+00

Pos logit contributions
Ar+0.017
 se+0.016
 oct+0.015
 sw+0.014
 pl+0.014
Neg logit contributions
adoes-0.022
 anymore-0.021
ows-0.020
outing-0.019
cases-0.019
Token you
Token ID345
Feature activation+3.29e-01

Pos logit contributions
adoes+0.045
 anymore+0.039
 fears+0.038
outing+0.037
 appearance+0.036
Neg logit contributions
Ar-0.050
 se-0.048
 oct-0.046
 sur-0.045
 sw-0.044
Token."
Token ID526
Feature activation+2.24e-02

Pos logit contributions
 se+0.008
Ar+0.008
 oct+0.007
 sw+0.007
 pl+0.006
Neg logit contributions
adoes-0.014
outing-0.013
ows-0.013
 fears-0.012
 anymore-0.012
Token\n
Token ID198
Feature activation+1.79e-01

Pos logit contributions
 se+0.000
Ar+0.000
 oct+0.000
 sw+0.000
 pl+0.000
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
 fears-0.001
 anymore-0.001
Token\n
Token ID198
Feature activation-2.89e-01

Pos logit contributions
 se+0.004
 oct+0.004
 tort+0.004
 cuc+0.004
 sw+0.004
Neg logit contributions
adoes-0.007
ows-0.007
 anymore-0.007
outing-0.007
cases-0.006
TokenThat
Token ID2504
Feature activation-1.32e-01

Pos logit contributions
adoes+0.010
outing+0.008
ows+0.008
 fears+0.008
 anymore+0.008
Neg logit contributions
 se-0.009
 oct-0.009
Ar-0.009
 sw-0.009
 pl-0.008
Token he
Token ID339
Feature activation-1.57e-02

Pos logit contributions
 se+0.002
Ar+0.002
 oct+0.002
 sw+0.002
 pl+0.002
Neg logit contributions
adoes-0.004
outing-0.004
ows-0.004
 fears-0.004
 anymore-0.004
Token couldn
Token ID3521
Feature activation-3.84e-01

Pos logit contributions
adoes+0.001
outing+0.001
ows+0.001
 fears+0.001
 anymore+0.001
Neg logit contributions
 se-0.000
Ar-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Token't
Token ID470
Feature activation-2.90e-01

Pos logit contributions
adoes+0.013
outing+0.012
ows+0.012
 anymore+0.011
 fears+0.011
Neg logit contributions
 se-0.012
Ar-0.012
 oct-0.011
 sw-0.011
 pl-0.010
Token find
Token ID1064
Feature activation-6.94e-02

Pos logit contributions
adoes+0.010
 anymore+0.010
ows+0.009
 fears+0.009
outing+0.009
Neg logit contributions
Ar-0.008
 se-0.008
 oct-0.008
 cuc-0.007
 sw-0.007
Token a
Token ID257
Feature activation-5.87e-02

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
iders+0.002
outed+0.002
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.002
 sw-0.002
 pl-0.002
Token way
Token ID835
Feature activation-1.51e+00

Pos logit contributions
adoes+0.002
outing+0.002
ows+0.002
 fears+0.002
 anymore+0.002
Neg logit contributions
 se-0.002
Ar-0.002
 oct-0.001
 sw-0.001
 pl-0.001
Token out
Token ID503
Feature activation+1.85e-01

Pos logit contributions
adoes+0.063
outing+0.058
ows+0.056
 fears+0.056
iders+0.054
Neg logit contributions
 se-0.037
Ar-0.035
 oct-0.034
 sw-0.033
 pl-0.032
Token.
Token ID13
Feature activation+1.35e-02

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.004
 sw+0.004
 pl+0.004
Neg logit contributions
adoes-0.008
outing-0.007
ows-0.007
 fears-0.007
cases-0.007
Token\n
Token ID198
Feature activation+3.43e-02

Pos logit contributions
Ar+0.000
 se+0.000
 oct+0.000
 pl+0.000
 sw+0.000
Neg logit contributions
adoes-0.001
outing-0.001
ows-0.001
 fears-0.001
 anymore-0.001
Token\n
Token ID198
Feature activation-4.45e-01

Pos logit contributions
 se+0.001
Ar+0.001
 oct+0.001
 tort+0.001
 sw+0.000
Neg logit contributions
adoes-0.002
outing-0.002
ows-0.002
 anymore-0.001
 fears-0.001
TokenFinally
Token ID11158
Feature activation+5.77e-01

Pos logit contributions
adoes+0.020
ows+0.018
outing+0.018
 fears+0.018
 anymore+0.017
Neg logit contributions
Ar-0.010
 se-0.009
 pl-0.008
 oct-0.008
 sw-0.008
Token the
Token ID262
Feature activation+3.69e-01

Pos logit contributions
 se+0.004
Ar+0.004
 oct+0.004
 sw+0.003
 pl+0.003
Neg logit contributions
adoes-0.007
outing-0.006
ows-0.006
 fears-0.006
 anymore-0.006
Token chain
Token ID6333
Feature activation-2.24e-01

Pos logit contributions
Ar+0.012
 se+0.012
 oct+0.012
 sw+0.011
 tort+0.011
Neg logit contributions
adoes-0.012
outing-0.010
 fears-0.010
ows-0.010
 anymore-0.010
Token,
Token ID11
Feature activation+9.99e-02

Pos logit contributions
adoes+0.010
outing+0.009
 fears+0.009
ows+0.008
outed+0.008
Neg logit contributions
 se-0.005
Ar-0.005
 pl-0.005
 oct-0.005
 sw-0.004
Token it
Token ID340
Feature activation+2.05e-01

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.002
 sw+0.002
 tort+0.002
Neg logit contributions
adoes-0.004
outing-0.004
ows-0.004
 anymore-0.004
 fears-0.004
Token was
Token ID373
Feature activation-9.83e-02

Pos logit contributions
Ar+0.004
 se+0.004
 oct+0.004
 sw+0.004
 tort+0.004
Neg logit contributions
adoes-0.009
outing-0.008
ows-0.008
 fears-0.008
 anymore-0.008
Token even
Token ID772
Feature activation-1.36e+00

Pos logit contributions
adoes+0.004
 anymore+0.003
ows+0.003
 fears+0.003
outing+0.003
Neg logit contributions
Ar-0.003
 se-0.003
 oct-0.003
 sur-0.003
rob-0.003
Token prett
Token ID46442
Feature activation+3.63e-01

Pos logit contributions
adoes+0.051
outing+0.046
ows+0.045
 anymore+0.045
 fears+0.045
Neg logit contributions
Ar-0.038
 oct-0.036
 se-0.035
 sur-0.032
 tort-0.032
Tokenier
Token ID959
Feature activation+8.01e-02

Pos logit contributions
Ar+0.012
 se+0.011
 oct+0.011
 sw+0.010
 tort+0.010
Neg logit contributions
adoes-0.012
outing-0.012
ows-0.011
 anymore-0.011
 fears-0.011
Token than
Token ID621
Feature activation-7.74e-01

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.003
 sw+0.003
 pl+0.003
Neg logit contributions
adoes-0.002
 anymore-0.002
ows-0.002
outing-0.002
 fears-0.002
Token she
Token ID673
Feature activation+4.71e-01

Pos logit contributions
adoes+0.031
outing+0.028
ows+0.028
 fears+0.027
cases+0.027
Neg logit contributions
 se-0.020
Ar-0.019
 oct-0.018
 pl-0.017
 sw-0.017
Token had
Token ID550
Feature activation+3.03e-01

Pos logit contributions
 se+0.013
Ar+0.013
 oct+0.013
 cuc+0.012
 Bub+0.012
Neg logit contributions
adoes-0.016
 anymore-0.015
 fears-0.015
ows-0.015
outing-0.014
Token home
Token ID1363
Feature activation+2.56e-01

Pos logit contributions
adoes+0.010
 fears+0.009
 anymore+0.009
 appearance+0.008
outing+0.008
Neg logit contributions
Ar-0.009
 se-0.008
rob-0.008
 oct-0.008
 sur-0.008
Token.
Token ID13
Feature activation+1.49e-03

Pos logit contributions
Ar+0.007
 se+0.007
 oct+0.006
 sw+0.006
 tort+0.006
Neg logit contributions
adoes-0.010
ows-0.009
outing-0.009
 anymore-0.009
 fears-0.009
Token The
Token ID383
Feature activation-1.93e-01

Pos logit contributions
 se+0.000
Ar+0.000
 oct+0.000
 sw+0.000
 pl+0.000
Neg logit contributions
adoes-0.000
ows-0.000
outing-0.000
 fears-0.000
 anymore-0.000
Token squirrel
Token ID33039
Feature activation-3.68e-01

Pos logit contributions
adoes+0.007
 fears+0.006
 anymore+0.006
outing+0.006
 appearance+0.006
Neg logit contributions
Ar-0.005
 se-0.005
 oct-0.005
 sur-0.005
rob-0.005
Token learned
Token ID4499
Feature activation-3.17e-01

Pos logit contributions
adoes+0.015
outing+0.014
ows+0.014
 fears+0.014
 anymore+0.014
Neg logit contributions
Ar-0.009
 se-0.008
 oct-0.008
 tort-0.007
 sur-0.007
Token that
Token ID326
Feature activation-1.34e+00

Pos logit contributions
adoes+0.012
outing+0.010
 anymore+0.010
ows+0.010
 fears+0.010
Neg logit contributions
Ar-0.009
 se-0.009
 oct-0.008
 tort-0.008
thank-0.008
Token life
Token ID1204
Feature activation-2.27e-01

Pos logit contributions
adoes+0.050
 anymore+0.044
 fears+0.043
outing+0.042
 appearance+0.041
Neg logit contributions
Ar-0.039
 se-0.036
 oct-0.034
thank-0.033
 pl-0.033
Token is
Token ID318
Feature activation+3.22e-01

Pos logit contributions
adoes+0.008
outing+0.008
ows+0.007
cases+0.007
 fears+0.007
Neg logit contributions
 se-0.007
Ar-0.006
 oct-0.006
 sw-0.006
 pl-0.006
Token not
Token ID407
Feature activation-1.90e-01

Pos logit contributions
Ar+0.011
 se+0.011
 oct+0.011
 sur+0.010
rob+0.010
Neg logit contributions
adoes-0.010
 anymore-0.009
outing-0.009
 fears-0.009
ows-0.009
Token always
Token ID1464
Feature activation+8.04e-01

Pos logit contributions
adoes+0.008
outing+0.007
ows+0.007
cases+0.007
 fears+0.007
Neg logit contributions
 se-0.005
Ar-0.005
 oct-0.004
 sw-0.004
 pl-0.004
Token fair
Token ID3148
Feature activation-4.95e-02

Pos logit contributions
 se+0.020
Ar+0.020
 oct+0.019
 sw+0.018
 pl+0.017
Neg logit contributions
adoes-0.032
outing-0.030
ows-0.029
cases-0.028
 fears-0.028
Token want
Token ID765
Feature activation-2.23e-01

Pos logit contributions
adoes+0.006
outing+0.006
ows+0.006
 fears+0.005
 anymore+0.005
Neg logit contributions
Ar-0.004
 se-0.004
 oct-0.004
 tort-0.003
 sw-0.003
Token to
Token ID284
Feature activation-1.21e-01

Pos logit contributions
adoes+0.009
outing+0.008
ows+0.008
 fears+0.008
 anymore+0.008
Neg logit contributions
 se-0.006
Ar-0.006
 oct-0.005
 sw-0.005
 pl-0.005
Token make
Token ID787
Feature activation-1.17e-02

Pos logit contributions
adoes+0.004
 anymore+0.004
ows+0.004
 fears+0.004
 slams+0.003
Neg logit contributions
Ar-0.003
rob-0.003
 oct-0.003
 se-0.003
 tort-0.003
Token a
Token ID257
Feature activation-7.99e-02

Pos logit contributions
adoes+0.001
outing+0.000
ows+0.000
 fears+0.000
 anymore+0.000
Neg logit contributions
 se-0.000
Ar-0.000
 oct-0.000
 sw-0.000
 pl-0.000
Token house
Token ID2156
Feature activation-1.55e+00

Pos logit contributions
adoes+0.003
outing+0.002
ows+0.002
 fears+0.002
 anymore+0.002
Neg logit contributions
Ar-0.003
 se-0.003
 oct-0.002
thank-0.002
 tort-0.002
Token for
Token ID329
Feature activation-1.46e+00

Pos logit contributions
adoes+0.067
outing+0.062
 fears+0.061
ows+0.060
 anymore+0.058
Neg logit contributions
 se-0.036
Ar-0.034
 oct-0.032
 sw-0.030
 pl-0.030
Token their
Token ID511
Feature activation-3.66e-01

Pos logit contributions
adoes+0.058
outing+0.052
ows+0.051
 fears+0.051
 anymore+0.050
Neg logit contributions
Ar-0.038
 se-0.038
 oct-0.035
 sw-0.033
 tort-0.033
Token toys
Token ID14958
Feature activation+9.65e-02

Pos logit contributions
adoes+0.013
outing+0.012
ows+0.011
 fears+0.011
 anymore+0.011
Neg logit contributions
 se-0.011
Ar-0.011
 oct-0.010
 sw-0.010
 pl-0.010
Token.
Token ID13
Feature activation-7.80e-02

Pos logit contributions
Ar+0.003
 se+0.003
 oct+0.002
 tort+0.002
 sw+0.002
Neg logit contributions
adoes-0.004
 anymore-0.003
ows-0.003
outing-0.003
cases-0.003
Token\n
Token ID198
Feature activation+2.68e-02

Pos logit contributions
adoes+0.004
outing+0.004
 fears+0.004
ows+0.004
 anymore+0.004
Neg logit contributions
Ar-0.001
 se-0.001
 oct-0.001
 sw-0.001
 pl-0.001
Token\n
Token ID198
Feature activation-3.54e-01

Pos logit contributions
 se+0.001
 oct+0.001
 tort+0.000
Ar+0.000
 sw+0.000
Neg logit contributions
adoes-0.001
ows-0.001
outing-0.001
 anymore-0.001
cases-0.001