TOP ACTIVATIONS
MAX = 8.105

would take out a j igsaw puzzle and try to solve
. The owl told Wh isk ers , " To be
happy that she hugged Wh isk ers . " Thank you
. The vet showed Wh isk ers the x - ray
him . Spot urged Wh isk ers to come play with
" Look Wh isk ers , I found gold
his puzzle and thanked Wh isk ers for helping him .
Ch ir py thanked Wh isk ers for helping him .
, Lucy !" said Wh isk ers , " Do you
by letting them read his story . <|endoftext|> Once upon a
down together to read the story . From that day on
j olly cat named Wh isk ers . Wh isk ers
it . He told Wh isk ers to look at him
family sh oo ed Wh isk ers away and told Max
you want to read the story inside the shell ?
the children thanked the old story te ller . They gave
This time she wrote her story about dolphins and how they
book . Ben reads the story to Lily . Lily listens
couch and mom reads the story . Anna listens and looks
almost forgot to write her story . She quickly grabbed a

MIN ACTIVATIONS
MIN = -10.065

help , but no - one could open it .
was destroyed and no - one was able to help her
somewhere safe so no - one would lose it . He
kitchen , but no - one knew why . The people
very special . Something no one else had . He went
of the biggest and gent lest dinosaurs ever ," mom explained
grass y field and no one dared to come near him
help them . But no one sees them . No one
so fast that no - one could catch him . He
passed , but no - one came and it started to
be clean and no - one will want to play with
looked around but no - one was coming . Then a
for help , but no one could hear her . She
their parents , but no one heard them . The creature
and slides down . " Whe e !" she says .
help them . But no one came . The jungle cat
this stadium , but no one was allowed to go inside
what it was but no one knew . Some said it
was so heavy that no one could carry it . But
so com fy and no one knew about her secret spot

INTERVAL 3.562 - 8.105
CONTAINS 0.880%

sang , the dove 's eyes grew heavy , and he
. It followed the bread trail to the gate . Mia
stars and planets on the box . They make a rocket
what 's happening in the world , like when it 's
voice said , " I forgive you . Let 's be

INTERVAL -0.980 - 3.562
CONTAINS 76.744%

very silly . His name was Max . Max
," the dog says . " You are very kind .
something ," the elephant said . The ant listened carefully to
so excited and couldn 't believe his luck . He grabbed
home together . They had a lot of fun . They

INTERVAL -5.523 - -0.980
CONTAINS 22.305%

pink room . One sunny day , Lily went
She was so happy that she suggested that they take a
I heard that every day if you do good things then
brave . Now , let 's c uddle and read a
. She said she liked me . She asked me if

INTERVAL -10.065 - -5.523
CONTAINS 0.071%

a music box , and as Emma cl apped , he
wonderful time playing , and when it was time to go
hay . Then one day , he got lucky
anything is possible as long as you don 't give up
all morning , and no one knew why . Her mum
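
The four INTERVAL ranges above appear to be equal-width bins spanning MIN to MAX, and the CONTAINS percentages appear to sum to 100%. The sketch below checks both readings against the reported numbers. It is an illustration only: the equal-width binning rule is an assumption inferred from the figures, not something the dump states, and the helper name equal_width_edges is hypothetical.

```python
# Assumption: the dashboard splits [MIN, MAX] into equal-width bins.
# equal_width_edges is a hypothetical helper, not part of any library.

def equal_width_edges(lo, hi, n_bins):
    """Return the n_bins + 1 edges of equal-width bins over [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

MIN_ACT = -10.065  # from "MIN = -10.065"
MAX_ACT = 8.105    # from "MAX = 8.105"

edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)

# Interval boundaries as printed in the INTERVAL headers (3 decimals).
reported = [-10.065, -5.523, -0.980, 3.562, 8.105]

# Each computed edge should match the printed one up to display rounding.
for computed, shown in zip(edges, reported):
    assert abs(computed - shown) <= 1e-3, (computed, shown)

# CONTAINS percentages, listed from the lowest interval to the highest.
shares = [0.071, 22.305, 76.744, 0.880]
assert abs(sum(shares) - 100.0) < 1e-6

print("bin width:", round((MAX_ACT - MIN_ACT) / 4, 4))
```

If the assumption holds, the two outer bins together hold under 1% of sampled tokens, so the top and bottom example lists describe rare extremes rather than the feature's typical behaviour.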
Token would
Token ID561
Feature activation-7.40e-01

Pos logit contributions
 bal+0.049
Whe+0.045
 Sara+0.043
CR+0.042
 Mir+0.042
Neg logit contributions
iversity-0.049
 dearly-0.039
gency-0.039
 facts-0.037
assing-0.037
Token take
Token ID1011
Feature activation+7.83e-01

Pos logit contributions
 bal+0.024
 when+0.020
CR+0.020
me+0.020
Whe+0.020
Neg logit contributions
iversity-0.041
gency-0.035
 dearly-0.035
assing-0.035
hips-0.034
Token out
Token ID503
Feature activation+1.08e+00

Pos logit contributions
iversity+0.042
 dearly+0.036
gency+0.035
hips+0.035
 facts+0.035
Neg logit contributions
 bal-0.027
Whe-0.023
me-0.023
CR-0.023
ara-0.022
Token a
Token ID257
Feature activation+1.53e+00

Pos logit contributions
iversity+0.065
gency+0.057
assing+0.056
 dearly+0.055
hips+0.054
Neg logit contributions
 bal-0.031
 when-0.027
 one-0.027
 another-0.027
CR-0.025
Token j
Token ID474
Feature activation+1.36e+00

Pos logit contributions
iversity+0.074
gency+0.062
 dearly+0.061
assing+0.061
hips+0.060
Neg logit contributions
 bal-0.061
CR-0.052
 when-0.052
Whe-0.052
me-0.052
Tokenigsaw
Token ID45636
Feature activation+8.10e+00

Pos logit contributions
iversity+0.080
gency+0.071
assing+0.069
hips+0.067
 dearly+0.067
Neg logit contributions
 bal-0.042
CR-0.033
 when-0.032
me-0.032
Whe-0.032
Token puzzle
Token ID15027
Feature activation+2.09e+00

Pos logit contributions
iversity+0.355
gency+0.303
assing+0.294
 dearly+0.285
hips+0.281
Neg logit contributions
 bal-0.362
 when-0.317
 she-0.311
me-0.310
CR-0.309
Token and
Token ID290
Feature activation+4.38e-01

Pos logit contributions
iversity+0.127
gency+0.112
assing+0.109
 dearly+0.108
hips+0.107
Neg logit contributions
 bal-0.058
 when-0.048
CR-0.045
 she-0.044
me-0.044
Token try
Token ID1949
Feature activation-8.64e-01

Pos logit contributions
iversity+0.021
gency+0.018
 facts+0.018
 dearly+0.018
hips+0.018
Neg logit contributions
 bal-0.016
Whe-0.015
ara-0.015
me-0.014
CR-0.014
Token to
Token ID284
Feature activation+8.53e-01

Pos logit contributions
 bal+0.019
 when+0.018
 one+0.017
 another+0.017
 she+0.016
Neg logit contributions
iversity-0.057
gency-0.049
ishly-0.049
 dearly-0.049
assing-0.048
Token solve
Token ID8494
Feature activation+2.17e+00

Pos logit contributions
iversity+0.040
hips+0.034
assing+0.033
gency+0.033
 dearly+0.033
Neg logit contributions
 bal-0.035
Whe-0.031
CR-0.030
 Mir-0.029
 Sara-0.029
Token.
Token ID13
Feature activation+4.82e-01

Pos logit contributions
iversity+0.140
gency+0.125
assing+0.122
hips+0.121
 dearly+0.118
Neg logit contributions
 bal-0.060
 when-0.053
 she-0.048
 one-0.046
 another-0.045
Token The
Token ID383
Feature activation+5.81e-01

Pos logit contributions
iversity+0.027
 dearly+0.023
gency+0.023
hips+0.023
assing+0.023
Neg logit contributions
 bal-0.015
 when-0.013
 Sara-0.013
CR-0.013
 Mir-0.012
Token owl
Token ID39610
Feature activation+4.54e-01

Pos logit contributions
iversity+0.026
gency+0.021
 dearly+0.021
assing+0.020
hips+0.020
Neg logit contributions
 bal-0.026
CR-0.023
Whe-0.023
me-0.023
 when-0.022
Token told
Token ID1297
Feature activation+1.11e+00

Pos logit contributions
iversity+0.026
 dearly+0.023
smart+0.022
 facts+0.022
gency+0.022
Neg logit contributions
 bal-0.014
Whe-0.013
CR-0.012
 Mir-0.011
 Sara-0.011
Token Wh
Token ID854
Feature activation+1.91e+00

Pos logit contributions
iversity+0.071
 lens+0.065
 tripod+0.064
igsaw+0.064
gency+0.063
Neg logit contributions
 Sara-0.025
 when-0.024
 bal-0.023
 one-0.022
 another-0.022
Tokenisk
Token ID1984
Feature activation+8.00e+00

Pos logit contributions
iversity+0.049
 lens+0.045
 tripod+0.045
gency+0.041
 facts+0.038
Neg logit contributions
ara-0.108
me-0.108
 bal-0.107
 when-0.102
CR-0.100
Tokeners
Token ID364
Feature activation+2.16e+00

Pos logit contributions
iversity+0.361
gency+0.318
 dearly+0.302
 lens+0.301
assing+0.299
Neg logit contributions
 bal-0.321
CR-0.295
me-0.292
 when-0.291
Whe-0.287
Token,
Token ID11
Feature activation+1.03e+00

Pos logit contributions
iversity+0.144
gency+0.139
assing+0.136
igsaw+0.135
 unison+0.132
Neg logit contributions
 she-0.058
 when-0.056
 one-0.046
 at-0.038
 another-0.038
Token "
Token ID366
Feature activation+1.51e+00

Pos logit contributions
iversity+0.059
hips+0.052
igsaw+0.052
gency+0.052
 lens+0.052
Neg logit contributions
 when-0.031
 she-0.031
 bal-0.029
 one-0.029
 Mir-0.028
TokenTo
Token ID2514
Feature activation+1.69e+00

Pos logit contributions
iversity+0.081
assing+0.067
 dearly+0.066
isk+0.065
smart+0.065
Neg logit contributions
 bal-0.055
 Mir-0.047
 Sara-0.045
 dipped-0.044
CR-0.042
Token be
Token ID307
Feature activation-2.27e-01

Pos logit contributions
iversity+0.097
 dearly+0.084
gency+0.083
assing+0.081
hips+0.081
Neg logit contributions
 bal-0.051
 when-0.043
CR-0.042
 Sara-0.042
me-0.041
Token happy
Token ID3772
Feature activation-1.27e+00

Pos logit contributions
iversity+0.023
 dearly+0.019
smart+0.019
hips+0.019
assing+0.018
Neg logit contributions
 bal-0.019
Whe-0.017
CR-0.017
me-0.017
 Mir-0.017
Token that
Token ID326
Feature activation-4.86e-01

Pos logit contributions
 bal+0.044
Whe+0.041
CR+0.039
me+0.039
 Mir+0.037
Neg logit contributions
iversity-0.066
 dearly-0.057
 facts-0.055
smart-0.055
gency-0.055
Token she
Token ID673
Feature activation-1.39e+00

Pos logit contributions
 bal+0.008
Whe+0.007
me+0.007
CR+0.007
 dipped+0.006
Neg logit contributions
iversity-0.033
 dearly-0.030
smart-0.029
hips-0.029
assing-0.029
Token hugged
Token ID41130
Feature activation+2.87e+00

Pos logit contributions
 bal+0.051
Whe+0.046
 Mir+0.044
CR+0.044
Next+0.043
Neg logit contributions
iversity-0.072
gency-0.059
 dearly-0.059
assing-0.058
hips-0.057
Token Wh
Token ID854
Feature activation+2.39e+00

Pos logit contributions
iversity+0.173
hips+0.148
gency+0.148
 dearly+0.147
 facts+0.147
Neg logit contributions
 bal-0.083
 when-0.071
 Sara-0.070
 another-0.067
Whe-0.067
Tokenisk
Token ID1984
Feature activation+7.87e+00

Pos logit contributions
iversity+0.053
gency+0.041
 lens+0.040
 dearly+0.039
 tripod+0.038
Neg logit contributions
me-0.148
 bal-0.147
ara-0.145
 when-0.136
CR-0.135
Tokeners
Token ID364
Feature activation+2.27e+00

Pos logit contributions
iversity+0.341
gency+0.288
 dearly+0.279
assing+0.275
hips+0.270
Neg logit contributions
 bal-0.347
CR-0.309
me-0.307
Whe-0.304
 when-0.303
Token.
Token ID13
Feature activation+4.75e-01

Pos logit contributions
iversity+0.144
gency+0.127
assing+0.123
hips+0.123
 facts+0.123
Neg logit contributions
 bal-0.059
 when-0.052
 one-0.048
 another-0.048
 she-0.047
Token "
Token ID366
Feature activation+9.38e-01

Pos logit contributions
iversity+0.025
gency+0.021
 dearly+0.021
assing+0.020
hips+0.020
Neg logit contributions
 bal-0.017
 when-0.015
CR-0.014
Whe-0.014
 Sara-0.014
TokenThank
Token ID10449
Feature activation+2.59e+00

Pos logit contributions
iversity+0.046
 dearly+0.041
assing+0.038
gency+0.037
smart+0.036
Neg logit contributions
 bal-0.033
 Mir-0.031
CR-0.030
 Sara-0.029
 asks-0.029
Token you
Token ID345
Feature activation+2.17e+00

Pos logit contributions
iversity+0.188
gency+0.165
 dearly+0.165
assing+0.164
hips+0.163
Neg logit contributions
 bal-0.041
Whe-0.027
CR-0.027
me-0.025
 when-0.025
Token.
Token ID13
Feature activation-4.56e-04

Pos logit contributions
 bal+0.016
 when+0.013
CR+0.012
me+0.012
Whe+0.012
Neg logit contributions
iversity-0.033
gency-0.029
assing-0.028
 dearly-0.028
hips-0.028
Token The
Token ID383
Feature activation-2.14e-01

Pos logit contributions
 bal+0.000
me+0.000
 dipped+0.000
CR+0.000
Whe+0.000
Neg logit contributions
iversity-0.000
 dearly-0.000
smart-0.000
gency-0.000
assing-0.000
Token vet
Token ID20202
Feature activation-3.03e+00

Pos logit contributions
 bal+0.005
Whe+0.005
CR+0.005
me+0.004
 Mir+0.004
Neg logit contributions
iversity-0.013
gency-0.011
 dearly-0.011
assing-0.011
 facts-0.011
Token showed
Token ID3751
Feature activation+8.30e-01

Pos logit contributions
 bal+0.106
Whe+0.105
CR+0.096
Next+0.093
 Mir+0.085
Neg logit contributions
iversity-0.159
 dearly-0.130
smart-0.130
hips-0.126
gency-0.126
Token Wh
Token ID854
Feature activation+1.53e+00

Pos logit contributions
iversity+0.046
 dearly+0.040
smart+0.039
hips+0.038
gency+0.038
Neg logit contributions
 bal-0.027
Whe-0.023
CR-0.023
me-0.022
Next-0.022
Tokenisk
Token ID1984
Feature activation+7.83e+00

Pos logit contributions
iversity+0.030
gency+0.019
hips+0.017
 dearly+0.017
 lens+0.017
Neg logit contributions
 bal-0.104
me-0.098
CR-0.096
 when-0.096
ara-0.095
Tokeners
Token ID364
Feature activation+2.09e+00

Pos logit contributions
iversity+0.335
gency+0.278
 dearly+0.272
assing+0.268
hips+0.264
Neg logit contributions
 bal-0.349
CR-0.311
me-0.308
Whe-0.308
 when-0.307
Token the
Token ID262
Feature activation+8.77e-01

Pos logit contributions
iversity+0.124
gency+0.116
assing+0.112
 tripod+0.106
asser+0.106
Neg logit contributions
 when-0.060
 bal-0.057
 she-0.056
 one-0.055
 another-0.054
Token x
Token ID2124
Feature activation+1.63e+00

Pos logit contributions
iversity+0.044
gency+0.037
 dearly+0.037
assing+0.036
 facts+0.035
Neg logit contributions
 bal-0.033
CR-0.029
Whe-0.029
me-0.028
Next-0.028
Token-
Token ID12
Feature activation+1.18e+00

Pos logit contributions
iversity+0.102
 dearly+0.090
gency+0.090
assing+0.089
hips+0.089
Neg logit contributions
 bal-0.043
me-0.033
 Mir-0.032
 when-0.031
 one-0.031
Tokenray
Token ID2433
Feature activation+2.16e+00

Pos logit contributions
iversity+0.046
gency+0.035
 dearly+0.035
smart+0.034
assing+0.033
Neg logit contributions
 bal-0.058
Whe-0.053
CR-0.052
 Sara-0.051
 Mir-0.051
Token him
Token ID683
Feature activation+9.32e-01

Pos logit contributions
iversity+0.034
hips+0.030
 lens+0.030
 facts+0.030
gency+0.029
Neg logit contributions
 when-0.012
 Sara-0.011
 bal-0.011
 another-0.011
 one-0.011
Token.
Token ID13
Feature activation+4.04e-01

Pos logit contributions
iversity+0.058
gency+0.051
hips+0.050
assing+0.050
 facts+0.050
Neg logit contributions
 bal-0.023
 when-0.021
 another-0.019
 one-0.019
me-0.018
Token Spot
Token ID15899
Feature activation-2.94e-01

Pos logit contributions
iversity+0.024
hips+0.021
 dearly+0.021
gency+0.021
 facts+0.021
Neg logit contributions
 bal-0.011
 Sara-0.009
 when-0.009
Whe-0.009
 she-0.009
Token urged
Token ID11643
Feature activation-5.13e-01

Pos logit contributions
 bal+0.008
 when+0.007
CR+0.007
me+0.007
Whe+0.007
Neg logit contributions
iversity-0.018
gency-0.015
 dearly-0.015
hips-0.015
assing-0.015
Token Wh
Token ID854
Feature activation+1.68e+00

Pos logit contributions
 bal+0.015
 when+0.012
 Sara+0.012
CR+0.012
Whe+0.012
Neg logit contributions
iversity-0.031
gency-0.027
hips-0.026
 dearly-0.026
assing-0.026
Tokenisk
Token ID1984
Feature activation+7.81e+00

Pos logit contributions
iversity+0.040
 lens+0.033
gency+0.031
 tripod+0.030
 facts+0.029
Neg logit contributions
 bal-0.101
me-0.101
ara-0.100
 when-0.095
CR-0.094
Tokeners
Token ID364
Feature activation+1.69e+00

Pos logit contributions
iversity+0.333
gency+0.276
 dearly+0.270
assing+0.267
hips+0.262
Neg logit contributions
 bal-0.349
CR-0.312
Whe-0.309
me-0.308
 when-0.307
Token to
Token ID284
Feature activation+9.96e-01

Pos logit contributions
iversity+0.117
gency+0.110
assing+0.108
 tripod+0.107
 lens+0.107
Neg logit contributions
 when-0.035
 she-0.030
 one-0.029
 another-0.026
 bal-0.023
Token come
Token ID1282
Feature activation+2.93e-01

Pos logit contributions
iversity+0.055
gency+0.047
 dearly+0.047
hips+0.046
assing+0.046
Neg logit contributions
 bal-0.033
CR-0.027
Whe-0.027
me-0.027
 when-0.027
Token play
Token ID711
Feature activation+2.94e-01

Pos logit contributions
iversity+0.018
gency+0.016
 dearly+0.016
 facts+0.015
assing+0.015
Neg logit contributions
 bal-0.008
 when-0.006
CR-0.006
me-0.006
Whe-0.006
Token with
Token ID351
Feature activation+8.39e-01

Pos logit contributions
iversity+0.020
gency+0.019
hips+0.018
 facts+0.018
ric+0.018
Neg logit contributions
 bal-0.005
 when-0.005
 another-0.004
 one-0.004
me-0.004
Token\n
Token ID198
Feature activation+2.10e-01

Pos logit contributions
iversity+0.075
 dearly+0.065
gency+0.065
hips+0.064
assing+0.064
Neg logit contributions
 bal-0.034
Whe-0.027
CR-0.027
me-0.026
 when-0.026
Token\n
Token ID198
Feature activation+4.54e-01

Pos logit contributions
iversity+0.013
 dearly+0.011
gency+0.011
hips+0.011
assing+0.011
Neg logit contributions
 bal-0.006
Whe-0.005
CR-0.005
me-0.005
 when-0.005
Token"
Token ID1
Feature activation+1.55e-01

Pos logit contributions
iversity+0.030
gency+0.027
 dearly+0.027
hips+0.027
 facts+0.027
Neg logit contributions
 bal-0.009
CR-0.007
 when-0.007
me-0.007
Next-0.007
TokenLook
Token ID8567
Feature activation+1.16e+00

Pos logit contributions
iversity+0.008
assing+0.006
gency+0.006
 tripod+0.006
 dearly+0.006
Neg logit contributions
 bal-0.006
 Mir-0.005
CR-0.005
 when-0.005
me-0.005
Token Wh
Token ID854
Feature activation+1.35e+00

Pos logit contributions
iversity+0.067
gency+0.058
 dearly+0.057
assing+0.057
hips+0.056
Neg logit contributions
 bal-0.033
Whe-0.029
CR-0.029
Next-0.028
me-0.027
Tokenisk
Token ID1984
Feature activation+7.73e+00

Pos logit contributions
iversity+0.030
 dearly+0.023
 facts+0.021
 tripod+0.021
gency+0.020
Neg logit contributions
 bal-0.086
me-0.084
ara-0.083
CR-0.080
 when-0.078
Tokeners
Token ID364
Feature activation+1.73e+00

Pos logit contributions
iversity+0.338
gency+0.288
 dearly+0.283
assing+0.275
hips+0.268
Neg logit contributions
 bal-0.332
CR-0.299
me-0.297
Whe-0.293
 when-0.291
Token,
Token ID11
Feature activation+1.58e+00

Pos logit contributions
iversity+0.101
 dearly+0.091
 facts+0.088
hips+0.088
gency+0.087
Neg logit contributions
 bal-0.050
 when-0.042
 she-0.040
 Sara-0.039
 another-0.039
Token I
Token ID314
Feature activation-3.46e-01

Pos logit contributions
iversity+0.099
 dearly+0.088
gency+0.087
hips+0.086
assing+0.086
Neg logit contributions
 bal-0.040
 Sara-0.033
 when-0.033
 she-0.033
 another-0.032
Token found
Token ID1043
Feature activation-5.12e-01

Pos logit contributions
 bal+0.011
Whe+0.009
CR+0.009
 Mir+0.009
Next+0.009
Neg logit contributions
iversity-0.020
gency-0.017
 dearly-0.017
assing-0.016
hips-0.016
Token gold
Token ID3869
Feature activation+3.59e-01

Pos logit contributions
 bal+0.016
Whe+0.014
me+0.014
Next+0.013
CR+0.013
Neg logit contributions
iversity-0.029
gency-0.025
 dearly-0.025
assing-0.024
smart-0.024
Token his
Token ID465
Feature activation-7.27e-02

Pos logit contributions
iversity+0.010
 dearly+0.008
hips+0.008
gency+0.008
 facts+0.008
Neg logit contributions
 bal-0.004
 when-0.004
 she-0.004
 another-0.004
 one-0.004
Token puzzle
Token ID15027
Feature activation+1.93e+00

Pos logit contributions
 bal+0.003
CR+0.003
Whe+0.003
 when+0.003
me+0.003
Neg logit contributions
iversity-0.003
gency-0.003
 dearly-0.002
assing-0.002
hips-0.002
Token and
Token ID290
Feature activation-2.97e-01

Pos logit contributions
iversity+0.125
gency+0.113
assing+0.110
hips+0.106
 facts+0.105
Neg logit contributions
 bal-0.048
 when-0.041
 she-0.038
 another-0.037
 one-0.036
Token thanked
Token ID26280
Feature activation+9.45e-01

Pos logit contributions
 bal+0.010
Whe+0.009
CR+0.009
me+0.009
ara+0.008
Neg logit contributions
iversity-0.016
 dearly-0.013
gency-0.013
 facts-0.013
smart-0.013
Token Wh
Token ID854
Feature activation+2.19e+00

Pos logit contributions
iversity+0.059
 lens+0.053
hips+0.051
gency+0.051
 facts+0.051
Neg logit contributions
 Sara-0.024
 bal-0.023
 another-0.022
 when-0.022
 Mir-0.022
Tokenisk
Token ID1984
Feature activation+7.72e+00

Pos logit contributions
iversity+0.051
gency+0.042
 lens+0.042
 tripod+0.041
 facts+0.037
Neg logit contributions
 bal-0.135
me-0.132
 when-0.127
ara-0.127
CR-0.123
Tokeners
Token ID364
Feature activation+1.91e+00

Pos logit contributions
iversity+0.355
gency+0.322
 lens+0.307
assing+0.305
 dearly+0.302
Neg logit contributions
 bal-0.301
me-0.276
CR-0.273
 when-0.270
 Sara-0.267
Token for
Token ID329
Feature activation+4.18e-01

Pos logit contributions
iversity+0.104
gency+0.091
assing+0.089
 facts+0.088
 lens+0.088
Neg logit contributions
 bal-0.064
 when-0.059
 another-0.056
 one-0.055
 Sara-0.054
Token helping
Token ID5742
Feature activation+1.81e+00

Pos logit contributions
iversity+0.023
 lens+0.021
gency+0.020
 facts+0.020
EL+0.020
Neg logit contributions
 when-0.013
 she-0.013
 Sara-0.012
 another-0.012
 asked-0.012
Token him
Token ID683
Feature activation+1.82e-01

Pos logit contributions
iversity+0.098
gency+0.083
 dearly+0.083
 facts+0.083
hips+0.082
Neg logit contributions
 bal-0.061
 when-0.057
 Sara-0.053
 another-0.052
 one-0.052
Token.
Token ID13
Feature activation+3.84e-01

Pos logit contributions
iversity+0.010
 dearly+0.008
hips+0.008
gency+0.008
assing+0.008
Neg logit contributions
 bal-0.006
Whe-0.006
CR-0.005
me-0.005
 Mir-0.005
Token Ch
Token ID609
Feature activation-2.72e+00

Pos logit contributions
 bal+0.008
CR+0.007
Whe+0.007
me+0.007
 when+0.007
Neg logit contributions
iversity-0.010
gency-0.009
 dearly-0.009
assing-0.008
hips-0.008
Tokenir
Token ID343
Feature activation-4.12e-01

Pos logit contributions
 bal+0.104
me+0.096
CR+0.096
ara+0.093
 when+0.087
Neg logit contributions
iversity-0.124
 facts-0.120
gency-0.114
 tripod-0.112
 lens-0.108
Tokenpy
Token ID9078
Feature activation-2.91e-01

Pos logit contributions
 bal+0.023
 when+0.021
CR+0.021
me+0.021
 Mir+0.021
Neg logit contributions
iversity-0.013
gency-0.010
 dearly-0.009
assing-0.009
 facts-0.009
Token thanked
Token ID26280
Feature activation+7.16e-01

Pos logit contributions
 bal+0.010
Whe+0.008
CR+0.008
 Mir+0.008
me+0.008
Neg logit contributions
iversity-0.016
 dearly-0.014
gency-0.013
smart-0.013
hips-0.013
Token Wh
Token ID854
Feature activation+1.95e+00

Pos logit contributions
iversity+0.041
 lens+0.036
 facts+0.035
 dearly+0.035
gency+0.035
Neg logit contributions
 bal-0.021
 when-0.020
 Sara-0.020
 Mir-0.020
 she-0.019
Tokenisk
Token ID1984
Feature activation+7.71e+00

Pos logit contributions
iversity+0.046
 lens+0.041
 tripod+0.040
 facts+0.039
gency+0.036
Neg logit contributions
 bal-0.115
me-0.115
ara-0.113
 when-0.109
CR-0.105
Tokeners
Token ID364
Feature activation+1.88e+00

Pos logit contributions
iversity+0.357
gency+0.333
 lens+0.323
 dearly+0.311
assing+0.311
Neg logit contributions
 bal-0.287
me-0.268
CR-0.263
 when-0.260
 Sara-0.254
Token for
Token ID329
Feature activation+3.25e-01

Pos logit contributions
iversity+0.111
gency+0.101
 facts+0.099
assing+0.099
 lens+0.097
Neg logit contributions
 bal-0.052
 when-0.049
 one-0.044
 another-0.044
 she-0.044
Token helping
Token ID5742
Feature activation+1.75e+00

Pos logit contributions
iversity+0.017
 lens+0.017
igsaw+0.016
gency+0.015
 facts+0.015
Neg logit contributions
 when-0.011
 she-0.010
 another-0.010
 Sara-0.010
 one-0.010
Token him
Token ID683
Feature activation+1.15e-01

Pos logit contributions
iversity+0.096
 facts+0.083
igsaw+0.083
 lens+0.083
gency+0.081
Neg logit contributions
 bal-0.057
 when-0.056
 she-0.050
 Sara-0.049
 one-0.049
Token.
Token ID13
Feature activation+7.17e-03

Pos logit contributions
iversity+0.006
 dearly+0.006
gency+0.006
hips+0.005
assing+0.005
Neg logit contributions
 bal-0.004
Whe-0.003
CR-0.003
me-0.003
 Mir-0.003
Token,
Token ID11
Feature activation+9.46e-01

Pos logit contributions
iversity+0.044
gency+0.036
 dearly+0.034
assing+0.034
smart+0.034
Neg logit contributions
 bal-0.033
CR-0.031
Whe-0.029
Next-0.029
 dipped-0.028
Token Lucy
Token ID22162
Feature activation-4.35e-01

Pos logit contributions
iversity+0.052
gency+0.044
 dearly+0.044
assing+0.044
hips+0.043
Neg logit contributions
 bal-0.031
CR-0.026
Whe-0.026
me-0.026
 when-0.026
Token!"
Token ID2474
Feature activation-1.07e+00

Pos logit contributions
 bal+0.016
Whe+0.014
CR+0.014
Next+0.013
me+0.013
Neg logit contributions
iversity-0.023
 dearly-0.019
gency-0.019
assing-0.018
smart-0.018
Token said
Token ID531
Feature activation-1.84e+00

Pos logit contributions
 bal+0.039
Whe+0.035
CR+0.034
me+0.034
 kicker+0.032
Neg logit contributions
iversity-0.054
gency-0.044
 dearly-0.044
 facts-0.042
 tripod-0.042
Token Wh
Token ID854
Feature activation+1.47e+00

Pos logit contributions
 bal+0.068
CR+0.062
Whe+0.060
Next+0.059
me+0.058
Neg logit contributions
iversity-0.093
gency-0.077
 dearly-0.077
hips-0.074
smart-0.072
Tokenisk
Token ID1984
Feature activation+7.67e+00

Pos logit contributions
iversity+0.063
 lens+0.054
gency+0.054
 tripod+0.053
 facts+0.052
Neg logit contributions
 bal-0.062
ara-0.059
me-0.059
 when-0.056
 Sara-0.055
Tokeners
Token ID364
Feature activation+1.67e+00

Pos logit contributions
iversity+0.359
gency+0.327
assing+0.310
 lens+0.307
 dearly+0.304
Neg logit contributions
 bal-0.293
me-0.270
CR-0.263
 when-0.261
Whe-0.261
Token,
Token ID11
Feature activation+7.32e-01

Pos logit contributions
iversity+0.120
gency+0.109
 facts+0.108
igsaw+0.106
hips+0.106
Neg logit contributions
 when-0.028
 she-0.026
 one-0.024
 bal-0.024
 asked-0.024
Token "
Token ID366
Feature activation+1.00e+00

Pos logit contributions
iversity+0.036
 dearly+0.030
gency+0.030
hips+0.030
 facts+0.030
Neg logit contributions
 bal-0.029
 when-0.025
 Sara-0.024
 Mir-0.024
 she-0.024
TokenDo
Token ID5211
Feature activation-7.27e-01

Pos logit contributions
iversity+0.054
 dearly+0.046
assing+0.043
smart+0.043
gency+0.042
Neg logit contributions
 bal-0.033
 dipped-0.029
 Mir-0.029
CR-0.028
 Sara-0.027
Token you
Token ID345
Feature activation-8.31e-01

Pos logit contributions
 bal+0.009
Whe+0.005
 when+0.005
CR+0.005
me+0.004
Neg logit contributions
iversity-0.056
gency-0.050
 dearly-0.049
assing-0.049
hips-0.049
Token by
Token ID416
Feature activation+1.06e+00

Pos logit contributions
iversity+0.041
gency+0.035
 dearly+0.035
assing+0.034
hips+0.034
Neg logit contributions
 bal-0.020
CR-0.016
Whe-0.016
 when-0.016
me-0.016
Token letting
Token ID9616
Feature activation-5.88e-01

Pos logit contributions
iversity+0.058
gency+0.050
asser+0.048
 dearly+0.048
assing+0.048
Neg logit contributions
 when-0.034
 bal-0.033
 one-0.032
 she-0.031
 another-0.031
Token them
Token ID606
Feature activation-2.34e-01

Pos logit contributions
 bal+0.023
Whe+0.020
CR+0.020
me+0.019
Next+0.019
Neg logit contributions
iversity-0.029
 dearly-0.024
gency-0.024
hips-0.024
assing-0.024
Token read
Token ID1100
Feature activation+2.19e+00

Pos logit contributions
 bal+0.008
Whe+0.007
CR+0.007
 Mir+0.007
ara+0.007
Neg logit contributions
iversity-0.012
 dearly-0.010
hips-0.010
 facts-0.010
smart-0.010
Token his
Token ID465
Feature activation+2.11e+00

Pos logit contributions
iversity+0.136
gency+0.119
assing+0.118
 dearly+0.115
hips+0.115
Neg logit contributions
 bal-0.057
 when-0.048
 another-0.045
 one-0.045
me-0.044
Token story
Token ID1621
Feature activation+7.66e+00

Pos logit contributions
iversity+0.091
gency+0.074
 dearly+0.073
assing+0.072
hips+0.072
Neg logit contributions
 bal-0.096
Whe-0.084
CR-0.084
 when-0.083
me-0.083
Token.
Token ID13
Feature activation+1.44e-01

Pos logit contributions
iversity+0.423
 dearly+0.364
hips+0.358
gency+0.354
 facts+0.352
Neg logit contributions
 bal-0.247
Whe-0.214
CR-0.213
 Mir-0.204
me-0.203
Token<|endoftext|>
Token ID50256
Feature activation+1.91e+00

Pos logit contributions
iversity+0.008
gency+0.007
 dearly+0.007
assing+0.007
hips+0.007
Neg logit contributions
 bal-0.004
Whe-0.004
CR-0.004
 when-0.004
me-0.004
TokenOnce
Token ID7454
Feature activation+1.53e+00

Pos logit contributions
iversity+0.114
gency+0.097
 dearly+0.097
assing+0.095
hips+0.095
Neg logit contributions
 bal-0.055
CR-0.044
Whe-0.043
me-0.042
 when-0.042
Token upon
Token ID2402
Feature activation+4.60e-01

Pos logit contributions
iversity+0.082
 dearly+0.070
gency+0.069
hips+0.069
assing+0.067
Neg logit contributions
 bal-0.052
Whe-0.044
CR-0.044
me-0.043
 when-0.042
Token a
Token ID257
Feature activation+1.12e+00

Pos logit contributions
iversity+0.028
 dearly+0.024
gency+0.024
assing+0.023
hips+0.023
Neg logit contributions
 bal-0.013
CR-0.010
Whe-0.010
me-0.010
 Mir-0.010
Token down
Token ID866
Feature activation+1.04e+00

Pos logit contributions
 bal+0.080
 when+0.065
Whe+0.063
CR+0.063
me+0.062
Neg logit contributions
iversity-0.199
gency-0.174
 dearly-0.172
hips-0.170
assing-0.170
Token together
Token ID1978
Feature activation+2.00e+00

Pos logit contributions
iversity+0.064
gency+0.057
assing+0.056
 dearly+0.056
hips+0.055
Neg logit contributions
 bal-0.027
 when-0.023
CR-0.022
 she-0.022
 another-0.021
Token to
Token ID284
Feature activation+1.13e+00

Pos logit contributions
iversity+0.120
gency+0.104
 dearly+0.102
assing+0.102
hips+0.101
Neg logit contributions
 bal-0.057
 when-0.047
CR-0.046
me-0.045
Whe-0.045
Token read
Token ID1100
Feature activation+2.61e+00

Pos logit contributions
iversity+0.071
 dearly+0.060
hips+0.060
gency+0.059
smart+0.058
Neg logit contributions
 another-0.028
 one-0.027
 when-0.026
 Sara-0.026
 she-0.026
Token the
Token ID262
Feature activation+2.63e+00

Pos logit contributions
iversity+0.164
gency+0.145
assing+0.143
hips+0.139
 dearly+0.139
Neg logit contributions
 bal-0.064
 when-0.058
 another-0.055
 she-0.054
 one-0.054
Token story
Token ID1621
Feature activation+7.66e+00

Pos logit contributions
iversity+0.103
gency+0.085
assing+0.084
ishly+0.084
hips+0.084
Neg logit contributions
 bal-0.121
 one-0.118
 when-0.116
 another-0.114
me-0.112
Token.
Token ID13
Feature activation+6.71e-01

Pos logit contributions
iversity+0.417
 dearly+0.355
hips+0.353
 facts+0.347
gency+0.346
Neg logit contributions
 bal-0.256
Whe-0.221
CR-0.219
me-0.210
 Mir-0.207
Token From
Token ID3574
Feature activation+3.38e+00

Pos logit contributions
iversity+0.030
 dearly+0.025
gency+0.024
hips+0.023
assing+0.023
Neg logit contributions
 bal-0.029
CR-0.026
me-0.025
Whe-0.025
 when-0.024
Token that
Token ID326
Feature activation+1.16e+00

Pos logit contributions
iversity+0.169
gency+0.142
 dearly+0.141
assing+0.139
hips+0.139
Neg logit contributions
 bal-0.130
CR-0.112
Whe-0.111
 when-0.111
me-0.110
Token day
Token ID1110
Feature activation-3.30e-01

Pos logit contributions
iversity+0.083
assing+0.079
gency+0.078
hips+0.078
 facts+0.076
Neg logit contributions
 bal-0.016
 when-0.015
 one-0.015
 she-0.013
me-0.011
Token on
Token ID319
Feature activation+2.03e+00

Pos logit contributions
 bal+0.009
CR+0.008
 when+0.008
Whe+0.007
me+0.007
Neg logit contributions
iversity-0.020
gency-0.017
 dearly-0.017
assing-0.017
hips-0.017
Token j
Token ID474
Feature activation+8.20e-03

Pos logit contributions
 bal+0.005
CR+0.004
Whe+0.004
me+0.004
Next+0.003
Neg logit contributions
iversity-0.011
 dearly-0.010
gency-0.010
assing-0.010
hips-0.009
Tokenolly
Token ID5098
Feature activation-6.14e-01

Pos logit contributions
iversity+0.001
gency+0.000
assing+0.000
 facts+0.000
 dearly+0.000
Neg logit contributions
 bal-0.000
me-0.000
CR-0.000
 when-0.000
 Sara-0.000
Token cat
Token ID3797
Feature activation-3.79e-01

Pos logit contributions
 bal+0.018
CR+0.016
Whe+0.016
Next+0.015
me+0.014
Neg logit contributions
iversity-0.036
 dearly-0.030
gency-0.030
assing-0.029
 tripod-0.029
Token named
Token ID3706
Feature activation+1.81e-01

Pos logit contributions
 bal+0.015
 when+0.014
 she+0.013
 one+0.013
 asked+0.013
Neg logit contributions
iversity-0.019
gency-0.017
hips-0.016
 facts-0.016
assing-0.016
Token Wh
Token ID854
Feature activation+1.37e+00

Pos logit contributions
iversity+0.010
gency+0.009
assing+0.008
 dearly+0.008
isk+0.008
Neg logit contributions
 bal-0.006
CR-0.005
Whe-0.005
me-0.005
Next-0.005
Tokenisk
Token ID1984
Feature activation+7.61e+00

Pos logit contributions
iversity+0.033
 facts+0.025
gency+0.025
 lens+0.024
 dearly+0.023
Neg logit contributions
ara-0.083
me-0.082
 bal-0.080
 when-0.077
CR-0.076
Tokeners
Token ID364
Feature activation+1.66e+00

Pos logit contributions
iversity+0.310
 dearly+0.243
gency+0.243
hips+0.240
assing+0.239
Neg logit contributions
 bal-0.365
Whe-0.321
CR-0.320
 when-0.317
me-0.316
Token.
Token ID13
Feature activation+1.44e+00

Pos logit contributions
iversity+0.126
igsaw+0.120
 facts+0.115
gency+0.115
heit+0.115
Neg logit contributions
 when-0.028
 one-0.021
 bal-0.019
 another-0.017
 she-0.015
Token Wh
Token ID854
Feature activation+2.37e-02

Pos logit contributions
iversity+0.091
gency+0.079
 facts+0.078
 lens+0.078
 dearly+0.077
Neg logit contributions
 bal-0.036
 Sara-0.032
 when-0.031
Whe-0.030
 she-0.030
Tokenisk
Token ID1984
Feature activation+6.55e+00

Pos logit contributions
iversity+0.001
gency+0.001
 tripod+0.001
 facts+0.001
 lens+0.001
Neg logit contributions
ara-0.001
me-0.001
 bal-0.001
 when-0.001
CR-0.001
Tokeners
Token ID364
Feature activation+7.73e-02

Pos logit contributions
iversity+0.290
gency+0.250
 dearly+0.240
assing+0.236
 lens+0.231
Neg logit contributions
 bal-0.277
CR-0.251
me-0.249
Whe-0.247
 when-0.246
Token it
Token ID340
Feature activation+7.80e-01

Pos logit contributions
 bal+0.018
 when+0.015
 one+0.014
 another+0.014
 Sara+0.013
Neg logit contributions
iversity-0.068
gency-0.059
 facts-0.059
hips-0.059
 dearly-0.059
Token.
Token ID13
Feature activation+1.32e-01

Pos logit contributions
iversity+0.039
 dearly+0.035
hips+0.034
gency+0.033
smart+0.033
Neg logit contributions
 bal-0.028
Whe-0.025
CR-0.025
me-0.025
ara-0.023
Token He
Token ID679
Feature activation+2.18e+00

Pos logit contributions
iversity+0.007
gency+0.006
 dearly+0.006
assing+0.005
hips+0.005
Neg logit contributions
 bal-0.005
CR-0.004
Whe-0.004
me-0.004
 Mir-0.004
Token told
Token ID1297
Feature activation+8.66e-01

Pos logit contributions
iversity+0.112
 dearly+0.096
gency+0.094
isk+0.088
asser+0.087
Neg logit contributions
 bal-0.073
Whe-0.073
CR-0.063
me-0.061
 Sara-0.060
Token Wh
Token ID854
Feature activation+1.49e+00

Pos logit contributions
iversity+0.050
 dearly+0.043
gency+0.043
assing+0.042
hips+0.042
Neg logit contributions
 bal-0.027
 when-0.023
 Sara-0.023
 Mir-0.022
 one-0.022
Tokenisk
Token ID1984
Feature activation+7.58e+00

Pos logit contributions
iversity+0.041
 tripod+0.034
 lens+0.034
gency+0.033
 facts+0.031
Neg logit contributions
ara-0.086
 bal-0.086
me-0.086
 when-0.079
 later-0.078
Tokeners
Token ID364
Feature activation+1.79e+00

Pos logit contributions
iversity+0.354
gency+0.320
assing+0.303
 lens+0.302
 dearly+0.299
Neg logit contributions
 bal-0.288
CR-0.267
me-0.264
 when-0.262
 another-0.256
Token to
Token ID284
Feature activation+1.24e+00

Pos logit contributions
iversity+0.127
assing+0.122
gency+0.120
 unison+0.117
 lens+0.115
Neg logit contributions
 she-0.037
 when-0.035
 one-0.028
 another-0.023
 at-0.022
Token look
Token ID804
Feature activation+3.85e+00

Pos logit contributions
iversity+0.067
hips+0.058
gency+0.058
 dearly+0.057
assing+0.057
Neg logit contributions
 bal-0.041
CR-0.036
me-0.035
Whe-0.034
Next-0.034
Token at
Token ID379
Feature activation-1.29e+00

Pos logit contributions
iversity+0.241
gency+0.209
 dearly+0.208
assing+0.206
hips+0.206
Neg logit contributions
 bal-0.100
CR-0.079
 when-0.079
Whe-0.078
me-0.077
Token him
Token ID683
Feature activation+1.08e+00

Pos logit contributions
 when+0.024
 one+0.023
 another+0.022
 Sara+0.020
 bal+0.019
Neg logit contributions
iversity-0.091
igsaw-0.083
 facts-0.083
 lens-0.080
 tripod-0.080
Token family
Token ID1641
Feature activation+1.42e+00

Pos logit contributions
 bal+0.075
 when+0.057
 Sara+0.054
 one+0.054
 another+0.052
Neg logit contributions
iversity-0.221
gency-0.194
assing-0.190
smart-0.190
hips-0.188
Token sh
Token ID427
Feature activation-2.79e+00

Pos logit contributions
iversity+0.085
gency+0.075
hips+0.073
 dearly+0.073
assing+0.072
Neg logit contributions
 bal-0.041
 when-0.036
 asked-0.034
 she-0.034
me-0.033
Tokenoo
Token ID2238
Feature activation-1.44e+00

Pos logit contributions
 bal+0.100
me+0.095
Whe+0.086
 when+0.085
ara+0.085
Neg logit contributions
iversity-0.145
gency-0.125
 facts-0.123
 lens-0.118
hips-0.118
Tokened
Token ID276
Feature activation+3.84e-01

Pos logit contributions
 bal+0.041
 when+0.033
CR+0.033
Whe+0.032
me+0.032
Neg logit contributions
iversity-0.086
gency-0.075
 dearly-0.075
hips-0.074
assing-0.074
Token Wh
Token ID854
Feature activation+1.42e+00

Pos logit contributions
iversity+0.024
gency+0.020
 dearly+0.020
hips+0.020
assing+0.020
Neg logit contributions
 bal-0.011
 when-0.008
CR-0.008
Whe-0.008
me-0.008
Tokenisk
Token ID1984
Feature activation+7.56e+00

Pos logit contributions
iversity+0.031
 facts+0.023
 lens+0.023
 tripod+0.021
 dearly+0.021
Neg logit contributions
 bal-0.089
me-0.087
ara-0.084
 when-0.083
CR-0.082
Tokeners
Token ID364
Feature activation+1.48e+00

Pos logit contributions
iversity+0.338
gency+0.295
 dearly+0.285
 lens+0.281
 facts+0.275
Neg logit contributions
 bal-0.312
CR-0.285
me-0.284
 when-0.277
Whe-0.275
Token away
Token ID1497
Feature activation+3.04e+00

Pos logit contributions
iversity+0.092
gency+0.081
 dearly+0.080
 facts+0.080
hips+0.079
Neg logit contributions
 bal-0.038
 when-0.032
 she-0.029
 another-0.029
 one-0.029
Token and
Token ID290
Feature activation-4.73e-01

Pos logit contributions
iversity+0.208
gency+0.186
assing+0.182
 tripod+0.181
 facts+0.179
Neg logit contributions
 when-0.068
 bal-0.062
 one-0.053
 another-0.052
 she-0.050
Token told
Token ID1297
Feature activation+7.08e-01

Pos logit contributions
 bal+0.015
Whe+0.014
CR+0.013
me+0.013
Next+0.013
Neg logit contributions
iversity-0.026
 dearly-0.022
gency-0.022
assing-0.022
smart-0.021
Token Max
Token ID5436
Feature activation+2.80e+00

Pos logit contributions
iversity+0.041
gency+0.035
 lens+0.035
hips+0.035
assing+0.035
Neg logit contributions
 bal-0.022
 when-0.021
 Sara-0.020
 she-0.019
 another-0.019
Token you
Token ID345
Feature activation-4.08e-01

Pos logit contributions
 bal+0.002
Whe+0.002
CR+0.002
me+0.002
 when+0.002
Neg logit contributions
iversity-0.006
 dearly-0.005
gency-0.005
hips-0.005
assing-0.005
Token want
Token ID765
Feature activation-1.21e+00

Pos logit contributions
 bal+0.015
 when+0.013
CR+0.013
me+0.013
Whe+0.013
Neg logit contributions
iversity-0.021
gency-0.018
 dearly-0.017
assing-0.017
hips-0.017
Token to
Token ID284
Feature activation+1.08e+00

Pos logit contributions
 bal+0.036
Whe+0.030
CR+0.030
me+0.029
 Mir+0.028
Neg logit contributions
iversity-0.071
gency-0.061
 dearly-0.060
hips-0.059
assing-0.059
Token read
Token ID1100
Feature activation+2.20e+00

Pos logit contributions
iversity+0.056
gency+0.047
hips+0.047
 dearly+0.046
assing+0.046
Neg logit contributions
 bal-0.039
Whe-0.034
CR-0.034
me-0.033
Next-0.032
Token the
Token ID262
Feature activation+2.43e+00

Pos logit contributions
iversity+0.143
gency+0.125
assing+0.125
 dearly+0.124
hips+0.123
Neg logit contributions
 bal-0.049
 when-0.041
 another-0.039
 one-0.039
 Sara-0.039
Token story
Token ID1621
Feature activation+7.54e+00

Pos logit contributions
iversity+0.118
gency+0.099
assing+0.099
 dearly+0.097
hips+0.096
Neg logit contributions
 bal-0.097
 one-0.085
 when-0.085
 she-0.084
 another-0.082
Token inside
Token ID2641
Feature activation-6.64e-01

Pos logit contributions
iversity+0.400
 dearly+0.335
gency+0.331
hips+0.331
 facts+0.325
Neg logit contributions
 bal-0.265
CR-0.229
Whe-0.228
 Mir-0.220
me-0.219
Token the
Token ID262
Feature activation+1.33e+00

Pos logit contributions
 bal+0.017
 when+0.015
 she+0.014
 Sara+0.014
me+0.014
Neg logit contributions
iversity-0.041
gency-0.036
 dearly-0.035
assing-0.035
hips-0.035
Token shell
Token ID7582
Feature activation+4.19e-01

Pos logit contributions
iversity+0.064
isk+0.054
hips+0.053
gency+0.052
 dearly+0.052
Neg logit contributions
 bal-0.054
 sun-0.050
 she-0.050
 one-0.050
 when-0.049
Token?
Token ID30
Feature activation-1.07e-01

Pos logit contributions
iversity+0.023
 dearly+0.019
gency+0.019
hips+0.019
smart+0.019
Neg logit contributions
 bal-0.014
Whe-0.012
CR-0.012
 Mir-0.012
me-0.011
Tokenรข
Token ID22940
Feature activation-1.10e+00

Pos logit contributions
 bal+0.004
Whe+0.004
me+0.004
CR+0.004
 dipped+0.004
Neg logit contributions
iversity-0.005
gency-0.004
smart-0.004
 dearly-0.004
hips-0.004
Token the
Token ID262
Feature activation-8.68e-01

Pos logit contributions
iversity+0.033
hips+0.028
 dearly+0.028
 facts+0.028
gency+0.027
Neg logit contributions
 bal-0.019
CR-0.016
me-0.016
Whe-0.016
ara-0.015
Token children
Token ID1751
Feature activation+7.65e-01

Pos logit contributions
 bal+0.020
 one+0.018
 when+0.017
 sun+0.017
 another+0.016
Neg logit contributions
iversity-0.057
gency-0.051
hips-0.050
 dearly-0.049
smart-0.048
Token thanked
Token ID26280
Feature activation+1.05e+00

Pos logit contributions
iversity+0.042
gency+0.035
 dearly+0.035
hips+0.035
assing+0.035
Neg logit contributions
 bal-0.026
Whe-0.022
CR-0.022
me-0.021
 Mir-0.021
Token the
Token ID262
Feature activation+1.96e+00

Pos logit contributions
iversity+0.077
gency+0.068
assing+0.067
 dearly+0.067
hips+0.066
Neg logit contributions
 when-0.016
 another-0.015
 one-0.015
 Sara-0.014
 bal-0.013
Token old
Token ID1468
Feature activation+2.68e+00

Pos logit contributions
iversity+0.116
gency+0.105
hips+0.100
 dearly+0.100
smart+0.099
Neg logit contributions
 one-0.055
 sun-0.053
 bal-0.053
 kids-0.051
 another-0.049
Token story
Token ID1621
Feature activation+7.52e+00

Pos logit contributions
iversity+0.171
 dearly+0.153
gency+0.150
hips+0.149
smart+0.148
Neg logit contributions
 bal-0.062
 one-0.055
 when-0.052
 she-0.052
 Sara-0.049
Tokente
Token ID660
Feature activation+4.04e+00

Pos logit contributions
iversity+0.464
gency+0.415
assing+0.408
 dearly+0.391
hips+0.384
Neg logit contributions
 bal-0.199
 when-0.176
 she-0.165
 one-0.164
 another-0.164
Tokenller
Token ID6051
Feature activation+1.06e+00

Pos logit contributions
iversity+0.162
gency+0.128
 dearly+0.128
assing+0.125
hips+0.124
Neg logit contributions
 bal-0.195
Whe-0.174
CR-0.173
me-0.169
 when-0.169
Token.
Token ID13
Feature activation+5.88e-01

Pos logit contributions
iversity+0.062
gency+0.055
assing+0.054
hips+0.052
 lens+0.051
Neg logit contributions
 bal-0.032
 when-0.031
 another-0.029
 one-0.029
 she-0.028
Token They
Token ID1119
Feature activation+2.40e+00

Pos logit contributions
iversity+0.029
gency+0.024
 dearly+0.024
assing+0.024
hips+0.024
Neg logit contributions
 bal-0.023
CR-0.020
Whe-0.020
 when-0.020
 Mir-0.019
Token gave
Token ID2921
Feature activation+6.72e-01

Pos logit contributions
iversity+0.128
gency+0.112
assing+0.111
hips+0.108
 dearly+0.107
Neg logit contributions
 bal-0.081
 when-0.074
 asked-0.073
 she-0.071
 another-0.071
Token This
Token ID770
Feature activation+1.52e+00

Pos logit contributions
iversity+0.005
 dearly+0.004
gency+0.004
hips+0.004
assing+0.004
Neg logit contributions
 bal-0.003
CR-0.003
Whe-0.003
me-0.003
 when-0.003
Token time
Token ID640
Feature activation-6.07e-01

Pos logit contributions
iversity+0.089
 dearly+0.078
 facts+0.076
gency+0.075
 lens+0.074
Neg logit contributions
 bal-0.044
Whe-0.038
CR-0.038
 Mir-0.036
me-0.034
Token she
Token ID673
Feature activation-8.71e-01

Pos logit contributions
 bal+0.019
Whe+0.017
CR+0.016
Next+0.016
me+0.016
Neg logit contributions
iversity-0.034
 dearly-0.029
gency-0.029
hips-0.029
smart-0.028
Token wrote
Token ID2630
Feature activation+8.41e-01

Pos logit contributions
 bal+0.033
Whe+0.031
CR+0.029
Next+0.028
 Mir+0.028
Neg logit contributions
iversity-0.043
 dearly-0.036
gency-0.036
smart-0.035
assing-0.035
Token her
Token ID607
Feature activation+1.51e+00

Pos logit contributions
iversity+0.050
gency+0.044
assing+0.043
 dearly+0.043
hips+0.042
Neg logit contributions
 bal-0.024
 when-0.021
 she-0.020
 another-0.020
me-0.019
Token story
Token ID1621
Feature activation+7.51e+00

Pos logit contributions
iversity+0.091
assing+0.079
gency+0.078
hips+0.077
smart+0.076
Neg logit contributions
 bal-0.045
 when-0.040
 one-0.038
 another-0.038
me-0.034
Token about
Token ID546
Feature activation+3.68e-01

Pos logit contributions
iversity+0.432
gency+0.377
assing+0.370
 dearly+0.368
hips+0.361
Neg logit contributions
 bal-0.230
 when-0.194
me-0.186
CR-0.185
 she-0.184
Token dolphins
Token ID36798
Feature activation-1.34e+00

Pos logit contributions
iversity+0.025
gency+0.022
assing+0.022
hips+0.022
asser+0.021
Neg logit contributions
 when-0.008
 bal-0.008
 she-0.008
 one-0.007
 another-0.007
Token and
Token ID290
Feature activation-4.59e-01

Pos logit contributions
 bal+0.039
 when+0.032
me+0.031
CR+0.031
Whe+0.031
Neg logit contributions
iversity-0.079
gency-0.069
 dearly-0.068
assing-0.067
hips-0.067
Token how
Token ID703
Feature activation-2.12e+00

Pos logit contributions
 bal+0.016
CR+0.014
Whe+0.014
me+0.014
 when+0.013
Neg logit contributions
iversity-0.024
gency-0.021
 dearly-0.020
assing-0.020
hips-0.020
Token they
Token ID484
Feature activation-1.79e+00

Pos logit contributions
 bal+0.052
Whe+0.043
CR+0.039
me+0.039
 Mir+0.036
Neg logit contributions
iversity-0.136
 dearly-0.120
 facts-0.118
hips-0.116
smart-0.115
Token book
Token ID1492
Feature activation+3.75e+00

Pos logit contributions
iversity+0.061
smart+0.055
gency+0.055
 dearly+0.053
hips+0.053
Neg logit contributions
 bal-0.033
 sun-0.031
 one-0.030
 dizz-0.029
 another-0.029
Token.
Token ID13
Feature activation-3.22e-01

Pos logit contributions
iversity+0.214
 dearly+0.184
gency+0.183
hips+0.182
assing+0.180
Neg logit contributions
 bal-0.116
Whe-0.097
CR-0.096
me-0.094
 Mir-0.092
Token Ben
Token ID3932
Feature activation-1.69e+00

Pos logit contributions
 bal+0.014
me+0.012
CR+0.012
Whe+0.012
 when+0.012
Neg logit contributions
iversity-0.014
gency-0.012
 dearly-0.012
hips-0.012
assing-0.011
Token reads
Token ID9743
Feature activation+2.30e+00

Pos logit contributions
 bal+0.061
Whe+0.053
CR+0.052
me+0.051
Next+0.050
Neg logit contributions
iversity-0.087
assing-0.073
 facts-0.073
gency-0.072
 dearly-0.072
Token the
Token ID262
Feature activation+2.63e+00

Pos logit contributions
iversity+0.165
assing+0.152
gency+0.147
 dearly+0.145
isk+0.143
Neg logit contributions
 when-0.038
 another-0.037
 one-0.037
 Sara-0.034
 bal-0.033
Token story
Token ID1621
Feature activation+7.50e+00

Pos logit contributions
iversity+0.115
ishly+0.104
hips+0.103
assing+0.102
gency+0.099
Neg logit contributions
 one-0.107
 sun-0.105
 girl-0.103
 bal-0.102
 another-0.101
Token to
Token ID284
Feature activation+2.34e+00

Pos logit contributions
iversity+0.441
gency+0.388
assing+0.382
 dearly+0.376
hips+0.367
Neg logit contributions
 bal-0.222
 when-0.187
 she-0.178
me-0.177
 Mir-0.176
Token Lily
Token ID20037
Feature activation+1.27e+00

Pos logit contributions
iversity+0.181
gency+0.159
assing+0.156
 dearly+0.153
hips+0.152
Neg logit contributions
 Sara-0.036
 one-0.032
 another-0.032
 Mir-0.031
 when-0.026
Token.
Token ID13
Feature activation-9.15e-03

Pos logit contributions
iversity+0.077
gency+0.067
 dearly+0.066
assing+0.065
hips+0.065
Neg logit contributions
 bal-0.035
 when-0.028
CR-0.028
Whe-0.028
me-0.028
Token Lily
Token ID20037
Feature activation-1.52e+00

Pos logit contributions
 bal+0.000
me+0.000
CR+0.000
Whe+0.000
 when+0.000
Neg logit contributions
iversity-0.000
 dearly-0.000
gency-0.000
hips-0.000
 facts-0.000
Token listens
Token ID35019
Feature activation+1.84e+00

Pos logit contributions
 bal+0.059
CR+0.053
Whe+0.051
me+0.049
Next+0.049
Neg logit contributions
iversity-0.073
 facts-0.061
assing-0.060
gency-0.060
 dearly-0.060
Token couch
Token ID18507
Feature activation+2.45e+00

Pos logit contributions
iversity+0.070
smart+0.061
 dearly+0.061
hips+0.061
assing+0.060
Neg logit contributions
 bal-0.050
 when-0.046
 Mir-0.046
 she-0.046
 one-0.046
Token and
Token ID290
Feature activation-1.35e-01

Pos logit contributions
iversity+0.155
gency+0.136
assing+0.133
 dearly+0.133
hips+0.132
Neg logit contributions
 bal-0.060
 when-0.053
 another-0.048
 one-0.048
 she-0.047
Token mom
Token ID1995
Feature activation+1.03e-01

Pos logit contributions
 bal+0.006
CR+0.005
Next+0.005
me+0.005
Whe+0.004
Neg logit contributions
iversity-0.006
gency-0.005
assing-0.005
 facts-0.005
 tripod-0.005
Token reads
Token ID9743
Feature activation+2.38e+00

Pos logit contributions
iversity+0.005
gency+0.004
assing+0.004
hips+0.004
 facts+0.004
Neg logit contributions
 bal-0.004
CR-0.003
Next-0.003
Whe-0.003
me-0.003
Token the
Token ID262
Feature activation+2.53e+00

Pos logit contributions
iversity+0.151
gency+0.132
assing+0.131
 dearly+0.129
hips+0.128
Neg logit contributions
 bal-0.060
 when-0.052
 Sara-0.049
 another-0.049
 Mir-0.047
Token story
Token ID1621
Feature activation+7.49e+00

Pos logit contributions
iversity+0.103
assing+0.088
hips+0.087
 dearly+0.086
gency+0.085
Neg logit contributions
 one-0.111
 sun-0.111
 bal-0.110
 girl-0.106
 boy-0.106
Token.
Token ID13
Feature activation-2.73e-01

Pos logit contributions
iversity+0.399
hips+0.340
 dearly+0.340
 facts+0.337
gency+0.329
Neg logit contributions
 bal-0.257
Whe-0.224
CR-0.221
me-0.213
Next-0.207
Token Anna
Token ID11735
Feature activation-1.43e+00

Pos logit contributions
 bal+0.011
me+0.010
CR+0.009
Whe+0.009
 dipped+0.009
Neg logit contributions
iversity-0.013
 facts-0.011
 dearly-0.011
gency-0.011
hips-0.010
Token listens
Token ID35019
Feature activation+1.73e+00

Pos logit contributions
 bal+0.058
CR+0.052
Whe+0.051
Next+0.049
me+0.048
Neg logit contributions
iversity-0.066
 facts-0.055
assing-0.053
asser-0.052
gency-0.052
Token and
Token ID290
Feature activation+6.07e-01

Pos logit contributions
iversity+0.102
 dearly+0.087
gency+0.087
hips+0.086
assing+0.086
Neg logit contributions
 bal-0.052
Whe-0.043
CR-0.042
me-0.041
 when-0.040
Token looks
Token ID3073
Feature activation+2.21e+00

Pos logit contributions
iversity+0.023
 facts+0.022
assing+0.019
gency+0.019
 lens+0.019
Neg logit contributions
 bal-0.028
CR-0.026
me-0.025
Whe-0.024
ara-0.023
Token almost
Token ID2048
Feature activation-2.74e+00

Pos logit contributions
 bal+0.029
Whe+0.028
CR+0.026
Next+0.026
 Mir+0.025
Neg logit contributions
iversity-0.039
gency-0.032
 dearly-0.032
assing-0.030
 tripod-0.030
Token forgot
Token ID16453
Feature activation+1.90e-02

Pos logit contributions
 bal+0.097
Whe+0.088
me+0.085
 Mir+0.085
Next+0.082
Neg logit contributions
iversity-0.144
gency-0.120
assing-0.117
 dearly-0.116
smart-0.111
Token to
Token ID284
Feature activation-3.04e-01

Pos logit contributions
iversity+0.001
gency+0.001
assing+0.001
hips+0.001
 dearly+0.001
Neg logit contributions
 bal-0.001
 when-0.001
 she-0.001
 Sara-0.001
 one-0.001
Token write
Token ID3551
Feature activation+1.34e+00

Pos logit contributions
 bal+0.010
CR+0.008
Whe+0.008
me+0.008
 when+0.008
Neg logit contributions
iversity-0.017
gency-0.015
 dearly-0.014
assing-0.014
hips-0.014
Token her
Token ID607
Feature activation+1.70e+00

Pos logit contributions
iversity+0.081
gency+0.071
assing+0.070
 dearly+0.069
hips+0.068
Neg logit contributions
 bal-0.037
 when-0.032
 another-0.030
 she-0.030
 one-0.030
Token story
Token ID1621
Feature activation+7.49e+00

Pos logit contributions
iversity+0.093
gency+0.079
 dearly+0.078
assing+0.078
hips+0.078
Neg logit contributions
 bal-0.057
 when-0.048
Whe-0.047
me-0.047
CR-0.047
Token.
Token ID13
Feature activation+5.67e-01

Pos logit contributions
iversity+0.413
gency+0.354
 dearly+0.350
assing+0.347
hips+0.344
Neg logit contributions
 bal-0.246
 when-0.206
CR-0.203
me-0.203
Whe-0.203
Token She
Token ID1375
Feature activation+2.27e+00

Pos logit contributions
iversity+0.029
 dearly+0.025
gency+0.024
assing+0.023
hips+0.023
Neg logit contributions
 bal-0.021
CR-0.019
me-0.018
Whe-0.018
 Mir-0.017
Token quickly
Token ID2952
Feature activation+1.80e+00

Pos logit contributions
iversity+0.116
 dearly+0.098
gency+0.096
 facts+0.092
hips+0.091
Neg logit contributions
 bal-0.082
Whe-0.078
CR-0.072
Next-0.069
 Mir-0.068
Token grabbed
Token ID13176
Feature activation+9.97e-01

Pos logit contributions
iversity+0.100
gency+0.085
 dearly+0.084
assing+0.082
hips+0.082
Neg logit contributions
 bal-0.058
Whe-0.052
CR-0.048
me-0.048
 Mir-0.048
Token a
Token ID257
Feature activation+2.22e+00

Pos logit contributions
iversity+0.065
gency+0.056
 dearly+0.056
assing+0.056
hips+0.055
Neg logit contributions
 bal-0.024
Whe-0.018
CR-0.018
 when-0.018
me-0.018
Token help
Token ID1037
Feature activation+7.45e-01

Pos logit contributions
 bal+0.029
 when+0.024
CR+0.024
Whe+0.023
 Sara+0.023
Neg logit contributions
iversity-0.051
gency-0.044
 dearly-0.044
hips-0.043
assing-0.043
Token,
Token ID11
Feature activation-3.33e-01

Pos logit contributions
iversity+0.042
 dearly+0.036
gency+0.035
assing+0.035
hips+0.035
Neg logit contributions
 bal-0.024
Whe-0.021
CR-0.020
me-0.019
Next-0.019
Token but
Token ID475
Feature activation-2.17e+00

Pos logit contributions
 bal+0.011
Whe+0.010
CR+0.010
me+0.010
ara+0.009
Neg logit contributions
iversity-0.018
 dearly-0.015
gency-0.015
smart-0.015
assing-0.015
Token no
Token ID645
Feature activation-3.70e+00

Pos logit contributions
 bal+0.063
Whe+0.062
CR+0.059
me+0.055
Next+0.055
Neg logit contributions
iversity-0.122
 dearly-0.106
gency-0.102
ishly-0.100
hips-0.100
Token-
Token ID12
Feature activation-3.65e+00

Pos logit contributions
Whe+0.100
CR+0.096
 bal+0.094
 Mir+0.087
me+0.084
Neg logit contributions
iversity-0.199
 dearly-0.192
smart-0.185
 facts-0.183
gency-0.178
Tokenone
Token ID505
Feature activation-8.23e+00

Pos logit contributions
 bal+0.124
Whe+0.112
 Mir+0.111
CR+0.105
 Sara+0.105
Neg logit contributions
iversity-0.196
smart-0.165
 dearly-0.165
gency-0.160
 facts-0.156
Token could
Token ID714
Feature activation-2.74e+00

Pos logit contributions
 bal+0.309
Whe+0.285
CR+0.266
Next+0.261
me+0.259
Neg logit contributions
iversity-0.415
 dearly-0.353
smart-0.347
 facts-0.343
gency-0.341
Token open
Token ID1280
Feature activation-6.81e-01

Pos logit contributions
 bal+0.103
Whe+0.097
CR+0.094
ara+0.090
 Mir+0.089
Neg logit contributions
iversity-0.131
 dearly-0.114
assing-0.112
hips-0.112
gency-0.110
Token it
Token ID340
Feature activation-8.22e-01

Pos logit contributions
 bal+0.019
Whe+0.015
CR+0.015
me+0.015
 Mir+0.014
Neg logit contributions
iversity-0.041
gency-0.036
 dearly-0.036
hips-0.035
assing-0.035
Token.
Token ID13
Feature activation-4.24e-01

Pos logit contributions
 bal+0.027
Whe+0.024
CR+0.023
me+0.022
 Mir+0.022
Neg logit contributions
iversity-0.045
 dearly-0.039
hips-0.038
gency-0.038
assing-0.038
Token
Token ID198
Feature activation-9.21e-01

Pos logit contributions
 bal+0.016
 dipped+0.014
me+0.014
CR+0.014
Whe+0.013
Neg logit contributions
iversity-0.021
 dearly-0.018
gency-0.017
smart-0.017
assing-0.016
Token was
Token ID373
Feature activation-1.13e+00

Pos logit contributions
 bal+0.016
Whe+0.014
CR+0.014
me+0.013
 when+0.013
Neg logit contributions
iversity-0.029
gency-0.024
 dearly-0.024
hips-0.024
assing-0.024
Token destroyed
Token ID6572
Feature activation+3.02e-01

Pos logit contributions
 bal+0.041
Whe+0.035
CR+0.035
me+0.034
 when+0.034
Neg logit contributions
iversity-0.059
 dearly-0.050
gency-0.050
hips-0.049
assing-0.049
Token and
Token ID290
Feature activation-9.14e-01

Pos logit contributions
iversity+0.017
 dearly+0.015
gency+0.015
hips+0.014
assing+0.014
Neg logit contributions
 bal-0.009
Whe-0.008
CR-0.008
me-0.008
 Mir-0.008
Token no
Token ID645
Feature activation-3.24e+00

Pos logit contributions
 bal+0.026
Whe+0.021
CR+0.021
me+0.020
 when+0.020
Neg logit contributions
iversity-0.055
 dearly-0.048
gency-0.048
hips-0.047
assing-0.047
Token-
Token ID12
Feature activation-2.66e+00

Pos logit contributions
 bal+0.083
CR+0.074
Whe+0.074
me+0.070
Next+0.064
Neg logit contributions
iversity-0.189
 dearly-0.186
 facts-0.179
hips-0.169
gency-0.167
Tokenone
Token ID505
Feature activation-7.50e+00

Pos logit contributions
 bal+0.096
Whe+0.083
CR+0.080
 Mir+0.080
me+0.079
Neg logit contributions
iversity-0.140
 dearly-0.117
gency-0.116
hips-0.114
smart-0.114
Token was
Token ID373
Feature activation-2.23e+00

Pos logit contributions
 bal+0.268
Whe+0.244
CR+0.228
Next+0.224
 Mir+0.223
Neg logit contributions
iversity-0.393
 dearly-0.335
smart-0.328
 facts-0.324
gency-0.322
Token able
Token ID1498
Feature activation-2.91e+00

Pos logit contributions
 bal+0.088
Whe+0.078
CR+0.077
me+0.076
Next+0.074
Neg logit contributions
iversity-0.109
 dearly-0.093
gency-0.090
assing-0.090
hips-0.089
Token to
Token ID284
Feature activation-1.18e+00

Pos logit contributions
 bal+0.096
Whe+0.086
me+0.082
CR+0.080
 Mir+0.078
Neg logit contributions
iversity-0.161
 dearly-0.140
gency-0.135
smart-0.134
assing-0.133
Token help
Token ID1037
Feature activation+9.41e-01

Pos logit contributions
 bal+0.040
 when+0.034
me+0.033
CR+0.033
Whe+0.033
Neg logit contributions
iversity-0.065
gency-0.055
 dearly-0.054
 facts-0.053
assing-0.053
Token her
Token ID607
Feature activation-1.43e+00

Pos logit contributions
iversity+0.056
 dearly+0.049
gency+0.048
hips+0.048
assing+0.048
Neg logit contributions
 bal-0.027
Whe-0.022
CR-0.022
me-0.021
Next-0.021
Token somewhere
Token ID7382
Feature activation+5.47e-01

Pos logit contributions
iversity+0.193
 dearly+0.165
hips+0.161
gency+0.161
assing+0.158
Neg logit contributions
 bal-0.127
Whe-0.112
CR-0.109
me-0.109
 Mir-0.109
Token safe
Token ID3338
Feature activation+9.65e-01

Pos logit contributions
iversity+0.028
 dearly+0.025
hips+0.024
smart+0.023
gency+0.023
Neg logit contributions
 bal-0.019
Whe-0.018
CR-0.017
me-0.017
 Mir-0.017
Token so
Token ID523
Feature activation+1.59e+00

Pos logit contributions
iversity+0.060
gency+0.052
assing+0.051
 dearly+0.051
 facts+0.050
Neg logit contributions
 bal-0.025
 when-0.022
 one-0.020
 she-0.020
 another-0.020
Token no
Token ID645
Feature activation-3.34e+00

Pos logit contributions
iversity+0.081
 dearly+0.078
hips+0.074
 facts+0.069
smart+0.069
Neg logit contributions
Whe-0.051
 bal-0.050
me-0.047
CR-0.043
Next-0.043
Token-
Token ID12
Feature activation-2.49e+00

Pos logit contributions
Whe+0.071
CR+0.071
 bal+0.070
me+0.068
Next+0.057
Neg logit contributions
 dearly-0.190
iversity-0.185
 facts-0.183
gency-0.175
smart-0.175
Tokenone
Token ID505
Feature activation-7.45e+00

Pos logit contributions
 bal+0.089
Whe+0.080
 Mir+0.077
 Sara+0.074
CR+0.073
Neg logit contributions
iversity-0.132
 dearly-0.110
smart-0.110
gency-0.109
assing-0.106
Token would
Token ID561
Feature activation-1.76e+00

Pos logit contributions
 bal+0.272
Whe+0.262
Next+0.236
me+0.230
CR+0.229
Neg logit contributions
iversity-0.379
smart-0.325
 dearly-0.320
 facts-0.318
gency-0.308
Token lose
Token ID4425
Feature activation+1.58e-01

Pos logit contributions
 bal+0.056
 when+0.047
CR+0.046
 she+0.045
me+0.045
Neg logit contributions
iversity-0.100
gency-0.085
 dearly-0.084
assing-0.083
 facts-0.083
Token it
Token ID340
Feature activation-1.19e+00

Pos logit contributions
iversity+0.010
 dearly+0.009
gency+0.009
hips+0.008
assing+0.008
Neg logit contributions
 bal-0.004
Whe-0.003
CR-0.003
me-0.003
 when-0.003
Token.
Token ID13
Feature activation-1.77e-01

Pos logit contributions
 bal+0.039
Whe+0.034
CR+0.033
me+0.032
 Mir+0.032
Neg logit contributions
iversity-0.066
 dearly-0.057
gency-0.055
hips-0.055
assing-0.055
Token He
Token ID679
Feature activation+1.91e+00

Pos logit contributions
 bal+0.007
me+0.006
CR+0.006
Whe+0.006
 dipped+0.006
Neg logit contributions
iversity-0.009
 dearly-0.007
gency-0.007
smart-0.007
hips-0.007
Token kitchen
Token ID9592
Feature activation+1.25e+00

Pos logit contributions
iversity+0.004
 dearly+0.003
gency+0.003
assing+0.003
 lens+0.003
Neg logit contributions
 bal-0.002
CR-0.002
Whe-0.002
 Mir-0.002
 Sara-0.002
Token,
Token ID11
Feature activation+3.07e-01

Pos logit contributions
iversity+0.071
 dearly+0.061
gency+0.061
hips+0.060
assing+0.060
Neg logit contributions
 bal-0.038
Whe-0.032
CR-0.032
me-0.031
 Mir-0.031
Token but
Token ID475
Feature activation-1.84e+00

Pos logit contributions
iversity+0.016
gency+0.014
 dearly+0.014
assing+0.013
hips+0.013
Neg logit contributions
 bal-0.011
CR-0.009
Whe-0.009
 when-0.009
me-0.009
Token no
Token ID645
Feature activation-2.94e+00

Pos logit contributions
 bal+0.057
Whe+0.050
me+0.048
CR+0.045
ara+0.045
Neg logit contributions
iversity-0.106
 dearly-0.094
gency-0.090
hips-0.090
assing-0.088
Token-
Token ID12
Feature activation-3.03e+00

Pos logit contributions
CR+0.083
Whe+0.083
 bal+0.082
me+0.074
Next+0.071
Neg logit contributions
iversity-0.150
 dearly-0.146
 facts-0.142
gency-0.141
smart-0.132
Tokenone
Token ID505
Feature activation-7.37e+00

Pos logit contributions
 bal+0.087
Whe+0.073
 Mir+0.069
 Sara+0.068
CR+0.067
Neg logit contributions
iversity-0.183
 dearly-0.156
gency-0.156
smart-0.153
assing-0.152
Token knew
Token ID2993
Feature activation+9.13e-01

Pos logit contributions
 bal+0.277
Whe+0.258
CR+0.237
Next+0.236
me+0.231
Neg logit contributions
iversity-0.370
 dearly-0.313
smart-0.312
gency-0.302
 facts-0.298
Token why
Token ID1521
Feature activation-2.42e-02

Pos logit contributions
iversity+0.036
 dearly+0.029
gency+0.029
hips+0.028
smart+0.028
Neg logit contributions
 bal-0.044
Whe-0.041
me-0.040
CR-0.040
Next-0.038
Token.
Token ID13
Feature activation+4.59e-02

Pos logit contributions
 bal+0.001
 when+0.001
 she+0.000
CR+0.000
 another+0.000
Neg logit contributions
iversity-0.002
gency-0.001
assing-0.001
 dearly-0.001
hips-0.001
Token The
Token ID383
Feature activation-2.50e-02

Pos logit contributions
iversity+0.002
 dearly+0.002
gency+0.002
hips+0.002
assing+0.002
Neg logit contributions
 bal-0.002
me-0.002
Whe-0.002
CR-0.002
 when-0.001
Token people
Token ID661
Feature activation+5.87e-01

Pos logit contributions
 bal+0.001
CR+0.001
 when+0.001
Whe+0.001
me+0.001
Neg logit contributions
iversity-0.001
gency-0.001
 dearly-0.001
hips-0.001
assing-0.001
Token very
Token ID845
Feature activation+7.45e-01

Pos logit contributions
 bal+0.113
Whe+0.100
CR+0.098
me+0.097
 Mir+0.095
Neg logit contributions
iversity-0.170
 dearly-0.146
gency-0.141
 facts-0.141
hips-0.140
Token special
Token ID2041
Feature activation-3.40e+00

Pos logit contributions
iversity+0.036
 dearly+0.031
 facts+0.029
gency+0.029
hips+0.029
Neg logit contributions
 bal-0.029
Whe-0.026
CR-0.026
me-0.025
 Mir-0.025
Token.
Token ID13
Feature activation-6.56e-01

Pos logit contributions
 bal+0.109
Whe+0.091
CR+0.091
me+0.089
 Mir+0.087
Neg logit contributions
iversity-0.191
 dearly-0.163
gency-0.163
hips-0.160
assing-0.159
Token Something
Token ID13742
Feature activation+1.39e+00

Pos logit contributions
 bal+0.026
 dipped+0.022
Whe+0.022
me+0.022
CR+0.022
Neg logit contributions
iversity-0.031
 dearly-0.027
gency-0.026
smart-0.025
hips-0.025
Token no
Token ID645
Feature activation-4.55e+00

Pos logit contributions
iversity+0.077
gency+0.066
 dearly+0.066
assing+0.065
hips+0.065
Neg logit contributions
 bal-0.045
CR-0.037
Whe-0.037
me-0.037
 when-0.036
Token one
Token ID530
Feature activation-7.29e+00

Pos logit contributions
Whe+0.143
 bal+0.139
CR+0.137
Next+0.129
me+0.119
Neg logit contributions
iversity-0.234
 dearly-0.222
 facts-0.210
smart-0.201
gency-0.199
Token else
Token ID2073
Feature activation-5.64e+00

Pos logit contributions
 bal+0.265
Whe+0.250
CR+0.234
Next+0.232
me+0.232
Neg logit contributions
iversity-0.368
 dearly-0.315
 facts-0.312
smart-0.303
gency-0.296
Token had
Token ID550
Feature activation+4.59e-02

Pos logit contributions
 bal+0.186
CR+0.155
Whe+0.154
 when+0.153
me+0.153
Neg logit contributions
iversity-0.310
gency-0.264
 dearly-0.263
assing-0.259
hips-0.259
Token.
Token ID13
Feature activation-6.03e-01

Pos logit contributions
iversity+0.002
 dearly+0.002
gency+0.002
hips+0.002
 facts+0.002
Neg logit contributions
 bal-0.002
Whe-0.002
me-0.001
 Mir-0.001
CR-0.001
Token He
Token ID679
Feature activation+1.26e+00

Pos logit contributions
 bal+0.023
Whe+0.020
me+0.020
 dipped+0.020
CR+0.020
Neg logit contributions
iversity-0.029
 dearly-0.025
gency-0.024
hips-0.024
smart-0.024
Token went
Token ID1816
Feature activation-2.82e-01

Pos logit contributions
iversity+0.067
 dearly+0.058
gency+0.057
assing+0.056
 facts+0.055
Neg logit contributions
 bal-0.043
Whe-0.038
CR-0.037
Next-0.035
me-0.035
Token of
Token ID286
Feature activation-3.01e+00

Pos logit contributions
 bal+0.185
CR+0.156
Whe+0.156
me+0.153
 when+0.152
Neg logit contributions
iversity-0.289
gency-0.245
 dearly-0.244
hips-0.240
assing-0.240
Token the
Token ID262
Feature activation-1.86e+00

Pos logit contributions
 bal+0.069
 one+0.066
 another+0.063
 when+0.063
 Sara+0.062
Neg logit contributions
iversity-0.192
gency-0.170
assing-0.167
asser-0.166
 dearly-0.165
Token biggest
Token ID4094
Feature activation-4.10e+00

Pos logit contributions
 bal+0.058
 when+0.049
 one+0.048
Whe+0.047
me+0.046
Neg logit contributions
iversity-0.107
gency-0.091
 dearly-0.090
hips-0.089
assing-0.089
Token and
Token ID290
Feature activation-1.89e+00

Pos logit contributions
 bal+0.147
 when+0.124
CR+0.120
 one+0.120
me+0.120
Neg logit contributions
iversity-0.218
gency-0.183
 dearly-0.181
hips-0.181
assing-0.181
Token gent
Token ID25049
Feature activation+1.80e+00

Pos logit contributions
 bal+0.067
CR+0.059
Whe+0.057
me+0.056
Next+0.055
Neg logit contributions
iversity-0.099
 dearly-0.085
gency-0.084
 facts-0.083
smart-0.082
Tokenlest
Token ID32712
Feature activation-7.25e+00

Pos logit contributions
iversity+0.097
gency+0.086
 dearly+0.084
assing+0.083
hips+0.082
Neg logit contributions
 bal-0.059
me-0.052
 when-0.050
CR-0.049
 one-0.047
Token dinosaurs
Token ID27199
Feature activation-6.43e-01

Pos logit contributions
 bal+0.211
CR+0.171
Whe+0.171
me+0.168
 when+0.167
Neg logit contributions
iversity-0.429
gency-0.370
 dearly-0.368
assing-0.363
hips-0.363
Token ever
Token ID1683
Feature activation+3.98e-01

Pos logit contributions
 bal+0.022
Whe+0.021
CR+0.020
Next+0.019
me+0.019
Neg logit contributions
iversity-0.034
 dearly-0.029
hips-0.029
 facts-0.028
smart-0.028
Token,"
Token ID553
Feature activation-2.42e+00

Pos logit contributions
iversity+0.021
 dearly+0.018
 facts+0.018
smart+0.018
gency+0.017
Neg logit contributions
 bal-0.014
Whe-0.013
CR-0.012
 Mir-0.012
me-0.011
Token mom
Token ID1995
Feature activation-9.32e-01

Pos logit contributions
 bal+0.073
CR+0.062
Whe+0.061
me+0.059
Next+0.055
Neg logit contributions
iversity-0.137
 dearly-0.119
gency-0.118
smart-0.117
hips-0.117
Token explained
Token ID4893
Feature activation+8.02e-01

Pos logit contributions
 bal+0.025
Whe+0.023
CR+0.022
Next+0.018
me+0.018
Neg logit contributions
iversity-0.056
smart-0.048
gency-0.048
 dearly-0.047
 facts-0.047
Token grass
Token ID8701
Feature activation-3.73e+00

Pos logit contributions
 bal+0.069
Whe+0.060
CR+0.059
me+0.058
 Mir+0.056
Neg logit contributions
iversity-0.120
 dearly-0.104
gency-0.102
hips-0.100
 facts-0.099
Tokeny
Token ID88
Feature activation-6.94e-01

Pos logit contributions
 bal+0.120
Whe+0.108
CR+0.105
me+0.103
 Mir+0.101
Neg logit contributions
iversity-0.203
 dearly-0.174
hips-0.171
gency-0.169
 lens-0.166
Token field
Token ID2214
Feature activation-8.81e-01

Pos logit contributions
 bal+0.022
Whe+0.019
CR+0.018
me+0.018
 Mir+0.018
Neg logit contributions
iversity-0.039
 dearly-0.033
gency-0.033
hips-0.032
 lens-0.032
Token and
Token ID290
Feature activation-1.46e+00

Pos logit contributions
 bal+0.027
Whe+0.023
CR+0.022
me+0.022
 Mir+0.022
Neg logit contributions
iversity-0.050
 dearly-0.043
gency-0.043
hips-0.042
assing-0.042
Token no
Token ID645
Feature activation-4.51e+00

Pos logit contributions
 bal+0.048
CR+0.042
me+0.042
Whe+0.041
ara+0.040
Neg logit contributions
iversity-0.078
 dearly-0.067
gency-0.066
 facts-0.064
assing-0.064
Token one
Token ID530
Feature activation-7.20e+00

Pos logit contributions
CR+0.119
Whe+0.118
 bal+0.110
me+0.107
 Mir+0.101
Neg logit contributions
 dearly-0.240
iversity-0.224
 facts-0.218
 brand-0.211
 unison-0.207
Token dared
Token ID28765
Feature activation-3.04e+00

Pos logit contributions
 bal+0.244
Whe+0.232
me+0.222
CR+0.221
Next+0.215
Neg logit contributions
iversity-0.372
 dearly-0.329
smart-0.313
 facts-0.312
gency-0.306
Token to
Token ID284
Feature activation+5.53e-02

Pos logit contributions
 bal+0.092
Whe+0.078
CR+0.077
me+0.076
Next+0.073
Neg logit contributions
iversity-0.174
 dearly-0.151
gency-0.150
hips-0.147
assing-0.147
Token come
Token ID1282
Feature activation-5.23e-01

Pos logit contributions
iversity+0.003
gency+0.002
assing+0.002
hips+0.002
 dearly+0.002
Neg logit contributions
 bal-0.002
Whe-0.002
CR-0.002
me-0.002
Next-0.002
Token near
Token ID1474
Feature activation-7.15e-01

Pos logit contributions
 bal+0.019
Whe+0.017
CR+0.016
me+0.016
Next+0.016
Neg logit contributions
iversity-0.026
hips-0.023
 dearly-0.023
assing-0.022
gency-0.022
Token him
Token ID683
Feature activation+1.53e-01

Pos logit contributions
 bal+0.017
CR+0.013
Whe+0.013
 when+0.013
me+0.013
Neg logit contributions
iversity-0.046
gency-0.040
 dearly-0.040
assing-0.040
hips-0.040
Token help
Token ID1037
Feature activation-3.03e-01

Pos logit contributions
 bal+0.089
Whe+0.081
CR+0.076
me+0.075
Next+0.071
Neg logit contributions
iversity-0.120
 facts-0.108
smart-0.107
 dearly-0.106
hips-0.104
Token them
Token ID606
Feature activation-2.66e+00

Pos logit contributions
 bal+0.009
Whe+0.008
CR+0.007
ara+0.007
 Mir+0.007
Neg logit contributions
iversity-0.017
 dearly-0.015
smart-0.015
hips-0.015
 facts-0.015
Token.
Token ID13
Feature activation-6.72e-01

Pos logit contributions
 bal+0.087
Whe+0.079
CR+0.076
 Mir+0.074
ara+0.074
Neg logit contributions
iversity-0.143
 dearly-0.123
hips-0.122
gency-0.120
assing-0.119
Token But
Token ID887
Feature activation-2.98e+00

Pos logit contributions
 bal+0.027
me+0.024
Whe+0.023
 dipped+0.023
CR+0.023
Neg logit contributions
iversity-0.031
 dearly-0.026
gency-0.026
hips-0.026
smart-0.025
Token no
Token ID645
Feature activation-3.94e+00

Pos logit contributions
 bal+0.088
Whe+0.083
CR+0.077
me+0.076
Next+0.075
Neg logit contributions
iversity-0.166
 dearly-0.144
gency-0.144
hips-0.142
assing-0.140
Token one
Token ID530
Feature activation-7.16e+00

Pos logit contributions
Whe+0.091
 bal+0.089
CR+0.079
me+0.074
 Mir+0.072
Neg logit contributions
iversity-0.232
 dearly-0.218
smart-0.213
 facts-0.210
gency-0.207
Token sees
Token ID7224
Feature activation-1.83e+00

Pos logit contributions
 bal+0.283
Whe+0.272
Next+0.257
me+0.252
CR+0.246
Neg logit contributions
iversity-0.342
smart-0.284
 dearly-0.284
 facts-0.278
gency-0.269
Token them
Token ID606
Feature activation-2.50e+00

Pos logit contributions
 bal+0.058
Whe+0.051
me+0.048
CR+0.047
Next+0.046
Neg logit contributions
iversity-0.103
 dearly-0.088
gency-0.088
hips-0.087
 facts-0.086
Token.
Token ID13
Feature activation-6.20e-01

Pos logit contributions
 bal+0.073
 when+0.061
CR+0.059
me+0.058
 she+0.057
Neg logit contributions
iversity-0.147
gency-0.127
 dearly-0.126
 facts-0.124
hips-0.124
Token No
Token ID1400
Feature activation-2.43e+00

Pos logit contributions
 bal+0.026
me+0.022
Whe+0.022
CR+0.022
 when+0.022
Neg logit contributions
iversity-0.028
gency-0.023
 dearly-0.023
hips-0.023
assing-0.023
Token one
Token ID530
Feature activation-6.54e+00

Pos logit contributions
 bal+0.048
Whe+0.039
CR+0.036
Next+0.034
me+0.034
Neg logit contributions
iversity-0.164
gency-0.147
 dearly-0.147
 facts-0.147
smart-0.144
Token so
Token ID523
Feature activation+2.53e+00

Pos logit contributions
 bal+0.041
CR+0.034
me+0.034
Whe+0.034
 Mir+0.033
Neg logit contributions
iversity-0.069
 dearly-0.059
gency-0.059
assing-0.058
hips-0.057
Token fast
Token ID3049
Feature activation-1.22e-01

Pos logit contributions
iversity+0.124
 dearly+0.106
hips+0.101
smart+0.101
assing+0.100
Neg logit contributions
 bal-0.099
Whe-0.087
me-0.086
 Mir-0.083
CR-0.082
Token that
Token ID326
Feature activation-6.26e-01

Pos logit contributions
 bal+0.004
me+0.004
Whe+0.004
CR+0.004
 Mir+0.003
Neg logit contributions
iversity-0.007
 dearly-0.006
gency-0.005
assing-0.005
hips-0.005
Token no
Token ID645
Feature activation-2.75e+00

Pos logit contributions
 bal+0.020
ara+0.018
Whe+0.018
me+0.017
CR+0.017
Neg logit contributions
iversity-0.033
 dearly-0.029
 facts-0.028
smart-0.028
gency-0.028
Token-
Token ID12
Feature activation-2.21e+00

Pos logit contributions
 bal+0.056
CR+0.051
Whe+0.046
me+0.045
 Mir+0.044
Neg logit contributions
iversity-0.175
 dearly-0.166
 facts-0.158
gency-0.158
smart-0.156
Tokenone
Token ID505
Feature activation-7.14e+00

Pos logit contributions
 bal+0.061
 Mir+0.051
Whe+0.051
 Sara+0.049
CR+0.047
Neg logit contributions
iversity-0.136
 dearly-0.116
smart-0.114
gency-0.114
assing-0.112
Token could
Token ID714
Feature activation-2.09e+00

Pos logit contributions
 bal+0.263
Whe+0.249
Next+0.229
 Mir+0.225
CR+0.223
Neg logit contributions
iversity-0.365
smart-0.309
 dearly-0.304
 facts-0.295
gency-0.291
Token catch
Token ID4929
Feature activation-2.49e+00

Pos logit contributions
 bal+0.071
CR+0.059
Whe+0.059
me+0.058
 when+0.058
Neg logit contributions
iversity-0.114
gency-0.097
 dearly-0.097
assing-0.095
hips-0.095
Token him
Token ID683
Feature activation+3.77e-01

Pos logit contributions
 bal+0.083
CR+0.070
Whe+0.070
me+0.069
 when+0.068
Neg logit contributions
iversity-0.136
gency-0.116
 dearly-0.115
assing-0.114
hips-0.113
Token.
Token ID13
Feature activation+8.89e-02

Pos logit contributions
iversity+0.023
gency+0.020
 dearly+0.019
hips+0.019
assing+0.019
Neg logit contributions
 bal-0.010
 when-0.009
 another-0.008
CR-0.008
me-0.008
Token He
Token ID679
Feature activation+2.22e+00

Pos logit contributions
iversity+0.004
 dearly+0.004
gency+0.004
hips+0.003
assing+0.003
Neg logit contributions
 bal-0.004
me-0.003
CR-0.003
 dipped-0.003
Whe-0.003
Token passed
Token ID3804
Feature activation+4.88e-01

Pos logit contributions
 bal+0.075
Whe+0.070
 Mir+0.064
Next+0.062
 dipped+0.061
Neg logit contributions
iversity-0.140
smart-0.119
gency-0.118
assing-0.112
hips-0.110
Token,
Token ID11
Feature activation+9.57e-01

Pos logit contributions
iversity+0.027
 dearly+0.024
gency+0.023
hips+0.023
assing+0.023
Neg logit contributions
 bal-0.016
Whe-0.013
me-0.013
CR-0.013
 Mir-0.013
Token but
Token ID475
Feature activation-1.42e+00

Pos logit contributions
iversity+0.052
 dearly+0.045
gency+0.044
assing+0.044
hips+0.044
Neg logit contributions
 bal-0.032
me-0.027
Whe-0.027
CR-0.027
 Mir-0.025
Token no
Token ID645
Feature activation-2.74e+00

Pos logit contributions
 bal+0.041
Whe+0.037
CR+0.035
me+0.035
ara+0.033
Neg logit contributions
iversity-0.083
 dearly-0.074
gency-0.072
hips-0.070
assing-0.069
Token-
Token ID12
Feature activation-2.42e+00

Pos logit contributions
 bal+0.095
CR+0.094
Whe+0.094
me+0.087
Next+0.086
Neg logit contributions
 dearly-0.123
iversity-0.122
gency-0.113
 facts-0.113
 brand-0.111
Tokenone
Token ID505
Feature activation-7.07e+00

Pos logit contributions
 bal+0.075
Whe+0.066
 Mir+0.062
 Sara+0.062
 asks+0.058
Neg logit contributions
iversity-0.142
 dearly-0.120
gency-0.119
smart-0.118
assing-0.112
Token came
Token ID1625
Feature activation-2.42e+00

Pos logit contributions
 bal+0.261
Whe+0.245
CR+0.225
Next+0.224
 Mir+0.218
Neg logit contributions
iversity-0.359
 dearly-0.306
smart-0.302
gency-0.295
 facts-0.290
Token and
Token ID290
Feature activation-2.61e-01

Pos logit contributions
 bal+0.086
Whe+0.077
me+0.073
CR+0.072
 Mir+0.071
Neg logit contributions
iversity-0.125
 dearly-0.110
gency-0.107
hips-0.107
assing-0.105
Token it
Token ID340
Feature activation-1.52e+00

Pos logit contributions
 bal+0.008
Whe+0.007
me+0.007
Next+0.007
ara+0.007
Neg logit contributions
iversity-0.015
gency-0.013
 dearly-0.013
smart-0.013
ishly-0.012
Token started
Token ID2067
Feature activation-4.08e+00

Pos logit contributions
Whe+0.061
 bal+0.060
Next+0.055
 Mir+0.054
ara+0.053
Neg logit contributions
iversity-0.070
 dearly-0.061
isk-0.056
hips-0.056
gency-0.056
Token to
Token ID284
Feature activation-7.26e-01

Pos logit contributions
 bal+0.113
 when+0.092
CR+0.091
Whe+0.088
me+0.087
Neg logit contributions
iversity-0.248
gency-0.213
 dearly-0.212
hips-0.210
assing-0.210
Token be
Token ID307
Feature activation-1.39e+00

Pos logit contributions
iversity+0.013
 dearly+0.011
gency+0.011
assing+0.011
hips+0.011
Neg logit contributions
 bal-0.008
Whe-0.007
CR-0.007
 Mir-0.006
me-0.006
Token clean
Token ID3424
Feature activation+1.91e+00

Pos logit contributions
 bal+0.053
Whe+0.048
CR+0.046
me+0.046
 Mir+0.045
Neg logit contributions
iversity-0.069
 dearly-0.058
assing-0.057
smart-0.057
gency-0.057
Token and
Token ID290
Feature activation+1.39e-01

Pos logit contributions
iversity+0.096
 dearly+0.086
gency+0.082
hips+0.081
assing+0.081
Neg logit contributions
 bal-0.066
Whe-0.060
CR-0.060
Next-0.057
 Mir-0.057
Token no
Token ID645
Feature activation-2.76e+00

Pos logit contributions
iversity+0.008
 dearly+0.007
gency+0.006
assing+0.006
smart+0.006
Neg logit contributions
 bal-0.005
Whe-0.004
 Mir-0.004
CR-0.004
me-0.004
Token-
Token ID12
Feature activation-1.80e+00

Pos logit contributions
CR+0.084
 bal+0.081
 Mir+0.078
me+0.075
Whe+0.074
Neg logit contributions
iversity-0.135
 dearly-0.132
gency-0.124
 facts-0.121
smart-0.121
Tokenone
Token ID505
Feature activation-7.04e+00

Pos logit contributions
 bal+0.059
 Mir+0.054
Whe+0.051
 dipped+0.048
 sun+0.048
Neg logit contributions
iversity-0.099
gency-0.083
smart-0.083
 dearly-0.081
assing-0.080
Token will
Token ID481
Feature activation-1.00e+00

Pos logit contributions
 bal+0.239
Whe+0.223
 Mir+0.209
Next+0.209
CR+0.205
Neg logit contributions
iversity-0.369
smart-0.318
 dearly-0.313
 facts-0.307
gency-0.301
Token want
Token ID765
Feature activation-2.26e+00

Pos logit contributions
 bal+0.033
CR+0.027
 when+0.027
Whe+0.027
me+0.027
Neg logit contributions
iversity-0.056
gency-0.048
 dearly-0.047
assing-0.046
hips-0.046
Token to
Token ID284
Feature activation+3.74e-01

Pos logit contributions
 bal+0.069
Whe+0.065
me+0.060
CR+0.060
 Mir+0.059
Neg logit contributions
iversity-0.120
 dearly-0.112
gency-0.104
smart-0.103
 facts-0.102
Token play
Token ID711
Feature activation-2.43e-01

Pos logit contributions
iversity+0.019
 dearly+0.017
gency+0.017
assing+0.017
hips+0.016
Neg logit contributions
 bal-0.013
CR-0.011
Whe-0.011
 Mir-0.011
me-0.011
Token with
Token ID351
Feature activation+4.55e-01

Pos logit contributions
 bal+0.007
CR+0.006
 Mir+0.006
Whe+0.006
Next+0.006
Neg logit contributions
iversity-0.014
 dearly-0.012
gency-0.012
assing-0.012
smart-0.012
Token looked
Token ID3114
Feature activation+2.74e+00

Pos logit contributions
iversity+0.148
igsaw+0.140
 facts+0.134
hips+0.129
gency+0.129
Neg logit contributions
 asked-0.064
 when-0.056
 she-0.053
 bal-0.052
 one-0.052
Token around
Token ID1088
Feature activation-8.58e-01

Pos logit contributions
iversity+0.177
gency+0.156
assing+0.152
 facts+0.151
hips+0.150
Neg logit contributions
 bal-0.069
 when-0.065
 one-0.057
 another-0.057
 she-0.056
Token but
Token ID475
Feature activation-1.11e+00

Pos logit contributions
 when+0.018
 one+0.016
 another+0.016
 she+0.015
 bal+0.015
Neg logit contributions
iversity-0.062
gency-0.054
assing-0.054
hips-0.053
 tripod-0.052
Token no
Token ID645
Feature activation-2.62e+00

Pos logit contributions
 bal+0.035
Whe+0.029
CR+0.029
me+0.028
 Mir+0.027
Neg logit contributions
iversity-0.063
 dearly-0.054
gency-0.054
assing-0.053
hips-0.053
Token-
Token ID12
Feature activation-2.37e+00

Pos logit contributions
 bal+0.062
CR+0.058
Whe+0.056
me+0.053
Next+0.046
Neg logit contributions
 dearly-0.150
iversity-0.150
 facts-0.143
gency-0.139
smart-0.135
Tokenone
Token ID505
Feature activation-7.04e+00

Pos logit contributions
 bal+0.081
Whe+0.069
 Mir+0.067
 Sara+0.065
CR+0.065
Neg logit contributions
iversity-0.130
 dearly-0.110
gency-0.108
smart-0.107
hips-0.106
Token was
Token ID373
Feature activation-2.01e+00

Pos logit contributions
 bal+0.264
Whe+0.237
CR+0.225
me+0.219
Next+0.219
Neg logit contributions
iversity-0.357
 dearly-0.303
gency-0.295
smart-0.294
hips-0.290
Token coming
Token ID2406
Feature activation-5.76e-01

Pos logit contributions
 bal+0.072
Whe+0.065
CR+0.062
me+0.062
ara+0.061
Neg logit contributions
iversity-0.104
 dearly-0.091
assing-0.088
gency-0.087
smart-0.087
Token.
Token ID13
Feature activation+2.39e-01

Pos logit contributions
 bal+0.018
Whe+0.016
CR+0.015
me+0.015
 Mir+0.015
Neg logit contributions
iversity-0.032
 dearly-0.028
gency-0.027
hips-0.027
assing-0.027
Token Then
Token ID3244
Feature activation-2.81e+00

Pos logit contributions
iversity+0.012
 dearly+0.011
gency+0.010
hips+0.010
assing+0.010
Neg logit contributions
 bal-0.009
me-0.007
Whe-0.007
CR-0.007
 when-0.007
Token a
Token ID257
Feature activation-3.29e-01

Pos logit contributions
 bal+0.084
Whe+0.072
me+0.070
CR+0.068
Next+0.064
Neg logit contributions
iversity-0.160
 dearly-0.143
gency-0.140
hips-0.136
 facts-0.136
Token for
Token ID329
Feature activation-9.48e-01

Pos logit contributions
iversity+0.009
gency+0.008
 dearly+0.008
hips+0.008
assing+0.008
Neg logit contributions
 bal-0.006
CR-0.005
Whe-0.005
me-0.004
 when-0.004
Token help
Token ID1037
Feature activation+7.59e-01

Pos logit contributions
 bal+0.030
Whe+0.025
me+0.025
CR+0.025
 when+0.024
Neg logit contributions
iversity-0.053
gency-0.046
 dearly-0.046
assing-0.045
hips-0.045
Token,
Token ID11
Feature activation-6.95e-01

Pos logit contributions
iversity+0.040
 dearly+0.035
assing+0.034
smart+0.034
gency+0.034
Neg logit contributions
 bal-0.025
Whe-0.024
CR-0.022
 Mir-0.021
ara-0.021
Token but
Token ID475
Feature activation-2.37e+00

Pos logit contributions
 bal+0.024
Whe+0.022
CR+0.021
me+0.021
ara+0.020
Neg logit contributions
iversity-0.036
 dearly-0.032
gency-0.030
smart-0.030
assing-0.030
Token no
Token ID645
Feature activation-3.91e+00

Pos logit contributions
Whe+0.073
 bal+0.070
CR+0.067
me+0.062
ara+0.061
Neg logit contributions
iversity-0.130
 dearly-0.116
gency-0.111
ishly-0.109
hips-0.107
Token one
Token ID530
Feature activation-7.02e+00

Pos logit contributions
Whe+0.121
 bal+0.112
CR+0.112
 Mir+0.107
me+0.104
Neg logit contributions
iversity-0.193
 dearly-0.191
smart-0.181
gency-0.177
 facts-0.175
Token could
Token ID714
Feature activation-2.68e+00

Pos logit contributions
 bal+0.285
Whe+0.271
me+0.256
CR+0.256
Next+0.250
Neg logit contributions
iversity-0.322
 dearly-0.277
smart-0.268
 facts-0.263
gency-0.259
Token hear
Token ID3285
Feature activation+3.61e-01

Pos logit contributions
 bal+0.114
Whe+0.111
CR+0.105
ara+0.104
 Mir+0.102
Neg logit contributions
iversity-0.114
 dearly-0.098
hips-0.096
assing-0.094
smart-0.094
Token her
Token ID607
Feature activation-1.65e+00

Pos logit contributions
iversity+0.020
 dearly+0.018
gency+0.017
hips+0.017
smart+0.017
Neg logit contributions
 bal-0.011
Whe-0.009
CR-0.009
me-0.009
Next-0.009
Token.
Token ID13
Feature activation-4.32e-01

Pos logit contributions
 bal+0.057
CR+0.050
Whe+0.050
me+0.048
Next+0.047
Neg logit contributions
iversity-0.087
 dearly-0.075
gency-0.074
hips-0.073
assing-0.072
Token She
Token ID1375
Feature activation+1.13e+00

Pos logit contributions
 bal+0.016
 dipped+0.015
me+0.014
CR+0.014
Whe+0.014
Neg logit contributions
iversity-0.021
 dearly-0.018
gency-0.017
smart-0.017
assing-0.017
Token their
Token ID511
Feature activation-3.29e+00

Pos logit contributions
 bal+0.025
 when+0.021
 she+0.020
 Sara+0.020
CR+0.020
Neg logit contributions
iversity-0.056
gency-0.049
assing-0.048
 dearly-0.048
hips-0.048
Token parents
Token ID3397
Feature activation+6.61e-01

Pos logit contributions
 bal+0.096
 when+0.080
 Sara+0.077
Whe+0.076
CR+0.076
Neg logit contributions
iversity-0.195
gency-0.168
assing-0.164
 dearly-0.164
hips-0.162
Token,
Token ID11
Feature activation-1.11e+00

Pos logit contributions
iversity+0.038
gency+0.032
 dearly+0.032
hips+0.032
assing+0.032
Neg logit contributions
 bal-0.020
CR-0.017
 when-0.017
Whe-0.017
me-0.016
Token but
Token ID475
Feature activation-2.65e+00

Pos logit contributions
 bal+0.035
CR+0.030
Whe+0.030
me+0.029
Next+0.028
Neg logit contributions
iversity-0.062
 dearly-0.053
gency-0.053
hips-0.052
assing-0.052
Token no
Token ID645
Feature activation-3.98e+00

Pos logit contributions
 bal+0.079
Whe+0.071
CR+0.070
me+0.064
Next+0.063
Neg logit contributions
iversity-0.151
 dearly-0.133
gency-0.128
hips-0.128
 facts-0.127
Token one
Token ID530
Feature activation-6.98e+00

Pos logit contributions
 bal+0.086
Whe+0.081
CR+0.081
me+0.070
 Mir+0.067
Neg logit contributions
iversity-0.243
 dearly-0.229
 facts-0.226
smart-0.217
gency-0.216
Token heard
Token ID2982
Feature activation+3.94e-01

Pos logit contributions
 bal+0.246
Whe+0.220
CR+0.214
me+0.210
Next+0.206
Neg logit contributions
iversity-0.365
 dearly-0.312
 facts-0.304
gency-0.302
smart-0.300
Token them
Token ID606
Feature activation-1.83e+00

Pos logit contributions
iversity+0.022
 dearly+0.019
gency+0.018
hips+0.018
assing+0.018
Neg logit contributions
 bal-0.013
Whe-0.011
CR-0.011
me-0.010
 Mir-0.010
Token.
Token ID13
Feature activation-5.33e-01

Pos logit contributions
 bal+0.055
 when+0.046
me+0.044
CR+0.044
Whe+0.043
Neg logit contributions
iversity-0.107
gency-0.092
 dearly-0.091
hips-0.090
assing-0.089
Token The
Token ID383
Feature activation-4.99e-01

Pos logit contributions
 bal+0.022
me+0.019
Whe+0.018
CR+0.018
 dipped+0.018
Neg logit contributions
iversity-0.025
 dearly-0.021
gency-0.021
assing-0.020
hips-0.020
Token creature
Token ID7185
Feature activation-9.24e-01

Pos logit contributions
 bal+0.016
 when+0.013
me+0.012
 one+0.012
 Sara+0.012
Neg logit contributions
iversity-0.029
gency-0.024
 dearly-0.024
assing-0.024
hips-0.024
Token and
Token ID290
Feature activation-2.60e+00

Pos logit contributions
 bal+0.115
Whe+0.098
me+0.096
 Mir+0.094
Next+0.093
Neg logit contributions
iversity-0.172
 dearly-0.142
gency-0.141
hips-0.141
assing-0.138
Token slides
Token ID19392
Feature activation-4.45e+00

Pos logit contributions
 bal+0.084
Whe+0.073
me+0.072
Next+0.070
CR+0.069
Neg logit contributions
iversity-0.141
gency-0.120
 dearly-0.120
assing-0.118
 facts-0.117
Token down
Token ID866
Feature activation-1.56e+00

Pos logit contributions
 bal+0.149
me+0.128
Whe+0.125
CR+0.125
Next+0.123
Neg logit contributions
iversity-0.244
 dearly-0.205
hips-0.205
gency-0.204
assing-0.202
Token.
Token ID13
Feature activation-7.11e-01

Pos logit contributions
 bal+0.050
me+0.042
Whe+0.041
CR+0.038
 dipped+0.037
Neg logit contributions
iversity-0.087
 dearly-0.076
hips-0.075
smart-0.074
gency-0.073
Token "
Token ID366
Feature activation-8.17e-02

Pos logit contributions
 bal+0.028
me+0.025
 dipped+0.025
CR+0.024
Whe+0.024
Neg logit contributions
iversity-0.033
 dearly-0.028
gency-0.027
smart-0.026
hips-0.026
TokenWhe
Token ID10842
Feature activation-6.97e+00

Pos logit contributions
 bal+0.002
 when+0.002
 she+0.002
 asks+0.002
 Mir+0.002
Neg logit contributions
iversity-0.004
assing-0.004
 dearly-0.004
gency-0.004
 tripod-0.004
Tokene
Token ID68
Feature activation-8.05e-01

Pos logit contributions
 bal+0.309
CR+0.267
 Mir+0.262
 she+0.260
Next+0.258
Neg logit contributions
iversity-0.305
 dearly-0.249
gency-0.242
smart-0.238
assing-0.232
Token!"
Token ID2474
Feature activation-1.42e+00

Pos logit contributions
 bal+0.029
CR+0.026
Next+0.026
Whe+0.025
 Mir+0.025
Neg logit contributions
iversity-0.041
hips-0.033
 dearly-0.033
gency-0.033
assing-0.033
Token she
Token ID673
Feature activation-1.36e+00

Pos logit contributions
 bal+0.048
me+0.046
Whe+0.042
CR+0.042
 dipped+0.040
Neg logit contributions
iversity-0.067
 tripod-0.056
 dearly-0.055
smart-0.055
ishly-0.054
Token says
Token ID1139
Feature activation-1.41e+00

Pos logit contributions
CR+0.047
Next+0.047
 bal+0.047
There+0.044
Whe+0.043
Neg logit contributions
iversity-0.061
 facts-0.052
ishly-0.051
gency-0.049
 stories-0.049
Token.
Token ID13
Feature activation+1.01e+00

Pos logit contributions
 bal+0.052
Whe+0.052
CR+0.048
Next+0.048
ara+0.047
Neg logit contributions
iversity-0.066
 dearly-0.056
smart-0.054
hips-0.054
gency-0.054
Token help
Token ID1037
Feature activation+6.18e-01

Pos logit contributions
 bal+0.060
CR+0.051
Whe+0.051
me+0.049
 Mir+0.049
Neg logit contributions
iversity-0.093
 dearly-0.079
gency-0.079
hips-0.078
assing-0.078
Token them
Token ID606
Feature activation-1.86e+00

Pos logit contributions
iversity+0.035
 dearly+0.031
gency+0.030
hips+0.030
smart+0.030
Neg logit contributions
 bal-0.019
Whe-0.016
CR-0.016
me-0.015
Next-0.015
Token.
Token ID13
Feature activation-4.27e-01

Pos logit contributions
 bal+0.055
CR+0.045
Whe+0.045
me+0.044
 when+0.044
Neg logit contributions
iversity-0.108
gency-0.093
 dearly-0.093
assing-0.092
hips-0.091
Token But
Token ID887
Feature activation-2.66e+00

Pos logit contributions
 bal+0.017
CR+0.014
me+0.014
Whe+0.014
 dipped+0.014
Neg logit contributions
iversity-0.021
 dearly-0.017
gency-0.017
hips-0.017
smart-0.016
Token no
Token ID645
Feature activation-3.64e+00

Pos logit contributions
 bal+0.075
Whe+0.071
CR+0.066
me+0.062
Next+0.062
Neg logit contributions
iversity-0.155
 dearly-0.135
gency-0.134
hips-0.131
ishly-0.129
Token one
Token ID530
Feature activation-6.88e+00

Pos logit contributions
Whe+0.090
 bal+0.089
CR+0.082
 Mir+0.073
me+0.071
Neg logit contributions
iversity-0.211
 dearly-0.199
smart-0.191
 facts-0.190
gency-0.186
Token came
Token ID1625
Feature activation-2.83e+00

Pos logit contributions
 bal+0.283
Whe+0.280
Next+0.259
CR+0.256
me+0.252
Neg logit contributions
iversity-0.314
 dearly-0.264
smart-0.255
 facts-0.247
gency-0.239
Token.
Token ID13
Feature activation-2.43e-01

Pos logit contributions
 bal+0.082
Whe+0.068
CR+0.066
me+0.065
 Mir+0.064
Neg logit contributions
iversity-0.168
gency-0.144
 dearly-0.144
hips-0.142
assing-0.142
Token The
Token ID383
Feature activation-5.01e-01

Pos logit contributions
 bal+0.009
me+0.008
Whe+0.008
CR+0.008
 when+0.008
Neg logit contributions
iversity-0.012
gency-0.010
 dearly-0.010
hips-0.010
assing-0.010
Token jungle
Token ID20712
Feature activation-4.71e-01

Pos logit contributions
 bal+0.016
 when+0.014
Whe+0.013
CR+0.013
me+0.013
Neg logit contributions
iversity-0.028
gency-0.024
 dearly-0.024
assing-0.023
hips-0.023
Token cat
Token ID3797
Feature activation-1.11e+00

Pos logit contributions
 bal+0.015
Whe+0.013
CR+0.013
 Mir+0.012
me+0.012
Neg logit contributions
iversity-0.026
 dearly-0.022
gency-0.022
hips-0.022
assing-0.022
Token this
Token ID428
Feature activation-1.33e-01

Pos logit contributions
iversity+0.072
gency+0.064
 dearly+0.064
hips+0.063
assing+0.063
Neg logit contributions
 bal-0.021
 when-0.017
CR-0.016
 Sara-0.016
Whe-0.016
Token stadium
Token ID10308
Feature activation+1.03e+00

Pos logit contributions
 bal+0.004
 when+0.004
Whe+0.004
 another+0.004
CR+0.004
Neg logit contributions
iversity-0.007
gency-0.006
assing-0.006
 dearly-0.006
hips-0.006
Token,
Token ID11
Feature activation-3.93e-01

Pos logit contributions
iversity+0.058
 dearly+0.049
hips+0.048
 facts+0.048
gency+0.048
Neg logit contributions
 bal-0.034
Whe-0.028
CR-0.028
me-0.027
Next-0.027
Token but
Token ID475
Feature activation-2.48e+00

Pos logit contributions
 bal+0.013
me+0.012
CR+0.012
ara+0.011
Whe+0.011
Neg logit contributions
iversity-0.020
hips-0.018
 dearly-0.018
 facts-0.017
assing-0.017
Token no
Token ID645
Feature activation-3.76e+00

Pos logit contributions
 bal+0.073
Whe+0.067
CR+0.066
me+0.064
Next+0.062
Neg logit contributions
iversity-0.139
 dearly-0.121
gency-0.119
hips-0.116
assing-0.115
Token one
Token ID530
Feature activation-6.87e+00

Pos logit contributions
 bal+0.086
CR+0.083
Whe+0.079
me+0.078
Next+0.070
Neg logit contributions
iversity-0.220
 facts-0.209
 dearly-0.205
gency-0.201
 languages-0.198
Token was
Token ID373
Feature activation-2.29e+00

Pos logit contributions
 bal+0.267
Whe+0.262
Next+0.248
me+0.248
CR+0.242
Neg logit contributions
iversity-0.324
 dearly-0.276
 facts-0.270
smart-0.265
gency-0.253
Token allowed
Token ID3142
Feature activation-3.10e+00

Pos logit contributions
 bal+0.087
Whe+0.081
me+0.079
ara+0.079
CR+0.075
Neg logit contributions
iversity-0.112
 dearly-0.098
smart-0.095
assing-0.094
gency-0.090
Token to
Token ID284
Feature activation-1.30e-01

Pos logit contributions
 bal+0.098
Whe+0.087
CR+0.082
me+0.082
Next+0.080
Neg logit contributions
iversity-0.174
 dearly-0.150
gency-0.148
hips-0.146
assing-0.145
Token go
Token ID467
Feature activation-8.74e-01

Pos logit contributions
 bal+0.004
Whe+0.004
CR+0.004
 Mir+0.004
Next+0.004
Neg logit contributions
iversity-0.007
gency-0.006
hips-0.006
assing-0.006
 dearly-0.006
Token inside
Token ID2641
Feature activation-1.15e+00

Pos logit contributions
 bal+0.030
Whe+0.025
CR+0.024
me+0.024
 Mir+0.024
Neg logit contributions
iversity-0.047
 dearly-0.041
assing-0.041
gency-0.041
hips-0.040
Token what
Token ID644
Feature activation+1.60e-01

Pos logit contributions
 bal+0.003
Whe+0.003
CR+0.003
me+0.003
 when+0.003
Neg logit contributions
iversity-0.005
 dearly-0.004
gency-0.004
hips-0.004
assing-0.004
Token it
Token ID340
Feature activation-3.63e+00

Pos logit contributions
iversity+0.009
 dearly+0.008
 facts+0.008
gency+0.008
smart+0.008
Neg logit contributions
 bal-0.005
Whe-0.004
CR-0.004
me-0.004
Next-0.004
Token was
Token ID373
Feature activation-9.83e-01

Pos logit contributions
 bal+0.142
Whe+0.133
CR+0.126
 Mir+0.122
Next+0.122
Neg logit contributions
iversity-0.174
 facts-0.147
 dearly-0.145
hips-0.140
smart-0.136
Token but
Token ID475
Feature activation-2.21e+00

Pos logit contributions
 bal+0.034
Whe+0.030
CR+0.029
me+0.028
 Mir+0.028
Neg logit contributions
iversity-0.052
 dearly-0.045
gency-0.044
hips-0.044
 facts-0.044
Token no
Token ID645
Feature activation-3.66e+00

Pos logit contributions
 bal+0.056
Whe+0.049
CR+0.047
me+0.046
Next+0.042
Neg logit contributions
iversity-0.136
 dearly-0.122
hips-0.118
gency-0.118
 facts-0.118
Token one
Token ID530
Feature activation-6.81e+00

Pos logit contributions
 bal+0.074
Whe+0.072
CR+0.069
me+0.060
 Mir+0.054
Neg logit contributions
iversity-0.229
 dearly-0.218
 facts-0.215
gency-0.208
smart-0.202
Token knew
Token ID2993
Feature activation+3.10e-02

Pos logit contributions
 bal+0.244
Whe+0.224
CR+0.215
me+0.212
Next+0.208
Neg logit contributions
iversity-0.348
 dearly-0.301
 facts-0.291
smart-0.288
gency-0.286
Token.
Token ID13
Feature activation-3.97e-02

Pos logit contributions
iversity+0.001
 dearly+0.001
hips+0.001
smart+0.001
gency+0.001
Neg logit contributions
 bal-0.001
Whe-0.001
CR-0.001
me-0.001
Next-0.001
Token Some
Token ID2773
Feature activation-3.90e+00

Pos logit contributions
 bal+0.002
Whe+0.001
me+0.001
CR+0.001
 dipped+0.001
Neg logit contributions
iversity-0.002
 dearly-0.002
gency-0.002
smart-0.002
hips-0.002
Token said
Token ID531
Feature activation-2.33e+00

Pos logit contributions
 bal+0.109
Whe+0.089
CR+0.088
me+0.086
 Mir+0.085
Neg logit contributions
iversity-0.234
 dearly-0.202
gency-0.201
hips-0.198
assing-0.198
Token it
Token ID340
Feature activation-6.11e-01

Pos logit contributions
 bal+0.067
Whe+0.055
CR+0.055
me+0.054
 when+0.052
Neg logit contributions
iversity-0.138
gency-0.119
 dearly-0.119
hips-0.117
assing-0.117
Token was
Token ID373
Feature activation-1.30e+00

Pos logit contributions
 bal+0.083
Whe+0.075
CR+0.072
Next+0.072
 Mir+0.071
Neg logit contributions
iversity-0.113
 dearly-0.097
hips-0.094
gency-0.092
 facts-0.091
Token so
Token ID523
Feature activation-5.72e-02

Pos logit contributions
 bal+0.047
Whe+0.041
CR+0.040
me+0.040
 Mir+0.038
Neg logit contributions
iversity-0.067
 dearly-0.058
gency-0.056
hips-0.056
assing-0.055
Token heavy
Token ID4334
Feature activation-2.29e+00

Pos logit contributions
 bal+0.002
Whe+0.002
CR+0.002
me+0.002
 Mir+0.002
Neg logit contributions
iversity-0.003
 dearly-0.002
hips-0.002
 facts-0.002
 lens-0.002
Token that
Token ID326
Feature activation-2.36e+00

Pos logit contributions
 bal+0.089
Whe+0.083
me+0.080
CR+0.078
Next+0.076
Neg logit contributions
iversity-0.110
 dearly-0.093
gency-0.091
assing-0.090
 facts-0.090
Token no
Token ID645
Feature activation-3.93e+00

Pos logit contributions
 bal+0.066
Whe+0.062
CR+0.058
me+0.056
Next+0.055
Neg logit contributions
iversity-0.132
 dearly-0.120
 facts-0.112
hips-0.112
gency-0.109
Token one
Token ID530
Feature activation-6.78e+00

Pos logit contributions
 bal+0.094
CR+0.093
Whe+0.092
me+0.086
 Mir+0.081
Neg logit contributions
iversity-0.220
 dearly-0.213
 facts-0.196
gency-0.195
smart-0.193
Token could
Token ID714
Feature activation-2.36e+00

Pos logit contributions
Whe+0.260
 bal+0.249
Next+0.241
CR+0.237
me+0.235
Neg logit contributions
iversity-0.315
 dearly-0.276
smart-0.259
 facts-0.247
gency-0.241
Token carry
Token ID3283
Feature activation-1.15e+00

Pos logit contributions
 bal+0.081
Whe+0.075
CR+0.071
Next+0.069
ara+0.069
Neg logit contributions
iversity-0.123
 dearly-0.106
hips-0.105
assing-0.102
gency-0.102
Token it
Token ID340
Feature activation-5.19e-02

Pos logit contributions
 bal+0.032
CR+0.026
 when+0.026
Whe+0.025
me+0.025
Neg logit contributions
iversity-0.069
gency-0.060
 dearly-0.059
assing-0.059
hips-0.059
Token.
Token ID13
Feature activation-8.12e-02

Pos logit contributions
 bal+0.002
Whe+0.001
CR+0.001
me+0.001
 Mir+0.001
Neg logit contributions
iversity-0.003
 dearly-0.002
gency-0.002
hips-0.002
assing-0.002
Token But
Token ID887
Feature activation-2.18e+00

Pos logit contributions
 bal+0.003
 dipped+0.003
me+0.003
CR+0.003
Whe+0.003
Neg logit contributions
iversity-0.004
 dearly-0.003
gency-0.003
smart-0.003
assing-0.003
Token so
Token ID523
Feature activation-2.27e-01

Pos logit contributions
 bal+0.043
Whe+0.039
me+0.039
ara+0.039
CR+0.038
Neg logit contributions
iversity-0.049
 dearly-0.045
hips-0.040
smart-0.040
 brand-0.040
Token com
Token ID401
Feature activation-3.21e+00

Pos logit contributions
 bal+0.009
Whe+0.009
me+0.008
CR+0.008
ara+0.008
Neg logit contributions
iversity-0.010
 dearly-0.009
 facts-0.008
 brand-0.008
hips-0.008
Tokenfy
Token ID24928
Feature activation-2.54e+00

Pos logit contributions
 bal+0.151
CR+0.136
Whe+0.135
 Mir+0.134
 Sara+0.132
Neg logit contributions
iversity-0.132
 dearly-0.107
gency-0.104
assing-0.103
hips-0.102
Token and
Token ID290
Feature activation-1.50e+00

Pos logit contributions
 bal+0.092
Whe+0.085
CR+0.084
me+0.082
Next+0.078
Neg logit contributions
iversity-0.129
 dearly-0.109
gency-0.107
hips-0.104
assing-0.103
Token no
Token ID645
Feature activation-4.06e+00

Pos logit contributions
 bal+0.052
CR+0.048
Whe+0.047
ara+0.045
me+0.043
Neg logit contributions
iversity-0.078
 dearly-0.070
 facts-0.064
gency-0.064
 brand-0.063
Token one
Token ID530
Feature activation-6.78e+00

Pos logit contributions
CR+0.109
Whe+0.097
 bal+0.092
uck+0.086
There+0.085
Neg logit contributions
 dearly-0.225
iversity-0.201
 brand-0.190
smart-0.189
 facts-0.189
Token knew
Token ID2993
Feature activation+8.89e-01

Pos logit contributions
Whe+0.247
 bal+0.243
CR+0.238
Next+0.231
me+0.224
Neg logit contributions
iversity-0.323
 dearly-0.289
smart-0.264
 facts-0.260
gency-0.251
Token about
Token ID546
Feature activation-3.35e-01

Pos logit contributions
iversity+0.047
 dearly+0.040
gency+0.039
hips+0.039
smart+0.039
Neg logit contributions
 bal-0.031
Whe-0.028
CR-0.027
me-0.026
Next-0.025
Token her
Token ID607
Feature activation-1.96e+00

Pos logit contributions
 bal+0.010
 when+0.009
 she+0.008
 Sara+0.008
 one+0.008
Neg logit contributions
iversity-0.020
gency-0.017
hips-0.017
assing-0.017
 dearly-0.016
Token secret
Token ID3200
Feature activation+1.51e+00

Pos logit contributions
 bal+0.065
CR+0.057
Whe+0.056
me+0.055
 Mir+0.053
Neg logit contributions
iversity-0.106
 dearly-0.091
gency-0.090
assing-0.088
hips-0.088
Token spot
Token ID4136
Feature activation-2.73e+00

Pos logit contributions
iversity+0.085
gency+0.072
 dearly+0.072
assing+0.071
hips+0.071
Neg logit contributions
 bal-0.049
CR-0.040
Whe-0.040
me-0.040
 when-0.040
Token sang
Token ID25889
Feature activation-1.57e+00

Pos logit contributions
 bal+0.072
Whe+0.065
CR+0.064
 Mir+0.061
 Sara+0.059
Neg logit contributions
iversity-0.092
 dearly-0.072
gency-0.072
asser-0.071
smart-0.070
Token,
Token ID11
Feature activation+2.11e-01

Pos logit contributions
 bal+0.049
CR+0.043
Whe+0.042
me+0.041
 Mir+0.040
Neg logit contributions
iversity-0.087
 dearly-0.075
gency-0.074
hips-0.073
smart-0.072
Token the
Token ID262
Feature activation+2.13e-01

Pos logit contributions
iversity+0.011
 dearly+0.010
gency+0.009
 brand+0.009
ishly+0.009
Neg logit contributions
 bal-0.006
CR-0.006
me-0.006
 dipped-0.005
Whe-0.005
Token dove
Token ID38862
Feature activation-3.63e+00

Pos logit contributions
iversity+0.011
 tripod+0.009
 lens+0.009
assing+0.009
 facts+0.009
Neg logit contributions
 bal-0.007
CR-0.007
me-0.006
 Mir-0.006
Whe-0.005
Token's
Token ID338
Feature activation-2.00e+00

Pos logit contributions
 bal+0.120
Whe+0.119
CR+0.111
 Mir+0.102
ara+0.102
Neg logit contributions
iversity-0.190
 dearly-0.159
smart-0.158
ishly-0.153
hips-0.153
Token eyes
Token ID2951
Feature activation+3.70e+00

Pos logit contributions
 bal+0.083
CR+0.078
Whe+0.076
me+0.074
 Mir+0.072
Neg logit contributions
iversity-0.090
 dearly-0.077
 facts-0.074
gency-0.074
hips-0.073
Token grew
Token ID6348
Feature activation-1.46e+00

Pos logit contributions
iversity+0.188
 dearly+0.158
hips+0.155
gency+0.155
assing+0.152
Neg logit contributions
 bal-0.135
Whe-0.119
CR-0.117
 Mir-0.115
 Sara-0.113
Token heavy
Token ID4334
Feature activation+7.65e-01

Pos logit contributions
 bal+0.045
Whe+0.042
CR+0.039
 Mir+0.037
me+0.037
Neg logit contributions
iversity-0.078
 dearly-0.071
hips-0.069
smart-0.067
 lens-0.067
Token,
Token ID11
Feature activation+2.62e-01

Pos logit contributions
iversity+0.044
 dearly+0.038
gency+0.038
hips+0.037
assing+0.037
Neg logit contributions
 bal-0.023
Whe-0.019
CR-0.019
me-0.019
 Mir-0.018
Token and
Token ID290
Feature activation-6.82e-01

Pos logit contributions
iversity+0.014
 dearly+0.012
gency+0.011
 facts+0.011
smart+0.011
Neg logit contributions
 bal-0.009
CR-0.008
Whe-0.008
me-0.008
 Mir-0.007
Token he
Token ID339
Feature activation-1.54e+00

Pos logit contributions
 bal+0.020
CR+0.019
Whe+0.019
me+0.018
ara+0.017
Neg logit contributions
iversity-0.037
 dearly-0.033
 brand-0.032
smart-0.031
gency-0.031
Token.
Token ID13
Feature activation-7.51e-02

Pos logit contributions
 bal+0.021
 when+0.019
 she+0.017
 another+0.017
 one+0.017
Neg logit contributions
iversity-0.053
gency-0.047
assing-0.046
 dearly-0.045
 facts-0.045
Token It
Token ID632
Feature activation+7.75e-01

Pos logit contributions
 bal+0.002
me+0.002
Whe+0.002
 dipped+0.002
CR+0.002
Neg logit contributions
iversity-0.004
smart-0.004
 facts-0.004
 dearly-0.003
 lens-0.003
Token followed
Token ID3940
Feature activation-3.42e-02

Pos logit contributions
iversity+0.037
 dearly+0.031
smart+0.031
 facts+0.030
isk+0.030
Neg logit contributions
 bal-0.029
Whe-0.028
CR-0.026
Next-0.025
me-0.024
Token the
Token ID262
Feature activation+6.87e-01

Pos logit contributions
 bal+0.001
Whe+0.001
CR+0.001
me+0.001
Next+0.001
Neg logit contributions
iversity-0.002
 dearly-0.002
gency-0.002
hips-0.002
 facts-0.002
Token bread
Token ID8509
Feature activation-1.96e+00

Pos logit contributions
iversity+0.037
gency+0.032
 facts+0.032
assing+0.031
 dearly+0.031
Neg logit contributions
 bal-0.023
Whe-0.020
CR-0.020
me-0.018
 Mir-0.018
Token trail
Token ID8025
Feature activation+3.60e+00

Pos logit contributions
 bal+0.060
Whe+0.052
CR+0.050
me+0.050
 Mir+0.048
Neg logit contributions
iversity-0.112
 dearly-0.096
gency-0.095
hips-0.095
assing-0.094
Token to
Token ID284
Feature activation+8.43e-01

Pos logit contributions
iversity+0.221
gency+0.195
 dearly+0.189
assing+0.189
 facts+0.186
Neg logit contributions
 bal-0.097
 when-0.080
CR-0.077
 she-0.075
me-0.074
Token the
Token ID262
Feature activation+2.93e-01

Pos logit contributions
iversity+0.053
gency+0.046
 dearly+0.046
assing+0.045
hips+0.045
Neg logit contributions
 bal-0.022
CR-0.017
Whe-0.017
me-0.017
 when-0.016
Token gate
Token ID8946
Feature activation+2.30e+00

Pos logit contributions
iversity+0.016
gency+0.013
assing+0.013
 dearly+0.013
hips+0.013
Neg logit contributions
 bal-0.010
Whe-0.009
CR-0.008
me-0.008
 Mir-0.008
Token.
Token ID13
Feature activation-6.10e-02

Pos logit contributions
iversity+0.133
 dearly+0.114
gency+0.113
hips+0.112
assing+0.112
Neg logit contributions
 bal-0.070
Whe-0.058
CR-0.058
me-0.057
 Mir-0.055
Token Mia
Token ID32189
Feature activation-2.20e+00

Pos logit contributions
 bal+0.002
me+0.002
Whe+0.002
 dipped+0.002
CR+0.002
Neg logit contributions
iversity-0.003
smart-0.003
 facts-0.003
 dearly-0.003
hips-0.003
Token stars
Token ID5788
Feature activation-1.16e+00

Pos logit contributions
 bal+0.058
Whe+0.053
me+0.051
CR+0.050
ara+0.048
Neg logit contributions
iversity-0.080
 facts-0.072
hips-0.071
 dearly-0.069
 lens-0.069
Token and
Token ID290
Feature activation-9.99e-01

Pos logit contributions
 bal+0.042
Whe+0.040
CR+0.037
me+0.036
Next+0.035
Neg logit contributions
iversity-0.057
hips-0.051
 facts-0.048
smart-0.048
 dearly-0.048
Token planets
Token ID14705
Feature activation-9.16e-01

Pos logit contributions
 bal+0.037
me+0.033
Whe+0.032
ara+0.031
CR+0.031
Neg logit contributions
iversity-0.047
 facts-0.045
hips-0.041
smart-0.041
 lens-0.040
Token on
Token ID319
Feature activation+6.74e-01

Pos logit contributions
 bal+0.031
Whe+0.028
CR+0.027
me+0.026
Next+0.025
Neg logit contributions
iversity-0.048
hips-0.042
 facts-0.041
 dearly-0.041
smart-0.040
Token the
Token ID262
Feature activation-1.48e-01

Pos logit contributions
iversity+0.047
gency+0.042
 dearly+0.041
assing+0.041
smart+0.040
Neg logit contributions
 bal-0.012
 when-0.011
 another-0.010
 one-0.010
 Mir-0.010
Token box
Token ID3091
Feature activation+4.09e+00

Pos logit contributions
 bal+0.005
 when+0.004
 one+0.004
 Mir+0.004
CR+0.004
Neg logit contributions
iversity-0.008
gency-0.007
smart-0.007
 dearly-0.007
assing-0.007
Token.
Token ID13
Feature activation-2.87e-01

Pos logit contributions
iversity+0.216
hips+0.190
 dearly+0.188
 facts+0.184
gency+0.182
Neg logit contributions
 bal-0.140
Whe-0.124
CR-0.119
me-0.115
Next-0.112
Token They
Token ID1119
Feature activation+1.33e+00

Pos logit contributions
 bal+0.014
me+0.013
CR+0.012
 dipped+0.012
Whe+0.012
Neg logit contributions
iversity-0.011
hips-0.009
 facts-0.009
 dearly-0.009
gency-0.009
Token make
Token ID787
Feature activation-1.47e+00

Pos logit contributions
iversity+0.067
hips+0.057
gency+0.056
 dearly+0.055
assing+0.055
Neg logit contributions
 bal-0.050
CR-0.043
Whe-0.043
me-0.042
Next-0.042
Token a
Token ID257
Feature activation-6.38e-01

Pos logit contributions
 bal+0.061
Whe+0.054
CR+0.053
me+0.052
ara+0.051
Neg logit contributions
iversity-0.062
hips-0.060
 facts-0.059
 stories-0.057
 brand-0.055
Token rocket
Token ID10701
Feature activation-1.49e+00

Pos logit contributions
 bal+0.023
CR+0.019
Whe+0.019
 when+0.019
me+0.019
Neg logit contributions
iversity-0.034
gency-0.028
 dearly-0.028
assing-0.028
hips-0.027
Token what
Token ID644
Feature activation+1.14e+00

Pos logit contributions
 bal+0.010
 when+0.008
 another+0.008
 one+0.008
 Mir+0.008
Neg logit contributions
iversity-0.025
gency-0.022
assing-0.021
 dearly-0.021
hips-0.021
Token's
Token ID338
Feature activation-3.60e+00

Pos logit contributions
iversity+0.069
 dearly+0.060
 facts+0.059
smart+0.058
hips+0.057
Neg logit contributions
 bal-0.031
Whe-0.029
me-0.026
CR-0.026
ara-0.025
Token happening
Token ID5836
Feature activation-8.19e-01

Pos logit contributions
 bal+0.131
Whe+0.121
me+0.116
CR+0.115
ara+0.109
Neg logit contributions
iversity-0.184
 dearly-0.156
 facts-0.154
hips-0.154
gency-0.153
Token in
Token ID287
Feature activation+1.13e+00

Pos logit contributions
 bal+0.023
CR+0.019
Whe+0.019
 when+0.018
me+0.018
Neg logit contributions
iversity-0.049
gency-0.042
 dearly-0.042
assing-0.042
hips-0.041
Token the
Token ID262
Feature activation-2.80e-02

Pos logit contributions
iversity+0.075
gency+0.066
 dearly+0.066
assing+0.065
hips+0.065
Neg logit contributions
 bal-0.023
 when-0.020
 one-0.019
 another-0.019
 Sara-0.019
Token world
Token ID995
Feature activation+4.41e+00

Pos logit contributions
 bal+0.001
 when+0.001
 one+0.001
 sun+0.001
 another+0.001
Neg logit contributions
iversity-0.002
gency-0.001
hips-0.001
assing-0.001
 dearly-0.001
Token,
Token ID11
Feature activation+5.06e-01

Pos logit contributions
iversity+0.251
 dearly+0.214
gency+0.212
hips+0.210
assing+0.210
Neg logit contributions
 bal-0.138
Whe-0.116
CR-0.114
me-0.112
 Mir-0.111
Token like
Token ID588
Feature activation-1.00e+00

Pos logit contributions
iversity+0.027
 dearly+0.024
gency+0.023
hips+0.023
assing+0.023
Neg logit contributions
 bal-0.017
CR-0.015
Whe-0.014
me-0.014
 Mir-0.014
Token when
Token ID618
Feature activation-5.01e+00

Pos logit contributions
 bal+0.032
Whe+0.027
CR+0.027
me+0.027
ara+0.025
Neg logit contributions
iversity-0.057
 dearly-0.048
 facts-0.048
gency-0.048
hips-0.048
Token it
Token ID340
Feature activation-2.16e+00

Pos logit contributions
 bal+0.155
Whe+0.143
me+0.133
ara+0.125
 dipped+0.119
Neg logit contributions
iversity-0.277
 dearly-0.249
 facts-0.245
hips-0.238
assing-0.237
Token's
Token ID338
Feature activation-3.22e+00

Pos logit contributions
 bal+0.078
Whe+0.074
 Mir+0.065
CR+0.065
Next+0.064
Neg logit contributions
iversity-0.113
smart-0.094
 dearly-0.091
assing-0.090
 facts-0.090
Token voice
Token ID3809
Feature activation+2.33e+00

Pos logit contributions
iversity+0.036
gency+0.031
 dearly+0.031
assing+0.030
hips+0.030
Neg logit contributions
 bal-0.020
CR-0.016
Whe-0.016
me-0.016
 when-0.016
Token said
Token ID531
Feature activation-5.81e-01

Pos logit contributions
iversity+0.130
 dearly+0.114
smart+0.110
 facts+0.110
hips+0.110
Neg logit contributions
 bal-0.071
Whe-0.064
CR-0.061
 Mir-0.058
me-0.058
Token,
Token ID11
Feature activation+2.27e+00

Pos logit contributions
 bal+0.017
 when+0.014
CR+0.014
Whe+0.013
me+0.013
Neg logit contributions
iversity-0.035
gency-0.030
 dearly-0.030
assing-0.029
hips-0.029
Token "
Token ID366
Feature activation+2.35e+00

Pos logit contributions
iversity+0.123
 dearly+0.106
hips+0.106
assing+0.106
gency+0.105
Neg logit contributions
 bal-0.077
 when-0.070
 she-0.069
 Mir-0.067
 Sara-0.066
TokenI
Token ID40
Feature activation+8.15e-01

Pos logit contributions
iversity+0.096
 dearly+0.078
smart+0.075
assing+0.071
 facts+0.067
Neg logit contributions
 bal-0.105
 Mir-0.092
 sets-0.089
 Sara-0.088
 older-0.088
Token forgive
Token ID20927
Feature activation+3.77e+00

Pos logit contributions
iversity+0.038
 dearly+0.032
gency+0.031
smart+0.030
assing+0.030
Neg logit contributions
 bal-0.033
Whe-0.031
Next-0.030
CR-0.030
 Mir-0.029
Token you
Token ID345
Feature activation+8.29e-01

Pos logit contributions
iversity+0.264
 dearly+0.233
gency+0.232
assing+0.230
 facts+0.229
Neg logit contributions
 bal-0.069
Whe-0.050
CR-0.049
me-0.045
 Mir-0.042
Token.
Token ID13
Feature activation+7.70e-01

Pos logit contributions
iversity+0.055
gency+0.048
assing+0.047
hips+0.047
 facts+0.047
Neg logit contributions
 bal-0.019
 when-0.018
 one-0.016
 she-0.016
 another-0.015
Token Let
Token ID3914
Feature activation-1.32e+00

Pos logit contributions
iversity+0.042
gency+0.036
 dearly+0.035
smart+0.035
hips+0.034
Neg logit contributions
 bal-0.026
CR-0.022
me-0.022
 dipped-0.022
Next-0.021
Token's
Token ID338
Feature activation-2.06e+00

Pos logit contributions
 bal+0.043
 Mir+0.039
Whe+0.039
CR+0.036
Next+0.035
Neg logit contributions
iversity-0.070
 dearly-0.060
ishly-0.058
gency-0.057
assing-0.057
Token be
Token ID307
Feature activation-5.08e-01

Pos logit contributions
 bal+0.068
CR+0.062
Whe+0.061
 Mir+0.060
Next+0.059
Neg logit contributions
iversity-0.107
hips-0.094
 dearly-0.094
gency-0.093
assing-0.092
Token very
Token ID845
Feature activation+2.76e+00

Pos logit contributions
 bal+0.017
Whe+0.015
CR+0.015
me+0.015
ara+0.014
Neg logit contributions
iversity-0.016
 dearly-0.014
hips-0.013
assing-0.012
 facts-0.012
Token silly
Token ID14397
Feature activation-2.83e-01

Pos logit contributions
iversity+0.104
 dearly+0.092
hips+0.091
 microsc+0.088
 tripod+0.088
Neg logit contributions
Whe-0.122
 bal-0.120
CR-0.103
me-0.102
one-0.101
Token.
Token ID13
Feature activation+7.53e-01

Pos logit contributions
 bal+0.009
Whe+0.008
CR+0.008
me+0.008
 Mir+0.007
Neg logit contributions
iversity-0.016
 dearly-0.014
gency-0.013
 facts-0.013
assing-0.013
Token His
Token ID2399
Feature activation-1.11e+00

Pos logit contributions
iversity+0.034
 dearly+0.028
gency+0.028
hips+0.027
assing+0.027
Neg logit contributions
 bal-0.033
me-0.028
CR-0.028
Whe-0.028
 when-0.028
Token name
Token ID1438
Feature activation+2.11e+00

Pos logit contributions
 bal+0.039
CR+0.037
Whe+0.035
me+0.034
Next+0.033
Neg logit contributions
iversity-0.056
 dearly-0.048
 facts-0.048
gency-0.047
assing-0.046
Token was
Token ID373
Feature activation+2.01e+00

Pos logit contributions
iversity+0.119
 dearly+0.101
gency+0.099
assing+0.097
 facts+0.097
Neg logit contributions
 bal-0.066
Whe-0.056
CR-0.055
 Mir-0.054
Next-0.052
Token Max
Token ID5436
Feature activation+2.17e+00

Pos logit contributions
iversity+0.122
gency+0.106
 dearly+0.105
assing+0.104
hips+0.104
Neg logit contributions
 bal-0.055
CR-0.044
Whe-0.044
 when-0.043
me-0.043
Token.
Token ID13
Feature activation+5.46e-01

Pos logit contributions
iversity+0.149
gency+0.136
hips+0.134
assing+0.131
 facts+0.131
Neg logit contributions
 when-0.044
 bal-0.044
 one-0.037
 she-0.035
 another-0.035
Token
Token ID198
Feature activation+3.81e-01

Pos logit contributions
iversity+0.033
gency+0.028
 dearly+0.028
assing+0.027
hips+0.027
Neg logit contributions
 bal-0.015
 when-0.013
Whe-0.013
 Sara-0.013
CR-0.013
Token
Token ID198
Feature activation+5.60e-01

Pos logit contributions
iversity+0.024
gency+0.021
 dearly+0.021
assing+0.021
hips+0.021
Neg logit contributions
 bal-0.009
CR-0.007
Whe-0.007
 when-0.007
me-0.007
TokenMax
Token ID11518
Feature activation+9.26e-01

Pos logit contributions
iversity+0.031
gency+0.027
 dearly+0.026
hips+0.026
 facts+0.026
Neg logit contributions
 bal-0.018
CR-0.016
 when-0.016
me-0.016
Next-0.016
Token,"
Token ID553
Feature activation-2.89e+00

Pos logit contributions
iversity+0.040
gency+0.035
 dearly+0.034
assing+0.034
hips+0.034
Neg logit contributions
 bal-0.020
 when-0.016
CR-0.016
me-0.016
Whe-0.016
Token the
Token ID262
Feature activation-9.04e-01

Pos logit contributions
 bal+0.106
CR+0.090
Whe+0.088
me+0.088
 dipped+0.081
Neg logit contributions
iversity-0.150
gency-0.125
 facts-0.125
 dearly-0.125
hips-0.125
Token dog
Token ID3290
Feature activation+9.88e-01

Pos logit contributions
 bal+0.021
 Sara+0.017
Whe+0.017
 another+0.017
 when+0.017
Neg logit contributions
iversity-0.059
 dearly-0.052
hips-0.052
gency-0.052
smart-0.050
Token says
Token ID1139
Feature activation-2.44e+00

Pos logit contributions
iversity+0.051
 facts+0.047
asser+0.044
hips+0.044
assing+0.044
Neg logit contributions
 bal-0.032
CR-0.029
Next-0.027
Whe-0.027
me-0.026
Token.
Token ID13
Feature activation+7.57e-02

Pos logit contributions
 bal+0.082
Whe+0.072
CR+0.069
me+0.068
Next+0.067
Neg logit contributions
iversity-0.132
 dearly-0.113
hips-0.112
gency-0.111
smart-0.111
Token "
Token ID366
Feature activation+7.53e-01

Pos logit contributions
iversity+0.004
gency+0.003
 dearly+0.003
assing+0.003
hips+0.003
Neg logit contributions
 bal-0.003
CR-0.003
Whe-0.003
 when-0.003
me-0.003
TokenYou
Token ID1639
Feature activation+2.49e+00

Pos logit contributions
iversity+0.044
 dearly+0.037
assing+0.037
hips+0.036
gency+0.036
Neg logit contributions
 bal-0.023
CR-0.018
 Mir-0.018
Whe-0.018
 Sara-0.018
Token are
Token ID389
Feature activation+1.23e+00

Pos logit contributions
iversity+0.116
gency+0.099
 dearly+0.096
assing+0.094
smart+0.094
Neg logit contributions
 bal-0.103
 when-0.091
me-0.090
 another-0.088
 asked-0.088
Token very
Token ID845
Feature activation+2.93e+00

Pos logit contributions
iversity+0.068
gency+0.059
assing+0.056
hips+0.056
 dearly+0.056
Neg logit contributions
 bal-0.041
 when-0.036
 Sara-0.035
 Mir-0.035
 another-0.034
Token kind
Token ID1611
Feature activation+2.40e+00

Pos logit contributions
iversity+0.160
gency+0.142
assing+0.137
 dearly+0.135
hips+0.134
Neg logit contributions
 bal-0.098
me-0.083
 Sara-0.083
 when-0.082
 Mir-0.081
Token.
Token ID13
Feature activation+9.29e-03

Pos logit contributions
iversity+0.123
 dearly+0.107
 facts+0.107
smart+0.106
hips+0.102
Neg logit contributions
 bal-0.084
Whe-0.080
me-0.074
CR-0.074
Next-0.070
Token something
Token ID1223
Feature activation+1.51e+00

Pos logit contributions
iversity+0.051
gency+0.044
 dearly+0.043
assing+0.043
hips+0.042
Neg logit contributions
 bal-0.032
 when-0.027
CR-0.027
Whe-0.027
me-0.027
Token,"
Token ID553
Feature activation-1.62e+00

Pos logit contributions
iversity+0.081
 dearly+0.070
hips+0.069
 facts+0.069
smart+0.068
Neg logit contributions
 bal-0.050
Whe-0.046
CR-0.044
me-0.043
 Mir-0.042
Token the
Token ID262
Feature activation-1.83e-01

Pos logit contributions
 bal+0.060
Whe+0.051
CR+0.051
me+0.049
ara+0.048
Neg logit contributions
iversity-0.080
smart-0.070
 facts-0.068
 dearly-0.067
gency-0.066
Token elephant
Token ID20950
Feature activation-3.75e-01

Pos logit contributions
 bal+0.005
CR+0.004
Whe+0.004
me+0.004
 when+0.004
Neg logit contributions
iversity-0.011
gency-0.009
 dearly-0.009
assing-0.009
hips-0.009
Token said
Token ID531
Feature activation-1.78e+00

Pos logit contributions
Whe+0.011
 bal+0.011
CR+0.010
 Sara+0.010
 Mir+0.009
Neg logit contributions
iversity-0.020
 dearly-0.018
smart-0.018
 facts-0.017
gency-0.017
Token.
Token ID13
Feature activation+8.43e-01

Pos logit contributions
 bal+0.053
CR+0.043
Whe+0.043
 when+0.042
me+0.042
Neg logit contributions
iversity-0.105
gency-0.090
 dearly-0.090
assing-0.089
hips-0.088
Token The
Token ID383
Feature activation+8.33e-01

Pos logit contributions
iversity+0.038
smart+0.032
gency+0.030
 dearly+0.029
 tripod+0.028
Neg logit contributions
 bal-0.036
me-0.032
Whe-0.030
CR-0.030
 dipped-0.029
Token ant
Token ID1885
Feature activation-1.39e+00

Pos logit contributions
iversity+0.051
gency+0.045
hips+0.044
 dearly+0.044
assing+0.044
Neg logit contributions
 bal-0.023
 when-0.018
 Mir-0.018
Whe-0.018
 one-0.018
Token listened
Token ID16399
Feature activation+1.84e+00

Pos logit contributions
 bal+0.048
Whe+0.045
CR+0.043
 Sara+0.041
 Mir+0.041
Neg logit contributions
iversity-0.072
 dearly-0.062
smart-0.059
gency-0.059
 facts-0.058
Token carefully
Token ID7773
Feature activation+1.01e-01

Pos logit contributions
iversity+0.112
gency+0.097
assing+0.095
 dearly+0.095
hips+0.095
Neg logit contributions
 bal-0.051
 when-0.043
CR-0.041
me-0.040
 Mir-0.040
Token to
Token ID284
Feature activation+1.50e+00

Pos logit contributions
iversity+0.007
gency+0.006
assing+0.006
 dearly+0.006
 tripod+0.006
Neg logit contributions
 when-0.002
 bal-0.002
 one-0.002
 another-0.002
 she-0.002
Token so
Token ID523
Feature activation+7.92e-01

Pos logit contributions
 bal+0.017
CR+0.016
ara+0.016
Whe+0.016
me+0.016
Neg logit contributions
iversity-0.015
 dearly-0.013
gency-0.012
smart-0.012
assing-0.011
Token excited
Token ID6568
Feature activation-1.65e+00

Pos logit contributions
iversity+0.035
 dearly+0.032
smart+0.031
 brand+0.030
hips+0.028
Neg logit contributions
Whe-0.029
 bal-0.029
ara-0.028
CR-0.027
me-0.027
Token and
Token ID290
Feature activation+1.78e-02

Pos logit contributions
 bal+0.062
Whe+0.056
CR+0.055
me+0.053
 Mir+0.052
Neg logit contributions
iversity-0.083
 dearly-0.071
gency-0.069
 facts-0.068
assing-0.067
Token couldn
Token ID3521
Feature activation-1.32e-01

Pos logit contributions
iversity+0.001
 dearly+0.001
gency+0.001
 brand+0.001
smart+0.001
Neg logit contributions
 bal-0.001
Whe-0.001
ara-0.001
CR-0.001
me-0.001
Token't
Token ID470
Feature activation-1.20e+00

Pos logit contributions
 bal+0.005
 dipped+0.005
Whe+0.005
 when+0.005
Next+0.005
Neg logit contributions
iversity-0.006
 dearly-0.005
hips-0.005
gency-0.005
assing-0.005
Token believe
Token ID1975
Feature activation+1.88e+00

Pos logit contributions
 bal+0.037
 when+0.030
CR+0.029
Whe+0.029
me+0.029
Neg logit contributions
iversity-0.072
gency-0.061
 dearly-0.060
hips-0.060
assing-0.059
Token his
Token ID465
Feature activation+3.73e-01

Pos logit contributions
iversity+0.124
assing+0.107
gency+0.107
hips+0.104
 facts+0.104
Neg logit contributions
 bal-0.045
 when-0.042
 she-0.040
 Sara-0.039
 Mir-0.036
Token luck
Token ID8458
Feature activation+4.23e+00

Pos logit contributions
iversity+0.018
gency+0.014
assing+0.014
hips+0.014
 dearly+0.014
Neg logit contributions
 bal-0.016
 when-0.014
 another-0.014
 Mir-0.014
CR-0.014
Token.
Token ID13
Feature activation+3.88e-01

Pos logit contributions
iversity+0.245
gency+0.210
 dearly+0.208
assing+0.207
hips+0.206
Neg logit contributions
 bal-0.131
 when-0.108
CR-0.106
me-0.105
 she-0.103
Token He
Token ID679
Feature activation+2.21e+00

Pos logit contributions
iversity+0.018
 dearly+0.015
gency+0.015
hips+0.014
assing+0.014
Neg logit contributions
 bal-0.016
me-0.014
CR-0.014
Whe-0.014
 when-0.013
Token grabbed
Token ID13176
Feature activation+8.30e-01

Pos logit contributions
iversity+0.116
 dearly+0.099
gency+0.097
assing+0.094
 facts+0.093
Neg logit contributions
 bal-0.078
Whe-0.070
CR-0.067
me-0.065
 Mir-0.064
Token home
Token ID1363
Feature activation+9.10e-02

Pos logit contributions
 bal+0.010
 when+0.009
CR+0.008
 Sara+0.008
Whe+0.008
Neg logit contributions
iversity-0.022
gency-0.019
 dearly-0.019
assing-0.019
hips-0.019
Token together
Token ID1978
Feature activation+8.39e-01

Pos logit contributions
iversity+0.006
gency+0.005
 dearly+0.005
assing+0.005
hips+0.005
Neg logit contributions
 bal-0.002
CR-0.002
 when-0.002
Whe-0.002
me-0.002
Token.
Token ID13
Feature activation-1.80e-01

Pos logit contributions
iversity+0.050
gency+0.043
 dearly+0.043
assing+0.043
hips+0.043
Neg logit contributions
 bal-0.024
 when-0.019
CR-0.019
Whe-0.019
me-0.019
Token They
Token ID1119
Feature activation+1.76e+00

Pos logit contributions
 bal+0.007
me+0.006
CR+0.005
Whe+0.005
 when+0.005
Neg logit contributions
iversity-0.009
 dearly-0.008
gency-0.008
hips-0.008
assing-0.008
Token had
Token ID550
Feature activation+1.43e+00

Pos logit contributions
iversity+0.094
gency+0.081
hips+0.079
 dearly+0.079
assing+0.078
Neg logit contributions
 bal-0.061
 when-0.053
 Sara-0.052
me-0.051
 asked-0.051
Token a
Token ID257
Feature activation-5.83e-01

Pos logit contributions
iversity+0.083
 dearly+0.072
gency+0.071
assing+0.070
 facts+0.069
Neg logit contributions
 bal-0.044
CR-0.036
me-0.036
Whe-0.036
 Mir-0.035
Token lot
Token ID1256
Feature activation+2.65e+00

Pos logit contributions
 bal+0.018
 another+0.016
 one+0.016
 sun+0.015
 dizz+0.014
Neg logit contributions
iversity-0.035
gency-0.031
hips-0.031
smart-0.031
assing-0.030
Token of
Token ID286
Feature activation-2.99e+00

Pos logit contributions
iversity+0.161
 dearly+0.143
 facts+0.142
smart+0.142
hips+0.140
Neg logit contributions
 bal-0.067
Whe-0.057
 Mir-0.055
me-0.054
CR-0.054
Token fun
Token ID1257
Feature activation-9.02e-01

Pos logit contributions
 bal+0.082
Whe+0.066
CR+0.065
 when+0.065
me+0.064
Neg logit contributions
iversity-0.185
gency-0.160
 dearly-0.158
hips-0.157
assing-0.157
Token.
Token ID13
Feature activation-4.35e-01

Pos logit contributions
 bal+0.027
CR+0.022
Whe+0.022
me+0.021
 when+0.021
Neg logit contributions
iversity-0.053
gency-0.046
 dearly-0.046
assing-0.045
hips-0.045
Token They
Token ID1119
Feature activation+1.58e+00

Pos logit contributions
 bal+0.017
CR+0.015
Whe+0.015
me+0.014
 when+0.014
Neg logit contributions
iversity-0.022
gency-0.018
 dearly-0.018
hips-0.018
assing-0.018
Token pink
Token ID11398
Feature activation-3.03e+00

Pos logit contributions
 bal+0.037
CR+0.032
Whe+0.031
me+0.031
 when+0.029
Neg logit contributions
iversity-0.072
 dearly-0.063
gency-0.062
hips-0.061
assing-0.060
Token room
Token ID2119
Feature activation+7.40e-01

Pos logit contributions
 bal+0.097
CR+0.090
Whe+0.087
me+0.085
Next+0.084
Neg logit contributions
iversity-0.161
gency-0.139
 dearly-0.139
 lens-0.135
hips-0.134
Token.
Token ID13
Feature activation+3.74e-03

Pos logit contributions
iversity+0.045
gency+0.039
 dearly+0.038
assing+0.038
hips+0.038
Neg logit contributions
 bal-0.021
 when-0.017
CR-0.017
Whe-0.016
me-0.016
Token
Token ID198
Feature activation-2.12e-01

Pos logit contributions
iversity+0.000
 dearly+0.000
gency+0.000
hips+0.000
 lens+0.000
Neg logit contributions
 bal-0.000
CR-0.000
me-0.000
 dipped-0.000
Whe-0.000
Token
Token ID198
Feature activation+1.34e-01

Pos logit contributions
 bal+0.006
CR+0.004
Whe+0.004
me+0.004
 when+0.004
Neg logit contributions
iversity-0.013
gency-0.011
 dearly-0.011
assing-0.011
hips-0.011
TokenOne
Token ID3198
Feature activation-2.47e+00

Pos logit contributions
iversity+0.006
 dearly+0.005
hips+0.005
assing+0.005
gency+0.005
Neg logit contributions
 bal-0.006
 Mir-0.005
Whe-0.005
CR-0.005
me-0.005
Token sunny
Token ID27737
Feature activation-2.11e+00

Pos logit contributions
 bal+0.087
CR+0.081
Whe+0.079
me+0.076
 Mir+0.075
Neg logit contributions
iversity-0.121
 dearly-0.106
 facts-0.096
smart-0.093
 tablet-0.093
Token day
Token ID1110
Feature activation-1.45e+00

Pos logit contributions
 bal+0.086
 Mir+0.084
Whe+0.083
 shorter+0.081
 dipped+0.081
Neg logit contributions
iversity-0.085
 dearly-0.067
 lens-0.067
 globe-0.063
 facts-0.062
Token,
Token ID11
Feature activation+7.71e-02

Pos logit contributions
Whe+0.064
 bal+0.061
me+0.058
Next+0.057
 Mir+0.056
Neg logit contributions
iversity-0.060
 dearly-0.046
gency-0.045
 lens-0.043
smart-0.043
Token Lily
Token ID20037
Feature activation-1.62e+00

Pos logit contributions
iversity+0.004
 dearly+0.003
 brand+0.003
gency+0.003
 lens+0.003
Neg logit contributions
 bal-0.003
Whe-0.002
me-0.002
CR-0.002
 asks-0.002
Token went
Token ID1816
Feature activation-2.28e-01

Pos logit contributions
Whe+0.074
 bal+0.068
 Mir+0.063
Next+0.062
 hotter+0.059
Neg logit contributions
iversity-0.067
 lens-0.052
 tripod-0.052
 camera-0.050
asser-0.049
Token She
Token ID1375
Feature activation+1.61e+00

Pos logit contributions
 bal+0.009
CR+0.008
me+0.008
Whe+0.008
 dipped+0.008
Neg logit contributions
iversity-0.010
 dearly-0.008
gency-0.008
smart-0.008
hips-0.008
Token was
Token ID373
Feature activation-7.61e-01

Pos logit contributions
iversity+0.077
 dearly+0.067
gency+0.062
smart+0.061
assing+0.060
Neg logit contributions
 bal-0.061
Whe-0.057
CR-0.056
Next-0.053
 Mir-0.052
Token so
Token ID523
Feature activation+6.87e-01

Pos logit contributions
 bal+0.034
CR+0.033
ara+0.032
Whe+0.031
me+0.031
Neg logit contributions
iversity-0.030
 dearly-0.028
smart-0.025
 brand-0.024
hips-0.023
Token happy
Token ID3772
Feature activation-1.09e+00

Pos logit contributions
iversity+0.028
 dearly+0.025
smart+0.023
hips+0.022
 brand+0.021
Neg logit contributions
 bal-0.031
Whe-0.029
CR-0.028
ara-0.028
 Mir-0.027
Token that
Token ID326
Feature activation-5.67e-01

Pos logit contributions
 bal+0.043
Whe+0.043
CR+0.041
me+0.041
 Mir+0.037
Neg logit contributions
iversity-0.049
 dearly-0.043
 facts-0.041
smart-0.040
gency-0.038
Token she
Token ID673
Feature activation-1.53e+00

Pos logit contributions
 bal+0.012
Whe+0.012
CR+0.012
me+0.011
ara+0.011
Neg logit contributions
iversity-0.034
 dearly-0.033
smart-0.030
 facts-0.029
 lens-0.029
Token suggested
Token ID5220
Feature activation+4.51e-02

Pos logit contributions
 bal+0.061
Whe+0.057
 Mir+0.055
Next+0.054
CR+0.054
Neg logit contributions
iversity-0.074
 dearly-0.060
gency-0.057
assing-0.057
smart-0.055
Token that
Token ID326
Feature activation+1.40e+00

Pos logit contributions
iversity+0.003
 dearly+0.002
gency+0.002
assing+0.002
hips+0.002
Neg logit contributions
 bal-0.001
Whe-0.001
CR-0.001
me-0.001
 when-0.001
Token they
Token ID484
Feature activation+9.62e-01

Pos logit contributions
iversity+0.087
 dearly+0.080
smart+0.075
 facts+0.074
gency+0.072
Neg logit contributions
 bal-0.032
Whe-0.030
CR-0.030
me-0.026
ara-0.026
Token take
Token ID1011
Feature activation+7.73e-01

Pos logit contributions
iversity+0.049
 dearly+0.041
gency+0.041
hips+0.040
assing+0.040
Neg logit contributions
 bal-0.036
Whe-0.031
CR-0.031
me-0.030
 Mir-0.030
Token a
Token ID257
Feature activation+1.76e+00

Pos logit contributions
iversity+0.041
 dearly+0.037
 facts+0.034
smart+0.034
gency+0.033
Neg logit contributions
 bal-0.025
me-0.023
CR-0.023
Next-0.021
Whe-0.021
TokenI
Token ID40
Feature activation-2.34e+00

Pos logit contributions
 bal+0.019
 when+0.015
CR+0.015
Whe+0.015
me+0.014
Neg logit contributions
iversity-0.043
gency-0.037
 dearly-0.037
assing-0.037
hips-0.036
Token heard
Token ID2982
Feature activation+1.36e+00

Pos logit contributions
 bal+0.089
 Mir+0.081
Whe+0.081
CR+0.080
Next+0.079
Neg logit contributions
iversity-0.111
 dearly-0.097
gency-0.093
smart-0.091
assing-0.089
Token that
Token ID326
Feature activation-8.39e-01

Pos logit contributions
iversity+0.075
smart+0.063
 facts+0.063
 dearly+0.062
gency+0.062
Neg logit contributions
 bal-0.044
Whe-0.038
me-0.037
CR-0.037
ara-0.035
Token every
Token ID790
Feature activation-2.37e+00

Pos logit contributions
 bal+0.028
CR+0.024
Whe+0.024
me+0.023
Next+0.023
Neg logit contributions
iversity-0.046
 dearly-0.038
gency-0.038
hips-0.038
smart-0.037
Token day
Token ID1110
Feature activation-1.70e+00

Pos logit contributions
 bal+0.071
Whe+0.061
CR+0.059
me+0.058
 Mir+0.057
Neg logit contributions
iversity-0.137
 dearly-0.118
gency-0.117
 facts-0.114
smart-0.114
Token if
Token ID611
Feature activation-3.72e+00

Pos logit contributions
 bal+0.059
Whe+0.053
CR+0.051
me+0.050
Next+0.049
Neg logit contributions
iversity-0.090
 dearly-0.076
gency-0.075
hips-0.074
assing-0.073
Token you
Token ID345
Feature activation-1.29e+00

Pos logit contributions
 bal+0.109
Whe+0.101
 dipped+0.090
ead+0.087
me+0.086
Neg logit contributions
iversity-0.210
gency-0.185
 dearly-0.182
 facts-0.175
smart-0.174
Token do
Token ID466
Feature activation+1.38e+00

Pos logit contributions
 bal+0.050
Whe+0.044
CR+0.044
 Mir+0.043
Next+0.043
Neg logit contributions
iversity-0.063
 dearly-0.052
gency-0.052
hips-0.051
assing-0.051
Token good
Token ID922
Feature activation+1.56e+00

Pos logit contributions
iversity+0.083
gency+0.071
 dearly+0.071
hips+0.071
assing+0.070
Neg logit contributions
 bal-0.038
 when-0.033
 another-0.031
CR-0.031
 she-0.031
Token things
Token ID1243
Feature activation-3.45e-01

Pos logit contributions
iversity+0.083
gency+0.072
 dearly+0.071
 facts+0.071
smart+0.070
Neg logit contributions
 bal-0.052
Whe-0.047
CR-0.046
me-0.045
 Mir-0.045
Token then
Token ID788
Feature activation-2.80e+00

Pos logit contributions
 bal+0.011
Whe+0.009
CR+0.009
me+0.009
 Mir+0.009
Neg logit contributions
iversity-0.019
 dearly-0.017
hips-0.016
gency-0.016
assing-0.016
Token brave
Token ID14802
Feature activation+4.11e-01

Pos logit contributions
iversity+0.145
 dearly+0.122
gency+0.119
assing+0.118
hips+0.118
Neg logit contributions
 bal-0.105
Whe-0.092
CR-0.091
me-0.087
Next-0.087
Token.
Token ID13
Feature activation-3.35e-02

Pos logit contributions
iversity+0.018
 dearly+0.017
smart+0.016
 facts+0.016
 languages+0.015
Neg logit contributions
 bal-0.016
Whe-0.015
CR-0.014
me-0.014
 Mir-0.014
Token Now
Token ID2735
Feature activation+2.21e-01

Pos logit contributions
 bal+0.001
 dipped+0.001
me+0.001
Next+0.001
 longer+0.001
Neg logit contributions
iversity-0.002
 dearly-0.001
gency-0.001
smart-0.001
 facts-0.001
Token,
Token ID11
Feature activation-1.86e-01

Pos logit contributions
iversity+0.008
hips+0.007
gency+0.007
 stories+0.007
 dearly+0.007
Neg logit contributions
CR-0.009
Whe-0.009
Next-0.009
 bal-0.009
Ha-0.008
Token let
Token ID1309
Feature activation-2.97e+00

Pos logit contributions
CR+0.007
 bal+0.006
Next+0.006
Whe+0.006
me+0.006
Neg logit contributions
iversity-0.008
hips-0.007
 stories-0.007
gency-0.007
 dearly-0.007
Token's
Token ID338
Feature activation-2.28e+00

Pos logit contributions
 bal+0.105
Whe+0.103
Next+0.095
 Mir+0.093
 clim+0.093
Neg logit contributions
iversity-0.132
isk-0.116
 stories-0.108
 dearly-0.106
gency-0.106
Token c
Token ID269
Feature activation+8.80e-01

Pos logit contributions
 bal+0.076
CR+0.068
Whe+0.066
Next+0.064
 Mir+0.064
Neg logit contributions
iversity-0.120
gency-0.103
 dearly-0.103
assing-0.102
hips-0.101
Tokenuddle
Token ID24500
Feature activation+6.63e-01

Pos logit contributions
iversity+0.043
 dearly+0.036
gency+0.036
hips+0.035
assing+0.035
Neg logit contributions
 bal-0.035
CR-0.030
Whe-0.030
 when-0.030
 Mir-0.030
Token and
Token ID290
Feature activation-6.89e-01

Pos logit contributions
iversity+0.038
gency+0.032
smart+0.032
hips+0.032
 dearly+0.031
Neg logit contributions
 bal-0.019
CR-0.016
Whe-0.016
Next-0.016
 asked-0.015
Token read
Token ID1100
Feature activation+1.32e+00

Pos logit contributions
 bal+0.026
CR+0.024
Whe+0.024
ara+0.024
me+0.023
Neg logit contributions
iversity-0.031
gency-0.026
assing-0.026
 dearly-0.026
hips-0.026
Token a
Token ID257
Feature activation+2.07e+00

Pos logit contributions
iversity+0.077
 dearly+0.066
gency+0.066
 facts+0.066
hips+0.065
Neg logit contributions
 bal-0.039
Whe-0.032
CR-0.032
me-0.031
Next-0.029
Token.
Token ID13
Feature activation+4.44e-01

Pos logit contributions
iversity+0.094
 dearly+0.083
gency+0.079
 facts+0.078
hips+0.078
Neg logit contributions
 bal-0.054
CR-0.048
Whe-0.047
 Mir-0.044
Next-0.044
Token She
Token ID1375
Feature activation+2.50e+00

Pos logit contributions
iversity+0.022
 dearly+0.019
hips+0.019
gency+0.018
smart+0.018
Neg logit contributions
 bal-0.016
me-0.014
CR-0.014
 dipped-0.014
Whe-0.014
Token said
Token ID531
Feature activation-1.38e+00

Pos logit contributions
iversity+0.130
 dearly+0.116
 facts+0.112
gency+0.104
 tripod+0.104
Neg logit contributions
CR-0.082
 bal-0.080
Whe-0.078
Next-0.070
 Mir-0.063
Token she
Token ID673
Feature activation+1.47e+00

Pos logit contributions
 bal+0.024
CR+0.018
Whe+0.018
me+0.017
Next+0.016
Neg logit contributions
iversity-0.096
 dearly-0.085
gency-0.084
hips-0.084
 facts-0.084
Token liked
Token ID8288
Feature activation+6.44e-01

Pos logit contributions
iversity+0.075
 dearly+0.064
gency+0.063
assing+0.061
hips+0.061
Neg logit contributions
 bal-0.054
CR-0.048
Whe-0.047
 Mir-0.045
 when-0.045
Token me
Token ID502
Feature activation-1.99e+00

Pos logit contributions
iversity+0.041
gency+0.035
assing+0.035
hips+0.035
 dearly+0.035
Neg logit contributions
 bal-0.016
 when-0.013
 Sara-0.013
 she-0.013
 Mir-0.012
Token.
Token ID13
Feature activation+2.42e-01

Pos logit contributions
 bal+0.056
 when+0.050
 another+0.048
 Sara+0.048
 one+0.048
Neg logit contributions
iversity-0.119
gency-0.105
assing-0.103
hips-0.102
 lens-0.101
Token She
Token ID1375
Feature activation+2.23e+00

Pos logit contributions
iversity+0.012
 dearly+0.010
hips+0.010
gency+0.010
smart+0.010
Neg logit contributions
 bal-0.009
me-0.008
CR-0.008
 dipped-0.008
Whe-0.008
Token asked
Token ID1965
Feature activation-3.83e+00

Pos logit contributions
iversity+0.115
 dearly+0.103
 facts+0.101
 tripod+0.093
 meanings+0.092
Neg logit contributions
CR-0.073
 bal-0.071
Whe-0.070
Next-0.061
me-0.056
Token me
Token ID502
Feature activation-1.40e+00

Pos logit contributions
 bal+0.097
 when+0.078
 Sara+0.075
CR+0.075
Whe+0.075
Neg logit contributions
iversity-0.241
gency-0.210
 dearly-0.208
assing-0.207
hips-0.205
Token if
Token ID611
Feature activation-2.76e+00

Pos logit contributions
 bal+0.039
CR+0.034
Whe+0.033
Next+0.030
me+0.030
Neg logit contributions
iversity-0.084
 dearly-0.073
gency-0.073
hips-0.072
 facts-0.071
Token a
Token ID257
Feature activation+1.73e+00

Pos logit contributions
iversity+0.087
assing+0.079
gency+0.078
asser+0.075
 dearly+0.074
Neg logit contributions
 bal-0.025
 another-0.022
 when-0.022
 one-0.022
 she-0.020
Token music
Token ID2647
Feature activation+1.09e+00

Pos logit contributions
iversity+0.096
isk+0.086
assing+0.086
gency+0.086
 dearly+0.086
Neg logit contributions
 bal-0.054
 one-0.047
 another-0.045
 when-0.044
 girl-0.044
Token box
Token ID3091
Feature activation+4.42e+00

Pos logit contributions
iversity+0.064
gency+0.056
 dearly+0.055
assing+0.055
hips+0.054
Neg logit contributions
 bal-0.033
 when-0.027
Whe-0.026
CR-0.026
me-0.025
Token,
Token ID11
Feature activation+2.54e-01

Pos logit contributions
iversity+0.266
gency+0.230
 dearly+0.229
assing+0.226
hips+0.226
Neg logit contributions
 bal-0.123
CR-0.099
Whe-0.099
me-0.097
 when-0.097
Token and
Token ID290
Feature activation-6.89e-01

Pos logit contributions
iversity+0.014
 dearly+0.012
gency+0.011
hips+0.011
assing+0.011
Neg logit contributions
 bal-0.009
CR-0.007
Whe-0.007
me-0.007
ara-0.007
Token as
Token ID355
Feature activation-5.61e+00

Pos logit contributions
 bal+0.021
Whe+0.020
CR+0.019
me+0.019
ara+0.019
Neg logit contributions
iversity-0.038
 dearly-0.032
 facts-0.031
smart-0.031
 brand-0.031
Token Emma
Token ID18966
Feature activation-3.78e-01

Pos logit contributions
 bal+0.178
Whe+0.163
CR+0.161
 dipped+0.151
 reaches+0.145
Neg logit contributions
iversity-0.308
 dearly-0.280
hips-0.258
 facts-0.255
gency-0.250
Token cl
Token ID537
Feature activation-1.24e+00

Pos logit contributions
 bal+0.013
Whe+0.012
CR+0.012
Next+0.012
 Mir+0.011
Neg logit contributions
iversity-0.020
gency-0.016
assing-0.016
 dearly-0.015
 lens-0.015
Tokenapped
Token ID6320
Feature activation-1.12e-01

Pos logit contributions
 bal+0.051
Next+0.049
Whe+0.049
CR+0.048
 Mir+0.048
Neg logit contributions
iversity-0.052
ishly-0.044
smart-0.044
 dearly-0.043
hips-0.043
Token,
Token ID11
Feature activation+6.76e-01

Pos logit contributions
 bal+0.003
Whe+0.003
CR+0.003
me+0.003
Next+0.003
Neg logit contributions
iversity-0.006
 dearly-0.005
hips-0.005
smart-0.005
gency-0.005
Token he
Token ID339
Feature activation-1.14e+00

Pos logit contributions
iversity+0.042
 dearly+0.037
smart+0.037
 facts+0.036
gency+0.036
Neg logit contributions
 bal-0.016
CR-0.014
me-0.014
Whe-0.013
ara-0.012
Token wonderful
Token ID7932
Feature activation+1.84e+00

Pos logit contributions
 bal+0.010
 one+0.009
 another+0.009
 when+0.009
Whe+0.008
Neg logit contributions
iversity-0.014
hips-0.013
gency-0.013
assing-0.012
smart-0.012
Token time
Token ID640
Feature activation-9.26e-01

Pos logit contributions
iversity+0.128
gency+0.115
assing+0.114
hips+0.113
smart+0.111
Neg logit contributions
 bal-0.037
 one-0.028
 when-0.027
Whe-0.026
 another-0.024
Token playing
Token ID2712
Feature activation-8.62e-01

Pos logit contributions
 bal+0.032
Whe+0.029
CR+0.028
me+0.028
 Mir+0.027
Neg logit contributions
iversity-0.048
 dearly-0.041
smart-0.040
assing-0.040
gency-0.040
Token,
Token ID11
Feature activation+1.72e-01

Pos logit contributions
 bal+0.022
 when+0.019
 another+0.018
Whe+0.017
 one+0.017
Neg logit contributions
iversity-0.054
gency-0.048
 dearly-0.047
assing-0.046
hips-0.046
Token and
Token ID290
Feature activation-8.49e-01

Pos logit contributions
iversity+0.009
 dearly+0.007
assing+0.007
gency+0.007
hips+0.007
Neg logit contributions
 bal-0.006
CR-0.006
me-0.005
Whe-0.005
Next-0.005
Token when
Token ID618
Feature activation-5.64e+00

Pos logit contributions
 bal+0.027
CR+0.024
Whe+0.023
me+0.022
Next+0.022
Neg logit contributions
iversity-0.047
 dearly-0.041
gency-0.039
assing-0.039
 facts-0.039
Token it
Token ID340
Feature activation-2.48e+00

Pos logit contributions
 bal+0.181
me+0.158
 dipped+0.157
CR+0.151
Whe+0.146
Neg logit contributions
iversity-0.289
 dearly-0.272
gency-0.260
assing-0.252
 facts-0.244
Token was
Token ID373
Feature activation-1.34e+00

Pos logit contributions
Whe+0.110
 bal+0.108
 Mir+0.106
 Sara+0.099
ead+0.098
Neg logit contributions
iversity-0.095
 dearly-0.069
smart-0.068
isk-0.068
asser-0.067
Token time
Token ID640
Feature activation-1.84e+00

Pos logit contributions
 bal+0.040
me+0.034
Whe+0.034
 Mir+0.033
CR+0.033
Neg logit contributions
iversity-0.076
 dearly-0.067
gency-0.064
hips-0.064
assing-0.063
Token to
Token ID284
Feature activation-7.49e-01

Pos logit contributions
 bal+0.045
 when+0.040
 she+0.038
 one+0.037
 Sara+0.037
Neg logit contributions
iversity-0.115
assing-0.101
gency-0.099
 dearly-0.099
hips-0.099
Token go
Token ID467
Feature activation-9.94e-01

Pos logit contributions
 bal+0.019
CR+0.015
Whe+0.015
me+0.015
 when+0.015
Neg logit contributions
iversity-0.047
gency-0.041
 dearly-0.041
assing-0.040
hips-0.040
Token hay
Token ID27678
Feature activation-4.14e+00

Pos logit contributions
 bal+0.128
CR+0.107
Whe+0.107
 when+0.106
me+0.105
Neg logit contributions
iversity-0.207
gency-0.176
 dearly-0.175
assing-0.173
hips-0.172
Token.
Token ID13
Feature activation-8.19e-01

Pos logit contributions
 bal+0.143
Whe+0.125
CR+0.122
me+0.119
Next+0.117
Neg logit contributions
iversity-0.221
gency-0.185
 dearly-0.185
hips-0.182
assing-0.181
Token\n
Token ID198
Feature activation-1.23e+00

Pos logit contributions
 bal+0.033
me+0.029
 dipped+0.029
Whe+0.028
CR+0.028
Neg logit contributions
iversity-0.038
 dearly-0.032
gency-0.031
smart-0.030
hips-0.030
Token\n
Token ID198
Feature activation-8.81e-01

Pos logit contributions
 bal+0.032
CR+0.027
 Sara+0.026
 when+0.026
Whe+0.026
Neg logit contributions
iversity-0.076
gency-0.067
 dearly-0.066
assing-0.065
hips-0.065
TokenThen
Token ID6423
Feature activation-3.51e+00

Pos logit contributions
 bal+0.027
Whe+0.022
 Mir+0.021
 when+0.021
me+0.021
Neg logit contributions
iversity-0.051
 dearly-0.043
assing-0.043
gency-0.042
hips-0.042
Token one
Token ID530
Feature activation-5.85e+00

Pos logit contributions
 bal+0.117
Whe+0.112
me+0.109
Next+0.104
CR+0.102
Neg logit contributions
iversity-0.174
gency-0.152
 dearly-0.149
hips-0.144
 facts-0.143
Token day
Token ID1110
Feature activation-2.35e+00

Pos logit contributions
Whe+0.189
 bal+0.183
CR+0.180
Next+0.172
me+0.169
Neg logit contributions
iversity-0.292
 dearly-0.263
smart-0.253
 facts-0.247
 barrel-0.227
Token,
Token ID11
Feature activation-4.46e-01

Pos logit contributions
Whe+0.091
 bal+0.087
me+0.083
Next+0.083
 dipped+0.078
Neg logit contributions
iversity-0.107
smart-0.085
gency-0.083
 facts-0.083
 dearly-0.082
Token he
Token ID339
Feature activation-2.38e+00

Pos logit contributions
 bal+0.014
me+0.013
Whe+0.013
CR+0.012
 dipped+0.012
Neg logit contributions
iversity-0.023
gency-0.020
 dearly-0.020
 facts-0.019
hips-0.019
Token got
Token ID1392
Feature activation-9.47e-01

Pos logit contributions
 bal+0.080
Whe+0.072
Next+0.065
 Mir+0.065
me+0.065
Neg logit contributions
iversity-0.129
gency-0.109
 dearly-0.107
assing-0.107
 facts-0.105
Token lucky
Token ID9670
Feature activation+7.94e-01

Pos logit contributions
 bal+0.035
Whe+0.035
ara+0.034
me+0.032
CR+0.032
Neg logit contributions
iversity-0.042
 dearly-0.041
gency-0.037
smart-0.037
assing-0.036
Token anything
Token ID1997
Feature activation-2.41e+00

Pos logit contributions
 bal+0.053
Whe+0.046
CR+0.046
me+0.044
 Mir+0.043
Neg logit contributions
iversity-0.090
 dearly-0.078
 facts-0.076
hips-0.076
gency-0.075
Token is
Token ID318
Feature activation-2.06e+00

Pos logit contributions
 bal+0.090
 when+0.078
CR+0.076
me+0.076
 she+0.075
Neg logit contributions
iversity-0.122
gency-0.104
assing-0.101
 dearly-0.101
hips-0.100
Token possible
Token ID1744
Feature activation-5.24e+00

Pos logit contributions
 bal+0.066
Whe+0.056
CR+0.055
me+0.054
 Mir+0.053
Neg logit contributions
iversity-0.116
 dearly-0.100
gency-0.098
hips-0.098
assing-0.097
Token as
Token ID355
Feature activation-6.40e+00

Pos logit contributions
 bal+0.173
Whe+0.152
CR+0.146
me+0.143
 Mir+0.140
Neg logit contributions
iversity-0.286
 dearly-0.247
gency-0.242
hips-0.240
 facts-0.238
Token long
Token ID890
Feature activation-1.76e+00

Pos logit contributions
 bal+0.238
Whe+0.209
CR+0.206
me+0.203
ara+0.194
Neg logit contributions
iversity-0.325
 dearly-0.278
 facts-0.273
hips-0.270
gency-0.267
Token as
Token ID355
Feature activation-5.94e+00

Pos logit contributions
 bal+0.052
Whe+0.048
me+0.046
 Mir+0.043
CR+0.043
Neg logit contributions
iversity-0.100
 dearly-0.085
hips-0.085
smart-0.085
gency-0.084
Token you
Token ID345
Feature activation-1.54e+00

Pos logit contributions
 bal+0.186
Whe+0.167
me+0.156
CR+0.153
ara+0.146
Neg logit contributions
iversity-0.331
 facts-0.293
 dearly-0.291
hips-0.287
smart-0.280
Token don
Token ID836
Feature activation-7.27e-01

Pos logit contributions
 bal+0.057
CR+0.049
Whe+0.048
me+0.048
 when+0.047
Neg logit contributions
iversity-0.079
gency-0.066
 dearly-0.066
assing-0.065
hips-0.065
Token't
Token ID470
Feature activation-1.64e+00

Pos logit contributions
 bal+0.022
CR+0.018
me+0.018
 when+0.018
Whe+0.017
Neg logit contributions
iversity-0.042
gency-0.037
 dearly-0.036
 facts-0.036
assing-0.035
Token give
Token ID1577
Feature activation-4.11e-01

Pos logit contributions
 bal+0.058
 when+0.050
CR+0.049
me+0.048
Whe+0.048
Neg logit contributions
iversity-0.087
gency-0.073
 dearly-0.072
hips-0.071
assing-0.071
Token up
Token ID510
Feature activation-2.74e+00

Pos logit contributions
 bal+0.011
Whe+0.008
CR+0.008
me+0.008
 when+0.008
Neg logit contributions
iversity-0.026
 dearly-0.022
gency-0.022
hips-0.022
assing-0.022
Token all
Token ID477
Feature activation+2.45e+00

Pos logit contributions
iversity+0.080
gency+0.071
assing+0.070
hips+0.069
 facts+0.069
Neg logit contributions
 bal-0.032
 when-0.029
 another-0.025
 one-0.025
 she-0.025
Token morning
Token ID3329
Feature activation-2.93e+00

Pos logit contributions
iversity+0.144
gency+0.127
assing+0.125
hips+0.123
 dearly+0.120
Neg logit contributions
 bal-0.071
 when-0.063
 Sara-0.061
 she-0.060
 one-0.059
Token,
Token ID11
Feature activation+4.44e-01

Pos logit contributions
 bal+0.078
 when+0.067
 she+0.063
 one+0.061
 Sara+0.060
Neg logit contributions
iversity-0.181
gency-0.160
hips-0.158
assing-0.157
 dearly-0.155
Token and
Token ID290
Feature activation-8.27e-01

Pos logit contributions
iversity+0.026
gency+0.022
hips+0.022
 dearly+0.022
assing+0.022
Neg logit contributions
 bal-0.014
 when-0.011
 she-0.011
 Sara-0.011
CR-0.011
Token no
Token ID645
Feature activation-2.85e+00

Pos logit contributions
 bal+0.024
Whe+0.021
CR+0.020
me+0.019
 Mir+0.018
Neg logit contributions
iversity-0.049
 dearly-0.043
gency-0.042
 facts-0.041
assing-0.041
Token one
Token ID530
Feature activation-5.74e+00

Pos logit contributions
CR+0.102
Whe+0.095
There+0.086
 bal+0.084
BO+0.079
Neg logit contributions
 dearly-0.134
 facts-0.133
 stories-0.123
iversity-0.122
gency-0.118
Token knew
Token ID2993
Feature activation+1.32e+00

Pos logit contributions
 bal+0.216
Whe+0.210
CR+0.201
Next+0.195
me+0.192
Neg logit contributions
iversity-0.276
 dearly-0.239
smart-0.229
 facts-0.227
gency-0.221
Token why
Token ID1521
Feature activation+1.87e-01

Pos logit contributions
iversity+0.051
 dearly+0.041
gency+0.040
hips+0.039
assing+0.039
Neg logit contributions
 bal-0.065
Whe-0.059
CR-0.059
me-0.058
 Mir-0.057
Token.
Token ID13
Feature activation+1.82e-01

Pos logit contributions
iversity+0.013
gency+0.011
assing+0.011
hips+0.011
 dearly+0.011
Neg logit contributions
 bal-0.004
 when-0.003
 she-0.003
 Sara-0.003
CR-0.003
Token Her
Token ID2332
Feature activation-1.88e+00

Pos logit contributions
iversity+0.011
gency+0.009
assing+0.009
hips+0.009
 dearly+0.009
Neg logit contributions
 bal-0.005
 when-0.004
CR-0.004
 she-0.004
 Sara-0.004
Token mum
Token ID25682
Feature activation-9.74e-01

Pos logit contributions
 bal+0.050
Whe+0.046
CR+0.045
me+0.041
 Mir+0.039
Neg logit contributions
iversity-0.111
 dearly-0.099
gency-0.097
 facts-0.097
 lens-0.095