TOP ACTIVATIONS
MAX = 7.506

the operation was finished , Sab rina went back home and
" Ar rr , mate y , this is our
A few days later , Sab rina had to stay in
minute . Is that her mate ? She must love him
't know . Luc as started to complain "
Ben finds a cap in his room . It is a
can I play in the sandbox ?" asked Lily .
found a big knot in her yarn . She tried to
found a new car in his toy box . It was
found a new car in her toy box . It was
found a new toy in her box for other things .
finds a new toy in her room . It is a
It needs to find a mate and lay eggs . Then
" Ar r , mate y , where is the
She blinked her eyes and imm ag ined she could see
finds a new car in his toy box . It is
finds a big puzzle in his room . It has many
finds a new jewelry in her mom 's drawer . It
found a big hammer in his dad 's tool box .
she found an olive in her backyard . She was so

MIN ACTIVATIONS
MIN = -5.781

noticed her and invited her on stage to play . She
while Molly wasn ’ t looking .
that shouting wasn ’ t a regular thing to
grandma 's house for safe keeping . When
and her mom were going on a tour . Mil a
of her daughter ’ s new skill . <|endoftext|>
They cheered and smiled all day long ! <|endoftext|> Tom my
about the day ’ s adventure . <|endoftext|> Once
said Ali ’ s mom .
realised he wasn ’ t scared to go to
knew she wasn ’ t a threat and it
They laughed and smiled all day . At the
played games and laughed all day . When it was time
uddled the cat and said '' You are safe now !
them talked and laughed all day . Billy made
praised the rabbit ’ s determination and gave it
But his mom said '' No ! ''
thin stone wasn ’ t good enough to join
told M ina ’ s family that he wanted
liked . Lucy hugged her Sister and said " Thank You

INTERVAL 4.185 - 7.506
CONTAINS 0.101%

my favorite too ." <|endoftext|> Once upon a time , there
from their adventure . <|endoftext|> Once upon a time , there
He liked to rock in his closet every day . Little
of toys and books in your room . And you know
be scared and started to trem ble . Mia was patient

INTERVAL 0.863 - 4.185
CONTAINS 23.314%

about the jewel . <|endoftext|> Once upon a time , there
, and then landed on a nearby field . Jack and
, Ben . You are a good friend . You tried
said . " Now let 's go back to making your
play with her . Ben smiled and said yes . He

INTERVAL -2.459 - 0.863
CONTAINS 74.422%

Soon after , the family got a cat . The cat
if she wanted one , she had to choose carefully .
on a farm . Every day , the cow would eat
him a hug . He said : " I don 't
cater pillar , then ran off to find a tree of

INTERVAL -5.781 - -2.459
CONTAINS 2.163%

came back home to give Max his food . Max was
Sarah picked up the cap and said , "
. They talked and discussed who should get the carrot .
Tim my . Tim my liked to play outside in his
y was okay and they continued to play .
Token the
Token ID262
Feature activation-7.69e-01

Pos logit contributions
 cheering+0.015
 searching+0.015
 rescuing+0.015
keeping+0.014
 improving+0.014
Neg logit contributions
ires-0.020
ara-0.020
ia-0.020
outed-0.020
ushing-0.020
Token operation
Token ID4905
Feature activation-4.87e-01

Pos logit contributions
 cheering+0.020
 rescuing+0.020
 searching+0.020
keeping+0.020
bench+0.019
Neg logit contributions
ia-0.015
ara-0.015
ires-0.015
outed-0.015
ours-0.014
Token was
Token ID373
Feature activation-3.59e-01

Pos logit contributions
 cheering+0.011
 rescuing+0.011
 searching+0.011
keeping+0.011
 improving+0.010
Neg logit contributions
ia-0.011
ara-0.011
ires-0.011
outed-0.011
ours-0.011
Token finished
Token ID5201
Feature activation-1.29e+00

Pos logit contributions
 cheering+0.007
 rescuing+0.006
keeping+0.006
 searching+0.006
rob+0.006
Neg logit contributions
ia-0.010
ires-0.010
ara-0.010
outed-0.010
ushing-0.009
Token,
Token ID11
Feature activation+1.19e+00

Pos logit contributions
keeping+0.026
 rescuing+0.026
 cheering+0.026
 searching+0.025
rob+0.025
Neg logit contributions
ara-0.033
ia-0.033
ires-0.032
ushing-0.031
outed-0.031
Token Sab
Token ID9910
Feature activation+6.76e+00

Pos logit contributions
ara+0.034
uddle+0.034
ires+0.034
outed+0.033
ours+0.033
Neg logit contributions
 cheering-0.022
 searching-0.021
 rescuing-0.020
 exploring-0.020
 improving-0.020
Tokenrina
Token ID22267
Feature activation+1.54e+00

Pos logit contributions
uddle+0.174
ours+0.169
ires+0.163
ara+0.161
erve+0.158
Neg logit contributions
keeping-0.128
 cheering-0.119
 searching-0.119
 rescuing-0.119
Charl-0.117
Token went
Token ID1816
Feature activation-1.21e+00

Pos logit contributions
ires+0.037
ia+0.037
ara+0.036
outed+0.035
ours+0.035
Neg logit contributions
 cheering-0.034
 searching-0.033
 rescuing-0.033
keeping-0.032
 improving-0.032
Token back
Token ID736
Feature activation+2.10e-01

Pos logit contributions
 rescuing+0.022
 cheering+0.022
keeping+0.022
rob+0.022
 searching+0.021
Neg logit contributions
ara-0.032
ia-0.032
ires-0.032
outed-0.031
ours-0.031
Token home
Token ID1363
Feature activation+1.29e+00

Pos logit contributions
ara+0.005
ires+0.005
ia+0.005
ours+0.005
outed+0.005
Neg logit contributions
 cheering-0.004
bench-0.004
 rescuing-0.004
rob-0.004
ner-0.004
Token and
Token ID290
Feature activation+1.01e-01

Pos logit contributions
ia+0.030
ara+0.029
outed+0.029
ushing+0.028
ires+0.028
Neg logit contributions
rob-0.030
 rescuing-0.027
keeping-0.027
Brian-0.027
bench-0.026
Token
Token ID198
Feature activation-1.00e-01

Pos logit contributions
ires+0.015
ushing+0.015
ia+0.015
outed+0.015
ara+0.015
Neg logit contributions
 cheering-0.006
 searching-0.006
 rescuing-0.005
bench-0.005
ner-0.005
Token"
Token ID1
Feature activation+2.66e-01

Pos logit contributions
 cheering+0.001
Charl+0.001
ner+0.001
+0.001
 searching+0.001
Neg logit contributions
ires-0.003
ushing-0.003
outed-0.003
ia-0.003
uddle-0.003
TokenAr
Token ID3163
Feature activation+2.41e+00

Pos logit contributions
ia+0.008
ires+0.008
ushing+0.008
uddle+0.008
ours+0.007
Neg logit contributions
 cheering-0.005
 rescuing-0.004
 searching-0.004
Charl-0.004
keeping-0.004
Tokenrr
Token ID21062
Feature activation+9.47e-01

Pos logit contributions
ia+0.052
ara+0.050
ires+0.049
ushing+0.048
outed+0.047
Neg logit contributions
keeping-0.061
 searching-0.059
 cheering-0.059
bench-0.059
 rescuing-0.059
Token,
Token ID11
Feature activation+6.94e-01

Pos logit contributions
ia+0.026
ires+0.026
ara+0.026
outed+0.025
ours+0.025
Neg logit contributions
 cheering-0.018
 rescuing-0.017
 searching-0.017
keeping-0.017
bench-0.016
Token mate
Token ID16133
Feature activation+6.60e+00

Pos logit contributions
outed+0.019
uddle+0.017
ires+0.017
ara+0.017
ushing+0.017
Neg logit contributions
 searching-0.016
 cheering-0.015
 rescuing-0.015
 exploring-0.015
 helping-0.015
Tokeny
Token ID88
Feature activation-2.37e-01

Pos logit contributions
ires+0.169
outed+0.162
ours+0.159
ara+0.157
ushing+0.154
Neg logit contributions
 cheering-0.130
keeping-0.130
 rescuing-0.126
 searching-0.122
 exploring-0.117
Token,
Token ID11
Feature activation+8.03e-01

Pos logit contributions
 cheering+0.004
 rescuing+0.004
 searching+0.004
keeping+0.004
 improving+0.004
Neg logit contributions
ia-0.006
ara-0.006
ires-0.006
outed-0.006
ushing-0.006
Token this
Token ID428
Feature activation-4.24e-01

Pos logit contributions
outed+0.022
ara+0.020
ires+0.020
uddle+0.020
ia+0.020
Neg logit contributions
 searching-0.018
 rescuing-0.018
 cheering-0.018
 exploring-0.017
 helping-0.017
Token is
Token ID318
Feature activation+8.36e-01

Pos logit contributions
 searching+0.009
 rescuing+0.009
rob+0.008
bench+0.008
 cheering+0.008
Neg logit contributions
outed-0.011
ires-0.011
ours-0.010
ushing-0.010
ara-0.010
Token our
Token ID674
Feature activation+1.19e+00

Pos logit contributions
ia+0.021
ires+0.020
ara+0.020
outed+0.020
ours+0.019
Neg logit contributions
 cheering-0.018
 rescuing-0.018
 searching-0.018
keeping-0.017
bench-0.017
TokenA
Token ID32
Feature activation-6.11e-01

Pos logit contributions
 cheering+0.007
 rescuing+0.007
 searching+0.007
keeping+0.006
bench+0.006
Neg logit contributions
ia-0.009
ara-0.009
ires-0.009
outed-0.008
ours-0.008
Token few
Token ID1178
Feature activation+5.39e-01

Pos logit contributions
keeping+0.013
 rescuing+0.012
rob+0.012
 searching+0.012
hered+0.012
Neg logit contributions
ia-0.017
ara-0.016
ires-0.014
ry-0.014
ushing-0.014
Token days
Token ID1528
Feature activation-1.31e+00

Pos logit contributions
ia+0.016
ara+0.016
outed+0.015
ires+0.015
ours+0.015
Neg logit contributions
keeping-0.010
 rescuing-0.009
 cheering-0.009
 searching-0.009
bench-0.008
Token later
Token ID1568
Feature activation+1.18e-01

Pos logit contributions
 cheering+0.034
 rescuing+0.034
 searching+0.034
keeping+0.033
bench+0.033
Neg logit contributions
ara-0.026
ia-0.026
ires-0.025
outed-0.024
ours-0.024
Token,
Token ID11
Feature activation+1.71e+00

Pos logit contributions
ara+0.003
ia+0.003
ires+0.003
outed+0.003
ushing+0.003
Neg logit contributions
 cheering-0.002
 rescuing-0.002
 searching-0.002
keeping-0.002
bench-0.002
Token Sab
Token ID9910
Feature activation+6.46e+00

Pos logit contributions
ires+0.047
ara+0.046
outed+0.046
uddle+0.046
ia+0.046
Neg logit contributions
 cheering-0.033
 searching-0.033
 rescuing-0.031
 exploring-0.030
 improving-0.030
Tokenrina
Token ID22267
Feature activation+1.40e+00

Pos logit contributions
uddle+0.166
ours+0.161
ires+0.156
ara+0.152
erve+0.150
Neg logit contributions
keeping-0.121
 searching-0.115
 rescuing-0.113
Charl-0.113
 cheering-0.113
Token had
Token ID550
Feature activation-1.67e+00

Pos logit contributions
ia+0.032
ara+0.032
ires+0.030
outed+0.030
ushing+0.030
Neg logit contributions
 rescuing-0.032
 cheering-0.032
 searching-0.032
keeping-0.032
bench-0.031
Token to
Token ID284
Feature activation+3.21e-01

Pos logit contributions
 cheering+0.030
 rescuing+0.029
 searching+0.029
rob+0.028
keeping+0.028
Neg logit contributions
ia-0.047
ara-0.047
ires-0.045
outed-0.045
ours-0.044
Token stay
Token ID2652
Feature activation+9.84e-01

Pos logit contributions
ires+0.008
ia+0.007
ara+0.007
outed+0.007
ours+0.007
Neg logit contributions
 cheering-0.007
 rescuing-0.007
 searching-0.007
keeping-0.007
 improving-0.007
Token in
Token ID287
Feature activation-2.69e-01

Pos logit contributions
ia+0.028
ara+0.027
ires+0.027
outed+0.026
ours+0.026
Neg logit contributions
 cheering-0.018
 searching-0.018
 rescuing-0.018
keeping-0.017
 improving-0.017
Token minute
Token ID5664
Feature activation+2.80e-01

Pos logit contributions
keeping+0.019
 rescuing+0.019
 cheering+0.019
 searching+0.019
bench+0.019
Neg logit contributions
ia-0.019
ara-0.019
ires-0.017
outed-0.017
uddle-0.017
Token.
Token ID13
Feature activation+1.39e+00

Pos logit contributions
ia+0.007
ara+0.007
ires+0.007
outed+0.007
ushing+0.007
Neg logit contributions
 cheering-0.005
 rescuing-0.005
 searching-0.005
keeping-0.005
bench-0.005
Token Is
Token ID1148
Feature activation+2.63e+00

Pos logit contributions
uddle+0.042
outed+0.040
ia+0.040
ires+0.040
ushing+0.039
Neg logit contributions
 rescuing-0.025
ner-0.023
 cheering-0.022
keeping-0.022
 Owl-0.022
Token that
Token ID326
Feature activation+6.84e-01

Pos logit contributions
uddle+0.073
ushing+0.072
ara+0.071
ires+0.071
ia+0.069
Neg logit contributions
 searching-0.048
ner-0.047
 rescuing-0.046
 exploring-0.046
 flying-0.044
Token her
Token ID607
Feature activation+2.25e+00

Pos logit contributions
outed+0.018
ires+0.018
ushing+0.017
ours+0.017
ara+0.017
Neg logit contributions
 rescuing-0.014
 cheering-0.014
 searching-0.014
 improving-0.013
 helping-0.013
Token mate
Token ID16133
Feature activation+6.44e+00

Pos logit contributions
ires+0.055
ara+0.055
ia+0.054
outed+0.054
ushing+0.052
Neg logit contributions
 searching-0.048
 cheering-0.048
 rescuing-0.048
keeping-0.046
 exploring-0.046
Token?
Token ID30
Feature activation+6.28e-01

Pos logit contributions
ara+0.135
ires+0.134
outed+0.130
ia+0.128
ours+0.128
Neg logit contributions
 cheering-0.155
 rescuing-0.154
keeping-0.153
 searching-0.150
 exploring-0.145
Token She
Token ID1375
Feature activation+2.17e+00

Pos logit contributions
uddle+0.017
ia+0.017
outed+0.017
ires+0.017
ushing+0.017
Neg logit contributions
 rescuing-0.012
ner-0.012
 cheering-0.011
 searching-0.011
Charl-0.011
Token must
Token ID1276
Feature activation+6.34e-01

Pos logit contributions
ires+0.053
ara+0.052
ia+0.052
outed+0.050
ushing+0.049
Neg logit contributions
 searching-0.048
 rescuing-0.047
 cheering-0.046
 improving-0.044
 exploring-0.044
Token love
Token ID1842
Feature activation+8.46e-01

Pos logit contributions
outed+0.019
ires+0.019
ours+0.019
ushing+0.018
ara+0.018
Neg logit contributions
 rescuing-0.011
 searching-0.011
 cheering-0.011
keeping-0.010
 improving-0.010
Token him
Token ID683
Feature activation+1.33e+00

Pos logit contributions
ires+0.017
ia+0.017
ara+0.017
ours+0.016
outed+0.016
Neg logit contributions
 cheering-0.022
bench-0.021
 rescuing-0.021
 searching-0.021
hered-0.021
Token't
Token ID470
Feature activation-1.27e+00

Pos logit contributions
keeping+0.014
 rescuing+0.013
 cheering+0.013
 searching+0.013
rob+0.013
Neg logit contributions
ia-0.029
ires-0.029
ara-0.028
ours-0.027
ushing-0.026
Token know
Token ID760
Feature activation+7.14e-01

Pos logit contributions
rob+0.027
 rescuing+0.027
 cheering+0.027
keeping+0.026
 searching+0.026
Neg logit contributions
ia-0.032
ara-0.031
ires-0.031
ushing-0.030
outed-0.029
Token.
Token ID13
Feature activation-6.68e-01

Pos logit contributions
ara+0.016
ires+0.016
ia+0.016
outed+0.016
uddle+0.015
Neg logit contributions
 cheering-0.017
 searching-0.017
 rescuing-0.017
 exploring-0.016
 improving-0.016
Token
Token ID198
Feature activation+4.62e-01

Pos logit contributions
 cheering+0.011
 searching+0.011
 rescuing+0.010
bench+0.010
keeping+0.010
Neg logit contributions
ara-0.019
ires-0.019
ushing-0.019
ia-0.019
uddle-0.019
Token
Token ID198
Feature activation-2.61e-01

Pos logit contributions
ia+0.013
ara+0.013
ires+0.013
outed+0.012
ours+0.012
Neg logit contributions
 rescuing-0.008
 cheering-0.008
 searching-0.008
keeping-0.008
 improving-0.008
TokenLuc
Token ID22946
Feature activation+6.24e+00

Pos logit contributions
 cheering+0.005
 searching+0.005
Brian+0.005
 improving+0.005
 rescuing+0.005
Neg logit contributions
ours-0.007
uddle-0.007
ara-0.007
ushing-0.007
ires-0.007
Tokenas
Token ID292
Feature activation+7.47e-01

Pos logit contributions
uddle+0.132
ara+0.128
ours+0.127
erve+0.125
erving+0.116
Neg logit contributions
 searching-0.126
 insepar-0.126
Brian-0.125
 cheering-0.125
bench-0.124
Token started
Token ID2067
Feature activation-1.96e+00

Pos logit contributions
ires+0.022
ours+0.020
ara+0.020
ia+0.020
uddle+0.020
Neg logit contributions
 searching-0.013
 cheering-0.013
hered-0.013
 rescuing-0.012
keeping-0.012
Token to
Token ID284
Feature activation-4.40e-01

Pos logit contributions
rob+0.040
ner+0.037
bench+0.037
keeping+0.037
 rescuing+0.037
Neg logit contributions
ia-0.050
outed-0.050
ara-0.049
ires-0.049
ours-0.049
Token complain
Token ID13121
Feature activation+1.28e+00

Pos logit contributions
rob+0.009
 rescuing+0.009
keeping+0.008
bench+0.008
 cheering+0.008
Neg logit contributions
ia-0.011
ara-0.011
outed-0.011
ires-0.011
ours-0.011
Token "
Token ID366
Feature activation-7.39e-01

Pos logit contributions
ara+0.041
ires+0.040
ia+0.039
ours+0.039
outed+0.039
Neg logit contributions
 cheering-0.019
 searching-0.019
 rescuing-0.018
 exploring-0.018
 helping-0.017
Token Ben
Token ID3932
Feature activation+1.61e+00

Pos logit contributions
ushing+0.034
uddle+0.034
ia+0.033
ires+0.033
aj+0.033
Neg logit contributions
 searching-0.023
 cheering-0.023
ner-0.022
 exploring-0.022
 very-0.021
Token finds
Token ID7228
Feature activation+2.44e+00

Pos logit contributions
ires+0.041
ia+0.040
outed+0.039
ara+0.038
ushing+0.038
Neg logit contributions
 searching-0.034
 cheering-0.033
keeping-0.032
 rescuing-0.032
bench-0.031
Token a
Token ID257
Feature activation+8.34e-01

Pos logit contributions
ara+0.081
ires+0.081
ia+0.080
ushing+0.079
outed+0.078
Neg logit contributions
 searching-0.031
 rescuing-0.030
 cheering-0.030
bench-0.027
keeping-0.027
Token cap
Token ID1451
Feature activation+1.20e+00

Pos logit contributions
ia+0.018
ara+0.018
ires+0.017
uddle+0.017
ours+0.017
Neg logit contributions
keeping-0.021
 rescuing-0.020
 cheering-0.020
rob-0.019
hered-0.019
Token in
Token ID287
Feature activation+7.96e-01

Pos logit contributions
ires+0.040
ara+0.040
ia+0.040
outed+0.039
ours+0.039
Neg logit contributions
 rescuing-0.015
 searching-0.015
 cheering-0.015
bench-0.014
hered-0.014
Token his
Token ID465
Feature activation+6.20e+00

Pos logit contributions
ia+0.022
ara+0.022
ires+0.022
outed+0.021
ours+0.021
Neg logit contributions
 cheering-0.015
 rescuing-0.014
 searching-0.014
keeping-0.014
 improving-0.014
Token room
Token ID2119
Feature activation+2.17e+00

Pos logit contributions
ires+0.160
outed+0.156
ushing+0.156
ara+0.154
ours+0.150
Neg logit contributions
 searching-0.130
 rescuing-0.120
 exploring-0.117
 cheering-0.116
bench-0.115
Token.
Token ID13
Feature activation-7.16e-01

Pos logit contributions
ires+0.072
ara+0.072
ours+0.069
uddle+0.069
outed+0.069
Neg logit contributions
 searching-0.027
bench-0.027
 cheering-0.025
 rescuing-0.025
keeping-0.024
Token It
Token ID632
Feature activation-3.10e-01

Pos logit contributions
 searching+0.011
bench+0.011
 aback+0.011
 rescuing+0.010
ner+0.010
Neg logit contributions
ust-0.021
outed-0.021
ires-0.020
aj-0.020
ara-0.020
Token is
Token ID318
Feature activation+6.83e-02

Pos logit contributions
 searching+0.006
rob+0.006
 cheering+0.006
 rescuing+0.006
bench+0.006
Neg logit contributions
ires-0.008
ushing-0.008
outed-0.008
ara-0.007
ia-0.007
Token a
Token ID257
Feature activation+7.91e-01

Pos logit contributions
ara+0.002
ires+0.002
ia+0.002
outed+0.002
ours+0.002
Neg logit contributions
 searching-0.001
 cheering-0.001
 rescuing-0.001
 improving-0.001
bench-0.001
Token can
Token ID460
Feature activation-4.17e-01

Pos logit contributions
ires+0.056
outed+0.055
ia+0.052
ushing+0.051
ara+0.049
Neg logit contributions
 searching-0.034
 rescuing-0.034
 exploring-0.033
 cheering-0.033
 helping-0.033
Token I
Token ID314
Feature activation-3.38e-01

Pos logit contributions
 rescuing+0.009
ner+0.009
 cheering+0.009
 searching+0.009
 helping+0.008
Neg logit contributions
ires-0.011
outed-0.010
ia-0.010
ushing-0.010
ara-0.010
Token play
Token ID711
Feature activation+8.92e-02

Pos logit contributions
 searching+0.005
 cheering+0.005
 rescuing+0.004
hered+0.004
 exploring+0.004
Neg logit contributions
ires-0.012
outed-0.011
ara-0.011
ia-0.011
ushing-0.011
Token in
Token ID287
Feature activation+2.52e-01

Pos logit contributions
ires+0.003
outed+0.003
ara+0.003
ia+0.003
ushing+0.003
Neg logit contributions
 cheering-0.002
 rescuing-0.002
 searching-0.002
 exploring-0.001
 improving-0.001
Token the
Token ID262
Feature activation+1.36e+00

Pos logit contributions
ia+0.008
ires+0.008
ara+0.008
outed+0.007
ours+0.007
Neg logit contributions
 cheering-0.004
 rescuing-0.004
 searching-0.004
keeping-0.004
bench-0.004
Token sandbox
Token ID35204
Feature activation+6.15e+00

Pos logit contributions
ires+0.032
outed+0.032
ushing+0.032
ours+0.031
ara+0.030
Neg logit contributions
 searching-0.031
 cheering-0.031
 distance-0.030
 rescuing-0.030
 exploring-0.029
Token?"
Token ID1701
Feature activation+1.78e+00

Pos logit contributions
ires+0.152
outed+0.148
ia+0.145
ushing+0.144
ours+0.142
Neg logit contributions
 searching-0.132
 cheering-0.131
 rescuing-0.126
 exploring-0.123
ner-0.121
Token asked
Token ID1965
Feature activation+1.15e+00

Pos logit contributions
ires+0.046
ushing+0.045
inch+0.043
ust+0.042
ia+0.042
Neg logit contributions
 searching-0.040
 cheering-0.039
ner-0.038
Ell-0.036
 rescuing-0.036
Token Lily
Token ID20037
Feature activation+8.13e-01

Pos logit contributions
ires+0.026
ushing+0.026
uddle+0.025
ia+0.024
ours+0.023
Neg logit contributions
 cheering-0.027
hered-0.026
 searching-0.026
 rescuing-0.026
ner-0.026
Token.
Token ID13
Feature activation-3.76e-02

Pos logit contributions
ia+0.018
outed+0.018
ara+0.018
ires+0.018
ushing+0.018
Neg logit contributions
rob-0.019
 rescuing-0.018
keeping-0.018
 cheering-0.018
 searching-0.017
Token
Token ID220
Feature activation+1.09e+00

Pos logit contributions
 cheering+0.001
 searching+0.001
 rescuing+0.001
bench+0.001
keeping+0.001
Neg logit contributions
ires-0.001
ia-0.001
ara-0.001
ushing-0.001
outed-0.001
Token found
Token ID1043
Feature activation+2.34e+00

Pos logit contributions
ires+0.083
ushing+0.076
ours+0.075
ia+0.075
outed+0.074
Neg logit contributions
 searching-0.050
 chasing-0.046
 exploring-0.044
hered-0.043
 rescuing-0.043
Token a
Token ID257
Feature activation+1.15e+00

Pos logit contributions
ia+0.065
ires+0.064
ara+0.064
outed+0.062
ours+0.061
Neg logit contributions
 cheering-0.044
keeping-0.043
bench-0.042
 searching-0.042
 rescuing-0.041
Token big
Token ID1263
Feature activation-7.75e-01

Pos logit contributions
ara+0.024
ia+0.024
uddle+0.023
ires+0.023
ushing+0.022
Neg logit contributions
keeping-0.029
 rescuing-0.028
 cheering-0.028
rob-0.028
 searching-0.027
Token knot
Token ID29654
Feature activation-5.91e-01

Pos logit contributions
keeping+0.020
rob+0.019
 rescuing+0.018
 cheering+0.018
ner+0.018
Neg logit contributions
ara-0.017
ia-0.016
uddle-0.016
ires-0.015
ours-0.015
Token in
Token ID287
Feature activation+1.01e+00

Pos logit contributions
 cheering+0.012
keeping+0.012
 rescuing+0.011
rob+0.011
 searching+0.011
Neg logit contributions
ia-0.015
ires-0.015
ara-0.015
outed-0.015
ushing-0.014
Token her
Token ID607
Feature activation+6.11e+00

Pos logit contributions
ia+0.023
ara+0.022
ires+0.021
outed+0.020
ours+0.019
Neg logit contributions
 cheering-0.025
rob-0.024
 rescuing-0.024
 searching-0.024
ner-0.024
Token yarn
Token ID21181
Feature activation+1.25e+00

Pos logit contributions
ires+0.152
outed+0.148
ours+0.147
ara+0.147
ia+0.146
Neg logit contributions
 searching-0.128
 rescuing-0.123
Charl-0.122
 exploring-0.118
keeping-0.117
Token.
Token ID13
Feature activation-4.70e-01

Pos logit contributions
ia+0.032
ires+0.032
ara+0.032
outed+0.031
ushing+0.030
Neg logit contributions
 cheering-0.026
 rescuing-0.025
 searching-0.025
keeping-0.024
 improving-0.024
Token She
Token ID1375
Feature activation+1.56e+00

Pos logit contributions
 cheering+0.012
keeping+0.012
 searching+0.012
 improving+0.012
 rescuing+0.012
Neg logit contributions
ara-0.010
ia-0.010
ires-0.009
outed-0.009
ours-0.009
Token tried
Token ID3088
Feature activation-4.71e-01

Pos logit contributions
ires+0.040
ara+0.038
ia+0.038
ushing+0.037
ours+0.037
Neg logit contributions
 searching-0.033
 cheering-0.032
 rescuing-0.032
bench-0.030
 improving-0.030
Token to
Token ID284
Feature activation+4.36e-01

Pos logit contributions
 Amen+0.015
keeping+0.014
rob+0.014
bench+0.014
folk+0.013
Neg logit contributions
ia-0.008
ara-0.008
ires-0.007
okes-0.007
ours-0.007
Token found
Token ID1043
Feature activation+2.21e+00

Pos logit contributions
ires+0.095
ours+0.087
ia+0.086
ushing+0.085
uddle+0.082
Neg logit contributions
 searching-0.050
 cheering-0.048
 chasing-0.045
 exploring-0.043
keeping-0.043
Token a
Token ID257
Feature activation+8.14e-01

Pos logit contributions
ara+0.070
ia+0.069
ires+0.069
outed+0.067
ushing+0.067
Neg logit contributions
 cheering-0.032
 rescuing-0.032
 searching-0.032
keeping-0.030
 improving-0.029
Token new
Token ID649
Feature activation+8.02e-01

Pos logit contributions
ara+0.017
ia+0.017
uddle+0.017
ires+0.017
ours+0.016
Neg logit contributions
keeping-0.021
 rescuing-0.020
 cheering-0.019
rob-0.019
 searching-0.019
Token car
Token ID1097
Feature activation+1.10e+00

Pos logit contributions
ia+0.016
ara+0.016
ires+0.016
outed+0.015
uddle+0.015
Neg logit contributions
 rescuing-0.021
keeping-0.021
 cheering-0.020
 searching-0.020
Charl-0.020
Token in
Token ID287
Feature activation+3.07e-01

Pos logit contributions
ires+0.031
ia+0.031
ara+0.031
outed+0.030
ours+0.029
Neg logit contributions
 cheering-0.020
 rescuing-0.019
 searching-0.019
keeping-0.019
 improving-0.018
Token his
Token ID465
Feature activation+6.10e+00

Pos logit contributions
ara+0.008
ia+0.008
ires+0.008
outed+0.007
ours+0.007
Neg logit contributions
 rescuing-0.006
 cheering-0.006
 searching-0.006
keeping-0.006
hered-0.006
Token toy
Token ID13373
Feature activation+3.65e+00

Pos logit contributions
ires+0.168
ushing+0.164
ia+0.163
ara+0.162
outed+0.161
Neg logit contributions
 searching-0.119
 cheering-0.108
 exploring-0.107
bench-0.106
 rescuing-0.105
Token box
Token ID3091
Feature activation+1.72e+00

Pos logit contributions
ia+0.104
ires+0.104
outed+0.098
ara+0.098
ushing+0.098
Neg logit contributions
 searching-0.067
 cheering-0.065
rob-0.065
 rescuing-0.064
bench-0.063
Token.
Token ID13
Feature activation-1.61e-01

Pos logit contributions
ara+0.053
ires+0.052
ia+0.052
ours+0.051
outed+0.050
Neg logit contributions
 searching-0.028
 rescuing-0.026
bench-0.026
 cheering-0.025
keeping-0.025
Token It
Token ID632
Feature activation-4.32e-01

Pos logit contributions
 rescuing+0.003
 searching+0.003
 aback+0.002
 fixing+0.002
 cheering+0.002
Neg logit contributions
ara-0.005
aj-0.005
outed-0.005
ust-0.005
uddle-0.005
Token was
Token ID373
Feature activation+4.52e-01

Pos logit contributions
 cheering+0.008
 searching+0.008
bench+0.007
 rescuing+0.007
rob+0.007
Neg logit contributions
ires-0.013
ara-0.012
ia-0.012
ushing-0.012
ours-0.012
Token found
Token ID1043
Feature activation+2.01e+00

Pos logit contributions
ires+0.084
ia+0.077
ours+0.076
ushing+0.075
ara+0.073
Neg logit contributions
 searching-0.048
 cheering-0.046
bench-0.046
keeping-0.044
 improving-0.042
Token a
Token ID257
Feature activation+8.09e-01

Pos logit contributions
ires+0.057
ia+0.056
ara+0.056
outed+0.055
ours+0.054
Neg logit contributions
 cheering-0.036
keeping-0.035
 searching-0.035
 rescuing-0.035
bench-0.034
Token new
Token ID649
Feature activation+9.35e-01

Pos logit contributions
ara+0.016
ia+0.016
uddle+0.016
ires+0.015
ours+0.015
Neg logit contributions
keeping-0.022
 rescuing-0.020
rob-0.020
 cheering-0.020
hered-0.019
Token car
Token ID1097
Feature activation+1.15e+00

Pos logit contributions
ara+0.018
ia+0.018
ires+0.017
uddle+0.016
outed+0.016
Neg logit contributions
keeping-0.025
 rescuing-0.025
 cheering-0.025
 searching-0.024
hered-0.024
Token in
Token ID287
Feature activation+2.41e-01

Pos logit contributions
ires+0.030
ia+0.030
ara+0.030
outed+0.029
ushing+0.029
Neg logit contributions
 cheering-0.022
 rescuing-0.022
 searching-0.021
keeping-0.021
 improving-0.021
Token her
Token ID607
Feature activation+6.09e+00

Pos logit contributions
ara+0.006
ia+0.006
ires+0.006
outed+0.005
ours+0.005
Neg logit contributions
 rescuing-0.005
 cheering-0.005
 searching-0.005
rob-0.005
hered-0.005
Token toy
Token ID13373
Feature activation+3.55e+00

Pos logit contributions
ires+0.156
ia+0.153
ara+0.151
outed+0.151
ushing+0.150
Neg logit contributions
 searching-0.126
 cheering-0.118
bench-0.117
 rescuing-0.116
 exploring-0.115
Token box
Token ID3091
Feature activation+1.70e+00

Pos logit contributions
ia+0.092
ires+0.091
ara+0.087
outed+0.087
ushing+0.086
Neg logit contributions
 searching-0.073
 cheering-0.073
 rescuing-0.072
bench-0.070
rob-0.070
Token.
Token ID13
Feature activation-3.80e-01

Pos logit contributions
ara+0.053
ires+0.053
ia+0.052
ours+0.052
outed+0.050
Neg logit contributions
 searching-0.026
bench-0.025
 rescuing-0.025
 cheering-0.024
keeping-0.023
Token It
Token ID632
Feature activation-1.89e-01

Pos logit contributions
ner+0.007
 searching+0.007
 rescuing+0.007
 cheering+0.007
bench+0.006
Neg logit contributions
ia-0.011
ushing-0.010
ara-0.010
ires-0.010
outed-0.010
Token was
Token ID373
Feature activation+4.58e-01

Pos logit contributions
 cheering+0.004
 searching+0.004
 rescuing+0.004
bench+0.004
rob+0.004
Neg logit contributions
ires-0.005
ara-0.005
ia-0.005
ushing-0.005
outed-0.005
Token found
Token ID1043
Feature activation+2.13e+00

Pos logit contributions
ires+0.086
ours+0.076
ia+0.076
ushing+0.075
iring+0.074
Neg logit contributions
 searching-0.041
keeping-0.034
 cheering-0.034
 exploring-0.033
 looking-0.033
Token a
Token ID257
Feature activation+9.93e-01

Pos logit contributions
ia+0.062
ires+0.061
ara+0.061
outed+0.060
ours+0.059
Neg logit contributions
 cheering-0.037
 rescuing-0.036
 searching-0.036
keeping-0.035
bench-0.034
Token new
Token ID649
Feature activation+8.60e-01

Pos logit contributions
ara+0.022
ia+0.022
ires+0.021
uddle+0.021
outed+0.020
Neg logit contributions
keeping-0.024
 rescuing-0.024
 cheering-0.023
rob-0.023
 searching-0.023
Token toy
Token ID13373
Feature activation+1.55e+00

Pos logit contributions
ara+0.019
ia+0.018
ires+0.018
uddle+0.017
outed+0.017
Neg logit contributions
 cheering-0.021
 rescuing-0.021
keeping-0.021
rob-0.020
 searching-0.020
Token in
Token ID287
Feature activation+3.70e-01

Pos logit contributions
ires+0.040
ara+0.040
ia+0.040
outed+0.040
ours+0.039
Neg logit contributions
 rescuing-0.031
 searching-0.031
 cheering-0.030
hered-0.029
ner-0.029
Token her
Token ID607
Feature activation+6.07e+00

Pos logit contributions
ara+0.010
ia+0.010
ires+0.010
outed+0.009
ours+0.009
Neg logit contributions
 rescuing-0.007
 cheering-0.007
 searching-0.007
keeping-0.007
rob-0.007
Token box
Token ID3091
Feature activation+1.78e+00

Pos logit contributions
outed+0.162
ires+0.161
ours+0.159
ia+0.158
ara+0.158
Neg logit contributions
 searching-0.120
 exploring-0.108
 rescuing-0.106
Charl-0.106
bench-0.104
Token for
Token ID329
Feature activation-2.13e-01

Pos logit contributions
ara+0.054
ires+0.054
ia+0.053
ours+0.053
outed+0.051
Neg logit contributions
 searching-0.029
 rescuing-0.027
bench-0.027
 cheering-0.026
keeping-0.025
Token other
Token ID584
Feature activation+5.97e-01

Pos logit contributions
 cheering+0.005
 rescuing+0.004
rob+0.004
 searching+0.004
bench+0.004
Neg logit contributions
ia-0.005
ires-0.005
ara-0.005
outed-0.005
ours-0.005
Token things
Token ID1243
Feature activation+1.98e+00

Pos logit contributions
ia+0.014
ires+0.014
ara+0.014
outed+0.014
ours+0.013
Neg logit contributions
 cheering-0.013
 rescuing-0.013
 searching-0.013
keeping-0.013
bench-0.013
Token.
Token ID13
Feature activation-1.80e-01

Pos logit contributions
ia+0.044
ires+0.044
ara+0.044
ushing+0.042
outed+0.042
Neg logit contributions
 cheering-0.045
rob-0.045
 rescuing-0.044
keeping-0.043
Charl-0.043
Token finds
Token ID7228
Feature activation+2.48e+00

Pos logit contributions
ires+0.058
ia+0.056
outed+0.055
ushing+0.054
ours+0.053
Neg logit contributions
 searching-0.049
 cheering-0.046
keeping-0.046
 exploring-0.046
 rescuing-0.045
Token a
Token ID257
Feature activation+1.10e+00

Pos logit contributions
ia+0.075
ires+0.075
ara+0.075
outed+0.072
ours+0.071
Neg logit contributions
 cheering-0.040
 rescuing-0.039
 searching-0.039
keeping-0.038
bench-0.037
Token new
Token ID649
Feature activation+1.19e+00

Pos logit contributions
ara+0.025
ia+0.025
ires+0.025
outed+0.024
uddle+0.023
Neg logit contributions
 cheering-0.025
 rescuing-0.025
keeping-0.025
 searching-0.024
bench-0.024
Token toy
Token ID13373
Feature activation+1.80e+00

Pos logit contributions
ia+0.027
ara+0.027
ires+0.026
outed+0.025
ours+0.025
Neg logit contributions
 cheering-0.028
 rescuing-0.028
 searching-0.028
keeping-0.027
 improving-0.027
Token in
Token ID287
Feature activation+5.80e-01

Pos logit contributions
outed+0.047
ires+0.047
ara+0.047
ia+0.047
ours+0.045
Neg logit contributions
 rescuing-0.035
 searching-0.035
 cheering-0.034
hered-0.033
ner-0.033
Token her
Token ID607
Feature activation+6.05e+00

Pos logit contributions
ara+0.015
ia+0.015
ires+0.015
outed+0.014
ours+0.014
Neg logit contributions
 cheering-0.011
 rescuing-0.011
 searching-0.011
keeping-0.011
rob-0.011
Token room
Token ID2119
Feature activation+2.53e+00

Pos logit contributions
outed+0.162
ushing+0.159
ires+0.159
ours+0.157
ara+0.156
Neg logit contributions
 searching-0.121
 exploring-0.110
 rescuing-0.110
Charl-0.107
keeping-0.106
Token.
Token ID13
Feature activation-3.23e-01

Pos logit contributions
ires+0.080
ara+0.079
outed+0.077
ia+0.077
ours+0.076
Neg logit contributions
 searching-0.037
 rescuing-0.034
 cheering-0.034
bench-0.033
keeping-0.033
Token It
Token ID632
Feature activation+7.41e-02

Pos logit contributions
 searching+0.006
 rescuing+0.006
ner+0.006
bench+0.006
keeping+0.005
Neg logit contributions
outed-0.009
ires-0.009
ara-0.009
ushing-0.009
ia-0.009
Token is
Token ID318
Feature activation+3.84e-01

Pos logit contributions
ires+0.002
outed+0.002
ushing+0.002
ara+0.002
ia+0.002
Neg logit contributions
 searching-0.002
 cheering-0.002
 rescuing-0.002
bench-0.002
 improving-0.001
Token a
Token ID257
Feature activation+1.07e+00

Pos logit contributions
ara+0.012
ires+0.012
ia+0.012
outed+0.012
ours+0.012
Neg logit contributions
 searching-0.006
 cheering-0.006
 rescuing-0.006
 improving-0.005
bench-0.005
Token It
Token ID632
Feature activation+1.74e-01

Pos logit contributions
uddle+0.061
outed+0.056
ia+0.056
andy+0.055
iet+0.055
Neg logit contributions
 rescuing-0.033
 Amen-0.033
keeping-0.031
 cheering-0.030
-0.029
Token needs
Token ID2476
Feature activation+2.82e-01

Pos logit contributions
ires+0.004
ara+0.004
ia+0.004
outed+0.004
ushing+0.004
Neg logit contributions
 searching-0.004
 rescuing-0.004
 cheering-0.004
keeping-0.004
 improving-0.004
Token to
Token ID284
Feature activation+6.82e-01

Pos logit contributions
ires+0.007
ia+0.007
ara+0.007
ours+0.007
outed+0.007
Neg logit contributions
 cheering-0.006
rob-0.006
bench-0.006
keeping-0.006
 improving-0.006
Token find
Token ID1064
Feature activation+1.87e+00

Pos logit contributions
ires+0.018
outed+0.018
ara+0.018
ushing+0.018
ours+0.018
Neg logit contributions
 rescuing-0.014
 searching-0.014
 cheering-0.014
keeping-0.013
 improving-0.013
Token a
Token ID257
Feature activation+5.01e-01

Pos logit contributions
ara+0.049
ia+0.048
outed+0.048
ires+0.047
ushing+0.046
Neg logit contributions
 rescuing-0.039
 searching-0.038
 cheering-0.037
keeping-0.036
 exploring-0.035
Token mate
Token ID16133
Feature activation+6.03e+00

Pos logit contributions
ara+0.012
ia+0.012
ires+0.011
uddle+0.011
ours+0.011
Neg logit contributions
 cheering-0.011
 rescuing-0.011
rob-0.011
keeping-0.011
bench-0.011
Token and
Token ID290
Feature activation+2.05e+00

Pos logit contributions
ara+0.167
ires+0.165
ia+0.164
outed+0.163
ours+0.161
Neg logit contributions
 cheering-0.109
keeping-0.108
 rescuing-0.107
 searching-0.107
bench-0.100
Token lay
Token ID3830
Feature activation+2.96e+00

Pos logit contributions
ara+0.067
ires+0.065
ia+0.065
ushing+0.064
outed+0.064
Neg logit contributions
 searching-0.033
 rescuing-0.032
 cheering-0.030
 exploring-0.029
keeping-0.029
Token eggs
Token ID9653
Feature activation+6.35e-01

Pos logit contributions
ia+0.094
ara+0.092
ires+0.090
ushing+0.089
outed+0.088
Neg logit contributions
keeping-0.046
 searching-0.044
 rescuing-0.044
 cheering-0.041
 exploring-0.041
Token.
Token ID13
Feature activation+2.10e+00

Pos logit contributions
ires+0.015
ia+0.015
ara+0.015
ushing+0.015
outed+0.015
Neg logit contributions
 cheering-0.014
 rescuing-0.013
 searching-0.013
Charl-0.013
 improving-0.013
Token Then
Token ID3244
Feature activation-1.56e-01

Pos logit contributions
uddle+0.057
outed+0.055
ia+0.055
ushing+0.052
ara+0.052
Neg logit contributions
 rescuing-0.042
 Amen-0.041
keeping-0.040
 cheering-0.040
 searching-0.039
Token
Token ID198
Feature activation+1.13e-01

Pos logit contributions
ires+0.011
ushing+0.011
ia+0.011
ara+0.011
outed+0.011
Neg logit contributions
 cheering-0.005
 searching-0.005
 rescuing-0.005
bench-0.005
keeping-0.004
Token"
Token ID1
Feature activation-2.14e-01

Pos logit contributions
ires+0.003
outed+0.003
ushing+0.003
ours+0.003
ia+0.003
Neg logit contributions
 cheering-0.002
 rescuing-0.002
 searching-0.002
Charl-0.002
Ell-0.002
TokenAr
Token ID3163
Feature activation+1.89e+00

Pos logit contributions
 cheering+0.004
 rescuing+0.004
 searching+0.004
keeping+0.003
bench+0.003
Neg logit contributions
ires-0.006
ia-0.006
ara-0.006
outed-0.006
ushing-0.006
Tokenr
Token ID81
Feature activation+1.16e+00

Pos logit contributions
ia+0.040
ara+0.038
ires+0.036
ushing+0.035
outed+0.034
Neg logit contributions
keeping-0.050
bench-0.049
 searching-0.049
 cheering-0.048
Charl-0.048
Token,
Token ID11
Feature activation+8.49e-01

Pos logit contributions
ires+0.038
outed+0.037
ara+0.035
ours+0.035
ushing+0.034
Neg logit contributions
 cheering-0.015
hered-0.014
ner-0.014
 searching-0.013
 chasing-0.013
Token mate
Token ID16133
Feature activation+6.03e+00

Pos logit contributions
outed+0.023
ires+0.021
uddle+0.021
ushing+0.021
ara+0.020
Neg logit contributions
 searching-0.020
 rescuing-0.019
 cheering-0.019
 exploring-0.019
 helping-0.018
Tokeny
Token ID88
Feature activation-1.49e-01

Pos logit contributions
ires+0.153
outed+0.146
ours+0.144
ara+0.143
ushing+0.140
Neg logit contributions
 cheering-0.119
keeping-0.118
 rescuing-0.115
 searching-0.113
 exploring-0.108
Token,
Token ID11
Feature activation+9.65e-01

Pos logit contributions
 cheering+0.003
 rescuing+0.003
 searching+0.003
keeping+0.003
bench+0.003
Neg logit contributions
ia-0.004
ires-0.004
ara-0.004
outed-0.004
ours-0.004
Token where
Token ID810
Feature activation+1.11e+00

Pos logit contributions
outed+0.027
ires+0.025
ara+0.024
uddle+0.024
ushing+0.024
Neg logit contributions
 rescuing-0.022
 searching-0.022
 cheering-0.021
 exploring-0.020
 helping-0.020
Token is
Token ID318
Feature activation-3.56e-01

Pos logit contributions
ires+0.035
outed+0.033
ushing+0.033
ara+0.033
ia+0.032
Neg logit contributions
 searching-0.017
 cheering-0.017
 exploring-0.016
 rescuing-0.016
 flying-0.016
Token the
Token ID262
Feature activation-1.38e+00

Pos logit contributions
 searching+0.007
 rescuing+0.007
 cheering+0.007
 exploring+0.007
 improving+0.007
Neg logit contributions
ara-0.009
outed-0.009
ia-0.009
ires-0.009
ushing-0.009
Token She
Token ID1375
Feature activation+1.20e+00

Pos logit contributions
 cheering+0.023
 searching+0.023
 rescuing+0.023
keeping+0.022
bench+0.022
Neg logit contributions
ia-0.032
ires-0.032
ara-0.031
outed-0.030
ushing-0.030
Token blinked
Token ID40360
Feature activation+1.16e+00

Pos logit contributions
ires+0.037
ushing+0.034
ours+0.034
ia+0.033
uddle+0.032
Neg logit contributions
 searching-0.023
 exploring-0.020
 cheering-0.020
 chasing-0.019
 flying-0.019
Token her
Token ID607
Feature activation+2.14e+00

Pos logit contributions
ara+0.033
ia+0.033
ours+0.032
ires+0.032
uddle+0.032
Neg logit contributions
 searching-0.022
 cheering-0.021
 exploring-0.021
 rescuing-0.020
 flying-0.019
Token eyes
Token ID2951
Feature activation-6.26e-01

Pos logit contributions
ara+0.024
ia+0.022
ires+0.022
ours+0.021
outed+0.020
Neg logit contributions
 searching-0.076
 cheering-0.075
 rescuing-0.074
 exploring-0.073
 improving-0.072
Token and
Token ID290
Feature activation-1.40e-01

Pos logit contributions
 cheering+0.012
 rescuing+0.012
 searching+0.012
keeping+0.012
bench+0.012
Neg logit contributions
ia-0.016
ires-0.016
ara-0.016
outed-0.016
ushing-0.015
Token imm
Token ID2296
Feature activation+6.02e+00

Pos logit contributions
 searching+0.003
 cheering+0.003
 exploring+0.003
 rescuing+0.003
 improving+0.003
Neg logit contributions
ires-0.004
ara-0.004
ours-0.004
uddle-0.004
ia-0.004
Tokenag
Token ID363
Feature activation+1.70e+00

Pos logit contributions
uddle+0.125
ara+0.124
ia+0.117
aj+0.112
ours+0.111
Neg logit contributions
bench-0.149
keeping-0.145
 cheering-0.144
 improving-0.139
 searching-0.138
Tokenined
Token ID1389
Feature activation+1.26e+00

Pos logit contributions
ara+0.047
uddle+0.046
ours+0.044
ires+0.044
aj+0.043
Neg logit contributions
 rescuing-0.033
 searching-0.032
keeping-0.032
 cheering-0.031
 exploring-0.030
Token she
Token ID673
Feature activation-7.20e-01

Pos logit contributions
ara+0.035
ia+0.034
ires+0.034
outed+0.034
ours+0.033
Neg logit contributions
 cheering-0.025
 searching-0.024
 rescuing-0.024
 improving-0.023
 exploring-0.022
Token could
Token ID714
Feature activation-1.38e+00

Pos logit contributions
 searching+0.015
 cheering+0.015
 rescuing+0.014
 improving+0.014
 exploring+0.014
Neg logit contributions
ires-0.019
ara-0.019
ours-0.018
ia-0.018
ushing-0.018
Token see
Token ID766
Feature activation-1.05e-01

Pos logit contributions
 cheering+0.026
 searching+0.025
 rescuing+0.024
 improving+0.023
 exploring+0.023
Neg logit contributions
ara-0.040
ours-0.039
ires-0.039
outed-0.039
uddle-0.038
Token finds
Token ID7228
Feature activation+2.69e+00

Pos logit contributions
ires+0.053
ia+0.051
outed+0.050
ushing+0.050
ara+0.049
Neg logit contributions
 searching-0.039
 cheering-0.038
keeping-0.037
bench-0.036
 rescuing-0.036
Token a
Token ID257
Feature activation+9.19e-01

Pos logit contributions
ires+0.090
ara+0.090
ia+0.089
ushing+0.088
outed+0.087
Neg logit contributions
 searching-0.034
 rescuing-0.034
 cheering-0.033
keeping-0.031
bench-0.030
Token new
Token ID649
Feature activation+1.14e+00

Pos logit contributions
ara+0.021
ia+0.021
ires+0.020
uddle+0.020
outed+0.019
Neg logit contributions
keeping-0.021
 rescuing-0.021
 cheering-0.021
 searching-0.021
rob-0.020
Token car
Token ID1097
Feature activation+1.29e+00

Pos logit contributions
ia+0.024
ara+0.024
ires+0.023
outed+0.022
uddle+0.022
Neg logit contributions
 rescuing-0.029
 cheering-0.029
keeping-0.029
 searching-0.028
 improving-0.028
Token in
Token ID287
Feature activation+7.73e-01

Pos logit contributions
ia+0.036
ires+0.036
ara+0.036
outed+0.034
ours+0.034
Neg logit contributions
 cheering-0.024
 rescuing-0.024
 searching-0.023
keeping-0.023
bench-0.022
Token his
Token ID465
Feature activation+6.02e+00

Pos logit contributions
ia+0.021
ara+0.021
ires+0.021
outed+0.020
ours+0.020
Neg logit contributions
 cheering-0.015
 rescuing-0.015
 searching-0.015
keeping-0.014
 improving-0.014
Token toy
Token ID13373
Feature activation+3.72e+00

Pos logit contributions
ires+0.163
ushing+0.158
outed+0.157
ara+0.156
ia+0.156
Neg logit contributions
 searching-0.119
 cheering-0.109
bench-0.108
 exploring-0.107
 rescuing-0.106
Token box
Token ID3091
Feature activation+2.09e+00

Pos logit contributions
ires+0.104
ia+0.103
outed+0.099
ushing+0.098
ara+0.098
Neg logit contributions
 searching-0.069
 cheering-0.068
 rescuing-0.067
bench-0.066
rob-0.066
Token.
Token ID13
Feature activation-6.09e-03

Pos logit contributions
ara+0.067
ires+0.067
ours+0.066
ia+0.065
outed+0.065
Neg logit contributions
bench-0.029
 searching-0.028
 rescuing-0.027
keeping-0.025
 cheering-0.025
Token It
Token ID632
Feature activation+8.62e-02

Pos logit contributions
 rescuing+0.000
 searching+0.000
bench+0.000
 cheering+0.000
keeping+0.000
Neg logit contributions
ara-0.000
outed-0.000
ires-0.000
ushing-0.000
aj-0.000
Token is
Token ID318
Feature activation+2.26e-01

Pos logit contributions
ires+0.002
ushing+0.002
outed+0.002
ara+0.002
ia+0.002
Neg logit contributions
 cheering-0.002
 searching-0.002
rob-0.002
bench-0.002
 rescuing-0.002
Token finds
Token ID7228
Feature activation+2.71e+00

Pos logit contributions
ires+0.045
outed+0.043
ia+0.042
ours+0.042
ushing+0.042
Neg logit contributions
 searching-0.036
keeping-0.034
 cheering-0.034
 rescuing-0.033
 exploring-0.033
Token a
Token ID257
Feature activation+6.91e-01

Pos logit contributions
ara+0.091
ires+0.090
ia+0.090
ushing+0.089
uddle+0.088
Neg logit contributions
 searching-0.034
 rescuing-0.034
 cheering-0.032
keeping-0.029
bench-0.029
Token big
Token ID1263
Feature activation-4.91e-01

Pos logit contributions
ara+0.015
ia+0.015
ires+0.015
uddle+0.014
ours+0.014
Neg logit contributions
keeping-0.016
 rescuing-0.016
 cheering-0.016
 searching-0.016
rob-0.015
Token puzzle
Token ID15027
Feature activation+4.48e-01

Pos logit contributions
keeping+0.012
 rescuing+0.011
 cheering+0.011
Charl+0.011
Ell+0.011
Neg logit contributions
ara-0.011
ia-0.011
ires-0.010
uddle-0.010
ours-0.010
Token in
Token ID287
Feature activation+7.99e-01

Pos logit contributions
ia+0.011
ires+0.011
ara+0.011
ushing+0.010
outed+0.010
Neg logit contributions
 cheering-0.010
keeping-0.009
 rescuing-0.009
Charl-0.009
rob-0.009
Token his
Token ID465
Feature activation+6.02e+00

Pos logit contributions
ia+0.022
ires+0.022
ara+0.022
outed+0.021
ours+0.021
Neg logit contributions
 cheering-0.015
 rescuing-0.015
 searching-0.015
keeping-0.014
bench-0.014
Token room
Token ID2119
Feature activation+2.31e+00

Pos logit contributions
ires+0.145
ara+0.145
outed+0.145
ia+0.143
ushing+0.142
Neg logit contributions
 searching-0.131
 rescuing-0.125
bench-0.121
 cheering-0.121
keeping-0.120
Token.
Token ID13
Feature activation-2.79e-01

Pos logit contributions
ara+0.074
ires+0.073
ia+0.072
uddle+0.071
ours+0.070
Neg logit contributions
 searching-0.033
bench-0.032
 rescuing-0.031
keeping-0.030
 cheering-0.030
Token It
Token ID632
Feature activation+1.31e-01

Pos logit contributions
bench+0.004
 searching+0.004
 aback+0.004
ner+0.004
 rescuing+0.004
Neg logit contributions
ust-0.008
ushing-0.008
outed-0.008
uddle-0.008
ires-0.008
Token has
Token ID468
Feature activation-7.45e-01

Pos logit contributions
ires+0.003
ara+0.003
ia+0.003
outed+0.003
ushing+0.003
Neg logit contributions
 searching-0.003
 cheering-0.003
 rescuing-0.003
bench-0.003
 improving-0.003
Token many
Token ID867
Feature activation-1.00e+00

Pos logit contributions
 cheering+0.014
 rescuing+0.014
 searching+0.014
keeping+0.014
bench+0.013
Neg logit contributions
ia-0.020
ires-0.020
ara-0.020
outed-0.019
ours-0.019
Token finds
Token ID7228
Feature activation+2.65e+00

Pos logit contributions
ires+0.066
ia+0.063
outed+0.062
ushing+0.061
ara+0.061
Neg logit contributions
 searching-0.052
 cheering-0.050
keeping-0.048
 exploring-0.048
 improving-0.047
Token a
Token ID257
Feature activation+1.22e+00

Pos logit contributions
ires+0.083
ara+0.082
ia+0.082
outed+0.080
ushing+0.080
Neg logit contributions
 rescuing-0.039
 searching-0.039
 cheering-0.039
keeping-0.037
 improving-0.036
Token new
Token ID649
Feature activation+9.32e-01

Pos logit contributions
ara+0.027
ia+0.027
ires+0.026
uddle+0.026
ours+0.025
Neg logit contributions
keeping-0.029
 rescuing-0.029
 cheering-0.028
rob-0.028
 searching-0.028
Token jewelry
Token ID22634
Feature activation+1.38e+00

Pos logit contributions
ara+0.020
ia+0.020
ires+0.019
outed+0.018
uddle+0.018
Neg logit contributions
 cheering-0.023
 rescuing-0.023
keeping-0.022
 searching-0.022
ner-0.022
Token in
Token ID287
Feature activation+7.46e-01

Pos logit contributions
ires+0.044
ia+0.044
ara+0.044
outed+0.043
ours+0.042
Neg logit contributions
 searching-0.020
 rescuing-0.019
 cheering-0.019
keeping-0.018
bench-0.018
Token her
Token ID607
Feature activation+6.01e+00

Pos logit contributions
ia+0.020
ara+0.020
ires+0.020
outed+0.019
ours+0.019
Neg logit contributions
 cheering-0.014
 rescuing-0.014
 searching-0.014
keeping-0.014
bench-0.013
Token mom
Token ID1995
Feature activation-8.03e-01

Pos logit contributions
outed+0.144
ours+0.143
ires+0.143
ushing+0.140
ara+0.139
Neg logit contributions
 searching-0.135
 rescuing-0.124
 exploring-0.124
Charl-0.123
keeping-0.121
Token's
Token ID338
Feature activation+3.47e+00

Pos logit contributions
 cheering+0.019
 rescuing+0.019
rob+0.019
ner+0.018
keeping+0.018
Neg logit contributions
ara-0.018
ia-0.017
ires-0.017
ushing-0.016
outed-0.016
Token drawer
Token ID33451
Feature activation+1.94e+00

Pos logit contributions
ara+0.080
ires+0.080
ia+0.079
outed+0.077
ours+0.076
Neg logit contributions
 searching-0.081
keeping-0.079
 rescuing-0.079
 cheering-0.078
bench-0.076
Token.
Token ID13
Feature activation-4.00e-01

Pos logit contributions
ires+0.061
ara+0.061
ia+0.058
ours+0.058
outed+0.057
Neg logit contributions
 searching-0.030
 rescuing-0.028
bench-0.027
keeping-0.026
 exploring-0.025
Token It
Token ID632
Feature activation+2.15e-02

Pos logit contributions
 searching+0.005
 exploring+0.005
keeping+0.005
 Smith+0.005
 rescuing+0.005
Neg logit contributions
ushing-0.012
inch-0.012
ust-0.012
aj-0.012
ia-0.012
Token found
Token ID1043
Feature activation+2.42e+00

Pos logit contributions
ires+0.091
ours+0.083
ushing+0.083
ia+0.079
iring+0.078
Neg logit contributions
 searching-0.044
 chasing-0.040
 exploring-0.038
 cheering-0.037
Ell-0.035
Token a
Token ID257
Feature activation+9.04e-01

Pos logit contributions
ara+0.076
ia+0.076
ires+0.076
outed+0.074
ushing+0.073
Neg logit contributions
 rescuing-0.036
 cheering-0.036
 searching-0.035
keeping-0.033
 improving-0.033
Token big
Token ID1263
Feature activation-7.93e-01

Pos logit contributions
ara+0.020
ia+0.020
ires+0.020
uddle+0.019
outed+0.019
Neg logit contributions
keeping-0.021
 rescuing-0.021
 cheering-0.021
 searching-0.020
rob-0.020
Token hammer
Token ID15554
Feature activation-7.03e-02

Pos logit contributions
keeping+0.019
 cheering+0.019
 rescuing+0.019
rob+0.018
Charl+0.018
Neg logit contributions
ara-0.018
ia-0.017
uddle-0.017
ires-0.016
ours-0.016
Token in
Token ID287
Feature activation+6.11e-01

Pos logit contributions
 cheering+0.001
keeping+0.001
 rescuing+0.001
 searching+0.001
Charl+0.001
Neg logit contributions
ires-0.002
ia-0.002
ara-0.002
outed-0.002
ushing-0.002
Token his
Token ID465
Feature activation+5.99e+00

Pos logit contributions
ara+0.016
ia+0.016
ires+0.015
ours+0.014
outed+0.014
Neg logit contributions
 cheering-0.013
 rescuing-0.013
 searching-0.012
hered-0.012
keeping-0.012
Token dad
Token ID9955
Feature activation-1.27e+00

Pos logit contributions
ires+0.159
ara+0.155
ia+0.154
outed+0.153
ours+0.151
Neg logit contributions
 searching-0.125
 rescuing-0.114
 exploring-0.111
bench-0.109
 cheering-0.107
Token's
Token ID338
Feature activation+4.03e+00

Pos logit contributions
 cheering+0.033
rob+0.033
 rescuing+0.032
Charl+0.032
keeping+0.032
Neg logit contributions
ara-0.025
ia-0.024
ushing-0.023
ires-0.023
ours-0.022
Token tool
Token ID2891
Feature activation+1.77e+00

Pos logit contributions
ara+0.102
ires+0.100
ia+0.100
outed+0.095
ours+0.094
Neg logit contributions
 searching-0.089
keeping-0.084
 rescuing-0.083
bench-0.083
 cheering-0.081
Tokenbox
Token ID3524
Feature activation+1.21e+00

Pos logit contributions
ia+0.046
ires+0.046
ara+0.046
outed+0.045
ours+0.044
Neg logit contributions
 cheering-0.035
 rescuing-0.034
 searching-0.034
keeping-0.033
 improving-0.032
Token.
Token ID13
Feature activation-4.75e-01

Pos logit contributions
ia+0.034
ara+0.033
ires+0.033
ours+0.032
outed+0.032
Neg logit contributions
 searching-0.023
 rescuing-0.022
 cheering-0.022
keeping-0.022
bench-0.021
Token she
Token ID673
Feature activation+2.73e+00

Pos logit contributions
ushing+0.058
ia+0.058
ires+0.058
ours+0.055
uddle+0.055
Neg logit contributions
 searching-0.039
 cheering-0.037
 rescuing-0.037
ner-0.036
 helping-0.035
Token found
Token ID1043
Feature activation+2.08e+00

Pos logit contributions
ires+0.088
ia+0.079
ours+0.078
ushing+0.077
iring+0.075
Neg logit contributions
 searching-0.047
 cheering-0.044
 chasing-0.043
 exploring-0.042
hered-0.041
Token an
Token ID281
Feature activation+1.33e+00

Pos logit contributions
ara+0.059
ia+0.059
ires+0.059
outed+0.057
ours+0.056
Neg logit contributions
 cheering-0.037
 searching-0.035
 rescuing-0.035
keeping-0.035
bench-0.035
Token olive
Token ID19450
Feature activation-7.42e-01

Pos logit contributions
ara+0.031
ia+0.031
ires+0.031
outed+0.030
ours+0.030
Neg logit contributions
 cheering-0.029
 searching-0.029
 rescuing-0.029
keeping-0.028
 improving-0.028
Token in
Token ID287
Feature activation+6.55e-01

Pos logit contributions
 cheering+0.015
keeping+0.014
 rescuing+0.013
 improving+0.013
rob+0.013
Neg logit contributions
ia-0.020
ara-0.020
ires-0.019
uddle-0.019
ushing-0.019
Token her
Token ID607
Feature activation+5.96e+00

Pos logit contributions
ara+0.016
ia+0.016
ires+0.015
outed+0.014
ours+0.014
Neg logit contributions
 cheering-0.014
 rescuing-0.014
hered-0.014
 searching-0.014
 improving-0.014
Token backyard
Token ID24296
Feature activation+1.98e+00

Pos logit contributions
ires+0.155
outed+0.152
ours+0.150
ia+0.149
ara+0.148
Neg logit contributions
 searching-0.125
 rescuing-0.116
 exploring-0.116
Charl-0.115
keeping-0.113
Token.
Token ID13
Feature activation-7.23e-01

Pos logit contributions
ia+0.051
ires+0.051
ara+0.051
outed+0.049
ours+0.049
Neg logit contributions
 cheering-0.040
 rescuing-0.039
 searching-0.039
keeping-0.038
bench-0.037
Token She
Token ID1375
Feature activation+1.27e+00

Pos logit contributions
 searching+0.016
 cheering+0.015
 rescuing+0.015
keeping+0.015
ner+0.015
Neg logit contributions
ires-0.018
ia-0.018
ushing-0.017
outed-0.017
ara-0.017
Token was
Token ID373
Feature activation+1.02e-01

Pos logit contributions
ires+0.035
ara+0.033
ia+0.033
ushing+0.032
ours+0.032
Neg logit contributions
 searching-0.025
 cheering-0.024
 rescuing-0.023
 improving-0.022
 exploring-0.022
Token so
Token ID523
Feature activation-1.18e+00

Pos logit contributions
outed+0.003
ires+0.003
ia+0.003
ushing+0.003
ours+0.003
Neg logit contributions
rob-0.002
keeping-0.002
robat-0.002
bench-0.002
hered-0.002
Token noticed
Token ID6810
Feature activation+2.89e-01

Pos logit contributions
 cheering+0.029
 searching+0.028
 rescuing+0.028
 improving+0.026
keeping+0.026
Neg logit contributions
ires-0.039
ia-0.039
ara-0.039
ours-0.037
outed-0.037
Token her
Token ID607
Feature activation+3.47e-01

Pos logit contributions
ires+0.008
ara+0.008
ia+0.008
outed+0.008
ours+0.008
Neg logit contributions
 cheering-0.006
 searching-0.005
 rescuing-0.005
keeping-0.005
 improving-0.005
Token and
Token ID290
Feature activation-1.45e+00

Pos logit contributions
ires+0.010
ara+0.010
ia+0.010
outed+0.009
ours+0.009
Neg logit contributions
 cheering-0.006
 searching-0.006
 rescuing-0.006
keeping-0.006
 improving-0.006
Token invited
Token ID9392
Feature activation-4.58e-01

Pos logit contributions
 cheering+0.029
 searching+0.028
 rescuing+0.027
 improving+0.026
 exploring+0.026
Neg logit contributions
ires-0.040
ara-0.040
ia-0.039
ushing-0.038
outed-0.038
Token her
Token ID607
Feature activation-1.02e+00

Pos logit contributions
 rescuing+0.009
 cheering+0.009
 searching+0.009
keeping+0.009
bench+0.009
Neg logit contributions
ia-0.011
ires-0.011
ara-0.011
outed-0.011
ours-0.011
Token on
Token ID319
Feature activation-5.78e+00

Pos logit contributions
 cheering+0.016
 searching+0.016
 rescuing+0.016
keeping+0.015
 improving+0.015
Neg logit contributions
ires-0.031
ara-0.031
ia-0.031
outed-0.030
ours-0.029
Token stage
Token ID3800
Feature activation-3.92e-01

Pos logit contributions
rob+0.137
ner+0.127
keeping+0.127
 rescuing+0.126
bench+0.124
Neg logit contributions
ia-0.135
ara-0.130
ires-0.128
outed-0.120
ushing-0.116
Token to
Token ID284
Feature activation-1.66e+00

Pos logit contributions
 cheering+0.007
 searching+0.007
 rescuing+0.007
keeping+0.007
 improving+0.007
Neg logit contributions
ara-0.011
ires-0.011
ia-0.011
outed-0.010
ours-0.010
Token play
Token ID711
Feature activation-1.40e+00

Pos logit contributions
 cheering+0.031
 rescuing+0.030
 searching+0.030
keeping+0.029
 improving+0.028
Neg logit contributions
ires-0.046
ara-0.046
ia-0.046
outed-0.045
ours-0.044
Token.
Token ID13
Feature activation-6.74e-01

Pos logit contributions
 rescuing+0.030
 searching+0.029
keeping+0.029
 cheering+0.029
bench+0.028
Neg logit contributions
ia-0.034
ara-0.034
ires-0.033
outed-0.031
uddle-0.031
Token She
Token ID1375
Feature activation+2.47e-01

Pos logit contributions
 cheering+0.016
 searching+0.015
 rescuing+0.015
keeping+0.015
bench+0.015
Neg logit contributions
ires-0.015
ia-0.015
ara-0.015
outed-0.015
ushing-0.015
Token while
Token ID981
Feature activation-1.68e+00

Pos logit contributions
ia+0.013
ara+0.013
ires+0.013
outed+0.012
ushing+0.012
Neg logit contributions
 rescuing-0.011
 cheering-0.011
 searching-0.011
rob-0.011
keeping-0.011
Token Molly
Token ID30236
Feature activation-1.55e+00

Pos logit contributions
 cheering+0.029
 searching+0.029
 rescuing+0.028
keeping+0.027
 improving+0.027
Neg logit contributions
ara-0.049
ires-0.049
ia-0.049
outed-0.048
ours-0.047
Token wasn
Token ID2492
Feature activation-1.33e+00

Pos logit contributions
 cheering+0.032
 searching+0.032
 rescuing+0.032
keeping+0.031
 improving+0.030
Neg logit contributions
ires-0.040
ia-0.038
ara-0.038
outed-0.037
ours-0.037
Tokenâ
Token ID22940
Feature activation-9.33e-01

Pos logit contributions
keeping+0.019
rob+0.018
 rescuing+0.018
 searching+0.017
 cheering+0.017
Neg logit contributions
ia-0.043
ara-0.043
ires-0.042
outed-0.040
ours-0.040
Token
Token ID26391
Feature activation-2.78e+00

Pos logit contributions
 cheering+0.017
 searching+0.017
 rescuing+0.016
keeping+0.016
 improving+0.015
Neg logit contributions
ara-0.027
ia-0.026
ires-0.026
outed-0.025
ushing-0.025
Token
Token ID8151
Feature activation-5.51e+00

Pos logit contributions
 cheering+0.053
 searching+0.052
 rescuing+0.052
keeping+0.051
hered+0.049
Neg logit contributions
ires-0.076
ia-0.074
ara-0.074
outed-0.073
ours-0.072
Tokent
Token ID83
Feature activation-1.55e+00

Pos logit contributions
 searching+0.100
 cheering+0.100
 rescuing+0.099
keeping+0.099
rob+0.097
Neg logit contributions
ia-0.152
ara-0.152
outed-0.148
ires-0.147
ushing-0.146
Token looking
Token ID2045
Feature activation-2.83e-01

Pos logit contributions
rob+0.028
bench+0.025
Charl+0.024
 rescuing+0.024
Ell+0.024
Neg logit contributions
ia-0.043
ushing-0.042
ara-0.042
outed-0.041
ires-0.040
Token.
Token ID13
Feature activation-7.06e-01

Pos logit contributions
 cheering+0.006
 searching+0.006
 rescuing+0.006
keeping+0.006
 improving+0.005
Neg logit contributions
ia-0.007
ara-0.007
ires-0.007
outed-0.007
ours-0.007
Token
Token ID198
Feature activation-9.46e-01

Pos logit contributions
 cheering+0.010
 searching+0.010
 rescuing+0.010
keeping+0.010
 improving+0.009
Neg logit contributions
ia-0.022
ires-0.022
ara-0.022
outed-0.021
ushing-0.021
Token
Token ID198
Feature activation-1.30e+00

Pos logit contributions
 rescuing+0.017
 cheering+0.017
 searching+0.017
keeping+0.017
bench+0.016
Neg logit contributions
ia-0.026
ara-0.026
ires-0.026
outed-0.025
ours-0.025
Token that
Token ID326
Feature activation-1.32e+00

Pos logit contributions
 cheering+0.058
 rescuing+0.057
keeping+0.056
 searching+0.056
rob+0.056
Neg logit contributions
ia-0.058
ara-0.057
ires-0.056
outed-0.055
ushing-0.054
Token shouting
Token ID19642
Feature activation-5.68e-01

Pos logit contributions
 cheering+0.029
 rescuing+0.029
 searching+0.029
keeping+0.028
bench+0.028
Neg logit contributions
ia-0.031
ara-0.031
ires-0.031
outed-0.029
ours-0.029
Token wasn
Token ID2492
Feature activation-1.40e+00

Pos logit contributions
 cheering+0.011
 rescuing+0.010
 searching+0.010
 improving+0.010
hered+0.010
Neg logit contributions
ires-0.016
ara-0.016
ia-0.016
outed-0.015
ours-0.015
Tokenâ
Token ID22940
Feature activation-1.04e+00

Pos logit contributions
keeping+0.017
 searching+0.017
 cheering+0.017
rob+0.016
 rescuing+0.016
Neg logit contributions
ia-0.047
ara-0.046
ires-0.046
outed-0.044
ours-0.044
Token
Token ID26391
Feature activation-2.69e+00

Pos logit contributions
 cheering+0.019
 searching+0.019
 rescuing+0.019
keeping+0.018
 improving+0.018
Neg logit contributions
ara-0.029
ia-0.029
ires-0.029
outed-0.028
ushing-0.028
Token
Token ID8151
Feature activation-5.44e+00

Pos logit contributions
 cheering+0.051
 searching+0.050
 rescuing+0.050
keeping+0.048
hered+0.047
Neg logit contributions
ires-0.074
ia-0.073
ara-0.072
outed-0.071
ours-0.070
Tokent
Token ID83
Feature activation-1.99e+00

Pos logit contributions
 cheering+0.106
 searching+0.105
keeping+0.104
 rescuing+0.104
rob+0.102
Neg logit contributions
ara-0.144
ia-0.144
ushing-0.139
outed-0.139
ires-0.137
Token a
Token ID257
Feature activation-1.28e+00

Pos logit contributions
rob+0.029
keeping+0.027
bench+0.027
 rescuing+0.026
Charl+0.026
Neg logit contributions
ia-0.061
ires-0.060
ara-0.060
outed-0.059
ushing-0.059
Token regular
Token ID3218
Feature activation-1.50e+00

Pos logit contributions
rob+0.029
keeping+0.029
 rescuing+0.028
bench+0.028
 cheering+0.028
Neg logit contributions
ara-0.029
ia-0.029
uddle-0.028
ushing-0.027
ires-0.027
Token thing
Token ID1517
Feature activation+2.29e-01

Pos logit contributions
 cheering+0.032
 rescuing+0.032
 searching+0.032
keeping+0.031
bench+0.030
Neg logit contributions
ires-0.037
ia-0.037
ara-0.037
outed-0.036
ours-0.035
Token to
Token ID284
Feature activation-1.19e+00

Pos logit contributions
ia+0.006
ara+0.006
ires+0.006
outed+0.006
ushing+0.006
Neg logit contributions
 cheering-0.004
 searching-0.004
 rescuing-0.004
keeping-0.004
bench-0.004
Token grandma
Token ID49890
Feature activation-1.16e+00

Pos logit contributions
ires+0.074
ara+0.073
ia+0.071
ours+0.070
outed+0.070
Neg logit contributions
 searching-0.109
 cheering-0.109
 rescuing-0.105
 exploring-0.103
 improving-0.103
Token's
Token ID338
Feature activation+1.41e+00

Pos logit contributions
 cheering+0.026
 rescuing+0.025
keeping+0.025
rob+0.025
 searching+0.025
Neg logit contributions
ara-0.027
ia-0.027
ires-0.026
ushing-0.025
ours-0.025
Token house
Token ID2156
Feature activation+2.18e-01

Pos logit contributions
ia+0.028
ires+0.028
ara+0.028
outed+0.026
ushing+0.026
Neg logit contributions
 cheering-0.037
 rescuing-0.036
 searching-0.036
keeping-0.035
ner-0.035
Token for
Token ID329
Feature activation-1.10e+00

Pos logit contributions
ia+0.005
ara+0.005
ires+0.005
outed+0.005
ushing+0.005
Neg logit contributions
 cheering-0.005
 rescuing-0.004
 searching-0.004
rob-0.004
keeping-0.004
Token safe
Token ID3338
Feature activation-5.76e-01

Pos logit contributions
rob+0.022
 cheering+0.022
Charl+0.021
hered+0.021
bench+0.021
Neg logit contributions
ia-0.027
ires-0.027
ara-0.026
outed-0.025
ushing-0.025
Tokenkeeping
Token ID19934
Feature activation-5.44e+00

Pos logit contributions
ner+0.010
rob+0.010
Charl+0.010
 cheering+0.010
odic+0.010
Neg logit contributions
ia-0.015
ara-0.015
ires-0.015
outed-0.014
ushing-0.014
Token.
Token ID13
Feature activation+2.56e-01

Pos logit contributions
rob+0.151
Charl+0.147
Brian+0.145
 cheering+0.143
Silver+0.142
Neg logit contributions
ia-0.092
ushing-0.089
ara-0.087
ry-0.087
outed-0.086
Token
Token ID220
Feature activation-7.32e-01

Pos logit contributions
ia+0.008
ires+0.008
ara+0.008
ushing+0.008
ours+0.008
Neg logit contributions
 searching-0.004
 cheering-0.004
 rescuing-0.004
keeping-0.004
 exploring-0.004
Token
Token ID198
Feature activation-1.32e-01

Pos logit contributions
 cheering+0.011
 rescuing+0.010
 searching+0.010
keeping+0.010
bench+0.010
Neg logit contributions
ara-0.023
ia-0.023
ires-0.022
outed-0.022
ours-0.022
Token
Token ID198
Feature activation-8.39e-02

Pos logit contributions
 rescuing+0.003
 searching+0.003
 cheering+0.003
keeping+0.003
 improving+0.003
Neg logit contributions
ara-0.003
ia-0.003
ires-0.003
outed-0.003
uddle-0.003
TokenWhen
Token ID2215
Feature activation-1.04e+00

Pos logit contributions
 cheering+0.002
 rescuing+0.002
 searching+0.002
keeping+0.002
 improving+0.002
Neg logit contributions
ires-0.002
ia-0.002
ara-0.002
outed-0.002
ours-0.002
Token and
Token ID290
Feature activation-9.96e-01

Pos logit contributions
ires+0.016
ia+0.016
outed+0.015
ours+0.015
ara+0.015
Neg logit contributions
 searching-0.011
 cheering-0.011
keeping-0.011
 rescuing-0.011
 exploring-0.011
Token her
Token ID607
Feature activation+3.37e+00

Pos logit contributions
 cheering+0.022
 searching+0.021
 rescuing+0.021
keeping+0.021
 exploring+0.020
Neg logit contributions
ires-0.025
ia-0.025
ara-0.024
outed-0.024
ours-0.024
Token mom
Token ID1995
Feature activation-5.69e-01

Pos logit contributions
ires+0.042
ia+0.041
ara+0.040
outed+0.039
ours+0.039
Neg logit contributions
 cheering-0.114
 searching-0.112
 rescuing-0.111
Charl-0.109
hered-0.108
Token were
Token ID547
Feature activation-2.09e+00

Pos logit contributions
 cheering+0.007
 searching+0.007
 rescuing+0.006
 exploring+0.006
 improving+0.006
Neg logit contributions
ires-0.020
ia-0.019
ushing-0.019
uddle-0.018
outed-0.018
Token going
Token ID1016
Feature activation-3.90e+00

Pos logit contributions
keeping+0.033
 rescuing+0.033
rob+0.032
 cheering+0.032
bench+0.032
Neg logit contributions
ires-0.062
outed-0.062
ara-0.062
ia-0.061
ours-0.060
Token on
Token ID319
Feature activation-5.39e+00

Pos logit contributions
 cheering+0.066
 rescuing+0.065
keeping+0.065
bench+0.063
Charl+0.062
Neg logit contributions
ires-0.111
ara-0.109
ia-0.108
outed-0.107
ours-0.107
Token a
Token ID257
Feature activation+1.26e+00

Pos logit contributions
bench+0.126
 rescuing+0.124
keeping+0.119
rob+0.119
 cheering+0.118
Neg logit contributions
ara-0.126
ia-0.115
ires-0.111
ours-0.110
outed-0.105
Token tour
Token ID4205
Feature activation+1.59e+00

Pos logit contributions
ires+0.034
ia+0.034
ara+0.034
outed+0.033
ours+0.033
Neg logit contributions
 searching-0.024
 cheering-0.024
 rescuing-0.023
keeping-0.022
 improving-0.022
Token.
Token ID13
Feature activation-1.70e+00

Pos logit contributions
ia+0.041
ires+0.041
ara+0.041
outed+0.040
ours+0.039
Neg logit contributions
 cheering-0.032
 searching-0.031
 rescuing-0.031
keeping-0.030
bench-0.030
Token Mil
Token ID4460
Feature activation+1.97e+00

Pos logit contributions
 cheering+0.038
 searching+0.038
 exploring+0.036
 +0.035
 having+0.034
Neg logit contributions
ires-0.042
ushing-0.039
ia-0.037
ust-0.036
inch-0.036
Tokena
Token ID64
Feature activation-2.54e-01

Pos logit contributions
ires+0.043
ours+0.041
ara+0.040
uddle+0.039
ia+0.039
Neg logit contributions
 searching-0.045
 cheering-0.045
keeping-0.044
 rescuing-0.043
bench-0.043
Token of
Token ID286
Feature activation-1.86e+00

Pos logit contributions
 cheering+0.011
 searching+0.011
 rescuing+0.011
keeping+0.011
 improving+0.011
Neg logit contributions
ara-0.014
ia-0.013
ires-0.013
outed-0.013
ours-0.013
Token her
Token ID607
Feature activation-7.30e-01

Pos logit contributions
rob+0.038
bench+0.038
hered+0.038
 searching+0.038
 cheering+0.037
Neg logit contributions
ia-0.047
ires-0.046
ara-0.044
ushing-0.042
ours-0.041
Token daughter
Token ID4957
Feature activation-4.52e+00

Pos logit contributions
rob+0.016
keeping+0.016
 rescuing+0.015
Ell+0.015
bench+0.015
Neg logit contributions
ia-0.018
ires-0.018
ara-0.017
ushing-0.017
uddle-0.017
Tokenâ
Token ID22940
Feature activation-1.56e+00

Pos logit contributions
rob+0.106
 rescuing+0.100
bench+0.099
keeping+0.098
Charl+0.098
Neg logit contributions
ia-0.102
ara-0.098
ushing-0.098
ires-0.097
ours-0.093
Token
Token ID26391
Feature activation-3.57e+00

Pos logit contributions
 searching+0.031
 cheering+0.031
 rescuing+0.031
 improving+0.029
 chasing+0.029
Neg logit contributions
ara-0.042
ia-0.040
ires-0.038
ushing-0.037
outed-0.037
Token
Token ID8151
Feature activation-5.37e+00

Pos logit contributions
 rescuing+0.076
 cheering+0.073
 searching+0.073
 improving+0.070
keeping+0.070
Neg logit contributions
ara-0.091
ia-0.089
ires-0.085
ushing-0.085
outed-0.083
Tokens
Token ID82
Feature activation-1.21e+00

Pos logit contributions
 searching+0.114
 rescuing+0.113
 cheering+0.113
 improving+0.110
keeping+0.109
Neg logit contributions
ia-0.133
ara-0.132
ires-0.127
ushing-0.125
outed-0.125
Token new
Token ID649
Feature activation-1.46e+00

Pos logit contributions
rob+0.028
keeping+0.024
Ell+0.024
bench+0.023
ner+0.023
Neg logit contributions
ia-0.031
ires-0.029
ushing-0.029
outed-0.027
ara-0.027
Token skill
Token ID5032
Feature activation-2.35e+00

Pos logit contributions
ner+0.035
 rescuing+0.034
Charl+0.034
rob+0.033
Ell+0.033
Neg logit contributions
ia-0.032
ara-0.031
uddle-0.030
ires-0.030
ushing-0.029
Token.
Token ID13
Feature activation-1.16e+00

Pos logit contributions
rob+0.052
 rescuing+0.052
 cheering+0.052
 searching+0.052
Charl+0.050
Neg logit contributions
ia-0.054
ires-0.052
ara-0.051
ushing-0.051
outed-0.049
Token<|endoftext|>
Token ID50256
Feature activation-4.02e+00

Pos logit contributions
 cheering+0.020
 rescuing+0.020
 searching+0.020
keeping+0.019
bench+0.019
Neg logit contributions
ara-0.033
ia-0.033
ires-0.033
outed-0.032
ours-0.032
Token They
Token ID1119
Feature activation-1.03e+00

Pos logit contributions
 cheering+0.022
 searching+0.021
 rescuing+0.021
keeping+0.021
bench+0.020
Neg logit contributions
ia-0.021
ires-0.021
ara-0.021
outed-0.020
ushing-0.020
Token cheered
Token ID34767
Feature activation-7.50e-01

Pos logit contributions
 cheering+0.023
 rescuing+0.023
 searching+0.022
keeping+0.022
bench+0.022
Neg logit contributions
ia-0.024
ara-0.024
ires-0.024
outed-0.024
ours-0.023
Token and
Token ID290
Feature activation-2.59e+00

Pos logit contributions
 cheering+0.015
 rescuing+0.015
 searching+0.015
 improving+0.014
keeping+0.014
Neg logit contributions
ara-0.020
ia-0.020
ires-0.019
outed-0.019
ours-0.019
Token smiled
Token ID13541
Feature activation-5.67e-01

Pos logit contributions
bench+0.056
 rescuing+0.055
 searching+0.054
keeping+0.054
 cheering+0.053
Neg logit contributions
ia-0.061
ara-0.061
outed-0.058
ires-0.058
ushing-0.057
Token all
Token ID477
Feature activation-3.49e+00

Pos logit contributions
 cheering+0.011
 searching+0.011
 rescuing+0.011
 improving+0.010
hered+0.010
Neg logit contributions
ara-0.015
ia-0.015
ires-0.015
ours-0.015
outed-0.015
Token day
Token ID1110
Feature activation-5.31e+00

Pos logit contributions
 searching+0.066
 rescuing+0.065
ner+0.064
keeping+0.063
 improving+0.062
Neg logit contributions
ia-0.096
ara-0.093
ires-0.091
outed-0.090
ours-0.086
Token long
Token ID890
Feature activation-2.67e+00

Pos logit contributions
rob+0.135
keeping+0.125
 rescuing+0.123
 Amen+0.121
Charl+0.121
Neg logit contributions
ia-0.110
ushing-0.109
ires-0.103
outed-0.102
ara-0.098
Token!
Token ID0
Feature activation-8.75e-01

Pos logit contributions
 cheering+0.061
bench+0.060
 rescuing+0.060
 searching+0.059
rob+0.059
Neg logit contributions
ia-0.061
ara-0.059
ires-0.058
outed-0.055
ushing-0.055
Token<|endoftext|>
Token ID50256
Feature activation-3.30e+00

Pos logit contributions
 cheering+0.018
 rescuing+0.018
 searching+0.018
keeping+0.017
 improving+0.017
Neg logit contributions
ara-0.023
ia-0.022
ires-0.022
outed-0.021
ours-0.021
TokenTom
Token ID13787
Feature activation+7.52e-01

Pos logit contributions
 rescuing+0.080
 cheering+0.079
 improving+0.077
 searching+0.076
bench+0.075
Neg logit contributions
ara-0.079
ia-0.073
ires-0.069
uddle-0.065
ours-0.065
Tokenmy
Token ID1820
Feature activation-2.59e-01

Pos logit contributions
ires+0.025
ia+0.024
outed+0.023
ushing+0.023
ours+0.023
Neg logit contributions
 searching-0.010
 cheering-0.010
 rescuing-0.010
keeping-0.009
 exploring-0.009
Token about
Token ID546
Feature activation-1.32e+00

Pos logit contributions
 cheering+0.086
 rescuing+0.086
 searching+0.085
keeping+0.084
bench+0.083
Neg logit contributions
ia-0.073
ara-0.072
ires-0.072
outed-0.068
ours-0.068
Token the
Token ID262
Feature activation-2.62e+00

Pos logit contributions
hered+0.031
ner+0.031
bench+0.030
rob+0.030
keeping+0.029
Neg logit contributions
ires-0.028
ia-0.027
ara-0.026
ours-0.026
outed-0.024
Token day
Token ID1110
Feature activation-3.44e+00

Pos logit contributions
keeping+0.061
ner+0.060
Charl+0.060
bench+0.059
 cheering+0.059
Neg logit contributions
ia-0.060
ires-0.057
uddle-0.056
ara-0.056
ushing-0.052
Tokenâ
Token ID22940
Feature activation-4.91e-01

Pos logit contributions
 cheering+0.072
 rescuing+0.071
 searching+0.071
keeping+0.069
bench+0.067
Neg logit contributions
ia-0.086
ires-0.086
ara-0.086
outed-0.083
ours-0.081
Token
Token ID26391
Feature activation-2.60e+00

Pos logit contributions
 cheering+0.010
 searching+0.009
 rescuing+0.009
keeping+0.009
 improving+0.009
Neg logit contributions
ara-0.013
ia-0.013
ires-0.013
ushing-0.012
outed-0.012
Token
Token ID8151
Feature activation-5.26e+00

Pos logit contributions
 cheering+0.052
 rescuing+0.051
 searching+0.051
keeping+0.049
bench+0.048
Neg logit contributions
ia-0.068
ara-0.068
ires-0.067
outed-0.065
ushing-0.064
Tokens
Token ID82
Feature activation-1.95e-01

Pos logit contributions
 cheering+0.113
 rescuing+0.112
 searching+0.112
keeping+0.109
 improving+0.108
Neg logit contributions
ia-0.129
ara-0.128
ires-0.126
outed-0.122
ushing-0.120
Token adventure
Token ID8855
Feature activation-2.06e+00

Pos logit contributions
rob+0.004
ner+0.004
Charl+0.004
bench+0.004
keeping+0.004
Neg logit contributions
ia-0.005
ires-0.005
ushing-0.005
ara-0.005
outed-0.005
Token.
Token ID13
Feature activation-7.51e-01

Pos logit contributions
 cheering+0.045
 rescuing+0.045
 searching+0.044
keeping+0.043
Charl+0.043
Neg logit contributions
ia-0.049
ara-0.048
ires-0.047
ushing-0.045
outed-0.044
Token<|endoftext|>
Token ID50256
Feature activation-3.56e+00

Pos logit contributions
 cheering+0.013
 rescuing+0.013
 searching+0.013
keeping+0.013
bench+0.013
Neg logit contributions
ara-0.021
ia-0.021
ires-0.021
outed-0.020
ours-0.020
TokenOnce
Token ID7454
Feature activation+3.17e+00

Pos logit contributions
 rescuing+0.081
 cheering+0.080
 improving+0.078
keeping+0.077
 searching+0.077
Neg logit contributions
ara-0.089
ia-0.084
ires-0.080
ours-0.077
uddle-0.076
Token
Token ID26391
Feature activation-1.21e+00

Pos logit contributions
ara+0.018
ia+0.018
ires+0.017
outed+0.017
ushing+0.017
Neg logit contributions
 cheering-0.011
 searching-0.011
 rescuing-0.011
keeping-0.011
 improving-0.010
Token said
Token ID531
Feature activation-1.08e+00

Pos logit contributions
 cheering+0.025
 rescuing+0.025
 searching+0.025
keeping+0.024
bench+0.024
Neg logit contributions
ara-0.030
ia-0.030
ires-0.030
outed-0.029
ushing-0.029
Token Ali
Token ID12104
Feature activation-2.39e+00

Pos logit contributions
 cheering+0.019
 rescuing+0.019
 searching+0.018
 exploring+0.017
ner+0.017
Neg logit contributions
ires-0.033
outed-0.032
ara-0.031
ia-0.031
ours-0.030
Tokenâ
Token ID22940
Feature activation-6.15e-01

Pos logit contributions
 cheering+0.046
 rescuing+0.045
 searching+0.045
keeping+0.044
 improving+0.043
Neg logit contributions
ires-0.064
ara-0.064
ia-0.063
outed-0.062
ours-0.061
Token
Token ID26391
Feature activation-3.20e+00

Pos logit contributions
 cheering+0.012
 searching+0.012
 rescuing+0.012
keeping+0.011
 improving+0.011
Neg logit contributions
ara-0.017
ia-0.017
ires-0.016
outed-0.016
ushing-0.016
Token
Token ID8151
Feature activation-5.23e+00

Pos logit contributions
 cheering+0.063
 rescuing+0.062
 searching+0.062
keeping+0.060
bench+0.059
Neg logit contributions
ires-0.084
ia-0.084
ara-0.083
outed-0.081
ours-0.080
Tokens
Token ID82
Feature activation+1.54e-02

Pos logit contributions
 searching+0.099
 cheering+0.099
 rescuing+0.098
 improving+0.095
keeping+0.093
Neg logit contributions
ia-0.145
ara-0.143
ires-0.138
outed-0.137
ushing-0.134
Token mom
Token ID1995
Feature activation-2.57e+00

Pos logit contributions
ara+0.000
ires+0.000
outed+0.000
ia+0.000
ours+0.000
Neg logit contributions
 rescuing-0.000
 searching-0.000
 cheering-0.000
keeping-0.000
 exploring-0.000
Token.
Token ID13
Feature activation-1.14e+00

Pos logit contributions
 cheering+0.049
 rescuing+0.049
 searching+0.048
keeping+0.047
bench+0.046
Neg logit contributions
ia-0.069
ara-0.068
ires-0.068
outed-0.066
ushing-0.065
Token
Token ID220
Feature activation-1.48e+00

Pos logit contributions
 cheering+0.021
 searching+0.020
 rescuing+0.019
keeping+0.019
 exploring+0.018
Neg logit contributions
ires-0.033
ushing-0.031
outed-0.031
ara-0.031
ia-0.031
Token
Token ID198
Feature activation-1.39e+00

Pos logit contributions
 cheering+0.024
 rescuing+0.024
 searching+0.023
keeping+0.023
bench+0.022
Neg logit contributions
ia-0.043
ires-0.043
ara-0.043
outed-0.042
ushing-0.042
Token realised
Token ID19169
Feature activation+1.61e+00

Pos logit contributions
ires+0.009
ara+0.008
ours+0.008
ia+0.008
outed+0.008
Neg logit contributions
 searching-0.007
 cheering-0.007
 rescuing-0.006
 improving-0.006
 exploring-0.006
Token he
Token ID339
Feature activation-8.13e-01

Pos logit contributions
ara+0.041
ia+0.040
ires+0.040
outed+0.040
ushing+0.039
Neg logit contributions
 searching-0.035
 rescuing-0.034
 cheering-0.034
 exploring-0.033
 improving-0.032
Token wasn
Token ID2492
Feature activation-1.08e+00

Pos logit contributions
 cheering+0.018
 rescuing+0.018
 searching+0.017
keeping+0.017
bench+0.017
Neg logit contributions
ia-0.020
ara-0.020
ires-0.019
outed-0.019
ushing-0.019
Tokenâ
Token ID22940
Feature activation-1.03e-01

Pos logit contributions
keeping+0.015
 searching+0.015
rob+0.014
 rescuing+0.014
 cheering+0.014
Neg logit contributions
ia-0.035
ara-0.034
ires-0.034
outed-0.033
ushing-0.032
Token
Token ID26391
Feature activation-2.41e+00

Pos logit contributions
 cheering+0.002
 searching+0.002
 rescuing+0.002
keeping+0.002
 improving+0.002
Neg logit contributions
ara-0.003
ia-0.003
ires-0.003
outed-0.003
ushing-0.003
Token
Token ID8151
Feature activation-5.23e+00

Pos logit contributions
 cheering+0.049
 rescuing+0.049
 searching+0.048
keeping+0.047
bench+0.046
Neg logit contributions
ia-0.061
ara-0.061
ires-0.060
outed-0.059
ushing-0.059
Tokent
Token ID83
Feature activation-1.45e+00

Pos logit contributions
keeping+0.110
rob+0.108
 cheering+0.107
 searching+0.107
 improving+0.104
Neg logit contributions
ia-0.131
ara-0.130
outed-0.128
ushing-0.128
uddle-0.124
Token scared
Token ID12008
Feature activation-3.04e-01

Pos logit contributions
rob+0.030
bench+0.028
 cheering+0.028
keeping+0.027
ner+0.027
Neg logit contributions
ia-0.036
outed-0.035
ushing-0.035
ires-0.035
ara-0.034
Token to
Token ID284
Feature activation-1.02e+00

Pos logit contributions
 cheering+0.007
 rescuing+0.007
 searching+0.007
keeping+0.006
 improving+0.006
Neg logit contributions
ia-0.007
ara-0.007
ires-0.007
outed-0.007
ours-0.007
Token go
Token ID467
Feature activation-2.02e+00

Pos logit contributions
 rescuing+0.021
 cheering+0.021
 searching+0.021
 improving+0.020
keeping+0.020
Neg logit contributions
ires-0.027
outed-0.027
ara-0.027
ours-0.027
ia-0.026
Token to
Token ID284
Feature activation+5.09e-01

Pos logit contributions
 cheering+0.034
 searching+0.034
 rescuing+0.033
keeping+0.032
 improving+0.031
Neg logit contributions
ia-0.059
ires-0.059
ara-0.059
outed-0.058
ours-0.057
Token knew
Token ID2993
Feature activation+1.79e+00

Pos logit contributions
ires+0.036
ara+0.034
ours+0.034
ia+0.032
ushing+0.032
Neg logit contributions
 searching-0.027
 cheering-0.025
 rescuing-0.025
 exploring-0.024
 chasing-0.023
Token she
Token ID673
Feature activation-1.36e+00

Pos logit contributions
ara+0.048
ires+0.048
outed+0.047
ia+0.046
ours+0.046
Neg logit contributions
 rescuing-0.036
 searching-0.036
 cheering-0.035
 exploring-0.034
 improving-0.033
Token wasn
Token ID2492
Feature activation-9.78e-01

Pos logit contributions
 searching+0.023
 rescuing+0.022
 cheering+0.020
keeping+0.020
 exploring+0.020
Neg logit contributions
ires-0.042
ia-0.040
ara-0.040
ours-0.040
ushing-0.039
Tokenâ
Token ID22940
Feature activation-5.53e-02

Pos logit contributions
keeping+0.016
 searching+0.015
bench+0.015
rob+0.014
 cheering+0.014
Neg logit contributions
ia-0.030
ara-0.030
ires-0.029
ushing-0.027
outed-0.027
Token
Token ID26391
Feature activation-2.40e+00

Pos logit contributions
 cheering+0.001
 searching+0.001
 rescuing+0.001
 improving+0.001
keeping+0.001
Neg logit contributions
ara-0.002
ia-0.002
ires-0.001
outed-0.001
ushing-0.001
Token
Token ID8151
Feature activation-5.22e+00

Pos logit contributions
 cheering+0.049
 rescuing+0.048
 searching+0.047
keeping+0.046
bench+0.046
Neg logit contributions
ia-0.062
ara-0.062
ires-0.061
outed-0.059
ushing-0.059
Tokent
Token ID83
Feature activation-1.30e+00

Pos logit contributions
 cheering+0.104
 searching+0.103
keeping+0.102
rob+0.101
 rescuing+0.101
Neg logit contributions
ara-0.137
ia-0.136
outed-0.132
ushing-0.130
ires-0.128
Token a
Token ID257
Feature activation-3.66e-01

Pos logit contributions
rob+0.030
bench+0.027
ner+0.027
Charl+0.027
 cheering+0.026
Neg logit contributions
ia-0.032
ushing-0.029
outed-0.029
ara-0.029
ust-0.028
Token threat
Token ID2372
Feature activation-8.92e-02

Pos logit contributions
keeping+0.009
rob+0.009
bench+0.009
 cheering+0.009
 rescuing+0.009
Neg logit contributions
ara-0.008
ia-0.008
uddle-0.007
ires-0.007
ushing-0.007
Token and
Token ID290
Feature activation+1.67e-01

Pos logit contributions
 cheering+0.002
 rescuing+0.002
rob+0.002
 searching+0.002
keeping+0.002
Neg logit contributions
ia-0.002
ara-0.002
ires-0.002
ushing-0.002
outed-0.002
Token it
Token ID340
Feature activation-1.09e+00

Pos logit contributions
ires+0.005
ara+0.005
ia+0.005
uddle+0.005
ushing+0.005
Neg logit contributions
 searching-0.003
 rescuing-0.003
 exploring-0.003
 cheering-0.003
 helping-0.003
Token They
Token ID1119
Feature activation-1.26e+00

Pos logit contributions
 cheering+0.002
 rescuing+0.002
 searching+0.002
keeping+0.002
bench+0.002
Neg logit contributions
ia-0.002
ires-0.002
ara-0.002
outed-0.002
ours-0.002
Token laughed
Token ID13818
Feature activation-8.82e-01

Pos logit contributions
 rescuing+0.027
keeping+0.026
 searching+0.026
 cheering+0.025
bench+0.025
Neg logit contributions
ia-0.031
ara-0.030
outed-0.030
ires-0.029
ours-0.029
Token and
Token ID290
Feature activation-2.30e+00

Pos logit contributions
 cheering+0.018
 rescuing+0.018
 searching+0.018
keeping+0.017
bench+0.017
Neg logit contributions
ia-0.023
ires-0.023
ara-0.023
outed-0.022
ushing-0.022
Token smiled
Token ID13541
Feature activation-3.34e-01

Pos logit contributions
 rescuing+0.051
bench+0.050
keeping+0.049
Charl+0.049
 searching+0.049
Neg logit contributions
ia-0.052
ara-0.051
outed-0.049
ours-0.046
ires-0.046
Token all
Token ID477
Feature activation-3.43e+00

Pos logit contributions
 cheering+0.006
 searching+0.006
 rescuing+0.006
 improving+0.006
 exploring+0.006
Neg logit contributions
ara-0.009
ia-0.009
ires-0.009
ours-0.009
outed-0.009
Token day
Token ID1110
Feature activation-5.22e+00

Pos logit contributions
 rescuing+0.064
 searching+0.063
keeping+0.063
ner+0.061
hered+0.060
Neg logit contributions
ia-0.096
ara-0.092
ires-0.090
outed-0.089
ours-0.086
Token.
Token ID13
Feature activation-4.62e-01

Pos logit contributions
rob+0.132
keeping+0.128
 Amen+0.126
 rescuing+0.123
Charl+0.122
Neg logit contributions
ia-0.114
ushing-0.106
ires-0.102
outed-0.098
ara-0.096
Token
Token ID198
Feature activation-2.26e-01

Pos logit contributions
 cheering+0.009
 searching+0.009
 rescuing+0.009
keeping+0.009
bench+0.009
Neg logit contributions
ia-0.012
ires-0.012
ara-0.012
outed-0.012
ours-0.012
Token
Token ID198
Feature activation-6.49e-01

Pos logit contributions
 cheering+0.004
 rescuing+0.004
 searching+0.004
keeping+0.004
bench+0.004
Neg logit contributions
ia-0.006
ara-0.006
ires-0.006
outed-0.006
ours-0.006
TokenAt
Token ID2953
Feature activation-3.07e-01

Pos logit contributions
 cheering+0.015
 rescuing+0.015
 searching+0.014
keeping+0.014
 improving+0.014
Neg logit contributions
ires-0.015
ara-0.015
ia-0.015
outed-0.014
ours-0.014
Token the
Token ID262
Feature activation+6.65e-01

Pos logit contributions
 rescuing+0.006
keeping+0.006
 cheering+0.006
bench+0.006
Charl+0.006
Neg logit contributions
ia-0.008
ara-0.008
ires-0.008
ours-0.008
outed-0.008
Token played
Token ID2826
Feature activation-6.31e-01

Pos logit contributions
 cheering+0.027
 searching+0.026
 rescuing+0.025
 exploring+0.024
 improving+0.024
Neg logit contributions
ires-0.035
ia-0.034
ara-0.034
ours-0.033
ushing-0.033
Token games
Token ID1830
Feature activation-2.00e-01

Pos logit contributions
 rescuing+0.012
keeping+0.012
 cheering+0.011
 searching+0.011
bench+0.011
Neg logit contributions
ia-0.017
ires-0.017
ara-0.017
uddle-0.016
ushing-0.016
Token and
Token ID290
Feature activation-8.00e-01

Pos logit contributions
keeping+0.005
 rescuing+0.005
 Amen+0.005
Charl+0.005
rob+0.005
Neg logit contributions
ia-0.004
ara-0.004
uddle-0.004
ires-0.004
ushing-0.004
Token laughed
Token ID13818
Feature activation-2.19e-01

Pos logit contributions
bench+0.015
 rescuing+0.015
Charl+0.015
keeping+0.015
 cheering+0.015
Neg logit contributions
ia-0.021
ara-0.020
ires-0.019
outed-0.019
ours-0.018
Token all
Token ID477
Feature activation-2.60e+00

Pos logit contributions
 cheering+0.004
 rescuing+0.004
 searching+0.004
keeping+0.004
bench+0.004
Neg logit contributions
ia-0.006
ires-0.006
ara-0.006
outed-0.006
ushing-0.005
Token day
Token ID1110
Feature activation-5.20e+00

Pos logit contributions
 rescuing+0.040
 searching+0.040
keeping+0.038
 cheering+0.038
 improving+0.037
Neg logit contributions
ia-0.080
ara-0.079
ires-0.078
outed-0.076
ours-0.074
Token.
Token ID13
Feature activation-6.13e-01

Pos logit contributions
 rescuing+0.129
rob+0.128
 cheering+0.121
keeping+0.119
 improving+0.117
Neg logit contributions
ia-0.110
ires-0.104
ara-0.103
ushing-0.102
outed-0.100
Token When
Token ID1649
Feature activation-1.10e+00

Pos logit contributions
 rescuing+0.015
 cheering+0.015
 searching+0.015
keeping+0.014
rob+0.014
Neg logit contributions
ara-0.014
ia-0.014
ires-0.014
outed-0.013
ours-0.013
Token it
Token ID340
Feature activation-9.42e-01

Pos logit contributions
 rescuing+0.028
 cheering+0.028
keeping+0.027
 searching+0.027
bench+0.027
Neg logit contributions
ara-0.023
ia-0.023
ires-0.022
ours-0.021
outed-0.021
Token was
Token ID373
Feature activation-1.28e+00

Pos logit contributions
 cheering+0.017
 searching+0.017
hered+0.016
ner+0.015
 improving+0.015
Neg logit contributions
ires-0.029
ours-0.027
ushing-0.026
outed-0.026
ara-0.026
Token time
Token ID640
Feature activation-2.97e+00

Pos logit contributions
 cheering+0.022
 rescuing+0.022
 searching+0.021
keeping+0.021
bench+0.020
Neg logit contributions
ia-0.037
ara-0.037
ires-0.037
outed-0.036
ours-0.035
Tokenuddled
Token ID32745
Feature activation+3.44e-01

Pos logit contributions
ia+0.058
ara+0.055
ours+0.054
ires+0.052
ry+0.051
Neg logit contributions
hered-0.081
hed-0.080
 rescuing-0.079
Ell-0.079
g-0.078
Token the
Token ID262
Feature activation-2.14e+00

Pos logit contributions
ires+0.007
ara+0.007
ia+0.006
outed+0.006
erve+0.006
Neg logit contributions
bench-0.009
ner-0.008
Charl-0.008
 cheering-0.008
rob-0.008
Token cat
Token ID3797
Feature activation-1.61e+00

Pos logit contributions
rob+0.050
keeping+0.050
 cheering+0.049
bench+0.047
 Amen+0.046
Neg logit contributions
ara-0.050
ia-0.048
uddle-0.048
ires-0.044
outed-0.043
Token and
Token ID290
Feature activation-5.65e-01

Pos logit contributions
 cheering+0.037
rob+0.037
bench+0.036
keeping+0.035
 rescuing+0.035
Neg logit contributions
ia-0.037
ara-0.035
ushing-0.034
ires-0.034
uddle-0.034
Token said
Token ID531
Feature activation-2.37e+00

Pos logit contributions
 cheering+0.012
 rescuing+0.012
 searching+0.012
keeping+0.012
 improving+0.012
Neg logit contributions
ires-0.014
ia-0.013
ara-0.013
outed-0.013
ours-0.013
Token ''
Token ID10148
Feature activation-5.20e+00

Pos logit contributions
 cheering+0.048
 searching+0.047
keeping+0.047
 rescuing+0.047
bench+0.046
Neg logit contributions
ia-0.061
ires-0.060
ara-0.060
outed-0.057
ours-0.057
TokenYou
Token ID1639
Feature activation-4.91e-01

Pos logit contributions
 searching+0.114
bench+0.108
 cheering+0.105
 rescuing+0.105
 exploring+0.103
Neg logit contributions
ara-0.134
outed-0.111
ires-0.110
ushing-0.108
ours-0.106
Token are
Token ID389
Feature activation-1.00e+00

Pos logit contributions
 cheering+0.009
 rescuing+0.009
 searching+0.009
keeping+0.008
 improving+0.008
Neg logit contributions
ires-0.014
ia-0.014
ara-0.014
outed-0.013
ours-0.013
Token safe
Token ID3338
Feature activation-4.03e-01

Pos logit contributions
rob+0.023
keeping+0.022
bench+0.022
hered+0.020
 cheering+0.020
Neg logit contributions
ia-0.025
ires-0.023
ara-0.022
ushing-0.021
outed-0.021
Token now
Token ID783
Feature activation-2.01e+00

Pos logit contributions
rob+0.008
 cheering+0.008
bench+0.008
 searching+0.008
 rescuing+0.008
Neg logit contributions
ara-0.010
ia-0.010
ires-0.010
ushing-0.010
outed-0.009
Token!
Token ID0
Feature activation-1.63e-01

Pos logit contributions
rob+0.039
 cheering+0.039
 searching+0.038
 rescuing+0.037
keeping+0.037
Neg logit contributions
ia-0.052
ara-0.052
ires-0.051
ushing-0.050
outed-0.048
Token them
Token ID606
Feature activation-1.88e-01

Pos logit contributions
 cheering+0.060
 rescuing+0.059
 searching+0.059
keeping+0.059
bench+0.058
Neg logit contributions
ia-0.065
ires-0.064
ara-0.064
outed-0.061
ours-0.060
Token talked
Token ID6619
Feature activation+3.82e-01

Pos logit contributions
 searching+0.004
 cheering+0.004
 rescuing+0.004
keeping+0.004
hered+0.004
Neg logit contributions
ires-0.005
ara-0.005
ia-0.005
ours-0.005
outed-0.005
Token and
Token ID290
Feature activation-1.84e+00

Pos logit contributions
ia+0.009
ires+0.009
ara+0.008
outed+0.008
ushing+0.008
Neg logit contributions
rob-0.009
keeping-0.008
ner-0.008
 cheering-0.008
 searching-0.008
Token laughed
Token ID13818
Feature activation+9.22e-02

Pos logit contributions
bench+0.042
ner+0.041
rob+0.041
Charl+0.041
 cheering+0.039
Neg logit contributions
ia-0.040
ara-0.038
outed-0.037
ires-0.037
ushing-0.036
Token all
Token ID477
Feature activation-2.79e+00

Pos logit contributions
ia+0.002
ires+0.002
ara+0.002
outed+0.002
ushing+0.002
Neg logit contributions
 cheering-0.002
 rescuing-0.002
 searching-0.002
keeping-0.002
bench-0.002
Token day
Token ID1110
Feature activation-5.18e+00

Pos logit contributions
 searching+0.045
 rescuing+0.045
 cheering+0.044
keeping+0.044
ner+0.044
Neg logit contributions
ia-0.084
ires-0.081
ara-0.081
outed-0.080
ours-0.077
Token.
Token ID13
Feature activation-6.81e-01

Pos logit contributions
rob+0.131
Charl+0.124
 rescuing+0.123
 Amen+0.123
 cheering+0.122
Neg logit contributions
ia-0.105
ushing-0.097
outed-0.095
ires-0.095
uddle-0.090
Token
Token ID198
Feature activation-5.29e-01

Pos logit contributions
 cheering+0.015
 rescuing+0.015
keeping+0.015
 searching+0.015
rob+0.015
Neg logit contributions
ires-0.016
ara-0.015
outed-0.015
ia-0.014
ushing-0.014
Token
Token ID198
Feature activation-1.03e+00

Pos logit contributions
keeping+0.013
 rescuing+0.012
 searching+0.012
 cheering+0.012
ner+0.012
Neg logit contributions
ia-0.012
ara-0.011
ires-0.011
outed-0.010
uddle-0.010
TokenBilly
Token ID49640
Feature activation-2.43e-01

Pos logit contributions
keeping+0.023
 searching+0.022
 cheering+0.022
 rescuing+0.022
ner+0.021
Neg logit contributions
ara-0.024
ires-0.024
ia-0.024
ushing-0.023
outed-0.022
Token made
Token ID925
Feature activation+2.05e-01

Pos logit contributions
 rescuing+0.005
 cheering+0.005
 searching+0.005
keeping+0.005
bench+0.005
Neg logit contributions
ia-0.006
ara-0.006
ires-0.005
outed-0.005
ushing-0.005
Token praised
Token ID15342
Feature activation+4.19e-01

Pos logit contributions
ires+0.011
ia+0.011
ara+0.011
outed+0.011
ours+0.011
Neg logit contributions
 cheering-0.010
 searching-0.010
 rescuing-0.009
keeping-0.009
 improving-0.009
Token the
Token ID262
Feature activation-2.95e+00

Pos logit contributions
ia+0.011
ires+0.010
ara+0.010
outed+0.010
ushing+0.010
Neg logit contributions
bench-0.009
 cheering-0.009
 searching-0.009
 rescuing-0.008
keeping-0.008
Token rabbit
Token ID22746
Feature activation-3.00e+00

Pos logit contributions
keeping+0.071
bench+0.069
 cheering+0.068
rob+0.068
 rescuing+0.068
Neg logit contributions
ia-0.067
ara-0.064
ires-0.062
uddle-0.061
ushing-0.059
Tokenâ
Token ID22940
Feature activation-5.08e-01

Pos logit contributions
rob+0.070
keeping+0.069
bench+0.069
 cheering+0.068
ner+0.066
Neg logit contributions
ia-0.070
ara-0.065
ires-0.065
ushing-0.063
ours-0.060
Token
Token ID26391
Feature activation-3.19e+00

Pos logit contributions
 cheering+0.011
 searching+0.010
 rescuing+0.010
keeping+0.010
 improving+0.010
Neg logit contributions
ara-0.013
ia-0.013
ires-0.013
ushing-0.012
outed-0.012
Token
Token ID8151
Feature activation-5.14e+00

Pos logit contributions
 rescuing+0.073
 cheering+0.072
 searching+0.071
bench+0.071
 improving+0.070
Neg logit contributions
ia-0.074
ara-0.074
ires-0.070
ushing-0.069
uddle-0.069
Tokens
Token ID82
Feature activation-3.90e-01

Pos logit contributions
 searching+0.119
 cheering+0.118
 rescuing+0.118
keeping+0.117
 improving+0.116
Neg logit contributions
ia-0.119
ara-0.116
ires-0.113
outed-0.111
ushing-0.109
Token determination
Token ID12123
Feature activation-3.30e+00

Pos logit contributions
rob+0.008
keeping+0.007
ner+0.007
bench+0.007
 searching+0.007
Neg logit contributions
ia-0.011
ires-0.010
ushing-0.010
ara-0.010
outed-0.010
Token and
Token ID290
Feature activation-7.94e-01

Pos logit contributions
rob+0.078
 rescuing+0.076
keeping+0.075
 cheering+0.075
Charl+0.074
Neg logit contributions
ia-0.074
ara-0.069
ires-0.069
ushing-0.068
outed-0.066
Token gave
Token ID2921
Feature activation+1.30e+00

Pos logit contributions
 cheering+0.017
 rescuing+0.016
 searching+0.016
keeping+0.016
 exploring+0.015
Neg logit contributions
ires-0.021
ara-0.021
ia-0.021
outed-0.020
ours-0.020
Token it
Token ID340
Feature activation-2.06e+00

Pos logit contributions
ires+0.036
ia+0.036
ara+0.036
outed+0.035
ushing+0.034
Neg logit contributions
 cheering-0.024
 rescuing-0.024
 searching-0.024
keeping-0.023
 improving-0.022
Token
Token ID198
Feature activation-5.44e-01

Pos logit contributions
 rescuing+0.004
 cheering+0.004
 searching+0.004
keeping+0.004
rob+0.004
Neg logit contributions
ia-0.006
ara-0.006
ires-0.006
outed-0.006
ours-0.006
TokenBut
Token ID1537
Feature activation+2.68e-01

Pos logit contributions
 cheering+0.008
 searching+0.008
 rescuing+0.008
keeping+0.007
George+0.007
Neg logit contributions
ires-0.017
ours-0.017
ia-0.017
uddle-0.017
ara-0.017
Token his
Token ID465
Feature activation+3.78e+00

Pos logit contributions
uddle+0.008
ia+0.008
ires+0.008
ara+0.008
ushing+0.008
Neg logit contributions
 searching-0.005
 exploring-0.005
 rescuing-0.005
 cheering-0.005
 fixing-0.004
Token mom
Token ID1995
Feature activation-9.46e-01

Pos logit contributions
ires+0.062
ara+0.059
ours+0.058
ushing+0.057
ia+0.055
Neg logit contributions
 searching-0.115
 cheering-0.114
 exploring-0.109
 rescuing-0.107
 chasing-0.106
Token said
Token ID531
Feature activation-3.06e+00

Pos logit contributions
 cheering+0.020
 searching+0.020
 rescuing+0.018
 exploring+0.017
 improving+0.017
Neg logit contributions
ires-0.026
ours-0.024
ia-0.024
ara-0.023
uddle-0.023
Token ''
Token ID10148
Feature activation-5.12e+00

Pos logit contributions
 cheering+0.054
 searching+0.054
 rescuing+0.054
 improving+0.050
keeping+0.050
Neg logit contributions
ires-0.088
ara-0.087
ia-0.086
outed-0.085
ushing-0.083
TokenNo
Token ID2949
Feature activation-1.62e+00

Pos logit contributions
 searching+0.121
 rescuing+0.118
 cheering+0.114
 improving+0.113
 exploring+0.113
Neg logit contributions
ara-0.122
outed-0.108
ires-0.104
ours-0.102
ushing-0.100
Token!
Token ID0
Feature activation-1.09e+00

Pos logit contributions
 rescuing+0.032
keeping+0.032
 searching+0.032
 cheering+0.032
bench+0.031
Neg logit contributions
ara-0.043
ia-0.043
ires-0.041
outed-0.040
ushing-0.039
Token''
Token ID7061
Feature activation-1.44e+00

Pos logit contributions
keeping+0.022
 cheering+0.021
 rescuing+0.021
 Smith+0.021
ner+0.021
Neg logit contributions
ia-0.029
ires-0.028
uddle-0.027
outed-0.027
ara-0.026
Token
Token ID198
Feature activation-8.34e-01

Pos logit contributions
bench+0.030
rob+0.030
 rescuing+0.029
keeping+0.029
 cheering+0.029
Neg logit contributions
ara-0.035
ires-0.033
ia-0.032
outed-0.032
ushing-0.031
Token
Token ID198
Feature activation-1.05e+00

Pos logit contributions
 rescuing+0.013
 searching+0.013
 cheering+0.012
rob+0.012
keeping+0.012
Neg logit contributions
ara-0.026
ia-0.026
ires-0.025
outed-0.025
ushing-0.024
Token thin
Token ID7888
Feature activation-2.05e+00

Pos logit contributions
rob+0.061
 cheering+0.061
 rescuing+0.061
keeping+0.060
hered+0.059
Neg logit contributions
ia-0.063
ires-0.061
ara-0.059
outed-0.056
ushing-0.056
Token stone
Token ID7815
Feature activation-1.74e+00

Pos logit contributions
 cheering+0.047
 rescuing+0.046
 searching+0.046
keeping+0.045
bench+0.044
Neg logit contributions
ia-0.048
ires-0.047
ara-0.047
outed-0.045
ours-0.045
Token wasn
Token ID2492
Feature activation-9.12e-01

Pos logit contributions
 cheering+0.038
 rescuing+0.038
 searching+0.038
keeping+0.036
bench+0.036
Neg logit contributions
ires-0.042
ia-0.042
ara-0.042
outed-0.040
ours-0.039
Tokenâ
Token ID22940
Feature activation-6.90e-01

Pos logit contributions
keeping+0.012
rob+0.011
 cheering+0.011
 searching+0.011
 rescuing+0.011
Neg logit contributions
ia-0.031
ires-0.030
ara-0.030
outed-0.029
ours-0.029
Token
Token ID26391
Feature activation-2.39e+00

Pos logit contributions
 cheering+0.013
 searching+0.013
 rescuing+0.013
keeping+0.012
 improving+0.012
Neg logit contributions
ara-0.019
ia-0.019
ires-0.019
outed-0.018
ushing-0.018
Token
Token ID8151
Feature activation-5.10e+00

Pos logit contributions
 cheering+0.048
 rescuing+0.047
 searching+0.047
keeping+0.045
bench+0.045
Neg logit contributions
ia-0.062
ara-0.062
ires-0.061
outed-0.059
ushing-0.059
Tokent
Token ID83
Feature activation-1.45e+00

Pos logit contributions
 searching+0.099
 cheering+0.098
 rescuing+0.097
keeping+0.095
rob+0.095
Neg logit contributions
ara-0.137
ia-0.136
ushing-0.132
outed-0.131
ires-0.130
Token good
Token ID922
Feature activation-1.28e+00

Pos logit contributions
rob+0.031
Charl+0.029
keeping+0.028
bench+0.027
 cheering+0.027
Neg logit contributions
ia-0.036
ushing-0.035
ires-0.034
outed-0.033
ara-0.033
Token enough
Token ID1576
Feature activation-1.37e+00

Pos logit contributions
 cheering+0.026
 searching+0.025
 rescuing+0.025
keeping+0.024
 improving+0.024
Neg logit contributions
ara-0.033
ia-0.033
ires-0.033
outed-0.032
ours-0.032
Token to
Token ID284
Feature activation-1.45e+00

Pos logit contributions
rob+0.029
 cheering+0.028
Charl+0.028
 rescuing+0.027
ner+0.027
Neg logit contributions
ires-0.035
ia-0.035
ara-0.033
ushing-0.033
outed-0.032
Token join
Token ID4654
Feature activation+1.11e-01

Pos logit contributions
 cheering+0.029
 rescuing+0.029
 searching+0.029
 improving+0.028
keeping+0.027
Neg logit contributions
ara-0.039
ires-0.038
outed-0.038
ours-0.038
ia-0.037
Token told
Token ID1297
Feature activation+4.64e-01

Pos logit contributions
ires+0.015
ia+0.014
ara+0.014
ours+0.014
ushing+0.014
Neg logit contributions
 searching-0.011
 cheering-0.010
 rescuing-0.010
bench-0.010
 improving-0.010
Token M
Token ID337
Feature activation+4.17e-01

Pos logit contributions
uddle+0.014
ires+0.013
outed+0.013
ushing+0.013
ara+0.013
Neg logit contributions
 searching-0.008
 cheering-0.008
 rescuing-0.008
 exploring-0.008
 helping-0.007
Tokenina
Token ID1437
Feature activation-1.73e+00

Pos logit contributions
ires+0.009
ours+0.009
outed+0.008
ia+0.008
erve+0.008
Neg logit contributions
 searching-0.009
 cheering-0.009
keeping-0.009
 exploring-0.009
 improving-0.009
Tokenâ
Token ID22940
Feature activation-5.63e-01

Pos logit contributions
 searching+0.036
 cheering+0.036
 rescuing+0.036
keeping+0.034
 improving+0.034
Neg logit contributions
ara-0.044
ires-0.044
ia-0.043
outed-0.042
ours-0.041
Token
Token ID26391
Feature activation-3.19e+00

Pos logit contributions
 cheering+0.011
 searching+0.011
 rescuing+0.011
 improving+0.010
keeping+0.010
Neg logit contributions
ara-0.015
ia-0.015
ires-0.015
ushing-0.014
outed-0.014
Token
Token ID8151
Feature activation-5.09e+00

Pos logit contributions
 cheering+0.065
 rescuing+0.064
 searching+0.063
bench+0.063
 improving+0.062
Neg logit contributions
ia-0.080
ara-0.080
ires-0.077
ushing-0.076
uddle-0.075
Tokens
Token ID82
Feature activation-2.58e-01

Pos logit contributions
 cheering+0.107
 searching+0.106
 rescuing+0.105
 improving+0.104
rob+0.104
Neg logit contributions
ia-0.131
ara-0.127
ires-0.123
outed-0.121
ushing-0.118
Token family
Token ID1641
Feature activation-2.74e+00

Pos logit contributions
 searching+0.006
 cheering+0.006
 rescuing+0.006
keeping+0.005
 exploring+0.005
Neg logit contributions
ara-0.006
ires-0.006
outed-0.006
ia-0.006
ushing-0.006
Token that
Token ID326
Feature activation-1.94e+00

Pos logit contributions
 cheering+0.071
rob+0.070
 rescuing+0.069
keeping+0.069
 searching+0.069
Neg logit contributions
ia-0.055
ires-0.054
ara-0.052
outed-0.051
ours-0.051
Token he
Token ID339
Feature activation-1.30e+00

Pos logit contributions
 cheering+0.044
 rescuing+0.043
 searching+0.043
keeping+0.043
bench+0.042
Neg logit contributions
ara-0.044
ia-0.044
ires-0.043
outed-0.041
ours-0.041
Token wanted
Token ID2227
Feature activation-2.13e+00

Pos logit contributions
 cheering+0.026
 rescuing+0.026
 searching+0.026
keeping+0.025
 improving+0.024
Neg logit contributions
ires-0.034
ia-0.034
ara-0.034
outed-0.033
ours-0.032
Token liked
Token ID8288
Feature activation-2.45e+00

Pos logit contributions
ara+0.017
ires+0.017
ours+0.017
ia+0.016
uddle+0.016
Neg logit contributions
 searching-0.014
 cheering-0.014
 rescuing-0.013
 improving-0.013
bench-0.013
Token.
Token ID13
Feature activation-8.79e-01

Pos logit contributions
rob+0.063
Charl+0.061
bench+0.061
hered+0.059
 cheering+0.058
Neg logit contributions
ia-0.049
ires-0.048
ara-0.047
ours-0.043
ushing-0.042
Token Lucy
Token ID22162
Feature activation-5.17e-01

Pos logit contributions
 cheering+0.020
 rescuing+0.020
 searching+0.020
keeping+0.020
bench+0.019
Neg logit contributions
ara-0.020
ires-0.020
ia-0.020
outed-0.019
ours-0.019
Token hugged
Token ID41130
Feature activation-2.14e-01

Pos logit contributions
 cheering+0.011
 rescuing+0.011
 searching+0.011
keeping+0.011
bench+0.011
Neg logit contributions
ia-0.012
ara-0.012
ires-0.012
outed-0.012
ours-0.012
Token her
Token ID607
Feature activation+2.48e-01

Pos logit contributions
 rescuing+0.005
rob+0.005
bench+0.005
keeping+0.005
 searching+0.005
Neg logit contributions
ara-0.005
ires-0.005
ia-0.005
outed-0.004
ushing-0.004
Token Sister
Token ID26553
Feature activation-5.08e+00

Pos logit contributions
ara+0.004
ires+0.004
ia+0.004
outed+0.003
ushing+0.003
Neg logit contributions
 rescuing-0.008
 cheering-0.008
keeping-0.008
 searching-0.008
rob-0.008
Token and
Token ID290
Feature activation-8.83e-01

Pos logit contributions
rob+0.129
 rescuing+0.125
bench+0.125
 cheering+0.124
ner+0.122
Neg logit contributions
ara-0.105
ia-0.103
ushing-0.099
ires-0.098
outed-0.097
Token said
Token ID531
Feature activation-3.46e+00

Pos logit contributions
 cheering+0.019
 searching+0.019
 rescuing+0.018
 exploring+0.018
 improving+0.017
Neg logit contributions
ires-0.023
ia-0.023
ara-0.022
ours-0.022
uddle-0.021
Token "
Token ID366
Feature activation-3.47e+00

Pos logit contributions
 cheering+0.075
 rescuing+0.074
 searching+0.074
keeping+0.072
 improving+0.070
Neg logit contributions
ia-0.084
ara-0.084
ires-0.084
outed-0.081
ours-0.080
TokenThank
Token ID10449
Feature activation-4.39e-01

Pos logit contributions
 searching+0.072
 rescuing+0.070
 cheering+0.068
bench+0.068
keeping+0.067
Neg logit contributions
ara-0.092
ires-0.087
ia-0.085
outed-0.083
ushing-0.081
Token You
Token ID921
Feature activation-1.70e+00

Pos logit contributions
bench+0.014
 distance+0.014
 searching+0.014
 cheering+0.013
keeping+0.013
Neg logit contributions
ara-0.007
ires-0.007
ia-0.007
ushing-0.005
ours-0.005
Token my
Token ID616
Feature activation+1.81e+00

Pos logit contributions
ires+0.042
ara+0.041
ia+0.041
outed+0.039
ours+0.039
Neg logit contributions
 cheering-0.029
keeping-0.029
 rescuing-0.028
 searching-0.028
ner-0.028
Token favorite
Token ID4004
Feature activation+2.69e+00

Pos logit contributions
ires+0.035
ia+0.035
ara+0.035
outed+0.034
ushing+0.033
Neg logit contributions
 cheering-0.048
 searching-0.048
 rescuing-0.047
keeping-0.046
 improving-0.045
Token too
Token ID1165
Feature activation-9.18e-01

Pos logit contributions
ara+0.065
ia+0.065
ires+0.064
outed+0.062
ours+0.062
Neg logit contributions
 cheering-0.058
 rescuing-0.058
 searching-0.057
keeping-0.056
bench-0.056
Token."
Token ID526
Feature activation+7.84e-01

Pos logit contributions
 cheering+0.019
 rescuing+0.019
 searching+0.019
keeping+0.018
bench+0.018
Neg logit contributions
ia-0.023
ara-0.023
ires-0.023
outed-0.022
ours-0.022
Token<|endoftext|>
Token ID50256
Feature activation-2.49e+00

Pos logit contributions
ires+0.023
ia+0.023
ara+0.022
ushing+0.022
outed+0.022
Neg logit contributions
 cheering-0.013
 searching-0.013
 rescuing-0.012
keeping-0.012
bench-0.012
TokenOnce
Token ID7454
Feature activation+4.50e+00

Pos logit contributions
 rescuing+0.064
 cheering+0.064
 improving+0.063
 relying+0.062
 willing+0.061
Neg logit contributions
ara-0.057
ia-0.054
ires-0.047
uddle-0.044
outed-0.044
Token upon
Token ID2402
Feature activation+2.48e+00

Pos logit contributions
ires+0.090
ushing+0.089
ara+0.087
erve+0.086
ia+0.084
Neg logit contributions
 searching-0.112
 cheering-0.109
 exploring-0.106
ner-0.102
bench-0.102
Token a
Token ID257
Feature activation+9.28e-02

Pos logit contributions
ia+0.069
ara+0.068
ires+0.067
outed+0.064
ushing+0.062
Neg logit contributions
 cheering-0.046
 rescuing-0.045
keeping-0.045
 searching-0.044
bench-0.044
Token time
Token ID640
Feature activation-1.49e+00

Pos logit contributions
ia+0.003
ara+0.003
ires+0.003
outed+0.003
ushing+0.002
Neg logit contributions
 cheering-0.002
 rescuing-0.002
keeping-0.002
 searching-0.002
ner-0.002
Token,
Token ID11
Feature activation+1.44e+00

Pos logit contributions
keeping+0.036
bench+0.034
 cheering+0.034
 rescuing+0.033
rob+0.033
Neg logit contributions
ia-0.036
ara-0.035
ires-0.034
ours-0.032
ushing-0.032
Token there
Token ID612
Feature activation+2.98e-01

Pos logit contributions
ires+0.036
ia+0.036
ara+0.035
ushing+0.035
outed+0.034
Neg logit contributions
 searching-0.031
 cheering-0.030
 rescuing-0.030
 exploring-0.029
 improving-0.028
Token from
Token ID422
Feature activation-1.97e-01

Pos logit contributions
ara+0.031
ires+0.031
ia+0.031
outed+0.030
ours+0.030
Neg logit contributions
 cheering-0.023
 searching-0.023
 rescuing-0.022
keeping-0.022
 improving-0.021
Token their
Token ID511
Feature activation+1.19e+00

Pos logit contributions
 searching+0.004
 cheering+0.004
 rescuing+0.004
 exploring+0.003
 improving+0.003
Neg logit contributions
ara-0.006
ia-0.006
outed-0.006
ushing-0.006
ires-0.006
Token adventure
Token ID8855
Feature activation-9.17e-01

Pos logit contributions
ia+0.031
ires+0.031
ara+0.031
outed+0.030
ours+0.030
Neg logit contributions
 cheering-0.024
 rescuing-0.023
 searching-0.023
keeping-0.023
bench-0.022
Token.
Token ID13
Feature activation+9.22e-02

Pos logit contributions
 cheering+0.021
 rescuing+0.021
Charl+0.020
 searching+0.020
rob+0.020
Neg logit contributions
ia-0.021
ara-0.021
ires-0.020
ushing-0.019
outed-0.019
Token<|endoftext|>
Token ID50256
Feature activation-2.61e+00

Pos logit contributions
ires+0.003
ia+0.003
ara+0.003
outed+0.003
ushing+0.003
Neg logit contributions
 cheering-0.001
 searching-0.001
 rescuing-0.001
keeping-0.001
Charl-0.001
TokenOnce
Token ID7454
Feature activation+4.39e+00

Pos logit contributions
 rescuing+0.069
 improving+0.069
 cheering+0.068
 willing+0.067
 relying+0.066
Neg logit contributions
ara-0.057
ia-0.054
ires-0.047
uddle-0.044
outed-0.043
Token upon
Token ID2402
Feature activation+2.57e+00

Pos logit contributions
ires+0.089
ushing+0.087
ara+0.087
ours+0.084
outed+0.084
Neg logit contributions
 searching-0.110
 cheering-0.107
 exploring-0.103
 rescuing-0.101
bench-0.101
Token a
Token ID257
Feature activation-2.04e-01

Pos logit contributions
ia+0.072
ara+0.072
ires+0.071
outed+0.068
ushing+0.066
Neg logit contributions
 cheering-0.046
 rescuing-0.045
keeping-0.044
 searching-0.044
bench-0.044
Token time
Token ID640
Feature activation-1.51e+00

Pos logit contributions
 cheering+0.003
 searching+0.003
 rescuing+0.003
keeping+0.003
bench+0.003
Neg logit contributions
ires-0.006
ia-0.006
ara-0.006
outed-0.006
ours-0.006
Token,
Token ID11
Feature activation+1.09e+00

Pos logit contributions
keeping+0.033
 cheering+0.032
bench+0.032
 rescuing+0.032
rob+0.031
Neg logit contributions
ia-0.038
ara-0.037
ires-0.036
outed-0.035
ushing-0.035
Token there
Token ID612
Feature activation+2.95e-01

Pos logit contributions
ires+0.028
ara+0.028
ia+0.027
ushing+0.027
outed+0.027
Neg logit contributions
 searching-0.023
 rescuing-0.022
 cheering-0.021
 exploring-0.021
 improving-0.020
Token He
Token ID679
Feature activation+7.01e-01

Pos logit contributions
 searching+0.039
 cheering+0.038
 rescuing+0.038
keeping+0.037
bench+0.037
Neg logit contributions
ara-0.036
ia-0.036
ires-0.035
outed-0.035
ours-0.034
Token liked
Token ID8288
Feature activation-2.20e+00

Pos logit contributions
ires+0.018
ia+0.017
ara+0.017
ours+0.017
ushing+0.017
Neg logit contributions
 searching-0.015
 rescuing-0.015
 cheering-0.015
 exploring-0.014
 improving-0.014
Token to
Token ID284
Feature activation-4.63e-01

Pos logit contributions
hered+0.051
rob+0.049
Charl+0.049
bench+0.049
ner+0.047
Neg logit contributions
ires-0.051
ia-0.048
ours-0.048
ara-0.046
outed-0.046
Token rock
Token ID3881
Feature activation+1.47e+00

Pos logit contributions
 searching+0.009
 rescuing+0.009
 cheering+0.009
keeping+0.009
 improving+0.009
Neg logit contributions
ires-0.012
ara-0.012
ia-0.012
outed-0.012
ours-0.012
Token in
Token ID287
Feature activation-4.48e-01

Pos logit contributions
ara+0.040
ia+0.040
ires+0.040
outed+0.039
ours+0.038
Neg logit contributions
 cheering-0.027
 rescuing-0.027
 searching-0.027
keeping-0.026
 improving-0.025
Token his
Token ID465
Feature activation+4.46e+00

Pos logit contributions
hered+0.011
rob+0.011
Ell+0.011
ner+0.011
Charl+0.010
Neg logit contributions
ia-0.010
ires-0.009
ara-0.009
ours-0.008
ry-0.008
Token closet
Token ID21615
Feature activation+2.96e+00

Pos logit contributions
ires+0.117
ara+0.115
ia+0.113
outed+0.113
ushing+0.111
Neg logit contributions
 searching-0.097
 exploring-0.090
 rescuing-0.087
 cheering-0.086
 improving-0.084
Token every
Token ID790
Feature activation-2.75e-01

Pos logit contributions
ara+0.087
ia+0.086
ires+0.086
ours+0.083
outed+0.082
Neg logit contributions
 searching-0.053
 rescuing-0.050
 cheering-0.050
 exploring-0.048
keeping-0.048
Token day
Token ID1110
Feature activation-1.59e+00

Pos logit contributions
 cheering+0.004
 rescuing+0.004
keeping+0.004
rob+0.003
 searching+0.003
Neg logit contributions
ia-0.009
ara-0.009
ires-0.009
outed-0.008
ours-0.008
Token.
Token ID13
Feature activation-7.78e-01

Pos logit contributions
 cheering+0.034
 rescuing+0.034
rob+0.033
 searching+0.033
keeping+0.033
Neg logit contributions
ires-0.038
ia-0.037
ara-0.037
ushing-0.036
outed-0.035
Token Little
Token ID7703
Feature activation-2.53e+00

Pos logit contributions
 searching+0.018
 cheering+0.018
 rescuing+0.017
keeping+0.017
 exploring+0.017
Neg logit contributions
ia-0.018
ara-0.018
ires-0.018
outed-0.018
ushing-0.017
Token of
Token ID286
Feature activation+9.47e-02

Pos logit contributions
ia+0.013
ires+0.013
ara+0.013
outed+0.012
ours+0.012
Neg logit contributions
 cheering-0.015
 searching-0.015
 rescuing-0.015
keeping-0.015
rob-0.014
Token toys
Token ID14958
Feature activation+1.50e+00

Pos logit contributions
ires+0.002
ia+0.002
ara+0.002
outed+0.002
ours+0.002
Neg logit contributions
rob-0.002
hered-0.002
Charl-0.002
keeping-0.002
ner-0.002
Token and
Token ID290
Feature activation+4.69e-01

Pos logit contributions
ara+0.037
ia+0.037
ires+0.037
outed+0.036
ours+0.035
Neg logit contributions
 cheering-0.031
 rescuing-0.031
 searching-0.031
keeping-0.030
bench-0.029
Token books
Token ID3835
Feature activation+1.37e+00

Pos logit contributions
ara+0.012
ia+0.012
ires+0.012
outed+0.012
ushing+0.012
Neg logit contributions
 cheering-0.009
 rescuing-0.009
 searching-0.009
keeping-0.009
 improving-0.009
Token in
Token ID287
Feature activation+1.10e+00

Pos logit contributions
ara+0.034
ires+0.034
ia+0.034
outed+0.032
ushing+0.032
Neg logit contributions
 cheering-0.029
 rescuing-0.028
 searching-0.028
keeping-0.028
 improving-0.027
Token your
Token ID534
Feature activation+4.69e+00

Pos logit contributions
ia+0.026
ara+0.026
ires+0.026
outed+0.025
ours+0.025
Neg logit contributions
 cheering-0.024
 rescuing-0.024
 searching-0.024
keeping-0.023
bench-0.023
Token room
Token ID2119
Feature activation+2.00e+00

Pos logit contributions
ires+0.112
ia+0.111
ara+0.111
outed+0.108
ushing+0.107
Neg logit contributions
 searching-0.105
 rescuing-0.102
keeping-0.102
 cheering-0.102
 exploring-0.098
Token.
Token ID13
Feature activation+1.49e+00

Pos logit contributions
ires+0.056
ara+0.055
ia+0.055
outed+0.054
ours+0.053
Neg logit contributions
 cheering-0.036
 searching-0.036
 rescuing-0.035
keeping-0.035
bench-0.033
Token And
Token ID843
Feature activation+1.73e+00

Pos logit contributions
ia+0.037
ires+0.037
ara+0.036
outed+0.036
uddle+0.035
Neg logit contributions
 cheering-0.031
 rescuing-0.031
 searching-0.031
keeping-0.030
 improving-0.029
Token you
Token ID345
Feature activation+3.81e-01

Pos logit contributions
ara+0.043
ia+0.043
outed+0.043
uddle+0.042
ires+0.042
Neg logit contributions
 searching-0.037
 cheering-0.037
 rescuing-0.037
 improving-0.036
 exploring-0.035
Token know
Token ID760
Feature activation+1.53e+00

Pos logit contributions
ara+0.010
ia+0.010
ires+0.010
outed+0.009
ours+0.009
Neg logit contributions
 cheering-0.008
 rescuing-0.008
 searching-0.008
keeping-0.007
rob-0.007
Token be
Token ID307
Feature activation+4.76e-01

Pos logit contributions
keeping+0.012
 rescuing+0.012
 searching+0.012
 cheering+0.012
rob+0.012
Neg logit contributions
ia-0.014
ara-0.013
ires-0.013
outed-0.013
ushing-0.012
Token scared
Token ID12008
Feature activation+7.80e-01

Pos logit contributions
ia+0.012
outed+0.012
ires+0.012
ushing+0.012
ara+0.012
Neg logit contributions
rob-0.009
ner-0.009
keeping-0.009
Charl-0.009
 rescuing-0.009
Token and
Token ID290
Feature activation+6.94e-01

Pos logit contributions
ara+0.022
ires+0.022
ia+0.022
outed+0.021
ours+0.021
Neg logit contributions
 cheering-0.015
 searching-0.014
 rescuing-0.014
 improving-0.013
keeping-0.013
Token started
Token ID2067
Feature activation-1.35e+00

Pos logit contributions
ires+0.019
ara+0.019
ushing+0.018
uddle+0.018
ia+0.018
Neg logit contributions
 searching-0.014
 cheering-0.014
 rescuing-0.014
 chasing-0.013
 exploring-0.013
Token to
Token ID284
Feature activation+1.50e-01

Pos logit contributions
rob+0.033
ner+0.030
Charl+0.029
robat+0.029
bench+0.029
Neg logit contributions
outed-0.029
ia-0.029
ours-0.028
okes-0.027
uddle-0.027
Token trem
Token ID10023
Feature activation+4.28e+00

Pos logit contributions
ia+0.004
ara+0.004
ires+0.004
outed+0.004
ours+0.004
Neg logit contributions
rob-0.003
keeping-0.003
 rescuing-0.003
 cheering-0.003
bench-0.003
Tokenble
Token ID903
Feature activation+7.46e-01

Pos logit contributions
ara+0.102
outed+0.094
aj+0.094
ours+0.093
uddle+0.093
Neg logit contributions
 rescuing-0.103
 cheering-0.097
 improving-0.097
keeping-0.096
 chasing-0.094
Token.
Token ID13
Feature activation+6.17e-01

Pos logit contributions
ia+0.019
ires+0.019
ara+0.019
outed+0.018
ours+0.018
Neg logit contributions
 cheering-0.015
 rescuing-0.015
 searching-0.015
keeping-0.014
bench-0.014
Token Mia
Token ID32189
Feature activation+6.35e-01

Pos logit contributions
ires+0.013
ushing+0.013
ara+0.013
uddle+0.013
inch+0.012
Neg logit contributions
 searching-0.015
 cheering-0.014
 rescuing-0.014
 exploring-0.014
ner-0.014
Token was
Token ID373
Feature activation+6.91e-01

Pos logit contributions
ires+0.015
ia+0.015
ara+0.015
outed+0.015
ours+0.015
Neg logit contributions
 cheering-0.014
 searching-0.014
 rescuing-0.014
keeping-0.013
 improving-0.013
Token patient
Token ID5827
Feature activation-2.12e+00

Pos logit contributions
outed+0.019
ia+0.018
ires+0.018
ushing+0.017
ours+0.017
Neg logit contributions
rob-0.014
keeping-0.013
bench-0.012
ner-0.012
 rescuing-0.012
Token about
Token ID546
Feature activation-7.53e-01

Pos logit contributions
 cheering+0.049
 rescuing+0.049
 searching+0.048
keeping+0.048
bench+0.047
Neg logit contributions
ia-0.030
ires-0.030
ara-0.030
outed-0.028
ours-0.028
Token the
Token ID262
Feature activation-1.19e+00

Pos logit contributions
bench+0.015
 cheering+0.015
ner+0.015
hered+0.015
rob+0.015
Neg logit contributions
ia-0.019
ires-0.019
ara-0.019
outed-0.017
ours-0.017
Token jewel
Token ID16927
Feature activation-2.30e-01

Pos logit contributions
 cheering+0.028
 rescuing+0.028
ner+0.028
rob+0.027
keeping+0.027
Neg logit contributions
ara-0.027
ia-0.027
ires-0.026
uddle-0.025
outed-0.025
Token.
Token ID13
Feature activation+1.62e-01

Pos logit contributions
 cheering+0.005
 rescuing+0.005
 searching+0.005
keeping+0.005
bench+0.005
Neg logit contributions
ia-0.006
ires-0.006
ara-0.006
ushing-0.005
outed-0.005
Token<|endoftext|>
Token ID50256
Feature activation-3.09e+00

Pos logit contributions
ia+0.005
ires+0.005
ushing+0.004
uddle+0.004
ara+0.004
Neg logit contributions
 searching-0.003
 cheering-0.003
keeping-0.003
 rescuing-0.003
Charl-0.003
TokenOnce
Token ID7454
Feature activation+3.94e+00

Pos logit contributions
 rescuing+0.078
 cheering+0.077
 improving+0.077
 willing+0.075
 searching+0.074
Neg logit contributions
ara-0.071
ia-0.066
ires-0.060
uddle-0.056
ours-0.055
Token upon
Token ID2402
Feature activation+2.76e+00

Pos logit contributions
ires+0.077
ushing+0.077
ara+0.076
ours+0.074
erve+0.073
Neg logit contributions
 searching-0.100
 cheering-0.097
 exploring-0.095
bench-0.092
 rescuing-0.092
Token a
Token ID257
Feature activation+1.27e-01

Pos logit contributions
ia+0.075
ara+0.074
ires+0.073
outed+0.070
ushing+0.068
Neg logit contributions
 cheering-0.052
keeping-0.051
 rescuing-0.050
bench-0.050
 searching-0.049
Token time
Token ID640
Feature activation-1.15e+00

Pos logit contributions
ia+0.004
ara+0.004
ires+0.004
outed+0.004
ushing+0.003
Neg logit contributions
 cheering-0.002
 rescuing-0.002
 searching-0.002
keeping-0.002
 improving-0.002
Token,
Token ID11
Feature activation+1.35e+00

Pos logit contributions
keeping+0.025
 cheering+0.024
 rescuing+0.024
bench+0.024
rob+0.023
Neg logit contributions
ia-0.030
ara-0.029
ires-0.028
outed-0.027
ushing-0.027
Token there
Token ID612
Feature activation+4.36e-01

Pos logit contributions
ires+0.035
ara+0.035
ia+0.034
ushing+0.034
uddle+0.034
Neg logit contributions
 searching-0.027
 rescuing-0.027
 cheering-0.026
 exploring-0.025
 improving-0.025
Token,
Token ID11
Feature activation+3.49e-01

Pos logit contributions
ires+0.023
ara+0.023
ia+0.023
outed+0.023
ours+0.022
Neg logit contributions
 cheering-0.014
 searching-0.014
 rescuing-0.014
keeping-0.013
 exploring-0.013
Token and
Token ID290
Feature activation+9.51e-02

Pos logit contributions
ia+0.009
ires+0.009
ara+0.009
outed+0.009
ours+0.009
Neg logit contributions
 cheering-0.007
 rescuing-0.007
 searching-0.007
keeping-0.007
 improving-0.007
Token then
Token ID788
Feature activation-6.18e-01

Pos logit contributions
ires+0.003
ia+0.003
ara+0.003
ours+0.003
uddle+0.003
Neg logit contributions
 cheering-0.002
 searching-0.002
 rescuing-0.002
 improving-0.002
ner-0.002
Token landed
Token ID11406
Feature activation+9.10e-01

Pos logit contributions
 searching+0.011
 cheering+0.011
 rescuing+0.010
 exploring+0.010
ner+0.010
Neg logit contributions
ires-0.019
ara-0.018
ours-0.018
ia-0.018
outed-0.018
Token on
Token ID319
Feature activation-8.44e-01

Pos logit contributions
ia+0.028
ara+0.028
ires+0.028
outed+0.027
ours+0.026
Neg logit contributions
 cheering-0.014
 rescuing-0.014
 searching-0.014
keeping-0.014
bench-0.013
Token a
Token ID257
Feature activation+2.02e+00

Pos logit contributions
 cheering+0.018
 rescuing+0.018
rob+0.017
 searching+0.017
keeping+0.017
Neg logit contributions
ara-0.021
ia-0.021
ires-0.020
outed-0.019
ours-0.019
Token nearby
Token ID6716
Feature activation+8.52e-01

Pos logit contributions
ia+0.049
ara+0.048
uddle+0.046
ires+0.044
ushing+0.044
Neg logit contributions
keeping-0.045
rob-0.045
 rescuing-0.045
 searching-0.043
 cheering-0.043
Token field
Token ID2214
Feature activation+1.01e+00

Pos logit contributions
ia+0.024
ara+0.024
ires+0.024
outed+0.023
ours+0.023
Neg logit contributions
 cheering-0.015
 rescuing-0.015
 searching-0.015
keeping-0.014
bench-0.014
Token.
Token ID13
Feature activation-3.30e-01

Pos logit contributions
ia+0.026
ara+0.025
ires+0.025
outed+0.025
ushing+0.024
Neg logit contributions
 cheering-0.020
 rescuing-0.020
 searching-0.020
keeping-0.020
rob-0.019
Token Jack
Token ID3619
Feature activation-3.78e-01

Pos logit contributions
 searching+0.007
 cheering+0.007
 rescuing+0.007
keeping+0.006
ner+0.006
Neg logit contributions
ia-0.008
ires-0.008
ara-0.008
outed-0.008
ushing-0.008
Token and
Token ID290
Feature activation+4.23e-02

Pos logit contributions
 rescuing+0.009
rob+0.009
keeping+0.009
bench+0.009
 cheering+0.009
Neg logit contributions
ia-0.008
ara-0.008
outed-0.008
ushing-0.008
ires-0.008
Token,
Token ID11
Feature activation+4.10e-01

Pos logit contributions
 cheering+0.028
 rescuing+0.028
 searching+0.028
keeping+0.027
bench+0.027
Neg logit contributions
ia-0.038
ara-0.037
ires-0.037
outed-0.036
ushing-0.035
Token Ben
Token ID3932
Feature activation-9.65e-01

Pos logit contributions
outed+0.008
ires+0.008
ours+0.007
ushing+0.007
uddle+0.007
Neg logit contributions
 cheering-0.011
 searching-0.011
 rescuing-0.011
 improving-0.010
 exploring-0.010
Token.
Token ID13
Feature activation-1.24e-01

Pos logit contributions
 cheering+0.021
 rescuing+0.021
 searching+0.021
ner+0.020
bench+0.020
Neg logit contributions
ia-0.024
ara-0.024
ires-0.022
ushing-0.022
outed-0.021
Token You
Token ID921
Feature activation-8.60e-03

Pos logit contributions
 rescuing+0.003
 cheering+0.003
 searching+0.002
keeping+0.002
Charl+0.002
Neg logit contributions
outed-0.003
ia-0.003
ires-0.003
ara-0.003
uddle-0.003
Token are
Token ID389
Feature activation+5.55e-01

Pos logit contributions
keeping+0.000
 cheering+0.000
 searching+0.000
bench+0.000
 rescuing+0.000
Neg logit contributions
ia-0.000
ara-0.000
ushing-0.000
outed-0.000
ires-0.000
Token a
Token ID257
Feature activation+1.16e+00

Pos logit contributions
ia+0.013
ires+0.013
ara+0.012
ushing+0.011
outed+0.011
Neg logit contributions
keeping-0.014
rob-0.013
bench-0.013
hered-0.013
ner-0.013
Token good
Token ID922
Feature activation-5.65e-01

Pos logit contributions
ara+0.025
ia+0.024
uddle+0.022
ushing+0.022
ires+0.022
Neg logit contributions
keeping-0.032
bench-0.030
rob-0.029
ner-0.029
 rescuing-0.028
Token friend
Token ID1545
Feature activation-7.37e-02

Pos logit contributions
 cheering+0.014
 searching+0.014
 rescuing+0.014
keeping+0.013
 improving+0.013
Neg logit contributions
ires-0.012
outed-0.012
ia-0.011
ara-0.011
ours-0.011
Token.
Token ID13
Feature activation+6.10e-01

Pos logit contributions
 searching+0.002
 cheering+0.002
bench+0.002
 rescuing+0.002
rob+0.002
Neg logit contributions
ia-0.002
ara-0.002
ires-0.001
ushing-0.001
outed-0.001
Token You
Token ID921
Feature activation+5.58e-01

Pos logit contributions
ia+0.014
ires+0.014
outed+0.014
ara+0.014
ushing+0.014
Neg logit contributions
 rescuing-0.014
 cheering-0.014
 searching-0.013
keeping-0.013
 improving-0.013
Token tried
Token ID3088
Feature activation-1.49e-01

Pos logit contributions
ia+0.011
ara+0.011
outed+0.010
ushing+0.010
ires+0.010
Neg logit contributions
keeping-0.015
 cheering-0.014
 searching-0.014
rob-0.014
 rescuing-0.014
Token said
Token ID531
Feature activation-1.54e+00

Pos logit contributions
 cheering+0.012
 searching+0.011
 rescuing+0.011
bench+0.011
 improving+0.011
Neg logit contributions
ires-0.009
ara-0.009
ia-0.009
ours-0.008
uddle-0.008
Token.
Token ID13
Feature activation-4.22e-01

Pos logit contributions
rob+0.040
Charl+0.036
Brian+0.036
 rescuing+0.036
keeping+0.036
Neg logit contributions
outed-0.032
ia-0.032
ushing-0.032
ires-0.030
ara-0.030
Token "
Token ID366
Feature activation-5.93e-01

Pos logit contributions
 cheering+0.008
 searching+0.008
 rescuing+0.008
keeping+0.008
bench+0.007
Neg logit contributions
ires-0.011
ia-0.011
ara-0.011
ushing-0.011
outed-0.011
TokenNow
Token ID3844
Feature activation-8.03e-01

Pos logit contributions
 searching+0.014
 rescuing+0.013
 cheering+0.013
keeping+0.013
hered+0.013
Neg logit contributions
ara-0.014
ires-0.014
ia-0.013
outed-0.013
ushing-0.012
Token let
Token ID1309
Feature activation-1.14e+00

Pos logit contributions
 cheering+0.017
 rescuing+0.017
 searching+0.016
 improving+0.016
 exploring+0.016
Neg logit contributions
outed-0.021
ires-0.021
ia-0.020
ushing-0.019
ara-0.019
Token's
Token ID338
Feature activation+1.57e+00

Pos logit contributions
 cheering+0.017
 rescuing+0.016
hered+0.016
 improving+0.016
 searching+0.015
Neg logit contributions
ires-0.036
outed-0.036
ia-0.034
ushing-0.034
ours-0.033
Token go
Token ID467
Feature activation-1.88e+00

Pos logit contributions
ires+0.040
ia+0.040
ara+0.039
outed+0.039
ushing+0.038
Neg logit contributions
 cheering-0.033
 searching-0.033
 rescuing-0.033
 improving-0.032
keeping-0.031
Token back
Token ID736
Feature activation-2.87e-01

Pos logit contributions
 cheering+0.035
 rescuing+0.034
 searching+0.034
keeping+0.033
bench+0.033
Neg logit contributions
ara-0.051
ia-0.051
ires-0.051
outed-0.049
ours-0.049
Token to
Token ID284
Feature activation-1.03e-01

Pos logit contributions
 cheering+0.005
 rescuing+0.005
 searching+0.005
keeping+0.004
bench+0.004
Neg logit contributions
ia-0.008
ires-0.008
ara-0.008
outed-0.008
ours-0.008
Token making
Token ID1642
Feature activation+1.08e+00

Pos logit contributions
 cheering+0.002
 rescuing+0.002
 searching+0.002
bench+0.002
keeping+0.002
Neg logit contributions
ara-0.003
ia-0.003
ires-0.003
ours-0.003
outed-0.003
Token your
Token ID534
Feature activation+2.09e+00

Pos logit contributions
ires+0.028
ara+0.028
ia+0.028
ours+0.027
outed+0.026
Neg logit contributions
 rescuing-0.022
 cheering-0.021
 searching-0.021
keeping-0.021
bench-0.021
Token play
Token ID711
Feature activation-4.08e-01

Pos logit contributions
 cheering+0.020
 rescuing+0.019
 searching+0.019
keeping+0.019
 improving+0.018
Neg logit contributions
ires-0.026
ara-0.026
ia-0.026
outed-0.025
ours-0.025
Token with
Token ID351
Feature activation+1.29e-02

Pos logit contributions
 cheering+0.008
 rescuing+0.008
 searching+0.008
keeping+0.008
bench+0.008
Neg logit contributions
ia-0.010
ara-0.010
ires-0.010
outed-0.010
ushing-0.010
Token her
Token ID607
Feature activation+1.95e+00

Pos logit contributions
outed+0.000
ires+0.000
ushing+0.000
ours+0.000
ia+0.000
Neg logit contributions
 cheering-0.000
 searching-0.000
 rescuing-0.000
 exploring-0.000
 improving-0.000
Token.
Token ID13
Feature activation-7.34e-01

Pos logit contributions
ara+0.053
ires+0.053
ia+0.053
ours+0.052
outed+0.052
Neg logit contributions
 cheering-0.037
 searching-0.036
 rescuing-0.036
 improving-0.034
keeping-0.034
Token Ben
Token ID3932
Feature activation+5.79e-01

Pos logit contributions
 cheering+0.020
ner+0.020
 searching+0.020
hered+0.020
Charl+0.019
Neg logit contributions
ia-0.013
ires-0.013
ushing-0.013
outed-0.013
uddle-0.012
Token smiled
Token ID13541
Feature activation+9.98e-01

Pos logit contributions
ires+0.016
ushing+0.014
ia+0.014
ours+0.014
ara+0.014
Neg logit contributions
 searching-0.012
 cheering-0.012
hered-0.011
 rescuing-0.011
 improving-0.010
Token and
Token ID290
Feature activation-6.36e-01

Pos logit contributions
ires+0.032
ours+0.032
ia+0.032
ara+0.031
uddle+0.030
Neg logit contributions
 cheering-0.014
 searching-0.014
 rescuing-0.012
hered-0.012
 exploring-0.012
Token said
Token ID531
Feature activation-1.29e+00

Pos logit contributions
 cheering+0.013
 searching+0.012
 chasing+0.011
 rescuing+0.011
 helping+0.011
Neg logit contributions
ires-0.019
ushing-0.017
uddle-0.017
ara-0.017
ours-0.017
Token yes
Token ID3763
Feature activation+4.80e-01

Pos logit contributions
 cheering+0.029
 searching+0.029
 rescuing+0.029
keeping+0.029
bench+0.028
Neg logit contributions
ia-0.030
ara-0.030
ires-0.030
outed-0.028
ours-0.028
Token.
Token ID13
Feature activation-5.46e-01

Pos logit contributions
ires+0.014
ours+0.014
outed+0.014
ara+0.014
ia+0.014
Neg logit contributions
 cheering-0.008
 searching-0.008
 rescuing-0.008
bench-0.007
keeping-0.007
Token He
Token ID679
Feature activation+2.86e-01

Pos logit contributions
 cheering+0.011
ner+0.011
 searching+0.011
bench+0.011
keeping+0.011
Neg logit contributions
ushing-0.014
ia-0.013
ires-0.013
outed-0.013
uddle-0.013
TokenSoon
Token ID28093
Feature activation-1.86e+00

Pos logit contributions
ires+0.005
ours+0.005
ara+0.005
outed+0.005
ia+0.005
Neg logit contributions
 cheering-0.004
 rescuing-0.004
 searching-0.004
Charl-0.004
 improving-0.004
Token after
Token ID706
Feature activation-1.06e-01

Pos logit contributions
 cheering+0.037
 rescuing+0.037
keeping+0.037
bench+0.037
 searching+0.036
Neg logit contributions
ia-0.048
ara-0.048
ires-0.047
ours-0.045
outed-0.045
Token,
Token ID11
Feature activation+8.17e-01

Pos logit contributions
Charl+0.002
bench+0.002
 cheering+0.002
keeping+0.002
 rescuing+0.002
Neg logit contributions
ia-0.003
ires-0.003
ara-0.002
ours-0.002
outed-0.002
Token the
Token ID262
Feature activation-5.37e-01

Pos logit contributions
ires+0.025
ia+0.025
ara+0.025
outed+0.024
ours+0.024
Neg logit contributions
 cheering-0.013
 searching-0.013
 rescuing-0.013
keeping-0.012
 improving-0.012
Token family
Token ID1641
Feature activation-6.58e-01

Pos logit contributions
 cheering+0.013
 rescuing+0.013
keeping+0.013
 searching+0.013
rob+0.012
Neg logit contributions
ia-0.012
ara-0.012
ires-0.012
outed-0.011
ours-0.011
Token got
Token ID1392
Feature activation-3.38e-02

Pos logit contributions
 cheering+0.014
 searching+0.014
 rescuing+0.013
keeping+0.013
 improving+0.013
Neg logit contributions
ires-0.017
ia-0.017
ara-0.016
outed-0.016
ours-0.016
Token a
Token ID257
Feature activation+1.03e+00

Pos logit contributions
rob+0.001
Charl+0.001
robat+0.001
ner+0.001
Brian+0.001
Neg logit contributions
outed-0.001
ours-0.001
uddle-0.001
ushing-0.001
ires-0.001
Token cat
Token ID3797
Feature activation-9.37e-01

Pos logit contributions
uddle+0.022
ia+0.021
ara+0.021
ushing+0.020
ours+0.019
Neg logit contributions
keeping-0.027
rob-0.027
 rescuing-0.025
 cheering-0.024
bench-0.024
Token.
Token ID13
Feature activation-8.80e-01

Pos logit contributions
rob+0.019
 cheering+0.019
bench+0.018
 improving+0.018
keeping+0.018
Neg logit contributions
ia-0.023
ara-0.023
ushing-0.022
uddle-0.022
ires-0.021
Token The
Token ID383
Feature activation-7.01e-01

Pos logit contributions
 cheering+0.017
 searching+0.016
 rescuing+0.016
keeping+0.016
ner+0.015
Neg logit contributions
ia-0.024
ires-0.023
ara-0.023
ushing-0.023
outed-0.023
Token cat
Token ID3797
Feature activation-1.78e-01

Pos logit contributions
 rescuing+0.019
 cheering+0.019
keeping+0.019
rob+0.019
 searching+0.019
Neg logit contributions
ia-0.013
ara-0.013
ires-0.013
outed-0.012
ours-0.012
Token if
Token ID611
Feature activation-1.32e+00

Pos logit contributions
 cheering+0.033
keeping+0.033
 rescuing+0.033
hered+0.033
 searching+0.033
Neg logit contributions
ia-0.031
ara-0.031
ires-0.030
ours-0.029
outed-0.029
Token she
Token ID673
Feature activation-3.06e-01

Pos logit contributions
 searching+0.024
 exploring+0.022
 cheering+0.022
 rescuing+0.021
 improving+0.020
Neg logit contributions
ires-0.040
ushing-0.039
outed-0.038
uddle-0.038
ours-0.037
Token wanted
Token ID2227
Feature activation-1.25e+00

Pos logit contributions
 searching+0.006
 rescuing+0.006
 cheering+0.006
keeping+0.006
 improving+0.006
Neg logit contributions
ires-0.008
ia-0.008
ara-0.008
ours-0.007
outed-0.007
Token one
Token ID530
Feature activation+1.62e+00

Pos logit contributions
rob+0.026
 cheering+0.026
keeping+0.025
bench+0.024
 rescuing+0.024
Neg logit contributions
ia-0.032
ara-0.032
ires-0.031
ours-0.030
outed-0.029
Token,
Token ID11
Feature activation+1.35e+00

Pos logit contributions
ia+0.044
ires+0.044
ara+0.044
outed+0.043
ours+0.042
Neg logit contributions
 cheering-0.030
 rescuing-0.030
 searching-0.030
keeping-0.029
bench-0.028
Token she
Token ID673
Feature activation+6.43e-01

Pos logit contributions
ires+0.043
ara+0.043
outed+0.043
uddle+0.042
ia+0.042
Neg logit contributions
 searching-0.021
 cheering-0.020
 rescuing-0.019
 helping-0.019
 exploring-0.019
Token had
Token ID550
Feature activation-7.78e-01

Pos logit contributions
ires+0.017
ia+0.016
ara+0.016
ours+0.016
ushing+0.016
Neg logit contributions
 searching-0.013
 rescuing-0.012
 cheering-0.012
bench-0.012
 improving-0.012
Token to
Token ID284
Feature activation+4.72e-01

Pos logit contributions
 cheering+0.013
 rescuing+0.013
 searching+0.013
keeping+0.012
bench+0.012
Neg logit contributions
ia-0.023
ara-0.023
ires-0.023
outed-0.022
ours-0.022
Token choose
Token ID3853
Feature activation+1.97e+00

Pos logit contributions
ires+0.012
outed+0.012
ours+0.012
ara+0.012
ia+0.012
Neg logit contributions
 cheering-0.010
 rescuing-0.010
 searching-0.010
keeping-0.009
 improving-0.009
Token carefully
Token ID7773
Feature activation-8.98e-01

Pos logit contributions
ia+0.059
ara+0.058
ushing+0.058
outed+0.058
uddle+0.058
Neg logit contributions
 searching-0.031
 rescuing-0.031
 cheering-0.030
 exploring-0.029
 improving-0.029
Token.
Token ID13
Feature activation-4.30e-02

Pos logit contributions
 cheering+0.017
 searching+0.017
 rescuing+0.017
keeping+0.017
bench+0.016
Neg logit contributions
ia-0.024
ires-0.024
ara-0.024
outed-0.023
ushing-0.023
Token on
Token ID319
Feature activation-1.96e+00

Pos logit contributions
ires+0.014
outed+0.014
ia+0.013
ara+0.013
ushing+0.013
Neg logit contributions
 cheering-0.014
rob-0.014
keeping-0.014
ner-0.013
bench-0.013
Token a
Token ID257
Feature activation+1.69e+00

Pos logit contributions
 cheering+0.049
bench+0.046
ner+0.045
keeping+0.044
hered+0.044
Neg logit contributions
ia-0.045
ara-0.045
ires-0.042
ours-0.037
erve-0.036
Token farm
Token ID5318
Feature activation+7.52e-01

Pos logit contributions
ia+0.044
ara+0.044
ires+0.043
outed+0.042
ours+0.041
Neg logit contributions
 cheering-0.034
 rescuing-0.033
 searching-0.033
keeping-0.033
 improving-0.032
Token.
Token ID13
Feature activation-1.35e+00

Pos logit contributions
ia+0.017
ires+0.017
ara+0.017
outed+0.016
ushing+0.016
Neg logit contributions
 cheering-0.017
 improving-0.017
rob-0.016
 rescuing-0.016
ner-0.016
Token Every
Token ID3887
Feature activation-2.52e+00

Pos logit contributions
 cheering+0.032
 rescuing+0.032
 searching+0.032
keeping+0.031
 improving+0.031
Neg logit contributions
ara-0.030
ia-0.030
ires-0.030
outed-0.028
ours-0.028
Token day
Token ID1110
Feature activation-2.08e+00

Pos logit contributions
 cheering+0.039
keeping+0.037
 rescuing+0.037
 improving+0.036
ner+0.035
Neg logit contributions
ia-0.081
ara-0.080
ires-0.078
outed-0.075
ours-0.075
Token,
Token ID11
Feature activation+3.43e-01

Pos logit contributions
keeping+0.044
 cheering+0.044
 rescuing+0.043
 searching+0.042
bench+0.042
Neg logit contributions
ara-0.053
ia-0.052
ires-0.051
outed-0.049
ours-0.049
Token the
Token ID262
Feature activation-2.12e+00

Pos logit contributions
ia+0.010
ires+0.010
ara+0.010
outed+0.009
ours+0.009
Neg logit contributions
 cheering-0.006
 rescuing-0.006
 searching-0.006
keeping-0.006
bench-0.006
Token cow
Token ID9875
Feature activation+5.41e-01

Pos logit contributions
 cheering+0.041
 rescuing+0.041
 searching+0.041
keeping+0.039
bench+0.038
Neg logit contributions
ia-0.056
ires-0.056
ara-0.056
outed-0.054
ours-0.054
Token would
Token ID561
Feature activation-1.10e+00

Pos logit contributions
ia+0.012
ara+0.012
ires+0.012
outed+0.012
ours+0.012
Neg logit contributions
 cheering-0.012
 rescuing-0.012
 searching-0.012
keeping-0.012
bench-0.012
Token eat
Token ID4483
Feature activation+4.90e-01

Pos logit contributions
 cheering+0.025
keeping+0.025
rob+0.024
 rescuing+0.024
 searching+0.024
Neg logit contributions
ia-0.025
ires-0.025
ara-0.025
outed-0.023
ours-0.023
Token him
Token ID683
Feature activation-6.55e-01

Pos logit contributions
ires+0.026
ia+0.026
ara+0.026
outed+0.025
ours+0.025
Neg logit contributions
 cheering-0.018
 searching-0.018
 rescuing-0.018
keeping-0.017
 improving-0.017
Token a
Token ID257
Feature activation-7.28e-01

Pos logit contributions
 rescuing+0.009
 searching+0.009
 cheering+0.009
keeping+0.009
 improving+0.009
Neg logit contributions
ires-0.021
ara-0.021
ia-0.020
outed-0.020
ushing-0.020
Token hug
Token ID16225
Feature activation-6.29e-01

Pos logit contributions
keeping+0.016
 rescuing+0.016
 searching+0.015
 cheering+0.015
rob+0.015
Neg logit contributions
ara-0.018
ia-0.018
ires-0.017
uddle-0.017
outed-0.017
Token.
Token ID13
Feature activation-9.21e-01

Pos logit contributions
 cheering+0.012
 rescuing+0.012
 searching+0.012
keeping+0.012
bench+0.012
Neg logit contributions
ia-0.016
ires-0.016
ara-0.016
outed-0.016
ours-0.016
Token He
Token ID679
Feature activation+3.26e-01

Pos logit contributions
 cheering+0.018
 searching+0.017
 rescuing+0.016
 improving+0.016
bench+0.016
Neg logit contributions
ires-0.025
ia-0.025
ara-0.025
ushing-0.024
uddle-0.024
Token said
Token ID531
Feature activation-1.88e+00

Pos logit contributions
ires+0.010
ours+0.009
ia+0.009
ara+0.009
ushing+0.009
Neg logit contributions
 cheering-0.006
 searching-0.006
 rescuing-0.005
 improving-0.005
Charl-0.005
Token:
Token ID25
Feature activation-3.83e+00

Pos logit contributions
 cheering+0.036
 rescuing+0.035
 searching+0.035
 improving+0.034
 exploring+0.033
Neg logit contributions
ires-0.052
ara-0.052
ia-0.050
outed-0.050
uddle-0.049
Token "
Token ID366
Feature activation-1.96e+00

Pos logit contributions
keeping+0.094
rob+0.091
bench+0.090
 rescuing+0.090
 cheering+0.090
Neg logit contributions
ara-0.082
ires-0.081
ia-0.081
ours-0.077
outed-0.076
TokenI
Token ID40
Feature activation-5.70e-01

Pos logit contributions
 cheering+0.036
 rescuing+0.036
 searching+0.035
keeping+0.035
 improving+0.034
Neg logit contributions
ires-0.054
ia-0.054
ara-0.053
outed-0.052
ours-0.052
Token don
Token ID836
Feature activation-1.79e+00

Pos logit contributions
 cheering+0.009
 searching+0.009
 rescuing+0.009
keeping+0.008
 improving+0.008
Neg logit contributions
ires-0.017
ia-0.017
ara-0.017
outed-0.017
ours-0.017
Token't
Token ID470
Feature activation-5.70e-01

Pos logit contributions
keeping+0.031
 searching+0.029
rob+0.027
 rescuing+0.027
bench+0.026
Neg logit contributions
ires-0.052
ara-0.052
ia-0.051
ushing-0.051
ours-0.049
Token cater
Token ID20825
Feature activation-2.87e+00

Pos logit contributions
bench+0.037
 rescuing+0.037
 cheering+0.037
keeping+0.037
rob+0.036
Neg logit contributions
ia-0.050
ara-0.049
ires-0.048
uddle-0.046
ushing-0.046
Tokenpillar
Token ID41643
Feature activation-8.40e-01

Pos logit contributions
 cheering+0.054
 rescuing+0.052
 searching+0.052
keeping+0.051
bench+0.050
Neg logit contributions
ara-0.081
ia-0.080
ires-0.079
ushing-0.076
uddle-0.076
Token,
Token ID11
Feature activation-2.54e-01

Pos logit contributions
 cheering+0.019
 rescuing+0.018
keeping+0.018
rob+0.018
bench+0.018
Neg logit contributions
ia-0.020
ara-0.019
ires-0.019
ushing-0.018
outed-0.018
Token then
Token ID788
Feature activation-1.59e+00

Pos logit contributions
 cheering+0.005
 rescuing+0.005
 searching+0.005
keeping+0.005
bench+0.005
Neg logit contributions
ia-0.006
ara-0.006
ires-0.006
outed-0.006
ours-0.006
Token ran
Token ID4966
Feature activation-1.27e+00

Pos logit contributions
 searching+0.031
 cheering+0.031
 exploring+0.030
 flying+0.028
 rescuing+0.028
Neg logit contributions
ires-0.047
ia-0.045
ara-0.043
outed-0.043
ushing-0.043
Token off
Token ID572
Feature activation-8.70e-01

Pos logit contributions
 cheering+0.024
 rescuing+0.024
 searching+0.024
keeping+0.023
bench+0.023
Neg logit contributions
ia-0.034
ara-0.034
ires-0.034
outed-0.033
ours-0.032
Token to
Token ID284
Feature activation+8.21e-01

Pos logit contributions
rob+0.022
keeping+0.021
bench+0.020
Charl+0.020
 rescuing+0.020
Neg logit contributions
ia-0.018
ara-0.018
ires-0.017
ry-0.016
outed-0.016
Token find
Token ID1064
Feature activation+8.64e-01

Pos logit contributions
ires+0.023
ara+0.023
ia+0.023
outed+0.022
ushing+0.022
Neg logit contributions
 cheering-0.015
 searching-0.015
 rescuing-0.015
keeping-0.014
 improving-0.014
Token a
Token ID257
Feature activation-2.22e-01

Pos logit contributions
ara+0.025
ires+0.025
ia+0.024
outed+0.024
ours+0.024
Neg logit contributions
 rescuing-0.015
 cheering-0.015
 searching-0.015
keeping-0.014
 improving-0.014
Token tree
Token ID5509
Feature activation-5.09e-01

Pos logit contributions
keeping+0.006
rob+0.005
bench+0.005
 rescuing+0.005
ner+0.005
Neg logit contributions
ara-0.005
ia-0.005
uddle-0.005
ushing-0.004
ires-0.004
Token of
Token ID286
Feature activation-1.08e+00

Pos logit contributions
rob+0.011
 cheering+0.011
 rescuing+0.011
Charl+0.011
keeping+0.011
Neg logit contributions
ia-0.012
ara-0.011
ires-0.011
ushing-0.011
uddle-0.011
Token came
Token ID1625
Feature activation+1.20e-01

Pos logit contributions
ires+0.003
ara+0.003
uddle+0.003
ia+0.003
ours+0.003
Neg logit contributions
 searching-0.003
 rescuing-0.002
 cheering-0.002
 exploring-0.002
keeping-0.002
Token back
Token ID736
Feature activation-1.52e-01

Pos logit contributions
ires+0.003
ia+0.003
outed+0.003
ara+0.003
ushing+0.003
Neg logit contributions
rob-0.003
bench-0.003
ner-0.003
 cheering-0.003
Charl-0.003
Token home
Token ID1363
Feature activation+6.88e-01

Pos logit contributions
 cheering+0.003
 rescuing+0.003
 searching+0.003
keeping+0.003
bench+0.003
Neg logit contributions
ires-0.004
ia-0.004
ara-0.004
outed-0.004
ours-0.004
Token to
Token ID284
Feature activation+4.67e-02

Pos logit contributions
ia+0.017
ires+0.017
ara+0.017
outed+0.017
ushing+0.017
Neg logit contributions
 cheering-0.014
 rescuing-0.014
 searching-0.014
rob-0.014
keeping-0.014
Token give
Token ID1577
Feature activation+1.25e+00

Pos logit contributions
ia+0.001
ires+0.001
ara+0.001
outed+0.001
ushing+0.001
Neg logit contributions
 cheering-0.001
 rescuing-0.001
 searching-0.001
keeping-0.001
 improving-0.001
Token Max
Token ID5436
Feature activation-2.84e+00

Pos logit contributions
ia+0.033
ires+0.033
ara+0.033
outed+0.032
ours+0.031
Neg logit contributions
 cheering-0.025
 rescuing-0.024
 searching-0.024
keeping-0.024
bench-0.023
Token his
Token ID465
Feature activation+6.31e-02

Pos logit contributions
 cheering+0.054
ner+0.051
 rescuing+0.051
 searching+0.051
rob+0.051
Neg logit contributions
ia-0.077
ara-0.077
ires-0.075
outed-0.074
ours-0.073
Token food
Token ID2057
Feature activation+1.59e-01

Pos logit contributions
ia+0.002
ires+0.002
ara+0.002
outed+0.001
ours+0.001
Neg logit contributions
 cheering-0.001
 rescuing-0.001
 searching-0.001
keeping-0.001
bench-0.001
Token.
Token ID13
Feature activation-3.63e-01

Pos logit contributions
ia+0.004
ara+0.004
ires+0.004
outed+0.004
ushing+0.004
Neg logit contributions
 cheering-0.003
 rescuing-0.003
 searching-0.003
keeping-0.003
 improving-0.003
Token Max
Token ID5436
Feature activation-7.77e-02

Pos logit contributions
 searching+0.007
 cheering+0.007
bench+0.007
Charl+0.007
keeping+0.007
Neg logit contributions
ires-0.010
ushing-0.009
ia-0.009
ara-0.009
ours-0.009
Token was
Token ID373
Feature activation+5.11e-01

Pos logit contributions
 cheering+0.002
 rescuing+0.002
 searching+0.002
keeping+0.002
bench+0.002
Neg logit contributions
ia-0.002
ara-0.002
ires-0.002
outed-0.002
ours-0.002
Token
Token ID198
Feature activation+1.47e-01

Pos logit contributions
 rescuing+0.038
 cheering+0.038
 searching+0.037
keeping+0.036
rob+0.036
Neg logit contributions
ara-0.038
ires-0.038
ia-0.038
outed-0.036
ours-0.036
Token
Token ID198
Feature activation-7.69e-01

Pos logit contributions
ia+0.004
ara+0.004
ires+0.004
outed+0.003
ours+0.003
Neg logit contributions
 rescuing-0.003
 searching-0.003
 cheering-0.003
keeping-0.003
rob-0.003
TokenSarah
Token ID29284
Feature activation-6.30e-01

Pos logit contributions
 cheering+0.018
 rescuing+0.018
 searching+0.018
keeping+0.018
 improving+0.017
Neg logit contributions
ia-0.017
ires-0.017
ara-0.017
outed-0.016
ours-0.016
Token picked
Token ID6497
Feature activation-5.06e-01

Pos logit contributions
 searching+0.013
 cheering+0.013
 rescuing+0.012
 exploring+0.012
 improving+0.012
Neg logit contributions
ires-0.017
ia-0.016
ours-0.016
ara-0.015
outed-0.015
Token up
Token ID510
Feature activation+1.44e-01

Pos logit contributions
 cheering+0.010
 rescuing+0.010
 searching+0.010
keeping+0.010
bench+0.010
Neg logit contributions
ara-0.013
ia-0.013
ires-0.013
outed-0.013
ours-0.013
Token the
Token ID262
Feature activation-2.91e+00

Pos logit contributions
ires+0.004
ia+0.004
ara+0.004
outed+0.004
ushing+0.004
Neg logit contributions
 cheering-0.002
 searching-0.002
 rescuing-0.002
keeping-0.002
 improving-0.002
Token cap
Token ID1451
Feature activation-9.25e-01

Pos logit contributions
 searching+0.076
 cheering+0.076
 rescuing+0.074
hered+0.072
rob+0.072
Neg logit contributions
ires-0.059
outed-0.057
ours-0.056
ia-0.056
ara-0.056
Token and
Token ID290
Feature activation-7.60e-01

Pos logit contributions
 cheering+0.019
 rescuing+0.018
 searching+0.018
keeping+0.018
 improving+0.017
Neg logit contributions
ia-0.024
ara-0.024
ires-0.024
outed-0.023
ushing-0.023
Token said
Token ID531
Feature activation-2.82e+00

Pos logit contributions
 searching+0.014
 cheering+0.014
 exploring+0.013
 rescuing+0.013
 chasing+0.013
Neg logit contributions
ires-0.023
ia-0.023
ushing-0.021
ara-0.021
outed-0.021
Token,
Token ID11
Feature activation-2.43e+00

Pos logit contributions
 cheering+0.051
 rescuing+0.050
 searching+0.050
 improving+0.047
keeping+0.046
Neg logit contributions
ires-0.080
ara-0.080
ia-0.079
outed-0.078
ushing-0.076
Token "
Token ID366
Feature activation-2.36e+00

Pos logit contributions
keeping+0.067
rob+0.065
bench+0.063
 rescuing+0.062
 cheering+0.061
Neg logit contributions
ia-0.046
ires-0.046
ara-0.046
outed-0.043
ours-0.043
Token.
Token ID13
Feature activation-1.51e+00

Pos logit contributions
keeping+0.007
 cheering+0.007
 rescuing+0.007
bench+0.007
 searching+0.007
Neg logit contributions
ia-0.008
ires-0.008
ara-0.008
outed-0.008
ours-0.008
Token They
Token ID1119
Feature activation-1.81e+00

Pos logit contributions
 cheering+0.038
 rescuing+0.038
 searching+0.038
keeping+0.037
bench+0.036
Neg logit contributions
ia-0.032
ires-0.031
ara-0.031
outed-0.030
ours-0.030
Token talked
Token ID6619
Feature activation-6.75e-04

Pos logit contributions
keeping+0.042
 rescuing+0.042
 cheering+0.041
bench+0.041
 searching+0.041
Neg logit contributions
ia-0.040
ara-0.040
ires-0.039
outed-0.039
ours-0.038
Token and
Token ID290
Feature activation-1.95e+00

Pos logit contributions
keeping+0.000
rob+0.000
ner+0.000
 cheering+0.000
bench+0.000
Neg logit contributions
ia-0.000
ires-0.000
ara-0.000
outed-0.000
ushing-0.000
Token discussed
Token ID6693
Feature activation-1.36e+00

Pos logit contributions
bench+0.045
Charl+0.045
ner+0.043
rob+0.043
keeping+0.043
Neg logit contributions
ia-0.042
outed-0.039
ara-0.038
ours-0.036
ires-0.035
Token who
Token ID508
Feature activation-2.59e+00

Pos logit contributions
keeping+0.038
bench+0.038
hered+0.037
rob+0.037
 cheering+0.037
Neg logit contributions
ires-0.025
ia-0.024
ara-0.023
ours-0.023
ushing-0.022
Token should
Token ID815
Feature activation-1.73e+00

Pos logit contributions
 cheering+0.059
 rescuing+0.058
keeping+0.058
 searching+0.058
bench+0.057
Neg logit contributions
ia-0.059
ara-0.059
ires-0.059
outed-0.056
ours-0.056
Token get
Token ID651
Feature activation-1.08e+00

Pos logit contributions
 rescuing+0.035
 searching+0.035
 cheering+0.034
keeping+0.032
 improving+0.032
Neg logit contributions
ires-0.046
outed-0.045
ara-0.045
ours-0.044
ia-0.044
Token the
Token ID262
Feature activation-3.63e+00

Pos logit contributions
rob+0.022
bench+0.022
 cheering+0.021
ner+0.021
Charl+0.021
Neg logit contributions
ires-0.028
ia-0.027
ara-0.027
ours-0.027
outed-0.026
Token carrot
Token ID40562
Feature activation-6.38e-01

Pos logit contributions
 cheering+0.103
keeping+0.102
ner+0.099
 rescuing+0.097
bench+0.097
Neg logit contributions
ia-0.069
ires-0.069
ara-0.067
uddle-0.061
ushing-0.060
Token.
Token ID13
Feature activation-8.50e-01

Pos logit contributions
 cheering+0.014
rob+0.013
 rescuing+0.013
 searching+0.013
Charl+0.013
Neg logit contributions
ia-0.016
ires-0.016
ara-0.015
ushing-0.015
outed-0.015
Token Tim
Token ID5045
Feature activation-1.23e-01

Pos logit contributions
ushing+0.004
ires+0.004
outed+0.004
ara+0.004
ia+0.004
Neg logit contributions
 searching-0.003
 rescuing-0.003
 exploring-0.003
 cheering-0.003
 chasing-0.003
Tokenmy
Token ID1820
Feature activation-4.30e-01

Pos logit contributions
 cheering+0.002
 rescuing+0.002
 searching+0.002
ner+0.002
keeping+0.002
Neg logit contributions
ia-0.003
ara-0.003
ires-0.003
outed-0.003
ushing-0.003
Token.
Token ID13
Feature activation-1.35e+00

Pos logit contributions
ner+0.011
keeping+0.011
 cheering+0.011
bench+0.011
rob+0.011
Neg logit contributions
ara-0.009
ia-0.009
ires-0.008
ushing-0.008
outed-0.008
Token Tim
Token ID5045
Feature activation+2.86e-01

Pos logit contributions
 cheering+0.039
 rescuing+0.039
 searching+0.039
keeping+0.038
bench+0.038
Neg logit contributions
ia-0.023
ires-0.023
ara-0.023
outed-0.021
ours-0.021
Tokenmy
Token ID1820
Feature activation+5.21e-02

Pos logit contributions
ara+0.008
ia+0.008
ires+0.007
outed+0.007
ushing+0.007
Neg logit contributions
 cheering-0.006
 rescuing-0.006
ner-0.006
 searching-0.006
keeping-0.005
Token liked
Token ID8288
Feature activation-2.47e+00

Pos logit contributions
ara+0.001
ia+0.001
ires+0.001
outed+0.001
ushing+0.001
Neg logit contributions
 cheering-0.001
 rescuing-0.001
ner-0.001
keeping-0.001
 searching-0.001
Token to
Token ID284
Feature activation-6.99e-01

Pos logit contributions
hered+0.056
bench+0.055
ner+0.055
Charl+0.054
rob+0.053
Neg logit contributions
ires-0.056
ours-0.054
ara-0.053
ia-0.053
outed-0.050
Token play
Token ID711
Feature activation-7.72e-01

Pos logit contributions
 searching+0.013
 rescuing+0.013
 cheering+0.013
keeping+0.013
 improving+0.013
Neg logit contributions
ires-0.019
ara-0.019
outed-0.019
ia-0.019
ours-0.018
Token outside
Token ID2354
Feature activation-1.79e-01

Pos logit contributions
 rescuing+0.012
 searching+0.012
 cheering+0.012
 improving+0.011
rob+0.011
Neg logit contributions
ia-0.024
ara-0.024
ires-0.023
outed-0.023
ours-0.023
Token in
Token ID287
Feature activation-5.78e-01

Pos logit contributions
keeping+0.005
bench+0.004
Charl+0.004
 cheering+0.004
hered+0.004
Neg logit contributions
ires-0.004
ia-0.004
ara-0.003
ours-0.003
ry-0.003
Token his
Token ID465
Feature activation+4.22e+00

Pos logit contributions
ner+0.012
bench+0.012
hered+0.012
 rescuing+0.012
 cheering+0.012
Neg logit contributions
ia-0.014
ara-0.014
ires-0.014
ours-0.013
uddle-0.013
Tokeny
Token ID88
Feature activation+6.79e-01

Pos logit contributions
ires+0.103
ours+0.102
ia+0.093
okes+0.092
outed+0.091
Neg logit contributions
hered-0.066
keeping-0.063
Charl-0.062
Ell-0.062
 Smith-0.059
Token was
Token ID373
Feature activation+8.38e-02

Pos logit contributions
ires+0.021
ours+0.019
ara+0.018
outed+0.018
ushing+0.018
Neg logit contributions
 searching-0.013
 cheering-0.012
 exploring-0.012
 chasing-0.012
 rescuing-0.011
Token okay
Token ID8788
Feature activation+8.02e-01

Pos logit contributions
ia+0.002
outed+0.002
ara+0.002
ires+0.002
ushing+0.002
Neg logit contributions
rob-0.002
keeping-0.002
 cheering-0.002
 rescuing-0.002
bench-0.002
Token and
Token ID290
Feature activation+7.20e-03

Pos logit contributions
ia+0.019
ara+0.019
ires+0.018
ushing+0.018
outed+0.018
Neg logit contributions
rob-0.018
bench-0.018
 cheering-0.018
keeping-0.017
 rescuing-0.017
Token they
Token ID484
Feature activation-1.26e+00

Pos logit contributions
ires+0.000
ia+0.000
ara+0.000
ushing+0.000
ours+0.000
Neg logit contributions
 searching-0.000
 rescuing-0.000
 cheering-0.000
 exploring-0.000
 helping-0.000
Token continued
Token ID3767
Feature activation-2.57e+00

Pos logit contributions
 searching+0.025
 cheering+0.025
 rescuing+0.024
Ell+0.023
hered+0.023
Neg logit contributions
ires-0.037
ara-0.035
ia-0.034
ours-0.034
ushing-0.032
Token to
Token ID284
Feature activation-7.20e-01

Pos logit contributions
ner+0.066
rob+0.065
odic+0.062
bench+0.062
Mic+0.061
Neg logit contributions
ia-0.048
ours-0.046
ry-0.045
ara-0.044
ires-0.044
Token play
Token ID711
Feature activation-1.44e-01

Pos logit contributions
 cheering+0.016
bench+0.016
 rescuing+0.016
keeping+0.016
ner+0.016
Neg logit contributions
ia-0.016
ara-0.016
ires-0.015
ours-0.015
outed-0.015
Token.
Token ID13
Feature activation+3.61e-01

Pos logit contributions
 cheering+0.003
 rescuing+0.003
 searching+0.003
keeping+0.003
bench+0.003
Neg logit contributions
ia-0.004
ara-0.004
ires-0.004
outed-0.004
ushing-0.004
Token
Token ID220
Feature activation+4.25e-01

Pos logit contributions
ires+0.014
ushing+0.014
ia+0.014
outed+0.013
ara+0.013
Neg logit contributions
 searching-0.004
 cheering-0.003
 exploring-0.003
 rescuing-0.003
Ell-0.003
Token
Token ID198
Feature activation+3.65e-01

Pos logit contributions
ara+0.014
ia+0.014
ires+0.014
ushing+0.013
outed+0.013
Neg logit contributions
 cheering-0.006
 rescuing-0.006
 searching-0.005
keeping-0.005
Charl-0.005