TOP ACTIVATIONS
MAX = 0.075

complete . Height : 90 centimeters . Weight : 15 kilograms
, thirty , thirty - one , thirty - two ,
, twenty - nine , thirty , thirty - one ,
- nine , thirty , thirty - one , thirty -
, thirty - one , thirty - two , thirty -
and rocks . She asked more questions and learned more things
knew she had to ask more . " What
, thirty - seven , thirty - eight , thirty -
, thirty - two , thirty - three , thirty -
the birds - 1 , 2 , 3 , 4 ,
filled with colourful or nam ents . Jimmy looked out of
, Lily 's plant grew taller and taller . It was
. He counted 1 , 2 , 3 ... all the
- one , thirty - two , thirty - three ,
, thirty - three , thirty - four , thirty -
Daddy , can we add more cars to my collection ?"
They made v room - v room noises as they moved
We just need to add more pieces to it to increase
, thirty - six , thirty - seven , thirty -
, thirty - seven , thirty - eight , thirty -

MIN ACTIVATIONS
MIN = -0.089

you go , my d arl ings . Enjoy your lunch
's wonderful , my d arl ings !" Mom my says
's okay , my d arl ings . I forgive you
you too , my d arl ings . Now let 's
you go , my d arl ings ," Mom said .
you too , my d arl ings ," she said .
perfect fish , my d arl ings ," mom says .
's good , my d arl ings . I 'm glad
you okay , my d arl ings ?" Mom asked .
have fun , my d arl ings ?" Their mother asked
have fun , my d arl ings ?" she asks .
you too , my d arl ings . But you have
forgive you , my d arl ings . And of course
your fault , my d arl ings . You didn 't
you okay , my d arl ings ?" she asked .
'm sorry , my d arl ings . The tornado took
What happened , my d arl ings ? Why do you
'm sorry , my d arl ings . That was a
'm sorry , my d arl ings . That was a
What happened , my d arl ings ? Why are you

INTERVAL 0.034 - 0.075
CONTAINS 0.348%

in his office and asked him what to do . The
. She said , " Tom , I am here .
what is her name and why is she sad .
. Sam thanks them for being nice . They become friends
, Lily 's mom sold more and more things . Then

INTERVAL -0.007 - 0.034
CONTAINS 73.116%

wet . They walked around and saw a big p uddle
truck was loading purple balloons for the parade . Tim loved
And so , the mighty warrior was thanked and
garden was color less . Then one day ,
It had been hurt by a dangerous weapon . The brave

INTERVAL -0.048 - -0.007
CONTAINS 26.530%

can be my friend . But next time , please bring
There were many pictures of Jimmy drinking from the pond and
was a small , scared animal . The animal was trembling
't want to stop playing , but he knew he had
Tim and his mom helped the lizard move to

INTERVAL -0.089 - -0.048
CONTAINS 0.007%

She put on her swim suit and ran to the shore
bed wearing his favourite py j amas , feeling so proud
still your friend . Acc idents happen , but we can
Sh rimp ? What is shrimp ?" L ila asked .
Nem o and his friend D ory were swimming around when
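
The four INTERVAL headings above appear to split the range from MIN to MAX into equal-width bins. The short sketch below checks that reading against the numbers shown; the equal-width rule and the variable names are assumptions inferred from the headings, not something the dashboard states.

```python
# Values copied from the MIN / MAX lines and the CONTAINS lines above.
# Assumption: the dashboard uses four equal-width bins between MIN and MAX.
act_min, act_max = -0.089, 0.075
n_bins = 4

width = (act_max - act_min) / n_bins
edges = [round(act_min + i * width, 3) for i in range(n_bins + 1)]

# Share of tokens (in percent) per interval, lowest interval first.
shares = [0.007, 26.530, 73.116, 0.348]
total = round(sum(shares), 3)

print(edges)
print(total)
```

Under that assumption the computed edges match the interval boundaries listed above, and the shares sum to roughly 100% (the small excess is consistent with rounding in the displayed percentages).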
Token complete
Token ID1844
Feature activation+2.22e-02

Pos logit contributions
 knights+0.097
 explorers+0.096
 triangles+0.086
 caves+0.084
 rockets+0.081
Neg logit contributions
ada-0.106
aman-0.102
mine-0.090
 AW-0.084
ige-0.084
Token.
Token ID13
Feature activation-2.02e-03

Pos logit contributions
 knights+0.037
 explorers+0.037
 triangles+0.033
 caves+0.032
 rockets+0.031
Neg logit contributions
ada-0.039
aman-0.037
mine-0.033
ige-0.031
 AW-0.031
Token Height
Token ID27280
Feature activation+3.79e-02

Pos logit contributions
aman+0.004
ada+0.004
mine+0.003
 AW+0.003
ige+0.003
Neg logit contributions
 knights-0.003
 explorers-0.003
 triangles-0.003
 caves-0.003
 pirates-0.003
Token:
Token ID25
Feature activation+3.09e-02

Pos logit contributions
 knights+0.066
 explorers+0.064
 triangles+0.062
 caves+0.057
 slices+0.056
Neg logit contributions
ada-0.059
aman-0.057
mine-0.053
use-0.053
 AW-0.052
Token 90
Token ID4101
Feature activation+3.80e-02

Pos logit contributions
 knights+0.046
 explorers+0.045
 triangles+0.040
 caves+0.039
 rockets+0.038
Neg logit contributions
ada-0.059
aman-0.058
mine-0.052
 AW-0.048
ir-0.048
Token centimeters
Token ID48829
Feature activation+7.53e-02

Pos logit contributions
 knights+0.054
 explorers+0.053
 triangles+0.047
 caves+0.045
 rockets+0.043
Neg logit contributions
ada-0.076
aman-0.074
mine-0.067
 AW-0.062
ige-0.062
Token.
Token ID13
Feature activation-3.35e-03

Pos logit contributions
 knights+0.115
 explorers+0.115
 triangles+0.099
 caves+0.099
 rockets+0.095
Neg logit contributions
ada-0.140
aman-0.137
mine-0.123
 AW-0.114
ige-0.113
Token Weight
Token ID14331
Feature activation+2.46e-02

Pos logit contributions
aman+0.006
ada+0.006
mine+0.005
 AW+0.005
ige+0.005
Neg logit contributions
 knights-0.006
 explorers-0.005
 triangles-0.005
 caves-0.005
 pirates-0.005
Token:
Token ID25
Feature activation+3.04e-02

Pos logit contributions
 knights+0.041
 explorers+0.041
 triangles+0.037
 caves+0.036
 rockets+0.035
Neg logit contributions
ada-0.042
aman-0.041
mine-0.036
 AW-0.034
use-0.034
Token 15
Token ID1315
Feature activation+3.55e-02

Pos logit contributions
 knights+0.048
 explorers+0.048
 triangles+0.041
 caves+0.041
 rockets+0.040
Neg logit contributions
ada-0.055
aman-0.053
mine-0.047
 AW-0.044
ige-0.044
Token kilograms
Token ID37075
Feature activation+2.93e-02

Pos logit contributions
 explorers+0.063
 knights+0.062
 caves+0.054
 triangles+0.053
 lair+0.053
Neg logit contributions
ada-0.056
aman-0.055
mine-0.048
 AW-0.045
inkle-0.045
Token,
Token ID11
Feature activation+1.22e-02

Pos logit contributions
 knights+0.105
 explorers+0.102
 triangles+0.096
 slices+0.092
 he+0.091
Neg logit contributions
ada-0.094
aman-0.088
mine-0.075
 punish-0.072
ir-0.071
Token thirty
Token ID12277
Feature activation+6.97e-02

Pos logit contributions
 knights+0.016
 explorers+0.016
 triangles+0.014
 caves+0.013
 rockets+0.013
Neg logit contributions
ada-0.025
aman-0.025
mine-0.022
 AW-0.021
use-0.021
Token,
Token ID11
Feature activation+1.27e-02

Pos logit contributions
 knights+0.123
 explorers+0.123
 triangles+0.108
 caves+0.107
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.089
Token thirty
Token ID12277
Feature activation+6.96e-02

Pos logit contributions
 knights+0.017
 explorers+0.016
 triangles+0.014
 caves+0.014
 rockets+0.013
Neg logit contributions
ada-0.026
aman-0.026
mine-0.023
 AW-0.022
use-0.022
Token-
Token ID12
Feature activation+4.06e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Tokenone
Token ID505
Feature activation+7.33e-02

Pos logit contributions
 explorers+0.049
 knights+0.049
 caves+0.040
 triangles+0.040
 lair+0.038
Neg logit contributions
ada-0.089
aman-0.087
mine-0.079
ige-0.075
 AW-0.074
Token,
Token ID11
Feature activation+1.41e-02

Pos logit contributions
 knights+0.126
 explorers+0.125
 triangles+0.110
 caves+0.109
 rockets+0.106
Neg logit contributions
ada-0.123
aman-0.119
mine-0.105
 AW-0.097
ige-0.097
Token thirty
Token ID12277
Feature activation+6.96e-02

Pos logit contributions
 knights+0.019
 explorers+0.018
 triangles+0.016
 caves+0.015
 rockets+0.015
Neg logit contributions
ada-0.029
aman-0.029
mine-0.026
 AW-0.025
use-0.025
Token-
Token ID12
Feature activation+3.95e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Tokentwo
Token ID11545
Feature activation+6.52e-02

Pos logit contributions
 knights+0.048
 explorers+0.047
 caves+0.039
 triangles+0.039
 rockets+0.037
Neg logit contributions
ada-0.087
aman-0.085
mine-0.077
ige-0.073
 AW-0.073
Token,
Token ID11
Feature activation+1.08e-02

Pos logit contributions
 knights+0.121
 explorers+0.119
 triangles+0.108
 caves+0.105
 slices+0.102
Neg logit contributions
ada-0.102
aman-0.098
mine-0.085
 AW-0.078
use-0.078
Token,
Token ID11
Feature activation+9.32e-03

Pos logit contributions
 explorers+0.102
 knights+0.102
 caves+0.091
 triangles+0.091
 lair+0.089
Neg logit contributions
ada-0.061
aman-0.060
mine-0.050
ige-0.045
ir-0.044
Token twenty
Token ID8208
Feature activation+6.21e-02

Pos logit contributions
 knights+0.013
 explorers+0.013
 triangles+0.011
 caves+0.011
 rockets+0.010
Neg logit contributions
ada-0.019
aman-0.018
mine-0.016
 AW-0.015
ige-0.015
Token-
Token ID12
Feature activation+3.95e-02

Pos logit contributions
 knights+0.101
 explorers+0.100
 triangles+0.087
 caves+0.086
 rockets+0.083
Neg logit contributions
ada-0.110
aman-0.107
mine-0.095
 AW-0.088
ige-0.088
Tokennine
Token ID30888
Feature activation+5.93e-02

Pos logit contributions
 knights+0.047
 explorers+0.047
 caves+0.038
 triangles+0.038
 rockets+0.036
Neg logit contributions
ada-0.087
aman-0.085
mine-0.078
ige-0.073
 AW-0.073
Token,
Token ID11
Feature activation+1.22e-02

Pos logit contributions
 knights+0.105
 explorers+0.102
 triangles+0.096
 slices+0.092
 he+0.091
Neg logit contributions
ada-0.094
aman-0.088
mine-0.075
 punish-0.072
ir-0.071
Token thirty
Token ID12277
Feature activation+6.97e-02

Pos logit contributions
 knights+0.016
 explorers+0.016
 triangles+0.014
 caves+0.013
 rockets+0.013
Neg logit contributions
ada-0.025
aman-0.025
mine-0.022
 AW-0.021
use-0.021
Token,
Token ID11
Feature activation+1.27e-02

Pos logit contributions
 knights+0.123
 explorers+0.123
 triangles+0.108
 caves+0.107
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.089
Token thirty
Token ID12277
Feature activation+6.96e-02

Pos logit contributions
 knights+0.017
 explorers+0.016
 triangles+0.014
 caves+0.014
 rockets+0.013
Neg logit contributions
ada-0.026
aman-0.026
mine-0.023
 AW-0.022
use-0.022
Token-
Token ID12
Feature activation+4.06e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Tokenone
Token ID505
Feature activation+7.33e-02

Pos logit contributions
 explorers+0.049
 knights+0.049
 caves+0.040
 triangles+0.040
 lair+0.038
Neg logit contributions
ada-0.089
aman-0.087
mine-0.079
ige-0.075
 AW-0.074
Token,
Token ID11
Feature activation+1.41e-02

Pos logit contributions
 knights+0.126
 explorers+0.125
 triangles+0.110
 caves+0.109
 rockets+0.106
Neg logit contributions
ada-0.123
aman-0.119
mine-0.105
 AW-0.097
ige-0.097
Token-
Token ID12
Feature activation+3.95e-02

Pos logit contributions
 knights+0.101
 explorers+0.100
 triangles+0.087
 caves+0.086
 rockets+0.083
Neg logit contributions
ada-0.110
aman-0.107
mine-0.095
 AW-0.088
ige-0.088
Tokennine
Token ID30888
Feature activation+5.93e-02

Pos logit contributions
 knights+0.047
 explorers+0.047
 caves+0.038
 triangles+0.038
 rockets+0.036
Neg logit contributions
ada-0.087
aman-0.085
mine-0.078
ige-0.073
 AW-0.073
Token,
Token ID11
Feature activation+1.22e-02

Pos logit contributions
 knights+0.105
 explorers+0.102
 triangles+0.096
 slices+0.092
 he+0.091
Neg logit contributions
ada-0.094
aman-0.088
mine-0.075
 punish-0.072
ir-0.071
Token thirty
Token ID12277
Feature activation+6.97e-02

Pos logit contributions
 knights+0.016
 explorers+0.016
 triangles+0.014
 caves+0.013
 rockets+0.013
Neg logit contributions
ada-0.025
aman-0.025
mine-0.022
 AW-0.021
use-0.021
Token,
Token ID11
Feature activation+1.27e-02

Pos logit contributions
 knights+0.123
 explorers+0.123
 triangles+0.108
 caves+0.107
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.089
Token thirty
Token ID12277
Feature activation+6.96e-02

Pos logit contributions
 knights+0.017
 explorers+0.016
 triangles+0.014
 caves+0.014
 rockets+0.013
Neg logit contributions
ada-0.026
aman-0.026
mine-0.023
 AW-0.022
use-0.022
Token-
Token ID12
Feature activation+4.06e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Tokenone
Token ID505
Feature activation+7.33e-02

Pos logit contributions
 explorers+0.049
 knights+0.049
 caves+0.040
 triangles+0.040
 lair+0.038
Neg logit contributions
ada-0.089
aman-0.087
mine-0.079
ige-0.075
 AW-0.074
Token,
Token ID11
Feature activation+1.41e-02

Pos logit contributions
 knights+0.126
 explorers+0.125
 triangles+0.110
 caves+0.109
 rockets+0.106
Neg logit contributions
ada-0.123
aman-0.119
mine-0.105
 AW-0.097
ige-0.097
Token thirty
Token ID12277
Feature activation+6.96e-02

Pos logit contributions
 knights+0.019
 explorers+0.018
 triangles+0.016
 caves+0.015
 rockets+0.015
Neg logit contributions
ada-0.029
aman-0.029
mine-0.026
 AW-0.025
use-0.025
Token-
Token ID12
Feature activation+3.95e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Token,
Token ID11
Feature activation+1.27e-02

Pos logit contributions
 knights+0.123
 explorers+0.123
 triangles+0.108
 caves+0.107
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.089
Token thirty
Token ID12277
Feature activation+6.96e-02

Pos logit contributions
 knights+0.017
 explorers+0.016
 triangles+0.014
 caves+0.014
 rockets+0.013
Neg logit contributions
ada-0.026
aman-0.026
mine-0.023
 AW-0.022
use-0.022
Token-
Token ID12
Feature activation+4.06e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Tokenone
Token ID505
Feature activation+7.33e-02

Pos logit contributions
 explorers+0.049
 knights+0.049
 caves+0.040
 triangles+0.040
 lair+0.038
Neg logit contributions
ada-0.089
aman-0.087
mine-0.079
ige-0.075
 AW-0.074
Token,
Token ID11
Feature activation+1.41e-02

Pos logit contributions
 knights+0.126
 explorers+0.125
 triangles+0.110
 caves+0.109
 rockets+0.106
Neg logit contributions
ada-0.123
aman-0.119
mine-0.105
 AW-0.097
ige-0.097
Token thirty
Token ID12277
Feature activation+6.96e-02

Pos logit contributions
 knights+0.019
 explorers+0.018
 triangles+0.016
 caves+0.015
 rockets+0.015
Neg logit contributions
ada-0.029
aman-0.029
mine-0.026
 AW-0.025
use-0.025
Token-
Token ID12
Feature activation+3.95e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Tokentwo
Token ID11545
Feature activation+6.52e-02

Pos logit contributions
 knights+0.048
 explorers+0.047
 caves+0.039
 triangles+0.039
 rockets+0.037
Neg logit contributions
ada-0.087
aman-0.085
mine-0.077
ige-0.073
 AW-0.073
Token,
Token ID11
Feature activation+1.08e-02

Pos logit contributions
 knights+0.121
 explorers+0.119
 triangles+0.108
 caves+0.105
 slices+0.102
Neg logit contributions
ada-0.102
aman-0.098
mine-0.085
 AW-0.078
use-0.078
Token thirty
Token ID12277
Feature activation+6.60e-02

Pos logit contributions
 knights+0.014
 explorers+0.014
 triangles+0.012
 caves+0.012
 rockets+0.011
Neg logit contributions
ada-0.022
aman-0.022
mine-0.020
 AW-0.019
ige-0.018
Token-
Token ID12
Feature activation+3.14e-02

Pos logit contributions
 knights+0.119
 explorers+0.119
 triangles+0.104
 caves+0.104
 rockets+0.101
Neg logit contributions
ada-0.105
aman-0.102
mine-0.089
 AW-0.082
ige-0.082
Token and
Token ID290
Feature activation+1.41e-02

Pos logit contributions
ada+0.002
aman+0.002
mine+0.002
ige+0.002
 AW+0.002
Neg logit contributions
 knights-0.003
 explorers-0.003
 triangles-0.003
 caves-0.003
 lair-0.003
Token rocks
Token ID12586
Feature activation-3.85e-03

Pos logit contributions
 knights+0.019
 explorers+0.018
 lair+0.018
kins+0.018
 importantly+0.017
Neg logit contributions
ada-0.023
aman-0.022
 hate-0.021
mine-0.021
use-0.021
Token.
Token ID13
Feature activation+5.40e-03

Pos logit contributions
ada+0.007
aman+0.007
mine+0.006
ir+0.006
ige+0.006
Neg logit contributions
 knights-0.006
 explorers-0.006
 triangles-0.005
 caves-0.005
 lair-0.005
Token She
Token ID1375
Feature activation+6.84e-04

Pos logit contributions
 explorers+0.011
 knights+0.011
 triangles+0.009
 caves+0.009
 lair+0.009
Neg logit contributions
ada-0.008
aman-0.007
mine-0.006
ir-0.006
use-0.006
Token asked
Token ID1965
Feature activation+1.46e-02

Pos logit contributions
 knights+0.001
 explorers+0.001
 triangles+0.001
 caves+0.001
 rockets+0.001
Neg logit contributions
ada-0.001
aman-0.001
mine-0.001
ige-0.001
use-0.001
Token more
Token ID517
Feature activation+6.84e-02

Pos logit contributions
 knights+0.025
 explorers+0.024
 triangles+0.021
 caves+0.021
 rockets+0.020
Neg logit contributions
ada-0.025
aman-0.024
mine-0.022
 AW-0.020
ige-0.020
Token questions
Token ID2683
Feature activation+2.83e-02

Pos logit contributions
 knights+0.100
 explorers+0.098
 lair+0.088
 shields+0.084
 triangles+0.084
Neg logit contributions
ada-0.126
aman-0.121
ir-0.105
mine-0.105
awk-0.102
Token and
Token ID290
Feature activation+1.22e-02

Pos logit contributions
 knights+0.049
 explorers+0.049
 caves+0.043
 triangles+0.043
 lair+0.042
Neg logit contributions
ada-0.046
aman-0.045
mine-0.040
ir-0.036
ige-0.036
Token learned
Token ID4499
Feature activation+2.86e-02

Pos logit contributions
 knights+0.020
 explorers+0.020
 caves+0.018
 triangles+0.018
 lair+0.018
Neg logit contributions
aman-0.018
ada-0.017
 punish-0.016
 hate-0.015
use-0.015
Token more
Token ID517
Feature activation+4.50e-02

Pos logit contributions
 knights+0.038
 explorers+0.038
 triangles+0.032
 caves+0.032
 lair+0.031
Neg logit contributions
ada-0.058
aman-0.057
mine-0.051
ige-0.048
ir-0.048
Token things
Token ID1243
Feature activation-1.18e-02

Pos logit contributions
 explorers+0.069
 knights+0.069
 triangles+0.058
 lair+0.058
 caves+0.058
Neg logit contributions
ada-0.079
aman-0.077
inkle-0.065
ir-0.065
mine-0.065
Token knew
Token ID2993
Feature activation-3.90e-03

Pos logit contributions
 knights+0.017
 explorers+0.017
 caves+0.015
 triangles+0.015
 rockets+0.014
Neg logit contributions
ada-0.021
aman-0.021
mine-0.018
ige-0.017
use-0.017
Token she
Token ID673
Feature activation+6.58e-03

Pos logit contributions
aman+0.007
ada+0.007
mine+0.006
 punish+0.005
aley+0.005
Neg logit contributions
 knights-0.006
 explorers-0.006
 triangles-0.006
 caves-0.005
 lair-0.005
Token had
Token ID550
Feature activation+1.35e-02

Pos logit contributions
 knights+0.009
 explorers+0.009
 triangles+0.008
 caves+0.007
 rockets+0.007
Neg logit contributions
ada-0.013
aman-0.013
mine-0.012
ige-0.011
use-0.011
Token to
Token ID284
Feature activation+6.31e-03

Pos logit contributions
 explorers+0.020
 knights+0.020
 triangles+0.018
 caves+0.017
 rockets+0.017
Neg logit contributions
aman-0.024
ada-0.023
ige-0.021
 punish-0.021
mine-0.021
Token ask
Token ID1265
Feature activation+1.27e-02

Pos logit contributions
 explorers+0.011
 knights+0.011
 there+0.010
 pirates+0.010
 he+0.010
Neg logit contributions
ada-0.010
aman-0.010
 AW-0.008
mine-0.008
use-0.008
Token more
Token ID517
Feature activation+6.77e-02

Pos logit contributions
 knights+0.019
 explorers+0.019
 caves+0.016
 triangles+0.016
 pirates+0.015
Neg logit contributions
ada-0.024
aman-0.024
mine-0.021
 AW-0.021
ige-0.020
Token.
Token ID13
Feature activation+6.17e-03

Pos logit contributions
 explorers+0.108
 knights+0.107
 lair+0.092
 triangles+0.091
 caves+0.091
Neg logit contributions
ada-0.116
aman-0.114
 punish-0.096
 hate-0.094
mine-0.094
Token
Token ID198
Feature activation-1.04e-03

Pos logit contributions
 explorers+0.011
 knights+0.011
 triangles+0.010
 caves+0.010
 rockets+0.009
Neg logit contributions
ada-0.010
aman-0.010
mine-0.009
ige-0.008
use-0.008
Token
Token ID198
Feature activation-5.21e-03

Pos logit contributions
ada+0.002
aman+0.002
mine+0.002
ir+0.002
ige+0.002
Neg logit contributions
 explorers-0.002
 knights-0.002
 triangles-0.001
 caves-0.001
 rockets-0.001
Token"
Token ID1
Feature activation-2.40e-02

Pos logit contributions
ada+0.009
aman+0.009
mine+0.008
 AW+0.007
ige+0.007
Neg logit contributions
 knights-0.009
 explorers-0.009
 triangles-0.008
 caves-0.008
 rockets-0.007
TokenWhat
Token ID2061
Feature activation+1.56e-02

Pos logit contributions
ada+0.046
aman+0.045
mine+0.040
ige+0.037
use+0.037
Neg logit contributions
 knights-0.036
 explorers-0.036
 triangles-0.031
 caves-0.031
 rockets-0.030
Token,
Token ID11
Feature activation+1.16e-03

Pos logit contributions
 knights+0.043
 explorers+0.042
 triangles+0.038
 caves+0.038
 rockets+0.037
Neg logit contributions
ada-0.028
aman-0.027
mine-0.023
 AW-0.020
ige-0.020
Token thirty
Token ID12277
Feature activation+6.44e-02

Pos logit contributions
 explorers+0.002
 knights+0.002
 caves+0.001
 triangles+0.001
 lair+0.001
Neg logit contributions
ada-0.002
aman-0.002
mine-0.002
ige-0.002
 AW-0.002
Token-
Token ID12
Feature activation+2.65e-02

Pos logit contributions
 explorers+0.120
 knights+0.120
 caves+0.106
 triangles+0.105
 rockets+0.102
Neg logit contributions
ada-0.098
aman-0.096
mine-0.083
ige-0.077
ir-0.076
Tokenseven
Token ID26548
Feature activation+5.05e-02

Pos logit contributions
 explorers+0.038
 knights+0.038
 caves+0.033
 triangles+0.032
 lair+0.031
Neg logit contributions
ada-0.052
aman-0.051
mine-0.046
ige-0.043
 AW-0.042
Token,
Token ID11
Feature activation+4.88e-03

Pos logit contributions
 knights+0.096
 explorers+0.094
 triangles+0.085
 caves+0.084
 slices+0.081
Neg logit contributions
ada-0.078
aman-0.075
mine-0.065
use-0.060
 AW-0.060
Token thirty
Token ID12277
Feature activation+6.67e-02

Pos logit contributions
 knights+0.007
 explorers+0.007
 triangles+0.006
 caves+0.006
 rockets+0.006
Neg logit contributions
ada-0.010
aman-0.009
mine-0.008
 AW-0.008
use-0.008
Token-
Token ID12
Feature activation+3.34e-02

Pos logit contributions
 knights+0.120
 explorers+0.119
 triangles+0.105
 caves+0.105
 rockets+0.101
Neg logit contributions
ada-0.107
aman-0.104
mine-0.091
 AW-0.083
ige-0.083
Tokeneight
Token ID26022
Feature activation+4.50e-02

Pos logit contributions
 knights+0.038
 explorers+0.037
 triangles+0.031
 caves+0.030
 rockets+0.028
Neg logit contributions
ada-0.075
aman-0.074
mine-0.067
 AW-0.064
use-0.064
Token,
Token ID11
Feature activation+2.17e-03

Pos logit contributions
 explorers+0.097
 knights+0.097
 caves+0.087
 triangles+0.086
 lair+0.085
Neg logit contributions
ada-0.055
aman-0.054
mine-0.045
ige-0.040
 AW-0.040
Token thirty
Token ID12277
Feature activation+6.38e-02

Pos logit contributions
 knights+0.003
 explorers+0.003
 triangles+0.003
 caves+0.003
 rockets+0.003
Neg logit contributions
ada-0.004
aman-0.004
mine-0.004
 AW-0.003
ige-0.003
Token-
Token ID12
Feature activation+2.83e-02

Pos logit contributions
 explorers+0.118
 knights+0.117
 caves+0.104
 triangles+0.103
 rockets+0.100
Neg logit contributions
ada-0.098
aman-0.096
mine-0.083
ige-0.077
 AW-0.076
Token,
Token ID11
Feature activation+1.41e-02

Pos logit contributions
 knights+0.126
 explorers+0.125
 triangles+0.110
 caves+0.109
 rockets+0.106
Neg logit contributions
ada-0.123
aman-0.119
mine-0.105
 AW-0.097
ige-0.097
Token thirty
Token ID12277
Feature activation+6.96e-02

Pos logit contributions
 knights+0.019
 explorers+0.018
 triangles+0.016
 caves+0.015
 rockets+0.015
Neg logit contributions
ada-0.029
aman-0.029
mine-0.026
 AW-0.025
use-0.025
Token-
Token ID12
Feature activation+3.95e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Tokentwo
Token ID11545
Feature activation+6.52e-02

Pos logit contributions
 knights+0.048
 explorers+0.047
 caves+0.039
 triangles+0.039
 rockets+0.037
Neg logit contributions
ada-0.087
aman-0.085
mine-0.077
ige-0.073
 AW-0.073
Token,
Token ID11
Feature activation+1.08e-02

Pos logit contributions
 knights+0.121
 explorers+0.119
 triangles+0.108
 caves+0.105
 slices+0.102
Neg logit contributions
ada-0.102
aman-0.098
mine-0.085
 AW-0.078
use-0.078
Token thirty
Token ID12277
Feature activation+6.60e-02

Pos logit contributions
 knights+0.014
 explorers+0.014
 triangles+0.012
 caves+0.012
 rockets+0.011
Neg logit contributions
ada-0.022
aman-0.022
mine-0.020
 AW-0.019
ige-0.018
Token-
Token ID12
Feature activation+3.14e-02

Pos logit contributions
 knights+0.119
 explorers+0.119
 triangles+0.104
 caves+0.104
 rockets+0.101
Neg logit contributions
ada-0.105
aman-0.102
mine-0.089
 AW-0.082
ige-0.082
Tokenthree
Token ID15542
Feature activation+5.48e-02

Pos logit contributions
 knights+0.039
 explorers+0.039
 caves+0.032
 triangles+0.032
 rockets+0.030
Neg logit contributions
ada-0.068
aman-0.066
mine-0.060
ige-0.057
 AW-0.057
Token,
Token ID11
Feature activation+6.49e-03

Pos logit contributions
 knights+0.101
 explorers+0.096
 triangles+0.092
 slices+0.089
 caves+0.088
Neg logit contributions
ada-0.088
aman-0.083
mine-0.071
ige-0.065
ishly-0.064
Token thirty
Token ID12277
Feature activation+6.51e-02

Pos logit contributions
 explorers+0.009
 knights+0.009
 caves+0.008
 triangles+0.007
 lair+0.007
Neg logit contributions
ada-0.013
aman-0.013
mine-0.011
ige-0.011
ir-0.011
Token-
Token ID12
Feature activation+2.78e-02

Pos logit contributions
 knights+0.118
 explorers+0.118
 caves+0.104
 triangles+0.104
 rockets+0.100
Neg logit contributions
ada-0.103
aman-0.100
mine-0.087
ige-0.080
 AW-0.080
Token the
Token ID262
Feature activation-1.86e-02

Pos logit contributions
ada+0.035
aman+0.035
mine+0.031
ir+0.030
inkle+0.029
Neg logit contributions
 knights-0.030
 explorers-0.029
 lair-0.025
 triangles-0.025
 shields-0.024
Token birds
Token ID10087
Feature activation+5.58e-03

Pos logit contributions
ada+0.036
aman+0.036
mine+0.032
ige+0.030
use+0.030
Neg logit contributions
 explorers-0.027
 knights-0.027
 caves-0.022
 triangles-0.022
 lair-0.022
Token -
Token ID532
Feature activation+3.19e-02

Pos logit contributions
 explorers+0.009
 knights+0.009
 caves+0.008
 triangles+0.008
 rockets+0.007
Neg logit contributions
ada-0.010
aman-0.010
mine-0.009
ige-0.008
 AW-0.008
Token 1
Token ID352
Feature activation+4.26e-02

Pos logit contributions
 knights+0.045
 explorers+0.044
 triangles+0.038
 caves+0.038
 lair+0.038
Neg logit contributions
ada-0.062
aman-0.061
mine-0.054
 AW-0.051
ige-0.051
Token,
Token ID11
Feature activation+1.13e-02

Pos logit contributions
 knights+0.088
 explorers+0.088
 caves+0.079
 triangles+0.079
 lair+0.077
Neg logit contributions
ada-0.056
aman-0.054
mine-0.046
ige-0.041
 AW-0.041
Token2
Token ID17
Feature activation+6.59e-02

Pos logit contributions
 knights+0.016
 explorers+0.016
 lair+0.014
 caves+0.014
 chambers+0.014
Neg logit contributions
ada-0.021
aman-0.021
mine-0.018
ige-0.017
use-0.017
Token,
Token ID11
Feature activation+1.40e-02

Pos logit contributions
 knights+0.117
 explorers+0.114
 triangles+0.106
 caves+0.101
 he+0.100
Neg logit contributions
ada-0.104
aman-0.098
mine-0.084
 punish-0.081
use-0.080
Token3
Token ID18
Feature activation+5.13e-02

Pos logit contributions
 knights+0.022
 explorers+0.022
 caves+0.019
 triangles+0.019
 lair+0.019
Neg logit contributions
ada-0.025
aman-0.025
mine-0.022
 AW-0.020
ige-0.020
Token,
Token ID11
Feature activation+1.11e-02

Pos logit contributions
 knights+0.087
 explorers+0.084
 triangles+0.081
 slices+0.076
 caves+0.076
Neg logit contributions
ada-0.084
aman-0.080
mine-0.067
 punish-0.067
use-0.065
Token4
Token ID19
Feature activation+5.55e-02

Pos logit contributions
 knights+0.017
 explorers+0.017
 caves+0.015
 triangles+0.014
 lair+0.014
Neg logit contributions
ada-0.020
aman-0.020
mine-0.018
 AW-0.016
ige-0.016
Token,
Token ID11
Feature activation+1.39e-02

Pos logit contributions
 knights+0.099
 explorers+0.097
 triangles+0.090
 caves+0.086
 slices+0.085
Neg logit contributions
ada-0.088
aman-0.084
mine-0.071
use-0.068
 punish-0.067
Token filled
Token ID5901
Feature activation+7.69e-03

Pos logit contributions
aman+0.013
ada+0.013
mine+0.012
ir+0.010
use+0.010
Neg logit contributions
 knights-0.014
 explorers-0.014
 triangles-0.013
 shields-0.013
 importantly-0.013
Token with
Token ID351
Feature activation+6.95e-03

Pos logit contributions
 explorers+0.012
 knights+0.012
 importantly+0.011
 lair+0.011
 shields+0.011
Neg logit contributions
aman-0.013
ada-0.013
mine-0.012
ir-0.011
 ME-0.011
Token colourful
Token ID44799
Feature activation+1.81e-02

Pos logit contributions
 knights+0.010
 explorers+0.010
 lair+0.009
 triangles+0.009
kins+0.008
Neg logit contributions
ada-0.013
aman-0.012
mine-0.011
ige-0.011
ir-0.010
Token or
Token ID393
Feature activation+2.01e-02

Pos logit contributions
 explorers+0.024
 knights+0.023
 lair+0.021
 triangles+0.020
 caves+0.020
Neg logit contributions
ada-0.036
aman-0.034
mine-0.031
ige-0.029
ir-0.029
Tokennam
Token ID7402
Feature activation+5.82e-02

Pos logit contributions
 lair+0.029
 knights+0.029
 explorers+0.028
and+0.027
kins+0.026
Neg logit contributions
ada-0.031
aman-0.028
 hate-0.027
 punish-0.027
ige-0.026
Tokenents
Token ID658
Feature activation+6.56e-02

Pos logit contributions
 knights+0.074
 explorers+0.074
 caves+0.061
 triangles+0.061
 lair+0.058
Neg logit contributions
ada-0.123
aman-0.120
mine-0.109
ige-0.103
ir-0.103
Token.
Token ID13
Feature activation+7.38e-03

Pos logit contributions
 knights+0.115
 explorers+0.112
 importantly+0.102
 caves+0.101
 triangles+0.101
Neg logit contributions
ada-0.104
aman-0.097
mine-0.095
ir-0.084
ose-0.083
Token Jimmy
Token ID12963
Feature activation-1.27e-02

Pos logit contributions
 explorers+0.013
 knights+0.012
kins+0.011
 queens+0.011
 triangles+0.011
Neg logit contributions
ada-0.012
aman-0.012
mine-0.010
ING-0.010
 ME-0.009
Token looked
Token ID3114
Feature activation-2.52e-02

Pos logit contributions
ada+0.024
aman+0.024
mine+0.021
ige+0.020
 AW+0.020
Neg logit contributions
 knights-0.019
 explorers-0.018
 triangles-0.016
 caves-0.016
 rockets-0.015
Token out
Token ID503
Feature activation-1.77e-02

Pos logit contributions
ada+0.049
aman+0.048
mine+0.043
ige+0.041
 AW+0.041
Neg logit contributions
 knights-0.037
 explorers-0.037
 triangles-0.031
 caves-0.031
 rockets-0.030
Token of
Token ID286
Feature activation+6.96e-03

Pos logit contributions
ada+0.033
aman+0.032
mine+0.029
ir+0.028
 AW+0.027
Neg logit contributions
 knights-0.026
 explorers-0.025
 lair-0.023
 caves-0.022
 triangles-0.022
Token,
Token ID11
Feature activation+1.51e-02

Pos logit contributions
ada+0.010
aman+0.010
mine+0.009
ir+0.008
 AW+0.008
Neg logit contributions
 explorers-0.008
 knights-0.008
 triangles-0.007
 caves-0.007
 lair-0.007
Token Lily
Token ID20037
Feature activation+1.01e-02

Pos logit contributions
 knights+0.022
 explorers+0.022
 lair+0.019
 caves+0.019
 triangles+0.019
Neg logit contributions
ada-0.028
aman-0.028
mine-0.025
 punish-0.023
 AW-0.023
Token's
Token ID338
Feature activation+1.75e-02

Pos logit contributions
 knights+0.017
 explorers+0.017
 triangles+0.015
 caves+0.015
 rockets+0.014
Neg logit contributions
ada-0.017
aman-0.017
mine-0.014
 AW-0.013
ige-0.013
Token plant
Token ID4618
Feature activation+5.15e-03

Pos logit contributions
 explorers+0.027
 knights+0.027
 caves+0.024
 triangles+0.023
 queens+0.022
Neg logit contributions
ada-0.031
aman-0.031
mine-0.027
ige-0.026
 punish-0.026
Token grew
Token ID6348
Feature activation+3.49e-02

Pos logit contributions
 explorers+0.008
 knights+0.008
 caves+0.007
 triangles+0.007
 rockets+0.007
Neg logit contributions
ada-0.009
aman-0.009
mine-0.008
 AW-0.008
ige-0.008
Token taller
Token ID25242
Feature activation+6.55e-02

Pos logit contributions
 knights+0.045
 explorers+0.045
 triangles+0.037
 lair+0.036
 caves+0.036
Neg logit contributions
ada-0.073
aman-0.070
mine-0.063
ir-0.062
ige-0.061
Token and
Token ID290
Feature activation+1.93e-02

Pos logit contributions
 knights+0.135
 explorers+0.135
 lair+0.121
 caves+0.121
 rockets+0.119
Neg logit contributions
ada-0.080
aman-0.079
mine-0.066
ir-0.062
ige-0.060
Token taller
Token ID25242
Feature activation+5.84e-02

Pos logit contributions
 knights+0.028
 explorers+0.027
 lair+0.026
 importantly+0.023
and+0.023
Neg logit contributions
aman-0.033
ada-0.033
use-0.032
iable-0.031
mine-0.030
Token.
Token ID13
Feature activation+4.19e-04

Pos logit contributions
 knights+0.111
 explorers+0.111
 lair+0.099
 caves+0.098
 importantly+0.097
Neg logit contributions
aman-0.080
ada-0.080
mine-0.072
ir-0.069
 AW-0.065
Token It
Token ID632
Feature activation+5.23e-04

Pos logit contributions
 explorers+0.001
 knights+0.001
 caves+0.001
 lair+0.001
 rockets+0.001
Neg logit contributions
ada-0.001
aman-0.001
mine-0.000
 AW-0.000
ir-0.000
Token was
Token ID373
Feature activation+6.27e-03

Pos logit contributions
 knights+0.001
 explorers+0.001
 caves+0.001
 importantly+0.001
 lair+0.001
Neg logit contributions
ada-0.001
aman-0.001
mine-0.001
 AW-0.001
inkle-0.001
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
 explorers+0.018
 knights+0.018
 caves+0.016
 shields+0.016
 importantly+0.015
Neg logit contributions
ada-0.014
aman-0.014
mine-0.012
ose-0.011
our-0.011
Token He
Token ID679
Feature activation+1.37e-02

Pos logit contributions
 explorers+0.021
 knights+0.021
 caves+0.019
 lair+0.019
 shields+0.018
Neg logit contributions
ada-0.022
aman-0.021
mine-0.019
ING-0.017
 AW-0.017
Token counted
Token ID14789
Feature activation+4.89e-02

Pos logit contributions
 explorers+0.020
 knights+0.020
 caves+0.018
 importantly+0.017
 lair+0.017
Neg logit contributions
ada-0.023
aman-0.023
mine-0.022
ige-0.020
zy-0.020
Token 1
Token ID352
Feature activation+4.53e-02

Pos logit contributions
 explorers+0.084
 knights+0.083
 importantly+0.075
 lair+0.074
 shields+0.073
Neg logit contributions
aman-0.076
ada-0.075
mine-0.067
ir-0.060
ige-0.058
Token,
Token ID11
Feature activation+4.11e-04

Pos logit contributions
 explorers+0.096
 knights+0.094
 caves+0.088
 lair+0.087
 shields+0.087
Neg logit contributions
aman-0.051
ada-0.051
mine-0.044
ige-0.040
ose-0.038
Token2
Token ID17
Feature activation+6.52e-02

Pos logit contributions
 explorers+0.001
 knights+0.001
 caves+0.001
 lair+0.001
 rockets+0.001
Neg logit contributions
ada-0.001
aman-0.001
mine-0.001
ige-0.001
use-0.001
Token,
Token ID11
Feature activation+4.16e-03

Pos logit contributions
 explorers+0.135
 knights+0.134
 caves+0.121
 triangles+0.118
 lair+0.118
Neg logit contributions
ada-0.086
aman-0.083
mine-0.072
ige-0.064
 AW-0.064
Token3
Token ID18
Feature activation+5.14e-02

Pos logit contributions
 explorers+0.007
 knights+0.007
 lair+0.006
 caves+0.006
 shields+0.006
Neg logit contributions
ada-0.007
aman-0.007
mine-0.006
ige-0.006
use-0.006
Token...
Token ID986
Feature activation+2.33e-02

Pos logit contributions
 knights+0.101
 explorers+0.099
 triangles+0.091
 caves+0.088
 slices+0.086
Neg logit contributions
ada-0.074
aman-0.072
mine-0.061
use-0.056
 AW-0.056
Token all
Token ID477
Feature activation+1.14e-02

Pos logit contributions
 explorers+0.037
 knights+0.036
 caves+0.032
 lair+0.031
 triangles+0.030
Neg logit contributions
ada-0.042
aman-0.041
mine-0.038
ige-0.034
 AW-0.034
Token the
Token ID262
Feature activation-1.74e-04

Pos logit contributions
 explorers+0.018
 knights+0.018
 lair+0.016
 caves+0.016
 shields+0.016
Neg logit contributions
ada-0.020
aman-0.019
mine-0.019
upid-0.017
ir-0.016
Token-
Token ID12
Feature activation+4.06e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Tokenone
Token ID505
Feature activation+7.33e-02

Pos logit contributions
 explorers+0.049
 knights+0.049
 caves+0.040
 triangles+0.040
 lair+0.038
Neg logit contributions
ada-0.089
aman-0.087
mine-0.079
ige-0.075
 AW-0.074
Token,
Token ID11
Feature activation+1.41e-02

Pos logit contributions
 knights+0.126
 explorers+0.125
 triangles+0.110
 caves+0.109
 rockets+0.106
Neg logit contributions
ada-0.123
aman-0.119
mine-0.105
 AW-0.097
ige-0.097
Token thirty
Token ID12277
Feature activation+6.96e-02

Pos logit contributions
 knights+0.019
 explorers+0.018
 triangles+0.016
 caves+0.015
 rockets+0.015
Neg logit contributions
ada-0.029
aman-0.029
mine-0.026
 AW-0.025
use-0.025
Token-
Token ID12
Feature activation+3.95e-02

Pos logit contributions
 knights+0.124
 explorers+0.123
 triangles+0.108
 caves+0.108
 rockets+0.104
Neg logit contributions
ada-0.113
aman-0.110
mine-0.096
 AW-0.089
ige-0.088
Tokentwo
Token ID11545
Feature activation+6.52e-02

Pos logit contributions
 knights+0.048
 explorers+0.047
 caves+0.039
 triangles+0.039
 rockets+0.037
Neg logit contributions
ada-0.087
aman-0.085
mine-0.077
ige-0.073
 AW-0.073
Token,
Token ID11
Feature activation+1.08e-02

Pos logit contributions
 knights+0.121
 explorers+0.119
 triangles+0.108
 caves+0.105
 slices+0.102
Neg logit contributions
ada-0.102
aman-0.098
mine-0.085
 AW-0.078
use-0.078
Token thirty
Token ID12277
Feature activation+6.60e-02

Pos logit contributions
 knights+0.014
 explorers+0.014
 triangles+0.012
 caves+0.012
 rockets+0.011
Neg logit contributions
ada-0.022
aman-0.022
mine-0.020
 AW-0.019
ige-0.018
Token-
Token ID12
Feature activation+3.14e-02

Pos logit contributions
 knights+0.119
 explorers+0.119
 triangles+0.104
 caves+0.104
 rockets+0.101
Neg logit contributions
ada-0.105
aman-0.102
mine-0.089
 AW-0.082
ige-0.082
Tokenthree
Token ID15542
Feature activation+5.48e-02

Pos logit contributions
 knights+0.039
 explorers+0.039
 caves+0.032
 triangles+0.032
 rockets+0.030
Neg logit contributions
ada-0.068
aman-0.066
mine-0.060
ige-0.057
 AW-0.057
Token,
Token ID11
Feature activation+6.49e-03

Pos logit contributions
 knights+0.101
 explorers+0.096
 triangles+0.092
 slices+0.089
 caves+0.088
Neg logit contributions
ada-0.088
aman-0.083
mine-0.071
ige-0.065
ishly-0.064
Token,
Token ID11
Feature activation+1.08e-02

Pos logit contributions
 knights+0.121
 explorers+0.119
 triangles+0.108
 caves+0.105
 slices+0.102
Neg logit contributions
ada-0.102
aman-0.098
mine-0.085
 AW-0.078
use-0.078
Token thirty
Token ID12277
Feature activation+6.60e-02

Pos logit contributions
 knights+0.014
 explorers+0.014
 triangles+0.012
 caves+0.012
 rockets+0.011
Neg logit contributions
ada-0.022
aman-0.022
mine-0.020
 AW-0.019
ige-0.018
Token-
Token ID12
Feature activation+3.14e-02

Pos logit contributions
 knights+0.119
 explorers+0.119
 triangles+0.104
 caves+0.104
 rockets+0.101
Neg logit contributions
ada-0.105
aman-0.102
mine-0.089
 AW-0.082
ige-0.082
Tokenthree
Token ID15542
Feature activation+5.48e-02

Pos logit contributions
 knights+0.039
 explorers+0.039
 caves+0.032
 triangles+0.032
 rockets+0.030
Neg logit contributions
ada-0.068
aman-0.066
mine-0.060
ige-0.057
 AW-0.057
Token,
Token ID11
Feature activation+6.49e-03

Pos logit contributions
 knights+0.101
 explorers+0.096
 triangles+0.092
 slices+0.089
 caves+0.088
Neg logit contributions
ada-0.088
aman-0.083
mine-0.071
ige-0.065
ishly-0.064
Token thirty
Token ID12277
Feature activation+6.51e-02

Pos logit contributions
 explorers+0.009
 knights+0.009
 caves+0.008
 triangles+0.007
 lair+0.007
Neg logit contributions
ada-0.013
aman-0.013
mine-0.011
ige-0.011
ir-0.011
Token-
Token ID12
Feature activation+2.78e-02

Pos logit contributions
 knights+0.118
 explorers+0.118
 caves+0.104
 triangles+0.104
 rockets+0.100
Neg logit contributions
ada-0.103
aman-0.100
mine-0.087
ige-0.080
 AW-0.080
Tokenfour
Token ID14337
Feature activation+4.18e-02

Pos logit contributions
 explorers+0.036
 knights+0.036
 caves+0.030
 triangles+0.029
 lair+0.028
Neg logit contributions
ada-0.059
aman-0.057
mine-0.052
ige-0.049
 AW-0.049
Token,
Token ID11
Feature activation-5.00e-03

Pos logit contributions
 knights+0.078
 explorers+0.078
 caves+0.069
 triangles+0.069
 rockets+0.067
Neg logit contributions
ada-0.063
aman-0.061
mine-0.053
 AW-0.049
ige-0.049
Token thirty
Token ID12277
Feature activation+5.76e-02

Pos logit contributions
ada+0.009
aman+0.009
mine+0.008
 AW+0.008
ige+0.008
Neg logit contributions
 explorers-0.008
 knights-0.008
 caves-0.006
 triangles-0.006
 lair-0.006
Token-
Token ID12
Feature activation+2.28e-02

Pos logit contributions
 knights+0.104
 explorers+0.104
 triangles+0.092
 caves+0.091
 rockets+0.089
Neg logit contributions
ada-0.091
aman-0.089
mine-0.077
 AW-0.071
ige-0.071
TokenDaddy
Token ID48280
Feature activation+2.60e-02

Pos logit contributions
 knights+0.015
 explorers+0.014
 triangles+0.012
 caves+0.012
 pirates+0.012
Neg logit contributions
ada-0.023
aman-0.022
mine-0.020
 AW-0.019
inkle-0.019
Token,
Token ID11
Feature activation+2.86e-03

Pos logit contributions
 knights+0.051
 explorers+0.051
 triangles+0.045
 caves+0.044
 pirates+0.044
Neg logit contributions
ada-0.037
aman-0.036
mine-0.031
 AW-0.029
ir-0.029
Token can
Token ID460
Feature activation+9.54e-04

Pos logit contributions
 knights+0.005
 explorers+0.005
 triangles+0.004
 caves+0.004
 lair+0.004
Neg logit contributions
ada-0.005
aman-0.005
mine-0.004
ige-0.004
use-0.004
Token we
Token ID356
Feature activation+1.43e-02

Pos logit contributions
 knights+0.002
 explorers+0.002
 triangles+0.001
 rockets+0.001
 caves+0.001
Neg logit contributions
ada-0.002
aman-0.002
mine-0.001
ige-0.001
ir-0.001
Token add
Token ID751
Feature activation+2.03e-02

Pos logit contributions
 explorers+0.023
 there+0.021
 knights+0.020
 pirates+0.019
 astronauts+0.018
Neg logit contributions
aman-0.026
ada-0.026
 AW-0.025
ir-0.023
inkle-0.022
Token more
Token ID517
Feature activation+6.47e-02

Pos logit contributions
 knights+0.023
 explorers+0.023
 triangles+0.018
 caves+0.018
 rockets+0.017
Neg logit contributions
ada-0.046
aman-0.045
mine-0.041
 AW-0.039
ige-0.039
Token cars
Token ID5006
Feature activation+2.96e-02

Pos logit contributions
 knights+0.089
 explorers+0.089
 caves+0.074
 triangles+0.074
 lair+0.073
Neg logit contributions
ada-0.129
aman-0.126
mine-0.112
 AW-0.106
ige-0.106
Token to
Token ID284
Feature activation+8.13e-04

Pos logit contributions
 knights+0.044
 explorers+0.044
 triangles+0.037
 caves+0.036
 rockets+0.036
Neg logit contributions
ada-0.058
aman-0.056
mine-0.049
 AW-0.048
ir-0.047
Token my
Token ID616
Feature activation-6.23e-03

Pos logit contributions
 knights+0.001
 explorers+0.001
 triangles+0.001
 caves+0.001
 rockets+0.001
Neg logit contributions
ada-0.002
aman-0.001
mine-0.001
 AW-0.001
ige-0.001
Token collection
Token ID4947
Feature activation+4.24e-03

Pos logit contributions
ada+0.011
aman+0.010
mine+0.009
 punish+0.009
use+0.009
Neg logit contributions
 explorers-0.010
 knights-0.010
 triangles-0.009
 caves-0.009
 importantly-0.008
Token?"
Token ID1701
Feature activation-5.57e-03

Pos logit contributions
 knights+0.009
 explorers+0.009
 triangles+0.008
 caves+0.008
 rockets+0.007
Neg logit contributions
ada-0.006
aman-0.006
mine-0.005
 AW-0.005
ir-0.004
Token They
Token ID1119
Feature activation-2.19e-02

Pos logit contributions
 knights+0.001
 explorers+0.001
 triangles+0.001
 caves+0.001
 lair+0.001
Neg logit contributions
ada-0.001
aman-0.001
mine-0.001
 AW-0.001
ige-0.001
Token made
Token ID925
Feature activation-2.88e-03

Pos logit contributions
aman+0.041
ada+0.040
mine+0.033
ir+0.032
 AW+0.032
Neg logit contributions
 explorers-0.035
 knights-0.035
 caves-0.029
 triangles-0.029
 pirates-0.029
Token v
Token ID410
Feature activation+3.97e-02

Pos logit contributions
ada+0.005
aman+0.005
mine+0.004
 punish+0.004
ige+0.004
Neg logit contributions
 explorers-0.004
 knights-0.004
 caves-0.004
 importantly-0.004
 lair-0.004
Tokenroom
Token ID3823
Feature activation+1.67e-02

Pos logit contributions
 knights+0.063
 explorers+0.058
 triangles+0.057
and+0.056
 slices+0.053
Neg logit contributions
aman-0.073
ada-0.067
use-0.064
aley-0.057
mine-0.057
Token-
Token ID12
Feature activation+2.29e-02

Pos logit contributions
 knights+0.022
 explorers+0.021
 triangles+0.019
 pieces+0.018
 rockets+0.018
Neg logit contributions
aman-0.035
ada-0.035
use-0.030
mine-0.030
aley-0.029
Tokenv
Token ID85
Feature activation+6.44e-02

Pos logit contributions
 knights+0.029
 explorers+0.029
 triangles+0.025
 caves+0.024
 rockets+0.024
Neg logit contributions
ada-0.048
aman-0.048
mine-0.042
use-0.040
inkle-0.040
Tokenroom
Token ID3823
Feature activation+2.03e-02

Pos logit contributions
 knights+0.128
 explorers+0.128
 triangles+0.114
 caves+0.114
 rockets+0.111
Neg logit contributions
ada-0.090
aman-0.088
mine-0.075
 AW-0.068
ige-0.068
Token noises
Token ID26782
Feature activation+4.01e-03

Pos logit contributions
 knights+0.030
 explorers+0.030
 triangles+0.026
 caves+0.025
 rockets+0.025
Neg logit contributions
ada-0.039
aman-0.039
mine-0.034
use-0.033
ir-0.032
Token as
Token ID355
Feature activation+5.29e-03

Pos logit contributions
 knights+0.007
 explorers+0.007
 triangles+0.006
 caves+0.006
 rockets+0.006
Neg logit contributions
ada-0.007
aman-0.007
mine-0.006
ige-0.005
 AW-0.005
Token they
Token ID484
Feature activation-2.53e-03

Pos logit contributions
 knights+0.007
 lair+0.007
 explorers+0.007
 caves+0.007
 chambers+0.007
Neg logit contributions
ada-0.010
aman-0.008
mine-0.008
 hate-0.008
 AW-0.008
Token moved
Token ID3888
Feature activation+1.81e-02

Pos logit contributions
aman+0.005
ada+0.005
mine+0.004
 AW+0.004
ige+0.004
Neg logit contributions
 knights-0.003
 explorers-0.003
 triangles-0.003
 caves-0.003
 rockets-0.003
Token We
Token ID775
Feature activation-1.06e-02

Pos logit contributions
ada+0.013
aman+0.012
mine+0.011
 AW+0.010
use+0.010
Neg logit contributions
 knights-0.010
 explorers-0.010
 caves-0.009
 triangles-0.009
 lair-0.009
Token just
Token ID655
Feature activation-8.26e-03

Pos logit contributions
aman+0.022
ada+0.020
oph+0.018
ING+0.018
 ME+0.018
Neg logit contributions
 knights-0.015
 explorers-0.015
 there-0.013
 pirates-0.012
 brothers-0.012
Token need
Token ID761
Feature activation-1.07e-02

Pos logit contributions
ada+0.012
aman+0.012
mine+0.010
 AW+0.009
ige+0.009
Neg logit contributions
 knights-0.016
 explorers-0.016
 triangles-0.014
 caves-0.014
 rockets-0.014
Token to
Token ID284
Feature activation+4.64e-03

Pos logit contributions
ada+0.023
aman+0.022
mine+0.020
ige+0.019
 AW+0.019
Neg logit contributions
 knights-0.013
 explorers-0.013
 triangles-0.011
 caves-0.011
 lair-0.010
Token add
Token ID751
Feature activation+1.59e-02

Pos logit contributions
 knights+0.007
 explorers+0.007
 smaller+0.006
 there+0.006
 pirates+0.006
Neg logit contributions
ada-0.009
aman-0.008
 AW-0.008
oph-0.007
awk-0.007
Token more
Token ID517
Feature activation+6.44e-02

Pos logit contributions
 knights+0.019
 explorers+0.019
 triangles+0.016
 caves+0.016
 rockets+0.015
Neg logit contributions
ada-0.035
aman-0.034
mine-0.031
 AW-0.029
ige-0.029
Token pieces
Token ID5207
Feature activation+2.18e-02

Pos logit contributions
 explorers+0.096
 knights+0.095
 lair+0.081
 caves+0.080
 triangles+0.078
Neg logit contributions
ada-0.120
aman-0.117
mine-0.104
use-0.100
 AW-0.099
Token to
Token ID284
Feature activation-2.54e-03

Pos logit contributions
 knights+0.031
 explorers+0.031
 caves+0.026
 lair+0.026
 triangles+0.026
Neg logit contributions
ada-0.042
aman-0.040
mine-0.037
ir-0.035
use-0.035
Token it
Token ID340
Feature activation+8.18e-03

Pos logit contributions
ada+0.005
aman+0.005
mine+0.004
 AW+0.004
awk+0.004
Neg logit contributions
 knights-0.004
 explorers-0.004
 triangles-0.003
 caves-0.003
 rockets-0.003
Token to
Token ID284
Feature activation+2.71e-03

Pos logit contributions
 knights+0.014
 explorers+0.014
 caves+0.012
 triangles+0.012
 lair+0.012
Neg logit contributions
ada-0.014
aman-0.013
mine-0.012
 AW-0.011
use-0.011
Token increase
Token ID2620
Feature activation+2.77e-02

Pos logit contributions
 knights+0.004
 explorers+0.004
 triangles+0.003
 caves+0.003
 pirates+0.003
Neg logit contributions
ada-0.005
aman-0.005
 AW-0.005
mine-0.005
awk-0.005
Token,
Token ID11
Feature activation-1.91e-03

Pos logit contributions
 explorers+0.057
 knights+0.057
 caves+0.051
 triangles+0.050
 rockets+0.049
Neg logit contributions
ada-0.039
aman-0.038
mine-0.032
ir-0.029
ige-0.029
Token thirty
Token ID12277
Feature activation+6.27e-02

Pos logit contributions
ada+0.004
aman+0.003
mine+0.003
ige+0.003
 AW+0.003
Neg logit contributions
 explorers-0.003
 knights-0.003
 caves-0.003
 triangles-0.002
 lair-0.002
Token-
Token ID12
Feature activation+2.47e-02

Pos logit contributions
 explorers+0.116
 knights+0.116
 caves+0.102
 triangles+0.102
 rockets+0.099
Neg logit contributions
ada-0.097
aman-0.095
mine-0.082
ige-0.076
 AW-0.075
Tokensix
Token ID19412
Feature activation+2.07e-02

Pos logit contributions
 knights+0.036
 explorers+0.036
 triangles+0.031
 caves+0.030
 rockets+0.029
Neg logit contributions
ada-0.048
aman-0.047
mine-0.042
 AW-0.039
ige-0.039
Token,
Token ID11
Feature activation+1.16e-03

Pos logit contributions
 knights+0.043
 explorers+0.042
 triangles+0.038
 caves+0.038
 rockets+0.037
Neg logit contributions
ada-0.028
aman-0.027
mine-0.023
 AW-0.020
ige-0.020
Token thirty
Token ID12277
Feature activation+6.44e-02

Pos logit contributions
 explorers+0.002
 knights+0.002
 caves+0.001
 triangles+0.001
 lair+0.001
Neg logit contributions
ada-0.002
aman-0.002
mine-0.002
ige-0.002
 AW-0.002
Token-
Token ID12
Feature activation+2.65e-02

Pos logit contributions
 explorers+0.120
 knights+0.120
 caves+0.106
 triangles+0.105
 rockets+0.102
Neg logit contributions
ada-0.098
aman-0.096
mine-0.083
ige-0.077
ir-0.076
Tokenseven
Token ID26548
Feature activation+5.05e-02

Pos logit contributions
 explorers+0.038
 knights+0.038
 caves+0.033
 triangles+0.032
 lair+0.031
Neg logit contributions
ada-0.052
aman-0.051
mine-0.046
ige-0.043
 AW-0.042
Token,
Token ID11
Feature activation+4.88e-03

Pos logit contributions
 knights+0.096
 explorers+0.094
 triangles+0.085
 caves+0.084
 slices+0.081
Neg logit contributions
ada-0.078
aman-0.075
mine-0.065
use-0.060
 AW-0.060
Token thirty
Token ID12277
Feature activation+6.67e-02

Pos logit contributions
 knights+0.007
 explorers+0.007
 triangles+0.006
 caves+0.006
 rockets+0.006
Neg logit contributions
ada-0.010
aman-0.009
mine-0.008
 AW-0.008
use-0.008
Token-
Token ID12
Feature activation+3.34e-02

Pos logit contributions
 knights+0.120
 explorers+0.119
 triangles+0.105
 caves+0.105
 rockets+0.101
Neg logit contributions
ada-0.107
aman-0.104
mine-0.091
 AW-0.083
ige-0.083
Token,
Token ID11
Feature activation-3.08e-03

Pos logit contributions
 explorers+0.034
 knights+0.034
 caves+0.031
 triangles+0.031
 lair+0.030
Neg logit contributions
ada-0.019
aman-0.019
mine-0.016
ige-0.014
 AW-0.014
Token thirty
Token ID12277
Feature activation+5.78e-02

Pos logit contributions
ada+0.005
aman+0.005
mine+0.005
ir+0.004
ige+0.004
Neg logit contributions
 explorers-0.005
 knights-0.005
 caves-0.005
 lair-0.004
 shields-0.004
Token-
Token ID12
Feature activation+2.86e-02

Pos logit contributions
 explorers+0.108
 knights+0.108
 caves+0.095
 triangles+0.095
 lair+0.092
Neg logit contributions
ada-0.088
aman-0.085
mine-0.074
ige-0.068
 AW-0.068
Tokenseven
Token ID26548
Feature activation+5.09e-02

Pos logit contributions
 knights+0.047
 explorers+0.047
 triangles+0.041
 caves+0.040
 rockets+0.039
Neg logit contributions
ada-0.050
aman-0.049
mine-0.043
 AW-0.040
ir-0.040
Token,
Token ID11
Feature activation+2.16e-03

Pos logit contributions
 knights+0.107
 explorers+0.107
 triangles+0.096
 caves+0.096
 rockets+0.093
Neg logit contributions
ada-0.066
aman-0.064
mine-0.054
 AW-0.048
ige-0.048
Token thirty
Token ID12277
Feature activation+6.42e-02

Pos logit contributions
 explorers+0.003
 knights+0.003
 caves+0.003
 triangles+0.003
 lair+0.003
Neg logit contributions
ada-0.004
aman-0.004
mine-0.003
ige-0.003
ir-0.003
Token-
Token ID12
Feature activation+3.35e-02

Pos logit contributions
 explorers+0.121
 knights+0.120
 caves+0.107
 triangles+0.105
 rockets+0.103
Neg logit contributions
ada-0.096
aman-0.094
mine-0.081
ige-0.076
ir-0.075
Tokeneight
Token ID26022
Feature activation+4.01e-02

Pos logit contributions
 knights+0.040
 explorers+0.039
 triangles+0.033
 caves+0.031
 rockets+0.030
Neg logit contributions
ada-0.074
aman-0.073
mine-0.065
 AW-0.062
inkle-0.062
Token,
Token ID11
Feature activation-1.23e-03

Pos logit contributions
 explorers+0.091
 knights+0.090
 caves+0.082
 lair+0.081
 triangles+0.080
Neg logit contributions
ada-0.043
aman-0.043
mine-0.035
inkle-0.031
ir-0.030
Token thirty
Token ID12277
Feature activation+5.93e-02

Pos logit contributions
ada+0.002
aman+0.002
mine+0.002
ige+0.002
ir+0.002
Neg logit contributions
 explorers-0.002
 knights-0.002
 caves-0.002
 lair-0.002
 shields-0.002
Token-
Token ID12
Feature activation+2.90e-02

Pos logit contributions
 explorers+0.114
 knights+0.113
 caves+0.102
 triangles+0.099
 lair+0.098
Neg logit contributions
ada-0.085
aman-0.084
mine-0.072
ige-0.068
ir-0.067
Token you
Token ID345
Feature activation-1.93e-02

Pos logit contributions
ada+0.005
aman+0.005
mine+0.005
't+0.005
 punish+0.005
Neg logit contributions
 knights-0.006
 lair-0.006
 triangles-0.006
 explorers-0.006
and-0.006
Token go
Token ID467
Feature activation-2.74e-03

Pos logit contributions
aman+0.029
ada+0.029
mine+0.024
 AW+0.022
ir+0.022
Neg logit contributions
 explorers-0.037
 knights-0.037
 triangles-0.032
 pirates-0.032
 there-0.032
Token,
Token ID11
Feature activation-1.92e-02

Pos logit contributions
ada+0.006
aman+0.006
mine+0.005
ige+0.005
 AW+0.005
Neg logit contributions
 explorers-0.004
 knights-0.004
 caves-0.003
 pirates-0.003
 triangles-0.003
Token my
Token ID616
Feature activation-2.30e-02

Pos logit contributions
ada+0.031
aman+0.030
mine+0.026
 AW+0.024
ige+0.024
Neg logit contributions
 knights-0.034
 explorers-0.034
 triangles-0.030
 caves-0.030
 rockets-0.029
Token d
Token ID288
Feature activation-9.02e-03

Pos logit contributions
ada+0.042
aman+0.041
mine+0.037
 AW+0.034
ige+0.034
Neg logit contributions
 knights-0.036
 explorers-0.035
 triangles-0.030
 caves-0.030
 rockets-0.029
Tokenarl
Token ID7063
Feature activation-8.90e-02

Pos logit contributions
aman+0.012
ada+0.009
 AW+0.009
use+0.009
 punish+0.009
Neg logit contributions
 knights-0.019
 explorers-0.018
 triangles-0.017
and-0.017
 pirates-0.016
Tokenings
Token ID654
Feature activation-2.83e-02

Pos logit contributions
ada+0.142
aman+0.133
mine+0.120
 punish+0.114
aley+0.110
Neg logit contributions
 knights-0.151
 explorers-0.151
kins-0.136
 pieces-0.136
 slices-0.133
Token.
Token ID13
Feature activation-1.25e-02

Pos logit contributions
ada+0.048
aman+0.047
ir+0.041
mine+0.040
ige+0.040
Neg logit contributions
 knights-0.047
 explorers-0.046
 there-0.042
 he-0.041
 pirates-0.039
Token Enjoy
Token ID18179
Feature activation-2.24e-02

Pos logit contributions
ada+0.022
aman+0.022
mine+0.019
ige+0.018
ir+0.018
Neg logit contributions
 knights-0.021
 explorers-0.021
 triangles-0.018
 caves-0.018
 pirates-0.017
Token your
Token ID534
Feature activation-3.20e-02

Pos logit contributions
ada+0.040
aman+0.037
mine+0.033
 AW+0.032
ishly+0.031
Neg logit contributions
 knights-0.038
 explorers-0.036
 triangles-0.033
 caves-0.033
 slices-0.033
Token lunch
Token ID9965
Feature activation-2.05e-02

Pos logit contributions
ada+0.062
aman+0.061
mine+0.055
use+0.052
 AW+0.052
Neg logit contributions
 knights-0.045
 explorers-0.045
 triangles-0.038
 caves-0.038
 lair-0.036
Token's
Token ID338
Feature activation-1.11e-02

Pos logit contributions
ada+0.013
aman+0.013
mine+0.011
 AW+0.010
ige+0.010
Neg logit contributions
 explorers-0.014
 knights-0.014
 caves-0.012
 triangles-0.012
 lair-0.012
Token wonderful
Token ID7932
Feature activation+1.45e-03

Pos logit contributions
ada+0.022
aman+0.021
mine+0.019
 AW+0.018
ir+0.018
Neg logit contributions
 knights-0.016
 explorers-0.016
 triangles-0.013
 caves-0.013
 rockets-0.013
Token,
Token ID11
Feature activation-1.25e-02

Pos logit contributions
 knights+0.003
 explorers+0.003
 caves+0.003
 triangles+0.003
 lair+0.003
Neg logit contributions
ada-0.002
aman-0.002
mine-0.002
ige-0.001
 AW-0.001
Token my
Token ID616
Feature activation-2.26e-02

Pos logit contributions
ada+0.024
aman+0.023
mine+0.020
ige+0.019
 AW+0.019
Neg logit contributions
 knights-0.018
 explorers-0.018
 caves-0.016
 triangles-0.016
 lair-0.015
Token d
Token ID288
Feature activation-9.21e-03

Pos logit contributions
ada+0.040
aman+0.039
mine+0.035
 AW+0.033
ir+0.032
Neg logit contributions
 knights-0.037
 explorers-0.037
 triangles-0.032
 caves-0.031
 pirates-0.031
Tokenarl
Token ID7063
Feature activation-8.55e-02

Pos logit contributions
aman+0.014
 punish+0.011
use+0.011
 AW+0.011
mine+0.011
Neg logit contributions
 knights-0.018
 triangles-0.016
 explorers-0.016
 queens-0.015
and-0.015
Tokenings
Token ID654
Feature activation-3.00e-02

Pos logit contributions
ada+0.131
aman+0.130
mine+0.112
 punish+0.108
aley+0.106
Neg logit contributions
 knights-0.143
 explorers-0.140
kins-0.137
 pieces-0.135
 slices-0.131
Token!"
Token ID2474
Feature activation-8.39e-03

Pos logit contributions
aman+0.054
ada+0.052
ir+0.045
mine+0.045
inkle+0.044
Neg logit contributions
 explorers-0.045
 knights-0.045
 there-0.042
 he-0.040
 pieces-0.039
Token Mom
Token ID11254
Feature activation+3.91e-03

Pos logit contributions
ada+0.016
aman+0.016
mine+0.014
 AW+0.013
ige+0.013
Neg logit contributions
 explorers-0.012
 knights-0.012
 caves-0.011
 triangles-0.010
 lair-0.010
Tokenmy
Token ID1820
Feature activation+1.37e-02

Pos logit contributions
 knights+0.007
 explorers+0.007
 triangles+0.006
 caves+0.006
 rockets+0.006
Neg logit contributions
ada-0.006
aman-0.006
mine-0.005
 AW-0.005
ige-0.005
Token says
Token ID1139
Feature activation+2.65e-03

Pos logit contributions
 knights+0.023
 explorers+0.023
 triangles+0.020
 caves+0.019
 rockets+0.019
Neg logit contributions
ada-0.024
aman-0.023
mine-0.020
 AW-0.019
ige-0.019
Token's
Token ID338
Feature activation+1.21e-02

Pos logit contributions
 explorers+0.013
 knights+0.013
 caves+0.012
 triangles+0.012
 lair+0.011
Neg logit contributions
ada-0.010
aman-0.009
mine-0.008
 AW-0.007
 punish-0.007
Token okay
Token ID8788
Feature activation+2.13e-02

Pos logit contributions
 explorers+0.022
 knights+0.022
 caves+0.020
 triangles+0.020
 lair+0.019
Neg logit contributions
aman-0.018
ada-0.018
ige-0.015
 punish-0.015
mine-0.015
Token,
Token ID11
Feature activation-2.22e-02

Pos logit contributions
 explorers+0.046
 knights+0.046
 triangles+0.044
 slices+0.044
 valleys+0.044
Neg logit contributions
ada-0.021
aman-0.020
use-0.016
 hate-0.016
mine-0.016
Token my
Token ID616
Feature activation-1.85e-02

Pos logit contributions
ada+0.041
aman+0.040
mine+0.035
ige+0.033
 AW+0.033
Neg logit contributions
 knights-0.034
 explorers-0.033
 triangles-0.029
 caves-0.029
 lair-0.028
Token d
Token ID288
Feature activation-9.20e-03

Pos logit contributions
aman+0.032
ada+0.032
mine+0.030
ir+0.027
 AW+0.027
Neg logit contributions
 knights-0.031
 explorers-0.030
 pirates-0.026
 brothers-0.026
 triangles-0.026
Tokenarl
Token ID7063
Feature activation-8.53e-02

Pos logit contributions
aman+0.012
 AW+0.010
use+0.010
 punish+0.010
ada+0.009
Neg logit contributions
 knights-0.020
 explorers-0.018
and-0.017
 triangles-0.017
 pirates-0.016
Tokenings
Token ID654
Feature activation-3.43e-02

Pos logit contributions
ada+0.134
aman+0.132
mine+0.117
iable+0.107
 punish+0.106
Neg logit contributions
 explorers-0.148
 knights-0.147
kins-0.136
 pieces-0.132
ment-0.132
Token.
Token ID13
Feature activation-9.85e-03

Pos logit contributions
ada+0.067
aman+0.067
mine+0.059
ige+0.057
 AW+0.056
Neg logit contributions
 knights-0.049
 explorers-0.049
 triangles-0.041
 caves-0.041
 rockets-0.040
Token I
Token ID314
Feature activation-3.59e-03

Pos logit contributions
ada+0.017
aman+0.016
mine+0.014
 AW+0.013
use+0.013
Neg logit contributions
 knights-0.017
 explorers-0.017
 triangles-0.015
 caves-0.014
 lair-0.014
Token forgive
Token ID20927
Feature activation-6.97e-04

Pos logit contributions
aman+0.007
ada+0.007
mine+0.006
ige+0.006
ir+0.006
Neg logit contributions
 explorers-0.005
 knights-0.005
 there-0.005
 pirates-0.005
 he-0.005
Token you
Token ID345
Feature activation+4.00e-03

Pos logit contributions
ada+0.001
aman+0.001
mine+0.001
 AW+0.001
ige+0.001
Neg logit contributions
 knights-0.001
 explorers-0.001
 triangles-0.001
 caves-0.001
 rockets-0.001
Token you
Token ID345
Feature activation-2.66e-03

Pos logit contributions
ada+0.015
aman+0.014
mine+0.012
ir+0.011
use+0.011
Neg logit contributions
 explorers-0.035
 knights-0.035
 lair-0.033
 caves-0.032
 shields-0.031
Token too
Token ID1165
Feature activation-9.90e-03

Pos logit contributions
aman+0.006
ada+0.006
 AW+0.006
mine+0.006
ir+0.005
Neg logit contributions
 knights-0.003
 explorers-0.003
 triangles-0.002
 pirates-0.002
 rockets-0.002
Token,
Token ID11
Feature activation-2.10e-02

Pos logit contributions
ada+0.016
aman+0.016
mine+0.014
 AW+0.013
ige+0.013
Neg logit contributions
 knights-0.018
 explorers-0.018
 triangles-0.016
 importantly-0.015
 caves-0.015
Token my
Token ID616
Feature activation-2.22e-02

Pos logit contributions
ada+0.044
aman+0.043
mine+0.039
inkle+0.037
use+0.037
Neg logit contributions
 knights-0.028
 explorers-0.028
 triangles-0.024
 pirates-0.023
 rockets-0.022
Token d
Token ID288
Feature activation-1.03e-02

Pos logit contributions
ada+0.039
aman+0.039
mine+0.035
 AW+0.032
ir+0.032
Neg logit contributions
 knights-0.037
 explorers-0.036
 triangles-0.032
 caves-0.031
 pirates-0.030
Tokenarl
Token ID7063
Feature activation-8.48e-02

Pos logit contributions
aman+0.013
 AW+0.011
use+0.011
 punish+0.011
mine+0.010
Neg logit contributions
 knights-0.022
 explorers-0.020
 triangles-0.020
 slices-0.019
and-0.019
Tokenings
Token ID654
Feature activation-3.50e-02

Pos logit contributions
ada+0.126
aman+0.125
mine+0.109
 punish+0.099
ige+0.099
Neg logit contributions
 explorers-0.153
 knights-0.152
kins-0.142
 slices-0.136
 pieces-0.136
Token.
Token ID13
Feature activation-1.22e-02

Pos logit contributions
aman+0.069
ada+0.069
mine+0.059
 AW+0.059
ige+0.059
Neg logit contributions
 knights-0.050
 explorers-0.049
 triangles-0.042
 caves-0.040
 rockets-0.040
Token Now
Token ID2735
Feature activation-1.98e-02

Pos logit contributions
ada+0.020
aman+0.020
mine+0.018
ige+0.016
ir+0.016
Neg logit contributions
 knights-0.021
 explorers-0.021
 triangles-0.019
 caves-0.018
 pirates-0.018
Token let
Token ID1309
Feature activation-1.53e-03

Pos logit contributions
ada+0.031
aman+0.030
mine+0.026
ir+0.025
ige+0.024
Neg logit contributions
 explorers-0.036
 knights-0.036
 triangles-0.032
 caves-0.032
 pirates-0.031
Token's
Token ID338
Feature activation+2.10e-02

Pos logit contributions
ada+0.003
aman+0.003
mine+0.003
ir+0.003
inkle+0.003
Neg logit contributions
 knights-0.002
 explorers-0.002
 there-0.002
 pirates-0.002
 triangles-0.002
Token you
Token ID345
Feature activation-5.23e-03

Pos logit contributions
ada+0.020
aman+0.019
mine+0.016
ige+0.015
ir+0.015
Neg logit contributions
 knights-0.029
 explorers-0.029
 triangles-0.026
 caves-0.026
 rockets-0.025
Token go
Token ID467
Feature activation-4.52e-03

Pos logit contributions
aman+0.008
ada+0.007
mine+0.006
aley+0.006
ige+0.006
Neg logit contributions
 knights-0.011
 explorers-0.010
 pirates-0.009
 triangles-0.009
 caves-0.009
Token,
Token ID11
Feature activation-2.48e-02

Pos logit contributions
aman+0.008
ada+0.008
ige+0.007
mine+0.007
aley+0.007
Neg logit contributions
 knights-0.007
 there-0.007
 explorers-0.007
 he-0.006
 cave-0.006
Token my
Token ID616
Feature activation-2.43e-02

Pos logit contributions
ada+0.053
aman+0.053
mine+0.050
inkle+0.049
use+0.049
Neg logit contributions
 knights-0.031
 explorers-0.030
 pirates-0.026
 there-0.026
 he-0.024
Token d
Token ID288
Feature activation-1.18e-02

Pos logit contributions
aman+0.044
ada+0.043
mine+0.041
use+0.037
ir+0.036
Neg logit contributions
 knights-0.039
 explorers-0.039
 pirates-0.033
 triangles-0.032
 cave-0.032
Tokenarl
Token ID7063
Feature activation-8.46e-02

Pos logit contributions
aman+0.016
use+0.014
mine+0.013
 AW+0.012
 punish+0.012
Neg logit contributions
 knights-0.023
 explorers-0.021
and-0.021
 hunters-0.020
run-0.020
Tokenings
Token ID654
Feature activation-3.46e-02

Pos logit contributions
ada+0.127
aman+0.125
mine+0.114
iable+0.104
ige+0.102
Neg logit contributions
 explorers-0.151
 knights-0.148
kins-0.136
 pieces-0.134
 slices-0.130
Token,"
Token ID553
Feature activation-1.53e-02

Pos logit contributions
ada+0.063
aman+0.062
inkle+0.056
mine+0.054
ige+0.053
Neg logit contributions
 knights-0.050
 explorers-0.048
 there-0.048
 he-0.047
 slices-0.045
Token Mom
Token ID11254
Feature activation+4.76e-03

Pos logit contributions
ada+0.029
aman+0.028
mine+0.025
ige+0.024
 punish+0.023
Neg logit contributions
 explorers-0.023
 knights-0.023
 caves-0.020
 lair-0.020
 triangles-0.020
Token said
Token ID531
Feature activation-2.30e-03

Pos logit contributions
 explorers+0.010
 knights+0.010
 caves+0.010
 lair+0.009
 shields+0.009
Neg logit contributions
ada-0.006
aman-0.006
mine-0.005
ose-0.005
 punish-0.004
Token.
Token ID13
Feature activation+7.53e-03

Pos logit contributions
ada+0.004
aman+0.003
mine+0.003
 punish+0.003
our+0.003
Neg logit contributions
 explorers-0.004
 knights-0.004
 caves-0.004
 lair-0.004
 cave-0.004
Token you
Token ID345
Feature activation-1.66e-03

Pos logit contributions
ada+0.016
aman+0.016
mine+0.013
use+0.012
 punish+0.012
Neg logit contributions
 knights-0.034
 explorers-0.034
 lair-0.031
 caves-0.031
 triangles-0.030
Token too
Token ID1165
Feature activation-8.94e-03

Pos logit contributions
ada+0.003
aman+0.003
 AW+0.003
mine+0.003
ige+0.003
Neg logit contributions
 knights-0.003
 explorers-0.003
 triangles-0.002
 rockets-0.002
 pirates-0.002
Token,
Token ID11
Feature activation-1.95e-02

Pos logit contributions
ada+0.015
aman+0.014
mine+0.013
ige+0.012
 AW+0.012
Neg logit contributions
 explorers-0.016
 knights-0.016
 caves-0.014
 triangles-0.014
 rockets-0.014
Token my
Token ID616
Feature activation-2.09e-02

Pos logit contributions
ada+0.039
aman+0.038
mine+0.034
ige+0.032
ir+0.032
Neg logit contributions
 knights-0.027
 explorers-0.027
 caves-0.023
 triangles-0.023
 lair-0.023
Token d
Token ID288
Feature activation-1.17e-02

Pos logit contributions
ada+0.039
aman+0.038
mine+0.036
ir+0.033
use+0.032
Neg logit contributions
 knights-0.032
 explorers-0.032
 pirates-0.027
 triangles-0.027
 rockets-0.027
Tokenarl
Token ID7063
Feature activation-8.45e-02

Pos logit contributions
aman+0.015
use+0.013
 punish+0.013
 AW+0.012
inkle+0.012
Neg logit contributions
 knights-0.024
 explorers-0.023
 hunters-0.021
and-0.021
 triangles-0.020
Tokenings
Token ID654
Feature activation-3.71e-02

Pos logit contributions
ada+0.126
aman+0.123
mine+0.109
ige+0.099
 punish+0.099
Neg logit contributions
 explorers-0.156
 knights-0.152
kins-0.141
 pieces-0.136
 slices-0.132
Token,"
Token ID553
Feature activation-1.03e-02

Pos logit contributions
aman+0.065
ada+0.064
mine+0.054
 AW+0.054
ige+0.053
Neg logit contributions
 knights-0.062
 explorers-0.061
 triangles-0.053
 slices-0.051
 caves-0.051
Token she
Token ID673
Feature activation+2.70e-02

Pos logit contributions
ada+0.018
aman+0.017
mine+0.015
 say+0.014
 punish+0.014
Neg logit contributions
 explorers-0.016
 lair-0.016
 knights-0.016
 caves-0.016
 shields-0.015
Token said
Token ID531
Feature activation+2.18e-04

Pos logit contributions
 knights+0.053
 explorers+0.053
 caves+0.050
 lair+0.048
 triangles+0.047
Neg logit contributions
ada-0.039
aman-0.036
mine-0.032
ige-0.030
 punish-0.029
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
 explorers+0.000
 knights+0.000
 caves+0.000
 lair+0.000
 cave+0.000
Neg logit contributions
ada-0.000
mine-0.000
aman-0.000
ir-0.000
our-0.000
Token perfect
Token ID2818
Feature activation-9.06e-03

Pos logit contributions
ada+0.000
aman+0.000
mine+0.000
ige+0.000
 AW+0.000
Neg logit contributions
 knights-0.000
 explorers-0.000
 caves-0.000
 triangles-0.000
 lair-0.000
Token fish
Token ID5916
Feature activation-2.03e-02

Pos logit contributions
ada+0.013
aman+0.012
mine+0.011
 AW+0.010
ige+0.010
Neg logit contributions
 knights-0.018
 explorers-0.018
 triangles-0.016
 caves-0.016
 pirates-0.016
Token,
Token ID11
Feature activation-1.84e-02

Pos logit contributions
ada+0.026
aman+0.025
mine+0.021
ige+0.019
use+0.019
Neg logit contributions
 knights-0.042
 explorers-0.042
 triangles-0.038
 caves-0.038
 rockets-0.037
Token my
Token ID616
Feature activation-2.11e-02

Pos logit contributions
ada+0.035
aman+0.034
mine+0.030
ige+0.029
 AW+0.028
Neg logit contributions
 knights-0.026
 explorers-0.026
 caves-0.023
 lair-0.023
 triangles-0.023
Token d
Token ID288
Feature activation-7.29e-03

Pos logit contributions
ada+0.045
aman+0.044
mine+0.040
ige+0.038
 AW+0.038
Neg logit contributions
 knights-0.026
 explorers-0.026
 caves-0.022
 triangles-0.021
 lair-0.020
Tokenarl
Token ID7063
Feature activation-8.39e-02

Pos logit contributions
aman+0.011
 punish+0.009
use+0.008
 AW+0.008
mine+0.008
Neg logit contributions
 knights-0.015
 explorers-0.015
 triangles-0.014
ull-0.013
 pirates-0.013
Tokenings
Token ID654
Feature activation-3.14e-02

Pos logit contributions
ada+0.122
aman+0.122
mine+0.105
 punish+0.097
ige+0.096
Neg logit contributions
 explorers-0.162
 knights-0.158
kins-0.144
 triangles-0.140
 caves-0.140
Token,"
Token ID553
Feature activation-1.30e-02

Pos logit contributions
aman+0.059
ada+0.058
mine+0.050
 AW+0.049
inkle+0.049
Neg logit contributions
 knights-0.048
 explorers-0.048
 there-0.040
 triangles-0.040
 slices-0.040
Token mom
Token ID1995
Feature activation+6.35e-03

Pos logit contributions
ada+0.025
aman+0.024
mine+0.022
ige+0.020
use+0.020
Neg logit contributions
 explorers-0.019
 knights-0.019
 caves-0.016
 triangles-0.016
 lair-0.016
Token says
Token ID1139
Feature activation+4.05e-03

Pos logit contributions
 knights+0.011
 explorers+0.011
 triangles+0.010
 rockets+0.009
 pirates+0.009
Neg logit contributions
aman-0.011
ada-0.011
 AW-0.009
mine-0.009
ige-0.009
Token.
Token ID13
Feature activation+1.23e-02

Pos logit contributions
 explorers+0.008
 knights+0.007
 caves+0.007
 lair+0.007
 shields+0.007
Neg logit contributions
ada-0.006
mine-0.005
aman-0.005
our-0.005
ir-0.005
Token's
Token ID338
Feature activation+7.71e-04

Pos logit contributions
ada+0.005
aman+0.005
mine+0.005
ir+0.005
use+0.004
Neg logit contributions
 knights-0.003
 explorers-0.003
 pieces-0.003
 triangles-0.003
 rockets-0.003
Token good
Token ID922
Feature activation-1.06e-02

Pos logit contributions
 explorers+0.001
 knights+0.001
 triangles+0.001
 caves+0.001
 pirates+0.001
Neg logit contributions
ada-0.002
aman-0.002
mine-0.001
use-0.001
ir-0.001
Token,
Token ID11
Feature activation-1.97e-02

Pos logit contributions
ada+0.016
aman+0.016
mine+0.014
ige+0.013
 AW+0.013
Neg logit contributions
 knights-0.020
 explorers-0.020
 caves-0.018
 triangles-0.018
 lair-0.017
Token my
Token ID616
Feature activation-2.40e-02

Pos logit contributions
ada+0.041
aman+0.039
mine+0.038
use+0.036
ir+0.035
Neg logit contributions
 knights-0.027
 explorers-0.027
 triangles-0.022
 pirates-0.022
 rockets-0.021
Token d
Token ID288
Feature activation-8.43e-03

Pos logit contributions
mine+0.042
aman+0.041
ada+0.040
use+0.037
 AW+0.036
Neg logit contributions
 explorers-0.037
 knights-0.037
 brothers-0.032
 triangles-0.032
 days-0.032
Tokenarl
Token ID7063
Feature activation-8.37e-02

Pos logit contributions
aman+0.012
use+0.011
 punish+0.010
mine+0.010
 AW+0.010
Neg logit contributions
 knights-0.015
 explorers-0.015
ment-0.014
ull-0.014
 triangles-0.014
Tokenings
Token ID654
Feature activation-3.46e-02

Pos logit contributions
ada+0.131
aman+0.129
mine+0.117
iable+0.110
ige+0.107
Neg logit contributions
 explorers-0.143
kins-0.137
 knights-0.136
 pieces-0.129
ment-0.128
Token.
Token ID13
Feature activation-8.17e-03

Pos logit contributions
ada+0.068
aman+0.068
mine+0.059
ige+0.058
 AW+0.057
Neg logit contributions
 knights-0.049
 explorers-0.048
 triangles-0.041
 caves-0.040
 slices-0.039
Token I
Token ID314
Feature activation-4.78e-03

Pos logit contributions
ada+0.014
aman+0.014
mine+0.012
ige+0.011
 AW+0.011
Neg logit contributions
 knights-0.014
 explorers-0.013
 triangles-0.012
 caves-0.012
 rockets-0.011
Token'm
Token ID1101
Feature activation-4.75e-03

Pos logit contributions
aman+0.009
ada+0.009
 ME+0.007
mine+0.007
 AW+0.007
Neg logit contributions
 explorers-0.007
 knights-0.007
 there-0.006
 pirates-0.006
 he-0.006
Token glad
Token ID9675
Feature activation-2.53e-03

Pos logit contributions
ada+0.009
aman+0.009
mine+0.008
use+0.008
 AW+0.008
Neg logit contributions
 explorers-0.007
 knights-0.007
 triangles-0.006
 caves-0.006
 pirates-0.006
Token you
Token ID345
Feature activation-2.38e-03

Pos logit contributions
 knights+0.036
 explorers+0.036
 caves+0.033
 triangles+0.032
 lair+0.032
Neg logit contributions
ada-0.023
aman-0.022
mine-0.019
 AW-0.017
ige-0.017
Token okay
Token ID8788
Feature activation+1.85e-02

Pos logit contributions
ada+0.004
aman+0.004
mine+0.003
ige+0.003
 AW+0.003
Neg logit contributions
 knights-0.004
 explorers-0.004
 triangles-0.004
 caves-0.003
 lair-0.003
Token,
Token ID11
Feature activation-1.85e-02

Pos logit contributions
 explorers+0.037
 knights+0.037
 triangles+0.032
 pirates+0.032
 caves+0.032
Neg logit contributions
ada-0.026
aman-0.026
mine-0.023
 AW-0.021
ige-0.021
Token my
Token ID616
Feature activation-2.13e-02

Pos logit contributions
ada+0.036
aman+0.035
mine+0.031
ige+0.030
 AW+0.029
Neg logit contributions
 knights-0.027
 explorers-0.027
 caves-0.023
 triangles-0.023
 lair-0.022
Token d
Token ID288
Feature activation-1.07e-02

Pos logit contributions
ada+0.038
aman+0.037
mine+0.034
use+0.032
aley+0.030
Neg logit contributions
 knights-0.034
 explorers-0.034
 triangles-0.029
 pirates-0.029
 rockets-0.029
Tokenarl
Token ID7063
Feature activation-8.37e-02

Pos logit contributions
aman+0.014
use+0.012
ada+0.011
 punish+0.011
 AW+0.011
Neg logit contributions
 knights-0.021
 explorers-0.020
ull-0.019
 hunters-0.019
 triangles-0.019
Tokenings
Token ID654
Feature activation-3.60e-02

Pos logit contributions
ada+0.127
aman+0.125
mine+0.112
iable+0.102
aley+0.101
Neg logit contributions
 explorers-0.148
 knights-0.146
kins-0.137
 pieces-0.132
 slices-0.130
Token?"
Token ID1701
Feature activation-1.33e-02

Pos logit contributions
aman+0.057
ada+0.056
mine+0.048
 AW+0.047
ige+0.047
Neg logit contributions
 knights-0.060
 explorers-0.059
 there-0.058
 he-0.054
 hunters-0.053
Token Mom
Token ID11254
Feature activation+5.61e-03

Pos logit contributions
ada+0.026
aman+0.025
mine+0.022
ir+0.021
 punish+0.021
Neg logit contributions
 explorers-0.019
 knights-0.019
 caves-0.017
 triangles-0.016
 lair-0.016
Token asked
Token ID1965
Feature activation+8.10e-03

Pos logit contributions
 explorers+0.010
 knights+0.010
 caves+0.009
 shields+0.009
 lair+0.009
Neg logit contributions
ada-0.009
aman-0.009
mine-0.008
ose-0.007
use-0.007
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
 explorers+0.014
 knights+0.014
 lair+0.013
 caves+0.013
 triangles+0.013
Neg logit contributions
ada-0.013
aman-0.012
mine-0.011
ir-0.010
use-0.010
Token have
Token ID423
Feature activation+3.00e-03

Pos logit contributions
aman+0.014
ada+0.013
ige+0.011
 AW+0.011
 ME+0.010
Neg logit contributions
 explorers-0.013
 knights-0.012
 there-0.011
 pirates-0.010
 astronauts-0.010
Token fun
Token ID1257
Feature activation+6.55e-03

Pos logit contributions
 knights+0.004
 explorers+0.004
 triangles+0.004
 lair+0.004
kins+0.004
Neg logit contributions
aman-0.005
ada-0.005
 punish-0.005
use-0.005
 AW-0.005
Token,
Token ID11
Feature activation-1.96e-02

Pos logit contributions
 knights+0.011
 explorers+0.011
 triangles+0.010
 caves+0.010
 rockets+0.009
Neg logit contributions
ada-0.011
aman-0.011
mine-0.010
ige-0.009
 AW-0.009
Token my
Token ID616
Feature activation-2.24e-02

Pos logit contributions
ada+0.039
aman+0.038
mine+0.035
 AW+0.033
use+0.033
Neg logit contributions
 explorers-0.028
 knights-0.028
 triangles-0.023
 caves-0.023
 rockets-0.022
Token d
Token ID288
Feature activation-9.88e-03

Pos logit contributions
ada+0.043
aman+0.043
mine+0.041
 AW+0.038
use+0.037
Neg logit contributions
 explorers-0.033
 knights-0.032
 triangles-0.027
 rockets-0.027
 caves-0.026
Tokenarl
Token ID7063
Feature activation-8.27e-02

Pos logit contributions
aman+0.013
use+0.012
 punish+0.012
 AW+0.012
ishly+0.011
Neg logit contributions
 knights-0.018
 explorers-0.018
ull-0.017
 triangles-0.016
 hunters-0.016
Tokenings
Token ID654
Feature activation-3.45e-02

Pos logit contributions
ada+0.129
aman+0.126
mine+0.113
iable+0.104
ige+0.103
Neg logit contributions
 explorers-0.144
 knights-0.138
kins-0.134
 pieces-0.127
ment-0.126
Token?"
Token ID1701
Feature activation-1.24e-02

Pos logit contributions
ada+0.052
aman+0.052
ige+0.043
 AW+0.043
mine+0.042
Neg logit contributions
 explorers-0.063
 knights-0.063
 there-0.059
 he-0.056
 triangles-0.054
Token Their
Token ID5334
Feature activation+1.53e-02

Pos logit contributions
ada+0.025
aman+0.024
mine+0.021
 punish+0.020
ir+0.020
Neg logit contributions
 knights-0.017
 explorers-0.017
 caves-0.015
 lair-0.015
 triangles-0.015
Token mother
Token ID2802
Feature activation+1.64e-04

Pos logit contributions
 knights+0.024
 explorers+0.024
 triangles+0.021
 caves+0.021
 lair+0.020
Neg logit contributions
ada-0.028
aman-0.027
mine-0.024
ige-0.023
ir-0.023
Token asked
Token ID1965
Feature activation+2.81e-03

Pos logit contributions
 explorers+0.000
 knights+0.000
 caves+0.000
 lair+0.000
 shields+0.000
Neg logit contributions
ada-0.000
aman-0.000
mine-0.000
ose-0.000
use-0.000
Token have
Token ID423
Feature activation+4.44e-03

Pos logit contributions
aman+0.010
ada+0.009
 AW+0.008
ige+0.008
oph+0.007
Neg logit contributions
 explorers-0.009
 knights-0.009
 there-0.008
 pirates-0.008
 he-0.008
Token fun
Token ID1257
Feature activation+5.68e-03

Pos logit contributions
 explorers+0.006
 knights+0.006
kins+0.006
 triangles+0.005
 lair+0.005
Neg logit contributions
aman-0.008
ada-0.008
 punish-0.007
use-0.007
ige-0.007
Token,
Token ID11
Feature activation-2.16e-02

Pos logit contributions
 knights+0.010
 explorers+0.010
 triangles+0.009
 caves+0.009
 pirates+0.009
Neg logit contributions
ada-0.009
aman-0.009
mine-0.008
 AW-0.007
ige-0.007
Token my
Token ID616
Feature activation-2.39e-02

Pos logit contributions
ada+0.044
aman+0.044
mine+0.041
use+0.040
 AW+0.039
Neg logit contributions
 explorers-0.031
 knights-0.030
 pirates-0.025
 there-0.024
 triangles-0.024
Token d
Token ID288
Feature activation-9.05e-03

Pos logit contributions
aman+0.049
ada+0.049
mine+0.044
ir+0.041
use+0.041
Neg logit contributions
 knights-0.036
 explorers-0.035
 rockets-0.029
 pirates-0.029
 triangles-0.028
Tokenarl
Token ID7063
Feature activation-8.26e-02

Pos logit contributions
aman+0.012
use+0.011
 AW+0.011
 punish+0.010
ada+0.010
Neg logit contributions
 knights-0.019
 explorers-0.018
 hunters-0.016
ull-0.016
 giants-0.016
Tokenings
Token ID654
Feature activation-3.32e-02

Pos logit contributions
ada+0.128
aman+0.125
mine+0.109
ige+0.103
 punish+0.100
Neg logit contributions
 explorers-0.151
 knights-0.148
kins-0.138
 pieces-0.130
 triangles-0.129
Token?"
Token ID1701
Feature activation-1.44e-02

Pos logit contributions
aman+0.055
ada+0.054
ige+0.047
 AW+0.047
mine+0.045
Neg logit contributions
 explorers-0.056
 knights-0.055
 there-0.053
 he-0.051
 hunters-0.048
Token she
Token ID673
Feature activation+2.25e-02

Pos logit contributions
ada+0.031
aman+0.031
mine+0.028
ige+0.027
 AW+0.027
Neg logit contributions
 explorers-0.017
 knights-0.017
 caves-0.014
 triangles-0.014
 pirates-0.013
Token asks
Token ID7893
Feature activation+2.74e-03

Pos logit contributions
 knights+0.038
 explorers+0.038
 triangles+0.033
 caves+0.033
 rockets+0.032
Neg logit contributions
ada-0.038
aman-0.037
mine-0.033
ige-0.031
 AW-0.031
Token.
Token ID13
Feature activation+1.15e-02

Pos logit contributions
 explorers+0.005
 knights+0.005
 caves+0.004
 triangles+0.004
 lair+0.004
Neg logit contributions
ada-0.005
aman-0.004
mine-0.004
ir-0.004
ige-0.003
Token you
Token ID345
Feature activation+3.81e-03

Pos logit contributions
ada+0.017
aman+0.016
mine+0.014
ige+0.012
ir+0.012
Neg logit contributions
 knights-0.026
 explorers-0.026
 triangles-0.023
 caves-0.023
 lair-0.022
Token too
Token ID1165
Feature activation-6.74e-03

Pos logit contributions
 knights+0.005
 explorers+0.004
 triangles+0.004
 brothers+0.004
 pirates+0.004
Neg logit contributions
aman-0.009
ada-0.009
 AW-0.008
mine-0.008
ir-0.007
Token,
Token ID11
Feature activation-1.79e-02

Pos logit contributions
aman+0.012
ada+0.012
mine+0.011
 AW+0.009
ige+0.009
Neg logit contributions
 knights-0.012
 explorers-0.011
 importantly-0.010
 triangles-0.010
 there-0.010
Token my
Token ID616
Feature activation-1.84e-02

Pos logit contributions
ada+0.037
aman+0.036
mine+0.033
 AW+0.031
use+0.031
Neg logit contributions
 knights-0.025
 explorers-0.025
 triangles-0.021
 pirates-0.021
 caves-0.020
Token d
Token ID288
Feature activation-8.59e-03

Pos logit contributions
ada+0.031
aman+0.031
mine+0.028
ir+0.026
 AW+0.026
Neg logit contributions
 knights-0.032
 explorers-0.032
 triangles-0.027
 pirates-0.027
 caves-0.026
Tokenarl
Token ID7063
Feature activation-8.20e-02

Pos logit contributions
aman+0.011
use+0.010
 AW+0.010
 punish+0.009
mine+0.009
Neg logit contributions
 knights-0.018
 explorers-0.017
and-0.016
 hunters-0.015
 pirates-0.015
Tokenings
Token ID654
Feature activation-3.33e-02

Pos logit contributions
ada+0.124
aman+0.123
mine+0.108
ige+0.100
iable+0.100
Neg logit contributions
 explorers-0.146
 knights-0.143
kins-0.137
ment-0.129
 pieces-0.128
Token.
Token ID13
Feature activation-8.73e-03

Pos logit contributions
aman+0.069
ada+0.067
mine+0.058
ige+0.058
 AW+0.058
Neg logit contributions
 knights-0.045
 explorers-0.043
 triangles-0.037
 there-0.036
 caves-0.035
Token But
Token ID887
Feature activation-3.52e-03

Pos logit contributions
ada+0.014
aman+0.013
mine+0.012
ige+0.011
ir+0.011
Neg logit contributions
 knights-0.016
 explorers-0.016
 caves-0.014
 triangles-0.014
 pirates-0.014
Token you
Token ID345
Feature activation+1.62e-02

Pos logit contributions
ada+0.005
aman+0.005
mine+0.004
use+0.004
ir+0.004
Neg logit contributions
 knights-0.007
 explorers-0.007
 lair-0.006
 caves-0.006
 triangles-0.006
Token have
Token ID423
Feature activation+1.08e-02

Pos logit contributions
 explorers+0.025
 knights+0.025
 pirates+0.021
 there+0.021
 brothers+0.021
Neg logit contributions
aman-0.034
ada-0.031
 AW-0.027
mine-0.026
 ME-0.025
Token forgive
Token ID20927
Feature activation+3.61e-03

Pos logit contributions
aman+0.004
ada+0.004
mine+0.003
ican+0.003
ir+0.003
Neg logit contributions
 knights-0.002
 explorers-0.002
 there-0.002
 pirates-0.002
 cave-0.002
Token you
Token ID345
Feature activation+5.02e-03

Pos logit contributions
 knights+0.006
 explorers+0.006
 triangles+0.005
 pirates+0.005
 there+0.005
Neg logit contributions
ada-0.006
aman-0.006
 AW-0.005
mine-0.005
ige-0.005
Token,
Token ID11
Feature activation-1.77e-02

Pos logit contributions
 knights+0.008
 explorers+0.008
 triangles+0.007
 caves+0.007
 rockets+0.007
Neg logit contributions
ada-0.009
aman-0.009
mine-0.008
 AW-0.007
ige-0.007
Token my
Token ID616
Feature activation-1.91e-02

Pos logit contributions
ada+0.035
aman+0.033
mine+0.031
 AW+0.030
use+0.030
Neg logit contributions
 explorers-0.026
 knights-0.026
 pirates-0.022
 triangles-0.022
 there-0.021
Token d
Token ID288
Feature activation-9.90e-03

Pos logit contributions
ada+0.030
aman+0.030
mine+0.027
 AW+0.024
ir+0.024
Neg logit contributions
 knights-0.036
 explorers-0.035
 triangles-0.031
 pirates-0.030
 caves-0.030
Tokenarl
Token ID7063
Feature activation-8.19e-02

Pos logit contributions
aman+0.012
 AW+0.011
use+0.011
 punish+0.011
 hate+0.010
Neg logit contributions
 knights-0.021
 explorers-0.019
and-0.019
 triangles-0.018
 queens-0.017
Tokenings
Token ID654
Feature activation-3.46e-02

Pos logit contributions
ada+0.127
aman+0.125
mine+0.110
 punish+0.103
iable+0.102
Neg logit contributions
 explorers-0.141
 knights-0.140
kins-0.133
ment-0.127
 pieces-0.126
Token.
Token ID13
Feature activation-1.43e-02

Pos logit contributions
ada+0.068
aman+0.068
inkle+0.058
 AW+0.058
mine+0.058
Neg logit contributions
 knights-0.049
 explorers-0.047
 triangles-0.040
 slices-0.040
 there-0.039
Token And
Token ID843
Feature activation-6.32e-03

Pos logit contributions
ada+0.025
aman+0.024
mine+0.022
ir+0.020
ige+0.020
Neg logit contributions
 knights-0.024
 explorers-0.024
 triangles-0.020
 caves-0.020
 pirates-0.020
Token of
Token ID286
Feature activation+9.08e-03

Pos logit contributions
ada+0.009
aman+0.008
mine+0.007
 punish+0.007
use+0.007
Neg logit contributions
 knights-0.012
 lair-0.011
 explorers-0.011
 caves-0.011
 shields-0.011
Token course
Token ID1781
Feature activation-3.63e-02

Pos logit contributions
 knights+0.019
 explorers+0.019
 caves+0.017
 triangles+0.017
 lair+0.016
Neg logit contributions
ada-0.012
aman-0.012
mine-0.010
ige-0.009
ir-0.009
Token your
Token ID534
Feature activation-1.79e-02

Pos logit contributions
 knights+0.029
 explorers+0.028
 caves+0.026
 triangles+0.026
 lair+0.025
Neg logit contributions
ada-0.033
aman-0.032
mine-0.028
ige-0.028
 punish-0.028
Token fault
Token ID8046
Feature activation-3.47e-02

Pos logit contributions
ada+0.031
aman+0.029
mine+0.028
ir+0.025
 AW+0.025
Neg logit contributions
 explorers-0.031
 knights-0.030
 lair-0.026
 rockets-0.026
 pirates-0.026
Token,
Token ID11
Feature activation-9.52e-03

Pos logit contributions
ada+0.053
aman+0.051
mine+0.044
ige+0.040
 AW+0.040
Neg logit contributions
 knights-0.065
 explorers-0.065
 caves-0.058
 triangles-0.058
 lair-0.056
Token my
Token ID616
Feature activation-1.30e-02

Pos logit contributions
ada+0.017
aman+0.017
mine+0.015
 AW+0.014
ige+0.014
Neg logit contributions
 knights-0.015
 explorers-0.015
 caves-0.013
 triangles-0.013
 lair-0.013
Token d
Token ID288
Feature activation-7.30e-03

Pos logit contributions
ada+0.022
aman+0.021
mine+0.020
ir+0.019
 AW+0.018
Neg logit contributions
 explorers-0.022
 knights-0.021
 rockets-0.018
 pirates-0.018
 cave-0.018
Tokenarl
Token ID7063
Feature activation-8.09e-02

Pos logit contributions
aman+0.009
use+0.008
 punish+0.008
 AW+0.008
mine+0.008
Neg logit contributions
 knights-0.014
 explorers-0.013
 hunters-0.013
run-0.013
and-0.012
Tokenings
Token ID654
Feature activation-3.09e-02

Pos logit contributions
ada+0.128
aman+0.120
mine+0.111
aley+0.102
ige+0.102
Neg logit contributions
 explorers-0.136
 knights-0.133
kins-0.128
ment-0.124
 pieces-0.124
Token.
Token ID13
Feature activation-6.47e-03

Pos logit contributions
aman+0.058
ada+0.057
mine+0.049
ige+0.049
 AW+0.048
Neg logit contributions
 knights-0.047
 explorers-0.047
 triangles-0.040
 there-0.039
 rockets-0.039
Token You
Token ID921
Feature activation-5.02e-03

Pos logit contributions
ada+0.012
aman+0.012
mine+0.011
ige+0.010
ir+0.010
Neg logit contributions
 knights-0.010
 explorers-0.010
 caves-0.008
 triangles-0.008
 pirates-0.008
Token didn
Token ID1422
Feature activation-1.61e-02

Pos logit contributions
aman+0.009
ada+0.008
 AW+0.007
ige+0.007
mine+0.007
Neg logit contributions
 explorers-0.009
 knights-0.009
 there-0.008
 astronauts-0.008
 pirates-0.008
Token't
Token ID470
Feature activation-2.11e-02

Pos logit contributions
aman+0.021
ada+0.019
use+0.016
ige+0.016
awk+0.015
Neg logit contributions
 knights-0.034
 explorers-0.033
 triangles-0.032
 rockets-0.032
 he-0.031
Token you
Token ID345
Feature activation+1.58e-03

Pos logit contributions
 knights+0.040
 explorers+0.040
 caves+0.036
 triangles+0.036
 lair+0.035
Neg logit contributions
ada-0.027
aman-0.026
mine-0.022
 AW-0.020
ige-0.020
Token okay
Token ID8788
Feature activation+1.92e-02

Pos logit contributions
 knights+0.003
 explorers+0.003
 triangles+0.002
 caves+0.002
 lair+0.002
Neg logit contributions
ada-0.003
aman-0.003
mine-0.002
ige-0.002
ir-0.002
Token,
Token ID11
Feature activation-1.84e-02

Pos logit contributions
 explorers+0.039
 knights+0.038
 triangles+0.033
 pirates+0.033
 there+0.033
Neg logit contributions
ada-0.027
aman-0.027
mine-0.023
 AW-0.022
ige-0.022
Token my
Token ID616
Feature activation-1.96e-02

Pos logit contributions
ada+0.036
aman+0.035
mine+0.031
ige+0.029
 AW+0.029
Neg logit contributions
 knights-0.027
 explorers-0.027
 caves-0.023
 triangles-0.023
 lair-0.022
Token d
Token ID288
Feature activation-1.07e-02

Pos logit contributions
aman+0.035
ada+0.035
mine+0.033
use+0.031
ir+0.030
Neg logit contributions
 explorers-0.030
 knights-0.030
 pirates-0.025
 brothers-0.025
 there-0.025
Tokenarl
Token ID7063
Feature activation-8.07e-02

Pos logit contributions
aman+0.016
use+0.014
 AW+0.013
 punish+0.013
inkle+0.012
Neg logit contributions
 knights-0.019
 explorers-0.018
and-0.017
 hunters-0.017
ull-0.016
Tokenings
Token ID654
Feature activation-3.65e-02

Pos logit contributions
ada+0.127
aman+0.124
mine+0.114
iable+0.106
ige+0.104
Neg logit contributions
 explorers-0.136
 knights-0.131
kins-0.127
 pieces-0.121
men-0.120
Token?"
Token ID1701
Feature activation-1.20e-02

Pos logit contributions
aman+0.058
ada+0.057
ige+0.048
 AW+0.047
mine+0.047
Neg logit contributions
 knights-0.061
 explorers-0.060
 there-0.059
 he-0.056
 hunters-0.053
Token she
Token ID673
Feature activation+2.66e-02

Pos logit contributions
ada+0.023
aman+0.023
mine+0.020
ir+0.019
use+0.019
Neg logit contributions
 explorers-0.018
 knights-0.017
 caves-0.015
 triangles-0.015
 lair-0.015
Token asked
Token ID1965
Feature activation+8.27e-03

Pos logit contributions
 explorers+0.046
 knights+0.046
 caves+0.044
 lair+0.042
 logs+0.041
Neg logit contributions
ada-0.041
aman-0.039
mine-0.037
 punish-0.032
ING-0.032
Token.
Token ID13
Feature activation+1.27e-02

Pos logit contributions
 explorers+0.015
 knights+0.015
 caves+0.014
 lair+0.014
 cave+0.013
Neg logit contributions
ada-0.013
aman-0.012
mine-0.011
ir-0.010
 punish-0.010
Token'm
Token ID1101
Feature activation+3.51e-03

Pos logit contributions
aman+0.015
ada+0.015
 AW+0.013
mine+0.013
ING+0.013
Neg logit contributions
 explorers-0.013
 knights-0.012
 there-0.012
 pirates-0.011
 he-0.011
Token sorry
Token ID7926
Feature activation+8.34e-03

Pos logit contributions
 knights+0.005
 explorers+0.005
 triangles+0.005
 pirates+0.004
 caves+0.004
Neg logit contributions
ada-0.007
aman-0.007
mine-0.006
use-0.006
 AW-0.006
Token,
Token ID11
Feature activation-1.65e-02

Pos logit contributions
 knights+0.017
 explorers+0.017
 triangles+0.015
 caves+0.015
 rockets+0.014
Neg logit contributions
ada-0.012
aman-0.011
mine-0.010
ige-0.009
 AW-0.009
Token my
Token ID616
Feature activation-1.85e-02

Pos logit contributions
ada+0.033
aman+0.032
mine+0.030
use+0.029
 AW+0.028
Neg logit contributions
 explorers-0.023
 knights-0.022
 triangles-0.019
 pirates-0.019
 caves-0.018
Token d
Token ID288
Feature activation-4.68e-03

Pos logit contributions
aman+0.032
ada+0.032
mine+0.028
ir+0.027
 AW+0.026
Neg logit contributions
 knights-0.032
 explorers-0.031
 pirates-0.027
 triangles-0.026
 rockets-0.026
Tokenarl
Token ID7063
Feature activation-8.06e-02

Pos logit contributions
aman+0.006
use+0.005
 AW+0.005
 punish+0.004
ada+0.004
Neg logit contributions
 knights-0.010
 explorers-0.010
 triangles-0.009
and-0.009
 rockets-0.009
Tokenings
Token ID654
Feature activation-3.08e-02

Pos logit contributions
ada+0.124
aman+0.123
mine+0.107
iable+0.102
ige+0.101
Neg logit contributions
 explorers-0.141
 knights-0.138
kins-0.131
 pieces-0.126
ment-0.125
Token.
Token ID13
Feature activation-7.50e-03

Pos logit contributions
aman+0.065
ada+0.063
mine+0.056
ige+0.055
inkle+0.055
Neg logit contributions
 knights-0.039
 explorers-0.039
 he-0.032
 triangles-0.032
 there-0.032
Token The
Token ID383
Feature activation-9.01e-03

Pos logit contributions
ada+0.012
aman+0.012
mine+0.010
 AW+0.010
use+0.010
Neg logit contributions
 knights-0.013
 explorers-0.013
 triangles-0.012
 caves-0.011
 lair-0.011
Token tornado
Token ID33718
Feature activation-2.21e-02

Pos logit contributions
ada+0.017
aman+0.016
mine+0.015
ige+0.014
ir+0.014
Neg logit contributions
 knights-0.014
 explorers-0.014
 triangles-0.012
 caves-0.012
 rockets-0.012
Token took
Token ID1718
Feature activation-5.38e-03

Pos logit contributions
ada+0.043
aman+0.042
ige+0.036
mine+0.035
ir+0.034
Neg logit contributions
 explorers-0.033
 knights-0.032
 pieces-0.027
 triangles-0.027
 there-0.027
TokenWhat
Token ID2061
Feature activation+1.67e-02

Pos logit contributions
 explorers+0.008
 knights+0.008
 lair+0.007
 caves+0.007
 triangles+0.007
Neg logit contributions
ada-0.007
aman-0.007
mine-0.006
use-0.006
ING-0.006
Token happened
Token ID3022
Feature activation+8.32e-03

Pos logit contributions
 explorers+0.034
 valleys+0.034
 caves+0.034
 lair+0.033
 logs+0.033
Neg logit contributions
ada-0.017
're-0.017
 hate-0.016
 punish-0.015
 are-0.015
Token,
Token ID11
Feature activation-1.92e-02

Pos logit contributions
 knights+0.016
 explorers+0.016
 caves+0.014
 triangles+0.014
 pirates+0.013
Neg logit contributions
ada-0.013
aman-0.013
mine-0.011
ige-0.010
 AW-0.010
Token my
Token ID616
Feature activation-1.74e-02

Pos logit contributions
ada+0.040
aman+0.039
mine+0.036
use+0.034
 AW+0.034
Neg logit contributions
 knights-0.026
 explorers-0.026
 triangles-0.022
 pirates-0.021
 caves-0.021
Token d
Token ID288
Feature activation-9.93e-03

Pos logit contributions
ada+0.028
aman+0.027
mine+0.025
use+0.023
 AW+0.023
Neg logit contributions
 knights-0.031
 explorers-0.031
 slices-0.026
 triangles-0.026
 pirates-0.026
Tokenarl
Token ID7063
Feature activation-8.05e-02

Pos logit contributions
aman+0.013
use+0.011
mine+0.011
 punish+0.010
ada+0.010
Neg logit contributions
 knights-0.020
 explorers-0.019
 hunters-0.018
 triangles-0.018
and-0.017
Tokenings
Token ID654
Feature activation-3.67e-02

Pos logit contributions
ada+0.118
aman+0.118
mine+0.103
ige+0.094
iable+0.093
Neg logit contributions
 explorers-0.147
 knights-0.146
kins-0.135
 pieces-0.131
 triangles-0.130
Token?
Token ID30
Feature activation-1.45e-02

Pos logit contributions
aman+0.057
ada+0.055
mine+0.047
 AW+0.046
ige+0.046
Neg logit contributions
 knights-0.064
 explorers-0.062
 there-0.061
 he-0.058
 triangles-0.057
Token Why
Token ID4162
Feature activation+1.55e-02

Pos logit contributions
ada+0.026
aman+0.024
mine+0.022
use+0.021
 AW+0.021
Neg logit contributions
 explorers-0.023
 knights-0.023
 triangles-0.020
 caves-0.020
 lair-0.020
Token do
Token ID466
Feature activation+2.49e-02

Pos logit contributions
 knights+0.026
 explorers+0.026
 triangles+0.022
 caves+0.022
 rockets+0.022
Neg logit contributions
ada-0.027
aman-0.026
mine-0.023
 AW-0.022
ir-0.022
Token you
Token ID345
Feature activation-1.26e-02

Pos logit contributions
 explorers+0.053
 knights+0.053
 lair+0.049
 caves+0.049
 triangles+0.048
Neg logit contributions
ada-0.029
aman-0.027
mine-0.023
 punish-0.022
 AW-0.022
Token'm
Token ID1101
Feature activation+3.29e-03

Pos logit contributions
aman+0.018
ada+0.018
mine+0.016
ige+0.015
 AW+0.015
Neg logit contributions
 explorers-0.011
 there-0.011
 knights-0.011
 he-0.011
 cave-0.010
Token sorry
Token ID7926
Feature activation+7.27e-03

Pos logit contributions
 knights+0.005
 explorers+0.005
 triangles+0.004
 caves+0.004
 pirates+0.004
Neg logit contributions
ada-0.006
aman-0.006
mine-0.006
use-0.005
 AW-0.005
Token,
Token ID11
Feature activation-1.55e-02

Pos logit contributions
 knights+0.015
 explorers+0.015
 triangles+0.013
 caves+0.013
 rockets+0.013
Neg logit contributions
ada-0.010
aman-0.010
mine-0.009
ige-0.008
 AW-0.008
Token my
Token ID616
Feature activation-2.02e-02

Pos logit contributions
ada+0.031
aman+0.031
mine+0.028
use+0.026
 AW+0.026
Neg logit contributions
 knights-0.022
 explorers-0.022
 triangles-0.018
 caves-0.018
 rockets-0.017
Token d
Token ID288
Feature activation-7.42e-03

Pos logit contributions
ada+0.037
aman+0.036
mine+0.033
use+0.030
 AW+0.030
Neg logit contributions
 knights-0.032
 explorers-0.032
 triangles-0.028
 caves-0.027
 pirates-0.027
Tokenarl
Token ID7063
Feature activation-8.04e-02

Pos logit contributions
aman+0.009
use+0.007
ada+0.007
mine+0.007
 punish+0.007
Neg logit contributions
 knights-0.016
 explorers-0.015
 hunters-0.014
 triangles-0.014
ull-0.014
Tokenings
Token ID654
Feature activation-3.22e-02

Pos logit contributions
ada+0.120
aman+0.120
mine+0.105
iable+0.096
aley+0.096
Neg logit contributions
 explorers-0.145
 knights-0.143
kins-0.133
 pieces-0.129
 slices-0.127
Token.
Token ID13
Feature activation-7.81e-03

Pos logit contributions
aman+0.064
ada+0.063
mine+0.055
ige+0.054
 AW+0.053
Neg logit contributions
 knights-0.045
 explorers-0.044
 triangles-0.037
 caves-0.036
 slices-0.036
Token That
Token ID1320
Feature activation-4.36e-03

Pos logit contributions
ada+0.014
aman+0.013
mine+0.012
ige+0.011
 AW+0.011
Neg logit contributions
 knights-0.013
 explorers-0.013
 triangles-0.011
 caves-0.011
 rockets-0.011
Token was
Token ID373
Feature activation+3.54e-03

Pos logit contributions
ada+0.008
aman+0.008
mine+0.007
 AW+0.007
ige+0.007
Neg logit contributions
 knights-0.007
 explorers-0.007
 triangles-0.006
 caves-0.006
 rockets-0.006
Token a
Token ID257
Feature activation+6.20e-03

Pos logit contributions
 explorers+0.006
 knights+0.006
 caves+0.006
 chambers+0.006
 lair+0.006
Neg logit contributions
aman-0.005
ada-0.005
ige-0.004
 punish-0.004
 hate-0.004
Token'm
Token ID1101
Feature activation+3.34e-03

Pos logit contributions
aman+0.008
ada+0.008
mine+0.007
ir+0.007
 AW+0.007
Neg logit contributions
 explorers-0.006
 there-0.006
 knights-0.005
 he-0.005
 pirates-0.005
Token sorry
Token ID7926
Feature activation+1.05e-02

Pos logit contributions
 explorers+0.005
 knights+0.005
 pirates+0.004
 there+0.004
 triangles+0.004
Neg logit contributions
aman-0.006
ada-0.006
mine-0.006
use-0.006
 AW-0.005
Token,
Token ID11
Feature activation-1.56e-02

Pos logit contributions
 knights+0.020
 explorers+0.020
 caves+0.017
 triangles+0.017
 pirates+0.017
Neg logit contributions
ada-0.016
aman-0.016
mine-0.014
ige-0.013
use-0.013
Token my
Token ID616
Feature activation-1.72e-02

Pos logit contributions
ada+0.037
aman+0.036
mine+0.034
use+0.033
inkle+0.033
Neg logit contributions
 explorers-0.016
 knights-0.016
 triangles-0.012
 pirates-0.012
 caves-0.012
Token d
Token ID288
Feature activation-5.48e-03

Pos logit contributions
ada+0.028
aman+0.028
mine+0.025
 AW+0.023
use+0.023
Neg logit contributions
 knights-0.030
 explorers-0.030
 caves-0.025
 triangles-0.025
 pirates-0.025
Tokenarl
Token ID7063
Feature activation-8.03e-02

Pos logit contributions
aman+0.007
use+0.006
 AW+0.006
 punish+0.005
ada+0.005
Neg logit contributions
 knights-0.011
 explorers-0.011
and-0.010
 hunters-0.010
 triangles-0.010
Tokenings
Token ID654
Feature activation-3.09e-02

Pos logit contributions
ada+0.129
aman+0.127
mine+0.115
iable+0.111
 punish+0.108
Neg logit contributions
 explorers-0.128
kins-0.124
 knights-0.124
ment-0.121
men-0.118
Token.
Token ID13
Feature activation-7.46e-03

Pos logit contributions
aman+0.069
ada+0.067
inkle+0.061
ige+0.060
mine+0.059
Neg logit contributions
 knights-0.035
 explorers-0.033
 he-0.030
 there-0.030
 hunters-0.027
Token That
Token ID1320
Feature activation-2.95e-03

Pos logit contributions
ada+0.012
aman+0.012
mine+0.010
 AW+0.010
ige+0.010
Neg logit contributions
 knights-0.013
 explorers-0.013
 triangles-0.012
 caves-0.012
 lair-0.011
Token was
Token ID373
Feature activation+2.03e-03

Pos logit contributions
ada+0.005
aman+0.005
mine+0.005
ir+0.004
ige+0.004
Neg logit contributions
 knights-0.005
 explorers-0.005
 triangles-0.004
 caves-0.004
 rockets-0.004
Token a
Token ID257
Feature activation+4.83e-03

Pos logit contributions
 knights+0.003
 explorers+0.003
 lair+0.003
 caves+0.003
 slices+0.003
Neg logit contributions
ada-0.003
aman-0.003
ige-0.003
 punish-0.003
iable-0.002
TokenWhat
Token ID2061
Feature activation+1.43e-02

Pos logit contributions
ada+0.011
aman+0.011
mine+0.010
use+0.009
 AW+0.009
Neg logit contributions
 knights-0.009
 explorers-0.009
 triangles-0.008
 caves-0.008
 lair-0.008
Token happened
Token ID3022
Feature activation+8.96e-03

Pos logit contributions
 explorers+0.030
 knights+0.030
 caves+0.030
 lair+0.030
 logs+0.029
Neg logit contributions
ada-0.015
're-0.013
 hate-0.013
mine-0.012
 punish-0.012
Token,
Token ID11
Feature activation-1.76e-02

Pos logit contributions
 explorers+0.016
 knights+0.015
 there+0.013
 pirates+0.013
 caves+0.013
Neg logit contributions
ada-0.015
aman-0.015
mine-0.014
ige-0.013
 AW-0.012
Token my
Token ID616
Feature activation-1.66e-02

Pos logit contributions
ada+0.035
aman+0.035
mine+0.031
 AW+0.029
ige+0.029
Neg logit contributions
 knights-0.025
 explorers-0.025
 triangles-0.021
 caves-0.021
 rockets-0.020
Token d
Token ID288
Feature activation-9.05e-03

Pos logit contributions
ada+0.029
mine+0.029
aman+0.028
use+0.027
ir+0.027
Neg logit contributions
 knights-0.025
 explorers-0.025
 brothers-0.022
 pirates-0.022
 rockets-0.021
Tokenarl
Token ID7063
Feature activation-7.78e-02

Pos logit contributions
aman+0.012
use+0.012
 punish+0.010
 AW+0.010
mine+0.010
Neg logit contributions
 knights-0.018
 explorers-0.017
 hunters-0.016
lights-0.015
ull-0.015
Tokenings
Token ID654
Feature activation-3.66e-02

Pos logit contributions
ada+0.119
aman+0.116
mine+0.105
iable+0.097
ige+0.097
Neg logit contributions
 explorers-0.138
 knights-0.133
kins-0.127
 pieces-0.121
ment-0.120
Token?
Token ID30
Feature activation-1.33e-02

Pos logit contributions
aman+0.054
ada+0.054
mine+0.046
ige+0.045
 AW+0.044
Neg logit contributions
 explorers-0.065
 knights-0.064
 there-0.062
 he-0.058
 caves-0.057
Token Why
Token ID4162
Feature activation+1.64e-02

Pos logit contributions
ada+0.024
aman+0.023
mine+0.021
 AW+0.020
use+0.019
Neg logit contributions
 knights-0.021
 explorers-0.021
 triangles-0.018
 caves-0.018
 lair-0.017
Token are
Token ID389
Feature activation-2.10e-03

Pos logit contributions
 knights+0.034
 explorers+0.034
 triangles+0.030
 caves+0.030
 pirates+0.029
Neg logit contributions
ada-0.022
aman-0.022
mine-0.018
ir-0.017
 AW-0.016
Token you
Token ID345
Feature activation-1.62e-03

Pos logit contributions
ada+0.002
aman+0.002
 punish+0.001
mine+0.001
 ME+0.001
Neg logit contributions
 lair-0.005
 knights-0.005
 explorers-0.005
and-0.005
 caves-0.005
Token in
Token ID287
Feature activation-7.29e-03

Pos logit contributions
 knights+0.044
 explorers+0.044
 triangles+0.040
 caves+0.038
 importantly+0.038
Neg logit contributions
aman-0.032
ada-0.032
mine-0.027
use-0.026
ir-0.025
Token his
Token ID465
Feature activation-4.38e-04

Pos logit contributions
ada+0.015
aman+0.015
mine+0.014
ir+0.013
use+0.013
Neg logit contributions
 knights-0.010
 explorers-0.009
 triangles-0.008
 caves-0.008
 rockets-0.008
Token office
Token ID2607
Feature activation+2.11e-02

Pos logit contributions
aman+0.001
ada+0.001
use+0.001
 punish+0.001
inkle+0.001
Neg logit contributions
 knights-0.001
 explorers-0.001
 triangles-0.001
 queens-0.001
 chefs-0.001
Token and
Token ID290
Feature activation+2.61e-03

Pos logit contributions
 explorers+0.040
 knights+0.040
 triangles+0.036
 caves+0.035
 rockets+0.034
Neg logit contributions
ada-0.031
aman-0.031
mine-0.027
ige-0.024
use-0.024
Token asked
Token ID1965
Feature activation+9.65e-03

Pos logit contributions
 knights+0.004
 explorers+0.004
 triangles+0.004
 importantly+0.004
 caves+0.004
Neg logit contributions
ada-0.004
aman-0.004
mine-0.004
 punish-0.004
use-0.004
Token him
Token ID683
Feature activation+4.55e-02

Pos logit contributions
 explorers+0.016
 knights+0.016
 triangles+0.015
 lair+0.014
 caves+0.014
Neg logit contributions
ada-0.016
aman-0.015
mine-0.013
use-0.013
ir-0.012
Token what
Token ID644
Feature activation+4.04e-02

Pos logit contributions
 explorers+0.075
 knights+0.072
 importantly+0.072
 nights+0.072
 pans+0.071
Neg logit contributions
ada-0.066
aman-0.059
use-0.055
 punish-0.054
mine-0.053
Token to
Token ID284
Feature activation+9.30e-04

Pos logit contributions
lights+0.079
 pans+0.076
 knights+0.075
 lair+0.075
 logs+0.075
Neg logit contributions
 hate-0.043
 punish-0.042
 judge-0.042
't-0.040
ada-0.039
Token do
Token ID466
Feature activation+1.32e-02

Pos logit contributions
 explorers+0.002
 knights+0.002
 pirates+0.002
 caves+0.002
 there+0.002
Neg logit contributions
ada-0.001
aman-0.001
mine-0.001
 AW-0.001
ige-0.001
Token.
Token ID13
Feature activation+8.43e-03

Pos logit contributions
 explorers+0.024
 knights+0.024
 caves+0.022
 triangles+0.022
 lair+0.022
Neg logit contributions
ada-0.018
aman-0.016
mine-0.016
ose-0.015
 punish-0.015
Token The
Token ID383
Feature activation+9.67e-04

Pos logit contributions
 explorers+0.014
 knights+0.014
 caves+0.013
 triangles+0.013
 lair+0.012
Neg logit contributions
ada-0.014
aman-0.014
mine-0.012
 AW-0.011
ige-0.011
Token.
Token ID13
Feature activation+2.31e-03

Pos logit contributions
 explorers+0.031
 knights+0.030
 caves+0.027
 triangles+0.027
 lair+0.026
Neg logit contributions
ada-0.032
aman-0.031
mine-0.028
ige-0.025
ir-0.025
Token She
Token ID1375
Feature activation+2.24e-03

Pos logit contributions
 knights+0.004
 explorers+0.004
 triangles+0.004
 caves+0.004
 rockets+0.004
Neg logit contributions
ada-0.004
aman-0.004
mine-0.003
 AW-0.003
ige-0.003
Token said
Token ID531
Feature activation-7.30e-03

Pos logit contributions
 knights+0.004
 explorers+0.004
 caves+0.003
 triangles+0.003
 rockets+0.003
Neg logit contributions
ada-0.004
aman-0.004
mine-0.003
ige-0.003
 punish-0.003
Token,
Token ID11
Feature activation+4.85e-03

Pos logit contributions
ada+0.007
aman+0.006
mine+0.005
ir+0.005
 AW+0.004
Neg logit contributions
 explorers-0.017
 knights-0.017
 caves-0.016
 lair-0.016
 triangles-0.016
Token "
Token ID366
Feature activation+9.22e-03

Pos logit contributions
 knights+0.006
 explorers+0.006
 caves+0.005
 triangles+0.005
 lair+0.005
Neg logit contributions
ada-0.011
aman-0.010
mine-0.009
 AW-0.009
ige-0.009
TokenTom
Token ID13787
Feature activation+3.64e-02

Pos logit contributions
 knights+0.016
 explorers+0.016
 triangles+0.014
 caves+0.014
 rockets+0.013
Neg logit contributions
ada-0.016
aman-0.016
mine-0.014
ige-0.013
 AW-0.013
Token,
Token ID11
Feature activation-4.91e-03

Pos logit contributions
 knights+0.069
 explorers+0.068
 there+0.060
 pirates+0.059
 triangles+0.059
Neg logit contributions
ada-0.056
aman-0.054
mine-0.047
ir-0.043
 AW-0.043
Token I
Token ID314
Feature activation+7.97e-03

Pos logit contributions
ada+0.007
aman+0.007
mine+0.006
 punish+0.006
ige+0.006
Neg logit contributions
 knights-0.009
 explorers-0.009
 triangles-0.008
 caves-0.008
 lair-0.008
Token am
Token ID716
Feature activation+7.21e-03

Pos logit contributions
 explorers+0.014
 knights+0.013
 there+0.013
 pirates+0.012
 he+0.011
Neg logit contributions
aman-0.015
ada-0.014
mine-0.012
ige-0.012
ir-0.011
Token here
Token ID994
Feature activation-1.64e-02

Pos logit contributions
 knights+0.011
 explorers+0.011
 triangles+0.009
 caves+0.009
 rockets+0.009
Neg logit contributions
ada-0.014
aman-0.013
mine-0.012
 AW-0.011
ige-0.011
Token.
Token ID13
Feature activation-1.23e-02

Pos logit contributions
ada+0.028
aman+0.027
mine+0.024
 punish+0.023
use+0.023
Neg logit contributions
 knights-0.026
 explorers-0.026
 triangles-0.024
 caves-0.023
 shields-0.023
Token what
Token ID644
Feature activation+3.94e-02

Pos logit contributions
 knights+0.043
 explorers+0.043
 triangles+0.038
 caves+0.038
 shields+0.037
Neg logit contributions
ada-0.048
aman-0.046
mine-0.040
use-0.038
ige-0.038
Token is
Token ID318
Feature activation-1.12e-02

Pos logit contributions
 knights+0.066
 explorers+0.065
 lair+0.063
 shields+0.063
 pans+0.061
Neg logit contributions
ada-0.057
aman-0.055
 hate-0.051
mine-0.050
 ME-0.049
Token her
Token ID607
Feature activation-6.26e-03

Pos logit contributions
ada+0.020
aman+0.019
mine+0.017
ige+0.016
 AW+0.016
Neg logit contributions
 knights-0.018
 explorers-0.018
 caves-0.015
 triangles-0.015
 slices-0.015
Token name
Token ID1438
Feature activation-1.34e-02

Pos logit contributions
ada+0.011
aman+0.011
mine+0.010
 AW+0.009
ir+0.009
Neg logit contributions
 explorers-0.010
 knights-0.010
 caves-0.009
 triangles-0.009
 lair-0.009
Token and
Token ID290
Feature activation+7.08e-03

Pos logit contributions
ada+0.026
aman+0.025
mine+0.022
 AW+0.021
ige+0.021
Neg logit contributions
 knights-0.020
 explorers-0.020
 triangles-0.017
 caves-0.017
 pirates-0.016
Token why
Token ID1521
Feature activation+3.48e-02

Pos logit contributions
 knights+0.011
 explorers+0.011
 caves+0.009
 triangles+0.009
 lair+0.009
Neg logit contributions
ada-0.013
aman-0.012
mine-0.011
 punish-0.011
use-0.010
Token is
Token ID318
Feature activation-1.21e-02

Pos logit contributions
 knights+0.052
 explorers+0.052
 lair+0.046
 caves+0.045
 triangles+0.044
Neg logit contributions
ada-0.063
aman-0.061
mine-0.057
 punish-0.052
 hate-0.052
Token she
Token ID673
Feature activation-6.48e-03

Pos logit contributions
ada+0.022
aman+0.021
mine+0.019
ige+0.018
 ME+0.018
Neg logit contributions
 explorers-0.018
 knights-0.018
 caves-0.016
 lair-0.016
 shields-0.016
Token sad
Token ID6507
Feature activation+5.67e-03

Pos logit contributions
ada+0.013
aman+0.012
mine+0.011
 AW+0.010
ir+0.010
Neg logit contributions
 knights-0.010
 explorers-0.010
 triangles-0.008
 caves-0.008
 pirates-0.008
Token.
Token ID13
Feature activation+1.92e-03

Pos logit contributions
 knights+0.009
 explorers+0.009
 caves+0.008
 triangles+0.008
 lair+0.008
Neg logit contributions
ada-0.010
aman-0.010
mine-0.009
use-0.008
 AW-0.008
Token\n
Token ID198
Feature activation-4.09e-03

Pos logit contributions
 explorers+0.003
 knights+0.003
 caves+0.003
 triangles+0.003
 lair+0.003
Neg logit contributions
ada-0.003
aman-0.003
mine-0.003
ING-0.002
ir-0.002
Token.
Token ID13
Feature activation+6.40e-04

Pos logit contributions
ada+0.023
aman+0.022
mine+0.020
ige+0.018
ir+0.018
Neg logit contributions
 explorers-0.022
 knights-0.021
 caves-0.019
 triangles-0.018
 lair-0.018
Token Sam
Token ID3409
Feature activation+2.77e-03

Pos logit contributions
 explorers+0.001
 knights+0.001
 lair+0.001
 caves+0.001
kins+0.001
Neg logit contributions
ada-0.001
aman-0.001
mine-0.001
 AW-0.001
ING-0.001
Token thanks
Token ID5176
Feature activation-2.43e-02

Pos logit contributions
 knights+0.004
 explorers+0.004
 triangles+0.004
 rockets+0.003
 pirates+0.003
Neg logit contributions
ada-0.006
aman-0.006
ir-0.005
mine-0.005
ige-0.005
Token them
Token ID606
Feature activation-1.22e-02

Pos logit contributions
ada+0.040
aman+0.038
mine+0.034
ir+0.032
 punish+0.030
Neg logit contributions
 explorers-0.040
 knights-0.040
 caves-0.036
 triangles-0.036
 lair-0.036
Token for
Token ID329
Feature activation-1.26e-03

Pos logit contributions
ada+0.020
aman+0.019
mine+0.017
ige+0.015
ir+0.015
Neg logit contributions
 explorers-0.022
 knights-0.021
 caves-0.019
 triangles-0.019
 lair-0.019
Token being
Token ID852
Feature activation+3.60e-02

Pos logit contributions
ada+0.002
aman+0.002
 punish+0.002
mine+0.002
ir+0.002
Neg logit contributions
 explorers-0.002
 knights-0.002
 chambers-0.002
 caves-0.002
 pans-0.002
Token nice
Token ID3621
Feature activation-6.81e-03

Pos logit contributions
 caves+0.054
 explorers+0.053
 lair+0.052
 knights+0.052
 logs+0.050
Neg logit contributions
ada-0.063
mine-0.054
ige-0.053
 punish-0.052
aman-0.051
Token.
Token ID13
Feature activation-1.38e-03

Pos logit contributions
ada+0.009
aman+0.008
mine+0.007
ING+0.007
awk+0.007
Neg logit contributions
 explorers-0.013
 knights-0.013
 caves-0.012
 pans-0.012
 triangles-0.012
Token They
Token ID1119
Feature activation-1.91e-02

Pos logit contributions
ada+0.002
aman+0.002
 AW+0.001
upid+0.001
aff+0.001
Neg logit contributions
 explorers-0.003
 knights-0.002
kins-0.002
 lair-0.002
 caves-0.002
Token become
Token ID1716
Feature activation+1.97e-02

Pos logit contributions
aman+0.034
ada+0.030
ir+0.025
ING+0.024
 AW+0.024
Neg logit contributions
 explorers-0.035
 knights-0.035
 he-0.031
 astronauts-0.030
 brothers-0.030
Token friends
Token ID2460
Feature activation-5.66e-03

Pos logit contributions
 lair+0.026
lights+0.025
 caves+0.025
and+0.024
 nights+0.024
Neg logit contributions
ada-0.038
ige-0.032
 ME-0.028
mine-0.028
 punish-0.028
Token,
Token ID11
Feature activation-1.27e-02

Pos logit contributions
ada+0.011
aman+0.010
mine+0.009
 AW+0.008
ir+0.008
Neg logit contributions
 knights-0.013
 explorers-0.013
 triangles-0.011
 caves-0.011
 lair-0.011
Token Lily
Token ID20037
Feature activation+3.23e-04

Pos logit contributions
ada+0.025
aman+0.025
mine+0.023
 AW+0.021
ir+0.021
Neg logit contributions
 explorers-0.018
 knights-0.017
 lair-0.015
 caves-0.015
 triangles-0.015
Token's
Token ID338
Feature activation+1.09e-02

Pos logit contributions
 knights+0.000
 explorers+0.000
 triangles+0.000
 caves+0.000
 lair+0.000
Neg logit contributions
ada-0.001
aman-0.001
mine-0.001
 AW-0.001
ir-0.001
Token mom
Token ID1995
Feature activation-8.51e-03

Pos logit contributions
 explorers+0.019
 knights+0.018
 caves+0.017
 triangles+0.017
 queens+0.016
Neg logit contributions
ada-0.018
aman-0.017
 punish-0.014
mine-0.014
ige-0.014
Token sold
Token ID2702
Feature activation-4.82e-03

Pos logit contributions
ada+0.016
aman+0.016
mine+0.014
 AW+0.013
ir+0.013
Neg logit contributions
 knights-0.013
 explorers-0.013
 triangles-0.011
 caves-0.011
 rockets-0.010
Token more
Token ID517
Feature activation+4.35e-02

Pos logit contributions
ada+0.010
aman+0.010
mine+0.009
ige+0.008
 AW+0.008
Neg logit contributions
 knights-0.007
 explorers-0.007
 triangles-0.006
 caves-0.006
 lair-0.005
Token and
Token ID290
Feature activation-4.87e-04

Pos logit contributions
 knights+0.050
 explorers+0.050
 triangles+0.040
 caves+0.040
 lair+0.038
Neg logit contributions
ada-0.098
aman-0.096
mine-0.087
ige-0.082
 AW-0.082
Token more
Token ID517
Feature activation+3.51e-02

Pos logit contributions
aman+0.001
ada+0.001
 punish+0.001
oph+0.001
mine+0.001
Neg logit contributions
 explorers-0.001
 knights-0.001
 lair-0.001
 caves-0.001
 cave-0.001
Token things
Token ID1243
Feature activation-1.48e-02

Pos logit contributions
 explorers+0.045
 knights+0.044
 lair+0.041
 caves+0.040
 triangles+0.038
Neg logit contributions
ada-0.070
aman-0.065
mine-0.058
 hate-0.057
 punish-0.057
Token.
Token ID13
Feature activation+3.05e-03

Pos logit contributions
ada+0.031
aman+0.030
mine+0.027
use+0.025
 AW+0.025
Neg logit contributions
 knights-0.020
 explorers-0.019
 caves-0.016
 triangles-0.016
 lair-0.016
Token Then
Token ID3244
Feature activation+1.23e-02

Pos logit contributions
 knights+0.005
 explorers+0.004
 triangles+0.004
 caves+0.004
 rockets+0.004
Neg logit contributions
aman-0.006
ada-0.006
mine-0.005
ir-0.005
inkle-0.005
Token wet
Token ID9583
Feature activation-1.24e-02

Pos logit contributions
aman+0.005
ada+0.005
mine+0.004
inkle+0.004
 punish+0.004
Neg logit contributions
 knights-0.005
 explorers-0.005
 triangles-0.004
 caves-0.004
 lair-0.004
Token.
Token ID13
Feature activation+3.34e-03

Pos logit contributions
ada+0.022
aman+0.022
mine+0.019
 AW+0.018
ige+0.018
Neg logit contributions
 knights-0.020
 explorers-0.020
 triangles-0.017
 caves-0.017
 lair-0.016
Token They
Token ID1119
Feature activation-2.12e-02

Pos logit contributions
 knights+0.005
 explorers+0.005
 caves+0.004
 triangles+0.004
 lair+0.004
Neg logit contributions
ada-0.007
aman-0.006
mine-0.006
 AW-0.005
ige-0.005
Token walked
Token ID6807
Feature activation+2.12e-02

Pos logit contributions
ada+0.037
aman+0.035
mine+0.034
use+0.029
ige+0.029
Neg logit contributions
 knights-0.035
 explorers-0.034
 triangles-0.031
 lair-0.031
 caves-0.030
Token around
Token ID1088
Feature activation-2.27e-02

Pos logit contributions
 knights+0.036
 explorers+0.036
 caves+0.032
 triangles+0.032
 rockets+0.031
Neg logit contributions
ada-0.036
aman-0.035
mine-0.031
ige-0.029
 AW-0.029
Token and
Token ID290
Feature activation+3.28e-03

Pos logit contributions
ada+0.041
aman+0.040
mine+0.035
 AW+0.033
ige+0.033
Neg logit contributions
 knights-0.037
 explorers-0.037
 triangles-0.032
 caves-0.032
 rockets-0.031
Token saw
Token ID2497
Feature activation+7.20e-03

Pos logit contributions
 knights+0.005
 explorers+0.005
 triangles+0.005
 lair+0.005
 caves+0.005
Neg logit contributions
ada-0.006
aman-0.006
mine-0.005
use-0.005
 AW-0.005
Token a
Token ID257
Feature activation+9.66e-03

Pos logit contributions
 explorers+0.011
 knights+0.011
 lair+0.010
kins+0.010
 shields+0.009
Neg logit contributions
ada-0.013
aman-0.012
ige-0.011
mine-0.010
 punish-0.010
Token big
Token ID1263
Feature activation-4.03e-03

Pos logit contributions
 knights+0.013
 explorers+0.013
 triangles+0.011
 caves+0.011
 rockets+0.010
Neg logit contributions
ada-0.020
aman-0.019
mine-0.018
ir-0.016
 AW-0.016
Token p
Token ID279
Feature activation-2.55e-02

Pos logit contributions
ada+0.008
aman+0.008
mine+0.008
use+0.007
ir+0.007
Neg logit contributions
 explorers-0.006
 knights-0.005
 cave-0.005
 caves-0.005
 there-0.005
Tokenuddle
Token ID24500
Feature activation-2.03e-03

Pos logit contributions
ada+0.053
mine+0.052
aman+0.052
use+0.047
 punish+0.047
Neg logit contributions
 cave-0.027
 explorers-0.025
 knights-0.025
 there-0.024
 cubes-0.024
Token truck
Token ID7779
Feature activation-1.01e-02

Pos logit contributions
ada+0.011
aman+0.011
mine+0.009
 AW+0.009
 punish+0.009
Neg logit contributions
 knights-0.010
 explorers-0.010
 triangles-0.009
 caves-0.009
 lair-0.008
Token was
Token ID373
Feature activation+7.14e-03

Pos logit contributions
ada+0.017
aman+0.016
mine+0.015
 punish+0.014
ose+0.014
Neg logit contributions
 knights-0.016
 explorers-0.015
 caves-0.015
 triangles-0.014
 queens-0.014
Token loading
Token ID11046
Feature activation+2.10e-02

Pos logit contributions
 knights+0.011
 explorers+0.011
 caves+0.010
 triangles+0.010
 queens+0.009
Neg logit contributions
ada-0.013
aman-0.012
 punish-0.010
mine-0.010
ige-0.010
Token purple
Token ID14032
Feature activation+2.35e-02

Pos logit contributions
 explorers+0.029
 knights+0.029
 importantly+0.026
 lair+0.025
 triangles+0.025
Neg logit contributions
ada-0.040
aman-0.038
mine-0.035
ige-0.034
ING-0.033
Token balloons
Token ID35485
Feature activation+1.05e-02

Pos logit contributions
 knights+0.029
 explorers+0.029
 triangles+0.024
 caves+0.024
 lair+0.024
Neg logit contributions
ada-0.050
aman-0.048
mine-0.044
 AW-0.041
ir-0.041
Token for
Token ID329
Feature activation+2.38e-03

Pos logit contributions
 knights+0.019
 explorers+0.018
 importantly+0.017
 triangles+0.016
 caves+0.016
Neg logit contributions
ada-0.016
aman-0.015
mine-0.014
ose-0.013
our-0.013
Token the
Token ID262
Feature activation-1.03e-02

Pos logit contributions
 explorers+0.003
 knights+0.003
 lair+0.003
 triangles+0.003
 caves+0.003
Neg logit contributions
ada-0.005
aman-0.004
mine-0.004
ir-0.004
ige-0.004
Token parade
Token ID16134
Feature activation-7.87e-03

Pos logit contributions
ada+0.018
aman+0.017
mine+0.015
 punish+0.014
 AW+0.014
Neg logit contributions
 knights-0.017
 explorers-0.017
 triangles-0.015
 caves-0.015
 lair-0.014
Token.
Token ID13
Feature activation+5.49e-03

Pos logit contributions
ada+0.012
aman+0.011
mine+0.010
 punish+0.009
ING+0.009
Neg logit contributions
 knights-0.014
 explorers-0.014
 triangles-0.013
 caves-0.013
 importantly-0.013
Token Tim
Token ID5045
Feature activation+5.10e-03

Pos logit contributions
 explorers+0.009
 knights+0.009
 caves+0.008
 lair+0.008
 triangles+0.008
Neg logit contributions
ada-0.010
aman-0.009
mine-0.008
 AW-0.008
ING-0.008
Token loved
Token ID6151
Feature activation-1.67e-02

Pos logit contributions
 knights+0.008
 explorers+0.008
 caves+0.007
 triangles+0.007
 lair+0.006
Neg logit contributions
ada-0.010
aman-0.009
mine-0.008
ige-0.008
ING-0.008
Token\n
Token ID198
Feature activation-5.38e-03

Pos logit contributions
 explorers+0.012
 knights+0.012
 pans+0.010
 caves+0.010
 shields+0.010
Neg logit contributions
ada-0.011
aman-0.011
mine-0.009
upid-0.008
 ME-0.008
Token\n
Token ID198
Feature activation-8.31e-03

Pos logit contributions
ada+0.009
aman+0.009
mine+0.008
ir+0.008
inkle+0.007
Neg logit contributions
 explorers-0.009
 knights-0.009
 triangles-0.008
 caves-0.008
 queens-0.008
TokenAnd
Token ID1870
Feature activation-6.31e-03

Pos logit contributions
ada+0.013
aman+0.013
mine+0.011
 AW+0.010
ir+0.010
Neg logit contributions
 explorers-0.016
 knights-0.015
 caves-0.013
 rockets-0.013
 triangles-0.013
Token so
Token ID523
Feature activation+2.67e-03

Pos logit contributions
ada+0.011
aman+0.010
mine+0.009
 ME+0.009
 punish+0.008
Neg logit contributions
 knights-0.011
 explorers-0.010
 pans-0.009
 triangles-0.009
 lair-0.009
Token,
Token ID11
Feature activation-7.19e-03

Pos logit contributions
 knights+0.005
 explorers+0.005
 lair+0.004
 triangles+0.004
 caves+0.004
Neg logit contributions
ada-0.004
aman-0.004
mine-0.004
inkle-0.004
 AW-0.004
Token the
Token ID262
Feature activation-4.42e-03

Pos logit contributions
ada+0.014
aman+0.014
mine+0.013
 AW+0.012
ir+0.012
Neg logit contributions
 knights-0.010
 explorers-0.009
 lair-0.008
 triangles-0.008
 caves-0.008
Token mighty
Token ID18680
Feature activation-1.63e-05

Pos logit contributions
ada+0.007
aman+0.007
mine+0.006
ING+0.006
 AW+0.006
Neg logit contributions
 explorers-0.007
 knights-0.007
 caves-0.006
and-0.006
 slices-0.006
Token warrior
Token ID16491
Feature activation-5.38e-03

Pos logit contributions
ada+0.000
aman+0.000
mine+0.000
ige+0.000
 AW+0.000
Neg logit contributions
 knights-0.000
 explorers-0.000
 triangles-0.000
 caves-0.000
 rockets-0.000
Token was
Token ID373
Feature activation-9.80e-04

Pos logit contributions
ada+0.010
aman+0.010
mine+0.009
 AW+0.008
ige+0.008
Neg logit contributions
 knights-0.008
 explorers-0.008
 triangles-0.007
 caves-0.007
 rockets-0.007
Token thanked
Token ID26280
Feature activation-1.57e-02

Pos logit contributions
ada+0.002
aman+0.002
mine+0.001
inkle+0.001
 punish+0.001
Neg logit contributions
 knights-0.002
 explorers-0.002
 caves-0.002
 triangles-0.001
 chambers-0.001
Token and
Token ID290
Feature activation+1.59e-02

Pos logit contributions
ada+0.031
aman+0.031
mine+0.028
ir+0.027
 AW+0.026
Neg logit contributions
 explorers-0.021
 knights-0.021
 caves-0.018
 triangles-0.018
 lair-0.017
Token garden
Token ID11376
Feature activation-1.56e-02

Pos logit contributions
ada+0.019
aman+0.018
mine+0.016
ige+0.015
ir+0.015
Neg logit contributions
 knights-0.015
 explorers-0.015
 triangles-0.013
 caves-0.013
 lair-0.012
Token was
Token ID373
Feature activation+1.87e-03

Pos logit contributions
ada+0.026
aman+0.025
mine+0.023
 punish+0.022
ishly+0.022
Neg logit contributions
 explorers-0.025
 knights-0.025
 triangles-0.022
 caves-0.022
 queens-0.022
Token color
Token ID3124
Feature activation+1.43e-02

Pos logit contributions
 knights+0.003
 explorers+0.003
 caves+0.003
and+0.003
 triangles+0.003
Neg logit contributions
aman-0.003
ada-0.003
mine-0.003
 punish-0.003
ir-0.003
Tokenless
Token ID1203
Feature activation-1.28e-02

Pos logit contributions
 knights+0.023
 explorers+0.023
 caves+0.019
 triangles+0.019
 lair+0.019
Neg logit contributions
ada-0.026
aman-0.025
mine-0.022
ir-0.021
ige-0.021
Token.
Token ID13
Feature activation+1.02e-02

Pos logit contributions
ada+0.015
aman+0.015
mine+0.014
ir+0.012
ishly+0.012
Neg logit contributions
 knights-0.025
 explorers-0.025
 lair-0.023
 chambers-0.023
 pans-0.023
Token\n
Token ID198
Feature activation-1.50e-03

Pos logit contributions
 explorers+0.020
 knights+0.019
 queens+0.017
 lair+0.017
 shields+0.017
Neg logit contributions
ada-0.015
aman-0.015
mine-0.013
ir-0.011
 ME-0.011
Token\n
Token ID198
Feature activation-6.28e-03

Pos logit contributions
ada+0.002
aman+0.002
mine+0.002
ir+0.002
't+0.002
Neg logit contributions
 explorers-0.003
 knights-0.003
 shields-0.002
 triangles-0.002
 rockets-0.002
TokenThen
Token ID6423
Feature activation+2.04e-02

Pos logit contributions
ada+0.010
aman+0.010
mine+0.009
ir+0.008
 AW+0.008
Neg logit contributions
 explorers-0.012
 knights-0.011
 rockets-0.010
 caves-0.010
 lair-0.010
Token one
Token ID530
Feature activation-7.37e-03

Pos logit contributions
and+0.044
 lair+0.043
 explorers+0.043
 knights+0.042
 queens+0.040
Neg logit contributions
aman-0.021
ada-0.020
mine-0.017
 AW-0.015
ishly-0.015
Token day
Token ID1110
Feature activation+5.08e-03

Pos logit contributions
ada+0.011
aman+0.010
 hate+0.010
awk+0.009
inkle+0.009
Neg logit contributions
 knights-0.013
 explorers-0.013
 shields-0.012
 lair-0.012
 rockets-0.012
Token,
Token ID11
Feature activation-1.81e-03

Pos logit contributions
 lair+0.012
 explorers+0.012
 knights+0.012
and+0.011
 triangles+0.011
Neg logit contributions
ada-0.005
aman-0.004
mine-0.003
ING-0.003
 AW-0.003
Token It
Token ID632
Feature activation-4.74e-03

Pos logit contributions
 knights+0.004
 explorers+0.004
 triangles+0.004
 caves+0.004
 rockets+0.004
Neg logit contributions
ada-0.003
aman-0.003
mine-0.003
 AW-0.003
ige-0.003
Token had
Token ID550
Feature activation+1.30e-02

Pos logit contributions
ada+0.010
aman+0.009
mine+0.009
 AW+0.008
ir+0.008
Neg logit contributions
 knights-0.006
 explorers-0.006
 caves-0.005
 triangles-0.005
 rockets-0.005
Token been
Token ID587
Feature activation+2.82e-02

Pos logit contributions
 knights+0.020
 explorers+0.020
 caves+0.017
 triangles+0.017
 rockets+0.017
Neg logit contributions
ada-0.024
aman-0.023
mine-0.021
ige-0.020
 AW-0.019
Token hurt
Token ID5938
Feature activation-2.19e-03

Pos logit contributions
 knights+0.039
 explorers+0.039
 triangles+0.033
 caves+0.033
 rockets+0.032
Neg logit contributions
ada-0.056
aman-0.054
mine-0.049
ige-0.046
ir-0.046
Token by
Token ID416
Feature activation+1.09e-02

Pos logit contributions
ada+0.004
aman+0.004
mine+0.003
 AW+0.003
use+0.003
Neg logit contributions
 knights-0.004
 explorers-0.004
 caves-0.003
 triangles-0.003
 there-0.003
Token a
Token ID257
Feature activation+9.32e-05

Pos logit contributions
 knights+0.015
 explorers+0.015
 triangles+0.013
 caves+0.013
 pirates+0.012
Neg logit contributions
ada-0.022
aman-0.021
mine-0.019
use-0.019
 AW-0.018
Token dangerous
Token ID4923
Feature activation-7.13e-03

Pos logit contributions
 knights+0.000
 explorers+0.000
 triangles+0.000
 caves+0.000
 lair+0.000
Neg logit contributions
ada-0.000
aman-0.000
mine-0.000
ige-0.000
 AW-0.000
Token weapon
Token ID4282
Feature activation-9.04e-03

Pos logit contributions
ada+0.015
aman+0.015
mine+0.014
ige+0.013
 AW+0.013
Neg logit contributions
 explorers-0.010
 knights-0.009
 cave-0.008
 caves-0.008
 pirates-0.008
Token.
Token ID13
Feature activation+5.86e-03

Pos logit contributions
ada+0.016
aman+0.015
mine+0.013
ige+0.013
 AW+0.012
Neg logit contributions
 explorers-0.015
 knights-0.015
 caves-0.013
 triangles-0.013
 lair-0.013
Token The
Token ID383
Feature activation-5.80e-04

Pos logit contributions
 knights+0.010
 explorers+0.010
 triangles+0.009
 caves+0.009
 rockets+0.008
Neg logit contributions
ada-0.010
aman-0.010
mine-0.008
 AW-0.008
ige-0.008
Token brave
Token ID14802
Feature activation+8.69e-03

Pos logit contributions
ada+0.001
aman+0.001
mine+0.001
 punish+0.001
ING+0.001
Neg logit contributions
 knights-0.001
 explorers-0.001
 triangles-0.001
 caves-0.001
 rockets-0.001
Token can
Token ID460
Feature activation-1.97e-02

Pos logit contributions
aman+0.014
ada+0.012
mine+0.010
ige+0.010
aley+0.010
Neg logit contributions
 explorers-0.015
 knights-0.015
 there-0.013
 astronauts-0.013
 pirates-0.013
Token be
Token ID307
Feature activation+2.11e-02

Pos logit contributions
aman+0.047
ada+0.043
ige+0.042
 AW+0.041
andel+0.039
Neg logit contributions
 he-0.023
 there-0.021
 knights-0.020
 explorers-0.019
 friends-0.019
Token my
Token ID616
Feature activation-2.55e-02

Pos logit contributions
 explorers+0.031
 knights+0.031
 caves+0.028
 triangles+0.028
 slices+0.028
Neg logit contributions
ada-0.037
aman-0.034
 punish-0.032
awk-0.031
ige-0.031
Token friend
Token ID1545
Feature activation-2.15e-02

Pos logit contributions
ada+0.059
aman+0.058
mine+0.055
 AW+0.052
use+0.051
Neg logit contributions
 explorers-0.029
 knights-0.029
 pirates-0.022
 caves-0.022
 triangles-0.021
Token.
Token ID13
Feature activation-1.27e-02

Pos logit contributions
ada+0.033
aman+0.031
mine+0.028
use+0.026
ING+0.025
Neg logit contributions
 explorers-0.039
 knights-0.039
 caves-0.035
 triangles-0.035
 lair-0.034
Token But
Token ID887
Feature activation-7.23e-03

Pos logit contributions
ada+0.022
aman+0.021
mine+0.019
ige+0.018
ir+0.017
Neg logit contributions
 knights-0.022
 explorers-0.021
 caves-0.018
 triangles-0.018
 pirates-0.018
Token next
Token ID1306
Feature activation+2.35e-02

Pos logit contributions
ada+0.011
aman+0.010
mine+0.009
use+0.008
ige+0.008
Neg logit contributions
 knights-0.014
 explorers-0.014
 triangles-0.012
 caves-0.012
 lair-0.012
Token time
Token ID640
Feature activation+1.74e-02

Pos logit contributions
 knights+0.042
 explorers+0.042
 triangles+0.037
 caves+0.037
 rockets+0.036
Neg logit contributions
ada-0.038
aman-0.037
mine-0.032
 AW-0.030
ige-0.030
Token,
Token ID11
Feature activation-5.33e-03

Pos logit contributions
 knights+0.035
 explorers+0.035
 caves+0.031
 triangles+0.031
 pirates+0.031
Neg logit contributions
ada-0.024
aman-0.023
mine-0.020
ige-0.018
ir-0.018
Token please
Token ID3387
Feature activation-1.08e-04

Pos logit contributions
ada+0.008
aman+0.007
mine+0.006
ige+0.006
 AW+0.006
Neg logit contributions
 knights-0.011
 explorers-0.011
 caves-0.009
 triangles-0.009
 pirates-0.009
Token bring
Token ID2222
Feature activation-1.64e-04

Pos logit contributions
ada+0.000
aman+0.000
mine+0.000
ige+0.000
ir+0.000
Neg logit contributions
 explorers-0.000
 there-0.000
 knights-0.000
 he-0.000
 pirates-0.000
Token There
Token ID1318
Feature activation+1.99e-02

Pos logit contributions
 explorers+0.009
 knights+0.009
 caves+0.008
 triangles+0.008
 lair+0.008
Neg logit contributions
ada-0.011
aman-0.011
mine-0.010
 AW-0.009
ige-0.009
Token were
Token ID547
Feature activation+7.95e-03

Pos logit contributions
 knights+0.034
 explorers+0.034
 triangles+0.029
 caves+0.029
 lair+0.029
Neg logit contributions
ada-0.034
aman-0.033
mine-0.030
ir-0.026
 punish-0.026
Token many
Token ID867
Feature activation+8.91e-03

Pos logit contributions
 explorers+0.011
 knights+0.011
 lair+0.009
 caves+0.009
 shields+0.009
Neg logit contributions
ada-0.015
aman-0.015
mine-0.013
ige-0.013
 punish-0.012
Token pictures
Token ID5986
Feature activation+3.97e-04

Pos logit contributions
 explorers+0.012
 knights+0.012
 lair+0.010
 caves+0.010
 triangles+0.009
Neg logit contributions
ada-0.018
aman-0.017
mine-0.015
ir-0.015
ige-0.015
Token of
Token ID286
Feature activation+3.81e-04

Pos logit contributions
 knights+0.001
 explorers+0.001
 caves+0.000
 lair+0.000
 triangles+0.000
Neg logit contributions
ada-0.001
aman-0.001
mine-0.001
ige-0.001
ir-0.001
Token Jimmy
Token ID12963
Feature activation-2.00e-02

Pos logit contributions
 knights+0.001
 explorers+0.000
 triangles+0.000
 caves+0.000
 lair+0.000
Neg logit contributions
ada-0.001
aman-0.001
mine-0.001
ige-0.001
ir-0.001
Token drinking
Token ID7722
Feature activation-8.60e-03

Pos logit contributions
ada+0.031
aman+0.030
mine+0.027
ige+0.025
use+0.024
Neg logit contributions
 explorers-0.036
 knights-0.036
 caves-0.032
 triangles-0.031
 lair-0.031
Token from
Token ID422
Feature activation-2.48e-03

Pos logit contributions
ada+0.015
aman+0.015
mine+0.013
 AW+0.012
ige+0.012
Neg logit contributions
 knights-0.014
 explorers-0.014
 triangles-0.012
 caves-0.012
 rockets-0.012
Token the
Token ID262
Feature activation-1.15e-02

Pos logit contributions
ada+0.006
aman+0.005
mine+0.005
 AW+0.005
ige+0.005
Neg logit contributions
 knights-0.003
 explorers-0.003
 caves-0.002
 triangles-0.002
 rockets-0.002
Token pond
Token ID16723
Feature activation-1.60e-02

Pos logit contributions
ada+0.023
aman+0.023
mine+0.021
 AW+0.019
ir+0.019
Neg logit contributions
 knights-0.016
 explorers-0.016
 caves-0.014
 triangles-0.014
 pirates-0.013
Token and
Token ID290
Feature activation+4.16e-03

Pos logit contributions
ada+0.025
aman+0.024
mine+0.021
ir+0.019
use+0.019
Neg logit contributions
 knights-0.029
 explorers-0.029
 triangles-0.025
 caves-0.025
 rockets-0.025
Token was
Token ID373
Feature activation+7.71e-03

Pos logit contributions
ada+0.023
aman+0.022
mine+0.020
 AW+0.019
ige+0.019
Neg logit contributions
 knights-0.014
 explorers-0.014
 triangles-0.012
 caves-0.012
 rockets-0.011
Token a
Token ID257
Feature activation+1.60e-02

Pos logit contributions
 knights+0.011
 explorers+0.011
 triangles+0.009
 caves+0.009
 rockets+0.009
Neg logit contributions
ada-0.016
aman-0.015
mine-0.014
ige-0.013
 AW-0.013
Token small
Token ID1402
Feature activation-4.94e-03

Pos logit contributions
 explorers+0.023
 knights+0.023
 triangles+0.020
 caves+0.020
 rockets+0.019
Neg logit contributions
ada-0.031
aman-0.030
mine-0.027
ige-0.026
ir-0.025
Token,
Token ID11
Feature activation-1.28e-02

Pos logit contributions
ada+0.010
aman+0.009
mine+0.008
ige+0.008
 AW+0.008
Neg logit contributions
 explorers-0.008
 knights-0.007
 cave-0.006
 caves-0.006
 triangles-0.006
Token scared
Token ID12008
Feature activation+2.21e-03

Pos logit contributions
ada+0.022
aman+0.021
mine+0.019
 AW+0.018
ige+0.018
Neg logit contributions
 knights-0.020
 explorers-0.020
 lair-0.018
 caves-0.017
 triangles-0.017
Token animal
Token ID5044
Feature activation-1.66e-02

Pos logit contributions
 explorers+0.003
 knights+0.003
 caves+0.003
 triangles+0.003
 pirates+0.003
Neg logit contributions
ada-0.004
aman-0.004
mine-0.004
ige-0.004
 AW-0.004
Token.
Token ID13
Feature activation+4.86e-03

Pos logit contributions
ada+0.030
aman+0.029
mine+0.026
 AW+0.024
ige+0.024
Neg logit contributions
 knights-0.026
 explorers-0.026
 triangles-0.023
 caves-0.023
 rockets-0.022
Token The
Token ID383
Feature activation-2.55e-03

Pos logit contributions
 explorers+0.009
 knights+0.009
 caves+0.008
 triangles+0.008
 lair+0.008
Neg logit contributions
ada-0.007
aman-0.007
mine-0.006
ige-0.006
 AW-0.006
Token animal
Token ID5044
Feature activation-1.95e-02

Pos logit contributions
ada+0.005
aman+0.005
mine+0.005
ige+0.004
use+0.004
Neg logit contributions
 explorers-0.004
 knights-0.004
 caves-0.003
 triangles-0.003
 pirates-0.003
Token was
Token ID373
Feature activation+6.95e-03

Pos logit contributions
ada+0.033
aman+0.031
mine+0.028
 punish+0.027
ING+0.026
Neg logit contributions
 knights-0.031
 explorers-0.031
 triangles-0.029
 caves-0.029
 slices-0.028
Token trembling
Token ID39228
Feature activation+1.21e-02

Pos logit contributions
 knights+0.011
 explorers+0.010
 triangles+0.009
 caves+0.009
 slices+0.009
Neg logit contributions
ada-0.013
aman-0.012
mine-0.011
ige-0.010
 punish-0.010
Token't
Token ID470
Feature activation-2.10e-02

Pos logit contributions
aman+0.031
ada+0.029
use+0.028
awk+0.028
ige+0.026
Neg logit contributions
 he-0.038
 there-0.035
 knights-0.035
 his-0.034
 triangles-0.034
Token want
Token ID765
Feature activation-3.85e-02

Pos logit contributions
aman+0.035
ada+0.029
 AW+0.026
inkle+0.025
ishly+0.025
Neg logit contributions
 explorers-0.040
 knights-0.039
 there-0.036
 he-0.036
 pirates-0.035
Token to
Token ID284
Feature activation-8.59e-03

Pos logit contributions
ada+0.083
aman+0.081
mine+0.073
ige+0.069
 punish+0.069
Neg logit contributions
 explorers-0.048
 knights-0.047
 caves-0.039
 triangles-0.039
 lair-0.037
Token stop
Token ID2245
Feature activation+5.20e-03

Pos logit contributions
aman+0.019
ada+0.018
 AW+0.017
ige+0.017
aff+0.016
Neg logit contributions
 there-0.010
 he-0.009
 explorers-0.009
 knights-0.009
 his-0.009
Token playing
Token ID2712
Feature activation+9.69e-03

Pos logit contributions
 knights+0.007
 explorers+0.007
 triangles+0.006
 caves+0.006
 lair+0.006
Neg logit contributions
ada-0.010
aman-0.010
mine-0.009
ige-0.009
 AW-0.009
Token,
Token ID11
Feature activation-1.24e-02

Pos logit contributions
 knights+0.017
 explorers+0.017
 triangles+0.014
 caves+0.014
 rockets+0.014
Neg logit contributions
ada-0.016
aman-0.016
mine-0.014
ige-0.013
 AW-0.013
Token but
Token ID475
Feature activation-1.24e-03

Pos logit contributions
ada+0.017
aman+0.017
mine+0.015
 punish+0.013
ige+0.013
Neg logit contributions
 knights-0.024
 explorers-0.024
 lair-0.022
 caves-0.022
 triangles-0.021
Token he
Token ID339
Feature activation+1.81e-02

Pos logit contributions
aman+0.002
ada+0.002
mine+0.002
 hate+0.002
 say+0.002
Neg logit contributions
 lair-0.002
 knights-0.002
 explorers-0.002
 caves-0.002
and-0.002
Token knew
Token ID2993
Feature activation-5.45e-03

Pos logit contributions
 explorers+0.034
 knights+0.033
 caves+0.031
 triangles+0.030
 rockets+0.030
Neg logit contributions
ada-0.028
aman-0.027
mine-0.024
ige-0.023
 punish-0.022
Token he
Token ID339
Feature activation+1.15e-02

Pos logit contributions
aman+0.009
ada+0.008
iable+0.007
't+0.007
 punish+0.007
Neg logit contributions
 explorers-0.009
 knights-0.009
 valleys-0.008
 lair-0.008
 triangles-0.008
Token had
Token ID550
Feature activation+1.32e-02

Pos logit contributions
 knights+0.017
 explorers+0.017
 caves+0.015
 triangles+0.015
 rockets+0.014
Neg logit contributions
ada-0.022
aman-0.021
mine-0.020
ige-0.018
 punish-0.018
Token\n
Token ID198
Feature activation+1.19e-02

Pos logit contributions
 explorers+0.002
 knights+0.002
 triangles+0.002
 caves+0.002
 lair+0.002
Neg logit contributions
ada-0.002
aman-0.002
mine-0.002
 AW-0.002
ir-0.002
Token\n
Token ID198
Feature activation+4.71e-03

Pos logit contributions
 explorers+0.019
 knights+0.019
 triangles+0.016
 caves+0.016
 lair+0.016
Neg logit contributions
ada-0.022
aman-0.021
mine-0.019
ir-0.018
 AW-0.018
TokenTim
Token ID14967
Feature activation+1.67e-02

Pos logit contributions
 explorers+0.007
 knights+0.007
 triangles+0.006
 caves+0.006
 lair+0.006
Neg logit contributions
ada-0.009
aman-0.009
mine-0.008
 AW-0.007
ige-0.007
Token and
Token ID290
Feature activation+1.78e-02

Pos logit contributions
 knights+0.029
 explorers+0.029
 triangles+0.025
 caves+0.025
 rockets+0.024
Neg logit contributions
ada-0.028
aman-0.027
mine-0.024
 AW-0.022
ige-0.022
Token his
Token ID465
Feature activation+7.18e-03

Pos logit contributions
 importantly+0.034
 queens+0.031
 explorers+0.031
 chefs+0.031
and+0.030
Neg logit contributions
mine-0.023
ada-0.022
 say-0.021
 hug-0.020
aman-0.020
Token mom
Token ID1995
Feature activation-8.21e-03

Pos logit contributions
 explorers+0.011
 knights+0.011
 caves+0.010
 triangles+0.010
 valleys+0.010
Neg logit contributions
ada-0.012
aman-0.011
mine-0.010
ige-0.010
ING-0.010
Token helped
Token ID4193
Feature activation+9.58e-03

Pos logit contributions
ada+0.017
aman+0.016
mine+0.014
 AW+0.014
inkle+0.014
Neg logit contributions
 knights-0.011
 explorers-0.011
 triangles-0.009
 caves-0.009
 rockets-0.009
Token the
Token ID262
Feature activation+2.75e-03

Pos logit contributions
 knights+0.012
 explorers+0.012
 caves+0.010
 triangles+0.010
 rockets+0.009
Neg logit contributions
ada-0.021
aman-0.021
mine-0.019
 AW-0.018
ige-0.018
Token lizard
Token ID42406
Feature activation+1.29e-03

Pos logit contributions
 knights+0.004
 explorers+0.004
 triangles+0.004
 shields+0.004
kins+0.004
Neg logit contributions
ada-0.004
aman-0.004
ING-0.004
mine-0.004
 ME-0.004
Token move
Token ID1445
Feature activation+5.86e-03

Pos logit contributions
 knights+0.003
 explorers+0.003
 triangles+0.002
 importantly+0.002
 caves+0.002
Neg logit contributions
ada-0.002
aman-0.002
mine-0.001
 punish-0.001
ir-0.001
Token to
Token ID284
Feature activation+1.18e-03

Pos logit contributions
 importantly+0.010
 explorers+0.010
 knights+0.010
 lair+0.009
 queens+0.009
Neg logit contributions
ada-0.009
mine-0.008
aman-0.008
ir-0.008
 ME-0.007
Token She
Token ID1375
Feature activation+4.92e-03

Pos logit contributions
 knights+0.006
 explorers+0.006
 triangles+0.005
 caves+0.005
 rockets+0.005
Neg logit contributions
ada-0.006
aman-0.006
mine-0.006
 AW-0.005
ige-0.005
Token put
Token ID1234
Feature activation-5.70e-04

Pos logit contributions
 knights+0.007
 explorers+0.007
 caves+0.006
 triangles+0.006
 rockets+0.006
Neg logit contributions
ada-0.009
aman-0.009
mine-0.008
ige-0.008
 AW-0.008
Token on
Token ID319
Feature activation-7.16e-03

Pos logit contributions
ada+0.001
aman+0.001
mine+0.001
ir+0.001
ige+0.001
Neg logit contributions
 explorers-0.001
 knights-0.001
 lair-0.001
 importantly-0.001
 caves-0.001
Token her
Token ID607
Feature activation+2.42e-04

Pos logit contributions
ada+0.013
aman+0.012
mine+0.011
 AW+0.010
ige+0.010
Neg logit contributions
 knights-0.012
 explorers-0.012
 triangles-0.010
 caves-0.010
 rockets-0.010
Token swim
Token ID9422
Feature activation-3.39e-03

Pos logit contributions
 knights+0.000
 explorers+0.000
 triangles+0.000
 caves+0.000
 lair+0.000
Neg logit contributions
ada-0.000
aman-0.000
mine-0.000
 AW-0.000
ige-0.000
Tokensuit
Token ID6063
Feature activation-5.35e-02

Pos logit contributions
ada+0.006
aman+0.006
mine+0.005
ING+0.005
 AW+0.005
Neg logit contributions
 knights-0.005
 explorers-0.005
 triangles-0.005
 caves-0.005
 importantly-0.005
Token and
Token ID290
Feature activation-1.49e-04

Pos logit contributions
ada+0.096
aman+0.092
 AW+0.080
use+0.079
ir+0.078
Neg logit contributions
 knights-0.086
 explorers-0.085
 there-0.077
 he-0.074
 slices-0.074
Token ran
Token ID4966
Feature activation+8.90e-03

Pos logit contributions
ada+0.000
aman+0.000
mine+0.000
 AW+0.000
use+0.000
Neg logit contributions
 knights-0.000
 lair-0.000
 explorers-0.000
and-0.000
 importantly-0.000
Token to
Token ID284
Feature activation+1.88e-03

Pos logit contributions
 knights+0.011
 explorers+0.011
 caves+0.009
 triangles+0.009
 rockets+0.008
Neg logit contributions
ada-0.020
aman-0.019
mine-0.017
ige-0.017
 AW-0.017
Token the
Token ID262
Feature activation+1.08e-03

Pos logit contributions
 knights+0.002
 explorers+0.002
 caves+0.002
 triangles+0.001
 pirates+0.001
Neg logit contributions
aman-0.004
ada-0.004
mine-0.004
ige-0.004
 AW-0.004
Token shore
Token ID15191
Feature activation+1.08e-04

Pos logit contributions
 knights+0.002
 explorers+0.002
 triangles+0.002
 caves+0.002
 lair+0.001
Neg logit contributions
ada-0.002
aman-0.002
mine-0.002
ige-0.002
 AW-0.002
Token bed
Token ID3996
Feature activation-4.41e-03

Pos logit contributions
ada+0.003
aman+0.003
mine+0.003
 punish+0.002
ir+0.002
Neg logit contributions
 explorers-0.002
 knights-0.002
 triangles-0.002
 lair-0.002
 caves-0.002
Token wearing
Token ID5762
Feature activation-3.67e-03

Pos logit contributions
ada+0.008
aman+0.008
mine+0.007
 AW+0.006
ige+0.006
Neg logit contributions
 knights-0.007
 explorers-0.007
 caves-0.006
 triangles-0.006
 lair-0.006
Token his
Token ID465
Feature activation-2.47e-03

Pos logit contributions
ada+0.009
aman+0.009
mine+0.009
 AW+0.008
ige+0.008
Neg logit contributions
 knights-0.003
 explorers-0.003
 triangles-0.002
 caves-0.002
 rockets-0.002
Token favourite
Token ID12507
Feature activation-1.29e-02

Pos logit contributions
ada+0.005
aman+0.005
mine+0.004
ir+0.004
 AW+0.004
Neg logit contributions
 knights-0.004
 explorers-0.003
 triangles-0.003
 caves-0.003
 rockets-0.003
Token py
Token ID12972
Feature activation+1.51e-02

Pos logit contributions
ada+0.022
aman+0.022
mine+0.019
 AW+0.018
ige+0.018
Neg logit contributions
 knights-0.021
 explorers-0.021
 triangles-0.019
 caves-0.019
 rockets-0.018
Tokenj
Token ID73
Feature activation-5.21e-02

Pos logit contributions
 knights+0.023
 explorers+0.023
 pans+0.023
 caves+0.022
 there+0.022
Neg logit contributions
ada-0.024
aman-0.022
 punish-0.022
upid-0.021
 realize-0.020
Tokenamas
Token ID17485
Feature activation+5.24e-03

Pos logit contributions
ada+0.083
aman+0.080
 AW+0.076
 punish+0.075
 hate+0.073
Neg logit contributions
 knights-0.077
 triangles-0.075
 explorers-0.073
 rockets-0.070
 slices-0.069
Token,
Token ID11
Feature activation-3.15e-03

Pos logit contributions
 explorers+0.008
 knights+0.008
 caves+0.006
 triangles+0.006
 rockets+0.006
Neg logit contributions
ada-0.010
aman-0.010
mine-0.008
 AW-0.008
ige-0.008
Token feeling
Token ID4203
Feature activation+1.64e-03

Pos logit contributions
ada+0.006
aman+0.005
mine+0.005
ige+0.004
 AW+0.004
Neg logit contributions
 knights-0.005
 explorers-0.005
 triangles-0.004
 caves-0.004
 lair-0.004
Token so
Token ID523
Feature activation+3.11e-03

Pos logit contributions
 knights+0.002
 explorers+0.002
 triangles+0.002
 caves+0.002
 lair+0.002
Neg logit contributions
ada-0.003
aman-0.003
mine-0.003
inkle-0.003
 punish-0.003
Token proud
Token ID6613
Feature activation-5.95e-05

Pos logit contributions
 knights+0.004
 explorers+0.004
 triangles+0.003
 caves+0.003
 lair+0.003
Neg logit contributions
ada-0.007
aman-0.006
mine-0.006
inkle-0.006
 AW-0.005
Token still
Token ID991
Feature activation-7.92e-03

Pos logit contributions
ada+0.010
aman+0.010
 punish+0.009
mine+0.008
ige+0.008
Neg logit contributions
 knights-0.011
 explorers-0.011
 triangles-0.009
 caves-0.009
 slices-0.009
Token your
Token ID534
Feature activation-2.07e-02

Pos logit contributions
ada+0.014
aman+0.013
inkle+0.012
 punish+0.012
 hate+0.011
Neg logit contributions
 knights-0.012
 explorers-0.012
 triangles-0.011
and-0.011
 caves-0.010
Token friend
Token ID1545
Feature activation-2.04e-02

Pos logit contributions
ada+0.037
aman+0.036
mine+0.033
 AW+0.030
ir+0.030
Neg logit contributions
 knights-0.034
 explorers-0.034
 triangles-0.029
 caves-0.029
 pirates-0.029
Token.
Token ID13
Feature activation-7.82e-03

Pos logit contributions
ada+0.032
aman+0.031
mine+0.028
use+0.026
 punish+0.025
Neg logit contributions
 knights-0.037
 explorers-0.036
 triangles-0.033
 caves-0.033
 lair-0.031
Token Acc
Token ID6366
Feature activation+5.67e-04

Pos logit contributions
ada+0.014
aman+0.014
mine+0.012
ige+0.011
 AW+0.011
Neg logit contributions
 knights-0.013
 explorers-0.013
 triangles-0.011
 caves-0.011
 rockets-0.010
Tokenidents
Token ID3231
Feature activation-5.40e-02

Pos logit contributions
 knights+0.001
 explorers+0.001
 triangles+0.001
 pirates+0.001
 rockets+0.001
Neg logit contributions
ada-0.000
aman-0.000
mine-0.000
ir-0.000
use-0.000
Token happen
Token ID1645
Feature activation-1.54e-02

Pos logit contributions
aman+0.106
ada+0.103
ishly+0.086
aley+0.086
ige+0.086
Neg logit contributions
 he-0.080
 there-0.079
 explorers-0.077
 knights-0.071
 his-0.069
Token,
Token ID11
Feature activation-1.37e-02

Pos logit contributions
ada+0.023
aman+0.022
mine+0.019
use+0.018
 AW+0.017
Neg logit contributions
 knights-0.029
 explorers-0.029
 caves-0.026
 triangles-0.025
 lair-0.025
Token but
Token ID475
Feature activation-7.84e-03

Pos logit contributions
ada+0.024
aman+0.023
mine+0.020
 AW+0.019
ige+0.019
Neg logit contributions
 knights-0.023
 explorers-0.023
 caves-0.020
 triangles-0.020
 lair-0.020
Token we
Token ID356
Feature activation+1.17e-02

Pos logit contributions
ada+0.012
aman+0.012
mine+0.010
use+0.010
 punish+0.010
Neg logit contributions
 knights-0.014
 explorers-0.014
 triangles-0.012
 caves-0.012
 lair-0.012
Token can
Token ID460
Feature activation-1.19e-02

Pos logit contributions
 explorers+0.016
 knights+0.015
 there+0.014
 brothers+0.013
 pirates+0.013
Neg logit contributions
aman-0.026
ada-0.023
 AW-0.021
ING-0.021
 ME-0.021
TokenSh
Token ID2484
Feature activation-9.33e-03

Pos logit contributions
ada+0.043
aman+0.042
mine+0.037
use+0.034
 AW+0.034
Neg logit contributions
 knights-0.047
 explorers-0.047
 caves-0.042
 triangles-0.041
 lair-0.040
Tokenrimp
Token ID23750
Feature activation-7.75e-03

Pos logit contributions
aman+0.015
ada+0.014
 AW+0.014
ige+0.013
use+0.012
Neg logit contributions
 knights-0.015
and-0.015
 explorers-0.015
ull-0.014
 triangles-0.014
Token?
Token ID30
Feature activation-2.47e-03

Pos logit contributions
aman+0.012
ada+0.011
 AW+0.009
mine+0.009
ige+0.009
Neg logit contributions
 explorers-0.015
 knights-0.014
 triangles-0.013
 there-0.013
 caves-0.013
Token What
Token ID1867
Feature activation+1.65e-02

Pos logit contributions
ada+0.005
aman+0.004
mine+0.004
ige+0.004
 AW+0.004
Neg logit contributions
 knights-0.004
 explorers-0.004
 triangles-0.003
 caves-0.003
 rockets-0.003
Token is
Token ID318
Feature activation-1.32e-02

Pos logit contributions
 explorers+0.035
 lair+0.035
 valleys+0.035
 knights+0.034
 caves+0.034
Neg logit contributions
ada-0.015
aman-0.014
're-0.013
 hate-0.013
 punish-0.013
Token shrimp
Token ID30060
Feature activation-4.94e-02

Pos logit contributions
ada+0.021
aman+0.020
ige+0.017
mine+0.017
 AW+0.017
Neg logit contributions
 knights-0.022
 explorers-0.022
 lair-0.020
 caves-0.020
 triangles-0.019
Token?"
Token ID1701
Feature activation-1.40e-02

Pos logit contributions
ada+0.069
aman+0.068
mine+0.056
 AW+0.053
ige+0.052
Neg logit contributions
 explorers-0.101
 knights-0.101
 caves-0.089
 triangles-0.089
 there-0.088
Token L
Token ID406
Feature activation-1.37e-03

Pos logit contributions
ada+0.022
aman+0.020
mine+0.019
ir+0.017
 ME+0.016
Neg logit contributions
 explorers-0.026
 knights-0.025
 lair-0.024
 caves-0.023
 rockets-0.023
Tokenila
Token ID10102
Feature activation+1.56e-02

Pos logit contributions
aman+0.003
ada+0.002
 AW+0.002
use+0.002
 punish+0.002
Neg logit contributions
 knights-0.002
 explorers-0.002
 triangles-0.002
 there-0.002
 pirates-0.002
Token asked
Token ID1965
Feature activation-9.32e-04

Pos logit contributions
 knights+0.019
 explorers+0.019
 triangles+0.016
 caves+0.015
 rockets+0.014
Neg logit contributions
ada-0.034
aman-0.033
mine-0.030
 AW-0.029
ige-0.029
Token.
Token ID13
Feature activation+8.86e-03

Pos logit contributions
ada+0.002
aman+0.001
mine+0.001
ir+0.001
ING+0.001
Neg logit contributions
 explorers-0.002
 knights-0.002
 caves-0.001
 lair-0.001
 triangles-0.001
Token Nem
Token ID22547
Feature activation-1.09e-02

Pos logit contributions
ada+0.009
aman+0.009
mine+0.008
ige+0.007
 AW+0.007
Neg logit contributions
 knights-0.005
 explorers-0.005
 triangles-0.004
 caves-0.004
 rockets-0.004
Tokeno
Token ID78
Feature activation+1.95e-02

Pos logit contributions
ada+0.024
aman+0.023
mine+0.021
awk+0.021
use+0.021
Neg logit contributions
 knights-0.012
 explorers-0.012
 triangles-0.011
 rockets-0.010
 importantly-0.010
Token and
Token ID290
Feature activation+1.29e-02

Pos logit contributions
 knights+0.030
 explorers+0.030
 triangles+0.026
 caves+0.025
 rockets+0.025
Neg logit contributions
ada-0.036
aman-0.035
mine-0.031
use-0.029
ir-0.029
Token his
Token ID465
Feature activation+5.02e-03

Pos logit contributions
 knights+0.016
 explorers+0.016
 importantly+0.015
 slices+0.015
 lair+0.015
Neg logit contributions
ada-0.024
mine-0.023
aman-0.022
 hug-0.022
 say-0.021
Token friend
Token ID1545
Feature activation-1.35e-02

Pos logit contributions
 caves+0.005
 knights+0.005
 explorers+0.005
 nights+0.005
and+0.005
Neg logit contributions
ada-0.010
ige-0.009
ING-0.009
aman-0.008
awk-0.008
Token D
Token ID360
Feature activation-5.14e-02

Pos logit contributions
ada+0.024
aman+0.023
mine+0.021
ige+0.019
use+0.019
Neg logit contributions
 knights-0.022
 explorers-0.021
 triangles-0.019
 caves-0.019
 lair-0.018
Tokenory
Token ID652
Feature activation-3.28e-02

Pos logit contributions
aman+0.070
 AW+0.059
ada+0.057
inkle+0.054
 punish+0.052
Neg logit contributions
 explorers-0.105
 knights-0.105
and-0.093
 pirates-0.093
 triangles-0.091
Token were
Token ID547
Feature activation+6.59e-03

Pos logit contributions
ada+0.066
aman+0.065
mine+0.058
 AW+0.055
ige+0.055
Neg logit contributions
 explorers-0.046
 knights-0.046
 triangles-0.038
 caves-0.038
 pirates-0.036
Token swimming
Token ID14899
Feature activation-6.38e-04

Pos logit contributions
 knights+0.009
 explorers+0.009
 slices+0.008
 triangles+0.008
 shields+0.008
Neg logit contributions
ada-0.013
aman-0.012
mine-0.011
ir-0.011
 punish-0.011
Token around
Token ID1088
Feature activation-1.90e-02

Pos logit contributions
ada+0.001
mine+0.001
aman+0.001
ir+0.001
awk+0.001
Neg logit contributions
 knights-0.001
 explorers-0.001
 importantly-0.001
 lair-0.001
 triangles-0.001
Token when
Token ID618
Feature activation-1.52e-03

Pos logit contributions
ada+0.034
aman+0.033
mine+0.030
ir+0.030
use+0.028
Neg logit contributions
 knights-0.027
 explorers-0.026
 lair-0.024
 queens-0.024
 shields-0.024