TOP ACTIVATIONS
MAX = 0.139

deaf . He could sign hello , thank you , hungry
M um and Dad took Flor a to the park .
Tim my liked to pick up rocks and sticks and carry
and Ben want to say hello . They crawl
girl who loved playing dress up . Every day she would
. She liked to say hello to the people who worked
y cat watching him from afar . Twe ety didn 't
. They wanted to say hello . They walked
She loved to play dress up and wear pretty clothes .
C urious , Sam went nearer to the pole to take
She loved to play dress up and put on pretty clothes
L ily wanted to say hello to the goose , so
. She wants to say hello to them .
who loved to play dress up . He had a big
She loved to play dress up with her friends . One
who like to play dress up . They have many hats
deaf . She could sign hello , thank you , hungry
Mia liked to play dress up with her clothes . She
. Lily wanted to say hello to the lizard , but
" Can we go closer , mom ?" Lily asked

MIN ACTIVATIONS
MIN = -0.184

!" The chip m unk finished the treat
you ". The chip m unk and Sam were
place ! The chip mun ks didn 't know
! The chip m unk was excited and
again . The chip mun ks were so thankful
. The chip m unk told the boy
too . The chip m unk helped the frog
around . Suddenly , the chip m unk stopped and looked
after that . Whenever the chip m unk needed help ,
floor . The chip mun ks were very curious
leave !" Suddenly , the chip mun ks were free again
!" said Lily . The por cup ine said nothing and
por cup ine ! The por cup ine qu illed and
When the chip m unk reached her home
chip m unk ! The chip m unk said to Sally
bird . Luckily , the chip m unk said they should
in her paws . The chip m unk got closer and
boy finally understood why the chip m unk had been so
. The av oc ados were shared with everybody
. The town spe ople were relieved . They

INTERVAL 0.058 - 0.139
CONTAINS 3.047%

cried . The end . <|endoftext|> Once upon a time ,
and Lily were friends who liked to play outside . They
bowls of salad . They look green and red and yellow
Anna . She wanted to learn something new . So one
out of the fireplace to explore ." The boy

INTERVAL -0.022 - 0.058
CONTAINS 75.877%

Tim my felt guilty for not listening to his mom ,
at her seriously . " I have to do some work
fountain and felt very sleepy . He decided to take a
playing baseball again . Danny was proud of himself for being
to play with her mom 's purse . She liked to

INTERVAL -0.103 - -0.022
CONTAINS 21.033%

not looking . " Hey , that 's our
One day , the dad wanted to do something special
store , but when Lily took a bite , she made
some chickens . The animals were very happy living on the
" Oh , honey , I 'm sorry to hear

INTERVAL -0.184 - -0.103
CONTAINS 0.044%

keeper told Louise that the oil had run out very quickly
rod . He cast the line in the water , and
would work , but the therm ometer still couldn 't show
s wrong , rh in oc eros ?" said
But they see another long line of kids waiting . They
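
The four INTERVAL rows look like equal-width bins spanning MIN to MAX. The bin width is (0.139 - (-0.184)) / 4 = 0.08075, which reproduces the printed edges -0.103, -0.022 and 0.058 to within three-decimal rounding. The four CONTAINS percentages add up to 100.001%, which is 100% up to rounding. The Python sketch below checks that reading. The bin count of four and the helper names are assumptions inferred from the printed numbers, not documented dashboard behavior.

```python
# Sketch (assumption): the INTERVAL rows are 4 equal-width bins between MIN and MAX.
MIN_ACT = -0.184
MAX_ACT = 0.139

# Values as printed in the INTERVAL / CONTAINS rows, lowest bin first.
printed_edges = [-0.184, -0.103, -0.022, 0.058, 0.139]
contains_pct = [0.044, 21.033, 75.877, 3.047]


def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Return n_bins + 1 evenly spaced edges from lo to hi (hypothetical helper)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def matches_printed(edges: list[float], printed: list[float], tol: float = 1e-3) -> bool:
    """Compare computed edges with 3-decimal printed ones, allowing rounding slack."""
    return len(edges) == len(printed) and all(
        abs(c - p) <= tol for c, p in zip(edges, printed)
    )


edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)
for computed, printed in zip(edges, printed_edges):
    print(f"computed {computed:+.5f}   printed {printed:+.3f}")
print("edges consistent:", matches_printed(edges, printed_edges))
print(f"CONTAINS total: {sum(contains_pct):.3f}%")
```

Under this reading, the 75.877% bin (-0.022 to 0.058) is the one that straddles zero. That bin is where most tokens fall.
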

PER-TOKEN DETAILS

Token deaf
Token ID21815
Feature activation+3.10e-02

Pos logit contributions
 English+0.005
 Spanish+0.005
 clearly+0.004
 closer+0.004
 wonders+0.004
Neg logit contributions
bank-0.008
 chip-0.008
keeping-0.008
rim-0.008
 burgers-0.008
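
In the per-token blocks, every value is glued to its label with no separator, as in "Feature activation+3.10e-02", " English+0.005" and "bank-0.008". A leading space on a logit token appears to be significant: it marks a token that starts with a space (" chip" versus "bank"). The "Token" lines seem to follow the same convention ("Token deaf" for " deaf", "Token." for "."). The following is a minimal Python sketch for splitting such lines without losing that space. The regular expression and split_label_value are assumptions about the flattened layout. They only handle lines that end in a signed number, so "Token ..." and "Token ID..." lines are rejected.

```python
import re

# Assumption: each activation/logit line is "<label><signed number>" with no separator.
# The lazy group keeps any leading space, which marks a space-prefixed token.
VALUE_RE = re.compile(r"^(.*?)([+-]\d+\.\d+(?:e[+-]\d+)?)$")


def split_label_value(line: str) -> tuple[str, float]:
    """Split a line such as ' English+0.005' into (' English', 0.005)."""
    m = VALUE_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"no trailing signed number in {line!r}")
    return m.group(1), float(m.group(2))


block = [
    "Feature activation+3.10e-02",
    " English+0.005",
    "bank-0.008",
    " chip-0.008",
]
for line in block:
    label, value = split_label_value(line)
    print(repr(label), value)
```
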
Token.
Token ID13
Feature activation+2.61e-02

Pos logit contributions
 English+0.030
 Spanish+0.027
 clearly+0.025
 wonders+0.022
 closer+0.022
Neg logit contributions
 chip-0.070
rim-0.069
bank-0.067
 burgers-0.066
 lin-0.066
Token He
Token ID679
Feature activation+4.78e-02

Pos logit contributions
 English+0.024
 Spanish+0.022
adies+0.022
 clearly+0.021
 One+0.020
Neg logit contributions
 chip-0.057
bank-0.053
 burgers-0.053
ages-0.052
 ir-0.051
Token could
Token ID714
Feature activation+4.22e-02

Pos logit contributions
 English+0.062
 clearly+0.059
 Spanish+0.057
 wonders+0.056
adies+0.055
Neg logit contributions
 chip-0.089
 burgers-0.082
bank-0.082
 tacos-0.078
ages-0.077
Token sign
Token ID1051
Feature activation+5.53e-02

Pos logit contributions
adies+0.053
 English+0.050
itates+0.047
 Spanish+0.047
 clearly+0.045
Neg logit contributions
 chip-0.078
bank-0.072
 burgers-0.070
 tacos-0.068
ards-0.066
Token hello
Token ID23748
Feature activation+1.39e-01

Pos logit contributions
 English+0.069
 Spanish+0.066
 closer+0.063
 clearly+0.062
 hello+0.056
Neg logit contributions
 chip-0.107
bank-0.104
rim-0.099
ages-0.099
 burgers-0.097
Token,
Token ID11
Feature activation+3.64e-02

Pos logit contributions
 English+0.191
 Spanish+0.179
adies+0.178
 closer+0.175
 clearly+0.175
Neg logit contributions
 chip-0.242
bank-0.227
 burgers-0.222
 tacos-0.216
 ir-0.213
Token thank
Token ID5875
Feature activation+2.58e-02

Pos logit contributions
 English+0.041
 Spanish+0.039
 clearly+0.037
 closer+0.036
 hello+0.034
Neg logit contributions
 chip-0.076
bank-0.073
ages-0.070
akers-0.070
rim-0.070
Token you
Token ID345
Feature activation+1.70e-02

Pos logit contributions
 English+0.028
 Spanish+0.027
adies+0.026
 closer+0.025
 clearly+0.025
Neg logit contributions
bank-0.053
 chip-0.052
 burgers-0.047
 bulldo-0.047
 pitcher-0.046
Token,
Token ID11
Feature activation+2.38e-02

Pos logit contributions
 English+0.021
 Spanish+0.020
 closer+0.019
 clearly+0.017
adies+0.017
Neg logit contributions
 chip-0.034
rim-0.034
bank-0.033
 pitcher-0.032
 tacos-0.031
Token hungry
Token ID14720
Feature activation+2.87e-04

Pos logit contributions
 English+0.026
 Spanish+0.024
 closer+0.023
 clearly+0.023
 hello+0.021
Neg logit contributions
 chip-0.051
bank-0.049
rim-0.047
akers-0.047
ages-0.046
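
The eleven blocks above are the tokens of the first TOP ACTIVATIONS row ("deaf . He could sign hello , thank you , hungry"). The largest "Feature activation" among them is " hello" at +1.39e-01, which matches MAX = 0.139 at the top of the page. The Python sketch below performs that lookup. The values are transcribed by hand from the blocks, and peak is a hypothetical helper.

```python
# Per-token activations of the first top-activation row, copied from the
# "Feature activation" lines above (leading spaces kept as in the token labels).
first_example = [
    (" deaf", 3.10e-02),
    (".", 2.61e-02),
    (" He", 4.78e-02),
    (" could", 4.22e-02),
    (" sign", 5.53e-02),
    (" hello", 1.39e-01),
    (",", 3.64e-02),
    (" thank", 2.58e-02),
    (" you", 1.70e-02),
    (",", 2.38e-02),
    (" hungry", 2.87e-04),
]


def peak(tokens):
    """Return the (token, activation) pair with the largest activation."""
    return max(tokens, key=lambda pair: pair[1])


token, value = peak(first_example)
print(f"peak token {token!r} at {value:.3f}")
```
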
TokenM
Token ID44
Feature activation-2.88e-03

Pos logit contributions
 English+0.075
adies+0.072
 Spanish+0.070
 clearly+0.068
 closer+0.067
Neg logit contributions
 chip-0.131
bank-0.123
 burgers-0.120
 ir-0.116
ages-0.116
Tokenum
Token ID388
Feature activation+7.52e-03

Pos logit contributions
 chip+0.005
bank+0.005
 burgers+0.005
 tacos+0.004
 ir+0.004
Neg logit contributions
 English-0.004
adies-0.004
 Spanish-0.004
 clearly-0.004
 closer-0.004
Token and
Token ID290
Feature activation+3.97e-02

Pos logit contributions
adies+0.008
 opens+0.007
 clearly+0.007
 wonders+0.007
 closely+0.007
Neg logit contributions
 chip-0.015
bank-0.015
ages-0.014
 burgers-0.014
 tacos-0.014
Token Dad
Token ID17415
Feature activation+4.72e-03

Pos logit contributions
 hello+0.043
 closer+0.042
 English+0.040
adies+0.037
 upon+0.036
Neg logit contributions
bank-0.084
 chip-0.081
rim-0.080
ages-0.079
akers-0.076
Token took
Token ID1718
Feature activation+1.97e-02

Pos logit contributions
adies+0.006
 closer+0.006
 English+0.005
 clearly+0.005
bell+0.005
Neg logit contributions
 chip-0.009
bank-0.009
ages-0.008
rim-0.008
 tacos-0.008
Token Flor
Token ID4432
Feature activation+1.32e-01

Pos logit contributions
 English+0.020
 closer+0.020
adies+0.020
 Spanish+0.019
 closely+0.019
Neg logit contributions
bank-0.040
 chip-0.039
stack-0.037
rim-0.037
 bulldo-0.035
Tokena
Token ID64
Feature activation+4.11e-02

Pos logit contributions
 English+0.181
adies+0.174
 Spanish+0.170
 clearly+0.167
 closer+0.161
Neg logit contributions
 chip-0.222
bank-0.207
 burgers-0.201
ages-0.198
 ir-0.196
Token to
Token ID284
Feature activation+2.13e-02

Pos logit contributions
 English+0.050
 closer+0.050
 Spanish+0.047
adies+0.046
 closely+0.046
Neg logit contributions
 chip-0.076
bank-0.075
rim-0.070
 bulldo-0.068
 burgers-0.068
Token the
Token ID262
Feature activation+6.71e-02

Pos logit contributions
 English+0.025
 Spanish+0.024
 closer+0.023
 clearly+0.022
adies+0.022
Neg logit contributions
 chip-0.040
bank-0.040
ages-0.040
akers-0.038
rim-0.038
Token park
Token ID3952
Feature activation+6.77e-02

Pos logit contributions
 English+0.098
 Spanish+0.093
adies+0.092
 closer+0.092
 clearly+0.090
Neg logit contributions
 chip-0.107
bank-0.102
ages-0.098
 burgers-0.097
 ir-0.095
Token.
Token ID13
Feature activation+2.90e-02

Pos logit contributions
 closer+0.055
 English+0.051
 One+0.051
 Spanish+0.045
 closely+0.045
Neg logit contributions
 chip-0.154
rim-0.152
 ir-0.146
 tacos-0.144
 bulldo-0.144
Token Tim
Token ID5045
Feature activation+1.35e-02

Pos logit contributions
 English+0.044
adies+0.043
 Spanish+0.042
 clearly+0.041
 closer+0.041
Neg logit contributions
 chip-0.035
bank-0.032
 burgers-0.031
ages-0.030
 tacos-0.030
Tokenmy
Token ID1820
Feature activation+3.52e-02

Pos logit contributions
 English+0.019
adies+0.018
 Spanish+0.018
 closer+0.017
 clearly+0.016
Neg logit contributions
 chip-0.023
bank-0.022
 ir-0.021
ards-0.021
 burgers-0.021
Token liked
Token ID8288
Feature activation+8.58e-02

Pos logit contributions
 English+0.045
adies+0.043
 Spanish+0.042
 clearly+0.042
 closer+0.041
Neg logit contributions
 chip-0.062
bank-0.058
 burgers-0.057
 tacos-0.055
 ir-0.054
Token to
Token ID284
Feature activation+2.97e-02

Pos logit contributions
 English+0.081
 closer+0.075
 Spanish+0.075
 wonders+0.071
 clearly+0.070
Neg logit contributions
bank-0.176
rim-0.172
 chip-0.170
akers-0.167
ages-0.164
Token pick
Token ID2298
Feature activation+4.27e-02

Pos logit contributions
 English+0.033
adies+0.032
 Spanish+0.031
 closer+0.030
 clearly+0.030
Neg logit contributions
 chip-0.057
bank-0.055
 burgers-0.053
ages-0.051
 ir-0.051
Token up
Token ID510
Feature activation+1.29e-01

Pos logit contributions
 English+0.051
adies+0.049
 Spanish+0.048
 closer+0.048
 clearly+0.047
Neg logit contributions
 chip-0.079
bank-0.075
 burgers-0.072
ages-0.071
 tacos-0.070
Token rocks
Token ID12586
Feature activation+8.04e-02

Pos logit contributions
 English+0.180
 Spanish+0.171
adies+0.170
 closer+0.166
 clearly+0.162
Neg logit contributions
 chip-0.219
bank-0.215
rim-0.191
ages-0.189
 burgers-0.188
Token and
Token ID290
Feature activation+6.55e-02

Pos logit contributions
 closer+0.086
 closely+0.080
 clearly+0.073
 English+0.068
 Spanish+0.064
Neg logit contributions
 chip-0.165
 tacos-0.158
rim-0.157
bank-0.156
 burgers-0.155
Token sticks
Token ID16461
Feature activation+3.72e-02

Pos logit contributions
 closer+0.075
 closely+0.067
 wonders+0.067
 English+0.066
 clearly+0.064
Neg logit contributions
 chip-0.129
rim-0.121
bank-0.121
ages-0.121
akers-0.117
Token and
Token ID290
Feature activation+5.90e-02

Pos logit contributions
 closer+0.040
 clearly+0.036
 English+0.036
 closely+0.034
 Spanish+0.033
Neg logit contributions
 chip-0.079
 burgers-0.073
 tacos-0.072
ages-0.071
bank-0.071
Token carry
Token ID3283
Feature activation+2.40e-02

Pos logit contributions
 closer+0.071
 closely+0.062
 wonders+0.062
 English+0.061
 One+0.060
Neg logit contributions
 chip-0.113
rim-0.111
bank-0.108
ages-0.105
akers-0.103
Token and
Token ID290
Feature activation-1.54e-03

Pos logit contributions
 English+0.016
adies+0.015
 Spanish+0.015
 clearly+0.015
 closer+0.014
Neg logit contributions
 chip-0.019
bank-0.018
 burgers-0.018
ages-0.017
 tacos-0.017
Token Ben
Token ID3932
Feature activation-8.91e-03

Pos logit contributions
 chip+0.003
bank+0.003
ages+0.003
akers+0.003
rim+0.003
Neg logit contributions
 English-0.002
adies-0.001
 Spanish-0.001
 closer-0.001
 clearly-0.001
Token want
Token ID765
Feature activation+4.77e-02

Pos logit contributions
 chip+0.016
bank+0.016
 burgers+0.015
 tacos+0.015
 ge+0.015
Neg logit contributions
 English-0.011
adies-0.010
 Spanish-0.010
 clearly-0.009
bell-0.009
Token to
Token ID284
Feature activation+1.73e-02

Pos logit contributions
 closer+0.056
 English+0.055
 Spanish+0.051
adies+0.051
 clearly+0.050
Neg logit contributions
 chip-0.089
bank-0.088
akers-0.085
 ir-0.085
rim-0.085
Token say
Token ID910
Feature activation+3.36e-02

Pos logit contributions
 English+0.020
 closer+0.019
adies+0.019
 Spanish+0.019
 clearly+0.018
Neg logit contributions
 chip-0.032
bank-0.031
 ir-0.030
 burgers-0.030
ages-0.030
Token hello
Token ID23748
Feature activation+1.29e-01

Pos logit contributions
 English+0.014
adies+0.014
 clearly+0.012
itates+0.011
 Spanish+0.011
Neg logit contributions
 chip-0.087
 burgers-0.083
bank-0.083
 tacos-0.081
ages-0.080
Token.
Token ID13
Feature activation+1.94e-02

Pos logit contributions
 English+0.204
adies+0.203
 Spanish+0.196
 clearly+0.190
itates+0.186
Neg logit contributions
 chip-0.188
bank-0.172
 burgers-0.166
ages-0.162
 tacos-0.156
Token\n
Token ID198
Feature activation+3.56e-02

Pos logit contributions
 English+0.016
 closer+0.015
 One+0.014
 Spanish+0.014
adies+0.014
Neg logit contributions
 chip-0.042
bank-0.041
rim-0.041
akers-0.041
 burgers-0.041
Token\n
Token ID198
Feature activation+2.86e-02

Pos logit contributions
 English+0.047
adies+0.046
 Spanish+0.044
 clearly+0.042
 diary+0.042
Neg logit contributions
 chip-0.063
bank-0.058
 burgers-0.056
 tacos-0.053
ages-0.053
TokenThey
Token ID2990
Feature activation-4.32e-03

Pos logit contributions
One+0.028
 closer+0.026
 English+0.025
adies+0.023
itates+0.023
Neg logit contributions
bank-0.062
 chip-0.060
keeping-0.060
ages-0.058
 tacos-0.058
Token crawl
Token ID27318
Feature activation+3.04e-02

Pos logit contributions
 chip+0.007
bank+0.007
 burgers+0.007
 tacos+0.007
 ir+0.006
Neg logit contributions
 English-0.006
adies-0.006
 Spanish-0.005
 clearly-0.005
itates-0.005
Token girl
Token ID2576
Feature activation+2.87e-02

Pos logit contributions
 English+0.038
 closer+0.036
 Spanish+0.034
 clearly+0.033
 hello+0.031
Neg logit contributions
bank-0.086
ages-0.083
rim-0.082
 tacos-0.082
 burgers-0.082
Token who
Token ID508
Feature activation+1.53e-02

Pos logit contributions
 English+0.021
 Spanish+0.019
 closer+0.019
 clearly+0.018
 One+0.018
Neg logit contributions
 burgers-0.068
bank-0.067
 tacos-0.067
 chip-0.066
ages-0.066
Token loved
Token ID6151
Feature activation+8.85e-02

Pos logit contributions
 wonders+0.014
 clearly+0.014
 English+0.014
 One+0.013
 name+0.013
Neg logit contributions
 chip-0.033
 burgers-0.032
bank-0.030
rim-0.030
 tacos-0.030
Token playing
Token ID2712
Feature activation+6.01e-02

Pos logit contributions
 English+0.072
 Spanish+0.068
 closer+0.067
 wonders+0.067
 diary+0.065
Neg logit contributions
bank-0.194
rim-0.193
ages-0.188
akers-0.185
 chip-0.185
Token dress
Token ID6576
Feature activation+7.28e-02

Pos logit contributions
 closer+0.062
 English+0.061
 closely+0.059
adies+0.057
 Spanish+0.057
Neg logit contributions
 chip-0.123
bank-0.116
ages-0.112
 burgers-0.111
 tacos-0.108
Token up
Token ID510
Feature activation+1.29e-01

Pos logit contributions
 English+0.088
 clearly+0.086
 closely+0.086
 Spanish+0.085
 closer+0.084
Neg logit contributions
 chip-0.130
bank-0.125
 burgers-0.121
 bulldo-0.119
 ge-0.119
Token.
Token ID13
Feature activation+2.99e-02

Pos logit contributions
 English+0.140
 closer+0.140
 clearly+0.134
 closely+0.133
 Spanish+0.132
Neg logit contributions
 chip-0.264
bank-0.259
ages-0.252
 tacos-0.248
 bulldo-0.247
Token Every
Token ID3887
Feature activation+5.64e-02

Pos logit contributions
 English+0.019
adies+0.019
 Spanish+0.017
 closer+0.017
 clearly+0.016
Neg logit contributions
 chip-0.072
bank-0.069
 burgers-0.067
 ir-0.066
 tacos-0.066
Token day
Token ID1110
Feature activation-4.37e-02

Pos logit contributions
 closer+0.081
 English+0.081
 One+0.080
 Spanish+0.078
One+0.077
Neg logit contributions
 chip-0.090
ages-0.089
akers-0.087
 burgers-0.087
rim-0.085
Token she
Token ID673
Feature activation+1.70e-02

Pos logit contributions
bank+0.103
 chip+0.101
rim+0.099
 bulldo+0.097
ards+0.097
Neg logit contributions
adies-0.032
 English-0.032
 closer-0.031
 hello-0.029
bell-0.029
Token would
Token ID561
Feature activation+2.27e-02

Pos logit contributions
 wonders+0.016
 English+0.016
bell+0.016
 clearly+0.016
 closely+0.016
Neg logit contributions
 chip-0.036
bank-0.033
 burgers-0.033
rim-0.032
keeping-0.032
Token.
Token ID13
Feature activation+1.89e-02

Pos logit contributions
 English+0.048
adies+0.045
 clearly+0.045
 Spanish+0.044
 closer+0.041
Neg logit contributions
 chip-0.110
bank-0.106
 bulldo-0.103
 burgers-0.102
rim-0.101
Token She
Token ID1375
Feature activation+4.06e-02

Pos logit contributions
eks+0.024
adies+0.024
 nearer+0.023
 English+0.022
ieve+0.022
Neg logit contributions
 chip-0.035
 burgers-0.031
bank-0.031
 tacos-0.030
 hamb-0.029
Token liked
Token ID8288
Feature activation+7.45e-02

Pos logit contributions
 English+0.053
adies+0.051
 clearly+0.050
 Spanish+0.049
 closer+0.049
Neg logit contributions
 chip-0.072
bank-0.068
 burgers-0.066
ages-0.064
 tacos-0.063
Token to
Token ID284
Feature activation+1.88e-02

Pos logit contributions
 English+0.074
 Spanish+0.069
 closer+0.066
 clearly+0.065
 closely+0.061
Neg logit contributions
ages-0.154
bank-0.153
rim-0.152
 chip-0.148
akers-0.147
Token say
Token ID910
Feature activation+3.89e-02

Pos logit contributions
 English+0.019
adies+0.018
 Spanish+0.018
 clearly+0.018
 closer+0.018
Neg logit contributions
 chip-0.039
bank-0.037
 burgers-0.036
ages-0.036
rim-0.035
Token hello
Token ID23748
Feature activation+1.29e-01

Pos logit contributions
 English+0.033
 Spanish+0.031
adies+0.028
 hello+0.028
 closer+0.028
Neg logit contributions
 chip-0.089
bank-0.087
rim-0.084
akers-0.082
 burgers-0.082
Token to
Token ID284
Feature activation-1.15e-03

Pos logit contributions
 English+0.195
adies+0.194
 Spanish+0.186
 clearly+0.182
 closer+0.179
Neg logit contributions
 chip-0.198
bank-0.181
 burgers-0.177
ages-0.172
 ir-0.168
Token the
Token ID262
Feature activation+3.18e-02

Pos logit contributions
 chip+0.002
bank+0.002
 burgers+0.002
 tacos+0.002
 ir+0.002
Neg logit contributions
 English-0.002
adies-0.002
 Spanish-0.002
 closer-0.002
 clearly-0.002
Token people
Token ID661
Feature activation+1.77e-02

Pos logit contributions
 English+0.041
 Spanish+0.039
 name+0.039
 answer+0.038
 wonders+0.037
Neg logit contributions
bank-0.056
 chip-0.052
rim-0.052
ages-0.051
keeping-0.049
Token who
Token ID508
Feature activation+2.24e-02

Pos logit contributions
 English+0.018
 clearly+0.018
 closer+0.018
 Spanish+0.017
 closely+0.016
Neg logit contributions
 chip-0.036
bank-0.036
 tacos-0.034
 burgers-0.034
rim-0.034
Token worked
Token ID3111
Feature activation-2.62e-03

Pos logit contributions
 English+0.031
 clearly+0.031
 Spanish+0.030
 closer+0.030
adies+0.029
Neg logit contributions
 chip-0.037
bank-0.036
 burgers-0.034
ages-0.034
 tacos-0.033
Tokeny
Token ID88
Feature activation+1.26e-03

Pos logit contributions
 English+0.045
adies+0.045
 Spanish+0.043
 closer+0.042
 clearly+0.042
Neg logit contributions
 chip-0.033
bank-0.029
 ir-0.029
 burgers-0.028
akers-0.027
Token cat
Token ID3797
Feature activation+1.63e-02

Pos logit contributions
 closer+0.001
 English+0.001
 diary+0.001
 hello+0.001
 name+0.001
Neg logit contributions
bank-0.002
ages-0.002
 burgers-0.002
 tacos-0.002
keeping-0.002
Token watching
Token ID4964
Feature activation+2.09e-02

Pos logit contributions
 English+0.025
 closer+0.024
adies+0.024
 clearly+0.024
 Spanish+0.024
Neg logit contributions
 chip-0.024
 burgers-0.022
bank-0.022
 tacos-0.022
 ir-0.021
Token him
Token ID683
Feature activation+6.90e-02

Pos logit contributions
 English+0.022
 closer+0.021
 Spanish+0.021
 clearly+0.021
adies+0.020
Neg logit contributions
 chip-0.041
bank-0.039
ages-0.039
akers-0.039
 burgers-0.038
Token from
Token ID422
Feature activation-1.49e-02

Pos logit contributions
 English+0.099
adies+0.097
 Spanish+0.093
 clearly+0.090
 closer+0.090
Neg logit contributions
 chip-0.113
bank-0.106
 burgers-0.103
ages-0.099
 tacos-0.098
Token afar
Token ID44246
Feature activation+1.28e-01

Pos logit contributions
 chip+0.030
bank+0.029
 burgers+0.028
ages+0.028
 tacos+0.027
Neg logit contributions
 English-0.016
adies-0.015
 Spanish-0.014
 closer-0.014
 clearly-0.014
Token.
Token ID13
Feature activation+1.54e-02

Pos logit contributions
 English+0.077
 clearly+0.075
 closer+0.069
 closely+0.064
 Spanish+0.062
Neg logit contributions
rim-0.320
 chip-0.312
akers-0.307
 ir-0.305
 triple-0.303
Token Twe
Token ID24205
Feature activation+2.38e-02

Pos logit contributions
 English+0.015
 One+0.015
 closer+0.015
 clearly+0.014
adies+0.014
Neg logit contributions
 chip-0.030
 burgers-0.029
akers-0.029
 tacos-0.029
rim-0.029
Tokenety
Token ID2963
Feature activation-4.68e-03

Pos logit contributions
 English+0.044
adies+0.043
 hello+0.042
 Spanish+0.041
 clearly+0.041
Neg logit contributions
 chip-0.030
akers-0.029
rim-0.028
bank-0.027
 ir-0.027
Token didn
Token ID1422
Feature activation+1.79e-02

Pos logit contributions
 chip+0.007
bank+0.007
 burgers+0.007
 tacos+0.006
ages+0.006
Neg logit contributions
 English-0.007
adies-0.007
 Spanish-0.007
 clearly-0.007
 closer-0.007
Token't
Token ID470
Feature activation+4.72e-02

Pos logit contributions
 English+0.025
adies+0.024
 Spanish+0.023
 wonders+0.022
 nods+0.022
Neg logit contributions
 chip-0.033
bank-0.028
 burgers-0.027
 bulldo-0.027
 tacos-0.026
Token.
Token ID13
Feature activation+1.72e-02

Pos logit contributions
 English+0.078
adies+0.076
 Spanish+0.073
 closer+0.071
 clearly+0.071
Neg logit contributions
 chip-0.104
bank-0.098
 burgers-0.096
 tacos-0.093
ages-0.092
Token They
Token ID1119
Feature activation+5.06e-03

Pos logit contributions
 English+0.019
adies+0.018
 Spanish+0.018
 closer+0.017
 clearly+0.017
Neg logit contributions
 chip-0.034
bank-0.032
 burgers-0.031
ages-0.031
 tacos-0.031
Token wanted
Token ID2227
Feature activation+6.78e-02

Pos logit contributions
 English+0.008
adies+0.008
 Spanish+0.008
 clearly+0.008
itates+0.008
Neg logit contributions
 chip-0.007
bank-0.006
 burgers-0.006
 ir-0.006
 tacos-0.006
Token to
Token ID284
Feature activation+2.49e-02

Pos logit contributions
 English+0.094
 Spanish+0.088
 closer+0.088
adies+0.088
 clearly+0.086
Neg logit contributions
 chip-0.117
bank-0.111
 burgers-0.105
ages-0.104
 ir-0.103
Token say
Token ID910
Feature activation+3.66e-02

Pos logit contributions
adies+0.029
 English+0.028
 Spanish+0.026
itates+0.026
 clearly+0.025
Neg logit contributions
 chip-0.048
bank-0.045
 burgers-0.044
 tacos-0.043
 ir-0.042
Token hello
Token ID23748
Feature activation+1.28e-01

Pos logit contributions
adies+0.014
 English+0.014
 clearly+0.012
 Spanish+0.011
itates+0.011
Neg logit contributions
 chip-0.096
bank-0.092
 burgers-0.091
 tacos-0.089
ages-0.089
Token.
Token ID13
Feature activation+1.97e-02

Pos logit contributions
adies+0.212
 English+0.206
 Spanish+0.199
 clearly+0.192
itates+0.191
Neg logit contributions
 chip-0.176
bank-0.159
ages-0.154
 burgers-0.153
 hamb-0.142
Token\n
Token ID198
Feature activation+3.49e-02

Pos logit contributions
 English+0.014
 closer+0.014
 One+0.014
 clearly+0.012
 Spanish+0.012
Neg logit contributions
 chip-0.045
rim-0.044
bank-0.044
akers-0.043
ages-0.043
Token\n
Token ID198
Feature activation+2.72e-02

Pos logit contributions
 English+0.047
adies+0.047
 Spanish+0.045
 diary+0.043
 clearly+0.042
Neg logit contributions
 chip-0.060
bank-0.056
 burgers-0.054
 bulldo-0.051
ages-0.050
TokenThey
Token ID2990
Feature activation-6.64e-03

Pos logit contributions
One+0.029
 closer+0.025
 One+0.022
She+0.022
 English+0.021
Neg logit contributions
keeping-0.058
bank-0.058
 tacos-0.058
 chip-0.057
 hamb-0.055
Token walked
Token ID6807
Feature activation+1.68e-02

Pos logit contributions
 chip+0.010
bank+0.009
 burgers+0.009
 tacos+0.008
 ir+0.008
Neg logit contributions
 English-0.011
adies-0.010
 Spanish-0.010
 clearly-0.010
 closer-0.010
Token She
Token ID1375
Feature activation+4.31e-02

Pos logit contributions
eks+0.033
 nearer+0.033
adies+0.032
 English+0.032
 closer+0.032
Neg logit contributions
 chip-0.044
bank-0.037
 pitcher-0.036
 ir-0.036
 Mitt-0.036
Token loved
Token ID6151
Feature activation+8.79e-02

Pos logit contributions
 English+0.069
adies+0.067
 Spanish+0.066
 closer+0.064
 clearly+0.064
Neg logit contributions
 chip-0.065
bank-0.059
 burgers-0.057
ages-0.057
 ir-0.056
Token to
Token ID284
Feature activation+3.33e-02

Pos logit contributions
 English+0.082
 Spanish+0.077
 closer+0.076
 wonders+0.076
 hello+0.072
Neg logit contributions
bank-0.180
 chip-0.168
rim-0.168
ages-0.166
akers-0.164
Token play
Token ID711
Feature activation+8.15e-02

Pos logit contributions
 English+0.040
adies+0.038
 Spanish+0.037
 closer+0.036
 clearly+0.036
Neg logit contributions
 chip-0.062
bank-0.060
 burgers-0.057
ages-0.056
 ir-0.055
Token dress
Token ID6576
Feature activation+7.64e-02

Pos logit contributions
 English+0.109
adies+0.105
 Spanish+0.103
 clearly+0.100
 closer+0.099
Neg logit contributions
 chip-0.143
bank-0.133
 burgers-0.131
 tacos-0.126
ages-0.125
Token up
Token ID510
Feature activation+1.27e-01

Pos logit contributions
 English+0.132
adies+0.129
 Spanish+0.126
 clearly+0.123
 closer+0.122
Neg logit contributions
 chip-0.104
bank-0.094
 burgers-0.092
ages-0.089
 ir-0.088
Token and
Token ID290
Feature activation+5.42e-02

Pos logit contributions
 English+0.165
adies+0.159
 closer+0.156
 clearly+0.156
 Spanish+0.156
Neg logit contributions
 chip-0.226
bank-0.219
 burgers-0.208
 tacos-0.205
ages-0.202
Token wear
Token ID5806
Feature activation+4.20e-02

Pos logit contributions
 English+0.059
 closer+0.057
adies+0.055
 clearly+0.055
 Spanish+0.055
Neg logit contributions
 chip-0.108
bank-0.104
 burgers-0.099
ages-0.099
rim-0.098
Token pretty
Token ID2495
Feature activation+7.37e-02

Pos logit contributions
adies+0.055
itates+0.054
 English+0.054
 clearly+0.051
 Spanish+0.050
Neg logit contributions
 chip-0.071
 ir-0.068
 burgers-0.066
 tacos-0.065
ages-0.065
Token clothes
Token ID8242
Feature activation+2.91e-02

Pos logit contributions
itates+0.126
 English+0.125
adies+0.123
 clearly+0.122
 Spanish+0.120
Neg logit contributions
 ir-0.096
 chip-0.094
 burgers-0.087
 tacos-0.086
 oil-0.086
Token.
Token ID13
Feature activation+2.38e-02

Pos logit contributions
 English+0.032
 closer+0.031
 clearly+0.031
adies+0.031
 Spanish+0.030
Neg logit contributions
 chip-0.056
bank-0.054
 burgers-0.052
 tacos-0.051
 bulldo-0.050
TokenC
Token ID34
Feature activation-1.01e-02

Pos logit contributions
adies+0.018
 English+0.018
 clearly+0.018
 Him+0.017
 Spanish+0.017
Neg logit contributions
 chip-0.020
ages-0.018
bank-0.018
 burgers-0.017
 bulldo-0.017
Tokenurious
Token ID16421
Feature activation+4.52e-02

Pos logit contributions
 chip+0.019
bank+0.018
akers+0.018
rim+0.017
uc+0.017
Neg logit contributions
 English-0.013
adies-0.012
 hello-0.011
 closer-0.011
 Spanish-0.011
Token,
Token ID11
Feature activation-4.61e-03

Pos logit contributions
 English+0.076
adies+0.075
 Spanish+0.072
itates+0.072
 clearly+0.071
Neg logit contributions
 chip-0.063
bank-0.058
 burgers-0.056
 tacos-0.053
 ir-0.052
Token Sam
Token ID3409
Feature activation-9.42e-03

Pos logit contributions
 chip+0.008
 burgers+0.007
bank+0.007
 tacos+0.007
 ir+0.007
Neg logit contributions
 English-0.006
adies-0.006
 Spanish-0.005
 clearly-0.005
itates-0.005
Token went
Token ID1816
Feature activation-4.20e-03

Pos logit contributions
 chip+0.016
bank+0.015
 burgers+0.015
 tacos+0.014
ages+0.014
Neg logit contributions
 English-0.013
adies-0.013
 closer-0.013
 clearly-0.012
 Spanish-0.012
Token nearer
Token ID40671
Feature activation+1.27e-01

Pos logit contributions
 chip+0.007
bank+0.007
 burgers+0.007
 bulldo+0.006
 hamb+0.006
Neg logit contributions
 English-0.006
adies-0.005
 Spanish-0.005
 clearly-0.005
 diary-0.005
Token to
Token ID284
Feature activation-1.66e-02

Pos logit contributions
 English+0.162
 closer+0.157
 clearly+0.154
adies+0.152
 Spanish+0.151
Neg logit contributions
 chip-0.230
bank-0.214
 burgers-0.213
 tacos-0.209
ages-0.207
Token the
Token ID262
Feature activation+1.78e-02

Pos logit contributions
 chip+0.020
bank+0.018
 burgers+0.018
 ge+0.018
 hamb+0.016
Neg logit contributions
 English-0.030
adies-0.029
itates-0.029
 Spanish-0.029
azz-0.027
Token pole
Token ID16825
Feature activation-4.90e-03

Pos logit contributions
 English+0.037
adies+0.036
 Spanish+0.036
 clearly+0.035
 closer+0.035
Neg logit contributions
 chip-0.018
bank-0.016
 burgers-0.015
 tacos-0.014
ages-0.014
Token to
Token ID284
Feature activation-8.82e-03

Pos logit contributions
 chip+0.009
 burgers+0.009
 tacos+0.009
ages+0.009
 ir+0.009
Neg logit contributions
 English-0.006
 closer-0.005
 clearly-0.005
adies-0.005
 Spanish-0.005
Token take
Token ID1011
Feature activation+1.28e-02

Pos logit contributions
 chip+0.015
bank+0.014
 burgers+0.013
 tacos+0.013
 ge+0.013
Neg logit contributions
 English-0.012
adies-0.012
 Spanish-0.012
itates-0.011
 clearly-0.011
Token She
Token ID1375
Feature activation+4.22e-02

Pos logit contributions
 closer+0.043
adies+0.042
 nearer+0.040
eks+0.039
ieve+0.039
Neg logit contributions
 chip-0.049
 Su-0.048
bank-0.047
 ir-0.047
Su-0.044
Token loved
Token ID6151
Feature activation+8.73e-02

Pos logit contributions
 English+0.056
adies+0.055
 clearly+0.053
 Spanish+0.052
 closer+0.052
Neg logit contributions
 chip-0.073
bank-0.069
 burgers-0.067
 tacos-0.064
ages-0.064
Token to
Token ID284
Feature activation+3.42e-02

Pos logit contributions
 English+0.077
 wonders+0.075
 Spanish+0.072
 diary+0.071
 closer+0.070
Neg logit contributions
bank-0.184
 chip-0.173
rim-0.172
ages-0.168
stack-0.165
Token play
Token ID711
Feature activation+8.15e-02

Pos logit contributions
 English+0.038
adies+0.036
 Spanish+0.035
 closer+0.034
 clearly+0.034
Neg logit contributions
 chip-0.067
bank-0.064
 burgers-0.061
ages-0.060
akers-0.059
Token dress
Token ID6576
Feature activation+7.64e-02

Pos logit contributions
 English+0.091
adies+0.088
 closer+0.085
 Spanish+0.085
 clearly+0.083
Neg logit contributions
 chip-0.159
bank-0.152
 burgers-0.146
ages-0.143
 tacos-0.142
Token up
Token ID510
Feature activation+1.27e-01

Pos logit contributions
 English+0.125
adies+0.121
 Spanish+0.118
 clearly+0.115
 closer+0.115
Neg logit contributions
 chip-0.112
bank-0.102
 burgers-0.100
ages-0.096
 tacos-0.095
Token and
Token ID290
Feature activation+5.27e-02

Pos logit contributions
 English+0.151
 clearly+0.146
 closer+0.146
adies+0.143
 closely+0.143
Neg logit contributions
 chip-0.243
bank-0.234
 burgers-0.224
 tacos-0.223
ages-0.218
Token put
Token ID1234
Feature activation+1.35e-02

Pos logit contributions
 closer+0.054
 English+0.054
 clearly+0.051
 closely+0.050
 wonders+0.049
Neg logit contributions
 chip-0.108
bank-0.104
rim-0.101
ages-0.100
 burgers-0.100
Token on
Token ID319
Feature activation-1.77e-02

Pos logit contributions
 English+0.020
adies+0.020
 Spanish+0.019
 closer+0.019
 clearly+0.019
Neg logit contributions
 chip-0.021
bank-0.020
 burgers-0.019
ages-0.019
 tacos-0.019
Token pretty
Token ID2495
Feature activation+6.04e-02

Pos logit contributions
 chip+0.034
 burgers+0.031
 ir+0.031
bank+0.031
 tacos+0.031
Neg logit contributions
adies-0.020
 English-0.020
itates-0.019
 clearly-0.018
 Spanish-0.018
Token clothes
Token ID8242
Feature activation+2.20e-02

Pos logit contributions
 English+0.098
itates+0.094
 Spanish+0.093
 clearly+0.093
adies+0.093
Neg logit contributions
 chip-0.084
 ir-0.080
 burgers-0.079
 tacos-0.077
bank-0.077
TokenL
Token ID43
Feature activation-3.14e-02

Pos logit contributions
 English+0.035
adies+0.034
 Spanish+0.033
 clearly+0.033
 closer+0.032
Neg logit contributions
 chip-0.029
bank-0.026
ages-0.025
 burgers-0.025
 ir-0.024
Tokenily
Token ID813
Feature activation-2.37e-03

Pos logit contributions
akers+0.045
rim+0.043
 chip+0.041
 lin+0.041
mons+0.041
Neg logit contributions
 English-0.050
 Spanish-0.048
 closer-0.045
 Venus-0.045
 hello-0.043
Token wanted
Token ID2227
Feature activation+5.95e-02

Pos logit contributions
 chip+0.003
bank+0.003
 burgers+0.003
ages+0.003
 ir+0.003
Neg logit contributions
 English-0.004
adies-0.004
 Spanish-0.004
 clearly-0.004
 closer-0.004
Token to
Token ID284
Feature activation+2.28e-02

Pos logit contributions
 English+0.077
 closer+0.074
adies+0.073
 Spanish+0.072
 clearly+0.070
Neg logit contributions
 chip-0.104
bank-0.101
 ir-0.095
 burgers-0.095
rim-0.095
Token say
Token ID910
Feature activation+2.94e-02

Pos logit contributions
 English+0.027
adies+0.026
 Spanish+0.025
 clearly+0.025
 closer+0.024
Neg logit contributions
 chip-0.043
bank-0.041
 burgers-0.040
 tacos-0.038
ages-0.038
Token hello
Token ID23748
Feature activation+1.27e-01

Pos logit contributions
 clearly+0.016
adies+0.016
itates+0.015
 English+0.015
 wonders+0.014
Neg logit contributions
 chip-0.072
 burgers-0.067
bank-0.066
ages-0.066
 ge-0.065
Token to
Token ID284
Feature activation-1.56e-03

Pos logit contributions
adies+0.237
 English+0.230
 Spanish+0.223
eks+0.218
 clearly+0.215
Neg logit contributions
 chip-0.152
bank-0.130
ages-0.124
 burgers-0.119
 hamb-0.115
Token the
Token ID262
Feature activation+2.54e-02

Pos logit contributions
 chip+0.002
bank+0.002
 burgers+0.002
 tacos+0.002
 ir+0.002
Neg logit contributions
 English-0.003
adies-0.003
 Spanish-0.003
itates-0.003
 clearly-0.002
Token goose
Token ID37246
Feature activation+1.01e-02

Pos logit contributions
 English+0.053
adies+0.052
 Spanish+0.051
 clearly+0.050
itates+0.049
Neg logit contributions
 chip-0.027
bank-0.023
 burgers-0.022
 ge-0.020
 tacos-0.020
Token,
Token ID11
Feature activation+3.22e-02

Pos logit contributions
 English+0.013
 closer+0.013
 Spanish+0.013
 clearly+0.013
adies+0.012
Neg logit contributions
 chip-0.017
 burgers-0.016
bank-0.016
 ir-0.016
ages-0.016
Token so
Token ID523
Feature activation-1.87e-02

Pos logit contributions
 English+0.051
 Spanish+0.048
 closer+0.048
adies+0.048
 clearly+0.048
Neg logit contributions
 chip-0.048
bank-0.046
 burgers-0.044
ages-0.044
 tacos-0.043
Token.
Token ID13
Feature activation+1.31e-03

Pos logit contributions
 English+0.038
 closer+0.036
 Spanish+0.035
adies+0.034
 hello+0.033
Neg logit contributions
 chip-0.067
bank-0.065
 burgers-0.063
 tacos-0.061
 ir-0.061
Token She
Token ID1375
Feature activation+2.70e-02

Pos logit contributions
 English+0.002
adies+0.002
 Spanish+0.002
 clearly+0.001
 closer+0.001
Neg logit contributions
 chip-0.002
bank-0.002
 burgers-0.002
ages-0.002
 tacos-0.002
Token wants
Token ID3382
Feature activation+7.41e-02

Pos logit contributions
adies+0.033
 English+0.032
 Spanish+0.030
 closer+0.029
itates+0.029
Neg logit contributions
 chip-0.050
bank-0.046
 burgers-0.045
 tacos-0.044
 ge-0.043
Token to
Token ID284
Feature activation+2.03e-02

Pos logit contributions
 English+0.103
adies+0.098
 Spanish+0.097
 closer+0.096
 clearly+0.094
Neg logit contributions
 chip-0.124
bank-0.118
 burgers-0.114
ages-0.111
 ir-0.110
Token say
Token ID910
Feature activation+3.26e-02

Pos logit contributions
 English+0.025
adies+0.023
 Spanish+0.023
 closer+0.022
 clearly+0.022
Neg logit contributions
 chip-0.038
bank-0.036
 burgers-0.035
ages-0.034
 tacos-0.034
Token hello
Token ID23748
Feature activation+1.26e-01

Pos logit contributions
adies+0.011
 English+0.011
 clearly+0.009
 Spanish+0.008
itates+0.008
Neg logit contributions
 chip-0.088
bank-0.083
 burgers-0.083
 tacos-0.081
ages-0.080
Token to
Token ID284
Feature activation-2.49e-03

Pos logit contributions
adies+0.209
 English+0.206
 Spanish+0.198
 clearly+0.194
itates+0.192
Neg logit contributions
 chip-0.174
bank-0.156
 burgers-0.148
ages-0.146
 tacos-0.138
Token them
Token ID606
Feature activation+5.58e-02

Pos logit contributions
 chip+0.004
bank+0.004
 burgers+0.004
 tacos+0.004
 ge+0.004
Neg logit contributions
adies-0.003
 English-0.003
itates-0.003
 Spanish-0.003
 clearly-0.003
Token.
Token ID13
Feature activation+1.29e-02

Pos logit contributions
 English+0.073
adies+0.069
 Spanish+0.069
 closer+0.068
 clearly+0.068
Neg logit contributions
 chip-0.099
bank-0.094
 burgers-0.092
 tacos-0.089
 ir-0.089
Token
Token ID198
Feature activation+2.90e-02

Pos logit contributions
 English+0.016
adies+0.016
 Spanish+0.015
 clearly+0.015
 closer+0.015
Neg logit contributions
 chip-0.024
bank-0.022
 burgers-0.021
 tacos-0.021
ages-0.020
Token
Token ID198
Feature activation+2.16e-02

Pos logit contributions
 English+0.048
adies+0.047
 diary+0.047
 Spanish+0.046
 outsider+0.043
Neg logit contributions
 chip-0.044
bank-0.037
 burgers-0.034
 ge-0.033
ages-0.032
Token who
Token ID508
Feature activation+2.29e-02

Pos logit contributions
 English+0.097
adies+0.095
 Spanish+0.092
 clearly+0.089
 closer+0.088
Neg logit contributions
 chip-0.109
bank-0.100
 burgers-0.096
ages-0.094
 ir-0.093
Token loved
Token ID6151
Feature activation+8.42e-02

Pos logit contributions
 English+0.033
adies+0.032
 Spanish+0.031
 clearly+0.030
 closer+0.030
Neg logit contributions
 chip-0.038
bank-0.035
 burgers-0.034
ages-0.033
 tacos-0.033
Token to
Token ID284
Feature activation+3.02e-02

Pos logit contributions
 English+0.072
 closer+0.071
 Spanish+0.067
 wonders+0.066
 hello+0.064
Neg logit contributions
bank-0.174
rim-0.167
akers-0.164
 chip-0.163
ages-0.162
Token play
Token ID711
Feature activation+8.37e-02

Pos logit contributions
 English+0.034
adies+0.032
 Spanish+0.032
 closer+0.031
 clearly+0.030
Neg logit contributions
 chip-0.057
bank-0.056
 burgers-0.053
ages-0.053
akers-0.052
Token dress
Token ID6576
Feature activation+7.82e-02

Pos logit contributions
 English+0.119
adies+0.115
 Spanish+0.112
 clearly+0.109
 closer+0.107
Neg logit contributions
 chip-0.140
bank-0.131
 burgers-0.129
 tacos-0.124
 ir-0.122
Token up
Token ID510
Feature activation+1.26e-01

Pos logit contributions
 English+0.132
adies+0.128
 Spanish+0.125
 clearly+0.122
 closer+0.122
Neg logit contributions
 chip-0.110
bank-0.101
 burgers-0.099
 tacos-0.094
ages-0.094
Token.
Token ID13
Feature activation+2.49e-02

Pos logit contributions
 English+0.150
 closer+0.146
 clearly+0.145
 closely+0.143
 Spanish+0.141
Neg logit contributions
 chip-0.235
bank-0.228
 tacos-0.217
 burgers-0.215
ages-0.215
Token He
Token ID679
Feature activation+4.10e-02

Pos logit contributions
 English+0.022
adies+0.021
 Spanish+0.020
 closer+0.019
 clearly+0.019
Neg logit contributions
 chip-0.056
bank-0.053
 burgers-0.052
 tacos-0.050
 ir-0.050
Token had
Token ID550
Feature activation-2.35e-02

Pos logit contributions
 English+0.067
adies+0.065
 Spanish+0.064
 clearly+0.062
 closer+0.062
Neg logit contributions
 chip-0.059
bank-0.055
 burgers-0.053
ages-0.051
 ir-0.051
Token a
Token ID257
Feature activation+4.93e-02

Pos logit contributions
 chip+0.052
bank+0.050
 burgers+0.049
ages+0.047
 tacos+0.047
Neg logit contributions
 English-0.020
adies-0.019
 Spanish-0.018
 clearly-0.018
 closer-0.018
Token big
Token ID1263
Feature activation+1.62e-02

Pos logit contributions
 English+0.060
adies+0.059
 Spanish+0.056
 clearly+0.055
 closer+0.054
Neg logit contributions
 chip-0.092
bank-0.086
 burgers-0.084
 tacos-0.081
 ir-0.081
Token She
Token ID1375
Feature activation+4.21e-02

Pos logit contributions
eks+0.033
 nearer+0.032
adies+0.032
 closer+0.031
 English+0.031
Neg logit contributions
 chip-0.042
 pitcher-0.035
bank-0.035
 Mitt-0.035
 oil-0.034
Token loved
Token ID6151
Feature activation+8.66e-02

Pos logit contributions
 English+0.068
adies+0.065
 Spanish+0.065
 closer+0.063
 clearly+0.063
Neg logit contributions
 chip-0.063
bank-0.057
 burgers-0.056
ages-0.055
 ir-0.054
Token to
Token ID284
Feature activation+3.19e-02

Pos logit contributions
 English+0.080
 closer+0.075
 Spanish+0.075
 wonders+0.073
 diary+0.071
Neg logit contributions
bank-0.179
 chip-0.167
rim-0.166
ages-0.162
akers-0.161
Token play
Token ID711
Feature activation+8.07e-02

Pos logit contributions
 English+0.039
adies+0.037
 Spanish+0.036
 closer+0.035
 clearly+0.035
Neg logit contributions
 chip-0.058
bank-0.056
 burgers-0.054
ages-0.053
 ir-0.052
Token dress
Token ID6576
Feature activation+7.61e-02

Pos logit contributions
 English+0.112
adies+0.107
 Spanish+0.105
 clearly+0.102
 closer+0.100
Neg logit contributions
 chip-0.139
bank-0.129
 burgers-0.127
 tacos-0.122
 ir-0.121
Token up
Token ID510
Feature activation+1.25e-01

Pos logit contributions
 English+0.142
adies+0.138
 Spanish+0.135
itates+0.131
 closer+0.130
Neg logit contributions
 chip-0.096
 burgers-0.083
bank-0.083
ages-0.080
 ir-0.079
Token with
Token ID351
Feature activation+3.82e-02

Pos logit contributions
 English+0.174
adies+0.168
 Spanish+0.164
 clearly+0.163
 closer+0.162
Neg logit contributions
 chip-0.213
bank-0.202
 burgers-0.194
 tacos-0.189
ages-0.188
Token her
Token ID607
Feature activation+7.58e-02

Pos logit contributions
adies+0.042
 English+0.042
itates+0.040
 clearly+0.039
 Spanish+0.039
Neg logit contributions
 chip-0.076
 burgers-0.070
 ir-0.070
 tacos-0.069
bank-0.067
Token friends
Token ID2460
Feature activation+3.62e-02

Pos logit contributions
 English+0.098
 Spanish+0.092
 closer+0.090
 clearly+0.090
adies+0.090
Neg logit contributions
bank-0.134
 chip-0.133
 burgers-0.126
 tacos-0.122
ages-0.119
Token.
Token ID13
Feature activation+2.57e-02

Pos logit contributions
 English+0.030
 closely+0.030
 One+0.029
 closer+0.029
 Spanish+0.028
Neg logit contributions
bank-0.080
rim-0.080
 chip-0.079
 burgers-0.077
 tacos-0.076
Token One
Token ID1881
Feature activation+7.12e-02

Pos logit contributions
eks+0.027
adies+0.027
 English+0.025
 nearer+0.025
 clearly+0.025
Neg logit contributions
 chip-0.055
bank-0.048
 oil-0.046
 burgers-0.045
 ir-0.045
Token who
Token ID508
Feature activation+3.92e-02

Pos logit contributions
adies+0.072
 English+0.071
 Spanish+0.067
itates+0.065
 closer+0.064
Neg logit contributions
 chip-0.074
bank-0.068
 burgers-0.067
 tacos-0.063
 ir-0.063
Token like
Token ID588
Feature activation+3.66e-02

Pos logit contributions
 English+0.047
adies+0.046
 Spanish+0.044
 clearly+0.044
 closer+0.043
Neg logit contributions
 chip-0.074
bank-0.069
 burgers-0.068
 tacos-0.065
ages-0.065
Token to
Token ID284
Feature activation+1.39e-02

Pos logit contributions
 English+0.041
 Spanish+0.038
 closer+0.035
 hello+0.035
adies+0.033
Neg logit contributions
bank-0.073
 chip-0.072
rim-0.072
akers-0.069
ages-0.068
Token play
Token ID711
Feature activation+8.14e-02

Pos logit contributions
 English+0.018
adies+0.018
 clearly+0.017
 Spanish+0.017
itates+0.017
Neg logit contributions
 chip-0.024
bank-0.022
 burgers-0.022
 tacos-0.021
 ir-0.021
Token dress
Token ID6576
Feature activation+7.48e-02

Pos logit contributions
 English+0.116
adies+0.113
 Spanish+0.110
 closer+0.109
 clearly+0.108
Neg logit contributions
 chip-0.135
bank-0.126
 burgers-0.122
ages-0.118
 tacos-0.117
Token up
Token ID510
Feature activation+1.25e-01

Pos logit contributions
 English+0.113
 Spanish+0.108
adies+0.108
 clearly+0.107
 closer+0.106
Neg logit contributions
 chip-0.116
bank-0.110
 burgers-0.104
 tacos-0.101
 bulldo-0.100
Token.
Token ID13
Feature activation+2.83e-02

Pos logit contributions
 English+0.159
 Spanish+0.150
 closer+0.149
adies+0.147
 clearly+0.146
Neg logit contributions
 chip-0.233
bank-0.226
 burgers-0.214
ages-0.212
 tacos-0.211
Token They
Token ID1119
Feature activation+9.87e-04

Pos logit contributions
adies+0.018
 English+0.018
 clearly+0.017
 closer+0.017
itates+0.016
Neg logit contributions
 chip-0.068
bank-0.065
 burgers-0.064
 tacos-0.063
 ir-0.061
Token have
Token ID423
Feature activation-6.29e-03

Pos logit contributions
 English+0.001
adies+0.001
 Spanish+0.001
 clearly+0.001
 closer+0.001
Neg logit contributions
 chip-0.002
bank-0.002
 burgers-0.002
 ir-0.002
 tacos-0.002
Token many
Token ID867
Feature activation+1.76e-02

Pos logit contributions
 chip+0.015
bank+0.014
 burgers+0.014
rim+0.014
ages+0.013
Neg logit contributions
 English-0.005
adies-0.004
 Spanish-0.004
 closer-0.004
 clearly-0.004
Token hats
Token ID23910
Feature activation+2.27e-02

Pos logit contributions
 English+0.026
adies+0.025
 Spanish+0.024
 clearly+0.024
 closer+0.024
Neg logit contributions
 chip-0.028
bank-0.026
 burgers-0.026
 ir-0.025
 tacos-0.025
Token deaf
Token ID21815
Feature activation+2.14e-02

Pos logit contributions
bank+0.008
 chip+0.008
rim+0.008
 burgers+0.008
keeping+0.008
Neg logit contributions
 English-0.005
 Spanish-0.005
 clearly-0.005
 closer-0.005
 wonders-0.004
Token.
Token ID13
Feature activation+1.96e-02

Pos logit contributions
 English+0.022
 Spanish+0.020
 clearly+0.019
 wonders+0.017
 closer+0.017
Neg logit contributions
 chip-0.047
rim-0.046
bank-0.046
 burgers-0.045
 hamb-0.044
Token She
Token ID1375
Feature activation+4.34e-02

Pos logit contributions
adies+0.014
 English+0.014
 closer+0.013
 Spanish+0.012
 clearly+0.012
Neg logit contributions
 chip-0.047
bank-0.044
 burgers-0.043
 tacos-0.042
 ir-0.042
Token could
Token ID714
Feature activation+3.15e-02

Pos logit contributions
 English+0.060
adies+0.056
 Spanish+0.056
 clearly+0.056
 closer+0.054
Neg logit contributions
 chip-0.075
bank-0.071
 burgers-0.070
 tacos-0.066
ages-0.066
Token sign
Token ID1051
Feature activation+4.69e-02

Pos logit contributions
adies+0.040
 English+0.038
itates+0.036
 Spanish+0.035
 clearly+0.034
Neg logit contributions
 chip-0.058
bank-0.052
 burgers-0.051
 tacos-0.050
ards-0.048
Token hello
Token ID23748
Feature activation+1.25e-01

Pos logit contributions
 English+0.060
 Spanish+0.057
 closer+0.055
 clearly+0.054
adies+0.052
Neg logit contributions
 chip-0.088
bank-0.085
 burgers-0.081
ages-0.080
rim-0.079
Token,
Token ID11
Feature activation+3.11e-02

Pos logit contributions
 English+0.182
adies+0.179
 Spanish+0.172
 clearly+0.169
 closer+0.168
Neg logit contributions
 chip-0.199
bank-0.186
 burgers-0.180
ages-0.175
 ir-0.173
Token thank
Token ID5875
Feature activation+1.86e-02

Pos logit contributions
 English+0.034
 Spanish+0.032
 clearly+0.030
 closer+0.030
adies+0.028
Neg logit contributions
 chip-0.066
bank-0.064
ages-0.060
rim-0.060
 burgers-0.060
Token you
Token ID345
Feature activation+5.05e-03

Pos logit contributions
 English+0.023
 Spanish+0.022
adies+0.022
 closer+0.021
 clearly+0.021
Neg logit contributions
 chip-0.035
bank-0.034
 burgers-0.032
ages-0.031
 bulldo-0.031
Token,
Token ID11
Feature activation+1.92e-02

Pos logit contributions
 English+0.007
 Spanish+0.006
 closer+0.006
 closely+0.006
 clearly+0.005
Neg logit contributions
 chip-0.010
rim-0.010
bank-0.009
 tacos-0.009
 bulldo-0.009
Token hungry
Token ID14720
Feature activation-1.34e-02

Pos logit contributions
 English+0.021
 Spanish+0.020
 closer+0.019
 clearly+0.019
 hello+0.018
Neg logit contributions
 chip-0.041
bank-0.040
rim-0.037
ages-0.037
 burgers-0.037
Token Mia
Token ID32189
Feature activation+2.16e-02

Pos logit contributions
 English+0.043
 Spanish+0.043
 clearly+0.041
 closer+0.040
itate+0.039
Neg logit contributions
 chip-0.035
bank-0.032
 burgers-0.031
 tacos-0.031
 pitcher-0.029
Token liked
Token ID8288
Feature activation+8.39e-02

Pos logit contributions
 English+0.035
adies+0.033
 Spanish+0.033
 closer+0.032
 clearly+0.032
Neg logit contributions
 chip-0.032
bank-0.030
 burgers-0.029
ages-0.028
 ir-0.028
Token to
Token ID284
Feature activation+2.70e-02

Pos logit contributions
 English+0.075
 Spanish+0.069
 closer+0.068
 hello+0.064
 diary+0.063
Neg logit contributions
bank-0.180
rim-0.170
 chip-0.165
akers-0.164
ages-0.164
Token play
Token ID711
Feature activation+8.11e-02

Pos logit contributions
 English+0.033
adies+0.032
 Spanish+0.031
 closer+0.030
 clearly+0.030
Neg logit contributions
 chip-0.050
bank-0.047
 burgers-0.046
ages-0.044
 tacos-0.044
Token dress
Token ID6576
Feature activation+7.57e-02

Pos logit contributions
 English+0.121
adies+0.116
 Spanish+0.114
 clearly+0.111
 closer+0.109
Neg logit contributions
 chip-0.131
bank-0.120
 burgers-0.119
 tacos-0.114
ages-0.113
Token up
Token ID510
Feature activation+1.25e-01

Pos logit contributions
 English+0.130
adies+0.126
 Spanish+0.123
 closer+0.120
 clearly+0.120
Neg logit contributions
 chip-0.105
bank-0.095
 burgers-0.094
ages-0.090
 tacos-0.089
Token with
Token ID351
Feature activation+3.79e-02

Pos logit contributions
 English+0.165
adies+0.159
 Spanish+0.155
 clearly+0.155
 closer+0.154
Neg logit contributions
 chip-0.221
bank-0.213
 burgers-0.204
 tacos-0.200
ages-0.198
Token her
Token ID607
Feature activation+7.75e-02

Pos logit contributions
 English+0.037
adies+0.036
 Spanish+0.033
 clearly+0.033
 closer+0.033
Neg logit contributions
 chip-0.081
bank-0.075
 burgers-0.075
 tacos-0.073
 ir-0.073
Token clothes
Token ID8242
Feature activation+9.07e-03

Pos logit contributions
 English+0.098
 Spanish+0.092
 closer+0.090
 clearly+0.089
 diary+0.088
Neg logit contributions
bank-0.141
 chip-0.136
 burgers-0.131
 tacos-0.127
rim-0.124
Token.
Token ID13
Feature activation+2.53e-02

Pos logit contributions
 closely+0.007
 closer+0.007
 clearly+0.007
bell+0.007
 English+0.007
Neg logit contributions
 chip-0.020
bank-0.020
 burgers-0.019
rim-0.019
 ge-0.019
Token She
Token ID1375
Feature activation+4.21e-02

Pos logit contributions
 English+0.030
 Spanish+0.029
adies+0.029
 closer+0.028
 diary+0.028
Neg logit contributions
 chip-0.047
bank-0.044
 tacos-0.040
 burgers-0.040
 Lem-0.039
Token.
Token ID13
Feature activation+1.16e-02

Pos logit contributions
 English+0.015
adies+0.015
 closer+0.014
 clearly+0.014
 Spanish+0.014
Neg logit contributions
 chip-0.022
bank-0.021
 burgers-0.020
 tacos-0.020
 ir-0.020
Token Lily
Token ID20037
Feature activation+1.13e-02

Pos logit contributions
adies+0.018
 English+0.018
 Spanish+0.017
 clearly+0.016
 closer+0.016
Neg logit contributions
 chip-0.019
bank-0.017
 burgers-0.016
ages-0.016
 oil-0.016
Token wanted
Token ID2227
Feature activation+5.83e-02

Pos logit contributions
 English+0.018
adies+0.017
 Spanish+0.017
 clearly+0.016
 closer+0.016
Neg logit contributions
 chip-0.017
bank-0.016
 burgers-0.015
ages-0.015
 ir-0.015
Token to
Token ID284
Feature activation+2.00e-02

Pos logit contributions
 English+0.087
adies+0.085
 Spanish+0.082
 clearly+0.081
 closer+0.080
Neg logit contributions
 chip-0.093
bank-0.087
 burgers-0.084
 tacos-0.081
ages-0.081
Token say
Token ID910
Feature activation+2.96e-02

Pos logit contributions
 English+0.028
adies+0.028
 Spanish+0.027
itates+0.026
 clearly+0.026
Neg logit contributions
 chip-0.034
bank-0.031
 burgers-0.030
 tacos-0.029
 ge-0.028
Token hello
Token ID23748
Feature activation+1.25e-01

Pos logit contributions
adies+0.018
itates+0.018
 clearly+0.017
bell+0.016
 English+0.016
Neg logit contributions
 chip-0.070
 burgers-0.064
bank-0.063
 cho-0.063
 ge-0.063
Token to
Token ID284
Feature activation-3.83e-03

Pos logit contributions
adies+0.227
 English+0.220
 Spanish+0.214
eks+0.209
 clearly+0.205
Neg logit contributions
 chip-0.153
bank-0.133
ages-0.128
 burgers-0.122
 cho-0.117
Token the
Token ID262
Feature activation+2.22e-02

Pos logit contributions
 chip+0.004
bank+0.004
 burgers+0.003
 ge+0.003
 cho+0.003
Neg logit contributions
itates-0.007
adies-0.007
utters-0.007
 English-0.007
bell-0.007
Token lizard
Token ID42406
Feature activation+6.08e-02

Pos logit contributions
adies+0.041
bell+0.040
 Him+0.039
 English+0.036
eks+0.036
Neg logit contributions
 chip-0.033
 sk-0.027
 le-0.026
 rac-0.025
 ge-0.024
Token,
Token ID11
Feature activation+2.69e-02

Pos logit contributions
adies+0.095
 English+0.095
 Spanish+0.090
itates+0.088
 clearly+0.088
Neg logit contributions
 chip-0.090
bank-0.084
 burgers-0.080
ages-0.076
 tacos-0.075
Token but
Token ID475
Feature activation+9.70e-04

Pos logit contributions
 English+0.037
 closer+0.037
 Spanish+0.035
 clearly+0.035
 hello+0.034
Neg logit contributions
 chip-0.045
bank-0.044
ages-0.044
akers-0.043
rim-0.043
Token
Token ID198
Feature activation+2.81e-02

Pos logit contributions
 English+0.048
adies+0.047
 Spanish+0.045
 clearly+0.045
eks+0.044
Neg logit contributions
 chip-0.059
 burgers-0.052
bank-0.051
 oil-0.048
 bulldo-0.048
Token"
Token ID1
Feature activation-3.53e-02

Pos logit contributions
 English+0.036
adies+0.035
 Spanish+0.034
 closer+0.034
 clearly+0.033
Neg logit contributions
 chip-0.050
bank-0.048
 burgers-0.046
 tacos-0.045
ages-0.045
TokenCan
Token ID6090
Feature activation+5.26e-02

Pos logit contributions
 chip+0.063
 burgers+0.056
bank+0.056
 ir+0.054
 tacos+0.054
Neg logit contributions
adies-0.046
 English-0.045
 clearly-0.044
 Spanish-0.043
 closely-0.042
Token we
Token ID356
Feature activation+1.56e-02

Pos logit contributions
 English+0.085
adies+0.083
 Spanish+0.081
 clearly+0.080
 closer+0.078
Neg logit contributions
 chip-0.078
bank-0.070
 burgers-0.069
 tacos-0.066
ages-0.065
Token go
Token ID467
Feature activation+2.30e-02

Pos logit contributions
 English+0.018
adies+0.017
 Spanish+0.017
 clearly+0.016
 closer+0.016
Neg logit contributions
 chip-0.030
bank-0.028
 burgers-0.027
 tacos-0.027
 ir-0.026
Token closer
Token ID5699
Feature activation+1.25e-01

Pos logit contributions
 English+0.032
adies+0.030
 Spanish+0.030
 clearly+0.029
 wonders+0.027
Neg logit contributions
 chip-0.038
bank-0.036
 burgers-0.035
 bulldo-0.034
ards-0.033
Token,
Token ID11
Feature activation+2.36e-02

Pos logit contributions
 English+0.141
adies+0.133
 closer+0.132
 Spanish+0.131
 clearly+0.128
Neg logit contributions
 chip-0.241
bank-0.231
 burgers-0.224
ages-0.219
 ir-0.219
Token mom
Token ID1995
Feature activation+2.47e-02

Pos logit contributions
 English+0.027
 closer+0.025
adies+0.025
 Spanish+0.025
 clearly+0.024
Neg logit contributions
bank-0.046
 chip-0.046
ages-0.044
 ir-0.042
akers-0.042
Token?"
Token ID1701
Feature activation+6.22e-02

Pos logit contributions
 English+0.014
adies+0.012
 closer+0.012
 Spanish+0.012
 clearly+0.011
Neg logit contributions
 chip-0.062
bank-0.061
 burgers-0.059
ages-0.058
 tacos-0.058
Token Lily
Token ID20037
Feature activation+2.25e-02

Pos logit contributions
 English+0.073
adies+0.068
 closer+0.068
 Spanish+0.068
 clearly+0.066
Neg logit contributions
 chip-0.115
bank-0.114
ages-0.109
 burgers-0.108
akers-0.106
Token asked
Token ID1965
Feature activation+2.22e-02

Pos logit contributions
adies+0.040
 English+0.036
 Spanish+0.035
 closer+0.035
eks+0.034
Neg logit contributions
 chip-0.031
bank-0.027
 ir-0.027
 burgers-0.025
 tacos-0.024
Token!"
Token ID2474
Feature activation-7.23e-03

Pos logit contributions
 chip+0.003
 ir+0.003
bank+0.003
rim+0.003
 bulldo+0.003
Neg logit contributions
 English-0.002
 closer-0.002
adies-0.002
 clearly-0.002
 Spanish-0.002
Token
Token ID220
Feature activation+1.21e-02

Pos logit contributions
akers+0.017
rim+0.017
 burgers+0.017
 tacos+0.016
bank+0.016
Neg logit contributions
 One-0.005
 English-0.004
 clearly-0.004
 hello-0.004
 She-0.004
Token
Token ID198
Feature activation-5.66e-03

Pos logit contributions
 English+0.011
adies+0.010
 clearly+0.010
 Spanish+0.010
 closer+0.009
Neg logit contributions
 chip-0.026
bank-0.025
 burgers-0.025
 ir-0.024
 tacos-0.024
Token
Token ID198
Feature activation-1.23e-02

Pos logit contributions
 chip+0.012
bank+0.012
 burgers+0.011
 tacos+0.011
 ir+0.011
Neg logit contributions
 English-0.005
adies-0.005
 Spanish-0.005
 clearly-0.005
 closer-0.005
TokenThe
Token ID464
Feature activation-6.01e-02

Pos logit contributions
bank+0.025
 burgers+0.024
 chip+0.024
 tacos+0.024
 hamb+0.023
Neg logit contributions
 English-0.013
 Spanish-0.012
 clearly-0.012
One-0.012
 closer-0.011
Token chip
Token ID11594
Feature activation-1.84e-01

Pos logit contributions
bank+0.109
keeping+0.105
rim+0.100
 ir+0.100
akers+0.097
Neg logit contributions
 closer-0.070
 name-0.070
 answer-0.069
 English-0.068
 wonders-0.067
Tokenm
Token ID76
Feature activation-1.15e-01

Pos logit contributions
keeping+0.341
bank+0.337
 bulldo+0.317
 cho+0.303
 burgers+0.299
Neg logit contributions
 nods-0.216
 closer-0.198
 shrug-0.197
itates-0.197
 trusts-0.184
Tokenunk
Token ID2954
Feature activation-8.09e-02

Pos logit contributions
bank+0.180
 ir+0.172
 chip+0.172
 burgers+0.170
keeping+0.169
Neg logit contributions
 name-0.154
 clearly-0.139
 English-0.139
 shrug-0.138
adies-0.138
Token finished
Token ID5201
Feature activation-5.17e-02

Pos logit contributions
 burgers+0.161
bank+0.158
 chip+0.156
 tacos+0.155
akers+0.154
Neg logit contributions
 clearly-0.085
 closer-0.081
 English-0.078
 shrug-0.075
 nods-0.074
Token the
Token ID262
Feature activation-1.41e-02

Pos logit contributions
akers+0.105
ages+0.102
mons+0.100
rim+0.099
 bulldo+0.099
Neg logit contributions
 closely-0.060
 curiously-0.055
 clearly-0.054
 closer-0.051
 her-0.045
Token treat
Token ID2190
Feature activation-4.96e-02

Pos logit contributions
keeping+0.022
bers+0.022
bank+0.022
akers+0.021
 ir+0.021
Neg logit contributions
 closer-0.021
 name-0.020
 answer-0.020
 reply-0.019
 title-0.018
Token you
Token ID345
Feature activation+2.44e-03

Pos logit contributions
 English+0.003
 Spanish+0.003
 closer+0.003
 anyone+0.003
 clearly+0.003
Neg logit contributions
bank-0.008
 chip-0.008
 bulldo-0.007
 hamb-0.007
 paramed-0.007
Token".
Token ID1911
Feature activation-1.65e-02

Pos logit contributions
 English+0.003
 closer+0.003
 clearly+0.003
 Spanish+0.002
 closely+0.002
Neg logit contributions
 chip-0.005
bank-0.005
 ir-0.005
 tacos-0.005
 bulldo-0.005
Token
Token ID198
Feature activation-2.40e-03

Pos logit contributions
 chip+0.038
rim+0.038
 burgers+0.038
akers+0.038
 ir+0.038
Neg logit contributions
 One-0.011
 English-0.010
 Spanish-0.009
 Maybe-0.009
 clearly-0.009
Token
Token ID198
Feature activation-9.45e-03

Pos logit contributions
 chip+0.005
bank+0.005
 burgers+0.005
 ir+0.005
 tacos+0.005
Neg logit contributions
 English-0.002
adies-0.002
 Spanish-0.002
 clearly-0.002
 closer-0.002
TokenThe
Token ID464
Feature activation-5.53e-02

Pos logit contributions
 chip+0.018
bank+0.018
 burgers+0.017
 ir+0.017
 hamb+0.017
Neg logit contributions
 English-0.010
One-0.010
 Spanish-0.010
 closer-0.009
 clearly-0.009
Token chip
Token ID11594
Feature activation-1.81e-01

Pos logit contributions
keeping+0.108
bank+0.104
ages+0.097
rim+0.093
 ir+0.092
Neg logit contributions
 closer-0.065
 answer-0.062
 wonders-0.062
 moral-0.060
 name-0.059
Tokenm
Token ID76
Feature activation-1.16e-01

Pos logit contributions
keeping+0.399
bank+0.357
 bulldo+0.335
 cho+0.326
icol+0.323
Neg logit contributions
 nods-0.186
 shrug-0.157
 closer-0.156
 trusts-0.144
itates-0.143
Tokenunk
Token ID2954
Feature activation-8.45e-02

Pos logit contributions
bank+0.189
keeping+0.187
rob+0.185
 ir+0.184
 chip+0.183
Neg logit contributions
 name-0.142
 upon-0.129
aly-0.126
 wonders-0.125
 shrug-0.125
Token and
Token ID290
Feature activation-9.99e-03

Pos logit contributions
 chip+0.164
 burgers+0.164
 tacos+0.159
 ir+0.157
bank+0.156
Neg logit contributions
 clearly-0.091
 closer-0.089
 English-0.087
 nods-0.084
 shrug-0.082
Token Sam
Token ID3409
Feature activation-4.92e-02

Pos logit contributions
rim+0.024
akers+0.022
keeping+0.022
rob+0.022
bank+0.022
Neg logit contributions
 closer-0.006
 another-0.006
 English-0.006
 Crystal-0.005
 nods-0.005
Token were
Token ID547
Feature activation-4.86e-02

Pos logit contributions
 chip+0.090
 burgers+0.086
 tacos+0.085
rim+0.084
bank+0.084
Neg logit contributions
 clearly-0.061
 closer-0.060
 wonders-0.058
 English-0.058
 closely-0.057
Token place
Token ID1295
Feature activation-4.76e-03

Pos logit contributions
ages+0.046
bank+0.042
 chip+0.042
akers+0.041
mons+0.041
Neg logit contributions
 English-0.023
 closer-0.023
 wonders-0.022
 Spanish-0.022
 clearly-0.021
Token!
Token ID0
Feature activation-2.07e-02

Pos logit contributions
 chip+0.011
rob+0.011
ages+0.011
 bulldo+0.010
rim+0.010
Neg logit contributions
 clearly-0.005
 closer-0.004
 closely-0.004
 English-0.004
 Spanish-0.004
Token
Token ID198
Feature activation-1.33e-03

Pos logit contributions
 chip+0.047
akers+0.045
bank+0.045
rim+0.045
 burgers+0.045
Neg logit contributions
 English-0.017
 One-0.016
 Spanish-0.015
 clearly-0.014
 closer-0.013
Token
Token ID198
Feature activation-7.08e-03

Pos logit contributions
 chip+0.003
bank+0.003
 burgers+0.003
 tacos+0.003
 ir+0.003
Neg logit contributions
 English-0.001
adies-0.001
 Spanish-0.001
 clearly-0.001
 closer-0.001
TokenThe
Token ID464
Feature activation-5.87e-02

Pos logit contributions
bank+0.015
 chip+0.015
 ir+0.014
 burgers+0.014
 tacos+0.014
Neg logit contributions
 English-0.007
One-0.007
 Spanish-0.007
 closer-0.006
adies-0.006
Token chip
Token ID11594
Feature activation-1.78e-01

Pos logit contributions
bank+0.110
keeping+0.109
rim+0.104
ages+0.103
rob+0.098
Neg logit contributions
 closer-0.071
 English-0.065
 answer-0.064
 name-0.063
 wonders-0.062
Tokenmun
Token ID6199
Feature activation-6.82e-02

Pos logit contributions
keeping+0.380
bank+0.355
 bulldo+0.329
 cho+0.319
 tacos+0.314
Neg logit contributions
 nods-0.169
 shrug-0.150
 closer-0.147
itates-0.145
 trusts-0.141
Tokenks
Token ID591
Feature activation-3.49e-02

Pos logit contributions
 chip+0.163
 burgers+0.150
bank+0.147
 ir+0.144
 tacos+0.144
Neg logit contributions
 English-0.055
adies-0.052
 clearly-0.050
 Spanish-0.049
 closely-0.047
Token didn
Token ID1422
Feature activation-1.88e-02

Pos logit contributions
bank+0.065
 chip+0.064
 burgers+0.061
keeping+0.060
 ir+0.060
Neg logit contributions
 English-0.042
 clearly-0.042
 closer-0.040
bell-0.039
 closely-0.038
Token't
Token ID470
Feature activation+2.05e-02

Pos logit contributions
 chip+0.039
bank+0.037
 burgers+0.037
ages+0.036
 tacos+0.035
Neg logit contributions
 English-0.018
adies-0.017
 Spanish-0.017
 clearly-0.016
 closer-0.016
Token know
Token ID760
Feature activation+2.45e-02

Pos logit contributions
 English+0.016
 closer+0.015
 clearly+0.015
 Spanish+0.015
 answer+0.014
Neg logit contributions
 chip-0.048
bank-0.046
 burgers-0.045
ages-0.045
 tacos-0.044
Token!
Token ID0
Feature activation-1.65e-02

Pos logit contributions
 bulldo+0.063
 ir+0.061
 paramed+0.061
bank+0.060
 chip+0.060
Neg logit contributions
 closer-0.018
 clearly-0.016
 closely-0.016
 answer-0.014
 Spanish-0.012
Token
Token ID220
Feature activation+1.69e-02

Pos logit contributions
akers+0.037
rim+0.037
 burgers+0.036
 tacos+0.036
 ir+0.036
Neg logit contributions
 One-0.013
 She-0.011
 English-0.010
 Every-0.010
 Sometimes-0.010
Token
Token ID198
Feature activation-7.97e-04

Pos logit contributions
 English+0.017
adies+0.016
 Spanish+0.016
 clearly+0.015
 closer+0.015
Neg logit contributions
 chip-0.035
bank-0.033
 burgers-0.032
ages-0.031
 tacos-0.031
Token
Token ID198
Feature activation-6.74e-03

Pos logit contributions
 chip+0.002
bank+0.002
 burgers+0.002
 tacos+0.002
 ir+0.002
Neg logit contributions
 English-0.001
 clearly-0.001
adies-0.001
 Spanish-0.001
 closer-0.001
TokenThe
Token ID464
Feature activation-5.33e-02

Pos logit contributions
bank+0.014
 chip+0.014
 burgers+0.013
 tacos+0.013
 hamb+0.013
Neg logit contributions
 English-0.007
One-0.006
 Spanish-0.006
 clearly-0.006
 closer-0.006
Token chip
Token ID11594
Feature activation-1.76e-01

Pos logit contributions
keeping+0.057
bank+0.057
rim+0.049
ages+0.048
bers+0.046
Neg logit contributions
 closer-0.102
 answer-0.102
 name-0.100
 wonders-0.100
 shrug-0.098
Tokenm
Token ID76
Feature activation-1.09e-01

Pos logit contributions
keeping+0.338
bank+0.329
 burgers+0.301
 bulldo+0.301
 cho+0.300
Neg logit contributions
itates-0.176
 nods-0.175
 closer-0.171
 shrug-0.167
 wonders-0.153
Tokenunk
Token ID2954
Feature activation-8.10e-02

Pos logit contributions
 chip+0.162
bank+0.156
 burgers+0.156
 ir+0.149
 tacos+0.144
Neg logit contributions
 name-0.156
 English-0.153
adies-0.153
 clearly-0.146
 wonders-0.144
Token was
Token ID373
Feature activation-9.62e-03

Pos logit contributions
 burgers+0.157
 tacos+0.152
 chip+0.149
 ir+0.149
bank+0.149
Neg logit contributions
 clearly-0.089
 closer-0.084
 English-0.084
 shrug-0.079
bell-0.079
Token excited
Token ID6568
Feature activation+2.33e-02

Pos logit contributions
mons+0.018
 chip+0.018
 ir+0.018
bank+0.018
 bulldo+0.017
Neg logit contributions
 closer-0.012
 clearly-0.011
 English-0.011
 Spanish-0.011
 wonders-0.010
Token and
Token ID290
Feature activation+1.20e-02

Pos logit contributions
 English+0.027
 clearly+0.027
 closer+0.026
 closely+0.026
adies+0.025
Neg logit contributions
 chip-0.044
bank-0.041
 burgers-0.041
ages-0.041
 bulldo-0.040
Token again
Token ID757
Feature activation+5.31e-02

Pos logit contributions
 closer+0.030
 clearly+0.028
 English+0.028
 closely+0.027
adies+0.026
Neg logit contributions
 chip-0.068
 ir-0.065
 tacos-0.064
akers-0.064
ages-0.063
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
 clearly+0.037
 closer+0.036
 English+0.036
 closely+0.035
 another+0.032
Neg logit contributions
 paramed-0.128
rim-0.127
 tacos-0.127
 chip-0.126
 bulldo-0.126
Token
Token ID198
Feature activation-7.23e-04

Pos logit contributions
 chip+0.029
 burgers+0.029
akers+0.028
 tacos+0.028
bank+0.028
Neg logit contributions
 English-0.012
 Spanish-0.011
 One-0.011
 closer-0.010
 clearly-0.010
Token
Token ID198
Feature activation-6.18e-03

Pos logit contributions
 chip+0.002
bank+0.001
 burgers+0.001
 tacos+0.001
 ir+0.001
Neg logit contributions
 English-0.001
adies-0.001
 Spanish-0.001
 clearly-0.001
 closer-0.001
TokenThe
Token ID464
Feature activation-5.12e-02

Pos logit contributions
 chip+0.012
bank+0.012
 burgers+0.012
 tacos+0.011
 hamb+0.011
Neg logit contributions
 English-0.007
 Spanish-0.007
 closer-0.006
One-0.006
adies-0.006
Token chip
Token ID11594
Feature activation-1.75e-01

Pos logit contributions
keeping+0.079
bank+0.076
rim+0.066
ages+0.065
 ir+0.062
Neg logit contributions
 closer-0.084
 answer-0.079
 name-0.078
 shy-0.076
 English-0.075
Tokenmun
Token ID6199
Feature activation-6.56e-02

Pos logit contributions
keeping+0.394
bank+0.365
 bulldo+0.337
 tacos+0.331
 burgers+0.327
Neg logit contributions
 nods-0.146
 closer-0.138
itates-0.133
 shrug-0.129
bird-0.127
Tokenks
Token ID591
Feature activation-2.76e-02

Pos logit contributions
 chip+0.152
 burgers+0.142
bank+0.137
 tacos+0.136
 ir+0.136
Neg logit contributions
 English-0.056
adies-0.054
 clearly-0.052
 Spanish-0.050
 closely-0.048
Token were
Token ID547
Feature activation-5.72e-02

Pos logit contributions
bank+0.052
 chip+0.051
 burgers+0.051
 tacos+0.050
 ir+0.049
Neg logit contributions
 English-0.032
 clearly-0.031
 closer-0.030
 closely-0.028
 Spanish-0.028
Token so
Token ID523
Feature activation-4.25e-02

Pos logit contributions
mons+0.113
bank+0.111
 ir+0.109
ages+0.108
 bulldo+0.107
Neg logit contributions
 closer-0.066
 clearly-0.066
 English-0.064
 Spanish-0.062
 wonders-0.056
Token thankful
Token ID25535
Feature activation-2.63e-02

Pos logit contributions
 chip+0.082
bank+0.078
 burgers+0.076
ages+0.073
 tacos+0.073
Neg logit contributions
 English-0.049
adies-0.048
 Spanish-0.046
 clearly-0.045
 closer-0.044
Token.
Token ID13
Feature activation-4.91e-03

Pos logit contributions
 closer+0.005
 closely+0.005
 English+0.005
 curiously+0.005
 clearly+0.005
Neg logit contributions
rob-0.013
rim-0.013
 pitcher-0.012
 chip-0.012
keeping-0.012
Token
Token ID220
Feature activation+2.16e-02

Pos logit contributions
akers+0.012
rim+0.012
 ir+0.011
ages+0.011
 burgers+0.011
Neg logit contributions
 One-0.004
 Sometimes-0.003
 Every-0.003
 On-0.003
 clearly-0.003
Token
Token ID198
Feature activation+3.12e-03

Pos logit contributions
 English+0.016
 clearly+0.015
 Spanish+0.015
adies+0.014
 closer+0.014
Neg logit contributions
 chip-0.049
 burgers-0.047
bank-0.047
 ir-0.046
 tacos-0.046
Token
Token ID198
Feature activation-3.69e-03

Pos logit contributions
 English+0.002
One+0.002
 One+0.002
 Spanish+0.002
 closer+0.001
Neg logit contributions
 burgers-0.008
 chip-0.008
 tacos-0.007
 ir-0.007
bank-0.007
TokenThe
Token ID464
Feature activation-4.76e-02

Pos logit contributions
keeping+0.008
 burgers+0.008
 ir+0.008
bank+0.008
 tacos+0.008
Neg logit contributions
One-0.004
 One-0.003
 English-0.003
 Spanish-0.002
 closer-0.002
Token chip
Token ID11594
Feature activation-1.75e-01

Pos logit contributions
keeping+0.079
bank+0.072
rim+0.071
rob+0.067
ages+0.065
Neg logit contributions
 closer-0.070
 answer-0.069
 name-0.068
 wonders-0.068
 moral-0.067
Tokenm
Token ID76
Feature activation-1.08e-01

Pos logit contributions
keeping+0.381
bank+0.336
 cho+0.315
 bulldo+0.312
 burgers+0.301
Neg logit contributions
 nods-0.173
 shrug-0.153
 closer-0.153
itates-0.152
 trusts-0.136
Tokenunk
Token ID2954
Feature activation-7.27e-02

Pos logit contributions
rob+0.179
keeping+0.177
bank+0.172
stack+0.168
 ir+0.168
Neg logit contributions
 name-0.137
 shrug-0.118
 upon-0.116
urious-0.113
aly-0.113
Token told
Token ID1297
Feature activation-2.28e-02

Pos logit contributions
 chip+0.138
 burgers+0.138
keeping+0.137
rim+0.132
akers+0.132
Neg logit contributions
 clearly-0.082
 closer-0.081
 English-0.080
 shrug-0.075
 closely-0.075
Token the
Token ID262
Feature activation-1.70e-03

Pos logit contributions
akers+0.055
ages+0.051
keeping+0.049
rim+0.049
 lin+0.049
Neg logit contributions
 closely-0.017
 clearly-0.016
 One-0.016
 another-0.016
 anyone-0.016
Token boy
Token ID2933
Feature activation-3.45e-03

Pos logit contributions
keeping+0.002
rim+0.002
rob+0.002
akers+0.002
stack+0.002
Neg logit contributions
 name-0.003
 answer-0.003
 wonders-0.003
 question-0.003
 closer-0.003
Token too
Token ID1165
Feature activation-2.14e-03

Pos logit contributions
 closer+0.007
 closely+0.007
 clearly+0.007
 English+0.006
 hello+0.005
Neg logit contributions
rim-0.024
rob-0.024
akers-0.024
 bulldo-0.024
 ir-0.023
Token.
Token ID13
Feature activation-9.56e-03

Pos logit contributions
 chip+0.004
bank+0.004
 burgers+0.004
rim+0.004
 ir+0.004
Neg logit contributions
 English-0.002
 clearly-0.002
 closer-0.002
 Spanish-0.002
adies-0.002
Token
Token ID198
Feature activation+3.66e-03

Pos logit contributions
rim+0.021
akers+0.021
 burgers+0.021
 ir+0.021
 tacos+0.021
Neg logit contributions
 One-0.007
 English-0.007
 clearly-0.006
 Spanish-0.006
 Sometimes-0.006
Token
Token ID198
Feature activation-2.41e-03

Pos logit contributions
 English+0.004
adies+0.004
 Spanish+0.003
 clearly+0.003
 closer+0.003
Neg logit contributions
 chip-0.008
bank-0.007
 burgers-0.007
ages-0.007
 tacos-0.007
TokenThe
Token ID464
Feature activation-4.79e-02

Pos logit contributions
bank+0.005
 chip+0.005
 burgers+0.005
 hamb+0.005
 tacos+0.005
Neg logit contributions
One-0.002
 English-0.002
She-0.002
 clearly-0.002
 Spanish-0.002
Token chip
Token ID11594
Feature activation-1.74e-01

Pos logit contributions
keeping+0.085
bank+0.084
ages+0.079
rim+0.076
akers+0.076
Neg logit contributions
 closer-0.063
 answer-0.059
 English-0.057
 wonders-0.057
 Spanish-0.056
Tokenm
Token ID76
Feature activation-1.07e-01

Pos logit contributions
keeping+0.359
bank+0.330
 bulldo+0.301
 burgers+0.300
 cho+0.298
Neg logit contributions
 nods-0.175
 closer-0.167
 shrug-0.162
itates-0.157
 trusts-0.149
Tokenunk
Token ID2954
Feature activation-7.55e-02

Pos logit contributions
bank+0.159
 burgers+0.158
 chip+0.156
keeping+0.149
 ir+0.148
Neg logit contributions
 name-0.148
 English-0.137
 clearly-0.134
 wonders-0.134
 shrug-0.133
Token helped
Token ID4193
Feature activation-1.17e-02

Pos logit contributions
 burgers+0.147
 chip+0.144
 tacos+0.141
bank+0.139
akers+0.138
Neg logit contributions
 clearly-0.083
 closer-0.082
 English-0.080
 shrug-0.075
 closely-0.074
Token the
Token ID262
Feature activation+7.11e-03

Pos logit contributions
rob+0.026
akers+0.025
 ir+0.024
 bulldo+0.024
ages+0.024
Neg logit contributions
 closer-0.011
 curiously-0.010
 closely-0.009
 answer-0.009
 clearly-0.009
Token frog
Token ID21264
Feature activation+2.43e-02

Pos logit contributions
 closer+0.008
 wonders+0.007
 answer+0.007
 shy+0.006
 reply+0.006
Neg logit contributions
keeping-0.014
dogs-0.013
bank-0.013
akers-0.013
rim-0.013
Token around
Token ID1088
Feature activation+2.34e-02

Pos logit contributions
 chip+0.057
ages+0.055
akers+0.054
 tacos+0.053
 burgers+0.053
Neg logit contributions
 closer-0.026
 English-0.024
 Spanish-0.023
 clearly-0.022
 closely-0.021
Token.
Token ID13
Feature activation-1.78e-02

Pos logit contributions
 closer+0.027
 clearly+0.024
 English+0.024
 closely+0.024
 Spanish+0.022
Neg logit contributions
ages-0.050
 chip-0.049
akers-0.047
 burgers-0.046
 tacos-0.046
Token Suddenly
Token ID24975
Feature activation-1.76e-02

Pos logit contributions
 chip+0.040
 burgers+0.039
rim+0.038
 ir+0.038
bank+0.038
Neg logit contributions
 English-0.014
 One-0.012
 Spanish-0.012
 clearly-0.012
adies-0.012
Token,
Token ID11
Feature activation-4.39e-02

Pos logit contributions
 chip+0.031
bank+0.030
 burgers+0.029
rim+0.029
 tacos+0.029
Neg logit contributions
 English-0.023
adies-0.022
 closer-0.022
 Spanish-0.021
 clearly-0.021
Token the
Token ID262
Feature activation-5.98e-02

Pos logit contributions
bank+0.099
rim+0.098
akers+0.096
 chip+0.096
ages+0.095
Neg logit contributions
 closer-0.040
 English-0.035
 clearly-0.033
 hello-0.033
 nearer-0.031
Token chip
Token ID11594
Feature activation-1.74e-01

Pos logit contributions
bank+0.120
rim+0.119
keeping+0.115
akers+0.111
 hyd+0.110
Neg logit contributions
 name-0.060
 answer-0.057
 closer-0.057
 wonders-0.054
 shy-0.053
Tokenm
Token ID76
Feature activation-1.14e-01

Pos logit contributions
keeping+0.365
bank+0.323
 bulldo+0.308
 hyd+0.306
 burgers+0.301
Neg logit contributions
 nods-0.172
 closer-0.153
 shrug-0.148
She-0.141
 clearly-0.141
Tokenunk
Token ID2954
Feature activation-8.00e-02

Pos logit contributions
 chip+0.175
 burgers+0.172
bank+0.168
 ir+0.167
 bulldo+0.162
Neg logit contributions
 name-0.148
aly-0.137
 clearly-0.137
 English-0.134
 whispers-0.134
Token stopped
Token ID5025
Feature activation+2.79e-02

Pos logit contributions
 chip+0.148
 burgers+0.144
bank+0.140
 tacos+0.138
 ir+0.136
Neg logit contributions
 closer-0.100
 clearly-0.097
 English-0.096
 closely-0.091
bell-0.089
Token and
Token ID290
Feature activation-1.81e-03

Pos logit contributions
 closer+0.036
 clearly+0.033
 English+0.033
 closely+0.032
 Spanish+0.031
Neg logit contributions
 chip-0.053
ages-0.052
 burgers-0.051
 tacos-0.050
bank-0.050
Token looked
Token ID3114
Feature activation+2.50e-02

Pos logit contributions
 chip+0.003
akers+0.003
 burgers+0.003
ages+0.003
bank+0.003
Neg logit contributions
 closer-0.002
 clearly-0.002
 English-0.002
 closely-0.002
adies-0.002
Token after
Token ID706
Feature activation-5.47e-02

Pos logit contributions
 chip+0.071
 tacos+0.070
rim+0.070
 burgers+0.069
bank+0.068
Neg logit contributions
 clearly-0.021
 closer-0.020
 English-0.018
 closely-0.018
 Spanish-0.017
Token that
Token ID326
Feature activation-3.89e-02

Pos logit contributions
 chip+0.105
bank+0.101
 ir+0.096
 bulldo+0.095
ages+0.095
Neg logit contributions
 clearly-0.066
 English-0.061
 closer-0.059
 closely-0.059
 Spanish-0.056
Token.
Token ID13
Feature activation-2.37e-02

Pos logit contributions
rim+0.093
 tacos+0.089
mons+0.088
akers+0.088
uc+0.087
Neg logit contributions
 closely-0.034
 One-0.032
 closer-0.031
 wonders-0.031
 English-0.028
Token Whenever
Token ID21326
Feature activation+1.37e-02

Pos logit contributions
 chip+0.046
 burgers+0.045
akers+0.045
rim+0.045
 ir+0.044
Neg logit contributions
 English-0.026
 One-0.025
 Spanish-0.024
 clearly-0.022
 Sometimes-0.022
Token the
Token ID262
Feature activation-6.15e-02

Pos logit contributions
 English+0.015
 anyone+0.015
 closer+0.014
 Spanish+0.013
 Crystal+0.013
Neg logit contributions
rim-0.028
 chip-0.027
bank-0.027
akers-0.027
ages-0.027
Token chip
Token ID11594
Feature activation-1.74e-01

Pos logit contributions
keeping+0.098
bank+0.086
rob+0.084
ages+0.080
bek+0.080
Neg logit contributions
 closer-0.092
 answer-0.090
 wonders-0.090
 name-0.089
 shy-0.088
Tokenm
Token ID76
Feature activation-1.09e-01

Pos logit contributions
keeping+0.378
icol+0.340
bank+0.326
 cho+0.310
 bulldo+0.308
Neg logit contributions
 nods-0.153
 closer-0.138
 shrug-0.133
 cannot-0.130
bird-0.126
Tokenunk
Token ID2954
Feature activation-8.11e-02

Pos logit contributions
rob+0.176
 ir+0.171
bek+0.170
bank+0.170
keeping+0.169
Neg logit contributions
 name-0.133
 shrug-0.119
 upon-0.119
 wonders-0.118
 closer-0.114
Token needed
Token ID2622
Feature activation-3.14e-02

Pos logit contributions
 chip+0.147
bank+0.139
 burgers+0.139
akers+0.135
 tacos+0.134
Neg logit contributions
 English-0.100
 closer-0.100
 clearly-0.099
 wonders-0.095
 Spanish-0.093
Token help
Token ID1037
Feature activation-1.50e-02

Pos logit contributions
 chip+0.067
rim+0.066
akers+0.066
bank+0.066
ages+0.065
Neg logit contributions
 closer-0.031
 English-0.028
 Spanish-0.027
 hello-0.026
 answer-0.025
Token,
Token ID11
Feature activation-1.60e-02

Pos logit contributions
rob+0.034
akers+0.032
 bulldo+0.032
rim+0.032
 chip+0.032
Neg logit contributions
 closer-0.013
 closely-0.012
 clearly-0.012
 English-0.012
 answer-0.012
Token floor
Token ID4314
Feature activation-3.89e-02

Pos logit contributions
uc+0.027
ages+0.027
rim+0.026
bank+0.026
akers+0.025
Neg logit contributions
 closer-0.020
 closely-0.019
 diary-0.018
 open-0.018
 answer-0.018
Token.
Token ID13
Feature activation-1.21e-02

Pos logit contributions
 chip+0.096
 bulldo+0.091
 ir+0.090
rim+0.090
ages+0.090
Neg logit contributions
 clearly-0.026
 closely-0.024
 closer-0.023
 English-0.020
bell-0.018
Token
Token ID198
Feature activation+3.87e-03

Pos logit contributions
 chip+0.024
bank+0.023
ages+0.022
 burgers+0.022
rim+0.022
Neg logit contributions
 English-0.013
 Spanish-0.012
 One-0.012
adies-0.012
 clearly-0.012
Token
Token ID198
Feature activation-2.47e-03

Pos logit contributions
 English+0.004
adies+0.004
 Spanish+0.004
 clearly+0.004
 closer+0.004
Neg logit contributions
 chip-0.008
bank-0.007
 burgers-0.007
ages-0.007
 tacos-0.007
TokenThe
Token ID464
Feature activation-5.52e-02

Pos logit contributions
 chip+0.005
bank+0.004
 burgers+0.004
 ir+0.004
 tacos+0.004
Neg logit contributions
 English-0.003
adies-0.003
 Spanish-0.003
 closer-0.003
One-0.003
Token chip
Token ID11594
Feature activation-1.72e-01

Pos logit contributions
bank+0.102
 chip+0.098
ages+0.095
rim+0.094
 burgers+0.093
Neg logit contributions
 closer-0.065
 English-0.065
 Spanish-0.061
adies-0.059
 diary-0.059
Tokenmun
Token ID6199
Feature activation-6.46e-02

Pos logit contributions
keeping+0.363
bank+0.334
 bulldo+0.310
 cho+0.303
 piling+0.295
Neg logit contributions
 nods-0.170
 shrug-0.147
 closer-0.142
le-0.142
 trusts-0.140
Tokenks
Token ID591
Feature activation-3.20e-02

Pos logit contributions
 chip+0.146
 burgers+0.134
bank+0.133
 ir+0.129
 tacos+0.129
Neg logit contributions
 English-0.058
adies-0.057
 Spanish-0.053
 clearly-0.052
 closer-0.050
Token were
Token ID547
Feature activation-5.99e-02

Pos logit contributions
bank+0.059
 chip+0.059
 burgers+0.056
keeping+0.055
 ir+0.055
Neg logit contributions
 English-0.038
 clearly-0.037
 closer-0.036
bell-0.035
 closely-0.035
Token very
Token ID845
Feature activation-4.46e-02

Pos logit contributions
rob+0.120
bank+0.117
ages+0.116
mons+0.114
keeping+0.110
Neg logit contributions
 closer-0.071
 clearly-0.068
 English-0.068
 Spanish-0.067
 closely-0.064
Token curious
Token ID11040
Feature activation+3.41e-02

Pos logit contributions
 chip+0.095
bank+0.093
keeping+0.092
rob+0.091
ages+0.090
Neg logit contributions
 clearly-0.042
 closely-0.042
 closer-0.042
 English-0.040
 wonders-0.038
Token leave
Token ID2666
Feature activation-6.70e-04

Pos logit contributions
ages+0.041
akers+0.041
 ir+0.041
mons+0.040
bers+0.040
Neg logit contributions
 closer-0.023
 clearly-0.021
 English-0.021
 answer-0.020
 name-0.020
Token!"
Token ID2474
Feature activation-1.11e-02

Pos logit contributions
 chip+0.001
bank+0.001
 ir+0.001
 bulldo+0.001
ages+0.001
Neg logit contributions
 closer-0.001
 English-0.001
 closely-0.001
adies-0.001
 Spanish-0.001
Token Suddenly
Token ID24975
Feature activation-1.00e-02

Pos logit contributions
akers+0.026
rim+0.026
 burgers+0.025
 tacos+0.025
 hamb+0.024
Neg logit contributions
 One-0.008
 English-0.008
 closer-0.008
 clearly-0.007
 Spanish-0.007
Token,
Token ID11
Feature activation-3.61e-02

Pos logit contributions
 chip+0.018
bank+0.018
rim+0.018
 burgers+0.018
akers+0.017
Neg logit contributions
 English-0.012
 closer-0.012
adies-0.012
 Spanish-0.011
 hello-0.011
Token the
Token ID262
Feature activation-5.54e-02

Pos logit contributions
bank+0.076
rim+0.075
akers+0.074
 chip+0.073
keeping+0.072
Neg logit contributions
 closer-0.038
 English-0.035
 clearly-0.032
 hello-0.032
 Venus-0.031
Token chip
Token ID11594
Feature activation-1.72e-01

Pos logit contributions
bank+0.117
rim+0.113
ages+0.113
akers+0.112
keeping+0.111
Neg logit contributions
 closer-0.050
 English-0.045
 answer-0.044
 name-0.042
 wonders-0.042
Tokenmun
Token ID6199
Feature activation-6.33e-02

Pos logit contributions
keeping+0.377
bank+0.330
 bulldo+0.307
 ir+0.306
icol+0.304
Neg logit contributions
 closer-0.168
 nods-0.163
bird-0.152
 shrug-0.148
le-0.145
Tokenks
Token ID591
Feature activation-2.98e-02

Pos logit contributions
 chip+0.154
 burgers+0.142
 ir+0.138
bank+0.137
 tacos+0.136
Neg logit contributions
 English-0.047
adies-0.047
 clearly-0.046
 closely-0.043
 Spanish-0.042
Token were
Token ID547
Feature activation-6.19e-02

Pos logit contributions
 chip+0.056
bank+0.054
 burgers+0.052
 tacos+0.051
keeping+0.051
Neg logit contributions
 English-0.036
 closer-0.036
 clearly-0.035
adies-0.034
bell-0.033
Token free
Token ID1479
Feature activation-1.86e-02

Pos logit contributions
bank+0.113
 chip+0.112
keeping+0.108
 ir+0.107
ages+0.107
Neg logit contributions
 closer-0.081
 clearly-0.076
 English-0.075
 Spanish-0.070
 closely-0.069
Token again
Token ID757
Feature activation+3.39e-02

Pos logit contributions
 chip+0.037
 tacos+0.037
ages+0.037
rob+0.036
 burgers+0.036
Neg logit contributions
 closer-0.020
 clearly-0.018
 English-0.018
 closely-0.018
 Spanish-0.017
Token!"
Token ID2474
Feature activation+3.25e-03

Pos logit contributions
 chip+0.135
 burgers+0.129
 ir+0.125
bank+0.125
 tacos+0.124
Neg logit contributions
adies-0.070
 English-0.070
 wonders-0.069
 clearly-0.069
 hello-0.068
Token said
Token ID531
Feature activation-2.47e-02

Pos logit contributions
 closer+0.003
 One+0.003
 English+0.003
 Crystal+0.003
 hello+0.003
Neg logit contributions
akers-0.007
rim-0.007
bank-0.006
 burgers-0.006
 chip-0.006
Token Lily
Token ID20037
Feature activation+2.79e-03

Pos logit contributions
bank+0.044
 chip+0.044
 burgers+0.043
ages+0.042
akers+0.042
Neg logit contributions
 English-0.030
 closer-0.029
 Spanish-0.028
adies-0.028
 clearly-0.027
Token.
Token ID13
Feature activation-1.06e-02

Pos logit contributions
 English+0.003
 closer+0.003
 clearly+0.003
 Spanish+0.003
 closely+0.003
Neg logit contributions
 chip-0.005
 burgers-0.005
bank-0.005
 tacos-0.005
 ir-0.005
Token The
Token ID383
Feature activation-2.91e-02

Pos logit contributions
akers+0.022
rim+0.022
 burgers+0.021
 ir+0.020
 tacos+0.020
Neg logit contributions
 One-0.011
 English-0.010
 hello-0.010
 closer-0.009
 Crystal-0.009
Token por
Token ID16964
Feature activation-1.68e-01

Pos logit contributions
 chip+0.045
bank+0.044
 burgers+0.043
ages+0.043
 ir+0.041
Neg logit contributions
 English-0.042
 closer-0.041
 Spanish-0.040
adies-0.039
 clearly-0.039
Tokencup
Token ID25244
Feature activation-4.93e-02

Pos logit contributions
 burgers+0.272
 chip+0.264
bank+0.258
 ir+0.253
 tacos+0.245
Neg logit contributions
 clearly-0.209
 English-0.208
adies-0.206
itates-0.199
 closer-0.196
Tokenine
Token ID500
Feature activation-8.16e-02

Pos logit contributions
 burgers+0.080
 chip+0.076
 tacos+0.074
bank+0.072
 ir+0.072
Neg logit contributions
 English-0.066
 clearly-0.064
 closer-0.063
 name-0.062
 closely-0.062
Token said
Token ID531
Feature activation-2.33e-02

Pos logit contributions
 chip+0.146
bank+0.143
 burgers+0.143
akers+0.140
 ir+0.137
Neg logit contributions
 English-0.096
 clearly-0.096
 closer-0.095
 closely-0.093
 wonders-0.089
Token nothing
Token ID2147
Feature activation+1.95e-02

Pos logit contributions
 chip+0.038
bank+0.036
 burgers+0.035
 ir+0.034
 tacos+0.034
Neg logit contributions
 English-0.033
adies-0.032
 closer-0.032
 Spanish-0.031
 clearly-0.031
Token and
Token ID290
Feature activation-2.46e-04

Pos logit contributions
 English+0.026
adies+0.025
 clearly+0.024
 Spanish+0.024
 closer+0.024
Neg logit contributions
 chip-0.034
bank-0.032
 burgers-0.032
 tacos-0.031
ages-0.030
Token por
Token ID16964
Feature activation-1.28e-01

Pos logit contributions
 closer+0.012
 English+0.010
 diary+0.010
 hello+0.010
 name+0.010
Neg logit contributions
ages-0.019
bank-0.019
keeping-0.018
 ir-0.018
ards-0.018
Tokencup
Token ID25244
Feature activation-2.47e-02

Pos logit contributions
 ir+0.217
 burgers+0.212
bank+0.207
 chip+0.205
keeping+0.205
Neg logit contributions
adies-0.152
 name-0.135
 clearly-0.134
 hello-0.133
 closer-0.131
Tokenine
Token ID500
Feature activation-4.90e-02

Pos logit contributions
 burgers+0.039
 gol+0.039
 ir+0.038
 chip+0.037
 bulldo+0.037
Neg logit contributions
 name-0.032
adies-0.031
 clearly-0.031
 closely-0.031
 whispers-0.029
Token!
Token ID0
Feature activation-1.80e-02

Pos logit contributions
 ir+0.107
 chip+0.104
 burgers+0.102
rim+0.101
bank+0.100
Neg logit contributions
 closely-0.042
 clearly-0.039
 closer-0.037
 hello-0.032
 name-0.031
Token The
Token ID383
Feature activation-3.72e-02

Pos logit contributions
 chip+0.039
bank+0.037
 burgers+0.036
 tacos+0.035
ages+0.035
Neg logit contributions
 English-0.016
adies-0.016
 Spanish-0.015
 clearly-0.015
 closer-0.014
Token por
Token ID16964
Feature activation-1.67e-01

Pos logit contributions
bank+0.054
keeping+0.053
ages+0.052
 chip+0.052
 burgers+0.050
Neg logit contributions
 closer-0.055
 wonders-0.053
 name-0.052
 answer-0.052
 English-0.052
Tokencup
Token ID25244
Feature activation-5.63e-02

Pos logit contributions
 burgers+0.285
 ir+0.279
 chip+0.279
 gol+0.272
bank+0.269
Neg logit contributions
adies-0.190
 clearly-0.178
eks-0.168
 name-0.166
 hello-0.162
Tokenine
Token ID500
Feature activation-8.37e-02

Pos logit contributions
 burgers+0.087
 chip+0.086
 tacos+0.079
 bulldo+0.079
 ir+0.079
Neg logit contributions
 clearly-0.078
 English-0.075
adies-0.075
 name-0.075
 wonders-0.074
Token qu
Token ID627
Feature activation-7.19e-02

Pos logit contributions
 chip+0.151
bank+0.145
 burgers+0.145
 ir+0.143
rim+0.139
Neg logit contributions
 clearly-0.102
 closely-0.098
 wonders-0.094
adies-0.093
 closer-0.093
Tokenilled
Token ID2967
Feature activation-5.55e-02

Pos logit contributions
 chip+0.109
bank+0.100
 burgers+0.098
 ir+0.093
 tacos+0.093
Neg logit contributions
 English-0.112
adies-0.110
 Spanish-0.106
 clearly-0.105
 closer-0.104
Token and
Token ID290
Feature activation-1.98e-02

Pos logit contributions
 chip+0.103
bank+0.100
 burgers+0.098
 bulldo+0.097
ages+0.096
Neg logit contributions
 English-0.067
 closer-0.063
 Spanish-0.062
adies-0.061
 closely-0.061
Token
Token ID220
Feature activation+2.92e-02

Pos logit contributions
 English+0.003
adies+0.003
 Spanish+0.003
 closer+0.003
 clearly+0.003
Neg logit contributions
 chip-0.005
bank-0.005
 burgers-0.004
ages-0.004
 tacos-0.004
Token
Token ID198
Feature activation+1.55e-02

Pos logit contributions
 English+0.045
adies+0.044
 Spanish+0.043
 closer+0.041
bell+0.039
Neg logit contributions
 chip-0.044
bank-0.040
ages-0.039
 burgers-0.037
akers-0.035
Token
Token ID198
Feature activation+9.20e-03

Pos logit contributions
adies+0.026
 English+0.025
 Spanish+0.024
 diary+0.023
bell+0.023
Neg logit contributions
 chip-0.022
bank-0.021
ages-0.017
Su-0.017
 burgers-0.017
TokenWhen
Token ID2215
Feature activation-1.67e-02

Pos logit contributions
 English+0.012
adies+0.011
 Spanish+0.011
 closer+0.011
 clearly+0.011
Neg logit contributions
 chip-0.017
bank-0.016
 burgers-0.015
ages-0.015
 ir-0.014
Token the
Token ID262
Feature activation-5.60e-02

Pos logit contributions
bank+0.041
rim+0.040
ages+0.039
 chip+0.039
akers+0.039
Neg logit contributions
 closer-0.011
adies-0.009
 English-0.009
 Crystal-0.009
 One-0.009
Token chip
Token ID11594
Feature activation-1.64e-01

Pos logit contributions
bank+0.083
keeping+0.080
 ir+0.071
rim+0.069
akers+0.067
Neg logit contributions
 name-0.087
 answer-0.085
 closer-0.084
 shrug-0.080
 wonders-0.079
Tokenm
Token ID76
Feature activation-1.05e-01

Pos logit contributions
keeping+0.358
bank+0.311
 ir+0.290
icol+0.289
 cho+0.286
Neg logit contributions
 nods-0.132
 closer-0.124
 shrug-0.120
itates-0.117
One-0.116
Tokenunk
Token ID2954
Feature activation-7.35e-02

Pos logit contributions
bank+0.147
 chip+0.144
 burgers+0.137
keeping+0.136
 ir+0.136
Neg logit contributions
 name-0.153
adies-0.151
 closer-0.144
 wonders-0.144
aly-0.143
Token reached
Token ID4251
Feature activation-3.08e-04

Pos logit contributions
 chip+0.142
 burgers+0.131
bank+0.130
 tacos+0.127
akers+0.127
Neg logit contributions
 closer-0.086
adies-0.083
 English-0.081
 clearly-0.080
bell-0.078
Token her
Token ID607
Feature activation+4.96e-02

Pos logit contributions
 chip+0.001
bank+0.001
 burgers+0.000
ages+0.000
 ir+0.000
Neg logit contributions
 English-0.000
adies-0.000
 closer-0.000
 clearly-0.000
 Spanish-0.000
Token home
Token ID1363
Feature activation-1.52e-02

Pos logit contributions
 English+0.067
adies+0.063
 Spanish+0.063
 closer+0.063
 clearly+0.063
Neg logit contributions
 chip-0.084
bank-0.080
 burgers-0.078
 tacos-0.076
 ir-0.076
Token chip
Token ID11594
Feature activation-1.19e-01

Pos logit contributions
 closer+0.007
 English+0.007
 Spanish+0.006
 diary+0.006
 name+0.006
Neg logit contributions
bank-0.015
ages-0.015
ards-0.014
keeping-0.014
mons-0.014
Tokenm
Token ID76
Feature activation-6.35e-02

Pos logit contributions
bank+0.212
keeping+0.212
ages+0.203
 tacos+0.201
 ir+0.199
Neg logit contributions
 closer-0.138
 clearly-0.119
 shrug-0.115
le-0.113
 answer-0.110
Tokenunk
Token ID2954
Feature activation-4.12e-02

Pos logit contributions
 chip+0.096
 burgers+0.094
 ir+0.093
bank+0.093
keeping+0.088
Neg logit contributions
 name-0.087
 English-0.081
aly-0.081
 clearly-0.079
adies-0.079
Token!
Token ID0
Feature activation-2.56e-02

Pos logit contributions
 tacos+0.083
 ir+0.082
 burgers+0.081
bank+0.081
ages+0.080
Neg logit contributions
 closer-0.043
 clearly-0.038
 closely-0.038
 English-0.035
 hello-0.034
Token The
Token ID383
Feature activation-4.40e-02

Pos logit contributions
akers+0.054
 chip+0.053
rim+0.052
 burgers+0.052
ages+0.052
Neg logit contributions
 One-0.024
 English-0.022
 hello-0.021
 Spanish-0.021
 closer-0.021
Token chip
Token ID11594
Feature activation-1.62e-01

Pos logit contributions
keeping+0.052
bank+0.051
 ir+0.050
rim+0.048
akers+0.047
Neg logit contributions
 closer-0.079
 name-0.078
 answer-0.075
 wonders-0.074
 shrug-0.073
Tokenm
Token ID76
Feature activation-9.93e-02

Pos logit contributions
keeping+0.313
bank+0.289
 bulldo+0.275
 hyd+0.272
 ir+0.272
Neg logit contributions
 closer-0.165
 nods-0.164
 shrug-0.155
itates-0.148
 wonders-0.146
Tokenunk
Token ID2954
Feature activation-6.96e-02

Pos logit contributions
 chip+0.154
 ir+0.152
bank+0.151
 burgers+0.149
keeping+0.141
Neg logit contributions
 name-0.132
aly-0.123
 English-0.122
adies-0.119
 clearly-0.118
Token said
Token ID531
Feature activation-2.52e-02

Pos logit contributions
 burgers+0.133
keeping+0.131
 tacos+0.130
 chip+0.129
dogs+0.127
Neg logit contributions
 clearly-0.082
 closer-0.078
 closely-0.075
 English-0.073
 curiously-0.072
Token to
Token ID284
Feature activation-4.19e-02

Pos logit contributions
 chip+0.046
bank+0.045
 ir+0.044
akers+0.044
rim+0.044
Neg logit contributions
 closer-0.031
 English-0.031
 hello-0.031
 Spanish-0.029
adies-0.029
Token Sally
Token ID25737
Feature activation+4.67e-03

Pos logit contributions
ages+0.071
 chip+0.070
bank+0.067
 burgers+0.067
akers+0.066
Neg logit contributions
 English-0.056
 closer-0.055
 clearly-0.054
adies-0.054
 Spanish-0.053
Token bird
Token ID6512
Feature activation+4.20e-02

Pos logit contributions
 English+0.026
 closer+0.025
 Spanish+0.024
 wonders+0.024
 clearly+0.024
Neg logit contributions
 chip-0.037
ages-0.036
bank-0.036
 burgers-0.036
 tacos-0.035
Token.
Token ID13
Feature activation-8.22e-03

Pos logit contributions
 closer+0.041
 English+0.039
 clearly+0.039
 closely+0.037
 Spanish+0.037
Neg logit contributions
 chip-0.088
 burgers-0.085
 bulldo-0.084
 ir-0.084
 tacos-0.084
Token Luckily
Token ID24262
Feature activation-3.13e-02

Pos logit contributions
rim+0.018
akers+0.018
 ir+0.017
 burgers+0.017
ages+0.017
Neg logit contributions
 One-0.008
 English-0.007
 Spanish-0.007
 Crystal-0.006
 clearly-0.006
Token,
Token ID11
Feature activation-2.72e-02

Pos logit contributions
 chip+0.060
rim+0.059
 tacos+0.058
 burgers+0.058
bank+0.058
Neg logit contributions
 closer-0.036
 English-0.035
adies-0.034
 closely-0.032
 Spanish-0.031
Token the
Token ID262
Feature activation-4.84e-02

Pos logit contributions
rim+0.062
akers+0.058
ages+0.058
bank+0.057
 chip+0.056
Neg logit contributions
 closer-0.030
 English-0.026
 another-0.024
 Spanish-0.023
 clearly-0.023
Token chip
Token ID11594
Feature activation-1.61e-01

Pos logit contributions
 chip+0.085
bank+0.083
 burgers+0.081
ages+0.081
 tacos+0.079
Neg logit contributions
 English-0.063
 closer-0.059
 Spanish-0.059
adies-0.057
 clearly-0.056
Tokenm
Token ID76
Feature activation-9.96e-02

Pos logit contributions
keeping+0.301
bank+0.268
 bulldo+0.258
 burgers+0.253
 hyd+0.250
Neg logit contributions
 nods-0.179
 closer-0.178
 shrug-0.168
 clearly-0.160
 wonders-0.158
Tokenunk
Token ID2954
Feature activation-6.64e-02

Pos logit contributions
 chip+0.149
 burgers+0.141
bank+0.139
 ir+0.132
 tacos+0.131
Neg logit contributions
 English-0.150
adies-0.145
 clearly-0.142
 Spanish-0.142
 closer-0.141
Token said
Token ID531
Feature activation-1.95e-02

Pos logit contributions
 chip+0.123
 burgers+0.118
bank+0.112
 ir+0.112
 tacos+0.112
Neg logit contributions
 English-0.084
 closer-0.082
 clearly-0.082
 closely-0.077
 Spanish-0.076
Token they
Token ID484
Feature activation-4.86e-02

Pos logit contributions
 chip+0.044
bank+0.043
rim+0.043
akers+0.042
keeping+0.042
Neg logit contributions
 English-0.017
 hello-0.017
 closer-0.016
 Spanish-0.015
adies-0.015
Token should
Token ID815
Feature activation-6.84e-03

Pos logit contributions
 chip+0.080
bank+0.075
 burgers+0.073
 tacos+0.070
ages+0.070
Neg logit contributions
 English-0.069
adies-0.067
 Spanish-0.065
 clearly-0.064
 closer-0.064
Token in
Token ID287
Feature activation-2.00e-02

Pos logit contributions
 English+0.025
 closer+0.025
 clearly+0.023
 Spanish+0.023
adies+0.023
Neg logit contributions
 chip-0.039
bank-0.037
 burgers-0.037
 tacos-0.036
 ir-0.036
Token her
Token ID607
Feature activation+2.63e-02

Pos logit contributions
 chip+0.041
bank+0.040
ages+0.040
 burgers+0.039
 tacos+0.039
Neg logit contributions
 English-0.022
 Spanish-0.021
 closer-0.020
 clearly-0.019
 wonders-0.019
Token paws
Token ID49193
Feature activation-3.45e-02

Pos logit contributions
 English+0.034
 diary+0.033
 name+0.032
 wonders+0.032
 Spanish+0.031
Neg logit contributions
 chip-0.048
bank-0.046
 burgers-0.045
rim-0.044
dogs-0.043
Token.
Token ID13
Feature activation-1.15e-02

Pos logit contributions
 chip+0.079
 tacos+0.077
 burgers+0.077
rim+0.076
 ir+0.075
Neg logit contributions
 closer-0.026
 closely-0.025
 clearly-0.025
 English-0.023
 One-0.022
Token The
Token ID383
Feature activation-3.87e-02

Pos logit contributions
rim+0.028
akers+0.027
 burgers+0.027
 chip+0.027
 ir+0.026
Neg logit contributions
 One-0.007
 English-0.007
 clearly-0.006
 Spanish-0.006
adies-0.006
Token chip
Token ID11594
Feature activation-1.60e-01

Pos logit contributions
bank+0.057
keeping+0.054
rim+0.053
akers+0.052
ages+0.052
Neg logit contributions
 closer-0.059
 English-0.056
 wonders-0.056
 name-0.055
 shrug-0.055
Tokenm
Token ID76
Feature activation-1.03e-01

Pos logit contributions
keeping+0.304
bank+0.290
 burgers+0.272
 tacos+0.270
 bulldo+0.266
Neg logit contributions
 closer-0.167
itates-0.162
 shrug-0.161
 nods-0.160
 wonders-0.151
Tokenunk
Token ID2954
Feature activation-6.71e-02

Pos logit contributions
 chip+0.135
bank+0.126
 burgers+0.124
 ir+0.117
 tacos+0.116
Neg logit contributions
 English-0.175
adies-0.171
 Spanish-0.166
 clearly-0.165
 closer-0.164
Token got
Token ID1392
Feature activation-2.60e-02

Pos logit contributions
 chip+0.120
 burgers+0.116
 tacos+0.113
bank+0.111
dogs+0.108
Neg logit contributions
 English-0.087
 clearly-0.086
 closer-0.085
 closely-0.082
 wonders-0.079
Token closer
Token ID5699
Feature activation+8.95e-02

Pos logit contributions
 chip+0.054
bank+0.051
 burgers+0.050
 ir+0.048
ages+0.048
Neg logit contributions
 English-0.026
adies-0.026
 Spanish-0.023
 clearly-0.022
itates-0.021
Token and
Token ID290
Feature activation+2.63e-03

Pos logit contributions
 closer+0.105
 English+0.101
 clearly+0.100
 closely+0.097
 Spanish+0.093
Neg logit contributions
 chip-0.172
 burgers-0.162
rim-0.161
 tacos-0.160
bank-0.160
Token boy
Token ID2933
Feature activation-4.19e-02

Pos logit contributions
keeping+0.061
bank+0.057
rim+0.056
rob+0.053
ages+0.047
Neg logit contributions
 name-0.085
 closer-0.085
 shy-0.084
 wonders-0.083
 another-0.081
Token finally
Token ID3443
Feature activation-1.52e-02

Pos logit contributions
 chip+0.084
 burgers+0.082
mons+0.081
rim+0.081
akers+0.081
Neg logit contributions
 clearly-0.045
 closer-0.043
 English-0.042
 wonders-0.042
bell-0.041
Token understood
Token ID7247
Feature activation+1.69e-02

Pos logit contributions
 chip+0.029
 burgers+0.027
 ir+0.026
bank+0.026
 tacos+0.026
Neg logit contributions
 English-0.019
 clearly-0.018
 closer-0.018
adies-0.018
 Spanish-0.017
Token why
Token ID1521
Feature activation-1.50e-02

Pos logit contributions
 clearly+0.014
 English+0.013
 closely+0.013
 closer+0.012
 Spanish+0.012
Neg logit contributions
akers-0.041
ages-0.041
 chip-0.040
drops-0.039
 bulldo-0.039
Token the
Token ID262
Feature activation-3.38e-02

Pos logit contributions
rim+0.031
akers+0.031
ages+0.031
 lin+0.030
bank+0.030
Neg logit contributions
 clearly-0.015
 English-0.014
 anyone-0.013
 Spanish-0.013
 closer-0.012
Token chip
Token ID11594
Feature activation-1.60e-01

Pos logit contributions
keeping+0.036
rim+0.032
rob+0.031
akers+0.027
ams+0.027
Neg logit contributions
 name-0.068
 wonders-0.067
 answer-0.066
 closer-0.066
 shy-0.064
Tokenm
Token ID76
Feature activation-1.00e-01

Pos logit contributions
keeping+0.357
bank+0.326
 cho+0.324
icol+0.314
 bulldo+0.310
Neg logit contributions
itates-0.113
 closer-0.105
 nods-0.100
 shrug-0.091
 clearly-0.085
Tokenunk
Token ID2954
Feature activation-5.29e-02

Pos logit contributions
rob+0.169
bank+0.159
 ir+0.157
keeping+0.157
bek+0.157
Neg logit contributions
 name-0.125
 upon-0.112
 shrug-0.107
 answer-0.106
aly-0.103
Token had
Token ID550
Feature activation-4.72e-02

Pos logit contributions
 chip+0.101
 burgers+0.099
 ir+0.097
bank+0.097
rim+0.097
Neg logit contributions
 clearly-0.061
 closer-0.057
 English-0.053
 closely-0.052
 shrug-0.051
Token been
Token ID587
Feature activation-2.16e-02

Pos logit contributions
 chip+0.103
rim+0.095
bank+0.092
 bulldo+0.090
dogs+0.090
Neg logit contributions
 English-0.055
 Spanish-0.050
 closer-0.048
 wonders-0.047
 clearly-0.047
Token so
Token ID523
Feature activation-2.51e-02

Pos logit contributions
 chip+0.046
bank+0.040
stack+0.040
rim+0.040
 bulldo+0.039
Neg logit contributions
 English-0.027
 closer-0.026
 Spanish-0.025
 clearly-0.024
 wonders-0.022
Token.
Token ID13
Feature activation-1.33e-02

Pos logit contributions
 clearly+0.003
 wonders+0.003
 closely+0.003
 English+0.003
 closer+0.003
Neg logit contributions
 chip-0.009
akers-0.009
rim-0.009
ages-0.009
 tacos-0.009
Token
Token ID198
Feature activation-1.81e-03

Pos logit contributions
 chip+0.029
 burgers+0.029
akers+0.028
rim+0.028
bank+0.028
Neg logit contributions
 English-0.011
 One-0.011
 Spanish-0.010
 clearly-0.010
adies-0.009
Token
Token ID198
Feature activation-7.30e-03

Pos logit contributions
 chip+0.004
bank+0.004
 burgers+0.003
 tacos+0.003
ages+0.003
Neg logit contributions
 English-0.002
adies-0.002
 Spanish-0.002
 clearly-0.002
 closer-0.002
TokenThe
Token ID464
Feature activation-5.51e-02

Pos logit contributions
 chip+0.014
bank+0.013
 burgers+0.013
ages+0.013
 ir+0.013
Neg logit contributions
 English-0.008
 Spanish-0.008
adies-0.008
 clearly-0.008
One-0.008
Token av
Token ID1196
Feature activation-9.48e-02

Pos logit contributions
 chip+0.105
bank+0.100
 burgers+0.097
ages+0.094
 tacos+0.094
Neg logit contributions
 English-0.064
adies-0.062
 Spanish-0.060
 clearly-0.059
 closer-0.059
Tokenoc
Token ID420
Feature activation-1.58e-01

Pos logit contributions
 burgers+0.098
 chip+0.097
 tacos+0.095
bank+0.091
 hamb+0.081
Neg logit contributions
 English-0.177
 closer-0.173
 Spanish-0.173
 wonders-0.173
 clearly-0.172
Tokenados
Token ID22484
Feature activation-8.09e-02

Pos logit contributions
 chip+0.186
 burgers+0.178
bank+0.175
 ir+0.166
 tacos+0.165
Neg logit contributions
adies-0.275
 English-0.274
 clearly-0.262
 Spanish-0.261
 closer-0.259
Token were
Token ID547
Feature activation-5.40e-02

Pos logit contributions
 chip+0.145
 burgers+0.142
 tacos+0.140
bank+0.133
akers+0.132
Neg logit contributions
 clearly-0.103
 closer-0.097
 closely-0.095
 wonders-0.093
 curiously-0.088
Token shared
Token ID4888
Feature activation-2.35e-03

Pos logit contributions
mons+0.108
bank+0.107
 bulldo+0.106
 chip+0.106
rob+0.104
Neg logit contributions
 clearly-0.056
 closer-0.056
 closely-0.053
 English-0.052
 wonders-0.052
Token with
Token ID351
Feature activation+7.19e-04

Pos logit contributions
 chip+0.005
 bulldo+0.005
bank+0.005
rim+0.005
 ir+0.005
Neg logit contributions
 English-0.002
 Spanish-0.002
 clearly-0.002
 closer-0.002
 closely-0.002
Token everybody
Token ID7288
Feature activation+4.52e-02

Pos logit contributions
 English+0.001
 Spanish+0.001
 clearly+0.001
 wonders+0.001
 Crystal+0.001
Neg logit contributions
bank-0.001
 chip-0.001
 bulldo-0.001
rim-0.001
akers-0.001
Token.
Token ID13
Feature activation-2.09e-02

Pos logit contributions
 clearly+0.017
 closer+0.016
 closely+0.014
 close+0.011
 wonders+0.010
Neg logit contributions
 paramed-0.101
 tacos-0.097
 chip-0.097
mons-0.095
 bulldo-0.095
Token
Token ID198
Feature activation-9.56e-03

Pos logit contributions
 chip+0.048
 burgers+0.047
 tacos+0.046
bank+0.046
rim+0.046
Neg logit contributions
 English-0.016
 Spanish-0.014
 One-0.014
 clearly-0.014
 wonders-0.012
Token
Token ID198
Feature activation-1.45e-02

Pos logit contributions
 chip+0.020
bank+0.019
 burgers+0.019
 tacos+0.018
 ir+0.018
Neg logit contributions
 English-0.009
adies-0.009
 Spanish-0.009
 clearly-0.008
 closer-0.008
TokenThe
Token ID464
Feature activation-5.90e-02

Pos logit contributions
 chip+0.030
bank+0.030
 burgers+0.029
 tacos+0.029
 hamb+0.029
Neg logit contributions
 English-0.014
One-0.014
 Spanish-0.013
 clearly-0.013
 closer-0.013
Token town
Token ID3240
Feature activation-7.27e-02

Pos logit contributions
 chip+0.112
bank+0.110
ages+0.103
 burgers+0.102
keeping+0.101
Neg logit contributions
 English-0.069
 closer-0.068
 Spanish-0.064
adies-0.064
 wonders-0.062
Tokenspe
Token ID4125
Feature activation-1.56e-01

Pos logit contributions
 tacos+0.139
 chip+0.134
 burgers+0.132
rob+0.131
mons+0.130
Neg logit contributions
 closer-0.082
 clearly-0.082
 English-0.074
bell-0.074
 closely-0.074
Tokenople
Token ID643
Feature activation-3.14e-02

Pos logit contributions
 chip+0.283
 burgers+0.270
 hamb+0.266
bank+0.265
 tacos+0.262
Neg logit contributions
adies-0.200
 English-0.196
 Spanish-0.182
 clearly-0.181
 wonders-0.180
Token were
Token ID547
Feature activation-3.89e-02

Pos logit contributions
 chip+0.065
 tacos+0.060
mons+0.060
 burgers+0.060
bank+0.059
Neg logit contributions
 clearly-0.036
 closer-0.036
 closely-0.033
 wonders-0.032
 curiously-0.031
Token relieved
Token ID22598
Feature activation+3.97e-03

Pos logit contributions
mons+0.081
bank+0.080
 chip+0.079
 tacos+0.078
 peppers+0.078
Neg logit contributions
 closer-0.045
 clearly-0.040
 English-0.036
 Spanish-0.036
 close-0.035
Token.
Token ID13
Feature activation-1.47e-02

Pos logit contributions
 closer+0.004
 clearly+0.004
 closely+0.003
 hello+0.003
 English+0.003
Neg logit contributions
 chip-0.009
rob-0.009
ages-0.009
 bulldo-0.009
akers-0.009
Token They
Token ID1119
Feature activation-3.23e-02

Pos logit contributions
 chip+0.029
rim+0.028
akers+0.028
 burgers+0.028
 tacos+0.027
Neg logit contributions
 English-0.017
 Spanish-0.016
 One-0.015
 clearly-0.015
 wonders-0.014
Token cried
Token ID16896
Feature activation+1.84e-02

Pos logit contributions
 closer+0.018
 English+0.018
 clearly+0.018
 wonders+0.018
adies+0.017
Neg logit contributions
 chip-0.026
akers-0.024
rim-0.023
bank-0.023
 ir-0.023
Token.
Token ID13
Feature activation-7.08e-03

Pos logit contributions
 English+0.031
adies+0.030
 Spanish+0.030
itates+0.029
bell+0.028
Neg logit contributions
 chip-0.025
 burgers-0.023
bank-0.022
ages-0.021
 oil-0.021
Token The
Token ID383
Feature activation-2.64e-02

Pos logit contributions
 chip+0.011
bank+0.011
 burgers+0.010
ages+0.010
 tacos+0.010
Neg logit contributions
 English-0.011
adies-0.010
 closer-0.010
 Spanish-0.010
 clearly-0.010
Token end
Token ID886
Feature activation-2.15e-02

Pos logit contributions
 chip+0.040
 oil+0.033
 bulldo+0.031
 burgers+0.031
 tacos+0.031
Neg logit contributions
 English-0.045
adies-0.045
 Spanish-0.043
 clearly-0.042
 Him-0.041
Token.
Token ID13
Feature activation-9.09e-03

Pos logit contributions
 chip+0.030
bank+0.026
 burgers+0.025
ages+0.024
 bulldo+0.024
Neg logit contributions
adies-0.038
 English-0.037
 Spanish-0.036
eks-0.034
 clearly-0.034
Token<|endoftext|>
Token ID50256
Feature activation+7.32e-02

Pos logit contributions
 chip+0.015
 burgers+0.013
bank+0.013
ages+0.013
 bulldo+0.012
Neg logit contributions
adies-0.014
 English-0.013
 closer-0.013
 Spanish-0.013
 closely-0.013
TokenOnce
Token ID7454
Feature activation+1.90e-02

Pos logit contributions
 English+0.128
adies+0.127
 Spanish+0.122
bell+0.122
 closely+0.118
Neg logit contributions
 chip-0.096
bank-0.084
 burgers-0.078
keeping-0.075
 bulldo-0.074
Token upon
Token ID2402
Feature activation+5.80e-02

Pos logit contributions
 English+0.026
 Spanish+0.026
 title+0.025
adies+0.025
 afar+0.025
Neg logit contributions
 chip-0.029
str-0.027
 making-0.026
 bulldo-0.026
melon-0.025
Token a
Token ID257
Feature activation+5.59e-02

Pos logit contributions
 English+0.112
 Spanish+0.109
bell+0.109
ares+0.105
 closely+0.104
Neg logit contributions
 chip-0.065
 burgers-0.060
 making-0.058
ages-0.054
 hamb-0.053
Token time
Token ID640
Feature activation-2.62e-02

Pos logit contributions
 English+0.110
 clearly+0.106
 Spanish+0.106
adies+0.103
 nearer+0.101
Neg logit contributions
 chip-0.063
 burgers-0.056
bank-0.053
 hamb-0.052
 tacos-0.052
Token,
Token ID11
Feature activation+2.03e-02

Pos logit contributions
 chip+0.022
bank+0.016
 bulldo+0.016
 burgers+0.016
 making+0.015
Neg logit contributions
 English-0.058
 clearly-0.058
 Spanish-0.056
 wonders-0.055
bell-0.054
Token and
Token ID290
Feature activation+4.16e-02

Pos logit contributions
 English+0.018
adies+0.018
 Spanish+0.017
 clearly+0.017
 closer+0.016
Neg logit contributions
 chip-0.033
bank-0.032
 burgers-0.031
ages-0.030
 tacos-0.030
Token Lily
Token ID20037
Feature activation+3.87e-02

Pos logit contributions
 English+0.055
adies+0.052
 Spanish+0.051
 closer+0.050
 clearly+0.048
Neg logit contributions
 chip-0.074
bank-0.072
 burgers-0.068
rim-0.067
akers-0.066
Token were
Token ID547
Feature activation+1.31e-02

Pos logit contributions
 English+0.052
adies+0.052
 clearly+0.049
 closer+0.049
 Spanish+0.049
Neg logit contributions
 chip-0.066
bank-0.062
 burgers-0.060
ages-0.058
 tacos-0.057
Token friends
Token ID2460
Feature activation+4.78e-02

Pos logit contributions
 English+0.024
adies+0.023
 clearly+0.023
 Spanish+0.023
 closer+0.022
Neg logit contributions
 chip-0.016
 burgers-0.015
bank-0.014
 tacos-0.014
 ir-0.014
Token who
Token ID508
Feature activation+4.12e-02

Pos logit contributions
adies+0.079
 English+0.078
eks+0.073
 Spanish+0.073
itates+0.071
Neg logit contributions
 chip-0.068
 burgers-0.060
bank-0.060
 tacos-0.055
ards-0.055
Token liked
Token ID8288
Feature activation+9.06e-02

Pos logit contributions
 English+0.042
 clearly+0.041
adies+0.039
 Spanish+0.038
 wonders+0.038
Neg logit contributions
 chip-0.084
bank-0.080
 burgers-0.079
ages-0.076
 tacos-0.075
Token to
Token ID284
Feature activation+3.22e-02

Pos logit contributions
 English+0.083
 closer+0.078
 Spanish+0.078
 hello+0.070
 clearly+0.068
Neg logit contributions
bank-0.193
rim-0.189
 chip-0.188
ages-0.185
akers-0.184
Token play
Token ID711
Feature activation+8.89e-02

Pos logit contributions
 English+0.038
adies+0.037
 Spanish+0.036
 closer+0.035
 clearly+0.035
Neg logit contributions
 chip-0.061
bank-0.057
 burgers-0.056
ages-0.054
 tacos-0.054
Token outside
Token ID2354
Feature activation+1.05e-01

Pos logit contributions
 English+0.112
 closer+0.108
adies+0.107
 Spanish+0.105
 clearly+0.102
Neg logit contributions
 chip-0.163
bank-0.153
ages-0.145
 burgers-0.145
 tacos-0.140
Token.
Token ID13
Feature activation+3.49e-02

Pos logit contributions
 English+0.094
 closer+0.093
 One+0.088
 Spanish+0.086
 closely+0.085
Neg logit contributions
 chip-0.226
bank-0.225
rim-0.222
ages-0.221
 tacos-0.216
Token They
Token ID1119
Feature activation+8.49e-03

Pos logit contributions
adies+0.020
 English+0.019
 clearly+0.018
 closer+0.018
itates+0.017
Neg logit contributions
 chip-0.086
 burgers-0.081
bank-0.081
 tacos-0.079
 ir-0.077
Token bowls
Token ID34796
Feature activation-5.22e-02

Pos logit contributions
 chip+0.019
bank+0.018
 burgers+0.017
ages+0.017
 tacos+0.017
Neg logit contributions
 English-0.019
adies-0.019
 Spanish-0.018
 clearly-0.018
 closer-0.018
Token of
Token ID286
Feature activation-4.51e-02

Pos logit contributions
 chip+0.120
 bulldo+0.119
 ir+0.119
bank+0.118
 tort+0.116
Neg logit contributions
 closer-0.039
 upon-0.034
 closely-0.030
adies-0.029
 answer-0.029
Token salad
Token ID20698
Feature activation-1.66e-02

Pos logit contributions
 chip+0.073
bank+0.072
ages+0.066
 ir+0.063
 bulldo+0.063
Neg logit contributions
 English-0.064
 closer-0.063
adies-0.063
 clearly-0.061
 wonders-0.061
Token.
Token ID13
Feature activation+1.34e-03

Pos logit contributions
 paramed+0.038
 chip+0.038
 ir+0.037
rim+0.037
bank+0.037
Neg logit contributions
 closer-0.012
 closely-0.011
 clearly-0.010
adies-0.010
 upon-0.009
Token They
Token ID1119
Feature activation-1.70e-02

Pos logit contributions
 English+0.002
adies+0.002
 Spanish+0.001
 One+0.001
 clearly+0.001
Neg logit contributions
 chip-0.003
bank-0.002
rim-0.002
 ir-0.002
akers-0.002
Token look
Token ID804
Feature activation+6.30e-02

Pos logit contributions
 chip+0.032
bank+0.031
ages+0.029
 burgers+0.029
 ir+0.029
Neg logit contributions
adies-0.019
 English-0.019
 closer-0.019
 clearly-0.019
 closely-0.018
Token green
Token ID4077
Feature activation+9.82e-03

Pos logit contributions
 English+0.098
adies+0.092
 Spanish+0.092
 clearly+0.088
itates+0.088
Neg logit contributions
 burgers-0.094
 chip-0.093
 hamb-0.086
 tacos-0.086
bank-0.082
Token and
Token ID290
Feature activation+2.42e-02

Pos logit contributions
 closer+0.010
 English+0.010
 clearly+0.010
adies+0.010
 closely+0.010
Neg logit contributions
 chip-0.019
bank-0.019
rim-0.018
 bulldo-0.018
 ir-0.017
Token red
Token ID2266
Feature activation+1.04e-02

Pos logit contributions
 English+0.026
 closer+0.026
 clearly+0.025
 wonders+0.024
adies+0.024
Neg logit contributions
 chip-0.047
bank-0.046
ages-0.044
rim-0.044
 ir-0.043
Token and
Token ID290
Feature activation+1.47e-02

Pos logit contributions
adies+0.010
 English+0.010
 closer+0.010
 clearly+0.010
 closely+0.009
Neg logit contributions
 chip-0.021
bank-0.021
rim-0.020
 bulldo-0.019
 ir-0.019
Token yellow
Token ID7872
Feature activation+1.72e-02

Pos logit contributions
 English+0.016
 closer+0.016
 clearly+0.015
 wonders+0.015
 Spanish+0.015
Neg logit contributions
bank-0.028
 chip-0.028
rim-0.027
ages-0.027
 ir-0.026
Token Anna
Token ID11735
Feature activation+7.26e-02

Pos logit contributions
 English+0.004
adies+0.004
 Spanish+0.004
 clearly+0.004
 closer+0.004
Neg logit contributions
 chip-0.006
bank-0.006
 burgers-0.006
ages-0.006
 tacos-0.006
Token.
Token ID13
Feature activation+2.21e-02

Pos logit contributions
adies+0.123
 English+0.122
 Spanish+0.119
 nearer+0.117
 closer+0.116
Neg logit contributions
 chip-0.097
ages-0.084
bank-0.084
 burgers-0.083
akers-0.082
Token She
Token ID1375
Feature activation+4.14e-02

Pos logit contributions
 nearer+0.038
eks+0.038
 closer+0.035
ieve+0.035
endant+0.034
Neg logit contributions
 chip-0.031
 coffee-0.024
 pitcher-0.023
 washing-0.023
 dough-0.023
Token wanted
Token ID2227
Feature activation+7.17e-02

Pos logit contributions
 English+0.065
adies+0.063
 Spanish+0.062
 closer+0.061
 clearly+0.060
Neg logit contributions
 chip-0.063
bank-0.058
 burgers-0.056
ages-0.055
 tacos-0.055
Token to
Token ID284
Feature activation+3.20e-02

Pos logit contributions
 English+0.103
adies+0.100
 Spanish+0.098
 clearly+0.096
 closer+0.095
Neg logit contributions
 chip-0.117
bank-0.110
 burgers-0.107
ages-0.103
 tacos-0.103
Token learn
Token ID2193
Feature activation+6.63e-02

Pos logit contributions
 English+0.037
 Spanish+0.035
adies+0.034
 closer+0.034
 clearly+0.033
Neg logit contributions
 chip-0.059
bank-0.058
 burgers-0.056
akers-0.055
ages-0.054
Token something
Token ID1223
Feature activation+1.05e-01

Pos logit contributions
adies+0.073
 English+0.070
itates+0.067
 Spanish+0.064
 clearly+0.064
Neg logit contributions
 chip-0.129
 burgers-0.120
bank-0.120
 ir-0.115
 tacos-0.115
Token new
Token ID649
Feature activation+8.43e-02

Pos logit contributions
 English+0.097
adies+0.096
 Spanish+0.088
 clearly+0.085
 closer+0.084
Neg logit contributions
 chip-0.225
bank-0.212
 burgers-0.209
 tacos-0.202
 ir-0.202
Token.
Token ID13
Feature activation+3.01e-02

Pos logit contributions
 English+0.110
 Spanish+0.104
adies+0.103
 clearly+0.102
 closer+0.101
Neg logit contributions
 chip-0.152
bank-0.142
 burgers-0.140
 tacos-0.135
 ir-0.133
Token So
Token ID1406
Feature activation+5.88e-03

Pos logit contributions
adies+0.050
eks+0.048
 nearer+0.046
 closer+0.046
 English+0.045
Neg logit contributions
 chip-0.045
bank-0.036
 Lem-0.033
 coffee-0.033
 tacos-0.032
Token one
Token ID530
Feature activation+7.38e-02

Pos logit contributions
 English+0.011
adies+0.011
 Spanish+0.010
 clearly+0.010
itates+0.010
Neg logit contributions
 chip-0.007
bank-0.006
 burgers-0.006
 ir-0.006
 tacos-0.006
Token out
Token ID503
Feature activation+4.21e-02

Pos logit contributions
 English+0.020
 closer+0.020
adies+0.019
 Spanish+0.018
 clearly+0.018
Neg logit contributions
 chip-0.030
bank-0.029
 burgers-0.028
 tacos-0.028
ages-0.027
Token of
Token ID286
Feature activation-4.31e-02

Pos logit contributions
 closer+0.048
 English+0.047
 clearly+0.045
adies+0.044
 closely+0.044
Neg logit contributions
 chip-0.088
bank-0.086
ages-0.081
 tacos-0.080
mons-0.079
Token the
Token ID262
Feature activation-8.33e-03

Pos logit contributions
ages+0.091
 chip+0.090
bank+0.088
mons+0.082
akers+0.082
Neg logit contributions
 closer-0.047
 English-0.044
 wonders-0.042
 clearly-0.041
adies-0.041
Token fireplace
Token ID44250
Feature activation-2.55e-02

Pos logit contributions
bank+0.015
 chip+0.015
ages+0.014
 tacos+0.014
rim+0.014
Neg logit contributions
 closer-0.010
 English-0.010
 wonders-0.009
 clearly-0.009
adies-0.009
Token to
Token ID284
Feature activation-3.64e-03

Pos logit contributions
rim+0.055
 chip+0.055
 ir+0.054
 tacos+0.054
 Swim+0.052
Neg logit contributions
 clearly-0.022
 closer-0.022
 closely-0.020
 English-0.019
bell-0.018
Token explore
Token ID7301
Feature activation+6.83e-02

Pos logit contributions
 chip+0.007
ages+0.007
 ir+0.007
akers+0.007
mons+0.007
Neg logit contributions
 closer-0.004
 clearly-0.003
 answer-0.003
adies-0.003
 One-0.003
Token."
Token ID526
Feature activation-5.08e-04

Pos logit contributions
 English+0.080
 clearly+0.079
 closer+0.078
adies+0.076
 Spanish+0.075
Neg logit contributions
 chip-0.133
bank-0.129
ages-0.125
 tacos-0.122
 ir-0.119
Token
Token ID198
Feature activation+6.50e-03

Pos logit contributions
akers+0.001
 chip+0.001
rim+0.001
 burgers+0.001
 ir+0.001
Neg logit contributions
 English-0.000
 One-0.000
 clearly-0.000
 Spanish-0.000
 closer-0.000
Token
Token ID198
Feature activation+2.10e-04

Pos logit contributions
 English+0.006
 Spanish+0.005
adies+0.005
 clearly+0.005
 closer+0.005
Neg logit contributions
 chip-0.014
bank-0.014
 burgers-0.014
 ir-0.013
 tacos-0.013
TokenThe
Token ID464
Feature activation-4.66e-02

Pos logit contributions
 English+0.000
One+0.000
 Spanish+0.000
 closer+0.000
 clearly+0.000
Neg logit contributions
bank-0.000
 burgers-0.000
 chip-0.000
 ir-0.000
 tacos-0.000
Token boy
Token ID2933
Feature activation-5.19e-02

Pos logit contributions
 chip+0.078
 burgers+0.070
bank+0.069
 tacos+0.066
 ir+0.066
Neg logit contributions
 English-0.068
adies-0.067
 Spanish-0.065
 clearly-0.064
 closer-0.061
Token Tim
Token ID5045
Feature activation-2.57e-02

Pos logit contributions
 English+0.003
 One+0.003
adies+0.003
 clearly+0.003
 Spanish+0.003
Neg logit contributions
 chip-0.003
 burgers-0.003
bank-0.003
akers-0.003
rim-0.003
Tokenmy
Token ID1820
Feature activation-9.11e-05

Pos logit contributions
 chip+0.046
bank+0.042
 ir+0.041
 burgers+0.041
ages+0.041
Neg logit contributions
adies-0.033
 English-0.033
 Spanish-0.031
 clearly-0.031
 closer-0.030
Token felt
Token ID2936
Feature activation+6.38e-03

Pos logit contributions
 chip+0.000
bank+0.000
 burgers+0.000
 tacos+0.000
 ir+0.000
Neg logit contributions
 English-0.000
 clearly-0.000
adies-0.000
 closer-0.000
 Spanish-0.000
Token guilty
Token ID6717
Feature activation-2.01e-02

Pos logit contributions
 English+0.009
adies+0.009
 Spanish+0.008
 clearly+0.007
 closer+0.007
Neg logit contributions
 chip-0.011
bank-0.010
 burgers-0.010
 tacos-0.010
ages-0.010
Token for
Token ID329
Feature activation-4.84e-02

Pos logit contributions
 chip+0.034
bank+0.033
 burgers+0.032
 tacos+0.031
rim+0.031
Neg logit contributions
 English-0.027
 clearly-0.026
 closer-0.026
 Spanish-0.025
adies-0.025
Token not
Token ID407
Feature activation+2.08e-02

Pos logit contributions
bank+0.072
ages+0.071
 chip+0.071
rim+0.067
akers+0.066
Neg logit contributions
 clearly-0.076
 English-0.073
 closely-0.072
 closer-0.071
 Spanish-0.068
Token listening
Token ID8680
Feature activation-3.19e-03

Pos logit contributions
 English+0.032
 Spanish+0.030
 closer+0.030
 answer+0.029
 reply+0.028
Neg logit contributions
 chip-0.033
bank-0.033
rim-0.031
stack-0.029
bers-0.029
Token to
Token ID284
Feature activation-1.78e-02

Pos logit contributions
 chip+0.007
rim+0.007
bank+0.007
stack+0.007
 ge+0.007
Neg logit contributions
 English-0.003
 clearly-0.003
 closer-0.003
 Spanish-0.003
 closely-0.003
Token his
Token ID465
Feature activation+3.22e-02

Pos logit contributions
ages+0.037
bank+0.037
 chip+0.037
mons+0.037
akers+0.036
Neg logit contributions
 English-0.018
 clearly-0.017
 Spanish-0.016
 answer-0.015
 closely-0.015
Token mom
Token ID1995
Feature activation+1.63e-02

Pos logit contributions
 English+0.041
adies+0.040
 Spanish+0.038
 clearly+0.037
 closer+0.036
Neg logit contributions
 chip-0.061
 burgers-0.055
bank-0.055
 tacos-0.053
ages-0.053
Token,
Token ID11
Feature activation+8.62e-03

Pos logit contributions
 closer+0.013
 closely+0.012
 clearly+0.012
 answer+0.011
 English+0.011
Neg logit contributions
rim-0.036
bank-0.035
 ge-0.035
 cho-0.035
 chip-0.035
Token at
Token ID379
Feature activation-1.44e-02

Pos logit contributions
adies+0.046
 English+0.046
 Spanish+0.043
 clearly+0.041
itates+0.041
Neg logit contributions
 chip-0.058
 burgers-0.057
bank-0.054
 ir-0.053
 tacos-0.053
Token her
Token ID607
Feature activation+4.42e-02

Pos logit contributions
 chip+0.023
 burgers+0.021
bank+0.021
 tacos+0.020
 ir+0.019
Neg logit contributions
 English-0.021
adies-0.021
 Spanish-0.020
itates-0.020
 clearly-0.020
Token seriously
Token ID6411
Feature activation-4.54e-03

Pos logit contributions
 English+0.051
 closely+0.050
 closer+0.050
 Spanish+0.049
 clearly+0.048
Neg logit contributions
bank-0.085
 chip-0.084
rim-0.080
 burgers-0.080
ages-0.078
Token.
Token ID13
Feature activation-9.33e-03

Pos logit contributions
 chip+0.010
bank+0.010
 tacos+0.009
 burgers+0.009
rim+0.009
Neg logit contributions
 closely-0.005
 closer-0.005
 clearly-0.005
 English-0.005
 Spanish-0.004
Token "
Token ID366
Feature activation-4.14e-02

Pos logit contributions
 chip+0.014
bank+0.013
 burgers+0.012
 tacos+0.012
ages+0.012
Neg logit contributions
 English-0.015
adies-0.015
 Spanish-0.014
 clearly-0.014
 closer-0.014
TokenI
Token ID40
Feature activation+1.73e-02

Pos logit contributions
 chip+0.080
bank+0.076
 burgers+0.074
ages+0.073
 bulldo+0.071
Neg logit contributions
 English-0.049
 Spanish-0.046
adies-0.045
 clearly-0.045
 closer-0.044
Token have
Token ID423
Feature activation-1.63e-02

Pos logit contributions
 English+0.020
adies+0.020
 closer+0.019
 clearly+0.019
 Spanish+0.019
Neg logit contributions
 chip-0.033
bank-0.031
 burgers-0.031
 tacos-0.029
ages-0.029
Token to
Token ID284
Feature activation-2.03e-03

Pos logit contributions
 chip+0.039
bank+0.037
rim+0.036
stack+0.035
cut+0.033
Neg logit contributions
 English-0.015
 Spanish-0.013
adies-0.013
 closer-0.012
 wonders-0.012
Token do
Token ID466
Feature activation-7.21e-04

Pos logit contributions
 chip+0.005
bank+0.004
bers+0.004
ages+0.004
mons+0.004
Neg logit contributions
 answer-0.002
 English-0.002
 reply-0.002
 name-0.002
 open-0.002
Token some
Token ID617
Feature activation+3.58e-02

Pos logit contributions
 chip+0.002
rim+0.002
bank+0.002
 bulldo+0.002
 ir+0.001
Neg logit contributions
 English-0.001
 clearly-0.001
 closely-0.001
 Spanish-0.001
 closer-0.001
Token work
Token ID670
Feature activation-1.27e-02

Pos logit contributions
 English+0.043
 Spanish+0.042
 answer+0.041
 closely+0.039
 closer+0.039
Neg logit contributions
bank-0.069
 chip-0.069
stack-0.064
keeping-0.064
ages-0.062
Token fountain
Token ID29420
Feature activation-2.62e-02

Pos logit contributions
 closer+0.012
 English+0.012
 clearly+0.011
itates+0.011
 Spanish+0.011
Neg logit contributions
 chip-0.007
ages-0.007
bank-0.007
 ir-0.006
akers-0.006
Token and
Token ID290
Feature activation+2.00e-02

Pos logit contributions
 chip+0.059
 ir+0.054
 ah+0.053
rim+0.052
ages+0.052
Neg logit contributions
 closer-0.024
 closely-0.023
 English-0.022
adies-0.022
 clearly-0.022
Token felt
Token ID2936
Feature activation+2.40e-03

Pos logit contributions
 closer+0.021
 clearly+0.020
 English+0.020
 wonders+0.019
adies+0.019
Neg logit contributions
 chip-0.043
rim-0.039
 ir-0.038
ages-0.038
akers-0.038
Token very
Token ID845
Feature activation-2.77e-02

Pos logit contributions
 English+0.003
adies+0.003
 Spanish+0.003
 closer+0.003
 clearly+0.003
Neg logit contributions
 chip-0.005
bank-0.004
 burgers-0.004
ages-0.004
 ir-0.004
Token sleepy
Token ID42023
Feature activation-2.93e-03

Pos logit contributions
 chip+0.056
bank+0.052
 burgers+0.051
ages+0.050
 ir+0.050
Neg logit contributions
 English-0.029
adies-0.028
 clearly-0.028
 closer-0.028
 Spanish-0.027
Token.
Token ID13
Feature activation+7.25e-03

Pos logit contributions
rim+0.007
 chip+0.007
uc+0.007
bank+0.007
ages+0.007
Neg logit contributions
 closer-0.002
 clearly-0.002
 closely-0.002
 answer-0.002
 English-0.002
Token He
Token ID679
Feature activation+2.68e-02

Pos logit contributions
 English+0.006
 Spanish+0.006
adies+0.006
 clearly+0.006
 One+0.006
Neg logit contributions
 chip-0.016
rim-0.015
 burgers-0.015
akers-0.015
bank-0.015
Token decided
Token ID3066
Feature activation+2.46e-02

Pos logit contributions
 English+0.042
adies+0.041
 Spanish+0.040
 clearly+0.038
itates+0.037
Neg logit contributions
 chip-0.040
bank-0.038
 burgers-0.037
 tacos-0.035
 ir-0.035
Token to
Token ID284
Feature activation-6.51e-03

Pos logit contributions
 English+0.030
 closer+0.030
adies+0.028
 clearly+0.028
 Spanish+0.027
Neg logit contributions
 chip-0.046
bank-0.043
rim-0.042
akers-0.042
 burgers-0.042
Token take
Token ID1011
Feature activation+1.38e-02

Pos logit contributions
 chip+0.013
bank+0.012
 burgers+0.012
ages+0.012
 ir+0.012
Neg logit contributions
 English-0.007
adies-0.007
 closer-0.007
 Spanish-0.007
 clearly-0.006
Token a
Token ID257
Feature activation+6.75e-02

Pos logit contributions
 English+0.016
 Spanish+0.015
 clearly+0.015
itates+0.015
adies+0.015
Neg logit contributions
 chip-0.026
 burgers-0.025
bank-0.024
 tacos-0.024
ages-0.024
Token playing
Token ID2712
Feature activation+1.36e-02

Pos logit contributions
 chip+0.036
bank+0.033
 burgers+0.032
ages+0.031
 ir+0.030
Neg logit contributions
 English-0.044
adies-0.044
 Spanish-0.042
 closer-0.041
 clearly-0.041
Token baseball
Token ID9283
Feature activation+4.12e-03

Pos logit contributions
 English+0.013
 closer+0.013
adies+0.013
 Spanish+0.012
 closely+0.012
Neg logit contributions
 chip-0.030
bank-0.026
ages-0.026
str-0.026
 ir-0.026
Token again
Token ID757
Feature activation+3.90e-02

Pos logit contributions
 closer+0.003
 English+0.003
 closely+0.003
adies+0.003
 diary+0.003
Neg logit contributions
 chip-0.009
 paramed-0.008
 Su-0.008
str-0.008
 hamb-0.008
Token.
Token ID13
Feature activation-4.95e-03

Pos logit contributions
 upon+0.026
 wonders+0.025
 English+0.025
 clearly+0.025
 closer+0.025
Neg logit contributions
 chip-0.095
 burgers-0.088
 paramed-0.088
 tacos-0.087
 bulldo-0.086
Token Danny
Token ID15105
Feature activation-3.36e-02

Pos logit contributions
 chip+0.010
 burgers+0.010
akers+0.010
rim+0.010
bank+0.010
Neg logit contributions
 English-0.005
 One-0.005
 Spanish-0.004
adies-0.004
 clearly-0.004
Token was
Token ID373
Feature activation-1.10e-02

Pos logit contributions
 chip+0.069
 burgers+0.065
rim+0.064
akers+0.064
keeping+0.063
Neg logit contributions
 clearly-0.036
 wonders-0.035
 closely-0.034
 English-0.033
 nods-0.032
Token proud
Token ID6613
Feature activation-1.40e-02

Pos logit contributions
 chip+0.022
bank+0.021
 ir+0.021
mons+0.020
 burgers+0.020
Neg logit contributions
 closer-0.013
 clearly-0.012
 English-0.012
 Spanish-0.012
 closely-0.011
Token of
Token ID286
Feature activation-5.91e-02

Pos logit contributions
bank+0.032
ages+0.031
keeping+0.031
 chip+0.031
stack+0.030
Neg logit contributions
 clearly-0.012
 English-0.011
 closer-0.011
 closely-0.011
 wonders-0.010
Token himself
Token ID2241
Feature activation+4.81e-02

Pos logit contributions
 chip+0.131
ages+0.128
bank+0.123
 burgers+0.122
akers+0.122
Neg logit contributions
 English-0.058
 clearly-0.056
 Spanish-0.055
 closely-0.051
 closer-0.050
Token for
Token ID329
Feature activation-6.34e-02

Pos logit contributions
 closely+0.048
 closer+0.044
 English+0.038
 clearly+0.038
 Spanish+0.037
Neg logit contributions
rim-0.106
bank-0.105
 ge-0.098
mons-0.098
keeping-0.097
Token being
Token ID852
Feature activation-4.86e-02

Pos logit contributions
ages+0.128
mons+0.124
 chip+0.123
bank+0.122
str+0.120
Neg logit contributions
 English-0.072
 clearly-0.070
 Spanish-0.068
 what-0.068
 closer-0.065
Token to
Token ID284
Feature activation+2.28e-02

Pos logit contributions
 English+0.072
 Spanish+0.068
 closer+0.065
 hello+0.060
 clearly+0.059
Neg logit contributions
bank-0.160
rim-0.155
ages-0.155
akers-0.152
 chip-0.148
Token play
Token ID711
Feature activation+7.82e-02

Pos logit contributions
 English+0.026
adies+0.024
 Spanish+0.024
 closer+0.023
 clearly+0.023
Neg logit contributions
 chip-0.045
bank-0.043
 burgers-0.041
ages-0.040
akers-0.040
Token with
Token ID351
Feature activation+5.56e-02

Pos logit contributions
 English+0.103
adies+0.099
 Spanish+0.097
 closer+0.096
 clearly+0.095
Neg logit contributions
 chip-0.137
bank-0.130
 burgers-0.125
ages-0.122
 tacos-0.121
Token her
Token ID607
Feature activation+9.35e-02

Pos logit contributions
 English+0.042
 Spanish+0.038
 clearly+0.036
adies+0.035
 closer+0.035
Neg logit contributions
 chip-0.129
bank-0.129
 burgers-0.122
ages-0.122
 tacos-0.118
Token mom
Token ID1995
Feature activation+4.01e-02

Pos logit contributions
 English+0.122
 Spanish+0.115
 closer+0.113
 diary+0.112
 closely+0.109
Neg logit contributions
bank-0.168
 chip-0.162
 burgers-0.155
ages-0.152
rim-0.151
Token's
Token ID338
Feature activation+1.85e-02

Pos logit contributions
 English+0.032
 Spanish+0.031
 One+0.030
 closer+0.029
 closely+0.028
Neg logit contributions
bank-0.092
rim-0.089
 chip-0.088
 bulldo-0.087
ages-0.086
Token purse
Token ID27517
Feature activation+1.70e-02

Pos logit contributions
 English+0.026
 Spanish+0.025
 closer+0.024
adies+0.024
 diary+0.024
Neg logit contributions
 chip-0.031
bank-0.030
 burgers-0.028
ages-0.027
 tacos-0.027
Token.
Token ID13
Feature activation+1.97e-02

Pos logit contributions
 English+0.018
 Spanish+0.017
 closer+0.017
 clearly+0.017
adies+0.017
Neg logit contributions
 chip-0.033
bank-0.033
 burgers-0.032
rim-0.031
 tacos-0.031
Token She
Token ID1375
Feature activation+3.89e-02

Pos logit contributions
adies+0.014
 English+0.013
 closer+0.012
eks+0.012
 clearly+0.012
Neg logit contributions
 chip-0.047
bank-0.044
 burgers-0.043
 ir-0.042
 tacos-0.042
Token liked
Token ID8288
Feature activation+7.34e-02

Pos logit contributions
 English+0.057
adies+0.056
 Spanish+0.054
 clearly+0.054
 closer+0.053
Neg logit contributions
 chip-0.062
bank-0.058
 burgers-0.057
 tacos-0.054
ages-0.054
Token to
Token ID284
Feature activation+1.84e-02

Pos logit contributions
 English+0.065
 Spanish+0.061
 closer+0.057
 clearly+0.053
 closely+0.053
Neg logit contributions
rim-0.162
bank-0.160
ages-0.158
akers-0.155
 hyd-0.151
Token not
Token ID407
Feature activation+7.74e-03

Pos logit contributions
 chip+0.066
bank+0.063
 ir+0.058
 bulldo+0.056
ages+0.055
Neg logit contributions
 closer-0.060
 clearly-0.060
adies-0.059
 English-0.059
 closely-0.057
Token looking
Token ID2045
Feature activation+2.14e-02

Pos logit contributions
 closer+0.010
 clearly+0.009
 English+0.009
adies+0.009
 upon+0.009
Neg logit contributions
 chip-0.015
bank-0.015
stack-0.013
rim-0.013
 bulldo-0.013
Token.
Token ID13
Feature activation+2.01e-03

Pos logit contributions
 closer+0.023
 English+0.022
 clearly+0.022
adies+0.021
 closely+0.021
Neg logit contributions
 chip-0.045
bank-0.042
rim-0.041
 ir-0.040
ages-0.039
Token
Token ID198
Feature activation+1.79e-02

Pos logit contributions
 English+0.002
adies+0.002
 Spanish+0.002
 clearly+0.002
 closer+0.002
Neg logit contributions
 chip-0.004
bank-0.004
 burgers-0.004
ages-0.004
 ir-0.004
Token
Token ID198
Feature activation+9.39e-03

Pos logit contributions
adies+0.024
 English+0.023
 Spanish+0.022
 clearly+0.021
 closer+0.021
Neg logit contributions
 chip-0.031
bank-0.029
 burgers-0.028
ages-0.027
 bulldo-0.027
Token"
Token ID1
Feature activation-4.77e-02

Pos logit contributions
 English+0.012
adies+0.011
 Spanish+0.011
 closer+0.011
 clearly+0.011
Neg logit contributions
 chip-0.017
bank-0.016
 burgers-0.016
 tacos-0.016
 ir-0.016
TokenHey
Token ID10814
Feature activation+3.57e-02

Pos logit contributions
 chip+0.090
bank+0.087
 burgers+0.083
ages+0.081
 tacos+0.081
Neg logit contributions
 English-0.057
adies-0.055
 Spanish-0.053
 closer-0.052
 clearly-0.052
Token,
Token ID11
Feature activation-1.84e-02

Pos logit contributions
 English+0.064
 Spanish+0.062
 clearly+0.061
itates+0.060
adies+0.060
Neg logit contributions
 chip-0.048
 burgers-0.042
bank-0.040
 tacos-0.039
 hamb-0.039
Token that
Token ID326
Feature activation-5.04e-02

Pos logit contributions
 chip+0.035
bank+0.034
 burgers+0.033
ages+0.032
 ir+0.032
Neg logit contributions
 English-0.021
adies-0.020
 closer-0.019
 Spanish-0.019
 clearly-0.019
Token's
Token ID338
Feature activation-5.18e-02

Pos logit contributions
bank+0.101
keeping+0.099
akers+0.099
rim+0.098
 ir+0.098
Neg logit contributions
 closer-0.045
 English-0.044
adies-0.043
 name-0.040
 clearly-0.040
Token our
Token ID674
Feature activation-7.15e-02

Pos logit contributions
 chip+0.088
bank+0.082
 burgers+0.080
 tacos+0.077
ages+0.077
Neg logit contributions
 English-0.072
adies-0.070
 Spanish-0.068
 clearly-0.066
 closer-0.066
Token
Token ID198
Feature activation+1.59e-02

Pos logit contributions
adies+0.033
 English+0.032
 diary+0.031
 Spanish+0.030
 clearly+0.029
Neg logit contributions
 chip-0.032
bank-0.030
 burgers-0.027
ages-0.025
 bulldo-0.025
TokenOne
Token ID3198
Feature activation+4.61e-02

Pos logit contributions
 English+0.006
adies+0.005
 Spanish+0.005
 clearly+0.004
 closer+0.004
Neg logit contributions
 chip-0.043
bank-0.041
 burgers-0.040
 tacos-0.039
ages-0.039
Token day
Token ID1110
Feature activation-5.77e-02

Pos logit contributions
 English+0.088
adies+0.086
 closer+0.085
 Spanish+0.085
 clearly+0.084
Neg logit contributions
 chip-0.052
bank-0.050
 burgers-0.046
 ir-0.045
ages-0.044
Token,
Token ID11
Feature activation-5.61e-03

Pos logit contributions
 chip+0.094
bank+0.091
 burgers+0.086
 ir+0.084
ages+0.083
Neg logit contributions
 English-0.082
adies-0.081
 closer-0.078
 Spanish-0.077
 clearly-0.075
Token the
Token ID262
Feature activation-2.52e-02

Pos logit contributions
bank+0.011
rim+0.010
 chip+0.010
keeping+0.010
ages+0.010
Neg logit contributions
 closer-0.007
 English-0.007
adies-0.006
 Spanish-0.006
 hello-0.006
Token dad
Token ID9955
Feature activation-2.26e-02

Pos logit contributions
 chip+0.040
 burgers+0.036
bank+0.036
 tacos+0.034
 ir+0.034
Neg logit contributions
 English-0.039
adies-0.038
 Spanish-0.037
 clearly-0.036
 closer-0.035
Token wanted
Token ID2227
Feature activation+5.34e-02

Pos logit contributions
bank+0.040
 chip+0.040
keeping+0.036
 burgers+0.036
ages+0.036
Neg logit contributions
adies-0.027
 closer-0.027
 English-0.026
 wonders-0.025
bell-0.025
Token to
Token ID284
Feature activation+1.70e-02

Pos logit contributions
 closer+0.045
 English+0.040
 One+0.037
 wonders+0.036
 another+0.036
Neg logit contributions
akers-0.117
rim-0.117
ages-0.114
bank-0.114
drops-0.114
Token do
Token ID466
Feature activation+1.39e-02

Pos logit contributions
 answer+0.019
 name+0.018
 closer+0.018
 One+0.018
 clearly+0.017
Neg logit contributions
bers-0.032
bank-0.032
ages-0.031
 chip-0.031
 ir-0.031
Token something
Token ID1223
Feature activation+9.24e-02

Pos logit contributions
 English+0.007
 clearly+0.007
 closer+0.007
 closely+0.007
 Spanish+0.006
Neg logit contributions
 chip-0.034
rim-0.033
akers-0.033
bank-0.032
 ir-0.032
Token special
Token ID2041
Feature activation+3.39e-02

Pos logit contributions
 English+0.100
 closer+0.095
adies+0.094
 Spanish+0.093
 clearly+0.093
Neg logit contributions
 chip-0.184
bank-0.175
 burgers-0.168
ages-0.166
 ir-0.166
Token store
Token ID3650
Feature activation-1.04e-02

Pos logit contributions
rim+0.009
ages+0.008
bank+0.008
 ir+0.008
uc+0.008
Neg logit contributions
 diary-0.006
 closer-0.005
 Crystal-0.005
 English-0.005
 name-0.005
Token,
Token ID11
Feature activation+1.54e-02

Pos logit contributions
rim+0.026
 ah+0.025
rob+0.025
 tacos+0.024
 bulldo+0.024
Neg logit contributions
 English-0.005
bell-0.005
 One-0.005
 closer-0.005
 upon-0.004
Token but
Token ID475
Feature activation-1.71e-02

Pos logit contributions
 closer+0.013
 clearly+0.013
 English+0.012
 upon+0.012
 closely+0.012
Neg logit contributions
rim-0.034
rob-0.034
bank-0.034
ages-0.033
uc-0.032
Token when
Token ID618
Feature activation-1.36e-02

Pos logit contributions
rim+0.040
bank+0.040
ages+0.039
akers+0.038
 chip+0.037
Neg logit contributions
 closer-0.014
 English-0.013
 clearly-0.012
 closely-0.011
 Spanish-0.011
Token Lily
Token ID20037
Feature activation-1.68e-02

Pos logit contributions
bank+0.028
rim+0.028
ages+0.027
 chip+0.027
akers+0.026
Neg logit contributions
 English-0.015
 closer-0.014
 Spanish-0.013
 One-0.013
 clearly-0.013
Token took
Token ID1718
Feature activation-3.75e-02

Pos logit contributions
 chip+0.030
 burgers+0.029
bank+0.029
rim+0.027
 tacos+0.027
Neg logit contributions
bell-0.020
 clearly-0.020
 closer-0.019
 One-0.019
 English-0.019
Token a
Token ID257
Feature activation+4.24e-02

Pos logit contributions
 chip+0.080
bank+0.079
rim+0.076
 burgers+0.074
ages+0.074
Neg logit contributions
 English-0.035
 closer-0.033
 Spanish-0.033
adies-0.031
 clearly-0.030
Token bite
Token ID13197
Feature activation+1.17e-02

Pos logit contributions
 English+0.054
 closer+0.054
 Spanish+0.051
adies+0.049
 closely+0.048
Neg logit contributions
 chip-0.074
bank-0.072
 burgers-0.069
ages-0.069
 ir-0.068
Token,
Token ID11
Feature activation+1.03e-03

Pos logit contributions
 closer+0.011
 One+0.009
 clearly+0.009
 She+0.008
 closely+0.008
Neg logit contributions
bank-0.026
 bulldo-0.025
akers-0.025
keeping-0.025
rim-0.024
Token she
Token ID673
Feature activation-1.12e-02

Pos logit contributions
 English+0.001
 closer+0.001
 Spanish+0.001
 clearly+0.001
adies+0.001
Neg logit contributions
bank-0.002
 chip-0.002
rim-0.002
 burgers-0.002
ages-0.002
Token made
Token ID925
Feature activation-5.42e-02

Pos logit contributions
 chip+0.019
bank+0.019
 burgers+0.018
keeping+0.018
 ir+0.017
Neg logit contributions
 clearly-0.015
 closer-0.014
 closely-0.014
 English-0.014
 wonders-0.014
Token some
Token ID617
Feature activation+1.98e-02

Pos logit contributions
 English+0.017
adies+0.015
 closer+0.015
 Spanish+0.014
 clearly+0.013
Neg logit contributions
bank-0.053
 chip-0.052
rim-0.050
ages-0.049
 burgers-0.049
Token chickens
Token ID25972
Feature activation-4.21e-02

Pos logit contributions
 English+0.025
 hello+0.025
adies+0.024
 closer+0.024
 Spanish+0.024
Neg logit contributions
bank-0.036
 chip-0.035
rim-0.032
keeping-0.032
 lin-0.031
Token.
Token ID13
Feature activation+6.51e-03

Pos logit contributions
 chip+0.104
 lin+0.094
 burgers+0.093
 tacos+0.093
rim+0.093
Neg logit contributions
 closer-0.027
 English-0.025
 name-0.024
 hello-0.024
 answer-0.024
Token The
Token ID383
Feature activation-1.99e-02

Pos logit contributions
 English+0.009
adies+0.009
 Spanish+0.009
 clearly+0.009
 closer+0.008
Neg logit contributions
 chip-0.011
bank-0.010
 burgers-0.010
 tacos-0.010
ages-0.009
Token animals
Token ID4695
Feature activation-1.03e-02

Pos logit contributions
bank+0.030
 chip+0.028
 ir+0.027
keeping+0.026
rim+0.026
Neg logit contributions
 English-0.031
adies-0.030
 closer-0.029
 Spanish-0.029
 name-0.028
Token were
Token ID547
Feature activation-2.28e-02

Pos logit contributions
 chip+0.019
bank+0.017
 burgers+0.017
 tacos+0.016
 ir+0.016
Neg logit contributions
 English-0.013
adies-0.013
 closer-0.013
 clearly-0.013
 Spanish-0.012
Token very
Token ID845
Feature activation-1.33e-02

Pos logit contributions
 chip+0.043
bank+0.043
 ir+0.041
 tacos+0.040
 burgers+0.040
Neg logit contributions
 English-0.026
 closer-0.024
 Spanish-0.024
 clearly-0.023
adies-0.023
Token happy
Token ID3772
Feature activation+1.97e-02

Pos logit contributions
 chip+0.026
bank+0.025
 burgers+0.024
 tacos+0.024
 ir+0.024
Neg logit contributions
 English-0.014
adies-0.014
 Spanish-0.013
 closer-0.013
 clearly-0.013
Token living
Token ID2877
Feature activation-7.29e-02

Pos logit contributions
 English+0.023
adies+0.022
 closer+0.022
 clearly+0.022
 Spanish+0.021
Neg logit contributions
 chip-0.037
bank-0.035
 burgers-0.034
 ir-0.033
 tacos-0.033
Token on
Token ID319
Feature activation-3.97e-02

Pos logit contributions
 chip+0.150
bank+0.138
 bulldo+0.131
 ir+0.128
 tacos+0.126
Neg logit contributions
 English-0.081
 closer-0.080
adies-0.078
 wonders-0.078
 clearly-0.077
Token the
Token ID262
Feature activation+9.11e-04

Pos logit contributions
akers+0.077
 ir+0.075
bank+0.074
rim+0.073
ages+0.072
Neg logit contributions
 English-0.047
 Spanish-0.043
 One-0.043
 hello-0.042
One-0.040
Token
Token ID198
Feature activation+8.01e-03

Pos logit contributions
adies+0.022
 English+0.022
 Spanish+0.021
 outsider+0.020
 closer+0.020
Neg logit contributions
 chip-0.023
bank-0.020
ages-0.020
 burgers-0.019
 bulldo-0.019
Token"
Token ID1
Feature activation-5.41e-02

Pos logit contributions
 English+0.012
adies+0.012
 Spanish+0.012
 clearly+0.011
 closer+0.011
Neg logit contributions
 chip-0.012
bank-0.011
ages-0.011
 burgers-0.011
akers-0.010
TokenOh
Token ID5812
Feature activation-4.47e-02

Pos logit contributions
 chip+0.097
bank+0.092
 burgers+0.090
 tacos+0.087
ages+0.086
Neg logit contributions
 English-0.069
adies-0.066
 Spanish-0.065
 clearly-0.064
 closer-0.064
Token,
Token ID11
Feature activation-4.32e-02

Pos logit contributions
 chip+0.070
bank+0.065
 burgers+0.063
 tacos+0.060
ages+0.060
Neg logit contributions
 English-0.068
adies-0.066
 Spanish-0.065
 clearly-0.063
 closer-0.063
Token honey
Token ID12498
Feature activation-5.18e-02

Pos logit contributions
 chip+0.082
bank+0.081
 burgers+0.076
ages+0.076
 ir+0.075
Neg logit contributions
 English-0.049
adies-0.048
 closer-0.046
 Spanish-0.045
 clearly-0.045
Token,
Token ID11
Feature activation-2.78e-02

Pos logit contributions
 ir+0.101
 burgers+0.099
keeping+0.096
 tacos+0.096
bank+0.096
Neg logit contributions
adies-0.049
 answer-0.046
 closer-0.045
 name-0.045
 shrug-0.044
Token I
Token ID314
Feature activation-7.06e-03

Pos logit contributions
bank+0.059
 chip+0.055
rim+0.054
ages+0.054
 ir+0.053
Neg logit contributions
adies-0.026
 English-0.026
 closer-0.026
 hello-0.026
 clearly-0.024
Token'm
Token ID1101
Feature activation-3.27e-02

Pos logit contributions
 chip+0.011
bank+0.011
 burgers+0.010
ages+0.010
 tacos+0.010
Neg logit contributions
 English-0.011
adies-0.010
 Spanish-0.010
 clearly-0.010
 closer-0.010
Token sorry
Token ID7926
Feature activation+2.39e-02

Pos logit contributions
 chip+0.070
bank+0.070
 ir+0.066
ages+0.065
 burgers+0.065
Neg logit contributions
 clearly-0.029
 English-0.029
 closer-0.028
adies-0.028
 Spanish-0.027
Token to
Token ID284
Feature activation+1.68e-03

Pos logit contributions
 English+0.040
adies+0.039
 Spanish+0.038
 clearly+0.037
itates+0.036
Neg logit contributions
 chip-0.035
 burgers-0.031
bank-0.031
ages-0.029
 tacos-0.029
Token hear
Token ID3285
Feature activation+5.84e-02

Pos logit contributions
 English+0.002
 clearly+0.002
 closer+0.002
 Spanish+0.002
adies+0.002
Neg logit contributions
bank-0.003
ages-0.003
 chip-0.003
 burgers-0.003
akers-0.003
Tokenkeeper
Token ID13884
Feature activation-1.29e-02

Pos logit contributions
 burgers+0.136
bank+0.136
 ir+0.135
 tacos+0.134
rim+0.134
Neg logit contributions
 closer-0.057
 English-0.055
bell-0.054
 clearly-0.052
 Spanish-0.050
Token told
Token ID1297
Feature activation-2.17e-02

Pos logit contributions
keeping+0.025
bank+0.025
 burgers+0.024
ages+0.024
 tacos+0.024
Neg logit contributions
 closer-0.014
 clearly-0.014
 closely-0.013
bell-0.013
 curiously-0.013
Token Louise
Token ID31774
Feature activation+2.51e-02

Pos logit contributions
 chip+0.042
bank+0.039
 burgers+0.038
 tacos+0.037
ages+0.037
Neg logit contributions
 English-0.022
adies-0.022
itates-0.022
 Spanish-0.021
 clearly-0.021
Token that
Token ID326
Feature activation-3.64e-02

Pos logit contributions
 English+0.031
 closer+0.029
 clearly+0.029
 Spanish+0.028
 closely+0.028
Neg logit contributions
 chip-0.046
rim-0.043
 ir-0.043
 burgers-0.043
bank-0.043
Token the
Token ID262
Feature activation-2.60e-02

Pos logit contributions
akers+0.077
ages+0.075
bank+0.075
rim+0.075
uc+0.072
Neg logit contributions
 English-0.035
 clearly-0.031
 Spanish-0.031
 closer-0.030
 wonders-0.029
Token oil
Token ID3056
Feature activation-1.28e-01

Pos logit contributions
rim+0.044
bers+0.044
keeping+0.044
bank+0.043
ams+0.041
Neg logit contributions
 name-0.036
 answer-0.035
 One-0.033
 closer-0.032
 diary-0.031
Token had
Token ID550
Feature activation-5.70e-02

Pos logit contributions
 chip+0.249
rim+0.248
akers+0.246
 hamb+0.244
 ge+0.239
Neg logit contributions
 closer-0.107
 answer-0.101
 cannot-0.101
 clearly-0.100
 name-0.100
Token run
Token ID1057
Feature activation-1.04e-02

Pos logit contributions
 chip+0.110
bank+0.106
rim+0.104
bers+0.101
stack+0.100
Neg logit contributions
 English-0.065
 closer-0.061
 clearly-0.061
 answer-0.058
 Spanish-0.058
Token out
Token ID503
Feature activation+2.75e-02

Pos logit contributions
 chip+0.022
rim+0.022
 tacos+0.021
 hamb+0.021
 burgers+0.020
Neg logit contributions
 closer-0.011
 English-0.010
 Spanish-0.010
 closely-0.009
 clearly-0.009
Token very
Token ID845
Feature activation-2.53e-02

Pos logit contributions
 closer+0.026
 English+0.026
 clearly+0.024
 Spanish+0.024
 upon+0.022
Neg logit contributions
 chip-0.065
bank-0.061
 tacos-0.060
ages-0.058
mons-0.058
Token quickly
Token ID2952
Feature activation+1.86e-02

Pos logit contributions
 chip+0.056
bank+0.054
rim+0.053
 burgers+0.053
 hamb+0.052
Neg logit contributions
 closer-0.023
 closely-0.023
 clearly-0.022
 English-0.021
 Spanish-0.019
Token rod
Token ID15299
Feature activation-1.35e-02

Pos logit contributions
 Su+0.057
Su+0.057
 paramed+0.055
 ir+0.054
akers+0.052
Neg logit contributions
 name-0.055
 closer-0.052
 answer-0.047
 closely-0.044
 English-0.044
Token.
Token ID13
Feature activation+3.41e-03

Pos logit contributions
 chip+0.031
rim+0.031
akers+0.030
 ir+0.030
 paramed+0.029
Neg logit contributions
 closer-0.010
 closely-0.009
 clearly-0.008
 English-0.008
 answer-0.007
Token He
Token ID679
Feature activation+2.14e-02

Pos logit contributions
 English+0.003
adies+0.003
 Spanish+0.003
 clearly+0.003
 closer+0.003
Neg logit contributions
 chip-0.007
bank-0.007
 burgers-0.007
rim-0.007
akers-0.007
Token cast
Token ID3350
Feature activation-1.95e-02

Pos logit contributions
 closer+0.027
 wonders+0.024
 closely+0.024
 clearly+0.023
 English+0.021
Neg logit contributions
keeping-0.039
 chip-0.038
rim-0.037
 ir-0.036
bank-0.036
Token the
Token ID262
Feature activation-9.62e-03

Pos logit contributions
 chip+0.043
bank+0.042
rim+0.042
 burgers+0.040
 ir+0.039
Neg logit contributions
 English-0.018
 closer-0.018
 clearly-0.017
 Spanish-0.016
adies-0.016
Token line
Token ID1627
Feature activation-1.08e-01

Pos logit contributions
keeping+0.018
Su+0.017
rim+0.017
 Su+0.017
 paramed+0.016
Neg logit contributions
 closer-0.014
 name-0.013
 answer-0.012
 nearer-0.011
 wonders-0.011
Token in
Token ID287
Feature activation-1.83e-02

Pos logit contributions
rim+0.247
 chip+0.245
akers+0.234
 ir+0.233
 tacos+0.229
Neg logit contributions
 closer-0.114
 closely-0.101
 clearly-0.090
 nearer-0.080
adies-0.079
Token the
Token ID262
Feature activation-1.42e-02

Pos logit contributions
ages+0.046
rim+0.044
mons+0.044
akers+0.044
drops+0.042
Neg logit contributions
 closer-0.014
 English-0.014
 Spanish-0.013
 One-0.013
 wonders-0.013
Token water
Token ID1660
Feature activation-2.50e-02

Pos logit contributions
keeping+0.030
rim+0.029
 ir+0.028
akers+0.028
Su+0.028
Neg logit contributions
 closer-0.016
 name-0.014
 open-0.013
 wonders-0.013
 another-0.012
Token,
Token ID11
Feature activation+1.24e-03

Pos logit contributions
rob+0.064
 ir+0.063
akers+0.063
rim+0.062
 chip+0.062
Neg logit contributions
 closer-0.015
 closely-0.013
 clearly-0.012
 open-0.009
 answer-0.009
Token and
Token ID290
Feature activation+1.09e-02

Pos logit contributions
 closer+0.002
 closely+0.001
 clearly+0.001
 wonders+0.001
 open+0.001
Neg logit contributions
rim-0.003
uc-0.003
akers-0.003
keeping-0.003
ages-0.003
Token would
Token ID561
Feature activation-2.67e-02

Pos logit contributions
 chip+0.020
 burgers+0.019
bank+0.019
 tacos+0.018
akers+0.018
Neg logit contributions
 clearly-0.016
 English-0.015
 closer-0.015
 wonders-0.014
 Spanish-0.014
Token work
Token ID670
Feature activation-2.85e-02

Pos logit contributions
 chip+0.051
 burgers+0.050
 tacos+0.049
ages+0.049
bank+0.048
Neg logit contributions
 closer-0.031
 clearly-0.031
 English-0.030
 Spanish-0.028
 answer-0.028
Token,
Token ID11
Feature activation+4.19e-03

Pos logit contributions
 chip+0.060
 burgers+0.057
 tacos+0.056
bank+0.056
rim+0.056
Neg logit contributions
 clearly-0.030
 English-0.030
 closer-0.030
 Spanish-0.028
 closely-0.027
Token but
Token ID475
Feature activation-2.33e-02

Pos logit contributions
 clearly+0.005
 closer+0.005
 English+0.004
 closely+0.004
 Spanish+0.004
Neg logit contributions
rim-0.009
 chip-0.009
akers-0.009
uc-0.009
ages-0.008
Token the
Token ID262
Feature activation-4.77e-02

Pos logit contributions
rim+0.049
akers+0.049
 chip+0.048
ages+0.048
bank+0.047
Neg logit contributions
 closer-0.025
 clearly-0.024
 English-0.023
 closely-0.022
 Spanish-0.021
Token therm
Token ID21969
Feature activation-1.13e-01

Pos logit contributions
bank+0.083
rim+0.082
ages+0.081
akers+0.079
 chip+0.078
Neg logit contributions
 English-0.062
 closer-0.061
 Spanish-0.058
 diary-0.058
 name-0.056
Tokenometer
Token ID15635
Feature activation-9.33e-02

Pos logit contributions
 chip+0.183
 burgers+0.180
bank+0.177
 tacos+0.173
 ir+0.169
Neg logit contributions
 English-0.152
 clearly-0.148
adies-0.146
itates-0.145
 closer-0.143
Token still
Token ID991
Feature activation-3.52e-02

Pos logit contributions
 chip+0.177
 burgers+0.176
rim+0.174
akers+0.170
 tacos+0.169
Neg logit contributions
 clearly-0.106
 closer-0.100
bell-0.097
 wonders-0.092
 shrug-0.091
Token couldn
Token ID3521
Feature activation-2.39e-02

Pos logit contributions
 chip+0.062
bank+0.058
 burgers+0.057
rim+0.057
 tacos+0.056
Neg logit contributions
 clearly-0.049
 closer-0.047
 English-0.046
 wonders-0.043
 closely-0.043
Token't
Token ID470
Feature activation+2.01e-02

Pos logit contributions
 chip+0.052
 burgers+0.051
bank+0.051
 tacos+0.050
akers+0.050
Neg logit contributions
 English-0.019
 clearly-0.018
adies-0.018
 closer-0.017
 Spanish-0.017
Token show
Token ID905
Feature activation+7.04e-03

Pos logit contributions
 English+0.024
 clearly+0.023
 closer+0.023
adies+0.023
 Spanish+0.022
Neg logit contributions
 chip-0.037
bank-0.035
 burgers-0.035
 tacos-0.034
ages-0.034
Tokens
Token ID82
Feature activation-2.78e-02

Pos logit contributions
 chip+0.068
 burgers+0.064
bank+0.063
 tacos+0.062
 ir+0.061
Neg logit contributions
 English-0.050
itates-0.050
 clearly-0.049
 upon-0.048
 Spanish-0.048
Token wrong
Token ID2642
Feature activation-2.80e-02

Pos logit contributions
 chip+0.068
bank+0.064
ages+0.062
 burgers+0.061
 ir+0.061
Neg logit contributions
 English-0.020
adies-0.019
 Spanish-0.019
 closer-0.017
 clearly-0.016
Token,
Token ID11
Feature activation-7.50e-03

Pos logit contributions
bank+0.072
 chip+0.071
ages+0.071
 burgers+0.071
akers+0.070
Neg logit contributions
 English-0.013
 Spanish-0.011
 name-0.010
 closer-0.009
 clearly-0.007
Token rh
Token ID9529
Feature activation-3.11e-02

Pos logit contributions
bank+0.015
ages+0.014
keeping+0.014
uc+0.014
akers+0.013
Neg logit contributions
 English-0.009
 Spanish-0.009
 name-0.008
 hello-0.008
 clearly-0.008
Tokenin
Token ID259
Feature activation-2.50e-02

Pos logit contributions
 chip+0.063
bank+0.062
 burgers+0.060
ages+0.058
 ir+0.057
Neg logit contributions
 English-0.033
 Spanish-0.030
adies-0.030
 clearly-0.029
 closer-0.028
Tokenoc
Token ID420
Feature activation-1.22e-01

Pos logit contributions
keeping+0.068
 chip+0.066
 ir+0.065
mons+0.064
bank+0.064
Neg logit contributions
 English-0.009
 closer-0.007
 Spanish-0.007
 name-0.006
 shrug-0.006
Tokeneros
Token ID27498
Feature activation+7.21e-03

Pos logit contributions
 burgers+0.203
 pitcher+0.199
 chip+0.198
 lin+0.197
bank+0.195
Neg logit contributions
adies-0.148
 English-0.142
 Spanish-0.136
 hello-0.127
 clearly-0.127
Token?
Token ID30
Feature activation-1.39e-03

Pos logit contributions
 English+0.002
 closer+0.001
 clearly+0.001
 hello+0.001
adies+0.001
Neg logit contributions
 chip-0.020
 ir-0.019
keeping-0.019
bank-0.019
ages-0.019
Tokenâ
Token ID22940
Feature activation-8.10e-02

Pos logit contributions
 chip+0.003
bank+0.003
 burgers+0.003
 tacos+0.003
ages+0.003
Neg logit contributions
 English-0.002
 Spanish-0.001
adies-0.001
 One-0.001
 clearly-0.001
Token
Token ID26391
Feature activation-1.53e-02

Pos logit contributions
 chip+0.141
 burgers+0.139
 ir+0.130
 ge+0.128
 tacos+0.128
Neg logit contributions
adies-0.099
 English-0.098
 Spanish-0.095
 clearly-0.094
 One-0.092
Token said
Token ID531
Feature activation-2.54e-02

Pos logit contributions
 chip+0.029
 burgers+0.028
 ir+0.027
bank+0.027
 tacos+0.027
Neg logit contributions
 English-0.017
adies-0.016
 Spanish-0.016
 clearly-0.016
itates-0.015
Token But
Token ID887
Feature activation+2.00e-04

Pos logit contributions
 chip+0.003
bank+0.003
 burgers+0.003
rim+0.003
 ir+0.003
Neg logit contributions
 English-0.002
 Spanish-0.002
adies-0.002
 clearly-0.002
 closer-0.002
Token they
Token ID484
Feature activation-6.25e-02

Pos logit contributions
 closer+0.000
 English+0.000
 clearly+0.000
 closely+0.000
 Spanish+0.000
Neg logit contributions
bank-0.000
rim-0.000
ages-0.000
akers-0.000
 chip-0.000
Token see
Token ID766
Feature activation+2.88e-02

Pos logit contributions
 chip+0.125
bank+0.119
 burgers+0.115
 ir+0.113
ages+0.112
Neg logit contributions
 English-0.066
adies-0.064
 clearly-0.062
 closer-0.062
 Spanish-0.061
Token another
Token ID1194
Feature activation+2.40e-02

Pos logit contributions
 English+0.033
adies+0.031
 Spanish+0.030
 clearly+0.029
 closer+0.029
Neg logit contributions
 chip-0.057
bank-0.054
 burgers-0.052
ages-0.051
 ir-0.051
Token long
Token ID890
Feature activation-6.26e-02

Pos logit contributions
 English+0.032
 closer+0.032
 Spanish+0.030
 wonders+0.029
 answer+0.029
Neg logit contributions
bank-0.042
ages-0.042
 chip-0.041
uc-0.040
rim-0.040
Token line
Token ID1627
Feature activation-1.05e-01

Pos logit contributions
ages+0.116
rim+0.112
uc+0.110
akers+0.110
 chip+0.110
Neg logit contributions
 English-0.085
 Spanish-0.077
 answer-0.076
 shrug-0.076
 closer-0.073
Token of
Token ID286
Feature activation-5.94e-02

Pos logit contributions
 chip+0.220
bank+0.200
 ir+0.195
 burgers+0.194
ages+0.194
Neg logit contributions
adies-0.124
 English-0.124
 closer-0.117
 Spanish-0.112
 clearly-0.110
Token kids
Token ID3988
Feature activation-2.66e-02

Pos logit contributions
bank+0.129
ages+0.127
rim+0.122
stack+0.121
 hyd+0.120
Neg logit contributions
 English-0.065
 Spanish-0.065
 closer-0.062
 wonders-0.060
 answer-0.058
Token waiting
Token ID4953
Feature activation-4.21e-02

Pos logit contributions
 chip+0.057
bank+0.051
 burgers+0.050
ages+0.050
 ir+0.050
Neg logit contributions
 closer-0.028
 English-0.028
 clearly-0.027
adies-0.027
 closely-0.027
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
 chip+0.090
bank+0.084
ages+0.079
 bulldo+0.078
rim+0.077
Neg logit contributions
 closer-0.049
 English-0.047
 clearly-0.045
adies-0.045
 Spanish-0.043
Token They
Token ID1119
Feature activation-2.56e-02

Pos logit contributions
 chip+0.017
bank+0.016
 burgers+0.015
 tacos+0.015
 ir+0.015
Neg logit contributions
 English-0.014
adies-0.014
 Spanish-0.013
 clearly-0.013
 closer-0.013