Hacker News
by airstrike 238 days ago
> because it's more likely to literally interpret that sequence and go into the latent space of drama

This always makes me wonder if saying some seemingly random string of tokens would make the model better at some other task

petrichor fliegengitter azúcar Einstein mare könyv vantablack добро حلم syncretic まつり nyumba fjäril parrot

I think I'll start every chat with that combo and see if it makes any difference
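
For anyone who wants to try this with more than eyeballing, here's a rough sketch of an A/B comparison. Everything in it is an assumption: `ask_model` is a hypothetical stand-in for whatever chat API you use, `score` is a hypothetical metric for the task you care about, and `compare_prefix` is just a name I made up.

```python
import statistics
from typing import Callable, Sequence

# The token string from the comment above, used as the prefix under test.
PREFIX = (
    "petrichor fliegengitter azúcar Einstein mare könyv vantablack "
    "добро حلم syncretic まつり nyumba fjäril parrot"
)


def compare_prefix(
    prompts: Sequence[str],
    ask_model: Callable[[str], str],     # hypothetical: sends a prompt, returns the reply
    score: Callable[[str, str], float],  # hypothetical: rates a reply for a given prompt
    prefix: str = PREFIX,
) -> dict:
    """Score every prompt with and without the prefix and report the means."""
    baseline = [score(p, ask_model(p)) for p in prompts]
    prefixed = [score(p, ask_model(f"{prefix}\n\n{p}")) for p in prompts]
    # Per-prompt difference: positive means the prefix helped on that prompt.
    diffs = [b - a for a, b in zip(baseline, prefixed)]
    return {
        "baseline_mean": statistics.mean(baseline),
        "prefixed_mean": statistics.mean(prefixed),
        "mean_diff": statistics.mean(diffs),
    }
```

With a nondeterministic model you'd want plenty of prompts and repeated runs before reading anything into `mean_diff`.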

2 comments

There’s actually research being done in this space that you might find interesting: “attention sinks” https://arxiv.org/abs/2503.08908

No Free Lunch theorem applies here!