by fraboniface 643 days ago
Not exactly. They use a small but long-context model that holds the whole dataset (or a large part of it) in its context and generates the chunks as elements of its reply, then passes those to the final model. So the retrieval itself is different: there is no embedding model or vector DB.
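
To make that flow concrete, here is a rough Python sketch of how I read it. This is not their actual code: `small_model` and `final_model` are hypothetical callables that take a prompt string and return text, and the prompt wording and the `---` separator are my own assumptions.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to generated text.
Model = Callable[[str], str]

SEPARATOR = "\n---\n"  # assumed delimiter between chunks in the small model's reply


def retrieve_chunks(question: str, documents: List[str], small_model: Model) -> List[str]:
    """Retrieval step: put the whole dataset in the small model's context
    and let it write out the chunks itself. No embeddings, no vector DB."""
    corpus = "\n\n".join(f"[doc {i}]\n{doc}" for i, doc in enumerate(documents))
    prompt = (
        f"{corpus}\n\n"
        f"Question: {question}\n"
        "Copy out the passages needed to answer, separated by lines containing only ---."
    )
    reply = small_model(prompt)
    # The chunks are simply elements of the generated reply.
    return [chunk.strip() for chunk in reply.split(SEPARATOR) if chunk.strip()]


def answer(question: str, documents: List[str], small_model: Model, final_model: Model) -> str:
    """Generation step: the final model only sees the chunks, not the dataset."""
    chunks = retrieve_chunks(question, documents, small_model)
    context = "\n\n".join(chunks)
    return final_model(f"Context:\n{context}\n\nQuestion: {question}")
```

The point is that the "index" is just the small model's context window, so the dataset (or the part you include) has to fit in it.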