Hacker News
by PeterStuer 1046 days ago
Yes. I think the point is that while the price per token for creating the embeddings using e.g. OpenAI's text-embedding-ada-002 API might be low, it will add up to a significant cost for a large document corpus. The suggestion to roll your own based on freely available embedding models is sound IMHO.
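
To make the "adds up" part concrete, here is a rough back-of-the-envelope sketch. The function name and all the numbers are placeholders I made up for illustration, not quoted prices; plug in the current rate and your own corpus stats.

```python
def embedding_cost(num_docs, avg_tokens_per_doc, price_per_1k_tokens):
    """Estimate the cost of embedding a corpus once with a per-token-priced API."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1000 * price_per_1k_tokens


# Hypothetical corpus and a placeholder price (NOT an actual quoted rate).
cost = embedding_cost(num_docs=1_000_000, avg_tokens_per_doc=2_000,
                      price_per_1k_tokens=0.0001)
print(f"one full pass: ~${cost:,.2f}")
```

Keep in mind that this is per pass: every time you change the chunking strategy or switch models, you pay for the whole corpus again.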

How to chunk those documents into semantically coherent pieces for context retrieval is the real challenge, though.
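
For reference, the naive baseline most people start from looks something like the sketch below: pack whole sentences into chunks up to a word budget. It only respects sentence boundaries, so it is not semantic chunking at all, which is exactly the problem. `chunk_text` and `max_words` are names I picked for illustration, and the regex sentence splitter is a crude assumption that will trip on abbreviations.

```python
import re


def chunk_text(text, max_words=200):
    """Greedily pack whole sentences into chunks of at most max_words words."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        # A single sentence longer than max_words still becomes its own chunk.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


for chunk in chunk_text("One two three. Four five six. Seven eight.", max_words=6):
    print(repr(chunk))
```

Nothing here knows whether two adjacent sentences belong to the same topic, so a chunk boundary can land in the middle of an argument.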