Hacker News
by zxt_tzx 473 days ago
> Have you looked into chunking (breaking input into smaller chunks and doing vector search on the chunks)?

Ohh, I had not seriously considered this until reading your comment. I could have multiple embeddings per issue and search across all of them; if the same issue is matched multiple times, I would probably take the strongest match and dedupe it.
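A minimal sketch of that idea, assuming each chunk's embedding has already been computed by some embedding model and that a brute-force scan stands in for a real vector index. The names `search_issues` and `chunk_index` and the toy 2-D vectors are hypothetical. Every chunk is scored against the query, and each issue keeps only its best-scoring chunk:

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_issues(
    query: Vector,
    chunk_index: List[Tuple[str, Vector]],  # hypothetical: (issue_id, chunk embedding) pairs
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every chunk, then keep only the strongest match per issue."""
    best: Dict[str, float] = {}
    for issue_id, vec in chunk_index:
        score = cosine(query, vec)
        # Dedupe: an issue matched by several chunks keeps its highest score.
        if issue_id not in best or score > best[issue_id]:
            best[issue_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy example: issue-1 has two chunks, issue-2 has one.
index = [
    ("issue-1", [1.0, 0.0]),
    ("issue-1", [0.6, 0.8]),
    ("issue-2", [0.0, 1.0]),
]
results = search_issues([1.0, 0.0], index)
print(results)  # [('issue-1', 1.0), ('issue-2', 0.0)]
```

Taking the max is only one way to aggregate chunk scores; averaging or summing the top few chunk scores per issue are common alternatives if a single strong chunk turns out to be too noisy a signal.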

I could create embeddings for comments too and search across those.

Thanks for the suggestion, it would be a good thing to try!

> Choosing a chunking strategy seems to be a deep rabbit hole of its own.

Yes, this is true. In my case, I think metadata fields like Title and Labels are probably doing a lot of the work (would they have to be duplicated across chunks?), and off the top of my head I can't see any intuitive way to chunk the issue body itself.
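One possible approach, sketched under assumptions rather than anything settled in this thread: split the body on blank lines, pack paragraphs into chunks up to a character budget, and repeat Title and Labels at the top of every chunk so each embedding keeps that context. `chunk_issue` and `max_chars` are hypothetical names, and paragraph-based splitting is just one of many chunking strategies:

```python
from typing import List


def chunk_issue(title: str, labels: List[str], body: str, max_chars: int = 500) -> List[str]:
    """Split an issue body on blank lines, pack paragraphs into chunks of at most
    max_chars (a single oversized paragraph is kept whole), and prefix every
    chunk with the issue's title and labels."""
    header = f"Title: {title}\nLabels: {', '.join(labels)}\n\n"
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    # An empty body still yields one chunk holding only the metadata.
    if not chunks:
        return [header.rstrip()]
    # Duplicate the metadata into every chunk.
    return [header + chunk for chunk in chunks]


chunks = chunk_issue(
    "Crash on startup",
    ["bug", "ui"],
    "First paragraph.\n\nSecond paragraph.",
    max_chars=20,
)
for chunk in chunks:
    print(chunk)
    print("---")
```

Duplicating the metadata costs a few extra tokens per chunk, but it means a query that mostly matches the title can still hit whichever chunk is closest in the body.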

I have heard that for standard RAG, chunking goes a surprisingly long way!