by zxt_tzx
473 days ago
> Have you looked into chunking (breaking input into smaller chunks and doing vector search on the chunks)?

Ohh, I had not seriously considered this until reading this. I could have multiple embeddings per issue and search across those embeddings, and if the same issue is matched multiple times, I would probably take the strongest match and dedupe it. I could create embeddings for comments too and search across those. Thanks for the suggestion, it would be a good thing to try!

> Choosing a chunking strategy seems to be a deep rabbit hole of its own.

Yes, this is true. In my case, I think the metadata fields like Title and Labels are probably doing a lot of the work (which would be duplicated across chunks?) and, within an issue body, off the top of my head, I can't see any intuitive ways to chunk it. I have heard that for standard RAG, chunking goes a surprisingly long way!
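
To make the "multiple embeddings per issue, keep the strongest match, dedupe" idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the `Chunk` type, the `search_issues` name, and the brute-force cosine scan are hypothetical stand-ins, not how the project actually stores or queries embeddings, and the vectors are assumed to come from whatever embedding model is already in use.

```python
import math
from typing import NamedTuple


class Chunk(NamedTuple):
    issue_id: int        # hypothetical ID of the parent issue
    text: str            # one piece of the issue body, or one comment
    vector: list[float]  # embedding of `text` (model not specified here)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_issues(query_vec, chunks, top_k=5):
    """Score every chunk, then keep only the best-scoring chunk per issue."""
    best = {}  # issue_id -> (score, text of the strongest chunk)
    for chunk in chunks:
        score = cosine(query_vec, chunk.vector)
        if chunk.issue_id not in best or score > best[chunk.issue_id][0]:
            best[chunk.issue_id] = (score, chunk.text)
    # Rank issues by their strongest chunk, highest similarity first.
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(issue_id, score, text) for issue_id, (score, text) in ranked[:top_k]]
```

In a real setup the scan would presumably be replaced by the vector index's own nearest-neighbour query (over-fetching chunk hits, then collapsing by issue), but the dedupe step would look much the same. Taking the max per issue is only one choice; summing or averaging the top few chunk scores is another option worth comparing.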