HN Mirror



	by stoicjumbotron 747 days ago
	What would be a good chunking strategy for Q&A pairs? Right now I'm embedding the complete question and answer together, but the query results are poor: the responses contain data that isn't related to the question at all.

2 comments

laborcontract 747 days ago

Join the question and answer when vectorizing them. Questions alone are too short to carry much semantic richness.

When you get a query, run two semantic searches: one using the original question and one using a HyDE (Hypothetical Document Embeddings) version of the question. Take both sets of results and run them through Cohere’s Rerank.
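
A minimal sketch of that flow, under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_llm` stands in for the LLM call that would write the hypothetical answer, and `placeholder_rerank` stands in for a hosted reranker. None of these names are real library APIs, and the sample pairs are invented.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(pairs):
    # Each pair is embedded as one joined "question + answer" text.
    return {pid: embed(q + " " + a) for pid, (q, a) in pairs.items()}

def search(index, text, k):
    # Return the ids of the k entries most similar to the text.
    vec = embed(text)
    return sorted(index, key=lambda pid: cosine(vec, index[pid]), reverse=True)[:k]

def placeholder_rerank(index, query, candidates):
    # Hypothetical stand-in for a reranker: rescoring with the same toy similarity.
    qv = embed(query)
    return sorted(candidates, key=lambda pid: cosine(qv, index[pid]), reverse=True)

def retrieve(index, query, write_hypothetical_answer, k=2):
    hits = search(index, query, k)                              # search 1: original question
    hits += search(index, write_hypothetical_answer(query), k)  # search 2: HyDE-style text
    candidates = list(dict.fromkeys(hits))                      # merge, drop duplicates, keep order
    return placeholder_rerank(index, query, candidates)

# Invented sample data for illustration only.
pairs = {
    "p1": ("How do I reset my password?", "Open Settings, choose Security, then click Reset."),
    "p2": ("What is the refund window?", "Refunds are accepted within 30 days of purchase."),
    "p3": ("How do I change my email?", "Open Settings, choose Profile, then edit Email."),
}
index = build_index(pairs)
fake_llm = lambda q: "Go to Settings, then Security, and click Reset."  # would be an LLM call
top = retrieve(index, "I forgot my password, how can I reset it?", fake_llm)
print(top)
```

The hypothetical answer is worded like the stored answers, so the second search can surface pairs that the short question alone would miss. In a real setup the final sort would be replaced by the reranker's own scores.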


stoicjumbotron 746 days ago

Yes, right now I'm vectorizing them as a pair. I then query Pinecone with an embedding of the exact question, but the results still don't include the actual Q&A pair.

I'm not familiar with HyDE. I'll check it out. Thanks for the suggestion.


harpastum 747 days ago

It depends on the specifics of your format, but we’ve had success embedding the questions and answers separately. If either one matches, you return the complete question and answer text. Make sure to deduplicate before returning, in case both match.
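
A rough sketch of the separate-embedding idea, again with a toy bag-of-words `embed` standing in for a real embedding model. The helper names (`build_split_index`, `lookup`) and the sample pairs are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_split_index(pairs):
    # One entry per question and one per answer, each pointing back to its pair id.
    entries = []
    for pid, (question, answer) in pairs.items():
        entries.append((pid, embed(question)))
        entries.append((pid, embed(answer)))
    return entries

def lookup(pairs, entries, query, k=3):
    qv = embed(query)
    ranked = sorted(entries, key=lambda e: cosine(qv, e[1]), reverse=True)[:k]
    # A pair can match on both its question and its answer: keep it once, best rank first.
    unique_ids = dict.fromkeys(pid for pid, _ in ranked)
    return [pairs[pid] for pid in unique_ids]

# Invented sample data for illustration only.
pairs = {
    "p1": ("How do I reset my password?", "Open Settings, choose Security, then click Reset."),
    "p2": ("What is the refund window?", "Refunds are accepted within 30 days of purchase."),
    "p3": ("How do I change my email?", "Open Settings, choose Profile, then edit Email."),
}
entries = build_split_index(pairs)
results = lookup(pairs, entries, "How do I reset my password?", k=3)
for question, answer in results:
    print(question, "->", answer)
```

Here the first pair matches on both its question and its answer, so the top three hits collapse to two pairs after deduplication, and each result still carries the full question and answer text.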


stoicjumbotron 746 days ago

I'll try embedding them separately as well and figure it out from there. Thanks for the suggestion.
