Hacker News
by visarga 1182 days ago
This approach has some issues:

- it chunks inputs with some overlap, but a chunk boundary can still cut a passage off from the context it needs

- the retrieved passages, when they come from different documents, may have no real relation to each other, yet the model can mistakenly treat them as related

- the model struggles to correlate data across the document snippets, taking half an idea from one and half from another and mixing them into something that doesn't really make sense
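To make the first point concrete, here is a minimal sketch of fixed-size chunking with overlap. The `chunk_text` helper, the window sizes, and the sample sentence are all made up for illustration; real pipelines usually split on tokens or sentences rather than whitespace words.

```python
def chunk_text(text, size, overlap):
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


# Hypothetical sample text: the caveat lives in the second sentence.
text = ("The drug is safe for adults. However, it must never be "
        "given to children under five.")
chunks = chunk_text(text, size=8, overlap=2)

for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

With these settings the first chunk says the drug is safe but stops before "must never", so a retriever that returns only that chunk hands the model the claim without its caveat. The overlap repeats a couple of words at each boundary; it does not guarantee that a complete thought stays in one piece.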

1 comment

Those are implementation details. Check out what's going on with LangChain, retrieval-augmented generation, etc. We'll be able to build knowledge bases on specific subjects from vetted data, and have the bot retrieve and summarize the relevant results while citing the original source.
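A rough sketch of the retrieve-then-cite idea, in plain Python rather than LangChain's actual API. Everything here is an assumption for illustration: the `Passage` type, the file names, the word-overlap scoring (a real system would likely use embeddings), and the prompt wording.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical citation label, e.g. a file name
    text: str


def tokenize(s):
    """Lowercase and split on whitespace, dropping basic punctuation."""
    return set(s.lower().replace(".", " ").replace(",", " ").split())


def retrieve(query, passages, k=2):
    """Return up to k passages that share the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in passages]
    scored = [sp for sp in scored if sp[0] > 0]  # drop passages with no overlap
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query, hits):
    """Number each passage and keep its source so the answer can cite it."""
    lines = [f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(hits, 1)]
    header = "Answer using only the sources below and cite them by number."
    return "\n".join([header, *lines, f"Question: {query}"])


# Hypothetical vetted knowledge base.
kb = [
    Passage("handbook.txt", "Aspirin can reduce fever in adults."),
    Passage("pediatrics.txt", "Aspirin should not be given to children with fever."),
    Passage("cooking.txt", "Simmer the sauce for ten minutes."),
]

query = "Should children be given aspirin"
hits = retrieve(query, kb)
prompt = build_prompt(query, hits)
print(prompt)
```

Carrying the source label next to each passage is what lets the bot cite where a statement came from. It does not by itself stop the model from blending the two medical passages, which is the concern raised in the parent comment.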