by fbrncci 1147 days ago
Pretty easy, take a look at the LangChain tutorials on YouTube. Basically you take a set of documents, split them into smaller chunks, create embeddings for those chunks (OpenAI, Jina, etc.) and store them in a vector database. Then, when you query GPT-3 or GPT-4 through OpenAI, you retrieve the chunks whose embeddings are closest to the question and pass them along, so the model produces an answer based on the document set (or something very near to it). It takes some practice, but after a few repetitions you could code this together from scratch within 5-10 minutes. This channel on YouTube taught me in less than 2 days:

- https://www.youtube.com/@DataIndependent
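
If it helps to see the flow before watching anything, here is a rough, library-free sketch of the split / embed / store / retrieve steps. Big caveats: `embed` is just word counts standing in for a real embedding model (OpenAI, Jina, etc.), `VectorStore` is a plain in-memory list standing in for a real vector database, and every name in it (`split_text`, `VectorStore`, `build_prompt`) is made up for illustration, not LangChain's API. It stops at building the prompt; sending it to GPT is left out.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=40, overlap=10):
    """Split text into chunks of up to chunk_size words, sharing `overlap` words."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
        start += chunk_size - overlap
    return chunks


def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (chunk_text, embedding) pairs

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the chat model."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


docs = [
    "Refunds are issued within 14 days of purchase when the item is unused.",
    "Shipping takes three to five business days within the country.",
]

store = VectorStore()
for doc in docs:
    for chunk in split_text(doc):
        store.add(chunk)

question = "How long does shipping take?"
top = store.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

With real embeddings the retrieval step matches on meaning rather than exact shared words, which is the main thing this toy version cannot show.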

1 comments

Thank you! It was an eye-opener for me. We've been using a slightly different approach (at https://jopilot.net), but a vector database + LangChain allows processing a much larger amount of data.
No problem! You could probably improve it by fine-tuning GPT models on different categories of documents before doing the vector retrieval from embeddings. Fine-tuning isn't available for GPT-3.5-turbo or GPT-4 yet, so I am waiting to try out this hybrid approach when it does become available.