
by sergiotapia 1146 days ago

1. Extract the text of the source material. Example: extract_pdf_text(pdf_path_string)

2. Split the text into segments of 3000 characters each.

3. Generate an embedding for every text segment and save the embeddings to PineconeDB. Make sure you also save the raw text segment as metadata alongside each embedding so you can retrieve it later. https://platform.openai.com/docs/guides/embeddings/what-are-...

4. Capture the user's question and generate an embedding for it using the same OpenAI embeddings API.

5. Query your PineconeDB with the question's embedding. You will get the closest-matching text segments back.

6. Use the raw text of these matches as context when calling the OpenAI ChatGPT API endpoint. Example:

    Using this context answer this question: **QUESTION**

    **CONTEXT**
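A minimal sketch of steps 2 through 6, in case it helps to see the flow end to end. To keep it runnable without API keys, it makes some loud assumptions: `toy_embed` is a word-count stand-in for the OpenAI embeddings call, a plain in-memory list stands in for PineconeDB, and every function name here is hypothetical. Step 1 (PDF extraction) and the final ChatGPT request are left out.

```python
import math
import re
from collections import Counter

CHUNK_SIZE = 3000  # characters per segment, as in step 2


def segment_text(text, size=CHUNK_SIZE):
    """Step 2: split text into consecutive segments of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(text):
    """ASSUMPTION: stand-in for a real embeddings API call (steps 3 and 4).
    Returns a sparse word-count vector instead of a dense embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(segments):
    """Step 3: store each embedding together with its raw text segment.
    ASSUMPTION: a list of dicts stands in for the vector database."""
    return [{"embedding": toy_embed(s), "text": s} for s in segments]


def query_index(index, question, top_k=2):
    """Steps 4-5: embed the question and return the top_k most similar segments."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item["embedding"]), reverse=True)
    return [item["text"] for item in ranked[:top_k]]


def build_prompt(question, matches):
    """Step 6: fill the prompt template with the question and the matched text."""
    context = "\n\n".join(matches)
    return f"Using this context answer this question: {question}\n\n{context}"


# Tiny usage example with three hand-written "segments".
docs = [
    "The warranty lasts two years.",
    "Shipping takes five business days.",
    "Returns are accepted within thirty days.",
]
index = build_index(docs)
question = "How long is the warranty?"
matches = query_index(index, question, top_k=1)
prompt = build_prompt(question, matches)
print(prompt)
```

In a real setup you would swap `toy_embed` for the embeddings API and `build_index`/`query_index` for the vector database client, then send `prompt` to the chat endpoint. The overall shape stays the same.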

1 comment

Thank you for the detailed explanation!