Ask HN: Way to extract relevant parts from a PDF based on a question?
1 point
by madhatter999
780 days ago
Dear HN, I am trying to do semantic search over a corpus of PDF documents, with a question as input. My goal is to find the parts of the PDFs that best answer the question. I am interested in concepts, frameworks, and methodologies that would help with this task. If you have any pointers, I would greatly appreciate them!

You can convert all the paragraphs in your document into vectors (embeddings), convert your question into a vector the same way, and then find, e.g., the 10 closest vectors, or all those that fall within a certain maximum distance.
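
A rough sketch of that idea in Python, in case it helps. Big assumption: `embed` here is just a toy bag-of-words counter so the example runs with the standard library only; in practice you would swap in a real embedding model. The names `embed`, `cosine_distance`, and `search` are made up for illustration, and extracting paragraphs from the PDF is assumed to have happened already.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means no overlap."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def search(question, paragraphs, k=10, max_distance=None):
    """Return up to k (distance, paragraph) pairs, closest first."""
    q = embed(question)
    scored = sorted((cosine_distance(q, embed(p)), p) for p in paragraphs)
    if max_distance is not None:
        # Optional cutoff: keep only paragraphs within the distance threshold.
        scored = [(d, p) for d, p in scored if d <= max_distance]
    return scored[:k]


paragraphs = [
    "The warranty covers battery defects for two years.",
    "Shipping takes five business days.",
    "Returns are accepted within thirty days.",
]
for distance, paragraph in search("How long is the battery warranty?", paragraphs, k=2):
    print(round(distance, 3), paragraph)
```

The word-count vectors only match on shared words, so this is keyword overlap rather than real semantic search; the top-k and max-distance logic stays the same once `embed` returns vectors from an actual model.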
You can store the embeddings in a vector database to search across multiple documents.
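
To illustrate the multi-document part, here is a tiny in-memory stand-in for a vector database. It is only a sketch under assumptions: `InMemoryVectorStore`, `DIM`, and the file names are hypothetical, the "embedding" is a hashed word-count vector rather than a real model, and an actual vector database would add persistence and faster approximate search on top of the same idea.

```python
import math
import re
import zlib

DIM = 1024  # hypothetical vector size for the toy embedding


def embed(text):
    """Toy stand-in for an embedding model: hashed word counts, unit length."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class InMemoryVectorStore:
    """Keeps (doc_id, paragraph, vector) rows and searches them by brute force."""

    def __init__(self):
        self.rows = []

    def add(self, doc_id, paragraph):
        self.rows.append((doc_id, paragraph, embed(paragraph)))

    def query(self, question, k=10):
        """Return up to k (similarity, doc_id, paragraph) tuples, best first."""
        q = embed(question)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id, paragraph)
            for doc_id, paragraph, vec in self.rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("manual.pdf", "The warranty covers battery defects for two years.")
store.add("faq.pdf", "Shipping takes five business days.")
for similarity, doc_id, paragraph in store.query("battery warranty", k=1):
    print(doc_id, paragraph)
```

Storing the document ID next to each vector is what lets a single query return hits from any PDF in the corpus along with where they came from.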