by binalpatel
700 days ago
You can do some really cool things with these models now, like asking them to extract not just the text but also figures/graphs as nodes/edges, and it works very well. Back when GPT-4 with vision came out I tried this with a simple prompt plus a dumped-in pydantic schema of what I wanted, and it was spot on. It was pretty much this (before JSON mode was supported):

You are an expert in PDFs. You are helping a user extract text from a PDF.
Extract the text from the image as a structured json output.
Extract the data using the following schema:
{Page.model_json_schema()}
Example:
{{
"title": "Title",
"page_number": 1,
"sections": [
...
],
"figures": [
...
]
}}
https://binal.pub/2023/12/structured-ocr-with-gpt-vision/
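
For anyone who wants to try the prompt above, here is a rough, dependency-free sketch of how it might be assembled and how a free-text reply could be parsed without JSON mode. It is not the code from the linked post. `PAGE_SCHEMA` is a hand-written stand-in for `Page.model_json_schema()`, with field types guessed from the example in the prompt, and `build_prompt` and `parse_reply` are hypothetical helper names. The call to the vision model itself is left out.

```python
import json

# Assumption: a hand-written stand-in for Page.model_json_schema().
# Field names come from the example in the prompt; the types are guesses.
PAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "page_number": {"type": "integer"},
        "sections": {"type": "array"},
        "figures": {"type": "array"},
    },
    "required": ["title", "page_number", "sections", "figures"],
}


def build_prompt(schema: dict) -> str:
    """Assemble the prompt text; {{ and }} render as literal braces in an f-string."""
    return f"""You are an expert in PDFs. You are helping a user extract text from a PDF.
Extract the text from the image as a structured json output.
Extract the data using the following schema:
{json.dumps(schema, indent=2)}
Example:
{{
"title": "Title",
"page_number": 1,
"sections": [
...
],
"figures": [
...
]
}}"""


def parse_reply(reply: str, schema: dict) -> dict:
    """Pull the outermost JSON object out of a free-text reply and check required keys."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    page = json.loads(reply[start:end + 1])
    missing = [key for key in schema["required"] if key not in page]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return page
```

Slicing from the first `{` to the last `}` is a crude way to cope with a model that wraps its JSON in extra chatter. With pydantic available, validating the parsed dict against the real `Page` model would be the more thorough check.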