Hacker News
by lolpanda 845 days ago
I think LlamaParse is trying to solve a hard problem. Many enterprise customers I know have a strong need to parse PDF files and extract data accurately. That said, I found the interface a bit confusing. From your blog post, LlamaParse can extract numbers in tables, but it appears that the output isn't provided in tabular format. Instead, those numbers seem to be accessible only through a question-answering interface. Is this accurate?

The output is either text or markdown, and from there you can handle it however you need.

In LlamaIndex, for example, there are a few markdown-specific classes that work well with this.

You can find an example over in the repo -- https://github.com/run-llama/llama_parse/blob/main/examples/...
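
If you'd rather handle the markdown yourself, here is a minimal sketch in plain Python. It assumes the parsed output contains a simple pipe-delimited markdown table. The sample text and the `markdown_table_to_rows` helper are made up for illustration and are not part of LlamaParse or LlamaIndex.

```python
import json


def markdown_table_to_rows(markdown: str) -> list[dict[str, str]]:
    """Turn the first pipe-delimited markdown table in `markdown` into a list of dicts.

    Hypothetical helper for illustration only. It does not handle escaped
    pipes, empty edge cells, or tables without leading pipes.
    """
    # Collect the contiguous run of lines that look like table rows.
    table_lines = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            table_lines.append(stripped)
        elif table_lines:
            break  # the first table has ended

    # A valid table needs at least a header row and a separator row.
    if len(table_lines) < 2:
        return []

    def split_row(row: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in row.strip("|").split("|")]

    header = split_row(table_lines[0])
    # table_lines[1] is the |---|---| separator, so data starts at index 2.
    return [dict(zip(header, split_row(row))) for row in table_lines[2:]]


# Made-up sample standing in for parsed markdown output.
sample = """\
# Invoice

| item | qty | price |
|------|-----|-------|
| Widget | 2 | 9.99 |
| Gadget | 1 | 24.50 |
"""

rows = markdown_table_to_rows(sample)
print(json.dumps(rows, indent=2))
```

All values come back as strings, so converting quantities or prices to numbers is left to the caller.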

I was hoping to get structured data. For example, parsing an invoice would give results like {"title": ..., "line_items": [...], "date": ...}.