by rahimnathwani 503 days ago
Hi Jerry,

How well does llamaparse work on foreign-language documents?

I have a pipeline for Arabic-language docs that uses Azure for OCR and GPT-4o-mini to extract structured information. Would it be worth trying llamaparse to replace part of the pipeline, or the whole thing?


Yes! We have foreign-language support for better OCR on scans. Here are some more details. Docs: https://docs.cloud.llamaindex.ai/llamaparse/features/parsing... Notebook: https://github.com/run-llama/llama_parse/blob/main/examples/...
What is disable_ocr=True for? Is it for documents that already have a text layer, that you don't want to OCR again?
Yeah, disable_ocr is for documents where you don't need to OCR a scanned image; it'll just parse out the text.

Parsing is faster when it's set to True.
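
A rough sketch of that decision, in case it helps: skip OCR only when the document already has a text layer. The helper name `choose_parse_options` and the `language` key are assumptions made up for illustration; only `disable_ocr` comes from this thread, and whether these keys map directly onto the parser's arguments should be checked against the docs linked above.

```python
def choose_parse_options(has_text_layer: bool, language: str = "en") -> dict:
    """Hypothetical helper: pick per-document parse options.

    `disable_ocr` is the flag discussed in the thread; `language` is an
    assumed option name for foreign-language OCR, not confirmed here.
    """
    return {
        "language": language,
        # Skip OCR only when the document already carries extractable text;
        # per the thread, parsing is faster with OCR disabled.
        "disable_ocr": has_text_layer,
    }


# Scanned Arabic document: no text layer, so OCR stays on.
scanned = choose_parse_options(has_text_layer=False, language="ar")

# Digitally produced PDF: text layer present, so OCR can be skipped.
digital = choose_parse_options(has_text_layer=True, language="ar")
```

For a pipeline like the Arabic one described above, scanned inputs would keep OCR enabled, so the speedup only applies to files that already contain text.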