by diarrhea
486 days ago
I’m curious about the async aspect of this. I was under the impression that PDF processing like OCR is purely CPU bound. OS file I/O interfaces are sync, so async does not help there. With the GIL, Python is effectively single-threaded, so I can’t see how async improves performance for the PDF use case. Only parallelism helps; concurrency doesn’t. When would it yield back to the event loop while it’s busy number crunching?

It's both. The OCR part is ofc CPU bound, but the full text extraction also involves reading files, or writing and then reading them.
Without async, these calls simply block the event loop.
As for efficiency: if you're working in an async application context, you have to "asyncify" these operations or suffer the consequences.
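
To make "asyncify" concrete, here is a rough sketch of one way to do it with the standard library, not how any particular PDF library does it. `read_bytes_blocking`, `cpu_bound_ocr`, `extract_text` and `extract_many` are hypothetical names, and `cpu_bound_ocr` is just a stand-in for real OCR. The blocking file read is pushed to a worker thread with `asyncio.to_thread`, and the number crunching goes to whatever executor the caller passes in, so the event loop gets control back at each `await`.

```python
import asyncio
from concurrent.futures import Executor
from pathlib import Path


def read_bytes_blocking(path: str) -> bytes:
    # Ordinary synchronous file read; calling this directly in a
    # coroutine would stall the event loop until the OS returns.
    return Path(path).read_bytes()


def cpu_bound_ocr(data: bytes) -> str:
    # Hypothetical stand-in for OCR: pure computation that never yields.
    checksum = sum(data) % 65536
    return f"{len(data)} bytes, checksum {checksum}"


async def extract_text(path: str, cpu_pool: Executor) -> str:
    # File I/O runs in a worker thread, so other coroutines keep running.
    data = await asyncio.to_thread(read_bytes_blocking, path)
    # CPU-bound work runs in the supplied executor; a ProcessPoolExecutor
    # gives real parallelism because each process has its own GIL.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(cpu_pool, cpu_bound_ocr, data)


async def extract_many(paths: list[str], cpu_pool: Executor) -> list[str]:
    # Process several files concurrently; results keep the input order.
    return await asyncio.gather(*(extract_text(p, cpu_pool) for p in paths))
```

In a real app you would probably create a `concurrent.futures.ProcessPoolExecutor` once at startup (under an `if __name__ == "__main__":` guard) and pass it in as `cpu_pool`. A thread pool also works here, but for the CPU part it only keeps the loop responsive, it doesn't add parallelism under the GIL.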