by nickserv 30 days ago
Gave it a try for structured data extraction. Tested returning a JSON object from images.

The output was correct, and seemed deterministic, although I ran it only 2-3 times on the same image.

Main problem is response time: it took about 20-25 seconds for a simple structure of 5 fields. That makes it unusable at scale, let alone for "real time" processing.

The other problem is cost: for the same document, it is considerably more expensive than more established models, like flash-light.

Shame, the architecture is very interesting.

1 comment

Thanks for the feedback!

We'll be working a lot more on speed over the coming few weeks :) More GPUs and more optimizations.

Our focus has been on quality of output first, and we'll make optimizations as we grow :)

The lite models are great for simple use cases but won't do well in more complex OCR use cases.