by souvik3333
361 days ago
Hey, the reason for the long processing time is that lots of people are using it, probably with larger documents. I tested your file locally and it seems to be working correctly: https://ibb.co/C36RRjYs Regarding the token limit, it depends on the text. We are using the qwen-2.5-vl tokenizer, in case you are interested in reading about it. You can run it very easily in a Colab notebook, which should be faster than the demo: https://github.com/NanoNets/docext/blob/main/PDF2MD_README.m... There are incorrect words in the extraction, so I would suggest waiting for the handwritten text model's release.
|
Apologies if there's some unspoken nuance in this exchange, but by "working correctly" did you just mean that it ran to completion? I don't even recognize some of the Unicode characters that it emitted (or maybe you're using some kind of strange font, I guess?)
Don't misunderstand me, a ginormous number of floating point numbers attempting to read that handwriting is already doing better than I can, but I was just trying to understand whether you thought that outcome was what was expected.