by Eisenstein
367 days ago
Document: https://imgur.com/cAtM8Qn
Result: https://imgur.com/ElUlZys
Perhaps it needed more than 1K tokens? It took about an hour to generate that (I was number 28 in the queue), and I didn't feel like trying again. How many tokens does it usually take to represent a page of text with 554 characters?
Regarding the token limit, it depends on the text. We are using the qwen-2.5-vl tokenizer, in case you are interested in reading about it.
You can run it very easily in a Colab notebook, which should be faster than the demo: https://github.com/NanoNets/docext/blob/main/PDF2MD_README.m...
There are incorrect words in the extraction, so I would suggest waiting for the handwritten text model's release.
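For a rough feel of the 554-character question, here is a back-of-the-envelope sketch. It does not call the qwen-2.5-vl tokenizer; the 3 to 5 characters-per-token range is only a common rule of thumb for English text with subword tokenizers and is an assumption here, as are the names `estimate_tokens`, `PAGE_CHARS` and `TOKEN_LIMIT`. The real count depends on the text and on any markdown the model emits.

```python
import math


def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Estimate a token count from a character count.

    chars_per_token is an assumed average, not a value measured
    with the qwen-2.5-vl tokenizer.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up: a partial token still costs a whole token.
    return math.ceil(num_chars / chars_per_token)


PAGE_CHARS = 554    # page length mentioned in the question
TOKEN_LIMIT = 1000  # the "1K tokens" budget mentioned in the question

# Try a pessimistic, a typical, and an optimistic assumed ratio.
for ratio in (3.0, 4.0, 5.0):
    tokens = estimate_tokens(PAGE_CHARS, ratio)
    fits = "fits within" if tokens <= TOKEN_LIMIT else "exceeds"
    print(f"{ratio} chars/token: about {tokens} tokens ({fits} {TOKEN_LIMIT})")
```

Under these assumed ratios the page comes out at somewhere between 111 and 185 output tokens, so a 1K limit would probably not be the bottleneck for plain text of that length. Running the actual tokenizer on the expected output is the only way to get an exact number.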