This is great news, and it's been needed for ages. Handwriting recognition is more than just funky OCR: it's OCR applied to vector lines with a defined stroke order. For instance, a lowercase e and c might render to the exact same pixels because the 'loop' of the e overlaps itself, but if we know the stroke started in the middle of the line and then retraced itself, we can be sure we're looking at an 'e'. That's simply not possible in e.g. Tesseract.
This project is just funky OCR, i.e. "offline" handwriting recognition that operates on the pixels of the final image only. That means it works on scans, but can't take stroke order information into account.
What you're talking about would be "online" handwriting recognition, where timing information about each stroke is available.
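To make the offline/online distinction concrete, here is a minimal toy sketch (not code from the plugin or from any HTR library). The helper names `rasterize` and `deltas` and the two tiny strokes are made up for illustration: both strokes cover the same pixels, so an image-only recognizer sees identical input, while the ordered pen movements differ.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) grid coordinates, hypothetical toy format


def rasterize(stroke: List[Point], width: int, height: int) -> List[List[int]]:
    """Offline view: mark every visited pixel, discarding order and timing."""
    image = [[0] * width for _ in range(height)]
    for x, y in stroke:
        image[y][x] = 1
    return image


def deltas(stroke: List[Point]) -> List[Point]:
    """Online view: per-step pen movement (dx, dy) in drawing order."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(stroke, stroke[1:])]


# A 'c'-like arc drawn in a single pass.
arc = [(2, 0), (1, 0), (0, 1), (0, 2), (1, 3), (2, 3)]
# Same pixels, but the pen starts mid-line, moves right, then retraces itself.
retraced = [(1, 0), (2, 0), (1, 0), (0, 1), (0, 2), (1, 3), (2, 3)]

same_image = rasterize(arc, 3, 4) == rasterize(retraced, 3, 4)
same_motion = deltas(arc) == deltas(retraced)
print(same_image, same_motion)
```

The rasterized images compare equal while the movement sequences do not, which is the information an online recognizer can use and an offline one cannot.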
Yes, that's totally correct! The current version of the plugin supports only so-called "offline" HTR, which operates on images. This is ultimately determined by the underlying machine learning model.
However, I have developed another model (based on a fairly recent Google paper by Carbune et al. 2020) that operates on pen dynamics and thereby implements online HTR, see here:
This model is open-source as well and will be part of the HTR system for Xournal++ in the future. Feel free to give it a try yourself locally.
One question that has been bothering me for a long time, and that has kept me from using online HTR so far, is how to find text on a page in the temporal domain (i.e. in the online domain rather than the offline domain). If you have any ideas on that, please do let me know, I would greatly appreciate it! One possible way is a transformer model, but that feels a bit like overkill and introduces a context length limit.
Yes, you're right, stroke-order-aware HWR systems are hard to find. One reason for that is the lack of good datasets for training machine learning models!
As such, my stroke-order-aware attempt over at https://github.com/PellelNitram/OnlineHTR/ uses a dataset from 2000 with around 12,000 samples. By contrast, the internal Google dataset is reported to feature around 16,000,000 samples :-D.
Currently, the machine learning model only supports offline HTR (i.e. using images) but online HTR (i.e. using pen time series data) is in the making, see here:
How has your experience been? Maybe you should give them a testimonial here, it's the least they deserve since it has served you well (presumably). Also, please consider giving us your perspective as a long-term user, I think you bring a unique perspective to the table!!! HTR is an oasis in this global desert of manual digitization. I did a relatively extensive search for projects a year ago, and I don't remember seeing Xournal at all, or maybe the old development date threw me off.
"HTR is an oasis in this global desert of manual digitization" - ah, this is such great phrasing! <3
Xournal++ is a great project that features a bunch of really great developers who dedicate a lot of time to it.
I am not much involved in the Xournal++ development itself, but I try to use my machine learning skills to build an HTR system for Xournal++ in the form of a plugin.
I was wondering, how do you capture handwritten notes using Xournal and Xournal++? Using some sort of tablet maybe?
Great to hear that you found out about Xournal++! It's really the power of open-source that you own your own handwritten notes as you are not locked in.
A very nice project, but... how many people really want to convert scanned handwriting to text? Results will be messy anyway, and today handwritten text is typically no more than a few pages, so it's far quicker to retype or even dictate it than to correct OCR output.
BTW personally I use Xournal++ to add text/images to PDFs, typically where I have some crappy low-importance PDF form, not a real one, and I do not want to invest time in a nice LaTeX + cart.el setup (an Emacs artist mode wrapper that gets the coordinates of any form field by clicking on it with the mouse [1]). I still have to deal with some scanned documents, but those were originally printed from a computer, not handwritten.
Handwritten text recognition might be very welcome for scanning and indexing old public archives, which is damn complex since there are countless styles of cursive, but it's still very much needed to merge the old paper world into the digital one so we don't lose history.
The HTR feature in Xournal++ does not require you to scan anything though. You just write handwritten notes in Xournal++ as always, and upon saving with the plugin, the resulting PDF is searchable (the handwritten text at least). So there's no scanning or retyping involved :-).
How cool! I have over a decade of notes taken in xournal and other digital tablets and was considering taking a short sabbatical to type them all out. Might not need to after all! I will definitely try this.
Oh that's so exciting to hear! Please do let me know if you need help when trying to use the plugin! I am happy to help you until the plugin works for you.
I decompose my answer into three prime time metrics below and arrive at the plugin being at 6.4/10.
These are the decomposed prime time metrics:
1. The plugin itself is at 10/10; e.g. I use it in production myself.
2. The prediction quality is at, say, 6/10. I am actively working on this.
3. The installation process is at 3/10. That's because I have only tested it on my own machine setup, where it works fine. I'd love to get a bit of user feedback to improve the installation process.
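As a quick sanity check on the overall number, here is a small sketch that averages the three scores listed above. Equal weighting is an assumption, since the post does not say how the scores are combined; under that assumption the result lands at roughly 6.3, close to the quoted 6.4.

```python
from statistics import mean

# Scores as given in the list above; equal weights are an assumption.
scores = {"plugin": 10, "prediction quality": 6, "installation": 3}

overall = mean(scores.values())
print(round(overall, 1))
```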