by bckr 748 days ago
You’re right, and thanks to another one of the commenters, I have an idea for how I could do this.

Take my journals and run a relatively simple word-separation algorithm over them.

Shuffle up those words and pay to have them annotated.

Reconstruct the dataset from there.
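
A rough sketch of the shuffle-and-reconstruct steps, assuming the word separation has already produced one crop per word. The `WordCrop` record, the `image_id` handle, and both function names are hypothetical, made up for illustration; the idea is only that each crop keeps its page and position so the original order can be restored once labels come back.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WordCrop:
    page: int      # journal page the crop came from
    index: int     # position of the word on that page
    image_id: str  # hypothetical handle to the cropped word image


def shuffle_for_annotation(crops, seed=0):
    """Return the crops in random order so no page is seen in sequence."""
    shuffled = list(crops)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def reconstruct(crops, labels):
    """Rebuild per-page text from a {image_id: transcription} mapping."""
    pages = {}
    for crop in sorted(crops, key=lambda c: (c.page, c.index)):
        # Crops the annotator skipped are marked rather than dropped.
        pages.setdefault(crop.page, []).append(labels.get(crop.image_id, "[?]"))
    return {page: " ".join(words) for page, words in pages.items()}
```

The annotator only ever receives `image_id` and the image; the page and index stay on my side and are joined back in by `reconstruct`.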

1 comment

It might be better to provide 2-3 words of context with each separated word. My handwriting is often bad, and I sometimes have to guess a word based on context.
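
Something like the following could pair each word with its neighbours, as a minimal sketch. It works on a list of placeholder strings standing in for word crops, and `with_context` and the window size `k` are names invented here, not part of any existing tool.

```python
def with_context(words, k=2):
    """Return (left_context, target, right_context) for every word.

    `k` is the number of neighbours kept on each side; words near the
    start or end of the sequence simply get a shorter context.
    """
    windows = []
    for i, target in enumerate(words):
        left = words[max(0, i - k):i]
        right = words[i + 1:i + 1 + k]
        windows.append((left, target, right))
    return windows
```

A larger `k` makes guessing easier but also shows the annotator more of the surrounding text, so it is probably worth keeping it as small as the handwriting allows.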