by kylebgorman 4093 days ago
For comparability, most people use the Penn Treebank-III WSJ data. Sections 03-06 are the test set; the remaining sections are used for train/dev.
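
To make the split concrete, here is a rough sketch in Python. It assumes the usual WSJ layout of sections 00 through 24; the names `ALL_SECTIONS`, `TEST_SECTIONS`, and `split_sections` are made up for illustration and don't come from any particular toolkit.

```python
# Assumption: the WSJ portion is organized into sections "00" through "24".
ALL_SECTIONS = [f"{i:02d}" for i in range(25)]

# Sections held out for testing, per the convention described above.
TEST_SECTIONS = {"03", "04", "05", "06"}


def split_sections(sections=ALL_SECTIONS, test=TEST_SECTIONS):
    """Partition section IDs into (train_dev, held_out), preserving order."""
    train_dev = [s for s in sections if s not in test]
    held_out = [s for s in sections if s in test]
    return train_dev, held_out


train_dev_secs, test_secs = split_sections()
print("train/dev:", " ".join(train_dev_secs))
print("test:     ", " ".join(test_secs))
```

How you carve dev out of the remaining sections is up to you; the sketch leaves train and dev together on purpose.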

Most methods are based on some sort of simple feature templates and machine learning, so they should generalize relatively well to a wide variety of languages, IMO.