by matt4077 4089 days ago
This seems to be insanely relevant: I'm currently working on transforming government documents into structured XML, and I've had to be stopped repeatedly from implementing something like this. I have a Treetop grammar now that (mostly) works, but I'm tempted to try this.
1 comments

You are? That's interesting, because that's also something I am currently working on.

What do you need the grammatical parsing for? Identification of named entities?

We're building a tracker for the EU legislative process. There's an XML markup for legislative documents (Akoma Ntoso), and we need to transform the PDFs that the EU publishes into it to allow, for example, user annotation (and just good HTML representation in general). We've built on this South African project: https://github.com/longhotsummer/slaw

Are you working for an NGO, like Open Data Foundation or something like that?

I'd be interested in following your progress. You can send me a mail, so we can connect.

It's an NGO startup: https://twitter.com/aecu_eu