by tannhaeuser 2412 days ago
I see. I just wanted to point out that semistructured documents have a long history in law in particular, with some of the oldest text databases in use. AFAIK, law firms held on to WordPerfect for a long time (and many still use it) even after MS Word became the de facto mainstream format. WordPerfect has a rich history of structured, non-WYSIWYG editing, and could be converted to SGML as early as 1992. So I guess if the problem today is overuse of binary-only transport formats such as PDF, a discussion about the representation of text in law should start with a look back, with a perspective on what's been lost (not to mention that a paged media format is probably not the way forward in 2019, or even 1999 for that matter).

Definitely agree. In legal we've ended up with this odd process in which we start with semistructured data that could be, but isn't, stored as structured data, e.g. the summary deal terms in a corporate context. That is usually a Word table of key-value pairs, e.g. parties, financial values, percentages, key clause types or even the exact text to be included.
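To make that concrete, here is a rough sketch of what such a term sheet might look like if it were stored as structured data instead of a Word table. Everything in it is an assumption for illustration: the `DealTerms` class, its field names, the helper functions and the sample values are hypothetical, not any real firm's schema or an existing standard.

```python
import json
from dataclasses import dataclass, asdict, field
from decimal import Decimal
from typing import Dict, List


@dataclass
class DealTerms:
    """Hypothetical term-sheet record; field names are illustrative only."""
    parties: List[str]
    purchase_price: Decimal          # financial value, kept exact with Decimal
    currency: str
    equity_percentage: Decimal
    # Clause type -> exact text to be included in the long-form contract.
    clauses: Dict[str, str] = field(default_factory=dict)


def to_json(terms: DealTerms) -> str:
    """Serialize to a text format; Decimals are written as strings."""
    return json.dumps(asdict(terms), default=str, sort_keys=True, indent=2)


def from_json(text: str) -> DealTerms:
    """Rebuild the record from its text form, restoring Decimal values."""
    raw = json.loads(text)
    return DealTerms(
        parties=raw["parties"],
        purchase_price=Decimal(raw["purchase_price"]),
        currency=raw["currency"],
        equity_percentage=Decimal(raw["equity_percentage"]),
        clauses=raw["clauses"],
    )


# Made-up sample values, purely for demonstration.
terms = DealTerms(
    parties=["Acme Ltd", "Example Holdings"],
    purchase_price=Decimal("1500000.00"),
    currency="USD",
    equity_percentage=Decimal("12.5"),
    clauses={"governing_law": "England and Wales"},
)

round_tripped = from_json(to_json(terms))
```

The point is only that the same key-value pairs survive a round trip through a plain-text format without loss, which is exactly what a scanned PDF cannot offer.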

That is then expanded into a long-form MS Word contract, negotiated, signed, and then physically scanned back into a machine as a PDF (vs. PDFing the native Word doc and preserving the text layer).

Very avoidable, as you note, and actually solvable via a look backwards and/or by redesigning the technical paradigm for representing a contract's data model from cradle to grave.