by lawtomated 2412 days ago
True, maybe we should have clarified that in our article.

The point we were trying to make is different, however. It is this: in reality, most legal data in its final authoritative form, at least for contracts, is stored as a scanned PDF of the signed agreement, i.e. an image, not the original Word doc containing semi-structured data.

Granted, if lawyers didn't scan docs as images and treat only those scanned images as the authoritative record for a particular matter, things might be different.

There are lots of projects in the works at law firms and in-house legal teams to maintain contracts as structured, or at least semi-structured, data from cradle to grave, but the old habit of scanning contracts persists.

Not sure if that adds clarity. Either way (whether it does or doesn't), it would be good to know so we can improve our content :)


I see. I just wanted to point out that semi-structured documents have a long history in law in particular, with some of the oldest text databases in use. AFAIK, law firms held on to WordPerfect for a long time (and many still use it) even after MS Word became the de facto mainstream format. WordPerfect has a rich history of structured, non-WYSIWYG editing, and could be converted to SGML as early as 1992. So I guess if the problem today is overuse of binary-only transport formats such as PDF, a discussion about representing text in law should start with a look back, with a perspective on what's been lost (not to mention that a paged media format is probably not the way forward in 2019, or even 1999 for that matter).

Definitely agree. In legal we've ended up with this odd process in which we start with semi-structured data that could be, but isn't, stored as structured data, e.g. the summary deal terms in a corporate context. These are usually a Word table of key-value pairs, e.g. parties, financial values, percentages, key clause types or even the exact text to be included.
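To make the "key-value table" point concrete, here's a minimal sketch of what keeping those summary deal terms as structured data might look like. Everything here is hypothetical and for illustration only: the `DealTerms` record, its field names, the `render_consideration_clause` template and the example parties are not any real firm's or vendor's schema. The idea is simply that the terms round-trip losslessly through a machine-readable format (JSON here) and the contract wording is generated from the data rather than retyped.

```python
import json
from dataclasses import asdict, dataclass, field
from decimal import Decimal


@dataclass
class DealTerms:
    """Hypothetical structured form of a corporate term sheet (key-value pairs)."""
    parties: list[str]              # assumed order: [buyer, seller]
    purchase_price: Decimal         # Decimal avoids float rounding on money values
    currency: str
    equity_percentage: Decimal
    clause_types: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Decimal isn't JSON-serialisable by default, so store it as a string.
        return json.dumps(asdict(self), default=str, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DealTerms":
        data = json.loads(text)
        # Convert the string-encoded money/percentage values back to Decimal.
        data["purchase_price"] = Decimal(data["purchase_price"])
        data["equity_percentage"] = Decimal(data["equity_percentage"])
        return cls(**data)


def render_consideration_clause(terms: DealTerms) -> str:
    # Hypothetical clause template: the text is derived from the data,
    # so the figures in the contract can't drift from the term sheet.
    buyer, seller = terms.parties[0], terms.parties[1]
    return (
        f"{buyer} shall purchase {terms.equity_percentage}% of the issued shares "
        f"from {seller} for {terms.currency} {terms.purchase_price}."
    )


terms = DealTerms(
    parties=["Acme Ltd", "Beta Holdings plc"],
    purchase_price=Decimal("1500000.00"),
    currency="GBP",
    equity_percentage=Decimal("25"),
    clause_types=["warranties", "non-compete"],
)
print(render_consideration_clause(terms))
```

The point of the sketch is that if the signed version kept a reference to (or an embedded copy of) this record, the authoritative data would survive signing, instead of having to be OCR'd back out of a scanned image.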

That table is then worked up into a long-form MS Word contract, which is negotiated, signed and then physically scanned back into a machine as a PDF (vs PDFing the native Word doc and preserving the text layer).

Very avoidable, as you note, and actually solvable via a look backwards and/or a redesign of the technical paradigm for representing a contract's data model from cradle to grave.