Hacker News
by dbot 3235 days ago
Please take a moment to notice how horribly unfriendly a deposition transcript is. Thousands of these documents are produced every day, in a proprietary format that is antiquated and near impossible to work with. The PDF is unusable given the line numbers, headers, footers, etc. The simple act of copying and pasting - for example, writing a brief, a blog, etc. - is painful.

I know developers could create an amazing solution, but the legal community hasn't asked yet, unfortunately.

12 comments

There is nothing antiquated about PDF. It's an incredibly widely supported standardized format that can cleanly handle everything from a Word document to a scan of a prisoner's handwritten pro se brief to a printed document that someone has scribbled on. It preserves formatting information, which is incredibly important because court filings are regularly printed on paper. Not because there is no electronic workflow (the entire workflow is electronic, at least in federal court), but because it's a pain in the ass to read and annotate things on a computer versus putting tabs in a binder.

As to the line/page format, it's used because sentences or even words within depositions are quoted in briefs with citations to exactly where they appear. And frankly, if your software can't even grok a simple 2D format it's probably not intelligent enough to do any useful processing of the document.

I'm always on the lookout for good legal technology. But legal technology purveyors are like those people who think programming IDEs should all be visual environments where you program by dragging and dropping connectors between blocks. It's like, no.

I think you misunderstand the original comment. It isn't criticising the PDF format. It is criticising the format of the PDF.
It's the structure of the document that's the problem, not the file format. Copying and pasting out of it is painful.
A lot of the time they're also provided in other formats, including plain text. It's still got pages and line numbers (they're essential for citing to), and it uses an ASCII form feed for new page, but it's better to work with. You can actually pipe it to port 9100 on a common network printer and it'll come out about right. There is also software to work with these transcripts.
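To make the plain-text point concrete, here is a minimal sketch of pulling clean, quotable text out of such a file while keeping the page:line cite. It assumes a layout I have not verified against any particular court reporter's output: pages separated by an ASCII form feed, each transcript line starting with a one- or two-digit line number, and page numbers that simply count up from 1. The function names are made up for illustration.

```python
import re

# Assumed layout (hypothetical): each transcript line starts with a
# 1- or 2-digit line number, then whitespace, then the spoken text.
LINE_RE = re.compile(r"^\s*(\d{1,2})\s+(.*\S)\s*$")


def parse_transcript(text):
    """Return a list of (page, line, text) tuples.

    Pages are counted by splitting on the ASCII form feed, so page
    numbers start at 1 and ignore any numbering printed in headers.
    """
    records = []
    for page_no, page in enumerate(text.split("\f"), start=1):
        for raw in page.splitlines():
            match = LINE_RE.match(raw)
            if match:  # headers, footers and blank lines are skipped
                records.append((page_no, int(match.group(1)), match.group(2)))
    return records


def quote(records, page, first, last):
    """Join lines first..last of one page and append a page:line cite."""
    body = " ".join(
        text for p, n, text in records if p == page and first <= n <= last
    )
    return f"{body} ({page}:{first}-{last})"


sample = " 1  Q. Did you sign it?\n 2  A. Yes.\n\f 1  Q. When?\n 2  A. In March.\n"
records = parse_transcript(sample)
print(quote(records, 2, 1, 2))
```

Real transcripts carry their own page numbers and sometimes timestamps, so anything serious would read the page number from the header instead of counting form feeds.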
There are a number of legal tech companies fighting to break into legal document management (ediscovery). I worked for Everlaw (a16z) and there's also Disco and Logikcull, among others
Let's not forget Relativity :)
What would the business model be? Or to ask the same thing a different way: who is wasting money with the status quo?
Litigants are wasting money with the status quo. Not because of the format, per se, but because of a lack of reasonably good NLP-based search and summary tools. Much money is spent paying people to review transcripts by hand, when the bulk of the heavy lifting could be done with software. I know the market well enough to manage the product development and sell it but lack the NLP skills to build it. Anyone interested in talking about it feel free to hit me up.
You don't understand how large court cases work.

You're obligated to provide everything to the other side. Doing so in a format that requires them to have a small army of people to read every line, instead of being able to do a simple text search, is exactly the point. There are even companies that specialize in taking large amounts of electronic data (email is a good example) and printing every single page so that opposing counsel ends up with enough paper to fill a room.

Edit: I also have no idea how you'd sell a product that considerably reduces billable hours.

Exactly. Large cases often produce box after box of documents, frequently including unrelated documents that were never requested. It is a game of burying them in paper: if there is a smoking gun (a document that is critical to their side), they may not see it, or by the time they find it, they have paid tens of thousands of dollars in legal fees for the attorneys to dig it out of the many boxes of documents.

Oftentimes one litigant can starve the less funded litigant out. Successfully starving a litigant out results in favorable settlements for the offending litigant.

You could sell it to plaintiffs' lawyers and the increasing universe of firms that do fixed fee or capped arrangements. Plaintiffs' lawyers have huge incentives to minimize per-case investment, but they don't use much legal tech. Which is pretty great evidence of how well it works.

Also you can obviously search PDFs. People read every line of deposition transcripts because they're looking for admissions (places where the deponent slips up and reveals useful information).

> I also have no idea how you'd sell a product that considerably reduces billable hours.

I think you'd want to sell it to the folks who are paying for those billable hours.

You sell it to firms who would like to compete on price while showing clients that they do so due to their use of cutting-edge technology
They're not the ones using it though
The nature of discovery means that you'd only reduce your adversary's costs, not your own.
> I also have no idea how you'd sell a product that considerably reduces billable hours.

Hmm.. perhaps, if they can force the opposition to use similar tech, then they can promise faster resolutions. (That's still assuming both parties wouldn't mind it much, but I don't see it happening.)

> There are even companies that specialize in taking large amounts of electronic data (email is a good example) and printing every single page so that opposing counsel ends up with enough paper to fill a room.

Surely there's a limit to how awkward you can make this for the other side? Why would the courts allow making it intentionally difficult for one side to gather evidence to help their case?

For example, I'm sure they wouldn't allow you to deliver the documents on numbered post-it notes, one sentence per note and in a random order.

I would have thought the court would insist the material is delivered in the most practical format (e.g. emails as text files or in a searchable database) and both sides get access to the same format unless there are special circumstances.

I would imagine they are stuck back a few decades when delivering documents on paper was standard and nothing unusual. So printed paper is a minimal acceptable format. Since part of the legal battle is to also drain the other side's resources it would make sense then to go by the absolute allowed minimum and nothing more.

If delivering documents on Post-it notes were allowed, surely there'd be companies specializing in that.

Seems like requiring an electronic copy when one is available originally would go a long way. Is it the court's prerogative, or do we need a Congressional amendment for this to become commonplace?
It is commonplace. I've never gotten paper discovery in a civil case. Opposing counsel sends links to a secure download site, sometimes a CD or USB. The documents are sent as natives plus TIFF images of each page plus metadata. We load them into an ediscovery platform where everything is OCR-ed and indexed. Whoever reviews it works within the platform where documents can be searched, tagged, marked up, etc.
This is where us old-timers go into get-off-my-lawn mode: "You kids today don't know how easy you have it for document discovery — in my day we spent days and weeks in hot, dusty warehouses looking through boxes of mouldering paper files ...."
I thought this happens only on TV. Why the hell would you not require by law that electronic information in the best available format must be supplied if available???
How do you define "best available format" though? There's a rabbit hole of complexity just in this single sentence!
So are you saying it's sensible that we're stuck with emails printed on reams of paper because it's too hard to define a better spec? There is really no reason for digital files not to be mandated in 2017 except "lol bureaucracy." Frankly, if someone only has access to paper files they should be required to scan them. As is, at least as much work is being done in the opposite direction (printing tons of paper) which is just ridiculous.
I work on large lawsuits for a living. You obviously don't understand the point I made.
There is a significant industry based on e-discovery that never sees or requires actual paper. Much more likely is an inhalation of the contents of a custodian's hard drive, or the ingestion of a full PST file (MS Outlook email format) into a system, or downloading email inboxes from Office 365 or Gmail.

Once ingested, the documents are searched for words or phrases, tagged as relevant, privileged, or non-responsive. See the FRCP (Federal Rules of Civil Procedure) for discussion of electronic documents in discovery.
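For anyone who hasn't seen a review platform, the search-and-tag step boils down to something like the sketch below. This is a toy under my own assumptions, not how any named product works: the index is an in-memory inverted index, `search` is a plain AND over words (no phrase ordering, no stemming), and the function names and sample documents are invented for illustration.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9']+")


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


def tag(tags, doc_ids, label):
    """Attach a review label (e.g. 'relevant', 'privileged') to documents."""
    for doc_id in doc_ids:
        tags[doc_id].add(label)


docs = {
    "A": "Please shred the March invoices.",
    "B": "Lunch on Friday?",
    "C": "The invoices from March are attached.",
}
index = build_index(docs)
tags = defaultdict(set)
tag(tags, search(index, "march invoices"), "relevant")
print(sorted(tags))
```

The real systems layer OCR, metadata filters, and reviewer workflows on top, but the core loop is the same: index once, query many times, record a tag per document.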

Not only do courts expect parties in lawsuits to supply documents in electronic form, there are now rules in some courts tailored to TAR, or technology-assisted review, which often means LSA (Latent Semantic Analysis).

So the idea of dropping tons of paper on the hapless opponent is practically ancient history, dating back to the MCI/ATT lawsuits. Large lawsuits simply don't work that way anymore.

Given developers are responsible for CSV, PDF (a complex programming language that renders documents as a side effect), XML, and JSON, I see no particular reason to share your optimism.
No freaking kidding. It's 2017, where's the json..
For all the good press, PDF is still a bitch to work with when trying to make end-to-end workflows (particularly if you want to avoid opening Acrobat or something).
This is a particular file format used for legal transcripts rendered as a PDF. The underlying format is not all that difficult to work with. Good legal software needs to be able to deal with hundreds of different file formats, from Lotus Notes to the oldest MS Word format or early forms of PowerPoint. The transcript format is a piece of cake relative to others.
But it would be an act of faith and patience to redact and docket-file high-tech versions. That process is seamless for the low-tech PDF, which was pen-signed and docket-friendly. The PDF was even pulled from the docket and re-posted on scribd, with searchable text intact.

There also would be a time-synchronized video and an e-transcript, yes, in proprietary non-open standards.

is there a link to the video?
The goal of law firms is to maximize billable hours and make it difficult for non-lawyers to conduct legal work. Working efficiently is not the goal at all.
Not specific to depositions, but a piece that might help: an API for PDFs https://www.pdfotter.com/
Why are so many pages completely redacted? How do you read the unredacted original?
Being on the legal teams for plaintiff or defendant would be a start.
Unless they do something stupid like just draw black boxes over the real text, you don't. That's the purpose of redaction.