by gord 6306 days ago
Thanks for the write-up. It's interesting to see how things develop in the real world outside your own domain.

I'm surprised the big G hasn't just paid some money to get that data, given their plan to scan all the world's books.

I wonder what percent of all text is legal or medical.


I doubt West, LexisNexis, or any other legal aggregator will sell the information to Google. Those companies make a lot of money selling it to lawyers on a monthly subscription basis. They also do some value-add to the materials. What I see on West or LexisNexis is more than just the publicly available decision. West and Lexis employ lawyers to create summaries and other helpful things for the legal researcher.

There certainly is a lot of legal text. Lawyers certainly are good at creating volumes of paper. For example, the Supreme Court just decided a case, Wyeth v. Levine. It will be recorded in volume 555. So to date, the Supreme Court's decisions have filled 554 volumes of 1,000 pages each. And that is just one court. Every state court, state appeals court, state supreme court, federal court, land court, etc. has similar volumes and page counts.

And all of this is just the primary sources. Once you add secondary sources, aka books and papers written by learned scholars on individual topics or cases, the number of books and pages increases by orders of magnitude. And we still haven't archived any statutes (those go on forever, for each state) or any administrative law. And each one of those has comment sections that go on for pages whereas the actual rule is only a paragraph.

I wonder what percentage this is, too. I bet it is still extremely small compared to what the rest of the world has produced. There are so few law writers when compared to all other writers.

That's a lot of text. The few patents I've read strike me as quite verbose. I was quite amazed at what was patentable, and at how loosely worded {ephemeral!} the descriptions were. I'm not suggesting all legal text is as sparse in information.

We could certainly do with a better text search for patents, but I wonder if that's possible unless a form of restricted prose is used that makes the text less obtuse and verbose.

Maybe an algorithm could reduce the common legal motifs and replace them with shorter versions, thus refactoring legal-speak into human-readable prose on which text search can be effective.
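
A rough sketch of what I mean, in Python. The phrase table is entirely made up for illustration (it is not a vetted legal glossary), and the function names are my own. It only normalizes text so that a search can match either wording; it makes no claim that the shorter wording keeps the same legal meaning.

```python
import re

# Hypothetical phrase table: these pairs are illustrative assumptions,
# not a vetted legal glossary.
PLAIN_FORMS = {
    "including but not limited to": "including",
    "in the event that": "if",
    "prior to": "before",
    "a plurality of": "several",
}


def simplify(text, table=PLAIN_FORMS):
    """Replace each known phrase with its shorter form (case-insensitive)."""
    # Try longer phrases first so a short phrase never pre-empts a longer one.
    phrases = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: table[m.group(0).lower()], text)


def normalized_match(query, document):
    """Simplify both sides before a substring search, so either wording hits."""
    return simplify(query).lower() in simplify(document).lower()
```

With this, a query for "several sensors" would match a document that says "a plurality of sensors", because both are reduced to the same form before comparing.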

[ For some reason this reminds me of the law student drama series 'the paper chase'. ]

How well is the information hyper-linked? Presumably one paper references many previous rulings, and you'd jump around a lot in researching issues.

Thank you for reminding me of patents. I forgot to mention those. A patent is a completely different entity compared to case law. Case law and case briefs/motions written by lawyers have to be short, concise, to the point, and logical. The judge will quickly (in a matter of a few seconds) ignore your argument if he has to spend any time figuring out what you have to say.

That leads all of our law professors to drill into our heads brevity, clarity, and conciseness in everything we write. But as you mentioned, patents have a completely different audience and goal.

I am amazed at what is patentable too. I wrote a research paper arguing against software patents. The professor who graded my paper disagreed with the position very much. I wrote mine a few days before the Court of Appeals for the Federal Circuit heard the Bilski case. When the decision was rendered this past fall, the court made some law that is similar to what I argued. I should go show the professor the paper he marked down and the Bilski decision. But I digress...

Patents are a land grab. The goal is to get the vaguest, broadest patent possible and protect the most space. And the legal-speak is there because those words have been litigated time and time again and they have a known meaning to the courts. As soon as you write a new phrase you open yourself to debate in front of the court. A macro to convert legal-speak to human-readable prose should be used at the researcher's own peril.

We are told time and time again: read the case for yourself. Do not read anyone else's summary. And don't paraphrase words unless you know to stay away from the special ones.

Example: There was a contracts case where the contract said "only use pipe made by Company A to build my house." The builder used pipe from Company B. The court ruled against the home buyer because "only use pipe by Company A" doesn't actually mean that! It means use pipe similar in quality to Company A's! So translating the legal-speak required to really get a builder to use pipe from Company A into "only use pipe from Company A" will result in failure.

The information is hyper-linked very well. I wish I could show you, but I can't. My student access to the site is restricted to school use only. I am pretty sure I would be violating the TOS by posting any of the information.

But every case cited is linked. Those are the most important. Judges are linked to other decisions. Same with arguing lawyers. Statutes are linked. Footnotes are linked. Obscure terms are linked. For example, a medication will have a link but a legal term of art will not.

I just wish the search and the site overall were faster. Sometimes the navigation is quirky, too.