by DannyBee 2691 days ago
The collection is random precisely because we could not get folks like PACER to give us data, even when we offered to pay for the cost of production.

The Scholar team is small. Anurag[1] thought there was no reason law shouldn't be accessible to normal people too, so we pushed in that direction. (I led an eng team in DC at the time that worked on opening up data that should have been open. We also did election information, etc.)

Once PACER et al. turned them down, I'm pretty sure they made some deals, but there really wasn't a good and complete source.

Worse, lots of states etc. had locked themselves into exclusive deals and so couldn't give us the data even if they wanted to. (They were actually happy to be locked in, it turns out.)

They do have some fairly good feeds from paid sources but ...

Anurag is very persistent, but even here, I think he's been focusing on other areas that are more useful to people.

[1] https://www.wired.com/2014/10/the-gentleman-who-made-scholar...

2 comments

Random question: have y'all ever thought about separating the Google Scholar part from the legal part? I know it's a simple button click, but they are really different outputs. I use both: mainly the scholar part for finding prior art references in patent litigation, and the legal searches to find citations. I would also think it would help with the branding. Either way, thanks for putting together a product that really helps.
Do you at least pull whatever is in RECAP?
Not sure if they do, but RECAP pushes data to the Internet Archive, so anyone can pull that PACER data back out.

https://blog.archive.org/2017/02/13/internet-archive-offers-...