HN Mirror



	by codechad 1582 days ago
	This is amazing. Can you share any info on how you were able to compile so much info from different sources? In my limited experience of hunting for legal filings, it seemed like every court had its own system, with nothing standardized or programmatic. Thanks!


richardbarosky 1582 days ago

The search uses Elasticsearch 7 for full-text search. It's been extremely fast and has worked very well. You're right: court data is scattered across many different systems and needs to be aggregated, which is a difficult process.


kingcharles 1582 days ago

Are you using freelaw's code to scrape all the different servers? Why are there no contact details on the site? I don't understand the mystery and black ops nature of this thing. It feels like there is some sort of conspiracy here that I've yet to uncover!


richardbarosky 1582 days ago

There are, I think, about 5 million opinions from that project, yes. I wouldn't say it's black ops; feel free to contact me on reddit.


tmikaeld 1582 days ago

How much RAM does that use? What's the latency? Is it sharded? Is it a cluster? So many questions.


richardbarosky 1582 days ago

There are 2 search boxes going: one stores the search index without source, and the other stores the source and is only used for highlighting. Searches usually take under 200ms, and SRP and individual pages usually take less than 20ms. The 2 ES nodes are not formally part of a single cluster because of the index storage difference. Another box uses a traditional LAMP setup. Feel free to send a message on reddit if you're interested in more detail.
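For readers who want to picture the split, here is a minimal sketch of how a "search" index without `_source` and a separate "highlight" index with `_source` might be defined in Elasticsearch 7. The field names (`title`, `body`) and all helper function names are hypothetical; the thread does not describe the real schema or queries, so treat this as an illustration of the idea rather than the site's actual configuration.

```python
import json

# Hypothetical field names: the thread does not describe the real schema.
TEXT_FIELDS = {"title": {"type": "text"}, "body": {"type": "text"}}


def search_index_body():
    """Index body for the search node: _source is disabled, so the
    original documents are not stored alongside the inverted index."""
    return {"mappings": {"_source": {"enabled": False}, "properties": TEXT_FIELDS}}


def highlight_index_body():
    """Index body for the highlight node: _source is kept so that
    highlighted snippets can be produced from the original text."""
    return {"mappings": {"_source": {"enabled": True}, "properties": TEXT_FIELDS}}


def search_request(query_text, size=10):
    """Request for the search node: match on text, return hits without _source."""
    return {
        "size": size,
        "_source": False,
        "query": {"multi_match": {"query": query_text, "fields": ["title", "body"]}},
    }


def highlight_request(query_text, doc_ids):
    """Request for the highlight node: restrict to the ids found by the
    search node and ask only for highlighted fragments of `body`."""
    return {
        "size": len(doc_ids),
        "_source": False,
        "query": {
            "bool": {
                "must": {"match": {"body": query_text}},
                "filter": {"ids": {"values": list(doc_ids)}},
            }
        },
        "highlight": {"fields": {"body": {}}},
    }


if __name__ == "__main__":
    print(json.dumps(search_index_body(), indent=2))
    print(json.dumps(highlight_request("habeas corpus", ["a1", "b2"]), indent=2))
```

The presumed trade-off is that the search index stays smaller without stored source, while snippet generation is pushed to a second box that only sees the handful of document ids returned by the first.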


ramoz 1582 days ago

How large is the index?

How do you manage that between RAM and SSD?


richardbarosky 1582 days ago

Search - Index is ~373GB. AMD Epyc 7371 - 16c/32t - 3.1 GHz/3.8 GHz. 512 GB ECC 2400 MHz. 2×1.92 TB SSD NVMe

Highlight - Index is ~620GB. Xeon-D 2141I - 8c/16t - 2.2 GHz/3 GHz. 64 GB ECC 2133 MHz. 2×1.92 TB SSD NVMe

Search and highlighting are handled asynchronously from a queue.
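As a rough illustration of "handled async from a queue", here is a small sketch using Python's `asyncio.Queue`. The thread does not say which queue or language is actually used (the site runs a LAMP stack), so everything here is an assumption: `run_search` and `run_highlight` are hypothetical stand-ins for calls to the two ES boxes, and the returned ids and snippets are made up.

```python
import asyncio


async def run_search(query):
    # Hypothetical stand-in for a request to the search node; returns fake doc ids.
    await asyncio.sleep(0)
    return [f"{query}-doc-{i}" for i in range(3)]


async def run_highlight(query, doc_ids):
    # Hypothetical stand-in for a request to the highlight node.
    await asyncio.sleep(0)
    return {doc_id: f"...<em>{query}</em>..." for doc_id in doc_ids}


async def worker(queue, results):
    # Pull queries off the queue, search first, then highlight only the hits.
    while True:
        query = await queue.get()
        try:
            doc_ids = await run_search(query)
            results[query] = await run_highlight(query, doc_ids)
        finally:
            queue.task_done()


async def handle(queries, n_workers=2):
    queue = asyncio.Queue()
    results = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    for q in queries:
        queue.put_nowait(q)
    await queue.join()  # wait until every queued query has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(handle(["miranda", "habeas"])))
```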


ramoz 1582 days ago

Awesome insight, and site. Thanks
