by codechad 1582 days ago
This is amazing. Can you share any info on how you were able to compile so much info from different sources? In my limited experience of hunting for legal filings, it seemed like every court had its own system, with nothing standardized or programmatic.

Thanks!

The search uses Elasticsearch 7 for full-text search. It's been extremely fast and has worked very well. You're right, court data is scattered across many different systems and needs to be aggregated, which is a difficult process.
Are you using freelaw's code to scrape all the different servers? Why are there no contact details on the site? I don't understand the mystery and black ops nature of this thing. It feels like there is some sort of conspiracy here that I've yet to uncover!
There are, I think, about 5 million opinions from that project, yes. I wouldn't say it's black ops; feel free to contact me on reddit.
How much RAM does that use? What’s the latency? Is it sharded? Is it a cluster? So many questions.
There are 2 search boxes running: one stores the search index without source, and the other stores the source and is only used for highlighting. Searches usually take under 200 ms, and the SRP and individual pages usually take less than 20 ms. The 2 ES nodes are not formally part of a single cluster because of the difference in index storage. Another box runs a traditional LAMP setup. Feel free to send a message on reddit if you're interested in more detail.
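For anyone wanting to picture the split, here is a minimal sketch of what a source-less search index plus a separate highlight index could look like as Elasticsearch 7 request bodies. This is not the site's actual config: the field name `text` and all function names are assumptions made up for illustration, and the bodies are built as plain dicts so nothing here talks to a real node.

```python
def search_index_body():
    """Index body for the search box: inverted index only, _source not stored."""
    return {
        "mappings": {
            "_source": {"enabled": False},
            # "text" is a hypothetical field name for the opinion body.
            "properties": {"text": {"type": "text"}},
        }
    }


def highlight_index_body():
    """Index body for the highlight box: _source is stored (the default)."""
    return {"mappings": {"properties": {"text": {"type": "text"}}}}


def search_request(query, size=10):
    """Query for the search box; only ids and scores are expected back."""
    return {"query": {"match": {"text": query}}, "size": size, "_source": False}


def highlight_request(query, doc_ids):
    """Query for the highlight box, restricted to ids the search box returned."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"text": query}},
                "filter": {"ids": {"values": list(doc_ids)}},
            }
        },
        "_source": False,
        "highlight": {"fields": {"text": {}}},
    }
```

The idea is that the first request finds the matching ids quickly, and the second one is only sent for the handful of documents shown on the results page.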
How large is the index?

How do you manage that between RAM and SSD?

Search - Index is ~373GB. AMD Epyc 7371 - 16c/32t - 3.1 GHz/3.8 GHz. 512 GB ECC 2400 MHz. 2×1.92 TB SSD NVMe

Highlight - Index is ~620GB. Xeon-D 2141I - 8c/16t - 2.2 GHz/3 GHz. 64 GB ECC 2133 MHz. 2×1.92 TB SSD NVMe

Search and highlighting are handled asynchronously from a queue.
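A rough sketch of what queue-driven search plus highlighting might look like, assuming a simple asyncio worker pool. The actual queue technology is not stated in this thread, so `run_search`, `run_highlight`, and `process` are hypothetical stand-ins rather than the site's code, and the two backend calls are stubbed out.

```python
import asyncio


async def run_search(query):
    # Hypothetical stand-in for a request to the search node.
    await asyncio.sleep(0)
    return [f"doc-for-{query}"]


async def run_highlight(query, doc_ids):
    # Hypothetical stand-in for a request to the highlight node.
    await asyncio.sleep(0)
    return {doc_id: f"<em>{query}</em>" for doc_id in doc_ids}


async def worker(queue, results):
    # Pull queries off the queue forever; cancelled once the queue drains.
    while True:
        query = await queue.get()
        try:
            doc_ids = await run_search(query)
            results[query] = await run_highlight(query, doc_ids)
        finally:
            queue.task_done()


async def process(queries, n_workers=2):
    queue = asyncio.Queue()
    results = {}
    for query in queries:
        queue.put_nowait(query)
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    await queue.join()  # wait until every queued query has been handled
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(process(["tort", "lien"]))` returns one highlight mapping per query; in a real setup the stubs would be replaced by HTTP calls to the two boxes.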

Awesome insight, and site. Thanks