by aitchnyu 558 days ago
How big are they? I thought it'd be on the order of 100 GB, as with (dead-tree) libraries.

Hmm. A couple of years ago, one large firm I knew of had about 400 TB in its AWS instance, I think. It was constantly growing with new cases.

They had instances around the world; this was just one.

It adds up quickly. I know of a law firm (under 100 employees) with over 20 TB.
That's still peanuts: you can get consumer-grade HDDs with that capacity in a single drive. A business-grade line would have no trouble uploading all of that data in less than a week, even with a bunch of extenuating circumstances.
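For a rough sense of the arithmetic behind "less than a week," here is a minimal Python sketch. It assumes decimal terabytes (1 TB = 10^12 bytes), a link that runs flat out the whole time, and no protocol overhead; the function names are hypothetical. On those assumptions, 20 TB in seven days needs roughly 265 Mbit/s sustained, and a hypothetical 1 Gbit/s line would finish in a little under two days.

```python
# Back-of-envelope: what sustained upload rate moves a data set within a deadline?
# Assumptions (hypothetical): decimal units (1 TB = 10**12 bytes), the link
# runs at full speed the whole time, and protocol overhead is ignored.

SECONDS_PER_DAY = 86_400


def required_mbps(terabytes: float, days: float) -> float:
    """Minimum sustained rate in megabits per second to move `terabytes` in `days`."""
    bits = terabytes * 10**12 * 8
    return bits / (days * SECONDS_PER_DAY) / 10**6


def upload_days(terabytes: float, mbps: float) -> float:
    """Days needed to move `terabytes` at a sustained `mbps` megabits per second."""
    bits = terabytes * 10**12 * 8
    return bits / (mbps * 10**6) / SECONDS_PER_DAY


if __name__ == "__main__":
    # The 20 TB law-firm example, uploaded within one week.
    print(f"20 TB in 7 days needs about {required_mbps(20, 7):.0f} Mbit/s")
    # The same data over a hypothetical 1 Gbit/s business line.
    print(f"20 TB at 1000 Mbit/s takes about {upload_days(20, 1000):.1f} days")
```

Real-world throughput is usually below the nominal line rate, so treat these figures as lower bounds on the time required.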
Some smaller businesses may have a huge data store but not the money to pay for a business-grade internet connection that could upload it in a reasonable amount of time. I've worked for clients who paid over $1,000 a month for a 10 Mbit/s full-duplex fiber connection (probably because of low ISP competition and because they were in a newly built, low-density area). If they were migrating to the cloud, they would certainly consider taking a few hard drives to AWS once rather than maxing out their 10 Mbit/s connection for weeks or months.
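The trade-off between a slow uplink and shipping drives can be put in numbers. This minimal Python sketch reuses the 20 TB figure from upthread as a hypothetical data set, assumes the 10 Mbit/s uplink is fully saturated with no protocol overhead, and picks a hypothetical three-day shipping window; the helper names are made up for illustration. On those assumptions, the link needs roughly 185 days, while drives that spend three days in transit behave like a link of roughly 617 Mbit/s.

```python
# Compare pushing data over a slow link with physically shipping drives.
# Assumptions (hypothetical): decimal units (1 TB = 10**12 bytes), 20 TB of
# data, a fully saturated 10 Mbit/s uplink, and a 3-day shipping window.

SECONDS_PER_DAY = 86_400


def link_days(terabytes: float, mbps: float) -> float:
    """Days to send `terabytes` over a link sustaining `mbps` megabits per second."""
    return terabytes * 8e12 / (mbps * 1e6) / SECONDS_PER_DAY


def shipping_mbps(terabytes: float, transit_days: float) -> float:
    """Effective throughput (Mbit/s) of shipping drives that arrive in `transit_days`."""
    return terabytes * 8e12 / (transit_days * SECONDS_PER_DAY) / 1e6


if __name__ == "__main__":
    data_tb = 20
    print(f"Over 10 Mbit/s: about {link_days(data_tb, 10):.0f} days")
    print(f"Shipped in 3 days: about {shipping_mbps(data_tb, 3):.0f} Mbit/s effective")
```

This ignores the time to copy data onto the drives and ingest it at the other end, which narrows the gap somewhat.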
> no trouble uploading all of that data in less than a week

When you're doing e-discovery, deadlines are often measured in days, and that covers not just the upload time but also the analysis and finding the needle in the haystack.

Also, you've gotta think about what else is using the corporate internet pipe; you can't drown it in one AWS upload for days.
I'd imagine that with today's LLMs, discovery work is done in the cloud by bots.
It'd be a great way to get sued for negligence. You can't even assume the counterparty has correctly put everything into discovery for you. What you don't know is what gets you into trouble.

An example from the Karen Read case: the police somehow uploaded a video that had been put through a "mirror filter" and thus showed a vehicle in the opposite orientation from reality. Is your LLM going to notice that?

Do you know of a single attorney who has been held liable for negligence for using an LLM to help accelerate their document research work?
A few years ago there was definitely document-processing automation and query-based filtering, but still a lot of human work.

I assume you’re right that AI now does some of the work, but I doubt it does all of it. Also, how reliable would the AI be? You’d hate to be missing critical evidence at trial because you trusted the AI fully and it missed something.

Discovery data includes audio, video, social site data, as well as the usual documents and emails.

Yeah, that’s the quickest way to go bankrupt. Imagine trusting current LLMs to do that, with all the prompting it would involve. No one is going to trust that.