Hacker News
by breakwaterlabs 1009 days ago
In what world is 2160TB $100k?

Current single disk solutions are around $25/TB for HDDs and ~$100/TB for NVMe.

At a minimum you're looking at $54k just for raw capacity, assuming no backup, no chassis, no networking, and no redundancy.

More reasonable estimations would be in excess of $400/TB.
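For anyone who wants to check the arithmetic, here is a rough Python sketch of the raw-capacity numbers. It only multiplies the capacity by the per-TB prices quoted above; the labels and the `raw_cost` helper are made up for illustration, and the $400/TB line is the lower bound given above, not a quote from any vendor.

```python
# Back-of-the-envelope raw capacity cost, using the per-TB figures from the post.
CAPACITY_TB = 2160

# Dollars per TB. Labels are illustrative, not product names.
PRICE_PER_TB = {
    "HDD, single disk": 25,
    "NVMe, single disk": 100,
    "$400/TB floor": 400,
}


def raw_cost(capacity_tb, price_per_tb):
    """Dollars for capacity alone: no backup, chassis, networking, or redundancy."""
    return capacity_tb * price_per_tb


costs = {label: raw_cost(CAPACITY_TB, price) for label, price in PRICE_PER_TB.items()}

for label, cost in costs.items():
    print(f"{label}: ${cost:,}")
```

The HDD line is where the $54k figure comes from; the other two lines show how quickly the total moves once the per-TB price changes.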

3 comments

Sure, whatever, a factor of 10 here or there hardly matters. I literally misinterpreted “multiple gigabytes per hour” as 999 GB/hr, not a much more reasonable 10 GB/hr. I overestimated data rates by a factor of 100 (10,000%) and the number still comes out “reasonable”, i.e. a cost that can be paid if the cost/benefit is there.

Unless you want to claim storage costs $5,000/TB for 3 MB/s of I/O, “multiple gigabytes per hour” with 90-day retention for a team's worth of logging is not stupid on its face. That is not to say it is an efficient or smart solution, but it is certainly not the “look at this insane request by developers” that the person I was originally responding to was making it out to be.

Personally, I would probably question the competence of the team if they had that sort of logging rate with manual logging statements, but I am merely pointing out that “multiple gigabytes per hour” for 90 days is not crazy on its face and a plausible business case could be made for it even with a relatively modest engineering team.
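The numbers above can be reproduced with a short Python sketch. It assumes logs are written at a constant rate around the clock, uses decimal units (1 TB = 1000 GB), and takes the $100k figure from the top of the thread as the budget; the function names are invented for illustration.

```python
HOURS_PER_DAY = 24
RETENTION_DAYS = 90


def retained_tb(gb_per_hour, days=RETENTION_DAYS):
    """Data held at steady state, in decimal TB, for a constant logging rate."""
    return gb_per_hour * HOURS_PER_DAY * days / 1000


def mb_per_second(gb_per_hour):
    """Sustained write rate in decimal MB/s."""
    return gb_per_hour * 1000 / 3600


def breakeven_price_per_tb(budget_dollars, gb_per_hour):
    """Per-TB price at which the retained volume would use up the whole budget."""
    return budget_dollars / retained_tb(gb_per_hour)


print(f"{retained_tb(999):.0f} TB")   # the misread 999 GB/hr case, about 2158 TB
print(f"{retained_tb(10):.1f} TB")    # the 10 GB/hr case, 21.6 TB
print(f"{mb_per_second(10):.1f} MB/s")
print(f"${breakeven_price_per_tb(100_000, 10):,.0f}/TB")
```

At 10 GB/hr the retained volume is about 22 TB written at under 3 MB/s, and $100k spread over that volume works out to a bit under $5,000/TB, which appears to be where that figure comes from.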

My recent discussions with multiple SAN vendors, as well as quoting out the cost of DIY storage, put that number far away from "reasonable". I do not claim storage is $5,000/TB, but it is substantially higher than the $50/TB you're estimating.

It's difficult to estimate the log throughput in this scenario. A Cisco device on "debug all" can overload its own CPU; systems like sssd can generate megabytes of logs for a single login.

All of this is really missing the core issue though. A 2PB system is nontrivial to procure, nontrivial to run, and if you want it to be of any use at all you're going to end up purchasing or implementing some kind of log aggregation system like Splunk. That incurs lifecycle costs like training and implementation, and then you get asked about retention and GDPR.... and in the process, lose sight of whether this thing you've made actually provides any business value.

IT is not an end in itself, and if these logs are unlikely to be used, the question is less about dollars-per-developer-hour and more about preventing IT scope creep and the accumulation of cruft that can mature into technical debt.

But you wouldn't use a SAN here. SAN pricing is far away from reasonable for this situation.

For the 20TB case, you can fit that on 1 to 4 drives. It's super cheap. Plus probably a backup hard drive but maybe you don't even need to back it up.

For the 2PB case, you probably want multiple search servers that have all the storage built in. There's definitely cost increases here, but I wouldn't focus too much on it, because that was more of a throwaway. Focus more on the 20TB version.

> That incurs lifecycle costs like training and implementation

Those don't relate much to the amount of storage.

> and then you get asked about retention and GDPR....

It's 90 days. Maybe you throw in a filter. It's not too difficult.

> if these logs are unlikely to be used

The devs are complaining about the search features, so it sounds like the logs are being used.

> preventing IT scope creep and the accumulation of cruft that can mature into technical debt

Sure, that's reasonable. But that has nothing to do with the amount of storage.

> Current single disk solutions are around $25/TB for HDDs

More like $15/TB. $100K for 2 PB of storage with redundancy and backups is quite reasonable.

I'm showing Exos x20 20TBs for ~$500 new.

$300 is moving towards refurb / shucked prices.

> I'm showing Exos x20 20TBs for ~$500 new.

Where? For new prices I'm seeing $350 at Amazon, $350 at B&H, $280 direct from Newegg, $280 at serverpartsdeals.
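To put those quotes on a per-TB basis, here is a quick Python sketch. It uses only the 20 TB drive prices mentioned in this thread and counts bare drives for 2160 TB of raw capacity; it assumes no redundancy, spares, chassis, or backup, so the totals are a floor rather than a build estimate. The labels and helper names are made up for illustration.

```python
import math

DRIVE_TB = 20
TARGET_TB = 2160  # raw capacity under discussion

# Dollar prices quoted in the thread for a new 20 TB drive.
quotes = {
    "higher quote": 500,
    "Amazon / B&H": 350,
    "Newegg / serverpartsdeals": 280,
}


def price_per_tb(drive_price, drive_tb=DRIVE_TB):
    """Dollars per TB for a single drive."""
    return drive_price / drive_tb


def drives_needed(target_tb, drive_tb=DRIVE_TB):
    """Whole drives required to reach the target raw capacity."""
    return math.ceil(target_tb / drive_tb)


def raw_drive_cost(target_tb, drive_price, drive_tb=DRIVE_TB):
    """Dollars for bare drives only."""
    return drives_needed(target_tb, drive_tb) * drive_price


for label, price in quotes.items():
    total = raw_drive_cost(TARGET_TB, price)
    print(f"{label}: ${price_per_tb(price):.2f}/TB, ${total:,} for {drives_needed(TARGET_TB)} drives")
```

The $500 quote lands on $25/TB and the $280 quote on $14/TB, so the disagreement over drive prices alone moves the bare-drive total between roughly $30k and $54k.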

> In what world is 2160TB $100k?

When you buy a SAN to present a bunch of disks as one thing to the rest of the machines.