Hacker News
by topkai22 2283 days ago
Proxying through a torrent system is a good idea, but I doubt it would be well seeded outside a few popular datasets; the agency would end up as the sole seeder for the long tail.

I'm willing to bet NASA saves a ton of money by moving to a cloud provider; US government storage setups are insanely expensive. A project I was on got a quote of over $10,000/TB in 2014. And there is no way egress is actually free right now: they are paying for a regulation-compliant government internet connection one way or another.

I do worry about vendor lock-in to a degree, but I'm confident the agency and taxpayers would save money by moving to any major cloud provider.


Sounds like there is a bigger story there and it's probably a managed SAN.

I've operated pretty significant shared government infrastructure like this in the past... we were offering fast, flash-cached disk in 2010 for about $5,000/TB. $10k/TB is not unreasonable for highly available Tier-1 storage for something like SAP, especially in an era when all-flash wasn't an option in most cases.

Today, cost structures can be very different. You can land high-IOPS storage for a fraction of the cost without the overhead of a big SAN. If you need capacity-focused storage, that is also much cheaper.

An agency like NASA gets hosed on services, and cloud is no different. AWS is probably a net savings for operational workloads whose characteristics are known. Backup is a no-brainer. But for a high-volume, highly variable workload like a public data archive, AWS is a square peg in a round hole because access is metered.

I'm sure that $10k/terabyte quote was complete overkill for what we needed, but that's what the stovepiped storage org was offering, and it killed the project we were working on.
I hope you can correct my numbers, but I'm pretty sure this is within the same order of magnitude:

If 1-2 TB drives were easily $1k each in 2010 (in 2005, $1k got you a 128 GB 15K RPM drive)

and your array is at least RAID 10,

then the raw storage alone is already approaching $5,000.

And this ignores controllers, cabling and chassis.

And this is before we look at our storage software licenses.

Are backup, a point-in-time recovery SLA, replication, and availability included in this budget?
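For anyone who wants to redo the drive arithmetic, here is a minimal sketch of the drive-only part, assuming RAID 10 mirroring halves usable capacity. The function name and the `hot_spare_ratio` and `fill_ratio` knobs are hypothetical additions for illustration, not figures from this thread, and controllers, cabling, chassis, licenses, and labor are deliberately left out.

```python
def raid10_drive_cost_per_usable_tb(
    drive_price_usd: float,
    drive_capacity_tb: float,
    hot_spare_ratio: float = 0.0,
    fill_ratio: float = 1.0,
) -> float:
    """Drive-only cost per usable TB for a RAID 10 array.

    RAID 10 stores every block on two drives, so usable capacity is half
    the raw capacity. hot_spare_ratio (idle spare drives per active drive)
    and fill_ratio (fraction of usable space actually filled) are
    hypothetical knobs. Controllers, cabling, chassis, software licenses
    and labor are NOT included.
    """
    if drive_price_usd <= 0 or drive_capacity_tb <= 0:
        raise ValueError("drive price and capacity must be positive")
    if hot_spare_ratio < 0 or not 0 < fill_ratio <= 1:
        raise ValueError("need hot_spare_ratio >= 0 and 0 < fill_ratio <= 1")
    cost_per_raw_tb = drive_price_usd / drive_capacity_tb
    mirrored = cost_per_raw_tb * 2  # mirroring: 2 raw TB per usable TB
    with_spares = mirrored * (1 + hot_spare_ratio)  # spares are bought but hold no data
    return with_spares / fill_ratio  # unfilled headroom still has to be bought


if __name__ == "__main__":
    # A $1k drive at the 1 TB and 2 TB sizes mentioned in the comment.
    for capacity_tb in (1.0, 2.0):
        cost = raid10_drive_cost_per_usable_tb(1000, capacity_tb)
        print(f"{capacity_tb:.0f} TB drive: ${cost:,.0f} per usable TB (drives only)")
```

Mirroring alone puts a $1k drive at $1,000 to $2,000 per usable TB. Whatever spare and headroom policy you assume is what pushes that toward the comment's figure, before any of the controller, chassis, and licensing costs listed above.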

I'm not really sure what they pitched us technically, but your breakdown sounds reasonable. It was also complete overkill: we were hosting read-only static images (map tiles). Azure and AWS were less than $300/TB/year at the time, and their triple replication gave us more availability than we needed.
Maybe I'm missing vital context here: why didn't you go with an alternative?
Because the storage group refused to sign off on a cheaper, lower-spec solution (I don't know why), and government acquisition is a mess, so going outside would have tied up one of our primary constraints (the tech lead) more than it was worth.
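A rough sketch of that comparison, assuming the $10,000/TB quote was a one-time charge and the cloud price was a flat $300/TB/year. The thread doesn't say whether the internal quote was one-time or recurring, the 50 TB dataset size is purely hypothetical, and egress, request fees, and growth are ignored.

```python
from typing import Tuple


def storage_costs(
    dataset_tb: float,
    years: float,
    onprem_per_tb_once: float = 10_000.0,
    cloud_per_tb_year: float = 300.0,
) -> Tuple[float, float]:
    """Return (on-prem, cloud) storage cost for a dataset over `years`.

    Assumes the on-prem quote is a one-time charge per TB and the cloud
    price is a flat annual per-TB rate; egress, requests, growth and
    maintenance are ignored.
    """
    onprem = dataset_tb * onprem_per_tb_once
    cloud = dataset_tb * cloud_per_tb_year * years
    return onprem, cloud


def break_even_years(
    onprem_per_tb_once: float = 10_000.0, cloud_per_tb_year: float = 300.0
) -> float:
    """Years of cloud storage the one-time on-prem price per TB would pay for."""
    return onprem_per_tb_once / cloud_per_tb_year


if __name__ == "__main__":
    onprem, cloud = storage_costs(dataset_tb=50, years=5)  # hypothetical 50 TB of tiles
    print(f"50 TB for 5 years: on-prem ${onprem:,.0f} vs cloud ${cloud:,.0f}")
    print(f"Break-even: {break_even_years():.1f} years")
```

Even under the generous one-time-charge assumption, the quote buys over 30 years of cloud storage per TB. For a read-heavy public workload, though, metered egress would need to be added to the cloud side before trusting the comparison.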

The overall system ended up with worse capabilities than it should have had, but it did ship.

Wow! That's good to know, if a bit disheartening. I guess I was thinking of small-startup costs: some cheap-ish Linux RAID setups plus the massive fiber taps NASA surely already has. Not government/big-business costs.
What causes a cost of $10,000/TB? Even with multiple redundant failsafes, I just can't see how the cost could run that high.
In 2014?

You'd be buying something like an EMC VMAX that can sustain 1M+ IOPS on lots of 15K spinning drives, with caching tiers on crazy-expensive flash.

To support that, you need a Fibre Channel network layer and a bunch of FTEs to attend to it. Compliance rules usually require segregation of roles, which increases cost. If you're a federal government entity, those FTEs are most likely contractors billed out at $125-300/hr. Figure $3-5M/year on labor costs alone, although that may be spread across multiple systems.
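To see what headcount the $3-5M/year labor figure implies, here is a small sketch. It assumes 2,080 billable hours per contractor per year (40 hours × 52 weeks, no overtime or leave adjustments); that assumption and the helper names are illustrative, not from the comment.

```python
HOURS_PER_FTE_YEAR = 2080  # assumption: 40 h/week x 52 weeks, no overtime or leave


def annual_labor_cost(rate_per_hour: float, ftes: float, hours: int = HOURS_PER_FTE_YEAR) -> float:
    """Yearly cost of `ftes` full-time contractors billed at `rate_per_hour`."""
    return rate_per_hour * hours * ftes


def implied_ftes(annual_budget: float, rate_per_hour: float, hours: int = HOURS_PER_FTE_YEAR) -> float:
    """How many full-time contractors a yearly budget pays for at a given rate."""
    return annual_budget / (rate_per_hour * hours)


if __name__ == "__main__":
    for rate in (125, 300):
        print(f"${rate}/hr: ${annual_labor_cost(rate, 1):,.0f} per FTE-year")
    fewest = implied_ftes(3_000_000, 300)  # low budget, high rate
    most = implied_ftes(5_000_000, 125)    # high budget, low rate
    print(f"$3-5M/year covers roughly {fewest:.1f} to {most:.1f} FTEs")
```

Under that assumption, one contractor costs $260k to $624k a year, so $3-5M/year corresponds to roughly 5 to 19 full-time people.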

This happens in commercial business too. I had a buddy who was making about $150k in NYC to zone LUNs on a SAN. Basically, he kept a spreadsheet, updated a specific configuration setting 2-3 times a day, and spent about 60-90 minutes a day doing that. The rest of his time went to waiting around or studying for his MBA.

It's pretty wacky to compare S3 to this type of storage.

At a technical level, yes, it's wacky. At a "this is what government departments actually do" level, it's perfectly reasonable. I'm sure NASA's current system is actually pretty efficient as the US government goes, but having spent a career running into the sort of institutional pathologies that lead to an interdepartmental quote of $10k/terabyte, I'm willing to bet AWS is very competitive.
A million IOPS from spinning rust?

200 IOPS per drive from a 2.5" 15K RPM drive is good going...

Edit: fixed autocorrected spellings of "iops"
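The back-of-envelope behind that skepticism fits in a few lines. This sketch assumes 200 IOPS per drive (the figure in the comment), a RAID 10 layout where each logical write costs two back-end I/Os, and no cache hits at all. It is a disk-only estimate; a flash-cached array like the one described earlier would serve some of the load from cache and need fewer spindles.

```python
import math

RAID10_WRITE_PENALTY = 2  # each logical write lands on both mirror copies


def drives_for_iops(
    target_iops: int,
    iops_per_drive: int = 200,
    write_fraction: float = 0.0,
) -> int:
    """Spinning drives needed to serve `target_iops` from disk alone.

    Assumes RAID 10 (two back-end I/Os per write) and no cache hits.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    reads = target_iops * (1 - write_fraction)
    writes = target_iops * write_fraction
    backend_iops = reads + writes * RAID10_WRITE_PENALTY
    return math.ceil(backend_iops / iops_per_drive)


if __name__ == "__main__":
    print(drives_for_iops(1_000_000))                      # 5000 (all reads)
    print(drives_for_iops(1_000_000, write_fraction=0.5))  # 7500 (50/50 mix)
```

So 1M IOPS purely from 15K drives means thousands of spindles, which is why the caching tier carries so much of the load in arrays of that class.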