by ChuckMcM 3553 days ago
Pretty interesting. I love how bespoke data centers are converging along very similar lines. I'm sure if you were inside a Microsoft, Google, Amazon, or Facebook data center you would recognize the new design touch points. The old data center is dead, long live the new data center.

This was the second big thing Google had learned early on: "The equipment is reduced to its basics so it runs cooler. It can also be easily accessed and repaired quickly." -- slide 12/17

The whole sheet metal box around a server was a real waste of time if your employees are the only ones accessing the area, and the only reason they want to touch the machine is to repair it. This was in contrast to NetApp (where I had worked before), which was busily designing impressive cabinets that would "stand tall" on the raised flooring of the data center.


I think the lesson came in earlier, with the NUMA and MPP machines, where they kept trying to cram more stuff onto boards that were themselves pluggable into the larger system. This convergence has happened from several directions. It's not all that different from the earlier one that started in the 1960s, where they fought cost and inefficiency by putting as few components in each box as possible and sharing as much as possible. Moore's Law temporarily reversed it (transistors and memory are free!), then the reality check hit: this seems to be a fundamental principle.

My design a while back was to put it all on PCI cards on a PCI backplane. I saw backplanes that basically look like motherboards full of PCI slots that load into racks. I wanted to make the cards nothing but CPU and memory, with software that communicated over efficient networking (not TCP/IP) through PCI DMA. My design had IO/MMU functionality in the backplane or PCI cards. At least one card would have a full-featured stack for management, and at least one would be an I/O card for the external interface. I figured the backplane itself could be extended for that, too, with a dedicated port, the way motherboards integrate GigE. Management and I/O could come through remote DMA over dedicated wires, like many servers do with Ethernet, so all the PCI slots could be dedicated to compute.

Dumbest thing about Facebook's model is them destroying drives. The first thing to notice, per Ross Anderson's Security Engineering, is that those pieces still contain a lot of data if they weren't degaussed first. Next is to remember the fastest way to destroy data: use clustered, encrypting filesystems so that secrets never touch the drive in the clear. Then you just have to delete the keys to lose the secrets. No need to trash the drives at all. The crypto can happen at the storage manager or at the hardware interface, with HW acceleration available for both types. I'm surprised they haven't already built this with all the smart people they have working on big-data stacks.
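To make the "delete the keys to lose the secrets" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `KeyStore` and `EncryptedDrive` are hypothetical names, and the SHAKE-256 keystream is a toy stand-in for a real disk cipher such as AES-XTS. It is not how Facebook or any particular storage product does it.

```python
import hashlib
import secrets


def _keystream(key, nonce, length):
    # Toy keystream (SHAKE-256 over key || nonce). Illustration only:
    # a real system would use a vetted cipher such as AES-XTS or AES-GCM.
    return hashlib.shake_256(key + nonce).digest(length)


class KeyStore:
    """Hypothetical key manager. Keys live here, never on the drive."""

    def __init__(self):
        self._keys = {}

    def create(self, volume_id):
        self._keys[volume_id] = secrets.token_bytes(32)

    def get(self, volume_id):
        # Raises KeyError once the key has been destroyed.
        return self._keys[volume_id]

    def destroy(self, volume_id):
        # Crypto-erase: dropping the key is the whole "wipe" operation.
        del self._keys[volume_id]


class EncryptedDrive:
    """Hypothetical drive that only ever stores (nonce, ciphertext) pairs."""

    def __init__(self, keystore):
        self.keystore = keystore
        self.blocks = {}

    def write(self, volume_id, block_no, plaintext):
        key = self.keystore.get(volume_id)
        nonce = secrets.token_bytes(16)  # fresh nonce for every write
        stream = _keystream(key, nonce, len(plaintext))
        ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
        self.blocks[(volume_id, block_no)] = (nonce, ciphertext)

    def read(self, volume_id, block_no):
        key = self.keystore.get(volume_id)
        nonce, ciphertext = self.blocks[(volume_id, block_no)]
        stream = _keystream(key, nonce, len(ciphertext))
        return bytes(c ^ s for c, s in zip(ciphertext, stream))


if __name__ == "__main__":
    ks = KeyStore()
    drive = EncryptedDrive(ks)
    ks.create("vol0")
    drive.write("vol0", 0, b"user secrets")
    print(drive.read("vol0", 0))
    ks.destroy("vol0")
    # The ciphertext is still on the "platter", but read() now raises KeyError.
```

The sketch also shows where the scheme can fail: it is only as good as the guarantee that `destroy` really removes every copy of the key, including any backups or replicas the key manager keeps.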

To your last paragraph, only relying on forgetting the keys works great, as long as you have absolute 100% confidence in the mechanism used to do that. I read your posts on HN often, so I know you're quite familiar with defense in depth--I feel that user data is one of those areas where it's ok to do more than one thing to protect the data.

That said, there are a number of systems at FB where deleting a crypto key loses the linked data forever--but they still crunch the hard drives just to be really sure. The drive crunching is an incredibly tiny expenditure compared to the massive CapEx and OpEx required to build, stock, and run the datacenters. It's worth it if only for the peace of mind.

Well, thanks for chiming in with the insider view.

"as long as you have absolute 100% confidence in the mechanism used to do that"

It's true. These mechanisms fail way less than shredders, though. Ideally, the drive encryption would pull KEYMAT from a dedicated system for that somehow on boot (kernel, network, whatever). That system should be medium to high assurance. The easy way is rad-hard ASICs (or antifuse FPGAs) with ECC RAM and ChipKill that implement a safe-coded protocol engine that moves keys around in memory. These are in a high-availability configuration with electrical and optical isolation. A separate box manages things, does backups on encrypted data, etc. A good HSM combo at Level 3 or 4 is already mostly there, though. Remember, even Ross Anderson's people couldn't break IBM's outside of some stupid, unevaluated software for banking. My ideal just assures the protocol itself a bit more.

"I feel that user data is one of those areas where it's ok to do more than one thing to protect the data."

It's fine, except to environmentalists, to do it on top of crypto for extra assurance. By itself, crushing is insufficient: data might still be recovered from the pieces, given just how much they cram into tiny spaces. That's why DOD/NSA standards were to suck the magnetism out of the platter with qualified degaussers and then destroy it. Crypto followed by destruction can't be directly compared, but it should also make recovery hard.

"there are a number of systems at FB where deleting a crypto key loses the linked data forever"

Great that they do. Thanks for telling me.

"The drive crunching is an incredibly tiny expenditure compared to the massive CapEx and OpEx required to build, stock, and run the datacenters."

I believe that. What groups like Facebook pull off in datacenter hardware, software, and administration continues to amaze me.

The number of physical connections each machine has (power and numerous data cables), as seen in the 5th slide, makes this seem infeasible for now. But considering the article mentions there's "only one technician for every 25,000 servers" and Facebook's FBAR software (hey, that's up from 20k), it seems like the next step in "the new data center" is having a robot unrack (and rack) entire machines and bring them to a servicing bay.