| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by lukegb 4089 days ago
	Inspired by https://twitter.com/garybernhardt/status/600783770925420546

2 comments

mosselman 4089 days ago

Can someone explain in a bit more detail what this is about? Is the 'joke' that running data computation in RAM is faster than what? From disk?

link

JonnieCache 4089 days ago

The subtext is that running a fancy distributed system is more exciting and beneficial for ones resume than simply buying a massive bloody server and putting postgres on it, and that people are making tech decisions on this basis.

link

Anderkent 4089 days ago

This of course ignores that it's much easier to get your hands on a cluster of average machines than one massive bloody server, and all the non-performance-oriented benefits of running a cluster (availability etc.).

Much easier to request a client provisions 20 of their standard machines, or get them from AWS. People don't like custom hardware, and for good reason.

link

RogerL 4089 days ago

Yes. Just yesterday in some other story everyone was arguing for the cloud because who wants to maintain their own hardware? This morning the hue and cry is "slap more RAN in that puppy".

I just speced out a 6TB Dell server. Price? It is already at $600K, and I haven't fully speced it out yet (just processor, memory, drive). Maybe that memory requirement is high (though it is about what I would need); 1TB is somewhat over $200K.

For the right situation that sort of thing maybe makes sense, though I'm SOL if I need high availability (power out, internet flakey, RAM chip goes bad, etc leaves me dead in the water).

So I would need to stand up a million or more in equipment in several places, or just use AWS and suffer the scorn of someone saying 'you could have put that in RAM'. Yes. Yes I could have.

link

vidarh 4089 days ago

> everyone was arguing for the cloud because who wants to maintain their own hardware?

Well, I keep arguing against that, because you still get 90%+ of the maintenance work, plus some new maintenance work you didn't have before, to avoid some some relatively minor hardware maintenance. And you can get most of the benefits of non-cloud deployment with managed hosting where you never have to touch the hardware yourself.

I work both on "cloud only" setups and on physical hardware sitting in racks I manage, and you know what? The operational effort for the cloud setup is far higher even considering it costs me 1.5 hours in just travel time (combined both ways) every time I need to visit the data centre.

For starters, while servers fail and require manual maintenance, those failures are rare compared to the litany of issues I have to protect against in cloud setups because they happen often enough to be a problem. (The majority of the servers I deal with have uptimes in the multi-year range; average server failure rate is low enough that maintenance cost per server is in the single digit percentage of server and hosting costs). Secondly I have to fight against all kind of design issues with the specific cloud providers that are often sub-optimal and require extra effort (e.g. I lose flexibility to pick the best hardware configurations).

Cloud services have their place, but far too many people just assumes they're going to be cheaper, and proceed to spend three times as much what it'd cost them to just buy or lease some hardware, or rent managed hosting services.

Even if you don't want to maintain your own hardware, AWS is almost never cost effective if you keep instances alive more than 6-8 hours of the day in general. Your mileage may wary, of course.

link

collyw 4088 days ago

"The cloud" is basically a new non standard OS to learn.

I am reasonably happy configuring an old school Linux box. Heroku is much more of a pain in the arse to deploy to in my experience, despite much of the work being done for you already. Debugging deployment issues is particularly painful.

link

jules 4089 days ago

Depends on what you mean by massive bloody server. You can get a server with a terabyte of RAM for a price that's insignificant compared to the cost developing software to run on a cluster.

link

aliakhtar 4089 days ago

> You can get a server with a terabyte of RAM for a price that's insignificant compared to the cost developing software to run on a cluster.

This assumes that a) You're in the valley where average developer salary is $10k a month or more, b) You're a large company paying developers that salary.

There are lots of other places where a) Developers are cheaper, or b) You're a cash strapped startup whose developers are the founder(s) working for free.

link

jules 4089 days ago

Comparison still holds, because if you buy a cluster with X amount of RAM the price will be roughly the same as a single server with X amount of RAM. Except that for some large X there won't be any off the shelf servers you can buy with that amount of RAM (let's say 2000GB), but lets be honest here, 99% of companies needs are under that X especially if we're talking about startups.

link

sjtgraham 4089 days ago

You're assuming there are people competent of building such systems who are ignorant of the fact they can earn that money anywhere in the world.

link

falcolas 4089 days ago

Amazon offers some bloody huge servers... 32 core, 256GB RAM, and 48TB HDD space. d2.8x large

link

brianwawok 4089 days ago

That is 4k a MONTH for 256gb of ram.

If you could do the same job on a fleet of 8-16GB servers.. you can get a lot more CPU for a lot less dollars. Depends if you really need everything on 1 machine or not (as of course nothing will beat same machine in memory locality)

link

jules 4089 days ago

Not true, 8x16GB costs as much as 1x256 on Amazon. The issue here is that Amazon is hilariously expensive in general. Hetzner will rent you a 256GB server for €460 per month. Or you can buy one from Dell for $5000. These are not high numbers, in 1990 you paid more than that for a "cheap" home computer. For the price of a floppy drive back then you can now get a 32GB server.

link

pquerna 4089 days ago

rackspace, onmetal-memory[1]: 512 GB, $1650/mo (3.22 $/gb/mo)

softlayer, dual Xeon 2000 Series: 512GB, $1,823.00/mo (3.56 $/gb/mo)

these are on-demand prices. pre-pay, or use a term discount, and its cheaper.

Build it yourself: You can build a Dell or similar on a 2-Xeon-proc (E5 series), your main limit is getting good prices on 16x 32GB DIMMS. But lets say you can buy the RAM for ~$6500, then its just dependent on the rest of your kit, lets say $10,000 flat for the whole server. $277.77/mo over 36 months, but you still need network infrastructure, and you might want a new one in 12 months, but you get the general idea.

[1] - http://www.rackspace.com/en-us/cloud/servers/onmetal

link

viggity 4089 days ago

and fwiw, costs at amazon will scale linearly with resources. the 1 beefy box with 256GB RAM box costs about as much as 16 boxes with 16GB of RAM each.

link

agarcia-deniz 4089 days ago

If you're running a windows system licensing costs will be smaller when scaling up than scaling out, so there is that to bear in mind.

link

tempodox 4089 days ago

It may be more exciting, but don't those people know about the CAP theorem?

link

isp 4089 days ago

link

jacquesm 4089 days ago

You should post that if it hasn't been posted already, it's a much better way to make the case than the current link.

link

isp 4089 days ago

Done: https://news.ycombinator.com/item?id=9582060

I originally saw it on HN, but almost two years ago. Old comments: https://news.ycombinator.com/item?id=6398650

link

jordanthoms 4089 days ago

There is no point deploying a heavy, complex (and usually pretty slow due to the overheads involved) distributed database, when you could just buy a server with xTB ram, load any sql database on it, and run your queries in a fraction of the time. If your data is so large that it can't fit in the RAM of a single machine, then distributed databases make more sense (since loading data off disk is very slow, modulo SSD).

link

jacquesm 4089 days ago

For some problems SQL would already be way too much overhead.

link

e12e 4089 days ago

Could you give a concrete example?

If your working set is small, say 1TB -- and so fits in RAM -- for what kind of problems would using SQL be so much of an overhead that you need a different approach? And what would that approach be?

I suppose you could have a massive set of linear equations that you might be able to fit into 1TB of RAM, but would be difficult to work with as tables in Postgres?

link

jacquesm 4089 days ago

Take a graph, you could use an SQL database to store it and do your graph analysis using SQL, or, alternatively, you could convert your graph to an extremely compact in-memory format and then do your analysis on that. Much better efficiency for the same size problem, bonus: you can now analyze much larger graphs with the same hardware.

link

e12e 4089 days ago

Or maybe a bit of both: http://stackoverflow.com/questions/27967093/how-to-aggregate...

I appreciate you taking the time to answer -- and I get that there's a reason for why we have graph databases. But I really meant something more concrete, as in here's a real-world example that isn't feasible to do on machine X with postgresql, but easy(ish) with a proper graph structure/db -- rather than "not all data structures are easy to map to database tables in a space-efficient manner".

link

learnstats2 4089 days ago

Data that fits in RAM doesn't need any "Big Data" solutions.

link

icebraining 4089 days ago

I believe it's more "no, you don't need an Hadoop cluster of 20 machines, your data fits in the RAM of one machine".

link

feld 4089 days ago

People will build gigantic compute clusters with expensive storage backends when their entire dataset fits in memory.

If it fits in memory, it's going to be magnitudes faster to work with than on any other infrastructure you can build.

So the trick is, you take their "big data problem" and hand them a server where everything can be hot in memory and their problem no longer exists.

link

JDDunn9 4089 days ago

Right, RAM an order of magnitude faster than disk, so calculations will be performed very quickly. Big data usually implies clusters of servers because the data won't fit on one server (even on the disk).

link

jacquesm 4089 days ago

Big Data usually implies 'big dollars', not necessarily a large amount of data. Simply use some in-efficient algorithm and a datastore with sufficient overhead and you're in Big Data territory.

Re-do the same thing using an optimal algorithm operating on a compact datastructure and you make it look easy, fast and cheap. Of course you're not going to make nearly as much money.

link

emodendroket 4089 days ago

Most people who think they have "big data" problems actually don't have "big" data at all.

link

jacquesm 4089 days ago

He's selling himself short.

link