| HN Mirror



	by skj 1686 days ago
	Google uses MapReduce extensively... where it's appropriate. TrueTime helps with things like Spanner transactions. It's just a totally different use case.


dekhn 1686 days ago

The tech lead of the Google MapReduce team (which no longer exists) just received their award for turning down mapreduce. IIRC it was officially done 5 years ago. However I believe the code to delete MR was never checked in and I'm not sure if there are still users.

MapReduce was used at Google for highly inappropriate things. For example, the machine learning system I worked on, Sibyl https://www.datanami.com/2014/07/17/inside-sibyl-googles-mas... was implemented using MapReduce, but there was no real technical justification for that; it's just that there was no other system that could scale to the volumes required or handle the constant failures endemic to Google's internal systems. It ended up requiring all sorts of heroic work to make MR scale, for example map-side combiners (which "reduced" items with common keys in the map output before it got flushed to the shuffle files). All of this got replaced with TensorFlow, and only the good bits of Sibyl were extracted to TFX.
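
To make the map-side combiner idea concrete, here is a minimal single-process sketch in Python. It is not Google's MapReduce API or Sibyl code; the names `map_with_combiner`, `reduce_phase`, and `word_count_map` are hypothetical, and the "shuffle" is just a list concatenation. It only illustrates merging values that share a key inside each map task before anything is written out.

```python
def map_with_combiner(records, map_fn, combine_fn):
    """Run map_fn over one shard, merging values with the same key
    in memory before the output would be flushed to shuffle files."""
    buffer = {}
    for record in records:
        for key, value in map_fn(record):
            if key in buffer:
                buffer[key] = combine_fn(buffer[key], value)
            else:
                buffer[key] = value
    # One pair per distinct key, instead of one pair per occurrence.
    return sorted(buffer.items())


def reduce_phase(shuffled, combine_fn):
    """Merge the already-combined map outputs from all shards."""
    merged = {}
    for key, value in shuffled:
        merged[key] = combine_fn(merged[key], value) if key in merged else value
    return merged


def word_count_map(line):
    # Hypothetical map function: emit (word, 1) for each word.
    for word in line.split():
        yield word, 1


def add(x, y):
    return x + y


shard_a = ["a b a", "a c"]   # 5 raw (word, 1) pairs
shard_b = ["b b a"]          # 3 raw (word, 1) pairs

out_a = map_with_combiner(shard_a, word_count_map, add)
out_b = map_with_combiner(shard_b, word_count_map, add)
totals = reduce_phase(out_a + out_b, add)
```

Here shard_a would emit five pairs without the combiner but only three with it. This only works when the combine function is associative and commutative (like addition), because the reducer sees partial results rather than the raw values.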

sokoloff 1686 days ago

It seems to me “there is no other technical system in the company capable of performing the task” is a valid technical justification.

jpgvm 1685 days ago

i.e., this is a crappy tool, but it's the best tool we have come up with for this problem thus far.

There are lots of such tools, used begrudgingly by people who have an intuition that it can be done better but lack the concrete idea and/or the time to implement it.

dekhn 1685 days ago

it wasn't a crappy tool (mapreduce was amazing) but it definitely was an impedance mismatch for this particular job. Later, we tried to get Sibyl to move to the underlying compute engine that Flume was built on top of, but it turned out to be more profitable to just let it die slowly.

mrep 1685 days ago

MapReduce was deprecated because Flume [0], its successor, is better, but it does practically the same thing, and Flume is used massively. I believe Dataflow is the public Google Cloud version.

[0]: https://research.google/pubs/pub35650/

oblio 1685 days ago

> handle the constant failures endemic to Google's internal systems

This sounds bad.

dekhn 1685 days ago

yeah it's crazy, I don't understand why Google's data center machines shit themselves so often. Probably cheap ram.

drewda 1686 days ago

Yes, which is why it's amusing in hindsight that for a decade everyone* outside Google was forcing all* their distributed data tasks into the MapReduce paradigm, without considering alternative approaches like the one used by Spanner.

* slight exaggerations, I know

mrep 1686 days ago

I'm not sure how you think a distributed data processing technology would "fake-out" other companies when building/choosing database technology. They are totally different problem sets.

MapReduce does not have a set-in-stone data source/sink and can use multiple things like Bigtable and Spanner, so they are complementary technologies.

rp1 1686 days ago

I think the parent commenter might be referring to systems like Hive or HBase, which are built on top of Hadoop and do have a lot of overlap with a large-scale database system.

drewda 1685 days ago

Exactly, thanks. "MapReduce paradigm" wasn't precise, and I can see why that makes everyone want to give a distributed systems 101 lecture in this comment thread :)

quin3 1685 days ago

HBase isn’t really related to MapReduce though; it's more akin to Bigtable.

quin3 1686 days ago

It’s not even related. No one was running OLTP workloads as MapReduce jobs at any point.

LaserToy 1686 days ago

Spanner didn’t exist in 2012.

jeffbee 1686 days ago

Maybe, but this 2010 presentation mentions it.

https://cloud.google.com/files/storage_architecture_and_chal...

dekhn 1686 days ago

at that time, only aristocrats could use spanner.

teraflop 1686 days ago

Yes it did. Google published a paper about it in 2012, and claimed that at that point it had been in development for 5 years and in production for more than 1.

LaserToy 1686 days ago

Mmm, I worked at google in 2012 and F1 was in active development.

I’ve never heard of Spanner internally. Maybe it was in development, but it was not in use.

Edit: went and read more. Looks like Spanner existed but didn’t have SQL, so it wasn’t what it is today. And looks like I don’t remember things any more.

ithkuil 1685 days ago

at least by 2013 it had some basic SQL (although a much more limited subset than the one usable today); if you needed more you would be using F1; IIRC I used Spanner (without F1) in production around 2013