Hacker News new | ask | show | jobs
by nerme 5834 days ago
MapReduce is already being applied to graphs.

Spectral graph theory, ie, the construction and manipulation of graphs that are described by matrices, has been around for quite some time. Two types of matrix representations are typically used. Adjacency matrices and Laplacian matrices.

Page Rank is an example of this. It is equivalent to finding the centrality of a graph, by computing the primary eigenvector of the adjacency matrix of the graph.

MapReduce is used for accomplishing the difficult task of computing the needed eigenvector of a multi-billion dimensional matrix. Instead of attempting an exact computation of the primary eigenvector, the power method is used. The power method is an iterative approach that quickly approaches an approximation of the eigenvector. It comes down to looping over a matrix multiplied by a vector until you get the precision you're looking for.

The cool thing about the power method is that you don't REALLY need to get your data in to some sort of linear algebraic representation of a matrix, you can just use more efficient data structures to mimic matrix multiplication.

Since matrix multiplication is done a row and a column at a time, you can split it up in a parallel manner. The map function handles the computation of each row of the resulting vector, and the reduce function puts it back together again, ready for the next iteration of the power method.

As for PreGel, I didn't really understand the description of it.

I have always had a lot of gripes with publications related to algorithms. There are very rarely any code snippets. I feel at least that the authors should go over their algorithms in some sort of pseudo-code.

I realize that academic papers are written for a specific audience in mind, an audience that is used to seeing a bunch of algorithms described in linear algebraic notation, but I feel it does a huge disservice to spreading knowledge. I mean, I'm somewhat versed in mathematics, albeit at an undergrad level, but I rarely know what these articles are talking about in any specific detail, at least to the point where I could reproduce their approach in functioning computer code.

I've even been published in ACM, and I can't follow the majority of papers in the journal!

This isn't just applicable to computer science, rather all mathematics in general. The notations used are so arcane that even the simplest of ideas can seem daunting. It's probably why so many people are turned off by mathematics early on in their academic careers.

It took me a good week of brushing up on linear algebra and reading through various descriptions and white papers to figure out what was going on with Page Rank. It really isn't that complicated of an algorithm in the end... you're just constructing an MxM matrix where rows and columns are considered web pages, getting those rows to add up to 1 while making sure there are no elements equal to 0, computing an eigenvector, and then scaling the eigenvector down to make each element add up to 1. However, you read the Wikipedia entry on Page Rank and it is borderline indecipherable!

Why isn't there a "somewhat complex math explained in normal language with pretty pictures and graphs" textbook or website out there? Are all mathematicians incapable of communicating their art form to the world at large?

1 comments

Well, there's "Proofs without words" (http://books.google.co.uk/books?id=Kx2cjyzTIYkC) and its sequel (http://books.google.co.uk/books?id=wMpwQgAACAAJ). I'm not sure that's exactly what you meant, though.