Hacker News new | ask | show | jobs
by ranty 4337 days ago
Not being that au fait with big data stuff, can someone tell me what graph processing is for?
2 comments

One current gold-rush topic is ASIC-resistant bitcoin mining.

Bitcoin works by generating a particular type of SHA256 hash -- you take some fixed inputs (transactions), a random input you can change (a nonce), and together they need to SHA256-hash into a special hash, eg one with 5 leading zero's.

This is known as a "proof of work" system -- it has the interesting property that it is quick to verify but difficult to produce (you can't predict a SHA256 hash). Mining bitcoin is basically brute-forcing tons of random inputs until you find an input that produces a special SHA256 hash, which you then advertise to the network and the network rewards you with libertarian disney dollars.

Now, you can make a custom chip (ASIC) that is optimized to churn out as many SHA256 hashes as possible -- far more and far faster than a CPU can. In fact, ASIC mining is the only cost-effective way to mine bitcoin these days. Thus, bitcoin mining is quickly becoming about who can buy the most chips and power them with the cheapest electricity, encouraging massive centralized data farms -- a Bad Thing if you believe in the Decentralize All The Things! promise of bitcoin.

A better proof-of-work system is one that shares the quick-to-verify-difficult-to-solve property of specially-formed SHA256 hashes, but one that can't be specialized for with an ASIC. Then more people could participate on commodity hardware and you would (theoretically) have less centralization (or at least encourage better commodity hardware instead of better specialized SHA256 chips).

Anyways, back to graph processing... one class of problem that is difficult to create specialized ASICs for yet is still 'quick-to-verify-but-difficult-to-solve' is finding cycles of a specific length (say 42) in a large graph. Basically solving such a problem is memory-hard, so an ASIC can only get you so far (although people have proposed ASIC farm architectures that try, eg with shared central memory accessed by many ASIC cores, which is starting to sound a bit like a GPU...)

We're interested in day-to-day business needs. Databases and compute engines here handle real-time transaction processing or data analysis for graph data. E.g., logistics, recommendation systems, social graphs (users, employees, companies), payments (fraud or segmentation), IT systems/traffic (performance & security), knowledge management (manufacturing, legal), etc.
...wait, so you mean you use this stuff for like real work instead of intellectual exercises in creative incentives for burning electricity? ;)
"Libertarian Disney Dollars" just about sums it up.
Well for OLTP style processing there are hundreds of use cases. The Neo4j guys have a few on their site to give you some ideas: http://www.neotechnology.com/neo4j-use-cases/

For OLAP style processing there are also many uses. The most common is to look for clusters in large and complex sets of highly related data. (People, for example.) Cluster identification can be used in examining crime networks, disease propogation, finding likely "influencers" for marketing and political purposes, finding many-dimensional patterns of things people tend to buy, and so on. Apache Spark/GraphX is one such tool set.

Twitter uses offline graph processing to suggest who you should follow. Many companies use it (real time or offline) to suggest other items you might want to buy.

And more.

The really cool use case in my mind is combining OLTP+OLAP in real time. Hence the interest in GPU acceleration. We have an OLTP platform that integrates a CPU version of the GAS engine used in MapGraph:

http://bigdata.com/bigdata

http://bigdata.com/mapgraph

Eventually we'd like to see the two platforms converge.