| HN Mirror


by PaulHoule 1534 days ago

At scales far below what you're talking about, people experience grave difficulties making sense of big graphs.

https://cambridge-intelligence.com/how-to-fix-hairballs/

One of my favorite examples is this guy

https://en.wikipedia.org/wiki/Mark_Lombardi

I saw an art exhibit that showed some of the sketches that he made and it was clear that he worked really hard drawing and redrawing each graph and they went from being hairballish to telling a clear story.

You're also very insightful to be talking about the specific scale you're working at, because it matters. Graph workloads can drive you batty because they frequently defeat caches by being very nonlocal.

For your small data set you are in the range where you can get a "big" computer with, say, 64GB or 128GB of RAM and be able to work in RAM. You might be a little disappointed with the performance (it takes a while to touch every memory address in a 128GB machine), but it will be good enough if you're efficient and disciplined.
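To make "efficient and disciplined" a bit more concrete, here is a minimal sketch of one compact in-RAM layout: a compressed sparse row (CSR) adjacency stored in two flat arrays. This is an illustration, not necessarily what I'd use for your data; the names `build_csr` and `neighbors` are made up for the example, and it assumes node ids are integers in `0..num_nodes-1` and that the edge list fits in a Python list.

```python
from array import array


def build_csr(num_nodes, edges):
    """Pack a directed edge list into two flat arrays (CSR layout).

    edges is a list of (src, dst) pairs with ids in range(num_nodes).
    Returns (offsets, targets): the neighbors of node n are
    targets[offsets[n]:offsets[n + 1]].
    """
    # offsets[n + 1] first holds the out-degree of node n.
    offsets = array("q", [0]) * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1

    # A running sum turns the degree counts into start positions.
    for n in range(num_nodes):
        offsets[n + 1] += offsets[n]

    # Place every destination id into its source node's slice.
    targets = array("i", [0]) * len(edges)
    cursor = offsets[:num_nodes]  # copy: next free slot for each node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Return the out-neighbors of node as a contiguous array slice."""
    return targets[offsets[node]:offsets[node + 1]]


if __name__ == "__main__":
    offsets, targets = build_csr(3, [(0, 1), (0, 2), (2, 0), (1, 2)])
    print(list(neighbors(offsets, targets, 0)))
```

Each node's neighbor list sits in one contiguous slice, so scanning a single node's edges is cache friendly. Hopping from node to node is still a random access, which is the nonlocality problem mentioned above.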

As an RDF fanatic I'll share that I have handled data sets on the small end of your scale with

https://virtuoso.openlinksw.com/


craggyjaggy 1534 days ago

I have a million nodes, which I'm confident I can prune to fewer than 100k now that I know I'll have to. Each has 100 edges, which I can probably filter down to 10-20. That should get me down to <3GB, which might be more in reach?
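A rough way to sanity-check a number like that is to multiply it out. The sketch below is only back-of-envelope arithmetic: `estimate_bytes` is a made-up helper, and the byte sizes passed to it (4 bytes per edge for a target id, 8 bytes per node for an offset) are assumptions that cover bare topology only. The thread doesn't say how much attribute data hangs off each node and edge, so plug in your own per-node and per-edge sizes.

```python
def estimate_bytes(nodes, edges_per_node, bytes_per_edge, bytes_per_node):
    """Estimate memory for a graph stored as flat per-node and per-edge records."""
    edges = nodes * edges_per_node
    return nodes * bytes_per_node + edges * bytes_per_edge


if __name__ == "__main__":
    scenarios = [
        ("unpruned: 1M nodes x 100 edges", 1_000_000, 100),
        ("pruned: 100k nodes x 20 edges", 100_000, 20),
    ]
    for label, nodes, degree in scenarios:
        # Assumed sizes: 4-byte edge target id, 8-byte node offset, no attributes.
        total = estimate_bytes(nodes, degree, bytes_per_edge=4, bytes_per_node=8)
        print(f"{label}: {total / 1e6:.1f} MB of bare topology")
```

Under those assumptions the bare structure is small in both cases, so most of a budget like the one above would be spent on whatever properties are stored per node and per edge.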


PaulHoule 1534 days ago

Yeah, you might even get away with using Neo4j at that scale, which has an API people like even if it doesn't handle bigger graphs well.


nikonyrh 1530 days ago

"it doesn't handle bigger graphs well."

I'd like to hear more. I used it for a prototype several years ago and was quite impressed with both the query language and the performance.


craggyjaggy 1534 days ago

I'll have a look at Neo4j, thank you :)
