by rspeer 3950 days ago
The last time I tried Neo4j was in 2010 or 2011, when I was trying to build ConceptNet 5 (http://conceptnet5.media.mit.edu) on it.

It had showstopping security problems when bound to anything but 127.0.0.1, so I came up with a software firewall to put around it and hoped for the best. It promised Lucene search but its implementation was full of Lucene injections, unless I escaped every special character I could think of like a freaking PHP programmer. There was no way to get data in faster than a slow trickle, unless that data was somehow already in another Neo4j database. Doing any interesting graph operations led to interesting messages about running out of "PermGen". And before I could even get all the data in, it had consumed enough resources to blow my academic AWS budget for months.
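For readers wondering what "escaped every special character" looks like in practice, here is a minimal sketch in Python. It assumes the classic Lucene query parser syntax and backslash-escapes the characters that parser treats as operators. The name `escape_lucene` is hypothetical and is not taken from the poster's code or from any Neo4j API.

```python
import re

# Characters the classic Lucene query parser treats as query syntax.
# '&' and '|' are escaped individually, which also neutralizes '&&' and '||'.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(term: str) -> str:
    """Backslash-escape every Lucene query-syntax character in `term`."""
    return _LUCENE_SPECIAL.sub(r'\\\1', term)


if __name__ == "__main__":
    # Untrusted input that would otherwise be parsed as a field query.
    print(escape_lucene('name:"foo" OR (bar*)'))
```

This only covers punctuation. The uppercase keywords `AND`, `OR`, and `NOT` are left untouched, so a caller who wants them treated as plain text would still need to quote or lowercase them.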

I was on the mailing list looking for support, and found it pretty lacking. The best I ever got was a bunch of Java code to try (my code is in Python).

I use SQLite now. It doesn't do very much, but it does what it's supposed to, and that's great.
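As a rough illustration of how a relational store can stand in for a graph database in simple cases, the sketch below keeps labeled edges in a single indexed SQLite table. The schema, the column names (`src`, `rel`, `dst`), and the helper functions are assumptions made for this example. They are not the poster's actual ConceptNet setup.

```python
import sqlite3


def build_graph(edges):
    """Load (src, rel, dst) triples into an in-memory SQLite edge table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edges (src TEXT NOT NULL, rel TEXT NOT NULL, dst TEXT NOT NULL)"
    )
    # An index on the source node keeps one-hop lookups from scanning the table.
    conn.execute("CREATE INDEX idx_edges_src ON edges (src)")
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)
    conn.commit()
    return conn


def neighbors(conn, node):
    """Return the outgoing (rel, dst) pairs of `node`, sorted for stable output."""
    rows = conn.execute(
        "SELECT rel, dst FROM edges WHERE src = ? ORDER BY rel, dst", (node,)
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_graph([
        ("cat", "IsA", "animal"),
        ("cat", "HasA", "tail"),
        ("dog", "IsA", "animal"),
    ])
    print(neighbors(conn, "cat"))
```

One-hop lookups like this fit a plain indexed table well. Multi-hop traversals would need repeated queries or recursive SQL, which is where a dedicated graph database is usually expected to help.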

If Neo4j has improved significantly since then, forgive me that I'm not rushing back to try it again.

2 comments

That sucks. :( Sorry about that. Neo4j isn't perfect today and it certainly wasn't perfect 4-5 years ago. We're working hard on it tho!

And thanks for being specific (amazed that you remember specific issues from five years ago!). I don't remember the 127.0.0.1 security problems, but I don't hear anything about them so my guess is they've been addressed. We have a lot of finance and government customers that have high requirements on security. As for your Lucene issues, we did a complete overhaul of our search and indexing story in Neo4j 2.0 (released late 2013). We've continuously improved import performance (which has traditionally been a weak spot) and Neo4j 2.2 includes a batch importer which injects >1M records / sec sustained pace at scale (10s of billions of records) on commodity hardware. As for the memory management issues, we like many other data products written in Java struggled with GC for a long time, and like many others we ultimately concluded that we had to move a lot of the critical parts off heap / manage the memory ourselves, which significantly improved memory utilization.

I understand that you got stung historically and therefore hesitate to check us out again. And if SQLite is working well for you, there's no need to! But Neo4j and the graph space have matured a LOT since 2010, and fortunately I don't think your "bleeding edge" experience from 4-5 years ago will be replicated anymore for someone coming new into the space.

Thanks for the feedback.

4-5 years was a very long time ago in graph db time :) neo4j and its competitors have changed a lot!!

While neo4j has its proponents, the lack of standards support means that as a data provider it's hard to support.

Check out http://tinkerpop.com. Apache TinkerPop 3.0.0 was released in June 2015 and it is a quantum leap forward. Not only is it now a part of the Apache Software Foundation, but the Gremlin3 query language has advanced significantly since Gremlin2. The language is much cleaner, provides declarative graph pattern matching constructs, and it supports both OLTP graph databases (e.g. Titan, Neo4j, OrientDB) and OLAP graph processors (e.g. Spark, Giraph). With most every graph vendor providing TinkerPop connectivity, this should make it easier for developers, as they don't have to learn a new query language for each graph system, and developers are less prone to experience vendor lock-in, as their code (like JDBC/SQL) can just move to another underlying graph system.

It's more about data interchange support, i.e. we could support GraphML instead of or next to the RDF varieties. But this would be difficult for us to generate in a streaming way.

Then for our end users, we would need to hack in a namespace convention to avoid issues when integrating our data.

Then TinkerPop misses the SERVICE concept for federated querying in SPARQL 1.1, which is essential for our end users who do knowledge discovery (i.e. small biology labs without the in-house capability of running their own large databases).

You use SQLite instead of Neo4j? May I ask what your use case is? I don't see these overlapping in any scenario...