Hacker News new | ask | show | jobs
by canadiantim 1317 days ago
Could it be efficient to use Apache AGE for e.g. retrieving all comments on an article?

Currently I’m using materialized paths to efficiency return all commments but would be keen to know if AGE can help query comments for an article more powerfully.

3 comments

The ltree extension (https://www.postgresql.org/docs/current/ltree.html) is perfect for this usecase.
How does ltree compare with jsonb?
Have you tried recursive CTEs with a simple id, parent_id etc schema? These should perform very well if those columns are in an index.

Afaik this is pretty much the canonical way to store recursive comment trees. Or any kind of DAG.

As long as comments are a tree, there’s only one path from the root (the post) to an individual comment. How would a recursive CTE perform better than a prefix scan on an indexed string column?

Storing a pointer to each node’s parent or using sorted sets seems like it would make the parent poster’s query slower. Those approaches would make it easier to reparent comments, though, and they’d support arbitrarily deep trees (whereas the materialized path implementations I’ve seen limit path length).

Especially in postgres, you can use a hash index to get constant time index lookups rather than log(n) with a b-tree index, which is the best you could do with prefix lookups on a column. And you're storing and indexing less data since the tree is inherent in the structure and not expressed as part of the data.

Honestly I'd test both implementations to see which is fastest, but my gut is still that recursive CTEs would be fastest, while also simplest, structure-wise. You'd also still be getting all the benefits you'd expect from database-native functionality, like referential integrity and schema enforcement, etc. Presumably there's structure in these indexed string columns that the database knows nothing about, and thus can't enforce any constraints or optimize outside of plain string prefix lookups.

My experience has been that people don't try recursive CTEs because they don't realize they exist, and so they reach for all sorts of exotic or bespoke implementations of essentially that same concept. So I at least try to make sure people know they're a really solid tool to keep in your tool belt, and one that hasn't let me down yet even in very large data sets.

This comment gave me a flashback to Celko's SQL for Smarties. I believe the updated books are split off into a few smaller books? But the section/book on trees in a relational database helped me greatly once in a galaxy far far away.
AGE can handle that and recursive ctes can as well, but AGE has mechanisms to handle cyclic graphs as well.
Why not a trigger that maintains this in a simpler query in a separate table? Sounds more performant to me!

Recursive CTEs sounds like something you would do if your total comment count in the db is not in the six figures or something. What does HN do?

I once achieved it by setting a parent_id string column - for root comments it’ll just be article id. For replies it’ll be “parent_comment.parent_id || parent_comment.id”. Then it’s just a single list no recursion needed to get the full hierarchy at whatever level. Can also be easily migrated to dynamodb and get infinite scaling and zero downtime costs.