Hacker News new | ask | show | jobs
by andrewstuart2 1322 days ago
Have you tried recursive CTEs with a simple id, parent_id etc schema? These should perform very well if those columns are in an index.

Afaik this is pretty much the canonical way to store recursive comment trees. Or any kind of DAG.

4 comments

As long as comments are a tree, there’s only one path from the root (the post) to an individual comment. How would a recursive CTE perform better than a prefix scan on an indexed string column?

Storing a pointer to each node’s parent or using sorted sets seems like it would make the parent poster’s query slower. Those approaches would make it easier to reparent comments, though, and they’d support arbitrarily deep trees (whereas the materialized path implementations I’ve seen limit path length).

Especially in postgres, you can use a hash index to get constant time index lookups rather than log(n) with a b-tree index, which is the best you could do with prefix lookups on a column. And you're storing and indexing less data since the tree is inherent in the structure and not expressed as part of the data.

Honestly I'd test both implementations to see which is fastest, but my gut is still that recursive CTEs would be fastest, while also simplest, structure-wise. You'd also still be getting all the benefits you'd expect from database-native functionality, like referential integrity and schema enforcement, etc. Presumably there's structure in these indexed string columns that the database knows nothing about, and thus can't enforce any constraints or optimize outside of plain string prefix lookups.

My experience has been that people don't try recursive CTEs because they don't realize they exist, and so they reach for all sorts of exotic or bespoke implementations of essentially that same concept. So I at least try to make sure people know they're a really solid tool to keep in your tool belt, and one that hasn't let me down yet even in very large data sets.

This comment gave me a flashback to Celko's SQL for Smarties. I believe the updated books are split off into a few smaller books? But the section/book on trees in a relational database helped me greatly once in a galaxy far far away.
AGE can handle that and recursive ctes can as well, but AGE has mechanisms to handle cyclic graphs as well.
Why not a trigger that maintains this in a simpler query in a separate table? Sounds more performant to me!

Recursive CTEs sounds like something you would do if your total comment count in the db is not in the six figures or something. What does HN do?