Hacker News new | ask | show | jobs
by akshayshah 1319 days ago
As long as comments are a tree, there’s only one path from the root (the post) to an individual comment. How would a recursive CTE perform better than a prefix scan on an indexed string column?

Storing a pointer to each node’s parent or using sorted sets seems like it would make the parent poster’s query slower. Those approaches would make it easier to reparent comments, though, and they’d support arbitrarily deep trees (whereas the materialized path implementations I’ve seen limit path length).

1 comments

Especially in postgres, you can use a hash index to get constant time index lookups rather than log(n) with a b-tree index, which is the best you could do with prefix lookups on a column. And you're storing and indexing less data since the tree is inherent in the structure and not expressed as part of the data.

Honestly I'd test both implementations to see which is fastest, but my gut is still that recursive CTEs would be fastest, while also simplest, structure-wise. You'd also still be getting all the benefits you'd expect from database-native functionality, like referential integrity and schema enforcement, etc. Presumably there's structure in these indexed string columns that the database knows nothing about, and thus can't enforce any constraints or optimize outside of plain string prefix lookups.

My experience has been that people don't try recursive CTEs because they don't realize they exist, and so they reach for all sorts of exotic or bespoke implementations of essentially that same concept. So I at least try to make sure people know they're a really solid tool to keep in your tool belt, and one that hasn't let me down yet even in very large data sets.