Hacker News new | ask | show | jobs
by p4lindromica 2710 days ago
A perfect hash is one in which their are no collisions for a given data set, which guarantees O(1), or constant, lookup time. This patch introduces a perl script which computes a perfect hash function at build time given the current set of SQL keywords.

The old keyword lookup method uses binary search. Binary search is a search algorithm that must compare at most O(log n) entries before finding a match.

Making 1 keyword comparison instead of log n keyword comparisons yields the 20% speedup.

1 comments

> Making 1 keyword comparison instead of log n keyword comparisons yields the 20% speedup.

Worthwhile to note that that speedup is for the raw parsing, without parse-analysis (i.e. looking up the tables referenced and such), planning (figuring out how to execute the query) and execution. So end-user impact will be much smaller than 20%. But for some cases, e.g. when many columns are referenced, it'll be measurable.

Column names aren't keywords though, are they?.

Using a perfect hash table is definitely cool, but it's hard to imagine a query with so many keywords that this would have a measurable impact on total query execution time.

Due to the desire to have keywords that aren't fully reserved (we have keyword classes that are fully reserved, can appear as type names, can appear as column names, or can appear nearly everywhere) we/PG need to look up whether a specific string is a keyword or an identifier in a number of places in the scanner/parser.

Addendum: And that indeed means that column references, unless quoted with "", used to go through the binary search and now go through the perfect hash.

Edit: Typo

Ah that makes sense.

Thanks for that insight!