| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by p4lindromica 2757 days ago

A perfect hash is one in which their are no collisions for a given data set, which guarantees O(1), or constant, lookup time. This patch introduces a perl script which computes a perfect hash function at build time given the current set of SQL keywords.

The old keyword lookup method uses binary search. Binary search is a search algorithm that must compare at most O(log n) entries before finding a match.

Making 1 keyword comparison instead of log n keyword comparisons yields the 20% speedup.

1 comments

anarazel 2757 days ago

> Making 1 keyword comparison instead of log n keyword comparisons yields the 20% speedup.

Worthwhile to note that that speedup is for the raw parsing, without parse-analysis (i.e. looking up the tables referenced and such), planning (figuring out how to execute the query) and execution. So end-user impact will be much smaller than 20%. But for some cases, e.g. when many columns are referenced, it'll be measurable.

link

the_duke 2757 days ago

Column names aren't keywords though, are they?.

Using a perfect hash table is definitely cool, but it's hard to imagine a query with so many keywords that this would have a measurable impact on total query execution time.

link

anarazel 2757 days ago

Due to the desire to have keywords that aren't fully reserved (we have keyword classes that are fully reserved, can appear as type names, can appear as column names, or can appear nearly everywhere) we/PG need to look up whether a specific string is a keyword or an identifier in a number of places in the scanner/parser.

Addendum: And that indeed means that column references, unless quoted with "", used to go through the binary search and now go through the perfect hash.

Edit: Typo

link

the_duke 2757 days ago

Ah that makes sense.

Thanks for that insight!

link