|
|
|
|
|
by gvinciguerra
1975 days ago
|
|
Hi @kreeben, the construction is very fast, it can be done in linear time with a single scan of the data. Just to give you some figures, we constructed a PGM-index on 10^12 key-value pairs (8 bytes + 8 bytes) loaded in main memory in less than 3 seconds. About the information that we store, we prefer to say "piecewise linear approximation" because the line segments do not necessarily connect points, and they are not connected to each other. The guarantee of the piecewise linear approx is that each line segment is far from the input points by at most a user-given integer ε. With this guarantee, when we compute the approximate position p of a given query key q, we can always find the true position of q after a fast binary search in the range of positions [p-ε,p+ε]. The piecewise linear model is nothing more than a list of triples (key,slope,intercept). The key is needed because we need to know where a segment starts and where the previous one ends. The slope and the intercept are exactly as you said, the parameters that let us compute the approximate position via p = slope * q + intercept. |
|