| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by legulere 1966 days ago
	In the example they sort the data array. Does that mean this works just on sorted arrays? Insert and delete performance would be horrible I guess.

5 comments

gvinciguerra 1966 days ago

Hi @legulere!

Yep, the example of Figure 2 shows only a static PGM-index on a sorted array.

Insertion and deletions are discussed in Section 3 "Dynamic PGM-index" and experimented in Section 7.3.

The Dynamic PGM-index is open-source too: you can find the implementation at https://github.com/gvinciguerra/PGM-index/blob/master/includ... and the documentation at https://pgm.di.unipi.it/docs/cpp-reference/#classpgm_1_1_dyn...

link

latch 1966 days ago

They propose a solution for dynamic PGM indexes in the paper (section 3) and benchmark it (section 6). A summary is that, in their benchmark, their index is faster by 13%-71% in most cases, but can be slower (1%-15.2%) in a few cases.

I agree the example would be more eye-catching without that sort.

http://www.vldb.org/pvldb/vol13/p1162-ferragina.pdf

link

contravariant 1966 days ago

In the full paper they quote a rather interesting method [1] that allows you to insert values in amortized O(log(n)) time (deletes are apparently handled with tombstones, presumably rebuilding the whole thing when a sufficiently large proportion is deleted).

A very abridged explanation of how they handle inserts: you split the collection in a list of collections where position k contains either nothing or a collection size 2^k. When you want to add a new value you find the first empty spot and fill it by building a set of your new value together with the collections of all the preceding spots (because the sizes are all sequential powers of two this will fit exactly). Provided that merging the collections takes linear time this takes an amortized O(log(n)) per inserted item.

Of course once you have this you can use it for any learned index that can be learned in linear time.

[1]: M. H. Overmars. The Design of Dynamic Data Structures, volume 156 of Lecture Notes in Computer Science. Springer, 1983.

link

X6S1x6Okd1st 1966 days ago

Skimming the paper it appears so. B-trees also require sorted data.

link

bmh 1966 days ago

B-trees don't need sorted data. The B-tree insertion algorithm performs the sorting for you.

link

infogulch 1966 days ago

B-trees need sortable data; GP is not saying that the data has to be pre-sorted, but that it can't be unorderable or otherwise uncomparable.

link

AlphaSite 1966 days ago

Isn’t sorting required for range based indexes?

link

jandrewrogers 1966 days ago

Not necessarily. If you have indexing structures for data types that do not have a total order, only a partial order, you can store and do an indexed range search on data types that do have a total order. The primary implication is that the output of the range search will not reflect the total order in the way it would for a traditional B+Tree.

link

YesThatTom2 1966 days ago

Same question. Can you give me an example of an indexing scheme that works on non-sortable data? I can’t think of any.

link

jandrewrogers 1966 days ago

The canonical example is indexing rectangles. They have no total order. It is far from the only example. Any data type where equality and intersection are not equivalent test functions will effectively be non-sortable.

There are many indexing schemes for data with these properties. They focus on topological relationships rather than order relationships.

link

AnbeSivam 1965 days ago

Do you know of any blog/papers which talks about this - using topology for such interval data types.

link