by mathieuh
293 days ago
https://datafusion.apache.org/blog/2024/09/13/string-view-ge...

> The concept of inlined strings with prefixes (called “German Strings” by Andy Pavlo, in homage to TUM, where the Umbra paper that describes them originated) has been used in many recent database systems (Velox, Polars, DuckDB, CedarDB, etc.) and was introduced to Arrow as a new StringViewArray type. Arrow’s original StringArray is very memory efficient but less effective for certain operations. StringViewArray accelerates string-intensive operations via prefix inlining and a more flexible and compact string representation.

So the name seems to mean nothing more than that they were invented at a German university. I spent quite some time thinking it had something to do with German’s sometimes-SOV word order.
Umbra: A Disk-Based System with In-Memory Performance
https://db.in.tum.de/~freitag/papers/p29-neumann-cidr20.pdf
Section 3.1 covers string handling.
This article (also linked from TFA) explains German strings in more detail:
https://cedardb.com/blog/german_strings
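
For anyone who wants the gist without reading the links, here is a rough Python sketch of the "prefix inlining" idea from the quote above. It follows my understanding of the Arrow view layout (16 bytes per string: a 4-byte length, then either up to 12 inline bytes, or a 4-byte prefix plus a buffer index and an offset). The names `make_view` and `views_equal` are made up for illustration, and for simplicity every long string lives in a single shared buffer, so treat it as a toy model, not how any of these systems actually implement it.

```python
import struct

INLINE_MAX = 12  # strings up to 12 bytes fit entirely inside the 16-byte view


def make_view(data: bytes, buffer: bytearray, buffer_index: int = 0) -> bytes:
    """Pack one string into a 16-byte view (hypothetical helper).

    Long strings are appended to `buffer`; short ones never touch it.
    """
    n = len(data)
    if n <= INLINE_MAX:
        # Short string: 4-byte length + up to 12 data bytes, zero-padded.
        return struct.pack("<i12s", n, data)
    # Long string: 4-byte length + 4-byte prefix + buffer index + offset.
    offset = len(buffer)
    buffer.extend(data)
    return struct.pack("<i4sii", n, data[:4], buffer_index, offset)


def views_equal(a: bytes, b: bytes, buffer: bytearray) -> bool:
    """Compare two views, reading the shared buffer only when necessary.

    Assumption: both views were built against the same single buffer.
    """
    # The first 8 bytes hold the length plus the first 4 bytes of the string,
    # so most mismatches are rejected without following any offset.
    if a[:8] != b[:8]:
        return False
    n = struct.unpack_from("<i", a)[0]
    if n <= INLINE_MAX:
        # Fully inlined: the remaining 8 bytes are the rest of the string.
        return a[8:] == b[8:]
    _, _, _, off_a = struct.unpack("<i4sii", a)
    _, _, _, off_b = struct.unpack("<i4sii", b)
    return buffer[off_a:off_a + n] == buffer[off_b:off_b + n]
```

The point is the early exit in `views_equal`: two strings that differ in length or in their first four bytes are told apart by looking only at the fixed-size views, which is where the speedup for comparisons and filters is supposed to come from.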