by eddyxu
1113 days ago
Hey, co-author of Lance here. Lance is faster at random access because its layout and encodings were designed to be fast for both scans and random access. We borrowed many ideas from Google's Procella paper and from Arrow's in-memory layout. We also added a number of I/O execution-plan optimizations that assume the data has large-blob columns (e.g., images, lidar point clouds) during scans. Traditional OLAP systems don't have such columns, because their workloads differ from ML training. A Java reimplementation of Lance should have very similar I/O characteristics. There are actually some efforts underway to support Lance on the JVM and as a Spark data source.
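To make the random-access point concrete, here is a minimal sketch of an Arrow-style variable-length layout: an offsets buffer plus one contiguous data buffer. This is not Lance's actual format or code; the function names (`encode_column`, `take`, `scan`) and the choice of little-endian int64 offsets are assumptions made for illustration only.

```python
import struct

OFFSET_WIDTH = 8  # bytes per offset; int64 is an assumption for this sketch


def encode_column(values):
    """Pack byte strings into an offsets buffer plus one contiguous data buffer."""
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    offsets_buf = struct.pack(f"<{len(offsets)}q", *offsets)
    data_buf = b"".join(values)
    return offsets_buf, data_buf


def take(offsets_buf, data_buf, i):
    """Random access: read two adjacent offsets, then slice exactly one value.

    The cost does not depend on how many rows precede row i.
    """
    start, end = struct.unpack_from("<2q", offsets_buf, i * OFFSET_WIDTH)
    return data_buf[start:end]


def scan(offsets_buf, data_buf):
    """Sequential scan: decode all offsets once, then walk the data buffer in order."""
    n = len(offsets_buf) // OFFSET_WIDTH - 1
    offsets = struct.unpack(f"<{n + 1}q", offsets_buf)
    for i in range(n):
        yield data_buf[offsets[i]:offsets[i + 1]]


if __name__ == "__main__":
    blobs = [b"img-0", b"", b"lidar-points"]
    offsets_buf, data_buf = encode_column(blobs)
    print(take(offsets_buf, data_buf, 2))
    print(list(scan(offsets_buf, data_buf)))
```

With this kind of layout, fetching one row needs only a small fixed-size read of the offsets followed by a single ranged read of the data, which is the property that keeps point lookups cheap even when the values are large blobs. The same two buffers still support a straightforward sequential scan.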