by ZeroCool2u 269 days ago
Everyone asking why this exists when DuckDB, PostGIS, or the JVM-based Sedona already exist clearly has not had the painful experience of working on large geospatial workloads where the legacy options are either not viable or not available for other reasons, which happens more often than you might expect! And the CRS (coordinate reference system) awareness!!! Incredible! CRS mix-ups are a huge source of error when you bring in folks who are doing their best but don't have a lot of experience with GIS workloads. Very expensive queries have had to be rerun, with drastic changes to the results, because someone got their CRS mixed up.

I don't get to do geospatial work as much anymore, but I would have killed for this just a year ago.
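
To make the CRS point concrete, here is a minimal Rust sketch of what "CRS awareness" can mean: each geometry carries its CRS, and an operation refuses to mix them instead of silently combining degrees with metres. The names `Crs`, `Point`, `CrsMismatch` and `planar_distance` are hypothetical and made up for illustration; this is not the API of the project under discussion or of any real library.

```rust
/// Hypothetical CRS tag; only two well-known EPSG codes, for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Crs {
    Epsg4326, // longitude/latitude, units are degrees
    Epsg3857, // Web Mercator, units are metres
}

/// A point that remembers which CRS its coordinates are expressed in.
#[derive(Debug, Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
    crs: Crs,
}

/// Error returned when two inputs use different CRSs.
#[derive(Debug, PartialEq)]
struct CrsMismatch {
    left: Crs,
    right: Crs,
}

/// Planar distance in the units of the shared CRS.
/// Returns an error instead of a number when the CRSs differ.
fn planar_distance(a: Point, b: Point) -> Result<f64, CrsMismatch> {
    if a.crs != b.crs {
        return Err(CrsMismatch { left: a.crs, right: b.crs });
    }
    let dx = a.x - b.x;
    let dy = a.y - b.y;
    Ok((dx * dx + dy * dy).sqrt())
}
```

Without the tag, the mixed case would still return a number, just a meaningless one, which is how an expensive query ends up being rerun.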


I usually start with PostGIS for single-node workloads and then switch to Exasol when I get to truly massive datasets (Exasol has a more limited set of spatial operators, but scales effortlessly across multiple nodes).

It would be great to have more options in this space, especially if they allow a smooth transition from single-node/local use to multi-node scale-out.

Well, for one, it's not crashing on some of the larger use cases where DuckDB does, according to the graph, unless I'm misreading it.
I'd like to know the details of the errors -- because it could have been as simple as running out of memory.
I doubt this hypothesis, because DuckDB, being written in C++, should be able to tolerate allocation failure, while this, being written in Rust, has to deal with Rust's behavior of treating memory allocation failures as panics.

That is to say, if the issue is DuckDB running out of memory, it is most likely because the Rust implementation uses memory more efficiently for whatever query is crashing DuckDB, rather than because it handles memory allocation failure gracefully.

While it is possible in C++ to gracefully handle memory allocation failure, it is not really a thing in Rust. I'm not even sure whether it is possible to catch_unwind it. I say this as a Rust person who doesn't fancy C++ in the slightest...

You cannot use the Rust standard library in environments where arbitrary allocations may fail, but neither can you use the STL. The difference is that the Rust standard library doesn't pretend it has some reasonable way to deal with allocation failure. std::bad_alloc is mainly a parlor trick used to manufacture the idea that copy and move fallibility are reasonable things.

I wouldn't wager a nickel on someone's life if it depended on embedded STL usage.

I’ve never seen anyone try to catch allocation failures in C++ code, and in many cases doing so correctly is very difficult, not least because writing exception-safe code is the exception, not the rule.
There's an effort to expose allocation errors in the standard library for the Linux kernel. Pretty sure it is well under way.
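
For what it's worth, stable Rust already exposes a few fallible allocation entry points on its collections, such as `Vec::try_reserve` and `Vec::try_reserve_exact`. Here is a minimal sketch of what handling the failure looks like; the helper name `try_zeroed_buffer` is made up for illustration, and this only covers allocations you make explicitly, not ones hidden inside other library calls.

```rust
use std::collections::TryReserveError;

/// Try to build a zero-filled buffer of `len` bytes.
/// Allocation failure comes back as an `Err` instead of ending the process.
fn try_zeroed_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf: Vec<u8> = Vec::new();
    // Returns Err on capacity overflow or if the allocator reports failure.
    buf.try_reserve_exact(len)?;
    // The capacity is already reserved, so this resize does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```

A caller can match on the `Result` and, say, fall back to a smaller batch size. Note that on systems with memory overcommit the reservation can succeed and the process can still be killed later when the pages are actually touched, so this is not a complete answer to OOM.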
OOMs are still something a DB can "avoid", so it's not as if that class of bugs is some special issue that nullifies things.
Crashing when running out of memory is not acceptable software behavior in my opinion.
Right, but all it says is that an error was thrown.
You can generate the dataset with the instructions in this readme: https://github.com/apache/sedona-spatialbench/tree/main

Here are the queries: https://github.com/apache/sedona-spatialbench/blob/main/prin...

They should be fairly easy to replicate!