by wmiel 1001 days ago
I'm not sure I understand why they're using Java if they avoid GC. It doesn't sound like the best fit, especially since foreign memory in Java isn't too pleasant to work with.

"Hard to write but easy for customers to deploy" is my guess. There are a bunch of very high-performance computing use-cases in finance (quant, HFT) that Java gets used for pretty routinely. That's a very attractive market to build primitives like databases for, they have very deep pockets, but you need to play in their ecosystem.

I've seen this "GC-less" Java in those use-cases quite a bit. From a conceptual design POV it's likely not the best approach, but there's a lot of sunk cost in that eco-system and a lot of trust and expertise where "Choosing a better language" is often several orders of magnitude more expensive.

They're not the only ones to have done this in this space. VoltDB (Michael Stonebraker of Postgres [among other things] fame) did this -- low or no-GC style Java, effectively non-idiomatic Java, but taking advantage of the Java runtime in other ways.

Others have done the same. And as others have pointed out, there's things outside the DB domain in high frequency trading and the like that have done this as well.

There are advantages to Java: mature runtime, large talent pool out there, good tooling (still haven't seen anything as good as JMX for any other runtime). And if there's any language whose GC could be tuned to be "responsible", it'd be the JVM; there's been more GC R&D in the JVM than in any other runtime.

I worked at RelationalAI (another DB vendor) for a bit, and their DB is all written in Julia, another garbage collected language... and the GC in Julia is what I'd characterize as ... immature... for that kind of application. I would have loved to have access to the JVM's GC there.

Also this looks to be more of an analytical, column oriented, database. So I can imagine they're optimizing more for throughput than transactional latency. (I could be wrong, correct me, Quest folks...)

And choice of Java likely has to do with when they began working on the project and what was out there at the time. It's the real world of software eng. We work with the tools and people we have because shipping a product on time and bringing in $$ is more important than anything else. I don't know when they got started, but Rust has only matured to "mainstream" stability/acceptance in the last 2-3 years.

Finally, DBs often have a very layered architecture and they could easily compartmentalize pieces such that latency-sensitive bits could be done in native Rust. They're apparently not doing this, but I could see them doing things like moving the page buffer or column indices or storage engine over to Rust over time for performance benefits.

All power to them, it's great to see them working with Rust. (aside: my email history looks like I spoke to a recruiter there at some point, maybe, but didn't interview? I think if I'd known they were playing with Rust I would have given that more attention...)

> Also this looks to be more of an analytical, column oriented, database. So I can imagine they're optimizing more for throughput than transactional latency.

Yes that is the case

Seamless integration with parts that don't need to be GC-free comes to mind: they are not building an application, they are building a building block. And that building block can be used both in applications that do require the latency guarantees of GC-free as well as in applications that don't. Another class of applications would be ones that alternate between phases of unpredictable latency (like bootup or reconfiguration) and low-latency operation.
I remember reading that the founder was working in low-latency Java development with London investment banks for years. I guess it's what he knew.

Also, Rust is a hard language to start a company with so I wouldn't be surprised if this is more of a product maturity thing.

Surely it is at least as hard to find people who know how to write Java without GC?

Presumably you can't use Hotspot so you have to write your own VM too?

Folks with a background in electronic trading (FX, hedge funds, trading firms etc) are familiar with zero-gc Java. London / NY / HK are a good pool of talent in that respect
Yep, I ended up checking their code base. I know a little bit of Java but didn't realise that sun.misc.Unsafe existed, so it does actually look fairly straightforward to create code outside of GC (for anyone reading questdb/std/Unsafe.java seems to be where some allocations are handled). A pain I am sure, but way more manageable than I thought.
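
For readers who have not seen the technique, here is a minimal sketch of off-heap allocation through sun.misc.Unsafe. It is not QuestDB's code: the class name OffHeapLongArray and its methods are hypothetical, and the sketch assumes a JDK where sun.misc.Unsafe is still reachable by reflection (recent JDKs deprecate its memory-access methods in favor of java.lang.foreign and may print warnings).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical illustration, not a QuestDB class: a fixed-size array of longs
// stored in native memory, so the garbage collector never scans or moves it.
final class OffHeapLongArray implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long length; // number of long slots
    private long address;      // raw native address; 0 once freed

    OffHeapLongArray(long length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        this.length = length;
        long bytes = length * Long.BYTES;
        this.address = UNSAFE.allocateMemory(bytes);
        // allocateMemory returns uninitialized memory, so zero it explicitly.
        UNSAFE.setMemory(address, bytes, (byte) 0);
    }

    void set(long index, long value) {
        check(index);
        UNSAFE.putLong(address + index * Long.BYTES, value);
    }

    long get(long index) {
        check(index);
        return UNSAFE.getLong(address + index * Long.BYTES);
    }

    // Native memory is not reclaimed by the GC; it must be freed by hand.
    @Override
    public void close() {
        if (address != 0) {
            UNSAFE.freeMemory(address);
            address = 0;
        }
    }

    // Unsafe does no bounds checking itself, so guard every access.
    private void check(long index) {
        if (address == 0) {
            throw new IllegalStateException("already closed");
        }
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    // The Unsafe singleton is only reachable via its private static field.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }
}
```

Used in a try-with-resources block, the memory is released deterministically when the block exits, which is the part the GC would otherwise have handled.
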
It's not necessarily outside of GC, it's that they make great efforts to avoid GC: instantiating all objects at startup and holding references to them, never using the new keyword on the hot path, avoiding objects in favor of primitives, avoiding exceptions, etc.
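
A rough sketch of that style, assuming a made-up OrderPool class (none of these names come from any real trading system or from QuestDB): every object is created once at startup, the hot path only borrows and returns them, fields are primitives, and exhaustion is signalled with a return value instead of an exception.

```java
// Hypothetical illustration of the preallocate-and-reuse pattern.
final class OrderPool {
    // Mutable and primitive-only, so reuse creates no garbage.
    static final class Order {
        long id;
        long priceTicks;
        int quantity;

        void clear() {
            id = 0;
            priceTicks = 0;
            quantity = 0;
        }
    }

    private final Order[] free; // stack of currently unused instances
    private int top;            // number of instances available

    OrderPool(int capacity) {
        free = new Order[capacity];
        // The only place "new" is called: everything is allocated up front.
        for (int i = 0; i < capacity; i++) {
            free[i] = new Order();
        }
        top = capacity;
    }

    // Returns null when the pool is exhausted rather than throwing,
    // since building an exception allocates.
    Order acquire() {
        return top == 0 ? null : free[--top];
    }

    // Resets the instance and puts it back; ignored if the pool is already full.
    void release(Order order) {
        order.clear();
        if (top < free.length) {
            free[top++] = order;
        }
    }

    int available() {
        return top;
    }
}
```

Because the pool holds references to every instance for the life of the process, the objects end up in the old generation and steady-state operation produces essentially no new garbage for the collector to chase.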

These systems often restart or do a full "stop the world" GC once per day.

The system is quite different than what most are used to, especially during a trend towards increasingly Functional styles with immutability as a default, etc.

Peter Lawrey[1] has some great posts/talks about his experiences in HFT.

[1] https://github.com/peter-lawrey

It's not that uncommon a move, to choose jumping through hoops with Java over writing C.

Might become less common now that Rust is reaching the level it is.

Having worked on writing DB internals in both Rust and in other languages, I can say that there's huge time-saving advantages to having something higher-level & garbage collected at the layer of the query parser/analyzer/compiler. The borrowing/ownership semantics can get really snaky when dealing with complicated expression trees, iteration patterns, etc.

It's fairly hard to write ergonomic interfaces for more complicated iteration patterns in Rust while still respecting safety. That's actually fine and by design, and it's possible with a lot of effort and thought but this is not as much of a concern in e.g. Java. E.g. skim the discussion on this proposed "cursor" API for Rust's stdlib BTree: https://github.com/rust-lang/rust/issues/107540

(And while Rust's enum-based algebraic types & pattern matching are nice, they're actually fairly limited when compared to what you can find in e.g. Scala or F#, Haskell, etc.)

But I think there's also a huge win in doing something like the pager/buffer pool/storage/data structure/indexes layer in Rust, for safety and efficiency reasons.

Yeah, sounds like a lot of effort.

The only thing they say that explains it is they end up with a single jar file, whose only dependency is the JRE.

So I guess they get platform independence and easy installation.

QuestDB engineer here: We use jlink to create images for selected platforms. This means not even JRE is a dependency: You unpack a tarball and you are good to go. See: https://questdb.io/docs/get-started/binaries/
Well, technically it is a dependency in that you can't target platforms there isn't a JRE for. But I take your point about simplifying installation.
Keep in mind the JNI libs are platform specific. That means the available platforms are a function of what the JRE runs on AND what they have built the shared lib for (and bundled into the jar).
you know what fatjars are?