Hacker News
by lukhas 1275 days ago
Currently it's "-Xms30g -Xmx30g -XX:+UseG1GC -XX:+PrintCodeCache -XX:ProfiledCodeHeapSize=500m -XX:NonProfiledCodeHeapSize=500m -XX:NonNMethodCodeHeapSize=24m -XX:ReservedCodeCacheSize=1024m -XX:InitialCodeCacheSize=1024m -XX:ParallelGCThreads=24"

Everything after G1GC was suggested by various helpful experts on HN, Discord, by email and other media.

3 comments

Is there any other runtime that needs as much tuning as the JVM?

It seems like a failure of the JVM that someone capable of porting thousands of lines of code with little trouble then fails to tune the JVM. It's like, "we wrote all the code, now the hard part starts, tuning the JVM".

I took just the opposite lesson from this: the JVM offers enough tuning knobs that in the rare, extreme cases where the defaults don't work for you (I suspect few single-server workloads in the world have the combination of complexity and request rate that Lichess does), there are still ways to do something about it. If he'd been using, say, Go, he'd probably have had to give up and roll back, or patch the runtime code.
I’ve been using Go for 7’ish years across a large codebase, comprising multiple services interacting with millions of people.

Every release Go gets better and requires 0 changes to the code. I have never needed to fine-tune the GC. I have never needed to spend a month rewriting my code to work with Go 2.0. I have never been nervous to update to a new Go version. I write the code once, and it runs great in prod for years. I love and value these things. I also suspect that Lichess’s use case would perform extremely well in Go out-of-the-box, since it’s just a web app.

I certainly hope the JVM team would like Lichess and other web apps to run well without needing arcane configuration knowledge gathered over years of experience and battle scars in production.

> I’ve been using Go for 7’ish years across a large codebase, comprising multiple services interacting with millions of people.

> Every release Go gets better and requires 0 changes to the code. I have never needed to fine-tune the GC. I have never been nervous to update to a new Go version.

I've had the same experience with the JVM, over a longer time period. You hear about this because it's exceptional and interesting, not because it's the norm. "Just a web app" is an incredibly reductive take.

The same problem exists in the Go runtime (it's fundamental to the problem space), and since they don't let you tune it presumably they either guess or hardcode what the value should be; good luck if they get it wrong. Sure, you probably won't hit it with your code - the overwhelming majority of Java users don't hit this kind of problem with their code either.

The JVM is a runtime virtual machine. Go produces static binaries (I think?) that run on the CPU, big difference.
Go produces binaries but those include a substantial runtime, with e.g. garbage collection and full reflection.
They're not tuning the GC, really. They're tuning the code cache. That's what the main culprit was. Go doesn't have a JIT. No code cache to configure.
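For anyone who wants to see what those code cache flags actually control, here is a minimal sketch that lists the code cache memory pools from inside a running JVM. The class name `CodeCacheReport` and the helper `isCodeCachePool` are made up for illustration, and the pool names it matches are an assumption about HotSpot (segmented caches report as "CodeHeap '...'", unsegmented ones as "CodeCache" or "Code Cache"); other JVMs may name them differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Locale;

class CodeCacheReport {
    // Assumption: HotSpot naming. Matches "CodeHeap 'profiled nmethods'",
    // "CodeCache" and "Code Cache", ignoring case and spaces.
    static boolean isCodeCachePool(String poolName) {
        String n = poolName.toLowerCase(Locale.ROOT).replace(" ", "");
        return n.contains("codeheap") || n.contains("codecache");
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isCodeCachePool(pool.getName())) continue;
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue; // pool is no longer valid
            long max = usage.getMax();   // -1 means the maximum is undefined
            System.out.printf("%-35s used=%d KiB max=%s%n",
                    pool.getName(),
                    usage.getUsed() / 1024,
                    max < 0 ? "undefined" : (max / 1024) + " KiB");
        }
    }
}
```

If one of these pools sits close to its maximum under load, that would be consistent with the code cache being the bottleneck rather than the heap.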
> or patch the runtime code

A huge spike in CPU usage would be treated as a regression in other language runtimes and would likely be addressed. The fact that this is solvable with advanced JVM knobs is both good and bad. Good in the way you say. Bad because the complexity of maintaining all those knobs makes it difficult if not impossible to improve the runtime defaults, and every runtime problem becomes a tuning problem.

> A huge spike in CPU usage would be treated as a regression in other language runtimes and would likely be addressed.

A spike that applied to all programs certainly would. A spike that applied to only one program? Good luck, IME.

"If he'd been using, say, Go, he'd probably have had to give up and roll back, or patch the runtime code."

Err no, Go has GC yes, but most of that tuning seems to be related to JIT/Codepaths etc.

Perhaps it is different now, but I've always hated figuring out how much memory a Java app needs. You can certainly give it 30GB of RAM and it will happily use it all up and then start making garbage collection calls. But does it really need all that RAM? I think the best practice of the time was to continually lower your max heap until you started getting allocation errors, then bump the number up by 20%-50% (or something like that).
No, that's not how you should tune the modern JVM.

GC will always happen no matter how large a heap you allocate. A larger heap makes GC happen less frequently but, depending on the algorithm, can also increase the time each collection takes. The key point of GC tuning is to keep total GC time and pause time under control with as little memory as possible.

* First, you have to set up monitoring for garbage collection time: turn on metrics collection and detailed garbage collection logging.

* Second, tune your total GC time so that it's 5% or less. Start with a reasonably small max heap size, say 256MB, and keep increasing it if the GC time is still too large. Try to keep the max heap size under 32GB to take advantage of "Compressed OOPs".

* Third, you only need to use more advanced flags if you detect large GC pauses in your GC logging. Otherwise, you're done.
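A rough way to check the 5% budget from the second step without parsing GC logs is to ask the JVM itself. This is only a sketch: the class name `GcOverhead` and its helpers are made up for illustration, and `getCollectionTime()` is an approximate accumulated figure whose exact meaning depends on the collector, so treat the result as a first estimate and use the GC logs for anything precise.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverhead {
    // Percentage of elapsed time spent in GC; 0 if no time has elapsed yet.
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) return 0.0;
        return 100.0 * gcMillis / uptimeMillis;
    }

    // Sum of the accumulated collection time over all collectors, in ms.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double pct = overheadPercent(totalGcMillis(), uptime);
        System.out.printf("GC overhead: %.2f%% of %d ms uptime%n", pct, uptime);
        System.out.println(pct <= 5.0
                ? "within the 5% budget"
                : "over budget: try a larger max heap");
    }
}
```

The figure is an average since JVM start, so it only says something useful after the application has been under representative load for a while.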

Wow...
I've used jconsole and hit the GC button when the system is under load. That gave me an estimate of the required size. I usually suggest multiplying it by two.
I've been writing Java for 15 years and the only knob I've ever had to change is Xmx. I think I also had to adjust the permgen size in old JVMs for development, but that's not needed anymore. My apps are not world scale like Lichess, but small-country scale.
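The same estimate can be sketched in code, assuming the measurement is taken inside the loaded application. The class name `HeapEstimate` and its helpers are made up for illustration; the factor of two is the rule of thumb from the comment above, not an official recommendation. Note that the explicit GC request is only a hint and does nothing if the JVM runs with `-XX:+DisableExplicitGC`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

class HeapEstimate {
    // Live set multiplied by a safety factor, rounded up to whole bytes.
    static long suggestedMaxHeapBytes(long liveBytes, double factor) {
        return (long) Math.ceil(liveBytes * factor);
    }

    // Heap still in use right after requesting a collection: a rough live-set size.
    static long liveBytesAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // same effect as System.gc()
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long live = liveBytesAfterGc();
        long suggested = suggestedMaxHeapBytes(live, 2.0);
        System.out.printf("live set ~%d MiB, suggested -Xmx ~%d MiB%n",
                live / (1024 * 1024), suggested / (1024 * 1024));
    }
}
```

A single sample can be misleading, so it is worth repeating the measurement at several points during peak load and taking the largest value.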
I converted our Lamborghini to EV and used carbon fibre. Let's go! Oh wait, we need to 're-enchant and bless' the roads first.
Yeah, every runtime, if you want to squeeze performance out of it and run everything on one big server like it sounds like they're doing.
Java is a language built up of failed experiments in language design.

My full rant about it is https://blog.habets.se/2022/08/Java-a-fractal-of-bad-experim...

It's partially my subjective opinion, but it seems that almost every single language design decision that Java made was, in retrospect, a bad one.

Not that I could have done better. It's just that none of it panned out.

Eh... your hate is blinding you.

Java was the first mainstream garbage-collected language. Not the first GC language, but the first one to get serious traction. It started the post-C++ era.

That was a pretty bold design decision for the time, and one that worked out. The VM was another big one, and it also worked out.

I did say "almost" every single language decision.

GC wasn't a failed experiment. OOP extremism was. And the latter is a bigger Java standout than GC.

Python predates Java and is still mainstream. It is (mostly) reference counted rather than garbage collected, but as far as language design goes that's not a big difference. There have been GC'd Python implementations, proving that.

Being successful is not really a language design choice, so I wouldn't count that.

> The VM was another big one, and it also worked out.

As I explain in the blog post, the VM was a colossally bad mistake. The inspiration for the VM sounds like it was better. So looks like they took the ball and started running in the wrong direction.

I think that Java made three wrong decisions.

1. Checked exceptions.

2. Crusade against properties.

3. Bad APIs are not being actively deprecated. StringBuffer, Vector, Date, Calendar, File. Replacements exist, but the old classes are still in core.

Also, I think the Java community is crazy with its love of Spring Boot. Like Spring is not magic enough, let's put in more magic to autoconfigure its magic. But that's not a failure of Java.

Also, Java's rise to popularity coincided with the rise of "design-patterns-everywhere". Nothing against them where they fit, but there was a time when 'design-patterns' were the true gospel and Java was where you went to design-pattern-church to practise your religion, every day, everywhere!
That's not quite fair. They made 1 good design decision (killing pointers).
My graph of null pointer exceptions per second would disagree with you. It's something so common that "NPE" is a common term used when using Java.

But maybe you're saying that the decision was good, but that they failed to actually do it?

Are you running the whole of Lichess on one machine, or is this one shard only? 30 GB of RAM for one instance of the application seems very high. (Sorry, I did not have time to read the whole article yet.)
That's the one and only server running https://github.com/lichess-org/lila, i.e. the main Scala JVM application.

See https://lichess.org/source for a list of all the services with a more-or-less up to date diagram.

I for one, am just glad we got rid of all the magic-numbers in CS :P