by defatigable 1003 days ago
This is a niche case, but I spent months trying to upgrade one of our services from one LTS version to the next (I forget which). We encountered a weird bug where services running on the latest JRE would mysteriously corrupt fields when deserializing thrift messages, but only after running for a little while.

After an enormously unpleasant debugging cycle, we realized that the JIT compiler was incorrectly eliminating a call to System.arraycopy, which meant that some fields were left uninitialized. This only happened once the code was JIT compiled; non-optimized code ran fine.

This left us with four possible upgrade paths:

* Upgrade thrift to a newer version and hope that JIT compilation works well on it. But this is a nightmare since A) thrift is no longer supported, and B) new versions of thrift are not backwards compatible so you have to bump a lot of dependent libraries and update code for a bunch of API changes (in a LARGE number of services in our monorepo...). With no guarantee that the new version would fix the problem.

* File a bug report and wait for a minor version fix to address the issue.

* Skip this LTS release and hope the JIT bug is fixed in the next one.

* Disable JIT compilation for the offending functions and hope the performance hit is negligible.
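
For anyone wondering what the last option looks like in practice, here is a rough sketch. HotSpot's -XX:CompileCommand=exclude flag keeps a named method in the interpreter. The class and method names below (CopyCheck, copyField) are made up for illustration and are not our actual Thrift code path; the loop is just a crude way to get the method hot and compare results.

```java
import java.util.Arrays;

public class CopyCheck {
    // Hypothetical stand-in for the hot deserialization method that
    // copies a field's bytes into a freshly allocated array.
    static byte[] copyField(byte[] src) {
        byte[] dst = new byte[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // Calls copyField repeatedly so the JIT has a chance to compile it,
    // and counts how many copies differ from the source.
    static int countMismatches(int iterations) {
        byte[] src = {1, 2, 3, 4, 5, 6, 7, 8};
        int mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            if (!Arrays.equals(src, copyField(src))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Run normally:           java CopyCheck
        // Exclude one method:     java -XX:CompileCommand=exclude,CopyCheck::copyField CopyCheck
        // Interpreter only:       java -Xint CopyCheck
        System.out.println("mismatches: " + countMismatches(200_000));
    }
}
```

On a healthy JVM all three invocations should report zero mismatches. The point of comparing them is that a miscompilation would only show up in the runs where the method is actually JIT compiled, and the exclude run also tells you roughly how much performance you give up.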

I ultimately left the company before the fix was made, but I think we were leaning towards the last option (hopefully filing a bug report, too...).

There's no way this is the normal reason companies don't bump JRE versions as soon as they come out, but it's happened at least once. :-)

In general there's probably some decent (if misguided) bias towards "things are working fine on the current version, why risk some unexpected issues if we upgrade?"

2 comments

I encountered a weird bug with deserializing JSON in a JRuby app during an OpenJDK upgrade - it would sporadically throw a parse error for no apparent reason. I was upgrading to OpenJDK 15, but another user experienced the same regression with an LTS upgrade from 8 to 11.

My own investigation led to this quite satisfying thread on hotspot-compiler-dev, in which an engineer starts with my minimal reproduction of the problem and posts a workaround within 24 hours: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2021...

There's also a tip there: try a fastdebug build and see if you can convert it into an assertion failure you can look up.

fastdebug is a good tip, thanks for sharing!
did you work for a very large rideshare company by any chance?