| This is a niche case, but I spent months trying to upgrade one of our services from one LTS version to the next (I forget which). We encountered a weird bug where services running on the latest JRE would mysteriously corrupt fields when deserializing thrift messages, but only after running for a little while. After an enormously unpleasant debugging cycle, we realized that the JIT compiler was incorrectly eliminating a call to System::arrayCopy, which meant that some fields were left uninitialized. But only when JIT compiled, non-optimized code ran fine. This left us with three possible upgrade paths: * Upgrade thrift to a newer version and hope that JIT compilation works well on it. But this is a nightmare since A) thrift is no longer supported, and B) new versions of thrift are not backwards compatible so you have to bump a lot of dependent libraries and update code for a bunch of API changes (in a LARGE number of services in our monorepo...). With no guarantee that the new version would fix the problem. * File a bug report and wait for a minor version fix to address the issue. * Skip this LTS release and hope the JIT bug is fixed in the next one. * Disable JIT compilation for the offending functions and hope the performance hit is negligible. I ultimately left the company before the fix was made, but I think we were leaning towards the last option (hopefully filing a bug report, too...). There's no way this is the normal reason companies don't bump JRE versions as soon as they come out, but it's happened at least once. :-) In general there's probably some decent (if misguided) bias towards "things are working fine on the current version, why risk some unexpected issues if we upgrade?" |
The end result of my own investigation led to this quite satisfying thread on hotspot-compiler-dev, in which an engineer starts with my minimal reproduction of the problem and posts a workaround within 24 hours: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2021...
There's also a tip there: try a fastdebug build and see if you can convert it into an assertion failure you can look up.