Hacker News
by pilif 5103 days ago
I would love to see what's really causing this bug. Over the weekend we read again and again that we should either reboot or just run that date command, but nobody is telling us what's causing the problem.

Also, seeing that other threaded applications had similar problems, I doubt this is a Java issue. More likely it's a pthread, glibc or even kernel issue.

4 comments

The patch that was shared on the LKML gives some insight into what is causing the issue: https://lkml.org/lkml/2012/7/1/27

Apparently the issues might be due "to the leapsecond being added without calling clock_was_set() to notify the hrtimer subsystem of the change". A possible fix is to patch kernel/time/timekeeping.c to be leap-second aware.
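
For anyone who wants to see the mechanism in miniature, here is a toy Python model of what that quote describes, as far as I understand it. It is not kernel code: ToyTimekeeper, ToyHrtimers and all their methods are names made up for illustration, and the numbers are arbitrary. The only idea borrowed from the kernel is the assumption that the timer side keeps its own cached copy of the wall-clock offset and needs a clock_was_set()-style notification to refresh it.

```python
from dataclasses import dataclass


@dataclass
class ToyTimekeeper:
    """Toy model: wall time = monotonic time + offset (all in seconds)."""
    monotonic: float = 0.0
    wall_offset: float = 1000.0  # arbitrary starting offset (assumption)

    def wall_time(self) -> float:
        return self.monotonic + self.wall_offset

    def insert_leap_second(self) -> None:
        # Inserting a leap second repeats one wall-clock second, so the wall
        # clock steps back by 1 s relative to the monotonic clock.
        self.wall_offset -= 1.0


class ToyHrtimers:
    """Hypothetical timer subsystem that caches the wall-clock offset."""

    def __init__(self, tk: ToyTimekeeper) -> None:
        self.tk = tk
        self.cached_offset = tk.wall_offset

    def clock_was_set(self) -> None:
        # Stand-in for the notification named in the quote: refresh the cache.
        self.cached_offset = self.tk.wall_offset

    def relative_wall_timeout(self, timeout: float) -> float:
        """Return how long (in monotonic seconds) the timer really waits."""
        expiry_wall = self.tk.wall_time() + timeout
        # The conversion to monotonic time uses the cached offset.
        expiry_mono = expiry_wall - self.cached_offset
        return max(0.0, expiry_mono - self.tk.monotonic)


tk = ToyTimekeeper()
timers = ToyHrtimers(tk)

before = timers.relative_wall_timeout(0.5)
tk.insert_leap_second()  # leap second applied, timers never notified
stale = timers.relative_wall_timeout(0.5)
timers.clock_was_set()   # notification arrives, cache is refreshed
fixed = timers.relative_wall_timeout(0.5)

print(before, stale, fixed)  # 0.5 0.0 0.5
```

In this model, any wall-clock timeout shorter than a second returns immediately while the cached offset is stale. A thread that loops on such a timeout would then spin, which would fit the high-CPU reports, though the model only illustrates the idea and proves nothing about the real kernel.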

There is a good explanation here: http://serverfault.com/q/403732/58037

That's predominantly about the kernel crash, not the high-CPU futex issue. One of the most maddening things about this is that there have been several different leap-second issues on Linux, which makes it all the harder to find information about any one of them.

This seems like the best explanation I've found so far: https://lkml.org/lkml/2012/7/1/203

Agreed. It also clearly accounts for the futex-related load issues, and it even includes nice, readable C code that lets you see the problem happening.

This explains it for me. Thanks a lot for the pointer.