
Disclaimer: not a kernel dev, opinion based upon very cursory inspection.

The patch references the "scheduler clock," which is a high-speed, high-resolution monotonic clock used to schedule future events. For example, a network card driver might need to reset a chip, wait 2 milliseconds, and then do another initialization step. It can use the scheduler to cause the second step to be executed 2 milliseconds in the future; the "scheduler clock" is the alarm clock for this purpose.
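To make the "alarm clock" idea concrete, here is a rough user-space sketch in Python, not kernel code. `TimerQueue`, `FakeClock`, and `init_chip` are names I made up for illustration; the only point is that a delayed step is a callback stored with a deadline taken from a monotonic clock.

```python
import heapq
import itertools


class FakeClock:
    """Hypothetical stand-in for a monotonic clock, counted in nanoseconds."""

    def __init__(self):
        self.now_ns = 0

    def __call__(self):
        return self.now_ns

    def advance(self, ns):
        self.now_ns += ns


class TimerQueue:
    """Toy event scheduler driven by a caller-supplied monotonic clock."""

    def __init__(self, clock):
        self._clock = clock            # callable returning the current time in ns
        self._heap = []                # entries are (deadline_ns, seq, callback)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def call_later(self, delay_ns, callback):
        deadline = self._clock() + delay_ns
        heapq.heappush(self._heap, (deadline, next(self._seq), callback))

    def run_due(self):
        """Run every callback whose deadline has passed; return how many ran."""
        ran = 0
        now = self._clock()
        while self._heap and self._heap[0][0] <= now:
            _, _, callback = heapq.heappop(self._heap)
            callback()
            ran += 1
        return ran


def init_chip(clock, log):
    """Reset the (imaginary) chip now and queue the second step 2 ms later."""
    timers = TimerQueue(clock)
    log.append("reset")
    timers.call_later(2_000_000, lambda: log.append("init step 2"))
    return timers
```

Nothing runs until the clock reaches the stored deadline, so everything depends on that clock being trustworthy.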

Measuring the "current time" is pretty complicated when you're dealing with multi-core, variable-frequency processors, need a precise measurement, and can't afford to slow things down. The "scheduler clock" code fuses together time sources and elapsed-time indicators to provide an estimated current time which has certain guarantees (such as: code running on a particular core will never see time go backwards, the estimate will be accurate within particular limits, and reading it won't need global locks). The sources and elapsed-time indicators it has available vary by computer architecture, vendor, and chip family; therefore the exact behavior on an Intel core 5 will differ from that of an Arm M7.
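Here is a toy model of those guarantees in Python. This is my own simplification, not the kernel's algorithm: `PerCoreClock`, `raw_source`, and `max_step_ns` are hypothetical names. Each core keeps only its own last value, so no shared state is touched, and each new reading is clamped so it never goes backwards and never jumps ahead by more than a fixed bound.

```python
class PerCoreClock:
    """Toy per-core clock filter: monotonic per core, with a bounded forward step."""

    def __init__(self, raw_source, max_step_ns):
        self._raw = raw_source           # callable: core_id -> raw reading in ns
        self._max_step_ns = max_step_ns  # largest forward jump accepted per read
        self._last = {}                  # core_id -> last value returned on that core

    def read(self, core_id):
        raw = self._raw(core_id)
        last = self._last.get(core_id)
        if last is None:
            # First read on this core: nothing to compare against yet.
            value = raw
        else:
            # Never go backwards, and never run ahead of last + max_step_ns.
            value = min(max(raw, last), last + self._max_step_ns)
        self._last[core_id] = value
        return value
```

With raw readings of 100, 90, 150, and 10000 on one core and `max_step_ns=1000`, the filtered values are 100, 100, 150, and 1150. A second core is unaffected because it has its own entry.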

The patch in question changes the behavior of local_time(); this is the function used by code which wants to know what the current time is on its particular core. The patch tries to make local_time() return a sane value if the scheduler clock hasn't been fully initialized but is at least running.
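As I read it, the intended behavior looks roughly like the sketch below. This is a guess at the shape of the logic, not the patch itself: the three-state `ClockState` enum, the `raw_clock` and `filtered_clock` callables, and the choice to return 0 while the clock is stopped are all my assumptions.

```python
from enum import IntEnum


class ClockState(IntEnum):
    STOPPED = 0      # clock not started yet
    RUNNING = 1      # raw source is ticking, per-core state not set up yet
    INITIALIZED = 2  # fully initialized


def local_time(state, raw_clock, filtered_clock):
    """Return the best time estimate available for the current state."""
    if state == ClockState.INITIALIZED:
        return filtered_clock()  # normal path: the per-core filtered value
    if state == ClockState.RUNNING:
        return raw_clock()       # assumed fallback: the unfiltered reading
    return 0                     # assumed: nothing usable yet
```

The middle branch is the delicate one: it is only safe if "running" really means the raw source can already be trusted.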

As you can imagine, there are a lot of things that can go wrong with that. I think the problem is that sched_clock_init_late() is marking the clock as "running" before it should. I could very well be wrong. Regardless, it's pretty clear that there's some kind of architecture-dependent clock initialization race condition that once in a while gets triggered.