by dimtion
1831 days ago
> In the Facebook bug listed in the post this specific mitigation would probably not have been enough since the bug was due to invalid instructions emitted by the JIT. Under exact same workloads, two duplicate JIT running on two duplicate CPU would have most likely emitted the same erroneous code.
That's not how I understood the blog post.
> Next they needed to understand the specific sequence of instructions causing the corruption. This turned out to be as much of a nightmare as anything else in the story. The application, like most similar applications in hyperscale environments, ran in a virtual machine that used Just-In-Time compilation, rendering the exact instruction sequence inaccessible. They had to use multiple tools to figure out what the JIT compiler was doing to the source code, and then finally achieve an assembly language test:
>> The assembly code accurately reproducing the defect is reduced to a 60-line assembly level reproducer. We started with a 430K line reproducer and narrowed it down to 60 lines.
It sounds like the JIT produced correct (although hard to find) machine code. The CPU then executed that machine code incorrectly, but only when it ran on core 59.
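
If that reading is right, running the same deterministic code on every core and comparing the results should flag the bad core. Here is a rough sketch of that idea, not the reproducer from the post: `workload`, `check_cores` and `suspicious_cores` are names I made up, the workload is an arbitrary stand-in for the real instruction sequence, and pinning relies on `os.sched_setaffinity`, which is Linux-only (the sketch just runs unpinned elsewhere).

```python
import hashlib
import os


def workload(seed: int = 12345, rounds: int = 20000) -> str:
    """Deterministic mix of integer and float math, summarized as a digest.

    Hypothetical stand-in for the real reproducer: a healthy core should
    always produce the same digest for the same arguments.
    """
    h = hashlib.sha256()
    x = seed
    for i in range(1, rounds + 1):
        x = (x * 6364136223846793005 + 1442695040888963407) % (1 << 64)
        h.update(repr((x, pow(1.1, i % 50))).encode())
    return h.hexdigest()


def check_cores(cores=None):
    """Run the workload pinned to each core and return {core: digest}."""
    can_pin = hasattr(os, "sched_setaffinity")  # Linux-only API
    if cores is None:
        cores = sorted(os.sched_getaffinity(0)) if can_pin else [0]
    original = os.sched_getaffinity(0) if can_pin else None
    results = {}
    try:
        for core in cores:
            if can_pin:
                os.sched_setaffinity(0, {core})  # pin this process to one core
            results[core] = workload()
    finally:
        if can_pin:
            os.sched_setaffinity(0, original)  # restore the original affinity
    return results


def suspicious_cores(results):
    """Return the cores whose digest differs from the majority digest."""
    digests = list(results.values())
    majority = max(set(digests), key=digests.count)
    return [core for core, digest in results.items() if digest != majority]


if __name__ == "__main__":
    print(suspicious_cores(check_cores()))
```

A check like this only catches a defect that the chosen workload happens to trigger, which is presumably why narrowing the real case down to a 60-line reproducer took so much effort.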