| This is very much worth watching. It is a tour de force. Laurie does an amazing job of reimagining Google's strange job optimisation technique (for jobs running on hard disk storage), which runs the same job on two machines at once. The technique simply takes the result from whichever machine finishes first and discards the slower one's result. It seems expensive in resources, but it works, and it lets high-priority tasks avoid the occasional slow response. Laurie reimagines this process, but for RAM!! To do this she has to deal with cores, RAM channels and other largely undocumented CPU memory management features. She was even able to work out various undocumented CPU/RAM settings by using her tool to find where timing differences exposed them. She has now turned "Tailslayer" into a library, available on GitHub: https://github.com/LaurieWired/tailslayer. You can see her having so much fun, doing cool victory dances as she works out ways around each of the issues she finds. The experimentation, explanation and graphing of results are fantastic. Amazing stuff. Perhaps someone will use this somewhere? As mentioned in the YT comments, the work done here is probably a Master's degree's worth of work, experimentation and documentation. Go Laurie! |
Update: found the bypass via the YouTube blurb: https://github.com/LaurieWired/tailslayer
"Tailslayer is a C++ library that reduces tail latency in RAM reads caused by DRAM refresh stalls.
"It replicates data across multiple, independent DRAM channels with uncorrelated refresh schedules, using (undocumented!) channel scrambling offsets that works on AMD, Intel, and Graviton. Once the request comes in, Tailslayer issues hedged reads across all replicas, allowing the work to be performed on whichever result responds first."
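For anyone who wants to see the "first result wins" idea in code, here is a minimal C++ sketch of a hedged read. It is not Tailslayer's API: `hedged_read` and `HedgedResult` are hypothetical names made up for illustration, and the replicas are ordinary buffers, so nothing here places data on separate DRAM channels or deals with refresh timing. It only shows the selection step, where every replica is read in parallel and the first reader to finish claims the result.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical type for this sketch; not part of the Tailslayer library.
struct HedgedResult {
    std::uint64_t value;   // value read by the winning replica
    std::size_t winner;    // index of the replica that finished first
};

// Reads element `index` from every replica in parallel and returns the value
// from whichever reader claims the win first. All replicas are assumed to
// hold identical data and to contain at least `index + 1` elements.
HedgedResult hedged_read(const std::vector<const std::uint64_t*>& replicas,
                         std::size_t index) {
    assert(!replicas.empty());
    constexpr std::size_t kNoWinner = static_cast<std::size_t>(-1);

    std::atomic<std::size_t> winner{kNoWinner};
    std::vector<std::uint64_t> values(replicas.size());  // one slot per reader
    std::vector<std::thread> readers;
    readers.reserve(replicas.size());

    for (std::size_t r = 0; r < replicas.size(); ++r) {
        readers.emplace_back([&, r] {
            // Volatile access so the compiler really performs the load.
            values[r] =
                *static_cast<const volatile std::uint64_t*>(replicas[r] + index);
            // Only the first reader to get here changes `winner`.
            std::size_t expected = kNoWinner;
            winner.compare_exchange_strong(expected, r);
        });
    }

    // Simplification: we join every reader, so this sketch still waits for
    // the slowest one. A latency-sensitive version would act on the winner
    // as soon as it is known instead of waiting here.
    for (auto& t : readers) t.join();

    const std::size_t w = winner.load();
    return {values[w], w};
}
```

Calling `hedged_read({a.data(), b.data()}, 1)` on two identical buffers returns the element at index 1, plus the index of whichever reader won. Spawning a thread per read costs far more than a DRAM stall, so treat this purely as an illustration of the control flow; a real implementation would presumably keep long-lived readers pinned to cores.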