by 6keZbCECT2uB
78 days ago
I like the project: it goes from refresh-induced tail latency to racing threads assigned to addresses that are de-correlated by memory channel. Connecting this to a lookup table that is replicated across memory channels, so the lookup paths can race, makes for a nice narrative. Framing it as reducing tail latency confused me, though, because I was expecting a join where a single reader gets the faster of the two racers. From a narrative standpoint, I agree it makes more sense to focus on a duplicated lookup table where the fastest read wins; from an engineering standpoint, however, framing it in terms of channel de-correlated reads opens up more possibilities. For example, if you need to evaluate multiple ML models in parallel to get a result, then by intentionally partitioning your models by channel you could ensure that each model reads only fast data or only slow data. That said, ML models might not be the most compelling example, since they are good candidates for being resident in L3.
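For the fastest-wins lookup, here is a minimal Python sketch of the control flow only, assuming two identical replicas of the table. Python cannot place a replica on a particular memory channel, so the random sleep in the hypothetical `lookup` helper merely stands in for a read that may or may not stall behind a refresh. `racing_lookup` and `max_delay_s` are illustrative names, not the project's API.

```python
import concurrent.futures as cf
import random
import time


def lookup(replica, key, max_delay_s):
    """Look up `key` in one replica after a random stall.

    Assumption: the random sleep is a stand-in for a read that may collide
    with a DRAM refresh on that replica's channel; real channel placement
    is not modeled here.
    """
    time.sleep(random.uniform(0, max_delay_s))
    return replica[key]


def racing_lookup(replicas, key, max_delay_s=0.01):
    """Issue the same lookup against every replica and return the first answer."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(lookup, r, key, max_delay_s) for r in replicas]
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        # Replicas are identical, so any finished racer holds a valid answer.
        return next(iter(done)).result()
    finally:
        # Don't wait for the slower racers; their results are simply ignored.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    table = {i: i * i for i in range(100)}
    print(racing_lookup([dict(table), dict(table)], 7))
```

Because every replica holds the same data, whichever racer finishes first can answer, and the losers are discarded.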