I wonder if there will be a hardware solution in the future that duplicates memory across multiple channels and transparently returns whichever copy responds first, without needing threads racing each other in software.
Something like that existed in servers in the past: my R720xd had a RAM mirroring mode. IDK if it used it to reduce latency, but you could pull a stick and the server would keep running as normal and raise an alarm in iDRAC.
No, as far as I can tell, it does not reduce latency for reads. Write latency is worse in both the average and the worst case, since every write has to be sent to two DIMMs. The purpose is high reliability. I believe it's most analogous to RAID1 systems, which generally issue a read to a single device rather than taking whichever of two simultaneous reads completes first.
Source: not only do I have an R720xd (and two regular R720s), I also checked the Intel Xeon E5-2600 v2 reference manuals.
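To make the latency argument concrete, here is a toy simulation, not a model of any real memory controller. It assumes each DIMM access costs a fixed base latency plus independent exponentially distributed jitter (arbitrary units, my assumption, not something from the Intel manuals). A mirrored write has to wait for both copies, so it costs the max of the two; a hypothetical "first result wins" read would cost the min; a plain mirrored read only ever touches one copy.

```python
import random

def simulate(trials=100_000, base=1.0, jitter_mean=1.0, seed=0):
    """Average latency for three access patterns (toy model, arbitrary units)."""
    rng = random.Random(seed)
    single = mirrored_write = hedged_read = 0.0
    for _ in range(trials):
        # Independent latencies of the two mirrored copies (assumed distribution).
        a = base + rng.expovariate(1.0 / jitter_mean)
        b = base + rng.expovariate(1.0 / jitter_mean)
        single += a                   # read issued to one copy only (RAID1-style)
        mirrored_write += max(a, b)   # write completes when BOTH copies are done
        hedged_read += min(a, b)      # hypothetical read that takes the first reply
    return single / trials, mirrored_write / trials, hedged_read / trials

if __name__ == "__main__":
    single, write, hedged = simulate()
    print(f"single-copy read: {single:.2f}")
    print(f"mirrored write:   {write:.2f}")
    print(f"hedged read:      {hedged:.2f}")
```

Under these assumptions the mirrored write always comes out slower than a single access and the hedged read faster, which is the trade the thread is describing. The real numbers depend entirely on the actual latency distribution.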
Out of my area, but yeah, I have never heard of an optimized read using that. On the surface, it seems like a task much better suited to hardware, and there are companies that would probably pay the RAM-per-core penalty to get that low latency jitter.