These tests are only meaningless if you completely misinterpret what they're testing. They aren't a test of the overall reliability of the drives. They're just testing the write endurance (and occasionally the data retention). The wear leveling and garbage collection algorithms are deterministic firmware, so they will have essentially zero variance between different drives of the same model running the same firmware, and there's no need for a large sample of controllers. And each drive itself constitutes a large sample of flash memory, so any random variation in the lifespan of individual NAND cells is already averaged out.
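
To put a rough number on the averaging argument, here is a minimal sketch. Everything in it is an assumption for illustration: the function names are made up, the per-block endurance figures (mean 3000 P/E cycles, standard deviation 300) are invented, and using the mean across blocks is a simplification, since a real drive's end of life depends on when its spare blocks run out. The point is only that the drive-to-drive spread shrinks roughly as 1/sqrt(number of blocks).

```python
import random
import statistics

def simulate_drive_endurance(num_blocks, rng, mean_cycles=3000.0, sd_cycles=300.0):
    """Mean P/E-cycle endurance over all blocks of one hypothetical drive.

    mean_cycles and sd_cycles are assumed values, not measurements.
    """
    blocks = [rng.gauss(mean_cycles, sd_cycles) for _ in range(num_blocks)]
    return statistics.fmean(blocks)

def spread_across_drives(num_drives, num_blocks, seed=0):
    """Standard deviation of drive-level endurance across simulated drives."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    drive_means = [simulate_drive_endurance(num_blocks, rng) for _ in range(num_drives)]
    return statistics.stdev(drive_means)

if __name__ == "__main__":
    # One block per "drive" shows the raw block-to-block variation.
    print("1 block per drive:    ", round(spread_across_drives(20, 1), 1))
    # Thousands of blocks per drive: the variation mostly averages out.
    print("4096 blocks per drive:", round(spread_across_drives(20, 4096), 1))
```

With 4096 blocks per drive the spread between drives should come out around 64 times smaller than the spread between individual blocks, which is why a single drive already behaves like a large sample of NAND.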