> The challenge is selecting the tests that best represent the typical ML/DL use cases for the M1 and comparing it to an alternative such as the V100 using a common toolchain like TensorFlow.

The benchmarks there are actual applications of ML that people use to solve real-world problems. To get a benchmark accepted, you need to convince people that the problem it solves is one a lot of people must solve, and that solving it burns enough cycles worldwide to be useful for designing ML hardware and software.

The hardware and software then get developed to solve these problems fast, which in turn makes real-world applications of ML fast.

Suggesting that the M1 is a solution, and that we now just need to find a problem this solution solves well and add it as a benchmark, is the opposite of how MLPerf works; hardware vendors making exactly this kind of suggestion is the reason MLPerf exists. We already have common ML problems that a lot of people need to solve. Either the M1 is good at those or it isn't. If it isn't, it should get better at them. Being better at problems people don't want or need to solve does not help anybody.