Hacker News
by leedrake5 1028 days ago
I assume that this primarily benefits games and not deep learning, right? The most attractive aspect of the Mac M1 is the huge memory boost. It might not be great for training due to the inability to distribute across multiple cards, but it makes for a great inference engine for Stable Diffusion, LLaMA, and other large models.
4 comments

There are two modern cross-platform GPGPU standards that Apple Silicon can theoretically use or implement - SYCL and Vulkan Compute.

SYCL is Khronos Group's vendor-neutral, high-level programming framework. Application support is limited, but hopefully, with Intel's backing, the situation will gradually improve. Meanwhile, Vulkan Compute sidesteps the entire headache with compute shaders, but I'm not familiar with it in terms of application support.

SYCL can be implemented on top of OpenCL and OpenCL's SPIR-V extension. But it soon turned out that this route is infeasible due to prevalent vendor lock-in that's not going to change anytime soon, so it has largely been abandoned by everyone except Intel and Mesa. Right now SYCL is usually implemented through backends that target each GPU vendor's own API, like ROCm, HIP or CUDA. Doing the same for Metal would be very challenging.

Mesa already has experimental support for OpenCL w/ SPIR-V on Intel and AMDGPU, so theoretically it can be extended to Apple Silicon. The difficulty of implementing OpenCL's SPIR-V extension should be comparable to that of Vulkan compute shaders (which also use SPIR-V). However, OpenCL on Apple Silicon is currently entirely unsupported. The last time I checked, it was on the roadmap.

The only problem with cross-platform standards is that they are never performance portable unless they're so high level that their primitives have already implemented the algorithm X different ways for you.

For any low-level performance programming you need to code to the specific microarchitecture, so the pros of a single programming language/library are limited (you're not getting any code reuse that isn't available in the top-level non-hardware C code anyway) and are often outweighed by the ability to take advantage of the vendor's dedicated extensions provided by their preferred programming mechanism.

This issue was well modeled by OpenCL, which never really caught on for programming Nvidia GPUs for this reason.

Correct. You need a CUDA, ROCm, or MPS (native to macOS) backend for running deep learning. I found it relatively easy to train a PyTorch model on a beefy server with CUDA and run inference on my MacBook Air.
MPS (Metal Performance Shaders) is a Metal shader library rather than a programming language; the language would be MSL (Metal Shading Language, like GLSL/HLSL).
The compute shader portion is a good step but it's still not going to provide the interfaces most of these deep learning tools expect.
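As a rough sketch of the train-on-CUDA, infer-on-Mac workflow mentioned above: PyTorch exposes CUDA (and ROCm builds, which reuse the "cuda" device name) through torch.cuda.is_available(), and the Metal backend through torch.backends.mps.is_available(). The helper names pick_device and detect_device, and the model.pt checkpoint path in the comments, are hypothetical and only for illustration.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the PyTorch device string to prefer, given what is available."""
    if cuda_available:  # ROCm builds of PyTorch also report themselves as "cuda"
        return "cuda"
    if mps_available:   # Apple Silicon GPU via Metal Performance Shaders
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe the local PyTorch install; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch releases have no torch.backends.mps, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )


if __name__ == "__main__":
    device = detect_device()
    print(f"using device: {device}")
    # Assumed usage with a hypothetical checkpoint saved on the CUDA server:
    #   state = torch.load("model.pt", map_location=device)
    #   model.load_state_dict(state)
    #   model.to(device).eval()
```

Passing map_location when loading is what lets weights saved from a CUDA machine be loaded on a machine that has no CUDA device at all.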

That said, eiln wrote an ANE (Apple Neural Engine) driver, which enables using the dedicated hardware for this instead of the GPU. It is set to be merged into linux-asahi in the future.

TensorFlow Lite does indeed support OpenGL ES.