HN Mirror

Just to clarify: are you trying to compete with Nvidia, or with Intel? If you're going against GPUs, is your chip something that can run neural networks (better than Nvidia)?

trsohmers 3621 days ago

Short answer: If we were to implement SIMD FP16 support, similar to the planned dual FP32 mode in our FP64 FPU, we could easily match GPU performance by throwing more cores at the problem while still being more efficient. Neural nets and machine learning are interesting, and we could potentially enable them in new forms, since we can provide a desktop GPU's capability in a much smaller, lower-power form factor, but that is not our main focus. As the other commenter noted, there are ASICs that do a good job at that. Since we are more generally programmable than those sorts of ASICs, though, we would be able to handle changes in algorithms over time, while some of them may not.

The more interesting problems for us are things that GPUs can't do well, such as level 1 (vector) and level 2 (matrix-vector) BLAS operations. While most GPUs (and CPUs when utilizing SIMD instructions) only get a couple of percent of their level 3 (matrix-matrix) performance on level 1 and level 2 BLAS, we are equally performant across all three (and at a very high percentage of theoretical peak).
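One common explanation for the gap described above is arithmetic intensity: level 1 and level 2 BLAS do very few floating-point operations per byte moved, so on a conventional memory hierarchy they tend to be limited by bandwidth rather than by compute. The rough Python sketch below illustrates that with a simple roofline-style estimate. The function names, the choice of axpy/gemv/gemm as representatives, the traffic model (each operand read from memory once), and the machine numbers (`peak_flops`, `bandwidth`) are all assumptions made for illustration; they are not figures for the chip discussed here or for any particular GPU.

```python
def blas_intensity(n, bytes_per_elem=8):
    """Flops per byte for one representative op per BLAS level (n x n, FP64 by default).

    Assumes an idealized traffic model: every operand is read from memory
    once and the result is written once.
    """
    # Level 1, axpy: y = a*x + y. 2n flops; read x, read y, write y.
    axpy = (2 * n) / (3 * n * bytes_per_elem)
    # Level 2, gemv: y = A@x + y. 2n^2 flops; read A, read x, read y, write y.
    gemv = (2 * n * n) / ((n * n + 3 * n) * bytes_per_elem)
    # Level 3, gemm: C = A@B + C. 2n^3 flops; read A, B, C and write C.
    gemm = (2 * n ** 3) / (4 * n * n * bytes_per_elem)
    return {"axpy": axpy, "gemv": gemv, "gemm": gemm}


def fraction_of_peak(intensity, peak_flops, bandwidth):
    """Roofline estimate: attainable flops = min(peak, intensity * bandwidth)."""
    return min(peak_flops, intensity * bandwidth) / peak_flops


if __name__ == "__main__":
    # Hypothetical machine: 1 TFLOP/s peak, 100 GB/s memory bandwidth.
    peak_flops, bandwidth = 1e12, 1e11
    for name, ai in blas_intensity(1024).items():
        frac = fraction_of_peak(ai, peak_flops, bandwidth)
        print(f"{name}: {ai:.3f} flops/byte, {frac:.1%} of peak")
```

Under these assumed numbers, axpy and gemv land at roughly one to a few percent of peak while gemm is compute-bound, which is in the same ballpark as the "couple of percent" figure quoted above. Real hardware with caches, blocking, and different peak-to-bandwidth ratios will shift the exact values.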

Interesting. Which applications require vector-vector or matrix-vector operations as opposed to matrix-matrix?

Also, custom ASICs are the current state of the art for NN.

edit: missing word

Which custom ASICs are you talking about?

I'm referring to Google's TPU.

I couldn't find any details about that chip. How do you know it's state of the art?

I only know what is publicly known. It was discussed a while ago on HN. I think that Google claims the best performance per watt and manages to do that with specialized 8-bit floating-point ALUs. I don't think it has been made publicly available yet, so the claims lack third-party verification.

VLIW architectures have been used very successfully as DSPs for a long time; I do not think anybody is debating that. It is outside that niche that they have repeatedly been found lacking.

I'm sure your architecture would work fine for a subset of HPC problems, like those that are currently run on a traditional GPGPU, but even in the HPC world many problems are ill-suited to a GPU (think particle transport).