HN Mirror

Hacker News


	by p1esk 3621 days ago
	Just to clarify: are you trying to compete with Nvidia or with Intel? If you're going up against GPUs, can your chip run neural networks (better than Nvidia's)?

2 comments

trsohmers 3621 days ago

Short answer: if we were to implement SIMD FP16 support, similar to the dual-FP32 mode we have planned for our FP64 FPU, we could easily match GPU performance by throwing more cores at the problem while still being more efficient. Neural nets and machine learning are interesting, and we could potentially enable them in new forms, since we can provide a desktop GPU's capability in a much smaller, lower-power form factor, but they are not our main focus. As the other commenter noted, there are ASICs that do a good job at that. However, because we are more generally programmable than those sorts of ASICs, we would be able to handle changes in algorithms over time, which some of them may not be able to do.
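
A rough sketch of the packed-SIMD idea mentioned above, assuming the FP64 datapath is simply split into narrower lanes (2 x FP32 or 4 x FP16 per 64-bit word). The 4 x FP16 split, the one-FMA-per-lane-per-cycle rate, and the core count and clock used in the example are all hypothetical illustration values, not details of the chip being discussed.

```python
import struct

# Hypothetical: a 64-bit FPU datapath that can be split into narrower SIMD lanes.
FPU_WIDTH_BITS = 64

def lanes(element_bits: int, datapath_bits: int = FPU_WIDTH_BITS) -> int:
    """How many packed elements of `element_bits` fit in one datapath word."""
    return datapath_bits // element_bits

def pack_word(values, fmt_char: str) -> bytes:
    """Pack floats into one little-endian word ('d' = FP64, 'f' = FP32, 'e' = FP16)."""
    return struct.pack("<" + fmt_char * len(values), *values)

def peak_gflops(cores: int, clock_ghz: float, element_bits: int,
                flops_per_lane: int = 2) -> float:
    """Peak GFLOP/s, assuming one fused multiply-add (2 FLOPs) per lane per cycle per core."""
    return cores * clock_ghz * lanes(element_bits) * flops_per_lane

# One 8-byte word holds 1 x FP64, 2 x FP32, or 4 x FP16 values.
words = {
    64: pack_word([1.0], "d"),
    32: pack_word([1.0, 2.0], "f"),
    16: pack_word([1.0, 2.0, 3.0, 4.0], "e"),
}

if __name__ == "__main__":
    for bits, word in words.items():
        # Hypothetical chip: 256 cores at 1 GHz.
        print(f"FP{bits}: {len(word)} bytes, {lanes(bits)} lane(s), "
              f"{peak_gflops(cores=256, clock_ghz=1.0, element_bits=bits):.0f} GFLOP/s peak")
```

With these made-up numbers, each halving of element width doubles peak throughput (512, 1024 and 2048 GFLOP/s for FP64, FP32 and FP16), and peak scales linearly with the core count, which is the "throw more cores at it" lever.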

The more interesting problems for us are the ones GPUs can't do well, such as level 1 (vector-vector) and level 2 (matrix-vector) BLAS operations. Most GPUs (and CPUs using SIMD instructions) achieve only a few percent of their level 3 (matrix-matrix) performance on level 1 and level 2 BLAS, whereas we are equally performant across all three (and at a very high percentage of theoretical peak).
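
One standard way to see why level 1 and level 2 BLAS fall so far below peak on conventional hardware is arithmetic intensity (FLOPs per byte of memory traffic) combined with the roofline model. The sketch below counts FP64 FLOPs and the minimum bytes moved for axpy, gemv and gemm; the accelerator numbers (5,000 GFLOP/s peak, 300 GB/s bandwidth) are hypothetical, and the sketch says nothing about how the commenter's chip avoids this bottleneck.

```python
def blas_intensity(n: int, bytes_per_elem: int = 8) -> dict:
    """(flops, bytes moved, flops per byte) for one representative op per BLAS level.

    Byte counts assume every operand is read once and every output written once
    (ideal caching), so they are a lower bound on real memory traffic.
    """
    ops = {
        # Level 1, axpy: y <- a*x + y. n multiply-adds; read x and y, write y.
        "axpy (level 1)": (2 * n, 3 * n * bytes_per_elem),
        # Level 2, gemv: y <- A*x + y. n*n multiply-adds; read A, x, y, write y.
        "gemv (level 2)": (2 * n * n, (n * n + 3 * n) * bytes_per_elem),
        # Level 3, gemm: C <- A*B + C. n**3 multiply-adds; read A, B, C, write C.
        "gemm (level 3)": (2 * n ** 3, 4 * n * n * bytes_per_elem),
    }
    return {name: (f, b, f / b) for name, (f, b) in ops.items()}

def roofline_fraction(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Attainable fraction of peak under the roofline model: min(peak, BW * intensity) / peak."""
    return min(peak_gflops, bandwidth_gbs * intensity) / peak_gflops

if __name__ == "__main__":
    # Hypothetical accelerator: 5,000 GFLOP/s FP64 peak, 300 GB/s memory bandwidth.
    PEAK, BW = 5000.0, 300.0
    for name, (flops, nbytes, ai) in blas_intensity(4096).items():
        print(f"{name}: {ai:7.3f} FLOP/byte -> "
              f"{100 * roofline_fraction(ai, PEAK, BW):5.1f}% of peak")
```

For n = 4096, axpy and gemv come out at roughly 0.5% and 1.5% of peak on this made-up machine, because their intensity stays near 1/12 and 1/4 FLOP/byte regardless of n, while gemm's intensity grows as n/16 and it reaches the compute roof.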

p1esk 3621 days ago

Interesting. Which applications require vector-vector or matrix-vector operations as opposed to matrix-matrix?

gpderetta 3621 days ago

Also, custom ASICs are the current state of the art for NN.

edit: missing word

p1esk 3621 days ago

Which custom ASICs are you talking about?

gpderetta 3621 days ago

I'm referring to Google's TPU.

p1esk 3621 days ago

I couldn't find any details about that chip. How do you know it's state of the art?

gpderetta 3621 days ago

I only know what is publicly known. It was discussed a while ago on HN. I think Google claims the best performance per watt and achieves it with specialized 8-bit floating-point ALUs. I don't think it has been made publicly available yet, so the claims lack third-party verification.
