Hacker News
by nomel 23 days ago
You didn't touch on the most important aspect for cost: die area!

How much die space ($) will that circuitry add, given that it probably has a statistically near-zero chance of mattering for your main customers' workloads (who has model weights of 0 or 1!?)? And, if you can stomach the cost, what else could you put there instead?

3 comments

Weights should not be 0 (at least not frequently) but in a ReLU-based neural network, activations are 0 pretty often. You're absolutely right about die area though.
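
To put a rough number on "pretty often", here is a minimal sketch. It assumes pre-activations are zero-mean Gaussian, which is my simplification and not something measured on a real model, so treat the result as illustrative only. The helper names `relu` and `zero_fraction` are made up for this example.

```python
import random

def relu(x):
    # ReLU clamps every non-positive input to exactly zero.
    return x if x > 0.0 else 0.0

def zero_fraction(values):
    # Share of entries that are exactly zero.
    values = list(values)
    return sum(1 for v in values if v == 0.0) / len(values)

random.seed(0)
# Assumption: zero-mean, unit-variance pre-activations (not real model data).
pre_activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]
activations = [relu(x) for x in pre_activations]
print(f"zero activations: {zero_fraction(activations):.1%}")
```

Under that assumption roughly half the activations come out as exact zeros, while the inputs themselves are essentially never zero. Real networks can land well above or below that depending on biases and normalization.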
> near-zero chance of mattering for your main customers' workloads

What percent of this hardware is running inference for ReLU models? ;)

Nvidia has added structured sparsity support to their GPUs, and nearly every time they pull out a FLOPS or TOPS number, they assume you will use it.
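
For anyone unfamiliar with the pattern: Nvidia's sparse tensor cores use a 2:4 scheme, where at most two of every four consecutive weights are nonzero. Below is a rough sketch of producing that pattern by keeping the two largest-magnitude values in each group. Magnitude-based selection is my assumption for illustration, not a description of Nvidia's tooling, and `prune_2_of_4` is a hypothetical helper name.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned

row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse_row = prune_2_of_4(row)
print(sparse_row)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because the hardware can skip the two known zeros in each group, the quoted peak throughput with sparsity is double the dense figure.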

The die area argument here makes no sense. Supporting structured sparsity can be done either by duplicating the multipliers with and without the support or you have a single general purpose multiplier that does both, in which case you can have twice as many of them.

Also, in ReLU^2 networks, 90%+ of parameters are zero.

> The die area argument here makes no sense.

Any logic you add to the GPU is physical silicon and metal that take up physical space.

> duplicating the multipliers with and without the support or you have a single general purpose multiplier that does both

That would be extra physical logic, which would take up extra physical space on the die. Whether it "can be done" isn't my point; my point is that doing it requires die area.

I expect the degraded critical path will most likely be worse than a bit of die area. On modern processes you have A LOT of transistors to play with.