by hangsi 848 days ago
Neural networks have two different compute costs: training and inference.

These are roughly analogous to compile time vs runtime for compiled programming languages.

Training is generally the more intensive task. However, in the ideal scenario training is run once and inference is run millions of times, so the lifetime cost of inference is larger. This is why it can make sense to optimize for inference.
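To make the lifetime-cost argument concrete, here is a rough sketch. The function names and all the numbers are hypothetical and in arbitrary units; they are not measurements of any real model.

```python
def lifetime_costs(train_cost, cost_per_inference, num_inferences):
    """Return (training cost, total inference cost) in the same arbitrary unit."""
    return train_cost, cost_per_inference * num_inferences


def break_even_inferences(train_cost, cost_per_inference):
    """Number of inference calls at which total inference cost equals training cost."""
    return train_cost / cost_per_inference


# Hypothetical numbers: training costs 1,000,000 units, one inference costs 2 units.
train, infer = lifetime_costs(1_000_000, 2, 5_000_000)
print(train, infer)                         # 1000000 10000000
print(break_even_inferences(1_000_000, 2))  # 500000.0
```

Under these made-up numbers, inference overtakes training after 500,000 calls, so anything served millions of times is dominated by inference cost.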

1 comment

Inference consists of a single forward pass, which is comparatively cheap to compute. Training needs both a forward pass and a backward pass (backpropagation). Training also needs more numeric headroom: gradients and weight updates can be both very large and very small, and the small ones must not round away to zero. That is why bfloat16 tends to be preferred for training, while int8 is often good enough for inference.
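A minimal NumPy sketch of why int8 can be fine for a forward pass but not for small updates. It assumes simple symmetric per-tensor quantization, which is only one of several schemes; the helper names `quantize_int8` and `dequantize` are made up for illustration, and the weights are random rather than from a real model.

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor quantization: one float scale, values stored as int8."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float32 values."""
    return q.astype(np.float32) * scale


# Forward pass: random stand-in weights and inputs.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=(4, 8)).astype(np.float32)
inputs = rng.normal(0.0, 1.0, size=(8,)).astype(np.float32)

q_weights, w_scale = quantize_int8(weights)
exact = weights @ inputs
approx = dequantize(q_weights, w_scale) @ inputs
print("max forward-pass error:", np.max(np.abs(exact - approx)))

# Training-style values: one large and one tiny number in the same tensor.
updates = np.array([1.0, 1e-4], dtype=np.float32)
q_updates, u_scale = quantize_int8(updates)
print(q_updates)  # the tiny value is stored as 0, so that update is lost
```

Each quantized weight is off by at most half a quantization step, which a forward pass usually tolerates. The second part shows the problem for training: a value four orders of magnitude smaller than the largest one in the tensor simply becomes zero.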