by krallistic
1018 days ago
"Inference" - getting the predictions out of the model.
During training you need to run: Input -> Model -> Output (Prediction) -> Compare with True Output (Label) -> Backpropagation of Loss through the Model.
This can be heavily batched and pipelined. (And you have to batch to train in any reasonable amount of time, and GPUs shine in the batch regime.) When a single user request comes in, you just want the prediction for that single input, so there is no backpropagation and no batching, which is more CPU friendly.
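
To make the difference concrete, here is a rough sketch in plain Python, assuming a toy linear model y = w * x + b. The names `predict` and `train_step`, the learning rate and the data are all made up for illustration and don't come from any particular framework. Training loops over a labelled batch and computes gradients; inference is just the forward pass on one input.

```python
# Toy linear model y = w * x + b, written in plain Python (no framework).
# All names and numbers here are hypothetical, chosen only for illustration.

def predict(params, x):
    """Inference: a single forward pass, nothing else."""
    w, b = params
    return w * x + b

def train_step(params, batch, lr=0.01):
    """Training: forward pass, compare with labels, backpropagate, update."""
    w, b = params
    n = len(batch)
    grad_w = grad_b = 0.0
    for x, y_true in batch:
        err = predict(params, x) - y_true  # prediction vs. label
        grad_w += 2 * err * x / n          # d(mean squared error)/dw
        grad_b += 2 * err / n              # d(mean squared error)/db
    return (w - lr * grad_w, b - lr * grad_b)

# Training: many batched steps over labelled data (here y = 2x + 1).
data = [(float(x), 2.0 * x + 1.0) for x in range(-5, 6)]
params = (0.0, 0.0)
for _ in range(2000):
    params = train_step(params, data)

# Inference: one request, one input, forward pass only (no labels, no gradients).
prediction = predict(params, 3.0)
```

The training loop does a forward pass plus gradient work for every example in the batch on every step, which is the part that benefits from GPU-style parallelism. The single `predict` call at the end is all a user request needs.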