|
|
|
|
|
by munro
1491 days ago
|
|
That /sounds/ right, but training still has a forward part, so OP does raise a really great question. And looking at the silicon, the neural engine is almost the size of the GPU. Really need someone educated in this area to chime in :) |
|
(Also, many inference accelerators use lower precision than you do when training)
There are tricks you can do to use inference to accelerate training, such as one we developed to focus on likely-poorly-performing examples: https://arxiv.org/abs/1910.00762