by mirekrusin
1108 days ago
What intermediate results do you need to store? For backpropagation you take the difference between the actual and expected output, go backwards to calculate the derivatives, and apply them with the optimiser - that's 8 extra bytes per trainable parameter for single-precision floats. Why do you need 80?
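
To make the arithmetic concrete, here is a rough sketch of one way to arrive at 8 bytes. The breakdown is an assumption rather than something spelled out above: one 4-byte gradient plus one 4-byte optimiser state value (e.g. a momentum buffer) per parameter. The function name and the `optimizer_slots` parameter are made up for illustration, and activations saved for the backward pass are deliberately not counted.

```python
BYTES_PER_FP32 = 4  # size of a single-precision float


def extra_training_bytes(num_params: int, optimizer_slots: int = 1) -> int:
    """Extra bytes needed on top of the weights themselves.

    Assumption: one fp32 gradient per parameter, plus `optimizer_slots`
    fp32 state values per parameter (1 for plain momentum, 2 for an
    Adam-style optimiser). Stored activations are not counted here.
    """
    grad_bytes = num_params * BYTES_PER_FP32
    state_bytes = num_params * optimizer_slots * BYTES_PER_FP32
    return grad_bytes + state_bytes


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical model size, not from the thread
    print(extra_training_bytes(n) / n)                      # bytes per parameter, one state slot
    print(extra_training_bytes(n, optimizer_slots=2) / n)   # bytes per parameter, two state slots
```

With one state slot this gives 8 bytes per parameter; with two slots it comes to 12. Either way it's far below 80, so the rest would have to come from somewhere else.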