|
|
|
|
|
by pretty_dumm_guy
2005 days ago
|
|
Thank you for your answer. It appears to me that we are trying to achieve an algorithm that has better time complexity than the one that we have right now(reverse mode differentiation with gradient descent). Is it possible to combine these methods in a straight forward manner with methods that try to reduce the space complexity? For example, Lottery ticket hypothesis(https://arxiv.org/abs/1803.03635) seems to reduce spacial complexity(Please do correct me if I am wrong). Also, based on my rather poor and limited knowledge, it appears to me that set of proposed methods that reduced space complexity and set of proposed methods that reduce time complexity are disjoint. Is that the case ? |
|
There is a lot of work on trying to speed up optimization, for example the K-FAC optimizer by Roger Grosse that uses second-order gradient information in a scalable way.
The lottery ticket pruning strategies do reduce space complexity, but I think the main reason people are interested in it is to reduce training time complexity, or deployment memory requirements, but not so much training memory requirements.
As for whether memory-saving and time-saving approaches are disjoint, many methods (like checkpointing) introduce a tradeoff between time and space complexity, so no.