It is the primary method used to train PyTorch and TensorFlow models, and it is therefore essential for training artificial neural networks such as AlexNet. It is also used for non-negative matrix factorization and non-linear regression, and it forms the basis of most modern machine learning algorithms.