by aDyslecticCrow
612 days ago
In Big-O notation, O(2n) = O(n). Being two times slower is actually not that much. If this slowdown results in better inference after the same number of training rounds, or in better-tuned weights with fewer redundant features, that can be a very worthwhile sacrifice. It's also a complex optimization problem, not just a matter of compute: twice the parameters take more than twice the time to tune and twice the working memory to train and use. There are also plenty of model training scenarios where data throughput from the dataset into memory and back out is the real bottleneck anyway. So, though I agree it is indeed a downside, I think it's a worthwhile sacrifice if the results they show are reproducible.
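A rough sketch of the constant-factor point. It makes simplifying assumptions of my own: weights stored as 4-byte float32 values, and activations, gradients, and optimizer state all ignored. Under those assumptions, doubling the parameter count doubles weight memory by the same factor of 2 at every model size. That fixed ratio is exactly why Big-O treats O(2n) and O(n) as the same class.

```python
# Illustration only: assumes 4 bytes (float32) per parameter and ignores
# activations, gradients, and optimizer state.
BYTES_PER_PARAM = 4


def weight_memory_bytes(n_params: int) -> int:
    """Bytes needed just to store n_params weights."""
    return n_params * BYTES_PER_PARAM


for n in (1_000_000, 100_000_000, 7_000_000_000):
    base = weight_memory_bytes(n)
    doubled = weight_memory_bytes(2 * n)
    # The ratio is 2 for every n: a constant factor, not a change in growth rate.
    print(f"n={n:,}  1x={base / 1e9:.3f} GB  2x={doubled / 1e9:.3f} GB  ratio={doubled / base}")
```

A constant factor still costs real hardware and wall-clock time, which is the trade-off being weighed above. It just doesn't change how cost scales with n.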