by Nevermark
872 days ago
I think there is more here than a backward look. The article introduced a discrete algorithm for approximating the gradient optimization model. It would be interesting to optimize the discrete algorithm for both design and inference times, and see whether it offers any space or time advantages over gradient learning. Or whether new ideas pop up as a result of optimization successes or failures. It might also have an advantage when it comes to adjusting the algorithm. For instance, given the most likely responses at each step, discard the most likely one whenever the follow-ups are not too far below it, and see if that reliably avoids copyright issues. It is a lot easier to poke around a discrete algorithm, with zero uncertainty as to what is happening, than vast tensor models.
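
A rough sketch of the "discard the most likely" adjustment, just to make the idea concrete. Everything here is an assumption rather than anything from the article: the function name `pick_next`, the `min_ratio` threshold, and the reading of "not too far below" as a ratio between the runner-up's score and the top score.

```python
def pick_next(candidates, min_ratio=0.8):
    """Pick the next response from a dict of {response: score}.

    Hypothetical rule: if the runner-up scores at least `min_ratio`
    times the top score, discard the top response and return the
    runner-up. Otherwise return the top response.
    """
    if not candidates:
        raise ValueError("no candidates to choose from")

    # Rank responses from most to least likely.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    top_response, top_score = ranked[0]

    if len(ranked) > 1 and top_score > 0:
        runner_response, runner_score = ranked[1]
        # "Not too far below" is assumed to mean a score ratio above the threshold.
        if runner_score / top_score >= min_ratio:
            return runner_response

    return top_response


# Runner-up is close to the top choice, so the top choice is discarded.
print(pick_next({"the": 0.40, "a": 0.35, "an": 0.05}))
# Runner-up is far below, so the top choice is kept.
print(pick_next({"the": 0.90, "a": 0.05}))
```

With a discrete model, a rule like this is a few lines sitting in plain view, and its effect on any given step can be traced by hand. Whether it actually helps with copyright is an open question the sketch does not answer.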