by famouswaffles
1134 days ago
That could count, I suppose, but I don't think it's really the kind of insight Sutton is alluding to in his original writing. Insight in this case would be more like shoehorning in one of the processes humans would use to solve the problem. There are no innate grammar rules the architecture consults before each attempt, no tree or word search, things like that. Polishing the input in that way is neat, but it's not like you can't go character-level or word-level with a transformer. The current approach is just far more compute-efficient; the transformer will figure out the sequence-to-sequence mapping all the same.
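
To make the granularity point a bit more concrete, here is a rough sketch, assuming "polishing the input" refers to how the text is split into tokens. The helpers `char_tokens` and `word_tokens` are hypothetical names for illustration, not any library's API, and the quadratic cost figure is only the usual back-of-the-envelope estimate for self-attention, not a measurement.

```python
def char_tokens(text):
    # Character level: every character, spaces included, is one token.
    return list(text)


def word_tokens(text):
    # Word level: split on whitespace (a deliberately naive assumption).
    return text.split()


sentence = "the cat sat on the mat"
chars = char_tokens(sentence)
words = word_tokens(sentence)

# Same text, very different sequence lengths.
print(len(chars), len(words))

# Self-attention compares every position with every other one, so cost
# grows roughly with the square of sequence length (rough estimate only).
print(len(chars) ** 2, len(words) ** 2)
```

Either sequence carries the same sentence; the shorter one is just cheaper to attend over, which is the compute-efficiency argument rather than a built-in linguistic insight.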