| The first transformer models still worked only from what they had learned on the training set. They were later extended to query an external data source. This is not new: image style transfer and some other image tasks attempted before NNs came to dominate did the same thing (linear models would query a database for help and for guided feature extraction). The biggest gains in transformers come from the attention mechanism combined with self-supervised learning. Investigating self-supervised learning tasks (the article illustrates filling a one-word gap, but there are others) can result in superior models that are sometimes even easier to train. As for SAT and optimization, graph neural networks might end up being more effective (because the inputs are highly structured). I'm definitely looking forward to a traveling salesman solver or similar, guided by an NN, that solves instances faster and reaches optimality more often than optimized heuristic algorithms. |
There was a competition for exactly this at NeurIPS 2021:
https://www.ecole.ai/2021/ml4co-competition/
I'm not sure how much they improved over handcrafted heuristics, but the summary paper may give some insight:
https://arxiv.org/abs/2203.02433