|
So, from the perspective I have within the subfield I work in, explainable AI (XAI), we're seeing a bunch of fascinating developments. First, as you mentioned, Rudin continues to prove that the reason for using AI/ML is that we don't understand the problem well enough; otherwise we wouldn't even think to use it! So, pushing our focus to better understand the problem, and then levy ML concepts and techniques (including "classical AI" and statistical learning), we're able to make something that not only outperforms some state-of-the-art in most metrics, but often even is much less resource intensive to create and deploy (in compute, data, energy, and human labour), with added benefits from direct interpretability and post-hoc explanations. One example has been the continued primacy of tree ensembles on tabular datasets [0], even for the larger datasets, though they truly shine on the small to medium datasets that actually show up in practice, which from Tigani's observations [1] would include most of those who think they have big data. Second, we're seeing practical examples of exactly this outside Rudin! In particular, people are using ML more to do live parameter fine-tuning that outwise would need more exhaustive searches or human labour that are difficult for real-time feedback, or copious human ingenuity to resolve in a closed-form solution. Opus 1.5 is introducing some experimental work here, as are a few approaches in video and image encoding. These are domains where, as in the first, we understand the problem, but also understand well enough that there's search spaces we simply don't know enough about to be able to dramatically reduce. Approaches like this have been bubbling out of other sciences (physics, complexity theory, bioinformatics, etc) that lead to some interesting work in distillation and extraction of new models from ML, or "physically aware" operators that dramatically improve neural nets, such as Fourier Neural Operators (FNO) [2], which embeds FFTs rather than forcing it to be relearned (as has been found to often happen) for remarkable speed-ups with PDEs such as for fluid dynamics, and has already shown promise with climate modelling [3], material science [4]. There are also many more operators, which all work completely differently, yet bring human insight back to the problem, and sometimes lead to extracting a new model for us to use without the ML! Understanding begets understanding, so the "shifting goalposts" of techniques considered "AI" is a good thing! Third, specifically to improvements in explainability, we've seen the Neural Tangent Kernel (NTK) [5] rapidly go from strength to strength since its introduction. While rooted in core explainability vis a vis making neural nets more mathematically tractable to analysis, not only inspiring other approaches [6] and behavioural understanding of neural nets [7, 8], but novel ML itself [9] with ways to transfer the benefits of neural networks to far less resource intensive techniques; which [9]'s RFM kernel machine proves competitive with the best tree ensembles from [0], and even has advantage on numerical data (plus outperforms prior NTK based kernel machines). An added benefit is the approach used to underpin [9] itself leads to new interpretation and explanation techniques, similar to integrated gradients [10, 11] but perhaps more reminiscent of the idea in [6]. Finally, specific to XAI, we're seeing people actually deal with the problem that, well, people aren't really using this stuff! XAI in particular, yes, but also the myriad of interpretable models a la Rudin or the significant improvements found in hybrid approaches and reinforcement learning. Cicero [12], for example, does have an LLM component, but uses it in a radically different way compared to most people's current conception of LLMs (though, again, ironically closer to the "classic" LLMs for semantic markup), much like the AlphaGo series altered the way the deep learning component was utilised by embedding and hybridising it [13] (its successors obviating even the traditional supervised approach through self-play [14], and beyond Go). This is all without even mentioning the neurosymbolic and other approaches to embed "classical AI" in deep learning (such as RETRO [15]). Despite these successes, adoption of these approaches is still very far behind, especially compared to the zeitgeist of ChatGPT style LLMs (and general hype around transformers), and arguably much worse for XAI due to the barrier between adoption and deeper usage [16]. This is still early days, however, and again to harken Rudin, we don't understand the problem anywhere near well enough, and that extends to XAI and ML as problem domains themselves. Things we can actually understand seem a far better approach to me, but without getting too Monkey's Paw about it, I'd posit that we should really consider if some GPT-N or whatever is actually what we want, even if it did achieve what we thought we wanted. Constructing ML with useful and efficient inductive bias is a much harder challenge than we ever anticipated, hence the eternal 20 years away problem, so I just think it would perhaps be a better use of our time to make stuff like this, where we know what is actually going on, instead of just theoretically. It'll have a part, no doubt, Cicero showed that there's clear potential, but people seem to be realising "... is all you need" and "scaling laws" were just a myth (or worse, marketing). Plus, all those delays to the 20 years weren't for nothing, and there's a lot of really capable, understandable techniques just waiting to be used, with more being developed and refined every year. After all, look at the other comments! So many different areas, particularly within deep learning (such as NeRFs or NAS [17]), which really show we have so much left to learn. Exciting! [0]: Léo Grinsztajn et al. "Why do tree-based models still outperform deep learning on tabular data?" https://arxiv.org/abs/2207.08815
[1]: Jordan Tigani "Big Data is Dead" https://motherduck.com/blog/big-data-is-dead/
[2]: Zongyi Li et al. "Fourier Neural Operator for Parametric Partial Differential Equations" https://arxiv.org/abs/2010.08895
[3]: Jaideep Pathak et al. "FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators" https://arxiv.org/abs/2202.11214
[4]: Huaiqian You et al. "Learning Deep Implicit Fourier Neural Operators with Applications to Heterogeneous Material Modeling" https://arxiv.org/abs/2203.08205
[5]: Arthur Jacot et al. "Neural Tangent Kernel: Convergence and Generalization in Neural Networks" https://arxiv.org/abs/1806.07572
[6]: Pedro Domingos "Every Model Learned by Gradient Descent Is Approximately a Kernel Machine" https://arxiv.org/abs/2012.00152
[7]: Alexander Atanasov et al. "Neural Networks as Kernel Learners: The Silent Alignment Effect" https://arxiv.org/abs/2111.00034
[8]: Yilan Chen et al. "On the Equivalence between Neural Network and Support Vector Machine" https://arxiv.org/abs/2111.06063
[9]: Adityanarayanan Radhakrishnan et al. "Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features" https://arxiv.org/abs/2212.13881
[10]: Mukund Sundararajan et al. "Axiomatic Attribution for Deep Networks" https://arxiv.org/abs/1703.01365
[11]: Pramod Mudrakarta "Did the model understand the questions?" https://arxiv.org/abs/1805.05492
[12]: META FAIR Diplomacy Team et al. "Human-level play in the game of Diplomacy by combining language models with strategic reasoning" https://www.science.org/doi/10.1126/science.ade9097
[13]: DeepMind et al. "Mastering the game of Go with deep neural networks and tree search" https://www.nature.com/articles/nature16961
[14]: DeepMind et al. "Mastering the game of Go without human knowledge" https://www.nature.com/articles/nature24270
[15]: Sebastian Borgeaud et al. "Improving language models by retrieving from trillions of tokens" https://arxiv.org/abs/2112.04426
[16]: Umang Bhatt et al. "Explainable Machine Learning in Deployment" https://dl.acm.org/doi/10.1145/3351095.3375624
[17]: M. F. Kasim et al. "Building high accuracy emulators for scientific simulations with deep neural architecture search" https://arxiv.org/abs/2001.08055
|
> Finally, specific to XAI, we're seeing people actually deal with the problem that, well, people aren't really using this stuff!
I am very curious to see which practical interpretability/explainability requirements enter into regulations - on one hand it's hard to imagine a one-size fits all approach, especially for applications incorporating LLMs, but Bordt et al. [1] demonstrate that you can provoke arbitrary feature attributions for a prediction if you can choose post-hoc explanations and parameters freely, making a case that it can't _just_ be left to the model developers either
[1] "Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts", Bordt et al. 2022, https://dl.acm.org/doi/10.1145/3531146.3533153