by jalammar
1986 days ago
I'd love to look at your group's visualizations! Is it a private repo? I ask because the link doesn't open. It never ceases to blow my mind that we can represent words and concepts as vectors of numbers. Thanks for your kind words! It's a labor of passion, honestly. And while in previous years it was a nights-and-weekends project, I have recently been giving it my entire time and focus -- which is why I'm able to dip my toes more heavily into R&D like Ecco and the "Explaining Transformers" article.
[1]: https://twitter.com/Johannes_Welbl/status/106530965474036121...
Work and articles like yours have truly had an impact on me, even though they are largely qualitative. We always say “Turing complete” this and “Turing complete” that, but theoretical statements such as this have little practical utility to me, as we all know that what can be learnt and what is learnt are two very different things. For example, “Visualizing and Understanding Recurrent Networks” by Karpathy et al. (2015) [2], which you list as inspiration, blew my mind with findings such as neurons that monotonically decrease from the sentence start. I remember Karpathy giving a talk on it in London, and what struck me was how he had simply gone and inspected the neurons manually (heresy!), as there were only a few thousand of them anyway. That playfulness is truly admirable.
[2]: https://arxiv.org/abs/1506.02078
Another anecdote, this time about “Attention Is All You Need” by Vaswani et al. (2017) [3]: I was far from sold on Transformers as a model until Uszkoreit gave a talk at an invitation-only summit where he showed those cherry-picked attention heads that “flipped” based on whether an object was animate or not. I approached him after the talk and asked why it was not in the paper, as it was awesome! Maybe I am biased because I give a large role to intuition in science, but analysis such as this is far more valuable to me as a researcher than yet another point of BLEU or a 10th dataset. Again, my bias, but I feel that there is a need for new ways of thinking in terms of both “hard” empiricism and “soft” analysis in machine learning, as we seemingly now have to mature given the attention we are receiving.
[3]: https://arxiv.org/abs/1706.03762
Apologies if I am rambling; it is midnight now and I barely slept last night.