by jalammar 1991 days ago
Hi HN,

Author here. I had been fascinated with Andrej Karpathy's article (https://karpathy.github.io/2015/05/21/rnn-effectiveness/) -- especially where it shows neurons being activated in response to brackets and indentation.

I built Ecco to enable examining neurons inside Transformer-based language models.

You can use Ecco to simply interact with a language model and see its output token by token (as it's built on the awesome Hugging Face transformers package). But more interestingly, you can use it to examine neuron activations. The article explains more: https://jalammar.github.io/explaining-transformers/

I have a couple more visualizations I'd like to add in the future. It's open source, so feel free to help me improve it.

3 comments

I cannot thank you enough for your “The Illustrated Transformer” [1] that I have directed two cohorts of MSc students to – it is a true gem of an article. A few years ago my group made an interface to visualise contextual word representations [2] that looked like a primordial soup ancestor to your most recent article (no screenshots though, sadly). I hope putting these together brings you as much joy as it does to your fans in academia and education like myself reading it. Despite Chris Olah’s effort with Distill, I still think we lack a good way to give the amount of credit efforts like yours deserve.

[1]: https://jalammar.github.io/illustrated-transformer

[2]: https://github.com/uclnlp/muppetshow

I also want to add a "Thank You" note for the author for the lovely "The Illustrated Word2Vec" [0]. I wish every concept, Machine Learning or otherwise, would follow such a framework.

[0] https://jalammar.github.io/illustrated-word2vec/

I'd love to look at your group's visualizations! Is it a private repo? I ask because the link doesn't open. It never ceases to blow my mind that we can represent words and concepts as vectors of numbers.

Thanks for your kind words! It's a labor of passion, honestly. And while in previous years it was a nights-and-weekends project, I have recently been giving it my entire time and focus -- which is why I'm able to dip my toes more heavily into R&D like Ecco and the "Explaining Transformers" article.

Yikes, you are right… I just linked a private repo. '^^ I have poked the rest of the group and it seems that at least a tweet was made [1] – but not much else remains. Describing it from memory, we ran ELMo and BERT on Wikipedia, then allowed similarity search against a query and showed heat maps over the matched context. Nothing particularly deep compared to yours, which goes into the transformer “machinery”, but I think it captures very well how most Question Answering models still operate: embed the query and contexts in a high-dimensional space, compare, find a semantically plausible span, and done!

[1]: https://twitter.com/Johannes_Welbl/status/106530965474036121...
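The embed, compare, find-a-span loop described above can be sketched roughly as follows. This is only a toy illustration and not the group's actual code: the 2-d vectors stand in for ELMo/BERT outputs, and `cosine`, `heat_map` and `best_span` are hypothetical helper names made up for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def heat_map(query_vec, token_vecs):
    """One similarity score per context token; this is what a heat map would colour."""
    return [cosine(query_vec, t) for t in token_vecs]

def best_span(scores, width):
    """Return (start, end) of the contiguous window of `width` tokens with the highest total score."""
    start = max(range(len(scores) - width + 1),
                key=lambda i: sum(scores[i:i + width]))
    return start, start + width

# Toy data: assumed stand-ins for contextual embeddings, not real model outputs.
query = [1.0, 0.0]
tokens = [[0.0, 1.0], [1.0, 0.1], [0.9, 0.0], [0.0, -1.0]]

scores = heat_map(query, tokens)
span = best_span(scores, width=2)
print(scores, span)
```

With real models, the vectors would come from the encoder and the span search would usually be learned rather than a fixed-width window; the sketch only shows the overall shape of the pipeline.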

Work and articles like yours have truly had an impact on me, even though they are largely qualitative. We always say “Turing complete” this and “Turing complete” that, but theoretical statements such as this have little practical utility to me as we all know that what can be learnt and what is learnt are two very different things. For example, “Visualizing and Understanding Recurrent Networks” by Karpathy et al. (2015) [2], which you list as inspiration, blew my mind with, for example, neurons that monotonically decrease from the sentence start. I remember Karpathy giving a talk on it in London, and what struck me was how he had simply gone and inspected the neurons manually (heresy!) as there were only a few thousand of them anyway. That playfulness, truly admirable.

[2]: https://arxiv.org/abs/1506.02078

Another anecdote, now from “Attention Is All You Need” by Vaswani et al. (2017) [3] where I was far from sold on Transformers as a model until Uszkoreit gave a talk at an invitation-only summit where he showed those cherry-picked attention heads that “flipped” based on whether an object was animate or not. I approached him after the talk and asked why it was not in the paper as it was awesome! Maybe I am biased because I give a large role to intuition in science, but analysis such as this is far more valuable to me as a researcher than yet another point of BLEU or a 10th dataset. Again, my bias, but I feel that there is a need for new ways of thinking in terms of both “hard” empiricism and “soft” analysis in machine learning as we seemingly are now having to mature given the attention we are receiving.

[3]: https://arxiv.org/abs/1706.03762

Apologies if I am rambling, it is midnight now and I barely slept last night.

Hey, I feel you! I'm an intuitive learner as well. I wouldn't have been able to learn much in ML if it weren't for people who write and visualize and make the methods accessible to non-experts. In my case, as with many others, it was the writing and videos of Andrew Ng, Karpathy, Chris Olah, Nando de Freitas, Sebastian Ruder, Andrew Trask, and Denny Britz amongst others. Accessible content like this goes a long way in building the confidence to further pursue the topic and not be intimidated by the steep learning curve. It fills me with joy that you've found some of my work helpful.

Thanks for digging up the screenshot. Exploring contextualized word embeddings is truly fascinating. And thanks for sharing your experience!

You are not rambling. Thanks for sharing.
I just want to say I absolutely love the name and logo. Brings back some fond memories of an incredibly hard game from once upon a time...

Having said that, IANAL, but I find it unlikely that the use of a dolphin and the word Ecco together is not trademarked, so you may want to check on that before someone bugs you about it.

"Ecco the Dolphin" is a game for Sega consoles. https://en.wikipedia.org/wiki/Ecco_the_Dolphin
Yes, that's precisely what I meant
This is fantastic. I used your earlier transformers article to first get a real grasp on the architecture. I hope you expand this to accommodate other modes of attention outside of the transformer paradigm as well!
Wonderful! Thanks!

I am curious about those recent O(L) attention transformers (see slide 106 of http://gabrielilharco.com/publications/EMNLP_2020_Tutorial__...). If these methods are converging towards a new self-attention mechanism, I'd love to try illustrating that.

What other attention modes are you referring to? Did something in particular catch your attention?

Personally, I implemented this just yesterday.

https://arxiv.org/pdf/1703.03130.pdf

It's a bit older now, but I was looking for a self-attention method that didn't require a transformer model, and this paper proposed an interesting implementation that wound up being very successful for my problem case.
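For anyone curious what that looks like, here is a rough sketch, assuming the linked paper is the structured self-attentive sentence embedding of Lin et al. (2017), where the attention matrix is A = softmax(W_s2 tanh(W_s1 H^T)) and the sentence embedding is M = A H. This is not the commenter's implementation: the function and helper names are made up for illustration, the weights are fixed toy values rather than learned ones, and the hidden states H would normally come from a BiLSTM.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def structured_self_attention(H, W_s1, W_s2):
    """H: n x d token states, W_s1: d_a x d, W_s2: r x d_a.
    Returns A (r x n attention weights, one row per attention hop)
    and M = A H (r x d sentence embedding)."""
    hidden = [[math.tanh(x) for x in row] for row in matmul(W_s1, transpose(H))]  # d_a x n
    A = [softmax(row) for row in matmul(W_s2, hidden)]  # softmax over the n tokens
    M = matmul(A, H)
    return A, M

# Toy inputs (assumed values, purely illustrative): 3 tokens, d = 2, d_a = 2, r = 2 hops.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_s1 = [[0.5, -0.5], [0.3, 0.8]]
W_s2 = [[1.0, 0.0], [0.0, 1.0]]

A, M = structured_self_attention(H, W_s1, W_s2)
print(A)
print(M)
```

Each row of A is a distribution over the tokens, so different rows can focus on different parts of the sentence without any transformer layers involved.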