Hacker News new | ask | show | jobs
by whoateallthepy 1230 days ago
At the end of last year I put together a repository to try and show what is achieved by self-attention on a toy example: detect whether a sequence of characters contains both "a" and "b".

The toy problem is useful because the model dimensionality is low enough to make visualization straightforward. The walkthrough also goes through how things can go wrong, and how it can be improved, etc.

The walkthrough and code is all available here: https://github.com/rstebbing/workshop/tree/main/experiments/....

It's not terse like nanoGPT or similar because the goal is a bit different. In particular, to gain more intuition about the intermediate attention computations, the intermediate tensors are named and persisted so they can be compared and visualized after the fact. Everything should be exactly reproducible locally too!