Hacker News
by whitten 506 days ago
Has anyone tried to recreate LLMs in APL? Matrix multiplication is common to both. If LLAMA.C is only a few hundred lines, I would be surprised if the APL version were more than a page of code.
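For readers who don't know APL: matrix multiplication there is written `A +.× B`, a special case of the generalized inner product operator `f.g`. Below is a minimal Python sketch of that operator, assuming plain nested lists; the name `inner` is hypothetical and not from any library.

```python
from functools import reduce
import operator

def inner(f, g, A, B):
    """Sketch of APL's generalized inner product A f.g B on 2-D lists.

    Each result cell pairs row i of A with column j of B elementwise using g,
    then folds those results with f. reduce() folds left to right while APL
    reduces right to left; the two agree for associative f such as +.
    """
    cols = list(zip(*B))  # columns of B
    return [[reduce(f, map(g, row, col)) for col in cols] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# +.× is ordinary matrix multiplication
print(inner(operator.add, operator.mul, A, B))  # [[19, 22], [43, 50]]
```

Swapping the two functions gives other APL idioms built on the same operator; matrix multiplication is just the `+` / `×` pair.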
3 comments

A page? I sure hope not. The whole training and inference code for a U-Net architecture in APL is only about 30 lines [0], with the inference (forward) direction taking up only 4 or so lines. Assuming you know APL, the code is actually really straightforward and readable, too! Not to mention that it's only ~2.5x slower than a comparable PyTorch implementation.

[0]: https://www.dyalog.com/uploads/conference/dyalog22/presentat...

People have done it. For example, https://github.com/BobMcDear/trap, and a few others whose links I can't find right now.

When I got into BQN recently, I had the same thought, tried my hand at recreating LLM building blocks, and saw first-hand the compactness that APL-likes afford: https://www.chandergovind.org/blog/5126/aipasi/Array-program...

Conversely, after many years of dabbling, I still consider myself a complete novice at APL and array programming. However, studying array languages made learning LLMs fairly trivial. In particular, studying APL's inner product in the Mastering Dyalog APL book was the key to understanding the self-attention mechanism from Attention Is All You Need and how it communicates semantic information across the positionally encoded tokens.
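As a rough illustration of that connection, here is scaled dot-product attention, softmax(Q Kᵀ / √d_k) V, written so that both matrix products are plain row-by-column inner products (APL's `+.×`). This is a pure-Python sketch, not anyone's code from this thread. It omits the learned Q/K/V projections, masking, multiple heads, and positional encodings, and passing the token matrix directly as Q, K, and V is a simplifying assumption.

```python
import math

def matmul(A, B):
    # +.× : each cell is the sum of elementwise products of a row of A and a column of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(r) for r in zip(*M)]

def softmax_rows(M):
    # subtract each row's max before exponentiating, for numerical stability
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))   # token-to-token similarity via +.×
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = softmax_rows(scaled)     # each row sums to 1
    return matmul(weights, V)          # each output row is a weighted mix of value rows

# Two toy "tokens" as one-hot vectors; each output row mixes information from both tokens.
X = [[1.0, 0.0], [0.0, 1.0]]
print(attention(X, X, X))
```

The inner product does the "communication" here: `Q +.× ⍉K` scores every token against every other, and the second `+.×` uses those scores to blend the value vectors.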

I'd still probably struggle to do anything mildly complex in APL, though, sadly. It hasn't fully clicked yet, but maybe one day!