
I mean, I wouldn't call it transformative, or bet on it matching usual LLM performance in general. It's similar to the weight reuse you see in RNNs, where the same weights are applied at every step to update the hidden state `h`. In usual LLMs, each block has its own weights.

These guys are choosing a middle ground: stacking a few transformer blocks, then reusing the same 2 blocks 8 times over.
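To make that middle ground concrete, here's a toy Python sketch of a few unshared leading blocks followed by 2 shared blocks looped 8 times. Assumptions: each "block" is just a residual map with one weight matrix, not a real transformer block (no attention, no MLP); `n_prelude=2` and `dim=8` are made-up values since the architecture details aren't given here; all function names are hypothetical. The point is only the bookkeeping: the looped model runs as many block applications as an unshared stack of the same depth, but stores far fewer parameters.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Block = Dict[str, List[List[float]]]

def make_block(dim: int, seed: int) -> Block:
    """Toy stand-in for a transformer block: a single dim x dim weight matrix."""
    rng = random.Random(seed)
    return {"W": [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]}

def apply_block(block: Block, x: Vector) -> Vector:
    # Residual update x + tanh(W x), loosely mimicking a residual stream.
    return [xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
            for xi, row in zip(x, block["W"])]

def param_count(schedule: List[Block]) -> int:
    # Dedupe by object identity so a shared block is counted once.
    unique = {id(b): b for b in schedule}.values()
    return sum(len(b["W"]) * len(b["W"][0]) for b in unique)

def looped_model(dim: int, n_prelude: int = 2, n_core: int = 2, n_loops: int = 8) -> List[Block]:
    prelude = [make_block(dim, seed=i) for i in range(n_prelude)]   # unshared (hypothetical count)
    core = [make_block(dim, seed=100 + i) for i in range(n_core)]   # shared core
    # List repetition reuses the SAME core block objects n_loops times.
    return prelude + core * n_loops

def unshared_model(dim: int, depth: int) -> List[Block]:
    # Usual stack: every position in the schedule has its own weights.
    return [make_block(dim, seed=i) for i in range(depth)]

def forward(schedule: List[Block], x: Vector) -> Vector:
    for block in schedule:
        x = apply_block(block, x)
    return x

if __name__ == "__main__":
    dim = 8
    looped = looped_model(dim)
    unshared = unshared_model(dim, depth=len(looped))
    print(len(looped), param_count(looped))      # 18 256
    print(len(unshared), param_count(unshared))  # 18 1152
    print(len(forward(looped, [1.0] * dim)))     # 8
```

Compute per token is roughly the same as an 18-block unshared stack; what shrinks is the number of distinct weights, which is exactly the RNN-style trade-off.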

It'll be interesting to see which use cases this approach serves well. Our understanding of how these architectures respond to such changes is still largely empirical, so it's hard to say ahead of time. My intuition is that it could be good for repetitive input signals; audio processing comes to mind. But complex attention, like in ElevenLabs-style translation, is probably too much to hope for. Whisper-type transcription, though, might work.