by regularfry 260 days ago
If this is a way to get equivalent results to a much larger network in the same FLOPs but with a fraction of the VRAM, it's transformative.

I'm particularly keen to see if you could do speech-to-text with this architecture, and replace Whisper for smaller devices.

2 comments

Nvidia's Parakeet dropped recently with better performance and 0.6B params, so the rate of progress here looks good. Probably next year (or maybe the year after) they'll be running on smaller devices no problem.

I mean, I wouldn't say it's transformative, or bet on it equalling usual LLM performance in general. It's kind of similar to the weight reuse you see in RNNs, where the same weights are applied at every step to a running hidden state `h`. In usual LLMs each block has its own weights.

These guys are choosing a middle ground - stacking a few transformer blocks, and then reusing the same 2 blocks 8 times over.
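
To make the "same 2 blocks 8 times over" idea concrete, here is a rough sketch in plain Python. It is not the paper's code: `Block` is a toy residual linear layer standing in for a transformer block, and `build_looped_model`, the choice of 2 non-shared entry blocks, and the width of 4 are all made-up values for illustration. The point is only that stored weights depend on how many distinct blocks exist, while compute depends on how many times blocks get applied.

```python
import random


class Block:
    """Toy stand-in for a transformer block: a residual linear layer (hypothetical)."""

    def __init__(self, dim, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        # y = x + W x; the residual keeps repeated application well-behaved.
        return [xi + sum(wij * xj for wij, xj in zip(row, x))
                for xi, row in zip(x, self.w)]

    def num_params(self):
        return len(self.w) * len(self.w[0])


def build_looped_model(dim, n_unique=2, n_shared=2, n_loops=8, seed=0):
    """Return (distinct blocks, execution schedule). n_unique=2 is an assumption."""
    rng = random.Random(seed)
    unique = [Block(dim, rng) for _ in range(n_unique)]
    shared = [Block(dim, rng) for _ in range(n_shared)]
    # The shared pair appears n_loops times in the schedule but is stored once.
    schedule = unique + shared * n_loops
    return unique + shared, schedule


def forward(schedule, x):
    for block in schedule:
        x = block(x)
    return x


blocks, schedule = build_looped_model(dim=4)
stored_params = sum(b.num_params() for b in blocks)      # what has to sit in memory
applied_params = sum(b.num_params() for b in schedule)   # rough proxy for compute
print(len(blocks), len(schedule))      # 4 18
print(stored_params, applied_params)   # 64 288
out = forward(schedule, [1.0, 0.0, 0.0, 0.0])
```

With these toy numbers the model does the work of an 18-block stack while storing only 4 blocks' worth of weights, which is the FLOPs-vs-VRAM trade the top comment is excited about. Whether quality holds up is a separate, empirical question.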

It'll be interesting to see what use cases are served well by this approach. Our understanding of how these architectures respond to such changes is still largely empirical, so it's hard to say ahead of time. My intuition is that it could be good for repetitive input signals - audio processing comes to mind. But complex attention and stuff like ElevenLabs-style translation is probably too much to hope for. Whisper-type transcription, though, might work.