Hacker News
by swyx 43 days ago
> They've published a fair amount about their architecture - enough that I imagine frontier labs could implement.

i think the real ones know this is the tip of the iceberg? hparam tuning, data recipes, data collection, custom kernels, rl/eval infra, all immensely deep topics that would condense multiple decades of phd lifetimes to produce SOTA performance (in both senses of the word) like this.

i would also calibrate what you are impressed by. simply waiting is a posttrain thing - the fact that gemini and oai have not prioritized it is not something you should overindex on as hard. what they showed with full duplex is technically far far harder to achieve

2 comments

I agree that full duplex is the amazing bit. For instance, the three engineers shouting trivia questions while a timer is running — that’s extremely novel as far as I can tell.

I’d like to believe from the demos that this ability to wait kind of falls out of the model as an emergent property, perhaps coming out of a small RL loop, rather than being a specifically trained behavior, à la a VAD component in a stack. Either way, I would guess that VAD absolutely cannot do this right now: interruptions are highly annoying in all voice interaction experiences, and if it were a simple matter of better post-training, SOMEONE would have done this, e.g. elevenlabs.

But I disagree with your idea that this is too expensive/too hard to replicate. For me, yes. But there’s an existence proof: a small team at a new company just did this without a real roadmap, certainly for less than $1b and probably in less than two years. They are almost certainly less skilled at the items on your list than teams at the frontier labs, who have been given a roadmap. So I don’t think it’s as difficult as you propose, from an organizational skills perspective.

SOTA is very much about both training on a well-curated corpus (and having one) and also hundreds of iterations which eventually make you into… several PhDs really.

This is ML/AI. It is not calling third-party APIs. If you want SOTA in any AI area you need to design your own strategy and models. Drilling down to get there is super painful and perhaps not something a paid course can teach you.

Randomness is everywhere and so are unexpected engineering challenges. Mastering linear algebra alongside some geometry and still knowing classic algos is the starting point.