by MichealCodes 239 days ago
I don't think we've had the transformer moment for audio training yet, but yes, in theory audio-first models will be much more capable.
1 comment

Particularly interesting would be transformations between tokenised audio and tokenised text.

I recall someone once telling me that up to 90% of communication can be non-verbal, so when an LLM sticks to just text, it's only getting about 10% of the data.