HN Mirror


by tomp 640 days ago

The problem with all these speech-to-speech multi-modal models is that, if you wanna do anything other than just talk, you need transcription.

So you're back at square one.

Current AI (even GPT-4o) simply isn't capable enough to do useful stuff. You need to augment it somehow - either modularize it, or add RAG, or similar - and for all of those, you need the transcript.
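To make the dependency concrete, here is a minimal sketch of a text-based RAG step, assuming the transcript has already been produced by some speech-to-text stage (not shown). The function names, the toy word-overlap scoring, and the sample documents are all hypothetical; a real system would more likely use embeddings, and nothing here shows that text is the only possible retrieval key.

```python
def retrieve(transcript: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the transcript (toy scoring)."""
    query_words = set(transcript.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(transcript: str, docs: list[str]) -> str:
    """Prepend retrieved context to the user's transcribed request."""
    context = "\n".join(retrieve(transcript, docs))
    return f"Context:\n{context}\n\nUser said: {transcript}"


# Hypothetical knowledge base and transcript, for illustration only.
docs = ["refunds are issued within 14 days", "shipping takes 3 days"]
prompt = build_prompt("how do refunds work", docs)
```

The point of the sketch is only that the retrieval query here is a string, which is why a text pipeline needs the transcript before it can look anything up.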

2 comments

huac 640 days ago

> Current AI (even GPT-4o) simply isn't capable enough to do useful stuff. You need to augment it somehow - either modularize it, or add RAG, or similar

I am sympathetic to this view but strongly disagree that you need a transcript. Think about it a bit more!!


stavros 639 days ago

> Current AI (even GPT-4o) simply isn't capable enough to do useful stuff.

I'm loving all these wild takes about LLMs, meanwhile LLMs are doing useful things for me all day.


tomp 639 days ago

For me as well… with constant human supervision. But if you try to build a business service, you need autonomy and exact rule following. We’re not there yet.


MacsHeadroom 639 days ago

Autonomy and rule following are at odds. Humans have the same problem. The solutions we use for ourselves work amazingly for LLMs (because they're trained on human data).

Examples: Give an LLM an effective identity (prompt engineering), give it a value system (Constitutional AI), make it think about these things before it acts (CoT + system prompt), have a more capable [more expensive / higher inference] agent review the LLM's work from time to time (multi-agent), have a more capable agent iterate on prompts to improve results in a test environment (EvoAgents), etc.

We can't simply provide an off-the-shelf LLM with a paragraph or two and expect it to reliably fulfill an arbitrary task without supervision, any more than we can expect the same from a random nihilist going through an identity crisis. They both need identity, values, time to think, social support, etc. before they can be reliable workers.
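A rough sketch of how a few of these ideas (identity and values in a system prompt, a think-first instruction, and a second agent reviewing the work) might be wired together. This is an illustration under assumptions, not any particular framework's API: `Model` is a hypothetical stand-in for whatever function calls an LLM, and the stub worker and reviewer only simulate model behavior so the loop can run.

```python
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
Model = Callable[[str], str]

SYSTEM_PROMPT = (
    "You are a careful support agent.\n"            # identity
    "Values: be accurate, never invent policy.\n"   # value system
    "Think step by step before answering.\n"        # time to think
)


def run_with_review(task: str, worker: Model, reviewer: Model, max_rounds: int = 3) -> str:
    """Ask the worker for a draft; the reviewer approves it or sends feedback back."""
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = worker(f"{SYSTEM_PROMPT}\nTask: {task}\n{feedback}")
        verdict = reviewer(f"Task: {task}\nDraft: {draft}\nReply APPROVE or give feedback.")
        if verdict.strip().upper().startswith("APPROVE"):
            return draft
        feedback = f"Reviewer feedback on previous draft: {verdict}"
    return draft  # best effort once the round budget is spent


# Stub models (assumptions for the demo): the worker improves only after feedback.
def stub_worker(prompt: str) -> str:
    return "v2" if "Reviewer feedback" in prompt else "v1"


def stub_reviewer(prompt: str) -> str:
    return "APPROVE" if "Draft: v2" in prompt else "Too vague."


result = run_with_review("Summarize the refund policy", stub_worker, stub_reviewer)
```

In practice the reviewer would be the more capable, more expensive model, called less often than the worker.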


stavros 639 days ago

In my company, LLMs replaced something we used to use humans for. Turned out LLMs are better than humans at following rules.

If you need a way to perform complicated tasks with autonomy and exact rule following, your problem simply won't be solved right now.
