
by ggerganov 960 days ago

Yes, I was planning to do this back then, but other stuff came up. There are many different ways in which this simple example can be improved:

- better detection of when speech ends (currently basic adaptive threshold)

- use small LLM for quick response with something generic while big LLM computes

- TTS streaming in chunks or sentences
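The last bullet (TTS streaming in sentences) can be sketched roughly as below. This is only an illustration, not code from the example being discussed: `sentences_from_stream` is a hypothetical helper, and the regex boundary rule (".", "!" or "?" followed by whitespace) is an assumption that will mis-split abbreviations such as "e.g. ".

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: ., ! or ? followed by whitespace ends a sentence.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream.

    Hypothetical helper: each yielded sentence could be handed to a TTS
    engine while the LLM is still generating the rest of the reply.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Feeding it the tokens of "Hello there. How are you? Fine" yields the first sentence as soon as the following word arrives, so speech can start before generation finishes.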

One of the better OSS versions of such a chatbot, I think, is https://github.com/yacineMTB/talk, though many other similar projects probably exist by now.

2 comments

generalizations 959 days ago

I keep wondering if a small LLM can also be used to help detect when the speaker has finished speaking their thought, not just when they've paused speaking.

drunkenmagician 959 days ago

Maybe using a voice activity detector (VAD) would be a lighter option, since it needs fewer resources.
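For a sense of how light a VAD can be, here is a minimal energy-based sketch with an adaptive noise floor. It is an assumption about what a "basic adaptive threshold" might look like, not the code from the example under discussion; `EnergyEndpointer`, `ratio`, `noise_alpha` and `silence_frames` are all made-up names and values.

```python
import math
from typing import Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


class EnergyEndpointer:
    """Hypothetical end-of-speech detector based on frame energy."""

    def __init__(self, ratio: float = 3.0, noise_alpha: float = 0.05,
                 silence_frames: int = 25) -> None:
        self.ratio = ratio                    # speech must exceed ratio * noise floor
        self.noise_alpha = noise_alpha        # smoothing factor for the noise floor
        self.silence_frames = silence_frames  # quiet frames that end an utterance
        self.noise_floor: Optional[float] = None
        self.in_speech = False
        self.quiet_run = 0

    def push(self, frame: Sequence[float]) -> bool:
        """Feed one frame; return True when end of speech is detected."""
        energy = rms(frame)
        if self.noise_floor is None:
            # Assume the very first frame is background noise.
            self.noise_floor = energy
            return False
        if energy > self.ratio * max(self.noise_floor, 1e-6):
            self.in_speech = True
            self.quiet_run = 0
            return False
        # Adapt the noise floor on non-speech frames only.
        self.noise_floor += self.noise_alpha * (energy - self.noise_floor)
        if self.in_speech:
            self.quiet_run += 1
            if self.quiet_run >= self.silence_frames:
                self.in_speech = False
                self.quiet_run = 0
                return True
        return False
```

With 20 ms frames, `silence_frames=25` corresponds to the "silence of N seconds" rule with N = 0.5.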

generalizations 959 days ago

That works when you know what you’re going to say. A human can tell when you’re pausing to think but are still in the middle of expressing a thought. A VAD doesn’t know this and would interrupt when it hears a silence of N seconds; a lightweight LLM would know to keep waiting despite the silence.

cjbprime 959 days ago

And the inverse: the VAD would wait longer than necessary after a person says e.g. "What do you think?", in case they were still in the middle of talking.
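Both directions (wait longer mid-thought, respond sooner after a finished question) can be sketched as a silence timeout that shrinks as the transcript looks more complete. Everything here is hypothetical: `toy_completeness` is a crude keyword heuristic standing in for the small LLM, and the `min_wait`/`max_wait` values are arbitrary.

```python
from typing import Callable

# Words that suggest the speaker is mid-thought (an arbitrary, assumed list).
_TRAILING_WORDS = {"and", "but", "so", "because", "um", "uh", "like", "the", "a", "to"}


def toy_completeness(transcript: str) -> float:
    """Stand-in for a small LLM: score in [0, 1] that the thought is finished."""
    text = transcript.strip().lower()
    if not text:
        return 0.0
    if text.endswith("?"):
        return 0.9
    words = text.rstrip(".!,").split()
    last_word = words[-1] if words else ""
    if last_word in _TRAILING_WORDS:
        return 0.1
    if text.endswith((".", "!")):
        return 0.7
    return 0.4


def silence_timeout(transcript: str,
                    score_fn: Callable[[str], float] = toy_completeness,
                    min_wait: float = 0.3, max_wait: float = 2.5) -> float:
    """Seconds of silence to tolerate: short when complete, long when not."""
    score = min(1.0, max(0.0, score_fn(transcript)))
    return max_wait - score * (max_wait - min_wait)


def should_respond(transcript: str, silence_seconds: float,
                   score_fn: Callable[[str], float] = toy_completeness) -> bool:
    """True once the silence has outlasted the transcript-dependent timeout."""
    return silence_seconds >= silence_timeout(transcript, score_fn)
```

In a real system `score_fn` would be replaced by a call to whatever small model is available; the VAD still supplies `silence_seconds`, so the two approaches combine rather than compete.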

rjtavares 960 days ago

> use small LLM for quick response with something generic while big LLM computes

Can't wait for poorly implemented chat apps to always start a response with "That's a great question!"
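One way to avoid always opening with filler is to emit it only when the big model is actually late. A rough asyncio sketch, assuming both models are exposed as async functions; `respond`, `fake_fast`, `fake_slow` and the `patience` value are all hypothetical.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

Model = Callable[[str], Awaitable[str]]


async def respond(prompt: str, fast_model: Model, slow_model: Model,
                  patience: float = 0.5) -> AsyncIterator[str]:
    """Yield the slow model's answer, preceded by filler only if it is late."""
    slow_task = asyncio.create_task(slow_model(prompt))
    # Give the big model a short head start before resorting to filler.
    done, _ = await asyncio.wait({slow_task}, timeout=patience)
    if not done:
        # The slow task keeps running in the background during this call.
        yield await fast_model(prompt)
    yield await slow_task


# Fake models for illustration only; real ones would call an LLM.
async def fake_fast(prompt: str) -> str:
    return "Let me think about that."


async def fake_slow(prompt: str) -> str:
    await asyncio.sleep(0.2)
    return "Answer to: " + prompt


async def collect(prompt: str, patience: float) -> List[str]:
    """Gather all parts of a response into a list."""
    return [part async for part in respond(prompt, fake_fast, fake_slow, patience)]
```

With `patience=0.05` the fake slow model misses the deadline and the filler is spoken first; with `patience=1.0` only the real answer comes out.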

Joeri 959 days ago

“Uhm, I mean, like, you know” would indeed be a little more human.

avarun 959 days ago

Just like poorly implemented human brains tend to do :P
