| HN Mirror



	by eru 224 days ago
	Humans often answer with fluff like "That's a good question, thanks for asking that, [fluff, fluff, fluff]" to give themselves more breathing room until the first 'token' of their real answer. I wonder if any LLMs are doing stuff like that for latency hiding?

2 comments

mips_avatar 224 days ago

I don't think the models are doing this; time to first token is more of a hardware thing. But people writing agents are definitely doing this. Particularly in voice, it's worth it to use a smaller local LLM to handle the acknowledgment before handing it off.
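
A minimal sketch of that hand-off pattern, assuming an asyncio-based agent. `quick_ack` and `full_answer` are hypothetical stand-ins for the small local model and the larger model (simulated here with sleeps), not any real library's API. The point is only that the slow request starts first, so its latency overlaps with the acknowledgment.

```python
import asyncio


async def quick_ack(question: str) -> str:
    # Hypothetical stand-in for a small, fast local model.
    await asyncio.sleep(0.01)
    return "Good question, let me think."


async def full_answer(question: str) -> str:
    # Hypothetical stand-in for the larger, slower model.
    await asyncio.sleep(0.2)
    return f"Answer to: {question}"


async def respond(question: str):
    # Kick off the slow model right away so it runs while the
    # acknowledgment is being generated and spoken.
    answer_task = asyncio.create_task(full_answer(question))
    yield await quick_ack(question)
    # By the time the acknowledgment is out, part of the wait is already paid.
    yield await answer_task


async def collect(question: str) -> list[str]:
    # Gather the streamed chunks in the order the user would hear them.
    return [chunk async for chunk in respond(question)]


if __name__ == "__main__":
    for chunk in asyncio.run(collect("why is the sky blue?")):
        print(chunk)
```

In a real voice agent the two chunks would go to text-to-speech as they arrive; whether the acknowledgment actually saves perceived latency depends on how long it takes to speak relative to the main model's time to first token.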


strangegecko 224 days ago

Do humans really do that often?

Coming up with all that fluff would keep my brain busy, meaning there's actually no additional breathing room for thinking about an answer.


eru 224 days ago

People who professionally answer questions do that, yes. E.g. politicians or press secretaries for companies, or even just your professor taking questions after a talk.

> Coming up with all that fluff would keep my brain busy, meaning there's actually no additional breathing room for thinking about an answer.

It gets a lot easier with practice: your brain caches a few of the typical fluff routines.
