by lelanthran 41 days ago
> > …but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate

> This is the opposite of the feedback I get. Users want instant responses.

I am skeptical that you are getting feedback that users prefer instant wrong results to 200ms-lag correct results.

Deeply skeptical!

7 comments

Oh, I can absolutely believe it. Humans are deeply irrational, especially about things that mess about in time frames too short for our conscious thought processes to kick in. Instant but confident sounding (and confident sounding because it's instant) will beat slower every time. You don't know which is correct until a long time after you've made a decision to trust it, or whether you like it.

> Instant but confident sounding (and confident sounding because it's instant) will beat slower every time.

Sure, but I am skeptical that users are actually saying "I prefer wrong answers over lag", which is what the post I responded to implied.

This is different to users saying "I prefer quick answers to laggy answers", which is what I presume they may have said.

To actually settle this, the feedback must answer the question "Do you want wrong answers quickly or correct answers with an added 0.2 second delay?" because, well, those are the only two options right now.

Dunno. Feels like stated vs revealed preferences to me. Of course everyone will _say_ they want the correct answers, but I can totally see users getting annoyed at slow responses, thinking that the developers should've traded accuracy for quicker responses (or not thinking that at all, just demanding quicker responses unconditionally).

No, I think they are saying no one would say they want wrong answers. People say they want fast answers, and they are implying those answers should also be correct.
> actually saying

Yeah, I don't think that's the form of the feedback here.

Deeply false dichotomy!

The blog post glosses over the details and implies that 200ms of latency would be a magic solution. They do admit that WebRTC already has provisions for up to 200ms, so I guess they’re really implying that 400ms would be the happy case path for their alternative buffering, which is starting to get in the range where users would probably be annoyed.

Have you tried having conversational speech over a link with almost half a second of delay? It’s bad. You have to work hard to establish a turn taking routine with the other party and do extra mental work to identify your slot to talk.

The other half of this problem requires acknowledging that LLMs are actually pretty decent at interpreting input with gaps. You can drop words or even letters from LLM input and still get surprisingly decent results back. This post acts like a dropped packet means your response is going to send the LLM off on a wrong response or something.
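For anyone who wants to poke at the "input with gaps" claim themselves, here is a minimal sketch for generating lossy prompts to feed to a model. It assumes loss happens at word granularity, which is a simplification: real audio packets do not line up with word boundaries. The function name `drop_words` and the sample prompt are made up for illustration.

```python
import random


def drop_words(text: str, loss_rate: float, seed: int = 0) -> str:
    """Return text with each word independently dropped with probability loss_rate.

    Hypothetical stand-in for packet loss; real loss hits audio frames, not words.
    """
    rng = random.Random(seed)  # seeded so a given run is repeatable
    kept = [word for word in text.split() if rng.random() >= loss_rate]
    return " ".join(kept)


if __name__ == "__main__":
    prompt = "please do not cancel my flight to Boston on Friday"
    for rate in (0.0, 0.1, 0.3):
        print(f"{rate:.0%} loss: {drop_words(prompt, rate)}")
```

Whether the model's answer survives the gaps is the part you would still have to measure; this only produces the degraded inputs.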

100% agree. Sounds like they're either asking the wrong questions, or quoting answers selectively to suit this argument.
Especially when 200ms is the rule of thumb for things still feeling "instant" to users in terms of UX. This is like a rounding error in terms of latency when I regularly wait actual minutes for an LLM to finish its bloody thinking and have to refresh through several "we're experiencing heavy load" errors.

I think as a user I have 2 modes: 1. Q&A mode where it's basically Google search by voice. 2. I'm trying to process an idea I have with an LLM buddy.

My desires are pretty different in the two scenarios. In Q&A mode, if it's not quick to respond I'll think something is wrong with my phone.

In deep-think mode, I'm honestly kind of pissed off at how fast it tries to respond. I want it to slow down, give me a chance to process, and use extra compute on its side (including newer models) so it doesn't just spew low-thought bullshit at me.

It seems like the system could detect which of these two modes was happening and adapt, including protocol.

I haven't tried the voice mode since the new model updates, maybe it's gotten better.

Counter to everything I just said, though, and germane to the topic at hand: when I'm in Q&A mode, that's probably the worst time for it to drop audio, as it changes the query significantly. When I'm talking at it for 2 minutes, it could probably throw half away.

> I am skeptical that you are getting feedback that users prefer instant wrong results to 200ms-lag correct results.

You are skeptical that people would prefer instant responses with 99.99% accuracy to waiting noticeably longer for a higher-accuracy rate?

The Internet, over its entire history, suggests otherwise.

Who claimed 99.99% accuracy?

A single dropped or missed word in a sentence can reverse the meaning.

I am skeptical that people would rather have wrong answers than lag. I am not claiming what the percentage is and neither are you, because no one measured it at the low lag.
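To make the "single dropped word" point concrete, here is a rough sketch that lists which one-word deletions remove a negation from a sentence. The `NEGATIONS` set and both function names are assumptions for illustration, and a negation word is only a crude proxy for "meaning reversed". It says nothing about how often this happens on a real link.

```python
# Hypothetical, deliberately tiny list of words whose loss tends to flip a request.
NEGATIONS = {"not", "no", "never", "don't", "without"}


def single_drop_variants(sentence: str):
    """Yield (dropped_word, remaining_sentence) for every one-word deletion."""
    words = sentence.split()
    for i, word in enumerate(words):
        yield word, " ".join(words[:i] + words[i + 1:])


def risky_drops(sentence: str) -> list[str]:
    """Return the variants produced by dropping a negation word."""
    return [
        variant
        for dropped, variant in single_drop_variants(sentence)
        if dropped.lower() in NEGATIONS
    ]


if __name__ == "__main__":
    print(risky_drops("do not delete the backup"))
```

For "do not delete the backup", one of the five possible single-word drops leaves "do delete the backup", which is the opposite instruction.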

I would be punching my phone if the stupid network caused a wrong prompt and the LLM sent me unrelated answers. Correctness should be foundational no matter what; then improve the latency as best as possible. We all understand that if the network is bad then latency cannot be guaranteed, but correctness should be.