Hacker News
by hnuser123456 474 days ago
If you value local compute and don't need massive speed, that's still twice as fast as most people can type.

Human typing speed is orders of magnitude slower than the speed at which our eyes scan a response for the correct answer.

ChatGPT o3-mini-high thinks at about 140 tokens/s by my estimate, and I sometimes wish it could return answers more quickly.

Getting an answer to a simple prompt would take 2-3 minutes on the AMD system, and forget about longer contexts.

Reasoning models spend a lot of time reasoning before they return an answer. I was toying with QwQ-32B last night, and one question I gave it kept it in the <think> phase for 18 minutes at 13 tok/s before it returned a final answer. I value local compute, but reasoning models aren't terribly feasible at this speed: you have to wait for all of the thinking output to be generated, even though you don't really need to see the first 90% of it.
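To put the numbers above side by side, here is a rough back-of-the-envelope sketch. It assumes a constant generation rate, ignores prompt processing time, and assumes (hypothetically) that a faster model would emit the same number of thinking tokens, which in practice it would not, since different models reason for different lengths. The function names are illustrative only.

```python
def thinking_tokens(minutes: float, tokens_per_second: float) -> float:
    """Tokens produced in a given wall-clock time, assuming a steady rate."""
    return minutes * 60 * tokens_per_second


def wait_seconds(tokens: float, tokens_per_second: float) -> float:
    """Seconds needed to generate `tokens` at a steady rate."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    local_rate = 13.0    # tok/s observed for QwQ-32B in the comment above
    hosted_rate = 140.0  # tok/s estimated for o3-mini-high in the comment above
    think_minutes = 18.0

    tokens = thinking_tokens(think_minutes, local_rate)
    print(f"Thinking tokens: {tokens:,.0f}")  # 18 * 60 * 13 = 14,040
    print(f"Same tokens at {hosted_rate:.0f} tok/s: "
          f"{wait_seconds(tokens, hosted_rate) / 60:.1f} min")  # about 1.7 min
    # Time spent on the first 90% of the thinking, which you mostly skip reading.
    print(f"Wait for the first 90%: {0.9 * think_minutes:.1f} min")  # 16.2 min
```

Under these assumptions, roughly 14,000 thinking tokens take about 18 minutes locally versus under 2 minutes at the hosted rate, with about 16 of the local minutes spent on reasoning you would probably never read.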