by mips_avatar 438 days ago
One problem with LLMs is that the amount of "thinking" they do when answering a question depends on how many tokens they spend generating the answer. A big part of the power of models like DeepSeek R1 is that their developers figured out how to get a model to use a lot of tokens in a logical way to work toward solving a problem. The models don't know the answer up front; they arrive at it by generating it, and generating more helps them. In the future we'll probably see this trend continue: the model generates a "thinking" response first, then summarizes the answer concisely.
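To make the think-first, summarize-second flow concrete, here is a rough sketch of how it could be wired up. Everything named here is an assumption for illustration: `generate` is a hypothetical stand-in for whatever model call you have, the prompts and token budgets are made up, and `fake_generate` is a toy stub so the example runs without a model. It is not how R1 or any particular model does it internally.

```python
from typing import Callable, Dict

# Hypothetical model interface: (prompt, max_tokens) -> generated text.
Generate = Callable[[str, int], str]


def think_then_answer(
    generate: Generate,
    question: str,
    thinking_budget: int = 2048,  # assumed: large budget for the reasoning pass
    answer_budget: int = 128,     # assumed: small budget for the final answer
) -> Dict[str, str]:
    # Pass 1: let the model spend many tokens working toward a solution.
    reasoning = generate(
        f"Question: {question}\nThink step by step.",
        thinking_budget,
    )
    # Pass 2: condition on that reasoning and ask for a concise answer.
    answer = generate(
        f"Question: {question}\nReasoning: {reasoning}\n"
        "Give a concise final answer.",
        answer_budget,
    )
    return {"reasoning": reasoning, "answer": answer}


def fake_generate(prompt: str, max_tokens: int) -> str:
    # Toy stub, not a real model: returns canned text depending on the pass.
    if "Reasoning:" in prompt:
        return "4"
    return "2 + 2: start at 2, add 2, giving 4."


result = think_then_answer(fake_generate, "What is 2 + 2?")
print(result["answer"])
```

The point of splitting it into two calls is that the long reasoning text can be kept or thrown away separately from the short answer the user actually reads.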