What's not mentioned here is test-time compute. The idea: sure, you can spend a ton of compute on pre-training and fine-tuning, but generating a correct answer in one shot is hard. So instead of putting all of the time and compute into that single generation, spend some of it having the model generate a bunch of possible answers, then spend the rest having a model check those candidates for correctness. That's the idea behind "Let's Verify Step by Step." Great video on this: https://www.youtube.com/watch?v=ARf0WyFau0A

In threads on LLMs this point doesn't come up as often as I'd expect, so I'm curious whether I'm missing discussions of it or whether it's wrong. But I see this as the way forward: models generating tons of answers, other models picking out the correct ones, and the combination going beyond human ability, after which humans can do their own verification.

Edit: Think of it this way. Creating something isn't easy. If I tried to write a short story, it'd be very difficult, even if I spent years reading what others have written to learn their patterns. If I then wrote and published a single one, there's no chance it'd be any good. But _judging_ short stories is much easier. So say I decide to screw it: I read a couple of stories to get the basic framework, then write 100 stories in the time I would have spent reading and learning more about short stories. I can then go through the 100, pick the one I think is best, and publish that. That's where I see LLMs going, and it's what the video and the papers mentioned in it argue.
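Here's a rough best-of-N sketch of the "generate a bunch, then verify" idea, in Python. Everything in it is my own illustration, not code from the video or the paper: `best_of_n`, `make_guesser`, and `verifier` are hypothetical names, and the toy task (blindly guessing a factor of 391, then checking the guess by division) just stands in for an LLM sampling answers and a second model scoring them. It shows the asymmetry from the short-story example: each guess is nearly worthless on its own, but checking is cheap, so more samples plus a checker gets you there.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(generate: Callable[[], T], score: Callable[[T], float], n: int) -> T:
    """Spend test-time compute: sample n candidates, keep the one the verifier scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate() for _ in range(n)]
    # max() returns the first maximal element, so ties go to the earliest candidate.
    return max(candidates, key=score)


# Hypothetical toy stand-ins, not from the video or paper:
# "generation" is blind guessing of a nontrivial factor of N,
# "verification" is a cheap divisibility check.
N = 391  # 17 * 23, so the only nontrivial factors are 17 and 23


def make_guesser(seed: int) -> Callable[[], int]:
    """Return a 'generator' that guesses a random integer in [2, N - 1]."""
    rng = random.Random(seed)
    return lambda: rng.randint(2, N - 1)


def verifier(candidate: int) -> float:
    """Score 1.0 if the candidate really divides N, else 0.0."""
    return 1.0 if N % candidate == 0 else 0.0


if __name__ == "__main__":
    answer = best_of_n(make_guesser(seed=0), verifier, n=5000)
    print(answer, verifier(answer))
```

Any single guess is right only about 2 times in 389, but with a few thousand samples the checker almost surely finds 17 or 23. A real setup would swap in sampled model outputs for the guesser and a learned scorer for `verifier`, which won't be a perfect checker like this one.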