| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by khafra 451 days ago
	"LLM whisperer" folks will confidently claim that base models are substantially smarter than fine-tuned chat models; with qualitative differences in capabilities. But you have to be an LLM whisperer to get useful work out of a base model, since they're not SFT'ed, RLHF'ed, or RLAIF'ed into actually wanting to help you.

2 comments

andai 451 days ago

How can I learn more about this?

Is it like in the early GPT-3 days, when you had to give it a bunch of examples and hope it catches the pattern?

link

nullc 450 days ago

Not so much examples, though those can help... but you have to imagine a document of a sort that would be in the training set whose completion would be the answer you seek.

Like, "Solve this equation for me: " more likely gets completed with "Do your own homework buddy!" or just a list of more similar questions without answers. While, "careful analysis revealed the solution the equation X turned out to have a solution of", might be more likely to get what you want.

Also a lot more sensitivity to tone and context, write a prompt that sounds like it was written on some teenager fan subreddit, you'll get an answer of the sort that sounds like it belongs there.

link

im3w1l 451 days ago

Back in those days I would either create a little scene with a knowledgeable person and someone with a question. Or I would start writing a monologue and generate a continuation for it.

link

Der_Einzige 451 days ago

Me being old man yelling at cloud about how your chat/tool template matters more than your post-training technique.

DeepSeek-R1 is trivially converted back to a non reasoning model with just chat template modifications. I bet you can chat template your way into a good quality model from a base model, no RLHF/DPO/SFT/GRPO needed.

link