| HN Mirror



	by bytesandbits 124 days ago
	We constantly underestimate the power of inference scaffolding. I have seen it in all domains: coding, ASR, ARC-AGI benchmarks, you name it. Scaffolding can do a lot! And post-training too. I am confident our current pre-trained models can score over 80% on this benchmark with the right post-training and scaffolding. That being said, I don't think ARC-AGI proves much. It is not a useful task at all in the wild. It is just a game, a strange and confusing one. For me this is just a pointless pseudo-academic exercise. Good to have, but it by no means measures intelligence, and even less the utility of a model.

3 comments

ithkuil 124 days ago

That's unsurprising, given that a lot of our own abilities as humans come from painstakingly acquired practices, methodologies and tools (pencil and paper, note taking, not to mention algebra, formal methods and electromechanical aids). We call this "education", but it works in a way that is more similar to agentic harnesses than to pretraining or fine-tuning. This is reflected in the fundamentally different ways in which children and adults learn new skills.


Linello 124 days ago

Scaffolding is all you need. I am absolutely certain about that. It's about finding good ways to approximate, at inference time, the reward function used during post-training. A reward that is general enough to score candidates well will inevitably improve the abilities of LLMs when they are put inside scaffolds.
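
To make the "score candidates at inference time" idea concrete, here is a minimal best-of-N sketch. It is only an illustration under stated assumptions: `generate` stands in for sampling from a model and `score` stands in for whatever approximation of the reward is available. Both names, and the toy functions below, are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    generate: Callable[[random.Random], str],
    score: Callable[[str], float],
    n: int = 8,
    seed: int = 0,
) -> Tuple[str, float]:
    """Sample n candidates and return the highest-scoring one with its score."""
    rng = random.Random(seed)
    candidates: List[str] = [generate(rng) for _ in range(n)]
    scored = [(score(c), c) for c in candidates]
    # Ties are broken in favour of the earliest candidate.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Toy stand-ins (hypothetical): the "model" guesses an integer as text,
# and the "reward" prefers guesses close to 42.
def toy_generate(rng: random.Random) -> str:
    return str(rng.randint(0, 100))


def toy_score(candidate: str) -> float:
    return -abs(int(candidate) - 42)
```

Calling `best_of_n(toy_generate, toy_score, n=16)` returns the sampled guess closest to 42. The model is never touched; all of the gain comes from sampling more and scoring better.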


nubg 124 days ago

What exactly does scaffolding mean in this context? Genuine question.


boxed 118 days ago

I'm gonna guess it means "whatever we still need humans to figure out and spoon-feed to the models".


bytesandbits 124 days ago

Anything that doesn't touch the model parameters at all once it has been compiled. For example, in streaming ASR with an encoder-decoder you can get gains in accuracy just by improving the encoder-decoder orchestration and ratio, the frequency of forward passes, and by dynamically adjusting the length of the rolling windows (if using full attention). Prompting is part of this too, including few-shot examples. Decoding strategy is also part of this (top-k, nucleus, speculative decoding, greedy or anything else). So is applying signal processing, or any other kind of processing, to the input before it goes into the model, or to the output. There are a lot of things you can do.
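
As one illustration of the decoding-strategy point, here is a rough sketch of greedy, top-k and nucleus (top-p) selection over a single vector of logits. It is a toy in pure Python, not any particular library's implementation. The function names are made up for the example, and it assumes `top_k >= 1` and `0 < top_p <= 1`.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(
    probs: List[float],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> List[float]:
    """Zero out tokens outside the top-k / nucleus set, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        # Keep the smallest prefix whose cumulative (unrenormalized) mass reaches top_p.
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept
    total = sum(probs[i] for i in order)
    out = [0.0] * len(probs)
    for i in order:
        out[i] = probs[i] / total
    return out


def sample_token(
    logits: List[float],
    rng: random.Random,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    greedy: bool = False,
) -> int:
    """Pick the next token id; the model's logits are used as-is."""
    probs = softmax(logits)
    if greedy:
        return max(range(len(probs)), key=lambda i: probs[i])
    filtered = filter_probs(probs, top_k, top_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]
```

The same logits can produce different outputs depending only on `top_k`, `top_p` or `greedy`, which is why decoding counts as scaffolding in the sense above: the parameters never change.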


Linello 124 days ago

Also think about the program-synthesis approach proposed by Poetiq.ai. Python programs are generated and evaluated against the previous examples. Then in-context learning is done programmatically via prompt concatenation. If you can "score" the working and non-working examples online, then you have a very strong reward signal.
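
A rough sketch of that kind of loop, to show the shape of the idea rather than Poetiq's actual system: `propose` is a hypothetical stand-in for the LLM call, candidates are assumed to define a function named `transform`, and the prompt format is invented for the example. Running generated code with `exec` is unsafe outside a sandbox.

```python
from typing import Any, Callable, List, Tuple

Example = Tuple[Any, Any]  # (input, expected output)


def score_program(source: str, examples: List[Example]) -> Tuple[float, List[str]]:
    """Run the candidate's `transform` on each example; return pass rate and failure notes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # assumption: trusted toy code; sandbox this in practice
        fn = namespace["transform"]
    except Exception as err:
        return 0.0, [f"program failed to load: {err!r}"]
    failures: List[str] = []
    for x, expected in examples:
        try:
            got = fn(x)
        except Exception as err:
            failures.append(f"input {x!r}: raised {err!r}")
            continue
        if got != expected:
            failures.append(f"input {x!r}: expected {expected!r}, got {got!r}")
    return 1.0 - len(failures) / len(examples), failures


def synthesize(
    propose: Callable[[str], str],
    examples: List[Example],
    base_prompt: str,
    max_rounds: int = 4,
) -> Tuple[str, float]:
    """Propose programs, score them on the examples, and feed failures back via the prompt."""
    prompt = base_prompt
    best_source, best_score = "", -1.0
    for _ in range(max_rounds):
        source = propose(prompt)
        score, failures = score_program(source, examples)
        if score > best_score:
            best_source, best_score = source, score
        if score == 1.0:
            break
        # In-context "learning" by concatenation: append the attempt and what went wrong.
        prompt += (
            "\n\nPrevious attempt:\n" + source + "\nFailures:\n" + "\n".join(failures)
        )
    return best_source, best_score
```

The pass rate on the known examples plays the role of the reward signal, and the growing prompt is the only thing that changes between rounds.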
