HN Mirror

	by ipsum2 606 days ago
	You can't expect a 1B model to perform as well as a 7B model or ChatGPT; the best use case is probably speculative decoding or fine-tuning for a specific task.

theanonymousone 606 days ago

What is "speculative decoding"?

regularfry 606 days ago

Speculative decoding uses a small model to quickly draft a short run of tokens, which you then pass through a larger model to check and correct. It can be much faster than using the larger model alone, with tolerably close accuracy.

qeternity 606 days ago

> with tolerably close accuracy.

No, speculative decoding has exactly the same accuracy as the target model. It is mathematically identical to greedy decoding with the target model alone.

kgc 605 days ago

Is there a reference for this? I was wondering the same thing.

qeternity 605 days ago

Read the original whitepaper or go look at how any framework implements it.

You will see that draft tokens not predicted by greedy sampling of the target model are rejected. Ergo, the outputs are mathematically identical.
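
For anyone who wants to see the rejection rule in action, here is a minimal sketch of the greedy case. The `target_model` and `draft_model` functions are made-up toy stand-ins (not from any framework), and a real implementation would verify all drafted positions in one batched forward pass of the target rather than in a Python loop. With sampling instead of greedy decoding, the published method uses a rejection-sampling rule that preserves the target distribution; this sketch does not cover that case.

```python
from typing import Callable, List

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical toy "large" model: deterministic function of the prefix.
    return (sum(prefix) * 31 + len(prefix) * 7) % 10


def draft_model(prefix: List[int]) -> int:
    # Hypothetical toy "small" model: agrees with the target except when the
    # prefix length is a multiple of 3, where it guesses wrong on purpose.
    guess = target_model(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 10


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    # Plain greedy decoding: one model call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out


def speculative_greedy_decode(
    target: Model, draft: Model, prompt: List[int], n: int, k: int = 4
) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + n
    while len(out) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position in order.
        for i in range(k):
            expected = target(out + proposal[:i])
            if proposal[i] != expected:
                # First mismatch: keep the accepted prefix, take the
                # target's own token, and discard the rest of the draft.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every draft token was accepted; the target's verification
            # also yields one extra token for free.
            out.extend(proposal)
            out.append(target(out))
    return out[:goal]


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 20)
speculative = speculative_greedy_decode(target_model, draft_model, prompt, 20)
print(baseline == speculative)
```

Every token that survives is, by construction, the token the target would have picked greedily for that prefix, so the draft model only affects speed, never the output.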
