HN Mirror



	by gemeral 844 days ago
	> and blowing up parameter count to make up for it

	Based on an admittedly rapid and indulgent reading of the paper, it seems like they're not increasing the parameter count. Do you mind pointing out where the blowup is occurring?

1 comment

magnustesshu 844 days ago

They're saying that models of comparable size will likely perform worse (the paper claims they perform as well).

But since they are more memory efficient (up to 8x, or about 10x if terns are packed more tightly than 2 bits each; in practice it seems closer to 3-5x, given the other, larger structures needed in memory), the largest models can be that much larger.
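
To make the 8x and 10x figures concrete, here is a rough sketch of the arithmetic. It assumes a 16-bit baseline per weight, which the comment does not state, and the function names are hypothetical, not taken from the paper. Packing "beyond 2 bits" is illustrated with base-3 packing, which fits five terns in one byte because 3^5 = 243 <= 256.

```python
# Sketch only: the 16-bit baseline and all names here are assumptions.
BASELINE_BITS = 16   # assumed bits per weight in the uncompressed model
TRITS_2BIT = 4       # 2 bits per tern -> 4 terns per byte
TRITS_BASE3 = 5      # base-3 packing -> 5 terns per byte (3**5 = 243 <= 256)


def compression_ratio(baseline_bits, trits_per_byte):
    """Ideal weight-only compression versus the baseline."""
    return baseline_bits * trits_per_byte / 8


def pack_trits_base3(trits):
    """Pack ternary weights (-1, 0, 1) five per byte as base-3 digits."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        value = 0
        # Build the byte so the first tern ends up as the lowest base-3 digit.
        for t in reversed(trits[i:i + 5]):
            value = value * 3 + (t + 1)
        out.append(value)
    return bytes(out)


def unpack_trits_base3(data, count):
    """Inverse of pack_trits_base3; count drops padding in the last byte."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


weights = [1, -1, 0, 0, 1, -1, -1, 1, 0, 1, 0]
packed = pack_trits_base3(weights)
restored = unpack_trits_base3(packed, len(weights))

ratio_2bit = compression_ratio(BASELINE_BITS, TRITS_2BIT)
ratio_base3 = compression_ratio(BASELINE_BITS, TRITS_BASE3)
```

These are ideal ratios for the ternary weights alone. Anything kept at higher precision (activations, caches, scaling factors) is not covered, which is consistent with the 3-5x practical estimate above.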
