| HN Mirror



	by throwaway314155 158 days ago
	That doesn’t tell you if the new method continues to perform better at higher parameter counts.

2 comments

tuned 158 days ago

It most likely will in terms of performance, since it uses 50% less memory (certainly at inference time, which is the most frequent operation in web services), so it can leverage longer T and D, provided the design is confirmed and generation quality is comparable to other models. If this basic assumption holds, it means significant electricity savings, because the same GPUs can serve more requests.
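A rough back-of-envelope sketch of the "same GPUs can serve more requests" argument. It assumes, hypothetically, that the memory the new method halves is a per-request cost that grows with T (sequence length) and D (model dimension), like a standard KV cache. The function name, parameters and all numbers below are illustrative only; model weights are treated as a fixed cost, and activations and other overheads are ignored.

```python
# Hypothetical back-of-envelope model: assumes the per-request memory that the
# new method halves behaves like a standard KV cache, i.e. it grows with T * D.
def max_concurrent_requests(gpu_mem_bytes, weights_bytes, n_layers, seq_len,
                            d_model, bytes_per_elem=2, mem_factor=1.0):
    """Return how many requests of length seq_len fit in the free GPU memory.

    mem_factor scales the per-request cost: 1.0 for baseline causal attention,
    0.5 for a method that uses 50% less memory (the figure claimed above).
    """
    # Keys and values (the factor 2) for every layer, token and hidden dim.
    per_request = 2 * n_layers * seq_len * d_model * bytes_per_elem * mem_factor
    free_bytes = gpu_mem_bytes - weights_bytes
    return int(free_bytes // per_request)


GiB = 2**30
# Illustrative numbers only: 80 GiB GPU, 16 GiB of weights, 32 layers, D = 4096.
common = dict(gpu_mem_bytes=80 * GiB, weights_bytes=16 * GiB,
              n_layers=32, d_model=4096)

print(max_concurrent_requests(seq_len=4096, **common))                  # 32
print(max_concurrent_requests(seq_len=4096, mem_factor=0.5, **common))  # 64
print(max_concurrent_requests(seq_len=8192, mem_factor=0.5, **common))  # 32
```

Under these assumptions, halving per-request memory either doubles the number of concurrent requests at a fixed T, or doubles T at the same concurrency. It says nothing about generation quality.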


throwaway314155 157 days ago

By performance, I meant the accuracy of the model, not the runtime/memory characteristics.


amelius 158 days ago

Nor that the training from scratch will even work.


tuned 158 days ago

Exactly, that is the current objective: to prove that generation for a specific domain is on par with causal attention models.
