	by redlock 505 days ago
	The issue here is that even with enough VRAM to run the model, a large context will still make it too slow. (For example, LLaMA 70B takes minutes just to process a 30k+ token prompt.)
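
To make the "minutes" figure concrete, here is a rough back-of-the-envelope sketch. It assumes the common approximation of about 2 FLOPs per parameter per prompt token, which ignores the attention cost that grows with context length, so it understates long prompts. The `EFFECTIVE_FLOPS` value is a hypothetical sustained throughput, not a measurement of any particular GPU, and `prefill_seconds` is just an illustrative helper.

```python
def prefill_seconds(n_params: float, prompt_tokens: int, effective_flops: float) -> float:
    """Rough prompt-processing time, assuming ~2 FLOPs per parameter per token."""
    total_flops = 2.0 * n_params * prompt_tokens
    return total_flops / effective_flops


# Assumed inputs: a 70B-parameter model and a 30k-token prompt, as in the comment.
N_PARAMS = 70e9
PROMPT_TOKENS = 30_000
# Hypothetical sustained throughput in FLOP/s; real setups vary widely.
EFFECTIVE_FLOPS = 20e12

seconds = prefill_seconds(N_PARAMS, PROMPT_TOKENS, EFFECTIVE_FLOPS)
print(f"{seconds / 60:.1f} minutes")
```

With these assumed numbers the estimate comes out to about 3.5 minutes before the first output token. The point is only the scaling: prompt processing is compute-bound, so having enough VRAM to hold the model does not make a long prompt fast.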