HN Mirror



	by pilotneko 851 days ago
	I experimented with this model and vLLM around a month ago. The long context length is attractive, but it was incredibly slow on a g5.12xlarge (4 NVIDIA A10G GPUs). I actually could not get it to respond for single examples longer than 50K tokens.
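For a rough sense of the memory side of long prompts on that instance, here is a back-of-envelope KV-cache estimate. It is only a sketch: the model dimensions below (32 layers, 32 KV heads, head size 128, fp16) are hypothetical placeholders, not the configuration of the model discussed, and the calculation says nothing about why generation was slow.

```python
# Rough KV-cache sizing for one long sequence.
# All model dimensions used here are ASSUMED for illustration only.

A10G_MEMORY_GIB = 24  # memory per NVIDIA A10G GPU
NUM_GPUS = 4          # GPUs in a g5.12xlarge


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to hold keys and values for a single sequence.

    The leading factor of 2 covers the two tensors (K and V) per layer.
    bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical 7B-class shape with full multi-head attention.
    cache = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=50_000)
    total = A10G_MEMORY_GIB * NUM_GPUS
    print(f"KV cache for one 50K-token sequence: {to_gib(cache):.1f} GiB")
    print(f"Total GPU memory on the instance:    {total} GiB")
```

Under these assumed dimensions, a single 50K-token sequence needs about 24.4 GiB of cache on top of the model weights, out of 96 GiB across the four GPUs. A model with grouped-query attention has fewer KV heads and a proportionally smaller cache, so the real figure depends on the actual architecture.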