| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mks_shuffle 758 days ago
	You can try Groq API for faster inference. They use custom hardware to speed up the inference. Supported open models can be found here: https://console.groq.com/docs/models (includes llama-70b)

1 comments

thanks, tried this to some mixed results. seems like they have caps on speed/rate limits etc if you havent spoken to them so might reach out