by mks_shuffle 758 days ago
You can try the Groq API for faster inference. They use custom hardware to speed it up. Supported open models are listed here: https://console.groq.com/docs/models (includes llama-70b)
1 comment

Thanks, tried this with mixed results. It seems they cap speed and apply rate limits if you haven't spoken to them, so I might reach out.
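
For anyone hitting the same caps, a common workaround is to retry with exponential backoff when a request is rate limited. This is only a rough sketch, not Groq's client or documented behavior: `RateLimited` and `with_backoff` are hypothetical names, and it assumes your own request wrapper raises `RateLimited` when the API answers with HTTP 429.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error: raised by your own wrapper on an HTTP 429 response."""


def with_backoff(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying with exponential backoff plus jitter on RateLimited."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Delay doubles each attempt: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # Add up to 50% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Usage would be something like `with_backoff(lambda: send_request(prompt))`, where `send_request` is whatever function you already use to call the API. Backoff only smooths over short bursts; if the cap itself is too low, asking them for a higher limit is still the real fix.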