Y
Hacker News
new
|
ask
|
show
|
jobs
by
mks_shuffle
758 days ago
You can try Groq API for faster inference. They use custom hardware to speed up the inference. Supported open models can be found here:
https://console.groq.com/docs/models
(includes llama-70b)
1 comments
yungtriggz
755 days ago
thanks, tried this to some mixed results. seems like they have caps on speed/rate limits etc if you havent spoken to them so might reach out
link