

by ebalit 714 days ago

We built the cheapest Llama 3.1 70B inference API, specialized for tasks that are not time-sensitive (e.g. batch processing jobs).

Without any quantization, our current price is 30 cents per million input tokens and 50 cents per million output tokens. [1]
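
To get a feel for what that means for a batch job, here is a rough back-of-the-envelope sketch. It assumes the two quoted rates are the only charges (no minimums, no discounts), and the helper name `batch_cost_cents` is made up for illustration; it is not part of any API.

```python
# Prices quoted in the post, in cents per million tokens.
INPUT_CENTS_PER_MTOK = 30
OUTPUT_CENTS_PER_MTOK = 50


def batch_cost_cents(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a job in cents (hypothetical helper)."""
    total = (input_tokens * INPUT_CENTS_PER_MTOK
             + output_tokens * OUTPUT_CENTS_PER_MTOK)
    return total / 1_000_000


# Example job: 200M input tokens and 20M output tokens.
print(batch_cost_cents(200_000_000, 20_000_000))  # 7000.0 cents
```

So a job of that size would come to about 7,000 cents, i.e. 70 in whatever currency the cents are denominated in, under those assumptions.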

1 comment

Amazing! Please don't hesitate to open an issue or a PR. We'll update our dataset and add it.