
	by Anticlockwise 1107 days ago
	Can you or anyone else comment on how Replicate's per-second pricing compares to OpenAI's per-token pricing when using Llama 2?

2 comments

simonw 1107 days ago

My hunch is that OpenAI is a lot cheaper. I've spent $0.26 on 115 seconds of compute with Llama 2 on Replicate so far, which is only a dozen test prompts.
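
As a rough check on those figures, the sketch below derives the effective per-second rate and an approximate per-prompt cost. The prompt count of 12 is an assumption standing in for "a dozen", and the variable names are purely illustrative.

```python
from decimal import Decimal

# Figures quoted in the comment above.
total_spend = Decimal("0.26")   # USD spent so far
total_seconds = 115             # seconds of compute billed
prompt_count = 12               # assumption: "a dozen" taken as exactly 12

four_places = Decimal("0.0001")  # round results to four decimal places

# Effective price per second of compute.
rate_per_second = (total_spend / total_seconds).quantize(four_places)

# Approximate cost of a single test prompt.
cost_per_prompt = (total_spend / prompt_count).quantize(four_places)

print(f"effective rate: ${rate_per_second} per second")
print(f"approx. cost:   ${cost_per_prompt} per prompt")
```

Under that assumption, each test prompt works out to roughly two cents.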

ta988 1107 days ago

It is insanely more expensive on Replicate, and they don't have the 70B model yet, which will make it even more prohibitive.

richdougherty 1107 days ago

Looks like it's here now: https://replicate.com/replicate/llama70b-v2-chat

As for pricing, that model's page says: "Predictions run on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 17 seconds."

And the pricing page (https://replicate.com/pricing) says Nvidia A100 (80GB) GPU hardware costs $0.0032 per second.

So Llama 2 70B would "typically" cost under 17 x 0.0032 = $0.0544 per run.
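
The same arithmetic as a minimal sketch, in case the price or the typical run time changes. The function name `estimate_run_cost` is hypothetical, and the two inputs are just the figures quoted above, not values fetched from Replicate. A comparison against per-token pricing would also need the number of tokens per run, which isn't given here.

```python
from decimal import Decimal

# Price quoted from the pricing page for Nvidia A100 (80GB) hardware.
A100_80GB_PRICE_PER_SECOND = Decimal("0.0032")  # USD per second


def estimate_run_cost(seconds: int, price_per_second: Decimal) -> Decimal:
    """Estimate the cost of one prediction billed per second of GPU time."""
    return price_per_second * seconds


# "Predictions typically complete within 17 seconds."
typical_cost = estimate_run_cost(17, A100_80GB_PRICE_PER_SECOND)
print(f"typical cost per run: ${typical_cost}")
```

Decimal is used instead of float so the result comes out as exactly 0.0544 without binary rounding noise.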

ta988 1102 days ago

Thank you for checking that.