by Anticlockwise 1060 days ago
Can you or anyone else comment on how Replicate's per-second pricing compares to OpenAI's per-token pricing when using Llama 2?
2 comments

My hunch is that OpenAI is a lot cheaper. I've spent $0.26 on 115 seconds of compute with Llama 2 on Replicate so far, which is only a dozen test prompts.
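For a rough sense of scale, the figures above can be turned into an effective rate. This is only a sketch: it takes "a dozen" to mean exactly 12 prompts, which is an assumption, and the variable names are made up for illustration.

```python
# Figures quoted in the comment above.
spent_usd = 0.26
compute_seconds = 115
prompts = 12  # assumption: "a dozen" taken literally

# Effective price paid per second of compute.
usd_per_second = spent_usd / compute_seconds

# Average cost and duration of a single test prompt.
usd_per_prompt = spent_usd / prompts
seconds_per_prompt = compute_seconds / prompts

print(f"${usd_per_second:.4f} per second")
print(f"${usd_per_prompt:.4f} per prompt")
print(f"{seconds_per_prompt:.1f} seconds per prompt")
```

That is roughly two cents and ten seconds per prompt under these assumptions.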
It is insanely more expensive on Replicate, and they don't have the 70B model yet, which will make it even more prohibitive.
Looks like it's here now: https://replicate.com/replicate/llama70b-v2-chat

As for pricing, that model's page says: "Predictions run on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 17 seconds."

And the pricing page (https://replicate.com/pricing) says Nvidia A100 (80GB) GPU hardware costs $0.0032 per second.

So Llama 2 70B would "typically" cost under 17 x 0.0032 = $0.0544 per run.
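To compare this against per-token pricing, the per-run cost has to be divided by the number of tokens a run produces. A minimal sketch of that conversion follows. The rate and the 17 seconds come from the pages quoted above; the 500-token output is a hypothetical placeholder, not a measured figure, and the function names are invented for illustration.

```python
A100_80GB_USD_PER_SECOND = 0.0032  # from the pricing page quoted above
TYPICAL_SECONDS_PER_RUN = 17       # from the model page quoted above


def cost_per_run(seconds, usd_per_second=A100_80GB_USD_PER_SECOND):
    """Cost in USD of one prediction billed per second of GPU time."""
    return seconds * usd_per_second


def cost_per_1k_tokens(seconds, tokens, usd_per_second=A100_80GB_USD_PER_SECOND):
    """Per-second billing expressed as USD per 1,000 tokens processed."""
    return cost_per_run(seconds, usd_per_second) / tokens * 1000


typical_run_usd = cost_per_run(TYPICAL_SECONDS_PER_RUN)
print(f"${typical_run_usd:.4f} per typical run")

# Hypothetical: assume a 17-second run handles 500 tokens in total.
assumed_tokens = 500
per_1k_usd = cost_per_1k_tokens(TYPICAL_SECONDS_PER_RUN, assumed_tokens)
print(f"${per_1k_usd:.4f} per 1K tokens at {assumed_tokens} tokens per run")
```

Swapping in a measured token count per run gives a number that can be set directly next to a per-token price list.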

Thank you for checking that.