Deploying Llama3 70B on AWS – GPU Requirement, Cost and Step-by-Step Guide

(slashml.com)
3 points by JJneid 742 days ago

5 comments

rini17 742 days ago

Note that quantized versions of llama3 70B can be run on CPU on a much cheaper server. I am personally using it via llama.cpp on a bare-metal 6-core Xeon CPU with 128 GB of RAM for ~50 euro monthly.
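
For a rough sense of why a quantized 70B model fits in 128 GB of RAM, here is a back-of-the-envelope sketch. It counts weights only and assumes a nominal 70e9 parameters and round bit widths. The helper name `quantized_weight_gib` is made up for illustration. Real quantization formats store extra per-block metadata, and the KV cache and runtime need memory on top, so treat these as lower bounds rather than measurements.

```python
def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB (hypothetical helper)."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: nominal "70B" parameter count; the exact figure differs slightly.
N_PARAMS = 70e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{quantized_weight_gib(N_PARAMS, bits):.0f} GiB")
```

Under these assumptions the unquantized 16-bit weights alone come to about 130 GiB, while 8-bit and 4-bit weights come to roughly 65 GiB and 33 GiB, which leaves headroom on a 128 GB machine.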

JJneid 742 days ago

Is inference speed an issue for you?

rini17 742 days ago

Sufficient for fluent conversation.

JJneid 742 days ago

Output quality usually takes a hit with quantization. Are you getting quality responses?

rini17 742 days ago

Since llama3, yes, quite satisfying.
