Hacker News
Deploying Llama3 70B on AWS – GPU Requirement, Cost and Step-by-Step Guide (slashml.com)
3 points by JJneid 742 days ago | 1 comment
rini17
742 days ago
Note that quantized versions of Llama3 70B can be run on the CPU of a much cheaper server. I personally run it with llama.cpp on a bare-metal 6-core Xeon with 128 GB of RAM for about 50 euros a month.
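Why quantization is what makes 128 GB of RAM enough can be checked with back-of-the-envelope arithmetic. The sketch below assumes weight memory ≈ parameters × bits per weight / 8 and uses a nominal 70e9 parameters; it deliberately ignores the KV cache, activations, and the extra per-block scale data that real quantization formats (such as the GGUF types llama.cpp uses) add, so real files are somewhat larger. The helper name `weight_memory_gib` is hypothetical, written only for this illustration.

```python
# Rough estimate of the RAM needed just to hold model weights.
# Assumption: memory ~= parameters * bits_per_weight / 8. This ignores the
# KV cache, activations, and per-block scale overhead of real quant formats.

PARAMS_70B = 70e9  # nominal parameter count for a "70B" model


def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate size of the weights in GiB (2**30 bytes)."""
    return n_params * bits_per_weight / 8 / 2**30


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gib(PARAMS_70B, bits):.0f} GiB")
# prints:
# 16-bit: ~130 GiB
#  8-bit: ~65 GiB
#  4-bit: ~33 GiB
```

Under these assumptions, unquantized 16-bit weights alone would not fit in 128 GB, while a roughly 4-bit version leaves ample headroom for the context cache and the operating system.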
JJneid
742 days ago
Is inference speed an issue for you?
rini17
742 days ago
Sufficient for fluent conversation.
JJneid
742 days ago
Output quality usually takes a hit with quantization. Are you getting good responses?
rini17
742 days ago
Since Llama3, yes; they are quite satisfying.