Estimating required GPU memory for serving LLMs


	Estimating required GPU memory for serving LLMs (substratus.ai)
	2 points by samosx 938 days ago

1 comments

samosx 938 days ago

Having a hard time estimating how much GPU memory you need to serve an LLM? Not sure what kind of GPUs to use, or how many?

I wrote a blog post to demystify the process of estimating GPU memory usage.


brianjking 938 days ago

My issue is figuring out how many concurrent users you can support, on average, on a given GPU.

Working out the VRAM needed just to load the weights is easy enough. But when you allow something like content generation with varying numbers of input/output tokens, how do you even begin to work out which GPUs you need?
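
One rough way to approach this question is to treat the memory left after loading the weights as a budget for the per-token key/value cache. The sketch below is a back-of-the-envelope estimate, not the method from the linked blog post. It assumes a standard transformer decoder with fp16 weights and cache, a fixed fraction of memory reserved for activations and runtime overhead, and an average token count per request that you supply. All function names and the example numbers are hypothetical.

```python
def weights_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the model weights (2 bytes per parameter assumes fp16/bf16)."""
    return n_params * bytes_per_param


def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """KV cache cost of one token: one key and one value vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_concurrent_sequences(gpu_gib: float, n_params: float, n_layers: int,
                             n_kv_heads: int, head_dim: int,
                             avg_tokens_per_seq: int,
                             overhead_frac: float = 0.1) -> int:
    """Estimate how many sequences fit at once (assumed 10% overhead reserve)."""
    usable = gpu_gib * 2**30 * (1 - overhead_frac)
    free_for_cache = usable - weights_bytes(n_params)
    if free_for_cache <= 0:
        return 0  # the weights alone do not fit on this GPU
    per_seq = kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim) * avg_tokens_per_seq
    return int(free_for_cache // per_seq)


if __name__ == "__main__":
    # Hypothetical 7e9-parameter model: 32 layers, 32 KV heads, head dim 128,
    # served on a 24 GiB GPU with requests averaging 2048 tokens (prompt + output).
    print(max_concurrent_sequences(24, 7e9, 32, 32, 128, avg_tokens_per_seq=2048))
```

Because input and output lengths vary, the average token count is the weakest input here. Running the estimate with a pessimistic value (for example the maximum context length) gives a lower bound on concurrency, and measuring real traffic is the only way to tighten it.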
