Hacker News
by minimaxir 811 days ago
That's definitely one way to leverage the GPU VRAM inflation intended for LLM training.

I'm fairly confident that most of the hardware you see available today (for consumers) wasn't specifically designed with LLMs in mind.
Sure, the 8GB VRAM gaming GPUs aren't designed for LLMs (and would get effectively zero benefit from the data throughput of GPU-accelerated data frames compared to typical approaches), but the 80GB A100 server GPUs definitely are.
> but the 80GB A100 server GPUs definitely are

I'm sure LLMs were considered, like many other ML use cases, but that the A100 was intended specifically for LLMs? I'm unsure about that.

The A100 was released the same year as GPT-3, and it wasn't until GPT-3 went live that people really started paying attention. And I'm sure designing and producing a GPU takes longer than a couple of months.