by AnthonyMouse 843 days ago
Intel doesn't sell many graphics cards at all, though. Be the first to offer 64GB of VRAM for under $1000 and that could change pretty fast.

Not without CUDA, unfortunately.
Don't underestimate the amount of shit people would be willing to deal with to make stuff work.

A capable GPU with 24+ GB would sell if it significantly undercuts Nvidia. Just look at geohot building his tinyboxes with AMD cards.

I would personally love that project, but there are already so many versioning issues in the space that it would be a nightmare if ROCm randomly broke things all the time.
I agree, ROCm seems to be a mess from the outside, but I'm glad people are putting in the effort.
And we're talking about Intel here. AMD is going to price competitively against Nvidia but they'd still rather you buy a $20,000 MI300 than a hypothetical 128GB Radeon for $2000.

Intel could very easily just put a buttload of VRAM on their existing GPUs to stick it to their competitors and make out like bandits. All they'd have to do is charge a Big markup instead of an Enterprise markup. And Intel has a better history of not making broken libraries.

A lot of the true AI value is context window size limited, not compute limited.
Assuming 50 input tokens per second, you could still be waiting ten minutes for a full 32k token prompt.
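As a rough sanity check on that ten-minute figure, here is a minimal sketch of the arithmetic. It assumes a constant prompt-processing rate of 50 tokens per second (the number from the comment) and treats "32k" as 32,000 tokens. `prefill_seconds` is a hypothetical helper name, and the larger prompt sizes are only there to show how the wait scales. Real throughput will vary with hardware, model, and context length, so this is an estimate, not a benchmark.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Estimate the time to ingest a prompt at a fixed input rate.

    Assumption: the rate stays constant across the whole prompt,
    which is a simplification.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


RATE = 50.0  # input tokens per second, taken from the comment above

# 32,000 matches the "32k token prompt"; the other sizes are illustrative.
for tokens in (32_000, 500_000, 1_000_000):
    minutes = prefill_seconds(tokens, RATE) / 60
    print(f"{tokens:>9,} tokens: {minutes:6.1f} min")
```

At this rate the wait grows linearly with prompt size: 32,000 tokens works out to 640 seconds, a bit under eleven minutes.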

What you are talking about is highly optimized inference using accelerators, batching and speculative decoding to achieve high throughput. Once you have that, compute is irrelevant except in terms of cost, but if all you have is a small consumer-grade GPU you will be compute limited at the extreme limits of your context window.

I'm talking about context in, not out. The reports I have and the knowledge base I want answers from are 500k-1000k tokens.

I don't need long answers, I need my site-specific knowledge base.