HN Mirror


	by xadhominemx 843 days ago
	Not without CUDA unfortunately.

2 comments

KeplerBoy 843 days ago

Don't underestimate the amount of shit people would be willing to deal with to make stuff work.

A capable GPU with 24+ GB would sell if it significantly undercuts Nvidia. Just look at geohot building his tinyboxes with AMD cards.

xadhominemx 843 days ago

I would personally love that project, but there are already so many versioning issues in the space that it would be a nightmare if ROCm randomly broke things all the time.

KeplerBoy 843 days ago

I agree, ROCm seems to be a mess from the outside, but I'm glad people are putting in the effort.

AnthonyMouse 843 days ago

And we're talking about Intel here. AMD is going to price competitively against Nvidia but they'd still rather you buy a $20,000 MI300 than a hypothetical 128GB Radeon for $2000.

Intel could very easily just put a buttload of VRAM on their existing GPUs to stick it to their competitors and make out like bandits. All they'd have to do is charge a Big markup instead of an Enterprise markup. And Intel has a better history of not making broken libraries.

cyanydeez 843 days ago

A lot of the true AI value is limited by context window size, not by compute.

imtringued 842 days ago

Assuming 50 input tokens per second, you could still be waiting ten minutes for a full 32k token prompt.

What you are talking about is highly optimized inference using accelerators, batching and speculative decoding to achieve high throughput. Once you have that, compute is irrelevant except in terms of cost, but if all you have is a small consumer-grade GPU, you will be compute-limited at the extreme limits of your context window.
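For anyone who wants to check the "ten minutes" figure, here is a rough back-of-the-envelope sketch. It assumes "32k" means 32 × 1024 tokens and that prompt processing runs at a constant 50 tokens per second; the function name `prefill_wait_seconds` is made up for illustration and does not come from any library.

```python
def prefill_wait_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to ingest a prompt, assuming a constant prompt-processing rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


# Figures from the comment above; 32k is assumed to mean 32 * 1024 tokens.
wait = prefill_wait_seconds(32 * 1024, 50)
print(f"{wait:.0f} s, about {wait / 60:.1f} minutes")
```

Under those assumptions the wait comes out to roughly 655 seconds, or a little under eleven minutes, before the first output token appears.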

cyanydeez 842 days ago

I'm talking about context in, not out. The reports I have and the knowledge base I want answers from are 500-1000k tokens.

I don't need long answers, I need my site-specific knowledge base.
