I would personally love that project but there are already so many versioning issues in the space it would be a nightmare if ROCm randomly broke things all the time.
And we're talking about Intel here. AMD is going to price competitively against Nvidia but they'd still rather you buy a $20,000 MI300 than a hypothetical 128GB Radeon for $2000.
Intel could very easily just put a buttload of VRAM on their existing GPUs to stick it to their competitors and make out like bandits. All they'd have to do is charge a Big markup instead of an Enterprise markup. And Intel has a better history of not making broken libraries.
Assuming 50 input tokens per second, you could still be waiting ten minutes for a full 32k token prompt.
What you are talking about is highly optimized inference using accelerators, batching and speculative decoding to achieve high throughout. Once you have that then compute is irrelevant except in terms of cost, but if all you have is a small consumer grade GPU you will be compute limited at the extreme limits of your context window.
A capable GPU with 24+ GB would sell if it significantly undercuts Nvidia. Just look at geohot building his tinyboxes with AMD cards.