I literally wrote about running quantized models and how much more affordable it could be in the very next sentence. Please don't reply if you can't be bothered to read the entire comment, it's not that long.
I read the comment, thanks. I just disagree with your cost estimate. Even for a small business that needs high throughput they could probably do it for far less than $300k if they aren’t just blindly buying the first big nvidia setup they can.