More like "you need to sign up for our website and pay for a subscription", and I'd much rather do that if it's actually providing value. I am absolutely not going to run model locally which slowly churns out words at 5 tps while making the computer hot to touch.
I would very much like not to have to download 22 GB for some inference capability that is way worse than API calls both in terms of quality and speed.
I would rather pay money than seeing this thing running in my browser that only prints 5 tps on high-end consumer hardware.
Fair, but actually you'd surely want your choice of those three, right?
And what's being discussed here is what the better implementation of option 3 is.
My point is that if you're going with one of the possible implementations of option 3, then 22GB per browser is objectively a lot better than 22GB per website.