If you want to run larger models, CPU inference is your only choice.
Also, not many implementations can even use it.
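
A rough way to sanity-check the first claim, assuming the limiting factor is memory (the post does not say so explicitly): estimate how much space the weights alone need and compare that with what the device offers. The function names, the 70B parameter count, and the 24 GiB / 64 GiB capacities below are illustrative assumptions, not figures from this thread, and the estimate ignores activations, KV cache, and runtime overhead.

```python
# Back-of-the-envelope weight footprint. All concrete numbers here are
# hypothetical examples, not measurements.

def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits(n_params: float, bits_per_weight: float, available_gib: float) -> bool:
    """True if the weights alone fit in the given amount of memory."""
    return weight_memory_gib(n_params, bits_per_weight) <= available_gib


if __name__ == "__main__":
    n_params = 70e9  # assumed model size
    for bits in (16, 8, 4):
        need = weight_memory_gib(n_params, bits)
        print(f"{bits:>2}-bit weights: {need:6.1f} GiB | "
              f"fits in 24 GiB: {fits(n_params, bits, 24)} | "
              f"fits in 64 GiB: {fits(n_params, bits, 64)}")
```

Under these assumptions, a model of that size exceeds a 24 GiB accelerator even at 4 bits per weight, while it would fit in 64 GiB of system RAM, which is the kind of situation where falling back to the CPU is the practical option.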