Seems like the metric they're optimising for is the number of bad answers, not the proportion of bad answers, and giving non-answers to a larger fraction of questions will bring that number down.
I haven't noticed ChatGPT-4 giving worse answers overall recently, but I have noticed it refusing to answer more queries. I couldn't get it to cite case law, for example (a test inspired by that fool of a lawyer who couldn't be bothered to check citations).