Well, in the real world it might. It should trigger a bug report and a fix to the code, but not an incident.
Now, all of a sudden, to make that call you need more complex or more specific queries in your monitoring system (or a good ML-based alert system), so complexity is already going up.
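To make the "more specific query" point concrete, here is a rough sketch of the kind of logic an alert would need in order to tell a broad 5xx spike from one dominated by a single client. Everything in it is hypothetical: the `Request` record, the `classify_5xx_spike` name, and both thresholds are made up for illustration and would need tuning against real traffic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical log record: who sent the request and what status came back."""
    client_id: str
    status: int


def classify_5xx_spike(requests, rate_threshold=0.05, dominance_threshold=0.8):
    """Classify one window of traffic as "ok", "bug", or "incident".

    Both thresholds are illustrative assumptions, not recommended values.
    """
    total = len(requests)
    if total == 0:
        return "ok"

    errors = [r for r in requests if 500 <= r.status <= 599]
    if len(errors) / total < rate_threshold:
        return "ok"

    # Find the client responsible for the largest share of the 5xx responses.
    by_client = Counter(r.client_id for r in errors)
    _, top_count = by_client.most_common(1)[0]

    # One client causing most of the errors: file a bug, do not page anyone.
    if top_count / len(errors) >= dominance_threshold:
        return "bug"

    # Errors spread across many clients: treat it as degraded service.
    return "incident"
```

Even this toy version needs per-client labels on every request and two tunable thresholds, which is roughly the extra complexity being described.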
If your service is returning 5xx, that is the definition of a server error; of course that is degraded service. Instead we have pointless dashboards that are still green an hour after everything has broken.
Returning 4xx on a client error isn't hard and is usually handled largely by your framework of choice.
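For illustration, a framework-free sketch of that split might look like the following. The `handle` function, the required `"name"` field, and the `process` callback are all hypothetical; a real framework would typically do the validation step for you.

```python
import json


def handle(body: str, process):
    """Return a (status, message) pair for a raw request body.

    `process` is a hypothetical callback standing in for the business logic.
    """
    # Malformed input is the client's fault: answer with a 4xx.
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, "malformed JSON"
    if not isinstance(payload, dict) or "name" not in payload:
        return 400, "missing required field: name"

    # Anything that blows up past this point is the server's fault: 5xx.
    try:
        result = process(payload)
    except Exception:
        return 500, "internal error"
    return 200, result
```

With this shape, a flood of garbage requests shows up as 400s, and only genuine failures in `process` count toward the 5xx rate.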
Quick counter-example for GP: what if the 500 spike is due to a spike in malformed requests from a single (maybe malicious) user?