SAM is also conditioned on points: if it's ambiguous what you want to mask, you can add a point on the saddle and the model will include it without a problem. Segmentation is pretty much solved; I agree with the parent post.
IME I haven't gotten great results with SAM. Maybe it was just the images I was using? They weren't great quality, and it seemed to struggle with low-contrast areas.
If the user generates a picture of a horse and rider to add to another composition, they probably want to include the saddle.