Because otherwise people would just put a 1-frame endcard in their video with whatever they want to have as a thumbnail?
And if the thumbnail is picked randomly, you are just rewarding videos that show dramatic content with text overlay at all times. And the best-researched Tom Scott video might get randomly punished because the thumbnail is a brown-grey blur in the middle of a transition.
The existence of the medium "thumbnail" is no more of a problem than the existence of book covers. Breeding a healthy culture of what should be on it is the problem we should focus on, through both audience behavior and the suggestion algorithm.
I think largely because not allowing it doesn't make that much of a difference; I can just as easily place a click-baity frame within my actual video and use that as the thumb.
Because it's great for analytics. The formula of "face in thumbnail, big text, bold colours" works and YouTube WANT you to be clickbaited so that you watch the ads.
Older YouTubers used to manipulate this by placing the suggestive part of the video at exactly the point the automatic thumbnail was taken from.
Here's a blast from the past doing just that: https://www.youtube.com/watch?v=r8tXjJL3xcM