I think you would need to understand in some detail the makeup of all the content on YouTube before you can make that choice.
As a hypothetical example, say yt had one user and 1 billion years of video. It makes a lot more sense for that one user to do the upscaling than it does for yt to transcode all that video. The reverse is also true, i.e. 1 video, 1 billion users = yt should do a lot of transcoding.
The real solution (in the economic sense of minimising overall cost) is probably a compromise where yt transcode their most popular content, and if you're interested in the long tail of unpopular content then you can invest in upscaling.
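To put rough numbers on that trade-off, here is a minimal sketch in Python. The cost figures and the function names (`cheaper_side`, `break_even_views`) are made up for illustration and are not real yt numbers. It only compares a one-off transcode cost against the summed cost of every viewer upscaling on their own hardware, and ignores storage and bandwidth.

```python
def cheaper_side(views, transcode_cost, upscale_cost_per_view):
    """Say which side can process one video at the lower overall cost.

    All costs are in the same arbitrary unit and are assumptions.
    """
    server_total = transcode_cost                 # paid once, by the site
    client_total = views * upscale_cost_per_view  # paid on every view, by viewers
    return "server" if server_total < client_total else "client"


def break_even_views(transcode_cost, upscale_cost_per_view):
    """View count above which transcoding once beats upscaling per view."""
    return transcode_cost / upscale_cost_per_view


if __name__ == "__main__":
    # Hypothetical costs: 10 units to transcode, 0.5 units per client-side upscale.
    print(cheaper_side(1_000_000_000, 10.0, 0.5))  # popular video
    print(cheaper_side(1, 10.0, 0.5))              # long-tail video
    print(break_even_views(10.0, 0.5))
```

With these assumed costs the popular video comes out as "server" and the single-view video as "client", which is the compromise described above: transcode the head, upscale the tail.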
Course it does: they don't need to run the upscale job, don't need to store the result, and get to sell more hardware.
Maybe not for the user that pays more to upscale, through having to buy hardware or pay extra for electricity (minimal but still). At least they save on their b/w :)