Presumably it has a different representation from a bare T? E.g. if T=Boolean there are only two possible values for it, whereas Option[T] has three, so they need different kinds of storage. (Obv. in practice a boolean stored in a whole byte leaves plenty of unused bit patterns, so you can do a kind of packed optimization. But that doesn't scale arbitrarily far for deeply nested Option[Option[...Option[T]...]], and you'd expect to hit the problem straight away with an integer or pointer, whose natural representation is a full word with no "spare" states.)
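The premise is easy to check with `std::mem::size_of`. This is a small sketch; the exact numbers are what current rustc produces on a typical 64-bit target and are not all promised by the language:

```rust
use std::mem::size_of;

fn main() {
    // bool only uses the byte values 0 and 1, so the compiler can encode
    // None in one of the unused values: no extra storage is needed.
    println!("bool: {}, Option<bool>: {}", size_of::<bool>(), size_of::<Option<bool>>());

    // Every bit pattern of a u64 is a valid value, so Option<u64> needs a
    // separate tag, which alignment pads out to a full word.
    println!("u64: {}, Option<u64>: {}", size_of::<u64>(), size_of::<Option<u64>>());
}
```

On x86-64 this shows `Option<bool>` staying at 1 byte while `Option<u64>` grows from 8 to 16 bytes.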
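Yes, there's an optimization, though it isn't a trait you implement yourself. The compiler knows that certain types can never be all zeros: references (`&T`, `&mut T`), `Box<T>`, `NonNull<T>`, function pointers, and the non-zero integer types such as `NonZeroU32`. For those, Option<T> uses the all-zeros bit pattern to represent None, so std::mem::size_of::<Option<T>>() == std::mem::size_of::<T>(). If T has no such spare bit pattern (a plain u32, say), then you have to store a tag.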
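A quick sketch of the cases above. The equal-size results for `&T`, `Box<T>`, `NonNull<T>` and `NonZeroU32` are the documented "null pointer optimization"; the nested case illustrates the point about deep nesting, since there is only one spare value to hand out:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;
use std::ptr::NonNull;

fn main() {
    // All-zeros is never a valid value for these, so None reuses it.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<NonNull<u8>>());
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<NonZeroU32>());

    // A plain u32 may legitimately be zero, so a separate tag is stored.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());

    // Option<&u8> already uses every bit pattern of a pointer (zero = None),
    // so one more layer of Option cannot fit and must add a tag.
    assert!(size_of::<Option<Option<&u8>>>() > size_of::<&u8>());

    println!("all layout checks passed");
}
```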