That's assuming they tested only the masked vs. non-masked input and didn't bundle it with other redesigns, and that there were no other confounders like a sale or a marketing campaign. It also assumes they ran this as a trial for a set amount of time, and not just until Optimizely told them that the B version was outperforming the A version (a common A/B testing pitfall).
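To make the "stop when the tool says B wins" pitfall concrete, here's a rough simulation sketch. It runs A/A tests (both arms have the same true conversion rate, so any "winner" is a false positive) and compares checking significance at every peek against checking once at a fixed horizon. All names and numbers here (`simulate`, the 10% conversion rate, 2000 visitors per arm, a peek every 100 visitors) are made up for illustration, and the plain two-proportion z-test is my assumption, not a description of how Optimizely or any other tool actually computes results.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for two equal-sized arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 0.0  # no variance yet, so nothing to test
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return (conv_b / n - conv_a / n) / se


def simulate(n_tests=500, n_per_arm=2000, peek_every=100,
             rate=0.10, z_crit=1.96, seed=0):
    """Return (false positive rate with peeking, false positive rate at fixed horizon)."""
    rng = random.Random(seed)
    peek_hits = 0
    fixed_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        declared_winner = False
        for i in range(1, n_per_arm + 1):
            # Both arms share the same true rate: there is no real effect.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Peeking: call it as soon as any interim look is "significant".
            if i % peek_every == 0 and abs(z_stat(conv_a, conv_b, i)) > z_crit:
                declared_winner = True
        peek_hits += declared_winner
        # Fixed horizon: look exactly once, at the planned sample size.
        fixed_hits += abs(z_stat(conv_a, conv_b, n_per_arm)) > z_crit
    return peek_hits / n_tests, fixed_hits / n_tests


peek_rate, fixed_rate = simulate()
print(f"false positives with peeking:   {peek_rate:.1%}")
print(f"false positives, fixed horizon: {fixed_rate:.1%}")
```

The fixed-horizon check should land near the nominal 5%, while the peeking version flags a "winner" far more often, even though the two variants are identical by construction.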
Given my personal experience in this industry, and around adtech companies, I wouldn't. It's easy to get things wrong, and hard to verify against reality.
Overall, seeing the UX crap produced by many data-driven companies (Google included), I have low trust in their methodology.