Both use a hierarchical transformer, which adapts the transformer architecture to vision tasks more efficiently.
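To make "hierarchical" concrete, here is a minimal sketch of how the token grid might shrink from stage to stage. The source does not specify the design, so everything here is an assumption: the function name `stage_shapes`, the default sizes, and the pyramid rule (an initial patch split, then 2x2 token merging that halves the resolution and doubles the channels at each later stage) are illustrative only.

```python
def stage_shapes(image_size=224, patch_size=4, base_dim=96, num_stages=4):
    """Return (height, width, channels) of the token grid at each stage.

    Assumed pyramid rule (not taken from the source): the first stage splits
    the image into patch_size x patch_size patches, and every later stage
    merges 2x2 neighbouring tokens, halving the grid side and doubling the
    channel width.
    """
    side = image_size // patch_size  # tokens per side after the patch split
    dim = base_dim
    shapes = []
    for _ in range(num_stages):
        shapes.append((side, side, dim))
        side //= 2  # 2x2 merge halves each spatial dimension
        dim *= 2    # channel width doubles to compensate
    return shapes


def token_counts(shapes):
    """Number of tokens at each stage (height * width)."""
    return [h * w for h, w, _ in shapes]


if __name__ == "__main__":
    for stage, (h, w, c) in enumerate(stage_shapes(), start=1):
        print(f"stage {stage}: {h}x{w} tokens, {c} channels")
```

Under these assumptions, later stages attend over far fewer tokens than a single-resolution transformer would, which is one plausible reading of the efficiency claim.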