Hacker News
by motoboi 526 days ago
Not necessarily.

For visual tasks, that is the state of the art, with visual features being grouped into progressively more semantically relevant parts ("circles" grouped into "fluffy textures" grouped into "dog ears"). This hierarchy-building behavior is baked into the model.

For transformers, not so much. Each transformer block's output serves as input to the next block, so the model can learn hierarchical relationships (in latent space, not in human language), but that is neither baked into nor enforced by the architecture.
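
A rough sketch of the difference, under simplifying assumptions: in a stack of convolution and pooling layers, each unit can only see a small patch of the input, and that patch grows layer by layer, so features are forced to be built from smaller ones. In unmasked self-attention, every token can attend to every other token from the first block on, so nothing structural forces a small-to-large progression. The function names and the layer stack below are hypothetical and chosen only for illustration.

```python
def conv_receptive_fields(layers):
    """Receptive field size (in input pixels, along one axis) after each layer.

    `layers` is a list of (kernel_size, stride) pairs, covering both
    convolutions and pooling. Padding and dilation are ignored.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent units
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap adds `jump` pixels
        jump *= stride             # striding spreads later units further apart
        sizes.append(rf)
    return sizes


def attention_receptive_fields(num_blocks, seq_len):
    """Unmasked self-attention: every block can see all `seq_len` tokens."""
    return [seq_len] * num_blocks


# Hypothetical stack: 3x3 conv, 2x2 pool, 3x3 conv, 2x2 pool, 3x3 conv
stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
print(conv_receptive_fields(stack))                 # grows with depth
print(attention_receptive_fields(5, seq_len=196))   # full context in every block
```

The growing numbers in the first list are the architectural constraint: early layers cannot represent anything larger than a few pixels. The constant numbers in the second list show that a transformer has no such constraint, so any hierarchy it ends up with is learned rather than imposed.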