I'm particularly interested in architecture variations and in approaches to classification head design, loss function choice, and similar design decisions.