Yeah, they train well and very stably, even in int8. MaxText now has Llama and Mistral support too, PyTorch/XLA gets 50% MFU with SPMD, and there are some nice stacks like Levanter.
Haven't been too impressed with inference versus TensorRT-LLM, for example, though.