Is there a good paper (or talk) on how inference looks at scale? (Kinda like ELI-using-single-GPUs)
https://arxiv.org/pdf/2309.06180