The BigScience team (a working group of researchers that trained the BLOOM-176B LLM last year) released Petals [0][1] which allows distributed inference and fine-tuning of BLOOM, with the option to pick a custom model + private swarm. SWARM [2][3] is a WIP from yandex and UW that shares some of the same codebase, but is for distributed training.