Hacker News
by dvmazur 939 days ago
You should check out Hivemind[1]. It is very similar to what you described, except that it uses mixture-of-experts (MoE) for "fragmentation". They have a couple of pre-training examples in their repo. Hivemind was also used to build Petals[2], but afaik Petals only supports fine-tuning and inference[3].

[1] https://github.com/learning-at-home/hivemind
[2] https://github.com/bigscience-workshop/petals
[3] https://chat.petals.dev/