by jsheard 641 days ago
Does it actually work? AIUI the current consensus is that you need massive interconnect bandwidth to train big models efficiently, and the internet is nowhere near that. I'm sure the Nvidia DGX boxes have 10x400Gb NICs for a reason.
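To put rough numbers on the bandwidth gap, here is a back-of-the-envelope sketch. The 7B-parameter model, 16-bit gradients, and 100 Mb/s home uplink are my assumptions for illustration; only the 10x400Gb figure comes from the comment above. It also ignores all-reduce overhead, latency, and compression, so treat it as an order-of-magnitude estimate, not a benchmark.

```python
def sync_seconds(num_params, bytes_per_param, link_gbps):
    """Seconds to push one full copy of the gradients through a link."""
    bits = num_params * bytes_per_param * 8
    return bits / (link_gbps * 1e9)


PARAMS = 7e9          # assumed model size (7B parameters)
BYTES_PER_PARAM = 2   # assumed 16-bit gradients

# 10 NICs at 400 Gb/s each, as mentioned in the comment.
datacenter = sync_seconds(PARAMS, BYTES_PER_PARAM, 10 * 400)
# Assumed 100 Mb/s residential uplink.
home = sync_seconds(PARAMS, BYTES_PER_PARAM, 0.1)

print(f"datacenter: {datacenter:.3f} s per sync")
print(f"home link:  {home:.0f} s per sync")
print(f"slowdown:   {home / datacenter:,.0f}x")
```

Under these assumptions a naive per-step gradient sync takes tens of milliseconds in the datacenter and roughly 19 minutes over the home link, which is why syncing every step over the internet looks hopeless.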

There are methods that make it feasible to train models over the internet. DiLoCo is one [1], and Nous Research has found a way to improve on that with a method they call DisTrO [2].

1. https://arxiv.org/abs/2311.08105

2. https://github.com/NousResearch/DisTrO?tab=readme-ov-file
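For anyone wondering why DiLoCo [1] helps with bandwidth: as I understand the paper, each worker takes many local optimizer steps on its own, and workers only exchange the change in their parameters (a "pseudo-gradient") once per round, which an outer optimizer with Nesterov momentum then applies. Below is a toy sketch of that loop on a one-parameter quadratic. The function name, the plain-SGD inner optimizer (the paper uses AdamW), and all hyperparameter values are my own assumptions for illustration; this is not the authors' implementation and says nothing about how DisTrO works.

```python
def diloco_toy(targets, rounds=20, inner_steps=50,
               inner_lr=0.05, outer_lr=0.7, momentum=0.9):
    """Toy DiLoCo-style training: worker i minimizes (w - targets[i])**2."""
    w_global = 0.0
    velocity = 0.0
    syncs = 0
    for _ in range(rounds):
        deltas = []
        for target in targets:
            w = w_global                    # each worker starts from the shared weights
            for _ in range(inner_steps):    # local steps, no communication
                grad = 2.0 * (w - target)
                w -= inner_lr * grad        # plain SGD stands in for AdamW here
            deltas.append(w_global - w)     # this worker's pseudo-gradient
        pseudo_grad = sum(deltas) / len(deltas)  # the only cross-worker exchange
        syncs += 1
        # Outer step with Nesterov momentum on the averaged pseudo-gradient.
        velocity = momentum * velocity + pseudo_grad
        w_global -= outer_lr * (pseudo_grad + momentum * velocity)
    return w_global, syncs


w, syncs = diloco_toy([1.0, 3.0])
print(f"final weight: {w:.4f} after {syncs} syncs")
```

With these settings the workers communicate 20 times instead of the 1,000 times (20 rounds x 50 steps) that per-step synchronization would need, and the shared weight still settles near the average of the two targets.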

I have no idea. The idea is certainly interesting, but I've never actually understood how to run inference on these models... the people who run it seem unable to explain it in simple terms.
I've seen Bittensor before. I think it makes sense as a way to incentivise people to rent out their GPUs without relying on a central platform. But I've always felt it was kind of a scam, because it was so hard to find any guides on how to use it.

Also, this doesn't seem to actually solve the issue of fine-tuners needing funding to rent those GPUs? One alternative is something like AI Horde, which pays GPU providers with "labour vouchers" that give them priority the next time they want GPU time. It does require a central platform to track vouchers and ban those who trade them. It's basically a sort of real-life comparison of mutualism (AI Horde) vs capitalism (Bittensor).
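To make the voucher idea concrete, here is a minimal sketch of a priority queue where a provider's voucher balance decides whose job runs first. The class and method names are hypothetical and this is not AI Horde's actual code or API; it only illustrates the mechanism described above, and it leaves out spending vouchers and banning traders.

```python
import heapq
import itertools


class VoucherQueue:
    """Hypothetical job queue: users with more vouchers get served first."""

    def __init__(self):
        self.balances = {}
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps equal balances FIFO

    def credit(self, user, amount):
        """Award vouchers to a user who contributed GPU time."""
        self.balances[user] = self.balances.get(user, 0) + amount

    def submit(self, user, job):
        """Queue a job, ranked by the user's balance at submission time."""
        priority = -self.balances.get(user, 0)  # negate: heapq pops the smallest
        heapq.heappush(self._heap, (priority, next(self._order), user, job))

    def next_job(self):
        """Pop the job belonging to the highest-balance user."""
        _, _, user, job = heapq.heappop(self._heap)
        return user, job


queue = VoucherQueue()
queue.credit("alice", 10)          # alice lent out her GPU earlier
queue.submit("bob", "job-b")       # bob has no vouchers but asked first
queue.submit("alice", "job-a")
first = queue.next_job()
second = queue.next_job()
print(first, second)
```

Here alice's job jumps ahead of bob's even though bob submitted first, which is the "priority next time" part. The central platform is whatever holds `balances`.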