by jsheard 641 days ago
Decentralized inferencing perhaps, but the training is very much centralized around Meta's continued willingness to burn obscene amounts of money. The open source community simply can't afford to pick up the torch if Meta stops releasing free models.

There's plenty of open source AI out there that isn't Meta. It's just not as good.

The #1 problem is not compute, but data and the manpower required to clean that data up.

The main thing you can do is support companies and groups who are releasing open source models. They are usually using their own data.

> There's plenty of open source AI out there that isn't Meta. It's just not as good.

To my knowledge, all of the notable open source models are subsidised by corporations in one way or another, whether by being the side project of a mega-corp that can absorb the loss (Meta) or by coasting on investor hype (Mistral, Stability). Neither of those gives me much confidence that they will continue forever, especially the latter category, which will just run out of money eventually.

For open source AI to actually be sustainable it needs to stand on its own, which will likely require orders of magnitude more efficient training, and even then the data cleaning and RLHF are a huge money sink.

If you can do 100x more efficient training with open source, closeAI can simply take that and train a model that's 100x bigger/longer/more tokens.
AKA why Unsloth is now YC-backed for their even better (but closed-source) fine-tuning.
https://huggingface.co/datasets/HuggingFaceFW/fineweb

The #1 problem is absolutely compute. People barely get funding for fine-tunes, and even if you physically buy the GPUs, it'll cost you in power consumption.

That said, good data is definitely the #2 problem. But nowadays you can get good synthetic datasets by calling closed model APIs, or by using existing local LLMs to sift through the trash. That'll cost you too.

>The main thing you can do is support companies and groups who are releasing open source models. They are usually using their own data.

Alternatively, we could create standardized open source training data from sources like Wikipedia and Wikimedia, as well as public domain literature and open courseware. I'm sure there are many other such free and legal sources of data.

But the training data is one of the key bits that makes or breaks your model's performance.

There is a reason why datasets are private and the model weights aren't.

Compute is for sure the number one problem. Look at how long it’s taking for anything better than Pony Diffusion to come out for NSFW image gen despite the insane amount of demand for it.

Look at how much compute purple AI actually has. It's basically nothing.

One area that's interesting, but easy to dismiss because it's the ultimate intersection of hype (AI and crypto), is bittensor.

AFAICT it decentralizes the training of these models by paying people to train them: you mine the crypto if your training actually improves the model.

I learned about it years ago, mined some crypto, lost the keys, and am now kicking myself cuz I would've made a pretty penny lol
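To make the incentive idea concrete, here is a toy sketch of one way such a scheme could work. It is hypothetical and is NOT bittensor's actual protocol, which this thread doesn't describe: a validator scores each miner's submitted model on held-out data, and a fixed per-round token emission (an assumed parameter) is split in proportion to how much each model beats a baseline loss.

```python
from typing import Dict


def split_emission(baseline_loss: float,
                   miner_losses: Dict[str, float],
                   emission: float) -> Dict[str, float]:
    """Hypothetical incentive rule: share a fixed token emission among miners
    in proportion to how far each one's model lowers the validator's loss."""
    # Only improvements over the baseline earn anything; worse models get zero.
    gains = {m: max(0.0, baseline_loss - loss) for m, loss in miner_losses.items()}
    total = sum(gains.values())
    if total == 0.0:
        # Nobody improved on the baseline, so nothing is paid out this round.
        return {m: 0.0 for m in miner_losses}
    return {m: emission * g / total for m, g in gains.items()}


rewards = split_emission(2.0, {"a": 1.5, "b": 1.75, "c": 2.25}, emission=90.0)
print(rewards)
```

Here miner "a" improves the loss twice as much as "b", so it receives twice the reward (60 vs 30), and "c", which made the model worse, receives nothing. A real system also has to stop validators and miners from gaming the scores, which this sketch ignores.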

Does it actually work? AIUI the current consensus is that you need massive interconnect bandwidth to train big models efficiently, and the internet is nowhere near that. I'm sure the Nvidia DGX boxes have 10x400Gb NICs for a reason.
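A back-of-the-envelope sketch of why interconnect bandwidth matters for naive data-parallel training, where every step exchanges a full copy of the gradients. The 7B parameter count, bf16 gradients, and 1 Gb/s home connection are assumptions for illustration; the 10x400Gb figure comes from the comment. It ignores all-reduce overhead and latency, so real numbers would be worse.

```python
def transfer_seconds(num_params: float, bytes_per_param: int, link_gbps: float) -> float:
    """Seconds to move one copy of the gradients over a link of `link_gbps` gigabits/s."""
    payload_bits = num_params * bytes_per_param * 8
    return payload_bits / (link_gbps * 1e9)


PARAMS = 7e9         # hypothetical 7B-parameter model
BF16_BYTES = 2       # assume gradients are sent in bf16

dgx_link = 10 * 400  # 10 x 400 Gb/s NICs, as mentioned in the comment
home_link = 1        # assumed 1 Gb/s consumer internet connection

print(f"DGX-class interconnect: {transfer_seconds(PARAMS, BF16_BYTES, dgx_link):.3f} s per sync")
print(f"1 Gb/s internet link:   {transfer_seconds(PARAMS, BF16_BYTES, home_link):.0f} s per sync")
```

This works out to roughly 0.028 s versus 112 s per gradient exchange, a 4000x gap, and training needs many thousands of such exchanges.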

There are methods that make it feasible to train models over the internet. DiLoCo is one [1], and NousResearch has found a way to improve on it with a method they call DisTrO [2].

1. https://arxiv.org/abs/2311.08105

2. https://github.com/NousResearch/DisTrO?tab=readme-ov-file
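For readers unfamiliar with DiLoCo [1], the core loop is roughly this: every worker starts from shared weights and runs many local optimizer steps on its own data with no communication, then workers exchange only an averaged weight delta (a "pseudo-gradient"), which an outer optimizer with Nesterov momentum applies to the shared weights. The sketch below is a toy under stated assumptions: each worker has a simple quadratic loss, the inner optimizer is plain SGD rather than the AdamW used in the paper, and the hyperparameters are chosen for illustration. It does not attempt to show DisTrO [2].

```python
from typing import List

Vector = List[float]


def grad(theta: Vector, target: Vector) -> Vector:
    # Gradient of one worker's toy loss 0.5 * ||theta - target||^2.
    return [t - c for t, c in zip(theta, target)]


def inner_loop(theta: Vector, target: Vector, steps: int, lr: float) -> Vector:
    # Local training with no communication (plain SGD here; DiLoCo uses AdamW).
    local = list(theta)
    for _ in range(steps):
        g = grad(local, target)
        local = [p - lr * gi for p, gi in zip(local, g)]
    return local


def diloco(targets: List[Vector], outer_rounds: int = 30, inner_steps: int = 50,
           inner_lr: float = 0.1, outer_lr: float = 0.7, momentum: float = 0.9) -> Vector:
    dim = len(targets[0])
    theta = [0.0] * dim     # shared global weights
    velocity = [0.0] * dim  # outer momentum buffer
    for _ in range(outer_rounds):
        # Each worker trains from the same global weights on its own shard.
        local_params = [inner_loop(theta, tgt, inner_steps, inner_lr) for tgt in targets]
        # Pseudo-gradient: average of (global - local) deltas. This is the only
        # communication, once every `inner_steps` local steps.
        delta = [sum(theta[j] - loc[j] for loc in local_params) / len(local_params)
                 for j in range(dim)]
        # Outer step: SGD with Nesterov momentum applied to the pseudo-gradient.
        velocity = [momentum * v + d for v, d in zip(velocity, delta)]
        theta = [p - outer_lr * (d + momentum * v)
                 for p, d, v in zip(theta, delta, velocity)]
    return theta


workers = [[1.0, 2.0], [3.0, -2.0], [5.0, 0.0], [7.0, 4.0]]
print(diloco(workers))
```

With these toy losses the global optimum is the mean of the worker targets, [4.0, 1.0], and the result should land very close to it, even though the workers only synchronize 30 times across 1500 local steps each.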

I have no idea. The idea is certainly interesting, but I've never actually understood how to run inference on these models... the people who run it seem unable to just explain it simply.

I've seen bittensor before. I think it makes sense as a way to incentivise people to rent out their GPUs without relying on a central platform. But I've always felt it was kind of a scam, because it was so hard to find any guides on how to use it.

Also, this doesn't seem to actually solve the issue of fine-tuners needing funding to rent those GPUs? One alternative is something like AI Horde, which pays GPU providers with "labour vouchers" that give them priority the next time they want GPU time. It requires a central platform to track vouchers and ban those who exchange them. Basically, a sort of real-life comparison of mutualism (AI Horde) vs capitalism (bittensor).
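A minimal sketch of the voucher-for-priority idea, assuming a central ledger. This is hypothetical and not AI Horde's actual implementation; the class name, the one-voucher-per-GPU-second rate, and the per-job cost are all made up for illustration, and the banning of voucher traders is left out.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


class VoucherLedger:
    """Hypothetical central ledger: contributing GPU time earns vouchers,
    and queued jobs from users holding more vouchers are served first."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = defaultdict(int)
        self._queue: List[Tuple[int, int, str, str]] = []
        self._counter = 0  # tie-breaker keeps FIFO order among equal priorities

    def credit_work(self, user: str, gpu_seconds: int) -> None:
        # Assumed rate: one voucher per GPU-second spent serving others' jobs.
        self.balances[user] += gpu_seconds

    def submit(self, user: str, job: str) -> None:
        # Priority is the balance at submit time, negated because heapq is a min-heap.
        heapq.heappush(self._queue, (-self.balances[user], self._counter, user, job))
        self._counter += 1

    def next_job(self, cost: int) -> Tuple[str, str]:
        # Serve the highest-priority job and charge its owner (never below zero).
        _, _, user, job = heapq.heappop(self._queue)
        self.balances[user] = max(0, self.balances[user] - cost)
        return user, job


ledger = VoucherLedger()
ledger.credit_work("alice", 120)  # alice contributed GPU time earlier
ledger.submit("bob", "bob-finetune")
ledger.submit("alice", "alice-finetune")
print(ledger.next_job(cost=60))
print(ledger.next_job(cost=60))
```

Even though bob queued first, alice's job is served first because she contributed GPU time; bob, with no vouchers, still gets served, just later. The central ledger is exactly the trusted platform the comment says this approach requires.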

Centralized production, decentralized consumption.