HN Mirror


	by robertlagrant 1180 days ago
	> I want the best LLMs to be open source too

	How do you do this without being incredibly wealthy?

5 comments

crdrost 1180 days ago

You (1) are a company who (2) understands the business domain and has an appropriate business plan.

Sadly, the reality of funding today makes it unlikely that both will be satisfied at the same time. The problem is that history will look back on the necessary business plan and deem it a failure, even if it produces a company with a billion-plus dollars in annual revenue.

This is actually not unique to large language models; it applies to most innovation around computers. The basic problem is that if you build a force-multiplier (spreadsheets, personal computing, and large language models all come to mind), then what will make it succeed is its versatility: people want a hammer that can be used for smashing all manner of things, not just your company's particular brand of matching nails. And most people will only pick up that hammer once a week or once a month. Only something like 1% of the economy, if that, will be totally revolutionized ("we use this force-multiplier every day, it is now indispensable, we can't imagine life without it"), and it's never predictable which sector that will be. It's going to be like "oh, who ever dreamed that the killer application for LLMs would be replacing AutoCAD at mechanical contractors" or some shit.

In those strange eons, to wildly succeed, one must give up on anticipating all usages of the software; one must cease controlling it and set it free. "Well, where's the profit in that?" The profit is that this company was one of the first players in the overall market, so it got an early chance to stake out as much territory as possible. But the market exploded far larger than it could handle, and then everybody looks back and says "wow, what a failure, they only captured 1% of that market, they could have been so much more successful." Yeah, they captured 1% of a $100B market, some failure, right?

But what actually happens is that companies see the potential, investors get dollar signs in their eyes, everyone starts to lock down and control these, "you may use large language models but only in the ways that we say, through the interfaces which we provide," and then the only thing that you can use it for is to get generic conversational advice about your hemorrhoids, so after 5-10 years the bubble of excitement fizzles out. Nobody ever dreams to apply it to AutoCAD or whatever, and the world remains unchanged.

javajosh 1180 days ago

History is littered with great software that died because no-one used it because the business model was terrible. Capturing $1B of value is better than 0, and everyone understands this. And who cares what history thinks anyway?

OpenAI has spent a lot of money to get their result. It's safe to assume it will take a lot of money to get a similar result, and then to share it (although I assume BitTorrent will be good enough). Once people are running their models, they can innovate to their heart's content. It's not clear how or why they'd give money back to the enabling technology. So how does money flow back to the innovators in proportion to the value produced, if not a SaaS?

robertlagrant 1180 days ago

If those are all that's required, why don't you start a company with a business plan written so it satisfies your criteria? Then you can lead the way with OSS LLMs.

ftxbro 1180 days ago

what stage of capitalism is this

ftxbro 1180 days ago

Yes a rugged individual would have to be incredibly wealthy to do it!

But maybe the governments will make one and maintain it with taxes as an infrastructure service, like roads, giving everyone expanded powers of cognition, memory, and expertise, and raising the consciousness of humanity to new heights. Probably in the USA it wouldn't happen if we judge ourselves only in zero-sum relation to others - helping everyone would be a wash and only waste our money!

robertlagrant 1180 days ago

The US spends more on its citizens than almost any other country, and more on helping other countries than any other country.

The problem with making something nationalised or a utility is you'd better have made sure there's no innovation needed or risk required. Once that's all settled, then maybe consider it.

szundi 1180 days ago

Some governments probably already do and use it against so-called terrorists or enemies of the people…

nickthegreek 1180 days ago

Crowdsource the money to pay for the GPU rentals.

xsmasher 1180 days ago

A company that wants to sell you the hardware that LLMs run on might do this. NVIDIA? Apple?

mejutoco 1180 days ago

Pooling resources a la SETI@home would be an interesting option I would love to see.

simonw 1180 days ago

My understanding is that can work for model inference but not for model training.

https://github.com/bigscience-workshop/petals is a project that does this kind of thing for running inference. I tried it out in Google Colab and it seemed to work pretty well.

Model training is much harder though, because it requires a HUGE amount of high bandwidth data exchange between the machines doing the training - way more than is feasible to send over anything other than a local network connection.
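To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes plain data-parallel training synchronized with ring all-reduce (a standard scheme, but not one named in the comment), where each of k workers sends about 2(k-1)/k times the full gradient on every step. The 7-billion-parameter model, fp16 gradients, 8 volunteers, and 100 Mbit/s home uplink are hypothetical inputs, not figures from this thread.

```python
def ring_allreduce_bytes_per_worker(num_params, num_workers, bytes_per_param=4):
    """Bytes each worker sends per gradient sync under ring all-reduce (assumed scheme)."""
    # The reduce-scatter and all-gather phases each move (k - 1) / k of the gradient.
    return 2 * (num_workers - 1) / num_workers * num_params * bytes_per_param


def seconds_per_sync(num_params, num_workers, link_bytes_per_sec, bytes_per_param=4):
    """Lower bound on network time per training step (ignores latency and compute)."""
    sent = ring_allreduce_bytes_per_worker(num_params, num_workers, bytes_per_param)
    return sent / link_bytes_per_sec


# Hypothetical inputs: 7e9 parameters, fp16 gradients (2 bytes each),
# 8 volunteer machines, and a 100 Mbit/s uplink (12.5e6 bytes/s).
volume = ring_allreduce_bytes_per_worker(7e9, 8, bytes_per_param=2)
wait = seconds_per_sync(7e9, 8, 12.5e6, bytes_per_param=2)
print(f"{volume / 1e9:.1f} GB sent per worker per step")  # 24.5 GB
print(f"{wait / 60:.0f} minutes of transfer per step")  # 33
```

Under these assumptions every single optimizer step costs over half an hour of pure transfer, and training needs a very large number of steps. Inference is far cheaper to spread out because nothing like a full-gradient sync has to happen between requests.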

robertlagrant 1180 days ago

And a lot of expensive data scientists.

PeterisP 1180 days ago

This is the type of task where, if you wanted to pool resources, it would be more efficient to pool dollars and buy compute power rather than pool compute power. I'd assume that even if you treat the decentralized hardware as free, the extra electricity cost of using it alone is higher than just renting a centralized server that can do it efficiently.

shagie 1180 days ago

SETI@home (and similar projects) fall into the domain of embarrassingly parallel problems ( https://en.wikipedia.org/wiki/Embarrassingly_parallel ).

My own experience with this was a distributed ray tracer where the server sent the full model to the machines, and then each machine would ask for one scan line to do, report back, ask for another scan line, and repeat.

There was no interaction between the machines - what was on one scan line didn't need any coordination with what was on another scan line.

Likewise, with SETI@home, the server could give you a chunk of data and you could analyze that chunk - the contents of another chunk of data didn't change the analysis being done on this one.

Furthermore, these can be done asynchronously and then assembled when everything is done. Only the very final product / analysis / artifact needs all of the data and nothing other than the end process is waiting on any sub process.
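A minimal sketch of that pull model, assuming Python threads stand in for the remote machines and queues stand in for the network. `render_scanline`, `WIDTH`, and `HEIGHT` are hypothetical placeholders for real ray tracing. No worker ever waits on another, and the rows are stitched together only at the very end.

```python
import queue
import threading

WIDTH, HEIGHT = 8, 6  # tiny hypothetical image size


def render_scanline(y):
    """Hypothetical stand-in for tracing every ray on row y.

    It reads only the shared scene (here just WIDTH) and y, never another row,
    which is what makes the job embarrassingly parallel.
    """
    return [(x * y) % 256 for x in range(WIDTH)]


def worker(todo, done):
    # Ask for one scan line, render it, report back, and repeat until none are left.
    while True:
        try:
            y = todo.get_nowait()
        except queue.Empty:
            return
        done.put((y, render_scanline(y)))


def render(num_workers=4):
    todo, done = queue.Queue(), queue.Queue()
    for y in range(HEIGHT):
        todo.put(y)
    threads = [threading.Thread(target=worker, args=(todo, done)) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Only the final assembly needs every row; rows may have arrived in any order.
    image = [None] * HEIGHT
    while not done.empty():
        y, row = done.get()
        image[y] = row
    return image


image = render()
print(image[2])  # [0, 2, 4, 6, 8, 10, 12, 14]
```

A slow machine here only delays its own rows; everyone else keeps pulling work.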

For doing gradient descent ( https://www.3blue1brown.com/lessons/gradient-descent ), as I understand it, each iteration is dependent on the previous one.
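To make that dependency concrete, here is plain gradient descent on a one-parameter toy loss (the loss, learning rate, and step count are hypothetical). Each update reads the weight produced by the previous update, so the steps cannot be handed out independently the way scan lines or SETI@home chunks can; only the arithmetic inside a single step can be parallelized.

```python
def gradient_descent(grad, w0, lr, steps):
    """Plain gradient descent: w_{t+1} = w_t - lr * grad(w_t)."""
    w = w0
    history = [w]
    for _ in range(steps):
        # This update cannot start until the previous one has produced w.
        w = w - lr * grad(w)
        history.append(w)
    return w, history


# Hypothetical toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final, path = gradient_descent(lambda w: 2 * (w - 3), w0=0.0, lr=0.1, steps=100)
print([round(w, 2) for w in path[:3]])  # [0.0, 0.6, 1.08]
print(f"{w_final:.6f}")  # 3.000000
```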

Doing the 13,002-dimensional matrix math (for the example of the 784 -> 16 -> 16 -> 10 neuron digit recognizer on the 3b1b page) is the parallel part... but once you get into billions of parameters it gets much larger. Matrix multiplication is difficult to spread across a network. For example - http://www.lac.inpe.br/~stephan/CAP-372/Fox_example.pdf and http://www.cs.csi.cuny.edu/~gu/teaching/courses/csc76010/sli...
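The 13,002 figure is the count of every weight and bias in that network: 784*16 + 16*16 + 16*10 = 12,960 weights plus 16 + 16 + 10 = 42 biases. A short sketch that reproduces the count for any stack of fully connected layers (the function name is just for illustration):

```python
def count_params(layer_sizes):
    """Weights plus biases of a fully connected net with the given layer widths."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # one bias per non-input neuron
    return weights + biases


print(count_params([784, 16, 16, 10]))  # 13002
```

Every one of those numbers changes on every gradient descent step, so in a distributed setting the whole vector has to be kept in sync between steps.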

> We are now ready for the second stage. In this stage, we broadcast the next column (mod n) of A across the processes and shift-up (mod n) the B values.

That use of "broadcast" is the problem: the matrix multiplication is limited by the speed of the slowest node, and it needs to send all the data from the previous calculation to all of the nodes, which makes it difficult to use across a network that experiences latency.
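A serial simulation of the quoted scheme (Fox's algorithm in its simplest form, one matrix element per process on an n x n grid), sketched with NumPy. Each stage broadcasts one value of A along every row and shifts B up by one row (mod n). The function name and the message-counting convention (a shift after every stage, including the last) are assumptions for illustration, not taken from the linked slides.

```python
import numpy as np


def fox_multiply(A, B):
    """Simulate Fox's algorithm with one element of A, B, C per process (i, j).

    Returns the product and the number of scalar values that would be sent
    between processes (broadcasts plus shifts, counting a shift after every stage).
    """
    n = A.shape[0]
    C = np.zeros((n, n))
    B_local = B.astype(float)  # process (i, j) currently holds B_local[i, j]
    sent = 0
    for stage in range(n):
        for i in range(n):
            k = (i + stage) % n
            a = A[i, k]  # process (i, k) broadcasts this value along row i
            sent += n - 1
            for j in range(n):
                C[i, j] += a * B_local[i, j]  # B_local[i, j] == B[k, j] at this stage
        # Shift B up by one row (mod n): process (i, j) receives from (i + 1, j).
        B_local = np.roll(B_local, -1, axis=0)
        sent += n * n
    return C, sent


A = np.arange(9.0).reshape(3, 3)
B = np.arange(9.0, 18.0).reshape(3, 3)
C, sent = fox_multiply(A, B)
print(np.allclose(C, A @ B), sent)  # True 45
```

No stage can begin until the previous broadcast and shift have fully landed, so one slow or distant node stalls the entire grid.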

When doing ML training, they use TB/sec of bandwidth... and the high-end extremes are in PB/sec ( https://www.cerebras.net/product-chip/ ) ... and I'm sitting here watching Steam download.

The inefficiencies of the network, slow computers, and the amount of data that must be transferred to perform the next calculation make network-distributed machine learning "not a good choice" at this time.
