by _xnmw 1022 days ago
For the sake of not giving Microsoft and a few other tech giants immense power over the world, I really do hope the cost and efficiency of LLMs improve dramatically, until we can get GPT-4-equivalent models trained on a few graphics cards and running offline on an iPhone. Really rooting for these kinds of projects until someone makes the breakthrough.
7 comments

You may be interested in what we’re working on at Symbolica AI.

We’re using formal logic in the form of abstract rewrite systems over a causal graph to perform geometric deep learning. In theory it should be able to learn the same topological structure of data that neural networks do, but using entirely discrete operations and without the random walk inherent to stochastic gradient descent.

Current experiments are really promising, and assuming the growth curve continues as we scale up, you should be able to train a GPT-4-scale LLM in a few weeks on commodity hardware (we are currently using a desktop with four 4090s), and do both inference and continual fine-tuning/online learning on device.

> We’re using formal logic in the form of abstract rewrite systems over a causal graph to perform geometric deep learning. In theory it should be able to learn the same topological structure of data that neural networks do, but using entirely discrete operations and without the random walk inherent to stochastic gradient descent.

Abstract rewriting as in a computer algebra system's (e.g. Wolfram's) term-rewriting method for simplifying equations?

Heavily influenced by Wolfram's work on metamathematics and the physics project, in so far as using a rewrite system to uncover an emergent topology; we're just using it to uncover the topology of certain data (assuming that the manifold hypothesis is correct), rather than the topology of fundamental physics as he did.
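
For readers unfamiliar with the term, here is a minimal sketch of an abstract rewrite system in its simplest form: string rewriting. It is only an illustration of "apply rules until none match". The rules are made up, and it is not the graph-based system described above, whose details are not given in this thread.

```python
# Toy string rewriting system. RULES is a hypothetical rule set chosen so that
# rewriting always terminates: one rule sorts, the other collapses duplicates.
RULES = [
    ("ba", "ab"),  # move every "a" in front of every "b"
    ("aa", "a"),   # collapse repeated "a"
]

def rewrite_step(term, rules):
    """Apply the first rule that matches, at its leftmost position.

    Returns the rewritten term, or None if no rule applies (normal form).
    """
    for lhs, rhs in rules:
        i = term.find(lhs)
        if i != -1:
            return term[:i] + rhs + term[i + len(lhs):]
    return None

def normalize(term, rules, max_steps=1000):
    """Rewrite until no rule applies; max_steps guards against rule sets that loop."""
    trace = [term]
    for _ in range(max_steps):
        nxt = rewrite_step(term, rules)
        if nxt is None:
            return term, trace
        term = nxt
        trace.append(term)
    raise RuntimeError("no normal form reached within max_steps")

result, trace = normalize("baba", RULES)
print(" -> ".join(trace))  # baba -> abba -> abab -> aabb -> abb
```

Graph rewriting follows the same loop, except that finding where a rule's left-hand side matches becomes a subgraph search instead of a substring search.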
Sounds cool, but what are the drawbacks?
Biggest drawback is that since the structure is all discrete, it is inherently weak at modeling statistical distributions. For example, it'll likely never best a neural network at stock market prediction or medical data extrapolation.

However, for things that are discrete and/or causal in nature, we expect it to outperform deep learning by a wide margin. We're focused on language to start, but want to eventually target planning and controls problems as well, such as self-driving and robotics.

Another drawback is that the algorithm as it stands today is based on a subgraph isomorphism search, which is hard. Not hard as in tricky to get right, like Paxos or other complex algorithms, but hard as in NP-hard, so very difficult to scale. We have some fantastic PhDs working with us who focus on optimizing subgraph isomorphism search, and category theorists working to formalize which constraints we can relax without affecting the learning mechanism of the rewrite system, so we're confident that it's achievable, but the time horizon is currently unknown.
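
To make the scaling problem concrete, here is a minimal sketch of a naive backtracking subgraph isomorphism search. It is a textbook-style illustration, not the algorithm or the optimizations mentioned above. The function name and the adjacency-dict representation are assumptions made for the example.

```python
def find_subgraph(pattern, target):
    """Map pattern nodes onto distinct target nodes so every pattern edge lands on a target edge.

    Both graphs are undirected adjacency dicts: node -> set of neighbours.
    Returns one mapping {pattern_node: target_node}, or None if there is none.
    Worst-case running time is exponential in the size of the pattern.
    """
    # Try high-degree pattern nodes first; they are the most constrained.
    p_nodes = sorted(pattern, key=lambda n: -len(pattern[n]))

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        p = p_nodes[len(mapping)]
        used = set(mapping.values())
        for t in target:
            # Skip target nodes already taken or with too few neighbours.
            if t in used or len(target[t]) < len(pattern[p]):
                continue
            # Every already-mapped neighbour of p must be adjacent to t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[p]  # backtrack
        return None

    return extend({})

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
square_with_diagonal = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}
print(find_subgraph(triangle, square_with_diagonal))
```

The degree check and the neighbour check prune the search, but in the worst case the number of candidate mappings still grows exponentially with the pattern size, which is why this step dominates the cost of any rewrite system built on it.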

It doesn't exist at scale yet.
Especially interested in learning directly on geometries, please keep us updated and share results
Would definitely recommend Bronstein et al.'s work on geometric deep learning! https://geometricdeeplearning.com

That's effectively the right hand side of the bridge that we're building between formal logic and deep learning. So far their work has been viewed mainly as descriptive, helping to understand neural networks better, but as their abstract calls out: "it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provide principled way to build future architectures yet to be invented". That's us (we hope)!

I would like to subscribe to your newsletter, we'd be super interested in this at Brainchain AI.

Drop me a link at (my first name) @ brainchain dot AI if you'd like to chat, I'd love to hear more about what you're working on!

Really cool stuff! Do you have any recommendations of where we could learn more?
The key word there is “models”. Per the leaked GPT-4 details, it’s not a single model but a mixture of experts (MoE) with 16 experts. There’s probably quite a lot of complexity on the backend in sourcing the right model for the right query. In short, it’s probably better for the OS community to focus on single models for specific tasks, as evidenced by Code Llama. A system like GPT-4 is still difficult to replicate. Getting a model to run on consumer hardware for specific tasks like code gen at almost GPT-4 level is doable.
>There’s probably quite a lot of complexity on the backend in sourcing the right model for the right query.

This isn't how sparse MoE models work. There isn't really any complexity like that. And a different expert will, or can, be picked for each token.

Sparse models aren't an ensemble of models.

There are many MoE architectures and I suppose we don’t know for sure which OpenAI is using. The “selection” of the right mix of models is something that a network learns and it’s not a complex process. Certainly no more complex than training an LLM.
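
As a rough illustration of per-token routing, here is a minimal sketch of top-k gating in a sparse MoE layer. It shows one common design (softmax over the top-k gate scores) and says nothing about what OpenAI actually uses. The gate weights and the toy experts are made-up values; in a real model both are learned jointly with the rest of the network.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(token_vec, gate_weights, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    gate_weights has one row per expert; an expert's score is the dot product
    of its row with the token vector.
    """
    scores = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    return list(zip(top, weights))

def moe_layer(token_vec, gate_weights, experts, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(token_vec)
    for idx, w in route_token(token_vec, gate_weights, k):
        y = experts[idx](token_vec)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# Hypothetical gate (4 experts, 2-dimensional tokens) and toy experts that just scale the input.
gate = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]

for token in ([2.0, 0.0], [0.0, 2.0]):
    chosen = [i for i, _ in route_token(token, gate)]
    print(token, "-> experts", chosen)
```

The two tokens are sent to different expert pairs within the same layer, and only the chosen experts run. The routing is a single small matrix multiply, not a separate system that dispatches whole queries to whole models.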
“Backend” was a poor choice of word on my part. “Meta-model” is probably better wording.

I hope it did not detract too much from the point of focusing on subtasks and modalities for FOSS, as GPT-4 was built on a $163 million budget.

Finally, good point. We’ve got no idea what OpenAI’s MoE approach is or how it works. I went back to Meta’s 2022 NLLB-200 system paper and they didn’t even publish the exact details of the router (gate).

Yeah, good point on the importance of FOSS focusing on subtasks... because FOSS isn't going to be spending $150M+ training a model any time soon without something like government backing.
I think with or without algorithmic advantages hardware will improve for local model running. There’s an immense amount of capital being invested in hardware improvement and that will absolutely trickle down.

My sincere belief is that local models are the way of the future, with flexible base models adapted via LoRA and context to specific use cases. I think open source models and techniques are inexorable at this point, barring some sort of regulatory moat, and will rival commercial models in all but extreme cases.
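
For context on why LoRA suits local adaptation, here is a minimal sketch of the low-rank update it applies: the base weight W stays frozen and only two small matrices A and B are trained, giving an effective weight of W + (alpha / r) * B A. The function names and the tiny matrices are illustrative assumptions, not any library's API.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: d_out x d_in frozen base weight
    b: d_out x r and a: r x d_in, the trainable adapter matrices
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy example: 3x3 identity base weight with a rank-1 adapter.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [[1.0], [0.0], [0.0]]
a = [[0.0, 2.0, 0.0]]
print(lora_weight(w, a, b, alpha=1.0))
```

For a 4096 x 4096 layer, the full weight has 16,777,216 parameters, while a rank-8 adapter has 8 * (4096 + 4096) = 65,536. That gap is what makes it practical to keep one base model on a device and swap small adapters per use case.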

I don't. How do you maintain control and prevent mass harm in that case? I don't see any way out other than gatekeeping similar to what we apply to ownership and use of high explosives and radiological weapon tooling.

At all other times I support tech freedom. I use libre software, I use Tor, I donate to privacy and FOSS organizations constantly. I only write my software projects under an AGPL license. AI is qualitatively different. A world run amok with intelligent infinite Sybils is not good for anyone. I hope massive compute continues to be necessary, it may be the only hard chokepoint we have to keep a handle on the beast.

> For the sake of not giving Microsoft and a few other tech giants immense power over the world

I agree with and appreciate the sentiment, but it feels way too late for that. These people do have and exert direct control over pretty much all of our digital devices. It's funny (or sad) that we only seem to care about this when shiny doodads like AI come around every so often.

That could also help tech giants build even larger/more capable models cheaply. Ideally there would be a hard ceiling of LLM capability that even massive amounts of hardware couldn't exceed, allowing inexpensive hardware to catch up.
I personally hope that LLMs have no such limits. The good these tools can do is immeasurable.

I can already run Llama 2 70B on my laptop, and that’ll look like a quaint old AI artifact in 5-7 years. I think the consumer market will keep pace yet stay well below SotA, just as it always has. That still leaves plenty of room for incredible open-source stuff!

To be fair, if that is achieved then the massive models that tech giants produce will probably be phenomenal.