
by brucethemoose2 1163 days ago

> requisition computing power, copy themselves to these servers, and modify their own training and construction

Fortunately the training hardware/software stack is kinda finicky and specific. They aren't just going to anonymously rent a bunch of instances for full self training, even on the dark web, at least not yet.

Sci-fi is full of AIs that slip out of systems and slither around the net like it's all a big highway, but integrated 500-GPU supercomputers or Cerebras CS-2 nodes aren't just lying around unattended. And we are a long way from full retraining on commodity hardware.

2 comments

bob1029 1163 days ago

> And we are a long way from full retraining on commodity hardware.

This is the part where I am wondering. If we are clever with our architecture, the part that needs retraining may not necessarily be the LLM each time (or ever).

Training a binary classifier to detect a new situation is way more efficient than retraining or fine-tuning GPT-4. You (or the AI) could train thousands of these models in just a few hours on commodity hardware. How much "intelligence" or capability could emerge from a tree of 1000+ binary choices evaluated over every input prompt at every turn? What are the implications of being able to retrain the entire classification front end on every turn? What if all statistics could be reflected by the LLM?

Think about dynamic classes proposed by the LLM at runtime that are then automatically trained on relevant data. That's where it starts to get a bit scary for me. E.g.:

> I propose that I add a new binary classifier to contextualize prompts that result in some measured outcome. If I detect that a future prompt will probably result in this outcome, I will add the following context to it: "..." If the confidence for this classifier ever measures below X%, it should be deleted.

A human could inspect this sort of system a lot more reliably than with other proposals.
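
To make that concrete, here is a rough sketch of what such a front end might look like. Everything in it is an assumption for illustration: the names `TinyClassifier` and `ContextRouter` are hypothetical, the classifier is a toy bag-of-words logistic regression, and "confidence" is read as accuracy on labeled held-out prompts, which is only one possible interpretation.

```python
import math


class TinyClassifier:
    """Toy bag-of-words logistic regression trained with plain SGD (illustrative only)."""

    def __init__(self, lr=0.5, epochs=50):
        self.weights = {}
        self.bias = 0.0
        self.lr = lr
        self.epochs = epochs

    @staticmethod
    def features(text):
        # Binary presence features: the set of lowercased whitespace tokens.
        return set(text.lower().split())

    def predict_proba(self, text):
        z = self.bias + sum(self.weights.get(t, 0.0) for t in self.features(text))
        z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, examples):
        # examples: list of (prompt, label) pairs with label in {0, 1}
        for _ in range(self.epochs):
            for text, label in examples:
                error = label - self.predict_proba(text)
                self.bias += self.lr * error
                for tok in self.features(text):
                    self.weights[tok] = self.weights.get(tok, 0.0) + self.lr * error
        return self


class ContextRouter:
    """Hypothetical front end: classifiers proposed at runtime add context to prompts."""

    def __init__(self, min_accuracy=0.7, fire_threshold=0.5):
        self.rules = {}  # name -> (classifier, context string)
        self.min_accuracy = min_accuracy      # the "X%" from the proposal (assumed meaning)
        self.fire_threshold = fire_threshold  # probability needed to attach context

    def propose(self, name, examples, context):
        # Train a fresh classifier on the supplied data and register it.
        self.rules[name] = (TinyClassifier().fit(examples), context)

    def route(self, prompt):
        # Prepend the context of every classifier that fires on this prompt.
        extra = [ctx for clf, ctx in self.rules.values()
                 if clf.predict_proba(prompt) >= self.fire_threshold]
        return "\n".join(extra + [prompt])

    def prune(self, held_out):
        # held_out: name -> list of (prompt, label); delete rules scoring below min_accuracy.
        for name in list(self.rules):
            data = held_out.get(name, [])
            if not data:
                continue
            clf, _ = self.rules[name]
            hits = sum((clf.predict_proba(t) >= self.fire_threshold) == bool(y)
                       for t, y in data)
            if hits / len(data) < self.min_accuracy:
                del self.rules[name]
```

Usage would be something like `router.propose("refund", labeled_prompts, "Context: ...")`, then `router.route(prompt)` on every turn and `router.prune(fresh_labels)` whenever new outcomes are measured. The rules are just a dict of small models plus the literal text each one injects, which is what makes the inspection point plausible.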


brucethemoose2 1163 days ago

You're talking about separate networks picking LoRAs and such? Loading a huge number of embeddings or swapping between sub/additive networks or whatever is not very efficient either, and right now the LLM itself is quite monolithic.

And training such a decision tree seems like it would take a very long time.

The OpenAI CEO has talked about using subnetworks instead of an omegamodel like everyone is using now, and that would make lots of what you describe (and straight up anonymous finetuning) more practical.


tmaly 1163 days ago

What would it take to have AI you could carry around in your pocket like out of a William Gibson novel?

If someone came out with a memristor that could be manufactured at scale, would we get there sooner?
