by brucethemoose2
1163 days ago
> requisition computing power, copy themselves to these servers, and modify their own training and construction

Fortunately, the training hardware/software stack is kinda finicky and specific. They aren't just going to anonymously rent a bunch of instances for full self-training, even on the dark web, at least not yet. Sci-fi is full of AIs that slip out of systems and slither around the net like it's all a big highway, but integrated 500-GPU supercomputers or Cerebras WS2 nodes aren't just lying around unattended. And we are a long way from full retraining on commodity hardware.
|
This is the part where I am wondering. If we are clever with our architecture, the part that needs retraining may not necessarily be the LLM each time (or ever).
Training a binary classifier to detect a new situation is way more efficient than retraining or fine-tuning GPT-4. You (or the AI) could train thousands of these models in just a few hours on commodity hardware. How much "intelligence" or capability could emerge from a tree of 1000+ binary choices evaluated over every input prompt at every turn? What are the implications of being able to retrain the entire classification front end on every turn? What if the LLM could reflect on all of those statistics?
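A minimal sketch of the "bank of cheap binary classifiers" idea, assuming each classifier is a tiny logistic-regression model over hashed bag-of-words features (no model is named here, so that choice is an assumption). `BinaryClassifier`, `featurize`, and `evaluate_bank` are hypothetical names. On every turn, every classifier in the bank scores the incoming prompt.

```python
import math
import zlib
from typing import Dict, List, Tuple

DIM = 1024  # size of the hashed feature space (assumption)


def featurize(text: str) -> Dict[int, float]:
    """Hashed bag-of-words: count each lowercase token in a fixed bucket.

    zlib.crc32 is used instead of hash() so buckets are stable across runs.
    """
    feats: Dict[int, float] = {}
    for tok in text.lower().split():
        idx = zlib.crc32(tok.encode("utf-8")) % DIM
        feats[idx] = feats.get(idx, 0.0) + 1.0
    return feats


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BinaryClassifier:
    """A tiny logistic-regression detector for one 'situation'."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.w = [0.0] * DIM
        self.b = 0.0

    def prob(self, text: str) -> float:
        z = self.b + sum(self.w[i] * v for i, v in featurize(text).items())
        return sigmoid(z)

    def train(self, data: List[Tuple[str, int]], epochs: int = 50, lr: float = 0.5) -> None:
        # Plain SGD on log loss; labels are 0 or 1.
        for _ in range(epochs):
            for text, label in data:
                feats = featurize(text)
                err = self.prob(text) - label  # d(loss)/dz
                for i, v in feats.items():
                    self.w[i] -= lr * err * v
                self.b -= lr * err


def evaluate_bank(bank: List[BinaryClassifier], prompt: str,
                  threshold: float = 0.5) -> Dict[str, bool]:
    """Run every classifier over one prompt and return its yes/no decision."""
    return {clf.name: clf.prob(prompt) >= threshold for clf in bank}
```

Each classifier is just a weight vector plus a bias, so training or retraining one is a few passes over a small labeled set. One could feed LLM embeddings in as features instead; this sketch sticks to bag-of-words to stay dependency-free.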
Think about dynamic classes proposed by the LLM at runtime that are then automatically trained on relevant data. That's where it starts to get a bit scary for me. E.g.:
> I propose that I add a new binary classifier to contextualize prompts that result in some measured outcome. If I detect that a future prompt will probably result in this outcome, I will add the following context to it: "..." If the confidence for this classifier ever measures below X%, it should be deleted.
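The quoted proposal can be sketched as a small registry, assuming "confidence" means the detector's observed accuracy against the measured outcome (the proposal doesn't define it) and that a rule needs a minimum number of observations before it can be pruned. `ProposedRule`, `RuleRegistry`, `contextualize`, and `record_outcome` are hypothetical names; `detector` can be any callable returning a probability, such as a cheap binary classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ProposedRule:
    """A classifier the LLM proposed, plus the context it wants injected."""
    name: str
    detector: Callable[[str], float]  # returns P(outcome | prompt)
    context: str
    min_confidence: float  # the "X%" threshold, as a fraction
    hits: int = 0
    trials: int = 0

    def confidence(self) -> float:
        # Assumption: confidence = observed accuracy of the detector's yes/no calls.
        return self.hits / self.trials if self.trials else 1.0


class RuleRegistry:
    def __init__(self, min_trials: int = 5) -> None:
        self.rules: Dict[str, ProposedRule] = {}
        self.min_trials = min_trials  # don't prune on too little evidence

    def add(self, rule: ProposedRule) -> None:
        self.rules[rule.name] = rule

    def contextualize(self, prompt: str, threshold: float = 0.5) -> str:
        """Prepend the context of every rule whose detector fires."""
        extra = [r.context for r in self.rules.values() if r.detector(prompt) >= threshold]
        return "\n".join(extra + [prompt])

    def record_outcome(self, name: str, prompt: str, outcome_happened: bool,
                       threshold: float = 0.5) -> None:
        """Score one prediction against the measured outcome, then maybe prune."""
        rule = self.rules.get(name)
        if rule is None:
            return
        predicted = rule.detector(prompt) >= threshold
        rule.trials += 1
        rule.hits += int(predicted == outcome_happened)
        if rule.trials >= self.min_trials and rule.confidence() < rule.min_confidence:
            del self.rules[name]  # delete the rule, as the proposal specifies
```

The `min_trials` guard is a design choice: without it, a single early miss would drop a rule's confidence to 0% and delete it immediately.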
A human could inspect this sort of system a lot more reliably than the other proposals.