Hacker News
by ex3ndr 895 days ago
This is basically that: it can learn to ignore some paths and amplify more important ones, and then you can just cut those paths without a noticeable loss of quality. The problem is that you are not going to win anything from this: non-matrix multiplication would be slower or the same.
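A minimal sketch of the "cut the paths" idea, assuming plain magnitude pruning on a toy layer. The weights, the threshold, and the helper names `matvec` and `prune` are all made up for illustration and do not come from any particular library or paper.

```python
import random

def matvec(weights, x):
    """Dense matrix-vector product: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def prune(weights, threshold):
    """Zero out every weight whose magnitude is below threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

random.seed(0)
# Toy layer (invented numbers): a few large "important" weights, many tiny ones.
weights = [[random.choice([1.0, -1.0]) if random.random() < 0.2
            else random.uniform(-0.01, 0.01) for _ in range(16)]
           for _ in range(4)]
x = [random.uniform(-1, 1) for _ in range(16)]

dense_out = matvec(weights, x)
pruned = prune(weights, threshold=0.1)
pruned_out = matvec(pruned, x)

# Largest change in any output after cutting the small weights.
max_error = max(abs(a - b) for a, b in zip(dense_out, pruned_out))
# Number of weights that are now zero.
zeroed = sum(w == 0.0 for row in pruned for w in row)
```

Note that `matvec` still performs the same number of multiplications on the pruned matrix. The zeros only save work if you switch to a sparse representation, and that is where the "slower or the same" concern comes in.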

The issue is that you are thinking of this in terms of information compression, which is what LLMs are.

I'm more concerned with an LLM having the ability to be trained to the point where a subset of the graph represents all the NAND gates necessary for a CPU and RAM, so that when you ask it questions it can actually run code to compute them accurately instead of offering a statistical best guess, i.e. decompression after lossy compression.
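To make "a subset of the graph represents NAND gates" concrete, here is a rough sketch in which a single threshold unit with hand-picked weights behaves exactly like a NAND gate, and a ripple-carry adder is wired from nothing else. The weights are set by hand, not trained, so this only shows that such a structure can be represented, not that training would find it. All function names are hypothetical.

```python
def nand_neuron(a, b):
    """One threshold unit: weights (-1, -1), bias 1.5, step activation."""
    return 1 if (-1.0 * a - 1.0 * b + 1.5) > 0 else 0

def xor(a, b):
    """XOR built from four NAND units."""
    n = nand_neuron(a, b)
    return nand_neuron(nand_neuron(a, n), nand_neuron(b, n))

def full_adder(a, b, carry_in):
    """Return (sum_bit, carry_out) using only NAND units."""
    s1 = xor(a, b)
    total = xor(s1, carry_in)
    # carry_out = (a AND b) OR (s1 AND carry_in), expressed with NANDs.
    carry_out = nand_neuron(nand_neuron(a, b), nand_neuron(s1, carry_in))
    return total, carry_out

def add(x, y, bits=8):
    """Ripple-carry addition; the result wraps modulo 2**bits."""
    carry, result = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result
```

With these definitions, `add(23, 42)` gives the exact sum rather than a guess, but each bit costs several multiply-and-threshold steps where hardware uses one gate delay.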

Just give it a computer? Even a virtual machine. It can output assembly code or high level code that gets compiled.

The issue is not having access to the CPU; the issue is whether the model can be trained in such a way that it has representative structures for applicable problem solving. Furthermore, the structures themselves should

Philosophically, you can start ad hoc-ing functionalities on top of LLMs and expect major progress. Sure, you can make them better, but you will never get to the state where AI is massively useful.

For example, let's say you gather a whole bunch of experts in their respective fields, and you give them the task of putting together a detailed plan for how to build a flying car. You will have people doing design, running simulations, researching material sourcing, creating CNC programs for manufacturing parts, sourcing tools and equipment, writing software, etc. And when executing this plan, they would be open to feedback on anything missed, and could advise on how to proceed.

An AI with the above capability should be able to go out on the internet, gather the relevant data, run any sort of algorithm it needs to run, and perhaps after a month of number crunching on a cloud-rented TPU rack produce a step-by-step plan, with costs, for how to do all of that. And it would be better than those experts because it should be able to create much higher fidelity simulations to account for things like vibration and predict whether some connector is going to wobble loose.

> Philosophically, you can start ad hoc-ing functionalities on top of LLMs and expect major progress. Sure, you can make them better, but you will never get to the state where AI is massively useful.

Evolution created various neural structures in biological brains (visual cortex, medulla, thalamus, etc) rather ad-hoc, and those resulted in "massively useful" systems. Why should AI be different?

I mean, we could definitely run architectures through simulated evolution with genetic algorithms, but then you arrive at the same problem as humans do, which is that you end up with a statistically best solution for given conditions. Sure, that could be a form of AI but there is likely a better (and likely faster) way to build an AI that isn't fundamentally statistical in nature and is adaptable to any and all problems.

LLMs seem like the least efficient way to accomplish this. NAND gates, for example, are inherently 1-bit operators, but LLMs use more. If weights are all binary, then gradients are restricted to -1, 0, and 1, which doesn't give you much room to make incremental improvements. You can add extra bits back, but that's pure overhead. But all this is beside the real issue, which is that LLMs and NNs in general are inherently fuzzy; they guess. Computers aren't; we have perfect simulators.

Consider how humans design things. We don't talk through every CPU cycle to convince ourselves a design works; we use bespoke tooling. Not all problems are language shaped.

From what you've written, I don't see why any of this would require the LLM to "be trained to the point where a subset of the graph represents all the nand gates necessary for a cpu and ram" - you'd just be emulating a CPU, but slower.

Tool usage is better, because the LLM can access the relevant computing/simulation at the highest fidelity and as fast as they can run on a real or virtual computer, rather than emulated poorly in a giant pyramid of matrix multiplications.
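As a rough sketch of what tool usage means here, assume the model's reply is just a string of Python that gets handed to a real interpreter. The `model_output` string stands in for whatever a model might emit, `run_generated_code` is a hypothetical helper, and a plain subprocess is not a sandbox.

```python
import subprocess
import sys

def run_generated_code(source, timeout_s=5):
    """Run a string of Python in a separate interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True, text=True, timeout=timeout_s,
    )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip())
    return completed.stdout.strip()

# Stand-in for what a model might emit when asked "what is 2**64 * 3**40?"
model_output = "print(2**64 * 3**40)"
answer = run_generated_code(model_output)
```

The arithmetic is done by the interpreter at full speed and full precision; the model only has to produce the one-line program.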

Am I missing the point?

Well, just remember that NAND gates are themselves made of transistors, which are a statistical model of a sort… just designed to appear digital when combined at the NAND level.

This is why I am very interested in analog again: quantum stuff is statistical already, so why go from statistical (analog) to digital (a huge drop-off in performance, e.g. just look at basic addition in an ALU) and back to statistical? Very interested. Not sure if it will ever be worth it, but can’t rule it out.