	by vipshek 826 days ago
	I have no idea about the merits of this approach, but I found this interview with the founders a lot more sensical than the linked article: https://twitter.com/Extropic_AI/status/1767203839818781085

4 comments

blueblimp 826 days ago

This was definitely easier to follow.

Since they're building a special-purpose accelerator for a certain class of models, what I'd like to see is some evidence that those models can achieve competitive performance (once the hardware is mature). Namely, simulate these models on conventional hardware to determine how effective they are, then estimate what the cost would be to run the same model on Extropic's future hardware.

Eliezer 825 days ago

Ah, but running an experiment like that risks it returning an answer you don't like.

huevosabio 826 days ago

Much, much better. The first minute or so explains what they are trying to do and why in a way that I can understand.

This interview makes me much more excited and less skeptical than Verdon's usual mumbo-jumbo jargon. He should try using simpler and more humble language more often.

blovescoffee 826 days ago

This interview makes their product seem like BS. First, they literally cannot simply explain the problem or solution. Regardless, their pitch is that they're building a more power efficient probability distribution sampler. No one in AI research thinks that's a bottleneck.

edit: btw the bottleneck in AI algos is matrix multiply and memory bandwidth.
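A back-of-the-envelope way to see why matrix multiply and memory bandwidth tend to be named together is arithmetic intensity: FLOPs performed per byte moved. The sketch below is a simplified model, not a measurement. The function name, the 4096-wide layer, and the 2-byte element size are illustrative assumptions, and it pretends each operand crosses the memory bus exactly once (no caches, no tiling).

```python
def matmul_arithmetic_intensity(m, k, n, bytes_per_element=2):
    """FLOPs per byte for an (m x k) @ (k x n) multiply.

    Assumes both inputs are read once and the output is written once,
    which ignores caching and tiling (a deliberate simplification).
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


# Hypothetical 4096 x 4096 weight matrix stored in 2-byte elements.
single = matmul_arithmetic_intensity(1, 4096, 4096)     # one input vector
batched = matmul_arithmetic_intensity(512, 4096, 4096)  # 512 input vectors

print(f"batch of 1:   {single:.2f} FLOPs per byte")
print(f"batch of 512: {batched:.1f} FLOPs per byte")
```

Under these assumptions a single-vector multiply does about one FLOP per byte moved, so it is limited by how fast weights can be streamed from memory, while a large batch reuses each weight many times and becomes limited by the multiply units instead.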

HarHarVeryFunny 825 days ago

My take on the Garry Tan interview (which seems pretty clear, regardless of whether this is snake oil or not) is that Extropic are building low power analog chips because we're hitting up against the limits of Moore's Law (limits of physics in reducing transistor size), and at the same time the power consumption for LLM/AI training and inference is starting to get out of hand.

So, their solution is to embrace the stochastic operation of smaller chip geometries where transistors become unreliable, and double down on it by running the chips at low power where the stochasticity is even worse. They are using an analog chip design/architecture of some sort (presumably some sort of matmul equivalent?) and using a "full-stack" design whereby they have custom software to run neural nets on their chips, taking advantage of the fact that neural nets can tolerate, and utilize, randomness.
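To make the "tolerate randomness" point concrete, here is a toy sketch of a matrix-vector product in which every multiply-accumulate term picks up zero-mean Gaussian noise. This is not Extropic's design, which the thread does not describe in detail. The noise model, the function names, and the numbers are all assumptions chosen for illustration.

```python
import random


def exact_matvec(weights, x):
    """Ordinary matrix-vector product on lists of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def noisy_matvec(weights, x, noise_std, rng):
    """Matrix-vector product where each product term gets independent
    zero-mean Gaussian noise (an assumed stand-in for analog error)."""
    return [
        sum(w * xi + rng.gauss(0.0, noise_std) for w, xi in zip(row, x))
        for row in weights
    ]


def averaged_noisy_matvec(weights, x, noise_std, rng, samples):
    """Average several noisy evaluations; zero-mean noise averages out."""
    totals = [0.0] * len(weights)
    for _ in range(samples):
        for i, value in enumerate(noisy_matvec(weights, x, noise_std, rng)):
            totals[i] += value
    return [total / samples for total in totals]


weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]  # hypothetical layer weights
x = [1.0, 2.0, 3.0]
rng = random.Random(0)  # seeded so runs are repeatable

exact = exact_matvec(weights, x)
one_shot = noisy_matvec(weights, x, 0.1, rng)
averaged = averaged_noisy_matvec(weights, x, 0.1, rng, 2000)

print("exact:   ", exact)
print("one shot:", one_shot)
print("averaged:", averaged)
```

Because the assumed noise is zero-mean, a single noisy result scatters around the exact one and the average converges to it. Whether real hardware noise behaves this way is exactly the kind of question the simulation-first approach suggested earlier in the thread would test.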

HarHarVeryFunny 825 days ago

Just watched a few minutes of the Lex interview, and have to say Verdon gives off a totally different vibe there, and seems to be talking gibberish about quantum computing.

However, the idea of using analog matrix multiply is reasonable, and has already been done by at least one company:

https://mythic.ai/products/m1076-analog-matrix-processor/

blovescoffee 825 days ago

I'm sorry if this comes off as rude, that's not my intention: the Garry Tan interview explicitly says those things, so I'm not sure that's really your "take".

HarHarVeryFunny 825 days ago

Fair enough, but others seem to have a different take!

blovescoffee 825 days ago

True!

ninjin 826 days ago

Computationally, yes, those are the bottlenecks. But I would also add supervised training data, as we can never get enough of it, and it is one of the few things that increases in compute are, to my mind, not able to solve (you could argue that by scaling unsupervised training further we could do away with it, but I am not yet convinced).

blovescoffee 826 days ago

Their startup is addressing computing bottlenecks, so that's what I addressed. Supervised training data isn't a bottleneck on LLMs, diffusion models, or any of the hot areas at the moment.

ninjin 825 days ago

I think the situation is less clear than that. While I have limited research experience with image generation, I believe I do have a fair understanding of large language models. From the publication of GPT-2 until ChatGPT, the argument was always that supervised training data was not a priority and that it all boiled down to scaling the amount of unsupervised training data. However, this all changed with preference tuning, etc., and I think there is also an argument to be made that the extensive training data curation that we see today (and that is withheld from the "papers" we see for the models) is a form of supervision in its own right. It could be that we will see computational/data scaling dominate again, but I think it is equally possible that we will have the next few years dominated by data curation and exploring forms of supervision to "extract" value out of what was learnt at the unsupervised training stage.

Still, you are correct that Extropic is looking at the computation rather than the data. But I wanted to chime in so that the discussion here would not leave the impression that we are still in the days of pure unsupervised scaling.

duped 826 days ago

My understanding is that the goal of these approaches is to avoid those bottlenecks.

blovescoffee 825 days ago

Did they invent new DL algorithms and publish them? If I remember what I heard in the interview correctly, this targets existing architectures.

duped 825 days ago

No, they're using analog computers. They point that out in the interview and the linked article.

blovescoffee 825 days ago

To clarify, I meant neural network architectures not chip architectures.

throwawaymaths 826 days ago

Vaguely, though, what they are talking about sounds like it might be better for training? (I'm really stretching it here.)

blovescoffee 825 days ago

Yes, that's stretching the truth.

jason-phillips 826 days ago

And Lex's podcast/interview with Guillaume Verdon, one of said founders.

https://m.youtube.com/watch?v=8fEEbKJoNbU&pp=ygUVbGV4IGZyaWR...

throwawaymaths 826 days ago

Anyone else get super creepy vibes from the way he talks in this video? I'm calling that it's a fraud.

If it is a fraud, how do people like this get funded?? (And how can I be creepier so that my real ideas get funded)

ballooney 825 days ago

He gets lots of interesting guests (and some BSers) on his podcast, so people listen.

throwawaymaths 825 days ago

I'm not talking about Lex. Lex is fine (if boring; that's good, it puts the focus on the guest).

pclmulqdq 826 days ago

The sad truth: get on Twitter and say a lot of weird, "high-minded" things. It's where VCs hang out, and this is the language they get from a lot of people.