Since they're building a special-purpose accelerator for a certain class of models, what I'd like to see is some evidence that those models can achieve competitive performance (once the hardware is mature). Namely, simulate these models on conventional hardware to determine how effective they are, then estimate what the cost would be to run the same model on Extropic's future hardware.
Much, much better. The first minute or so explains what they are trying to do and why in a way that I can understand.
This interview makes me much more excited and less skeptical than Verdon's usual mumbo-jumbo jargon. He should try using simpler, more humble language more often.
This interview makes their product seem like BS. First, they literally cannot simply explain the problem or solution. Regardless, their pitch is that they're building a more power efficient probability distribution sampler. No one in AI research thinks that's a bottleneck.
edit: btw the bottlenecks in AI algos are matrix multiply and memory bandwidth.
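A quick roofline-style back-of-envelope shows why those two bottlenecks go together. Below is a minimal sketch, assuming ideal caching (each operand moved to or from memory exactly once), fp16 operands, and a hypothetical accelerator with 100 TFLOP/s of compute and 2 TB/s of memory bandwidth. The names `Hardware`, `matmul_intensity`, and `bound` are made up for illustration. A matrix-vector product, like a single LLM decoding step, does roughly one FLOP per byte moved, so it ends up waiting on memory. A large square matmul does over a thousand FLOPs per byte, so it is limited by raw multiply throughput.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    # Hypothetical accelerator figures, for illustration only.
    peak_flops: float     # FLOP/s
    mem_bandwidth: float  # bytes/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
        return self.peak_flops / self.mem_bandwidth


def matmul_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], assuming each operand
    is read or written exactly once (ideal caching)."""
    flops = 2 * m * k * n  # one multiply and one add per term
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic


def bound(m: int, k: int, n: int, hw: Hardware) -> str:
    """Classify a matmul as memory-bound or compute-bound on the given hardware."""
    if matmul_intensity(m, k, n) >= hw.ridge_point:
        return "compute-bound"
    return "memory-bound"


if __name__ == "__main__":
    hw = Hardware(peak_flops=100e12, mem_bandwidth=2e12)  # hypothetical: ridge = 50 FLOP/byte
    # Matrix-vector product, e.g. one token of LLM decoding against a 4096x4096 weight matrix.
    print("matvec:", round(matmul_intensity(4096, 4096, 1), 3), bound(4096, 4096, 1, hw))
    # Large square matmul, e.g. a big batched training step.
    print("matmul:", round(matmul_intensity(4096, 4096, 4096), 1), bound(4096, 4096, 4096, hw))
```

Real kernels move more data than this ideal lower bound, so actual intensities are lower. The qualitative split between the two cases still holds.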
My take on the Garry Tan interview (which seems pretty clear, regardless of whether this is snake oil or not) is that Extropic are building low-power analog chips because we're hitting up against the limits of Moore's Law (the physical limits of shrinking transistors), and at the same time the power consumption of LLM/AI training and inference is starting to get out of hand.
So, their solution is to embrace the stochastic operation of smaller chip geometries, where transistors become unreliable, and double down on it by running the chips at low power, where the stochasticity is even worse. They are using an analog chip design/architecture of some sort (presumably some sort of matmul equivalent?) and a "full-stack" design whereby they have custom software to run neural nets on their chips, taking advantage of the fact that neural nets can tolerate, and utilize, randomness.
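To make the "neural nets can tolerate randomness" point concrete, here is a toy sketch. It is not Extropic's design, and the noise model is an assumption: each output of a noisy matmul gets independent Gaussian noise added, as a crude stand-in for an analog multiply-accumulate unit. It then checks how often a random linear classifier's prediction (the argmax of its logits) survives that noise. The helper names `noisy_dot` and `agreement` are hypothetical. With zero noise the agreement is exactly 1.0, and it degrades as `noise_std` grows relative to the spread of the logits.

```python
import random


def noisy_dot(w, x, noise_std, rng):
    """Dot product plus additive Gaussian noise on the result.
    Assumption: a crude stand-in for an analog multiply-accumulate unit."""
    exact = sum(wi * xi for wi, xi in zip(w, x))
    if noise_std > 0:
        return exact + rng.gauss(0.0, noise_std)
    return exact


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def agreement(noise_std, n_inputs=1000, dim=64, n_classes=10, seed=0):
    """Fraction of random inputs whose predicted class is unchanged when
    every logit of a random linear classifier is computed with noise."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
    same = 0
    for _ in range(n_inputs):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        clean = [noisy_dot(w, x, 0.0, rng) for w in weights]
        noisy = [noisy_dot(w, x, noise_std, rng) for w in weights]
        if argmax(clean) == argmax(noisy):
            same += 1
    return same / n_inputs


if __name__ == "__main__":
    # Logits here have a standard deviation of about sqrt(dim) = 8.
    for std in (0.0, 0.5, 2.0, 8.0):
        print(f"noise_std={std}: agreement={agreement(std):.3f}")
```

This only shows robustness of a final decision to small output noise. Whether training and larger models can actually *exploit* hardware noise, as the full-stack pitch suggests, is a much stronger claim that a toy like this does not test.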
Just watched a few minutes of the Lex interview, and have to say Verdon gives off a totally different vibe there, and seems to be talking gibberish about quantum computing.
However, the idea of using analog matrix multiply is reasonable, and has already been done by at least one company:
Computationally, yes, those are the bottlenecks. But I would also add supervised training data: we can never get enough of it, and it is one of the few things that increases in compute cannot solve (to my mind; you could argue that by scaling unsupervised training further we could do away with it, but I am not yet convinced).
Their startup is addressing computing bottlenecks, so that's what I addressed. Supervised training data isn't a bottleneck for LLMs, diffusion models, or any of the hot areas at the moment.
I think the situation is less clear than that. While I have limited research experience with image generation, I believe I do have a fair understanding of large language models. From the publication of GPT-2 until ChatGPT, it was true that the argument always was that supervised training data was not a priority and that it all boiled down to scaling the amount of unsupervised training data. However, this all changed with preference tuning, etc. and I think there is also an argument to be made that the extensive training data curation that we see today (and is withheld from the "papers" we see for the models) is a form of supervision in its own right. It could be that we will see computational/data scaling dominate again, but I think it is equally possible that we will have the next few years dominated by data curation and exploring forms of supervision to "extract" value out of what was learnt at the unsupervised training stage.
Still, you are correct that Extropic is looking at computation rather than data. But I wanted to chime in so that the discussion here would not leave the impression that we are still in the days of pure unsupervised scaling.
The sad truth: get on Twitter and say a lot of weird, "high-minded" things. It's where VCs hang out, and this is the language they get from a lot of people.