by lwneal 1326 days ago
At this rate they'll be manufacturing 0nm chips soon, and in a decade they'll be on -1nm.

But despite the weird naming scheme, it's clear from transistor density [1] and GPU prices [2] that foundries are still making progress in transistors per dollar. That progress is just barely beginning to make large neural networks (Stable Diffusion, vision and speech systems, language model AIs) deployable in consumer applications.

It might not matter whether your cell phone renders this page in 1ms or 10ms, but the difference between talking to a 20B parameter language model and a 200B net is night and day [3]. If TSMC/Samsung/Intel can squeeze out just one or two more nodes, then by the middle of the century we might have limited general-purpose AI in every home and office.

[1] https://en.wikipedia.org/wiki/Transistor_count#GPUs

[2] https://pcpartpicker.com/trends/price/video-card/

[3] https://textsynth.com/playground.html
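A rough back-of-the-envelope sketch of why the 20B vs 200B gap matters for consumer hardware: the memory needed just to hold a model's weights is roughly parameter count times bytes per parameter. The 2 bytes per parameter (16-bit weights) is an assumption for illustration, the function name is made up here, and activations and other runtime overhead are ignored.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory needed to hold the weights alone, in GB (10^9 bytes).

    bytes_per_param=2.0 assumes 16-bit weights; this is an illustrative
    assumption, not a property of any particular model.
    """
    return num_params * bytes_per_param / 1e9


for params in (20e9, 200e9):
    size = weight_memory_gb(params)
    print(f"{params / 1e9:.0f}B parameters -> ~{size:.0f} GB of weights")
```

Under that assumption the 20B model needs about 40 GB for weights and the 200B model about 400 GB, which is why the larger one is so much harder to put on a consumer device.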

7 comments

As a traditional "web dev" kind of hacker, I feel like I'm just sitting idly by while a massive transition happens underneath me.

I understand roughly why this shift is happening (machine learning proving able to solve a whole raft of hard problems) and how it's happening (specialized chip designs for matrix math). But I don't understand where it's all going, or how I can plug into it.

It feels like a fundamentally different landscape than what I'm used to. There's more alchemy, perhaps. Or maybe it's that the truly important models are trained using tools and data that are out of reach for individuals.

Does anyone else feel this way? Better yet, has anyone felt this way and overcome it by getting up to speed in the ML world?

As a web developer, you are primed to create the interfaces to these ML tools (or any tool really). The browser is the most widespread UI application that nearly every single consumer computer has access to. You can work on literally anything that someone accesses via the internet, and the more AI/ML is made useful, the more people will need access to these products, and the more web devs will be needed to build apps that use them.
Not only for building user-facing tools; the AI/ML space needs a lot of good engineering practices to function. In many cases, traditional software, infra and DevOps, QA and UI engineers, etc., are as crucial to ML projects as data scientists are.

So I don't think you need to worry about being left behind or that your skills are stagnating if you're not directly developing ML models. There's no existential reason to jump in, unless you're particularly interested in ML.

> So I don't think you need to worry about being left behind or that your skills are stagnating if you're not directly developing ML models.

Aren't ML models kind of the front lines these days? If I'm understanding correctly, it's the models, the training techniques, the curation of data sets-- These are the things that will inform the next generation of products and services.

I agree that there's more to it than just letting ML models loose, but it certainly seems like the core of it.

Depends on what you want to focus on. My point is that there are plenty of roles adjacent to the core of ML that are still needed to make ML function. Think about data storage for models, maintaining CI pipelines for training, UIs for curation and labeling, packaging and deploying models, data version control systems, etc. None of these are tasks data scientists should be concerned about, and vice versa, data science is not something engineers should necessarily be concerned about either. It doesn't make either role superior; they just complement each other well.
This makes me hopeful for the future of my career, but as a n00b in regards to CS principles, it still kinda sucks that the minutiae of this technology are so foreign to me. I think that is what the OP was getting at.

Using ML/AI tools is not terribly difficult, and the principle of how they work is simple enough (feed the model examples of what you want, eventually it can reproduce similar ideas when prompted or recognize new examples as something that it's seen before). Maybe it's ignorance on my part in regards to what it takes to learn these technologies deeply, but right now, I wouldn't even know where to begin to start studying to really learn how this technology works/operates on a fundamental level.

Despite my ignorance of the subject, I could probably work out how to step into an ancient archaic COBOL system or whatever, but ML/AI _feels_ so far out of my reach as a webdev.

This is one reason I think ML as a topic is cool, even if I think practical uses for the everyday person are still really far from being realistic.

As a wise dog once said, "sucking at something is the first step towards being kinda good at something."

I avoided many areas of computer science and programming for far too long because I thought they were "hard". Many of them turned out to be way easier than I expected (I often found myself wondering, "why didn't someone tell me how easy this is! I could have done this years ago!!"), or so much fun that working on them felt effortless, despite the increased effort in objective terms.

I guess what I'm saying is, start learning about what interests you today! Make small and consistent progress.

I've found that setting aside time every day (rather than just when I feel like it) to study areas of interest has been extremely helpful in this regard.

It's not something that's out of reach for a web dev. It might seem like an insurmountable task, but if you break it down, you can digest it more easily.

ML is just math. We're at a point where you don't even have to completely grok the math to apply ML techniques productively.

If you want somewhere to start and need a project idea, read up on how to build a simple binary classifier. The "hard" work is building good training and validation sets if you use an ML framework.
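To make the "simple binary classifier" idea concrete, here is a minimal sketch in plain Python with no framework: logistic regression trained by gradient descent on a toy dataset, with a separate validation split. Everything here (the toy labeling rule, the function names, the learning rate and epoch count) is an assumption chosen for illustration; a real project would spend most of its effort on the data rather than on this loop.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_dataset(n: int, rng: random.Random):
    """Toy data (an assumption): the label is 1 when x0 + x1 > 0, else 0."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data


def train(data, epochs: int = 200, lr: float = 0.5):
    """Fit weights and bias with per-example gradient descent on log loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = p - y  # gradient of log loss w.r.t. the pre-activation
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b


def predict(model, x) -> int:
    w, b = model
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


def accuracy(model, data) -> float:
    return sum(predict(model, x) == y for x, y in data) / len(data)


rng = random.Random(0)  # fixed seed so runs are repeatable
points = make_dataset(400, rng)
train_set, val_set = points[:300], points[300:]  # hold out data for validation
model = train(train_set)
print(f"validation accuracy: {accuracy(model, val_set):.2f}")
```

The held-out validation set is the part that matters: accuracy measured on the training examples alone says little about how the classifier behaves on data it has not seen.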

The practical deep learning course from fast.ai is what I've been working my way through, at a much slower pace, I am sure, than someone would if they were writing code for a living. The difficult technical hurdles are getting used to (i) Python if you've never used it and (ii) Jupyter notebooks if you've never used them. I'm using Paperspace.com for my GPU instances.

https://course.fast.ai/

Thanks for the link! I'll put it on the list of other stuff to check out. I don't write Python at work, but it's probably something I should familiarize myself with a little more since it's pretty popular in the Home Automation communities.
Just like every other technological advance, there is a sort of "food chain" that builds on top of these foundational technologies. You didn't have to work in cryptography to play a role in the massive proliferation of online commerce and banking. There were and are many, many conventional tasks and non-PhD-making innovations between cryptography existing and the wonderful low-friction commerce we now enjoy.

Don't know how language translation models work? No problem, use one that someone else made to make a web framework that self-internationalizes to the user's browser default language without the site creator even knowing that language exists!
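A hedged sketch of what the plumbing for that might look like: browsers advertise the user's preferred languages in the HTTP `Accept-Language` header, so a framework can pick a target language from it and hand the text to a translation model. The `translate` function below is a hypothetical stand-in for whatever model or service you would actually call, and `localize` and `parse_accept_language` are names invented for this example.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag with no q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def translate(text: str, target_language: str) -> str:
    """Hypothetical placeholder for a call to someone else's translation model."""
    return f"[{target_language}] {text}"


def localize(text: str, accept_language: str, source_language: str = "en") -> str:
    """Translate text into the user's top language unless it is already the source."""
    preferred = parse_accept_language(accept_language)
    target = preferred[0] if preferred else source_language
    if target == "*" or target.split("-")[0] == source_language:
        return text
    return translate(text, target)


print(localize("Welcome!", "fr-CH, fr;q=0.9, en;q=0.8, *;q=0.5"))
```

The site author only ever writes the source-language string; which languages exist is a detail left to the header and the model.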

That's certainly true! I can use my current skillset to help connect users with new tech. And there have been many minor revolutions during the course of my career, many of which have been incorporated into the sort of work I do.

I guess the difference (for me, anyway) is that this change isn't incremental. It's a fundamentally new type of computing-- One that comes with a totally new way of approaching problems. Listening to Andrej Karpathy talk about Software 2.0, for instance, it seems probable that ML has a place in many parts of the stack.

It's possible I'm just projecting my insecurities, here, but my experience has been that changes to computing hardware usually result in changes across the entire industry. And this feels like a pretty meaningful change.

The ML space is a lot easier to get into than before. Training and using models is more and more just like using any other library or framework. The hard algorithmic parts, the math, that's all being solved by academia and researchers.

That said, getting hold of the data and the computational resources for training are barriers.

I'll be bold enough to make the contrarian prediction. The approach of throwing ever more parameters in the model and ever more transistors on the chip is at best a brute force approach to AI and will likely plateau in effectiveness long before we get to "general purpose AI". We do not need 1nm neurons running at GHz rates and training on a corpus of everything ever said just to comprehend language. There needs to be an algorithmic breakthrough. There is likely already more than enough processing power.

Even bolder prediction: When we finally understand how the brain actually does it, the algorithmic improvement will be so enormous that the machine learning tasks which run on massive servers today will be able to run on the phone currently in your pocket.

This view may ultimately be right, but massively ignores the current observed trends in capabilities increase[0], scaling laws[1], and things like grokking[2]. I'm seeing an increasing number of researchers (me included) moving to stances like: "there is a scary possibility that we may solve all the benchmarks we come up for AI... without understanding anything fundamentally deep about what intelligence is about. a bummer for those like me who are see AI as a fantastic way to unlock deeper insights on human intelligence" @Thom_Wolf [3]

[0] https://www.lesswrong.com/posts/K4urTDkBbtNuLivJx/why-i-thin...

[1] https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla...

[2] https://twitter.com/_akhaliq/status/1479265403142553601

[3] https://twitter.com/TacoCohen/status/1584499066410790912

I feel the same way. What if there’s a better way to use those transistors?

The semiconductor researchers spend a lot of effort to make ever-smaller transistors. What is a transistor? It’s a tiny switch.

The ML researchers meanwhile use the language of linear algebra to define mathematical transformations of real numbers with nice differentiability properties.

The chipmakers are then tasked with reconciling the two. So they use transistors to make gates. And gates to make adders. And adders to make integer multipliers. And integer multipliers to make floating point multipliers. And fp multipliers to implement matrix multiplication. And now you can run your cat diffuser model on those transistors.
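That tower of abstractions can be sketched in a few dozen lines. This is a toy model, not how real hardware is laid out: a single `nand` function stands in for a handful of transistors, the word width of 16 bits is arbitrary, and the floating-point step is skipped, so the "matrix multiply" here works on small unsigned integers only.

```python
def nand(a: int, b: int) -> int:
    """The one primitive; in hardware this would be a few transistors."""
    return 1 - (a & b)


# Every other gate is built only from NAND.
def not_(a): return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b): return nand(not_(a), not_(b))
def xor_(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def full_adder(a, b, carry_in):
    """Gates -> a 1-bit adder."""
    s1 = xor_(a, b)
    total = xor_(s1, carry_in)
    carry_out = or_(and_(a, b), and_(s1, carry_in))
    return total, carry_out


WIDTH = 16  # arbitrary word size for this sketch


def to_bits(n): return [(n >> i) & 1 for i in range(WIDTH)]  # little-endian
def from_bits(bits): return sum(bit << i for i, bit in enumerate(bits))


def add(x_bits, y_bits):
    """1-bit adders -> a ripple-carry adder (overflow wraps around)."""
    out, carry = [], 0
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out


def multiply(x_bits, y_bits):
    """Adders -> a shift-and-add integer multiplier."""
    acc = [0] * WIDTH
    for i, y_bit in enumerate(y_bits):
        # Partial product: x shifted left by i, masked by bit i of y.
        partial = [0] * i + [and_(x_bit, y_bit) for x_bit in x_bits[:WIDTH - i]]
        acc = add(acc, partial)
    return acc


def matmul(a, b):
    """Multipliers and adders -> matrix multiplication."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            acc = [0] * WIDTH
            for k in range(inner):
                acc = add(acc, multiply(to_bits(a[r][k]), to_bits(b[k][c])))
            row.append(from_bits(acc))
        result.append(row)
    return result


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Even in this toy version a single 2x2 product fans out into thousands of NAND evaluations, which gives a feel for how many layers sit between "a tiny switch" and the linear algebra the model is written in.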

But what is the chance that the configuration of transistors in a floating point multiplier is anywhere close to the most efficient transistor configuration for learning?

The only reason we’re using multiplication of real numbers is because the math people said so.

Since we are openly speculating, I think the missing ingredient is feedback loops. There is no explicit input side and output side of the brain. It's all just a ball of neurons. There is propagation delay between the neurons. This makes it possible to have self-sustaining loops of neurons firing. The longer the loop, the longer it takes to go full circle. We call this phenomenon "brain waves".

I think what we get wrong is that individual neurons rarely represent anything. They are a medium for the waves. The waves are the currency of thought. A brain is a series of electro-mechanical oscillators that resonates with abstract concepts and patterns.

AFAIK, most research is still using the old "neurons represent single things" paradigm. Someone needs to tell them, there's no such thing as a "grandmother neuron".

> When we finally understand how the brain actually does it

If you dig into how neurons work in the brain you'll discover that a single neuron has the complexity of a large neural network internally and its behavior is not nearly as simple as the typical model explanation. Different ion channels, time-dependent behavior, up/down regulation of neurotransmitter receptors and release, and much more.

It is entirely possible that the brain "does it" by throwing vastly more computing resources at the problem than we previously believed.

I have, and while it's true a neuron does more than the simple on/off of its artificial peers, I'd hardly call it "the complexity of a large neural network internally". There just aren't that many bits needed to represent all the parameters you just mentioned. Stuff you mention like ion channels and neurotransmitters feels like excessively mimicking biological constraints rather than something actually relevant. Who cares if the real neurons use chemical channels and electrical channels to communicate; the artificial ones can just send the same information in electrical channels for everything.
While the brain has vastly more computing resources, they are probably operating very far from the theoretical optimum. There must be so much baggage, inefficiency and dead pathways left which do not have much purpose at all. The brain was evolved, not designed, which means that any random features/mutations which did not inhibit an individual's ability to procreate got to stay.

I think that even when we understand the brain completely, it will be very difficult to disentangle what is useful for artificial neural networks from what doesn't really matter.

The capabilities are already there. If the compute becomes more affordable, they will explode in usage. In fact, this is already happening. See live transcription on newer iOS devices for an example.

Scaling diagrams also showed no sign of plateauing AFAIK.

Intel chose Angstroms for their roadmap nodes: 20A = 2nm.
>At this rate they'll be manufacturing 0nm chips soon, and in a decade they'll be on -1nm.

The industry has chosen angstroms as the next unit, so 1.4nm becomes 14A. (Intel for now, but TSMC has used the term in a few of its presentations.)

>If TSMC/Samsung/Intel can squeeze out just one or two more nodes,

We have a very solid roadmap all the way to 1nm, or 10A, by the end of this decade, as long as the market is willing to pay for it. At least TSMC 3nm and 2nm are pretty much done.

>limited general-purpose AI in every home

Like the comment below, I am not convinced the brute force approach works. You can already build an 800mm2 NPU today that is equivalent to the chip that would be "in every home and office" by the end of this decade, but we are still nowhere near it.

> clear from .. that foundries are still making progress in transistors per dollar.

Are they?

Home GPU prices have been elevated due to crypto.

Latest node CPUs are mostly opaque long-term contracts.

The xxBN-transistor chips are priced at $xx,000.

If you have a (better) reference I would love to see it.

Just look at the latest Apple silicon. Transistors per dollar and TDP are showing great strides.

Intel and Nvidia just got too lazy or fumbled their node progress.

Nvidia uses TSMC for fabrication.
>"That progress is just barely beginning to make large neural networks (Stable Diffusion, vision and speech systems, language model AIs) deployable in consumer applications."

Interesting. Is Stable Diffusion a product of neural network size then? Is the size of the network a function of chip density? Also, is there a Stable Diffusion app that currently works on edge devices?

If edge devices include my gamer PC then yes. For apps I'd recommend https://github.com/AUTOMATIC1111/stable-diffusion-webui
They aren't even really 1nm... but either way, they could switch to picometers...