by xeox538 388 days ago
I believe we're currently seeing AI in the "mainframe" era, much like the early days of computing, where a single machine occupied an entire room and consumed massive amounts of power, yet offered less compute than what now fits in a smartphone.

I expect rapid progress in both model efficiency and hardware specialization. Local inference on edge devices, using chips designed specifically for AI workloads, will drastically reduce energy consumption for the majority of tasks. This shift will free up large-scale compute resources to focus on truly complex scientific problems, which seems like a worthwhile goal to me.

3 comments

The CPU development curve is often thrown around, but it very seldom fits anything else in reality. It was a rare and extraordinary set of coincidences that got us here. Computation using silicon turned out to have massive growth potential for a variety of lucky reasons, but battery tech, say, is not so lucky, nor is fusion, nor is quantum computing.

The low-hanging fruit has been plucked by that silicon development process, and while remarkable improvement in AI efficiency is likely, it is highly unlikely to follow a similar curve.

More likely is a slow, incremental process taking decades. We cannot just wish away billions of parameters and the need for trillions of operations. It’s not like we have some open path of possible improvement like we had with silicon. We walked that path already.

Maybe photonics...

I don’t understand the “chips designed for AI workloads” sentiment I hear all the time. LLMs were designed using GPUs. The hardware already exists, so what will make it use less energy in a world where GPUs over the last decade have only become bigger, hotter, and more power hungry? If we could develop LLMs on anything less, we probably would have shifted back to CPUs already.
Google's TPUs are an example of efficient chips designed for AI workloads, and were in development before the LLM boom.

It's just hard to replicate the power and efficiency of CUDA.

It sure seems like that to me. I was pretty impressed by how easily I could run a small Gemma model on a 7-year-old laptop and get a decent chat experience.

I can imagine that doing some clever offloading to normal programs and using the LLM as a sort of "fuzzy glue" for the rest could improve efficiency on many common tasks.
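
A minimal sketch of what that offloading might look like, under my own assumptions: arithmetic goes to a small deterministic evaluator, and only what it cannot parse is handed to the model. The names `safe_eval` and `answer` are hypothetical, and `llm` is a stand-in for whatever local model call you have; none of this is a real library API.

```python
import ast
import operator
from typing import Callable

# Operators the deterministic path is willing to handle.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str) -> float:
    """Evaluate plain +, -, *, / arithmetic without calling eval()."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval"))


def answer(query: str, llm: Callable[[str], str]) -> str:
    """Try the cheap deterministic path first; use the model only as a fallback."""
    try:
        return str(safe_eval(query))
    except (ValueError, SyntaxError, ZeroDivisionError):
        # Hypothetical model call: only the "fuzzy" queries reach it.
        return llm(query)
```

With this shape, a query like "2 + 3 * 4" never touches the model at all, so the expensive inference is spent only on input a normal program cannot handle.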

Big tech ain't investing heavily so you can run local. What data does that leave them to sell, and what power and control does that give them? Zilch.
I mean... cute conspiracy, but it doesn't correspond with reality. Just look at what Google is releasing: they are trying to make these things fit on consumer hardware.