Intel introduces 3-GHz desktop chip (2002) (computerworld.com)
29 points by mikektung 4657 days ago
9 comments

Instructions per cycle has been getting somewhat better, and it is that number multiplied by the clock speed that is a better indicator of actual performance https://en.wikipedia.org/wiki/Instructions_per_cycle
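
A rough sketch of that relationship, for anyone who wants to play with the numbers. The IPC values here are made-up placeholders, not measurements of any real chip, and `relative_performance` is just an illustrative name.

```python
def relative_performance(ipc: float, clock_hz: float) -> float:
    """Instructions per second = instructions per cycle * cycles per second."""
    return ipc * clock_hz

# Hypothetical chips: same clock, different (assumed) IPC.
old_chip = relative_performance(ipc=1.0, clock_hz=3.0e9)
new_chip = relative_performance(ipc=2.0, clock_hz=3.0e9)

# Same GHz on the box, but twice the instruction throughput.
print(new_chip / old_chip)
```

Real workloads rarely sustain a single fixed IPC, so treat this as a first-order estimate only.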

Everything else needs to keep up too - it is pointless having a fast processor if it has to keep waiting on memory, storage and the network. Those are very slowly catching up and also lead to overall improved performance.

I've been hoping that asynchronous implementations would take over. In theory parts of the chip can run at whatever speeds are best for them at that time, and not have to be synchronised with other parts. And when not in use they easily power down. There were some async ARM chips made, but no progress since 2000 https://en.wikipedia.org/wiki/AMULET_microprocessor

This was in Intel's dark days of P4 / "Netburst" microarchitecture. They goosed lots of GHz out of the chip by going with a very deep pipeline, but performance in real-world applications was terrible. (Deep processor pipelines kill you when you mispredict a branch.)

I sat in on a few sales calls from Intel about their new Pentium M / "Centrino" mobile architecture in 2003. What was amazing was that their performance graphs showed that Centrino had all the performance of P4, but with much lower power.

Basically, the terrible P4 microarchitecture, plus Intel's incompatible 64-bit approach (Itanium, aka "the Itanic"), left a big hole in the market where AMD stepped in and mopped up for 3-4 years with Opterons, the first 64-bit x86 processors.

Even today, x64 architecture is called "AMD64" for this reason -- AMD defined the instruction set, and Intel had to follow (for once).

IPC is undoubtedly much higher today, plus now similar machines would have 4 cores or more.

I wonder why it took until later in 2006 before a version of Pentium M showed up with 64-bit support.
Intel didn't release any 64-bit x86 processors until 2005. They bet so heavily on the alternative, Itanium, that it took them a few years to recover.

EDIT: it was actually mid-2004, as noted below.

They released the first 64-bit Xeon in mid-2004.
Light has been stuck at c for the last decade, too. When will it break this barrier?
This is the most awesome comment in this post.
Clock speed is an unfortunate marketing gimmick anymore. I dare compare it to peak horsepower. A 3ghz chip from today will run circles around a chip from 2002, and with less power to boot. AMD is ahead in the clock speed race, but gets beat handily by "slower" Intel processors, while using twice the power. The focus going forward is going to be on power efficiency and using more cores, not clock speed.
This is true, but it's missing the point. A modern CPU gets probably 50% more work out of a median clock cycle and runs 33% faster for single threaded (turbo) workloads. So it's twice as fast. And sure, there are four of them on the die.

But back up another decade to 1992, where a top of the line PC was a 50MHz 486 with well under half the IPC of the linked Northwood running 60x slower.
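
The arithmetic behind those two claims can be checked in a few lines. This only multiplies the figures quoted above (plus the 3.06 GHz Northwood clock from the linked article); nothing here is a benchmark.

```python
# Figures quoted in the comment; none of this is measured.
ipc_gain = 1.50     # "50% more work out of a median clock cycle"
clock_gain = 1.33   # "runs 33% faster" for single threaded (turbo) workloads
single_thread_speedup = ipc_gain * clock_gain
print(f"modern vs. Northwood, single thread: about {single_thread_speedup:.3f}x")

# Clock ratio between the 3.06 GHz Northwood and a 50 MHz 486.
clock_ratio = 3.06e9 / 50e6
print(f"Northwood vs. 486, clock only: about {clock_ratio:.1f}x")
```

The second figure is clock speed alone; the IPC gap mentioned in the comment would multiply on top of it.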

For those of us who remember the 80's and 90's, it's a very different world we live in.

The (single threaded) performance improvement is significantly larger. Anandtech has a 2005-vintage Pentium in their benchmarks: http://anandtech.com/bench/product/92?vs=836 and there is probably a significant perf difference between 2002 and 2005.

And you can't really ignore the massive improvements gained via GPUs. There are your 100x differences.

First off, you're absolutely right. The difference in IPC between a modern CPU and a 2002 vintage Pentium 4 is pretty incredible, and the way forward is all about power efficiency and cramming more cores onto one die.

However, I think it's unfair to say clock speed is an unfortunate marketing gimmick "anymore" when it was a gimmick all the way back in 2002 when Intel released the 3.06 GHz "Northwood" Pentium 4 that the OP's linked article references. In fact, it was a gimmick that caused one of the biggest strategy/roadmap blunders Intel ever made.

Intel designed NetBurst (the architecture that the P4 was based on) to do one thing really well: allow Intel to ramp up clock speeds quickly. The architectural choices they made to enable this severely hobbled the P4's performance, especially the 20 (later 31!) stage pipeline that made the penalty for branch mispredictions pretty awful.

Intel eventually released P4s that clocked as high as 3.8 GHz and had an unheard of (in the x86 space) 115 watt TDP, but when the Athlon 64 was released, AMD could smoke Intel's fastest P4s using slower clocked CPUs with lower TDPs. Instead of focusing on raw clock speed, AMD focused on architectural improvements like x86-64, HyperTransport, and an integrated memory controller. (Intel CPUs wouldn't see QPI or an integrated memory controller until Nehalem, released five years after the first Athlon 64.)

As you say, the tables are now turned — clock for clock, the IPC of AMD's Piledriver core is behind that of even Intel's (two generations old) Sandy Bridge core, and all AMD seems to be able to do is add more cores and crank up the clock speed. Unfortunately for AMD, adding more cores doesn't help single-threaded performance, and a very nasty side-effect of increasing clock speed is that the processor's TDP increases disproportionately: the 4.7 GHz FX-9590 has a whopping 220 watt TDP, while the 4.0 GHz FX-8350's TDP is 125 watts.
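
To put numbers on "disproportionately", here is a quick ratio check using only the clock and TDP figures quoted above. Keep in mind TDP is a rated thermal design figure, not measured power draw, so this is a rough comparison.

```python
# Clock and TDP figures as quoted in the comment above.
fx_8350 = {"clock_ghz": 4.0, "tdp_watts": 125}
fx_9590 = {"clock_ghz": 4.7, "tdp_watts": 220}

clock_ratio = fx_9590["clock_ghz"] / fx_8350["clock_ghz"]
tdp_ratio = fx_9590["tdp_watts"] / fx_8350["tdp_watts"]

# Percentage increase of the FX-9590 over the FX-8350.
print(f"clock: +{(clock_ratio - 1) * 100:.1f}%")
print(f"TDP:   +{(tdp_ratio - 1) * 100:.1f}%")
```

That works out to roughly 17.5% more clock for about 76% more rated TDP.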

Actually, the performance difference is small. The biggest gain we have today is multi-cores, so no one thread can hog all processing pipelines. If you're pushing through 1 billion instructions, it will still take 1/3 of a second for your CPU to chunk through all of it, but other code can be processed simultaneously on another core.

I know, I know. My laptop (Lenovo T400) has almost the exact same specs as the old Sunfire V20 rackmounts I have (2x2.4ghz cores, 4gb ram, passable video), but the laptop can run on batteries for 5 hours.

But clockspeed is still king -- this T400 runs circles around a Lenovo W520 with a 1.6ghz Core i7 and 3x the ram. I know because I had a W520 for work, and I could see the difference.

If you have more cores and a single task, then you have to deal with partitioning the dataset for that task - if it is partitionable. That's where the problem lies with SMP computing. Branch predictors have gotten a whole lot smarter, just as compilers have gotten better at producing code that fits multiple pipelines.
The problem is physics. We can't get to higher clock speeds with current materials, due to heat. It's kind of like how fighter jets haven't got any faster (top speed anyway) since the 60's...
The MIG-25 is rated at Mach 2.8 GHz, but can be overclocked to 3.2.

https://en.wikipedia.org/wiki/Mig-25

Indeed, and it first flew in 1964. Note that the follow-on MiG-31 was considerably slower despite sharing the general aerodynamic platform.

Since then Vmax has been declining, as the aerodynamics and mechanical complications ( e.g. variable intake ramps ) of higher-Mach flight were determined to be less useful than transonic manoeuvrability and sustained supercruising.

The exception to this trend has been the superfighter category ( F-111, F-14, F-15, F-22, Su-27 ) which have maintained the same ~ M2.5 Vmax due to their specific role. Yes, even the F-111 was meant to be a fleet fighter.

But none have pushed up past the heady M3.0 level that was routinely broken by a series of prototypes in the 1960s.

Well...it's more complex...

With die sizes as small as they are, we have a problem where electrons...jump...through basically solid walls from an electrified wire to an unpowered wire. Now, turning on one circuit means the circuit browns-out and a neighboring circuit gets half-powered.

With current materials in the CMOS manufacturing process, to be a little nitpicky.
Fighter jets haven't got any faster because more speed is worthless compared to better avionics.

That is also what is happening with computers - it is simply more efficient to cram out more instructions per clock cycle than it is to cram out more clock cycles.

Low power and multicore are cute and all, but imagine the type of machine learning we could do on 384GHz cores.
Ok, now imagine what a 384GHz core would cost.

(hint: probably more than the world GDP. Each)

Ok, NOW imagine what 10,000 3.84GHz cores would cost (100 times more aggregate cycles per second than 1x 384GHz core). What's that, you figure a measly $10-30M instead of more money than exists on earth?
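
The "100 times" figure checks out. A tiny sketch using just the numbers from the comment, kept in MHz so everything stays in exact integers:

```python
# Work in MHz so the arithmetic stays in exact integers.
small_core_mhz = 3_840      # 3.84 GHz per core
small_core_count = 10_000
big_core_mhz = 384_000      # one 384 GHz core

aggregate_mhz = small_core_mhz * small_core_count
ratio = aggregate_mhz // big_core_mhz
print(ratio)  # 100
```

Aggregate cycles per second is not the same thing as speedup on one serial task; it only says how many cycles are available in total.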

Any research simulation is going to want to be parallelized anyway. You'll bump into the limits of the 384GHz core, no doubt about that, at which point you are back to distributed computing. For limitless complexity and limitless appetite for computing power, distributed computing will always be the answer.

Imagine the type of machine learning we could do on low power 384GHz multi-cores
Imagine the type of machine learning we could do on 384 low power 1GHz cores.
Not everything parallelizes nicely. Given enough processors, serial computation eventually becomes the bottleneck:

http://en.wikipedia.org/wiki/Amdahl's_law
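
For a feel of how quickly the serial part dominates, here is a minimal sketch of Amdahl's law. The 95% parallel fraction is an illustrative assumption, not a figure for any particular workload.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Assumed workload: 95% of the runtime parallelizes perfectly.
for n in (1, 4, 384, 10_000):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

With a 5% serial portion the speedup can never exceed 20x, no matter how many cores are added.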

Many machine learning methods parallelize extremely nicely.
Imagine the type of machine learning we could do on low power 384GHz quantum multi-cores with plasma exhaust relays and phase lock inverters.
Invert the polarity
I'm afraid the machine you'd get would be a furnace or a torch... AFAIK the power that you'd need to dissipate is linear in frequency, so unless we discover technology to radically lower the voltage used in CPUs or some other parameter, dissipating 100x the heat would be kind of a big problem for such a CPU.
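
The usual first-order model for this is the CMOS switching power approximation, P = a * C * V^2 * f. A minimal sketch follows; every parameter value is a hypothetical placeholder rather than data for a real chip, and leakage power is ignored.

```python
def dynamic_power_watts(activity: float, capacitance_f: float,
                        voltage_v: float, frequency_hz: float) -> float:
    """Approximate CMOS switching power: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz

# Hypothetical placeholder parameters, identical except for frequency.
base = dynamic_power_watts(0.1, 1e-7, 1.2, 3.84e9)
fast = dynamic_power_watts(0.1, 1e-7, 1.2, 384e9)

# At fixed voltage, 100x the frequency means about 100x the switching power.
print(fast / base)
```

In practice higher clocks usually need higher voltage as well, and since voltage enters squared, the linear estimate is likely optimistic.
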
In one clock cycle, light would be able to travel about 780 micrometers. The real propagation speed of signals on a chip will be substantially less than the speed of light, and it gets slower at smaller feature sizes. This could be tricky.

http://en.wikipedia.org/wiki/Interconnect_bottleneck
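
The 780 micrometer figure appears to assume the 384 GHz clock from the parent comment (that is an inference, the comment does not say so). A quick check of distance per cycle in a vacuum:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre

def light_distance_per_cycle_um(clock_hz: float) -> float:
    """Distance light covers in a vacuum during one clock period, in micrometres."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 1e6

print(f"{light_distance_per_cycle_um(384e9):.1f} um at 384 GHz")
print(f"{light_distance_per_cycle_um(3.06e9) / 1000:.1f} mm at 3.06 GHz")
```

This is the vacuum upper bound; as the comment says, on-chip signals propagate more slowly than that.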

Imagine a Beowulf cluster of these!!!
Unfortunately, many machine learning algorithms aren't easily parallelizable. Consider any boosting ensemble learner, such as gradient-boosted trees, where each iteration optimizes the result from the previous iteration. (I run the machine learning startup diffbot.com and we could really use faster cycles.)
What sort of algorithms would you like to have faster variants/implementations of?
You could invent 1000 new ML algorithms in the time it would take you to engineer that.
The machine learning I did was never limited by CPU power, but rather my algorithm and especially the states. Asking for better performance without better states is garbage-in-glory-out.
Cool. Back then there still were articles about a new faster desktop CPU! Today, whenever there's news about a CPU, it's about some other low power mobile whatever thing that is not faster. Yawn.
I wonder: what ratio of FLOPs would you get, between this chip, and an array of "low power mobile whatever thing"s adding up to an equivalent power-draw?
Surely DSPs or GPUs achieve the best FLOPs per watt.
I would be interested in the numbers.
If we can't make the clock speed faster, what about massively increasing the size of the on-chip cache? I think they call them like L2 and L3 or something. If I had 1 GB of cache then maybe my whole program could run without doing much main memory access. That would be fast, right?
Check out Haswell GT3e with 128 MB of L4 cache. It helps, but probably not enough to justify what they're charging for it.
As long as all users of your program also have that $5000 CPU (if it existed), this might be a good idea. Cost-wise, with more than 5 users, it probably starts becoming viable to port that application to something more sane, for example $1000 GPUs like NVIDIA's Titan.
Indeed, AMD announced their first 5ghz CPU in June at E3.

http://www.amd.com/us/press-releases/Pages/amd-unleashes-201...

It is for sale now at Newegg for $699. 4.7ghz with 5ghz turbo.

http://www.newegg.com/Product/Product.aspx?Item=N82E16819113...

Though speed bumps aren't exactly as crazy as in the old days. I remember my next upgrade from a 50 MHz 486 ended up being a 266 MHz Pentium II. This was over a span of about 3 years.
Power consumption is much better.
x86 & x86-64 cores still create heat like incandescent lightbulbs.