by TrainedMonkey 147 days ago
> CPU: Dual-core 300MHz ARM Cortex-M4F

It's an absolutely bonkers amount of hardware scaling that has happened since Doom was released. Yes, this is tremendous overkill, but the crazy part is that it fits into an earpiece.


This is the "little part" of what fits into an earpiece. Each of those cores is maybe 0.04 square millimeters of die on, e.g., a 28nm process. RAM takes some area, but that's dwarfed by the analog and power components and the packaging. The marginal cost of the gates making up the processors is effectively zero.
So 1mm² peppered with those cores at 300MHz will give you 4 TFLOPS, and a whole 200mm wafer 100 petaflops, like 10 B200s, at less than $3K per wafer. Giving half the area to memory, we'd get 50 PFLOPS with 300Gb of RAM. Power draw is something like 10-20 kW. So, given these numbers, I'd guess Cerebras has tremendous margins and is just printing money :)
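The per-mm² step is easy to re-derive from the thread's own inputs. This is a back-of-envelope sketch: 0.04 mm² per core, 300 MHz, and a 200 mm wafer come from the comments above; the 1 flop/cycle/core figure is an assumption (chosen to match the "couple hundred megaflops" M4F estimate later in the thread); and the wafer area ignores edge loss, interconnect, and memory.

```python
import math

CORE_AREA_MM2 = 0.04    # per-core area on ~28nm, as estimated above
CLOCK_HZ = 300e6        # 300 MHz
FLOPS_PER_CYCLE = 1     # ASSUMPTION: ~1 single-precision flop per cycle per core

def flops_per_mm2(core_area_mm2=CORE_AREA_MM2, clock_hz=CLOCK_HZ,
                  flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak flops per mm² if the area were tiled with nothing but cores."""
    cores_per_mm2 = 1.0 / core_area_mm2
    return cores_per_mm2 * clock_hz * flops_per_cycle

def wafer_area_mm2(diameter_mm=200.0):
    """Area of a round wafer; ignores edge exclusion and scribe lines."""
    return math.pi * (diameter_mm / 2) ** 2

density = flops_per_mm2()
area = wafer_area_mm2()
print(f"{1 / CORE_AREA_MM2:.0f} cores/mm^2")
print(f"{density / 1e9:.1f} GFLOPS/mm^2")
print(f"{area:.0f} mm^2 per 200 mm wafer")
print(f"{density * area / 1e12:.0f} TFLOPS per wafer (peak, cores only)")
```

With these inputs it lands near 7.5 GFLOPS per mm² and roughly a quarter of a petaflop per wafer, so the per-mm² figure is the one most worth re-checking before scaling up to a whole wafer.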
Yes, assuming you don't need to connect anything together and that RAM is tinier than it really is, sure. At 28nm you get about 3 megabits of SRAM per square millimeter, so an entire wafer only gets you ~12 gigabytes of memory.
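The ~12 GB figure follows directly from the quoted SRAM density. A minimal sketch, assuming a perfectly round 200 mm wafer that is 100% SRAM bit cells (no periphery, redundancy, or edge loss, so it is an upper bound):

```python
import math

SRAM_MBIT_PER_MM2 = 3.0  # ~28nm SRAM density quoted above

def wafer_sram_bytes(diameter_mm=200.0, mbit_per_mm2=SRAM_MBIT_PER_MM2, fraction=1.0):
    """Upper bound on SRAM capacity if `fraction` of a round wafer were SRAM.

    Ignores edge exclusion, scribe lines, array periphery, and redundancy.
    """
    area_mm2 = math.pi * (diameter_mm / 2) ** 2
    bits = area_mm2 * fraction * mbit_per_mm2 * 1e6
    return bits / 8

print(f"whole wafer: {wafer_sram_bytes() / 1e9:.1f} GB")
print(f"half wafer:  {wafer_sram_bytes(fraction=0.5) / 1e9:.1f} GB")
```

The half-wafer split proposed in the previous comment comes out around 6 GB under the same assumptions.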

And, of course, most of Cerebras' costs are NRE and the stuff like getting heat out of that wafer and power in.

Why not DRAM?
Same reason why Cerebras doesn't use DRAM. The whole point of putting memory close is to increase performance and bandwidth, and DRAM is fundamentally latent.

Also, a process that is good at making logic isn't necessarily good for making DRAM. Yes, eDRAM exists, but most designs don't put DRAM on the same die as logic; they stack it or put it off-chip instead.

Almost all of these single-die microcontrollers have flash plus SRAM. Almost all microprocessor cache designs are SRAM for the same reasons (with some designs using off-die L3 DRAM).

CPU cache is understandably SRAM.

>The whole point of putting memory close is to increase performance and bandwidth, and DRAM is fundamentally latent.

When the access patterns are well established and understood, as with transformers, you can mitigate latency with prefetching (you could even build a heavily beefed-up prefetch pipeline, knowing the target is transformers), while putting memory on the same chip gives you a huge number of data lines and therefore huge bandwidth.
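The prefetching argument can be illustrated with a toy timing model (not a hardware simulation): with `depth` loads in flight, Little's law says steady-state cost per tile is the larger of the compute time and latency/depth. The latency and compute numbers below are made up for illustration, and the model assumes bandwidth is never the bottleneck.

```python
def time_without_prefetch(n_tiles, mem_latency, compute):
    """Each tile waits for its own load to arrive before computing on it."""
    return n_tiles * (mem_latency + compute)

def time_with_prefetch(n_tiles, mem_latency, compute, depth):
    """Loads are issued `depth` tiles ahead of the compute.

    Idealized: after the first load's latency, each tile costs the larger of
    its compute time and the latency amortized over `depth` in-flight loads.
    """
    if depth < 1:
        return time_without_prefetch(n_tiles, mem_latency, compute)
    per_tile = max(compute, mem_latency / depth)
    return mem_latency + n_tiles * per_tile

# Hypothetical numbers: 100-cycle memory latency, 20 cycles of math per tile.
for depth in (0, 2, 8):
    print(depth, time_with_prefetch(1000, 100, 20, depth))
```

Once `depth` exceeds latency/compute (here 5), the latency is fully hidden and the run becomes compute-bound, which is why a known, regular access pattern makes DRAM latency much less of a problem than it is for random access.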

I remember playing Doom on a single-core 25MHz 486 laptop. It was, at the time, an amazing machine, hundreds of times more powerful than the flight computer that ran the Apollo space capsule, and now it is outclassed by an earbud.
Can we finally end this Apollo computer comparison forever? It was a real time computer NOT designed for speed but real time operations.

Why not compare it to, say, a PDP-11, a VAX-11/780, or the Cray-1 supercomputer?

NASA used a lot of supercomputers here on earth prior to mission start.

> It was a real time computer NOT designed for speed but real time operations.

More than anything, it was designed to be small and use little power.

But the little ARM Cortex-M4Fs we're comparing to are also designed for embedded, possibly hard-real-time, operation. And the dominant factors in the experience of playback through earbuds are response time and jitter.

If the AGC could get a capsule to the moon doing hard real-time tasks (and shedding low-priority tasks as necessary), a single STM32F405 with a Cortex-M4F could do it better.
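"Shedding low-priority tasks" is a scheduling policy worth making concrete. This is a toy illustration only, not a model of the AGC's actual Executive/Waitlist software: each cycle has a fixed time budget, jobs run in priority order, and any job that would overrun the remaining budget is skipped. The job names, priorities, and costs are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: int   # higher number = more important
    cost_us: int    # worst-case execution time this cycle, in microseconds

def schedule_cycle(jobs, budget_us):
    """Run jobs in priority order; skip any job that won't fit the budget.

    The highest-priority work is admitted first, so it always meets the
    deadline; lower-priority jobs are shed when the cycle is overloaded.
    """
    ran, shed = [], []
    remaining = budget_us
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if job.cost_us <= remaining:
            ran.append(job.name)
            remaining -= job.cost_us
        else:
            shed.append(job.name)
    return ran, shed

# Hypothetical workload for one 1 ms cycle.
jobs = [
    Job("telemetry", 1, 150),
    Job("guidance", 10, 400),
    Job("display", 2, 200),
    Job("attitude", 9, 300),
]
print(schedule_cycle(jobs, 1000))  # normal load
print(schedule_cycle(jobs, 800))   # overloaded: low-priority work is shed
```

A real hard-real-time system would also bound the worst-case execution times ahead of time rather than trusting them at runtime; this only shows the admit-or-shed decision.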

Actually, my team is going to fly an STM32F030 (Cortex-M0) on a small satellite for minimal power-management tasks, which are still hard real-time. It fits in 25 milliwatts vs. the AGC's 55W. We're clocked slow, but we still exceed the AGC's throughput by ~200-300x. Funnily enough, the amount of RAM is about the same as the AGC's :D It's 70 cents in quantity, but we have to pay three whole dollars at quantity 1.

> NASA used a lot of supercomputers here on earth prior to mission start.

Fine, let's compare to the CDC 6600, the fastest computer of the late 60s. An M4F @ 300MHz is a couple hundred single-precision megaflops; the CDC 6600 was like 3 not-quite-double-precision megaflops. The hacky "double single precision" techniques give comparable precision. Figure those are about 10x slower on average, so each M4F could do about 20 CDC-6600-equivalent megaflops, i.e. it is roughly 5-10x faster. The amount of RAM on this earbud is about the same.
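The "double single precision" trick stores each value as an unevaluated sum of two float32s, `(hi, lo)`, and uses error-free transformations such as Knuth's TwoSum to carry the rounding error along. Below is a minimal Python sketch of just the addition step; float32 rounding is emulated with `struct` as a stand-in for a single-precision FPU, and it says nothing about actual M4F cycle counts. (Rounding a double-precision sum of two float32s back to float32 gives the correctly rounded float32 result, so the emulation is faithful for add/subtract.)

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Knuth's TwoSum in float32: a + b == s + err exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err

def fast_two_sum(a, b):
    """Dekker's FastTwoSum in float32; requires |a| >= |b|."""
    s = f32(a + b)
    err = f32(b - f32(s - a))
    return s, err

def ds_add(x, y):
    """Add two double-single numbers, each an (hi, lo) pair of float32s."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    return fast_two_sum(s, e)

def ds_to_float(x):
    return x[0] + x[1]

# Add a tiny step 1000 times: each step is below half an ulp of 1.0 in float32.
step = f32(1e-8)
plain = f32(1.0)
ds = (f32(1.0), 0.0)
for _ in range(1000):
    plain = f32(plain + step)
    ds = ds_add(ds, (step, 0.0))

print(plain)            # plain float32 never moves off 1.0
print(ds_to_float(ds))  # double-single tracks the true sum, about 1.00001
```

Each double-single add costs several float32 operations instead of one, which is where a "roughly 10x slower" rule of thumb comes from once multiplication and division are included.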

His 486-25 -- if a DX model with the FPU -- was probably roughly twice as fast as the 6600 and probably had 4x the RAM, and used 2 orders of magnitude less power and massed 3 orders of magnitude less.

Control flow, integer math, etc. are much faster than that.

Just a few more pennies gets you a microcontroller with a double-precision FPU, like a Cortex-M7F with the FPv5-D16, which at 300MHz is good for maybe 60 double-precision megaflops: 20x faster than the 6600, with more precision.
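The ratios in the last few comments are simple enough to check. All four inputs below are the commenters' rough estimates restated as constants (the 10x emulation penalty is itself a guess), so this only confirms the arithmetic, not the underlying figures.

```python
CDC6600_MFLOPS = 3.0   # "like 3" ~60-bit megaflops
M4F_SP_MFLOPS = 200.0  # "a couple hundred" single-precision megaflops at 300 MHz
DS_SLOWDOWN = 10.0     # assumed average cost of double-single emulation
M7_DP_MFLOPS = 60.0    # "maybe 60" native double-precision megaflops at 300 MHz

m4f_ds = M4F_SP_MFLOPS / DS_SLOWDOWN
print(f"M4F double-single: {m4f_ds:.0f} MFLOPS, "
      f"{m4f_ds / CDC6600_MFLOPS:.1f}x a CDC 6600")
print(f"M7 native double:  {M7_DP_MFLOPS:.0f} MFLOPS, "
      f"{M7_DP_MFLOPS / CDC6600_MFLOPS:.0f}x a CDC 6600")
```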

I have thought about this a little more and looked into things. Since NASA used the 360/91, and had a lot of 360s and 7090s, all of NASA's 60s computing couldn't quite fit into a single 486DX-25. You'd be more in the 486DX4-100 era to replace everything comfortably, and you'd want a lot of RAM, like 16MB.

It looks like NASA had 5 360/75's plus a 360/91 by the end, plus a few other computers.

The biggest 360/75s (I don't know whether NASA had the highest-spec model for all 5) were probably roughly 1/10th of a 486-100 with 1 megabyte of RAM. The 360/91 that they had at the end was maybe 1/3rd of a 486-100 with up to 6 megabytes of RAM.

Those computers alone would be about 85% of a 486-100; everything else was comparatively small. And, of course, you need to account for the benefit of getting results on individual jobs much faster, even if sustained max throughput is about the same. So all of NASA, by the late 60s, probably fits into one relatively large 486DX4-100.
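The fleet total above is a short sum over the per-machine estimates. This sketch restates those estimates as data (fractions of a 486DX4-100 and RAM in MB, with the 360/91 taken at its maximum 6 MB); it is not drawn from NASA inventory records.

```python
# (machine, count, fraction of a 486DX4-100, RAM in MB), per the estimates above.
FLEET = [
    ("IBM 360/75", 5, 1 / 10, 1),
    ("IBM 360/91", 1, 1 / 3, 6),
]

total_fraction = sum(count * frac for _, count, frac, _ in FLEET)
total_ram_mb = sum(count * ram for _, count, _, ram in FLEET)
print(f"{total_fraction:.0%} of a 486DX4-100, {total_ram_mb} MB of RAM")
```

That comes to about 83% and 11 MB, consistent with the "about 85%" estimate, with the 16MB suggestion leaving headroom for the smaller machines.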

Incidentally, one random bit of my family lore: my dad was an IBM man and knew a lot about 360s and OS/360. He received a call one evening from NASA during Apollo 13 asking for advice about how they could get a little bit more out of their machines. My mom was miffed about dinner being interrupted until she understood why :D

What's your project/ cubesat name?

P.S. Try the MSP430 F models for low power. These can be CRAZY efficient.

P.S. Don't forget to wire the solar panel directly to the system: then your satellite might still talk 50 years from now, like some ham radio satellites from the Cold War (OSCAR 7, I think).

> What's your project/ cubesat name?

NyanSat. I'm the PI and mentor for a team of high school students who were selected by NASA's CubeSat Launch Initiative (CSLI).

> P.S. Try the MSP430 F models for low power. These can be CRAZY efficient.

Yeah, I've used the MSP430 in space. The STM32F0 fits what we're using it for. We designed the main flight computer ourselves; it's an RP2350 with MRAM. Some of the avionics details are here: https://github.com/OakwoodEngineering/ObiWanKomputer

> P.S. Don't forget to wire the solar panel directly to the system: then your satellite might still talk 50 years from now, like some ham radio satellites from the Cold War (OSCAR 7, I think).

Current ITU guidelines make it clear that we're not supposed to do this, so that transmissions from the satellite can actually be ended. We'll re-enter/burn up within

And perhaps more fittingly, that PC couldn't decode and play an MP3 in real time.
And by an order of magnitude or more, too!
Yes, but Doom is also very, very old.

I bought a Kodak camera in 2000 (640x480 resolution), and even that could run Doom. Way back when. Actually playable, with sound and everything.

Here's an even older one running it: https://m.youtube.com/watch?v=k-AnvqiKzjY