Libre-SoC 180nm Power ISA v3.0 ASIC Submitted to IMEC MPW (openpowerfoundation.org)
91 points by lkcl 1809 days ago
9 comments

For SW type people ...

GCC's impact was possible because it was (with GAS - the assembler) 100% feasible to have an open source toolchain. Yes more software was necessary for a complete system (linker, libc, etc), but GCC made it possible to build from the ground floor up.

Also, yes, the initial GCC was worse than any proprietary decent tool chain at the time, but it got better and better because each improvement built on all the earlier open sourced efforts.

Think about how hard Linux kernel development would have been if it had to rely on different proprietary tool chains for every target architecture (and possibly chip version).

Hardware definition languages (Verilog/VHDL, etc) enable high level chip design like high level programming languages, but making the physical chip requires a PDK (process design kit) that encodes how each critical silicon feature is built.

So a chip built for TSMC 28nm contains TSMC proprietary material and is essentially unportable. It can take several years to move a major chip from one foundry to another (or even a shrink at the same foundry), and the proprietary tool chains preclude a development process that can incrementally improve portability.

This announcement is a major step toward a similar foundation being available for silicon design. It is very important that it is a large complex chip, rather than just a research development vehicle.

[disclaimer - past life as OpenPOWER participant]

I've worked on big chips designed to be taped out to multiple (3) fabs - you have to either build your own libraries that have some minimum performance on all processes, or recompile with a new fab's libraries - my experience is that if you plan for it it's more a matter of a few months than years
you'll be fascinated to know that we picked a python-based (Object-Orientated) HDL - nmigen - for exactly this reason.

we've developed a dynamically SIMD-partitionable-maskable set of "base primitives" for example, so you set a "mask" and it automatically subdivides the 64-bit adder into two halves. but we didn't leave it there, we did shift, multiply, less-than, greater-than - everything.

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/iee... https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/iee...
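For readers who have not met partitioned arithmetic, here is a minimal software sketch of the idea. It is not the nmigen HDL linked above. It assumes 8-bit lanes and models the "mask" as a set of bit positions where the carry chain is cut; the name `partitioned_add` and its parameters are made up for illustration.

```python
def partitioned_add(a, b, breaks=frozenset(), width=64, lane=8):
    """Add two `width`-bit integers lane by lane.

    `breaks` is a set of bit positions (multiples of `lane`) at which
    the carry chain is cut, so the adder behaves as several narrower
    independent adders. This is a hypothetical software model only.
    """
    lane_mask = (1 << lane) - 1
    result = 0
    carry = 0
    for lo in range(0, width, lane):
        if lo in breaks:
            carry = 0  # partition boundary: drop the incoming carry
        s = ((a >> lo) & lane_mask) + ((b >> lo) & lane_mask) + carry
        result |= (s & lane_mask) << lo
        carry = s >> lane  # carry out of this lane (0 or 1)
    return result


# One 64-bit adder: the carry ripples from the low half into bit 32.
whole = partitioned_add(0xFFFFFFFF, 1)

# Two 32-bit adders: the carry out of the low half is discarded.
halves = partitioned_add(0xFFFFFFFF, 1, breaks={32})
```

With an empty `breaks` set this is an ordinary 64-bit add; with a break at bit 32 the same loop behaves as two independent 32-bit adds, which is the behaviour the comment describes for the hardware adder.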

can you imagine doing that in VHDL or Verilog? tens of engineers needed, or some sort of macro-auto-generated code (treating VHDL / Verilog as a machine-code compiler target).

the reason for doing this - planning it well in advance - is because we're doing Cray-style Vectors (Draft SVP64) with polymorphic element-width over-rides. yes, really. the "base" operation is 64-bit, but you can over-ride the source and destination operation width.

the reason why we're using our own Cell Library is actually down to transparency. we want customers to be able to compile the GDS-II files themselves, fully automated, no involvement from us, no manual intervention.

ironically, as an aside: Staf's Cells are 30% smaller (by area) than the Foundry equivalents.

Google has put a lot of effort in that direction. The first ever chips have already been produced that are fully open source, from the design tools to the complete tool chain needed to manufacture them.

There is a huge amount of great stuff going on in this area.

Tim Ansell - Skywater PDK: Fully open source manufacturable PDK for a 130nm process

https://www.youtube.com/watch?v=EczW2IWdnOM

interestingly, Libre-SOC and NLnet's funding pre-dates the google-sponsored Skywater 130nm process. also, because it's funded by NLnet we're not dependent on google, don't have to pass "conditions", and in particular were not forced to use OpenLane and were not limited to 48 pins controlled by a "Management Engine".

Staf actually developed actual IOpad Cells (from scratch), actual Standard Cells and a 4k SRAM block: we did not use the NDA'd TSMC Cell Libraries, here.

if we had used Skywater 130nm we would have been forced to ditch LIP6.fr (i cannot express enough how hard Jean-Paul Chaput has worked on coriolis2 for the past 18 months), we would not have been able to test the IOpads that Staf developed... yeah.

bottom line is we used a complete independent VLSI toolchain - fully automated - that has nothing to do with the USA or DARPA Military funding - and was developed with European expertise.

> and in particular were not forced to use OpenLane and were not limited to 48 pins controlled by a "Management Engine".

That's because they used TSMC, not SkyWater.

I think you're deliberately creating confusion here.

Also, as the webpage states, they signed TSMC's NDA:

> LIP6 were able to create the GDS-II tape-out under NDA

Sure, if you sign the NDAs, you can use whatever toolflow you want.

Look, I don't mean to in any way denigrate your technical achievement here, and I have no beef with your project. But the absence of no-NDA foundry access is a huge, massive obstacle to a truly public and free open-source ecosystem, and lately there have been a lot of people and organizations papering over that problem and bamboozling software folk who aren't aware of the issue and its details. Hiding the problem isn't going to get it fixed.

> Also, as your webpage states, you signed TSMC's NDA:

FALSE. again. i do not work for LIP6. i do not work for Chips4Makers. i am an independent *LIBRE* Developer. i have NEVVERRRR signed a Foundry NDA and, having a background involving security analysis and Reverse-Engineering, it would be suicidally and monumentally stupid and counter-productive for me, personally, to do so.

please try to not conflate matters (twice in succession) that you haven't checked or read properly. the best thing to do is to ask questions, such as:

"You're a Libre Project. that has significant implications that everything is entirely Libre. I notice however that you say that someone signed a Foundry NDA? what impact did this have for you? did it stop you from releasing any source code as per obligations of LIBRE Licenses?"

and then i can answer positively and in a friendly way rather than having to publicly waste both my time and that of readers in first unpicking the mistakes, embarrassing you in the process (which risks a public confrontation that annoys everybody even more), and it all goes to hell pretty quickly after that.

answering the question above that you didn't ask: as you know there are about five layers of NDAs in the Silicon Industry.

we've managed to bust through three of those, and so have managed - as a LIBRE Team - to fulfil our obligations both to our funding body, NLnet, under their Privacy and Enhanced Trust Programme, and to Libre/Open Hardware developers by releasing all HDL under LGPLv3 Licenses

     https://git.libre-soc.org
and using Libre-Licensed VLSI tools

and using Libre-Licensed Cell Libraries

now, the TEAM THAT DEVELOPED the VLSI tool - signed a TSMC NDA.

      NOBODY ON THE LIBRE-SOC TEAM SIGNED THAT NDA.
also, Chips4Makers - the developers of FlexLib - signed a TSMC NDA

      CHIPS4MAKERS != Libre-SOC
we are three separate and INDEPENDENT teams, working together, to tackle an insane situation, at different levels. i'll say it again:

      LIBRE-SOC HAS NOT SIGNED AAAANNNYYYYY FOUNDRY NDAs.
are we clear about that, now?

there happens also to be another team, Libre-Silicon, also funded by NLnet, who are developing an actual Libre VLSI process and actually developing a mini home-grown Fab.

then there is another NLnet-sponsored project, working with the Libre Silicon team, to develop another Libre-Licensed Standard Cell Library, that is targeted at Libre-Silicon's PDK (when it's available)

  https://nlnet.nl/project/LibreSiliconStandardCellLibrary/
however neither of these are ready, so we went with the pragmatic route, after exhausting all other options: the parallel track.
These details are fascinating and non-obvious. Every little tool or part you have to use to work with even 15-year old chip fab processes is under NDA.

It's hard to come up with a good analogy ... it's like you need to write your own serial driver for your new open-source programming language to do any I/O, because you can't call any libraries or OS syscalls because they're all NDA, even on a 15+ year old computer/OS.

yehyeh, or you bought a winmodem over 15 years ago, someone told you "hey you have to upgrade to windows 10", it downloads over your 56k Dialup winmodem, reboots... and... no drivers. yes this really happens: ThinkPenguin stock TTYACM USB modems and their biggest customers are Rural people in the USA who are too far out to get broadband!

LIP6 does actually have a fully NDA-free silicon-proven Cell Library, called nsxlib, it's been used in 360nm and 180nm, the 180nm was done by a Japanese University. i think i may have mentioned this already, it's a small town with a 2(?) micron foundry, they make it available to people anywhere in the world entirely for free, it's for training the employees of the town, because it's so old and basic it's hard to mess it up. so they want people to submit designs that the trainees can learn how to fab, before they move on to the more expensive equipment.

but, really, use Chips4Makers, he has 360nm available, EUR 1750 for 20 MPW chips in QFP, i believe.

A fully open source chip, from Verilog to Fabrication is cool!

It may be 180nm (1999-era technology), but that's still hugely important. The world of semiconductor design is incredibly closed source and secretive.

Note that Google has open sourced a full set of design rules for a 130nm process (codenamed SkyWater), making fully open chip designs also possible for this finer process. 130nm was current in the very early 2000s, so it should be possible to achieve interesting results with it.
Didn't they require some closed logic between your stuff and all of the I/O?
Yes, and they still do. They've got a Management-Engine type layer that you are forbidden to remove or modify in any way, even if you pay the full $10k and do not accept any subsidies.

So many people asked about this that the foundry had to make a FAQ about it:

https://www.skywatertechnology.com/ufaqs/can-i-customize-the...

This impacts the fab itself, but the design rules can still be usable elsewhere since they've been released openly.
Use them where, exactly?

Your comment glosses over a ton of critical details.

The most important of them being that even on such an old technology generation (180nm-110nm) no two fabs are so compatible that you can send a GDS designed for one of them to the other unless (a) one of them licensed their process from the other, like IBM/GloFo/Samsung back in the 2010s or (b) you planned for this in advance and designed a custom "least common denominator" process (like MOSIS SCMOS) to target which means making very large performance sacrifices. The (b) approach is much harder than it looks; I know of no examples other than MOSIS SCMOS, and in spite of being the pioneer experts at doing this they had a hard time at 180nm and failed on the following (90nm) generation.
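A rough sketch of why a "least common denominator" rule set costs performance: for every minimum-dimension rule you have to take the most conservative value any target fab demands. The fab names, rule names and numbers below are entirely hypothetical, and real design-rule decks contain far more rules than simple minimums.

```python
def least_common_denominator(rule_sets):
    """Combine per-fab minimum-dimension rules (in nm) into one portable set.

    Only rules present for every fab are kept, and each takes the largest
    (most conservative) value across fabs. Illustrative model only.
    """
    common = set.intersection(*(set(rules) for rules in rule_sets.values()))
    return {
        name: max(rules[name] for rules in rule_sets.values())
        for name in sorted(common)
    }


# Hypothetical numbers, not taken from any real PDK.
fabs = {
    "fab_a": {"min_poly_width": 180, "min_metal1_width": 230, "min_metal1_spacing": 230},
    "fab_b": {"min_poly_width": 180, "min_metal1_width": 240, "min_metal1_spacing": 280},
}
portable = least_common_denominator(fabs)
```

Every rule in `portable` is at least as loose as the tightest fab allows, so a layout that satisfies it is larger (and slower) than one tuned to a single process.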

The other, lesser, problem is that no foundry will let you even submit a GDS without signing their NDA. Even if you swear to them that you don't need their design rules for some reason. They don't care. NDA or no chips, not up for discussion. In fact, technically SkyWater still works this way -- to avoid the NDA you must submit through eFabless, not directly to the foundry (maybe this will change someday) and eFabless signed their NDA, then (obviously) negotiated a waiver. So saying "you can work around this problem that the only no-NDA foundry has by just going to another foundry" doesn't work, because there are no other no-NDA foundries, nor are there any on the horizon.

What about the tools and processes to manufacture this? Are those open source or broadly available? For instance, is it possible to have a small scale "community" fab for 1999-era chip technology?
yes, Chips4Makers http://chips4makers.io will help anyone who wants to do a 360nm ASIC, the costs are ridiculously cheap. like... EUR 1750 for 20 MPW samples, something mad, who would have ever thought it.

Staf will also "protect" you from the Foundry NDAs. you develop with a "symbolic" version of the Cell Library, he runs the "Real" one and sends it to IMEC on your behalf. here's Staf's "symbolic" Cell Library, it's based on FreePDK45 https://gitlab.com/Chips4Makers/c4m-pdk-freepdk45/-/releases

Coriolis2 - http://coriolis.lip6.fr/ - is entirely Libre-Licensed. it's fully automated, you don't have to do any "hand-editing", it has unit tests (so you have demos you can look at and also check you installed everything right). we have some automated setup scripts for it if you're interested: https://git.libre-soc.org/?p=dev-env-setup.git;a=blob;f=cori...

LIP6 have a Silicon-proven ENTIRELY Libre Cell Library called nsxlib, if you really want to go that route. it's Silicon-proven in 360nm and 180nm.

Also, LIP6 have a relationship with a small town in Japan, they have 2 micron fab which is used for "training" of employees of the town. submission for that is entirely free. i know this exists but have not used it, and don't know more details, but i can probably put you in touch with Sorbonne University if you're serious.

and if you really really want to do "at home" stuff, Libre-Silicon is developing a 2in wafer fab, using Ultra-Violet DLPs and high-accuracy stepper motors, that you'll be able to buy and operate from your garage or lab. think "3D printing", i think they're aiming for 2000 nm or something (2 micron)? really big, but proves the concept.

I'm wondering, what's the difference between the 'real' cell library and 'symbolic' cell library?
they both have the same connections on the outside (they both have the same "netlist") and you can use the exact same SPICE model (a transistor-level simulation) but usually they're entirely empty inside.

so the VLSI tool can still Place-and-Route them, you can still create GDS-II Files, but if you send them to the Foundry, the Foundry will look at you like you have two heads or something and won't talk to you again.

that said: some Foundries have their own Symbolic ("ghost") Cell Libraries, which they send you. you run the VLSI tools with those, then when they get the GDS-II files they SUBSTITUTE the REAL cells for the ghost Cells... and then put that into the Fab.
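The symbolic/real relationship can be pictured as a data-structure sketch. Everything here is a made-up model for illustration: the `Cell` class, its fields, and the assumption that the two variants share a footprint as well as pins. It is not how Coriolis2 or any foundry flow actually represents cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    pins: tuple          # external connections, identical in both variants
    width: int           # footprint in placement-grid units (assumed equal)
    height: int
    layout: tuple = ()   # internal geometry; empty for a symbolic cell


def substitute(placed_cells, real_library):
    """Swap each placed symbolic cell for the real cell of the same name.

    Refuses the swap if the outside of the cell (pins, footprint) differs,
    because the routing done against the symbolic cell would no longer fit.
    """
    result = []
    for cell in placed_cells:
        real = real_library[cell.name]
        if (real.pins, real.width, real.height) != (cell.pins, cell.width, cell.height):
            raise ValueError(f"{cell.name}: real cell does not match symbolic cell")
        result.append(real)
    return result


ghost_nand = Cell("nand2", ("a", "b", "y", "vdd", "vss"), 3, 10)
real_nand = Cell("nand2", ("a", "b", "y", "vdd", "vss"), 3, 10,
                 layout=("placeholder geometry",))
final = substitute([ghost_nand, ghost_nand], {"nand2": real_nand})
```

Place-and-route only ever needs the outside of the cell, which is why the layout can be produced without seeing the NDA'd contents.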

they do this because they're so paranoid they don't even want you to know what's inside their "Symbolic" (ghost) Cells.

Foundry Symbolic Cells are invariably available only under NDA.

sigh.

which begs the question, how the hell is any information going to leak out from a completely empty Cell, and unfortunately the answer is: quite a lot. number of layers, what the "stack" is of those layers, distance between tracks, width of tracks, and so on, and the PDK also has to include via sizes and so on anyway.

this starts to give you some idea of the levels of insanity we had to workaround, to meet our Audit and Transparency objectives.

bottom line is until we can bust through these final layers of NDAs, customers who really want to verify the complete GDS-II Files are also going to have to sign a Foundry NDA.

Legalese
I've been keeping an eye out for anything like this. There's Sam Zeloof, doing one-offs in his home lab [1], and there's Libre Silicon [2] putting together their fab too, but the info there's more scarce.

Neither one has published an easily-replicable process, meaning I can't really repeat what they've done. IMO what this space needs is an open source build plan/BoM, with a cottage industry of people selling DiY and pre-assembled kits. Once the 3d printing community got there, that's when things took off -- before kits or at least build guides with proper BoMs, it was just disparate individuals doing their own thing.

Connect me with anyone who's got a good approach to building some sort of replicable open-source fab though, and I'll quit my job and join the project full-time (that's not a joke: I'm serious).

[1] http://sam.zeloof.xyz/category/semiconductor/ [2] https://libresilicon.com/

Hey, I admire your spirit and enthusiasm.

However, one thing to keep in mind is that below 500nm a lot of the chemicals are extremely toxic and not the kind of thing that garage hackers are qualified to handle in an environmentally safe manner.

Arsenic, phosphine gas, hydrogen fluoride, nasty solvents. I build a lot of crazy stuff in my shop, but I don't even trust myself to dispose of these correctly. If makers like myself get involved in this we're going to end up with a lot of new superfund sites. In residential neighborhoods.

And then of course there's the ion implanter, which none of the fab employees want to spend much time around...

I’m not hooked on building desktop CPUs at home or anything. It could be 3 um, 1 MHz and I’d be happy. It doesn’t even have to be semiconductors. We had vacuum tubes and core memory before transistors. The modern fab is optimized for density, perf, and power. Prioritize ease of fabrication and maybe you get a process or substrate that looks radically different from today’s commercial fabs.

Or maybe we adopt a 500 nm node and stop there :-)

Or any other options for "small" batch sizes?
we use nmigen (python-based OO HDL) which through yosys generates verilog as an automatic step.

180nm is still by far and above the world's most heavily-used geometry, because the price-performance (bang per buck, however you want to put it) is so extremely high.

an 8in wafer is USD 600 and that's extremely low. any power MOSFET, power transistor, diode or other high current semiconductor you absolutely don't want small "things" (detailed tiny tracks) you want MASSIVE ones.

why on earth would you waste money on tiny features, it's like using the latest 0.15mm 3D printing nozzles to 3D print a massive 300x300x300 mm cube that's going to be used for nothing more than a foot-stool. you want a 1.2mm nozzle for that!

then any processor below 300 mhz, you can get away with 180nm. need only an 8 mhz 8-bit or 4-bit washing machine or microwave processor, or something to go in a cheap digital watch? 180nm is your best bet: you'll get tens of thousands of < 1 mm^2 ASICs on a single wafer which means you're well below $0.05 per individual die.
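The per-die figure can be sanity-checked with back-of-envelope arithmetic. This sketch uses a commonly quoted gross-dies-per-wafer approximation and takes the USD 600 wafer price and 1 mm^2 die size from the comment; it assumes a nominal 200 mm wafer and ignores yield, scribe lines, test and packaging, so treat the result as an order-of-magnitude estimate only.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate gross die count: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    whole = math.pi * radius * radius / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(whole - edge_loss)


WAFER_COST_USD = 600      # figure quoted in the comment
WAFER_DIAMETER_MM = 200   # nominal 8in wafer (assumption)
DIE_AREA_MM2 = 1.0        # upper bound quoted in the comment

dies = gross_dies_per_wafer(WAFER_DIAMETER_MM, DIE_AREA_MM2)
cost_per_die = WAFER_COST_USD / dies
print(f"{dies} gross dies, about USD {cost_per_die:.3f} per die before yield")
```

Under those assumptions the count comes out at roughly 31,000 dies and about two cents per die, consistent with the "well below $0.05" claim.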

a 28nm 8in wafer would be about... 10x that cost, you'd end up with exactly the same transistor (or 8 mhz 8-bit processor), why would you pay more money for what you don't need?

btw the real reason why there's a chip shortage: the Automotive industry, who are cheap bar-stewards, wanted even lower than $600 per 8in wafer so they went with 360nm and cruder geometry. that's equipment that's even older than the 1990s, like 40+ years in some cases.

so then the stupidity hit, and they stopped ordering. then 18 months later they phone up these old Foundries and say, "ok, we're ready to start ordering again". and the Foundries say, "oh, we switched off the equipment, and it cooled down and got damaged (just like that massive Electric plant in S. Australia that was de-commissioned, the concrete cracked when they switched it off, and it's completely unsafe to start up again). you were our only customer for the past 30 years, so we scrapped it all. you'll have to now compete with the consumer-grade smaller geometry Fabs like everyone else".

which is something that none of the Automotive companies have told their Governments, because then they can't go crying "boo hoo hoo, we can't make chips any more at the price that we demand, waaa, waaaa, i wannnt myyy monneeeeey"

and now of course they can't use the old masks, because those were designed for 360nm and cruder geometries, they have to redesign the entire ASIC for 180nm and that's why you can't now get onto 180nm and other MPW Programmes because the frickin Automotive Industry has jammed them all to hell.

This is a very important step. I don't understand how this is not on the first page. Maybe a more click-baity title is needed?
Who is this important for? Is there a lot of software still being developed for POWER? It seems niche to me, but maybe I'm the one in a niche.
The POWER/PowerPC ISA is still widely used in safety-critical avionics, where a mature tool-chain exists for supporting DO-178 objectives.

In my opinion, an area of interest going forward into the next decade of more safety-critical software written by smaller and smaller orgs (e.g. eVTOL companies, sensor companies, etc) is continuing to push forward which objectives can be accomplished by formal means instead of primarily through testing.

An NXP or IBM processor might be great, and might be mature, and might be very well tested -- but I, as a safety-critical software developer, have little way of demonstrating that to certification authorities. The availability of open-source processor designs and, in the future, traceable and accountable conversion from those HDL designs to RTL, to masks, and then to silicon, gives a path to showing that portions of a processor are correct-by-design, and thus a path to the goal of showing that my machine-code-as-authored(-by-an-assembler) and machine-code-as-executed(-by-a-processor) semantics match.

> The POWER/PowerPC ISA is still widely used in safety-critical avionics

and in the Mars Rover, which is a radiation-hardened 133mhz 32-bit Power ISA system.

DO-178 objectives? You mean the same one used in 737 Max?
I'm not familiar with whether the 737 Max development used DO-178B or DO-178C; the latter is a successor to the former, but frames the development process significantly differently.

Any process can be used well or poorly, and DO-178C isn't really a process, it's a set of objectives that a process must accomplish. When used in good faith, I believe it can lead to software of higher quality than almost any other approach (although, to be fair, at higher software development cost than almost any other approach). That doesn't mean that chanting the document name and using hand-me-down rituals is sufficient to achieve high quality software, of course :-).

Many hyperscale server setups use POWER8/POWER9 CPUs. 4 logical processes per core (and 8 with the upcoming 15-core POWER10 configurations) are pretty useful when measuring perf-per-watt.

The Talos is currently the only fully libre computer available for high-perf computing, and it uses POWER9 CPUs. If you want a fully free CPU, your choices are either very dated CPUs or POWER.

Many distros (inc. Debian, and most source-based ones) support ppc64/POWER officially quite well and go out of their way to ensure a high degree of portability.

Actually, SMT8 P8 and P9 is documented, though it seems rare. Our HPC systems are only SMT4, anyhow.

Yes, you can just install most of at least Debian, Fedora, RHEL, at least, though it needs an "alt" kernel on RHEL7 P9. There are a few things which haven't been ported, mainly due to assembler, I guess. (PRoot and DMTCP are two I know.) Even x86 SIMD intrinsics will largely work, if not necessarily very efficiently.

By Libre computer do you mean the entire system, hardware and firmware?
Hardware: not necessarily. Firmware: yes.
we'll be going as far as is practical and pragmatic with the actual hardware, and still actually meet user-expectations. firmware, bootloader, OS, drivers, BIOS: definitely.
AFAIK this is a libre soc developed using libre software tools, some of which were developed by the group members themselves, free from royalties and independent from any for-profit institution. This is probably "librier" than RISCV.

The fact that the POWER architecture may be niche is not a problem since so much software can be compiled for it. See the Talos workstations: https://www.raptorcs.com/TALOSII/ and the powerpc notebook: https://www.powerpc-notebook.org/en/

For people who are willing to use niche hardware for more control on what is running, this seems like a very important step.

RISCV is an ISA, not a core design much less a complete SoC. The closest comparison would be something like Rocket, or BOOM.
IMO: the underlying architecture is mostly relevant to kernel/compiler authors and people doing aggressive optimization. For most application devs it's about as irrelevant as you can get (unless your language has a very hard to port compiler cough rust.)

What's good about this is that the source is available and can be verified to some degree against the hardware (by decapping it.) That puts a lot of constraints on what kinds of secret back doors people can build that we didn't have before.

Rust supports powerpc64le-unknown-linux-gnu, it is in fact what we used to test a lot of POWER9's instructions to replicate the exact results that POWER9 gives, since the ISA spec doesn't specify the results for a lot of cases.

https://git.libre-soc.org/?p=power-instruction-analyzer.git;...

I was talking about new architectures in general, not just powerpc.
ultimately what we'd like to see is entirely NDA-free PDKs even for 12nm and below, and you can run the VLSI tools and generate the EXACT GDS-II yourself, then yes, de-cap the processor and do a digital comparison.

before you even get to that stage, you run the Formal Correctness Proofs and unit tests on the HDL, so that YOU have confidence that the HDL which you're about to generate the GDS-II files from is actually correct and does the damn job.

example of a Formal Correctness Proof for the fixed arithmetic Power ISA pipeline:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu...

runs with symbiyosys, so you end up running SAT Solvers like yices2 and z3.
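To give software people a feel for what such a proof establishes, here is a toy stand-in: a gate-level ripple-carry adder checked against a one-line specification for every possible input. This is exhaustive enumeration in plain Python, not symbiyosys; a formal tool proves the same kind of property symbolically with a solver, which is what makes it feasible at 64 bits. All names are invented for the example.

```python
from itertools import product


def full_adder(a, b, cin):
    """One-bit full adder built from XOR/AND/OR, as it would be in gates."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a, b, width):
    """The 'implementation': a chain of full adders."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result


def check_adder(width):
    """Return a counterexample (a, b), or None if the implementation
    matches the specification (a + b) mod 2**width for all inputs."""
    for a, b in product(range(1 << width), repeat=2):
        if ripple_add(a, b, width) != (a + b) % (1 << width):
            return (a, b)
    return None


counterexample = check_adder(4)
```

A `None` result means no input pair violates the specification. When a property fails, a counterexample is returned in the same spirit as the failing trace a formal tool reports.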

basically we absolutely do not want to be the people you come to and say, "can we trust your ASIC?" and like Intel they lie to you and say "of course!", we want to say, "don't bloody well ask us, go run the damn tools yourself! oh, btw, if you want help with that we charge USD 5k per hour"

Many things were ported to power over the last ~3 decades, and that code is still valuable today.
Commenting on articles early in their life weighs them down significantly, if you want something on the front page you should absolutely not comment on it until it gets there.
Thanks for the advice. Didn't know that. Actually, I'm answering only because it finally got to the first page.

Off topic: where did you get this rule?

That seems counterintuitive. Do you have any data to support this?
Because it isn't really open source.

https://news.ycombinator.com/item?id=27777223

> Symbolic (ghost) versions of FlexLib allowed Libre-SOC developers to not have to sign a Foundry NDA during the development of the ASIC Layout

In other words, this chip isn't even remotely open-source.

What they sent to the foundry isn't the "ghost cells" (which don't have transistors in them and therefore don't work).

This fails the most basic requirements of being open source.

HDL source code: https://git.libre-soc.org/?p=soc.git;a=summary

Coriolis2 source code: http://coriolis.lip6.fr/

Chips4Makers FlexLib Cell Library based on FreePDK45: https://gitlab.com/Chips4Makers/c4m-pdk-freepdk45/-/releases

Automated Layout scripts for generation of GDS-II Files: https://git.libre-soc.org/?p=soclayout.git;a=summary

please do try to get your facts right and not mislead people by making false claims, eh?

What is the problem if they could be translated to a working chip? A C program contains no instructions the machine can use and yet you can compile an open source program with a closed source compiler.
we used an entirely Libre-licensed VLSI "compiler", which takes HDL as input and spits out fully-completed GDS-II Files.

the problem with this particular irate individual is that he's assumed that because TSMC's DRC rules are only accessible under NDA that automatically absof*** everything was also "fake open source".

idiot.

sigh.

clearly didn't read the article.

whilst both Staf Verhaegen and LIP6.fr signed the TSMC Foundry NDA, we in the Libre-SOC team did not. we therefore worked entirely in the Libre world, honoured our commitment to full transparency, whilst Staf and Jean-Paul and the rest of the team from LIP6 worked extremely hard "in parallel".

the ASIC can therefore be compiled with three different Cell Libraries:

* LIP6.fr's 180nm "nsxlib" - this is a silicon-proven 180nm Cell Library
* Staf's FreePDK45 "symbolic" cell library using FlexLib (as the name says, it uses the Academic FreePDK45 DRC)
* the NDA'd TSMC 180nm "real" variant of Staf's FlexLib

i was therefore able to "prepare" work for Jean-Paul, via the parallel track, commit it to the PUBLIC REPOSITORY (the one that's open, that our resident idiot didn't bother to check existed or even ask where it is), which saved Jean-Paul time whilst he focussed on fixing issues in coriolis2.

it was a LOT of work.

I can't wait to see the Vulkan implementation for this. Apparently it should be somewhat hardware-accelerated due to the vector capabilities of the core?
yes, so the "normal" way that GPUs work is: the architecture and the ISA are so staggeringly optimised they're completely incompatible and incapable of running standard (general-purpose) workloads. no MMU, vast wide SIMD engines, massive numbers of parallel memory interfaces that run really slowly but can handle (when added up) vast bandwidth far in excess of "normal" processor memory, and so on.

on top of that, because it's an entirely separate processor, to get it to do anything you actually have to have a Remote Procedure Call system, operating over Shared Memory!

oink.

so the process for running a GPU shader binary is as follows:

step 1: fire up a compiler (in userspace)
step 2: compiler takes the shader IR and turns it into GPU assembler
step 3: the userspace program (game, blender, whatever) triggers the linux kernel (or windows kernel) to upload that GPU binary to the GPU
step 4: the kernel copies that GPU binary over Shared Memory Bus (usually PCIe)
step 5: now we unwind back to userspace (with a context-switch) and want to actually run something (OpenGL call)
step 6: the OpenGL call (or Vulkan) gets some function call parameters and some data
step 7: the userspace library (MESA) "packs" (marshalls) those function call parameters into serialised data
step 8: the userspace library triggers the linux (windows) kernel to "upload" the serialised function call parameters - again over Shared Memory Bus
step 9: the kernel waits for that to happen
step 10: the userspace proceeds (after a context-switch) and waits for notification that the function call has completed...

... i'm not going to bother filling in the rest of the details, you get the general idea that this is completely insane and goes a long way towards explaining why GPU Cards are so expensive and why it takes YEARS to reverse-engineer GPU drivers.

in the Libre-SOC architecture - which is termed a "Hybrid" one, the following happens:

step 1: the compiler is fired up (in userspace, just like above)
step 2: compiler takes the shader IR and turns it into *NATIVE* (Power ISA with Cray-style Vectors and some custom opcodes) assembler
step 3: userspace program JIT EXECUTES THAT BINARY NATIVELY RIGHT THERE RIGHT THEN

done.

did you see any kernel context-switches in that simple 3-step process? that's because there aren't any needed.
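The difference between the two flows can be caricatured in a few lines. This is only a toy model: `shade` is a hypothetical stand-in for a compiled shader, a `bytes` copy stands in for the trip across the shared memory bus, and no real driver, kernel or MESA interface is being imitated.

```python
import struct


def shade(x, y, scale):
    """Hypothetical stand-in for a compiled shader function."""
    return (x + y) * scale


def call_via_rpc(x, y, scale):
    """Discrete-GPU style: marshal, copy across the 'bus', unmarshal, run, send back."""
    packet = struct.pack("<3f", x, y, scale)      # userspace library packs the arguments
    on_device = bytes(packet)                     # kernel copies them to the other processor
    args = struct.unpack("<3f", on_device)        # device side unpacks them
    reply = struct.pack("<f", shade(*args))       # result is serialised for the way back
    return struct.unpack("<f", reply)[0]


def call_native(x, y, scale):
    """Hybrid style: the shader is ordinary native code, so just call it."""
    return shade(x, y, scale)


rpc_result = call_via_rpc(1.5, 2.5, 2.0)
native_result = call_native(1.5, 2.5, 2.0)
```

Both paths compute the same value; the first simply does a lot of packing and copying on either side of the call, and in a real system each copy also involves a kernel transition.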

now, the thing is - answering your question a bit more - that "just having vector capabilities" is nowhere near enough. the lesson has been learned from Nyuzi, Larrabee, and others: if you simply create a high-performance general-purpose Vector ISA, you have successfully created something that absolutely sucks at GPU workloads: about TWENTY FIVE PERCENT (one quarter) of the capability of a modern GPU for the same power consumption.

therefore, you need to add SIN, COS, ATAN2, LOG2, and other opcodes, but you need to add them with "reduced accuracy" (like, only 12 bit or so) because that's all that's needed for 3D.
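As a software illustration of what "reduced accuracy" can mean, here is a sine approximation for one quadrant using a small lookup table with linear interpolation, with the result rounded to 12 fractional bits. The table size, the rounding step and all names are assumptions chosen for the example; nothing here describes the actual Libre-SOC hardware.

```python
import math

TABLE_SIZE = 64                      # assumed table size, chosen for the example
FRAC_BITS = 12                       # "only 12 bit or so"
_STEP = (math.pi / 2) / TABLE_SIZE
_TABLE = [math.sin(i * _STEP) for i in range(TABLE_SIZE + 1)]


def approx_sin_quadrant(x):
    """Approximate sin(x) for 0 <= x <= pi/2 to about 12 bits."""
    pos = x / _STEP
    i = min(int(pos), TABLE_SIZE - 1)     # table segment containing x
    frac = pos - i                        # position within the segment
    y = _TABLE[i] + (_TABLE[i + 1] - _TABLE[i]) * frac
    return round(y * (1 << FRAC_BITS)) / (1 << FRAC_BITS)


# Worst-case error over a fine sweep of the quadrant.
worst_error = max(
    abs(approx_sin_quadrant(k * (math.pi / 2) / 10000) - math.sin(k * (math.pi / 2) / 10000))
    for k in range(10001)
)
```

A 65-entry table plus one multiply is far cheaper than a full-precision sine, and the error stays below 2^-12, which is the trade the comment is describing.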

you need to add Texture caches, and Texture interpolation opcodes (takes 4 pixels @ 00 01 10 11 square coordinates, plus two FP XY numbers between 0.0 and 1.0, and interpolates the pixels in 2D).
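to make the texture interpolation description concrete, here is a minimal software sketch in plain python of what such an opcode computes. it is not the Libre-SOC opcode or its HDL. the function names are made up for illustration, and the corner naming is an assumption: the first digit is taken as the row (y) and the second as the column (x).

```python
def lerp(a, b, t):
    """linear interpolation between a and b for t in [0.0, 1.0]."""
    return a + (b - a) * t


def bilinear(p00, p01, p10, p11, x, y):
    """blend four corner pixels (tuples of channel values) in 2D.

    assumption: pRC means row R, column C, so p00/p01 form the top edge
    and p10/p11 the bottom edge. x and y are fractions in [0.0, 1.0].
    """
    # interpolate along x on the top and bottom edges
    top = tuple(lerp(a, b, x) for a, b in zip(p00, p01))
    bottom = tuple(lerp(a, b, x) for a, b in zip(p10, p11))
    # then interpolate between the two edges along y
    return tuple(lerp(t, b, y) for t, b in zip(top, bottom))


if __name__ == "__main__":
    black, red = (0, 0, 0), (255, 0, 0)
    green, yellow = (0, 255, 0), (255, 255, 0)
    print(bilinear(black, red, green, yellow, 0.5, 0.5))
```

three multiply-adds per channel per sample is cheap in isolation, but it runs for every textured pixel, which is why it ends up as a dedicated opcode rather than a software loop.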

you need to add YUV2RGB and other pixel-format-conversion opcodes that are in the Vulkan Specification...
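as a rough idea of the arithmetic one of those pixel-format opcodes replaces, here is a sketch of a single YCbCr to RGB conversion in python. the choice of full-range BT.601 (JFIF-style) coefficients is an assumption made for the example: Vulkan defines several conversion models and ranges, and nothing here says which ones Libre-SOC will implement. function names are hypothetical.

```python
def clamp8(v):
    """round to the nearest integer and clamp into the 8-bit range 0..255."""
    return max(0, min(255, int(round(v))))


def yuv_to_rgb(y, cb, cr):
    """convert one 8-bit full-range BT.601 YCbCr pixel to 8-bit RGB."""
    d = cb - 128  # blue-difference chroma, centred on zero
    e = cr - 128  # red-difference chroma, centred on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)


if __name__ == "__main__":
    print(yuv_to_rgb(128, 128, 128))  # mid grey stays mid grey
```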

and many more.

but, we first had to actually, like, y'know, have a core that can actually execute instructions at all? :) and that's what this first Test ASIC is: a first step.

Awesome job. I tried to make a simple GPU in chisel w/ hardfloat. I also came to the conclusion that Larrabee was a joke and dedicated triangle interpolation hardware was necessary, but I didn't consider the half-float(?) or caches or other additions you had to make.
thx phndrenad2. funny i just searched "chisel gpu" and found two: https://github.com/jbush001/ChiselGPU https://github.com/Chlorophytus/broccoli

half-float we'd like to do by using a dynamic SIMD-aware 64-bit ALU that has auto-partitioning. we do however already have an actual FP16 implementation https://git.libre-soc.org/?p=ieee754fpu.git;a=tree;f=src/iee...

or more to the point, one that is compile-time configurable with one parameter (bit-width), so the same HDL does FP16, FP32 and FP64. i'd like to make that dynamically-SIMD-configurable but it'll take some base work in nmigen to do without massive code-explosions.
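the "one parameter" idea can be illustrated in software. the sketch below is plain python, not the nmigen HDL from the ieee754fpu repository, and the names are invented for the example. it decodes an IEEE 754 bit pattern where the only thing that changes between FP16, FP32 and FP64 is the width passed in.

```python
# (exponent bits, mantissa bits) for each IEEE 754 binary format
FORMATS = {16: (5, 10), 32: (8, 23), 64: (11, 52)}


def decode(bits, width):
    """decode an IEEE 754 bit pattern of the given width into a python float."""
    exp_bits, man_bits = FORMATS[width]
    bias = (1 << (exp_bits - 1)) - 1
    sign = (bits >> (width - 1)) & 1
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    if exp == (1 << exp_bits) - 1:
        # all-ones exponent: infinity or NaN
        value = float("inf") if man == 0 else float("nan")
    elif exp == 0:
        # zero or subnormal: no implicit leading one
        value = man * 2.0 ** (1 - bias - man_bits)
    else:
        # normal number: implicit leading one
        value = (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
    return -value if sign else value


if __name__ == "__main__":
    print(decode(0x3C00, 16), decode(0x3F800000, 32), decode(0x3FF0000000000000, 64))
```

everything else (bias, field masks, special cases) is derived from the width, which is the same trick that lets one piece of HDL cover all three formats.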

Interesting as this is, I'll look forward to version two, to see how the vector processing works.
you can get a pretty good idea right now, the simulator is functional and the unit tests include explanations in english:

https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/...

i'm currently in the middle of a rabbit-hole exploration of being able to do in-place RADIX-2 FFT, DCT and DFT butterflies, the target is a general purpose function to cover each of those, in around 25 Vector instructions.

not 2,000 optimised loop-unrolled instructions specifically crafted for RADIX-8, another for RADIX-16, another for RADIX-32 ..... RADIX-4096 (as is the case in ffmpeg): 25 instructions FOR ANY 2^N FFT.
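for anyone who hasn't seen one, this is roughly what a generic in-place radix-2 FFT looks like in plain python. it is the textbook iterative Cooley-Tukey (decimation-in-time) form, not SVP64 assembler and not the Libre-SOC implementation, and the function name is just for illustration. the point is only that one set of nested loops covers every 2^N length, with no per-size unrolling.

```python
import cmath


def fft_inplace(a):
    """in-place radix-2 FFT of a list of complex numbers; len(a) must be 2^N."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    # bit-reversal permutation so the butterflies can run in place
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) passes of butterflies, doubling the block size each pass
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)  # twiddle factor increment
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(start, start + half):
                t = w * a[k + half]
                a[k + half] = a[k] - t
                a[k] = a[k] + t
                w *= step
        size *= 2
    return a


if __name__ == "__main__":
    print(fft_inplace([1 + 0j, 0j, 0j, 0j]))
```

the inner butterfly (one complex multiply, one add, one subtract) is the part that a Vector ISA would run across many elements at once.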

btw if you're interested in "real-world" SVP64 Vector Assembler we have the beginnings of an ffmpeg MP3 CODEC inner loop:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=medi...

that's under 100 instructions, more than 4x less assembler for the same job in PPC64. and 6.5 times less assembler than ffmpeg's optimised x86 apply_window_float.S

you will no doubt be aware of the huge power savings that brings due to reduced L1 cache usage.

I didn't see any specs for this SoC in the article, did I miss it?
no, it's pretty basic, and implicit: it's the (newly-created) "Scalar Fixed-Point Compliancy Subset" - i added a bit to the wikipedia page last month about them https://en.wikipedia.org/wiki/Power_ISA#Compliancy

it's 64-bit, LE/BE, and it's implementing a "Finite State Machine" (similar technique to picorv32, if you know that design). this is because we wanted to keep it REALLY basic, and also very clear as a Reference Design, with none of the "optimised pipelined decoders and issuers" that you normally find, which make it really, really difficult to see what the hell is going on.

bear in mind this includes SVP64: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple...

if you go back several revisions, the non-Vectorised version is like... 400 lines?

What does this mean to noobs like me?
Here are a few implications:

* In a few years (maybe 5?), it might be possible to build a computer that you can trust has no intentional back doors in the CPU, but is modern enough to run software from within the last decade.

* If this catches on, and is used by enough people, economies of scale might kick in, and bring costs for advanced custom chips down by an order of magnitude (if the cpu is small enough, and if more fab capacity is built). Not Intel/AMD/ARM parts - those prices will remain stable, at first.

* Maybe we can have another decent consumer-grade router? No, this is a pipe-dream.

* Our Amiga accelerator boards will become SMOKING fast.

Is the chip in question a complete CPU?
yes. it is however a test ASIC. therefore it has no on-board boot ROM, and has to have programs uploaded to it over JTAG.
Is it 32 bit or 64 bit?
Congratulations.
thanks :)