Hacker News new | ask | show | jobs
by bunnie 94 days ago
Actually, the PIO does what it does very well! There is no "worse" or "better" - just different.

Because it does what it does so well, I use the PIO as the design study comparison point. This requires taking a critical view of its architecture. Such a review doesn't mean its design is bad - but we try to take it apart and see what we can learn from it. In the end, there are many things the PIO can do that the BIO can't do, and vice-versa. For example, the BIO can't do the PIO's trick of bit-banging DVI video signals; but, the PIO isn't going to be able to protocol processing either.

In terms of area, the larger area numbers hold for both an ASIC flow as well as the FPGA flow. I ran the design through both sets of tools with the same settings, and the results are comparable. However, it's easier to share the FPGA results because the FPGA tools are NDA-free and everyone can replicate it.

That being said, I also acknowledge in the article that it's likely there are clever optimizations in the design of the actual PIO that I did not implement. Still, barrel shifters are a fairly expensive piece of hardware whether in FPGA or in ASIC, and the PIO requires several of them, whereas the BIO only has one. The upshot is that the PIO can do multiple bit-shifts in a single clock cycle, whereas the BIO requires several cycles to do the same amount of bit-shifting. Again, neither good or bad - just different trade-offs.

2 comments

> The upshot is that the PIO can do multiple bit-shifts in a single clock cycle... it's likely there are clever optimizations in the design of the actual PIO that I did not implement

I was curious, so looked into this. From what I can tell, PIO can only actually do a maximum of two shifts per cycle. That's one IN, OUT, or SET instruction plus a side-set.

And the side-set doesn't actually require a full barrel shifter. It only ever needs to shift a maximum of 5 bits (to 32 positions), which is going to cut down its size. With careful design, you could probably get away with only a single 32-bit barrel shifter (plus the 5-bit side-set shifter).

Interestingly, Figure 48 in the RP2040 Datasheet suggests they actually use seperate input and output shifters (possibly because IN and OUT rotate in opposite directions?). It also shows the interface between the state machine input/output mapping, pointing out the two seperate output channels.

Thanks btw for saying clearly that BIO is not suitable for DVI output. I was curious about this and was planning to ask on social media.

I've done some fun stuff in PIO, in particular the NRZI bit stuffing for USB (12Mbps max). That's stretching it to its limit. Clearly there will be things for which BIO is much better.

I suspect that a variant of BIO could probably do DVI by optimizing for that specific use case (in particular, configuring shifters on the output FIFO), but I'm not sure it's worth the lift.

USB 12Mbps is one of the envisioned core use cases - the Baochip doesn't have a host USB interface, so being able to emulate a full-speed USB host with a BIO core opens the possibility of things like having a keyboard that you can plug into the device. CAN is another big use case, once there is a CAN bus emulator there's a bunch of things you can do. Another one is 10/100Mbit ethernet - it's not fast - but good for extremely long runs (think repeaters for lighting protocols across building-scale deployments).

When considering the space of possibilities, I focused on applications that I could see there being actual product sold that rely upon the feature. The problem with DVI is that while it's a super-clever demo, I don't see volume products going to market relying upon that feature. The moment you connect to an external monitor, you're going to want an external DRAM chip to run the sorts of applications that effectively utilize all those pixels. I could be wrong and mis-judged the utility of the demo but if you do the analysis on the bandwidth and RAM available in the Baochip, I feel that you could do a retro-gaming emulator with the chip, but you wouldn't, for example, be replacing a video kiosk with the chip. Running DOOM on a TV would be cool, but also, you're not going to sell a video game kit that just runs DOOM and nothing else.

The good news is there's plenty of room to improve the performance of the BIO. If adoption is robust for the core, I can make the argument to the company that's paying for the tape-outs to give me actual back-end resources and I can upgrade the cores to something more capable, while improving the DMA bandwidth, allowing us to chase higher system frequencies. But realistically, I don't see us ever reaching a point where, for example, we're bit-banging USB high speed at 480Mbps - if not simply because the I/Os aren't full-swing 3.3V at that point in time.

My feeling about programmable IOs is they’re fun, but not the right choice for commodity high speed interfaces like USB. You obviously can make them work, but they’re large compared to what you would need for a dedicated unit. The DVI over PIO is a good example: showed something interesting (and that’s great!) but not widely useful. Also, a lot of protocols, even slow ones, have failure and edge cases that would need to be covered. Not to mention the physical characteristics, like you’ve said for high speed USB.
This is true, but only relevant if you order enough units (>100 k? Depending on price & margin of course) to customize your die. Otherwise, you have to find a chip with the I/Os that you want, all the rest being equal. Good luck with that if you need something specific (8 UARTs for instance) or obscure.
Yes, I can see BIO being really good at USB host. With 4k of SRAM I can see it doing a lot more of the protocol than just NRZI; easily CRC and the 1kHz SOF heartbeat, and I wouldn't be surprised if it could even do higher level things like enumeration.

You may be right about not much scope for DVI in volume products. I should be clear I'm just playing with RP2350 because it's fun. But the limitation you describe really has more to do with the architectural decision to use a framebuffer. I'm interested in how much rendering you can get done racing the beam, and have come to the conclusion it's quite a lot. It certainly includes proportional fonts, tiles'n'sprites, and 4bpp image decompression (I've got a blog post in the queue). Retro emulators are a sweet spot for sure (mostly because their VRAM fits neatly in on-chip SRAM), but I can imagine doing a kiosk.

Definitely agree that bit-banging USB at 480Mbps makes no sense, a purpose-built PHY is the way to go.