by raverbashing 4405 days ago
I think what's broken is not OpenGL, D3D, etc

What's broken is that the abstraction between graphics card and data (on the screen) is too big

We haven't had drivers as troublesome/fat as these since the "Softmodem" days, and even then (Wifi is also complicated)

It's too big of a gap.

In 2D graphics, you send graphical data and it is displayed. You may even write it directly to memory after some setup.
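To make the "write it directly to memory" case concrete, here is a minimal sketch of a software framebuffer. Everything in it is an assumption for illustration: the 320x200 size, the one-byte-per-pixel layout, and the helper names `put_pixel` and `fill_rect` are made up, and a plain `bytearray` stands in for real video memory.

```python
# Hypothetical framebuffer: WIDTH x HEIGHT pixels, one byte per pixel.
# A bytearray stands in for memory-mapped video RAM (an assumption).
WIDTH, HEIGHT = 320, 200

framebuffer = bytearray(WIDTH * HEIGHT)


def put_pixel(fb, x, y, color):
    """Write one pixel; the whole 'driver' is an address calculation."""
    fb[y * WIDTH + x] = color


def fill_rect(fb, x0, y0, w, h, color):
    """Fill a rectangle one scanline at a time with slice assignment."""
    for y in range(y0, y0 + h):
        start = y * WIDTH + x0
        fb[start:start + w] = bytes([color]) * w


put_pixel(framebuffer, 10, 5, 15)
fill_rect(framebuffer, 0, 0, 4, 2, 7)
```

The point of the sketch is how little sits between the program and the pixels: an index computation and a store.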

Audio, same thing. Network, it's bytes to the wire. Disk drive, "write these bytes to sector X" (yes, it's more complicated than that, still)

With 3D, we have two APIs that have an awful amount of work to do between getting the data and displaying it.

I'll profess my ignorance in the low-level aspects. I only know "GlTriangle", OpenGL 101 kind of stuff, and I have no idea 1 - how this is sent to the video card, or 2 - how it decides to turn that into what we see on the screen.

Compared to the other drivers this is a lot of work and a lot of possibilities of getting this wrong.

Adding GPGPU stuff makes it easier in one aspect and more complicated in other aspects. We don't have a generic way of producing equal results from equal inputs (not even the same programming environment is available)

We don't have OpenGL, we have "this OpenGL works on nVidia, this other one works on ATI, this one works on iOS, or sometimes it doesn't work anywhere even though it might be officially allowed"

4 comments

To my understanding, the critical difference between framebuffer graphics and 3D API graphics is the processing! In a framebuffer scenario, the CPU does all the rendering. Since CPU is poorly suited to rendering 3D, we have a coprocessor called a GPU. The CPU has to feed the GPU work.

Because the GPU is cutting-edge, there is a certain amount of magic voodoo required for top performance that needs to get abstracted away- maybe this particular model of GPU you have doesn't support some common instruction. You don't want to handle that in your software, you want to hide that in the driver.

Beyond that, the API is also there to make the GPU easier to use. OpenGL is a mess, sure, but to my understanding most developers would pull their hair out and give up if they had to program the GPU directly.

Isn't the GPU just a computer that happens to support more parallelism than the CPU? Why not have a simpler API based on general-purpose operations like map/reduce/scatter/gather? Then there would be no need to add new "cutting edge" operations every year. I for one would be happy to use that instead of OpenGL or DX.
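For reference, the four general-purpose operations named above can be sketched sequentially like this. It only shows what each primitive means; the function names are hypothetical, nothing here runs on a GPU, and it is not any real API.

```python
from functools import reduce


def gpu_map(fn, xs):
    """Apply fn to every element independently."""
    return [fn(x) for x in xs]


def gpu_reduce(fn, xs, init):
    """Combine all elements into a single value."""
    return reduce(fn, xs, init)


def gather(src, indices):
    """Read: out[k] = src[indices[k]]."""
    return [src[i] for i in indices]


def scatter(values, indices, size, fill=0):
    """Write: out[indices[k]] = values[k]; untouched slots keep fill."""
    out = [fill] * size
    for v, i in zip(values, indices):
        out[i] = v
    return out


squared = gpu_map(lambda x: x * x, [1, 2, 3, 4])
total = gpu_reduce(lambda a, b: a + b, squared, 0)
picked = gather(squared, [3, 0])
placed = scatter(picked, [0, 2], 4)
```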

> Isn't the GPU just a computer that happens to support more parallelism than the CPU?

Not really. It's like quantum mechanics compared to classical physics.

For instance, "branches" don't work like you'd expect. On a CPU you execute one branch or the other. On a GPU, you get things like both branches execute, but then it just throws away the half that shouldn't have run, but that means you're bottlenecked by whichever branch takes the longest (Or something like that -- the details escape me but I do remember something about CUDA's branching doing weird things). Point being, GPUs are weird. It's nothing like programming a CPU at all.

> On a GPU, you get things like both branches execute, but then it just throws away the half that shouldn't have run

It's not that weird. You don't really have thousands of parallel processors, but a single processor, operating on thousands of values. (Like SIMD on steroids.)

Since all operations must be done identically on all values, a "branch" is really doing both branches and recombining them with a mask of equally many booleans - as you say "throwing away" the unwanted branch.
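A rough sketch of that mask-and-recombine idea, simulated with plain Python lists. The names `simd_select` and `masked_branch` are made up for illustration, and real hardware does this per lane in lockstep rather than with list comprehensions.

```python
def simd_select(mask, then_vals, else_vals):
    """Per lane, keep the 'then' result where mask is True, else the 'else' result."""
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]


def masked_branch(xs):
    """Per-lane equivalent of: y = -x if x < 0 else x * 2."""
    mask = [x < 0 for x in xs]
    then_vals = [-x for x in xs]     # computed for every lane
    else_vals = [x * 2 for x in xs]  # also computed for every lane
    return simd_select(mask, then_vals, else_vals)


result = masked_branch([-3, 1, 0, -7])
```

Both sides are evaluated for all lanes, so the cost is roughly the sum of the two branches, not the cost of whichever one a given value would have taken.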

The GPU microops continue to change for the same reason we got x87, MMX, AVX256... We expect the very best performance from our graphics coprocessor, and sometimes the fastest way to do something is with a new hardware op.

The GPU doesn't get more complicated to support OpenGL; OpenGL gets more complicated to 1) meet the needs of developers and 2) support the GPU.

The real way to stop adding new features & ops every year is to stop caring about whether your GPU is fast.

P.S. The GPU is a computation engine that is more parallel than the CPU, but it is not analogous to a CPU with more threads. A GPU basically cannot branch (if/for/while) worth a damn, for example.

Among other things, yes, the GPU is a computer that supports more parallelism than the CPU. But not in the way you want.

There are a few issues. One is that the GPU does much more than map/scatter/gather (reduce is hard in parallel so it doesn't do that); look into the stages of the graphics pipeline to see what I mean. The other big one is that it doesn't work like a CPU in a lot of ways, and making it general purpose like one would lose enough of the performance that it would no longer be useful in many cases.

Really what you're asking for is a super-parallel general-purpose CPU, which really isn't what a GPU is or wants to be.
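On the "reduce is hard in parallel" remark: one common way to express a reduction for parallel hardware is a pairwise tree, sketched below. This is only an illustration of why reduce differs from map; `tree_reduce` is a hypothetical helper, it assumes a non-empty input and an associative operator, and it runs sequentially here.

```python
from operator import add


def tree_reduce(xs, op):
    """Pairwise reduction of a non-empty sequence.

    Within one pass every pair is independent and could run in parallel,
    but each pass has to wait for the previous one to finish.
    Returns (result, number_of_passes).
    """
    vals = list(xs)
    passes = 0
    while len(vals) > 1:
        nxt = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])  # odd element carries over unchanged
        vals = nxt
        passes += 1
    return vals[0], passes


total, passes = tree_reduce(range(1, 9), add)
```

A map over 8 values is one independent step; this reduction over the same 8 values needs 3 dependent passes, and the number of passes grows with the logarithm of the input size.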

The reason for this is that graphics cards have evolved to be enormously complex.

Every (well, hopefully) graphics engineer knows how the data goes over the wire into the graphics card, and understands the stages of the graphics pipeline (or at least the ones that matter for the version of OpenGL/D3D they're targeting).

Yes, the relative complexity of the graphics pipeline means that there's much more potential for errors from the driver, but I'm not sure it's productive to say 'this is too complex' when there isn't a simpler way to get the same results.

IMO the complexity is only half the problem- humans are really not very good at thinking like a GPU. Most people have to be trained to be able to think massively parallel, and it's also hard to work with a device that is so bad at conditional branching.

So, one of the roles of the API is a sort of a deserializer, converting our human, linear, serial way of thinking to the GPU's parallel world.

I don't think OpenGL or D3D or 3D programmers deal with 'parallelization' much. There are very common things that have to be done that in reality are mathematically parallel, but are basically one unit in the API, such as a texture, or a geometry surface and its transformations. It is true complexity, in multi-graph fashion. So, even if we didn't have massively parallel GPUs, we would still come up with the exact same abstractions, not because we are not good at thinking massively parallel, but because there is no need to rewrite a loop over a matrix a billion times.

> What's broken is that the abstraction between graphics card and data (on the screen) is too big

Yeah. I would love to see a display protocol where the pixels themselves are exposed as a framebuffer I can write to. No refresh rates, no scanning; I want random access to the pixels on the screen limited only by the available bandwidth.

I believe you can still use modern GFX in framebuffer mode.

It's slow, but the CPU can never render as fast as the GPU.

Indeed, which is why the old video game consoles had to have separate programmable video units to work performantly at all. Consoles like NES, SNES, and Genesis had "hardware-accelerated 2D" because a completely CPU-driven approach would have been unworkable for good games.

Even before 3D cards for PCs, and even when CPU speeds got fast enough to make games like Doom, you still saw the development of specialized hardware (such as the VESA Local Bus) to support better video and graphics capabilities.

Correct, it would be impossible to do blit fast enough, etc, with the hardware (CPU) of that time, so you needed the video chip to be intelligent

Example: http://en.wikipedia.org/wiki/Texas_Instruments_TMS9918

That's the same as MSX's VDP.

That thing is not really fast. MSX's main CPU, an 8-bit Z-80 running at less than 4MHz, could write to the framebuffer MUCH faster than that thing, were it not in the way of the framebuffer.

It did provide some interesting capabilities beyond that, such as sprites and custom 'glyphs', which simplified a lot of the programs at the time.

However, it did (and does, there's an active community) hamper efforts to improve the performance.

> could write to the framebuffer MUCH faster than that thing, were it not in the way of the framebuffer.

I'd like to see a Z80 blitting to a framebuffer faster than a VDP could chew through some tiles and sprites.

The general design of the famous TI VDP was fine; pretty much every 2D game console released after the Colecovision was either directly inspired by, or featured a direct successor to that chip, so clearly there was value in that combination of tilemaps (the "custom glyphs" you mentioned), sprite multiplexing, and separate video memory. You never really needed a framebuffer or direct video memory access on those systems. The TI VDP was just barely too limited in ways that exacerbated the flaws in the design.

The most obvious: No hardware scrolling. The NES could do per-pixel scrolling between two screens full of tiles, either arranged side-by-side or top-to-bottom. When the screen "wrapped around" the other edge, you only needed to load 1 row or column of new tile indices into place at a time, which comes to about 16-20 bytes every few frames. That's barely anything, and so NES games do just fine poking into VRAM through special registers one byte at a time.

On the TI VDP, which lacks this, you're obviously going to have big problems trying to implement smooth scrolling. More importantly, even for choppy 1 tile scrolling, you have to move the entire game area of tiles over at once. A full screen is (256/8 * 192/8) = 768 tiles, which, when you add the color attribute map to the equation, comes close to a kilobyte. I haven't programmed for the MSX, but that's probably too much to transfer in one vblank.
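The tile arithmetic above, worked through as a quick sketch. The one-byte-per-tile-index figure is an assumption for illustration, the color attribute map is left out, and the "with hardware scroll" numbers just apply the one-row-or-column idea to the same 256x192 grid rather than describing any particular console.

```python
SCREEN_W, SCREEN_H = 256, 192  # resolution used in the comment above
TILE = 8                       # tiles are 8x8 pixels

cols = SCREEN_W // TILE        # tiles per row
rows = SCREEN_H // TILE        # tiles per column

# Without hardware scrolling: every tile index moves on each scroll step.
full_screen_tiles = cols * rows

# With hardware scrolling (hypothetically, on this same grid): only the
# newly exposed row or column of tile indices has to be written.
row_update = cols
column_update = rows

# How many times more data the no-scroll case moves per step,
# assuming one byte per tile index.
ratio_vs_row = full_screen_tiles // row_update
```

So even before counting color attributes, redrawing the whole tilemap moves a couple dozen times more data per scroll step than updating a single edge.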

Ditto with the sprites, of which you only get 4 per line, and they're monochrome. If you want multi-colored sprites or more objects per line, you either have flickering or move everything over to framebuffer graphics and deal with that headache.

So I guess the short-sighted solution to this problem would be to improve the speed of CPU access to VRAM, and the better approach taken by Nintendo, Sega, etc. is to improve the capabilities of the VDP until it doesn't really matter how fast you can access VRAM. If the VDP is powerful enough that this isn't a burden, the advantage here is that the VDP has a lot more bandwidth available than it would if it were sharing a bus with the CPU. The same principle applied to the TI VDP-based systems, it was just a lot easier to run into their limitations.

> We don't have OpenGL, we have "this OpenGL works on nVidia, this other one works on ATI, this one works on iOS, or sometimes it doesn't work anywhere even though it might be officially allowed"

But this is just a failure to be standards compliant. We have multiple ARM implementations in CPU space, we have two vendors of x86 CPUs, we have all kinds of wireless cards, but assembly on one ARM core better run on another, your x86 binary better run on AMD or Intel chips unless you use vendor specific extensions, and your wireless radio better process inbound data properly regardless of the sender.

I think the real problem in the massively parallel pipeline architecture space (aka GPUs, though they are more than just graphics processors now) is that they aren't treated like proper programmable computers. What we should have are compilers to build ASM binaries for each architecture, and sane ISAs.

That would mean in the present tense you would need Intel: preSB, SB, IVB, Haswell, Broadwell, Skylake, etc compilers. I've never read their ISA documentation, so I don't know which of these have common core so that you could treat it like a CPU - ie, base instruction set with extensions like SSE or NEON. I do imagine though if we weren't working at such an absurdly high level, we would see graphics hardware conform to the same model as CPUs - common ISA, with multiple implementations.

I guess the problem there is that the way all modern ISAs are handled is ridiculous and stupid. Intel "owns" x86, ARM "owns" itself, etc. The reason we don't have reasonable CPU competition is that you don't just compete in hardware implementation; you also need to pay extortion to use what is effectively the same API - the same ISA. That is where the graphics industry has a strength: the effective ISA that software is built against is predominantly open (most GPU code is OpenGL, despite how many DirectX games there are). We can see the difference - very little hardware runs DirectX because vendors have to pay MS for the privilege to support it, and then it isn't even a standard at all so you can't use it on anything but Windows. It is congruent with proprietary ISAs for CPUs.

An example of a non-proprietary ISA is SPARC. I've never really looked into how good an ISA it actually is, but it is royalty free unless you want to use the branding - you can implement your own SPARC CPUs without paying jack, the way it should be.

So like I said, what we really need is a common open ISA for hardware graphics accelerators to implement, and then independently develop extensions for that can be proposed and adopted into the mainstream standard. And if that standard ever becomes overly bloated, any vendor can develop their own newer base to handle newer paradigms or fix bloat, the same way we see programming languages go.

If I didn't have to figure out how to eat or didn't have another dozen things I'd want to do, I'd definitely want to see what I could manage writing a binary compiler for the SI ISA from AMD and seeing how performant you could make some other high level graphics language with compiled binaries. That is kind of what they did with Mantle, but their continued lack of any effort to open that language up to standardization, or even just to publish it at all, shows that it certainly won't be the answer.

> So like I said, what we really need is a common open ISA for hardware graphics accelerators to implement, and then independently develop extensions for that can be proposed and adopted into the mainstream standard.

Except for the word "ISA", this sounds an awful lot like OpenGL, OpenCL, & Direct3D.

Which makes sense. You don't run a software binary on a GPU, so why does it need its own standard ISA? A GPU crunches data. The CPU feeds the GPU data. So we use APIs. You could make the GPU more like a CPU at the cost of graphics performance. Or you could integrate a CPU with the GPU, whose job is to run software and keep the GPU fed. But that sounds an awful lot like an APU.

Basically what I'm saying is a GPU is more like a network adapter- it is just fed data- and network adapters don't have their own ISA either! They do have binary blobs, but that is firmware.