|
Well, sure; but the problem of font rendering specifically is an "embarrassingly parallel" one, isn't it? If you've got 1000 glyphs at a specific visual size to pre-cache into alpha-mask textures, and you've got 1000 GPU shader cores to compute those glyphs on, then each shader core only needs to compute one glyph once.

Can a CPU really be so much faster than these cores that it can run this Turing-complete font rendering program (which, to be clear, is already an abstract machine run through an interpreter either way, whether implemented on the CPU or the GPU) consisting of O(N) interpreted instructions, O(N) times, for a total of O(N^2) serial CPU computation steps, in less than the time it takes the O(N) GPU cores to run only O(N) serial computation steps each? Especially on a modern low-power system (e.g. a cheap phone), where you might only have 2-4 slow CPU cores, but still have a bounty of (equally slow) GPU cores sitting there doing mostly nothing? If so, CPUs are pretty amazing.

But even if it were true that it'd be faster in some sense (time to first pixel, where the first rendered glyph becomes available?) to render on the CPU, accelerators don't just exist to make things faster. They also exist to offload problems so the CPU can focus on things where it has a comparative advantage.

Analogies:

- An apprentice tradesperson doesn't have to be better at a delegated task than their mentor is; they only need to be good enough at the task to free up some time for the mentor to focus on getting something higher-priority done, that the mentor can do and the apprentice (currently) cannot. For example, the apprentices working for master oil painters did the backgrounds, so the master could focus on portrait details + anatomy. The master could have done the backgrounds faster! But then that time would be time not spent working on the foreground.

- Ethernet cards. CPUs are fast enough to "bit bang" even 10GbE down a wire just fine; but except under very specific situations (i.e. dedicated network switches where the CPU wants to process every packet synchronously as it comes in), it's better that they don't, leaving the (slower!) Ethernet MCU to parse Ethernet frames, discard L2-misdirected ones, and DMA the rest into kernel ring-buffer memory.

- Audio processors in old game consoles and home computers, like the SNES's S-SMP and the C64's SID. Yes, the CPU could do everything these could do, and faster; but if the CPU had to keep music samples playing in realtime, it wouldn't have much time to do things like gameplay (which usually goes together with playing music samples!)

Offloading font (or generalized implicit-shape) rendering to the GPU might not make sense if you're just computing letterforms for billboard textures in a static 3D scene (rather the opposite!) but in a game that wants to do things like physics and AI on the CPU, load times can likely be shorter with the GPU tasked with the font rendering, no? Especially since the rendered glyph-textures then don't have to be loaded into VRAM, because they're already there. |
Cores in GPUs do not operate independently; they sit inside hierarchies of memory and command structure. They are good at sharing some parts and terrible at sharing other parts.
Exploiting the parallelism of a GPU in the context of curve rasterization is still an active research problem (Raph Levien, who has posted elsewhere in this thread, is one of the people doing the research), and it's not easy.
I refrained from commenting on the specifics of how curves are rasterized, but if you want to imagine it, think about a letter, maybe a large "g". Think about the points that make up its outline, and then come up with an algorithm to find out whether a specific point is inside or outside that outline. What you'll quickly realize is that there's no local solution; there are only global solutions. You have to test for intersections against every curve in the outline to know whether a given pixel is inside or outside, and that sort of problem is serial.
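To make the "no local solution" point concrete, here is a minimal sketch of an inside/outside test. It assumes the glyph's curves have already been flattened into closed polygons (real outlines use quadratic or cubic Béziers), and it uses the nonzero winding rule; the function names and the ring-shaped test outline are made up for illustration.

```python
def winding_number(point, contours):
    """Winding number of `point` around a list of closed polygonal contours.

    Each contour is a list of (x, y) vertices; the last vertex connects
    back to the first. Note that every edge of every contour is visited.
    """
    px, py = point
    winding = 0
    for contour in contours:
        n = len(contour)
        for i in range(n):
            x0, y0 = contour[i]
            x1, y1 = contour[(i + 1) % n]
            # Sign of the cross product says which side of the edge we are on.
            cross = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
            if y0 <= py < y1 and cross > 0:
                winding += 1   # upward edge passing to the right of the point
            elif y1 <= py < y0 and cross < 0:
                winding -= 1   # downward edge passing to the right of the point
    return winding


def is_inside(point, contours):
    """Nonzero fill rule: inside whenever the winding number is not zero."""
    return winding_number(point, contours) != 0


# Hypothetical "o"-like outline: a counter-clockwise outer square and a
# clockwise inner square that cuts the hole.
ring = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    [(3, 3), (3, 7), (7, 7), (7, 3)],
]
```

With this outline, `is_inside((1.5, 5), ring)` is true, while `is_inside((5, 5), ring)` is false because the inner contour cancels the outer one. You can't get either answer without consulting edges far away from the point being tested.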
The work division you want (do a bit of work for each curve) is exactly backwards from the work division a normal GPU might give you (do a bit of work for each pixel), pushing you towards things like compute shaders.
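As a rough illustration of the curve-driven work division, here is a simplified CPU sketch, not any particular renderer's method. It assumes curves are already flattened into closed polygons and samples only pixel centers, with no anti-aliasing. Each edge scatters a signed winding change into whichever rows it crosses, and a per-row running sum then recovers the winding number at each pixel. The function name and the test outline are hypothetical.

```python
import math
from itertools import accumulate


def rasterize_by_edges(contours, width, height):
    """Return a height x width mask of booleans using the nonzero fill rule."""
    # deltas[row][col]: winding change that takes effect from pixel `col` onward.
    deltas = [[0] * (width + 1) for _ in range(height)]
    for contour in contours:
        n = len(contour)
        for i in range(n):
            (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
            if y0 == y1:
                continue  # horizontal edges never cross a scanline
            # Sign convention is arbitrary for the nonzero rule.
            direction = 1 if y1 > y0 else -1
            y_min, y_max = min(y0, y1), max(y0, y1)
            # Rows whose center line (row + 0.5) falls in [y_min, y_max).
            first = max(0, math.ceil(y_min - 0.5))
            last = min(height, math.ceil(y_max - 0.5))
            for row in range(first, last):
                sample_y = row + 0.5
                x = x0 + (sample_y - y0) * (x1 - x0) / (y1 - y0)
                # First pixel whose center lies to the right of the crossing.
                col = min(max(math.floor(x + 0.5), 0), width)
                deltas[row][col] += direction
    # Running sum along each row turns the scattered deltas into winding numbers.
    return [[w != 0 for w in accumulate(row[:width])] for row in deltas]


# Hypothetical "o"-like outline: outer square plus an oppositely wound hole.
ring = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    [(3, 3), (3, 7), (7, 7), (7, 3)],
]
mask = rasterize_by_edges(ring, 10, 10)
```

The outer loop is over edges and the writes land wherever each edge happens to cross, which is a scatter pattern. A fragment shader wants the opposite: one invocation per pixel, writing only to that pixel.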
I could go on, but this comment thread is already too deep.