|
Having a queue of 1,000 independent work items doesn't make something "embarrassingly parallel". Operating systems are a classic example of something that's hard to parallelize, and they have 1,000 independent processes they need to schedule and manage. Heterogeneous tasks make parallelism hard! Cores in GPUs do not operate independently; they have hierarchies of memory and command structure. They are good at sharing some parts and terrible at sharing other parts. Exploiting the parallelism of a GPU in the context of curve rasterization is still an active research problem (Raph Levien, who has posted elsewhere in this thread, is one of the people doing the research), and it's not easy. I refrained from commenting on the specifics of how curves are rasterized, but if you want to imagine it, think about a letter, maybe a large "g", think about the points that make it up, and then come up with an algorithm to find out whether a specific point is inside or outside that outline. What you'll quickly realize is that there's no local solution, only global ones. You have to test the intersection of all curves to know whether a given pixel is inside or outside the outline, and that sort of problem is serial. The work division you want (do a bit of work for each curve), is exactly backwards from the work division a normal GPU might give you (do a bit of work for each pixel), pushing you towards things like compute shaders. I could go on, but this comment thread is already too deep. |
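To make the inside/outside test described above concrete, here is a minimal sketch. It assumes the outline has already been flattened into straight line segments and uses the even-odd rule, which is a simplification: real glyph outlines are made of Bézier curves, and fonts typically use the nonzero winding rule. The names `crossings` and `is_inside` and the square-with-a-hole test shape are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Contour = List[Point]  # closed polyline; the last point connects back to the first


def crossings(px: float, py: float, contours: List[Contour]) -> int:
    """Count outline segments crossed by a ray from (px, py) toward +x."""
    count = 0
    for contour in contours:
        n = len(contour)
        for i in range(n):
            (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
            # Half-open test: the segment's endpoints lie on opposite
            # sides of the horizontal line y = py.
            if (y0 <= py) != (y1 <= py):
                # x coordinate where the segment meets y = py.
                x_hit = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
                if x_hit > px:
                    count += 1
    return count


def is_inside(px: float, py: float, contours: List[Contour]) -> bool:
    """Even-odd rule: an odd number of crossings means the point is inside."""
    return crossings(px, py, contours) % 2 == 1


# Hypothetical test shape: a 10x10 square with a 4x4 hole, roughly an "o".
outer: Contour = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole: Contour = [(3, 3), (7, 3), (7, 7), (3, 7)]
glyph = [outer, hole]

in_stroke = is_inside(1, 5, glyph)   # in the left stroke of the "o"
in_hole = is_inside(5, 5, glyph)     # in the counter (the hole)
outside = is_inside(11, 5, glyph)    # to the right of the glyph
```

Note that each query walks every segment of every contour. That is the per-pixel dependency on the whole outline that the comment is pointing at.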
> The work division you want (do a bit of work for each curve), is exactly backwards from the work division a normal GPU might give you (do a bit of work for each pixel)
Doesn't this mean that you could:
1. entirely "offline", at typeface creation time:
1a. break glyphs into their component "convex curved region tiles" (where each region is either full, empty, or defined by a curve with zero inflection points)
1b. deduplicate those tiles (anneal glyph boundaries to minimize distinct tiles; take advantage of symmetries), to form a minimal set of such curve-tiles, and assign them sequence numbers, forming a "distinct curves table" for the typeface;
1c. restate each glyph as a grid of paint-by-numbers references (a "name table", to borrow the term from tile-based consoles) where each grid position references its tile + any applied rotation+reflection+inversion
2. Then, at scene-load time,
2a. take each distinct curve from the typeface's distinct-curves table, at the chosen size;
2b. generate a (rather large, but helpfully at most 8bpp) texture like so: for all distinct-curve tiles (U pos), for all potential angled-vector-line intersections (V pos), copy the distinct-curve tile, and serialize the intersection data into pixels beside it;
2c. run a compute shader to operate concurrently over the workload tiles in this texture to generate an output texture of the same dimensions, that encodes, for each workload, the alpha-mask for the painted curve for the specified angle, iff the intersection test was good (otherwise generating a blank alpha-mask output);
2d. (this is the part I don't know whether GPUs can do) parallel-reduce the UxV tilemap into a Ux1 tilemap, by taking each horizontal strip, and running a pixel-shader that ORs the tiles together (where, if step 2c is done correctly, at most one tile should be non-zero per strip!)
2e. treat this Ux1 output texture as a texture atlas, and each typeface nametable as a UV map for said texture atlas, and render the glyphs.
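For what it's worth, the reduction in step 2d can be sketched on the CPU as a pairwise (tree) OR-reduction, which is the general shape parallel reductions take on a GPU. This is only a sketch of the data flow, assuming each tile is a flat list of 8-bit alpha values; `or_tiles`, `or_reduce_strip` and `reduce_tilemap` are made-up names, and nothing here says whether the scheme as a whole would work.

```python
from typing import List

Tile = List[int]  # flattened alpha mask, one 0..255 value per pixel


def or_tiles(a: Tile, b: Tile) -> Tile:
    """Bitwise-OR two tiles pixel by pixel."""
    return [x | y for x, y in zip(a, b)]


def or_reduce_strip(strip: List[Tile]) -> Tile:
    """Reduce the V candidate tiles of one distinct curve to a single tile.

    Each pass ORs neighbouring pairs. The pairs within a pass are
    independent of each other, so a GPU could process them concurrently,
    needing about log2(V) passes in total.
    """
    tiles = list(strip)
    while len(tiles) > 1:
        nxt = [or_tiles(tiles[i], tiles[i + 1])
               for i in range(0, len(tiles) - 1, 2)]
        if len(tiles) % 2 == 1:
            nxt.append(tiles[-1])  # odd tile out is carried to the next pass
        tiles = nxt
    return tiles[0]


def reduce_tilemap(tilemap: List[List[Tile]]) -> List[Tile]:
    """Collapse a UxV tilemap (list of U strips) into a Ux1 tilemap."""
    return [or_reduce_strip(strip) for strip in tilemap]


# Hypothetical 2x3 tilemap of 4-pixel tiles; per step 2c, at most one
# tile in each strip is non-zero.
blank: Tile = [0, 0, 0, 0]
tilemap = [
    [blank, blank, [0, 128, 255, 255]],
    [[255, 0, 0, 64], blank, blank],
]
atlas = reduce_tilemap(tilemap)
```

If step 2c really leaves at most one non-zero tile per strip, the OR just selects that tile, so the reduction itself is cheap; the open question is everything feeding into it.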
To be clear, I'm not expecting that I came up with an off-the-cuff solution to an active "independent research problem" here; I'm just curious why it doesn't work :)