I might be hand-waving here, but I got the impression that the actual components that do the raw processing in a GPU are not that complex. The complex part (and the part that produces the performance) is the pipeline feeding data into those units. GPUs are also programmable chips, so even publishing a driver that shows how the graphics stack is set up on the GPU side could potentially reveal the whole architecture of that chip.