I like CEPL a lot too, but I agree, it doesn't address the OP's pain points.
I am not enough of a gpu guy to know, but I thought that was what Harlan [1] was trying to address. It had used miniKanren originally as a region and type inferencer.
CEPL does not try to fix the underlying problem, but provide a decent way of working with it. The Varjo compiler seems already quite evolved and sure, it could be better. On the other hand, this seems to be a single-person project (http://techsnuffle.com/).
CEPL/Varjo creator here. Thanks for the shout-out but my compiler is very basic, it has potential though and I'm looking forward to people making higher level approaches/languages that use CEPL & Varjo under the hood.
Regarding the authors issues, the APIs are very ugly, especially in GL where you have two apis smashed together (pre v3.1 and post). I'm not qualified to comment of DirectX but I feel in the GL case there is a fairly nice API buried in the madness, and that with a suitably flexible language you can making working with it rather pleasant without resorting to tools that totally separate you from the lower levels.
Vulkan is no panacea (nor dx12) because with control comes the associated complexity. Vulkan is going to be awesome for folks making engines, languages or those who just want absolute control. But Khronos are very open that these Vulkan & SPIRV are not replacements for GL/GLSL. The increased variety of APIs that will be possible for people to make WILL be exciting though and that is worthy of celebration, as will be mixing and matching the above.
Re the "impoverished GPU-specific programming languages" comment. Limiting the scope of a language can give you valuable places to optimize and one of the primary concerns for this hardware is speed. As an exercise/strawman add pointers to an imaginary gpu language and work out how to keep the performance you currently have. This is going to become more evident as languages come out that compile to SPIRV and we get to see behind the curtain at just how hard it is to make code run fast against the various hardware approaches to render that exist in GPUs (for a case in point see 'tiled rendering').
Im unsure around the 'interface too flexible' issue as GPU resources are still valuable enough that being able to unload code is still useful, and at that point you have to rely that the pipeline you load follows the interface you expect it to, as you would with any shared library. Even when you take away all the api ugliness you are still left with the fact you are talking to a separate computer with separate concerns, I'd rather embrace this in a way that feels natural to your host language than hide this separation which makes certain issues (e.g synchronization) harder to deal with.
Regardless, this is still a valuable post and an issue worth getting fired up about, it's always exciting to see people making different ways of leveraging these awesome pieces of hardware
Speaking to the DX APIs. 9.1c was pretty much the same kind of mess gl is now. Two APIs, the fixed function and programmable pipelines mashed together to create something totally gross.
10 restructured the whole interface significantly. The API is actually quite nice these days. With the caveat that the API overall is very performance conscious so GPU considerations show up in the API idioms and thus in client code.
I am not enough of a gpu guy to know, but I thought that was what Harlan [1] was trying to address. It had used miniKanren originally as a region and type inferencer.