The big problem I've had historically with non-native CUDA wrappers is that they always seem to omit, or ship a buggy version of, some feature that is critical for my application. The debugging pain, plus the implementation or bugfix work needed to get around the problem, exceeds the effort "savings" of a high-level interface by an order of magnitude or three.