Last time I tried this approach, the problem was that the native camera input had much better quality, while ImageCapture was essentially a still frame from a video (worse exposure/processing).
Although image quality is still a challenge, it is well suited to many use cases where image quality isn't the main concern. That said, the image looks better now.