"Enhancement" meaning tone-mapping? Are neural networks really required for that? Seems like a lot of heavy machinery for the resulting filter, but maybe tone-mapping standards have gone up.
They imply "human operator" level retouching, so potentially some combination of tone mapping, unsharp mask, edge enhancement, etc. as a single NN operation. It's also <30ms for 1080p on mobile, so potentially better than average speed.
This is effectively "tone mapping" that is aware of context. I.e. a face and a shoe might have the exact same color going in, yet come out as different colors after processing, even if they appear in the same image.
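
To make the face-and-shoe point concrete, here is a toy sketch of the difference between a global tone curve and a context-dependent one. Everything in it is an assumption for illustration: the region labels, the gamma values, and the function names are made up, and a real network would learn the context implicitly rather than look it up in a table.

```python
# Toy illustration only: region labels and gamma values are hypothetical,
# not taken from the paper or any real model.

def global_tone_map(pixel, gamma=0.8):
    """Global operator: the output depends only on the pixel's own value."""
    return round(pixel ** gamma, 3)

# Assumed per-region curves; a network would infer context from the image itself.
REGION_GAMMA = {"face": 0.7, "shoe": 1.2}

def context_aware_tone_map(pixel, region):
    """Context-aware operator: the output also depends on what the pixel belongs to."""
    return round(pixel ** REGION_GAMMA[region], 3)

# Two pixels with the same input intensity (normalized to 0..1).
face_pixel = shoe_pixel = 0.5

global_out = (global_tone_map(face_pixel), global_tone_map(shoe_pixel))
context_out = (
    context_aware_tone_map(face_pixel, "face"),
    context_aware_tone_map(shoe_pixel, "shoe"),
)

print("global: ", global_out)   # both values are identical
print("context:", context_out)  # face is brightened, shoe is darkened
```

With the global curve the two pixels stay identical; with the context-dependent one they diverge, which is the behavior a plain per-pixel filter cannot reproduce.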