I have an example of interior decorating inpainting where I replaced a large floor-to-ceiling window with a mirror, and the result was pretty impressive using NB Pro from nearly a year ago.
For locally hostable image editing models, the edit variant of the recently released Boogu-Image[1] model is very good. Anecdotally, I'd say way better than Flux.2 Klein 9B and Qwen-Edit.
As far as I know, gpt-image-2 doesn't even let you define a mask unless you've already run it through one iteration, and once you do define the mask, it just ignores it 90% of the time. It's utterly useless for inpainting. Also, this and other proprietary models are severely limited in their output resolution.
I do agree, however, that the Flux2 family is the SoTA at the moment. Running locally via something like Comfy gets incredible results.
Yeah definitely. You can do workarounds like drawing circles or using highlighters to create pseudo-masks for use with OpenAI or Google models but it’s really just a visual indication more than anything.
If you want real precision (especially for complex polygonal masks), or if you’re concerned about image degradation over multiple edit rounds, you'll slam against the limitations of those approaches.
Even with SOTA proprietary models, repeatedly editing and re-uploading an image is like making a copy of a copy of a VHS tape: you're gonna see subtle color shifts and quality loss steadily accumulate.
At that point, you either need to put in the manual work in something like Photoshop (bringing elements in as layers and masking them properly) or, as you mentioned, use a model or workflow that properly supports masking.
Awnings, if I understand correctly (I just learned this word right now), are purely additive attachments to structure exteriors - so perhaps they wouldn't necessarily need a full inpainting model? Wouldn't it be enough to estimate an affine transform for a quad and blend the image of awning directly (and the same with shadow map to fake shade)? Is classical photogrammetry up to such task these days?
I'm quite perplexed by this comment. If I'm understanding you correctly, sure, what you describe is possible through significantly more effort, orchestration, and source photos. Or we can grab one still image and throw an inpainting model at it.
I have no idea but I think you might be onto something.
So you're saying that, if I can calculate from the picture the position (height, inclination and such), and I can render the model (should be doable) for that height and angle, my best course of action could be to combine original + render and only at the end use a visual model? That could be interesting.
More-less, yes. I was actually thinking about taking a high-resolution rendering of an awning directly facing the camera, and transforming that quad onto the user image - which requires computing the transformation matrix that would right the user image so the building is level and directly facing the camera, and then applying the inverse of that to the quad with your rendering - but I now realize I assumed user photos would mostly be nearly straight images of the building, not at large, odd angles. For general case, you'd need a 3D model (even if approximate), and apply the inverse transform to that, and then render it on top.
This idea rests on the assumption that my understanding of what "awnings" are is correct and matches your project, i.e. additive structures. In that case, your primary problem is adding pixels on top of the user image. Additive modifications are easy to pull off. Inpainting seems like overkill here; it's something that shines when you need to poke holes or replace some of the aspects of the original that is not covered by the part you're adding.
OTOH, it might still be that inpainting is your best bet for operational reasons - additive modification itself may not be a problem, but fixing lighting and shadows might, and current image generations models should handle this in stride.
(I say should because that's my expectation, but I never tested any of the current models on for ability to fix shadows that cover areas similar to the targeted modification, but lay beyond it. It might be that you'll still need a model and a transform estimate just to generate a shadow map as a hint for the model where it needs to act and how.)
I have an example of interior decorating inpainting where I replaced a large floor-to-ceiling window with a mirror, and the result was pretty impressive using NB Pro from nearly a year ago.
https://imgpb.com/ZXkiXV
Locally hostable? For my money I'd argue Flux.2 Klein but Qwen-Edit still puts in the work.