Hacker News
by bd 3760 days ago
These are really cool. But if, like me, you were puzzled about how such complex and coherent features could come from those simple drawings / masks, have a look at the original paintings that were used as sources and compare them with the generated images:

Original #1:

https://github.com/alexjc/neural-doodle/blob/master/samples/...

Generated #1:

https://github.com/alexjc/neural-doodle/blob/master/docs/Coa...

Original #2:

https://github.com/alexjc/neural-doodle/blob/master/samples/...

Generated #2:

https://github.com/alexjc/neural-doodle/blob/master/docs/Lan...

So the new generated images are structurally very similar to the original sources. The neural net seems to be good at "reshuffling" the sources. That's probably how things like reflections on the water got there, even though they aren't present in the doodles.
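One way to see why the outputs look like "reshuffled" sources: patch-based style transfer assembles the output from pieces matched against the source. Below is a minimal, hypothetical sketch of just the matching step. It treats images as small grids of feature vectors, cuts them into k x k patches, and for every target patch finds the nearest source patch. Squared Euclidean distance is used for simplicity; real implementations operate on CNN feature maps and may use a normalized similarity instead. The function names are illustrative, not neural-doodle's.

```python
from typing import Dict, List, Tuple

Grid = List[List[List[float]]]  # grid[y][x] is the feature vector at that pixel
Pos = Tuple[int, int]


def extract_patches(grid: Grid, k: int) -> List[Tuple[Pos, List[float]]]:
    """Return (top-left position, flattened k x k patch) for every full window."""
    h, w = len(grid), len(grid[0])
    patches = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            flat = [v for dy in range(k) for dx in range(k) for v in grid[y + dy][x + dx]]
            patches.append(((y, x), flat))
    return patches


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two flattened patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def match_patches(target: Grid, source: Grid, k: int = 2) -> Dict[Pos, Pos]:
    """Map each target patch position to the position of its nearest source patch."""
    source_patches = extract_patches(source, k)
    matches = {}
    for pos, patch in extract_patches(target, k):
        best_pos, _ = min(source_patches, key=lambda sp: sq_dist(patch, sp[1]))
        matches[pos] = best_pos
    return matches


# Usage: a 4x4 single-channel "painting" and a 3x3 crop of it used as the target.
source = [[[float(4 * y + x)] for x in range(4)] for y in range(4)]
target = [row[1:4] for row in source[1:4]]
print(match_patches(target, source)[(0, 0)])  # (1, 1)
```

Every target patch is answered with a copy of some source patch, so details such as reflections can reappear in the output even when the doodle never asked for them.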


Thanks for clarifying, I'll update the README. The research paper does a better job of explaining this with its figures!

The algorithm can only reuse combinations of patterns that it knows about; it can extrapolate, but the result often ends up being just a blend. However, you can give it multiple images and it'll borrow the best features from any of them, for example drawing from all of Monet's work. (This needs more optimization to work well, though; it takes a lot of time and memory.)
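A minimal sketch of "borrowing the best features from any of them", assuming a patch-based matcher in which the candidate patches from several style images are simply pooled into one bank. This pooling is an assumption made for illustration, not a description of neural-doodle's code. It also hints at why time and memory grow: both the bank and the search over it scale with the total number of patches across all images.

```python
from typing import List, Tuple

Grid = List[List[List[float]]]  # grid[y][x] is the feature vector at that pixel
Entry = Tuple[int, Tuple[int, int], List[float]]  # (image index, top-left position, flattened patch)


def build_patch_bank(images: List[Grid], k: int = 2) -> List[Entry]:
    """Pool every k x k patch from every style image into one candidate list."""
    bank = []
    for idx, grid in enumerate(images):
        h, w = len(grid), len(grid[0])
        for y in range(h - k + 1):
            for x in range(w - k + 1):
                flat = [v for dy in range(k) for dx in range(k) for v in grid[y + dy][x + dx]]
                bank.append((idx, (y, x), flat))
    return bank


def best_source(patch: List[float], bank: List[Entry]) -> Tuple[int, Tuple[int, int]]:
    """Return (image index, position) of the nearest pooled patch by squared Euclidean distance."""
    idx, pos, _ = min(bank, key=lambda e: sum((p - q) ** 2 for p, q in zip(patch, e[2])))
    return idx, pos


# Usage: two tiny single-channel "paintings"; the query matches a patch in the second one.
dark = [[[0.0], [0.1], [0.2]], [[0.1], [0.2], [0.3]], [[0.2], [0.3], [0.4]]]
bright = [[[0.9], [1.0], [0.8]], [[1.0], [0.7], [0.9]], [[0.8], [0.9], [1.0]]]
bank = build_patch_bank([dark, bright])
print(len(bank))  # 8: four 2x2 patches per 3x3 image
print(best_source([1.0, 0.8, 0.7, 0.9], bank))  # (1, (0, 1))
```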

As for the images, as long as the type of scene is roughly the same, it'll work fine. The fact that it can copy things "semantically", by understanding the content of the image, makes it work much more reliably, at the cost of needing extra annotations from somewhere. The original Deep Style Network is very fragile to input conditions: the composition needs to match very closely for it to work (or you pick an abstract style). That was part of the motivation for researching this over the past months.

So if I understood correctly, this GIF shows you, a human being, exploring the possibilities / limitations of your method and hand-tweaking it for one particular image?

http://nucl.ai/files/2016/03/MonetPainting.gif

That is, the final image, the one that looks best, is the result of you tweaking the doodles until you got something the neural net could then fill in convincingly?

Or are these different runs of the same method on the same inputs, with some natural variability, and you selected the one that looked best?

Or are these progression steps in one run of the automated algorithm?

The language in the blog post is kinda ambiguous; I'm not sure which steps were done by the algorithm and which by a human being.

Exactly, the doodling is done by humans and the machine paints the HD images based on Renoir's original. I've edited the blog post to clarify.

That part was clear :)

What still isn't clear to me is how exactly that "workflow" demo (and consequently the "money-shot" final generated images) came about.

There is a progression of generated images with increasing quality. Who did which steps in those iterations?

The blog post uses ambiguous language: "N-th image tries / removes / fixes", etc.

What's not clear is whether it was:

1) algorithm steps (keep computing more till generated image looks good), or

2) human being tweaking inputs to fixed algorithm (keep painting new input/output doodles till generated image looks good), or

3) human being tweaking algorithm itself (change code till generated image looks good).

The algorithm does the same thing every time (it's triggered on request). Only the input changes: the human modifies the doodle, as shown in the video.

The output gets better because through iteration the glitches are removed incrementally, and it converges on a final painting that looks good!

Aha. I was wondering about that.

I've done some experimentation with neural-network-based style transfer (this one: https://github.com/jcjohnson/neural-style), and my results pointed strongly to the same effect: it works well if the two images (the style source and the content source) are very similar in framing, composition and subject, and very badly if they're wildly different.

Having said that, this algorithm seems to be MUCH better than the one I tried at transferring style. I'd have expected those paintings to transfer to the doodles much worse than they did.

But don't expect to take a portrait doodle and a landscape source and have it come out well :)

The "Semantic" tag is misleading, because human perception parses lighting and texture cues in 2D images as 3D hints.

Representational art is all about modelling, highlighting and/or transforming the hinting, depending on the level of abstraction. E.g. if you look at portraits, the pen/brush strokes usually emphasise 3D structures.

This code does a little of that, but the model is extremely crude compared to the models the human brain uses.

For genuine semantic perception you'd have to duplicate - and maybe improve - the human model. I doubt you can do that in 2D, because the human model is trained by years of genuine 3D perception.

That's not to sound negative - I think this is very impressive visually. But it could be taken further.

> This code does a little of that.

Actually, the code does none of that ;-) All of the semantics are provided by the users: either as manual annotations or by plugging in an existing architecture for semantic segmentation / pixel labeling. It's designed to be independent of the source of the semantic maps, so we can continue to work on both problems separately.
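Because the semantic map is just an extra input, one simple way to picture how it steers the result is to append each pixel's label, as a one-hot vector scaled by a weight, to its feature vector before matching. The sketch below is a hypothetical per-pixel illustration of that idea (names such as `semantic_weight` are illustrative; the real tool works on patches of CNN features). With a weight of zero the closest-looking pixel wins regardless of its label, while a large weight keeps matches inside the same annotated region. The labels themselves could come from manual annotation or any segmentation model.

```python
from typing import List, Sequence


def augment(features: Sequence[float], label: int, num_labels: int,
            semantic_weight: float) -> List[float]:
    """Append a one-hot encoding of the pixel's semantic label, scaled by semantic_weight."""
    one_hot = [semantic_weight if i == label else 0.0 for i in range(num_labels)]
    return list(features) + one_hot


def nearest(query: Sequence[float], candidates: List[List[float]]) -> int:
    """Index of the candidate with the smallest squared Euclidean distance to query."""
    dists = [sum((a - b) ** 2 for a, b in zip(query, c)) for c in candidates]
    return dists.index(min(dists))


def semantic_match(target_feats: Sequence[Sequence[float]], target_labels: Sequence[int],
                   source_feats: Sequence[Sequence[float]], source_labels: Sequence[int],
                   num_labels: int, semantic_weight: float) -> List[int]:
    """For each target pixel, the index of the best source pixel in the augmented space."""
    source = [augment(f, l, num_labels, semantic_weight)
              for f, l in zip(source_feats, source_labels)]
    return [nearest(augment(f, l, num_labels, semantic_weight), source)
            for f, l in zip(target_feats, target_labels)]


# Usage: the target pixel (label 0, e.g. "water") looks most like source pixel 0,
# but that pixel is annotated as label 1 (e.g. "sky").
print(semantic_match([[1.0]], [0], [[1.0], [0.5]], [1, 0], 2, 0.0))   # [0]
print(semantic_match([[1.0]], [0], [[1.0], [0.5]], [1, 0], 2, 10.0))  # [1]
```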

It works for basic color segmentation already, and here are some of the papers we're integrating currently: http://gitxiv.com/search/?q=segmentation
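For "basic color segmentation", one plausible reading is that each flat color in a doodle marks one region. A hypothetical helper for turning such a doodle into integer labels might snap every pixel to the nearest color in a known palette, so that slightly off-palette or anti-aliased edge pixels still get a label. The palette and function name below are illustrative assumptions, not part of the tool.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def colors_to_labels(doodle: Sequence[Sequence[RGB]], palette: Sequence[RGB]) -> List[List[int]]:
    """Label each pixel with the index of the closest palette color (squared RGB distance)."""
    def closest(px: RGB) -> int:
        return min(range(len(palette)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(px, palette[i])))
    return [[closest(px) for px in row] for row in doodle]


# Usage: palette index 0 = blue ("water"), 1 = green ("trees"); some pixels are slightly off-palette.
palette = [(0, 0, 255), (0, 255, 0)]
doodle = [[(0, 0, 250), (10, 240, 5)],
          [(3, 2, 251), (0, 255, 0)]]
print(colors_to_labels(doodle, palette))  # [[0, 1], [0, 1]]
```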