Hi HN! A year ago, I shared a demo where you could paint a photograph in the
style of one of five paintings, via neural networks, right in the browser.
Since then, a few papers have come out attempting to create a single model
that could be used for all styles. I tried porting one of those models to the
browser today.
A brief summary of how the algorithm works:
- For any particular style, a neural network encodes it into a 100-dimensional
vector that represents the network's "understanding" of the style.
- This vector is fed, along with the content image, to another neural
network that does the style transformation.
This is also how combining two styles works. The mean of the style vectors
of Style A and Style B is calculated and used as the style vector input
to the transformation network.
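
For the curious, the averaging step could be sketched roughly like this. This is a minimal illustration, not the demo's actual code: `blendStyleVectors` and the `weightA` parameter are names made up here, and the weighted form is an assumed generalization (a weight of 0.5 gives the plain mean described above).

```typescript
// Hypothetical helper (not from the demo): blends two style vectors.
// weightA = 0.5 reproduces the plain mean of Style A and Style B.
function blendStyleVectors(
  styleA: Float32Array,
  styleB: Float32Array,
  weightA: number = 0.5,
): Float32Array {
  if (styleA.length !== styleB.length) {
    throw new Error("style vectors must have the same length");
  }
  if (weightA < 0 || weightA > 1) {
    throw new Error("weightA must be in [0, 1]");
  }
  const blended = new Float32Array(styleA.length);
  for (let i = 0; i < styleA.length; i++) {
    blended[i] = weightA * styleA[i] + (1 - weightA) * styleB[i];
  }
  return blended;
}

// Usage with two made-up 100-dimensional vectors standing in for
// the outputs of the style-encoding network.
const styleA = new Float32Array(100).fill(1);
const styleB = new Float32Array(100).fill(3);
const mean = blendStyleVectors(styleA, styleB);
```

The blended vector would then be passed to the transformation network in place of a single style's vector.
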
In any case, I acknowledge the results are not perfect and will not look
good for all combinations of style and content (particularly faces, ugh),
but I think it's a good reason to get excited about what will eventually become
possible using the browser alone.
It would be nice to be able to move the "Stylization strength" bar and see how the image changes, but it's probably too slow for a real-time result :(.
What about making a short video or an animated GIF? Is the transformation smooth? (Is the image at 53.2% similar to the image at 53.3%?) Or is the texture rearranged in a random way?
You could probably precompute a few key points along the bar while the user is browsing, and then reuse them. This would drain phone batteries, though, so perhaps don't enable it by default.
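
A rough sketch of the key-point idea, under some loud assumptions: the stylized image has already been computed at a few evenly spaced bar positions, each result is a flat pixel array, and a plain pixel cross-fade between neighbours is an acceptable stand-in for re-running the network. `approximateFrame` is a made-up name, and the cross-fade only looks right if the transformation really is smooth between key points.

```typescript
// Hypothetical: keyframes[k] holds the stylized pixels computed at
// bar position k / (keyframes.length - 1). Positions in between are
// approximated by linearly cross-fading the two nearest keyframes.
function approximateFrame(
  keyframes: Float32Array[],
  strength: number,
): Float32Array {
  if (keyframes.length < 2) {
    throw new Error("need at least two keyframes");
  }
  // Clamp the bar value to [0, 1] and map it onto the keyframe index range.
  const clamped = Math.min(1, Math.max(0, strength));
  const position = clamped * (keyframes.length - 1);
  // Index of the keyframe at or below the position (capped so lower + 1 exists).
  const lower = Math.min(Math.floor(position), keyframes.length - 2);
  const t = position - lower;
  const a = keyframes[lower];
  const b = keyframes[lower + 1];
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = (1 - t) * a[i] + t * b[i];
  }
  return out;
}
```

Dragging the bar would then only cost a per-pixel blend, and the network would run just once per key point.
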