Hacker News
Generating six-pack abs with Tensorflow (blog.floydhub.com)
13 points by saip 3172 days ago
5 comments

Suppose that data in quantities orders of magnitude greater than that used in this project was scraped from the internet and used to train a model that powers a commercial product.

Is it ethical to sell something that is dependent upon data that people might consider sensitive? Even if they willingly lent their photos to fitness organizations, it's unlikely that they would have predicted that AI would make use of their personal data in the ways it does today.

I'll be honest, this sort of thing didn't cross my mind when I was gathering my small dataset. To me, if someone had willingly let an organization publish their photo on the internet, it was fair game.

That said, having considered your point that people's expectations when handing over their data did not account for things like deep learning, I would be hesitant to adopt the same attitude if I were working on a product or paid service. Right now I can't give a bottom line as to whether I think what you described is ethical, but I do think the general public should be better informed about how their data, even old data, could potentially be used.

Hey Ray, great article! Really loved your approach; I've never imagined a similar application! I think this type of service could be really useful in helping a lot of people stay motivated to achieve their fitness goals, which are usually medium- to long-term.

I have some questions: Have you planned a data pipeline to improve the data generation? What about concept validation? I mean, this is really cool and could really improve the user experience in a gym or fitness center.

Hey, so the idea behind the app is that users would constantly upload photos of themselves, thereby generating a lot of data, including their personal information, upload times, and hopefully lifestyle information, which would make the images generated by the model more realistic.

For this to work, the generated images would already need to be realistic enough. I think one way to "fake it until you make it" would be to interpolate between someone's picture and a fitness model's, using the method introduced in the paper "Generative Visual Manipulation on the Natural Image Manifold".
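
A minimal sketch of the interpolation idea, assuming both photos have already been projected to latent codes of some trained generator. This is plain linear interpolation between two codes, which is a simplification and not the paper's full method; the name `interpolate_latents`, the 100-dimensional codes, and the random stand-in vectors are all hypothetical.

```python
import numpy as np

def interpolate_latents(z_start, z_end, num_steps):
    """Return num_steps latent vectors evenly spaced from z_start to z_end."""
    z_start = np.asarray(z_start, dtype=np.float64)
    z_end = np.asarray(z_end, dtype=np.float64)
    # alpha = 0 gives the starting code, alpha = 1 gives the ending code.
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * z_start + a * z_end for a in alphas])

# Hypothetical latent codes standing in for the projected photos
# (the user's picture and the fitness model's picture).
rng = np.random.default_rng(0)
z_user = rng.standard_normal(100)
z_model = rng.standard_normal(100)

# Each row of `path` would be fed to the generator to render one frame.
path = interpolate_latents(z_user, z_model, num_steps=5)
```

Intermediate rows would give images partway between the two photos, so the app could show a partial transformation before the model itself is good enough.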

I noticed the skin and shorts colors are different in the sample outputs. What techniques can be used to "teach" the network to preserve those features?

When it comes to the shorts color, I think this is an instance of the model "overfitting" to the data. I noticed that, over the course of training, the generated images gradually learned to map the changes in shorts color for each of the ~130 people in the set who had shorts in their photo. When I use overfitting in this context, I mean that the model hasn't learned that certain features are things that humans consider invariant in the image-to-image translation problem, and since it has enough parameters to map every such noisy transformation in such a small training set, we get the artifacts you pointed out.

Data augmentation by slightly scaling pixel intensity uniformly across all color channels would be my first idea as to how to counteract this. After that, I would consider regularization by adding a penalty to the cost function that punishes larger differences in average pixel intensity.
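
A rough sketch of both ideas, assuming images are float arrays in [0, 1] with shape (height, width, channels). Applying the same random factor to the before and after photos, and comparing the generated image against the input image in the penalty, are assumptions on my part; the function names and the `max_change` and `weight` parameters are hypothetical.

```python
import numpy as np

def augment_pair(before, after, rng, max_change=0.1):
    """Scale both images by one random factor shared across all channels."""
    factor = rng.uniform(1.0 - max_change, 1.0 + max_change)
    # Clip so the scaled values stay in the valid [0, 1] range.
    scaled_before = np.clip(before * factor, 0.0, 1.0)
    scaled_after = np.clip(after * factor, 0.0, 1.0)
    return scaled_before, scaled_after

def mean_intensity_penalty(generated, source, weight=1.0):
    """Penalty that grows with the gap in average pixel intensity."""
    gap = abs(float(np.mean(generated)) - float(np.mean(source)))
    return weight * gap

# Toy images: a uniform "before" photo and a slightly darker "after" photo.
rng = np.random.default_rng(0)
before = np.full((4, 4, 3), 0.5)
after = np.full((4, 4, 3), 0.4)

aug_before, aug_after = augment_pair(before, after, rng)
penalty = mean_intensity_penalty(aug_after, aug_before, weight=10.0)
```

The penalty would be added to the generator's cost function, with `weight` tuned so it discourages color shifts without overriding the main objective.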

Oftentimes a body transformation from one image to another results in a redefinition of the relative position and length of edges in a photo (as a person gets leaner).

Do you think that the L1 regularization used in pix2pix - which is intended to preserve low-frequency spatial patterns in the image pairs - will ultimately end up limiting how well cGANs can learn the body transformation image translation task?

Great question! My intuition tells me that L1 regularization could indeed be impacting the performance of the model, but for me personally, the only way to find out for certain would be to test this empirically.
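
One way to set up that empirical test is to treat the L1 weight as a knob. Below is a minimal sketch of a pix2pix-style generator loss, assuming a non-saturating adversarial term and a mean absolute error for the L1 term; the function name and arguments are hypothetical, and the default weight of 100 is the value I understand the pix2pix paper to use, so treat it as an assumption.

```python
import numpy as np

def generator_loss(disc_prob_fake, generated, target, l1_weight=100.0):
    """Adversarial term plus a weighted L1 distance to the target image."""
    eps = 1e-12  # keeps log() finite if the discriminator outputs 0
    # Small when the discriminator believes the generated image is real.
    adversarial = -np.mean(np.log(disc_prob_fake + eps))
    # Mean absolute pixel difference between generated and target images.
    l1 = np.mean(np.abs(target - generated))
    return adversarial + l1_weight * l1

# Toy example: the generated image is off by 0.1 at every pixel and the
# discriminator is fully fooled, so only the L1 term contributes.
target = np.zeros((4, 4, 3))
generated = np.full((4, 4, 3), 0.1)
fooled = np.array([1.0])

loss_default = generator_loss(fooled, generated, target)
loss_no_l1 = generator_loss(fooled, generated, target, l1_weight=0.0)
```

Training the same model at several values of `l1_weight` and comparing the generated body shapes would show whether the L1 term is holding the transformation back.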

I'd like to see more about how this deep learning visualization (or any DL project for that matter) was used in the context of a user-facing product or app - does the user take a selfie and receive a transformed picture back? I've seen the Prisma style transfer app, and I'm curious to see how others are starting to include DL features inside a customer product.

That's exactly how we had the app work in the demo. The "facade" part of things is that our trained model was not used as the backend or exposed through an API in any way. We used it to generate our best transformation of an image I took of myself, and hard-coded an Android app to always return that transformation.

You can see that demo in the Facebook Live video, or check out the code here: https://github.com/rayheberer/burda_hackday