by superconformist 2712 days ago
After I collect a dataset I save duplicate entries with the text reversed. More data is good data. Except now my machine learning robot is dyslexic.
2 comments

Of course you're joking here, but for images people do flip the image to improve the output of their models, and this has been shown to be beneficial. There are other augmentations you can do as well. Collectively they're called "synthetic data".
This is not surprising because if image A is an apple, its mirror image A' is also an apple. Adding A' to your dataset is just plain ol' regularization. It is meant to lower variance, to prevent overfitting.
It will also make the flag of Côte d’Ivoire be perceived as that of Ireland.
It's not surprising, but highly necessary because you might otherwise learn too little about structural features, and too much about unnecessary things such as the background colour of some objects.
The other day I was thinking: suppose I create a black cube in a simulation with a white background and generate a number of PNG images of the cube in different positions. If the weights I get from training classify successfully in further simulations (applying the learned weights to classify, instead of learning and adjusting them), they could also be used to classify a real-life black die on a real-life white table, if the camera is well positioned and the light is right. In this case the learning would "generalize" from the simulation to real life. Maybe this simulated data used to adjust the weights could also be called "synthetic data", "simulated data", or just "artificially generated data".
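To make the idea concrete, here is a minimal sketch of generating that kind of simulated data. It is a simplification, not the setup described above: a flat black square stands in for the cube, images are plain nested lists rather than PNG files, and the names `render_square` and `generate_dataset` are made up for illustration.

```python
def render_square(size, top, left, side):
    """Return a size x size grayscale image as a list of rows.

    The background is white (255) and the square is black (0).
    """
    image = [[255] * size for _ in range(size)]
    for r in range(top, top + side):
        for c in range(left, left + side):
            image[r][c] = 0
    return image


def generate_dataset(size=8, side=3):
    """Render the square at every position where it fits inside the frame."""
    return [
        render_square(size, top, left, side)
        for top in range(size - side + 1)
        for left in range(size - side + 1)
    ]


# Every image shows the same object, only the position changes.
images = generate_dataset()
```

Whether weights trained on images like these carry over to a real camera is a separate question that this sketch does not answer.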
You see, I'm speaking only the truth, merely referring to synthetic data. And to think that they're downvoting me, poor fools.
You may be joking about it, but ML researchers use all kinds of strategies like this one in order to perform what they call _data augmentation_.

In particular for computer vision tasks, creating perturbed variants of your input images (slightly warped, flipped, mirrored, what have you...) can do wonders for the generalization performance.
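For anyone who wants to see the flipping case spelled out, here is a rough dependency-free sketch. It assumes images are lists of pixel rows and that the label survives mirroring, which holds for an apple but not for a flag or a letter. The helper names `mirror` and `augment_with_mirrors` and the toy data are made up for illustration.

```python
def mirror(image):
    """Flip a 2-D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def augment_with_mirrors(dataset):
    """Return the original (image, label) pairs plus a mirrored copy of each.

    Assumes mirroring does not change the correct label.
    """
    augmented = list(dataset)
    for image, label in dataset:
        augmented.append((mirror(image), label))
    return augmented


# Hypothetical toy data: one 2x3 "image" labelled "apple".
dataset = [([[1, 2, 3],
             [4, 5, 6]], "apple")]
augmented = augment_with_mirrors(dataset)
```

In practice the flip is usually applied at random while training rather than stored as a second copy, but the effect on what the model sees is similar.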