HN Mirror



	by superconformist 2712 days ago
	After I collect a dataset, I save duplicate entries with the text reversed. More data is good data. Except now my machine learning robot is dyslexic.


why_only_15 2711 days ago

Of course you're joking here, but for images people do mirror the images to improve the output of their models, and this has been shown to be beneficial. There are other augmentations you can do as well. Collectively, they're called "synthetic data".
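
Here is roughly what the mirroring trick looks like in practice: a minimal sketch, assuming NumPy and an image batch stored as an array with the width on axis 2. The function name `augment_with_flips` is made up for illustration. It copies each label unchanged, which is only valid when mirroring doesn't change what the image shows (fine for an apple, not for text).

```python
import numpy as np

def augment_with_flips(images: np.ndarray, labels: np.ndarray):
    """Return the dataset plus a horizontally mirrored copy of every image.

    Assumes `images` has shape (N, H, W) or (N, H, W, C), i.e. width on axis 2,
    and that mirroring does not change the label.
    """
    flipped = images[:, :, ::-1]                             # reverse the width axis: left <-> right
    aug_images = np.concatenate([images, flipped], axis=0)
    aug_labels = np.concatenate([labels, labels], axis=0)    # labels are reused as-is
    return aug_images, aug_labels

if __name__ == "__main__":
    imgs = np.arange(2 * 2 * 3).reshape(2, 2, 3)   # two tiny 2x3 grayscale "images"
    x, y = augment_with_flips(imgs, np.array([0, 1]))
    print(x.shape, y)                              # (4, 2, 3) [0 1 0 1]
```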

gnulinux 2711 days ago

This is not surprising, because if image A is an apple, its mirror image A' is also an apple. Adding A' to your dataset is just plain ol' regularization. It is meant to reduce variance, i.e. to prevent overfitting.

jzwinck 2711 days ago

It will also cause the flag of Côte d’Ivoire to be perceived as that of Ireland.
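
To make the flag point concrete, a small sketch builds both vertical tricolours as NumPy arrays and applies the same left-right flip used for augmentation. The helper `tricolour` is hypothetical, the RGB values are rough approximations, and exact shades and proportions are ignored.

```python
import numpy as np

# Approximate RGB values, chosen for illustration only (not official flag specs).
ORANGE = (255, 130, 0)
WHITE = (255, 255, 255)
GREEN = (0, 155, 72)

def tricolour(hoist, middle, fly, height=2):
    """Build a tiny vertical tricolour (one pixel column per stripe) as an (H, 3, 3) uint8 array."""
    stripes = [np.full((height, 1, 3), colour, dtype=np.uint8) for colour in (hoist, middle, fly)]
    return np.concatenate(stripes, axis=1)   # stripes side by side, hoist on the left

cote_divoire = tricolour(ORANGE, WHITE, GREEN)   # orange at the hoist
ireland = tricolour(GREEN, WHITE, ORANGE)        # green at the hoist

mirrored = cote_divoire[:, ::-1]                 # the same left-right flip used for augmentation
print(np.array_equal(mirrored, ireland))         # True
```

So a mirrored copy is only a safe extra training example when left and right don't carry meaning for the label.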

Topolomancer 2711 days ago

It's not surprising, but it is highly necessary, because otherwise you might learn too little about structural features and too much about irrelevant things such as the background colour of some objects.
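
One way to stop a model from leaning on background colour is to randomize it. The sketch below assumes you already have a per-pixel foreground mask for each image (e.g. from a simulator or a segmentation label); `randomize_background` is a hypothetical helper written for this example, not a library function.

```python
import numpy as np

def randomize_background(image, foreground_mask, rng):
    """Return a copy of `image` whose background pixels share one random colour.

    image: (H, W, 3) uint8 array; foreground_mask: (H, W) bool array, True on the object.
    """
    out = image.copy()
    colour = rng.integers(0, 256, size=3, dtype=np.uint8)   # random RGB colour
    out[~foreground_mask] = colour                          # only background pixels are overwritten
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.full((4, 4, 3), 255, dtype=np.uint8)   # white background...
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True
    img[mask] = 0                                   # ...with a black object in the middle
    for _ in range(3):
        variant = randomize_background(img, mask, rng)
        print(variant[0, 0], variant[1, 1])         # background colour varies, object stays black
```

Since the object's pixels never change while the background colour does, background colour stops being a useful shortcut for predicting the label.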

paradoxparalax 2711 days ago

The other day I was thinking: suppose I create a black cube in a simulation with a white background and generate a number of PNG images of the cube in different positions. Once my network classifies successfully in further simulations (i.e. applying the learned weights to classify, rather than still learning and adjusting them), the adjusted weights could also be used to classify a real-life black die on a real-life white table, provided the camera is well positioned and the lighting is right. In this case the learning would "generalize" from the simulation to real life. Maybe this simulated data used to adjust the weights could also be called "synthetic data", "simulated data", or just "artificially generated data".
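
The data-generation half of this idea can be sketched in Python with NumPy. As a simplifying assumption, it draws a flat black square (a 2D stand-in for the cube) at a random position and size on a white background, labelled 1, alongside empty white frames labelled 0. A real simulator would also vary pose, lighting, and camera angle, and the function names here are hypothetical. The arrays stay in memory; writing PNGs (e.g. with Pillow's `Image.fromarray(...).save(...)`) is left out.

```python
import numpy as np

def render_scene(rng, size=32, has_cube=True):
    """Render one synthetic grayscale image: white background, optional black square.

    The flat square is a 2D stand-in for the simulated cube.
    """
    img = np.full((size, size), 255, dtype=np.uint8)        # white background
    if has_cube:
        side = int(rng.integers(4, size // 2))              # random apparent size
        r, c = rng.integers(0, size - side + 1, size=2)     # random top-left corner
        img[r:r + side, c:c + side] = 0                     # black "cube"
    return img

def make_dataset(n, seed=0, size=32):
    """Return n synthetic images and their labels (1 = cube present, 0 = empty table)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    images = np.stack([render_scene(rng, size, bool(y)) for y in labels])
    return images, labels

if __name__ == "__main__":
    images, labels = make_dataset(8)
    print(images.shape, labels)   # (8, 32, 32) followed by the 0/1 labels
```

Whether a classifier trained only on images like these transfers to a real die on a real table depends on how closely the rendering matches the camera's view; this sketch makes no claim about that.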

superconformist 2711 days ago

You see, I'm speaking only the truth, merely referring to synthetic data. And to think that they're downvoting me, poor fools.

Topolomancer 2711 days ago

You may be joking about it, but ML researchers use all kinds of strategies like this one to perform what they call _data augmentation_.

In particular, for computer vision tasks, creating perturbed variants of your input images (slightly warped, flipped, mirrored, what have you) can do wonders for generalization performance.
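
Here is a minimal on-the-fly perturbation sketch, assuming NumPy and uint8 images of shape (H, W) or (H, W, C). It combines a random horizontal flip, a small random translation with edge padding (a simple stand-in for the warps mentioned above, not a true elastic or affine warp), and Gaussian pixel noise. `random_perturb` and its default parameters are illustrative assumptions. Libraries such as torchvision ship ready-made transforms (e.g. `RandomHorizontalFlip`).

```python
import numpy as np

def random_perturb(image, rng, max_shift=2, noise_std=5.0):
    """Return a randomly perturbed copy of a (H, W) or (H, W, C) uint8 image."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]                                    # random horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    pad = [(max_shift, max_shift), (max_shift, max_shift)] + [(0, 0)] * (out.ndim - 2)
    padded = np.pad(out, pad, mode="edge")                    # repeat border pixels outward
    h, w = image.shape[:2]
    top, left = max_shift + dy, max_shift + dx
    out = padded[top:top + h, left:left + w]                  # small random translation
    out = out + rng.normal(0.0, noise_std, size=out.shape)    # additive Gaussian pixel noise
    return np.clip(out, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    img = np.random.default_rng(0).integers(0, 256, size=(8, 8), dtype=np.uint8)
    batch = np.stack([random_perturb(img, rng) for _ in range(4)])
    print(batch.shape)   # (4, 8, 8)
```

Because a fresh perturbation is drawn each time an image is used, the model rarely sees exactly the same input twice.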