by janalsncm 618 days ago
Yes, exactly. It’s one of the reasons that an autoencoder’s compressed representation might not work that well for similarity. You need to explicitly push similar examples together and dissimilar examples apart, otherwise everything can end up packed close together in the embedding space.
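
To make "push together / push apart" concrete, here is a rough sketch of a pairwise contrastive loss in NumPy. This is only one common way to do it, not necessarily what anyone upthread had in mind; the function name, the `margin` value, and the toy inputs are all my own assumptions.

```python
import numpy as np

def contrastive_loss(a, b, is_similar, margin=1.0):
    """Pairwise contrastive loss over two batches of embeddings.

    a, b:        arrays of shape (n, d), one embedding pair per row
    is_similar:  array of shape (n,), 1 for similar pairs, 0 for dissimilar
    margin:      assumed hyperparameter; dissimilar pairs farther apart
                 than this contribute no loss
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    is_similar = np.asarray(is_similar, dtype=float)

    dist = np.linalg.norm(a - b, axis=1)
    # Similar pairs are penalized for being far apart.
    pull = is_similar * dist ** 2
    # Dissimilar pairs are penalized only while closer than the margin.
    push = (1.0 - is_similar) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pull + push))
```

If every embedding collapses to the same point, the similar pairs cost nothing but every dissimilar pair pays the full margin penalty, which is the pressure a plain reconstruction loss doesn't give you.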

The next level of understanding is asking how “similar” and “dissimilar” are chosen. As an example, should texts about the same topic be considered similar? Or maybe texts from the same user (regardless of what topic they’re talking about)?
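
A toy illustration of why that choice matters: the same three texts give different training pairs depending on whether "similar" means same topic or same user. The records and the `label_pairs` helper are made up for this example.

```python
from itertools import combinations

# Hypothetical toy records: (text_id, topic, user)
records = [
    ("t1", "cooking", "alice"),
    ("t2", "cooking", "bob"),
    ("t3", "cycling", "alice"),
]

def label_pairs(records, key):
    """Return {(id_a, id_b): label}, where label 1 means 'similar' under `key`."""
    field = {"topic": 1, "user": 2}[key]
    return {
        (a[0], b[0]): int(a[field] == b[field])
        for a, b in combinations(records, 2)
    }

by_topic = label_pairs(records, "topic")
by_user = label_pairs(records, "user")
```

Under the topic definition only (t1, t2) is a positive pair; under the user definition only (t1, t3) is. Same data, two different embedding spaces once you train on those labels.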