by vitorsr 1175 days ago
Thanks for doing this, I will be sure to try it in ML competitions. I really like that you used Armadillo, which is something I personally want to do for my own projects.

Just out of technical curiosity, have you come across any particular developments or empirical evidence on the use of (invertible) data transformations to enhance clustering results? I am currently researching a problem in signal processing related to signal distribution transforms, and I am particularly interested in reading about potential applications. As an example, and since you mention JPM partial funding, how would a copula transformation affect the results of clustering (assuming an inverse exists etc. and we apply the inverse transformation afterwards)?

One thing that's important to note is that k-medoids supports arbitrary dissimilarity measures. In fact, your dissimilarity measure need not even be a metric: it can be negative or asymmetric, fail the triangle inequality, etc.
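
To illustrate why no metric properties are needed, here is a toy sketch (not the package's algorithm or API; the function `k_medoids_exhaustive` and the `dissim` measure are hypothetical, made up for this example). It brute-forces every medoid set on a tiny precomputed dissimilarity matrix, and the only thing it ever does with the matrix is compare sums of entries, so a negative, asymmetric measure works fine.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def k_medoids_exhaustive(
    D: Sequence[Sequence[float]], k: int
) -> Tuple[Tuple[int, ...], List[int], float]:
    """Toy k-medoids by brute force over all C(n, k) medoid sets (tiny n only).

    D[i][m] is the dissimilarity of point i to candidate medoid m. The search
    only compares sums of D entries, so it never assumes symmetry,
    non-negativity, a zero diagonal, or the triangle inequality.
    """
    n = len(D)
    best = None
    for medoids in combinations(range(n), k):
        # Assign each point to the medoid it is least dissimilar to.
        labels = [min(medoids, key=lambda m: D[i][m]) for i in range(n)]
        loss = sum(D[i][labels[i]] for i in range(n))
        if best is None or loss < best[2]:
            best = (medoids, labels, loss)
    return best


# Hypothetical dissimilarity: negative on the diagonal, and moving "upward"
# (a < b) costs an extra 1.0, so dissim(a, b) != dissim(b, a).
def dissim(a: float, b: float) -> float:
    return abs(a - b) + (1.0 if a < b else 0.0) - 0.5


xs = [0.0, 1.0, 10.0, 11.0]
D = [[dissim(a, b) for b in xs] for a in xs]  # rows: points, cols: medoids
medoids, labels, loss = k_medoids_exhaustive(D, k=2)
print(medoids, labels, loss)  # (0, 2) [0, 0, 2, 2] 0.0
```

The asymmetry actually changes the answer here: for the left pair, point 0 is a better medoid than point 1 (total loss 0.0 versus 1.0), because reaching point 0 from point 1 is cheaper than the reverse.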

An implication of this is that doing an invertible data transformation and then clustering is equivalent to clustering the untransformed data with a different dissimilarity measure, namely the original measure applied to the transformed points. So if you're willing to engineer your dissimilarity measure, you can skip the explicit transformation altogether.
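
Writing the k-medoids objective out makes the equivalence explicit. This is a short derivation under assumed notation (the symbols are mine, not the package's): X = {x_1, ..., x_n} is the data set, T is the invertible transformation, and d is the dissimilarity used on the transformed data.

$$
\min_{\{m_1,\dots,m_k\}\subseteq X}\ \sum_{i=1}^{n}\ \min_{1\le j\le k} d\big(T(x_i),\,T(m_j)\big)
\;=\;
\min_{\{m_1,\dots,m_k\}\subseteq X}\ \sum_{i=1}^{n}\ \min_{1\le j\le k} d'(x_i,\,m_j),
\qquad d'(x, y) := d\big(T(x),\,T(y)\big).
$$

Since medoids are always data points and T is one-to-one on X, picking medoids among the transformed points is the same as picking the m_j among the original points, so both sides select the same medoids and the same cluster assignments. Applying T^{-1} afterwards simply maps each medoid back to the original point it came from. Note that d' need not be a metric even when d is, which is fine for k-medoids.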

Without more details, it's hard to say exactly what would happen to the clustering results under custom dissimilarity measures or data transformations -- but our package supports both use cases!