A bigger problem with k-means than choosing the number of clusters is that the model assumes spherical multivariate Gaussians. That's a bad model for most non-toy data.
For example, word2vec uses k-means clustering with a cosine similarity measure [1]. It works very, very well. The caveat is that not many of the optimized variants of k-means will work with that "distance".
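To make the cosine variant concrete, here is a rough sketch of what is often called spherical k-means. It is not word2vec's actual code; the function name `spherical_kmeans` and its parameters are made up for illustration, and it assumes NumPy and no all-zero input rows. Points and centers are normalized to unit length, so a dot product is the cosine similarity and assignment picks the most similar center.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Hypothetical helper: k-means that assigns points by cosine similarity."""
    rng = np.random.default_rng(seed)
    # Normalize rows so that a dot product equals cosine similarity.
    # Assumes no row of X is all zeros.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Start from k distinct data points (fancy indexing returns a copy).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the center with the highest cosine similarity.
        labels = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the old center if a cluster is empty
            total = members.sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                # The new center is the mean direction, rescaled to unit length.
                centers[j] = total / norm
    # Final assignment against the final centers.
    labels = np.argmax(X @ centers.T, axis=1)
    return labels, centers
```

Because everything lives on the unit sphere, vector length is ignored and only direction matters, which is the usual reason to prefer this for word embeddings.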
My feeling is that if "cluster center" is a meaningful concept for a given problem, then k-means is the right tool. The notion of "distance" can be adjusted as well; I think any metric will do. But if you want clusters without a "center of mass", then you face a whole other problem and need other tools.
The model underlying k-means is that all the data is distributed into k hyperspheres. In the simple 2D case, that means drawing k circles around your data points in an X/Y plot such that the within-group variance is minimized. This is bad because in the real world, data is typically grouped in an elliptical or irregular way.
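For reference, a minimal sketch of the standard iteration (Lloyd's algorithm) and the quantity it drives down, the within-cluster sum of squared Euclidean distances. The names `lloyd_kmeans` and `within_cluster_ss` are illustrative, not from any library, and NumPy is assumed. Since the objective only looks at squared distance to a center, every direction is weighted equally, which is where the "spheres" come from.

```python
import numpy as np

def within_cluster_ss(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return float(((X - centers[labels]) ** 2).sum())

def lloyd_kmeans(X, k, n_iter=20, seed=0):
    """Hypothetical helper: plain k-means via Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Start from k distinct data points (fancy indexing returns a copy).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    history = []
    for _ in range(n_iter):
        # Squared distances between every point and every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                # The mean minimizes the squared distance to the cluster's points.
                centers[j] = X[labels == j].mean(axis=0)
        # Record the objective after the assignment and update steps.
        history.append(within_cluster_ss(X, labels, centers))
    return labels, centers, history
```

The recorded objective never increases from one iteration to the next, but it only measures tightness around a center, so elongated or irregular groups can still be split in unhelpful ways.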
[1] https://github.com/tmikolov/word2vec/blob/master/word2vec.c#...