I found myself having to do some clustering a few months back and learned about mean-shift clustering [1]. I found it to be a nice clustering algorithm for when you don't know quantity of clusters ahead of time.
That said, its worth noting that there are algorithms out there for determining optimal number of clusters for k-means (though personally I found them to be costly and subject to overfitting) like using the silhouette coefficient or elbow method [2].
Note also that specifically for one-dimensional data, there is a globally optimal solution to the k-means clustering problem. There is an R package that implements it using a C++ core implementation [1], and also a Python wrapper [2].
This implementation is also surprisingly fast, so you can use it to brute-force check many different numbers of clusters and check using silhouette distance. The advantage over traditional k-means is that you don't need to check multiple initializations for any given number of clusters, because the algorithm is deterministic and guaranteed to find the global optimum.
A bigger problem with kmeans than figuring out the number of clusters is the fact that the model assumes multivariate gaussian spheres. That's a bad model for most non-toy data.
For example, word2vec uses k-means clustering using cosine similarity measure [1]. It works very, very well. The caveat is not many optimization variations of k-means will work with that "distance".
My feeling is that if for a given problem "cluster center" is a meaningful concept then k-means is the right tool. The concept of "distance" can be adjusted as well. I think any metric will do. But if you want clusters without a "center of mass" then you are faced with a whole other problem and need other tools.
The model underlying k-means is that all the data is distributed into k hyperspheres. In the simple 2D case, that means drawing k circles around your data points in a X/Y plot such that the inter-group variance is minimized. This is bad because in the real world, data is typically grouped in an elliptical or irregular way.
This implementation is also surprisingly fast, so you can use it to brute-force check many different numbers of clusters and check using silhouette distance. The advantage over traditional k-means is that you don't need to check multiple initializations for any given number of clusters, because the algorithm is deterministic and guaranteed to find the global optimum.
[1]: https://cran.r-project.org/package=Ckmeans.1d.dp
[2]: https://github.com/djdt/ckwrap