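For example, word2vec uses k-means clustering with cosine similarity as the measure [1]. It works very well. The caveat is that few of the optimized variants of k-means will work with that "distance": cosine similarity is not a true metric, and accelerations that prune comparisons using the triangle inequality assume one.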
[1] https://github.com/tmikolov/word2vec/blob/master/word2vec.c#...
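To make the idea concrete, here is a minimal sketch of k-means with cosine similarity (often called spherical k-means). It is not the word2vec.c code, just an illustration under my own assumptions: the names `normalize`, `dot`, and `spherical_kmeans` are made up for this example, all input vectors are normalized up front, and initial centroids are picked at random from the data.

```python
import math
import random


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(points, k, iters=10, seed=0):
    """Cluster points by cosine similarity; returns (labels, centroids).

    Hypothetical helper for illustration, not taken from word2vec.c.
    """
    rng = random.Random(seed)
    unit = [normalize(p) for p in points]
    # Assumption: start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(unit, k)]
    labels = [0] * len(unit)
    for _ in range(iters):
        # Assignment: on unit vectors the dot product equals cosine similarity,
        # so each point goes to the centroid with the largest dot product.
        labels = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in unit]
        # Update: sum the members, then project back onto the unit sphere.
        for c in range(k):
            members = [p for p, lab in zip(unit, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Two directions with very different magnitudes: cosine ignores the length.
points = [[1.0, 0.1], [10.0, 0.5], [0.2, 1.0], [0.1, 7.0]]
labels, centroids = spherical_kmeans(points, k=2)
```

The only changes from textbook k-means are the argmax over dot products in the assignment step and the renormalization of each centroid. Every point is still compared against every centroid, which is the part the pruning-based speedups try to avoid.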