My LSH implementation is here: https://github.com/loda-lang/loda-rust/blob/develop/script/t...
Example of the 100 most similar documents: https://github.com/neoneye/loda-identify-similar-programs/bl...
LSH can produce false positives, so treat its output as a list of candidates and then run a more in-depth comparison on each candidate pair.
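To illustrate the two-step idea, here is a minimal Python sketch. It is not the code from the links above: it assumes character shingles, MinHash signatures with banding as the LSH step, and exact Jaccard similarity as the "more in-depth comparison". All function names and parameters (`shingles`, `candidate_pairs`, `similar_pairs`, `num_hashes`, `bands`, `threshold`) are made up for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-character shingles of text."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed, item):
    """Deterministic 64-bit hash of item, varied by seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=32):
    """MinHash signature: the minimum hash value per seed."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def candidate_pairs(docs, num_hashes=32, bands=16):
    """LSH step: documents sharing any signature band become candidates."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similar_pairs(docs, threshold=0.5, **lsh_params):
    """Verification step: keep only candidates that pass the exact check."""
    result = []
    for a, b in sorted(candidate_pairs(docs, **lsh_params)):
        score = jaccard(shingles(docs[a]), shingles(docs[b]))
        if score >= threshold:
            result.append((a, b, score))
    return result
```

Calling `similar_pairs({"a": text_a, "b": text_b, ...})` returns only the pairs whose exact similarity meets the threshold, so any false positives from the banding step are filtered out. More bands with fewer rows each makes the LSH step more permissive, which means more candidates to verify.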