Detecting Near-Duplicates for Web Crawling (http://www.wwwconference.org/www2007/papers/paper215.pdf)
SEOMoz has in-memory and db-backed implementations of simhash in Python (https://github.com/seomoz?query=simhash)
Unfortunately, it's also encumbered by a patent: http://www.google.com/patents/US7158961
Unfortunately, it's also encumbered by a patent: http://www.google.com/patents/US7158961