Hacker News
by mindesc 1056 days ago
Those are used. Search for the minimum description length (MDL) principle and entropy-based classifiers. The performance is poor, but the approach definitely exists and is really easy to deploy. I have seen gzip used for plagiarism detection, since text that is similar to something already seen tends to compress better. You can then use the compression ratio as edge weights in a spring model for visualisation. It also works with network communication metadata ...
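For anyone who wants to try the gzip trick, here is a minimal sketch. It uses the normalized compression distance (NCD), which is my assumption about how to turn "compresses better together" into a number; the tools I saw may have used a plain compression ratio instead. The function name `ncd` and the sample texts are made up for illustration.

```python
import gzip


def compressed_size(data: bytes) -> int:
    """Length in bytes of the gzip-compressed data."""
    return len(gzip.compress(data))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance (hypothetical helper).

    Lower values mean the two texts share more structure.
    The value is approximate: gzip header overhead skews it for short inputs.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Illustrative inputs only, not real plagiarism data.
original = (
    "Compression algorithms find repeated patterns in data and replace them "
    "with shorter references, so text that shares many phrases with an "
    "earlier passage costs very few extra bytes to encode."
)
copied = (
    "Compression algorithms find repeated patterns in data and swap them "
    "for shorter references, so text sharing many phrases with an "
    "earlier passage costs very few extra bytes to encode."
)
unrelated = (
    "The ferry leaves the harbour at dawn, carrying crates of oranges, "
    "a dozen bicycles, and one very nervous goat toward the island market."
)

print("original vs copied:   ", round(ncd(original, copied), 3))
print("original vs unrelated:", round(ncd(original, unrelated), 3))
```

The lightly reworded copy should score noticeably lower than the unrelated text. Pairwise scores like these are what you would feed into a spring layout as edge weights, with shorter distance meaning a stronger spring.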