by feliixh 24 days ago
I've been building https://www.accessmrf.com/sources to index Transparency in Coverage files from insurers in a single place. Most recently I've been measuring the overlap between files using MinHash and found that there is a lot of duplicate data. I'm now working out how to make this dataset more digestible by identifying and eliminating the 90+% of it that is duplicate or ghost data.
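For anyone unfamiliar with the technique, here is a minimal sketch of how MinHash can estimate overlap between two files. It is not necessarily how accessmrf does it: the record strings, the `num_perm` value, and the function names `minhash_signature` and `estimate_jaccard` are all assumptions made for illustration, and it uses only the Python standard library.

```python
import hashlib


def minhash_signature(items, num_perm=128):
    """Return a MinHash signature: one minimum hash value per seeded hash function.

    `items` is an iterable of strings (hypothetically, one normalized
    rate record per string). `num_perm` is the signature length.
    """
    encoded = [s.encode("utf-8") for s in set(items)]
    if not encoded:
        raise ValueError("cannot build a signature for an empty set")
    signature = []
    for seed in range(num_perm):
        # A different salt per seed simulates an independent hash function.
        salt = seed.to_bytes(8, "little")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(e, digest_size=8, salt=salt).digest(), "little"
                )
                for e in encoded
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical records standing in for rows from two insurer files.
    file_a = [f"record-{i}" for i in range(0, 300)]
    file_b = [f"record-{i}" for i in range(150, 450)]
    sig_a = minhash_signature(file_a)
    sig_b = minhash_signature(file_b)
    print(f"estimated overlap: {estimate_jaccard(sig_a, sig_b):.2f}")
```

The signatures are small and fixed in size, so files can be compared pairwise without holding their full contents in memory. A larger `num_perm` gives a tighter estimate at the cost of more hashing.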
1 comment

It looks like a complex website! I like it.