by Kubuxu 590 days ago
Probably forks/duplicates of repos in the dataset.
1 comment

Also commits. I imagine that there is a lot of information to gather from the history of repos in addition to the "static view" of a codebase.

However, it doesn't seem trivial to do deduplication in that case without removing relevant/necessary context.
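
To make the concern concrete, here is a toy sketch of what naive commit-level deduplication might look like. Everything in it is an assumption for illustration: the `repos` layout, the `patch_key` and `dedup_commits` helpers, and the idea of keying on a hash of the patch text are made up here, not how any particular dataset was actually built.

```python
import hashlib


def patch_key(patch: str) -> str:
    """Hash a patch after stripping trailing whitespace on each line,
    so trivially different copies of the same change collide."""
    normalized = "\n".join(line.rstrip() for line in patch.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedup_commits(repos):
    """Keep only the first occurrence of each patch across all repos.

    `repos` (hypothetical layout) maps a repo name to an ordered list of
    (commit_id, patch) pairs. The result has the same shape, with later
    duplicates dropped.
    """
    seen = set()
    kept = {}
    for name, commits in repos.items():
        kept[name] = []
        for commit_id, patch in commits:
            key = patch_key(patch)
            if key in seen:
                continue  # same change already kept from an earlier repo
            seen.add(key)
            kept[name].append((commit_id, patch))
    return kept


# Toy data: a fork shares two commits with upstream and adds one of its own.
repos = {
    "upstream": [("a1", "+def f(): pass"), ("a2", "+def g(): pass")],
    "fork": [("b1", "+def f(): pass"), ("b2", "+def g(): pass  "), ("b3", "+g()")],
}
result = dedup_commits(repos)
print(result["fork"])
```

In this toy run the fork keeps only its last commit, which calls `g()`, while the commit that defined `g` survives only in the upstream history. That is the kind of lost context I mean: the duplicates are gone, but the remaining commit no longer makes sense on its own.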