Hacker News
by imron 3444 days ago
Perhaps both?

I also think filtering out comments would improve it, especially because so many source files start with a copyright statement, and the same licenses (MIT, GPL, Apache, etc.) are repeated across many different files, which distorts the results somewhat.
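Here is a minimal sketch of the comment filtering suggested above. It assumes C-style comment syntax (`//` and `/* */`, as in C, C++, Java or JavaScript). The function name `strip_c_comments` is hypothetical and is not taken from the tool being discussed. It keeps string and character literals intact, so a `//` inside a string is not mistaken for a comment. It does not handle other comment syntaxes such as `#`, or raw strings.

```python
import re

# One alternation, scanned left to right: whichever token starts first wins,
# so comment markers inside string literals are consumed as part of the string.
COMMENT_OR_LITERAL = re.compile(
    r"//[^\n]*"              # line comment (stops before the newline)
    r"|/\*.*?\*/"            # block comment, non-greedy, may span lines
    r'|"(?:\\.|[^"\\])*"'    # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'",   # single-quoted character literal
    re.DOTALL,
)


def strip_c_comments(source: str) -> str:
    """Remove C-style comments (including license headers) from source text."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        if token.startswith("/*"):
            # A block comment separates tokens in C, so keep one space.
            return " "
        if token.startswith("//"):
            return ""
        # String or character literal: keep it unchanged.
        return token

    return COMMENT_OR_LITERAL.sub(replace, source)
```

For example, `strip_c_comments("/* Copyright (c) MIT License */ int x; // note")` drops both comments and keeps `int x;`.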


The copyright notices are already filtered out, because there were indeed a lot of them.