by bityard 711 days ago
That is true. However, it also has a staggering amount of duplicate data. I have _heard_ that if you search for almost any particular book, you often get a dozen results of varying sizes and quality, even for the same filetype. It's a hard problem to solve, but if we had something that could pick the "best" copy of each title in the library, Anna could likely drop the zero herself.

As one of their blog posts explains, that's by design: they download every version of any file. The reasoning is that some lower-quality video files have subtitles or better audio than the high-quality version.

Some of the filtering may be possible to automate, but many of the tasks involved will have to be done manually, such as merging video and audio from different sources or syncing subtitles taken from another file.

The number above already excludes duplicates.