Hacker News
by ks2048 673 days ago

    I don’t have 8TB laying around, but we can be a bit more clever.... In particular I cared about a specific column called url. I really care about the urls because they essentially tell us a lot more from a website than what meets the eye.
Am I correct that it is only using the URL of the PDF to do classification? Maybe still useful, but that's quite a different story than "classifying all the PDFs".

It’s just classifying the URLs if that’s the case.

The legwork to classify PDFs is already done, and the authorship of the article can go to anyone who can get a grant for a $400 NewEgg order for an 8TB drive.