by ephemeralkey 1113 days ago
Filter the CommonCrawl data for webmanifest file URLs and publish a concise URL index as a simple text file (on, say, GitHub).

We can use this to bootstrap an index webpage of Progressive Web Apps (PWAs), itself built as a PWA, and give the platform app stores some competition!

CommonCrawl has an AWS Athena SQL query sample here: https://commoncrawl.org/2018/03/index-to-warc-files-and-urls...

We can filter on the index fields

  content_mime_type             STRING,
  content_mime_detected         STRING,

for the MIME type 'application/manifest+json'.
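
As a rough sketch of what that filter could look like, here is a small Python helper that only builds the Athena SQL text; it does not run anything against AWS. The table name "ccindex"."ccindex" and the url, crawl and subset columns are assumptions based on how the Common Crawl columnar index is usually set up, not something stated above, and the function name and the crawl ID in the usage line are just placeholders.

```python
MANIFEST_MIME = "application/manifest+json"


def build_manifest_query(crawl: str, subset: str = "warc") -> str:
    """Build an Athena SQL string selecting URLs whose reported or
    detected MIME type is the web app manifest type.

    Assumed (not from the post): table "ccindex"."ccindex" with
    columns url, crawl and subset.
    """
    # The values are interpolated into SQL text, so accept only
    # plain identifiers such as CC-MAIN-2018-05.
    for value in (crawl, subset):
        if not value.replace("-", "").isalnum():
            raise ValueError(f"unexpected characters in {value!r}")

    return (
        "SELECT DISTINCT url\n"
        'FROM "ccindex"."ccindex"\n'
        f"WHERE crawl = '{crawl}'\n"
        f"  AND subset = '{subset}'\n"
        f"  AND (content_mime_type = '{MANIFEST_MIME}'\n"
        f"       OR content_mime_detected = '{MANIFEST_MIME}')"
    )


if __name__ == "__main__":
    # Placeholder crawl ID; swap in whichever crawl you want to scan.
    print(build_manifest_query("CC-MAIN-2018-05"))
```

Checking both fields is deliberate: content_mime_type is whatever the server sent, and plenty of manifests are served as plain application/json, so content_mime_detected may catch some of those. The query result can be exported from Athena as CSV and trimmed down to the one-URL-per-line text file.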