by ephemeralkey
1113 days ago
Filter the Common Crawl data for webmanifest file URLs and publish a concise URL index as a simple text file (on, say, GitHub). We can use this to bootstrap a Progressive Web App (PWA) index webpage (itself a PWA) and give the platform app stores some competition! Common Crawl has an AWS Athena SQL query sample here: https://commoncrawl.org/2018/03/index-to-warc-files-and-urls... We can filter on the content_mime_type and content_mime_detected fields (both of type STRING) for the MIME type 'application/manifest+json'.
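
To make the filtering step concrete, here is a minimal sketch in Python. It assumes the index rows have already been exported (for example from an Athena query result) as dictionaries, and that each row has a `url` key next to the two MIME fields named above. The `url` key and the helper names `is_manifest`, `manifest_urls` and `write_index` are hypothetical, chosen for illustration only.

```python
MANIFEST_MIME = "application/manifest+json"

# The two MIME fields mentioned in the comment above.
MIME_FIELDS = ("content_mime_type", "content_mime_detected")


def is_manifest(record):
    """Return True if either MIME field reports the manifest MIME type."""
    for field in MIME_FIELDS:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        value = (record.get(field) or "").split(";")[0].strip().lower()
        if value == MANIFEST_MIME:
            return True
    return False


def manifest_urls(records):
    """Return the sorted, de-duplicated URLs of all manifest records.

    Assumes each record stores its address under a "url" key (hypothetical).
    """
    return sorted({r["url"] for r in records if is_manifest(r) and r.get("url")})


def write_index(records, path):
    """Write one URL per line to a plain text file; return the URL count."""
    urls = manifest_urls(records)
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")
    return len(urls)
```

Calling `write_index(rows, "pwa-manifests.txt")` would produce the simple text file described above, ready to commit to a repository. Checking both fields is a deliberate choice here: servers often send manifests with a generic JSON type, so the detected type may catch files that the declared type misses.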