I'm curious: what sort of fidelity have you seen in `grab-site`'s rewritten static asset URLs? Having to fix URLs that weren't properly rewritten ended up taking me the most time.
grab-site doesn't rewrite URLs, it captures the entire request and response of each http request for an asset and stores as-is for archival purposes. The quality of any mutations or transformations performed will be governed by the tooling used to consume the generated WARCs.
More detail on the WARC format can be found below:
Thank you! I wish I had found this sooner to try it out. Similar question to the other one I asked on this thread: what sort of fidelity have you seen in `archivebox`'s rewritten static asset URLs? Having to fix URLs that weren't properly rewritten ended up taking me the most time.
https://github.com/ArchiveTeam/grab-site