https://github.com/harvard-lil/warc-gpt
https://lil.law.harvard.edu/blog/2024/02/12/warc-gpt-an-open...