by sheraz
3905 days ago
I don't have anything public, but I have been exploring strategies for gluing together different tech to accomplish our goals. The latest stack has been:
- wget / wpull / Heritrix to produce .warc files across a single domain
- a file watcher on a folder that processes each .warc into text and pushes it into Elasticsearch with relevant metadata
- a Flask search frontend for querying / results

Happy to share my learnings elsewhere. (I pinged you on email)
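
For anyone curious what the middle step might look like, here is a rough sketch, not the actual code. It assumes the third-party `warcio` package for reading WARC records; the field names (`url`, `fetched_at`, `source_warc`, `text`), the regex-based `html_to_text`, and the polling-style `scan_once` are all made up for illustration. Indexing is passed in as a plain callable so the Elasticsearch client stays out of the parsing code.

```python
import html
import re
from pathlib import Path
from typing import Callable, Iterable, Iterator


def html_to_text(markup: str) -> str:
    """Crude tag stripper (illustrative); a real pipeline would likely use an HTML parser."""
    markup = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", markup)  # drop script/style bodies
    text = re.sub(r"(?s)<[^>]+>", " ", markup)                        # drop remaining tags
    return re.sub(r"\s+", " ", html.unescape(text)).strip()           # decode entities, collapse whitespace


def warc_to_docs(path: Path) -> Iterator[dict]:
    """Yield one document per HTTP response record in a WARC file."""
    # Assumption: warcio is installed (pip install warcio). Imported lazily so
    # the rest of this sketch works without it.
    from warcio.archiveiterator import ArchiveIterator

    with path.open("rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue  # skip request, metadata and warcinfo records
            body = record.content_stream().read().decode("utf-8", errors="replace")
            yield {
                "url": record.rec_headers.get_header("WARC-Target-URI"),
                "fetched_at": record.rec_headers.get_header("WARC-Date"),
                "source_warc": path.name,
                "text": html_to_text(body),
            }


def scan_once(
    folder: Path,
    seen: set,
    index_doc: Callable[[dict], object],
    parse: Callable[[Path], Iterable[dict]] = warc_to_docs,
) -> int:
    """Process every not-yet-seen .warc / .warc.gz file in folder; return the number of docs indexed."""
    count = 0
    for path in sorted(folder.glob("*.warc*")):
        if path in seen:
            continue
        for doc in parse(path):
            index_doc(doc)
            count += 1
        seen.add(path)
    return count
```

Calling `scan_once` in a loop with a short sleep gives a poor man's file watcher. With the official Elasticsearch Python client (8.x), `index_doc` could be something like `lambda doc: es.index(index="pages", document=doc)`, where `"pages"` is a placeholder index name. One thing this sketch ignores is crawlers still writing to a file when it gets picked up, so you would want the crawler to write elsewhere and move finished files into the watched folder.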