To process the data we crawled, we are using ArchiveSpark (https://github.com/helgeho/ArchiveSpark).
Also, Mixnode defaults to Amazon S3 for storage, which was fine with us since we're using EC2 to process the results.