Hacker News
Extracting Subset of Common Crawl Data on Laptop (avilpage.com)
1 point by chillaranand 1310 days ago | 1 comment
chillaranand 1310 days ago
Each monthly Common Crawl dataset is roughly 100 TB. For some use cases, we don't need the entire dataset, just a subset of it.
In this post, let's see how we can extract a subset of the data from our laptop itself.
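The linked post isn't reproduced here, so what follows is only a sketch of one common way to pull a subset onto a laptop, not necessarily the approach the post uses. The idea is to query the Common Crawl URL index (CDX server) for captures that match a URL pattern. Then you download just those records with HTTP Range requests instead of whole multi-gigabyte WARC files. The hostnames, the example crawl ID, and all helper names (`index_query_url`, `fetch_subset`, etc.) are assumptions for illustration.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Assumption: public index server and data host; verify against current Common Crawl docs.
INDEX_SERVER = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"


def index_query_url(crawl_id: str, url_pattern: str) -> str:
    """Build a CDX index query URL that returns one JSON object per line."""
    params = urllib.parse.urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_SERVER}/{crawl_id}-index?{params}"


def parse_index_lines(text: str) -> list[dict]:
    """Parse the newline-delimited JSON returned by the index server."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def range_header(record: dict) -> str:
    """HTTP Range header covering exactly one record inside a WARC file."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1  # the Range end byte is inclusive
    return f"bytes={start}-{end}"


def decode_record(raw: bytes) -> bytes:
    """Each record is stored as its own gzip member, so it decompresses independently."""
    return gzip.decompress(raw)


def fetch_record(record: dict) -> bytes:
    """Download a single WARC record (requires network access)."""
    req = urllib.request.Request(
        f"{DATA_HOST}/{record['filename']}",
        headers={"Range": range_header(record)},
    )
    with urllib.request.urlopen(req) as resp:
        return decode_record(resp.read())


def fetch_subset(crawl_id: str, url_pattern: str, limit: int = 10) -> list[bytes]:
    """Query the index, then fetch only the matching records rather than whole files."""
    with urllib.request.urlopen(index_query_url(crawl_id, url_pattern)) as resp:
        records = parse_index_lines(resp.read().decode("utf-8"))
    return [fetch_record(r) for r in records[:limit]]
```

For example, `fetch_subset("CC-MAIN-2020-24", "example.com/*", limit=5)` would return up to five raw WARC records, each starting with its WARC headers. The crawl ID here is only an example; pick one from the list of available crawls. Because each request transfers only a few kilobytes, this kind of targeted extraction fits comfortably on a laptop.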