by classybull
3210 days ago
This. I had a problem where I needed to scrape roughly 20,000 HTML documents daily, which is normally a pretty slow task: you have to open each file, load it into memory, parse the DOM, and then run all of your selection methods. Run sequentially, it took about 60 minutes a day. Multithreading actually slowed it down because the work was CPU bound. Multiprocessing let me run 12 processes across 8 cores, which took the total processing time down to about 4 minutes. And I was able to write the code in a day. Writing something similar in Java or C++ would have taken me a week.
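For anyone curious what that pattern looks like, here is a rough sketch, assuming Python's standard `multiprocessing` module (the comment above doesn't name the language or library, so that part is a guess). The `pages` directory, the `TitleParser` class, and the "extract the title" step are hypothetical stand-ins for whatever selection logic the real job ran. In CPython, threads tend not to help with CPU-bound parsing because of the global interpreter lock, while separate processes each get their own interpreter.

```python
import multiprocessing as mp
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Hypothetical stand-in for the real selection logic: grabs the first <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html_text):
    """Parse one document and return its title text (empty string if none)."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts).strip()


def process_file(path):
    """Open the file, load it into memory, parse it, and run the extraction."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return str(path), extract_title(text)


def scrape_all(paths, processes=12):
    """Fan the files out over a pool of worker processes."""
    with mp.Pool(processes=processes) as pool:
        # chunksize batches files per task to cut inter-process overhead
        return dict(pool.imap_unordered(process_file, paths, chunksize=64))


if __name__ == "__main__":
    files = sorted(Path("pages").glob("*.html"))  # hypothetical input directory
    results = scrape_all(files)
    print(f"parsed {len(results)} documents")
```

The 12-process default just mirrors the number in the comment; the best value depends on the machine and on how much time each worker spends waiting on disk.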