by hackingforfun
1635 days ago
For web scraping specifically, would you choose NodeJS over Python? If so, why? I'm more familiar with NodeJS, but I'm working with a team that is leaning towards Python for web scraping, which is why I'm asking. They said spinning up multiple processes in Python is easier, so it will work better at scale. I know you can use the Cluster module to run child processes in NodeJS, but in my experience it's a bit of a pain to use. It's also not always required, at least when NodeJS is only serving as a web server (as long as you run multiple NodeJS instances in case one goes down). Web scraping is a bit different though. Curious if you have any thoughts on this.
Concurrency is an important limitation, as you've noticed, but CPython already has the same problem. You would be able to squeeze more req/s out of NodeJS than out of CPython, up to the point where you need to bring in something extra to scale to all the cores of one machine (multiprocessing in Python, something like Cluster in NodeJS), which you wouldn't need in Go/Rust/Java.
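To make the "something extra" concrete on the Python side, here is a minimal sketch of spreading the CPU-bound part of a scraper (parsing pages that have already been fetched) across cores with the standard `multiprocessing` module. The `LinkCounter`, `count_links`, and `count_links_parallel` names and the sample pages are made up for illustration; a real scraper would do much heavier parsing than counting links.

```python
import multiprocessing as mp
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a href=...> tags; a stand-in for real CPU-bound parsing."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1


def count_links(html):
    """Parse one already-fetched page and return its link count."""
    parser = LinkCounter()
    parser.feed(html)
    return parser.links


def count_links_parallel(pages, workers=None):
    """Fan the pages out to worker processes (one per core by default)."""
    # Each worker is a separate interpreter process, so parsing in one
    # worker does not block parsing in another.
    with mp.Pool(processes=workers) as pool:
        return pool.map(count_links, pages)


if __name__ == "__main__":
    # Hypothetical sample input; in practice these would be fetched pages.
    pages = ['<a href="/a">a</a><a href="/b">b</a>', "<p>no links</p>"]
    print(count_links_parallel(pages, workers=2))  # prints [2, 0]
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by re-importing the main module, creating the pool at import time would fail.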
Then of course, to scale further you would need a system to run jobs across machines, and at that point choosing Go over Python wouldn't necessarily matter so much. The difference in performance wouldn't limit what you can do, it would just change what you pay for compute. If your compute costs more but your devs can implement features faster, performance is usually unimportant.