by pdimitar
2902 days ago
The only complication is if you want to use Meeseeks (https://github.com/mischov/meeseeks), which requires the Rust compiler and runtime to be installed because it has native bindings. Meeseeks is useful because it's a bit faster than the default Floki (https://github.com/philss/floki) and because it can handle very malformed HTML.

As for Elixir itself, here's a quick example:

```
# Assume this contains 1000 URLs
urls = [....]

# Runs up to 100 tasks concurrently (lightweight BEAM processes, not OS threads).
# If max_concurrency is omitted, it defaults to the number of CPU cores.
# For I/O-bound tasks, however, it's pretty safe to use much more.
# The stream is lazy, so Enum.to_list/1 is what actually runs it.
results =
  urls
  |> Task.async_stream(&YourScrapingModule.your_scraping_function/1, max_concurrency: 100)
  |> Enum.to_list()
```

It's honestly that simple in Elixir. For finer-grained control the line count is a little bigger, but only a little. Not hundreds of lines for sure.
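To give a rough idea of what "finer-grained control" could look like, here is a minimal sketch. The module `ScrapeSketch`, its `fetch/1` and `run/2` functions, and the option defaults are all hypothetical; `fetch/1` only simulates work and makes no HTTP request. It uses the standard `Task.async_stream/3` options `:timeout`, `:on_timeout` and `:ordered`.

```elixir
defmodule ScrapeSketch do
  # Hypothetical stand-in for a real scraping function: it sleeps briefly
  # instead of making an HTTP request, then returns the URL and its length.
  def fetch(url) do
    Process.sleep(10)
    {url, String.length(url)}
  end

  # Runs fetch/1 over all URLs and collects successes and timeouts separately.
  def run(urls, opts \\ []) do
    stream_opts = [
      max_concurrency: Keyword.get(opts, :max_concurrency, 100),
      # Per-task timeout in milliseconds (assumed default for this sketch).
      timeout: Keyword.get(opts, :timeout, 15_000),
      # Kill a task that times out and emit {:exit, :timeout}
      # instead of exiting the calling process.
      on_timeout: :kill_task,
      # Emit results as they finish rather than in input order.
      ordered: false
    ]

    urls
    |> Task.async_stream(&fetch/1, stream_opts)
    |> Enum.reduce(%{ok: [], failed: 0}, fn
      {:ok, result}, acc -> %{acc | ok: [result | acc.ok]}
      {:exit, _reason}, acc -> %{acc | failed: acc.failed + 1}
    end)
  end
end
```

Calling `ScrapeSketch.run(urls, max_concurrency: 50, timeout: 5_000)` returns a map with the successful results and a count of timed-out tasks. Note that a crash inside `fetch/1` would still take down the caller, because the tasks are linked to it; `Task.Supervisor.async_stream_nolink` is probably the better fit if you need that isolation.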
The better handling of malformed HTML by default is the much bigger deal.