by storyinmemo 3332 days ago
My suggestion for learning: re-write this using futures. Here's a quick example of using similar parallelism to search structured json in gzip'd files: https://github.com/autocracy/snippets/blob/master/search_clo...
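For anyone who can't open the link, here is a rough sketch of the general idea using concurrent.futures. It is not the linked snippet. It assumes each gzip'd file holds one JSON object per line, and the names search_file and search_files are made up for illustration. It uses a thread pool, but ProcessPoolExecutor offers the same map() interface.

```python
import gzip
import json
from concurrent.futures import ThreadPoolExecutor


def search_file(path, key, value):
    """Return the records in one gzip'd JSON-lines file where record[key] == value."""
    matches = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Assumption: only JSON objects are of interest; other values are skipped.
            if isinstance(record, dict) and record.get(key) == value:
                matches.append(record)
    return matches


def search_files(paths, key, value, max_workers=4):
    """Search several files concurrently; results follow the order of `paths`."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, not completion order.
        per_file = executor.map(
            search_file, paths, [key] * len(paths), [value] * len(paths)
        )
        return [record for matches in per_file for record in matches]
```

Called as search_files(paths, "user", "alice"), it returns every matching record from all the files as one flat list.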
3 comments

Basically xargs in python, and I love xargs.

There are some cool examples in the docs of places where xargs fails. This got me very excited; I hope it works in 2.7. https://docs.python.org/3/library/concurrent.futures.html

There is a futures module for Python 2.7 on PyPI.

In simple cases, you could use multiprocessing.Pool():

  results = pool.map(processfile, getpaths())
There is also ThreadPool (multiprocessing.pool.ThreadPool). There are known issues with multiprocessing, especially on Python 2.7, e.g. Ctrl+C handling.
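A slightly fuller sketch of the pool.map() one-liner, in case it helps. The bodies of processfile and getpaths are placeholders I made up (the real ones would do whatever the script needs), and getpaths takes a root directory here purely for illustration. This version uses ThreadPool; multiprocessing.Pool has the same map() method.

```python
import os
from multiprocessing.pool import ThreadPool


def processfile(path):
    """Placeholder worker: return the path and its size in bytes."""
    return path, os.path.getsize(path)


def getpaths(root):
    """Placeholder generator: yield every file path under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def run(root, workers=4):
    """Apply processfile to every path; results keep the order of getpaths()."""
    pool = ThreadPool(workers)
    try:
        return pool.map(processfile, getpaths(root))
    finally:
        # Explicit close/join also works on Python 2.7, where Pool
        # cannot be used as a context manager.
        pool.close()
        pool.join()
```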

Though you don't need threading to manage concurrent subprocesses or IO (Gevent, Twisted, or curio if you use Python 3).
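To illustrate the subprocess half of that point without any third-party library: a minimal sketch using plain subprocess.Popen, not Gevent, Twisted or curio. Popen returns as soon as the child starts, so several children run at once with no threads involved. The function name run_concurrently is made up for this example.

```python
import subprocess
import sys


def run_concurrently(commands):
    """Start every command at once, then collect (returncode, stdout) in order."""
    # Popen does not wait for the child, so all children run concurrently.
    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE) for cmd in commands]
    results = []
    for proc in procs:
        # communicate() reads stdout to the end and waits for this child;
        # the other children keep running in the meantime.
        out, _ = proc.communicate()
        results.append((proc.returncode, out.decode().strip()))
    return results


def demo():
    """Run two tiny Python children side by side."""
    commands = [
        [sys.executable, "-c", "print('first')"],
        [sys.executable, "-c", "print('second')"],
    ]
    return run_concurrently(commands)
```

The event-loop libraries mentioned above go further by multiplexing many sockets and pipes in one thread, which this sketch does not attempt.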

And always use if __name__ == '__main__' if you want to be able to test your code.
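A small sketch of what that guard looks like in practice. The function names are invented for the example. The idea is that importing the module (from a test, say) only defines the functions, and nothing runs until the file is executed as a script. The guard also matters for multiprocessing on Windows, where child processes re-import the main module.

```python
import sys


def count_words(text):
    """Pure function: easy to import and test on its own."""
    return len(text.split())


def main(argv):
    """Script entry point: count the words given as command-line arguments."""
    text = " ".join(argv[1:])
    print(count_words(text))
    return 0


# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main(sys.argv)
```

A test file can then import count_words and call it without triggering main().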
I'm learning. Thanks for letting me know. Will use that.
Thanks. Will learn and use.