| Hello and happy new year! We're excited to introduce the Pyper package for concurrency & parallelism in Python. Pyper is a flexible framework for concurrent / parallel data processing, following the functional paradigm. Source code can be found on [GitHub](https://github.com/pyper-dev/pyper). Key features: Intuitive API: Easy to learn, easy to think about. Implements clean abstractions to seamlessly unify threaded, multiprocessed, and asynchronous work. Functional Paradigm: Python functions are the building blocks of data pipelines. Lets you write clean, reusable code naturally. Safety: Hides the heavy lifting of underlying task execution and resource clean-up. No more worrying about race conditions, memory leaks, or thread-level error handling. Efficiency: Designed from the ground up for lazy execution, using queues, workers, and generators. Pure Python: Lightweight, with zero sub-dependencies. We'd love to hear any feedback on this project! |
But I'm not sure I can use this even though I have a specific use-case that feels like it would work well (high-performance pure Python downloading from cloud object storage). The examples are a bit too simple and I don't understand how I can do more complicated things.
I chunk up my work, run it in parallel and then I need to do a fan-in step to reduce my chunks - how do you do that in Pyper?
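To make the question concrete, here is roughly what that chunk / parallel / fan-in shape looks like with only the standard library. This is a sketch, not Pyper's API: `download_chunk`, `combine`, and `fetch_all` are hypothetical names, and the "download" is faked so the example runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def download_chunk(chunk_id: int) -> bytes:
    # Hypothetical stand-in for fetching one byte range from object storage.
    return f"chunk-{chunk_id};".encode()


def combine(acc: bytes, part: bytes) -> bytes:
    # Fan-in step: fold one chunk's result into the running total.
    return acc + part


def fetch_all(chunk_ids: list[int], workers: int = 4) -> bytes:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order,
        # even when the chunks finish out of order.
        parts = pool.map(download_chunk, chunk_ids)
        # Reduce lazily as results arrive instead of collecting a list first.
        return reduce(combine, parts, b"")
```

Calling `fetch_all([0, 1, 2])` runs the three fake downloads concurrently and folds them into one `bytes` value. The open question is what the equivalent of the `reduce` step looks like inside a Pyper pipeline.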
Can the processes have state? Pure functions are nice, but if I'm reaching for multiprocessing, I need performance, and if I need performance, I'll often want a cache of some sort (for instance, I don't want to pickle and re-instantiate a cloud client every time I download some bytes).
How do exceptions work? Observability? Logs/prints?
Then there's stuff that is probably asking too much of this project, but I get it when I write my own Python pipeline, so it matters to me: rate limiting WIP, cancellation, progress bars.
But if some of these problems are/were solved and it offers an easy way to use multiprocessing in Python, I would probably use it!