Pachyderm is a system for the nouns; this is a system for the verbs.
Polyaxon makes it easy to schedule training on a Kubernetes cluster. The problem this solves is that machine learning engineers generally spend too long running their jobs in series, rather than in parallel. Instead of running one thing and waiting for it to finish, it's both more efficient and better methodology to plan out the experiments and then run them all at once.
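To make the "plan the experiments, then run them all at once" idea concrete, here is a minimal sketch in plain Python. It does not use Polyaxon's API at all; `train` is a hypothetical stand-in with a made-up scoring function, `run_grid` is an illustrative helper, and a local thread pool plays the role a cluster scheduler would play.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train(config):
    """Hypothetical stand-in for a training job; returns a made-up score."""
    lr, dim = config["lr"], config["dim"]
    # Toy objective that peaks at lr=0.1, dim=200. Not a real model.
    return -abs(lr - 0.1) - abs(dim - 200) / 1000


def run_grid(grid, max_workers=4):
    """Plan every configuration up front, then run them concurrently."""
    keys = sorted(grid)
    configs = [dict(zip(keys, values))
               for values in product(*(grid[k] for k in keys))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(train, configs))
    # Best score first; sort on the score only, since dicts are not orderable.
    return sorted(zip(scores, configs), key=lambda pair: pair[0], reverse=True)


results = run_grid({"lr": [0.01, 0.1, 1.0], "dim": [100, 200, 300]})
best_score, best_config = results[0]
```

The point is the shape of the workflow: the full list of configurations exists before anything runs, so evaluation and comparison can be automated instead of done one job at a time.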
Pachyderm is more concerned with versioning and asset management. It's more like Git+Airflow.
Let's say your experiment depends on training word vectors from Common Crawl dumps. You need to download the dump, extract the text you want, and train your word vector models. Pachyderm is all about the problem of caching the intermediate results of that ETL pipeline, and making sure you don't lose track of, say, which month of data was used to compute these vectors. Polyaxon is all about a different problem: there are so many ways to train the word vectors and use them downstream. You want to explore that space systematically, by scheduling and automatically evaluating a lot of the work in parallel.
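As a rough illustration of the caching and provenance side, here is a toy sketch in plain Python. It is not Pachyderm's API or how Pachyderm is implemented; `run_stage`, `lineage`, the three stage functions, and the month value are all made up for the example. Each stage result is keyed by a hash of the stage name, its parameters, and its parents, so repeated work is skipped and any output can be traced back to its inputs.

```python
import hashlib
import json

cache = {}       # key -> stage output
provenance = {}  # key -> {"stage", "params", "parents"}


def run_stage(name, func, params, parents=()):
    """Run func once per unique (stage, params, parents); reuse it afterwards."""
    record = {"stage": name, "params": params, "parents": list(parents)}
    digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode())
    key = digest.hexdigest()[:12]
    if key not in cache:
        cache[key] = func(params, [cache[p] for p in parents])
        provenance[key] = record
    return key


def lineage(key):
    """Walk back through the parents and list every stage with its params."""
    record = provenance[key]
    steps = []
    for parent in record["parents"]:
        steps.extend(lineage(parent))
    steps.append((record["stage"], record["params"]))
    return steps


# Toy stages standing in for download / extract / train (all hypothetical).
def download(params, _inputs):
    return f"dump-{params['month']}"


def extract(_params, inputs):
    return inputs[0].upper()


def train_vectors(params, inputs):
    return f"vectors(dim={params['dim']}) from {inputs[0]}"


d = run_stage("download", download, {"month": "2018-09"})  # example month
e = run_stage("extract", extract, {}, parents=[d])
v = run_stage("train", train_vectors, {"dim": 300}, parents=[e])
```

Calling `lineage(v)` returns the download, extract, and train steps with their parameters, which answers the "which month of data produced these vectors" question; calling `run_stage` again with the same arguments returns the same key without recomputing anything.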
I just want to add to the other comments that Polyaxon focuses on different aspects of machine learning reproducibility than Pachyderm. Polyaxon will be providing a very simple pipelining abstraction to start experiments based on previous jobs, triggers, or schedules, and to run post-experiment jobs, but it will not focus on data provenance the same way Pachyderm does. In fact, Polyaxon and Pachyderm could be used together.
I don't know Pachyderm, but it seems to me quite similar to Storm for creating data pipelines. Polyaxon is useful for training deep learning models on a cluster. I couldn't find any example of how to do that in Pachyderm (there are examples with only one node).