by jamesblonde 2620 days ago
At Logical Clocks, we build a horizontally scalable ML pipeline framework on the open-source Hopsworks platform, based around its feature store and Airflow for orchestration:

* https://hopsworks.readthedocs.io/en/latest/hopsml/hopsML.htm...

* https://www.logicalclocks.com/feature-store/

The choice for the DataPrep stage basically comes down to Spark or Apache Beam. We currently support Spark and plan to add Beam support soon, because of some of the goodies in TFX (TensorFlow Extended).

For distributed hyperparameter optimization and distributed training, we leverage Apache Spark and our own version of YARN that supports GPUs:

* https://www.youtube.com/watch?v=tx6HyoUYGL0
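
To make the idea of distributed hyperparameter search concrete, here is a minimal sketch: independent trials are farmed out to parallel workers and the best-scoring configuration is kept. It is not Hopsworks code. A local thread pool stands in for Spark executors, and `train_and_score` and `grid_search` are hypothetical names with a toy objective in place of a real training run.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_score(params):
    """Toy objective standing in for a real training run (higher is better)."""
    lr, layers = params["lr"], params["layers"]
    return -(lr - 0.01) ** 2 - 0.001 * (layers - 3) ** 2


def grid_search(grid, max_workers=4):
    """Evaluate every combination in `grid` in parallel and return the best one."""
    names = sorted(grid)
    # Expand the grid into one dict of hyperparameters per trial.
    trials = [dict(zip(names, values))
              for values in product(*(grid[n] for n in names))]
    # Each trial is independent, so trials can run on separate workers.
    # On a cluster, these workers would be executors rather than local threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(train_and_score, trials))
    best = max(range(len(trials)), key=scores.__getitem__)
    return trials[best], scores[best]


if __name__ == "__main__":
    best_params, best_score = grid_search(
        {"lr": [0.001, 0.01, 0.1], "layers": [2, 3, 4]}
    )
    print(best_params, best_score)
```

Because the trials share no state, the same pattern maps onto any pool of workers; only the scheduling layer changes.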

For model serving, we support Kubernetes:

* https://hopsworks.readthedocs.io/en/0.9/hopsml/hopsML.html#s...

Our platform is open-source and supports TLS/SSL certificates everywhere. You can download it and try it out; it already runs in several large enterprises in Europe. We have a cluster with >1000 users in Sweden here:

* https://www.hops.site

(Edited for formatting)

1 comments

This is great, thanks for the link. Could you expand on how this workflow would be different from or better than sticking to just something like TFX and TensorFlow Serving? Is it easier to use or more scalable?

It is pretty much the same as TFX, but with Spark for both DataPrep and distributed hyperparameter optimization/training, plus a Feature Store. Model serving is slightly more sophisticated than just TensorFlow Serving on Kubernetes: we route serving requests through the Hopsworks REST API to TensorFlow Serving on Kubernetes. This gives us both access control (clients authenticate and authorize themselves with a TLS certificate) and a log of all predictions in a Kafka topic. We are adding support for enriching feature vectors from the Feature Store in the serving API, but that is not quite there yet.
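
As a rough illustration of the serving path described above (authorize the caller, forward to the model, log the prediction), here is a minimal sketch. It is not the Hopsworks API: `PredictionGateway` is a hypothetical name, the `model` callable stands in for the model server, `allowed_clients` stands in for identities taken from verified client certificates, and a plain list stands in for the Kafka topic.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List, Set


@dataclass
class PredictionGateway:
    """Hypothetical REST-layer stand-in that sits in front of a model server."""
    model: Callable[[Any], Any]          # stand-in for the call to the model server
    allowed_clients: Set[str]            # stand-in for verified certificate identities
    log: List[str] = field(default_factory=list)  # stand-in for a Kafka topic

    def predict(self, client_id: str, instances: List[Any]) -> dict:
        # Access control: reject callers whose identity is not authorized.
        if client_id not in self.allowed_clients:
            raise PermissionError(f"client {client_id!r} is not authorized")
        predictions = [self.model(x) for x in instances]
        # Log every served request together with its predictions.
        self.log.append(json.dumps({
            "client": client_id,
            "instances": instances,
            "predictions": predictions,
        }))
        return {"predictions": predictions}


if __name__ == "__main__":
    gateway = PredictionGateway(model=sum, allowed_clients={"alice"})
    print(gateway.predict("alice", [[1, 2], [3, 4]]))
    print(gateway.log)
```

Putting the check and the logging in one layer in front of the model server means every prediction passes through both, regardless of which model is deployed behind it.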

We intend to support TFX, as we already support Flink. TFX needs Flink/Beam support for Python 3, which is almost there but not quite yet.

It will be interesting to see whether Spark or Beam becomes the horizontally scalable platform of choice for TensorFlow. (PyTorch people don't seem as interested, as they mostly come from a background of not wanting complexity.)