This is great, thanks for the link.
Could you expand on how this workflow is different from, or better than, sticking to something like TFX and TensorFlow Serving? Is it easier to use or more scalable?
It is pretty much the same as TFX, but with Spark for both data prep and distributed hyperparameter optimization/training, plus a Feature Store. Model serving is slightly more sophisticated than just TensorFlow Serving on Kubernetes: we route serving requests through the Hopsworks REST API to TF Serving on Kubernetes. This gives us access control (clients authenticate and authorize themselves with a TLS cert) and lets us log all predictions to a Kafka topic. We are also adding support for enriching feature vectors from the Feature Store in the serving API, but that's not quite there yet.
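To make the serving path a bit more concrete, here is a rough Python sketch of the idea: check the client, forward the request to the model server, and record the prediction. It is not the Hopsworks API. `ServingGateway`, `allowed_clients`, and the `backend` callable are hypothetical names for illustration, the client id stands in for an identity taken from a TLS cert, and a plain list stands in for the Kafka topic. The only real detail is the `/v1/models/<name>:predict` path and `{"instances": ...}` body used by TensorFlow Serving's REST API.

```python
import json
import time
from typing import Callable, Dict, List, Set


def build_predict_request(model_name: str, instances: List[list]) -> Dict[str, str]:
    """Build the path and JSON body for a TensorFlow Serving REST predict call."""
    return {
        "path": f"/v1/models/{model_name}:predict",
        "body": json.dumps({"instances": instances}),
    }


class ServingGateway:
    """Hypothetical gateway: authorize the client, forward, then log the prediction."""

    def __init__(
        self,
        backend: Callable[[str, str], dict],  # assumed: sends (path, body) to the model server
        allowed_clients: Set[str],            # assumed: ids derived from client TLS certs
        prediction_log: List[str],            # stand-in for a Kafka topic
    ) -> None:
        self.backend = backend
        self.allowed_clients = allowed_clients
        self.prediction_log = prediction_log

    def predict(self, client_id: str, model_name: str, instances: List[list]) -> dict:
        # Access control happens before anything reaches the model server.
        if client_id not in self.allowed_clients:
            raise PermissionError(f"client {client_id!r} is not authorized")

        request = build_predict_request(model_name, instances)
        response = self.backend(request["path"], request["body"])

        # Append one record per request; a real setup would publish this to Kafka.
        self.prediction_log.append(json.dumps({
            "timestamp": time.time(),
            "client": client_id,
            "model": model_name,
            "instances": instances,
            "response": response,
        }))
        return response
```

In a real deployment `backend` would be an HTTP POST to the TF Serving endpoint and the log would be a Kafka producer; keeping both as injected dependencies just makes the flow easy to try out locally.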
We intend to support TFX, since we already support Flink. TFX needs Flink/Beam support for Python 3, which is almost there but not quite.
It will be interesting to see which one of Spark or Beam will become the horizontally scalable platform of choice for TensorFlow. (PyTorch people don't seem as interested, as they mostly come from a background of not wanting complexity).