| +1, tomasdpinho. Yes to everything, notably queues everywhere, versioning the models, and the problem of mixing sync and async (go for queues). As a scientist designing risk management systems, I also like to: avoid moving the data; bring the (ML/stats) code to the data; compute in memory when possible to reduce network and disk latency; work on live data instead of copies that drift out of date; and write software that keeps models up to date, because they drift over time too, which is a major, operationally unnoticed, and extremely costly problem. I'm not yet into TensorFlow or MLflow, but I use R, JS, and Postgres, relying on open-source ecosystems (and packages) that are as standard as possible, well maintained, expected to be supported for a long time, and that have as few dependencies as possible. |
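The point about models drifting over time can be made concrete with a small check that compares the data a model was fitted on with what it sees in production. The following is only a minimal sketch, not the commenter's method: it swaps in the Population Stability Index (PSI) as one common drift measure, the function names `population_stability_index` and `needs_refit` are hypothetical, and the 0.2 threshold is a rule-of-thumb assumption rather than anything stated above. It uses only the Python standard library, in keeping with the preference for few dependencies.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    live: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between a reference sample and a live sample.

    Bins are equal-width over the range of the reference sample; live
    values outside that range are counted in the first or last bin.
    """
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        total = len(values)
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / total, eps) for c in counts]

    ref_shares = shares(reference)
    live_shares = shares(live)
    return sum(
        (cur - ref) * math.log(cur / ref)
        for ref, cur in zip(ref_shares, live_shares)
    )


def needs_refit(
    reference: Sequence[float],
    live: Sequence[float],
    threshold: float = 0.2,  # assumed rule-of-thumb cut-off, tune per model
) -> bool:
    """Flag a model input (or score) whose distribution has shifted."""
    return population_stability_index(reference, live) > threshold
```

Run on a schedule against live data, a check like this turns an otherwise unnoticed drift into an explicit signal, for example a message on a queue that triggers a refit of the affected model version.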