by CMCDragonkai 1411 days ago
My company matrix.ai was working on a new cloud orchestration platform, and Nix was core to how customers would package their applications/containers for deployment. The OS development is halted for now while we work on a secrets management system.

So it was only natural to fully dogfood Nix. We also introduced it to clients during our computer vision machine learning consulting work. It was the only way to get reproducible projects involving a complex set of Python dependencies, TensorFlow, cuDNN, CUDA, and Nvidia libraries (there is a very strict set of requirements going all the way down to the hardware). I actually first tried doing it with Ubuntu and apt, and it did not work. Setting up your own nixpkgs overlay is a must in these scenarios.

It is definitely something that is easier to fully dial in when you start from scratch. It's a comprehensive system, so adoption will take time. I always recommend starting with it as a development environment tool first, then considering automating your OS config, user profile, VMs, etc.

1 comments

What was the orchestration system used for? Was it for the case where there were many models that needed to be run one after another? I know that's a huge problem in video processing if you want to increase speed a ton. My company Sieve (see profile) is building infrastructure specifically for running ML models on video, which is why I'm curious.
It was built for AI-driven container orchestration, configuration synthesis from high-level constraints.

Yes, ML workloads are particularly complex, because they have both batch-oriented data flows (training) and service-oriented data flows (inference). There aren't many systems that can adequately express both.