| After building ML systems in various organizations, I collected some of the useful kit I had written into a single library. What you can do with it: * Save (and retrieve) model checkpoints on blob storage, optionally with a content-addressable naming scheme * Load datasets incrementally from blob storage into PyTorch, using a local disk cache * Store your training metrics in SQLite. Design principles: * "Dumb cloud and smart software": I prefer commodity services like object storage and container runtimes to framework-like abstractions (e.g. managed MLflow or similar) * Extend Lightning in the most straightforward way * Let the user assemble a lightweight MLOps process with minimal changes to preexisting model code. Happy to field any questions and receive feedback! The library was refined using Sonnet, but thoroughly checked by eye and hand. |
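To make two of the ideas above concrete, here is a minimal sketch of content-addressable checkpoint naming and SQLite metric logging. It is an illustration only and not the library's actual API: the names `content_addressed_name`, `save_checkpoint`, and `log_metric`, the `metrics` table layout, and the use of a local directory in place of blob storage are all assumptions made for this example.

```python
import hashlib
import sqlite3
from pathlib import Path


def content_addressed_name(payload: bytes, prefix: str = "ckpt", suffix: str = ".pt") -> str:
    """Derive a file name from the SHA-256 digest of the checkpoint bytes.

    Hypothetical helper: identical bytes always map to the same name.
    """
    digest = hashlib.sha256(payload).hexdigest()
    return f"{prefix}-{digest[:16]}{suffix}"


def save_checkpoint(payload: bytes, root: Path) -> Path:
    """Write the checkpoint under its content-derived name.

    A local directory stands in for a blob storage bucket here.
    If the file already exists, the write is skipped.
    """
    root.mkdir(parents=True, exist_ok=True)
    target = root / content_addressed_name(payload)
    if not target.exists():
        target.write_bytes(payload)
    return target


def log_metric(conn: sqlite3.Connection, run_id: str, step: int, name: str, value: float) -> None:
    """Append one metric row, creating the (assumed) table on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(run_id TEXT, step INTEGER, name TEXT, value REAL)"
    )
    conn.execute(
        "INSERT INTO metrics VALUES (?, ?, ?, ?)",
        (run_id, step, name, value),
    )
    conn.commit()
```

With this scheme, saving the same weights twice yields the same path, so duplicate uploads can be skipped, and the metrics can be queried with plain SQL, for example `SELECT step, value FROM metrics WHERE name = 'loss' ORDER BY step`.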