by KRAKRISMOTT
1202 days ago
Do you plan to add data management too? That is the biggest feature offered by your competitors like Weights & Biases. Having a place to dump and load a few hundred gigabytes of data is very important because many on-demand cloud compute services don't offer persistence. Most ML training at scale doesn't use Colab notebooks beyond initial prototyping because it's too expensive. Dealing with a cluster of servers and running Jupyter on them is already annoying enough, so having data management abstracted away makes life a lot easier. https://wandb.ai/site/artifacts Make sure to talk to your users while building this. Some platforms didn't; for example, Grid/Lightning's data management is half-baked: https://docs.grid.ai/features/datastores They only allow mounting one set of data per instance, which is close to useless for any training beyond the most simplistic applications because most data isn't nicely cleaned. You often have to bring together disparate sets of data for multi-modal applications.
|
We plan to add data management features soon too, but primarily on the production side, so that data scientists can safely and securely version the data their AI application encountered in production and use it to refine their model (if allowed).