Hacker News
by ishcheklein 2173 days ago
Hey! DVC maintainer and co-founder here. First of all, congrats, and let me know if we can help you or if you have a collaboration in mind! A few questions: what does the workflow look like? Do you expect users to upload all of their data to your service? And how can the data then be consumed from the platform?

Thanks!

We don't expect users to upload all of their data to our service - the data we're interested in is "metadata": URLs to the raw data, labels, inferences, embeddings, and any additional attributes for their dataset. Users can POST this metadata to our API, and we ingest it from there.
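To make the "metadata only" idea concrete, here is a minimal sketch of what such a POST might carry. The endpoint URL, the field names (`frame_id`, `raw_data_url`, `labels`, `inferences`, `embedding`, `attributes`), and the bearer-token auth are all hypothetical assumptions for illustration, not Aquarium's actual API. The point it shows is that the payload holds a URL pointing at the raw data, never the raw bytes themselves.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint: the comment only says users POST metadata to "our API";
# the real URL, auth scheme, and field names are not given in the source.
API_URL = "https://api.example.com/v1/datasets/my-dataset/frames"


def build_metadata_record(frame_id, raw_data_url, labels, inferences=None,
                          embedding=None, attributes=None):
    """Assemble one metadata record; the raw data is referenced by URL, not embedded."""
    record = {
        "frame_id": frame_id,
        "raw_data_url": raw_data_url,
        "labels": labels,
        "inferences": inferences or [],
        "attributes": attributes or {},
    }
    if embedding is not None:
        # Optional: user-supplied embeddings (see the next paragraph of the thread).
        record["embedding"] = list(embedding)
    return record


def build_ingest_request(records, api_key):
    """Build (but do not send) a JSON POST request carrying the records."""
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
    )


record = build_metadata_record(
    frame_id="frame_0001",
    raw_data_url="https://storage.example.com/images/frame_0001.jpg",
    labels=[{"class": "car", "bbox": [10, 20, 110, 80]}],
    embedding=[0.12, -0.40, 0.33],
)
req = build_ingest_request([record], api_key="YOUR_API_KEY")
# urllib.request.urlopen(req) would send it; not called here because the endpoint is hypothetical.
```

Building the request separately from sending it keeps the sketch testable offline; in practice a client library or `requests` would likely wrap this.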

If users don't provide their own embeddings, we need access to the raw data so that we can run our pretrained models on it to generate embeddings.
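The rule described here, fetching raw data only when an embedding is missing, can be sketched as below. `placeholder_model` is a hypothetical stand-in (a hash-derived vector, not a real pretrained model), and `fetch_raw` is an injected, hypothetical fetcher; both are assumptions chosen so the sketch runs without network access or model weights.

```python
import hashlib
from typing import Callable, Dict, List

Embedding = List[float]


def placeholder_model(raw_bytes: bytes) -> Embedding:
    """HYPOTHETICAL stand-in for a pretrained model: a tiny deterministic vector
    derived from a SHA-256 hash. A real system would run e.g. a CNN here."""
    digest = hashlib.sha256(raw_bytes).digest()
    return [b / 255.0 for b in digest[:4]]


def ensure_embeddings(records: List[Dict],
                      fetch_raw: Callable[[str], bytes],
                      model: Callable[[bytes], Embedding] = placeholder_model) -> List[Dict]:
    """Fill in missing embeddings; raw data is fetched only for records without one."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not mutated
        if rec.get("embedding") is None:
            raw = fetch_raw(rec["raw_data_url"])
            rec["embedding"] = model(raw)
        out.append(rec)
    return out


# Usage with a fake fetcher that logs which URLs it was asked for.
fetched = []


def fake_fetch(url: str) -> bytes:
    fetched.append(url)
    return url.encode("utf-8")


records = [
    {"raw_data_url": "https://storage.example.com/a.jpg", "embedding": [0.1, 0.2]},
    {"raw_data_url": "https://storage.example.com/b.jpg"},
]
result = ensure_embeddings(records, fake_fetch)
```

Only the second record triggers a fetch, which is exactly why supplying your own embeddings removes the need to grant access to the raw data.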

However, if users do provide their own embeddings, we never need access to the raw data - Aquarium operates on embeddings, so the raw data URLs are used purely for visualization within the UI. This is really nice because it means the URLs can be access-restricted so that only the customer can view the data (via URL signing endpoints, by only authorizing IP addresses within the customer's VPN, or through Okta integration), while Aquarium operates on relatively anonymized embeddings and metadata.
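One of the restriction options mentioned, URL signing, generally works by attaching an expiry time and an HMAC over the resource to the URL, so that only holders of a secret (here, the customer's signing endpoint) can mint valid links. The sketch below is a generic illustration of that technique under stated assumptions: the secret, the `expires`/`sig` query parameter names, and the message format are hypothetical, and it assumes the input URL has no existing query string. Real deployments would more likely use their storage provider's own mechanism (e.g. S3 presigned URLs).

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# HYPOTHETICAL secret held by the customer's signing endpoint, not by the UI vendor.
SECRET = b"customer-held-secret"


def _signature(netloc: str, path: str, expires: int) -> str:
    """HMAC-SHA256 over host, path, and expiry time."""
    msg = f"{netloc}{path}:{expires}".encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign_url(url: str, expires_in: int = 300, now: Optional[int] = None) -> str:
    """Return url with expiry and signature appended (assumes no existing query string)."""
    now = int(time.time()) if now is None else now
    expires = now + expires_in
    parsed = urlparse(url)
    query = urlencode({"expires": expires,
                       "sig": _signature(parsed.netloc, parsed.path, expires)})
    return f"{url}?{query}"


def verify_url(signed_url: str, now: Optional[int] = None) -> bool:
    """Accept the URL only if it has not expired and the signature matches."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(parsed.netloc, parsed.path, expires)
    return hmac.compare_digest(sig, expected)  # constant-time comparison


signed = sign_url("https://storage.example.com/images/frame_0001.jpg",
                  expires_in=300, now=1000)
```

Because the signature covers the path and expiry, a link leaked from the UI stops working after a few minutes and cannot be edited to point at a different file.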