| HN Mirror

Thanks!

We don't expect users to upload all data to our service - the type of data we're interested in is "metadata." URLs to the raw data, labels, inferences, embeddings, and any additional attributes for their dataset. Users can POST this to our API and we'll ingest it that way.

If users don't provide their own embeddings, we need access to the raw data so we can run our pretrained models on the data to generate embeddings.

However, if users do provide their own embeddings, we would never need access to the raw data - Aquarium operates on embeddings, so the raw data URLs would be purely for visualization within the UI. This is really nice because it means that we can access restrict URLs so only customers can visualize it (via URL signing endpoints, only authorizing IP addresses within customer VPNs, Okta integration) and Aquarium would operate on relatively anonymized embeddings and metadata.