

by pgao 2164 days ago

Thanks a bunch!

I think the biggest issue with this approach is the requirement for embeddings. It's sometimes hard for a customer to understand which layer to pull out of their net to send to us, so sometimes we just use a pretrained net to generate embeddings: one net for audio, one net for imagery, one net for point clouds, etc.

I'd say that it's harder for this tool to work with structured/tabular data for a few reasons.

One, most structured datasets are domain-specific, so it's not easy to pull a pretrained model off the shelf to generate embeddings - typically we would need a customer to give us the embeddings from their own model in these cases.

Two, neural nets actually aren't the best for certain structured data tasks. Tree-based techniques often get better performance on simpler tasks, which means there's no obvious embedding to pull from the model.

Three, an alternate interpretation is that a feature vector input for structured data tasks is already an embedding! When the input data is low dimensional, you can do anomaly detection and clustering just by histogramming and other basic population statistics on your data, so it's a lot easier than dealing with unstructured data like imagery.
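
To make the histogramming point concrete, here's a rough sketch of what I mean, not anything from our product. It bins each feature, scores a value by how rare its bin is, and sums the scores across features. The function names, the bin count, the synthetic data, and the assumption that features can be scored independently are all just mine for illustration.

```python
import math
import random
from collections import Counter

def histogram_scores(values, bins=10):
    """Score each value by the negative log of its histogram bin's frequency."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal

    def bin_of(v):
        # Clamp so the maximum value lands in the last bin.
        return min(int((v - lo) / width), bins - 1)

    counts = Counter(bin_of(v) for v in values)
    n = len(values)
    return [-math.log(counts[bin_of(v)] / n) for v in values]

def row_scores(rows, bins=10):
    """Sum per-feature scores, treating features as independent (an assumption)."""
    columns = list(zip(*rows))
    per_feature = [histogram_scores(col, bins) for col in columns]
    return [sum(s) for s in zip(*per_feature)]

# Synthetic two-feature table with one hand-planted outlier at the end.
random.seed(0)
rows = [(random.gauss(0, 1), random.gauss(5, 2)) for _ in range(500)]
rows.append((8.0, -20.0))

scores = row_scores(rows)
most_anomalous = max(range(len(rows)), key=scores.__getitem__)
print(most_anomalous, rows[most_anomalous])
```

Rows that fall in sparsely populated bins get high scores, so sorting by score gives you a crude anomaly ranking without any model at all. It ignores correlations between features, which is one reason it only really holds up when the data is low dimensional.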

So I wouldn't say that our tooling wouldn't work for structured data, but more that in those types of cases, maybe there's something simpler that works just as well.