by dodata
2307 days ago
Neat! Congrats on the launch - the demo is very helpful for understanding the product. Having consumed long, painful PDF data dictionaries in the past, this is a big breath of fresh air. Excited to see where Syndetic goes! For me, the most painful part of working with 3rd party data was actually figuring out the "match rate" to internal data. For example, you might be a consumer-facing company that hopes to add more context to its internal data by pulling in 3rd party information for existing clients. To match your internal data to a 3rd party dataset, you usually join on a hashed email (or similar identifier) and see what percentage of your consumer records are present in the 3rd party dataset. Have you thought about something like that with your tool? Maybe you could upload a sample of hashed emails and see how the match rates pan out across different datasets.
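To make the "match rate" idea concrete, here is a minimal sketch in Python. It assumes both sides normalize emails the same way (trim and lowercase) and hash with SHA-256; the function names `hash_email` and `match_rate` and the sample addresses are made up for illustration and are not part of any Syndetic API.

```python
import hashlib


def hash_email(email: str) -> str:
    """Trim and lowercase an email, then return its SHA-256 hex digest.

    Assumption: the vendor applies the same normalization and hash.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def match_rate(internal_hashes, vendor_hashes) -> float:
    """Fraction of distinct internal identifiers found in the vendor set."""
    internal = set(internal_hashes)
    if not internal:
        return 0.0
    return len(internal & set(vendor_hashes)) / len(internal)


# Hypothetical sample data: four internal customers, three vendor records.
internal = [
    hash_email(e)
    for e in ["a@example.com", " B@Example.com ", "c@example.com", "d@example.com"]
]
vendor = [
    hash_email(e)
    for e in ["b@example.com", "d@example.com", "z@example.com"]
]

rate = match_rate(internal, vendor)
print(f"match rate: {rate:.0%}")
```

Only the hashes need to be uploaded, so the raw emails never leave your side. The rate is only meaningful if both parties agree on normalization and hashing up front.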
We're going to be adding a feature where we can flag fields as identifying keys and index them. We'll start with a simple intersection count ("upload 100 stock tickers, see how many records match"). Then we'll add an interactive feature to let a prospective customer generate all of the stats in the dictionary scoped down to the subset of data they care about. It's important to be able to answer questions like "for the 100 tickers I care about, how many NULLs are there for this other column?".
Maybe someday we'll even get into the more general record linkage problem when there's no reliable matching key.
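As a rough illustration of the two stats mentioned above (a simple intersection count, then a NULL count scoped to the keys you care about), here is a small Python sketch. The record layout, the `ticker` and `dividend` column names, and both function names are hypothetical; this is not how the feature is actually implemented.

```python
def intersection_count(records, key, wanted):
    """Count the records whose key column appears in the uploaded key list."""
    wanted = set(wanted)
    return sum(1 for r in records if r.get(key) in wanted)


def scoped_null_count(records, key, wanted, column):
    """Among records matching the uploaded keys, count NULLs in another column."""
    wanted = set(wanted)
    return sum(
        1
        for r in records
        if r.get(key) in wanted and r.get(column) is None
    )


# Hypothetical dataset rows; None stands in for a NULL value.
records = [
    {"ticker": "AAPL", "dividend": 0.24},
    {"ticker": "MSFT", "dividend": None},
    {"ticker": "TSLA", "dividend": None},
    {"ticker": "AAPL", "dividend": None},
]
uploaded = ["AAPL", "MSFT", "GOOG"]

matched = intersection_count(records, "ticker", uploaded)
nulls = scoped_null_count(records, "ticker", uploaded, "dividend")
print(f"{matched} records match, {nulls} of them have a NULL dividend")
```

A linear scan like this is fine for a sample; on a large dataset the key column would need an index so the lookup stays fast.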