by tyingq 1186 days ago
I see some evidence that it handles complex transformations, but there are so many corner cases in the real world, like...

- Different ranges, where the source is, say "size 0-10", and the destination is "S/M/L".

- Various flattening or exploding needs. Like an array of namespaced tags driving a flat list of boolean fields. Or a source with 2 tables and a foreign key being transformed into tags, or flat fields, or a 3-level nesting.

- Encoding/Decoding things. Transforming windows-1252 into utf-8. Decoding base64 (or json, or xml, or...) and storing as fields in the destination.

- Compound transforms, in both directions: two fields into one, or vice versa by splitting on a delimiter.

- Appending a unique suffix/count to some field because the source doesn't enforce uniqueness on the field, but the destination does. Or going the other direction.

- Hundreds of similar patterns.
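
To make a few of the patterns above concrete, here is a rough Python sketch using only the standard library. Everything in it is illustrative: the function names, the S/M/L cut points, the "namespace:value" tag format, and the "-N" suffix scheme are assumptions, not anything taken from the tool being discussed.

```python
import base64


def size_bucket(size: int) -> str:
    """Map a numeric 0-10 size onto S/M/L. The cut points are assumed."""
    if not 0 <= size <= 10:
        raise ValueError(f"size out of range: {size}")
    if size <= 3:
        return "S"
    if size <= 7:
        return "M"
    return "L"


def tags_to_flags(tags, known_tags):
    """Flatten namespaced tags (e.g. 'color:red') into boolean fields."""
    present = set(tags)
    return {tag.replace(":", "_"): tag in present for tag in known_tags}


def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1252 bytes as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


def decode_b64_field(value: str) -> str:
    """Decode a base64 field whose payload is assumed to be UTF-8 text."""
    return base64.b64decode(value).decode("utf-8")


def split_name(full: str, delimiter: str = ","):
    """Split one 'Last, First' field into two fields on a delimiter."""
    last, _, first = full.partition(delimiter)
    return {"last": last.strip(), "first": first.strip()}


def dedupe_with_suffix(values):
    """Append '-N' to repeats so every output value is unique."""
    used = set()
    out = []
    for value in values:
        candidate, n = value, 1
        # Keep bumping the counter until the candidate is unused,
        # which also covers inputs that already look like 'a-2'.
        while candidate in used:
            n += 1
            candidate = f"{value}-{n}"
        used.add(candidate)
        out.append(candidate)
    return out
```

Each of these is a few lines in isolation. The hard part is that every source/destination pair needs its own choice of cut points, delimiters, encodings, and collision rules.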

It's fairly easy to see the breadth if you look at all the dials and knobs on any popular ETL tool.

I'm curious if the idea is to pull all these into scope, or if it's to ignore it, and focus on a deliberately smaller market.

1 comment

We've observed that our system handles most corner cases well, as long as the required context can be inferred from its inputs (either the schemas and their descriptions or the underlying data we sample from). In the worst case, the most you'd have to do is edit the schema descriptions on our platform to include the necessary context (for example, specifying the encoding you expect a field in your end schema to have).

For the compound transform scenario, since we optimize for modularity in the transformations we build, our system prioritizes defining these transformations unless it makes no sense to do so.