by rudolfix 965 days ago
we actually spent several weeks writing an OpenAPI -> dlt pipeline converter. you can check what we've got here: https://github.com/dlt-hub/dlt-init-openapi

we'll continue this project, but I learnt from it that most OpenAPI specs are a mess: hundreds of endpoints, incomplete definitions, no declared relations between endpoints, no unique constraints, etc. so tons of heuristics are needed anyway. but sometimes it works, and that is quite amazing!
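
to give a feel for what "heuristics" means here, below is a minimal sketch of two of them: finding endpoints that return a JSON array, and guessing a primary key from property names. The function names, the rules, and the sample spec are all made up for illustration; this is not how dlt-init-openapi does it, and it does not resolve `$ref`s.

```python
from typing import Any, Dict, List, Optional

# Hypothetical, hand-written spec fragment used only for this example.
SPEC: Dict[str, Any] = {
    "paths": {
        "/users": {"get": {"responses": {"200": {"content": {"application/json": {
            "schema": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
                },
            }
        }}}}}},
        "/users/{user_id}": {"get": {"responses": {"200": {"content": {"application/json": {
            "schema": {"type": "object", "properties": {"id": {"type": "integer"}}}
        }}}}}},
        # An endpoint with an incomplete definition: no response schema at all.
        "/health": {"get": {"responses": {"200": {"description": "ok"}}}},
    }
}


def list_endpoints(spec: Dict[str, Any]) -> List[str]:
    """Return paths whose GET 200 JSON response is declared as an array."""
    found = []
    for path, item in spec.get("paths", {}).items():
        schema = (
            item.get("get", {})
            .get("responses", {})
            .get("200", {})
            .get("content", {})
            .get("application/json", {})
            .get("schema", {})
        )
        if schema.get("type") == "array":
            found.append(path)
    return found


def guess_primary_key(schema: Dict[str, Any]) -> Optional[str]:
    """Guess a key column: 'id' if present, else the first property ending in '_id'."""
    props = schema.get("properties", {})
    if "id" in props:
        return "id"
    for name in props:
        if name.endswith("_id"):
            return name
    return None


user_items = SPEC["paths"]["/users"]["get"]["responses"]["200"]["content"][
    "application/json"]["schema"]["items"]
print(list_endpoints(SPEC), guess_primary_key(user_items))
```

real specs break rules like these all the time (arrays wrapped in a `data` envelope, keys named `uuid`, and so on), which is why one or two heuristics are never enough.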

if your source has a well-defined schema, we support e.g. arrow tables natively. we keep 100% of that schema: https://dlthub.com/docs/blog/dlt-arrow-loading

if you want to define your own schemas, you can do it in many different ways:
- via pydantic models: https://dlthub.com/docs/general-usage/resource#define-a-sche...
- via json-schema like definitions: https://dlthub.com/docs/general-usage/resource#define-schema
- in a schema file: https://dlthub.com/docs/walkthroughs/adjust-a-schema

if you want to enforce schema and data contracts:
- you can use pydantic models to validate data (if you use a pydantic model as a table definition, this is the default)
- we have a soon-to-be-merged schema contract PR: https://github.com/dlt-hub/dlt/pull/594
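
to show what row-level validation with a pydantic model looks like, here is a sketch in plain pydantic, with no dlt involved. The `User` model, the `split_valid` helper, and the sample rows are hypothetical, and what dlt itself does with a failing row is not shown here.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    # Hypothetical model; the field names are made up for illustration.
    id: int
    name: str
    email: Optional[str] = None


def split_valid(rows: Iterable[Dict[str, Any]]) -> Tuple[List[User], List[Dict[str, Any]]]:
    """Validate each row against User; return (validated models, rejected rows)."""
    good: List[User] = []
    bad: List[Dict[str, Any]] = []
    for row in rows:
        try:
            good.append(User(**row))
        except ValidationError:
            bad.append(row)
    return good, bad


rows = [
    {"id": 1, "name": "alice"},
    {"id": "not-a-number", "name": "bob"},  # wrong type for id
    {"name": "carol"},                      # missing required id
]
good, bad = split_valid(rows)
print(len(good), len(bad))
```

whether a rejected row should stop the load, be dropped, or be routed somewhere else is exactly the kind of decision a data contract has to spell out.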

My observation is that more than 1% of people are fine with auto-generated schemas. But that could be selection bias (they use our library because they like it).