| https://forge.foxtrotcommunications.net/ I've been a data engineer for years and one thing drove me crazy: every
time we integrated a new API, someone had to manually write SQL to flatten
the JSON into tables. LATERAL FLATTEN for Snowflake, UNNEST for BigQuery,
EXPLODE for Databricks — same logic, different syntax, written from
scratch every time. Forge takes an OpenAPI spec (or any JSON schema) and automatically: 1. Discovers all fields across all nesting levels
2. Generates dbt models that flatten nested JSON into a star schema
3. Compiles for BigQuery, Snowflake, Databricks, AND Redshift from the
same metadata
4. Runs incrementally — new fields get added via schema evolution,
no rebuilds The key insight is that JSON-to-table is a compilation problem, not a
query problem. If you know the schema, you can generate all the SQL
mechanically. Forge is essentially a compiler: schema in, warehouse-
specific SQL out. How it works under the hood: - An introspection phase scans actual data rows and collects the union
of ALL keys (not just one sample record), so sparse/optional fields
are always discovered
- Each array-of-objects becomes its own child table with a hierarchical
index (idx) linking back to the parent — no manual join keys needed
- Warehouse adapters translate universal metadata into dialect-specific
SQL:
BigQuery: UNNEST(JSON_EXTRACT_ARRAY(...))
Snowflake: LATERAL FLATTEN(input => PARSE_JSON(...))
Databricks: LATERAL VIEW EXPLODE(from_json(...))
Redshift: JSON_PARSE + manual extraction
- dbt handles incremental loads with on_schema_change='append_new_columns' The full pipeline: Bellows (synthetic data generation from OpenAPI specs)
→ BigQuery staging → Forge (model generation + dbt run) → queryable
tables + dbt docs. There's also Merlin (AI-powered field enrichment
via Gemini) that auto-generates realistic data generators for each
field. I built this because I watched teams spend weeks writing one-off
FLATTEN queries that broke the moment an API added a field. Every
Snowflake blog post shows you how to parse 3 fields from a known
schema — none of them handle schema evolution, arbitrary nesting depth,
or cross-warehouse portability. Try it: https://forge.foxtrotcommunications.net Happy to answer questions about the architecture, the cross-warehouse
compilation approach, or the AI enrichment layer. |
Curious how you handle schema drift at the introspection phase specifically when an API starts returning a field as sometimes a string and sometimes an object depending on the endpoint response. Does Forge pick a winner or surface it as a conflict for the user to resolve?