
The sticking point for me, though, is side effects. Once you need to call an external API (maybe to insert vector embeddings, send records to a SaaS service, or update some non-SQL store), you lose the comfortable ACID guarantees and pure SQL elegance. Even if you stage data in a DuckDB table, you still have to process each row or batch with an imperative approach. That’s where I start feeling the friction. SQL is brilliant for purely data-driven transformations; it doesn’t inherently solve "call this remote side-effect function in small batches, handle partial failures, and keep the pipeline consistent."
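To make that concrete, here is a rough sketch of the kind of imperative glue I mean. It is only an illustration under assumptions: the standard-library `sqlite3` module stands in for DuckDB so it runs without extra dependencies, and the `staging` table, its `status`/`attempts` columns, the `drain` function, and the `send_batch` callback are all names I made up, not anything from Sqlflow.

```python
import sqlite3
from typing import Callable, Dict, Sequence


def make_staging(conn: sqlite3.Connection, payloads: Sequence[str]) -> None:
    """Create a hypothetical staging table and load the rows to be sent."""
    conn.execute(
        "CREATE TABLE staging ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " attempts INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO staging (payload) VALUES (?)", [(p,) for p in payloads]
    )
    conn.commit()


def drain(
    conn: sqlite3.Connection,
    send_batch: Callable[[Sequence[str]], None],  # hypothetical remote call
    batch_size: int = 2,
    max_attempts: int = 3,
) -> Dict[str, int]:
    """Send pending rows in small batches and record the outcome per batch."""
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM staging"
            " WHERE status = 'pending' ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            break
        ids = [(row[0],) for row in rows]
        try:
            # The side effect happens outside any database transaction.
            send_batch([row[1] for row in rows])
        except Exception:
            # Partial failure: count the attempt, give up after max_attempts.
            conn.executemany(
                "UPDATE staging SET attempts = attempts + 1 WHERE id = ?", ids
            )
            conn.execute(
                "UPDATE staging SET status = 'failed'"
                " WHERE status = 'pending' AND attempts >= ?",
                (max_attempts,),
            )
        else:
            conn.executemany(
                "UPDATE staging SET status = 'done' WHERE id = ?", ids
            )
        conn.commit()
    # Summarize how many rows ended in each state.
    return dict(
        conn.execute(
            "SELECT status, COUNT(*) FROM staging GROUP BY status"
        ).fetchall()
    )
```

Even this toy version is only at-least-once: if the process dies after `send_batch` returns but before the commit, the same batch goes out again on restart, so the remote side has to tolerate duplicates. That gap is exactly what the database's ACID guarantees can't cover for you.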

Can we unify those worlds? If your project, Sqlflow, manages to let folks stay mostly in SQL—while also elegantly handling side effects—that might be a huge step forward. For strictly data-focused workflows, I’m 100% on board that SQL alone is often the best "DSL" around. The complexity creeps in when we go from "write results to a table" to "call an external system" (possibly with partial commits, retries, or streaming needs). That’s usually where we end up rolling bespoke logic. If Sqlflow can bridge that gap, it’d be awesome. I’ll check it out—thanks for sharing.