Hacker News
by gigatexal 637 days ago
I got my start in an all-Microsoft SQL Server shop.

I hated SSIS. I wished I just had something akin to Python and dataframes to just parse things in a data pipeline. Instead I had some graphical tool whose error messages and deployments left a lot to be desired.

But SQL Server and Microsoft in general make for good platforms for companies: they’re ubiquitous enough that you won’t have trouble finding engineers to work on your stack, there’s support, etc.

2 comments

The key feature of SSIS is parallel dataflow.

You can easily write (and schedule) parallel dataflows in SSIS. Doing the same thing in a general-purpose programming language would be a lot harder.

Also remember that dataflows are streaming pipelines, so SSIS can be very, very fast.
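For comparison, here is a rough sketch of what two independent streaming flows look like in plain Python. Everything in it is hypothetical (the `extract`/`transform`/`run_flow` names, the in-memory sources), and it leaves out the scheduling, buffering, and error handling that SSIS gives you. Also note that threads in CPython mostly help with I/O-bound work, not CPU-bound transforms.

```python
from concurrent.futures import ThreadPoolExecutor


def extract(rows):
    """Yield rows one at a time, standing in for a streaming source."""
    for row in rows:
        yield row


def transform(rows):
    """Streaming transform: drop rows with no amount, convert to cents."""
    for row in rows:
        if row["amount"] is not None:
            yield {"id": row["id"], "cents": round(row["amount"] * 100)}


def run_flow(source_rows):
    """Run one extract -> transform -> load flow; 'load' just collects a list."""
    return list(transform(extract(source_rows)))


# Hypothetical in-memory sources; a real flow would read from files or tables.
orders = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": None}]
refunds = [{"id": 7, "amount": 2.5}]

# The two flows do not depend on each other, so they can run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    orders_future = pool.submit(run_flow, orders)
    refunds_future = pool.submit(run_flow, refunds)
    loaded_orders = orders_future.result()
    loaded_refunds = refunds_future.result()
```

Two flows are manageable by hand like this. The point above is that it gets harder once flows depend on each other or need to be scheduled.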

Anyway, there is BIML, which allows you to create SSIS packages by writing XML. I personally never used it, mainly because its licensing situation seemed weird to me (I think BIML is free, but the tool support is not, and I don't think MS SSDT supports BIML coding).

Yeah I think I never gave it a fair shake. I think — like most things — if understood and used properly it can be amazing.
SSIS is for integrations, and pandas is definitely not. I’m not sure what you’re trying to do with SSIS that you’re also doing with pandas, but it’s probably wrong. SSIS is far more geared to data warehousing integrations, while pandas would be reading a data warehouse and doing stuff with it. SSIS isn’t really meant for processing incoming data, even if you can kind of hack it together to do that.

I will say that when we want “real time” integrations, SSIS is phenomenally bad. But that’s not entirely unexpected for what it is.

We don't need to be so pedantic. Python, as it often is, would be the glue, the connecting part, and pandas (polars, duckdb, anything really) would be the processing part. Then, once processed, the outputs would be placed somewhere, be it an update to a db table or some other downstream thing.
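A minimal sketch of that shape, to make it concrete. To keep it self-contained, a plain-Python aggregation stands in for pandas, and an in-memory SQLite table stands in for the db table. The data, the `process`/`load` functions, and the `customer_totals` table are all made up for illustration.

```python
import sqlite3

# Hypothetical raw input, standing in for whatever the pipeline receives.
raw_rows = [
    {"customer": "acme", "amount": "10.50"},
    {"customer": "acme", "amount": "4.50"},
    {"customer": "globex", "amount": "7.00"},
]


def process(rows):
    """Processing step: total the amounts per customer."""
    totals = {}
    for row in rows:
        name = row["customer"]
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return totals


def load(conn, totals):
    """Output step: write the totals to a table, replacing existing rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals (customer, total) VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


# Glue: wire the processing step to the output step.
conn = sqlite3.connect(":memory:")
load(conn, process(raw_rows))
result = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
```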
I was just saying you should most likely not be doing data processing with SSIS. That’s not what it’s for, even if it can be cobbled into doing some.