Hacker News new | ask | show | jobs
by BrentOzar 4067 days ago
If I was going to pick a relational database system, I'm not sure these would be the criteria I'd use:

CSV support - if you do that much CSV extract/transform/load (or indeed, any kind of ETL work), use an ETL tool. SQL Server comes with SQL Server Integration Services for that kind of thing.

Ergonomics of dropping and creating tables and stored procedures - the author's example is probably the toughest way I can think of to drop a table. It's easier to check sys.all_objects (which will catch anything - functions, views, procs, etc).

Operating system choice - well, in 2015, if you're going to mention that, you should be thinking cloud services anyway. They're both available in the cloud - and at which point, who cares what OS it runs on?

Goes on and on. I'm a SQL Server guy, and if I was going to make a list of how to choose a relational database platform, here's what I'd probably list:

* Up-front cost (license, maintenance)

* Ease of finding staff, and their costs

* Ease of finding tooling, and its cost

* Ease of finding scalable patterns (proven ways to grow it)

I don't think either platform is a loser on those counts - it's hard to go wrong with either one.

4 comments

I mean you have a point on the OS, being in the cloud is a definite bonus. However caring what OS it's running on comes down to something he was brief about in his article. For me, I can manage an Linux server no issue with just a bash prompt, I can't honestly say the same about a Windows Server. Other people might argue Apache and running that on a Windows server is kind of pointless? I don't know, just spit-balling that one.

Obviously Windows would be IIS, which means overall the OS choice is a nice one. The fact that you can choose which one to host with, which means you get something comfortable.

> For me, I can manage an Linux server no issue with just a bash prompt, I can't honestly say the same about a Windows Server.

Have you seen Windows Core and PowerShell? An increasing number of admins are doing just that - it's how we work at Stack Overflow, for example.

I've used PowerShell quite a bit actually, even wrote a few script a couple years ago to work with Active Directory. I still can't see myself managing a server from there. I know configurations and everything are possible through it, like the registry and such. However config files are just so much more friendly to me. Which is probably why I would pick Linux on that. Permissions are also a little bit more basic and harder to screw up in Linux IMO.
It's try-able. But tell me how to setup and configure IIS ARR as a reverse proxy without using the GUI.

SQL Server is probably the best case, since the UI is a wrapper around TSQL and they expose the scripts at any point.

Then there's fun stuff. Like Invoke-WebRequest going super slow because it has a terribly implemented progress bar. That's right: out of the box, you can't download files with the shipped downloader, because it shows progress (like curl) and increases transfer times by a couple orders of magnitude. Come on.

Oh, do you work for SO Brent?
> Oh, do you work for SO Brent?

I just do SQL Server consulting for 'em from time to time. They're ridiculously sharp guys, so they don't need me much anymore.

Fully agree! This article is based on merits that I would hardly consider if making a choice between any relational engine. It is even more disturbing that the author claims to be a data analyst. CSV support.......R U serious? Who in the right mind would spend hundreds of thousands of $$$ for SQL Server Enterprise Edition licences and regret it because PostreSQL can do it better for free? I'm using both and they're both great but not for or in spite of the reasons outlined here. Author needs to educate herself/himself better before publishing this nonsense.
> CSV support - if you do that much CSV extract/transform/load (or indeed, any kind of ETL work), use an ETL tool.

Why? CSVs are portable and much easier to deal with (when exported correctly, which PostgreSQL does and MS SQL does NOT).

Yes. I have seen more than few 100k+ employees companies using CSV for data transfer between their internal systems.

They were usually processed using Oracle external tables though.

Can you drop a recommendation for your favorite startup-friendly ETL tool?
Pentaho if you have no or little money or SSIS (comes with SQL Server licence)
SSIS is the most frustrating, worthwhile bundled app I've seen. It's gotten significantly better but for what is essentially a giant XML writer they sure do like to hide all the options and configurations. I love the API though, so handy.
I don't think so. I used it since it was DTS, not SSIS and whilst Informatica is better at all things ETL, SSIS is great for what it can do out of the box. Besides, if it can't, you can always do it via custom C#or VB extension.

Just finished a multi-terabyte data warehouse project where all data is loaded using SSIS and ControlM and it works great! The dev process was also easy and trouble-free.

I can't tell if I know you now...

Anyways, yes, it's very powerful and an excellent tool but I always have to describe it as awfulsome. SSISDB has made great improvements to the setup and promotion of objects which previously could be a pretty big headache (especially when you're not the one deploying etc). Functions for the various tasks/settings can be hidden and it can take a while to convince newcomers that the options are available, you just have to...poke around a bit. It comes with a substantial amount of controls.

The metadata/conversion of data can be frustrating but automation greater improves this with a quick controlling framework and the sys tables, it also makes moving data across hundreds of tables a breeze. It is very fast and very stable and I'm glad it's getting improvements and having focused development.

I tried Pentaho, but our ETL tasks are being written by engineers (at least right now), and it seemed much clunkier than just writing ruby scripts. Is it worth it in the long run?
If you just need to move data from A 2 B than no. If your transformations are complex and will grow overtime than yes, the time and learning curve are justified IMO.