Hacker News new | ask | show | jobs
by hp50g 4814 days ago
We plugged spring integration, spring batch and jasper in and we're sorted.

The main problem with SSIS and SSRS is that you just can't put it in version control or scale it up inexpensively. Plus to be honest the tooling is shitty and unreliable (says me who spent all day fighting T-SQL debugger HRESULT crashes in SSMS whilst trying to reverse engineer a huge shitty black box SQL sproc into something scalable and testable in Java). Ugh.

We run both massive SQL server and postgres installations.

1 comments

To save me a lot of reading time, is spring batch and integration more or less a drop in for SSIS?

I am looking at Postgres at the minute and the biggest stop for me is mass data imports from our customers on a nightly basis.

I've been keeping an eye on Postgres since version 9 as it has (reportedly) the same features as SQL but for a lot less.

We currently pay £250 a month per processor license and this shoots up costs a lot when you have a cluster of servers. If it's unavoidable then we carry on paying but if there are free alternatives out there then I'll jump at the chance.

The rest of the DB is used for serving sites, no other BI work done really.

EDIT: Typo

Re mass data imports - Postgres offers parallel restore of dumps since version 8.4:

(Quoting the documentation for pg_restore):

-j number-of-jobs --jobs=number-of-jobs Run the most time-consuming parts of pg_restore — those which load data, create indexes, or create constraints — using multiple concurrent jobs. This option can dramatically reduce the time to restore a large database to a server running on a multi-processor machine.

Each job is one process or one thread, depending on the operating system, and uses a separate connection to the server.

The optimal value for this option depends on the hardware setup of the server, of the client, and of the network. Factors include the number of CPU cores and the disk setup. A good place to start is the number of CPU cores on the server, but values larger than that can also lead to faster restore times in many cases. Of course, values that are too high will lead to decreasing performance because of thrashing.

Only the custom archive format is supported with this option. The input file must be a regular file (not, for example, a pipe). This option is ignored when emitting a script rather than connecting directly to a database server. Also, multiple jobs cannot be used together with the option --single-transaction.

http://www.postgresql.org/docs/8.4/static/app-pgrestore.html

Its not a drop in replacement. You have to write code/configuration (but not much). Its definitely suitable for your use case - we use it for the same thing.

Feature set is comparable. We tend to avoid specific platform features as they are a migration risk.

Ha - our SQL license fee is around £60k a machine one off per major drop. We're not dropping that again for 2012. No way. Not when we have 8 machines :)

We have dug ourselves the hole really by using feature specific code and relying on certain features in our structure.

We are slowly digging ourselves out the hole with dependency injection in our code from a Entity point of view and want to go a similar route with our DB.

We are nowhere near yourselves but we have spent £18k on SQL licenses in the past 2 years. It's a lot of cash and I don't think we see the full benefit to be fair. Coupled with the abundance of other Microsoft licenses we pay out for, more than half our server costs are licenses.

I appreciate your comment, it's taken me a step in the right direction to sorting out DBs out.

Out of curiosity, how comprehensive is your test suite? For one company I worked for, we actually found PostgreSQL was outperforming Oracle and because of how comprehensive our test suite was, the lead dev connected to PostgreSQL and got 80% of the test suite passing in one evening. A strong test suite allows you to instantly find out where your app breaks down.

Side note for those who don't believe PostgreSQL can outperform Oracle: we needed a custom data type that we were aggregating over. The projected table size was over a billion rows (last I heard it had reached over 4 billion rows). As I recall, in Oracle, you were limited to writing user-defined data types in SQL or Java (now you can use C). PostgreSQL, being open source, allowed the leave to implement the custom data type in C. The query we needed in Oracle took several minutes to run due to the need to constantly serialize/deserialize the data over the aggregation. The PostgreSQl version returned in a few seconds.

We even hired a well-known Oracle performance consultant who did wonderful things to all of our queries ... except for this one custom data type which left him stumped.

The code change (and risk) can potentially be quite substantial if you're relying on default SQL server behaviour (i.e. not MVCC/snapshot isolation). Code that relies on blocking when it hits rows that are locked by another process get a bit of a nasty surprise during that kind of switch :-).
Always have deterministic known behavior :)

We shot ourselves when we first implemented NHibernate by setting our transaction boundary at the wrong place. This caused all sorts of portability problems.

I would recommend Talend Open Studio for the ETL portion. You'll get the same SQL bulk load feature as SSIS with any Database. http://www.talend.com/products/talend-open-studio

Choose Big Data or Integration downloads.

Thanks for the link, I've been looking around for software that does similar things.

I have been taking a look at SQL Web Edition in the view of coming away from standard that supports SSIS so it's good I'm getting a few options of other DBs too.

Although a majority of our stack is Microsoft based, I would love to get off the Microsoft train when it comes to the DB. It's a massive cost to our business