Hacker News new | ask | show | jobs
by jeffbee 1520 days ago
My only experience in the financial sector indicates the opposite. The firm held its trading history for tens of thousands of accounts going back 60 years in a SQL Server. Anyone who wanted a question answered had to submit it for overnight analysis and get the answer the next day. But in an optimal non-RDMS representation, said trading history could be condensed to a single 200MB flat file that could be queried interactively, in microseconds. Dumping the RDBMS for most use cases pretty much revolutionized the daily experience for the people at that firm.
3 comments

This seems borderline impossible? I'd be curious if there were missing indexes or bad index selection or efforts to do things like joins? Normally for very large data sets you can shard by account if needed if that's the common filter if there is some insane table scan. Audit stuff tends to be "sparse" so if you can get an index which just tells you which pages have results, that is usually a 100X speedup with a pretty small index size.

But agreed, a daily dump to something can go a long way to unlocking other tools - in govt this is especially true not because the DBMS is hard to use, but because so many layers of consultants and others in the way it's not usable.

SQL Server can easily handle such datasets if columnstore is employed. I wouldn't be surprised if a single weekend of building a proper index (...being very generous here) wouldn't make their DB go literally 100x faster.
The problem at that firm wasn't the SQL Server it was the RDBMS mindset which led them to do joins between tables of journaled trades and much larger tables of corporate actions going back decades, instead of materializing a more useful intermediate result. This is my main beef with RDBMSs: they lead to cognitive hazards of this type. It's not that the databases are themselves naturally bad systems.
Were they running this on a 386? Why didn't they optimize their database?
It was "optimized" for the aesthetic concerns of RDBMS purists and for ease of implementation of certain other systems, in other words it was optimal for the developers and severely sub-optimal for the users, which is a problem endemic to RDBMS fandom.
So it was basically "write optimized", fully normalized, required at least a half dozen joins to get anything out of it?