by orenmazor
4676 days ago
We started out with mondrian+mysql, but quickly had to drop mysql ("mysql is great until you want to put data into it and then get it back out again" - unattributed, to protect the guilty). In the beginning our primary work was with postgresql, which in my opinion is a pretty solid database to work with overall. We partitioned all of our dimension and fact tables, which helped a great deal. Our aggregates were a problem specific to us, so we couldn't cheat the usual way that reporting servers do. The other problem is that our stack was suddenly full of things like java and olap and postgresql, which made onboarding people who wanted to help, and even just debugging, a pain. I like that comment about the coalface/bookworm, but sometimes it takes somebody on the outside to see what I'm missing.
You can do very well at much, much lower cost in the Python world: pandas, PyTables, or even just straight numpy.
Seriously, any of these would cut report-generation time to almost nothing. You'd just have to make your ETL fast enough to feed it, and how well that can be solved depends on how you store the original data ("pre-facts").
The book written by this guy http://blog.wesmckinney.com/ (and the guy himself, if you can get him) will probably get you much further than more experimenting.
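
To make the pandas suggestion concrete, here is a minimal sketch of an OLAP-style rollup done in memory: a fact table joined to a dimension table, then summed by month and region with `pivot_table`. The schema (`facts`, `shops`, `shop_id`, `region`, `amount`) and the random data are hypothetical, chosen only for illustration; a real setup would load the facts from whatever the ETL produces.

```python
import numpy as np
import pandas as pd

# Hypothetical star schema: one fact table plus one dimension table.
# Column names and values are illustrative only (random data).
rng = np.random.default_rng(0)
n = 100_000
facts = pd.DataFrame({
    "date": pd.Timestamp("2012-01-01") + pd.to_timedelta(rng.integers(0, 90, n), unit="D"),
    "shop_id": rng.integers(1, 4, n),           # shop ids 1..3
    "amount": (rng.random(n) * 100).round(2),
})
shops = pd.DataFrame({"shop_id": [1, 2, 3], "region": ["east", "west", "east"]})

# Join facts to the dimension, then roll up by month and region,
# roughly the shape of answer an OLAP cube query would return.
joined = facts.merge(shops, on="shop_id", how="left")
joined["month"] = joined["date"].dt.to_period("M")
report = pd.pivot_table(
    joined,
    index="month",
    columns="region",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)
print(report)
```

On a laptop a rollup like this over 100,000 rows typically finishes in well under a second; whether that holds for your data depends on its size and on fitting it in memory.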