by johnnyhillbilly 3260 days ago
Can anybody give me an example of any dbms that spans the OLTP/OLAP gap properly _on the same data_?

Potentially added constraints:

- ACID for OLTP
- ## TB+
- 30-way analytical joins on complex criteria and multiple data sets with billions of entries
- fast iterations on data prep for analytics, so analysts can make, find and correct errors
- proper workload management (almost no "stupidly designed" queries)
- HA
- HPC

I'm asking because I can't see this without hw and sw being integrated to allow for it (appliance). Are there any cloud offerings that live up to this?

EDIT: Formatting got mangled on submit.

6 comments

Oracle does pretty well running OLTP and analytics workloads concurrently against the same schema. While I don't generally recommend doing so, you can fire off massive scans or complex joins against a production system doing thousands of transactions per second and it handles it just fine.

In my view, one reason that we don't see huge demand for this combination is that the schema that makes sense for analytics is often different from that which makes sense for the online system.

Not that I'm aware of. It's tricky because the requirements for data storage format, query scan method, data distribution, etc. are different for OLTP and OLAP. However, you can replicate the data to both an OLTP and an OLAP database and use both of them at the same time. That's what people usually do. In fact, even if there is a database for this use case, it would probably do the same thing internally so that you don't need to do it at the application level.
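To make the replicate-to-both approach concrete, here is a minimal in-memory sketch, not any particular product's design. The `RowStore`, `ColumnStore`, `change_log`, and `apply_changes` names are hypothetical and exist only for illustration: writes land in a row-oriented store, and an ordered change log is replayed into a column-oriented copy that serves the scans.

```python
from collections import defaultdict


class RowStore:
    """OLTP side (hypothetical): rows keyed by primary key for cheap point writes."""

    def __init__(self):
        self.rows = {}
        self.change_log = []  # ordered (pk, row) changes waiting to be replicated

    def upsert(self, pk, row):
        self.rows[pk] = dict(row)
        self.change_log.append((pk, dict(row)))


class ColumnStore:
    """OLAP side (hypothetical): one list per column for cheap full-column scans."""

    def __init__(self, columns):
        self.columns = {name: [] for name in columns}
        self.position = {}  # pk -> index into the column lists
        self.applied = 0    # number of change-log entries already replayed

    def apply_changes(self, change_log):
        # Replay only the entries that have not been applied yet.
        for pk, row in change_log[self.applied:]:
            if pk in self.position:
                idx = self.position[pk]
                for name in self.columns:
                    self.columns[name][idx] = row[name]
            else:
                self.position[pk] = len(self.position)
                for name in self.columns:
                    self.columns[name].append(row[name])
        self.applied = len(change_log)

    def sum_by(self, group_col, value_col):
        # Analytical query: touches only the two columns it needs.
        totals = defaultdict(int)
        for group, value in zip(self.columns[group_col], self.columns[value_col]):
            totals[group] += value
        return dict(totals)


oltp = RowStore()
olap = ColumnStore(["region", "amount"])
oltp.upsert(1, {"region": "eu", "amount": 10})
oltp.upsert(2, {"region": "us", "amount": 5})
oltp.upsert(1, {"region": "eu", "amount": 12})  # update to an existing row
olap.apply_changes(oltp.change_log)
totals = olap.sum_by("region", "amount")
```

The analytical copy is only as fresh as the last `apply_changes` call, which is the replication lag you accept with this setup.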
I'm one of the founders at MemSQL, which does what you describe.
Totally tangential, but I'm curious if you could elaborate a bit on how compiled queries [1] work in MemSQL? Am I correctly guessing that it completely ditches the traditional Volcano-style iterator model, in a manner similar to HyPer [2]?

[1] https://docs.memsql.com/v5.8/docs/code-generation

[2] http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
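For readers unfamiliar with the distinction in the question above, here is a rough sketch of the two execution styles. It says nothing about how MemSQL actually works. The `Scan` and `Filter` classes and both functions are made-up illustrations: the first version pulls one row at a time through `next()` calls on each operator, and the second is the kind of fused loop a compiling engine might generate for the same plan.

```python
class Scan:
    """Leaf operator: yields rows one at a time from an in-memory list."""

    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.it = iter(self.rows)

    def next(self):
        return next(self.it, None)  # None signals end of input


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row


def volcano_sum(rows, threshold):
    # Iterator model: one next() call per operator per row.
    plan = Filter(Scan(rows), lambda r: r["amount"] > threshold)
    plan.open()
    total = 0
    while (row := plan.next()) is not None:
        total += row["amount"]
    return total


def fused_sum(rows, threshold):
    # Compiled style: scan, filter, and aggregate collapsed into one loop,
    # with no per-row calls across operator boundaries.
    total = 0
    for row in rows:
        if row["amount"] > threshold:
            total += row["amount"]
    return total


sample = [{"amount": 5}, {"amount": 15}, {"amount": 25}]
```

Both functions return the same result for the same input; the difference is only in how much per-row call overhead sits between the operators.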

Looks like a sound design based on a cursory inspection :)

Question(s): Do you offer any appliances? The reason why I am asking is for computationally intense workloads where the same data may be shuffled around multiple times between processors. Can one e.g. set up MemSQL with RDMA over Infiniband?

MemSQL engineer here.

No, we do not offer appliances. We are a software only solution. I do not know of any deployments where RDMA is being utilized today. I'm interested in your use case. If you're so inclined, join chat.memsql.com (my UN is eklhad) and we can converse a bit more rapidly.

Thank you guys for your answers :)

I am charting the landscape of distributed database systems (federated and homogeneous). Node interconnectivity is just one of many potential bottlenecks.

With a sufficiently complex query, redistribution of data by hash must occur a number of times for linear scalability (based on my understanding). Ethernet-based interconnectivity typically suffers from high CPU utilization and various QoS issues for this particular use case. This also seems to apply to Ethernet-based fabric offerings, though I haven't kept up with that field for a couple of years.
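A toy sketch of what redistribution by hash means for a single join step, assuming both inputs start out partitioned on something other than the join key. All function names here are hypothetical, and `zlib.crc32` is just a convenient stable hash for the example. Every row whose destination differs from its current node is a row that would cross the interconnect.

```python
import zlib
from collections import defaultdict


def target_node(key, num_nodes):
    # Stable hash so equal join keys always map to the same node.
    return zlib.crc32(str(key).encode()) % num_nodes


def redistribute(partitions, key_col, num_nodes):
    """Reshuffle rows so each node holds all rows for its share of the keys."""
    shuffled = [[] for _ in range(num_nodes)]
    rows_moved = 0
    for source, rows in enumerate(partitions):
        for row in rows:
            dest = target_node(row[key_col], num_nodes)
            shuffled[dest].append(row)
            if dest != source:
                rows_moved += 1  # this row would travel over the network
    return shuffled, rows_moved


def local_join(left_rows, right_rows, key_col):
    # Plain hash join that runs independently on each node after the shuffle.
    index = defaultdict(list)
    for left in left_rows:
        index[left[key_col]].append(left)
    return [{**left, **right}
            for right in right_rows
            for left in index.get(right[key_col], [])]


def distributed_join(left_parts, right_parts, key_col):
    # Assumes both inputs are spread over the same number of nodes.
    num_nodes = len(left_parts)
    left, moved_left = redistribute(left_parts, key_col, num_nodes)
    right, moved_right = redistribute(right_parts, key_col, num_nodes)
    joined = []
    for node in range(num_nodes):
        joined.extend(local_join(left[node], right[node], key_col))
    return joined, moved_left + moved_right


left_parts = [[{"id": 1, "a": "x"}, {"id": 2, "a": "y"}], [{"id": 3, "a": "z"}]]
right_parts = [[{"id": 3, "b": 1}], [{"id": 1, "b": 2}, {"id": 1, "b": 3}]]
joined, rows_moved = distributed_join(left_parts, right_parts, "id")
```

A query that joins on a different key at each step repeats this shuffle once per step, which is why the interconnect matters more as queries get more complex.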

If you guys are encountering performance issues connected to either RAM=>CPU loading or data redistribution between nodes, you may want to keep this in mind.

I may get in touch via chat at a later time, as I'm rather more interested than average in HPC database systems :) The more offerings, the better!

Check SnappyData?
Netezza, but IBM bought it and killed it.
SAP HANA?
It's in-memory so you also need TBs of memory in order to be able to use it for analytical workloads.