Hacker News new | ask | show | jobs
by quangio 2785 days ago
What is the benefit of using MemSQL over some other free in memory databases like Apache Ignite ? I see that they have better documentation and support (edit: + competition on Codeforces which winners rarely receive their T-Shirts). What about other things?
3 comments

MemSQL CEO here. There are a few:

- MemSQL is transactional and writes transactions on disk

- MemSQL has an excellent implementation of SQL with mature query optimization and query execution. And it get better every release. This is from 6.5 https://www.memsql.com/blog/6.5-performance/

- MemSQL has in-memory and on-disk data storage so you can use MemSQL to store petabytes

- MemSQL has columnstores and vectorized query processing: https://news.ycombinator.com/item?id=16617098

- MemSQL supports geospatial, fulltext search, and json

- MemSQL allows you to stream data from kafka in one command: https://docs.memsql.com/sql-reference/v6.5/create-pipeline/

Given the availability of open source solutions why would I in 2018 build a critical part of my application on a closed platform?

Genuinely not intended as snark, I'm just curious why memsql is so compelling that I would consider it.

To build on the list of features above, here's a blog post from Pandora where they go into the details of what they use MemSQL for and some of the alternatives they looked into for their use case: https://engineering.pandora.com/using-memsql-at-pandora-79a8...
Isn't that what the previous answer was trying to say? Paraphrasing "MemSQL has these features which we think Apache Ignite and others do not."

All things being equal, I agree that open source solutions are the best. Things are just not always equal.

> Isn't that what the previous answer was trying to say?

Truth be told, the list if features is not very compelling. I mean, JSON support is not a reason to pick a commercial dbms over a FLOSS one.

There is also Tarantool which is another in-memory/SQL DB and is Open Source.

https://github.com/tarantool/tarantool

I would also like to see a response to this.
Those are some nice features. Unfortunately, your DBMS doesn't do some basic things like return consistent results for a simple SQL query with a group by and having clause. I admit this might be a configuration issue on my company's end, but if so, that is a terrible configuration option and should be hidden away, opt in only, with a huge wall of warnings so people don't actually enable it except in extreme circumstances.
I've seen you mention these inconsistent results twice here in this this thread, but have worked at MemSQL for 5 years and never heard of such an issue. Have you reached out to see if maybe your query / data is not what you expect? I've seen inconsistent results only once, and it was because the default date formats across RDMBSs were different (and was not anticipated).
How would the query/data not be what I expect if If I'm writing the query myself, and looking directly at the sql table definition to create it?

Beyond those considerations, why would the same exact same query (executed several times in rapid succession from the console) produce vastly different results? Also, I should clarify, rewriting the query from "select ... from xyz group by ... having ..." to "select ... from (select * from xyz where ...) group by ..." made the inconsistency goes away, without changing the filtering clause. That does not inspire confidence.

Can you post the full schema and query? Are you sure you are not projecting columns that are not part of the group by expression?
I really appreciate the offer. I got the go-ahead to share this information, where should I direct it?
If you’re hitting the disk, aren’t you losing some of the advantages of using an in-memory database in the first place? Or would it still be more performant than a traditional RDBMS due to optimized in memory data structures?
There is a compound effect in building memory optimized features and being a distributed database that can put a lot more cores at work and cache a lot more data in a cluster:

- In-memory row stores. Super fast for updates and point lookups

- Memory optimized hash joins that minimized cash misses. Great for analytical/reporting use cases

- Vectorization for columnstore query processing. Super fast aggregations that work best when data is cached in memory

Apache Ignite is an in-memory data grid that supports persistence and overflow-to-disk. It primarily started as a cache and now has a full key/value store with SQL-92 on top, but isn't a full relational database. Instead it has other features like distributed data structures, messaging, and is more about connecting your applications together. Easier deployment model with all nodes being identical.

MemSQL is a distributed full-featured relational database that has in-memory rowstore and on-disk columnstore tables with rich support for SQL, fulltext search and JSON. It's a fast RDBMS and does really well with analytical queries.

Do you need a fast cache, key/value, messaging system? Or a RDBMS with fast OLTP + OLAP capabilities?

Presto on top of Apache Ignite is basically an OLAP database (It just takes some effort to write the connector). It doesn't support transactional workloads though.
This sounds like one of those things that sounds fun on a whiteboard but will be hell to implement and actually make perform well at all.
"Basically like" is very different from "engineered for"

Column-oriented storage itself is many times faster for analytical queries, even if on disk, and combined with the other optimizations of MemSQL will get you far better performance. Along with all the data being able to constantly undergo transactional updates.

I would like to extend this question, what's the benefit of MemSQL over a columnar store db with a massive cache?