| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by quangio 2785 days ago
	What is the benefit of using MemSQL over some other free in memory databases like Apache Ignite ? I see that they have better documentation and support (edit: + competition on Codeforces which winners rarely receive their T-Shirts). What about other things?

3 comments

nikita 2785 days ago

MemSQL CEO here. There are a few:

- MemSQL is transactional and writes transactions on disk

- MemSQL has an excellent implementation of SQL with mature query optimization and query execution. And it get better every release. This is from 6.5 https://www.memsql.com/blog/6.5-performance/

- MemSQL has in-memory and on-disk data storage so you can use MemSQL to store petabytes

- MemSQL has columnstores and vectorized query processing: https://news.ycombinator.com/item?id=16617098

- MemSQL supports geospatial, fulltext search, and json

- MemSQL allows you to stream data from kafka in one command: https://docs.memsql.com/sql-reference/v6.5/create-pipeline/

link

noir_lord 2785 days ago

Given the availability of open source solutions why would I in 2018 build a critical part of my application on a closed platform?

Genuinely not intended as snark, I'm just curious why memsql is so compelling that I would consider it.

link

ruw1090 2785 days ago

To build on the list of features above, here's a blog post from Pandora where they go into the details of what they use MemSQL for and some of the alternatives they looked into for their use case: https://engineering.pandora.com/using-memsql-at-pandora-79a8...

link

dangoor 2785 days ago

Isn't that what the previous answer was trying to say? Paraphrasing "MemSQL has these features which we think Apache Ignite and others do not."

All things being equal, I agree that open source solutions are the best. Things are just not always equal.

link

geezerjay 2784 days ago

> Isn't that what the previous answer was trying to say?

Truth be told, the list if features is not very compelling. I mean, JSON support is not a reason to pick a commercial dbms over a FLOSS one.

link

_up 2785 days ago

There is also Tarantool which is another in-memory/SQL DB and is Open Source.

https://github.com/tarantool/tarantool

link

_eht 2785 days ago

I would also like to see a response to this.

link

CuriouslyC 2785 days ago

Those are some nice features. Unfortunately, your DBMS doesn't do some basic things like return consistent results for a simple SQL query with a group by and having clause. I admit this might be a configuration issue on my company's end, but if so, that is a terrible configuration option and should be hidden away, opt in only, with a huge wall of warnings so people don't actually enable it except in extreme circumstances.

link

neildahlke 2785 days ago

I've seen you mention these inconsistent results twice here in this this thread, but have worked at MemSQL for 5 years and never heard of such an issue. Have you reached out to see if maybe your query / data is not what you expect? I've seen inconsistent results only once, and it was because the default date formats across RDMBSs were different (and was not anticipated).

link

CuriouslyC 2785 days ago

How would the query/data not be what I expect if If I'm writing the query myself, and looking directly at the sql table definition to create it?

Beyond those considerations, why would the same exact same query (executed several times in rapid succession from the console) produce vastly different results? Also, I should clarify, rewriting the query from "select ... from xyz group by ... having ..." to "select ... from (select * from xyz where ...) group by ..." made the inconsistency goes away, without changing the filtering clause. That does not inspire confidence.

link

nikita 2785 days ago

Can you post the full schema and query? Are you sure you are not projecting columns that are not part of the group by expression?

link

CuriouslyC 2785 days ago

I really appreciate the offer. I got the go-ahead to share this information, where should I direct it?

link

curiousDog 2785 days ago

If you’re hitting the disk, aren’t you losing some of the advantages of using an in-memory database in the first place? Or would it still be more performant than a traditional RDBMS due to optimized in memory data structures?

link

nikita 2785 days ago

There is a compound effect in building memory optimized features and being a distributed database that can put a lot more cores at work and cache a lot more data in a cluster:

- In-memory row stores. Super fast for updates and point lookups

- Memory optimized hash joins that minimized cash misses. Great for analytical/reporting use cases

- Vectorization for columnstore query processing. Super fast aggregations that work best when data is cached in memory

link

manigandham 2785 days ago

Apache Ignite is an in-memory data grid that supports persistence and overflow-to-disk. It primarily started as a cache and now has a full key/value store with SQL-92 on top, but isn't a full relational database. Instead it has other features like distributed data structures, messaging, and is more about connecting your applications together. Easier deployment model with all nodes being identical.

MemSQL is a distributed full-featured relational database that has in-memory rowstore and on-disk columnstore tables with rich support for SQL, fulltext search and JSON. It's a fast RDBMS and does really well with analytical queries.

Do you need a fast cache, key/value, messaging system? Or a RDBMS with fast OLTP + OLAP capabilities?

link

majidazimi 2785 days ago

Presto on top of Apache Ignite is basically an OLAP database (It just takes some effort to write the connector). It doesn't support transactional workloads though.

link

capkutay 2785 days ago

This sounds like one of those things that sounds fun on a whiteboard but will be hell to implement and actually make perform well at all.

link

manigandham 2785 days ago

"Basically like" is very different from "engineered for"

Column-oriented storage itself is many times faster for analytical queries, even if on disk, and combined with the other optimizations of MemSQL will get you far better performance. Along with all the data being able to constantly undergo transactional updates.

link

nomel 2785 days ago

I would like to extend this question, what's the benefit of MemSQL over a columnar store db with a massive cache?

link