Hacker News new | ask | show | jobs
by nikita 2787 days ago
MemSQL CEO here. There are a few:

- MemSQL is transactional and writes transactions on disk

- MemSQL has an excellent implementation of SQL with mature query optimization and query execution. And it get better every release. This is from 6.5 https://www.memsql.com/blog/6.5-performance/

- MemSQL has in-memory and on-disk data storage so you can use MemSQL to store petabytes

- MemSQL has columnstores and vectorized query processing: https://news.ycombinator.com/item?id=16617098

- MemSQL supports geospatial, fulltext search, and json

- MemSQL allows you to stream data from kafka in one command: https://docs.memsql.com/sql-reference/v6.5/create-pipeline/

3 comments

Given the availability of open source solutions why would I in 2018 build a critical part of my application on a closed platform?

Genuinely not intended as snark, I'm just curious why memsql is so compelling that I would consider it.

To build on the list of features above, here's a blog post from Pandora where they go into the details of what they use MemSQL for and some of the alternatives they looked into for their use case: https://engineering.pandora.com/using-memsql-at-pandora-79a8...
Isn't that what the previous answer was trying to say? Paraphrasing "MemSQL has these features which we think Apache Ignite and others do not."

All things being equal, I agree that open source solutions are the best. Things are just not always equal.

> Isn't that what the previous answer was trying to say?

Truth be told, the list if features is not very compelling. I mean, JSON support is not a reason to pick a commercial dbms over a FLOSS one.

There is also Tarantool which is another in-memory/SQL DB and is Open Source.

https://github.com/tarantool/tarantool

I would also like to see a response to this.
Those are some nice features. Unfortunately, your DBMS doesn't do some basic things like return consistent results for a simple SQL query with a group by and having clause. I admit this might be a configuration issue on my company's end, but if so, that is a terrible configuration option and should be hidden away, opt in only, with a huge wall of warnings so people don't actually enable it except in extreme circumstances.
I've seen you mention these inconsistent results twice here in this this thread, but have worked at MemSQL for 5 years and never heard of such an issue. Have you reached out to see if maybe your query / data is not what you expect? I've seen inconsistent results only once, and it was because the default date formats across RDMBSs were different (and was not anticipated).
How would the query/data not be what I expect if If I'm writing the query myself, and looking directly at the sql table definition to create it?

Beyond those considerations, why would the same exact same query (executed several times in rapid succession from the console) produce vastly different results? Also, I should clarify, rewriting the query from "select ... from xyz group by ... having ..." to "select ... from (select * from xyz where ...) group by ..." made the inconsistency goes away, without changing the filtering clause. That does not inspire confidence.

Can you post the full schema and query? Are you sure you are not projecting columns that are not part of the group by expression?
I really appreciate the offer. I got the go-ahead to share this information, where should I direct it?
Here, or https://www.memsql.com/forum/, or memsql-public.slack.com.
If you’re hitting the disk, aren’t you losing some of the advantages of using an in-memory database in the first place? Or would it still be more performant than a traditional RDBMS due to optimized in memory data structures?
There is a compound effect in building memory optimized features and being a distributed database that can put a lot more cores at work and cache a lot more data in a cluster:

- In-memory row stores. Super fast for updates and point lookups

- Memory optimized hash joins that minimized cash misses. Great for analytical/reporting use cases

- Vectorization for columnstore query processing. Super fast aggregations that work best when data is cached in memory