InfiniSQL | HN Mirror


	InfiniSQL (infinisql.org)
	49 points by ahassan 4573 days ago

12 comments

mtravis 4573 days ago

A few things (I'm the author of InfiniSQL)

1) I include key/value-store-like stored procedures in the source. They do get/set with an integer key and a string value. I haven't done thorough benchmarking, but I expect them to outperform the other benchmark I've published, which is a considerably more complex workload.

2) camus2: Agreed, nothing ever dies in IT. But roll back the clock a few years: how much NoSQL would have come into existence if there had been a free xyzSQL that scaled across nodes, was fast, etc.? I believe the answer is that there would be very few network-based NoSQL databases for operational workloads if that had been the case.

3) jwatte: Yeah! Jagged edges too!

4) stephen24: Also, I intend to change the license from AGPL to GPL next time I push out some code. No excuse not to try it out.

5) siliconc0w: There's an architectural write-up at High Scalability: http://highscalability.com/blog/2013/11/25/how-to-make-an-in... -- I believe that the actor model architecture is distinct in InfiniSQL.

6) diwu1989: Yes and no. Yes, MemSQL is more mature. No,

(a) I'm not sure how MemSQL scales horizontally (especially since that was a feature added after v1 of their code was released), and,

(b) MemSQL isn't free software

7) itsbits: for now InfiniSQL is mainly for hackers and early adopters--the dependencies are pretty clearly documented but it requires some effort to work with in its current state
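For readers unfamiliar with the workload in item 1: a key/value stored procedure reduces to a get or set on a single key. The sketch that follows shows only those semantics on a single node; the class name `KeyValueStore` and its methods are hypothetical and are not InfiniSQL's stored-procedure API, which the thread does not show.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical single-node stand-in for a key/value stored procedure:
// integer key, string value. Not InfiniSQL's actual interface.
class KeyValueStore {
public:
    // Store the value for `key`, overwriting any previous value.
    void set(int64_t key, std::string value) {
        table_[key] = std::move(value);
    }

    // Return the value for `key`, or std::nullopt if it was never set.
    std::optional<std::string> get(int64_t key) const {
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<int64_t, std::string> table_;
};
```

Because each call touches one key, and therefore presumably one partition, it is plausible that such a workload would outperform a benchmark whose transactions update rows on several nodes.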

jsmthrowaway 4573 days ago

Please consider the Apache License or some other license instead of the GPL. There are many organizations that cannot use any flavor of GPL, including LGPL, for legal reasons. You can debate the wisdom of that amongst yourselves, but alas, that's how it is in some places.

(And I really want to try this...)

DannyBee 4573 days ago

"There are many organizations that cannot use any flavor of GPL, including LGPL, for legal reasons"

To be clear, there are no legal reasons I can think of that would ever prevent internal use of LGPL/GPL software.

You mean these companies (Apple, for example) have policies.

Policies like this often change because someone decides the cost vs risk tradeoff is worth it.

Changing a license because of bad policies of certain companies is not a great reason to change a license (in fact, it's, IMHO, an actively bad one).

You really should only change licenses if you find the license you chose does not suit the needs of your users (and policies are not really needs).

philwelch 4573 days ago

I find that to be a strangely ideological response. Your prospective users' requirements are up to them to decide, not up to you. They're the ones who are going to decide whether or not to use your software.

DannyBee 4573 days ago

?? Of course they are up to the users to decide, but policies and needs are different. I'm curious, how do you think policies like this change?

Most of the developers I've seen will happily sell you a commercial license if you don't like the license. After paying for it enough, most companies start to ask "well, actually, how risky is this, really?", and this is how policies change.

In any case, my other point stands - there are no actual legal reasons to not use LGPL/GPL software internally. It would have zero legal impact.

philwelch 4573 days ago

If InfiniSQL was an established incumbent where the choice was between living with GPL and buying a commercial license I would agree with you, but it's a newcomer where the main choice is whether to use it at all.

mtravis 4573 days ago

I assume these shops have Linux in their environments, including the GNU toolchain. There must be some contradiction somewhere that I'm not aware of.

Based on FSF feedback, I'm going to modify the license to include a Classpath-like exception. The intention is to allow people to write stored procedures that link against infinisql without triggering the copyleft. Only if the source to infinisql itself is modified (and distributed) will the copyleft apply.

I'm curious to know the rationale against the GPL in general (not just the AGPL), and how those shops allow Linux & gnu toolchains in spite of their rule against the GPL.

philwelch 4573 days ago

Generally, Linux and the GNU toolchain are carefully managed exceptions and there is massive commercial pressure against continuing to use anything GPL-licensed. Linux itself is strong enough to hold out against this pressure, but other things like GCC are not, which is why there is so much work being invested into LLVM/Clang.

DannyBee 4573 days ago

"massive commercial pressure against continuing to use anything GPL-licensed"

I actually generally see just the opposite - even automakers, who are traditionally stalwarts about anything, are now starting to use GPL software in cars.

"which is why there is so much work being invested into LLVM/Clang"

This is a weird opinion, one that I've seen a few times.

This is not why LLVM was/is chosen, AFAIK. LLVM was/is chosen for greater control over destiny, a better platform, and a better community.

If LLVM were GPL, there is exactly one company that would theoretically stop contributing (admittedly, it's been about 2 months since I calculated the list of companies that contributed in the past year). I doubt that would actually happen, either (mainly because I once asked whether they would).

I was just at an LLVM social this evening, and not a single person there worked for a company that chose LLVM because of "massive commercial pressure against the GPL".

philwelch 4573 days ago

LLVM may have been a poor example, but I'm not sure that justified downvoting my comment when there are, in fact, lots of companies with more restrictive policies against use of GPL software vs. other licenses. (Not even necessarily contribution, but even use). That GPL is allowed at all is a result of the fact that there are some essential GPL licensed projects with no good alternatives, like Linux. InfiniSQL is not one of them.

stormbrew 4573 days ago

Considering that Apple's adoption of LLVM and Clang came with freezing the version of GCC it used and distributed at the last release before a particularly notable GPL version (GPLv3) was applied to it (with other GNU projects, like bash, similarly frozen), it would take a hell of an alternate explanation to dislodge the notion that Apple's endorsement of the project was significantly related to licensing.

mtravis 4573 days ago

Thank you, Phil. I'm conflicted about this--I was recently convinced to move away from AGPL because of what seem to be legitimate acceptance issues that I was previously unaware of. I feel good about using GPL instead of AGPL.

But I'm conflicted about GPL vs Apache (or BSDish) in the sense that I'm getting the message that I have to bend over backwards just a little bit further before somebody, somewhere might be willing to use my software, maybe. Free isn't enough. I also have to let them fork it, keep it proprietary, wrap their own brand around it, before maybe they might consider using it.

That said, I really want people to use it, and of course help me hack on it. But I'm conflicted.

tracker1 4573 days ago

I say keep it GPL. AGPL may go too far for many companies, but GPL should be fine for the core product. As long as any protocols are well documented and client libraries are under more permissive licenses, I don't see an issue with it.

philwelch 4573 days ago

You can do what you want because it's your software. But from the open source policies I've seen companies use, there are generally three lists of licenses. The first list is "you can use any open source software that follows these licenses". BSD, MIT, Apache, etc. are on this list. The second list is "you have to get approval from Legal to use software with these licenses but we would generally prefer for you not to." GPLv2 is generally on this list. The third list is "don't even think about it", and GPLv3 and AGPL are on this list.

My impression is that the second list exists solely because there exists GPLv2 licensed software with no viable alternatives to it. Unfortunately, your project is not one of them. It's your project so you can do whatever you want, but GPL is an obstacle to adoption in industry.

zobzu 4573 days ago

So many reasons to keep GPL. They can use GPL just fine, it's just that they don't wanna contribute if they modify it.

justin66 4573 days ago

> So many reasons to keep GPL. They can use GPL just fine, it's just that they don't wanna contribute if they modify it.

More charitably, they don't want to be _legally obligated_ to contribute if they modify it.

tintor 4573 days ago

Regarding MemSQL:

- We have just released v2.5 with full support for JSON and online ALTER TABLE across the cluster.
- MemSQL performs great on both OLAP and OLTP.
- It scales well: we have a cluster of several hundred nodes in production at Zynga.
- The license cost for startups is $1.

mtravis 4573 days ago

Congratulations!

Do you have benchmark reports?

jacob019 4573 days ago

I'm supposed to use the Perl API for user and schema management? Perl holds a special place in my heart, but I'm not too excited about managing my database with it. How about an interactive console?

I'm currently using MySQL, how similar is the SQL syntax?

mtravis 4573 days ago

On backlog to fix. But InfiniSQL is for hackers and early adopters at this stage.

The SQL support is documented (http://www.infinisql.org/docs/index/)

coolsunglasses 4573 days ago

Hackers and early adopters are using Perl in 2013? Sure you aren't off by 12-15 years?

mtravis 4573 days ago

This you? http://favstar.fm/users/hipsterhacker

Also, the main application is in C++. A Python script launches the C++ daemons. Perl scripts are quick and dirty tests and deployment scripts. The main hacking I'm looking for is with C++, and I don't care so much if the other stuff gets re-implemented in some other language.

coolsunglasses 4573 days ago

Nope, just a guy that fucks with databases.

No API, got it.

jacob019 4573 days ago

Awesome project and a killer concept. No one has been able to really solve relational database scalability yet. I'll have to study the implementation. I was just talking with some friends a few weeks ago about this problem and we concluded that if someone came up with a distributed relational database with decent scalable performance they would be very successful indeed. Will try it out and follow the progress. Hope it takes off.

diwu1989 4573 days ago

Have you tried Vertica? One of the big data projects my team did used more than 200 servers in a single Vertica cluster. At the enterprise OEM level, the pricing is actually really affordable. You should try out Vertica Community Edition, the free 3-node version.

mtravis 4573 days ago

Vertica's a data warehouse. InfiniSQL is geared for OLTP.

Thanks, jacob019. Please fork/follow on GitHub, Twitter if you're into that, etc.

arnorhs 4573 days ago

Did you mean to link to http://www.infinisql.org/docs/index ? I was getting an error on /docs/

mtravis 4573 days ago

Thanks, edited.

camus2 4573 days ago

I believe the original subtitle is "Extreme Scale Transaction Processing". "The NoSQL killer" is kind of childish; nothing is going to kill anything.

yeukhon 4573 days ago

Same thought, and it's at an early stage, ugh. There are already at least a dozen competitors out there trying to be different from MongoDB. I am just sort of happy that in the SQL world we usually look at either MySQL or PostgreSQL (well, Oracle and SQL Server are probably more relevant to corporate web services)... but I think people are trying to migrate too.

tracker1 4573 days ago

I think that even in a NoSQL-driven domain, a classic SQL-based RDBMS has a place. It's just that certain types of load can tolerate relaxed constraints, and that tolerance increases when your data is searched/read over 1000 times for every write. Joins are expensive, and even mirroring data to a NoSQL store has benefits over a pure RDBMS approach.

I like document stores like MongoDB and RethinkDB and feel they are a great fit for most scenarios. I also feel that caching layers with Redis or Memcached can help...

Cassandra is interesting in the primary storage space as well, and imho has resolved a lot of issues, while others remain. I'm interested to see if this database can get there faster than Cassandra/CQL can get to more parity with traditional SQL systems.

While I appreciate the options, there is no one solution for everything... If you never break 100 simultaneous users, memory-mapped flat files and map/reduce could be sufficient.

ashah 4573 days ago

Sensationalism sells, which is probably why the poster missed your "original" link.

wimpycofounder 4573 days ago

So...uh...how does it work? Anyone know if there is an architecture overview somewhere? And why there isn't a link to it on the damn front page?

jfim 4573 days ago

From their documentation:

> InfiniSQL currently is an in memory database. This means that all records are stored in system memory, and not written to disk. This provides very high performance--but it also means that InfiniSQL currently lacks the property of Durability. If the power goes out, all data is gone. This limitation is temporary.

They do mention that they'll implement persistence, but that's likely to lower performance, as you're limited to how fast the write ahead log can be written, even if updates to on-disk structures are batched.

They also mention:

> No sharding is necessary with InfiniSQL: it partitions data automatically across available hardware. Connect to any node, and all of the data is accessible.

I haven't looked at how joins are done across large tables that span over multiple nodes (or if it's even supported), but that's not likely to be fast either, for obvious reasons.
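Automatic partitioning of the kind quoted above is commonly done by hashing a row's key to a partition and mapping each partition to a node; any node that holds the map can then locate, and forward to, the owner of any row, which is what makes "connect to any node" work. This sketch assumes simple hash-modulo routing over a fixed partition table; `PartitionRouter` is a hypothetical name, and InfiniSQL's actual placement scheme may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical router: a fixed number of partitions, each assigned to a node.
// Every node keeps a copy of this table, so any node can find a key's owner.
struct PartitionRouter {
    std::vector<int> partition_to_node;  // index = partition id, value = node id

    // Deterministically map a key to one of the partitions.
    std::size_t partition_for(int64_t key) const {
        return std::hash<int64_t>{}(key) % partition_to_node.size();
    }

    // The node that currently owns the key's partition.
    int node_for(int64_t key) const {
        return partition_to_node[partition_for(key)];
    }
};
```

Keeping more partitions than nodes lets a partition be reassigned to a new node by editing one table entry, without rehashing every key.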

mtravis 4573 days ago

1) persistence: battery-backed UPS and synchronous replication. No WAL anywhere. I'm thinking about ways to do disk-based storage without synchronous IO, to provide decent performance with higher storage capacity

2) no joins supported yet. However, the benchmark that I performed (on the blog) involves 3 updates across random nodes. I designed InfiniSQL specifically to perform multi-node transactions very well, because that's the Achilles' heel of every other distributed OLTP system. I plan to implement joins, but expect them to perform decently for the workload you describe.
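The rule behind durability through synchronous replication in point 1 is that a write may be acknowledged only after every in-memory copy of the partition holds it; with no log to replay, the copies themselves are the durable state. In this sketch the replicas are plain in-process objects and availability is a simple flag, both assumptions; `Replica` and `replicated_write` are hypothetical names, and a real system would exchange messages between hosts and also handle a replica failing mid-write.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One in-memory copy of a partition (hypothetical; in practice on another host).
struct Replica {
    std::map<int64_t, std::string> rows;
    bool up = true;  // stand-in for "reachable and healthy"
};

// Synchronous replication without a write-ahead log: the write succeeds
// (and may be acknowledged to the client) only once every copy holds it.
bool replicated_write(std::vector<Replica>& replicas, int64_t key,
                      const std::string& value) {
    // Refuse up front if any copy is unavailable, so no replica ends up
    // holding a write the client was never told succeeded.
    for (const auto& r : replicas)
        if (!r.up) return false;
    for (auto& r : replicas)
        r.rows[key] = value;
    return true;
}
```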

jfim 4573 days ago

Gotcha, it's for OLTP, don't know how I missed that.

Should be quite easy to do equijoins especially if you're joining a couple thousand rows at most at a time; it only gets hairier when you're joining all records of very large tables that don't necessarily fit in memory, which is not very OLTP-y.
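The small in-memory equijoin described here is typically a hash join: build a hash index on the join column of the smaller input, then probe it with each row of the other. This is a generic sketch with hypothetical `Customer` and `Order` tables, not InfiniSQL code (the author notes joins aren't supported yet), and it assumes both inputs fit in memory on one node.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Customer { int64_t id; std::string name; };
struct Order { int64_t customer_id; int64_t order_id; };

// Hash equijoin on Order.customer_id = Customer.id.
// Build phase: index the (assumed smaller) customer table by id.
// Probe phase: look up each order's customer_id in that index.
std::vector<std::pair<int64_t, std::string>>
hash_join(const std::vector<Customer>& customers, const std::vector<Order>& orders) {
    std::unordered_multimap<int64_t, const Customer*> index;
    for (const auto& c : customers) index.emplace(c.id, &c);

    std::vector<std::pair<int64_t, std::string>> out;  // (order_id, customer name)
    for (const auto& o : orders) {
        auto [first, last] = index.equal_range(o.customer_id);
        for (auto it = first; it != last; ++it)
            out.emplace_back(o.order_id, it->second->name);
    }
    return out;
}
```

Orders with no matching customer simply produce no output row, which is inner-join semantics.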

With regard to persistence, I'm really curious to hear how you're planning to have durability without writing something to disk on every transaction. It could work if you're relaxing the definition of durable to mean written to memory on at least N nodes, though that's likely to be surprising to someone with a stricter definition of durable.

Edit: By the way, it's really cool that you have a C++ implementation of actors, I'll have to look into it. Have you thought about turning that into a library?

mtravis 4573 days ago

For durability, check out http://www.infinisql.org/docs/overview/#idp37053600

I've thought about having an actor library, or minimally, to have the actor basis of InfiniSQL independent of specific workload, but haven't thought it through entirely. I'd be supportive of any efforts to that effect if you want to work on it!
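In its simplest form, a C++ actor is an object whose state is touched only by its own thread, while other threads communicate with it by posting messages to its mailbox. This is a generic illustration using `std::thread`, a mutex-protected queue, and `std::future` for replies; `CounterActor` is hypothetical and says nothing about how InfiniSQL's actors are actually structured.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical minimal actor: its state is touched only by its own thread,
// and other threads reach it only by posting messages to its mailbox.
class CounterActor {
public:
    CounterActor() : worker_([this] { run(); }) {}

    ~CounterActor() {
        send([this] { stopping_ = true; });  // handled after earlier messages
        worker_.join();
    }

    // Fire-and-forget message; callable from any thread.
    void add(long n) { send([this, n] { total_ += n; }); }

    // Request/reply message: the actor fulfils the promise on its own thread.
    std::future<long> total() {
        auto reply = std::make_shared<std::promise<long>>();
        std::future<long> result = reply->get_future();
        send([this, reply] { reply->set_value(total_); });
        return result;
    }

private:
    void send(std::function<void()> msg) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            mailbox_.push(std::move(msg));
        }
        cv_.notify_one();
    }

    void run() {
        while (!stopping_) {
            std::function<void()> msg;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return !mailbox_.empty(); });
                msg = std::move(mailbox_.front());
                mailbox_.pop();
            }
            msg();  // one message at a time, so total_ needs no lock
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> mailbox_;
    long total_ = 0;
    bool stopping_ = false;
    std::thread worker_;  // declared last so all members exist before it starts
};
```

The only shared structure is the mailbox; everything else belongs to the actor's thread, which is the property that makes the model attractive for a database engine.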

sb057 4573 days ago

Front page > Documentation > Overview

It practically is on the front page.

jbellis 4573 days ago

Last week's discussion here: https://news.ycombinator.com/item?id=6795263

siliconc0w 4573 days ago

Can you compare InfiniSQL to existing in-memory clustered relational database solutions like Galera?

diwu1989 4573 days ago

I see this as fairly similar to MemSQL, but less mature.

diger44 4573 days ago

I actually thought this was another joke at first...

stephen 4573 days ago

"Not just a teaser version". Nice!

glibgil 4573 days ago

It uses 2PC, so it won't really scale.

mtravis 4573 days ago

I think you mean 2PL.

It does really scale, check out the benchmark report on the blog. http://www.infinisql.org/blog/2013/1112/benchmarking-infinis...

For deadlock-prone workloads, it will likely not be as good, admittedly.

I'm considering a variation on MVCC that gets around the single transactionid bottleneck, but the current implementation is based on 2PL. http://www.infinisql.org/docs/overview/#ftn.idp37098256
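Two-phase locking, as opposed to two-phase commit, means a transaction acquires all of its locks before releasing any; the strict variant holds them until commit or abort. This sketch assumes exclusive locks only and a no-wait policy, where a conflicting request fails immediately and the caller aborts. That policy avoids deadlock at the cost of extra aborts and is an assumption here, since the thread does not say how InfiniSQL resolves conflicts; `LockTable` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Strict two-phase locking with exclusive locks and a no-wait conflict policy.
class LockTable {
public:
    // Growing phase: try to lock `row` for `txn`. Returns false on conflict,
    // in which case the caller aborts the transaction instead of waiting.
    bool acquire(int64_t txn, int64_t row) {
        auto [it, inserted] = owner_.try_emplace(row, txn);
        if (!inserted && it->second != txn) return false;
        held_[txn].insert(row);
        return true;
    }

    // Shrinking phase, done all at once at commit or abort (strict 2PL).
    void release_all(int64_t txn) {
        auto it = held_.find(txn);
        if (it == held_.end()) return;
        for (int64_t row : it->second) owner_.erase(row);
        held_.erase(it);
    }

private:
    std::unordered_map<int64_t, int64_t> owner_;                     // row -> holder
    std::unordered_map<int64_t, std::unordered_set<int64_t>> held_;  // txn -> rows
};
```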

For concurrency management algorithms, there are no good ones. Only those that are less bad than others in some cases.

MichaelGG 4573 days ago

Have you given any more thought to ... not multithreading it? Since you're scaling across servers, apply the same concept across cores. Presto, no more bottleneck on atomically incrementing an ID.
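The bottleneck referred to here is a single shared counter that every transaction increments atomically. A common shared-nothing alternative gives each core its own counter and keeps IDs disjoint by striding, so core c of k issues c, c + k, c + 2k, and so on. Both `next_global_id` and `PerCoreIds` are hypothetical illustrations, not InfiniSQL code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Shared counter: every thread contends on the same cache line.
std::atomic<uint64_t> global_next{0};
uint64_t next_global_id() {
    return global_next.fetch_add(1, std::memory_order_relaxed);
}

// Shared-nothing alternative: each core owns a plain counter and issues IDs
// from its own residue class, so no cross-core synchronization is needed.
class PerCoreIds {
public:
    PerCoreIds(uint64_t core, uint64_t num_cores) : core_(core), num_cores_(num_cores) {}
    uint64_t next() { return (local_++) * num_cores_ + core_; }

private:
    uint64_t core_;
    uint64_t num_cores_;
    uint64_t local_ = 0;
};
```

The trade-off is that the striped IDs are unique but no longer form a single global order, which matters for schemes such as MVCC that compare transaction IDs.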

mtravis 4573 days ago

Good thinking, but I think that shifts the issue--namely, that each inter-thread message uses an atomic compare-and-swap to create the message. I assume there'd be a similar bottleneck on the actor that generates the transactionid, limited by the number of messages it can send and receive.
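One common way to make enqueuing a message cost a single compare-and-swap is a multi-producer, single-consumer list: producers CAS a node onto the head, and the owning thread detaches the whole list with one atomic exchange and reverses it into FIFO order. This is a generic sketch, not InfiniSQL's message queue; `MpscMailbox` and `Message` are hypothetical, and allocating and freeing nodes is left to the caller.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct Message {
    int payload;
    Message* next;
};

class MpscMailbox {
public:
    // Any thread: link a message onto the head with a CAS retry loop.
    void push(Message* m) {
        m->next = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(m->next, m,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
            // On failure, m->next now holds the current head; retry.
        }
    }

    // Owning thread only: detach everything at once, then reverse to FIFO.
    std::vector<Message*> drain() {
        Message* list = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<Message*> newest_first;
        for (; list != nullptr; list = list->next) newest_first.push_back(list);
        return std::vector<Message*>(newest_first.rbegin(), newest_first.rend());
    }

private:
    std::atomic<Message*> head_{nullptr};
};
```

Because the consumer never pops individual nodes, the ABA problem that affects lock-free stacks with concurrent pops does not arise here.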

Instead, a friend and I have been thinking about how to perhaps modify MVCC to work with distinct transactionids per partition. Namely, I'm already generating what I call "subtransactionids" for each partition involved in a transaction. And those must be ordered for synchronous replication, so I think the way to implement a variation on MVCC may already be mostly there.
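The per-partition ordering described here can be sketched in two pieces: each partition's primary numbers its own subtransactions with a local counter, and each replica applies that partition's changes strictly in subtransaction-ID order, holding back anything that arrives early. `PartitionSequencer` and `OrderedApplier` are hypothetical names, and a real replica would receive changes over the network rather than through a method call.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Primary side: each partition numbers its own subtransactions; no global counter.
struct PartitionSequencer {
    uint64_t next_subtxn = 1;
    uint64_t assign() { return next_subtxn++; }
};

// Replica side: apply one partition's changes strictly in subtransaction-ID
// order, buffering any change that arrives before its predecessors.
class OrderedApplier {
public:
    void receive(uint64_t subtxn_id, std::string change) {
        pending_.emplace(subtxn_id, std::move(change));
        while (!pending_.empty() && pending_.begin()->first == next_expected_) {
            applied_.push_back(std::move(pending_.begin()->second));
            pending_.erase(pending_.begin());
            ++next_expected_;
        }
    }

    const std::vector<std::string>& applied() const { return applied_; }

private:
    uint64_t next_expected_ = 1;
    std::map<uint64_t, std::string> pending_;  // early arrivals, keyed by ID
    std::vector<std::string> applied_;
};
```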

I know I still owe you an architectural doc...fixin' ta, ya know.

itsbits 4573 days ago

so many dependencies to install...

jwatte 4573 days ago

Oooh! Shiny!