Please consider the Apache License or some other license instead of the GPL. There are many organizations that cannot use any flavor of GPL, including LGPL, for legal reasons. You can debate the wisdom of that amongst yourselves, but alas, that's how it is in some places.
I find that to be a strangely ideological response. Your prospective users' requirements are up to them to decide, not up to you. They're the ones who are going to decide whether or not to use your software.
??
Of course they are up to the users to decide, but policies and needs are different. I'm curious, how do you think policies like this change?
Most of the developers I've seen will happily sell you a commercial license if you don't like the open source license.
After paying for it enough, most companies start to ask "well, actually, how risky is this, really?", and this is how policies change.
In any case, my other point stands - there are no actual legal reasons to not use LGPL/GPL software internally. It would have zero legal impact.
If InfiniSQL was an established incumbent where the choice was between living with GPL and buying a commercial license I would agree with you, but it's a newcomer where the main choice is whether to use it at all.
I assume these shops have Linux in their environments, including the GNU toolchain. There must be some contradiction somewhere that I'm not aware of.
Based on FSF feedback, I'm going to modify the license to include a Classpath-like exception. The intention is to allow people to write stored procedures that link against infinisql without triggering the copyleft. Only if the source to infinisql itself is modified (and distributed) will the copyleft apply.
I'm curious to know the rationale against the GPL in general (not just the AGPL), and how those shops allow Linux & gnu toolchains in spite of their rule against the GPL.
Generally, Linux and the GNU toolchain are carefully managed exceptions and there is massive commercial pressure against continuing to use anything GPL-licensed. Linux itself is strong enough to hold out against this pressure, but other things like GCC are not, which is why there is so much work being invested into LLVM/Clang.
"massive commercial pressure against continuing to use anything GPL-licensed"
I actually generally see just the opposite - even automakers, who are traditionally stalwarts about anything, are now starting to use GPL software in cars.
"which is why there is so much work being invested into LLVM/Clang"
This is a weird opinion that I've seen a few times.
This is not why LLVM was/is chosen, AFAIK. LLVM was/is chosen for greater control over destiny, a better platform, and a better community.
If LLVM was GPL, there is exactly one company that would theoretically stop contributing (admittedly, it's been about 2 months since I calculated the list of companies that contributed in the past year). I doubt even that would actually happen (mainly because I once asked whether they would).
I was just at an LLVM social this evening, and not a single person there worked for a company that chose LLVM because of "massive commercial pressure against the GPL".
LLVM may have been a poor example, but I'm not sure that justified downvoting my comment when there are, in fact, lots of companies with more restrictive policies against use of GPL software vs. other licenses. (Not even necessarily contribution, but even use). That GPL is allowed at all is a result of the fact that there are some essential GPL licensed projects with no good alternatives, like Linux. InfiniSQL is not one of them.
Apple's adoption of LLVM and Clang came with freezing the version of GCC it used and distributed at a release from before a particularly notable GPL version was applied to it (other GNU projects, like bash, were similarly frozen). Given that, it would take a hell of an alternate explanation to dislodge the notion that Apple's endorsement of the project was significantly related to licensing.
Thank you, Phil. I'm conflicted about this--I was recently convinced to move away from AGPL because of acceptance issues that I was previously unaware of and that seem legitimate. I feel good about using GPL instead of AGPL.
But I'm conflicted about GPL vs Apache (or BSDish) in the sense that I'm getting the message that I have to bend over backwards just a little bit further before somebody, somewhere might be willing to use my software, maybe. Free isn't enough. I also have to let them fork it, keep it proprietary, wrap their own brand around it, before maybe they might consider using it.
That said, I really want people to use it, and of course help me hack on it. But I'm conflicted.
I say keep it GPL. AGPL may be too far for many companies, but GPL should be fine for the core product. As long as any protocols are well documented, and client libraries are under more permissive licenses, I don't see an issue with it.
You can do what you want because it's your software. But from the open source policies I've seen companies use, there are generally three lists of licenses. The first list is "you can use any open source software that follows these licenses". BSD, MIT, Apache, etc. are on this list. The second list is "you have to get approval from Legal to use software with these licenses but we would generally prefer for you not to." GPLv2 is generally on this list. The third list is "don't even think about it", and GPLv3 and AGPL are on this list.
My impression is that the second list exists solely because there exists GPLv2 licensed software with no viable alternatives to it. Unfortunately, your project is not one of them. It's your project so you can do whatever you want, but GPL is an obstacle to adoption in industry.
Regarding MemSQL:
- we have just released v2.5 with full support for JSON and online ALTER TABLE across the cluster
- MemSQL performs great on both OLAP and OLTP
- it scales well: we have a several-hundred-node cluster in production at Zynga
- license cost for startups is $1
I'm supposed to use the Perl API for user and schema management? Perl holds a special place in my heart, but I'm not too excited about managing my database with it. How about an interactive console?
I'm currently using MySQL, how similar is the SQL syntax?
Also, the main application is in C++. A python script launches the C++ daemons. Perl scripts are quick and dirty tests and deployment scripts. The main hacking I'm looking for is with C++, and I don't care so much if the other stuff gets re-implemented in some other language.
Awesome project and a killer concept. No one has been able to really solve relational database scalability yet. I'll have to study the implementation. I was just talking with some friends a few weeks ago about this problem and we concluded that if someone came up with a distributed relational database with decent scalable performance they would be very successful indeed. Will try it out and follow the progress. Hope it takes off.
Have you tried Vertica? One of the big data projects my team did used more than 200 servers in a single Vertica cluster. At the enterprise OEM level, the pricing is actually really affordable. You should try out Vertica Community Edition, the free 3 node version.
Same thought, and it being at an early stage, ugh. And there are at least a dozen competitors out there trying to be different from MongoDB. I am just sort of happy that in the SQL world we usually look at either MySQL or PostgreSQL (well, Oracle and SQL Server are probably more relevant to corporate web services)... but I think people are trying to migrate too.
I think that even in a NoSQL-driven domain, a classic SQL-based RDBMS has a place. It's just that certain types of load can tolerate relaxed constraints, and that tolerance can increase when your data is searched/read over 1000 times for every write. Joins are expensive, and even mirroring data to a NoSQL store has benefits over a purely RDBMS setup.
I like document stores like MongoDB and RethinkDB and feel they are a great fit for most scenarios. I also feel that caching layers with Redis or Memcached can help...
Cassandra is interesting in the primary storage space as well, and imho has resolved a lot of issues, while others remain. I'm interested to see if this database can get there faster than Cassandra/CQL can get to more parity with traditional SQL systems.
While I appreciate the options, there is no one solution for everything... If you never break 100 simultaneous users, memory-mapped flat files and map/reduce could be sufficient.
> InfiniSQL currently is an in memory database. This means that all records are stored in system memory, and not written to disk. This provides very high performance--but it also means that InfiniSQL currently lacks the property of Durability. If the power goes out, all data is gone. This limitation is temporary.
They do mention that they'll implement persistence, but that's likely to lower performance, as you're limited by how fast the write-ahead log can be written, even if updates to on-disk structures are batched.
They also mention:
> No sharding is necessary with InfiniSQL: it partitions data automatically across available hardware. Connect to any node, and all of the data is accessible.
I haven't looked at how joins are done across large tables that span over multiple nodes (or if it's even supported), but that's not likely to be fast either, for obvious reasons.
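To make the "connect to any node" claim a bit more concrete, here's a rough sketch of hash partitioning with integer keys and string values. Everything in it is my own assumption for illustration: the `PartitionedStore` name, using `std::hash` as the partitioning function, and modelling each node as an in-process map. It is not how InfiniSQL actually places data, which I haven't looked at.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical model: each "node" is just an in-process map.
class PartitionedStore {
public:
    explicit PartitionedStore(std::size_t node_count) : nodes_(node_count) {
        assert(node_count > 0);  // owner() divides by the node count
    }

    // Index of the node that owns this key. std::hash is only a stand-in
    // for whatever partitioning function a real system would use.
    std::size_t owner(std::int64_t key) const {
        return std::hash<std::int64_t>{}(key) % nodes_.size();
    }

    // The caller never names a node; routing happens here.
    void put(std::int64_t key, std::string value) {
        nodes_[owner(key)][key] = std::move(value);
    }

    std::optional<std::string> get(std::int64_t key) const {
        const auto& node = nodes_[owner(key)];
        auto it = node.find(key);
        if (it == node.end()) return std::nullopt;
        return it->second;
    }

private:
    std::vector<std::unordered_map<std::int64_t, std::string>> nodes_;
};
```

Note that with plain modulo placement, changing the node count moves most keys, so a real system would presumably need something smarter for rebalancing.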
1) persistence: battery-backed UPS and synchronous replication. No WAL anywhere. I'm thinking about ways to do disk-based storage without synchronous IO, to provide decent performance with higher storage capacity
2) no joins supported yet. However, the benchmark that I performed (on the blog) involves 3 updates across random nodes. I designed InfiniSQL specifically to perform multi-node transactions very well, because that's the Achilles' heel of every other distributed OLTP system. I plan to implement joins, and expect them to perform decently for the workload you describe.
Gotcha, it's for OLTP, don't know how I missed that.
Should be quite easy to do equijoins especially if you're joining a couple thousand rows at most at a time; it only gets hairier when you're joining all records of very large tables that don't necessarily fit in memory, which is not very OLTP-y.
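For what it's worth, the small-equijoin case I have in mind is roughly a textbook in-memory hash join. A minimal sketch follows; the `Customer` and `Order` row types and the `hash_equijoin` function are made up for the example, and it ignores the multi-node part entirely (it assumes both inputs have already been fetched to one place).

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical row types, for illustration only.
struct Customer { std::int64_t id; std::string name; };
struct Order { std::int64_t customer_id; std::string item; };

// Returns (customer name, item) for every pair of rows where
// orders.customer_id == customers.id.
std::vector<std::pair<std::string, std::string>>
hash_equijoin(const std::vector<Customer>& customers,
              const std::vector<Order>& orders) {
    // Build phase: index the (usually smaller) side by the join key.
    std::unordered_multimap<std::int64_t, const Customer*> by_id;
    by_id.reserve(customers.size());
    for (const auto& c : customers) by_id.emplace(c.id, &c);

    // Probe phase: one hash lookup per row on the other side.
    std::vector<std::pair<std::string, std::string>> out;
    for (const auto& o : orders) {
        auto range = by_id.equal_range(o.customer_id);
        for (auto it = range.first; it != range.second; ++it) {
            out.emplace_back(it->second->name, o.item);
        }
    }
    return out;
}
```

With a couple thousand rows the build table trivially fits in memory, which is why this only gets hairy for the very large tables mentioned above.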
With regards to persistence, I'm really curious to hear how you're planning to have durability without writing something to disk on every transaction. It could work if you're relaxing the definition of durable to mean written to memory on at least $n$ nodes, though that's likely to be surprising to someone with a stricter definition of durable.
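To spell out the relaxed definition I mean, here's a toy sketch where a write counts as "durable" once enough replicas acknowledge holding it in memory. The `Replica` type, the `replicate` function and the `required_acks` parameter are all hypothetical; this is just my reading of "synchronous replication, no WAL", not a description of what InfiniSQL does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical replica: keeps records in memory only, nothing hits disk.
struct Replica {
    bool up;
    std::vector<std::string> log;

    // Returns true (an acknowledgement) if the record was stored.
    bool append(const std::string& record) {
        if (!up) return false;
        log.push_back(record);
        return true;
    }
};

// Synchronously offers the record to every replica and reports whether at
// least `required_acks` of them acknowledged it.
bool replicate(std::vector<Replica>& replicas, const std::string& record,
               std::size_t required_acks) {
    std::size_t acks = 0;
    for (auto& r : replicas) {
        if (r.append(record)) ++acks;
    }
    return acks >= required_acks;
}
```

The sketch doesn't undo a write that fell short of the required acks, and a simultaneous power loss on all replicas still loses everything, which is exactly the gap between this and the stricter definition.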
Edit: By the way, it's really cool that you have a C++ implementation of actors, I'll have to look into it. Have you thought about turning that into a library?
I've thought about having an actor library, or minimally, to have the actor basis of InfiniSQL independent of specific workload, but haven't thought it through entirely. I'd be supportive of any efforts to that effect if you want to work on it!
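If anyone wants a starting point for such a library, a bare-bones actor is basically a mailbox plus one thread that handles messages one at a time. The sketch below is a generic illustration under my own assumptions: the `Actor` template and its `send`/`stop` methods are invented names, and it uses a mutex and condition variable for the mailbox rather than the compare-and-swap messaging InfiniSQL is described as using.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical minimal actor: private state lives in the handler, and only
// the worker thread ever runs the handler, so the handler needs no locking.
template <typename Message>
class Actor {
public:
    explicit Actor(std::function<void(const Message&)> handler)
        : handler_(std::move(handler)), worker_([this] { run(); }) {}

    ~Actor() { stop(); }

    Actor(const Actor&) = delete;
    Actor& operator=(const Actor&) = delete;

    // Enqueue a message; safe to call from any thread.
    void send(Message msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            mailbox_.push(std::move(msg));
        }
        ready_.notify_one();
    }

    // Lets the worker drain what is already queued, then joins it.
    // Messages sent after stop() are never handled.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return stopping_ || !mailbox_.empty(); });
            if (mailbox_.empty()) return;  // stopping and fully drained
            Message msg = std::move(mailbox_.front());
            mailbox_.pop();
            lock.unlock();  // run the handler without holding the mailbox lock
            handler_(msg);
        }
    }

    std::function<void(const Message&)> handler_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Message> mailbox_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

Usage would look something like `Actor<int> a([&](const int& v) { total += v; }); a.send(1); a.stop();`. A lock-free mailbox would be the obvious next step if the mutex turns out to be the bottleneck.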
Have you given any more thought to ... not multithreading it? Since you're scaling across servers, apply the same concept across cores. Presto, no more bottleneck on atomically incrementing an ID.
Good thinking, but I think that shifts the issue--namely, that each inter-thread message uses atomic compare and swap to create the message. I assume there'd be a similar bottleneck on the actor that generates the transactionid limited by the number of messages it can send & receive.
Instead, a friend and I have been thinking about how to perhaps modify MVCC to work with distinct transactionids per partition. Namely, I'm already generating what I call "subtransactionids" for each partition involved in a transaction. And those must be ordered for synchronous replication, so I think the way to implement a variation on MVCC may already be mostly there.
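Roughly, the contrast between the two ID schemes looks like the sketch below. The class names and the `SubtransactionId` layout are hypothetical and only meant to show the shape of the idea, not the actual InfiniSQL code: one shared atomic counter that every thread hits, versus a plain counter owned by whichever single thread runs a partition.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Scheme 1: a single global counter. Every transaction, on every thread,
// does an atomic increment on the same memory location.
class GlobalIdGenerator {
public:
    std::uint64_t next() {
        return counter_.fetch_add(1, std::memory_order_relaxed) + 1;
    }

private:
    std::atomic<std::uint64_t> counter_{0};
};

// Scheme 2: one sequence per partition. Ids are only ordered within a
// partition, which is the ordering synchronous replication needs.
struct SubtransactionId {
    std::size_t partition;
    std::uint64_t sequence;
};

// Assumed to be owned and called by exactly one thread (the partition's
// actor), so a plain integer is enough and no atomic is needed.
class PartitionSequencer {
public:
    explicit PartitionSequencer(std::size_t partition) : partition_(partition) {}

    SubtransactionId next() { return {partition_, ++sequence_}; }

private:
    std::size_t partition_;
    std::uint64_t sequence_ = 0;
};
```

The open question is still how to get a consistent MVCC snapshot across partitions when there is no single global order to compare against.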
I know I still owe you an architectural doc...fixin' ta, ya know.
1) I include keystore-like stored procedures in the source. They do get/set with integer key and string val. I haven't done thorough benchmarking, but I expect them to outperform the other benchmark I've published, which is quite a bit more complex a workload.
2) (camus2) agreed, nothing ever dies in IT. But roll back the clock a few years. How much NoSQL would have come into existence if there had been a free xyzSQL that scaled across nodes, was fast, etc.? I believe the answer is that there'd be very few network-based NoSQL systems for operational workloads if that had been the case.
3) jwatte: Yeah! Jagged edges too!
4) stephen24: Also, I intend to change the license from AGPL to GPL next time I push out some code. No excuse not to try it out.
5) siliconc0w: There's an architectural write-up at High Scalability: http://highscalability.com/blog/2013/11/25/how-to-make-an-in... -- I believe that the actor model architecture is distinct in InfiniSQL.
6) diwu1989: Yes and no. Yes, MemSQL is more mature. No,
(a) I'm not sure how MemSQL scales horizontally (especially since that was a feature added after v1 of their code was released), and,
(b) MemSQL isn't free software
7) itsbits: for now InfiniSQL is mainly for hackers and early adopters--the dependencies are pretty clearly documented but it requires some effort to work with in its current state