My company is currently using Go to monitor and respond to events on hundreds of thousands of RabbitMQ message queues. We create a goroutine to handle each queue, and the Go runtime takes care of all the concurrency and threading while avoiding the resource overhead of standard threads. All of this is done in an application that took about 6 hours to write.
In the vast majority of cases we deal with, this application doesn't need much in the way of resources.
One specific test uses approximately 20k goroutines and averages 15-20MB RAM depending on test load. As for CPU utilization, the impact is minimal; RabbitMQ is the biggest bottleneck, as our peak message throughput for a single RabbitMQ broker is about 50k messages per second, which our Go process is able to handle without much issue. The worst-case scenario that I've been able to test for is one where those 50k messages are evenly spread across different queues; even then, our CPU utilization wasn't any higher than 15%/core on a 12-core server.
I'm reading over the source with my coffee this morning. I'll pseudo-CR it if I see mistakes (I've commented on a typo already). Hope you don't mind! I'm interested to see how you are doing this because I'm playing with my own toy key-value store on the weekends [1].
In [2], I'm wondering why you set GOMAXPROCS=2*CPU by default?
I experimented with different GOMAXPROCS settings on three machines and noticed that 1*CPU does not run tiedot to its full potential, while 3*CPU seems to be slower than 2*CPU.
With GOMAXPROCS=n*CPU, n is roughly the amount of pre-emptive (vs the built-in cooperative) multitasking that you want going on, with 1 being none. Handled by the OS, of course. Interesting that you noticed a speed up > 1.
I hadn't thought about that. I'll add it to my checklist of things to do when testing/benchmarking my projects... it's like another dimension to take care of when testing. Aside from domain-range, good data, bad data, edge cases... and other parameters, now add concurrency scenarios to that.
Is there anything Golang can't do fairly well with a small code base? Damn, I wish I were still 20 and had shitloads of time on my hands to invest in learning the ins and outs of the language.
It does take Google ~100k lines to load balance their MySQL servers, so the size of your program is still highly dependent on how simple you want to make it. I'm 21 and spending most of my time writing Go -- there aren't many "ins and outs" required to learn. I find that it is very idiomatic to write simple solutions and simply take advantage of interfaces if someone wants to implement a more specific piece of your code base. If you want to learn Go well all you have to do is read the spec[1] and the source code of at least some portion of the standard library.
As an aside, this project is interesting. I've been kinda curious about experimenting with a project like this on my own. However, I wish the author's documentation opened with which ideas from which papers inspired the project.
20, as in before life starts to throw serious time sinks at you: wives, children, mortgages, high-stress jobs, etc.
At 20, I just did whatever the hell I wanted and bummed around Europe before figuring out where to do my masters. Responsibility was not paramount on my mind. Maybe 20-somethings today are different, but not the ones I know; it's still all about having fun, learning new stuff and exploring the possibilities.
I fail to see how anyone could not have understood what the comment meant.
Starbase is patterned after "/rdb", a flat-file relational database adhering to the Unix philosophy, i.e., piping together small, single-purpose tools. The approach is covered in "Unix Relational Database Management" ( http://www.amazon.com/Relational-Database-Management-Prentic... ), a book which anticipated the "suckless" movement by a couple of decades ( http://suckless.org/philosophy ).
It would be nice to see something like /rdb, except with:
1. Better transparent support for optional indexes when querying.
2. Automatic updating of indexes when deleting/updating data.
3. Scripts included in the package written in "rc" rather than "sh".
4. BSD license.
Perhaps something like tiedot could be built on top of the above: a single, statically-compiled binary to expose the flat file database through a JSON/REST interface and to honor the Unix user/group table-file permissions through standard HTTP authentication. Forms could be designed against the web service while system administration is handled with as much of the unix system as possible.
Such a stack would be great for smaller start ups and where *nix experience is available.
According to my (limited) understanding, camlistore is a generic BLOB store, which is not something tiedot addresses. tiedot is a generic unstructured data store, more like CouchDB/Cassandra; it stores serialized JSON data rather than BLOBs.
I'd like to see how this compares performance-wise with MongoDB and other JSON-based document stores, especially with data sets that are at a larger scale. I know Mongo tends to start crumbling if it can't fit an entire index in the available memory (which happens when you have a 4GB data set, unfortunately). Have you done any of those comparisons?
That said, this looks really interesting. Though I can't imagine for the life of me why you'd indent such a wonderful project with tabs. ;)
Currently, tiedot's stance on ACID is similar to MongoDB's.
I have not spent enough time on the project to support redundancy in it, sorry. I totally agree that redundancy is a must-have if someone wants to use tiedot in serious scenarios, so I will definitely spend time on making this feature available.
Scalability on symmetric multiprocessing architectures has been seriously considered and implemented; basically, tiedot can demonstrate that more CPUs = more performance. However, scaling by replication has not been considered yet.
You should decide if you want to make a serious run at being an option for true production-quality deployment, or if this is a fun project. If it's just a fun project, you may want to consider not worrying about replication/redundancy; it's tricky, quirky, and if you haven't been considering it from day one, likely to require a near-complete rewrite, which may be an awful lot of work for a fun project. Of course, if you are going to be serious, it is a must.
I am completely neutral on which direction you go; my point here is just that if you are just having some fun, you may find replication will turn out to be, well, potentially rather unfun. Educational as can be, though. It's a far, far more subtle problem than initially meets the eye.