Paddy, thanks for asking me if I am "familiar with the basics" of a DHT. Just to be clear, I don't think DHTs are the same as messaging systems. But, they touch on a set of related issues. My assertion was that "Just because software runs on multiple machines it does not mean that there is 'no SPoF'". You seem to have read this as saying "in all cases where there are multiple machines, there is always a SPoF".
In any case, I was trying to make a more general point, which is that (A) lots of cases which seem to have no SPoF, in fact do have SPoFs, (B) some cases get accused of having a SPoF, but doing so is a mistake, and Rabbit is in this category because you can set up a number of multi-machine scenarios with the required redundancy, and (C) in many cases having a SPoF is not a problem anyway, and may even be a good thing.
You said "From what I can see in the docs, RabbitMQ is a client/server relationship. Meaning there is a server. Meaning a single point of failure and a bottleneck. I hate those..". With all due respect this represents a misunderstanding of Rabbit. Moreover you assert that 'there is a server' means that there is a 'single point of failure'. Because the conclusion does not follow from the premise, I assumed you must have meant something else. Perhaps you could explain why you think your system has fewer 'bottlenecks' than RabbitMQ or other messaging systems. I don't think that it does but would love to be enlightened.
From your more recent comment, you seem to be saying that using a DHT is a good thing. Yes, I agree, sometimes this is the case. Are you familiar with how RabbitMQ does any of the following: clustering, HA, federation? You say that "Am I saying my solution is closer to what I want than RabbitMQ is? Yes. Yes I am.". I would love to understand this.
I get the impression that you feel I asked this in a demeaning manner. I had no such intent. But some of your statements seem to contradict the fundamentals of a DHT, so I was unsure how much explanation I needed to provide. I meant only to gauge your level of prior knowledge, in order to engage you on that level. I apologise if I come off as brusque; everyone seems to be taking the fact that I did not use their technology of choice as a personal insult, so I'm growing weary of explaining why each individual technology is not what I wanted.
> Just to be clear, I don't think DHTs are the same as messaging systems. But, they touch on a set of related issues.
Agreed. To be clear: I released a DHT. I did not release a messaging system of any sort. I do intend to build a messaging system on top of it, but its uses are not limited to that.
> My assertion was that "Just because software runs on multiple machines it does not mean that there is 'no SPoF'". You seem to have read this as saying "in all cases where there are multiple machines, there is always a SPoF".
Not really. My assertion is just that any software that is built on a DHT and the principles behind it has no single point of failure. And if you were not referring to software built on DHTs, I'm not entirely sure how the comment is relevant to the discussion.
> In any case, I was trying to make a more general point, which is that (A) lots of cases which seem to have no SPoF, in fact do have SPoFs
I would argue that many of the alternatives people have proposed to me fall under this description.
> (B) some cases get accused of having a SPoF, but doing so is a mistake, and Rabbit is in this category because you can set up a number of multi-machine scenarios with the required redundancy
Redundancy does not make the SPoF disappear; it just manages the risk of that SPoF. Perhaps I am using the term beyond its meaning here (sorry!), but I think of this more in terms of the architecture of the cluster. If traffic is routed through a small, centralised subset of machines whose specific purpose is to handle or route that traffic, I consider that to be a SPoF, no matter how unlikely that failure may be. I consider it so because rather than architecting your cluster to avoid issues, you are offloading the issues to an ops/deployment problem. Yes, you can achieve HA, but it is not an inherent part of your cluster; it's bolted on afterwards by some clever duct-taping as you ping servers and swap them out if they seem to be down.
> (C) in many cases having a SPoF is not a problem anyway, and may even be a good thing.
A DHT is not appropriate for all cases, though I would challenge anyone to quote me on ever saying it is. I won't even say it's the best tool for the job I'm using it for; it simply is the tool that fit best with my desired approach to the problem.
> With all due respect this represents a misunderstanding of Rabbit.
This is entirely possible. I have a decidedly rudimentary understanding of Rabbit, something I tried to convey by qualifying all of my statements about it. "From what I can see in the docs", etc.
> Moreover you assert that 'there is a server' means that there is a 'single point of failure'. Because the conclusion does not follow from the premise, I assumed you must have meant something else.
In my understanding, the conclusion does follow from the premise: a server designates a machine that is specifically intended to handle requests. Clearly, I'm misusing the term SPoF. My apologies for the confusion caused by that. In the six months I worked on this, I never once had to explain why I thought a DHT fit my needs; I was speaking to people who worked with distributed systems, so no explanation was needed. This left me a little ill-prepared to explain myself.
> Perhaps you could explain why you think your system has fewer 'bottlenecks' than RabbitMQ or other messaging systems. I don't think that it does but would love to be enlightened.
It's entirely possible my system does not have fewer bottlenecks than RabbitMQ. Again, I'm no RabbitMQ expert. Here's why I think my system has few bottlenecks:
* No change in code or deploy practices is needed between one server and one billion servers.
* Unless a catastrophic event hits the cluster (an event that would leave your application non-functioning, even if Pastry continued functioning), the cluster will remain healthy as servers come and go. This is not a remedy put in place by ops, it is not a bolted-on feature, it is a core premise of the algorithm. I prefer to solve my availability concerns at the software level, rather than at the deploy level. This might be a consequence of the fact that my software is open source, so I try to make it simple for others to deploy. It might be a consequence of the fact that I am more familiar with writing software than I am with deploying software.
* The messaging component is not an element in the architecture; rather, it is an embedded piece of every single element in the architecture that wishes to take advantage of it. There are no messaging servers, there are no brokers, no queues. There is simply your architecture, except now it can communicate efficiently.
Based on these three points, Pastry fit in with the approach I wanted to take in my architecture. It seemed like every other messaging protocol I could think of preferred to have a messaging server, instead. Even if you have a pool of these servers, allowing for HA through redundancy, that is not really what I was looking for.
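As a rough illustration of the points above, here is a minimal Python sketch of the one Pastry idea that matters for this argument: a message key belongs to whichever live node has the numerically closest ID, so there is no dedicated routing machine and a departing node only hands off its own keys. This is not the actual implementation discussed here. Everything in it (`Cluster`, `make_id`, `ring_distance`, the 32-bit ID space, the node and key names) is hypothetical, and it cheats by keeping a global view of the membership, whereas real Pastry nodes each hold a small routing table and leaf set and forward messages hop by hop.

```python
import hashlib

ID_BITS = 32                 # assumption: small ID space, for readability only
ID_SPACE = 2 ** ID_BITS


def make_id(name: str) -> int:
    """Hash a node name or message key into the circular ID space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two IDs on the circular ID space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


class Cluster:
    """Global-view simulation of a set of equal peers (hypothetical helper)."""

    def __init__(self) -> None:
        self.nodes: dict[int, str] = {}   # node ID -> node name

    def join(self, name: str) -> None:
        self.nodes[make_id(name)] = name

    def leave(self, name: str) -> None:
        self.nodes.pop(make_id(name), None)

    def owner(self, key: str) -> str:
        """Return the live node whose ID is numerically closest to the key."""
        if not self.nodes:
            raise LookupError("no live nodes in the cluster")
        key_id = make_id(key)
        # Ties are broken by the smaller node ID so the result is deterministic.
        closest = min(self.nodes, key=lambda n: (ring_distance(n, key_id), n))
        return self.nodes[closest]


# Usage: the same code runs with one node or many, and losing a node
# simply moves its keys to the next-closest peer.
cluster = Cluster()
for name in ("node-a", "node-b", "node-c"):
    cluster.join(name)

before = cluster.owner("chat/room-42")
cluster.leave(before)                     # the owning node disappears
after = cluster.owner("chat/room-42")     # another peer now owns the key
```

In this sketch, keys that were not owned by the departed node keep their owner, which is the sense in which no individual machine is special.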
> you seem to be saying that using a DHT is a good thing. Yes, I agree, sometimes this is the case.
We are in agreement, then. Examine your problem, then choose a tool for the job. A lot of people seem to be taking issue with the fact that I created a solution to the problem I saw, instead of contorting the problem until it fit pre-existing solutions.
> Are you familiar with how RabbitMQ does any of the following: clustering, HA, federation?
I am familiar with this page: http://www.rabbitmq.com/ha.html In addition, I have seen this page: http://www.rabbitmq.com/distributed.html
Both of them feel like a bolted-on solution to the problem of distributed message passing, rather than an inherent design characteristic. Allow me to quote:
> some important caveats apply: whilst exchanges and bindings survive the loss of individual nodes, queues and their messages do not. This is because a queue and its contents reside on exactly one node, thus the loss of a node will render its queues unavailable.
This is understandable; queuing is pretty much impossible (as far as I know) to achieve in a distributed system. But I don't need queuing, so why should I let that hamstring me unnecessarily?
> should one node of a cluster fail, the queue can automatically switch to one of the mirrors and continue to operate, with no unavailability of service.
This does not feel like an inherent aspect of the design. This feels a lot like a deploy detail.
> In normal operation, for each mirrored-queue, there is one master and several slaves,
When I hear "master" and "slave", they translate in my head to "bottleneck/SPoF" and "backups".
> Clustering connects multiple machines together to form a single logical broker.
That sounds an awful lot like a single element in the system that is responsible for message traffic. There may be a lot of machines in that single element, but it is a single element nonetheless.
I am not saying that RabbitMQ is a bad solution for message passing, nor am I saying my problem couldn't be solved by RabbitMQ if I changed my architecture to fit Rabbit's needs. All I am saying is that I prefer an architecture that does not take advantage of the things Rabbit does really, really well, and does take advantage of the things that a DHT does really, really well. So I'm more than a little confused that people are up in arms over the fact that I used the paradigm that matched my preference instead of trying to force something to do what it was not intended to do.