Hacker News new | ask | show | jobs
by OJ 5438 days ago
Hi equark,

Thanks for the question. Yes there are reasons why there are such differences between the CorrugatedIron configuration and that of the project you have linked to. To understand them, I think it's important to look at the differences between the two services being used, and the goals of the libraries that connect to them. Please bear in mind that I am no expert with Redis, nor the libraries that connect to it.

Production Riak configurations are clustered and hence it makes sense to distribute the connections across the entire cluster to avoid hammering one node. Clusters are often hidden behind some kind of proxy or load-balander such as HAProxy. Hiding behind a proxy is a great way of removing the need for clients to worry about the nodes that are in the cluster and managing their lifetimes.

In the .NET world I haven't yet seen many (or any) production configurations where there is a proxy, such as HAProxy install, sitting between an application and it's database. I'm sure there are many reasons for this, inculding built-in clustering for MS SQL (the most popular choice for RDBMS sitting behind .NET applications) or the lack of a cluster full stop.

When putting together the design for CI, we thought about what setup people are likely to have when they reach production. We came to the conclusion that there's a good chance that most people would be interested in reducing the number of "moving parts" and avoid the need to put a proxy in place just to load-balance connections across a Riak cluster. As a result, we decided to attempt to put this into the client library itself, hence removing the need to have a proxy installed. This doesn't, however, force you to do so. If you do have a proxy application which is load-balancing connections to your Riak cluster, then you simply have to modify the CI configuration so that there is just one node in the cluster and point it to the proxy.

This design allows both rich client and web applications to connect to the whole cluster without the need of a proxy. The "problem" with this design is that the cluster needs to be configured, and in the .NET world that is generally a little verbose.

The specification of the configSection is something that is required in .NET if you want your configuration to be included in app.config or web.config files. Forcing people to have separate config files would not make sense and goes against convention in the .NET world. However, if you do want to have your own configuration file outside of the usual locations, we support that too (just pass in the full path to the config file as an extra parameter to RiakClusterConfiguration.LoadFromConfig()).

Before embarking on the development of CI, we wanted to make efforts to reduce the amount of management that the user had to do. By management I mean handling the lifecycle of connections, freeing up resources, figuring out which node to connect to, etc.

The net effect of this effort is that once a cluster has been instantiated (which, mind you, should only be done once for the entire application's lifetime) the user can pull client instances out of the cluster and do not have to think at all about connection management. Getting hold of a client becomes as simple as:

    var client = cluster.CreateClient();
RiakClient instances are not disposable. RiakClient API calls, behind the scenes, use higher-order functions to avoid the leaky abstraction that is disposable resources and hence the RiakClient object doesn't have to manage connection lifetimes either. Get a client, use it, don't worry about cleaning up after yourself because we have that covered already. This gets rid of a log of fluff like:

    using(var client = ....)
    {
      client.DoStuff();
    }
.. and turns it into:

    client.DoStuff();
It might be just me, but I do prefer reducing "boilerplate" throughout the user's codebase and having just a little more in the up-front configuration than the other way around. This also gives us the ability to load-balance calls across the nodes in the cluster without the user having to think about it. Again, I think this makes the API cleaner, easier to use and reduces the chances of boilerplate leaking out into user code.

I think comparing this design to that which you have linked to, the RedisConnection, isn't really comparing apples with apples. But for the sake of discussion the RedisConnection, behind the scenes, obviously has some default values (such as port number) hidden away. We do the same, everyone but the node name and the host address has defaults behind the scenes and could be reduced to something as simple as <node name="foo" host="bar" />. This isn't hidden in the sample configuration so that people can see the options that are available to them. Also note how the user of this connection needs to know, in code, about the host they're connecting to. Of course, this could be specified via configuration too, but it has to go somewhere, and that will result in boilerplate. You're also managing the connection yourself, and the users of the client run the risk of leaking resources if they forget to clean up at the right time. Lastly, the RedisConnection doesn't appear to be pooled (please correct me if I'm wrong) and hence managing the number of connections to your Redis instance is up to the API user. In the case of Redis, this might be ok, but in production, it tends to be a good idea to limit the number of connections to services as network resources can be expensive.

So, just to reiterate, the design goals of both of these clients are obviously quite different. CI configuration is only done once up-front, and from there client instances are created simply by calling `cluster.CreateClient()`, management of resources is done for you, and it has built-in load balancing. Very different to Booksleeve from what I can tell (again, happy to be corrected).

I hope this helps clarify the stance and helps understand why and where the sacrifices were made.

Thanks again for the question.

OJ (TheColonial).

2 comments

It sounds like you're making a lot of assumptions about its use case, without actually using it in production. This whole thing sounds drastically over-designed.

And that initialization code is symptomatic of everything that is wrong with the statically-typed OO world. If you really need all that XML, what's wrong with this?

  var cluster = new RiakCluster("riakConfig");
  var client = cluster.CreateClient();
  var value = client.Get("my_bucket", "my_key");
...and I suspect you could probably omit the parameter to RiakCluster, since it appears to be fixed. If lib users need more control over configuration, then they can write extra code towards that. Don't require it for the basic, default case.
jallman,

You're right, we are making a few assumptions, which to me is natural given that this is v0.1.0. It hasn't been used it production because it never existed before now.

I fail to see how enabling clustering and connection pooling via configuration is over-design. Perhaps you could share some more detail on that?

The first line of your code sample is making assumptions itself. No, you can not omit the parameter to RiakCluster, the section name in the configuration is not fixed and is configurable by the user. This means you can have multiple clusters if you want to and configure them both in your config and specify them by name in your code.

The last two lines are exactly the same as what we have.

So really the only difference I see in your suggestion is the way that the configuration is wired in. Here's where your assumption comes in: you have assumed that the configuration is always in the same spot (ie. an app.config or web.config). There are no guarantees that this is the case.

Yes, we could probably change the interface so that it does search common configurations by default, and that is a suggestion that we're more than open to. As I said, this is v0.1.0 and it's just been unveiled. After using it, I'd be interested to see if you consider the configuration/wiring (which is still quite small, and a one-off task) painful.

We seem to be quibbling over one extra line of code, which ironically is substantially shorter than the comment that either of us has posted.

Having said that, we are always looking for ways to improve this kind of stuff, as I too hate unnecessarily verbose configuration. In our case, for version 0.1.0, we wanted to make sure the ability to configure and wire up in many different ways was available. In future revisions we'll no doubt look at home we can make this simpler.

With all this in mind, along with what I said in previous comments, how would you go about enabling this level of flexibility in the library without, as you put it, making it "drastically over-designed"?

> You're right, we are making a few assumptions

Protip: build just as much as needed, and iterate as needed. You'll often find your early assumptions are not always the correct ones.

If you posted this saying, "Hey, this has been proven on a 20-node Riak deployment," then you would have a lot more credibility. Right now though, it seems like this is solving a Maserati problem for you.

> The first line of your code sample is making assumptions itself.

Read the rest of what I said. equark in this thread is also saying the same thing. Favor convention over configuration.

Here's another idea: dump the XML, or at least give us the option to bypass it and specify servers in code. The XML loader necessarily maps back to internal data structures, so why not cut it out completely? That is one-off; twenty lines of boilerplate is a PITA for quick projects, as are redundant configs sitting around uselessly.

I don't work with .NET (just offering up a critique) -- if that goes against the .NET orthodoxy, oh well. That'll be another reason I don't use it.

Thanks again for the comment.

Your "protip" implies that "needed" is a known quantity, which it isn't in this case. It's a new project, new development without any clients at all. Personally, I would rather give too much flexibility and wind it back after a while than not give enough flexibility and have nobody use it because they can't configure it to their needs. The last thing I would want to have happen is for people to give up an attempt to use it because it isn't able to do certain things. That to me is not a protip if you're hoping that people will use it.

I wish I could post saying "Hey this has been proven on ..." but I can't, because it hasn't. My personal credibility has nothing to do with it at all. If you evaluate software purely on someone's credibility then you'd have barely anything to choose from.

For me, it's solving a problem that I have, and solving it quite well. From the feedback I've got elsewhere, others have felt the same and don't really have an issue with a typical XML configuration that you find in almost any .NET application.

While I appreciate the idea, I am not dumping the XML. It's idiomatic .NET configuration, and I am not going to fly in the face of this convention due to my, or any other person's, bias against XML.

You can specify the configuration through code, the XML-driven config is just one way that you can configure it.

I'm not sure which twenty lines of boilerplate you're talking about. I see 2 for configuration, and one to create a client. I don't see the redundancy, and I also don't see any of the configuration as useless. Each entry serves a specific purpose, and can be specified or left as a default (except for important properties such as host name). I don't believe that forcing hard-coding of configuration is a better option over XML, particularly in the ecosystem this library lives in.

> Read the rest of what I said.

I read it. Multiple times. And that was before I responded.

equark is saying that he doesn't want to write boilerplate code. You're saying that XML config is redundant and sitting around uselessly. I don't see those as the same thing.

If you don't work with .NET then obviously this library isn't targeted at you. Hence, feel free not to use it :)

Personally this difference of opinion is moot, as the configuration/wiring of settings is such a small part of the library. It is possible to configure it easily to fit your environment, even if it's 1 or 2 lines of code more than you'd like.

I'd be keen to hear what you had to say about the other 98% of the code. ie. The bit that actually does something interesting.

Thanks for the critique.

My design preference is to make sure the default option feels natural in the f# or ironpython interactive console. I should be able to interact with Riak with no XML and one line to connect. Obviously it's only 2% but it's the first 2%.
This is the most interesting point made so far. Agreed, I had not considered the likes of FSI. That does come with the issue of being painful reading XML config from app.config files, but can easily be handled via what we have.

Would you consider this to be out of the question:

    open CorrugatedIron.Comms
    
    let cluster = RiakCluster.CreateFromConfig("section", "path/to/file.config")
    let client = cluster.CreateClient()
Thanks for the detailed response. First, I'm glad you guys are doing this. It's great, I didn't mean to be overly negative.

I'd just lean towards having sensible defaults, so that first experience is:

var client = new RiakClient()

I just get nervous when I see what seems like should being a very simple API using ConnectionFactory and company.

I agree with the point about the ConnectionFactory. That is something I still don't like. It's there, at the moment, to make it easy to test certain parts of the app. We'll look to add an overload to this so that we don't have to specify it for non-testing scenarios.

As a quick side-node something as simple as:

    var client = new RiakClient();
is nice, but has to infer a _lot_ from the world. We could write a lot of code to make sure that it's smart enough to pull all the details from the default locations or we could let people tell us where to look. We went for the latter, as it's much safer to be told than to pull config from the wrong spot.

Again, we'll look to improve this, particularly as we get more feedback from people using the library.

Thanks!

equark,

I don't think you were being overly negative. It just tends to be common for people on HN to be more terse than necessary and sometimes things can come across the wrong way. I have taken no offense :)

The sensible defaults thing is definitely something that should be in place. For v0.1.0, the goal was to provide the flexibility out of the box rather than the lack thereof. No doubt for v0.2.0 (already in the works) we'll have adjusted this to make it simpler for the more common cases. We didn't want to limit people out of the box.

Configuration of this kind of thing is hard when you can technically have multiple clusters, so somewhere along the way when wiring up the cluster you do need to specify something :)

At least we have the ability to do whatever we need. Now, with help from the community, users, and commenters like yourself, we can improve things to keep as many people happy as we can.

Thanks again for the comment.