Scylla release: version 1.0 (scylladb.com)
57 points by hexalisk 3717 days ago
3 comments

It's so refreshing to see an open source project have the balls to announce a 1.0 early and not stay in the eternal "0.x we're still beta so don't blame us for any problems"-phase.

One thing that's stopping us from switching over is that we need more real world success stories of other people making the switch and being better off. For all its warts, Cassandra is the devil we know, but ScyllaDB is definitely something we're keeping an eye on.

One suggestion that might make it easier for people to move over is thorough testing of mixed clusters. I wouldn't mind switching one node in my cluster over, if I could be certain it wouldn't screw up the entire cluster. I saw that they ran through the Jepsen tests and did a good job; it would be very interesting to see how a 50/50 Cassandra/ScyllaDB cluster would fare in the same tests!

We don't support mixed clusters, though, and are unlikely to do so.

Most people testing Scylla at this point are doing A/B testing, using some proxy to replicate the writes to both clusters and then comparing the two.
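
For readers who haven't set up this kind of A/B test, here is a minimal sketch of the dual-write idea in Python. Everything in it is hypothetical: `InMemoryCluster` stands in for a real client of either database, and `DualWriteProxy` is an illustrative name, not an existing tool. It only shows the shape of the approach: mirror every write, serve reads from the cluster you already trust, and diff the two afterwards.

```python
from typing import Any, Dict, Hashable, List, Tuple


class InMemoryCluster:
    """Hypothetical stand-in for a real cluster client (not a real driver)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[Hashable, Any] = {}

    def write(self, key: Hashable, value: Any) -> None:
        self.rows[key] = value

    def read(self, key: Hashable) -> Any:
        return self.rows.get(key)


class DualWriteProxy:
    """Mirrors every write to both clusters; reads come from the primary only."""

    def __init__(self, primary: InMemoryCluster, shadow: InMemoryCluster) -> None:
        self.primary = primary
        self.shadow = shadow
        # A dict used as an insertion-ordered set of every key written.
        self.keys_seen: Dict[Hashable, None] = {}
        self.shadow_errors: List[Tuple[Hashable, str]] = []

    def write(self, key: Hashable, value: Any) -> None:
        # The primary write must succeed; a shadow failure is only recorded,
        # so the cluster under test can never break production traffic.
        self.primary.write(key, value)
        try:
            self.shadow.write(key, value)
        except Exception as exc:
            self.shadow_errors.append((key, repr(exc)))
        self.keys_seen[key] = None

    def read(self, key: Hashable) -> Any:
        return self.primary.read(key)

    def compare(self) -> List[Tuple[Hashable, Any, Any]]:
        """Return (key, primary_value, shadow_value) for every mismatch."""
        mismatches = []
        for key in self.keys_seen:
            expected = self.primary.read(key)
            actual = self.shadow.read(key)
            if expected != actual:
                mismatches.append((key, expected, actual))
        return mismatches
```

An empty list from `compare()` after replaying real traffic says the two clusters agree on every key the proxy saw; a real setup would also need to compare latencies and handle deletes and concurrent writers, which this sketch ignores.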

Okay, I understand that the burden of correctly supporting the internal gossip protocol is very high.

It's a shame though, because it makes it harder for medium-sized installations to convert. If you're running a small cluster, then getting a few more machines to test it out isn't that much of a burden. If you're so big that you have multiple clusters, you can probably roll through and upgrade cluster by cluster or keyspace by keyspace or something.

But between those two there's a medium size installation where an upgrade to ScyllaDB is a serious project that would eat up a sizeable chunk of the server budget just to test it out, and the ability to do a server-by-server upgrade of a running cluster would be invaluable.

On the other hand, that type of customer probably wouldn't be interested in premium support either, so there's probably little incentive for you to support that scenario over the large installation scenario. And that's totally fair, I might as well be wishing for unicorns and rainbows. :-)

On another note, DataStax recently announced that they'll be discontinuing the free version of OpsCenter from Cassandra 3.0 onward. So you have a shot at grabbing people that are stuck on 2.x, because if there's a cost to upgrading anyway, and if you could offer something better than OpsCenter, your alternative might look very tempting indeed.

Supporting the gossip protocol is hard, but so is supporting everything else. A more interesting question here is: if there is a bug in the gossip protocol, Cassandra side, and the addition of a different implementation of the same protocol happens to trash your cluster, who do you blame? How long until you find out?

That sounds like a support nightmare. We're willing to revisit this in the future, but I think it is an unlikely development.

On the other hand, if the performance benefits prove to be true, you might be able to test Scylla with far fewer machines than you have for Cassandra.

I think they have the luxury of targeting a specific, invariant set of requirements vis-a-vis the Cassandra project, so that helps. But still, their speed has been phenomenal.

I really hope to try out Scylla with some of my Cassandra schemas... see how much of the 10x to 20x speed advantage really bears out.

I was suspicious of these guys originally, but they've been pretty upfront with testing, and despite the convenience of Java, I can see how they might get massive improvements in speed by doing lower-level C/C++ code.

If you do run your own benchmarks, please publish them - together with your methodology.

We fully stand behind our claims, and don't want them to be dismissed as "oh, it's the vendor saying so, of course it's good".

If you do happen to find an odd case in which we don't perform well, we'd love to hear about it and will work with you on fixing it.

What's the status of secondary index? It was described as "half-ready" here[1]. Is that still the case?

Also does Scylla support some sort of data locality as far as secondary index is concerned?

For example if I want to store comments, I'd want to store all comments that belong to the same forum thread in a single node. All these comments have the same thread_id. Then I can have a secondary index on thread_id. When I want to get all comments that belong to a certain thread I can just query on the thread_id secondary index and only 1 node is queried. Is that possible?

[1] https://github.com/scylladb/scylla/issues/401
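
A side note on the locality part of the question. In the Cassandra data model, rows that share a partition key are stored on the same replicas, so one way to get single-node reads for a thread would be to make thread_id the partition key instead of indexing it. The Python toy below sketches why that works. It is an assumption-laden illustration, not how either database is implemented: the node names are made up, and a plain hash-modulo stands in for the real partitioner and token ring.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical three-node cluster; names are for illustration only.
NODES = ["node-a", "node-b", "node-c"]

Comment = Tuple[str, int, str]  # (thread_id, comment_id, body)


def node_for(partition_key: str) -> str:
    """Map a partition key to one node (toy hash-modulo, not a real token ring)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def place(comments: List[Comment]) -> Dict[str, List[Comment]]:
    """Assign each comment to a node using thread_id as the partition key."""
    placement: Dict[str, List[Comment]] = defaultdict(list)
    for comment in comments:
        thread_id = comment[0]
        placement[node_for(thread_id)].append(comment)
    return placement


def nodes_holding(thread_id: str, placement: Dict[str, List[Comment]]) -> Set[str]:
    """Return the set of nodes that store at least one comment of the thread."""
    return {
        node
        for node, rows in placement.items()
        if any(row[0] == thread_id for row in rows)
    }


comments = [
    ("thread-1", 1, "first"),
    ("thread-2", 1, "hello"),
    ("thread-1", 2, "second"),
    ("thread-3", 1, "hi"),
    ("thread-1", 3, "third"),
]
placement = place(comments)
```

Because the node is derived only from thread_id, `nodes_holding("thread-1", placement)` always contains exactly one node, however many comments the thread has. Replication would add more copies, but a read for one thread still targets one replica set rather than the whole cluster.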

We are just starting to develop our solution for secondary indexes. We haven't yet decided how we'll implement it.

So a feature that was described as half-ready 7 months ago is actually still undecided design-wise 7 months later? That's surprising. When do you think secondary index will be available?

This feature was never described as half-ready. The ticket said we had half-ready code, but it's part of the design process to choose whether or not we'll use that code. I can see, however, how this may give a reader the wrong impression, and I apologize for that.

As can be seen in the follow-ups for that ticket, we haven't yet decided if we'll support standard secondary index, SASI, some form of materialized views, or all of them. Until we do, it's hard to commit to a timeline. Basic secondary index (for which we have half-ready code) is pretty simple, and if we do implement it, it should land in our main version in a couple of weeks. The others are a bit more complicated.

How does Scylla compare to Aerospike? They seem pretty similar.

They are similar in that both are built in C/C++ with performance in mind. They are very different in that Aerospike is a key-value store, while Scylla, like Cassandra, is columnar (Key, Key, Value). This gives Scylla much richer semantics.

There are more differences, like Aerospike holding all keys in memory, while Scylla has no such limitation. I'm sure there are more differences in tunable consistency, HA and multi DC, but I'm not an Aerospike expert.
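
To make the (Key, Key, Value) point concrete, here is a small Python sketch contrasting the two shapes. It models the data layout only and says nothing about how Aerospike or Scylla store data internally; all names are hypothetical. The first key picks a partition, the second key orders rows inside it, and that second key is what allows range reads within a partition.

```python
from typing import Dict, List, Optional, Tuple

# Plain key-value shape: one opaque value per key.
kv_store: Dict[str, str] = {}


def kv_put(key: str, value: str) -> None:
    kv_store[key] = value


def kv_get(key: str) -> Optional[str]:
    return kv_store.get(key)


# (Key, Key, Value) shape: partition key -> clustering key -> value.
wide_store: Dict[str, Dict[int, str]] = {}


def wide_put(partition_key: str, clustering_key: int, value: str) -> None:
    wide_store.setdefault(partition_key, {})[clustering_key] = value


def wide_slice(partition_key: str, start: int, end: int) -> List[Tuple[int, str]]:
    """Return rows of one partition with start <= clustering_key < end, in order."""
    rows = wide_store.get(partition_key, {})
    return [(ck, rows[ck]) for ck in sorted(rows) if start <= ck < end]
```

With the key-value shape, fetching "comments 1 and 2 of thread-1" means knowing both full keys up front; with the second shape it is a single ordered slice of one partition, e.g. `wide_slice("thread-1", 1, 3)`.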

I don't know Scylla, but Aerospike is intentionally designed for low-latency networking [1] (i.e., typically colocated within a single rack), which can be a challenge in the cloud.

[1] https://aphyr.com/posts/324-jepsen-aerospike

There is nothing cloud-specific in Scylla, though. Scylla will also target low latencies, and we have been very successful at that. But we also commit to supporting the Cassandra data model, so we won't chase even lower latencies if it means breaking that.