Rochefort – Poor Man's Kafka

Hacker News

	Rochefort – Poor Man's Kafka (github.com)
	31 points by booknomads 3085 days ago

8 comments

saryant 3085 days ago

I've run Kafka at large scale. I've also seen even larger scale attempts to replace it.

Just use Kafka. Seriously, it's rock solid and is practically the lingua franca of backend architecture these days. Everyone understands it, and every data processing framework or service supports it.

Kafka is much, much more than just distributed pub/sub. Its disk cache optimizations alone make rolling your own a terrible idea.

kylecordes 3085 days ago

This is a principle our industry implements poorly. It often seems like each new generation (for very small values of the word generation) must reinvent the same thing. Perhaps because the old thing was too complex to understand immediately... complexity driven by the needs of the underlying problem... complexity which the new implementation will inevitably obtain if it survives long enough and becomes popular enough that anyone cares.

(That said, I'm highly in favor of innovation of most any kind; building new things is great, if the new thing has some plausible innovation over the old thing!)

borplk 3085 days ago

The industry itself feeds this cycle by rewarding people for creating new projects and demanding shiny github projects.

maltalex 3085 days ago

It’s true not just in software but in many aspects of modern society. We value innovation (even when it is not innovative) a lot more than maintenance. Just look at physical infrastructure as an example.

bonesss 3084 days ago

> Kafka is much, much more than just distributed pub/sub

In between 0% Kafka and 100% Kafka: Kafka's REST proxy gives a minimalistic API surface for using the system. Recreating that API in another language with a simpler backend is highly achievable...

My Kafka installation has a parallel relational backend that provides an onboarding story for smaller apps and groups, for example. It provides about 13% of Kafka's functionality and can't scale meaningfully, but it provides baseline data streaming in a pinch and is API compatible with how we use Kafka most of the time.

luhn 3085 days ago

If you're looking for a log that's not Kafka, it's also worth checking out Redis Streams, which will be included in the upcoming Redis 5.0 release.

http://antirez.com/news/114 (I think the API has changed a bit since this blog post, but the concepts and capabilities are the same.)

voxadam 3085 days ago

On a somewhat related note, there is an implementation of Kafka written in Go called Jocko.[0]

[0] https://github.com/travisjeffery/jocko

tbrock 3084 days ago

This is actually interesting, because Kafka is great and the protocol makes sense, but I want a binary I can just run so I don’t have to deal with the Java ecosystem.

jitl 3085 days ago

Backstory and caveats:

> Losing Data and NIH

> You can lose data on crash and there is no replication, so you have to orchestrate that yourself doing double writes or something.

> The super simple architecture allows for all kinds of hacks to do backups/replication/sharding, but you have to do those yourself.

> My use case is OK with losing some data, and we don't have money to pay for kafka+zk+monitoring(kafka,zk), nor time to learn how to optimize it for our quite big write and very big multi-read load.

> Keep in mind that there is some not-invented-here syndrome involved in making it, but I use the service in production and it works very nice :)
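The "double writes" workaround mentioned in the README can be illustrated with a small sketch. This is not Rochefort's code: the length-prefixed record layout and the `AppendLog` / `DoubleWriteLog` classes are hypothetical, and both copies live on one machine here, whereas a real setup would write to two separate servers.

```python
import os
import struct
import tempfile

# Hypothetical record layout: 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct(">I")


class AppendLog:
    """A hypothetical single-file, append-only log addressed by byte offset."""

    def __init__(self, path):
        self.path = path
        # Start appending at the current end of the file (0 for a new file).
        self.size = os.path.getsize(path) if os.path.exists(path) else 0
        self.f = open(path, "ab")

    def append(self, payload: bytes) -> int:
        offset = self.size
        self.f.write(HEADER.pack(len(payload)) + payload)
        # Hand the bytes to the OS; there is no fsync, so a crash can still lose data.
        self.f.flush()
        self.size += HEADER.size + len(payload)
        return offset

    def read(self, offset: int) -> bytes:
        with open(self.path, "rb") as r:
            r.seek(offset)
            (length,) = HEADER.unpack(r.read(HEADER.size))
            return r.read(length)

    def close(self):
        self.f.close()


class DoubleWriteLog:
    """Poor man's replication: every append goes to two independent logs."""

    def __init__(self, primary: AppendLog, replica: AppendLog):
        self.primary = primary
        self.replica = replica

    def append(self, payload: bytes) -> int:
        a = self.primary.append(payload)
        b = self.replica.append(payload)
        # Identical records in identical order must land at identical offsets.
        if a != b:
            raise RuntimeError(f"replicas diverged: {a} != {b}")
        return a

    def read(self, offset: int) -> bytes:
        # Fall back to the replica if the primary read raises
        # (e.g. the file is missing or its header is truncated).
        try:
            return self.primary.read(offset)
        except (OSError, struct.error):
            return self.replica.read(offset)


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    log = DoubleWriteLog(AppendLog(os.path.join(d, "a.log")),
                         AppendLog(os.path.join(d, "b.log")))
    first = log.append(b"hello")
    second = log.append(b"world")
    print(first, second, log.read(second))  # 0 9 b'world'
```

The two writes are not atomic: a crash between them leaves the copies out of step, which is exactly the kind of orchestration the README says you have to handle yourself.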

549362-30499 3085 days ago

I'm scratching my head about "we dont have money to pay for kafka+zk+monitoring(kafka,zk)". Kafka and Zookeeper are both open source. As are monitoring and alerting tools such as Prometheus. Surely the hosting and storage costs are similar. So what does this project offer its creator, other than a great deal of infrastructural debt and all the latent bugs of a roll-your-own solution that lacks a community?

booknomads 3084 days ago

Just setting that up will cost at least 2-4 GB of RAM, and we are stretched thin as it is; 2 GB of RAM would mean adding one more node to our Kubernetes cluster in gcloud.

My team and I understand the 300 lines of code that go into Rochefort and can twist and modify it for our needs.

Performance will make or break our startup, which does real-time user behaviour analytics. Having written high-performance Java for a while, I know very well how much time I would have to spend reading G1 logs to fine-tune it.

I am sure we won't use Rochefort after we scale up, but for now I think it gives us greater velocity than Kafka (if we want to modify Kafka, even a simple change takes us a week).

I want to be able to add more metadata to the header, read the files from another process, rsync them to my laptop and read them there, add custom reducers, etc. All of those things take me minutes with Rochefort and days with Kafka.
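Part of the appeal described here is that the data is just a file. As an illustration only (the layout is an assumption, not Rochefort's actual on-disk format), a record header that carries one extra metadata field, plus a reader that any other process can use to scan the file without going through the service, might look like this:

```python
import os
import struct
import tempfile
from typing import Iterator, Tuple

# Hypothetical header: payload length (uint32) plus one custom metadata
# field, here a uint64 timestamp in milliseconds. Adding metadata means
# adding fields to this format string.
HEADER = struct.Struct(">IQ")


def append(path: str, payload: bytes, ts_ms: int) -> int:
    """Append one record and return the byte offset where it starts."""
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)
        offset = f.tell()
        f.write(HEADER.pack(len(payload), ts_ms) + payload)
    return offset


def scan(path: str) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (offset, ts_ms, payload) for every complete record in the file.

    Needs no server, so it can run in a separate process or on a copy that
    was rsynced elsewhere. A partially written trailing record is skipped.
    """
    with open(path, "rb") as f:
        offset = 0
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                return
            length, ts_ms = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length:
                return
            yield offset, ts_ms, payload
            offset += HEADER.size + length


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "events.log")
    append(path, b"click", 1000)
    append(path, b"scroll", 2000)
    print(list(scan(path)))  # [(0, 1000, b'click'), (17, 2000, b'scroll')]
```

A "custom reducer" in this picture is just a loop over `scan()`, for example counting records whose timestamp falls in a given window.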

matt_wulfeck 3085 days ago

I'm assuming he means money as in time, to install, configure, optimize, and monitor those distributed systems.

nevi-me 3085 days ago

I'm self-taught and have a single dedicated server, on which I run a single-instance Kafka on top of ZK. Yes, I lose the benefits of replication, failover, etc., but I don't need them. The whole installation took me half an hour, learning Kafka took maybe 3 hours, and as long as my server's been up, Kafka's been up.

Granted, I am not monitoring Kafka, but I do monitor other processes.

The other nice thing is that now that I have ZK, other software that needs it can just reuse the same process.

I think using the maintenance cost as a reason to write your own tool is a short-sighted decision.

549362-30499 3085 days ago

Even if that's the case, deploying and scaling a Kafka cluster is something that hundreds of companies have figured out and publicly written about. It's something that you can hire an experienced engineer to fix. When this thing runs into problems, they will be all new ones.

bonesss 3084 days ago

These days you can also find Kubernetes scripts that handle the Kafka installation, setup, routing, etc.

That just leaves the "simple work" of administering and tuning which, as you pointed out, is competence that's steadily growing in the industry.

jdormit 3085 days ago

As opposed to the time required to implement, debug, and support a custom solution?

ckocagil 3085 days ago

I love the concepts Kafka defines so clearly, but the software is too complex and has dozens of knobs you have to adjust.

Simple code with "obviously no bugs" vs complex code with "no obvious bugs".

smugworth 3083 days ago

> I love the concepts Kafka defines so clearly, but the software is too complex and has dozens of knobs you have to adjust.

This is one of our biggest headaches, and it's not even that a Kafka server itself is so configurable. We have hundreds of teams writing client applications, and jumping on bridges because Kafka clients have poor configurations is getting old. Too many knobs to twiddle, but I guess that's what happens if you're expecting to be able to tweak for high performance.

bognition 3085 days ago

I'm always very curious about the backstory of projects like this. Without that backstory there is very little chance I'd try out something like this.

Ideally the README would explain why Kafka didn't cut it, why the trade-offs the authors made were worth it (in this case), and why I should consider using this system.

Sadly I don't have enough time to read an entire repo of code to try and figure these things out.

LogicX 3085 days ago

At least read the bottom of the README, where some of this is covered.

agnivade 3083 days ago

Question to OP - Did your team check out NATS (https://nats.io/) ? What are your thoughts on that ?

z0r 3085 days ago

this repository appears to be just a hair over a week old, so i am skeptical even of "I use the service in production and it works very nice". fun project i'm sure, but if i felt like breaking the rules and engaging in a little NIH of this sort - i'm not sure i'd choose HTTP (or any other network protocol) as the hub to build it around

amerine 3084 days ago

I don’t disagree with you, but it’s also possible it was modeled on or extracted from something non-public.

miguelrochefort 3085 days ago

How did you pick the name?

joefreeman 3085 days ago

"Rochefort Trappistes 10 is my favorite beer, and I was drinking it while doing the initial implementation on Sunday night"
