Your comment and the parent aren't mutually exclusive. It's fairly easy for coders to provide vast amounts of technical justification for resume-driven development.
Is a new immature language like Pony really that desirable in the marketplace? I highly doubt 'resume development' was ever a primary motivation.
If it was anything beyond the strategic interests of the company, it's that developers love a) the clean code and possibilities of a fresh new project and b) playing with new, modern toys.
The characterisation of JVM GC behavior seems slightly unfair.
Saying that JVMs are 'stop the world' and Pony is 'concurrent' feels like it's ignoring modern JVM GC strategies.
It might be true that you can avoid stop the world collection for an actor-system, but by logical extension would that not be possible on the JVM as well for that particular workload, given a suitably designed actor-system?
It's possible, yes. At the time we started working on Wallaroo, the only real option for concurrent GC on the JVM was Azul. And we didn't want to tie our product and our goals to another company's commercial offering.
My intent was not to be unfair. There's a lot of nuance in the topic that can be hard to cover in a more general blog post. Garbage collection is a fascinating topic, and a great amount of detail is left out of that post. I was going for a broad overview of general thinking.
I think this statement is a reach. There are plenty of concurrent collectors for the JVM, and there are tons of architectural and coding strategies to mitigate inconvenient GCs. This is not a one-of-a-kind problem; it is well studied and the path is well trodden. Azul is a convenient solution that wouldn't involve having to do a whole lot of case study, but it's certainly not the only one. And this is as if perfectly deterministic latency were the only factor that mattered (in which case, why not use C?).
What it sounds like instead is that very little benchmarking or science was performed ahead of time, and then a lot of justification was written afterward. I know that's a reach as well, but this is a well-trodden path, so the answer "Pony is the best possible solution for the interest of our business" just seems like a very strange conclusion.
I see the argument for, and the utility of, using the right tool for the job, even if it is a fairly new language, more than most people do. But I also have a higher tolerance for risk than most people (and have experienced the downside costs of those choices; they are very real).
Choosing newer languages gets maligned far more often than pigeonholing the wrong old languages onto problems.
Besides, someone's got to take risks with (potentially) better technology. As long as they know the risks and fully considered them going in, then by all means. Plus, the longer we continue to use C for all systems development, the longer we'll have preventable security issues.
I've read that blog post and some of your other posts on HN. I get why the JVM, C/C++, and Go were not fits. However, I have not seen a lucid explanation of why you didn't go with Erlang or Elixir.
We looked at Erlang. Several of us are friends with folks who worked at Basho on Riak and we talked with them about our performance goals. They were very skeptical that we could meet them using Erlang. Based on that, we moved on from Erlang.
It's worth posting the long version. I've been following the various Pony blog posts with interest (I'm both a language geek and a distributed systems geek), but I always come away with the notion "Huh, kinda cool, but why didn't they just use Erlang? It'd be a great fit for this".
So either:
a) Erlang is not a good fit, and I'm wrong. Then I'd really like to know why I'm wrong!
b) Your friends at Basho led you astray. Would also be interesting to know what happened in this case!
Either way, without knowing more details, the short version you just posted is inconsistent with the claim that you guys did serious research into existing language ecosystems before going your own way.
Erlang, while having many virtues, is simply slow.
Once, I reimplemented in Elixir a toy data science tool I had previously built in node. Idiomatic node, idiomatic Elixir, both written for readability. The Elixir was approximately 100 times slower than the node version.
Now Erlang often feels fast, because of the architectures it allows, but when you get down to shuffling bytes around or doing low level math it is currently slow, slow, slow.
Given Wallaroo's speed goals, I would have been really surprised had they used Erlang.
I too would love to hear a long-form answer to this. Erlang seems like a good fit for this problem, besides maybe packaging up the client in an easily usable fashion.
Have you ever made a decision to not use Pony (or whatever the new sexy tech)?
I've read a few blog posts now and the end result always seems to be Pony.
You even talk up how great the C and Java client libraries are. Well they can't be as great as you say or you would have used them.
The C library seems to perform better, be more featureful, and be better tested. So once again, why? It certainly can't be because your library is going to top the C library in any way. The article even seems to imply you'd be happy being at parity with the C lib.
Regarding the C and Scala/Java client libraries: they are great for what they do and how they do it. However, that doesn't mean they're ideal in every scenario. For example, the Scala/Java client is the most feature-rich client and is actively developed in sync with the Kafka brokers. This, however, doesn't make it suitable for embedding in other languages. As a result, the C client was created by the community and is now officially supported by Confluent. That doesn't in any way take away from the quality of the Scala/Java client though.
Also, while the C client is more featureful and better tested, there is still the concern regarding the thread pools internal to Pony and librdkafka. We've seen firsthand how CPU cache invalidation can impact performance, so we are very aware of the potential negatives if the Pony and librdkafka threads ever end up fighting with each other over the same CPU resources, and we would prefer to avoid that.
Yes, Pony Kafka is currently slower than the C client. But it is also almost completely untuned as of right now. We expect there is a lot of low-hanging fruit on that front that will give us significant gains. Yes, we mention in the blog post that we would be happy being at parity with the C client, but our goal has always been to exceed it eventually, both in terms of performance and features.
I'm coming at this as somebody who often has to rewrite a lot of library code because of performance issues and poor decisions by library writers, often regarding things like garbage collection and hidden resources, such as a thread pool or an event loop, that cannot be hooked into. I see it all the time. I can no longer count the number of times I've had to rewrite parts of the JDK or networking libraries because of these issues.
Now, this is what I'm hearing from what you are saying:
> 1- We can't use a JVM implementation because we aren't using a JVM language.
Makes sense.
> 2- The C library is okay, but hides its thread pool with no way to access it.
Ugh. Hate that. It's like the people writing these libraries have never had to use them in a real project. The sign of a mediocre library.
With Pony's actor model, you might have to rewrite almost any library you use when concurrency is involved.
But I think you answered your own question in the title:
> Why we wrote our Kafka Client in Pony
1- Because the C library is mediocre and hides its threads from users, making it not very useful for high-performance applications.
2- Because the rest of the system is in Pony. Really, you could write it in C/C++ or even Rust as long as you wrote it in a way that played well with Pony's concurrency model, but why bother with that extra effort, especially if you believe - as you seem to - that Pony's concurrency story is superior.
Once we made the decision to use Pony for Wallaroo, that drove a lot of our other choices. The Java and C client libraries are excellent. We had architectural concerns about how the thread pool in the clients would interact with our scheduler threads.
We get a very large performance improvement from having a single scheduler thread for each CPU. Adding another thread pool that competes for CPU usage would be problematic.
Our client is for those high-performance use cases where, if we can get parity or close to parity with the C client, we should get much better performance overall because of those architectural concerns.
That said, we plan on providing a way for folks who are less concerned with performance to use the C client library.
In the end, it was less about "use Pony" and more about "do this in a way that matches with Wallaroo's architecture".
Sure, but why should it be accomplished without Pony? Languages are optimized for use-cases. This means that some languages are good and some are worse at handling particular use-cases. If Pony is the best choice for their use-case, why would you not choose it? Taking all the risks of a new tech into account, of course.
If no one takes the plunge, how do languages and technologies ever get proven? A startup without bureaucracy, institutional legacy, and technical debt seems like a good place to do it.
I don't know about "hipster" unless you're using it as a synecdoche for "trend following."
There is definitely a predilection in certain parts of the coder community to prefer newness and difference over the tried and true. There isn't anything wrong with that necessarily: it's part of how progress is made. However, I think it's often taken to extremes in the coder community.