Hacker News
by royjacobs 2443 days ago
Reading this article it seems like yet another example of "you don't have big data". Most of the features that are unique to Spark (or Spark-like setups) were not needed, so in the end it's mostly...just an app talking to Postgres?

I'm not sure, but reading other articles[0] on the blog seems like they've been jumping on bandwagons before, so it's probably good to come back on those decisions every now and again.

Edit: Not trying to come off as too snarky, though I've found that this type of thing is pretty common in startups where everyone from the CTO on down has some experience but not a lot of it. I've fallen into that trap too, at some point saying "Sure, Scala will work great! It's future-proof and everyone will love it!" (Cue crickets.)

[0] https://tech.channable.com/posts/2017-02-24-how-we-secretly-...

> Reading this article it seems like yet another example of "you don't have big data".

Yes definitely. This thing was a big exercise in "You aren't going to need it".

> so in the end it's mostly...just an app talking to Postgres?

Yes. We solve problems for our customers. It's nice if we can do that without also creating problems for ourselves :)

> seems like they've been jumping on bandwagons before

Just wanted to point out that this system is also written in Haskell. We didn't really switch bandwagons.

> Just wanted to point out that this system is also written in Haskell. We didn't really switch bandwagons.

Fair enough! In the end it's great that you've been able to take a step back and look at your ultimate goal (i.e. solving the customer's problem). That doesn't happen often enough, and it's a good sign for your IT teams :) It's also good that you were given the time to actually do this.

There's that, but Spark is also a little bit tricky because it has such a big feature set that it's not just attractive as a big data tool. On paper, it can be attractive as a tool for easy single-node data parallelism, for easy streaming data processing, for easy machine learning on the Java platform, stuff like that. I'm looking at migrating off of Spark, too, and finding that Spark is still the only way to get a decently ergonomic (for developers) data table library for a JVM language.

So there's all of that stuff, and then you think, "Oh, and it gives us an easy scaling path if we ever find our data volumes growing at an unexpectedly rapid clip." And at first you think it's just a cherry on top.

It may seem too good to be true at first, but with everyone using it, and with most of the alternatives looking like they'll involve at least as much dev effort during the initial analysis, it's pretty easy to miss that.

And it's only a while later, when you're already pretty heavily invested in the ecosystem, that you start to really understand some things that the bloggers and book authors don't talk about in public. Like how Spark makes Big Data and scale-out a self-fulfilling prophecy. You will need to scale out, even if your data should fit in memory, because Spark wastes memory like a 22-year-old football player wastes money.

And maybe you're an appropriately cynical jerk, and can therefore spot from a mile away that it couldn't possibly live up to the hype. Congratulations. Hopefully you're the technical lead. Even if you are, though, too bad, because nobody else in the meeting room is as jaded about tech as you are. Certainly not the management folks who don't program or have been out of the game for years. And, since you are cynical and jaded, that means there's still a good chance you'll go with Spark anyway. Because that's a hard argument to win: unless you've already been burned by Spark in the past, you aren't going to have enough intimate knowledge to make an argument that's concrete enough to sound convincing. And because this isn't the hill you want to die on. And because, being appropriately cynical, you realize that, at the end of the day, it's not really your problem. Spark will get the job done. It'll be inefficient, sure, but the extra server and development costs aren't coming out of your pocket, they're coming out of the pocket of the person who wants to be using Spark.

Also, if you collaborate with data scientists and don't have the resources to use fully distinct systems for the production and research/reporting functions, then Spark is very nearly the only game in town.

Curious what you recommend instead of Spark.

Almost 100% of the time when somebody says “curious what you recommend instead of _____” the best answer is “write a program”, but for some reason programmers aren’t supposed to do that any more.

That's a question that has no single answer. Spark is a giant omnibus project with all sorts of bells and whistles, and the viable alternatives are going to vary wildly, depending on which subset of them you actually need/use.

If you want a single, simple answer that's easy to sell to management, I'd refer you back to the last paragraph of my original post.

I'm not a user myself, but just from watching the chatter in the world it seems Dask and RAPIDS are quickly overtaking Spark in mindshare, or at least enthusiasm, and that Spark is following Hadoop/MapReduce to the Big Data Graveyard.

I think it is, but it's a worthwhile point to reiterate. It's also worth pointing out that it can be worth doing a rewrite to a less scalable architecture if it simplifies the architecture and provides more flexibility.

The company I work for is currently undergoing a similar rewrite from Firebase to Postgres (although we're doing a gradual migration), and it's amazing how much code we have been able to throw away, and how much quicker we can move in the parts of the code that have been migrated.

Trying Haskell if you happen to land a Haskell expert on your team sounds entirely reasonable. It's an old language by now and even used in the conservative FAANG companies. And if you're stuck on the JVM, so does checking out the competition to Java. Lots of people are happy with Clojure or Scala and don't look back.

> Not trying to come off as too snarky

I had a similar thought. Postgres is a great choice, but then they also went with Haskell ... I look forward to another blog post in 2-3 years detailing all the ways that Haskell failed them and that at the end of the day they should have just gone with an industry standard language.

https://tech.channable.com/posts/2017-02-24-how-we-secretly-...

FWIW they have this post from almost 3 years ago about adding Haskell to their stack. I'm guessing the time for your prediction has come and gone.

On the other hand, I find it quite encouraging that Haskell was barely mentioned. It seems they viewed the risk as "changing the project's language" rather than "using a non-industry-standard language".

> "Sure, Scala will work great! It's future-proof and everyone will love it!"

Scala is still the least-bad option for a JVM language.

Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.

> Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.

It’s funny that you gloss over one of the big productivity issues with Scala. Somehow everyone who joins your team should be acutely aware of your flavor of Scala.

If the community decided whether it wanted to be a better Java or a worse Haskell, then I bet more people would be productive in Scala.

It's not "my" flavor of Scala, it's what it's designed for. It's a pragmatic, ML family language.

Read Odersky's book, write code, still have access to all JVM libraries and a lot less brain damage.

There are people that try to write worse Haskell in everything from Perl to Kotlin too. The reputation is overblown and has little to do with the actual language.

They also gloss over the huge productivity killers that are sbt and scalac. Compile times are almost as bad as C++ where I work. sbt will say "Done compiling." and then hang for 15-20 minutes.

I would argue that in a sufficiently large organization you're better off using plain Java or perhaps Kotlin. The latter hits the sweet spot between 'expressive' and 'unreadable' whereas Scala can miss the mark.

I'm sure if you are very disciplined when writing Scala then this will not happen, etc etc. But the fact that people need to decide upfront which parts of Scala to use and which ones to avoid seems like a red flag to me (and in fact was a red flag, in my experience).

With Java getting value types, record types, and pattern matching, it will end up giving all the other JVM languages a run for their money. Today, Kotlin is much more approachable than Scala, not to mention its better tooling. Once Java catches up, though, it will be a different story.

Lots of good things coming for the JDK. I'm really looking forward to Project Loom (fibers) landing.

I expect Kotlin to fall off everywhere except Android once Loom and the rest of Project Amber land.

When are higher-kinded types roadmapped for Java?

I'm not aware that they are. It's a nice feature, but it might introduce unnecessary indirection if used inappropriately, and the vast majority of languages get by without it without any significant hindrance.
Yeah, the team is productive. But only in the 20% left for real work between the interminable arguing over monads or circe or play or sbt or whatever.

Doing real work in Scala for a decade. Never argued about any of that.

> Scala is still the least-bad option for a JVM language.

Source, please?

> Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.

Why is that?