| HN Mirror



	by codeflo 1115 days ago
	I haven't heard of either of those companies. I don't even fully understand what Databricks does. But it's clear that they have no problem shutting down a production database offering with 30 days notice, and have the gall to title this action "Investing in the Developer Experience". If this doesn't send a message that you shouldn't trust them with anything important, I don't know what would.

7 comments

qsort 1114 days ago

> what Databricks does

It's an ancient African word that means "I am because I can't install Apache Spark".


fsociety 1114 days ago

Just install Apache Spark they said. It will be fun they said.

If you have the money, having a managed Spark instance with a bunch of added features can be a big win for some. There is a lot that goes into Spark maintenance.


nerdponx 1114 days ago

It also apparently includes some performance optimizations because they control both the hardware and software. Delta Lake is pretty cool, and so is the hosted MLflow integration.


sagarm 1114 days ago

Databricks built a proprietary vectorized accelerator for Spark they call Photon. It's not just that they've tuned OSS Spark especially well.


RBerenguel 1114 days ago

Back when I was a customer (before Photon was released, and also during), they had very good tuning, on the order of 2x faster for the workloads we had at the time (a very large graph computation and a “simple” filtering job).


rovr138 1115 days ago

Databricks is a company founded by the people who built Spark.

They've extended it, and their platform does a lot now.


andruby 1114 days ago

What is Spark?

I assume that’s Apache Spark, which is described as a “unified analytics engine for large-scale data processing”

Still not clear to me what to use it for :-/


rovr138 1114 days ago

It is Apache Spark. It's a framework that allows processing large amounts of data in parallel on a cluster of computers.
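To make "in parallel on a cluster" a bit more concrete, here is a rough single-machine sketch of the idea: split the data into partitions, process each partition independently, then merge the partial results. This is plain Python, not Spark; threads stand in for cluster machines, and the function names (`split_into_partitions`, `count_words`, `parallel_word_count`) are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Work done independently on one partition (the 'map' side)."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def parallel_word_count(lines, num_partitions=4):
    partitions = split_into_partitions(lines, num_partitions)
    # Threads stand in for the machines of a cluster in this sketch.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Merge the per-partition results (the 'reduce' side).
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = [
    "spark runs on a cluster",
    "a cluster of computers",
    "spark splits the data",
]
result = parallel_word_count(lines, num_partitions=2)
```

A real cluster framework also has to handle scheduling, data movement between machines, and failures, which this sketch ignores entirely.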

You can do batch processing, streaming, machine learning and graph jobs. You usually write your code in Scala, Java, Python or R. Spark itself is written in Scala and runs on the JVM, so DataFrame operations from any of these languages end up as the same kind of execution plan there. For example, in Python you'd use PySpark, and your DataFrame calls get handed over to the JVM side, which is where they are actually executed.

I mainly work in Python, so I'm going to talk about some features there. It supports dataframes and exposes the data as Spark DataFrames. You chain operations, and those gradually build up a DAG. It's not until you execute, save, or ask to see the data that it actually starts running the DAG, after optimizing it down to what it needs.
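A toy sketch of that "build up the plan, run it only when asked" behaviour, in plain Python. `LazyTable` is a made-up class for illustration and is not Spark's API; unlike Spark, it does no optimization and simply replays the recorded steps in order.

```python
class LazyTable:
    """Toy stand-in for a lazily evaluated DataFrame (not Spark's API)."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # recorded steps; nothing has been executed yet

    def filter(self, predicate):
        # Record the step and return a new table; no rows are touched here.
        return LazyTable(self._rows, self._plan + (("filter", predicate),))

    def select(self, *columns):
        return LazyTable(self._rows, self._plan + (("select", columns),))

    def collect(self):
        """The 'action': only now does the recorded plan actually run."""
        rows = self._rows
        for op, arg in self._plan:
            if op == "filter":
                rows = [row for row in rows if arg(row)]
            else:  # "select"
                rows = [{col: row[col] for col in arg} for row in rows]
        return rows


calls = []  # tracks when the predicate really gets evaluated


def is_adult(row):
    calls.append(row["name"])
    return row["age"] >= 18


people = [
    {"name": "ann", "age": 34, "city": "Oslo"},
    {"name": "bob", "age": 12, "city": "Rome"},
]

query = LazyTable(people).filter(is_adult).select("name")
calls_before_action = len(calls)  # the predicate has not run yet
result = query.collect()          # now the whole plan executes
```

The point of deferring the work is that the engine sees the whole chain before running anything, which is what gives it room to optimize.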

If you need something that Spark doesn't support, you can use regular Python, but because it won't get converted into Spark operations, it'll run on only one node and be limited. So you have to rewrite your code to optimize for Spark.

You can process data in memory, on disk, or in databases, either as sources or as targets.

A typical use case: load the raw data as it comes in, transform it into your intermediate states, then write out different tables depending on what each one is needed for.

---

It's a framework that has an engine to manage code running on clusters, a language to interact with the data, abstractions and optimizations of the code, ways to store the data, checkpoints for optimizations, and other things.


kccqzy 1114 days ago

Wow, you are right. The blog post doesn't even mention it, but the home page https://bit.io/ does.


lucideer 1114 days ago

Slight oversimplification, but Apache Spark is basically the "open core" of Databricks' commercial platform.


debarshri 1114 days ago

It was probably an acqui-hire. If the product had been growing at a VC-investable rate, they wouldn't have sunset it. Alternatively, maybe they are going to rebrand it into something that aligns with Databricks.


relativ575 1114 days ago

> But it's clear that they have no problem shutting down a production database offering with 30 days notice

Maybe there are no production DBs left from paying customers?


codeflo 1114 days ago

The homepage suggests otherwise, but who knows: https://bit.io/


re-thc 1114 days ago

> I don't even fully understand what Databricks does

The naming is really confusing. When I brick my console it's broken. I'm not sure I want to brick my data :(
