by scapecast 2130 days ago
Co-founder of intermix.io here (which we sold in March). We came more from the performance monitoring angle (specifically for Redshift), but then shifted to a product that works horizontally across all warehouses, to track usage, workflows and user engagement. "Shift to Data Products" was the narrative we started using in Q4 2019. If you read the copy on the current intermix.io website, I think you'll find yourself nodding. (FYI - we got bought by a small PE Fund that is rolling the product into Xplenty, an ETL product).

My experience is that monitoring data quality is still an under-appreciated discipline. I've found that most teams still have a "not invented here" mentality, or don't even know they have the problem! That can lead to an "oh, we can just fix it when it happens" type of mentality. But your timing may be better than ours - we started back in 2016.

I haven't played with your product (yet), only took a look at this thread and your website. Some observations:

- SQL Editor - big plus! I think giving your users a space where they can take action is a super value-add, we didn't have that.

- nice work running the tests inside the customer's warehouse. That has two benefits for you. 1) you're not incurring the cost to crunch the metadata, it can get quite expensive, depending on the number of tables in the warehouse. 2) you're avoiding data access issues, getting access to the warehouse was always a hurdle, even though we only needed access to the system tables.

- pricing model. I think the per-seat model is the way to go. We tried charging by number of rows, and size of the warehouse (number of nodes), but then you run into weird situations with customers who are dealing with huge historic datasets, but really only look at the last 30 days of data.

My unsolicited $0.02 is that you think hard about distribution. I think you want to think about hitching your wagon to the cloud marketplaces, and Snowflake's marketplace. For example, attaching themselves to Snowflake is what made all the difference for Fivetran.

I have a bunch more scars that I can share if you care to know them :-)


Fantastic blog post, thanks for sharing.

So I guess if you had to pick arbitrary revenue/data/fte cutoffs, do you see the org chart of these adopters as you’ve described looking a certain way? Let me try to rephrase that.

Do you think there’s a step function of “here you need one DBA who is a holy librarian” and “here we need a gitlab styled data team with SLAs and the data equivalent of HR business partners who get assigned to the BU”?

Tangential to your comment but curious if you believe the human side scales akin to the infrastructure side.

Where is the blog post?

> My experience is that monitoring data quality is still an under-appreciated discipline.

We agree with this a lot, we found there are often a lot of unknown unknowns that drive data issues, and a lot of teams aren’t sure of where to start. It’s why we’re spending so much time on trying to make relevant tests in Hubble that are easy to set up and use (and then let users create custom tests once they get the hang of it).

Great point on the distribution, we do think being close to the data warehouses is really important for us, most teams already have one set up, but don’t know if what’s inside it is correct or useful. We’re looking to get set up on their marketplaces soon!

It sounds super relevant, we’d love to hear more - you can get me at hamzah[at]gethubble.io

Awesome - just followed up on your ping!

> we got bought by a small PE Fund that is rolling the product into Xplenty

I'm interested to hear more about your experience building data warehouse related products, and perhaps learnings you've had along the way. I guess selling to PE wasn't the initial goal, but I'd imagine your product is very well suited to the Redshift space.

I've been working on Snowflake-related products, and their adoption speaks to a world of new problems being created, similar to your product with Redshift. I suppose the risk is being squashed by Snowflake building the feature, or businesses migrating to something new (perhaps Redshift products have suffered because of Snowflake).

Basically, what do the battle scars look like :D

There are always things the warehouses can't build themselves.

For example, with intermix.io, it was the tracing we had built for other tools like Looker and dbt. The insight was that the result of a DAG involves many different calculations across different tables. The metadata only tells you that the steps happened, but doesn't tell you in what sequence they happened, or where the "hiccup", latency, etc. occurred.

Redshift is clearly suffering from Snowflake. I wrote about that in my post-mortem. That post also has a few battle scars:

https://medium.com/@larskamp/why-we-sold-intermix-io-to-priv...

Ping me on LinkedIn if you care to hear more :-)

Cheers, appreciate it, will ping you on LI!

FYI: Snowflake's marketplace seems to be a commercial marketplace that lets users download data sets (weather, marketing, etc.) and presumably lets people upload their own data sets

https://www.snowflake.com/data-marketplace/

I assume there is an open version that's really good but less cool