Hacker News
by syats 1852 days ago
After 5 minutes of reading through their website, I still don't understand what their actual product is. I like to think of myself as someone who is computer-savvy, working at a software company, designing systems that access several databases from several clients, using Spark, AWS, the works... and still, their website makes no fkn sense! Could someone translate, please?

Oh, but look at the number of job openings these guys have across the world. Is this another marketing ploy? I am deeply bothered by these kinds of "products".

2 comments

I am absolutely no expert in any of these domains but the level of confusion described in these comments seems a little exaggerated. Is it so hard to see what is going on here?

A data lake is a centralized repository where all of a company's data is aggregated. This allows analysts to run queries against a single data source (often masquerading as a SQL database) rather than against 100s of distinct databases (which may be a hodgepodge of NoSQL, SQL, custom REST APIs, etc.). These "data lakes" often grow to a massive size, since they include not only your application data (usually batch-replicated from prod databases on some schedule, or in some cases streamed directly) but also data from external sources (e.g. a feed from your payment processor, compressed events from your app/website analytics, server logs, marketing and advertising sources).

Storing and processing that volume of data efficiently is a difficult task. Many companies decide to just dump that data in a raw format into cloud storage services like AWS S3. Third parties then built SQL-like interfaces that run on top of S3 (or connectors from S3 into other familiar tools like Spark). This keeps storage cheap while letting data analysts use tools they are already very familiar with. This way of handling large volumes of data stored for analysis has become very popular.
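To make the "SQL on top of raw dumps" idea concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the `RAW_FILES` dict stands in for files sitting in object storage, the table and column names are made up, and the data is copied into an in-memory SQLite database, whereas real engines typically query the files in place.

```python
import json
import sqlite3

# Hypothetical raw JSON-lines dumps, standing in for files dropped into object storage.
RAW_FILES = {
    "app_orders": '{"order_id": 1, "customer": "ann", "amount": 30}\n'
                  '{"order_id": 2, "customer": "bob", "amount": 12}',
    "payments": '{"order_id": 1, "status": "paid"}\n'
                '{"order_id": 2, "status": "refunded"}',
}

def load_lake(raw_files):
    """Expose each raw dump as a table in a single in-memory SQL database."""
    conn = sqlite3.connect(":memory:")
    for table, text in raw_files.items():
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
        columns = list(rows[0].keys())
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO {table} VALUES ({placeholders})",
            [tuple(row[c] for c in columns) for row in rows],
        )
    return conn

conn = load_lake(RAW_FILES)

# One query across two sources that originally lived in separate systems.
paid_totals = conn.execute(
    "SELECT o.customer, o.amount FROM app_orders o "
    "JOIN payments p ON p.order_id = o.order_id "
    "WHERE p.status = 'paid'"
).fetchall()
print(paid_totals)
```

The point is only that the analyst writes ordinary SQL and never has to care that the orders came from a prod database and the payment statuses came from an external feed.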

But now that you have so much data stored in S3, you might start to wonder how you can control access to it. An analyst running queries on website performance might not need access to the payment processing data. Your security team might point out that your growing analyst team has more access to sensitive company data than is required. As you negotiate big corporate deals, the customer's security team might start to red-flag unnecessary access to data (or ask you for your policies governing access to that data and how those policies are enforced).

This product seems to allow finer control over access to data stored in these kinds of data lakes. In the same way a bunch of tools appeared to create a SQL-like facade on top of the data, this tool creates a facade on top of data access control.
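I have no idea how the product actually implements this, but the general shape of an access-control facade can be sketched in a few lines of Python. The `POLICY` table, the role names, and the `guarded_select` helper are all hypothetical, and SQLite again stands in for the lake.

```python
import sqlite3

# Hypothetical policy: role -> table -> columns that role may read.
POLICY = {
    "web_analyst": {"page_views": {"url", "load_ms"}},
    "finance": {
        "page_views": {"url", "load_ms"},
        "payments": {"order_id", "amount"},
    },
}

class AccessDenied(Exception):
    """Raised when a role asks for data its policy does not cover."""

def guarded_select(conn, role, table, columns):
    """Run SELECT <columns> FROM <table> only if the policy covers every column."""
    allowed = POLICY.get(role, {}).get(table, set())
    blocked = [c for c in columns if c not in allowed]
    if blocked or not columns:
        raise AccessDenied(f"{role} may not read {table}: {blocked}")
    # Table and column names were checked against the policy above,
    # so interpolating them into the statement is safe in this sketch.
    return conn.execute(f"SELECT {', '.join(columns)} FROM {table}").fetchall()

# Stand-in for the lake: two tables from very different sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (url, load_ms)")
conn.execute("CREATE TABLE payments (order_id, amount)")
conn.execute("INSERT INTO page_views VALUES ('/home', 120)")
conn.execute("INSERT INTO payments VALUES (1, 30)")

print(guarded_select(conn, "web_analyst", "page_views", ["url", "load_ms"]))
try:
    guarded_select(conn, "web_analyst", "payments", ["amount"])
except AccessDenied as err:
    print("denied:", err)
```

The analysts keep querying one place, and the policy lives in one place too, which is also what you would hand to a customer's security team when they ask how access is enforced.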

Not only is what they are doing completely understandable from a quick skim of the article, it also seems totally necessary. I have no doubt this is a massive market and this product has every chance to serve a real need.

I think the confusion lets them sell to people who do not know what they are doing.