| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gigatexal 637 days ago
	So from one expensive vendor to another? Your boss seems smart. ;-) What’s the rationale? What do you gain?

4 comments

systems 637 days ago

Well, from one VERY expensive vendor, to another considerably less expensive vendor

Also, MSSQL have few things going for it, and surprisingly no one seem to be even trying to catch up

    - Their BI Stacks (PowerBI, SSAS)
    - Their Database Development (SDK) ( https://learn.microsoft.com/en-us/sql/tools/sql-database-projects/sql-database-projects?view=sql-server-ver16 )

The MSSQL BI stack is unmatched , SSAS is the top star of BI cubes and the second option is not even close

SSRS is ok, SSIS is passable , but still both are very decent

PowerBI and family is also the best option for Mid to large (not FAANG large, but just normal large) companies

And finally the GEM that is database projects, you can program your DB changes declaratively, there is nothing like this in the market and again, no one is even trying

The easiest platformt todo evolutionary DB development is MS SQL

I really wish someone will implement DB Projects (dacpac) for Postgresql

link

gigatexal 637 days ago

What do you love so about PowerBI? I’ve not looked at it very closely. I’ve worked with Tableau and Looker and LookerPro. All of which seemed fine.

Is it a home run if the end users aren’t on Windows or using Excel? I’m thinking about the feature where you can use a SQLServer DB as a data source to Excel.

link

bradford 637 days ago

> What do you love so about PowerBI?

For a large portion of my career, the dashboarding solutions I've worked with have followed a similiar model: they provide a presentation layer directly on top of a query of some kind (usually, but not always, a SQL query). This seems like a natural next step for organizations that have a lot of data in one spot, but no obvious way of visualizing it.

But, after implementing and maintaining dashboards/reports constructed in this way, big problems start to arise. The core of the problem is that each dashboard/report/visual is tightly coupled to the datasource that's backing it. This tight coupling leads to many problems which I won't enumerate.

Power BI is great because it can provide an abstraction layer (a semantic model) on top of the many various data sources that might be pushed into a report. You're free to combine data from Excel with msSql or random Json, and it all plays together nicely in the same report. You can do data cleaning in the import stage, and the dimension/fact-tables pattern has been able to solve the wide variety of challenges that I've thrown at it.

All this means that the PowerBI reports I've made have been far more adaptable to changes than the other tightly coupled solutions I've used. (I haven't used Tableau, but my understanding is that it provides similar modeling concepts to decouple the data input from the data origin. I'm not at all familiar with Looker).

[disclaimer, I'm a Microsoft Employee (but I don't work in Power BI)]

link

jaakl 636 days ago

The problem of semantic models what I've seen in tools like Looker, Tableau and Qlik (very probably same for PowerBI) is that they are tightly coupled to the tool itself, work within them only. Now you want "modern data system" then you want them decoupled and implemented with an open semantic model which is then accessable by data consumers in Google spreadsheets, Jupyter notebooks and whatever BI/Analytics/reporting tools your stakeholder uses or prefers. There are very new solutions for this like dbt semantic models; their only issue is that they tend to be so fresh that bigger orgs (where they do make most sense) may be shy on implementing them yet. To the original topic - not sure how much PG17 can be used in these stacks, usually much better are analytical databases - BigQuery, Snowflake, maybe Redshift, future (Mother)Duck(db)

link

greggyb 636 days ago

The semantic model in Power BI is not tightly coupled to the tool. It is an SSAS Tabular model. It is pretty trivial to migrate a Power BI model to Analysis Services (Microsoft's server component for semantic models, hostable on-prem or as a cloud offering).

Both Power BI and Analysis Services are is accessible via XMLA. XMLA is an old standard, like SOAP old, much older than dbt.

XMLA provides a standard interface to OLAP data and has been adopted by other vendors in the OLAP space, including SAP and SAS as founding members. Mondrian stands out in my mind as an open source tool which also allows clients to connect via XMLA.

From what I can see, dbt only supports a handful of clients and has a home-grown API. While you may argue that dbt's API is more modern and easier to write a client for (and I'd probably agree with you! XMLA is a protocol, not a REST API), the point of a standard is that clients do not have to implement support for individual tools.

And of course, if you want API-based access there is a single API for a hosted Power BI semantic model to execute arbitrary queries (not XMLA), though its rate limits leave something to be desired: https://learn.microsoft.com/en-us/rest/api/power-bi/datasets...

Note the limit on "one table" there means one resultset. The resultset can be derived from a single arbitrary query that includes data from multiple tables in the model, and presented as a single tabular result.

Note: XMLA is an old standard, and so many libraries implementing support are old. It never took off like JDBC or ODBC did. I'm not trying to oversell it. You'd probably have to implement a fair bit of support yourself if you wanted a client tool to use such a library. Nevertheless it is a protocol that offers a uniform mechanism for accessing dimensional, OLAP data from multiple semantic model providers.

With regard to using PG17, the Tabular model (as mentioned above, shared across Power BI and Analysis Services) can operate in a caching/import mode or in a passthrough mode (aka DirectQuery).

When importing, it supports a huge array of sources, including relational databases (anything with an ODBC driver), file sources, and APIs. In addition to HTTP methods, Microsoft also has a huge library of pre-built connectors that wrap APIs from lots of SaaS products, such that users do not even need to know what an API is: these connectors prompt for credentials and allow users to see whatever data is exposed by the SaaS that they have permissions to. Import supports Postgres

When using DirectQuery, no data is persisted by the semantic model, instead queries are generated on the fly and passed to the backing data store. This can be configured with SSO to allow the database security roles to control what data individual users can see, or it can be configured with security roles at the semantic model layer (such that end users need no specific DB permissions). DirectQuery supports Postgres.

With regard to security, the Tabular model supports static and dynamic low-level and object-level security. Objects may be tables, columns, or individual measures. This is supported for both import and DirectQuery models.

With regard to aggregation, dbt seems to offer sum, min, max, distinct count, median, average, percentile, and boolean sum. Or you can embed a snippet of SQL that must be specific to the source you're connecting to.

The Tabular model offers a full query language, DAX, that was designed from the ground up for expressing business logic in analytical queries and large aggregations. Again, you may argue that another query language is a bad idea, and I'm sympathetic to that: I always advise people to write as little DAX as possible and to avoid complex logic if they can. Nevertheless, it allows a uniform interface to data regardless of source, and it allows much more expressivity than dbt, from what I can see. I'll also note that it seems to have hit a sweet spot, based on the rapid growth of Power BI and the huge number of people who are not programmers by trade writing DAX to achieve business goals.

There are plenty of pain points to the Tabular model as well. I do not intend to paint a rosy picture, but I have strived to address the claims made in a way that makes a case for why I disagree with the characterization of the model being tightly coupled to the reporting layer, the model being closed, and the model being limited.

Side note: the tight coupling in Power BI goes the other way. The viz layer can only interact with a Tabular model. The model is so broad because the viz tool is so constrained.

link

jaakl 636 days ago

Thanks for the writeup. There are indeed use cases, especially in the MS multiverse. Proof of the none->basic->complex “can do everything” (soap,xml,rpc)->radically simpler “do what really matters” (rest, json, markdown) path. I’m not really sure if dbt semantic layer is the final open “standard” for the future analytical models and metrics, it has own questionmarks, it is literally just a transformer with metrics as addon and there are just initial implementations, but today I’d rather give that thing a try. Simpler is so much better

link

bigger_cheese 636 days ago

My Org has been heavily pushing us to use Power BI, I have found it has a lot of pain points.

I expect how useful it is depends a lot on your use cases. If you want to throw a couple of KPI's on a dashboard it fulfils that requirement pretty well but if you want to do any analytics (beyond basic aggregations like, min, max and median) or slightly complicated trending (like multiple Y axis) Power BI is painfully complicated.

link

systems 636 days ago

PowerBI might be the only BI solution with a decent functional language

DAX is supper powerful, if you design your datawarehouse as a proper stat scheme, there is nothing you cannot calculate using DAX

Nothing in the market is even close, most BI tool have magic expression language, DAX is a proper language

link

bigger_cheese 636 days ago

I'm sure it can work but I've found it hard to slot into our use cases.

Power BI's trending is one thing I have really struggled with the limitations it has.

For one of my use cases (industrial plant data) we often want to trend timeseries data on a common xaxis with uniquely scaled yaxis.

i.e Temperature, Pressure, Flow Rate, Fill height etc all have differently scaled yaxis but common xaxis.

Something like this example https://imgur.com/a/zd5Giom

Power BI limits you to a single primary and a single secondary Y axis only (i.e limit of two scales).

link

gigatexal 637 days ago

I got my start in a SQLserver all MS shop.

I hated SSIS. I wished I just had something akin to Python and dataframes to just parse things in a data pipeline. Instead I had some graphical tool whose error messages and deployments left a lot to be desired.

But SQLServer and Microsoft in general make for good platforms for companies: they’re ubiquitous enough that you won’t find it hard to find engineers to work on your stack, there’s support, etc

link

systems 637 days ago

The key feature of SSIS, is parallel dataflow

You can so easily write (and schedule) parallel dataflows in SSIS, to do the same code using a general purpose programming language would be a lot harder

Also remember that dataflows are data pipe streams, so SSIS can be very very fast

Anyway, there is BIML, which allow you to create SSIS package by writing XML, I personally never used it, mainly because its licensing situation seemed weird to me ( i think BIML is free, but the tool support is not, and MS SSDT doesnt support BIML coding i thinkg)

link

gigatexal 637 days ago

Yeah I think I never gave it a fair shake. I think — like most things — if understood and used properly it can be amazing.

link

wredue 637 days ago

SSIS is for integrations, and pandas is definitely not. I’m not sure what you’re trying to do with SSIS that you’re also doing with pandas, but it’s probably wrong. SSIS is far more geared to data warehousing integrations, while pandas would be reading a data warehouse and doing stuff with it. SSIS isn’t really meant for processing incoming data, even if you can kind of hack it together to do that.

I will say that when we want “real time” integrations, SSIS is phenomenally bad. But that’s not entirely unexpected for what it is.

link

gigatexal 637 days ago

We don't need to be so pedantic. Python -- as it often is -- would be the glue, it would be the connecting part, and pandas (polars, duckdb, anything really) would be the processing part. Then once processed the outputs would be placed somewhere be it an update to a db table or some other downstream thing.

link

wredue 636 days ago

I was just saying you should most likely not be doing data processing with SSIS. That’s not what it’s for even if it can be cobbled in to doing some.

link

greggyb 637 days ago

Power BI embeds a quite good columnstore database engine. It is exactly the same database engine as in the SQL Server Analysis Services Tabular model.

For the types of queries typical in BI (primarily aggregations that involve lots of table scans), this yields great performance. Non-technically savvy users can import millions of rows of data and explore this with interactive visuals. The performance characteristics of the database engine allow moderately complex logic to remain high-performance into 10s or 100s of millions of records. This is on laptop-spec hardware. 100Ms of records will need enough supporting RAM, but the data is highly compressed, so this fits.

The performance scaling allows simple aggregates of single-digit-billions-of-records to remain fast: usually just a couple of seconds. Performance tuning for more complex logic or larger datasets becomes complex.

When I talk about the performance characteristics above, I mean for the in-memory database engine. This performance comes out of the box. There are no indices to create or typical database administration. Users set up an import of data and define relationships among tables, and define aggregation logic. The query language is DAX, a functional, relational language built with the goal of making business logic and aggregations simple to write.

For simple use cases, users do not even need to write their logic in DAX. The viz layer automatically generated basic aggregates and provides a sophisticated filtering and interactive viz canvas experience.

There is also an ETL layer, called Power Query. This has very different performance characteristics, and in general you should try to minimize your work in this tool and its M language, as it has very different semantics. The sweet spot is intra-row logic and basic filtering of data that will be imported into the columnstore database I mentioned above.

The same underpinning technology, VertiPaq, supports columnar compression and storage/low level query in other Microsoft products.

Columnstore indices in SQL Server also use VertiPaq for compression and storage engine.

The data storage layer in Microsoft Fabric utilizes parquet files in delta tables (same tech as Databricks). Part of the VertiPaq engine is a set of compression optimization heuristics that are quite good. They apply this optimization to get 30%-50% compression improvements in the parquet files they generate. They call this feature V-order (v for VertiPaq).

The standard compression algorithms in Parquet include dictionary and run-length encoding. The same are used (though obviously with different implementations) in VertiPaq's native storage format. The compression optimization heuristics include some for identifying superior sort orders to help maximize the effect of these styles of compression. V-order is the application of these heuristics to the data before applying compression for the parquet files.

To be 100% clear, V-order is a pre-processing step that yields vanilla parquet files, not any sort of extension to the parquet format. V-order is applied automatically in the storage layer for Microsoft Fabric, called OneLake.

You may come across documentation for xVelocity; this is an abandoned name for the same technology, VertiPaq.

I can talk about this literally all day. If you've got specific followups, feel free to ask here or email me at (any name) at (my username) dot com.

link

greggyb 637 days ago

Follow up regarding the platform question:

Power BI is a desktop application for Windows and also a SaaS solution for hosting the solutions you build in the desktop tool.

The SaaS product offering is confusing for the time being due to the introduction of Fabric, which includes Power BI and a whole data stack besides.

Your Power BI power users will need a Windows platform available, though I have seen plenty of use of Parallels or a remote desktop infrastructure for users not using a Windows device.

There are authoring experiences in-browser that are improving and in some cases surpassing what is in Power BI Desktop, but if you're doing a lot of work authoring in Power BI, then the desktop application is where you want to be for now.

For pure querying or just data model development, there are third party tools that are superior to Power BI Desktop (disclosure: I do work part time with one of these third party vendors).

link

kragen 637 days ago

does 'bi' mean 'olap'?

because the literal expansion 'business intelligence' (or the explanation in the Wikipedia article) is hard to interpret as something that makes sense in contexts like this where you're apparently talking about features of software rather than social practices. the reference to 'bi cubes' makes me think that maybe it means 'olap'?

link

marcosdumay 637 days ago

It basically means "report editors".

And no, MS ones aren't miles ahead of the competition. But are bundled with Office, so the competition is mostly going out of business by now.

link

greggyb 637 days ago

Except Microsoft's report editor, Power BI, includes a powerful columnstore database that can handle importing and rapidly aggregating 100Ms or even 1Bs of records with query response times of just a couple seconds.

That has been the differentiator. Power BI has lots going for and against it, but its database is miles ahead of competitors' report tools.

Edit: clarification: importing large volumes of data takes more than a few seconds. The query performance of the in-memory database after import is complete is what I referred to above in my original comment.

link

systems 637 days ago

DSS became BI became AI

At some point in time, Decision Support Systems became Business Intelligence and nowadays this is switching ti AI

BI (formerly DSS) is a set of tools that enable Business Analytics , OLAP and technologies that implements OLAP (Cubes and Columnar databases) enables BI

link

gigatexal 637 days ago

I think so but it’s overloaded and basically used for reporting and analysis stuff.

link

setr 637 days ago

bi tooling is usually report/analysis builders, intended to be consumed by.. business users. More importantly, they're usually also meant to be used by business users to build their own reports in the first place -- although in practice it often ends up going back to IT to maintain/update anyways. Basically anything that competes with excel pivot tables is a bi tool.

OLAP/cubes usually underpins bi tooling, since the usecase is almost entirely across-the-board aggregations.

link

Guillaume86 637 days ago

I fully agree on the sql project type, it's amazing, another great side benefit is intellisense on your db schema (find all references, etc), git blame on the schema is also great. Doesn't redgate offer kinda the same tool commercially with support for more than mssql? Never tried it but IIRC it looked similar...

link

tianzhou 632 days ago

>> I really wish someone will implement DB Projects (dacpac) for Postgresql

Bytebase did https://www.bytebase.com/docs/change-database/synchronize-sc...

link

pphysch 637 days ago

> And finally the GEM that is database projects, you can program your DB changes declaratively, there is nothing like this in the market and again, no one is even trying

Automatic migration tools have essentially been doing this for a while (e.g. Django migrations). But I agree it would be nice if Postgres had better support built in.

link

koolba 637 days ago

Database migrations to arrive at a target state are an interesting beast. It’s not just the destination that matters, it’s how you get there.

The naive approach of applying whatever DDL you need to turn database schema A into B is not necessarily the one anybody would actually want to run, and in larger deployments it’s most definitely not the one you’d want to run.

Intermediate availability and data migration go well beyond simply adding tables or columns. They’re highly application specific.

The main improvements in this space are making existing operations CONCURRENT (e.g., like index creation) and minimizing overall lock times for operations that cannot (e.g., adding columns with default values).

link

pphysch 637 days ago

I agree automating migrations correctly from state A to B without any risk of data loss or downtime is essentially an intractable problem.

Still, the popular tools seem to have good heuristics and good DBAs recognize when migrations must be written by hand.

link

tianzhou 632 days ago

(Bytebase co-founder) Bytebase implements this https://www.bytebase.com/docs/change-database/synchronize-sc...

link

sroussey 637 days ago

At least both of these databases support clustered indexes. For decades.

link

manishsharan 637 days ago

MS Azure SQL on is very cheap compared to any other self hosted database including PG and MS SQL. Unless they running a super heavy workload , this solution will meet most business requirements.

But this is also a gateway drug for cloud addiction.

link

kjax 637 days ago

After scanning the Azure and AWS pricing pages and calculators for comparably speced Azure Manged SQL and Amazon RDS for PostgreSQL instances, AWS's RDS comes out at about 1/3 the price of Azure Managed SQL.

This matches my experience with deployments on AWS and Azure. Both are managed db instances, but Azure's is consistently more expensive for comparable offerings on AWS or GCP.

link

manishsharan 637 days ago

I do not have experience with AWS RDS so I cannot speak to that.

In my experience, GCP Cloud SQL for Postgres has been more expensive than MS Azure SQL. In our tests, CloudSQL also was not comparable to the resiliency offered by Azure SQL. Things like Automated DR and automated failover etc. were not at par with what Azure offered. Not to mention , Column level encryption is standard for Azure SQL.

link

AlfeG 637 days ago

Is there any analog for Azure SQL Elastic Pools for Postgres anywhere?

We pay in total something around 600 bucks to manage around 250 databases in MSSQL servers (with failover for prod databases, DTU based model)

We pay for log analytics more then we pay for Sql Servers.

Those Elastic Pools is a blocker for us on the way to migrate to Postgres from MSSQL...

link

ilkhan4 637 days ago

It's been a while since I've looked at elastic pools in Azure, but maybe Neon (https://neon.tech/) or recently Nile (https://www.thenile.dev/) might work in terms of spinning up a bunch of separate logical DBs with shared compute/storage.

link

clarkbw 637 days ago

(neon employee) and Neon will be available on Azure very soon ( https://neon.tech/blog/neon-is-coming-to-azure )

link

nikita 637 days ago

(neon ceo). We have lots of examples of this. Here is one with Retool. https://neon.tech/blog/how-retool-uses-retool-and-the-neon-a...

link

on_the_train 637 days ago

Exactly. Supposedly the paid solution ensures long term support. The most fun part is that our customers need to buy these database licenses, so it directly reduces our own pay. Say no to non-technical (or rational) managers :<

link

remram 637 days ago

Did you not pay for Oracle?

link

on_the_train 637 days ago

We pay, but what hurts is that our customers need to pay, too. For both oracle and ms of course

link

marcosdumay 637 days ago

Microsoft is way less likely to sue you and a couple of orders of magnitude cheaper than Oracle.

Besides, managing Microsoft licensing is a bliss close to Oracle's. And yeah, MSSQL is much better in almost every way than Oracle.

If you only compare those two, it's a non-brainier.

link

0cf8612b2e1e 637 days ago

A boolean column type.

link