Hacker News new | ask | show | jobs
by technimad 2053 days ago
I don’t like the devaluation of the word observability. A chart like this used to be called something like ‘tools to monitor and understand postgres’. Observability is the ability to derive the internal state of a system from the outside. Usually done through traces.

It would be great if Postgres could emit a trace per query, showing in real-time which internal components were hit by this query. A sort of continuous query explain service.

Combine these traces to database clients and other front end services and you’ll be able to point to the front end service version which causes cache misses deep inside postgres.

4 comments

https://www.postgresql.org/docs/current/dynamic-trace.html:

“PostgreSQL provides facilities to support dynamic tracing of the database server. This allows an external utility to be called at specific points in the code and thereby trace execution.

A number of probes or trace points are already inserted into the source code. These probes are intended to be used by database developers and administrators. By default the probes are not compiled into PostgreSQL; the user needs to explicitly tell the configure script to make the probes available.

Currently, the DTrace utility is supported, which, at the time of this writing, is available on Solaris, macOS, FreeBSD, NetBSD, and Oracle Linux. The SystemTap project for Linux provides a DTrace equivalent and can also be used. Supporting other dynamic tracing utilities is theoretically possible by changing the definitions for the macros in src/include/utils/probes.h.”

Observability is not just about traces. This great article[1] by @copyconstruct describes it in detail, and lists monitoring, alerting/visualization, tracing, and log aggregation/analytics as key components of an observability stack.

[1] https://copyconstruct.medium.com/monitoring-and-observabilit...

What kind of in information would you like to see in that trace?

Not planning to work on it directly, but I am planning to do some larger executor work as one of the next bigger projects, and having ideas about what kind of information people would like to see could make it easier to implement them later.

A trace should answer why a query is fast or slow. It should provide enough information to a dba (which I’m not) to determine in which domain the most likely optimization is possible. I.e. the query itself, db configuration and tuning or underlying infra.
Unfortunately some of the data for that isn't cheap to collect. Postgres' EXPLAIN (ANALYZE, SETTINGS) <query>, which shows a good bit of what you want, can slow queries down substantially (in some workloads). Primarily because of timestamp overhead.

Just always collecting that information just in case it may get accessed thus isn't really feasible. It'd be good to make it possible to query cheaper information on-demand though (e.g. asking for the EXPLAIN of a query running in another session, without analyze, should be doable with some effort).

You can already set up things in a way that allows to correlate connections / queries with distributed tracing. But it's a more work than it should be. Postgres' pg_stat_activity shows queries, and it can include information that allows to correlate in the connection's 'application_name'.

The point the OP was trying to make I believe, is not that it should do all of that, but that if it doesn't do that a different name to observability should be chosen such as monitoring.
"It would be great if Postgres could emit a trace per query, showing in real-time which internal components were hit by this query. A sort of continuous query explain service."
I think you can achieve this with pg_stat_statements and auto_explain.
Not super familiar with PG but I was under the impression it did have traces - https://www.postgresql.org/docs/9.1/dynamic-trace.html