Hacker News
by lateforwork 58 days ago
There are two broad types of databases: operational and analytical.

Operational databases store transactions and support day-to-day application workflows.

For analysis, data is often copied into separate analytical databases (data warehouses), which are structured for efficient querying and large-scale data processing. These systems are designed to handle complex, ad hoc queries and heavy workloads.
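
To make the operational/analytical split concrete, here is a minimal sketch that uses two in-memory SQLite databases as stand-ins: one plays the operational store the application writes to, the other the analytical copy that reporting queries run against. The table names and the `copy_orders` helper are hypothetical. A real setup would more likely replicate changes incrementally into a warehouse than copy the whole table.

```python
import sqlite3

# Hypothetical operational store: one row per order, written by the application.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, created_at TEXT)"
)
oltp.executemany(
    "INSERT INTO orders (customer, amount, created_at) VALUES (?, ?, ?)",
    [
        ("alice", 30.0, "2024-01-03"),
        ("bob", 12.5, "2024-01-17"),
        ("alice", 20.0, "2024-02-02"),
    ],
)
oltp.commit()

# Hypothetical analytical store: a separate database that receives a copy.
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE orders_copy (id INTEGER, customer TEXT, amount REAL, created_at TEXT)")


def copy_orders(src, dst):
    """Batch-copy every order from the operational DB into the analytical DB."""
    rows = src.execute("SELECT id, customer, amount, created_at FROM orders").fetchall()
    dst.executemany("INSERT INTO orders_copy VALUES (?, ?, ?, ?)", rows)
    dst.commit()
    return len(rows)


copy_orders(oltp, olap)

# The aggregate query runs against the copy, so it never loads the operational DB.
monthly = olap.execute(
    "SELECT substr(created_at, 1, 7) AS month, SUM(amount) "
    "FROM orders_copy GROUP BY month ORDER BY month"
).fetchall()
print(monthly)  # [('2024-01', 42.5), ('2024-02', 20.0)]
```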

LLM agents are the best way to analyze data stored in these databases. This is the future.

> LLM agents are the best way to analyze data stored in these databases

Why, and how?

> Why

Based on my experience with Claude, it's pretty damn good at doing data analysis, if given the right curated data models. You still need to eyeball the generated SQL to make sure it makes sense.

> and how?

1. Replicate your Postgres data into Snowflake/Databricks/ClickHouse/etc., or directly into Iceberg tables that you then connect to Snowflake/Databricks/ClickHouse/etc.

2. Give your agent read access to query it.

3. Build dimensional models (fact and dimension tables) from the raw data. You can ask an LLM for help here; Claude is pretty good at designing data models in my experience.

4. Start asking your agent questions about your data.
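
Steps 2 to 4 can be sketched end to end with SQLite standing in for the warehouse. Everything here is hypothetical: `raw_orders` stands in for replicated Postgres data, and `dim_customer`/`fact_orders` form a toy star schema. `PRAGMA query_only` is SQLite's switch for refusing writes on a connection. In Snowflake, Databricks or ClickHouse you would more likely grant the agent a read-only role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table, standing in for data replicated from Postgres (step 1).
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, customer_email TEXT, customer_country TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "a@example.com", "DE", 30.0),
        (2, "b@example.com", "US", 12.5),
        (3, "a@example.com", "DE", 20.0),
    ],
)

# Step 3: a dimension table with one row per customer and a surrogate key.
conn.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, email TEXT UNIQUE, country TEXT)"
)
conn.execute(
    "INSERT INTO dim_customer (email, country) "
    "SELECT DISTINCT customer_email, customer_country FROM raw_orders ORDER BY customer_email"
)

# Step 3: a fact table with one row per order, referencing the dimension by key.
conn.execute(
    "CREATE TABLE fact_orders AS "
    "SELECT r.order_id, d.customer_key, r.amount "
    "FROM raw_orders r JOIN dim_customer d ON d.email = r.customer_email"
)
conn.commit()

# Step 2: make this connection read-only before handing it to the agent.
conn.execute("PRAGMA query_only = ON")

# Step 4: the kind of SQL an agent might generate for "revenue by country?"
agent_sql = (
    "SELECT d.country, SUM(f.amount) AS revenue "
    "FROM fact_orders f JOIN dim_customer d USING (customer_key) "
    "GROUP BY d.country ORDER BY d.country"
)
revenue_by_country = conn.execute(agent_sql).fetchall()
print(revenue_by_country)  # [('DE', 50.0), ('US', 12.5)]

# Any write attempted through this connection is refused while query_only is on.
write_blocked = False
try:
    conn.execute("DELETE FROM fact_orders")
except sqlite3.Error:
    write_blocked = True
print("write blocked:", write_blocked)
```

The fact/dimension split gives the agent short, well-named tables to join instead of one raw dump. When its answers go wrong, you adjust this layer.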

Keep steps 3 and 4 as a tight feedback loop: every time your agent hallucinates or struggles to answer one of your questions, improve the model.

Side note: I'm currently building a platform that does all 3 (though you still need to do 2 yourself), you just need Postgres + 1 command to set it up: https://polynya.dev/

> Claude is pretty good at designing data models in my experience

Yesterday, Claude decided to go with nvarchar(100) for an IP address column instead of varbinary(16), and thinks RBAR (row-by-agonizing-row) triggers are just as good as temporal tables.
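
A fixed 16-byte binary value can hold any IPv4 or IPv6 address, which is the argument for `varbinary(16)` over a 100-character string. Here is a minimal Python sketch of that encoding. It assumes IPv4 addresses are stored in IPv4-mapped IPv6 form (`::ffff:a.b.c.d`), a common convention that the comment does not specify. The helper names are hypothetical.

```python
import ipaddress


def ip_to_16_bytes(text: str) -> bytes:
    """Encode an IPv4 or IPv6 address as exactly 16 bytes (IPv4 becomes IPv4-mapped IPv6)."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # ::ffff:a.b.c.d layout: 10 zero bytes, two 0xff bytes, then the 4 IPv4 bytes
        return b"\x00" * 10 + b"\xff\xff" + addr.packed
    return addr.packed  # IPv6 addresses are already 16 bytes


def ip_from_16_bytes(raw: bytes) -> str:
    """Decode 16 bytes back to text, unwrapping IPv4-mapped addresses to plain IPv4."""
    addr = ipaddress.IPv6Address(raw)
    mapped = addr.ipv4_mapped
    return str(mapped) if mapped is not None else str(addr)


for ip in ["192.0.2.1", "2001:db8::1"]:
    raw = ip_to_16_bytes(ip)
    print(ip, len(raw), raw.hex(), ip_from_16_bytes(raw))
```

One trade-off of this convention: an IPv6 address that genuinely is `::ffff:a.b.c.d` decodes back as plain IPv4.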

So, no. Claude is not good at designing data models in my experience.

Postgres has network types https://www.postgresql.org/docs/current/datatype-net-types.h...

  inet: allows nonzero bits to the right of the netmask
  cidr: does not allow them

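
Python's standard `ipaddress` module makes the same distinction, which is a quick way to see the difference outside the database. This is an analogy, not Postgres itself: `ip_interface` behaves like `inet` (address plus netmask, host bits kept), and `ip_network` with its default strict parsing behaves like `cidr`.

```python
import ipaddress

value = "192.168.1.5/24"  # nonzero host bits to the right of the /24 netmask

# Analogous to Postgres inet: address plus netmask, nonzero host bits allowed.
iface = ipaddress.ip_interface(value)
print(iface.ip, iface.network)  # 192.168.1.5 192.168.1.0/24

# Analogous to Postgres cidr: a network only, so nonzero host bits are rejected.
rejected = False
try:
    ipaddress.ip_network(value)  # strict=True is the default
except ValueError as err:
    rejected = True
    print("rejected:", err)

net = ipaddress.ip_network("192.168.1.0/24")  # host bits are all zero, so accepted
print(net)  # 192.168.1.0/24
```
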
> Side note: I'm currently building a platform

Oh ok, this comment is just an ad then.

Wide tables and rich data: dozens to hundreds of columns, some of them JSON dimensions. These datasets are way easier to explore with AI.
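
Before an agent can explore a wide table with JSON columns, it helps to give it a compact overview of what is in them. Here is a hypothetical sketch of such a summarizer whose output could be pasted into an agent's context. The `events` table, its `props` JSON column and `describe_table` are invented for illustration. On a real warehouse you would sample rows rather than scan the whole table.

```python
import json
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
# Hypothetical event table with a JSON "props" dimension stored as text.
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT, country TEXT, props TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "signup", "DE", json.dumps({"plan": "free", "referrer": "ads"})),
        (2, "upgrade", "US", json.dumps({"plan": "pro"})),
        (3, "signup", "US", json.dumps({"plan": "free", "utm_campaign": "spring"})),
    ],
)


def describe_table(conn, table, json_cols=()):
    """Summarize each column (distinct counts, or JSON key frequencies) as plain text."""
    # The table name comes from trusted code here; never interpolate user input into SQL.
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cur.description]
    rows = cur.fetchall()
    lines = [f"table {table}: {len(rows)} rows"]
    for i, name in enumerate(names):
        if name in json_cols:
            # Count how many rows contain each top-level JSON key (NULLs skipped).
            key_counts = Counter(k for r in rows if r[i] for k in json.loads(r[i]))
            keys = ", ".join(f"{k} ({n})" for k, n in sorted(key_counts.items()))
            lines.append(f"  {name} (json): keys {keys}")
        else:
            distinct = len({r[i] for r in rows})
            lines.append(f"  {name}: {distinct} distinct values")
    return "\n".join(lines)


summary = describe_table(conn, "events", json_cols=("props",))
print(summary)
```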