by angoragoats 635 days ago
I don't understand the problem that's being solved here. At the scale you're talking about (e.g. millions of requests per day with Find AI), why would I want to house immutable log data inside a relational database, presumably alongside actual relational data that's critical to my app? It's only going to bog down the app for my users.

There are plenty of other solutions (examples include Presto, Athena, Redshift, or straight up jq over raw log files on disk) which are better suited for this use case. Storing log data in a relational DB is pretty much always an anti-pattern, in my experience.
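To make the "jq over raw log files" option concrete, here is a minimal Python sketch of the same idea: scan newline-delimited JSON files and aggregate on one field. The `*.jsonl` layout, the `count_by_field` helper, and the `model` field in the usage note are assumptions for illustration, not anything Velvet or Find AI documents.

```python
import json
from collections import Counter
from pathlib import Path


def count_by_field(log_dir, field):
    """Count log records per value of `field` across all *.jsonl files in log_dir.

    Hypothetical layout: one JSON object per line, one or more files per directory.
    """
    counts = Counter()
    for path in sorted(Path(log_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # ignore blank lines
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines rather than abort the scan
                if isinstance(record, dict) and field in record:
                    # str() keeps unhashable values (lists, dicts) countable
                    counts[str(record[field])] += 1
    return counts
```

Something like `count_by_field("logs/", "model").most_common(5)` would then list the most frequent values. Because the files are append-only, a scan like this never competes with the app database for locks or I/O.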

2 comments

Philip here from Find AI. We store our Velvet logs in a dedicated DB. It's Postgres now, but we will probably move it to ClickHouse at some point. Our main app DB is in Postgres, so everybody already knows how it works, and all of our existing BI tools support it.

Here's a video about what we do with the data: https://www.youtube.com/watch?v=KaFkRi5ESi8

It's a standalone DB, just for LLM logging. Since it's your DB, you can configure data retention and migrate data to an analytics DB or warehouse if cost or latency becomes a concern. We're also happy to support whatever DB you require (ClickHouse, BigQuery, Snowflake, etc.) in a managed deployment.
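As a rough illustration of what "configure data retention" could look like on a standalone log DB, here is a minimal sketch. It uses Python's built-in `sqlite3` purely as a stand-in so the example runs anywhere; the `llm_logs` table, its `created_at` column (Unix seconds), and the `purge_old_logs` helper are hypothetical and are not Velvet's actual schema or API.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86_400


def purge_old_logs(conn, retention_days, now=None):
    """Delete log rows older than `retention_days`; return how many were removed.

    Assumes a hypothetical table llm_logs(created_at) holding Unix timestamps.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    with conn:  # commits on success, rolls back if the DELETE raises
        cur = conn.execute(
            "DELETE FROM llm_logs WHERE created_at < ?", (cutoff,)
        )
    return cur.rowcount
```

Run on a schedule (a cron job, for instance), a purge like this keeps the log table bounded. On Postgres the same `DELETE` applies with the driver's own placeholder syntax, though at high volume dropping whole time-based partitions is usually cheaper than deleting row by row.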
I guess I should have elaborated: even if you're spinning up a new database expressly for this purpose (which I didn't see specifically called out anywhere in your docs as a best practice), you're starting off on the wrong foot. Maybe I'm old-school, but relational databases should be for relational data. This data isn't relational; it's write-once log data, and it belongs in files on disk or, if it gets too large to manage, in purpose-built analytics tools.
Got it. We can store logs in your purpose-built analytics DB of choice.

PostgreSQL (Neon) is our free self-serve offering because it’s easy to spin up quickly.