by roh26it
694 days ago
At Portkey, this is a problem we deal with quite a bit. It's also the reason Datadog and the traditional observability vendors did not work for LLM use cases: they're not built to handle such large volumes of data. We handle it with a careful combination of ClickHouse and MinIO, which gives us fast retrieval of log items plus selective retrieval from the MinIO buckets. Cost becomes a very big factor when managing, filtering and searching through TBs of data, even for fairly small use cases. One thing we lost in the process is full-text search over the request and response pairs. We try to add metadata to requests intelligently to make searching easier, but it isn't the complete experience yet. It's still a WIP problem for us, and maybe the last missing piece here. Any suggestions?
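To make the split concrete, here is a rough sketch of the general pattern: a small filterable index row per log item, with the full payload kept in an object store and fetched only on demand. This is not Portkey's actual implementation. `SplitLogStore`, its fields, and the in-memory list and dict standing in for ClickHouse and MinIO are all hypothetical, chosen so the example runs without any services.

```python
import hashlib
import json


class SplitLogStore:
    """Toy stand-in: `index` plays the ClickHouse role, `blobs` the MinIO role.

    Both are plain in-memory structures here (an assumption for illustration).
    """

    def __init__(self):
        self.index = []  # small rows that are cheap to filter
        self.blobs = {}  # object key -> full request/response payload

    def write(self, request, response, metadata):
        # The large payload goes to the blob store only.
        payload = json.dumps({"request": request, "response": response}, sort_keys=True)
        # Content-hash key: identical payloads share one stored object.
        key = hashlib.sha256(payload.encode()).hexdigest()
        self.blobs[key] = payload
        # The index row carries metadata plus a pointer to the blob.
        self.index.append({**metadata, "blob_key": key, "bytes": len(payload)})
        return key

    def search(self, **filters):
        # Exact-match filtering on metadata only; no full-text search over bodies.
        return [
            row for row in self.index
            if all(row.get(name) == value for name, value in filters.items())
        ]

    def fetch(self, row):
        # Selective retrieval: pull the body only for rows the caller picked.
        return json.loads(self.blobs[row["blob_key"]])
```

Usage would look like `rows = store.search(model="b")` followed by `store.fetch(rows[0])`. The sketch also shows where the full-text gap comes from: `search` never looks inside the payloads, so anything not captured as metadata is invisible to it.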
We are thinking about sampled hot data for ops staff in OTel DBs and UIs, and long-term full data in S3/ClickHouse for custom tooling. It'd be cool if we could send historical OTel sessions from ClickHouse to Grafana etc. on demand, but that's likely a bridge too far...
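One way the hot/cold split could work is deterministic sampling on the session ID, so a session is either fully present in the hot store or fully absent, while everything always lands in cold storage. This is only a sketch under assumptions: `keep_hot`, `route`, the `"hot"`/`"cold"` labels and the 5% default rate are hypothetical names and values, not anything from a real OTel collector config.

```python
import hashlib


def keep_hot(session_id: str, sample_rate: float) -> bool:
    """Return True if this session belongs to the sampled hot set.

    Hashing the ID (rather than calling random()) gives every span of a
    session the same decision, on any host, with no shared state.
    """
    digest = hashlib.sha256(session_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(sample_rate * 2**64)


def route(session_id: str, sample_rate: float = 0.05) -> list:
    """Every session goes to cold storage; a sampled subset also goes hot."""
    destinations = ["cold"]
    if keep_hot(session_id, sample_rate):
        destinations.append("hot")
    return destinations
```

Because the decision depends only on the ID, a session missing from the hot store can still be located in the full cold data by the same key.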