Hacker News
by Dave_Rosenthal 845 days ago
With the "production-grade" part of the title, I was hoping to see a bit more about scalability, fault tolerance, updating continually changing sources of data, A/B testing new versions of models, slow rollout, logging/analytics, work prioritization/QoS, etc. The lack of these kinds of features seems to be where a lot of the toy/demo stacks aren't really prepared for production. Any thoughts on those topics?
1 comment

This is a great question, thanks for asking.

We are testing workflows internally that use orchestration software like Hatchet/Temporal to let the framework robustly handle hundreds of GBs of uploaded data, from parsing to chunking to embedding to storing [1][2]. The goal is to build durable execution into each step, because even steps like PDF extraction can be expensive and time-consuming. We are targeting a preliminary release of these features in under a month.
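To make "durable execution at each step" concrete, here is a rough sketch of the idea only. It does not use the Hatchet or Temporal APIs and is not how the framework implements it: each step's output is checkpointed in SQLite, so a retried job skips work that already finished. The names `open_store`, `run_step`, `ingest`, the `checkpoints` table, and the toy `parse`/`chunk`/`embed` steps are all hypothetical placeholders.

```python
import json
import sqlite3
from typing import Any, Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the checkpoint store. Schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "doc_id TEXT, step TEXT, output TEXT, PRIMARY KEY (doc_id, step))"
    )
    return conn


def run_step(conn: sqlite3.Connection, doc_id: str, step: str,
             fn: Callable[[Any], Any], arg: Any) -> Any:
    """Run fn(arg) once per (doc_id, step); reuse the saved output on retry."""
    row = conn.execute(
        "SELECT output FROM checkpoints WHERE doc_id = ? AND step = ?",
        (doc_id, step),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # step already completed, skip the work
    result = fn(arg)
    conn.execute(
        "INSERT INTO checkpoints VALUES (?, ?, ?)",
        (doc_id, step, json.dumps(result)),
    )
    conn.commit()  # persist before moving on to the next step
    return result


# Toy stand-ins for the real (expensive) steps.
def parse(raw: str) -> str:
    return raw.strip()


def chunk(text: str) -> list[str]:
    return [text[i:i + 20] for i in range(0, len(text), 20)]


def embed(chunks: list[str]) -> list[list[float]]:
    return [[float(len(c))] for c in chunks]


def ingest(conn: sqlite3.Connection, doc_id: str, raw: str) -> list[list[float]]:
    """Parse, chunk, and embed one document, checkpointing after each step."""
    text = run_step(conn, doc_id, "parse", parse, raw)
    chunks = run_step(conn, doc_id, "chunk", chunk, text)
    return run_step(conn, doc_id, "embed", embed, chunks)
```

If the process dies after chunking, calling `ingest` again with the same `doc_id` picks up at the embedding step instead of re-parsing. A real orchestrator adds retries, timeouts, and distributed workers on top of this basic pattern.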

Logging is built natively into the framework, with Postgres or SQLite as backends. We ship a GUI that uses these logs and the application flow to let developers see queries, search results, and RAG completions in real time.
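For a sense of what SQLite-backed logging of this kind can look like, here is a minimal sketch. The `rag_logs` table, its columns, and the `open_log`/`log_event`/`recent` helpers are assumptions made for illustration, not the framework's actual schema or API.

```python
import sqlite3
import time


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the log database. Table layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rag_logs (ts REAL, kind TEXT, payload TEXT)"
    )
    return conn


def log_event(conn: sqlite3.Connection, kind: str, payload: str) -> None:
    """Record one event, e.g. kind='query', 'search_result', or 'completion'."""
    conn.execute(
        "INSERT INTO rag_logs VALUES (?, ?, ?)", (time.time(), kind, payload)
    )
    conn.commit()


def recent(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, str]]:
    """Return the newest events first, as a dashboard might poll them."""
    return conn.execute(
        "SELECT kind, payload FROM rag_logs ORDER BY rowid DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

A GUI can then poll `recent` (or the Postgres equivalent) to show queries, search results, and completions as they arrive.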

We are planning on adding more features here to help with evaluation / insight as we get further feedback.

On the A/B testing, slow rollout, and analytics side, we are still early, but we suspect there is a lot of value to be had here, particularly because human feedback is crucial in optimizing any RAG system. Developer feedback will be especially important since there are a lot of paths to choose between.

[1] https://hatchet.run/
[2] https://temporal.io/