Hacker News new | ask | show | jobs
by semi_sentient 229 days ago
Modern ranking systems (feeds, search, recommendations) have strict latency budgets, often under 200 ms at p99. This write-up describes how we designed a production system using a decoupled microservice architecture for serving, a feature + vector store data layer, and an automated MLOps pipeline for training → deployment. This is less about modeling, more about the infrastructure that keeps it all running.