| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by semi_sentient 229 days ago
	Modern ranking systems (feeds, search, recommendations) have strict latency budgets, often under 200 ms at p99. This write-up describes how we designed a production system using a decoupled microservice architecture for serving, a feature + vector store data layer, and an automated MLOps pipeline for training → deployment. This is less about modeling, more about the infrastructure that keeps it all running.