|
|
|
|
|
by semi_sentient
229 days ago
|
|
Modern ranking systems (feeds, search, recommendations) have strict latency budgets, often under 200 ms at p99.
This write-up describes how we designed a production system using a decoupled microservice architecture for serving, a feature + vector store data layer, and an automated MLOps pipeline for training → deployment.
This is less about modeling, more about the infrastructure that keeps it all running. |
|