Hacker News
by thewisenerd 265 days ago
we have the same issue with HTTP as well, due to HTTP keepalive, which many clients enable out of the box.

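the "impact" can be reduced by configuring an overall connection TTL (a maximum lifetime after which a connection is closed and re-established, even if it is still healthy). it takes some time for traffic to reach new pods when they come up, but the load evens out over time.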
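to illustrate the connection-TTL idea, here's a minimal sketch, assuming a generic keep-alive pool where `connect` opens a new connection (re-resolving the service, so it can land on a newly added pod). `TTLConnectionPool`, `max_age_s`, `connect` and `close` are all hypothetical names for illustration, not any particular HTTP client's API. the fake clock and fake dialer in the usage part stand in for real time and real pods.

```python
import time
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

C = TypeVar("C")


@dataclass
class _Pooled(Generic[C]):
    conn: C
    created_at: float


class TTLConnectionPool(Generic[C]):
    """Hypothetical keep-alive pool that retires connections older than max_age_s."""

    def __init__(self, connect: Callable[[], C], close: Callable[[C], None],
                 max_age_s: float, clock: Callable[[], float] = time.monotonic):
        self._connect = connect
        self._close = close
        self._max_age_s = max_age_s
        self._clock = clock
        self._idle: List[_Pooled[C]] = []

    def acquire(self) -> _Pooled[C]:
        now = self._clock()
        while self._idle:
            pooled = self._idle.pop()
            if now - pooled.created_at < self._max_age_s:
                return pooled          # still young enough: reuse it (keep-alive)
            self._close(pooled.conn)   # too old: drop it so a fresh one is dialed
        return _Pooled(self._connect(), now)

    def release(self, pooled: _Pooled[C]) -> None:
        # Also retire on release so expired connections don't sit idle in the pool.
        if self._clock() - pooled.created_at >= self._max_age_s:
            self._close(pooled.conn)
        else:
            self._idle.append(pooled)


# Usage with a fake clock and a fake dialer that round-robins over two pods.
pods = ["pod-a", "pod-b"]
dialed: List[str] = []
now = [0.0]


def fake_connect() -> str:
    pod = pods[len(dialed) % len(pods)]
    dialed.append(pod)
    return pod


pool = TTLConnectionPool(fake_connect, close=lambda c: None,
                         max_age_s=60.0, clock=lambda: now[0])
c1 = pool.acquire(); pool.release(c1)  # dials pod-a
c2 = pool.acquire(); pool.release(c2)  # reuses the pod-a connection
now[0] = 61.0
c3 = pool.acquire()                    # pod-a connection expired, so pod-b is dialed
```

without the TTL, `c3` would keep reusing the pod-a connection indefinitely, which is exactly the keepalive stickiness described above. the TTL just bounds how long that can last.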

--

that said, i'm not surprised that even a company as large as databricks feels that adding a service mesh is going to add operational complexity.

looks like they've taken the best parts (endpoint watch, syncing endpoints to clients with xDS) and moved them client-side. compared to the failure modes of a service mesh, this seems better.
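a minimal sketch of the general "client-side" idea, not Databricks' implementation: the client holds the current endpoint list, a control plane pushes full snapshots to it, and it picks an endpoint per request. `ClientSideBalancer` and `on_endpoints_update` are hypothetical names. the callback stands in for whatever an xDS (EDS) or Kubernetes endpoint watch would deliver and is not a real xDS client API.

```python
import itertools
import threading
from typing import Iterable, List, Optional


class ClientSideBalancer:
    """Hypothetical round-robin picker whose endpoint list is replaced by pushed updates."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._endpoints: List[str] = []
        self._rr = itertools.count()

    def on_endpoints_update(self, endpoints: Iterable[str]) -> None:
        # Treat each update as a full snapshot of the currently healthy endpoints.
        snapshot = sorted(set(endpoints))
        with self._lock:
            self._endpoints = snapshot

    def pick(self) -> Optional[str]:
        # Choose an endpoint per request (round-robin), not once per connection.
        with self._lock:
            if not self._endpoints:
                return None
            return self._endpoints[next(self._rr) % len(self._endpoints)]


lb = ClientSideBalancer()
lb.on_endpoints_update(["10.0.0.1:8080", "10.0.0.2:8080"])
first = [lb.pick() for _ in range(4)]

# A new pod shows up; the very next picks can already route to it.
lb.on_endpoints_update(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
second = [lb.pick() for _ in range(3)]
```

because the pick happens per request against a pushed endpoint list, a new pod gets traffic as soon as the update arrives, without waiting for long-lived connections to be recycled. a real client would still keep a connection pool per endpoint underneath this.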


I haven’t been keeping up, but is there still hype over full meshes like Istio/Linkerd? I’ve seen it tried in a couple of places but it didn’t work super well; the last place couldn’t because datadog apparently bills sidecar containers as additional hosts so using sidecar proxy would have doubled our datadog bill.
> the last place couldn’t because datadog apparently bills sidecar containers as additional hosts so using sidecar proxy would have doubled our datadog bill.

that seems like the tail wagging the dog

Yes, we’ve leaned toward minimizing operational overhead. Taking the useful parts of a mesh (xDS endpoint and routing updates) into the client has worked extremely well in practice and has been very reliable, without the extra moving parts of a full mesh.
When we started, we already had a lot of pieces, such as certificate management, in-house, and adding a full-blown service mesh represented significant operational overhead. We built only the parts we needed and began integrating things like xDS natively into the rest of our clients.