by ccstevens
2466 days ago
Chris from Databricks here. Glad you enjoyed the write-up, and glad to hear we aren’t alone. We also had difficulty creating a repro outside of Spark (JVM); I tried with Python sockets without any luck. That said, hitting the issue requires the right mix of dropped packets, socket buffer sizes and MSS, and I don’t think there is anything special about the JVM influencing those variables. Now that I know more, maybe I can craft a minimal repro in another language.

A datapoint I didn’t mention in the post: we had a significantly higher repro rate when talking to S3 through a VPC endpoint. The only difference I could see was that the VPC endpoint connections had an MSS of 1412, while non-VPC connections had a slightly higher MSS (1436, IIRC). I have yet to draw any conclusions from that.
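For anyone attempting a non-JVM repro, here is a minimal sketch (an assumption on my part, not what we used internally) of how to set and inspect two of the knobs mentioned above, socket buffer size and MSS, from Python's standard `socket` module. It connects over loopback only so it is self-contained. Loopback won't drop packets and has a much larger MSS than an S3 connection, so packet loss would have to come from elsewhere (e.g. something like Linux `tc netem`). `socket.TCP_MAXSEG` is platform-dependent (available on Linux and macOS), and the helper names here are hypothetical.

```python
import socket


def socket_params(sock: socket.socket) -> dict:
    """Read buffer sizes and MSS from a connected TCP socket."""
    return {
        # Linux reports roughly double the requested SO_RCVBUF (kernel bookkeeping).
        "rcvbuf": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        "sndbuf": sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        # MSS negotiated for this connection (not available on every platform).
        "mss": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG),
    }


def loopback_demo(rcvbuf: int = 64 * 1024) -> dict:
    """Hypothetical helper: open a loopback TCP connection and report its params."""
    # Listener on an ephemeral loopback port, standing in for a real endpoint.
    with socket.create_server(("127.0.0.1", 0)) as server:
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Set the receive buffer before connect() so it can affect the
        # window advertised during the handshake.
        client.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        client.connect(server.getsockname())
        conn, _ = server.accept()
        try:
            return socket_params(client)
        finally:
            conn.close()
            client.close()


if __name__ == "__main__":
    print(loopback_demo())
```

Against a real endpoint you would connect to the remote host instead of the loopback listener; the MSS reported there is what would show values like 1412 vs 1436.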