by eeks 2925 days ago
This implementation looks a lot like HW TCP offload engines, where the kernel handles session creation and termination and the HW takes care of most of the state machine. Apple must then have found some way to hide the handling of ancillary tasks from the user without crippling the protocol. They may have added hooks in the main application loop, but my guess is that they are running separate threads, so that an application hanging the main thread in a non-returning loop cannot prevent these ancillary tasks from running (some are time-sensitive). This means that their user-space TCP session management is most likely multi-threaded, which has a detrimental impact on performance due to the use of locks and the resulting cache pollution.

In the WWDC session on this, they demonstrated a simple app that sent uncompressed video frames captured from the camera over the network (I think using TCP, but I don't recall for sure), once using BSD sockets and once using Network.framework, and reported 30% less overhead with Network.framework.

Which is to say, you're saying "detrimental impact on performance" and yet this seems to be a significant win over BSD sockets.

> I think using TCP, but I don't recall for sure

It was UDP.

> Which is to say, you're saying "detrimental impact on performance" and yet this seems to be a significant win over BSD sockets.

The point I was trying to drive home is that, while a 30% overhead reduction compared to BSD sockets is nothing to laugh at, a fully user-space UDP/TCP network stack combined with memory-mapped buffer sharing usually gives performance improvements measured in "x" rather than in "%".
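To put the "x" versus "%" distinction in numbers, here is a rough sketch. It assumes (my assumption, the session material does not say) that "overhead" means the total CPU cost per frame sent, so a fractional reduction maps directly to a throughput factor. The function names are made up for illustration.

```python
def factor_from_reduction(reduction: float) -> float:
    """Convert a fractional cost reduction (0.30 = 30% less) into an "x" factor.

    Assumption: the reduction applies to the whole per-frame cost.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_factor(factor: float) -> float:
    """Inverse: the fractional cost reduction needed to reach a given "x" factor."""
    if factor < 1.0:
        raise ValueError("factor must be >= 1")
    return 1.0 - 1.0 / factor


if __name__ == "__main__":
    print(f"30% less overhead is about {factor_from_reduction(0.30):.2f}x")
    for x in (2, 5, 10):
        print(f"{x}x needs {reduction_from_factor(x) * 100:.0f}% less overhead")
```

Under that assumption, 30% less overhead is well short of 2x, and a 2x gain would already need the per-frame cost cut in half.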

Now, there is not much in terms of experiment protocol in their material, so it's very hard to tell. You mention uncompressed video, so they could be sending humongous frames, which I doubt, as no one would ever send raw frames over the network (that would be several MB per frame for a 720p front camera). But if that is the case, then the data copy operation becomes the predominant bottleneck, and getting rid of one copy may justify the improvement. But that is not a realistic scenario.
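For reference, a back-of-the-envelope check of the "several MB per frame" figure. The pixel formats and the 30 fps rate are my assumptions (the session material does not specify them); 720p is taken as 1280x720.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: float) -> int:
    """Size in bytes of one uncompressed frame."""
    return int(width * height * bytes_per_pixel)


WIDTH, HEIGHT = 1280, 720  # 720p
FPS = 30                   # assumed frame rate, not stated in the session

# Assumed pixel formats: 32-bit BGRA and planar YUV 4:2:0.
FORMATS = {
    "BGRA (4 B/px)": 4.0,
    "YUV 4:2:0 (1.5 B/px)": 1.5,
}

if __name__ == "__main__":
    for name, bpp in FORMATS.items():
        size = raw_frame_bytes(WIDTH, HEIGHT, bpp)
        mbit_per_s = size * 8 * FPS / 1e6
        print(f"{name}: {size / 1e6:.2f} MB/frame, "
              f"{mbit_per_s:.0f} Mbit/s at {FPS} fps")
```

So a raw 720p frame lands somewhere between roughly 1.4 MB and 3.7 MB depending on the pixel format, which is where the copy cost would start to dominate.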

The more realistic scenario is that they sent compressed delta-frames across the network (H.264 or HEIF), which would considerably reduce the transferred payload size. In that scenario, data copy is no longer the predominant overhead and a 30% overhead reduction is underwhelming, which tells me that they are still calling expensive operations like syscalls/uIPC on the critical data path.

According to what they said in the session, they were asking for video frames from the camera and sending them, completely un-interpreted, over the network. The idea being the receiving device could take these frames and handle them exactly the same way as they would handle data coming from the local camera.