I often wonder what the Chromecast is doing in the 10 seconds it takes to start streaming something, especially since it runs on hardware and software developed by Google.
It is buffering and waiting for TCP slow-start to get up to speed. The amount of buffer time needed can be debated, but in a long stream it is likely that some packet will be lost and resent via normal TCP retransmission (or just sent via a different route and so arrive late). To ride that out you should have a few seconds of buffer, and that means a few seconds of waiting before the stream starts. The delay has a technical cause.
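To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is only a simplified model, not how Chromecast actually works: it assumes the congestion window doubles every round trip until the link is saturated, with no loss, and it ignores DNS, the TCP/TLS handshakes and app startup. The function name and all parameter values are made up for illustration.

```python
def time_to_buffer(buffer_s, video_bps, link_bps, rtt_s,
                   mss_bytes=1460, init_cwnd=10):
    """Estimate seconds needed to download `buffer_s` seconds of video.

    Simplified model (assumption): the congestion window starts at
    `init_cwnd` segments and doubles every RTT (slow-start), capped by
    what the link can carry in one RTT. No loss, no handshake time.
    """
    needed_bits = buffer_s * video_bps
    cwnd = init_cwnd          # congestion window, in segments
    received_bits = 0.0
    elapsed_s = 0.0
    while received_bits < needed_bits:
        # Bits deliverable this round trip: window-limited or link-limited.
        window_bits = cwnd * mss_bytes * 8
        link_bits = link_bps * rtt_s
        received_bits += min(window_bits, link_bits)
        elapsed_s += rtt_s
        cwnd *= 2             # slow-start: double the window each RTT
    return elapsed_s


# Hypothetical numbers: 3 s of 5 Mbit/s video over a 20 Mbit/s link, 100 ms RTT.
print(time_to_buffer(buffer_s=3, video_bps=5_000_000,
                     link_bps=20_000_000, rtt_s=0.1))
```

In this model the first handful of round trips carry very little data, so a longer RTT or a bigger buffer pushes the startup time up quickly. The real delay will be higher once handshakes and any packet loss are counted.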
You don't have to implement things as above. For live video you should probably use UDP and design your protocol so that it can handle a few missed packets, that is, let the video become fuzzy in those cases. This is a lot more complex to design, though, and so not what you should do first. Even if you have Google's engineers, the first solution is better for non-live video, since a few more seconds of delay means you can keep the video clear.
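As a rough illustration of "handle a few missed packets", here is a minimal receiver-side sketch in Python. It is not any real protocol: it assumes each UDP datagram carries a sequence number and one frame's worth of data, and it conceals a lost frame by repeating the previous one and flagging it as degraded. The names `playout` and `conceal` are hypothetical, and the network part is left out so that the logic stands on its own.

```python
def playout(packets, first_seq, count):
    """Order received packets by sequence number for playback.

    `packets` is an iterable of (seq, payload) pairs in arrival order;
    they may be reordered, duplicated, or missing. Returns one entry per
    expected sequence number, with None where nothing arrived.
    """
    by_seq = {}
    for seq, payload in packets:
        by_seq.setdefault(seq, payload)   # keep first copy, drop duplicates
    return [by_seq.get(seq) for seq in range(first_seq, first_seq + count)]


def conceal(frames):
    """Fill gaps by repeating the last good frame.

    Returns (frame, degraded) pairs; degraded=True marks a frame that was
    lost and papered over, which is where the picture would look fuzzy.
    """
    shown = []
    last_good = None
    for frame in frames:
        if frame is None:
            shown.append((last_good, True))
        else:
            last_good = frame
            shown.append((frame, False))
    return shown


# Packet 3 never arrives and packet 1 arrives late (after packet 2).
arrived = [(0, "a"), (2, "c"), (1, "b"), (4, "e")]
print(conceal(playout(arrived, first_seq=0, count=5)))
```

A real design also has to decide how long to wait for a late packet before giving up on it, which is the same buffering trade-off as before, only measured in milliseconds instead of seconds.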