by mrmufungo 2545 days ago
The backend is pure Java, and is composed of four separate microservices. The web server (we use Javalin) is nothing special; the transcoder works as follows:

* A POST request is sent to the web server. If the file is valid, it is moved via FTP to a local server. I then publish a message to RabbitMQ that adds the job to a transcode queue.

* A worker listening on the channel picks up the job and does all the work needed to make the file "streamable", e.g., segmenting and encoding file parts, creating the M3U8 playlist, generating the waveform, storing to S3, etc. Throughout the process, progress is saved in Redis and made available to the client for visual feedback.

* Once the process is done, an identifier is generated, which the client refers to when saving any modifications to the track information. We primarily use MariaDB.
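The three steps above (enqueue on upload, worker reports progress, identifier on completion) can be sketched in plain Java. This is a hypothetical, stdlib-only illustration, not the author's code: a `LinkedBlockingQueue` stands in for the RabbitMQ transcode queue, a `ConcurrentHashMap` stands in for the Redis progress store, and the job fields and step names are made up for the example.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: LinkedBlockingQueue stands in for the RabbitMQ transcode queue,
// ConcurrentHashMap stands in for the Redis progress store.
public class TranscodePipelineSketch {

    // A job message as the web server might enqueue it (field names are hypothetical).
    record TranscodeJob(String jobId, String localPath) {}

    static final BlockingQueue<TranscodeJob> transcodeQueue = new LinkedBlockingQueue<>();
    static final Map<String, Integer> progress = new ConcurrentHashMap<>();

    // Web-server side: after validating the upload and moving it locally, enqueue a job.
    static String enqueue(String localPath) {
        String jobId = UUID.randomUUID().toString();
        progress.put(jobId, 0); // client can poll this key for visual feedback
        transcodeQueue.add(new TranscodeJob(jobId, localPath));
        return jobId;
    }

    // Worker side: handle one queued job, if any, and record progress after each step.
    // (A real consumer would run continuously rather than poll once.)
    static String processNext() {
        TranscodeJob job = transcodeQueue.poll();
        if (job == null) {
            return null; // nothing queued
        }
        String[] steps = {"segment+encode", "write M3U8", "waveform", "upload to S3"};
        for (int i = 0; i < steps.length; i++) {
            // The actual work for steps[i] would happen here.
            progress.put(job.jobId(), (i + 1) * 100 / steps.length);
        }
        // Once done, generate the identifier the client refers to when saving edits.
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String jobId = enqueue("/tmp/upload.wav");
        String trackId = processNext();
        System.out.println(progress.get(jobId) + "% done, track id " + trackId);
    }
}
```

Keeping progress in a separate store keyed by job ID is what lets the web server answer progress requests without talking to the worker directly.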

The transcode process is the most complex piece of the puzzle, of course. Everything else is pretty much CRUD operations.


> all the needed work to make the file "streamable", e.g. segmenting and encoding file parts

Pardon the probably stupid question, but is that (splitting into several files) really necessary? Isn't supporting HTTP range requests enough?

Or is it because clients tend to download the whole file, even when the listener is still at the very beginning and might skip, meaning wasted bandwidth?

Not a stupid question at all! I actually looked into byte-range requests as opposed to downloading individual segments; indeed, Amazon S3 does support byte-range requests. I would think it's about as easy to implement, with fewer moving parts.
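For comparison, serving byte ranges mostly comes down to parsing the `Range` header and answering with `206 Partial Content` plus a `Content-Range` header. Below is a minimal, hypothetical Java sketch of the parsing half for a single range (`bytes=first-last`, `bytes=first-`, `bytes=-suffix`); multi-range requests and the error response (416) are deliberately left out, and the class and method names are made up.

```java
import java.util.Optional;

// Minimal sketch of single-range "Range: bytes=..." handling; multi-range requests
// and 416 error responses are intentionally out of scope.
public class ByteRangeSketch {

    // Inclusive byte range [start, end] within a resource of known length.
    record ByteRange(long start, long end) {
        // Value for the Content-Range header of a 206 Partial Content response.
        String contentRange(long totalLength) {
            return "bytes " + start + "-" + end + "/" + totalLength;
        }
    }

    // Returns empty if the header is malformed, multi-range, or unsatisfiable.
    static Optional<ByteRange> parse(String header, long totalLength) {
        if (header == null || !header.startsWith("bytes=") || header.contains(",")) {
            return Optional.empty();
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) return Optional.empty();
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        try {
            long start;
            long end;
            if (first.isEmpty()) {
                // Suffix range: the final N bytes of the resource.
                long suffix = Long.parseLong(last);
                if (suffix <= 0) return Optional.empty();
                start = Math.max(0, totalLength - suffix);
                end = totalLength - 1;
            } else {
                start = Long.parseLong(first);
                // Open-ended or over-long ranges are clamped to the last byte.
                end = last.isEmpty() ? totalLength - 1 : Math.min(Long.parseLong(last), totalLength - 1);
            }
            if (start < 0 || start >= totalLength || end < start) return Optional.empty();
            return Optional.of(new ByteRange(start, end));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        long total = 5_000_000; // hypothetical file size in bytes
        parse("bytes=0-1023", total).ifPresent(r -> System.out.println(r.contentRange(total)));
    }
}
```

Since S3 already honors `Range` on GET, a setup like the one described would likely only need this kind of logic if the bytes were proxied through the Java backend.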

When I started the project, I looked at how other streaming sites fetched their audio, and I saw a pattern of individual GET requests for segments. I thought, "Hey, they're doing this for a reason, and even though I don't fully understand that reason, maybe I should do it too." At the time, I think I justified it by reasoning that someone who wanted to rip the stream would have to piece the track together rather than just request the full file.

In the end, it really doesn't matter: anyone who is determined enough can get whatever is being sent to their client. It's one of those moments where I didn't really think it through or factor in practicality. I'd like to go back and revisit byte-range requests, though, just to see how it would work differently.

MPEG-DASH and HLS both use segments. I think one reason is to allow for dynamic quality switching: if the bandwidth drops, the player can simply start pulling segments encoded at a lower bitrate. This is harder to do with byte-range, since bytes don't map to time cleanly.
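A toy Java sketch of the point about quality switching: when every variant is cut into segments of the same duration, the segment to fetch for a given playback position is the same index in every variant, so switching quality is just a matter of fetching the next index from a different playlist. The 6-second segment length, the 0.8 safety margin, and the variant names are assumptions, and real HLS/DASH players use more elaborate throughput and buffer heuristics than this.

```java
import java.util.List;

// Hypothetical sketch of why fixed-duration segments make bitrate switching simple.
public class SegmentSwitchSketch {

    // One encoding of the same track; all variants share the same segment duration.
    record Variant(String name, int bitrateKbps) {}

    static final double SEGMENT_SECONDS = 6.0; // assumed segment length

    // Segment index for a playback position: identical in every variant.
    static int segmentIndexFor(double positionSeconds) {
        return (int) Math.floor(positionSeconds / SEGMENT_SECONDS);
    }

    // Pick the highest bitrate within 80% of measured bandwidth (assumed margin);
    // fall back to the lowest variant. List must be sorted by ascending bitrate.
    static Variant choose(List<Variant> ascending, double measuredKbps) {
        Variant pick = ascending.get(0);
        for (Variant v : ascending) {
            if (v.bitrateKbps() <= measuredKbps * 0.8) {
                pick = v;
            }
        }
        return pick;
    }

    public static void main(String[] args) {
        List<Variant> variants = List.of(
                new Variant("low", 64), new Variant("mid", 128), new Variant("high", 320));
        int next = segmentIndexFor(61.5);
        Variant v = choose(variants, 200);
        System.out.println("fetch segment " + next + " from " + v.name());
    }
}
```

With byte ranges, the equivalent step would need a time-to-byte index for each encoding, since the same timestamp sits at a different offset in every bitrate (and not at a linear offset at all with variable-bitrate encoding).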

Speaking as someone else who has built streaming things: HTTP range requests aren't always enough.

Quite a number of older Android clients (TVs, cheap phones, etc.) don't have an up-to-date browser, and they will make the first range request correctly (with a byte range showing that the browser knows the full Content-Length, etc.)...

But you'll have to build your own scheduling system to pull in further requests, which is painful, clumsy and doesn't always work.
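To give a sense of what such a scheduler involves, here is a minimal, hypothetical Java sketch of its core decision: given the file size, how far the download has got, and where the play head is, compute the next `Range` header values while staying within a fixed read-ahead budget. The chunk size, the read-ahead limit, and all names are assumptions; a real client would also have to deal with seeking, retries, and servers that answer `200` with the whole file instead of `206`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of client-side range scheduling: keep requesting fixed-size
// chunks only while the buffered data is below a read-ahead budget.
public class RangeScheduleSketch {

    // downloadedUpTo: first byte not yet downloaded; playedUpTo: byte offset of the play head.
    static List<String> nextRanges(long totalLength, long downloadedUpTo, long playedUpTo,
                                   long chunkBytes, long readAheadBytes) {
        List<String> ranges = new ArrayList<>();
        long cursor = downloadedUpTo;
        while (cursor < totalLength && cursor - playedUpTo < readAheadBytes) {
            long end = Math.min(cursor + chunkBytes, totalLength) - 1; // inclusive end
            ranges.add("bytes=" + cursor + "-" + end);
            cursor = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 1 MiB file, 256 KiB already downloaded, nothing played yet,
        // 128 KiB chunks, at most 512 KiB buffered ahead of the play head.
        nextRanges(1_048_576, 262_144, 0, 131_072, 524_288).forEach(System.out::println);
    }
}
```

The example call yields `bytes=262144-393215` and `bytes=393216-524287`, after which the budget is full and no more requests are issued until playback advances.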