This is a normal camera, a normal radio that transmits its images, and a large radio telescope (over half a century ago the largest in the world, now operated by amateurs; https://en.m.wikipedia.org/wiki/Dwingeloo_Radio_Observatory) used as an antenna to receive that broadcast.
I expect the data to be transferred one pixel at a time, but possibly, it’s losslessly compressed.
It’s “Dutch radio amateurs image” because they controlled the camera.
It depends on the type of observations being made. If it is done with a single dish with a single antenna feed, then to make images you just have to know when and where you are pointing while you record the power.
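As a rough illustration of the single-dish case, here is a minimal sketch that averages power readings into sky pixels. The sample format (x, y, power), the pixel size, and the function name grid_single_dish are all assumptions made up for this example, not anything a real telescope pipeline is known to use.

```python
from collections import defaultdict


def grid_single_dish(samples, pixel_size):
    """Average power samples into sky pixels.

    samples: iterable of (x_deg, y_deg, power) tuples, where x/y are the
             pointing coordinates at the moment the power was recorded
             (hypothetical format).
    pixel_size: pixel width in degrees.
    Returns a dict mapping (col, row) to the mean power in that pixel.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, power in samples:
        key = (int(x // pixel_size), int(y // pixel_size))
        sums[key] += power
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Scanning the dish back and forth across a patch of sky and feeding the readings through something like this gives one brightness value per pixel.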
But most radio astronomy images you see are made with interferometers, which use multiple pairs of antennas. As a point source moves across the sky, each pair sees an interference pattern that modulates the voltage at the receiver, following a cosine pattern whose period depends on the distance between the two antennas in the pair.
Each pair of antennas contributes a 2D fringe pattern. The interference makes the receivers record what is essentially the Fourier transform of the sky. So one pair might contribute a sine wave of a particular wavelength and angle, and another pair one at another angle. By combining all these 2D patterns of waves, the sum ends up being an "image" of the sky (convolved with the primary and sidelobe beams).
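To make the "sum of 2D waves" idea concrete, here is a toy pure-Python sketch. It assumes a tiny square pixel grid, integer baseline coordinates (u, v) in grid units, and a noise-free sky; visibility and dirty_image are names invented for this example, and real imaging software grids and weights the data far more carefully.

```python
import cmath
import math


def visibility(sky, u, v):
    """One Fourier component of the sky, as measured by one baseline."""
    n = len(sky)
    return sum(
        sky[y][x] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
        for y in range(n)
        for x in range(n)
    )


def dirty_image(sky, baselines):
    """Sum the 2D fringe of every sampled baseline into an image."""
    n = len(sky)
    vis = {(u, v): visibility(sky, u, v) for u, v in baselines}
    image = [[0.0] * n for _ in range(n)]
    for (u, v), value in vis.items():
        for y in range(n):
            for x in range(n):
                fringe = cmath.exp(2j * math.pi * (u * x + v * y) / n)
                # The real part also accounts for the mirrored baseline
                # (-u, -v), which carries the complex conjugate.
                image[y][x] += (value * fringe).real / len(vis)
    return image
```

With only a handful of baselines, a single point source comes back as a peak at the right pixel surrounded by ripples, which is the beam pattern the real sky ends up convolved with.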
That gets you a "DIRTY" image, and it really is dirty. The beam of an interferometer is a complex thing. So after that, various cleaning algorithms may be applied, using prior knowledge to pull just the actual signal out of the noisy, gunked up "image".
So in the end each pixel is really the result of a somewhat arbitrary algorithm picking out the bright bits after the Fourier components of the sky measured by each baseline pair have been combined.
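One classic example of such a cleaning algorithm is Högbom's CLEAN: repeatedly find the brightest pixel, assume it is a point source, and subtract a scaled copy of the beam centered there. The sketch below is a simplified version under stated assumptions: the beam is given on the same grid with its peak at beam[0][0], it wraps around at the edges, and the gain, iteration limit, and threshold are arbitrary illustrative values. The function name hogbom_clean is made up here, and this is not necessarily what any particular observatory runs.

```python
def hogbom_clean(dirty, beam, gain=0.1, max_iter=500, threshold=1e-3):
    """Simplified Hogbom CLEAN on a square grid with wrap-around.

    dirty: dirty image as a list of rows.
    beam:  dirty beam on the same grid, peak (value 1.0) at beam[0][0].
    Returns (components, residual), where components maps (row, col)
    to the flux attributed to a point source at that pixel.
    """
    n = len(dirty)
    residual = [row[:] for row in dirty]
    components = {}
    for _ in range(max_iter):
        # Locate the strongest remaining pixel.
        peak, py, px = max(
            (abs(residual[y][x]), y, x) for y in range(n) for x in range(n)
        )
        if peak < threshold:
            break
        # Attribute a fraction of it to a point source at that position...
        flux = gain * residual[py][px]
        components[(py, px)] = components.get((py, px), 0.0) + flux
        # ...and subtract the beam pattern that source would have produced.
        for y in range(n):
            for x in range(n):
                residual[y][x] -= flux * beam[(y - py) % n][(x - px) % n]
    return components, residual
```

The "prior knowledge" baked in here is that the sky is mostly empty with a few compact sources, which is why the result depends on the algorithm chosen.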
The radio part isn't directly linked to the imaging part.
The radio link is effectively just a modem (short for 'modulator-demodulator') that transforms bits into some modulation of a radio signal. A simple and error-prone modulation would be to just shift the frequency back and forth between two values, such that the lower frequency represents a binary zero and the higher a binary one. The receiver listens to the RF coming from the transmitter and, if it applies the same scheme in reverse to demodulate the signal, will start kicking out bits that turn to bytes that turn to pixels or telemetry or navigational data or whatever the satellite wishes to send. So in this case, it could be something approximating a standard digital camera: a computer snaps a photo, maybe stores it on a filesystem, queues it up to send, and when the time is right starts blasting bits down the RF pipe back to Earth.
In my very limited experience with this, the satellite will typically send a simple raw lossless bitmap or similar encoding, so that any data loss damages only individual pixels.
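The two-frequency scheme described above can be sketched in a few lines of pure Python. This works at audio frequencies with no noise, and the sample rate, bit rate, and tone frequencies are arbitrary assumptions picked so the two tones are easy to tell apart; none of these values come from the actual satellite.

```python
import math

SAMPLE_RATE = 8000     # samples per second (assumed)
SAMPLES_PER_BIT = 80   # i.e. 100 bits per second (assumed)
FREQ_ZERO = 1000.0     # tone for a binary 0, in Hz (assumed)
FREQ_ONE = 2000.0      # tone for a binary 1, in Hz (assumed)


def bytes_to_bits(data):
    """Unpack bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack a list of bits (MSB first) back into bytes."""
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2)
        for i in range(0, len(bits) - 7, 8)
    )


def modulate(bits):
    """Turn bits into a waveform by switching between two tones."""
    samples = []
    phase = 0.0
    for bit in bits:
        freq = FREQ_ONE if bit else FREQ_ZERO
        for _ in range(SAMPLES_PER_BIT):
            samples.append(math.sin(phase))
            phase += 2 * math.pi * freq / SAMPLE_RATE
    return samples


def tone_energy(chunk, freq):
    """Energy of one tone in a chunk, independent of its phase."""
    i = sum(s * math.cos(2 * math.pi * freq * k / SAMPLE_RATE)
            for k, s in enumerate(chunk))
    q = sum(s * math.sin(2 * math.pi * freq * k / SAMPLE_RATE)
            for k, s in enumerate(chunk))
    return i * i + q * q


def demodulate(samples):
    """Recover bits by asking which tone is stronger in each bit period."""
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        chunk = samples[start:start + SAMPLES_PER_BIT]
        one = tone_energy(chunk, FREQ_ONE)
        zero = tone_energy(chunk, FREQ_ZERO)
        bits.append(1 if one > zero else 0)
    return bits
```

Running bits_to_bytes(demodulate(modulate(bytes_to_bits(payload)))) on a short payload should hand back the original bytes; a real link would add synchronization and error correction on top.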
I did find a short blurb on the Longjiang-2 modulation types:
"While receiving signals from satellites in low Earth orbit requires only relatively simple antennas, doing so for satellites in orbit around the Moon (a thousand times more distant), is much harder. To this end Longjiang-1 and 2 transmit signals in two low data-rate, error-resistant, modes; one using digital modulation (GMSK) at 250 bits per second, while the other mode (JT4G) switches between four closely spaced frequencies to send 4.375 symbols per second. This latter mode was developed by Nobel-prize winning astrophysicist Joe Taylor and is designed for radio amateurs to relay messages at very low signal strengths, typically when bouncing them off the surface of the Moon."
GMSK stands for Gaussian minimum shift keying [1] and is also used for GSM mobile phone data transmission. It's a fairly sophisticated form of frequency shift keying that keeps the phase continuous and smooths the transitions between the shifted frequencies with a Gaussian filter.
JT4G is a simple four-tone frequency shift keying modulation. Here's [2] a recording of it from Longjiang-2; turn up the volume to hear the downconverted audio.
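For a sense of scale, here is a back-of-the-envelope estimate of how long an uncompressed image would take at the quoted 250 bits per second GMSK rate. The image dimensions and bit depth are purely hypothetical (the thread doesn't give the real ones), and protocol overhead and error correction are ignored.

```python
def transfer_seconds(width, height, bits_per_pixel, bit_rate):
    """Time to send an uncompressed image, ignoring all overhead."""
    return width * height * bits_per_pixel / bit_rate


# Hypothetical 640x480 image at 8 bits per pixel over a 250 bit/s link.
seconds = transfer_seconds(640, 480, 8, 250)
hours = seconds / 3600
```

Under those assumptions the answer comes out to a bit under three hours for one frame, which gives a feel for why such low data rates matter.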