Yes. The basic structure of a CMOS sensor looks a lot like a DRAM array, but instead of a capacitor each cell has a photodiode, and the "bit lines" (column lines) go into a programmable gain amplifier and then into a 12- or 14-bit ADC, followed by some DSP [1] and a parallel-to-serial interface. Exposure is tied to the readout, which happens row by row; that's what gives you rolling shutter.
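To make the column readout chain a bit more concrete, here's a toy model in Python: rows are selected one at a time, every column of that row is multiplied by the PGA gain and then quantized by an ideal n-bit ADC. The function names, the 1.0 V full-scale input and the ideal (noise-free, perfectly linear) ADC are all my own simplifying assumptions, not how any particular sensor is specified.

```python
def adc_quantize(voltage, bits, full_scale=1.0):
    """Ideal n-bit ADC (assumed): map 0..full_scale volts to codes 0..2**bits - 1."""
    max_code = (1 << bits) - 1
    code = round(voltage / full_scale * max_code)
    # Out-of-range input clips, like a saturated pixel.
    return max(0, min(max_code, code))


def read_frame(pixel_voltages, gain, bits=12):
    """Read the array row by row: select a row, amplify each column line, digitize it."""
    frame = []
    for row in pixel_voltages:  # only one row is connected to the column lines at a time
        # All columns of the selected row go through PGA gain, then the ADC.
        frame.append([adc_quantize(v * gain, bits) for v in row])
    return frame
```

For example, `read_frame([[0.0, 0.5]], gain=2.0)` yields `[[0, 4095]]`: the second pixel is amplified right up to full scale of the 12-bit range.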
Some sensors have a global shutter instead, which makes them even more like DRAM: conceptually, each pixel has a photodiode, a capacitor, and a transistor connecting the two. The gates of all these transistors are wired in parallel and driven by a single "global shutter signal".
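The timing difference between the two can be sketched as per-row exposure windows. With a rolling shutter each row is reset and read one line time after the previous one, so its window shifts; with a global shutter every row's charge is moved to its capacitor at the same instant, and the later row-by-row readout of the capacitors doesn't affect when the light was collected. The function name and the idea of a fixed "line time" are simplifications I'm assuming for illustration.

```python
def exposure_windows(num_rows, exposure, line_time, global_shutter):
    """Return (start, end) of each row's exposure, in the same time unit as the inputs.

    Rolling shutter: row i starts i * line_time after row 0.
    Global shutter: all rows share one window; readout happens afterwards.
    """
    windows = []
    for row in range(num_rows):
        offset = 0 if global_shutter else row * line_time
        windows.append((offset, offset + exposure))
    return windows
```

With made-up numbers like 3000 rows and a 10 µs line time, the last row of a rolling-shutter frame starts about 30 ms after the first, which is where the skew on fast-moving subjects comes from.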
Some sensors use faster ADCs, each shared among several columns; this has been claimed as the source of column banding in some sensors (unclear whether that's correct).
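The claimed mechanism is easy to illustrate: if each shared ADC has a slightly different offset, a perfectly even scene comes out with a repeating per-column pattern. This sketch assumes an interleaved assignment (column c goes to ADC c % n) and a fixed offset per ADC; both are hypothetical, since real column-to-ADC mappings aren't public, and it only shows the mechanism, not that it's actually the cause.

```python
def digitize_flat_field(rows, cols, signal, adc_offsets):
    """Digitize a uniformly lit frame; column c is converted by ADC c % len(adc_offsets).

    Each ADC adds its own fixed offset (in digital numbers), standing in for mismatch.
    """
    n_adcs = len(adc_offsets)
    return [[signal + adc_offsets[c % n_adcs] for c in range(cols)] for _ in range(rows)]


def column_means(frame):
    """Average each column over all rows; banding shows up as a periodic pattern here."""
    rows = len(frame)
    return [sum(row[c] for row in frame) / rows for c in range(len(frame[0]))]
```

For instance, `column_means(digitize_flat_field(4, 6, 1000, [0, 3]))` alternates between 1000 and 1003: every other column is slightly brighter, i.e. vertical bands.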
[1] Sensors intended for both photo and video typically support things like pixel binning, where the sensor itself averages e.g. 2x2 blocks of pixels internally. A lower-quality alternative is line skipping, where the sensor is told to read out only every nth line, which reduces resolution considerably. The higher-quality alternative is "full-sensor readout", i.e. the camera reads all pixels and downsamples the image to the video resolution. I believe some sensors (possibly only announced ones so far) can now do this on the sensor itself.
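A rough sketch of binning versus line skipping on a monochrome frame (a list of rows). This ignores the Bayer color filter; real sensors would combine same-color pixels rather than direct neighbors, so treat it as a simplification.

```python
def bin_2x2(frame):
    """Average each 2x2 block into one output pixel (monochrome; ignores the Bayer pattern)."""
    # Drop an odd trailing row/column so every block is complete.
    h = len(frame) // 2 * 2
    w = len(frame[0]) // 2 * 2
    return [
        [(frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]) / 4
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def skip_lines(frame, n):
    """Keep only every nth row; the light that hit the skipped rows is simply discarded."""
    return frame[::n]
```

The difference in quality follows from this: binning averages four samples per output pixel, which reduces noise, while skipping just throws rows away without any filtering, so fine detail can alias.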
Note: A lot of this is "somewhat informed speculation" on my part, because image sensor manufacturers tend to be very secretive about their sensors' details.