The text to image models represent the data that is being fed to it. If you feed it 100 images and out of those 100 images 95 are of white men politicians and 5 are of politicians of color, that's what you are going to get back. These datasets are gathered automatically from various internet sources and represent the data that is available out in the wild. For stable diffusion to "fix" the bias in the data set would be simply impossible. That would require hiring an army of people whose sole purpose would be to "rebalance" the data set on some internal objective schema manually.
parroting and perpetuating the biases will ensure it will continue to be a culture issue.