It can scale smoothly with the probability that a post is undesirable. So if the algorithm thinks a post is 50% likely to be undesirable, simply count it at half weight. Or tune this function to be whatever you want. Twitter and the like already make arbitrary choices about what gets shown.
For every mean-spirited hate post that gets promoted, another tweet about knitting is not promoted. Why is censorship only bad if the content is hateful?
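Here is a rough sketch of the half-weight idea, just to make it concrete. Everything in it is made up for illustration: the `engagement` and `p_undesirable` fields, the `weight` and `rank` functions, and the numbers. It is not how Twitter or any real platform ranks posts. The weight here is simply `1 - p`, so a post judged 50% likely to be undesirable counts at half weight, and you could swap in any other curve.

```python
def weight(p_undesirable: float) -> float:
    """Hypothetical linear weight: 1.0 for a surely fine post, 0.0 for a surely undesirable one."""
    if not 0.0 <= p_undesirable <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return 1.0 - p_undesirable


def rank(posts, slots):
    """Return the titles of the top `slots` posts, ordered by engagement times weight."""
    scored = sorted(
        posts,
        key=lambda p: p["engagement"] * weight(p["p_undesirable"]),
        reverse=True,
    )
    return [p["title"] for p in scored[:slots]]


# Invented example data: only two promotion slots for three posts.
posts = [
    {"title": "hate post", "engagement": 100.0, "p_undesirable": 0.9},
    {"title": "knitting", "engagement": 40.0, "p_undesirable": 0.0},
    {"title": "borderline rant", "engagement": 60.0, "p_undesirable": 0.5},
]

top = rank(posts, slots=2)
print(top)
```

With only two slots, ranking on raw engagement would promote the hate post and leave the knitting post out. With the weighting, the knitting post takes the first slot. Either way something does not get shown, which is the point about limited slots.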