by buro9 4596 days ago
I looked at that myself, and decided that it wasn't the path I should go down.

ParseFragment returns an error on bad input, but I just want the bad part stripped and processing to carry on. If a user has entered a mostly usable piece of HTML and got something wrong by mistake rather than through bad intent, then we should be permissive in how we handle it.

And then I wondered about the wisdom of building a potentially large security library on an API that isn't quite nailed down.

Ultimately, given that this is a security thing, I figured it's best to go with the proven, many-eyeballed solution that has had widespread acceptance.

Feel free to use the package we've provided; the bit of Go code you need for it is:

    import (
    	"os/exec"
    	"strings"
    )
    
    // SanitiseHTML pipes html through the Java sanitiser and returns
    // whatever the sanitiser writes to stdout.
    func SanitiseHTML(html string) (string, error) {
    	cleanse := exec.Command("java", "-jar", "/usr/sbin/cleanse.jar", "--permissive")
    
    	// Supply the input on stdin. Output starts the process, waits for it
    	// to exit and collects stdout, so no separate Start call is needed
    	// (calling Start after Output would fail: the command has already run).
    	cleanse.Stdin = strings.NewReader(html)
    
    	buff, err := cleanse.Output()
    	if err != nil {
    		return "", err
    	}
    
    	return string(buff), nil
    }
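
The same idea can be sketched a little more defensively. This is only an illustration, not part of the package: `runFilter` and `sanitiseTimeout` are hypothetical names, and the 10 second limit is an assumed value. It feeds the input on stdin, bounds how long the JVM may run, and includes the child's stderr in the returned error so a failing jar is easier to diagnose.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// sanitiseTimeout is an assumed upper bound on one sanitiser run.
const sanitiseTimeout = 10 * time.Second

// runFilter is a hypothetical helper: it writes input to the stdin of the
// named command and returns what the command prints on stdout.
func runFilter(input, name string, args ...string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), sanitiseTimeout)
	defer cancel()

	// CommandContext kills the process if the context expires first.
	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdin = strings.NewReader(input)

	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	// Run starts the command and waits for it to finish.
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("%s failed: %w: %s", name, err, strings.TrimSpace(stderr.String()))
	}
	return stdout.String(), nil
}

// SanitiseHTML runs the jar with the same path and flag as in the comment.
func SanitiseHTML(html string) (string, error) {
	return runFilter(html, "java", "-jar", "/usr/sbin/cleanse.jar", "--permissive")
}
```

Splitting out the helper also means the piping logic can be exercised with any stdin-to-stdout command (for example `cat`) on a machine that has neither Java nor the jar installed.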