by gregsadetsky 1728 days ago
I was just struggling with this -- specifically, our users' "UX" expectation that entering "example.com" should work when asked for their website URL.

Most URL validation rules/regexes/libraries/etc. reject "example.com". However, if you head over to Stripe (for example), in the account settings, when asked for your company's URL, Stripe will accept "example.com" and assume "http://" as the prefix (which, yes, can have its own problems).

What's a good solution? I want to validate URLs, but I also want to let users enter "example.com". But if I simply do

    if(validateURL(url)) {
      return true;
    } else if(validateURL("http://" + url)) {
      return true;
    } else {
      return false;
    }
i.e. validate the given URL, and as a fallback, try to validate "http://" + the given URL, that opens the door to weird, non-URL strings being incorrectly validated...

Help :-)

4 comments

Parse, don’t validate. If you need a heuristic that accepts non-URL strings as if they were valid URLs, you should convert those non-URL strings to valid URLs so the rest of your code can just deal with valid URLs.

    if (validateURL(url)) {
      return url;
    } else if (validateURL("http://" + url)) {
      return "http://" + url;
    } else {
      return null;
    }
I know we're not golfing, but it pains me to see that repetition in the middle. Mightn't we write

    if (!validateURL(url)) {
        url = "http://" + url;
        if (!validateURL(url)) {
            url = null;
        }
    }
    return url;
to snip a small probability of a bug?
I find that branchiness (and mutation of the variable) harder to follow. Personally, I’d just take “parse, don’t validate” to its logical conclusion and go for:

    const parseUrl = url => validateURL(url) ? url : null;
    return parseUrl(url) || parseUrl('http://'+url) || null;
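
For what it's worth, here is a minimal sketch of the same idea built on the standard `URL` constructor (browsers and Node) rather than a separate `validateURL`. The helper name `parseWebsiteUrl` and its rules (http/https only, hostname must contain a dot, no embedded credentials) are assumptions for illustration; they would reject things like "localhost", so adjust them to your case.

```javascript
// Hypothetical helper: returns a normalized URL string, or null if the input
// cannot reasonably be read as a website address.
function parseWebsiteUrl(input) {
  const trimmed = input.trim();

  // Only prepend a scheme when the user did not type one.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const candidate = hasScheme ? trimmed : "http://" + trimmed;

  let parsed;
  try {
    parsed = new URL(candidate); // throws a TypeError on unparseable input
  } catch {
    return null;
  }

  // Assumed policy: websites are http(s) only.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  // Assumed policy: require a dot in the hostname, so "hello" is rejected.
  if (!parsed.hostname.includes(".")) return null;
  // Reject embedded credentials; "mailto:a@b.com" would otherwise be read
  // as user "mailto" at host "b.com" once "http://" is prepended.
  if (parsed.username || parsed.password) return null;

  return parsed.href;
}
```

With these rules, `parseWebsiteUrl("example.com")` returns `"http://example.com/"`, while `"hello"` and `"not a url"` both return `null`.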
Address validators for online checkout are notoriously inaccurate, though they still help a lot. You just have to prompt the user, "Did you mean 123 Example St?"

I'd probably do the same for poorly formatted URLs. When the user hits Submit, a prompt appears saying, "Did you mean `https://example.com`?"
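
A rough sketch of how that prompt could be driven, assuming the standard `URL` constructor is available. The function name `suggestWebsiteUrl`, the status values, and the "hostname must contain a dot" rule are all made up for illustration:

```javascript
// Hypothetical helper: accept the input as typed, offer a "Did you mean ...?"
// suggestion, or reject it outright.
function suggestWebsiteUrl(input) {
  const trimmed = input.trim();
  const hasScheme = /^https?:\/\//i.test(trimmed);

  let parsed;
  try {
    // Assumption: suggest https when the user typed no scheme.
    parsed = new URL(hasScheme ? trimmed : "https://" + trimmed);
  } catch {
    return { status: "invalid" };
  }

  // Assumed policy: a website hostname contains at least one dot.
  if (!parsed.hostname.includes(".")) return { status: "invalid" };

  return hasScheme
    ? { status: "ok", url: parsed.href }
    : { status: "confirm", suggestion: parsed.href };
}
```

The form handler would then show "Did you mean `https://example.com/`?" only for the `"confirm"` status and store the suggestion once the user agrees.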

I would suggest biasing your implementation against false negatives. They can always come back and update it if it's wrong, and their URL could just as easily be "valid" but incorrect, e.g. any typo in a domain name.

If it's really important, you could try making a request to the URL and see if it loads, but that still doesn't validate that it's the URL they intended to input.

Might be cool to load the URL with Puppeteer and capture a screenshot of the page. If they can't recognize their own website, it's on them.

This could potentially be abused, but you could actually try to resolve the DNS to determine if it's valid (could be weird for some cases like localhost or IP addresses). Or just do a "curl https://whatever.com" and see what happens (assuming that all of the websites are running a webserver, although idk if that is true in your situation)
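
A minimal sketch of the "just request it and see what happens" check, assuming a runtime with a global `fetch` and `AbortController` (browsers, Node 18+). The name `urlLoads`, the injectable `fetchImpl` parameter, and the 5 second timeout are illustrative choices:

```javascript
// Hypothetical helper: resolves to true if the URL answers with a 2xx status.
// `fetchImpl` is injectable so the check can be stubbed out in tests.
async function urlLoads(url, fetchImpl = globalThis.fetch, timeoutMs = 5000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(url, {
      method: "HEAD", // assumption: the server answers HEAD requests
      signal: controller.signal,
    });
    return response.ok; // true for 2xx status codes
  } catch {
    return false; // DNS failure, refused connection, timeout, ...
  } finally {
    clearTimeout(timer);
  }
}
```

On the abuse point: if this runs on your server, user-supplied URLs can point at internal addresses, so you would probably want to treat a failure as a warning rather than a hard rejection and restrict which hosts the server is allowed to contact.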