Hacker News
by Sephr 1722 days ago
> Assume that this regex will be used for a public URL shortener written in PHP, so URLs like http://localhost/, //foo.bar/, ://foo.bar/, data:text/plain;charset=utf-8,OHAI and tel:+1234567890 shouldn’t pass (even though they’re technically valid)

At Transcend, we need to allow site owners to regulate arbitrary network traffic, so our data flow input UI¹ was designed to detect all valid hosts (including local hosts, IDNs, IPv6 literal addresses, etc.) and URLs (host-relative, protocol-relative, and absolute). If the site owner inputs content that is not a valid host or URL, then we treat their input as a regex.
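A minimal TypeScript sketch of that fallback order, assuming the host check runs before the URL check. The names `DataFlowInput` and `classifyDataFlowInput` are hypothetical, and the two predicates are passed in as parameters so any validator (such as the utilities linked below) can be plugged in.

```typescript
// Hypothetical result type: the input is a host, a URL, or a regex.
type DataFlowInput =
  | { kind: "host"; value: string }
  | { kind: "url"; value: string }
  | { kind: "regex"; value: RegExp };

// Hypothetical classifier. The predicates stand in for real host/URL
// validators; the order (host first, then URL) is an assumption.
function classifyDataFlowInput(
  input: string,
  isValidHost: (s: string) => boolean,
  isValidURL: (s: string) => boolean,
): DataFlowInput {
  if (isValidHost(input)) return { kind: "host", value: input };
  if (isValidURL(input)) return { kind: "url", value: input };
  // Anything else is treated as a regular expression.
  // new RegExp throws a SyntaxError if the pattern is invalid.
  return { kind: "regex", value: new RegExp(input) };
}
```

Since `new RegExp` throws on an invalid pattern, a real input UI would probably need to catch that error and report it to the user.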

I came up with these simple utilities built on top of the URL interface standard² to detect all valid hosts & URLs:

• isValidHost: https://gist.github.com/eligrey/6549ad0a635fa07749238911b429...

Example valid inputs:

  host.example
  はじめよう.みんな (IDN domain; xn--p8j9a0d9c9a.xn--q9jyb4c)
  [::1] (IPv6 address)
  0xdeadbeef (IPv4 address; 222.173.190.239)
  123.456 (IPv4 address; 123.0.1.200)
  123456789 (IPv4 address; 7.91.205.21)
  localhost
• isValidURL (and isValidAbsoluteURL): https://gist.github.com/eligrey/443d51fab55864005ffb3873204b...

Example valid inputs to isValidURL:

  https://absolute-url.example
  //relative-protocol.example
  /relative-path-example
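For illustration, here is a rough TypeScript sketch of how such checks could sit on top of the WHATWG `URL` interface. It is not the code from the gists (whose links are truncated above); `toCanonicalHost`, the character pre-filter, the "no port" rule, the "relative URLs must start with /" rule, and the placeholder base `https://base.invalid/` are all assumptions made for this example.

```typescript
// Placeholder base used only to resolve relative inputs (assumption).
const PLACEHOLDER_BASE = "https://base.invalid/";

// Returns the parser-normalized host (IDNA/punycode, IPv4 forms), or null.
function toCanonicalHost(input: string): string | null {
  // Reject characters that would turn part of the input into a path,
  // query, fragment, or credentials, plus any whitespace.
  if (input === "" || /[\/\\?#@\s]/.test(input)) return null;
  // Assumption: no ports. A ":" outside an IPv6 "[...]" literal is rejected.
  if (input.replace(/\[[^\]]*\]/g, "").includes(":")) return null;
  try {
    return new URL(`http://${input}`).hostname;
  } catch {
    return null;
  }
}

function isValidHost(input: string): boolean {
  return toCanonicalHost(input) !== null;
}

// Absolute URL: parses without a base, i.e. it has a scheme.
function isValidAbsoluteURL(input: string): boolean {
  try {
    new URL(input);
    return true;
  } catch {
    return false;
  }
}

// Absolute, protocol-relative ("//host"), or host-relative ("/path").
function isValidURL(input: string): boolean {
  if (/\s/.test(input)) return false;
  if (isValidAbsoluteURL(input)) return true;
  if (!input.startsWith("/")) return false;
  try {
    new URL(input, PLACEHOLDER_BASE);
    return true;
  } catch {
    return false;
  }
}
```

For example, `toCanonicalHost("0xdeadbeef")` returns `"222.173.190.239"` and `toCanonicalHost("123.456")` returns `"123.0.1.200"`, matching the normalizations listed above, because the URL parser applies its IPv4 rules to the host. Note that `isValidAbsoluteURL` also accepts inputs such as `tel:+1234567890`, since any scheme counts.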

1. https://docs.transcend.io/docs/configuring-data-flows

2. https://developer.mozilla.org/en-US/docs/Web/API/URL

1 comment

While not terribly important (or outright not required), this fails (treats URLs as regex) for link-local addresses with a device identifier (zone ID) applied, like "[fe80::8caa:8cff:fe80:ff32%eth0]". That would need to be fixed in the standard, though, if it's desired :)
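A small TypeScript check illustrating this, assuming a WHATWG-conformant `URL` implementation (e.g. Node.js or a modern browser): the IPv6 host parser has no zone-ID syntax, so the `%eth0` suffix makes parsing fail. `parsesAsHost` is a hypothetical helper.

```typescript
// Hypothetical helper: does the input parse as the host of an http URL?
function parsesAsHost(input: string): boolean {
  try {
    new URL(`http://${input}/`);
    return true;
  } catch {
    // The WHATWG IPv6 parser fails on "%", so zone IDs end up here.
    return false;
  }
}

// parsesAsHost("[fe80::8caa:8cff:fe80:ff32]")      -> true
// parsesAsHost("[fe80::8caa:8cff:fe80:ff32%eth0]") -> false
```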

I've found some reasoning[0] as to why it's not supported, though it's written with browsers in mind.

[0] https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2