HN Mirror

by reaperman 1103 days ago

> Extra: ChatGPT Gave a Wrong Regex
>
> I consulted ChatGPT for a regex to extract domains from URLs, and it gave a flawed one:

  ^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)

> It even gave reasonably detailed explanations, which convinced me. Later tests revealed that this regex doesn’t work for URLs with an @ in the path, such as https://foo.com/@./bar. The correct one should be:

  ^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/?\n]+)
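A minimal Python sketch that runs both patterns on the URL from the quote, assuming Python's `re` module and treating the trailing period after each pattern as sentence punctuation. The flawed pattern's `[^@\n]+@` group is meant to skip optional userinfo ("user@"), but since it can also match `/`, it runs past the host into the path and takes the `@` there as the userinfo separator. Adding `/` to the excluded characters keeps that group inside the authority. `domain_with` is a hypothetical helper name.

```python
import re

# The two patterns from the quote (sentence-ending period dropped).
FLAWED = re.compile(r"^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)")
FIXED = re.compile(r"^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/?\n]+)")


def domain_with(pattern, url):
    """Return the first capture group (the domain), or None if no match."""
    m = pattern.match(url)
    return m.group(1) if m else None


url = "https://foo.com/@./bar"
print(domain_with(FLAWED, url))  # .
print(domain_with(FIXED, url))   # foo.com

# Real userinfo still works with the fixed pattern.
print(domain_with(FIXED, "https://user@www.foo.com/path"))  # foo.com
```

With the flawed pattern, the userinfo group swallows "foo.com/@", leaving only "." for the capture group.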

---------------------

The trick is to ask ChatGPT what the right tool for the job is in your language of choice. For Python, ChatGPT will happily give you:

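  from urllib.parse import urlparse
  def extract_domain(url):
      # .hostname drops userinfo ("user@") and port (":8080") and lowercases
      host = urlparse(url).hostname or ''
      # strip only a leading "www.", not the first "www." found anywhere
      return host.removeprefix('www.')  # Python 3.9+
  # Example usage
  url = 'https://foo.com/@./bar'
  domain = extract_domain(url)
  print(domain)  # Output: foo.com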
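A quick sketch of what `urlparse` actually reports, using only the standard library (the URLs are made-up examples). `netloc` keeps any userinfo and port, while `hostname` strips them and lowercases the host, which is why it is the safer attribute for "extract the domain". One caveat: a URL with no "scheme://" (or leading "//") has an empty `netloc`, whereas the regexes above treat the scheme as optional.

```python
from urllib.parse import urlparse

parts = urlparse("https://user:pw@WWW.Foo.com:8080/@./bar")
print(parts.netloc)    # user:pw@WWW.Foo.com:8080
print(parts.hostname)  # www.foo.com
print(parts.port)      # 8080
print(parts.path)      # /@./bar

# Without "scheme://" (or a leading "//"), urlparse finds no network location.
bare = urlparse("foo.com/bar")
print(repr(bare.netloc))  # ''
print(bare.path)          # foo.com/bar
```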

-------------

I don't think regex is typically the "most" correct tool for things that likely have built-in parser libraries (XML, HTML, URLs, JSON, etc.).
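The same applies to the other formats in that list. A small sketch with JSON (the payload is a made-up example): a regex that grabs a quoted string value stops at the first `"` it sees, even an escaped one, while `json.loads` applies the format's escaping rules.

```python
import json
import re

# Made-up payload whose string value contains escaped quotes.
payload = r'{"name": "say \"hi\"", "id": 1}'

# Naive regex: [^"]* stops at the first double quote, escaped or not.
naive = re.search(r'"name":\s*"([^"]*)"', payload).group(1)

# A real JSON parser understands backslash escapes.
parsed = json.loads(payload)["name"]

print(naive)   # say \  (cut off at the escaped quote)
print(parsed)  # say "hi"
```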