by reaperman
1103 days ago
> Extra: ChatGPT Gave a Wrong Regex
> I consulted ChatGPT for a regex to extract domains from URLs, and it gave a flawed one: ^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+). It even gave reasonably detailed explanations, which convinced me. Later tests revealed that this regex doesn't work for URLs with @ in the path, such as https://foo.com/@./bar. The correct one should be ^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/?\n]+).

---------------------

The trick is to ask ChatGPT what the right tool for the job is in your language of choice. For Python, ChatGPT will happily give you:

from urllib.parse import urlparse
extract_domain = lambda url: urlparse(url).netloc.replace('www.', '', 1)
# Example usage
url = 'https://foo.com/@./bar'
domain = extract_domain(url)
print(domain) # Output: foo.com
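One hedged refinement, not something ChatGPT produced: `netloc` also includes any `user@` prefix and `:port`, and `replace('www.', '', 1)` removes the first "www." found anywhere in it, so a host like awww.com would come back as "acom". A minimal sketch using `urlparse(...).hostname` and `str.removeprefix` (Python 3.9+), assuming every input URL includes a scheme such as "https://":

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the host of `url`, minus a single leading "www." label.

    Sketch only: assumes `url` has a scheme (e.g. "https://"); without one,
    urlparse puts everything into `path` and hostname is None.
    """
    # .hostname drops userinfo ("user@") and the port (":8080"), and lowercases.
    host = urlparse(url).hostname or ""
    # removeprefix strips "www." only at the very start of the host.
    return host.removeprefix("www.")


for url in [
    "https://foo.com/@./bar",
    "https://user@www.example.com:8080/path",
    "https://awww.com/",
]:
    print(extract_domain(url))
```

This prints foo.com, example.com and awww.com; the original lambda would return "user@example.com:8080" and "acom" for the last two.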
-------------

I don't think regex is typically the "most" correct tool for the job for things that likely have built-in parser libraries (XML, HTML, URLs, JSON, etc.).
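To make the quoted failure concrete: in the flawed pattern, `[^@\n]+` can run past the host, through the "/" and up to the "@" in the path, so the capture group only sees "."; excluding "/" from that group is what fixes it. A quick check with Python's `re` module, using the two patterns verbatim from the quote (the names FLAWED and FIXED are just for illustration):

```python
import re

# The regex ChatGPT originally suggested, as quoted above.
FLAWED = re.compile(r"^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)")
# The corrected version: the optional "user@" group may not contain "/".
FIXED = re.compile(r"^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/?\n]+)")

url = "https://foo.com/@./bar"
# The flawed group greedily swallows "foo.com/@", leaving only "." to capture.
print(FLAWED.match(url).group(1))
# The fixed group cannot cross "/", so it is skipped and the host is captured.
print(FIXED.match(url).group(1))
```

The first line prints "." and the second prints "foo.com", which is the kind of edge case a real URL parser handles without extra thought.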