| HN Mirror



	by mpeg 1770 days ago
	I once failed a technical interview, partly because on the coding test I was asked to write a URL parser "from scratch, the way a browser would do it". I explained that it would take far too long to account for every edge case in the URL RFC, but that I could do a quick and dirty approach for common URLs. After I did this, the interviewer stopped me and told me, disapprovingly, that he had expected me to use a regex, which kinda shows he had no idea how a web browser works.

5 comments

elif 1770 days ago

How would you even "parse" a URL with a regex? Dynamically defined named subpatterns for each URL parameter? I think the best I could do on paper with a regex is say "yup, this is a URL" or maybe "yup, I can count the number of params".

Unless it was a specific URL with specific params?


chrismorgan 1770 days ago

Match groups so you can split it up into scheme, username, password, host, port, path, query, fragment. Not difficult to approximate, though for best results with diverse schemes you’d want an engine that allows repeated named groups, and I don’t know if any do (JavaScript and Python don’t).
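
A rough sketch of that approximation in Python, for illustration only. The names `URL_RE` and `split_url` are made up here, and the pattern is a loose adaptation of the generic URI regex idea, not what any browser does and not a complete treatment of the URL RFC:

```python
import re

# Hypothetical, approximate pattern: one named group per URL component.
URL_RE = re.compile(
    r"""
    ^(?:(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*):)?        # scheme:
    (?://                                             # authority, if present
        (?:(?P<username>[^:@/?#]*)(?::(?P<password>[^@/?#]*))?@)?
        (?P<host>\[[^\]]*\]|[^:/?#]*)                 # [IPv6 literal] or name
        (?::(?P<port>\d*))?
    )?
    (?P<path>[^?#]*)
    (?:\?(?P<query>[^#]*))?
    (?:\#(?P<fragment>.*))?$
    """,
    re.VERBOSE,
)


def split_url(url):
    """Return a dict of components, or None if the pattern does not match."""
    m = URL_RE.match(url)
    return m.groupdict() if m else None


parts = split_url("https://user:pw@example.com:8080/a/b?x=1&y=2#frag")
```

Components that are absent come back as `None`, and the query is still one opaque string: splitting it into individual parameters is where a single fixed set of named groups runs out.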


powersnail 1770 days ago

Python's `regex` package does allow repeated named groups.


elif 1769 days ago

I mean ya that would match a query string, but it wouldn't parse it?
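
To make the match-versus-parse distinction concrete, here is a small sketch in Python. The sample query string and variable names are invented for illustration; `parse_qsl` is the standard library's query parser, used here only as a stand-in for "actually parsing":

```python
import re
from urllib.parse import parse_qsl

query = "a=b&c=d&name=J%C3%BCrgen+M"  # made-up sample input

# Matching: a yes/no answer about the whole string, with no structure.
looks_like_query = (
    re.fullmatch(r"[^&=#]*=[^&#]*(?:&[^&=#]*=[^&#]*)*", query) is not None
)

# Parsing: split into (key, value) pairs and decode %-escapes and '+'.
pairs = parse_qsl(query)
```

The regex can say the string has the right shape, but the list of decoded pairs only comes out of the second step.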


tyingq 1770 days ago

I assume they meant "some regex implementation, including replace and/or match groups".

Like, for just the params part (yes, broken and simplistic):

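  #!/usr/bin/perl
  use strict;
  use warnings;

  # Repeatedly strip one leading key=value pair off the query string.
  my $query = "a=b&c=d&e=f&whatever=some thing";
  while ($query =~ s/^([^&=]*)=([^&]*)(?:&|$)//) {
    print "[$1] [$2]\n";
  }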


specialist 1770 days ago

Ya. I've also suffered copypasta trials administered by bar raisers, mensa members, and other self appointed keepers of the sacred nerd flame.

My imagined remedies are no 1:1 interviews and recording these sessions for "possible quality assurance and training purposes".


axiosgunnar 1770 days ago

How do browsers parse URLs then?


djur 1770 days ago

There's actually a standard for it these days.

https://url.spec.whatwg.org/#url-parsing


wolfgang42 1770 days ago

Here’s a polyfill for the JS URL() interface which should give you a taste: https://github.com/zloirock/core-js/blob/272ac1b4515c5cfbf34... (I tried finding the one in Firefox but I couldn’t actually work out where it started, this one is much easier to follow)

TLDR: it’s a traditional parser—a big state machine that steps through the URL character by character and tokenizes it into the relevant pieces.
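
For a feel of the shape, here is a toy state machine in Python. It is a drastically simplified sketch, not the WHATWG algorithm and not the linked polyfill: the function name `parse_url` and the state names are made up, it only accepts `scheme://...` URLs, and it ignores userinfo, IPv6 literals, percent-encoding and every error-recovery rule a real parser has.

```python
# Hypothetical toy parser: one pass, one character at a time.
DELIMS = {"/": "path", "?": "query", "#": "fragment"}


def parse_url(url):
    parts = {"scheme": "", "host": "", "port": "",
             "path": "", "query": "", "fragment": ""}
    state = "scheme"
    i = 0
    while i < len(url):
        c = url[i]
        if state == "scheme":
            if c == ":":
                if url[i + 1:i + 3] != "//":
                    raise ValueError("sketch only handles scheme://...")
                i += 2  # skip the two slashes
                state = "host"
            else:
                parts["scheme"] += c.lower()
        elif state in ("host", "port"):
            if c in DELIMS:
                state = DELIMS[c]
                if c == "/":
                    parts["path"] = "/"  # the slash belongs to the path
            elif state == "host" and c == ":":
                state = "port"
            elif state == "host":
                parts["host"] += c.lower()
            elif c.isdigit():
                parts["port"] += c
            else:
                raise ValueError("non-digit in port")
        elif state == "path":
            if c == "?":
                state = "query"
            elif c == "#":
                state = "fragment"
            else:
                parts["path"] += c
        elif state == "query":
            if c == "#":
                state = "fragment"
            else:
                parts["query"] += c
        else:  # fragment: everything to the end of the string
            parts["fragment"] += c
        i += 1
    if state == "scheme":
        raise ValueError("no scheme found")
    return parts


result = parse_url("HTTP://Example.com:8080/a/b?x=1#top")
```

Each state decides what the current character means and which state comes next, which is why special cases can be added one branch at a time instead of growing a single pattern.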


jhgb 1770 days ago

Did you point out that his two requirements were contradictory?


mercora 1770 days ago

It's not very likely that this is what's happening here, but I feel like this could be done on purpose to see how you act in this kind of situation. It kinda tells them how you would act once you inevitably get into a conflict with colleagues arguing over stuff like that.


tapland 1770 days ago

Or it could be one of those outsourced interviews.


codetrotter 1770 days ago

In that case I think the proper response should be: “I am very sure that browsers don’t do it that way. But let’s have a look.” And then pull up the source code for Chromium and Firefox. Assuming it’s not whiteboard only.

And if they still insist even after the source of Chromium and Firefox has been consulted, well, then it’s time to leave. I wouldn’t want to work with anyone like that.


axiosgunnar 1770 days ago

How do browsers parse URLs then?


Sephr 1770 days ago

See https://chromium.googlesource.com/chromium/src/+/HEAD/url/#c...
