by mpeg 1723 days ago
I was once failed in a technical interview, partly because of the coding test. I was asked to write a URL parser "from scratch, the way a browser would do it", and I explained that it would take way too long to account for every edge case in the URL RFC, but that I could do a quick and dirty approach for common URLs.

After I did this, the interviewer stopped me and told me in a negative way that he expected me to use a regex, which kinda shows he had no idea how a web browser works.

5 comments

How would you even "parse" a URL with a regex? Dynamically defined named subpatterns for each URL parameter? I think the best I could do on paper with a regex is say "yup, this is a URL" or maybe "yup, I can count the number of params".

Unless it was a specific URL with specific params?

Match groups so you can split it up into scheme, username, password, host, port, path, query, fragment. Not difficult to approximate, though for best results with diverse schemes you’d want an engine that allows repeated named groups, and I don’t know if any do (JavaScript and Python don’t).
Python's `regex` package does allow repeated named groups.
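
For illustration, here is a rough sketch of the match-group idea using Python's standard `re` module. The pattern is loosely adapted from the one in RFC 3986 Appendix B, with the authority split further into username, password, host, and port. The names `URL_RE` and `split_url` are made up for this example, and it only approximates common URLs: no validation, no percent-decoding, and nothing like what a browser does.

```python
import re

# Hypothetical pattern: one named group per URL component.
URL_RE = re.compile(
    r"^(?:(?P<scheme>[^:/?#]+):)?"        # scheme, up to the first ":"
    r"(?://"                              # optional "//authority"
    r"(?:(?P<username>[^:@/?#]*)(?::(?P<password>[^@/?#]*))?@)?"
    r"(?P<host>\[[^\]]*\]|[^:/?#]*)"      # bracketed IPv6 literal or plain host
    r"(?::(?P<port>\d*))?"
    r")?"
    r"(?P<path>[^?#]*)"
    r"(?:\?(?P<query>[^#]*))?"
    r"(?:#(?P<fragment>.*))?$"
)


def split_url(url: str) -> dict:
    """Split a URL into named components; absent components are None."""
    match = URL_RE.match(url)
    if match is None:
        raise ValueError(f"could not split {url!r}")
    return match.groupdict()


parts = split_url("https://user:pw@example.com:8080/a/b?x=1&y=2#frag")
print(parts["host"], parts["port"], parts["query"])
```

Note that the query comes back as one string. Breaking it into individual parameters is a second pass, which is roughly the "match vs. parse" distinction raised in this thread.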
I mean ya that would match a query string, but it wouldn't parse it?
I assume they meant "some regex implementation, including replace and/or match groups".

Like, for just the params part (yes, broken and simplistic):

  #!/usr/bin/perl
  # Strip one leading key=value pair per iteration and print it.
  $_="a=b&c=d&e=f&whatever=some thing";
  while (s/^([^&]*)=([^&]*)(&|$)//) {
    print "[$1] [$2]\n";
  }
Ya. I've also suffered copypasta trials administered by bar raisers, mensa members, and other self appointed keepers of the sacred nerd flame.

My imagined remedies are no 1:1 interviews and recording these sessions for "possible quality assurance and training purposes".

How do browsers parse URLs then?
There's actually a standard for it these days.

https://url.spec.whatwg.org/#url-parsing

Here’s a polyfill for the JS URL() interface which should give you a taste: https://github.com/zloirock/core-js/blob/272ac1b4515c5cfbf34... (I tried finding the one in Firefox but I couldn’t actually work out where it started, this one is much easier to follow)

TLDR: it’s a traditional parser—a big state machine that steps through the URL character by character and tokenizes it into the relevant pieces.
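
To give a feel for the shape of that, here is a toy state machine in Python. It is only a sketch: the state names and the `parse_url` function are invented for this example and are far simpler than the states in the WHATWG spec. It assumes the input has a scheme, and it does no validation, percent-decoding, or host/port splitting.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical states; the real spec defines many more.
    SCHEME = auto()
    AUTHORITY = auto()
    PATH = auto()
    QUERY = auto()
    FRAGMENT = auto()


def parse_url(url: str) -> dict:
    """Walk the URL one character at a time, switching state on delimiters."""
    parts = {"scheme": "", "authority": "", "path": "", "query": "", "fragment": ""}
    state = State.SCHEME
    i = 0
    while i < len(url):
        ch = url[i]
        if state is State.SCHEME:
            if ch == ":":
                if url.startswith("//", i + 1):
                    state = State.AUTHORITY
                    i += 2  # skip the "//" that introduces the authority
                else:
                    state = State.PATH
            else:
                parts["scheme"] += ch.lower()
        elif state is State.AUTHORITY:
            if ch in "/?#":
                state = State.PATH
                continue  # re-process this character in the PATH state
            parts["authority"] += ch
        elif state is State.PATH:
            if ch == "?":
                state = State.QUERY
            elif ch == "#":
                state = State.FRAGMENT
            else:
                parts["path"] += ch
        elif state is State.QUERY:
            if ch == "#":
                state = State.FRAGMENT
            else:
                parts["query"] += ch
        else:  # State.FRAGMENT: everything left belongs to the fragment
            parts["fragment"] += ch
        i += 1
    if state is State.SCHEME:
        raise ValueError("no scheme found")
    return parts


print(parse_url("https://user@example.com:8080/a/b?x=1#top"))
```

The point of the structure is that each character is examined exactly once in the context of the current state, which is what makes it easy to bolt on the per-state error handling and special cases that a regex cannot express cleanly.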

Did you point out that his two requirements were contradictory?
It's not very likely that this is what's happening here, but I feel like this could be done on purpose to see how you act in this kind of situation. It kinda tells how you would act once you inevitably get into a conflict with colleagues arguing over stuff like that.
Or it could be one of those outsourced interviews.
In that case I think the proper response should be: “I am very sure that browsers don’t do it that way. But let’s have a look.” And then pull up the source code for Chromium and Firefox. Assuming it’s not whiteboard only.

And if they still insist even after the source of Chromium and Firefox has been consulted, well, then it’s time to leave. I don’t want to work with anyone like that.
