Okay I tried it. I got interrupted twice for like ~12 minutes total, making the time I spent coding *checks terminal history* also 12 minutes.

I made the assumption (would have asked if live) that if a user visits "A-B-C-D-E-F", then the program should identify "B-C-D" (etc.) as a visited path as well, and not only "A-B-C" and "D-E-F", which I felt made it quite a bit trickier than perhaps intended (but this seems like the only correct solution to me).

The code I came up with for the first question, where you "cat" (without UUOC! Heh) the log file data into the program:

    import sys

    unfinishedPaths = {}  # [user] = [path1, path2, ...] = [[page1, page2], [page1]]
    finishedPaths = {}  # [path] = count

    for line in sys.stdin:
        user = line.split(',')[0].strip()
        page = line.split(',')[1].strip()

        if user not in unfinishedPaths:
            unfinishedPaths[user] = []

        deleteIndex = []
        for pathindex, path in enumerate(unfinishedPaths[user]):
            path.append(page)
            if len(path) == 3:
                deleteIndex.append(pathindex)

        for pathindex in deleteIndex:
            serializedPath = ' -> '.join(unfinishedPaths[user][pathindex])
            if serializedPath in finishedPaths:
                finishedPaths[serializedPath] += 1
            else:
                finishedPaths[serializedPath] = 1
            del unfinishedPaths[user][pathindex]

        unfinishedPaths[user].append([page])

    for k in sorted(finishedPaths, key=lambda x: finishedPaths[x], reverse=True):
        print(str(k) + ' with a count of ' + str(finishedPaths[k]))

Not tested properly because no expected output is given, but from concatenating your sample data a few times and introducing a third person, the output looks plausible. And I just noticed I failed because it says top 3, not just print all in order (guess I expect the user to use "| head -3" since it's a command-line program).

I needed to look up the parameter/argument that turns out to be called "key" for sorted(), so I didn't do it all by heart (used HTML docs on the local filesystem for that, no web search or LLM), and I had one bout of confusion where I thought I needed to have another for loop inside of the "for pathindex, path in ..." (thinking it was "for pathsindex, paths in", note the plural). Not sure I'd have figured that one out with interview stress.

This is definitely trickier than fizzbuzz or similar. Would budget at least 20 minutes for a great candidate having bad nerves and bad luck, which makes it fairly long given that you have follow-up questions and probably also want to get to other topics like team fit and compensation expectations at some point.

edit: wait, now I need to know: did I get hired?

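To make the overlapping-path assumption concrete, here is a minimal sketch of what "A-B-C-D-E-F" turns into, plus one way to get a top 3 without piping to head. The function name `three_page_paths` and the sample visits are made up for illustration, and `Counter.most_common` is swapped in for the sorted() call; this is not the code from the comment, just the same idea on an in-memory list.

```python
from collections import Counter


def three_page_paths(pages):
    """Return every run of three consecutive pages, overlapping runs included."""
    return [tuple(pages[i:i + 3]) for i in range(len(pages) - 2)]


# One user visiting A-B-C-D-E-F yields four paths, not two.
visits = ["A", "B", "C", "D", "E", "F"]
paths = three_page_paths(visits)
# paths == [('A','B','C'), ('B','C','D'), ('C','D','E'), ('D','E','F')]

# Hypothetical second user, only there so that some counts exceed 1.
other_visits = ["A", "B", "C", "D"]

counts = Counter(paths + three_page_paths(other_visits))
for path, count in counts.most_common(3):
    print(" -> ".join(path) + " with a count of " + str(count))
```

Fewer than three visits simply produce no paths, since `range(len(pages) - 2)` is empty in that case.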
Major:
1. Sorting finishedPaths is unnecessary given that the question only asks for the most frequent path (not the top 3 btw)
2. Deleting from the middle of the unfinishedPaths list is slow because it needs to shift the subsequent elements
3. You're storing effectively the same information 3 times in unfinishedPaths ([A, B, C], [B, C], [C])
Minor:
1. line.split is called twice
2. Way too many repeated dict lookups that could be easily avoided (in particular the 'if key (not) in dict: do_something(dict[key])' stuff should be done using dict.get and dict.setdefault instead)
3. deleteIndex doesn't need to be a list, it's always at most 1 element
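
For reference, a rough sketch of how the points above could fit together. It is one possible reading of the suggestions, not a definitive rewrite: the function name `most_frequent_path` is hypothetical, the input is assumed to be "user,page" lines as in the original code, and `collections.deque` with `maxlen=3` is my own choice for keeping only the last three pages per user.

```python
from collections import deque


def most_frequent_path(lines):
    """Return (path, count) for the most common 3-page path, or None if there is none."""
    last_pages = {}  # user -> deque holding that user's most recent pages (max 3)
    counts = {}      # (page1, page2, page3) -> number of times seen

    for line in lines:
        fields = line.split(',')  # split once, reuse the result
        if len(fields) < 2:
            continue              # skip blank or malformed lines (assumption)
        user, page = fields[0].strip(), fields[1].strip()

        # One window per user instead of several partially built paths.
        window = last_pages.setdefault(user, deque(maxlen=3))
        window.append(page)       # the oldest page drops off automatically
        if len(window) == 3:
            path = tuple(window)
            counts[path] = counts.get(path, 0) + 1

    if not counts:
        return None
    best = max(counts, key=counts.get)  # single pass, no full sort
    return ' -> '.join(best), counts[best]
```

Calling `most_frequent_path(sys.stdin)` (after `import sys`) would keep the pipe-the-log-in usage. If two paths tie for the highest count, `max` returns whichever was counted first.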