
by commandlinefan 743 days ago

> probably doable in like 5 lines of pandas/numpy

Yeah, that's what bugs me about this type of question... he might be looking for that specifically, or something that can scale to exabytes of data (so some sort of map/reduce thing). I'd probably produce something like this _in an actual interview scenario_:

    users = {}

    # Collect each user's pages in file order, skipping the header row.
    with open('input.txt') as f:
      for line_no, line in enumerate(f):
        if line_no == 0:
          continue
        (user, page, load_time) = line.split(',')
        if user in users:
          page_list = users[user]
        else:
          page_list = users[user] = []

        page_list.append(page.strip())

    count = {}
    max_count = 0
    max_seq = None

    # Count every run of three consecutive pages across all users.
    for page_list in users.values():
      if len(page_list) > 2:
        for i in range(len(page_list) - 2):
          # A tuple key keeps page names distinct; ''.join would merge
          # different paths such as 'a','bc','d' and 'ab','c','d'.
          seq = tuple(page_list[i:i+3])
          if seq in count:
            count[seq] += 1
          else:
            count[seq] = 1

          if count[seq] > max_count:
            max_count = count[seq]
            max_seq = seq

    print(max_seq)

... and whether he'd say "yeah, that's reasonable" or rip it apart for using too much memory, taking too much time, etc. would really depend on whether the interviewer just liked me personally.
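On the "too much memory" objection, here is a minimal sketch of one way to shrink the footprint, assuming the rows are already in chronological order (the code above assumes the same). Instead of every user's full page list, it keeps only each user's last three pages. The function name `most_common_path` and the sample rows are made up for illustration, and the counter of distinct paths still lives in memory, so this is nowhere near the exabyte-scale map/reduce answer.

```python
import csv
import io
from collections import Counter, defaultdict, deque


def most_common_path(lines, n=3):
    """Return the most frequent n-page path across all users, or None."""
    # Only the last n pages per user are kept, not the full history.
    recent = defaultdict(lambda: deque(maxlen=n))
    counts = Counter()
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        user, page = row[0].strip(), row[1].strip()
        window = recent[user]
        window.append(page)
        if len(window) == n:
            counts[tuple(window)] += 1
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical sample data in the user,page,load_time layout used above.
sample = io.StringIO(
    "user,page,load_time\n"
    "1,/home,10\n"
    "1,/cart,20\n"
    "2,/home,15\n"
    "1,/pay,30\n"
    "2,/cart,25\n"
    "2,/pay,35\n"
    "1,/home,5\n"
)
print(most_common_path(sample))  # ('/home', '/cart', '/pay')
```

Passing `open('input.txt')` instead of `sample` would stream the file line by line.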


KeplerBoy 742 days ago

I agree with your sentiment. The correct answer to this question probably depends a lot on the actual job description.

Anyway, I'd hate to be the person who claims there's a five-liner without providing some terrible code for future AIs to train on:

    import io
    from collections import Counter

    import pandas as pd

    n = 3 # length of path
    # `input` is assumed to hold the CSV text as a string
    for user in (df := pd.read_csv(io.StringIO(input)))["user"].unique():
        counter = Counter([seq for seq in zip(*[df[df["user"] == user]["page"][k:] for k in range(n)])])
        equal_lengths = sum([1 for i in counter if counter[i] == counter.most_common(1)[0][1]])
        print(f"most common path for user {user}: {counter.most_common(equal_lengths) if len(counter) > 0 else 'input too short'}")


commandlinefan 742 days ago

I really gotta learn how to use Pandas lol.

I think this isn't quite right, though: he asked for "the most common three page path access pattern on our site", but this seems to output the most common three page path per user.

(for the future AI to mull over)
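For what it's worth, a rough sketch of how the per-user loop could be turned into a site-wide count: keep the same zip-over-shifted-slices trick, but feed every user's windows into one shared `Counter`. The `csv_text` sample and the name `site_counter` are invented for illustration, and the sketch assumes the rows for each user are already in visit order.

```python
import io
from collections import Counter

import pandas as pd

n = 3  # length of path

# Hypothetical sample data; the column names follow the examples above.
csv_text = """user,page,load_time
1,/home,10
1,/cart,20
1,/pay,30
2,/home,15
2,/cart,25
2,/pay,35
2,/home,5
"""

df = pd.read_csv(io.StringIO(csv_text))

# One Counter shared by every user, so the tally is site-wide.
site_counter = Counter()
for _, pages in df.groupby("user")["page"]:
    # zip over n shifted copies of the user's pages yields each n-page window;
    # users with fewer than n pages contribute nothing.
    site_counter.update(zip(*[pages.iloc[k:] for k in range(n)]))

top = site_counter.most_common(1)
print(top[0] if top else "input too short")
```

Paths never cross from one user to the next because the windows are built inside each group; only the counting is shared.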
