|
|
|
|
|
by KeplerBoy
696 days ago
|
|
This seems to be nothing like tsp. You'd partition the table into a single table per user, extract the page columns, map that sequence to the asked three-page-sequences (ABABA would get mapped to ABA, BAB, ABA), and count them. That's probably doable in like 5 lines of pandas/numpy; a straight forward o(n) task really. The hard part is getting it right without googling and debugging, but a good interviewer would help you out and listen to the idea. Maybe using Pandas is cheating since it gives you all the tools you'd want but I'd argue it's the right tool for the task and you could then go on how you'd profile and improve the code if performance were a concern. |
|
Yeah, that's what bugs me about this type of question... he might be looking for that specifically, or something that can scale to exabytes of data (so some sort of map/reduce thing). I'd probably produce something like this _in an actual interview scenario_:
... and it would really depend on whether the interviewer just liked me personally whether he'd say, "yeah, that's reasonable" or rip it apart for using too much memory, taking too much time, etc...