Hacker News new | ask | show | jobs
by yolo69420 1511 days ago
Maybe I'm missing something but unless the interviewer expects a 100% correct csv parser that handles all possible corner cases this is basically

- read file

- start at beginning; if you encounter separator, add new entry to an array. if you encounter newline, save entire current array as a new line to an array of lines.

- do lexicographical sort of arrays (can just use qsort for this).

- if the test q actually wants you to parse out numbers and shit like that, nothing about the structure changes, you just have to record some extra info when parsing values and then use a different comparison depending on the type in the function you pass to qsort

I dare say someone programming algorithms in a kernel or writing code for custom file systems should be able to do this. OK 15 min may be tight if you're nervous but you should get there.

3 comments

You forgot quotes. Unfortunately we're moving forward with other candidates at this time. Good luck in your job search!
It gets very nasty at this point especially because I’ve used popular languages who’s CSV parsing libraries were unable to handle some weird quotes properly.
What languages? No option to configure any character as quote character or delimiter?
Ruby. Python. Several JavaScript NPM packages. Part of the problem here is the standards around CSVs aren’t consistent, nor are they well respected.

Things get even weirder when you throw non-Latin characters in the mix. May such parsers predate widespread use of UTF-8.

Python I believe was the issue in my case.
RFC 4180!
I don’t think you realize how much harder this becomes with C. You probably need to can’t just add strings to an array you’ll need to allocate space for all the arrays (presumably by doing a first pass to determine CSV dimensions) and all the substrings (since they’re Cstrings).

Personally I’d do it by storing indexes into the CSV contents instead of using null terminated C strings.

Doing all of this in just 15 minutes is going to be hard.

By the way, your solution as described is wrong, since it sorts each row independently. What you’d need to do is store all the data in a transposed format i.e an array of pointers to column arrays, then sort the array by giving qsort a comparison function that looks at the first entry in each column.

State machine is all you need.