> How would you do this without basically writing your own database from scratch?
It obviously is, for a suitable loose use of the term, "writing your own database from scratch" (unless you are using an existing library, for which many languages have ones that can be leveraged in the stdlib, including often ones that bring in SQLite). But doing a database of this extremely minimal complexity, or leveraging existing libraries to provide the functionality, is a pretty basic task that anyone expecting to get employed to do software development should be able to do with the conditions described.
I absolutely agree with this. Moreover, this basic approach of giving the candidate a reasonable set of requirements, a quiet space, and as much time as necessary is hands-down the best way I can think of to asses competency. It's how I was assessed by a company at which I just accepted a SRE position. There was a homework project, a few "personality" interviews, and then a work day for which I was compensated. I was stressed out with performance anxiety leading up to this naturally, but it ended up being a blast.
I was looking for a couple of months before I went through the process with my current employer, and some of the experiences were just horrible: one company put me through 7 interviews with a single engineer each time, lasting 30-45 minutes, and then brushed me off with a two sentence email. Another wanted me to design an algorithm in 20 minutes while they watched via screen share, and if I couldn't do it in 20 minutes I wasn't good enough for them. Several asked me to take timed tests with prominent counters flashing in the corner. It's just stupid.
So I think personally we should support employers who understand how software is made, and are willing to create the right conditions for you to really practice your craft, and then judge what you've made and how you interacted with potential team mates.
Store it in memory? I doubt the amount of data they provided for a test problem was so tremendous that it required a database.
In itself this is a good (dis)qualifier; if I give someone a problem and tell them the data will never exceed 10k rows, I expect them to be practical and not waste time setting up a database.
The flat line in the bathtub curve is where given "data will never exceed 10k rows" implies no database. On the inexperienced end, people don't understand that 10k rows doesn't require a database. On the experienced end, candidates probably oughtn't to believe it.
Absent an instruction to specifically "do not use a database", I feel like even 10K rows suggests the use a database as the data structure and then the DB query interfaces as the interface.
I don't see why you wouldn't use a database for a problem that it's particularly good at addressing.
As I think I said elsewhere, we allowed people to use a database if they wanted to (and could install it themselves on the dev VM we gave them), but then we would include looking at how they designed the schema, how much thought did they put into connection handling, etc.
But most of the people who successfully passed this didn't use a database, it just wasn't enough data to be meaningful - thousands of rows at most.
Yeah, or you could use the opportunity to demonstrate that you know how to roll your own data structures, provided you've clarified that the dataset isn't likely to grow. Instead of showing them you know how to install a database.
What? while sure you might not need a database if you have under 10k rows, if you want any amount of resiliency storing data in memory is a huge mistake as if the current instance of the application fails your done for.
Well, as other posters have commented you sort of do, but since we let you use libraries and the like, you had lots of options:
- sqlite
- Storable (perl)
- serialize some other data structure to disk
- in-memory
I seem to recall one or two people over the years who wanted to install a relational database as part of the code to store stuff - we were fine with that as long as you then showed all the right behaviors for dealing with relational databases (using transactions, catching errors, etc.)
But most people read it in and store it in memory - there was only a few thousand rows.
I guess my intuition was right. I was mostly curious if there was some other clever way of handling this. Mostly related to being able to query the data arbitrarily
I don't even remember how sophisticated the querying had to be - I think it was something like:
- find and display all movies by the specified year
- substring text searches in the title
Maybe one other. But it wasn't even approaching anything really sophisticated. We just wanted to see that people could write a basic logic flow, read-and-write data application.
Then for the people who did finish it, we looked at the code. Was it well commented, well thought out, did it have error handling (what if the input file doesn't exist? What if something eats your data file while it's running?), what assumptions did they make.
So the base test was basically a sniff test - if you can't do this, you're not a good enough developer to work here. If you can do it, we'll look at the quality of the result to assess how senior you are.
By implementing only that 1% of a real database's functionality that this task requires. Consider: you have only one table, no updates or deletes or transactions, fixed schema, and very, very simple queries.
So it's just a matter of parsing the CSV into some native data structure in your favorite language and then computing sort indices of the columns by which rows might be queried.
And sadly you don't even need to create sort indices. Just reading the data into a list of tuples and a few list comprehensions would provide all the querying needed (it wouldn't be efficient but most candidates can't even get that far)
Admittedly a database would make the querying more efficient (this is actually probably a good use case for Mongo), but you could just keep a dictionary for each query-able field if you want better than O(n).
Only once they ask for efficient Levenstein-distance-supporting search do you need to use any brainpower.
N.b. if you choose to parse CSV from scratch, make sure to handle commas in quoted text as well as escaped quotes. The better approach is to grab a library or code snippet and call it a day.
For example in python:
1. Load the CSV file using the csv lib in a list of dictionary
2. Fill a sqlite database with the list
3. Read wathever the user asked for (ex: movies longer than 100 minutes and release date before 1999) and build a SQL query
4. Display results
Well - I'd probably write my own database from scratch. If the list was short enough and time tight enough, I might just let it be an array stored in memory, but I guess that would cause performance problems after more than a couple hundred lines.
for interviews I'd say even straight dumping data into the file system and retrieving later when the user needs it would be a somewhat valid approach. It's more important that the candidate is able to explain the reason he choose this approach over holding everything in memory (writing/retrieval overhead vs. memory limitation, etc), given the data size/structure of the assignment.
If candidates reason well about design decisions of their implementation, perfect or not, it at least tells me they are reliable developers who do not jump into coding before thinking carefully.
It obviously is, for a suitable loose use of the term, "writing your own database from scratch" (unless you are using an existing library, for which many languages have ones that can be leveraged in the stdlib, including often ones that bring in SQLite). But doing a database of this extremely minimal complexity, or leveraging existing libraries to provide the functionality, is a pretty basic task that anyone expecting to get employed to do software development should be able to do with the conditions described.