Hacker News
Ask HN: What are some rite-of-passage style projects for programmers?
87 points by linuss 3691 days ago
Hi, I'm trying to compile a list of projects, algorithms and maybe data structures that I feel every well-rounded programmer should have implemented at least once in their career. I'm hoping this list can then function as a guide for programmers to challenge themselves and maybe fill gaps in their knowledge they didn't know they had. So far, I've come up with the following list:

* Data structures
  - Linked list
  - Hash map
  - Several types of trees

* Algorithms
  - The common sorting algorithms
  - Dijkstra
  - Graph-traversal algorithms

* Projects
  - Ray tracer
  - Parser/Interpreter
  - Compiler
  - Virtual Machine
  - Small kernel
  - Neural Network
  - Web server

The algorithm and data structures sections, I believe, should be covered by any decent computer science education (but may be useful to self-taught programmers). The projects are slightly more advanced and may take up to several weeks to implement completely. I realize this list is far from complete, so that's why I'm turning to you: do you have any projects that you've worked on that turned out to be very educational and made you a better programmer once you completed them?

28 comments

Odd list. You can have a very successful developer career without ever needing to write any of those ever. I'm a little over 20 years in now, and have never needed to do any of that.

Maybe in the 90s, I might have actually built a linked list in C for whatever reason, and I guess you could define graph traversal to include walking a tree pulled out of a database. But the rest are things that The Universe Provides For You(tm), which one would only ever reproduce in School, a Job Interview, or a Poorly Chosen Hobby.

As to actual rites of passage? Ship Something that real people actually use. Build an entire thing, be it a piece of desktop software, video game, web application, mobile app, etc. from bottom to top and send it out into the world fully formed. That, in my mind, is what we're here for.

> You can have a very successful developer career without ever needing to write any of those ever.

You write this, but then you go on to mention job interviews, which are becoming more and more algorithmic each passing year. Software engineers are frequently asked to write these during interviews. If you've missed the experience of being asked to implement a hash table or linked list in an interview, or at the very least asked to write a function somehow manipulating some data structure, you've either been lucky or else stayed in the same job for a long time (not a bad thing, but not typical).

I highly recommend all software developers practice implementing data structures (linked lists, hash tables, trees, graphs) and algorithms (sorting, combinatorics, etc.) on a regular basis. If you fail to master these, and to understand their underlying rates of growth (i.e., "Big O"), you'll be forever pigeon-holed as a lightweight "webdev", even if you have the programming chops required to build complex, reliable, and useful software.

Plus, they get really interesting the more you study them, and the math behind them.

> You write this, but then you go on to mention job interviews, which are becoming more and more algorithmic each passing year.

It's been the opposite experience for me in SF, actually. More startups are realizing that they want people who can build good products quickly, and with more people coming from non-CS backgrounds, they'd rather test for that than see if they can whiteboard a breadth-first search.

I took enough CS coursework to learn the stuff you mentioned, but I've worked with a number of developers who don't know any CS theory. Several of them are stronger engineers than me and have had very successful careers, and I'd rather work with someone who can communicate well and solve hard problems than someone who's read through CLRS.

I agree that for most developer roles most of these are definitely not necessary. I do know from personal experience that learning things that are far removed from any of the regular projects you're working on can provide valuable insights. The thing with these projects is that it's really difficult to know what you don't know yet, and so implementing something that is far removed from your day-to-day workings can provide you with insights that might help you, even though it seems completely alien.

Also, these projects are intended for self-study. Building an entire product from top to bottom is indeed a very valuable skill, but hopefully one you learn on the job.

Actually, the number of opportunities you have to build something from top to bottom is not that large. In my 30 years I can think of 4 non-trivial projects off the top of my head. In fact I often advise programmers to have a side project where they get to make all the decisions and see how it works (or doesn't). One of the side benefits is that it tends to make people much less "my way or the highway" at work because they have somewhere that they can express themselves fully.

I have been recommending side projects on a similar basis for a while. I usually emphasize learning under the condition of leisure rather than schedule pressure, to enhance focus on good design and allow for a depth-first learning style.

Those kinds of projects are valuable not because of any significant chance of having to do something similar professionally, but because doing them gives you a fuller and more mature mental model for programming. Doing the larger projects listed will give you a deep understanding of the stack (compiler/programming language, operating system, network protocols, etc.) you are interacting with every day. Non-trivial bugs and performance issues especially tend to test this understanding, and if you have gone through those kinds of experiences, you will be up for the challenge.

(I have personally done a toy HTTP server, small compiler, neural network, etc. and will be forever grateful I did so)

From the list I think he wants to become a great computer scientist instead of a great programmer.

Most computer science degrees will go way beyond this in the theoretical aspects.

I would add bloom filters, and some math like Fourier transforms and bezier curves.
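
For anyone who has not met Bloom filters, here is a minimal sketch of the idea in Python. The class name, the default sizes, and the trick of salting SHA-256 with an index to get several hash functions are all illustrative assumptions, not a tuned implementation.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with an index.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

After `bf = BloomFilter()` and `bf.add("hash map")`, the check `"hash map" in bf` is always true, while a lookup for something never added is usually, but not always, false. Working out how that false-positive rate depends on the number of bits and hashes is where most of the learning is.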

I have just completed such a real project; the next step is deployment. We will see if they want to use it. As for learning: yes, I learnt a little bit of everything, but programming alone is not enough to be like you all. There is a difference between a programmer and a hacker. Programmers, I guess, enjoy knowing everything. Hackers just want their thing done. I don't know when I will have time to check off that list, but understanding every little project developers publish on HN is not my goal.

Some real world rites of passage...

* have your project cancelled

* get outsourced

* build the same thing twice (billing system, e-commerce, etc)

* underestimate a project

* get promoted to management....realize you hate it

* hire a net-negative

* implement a cms

We could totally make career bingo out of this...

- Stick long enough at a job that the product becomes a social network.

- Attend the same convention twice in a row for different jobs.

- Work with an ivy league MBA.

- Eventually settle on a Java or C# job.

- Build a wonderful product that people won't use. For whatever reason...

- Burnout and dream of working in construction...

> - Eventually settle on a Java or C# job.

why is Java considered "settling"? relatively inexperienced here

It's not a jab at Java or C#. It's a comment on the fight between ideals and money. Our ideal language will probably not make us money. That's why we "settle" for one that does. :)

They've got a reputation for being kind of boring, corporate languages, among some people. Neither one is usually in the list of things that people would get very excited about using.

I don't think Java or C# are boring at all! Java is incorporating some interesting new features and its ecosystem is absorbing/evolving some of the newer styles of thought from other languages. C# is maturing into an all-platform language that is backed by amazing tooling.

what's an "exciting" language in your opinion?

What I posted doesn't represent my own opinion, it's my perception of what developers in general might think, based on what I've heard people say in the past.

I don't have a big problem with Java (it was the first language that I learned object-oriented programming in), and I like C#'s increasingly cross-platform focus. I like the things I've read about C++17 and the features planned for it, and I'd be interested in seeing Rust pick up some steam.

Other people get really excited about various functional languages, new+shiny Javascript frameworks, and Node.js. Anything that's new, related to web technologies, and solves a perceived problem.

I know the answer was not meant for me, but I'm very excited about Verilog. It allows me to build my own chips with FPGAs (not that I'm doing any at the moment). Plus that means I can build my own language around a custom chip architecture.
> Burnout and dream of working in construction...

Is this a thing? Working in construction is usually my first day dream when I'm feeling fed up with work or a little burned out.

I've noticed this being a thing. Actually, it's my #1 dream after doing ASIMUV full time (which will hopefully become a reality). Also, it's a comment easter egg because I'm referring to the movie Office Space. The protagonist quit his job at a software company to go and work in construction. :)

I thought it was a reference to Office Space.

Seems to be. Funny though how I've never watched it and construction is my #1 mental escape for burnout. #2 is military.

Should add: * write a quick temporary / proof-of-concept application, then need to support it indefinitely.

yuuup... I built one of these for a friend's company. 10 years later that is still the default home page for the internal staff...

To me, those aren't really signs of growth, other than being able to take a punch.

1. Writing a professional product, from scratch to completion, by yourself or as a team lead, while doing every aspect of the project: BA, PM, design, architecture, server install, data model (if applicable), code, comments, documentation, QA, deployment, support.

2. Pick up a complete mess that someone else wrote, who is no longer available. Read it, really understand it, and be able to refactor it into something worthwhile.

As stated, most of the things the OP listed are really just academic and he should have done those in school, as they have been done ad nauseam. Rarely do you need to do those anymore.

To your last paragraph, I work with people with MS degrees in comp-sci who come to me, a drop-out, to ask questions about that stuff (data structures, algorithms, etc.). Sadly, not every school is up to the task of teaching these fundamentals. It seems that they're mostly teaching web development and Java Beans these days.
That's unsettling. When I earned my MIS degree (with a bunch of CS courses) in 1997, you had the opportunity to get a great education, but you also had an opportunity to coast by. Back then, everything was done in "teams." You had people who took the bull by the horns, and you had people who were trying not to drown. C's get degrees, as we used to say.

Now, learning that people with an MS in CS have the same issues is what's really jarring. There were so many weed-out courses in BS CS, I couldn't imagine someone getting as far as an MS CS without knowing the basics.

Having said that, I earned most of my chops on "the street," with a great mentor many years ago. We really challenged each other and wanted to show each other how smart we were. Good times.

Do you know where they got their degree? ITT Tech or something?

I don't know, but I wouldn't be surprised if it were a fly-by-night sort of organisation like that.

In any case, the best developers I've worked with are the ones that practice the fundamentals on their own, regardless of whether they're degreed or not.

Pretty much this lol. Can also add things like:

- coding while fighting the scope creep

- coding while trying to figure out the overall project architecture on the fly in endless whiteboard meetings with other devs because the high-level architecture the Solution Architect put together ain't worth a sh*t.

- stepping out of your comfort zone on a daily basis, having to deal with tech you have 'some' but not 'good', let alone 'expert', knowledge of, because you can't possibly hire an expert for every tech

etc etc... but still delivering on time with low defect count :)

* Have your company liquidated around you while being told "everything is fine", right up until the day you arrive at work and the office locks have been changed.
Haha, yeah, these are probably more realistic. I was writing my list from the perspective of the ivory tower of perfect computer science.

+1 amazing post, you got me.

My rites of passage:

1. Your own MVC framework. You should do this to appreciate why developers of other frameworks make the decisions they do. I gained so much wisdom from this.

2. Parsing HTML with regex (see here - http://blog.codinghorror.com/regular-expressions-now-you-hav...)

Seriously, just DONT DO IT(tm) -- but if you do, you will eventually learn why you don't want to do it this way, and you might get pretty good at regular expressions

3. Your first mobile app published to the app store. Publishing apps to the Apple app store has given me a deeper appreciation for paying attention to the little details. Also, making native apps is a completely different paradigm from web apps, because shipping code with logic errors has such a high cost and the fix takes so long to reach users.

4. Port an existing library to a new language. A long time ago I ported a recipe parser from Ruby to Python for a paid gig. It was such a good learning experience because I had a perfectly functioning reference implementation, which allowed me to go deep on getting the details right.

I had to replicate test cases, documentation, scaffolding, and the code itself while being aware of the gotchas of Python.

The quintessential stackoverflow answer about parsing HTML with regexp should be noted:

http://stackoverflow.com/questions/1732348/regex-match-open-...

>Seriously, just DONT DO IT(tm)

Yes, you'll learn that this task is literally impossible, as well as why:) I think we've all used regex in a spot where it's impossible to do so correctly at some point.
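
A small illustration of where the regex approach breaks down, as a sketch in Python. The `<div>` pattern and the sample string are made up for the demonstration, and `DepthTracker` is a hypothetical helper built on the standard library's `html.parser`; the point is only that a plain pattern has no notion of nesting depth, while a parser can count it.

```python
import re
from html.parser import HTMLParser

html = "<div>outer <div>inner</div> tail</div>"

# A non-greedy pattern pairs the first <div> with the first </div>,
# which belongs to the inner element, so the "match" is cut short.
naive = re.findall(r"<div>(.*?)</div>", html, re.DOTALL)


class DepthTracker(HTMLParser):
    """Counts how deeply <div> elements are nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1


tracker = DepthTracker()
tracker.feed(html)
```

Here `naive` comes out as `['outer <div>inner']`, which is not the content of either element, while `tracker.max_depth` is 2. Switching to a greedy `.*` just moves the failure to sibling elements instead.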

Well, considering the PCRE regexes are nearly Turing complete, you can parse (well formed) HTML.

Someone better versed than I: https://nikic.github.io/2012/06/15/The-true-power-of-regular...

Of course, the money quote is: Just because you can, doesn’t mean that you should.

I had never seen PCRE regexes. I was thinking about the plain ones. This was pretty cool, thanks for sharing:)
I highly highly recommend writing your own MVC framework. I did this about three years ago because I was tired of working with Laravel, which I didn't fully understand or appreciate at the time. Building your own forces you to take apart other frameworks and see how they run and really ask yourself what you would do differently. It's a small enough project for you to play with for a week or two and really get a better understanding of how modern web applications are put together and how they could be put together better.

The OP's list seems to be a list that will prepare you to ace job interviews. This list is a list that will make you undeniable [1].

[1] https://www.youtube.com/watch?v=2Pn1RVZu-24

Thanks for your reply, these are great additions to my list :)
For regular expressions, I recommend "Mastering Regular Expressions".

I thought I knew REs before reading that book -- since I had both been using them for years and did a course on automata theory at Uni -- but that was just Dunning–Kruger.

Edit: Also write a templating system and be done with it, so you won't waste time doing it later. :-)

When I first learned programming I did quite a few simple games and routines to find primes.

Snake. Can pretty much pack all of CS 101 into that one little game ;)

My "hello world" for learning new languages and platforms. Integer based or floating point. 2d or 3d. Sound effects, sprite animation, physics, procedural particle systems, global leaderboards, digital skins and so on ad infinitum. Allows you to experience nuances in packaging and deploying WebGL vs Android vs Steam. Continue polishing it, and you may end up with something fun that others will love!

Can also be refactored into a full Tron Light Cycle style simulation. Which is a great way to learn AI. Good luck!

For me, it was writing my own Lisp. I'd strongly recommend giving it a try -- you'll learn about tokenization, parsing, garbage collection, and a bunch of little software development lessons.
Useful for getting started: https://github.com/kanaka/mal
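
To give a feel for how small the core can be, here is a rough sketch of a tokenizer, parser, and evaluator for a tiny Lisp in Python. It is not how mal structures things; the function names, the handful of built-ins in `GLOBAL_ENV`, and the choice of special forms (`if`, `define`, `lambda`) are all assumptions made for brevity. It also leans on Python's own garbage collector, so that lesson is skipped entirely.

```python
import operator


def tokenize(source):
    """Split source text into parentheses and atoms."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front and return one nested expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens and tokens[0] != ")":
            expr.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing ")"
        return expr
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)
    except ValueError:
        return token  # anything else is a symbol


GLOBAL_ENV = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "<": operator.lt}


def evaluate(expr, env):
    if isinstance(expr, str):   # symbol: look it up
        return env[expr]
    if isinstance(expr, int):   # literal: evaluates to itself
        return expr
    head = expr[0]
    if head == "if":
        _, test, then, otherwise = expr
        return evaluate(then if evaluate(test, env) else otherwise, env)
    if head == "define":
        _, name, value = expr
        env[name] = evaluate(value, env)
        return None
    if head == "lambda":
        _, params, body = expr
        # Copy the defining environment at call time and bind the arguments.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    func = evaluate(head, env)
    return func(*[evaluate(arg, env) for arg in expr[1:]])
```

With `env = dict(GLOBAL_ENV)`, evaluating `parse(tokenize("(+ 1 (* 2 3))"))` gives 7, and a recursive `fact` defined through `define` and `lambda` works too.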
For me, it was implementing a large bit of the C standard library. I had just blown an interview at Microsoft in 2001, where I stumbled over a whiteboard implementation of strtol, and I thought that I needed a serious upgrade of my understanding of library implementations and the algorithms that are used in them.

Every developer should also implement all the basic data structures and algorithms -- and then never write their own again! The process, however, definitely improves your chops, helps you understand the trade-offs inherent in choosing among data structures and algorithms, and gives you an appreciation for what's going on "under the hood."

Also, writing a compiler covers a lot of the list above. Even a simple language like Brainfuck (my choice for this rite of passage) can teach you a lot. It also gets you into the world of assembly language, which it seems that fewer and fewer new devs are exposed to these days.

(Yep... I'm officially old now. I just played the "kids these days" card.)

Maybe compiler-related stuff. Create a small dummy language that compiles down to JVM bytecode and things like that (learn about tokenizing, parser, abstract syntax tree, type checking, operator precedence, code generation, etc...).
The classic one I always heard of was a shell. You learn about input parsing, fork/exec, and usually some syscalls. Lots of features can be added on, like running processes in the background, autocompletions etc.

Writing a shell in C (you can use Readline if you don't care to learn much about parsing) teaches you lots of stuff about the operating system you're working on.
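
The fork/exec/wait core of a shell is short enough to sketch. The version below is in Python rather than C, but the `os` calls map one-to-one onto the POSIX functions of the same names. The names `run_command` and `main`, the use of `shlex` for input parsing, and the exit-status conventions (127 for "not found", 128 plus the signal number) are assumptions chosen to mirror common shells; it is POSIX-only and has no pipes, redirection, or job control.

```python
import os
import shlex
import sys


def run_command(argv):
    """Run argv in a child process and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError as exc:
            print(f"{argv[0]}: {exc.strerror}", file=sys.stderr)
            os._exit(127)
    # Parent: block until the child finishes.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)


def main():
    """Read-eval loop; call main() to try it interactively."""
    while True:
        try:
            line = input("$ ")
        except EOFError:
            break
        argv = shlex.split(line)
        if not argv:
            continue
        if argv[0] == "exit":
            break
        if argv[0] == "cd":
            # cd has to be a builtin: a child cannot change the parent's cwd.
            os.chdir(argv[1] if len(argv) > 1 else os.path.expanduser("~"))
            continue
        run_command(argv)
```

Discovering for yourself why `cd` cannot be an external program is a good example of the kind of thing this project teaches.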

This may be colored by me being self-taught, but aside from interviews, why algorithms?

Seems to me that the best way to test your well-rounded skills as a programmer is to build and launch a product. Even if you don't aim to be an entrepreneur, the holes you find while taking an idea from inception to launch are much bigger holes than you'd find building this tree vs that tree.

Building fundamental data structures and implementing classic algorithms is a phenomenal way to learn about unit testing. How else are you going to make sure your doubly-linked list is doing the right thing?

It's much more difficult to learn about unit testing when you have the complications of a project stack: databases, front ends, etc. The fundamental data structures are self contained and their behavior down to the last detail (the spec) is fully described. That's an ideal way to learn about writing good unit tests.
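
As a concrete sketch of that, here is a small doubly-linked list in Python together with `unittest` cases. The class and method names are illustrative assumptions. The useful invariant to test is that walking forward and walking backward always agree, especially after removing the head, a middle node, and the tail.

```python
import unittest


class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.size += 1
        return node

    def remove(self, node):
        # Relink the neighbours, updating head/tail when node is at an end.
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None
        self.size -= 1

    def forward(self):
        values, node = [], self.head
        while node is not None:
            values.append(node.value)
            node = node.next
        return values

    def backward(self):
        values, node = [], self.tail
        while node is not None:
            values.append(node.value)
            node = node.prev
        return values


class DoublyLinkedListTest(unittest.TestCase):
    def test_both_directions_agree(self):
        lst = DoublyLinkedList()
        for value in (1, 2, 3):
            lst.append(value)
        self.assertEqual(lst.forward(), [1, 2, 3])
        self.assertEqual(lst.backward(), [3, 2, 1])

    def test_remove_middle_head_tail(self):
        lst = DoublyLinkedList()
        a, b, c, d = (lst.append(v) for v in "abcd")
        lst.remove(b)   # middle
        lst.remove(a)   # head
        lst.remove(d)   # tail
        self.assertEqual(lst.forward(), ["c"])
        self.assertEqual(lst.backward(), ["c"])
        lst.remove(c)   # last remaining node
        self.assertIsNone(lst.head)
        self.assertIsNone(lst.tail)
        self.assertEqual(lst.size, 0)
```

The tests can be run with `python -m unittest` against the file containing them.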

Unit testing is not the be all and end all of writing an application.
The goal of the list is to find projects that will increase your understanding and skills as a programmer. This is a purposefully vague description, and I do believe that implementing basic algorithms and data structures can help you in your day to day work. I would never advocate doing such things in production work in languages that provide these tools for you, but it is helpful to realize why your high-level implementation should use a hash-map/dictionary instead of a list.
I'm also self-taught, and that's exactly why I found it important to focus on algorithms. Along with data structures, they're the fundamental elements of software development. (Data structures are the bricks, and algorithms are the mortar, so to speak.)
Depending on your area of work algorithms' value can range from useless to invaluable. The same applies to every topic. But I usually ask a very simple algo question during an interview to figure out general cognitive ability of the candidate.
Common ones from when I started writing video games in C in the 90's -

* Write malloc() and free().

* Write a gzip decompressor.

* Write a triangle rasterizer and use it to draw a spinning cube on the screen.

Each of those is a decent but manageable amount of work for a new dev and will teach you a variety of useful low-level skills.

The phrase 'rite-of-passage', to me, carries negative connotations - something you have to do, something tedious, something necessary. In truth, there are no such projects.

However, in terms of projects popularly considered to be commonly implemented by newer programmers that do hold benefit, in network programming, I would say a traceroute implementation. Server-side, I'd say any multi-node cluster system, preferably diskless. Any embedded system. An RDBMS system. A NoSQL system. An open source intelligence system. Any computational linguistic system. Any i18n/l10n heavy project.

I read it as something that every programmer encounters sooner or later, and after which he may, with pride, imagine himself a bit closer to the 'Senior' level.
Ah, the context/connotations of rite-of-passage might have been lost on me as English is not my first language. Thanks for your suggestions though!
I don't think it's a language thing. To many the notion of an ordeal endured or an adventure undertaken in order to pass into manhood or womanhood (or whatever) is still appealing. "Rite of passage" does not have a universally negative meaning in English, it's just that many of us today are... well, yeah.
I think you might be confusing a "Rite of Passage"[0] with a "Hazing Ritual"[1]

[0]: https://en.wikipedia.org/wiki/Rite_of_passage [1]: https://en.wikipedia.org/wiki/Hazing_ritual

I cannot imagine why you think so. Or why you believe the links you shared support such a statement.
In retrospect, I think I misread your posts
1. Backtracking algorithm

2. Any kind of NP hard problem really

3. A recursive descent parser

4. An expression evaluation library

5. A plugin mechanism

6. Fixing memory leaks

7. Speed optimization
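
Items 3 and 4 combine nicely into one small exercise. Below is a sketch of a recursive descent parser in Python that evaluates arithmetic as it parses. The grammar in the docstring, the token regex, and the names `Parser` and `evaluate` are assumptions chosen for illustration; a real expression library would more likely build a syntax tree first and evaluate it afterwards.

```python
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule:

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '-' factor | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "-":
            return -self.factor()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token in ("+", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Operator precedence falls out of which rule calls which: `evaluate("1 + 2 * 3")` gives 7.0 and `evaluate("(1 + 2) * 3")` gives 9.0.
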
Learn enough sql to write an application entirely in the db, except presentation layer, of course. Use OO concepts to hide data, provide accessors, etc. Write a wrapper library for accessors in your front end language of choice (CLI, Web, GUI, whatever).

You won't often see a big project structured this way, but it is very effective for a first hand experience of data driven design.

I assume you mean writing all the business logic in stored procs and functions. Every time I've seen this, it ended up horribly. I use the database for storage and data validation only. I write a middle tier for the business logic, and of course the front end layer.

I'm not saying it can't be done well, but it poses enough problems that are difficult to avoid, like calling an external web service inside a stored procedure, and the general mess of maintaining a bunch of chained stored procedures and functions.

It also guarantees vendor lock-in.

I should also mention, I found it common and well-received to use this pattern for sustaining development on an enterprise app. Some logic has to be in the database anyway.
In my experience, keep the logic in / close to the data model and the application code becomes very clean and easy. If you have a crap data model, then lots of implicit rules start appearing at the application level, bugs start to appear and no one knows exactly what the application should be doing, as all the rules are tied up in some crappy crufty code.

Like Linus said, bad programmers worry about the code, good programmers worry about the data and its relationships (or something to that effect).

That experience is why I propose pushing as much business logic down into the data layer as possible as a matter of implementing a rite-of-passage code. For organizations, this puts a premium on db programmers, but it also protects their data from less-skilled programmers.
What you describe is reasonable enterprise architecture. I'm responding to a fairly different question.
Judging by the last couple of applications I have inherited, most people should be doing this (but are not). The database is usually fast and should be doing a lot of the heavy lifting.
I found a significant disparity between the front-end application developers and the db-focused team I worked on, doing sustaining development. Basically, we were essentially out of IOPS so there was not any room for a bad query in production. Our db-team strategy was to provide safe APIs in SQL that paved a happy path for our front-end consumers. It was very successful.
On the web:

1. Build your own CMS

2. Build your own framework

3. Performance tune SOMETHING intensely so that you can observe the bottlenecks and their causes across the stack

Just off the cuff there.

1. Anything involving two different forms of encoding, one of which should preferably be something obscure like ShiftJIS.
> * Projects - Ray tracer - Parser/Interpreter - Compiler - Virtual Machine - Small kernel - Neural Network - Web server

Among these, do the ray tracer and neural network seem way easier than the competition to me due to my math background or are they just weirdly chosen? Those two are essentially algorithms, not "projects".

Go to an interview and get asked one algorithm question after another, realizing you've never had to implement a meaningful algorithm from scratch since you graduated college, because there's a library for (almost) everything.
To learn web security and encryption, Stripe CTF2 (https://stripe.com/blog/capture-the-flag-wrap-up) is fabulous.
Encryption, so you learn something about how it works.

Some kind of game.

Edit: Something that uses an API.

The problem with learning encryption by writing it yourself is that you will not know if you got it right or totally wrong.

For most types of problems, the feedback is clear - if it works and gives you the expected result, then it's (at least mostly) correct. For implementing cryptographic systems, a passing test suite doesn't mean anything, you'd need extensive expert review to tell you where you went totally wrong, and without it you'd just likely learn untrue things.

> and without it you'd just likely learn untrue things.

I prefer this to the more common 'Just don't do it', as you actually state why. The reason is something I knew intuitively but couldn't really put into words (and had actually never devoted any time to doing so). Thank you.

> Encryption, so you learn something about how it works.

I would actively discourage developers from writing their own encryption software, in case they're tempted to use their work in a production app. Encryption is sufficiently complicated that you will get it wrong, and sufficiently important that getting it wrong will be very bad. Leave encryption to cryptographers.

I didn't consider that they might use the one they make.

That would be bad, so maybe an obfuscator instead?

I agree, that sounds cool. I do agree with the other commenter, it should come with a strongly-worded warning never to use it for a real product, but I believe there are several useful encryption tutorials and guides around that could be followed for educational purposes.
not really an answer to your question, but it could give some guidelines: http://matt.might.net/articles/what-cs-majors-should-know/
Wow, this is awesome. Thanks for the link.
Code something in an afternoon that makes your friends go "wow" the next day!
Project management application.
anything involving concurrency.
I agree that concurrency is a nearly essential skill nowadays, but would you have any concrete projects or implementations that use concurrency? There's quite a difference between writing a parallel merge-sort and say, a whatsapp back-end clone.
Write an IRC bot with sockets. The program needs to listen to multiple network inputs (the connections to different IRC servers) as well as respond to user input (or at least signaling). This is a great playground to try out different models such as thread per connection, your own busy waiting or sleeping event loop, and OS event loops such as poll.

If you want to do more parallelism rather than just concurrency, you could have some inputs make the bot start doing some work (like computing the billionth digit of pi) and queue the work up into a threadpool. If you really want to get into parallel programming, you could write a multiproducer-multiconsumer queue to allow the IO threads to communicate with the pi computing worker pool.

Hope this is a good concrete project! I did the first part (multiple connection IRC bot) a few years ago and it definitely helped me understand concurrency and network programming.

If you want a 'toy'/practice project, writing a multi-threaded work queue (multiple producers, multiple consumers) is a good _relatively_ simple one (the API is simple and self-contained, anyway), to get your feet wet with concurrency. In real production work, you'd probably use an existing implementation for your platform of choice, but that's true of many of your examples too (few of us write an OS or compiler for production work! Probably more of us have had to write a concurrent work queue compared to an OS, heh.)
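
For reference, a bounded multi-producer, multi-consumer queue can be sketched in Python with one lock and two condition variables. The class name `WorkQueue`, the `close()` method, and the convention that `get()` returns `None` once the queue is closed and drained are all assumptions of this sketch (which also means `None` cannot be queued as a work item). In practice Python's own `queue.Queue` already covers this.

```python
import threading
from collections import deque


class WorkQueue:
    """Bounded multi-producer, multi-consumer FIFO queue."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._closed = False
        lock = threading.Lock()
        # Two conditions sharing one lock: one per reason to wait.
        self._not_empty = threading.Condition(lock)
        self._not_full = threading.Condition(lock)

    def put(self, item):
        with self._not_full:
            # Re-check in a loop: another producer may have taken the slot.
            while len(self._items) >= self._capacity and not self._closed:
                self._not_full.wait()
            if self._closed:
                raise RuntimeError("queue is closed")
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        """Return the next item, or None once closed and drained."""
        with self._not_empty:
            while not self._items and not self._closed:
                self._not_empty.wait()
            if not self._items:
                return None
            item = self._items.popleft()
            self._not_full.notify()
            return item

    def close(self):
        with self._not_empty:
            self._closed = True
            # Wake every waiter so blocked threads can observe the close.
            self._not_empty.notify_all()
            self._not_full.notify_all()
```

A typical use is to start several producer threads calling `put()`, several consumer threads looping on `get()` until it returns `None`, then join the producers, call `close()`, and join the consumers. The `while` loops around `wait()` are the part people most often get wrong with an `if`.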
A timesheet application
Dive into someone else's imperfect-at-best code and make it work. After finishing the final project for my 2nd programming class, I helped someone else make theirs work. In the context of what could run on a PDP-11/70 running Version 6 UNIX™ with 24x80 CRT terminals, it was rather neat, a semi-real time air traffic control simulator game, with planes coming from the upper corners and crossing paths as they went down to the catercornered runway.

Great idea, better than mine, but the code was awful, had three global state variables X, XX, and XXX ... plus U, UU, and UUU for the UFOs he'd added to make the game more fun ^_^.

I helped him reduce the complexity by removing the UFOs (it was still quite challenging enough), and getting it to work in general. This prepared me for the many future jobs I took working on the code bases of others (one of which, for example, taught me red-black trees for real), and which soon enough led to the extremes of software archaeology when you can't even ask anyone about the code.

Not that you necessarily want to seek out such work, it's hard and often thankless, but at its best it's also what paying down technical debt is about. And code you've written long ago can also be rather foreign when you come back to it....

Find a personal itch and scratch it. In my case, I knew little about reverse engineering closed source bins and drivers but had some hardware I was not yet ready to toss out. I learnt a lot about writing new device drivers based on the specs I managed to grok from poking and peeking and analysing the original ones.

Looking for a job as a fresh graduate. With no work history it is all bullshit multi-day interviews, giving presentations, showing you have 'people skills', and ridiculously high application-to-interview and interview-to-offer ratios.

My favourite was a small company that told me on the phone they 'like to get girls in for interviews'. I stupidly still went to a face-to-face (I was desperate) and got rejected. The reason was that I used a loop to implement something but didn't suggest I could copy and paste the loop innards 10 times, and therefore was 'holding things back'.