Building a simple shell in C – Part 3 (blog.ehoneahobed.com)
105 points by ehoneahobed 1310 days ago
6 comments

Surprising to see this article. I am a second-year CS student, and for one of the assignments in our OS course we are actually building a shell in C. A very simplistic one. Great to read.
Oregon State by any chance? It was a super cool exercise. I'd like to revisit it someday without a time crunch and build another.
Maybe not. I attended Florida Atlantic University and that was a project I did. The shell didn't do that much: it ran commands, supported redirection and pipes, and (I think) handled environment variables (for example, "ls $HOME").
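For anyone wondering what the "ls $HOME" part might look like, here is a minimal sketch, assuming only plain $NAME expansion. The function name expand_vars and its buffer-based interface are made up for illustration and are not from the article or any course assignment. Quoting, ${NAME} and special parameters like $? are left out on purpose.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the article): copies `in` to `out`,
 * replacing each $NAME with getenv("NAME"). Unset variables expand to
 * the empty string. Returns 0 on success, or -1 if the result (or a
 * variable name) does not fit in the buffers used by this sketch. */
int expand_vars(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);
    size_t o = 0;

    while (*in != '\0') {
        if (in[0] == '$' && (isalpha((unsigned char)in[1]) || in[1] == '_')) {
            /* Collect the variable name: [A-Za-z_][A-Za-z0-9_]* */
            char name[128];
            size_t n = 0;
            in++;                               /* skip the '$' */
            while (isalnum((unsigned char)*in) || *in == '_') {
                if (n + 1 >= sizeof name)
                    return -1;                  /* name too long */
                name[n++] = *in++;
            }
            name[n] = '\0';

            const char *val = getenv(name);
            if (val == NULL)
                val = "";                       /* unset: expands to nothing */
            size_t len = strlen(val);
            if (o + len >= outsz)
                return -1;                      /* no room for value + NUL */
            memcpy(out + o, val, len);
            o += len;
        } else {
            /* Ordinary character (including a '$' that starts no name). */
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *in++;
        }
    }
    if (o >= outsz)
        return -1;                              /* only when outsz == 0 */
    out[o] = '\0';
    return 0;
}
```

A shell would call something like expand_vars("ls $HOME", buf, sizeof buf) on each word before splitting it into argv and handing it to exec. A real shell has to do this per word and respect quoting, so treat this as the idea only.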
The book Advanced Programming in the Unix Environment also covers something similar iirc.
A good book I used to learn how to write a shell is "Using C with Curses, Lex and Yacc" by Axel-Tobias Schreiner (1990).
I have personally tried to build one in C, but the parsing was the real pain. I managed to write a tokenizer, barely figured out how to build an AST, and never figured out what to do with it. All the parsing tutorials are about parsing mathematical expressions, and I found it hard to adapt them to shell grammar.
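On the "what to do with the AST" part: one common answer is a recursive function that walks the tree and runs it, returning an exit status, much like an expression evaluator returns a number. Below is a rough sketch under that assumption. The node layout and the names node_kind, node and run_node are hypothetical, and only simple commands, &&, || and two-sided pipes are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical AST shape for illustration; a real shell grammar has many
 * more node kinds (redirections, subshells, loops, ...). */
enum node_kind { NODE_CMD, NODE_AND, NODE_OR, NODE_PIPE };

struct node {
    enum node_kind kind;
    char **argv;                /* NODE_CMD: NULL-terminated argument vector */
    struct node *left, *right;  /* other kinds: the two operands */
};

/* Wait for `pid` and turn its wait status into a shell-style exit code. */
static int wait_status(pid_t pid)
{
    int st;
    if (waitpid(pid, &st, 0) < 0)
        return 127;
    if (WIFEXITED(st))
        return WEXITSTATUS(st);
    if (WIFSIGNALED(st))
        return 128 + WTERMSIG(st);
    return 127;
}

/* Walk the tree and run it; the return value is the tree's exit status. */
int run_node(const struct node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case NODE_CMD: {
        pid_t pid = fork();
        if (pid < 0)
            return 127;
        if (pid == 0) {
            execvp(n->argv[0], n->argv);
            _exit(127);         /* only reached if exec failed */
        }
        return wait_status(pid);
    }
    case NODE_AND: {            /* a && b: run b only if a succeeded */
        int st = run_node(n->left);
        return st == 0 ? run_node(n->right) : st;
    }
    case NODE_OR: {             /* a || b: run b only if a failed */
        int st = run_node(n->left);
        return st != 0 ? run_node(n->right) : st;
    }
    case NODE_PIPE: {           /* a | b: each side runs in its own child */
        int fd[2];
        if (pipe(fd) < 0)
            return 127;
        pid_t lhs = fork();
        if (lhs == 0) {
            dup2(fd[1], STDOUT_FILENO);
            close(fd[0]); close(fd[1]);
            _exit(run_node(n->left));
        }
        pid_t rhs = fork();
        if (rhs == 0) {
            dup2(fd[0], STDIN_FILENO);
            close(fd[0]); close(fd[1]);
            _exit(run_node(n->right));
        }
        close(fd[0]); close(fd[1]);     /* parent must close both ends */
        if (lhs > 0)
            wait_status(lhs);
        return rhs > 0 ? wait_status(rhs) : 127;  /* status of last stage */
    }
    }
    return 127;
}
```

The parser's only job is then to build struct node values; for `false || sh -c "exit 3"` it would produce a NODE_OR whose children are two NODE_CMD nodes, and run_node on the root returns 3. The pipe case forks twice per simple command, which is wasteful but keeps the sketch short.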
Yes, a huge part of a shell is parsing, and C is a bad language for that.

If you want POSIX shell you'll have at least 5K lines of parsing code; if you want bash it's at least 10K lines. It's closer to 20K lines of C in bash itself.

There's really no way around that, and IMO the best answer is to use a different language -- which is ALSO hard, because many language runtimes don't support fork() or signals in the way that a shell needs.

(For example, CPython is actually closer than, say, Go, because it supports fork() and exec(), but even it has issues with signals, EINTR, etc.)

I wrote a bunch of posts on how Oil does it:

How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html

posts tagged #parsing-shell: https://www.oilshell.org/blog/tags.html?tag=parsing-shell#pa...

Oil Is Being Implemented "Middle Out" https://www.oilshell.org/blog/2022/03/middle-out.html

Wouldn't most projects use a parser generator anyway? That would make the choice of implementation language separate from the question of "what's the best language for parsing stuff".
Parser generators aren't widely used for implementing shells (or JavaScript engines, or C/C++ compilers, for that matter). IMO they're nice for designing languages, but not necessarily implementing them.

bash is actually one of the only shells that uses yacc, and the maintainer regards it as a mistake. It uses yacc for maybe 1/4 of the language and the rest is all hand written stuff intertwined with generated code. It's pretty messy.

See http://www.aosabook.org/en/bash.html

and e.g. https://www.oilshell.org/blog/2016/10/13.html

I might have issues later. For now, the peg/leg parser in Next Generation Shell is doing fine (with limited scripting around it to avoid repetition).

https://piumarta.com/software/peg/

https://github.com/ngs-lang/ngs/blob/bdfb2fd70162cd7183ac8d4...

TCL?
I actually took inspiration from https://www.oilshell.org/blog/2016/10/19.html#toc_1 when I implemented the tokenizer. Really liked the idea.
You should check out Crafting Interpreters!

http://craftinginterpreters.com

Wow this looks really interesting, thanks for sharing!
Parsing shell input is somewhat different from parsing other languages because keywords are contextual. For example, `if echo` and `echo if` are both legal, but `if` is only a keyword in the first example. This affects the design of the lexer.

Despite that, fish-shell still uses a traditional handwritten recursive descent parser. Link if you want to see: https://github.com/fish-shell/fish-shell/blob/master/src/ast...
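A toy illustration of the contextual-keyword point, assuming the input has already been split into words. This is not fish's code, just one possible way to track "command position"; the names tok_kind and classify and the short reserved-word list are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum tok_kind { TOK_KEYWORD, TOK_WORD, TOK_OP };

/* A small subset of shell reserved words, for illustration only. */
static bool is_reserved(const char *w)
{
    static const char *const words[] = {
        "if", "then", "else", "elif", "fi", "while", "do", "done"
    };
    for (size_t i = 0; i < sizeof words / sizeof words[0]; i++)
        if (strcmp(w, words[i]) == 0)
            return true;
    return false;
}

static bool is_operator(const char *w)
{
    return strcmp(w, ";") == 0 || strcmp(w, "|") == 0 ||
           strcmp(w, "&&") == 0 || strcmp(w, "||") == 0;
}

/* Classify `n` pre-split words into `kinds` (which must hold n entries).
 * A reserved word counts as a keyword only in command position: at the
 * start of input, after an operator, or right after another keyword. */
void classify(const char *const words[], size_t n, enum tok_kind kinds[])
{
    assert(n == 0 || (words != NULL && kinds != NULL));
    bool command_position = true;

    for (size_t i = 0; i < n; i++) {
        if (is_operator(words[i])) {
            kinds[i] = TOK_OP;
            command_position = true;    /* a new command starts next */
        } else if (command_position && is_reserved(words[i])) {
            kinds[i] = TOK_KEYWORD;
            command_position = true;    /* e.g. a command follows `if` */
        } else {
            kinds[i] = TOK_WORD;
            command_position = false;   /* the rest are arguments */
        }
    }
}
```

With this, the words of `if echo` come out as keyword, word, while the words of `echo if` come out as word, word. The lexer cannot decide this from the characters alone; it needs one bit of state from the parser (or its own approximation of it, as here).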

Shameless Plug: a simple shell in ~60 lines of Go: https://simjue.pages.dev/post/2018/07-01-go-unix-shell/
lol glad to see this a week after my building a simple shell in C project was due