by cassepipe 1314 days ago
I have personally tried to build one in C, but the parsing was the real pain. I managed to write a tokenizer, barely figured out how to build an AST, and never figured out what to do with it. All the parsing tutorials are about parsing mathematical expressions, and I found it hard to adapt them to shell grammar.
3 comments

Yes, a huge part of a shell is parsing, and C is a bad language for that.

If you want POSIX shell you'll have at least 5K lines of parsing code; if you want bash it's at least 10K lines. It's closer to 20K lines of C in bash itself.

There's really no way around that, and IMO the best answer is to use a different language -- which is ALSO hard, because many language runtimes don't support fork() or signals in the way that a shell needs.

(e.g. CPython is actually closer than say Go because it supports fork() and exec(), but even it has issues with signals, EINTR, etc.)
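For illustration, here is a minimal sketch of the fork()/exec() pattern a shell needs from its runtime, written against CPython's `os` module. The function name `run_command` is made up for this example, and this is not how Oil does it. It ignores job control, process groups, and signal handlers, which is where the real trouble starts. As far as I know, Python 3.5+ retries `waitpid` on EINTR by itself (PEP 475), so the sketch doesn't do that by hand.

```python
import os
import sys


def run_command(argv):
    """Fork, exec argv in the child, and return a shell-style exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed (e.g. command not found). 127 is the status
            # shells conventionally report for that case.
            os._exit(127)

    # Parent: block until the child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # Shells conventionally report 128 + signal number.
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    # Run a child interpreter that exits with status 3.
    print(run_command([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

This only works on Unix-like systems, since `os.fork` is not available on Windows.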

I wrote a bunch of posts on how Oil does it:

How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html

posts tagged #parsing-shell: https://www.oilshell.org/blog/tags.html?tag=parsing-shell#pa...

Oil Is Being Implemented "Middle Out" https://www.oilshell.org/blog/2022/03/middle-out.html

Wouldn't most projects use a parser generator anyway? That would make the choice of implementation language separate from the question of "what's the best language for parsing stuff".
Parser generators aren't widely used for implementing shells (or JavaScript engines, or C/C++ compilers, for that matter). IMO they're nice for designing languages, but not necessarily implementing them.

bash is actually one of the only shells that uses yacc, and the maintainer regards it as a mistake. It uses yacc for maybe 1/4 of the language and the rest is all hand written stuff intertwined with generated code. It's pretty messy.

See http://www.aosabook.org/en/bash.html

and e.g. https://www.oilshell.org/blog/2016/10/13.html

I might have issues later. For now, the peg/leg parser in Next Generation Shell is doing fine (with limited scripting around it to avoid repetition).

https://piumarta.com/software/peg/

https://github.com/ngs-lang/ngs/blob/bdfb2fd70162cd7183ac8d4...

TCL?
I actually took inspiration from https://www.oilshell.org/blog/2016/10/19.html#toc_1 when I implemented the tokenizer. Really liked the idea.
You should check out Crafting Interpreters!

http://craftinginterpreters.com

Wow this looks really interesting, thanks for sharing!
Parsing shell input is somewhat different than other languages because keywords are contextual. For example `if echo` and `echo if` are both legal, but `if` is only a keyword in the first example. This affects the design of the lexer.

Despite that, fish-shell still uses a traditional handwritten recursive descent parser. Link if you want to see: https://github.com/fish-shell/fish-shell/blob/master/src/ast...
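To make the contextual-keyword point concrete, here is a toy sketch of a lexer that only recognizes keywords in command position. It is not fish's implementation. The names `KEYWORDS`, `COMMAND_STARTERS`, and `tokenize` are invented for the example, and it cheats by splitting on whitespace, so operators like `;` need spaces around them and quoting is not handled at all.

```python
# Words that are keywords, but only when they appear in command position.
KEYWORDS = {"if", "then", "else", "elif", "fi", "while", "do", "done"}

# Operators after which the next word starts a new command.
COMMAND_STARTERS = {";", "|", "&&", "||"}


def tokenize(line):
    """Tag each whitespace-separated word as KEYWORD, OP, or WORD.

    The lexer tracks whether the next word is in command position
    (start of line, after an operator, or after a keyword). Only there
    is a word like `if` treated as a keyword; elsewhere it is a plain
    argument.
    """
    tokens = []
    command_position = True
    for word in line.split():
        if word in COMMAND_STARTERS:
            tokens.append(("OP", word))
            command_position = True
        elif command_position and word in KEYWORDS:
            tokens.append(("KEYWORD", word))
            # A command follows keywords such as `if` or `then`.
            command_position = True
        else:
            tokens.append(("WORD", word))
            # Everything after the command name is an argument.
            command_position = False
    return tokens
```

With this, `tokenize("if echo")` tags `if` as a KEYWORD, while `tokenize("echo if")` tags both words as WORD. A real shell lexer needs more state than one boolean (quoting, here-docs, assignments, and so on), but the idea of the parser or lexer carrying a "command position" mode is the same.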