by chubot 1310 days ago
Yes, a huge part of a shell is parsing, and C is a bad language for that.

If you want POSIX shell you'll have at least 5K lines of parsing code; if you want bash it's at least 10K lines. It's closer to 20K lines of C in bash itself.

There's really no way around that, and IMO the best answer is to use a different language -- which is ALSO hard, because many language runtimes don't support fork() or signals in the way that a shell needs.

(E.g. CPython is actually closer than, say, Go because it supports fork() and exec(), but even it has issues with signals, EINTR, etc.)
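
To make the fork()/exec() point concrete, here is a minimal sketch in Python of what a shell does for every external command. The `run` helper is a hypothetical name for illustration, not code from Oil or bash, and it assumes a POSIX system. Note that since Python 3.5 (PEP 475) calls like `os.waitpid` are retried automatically on EINTR, which is convenient here but is also an example of the runtime making a decision that a shell may want to make itself.

```python
import os

def run(argv):
    """Fork, exec argv in the child, and return a shell-style exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image. execvp only returns on failure,
        # in which case it raises OSError.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # Simplification: shells use 127 for "not found" and 126 for
            # "found but not executable"; this sketch always uses 127.
            os._exit(127)
    # Parent: wait for the child and decode the raw wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # Same convention shells use for $?: 128 + signal number.
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)
```

Usage would look like `run(["sh", "-c", "exit 3"])`, which returns the child's exit code. Job control, redirections, and trap handling are all left out; those are where the runtime issues tend to show up.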

I wrote a bunch of posts on how Oil does it:

How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html

posts tagged #parsing-shell: https://www.oilshell.org/blog/tags.html?tag=parsing-shell#pa...

Oil Is Being Implemented "Middle Out" - https://www.oilshell.org/blog/2022/03/middle-out.html


Wouldn't most projects use a parser generator anyway? Making the choice of language separate from "what's the best language for parsing stuff".
Parser generators aren't widely used for implementing shells (or JavaScript engines, or C/C++ compilers, for that matter). IMO they're nice for designing languages, but not necessarily implementing them.

bash is actually one of the few shells that uses yacc, and the maintainer regards it as a mistake. It uses yacc for maybe 1/4 of the language, and the rest is handwritten code intertwined with the generated code. It's pretty messy.

See http://www.aosabook.org/en/bash.html

and e.g. https://www.oilshell.org/blog/2016/10/13.html
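
For anyone who hasn't seen the hand-written style, here is a toy recursive-descent parser in Python for a tiny shell-like grammar (words, `|`, `&&`, `||`). It is only a sketch to show the shape of the approach: the grammar, the function names, and the tuple-based tree are all made up for illustration, and it ignores quoting, redirections, substitutions, and everything else that makes real shell parsing hard. It is not how bash or Oil is structured.

```python
import re

# Operators are tried before words so that "||" is not read as two "|".
TOKEN_RE = re.compile(r"\s*(\|\||&&|\||[^\s|&]+)")
OPERATORS = ("|", "&&", "||")

def tokenize(line):
    """Split a line into words and operators (no quoting support)."""
    tokens, pos = [], 0
    line = line.rstrip()
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_command(tokens):
    """command := WORD+"""
    words = []
    while tokens and tokens[0] not in OPERATORS:
        words.append(tokens[0])
        tokens = tokens[1:]
    if not words:
        raise SyntaxError("expected a command")
    return ("command", words), tokens

def parse_pipeline(tokens):
    """pipeline := command ('|' command)*"""
    cmd, rest = parse_command(tokens)
    commands = [cmd]
    while rest and rest[0] == "|":
        cmd, rest = parse_command(rest[1:])
        commands.append(cmd)
    return ("pipeline", commands), rest

def parse_and_or(tokens):
    """and_or := pipeline (('&&' | '||') pipeline)*, left-associative"""
    node, rest = parse_pipeline(tokens)
    while rest and rest[0] in ("&&", "||"):
        op = rest[0]
        right, rest = parse_pipeline(rest[1:])
        node = (op, node, right)
    return node, rest

def parse(line):
    """Parse one line into a nested tuple tree."""
    node, _ = parse_and_or(tokenize(line))
    return node
```

Each grammar rule becomes one function, which is why this style is easy to extend with the context-sensitive special cases that shell is full of.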

I might run into issues later. For now, the peg/leg parser in Next Generation Shell is doing fine (with a little scripting around it to avoid repetition).

https://piumarta.com/software/peg/

https://github.com/ngs-lang/ngs/blob/bdfb2fd70162cd7183ac8d4...

TCL?
I actually took inspiration from https://www.oilshell.org/blog/2016/10/19.html#toc_1 when I implemented the tokenizer. Really liked the idea.