Hacker News new | ask | show | jobs
by chubot 3286 days ago
The lexer for bash is inside that file, parse.y -- see yylex(), which calls read_token(). It doesn't use lex; it's written by hand.

I'm not sure what you mean that shells don't do string manipulation. Almost ALL they do is string manipulation.

That's true for the shell interpreter, which has to make sense of the input program, and for user programs, which are processing argv strings like file system paths, and stdin.

There are actually a handful of different parsers inside bash, which I mention here: http://www.oilshell.org/blog/2016/10/26.html

Brace substitution is another little parser as well. And globbing, and regex, both of which need their own parsers. (bash has its own glob parser, but some shells use libc's glob implementation). bash is really at 4-7 sublanguages in one.

The annoying thing about shell is that it makes it impossible NOT to do string manipulation in your program, because there is all this implicit stuff like word splitting.

1 comments

One of my takeaways from TFA was along the lines of…

"Hmmm… he's using strtok, that's not how a real shell would work. What would a minimal shell, without scripting, pipes, redirects etc. do? Just correctly parsing legal file paths (which TFA needs to correctly implement 'cd') is well out of scope of a small article like this."

Right, a real shell obviously can't use strtok. If you're leaving out pipes, redirects, and any control flow, then separating a shell string into words for the argv[] array is fairly similar to lexing a C-escaped string (e.g. in C, Java, Python, JavaScript).

You have backslashes, single quotes, and double quotes basically. Traditionally this is done with switch statement in a loop in C.

But that is not a good approach for a real shell. Even inside double quotes you can have a fully recursive program, like:

    $ echo "hi ${v1:-A${v2:-X${v3}Y}B}"
    hi AXYB
Once you have recursion then you need some kind of parser, not just a lexer.
I've been mocked on HN for saying this before but Bash and other shells of it's ilk are programming languages in their own right. I mean sure you're dependant on the suite of tools in $PATH to do anything useful, but that's not that much different to the standard libraries that make modern languages so powerful.
I have have a hard time seeing what there is to mock about your opinion of shells. I absolutely consider them languages - better at some things, worse at others.