by chubot
3286 days ago
The lexer for bash is inside that file, parse.y -- see yylex(), which calls read_token(). It doesn't use lex; it's written by hand.

I'm not sure what you mean when you say that shells don't do string manipulation. Almost ALL they do is string manipulation. That's true for the shell interpreter, which has to make sense of the input program, and for user programs, which process argv strings like file system paths, as well as stdin.

There are actually a handful of different parsers inside bash, which I mention here: http://www.oilshell.org/blog/2016/10/26.html

Brace expansion is another little parser as well. And globbing and regex both need their own parsers. (bash has its own glob parser, but some shells use libc's glob implementation.) bash is really 4-7 sublanguages in one.

The annoying thing about shell is that it makes it impossible NOT to do string manipulation in your program, because there is all this implicit stuff like word splitting.
"Hmmm… he's using strtok, that's not how a real shell would work. What would a minimal shell, without scripting, pipes, redirects etc. do? Just correctly parsing legal file paths (which TFA needs to correctly implement 'cd') is well out of scope of a small article like this."
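
To make the strtok point concrete, here is a minimal sketch in Python rather than C. It assumes that `str.split()` is a fair stand-in for a `strtok()` loop over whitespace, and it uses the standard library's `shlex.split()` as an approximation of POSIX-style quoting. That is not bash's actual parser, and it does no expansion, globbing, or word splitting of variables. The sample paths are made up for illustration.

```python
import shlex

# Hypothetical input line: a 'cd' into a directory whose name contains spaces.
line = 'cd "My Documents/notes dir"'

# Naive whitespace splitting, roughly what a strtok(line, " \t\n") loop yields.
naive = line.split()

# Quote-aware tokenizing (POSIX-like rules; an approximation, not bash itself).
quoted = shlex.split(line)

# Backslash-escaped spaces are another legal way to spell the same kind of path.
escaped = shlex.split(r'cd My\ Documents')

print(naive)    # ['cd', '"My', 'Documents/notes', 'dir"']
print(quoted)   # ['cd', 'My Documents/notes dir']
print(escaped)  # ['cd', 'My Documents']
```

The naive version hands `cd` three broken arguments, quote characters included, while the quote-aware version produces the single path the user meant. Even this small step needs a real tokenizer with quoting state, which is why it does not fit in a strtok loop.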