by kevindong
3296 days ago
IRL, shells don't do string manipulation (well, technically everything becomes string manipulation at some point, but not in the normal sense of the term in this context). Shells generally use a lexer to split input into tokens (generally using regex) [0] and then make sense of those tokens using a parser (typically produced by a parser generator, the most famous of which is yacc [1]).
[0]: I was going to link to Bash's lex file here, but they appear to do something funky which would require a non-trivial amount of time to find, understand, and write up here. So, you'll just have to take my word on this. I give you Wikipedia as a substitute: https://en.wikipedia.org/wiki/Lexical_analysis
[1]: https://git.savannah.gnu.org/cgit/bash.git/tree/parse.y
|
I'm not sure what you mean when you say that shells don't do string manipulation. Almost ALL they do is string manipulation.
That's true for the shell interpreter, which has to make sense of the input program, and for user programs, which are processing argv strings like file system paths, and stdin.
There are actually a handful of different parsers inside bash, which I mention here: http://www.oilshell.org/blog/2016/10/26.html
Brace expansion is another little parser as well. And globbing, and regex, both of which need their own parsers. (bash has its own glob parser, but some shells use libc's glob implementation). bash is really 4-7 sublanguages in one.
The annoying thing about shell is that it makes it impossible NOT to do string manipulation in your program, because there is all this implicit stuff like word splitting.
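To make the word-splitting point concrete, here is a minimal sketch. It assumes a POSIX-style shell with the default IFS; `count_args` and the `path` value are hypothetical names made up for illustration.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

path="my file.txt"   # a single string that happens to contain a space

# Unquoted expansion: the shell implicitly splits the value on IFS whitespace,
# so the helper receives two arguments ("my" and "file.txt").
unquoted=$(count_args $path)

# Quoted expansion: splitting is suppressed, so the helper receives one argument.
quoted=$(count_args "$path")

echo "unquoted: $unquoted, quoted: $quoted"
```

The program never asked for the string to be split; the unquoted expansion did it anyway, which is why quoting ends up being something you have to think about on nearly every line.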