I haven't read the tut yet but… could I write a language like Python with that tutorial? If not, what would be the best tutorial for indented syntax like Python's?
There is a trick for parsing "indented" syntax which is not covered by this tutorial: the lexer (not the parser) keeps track of indentation. The lexer inserts Indent and Dedent tokens, which the parser treats like the { and } tokens in a C-syntax language.
Now your parser implements a context-free grammar although the language is not context-free.
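A minimal sketch of that lexer trick in Python, under some loud assumptions: indentation is spaces only, tokens are just whitespace-separated words, and the names tokenize, INDENT, DEDENT, WORD and NEWLINE are made up for illustration. The lexer keeps a stack of indentation widths and emits INDENT when a line goes deeper and one DEDENT per level when a line comes back out.

```python
def tokenize(source):
    """Return (kind, text) tokens; INDENT/DEDENT come from leading spaces."""
    indents = [0]  # stack of indentation widths of the currently open blocks
    tokens = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > indents[-1]:
            indents.append(width)
            tokens.append(("INDENT", ""))
        while width < indents[-1]:
            indents.pop()
            tokens.append(("DEDENT", ""))
        if width != indents[-1]:
            # dedented to a column that never opened a block
            raise SyntaxError("inconsistent dedent")
        tokens.extend(("WORD", word) for word in line.split())
        tokens.append(("NEWLINE", ""))
    # close any blocks still open at end of input
    while len(indents) > 1:
        indents.pop()
        tokens.append(("DEDENT", ""))
    return tokens
```

The parser then only ever sees a flat token stream with explicit block delimiters, so the grammar for blocks can look exactly like a brace-based one.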
First, if you know Python, play with writing your own interpreters, parsers, etc. in Python itself. It'll be slow, sure, but you'll learn a lot.
Next, try using Python's ast module to take 'real' Python code and turn it into an AST which you can then do stuff with. For instance, you could take a subset of Python and turn it into C, or JavaScript, or whatever. This will be fun, but don't expect to be able to turn all of Python into those other languages: there are just WAAY too many edge cases, differences in scope, etc. But for a subset of the language, however much you choose to use, it's pretty fun.
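To give a feel for it, here is a small sketch that uses the standard ast module (ast.parse and ast.NodeVisitor) to turn a tiny subset of Python into JavaScript-looking text. The class name ToJS, the helper to_js, and the choice of subset (single assignments and + - * / on names and numbers) are my own assumptions for illustration; anything outside that subset is rejected rather than translated.

```python
import ast

class ToJS(ast.NodeVisitor):
    """Translate a tiny subset of Python (assignments, arithmetic) to JS text."""
    OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit_Module(self, node):
        return "\n".join(self.visit(stmt) for stmt in node.body)

    def visit_Assign(self, node):
        # only the first target is handled; "a = b = 1" is outside the subset
        target = self.visit(node.targets[0])
        return f"let {target} = {self.visit(node.value)};"

    def visit_Expr(self, node):
        return self.visit(node.value) + ";"

    def visit_BinOp(self, node):
        op = self.OPS.get(type(node.op))
        if op is None:
            raise NotImplementedError(f"unsupported operator: {type(node.op).__name__}")
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        return repr(node.value)

    def generic_visit(self, node):
        # every node type without a visit_* method above lands here
        raise NotImplementedError(f"unsupported syntax: {type(node).__name__}")

def to_js(source):
    return ToJS().visit(ast.parse(source))

print(to_js("x = 1 + 2 * 3"))  # let x = (1 + (2 * 3));
```

Note how much is left out even for this toy: booleans, strings with escapes, integer division and scoping would each need their own decisions, which is exactly the edge-case problem mentioned above.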
In general, parsing syntax should be the LEAST complex issue in writing a programming language. Playing with the above ideas in Python will be a lot of fun and you'll learn a lot anyway.
Some resources to look for... writing a LISP in python:
Another way to approach it is to drop markers using a preprocessor after indentation. A group of friends and I were hacking on something similar and were trying to allow Lisp syntax sans parentheses. We created a preprocessor that would wrap indented child lines with the appropriate parentheses. Here's the OCaml file below:
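Independently of that OCaml file, the general idea can be sketched in a few lines of Python. This is my own guess at the technique, not the code described above: the function name parenthesize is made up, indentation is assumed to be spaces only, and every line is wrapped in parentheses together with its more-indented child lines.

```python
def parenthesize(source):
    """Wrap each line, plus its more-indented child lines, in one pair of parens."""
    out = []
    open_indents = []  # indentation of every line whose ")" is still pending
    for line in source.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))
        # a line at the same or shallower depth closes the pending forms
        while open_indents and open_indents[-1] >= indent:
            open_indents.pop()
            out.append(")")
        prefix = " " if out else ""
        out.append(prefix + "(" + line.strip())
        open_indents.append(indent)
    # close whatever is still open at end of input
    out.append(")" * len(open_indents))
    return "".join(out)

print(parenthesize("define (square x)\n  * x x"))  # (define (square x) (* x x))
```

One consequence of this simplification is that a line holding a single atom, such as x, becomes (x), which a Lisp would read as a call; a real preprocessor would need a rule for that case.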
It will help you only in a very limited way. Unfortunately, you will need to hunt through several sources to find a half-decent implementation of any decent-enough language.
But do something first. So take this tutorial and complete it. If after that you want to move on...
I'm working on my own language, and after reading hundreds of links, blogs, books, etc., I have found that a lot of the material is heavily biased towards parsing, Lisps, Forths, and ultra-heavy monad worship or academic complexity that will make your head spin. After all that reading, I'm still confused, because a lot of things are not as simple or obvious as making a blog or an e-commerce site.
Syntax is very important, but after reading a lot, I think that doing the parsing is not the best first step.
INSTEAD DO THE AST WALKER. Seriously. Get clear on how the AST can be processed and focus on that. Changes in the AST will have a heavy impact on your parsing code, and a lot of questions need to be clarified first. For example, from the link above:
My first questions? How to make a REPL or a debugger, how to implement pattern matching and type checking, whether an interpreter can be fast enough, whether it is possible to avoid building a whole VM for it, etc...
There are a LOT of practical questions left as "an exercise for the reader" that need clarification.
So, my advice:
- Make a list of what you want for your language. Then think harder, try to remove features, and get to the core of it.
- From the above, start to understand the details. For example: I want Go-style concurrency, so... how do I implement Go-style concurrency? If that is not solved, your syntax is pointless. That is why you should try to fill in the "core" of the language as much as possible before committing to its GUI (aka: syntax).
- Then write an interpreter. It is FAR easier. Even if you want to write a compiler, starting here is easier.
- On the side, play with your syntax and imagine what writing in the language could be like. Then start to think: "to make this syntax do that, what do I have to do inside the guts of the compiler?" But you can delay the parser for a while.
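To make the "AST walker first, parser later" advice concrete, here is a minimal sketch in Python of a tree-walking interpreter with no parser at all: the program is built directly as AST objects. The node classes (Num, Var, BinOp, Let) and the evaluate function are hypothetical names picked for this example, not taken from any tutorial.

```python
import operator
from dataclasses import dataclass

@dataclass
class Num:
    value: float

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Let:
    name: str
    value: object
    body: object

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, env):
    """Walk the tree recursively; env maps variable names to values."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, BinOp):
        return OPS[node.op](evaluate(node.left, env), evaluate(node.right, env))
    if isinstance(node, Let):
        # evaluate the body in a copy of env extended with the new binding
        inner = {**env, node.name: evaluate(node.value, env)}
        return evaluate(node.body, inner)
    raise TypeError(f"unknown node: {node!r}")

# let x = 2 in x * (x + 3)
program = Let("x", Num(2), BinOp("*", Var("x"), BinOp("+", Var("x"), Num(3))))
print(evaluate(program, {}))  # 10
```

Questions like "how do environments work?" or "what does pattern matching need from the AST?" can be explored by adding node types here, and whatever syntax you settle on later only has to produce these same objects.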
When the time comes for parsing, it is probably better to write a top-down parser and/or a Pratt parser. I don't waste time with parser generators. If you disagree, that's OK with me ;)
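For reference, a Pratt parser for arithmetic fits in about twenty lines. This is a sketch under assumptions: the binding-power numbers, the tuple-shaped AST, and the names tokenize_expr and parse_expr are all illustrative choices, and there is no error handling for malformed input.

```python
import re

# higher binding power = tighter binding
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
PREFIX_POWER = 30  # unary minus binds tighter than any binary operator

def tokenize_expr(text):
    return re.findall(r"\d+|[-+*/()]", text)

def parse_expr(tokens, min_bp=0):
    """Parse tokens (consumed from the front of the list) into nested tuples."""
    token = tokens.pop(0)
    if token == "(":
        left = parse_expr(tokens, 0)
        tokens.pop(0)  # skip the closing ")"
    elif token == "-":
        left = ("neg", parse_expr(tokens, PREFIX_POWER))
    else:
        left = int(token)
    # keep absorbing operators that bind tighter than our caller's operator
    while tokens and tokens[0] in BINDING_POWER and BINDING_POWER[tokens[0]] > min_bp:
        op = tokens.pop(0)
        right = parse_expr(tokens, BINDING_POWER[op])
        left = (op, left, right)
    return left

print(parse_expr(tokenize_expr("1 + 2 * 3")))  # ('+', 1, ('*', 2, 3))
```

Because the right operand is parsed with the operator's own binding power and the loop requires a strictly greater one, operators of equal precedence group to the left, so 1 - 2 - 3 comes out as (1 - 2) - 3.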
I have found that F# is great for this stuff. However, OCaml has more code samples (i.e., ML languages are made for this!). Lisp will work great if you are insane enough to like it ;). Hardcore C developer? There is tons of code to look at, but not much clarity in it.
Writing an interpreter is much, much harder than implementing a simple, straightforward compiler.
An interpreter is unavoidably convoluted: it is full of very complex things like environment handling, and it cannot be split into smaller parts in any sane way.
A compiler, on the other hand, is nothing but a trivial sequence of very simple transforms (as simple as you like), each nothing but a couple of term-rewriting rules. You do not need a Turing-complete language to implement a compiler.
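As a rough illustration of what "a sequence of simple rewrites" can look like, here is a toy pipeline in Python over tuple-shaped terms such as ("+", "x", 2). Everything here is an assumption made for the example: the term format, the two passes (desugar_neg, fold_constants), and the made-up stack-machine instructions PUSH, LOAD and OP. Whether this style scales to a real language is of course the point under debate.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def rewrite(term, rule):
    """Apply rule bottom-up to every node of a tuple-based term."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(rewrite(child, rule) for child in term[1:])
    return rule(term)

def desugar_neg(term):
    # ("neg", x)  =>  ("-", 0, x)
    if isinstance(term, tuple) and term[0] == "neg":
        return ("-", 0, term[1])
    return term

def fold_constants(term):
    # (op, int, int)  =>  int
    if (isinstance(term, tuple) and term[0] in OPS
            and isinstance(term[1], int) and isinstance(term[2], int)):
        return OPS[term[0]](term[1], term[2])
    return term

def emit(term):
    """Flatten a binary-operator term into stack-machine instructions."""
    if isinstance(term, tuple):
        return emit(term[1]) + emit(term[2]) + [("OP", term[0])]
    if isinstance(term, str):
        return [("LOAD", term)]  # strings stand for variable names
    return [("PUSH", term)]

def compile_term(term):
    for rule in (desugar_neg, fold_constants):
        term = rewrite(term, rule)
    return emit(term)

print(compile_term(("+", "x", ("neg", ("*", 2, 3)))))
# [('LOAD', 'x'), ('PUSH', -6), ('OP', '+')]
```

Each pass is a pure function from terms to terms, so passes can be tested and reordered in isolation, which is the modularity argument being made above.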
Maybe if the compiler is very simple. But I have said:
"""
How to make a REPL or a debugger, how to implement pattern matching and type checking, whether an interpreter can be fast enough, whether it is possible to avoid building a whole VM for it, etc...
"""
A REPL, a debugger, no VM, and the other stuff look easier with interpreters than with compilers.
However, I will be happy to be wrong: I want the simplest way to get where I want to go.
Currently: working with F#, and I want a REPL, a debugger, AGDTs, pattern matching, Go-like concurrency, iterators, semi-functional...
Where I hit a big block is how to do interop from an interpreter (i.e., call .NET methods), and that looks easier with a compiler...