It's a long regex, but it's just whitespace followed by an alternation with 5 different types of data: split-unquote, special characters, strings, comments, symbols. The string tokenizing branch is a bit complicated because it has to allow internal escaping of quotes. Early iterations of the guide didn't explain the regex in detail but the section now describes each of the regex components.
Step 0, so I didn't get very far.
https://github.com/kanaka/mal/blob/master/process/guide.md#s...