by nandemo 4664 days ago
> I find context-free lexing to be a serious limitation on parsing.

What is this supposed to mean? Context-free and context-sensitive are well-defined terms in formal language theory, but OP seems to be using them in a non-standard way.

In any case, when we talk about lexers it's normally understood we're parsing a regular language, which is simpler than parsing general context-free languages (let alone context-sensitive) [1]. If you're doing something that cannot be expressed with a plain regular expression, then it's probably not lexing in the first place.

    Age 37
    Group 15-B
    Phone +49.(0).123.456
Maybe I'm dense, but I can't see why this would be problematic. It would help if OP said what he tried and why it didn't work.

[1] http://en.wikipedia.org/wiki/Chomsky_hierarchy


I hope lexers aren't only for parsing regular languages! I definitely want my lexer to be able to parse balanced parens.

Interesting. You mean you use a language where "sequence of balanced parens" is a token?

No, I use many languages that care about whether parens (each one of which is its own token) are balanced. The language of balanced parens (a language that includes "" and "((((()()(()))())))" but not "()())(()" or "(((()))))") is a simpler language that also cares about parens being balanced. It was also the first language I saw in compilers class that was not regular.
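
To make those four example strings concrete, here is a minimal Python sketch of a recognizer for that language. The name `is_balanced` is made up for illustration. The point is the unbounded `depth` counter, which is exactly what a finite automaton (and so a plain regular expression) does not have.

```python
def is_balanced(s: str) -> bool:
    """Return True if s is a (possibly empty) string of balanced parens."""
    depth = 0  # unbounded counter; a finite automaton has no equivalent
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close paren with no matching open paren
                return False
        else:
            return False  # only parens belong to this language
    return depth == 0  # every open paren must have been closed


# The four example strings from the comment above.
for sample in ["", "((((()()(()))())))", "()())(()", "(((()))))"]:
    print(repr(sample), is_balanced(sample))
```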

Then you don't need the lexer to parse balanced parens. The job of the lexer is to turn

    ("Foo(" + bar)
into something like

    OpenParen String Op Identifier CloseParen
Then the (syntactic) parser takes over. This is pretty standard.
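
A rough Python sketch of that split, for illustration only: the token names come from the example above, while the regex patterns and the `tokenize` helper are assumptions, not any particular lexer's API. Every pattern is an ordinary regular expression, and the paren inside the string literal never becomes an OpenParen token.

```python
import re

# Token names follow the example above; the patterns are illustrative guesses.
TOKEN_SPEC = [
    ("String", r'"[^"\\]*(?:\\.[^"\\]*)*"'),  # double-quoted, backslash escapes
    ("OpenParen", r"\("),
    ("CloseParen", r"\)"),
    ("Op", r"[+\-*/]"),
    ("Identifier", r"[A-Za-z_]\w*"),
    ("Skip", r"\s+"),  # whitespace is matched but not emitted
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs using only regular expressions."""
    pos = 0
    tokens = []
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "Skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize('("Foo(" + bar)')
print(" ".join(kind for kind, _ in tokens))
```

Checking that the OpenParen and CloseParen tokens pair up is then the parser's job, working on the token stream rather than on raw characters.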