| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by dom0 3302 days ago
	Most programming languages are context-sensitive [1] (at least with unbounded nesting), so parsing them correctly and efficiently is mathematically impossible. All practical implementations have to take shortcuts. [1] Mainly due to begin..end blocks, curly braces or indentation (as in Python)

2 comments

Edmond 3302 days ago

Are most programming languages really context-sensitive? or aren't they mostly context-free?

My days of fiddling with writing parsers are long ago (https://www.codeproject.com/Articles/7035/A-Java-Language-ID...) but if I remember correctly most languages aim for at most a LL(2) grammar, meaning they are designed so the parser doesn't have to peek more than two tokens ahead before being able to make a correct determination.

link

hornetblack 3301 days ago

C has some fun ones:

a * b;

Is either: a times b if a is a var, or declare a varible b with type a. If a is a typedef.

Also:

some_type b = {a, b, c, d};

Is only valid if some_type is an array or a struct. Which is possibly defined elsewhere in the source.

(I tend to see this syntax in some code bases (some_type) { a, b, c, d }, which is a bit better).

link

wtetzner 3302 days ago

Context free grammars are perfectly capable of expressing matched curly braces, even with unbounded nesting. Am I missing something?

link

dom0 3302 days ago

Yes, thinking about it that alone is not sufficient. Still, I'd claim that most languages are not context free.

CPP (Pre-processor) aside, C is not context free due to typedef making identifiers ambiguous. Also if-then-else? Since C++ templates are turing-complete, the grammar is probably unrestricted.

Python is not context free due to

    if ...:
        stmt1
        if ...:
           stmt2
        stmt3

stmt3 and stmt1 have to share the same level of indentation to form a valid Python program, but they might contain arbitrary indentation within brackets.

link