by jitl 3309 days ago
This is what Lighttable aimed to do - create an extensible, abstracted code editor for many languages. See also CodeBubbles (Java) and most Smalltalk environments.

http://lighttable.com/2012/04/12/light-table-a-new-ide-conce...


> It’s no secret that I really like Clojure and as a lisp, it was the easiest language for me to start the prototype with, but there’s no reason this couldn’t be done for any language with a dynamic runtime. The rest is mostly simple analysis of an AST and some clever inference.

I have looked into this. It is kind of criminal that for most real world languages (Ruby[1], C[2] etc), it's not possible to just define a grammar and throw it at a standard parser generator for them - they generally have one or two quirks which make this infeasible.

In my ideal alternate universe it would be considered unthinkable to publish a language without also publishing a grammar in a standard format for said language, which can then be plugged into your favourite text/semantic/tree editor. Our tools should dictate our languages, not the other way around.

[1] http://programmingisterrible.com/post/42432568185/how-to-par...

[2] https://en.wikipedia.org/wiki/The_lexer_hack

Most programming languages are context-sensitive [1] (at least with unbounded nesting), so parsing them correctly and efficiently is mathematically impossible. All practical implementations have to take shortcuts.

[1] Mainly due to begin..end blocks, curly braces or indentation (as in Python)

Are most programming languages really context-sensitive? or aren't they mostly context-free?

My days of fiddling with writing parsers are long gone (https://www.codeproject.com/Articles/7035/A-Java-Language-ID...), but if I remember correctly most languages aim for at most an LL(2) grammar, meaning they are designed so the parser doesn't have to peek more than two tokens ahead before being able to make a correct determination.
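A minimal sketch of what "peeking at most two tokens" means in practice. The toy statement forms (assignment vs. call) and the names `lex` and `classify_statement` are made up for illustration and are not taken from any real language grammar:

```python
import re

# Toy token pattern: identifiers, integers, and a few punctuation marks.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[=();])")

def lex(text):
    """Split text into identifier, number, and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def classify_statement(text):
    """Choose a production after looking at no more than two tokens."""
    tokens = lex(text)
    first, second = (tokens + [None, None])[:2]
    if first is None or not (first[0].isalpha() or first[0] == "_"):
        raise SyntaxError("statement must start with an identifier")
    # The first token alone is ambiguous; the second one settles it.
    if second == "=":
        return "assignment"
    if second == "(":
        return "call"
    raise SyntaxError(f"unexpected token after {first!r}: {second!r}")
```

Here `classify_statement("x = 1;")` gives "assignment" and `classify_statement("f(1);")` gives "call"; in both cases the decision is made from the first two tokens only.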

C has some fun ones:

a * b;

This is either a times b, if a is a variable, or a declaration of a variable b of type pointer-to-a, if a is a typedef.

Also:

some_type b = {a, b, c, d};

This is only valid if some_type is an array or a struct, which is possibly defined elsewhere in the source.

(In some code bases I tend to see the syntax (some_type) { a, b, c, d } instead, which is a bit better.)
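To make the `a * b;` case concrete, here is a rough sketch of the idea behind the lexer hack: the same four tokens are classified differently depending on a table of known typedef names. The function `classify` and its result strings are hypothetical, and a real C front end does far more than this:

```python
def classify(statement, typedef_names):
    """Classify 'a * b;' as a declaration or an expression.

    typedef_names stands in for the symbol table a C parser has to
    consult; it is an assumption of this sketch, not a real API.
    """
    tokens = statement.replace(";", " ;").split()
    if len(tokens) != 4 or tokens[1] != "*" or tokens[3] != ";":
        raise ValueError("expected a statement of the form 'a * b;'")
    left, _, right, _ = tokens
    if left in typedef_names:
        # 'a' names a type, so this declares b as a pointer to a.
        return f"declaration: {right} is a pointer to {left}"
    # Otherwise 'a' is an ordinary identifier and '*' is multiplication.
    return f"expression: {left} multiplied by {right}"
```

With `typedef_names={"a"}` the result is a declaration, and with an empty set it is a multiplication, which is why the token stream alone is not enough to parse the statement.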

Context free grammars are perfectly capable of expressing matched curly braces, even with unbounded nesting. Am I missing something?
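For reference, a small sketch of the point about matched braces: the context-free grammar S -> '{' S '}' S | empty describes arbitrarily deep nesting, and a recursive-descent recognizer for it is only a few lines. The name `matches_balanced` is made up here, and in practice the nesting depth is bounded by Python's recursion limit rather than by the grammar:

```python
def matches_balanced(text):
    """Recognize the language of S -> '{' S '}' S | empty."""
    def parse_s(pos):
        # Apply S -> '{' S '}' S while an opening brace is next;
        # the loop plays the role of the trailing recursive S.
        while pos < len(text) and text[pos] == "{":
            pos = parse_s(pos + 1)
            if pos >= len(text) or text[pos] != "}":
                raise SyntaxError("missing closing brace")
            pos += 1
        return pos

    try:
        # The whole input must be consumed for it to be in the language.
        return parse_s(0) == len(text)
    except SyntaxError:
        return False
```

For example, `matches_balanced("{{}{}}")` is True while `matches_balanced("{{}")` and `matches_balanced("}{")` are False.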

Yes, thinking about it, that alone is not sufficient. Still, I'd claim that most languages are not context-free.

The preprocessor (CPP) aside, C is not context-free due to typedef making identifiers ambiguous. Also if-then-else? Since C++ templates are Turing-complete, the grammar is probably unrestricted.

Python is not context free due to

    if ...:
        stmt1
        if ...:
            stmt2
        stmt3

stmt3 and stmt1 have to share the same level of indentation to form a valid Python program, but they might contain arbitrary indentation within brackets.
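One way to see this is with Python's standard `tokenize` module, which turns indentation changes into INDENT and DEDENT tokens before the grammar ever sees them, and ignores indentation inside brackets. The sample source and the helper name `indent_events` below are made up for illustration:

```python
import io
import tokenize

# The second line of the list literal is indented arbitrarily; because it
# sits inside brackets, it should not affect block structure.
SRC = (
    "if a:\n"
    "    x = [1,\n"
    "            2]\n"
    "    if b:\n"
    "        y = 2\n"
    "    z = 3\n"
)

def indent_events(source):
    """Return the names of the INDENT/DEDENT tokens, in order."""
    events = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            events.append(tokenize.tok_name[tok.type])
    return events
```

`indent_events(SRC)` yields `['INDENT', 'INDENT', 'DEDENT', 'DEDENT']`: one pair per `if` block, and nothing for the continuation line inside the brackets. This is the usual shortcut: the tokenizer tracks the indentation stack so that the parser can treat INDENT and DEDENT like ordinary opening and closing braces.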

The RealBASIC IDE also gives you a function-based editing experience instead of storing entire text files full of code.

It was kind of neat, but also led to a lot of clicking around. It's one of those Holy Grail ideas people have been talking about forever, but I'm not convinced it's actually that superior given all of the ecosystem downsides there are to moving away from text.

I think you probably could do something better than text files, but it has to be a lot better to get over the chasm of losing all of your familiar editors, command line text utilities like grep, easy copy/paste, etc.

Text is a lowest common denominator medium. People get hung up on the "lowest" part, but the "common" part is pretty damn convenient.

And Self's Morphic, which went even further: "static" objects would be defined via UI elements, with only the method bodies being written in small editors.