by prezjordan 896 days ago
Every time I sit down to write a toy language I get stuck on two conveniences: syntax highlighting and auto-formatting. I feel like they're table stakes for a programming language in 2024, but they're _just_ annoying enough that I lose interest. Any tips?
6 comments

Basic syntax coloring is easy: just reuse your tokenizer. You already need a tokenizer for your compiler or interpreter. Assign colors to keywords, strings, numbers, and comments. For this simple syntax coloring you don't need to parse, only tokenize, and for pretty much all languages you can do it on a per-line basis.
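To make the tokenizer-only approach concrete, here is a minimal sketch in Python. The keyword set, token classes, and ANSI colors are all made up for an imaginary toy language, and `tokenize_line` / `highlight_line` are hypothetical names, not anything from a real library. It works strictly one line at a time, so multi-line strings or block comments would need a bit of carried-over state that this sketch leaves out.

```python
import re

# Hypothetical keywords for a made-up toy language.
KEYWORDS = {"let", "fn", "if", "else", "return"}

TOKEN_RE = re.compile(r"""
    (?P<comment>\#.*)               # line comment runs to end of line
  | (?P<string>"(?:\\.|[^"\\])*")   # double-quoted string with escapes
  | (?P<number>\d+(?:\.\d+)?)       # integer or decimal literal
  | (?P<word>[A-Za-z_]\w*)          # identifier or keyword
  | (?P<other>\s+|.)                # whitespace or any other single character
""", re.VERBOSE)

# Assumed color scheme: ANSI escape codes for a terminal.
ANSI = {
    "keyword": "\033[35m",
    "string": "\033[32m",
    "number": "\033[36m",
    "comment": "\033[90m",
}
RESET = "\033[0m"

def tokenize_line(line):
    """Yield (kind, text) pairs for one line; no parsing, no cross-line state."""
    for m in TOKEN_RE.finditer(line):
        kind, text = m.lastgroup, m.group()
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "ident"
        yield kind, text

def highlight_line(line):
    """Wrap colored token kinds in ANSI codes; everything else passes through."""
    return "".join(
        f"{ANSI[kind]}{text}{RESET}" if kind in ANSI else text
        for kind, text in tokenize_line(line)
    )

if __name__ == "__main__":
    print(highlight_line('let x = 42 # the answer'))
```

If the compiler's real tokenizer already produces token kinds and source spans, the regex goes away and only the kind-to-color table is new.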

Auto-formatting is also easy: just don't do it. Does a toy language really need it? I have two languages with syntax coloring and zero languages with auto-formatting. I think syntax coloring is table stakes but auto-formatting is not. You can have an interesting toy language without auto-formatting.

I dunno, there are a few things I like highlighted that you can’t easily do with just a tokenizer, like parameters or function calls.

> Auto-formatting is also easy: just don't do it.

I agree with this, except that auto-closing grouping symbols is extremely useful and turns out to be pretty easy to support.
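A rough sketch of what auto-closing could look like on the editor side, assuming the editor hands you the buffer text, a cursor offset, and the typed character. `type_char` and the pair table are hypothetical, not any particular editor's API.

```python
# Hypothetical pair table; extend with quotes if the language wants them.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def type_char(text, cursor, ch):
    """Return (new_text, new_cursor) after the user types ch at cursor."""
    if ch in PAIRS:
        # Insert opener and closer together, leaving the cursor between them.
        return text[:cursor] + ch + PAIRS[ch] + text[cursor:], cursor + 1
    if ch in CLOSERS and text[cursor:cursor + 1] == ch:
        # The closer is already there: step over it instead of doubling it.
        return text, cursor + 1
    # Any other character is inserted normally.
    return text[:cursor] + ch + text[cursor:], cursor + 1
```

Typing `(` after `foo` gives `foo()` with the cursor inside the parentheses, and typing `)` there just moves the cursor past the existing closer.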

Auto formatting for work is an angel, for code you love it is the devil.

Writing code that you love is like crafting it. You align subjects and data in beautiful flows of if/else or case/switch. You put multiple statements on a line if they encompass one "thought"

Try writing without a formatter when you are the sole author.

As for syntax highlighting I agree. You can get 80% of the way there with a vim regex of keywords, but it's one of the main things I'm solving by writing my own editor.

> for code you love it is the devil

This doesn't have to be true! Zig's `zig fmt` doesn't wrap on column count but instead lets you choose where lines break by adding a trailing ",". While this does introduce a level of user discretion which can diminish the braindeadness of autoformatters, I find that it works a lot better because now I can choose where the reasonable place to break is. There may technically be more wiggle room for debate, but now you're spending less time debating variable names or whatever to get it to format not-awfully.
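For a toy language, the comma-driven rule is small enough to sketch. The Python below is not how `zig fmt` is implemented; it only imitates the behavior as I understand it for a flat call with no nesting or string literals: a trailing comma means one argument per line, no trailing comma means everything is joined on one line. `format_call` is a hypothetical name.

```python
def format_call(src, indent="    "):
    """Format a flat call like 'f(a, b,)'; nesting and strings are not handled."""
    name, _, rest = src.strip().partition("(")
    body = rest.rstrip()
    if not body.endswith(")"):
        raise ValueError("expected a call ending in ')'")
    items = [a.strip() for a in body[:-1].split(",")]
    # A trailing comma shows up as an empty final item after the split.
    trailing = len(items) > 1 and items[-1] == ""
    args = [a for a in items if a]
    if trailing:
        # The author asked for a break: one argument per line, comma kept.
        lines = [f"{name.strip()}("] + [f"{indent}{a}," for a in args] + [")"]
        return "\n".join(lines)
    # No trailing comma: join on a single line, whatever the original layout.
    return f"{name.strip()}({', '.join(args)})"
```

The formatter never looks at column width here, so the only formatting decision left to the author is whether to type that last comma.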

Maybe it's just cause I want to stick to a certain width, but I can't remember the last time I disagreed with the autoformatter.
Have you tried for any length of time?

For example, there's lots of little places where aligning keywords helps clarify code

https://github.com/civboot/civlua/blob/main/ds/ds.lua#L89

Formatters HATE putting multiple statements on a single line, but when the statements go together it makes the code so much easier to parse (for a person).

Yeah, I used to not have an autoformatter, and still occasionally I'm working in an environment without access to one. I don't get how this example is clearer than a multi-line if, or perhaps a ternary operator (idk if Lua has that).
My problem is that when designing my own language, I decided on GC, lambda syntax, async/await, weak typing, and... oops, it's basically just JavaScript. Sure it'd have tons of smaller differences vs JS, but it's too demotivating.
It is still a great learning exercise IMO. Knowing which features are good/familiar to use, but difficult to implement, lets you explore other avenues, like news_to_me and the backslash for lambda [1].

For me, for example, I was gonna go with async/await in my toy language, but didn't like the implementation and decided to explore a monad-like direction.

[1] https://news.ycombinator.com/item?id=38851863

I can see this learning exercise applying to other things, only it takes a lot of time and commitment. So far I've also learned a lot on other projects that I was able to carry out more fully. But everyone has their thing.
JavaScript (for all its warts) does in many ways seem to be converging on a certain local maximum for ergonomic language design. Perhaps this is because it is used so much in the 'wild west' of browser programming, where things can move quickly, with polyfills for backwards compatibility? (A modern marvel to my eyes, being able to use new language features on old platforms!)
Yes, it's an example of not letting perfection get in the way of good. They made something accessible, and the finer details got worked out later.
The other thing is, JS was designed with the web use case in mind. If I ever make a language, it'll be for some new use case that really needs one, perhaps some extension of SQL. And it'll prioritize ease of use in that case over "purity" or other principles.
JS lacks progressive typing; that could be an interesting feature.
TS tried, and I didn't like the end result.
When you start, your language is just for you, so the question is: are syntax highlighting and auto-formatting table stakes for you? If yes, then I'd suggest you learn how to write a tree-sitter grammar and integrate it with your editor of choice. This will also get you up and running with a basic AST rather quickly, and you can decide whether you want to write your own parser or just use the tree-sitter generated AST for your compiler.
This was a super annoying aspect of this project actually, since you basically have to write an additional parser in whatever syntax highlighting format your editor uses (Sublime Text in my case). The only thing I really learned is the easier the syntax is to parse, the better.
GNOME has a syntax highlighting sub-system that editors like Xed and Pluma use. It's regex based, with .lang files somewhere iirc, and can do keyword and literal highlighting quite nicely. Limited to GNOME of course, but good for experimentation. Maybe Windows/Mac/other Linux has something similar.

Edit: for VS Code:

https://code.visualstudio.com/api/language-extensions/syntax...