| HN Mirror


by asrp 1914 days ago

Sorry if I've asked these years ago and just don't remember the answer.

> The '/copy-to-ebx' after the slashes is just my way of helping the reader understand what the instruction does. I don't want the reader to have to consult the Intel manual for every instruction, even if I'm forcing the writer to do so.

Why not make the comment the instruction and the bytes the (maybe even optional?) comment in that case then?
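To make the two orderings concrete, here is a minimal sketch in Python. It assumes a word is a single hex byte and a comment separated by a slash; `byte_first` and `comment_first` are hypothetical names for illustration, not anything from the project's actual translator.

```python
def byte_first(word):
    """Parse 'bb/copy-to-ebx': hex byte first, comment after the first slash."""
    code, _, comment = word.partition("/")
    return int(code, 16), comment


def comment_first(word):
    """Parse 'copy-to-ebx/bb': comment first, hex byte after the last slash."""
    comment, _, code = word.rpartition("/")
    return int(code, 16), comment


# Both orderings carry the same information; only the reading order differs.
current_order = byte_first("bb/copy-to-ebx")
swapped_order = comment_first("copy-to-ebx/bb")
```

Either way the translator only needs the byte, so the choice of which half comes first is purely about the human reader.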

From your first post.

> The fact that C compilers are written in C contributes a lot of the complexity that makes compilers black magic to most people.

Isn't this more a symptom of C though? I'm hoping this is generally not true if you replace C with other languages (but could be very wrong). But more generally, I'm thinking you could make "the compiler's inner workings are not black magic" a constraint rather than make not writing the higher level language in the higher level language the constraint.

In my case, I tried that first route and then moved to instead having the compiler written in the higher level language but emitting output that's close enough to (my) handwritten lower level language.

I'll have to read your two part post more carefully though. Glad to see this project getting some attention, even though in an unusual fashion.

1 comments

akkartik 1914 days ago

Great questions! I've actually never considered putting the comment first! I'll have to think about that one.

You're right to point out that there are two components to "C compilers written in C make compilers seem complex": the metacircularity, and C-specific difficulties. I think I was focusing on the first when I wrote that, but I can't exclude the possibility you raise. A better language might reduce the need to understand it operationally, by looking under the hood to understand what a line of code is translated to. The Mu way may well be a dead end, since the requirement of understanding translated code restricts how complex compiler optimizations can get. You probably don't want to understand Haskell's loop fusion by comparing source and generated code.

In my mind there's an idea maze where there are 3 major possibilities for improving the future of software:

a) Simple languages and translators that are easy to understand by running them. This is the Mu way.

b) Type-driven languages that are easy to understand by reading them. Haskell and OCaml seem to fit here, and they may well be the right answers.

c) Complex languages that discourage abstractions atop them. This is the APL way, and it too might end up being the right way.

I'm doing a) mostly because it seems to fit my brain better. I just can't seem to get into Haskell or OCaml or APL.


asrp 1914 days ago

> I've actually never considered putting the comment first! I'll have to think about that one.

I'm sure there are many competing constraints so definitely don't do it because I'm suggesting this on a whim. :) My reasoning is that as a human reader, the comment is the more readable part, so I'd want to see it first. And for a computer, it probably doesn't care if the op code appears first or not.

> You probably don't want to understand Haskell's loop fusion by comparing source and generated code.

Indeed. But even though C and Haskell are very different, I think they share a common philosophy about compilation where you can basically do whatever you want as long as it still produces the same result.

I vaguely remember looking at Python's generated bytecode (with `dis.dis`) and seeing it wasn't too bad. I haven't tried it on a larger program though.
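For anyone who wants to try this, a small sketch using the standard library's `dis` module. The function `add_one` is just a made-up example, and the exact opcodes printed vary between Python versions.

```python
import dis


def add_one(x):
    return x + 1


# Print a human-readable listing of the bytecode compiled for add_one.
dis.dis(add_one)

# The same instructions are also available as objects, one per bytecode op.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]
```

For a function this small the listing is only a handful of instructions, which is roughly the "not too bad" impression described above.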

There's tcc (and more recently chibicc that I haven't had a chance to check out yet) that you're probably already aware of. Is the generated output still pretty bad?

I'll also throw my own attempt in the ring

- High level https://github.com/asrp/flpc/blob/master/lib/stage0.flpc
- Low level (up to line 45) https://github.com/asrp/flpc/blob/master/precompiled/self.f

even though it's not quite optimized for this purpose and the code itself is still a bit unclean. If there was a syntax highlighter for the low level language, I'd probably highlight "[", "]" and "bind:" as a start. I can try to clarify any obscure syntax or primitive.

Some more general ideas to get around the issue:

- Invoke optimization only when asked specifically (and apply the optimization locally). That is, optimization would need at least additional syntax in the language.
- Explicitly track correspondence between source and target (at the character or token level) and also do this in each optimization pass. Maybe even keep the intermediate values of each pass so you can browse through them like a stack trace.
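A rough sketch of what the tracking idea could look like, in Python. Everything here is hypothetical (the `Token` type, the toy postfix language, the `fold_constants` pass); it is only meant to show each output token remembering which source tokens it came from, with every pass's result kept for later browsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    origins: tuple  # indices of the source tokens this token was derived from


def tokenize(source):
    # Whitespace-separated tokens; each one starts out pointing at itself.
    return [Token(t, (i,)) for i, t in enumerate(source.split())]


def fold_constants(tokens):
    """Postfix constant folding: 'a b +' with numeric a and b becomes one token."""
    out = []
    for tok in tokens:
        if (tok.text == "+" and len(out) >= 2
                and out[-1].text.isdigit() and out[-2].text.isdigit()):
            b, a = out.pop(), out.pop()
            # The folded token inherits the origins of everything it replaced.
            out.append(Token(str(int(a.text) + int(b.text)),
                             a.origins + b.origins + tok.origins))
        else:
            out.append(tok)
    return out


def run_passes(source, passes):
    # Passes run only when listed explicitly; every intermediate result is kept.
    history = [("source", tokenize(source))]
    for p in passes:
        history.append((p.__name__, p(history[-1][1])))
    return history


history = run_passes("1 2 + x +", [fold_constants])
final = history[-1][1]
```

Here `final[0]` is the folded constant and its `origins` point back at the three source tokens it replaced, while `history` can be stepped through pass by pass, a bit like frames in a stack trace.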

> In my mind there's an idea maze where there are 3 major possibilities for improving the future of software:

I guess I'm trying another route even though I don't know if it fits the definition of improving the future of software.

d) Have programmers make their own compiler/interpreter and language by giving them the tools and knowledge to do that (more) easily.

This would (hopefully) avoid the black box/magic issue since the programmer would know the details of the inner workings by virtue of having written it. Though I'm most definitely very far from the goal and the questions can be asked about how to improve their target language.


akkartik 1914 days ago

Oh your project looks familiar. Though I might have seen it a long time ago. I'll take a closer look.

> My reasoning is that as a human reader, the comment is the more readable part, so I'd want to see it first. And for a computer, it probably doesn't care if the op code appears first or not.

Yeah, for sure. One rebuttal that comes to mind is the dictum, "don't get suckered by comments, debug code." Comments are useful, but too much emphasis on them has led to dark times in my past :)

Still very worth considering.


asrp 1914 days ago

I've read through more of your post and came across the bottom comment (don't know how to permalink to it), which better expresses my comment above.

> An optimizing linter has the problem of being destructive. It goes like this:

> The programmer will write his or her program in a readable way. They'll run it through the compiler, which points out that something can be optimized, the programmer—having already gone through the process of writing the first implementation with all its constraints and other ins and outs fresh in their mind—will slap their head and mutter "of course!", and then replace the original naive implementation with one based on the notes the compiler has given. Chances are high that the result will be less comprehensible to other programmers who come along—or even to the same programmer revisiting their own code 6 months later.

Also a data point and word of warning about (lack of) optimization. My own projects (one of which was mostly hand-written in x86 assembly) have been pretty heavily stalled by speed issues that sent me on significant detours. Since you are working with your own compiler/interpreter to implement your levels, you are directly affected by their compilation speeds as you iterate. Even with modern hardware, they can quickly become too slow to be usable.

This is unfortunately another consequence of having too much black magic in (C) compilers. So we get the wrong intuition about how fast computers are.


akkartik 1914 days ago

Were your languages very high-level? If so, that kinda rhymes with my experience on past projects. The more expressive the language, the easier it is for programs to create combinatorial explosions that slow everything down if compiled naively.


asrp 1906 days ago

The language is high-level but I wouldn't necessarily say very high level. But because I'm trying to spin up language features at runtime (like a lot of Forth does), there are a few layers on top of the language primitives.

I wish there was some framework for me to add optimizations as I go along, especially if there could be some speed guarantee. Though in my case, I'd like to also not lose the relation to the original source (like C does when values are optimized out).
