by gpderetta 1638 days ago
My wish-list for my ideal (non-system) programming language:

- first-class tables and named tuples as the primary data structure. Includes the full set of relational operations, and transaction support. Optional persistence. Not everything is a table, though: tables are great, but pragmatism trumps dogmatism.

- structural typing (ties in neatly with the above) and support for row polymorphism

- shared-nothing, distributed multiprocessing, except for explicitly shared tables, since transactions allow safe, controlled mutation of them. Messages are just named tuples, and row polymorphism should allow for protocol evolution. Message queues and streams can be abstracted as one-pass tables.

- Async as in Cilk, not JS. No red/green functions. Multiprocessing can be cheap: just spawn a user thread. The compiler will use whatever compilation strategy is best (cactus stacks, full CPS transform, whatever).

- seamless job management, pipelines, graphs. Ideally this language should be a perfectly fine shell replacement, but with transparent support for running processes on multiple machines, and better error management.

A bit more nebulous, and needing more thought:

- exceptions, error codes and optional/variant results are all sides of the same coin and can look the same with the right syntactic sugar.

- custom table representation. You can optionally decide how your table should be physically represented in memory or on disk. Explicit pointers to speed up joins. Nested representation for naturally hierarchical data. Denormalized

- first-class graphs. Graphs and relational tables are dual. And with the above point it should be possible to represent them efficiently. What operations do we need?

- capabilities. All dependencies are passed to each function; no global data or code. You can tell whether your function does IO or allocates by looking at its signature. Subsumes dependency injection. Implicit parameters and other syntactic sugar should make this bearable.

- staged compilation via partial evaluation. This should subsume macros. Variables are a tuple of (value, type), where type is a dictionary of operation-name->operation-implementation. The first stage is a fully dynamic language, but by fixing the shape of the dictionary you get interfaces/traits/protocols with dynamic dispatch, and by fixing the implementation you get static dispatch. Again, significant sugar is needed to make this workable.
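
A rough sketch of the (value, type) idea from the last bullet, in Python for concreteness. All names here (`Var`, `call`, `show_all`, `INT_OPS`, `STR_OPS`) are made up for illustration, and Python does no partial evaluation, so this only shows the three dispatch levels, not the staging itself.

```python
from typing import Any, Callable, Dict, List, NamedTuple

# A "type" is just a dictionary of operation-name -> operation-implementation.
TypeDict = Dict[str, Callable[..., Any]]

class Var(NamedTuple):
    """A variable as a (value, type) pair; `ops` plays the role of the type."""
    value: Any
    ops: TypeDict

# Two hypothetical "types", each with the same dictionary shape.
INT_OPS: TypeDict = {
    "add": lambda a, b: a + b,
    "show": lambda a: f"int:{a}",
}
STR_OPS: TypeDict = {
    "add": lambda a, b: a + b,
    "show": lambda a: f"str:{a!r}",
}

def call(var: Var, op: str, *args: Any) -> Any:
    """Fully dynamic: the operation is looked up by name at run time."""
    return var.ops[op](var.value, *args)

def show_all(items: List[Var]) -> List[str]:
    """Fixed shape (every dictionary must contain "show"), varying
    implementation: an interface with dynamic dispatch."""
    return [v.ops["show"](v.value) for v in items]

# Fixed implementation too: the lookup is resolved once, before any call.
# A partial evaluator would be expected to do this step automatically.
show_int = INT_OPS["show"]

x = Var(2, INT_OPS)
s = Var("hi", STR_OPS)
print(call(x, "add", 3))
print(show_all([x, s]))
print(show_int(40))
```

The point of the sketch is that nothing but the amount of information fixed ahead of time distinguishes the three cases; the data model stays the same.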

edit:

missed an important element: - transparent remote code execution: run your code where your data is. Capabilities are pretty much a requirement for security.


I'm no longer convinced of the need for row or record polymorphism. It encourages passing around types that have no clear domain or purpose, so I think it inhibits understanding in general. Do you have any examples where it's indispensable?

I don't think it is indispensable; I think it is convenient, and still better than what is done today, where types without a clear domain and purpose are already passed around.

At the very least with row polymorphism, a function can declare which subset of the type it actually cares about instead of taking an unwanted dependency on the whole blob.
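
A minimal sketch of that "declare only the subset you care about" idea, in Python. Python has no row polymorphism; `typing.Protocol` gives structural subtyping, which is only an approximation (a truly row-polymorphic function could also return its argument with the remaining fields preserved in the type). The message types and field names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol

class HasOrderId(Protocol):
    """The only field the function below depends on."""
    order_id: int

@dataclass
class OrderV1:
    order_id: int
    symbol: str

@dataclass
class OrderV2:
    # The message grew over time to serve unrelated requirements.
    order_id: int
    symbol: str
    venue: str
    priority: int

def audit_line(msg: HasOrderId) -> str:
    # Depends on order_id only; any extra fields are ignored, so both
    # message versions are accepted without a shared base class.
    return f"order {msg.order_id} seen"

print(audit_line(OrderV1(1, "ABC")))
print(audit_line(OrderV2(2, "ABC", "X", 5)))
```

Adding fields to `OrderV2` does not touch `audit_line`, which is the protocol-evolution property being argued for.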

In particular I'm considering the scenario where a large application (or better, a collection of applications) evolves without a central plan and messages tend to grow to accommodate orthogonal requirements (the alternative is splitting the messages, but that has performance, complexity and organizational overhead).

In theory the alternative is message inheritance, but in my experience it has never worked well and it is very hard to retrofit anyway.

> At the very least with row polymorphism, a function can declare which subset of the type it actually cares about instead of taking an unwanted dependency on the whole blob.

This is the argument I no longer find convincing. Do you have an example where this is so much clearer than alternate, simpler ways of doing it?

For instance, in principle you could easily rewrite a function that works on a record with 3 fields to just accept 3 parameters. The only additional "burden" is that the caller has to pass in those 3 fields, where before they could just pass in the record.
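
To make the comparison concrete, here is a small sketch of the same computation written both ways, in Python. The `Trade` record and its fields are invented for illustration.

```python
from typing import NamedTuple

class Trade(NamedTuple):
    price: float
    quantity: int
    fee: float
    trader: str  # not needed by the computation below

def total_cost_record(t: Trade) -> float:
    # Record style: the caller hands over the whole record.
    return t.price * t.quantity + t.fee

def total_cost_params(price: float, quantity: int, fee: float) -> float:
    # Parameter style: the caller unpacks the three fields itself.
    return price * quantity + fee

t = Trade(price=10.0, quantity=3, fee=1.5, trader="alice")
print(total_cost_record(t))
print(total_cost_params(t.price, t.quantity, t.fee))
```

Both calls compute the same value; the only difference is who selects the fields, the function or the caller.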

Having it as a single parameter precludes the mistake of passing unrelated values, though.

The row-typed function is also less reusable for that reason.

If you have two fields of compatible type such that you can confuse which one to pass as a parameter, then it's likely you're not making enough domain specific type distinctions that would disambiguate these fields.
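
As a sketch of what "domain specific type distinctions" could look like, assuming Python with a static checker such as mypy: `typing.NewType` creates distinct static types over the same runtime representation. `Price`, `Fee` and `total_cost` are hypothetical names.

```python
from typing import NewType

# Distinct static types; at run time both are plain floats.
Price = NewType("Price", float)
Fee = NewType("Fee", float)

def total_cost(price: Price, quantity: int, fee: Fee) -> float:
    return price * quantity + fee

p, f = Price(10.0), Fee(1.5)
print(total_cost(p, 3, f))

# total_cost(f, 3, p) would still run, because NewType adds no runtime
# check, but a static type checker reports the swapped arguments.
```

Note that the protection here is purely static; nothing stops the swapped call at run time.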

If these fields were really compatible domain specific types, then it's more likely you would want to be able to use that function with both fields at some point. Row typing then either hinders this reuse (not good), or requires you to refactor to encapsulate both fields in a new record with compatible fields and pass that in (maybe good?). This is code you wouldn't have to write without row types.

But as I said, I would like a concrete example to discuss if anyone has one. Speaking in the abstract like this isn't likely to be convincing either way.

Nothing is indispensable as long as you have a Turing complete language. That is a really bad mindset to use.

Anyway, are you complaining that the types are abstract? (That is as bad a complaint as it sounds.) Or do you have something different in mind?

You're taking indispensable too literally. If you commonly have to write 1,000 lines of code without a feature, but the feature permits you to reduce this to 1 line of code, I'd consider that to be pretty indispensable.

Where the line for indispensable falls is debatable, hence my request for an example.