| > I've had a similar experience at university while using Haskell. It's a very elegant lazy pure functional language, but it's glacially slow. The natural, low-friction path is very inefficient. The efficient path is unidiomatic and verbose. Well, if you run a Haskell program on a "C-Machine" of course a comparable program in a "C-Language" will be faster — as it don't need to bridge any gap in execution semantics. The point is: Mostly all modern computers are "C-Machines". Modern CPUs go even a long way to simulate a "PDP-7 like computer" to the outside world, even they're working internally quite different. (The most effective optimizations like cache hierarchies, pipelineing, out-of-order execution, JIT compilation to native instructions [CPU internal "micro-ops"], and some more magic are "hidden away"; they're "transparent" to programmers and actually often not even accessible by them). So not only there's nothing than "C-Machines", those "C-Machines" are even highly optimized to most efficiently execute "C-Languages", but nothing else! If you want to feed in something that's not a "C-Language" you have to first translate it to one. That transformation will almost always make your program less efficient than writing it (by hand) in a "C-Language" directly. That's obvious. On the other hand running Haskell on a "Haskell-Machine"¹ is actually quite efficient. (I think depending on the problem to solve it even outperforms a "C-Machine" by some factor; don't remember the details, would need to look through the papers to be sure…). On such a machine an idiomatic C or Rust program would be "glacially slow", of course, for the same reason as the other way around: The need to adapt execution semantics before such "no-native" programs could be run will obviously make the "translated" programs much slower compared to programs build in languages much closer to the execution semantics provided by the machines hardware implemented evaluator. That said, I understand why we can't have dedicated hardware evaluators for all kind of (significantly different) languages. Developing and optimizing hardware is just to expensive and takes to much time. At least if you'd like to compete on the status quo. But I could imagine for the future some kind of high level "meta language" targeting FPGAs which could be compiled down to efficient hardware-based evaluators for programs written in it. Maybe this could even end the predominance of "C-Languages" when it comes to efficient software? Actually the inherently serial command-stream semantics of "C-Languages" aren't well fitted to the parallel data-flow hardware architectures we're using at the core now since some time (where we even do a lot of gymnastics to hide the true nature of the metal by "emulating a PDP-7" to the outside world — as that's what "C-Languages" are build for and expect as their runtime). To add to the topic of the submission: Are there any HW implementations of APLs? Maybe on GPUs? (As this seems a good fit for array processing languages). ¹ https://github.com/tommythorn/Reduceron |
> Are there any HW implementations of APLs?
This has been discussed on Y-Combinator in the past[1]. I found a discussion referencing a paper about leveraging Futhark[2].
1. https://news.ycombinator.com/item?id=11965474
2. https://futhark-lang.org/publications/fhpc16.pdf