by Athas 3252 days ago
APL (and J by extension) is trickier to parallelise than you might expect. The frequent reliance on boxing leads to irregular pointer structures, and the absence of compile-time type information makes it hard to generate code at all. APL performance usually rests on efficient implementations of individual primitives, but running one primitive at a time is certainly too fine-grained to be sufficient for bandwidth-starved devices such as GPUs. I contributed to an APL-to-GPU compiler[0], and it was hard to make it work on more than a small (well-behaved) subset.

[0]: https://github.com/melsman/apltail
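
To make the granularity point concrete, here is a rough sketch in plain Python (not APL, and not how apltail or any real interpreter is implemented). It assumes a toy cost model in which every array element read or written counts as one unit of memory traffic; the function names and the expression sum(x*y + 1) are made up for illustration. Evaluating the expression one whole-array primitive at a time materialises two intermediate arrays, while a single fused pass only reads the inputs.

```python
# Hypothetical illustration: count element reads/writes ("memory traffic")
# for primitive-at-a-time evaluation versus one fused pass.
# The cost model (1 unit per element touched) is an assumption.

def primitive_at_a_time(x, y):
    """Evaluate sum(x*y + 1) as three separate whole-array primitives."""
    traffic = 0
    prod = [a * b for a, b in zip(x, y)]   # primitive 1: elementwise multiply
    traffic += 2 * len(x) + len(prod)      # read x and y, write prod
    inc = [p + 1 for p in prod]            # primitive 2: elementwise add
    traffic += len(prod) + len(inc)        # read prod, write inc
    total = sum(inc)                       # primitive 3: reduction
    traffic += len(inc)                    # read inc
    return total, traffic

def fused(x, y):
    """Evaluate the same expression in one pass with no intermediate arrays."""
    total = 0
    for a, b in zip(x, y):
        total += a * b + 1
    return total, 2 * len(x)               # only x and y are read

if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [5, 6, 7, 8]
    print(primitive_at_a_time(xs, ys))
    print(fused(xs, ys))
```

Under this toy model the primitive-at-a-time version moves three times as many elements as the fused one for the same result. That is roughly the gap a compiler has to close by fusing primitives before the code is a good fit for a GPU.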

1 comment

Dyalog seem to have done amazing work on this in the last few years. Here is a talk about Dyalog's ongoing performance work: https://video.dyalog.com/Dyalog16/?v=2AeONlTj1aY and the performance info for the latest version: https://www.dyalog.com/dyalog/dyalog-versions/160/performanc...