by a7b3fa 2260 days ago
Jonesforth is insanely cool. The linked mirror seems to be missing 'jonesforth.f'. Maybe try checking out this one[1] for the full implementation.

I recently tried porting Jonesforth to UEFI[2], so I could run it directly on my hardware without needing an operating system. I was actually surprised by how easy it turned out to be.

Okay, admittedly I ended up rushing a bit towards the end, and the final result is very bare-bones - it can do "Hello, World!", Fibonacci numbers, and then that's pretty much it. Still, it was a lot of fun, and I would totally recommend a project like this, especially if you don't usually work with "low-level" development.

I also ended up writing a blog post[3] to help people get started writing assembly for UEFI. The best resource is probably the OSDev wiki[4], though. It has a ton of great material.

[1]: https://github.com/nornagon/jonesforth

[2]: https://github.com/c2d7fa/jonasforth

[3]: https://johv.dk/blog/bare-metal-assembly-tutorial.html

[4]: https://wiki.osdev.org/Expanded_Main_Page

2 comments

Is jonesforth single-threaded, or does it utilize all available CPU cores? I only ask because a lot of these little bare-metal languages just do the bare minimum needed to boot to a REPL without exploiting the full capabilities of the hardware.
It's single-threaded because it's a learning tool, not a forth you'd ever want to use in an environment where performance would matter. Single-threaded or not, forth interpreters [not compilers] have terrible interaction with branch prediction - they will never perform well on modern CPUs. Virtually every other instruction executed is a jmp that cannot be predicted and thus will collapse the pipeline.
> forth interpreters..have terrible interaction with branch prediction - they will never perform well on modern CPUs.

Interesting statement. It led me to this paper:

Branch Prediction and the Performance of Interpreters - Don’t Trust Folklore (2015)

https://hal.inria.fr/hal-01100647/document (pdf)

"..Many studies go back to when branch predictors were not very aggressive. Folklore has retained that a highly mispredicted indirect jump is one of the main reasons for the inefficiency of switch-based interpreters."

"The accuracy of branch prediction on interpreters has been dramatically improved over the three last Intel processor generations. This..has reached a level where it cannot be considered as an obstacle for performance anymore."

There is an older paper by Anton Ertl that already shows variations in performance for the same implementation technique from one generation of Pentium to another, and of course between AMD and Intel.

Personally, I stopped worrying and used the most convenient implementation for my use case (portable interpreter written in C). Your set of primitives and how you code your Forth programs usually have a much larger improvement potential.

That paper refers to interpreters in the traditional sense. However, threaded code is not interpreted in the same way: every instruction is followed by a computed jump. The jonesforth code includes benchmarks for this that you can actually run, and you will observe the exact problem there.
Note that the paper does compare a switch-based dispatcher to a computed-goto version ('jump threading') - cf figure 2. The latter used to have a significant performance advantage over the former, which is apparently no longer true (cf figure 3 (a)). That of course doesn't invalidate your point.
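
For anyone who hasn't seen the two dispatch styles side by side, here is a minimal sketch in C. It is not code from the paper or from jonesforth: the toy opcodes and the names `run_switch` and `run_goto` are made up for illustration, and `run_goto` relies on the GCC/Clang "labels as values" extension, so it is not standard C.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Switch-based dispatch: as written, every opcode comes back to the one
   indirect jump generated for the switch. */
long run_switch(const long *code)
{
    long stack[16];
    size_t sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH: stack[sp++] = *code++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}

/* Computed-goto dispatch: as written, each handler ends with its own
   indirect jump to the next handler. Needs GCC or Clang. */
long run_goto(const long *code)
{
    static void *const table[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[16];
    size_t sp = 0;
#define NEXT() goto *table[*code++]
    NEXT();
do_push: stack[sp++] = *code++; NEXT();
do_add:  sp--; stack[sp - 1] += stack[sp]; NEXT();
do_mul:  sp--; stack[sp - 1] *= stack[sp]; NEXT();
do_halt: return stack[sp - 1];
#undef NEXT
}
```

Feeding both functions the program `OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT` gives 20 either way. The only difference is where the indirect jumps sit, and the compiler is free to merge or duplicate them, so any timing comparison has to be measured rather than assumed.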
> they will never perform well on modern CPUs

Is this statement for indirect threaded Forths only? What about direct threaded, token threaded and sub-routine threaded Forths?

So basically the opposite of APL.
Any language that arranges data in arrays or large structs does well on modern machines, especially with vector and SIMD extensions. To be fair to Forth, there exist machines that are well-suited to running "threaded code"[1], it's just that they are not machines that are commonly available today.

[1] "threaded" has nothing to do with multithreaded: https://en.wikipedia.org/wiki/Threaded_code

I don't buy this. I understand the use of your term 'threaded', but these jumps are unconditional and therefore can be incorporated into the instruction pipeline with little or no overhead. Here's a very old SO post; CPUs won't have got worse since then:

https://stackoverflow.com/questions/5127833/meaningful-cost-...

"But in general, on modern processors, there is minimal cost for an unconditional jump. It's basically pretty much free apart from a very small amount of instruction cache overhead. It will probably get executed in parallel with neighbouring instructions so might not even cost you a clock cycle. "

If they were all unconditional, that would be fine. The problem is they are all computed jumps (see jonesforth for details).
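
To make "computed" concrete: jonesforth's NEXT macro is `lodsl` followed by `jmp *(%eax)`, so the jump target is whatever address was just loaded from the thread. Below is a rough C model of that indirect-threaded scheme, not a translation of jonesforth. The struct layout, the word names and `run_thread` are hypothetical, and because C has no guaranteed tail calls the sketch uses one central loop with an indirect call, whereas jonesforth ends every primitive with its own NEXT.

```c
#include <stddef.h>

struct vm;
typedef void (*code_fn)(struct vm *);

/* A word reduced to its codeword: the address of the native routine. */
struct word { code_fn code; };

struct vm {
    const struct word *const *ip; /* position in the thread (plays the role of %esi) */
    long stack[16];
    size_t sp;
    int running;
};

/* Hypothetical primitives, just enough for a tiny demo thread. */
static void do_two(struct vm *vm)  { vm->stack[vm->sp++] = 2; }
static void do_dup(struct vm *vm)  { vm->stack[vm->sp] = vm->stack[vm->sp - 1]; vm->sp++; }
static void do_add(struct vm *vm)  { vm->sp--; vm->stack[vm->sp - 1] += vm->stack[vm->sp]; }
static void do_mul(struct vm *vm)  { vm->sp--; vm->stack[vm->sp - 1] *= vm->stack[vm->sp]; }
static void do_halt(struct vm *vm) { vm->running = 0; }

static const struct word TWO = { do_two }, DUP = { do_dup }, ADD = { do_add },
                         MUL = { do_mul }, HALT = { do_halt };

/* Runs a thread (an array of pointers to words) until HALT and returns
   the top of the stack. */
long run_thread(const struct word *const *thread)
{
    struct vm vm = { .ip = thread, .sp = 0, .running = 1 };
    while (vm.running) {
        const struct word *w = *vm.ip++; /* like lodsl: fetch next cell, advance */
        w->code(&vm);                    /* like jmp *(%eax): target read from memory */
    }
    return vm.stack[vm.sp - 1];
}
```

With the thread `{ &TWO, &DUP, &ADD, &DUP, &MUL, &HALT }`, `run_thread` returns 16. The point is that the CPU cannot know where `w->code` leads until both loads have completed, unless the indirect branch predictor guesses the target correctly.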
Thanks for the link, that looks really interesting
UEFI forth sounds like a really cool project. Also cool to learn that FASM has an easy way to generate EFI executables.