> it is allocating 2 IC nodes for each numeric operation, while Python is not

While that's true, Python would be using big integers (PyLongObject) for most of the computations, meaning every number gets allocated on the heap. If we use a Python implementation that would avoid this, like PyPy or Cython, the results change significantly:

% cat sum.py
def sum(depth, x):
    if depth == 0:
        return x
    else:
        fst = sum(depth-1, x*2+0) # adds the fst half
        snd = sum(depth-1, x*2+1) # adds the snd half
        return fst + snd

if __name__ == '__main__':
    print(sum(30, 0))
% time pypy sum.py
576460751766552576
pypy sum.py 4.26s user 0.06s system 96% cpu 4.464 total
That's on an M2 Pro. I also imagine the result in Bend would not be correct since it only supports 24-bit integers, meaning it'd overflow quite quickly when summing up to 2^30, is that right?

[Edit: just noticed the previous comment had already mentioned pypy]

> I'm aware it is 2x slower on non-Apple CPUs.

Do you know why? As far as I can tell, HVM has no aarch64/Apple-specific code. Could it be because Apple Silicon has wider decode blocks?

> can be underwhelming, and I understand if you don't believe on my words

I don't think anyone wants to rain on your parade, but extraordinary claims require extraordinary evidence. The work you've done in Bend and HVM sounds impressive, but I feel the benchmarks need more evaluation/scrutiny. Since your main competitor would be Mojo and not Python, comparisons to Mojo would be nice as well.
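To make the 24-bit overflow question in this comment concrete, here is a rough Python sketch. It assumes the integers wrap modulo 2^24 on overflow, which is my assumption and not something I've verified against Bend. The names `tree_sum`, `closed_form` and `MASK_24` are made up for illustration.

```python
MASK_24 = (1 << 24) - 1  # assumption: 24-bit unsigned arithmetic wraps modulo 2**24


def tree_sum(depth, x, mask=None):
    """Same recursion as sum.py; if `mask` is given, wrap every result to it."""
    if depth == 0:
        return x if mask is None else x & mask
    fst = tree_sum(depth - 1, x * 2 + 0, mask)  # adds the fst half
    snd = tree_sum(depth - 1, x * 2 + 1, mask)  # adds the snd half
    total = fst + snd
    return total if mask is None else total & mask


def closed_form(depth):
    """Sum of 0 .. 2**depth - 1, which is what the recursion computes."""
    n = 1 << depth
    return n * (n - 1) // 2


# Cross-check the closed form against the recursion at a small depth
# (depth 14 is already large enough for the 24-bit version to wrap).
assert tree_sum(14, 0) == closed_form(14)
assert tree_sum(14, 0, MASK_24) == closed_form(14) & MASK_24

exact = closed_form(30)
wrapped = exact & MASK_24

print(exact)    # 576460751766552576, same as the PyPy output above
print(wrapped)  # 0
```

The exact sum is 2^59 - 2^29, and both terms are multiples of 2^24, so under this wraparound assumption a 24-bit result would come out as 0. If Bend handles overflow differently (or promotes to a wider type), the outcome would differ.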
I'm personally putting a LOT of effort into making our claims as accurate and truthful as possible, in every single place: documentation, website, demos. I spent hours in meetings to make sure everything is correct. Yet sometimes it feels that no matter how much effort I put in, people will just find ways to misinterpret it.
We published the real benchmarks, checked and double-checked. And then you complained that some benchmarks are not so good, which we acknowledged, providing the causes and how we plan to address them. And then you said the benchmarks need more evaluation? How does that make sense in the context of them being underwhelming?
We're not going to compare to Mojo or other languages, specifically because it generates hate.
Our only claim is:
HVM2 is the first version of our Interaction Combinator evaluator that runs with linear speedup on GPUs. Running closures on GPUs required a colossal amount of correctness work, and we're reporting this milestone. Moreover, we finally managed to compile a Python-like language to it. That is all that is being claimed, and nothing else. The codegen is still abysmal and single-core performance is bad - that's our next focus. If anything else was claimed, it wasn't us!