by ziedaniel1
759 days ago
Very cool idea - but unless I'm missing something, this seems very slow. I just wrote a simple loop in C++ to sum up 0 to 2^30. With a single thread without any optimizations it runs in 1.7s on my laptop -- matching Bend's performance on an RTX 4090! With -O3 it vectorizes the loop to run in less than 80ms.

    #include <iostream>

    int main() {
        int sum = 0;
        for (int i = 0; i < 1024*1024*1024; i++) {
            sum += i;
        }
        std::cout << sum << "\n";
        return 0;
    }
|
|
Bend's codegen is still abysmal, but these are all low-hanging fruit. Most of the work went into making the parallel evaluator correct (which is extremely hard!). I know that sounds like "trust me", but the single-thread performance will get much better once we start compiling procedures, generating loops, etc. It just hasn't been done yet.
(I wonder if I should have waited a little bit more before actually posting it)