by tiehuis
2960 days ago
It certainly isn't out of reach to get fairly close to GMP's speed, implementation-wise, if you are willing to optimize the low-level loops in assembly. I think reaching parity in the simple cases is rather straightforward, but once you need to tune the thresholds at which you switch between algorithms, it takes much more testing to find the optimal values [1]. It is also easy to overlook how well optimized GMP is across a wide range of less common architectures and chips, and I wouldn't be surprised if my particular implementation lost some ground on other architectures like ARM (that would be a good thing to test).

[1] https://gmplib.org/devel/thres/
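To make "algorithm thresholds" concrete: a bignum library typically uses schoolbook multiplication for small operands and switches to asymptotically faster methods such as Karatsuba once operands exceed a size cutoff. The Python sketch below illustrates that idea. It is not GMP's code or the implementation discussed above. The 32-bit limb size, the `KARATSUBA_THRESHOLD` value, and every function name are hypothetical choices made for illustration.

```python
# Minimal sketch: bignum multiplication that switches from schoolbook to
# Karatsuba above a size cutoff. Not GMP's code; all names are hypothetical.

BASE_BITS = 32                   # assumed limb size
MASK = (1 << BASE_BITS) - 1

# Hypothetical cutoff in limbs. Real libraries tune values like this per CPU.
KARATSUBA_THRESHOLD = 32


def normalize(x):
    """Strip high zero limbs (little-endian limb list)."""
    while x and x[-1] == 0:
        x.pop()
    return x


def to_limbs(n):
    """Non-negative int -> little-endian list of BASE_BITS-bit limbs."""
    out = []
    while n:
        out.append(n & MASK)
        n >>= BASE_BITS
    return out


def from_limbs(x):
    return sum(limb << (BASE_BITS * i) for i, limb in enumerate(x))


def add(a, b):
    if len(a) < len(b):
        a, b = b, a
    out, carry = [], 0
    for i, limb in enumerate(a):
        t = limb + (b[i] if i < len(b) else 0) + carry
        out.append(t & MASK)
        carry = t >> BASE_BITS
    if carry:
        out.append(carry)
    return normalize(out)


def sub(a, b):
    """a - b for normalized limb lists with a >= b."""
    out, borrow = [], 0
    for i, limb in enumerate(a):
        t = limb - (b[i] if i < len(b) else 0) - borrow
        borrow = 1 if t < 0 else 0
        out.append(t & MASK)  # wraps a negative t back into [0, MASK]
    return normalize(out)


def shift(x, k):
    """Multiply by BASE**k by prepending k zero limbs."""
    return [0] * k + x if x else []


def schoolbook_mul(a, b):
    """O(len(a) * len(b)) multiplication; fastest for small operands."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            t = out[i + j] + x * y + carry
            out[i + j] = t & MASK
            carry = t >> BASE_BITS
        out[i + len(b)] = carry  # this slot is still untouched at step i
    return normalize(out)


def mul(a, b):
    # Below the cutoff, Karatsuba's extra additions cost more than they save.
    # max(..., 4) guarantees each recursive call works on strictly fewer limbs.
    if min(len(a), len(b)) < max(KARATSUBA_THRESHOLD, 4):
        return schoolbook_mul(a, b)
    m = max(len(a), len(b)) // 2
    a0, a1 = normalize(a[:m]), a[m:]
    b0, b1 = normalize(b[:m]), b[m:]
    z0 = mul(a0, b0)                                      # low * low
    z2 = mul(a1, b1)                                      # high * high
    z1 = sub(sub(mul(add(a0, a1), add(b0, b1)), z0), z2)  # cross terms
    return add(add(z0, shift(z1, m)), shift(z2, 2 * m))
```

For any non-negative ints `x` and `y`, `from_limbs(mul(to_limbs(x), to_limbs(y)))` should equal `x * y`. Timing `mul` (e.g. with `timeit`) while sweeping `KARATSUBA_THRESHOLD` is a toy version of the tuning the GMP thresholds page reports. The best value depends on the machine, and a real library has several such cutoffs (one per algorithm switch), which is where the testing effort grows.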