Hacker News new | ask | show | jobs
by avitzur 1454 days ago
I wrote the blog post. AMA.
3 comments

I'm wondering what you needed to write unsafe code for? Could there be another way to get rid of reference counting for your use case?

Another question, have you tried using Accelerate framework to solve performance bottlenecks (or save yourself from having to write your own calc code)?

Most of the performance critical sections are numeric computation loops. I use the Unsafe APIs where profiling shows that the bounds-checking overhead is significant.

In principle, I could get rid of reference counting overhead by using value types or immutable data. I couldn't see a simple path to doing that without re-architecting everything (with no guarantee that the end result would not just have different performance issues.) For the moment, I'm awaiting compiler improvements before re-evaluating the tradeoffs. There is certainly room for the compiler to reason better on eliding retain/release. https://github.com/apple/swift/issues/58549

Yes, the code does use Accelerate where applicable. That is one component of the numeric evaluation. It addresses the lowest level of things like evaluate the sin function on every array element, or multiply there arrays element wise. Performance tuning is a game of whack-a-mole. There's always another bottleneck somewhere.

(Thanks!)

Can you say more about the parse tree impact on performance?

Is the expression left in a tree form and effectively "interpreted" from the AST for all points? Is that the critical path?

The math is internally represented as a tree for display and editing. Most of the performance critical code is the numeric evaluation when graphing. For that, the math is compiled to a linear byte code which vastly improves locality of reference and is an opportunity to apply optimizations such as common subexpression elimination and loop invariant code motion.
I hadn't followed the link in article originally to get to https://mobile.twitter.com/RonAvitzur/status/146102321572409.... So literally the process of parsing the expression and producing the byte code is the performance challenge now? Or is it also walking the bytecode to do anything?

My basic question would be: why not go back to flex/bison/yacc/whatever via C-FFI? (But I think it would still be bad, since you'll want to get to a Swift data structure for your ops and those will still have the Arc issues)

That thread describes me working through performance issues in the initial port eight months ago. Those cases perform adequately now. Parsing is not a bottleneck. Walking the bytecode remains a performance hotspot, as that is where all the numeric calculations occur, but no more or less so comparing the C++ and Swift implementations.

I did investigate maintaining the flex/bison parser, since its generated state machine C code is more robust than my handwritten recursive descent parser when presented with pathological input. However, as you say, since I need a Swift data structure in the end, there is little to be gained and a lot of complication bridging via a C-FFI.

Would it be possible to vectorize the mathematical operations so that the hotspot isn't the interpreter?
Yes, the numeric evaluation is vectorized via the Accelerate Framework's vForce and vDSP APIs. That is a significant performance improvement. The numeric evaluation remains a hotspot with vectorization, as that is where the app does most of its work.
You’re a complete beast and I absolutely adore the origin story. Thank you.
Thank you! I can't believe that was thirty years ago. It feels unreal.