Hacker News
by danharaj 3009 days ago
You're right that floating point's non-associativity is certainly a problem for the composability and stability of neural networks. However, this non-associativity is well-behaved in the sense that there's an ideal arithmetic (exact real arithmetic) we wish to approximate, and we have techniques for mitigating the discrepancy between floating point and that arithmetic.
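
To make the floating point case concrete, here is a minimal Python sketch. The values of a, b and c are picked by me purely for illustration, and math.fsum is just one example of a mitigation technique from the standard library, not the only one.

```python
import math

# Illustrative values (my choice): 1e16 is large enough that adding 1.0
# to it is lost to rounding in IEEE 754 double precision.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # 1.0: the big terms cancel first, then 1.0 is added
right = a + (b + c)  # 0.0: the 1.0 is rounded away inside (b + c)

print(left, right, left == right)

# One mitigation: math.fsum tracks the rounding error internally and
# returns the correctly rounded sum, regardless of the order of the terms.
print(math.fsum([a, b, c]), math.fsum([c, b, a]))
```

Both groupings are approximations of the same real number, which is what makes the error something you can measure and reduce.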

The non-associativity of the octonions, by contrast, is fundamental to their structure, not something to be worked around. In particular, there's no way to treat an octonion-valued network as several layers composed in series.
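
For contrast with floating point, here is a rough sketch of exact octonion multiplication using the Cayley-Dickson construction over integer coefficients, so there is no rounding involved at all. The helper names (conj, add, neg, mul, unit) are mine, and the sign convention is one of several equivalent ones. Reading "layer" as "multiply by a weight" is only my interpretation of the point above.

```python
def conj(x):
    # Cayley-Dickson conjugate: (a, b)* = (a*, -b); a real is its own conjugate.
    if len(x) == 1:
        return x[:]
    h = len(x) // 2
    return conj(x[:h]) + [-v for v in x[h:]]

def add(x, y):
    return [p + q for p, q in zip(x, y)]

def neg(x):
    return [-p for p in x]

def mul(x, y):
    # (a, b)(c, d) = (a c - d* b, d a + b c*), applied recursively.
    # Length 2 gives complex numbers, 4 quaternions, 8 octonions.
    if len(x) == 1:
        return [x[0] * y[0]]
    h = len(x) // 2
    a, b = x[:h], x[h:]
    c, d = y[:h], y[h:]
    return add(mul(a, c), neg(mul(conj(d), b))) + add(mul(d, a), mul(b, conj(c)))

def unit(n, i):
    # Basis element e_i of the n-dimensional algebra.
    return [1 if k == i else 0 for k in range(n)]

e1, e2, e4 = unit(8, 1), unit(8, 2), unit(8, 4)

lhs = mul(mul(e1, e2), e4)   # (e1 e2) e4
rhs = mul(e1, mul(e2, e4))   # e1 (e2 e4)

print(lhs)
print(rhs)
print(lhs == neg(rhs))       # the two groupings differ by a sign
```

The arithmetic here is exact, so the mismatch between the two groupings is not an approximation error. If a layer acts as x -> w * x, then applying w1 and then w2 gives w2 * (w1 * x), which in general is not (w2 * w1) * x, so the stack does not collapse into multiplication by a single combined octonion the way it would in an associative algebra.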