by celrod
33 days ago
I'm still working on it.
I'm currently working on a cache tile-size optimization algorithm that should (a) handle trees, where a set of loops can be merged at some cache levels and split at others (e.g., in an MLP it may carry an output through the L3 cache while doing sub-operations in the L2, L1, and registers), and (b) converge quickly enough that compile times stay acceptable. This is the last step before I move on to code generation, and then to generating a ton of test cases and debugging. My goal is some form of release by the end of the year.