by exabrial 75 days ago
His other project "Byte Me", along with judicious javap usage, has been super useful for me in learning JVM bytecode so I could write a machine learning model compiler for the JVM. Basically, it compiles your ML models (ONNX, tree ensembles, regressors, classifiers, etc.) to native JVM classes, with no massive runtime needed.

Still in the works, but it's here for those interested: Petrify: https://github.com/exabrial/petrify


Why not compile it to Java source code (not bytecode)? Users would then use their own Java compiler.

Just as, say, ANTLR generates code that parses text into an AST.

Great question, and actually I tried that! m2cgen is in fact a project that does exactly that.

It works fine for simple models, but breaks down for production-sized tree ensembles. The JVM has a hard 64KB limit on the bytecode of a single method, and javac controls how your deeply nested if/else trees get laid out. m2cgen's own FAQ says to reduce estimators when you hit recursion limits during generation. With direct bytecode emission I control the method structure precisely: I can split across methods exactly where needed and manage the constant pool directly. The bytecode I emit is also much more efficient than what m2cgen's equivalent source compiles to.
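A minimal sketch of the "split across methods exactly where needed" idea, not Petrify's actual implementation. It assumes the emitter can estimate each tree's bytecode size before writing it; the class name `MethodSplitPlanner`, the `plan` method, and the example sizes are all hypothetical. It greedily packs consecutive trees into groups so that each generated method stays under the per-method code limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides which trees of an ensemble share one generated method.
final class MethodSplitPlanner {
    // The class file format requires a method's code length to be below 65536 bytes.
    static final int MAX_CODE_BYTES = 65_535;

    // Greedily packs consecutive trees into groups whose estimated total size
    // fits the budget. Returns the tree indices belonging to each method.
    static List<List<Integer>> plan(int[] estimatedTreeBytes, int budget) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int used = 0;
        for (int i = 0; i < estimatedTreeBytes.length; i++) {
            int size = estimatedTreeBytes[i];
            if (size > budget) {
                // A single oversized tree would have to be split at the subtree
                // level, which this sketch does not attempt.
                throw new IllegalArgumentException("tree " + i + " exceeds the budget");
            }
            if (used + size > budget) {
                // Close the current method and start a new one.
                groups.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(i);
            used += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Assumed size estimates in bytes for four trees.
        int[] sizes = {30_000, 30_000, 30_000, 10_000};
        System.out.println(plan(sizes, MAX_CODE_BYTES)); // prints [[0, 1], [2, 3]]
    }
}
```

In practice the budget would probably be set below the hard limit to leave room for the code that calls each group and sums the results. When going through source and javac instead, the final bytecode size is only known after compilation, which makes this kind of planning harder.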

The source code is also a pretty useless intermediate step: it sets off all kinds of static analysis alarms in your stack, and I also worry about source code injection (not that it can't happen with Petrify, it's just a lot harder).

Finally, I'm grateful for the sweat the authors of m2cgen have put in, but the project has gone without updates for 4 years. That doesn't mean it's useless (some mature software never sees updates), but it's not a positive sign either.