Hacker News new | ask | show | jobs
by mahaloz 829 days ago
It’s always cool to see different approaches in this area, but I worry its benchmarks are meaningless without a comparison of non-AI based approaches (like IDA Pro). It would be interesting to see how this model holds up on metrics from previous papers in security.
1 comments

Thanks! We're working on Ghidra/IDA pro. The problem we face is the right kind of data to test with and how to evaluate it. It's like there's no "standard" benchmark/metrics that everyone uses for decompilation.
As others have said, the standardization of metrics is still something debated, but at the same time, this space has been explored by various top-tier papers that your paper did not cite. For example, DREAM [1], evaluated using the classic metric of goto-emittence. Rev.ng [2], evaluated using Cyclomatic Complexity and gotos. SAILR [3], evaluated using the previous metrics and a Graph Edit Distance score for the structure of the code.

I feel that without a justification for dropping previously established metrics by the peer review process, you weaken your new metrics. However, I still think this is an interesting paper. It just could be made more legit by thoroughly reading/citing previous work in the area and building an argument for why you may go against it.

[1]: https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan... [2]: https://rev.ng/downloads/asiaccs-2020-paper.pdf [3]: https://www.usenix.org/system/files/sec23winter-prepub-301-b...

References make a paper stronger!