Hacker News
by gg82 336 days ago
I wonder if embeddings could be created from open source and library code and then used to convert back the code with all the correct variable and function names.

It's not AI, but Ghidra has a cool feature called BSim which does something similar. Each function gets a "feature vector", which, now that I think about it, has some clear parallels to embeddings.
Wow, that is cool. I bet that with that feature and a huge database of known "feature vectors" from open-source libraries, you could focus on the actual business logic of the binary instead of trying to reverse external library functions.
BSim is a hash machine, right? (BSim uses feature vectors and locality-sensitive hashing.)
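To make the idea above concrete, here is a rough sketch of matching a function's feature vector against a database of known library functions, using random-hyperplane locality-sensitive hashing to pick candidates and cosine similarity to rank them. This is not BSim's actual implementation or API. The function names, the 4-dimensional vectors, and the 0.9 threshold are all made up for illustration.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_planes(dim, bits, seed=0):
    """Random hyperplanes; each one contributes one bit to the LSH key."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]


def lsh_key(vec, planes):
    """Bucket key: which side of each hyperplane the vector falls on."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes)


def build_index(known, planes):
    """Group known function names by their LSH bucket."""
    index = {}
    for name, vec in known.items():
        index.setdefault(lsh_key(vec, planes), []).append(name)
    return index


def best_match(vec, known, index, planes, threshold=0.9):
    """Return the most similar known function in the same bucket, or None."""
    candidates = index.get(lsh_key(vec, planes), [])
    scored = [(cosine(vec, known[name]), name) for name in candidates]
    if not scored:
        return None
    score, name = max(scored)
    return name if score >= threshold else None


# Hypothetical database: feature vectors for functions from open-source libraries.
known = {
    "memcpy": [5.0, 1.0, 0.0, 2.0],
    "strlen": [1.0, 4.0, 0.0, 0.0],
    "inflate": [0.0, 1.0, 6.0, 3.0],
}
planes = make_planes(dim=4, bits=8)
index = build_index(known, planes)

# An unnamed function from a stripped binary whose vector points the same way as memcpy's.
print(best_match([10.0, 2.0, 0.0, 4.0], known, index, planes))
```

With a single hash table like this, a near-duplicate can land in a neighbouring bucket and be missed. Real systems usually query several tables or probe nearby buckets to trade speed for recall.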

Embeddings could be derived from the reconstituted hash.

I've been wondering the same thing. However, you would have to have a very large database of embeddings for this to be useful, right?

OTOH, I can see this being disproportionately helpful when reverse engineering Rust and Go binaries, which usually include many open-source dependencies.