by a2code
819 days ago
The problem is interesting in at least two respects. First, an ideal decompiler would effectively eliminate proprietary (closed) source code, since any binary could be turned back into readable source. Second, the abundance of publicly available C code makes it easy to build a dataset of paired assembly and source code, and there is plenty of variety to draw on: optimization level, compiler choice, and target platform. What is unclear to me is why the authors fine-tuned the DeepSeek-Coder model. Could you instead train an LLM from scratch on a similar dataset? How large would the LLM need to be? Could it run locally?

It's basically always better to start training from a pre-trained model rather than from random initialization, even if what you want isn't that close to what the model was originally trained on.