| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bt1a 827 days ago
	I would also think that generating a very large amount of C code using coding LLMs (using deepseek, for example, + verifying that the output compiles) as synthetic training data would be quite beneficial in this situation. Generally the quality of synthetic training data is one of the main concerns, but in this case, the ability for the code to compile is the crux.

1 comments

Zambyte 826 days ago

I would think that the primary benefit of this over existing decompiler tools would be the ability to use sensible names for identifiers, break up a project to be a sensible set of modules, and maybe even add realistic / helpful comments. If you're synthesizing code to do that, you'll probably gain on the front of generating code that compiles, at the cost of these advantages.

link