Hacker News new | ask | show | jobs
by boltzmann-brain 664 days ago
how do you make an LLM work on the AST level? do you just feed a normal LLM a text representation of the AST, or do you make an LLM where the basic data structure is an AST node rather than a character string (human-language word)?
3 comments

The frontier models can all work with both source code and ASTs as a result of their standard training.

Knowing this raises the question, which is better to feed an LLM source code of ASTs?

The answer is really it depends on the use case, there are tradeoffs. For example keeping comments intact possibly gives the model hints to reason better. On the other side, it can be argued that a pure AST has less noise for the model to be confused by.

There are other tradeoffs as well. For example, any analysis relating to coding styles would require the full source code.

It looks like they're running `webcrack` to deobfuscate/unminify and then asking the LLM for better variable names.
I'm using both a custom Babel plugin and LLMs to achieve this.

Babel first parses the code to AST, and for each variable the tool:

1. Gets the variable name and surrounding scope as code

2. Asks the LLM to come up with a good name for the given variable name, by looking at the scope where the variable is

3. Uses Babel to make the context-aware rename to AST based on the LLM's response