|
|
|
|
|
by boomanaiden154
719 days ago
|
|
I'm not sure it's likely that the LLM here learned from gcc. The size optimization work here is focused on learning phase orderings for LLVM passes/the LLVM pipeline, which wouldn't be at all applicable to gcc. Additionally, they train approximately half on assembly and half on LLVM-IR. They don't talk much about how they generate the dataset other than that they generated it from the CodeLlama dataset, but I would guess they compile as much code as they can into LLVM-IR and then just lower that into assembly, leaving gcc out of the loop completely for the vast majority of the compiler specific training. |
|