|
|
|
|
|
by minosu
1152 days ago
|
|
This repository presents finetuned LLaMA models that try to address the limited ability of existing language models when it comes to generating code for less popular programming languages. gpt-3.5-turbo and gpt-4 have proven to be excellent coders, but fall off sharply when asked to generate code for languages other than Python/Javascript etc.
The godot-dodo approach to address this: Finetune smaller models on a single one of these languages, using human-created code scraped from MIT-licensed GitHub repositories, with existing GPT models generating instructions for each code snippet. This differs from the dataset generation approach used by projects such as stanford-alpaca or gpt4all, in that the output values of the training set remain high quality, human data, while following the same instruction-following behavior. This will likely prove more effective the more obscure the language. In this case, GDScript was used, which is the scripting language for the popular open-source game-engine Godot. The same approach however can be applied to any other language. Performance is promising, with the 7 billion parameter finetune outperforming GPT models in producing syntax that compiles on first try, while being somewhat less capable at following complex instructions. A comprehensive evaluation comparing all models can be found here:
https://github.com/minosvasilias/godot-dodo/tree/main/models |
|