|
|
|
|
|
by kantord
997 days ago
|
|
Currently it is hard limited to these file extensions: https://github.com/kantord/SeaGOAT/blob/ebfde263b970ddecdddf... It is to avoid wasting time processing files that cannot lead to good results. If you want to try it for a different programming language, please fork the repo and try adding your file formats and test if it gives meaningful results, and if it does please submit a pull request. Other than that one limitation is that it uses a model under the hood that is trained on a specific dataset which is filtered for a specific list of programming languages. So without changing the model as well, the support for other languages could be subpar. At the moment the model is all-MiniLM-L6-v2, here's a detailed summary of the dataset: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v... |
|