by artisanspam 996 days ago
What are the limitations on what languages this supports?

Currently it is hard limited to these file extensions: https://github.com/kantord/SeaGOAT/blob/ebfde263b970ddecdddf...

This avoids wasting time on files that are unlikely to produce good results. If you want to try it with a different programming language, please fork the repo, add your file formats, and test whether the results are meaningful. If they are, please submit a pull request.

Beyond that, one limitation is that it uses a model under the hood that was trained on a specific dataset, filtered for a specific list of programming languages. So without changing the model as well, support for other languages could be subpar. At the moment the model is all-MiniLM-L6-v2; here's a detailed summary of the dataset: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v...

Also, I plan to add features that incorporate a "dumb" analysis of the codebase, to avoid flooding the results with mostly irrelevant lines such as import statements or decorators. Those features would be language-dependent, so support would need to be added for each language.
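To make the "dumb" analysis idea concrete, here is a minimal sketch of what such a filter could look like for Python only. This is not SeaGOAT code: the names `LOW_VALUE_PATTERNS`, `is_low_value_line`, and `filter_result_lines` are hypothetical, and the patterns are only a guess at what counts as noise.

```python
import re

# Hypothetical, Python-only patterns for lines that rarely make useful
# search results. Each language would need its own list.
LOW_VALUE_PATTERNS = [
    # "import os" or "from a.b import c"
    re.compile(r"^\s*(import\s+\w|from\s+[\w.]+\s+import\s)"),
    # decorators such as "@staticmethod"
    re.compile(r"^\s*@\w"),
]


def is_low_value_line(line: str) -> bool:
    """Return True if the line looks like an import or a decorator."""
    return any(pattern.search(line) for pattern in LOW_VALUE_PATTERNS)


def filter_result_lines(lines):
    """Drop low-value lines from a list of candidate result lines."""
    return [line for line in lines if not is_low_value_line(line)]
```

A real implementation would probably down-rank such lines rather than drop them outright, since sometimes an import is exactly what you are searching for.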
Are the extensions configurable or truly hardcoded?
They are hardcoded at the moment, but I am willing to merge code that adds an option to override them.

A flag would probably solve it for some users, but the best way would be to add a configuration option. At the moment there is no config file/.rc file support in SeaGOAT, though there is an issue to add it and I'm happy to merge pull requests: https://github.com/kantord/SeaGOAT/issues/180

update: I changed the hardcoded set of languages to support the following:

Text files (.txt), Markdown (.md), Python (.py), C (.c, .h), C++ (.cpp, .hpp), TypeScript (.ts, .tsx), JavaScript (.js, .jsx), HTML (.html), Go (.go), Java (.java), PHP (.php), Ruby (.rb)

https://github.com/kantord/SeaGOAT#what-programming-langauge...

Based on the code, they're hardcoded. It seems like it'd be pretty straightforward to add an override flag though.
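For anyone who wants to try this, a rough sketch of an override flag might look like the following. It is an assumption about how this could be done, not how SeaGOAT is structured: the `--extensions` flag and the helpers `parse_extensions`, `is_supported`, and `build_parser` are all hypothetical. The default set mirrors the extension list posted in the update above.

```python
import argparse
from pathlib import Path

# Defaults copied from the extension list in this thread.
DEFAULT_EXTENSIONS = frozenset({
    ".txt", ".md", ".py", ".c", ".h", ".cpp", ".hpp", ".ts", ".tsx",
    ".js", ".jsx", ".html", ".go", ".java", ".php", ".rb",
})


def parse_extensions(raw: str) -> set:
    """Turn a string like 'rs, .kt' into {'.rs', '.kt'}."""
    result = set()
    for item in raw.split(","):
        item = item.strip().lower()
        if item:
            result.add(item if item.startswith(".") else "." + item)
    return result


def is_supported(path: str, extensions=DEFAULT_EXTENSIONS) -> bool:
    """Check whether a file's extension is in the allowed set."""
    return Path(path).suffix.lower() in extensions


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag: replaces the default set instead of extending it.
    parser.add_argument(
        "--extensions",
        type=parse_extensions,
        default=DEFAULT_EXTENSIONS,
        help="comma-separated list of file extensions to process",
    )
    return parser
```

With this sketch, passing `--extensions rs,kt` would process only `.rs` and `.kt` files. Whether the flag should replace the defaults or add to them is a design choice; replacing is simpler, but adding is probably less surprising for most users.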