| HN Mirror


by kantord 1044 days ago

Currently it is hard-limited to these file extensions: https://github.com/kantord/SeaGOAT/blob/ebfde263b970ddecdddf...

The limit is there to avoid wasting time processing files that cannot lead to good results. If you want to try it with a different programming language, please fork the repo, add your file formats, and test whether it gives meaningful results. If it does, please submit a pull request.

Another limitation is that it uses a model under the hood that was trained on a specific dataset, which is filtered for a specific list of programming languages. So without changing the model as well, support for other languages could be subpar. At the moment the model is all-MiniLM-L6-v2; here's a detailed summary of the dataset: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v...


kantord 1044 days ago

Also, I plan to add features that incorporate a "dumb" analysis of the codebase, to avoid spamming the results with mostly irrelevant matches such as import statements or decorators. Those features would be language-dependent, so support would need to be added for each language.
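To illustrate what such a "dumb" filter might look like for Python only, here is a minimal sketch. It is not SeaGOAT code: the function names and the regular expression are hypothetical, and a real implementation would need one such rule set per language.

```python
import re

# Hypothetical rule set for Python only: lines that are import statements
# or decorators are treated as low-value search results.
_LOW_VALUE = re.compile(r"^\s*(?:import\s+\S|from\s+\S+\s+import\s|@[\w.]+)")


def is_low_value_line(line: str) -> bool:
    """Return True if the line looks like an import or a decorator."""
    return bool(_LOW_VALUE.match(line))


def drop_low_value(lines):
    """Keep only the lines that are not imports or decorators."""
    return [line for line in lines if not is_low_value_line(line)]
```

Something like `drop_low_value(result_lines)` could run on the matched lines before they are shown, though down-ranking instead of dropping may be the better choice.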


tinix 1044 days ago

Are the extensions configurable, or truly hardcoded?


kantord 1044 days ago

It is hardcoded at the moment, but I am willing to merge code that adds an option to override it.

A flag would probably solve it for some users, but the best way would be to add a configuration option. At the moment there is no config file/.rc file support in SeaGOAT, though there is an issue to add it and I'm happy to merge pull requests: https://github.com/kantord/SeaGOAT/issues/180


kantord 1044 days ago

update: I changed the hardcoded set of languages to support the following:

Text files (.txt), Markdown (.md), Python (.py), C (.c, .h), C++ (.cpp, .hpp), TypeScript (.ts, .tsx), JavaScript (.js, .jsx), HTML (.html), Go (.go), Java (.java), PHP (.php), Ruby (.rb)

https://github.com/kantord/SeaGOAT#what-programming-langauge...
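For readers wondering what a hardcoded extension allowlist amounts to, here is a minimal sketch using the extensions listed above. The constant and function names are hypothetical and not taken from the SeaGOAT source; the case-insensitive comparison is also an assumption.

```python
from pathlib import Path

# Extensions copied from the list in the comment above. The names below are
# hypothetical, not SeaGOAT's actual identifiers.
SUPPORTED_EXTENSIONS = frozenset({
    ".txt", ".md", ".py", ".c", ".h", ".cpp", ".hpp",
    ".ts", ".tsx", ".js", ".jsx", ".html", ".go", ".java", ".php", ".rb",
})


def is_supported(path: str) -> bool:
    """Return True if the file extension is in the allowlist.

    Lowercasing the suffix is an assumption made for this sketch.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def filter_supported(paths):
    """Keep only the paths whose extension is in the allowlist."""
    return [p for p in paths if is_supported(p)]
```

Adding a language then means adding its extensions to the set and checking whether the results are meaningful.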


rockostrich 1044 days ago

Based on the code, they're hardcoded. It seems like it'd be pretty straightforward to add an override flag though.
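A rough sketch of such an override flag, using Python's standard `argparse`. The `--extensions` flag, the `parse_extensions` helper, and the abbreviated default set are all hypothetical; SeaGOAT does not necessarily expose anything like this.

```python
import argparse

# Abbreviated, hypothetical default set; the real list is longer.
DEFAULT_EXTENSIONS = frozenset({".py", ".md", ".txt"})


def _normalize(ext: str) -> str:
    """Lowercase the extension and make sure it starts with a dot."""
    ext = ext.strip().lower()
    return ext if ext.startswith(".") else "." + ext


def parse_extensions(argv=None):
    """Return the set of extensions to index, honoring a hypothetical flag."""
    parser = argparse.ArgumentParser(description="extension override sketch")
    parser.add_argument(
        "--extensions",
        default=None,
        help="comma-separated list that replaces the default extensions",
    )
    args = parser.parse_args(argv)
    if args.extensions is None:
        return DEFAULT_EXTENSIONS
    return frozenset(_normalize(e) for e in args.extensions.split(",") if e.strip())
```

With this sketch, `parse_extensions(["--extensions", "rs,kt"])` would replace the defaults entirely; whether an override should replace or extend the default set is a design choice left open here.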
