by mdaniel 1455 days ago
Out of curiosity: https://www.tabnine.com/code/java/classes/com.google.gson.Js... says it was taken from https://github.com/Vedenin/useful-java-links/tree/master/hel... which Tabnine annotates as "license: other" (it's actually CC-BY-SA-4.0 <https://github.com/Vedenin/useful-java-links/blob/1f4278c9ad...>). So (a) does Tabnine not know about CC licenses, and (b) how is the downstream user (i.e. the person accepting said snippet completion) supposed to interpret "license: other" without doing the same research I just did?

That story gets even weirder with the 3rd link on that page, whose license is also "other" but this time the "view source" link goes to https://www.tabnine.com/web/assistant/code/rs/5c781237e70f87... . I find that weird for at least two reasons: (a) it clearly says "This snippet was taken from github" and uses GitHub-style "org/repo" nomenclature, but doesn't link to the actual repo, and (b) at the very top of that file is the boilerplate Apache 2.0 license header.

Finally, one should be very cautious about ever linking to "master" URLs: the branch can get nuked if the repo owner decides to go with the "master to main" rename; the link can lead the user to a copy of the file that is almost guaranteed not to be the same sha as the one Tabnine indexed; and, relatedly, the repo can undergo a license change (FOSS to BSL is a very common one), leading to some complicated discussions.
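
To make the sha-pinning point concrete, here is a minimal sketch of turning a branch-based GitHub link into a commit-pinned one. The function name pin_blob_url, the example org/repo, and the sha are all made up for illustration. It assumes the indexed commit is already known and only handles the usual owner/repo/(blob|tree)/ref/path URL layout.

```python
from urllib.parse import urlsplit, urlunsplit


def pin_blob_url(url: str, branch: str, commit_sha: str) -> str:
    """Replace the branch in a GitHub blob/tree URL with a fixed commit sha.

    Hypothetical helper: the caller must supply the branch name, because a
    ref such as "feature/x" cannot be told apart from a path by URL alone.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")

    # Expected layout: owner / repo / (blob|tree) / <ref...> / <path...>
    if len(segments) < 4 or segments[2] not in ("blob", "tree"):
        raise ValueError(f"not a blob/tree URL: {url}")

    ref_segments = branch.split("/")
    ref_end = 3 + len(ref_segments)
    if segments[3:ref_end] != ref_segments:
        raise ValueError(f"URL does not point at branch {branch!r}")

    pinned = segments[:3] + [commit_sha] + segments[ref_end:]
    return urlunsplit(
        (parts.scheme, parts.netloc, "/" + "/".join(pinned),
         parts.query, parts.fragment)
    )


# Illustrative values only; neither the repo nor the sha is real.
print(pin_blob_url(
    "https://github.com/example-org/example-repo/blob/master/src/Main.java",
    "master",
    "0123456789abcdef0123456789abcdef01234567",
))
```

A link pinned this way keeps pointing at the same file contents after a branch rename, later edits, or a later license change on the branch.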


There's a standard GitHub uses for license files (which must be at the root of the repo) which fills in the "license" field on the right column of the repo. If the standard isn't met then the link just says "View license". I imagine TabNine is pulling the license from the GitHub API.

https://docs.github.com/en/repositories/managing-your-reposi...

https://github.com/Vedenin/useful-java-links
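
If the label really does come from the GitHub API (that's a guess, not something Tabnine has confirmed), "license: other" is what you'd expect to see: when GitHub finds a license file it can't match, the license object in the REST API comes back with key "other" and spdx_id "NOASSERTION". A rough sketch of reading that object follows. The helper name describe_license is hypothetical, and the sample payloads are hand-written and trimmed to the fields the sketch uses, not captured responses.

```python
def describe_license(repo_payload: dict) -> str:
    """Summarize the "license" object of an already-decoded GitHub repo payload.

    Hypothetical helper for illustration; it does no network access.
    """
    license_info = repo_payload.get("license")
    if not license_info:
        # The field is null when GitHub found no license file at all.
        return "none detected"

    spdx_id = license_info.get("spdx_id")
    if spdx_id and spdx_id != "NOASSERTION":
        return spdx_id

    # A license file exists but did not match any known template.
    return "other"


# Hand-written sample payloads, trimmed to the relevant fields.
recognized = {"license": {"key": "apache-2.0", "name": "Apache License 2.0",
                          "spdx_id": "Apache-2.0"}}
unrecognized = {"license": {"key": "other", "name": "Other",
                            "spdx_id": "NOASSERTION"}}

print(describe_license(recognized))
print(describe_license(unrecognized))
```

A consumer that passes "other" straight through to the user has thrown away nothing GitHub told it; the actual license text still has to be read from the file in the repo.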

When master branch is renamed to main, GitHub redirects any old links. https://github.com/github/renaming#renaming-existing-branche...

A fine reason for them to bork the useful-java-links case, I guess, but your argument falls over for the 3rd link: https://github.com/MovingBlocks/Terasology/blob/develop/LICE... and the sidebar widget correctly says "Apache-2.0 license"

The code-search application is completely distinct from the IDE-assistant application. They do not share any code, have completely different pipelines for training the backend and completely different datasets. The specific source you mentioned will not be part of the training dataset for Tabnine's IDE assistant. Hope this clarifies.

> The code-search application is completely distinct from the IDE-assistant application

Ok, but how is anyone supposed to know that? There's no verbiage anywhere on those search screens saying that

I thought using the code search with its auto-complete widgets would be a "try it in the browser" version of the product, but what I'm hearing is no, it's just some separate toy trying to beat Sourcegraph or something?