by a-dub 1690 days ago
hm. how many large integer literals are there in your code? it could just be learning that user ids are long strings of digits, and making a guess as to which long string of digits (based on some context, like sharing a line with "id" in it) might be the right one...

But that's the thing: if it had taken the whole string, that would be fine. Instead it extracted the exact right substring. There were 5 others in the same file, none of which I would have been able to distinguish from each other without context.
think of it this way, in the entire corpus of github, how often do you think that there are numeric identifiers that appear near terms like "id" where the numeric part is then used elsewhere with terms like "id" or terms that are frequently found near terms like "id"?

don't get me wrong, it's cool, but these models operate on a character by character basis with sequence context. if they can learn things like matching pairs of parens and quotes in certain contexts, it seems they could certainly learn things like extracting long strings of digits.

now what would be cool would be if they could generate regular expressions for the rules they're learning.
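
for illustration, here's a hand-written version of the kind of rule i mean. this is only a sketch: the function name, the 6-digit threshold, and the "line mentions id" check are all my own assumptions, not anything extracted from a model.

```python
import re

# Hypothetical heuristic: on any line that mentions "id" (case-insensitive),
# pull out every run of 6 or more digits. The threshold is an arbitrary choice.
LONG_DIGITS = re.compile(r"\d{6,}")
MENTIONS_ID = re.compile(r"id", re.IGNORECASE)

def extract_ids(text):
    """Return long digit runs found on lines that contain 'id'."""
    found = []
    for line in text.splitlines():
        if MENTIONS_ID.search(line):
            found.extend(LONG_DIGITS.findall(line))
    return found

sample = 'user_id = "u_1234567890"\ncount = 99999999\n'
print(extract_ids(sample))  # ['1234567890']
```

it's crude (it would also fire on words like "width"), and it can't tell which of several ids on matching lines is the right one, which is the part that needs context.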