

by kevinsundar 1690 days ago

Seems totally reasonable to me too. It has probably just seen a pattern like this:

```
str = "{{SOME_ID_HERE}}-jash127hg27128h"

participant.follow({twitterUserId: "{{SOME_ID_HERE}}"})
```


harshitaneja 1690 days ago

This doesn't seem likely. No one would generate it this way, since the access token is issued after OAuth, and I am unaware of any method to get the second half of the token without the first half. The same response that contains the access token also passes the user ID, so there is no need to extract it from the token.


a-dub 1690 days ago

hm. how many large integer literals are there in your code? it could just be learning that user ids are long strings of digits and is making a guess as to which long string of digits (based on some context, like sharing a line with "id" in it) might be the right one...


harshitaneja 1690 days ago

But that's the thing: if it took the whole string it would be fine, but it extracted the exact right substring. There were 5 others in the same file, none of which I would have been able to distinguish from each other without context.


a-dub 1690 days ago

think of it this way, in the entire corpus of github, how often do you think that there are numeric identifiers that appear near terms like "id" where the numeric part is then used elsewhere with terms like "id" or terms that are frequently found near terms like "id"?

don't get me wrong, it's cool, but these models operate on a character by character basis with sequence context. if they can learn things like matching pairs of parens and quotes in certain contexts, it seems they could certainly learn things like extracting long strings of digits.

now what would be cool would be if they could generate regular expressions for the rules they're learning.
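
for illustration, the kind of rule being discussed here could be written down as a regex by hand. this is only a minimal sketch, assuming the token has the shape `<digits>-<opaque suffix>` like the example at the top of the thread; `extractUserId` and the token value are made up for this example and say nothing about what the model actually learned.

```javascript
// Hypothetical helper (not from any real API): return the leading run of
// digits from a token shaped like "<digits>-<opaque suffix>", or null if
// the token does not start with digits followed by a hyphen.
function extractUserId(token) {
  const match = /^(\d+)-/.exec(token);
  return match ? match[1] : null;
}

// Made-up token value, for illustration only.
const token = "1234567890-jash127hg27128h";
console.log(extractUserId(token));
```

a rule like this only looks at the shape of the string, which is the point: it needs no knowledge of oauth to pick out the digits before the hyphen.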
