by benhamner 1644 days ago

This uses ~2-3k tweets per day for most days, which seems to be more than enough. According to https://twitter.com/WordleStats/status/1486021209015963649 there are about 250k daily tweets per Wordle right now, so this is roughly a 1% sample, drawn from whatever the Twitter search API returned when I ran that query.

The simulated distributions it's comparing to are based on 1000 runs per 5-letter word.

Anecdotally, 250 runs was enough to get it working for those simulated distributions; at 100 and below it became increasingly noisy. A higher N would be nice, but I didn't spend more time optimizing the performance of the simulation code beyond what was needed to get this working.
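As a rough back-of-the-envelope illustration of why the noise grows quickly at small N: if each simulated run is treated as an independent draw, the standard error of an estimated frequency p after N runs is sqrt(p(1-p)/N). The sketch below just evaluates that formula; the value p = 0.2 is an arbitrary illustrative assumption, not a number from the simulation, and `proportion_std_error` is a hypothetical helper name.

```python
import math

def proportion_std_error(p: float, n: int) -> float:
    """Standard error of a frequency p estimated from n independent runs."""
    return math.sqrt(p * (1.0 - p) / n)

# p = 0.2 is an assumed, illustrative frequency (not taken from the post).
for n in (100, 250, 1000):
    print(n, round(proportion_std_error(0.2, n), 4))
# 100 0.04
# 250 0.0253
# 1000 0.0126
```

Under that assumption, going from 1000 runs to 100 roughly triples the sampling error of each estimated frequency.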

3 comments

bscphil 1644 days ago

This is a cool project, but I wanted to tell you that your evaluate_guess function is wrong.

    evaluate_guess(answer="crest", guess="erase")
    "MYNYM"

Many people misunderstand this, but it's not how the rules actually work. The correct result here would be MYNYN, because there is only one E in the correct answer. There must be a 1-1 correspondence between each 'M' letter in the guess and a letter in the answer. This is similar to the rules for the game "Mastermind".


waterproof 1644 days ago

Right, I wonder how many of the “fake/invalid” tweets that OP observed are actually this bug in the analysis code.

EDIT: actually it looks like it’s correct - evaluate_guess_char() only returns “M” if there’s an instance of the guess letter that’s not accounted for.


bscphil 1643 days ago

It's not correct, I pasted the code from the article directly into ipython.

It filters out cases where the corresponding character in the answer is already matched exactly (a 'Y'), but not cases where it has already been used for another maybe (an 'M'). The latter requires keeping track of state in a way that this code doesn't.

For example:

    evaluate_guess(answer="crest", guess="erase")
    'MYNYM'

Which is wrong, as stated above.

    evaluate_guess(answer="crest", guess="erese")
    'NYYYN'

Which is right, even though we only changed the middle letter of the guess and neither of the broken letters. In this case the filtering works correctly.
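For what it's worth, one common way to keep track of that state is a two-pass approach: mark the exact matches first, count the answer letters left over, then hand out 'M's left to right only while an unmatched copy remains. This is a minimal sketch, not the article's code; `evaluate_guess_two_pass` is a hypothetical name, and it assumes the same Y/M/N encoding and equal-length lowercase words.

```python
from collections import Counter

def evaluate_guess_two_pass(answer: str, guess: str) -> str:
    """Score a guess: 'Y' exact match, 'M' letter elsewhere, 'N' no match."""
    result = ["N"] * len(guess)
    # Answer letters not consumed by an exact match; these can justify an 'M'.
    remaining = Counter()

    # Pass 1: mark exact matches and count the unmatched answer letters.
    for i, (a, g) in enumerate(zip(answer, guess)):
        if a == g:
            result[i] = "Y"
        else:
            remaining[a] += 1

    # Pass 2: left to right, give 'M' only while an unmatched copy is left.
    for i, g in enumerate(guess):
        if result[i] != "Y" and remaining[g] > 0:
            result[i] = "M"
            remaining[g] -= 1

    return "".join(result)

print(evaluate_guess_two_pass("crest", "erase"))  # MYNYN
print(evaluate_guess_two_pass("crest", "erese"))  # NYYYN
```

The counter is what enforces the 1-1 correspondence: the single E in "crest" is used up by the first E in "erase", so the trailing E gets an 'N'.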


Scaevolus 1644 days ago

If you want to get even more tweets, you could use Twitter's streaming API with the keyword "Wordle": http://adilmoujahid.com/posts/2014/07/twitter-analytics/

It should allow capturing a significant fraction of the 250k daily Wordle tweets.


gojomo 1644 days ago

Besides eliminating the superficially-impossible rows (like `YYYYM`), does it do anything against more-sophisticated chaffing, like one or more accounts posting possible-but-inaccurate hint grids pointing at an alternate answer?
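To make the "superficially impossible" case concrete: with four letters already in the right place, the fifth guess letter could only earn an 'M' by matching the one answer letter left over, which sits in that same position and would therefore score 'Y'. A sketch of that single check, assuming the Y/M/N encoding used in this thread (the function name is hypothetical, and this is not claimed to be the article's filter):

```python
def is_superficially_impossible(row: str) -> bool:
    """True for a 5-letter row with four 'Y's and one 'M', e.g. 'YYYYM'."""
    return len(row) == 5 and row.count("Y") == 4 and row.count("M") == 1

print(is_superficially_impossible("YYYYM"))  # True
print(is_superficially_impossible("YYYYN"))  # False
```

A check like this only catches rows that no answer could produce; it says nothing about rows that are valid for some other word.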
