the most interesting part of this imo is the heuristic it uses to find the most likely word:
# Greatly penalize words that have multiple of the same black letters
# Greatly penalize words that have the same black letters as any seen word
# Greatly penalize words that have the same yellow letter/index pair as any seen word
# Reward words that have yellow letters that match yellow letters in seen words, but in different positions
# Reward words that have yellow letters that match green letters in seen words
# Penalize words that have yellow letters that don't appear in seen words
# Penalize words that have green letters that don't appear in seen words
# Reward words based on commonality
# Reward/penalize words based on line indexes and common letters
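To make the shape of a heuristic like this concrete, here is a minimal sketch of how a few of the rules above could combine into one score. This is not the gem's code: the `score_candidate` and `letters_with_color` names, the weights, the hash layout for a guess, and the 0.0..1.0 `commonality` value are all assumptions made up for illustration, and only four of the rules are covered.

```ruby
# Hypothetical weights; the project's real values are not shown in this thread.
REPEATED_BLACK_PENALTY = 50
SEEN_BLACK_PENALTY     = 30
SEEN_YELLOW_PENALTY    = 30
COMMONALITY_WEIGHT     = 10

# A guess is assumed to look like:
#   { word: "stare", colors: [:black, :black, :yellow, :black, :black] }
# Returns [letter, index] pairs for every position with the given color.
def letters_with_color(guess, color)
  guess[:word].chars.each_with_index.select { |_, i| guess[:colors][i] == color }
end

# candidate:   the guess being scored
# seen:        guesses already chosen for the other lines
# commonality: assumed Float in 0.0..1.0 from some word-frequency list
def score_candidate(candidate, seen, commonality)
  score = 0

  # Rule: the same black letter appearing more than once in one word.
  blacks = letters_with_color(candidate, :black).map(&:first)
  score -= REPEATED_BLACK_PENALTY * (blacks.size - blacks.uniq.size)

  # Rule: black letters that were already black in a seen word.
  seen_blacks = seen.flat_map { |g| letters_with_color(g, :black).map(&:first) }.uniq
  score -= SEEN_BLACK_PENALTY * (blacks.uniq & seen_blacks).size

  # Rule: the same yellow letter/index pair as a seen word.
  yellows = letters_with_color(candidate, :yellow)
  seen_yellows = seen.flat_map { |g| letters_with_color(g, :yellow) }.uniq
  score -= SEEN_YELLOW_PENALTY * (yellows & seen_yellows).size

  # Rule: reward common words.
  score + (COMMONALITY_WEIGHT * commonality).round
end
```

The candidate with the highest score for each line would then be the decoder's best guess at what the player typed.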
I built something similar to this that just lists all the possible words, and is nowhere near as smart -- but it lets you draw pretty pictures! https://jse.li/wordle-reverse/
I honestly thought I removed those comments lol. They might be a little outdated, but yeah, I spent a lot of time thinking about what kinds of information I could glean. There's definitely some bias in there towards a certain kind of strategy/play.
Looking at the parent run, my first thought was that its initial guess is a relatively uncommon word and (I assume) probably not a very good opener. There are various ranked lists of the best starting words going around. I wonder if you could get to the best results by picking the highest-ranked word that is consistent with the other lines?
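A rough sketch of that idea, assuming you already know the answer and the color pattern of the first line. `feedback` is the usual Wordle coloring (greens first, then yellows limited by how many of each letter are left); `best_consistent_opener` and the `ranked_openers` list are hypothetical names, and the ranking itself would have to come from one of those lists.

```ruby
# Standard Wordle coloring for a guess against a known answer.
def feedback(guess, answer)
  colors = Array.new(guess.length, :black)
  remaining = Hash.new(0) # answer letters not consumed by a green

  # First pass: mark greens and count the unmatched answer letters.
  guess.each_char.with_index do |ch, i|
    if answer[i] == ch
      colors[i] = :green
    else
      remaining[answer[i]] += 1
    end
  end

  # Second pass: mark yellows while unmatched copies of the letter remain.
  guess.each_char.with_index do |ch, i|
    next if colors[i] == :green
    if remaining[ch] > 0
      colors[i] = :yellow
      remaining[ch] -= 1
    end
  end

  colors
end

# ranked_openers is assumed to be ordered best-first.
# Returns the first opener whose coloring matches the observed line, or nil.
def best_consistent_opener(ranked_openers, answer, observed_colors)
  ranked_openers.find { |word| feedback(word, answer) == observed_colors }
end
```

For example, `best_consistent_opener(%w[salet crane adieu], "react", feedback("crane", "react"))` picks "crane", because "salet" would have produced a different pattern.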
I thought about using lists like that, but it felt a little too much like cheating. My attempt at something like this: if the word is one of your first two guesses, I reward it for having more common letters, and I actually penalize common-letter "misses" in later lines. I'm sure it's not 100% optimal, but I saw some improvement on my admittedly very limited set of test data. Here's the relevant code - https://github.com/mattruzicka/wordle_decoder/blob/0bfd7eaac...
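In rough terms, and only as my reading of that description rather than a copy of the linked code, the adjustment could look like the sketch below. The `COMMON_LETTERS` list, the `common_letter_adjustment` name, and the one-point-per-letter weighting are assumptions.

```ruby
# Hypothetical set of "common" letters; the linked code may use a different list.
COMMON_LETTERS = %w[e a r o t l i s n].freeze

# word:       the candidate guess
# colors:     one of :green, :yellow, :black per letter
# line_index: 0 for the first guess, 1 for the second, and so on
def common_letter_adjustment(word, colors, line_index)
  if line_index < 2
    # Early guesses: reward each distinct common letter.
    word.chars.uniq.count { |ch| COMMON_LETTERS.include?(ch) }
  else
    # Later guesses: penalize common letters that came back black ("misses").
    misses = word.chars.each_with_index.count do |ch, i|
      colors[i] == :black && COMMON_LETTERS.include?(ch)
    end
    -misses
  end
end
```

The intuition is that players tend to burn through common letters early, so a late guess that still misses on several of them is less plausible.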
Of course, doing what I wrote also makes the assumption that the person playing is using one of the top 10 (or whatever) starting words which may or may not be correct.
That aside, I started playing a 6-letter variant. I haven't done any real analysis with access to the word list, but I did do a quick Word Hippo search for words with common letters, especially vowels. I'm sure my results aren't optimal, but they're probably pretty good.
ADDED: Which I think is probably not all that different from what your code is doing. I fed Word Hippo 4 or 5 letters and told it to come up with some common words.
But that still requires an assumption that a player has given some thought to good starting guesses.
I briefly had the thought of creating a game where you get a bunch of those Wordle color square messages from in-game "friends", and maybe some side info ("Tim always guesses adieu as his first word", "Jane always uses one of her kids' names - Olive or Hazel", etc.), and then you have to guess the word. Haven't had time to experiment with it though, so I'm not sure how fun it'd be. I don't retain any rights to the idea if anyone wants to run with it and try it out.
I tried that in a single game with a friend's Octordle result. You have to solve it yourself first, then work out their initial guesses. Here's my solution to today's Octordle:
Throwing in those extra hints about the players makes it feel more like a zebra puzzle, but that's not a bad thing, especially if you want mere humans to solve it. I can't easily process a thousand tweets, so giving more compact data is key.
Even using the example on the GitHub page, it crashes for me:
/Library/Ruby/Gems/2.6.0/gems/wordle_decoder-0.1.2/lib/wordle_decoder/wordle_share.rb:8:in `block in final_line?': undefined local variable or method `_1' for WordleDecoder::WordleShare:Class (NameError)
from /Library/Ruby/Gems/2.6.0/gems/wordle_decoder-0.1.2/lib/wordle_decoder/wordle_share.rb:8:in `any?'
from /Library/Ruby/Gems/2.6.0/gems/wordle_decoder-0.1.2/lib/wordle_decoder/wordle_share.rb:8:in `final_line?'
from /Library/Ruby/Gems/2.6.0/gems/wordle_decoder-0.1.2/bin/wordle_decoder:14:in `<top (required)>'
from /usr/local/bin/wordle_decoder:23:in `load'
from /usr/local/bin/wordle_decoder:23:in `<main>'
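A note on that trace: `_1` is a numbered block parameter, which Ruby only added in 2.7, and the paths point at the 2.6.0 gem directory, so this looks like the gem running on an older Ruby (possibly the macOS system one) rather than a bug in the example. Running it under a newer Ruby should avoid the error. For illustration, here is the difference in a simplified stand-in; the method body and the "ggggg" all-green marker are hypothetical, not the gem's actual source.

```ruby
# Hypothetical stand-in for the failing method; the real line is not quoted in this thread.
# On Ruby 2.7+ a block like this can be written with a numbered parameter:
#   lines.any? { _1 == "ggggg" }
# The explicit block parameter below means the same thing and also parses on Ruby 2.6.
def final_line?(lines)
  lines.any? { |line| line == "ggggg" }
end
```

You can check which interpreter is being used with `ruby -v`.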
Yeah, you can think of it as a percentage. The min is 1 and the max is 99. It could be 100, but I don't trust my mostly untested side project that much, so I cut it off at 99 :)
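As a minimal sketch of that cutoff, assuming the underlying score is a 0.0..1.0 value (the `confidence_percent` name and the input range are my assumptions, not the gem's API):

```ruby
# raw_score is a hypothetical confidence value between 0.0 and 1.0.
# The result is rounded to a whole percentage and clamped to 1..99.
def confidence_percent(raw_score)
  (raw_score * 100).round.clamp(1, 99)
end
```

So a "perfect" internal score still shows up as 99, and a hopeless one as 1.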