I'd love to see your code! My strategy[0] (not-yet-automated) doesn't aim to minimize in the worst case, but rather to minimize the expected size of the set of possible words.
Have you done any thinking about hard mode? Getting trapped in a 'deep' pattern (?ight: e, f, l, m, n, r, s, t, w) too soon seems like a problem. But this is way outside my bailiwick.
Yeah, I was trying to determine the utility function that would determine how to select words, but I'm more of a gut-feel, intuitive sort of person. I also tried a squared term like yours, but for some reason, it didn't feel right when I tested it. That version decided the best initial word was `LARES`. I have a sneaky suspicion that we need to account for how common a word was. I think my solver was getting to hung up worrying about BRAXY and CRURA, and giving them similar weight to a word like TRACK.
However, it was very hard to debug because a slightly buggy version was still decent at playing the game! In fact, I'm fairly certain I still have some bugs. I need to comment my code and get it up on Github. It's also super brute force O(n^2)
Okay, I ran it again with the squared term in the utility function. Here's what it did:
L A R E S (0 green, 0 yellow, 576 words remain)
T O N I C (1 green, 0 yellow, 50 words remain)
B O O D Y (4 green, 0 yellow, 5 words remain)
D E G U M (0 green, 2 yellow, 1 word remains)
G O O D Y (5 green, 0 yellow, 0 words remain)