by jengland 1168 days ago
I prompted GPT-4 with this and I think it solved it. The first time it assumed the words were generated with random letters and solved it like a math textbook problem. Then I prompted it with this:

Ok, I see you calculated the probability using randomly generated "words" from the letters of the English alphabet. I am interested in the actual probability that two real English words that are 5 letters long share the first three characters.

I am a Python developer, so I will understand it if you give me a Python script.

It gave me this, which looks right to me:

  import nltk
  from collections import defaultdict
  
  nltk.download('words')
  from nltk.corpus import words
  
  # Get the English words
  english_words = words.words()
  
  # Filter the words to get only five-letter words
  five_letter_words = [word for word in english_words if len(word) == 5]
  
  # Create a dictionary to store the count of words with the same first three letters
  words_dict = defaultdict(int)
  
  # Count the words with the same first three letters
  for word in five_letter_words:
      key = word[:3]
      words_dict[key] += 1
  
  # Calculate the number of pairs with the same first three letters
  same_first_three_letter_pairs = sum((count * (count - 1)) // 2 for count in words_dict.values())
  
  # Calculate the total number of pairs from the five-letter words list
  total_pairs = (len(five_letter_words) * (len(five_letter_words) - 1)) // 2
  
  # Calculate the probability
  probability = same_first_three_letter_pairs / total_pairs
  
  print(f"Probability: {probability:.4f} or {probability * 100:.2f}%")
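One caveat I'd hedge on: as far as I can tell, the NLTK word list mixes capitalized entries (proper names) with lowercase ones, so the script above treats "Apple" and "apple" as different words with different prefixes. Here is a rough sketch of the same pair-counting idea with case folding and de-duplication added. The function name `shared_prefix_probability`, its parameters, and the tiny sample list are my own placeholders, not part of the GPT-4 output, and it takes any iterable of words so it can be tried without downloading the corpus.

```python
from collections import Counter
from math import comb  # requires Python 3.8+


def shared_prefix_probability(words, word_len=5, prefix_len=3):
    """Chance that two distinct words of length word_len share a prefix.

    Hypothetical helper: lowercases, de-duplicates, and keeps only
    purely alphabetic words before counting pairs.
    """
    candidates = {
        w.lower() for w in words if len(w) == word_len and w.isalpha()
    }
    n = len(candidates)
    if n < 2:
        raise ValueError("need at least two words of the requested length")

    # Number of words per prefix, then unordered pairs within each group
    prefix_counts = Counter(w[:prefix_len] for w in candidates)
    same_prefix_pairs = sum(comb(c, 2) for c in prefix_counts.values())

    # Divide by the number of unordered pairs overall
    return same_prefix_pairs / comb(n, 2)


# Made-up sample: "Apple" folds into "apple", and "cat" is filtered out
sample = ["apple", "apply", "Apple", "angle", "bread", "break", "tiger", "cat"]
print(f"Probability: {shared_prefix_probability(sample):.4f}")
```

To run it on the real corpus, passing `words.words()` from the script above as the `words` argument should work; I haven't checked how much the case folding changes the final number.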