Hacker News new | ask | show | jobs
A (deceptively tricky?) Python question
5 points by noaharc 5887 days ago
I am writing a Scrabble AI, and I want to handle blank tiles. Right now, I want to expand a string containing blank tiles into a list of strings composed of words created by all possible interpretations of the blanks.

So, "H_LLO" ==> ["HALLO", "HBLLO", ... "HZLLO"] (length is 26)

However, Scrabble has 2 blank tiles, so I can't just do a simple for loop and replace.

For example, "H_LL_" ==> ["HALLA", "HALLB", ... "HBLLA", ... "HZLLZ"] (length is 26x26 = 676)

What is the cleanest way to do this? The best I could come up with is the following, but I don't much like it:

  import re
  from itertools import product

  w = "H_LL_"
  expanded = []
  product_args = list()
  for _ in xrange(w.count("_")):
      product_args.append( map(chr, range(65, 91)) )
  for repl_str in product(*tuple(product_args)):
      wtemp = w
      for c in repl_str:
          wtemp = re.sub('_', c, wtemp)
      expanded.append(wtemp)
5 comments

Try this for when you need to fill 2 blank tiles:

  import string
  from itertools import product
  
  w = "H_LL_"
  w_fmt = w.replace('_', '%s')
  
  product_args = [list(string.uppercase)] * 2
  
  expanded = [w_fmt % repl for repl in product(*product_args)]
Many thanks! I'm using it now.
Do you want to just find dictionary words based on board configuration and a bench? Typically they build a Directed Acyclic Word Graph from the dictionary and then you can just traverse the graph.

So for your example, you start at the H node and follow all paths. Then follow the L path if it exists. Then follow the next L path. Then follow all paths from that node. Then check if the word has ended.

Here's another way to do this in Python.

   lets =  [chr(x) for x in range( ord('a'),ord('z')+1)]

   def blankexpand( templates ):
       expansion = []
       for t in templates:
        expansion = expansion + [t.replace('_',let,1) for let in lets]
       return expansion


   def expand( template ):
        blanktiles = template.count('_')
        if blanktiles == 0:
                print 'nothing to expand'
        elif blanktiles == 1:
                print blankexpand( [template] )
        elif blanktiles == 2:
                print blankexpand( blankexpand( [template] ) )
        else :
                print 'too many blank tiles'
        print


   expand( 'abce' )
   expand( 'a_')
   expand( 'h_ll_')
   expand( '___' )
I would simply use hyphen '-' as well as the underscore '_'.

Another almost as trivial solution would be to have a string with two blanks and permute away !

Well, do you really want ALL possibilities, or just valid ones (actual words - HZLLO is not a word)?

If you want all possibilities, I might suggest wrapping it in a function that can lazily return the possibilities one by one as needed (see the yield keyword).

If you want only valid entries, you will probably want something more sophisticated (e.g. a search tree)

Yes, if you wanted to only have valid words, this could reduce the search space dramatically. I know it's not python, but if you replaced the '_'s with '.'s just the grep command will be surprisingly quick and powerful.

$ grep -i ^H.LL.$ /usr/share/dict/words

Maybe a similar method using the re module and then matching on your dictionary?

All possibilities with only two characters is still <1000 combinations. At this sort of small number I'd take a guess and say that it would be quicker to use a slightly less efficient algorithm and let the Python internals do the processing on lists than to iterate at Python level.

The reason is that internal Python operations are orders of magnitude faster than a looping construct where the loop is written in Python.

So I'd have a set for the allowed word list, generate a set of permutations and then ask Python for the intersection, rather than iterating through the permutations on the fly.