by haikuginger 3059 days ago
If you need to check against multi-word tags, I'd suggest a utility function that expands the description into every contiguous phrase of one or more words. That should still be substantially faster than the current approach, and you can improve it further by only generating phrases with no more words than the longest tag.

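    def get_all_phrases(descr):
        """Return every contiguous phrase of one or more words in descr."""
        words = descr.split()
        if len(words) == 1:
            return words
        phrases = []
        # Single words are already in `words`; add runs of 2..len(words) words.
        for i in range(2, len(words) + 1):
            phrases += get_phrases_of_len(i, words)
        return words + phrases

    def get_phrases_of_len(length, words):
        """Return each run of `length` consecutive words, joined by spaces."""
        return [' '.join(words[i:i+length]) for i in range((len(words) - length) + 1)]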
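
For the max-length idea, here's a rough sketch of how it might look. The names `phrases_up_to` and `matching_tags` are made up for illustration, and it assumes tags are plain strings matched exactly (case-sensitive, words separated by single spaces), which may not be how your tags are stored.

```python
def phrases_up_to(descr, max_words):
    """Contiguous word runs of 1..max_words words from descr."""
    words = descr.split()
    # Never ask for phrases longer than the description itself.
    longest = min(max_words, len(words))
    return [
        " ".join(words[i:i + n])
        for n in range(1, longest + 1)
        for i in range(len(words) - n + 1)
    ]


def matching_tags(descr, tags):
    """Return the tags that appear as a contiguous phrase in descr.

    Assumes exact, case-sensitive matches on space-separated words.
    """
    tag_set = set(tags)
    if not tag_set:
        return set()
    # No phrase longer than the longest tag can ever match.
    max_words = max(len(tag.split()) for tag in tag_set)
    return tag_set.intersection(phrases_up_to(descr, max_words))
```

With the tags in a set, each phrase lookup is a hash check, so the cost is driven by the number of phrases generated rather than the number of tags. Capping the phrase length keeps that count roughly linear in the description length instead of quadratic.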