by zahlman
77 days ago
Sure, but making one string from the file contents is surely much better than having a separate string per word in the original data. ... Ah, but I suppose the existing code hasn't avoided that anyway. (It's also creating regex match objects, but those get disposed of each time through the loop.) I don't know that there's really a way around that. Given the file is barely a KB, I rather doubt that the illustrated techniques are going to move the needle. In fact, it looks as though the entire data structure (whether a dict, Counter etc.) should be a relatively small part of the total reported memory usage. The rest seems to be internal Python stuff.
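One rough way to check that guess is to measure only what gets allocated while the structure is built and compare it with whatever total your profiler reports. This is a minimal sketch, not the code under discussion: `measure_counter` and the sample text are made up for illustration, and `tracemalloc` only sees allocations made by Python after `start()`, so the interpreter's own baseline is excluded by design.

```python
import sys
import tracemalloc
from collections import Counter


def measure_counter(text):
    """Build a word Counter and report how much memory that step allocated.

    Returns (counts, retained_bytes, peak_bytes). Hypothetical helper,
    not part of the original code being discussed.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    counts = Counter(text.split())
    after, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return counts, after - before, peak


# Stand-in for the small input file: a bit under 1 KB of text.
sample = "the quick brown fox jumps over the lazy dog " * 20

counts, retained, peak = measure_counter(sample)

# Shallow size of the container only; the key strings are not included.
container_only = sys.getsizeof(counts)

print(f"retained while building: {retained} bytes")
print(f"peak while building:     {peak} bytes")
print(f"Counter (shallow):       {container_only} bytes")
```

If the numbers printed here are tiny next to the process-level figure, the difference is mostly the interpreter itself rather than the dict or Counter.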
If you don't care about efficiency you can just do len(set(text.split())), but that's barely worth making a function for.
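Spelled out as a function, assuming `text` already holds the whole file as one string and that "word" just means a whitespace-separated token (the name `count_unique_words` is made up for illustration):

```python
def count_unique_words(text):
    """Count distinct whitespace-separated tokens in text.

    Case-sensitive, and punctuation stays attached to the token,
    so "Cat" and "cat," count as different words.
    """
    return len(set(text.split()))


sample = "the cat and the hat and the bat"
print(count_unique_words(sample))  # 5
```

This builds the full list of tokens and then a set, so it holds every word in memory at once. For a file of about a KB that hardly matters.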