| 1) I use unittest since I ship code to others and I want fewer dependencies. I think if I was doing this only for myself I might look to nose. But I've been using unittest long enough that I know how to get what I want from it, so I haven't bothered to learn nose. 2) I don't know about Py3 issues like that. 3.1) I am not much of a believer about "SRP". What's your public/published API? Is it useful? It is understandable? Does your exposed module structure give a domain view or an internal organizational view that might change after refactoring. Behind the scenes you're more free to design as you wish. 3.2) In a small package like this, I would have everything in a single file. 4) I don't like the "import re" and other class attributes in ClippingsIterator. I prefer them as global variables, and prefix with a "_" as a hint that they aren't meant as part of the API. I believe this preference is the standard consensus. 5) don't know. 6) I'm not knowledgeable enough to judge, but it looks fine to me. Minor points: for c in clippings:
self.clippings.append(c)
is more idiomatically written self.clippings.extend(c)
"(clipping['book'],clipping['author'])" should have a space after the comma.In general it's best to have things like "import os" done once, at the top of the module. "os" is a builtin, is always present, and the "import os" has almost no overhead. The jinja2 import is a bit more special. Do you want the code to work even if jinja2 isn't installed? In which case, what you have is good. I would add a comment, on the same line as "from jinja2 import ...", saying "available from <url>". This gives an additional clue should a failure occur. But if your code doesn't make sense without jinja2 then I would put the import at the top of the module, to occur at import time. This is more like what people expect, so would be easier to diagnose should a failure occur. It doesn't look like "import_clippings" is part of the exposed API, since it takes no parameters and always sets an instance variable that looks like it should only be set once. In that case, prefix it with a "_" to indicate that it shouldn't be called directly. I would write ClippingsIterator() as a function rather than a class, in this rough form: def read_clippings(source):
detected_encoding = _detect_encoding(source)
with open(source, mode='r', encoding=detected_encoding) as source_file:
while 1:
clipping_buffer = []
count = 1
while True:
if count > 5:
raise InvalidFormatException("Input file doesn't seem to be a clippings file, separators are missing or damaged")
line = self.source_file.readline()
if not line:
return # end of input
elif line != self.clipping_separator:
...
yield line_dict
where "_detect_encoding" is your current method, but written as a module function. (One hint that a method isn't actually method is if it never uses the "self" variable. In those cases, I nearly always write them as module functions.) |