Hacker News new | ask | show | jobs
Please code review whoarder, my first (tiny) Python module
1 points by ronj 4765 days ago
Hi, I'm an Industrial Engineer-turning-developer. Over the last years my role was half-functional/half-development, but I've heard I'm able at technical stuff (and I LOVE IT) so I want to dig deeper that direction. So far I worked on VB, web dev, and C#/Java enterpriseware.

I've been playing with Python for a while, which I love and would like to target professionally, and am lucky enough to take a work break to train myself and hope to land a Python job. I read a few books (Clean Code, Dive Into Python 2+3) and the last two days I built https://github.com/ronjouch/whoarder

Being on my own I'm probably doing some things horribly wrong, so I'd like some feedback from experienced people. Anything will do! Advice on unit tests, general coding practices, python-specific features I use well or misuse, advice on books, tools, etc. I have a few questions, too:

1. test.py: I used unittest to get used to standard stuff. What would I gain by using nose? Do you always use it, or is unittest fine for small projects?

2. whoarder.py: The if block in is ugly but is the only workaround I found for relative imports in py3. Could I restructure my app to avoid the issue?

3.1. My app could fit in a single .py, but I made a module anyway, because I'm happy to drink the "It will help you structure and better SRP your code" kool-aid. Right?

3.2. ... but I put several classes in a single .py because I felt it worthwhile to bring them closer. Is that ok or frowned upon?

4. clippings.py:ClippingsIterator: I abstracted input parsing through a custom iterator, while I could just have split by separator. That way I don't read all upfront and can interrupt if something goes wrong. Good or overkill?

5. clippings.py:detect_encoding: any idea why chardet2 doesn't tell me about the BOM?

6. template.html/web: anything worth mentioning; I'm very much doing things 'by hand', maybe the wrong way and would appreciate general feedback.

Thanks for your help.

1 comments

1) I use unittest since I ship code to others and I want fewer dependencies. I think if I was doing this only for myself I might look to nose. But I've been using unittest long enough that I know how to get what I want from it, so I haven't bothered to learn nose.

2) I don't know about Py3 issues like that.

3.1) I am not much of a believer about "SRP". What's your public/published API? Is it useful? It is understandable? Does your exposed module structure give a domain view or an internal organizational view that might change after refactoring. Behind the scenes you're more free to design as you wish.

3.2) In a small package like this, I would have everything in a single file.

4) I don't like the "import re" and other class attributes in ClippingsIterator. I prefer them as global variables, and prefix with a "_" as a hint that they aren't meant as part of the API. I believe this preference is the standard consensus.

5) don't know.

6) I'm not knowledgeable enough to judge, but it looks fine to me.

Minor points:

    for c in clippings:
        self.clippings.append(c)
is more idiomatically written

    self.clippings.extend(c)
"(clipping['book'],clipping['author'])" should have a space after the comma.

In general it's best to have things like "import os" done once, at the top of the module. "os" is a builtin, is always present, and the "import os" has almost no overhead.

The jinja2 import is a bit more special. Do you want the code to work even if jinja2 isn't installed? In which case, what you have is good. I would add a comment, on the same line as "from jinja2 import ...", saying "available from <url>". This gives an additional clue should a failure occur.

But if your code doesn't make sense without jinja2 then I would put the import at the top of the module, to occur at import time. This is more like what people expect, so would be easier to diagnose should a failure occur.

It doesn't look like "import_clippings" is part of the exposed API, since it takes no parameters and always sets an instance variable that looks like it should only be set once. In that case, prefix it with a "_" to indicate that it shouldn't be called directly.

I would write ClippingsIterator() as a function rather than a class, in this rough form:

    def read_clippings(source):
        detected_encoding = _detect_encoding(source)
        with open(source, mode='r', encoding=detected_encoding) as source_file:
            while 1:
                clipping_buffer = []
                count = 1
                while True:
                    if count > 5:
                        raise InvalidFormatException("Input file doesn't seem to be a clippings file, separators are missing or damaged")
                    line = self.source_file.readline()

                    if not line:
                        return   # end of input
                    elif line != self.clipping_separator:
         ...
            yield line_dict
where "_detect_encoding" is your current method, but written as a module function. (One hint that a method isn't actually method is if it never uses the "self" variable. In those cases, I nearly always write them as module functions.)
Great, and cool that you touched imports, that's also something I'm hesitating on. I'll be doing your suggested refactors, and am considering consolidating into a single .py. Thanks!