by cwisecarver 3323 days ago
I've got more than 5 years of experience with Django on a number of teams and at a couple of companies and in my experience almost everything in this article is completely incorrect.

The only things I would agree with are the point about project layout and the advice to avoid Django's squashmigrations in favor of truncating the migrations table, deleting the migrations, and creating a new initial migration.

Practically everything else in this article is wrong, in my opinion.

6 comments

I don't know as much about Django as you or the authors of the article.

So I can't really tell which of you has a better point, or is better in context and so forth.

I do, however, see that the authors wrote a long article, and backed up each point with an example of what could go wrong and how to avoid it.

You, on the other hand, just asserted that you largely disagree.

So if you disagree, perhaps you can take some place where you feel they were particularly wrong and explain why. Otherwise, we are left with the impression that they are correct.

I've got 10 years' experience on projects small and large and I have to agree. The title talks about building at scale, but the article doesn't stress that, which makes some of the advice downright weird.

>If you don't really understand the point of apps, ignore them and stick with a single app for your backend. You can still organize a growing codebase without using separate apps.

This is where the article lost me. If this is for building at scale, maybe, I don't know. I never hit a point where designing the project in apps became a problem. Regardless, if you don't know why Django wants to use apps, that suggests you are new to Django and probably not building at scale, so this feels like poor advice. Much of the article is telling readers to do things exactly contrary to Django's philosophy; the problem with that is there are lots of articles and StackOverflow answers out there based on Django's philosophy. There isn't a similar body of reference based on the authors' approach.

I don't know why explicitly naming your database tables is imperative for running at scale. Now we're breaking from Django's convention because some day we might want to stop using Django and we will be annoyed by its table-naming convention? Avoiding "fat models" is another place where it feels more like opinion than anything to do with performance or good design.

It would be good to know what database engine the authors are running into such serious migration issues with-- MySQL?

> Avoiding "fat models" is another place where it feels more like opinion than anything to do with performance or good design

So in the Java world, the general pattern is that:

Views:

  - Accept and sanitize query parameters

  - Call one or more service methods.

  - Catch errors and return an appropriate error response

  - Render a JSON response based on the results of the service methods if nothing goes wrong.

Service methods:

  - Perform business logic

  - Manage persistence

  - Bubble up errors

The nice thing about this architecture is that each piece of the codebase tells a complete story about what it's doing. That is, from looking at the view you can see what parameters it accepts, how they are sanitized, what service method it calls, each of the errors that can be returned, and what the 200 response looks like.

And looking at the service method we can see what business logic it performs, and what the database queries look like.

In each case there isn't any reason to look at other methods to understand the 'story' of what's happening in your app. This makes it very easy to read the codebase and audit it for correctness.

The problem with fat models is that they're not telling a story about what's actually happening in the app, e.g. looking at them doesn't tell you anything about the business logic the endpoints are performing. And what's worse, you also can't look at the views or services and know what they're doing either.

As someone who strongly prefers Python and Django over the Java ecosystem, I'll say hands down that in terms of how web apps are architected they got it right and the Django people got it wrong. As far as I can tell the whole Domain Model Architecture thing seems like a bunch of bullshit that was invented to sell consulting. If the advocates of this approach can't even write a coherent Wikipedia article, it should give you a clue as to what the code ends up looking like. [1]

[1] https://en.wikipedia.org/wiki/Domain-driven_design

Yeah, I don't disagree with that at all. I came to Django from C# after playing with Ruby on Rails a little bit and the lack of an explicit Controller in Django confused me and I think it is part of the driver behind the "fat model" approach. I like the idea of the logic for the business object being inside it and all testable on its own but I think it has its limits-- thinking about my own Django codebases, the number of class/ static methods I have on models is a code smell from me learning OOP on C# where I had to stick those methods some place.

Is there a good place or pattern for service methods in Django? I've got some very fat models right now, and it's a DRY improvement over having the fat in the views, but like you say it takes a lot of effort to trace what's going on.

Let's say you're following the approach of breaking down your project into separate apps, so you have an app called user_accounts. This app would be a folder containing files like:

  views.py, services.py, models.py, test_views.py

So in views.py you'd have a User class, with:

  A POST method that calls services.create_user(username, email_address, password)

  A GET method that calls services.get_user_profile(request.user)

  A PUT method that calls services.update_user_profile(x, y, z)

  A DELETE method that calls services.inactivate_user(request.user)

The return value of each of these views can just be whatever services.get_user_profile(request.user) returns, rendered into JSON.

Then each of the services performs whatever business logic it needs to, preferably directly in the method. But if it would be more readable split into multiple methods, then you can create some private helper methods in the services file prefixed with an underscore. You can also have a separate folder somewhere for utility functions meant to be reused across the app, e.g. get_user_emails(request.user, is_active=True, is_verified=True)

Basically though each view sanitizes the data, e.g. strips XSS out of strings, makes sure booleans are actually booleans, etc.

Then each service first does field-level validation with serializers, e.g. ensuring that usernames meet the appropriate requirements for usernames. Next if there is other business logic validation that needs to happen, it happens, e.g. making sure that only users with verified email addresses can perform certain actions.

After that you perform the actual business logic, e.g. transforming any data. Then you perform your CRUD operation, e.g. creating a user model. And lastly you return something, e.g. returning the user model.

Each endpoint and service method can be written pretty much following this pattern, which makes the codebase super readable because once you understand one endpoint you understand all of them. And the service methods are the reusable component of the architecture, so e.g. if you want the ability for admins to create users, then they are created with the exact same service method. (But called from your admin endpoints/services.)

That's almost exactly what I do, except for using a big monolithic app for the entire backend (called "core"), and making "services" a package with several modules inside.

It also resembles very much what I see in Java projects which use DDD (Domain Driven Design).

What's your take on the article's point that you should have fewer rather than more Django apps (citing the problem of inter-app-FKs)?

So my startup is built the way you describe, in terms of just having one main app, and I personally prefer this style.

The basic argument in favor of breaking down the Django project into multiple apps is that it makes the components more decoupled and reusable. But personally I think this is bullshit. If you want your apps to be reusable and decoupled then you need to put a ton of time into architecting them this way; the idea that you're going to get these benefits just from putting stuff into different folders is magical thinking. It seems like pretty much the textbook example of cargo cult programming.

That said for the client I'm currently working for, the decision was made to do it the 'standard Django way' in terms of breaking it into multiple apps. So far I haven't run into any issues here. I like it slightly less because I think having all the views in the views folder, and all the services in the services folder makes folks more likely to reuse code just by making it easier to find. But yeah, so far no real problems, but I'm also not expecting to see any magical benefits either.

I'm not aware of architectural patterns for service methods in Django (would like to find some as well), but what I did was to somewhat mimic a Java structure.

The whole project is in one single app, which I unimaginatively called "core", and inside this app there's a "services" Python package (i.e. a folder with an __init__.py file inside). It has roughly one Python module (.py file) for each "category" of services. These range from thin layers like "user_service.py" (which basically passes through to the relevant models) to more complex services like "dependency_x_integration_service.py", which connects to external service "X", pulls some relevant data (say, user interaction datapoints), and bridges it to the models in the system.

We do roughly this where I work, so I broadly agree.

That said, it's fairly common for the unit of reuse to be below service methods. Also, depending upon how exactly you manage transactions, another thing to look out for is making multiple non-idempotent service calls from the view - this will be an area ripe for race conditions you likely aren't testing.

> it's fairly common for the unit of reuse to be below service methods.

In terms of utility functions or serializers? What does that look like exactly?

E.g. in our codebase service methods can call helper methods (non-reusable), utility methods (reusable), and serializers (non-reusable).

I agree the doordash article gets some stuff right and most stuff wrong, almost to the point where it's difficult to read. But (somewhat tangentially) I admit I have struggled in the past with separating out Django apps for reasons not mentioned in the article.

Specifically, say I have two apps, with a second more specific app heavily dependent on a first more general app. What I find in this scenario is that I sometimes need hooks into the general app from the specific app, which means that I wind up importing modules from the specific app into the general app. This hasn't generally been a showstopper in my experience, but it creates some friction because:

a) I would prefer for the general app to have no dependencies on the specific app

b) This results in circular imports (which can themselves be addressed, but this is an implementation detail I would prefer not to have to worry about)

I realize these issues can be mitigated with signals, but I try to use signals sparingly for various reasons (https://code.djangoproject.com/ticket/16547#comment:2). It also helps that foreign keys can be expressed using a string literal rather than the actual model, but in the end, I still occasionally run into situations I don't feel great about.
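
One alternative to signals that keeps the general app free of imports from the specific app is a small, explicit hook registry: the general app exposes a registration function and the specific app plugs into it. The following is a plain-Python sketch of that idea, not Django's signals API; `register_entry_hook`, `create_entry`, and `flag_long_titles` are hypothetical names.

```python
# --- general app (knows nothing about the specific app) ---
_entry_hooks = []


def register_entry_hook(func):
    """Let other apps subscribe to entry creation; usable as a decorator."""
    _entry_hooks.append(func)
    return func


def create_entry(title):
    entry = {"title": title, "flags": []}
    for hook in _entry_hooks:
        hook(entry)  # explicit call, in registration order
    return entry


# --- specific app (imports from the general app, never the reverse) ---
@register_entry_hook
def flag_long_titles(entry):
    if len(entry["title"]) > 20:
        entry["flags"].append("long-title")
```

In a Django project the registration would presumably happen when the specific app is loaded, for example from its `AppConfig.ready()` method, so the dependency only ever points from the specific app to the general one.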

Please note that I'm not advising against separating out functionality into apps. Instead, I'm merely citing an issue about having multiple apps that bothers me.

Agreed. I've built and maintained a moderate size Django app for 5 years now, and had similar issues.

GOOD app division: my "members" app which has classes for UserProfile, MemberType, and communicates with an upstream 3rd party membership API. It doesn't have any dependencies, but a bunch of other apps depend on it. Another one I've just started work on is a generic Questionnaire/Survey app. This one is a definite candidate for spinning out as an open-source third-party app later; it lets you attach Questions and Answers to any of your own model objects via generic relations.

BAD app division: I have separate "Entry" and "EntryHandling" models across two apps, the latter is a OneToOne with an Entry. Originally this was a separation of concerns, but it's become a mess. Like the parent, the generic app ends up depending on the specific app, and migrations have to be handled gently and sometimes manually edited.

If you treat your Django apps as points that would be logical for splitting as micro-services, you'd probably be just fine.

> I never hit a point where designing the project in apps became a problem.

The article literally describes why this is a problem in the first place. Cross-app model relations are a PITA, and splitting "sections" of your site into separate apps often has you end up with cross-app relations.

The more general point here is that: The functional separation between Django apps and the logical separation between "parts" of your site often do not match up, and thus you should be careful about splitting up your site into multiple apps.

In my experience, this is absolutely true and happens often. The article recommends separating the parts of your site into modules and packages within a single app, which is a great idea and something the Django docs don't make obvious as a choice.

>Cross-app model relations are a PITA

A PITA how? They say they ran into migration issues due to the apps approach but it sounds like they ran into issues due to the sheer number of migrations happening across a bunch of developers. That sounds like a likely problem on big teams, but I don't think it's one best solved by not using the app approach and I don't think it's one whose underlying problem is ForeignKeys to models in other apps. Again, it would be nice to know if this was on MySQL or a different database as what finally caused me to move to Postgres 8 years or so ago was the heartburn of migration on MySQL. I think I've run into one or two migration knots since then and they were both due to me moving a little too fast.

And at scale, I would assume you aren't actually running the migrations but generating SQL from them and running that. Still could run into the same problems, but you could sort that out by hand when you do. Not the best answer, but from the article it feels like a more formalized/ strict approach to who gets to modify the database and when would be good.

>splitting "sections" of your site into separate apps often has you end up with cross-app relations.

It also encourages you to do some up-front data modeling which is a skill that gets rarer as ORMs get more common.

/old man yells at cloud

Don't see how up-front modelling would have predicted the high planes of Unicode & MySQL's utf8 encoding being supplanted by utf8mb4 so users of your product could message taco emoji to each other.
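
For context on the emoji point: the taco emoji sits outside Unicode's Basic Multilingual Plane and takes four bytes in UTF-8, which is more than MySQL's legacy three-byte `utf8` character set can store per character. A quick check in Python:

```python
taco = "\U0001F32E"  # TACO emoji

code_point = ord(taco)
utf8_length = len(taco.encode("utf-8"))

# A code point above U+FFFF lives outside the Basic Multilingual Plane.
is_outside_bmp = code_point > 0xFFFF

print(hex(code_point), utf8_length, is_outside_bmp)
```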

I have an agenda against Getting It Right First Time, Every Time, since it encourages brittle code that's a struggle to adapt to new, seemingly-similar use cases.

I don't understand how that first sentence is relevant. How would using apps organization vs the approach described in the article protect you from making the decision to change your database's underlying encoding?

One of the projects I worked on in the past had a legacy app which contained most of the core logic... and as the site grew, new apps were created because it got impossible to manage that one massive app. Splitting it into multiple pieces was incredibly painful because the codebase wasn't designed for that... so while I think this may help you get up and running faster, it will probably cause problems down the line.

Could you please elaborate?

I wouldn't say it is "wrong", but I skimmed it, and the advice seems either generic (organize your apps inside a package... like, duh?) or awfully specific to their own services.

In particular the things about dealing with migrations and the database. In my experience, the database structure doesn't change THAT much to warrant 3 or so sections of ramblings about dealing with migrations. And migrations don't tend to be dramatic either. My experience is that they are rather anticlimactic (I always sort of expected them to choke and kill my DB, since I started using South ages ago, but I've been pleasantly surprised so far).

Of course, this requires more than 5 minutes of planning on the developer's part.

And the article never touches things like how to actually run a django app at scale. I've seen an alarming number of places that just run their apps via the builtin django server (via the "runserver" command). This is, as I understand it, a very bad idea (for performance and security reasons).

Running your app under uwsgi (for example) optimally isn't exactly trivial, and I'd like to see them touch upon that.

Yep, this article has absolutely nothing to do with application performance and more to do with managing complexity, but avoiding complexity was not mentioned enough in the article, and they went as far as to suggest that users NOT use the ORM and to build a middle-layer for CRUD, which, well, is just flat out insane. Sorry, just is...

I replied to someone else who asked. https://news.ycombinator.com/item?id=14361539

Can you expand on what do you think is incorrect in the post?

- FK/M2M across reusable, packaged apps is only bad if you don't match the interfaces correctly. See: almost every third-party Django app that is built to integrate with another application's models.

- Sometimes you want the app concept just for organization. There's nothing bad about that and it makes sharing things inside of a project easier.

- Explicitly naming your database tables doesn't make any sense. You're using an ORM. Accept it or don't use it.

- Explicitly declaring through on a m2m field if you're not adding metadata to the relationship is pointless, but, if you're not using the table naming from Django I could see why they'd be invisible because there's no pattern to follow.

- GenericForeignKeys are dangerous but not for any of the reasons listed. It's because they implicitly force a two way join which seems magic until it becomes debilitatingly slow.

- The entire section on migrations leads me to believe that the first time the migrations are being run is on a production deploy. If you don't know SQL and you don't test your migrations prior to deployment, yes, it will be fairly difficult to determine what kind of performance/locking they're going to have.

- No to Fat Models? This breaks down to "The framework we chose to use suggests a pattern, we also chose not to follow that pattern." That's fine if you want to do that but I wouldn't suggest it to others.

- It's not hard to get signals to not fire in certain circumstances, you put the conditional in the signal callback like almost every other event-driven pattern. Also, bulk updating models doesn't fire signals in Django because save isn't called. Read the documentation.

- Avoid using the ORM? Why choose a framework as complete as Django where 80% of the features are built around or on top of the ORM and then don't use it?

- Caching complex objects makes cache invalidation hard. Well, yes, yes it does.
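
The signals point in the list above (put the conditional inside the callback) can be sketched without Django. The tiny `Signal` class below is a stand-in written for this example, not Django's dispatcher, and `refresh_cache` and `cache_refreshes` are hypothetical names; the `created` flag mirrors the keyword argument that Django's `post_save` passes to receivers.

```python
class Signal:
    """Minimal stand-in for an event dispatcher (not Django's implementation)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        for receiver in self._receivers:
            receiver(sender, **kwargs)


post_save = Signal()
cache_refreshes = []


def refresh_cache(sender, instance, created, **kwargs):
    # The guard lives inside the callback: skip work for brand-new rows.
    if created:
        return
    cache_refreshes.append(instance["id"])


post_save.connect(refresh_cache)

post_save.send(sender="Entry", instance={"id": 1}, created=True)   # ignored
post_save.send(sender="Entry", instance={"id": 1}, created=False)  # handled
```

As the comment notes, a bulk update never calls `save()`, so a handler like this would not run for those rows at all.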

Re: naming tables, older versions of Django (not sure about the latest) require you to name your app in each model (via Meta) if you've broken your models into several modules within an app; maybe they are trying to speak to that?

I do appreciate some of the sentiment here. "Organize your apps inside a package", "Keep migrations safe", "Don't cache models" and "Avoid GenericForeignKey" are the ones I agree the most with, so I'll go over some of the others. Some of the other migration-related ones I don't have a strong opinion on...

> If you don’t really understand the point of apps, ignore them and stick with a single app for your backend. You can still organize a growing codebase without using separate apps.

This is still possible, but gets very painful down the line when you need to split it out into separate apps because detangling the mess will be next to impossible. I generally try to think about apps as distinct features which sometimes helps in splitting out the functionality.

> Explicitly name your database tables

If you're using apps this provides a nice separation between apps and makes it easier to see where data is coming from, rather than having custom table names that could be coming from anywhere.
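
For reference, Django's default table name is the app label and the lowercased model class name joined by an underscore, which is where the per-app prefix comes from. A rough sketch of that convention follows; the helper `default_table_name` is hypothetical and ignores details such as truncation of very long names.

```python
def default_table_name(app_label, model_class_name):
    """Approximate Django's default db_table: '<app_label>_<modelname>'."""
    return f"{app_label}_{model_class_name.lower()}"


print(default_table_name("members", "UserProfile"))  # members_userprofile
print(default_table_name("core", "Entry"))           # core_entry
```

Setting `db_table` in a model's `Meta` overrides this default, which is what the article's "explicitly name your database tables" advice amounts to.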

> Avoid fat models

Models are really the only core location you have to add functionality to an object without worrying about copying it down the line. Fat models can be a pain to deal with, but it's better than the suggested alternative of building an additional access layer on top of the models themselves. Models are Objects and it makes sense to use them as such.

> Avoid using the ORM as the main interface to your data

Why? Building an additional layer on top of an already useful layer to do things it already supports seems a bit crazy. The part of this that I think is the strangest is this line: "Apart from signals, your only workaround is to overload a lot of logic into Model.save(), which can get very unwieldy and awkward." Those are the main two workarounds... and it's interesting to see them say "Be careful with signals" the section before then admit they're useful but recommend not using them.

Those were the main things I noticed. There are obviously some useful tips in here, particularly around migrations, but not in the portions where they go against direct recommendations relating to Django.

If you're looking for a good book on recommendations from people who have been writing Django applications at scale, I strongly recommend Two Scoops of Django (https://www.twoscoopspress.com). There's a new version (https://www.twoscoopspress.com/products/two-scoops-of-django...) coming out soon which may be worth checking out (I've read previous versions and am recommending based on those).

I found signals (especially post_save hooks) incredibly useful for updating related models and caches. Their rationale for avoiding them was weak.

Django signals are too magical, and very difficult to debug

We've had too many cases of phantom bugs which turned out to be caused by an errant signal in some distant unrelated model.

It's not really helpful when people come along and just say, "That's wrong", without explaining why...
I agree, but can you elaborate?

edit: 100% earnest request here

Same thing here, no clue why people are upvoting this.