by ceeker 1558 days ago
Can someone share how they handle versioning in their API when it comes to data model changes? For example `POST /users` now takes a required field `avatar_url` but it was not part of `v1`.

Since this field is validated in the DB, merely having `v1` `v2` distinction at the API layer is not sufficient. So I was thinking we will have to either 1) disable DB validations and rely on app validations or 2) run two separate systems (e.g., one DB per version) and let people 'upgrade' to the new version (once you upgrade you cannot go back).

Even though people refer to Stripe's API versioning blog post, I don't recall any mention of actual data model changes or how they are managed.

6 comments

Those are possible, but ugly solutions. Two cleaner ones: either deprecate and remove the v1 API altogether, or, when inserting a record into the database from the v1 API, use a default dummy value for `avatar_url`.
Yes, and definitely favour the deprecation. Treat web APIs like any interface - have minor and major versions, deprecating then dropping old versions.
Although I agree with the comments above, adding a field is also a breaking change, even with a default value. Any OpenAPI client is especially prone to this (speaking from experience ...). Maybe filtering it out would be an actual solution without breaking anything. An API update shouldn't force me to update my implementation because it stopped working; if it does, it's a breaking change and calls for a major version bump.
I really like this way of versioning https://medium.com/@XenoSnowFox/youre-thinking-about-api-ver...

It uses Accept and Content-Type to version resources: application/vnd.company.article-v1+json
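A minimal sketch of how a server might read the version out of such a media type. The regex, the function name, and the fallback to version 1 are assumptions for illustration, not taken from the linked article, and the sketch ignores `q` weights (it simply takes the first matching entry).

```python
import re

# Hypothetical pattern for vendor media types such as
# application/vnd.company.article-v1+json
MEDIA_TYPE = re.compile(
    r"^application/vnd\.(?P<vendor>[\w-]+)\.(?P<resource>[\w-]+)-v(?P<version>\d+)\+json$"
)


def parse_versioned_media_type(header_value: str, default_version: int = 1):
    """Return (resource, version) for the first versioned media type in the header.

    Falls back to (None, default_version) when nothing matches,
    e.g. for a plain application/json request.
    """
    for part in header_value.split(","):
        media_type = part.split(";")[0].strip()  # drop parameters like q=0.9
        match = MEDIA_TYPE.match(media_type)
        if match:
            return match["resource"], int(match["version"])
    return None, default_version


resource, version = parse_versioned_media_type(
    "application/vnd.company.article-v1+json"
)
```

The same function would work for `Content-Type` on requests, since that header carries a single media type.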

That's...really clever, but at the same time, I feel like there are a lot of assumptions baked into how Content-Types are used, and making your own content-type for each data model when it's all just application/json seems...wrong to me on an intuitive level, but I can't quite articulate why.

I only half agree with the sentiment that /api/v1 violates REST patterns. I don't think there's any guarantee that /api/v1/bars/123 can't be the same object as /api/v2/bars/123.

Interesting! Personally, if I had to go to the lengths of the article, I might as well use something like Avro with a schema registry.
I upvoted because I'm curious to hear what others are doing. We typically make sure to only make such breaking changes where either the now-required value or a sane filler value could be used. If it's the same API for the same purpose, it's usually not a stretch to assume the values for a new field are derived from some combination of an old field or else are primitive components of an old field such that they can be deduced and stubbed in your transition layer (or calculated/looked-up/whatever one-by-one as part of a bulk migration script during the transition). If your v2 is so drastic a breaking upgrade that it bears no relationship to v1, I imagine you're SOL and probably should have thought out your v1 or your v1-to-v2 story better, if only for the sake of the poor devs using your API (and you probably need separate tables at that point).

For other fields like your example of `avatar_url` I would use a placeholder avatar for all legacy users (the grey anonymous snowman profile comes to mind).
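Roughly what I mean by a transition layer, as a sketch. The placeholder URL, the function names, and the in-memory "create" are all made up for illustration; the point is only that the v1 endpoint fills in the filler value and then delegates to the v2 code path.

```python
# Hypothetical placeholder; any static "anonymous" avatar would do.
PLACEHOLDER_AVATAR_URL = "https://example.com/static/placeholder-avatar.png"


def upgrade_v1_user_payload(payload: dict) -> dict:
    """Translate a v1 POST /users body into the shape v2 expects."""
    upgraded = dict(payload)  # copy so the caller's dict is not mutated
    upgraded.setdefault("avatar_url", PLACEHOLDER_AVATAR_URL)
    return upgraded


def create_user_v2(payload: dict) -> dict:
    """v2 handler: avatar_url is required here."""
    if not payload.get("avatar_url"):
        raise ValueError("avatar_url is required")
    return {"id": 1, **payload}  # stand-in for the real insert


def create_user_v1(payload: dict) -> dict:
    """v1 handler: stub the missing field, then reuse the v2 path."""
    return create_user_v2(upgrade_v1_user_payload(payload))


created = create_user_v1({"name": "Ada"})
```

With this shape the database only ever sees v2-style records, so the DB constraint can stay in place.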

Thanks. This is a fair point. I made up the example only to illustrate the idea. Since Stripe is considered some sort of benchmark here I was curious to see how they tackle all the learnings they will have over time...I feel it is very hard to think through all the future cases especially when you are just about starting out with your product.

For example, in financial services and insurance, regs change, the data we need to collect changes, and sometimes the dependencies between fields change. I am curious what companies that have grown substantially had to do to their APIs.

No worries, I understood it was a throwaway example that shouldn't be looked at too closely. You just have to remember that your DB isn't a model of what you want to require from your customers but rather a model of what you actually necessarily have and don't have. A field like the ones you're talking about shouldn't be marked non-nullable in the database if there's a chance you actually don't have that data (and when you are suddenly required to collect something you didn't have before, you're not going to have it).

Coming at this from a strongly-typed background, you acknowledge the fact that despite new regulations requiring a scan of the user's birth certificate in order to get an API token, that field can't be marked as non-null if you don't in fact have all those birth certificates. You are then forced to handle both the null and not-null cases when retrieving the value from the database.

So your API v2 can absolutely (in its MVC or whatever model) have that field marked as non-null, but since your API v1 will still be proxying calls to the same database, your db model would have that field marked as nullable (until the day when you have collected that field for all your customers).

If a downstream operation is contingent on the field being non-null, you are forced to grapple with the reality that you don't have said field for all your users (because of APIv1 users) and so you need to throw some sort of 400 Bad Request or similar error because (due to regulations) this operation is no longer allowed past some sunset date for users that haven't complied with regulation XYZ. In this case, it's a benefit that your db model has the field marked as nullable because it forces you to handle the cases where you don't have that field.

I guess what I'm saying is the db model isn't what you wish your data were like but rather what your data actually is, whether you like it or not.
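Here is a rough sketch of that split. Every name in it (`UserRow`, `BadRequest`, `SUNSET_DATE`, the field name) is hypothetical, and the sunset date is arbitrary: the DB-side model keeps the field optional, the v2 API layer requires it, and the downstream operation has to deal with the null case explicitly.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

SUNSET_DATE = date(2030, 1, 1)  # hypothetical regulatory cutoff


class BadRequest(Exception):
    """Stand-in for whatever your framework maps to HTTP 400."""
    status_code = 400


@dataclass
class UserRow:
    """DB model: describes the data we actually have, so the field is nullable."""
    id: int
    name: str
    birth_certificate_scan: Optional[str] = None


def validate_v2_signup(payload: dict) -> UserRow:
    """API v2 model: the field is required at the edge."""
    if not payload.get("birth_certificate_scan"):
        raise BadRequest("birth_certificate_scan is required in v2")
    return UserRow(payload["id"], payload["name"], payload["birth_certificate_scan"])


def issue_api_token(user: UserRow, today: date) -> str:
    """Downstream operation that must handle legacy rows with no scan."""
    if user.birth_certificate_scan is None and today >= SUNSET_DATE:
        raise BadRequest("birth certificate required after the sunset date")
    return f"token-for-{user.id}"
```

A legacy v1 user can still get a token before the cutoff and gets a 400 afterwards, without the database ever pretending the data exists.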

I think Stripe was originally built on Rails (can’t find anything to confirm that at the moment). But my guess is they enforce things at the app layer, since Rails didn’t really provide a good way to enforce things at the DB layer originally. They support very old API versions by transforming requests backwards and forward through a list of API version transforms, which also suggests to me that this sort of thing is enforced at the app layer rather than the DB.
Hey

We're working in this space at the moment (eliminating the pain from breaking changes in APIs) and looking to get feedback on what we're building.

We're all from banking backgrounds, so understand the reg headaches you're talking about.

Can we chat?

I'm not saying you should do it this way, this is just how our startup (still very much in the "move fast and discover product fit" stage) does it. We have separate API models (Pydantic and FastAPI) and DB models (SQLAlchemy). Basically everything not in the original DB schema ends up nullable when we first add a field. The API model handles validation.

Then if we absolutely do need a field non-null in the db, we run a backfill with either derived or dummy data. Then we can make the column non-null.

We use Alembic to manage migration versions.

But we aren't even out of v0 endpoint and our stack is small enough that we have a lot of wiggle room. No idea how scalable this approach is.

The downside is maintaining separate api and db models, but the upside is decoupling things that really aren't the same. We tried an ORM which has a single model for both (Ormar) and it just wasn't mature, but also explicit conversions from wire format to db format are nice.
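A stripped-down sketch of the nullable, backfill, then non-null sequence. It uses the standard library's `sqlite3` as a stand-in rather than our actual SQLAlchemy/Alembic setup, and the table, column, and placeholder value are assumptions. SQLite cannot add NOT NULL to an existing column in place, so the last step rebuilds the table; on other databases that step would typically be a single alter-column migration.

```python
import sqlite3

PLACEHOLDER = "https://example.com/static/placeholder-avatar.png"  # hypothetical dummy value


def migrate(conn: sqlite3.Connection) -> None:
    # Step 1: add the new column as nullable so existing rows stay valid.
    conn.execute("ALTER TABLE users ADD COLUMN avatar_url TEXT")
    # Step 2: backfill existing rows with derived or dummy data.
    conn.execute(
        "UPDATE users SET avatar_url = ? WHERE avatar_url IS NULL", (PLACEHOLDER,)
    )
    # Step 3: tighten the constraint. In SQLite this means rebuilding the table.
    conn.executescript(
        """
        CREATE TABLE users_new (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            avatar_url TEXT NOT NULL
        );
        INSERT INTO users_new (id, name, avatar_url)
            SELECT id, name, avatar_url FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")
migrate(conn)
```

Between steps 1 and 3 the API model is the only thing enforcing the field, which is the window where old clients keep working.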

That's what Salesforce does. In our app we're on version 47.0 of their API:

https://test.salesforce.com/services/Soap/c/47.0

And in the latest version of the API docs they have details regarding old versions. Example:

https://developer.salesforce.com/docs/atlas.en-us.api.meta/a...

Type: reference
Properties: Create, Filter, Group, Nillable, Sort
Description: The ID of the parent object record that relates to this action plan.

For API version 48 and later, supported parent objects are Account, AssetsAndLiabilities, BusinessMilestone, Campaign, Card, Case, Claim, Contact, Contract, Financial Account, Financial Goal, Financial Holding, InsurancePolicy, InsurancePolicyCoverage, Lead, Opportunity, PersonLifeEvent, ResidentialLoanApplication, and Visit as well as custom objects with activities enabled.

For API version 47 and later, supported parent objects are Account, BusinessMilestone, Campaign, Case, Claim, Contact, Contract, InsurancePolicy, InsurancePolicyCoverage, Lead, Opportunity, PersonLifeEvent, and Visit as well as custom objects with activities enabled.

For API version 46 and later, supported parent objects are Account, Campaign, Case, Contact, Contract, Lead, and Opportunity as well as custom objects with activities enabled.

For API version 45 and earlier: the only supported parent object is Account.
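Rules like these could be encoded server-side as a version-keyed lookup. The sketch below is only an illustration of that idea, not how Salesforce implements it; the object lists are copied from the quoted docs, the function name is made up, and "custom objects with activities enabled" is left out for brevity.

```python
# (minimum API version, supported parent objects), newest first.
PARENTS_BY_MIN_VERSION = [
    (48, ["Account", "AssetsAndLiabilities", "BusinessMilestone", "Campaign",
          "Card", "Case", "Claim", "Contact", "Contract", "Financial Account",
          "Financial Goal", "Financial Holding", "InsurancePolicy",
          "InsurancePolicyCoverage", "Lead", "Opportunity", "PersonLifeEvent",
          "ResidentialLoanApplication", "Visit"]),
    (47, ["Account", "BusinessMilestone", "Campaign", "Case", "Claim",
          "Contact", "Contract", "InsurancePolicy", "InsurancePolicyCoverage",
          "Lead", "Opportunity", "PersonLifeEvent", "Visit"]),
    (46, ["Account", "Campaign", "Case", "Contact", "Contract", "Lead",
          "Opportunity"]),
]
EARLIEST_PARENTS = ["Account"]  # API version 45 and earlier


def supported_parent_objects(api_version: float) -> list:
    """Return the parent objects allowed for the given API version."""
    for min_version, parents in PARENTS_BY_MIN_VERSION:
        if api_version >= min_version:
            return parents
    return EARLIEST_PARENTS
```

A request pinned to 47.0, like ours, would be validated against the second list even after version 48 ships.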

We’re using event sourcing, so the “projection” (db snapshot) has 2 different tables for v1 and v2. Think users_v1, users_v2.

Obviously there will always be challenges with eventual consistency, but that is another topic altogether.
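A toy illustration of the idea, with in-memory dicts standing in for the users_v1 and users_v2 tables. The event names, fields, and placeholder value are invented for the example; the point is that both projections are rebuilt from the same event log, so neither version constrains the other.

```python
PLACEHOLDER = "https://example.com/static/placeholder-avatar.png"  # hypothetical

EVENTS = [
    {"type": "UserRegistered", "user_id": 1, "name": "Ada"},
    {"type": "AvatarSet", "user_id": 1, "avatar_url": "https://example.com/ada.png"},
    {"type": "UserRegistered", "user_id": 2, "name": "Bob"},
]


def project_users_v1(events: list) -> dict:
    """users_v1: the v1 shape, which has no avatar_url at all."""
    users = {}
    for event in events:
        if event["type"] == "UserRegistered":
            users[event["user_id"]] = {"name": event["name"]}
    return users


def project_users_v2(events: list) -> dict:
    """users_v2: avatar_url is always present, defaulting to a placeholder."""
    users = {}
    for event in events:
        if event["type"] == "UserRegistered":
            users[event["user_id"]] = {
                "name": event["name"],
                "avatar_url": PLACEHOLDER,
            }
        elif event["type"] == "AvatarSet":
            users[event["user_id"]]["avatar_url"] = event["avatar_url"]
    return users


users_v1 = project_users_v1(EVENTS)
users_v2 = project_users_v2(EVENTS)
```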

Have a default value for v1. Don't maintain two DBs just for this.