Hacker News
Nearly all web APIs get paging wrong (vermorel.com)
9 points by apievangelist 4066 days ago
2 comments

I'd like to see this continuation pattern described in a bit more detail. How does the client define the sort parameters or preferred data limits per continuation? Have I missed the point?
Continuations are "easy" (if you have decent language support or can glom it on) but that's keeping state on the server side. The author describes continuation tokens that "never expire", which is incompatible with server-side state.

Without state, a token can be a bookmark into a predefined ordered dataset. That's more reliable than an offset, but just as inflexible, and much more expensive on the server side.

Or, I'm missing something too.

One thing to note is that while the continuation token is just a blob of data from the client's perspective, the server can actually use it to store the required state information.

A simple method would be to take the state information the server needs in order to continue the enumeration (e.g. sorting order, how far along it was in the enumeration, etc.), JSON-encode it, encrypt and sign it, and then base64-encode it.

Return that token to the client, and if the client wants more data it can pass that token back to the server, which can decode it into all the information it needs to resume the enumeration.
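
A minimal sketch of that round trip, using only the Python standard library. It signs the state with HMAC-SHA256 but does not encrypt it, so the client can read (but not alter) the contents; real encryption would need a third-party library. The names `SECRET_KEY`, `encode_token` and `decode_token` are made up for illustration.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; in practice it would come from configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def encode_token(state: dict) -> str:
    """JSON-encode the state, prepend an HMAC signature, base64-encode the result."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(signature + payload).decode("ascii")


def decode_token(token: str) -> dict:
    """Verify the signature and return the state; raise ValueError if it was altered."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii"))
    except (ValueError, UnicodeEncodeError) as exc:
        raise ValueError("malformed continuation token") from exc
    signature, payload = raw[:32], raw[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("continuation token failed signature check")
    return json.loads(payload)
```

The server keeps nothing between calls here except the key, which is why such a token would not need to expire.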

I like this approach, but it requires storing the rest of the enumeration. I think you would want the continuation token to expire after a reasonable period of time (10 minutes to 1 hour). When the token expires, you would remove the rest of the enumeration from your data store.

This isn't what the author recommends, but I think this is a good approach.

I don't think this requires storing anything on the server. That was kind of the point of the whole "store the server state in the continuation token"-thing :)

You want to store enough information in the token that you can easily reconstruct and resume the enumeration.

For example, let us say that the user asked for all comments with a score >= 5 sorted by post time. In that case you could return 100 comments, and a token that encoded something like:

  {
    "min_score": 5,
    "sort": "post_time",
    "resume_from_post_time": "2015-05-07T05:34:02Z"
  }

To ensure that it is easy to resume the enumeration, the API can fudge the number of returned items so that the returned data always breaks at a nice "post_time" boundary. The goal here is to make it easy for the client to get all the data in the enumeration without implementing all this logic themselves.
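
Roughly, the server side of that could look like the sketch below. It works over an in-memory list rather than a real database, and it assumes that `resume_from_post_time` is exclusive and that all timestamps are UTC strings in the same ISO 8601 format, so they sort correctly as plain strings. `fetch_page` and the comment dict layout are hypothetical.

```python
def fetch_page(comments, min_score, resume_from_post_time=None, limit=100):
    """Return (page, next_state); next_state is None once the enumeration is done.

    comments: list of dicts with "score" and "post_time" keys (assumed layout).
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    matching = sorted(
        (
            c for c in comments
            if c["score"] >= min_score
            and (resume_from_post_time is None or c["post_time"] > resume_from_post_time)
        ),
        key=lambda c: c["post_time"],
    )
    page = matching[:limit]
    # Fudge the page size: keep adding comments that share the last post_time,
    # so resuming with a strict ">" comparison never skips any of them.
    while len(page) < len(matching) and matching[len(page)]["post_time"] == page[-1]["post_time"]:
        page.append(matching[len(page)])
    if len(page) == len(matching):
        return page, None
    next_state = {
        "min_score": min_score,
        "sort": "post_time",
        "resume_from_post_time": page[-1]["post_time"],
    }
    return page, next_state
```

The returned `next_state` is what would get packed into the continuation token. Against a database the same idea becomes a `WHERE post_time > ?` filter with an `ORDER BY post_time`, which an index on that column can serve without scanning the skipped rows.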

True, it will only work efficiently for some types of queries, but a lot of the common queries can be reworked into something like that.

Ok, I see the difference.

You suggest that the continuation token is basically an encoding of the query parameters needed to fetch results from the API. If you go with this approach, you don't have to store any state on the server. This is a good approach because it's simple, but it doesn't solve the issue where the response from the API changes while you are making the paging API calls. The example used in the article is an order being deleted while you are calling the API.

I was thinking of using a UUID as the continuation token and storing a copy of the results from the API. Subsequent calls that use the continuation token would take a subset of these results. This requires storing more state on the server, and managing that state. The benefit of this approach is that the results you get back from paging are consistent, which solves the issue where the results change while you are calling the API multiple times. The downside is that you have to store more state on the server. If you are storing the full results for all of these paging API calls, this could be quite large.
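
As a sketch of what I mean, assuming an in-memory dict as the data store and a 10 minute expiry (the class and method names are invented for this example):

```python
import time
import uuid


class SnapshotStore:
    """Keeps a frozen copy of each result set, keyed by a UUID token, until it expires."""

    def __init__(self, ttl_seconds=600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._snapshots = {}  # token -> (expires_at, results)

    def _evict_expired(self):
        now = self._clock()
        expired = [t for t, (expires_at, _) in self._snapshots.items() if expires_at <= now]
        for token in expired:
            del self._snapshots[token]

    def create(self, results):
        """Store a copy of the full result set and return its continuation token."""
        self._evict_expired()
        token = uuid.uuid4().hex
        self._snapshots[token] = (self._clock() + self._ttl, list(results))
        return token

    def page(self, token, offset, limit):
        """Return one slice of the stored results; KeyError if the token is unknown or expired."""
        self._evict_expired()
        _, results = self._snapshots[token]
        return results[offset:offset + limit]
```

Every page comes from the same stored copy, so a deletion in the live data between two calls does not shift the results. Memory use grows with the number of live tokens times the size of each result set, which is the cost mentioned above.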

I've often wondered how the paging on HN could be better. The main issue is going from page 1 to page 2, where items move between the pages and I either see items a second time or miss them. The problem with fixing a sequence on first page load is deciding when to refresh for new content--only on page 1? Lastly, a prescription is not helpful without a design for efficient implementation. How can this be achieved in a stateless manner?
The only way I see to solve this without server side state is to replace the page-parameter with a list of items you have seen. The more button would then just find the top 30 items you haven't previously seen. Unfortunately, this would become unwieldy very quickly, and sooner or later you hit the browser limitations on maximum URL size.
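
In code, the stateless version would be something like this (a sketch; `next_unseen` is a made-up name, and the ids stand in for whatever the more link would carry in its URL):

```python
def next_unseen(ranked_ids, seen_ids, page_size=30):
    """Return the top unseen items plus the updated seen list for the next link."""
    seen = set(seen_ids)
    page = [item_id for item_id in ranked_ids if item_id not in seen][:page_size]
    # The seen list grows by up to page_size ids per page, which is what
    # eventually runs into the URL size limit.
    return page, list(seen_ids) + page
```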

A relatively simple approach that involves server side state is to periodically (once a minute?) generate the list of (for example) the 10000 top items.

(A high traffic site will most likely want to do this in any case, so that it has a cached list of items ready to serve to clients, instead of issuing a database query to find the top items for every request.)

Now, instead of overwriting the list of top items every time you regenerate it, keep multiple versions of the list. Then you can make the link to the next page specify the version of the list and the page number. That way, users will browse through one specific version of the list.

(This requires storing some state on the server, but the amount is relatively small. You control the size of the generated list, how often new lists are generated, and how long they are kept, so there is an easily calculated upper bound on the amount of state information you need to store.)
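
A rough sketch of the versioned list, assuming everything lives in memory and that old versions are dropped by count rather than by age. `VersionedTopList` and its defaults (60 versions, 30 items per page) are illustrative choices, not anything HN is known to do.

```python
from collections import OrderedDict


class VersionedTopList:
    """Keeps the last few generated top-item lists so readers can page through one version."""

    def __init__(self, max_versions=60, page_size=30):
        self._max_versions = max_versions
        self._page_size = page_size
        self._versions = OrderedDict()  # version number -> tuple of item ids
        self._latest = 0

    def publish(self, item_ids):
        """Store a newly generated list as a new version and drop the oldest ones."""
        self._latest += 1
        self._versions[self._latest] = tuple(item_ids)
        while len(self._versions) > self._max_versions:
            self._versions.popitem(last=False)
        return self._latest

    def page(self, page_number=1, version=None):
        """Return (items, next_link); next_link pins the same version for the next page."""
        if version is None:
            version = self._latest
        if version not in self._versions:
            raise KeyError(f"list version {version} has expired or never existed")
        items = self._versions[version]
        start = (page_number - 1) * self._page_size
        chunk = list(items[start:start + self._page_size])
        has_more = start + self._page_size < len(items)
        next_link = {"version": version, "page": page_number + 1} if has_more else None
        return chunk, next_link
```

The upper bound on stored state is then just `max_versions` times the list length. With the 10000 item list above and 60 versions, that is at most 600000 item ids.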

This solution satisfies my usage and doesn't use a continuation token, though one could be constructed from version and page. It does however expire.

I can see that the approaches to constructing continuation tokens in the other comments won't work for HN, assuming post upvote counts are updated in place.