Hacker News new | ask | show | jobs
by reipahb 4053 days ago
I don't think this requires storing anything on the server. That was kind of the point of the whole "store the server state in the continuation token"-thing :)

You want to store enough information in the token that you can easily reconstruct and resume the enumeration.

For example, let us say that the user asked for all comments with a score >= 5 sorted by post time. In that case you could return 100 comments, and a token that encoded something like:

  {
    "min_score": 5,
    "sort": "post_time",
    "resume_from_post_time": "2015-05-07T05:34:02Z",
  }
To ensure that it is easy to resume the enumeration, the API can fudge the number of returned items so that the returned data always breaks at a nice "post_time" boundary. The goal here is to make it easy for the client to get all the data in the enumeration without implementing all this logic themselves.

True, it will only work efficiently for some types of queries, but a lot of the common queries can be reworked into something like that.

1 comments

Ok, I see the difference.

You suggest that the continuation token, is basically an encoding of the query parameters, to fetch results from the API. If you go with this approach, then you don't have to store any state on the server. This is a good approach, because it's simple, but it doesn't solve the issue, where the response from the API changes while you making the paging API calls. The example used in the article is where an order was deleted, while you were calling the API.

I was thinking of using a uuid to generate a continuation token, and storing a copy of the results from the API. Subsequent calls that use the continuation token, would take a subset of these results. This requires storing more state on the server, and managing that state. The benefit to this approach is that results you get back from paging are consistent. This solves the issue, where the results from the API change while you are calling the API multiple times. The downside to this approach is that you have to store more state in server. If you are storing the full results for all of these paging API calls, then this could be quite large.

Actually, deleted orders are not a problem with continuation tokens. The deleted order is only an issue if you use traditional paging requests, which was the point of the article.

The only case I can currently think of that cannot be solved using continuation tokens sent to the client is where the order of the items that are enumerated may change between calls to the API. For example, imagine that you are fetching items sorted by score, and somebody upvotes or downvotes an item while you are enumerating them. In that case it is very difficult to encode enough information in the continuation token. (I can think of complicated ways to make it work, but the resulting database queries would be horrible.)

But for simple stuff like deleted items and similar, it is easy. If you leave out sorting it is trivial to implement as well -- all you need is to enumerate the items based on an internal ID that is guaranteed to always increase, filtering them as required. The continuation token will simply be the ID of the last item you evaluated and the filter that is applied. On the next request you just resume from that ID. If that item ID happens to be deleted in the meantime it is no problem. You just resume from the next one available. I.e.:

  SELECT * FROM items WHERE id > :last_returned_id AND [insert-filter-here] ORDER BY id LIMIT 100;
I understand where you're going, but this is not a continuation in the formal sense, and it isn't even usefully similar.

If the client stores the state on behalf of the server, the server will potentially be working with a modified dataset on the next request, and we're right back where we started with limit and offset.

You can cook up a scheme to make it work for a restricted range of requests, but the compromises are severe.