Loading
0 comments

Do not Reinvent the Pagination Wheel
From the very day that Google released the first version of the AppEngine SDK until today, developers have been asking themselves and each other, “How can I handle paginating results from the datastore?” There were dozens of different naive attempts that failed because of the unique nature of the datastore. There were many approaches the solved only half the problem by offering forward-paging only. Still others offered good pagination while limiting the kinds of queries that could be performed.

It was an ugly and uncomfortable mess. It made everyone somewhat uneasy.

Google partially solved the problem with version 1.3.1 of the SDK which introduced query cursors, a simple, transparent and HTTP-friendly way to serialize and deserialize query states. They only provided a single-direction of query resumption, but it was a huge advance over the prior capabilities. A complete paging solution still required quite a bit of work.

I began doing that work in order to provide a new project of mine with web-standard paging. I wanted Previous and Next buttons as well as links to each page of results. Using cursors greatly simplified the code, but I was still writing a lot of code, and time that would otherwise be spent creating a great user interface and improving usability was going to building ugly behind-the-scenes mechanisms.

Fortunately for me, the voice of experience rolled around inside my head and advised me, “Hit up Google. Make sure that you aren’t reinventing the wheel.” One well-constructed Google query later, and I found Ben Davies’s PagedQuery class. It had all of the features that I needed, and it used all of the techniques and strategies a that top-notch AppEngine engineer would apply. It was concisely-coded yet extensively commented. It was beautiful and free. It was the wheel that I nearly reinvented.

So, go ahead and ignore everything else that is out there related to AppEngine paging. Disregard even this very blog’s old posts on the subject. Ben Davies built what you want. Use it.
3 comments

Paginating Records in Google AppEngine

3/3/2001: I now consider the information in this post to be obsolete. More useful and up-to-date advice is available in the post Do Not Reinvent the Pagination Wheel.

 

In creating this blogging software, I have had to come to grips with finding a way to paginate content.  It's a relatively trivial exercise under most circumstances; it is a well-understood pattern, and it is actually built in to some of the popular frameworks.  AppEngine is a little different, and the nature of the Datastore actually makes it rather challenging to implement efficient useful paging.   I've come up with a solution that I think makes for a good balance of functionality and AppEngine-friendliness.

The code and tehcniques included here are Open Source.  I do hope that if you choose to use this code in your oen project that you'll comment here to share your feedback, suggestions and experiences.  Sharing means caring, guys.  For real.

This Paginator class depends on the Model that it will be paginating having an 'index' field, a unique value that is order with respect to how the pagination will occur.  For instance, here is the model definition for this blog's Comment entity:

class Comment(db.Model):
    """A Model for storing comments associated with another entity."""
    author = db.StringProperty(required=True, verbose_name="Author")
    "A text representation of the user who write the comment."
   
    body = db.TextProperty(required=True, verbose_name="Comment")
    "The text of the comment."
   
    added = db.DateTimeProperty(auto_now_add=True, verbose_name="Date Added")
    "The date that the comment was added, or created."

    index = db.IntegerProperty(required=True, default=0)
    "The index of the comment in the collection of comments for the parent entity."

Here, index increases every time a new comment is added; in fact, it mirrors added, always increasing.  However, index will always be unique.  It might not always be contiguous however, as a Comment can be deleted.  This function adds comments to the parent entity.  You can see how index is maintained:

def add_comment(self, author, body):
    "Add a new comment to this entity.  Returns the new comment object."
    new_comment = None
    def add_comment_txn():
        new_comment = Comment(parent=self, author=author, body=body, index=self.comment_index)
        new_comment.put()
        self.comment_index += 1
        self.comment_count += 1
        self.put()
           
        return new_comment
    new_comment = db.run_in_transaction(add_comment_txn)
       
    memcache.delete(self._comments_cache_key())
    # Invalidate the cached collection of records, so it will be regenerated
    # and re-loaded with the new record in it.
       
    return new_comment

Paginator comes in to play in the function that gets a page of comments when the blog is requested to show a post:

def get_comments(self, index=0, count=5):
    "Return the comments attached to this entity."
    comments_paginator = Paginator(count, 'index')
    comments = comments_paginator.get_page(db.Query(Comment).ancestor(self), index, True)
           
    return comments

The only perhaps slightly non-obvious part is index.  Where does it come from?  How do I know which index to ask for?  Is index the page number?  The answer to those questions is a little bit of a chicken-and-egg situation.  You provide Paginator's get_page method with an index from a previous call, usually the next_page or prev_page index.  Usually, you'll get those values the first time by calling get_page with an index of None.  That will tell it to get the very first page of results, and then you will have access to the prev_index, next_index and curr_index values that can be fed back in to it.  The Paginator alwasy looks for indexes relative to what is passed in, so the requested index doesn't exist --because it was deleted between calls -- it'll find the next one in the order.

So, that should give you a pretty good idea of how the Paginator works.  Please post any questions or suggestions as a comment, and I'll see them and address them as best as I am able.  Here, then is the actual Paginator code:

#Copyright 2008 Adam A. Crossland
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
#http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.

from google.appengine.ext import db
import copy

class PaginatedList(list):
    """An extended normal Python list with three additional properties used for
    pagination purposes:
    prev_index - the starting index of the previous page of entities;
    next_index - the starting index of the next page of entities;
    curr_index - the starting index of the current page of entities
    """
    def __init__(self, *args, **kw):
        list.__init__(self, *args, **kw)
        self.prev_index = None
        "The starting index of the previous page of entities"
        self.next_index = None
        "The starting index of the next page of entities"
        self.curr_index = None
        "The starting index of the current page of entities"
   
class Paginator:
    "A class that supports pagination of AppEngine Datastore entities."
    def __init__(self, page_size, index_field):
        self.page_size = page_size
        "The number of entities that constitute a 'page'"
        self.index_field = index_field
        "The name of the field in the Model that is a orderable index"

    def get_page(self, query=None, start_index=None, ascending=True):
        """Takes a normal AppEngine Query and returns paginated results.
        query - a Datastore Query object.  It must not have an order clause.
        start_index - the index of the first record in the desired page.  If the
            index is not known, or the first page is needed, None should be
            passed.
        ascending - True if the index column is to be ordered ascending; False
            should be passed for descending ordering.
        """
       
        fetched = None
       
        # I need to make a copy of the query, as once I use it to get the main
        # collection of desired records, I will not be able to re-use it to get
        # the next or prev collection.
        query_copy = copy.deepcopy(query)
       
        if ascending:
            # First, I will grab the requested page of entities and determine
            # the index for the next page
            filter_on = self.index_field + " >="
            fetched = PaginatedList(query.filter(filter_on, start_index).order(self.index_field).fetch(self.page_size + 1))
            if len(fetched) > 0:
                # The first row that we get back is the real index.
                fetched.curr_index = fetched[0].index
            if len(fetched) > self.page_size:
                # We fetched one more record than we actually need.  That is the
                # index of the first record of the next page.  Record it, and
                # delete the extra record from our collection.
                fetched.next_index = fetched[-1].index
                del(fetched[-1])
            # Now, I will try to determine the index of the previous page
            filter_on = self.index_field + " <"
            previous_page = query_copy.filter(filter_on, start_index).order("-" + self.index_field).fetch(self.page_size)
            if len(previous_page) > 0:
                # The last record is the first record in the previous page.
                # Record it.
                fetched.prev_index = previous_page[-1].index
        else:
            # Follow the same logical pattern as for ascending, but reverse
            # the polarity of the neutron flow
            filter_on = self.index_field + " <="
            fetched = PaginatedList(query.filter(filter_on, start_index).order("-" + self.index_field).fetch(self.page_size + 1))
            if len(fetched) > 0:
                # The first row that we get back is the real index.
                fetched.curr_index = fetched[0].index           
            if len(fetched) > self.page_size:
                # We fetched one more record than we actually need.  That is the
                # index of the first record of the next page.  Record it, and
                # delete the extra record from our collection.
                fetched.next_index = fetched[-1].index
                del(fetched[-1])
            # Determine index of previous page
            filter_on = self.index_field + " >"
            previous_page = query_copy.filter(filter_on, start_index).order(self.index_field).fetch(self.page_size)
            if len(previous_page) > 0:
                # The last record is the first record in the previous page.
                # Record it.
                fetched.prev_index = previous_page[-1].index
               
        return fetched