Do Not Reinvent the Pagination Wheel
From the very day that Google released the first version of the AppEngine SDK until today, developers have been asking themselves and each other, “How can I handle paginating results from the datastore?” There were dozens of different naive attempts that failed because of the unique nature of the datastore. There were many approaches that solved only half the problem by offering forward paging only. Still others offered good pagination while limiting the kinds of queries that could be performed.

It was an ugly and uncomfortable mess. It made everyone somewhat uneasy.

Google partially solved the problem with version 1.3.1 of the SDK, which introduced query cursors: a simple, transparent and HTTP-friendly way to serialize and deserialize query states. Cursors provided only a single direction of query resumption, but they were a huge advance over the prior capabilities. A complete paging solution still required quite a bit of work.
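The cursor API itself is tiny. Here's a minimal sketch of what forward-only paging looks like with it -- note that the Post model and the two function names are my own invention for illustration, not part of the SDK:

```python
from google.appengine.ext import db

class Post(db.Model):
    title = db.StringProperty()

def first_page(page_size=10):
    query = Post.all().order('title')
    posts = query.fetch(page_size)
    # cursor() serializes the query's state into a websafe string that
    # can travel in a URL or a hidden form field.
    return posts, query.cursor()

def next_page(cursor, page_size=10):
    query = Post.all().order('title')
    # with_cursor() deserializes that string and resumes the query
    # exactly where the previous fetch left off -- forward only.
    query.with_cursor(cursor)
    return query.fetch(page_size), query.cursor()
```

That gets you a Next button almost for free; it's the Previous button and the numbered page links that still take work.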

I began doing that work in order to provide a new project of mine with web-standard paging. I wanted Previous and Next buttons as well as links to each page of results. Using cursors greatly simplified the code, but I was still writing a lot of code, and time that would otherwise be spent creating a great user interface and improving usability was going to building ugly behind-the-scenes mechanisms.

Fortunately for me, the voice of experience rolled around inside my head and advised me, “Hit up Google. Make sure that you aren’t reinventing the wheel.” One well-constructed Google query later, I found Ben Davies’s PagedQuery class. It had all of the features that I needed, and it used all of the techniques and strategies that a top-notch AppEngine engineer would apply. It was concisely coded yet extensively commented. It was beautiful and free. It was the wheel that I nearly reinvented.

So, go ahead and ignore everything else that is out there related to AppEngine paging. Disregard even this very blog’s old posts on the subject. Ben Davies built what you want. Use it.

Announcing Sluggable-mixin


I've extracted the slug-related code from this blogging software and packaged it so that it should be easily portable to any Google AppEngine application.  I hope that it will be useful for anyone who needs to add nice, human- and search-engine-friendly URLs to their app.  It works like, and is similar in spirit to, my previously released AppEngine mixin, taggable-mixin.

I'd like to encourage anyone who finds this useful -- or anyone who finds it to be a useless atrocity -- to leave questions, suggestions and feedback here.  I will answer as quickly as I can.


Now with slugs!

I've just integrated my new AppEngine Datastore mixin class, sluggable-mixin.  It adds the ability to associate a user-friendly slug with any datastore entity.  Posts here are now referenced by slug rather than their lengthy and meaningless datastore ID string.
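For anyone curious about what slug generation involves, the core of it is just normalizing a title into URL-safe text. This little function is my own sketch of the usual recipe, not the actual sluggable-mixin code:

```python
import re
import unicodedata

def slugify(title):
    """Turn an arbitrary title into a lowercase, hyphenated slug."""
    # Strip accented characters down to their ASCII base forms.
    text = unicodedata.normalize('NFKD', title)
    text = text.encode('ascii', 'ignore').decode('ascii')
    # Collapse every run of non-alphanumeric characters into one hyphen.
    text = re.sub(r'[^a-zA-Z0-9]+', '-', text).strip('-')
    return text.lower()

print(slugify("Now with slugs!"))  # now-with-slugs
print(slugify("Café au Lait"))     # cafe-au-lait
```

The other half of the job -- guaranteeing uniqueness when two entities produce the same slug -- is where a mixin that owns the datastore bookkeeping earns its keep.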

I'm going to release it as open source as soon as I can polish it up a bit and write some good documentation.  I'm a stickler for good, thorough documentation and automated unit tests.  The tests are done, although they might not have as much coverage as I'd like.  I'm hoping to get sluggable-mixin released before the New Year begins.


Open Source as a Roadside Picnic

One of the many changes that Jessamyn suggested for MyKidsLibrary is that the URLs should be composed of meaningful text rather than just numbers.  In addition to being more human-friendly, it is, apparently, an important search engine optimization technique.

Ruby on Rails likes to construct URLs that end with a numeric identifier that is used to look up a specific record in the database.  It is an efficient, effective solution, and the software engineer side of me never considered why you'd have it be otherwise.  I have come to think of URLs as being things that are as effectively meaningless and worthless to my brain as printouts of UNIX coredumps.  I click on links, I bookmark pages, I never pay the slightest attention to URLs.  I use tools -- browsers, bookmarking services -- to work with URLs just as I use tools to write software.

Once I decided to go about making the change, I set out to find who else had already done this work.  The Rails ecosystem is vast and densely populated; I knew that there was but a very tiny chance that I'd actually have to start from scratch.  Sure enough, a little work on Google revealed that there were many candidate solutions.  I picked one that looked solid and set about integrating it into my project.

I'm not new to this; I have been a working, salary-earning software engineer for nearly two decades, so I should have been prepared for the documentation to suck.  The documentation always sucks.  The last time that I read really good, comprehensive documentation was when I was writing code for a VMS system, and I sat right next to the big orange wall.  At least, I remember it being good; it's all so long ago that I might be remembering it in a somewhat nostalgic light.

I had to figure out a lot of things that weren't mentioned in the documentation, and while that's not the worst thing, it is still frustrating to see a useful, well-put-together package that stops just short of being perfect.  And, really, they all do.

Open Source is invaluable, but in many respects, it reminds me of a Roadside Picnic.

And just to prove that I'm not a hypocritical dick, my next post will include extensive, foolproof instructions for configuring and using the wonderful Rails plugin acts_as_urlnameable.


Just released: taggable-mixin

I have just packaged up a bit of the code that is part of this blog, the part that manages tags, and released it as a Google Code-hosted open source project: taggable-mixin.

Taggable is a Python mixin class that can be added to any AppEngine Model class to give it the ability to have tags associated with it.  It does so without modifying the Model itself; the tags are stored in a completely separate Model of their own, and they are associated by Key.  The tags are managed efficiently, so they are never duplicated.  A single tag instance can be associated with any number of different model instances.
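The key idea -- tags live in their own store, are never duplicated, and are linked to entities by key -- can be illustrated in plain Python, independent of the Datastore. The class and method names here are my own for illustration, not taggable-mixin's actual API:

```python
class TagStore:
    """Keeps one canonical entry per tag name, like a deduplicated Model kind."""

    def __init__(self):
        self._tags = {}  # tag name -> set of associated entity keys

    def tag(self, entity_key, name):
        # Reuse the existing tag entry if one exists; never duplicate it.
        self._tags.setdefault(name, set()).add(entity_key)

    def entities_for(self, name):
        return self._tags.get(name, set())

    def tag_count(self):
        return len(self._tags)

store = TagStore()
store.tag('post-1', 'appengine')
store.tag('post-2', 'appengine')  # same tag, second entity: no duplicate made
store.tag('post-1', 'python')
print(store.tag_count())                        # 2 distinct tags
print(sorted(store.entities_for('appengine')))  # ['post-1', 'post-2']
```

In the real mixin, the dictionary is a separate Model kind and the set members are datastore Keys, but the bookkeeping is the same shape.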

I think that it is a pretty neat, clean and compact solution to the problem of tagging.  I'm very much looking forward  to getting feedback from my fellow AppEngine developers, as I am still a relative newbie to Python.  I have learned a great deal about it since I began working with AppEngine, but I am hardly a veteran coder, familiar with Python idioms and such.  I also want this to be as valuable a contribution to the community as it can be, but unless I hear back from those using it, I will have only limited ideas about how it can be improved.


Another release -- this time with all new tags!

I've just uploaded another release of my blog software, and this time it has TAGS!  Tags, of course, are clickable links, and you can use them to search for all posts that also have the given tag.  Cool, I think.

I'm rather proud of the fact that I wrote the tagging-related classes to be completely transportable to anyone else's Google AppEngine code, and integrating this functionality is as easy as making the taggable class a super-class of any Model class.

I'll update this entry once I have released the tagging code as a Google Code open source project.  I might see if I can run it by someone who is a more experienced Pythonista, so I can get a quick sanity check.  I'm just the tiniest bit nervous that while my code might be valid Python, it might not observe common idioms of the language.


Paginating Records in Google AppEngine

3/3/2011: I now consider the information in this post to be obsolete. More useful and up-to-date advice is available in the post Do Not Reinvent the Pagination Wheel.


In creating this blogging software, I have had to come to grips with finding a way to paginate content.  It's a relatively trivial exercise under most circumstances; it is a well-understood pattern, and it is actually built into some of the popular frameworks.  AppEngine is a little different, and the nature of the Datastore actually makes it rather challenging to implement efficient, useful paging.  I've come up with a solution that I think strikes a good balance of functionality and AppEngine-friendliness.

The code and techniques included here are Open Source.  I do hope that if you choose to use this code in your own project, you'll comment here to share your feedback, suggestions and experiences.  Sharing means caring, guys.  For real.

This Paginator class depends on the Model that it will be paginating having an 'index' field: a unique value that is ordered with respect to how the pagination will occur.  For instance, here is the model definition for this blog's Comment entity:

class Comment(db.Model):
    """A Model for storing comments associated with another entity."""
    author = db.StringProperty(required=True, verbose_name="Author")
    "A text representation of the user who write the comment."
    body = db.TextProperty(required=True, verbose_name="Comment")
    "The text of the comment."
    added = db.DateTimeProperty(auto_now_add=True, verbose_name="Date Added")
    "The date that the comment was added, or created."

    index = db.IntegerProperty(required=True, default=0)
    "The index of the comment in the collection of comments for the parent entity."

Here, index increases every time a new comment is added; in fact, it mirrors added, always increasing.  index will always be unique, but it might not always be contiguous, as a Comment can be deleted.  This function adds comments to the parent entity, and you can see how index is maintained:

def add_comment(self, author, body):
    "Add a new comment to this entity.  Returns the new comment object."
    def add_comment_txn():
        new_comment = Comment(parent=self, author=author, body=body,
                              index=self.comment_index)
        new_comment.put()
        self.comment_index += 1
        self.comment_count += 1
        self.put()
        return new_comment
    new_comment = db.run_in_transaction(add_comment_txn)
    # Invalidate the cached collection of records here, so it will be
    # regenerated and re-loaded with the new record in it.
    return new_comment

Paginator comes into play in the function that gets a page of comments when the blog is requested to show a post:

def get_comments(self, index=0, count=5):
    "Return the comments attached to this entity."
    comments_paginator = Paginator(count, 'index')
    comments = comments_paginator.get_page(db.Query(Comment).ancestor(self), index, True)
    return comments

The only perhaps slightly non-obvious part is index.  Where does it come from?  How do I know which index to ask for?  Is index the page number?  The answers to those questions are a little bit of a chicken-and-egg situation.  You provide Paginator's get_page method with an index from a previous call, usually the next_index or prev_index value.  You get those values the first time by calling get_page with an index of None.  That tells it to fetch the very first page of results, and then you have access to the prev_index, next_index and curr_index values that can be fed back in to it.  The Paginator always looks for indexes relative to what is passed in, so if the requested index doesn't exist -- because it was deleted between calls -- it'll find the next one in the order.
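That feedback loop is easier to see in code.  This sketch is a plain-Python stand-in that pages over a sorted list of integers the same way Paginator pages over entities -- it isn't the real class, but the calling pattern (start with None, feed next_index back in) is identical:

```python
class PaginatedList(list):
    # Mirrors the three bookkeeping fields on the real PaginatedList.
    prev_index = None
    next_index = None
    curr_index = None

class ListPaginator:
    """In-memory stand-in for Paginator: pages over sorted integers."""
    def __init__(self, page_size):
        self.page_size = page_size

    def get_page(self, items, start_index=None):
        items = sorted(items)
        if start_index is not None:
            # Relative lookup: skip to the first index >= the one asked for,
            # so a deleted index simply falls through to the next survivor.
            items = [i for i in items if i >= start_index]
        page = PaginatedList(items[:self.page_size])
        if page:
            page.curr_index = page[0]
        if len(items) > self.page_size:
            # The record just past the page marks where the next page starts.
            page.next_index = items[self.page_size]
        return page

data = [0, 1, 2, 4, 5, 7]  # note the gaps, as after deletions
pager = ListPaginator(3)
first = pager.get_page(data)                    # start_index=None -> first page
second = pager.get_page(data, first.next_index)
print(list(first), first.next_index)    # [0, 1, 2] 4
print(list(second), second.next_index)  # [4, 5, 7] None
```

A next_index of None is how you know you have reached the last page and can suppress the Next button.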

So, that should give you a pretty good idea of how the Paginator works.  Please post any questions or suggestions as a comment, and I'll see them and address them as best as I am able.  Here, then, is the actual Paginator code:

#Copyright 2008 Adam A. Crossland
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#    http://www.apache.org/licenses/LICENSE-2.0
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.

from google.appengine.ext import db
import copy

class PaginatedList(list):
    """An extended normal Python list with three additional properties used for
    pagination purposes:
    prev_index - the starting index of the previous page of entities;
    next_index - the starting index of the next page of entities;
    curr_index - the starting index of the current page of entities
    """
    def __init__(self, *args, **kw):
        list.__init__(self, *args, **kw)
        self.prev_index = None
        "The starting index of the previous page of entities"
        self.next_index = None
        "The starting index of the next page of entities"
        self.curr_index = None
        "The starting index of the current page of entities"
class Paginator:
    "A class that supports pagination of AppEngine Datastore entities."
    def __init__(self, page_size, index_field):
        self.page_size = page_size
        "The number of entities that constitute a 'page'"
        self.index_field = index_field
        "The name of the field in the Model that is a orderable index"

    def get_page(self, query=None, start_index=None, ascending=True):
        """Takes a normal AppEngine Query and returns paginated results.
        query - a Datastore Query object.  It must not have an order clause.
        start_index - the index of the first record in the desired page.  If the
            index is not known, or the first page is needed, None should be
            passed.
        ascending - True if the index column is to be ordered ascending; False
            should be passed for descending ordering.
        """
        fetched = None
        # I need to make a copy of the query, as once I use it to get the main
        # collection of desired records, I will not be able to re-use it to get
        # the next or prev collection.
        query_copy = copy.deepcopy(query)
        if ascending:
            # First, I will grab the requested page of entities and determine
            # the index for the next page.
            filter_on = self.index_field + " >="
            fetched = PaginatedList(query.filter(filter_on, start_index).order(self.index_field).fetch(self.page_size + 1))
            if len(fetched) > 0:
                # The first row that we get back holds the real current index.
                fetched.curr_index = getattr(fetched[0], self.index_field)
            if len(fetched) > self.page_size:
                # We fetched one more record than we actually need.  That is the
                # index of the first record of the next page.  Record it, and
                # delete the extra record from our collection.
                fetched.next_index = getattr(fetched[-1], self.index_field)
                del fetched[-1]
            # Now, I will try to determine the index of the previous page.
            filter_on = self.index_field + " <"
            previous_page = query_copy.filter(filter_on, start_index).order("-" + self.index_field).fetch(self.page_size)
            if len(previous_page) > 0:
                # The last record is the first record in the previous page.
                # Record it.
                fetched.prev_index = getattr(previous_page[-1], self.index_field)
        else:
            # Follow the same logical pattern as for ascending, but reverse
            # the polarity of the neutron flow.
            filter_on = self.index_field + " <="
            fetched = PaginatedList(query.filter(filter_on, start_index).order("-" + self.index_field).fetch(self.page_size + 1))
            if len(fetched) > 0:
                # The first row that we get back holds the real current index.
                fetched.curr_index = getattr(fetched[0], self.index_field)
            if len(fetched) > self.page_size:
                # We fetched one more record than we actually need.  That is the
                # index of the first record of the next page.  Record it, and
                # delete the extra record from our collection.
                fetched.next_index = getattr(fetched[-1], self.index_field)
                del fetched[-1]
            # Determine the index of the previous page.
            filter_on = self.index_field + " >"
            previous_page = query_copy.filter(filter_on, start_index).order(self.index_field).fetch(self.page_size)
            if len(previous_page) > 0:
                # The last record is the first record in the previous page.
                # Record it.
                fetched.prev_index = getattr(previous_page[-1], self.index_field)
        return fetched

Great software is worth paying for

The open source software development model has been a revolution, a sweeping tide of change and innovation that has lifted and invigorated this industry even though it seems as if it ought to be antagonistic to it. It is private industry, for the most part, that provides programmers with salaries, and giving things away for free is almost certainly going to take revenue away from someone, right?

Strangely, wonderfully, the opposite is true; the technology industry is booming, populated by a veritable Cambrian explosion of entrepreneurs, innovators, makers and dreamers who are enabled by free and open-source software. Even those of us who work primarily within private industry, contributing few or none of our work products back to the community, still benefit enormously from the vibrant, competitive, changing ecosystem that surrounds us.

Still, when you embrace open-source software and start using it on a daily basis, it's perfectly obvious that a great deal of it lacks a certain something. Call it usability, or perhaps the correct word is beauty. Grace? Elegance? Polish? It's hard to pin down exactly what it is, but something is often missing. Look at the software that comes with your very-much-not-free Macintosh, and you'll see a stark contrast with the competing packages that are available for Linux.

Yorba, however, is producing great, beautiful software. Usable and elegant software that you might gladly pay for. But it is free. To you. It is not free for Yorba, which must pay salaries and rent and utility bills. Right now, they are running an Indiegogo campaign to fund development of Geary, a new desktop email client that seems to be destined to define its genre. It is something that everybody who uses open-source software should support because it represents the next wave of happy, progressive disruption within the on-going OSS revolution.

Great software is worth paying for. Do the right thing and chip in.