I’ve been looking into creating a community-driven website for a while now, and when I found Google App Engine (GAE) I decided it would make the ideal platform for my application.
I’m a great believer in the right tool for the right job, so I’m not tied to a particular language or platform. Java is my day job, but the cost of deploying and hosting a Java application is a barrier. With GAE there is no cost of entry, almost unlimited scaling ability, and you only pay if the site becomes very popular. So this was an ideal opportunity to learn a little Python, the only language that is currently supported. I’d dabbled before but just couldn’t get past the “white space has meaning” thing. It just didn’t feel right.
I have to say that now I’ve spent some more time with the language I quite like it. Because it is a scripting language, and you get the Google App Engine SDK for local development, the whole code, compile, deploy, test cycle that comes with Java is thankfully avoided. I’ve decided to use Django with GAE: Google have provided a version of it that is compatible with the Google datastore, and it is a great MVC web framework that takes away a lot of the work that would otherwise have to be done manually.
The only thing I have struggled with is the Google datastore. The version of Django that comes with GAE has been modified to make the datastore look like a relational database. But it isn’t. It is just one big table that can store anything you want. This makes it fairly fast and very scalable. But.
The main problem you have is with joins, which are handy with relational data. I come from the world of normalised databases, where repeated data is bad design. Whilst creating my app I stuck with the normalised database model. As the application has become more complex, the datastore persistence framework has allowed me to progress without putting any further thought into the datastore. The framework allows you to define an entity with joins to other entities. However, in order to retrieve these linked entities the datastore will perform a completely separate retrieve as soon as they are referenced.
To take a forum example: if I create a forum post, my user will be stored on that post. The post will be stored against the topic, and so on. So you may have post.topic and post.user fields referring to other entities. In my controller I can retrieve all posts for a topic and pass them to my template view. Then in the template view I can display fields like post.text and post.user.name. Herein lies the problem. If I have a topic with 50 posts and each post shows the user’s name, then this will cause 50 separate calls to retrieve each user individually as they are referenced. Whilst the datastore is fairly fast and very scalable, it is expensive to call, both in terms of time and processing power. Because of this Google also have quotas on the number of calls you make and start charging when they are exceeded. Most likely the free quotas they give you will be adequate, but if your site takes off it could start getting expensive.
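To make the 50 calls concrete, here is a minimal sketch in plain Python. It does not use the real GAE or Django APIs: FakeDatastore, Post and the "user:1" key are all hypothetical names I have made up, and the fake store simply counts how many times it is asked for an entity.

```python
# Toy stand-in for the datastore: a dict plus a counter of fetches.
# None of this is the real GAE API; every name here is hypothetical.

class FakeDatastore:
    def __init__(self):
        self.entities = {}
        self.get_calls = 0

    def put(self, key, entity):
        self.entities[key] = entity

    def get(self, key):
        # Every get is counted as one round trip to the datastore.
        self.get_calls += 1
        return self.entities[key]


class Post:
    def __init__(self, store, text, user_key):
        self._store = store
        self.text = text
        self.user_key = user_key

    @property
    def user(self):
        # The linked user is only fetched when it is referenced,
        # and each post fetches it separately.
        return self._store.get(self.user_key)


store = FakeDatastore()
store.put("user:1", {"name": "alice"})
posts = [Post(store, "post %d" % i, "user:1") for i in range(50)]

# Roughly what the template does: show the user name next to each post.
rendered = ["%s: %s" % (p.user["name"], p.text) for p in posts]

print(store.get_calls)  # 50, one fetch per post
```

Even though every post here belongs to the same user, the store is still hit once per post, because each post resolves its own reference.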
The mantra is: take the pain in the updates so that the reads are cheap. Because of this I have started de-normalising my database, and it feels terrible. Continuing with the example above, I would add the user name to the post itself. That way I do not need to access the user table at all when displaying a topic. If a user changes his name then it all goes a bit pear-shaped, because every post carrying the old name has to be updated, but the idea is that this does not occur too often. Take the hit with the update to ease the read.
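Here is a rough sketch of that trade-off, again in plain Python rather than the real GAE API. FakeDatastore, create_post, rename_user and the key naming scheme are all hypothetical, and I am assuming that fetching all posts counts as a single call while each get or put counts as one call.

```python
# Toy datastore that counts every call. Not the real GAE API;
# all names and the "kind:id" key scheme are made up for illustration.

class FakeDatastore:
    def __init__(self):
        self.entities = {}
        self.calls = 0

    def put(self, key, entity):
        self.calls += 1
        self.entities[key] = entity

    def get(self, key):
        self.calls += 1
        return self.entities[key]

    def query(self, kind):
        # Assumption: one query counts as one call, however many rows match.
        self.calls += 1
        prefix = kind + ":"
        return [(k, e) for k, e in self.entities.items() if k.startswith(prefix)]


def create_post(store, post_key, user_key, text):
    # Copy the user name onto the post at write time (the de-normalised field).
    user = store.get(user_key)
    store.put(post_key, {"text": text, "user_key": user_key,
                         "user_name": user["name"]})


def rename_user(store, user_key, new_name):
    # The painful update: fix the user, then every post that copied the name.
    user = store.get(user_key)
    user["name"] = new_name
    store.put(user_key, user)
    for post_key, post in store.query("post"):
        if post["user_key"] == user_key:
            post["user_name"] = new_name
            store.put(post_key, post)


store = FakeDatastore()
store.put("user:1", {"name": "alice"})
for i in range(50):
    create_post(store, "post:%d" % i, "user:1", "hello %d" % i)

# Cheap read: the user table is never touched when displaying the posts.
store.calls = 0
names = [post["user_name"] for _, post in store.query("post")]
read_calls = store.calls      # 1

# Expensive update: 1 get + 1 put for the user, 1 query, 50 post puts.
store.calls = 0
rename_user(store, "user:1", "alicia")
rename_calls = store.calls    # 53
```

The read drops to a single call, and the rename becomes the expensive operation, which is the bargain I am making.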
For the ultimate in speed and scalability Google say that a single entity should match what is required in the view. I can’t envisage any application where this is even remotely possible so you have to take a pragmatic approach.
I have had to look at every join I have defined, and at how I reference the joined data, and then take a view on how often it is likely to change before deciding whether to duplicate the data. It is a painful process but will pay off in the future if I’m lucky.
The next problem area is data migration with new software versions. I might see if I can make use of this somewhere: GAE BAR
Not convinced by the name of that one, though. It might be too slow for migrations on a live system, so maybe this one: GAE Migrations
There’s no easy way yet. I’ll put it off as long as possible, I think.