Optimizing Datastore Use

Google App Engine’s datastore is one of the most underrated parts of the service. Relatively cheap (in some cases free) access to a fast, reliable NoSQL store is a terrific deal, especially since most developers are only experienced with SQL databases.

With that said, the App Engine datastore can get expensive quickly, especially if it’s being used inefficiently. One of my favorite illustrations of this point is this article. Here’s what I do to optimize my datastore use:

  1. Use setUnindexedProperty. To set a property on an Entity you call entity.setProperty(key, value). In the background, though, App Engine builds an index (perhaps multiple indexes) to allow searches on that property, and these index builds can get expensive very quickly. If you don’t need to search on a property, call entity.setUnindexedProperty(key, value) instead. This tells App Engine that the application will not be searching on that property, so it doesn’t build an index.
  2. Cache data in memcache. Whenever you make a datastore request, copy the returned results into memcache. Then, if you need to make the same request again, try pulling the data from memcache before querying the datastore. Memcache use is free, so check it before the datastore. Some datastore abstractions, such as Objectify, do this automatically.
  3. If this is a high-traffic app, consider using a backend. A high-memory backend can hold a large amount of data in RAM, and you can transfer data to/from other instances by using URLFetch. As a bonus this technique can be faster than querying entities from the datastore.
  4. Turn off AppStats. If you have AppStats enabled, turn it off for some speed gains. AppStats stores its data in memcache, which may cause some of your own data to be evicted, leading to more datastore queries and a bigger datastore bill.
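As a rough illustration of tip 2, here is a minimal read-through cache sketch in Go. Store and Cache are hypothetical in-memory stand-ins for the datastore and memcache (they are not the App Engine APIs), and cachedGet is an illustrative helper name; the only point is the lookup order.

```go
package main

import "fmt"

// Store is a hypothetical stand-in for the datastore.
// It counts reads so the saving from caching is visible.
type Store struct {
	data  map[string]string
	reads int
}

// Get simulates a (billable) datastore read.
func (s *Store) Get(key string) (string, bool) {
	s.reads++
	v, ok := s.data[key]
	return v, ok
}

// Cache is a hypothetical stand-in for memcache.
type Cache map[string]string

// cachedGet tries the cache first and falls back to the store on a miss,
// copying any result it finds into the cache for the next caller.
func cachedGet(c Cache, s *Store, key string) (string, bool) {
	if v, ok := c[key]; ok {
		return v, true
	}
	v, ok := s.Get(key)
	if ok {
		c[key] = v
	}
	return v, ok
}

func main() {
	store := &Store{data: map[string]string{"user:1": "alice"}}
	cache := Cache{}
	for i := 0; i < 3; i++ {
		v, _ := cachedGet(cache, store, "user:1")
		fmt.Println(v)
	}
	fmt.Println("datastore reads:", store.reads) // datastore reads: 1
}
```

In a real app the cached copy also has to be updated or deleted whenever the entity is written, otherwise readers will see stale data.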

Receiving Email in Golang

I’m in the middle of writing a Java application on App Engine to receive mail, and I decided to look up how to do it in Go. It’s shockingly easy, just a few lines of code (r is the *http.Request):

    c := appengine.NewContext(r)
    defer r.Body.Close()
    msg, err := mail.ReadMessage(r.Body)

And that’s it. You can extract headers and the mail message body from the Message struct. It’s quite pleasant to use, and surprisingly fast at parsing email.

Retrieving A Datastore Entity With A Key (or Kind Name & ID)

A short code fragment that someone might find useful: retrieving an Entity when you know its kind and its ID or name.

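    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.EntityNotFoundException;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.KeyFactory;

    Key key = KeyFactory.createKey("kind", "id/name");
    try {
        Entity entity = DatastoreServiceFactory.getDatastoreService().get(key);
    } catch (EntityNotFoundException e) {
        // The entity wasn't found. Handle this exception.
    }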

Setting Security Constraints (Or, Adding Admin-Only Areas In web.xml)

After experimenting with Go for the past few weeks, returning to Java is a little bit annoying, especially when configuring the web.xml and appengine-web.xml files. Golang has a clean, neat configuration file in app.yaml, and yet Java on App Engine has to deal with relatively heavyweight XML files.

For instance, this is the markup required to create an admin-only folder on J/GAE:
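A typical version of such a constraint in web.xml looks roughly like the sketch below. The /admin/* path and the resource name are assumptions chosen for illustration; the admin role name is what App Engine uses to restrict access to the application’s administrators.

```xml
<security-constraint>
    <web-resource-collection>
        <!-- Hypothetical name and path for the protected area -->
        <web-resource-name>admin</web-resource-name>
        <url-pattern>/admin/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
        <!-- Only administrators of the app may access matching URLs -->
        <role-name>admin</role-name>
    </auth-constraint>
</security-constraint>
```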

The markup alone is 2-3 times the size of the settings themselves! There needs to be a better way of handling this.

Google App Engine Startup Time & Uncompressing JARs

About a week ago, I saw this post (https://groups.google.com/forum/?fromgroups=#!topic/google-appengine/GdBqSxqviYk) about instances being unable to load and failing initialization. I took one look at the picture provided, saw the line “This request started at [time] and was still executing at [time, 1 minute later]”, and immediately assumed that the application’s init function was taking too long to run.

In most cases, that’s a fair assumption to make. One of the bigger pitfalls of App Engine is that instances have only 60 seconds to start up (load all files and run the init method of servlets). It’s very common for developers to put a huge amount of code in the init method, and then have instances fail startup because initialization took too long. In this case I believed init was the problem for one reason alone: the picture of the logging stack trace included references to ZIP I/O streams. Uncompressing and processing large ZIP files within the init function could easily take more than a minute.

However, it turned out that the developer wasn’t uncompressing ZIP files in init. App Engine was having a slow day and was exceeding the 60-second startup limit just trying to uncompress the JAR. That is notable enough to comment on: the application didn’t even get fully extracted before App Engine shut down the instance as a failure.

Golang App Binary is missing

I’m getting a lot of “app binary is missing” errors from App Engine. It’s odd: I’ve been deploying Java apps on GAE for years and never received any of these errors. But on the Go runtime, they are cropping up repeatedly.

Your project must be configured to use a JDK in order to use JSPs

Whenever I configure a new Eclipse install, I get a strange error the first time I use JSP files. Eclipse complains that I have to configure it to use a JDK, even though a JDK is obviously installed (otherwise, how could it compile Java class files?).

The fix is to go to Preferences – Java – Installed JREs. Click on the first JRE entry, press Edit, and navigate to the base JDK directory. After you save the setting, restart Eclipse and everything will work.