Optimizing Datastore Use

Google App Engine’s datastore is one of the most underrated parts of the service. Having a relatively cheap (in some cases free) use of a fast, reliable NoSQL store is a terrific deal, especially since most developers are only experienced with SQL databases.

With that said the App Engine datastore can get expensive quickly, especially if it’s being used inefficiently. One of my favorite illustrations of this point is this article. Here’s what I do to optimize my datastore use:

  1. Use .setUnindexedProperty. To set a property on an Entity you call entity.setProperty(key, value). But in the background App Engine is building an index (perhaps multiple indexes) to allow searches on that property. These index builds can get expensive very quickly. If you don’t need to search on a property, use setUnindexedProperty. This informs App Engine that the application will not be searching on that property, so don’t build an index.
  2. Cache data in memcache. Whenever you make a datastore request, copy the returned results into memcache. Then if you need to make the same request, try pulling the data from memcache before querying the datastore again. Memcache use is free so access it before the datastore. Some datastore abstractions – such as Objectify – do this automatically.
  3. If this is a high-traffic app, consider using a backend. A high-memory backend can hold a large amount of data in RAM, and you can transfer data to/from other instances by using URLFetch. As a bonus this technique can be faster than querying entities from the datastore.
  4. Turn off AppStats. If you have AppStats enabled, turn if off for some speed gains. AppStats stores its data in memcache which may cause some of your data to be evicted, causing increased datastore queries and an increased datastore bill.

Google Drive: Creating a folder

Today I answered a question about how to create a folder in Google Drive . The answer is simple: Google Drive considers a folder to be simply another file, so to create a folder you use the same API call as creating a file, except that you pass in a MIME type of application/vnd.google-apps.folder.

It’s a pretty straightforward answer, but it started me thinking about the nature of folders and Google. It’s interesting that Google Drive supports folders at all, considering that another famous Google service – Gmail – doesn’t support folders at all. Instead, it has labels.

In Gmail, you can create as many labels as you want, and apply any number of them to an email. Labels are similar to folders: they enable users to categorize and easily search for files – the only thing missing is the hierarchical pattern that folders enable. Which leads me back to Google Drive: I wonder how folders are internally represented within the Google Drive system. Based on the API, it looks like folders are considered a special case of files.

But if Drive can handle folders as simply a special file, why doesn’t Gmail support folders as well? Is it simply that Gmail and Drive are separate codebases, or that Google feels that labels are a better categorizing scheme than folders?

It’s always interesting to think about Google’s architecture and how different services have vastly different builds and APIs.

Golang Error: not enough arguments to return

One of the strengths of Go is the ability to return multiple values from a function. As a long-time Java programmer I’ve envied languages with muttiple returns for a long time. Here’s a quick summary of the feature:

Quick example: to return multiple values, use the classic return statement:

 return value1, value2

And to get those return values from a function, you can write:

 var1, var2 := someMultipleReturnFunction()

The great thing about this system is that you can ignore a return value if you don’t need it. For example, if you need to skip the first return value:

 _, var2 := someMultipleReturnFunction()

Which brings me to the original reason for this blog post. I was writing a Go application, when a build failed due to the error not enough arguments to return. The problem was that I had multiple return statements within a function, and one of the statements declared only one return value, instead of the two return values that the function declared.

The easy fix? Simply declare nil to pad out the “missing” return value. So the return statement became:

 return var1, nil

Easy, quick, and painless.

Receiving Email in Golang

I’m in the middle of writing a Java application on App Engine to receive mail, and I decided to look up on how to do it in Go. It’s shockingly easy, just a few lines of code (r represents http.Request):

    c := appengine.NewContext(r)
    defer r.Body.Close()
    msg, err := mail.ReadMessage(r.Body)

And that’s it. You can extract headers and the mail message body from the Message struct. It’s quite pleasant to use, and surprisingly fast at parsing email.

Retrieving A Datastore Entity With A Key (or Kind Name & ID)

Short code fragment that someone might find useful: retrieve an Entity if you know what the kind and entity ID/name are.

Key key = KeyFactory.createKey("kind", "id/name");
try {
    Entity entity = DatastoreServiceFactory.getDatastoreService().get(key);
} catch (EntityNotFoundException e) {
    //The entity wasn't found. Handle this exception.
}

Accessing and searching cells on Google Spreadsheet API

Recently I answered a question on how to access the Google Spreadsheet API. The poster was asking how to search the contents of a row/cell for a specific phrase, so I put together a quick example in Java.

The source code is below. Note that you need to replace the find_word variable with the phrase that you’re searching for, and the username and password variables with your Google username/password.

    try {
        String USERNAME = "username";
        String PASSWORD = "password";
        String find_word = "GOOG";
        find_word = find_word.toLowerCase();
        SpreadsheetService service = new SpreadsheetService(
                "GAEGROUPSEXAMPLENOPROD");
        service.setUserCredentials(USERNAME, PASSWORD);
        URL SPREADSHEET_FEED_URL = new URL(
                "https://spreadsheets.google.com/feeds/spreadsheets/private/full");
        // Make a request to the API and get all spreadsheets.
        SpreadsheetFeed feed = service.getFeed(SPREADSHEET_FEED_URL,
                SpreadsheetFeed.class);
        List<SpreadsheetEntry> spreadsheets = feed.getEntries();
        if (spreadsheets.size() == 0) {
            resp.getWriter().println(
                    "There are no spreadsheets to inspect!");
        }
        // Retrieve an iterator over all spreadsheets contained in
        // this user's Google Drive
        Iterator<SpreadsheetEntry> spreadsheet_iterator = spreadsheets
                .iterator();
        while (spreadsheet_iterator.hasNext()) {
            SpreadsheetEntry spreadsheet = spreadsheet_iterator.next();
            String spreadsheet_name = spreadsheet.getTitle().getPlainText();
            resp.getWriter().println(
                    "Currently searching spreadsheet " + spreadsheet_name);

            // Search only the first worksheet of the spreadsheet.
            WorksheetFeed worksheetFeed = service.getFeed(
                    spreadsheet.getWorksheetFeedUrl(), WorksheetFeed.class);
            List<WorksheetEntry> worksheets = worksheetFeed.getEntries();
            WorksheetEntry worksheet = worksheets.get(0);

            // Fetch the cell feed of the worksheet.
            URL cellFeedUrl = worksheet.getCellFeedUrl();
            CellFeed cellFeed = service
                    .getFeed(cellFeedUrl, CellFeed.class);

            for (CellEntry cell : cellFeed.getEntries()) {
                // Retrieve the contents of each cell.
                String cell_contents = cell.getCell().getInputValue()
                        .toLowerCase();

                // Match the word with the cell contents. Ignoring case.
                if (cell_contents.indexOf(find_word) != -1) {
                    // Found the word.
                    resp.getWriter().println(
                            "Cell with contents " + cell_contents
                                    + " matches your phrase!");
                    // Find the row
                    Integer row_index = new Integer(cell.getCell().getRow());
                    resp.getWriter().println(
                            "<b>Row " + row_index.toString() + "</b> in "
                                    + spreadsheet_name
                                    + " matches your query.");
                    String rowCollect = "";
                    // Print row data
                    URL rowCellFeedUrl = new URI(worksheet.getCellFeedUrl()
                            .toString()
                            + "?min-row="
                            + row_index
                            + "&max-row=" + row_index).toURL();
                    CellFeed rowCellFeed = service.getFeed(rowCellFeedUrl,
                            CellFeed.class);
                    // Iterate through each cell, printing its value.
                    for (CellEntry rowCell : rowCellFeed.getEntries()) {
                        // Print the cell's formula or text value
                        rowCollect += rowCell.getCell().getInputValue()
                                + "\t";
                    }
                    resp.getWriter().println(
                            "<b>Row Data: </b>" + rowCollect);
                }
            }// end for looping through cells
        }// end while looping through spreadsheets

    } catch (Exception e) {
        resp.getWriter().println("Exception: " + e.getMessage());
    }

Also, don’t put this example into production use until you switch out the authentication service to OAuth (this example uses ClientLogin to authenticate).

Setting Security Constraints (Or, Adding Admin-Only Areas In web.xml)

After having experimented with Go for the past few weeks, returning back to Java is a little bit annoying, especially when configuring web.xml and appengine-web.xml files. Golang has a clean, neat configuration file in app.yaml, and yet Java on App Engine has to deal with relatively heavyweight XML files.

For instance, this is the markup required to create an admin-only folder on J/GAE:

The markup alone is 2-3 times the size of the settings themselves! There needs to be a better way of handling this.

Adding a domain alias

I was trying to add a subdomain off of this domain (learntogoogleit.com) to my Google Apps, but apparently this isn’t allowed:

Google Apps won’t let you put in any domain that contains the word google. Blogger doesn’t allow it either. Amusingly enough, it does allow domains with competitor names in them, such as bing or microsoft.

Google App Engine Startup Time & Uncompressing JARs

About a week ago, I saw this post https://groups.google.com/forum/?fromgroups=#!topic/google-appengine/GdBqSxqviYk about instances being unable to load and failing initialization. I took one look at the picture provided, saw the line This request started at [time] and was still executing at [time, 1 minute later], and immediately assumed that the application’s init function was taking too long to run.

In most cases, that’s a fair assumption to make. One of the bigger pitfalls of App Engine is that instances have only 60 seconds to start up (load in all files and run the init method of servlets). It’s very common for developers to write in a huge amount of code within the init method, and then have instances fail startup because initialization took too long. In this case I believed init was the problem for one reason alone: the picture of the logging stack trace included references to ZIP I/O streams. Uncompressing and processing large ZIP files within the init function could easily take more than a minute.

However, it turned out that the developer wasn’t uncompressing ZIP files in the init – the answer was that App Engine was having a slow day, and was exceeding the 60 second startup limit just trying to uncompress the JAR. Which is pretty amazing and notable enough to comment on – the application didn’t even get fully extracted before App Engine shut down the instance as a failure.

Golang App Binary is missing

Getting a lot of app binary is missing errors from App Engine. It’s odd, I’ve been deploying Java apps on GAE for years, and never received any of these errors. But on the Go runtime, these errors are cropping up repeatedly.