HTTP POST URLFetch Using The Low Level Java API

Here’s a code snippet demonstrating how to generate a POST request using the low level urlfetch API.

The variables post_url and post_data are strings holding the request URL and the content of the POST. The response body is stored in response_content. If the request was unsuccessful (the target server returned a non-200 HTTP status code), a RuntimeException is thrown.

HTTPRequest request = new HTTPRequest(new URL(post_url), HTTPMethod.POST);
request.setHeader(new HTTPHeader("User-Agent", "Example Application"));
request.setPayload(post_data.getBytes("UTF-8"));
HTTPResponse response = URLFetchServiceFactory.getURLFetchService().fetch(request);
//If the response wasn't successful, throw an exception.
if (response.getResponseCode() != 200) {
    throw new RuntimeException("Response code was not 200: " + response.getResponseCode());
}
//Decode the response body, assuming the server replied in UTF-8.
String response_content = new String(response.getContent(), "UTF-8");

Remember to import the URLFetch library:

import com.google.appengine.api.urlfetch.*;
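The snippet above leaves open how post_data is built. If the target server expects form-encoded content (an assumption; many APIs want JSON or XML instead), a minimal sketch using only the JDK could look like this. The class name PostData and the method formEncode are hypothetical names chosen for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

final class PostData {
    private PostData() {}

    /**
     * Joins the parameters as key=value pairs separated by '&',
     * percent-encoding every key and value as UTF-8.
     */
    static String formEncode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(entry.getKey(), "UTF-8"));
                body.append('=');
                body.append(URLEncoder.encode(entry.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
        return body.toString();
    }
}
```

The returned string can be used as post_data. For form-encoded bodies it is usually also necessary to set a Content-Type header of application/x-www-form-urlencoded on the request, in the same way the User-Agent header is set above.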

X-Google-Cache-Control URL Fetch Response Header

Google caches the results of URL Fetch requests so subsequent requests can be supplied from the cache, thereby speeding up the request and your application in general.

This can be troublesome though, especially if an application is accessing a web page that changes quickly; URL Fetch may be returning stale results without the application knowing it. Fortunately, there’s a way to detect whether or not the page was retrieved from a Google cache server.

For all URL Fetch requests from the production App Engine servers, the X-Google-Cache-Control header is added to URL Fetch responses. If the header has a value of remote-fetch, then the fetch retrieved a fresh copy of the page. If the value is remote-cache-hit, then the page was retrieved from Google’s cache and may have stale data.

Here’s what the header looks like if it’s a cache hit:

X-Google-Cache-Control: remote-cache-hit

A freshly retrieved page will have this header:

X-Google-Cache-Control: remote-fetch
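To act on this header in code, the value can be compared against the two strings above. The following is a minimal sketch, not an official API: the class CacheControlCheck and its methods are hypothetical helpers. It takes the header value as a plain string, which with a java.net.HttpURLConnection could be read via getHeaderField("X-Google-Cache-Control").

```java
final class CacheControlCheck {
    /** Name of the response header described above. */
    static final String HEADER_NAME = "X-Google-Cache-Control";

    private CacheControlCheck() {}

    /** True only when the header says the page came from Google's cache. */
    static boolean isCacheHit(String headerValue) {
        return headerValue != null
                && "remote-cache-hit".equalsIgnoreCase(headerValue.trim());
    }

    /** True only when the header says a fresh copy was fetched. */
    static boolean isFreshFetch(String headerValue) {
        return headerValue != null
                && "remote-fetch".equalsIgnoreCase(headerValue.trim());
    }
}
```

A missing header (null) makes both methods return false, so callers can treat that case as "unknown" rather than guessing.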

Missing User Agent For Development Server URL Fetches

A quick note: the App Engine development server doesn’t add a User-Agent header to URL fetch requests.

As I commented in a previous post, the App Engine production environment automatically sets a User-Agent (listed below) on all URL Fetch requests. If you set a custom user agent, App Engine will append the text below to your custom header.

AppEngine-Google; (+http://code.google.com/appengine; appid: YOUR_APPLICATION_ID_HERE)

However, the development server doesn’t add this header automatically. If you set a custom User-Agent header, that’s all that will be sent, with no other identifying information. If you don’t set a user agent, URL fetches from the development server will not carry any user agent information.

This can be an issue while developing applications on the dev server; some APIs require this header and will refuse to respond, or will heavily rate limit requests, if it is missing. For instance, the NewsBlur API requires a user agent header for all requests. If the request doesn’t contain one, the API will refuse the request even if it’s authenticated.

Always set a custom user agent header that accurately describes your application on all URL fetch requests. If your application does a lot of URL fetches to the same API/server, it may be a good idea to list your email address or a web page with more information about your application.
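One way to follow this advice consistently is to build the user agent string in a single place. The sketch below is only an illustration: the class UserAgents, the method describe, and the "name (+contact)" layout are assumptions rather than anything App Engine requires.

```java
final class UserAgents {
    private UserAgents() {}

    /**
     * Builds a user agent of the form "appName (+contact)", where contact
     * is an email address or a URL describing the application.
     */
    static String describe(String appName, String contact) {
        if (appName == null || appName.trim().isEmpty()) {
            throw new IllegalArgumentException("appName must not be blank");
        }
        if (contact == null || contact.trim().isEmpty()) {
            // No contact details supplied: fall back to the name alone.
            return appName.trim();
        }
        return appName.trim() + " (+" + contact.trim() + ")";
    }
}
```

The result can be passed to whichever call sets the header, for example new HTTPHeader("User-Agent", ...) in the low level API or setRequestProperty("User-Agent", ...) on an HttpURLConnection.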

URLFetch User Agent

When an application makes a URLFetch request, App Engine adds the following text as the User-Agent header:

AppEngine-Google; (+http://code.google.com/appengine; appid: YOUR_APPLICATION_ID_HERE)

Even if the application sets a custom user agent header, App Engine will append the above text to the header.

This can be annoying because there are some servers and services that rate limit based on the user agent. If there is a human reviewing the request logs, it can be confusing to see a stream of largely-identical user agent strings.

It’s good practice to set a descriptive user agent for all URL fetches. It’s even better if you can write your user agent with App Engine’s required text in mind. For instance, consider writing user agent headers like this one: App Engine Example Crawler hosted by. When App Engine appends its required text to the end of this, the receiving server will see a user agent of:

App Engine Example Crawler hosted by AppEngine-Google; (+http://code.google.com/appengine; appid: YOUR_APPLICATION_ID_HERE)

This user agent header is cleaner, neater, and easier for a human to understand.

Here is the above in code form:

String user_agent = "App Engine Example Crawler";
user_agent += " hosted by ";//After this, GAE will append the identifier.
connection.setRequestProperty("User-Agent", user_agent);

The connection variable represents a java.net.HttpURLConnection object.
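To check how a custom user agent will read once App Engine has added its text, the final header can be previewed locally. This is a rough sketch under one assumption: that App Engine concatenates its text directly after the custom value with no separator of its own, which is what the trailing space in the snippet above suggests. The class UserAgentPreview and the method asSeenByServer are hypothetical.

```java
final class UserAgentPreview {
    /** The text App Engine appends, with a placeholder for the application ID. */
    static final String APP_ENGINE_SUFFIX =
            "AppEngine-Google; (+http://code.google.com/appengine; appid: %s)";

    private UserAgentPreview() {}

    /**
     * Returns the user agent the receiving server would see, assuming
     * plain concatenation of the custom value and App Engine's text.
     */
    static String asSeenByServer(String customUserAgent, String appId) {
        return customUserAgent + String.format(APP_ENGINE_SUFFIX, appId);
    }
}
```

Printing asSeenByServer("App Engine Example Crawler hosted by ", "my-app") during development makes it easy to spot a missing trailing space or an awkward sentence before the header reaches a real server.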

HTTP GET Using The Low Level Java App Engine API

Here’s a short code example showing how to do an HTTP GET using the low level Java API.

The variable url_string_here is the URL being retrieved, as a String. The code returns a byte[] containing the content of the response. If the response code is not 200 (i.e. anything other than HTTP OK), then this code throws a RuntimeException.

URL url = new URL(url_string_here);
HTTPRequest request = new HTTPRequest(url, HTTPMethod.GET);
request.setHeader(new HTTPHeader("User-Agent", "Custom User Agent "));
//Execute request.
HTTPResponse response = URLFetchServiceFactory.getURLFetchService().fetch(request);
if (response.getResponseCode() == 200) {
    //The response was OK
    //Retrieve the content of the response.
    return response.getContent();
}//end if the response code was 200.
else {
    throw new RuntimeException("Response code was " + response.getResponseCode());
}
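For comparison, here is a rough equivalent written against java.net.HttpURLConnection, the class used in the other snippets on this page. It is a sketch rather than a drop-in replacement: the class SimpleGet and its method names are made up for illustration, and it keeps the same rule of throwing a RuntimeException for anything other than HTTP 200.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class SimpleGet {
    private SimpleGet() {}

    /** Throws a RuntimeException unless the response code is 200 (HTTP OK). */
    static void requireOk(int responseCode) {
        if (responseCode != HttpURLConnection.HTTP_OK) {
            throw new RuntimeException("Response code was " + responseCode);
        }
    }

    /** Reads the stream to its end and returns everything as a byte array. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    /** Performs the GET and returns the response body. */
    static byte[] get(URL url, String userAgent) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("User-Agent", userAgent);
        try {
            requireOk(connection.getResponseCode());
            try (InputStream in = connection.getInputStream()) {
                return readFully(in);
            }
        } finally {
            connection.disconnect();
        }
    }
}
```

Keeping the status check and the stream reading in separate methods means both can be exercised without making a network call.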

Deadline exceeded while waiting for HTTP response from URL

Occasionally applications – even the best behaved applications – will get the error “Deadline exceeded while waiting for HTTP response from URL.”

Generally, this means that the web service you’re trying to connect to is down or slow. If the service is down, then you can continuously retry your URL fetches by queuing them up within a task.
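Queuing work in a task depends on App Engine's task queue API, which is not shown here. As a simpler stand-in for the same idea, the sketch below retries a fetch a bounded number of times inside the current request. It is plainly a different technique from task-based retries, and the class FetchRetry and the method withRetries are hypothetical names.

```java
import java.util.function.Supplier;

final class FetchRetry {
    private FetchRetry() {}

    /**
     * Calls fetch until it succeeds or maxAttempts is reached, pausing
     * pauseMillis between attempts. Rethrows the last failure.
     */
    static <T> T withRetries(Supplier<T> fetch, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    pause(pauseMillis);
                }
            }
        }
        throw lastFailure;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while waiting to retry", e);
        }
    }
}
```

Because every attempt counts against the time available to the current request, a small maxAttempts is the safer choice; longer outages are better handled by the task approach described above.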

If the web service is slow, then you have an alternative: raising the read and connect timeouts. By default, App Engine expects that a URL fetch will take, at most, 5 seconds. That’s 5 seconds to connect to the web service (resolve DNS and so forth), send the request data, allow the web service to process the request, and finally retrieve any response sent back. For the vast majority of applications, that’s more than enough. Popular web APIs such as Twitter, Facebook, Google, etc. all process and return requests in much less than 5 seconds.

However, a slow or malfunctioning web service may take longer than 5 seconds to respond to a query. If your app is downloading a large amount of data (more than a few MB) you may also go past this limit. To tell App Engine to wait for a longer period of time, use this code (url_connection represents an HttpURLConnection object):

url_connection.setReadTimeout(milliseconds_to_wait_for_read);
url_connection.setConnectTimeout(milliseconds_to_wait_for_connect);

Remember that the time to wait is given in milliseconds, so do the appropriate conversions (for example, if you wanted the connection to wait 30 seconds, you would pass 30000).
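To avoid doing the conversion by hand, the timeouts can be expressed in seconds and converted with java.util.concurrent.TimeUnit. This is a small sketch; the class FetchTimeouts and its methods are hypothetical helpers, and the URL and values used with it are only examples.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.TimeUnit;

final class FetchTimeouts {
    private FetchTimeouts() {}

    /** Creates a connection object for the URL; nothing is sent yet. */
    static HttpURLConnection open(String url) {
        try {
            return (HttpURLConnection) URI.create(url).toURL().openConnection();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Converts whole seconds to the int milliseconds the setters expect. */
    static int toMillis(long seconds) {
        return Math.toIntExact(TimeUnit.SECONDS.toMillis(seconds));
    }

    /** Sets the connect and read timeouts, both given in seconds. */
    static void apply(HttpURLConnection connection, long connectSeconds, long readSeconds) {
        connection.setConnectTimeout(toMillis(connectSeconds));
        connection.setReadTimeout(toMillis(readSeconds));
    }
}
```

For example, FetchTimeouts.apply(url_connection, 10, 30) asks for up to 10 seconds to connect and up to 30 seconds to read the response.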