UniSuper and Google Cloud Platform

I know a lot of enterprise cloud customers have been watching the recent incident with Google Cloud (GCP) and UniSuper. For those of you who haven’t seen it: UniSuper is an Australian pension fund that hosted its services on Google Cloud. Due to an inadvertent misconfiguration by Google operators during provisioning, their private cloud project was completely deleted. Google’s postmortem of the incident is here: https://cloud.google.com/blog/products/infrastructure/details-of-google-cloud-gcve-incident . Fascinating reading – in particular, what surprises me is that GCP takes full blame for the incident. There must be some very interesting calls occurring between Google and their other enterprise customers.

There are some fascinating morsels in Google’s postmortem of the incident. Consider this passage:

Data backups that were stored in Google Cloud Storage in the same region were not impacted by the deletion, and, along with third party backup software, were instrumental in aiding the rapid restoration.

https://cloud.google.com/blog/products/infrastructure/details-of-google-cloud-gcve-incident

Fortunately for UniSuper, the data in Google Cloud Storage didn’t seem to be affected, and they were able to restore from there. But it looks like UniSuper also had another set of data stored with another cloud provider. The following is from UniSuper’s explanation of the event at: https://www.unisuper.com.au/contact-us/outage-update .

UniSuper had backups in place with an additional service provider. These backups have minimised data loss, and significantly improved the ability of UniSuper and Google Cloud to complete the restoration.

https://www.unisuper.com.au/contact-us/outage-update

Having a full set of backups with another service provider has to be terrifically expensive. I’d be curious to learn who the additional service provider is and what the arrangement costs. I also wonder whether the backup cloud is live-synced with the GCP servers or whether there’s a daily/weekly sync of the data to help reduce costs.

The GCP statement seems to say that the restoration was completed with just the data from Google Cloud Storage, while the UniSuper statement is a bit more ambiguous – you could read it as saying either (1) that the offsite data was used to complete the restoration, or (2) that the offsite data was useful but not vital to the restoration effort.

Interestingly, an HN comment indicates that the Australian financial regulator requires this multi-cloud strategy: https://www.infoq.com/news/2024/05/google-cloud-unisuper-outage/ .

I did a quick dive to figure out where these requirements come from and, as best I can tell, they come from the APRA’s Prudential Standard CPS 230 – Operational Risk Management document. Here are some interesting lines from it:

1. An APRA-regulated entity must, to the extent practicable, prevent disruption to critical operations, adapt processes and systems to continue to operate within tolerance levels in the event of a disruption and return to normal operations promptly once a disruption is over.

2. An APRA-regulated entity must not rely on a service provider unless it can ensure that in doing so it can continue to meet its prudential obligations in full and effectively manage the associated risks.

Australian Prudential Regulation Authority (APRA) – Prudential Standard CPS 230 Operational Risk Management

I think the “rely on a service provider” clause is the most interesting text here. I wonder if – by keeping a set of data on another cloud provider – UniSuper can justify to the APRA that it isn’t relying on any single cloud provider but has instead diversified its risks.

I couldn’t find any discussion of the maximum amount of downtime allowed, so I’m not sure where the “4 week” tolerance from the HN comment came from. Most likely it reflects industry norms. But I did find some text about tolerance levels for disruptions:

38. For each critical operation, an APRA-regulated entity must establish tolerance levels for:

(a) the maximum period of time the entity would tolerate a disruption to the operation

Australian Prudential Regulation Authority (APRA) – Prudential Standard CPS 230 Operational Risk Management

It’s definitely interesting to see how requirements for enterprise cloud customers grow from their regulators and other interested parties. There’s often some justification underlying every decision (such as duplicating data across clouds) no matter how strange it seems at first.

APRA’s History With The Cloud

While digging into this subject, I found it quite interesting to trace how the APRA changed its tune about cloud computing over the years. As recently as 2010, the APRA felt the need to “emphasise the need for proper risk and governance processes for all outsourcing and offshoring arrangements.” Here’s an interesting excerpt from the 2010 letter it sent to all APRA-overseen financial companies:

Although the use of cloud computing is not yet widespread in the financial services industry, several APRA-regulated institutions are considering, or already utilising, selected cloud computing based services. Examples of such services include mail (and instant messaging), scheduling (calendar), collaboration (including workflow) applications and CRM solutions. While these applications may seem innocuous, the reality is that they may form an integral part of an institution’s core business processes, including both approval and decision-making, and can be material and critical to the ongoing operations of the institution.
APRA has noted that its regulated institutions do not always recognise the significance of cloud computing initiatives and fail to acknowledge the outsourcing and/or offshoring elements in them. As a consequence, the initiatives are not being subjected to the usual rigour of existing outsourcing and risk management frameworks, and the board and senior management are not fully informed and engaged.

https://www.apra.gov.au/sites/default/files/Letter-on-outsourcing-and-offshoring-ADI-GI-LI-FINAL.pdf

While the letter itself reads as rather innocuous, it seems to have had a bit of a chilling effect on Australian banks: this article comments that “no customers in the finance or government sector were willing to speak on the record for fear of drawing undue attention by regulators”.

An APRA document published on July 6, 2015 seems to be even more critical of the cloud. Here’s a very interesting quote from page 6:

In light of weaknesses in arrangements observed by APRA, it is not readily evident that risk management and mitigation techniques for public cloud arrangements have reached a level of maturity commensurate with usages having an extreme impact if disrupted. Extreme impacts can be financial and/or reputational, potentially threatening the ongoing ability of the APRA-regulated entity to meet its obligations.

https://www.apra.gov.au/sites/default/files/information-paper-outsourcing-involving-shared-computing-services_0.pdf

Then just three years later, the APRA seems to be much more friendly to cloud computing. A ComputerWorld article entitled “Banking regulator warms to cloud computing” published on September 24, 2018 quotes the APRA chair as acknowledging “advancements in the safety and security in using the cloud, as well as the increased appetite for doing so, especially among new and aspiring entities that want to take a cloud-first approach to data storage and management.”

It’s curious to see how organizations’ views of the cloud evolve over time. I think UniSuper and GCP’s quick restoration of the deleted cloud project will result in a much friendlier environment toward the cloud.

Tinkering With WP-CLI, Google Cloud, and BlueHost

I’ve been spending the last few days wrapping the WP-CLI application – a command line program that automates the administration of WordPress installations – inside a Java app so I can automate some WordPress work. One of the major bottlenecks was constructing the correct SSH string to connect to the various WordPress providers.

Initially I was having a bit of difficulty because I misread the wp-cli documentation and thought the --user argument was the SSH username. Once I got that fixed, it turned out that some WordPress hosting services, such as BlueHost, require you to contact their support to activate SSH. I had to connect my application to multiple WordPress hosting services, but in this post I’ll use BlueHost as an example since they’re fairly representative of the work involved.

In BlueHost’s case, having support activate SSH was a surprisingly painless process – it only took a quick 5 minute text chat where I verified my email address. To build out the proper SSH command, I also needed to look at the details provided by cPanel.


All the information you need to build the SSH string is in the General Information section of cPanel. Your full wp-cli command should look like this:

php wp-cli.phar plugin list --ssh=<CURRENT_USER>@<SHARED_IP_ADDRESS>:2222/<HOME_DIRECTORY>/public_html --debug

The BlueHost account I’m using as an example is a shared WordPress account, so it lists a shared IP. Make sure to double-check the port number (2222 in the above code sample) – the usual SSH port is 22, but BlueHost uses 2222 for shared accounts. Note that I’ve listed an additional folder after the home directory: the home directory path only takes you to the user’s home directory, but wp-cli needs the path to the WordPress installation, which usually lives in another folder (in this case /public_html/).
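
For instance, with hypothetical placeholder values filled in (user example_user, shared IP 203.0.113.10, and home directory /home1/example_user), the full command would look like this:

php wp-cli.phar plugin list --ssh=example_user@203.0.113.10:2222/home1/example_user/public_html --debug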

Facebook Outbound Email

I’m in the middle of testing an email server on Compute Engine, and I noticed something unusual: apparently Facebook’s outbound email servers insist on using extended SMTP to send email.

With extended SMTP, a sending server opens a connection and issues the EHLO command. The proper response is either 250 (indicating success and that extended SMTP support is available) or 500 (the responding server did not recognize the command, which is another way of saying that it does not support ESMTP). In case of a 500 error, the usual practice is to fall back to the original SMTP command set and send a HELO request.
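
As an illustration, here’s what the fallback might look like in a hypothetical session with a legacy (non-ESMTP) server, where C is the sending client and S is the receiving server:

C: EHLO mail.example.com
S: 500 Syntax error, command unrecognized
C: HELO mail.example.com
S: 250 mail.example.org Hello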

But Facebook’s outbound mail servers seem to only want to connect with ESMTP servers: when EHLO fails, FB mail servers send a QUIT command instead of falling back to HELO.

Another interesting oddity from watching mail logs: Google’s Gmail servers seem to be the only mail servers properly implementing the BDAT command (binary data). I never see any other mail servers attempt to use it.
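
For reference, BDAT is part of the CHUNKING extension (RFC 3030): instead of the DATA command, the client announces message chunks with explicit byte counts. A hypothetical exchange might look something like this:

C: BDAT 8000
   (the client immediately sends exactly 8000 octets of message data)
S: 250 8000 octets received
C: BDAT 512 LAST
   (the client sends the final 512 octets)
S: 250 Message OK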

Removing EXIF Data With ImageMagick

Recently I needed to find a way to mass-remove EXIF data from multiple images. EXIF (Exchangeable Image File Format) – if you didn’t know – is essentially metadata embedded within images. This metadata can include the name of the camera, the GPS coordinates where the picture was taken, comments, etc.

The easiest way to remove EXIF data is to download ImageMagick and use the mogrify command line tool (note that mogrify modifies images in place, so keep copies if you need the originals):

mogrify -strip /example/directory/*
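
To spot-check that the metadata is actually gone, you can dump an image’s properties with ImageMagick’s identify tool and grep for EXIF entries (photo.jpg here is a hypothetical file name):

identify -verbose /example/directory/photo.jpg | grep -i exif

If the strip worked, the grep should return nothing.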

ImageMagick can be run within a Compute Engine VM to make it available to a web application. Here’s the command to install ImageMagick on a Debian-based VM:

sudo aptitude install imagemagick

Communicating With Sockets On Compute Engine

Writing custom applications on Compute Engine requires the use of sockets to communicate with clients. Here’s a code example demonstrating how to read from and write to sockets in Java.

Assume that client_socket is the socket communicating with the client:

/**
 * Contains the socket we're communicating with.
 */
Socket client_socket;

Create a socket by using a ServerSocket (represented by server_socket ) to listen to and accept incoming connections:

Socket client_socket = server_socket.accept();

Then extract a PrintWriter and a BufferedReader from the socket. The PrintWriter out object sends data to the client, while the BufferedReader in object lets the application read in data sent by the client:

/**
 * Handles sending communications to the client.
 */
PrintWriter out;
/**
 * Handles receiving communications from the client.
 */
BufferedReader in;
try {
    //We can send information to the client by writing to out.
    //The second argument enables auto-flush on each println() call.
    out = new PrintWriter(client_socket.getOutputStream(), true);
    //We receive information from the client by reading from in.
    in = new BufferedReader(new InputStreamReader(client_socket.getInputStream()));
    //Start talking to the client.
    //Read in a line of data sent to us.
    String in_line = in.readLine();
    //Send back information to the client (send_info is any string).
    String send_info = "Received: " + in_line;
    out.println(send_info);
}
catch (IOException e) {
    //A general problem was encountered while handling client communications.
    e.printStackTrace();
}

A line of text sent by the client can be read by calling in.readLine(), as demonstrated by the in_line string. To send a line of text to the client, call out.println(send_info), where send_info represents a string. If an error occurs during communication, an IOException will be thrown and caught by the above catch statement.

Remember to add the following imports:

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.BufferedReader;

Releasing A Static IP In Compute Engine

Compute Engine allows static IPs to be reserved, even if no VM is attached. However, IPs that are unattached to VMs or load balancers are charged at $0.01 per hour. To avoid this charge, it’s a good idea to release IPs that are no longer in use.

To release a static IP, first go to the Compute Engine section of the Google Cloud Console and select the Networks option.

A list of the IPs allocated to the project will be displayed.

Select the IP to remove and click the link labeled Release IP address.

You’ll see a confirmation box next. Press Yes.

The console will need a few moments to release the IP.
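
If you prefer the command line, a reserved address can also be released with the gcloud tool from the Google Cloud SDK. A sketch, using a hypothetical address named my-static-ip in the us-central1 region:

gcloud compute addresses delete my-static-ip --region us-central1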

Creating A Server In Java On Compute Engine

Compute Engine is a terrific hosting platform – not only for HTTP-based applications, but for servers running all types of services. To run these services, an application has to listen for incoming connections using a server socket. Here’s how to do that in Java.

First, bind a server socket to a port. Here we’re binding it to the SMTP-reserved port 25 (note that on Linux, binding to ports below 1024 typically requires root privileges). The port binding will fail if there’s already a server bound to that port.

//The server socket that handles all incoming connections.
ServerSocket server_socket = null;
//Bind the server socket to port 25. Fail and exit 
//if we fail to bind.
try {
    server_socket = new ServerSocket(25);
    System.out.println("OK: Bound to port 25.");
} 
catch (IOException e) {
    System.err.println("ERROR: Unable to bind to port 25.");
    e.printStackTrace();
    System.exit(-1);
}

Now wait for an incoming connection to accept (you can put the following code into a while loop if you want to accept multiple connections):

//Now we need to wait for a client to connect to us.
//accept() blocks until it receives a new connection.
try {
    //accept() returns once a new client has connected.
    Socket client_socket = server_socket.accept();
    System.out.println("Accepted!");
    //Hand off the socket for further handling.
    //Do something here with the socket.
}
catch (IOException e) {
    System.err.println("Failed to accept incoming connection: " + e.getMessage());
    e.printStackTrace();
    System.exit(-1);
}

From here, handle the socket as needed.
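
As mentioned above, wrapping accept() in a while loop lets the server handle more than one connection. Here’s a minimal sketch, where handleClient is a hypothetical method that would process each connection (ideally on its own thread, so the loop can return to accept() quickly):

//Keep accepting new connections until the process is stopped.
while (true) {
    try {
        //Block until the next client connects.
        Socket client_socket = server_socket.accept();
        //handleClient is a hypothetical handler for this connection.
        handleClient(client_socket);
    }
    catch (IOException e) {
        //Log the failure, then keep listening for the next connection.
        e.printStackTrace();
    }
}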

Remember to import the following classes:

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

Creating A Static IP Address In Google Compute Engine

First, go to the Google Cloud console at https://cloud.google.com/console and click on a project (or create a new one).

Click on the Compute Engine link on the next screen.

Press the Networks link on the left-hand side navigation bar.

You’ll see the New static IP button within the Networks screen.

Name your new IP address, set a description (optional), and set the region where your IP address will be located. You can optionally also attach your new IP to a machine, if you have one running.

After you click the Create button, Compute Engine will need a few seconds to allocate a new IP address.

Once the allocation is complete, your new IP address will be listed in the Networks screen.
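
For reference, the same reservation can be made from the command line with the gcloud tool from the Google Cloud SDK. A sketch, again using a hypothetical address name and region:

gcloud compute addresses create my-static-ip --region us-central1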

Generating A JAR Application For Compute Engine

Packaging Java applications for Compute Engine is slightly different than for App Engine. The code, plus any associated libraries and resources, needs to be built into a single JAR file. Here’s how to do that in Eclipse.

First, go to File – Export.

Select Java – Runnable JAR file from the options presented.

This screen sets up the JAR options. The Launch Configuration option sets the class to run when the JAR is executed (in other words, the class containing the main(String[] args) method that starts the application). Export Destination sets the directory to store the finished JAR file in.

Press Finish when you’re done. You can now use gcutil to upload the JAR file to your Compute Engine machine and run it using the standard java command (remember to install a JRE first).
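
For example, if the exported file were named server.jar (a hypothetical name), it could be launched on the VM like this:

java -jar server.jar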