How I Can Tell When Cloud SQL Is Under Maintenance

I have StackDriver notifications set up to email me whenever an error happens with my App Engine applications. This morning, I guessed my Google Cloud SQL instance was under maintenance. Not exactly a Sherlock Holmes -level deduction considering this display:

All of these SQL errors happened within a minute range, 10 hours ago.

Error Detail

Here’s the details page of one of the errors:

Detail Page, Screen 1
Detail Page, Screen 2

Note that these errors occurred at 8:01 – 8:02 AM. What else happened at that time?

Maintenance Logs

And as you can see, right around that time maintenance finished.

When you see a burst of errors at a single time, typically the root cause is maintenance or (rarely) backups being completed. Make sure your application is error-resistant by retrying failed SQL queries.

The Bottom Line

Cloud SQL maintenance can result in a burst of errors. Make sure your application can retry failed SQL queries, or log failed operations so they can be reviewed by your operations staff.

Also when you see an error, make sure to check your maintenance and backup logs. It’s an easy mistake to see an error and assume your code is at fault – knock out the simple error causes first before spending time digging into code and records.

Fun Graphs

As a bonus, and because I love metric graphs, here are some graphs showing the effect of the maintenance period around 8 AM: