I have Stackdriver notifications set up to email me whenever an error happens with my App Engine applications. This morning, I guessed my Google Cloud SQL instance was under maintenance. Not exactly a Sherlock Holmes-level deduction, considering this display:
Here’s the details page of one of the errors:
Note that these errors occurred at 8:01 – 8:02 AM. What else happened at that time?
As you can see, maintenance finished right around that time.
When you see a burst of errors clustered at a single point in time, the root cause is typically maintenance or (rarely) a backup completing. Make your application resilient by retrying failed SQL queries.
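Here is a minimal sketch of what retrying could look like in Python. It is not tied to any particular database driver: `run_with_retry` and `TransientSQLError` are names I made up for illustration, and you would pass your own driver's connection-level exception class in `retry_on`. The attempt count and delays are assumptions too, so tune them to how long your maintenance windows actually last.

```python
import logging
import random
import time

logger = logging.getLogger("sql_retry")


class TransientSQLError(Exception):
    """Hypothetical stand-in for your driver's connection-level error."""


def run_with_retry(operation, retry_on=(TransientSQLError,), attempts=5,
                   base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation(); on a retryable error, back off and try again.

    The final failure is logged and re-raised so it can be reviewed later.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            if attempt == attempts:
                # Out of attempts: record the failure for operations staff.
                logger.error("SQL operation failed after %d attempts: %s",
                             attempts, exc)
                raise
            # Exponential backoff (0.5s, 1s, 2s, ...) capped at max_delay,
            # plus jitter so many clients do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay += random.uniform(0, delay / 2)
            logger.warning("SQL attempt %d failed (%s); retrying in %.2fs",
                           attempt, exc, delay)
            sleep(delay)
```

You would wrap each query in a small function or lambda, for example `run_with_retry(lambda: cursor.execute(query))`, where `cursor` and `query` are your own objects. Only retry operations that are safe to repeat; a write that is not idempotent may need a transaction or a uniqueness check around it.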
The Bottom Line
Cloud SQL maintenance can result in a burst of errors. Make sure your application can retry failed SQL queries, or log failed operations so they can be reviewed by your operations staff.
Also, when you see an error, check your maintenance and backup logs. It’s easy to see an error and assume your code is at fault; rule out the simple causes first before spending time digging into code and records.
As a bonus, and because I love metric graphs, here are some graphs showing the effect of the maintenance period around 8 AM: