Google SEO Update, March 2024: Up 314%

If you’re interested in search optimization, you’ll know about Google’s new search update that rolled out in March 2024. Per Google, the update is intended to weed out low-effort sites, sites stuffed with AI-generated content, affiliate review sites, and so forth. A good outline of what went on in this update is here.

In short, a lot of chaos occurred. Major publications are reporting pretty severe drops in traffic; smaller sites are reporting traffic drops of greater than 90%. Here’s a fun quote:

BBC News, for example, was among the sites that saw the biggest percentage drops, with its site losing 37% of its search visibility having fallen from 24.7 to 15.4 points in a little over six weeks. Its relative decline was second only to Canada-based entertainment site, Screenrant which saw its visibility fall by 40% from 27.6 to 16.7.

https://pressgazette.co.uk/media-audience-and-business-data/first-google-core-update-of-2024-brings-bad-news-for-most-news-publishers/

There’s a lot of doom and gloom about this update, but I’m really liking it. I’m seeing a lot of very interesting stuff float up in my Google searches that would normally be buried. In particular, I’m seeing fewer “top 10 XYZ” type webpages and more links to discussion sites such as Reddit and other forums.

And then there’s this: one of my websites is reporting 314% more clicks from Google search.

I run a small blog (not this one) which is basically a tumblelog-style fan blog for a specific consumer-goods company. It really doesn’t do much except repost funny pictures and interesting articles. The blog typically gets about 100 clicks a month from Google search – which never ceases to amaze me, especially since the site itself is so simple.

With that in mind, I was shocked to suddenly see a burst of emails over the past month congratulating me on a sudden rise in traffic:

A sample of the emails:

What on earth is going on? A quick look at my Search Console shows the truth:

I’m not making any larger point here; it’s just interesting to see how fast things can change during a core search update.

Task Queue Fun: DeadlineExceeded

I always love pointing out fun errors – where I define a “fun error” as one that happens intermittently or rarely during the application’s regular operation. Those errors are always the most fun to fix.

Today I was poking through some of my toy applications running on Google Cloud when I saw this:

And the text only:

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
._end_unary_response_blocking (/layers/google.python.pip/pip/lib/python3.7/site-packages/grpc/_channel.py:1003)
First seen: Jan 23, 2024 | Last seen: 22 hours ago

DeadlineExceeded: 504 Deadline Exceeded
.error_remapped_callable (/layers/google.python.pip/pip/lib/python3.7/site-packages/google/api_core/grpc_helpers.py:81)
First seen: Jan 23, 2024 | Last seen: 22 hours ago

Hmm – so here we have an error that first appeared almost 4 months ago (Jan 23, 2024) and last recurred 22 hours ago. Doesn’t sound very important. But just for laughs, let’s dig in.

Of the two errors, I know that the first one (_InactiveRpcError) is most likely due to a connection failing to complete. Not a serious problem – it happens all the time in the cloud: servers get rebooted, VMs get shuffled off to other machines, and so on. The DeadlineExceeded error concerns me, though, because this application connects to a bunch of different APIs and performs a ton of time-consuming operations, and I want to make sure everything is back to normal.

So here’s the view of the error page:

So I know the error occurs somewhere in communication with Google services, since it pops up in the google-api-core library. Let’s hop on over to Logging and find the stack trace – I’ve redacted a line that isn’t relevant to this post:

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/flask/app.py", line 2070, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/flask/app.py", line 1515, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/flask/app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/flask/app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  [REDACTED]
  File "/srv/main.py", line 331, in launch_task
    task_creation_results = client.create_task(parent=queue_prop, task=task)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/google/cloud/tasks_v2/services/cloud_tasks/client.py", line 2203, in create_task
    metadata=metadata,
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/google/api_core/timeout.py", line 120, in func_with_timeout
    return func(*args, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 81, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded

If you missed the culprit in the above text, let me help you out: the call to the Cloud Tasks (task queue) service on line 331 of my application exceeded Google’s deadline, and the exception was raised from Google’s end. Perhaps there was a transient infrastructure issue, perhaps the task queue was under maintenance, perhaps it was just bad luck.

File "/srv/main.py", line 331, in launch_task
    task_creation_results = client.create_task(parent=queue_prop, task=task)

There’s really nothing to be done here, other than maybe catching the exception and trying again. I will point out that the task queue service is surprisingly resilient: out of the tens or hundreds of thousands of task queue calls this application has made over the past 5 months, only 2 tasks (one in January 2024, one yesterday) have failed to enqueue. More importantly, my code is functioning as intended, and I can mark this issue as Resolved, or at least Muted.

Now honestly, this is a “my bad” sort of situation: if this were a real production app, something should be catching the exception. But since this is a toy application, I’m absolutely fine tolerating these random and thankfully very rare task queue failures.
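For reference, here’s a minimal sketch of the catch-and-retry I’d add in a production app. The queue path, payload, and retry settings below are illustrative assumptions, not this app’s actual values:

import logging

from google.api_core import exceptions, retry
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
queue_prop = client.queue_path('my-project', 'us-central1', 'my-queue')  # hypothetical queue
task = {'app_engine_http_request': {'http_method': 'POST', 'relative_uri': '/worker'}}  # hypothetical payload

try:
    # Ask the client library to retry transient DeadlineExceeded errors
    # with exponential backoff before giving up.
    task_creation_results = client.create_task(
        parent=queue_prop,
        task=task,
        retry=retry.Retry(
            predicate=retry.if_exception_type(exceptions.DeadlineExceeded),
            initial=1.0,
            maximum=10.0,
            deadline=30.0,
        ),
    )
except exceptions.DeadlineExceeded:
    # Still failing after the retries – log the dropped task for review
    # instead of letting the request crash.
    logging.exception('Cloud Tasks enqueue failed; task dropped')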

The database default does not exist for project…

I opened up a new Google Cloud project to act as a staging environment, forgot to set up a Firestore database for it, and got the following error:

google.api_core.exceptions.NotFound: 404 The database default does not exist for project magicalnews Please visit https://console.cloud.google.com/datastore/setup?project=[projectname] to add a Cloud Datastore or Cloud Firestore database

Google Cloud Logging

Obviously, to fix this you need to create a Cloud Firestore or Cloud Datastore database, but I will say I love the detailed error message and the direct link to fix the problem. I’m noticing a lot of small developer-experience fixes and filing down of “sharp edges” – I hope they continue, because the developer experience on Google Cloud just keeps getting better.
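For what it’s worth, the same error surfaces in Python as google.api_core.exceptions.NotFound, so you can catch it at startup and fail with a clear message. A minimal sketch, assuming the google-cloud-firestore client and a hypothetical project ID:

from google.api_core import exceptions
from google.cloud import firestore

try:
    db = firestore.Client(project='my-staging-project')  # hypothetical project ID
    db.collection('smoke_test').limit(1).get()  # any cheap read will trigger the check
except exceptions.NotFound as err:
    # Raised when the project has no Firestore/Datastore database yet;
    # the message includes a direct setup link.
    print('Firestore database missing:', err)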

Python: Changing Timezones In A DateTime

Here are some quick code samples for shifting a UTC datetime object (created_at is a datetime.utcnow()) to a different timezone. In this first example, we use timedelta to add/subtract hours to find the current time at UTC-6:00. Note that this fixed-offset approach knows nothing about daylight saving time:

from datetime import datetime, timedelta

local_time = created_at + timedelta(hours=-6)
final_time = datetime.strftime(local_time, '%Y-%m-%d %I:%M:%S %p')

In this sample, a datetime (created_at) is declared to be a UTC time, then converted into US Central time and formatted for human presentation:

import pytz  # third-party dependency

local_datetime = pytz.timezone('UTC').localize(created_at).astimezone(pytz.timezone('US/Central'))
local_datetime_str = "Created (User Local Time): " + str(datetime.strftime(local_datetime, '%Y-%m-%d %I:%M:%S %p'))
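As an aside: on Python 3.9+, the standard library’s zoneinfo module handles the same conversion without pytz. A minimal sketch:

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

created_at = datetime.utcnow()  # naive UTC datetime, as in the samples above
local_datetime = created_at.replace(tzinfo=timezone.utc).astimezone(ZoneInfo('America/Chicago'))
local_datetime_str = datetime.strftime(local_datetime, '%Y-%m-%d %I:%M:%S %p')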

How I Can Tell When Cloud SQL Is Under Maintenance

I have StackDriver notifications set up to email me whenever an error happens in my App Engine applications. This morning, I guessed that my Google Cloud SQL instance had been under maintenance. Not exactly a Sherlock Holmes-level deduction, considering this display:

All of these SQL errors happened within a one-minute window, 10 hours ago.

Error Detail

Here’s the details page of one of the errors:

Detail Page, Screen 1
Detail Page, Screen 2

Note that these errors occurred at 8:01 – 8:02 AM. What else happened at that time?

Maintenance Logs

And as you can see, right around that time maintenance finished.

When you see a burst of errors at a single point in time, the root cause is typically maintenance or (more rarely) a backup completing. Make sure your application is error-resistant by retrying failed SQL queries.

The Bottom Line

Cloud SQL maintenance can result in a burst of errors. Make sure your application can retry failed SQL queries, or at least log failed operations so your operations staff can review them.

Also, when you see an error, check your maintenance and backup logs first. It’s an easy mistake to see an error and assume your code is at fault – knock out the simple causes before spending time digging into code and records.
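Here’s a minimal sketch of the kind of retry wrapper I mean. The helper name, backoff numbers, and broad except clause are illustrative – in real code, narrow the except to your database driver’s transient-error type:

import logging
import time

def run_with_retry(run_query, attempts=3, delay=2.0):
    # run_query is any zero-argument callable that executes one SQL query;
    # retrying it a few times rides out a maintenance blip.
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception:
            logging.warning('SQL attempt %d/%d failed', attempt, attempts)
            if attempt == attempts:
                raise  # out of retries; let the caller alert/log
            time.sleep(delay * attempt)  # simple linear backoff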

Fun Graphs

As a bonus, and because I love metric graphs, here are some graphs showing the effect of the maintenance period around 8 AM:

Initial Thoughts On Comparing Tweets

Note: some of this code is old; it could be rewritten to take better advantage of SQL’s window functions and other modern features.

I’ve been poking around some old code of mine recently – since it’s going unused, maybe someone on the Internet will benefit from it. A few years ago, I was toying with some code to detect trending tweets on Twitter. Consider a SQL table set up like the one below, where user is the name of the Twitter user posting the tweet, id is the tweet ID, created_at is the tweet’s creation date, and retweet_count/favorite_count are the numbers of retweets and favorites the tweet had.

CREATE TABLE twitter (
    category VARCHAR(30),
    user VARCHAR(30),
    id VARCHAR(90), 
    created_at DATETIME,
    retweet_count INT,
    favorite_count INT,
    add_date DATETIME
);

The trend finder worked like this: it took snapshots of each tweet’s retweet and favorite counts (add_date is the datetime of the snapshot), then compared the snapshots to measure how fast the tweet was gaining retweets/favorites. The comparison was done in SQL. I named this gain rate the slope of the activity, and calculated multiple slopes: the increase in retweets/favorites within the most recent 90 minutes, and the overall increase since the first snapshot. Here’s the simplified SQL (the rest has been cut because this is already too long, and it isn’t relevant to the point I’m making):


insert into tweet_slope 
(

select twitter.created_at, twitter.category, twitter.id, twitter.retweet_count as max_retweet, 
 twitter.favorite_count as max_favorite, twitter.add_date, new_table.max_date, 
 first_table.first_date, first_table.first_retweet, first_table.first_favorite, 
 prior_table.prior_date, prior_table.prior_retweet, prior_table.prior_favorite,
(twitter.retweet_count - first_table.first_retweet) as full_retweet_diff,
TIMESTAMPDIFF(MINUTE, first_table.first_date, new_table.max_date) as full_retweet_timespan,
((twitter.retweet_count - first_table.first_retweet) / 
TIMESTAMPDIFF(MINUTE, first_table.first_date, new_table.max_date)) as full_retweet_slope,
(twitter.retweet_count - prior_table.prior_retweet) as recent_retweet_diff,
TIMESTAMPDIFF(MINUTE, prior_table.prior_date, new_table.max_date) as recent_retweet_timespan,
((twitter.retweet_count - prior_table.prior_retweet) / 
TIMESTAMPDIFF(MINUTE, prior_table.prior_date, new_table.max_date)) as recent_retweet_slope,
(twitter.favorite_count - first_table.first_favorite) as full_favorite_diff,
TIMESTAMPDIFF(MINUTE, first_table.first_date, new_table.max_date) as full_favorite_timespan,
((twitter.favorite_count - first_table.first_favorite) / 
TIMESTAMPDIFF(MINUTE, first_table.first_date, new_table.max_date)) as full_favorite_slope,
(twitter.favorite_count - prior_table.prior_favorite) as recent_favorite_diff,
TIMESTAMPDIFF(MINUTE, prior_table.prior_date, new_table.max_date) as recent_favorite_timespan,
((twitter.favorite_count - prior_table.prior_favorite) / 
TIMESTAMPDIFF(MINUTE, prior_table.prior_date, new_table.max_date)) as recent_favorite_slope
 from (select id, max(add_date) as max_date from twitter group by id) as new_table
 inner join twitter on twitter.id=new_table.id and twitter.add_date=new_table.max_date
 INNER JOIN 
(select twitter.id, twitter.add_date as first_date, twitter.retweet_count as 
 first_retweet, twitter.favorite_count as first_favorite from twitter inner 
 join (select id, min(add_date) as min_date from twitter group by id) as old_dates 
 on twitter.id=old_dates.id and twitter.add_date=old_dates.min_date) as first_table
 
 on twitter.id=first_table.id
 INNER JOIN 
 (select twitter.id, twitter.add_date as prior_date, twitter.retweet_count as 
 prior_retweet, twitter.favorite_count as prior_favorite from twitter INNER JOIN 
 (select id, max(add_date) as prior_set_date from twitter where add_date <  
 DATE_SUB(NOW(), INTERVAL 90 MINUTE) group by id) as prior_set 
 ON twitter.add_date=prior_set.prior_set_date and twitter.id=prior_set.id) as  prior_table 
 ON twitter.id=prior_table.id
 where twitter.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY) AND 
 new_table.max_date > DATE_SUB(NOW(), INTERVAL 45 MINUTE) AND 
 new_table.max_date <> first_table.first_date
and twitter.category = :category
)
;

Now this code worked, and it worked great! A fast DB machine made all the difference in crunching through the SQL calculations. I tested it by sampling tweets from sets of related accounts. One set was all Disney/Marvel/Star Wars-related Twitter accounts (for instance, the accounts of the actors for Luke Skywalker and Iron Man, plus Disney Parks), and the top-ranking tweets were almost all from Mark Hamill’s account; on the day I was testing, Mark’s tweets had thousands of favorites/retweets because they were politics-related.

I said this code was old at the beginning of the post, and that’s true – it has since been replaced. While this code worked, it was too simple: in practice, it would never surface some trending tweets from less-popular Twitter accounts.

The real problem is that tweet popularity follows a bimodal distribution: most tweets putter along with, at best, a handful of retweets/favorites; some might even get a few hundred. But then there’s a small minority of tweets, posted by extremely popular accounts, that get thousands upon thousands of favorites and retweets regardless of the individual tweet’s merit – the tweets are popular because the poster is popular. This crowds out the trending tweets from the other mode of the distribution.

I rebuilt the application to take into account the number of followers a Twitter user has (as a proxy for the account’s relative popularity) and to compare each tweet against the account’s own history (how popular is this tweet compared to the account’s other tweets?). I’ll share the code soon.
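In the meantime, here’s a tiny sketch of the per-account comparison idea – an illustration, not the rebuilt code: score each tweet’s recent slope against the account’s own historical slopes, so a tweet only ranks as trending if it’s hot by that account’s standards.

from statistics import mean, stdev

def trend_score(tweet_slope, account_slopes):
    # account_slopes: recent slope values from this account's past tweets.
    # A high z-score means "unusually hot for this particular account",
    # which lets small accounts surface alongside mega-accounts.
    if len(account_slopes) < 2:
        return 0.0
    sigma = stdev(account_slopes)
    return (tweet_slope - mean(account_slopes)) / sigma if sigma else 0.0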

For New Engineers, A Reminder…

I find that many new software engineers are nervous when they start a new job – there’s always so much to learn, and every software company does things a little (or radically) differently from the “best” practices taught at university. It’s good to take a step back and remember that everybody messes up once in a while.

I was quite amused today when, browsing HN, I saw this article: I shipped a word processor that formatted the hard drive every 1024 saves. It links to a Twitter thread reproduced below: https://twitter.com/GeePawHill/status/1256342997643526151

A Twitter thread in which a programmer discusses a bug wherein a word processor overwrote the hard drive’s master boot record with garbage, making the computer unable to boot.

Everyone makes mistakes – the important thing is to be able to learn from them. If you’re a new intern or software engineer, my advice is to ask lots of questions: in my experience, people don’t mind answering questions as long as the questions show that you’re growing and learning.

NewsBlur: Iterating Through A Folder’s RSS Feed

After Google Reader was shut down, I moved to NewsBlur to follow my RSS feeds. The great thing about NewsBlur is that you can add RSS feeds to a folder and NewsBlur will merge all the stories under that folder into a single RSS feed.

Under NewsBlur, you’ll want to pull the folder RSS feed from the settings option:

NewsBlur settings option - the folder RSS URL is at the bottom.

The following Python code pulls the feed and iterates through it to find article information. Toward the bottom of this code example, each child represents a possible article, and each sub_child represents a property of that article: the URL, the title, and so on. I use a variant of this code to help identify important news stories.

import requests
import xml.etree.ElementTree as ET

NEWSBLUR_FOLDER_RSS = 'NEWSBLUR_FOLDER_RSS'  # your folder RSS URL from the NewsBlur settings

#tears through the newsblur folder xml searching for <entry> items
def parse_newsblur_xml():
    r = requests.get(NEWSBLUR_FOLDER_RSS)
    if r.status_code != 200:
        print("ERROR: Unable to retrieve address")
        return None
    xml_root = ET.fromstring(r.text)
    articles = []
    #we search for <entry> tags because each entry tag stores a single article from a RSS feed
    for child in xml_root:
        if not child.tag.endswith("entry"):
            continue
        #if we are down here, the tag is an entry tag and we need to parse out info
        #grind through the children of the <entry> tag
        article = {'categories': []}
        for sub_child in child:
            if sub_child.tag.endswith("category"):   #article categories
                article['categories'].append(sub_child.get('term'))
            elif sub_child.tag.endswith("title"):    #article title
                article['title'] = sub_child.text
            elif sub_child.tag.endswith("summary"):  #article summary
                article['summary'] = sub_child.text
            elif sub_child.tag.endswith("link"):     #article URL
                article['url'] = sub_child.get('href')
        articles.append(article)
    return articles
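A quick usage sketch (the printed fields are whatever the feed provides):

articles = parse_newsblur_xml()
for article in articles or []:
    print(article.get('title'), '->', article.get('url'))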

Cloud Build Error – User does not have permission to access app (or it may not exist): The caller does not have permission

Whenever I provision a new Google Cloud project, I get bitten by this error: I keep forgetting to set up the IAM rules that allow Cloud Build to access App Engine.

Screenshot of failed Cloud Build run. Cloud Build does not have permission to access my App Engine instance.
Operation completed over 1 objects/8.6 KiB.
BUILD
Already have image (with digest): gcr.io/cloud-builders/gcloud
ERROR: (gcloud.app.deploy) User [[email protected]] does not have permission to access app [APP_ID_REDACTED] (or it may not exist): The caller does not have permission
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/gcloud" failed: exit status 1

To fix this, go into Settings under Cloud Build and enable access to App Engine (and any other cloud service you use in conjunction with Cloud Build). Then wait a moment for the settings to take effect and rerun the build.

Setting up Cloud Build to connect to App Engine.

Firestore Errors

Most of my apps use Google’s Datastore, but I decided to try out the newer Firestore on a test application. I was receiving quite a few of the errors below:

io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference cleanQueue: *~*~*~ Channel ManagedChannelImpl{logId=346, target=firestore.googleapis.com:443} was not shutdown properly!!! ~*~*~* (ManagedChannelOrphanWrapper.java:151)
    Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true.
java.lang.RuntimeException: ManagedChannel allocation site
	at io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:94)
	at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:52)
	at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:43)
	at io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:514)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:223)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:164)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:156)
	at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:157)
	at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:122)
	at com.google.cloud.firestore.spi.v1.GrpcFirestoreRpc.<init>(GrpcFirestoreRpc.java:122)
	at com.google.cloud.firestore.FirestoreOptions$DefaultFirestoreRpcFactory.create(FirestoreOptions.java:80)
	at com.google.cloud.firestore.FirestoreOptions$DefaultFirestoreRpcFactory.create(FirestoreOptions.java:72)
	at com.google.cloud.ServiceOptions.getRpc(ServiceOptions.java:510)
	at com.google.cloud.firestore.FirestoreOptions.getFirestoreRpc(FirestoreOptions.java:315)
	at com.google.cloud.firestore.FirestoreImpl.<init>(FirestoreImpl.java:77)
	at com.google.cloud.firestore.FirestoreOptions$DefaultFirestoreFactory.create(FirestoreOptions.java:63)
	at com.google.cloud.firestore.FirestoreOptions$DefaultFirestoreFactory.create(FirestoreOptions.java:56)
	at com.google.cloud.ServiceOptions.getService(ServiceOptions.java:498)
Screenshot of Firestore exception - failure to shut down in code.

These errors stopped when I called close() on the com.google.cloud.firestore.Firestore object after I was done with storage operations:

Javadoc for close() on com.google.cloud.firestore.Firestore.

I can’t help but feel a little disappointed by this new requirement to close the Firestore connection. It feels like a regression from Datastore – there was no need to close the Datastore object after use.