A Reply To: “Google Defaults To Not Indexing” Or: Google As Miss Manners

I saw this blog post on Hacker News, and it was notable enough that I’ve been thinking about it for the past week. I disagree with its major points for technical reasons, but I do agree that you should do your SEO as though it were true.

But first, I want to make a distinction here. When Google hits a website and looks at its content for possible inclusion in its search index, we call that “spidering”. That’s not a word plucked out of nowhere – we call web crawlers that search for content “spiders”, and there’s a long technical history behind the term.

In my experience, Google spiders basically everything – even places you might wish Google didn’t find, such as admin pages. And frankly, this makes sense: spidering your website doesn’t just give Google information about your site, it also gives Google information about how to rank other web pages. For example, Google learns which sites you link out to, which feeds into the PageRank calculations that determine how those other pages rank. As a second example, by spidering all the web pages it can, Google can find scraped/duplicate content and possibly flag the offending domain (not necessarily your domain!) for SEO penalties.
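To make the PageRank point concrete, here is a toy sketch of the classic power-iteration algorithm. The four-page link graph and the damping factor are made up for illustration, and this is the textbook formulation, not anything resembling Google’s actual ranking pipeline:

import java.util.Arrays;

public class PageRankSketch {
    public static void main(String[] args) {
        // Hypothetical four-page web: links[i] lists the pages that page i links out to.
        int[][] links = { {1, 2}, {2}, {0}, {0, 2} };
        int n = links.length;
        double damping = 0.85;

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        // Power iteration: every page splits its current rank evenly among its outlinks.
        for (int iter = 0; iter < 50; iter++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n);
            for (int page = 0; page < n; page++) {
                double share = damping * rank[page] / links[page].length;
                for (int target : links[page]) {
                    next[target] += share;
                }
            }
            rank = next;
        }

        // Pages with more (and better-ranked) inbound links end up with higher scores.
        System.out.println(Arrays.toString(rank));
    }
}

The takeaway is that your outbound links are part of the input: spidering your site tells Google something about every page you link to, not just about you.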

So if there is an incentive to spider everything, you can see where I disagree with the blog post:

Credit: https://www.vincentschmalbach.com/google-now-defaults-to-not-indexing-your-content/

I think it’s very unreasonable to say “Google is no longer trying to index the entire web.” There are huge incentives for Google to spider and at least know about the entire web, even if it doesn’t actually show every page it knows about in its search results.

First off, most people don’t go past the first page of search results anymore. For a majority of searches, Google’s AI summary and the first few results (ads or not) surface the answer; 60% of searches don’t even result in a click to an outside web page. So even if Google knows about additional websites that might match the query, is it worth the computing power to resolve the rankings much below the 20th result slot, let alone further down?

There’s a human analog here: people do not want to hear additional details. They want you to get to the point as fast as possible. Here’s a Miss Manners article on “Is there any polite way to encourage someone who is recounting an anecdote to you to come to the point a little faster?” I find it reasonable to assume Google search is simply getting to the point: it doesn’t show sites whose relevant information is already available on higher-ranked competing pages.

So in short, I disagree with this blog article on a technical basis. I don’t think it’s so easy to say that because a web page isn’t showing up in a Google search, Google must not have seen it, must not care about it, or must not have it in its index.

On the other hand, I think the blog’s deeper point is true. We’ve reached the point in the Internet where there are lots of good competing information sources. If you want to launch a competitor, you need to have a value proposition and a niche: a place where you can get started. For example, suppose your town already has a Pizza Hut, a Papa Johns, and (insert your favorite pizza place here). Your townspeople are generally happy with the pizza available, and there’s no obvious need for another pizza place. If you want to launch a new pizza restaurant, you can’t just say, “We sell pizza.” You have to have a value proposition different from Pizza Hut/Papa Johns/etc.: maybe your pizza is meatier, cheesier, or has a better crust than the competitors’.

The same goes for content: if you want a spot in Google’s search rankings, a new website needs a value proposition different from what competitors are already offering. Especially if you’re a smaller blog, you need to develop a following as an expert in some niche in order to compete with bigger, better-funded rivals.

GoDaddy Email Forwarding Shuts Off – Migrating Catchall Email Addresses

I have a domain that I used for email almost 20 years ago. I don’t use it for (important!) email anymore, nor really for any other purpose, but I do check the email account attached to the domain every week or so in case something important comes through. Most of the time it’s nothing more than a few hundred messages from various listservs I’ve been on for a long time.

So you can imagine my surprise when I logged onto the email account and saw zero new messages – very unusual, since those listservs have a lot of daily traffic. After some Googling, I found that GoDaddy’s catch-all email forwarders apparently no longer work. Here’s an example post from Reddit on the situation: https://www.reddit.com/r/godaddy/comments/1d94771/email_catchall_help/ .

What really annoys me is that the email forwarding is silently broken – there’s no rejection email or anything. I tried to send some emails to my email-forwarded domain and none of them went through, nor were they rejected. Again, here’s a Reddit post documenting this: https://www.reddit.com/r/DontGoDaddy/comments/1d1pvim/comment/l7d2ua5/ .

I tried searching my email archives to see if there was any warning that email catch-alls would be turned off – I didn’t see anything, and this Reddit post confirms that nobody else received a warning either: https://www.reddit.com/r/ProtonMail/comments/1d2bup6/comment/l6k9amp/ .

I’m pretty disappointed in how GoDaddy shut down email forwarding. Email forwarding has come basically free with domain registration for a long time at any decent registrar.

Anyway, to fix this, I’ve been adding a bunch of domains to my Google Apps account as alias domains, then using a catch-all rule in Gmail Routing to deliver everything to my main account, as in the picture below.

Once again, Google to the rescue, but I am seriously annoyed at having to work around GoDaddy issues. The fact that they gave zero warning of this change is concerning to say the least.

Google Search For . (Period)

For today, I wanted to record a quick observation I had while Googling. It’s also a reminder that choosing the correct search terms can drastically change what Google returns to you.

If I Google for the period symbol (.), I get back results for the phrase “full stop punctuation.” I know this because the words “full stop punctuation” are bolded on the results page Google returns. Here’s a screenshot in case that changes:

Note that the links aren’t terribly interesting – I don’t see any links to punctuation or style guides, just pages with the words “full stop punctuation.”

Now interestingly, if I search for the words “period punctuation”, I get back a small context box explaining to me what a period is used for in writing, as well as a list of punctuation and writing guides:

The results for a Google search for “period punctuation.”

As you can see, a minor change in search terms dramatically changes what you get, even if both terms mean largely the same thing.

UniSuper and Google Cloud Platform

I know a lot of enterprise cloud customers have been watching the recent incident with Google Cloud (GCP) and UniSuper. For those of you who haven’t seen it: UniSuper is an Australian pension fund firm which had its services hosted on Google Cloud. For some weird reason, their private cloud project was completely deleted. Google’s postmortem of the incident is here: https://cloud.google.com/blog/products/infrastructure/details-of-google-cloud-gcve-incident . Fascinating reading – in particular, what surprises me is that GCP takes full blame for the incident. There must be some very interesting calls occurring between Google and its other enterprise customers.

There are some fascinating morsels in Google’s postmortem of the incident. Consider this passage:

Data backups that were stored in Google Cloud Storage in the same region were not impacted by the deletion, and, along with third party backup software, were instrumental in aiding the rapid restoration.

https://cloud.google.com/blog/products/infrastructure/details-of-google-cloud-gcve-incident

Fortunately for UniSuper, the data in Google Cloud Storage didn’t seem to be affected, and they were able to restore from there. But it looks like UniSuper also had another set of data stored with another cloud provider. The following is from UniSuper’s explanation of the event at: https://www.unisuper.com.au/contact-us/outage-update .

UniSuper had backups in place with an additional service provider. These backups have minimised data loss, and significantly improved the ability of UniSuper and Google Cloud to complete the restoration.

https://www.unisuper.com.au/contact-us/outage-update

Having a full set of backups with another service provider has to be terrifically expensive. I’d be curious to know who the additional service provider is and what the arrangement costs. I also wonder if the backup cloud is live-synced with the GCP servers or if there’s a daily/weekly sync of the data to help reduce costs.

The GCP statement seems to say that the restoration was completed with just the data from Google Cloud Storage, while the UniSuper statement is a bit more ambiguous – you could read the statement as either (1) the offsite data was used to complete the restoration or (2) the offsite data was useful but not vital to the restoration effort.

Interestingly, an HN comment indicates that the Australian financial regulator requires this multi-cloud strategy: https://www.infoq.com/news/2024/05/google-cloud-unisuper-outage/ .

I did a quick dive to figure out where these requirements are coming from, and from the best I can tell, they come from APRA’s Prudential Standard CPS 230 – Operational Risk Management document. Here are some interesting lines from it:

  1. An APRA-regulated entity must, to the extent practicable, prevent disruption to critical operations, adapt processes and systems to continue to operate within tolerance levels in the event of a disruption and return to normal operations promptly once a disruption is over.
  2. An APRA-regulated entity must not rely on a service provider unless it can ensure that in doing so it can continue to meet its prudential obligations in full and effectively manage the associated risks.

Australian Prudential Regulation Authority (APRA) – Prudential Standard CPS 230 Operational Risk Management

I think the “rely on a service provider” language is the most interesting text here. I wonder if – by keeping a set of data on another cloud provider – UniSuper can justify to the APRA that it’s not relying on any single cloud provider but instead has diversified its risks.

I couldn’t find any discussion of the maximum amount of downtime allowed, so I’m not sure where the “4 week” tolerance in the HN comment came from. Most likely it comes from industry norms. But I did find some text about tolerance levels for disruptions:

  38. For each critical operation, an APRA-regulated entity must establish tolerance levels for:
    (a) the maximum period of time the entity would tolerate a disruption to the operation

Australian Prudential Regulation Authority (APRA) – Prudential Standard CPS 230 Operational Risk Management

It’s definitely interesting to see how requirements for enterprise cloud customers grow from their regulators and other interested parties. There’s often some justification underlying every decision (such as duplicating data across clouds) no matter how strange it seems at first.

APRA History On The Cloud

While digging into this subject, I found it quite interesting to trace how the APRA changed its tune about cloud computing over the years. As recently as 2010, the APRA felt the need to “emphasise the need for proper risk and governance processes for all outsourcing and offshoring arrangements.” Here’s an interesting excerpt from the 2010 letter they sent to all APRA-overseen financial companies:

Although the use of cloud computing is not yet widespread in the financial services industry, several APRA-regulated institutions are considering, or already utilising, selected cloud computing based services. Examples of such services include mail (and instant messaging), scheduling (calendar), collaboration (including workflow) applications and CRM solutions. While these applications may seem innocuous, the reality is that they may form an integral part of an institution’s core business processes, including both approval and decision-making, and can be material and critical to the ongoing operations of the institution.
APRA has noted that its regulated institutions do not always recognise the significance of cloud computing initiatives and fail to acknowledge the outsourcing and/or offshoring elements in them. As a consequence, the initiatives are not being subjected to the usual rigour of existing outsourcing and risk management frameworks, and the board and senior management are not fully informed and engaged.

https://www.apra.gov.au/sites/default/files/Letter-on-outsourcing-and-offshoring-ADI-GI-LI-FINAL.pdf

While the letter itself reads as rather innocuous, it appears to have had a bit of a chilling effect on Australian banks: this article comments that “no customers in the finance or government sector were willing to speak on the record for fear of drawing undue attention by regulators”.

An APRA document published on July 6, 2015 seems to be even more critical of the cloud. Here’s a very interesting quote from page 6:

In light of weaknesses in arrangements observed by APRA, it is not readily evident that risk management and mitigation techniques for public cloud arrangements have reached a level of maturity commensurate with usages having an extreme impact if disrupted. Extreme impacts can be financial and/or reputational, potentially threatening the ongoing ability of the APRA-regulated entity to meet its obligations.

https://www.apra.gov.au/sites/default/files/information-paper-outsourcing-involving-shared-computing-services_0.pdf

Then just three years later, the APRA seems to be much more friendly to cloud computing. A ComputerWorld article entitled “Banking regulator warms to cloud computing”, published on September 24, 2018, quotes the APRA chair as acknowledging “advancements in the safety and security in using the cloud, as well as the increased appetite for doing so, especially among new and aspiring entities that want to take a cloud-first approach to data storage and management.”

It’s curious to see the evolution of how organizations consider the cloud. I think UniSuper/GCP’s quick restoration of their cloud projects will result in a much more friendly environment toward the cloud.

How To Waste AdWords Budget: Postie Plugin Edition

Some time ago I was looking for ways to send in posts to my WordPress blog via email, and I found a reference to a WordPress plugin called “Postie.” So I popped that into Google search and what did I get?

The correct answer to this search would be the Postie WordPress plugin hosted here. But apparently there is another company named Postie which manages enterprise mail (hosted at postie.com) and which is a completely separate entity from the WordPress plugin (hosted at postieplugin.com). As you can see from the screenshot, my search resulted in an ad for the enterprise company.

But I have no interest in enterprise mail. That ad impression is effectively wasted. Worse yet, the CTR (clickthrough rate: the number of times the ad is clicked divided by the number of times it is shown) goes down through no fault of the ad itself – for example, an ad shown 1,000 times and clicked 20 times has a CTR of 2%, and irrelevant impressions like mine inflate the denominator without adding clicks. But you can see why the ad was shown: the advertiser bid on the keyword “postie” and didn’t realize there might be other organizations with the same name.

This is a good example of where negative keywords should be used. In short, negative keywords tell the ad platform which searches your ads should NOT be shown for. In this case, Postie (the enterprise company) should have added the word “plugin” as a negative keyword so its ads aren’t shown to people looking for Postie (the WordPress plugin).
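To illustrate the idea, here is a rough sketch of the matching logic a negative keyword implies. This is a conceptual illustration only – it is not the Google Ads API, and the keyword lists are invented for this example:

import java.util.List;
import java.util.Locale;

public class NegativeKeywordDemo {
    // Hypothetical campaign: bid on "postie" but exclude plugin-related searches.
    static final String TARGET_KEYWORD = "postie";
    static final List<String> NEGATIVE_KEYWORDS = List.of("plugin", "wordpress");

    // Show the ad only when the search mentions the target keyword
    // and none of the negative keywords.
    static boolean shouldShowAd(String search) {
        String query = search.toLowerCase(Locale.ROOT);
        if (!query.contains(TARGET_KEYWORD)) {
            return false;
        }
        return NEGATIVE_KEYWORDS.stream().noneMatch(query::contains);
    }

    public static void main(String[] args) {
        System.out.println(shouldShowAd("postie enterprise mail pricing")); // true
        System.out.println(shouldShowAd("postie wordpress plugin email"));  // false
    }
}

In practice you would add the negative keyword at the campaign or ad group level in the Google Ads interface; the point is simply that one extra exclusion word stops the wasted impressions.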

Google SEO Update In March 2024: Up 314%

If you’re interested in search optimization, you’ll know about Google’s new search update that rolled out in March 2024. Per Google, the update is intended to weed out low-effort sites, sites with a ton of AI content, affiliate review sites, and so forth. A good outline of what went on in this update is here.

In short, a lot of chaos occurred. Major publications are reporting pretty severe drops in traffic; smaller sites are reporting traffic drops of greater than 90%. Here’s a fun quote:

BBC News, for example, was among the sites that saw the biggest percentage drops, with its site losing 37% of its search visibility having fallen from 24.7 to 15.4 points in a little over six weeks. Its relative decline was second only to Canada-based entertainment site, Screenrant which saw its visibility fall by 40% from 27.6 to 16.7.

https://pressgazette.co.uk/media-audience-and-business-data/first-google-core-update-of-2024-brings-bad-news-for-most-news-publishers/

There’s a lot of doom and gloom about this update, but I’m really liking it. I’m seeing a lot of very interesting stuff float up on my Google searches that normally would be buried. In particular I’m seeing fewer “top 10 XYZ” type webpages and more links to opinion websites such as Reddit and other forums.

And then there’s this: one of my websites is reporting 314% more clicks from Google search.

I run a small blog (not this one) which is basically a tumblelog-style fan blog for a specific consumer-goods company. It really doesn’t do much except repost funny pictures and interesting articles. The blog typically gets about 100 clicks a month from Google search – which never ceases to amaze me, especially since the site itself is so simple.

With that in mind, I was shocked to see a burst of emails over the past month congratulating me on a sudden rise in traffic:

A sample of the emails:

What on earth is going on? A quick look at my Search Console shows the truth:

I’m not making any larger point here; it’s just interesting to see how fast things can change during a core search update.

How To Internet Market: YouTube, Santa, and Canadian Airspace

Merry Christmas and happy holidays to all!

There are a lot of ways to associate your product with a holiday, and if you can do that successfully, the holiday can drive huge amounts of sales. Examples include Elf on the Shelf, eating KFC on Christmas (a widespread tradition in Japan), and the Disney Christmas parade.

But my favorite example of Internet marketing over Christmas is NORAD Tracks Santa, located at https://www.noradsanta.org/. NORAD stands for North American Aerospace Defense Command – a joint command of the American and Canadian militaries that protects the skies over both countries. Every year, the website above tracks Santa as he goes around the world delivering presents.

Now you may say: wait a minute, NORAD isn’t selling a product or service, this isn’t an example of marketing. Marketing is far more than just selling a product or service; it also includes burnishing a brand, or building greater awareness of an organization. In this case, I’m using marketing in the context of how NORAD uses NORAD Tracks Santa to build greater public awareness of its mission, and to burnish its reputation. That last part – burnishing reputation – can be helpful for government agencies, especially when asking for funding from Congress.

The NORAD Tracks Santa website is really neat – if you look at it Christmas Eve night, you see an animation of Santa flying over a world map (the world map is provided by Microsoft Bing). Here’s an example screenshot:

A screenshot of the NORAD Tracks Santa page on Christmas Eve.

The reason I love NORAD Tracks Santa as a great example of Internet marketing is how it seamlessly blends marketing, education, and the holidays in one package. For instance, look at this video from the NORAD Tracks Santa page:

A screenshot from one of the Santa-tracking videos on NORAD Tracks Santa. The video embedded on the page is hosted by YouTube. Click on the picture to go to the full video.

The YouTube video embedded on the page is this one: https://www.youtube.com/watch?v=pR-_novdArc – go ahead and watch it. Pay close attention to what it says and, more importantly, what it does not say.

Here’s a transcript of the video’s narrator if you can’t watch the video:

NORAD is receiving reports that Santa’s sleigh is moving north toward Canadian airspace from the Mid-Atlantic. CF-18 Hornets from the Royal Canadian Air Force are escorting Santa through Canadian airspace. As part of Operation Noble Eagle – NORAD’s mission to safeguard North American skies – CF-18s maintain a constant state of alert, ready to respond immediately to potential threats to the homelands. Santa and his reindeer certainly pose no threat but he can rest easy knowing that the NORAD team has the watch ensuring safe travels across North America.

NORAD Tracks Santa, NTS Santa CAM – Canadian Air Force

Consider how well the marketing is done here. There’s an education element at play (explaining Operation Noble Eagle), a marketing element (associating NORAD with the holidays, which is a positive association), and the entertainment element of watching Santa be escorted by fighter jets.

But also consider what is not said in the video and merely implied. The viewer sees the fighter jets smoothly move into an escort position, implying experience and professionalism on the part of the fighter pilots and the NORAD organization as a whole. The viewer sees the fighters soar across mountainous, ice-covered terrain, hinting at how hard and demanding the organization’s job is.

Let’s try another example – here is a video of NORAD tracking Santa through Massachusetts:

A screenshot of NORAD Tracks Santa. The video is embedded from YouTube and covers how NORAD tracks Santa through the Massachusetts area. Click the picture to see the full video on YouTube. The red dot at the center of the yellow beam is not a tracking target; it’s Rudolph the Reindeer’s lighted red nose.

The above screenshot embeds the following video, which tracks Santa as he passes over the Cape Cod Air Force Station: https://www.youtube.com/watch?v=RGchQuqqwd4 . I recommend watching it, but here’s a transcript if you can’t:

NORAD was notified by Air Force Space Command that their PAVE phased-array warning system – early warning radar known as PAVE PAWS at Cape Cod Air Force Station Massachusetts – is tracking Santa on his way from the US to South America. This radar is not only capable of detecting ballistic missile attacks and conducting general Space Surveillance and satellite tracking, but at this time of year the PAVE PAWS station keeps an eye on Santa as he flies over the Atlantic toward the Western Hemisphere.

NTS Santa Cam English Ground Station at Cape Cod

Again, note the educational aspects of the video (what PAVE PAWS stands for and what it does), the marketing aspects of the video (associating NORAD and the Air Force with the holiday season) and the entertainment element of watching Santa.

But again consider what is not said. The video implies professionalism (someone is manning the station at night on a holiday) and security (someone is on the watch for possible threats).

The Takeaway

NORAD Tracks Santa is a masterpiece of marketing done right. Consider adding similar elements to your online marketing strategy, such as a simple game, amusing videos, and educational content discussing your organization’s mission.

Finance, Google, and Plex

I remarked in a previous blog post on how Google is diversifying its income by moving into financial products. Today sees the launch of Plex, a way to manage bank accounts and offers, and (soon) to open new bank accounts.

Google launching waitlist for Plex, its new banking app. https://twitter.com/Google/status/1329120723193921543

This Verge article goes more in depth about Plex; the part I find most interesting is this sentence:

But Google is also ramping up other ways to pay with this app. Underneath People and Businesses are a couple of new buttons: “Get gas” and “Order food.” The food option ties into Google’s existing food ordering system that is compatible with enough systems for the company to claim it works with over 100,000 restaurants. You’ll also be able to pay for gas or parking directly in the app…

Extracted from https://www.theverge.com/2020/11/18/21571806/google-pay-relaunch-money-payments-finances-deals-offers-banking-plex

What I find interesting about Google Plex is that it’s a huge expansion of Google’s business: it moves Google further into the consumer realm, into financial management and payments (competing with Samsung Pay, Apple Pay, and Mint), food ordering (competing with GrubHub), and gas (competing with many loyalty programs). If Plex succeeds, it could mean a multi-billion-dollar business, even larger than the Google Cloud Platform business unit.

Fun With Google Trends: AP CS Edition

I was amused at this recent Slashdot posting: Google Searches For ‘Java’ Spiked During Friday’s Online AP CS Exam. Slashdot includes this interesting chart, showing traffic trends for Java searches peaking around the time of the AP CS test:

Credit: Slashdot

Apparently the first AP CS question was about java.util.ArrayList, and the Google Trends chart for ArrayList shows the same bump on Friday (this chart shows the search interest for ArrayList over the last 3 months):

Credit: Google Trends
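If it has been a while since you looked at the AP CS A material, this is roughly the kind of java.util.ArrayList code students were presumably searching for mid-exam. It is a made-up warmup snippet, not an actual exam question:

import java.util.ArrayList;

public class ApCsWarmup {
    public static void main(String[] args) {
        // Typical java.util.ArrayList operations covered on the AP CS A exam.
        ArrayList<Integer> scores = new ArrayList<>();
        scores.add(92);            // append to the end -> [92]
        scores.add(0, 88);         // insert at index 0 -> [88, 92]
        scores.set(1, 95);         // replace index 1   -> [88, 95]
        System.out.println(scores.get(0));   // prints 88
        System.out.println(scores.size());   // prints 2
        scores.remove(0);          // remove by index   -> [95]
        System.out.println(scores);          // prints [95]
    }
}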

It’s always fun to see how Google’s search volume changes depending on external events. Try looking at Google Trends whenever a major event occurs and watch the keywords people use.

Tinkering With WP-CLI, Google Cloud, and BlueHost

I’ve been spending the last few days wrapping the WP-CLI application – a command-line program for automating the administration of WordPress installations – inside a Java app so I can automate some WordPress work. One of the major bottlenecks was working out the correct SSH string to connect to the various WordPress providers.

Initially I had a bit of difficulty because I misread the wp-cli documentation and thought the --user argument was the SSH username. Once I got that fixed, it turned out that some WordPress hosting services, such as BlueHost, require you to contact their support to activate SSH. I had to connect my application to multiple WordPress hosting services, but in this post I’ll use BlueHost as an example since they’re fairly representative of the work I had to go through.

In BlueHost’s case, having support activate SSH access was a surprisingly painless process – it only took a quick five-minute text chat in which I verified my email address. To build out the proper SSH command, I also needed to look at details provided by cPanel:


All the information you need to build the SSH string is in the General Information section in the above screenshot. Your SSH string should look like this:

php wp-cli.phar plugin list --ssh=<CURRENT_USER>@<SHARED_IP_ADDRESS>:2222/<HOME_DIRECTORY>/public_html --debug

The BlueHost account I’m using as an example is a shared WordPress account, so it listed a shared IP. Make sure to double-check the port number (2222 in the above code sample) – the usual SSH port is 22, but BlueHost uses 2222 for shared accounts. Note that I’ve appended an additional folder to the home directory: cPanel’s home directory value only takes you as far as the user’s home directory, but wp-cli needs the path to the WordPress installation, which usually lives in a subfolder (in this case /public_html/ ).
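Since the whole point was to drive this from a Java app, here is a minimal sketch of how such a wrapper might invoke the command above with ProcessBuilder. The username, IP address, and home directory below are placeholders – substitute the values from your own cPanel General Information section – and it assumes php and wp-cli.phar are reachable from the working directory:

import java.io.IOException;
import java.util.List;

public class WpCliRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder connection details; use the values shown in your cPanel.
        String sshTarget = "exampleuser@203.0.113.10:2222/home4/exampleuser/public_html";

        // The same command as above, broken into arguments for ProcessBuilder.
        List<String> command = List.of(
                "php", "wp-cli.phar",
                "plugin", "list",
                "--ssh=" + sshTarget,
                "--debug");

        Process process = new ProcessBuilder(command)
                .inheritIO()   // stream wp-cli's output straight to this console
                .start();

        int exitCode = process.waitFor();
        System.out.println("wp-cli exited with code " + exitCode);
    }
}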