NewsBlur: Iterating Through A Folder’s RSS Feed

After Google Reader was shut down, I moved to NewsBlur to follow my RSS feeds. The great thing about NewsBlur is that you can add RSS feeds to a folder and NewsBlur will merge all the stories under that folder into a single RSS feed.

In NewsBlur, copy the folder RSS feed URL from the folder's settings option:

NewsBlur settings option - the folder RSS URL is at the bottom.

The following Python code can pull the feed and iterate through it to find article information. At the bottom of this code example, each child represents a possible article, and sub_child represents a property on the article: the URL, the title, etc. I use a variant of this code to help identify important news stories.

import requests
import xml.etree.ElementTree as ET

#tears through the newsblur folder xml searching for <entry> items
def parse_newsblur_xml():
    #replace NEWSBLUR_FOLDER_RSS with the folder RSS URL from the settings screen
    r = requests.get('NEWSBLUR_FOLDER_RSS')
    if r.status_code != 200:
        print("ERROR: Unable to retrieve address, HTTP status %s" % r.status_code)
        return "error"
    xml = r.text
    xml_root = ET.fromstring(xml)
    #we search for <entry> tags because each entry tag stores a single article from an RSS feed
    for child in xml_root:
        #tags may carry an XML namespace prefix, so match on the end of the tag name
        if not child.tag.endswith("entry"):
            continue
        #if we are down here, the tag is an entry tag and we need to parse out info
        categories = []
        title = summary = url = None
        #Grind through the children of the <entry> tag
        for sub_child in child:
            if sub_child.tag.endswith("category"): #article categories
                categories.append(sub_child.get('term'))
            elif sub_child.tag.endswith("title"): #article title
                title = sub_child.text
            elif sub_child.tag.endswith("summary"): #article summary
                summary = sub_child.text
            elif sub_child.tag.endswith("link"): #article URL
                url = sub_child.get('href')
        #do something useful with the article here; printing is just a placeholder
        print(title, url, categories, summary)
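
To try the entry-walking logic without a network call, here is a minimal sketch that parses an inline sample feed. The sample XML is made up for illustration and assumes the folder feed is Atom-style (which the <entry> tags suggest); extract_entries and local_name are hypothetical helper names, not part of NewsBlur or requests.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed, assuming an Atom-style layout with <entry> elements.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample folder</title>
  <entry>
    <title>First story</title>
    <link href="https://example.com/first"/>
    <summary>Short summary</summary>
    <category term="tech"/>
    <category term="news"/>
  </entry>
  <entry>
    <title>Second story</title>
    <link href="https://example.com/second"/>
  </entry>
</feed>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_entries(xml_text):
    """Return one dict per <entry>, holding title, url, summary and categories."""
    root = ET.fromstring(xml_text)
    articles = []
    for child in root:
        if local_name(child.tag) != "entry":
            continue
        article = {"title": None, "url": None, "summary": None, "categories": []}
        for sub_child in child:
            name = local_name(sub_child.tag)
            if name == "category":
                article["categories"].append(sub_child.get("term"))
            elif name == "title":
                article["title"] = sub_child.text
            elif name == "summary":
                article["summary"] = sub_child.text
            elif name == "link":
                article["url"] = sub_child.get("href")
        articles.append(article)
    return articles


for article in extract_entries(SAMPLE_FEED):
    print(article["title"], article["url"], article["categories"])
```

Comparing the local tag name exactly, rather than using endswith, avoids accidentally matching an unrelated tag whose name merely ends in the same letters.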

Cloud Build Error – User does not have permission to access app (or it may not exist): The caller does not have permission

Whenever I provision a new Google Cloud project, I always get bitten by this error. I keep forgetting to set up IAM rules to allow Cloud Build access to App Engine.

Screenshot of failed Cloud Build run. Cloud Build does not have permission to access my App Engine instance.
Operation completed over 1 objects/8.6 KiB.
BUILD
Already have image (with digest): gcr.io/cloud-builders/gcloud
ERROR: (gcloud.app.deploy) User [USER_ID_REDACTED@cloudbuild.gserviceaccount.com] does not have permission to access app [APP_ID_REDACTED] (or it may not exist): The caller does not have permission
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/gcloud" failed: exit status 1

To fix this, go into Settings under Cloud Build and enable access to App Engine, and any other cloud service you use in conjunction with Cloud Build. Then wait a moment for the settings to take effect and rerun the build.

Setting up Cloud Build to connect to App Engine.

Firestore Errors

Most of my apps are using Google’s Datastore, but I decided to try out the new Firestore on a test application. I’m receiving quite a few of the below errors:

io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference cleanQueue: *~*~*~ Channel ManagedChannelImpl{logId=346, target=firestore.googleapis.com:443} was not shutdown properly!!! ~*~*~* (ManagedChannelOrphanWrapper.java:151)
    Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true.
java.lang.RuntimeException: ManagedChannel allocation site
	at io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:94)
	at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:52)
	at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:43)
	at io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:514)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:223)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:164)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:156)
	at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:157)
	at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:122)
	at com.google.cloud.firestore.spi.v1.GrpcFirestoreRpc.<init>(GrpcFirestoreRpc.java:122)
	at com.google.cloud.firestore.FirestoreOptions$DefaultFirestoreRpcFactory.create(FirestoreOptions.java:80)
	at com.google.cloud.firestore.FirestoreOptions$DefaultFirestoreRpcFactory.create(FirestoreOptions.java:72)
	at com.google.cloud.ServiceOptions.getRpc(ServiceOptions.java:510)
	at com.google.cloud.firestore.FirestoreOptions.getFirestoreRpc(FirestoreOptions.java:315)
	at com.google.cloud.firestore.FirestoreImpl.<init>(FirestoreImpl.java:77)
	at com.google.cloud.firestore.FirestoreOptions$DefaultFirestoreFactory.create(FirestoreOptions.java:63)
	at com.google.cloud.firestore.FirestoreOptions$DefaultFirestoreFactory.create(FirestoreOptions.java:56)
	at com.google.cloud.ServiceOptions.getService(ServiceOptions.java:498)
Screenshot of Firestore exception - failure to shut down in code.

These errors stopped when I called close() on the com.google.cloud.firestore.Firestore object after I was done with storage operations:

Javadoc for close() on com.google.cloud.firestore.Firestore.

I can’t help but feel a little disappointed at this new requirement to close the Firestore connection. It feels like a regression from the Datastore – there was no need to close the datastore object after usage.

WordPress Annoyances

I haven’t been posting as much as I want to lately – I’ve been fiddling with some WordPress issues and a lot of work from my day job.

Here are some minor thoughts that don’t deserve a post by themselves:

Routing

{
  "code":"rest_no_route",
  "message":"No route was found matching the URL and request method",
  "data": {"status":404}
}

I wrote a custom WP plugin which accepts requests from an App Engine application and returns some custom data. Unfortunately, my app on GAE was getting the above error whenever it tried to make an HTTP request to the WordPress app.

Long story short, the register_rest_route() call in my plugin only declared a GET endpoint, and my GAE application was trying to use POST. If you get this error, make sure the client is using the same HTTP method that the route declares.
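
On the client side, one way to guard against this kind of mismatch is to set the HTTP method explicitly and check it against what the route declares. This is only a sketch: the endpoint URL, the myplugin/v1 namespace, ROUTE_METHODS and build_plugin_request are all hypothetical names for illustration.

```python
import urllib.request

# Hypothetical endpoint; WordPress REST routes live under /wp-json/ by default.
PLUGIN_ENDPOINT = "https://example.com/wp-json/myplugin/v1/data"

# Hypothetical: the methods the plugin declared in register_rest_route().
ROUTE_METHODS = {"GET"}


def build_plugin_request(url, method="GET"):
    """Build a request with an explicit HTTP method, refusing undeclared ones."""
    if method not in ROUTE_METHODS:
        raise ValueError("route does not declare method %s" % method)
    return urllib.request.Request(url, method=method)


request = build_plugin_request(PLUGIN_ENDPOINT, "GET")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would actually send it.
```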

WPEngine Firewalls

By default, WPEngine has a firewall that blocks GAE-originated requests from hitting WP plugins. Fortunately, if your GAE app needs to talk to a WPEngine-hosted WordPress site, you can contact WPEngine through their contact form and ask them to remove the firewall on a per-blog basis.

Setting Up Sendgrid To Receive Mail

I was setting up a new application to use SendGrid’s inbound parse email function, so here’s some quick documentation. In the SendGrid dashboard, go to Settings > Inbound Parse:

Sendgrid's settings menu holds the inbound parse option.

Then click on the top blue button: Add host & URL.

Inbound parse screen on Sendgrid. Click the top blue button to continue adding inbound options for your email.

Fill in the screen that comes up with the proper domain and subdomain (the subdomain is optional). The destination URL is where SendGrid will POST the email.

At the domain registrar, set up the proper MX record. Look up the appropriate documentation for the registrar you use; this is what it looks like on GoDaddy:

Screenshot of the proper MX record on GoDaddy.

In your application, set up a handler to answer the SendGrid request: in the screenshot example above, the handler was located at /inboundmailwebhook/. Any inbound mail gets POSTed as regular form data, which most frameworks can handle automatically.
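
If your framework does not parse the form for you, the body can be decoded by hand. Below is a minimal standard-library sketch, assuming the POST arrives as multipart/form-data with fields named from, subject and text; the sample body is made up, and parse_inbound_form is a hypothetical helper name. Check the actual field names against a real request from your own setup.

```python
from email.parser import BytesParser
from email.policy import default


def parse_inbound_form(content_type, body):
    """Decode a multipart/form-data body into a {field name: text} dict.

    content_type is the request's Content-Type header (including the boundary),
    body is the raw request body as bytes.
    """
    # Prepend a header block so the email parser can split the body on the boundary.
    header = ("Content-Type: %s\r\nMIME-Version: 1.0\r\n\r\n" % content_type).encode("utf-8")
    message = BytesParser(policy=default).parsebytes(header + body)
    fields = {}
    if not message.is_multipart():
        return fields
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        if name is None:
            continue
        # Assumption: field values are UTF-8 text; attachments would need extra handling.
        fields[name] = part.get_payload(decode=True).decode("utf-8", errors="replace")
    return fields


# Made-up request, shaped like a form POST with three text fields.
SAMPLE_CONTENT_TYPE = "multipart/form-data; boundary=xYzZY"
SAMPLE_BODY = (
    "--xYzZY\r\n"
    'Content-Disposition: form-data; name="from"\r\n'
    "\r\n"
    "Alice <alice@example.com>\r\n"
    "--xYzZY\r\n"
    'Content-Disposition: form-data; name="subject"\r\n'
    "\r\n"
    "Hello\r\n"
    "--xYzZY\r\n"
    'Content-Disposition: form-data; name="text"\r\n'
    "\r\n"
    "Body of the message\r\n"
    "--xYzZY--\r\n"
).encode("utf-8")

fields = parse_inbound_form(SAMPLE_CONTENT_TYPE, SAMPLE_BODY)
print(sorted(fields))
```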

Tweepy Code Sample: Auth & Iterating Through Following Users

Here’s a short code example using Tweepy to pull a list of following users (users that you follow). consumer_key, consumer_secret, access_token and access_token_secret are necessary tokens for authenticating into Twitter.

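import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

# Note: Tweepy 4.x renamed API.friends to API.get_friends; use that name on newer versions.
# items(3) limits the loop to the first 3 users; drop the argument to walk the whole list.
for friend in tweepy.Cursor(api.friends).items(3):
    # Extract friend (following) Twitter screen name and user id
    friend_screen_name = friend.screen_name
    friend_id = friend.id_str
    print("Friend screen name & ID: %s - %s" % (friend_screen_name, friend_id))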

Searching The Content Of A Web Page: The intext: and allintext: Search Operators

Among the less useful search operators are intext: and allintext:. As the title says, these operators require that the given word(s) show up in the content of a web page. For example, if you searched for intext:stock (no space between intext: and the searched keyword), the returned web pages would have the word stock in their text content:

intext:stock

Similarly, if you searched for allintext:stock dis, you would get web pages with the words stock and dis within their text content:

allintext:stock dis

While these operators are worth remembering, they’re not as useful as their intitle/allintitle/inurl/allinurl counterparts. In the vast majority of cases, skipping the intext: operator and searching on the same keywords gives the same, or largely the same, search results as using the operators.
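
If you build these queries from code, the operator syntax described above can be sketched as a pair of small helpers. The function names are hypothetical, and the URL-encoding step only assumes the query will be placed in a URL query string.

```python
from urllib.parse import quote_plus


def intext_query(word):
    """Build an intext: query; the operator takes one word, with no space after the colon."""
    if " " in word.strip():
        raise ValueError("intext: takes a single word; use allintext_query for several")
    return "intext:" + word.strip()


def allintext_query(words):
    """Build an allintext: query requiring every word to appear in the page text."""
    return "allintext:" + " ".join(w.strip() for w in words)


single = intext_query("stock")
several = allintext_query(["stock", "dis"])
print(single)
print(several)
# URL-encoded forms, ready to append to a search URL's query parameter.
print(quote_plus(single))
print(quote_plus(several))
```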

Delete Old Entities – Java Datastore

This is an ultra-simplified example of how to delete old entities from the App Engine Datastore. The first 3 lines of code retrieve the current date, then subtract 60 days from the current time (the multiplication converts days to milliseconds). DATE_PROPERTY_ON_ENTITY is the date property on the entity: when first writing the entity to the datastore, add the current date as a property. ENTITY_KIND is the entity kind we’re deleting.

		//Calculate 60 days ago.
		long current_date_long = (new Date()).getTime();
		//1000L forces long arithmetic; as a plain int product, 60 days in milliseconds overflows.
		long past_date_long = current_date_long - (1000L * 60 * 60 * 24 * 60);
		Date past_date = new Date(past_date_long);
		
		DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
		Query.Filter date_filter = new Query.FilterPredicate("DATE_PROPERTY_ON_ENTITY", Query.FilterOperator.LESS_THAN_OR_EQUAL, past_date);
		Query date_query = new Query("ENTITY_KIND").setFilter(date_filter);
		PreparedQuery date_query_results = datastore.prepare(date_query);
		
		Iterator<Entity> iterate_over_old_entities = date_query_results.asIterator();
		
		while (iterate_over_old_entities.hasNext()) {
			Entity old_entity = iterate_over_old_entities.next();
			
			System.out.println("Deleting: " + old_entity.getProperties());
			
			datastore.delete(old_entity.getKey());
		}

Note that this is a simplified function: it’s useful if you have a handful of entities to delete, but if you have more than a handful, you should switch to datastore cursors and page through the entities to delete.
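
One detail worth checking in any days-to-milliseconds calculation like the one above: 60 days in milliseconds does not fit in a 32-bit Java int, so the multiplication needs at least one long operand (for example 1000L) to give the right cutoff. The Python sketch below works through the arithmetic; wrap_int32 is a hypothetical helper that emulates 32-bit signed wraparound, since Python integers never overflow on their own.

```python
MS_PER_DAY = 1000 * 60 * 60 * 24
INT32_MAX = 2**31 - 1


def wrap_int32(n):
    """Emulate two's-complement wraparound of a 32-bit signed integer."""
    return (n + 2**31) % 2**32 - 2**31


sixty_days_ms = MS_PER_DAY * 60         # exact value, since Python ints are unbounded
wrapped_ms = wrap_int32(sixty_days_ms)  # what 32-bit int arithmetic would produce

print(sixty_days_ms)              # 5184000000
print(sixty_days_ms > INT32_MAX)  # True
print(wrapped_ms)                 # 889032704
print(wrapped_ms / MS_PER_DAY)    # a little over 10 days instead of 60
```

With the wrapped value, the query would treat anything older than roughly 10 days as old, so far more entities would be deleted than intended.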