Correcting A SQLite Code Example

I’ve been experimenting with filtering and manipulating a large amount of data within a Google Cloud Function. I decided to use an in-memory SQLite database to help manage all the data, so I googled up some code samples. This page came up with some helpful Python code samples.

Unfortunately when I tried to run the sample code, Cloud Functions popped an error. The sample code uses Python 2-style print as a statement instead of as a function call – i.e. the print call is missing the parentheses needed to make it a correct function call. Here’s a sample screenshot:

I’ve placed red arrows next to the erroneous print statements. If you paste this code into Google Cloud Functions, it won’t work because print needs to be a function call, (with parentheses) instead of a statement (missing parentheses). Credit: https://www.tutorialspoint.com/sqlite/sqlite_python.htm

Below is a fixed version of the code in the linked page. You can paste it directly into the Google Cloud Functions editor and it’ll work: it sets up an in-memory database, creates a table, adds data, then queries data out of it.

import sqlite3



def hello_world(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute('''CREATE TABLE COMPANY
         (ID INT PRIMARY KEY     NOT NULL,
         NAME           TEXT    NOT NULL,
         AGE            INT     NOT NULL,
         ADDRESS        CHAR(50),
         SALARY         REAL);''')
    conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
        VALUES (1, 'Paul', 32, 'California', 20000.00 )");
    conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
        VALUES (2, 'Allen', 25, 'Texas', 15000.00 )");
    conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
        VALUES (3, 'Teddy', 23, 'Norway', 20000.00 )");
    conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
        VALUES (4, 'Mark', 25, 'Rich-Mond ', 65000.00 )");
    conn.commit()
    print("Records created successfully");
    cursor = conn.execute("SELECT id, name, address, salary from COMPANY")
    for row in cursor:
        print("ID = ", row[0])
        print("NAME = ", row[1])
        print("ADDRESS = ", row[2])
        print("SALARY = ", row[3], "\n")
    conn.close()
    request_json = request.get_json()
    if request.args and 'message' in request.args:
        return request.args.get('message')
    elif request_json and 'message' in request_json:
        return request_json['message']
    else:
        return f'Hello World!'

Use this code as a starting point to build your own cloud functions and work with data.

I’m pleasantly surprised at how fast SQLite runs within a cloud function – I was worried that the function would run out of memory quickly, but I’ve been manipulating thousands of rows comfortably within a 512MB RAM function.

Cloud Build Failure: You do not have permission to act as (Service Account)

I have a demo App Engine application on GitHub, mapped through Google Cloud Build to automatically redeploy upon any change in the master repository. I’ve left this app untouched for about a month or so, until now where I made some minor updates and pushed those updates to the GitHub repository.

Unfortunately it seems that Cloud Build has changed some permissions, because suddenly errors came up and my updates failed to deploy. Here’s a screenshot of my Cloud Build page, and the errors:

My Cloud Build page. All my builds in October and earlier of this year succeeded, but my November builds started failing.

Apparently this error was due to a permissions error

ERROR: (gcloud.app.deploy) PERMISSION_DENIED: You do not have permission to act as 'project-name@appspot.gserviceaccount.com'
- '@type': type.googleapis.com/google.rpc.ResourceInfo
  description: You do not have permission to act as this service account.
  resourceName: project-name@appspot.gserviceaccount.com
  resourceType: serviceAccount
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/gcloud" failed: step exited with non-zero status: 1
Error detail in context.

When I tried to force the run via the Run trigger, I got this error:

Failed to trigger build: generic::permission_denied: service account account@cloudbuild.gserviceaccount.com has insufficient permission to execute the build on project project-name.

In short, you need to add the Cloud Build Service Agent role to Cloud Build, allowing it to use service accounts to authenticate into other Google services. in the IAM section of the cloud console, find the Cloud Build service account:

The Cloud Build service account is in the black box.

And then add the Cloud Build Service Agent to the Cloud Build service account:

After I added that role, my Cloud Build deployments worked again.

A Funny Health Graph

Health graphs always amuse me. I occasionally have to scratch my head and wonder what exactly is being measured.

Take the graph below. At first, the service has zero users and a zero error rate. But once it gets to 1 user (July 19), the error rate ramps right up to 100% and stays at 100% error rate even when there are no users using it (July 20 – 21 part of the graph).

How can there be an error rate if it’s not being used?

Just another day in amusing metrics.

Initial Thoughts On Comparing Tweets

Note: some of this code is old; it could be rewritten to benefit more from SQL’s windowing and other advanced functions.

I’ve been poking around some old code of mine recently – since it’s going unused, maybe someone on the Internet would benefit from it. A few years ago, I was toying with some code to detect Twitter trending tweets. Consider a SQL table set up like the below, where user is the name of Twitter user posting the tweet, ID is the tweet ID, created_at is the creation date of the tweet, and retweet_count/favorite_count are the number of retweets and favorites the tweet had.

CREATE TABLE twitter (
    category VARCHAR(30),
    user VARCHAR(30),
    id VARCHAR(90), 
    created_at DATETIME,
    retweet_count INT,
    favorite_count INT,
    add_date DATETIME
);

The trend finder worked like this: it would take snapshots of each Tweet’s retweet and favorite count (add_date is the datetime of the snapshot) and then compare the snapshots to detect how fast the tweet was gaining retweets/favorites. The comparison was done in SQL. I named the gain rate of retweets/favorites the slope of the activity, and calculated multiple slopes: the increase in retweets/favorites within the most recent 90 minutes, and the overall increase of retweets/favorites. Here’s the simplified SQL (other SQL has been cut off because this is already too long, and not relevant to the point I’m making):


insert into tweet_slope 
(

select twitter.created_at, twitter.category, twitter.id, twitter.retweet_count as max_retweet, 
 twitter.favorite_count as max_favorite, twitter.add_date, new_table.max_date, 
 first_table.first_date, first_table.first_retweet, first_table.first_favorite, 
 prior_table.prior_date, prior_table.prior_retweet, prior_table.prior_favorite,
                                                                                                    	
(twitter.retweet_count - first_table.first_retweet) as full_retweet_diff,
TIMESTAMPDIFF(MINUTE, first_table.first_date, new_table.max_date) as full_retweet_timespan,
((twitter.retweet_count - first_table.first_retweet) / 
TIMESTAMPDIFF(MINUTE, first_table.first_date, new_table.max_date)) as full_retweet_slope,
(twitter.retweet_count - prior_table.prior_retweet) as recent_retweet_diff,
TIMESTAMPDIFF(MINUTE, prior_table.prior_date, new_table.max_date) as recent_retweet_timespan,
((twitter.retweet_count - prior_table.prior_retweet) / 
TIMESTAMPDIFF(MINUTE, prior_table.prior_date, new_table.max_date)) as recent_retweet_slope,
(twitter.favorite_count - first_table.first_favorite) as full_favorite_diff,
TIMESTAMPDIFF(MINUTE, first_table.first_date, new_table.max_date) as full_favorite_timespan,
((twitter.favorite_count - first_table.first_favorite) / 
TIMESTAMPDIFF(MINUTE, first_table.first_date, new_table.max_date)) as full_favorite_slope,
(twitter.favorite_count - prior_table.prior_favorite) as recent_favorite_diff,
TIMESTAMPDIFF(MINUTE, prior_table.prior_date, new_table.max_date) as recent_favorite_timespan,
((twitter.favorite_count - prior_table.prior_favorite) / 
TIMESTAMPDIFF(MINUTE, prior_table.prior_date, new_table.max_date)) as recent_favorite_slope
 from (select id, max(add_date) as max_date from twitter group by id) as new_table
 inner join twitter on twitter.id=new_table.id and twitter.add_date=new_table.max_date
 INNER JOIN 
(select twitter.id, twitter.add_date as first_date, twitter.retweet_count as 
 first_retweet, twitter.favorite_count as first_favorite from twitter inner 
 join (select id, min(add_date) as min_date from twitter group by id) as old_dates 
 on twitter.id=old_dates.id and twitter.add_date=old_dates.min_date) as first_table
 
 on twitter.id=first_table.id
 INNER JOIN 
 (select twitter.id, twitter.add_date as prior_date, twitter.retweet_count as 
 prior_retweet, twitter.favorite_count as prior_favorite from twitter INNER JOIN 
 (select id, max(add_date) as prior_set_date from twitter where add_date <  
 DATE_SUB(NOW(), INTERVAL 90 MINUTE) group by id) as prior_set 
 ON twitter.add_date=prior_set.prior_set_date and twitter.id=prior_set.id) as  prior_table 
 ON twitter.id=prior_table.id
 where twitter.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY) AND 
 new_table.max_date > DATE_SUB(NOW(), INTERVAL 45 MINUTE) AND 
 new_table.max_date <> first_table.first_date
and twitter.category = :category
)
;

Now this code worked, and it worked great! A fast DB machine made all the difference to crunch out all the SQL calculations. I tested it by sampling tweets from sets of related accounts. One set I tested were all Disney/Marvel/Star Wars-related Twitter accounts (for instance, the accounts of actors for Luke Skywalker, Iron Man, Disney Parks) and the top-ranking tweets were almost all from Mark Hamill’s account; on the day I was testing, Mark’s tweets had thousands of favorites/retweets due to being politics-related.

I said this code was old at the beginning of the post, and that’s true – it has since been replaced. While this code worked, it was too simple: in practice, it would never surface some trending tweets from less-popular Twitter accounts.

The real problem is that the popularity of tweets is a bimodal distribution: most tweets end up puttering along with, at best, a handful of retweets/favorites; some might even get a few hundred. But then there’s a small minority of tweets, posted from extremely popular accounts, that get thousands and thousands of favorites and retweets regardless of the individual tweet’s merit – the tweets themselves are popular because the poster is popular. This crowds out the trending tweets from the other section of the bimodal graph.

I rebuilt the application to take into account the number of followers a Twitter user has (as a proxy for the account’s relative popularity) and to compare a tweet to the account’s tweet history (how popular is a given tweet compared to the other tweets in the same account?). I’ll share the code soon.

AppSheet

I’m pretty bullish on all no-code/low-code automation services such as IFTTT, Zapier, AWS HoneyCode, and all the startups in this sector. Software is eating the world, and it’s not hard to see the piles of cash being shoveled to the first company who can drastically simplify a lot of current IT operations.

That’s one of the reasons I’m pleased to see Google introducing more automation within AppSheet and Apigee: https://techcrunch.com/2020/09/08/google-cloud-launches-its-business-application-platform-with-apigee-and-appsheet/ . I’ve been toying around with AppSheet to help automate some work, and it is a surprisingly fun and agile platform to work on.

I really like the deep integration with Google Sheets: AppSheet makes it easy to quickly deploy a proof of concept app or a low-traffic production application and store data within a Google Sheet. Better yet, AppSheet can analyze your data and set up an app template automatically. When I first started AppSheet, it automatically imported a spreadsheet “Post Mapping” stored in my Google Drive: it was a listing of bookmarks and dates I had been saving.

A view of AppSheet’s introductory page.

When I clicked on the Post Mapping file, AppSheet automatically created table and calendar views of my data:

One thing I’m surprised not to see is integration with Firebase. Firebase is – IMO – the easiest to use backend for mobile and web apps, and yet AppSheet doesn’t integrate it. At the very least I’d like to see AppSheet borrow Firebase’s authentication options.

In short, AppSheet is definitely something to keep your eyes on.

Firestore Add Document – Python

A simple example of adding a document to the Firestore:

    from google.cloud import firestore
    
    #Make a note in the firestore
    firestore_client = firestore.Client()
    doc_ref = firestore_client.collection(u'CollectionsNameHere').add({
        u'propertyone': u'thisisavalue',
        u'propertytwo': 2,
    })
    logging.warning(doc_ref[1].id)

.add returns a tuple of the date that the document was written into the firestore, and a document reference to the written document. .add lets Firestore create a document ID to the written document: to figure out what the ID is, the last line accesses the tuple and pulls out the document ID.

An item that bit me: the official documentation for Firestore states that .add and .doc().set() are equivalent. That’s true, but the returns from both of those functions are different.

Documentation from Google showing .add and .doc.set are the same operation.

.add returns a tuple of the document creation time and then a reference to the added document. However, .set returns a WriteResult, not a tuple. The WriteResult contains the time the document was updated/set in the property update_time. Make sure to use the correct function as necessary.

Set Documentation

Set Documentation - Firestore
WriteResult documentation - Firestore.

Add Documentation

Add documentation - Firestore

Missing Logs – Google Logs Viewer

I opened a new GCP project to host a Python application when I hit a problem – my logging.info() and logging.warn() statements weren’t showing up in my logs. Then I realized the standard error and standard out streams weren’t selected in logging!

If you’re missing log information, make sure to select the correct streams in the second dropdown box, as in below:

Screenshot of logging, selecting stderr and stdout streams.
Screenshot of logging, selecting stderr and stdout streams.

PhantomJSCloud Error: Invalid Character ‘u’ – Python

I use PhantomJSCloud to take screenshots of web apps. While writing new code, I noticed this error coming back from the server:

{  
   "name":"HttpStatusCodeException",
   "statusCode":400,
   "message":"Error extracting userRequest. innerException: JSON5: invalid character 'u' at 1:1",
   "stack":[  
      "no stack unless env is DEV or TEST, or logLevel is DEBUG or TRACE"
   ]
}

The problem came because the request wasn’t JSON-encoding the object; the fix looks like this (using the requests library):

    post_data_string = json.dumps(python_object_here, sort_keys=True, indent=3, separators=(',', ': '))
    r = requests.post('https://phantomjscloud.com/api/browser/v2/', data=post_data_string)