January 13, 2020 – Learn Google

After Google Reader was shut down, I moved to NewsBlur to follow my RSS feeds. The great thing about NewsBlur is that you can add RSS feeds to a folder and Newsblur will merge all the stories under that folder into a single RSS feed.

Under NewsBlur, you’ll want to pull the folder RSS feed from the settings option:

NewsBlur settings option - the folder RSS URL is at the bottom.

The following Python code can pull the feed and iterate through it to find article information. At the bottom of this code example, each child represents a possible article, and sub_child represents a property on the article: the URL, the title, etc. I use a variant of this code to help identify important news stories.

import requests
import xml.etree.ElementTree as ET
import logging
import datetime, pytz
import json
import urllib.parse

#tears through the newsblur folder xml searching for <entry> items
def parse_newsblur_xml():
    r = requests.get('NEWSBLUR_FOLDER_RSS')
    if r.status_code != 200:
        print("ERROR: Unable to retrieve address ")
        return "error"
    xml = r.text
    xml_root = ET.fromstring(xml)
    #we search for <entry> tags because each entry tag stores a single article from a RSS feed
    for child in xml_root:
        if not child.tag.endswith("entry"):
            continue
        #if we are down here, the tag is an entry tag and we need to parse out info
        #Grind through the children of the <entry> tag
        for sub_child in child:
            if sub_child.tag.endswith("category"): #article categories
                #call sub_child.get('term') to get categories of this article
            elif sub_child.tag.endswith("title"): #article title
                #call sub_child.text to get article title
            elif sub_child.tag.endswith("summary"): #article summary
                #call sub_child.text to get article summary
            elif sub_child.tag.endswith("link"):
                #call sub_child.get('href') to get article URL

S	M	T	W	T	F	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Day: January 13, 2020

NewsBlur: Iterating Through A Folder’s RSS Feed