My minimalistic RSS feed setup

A couple of years ago I started using RSS (or Atom) feeds to stay up to date with websites and blogs I wanted to read. This method is more convenient than what I did before (i.e. opening Firefox and then each website I wanted to follow in a new tab, one by one), but unfortunately not every website provides an RSS feed these days.

At first I used newsboat, but I soon started disliking the curses interface - see also my rant on curses at the end of this other blog post. Then I discovered sfeed.

sfeed

sfeed is an extremely minimalistic RSS and Atom reader: it reads the XML content of a feed from standard input and outputs one line per feed item, with tab-separated timestamp, title, link and so on. This tool comes bundled with other commands that can be combined with it, such as sfeed_plain, which converts the output of sfeed into something more readable:

$ curl -L https://sebastiano.tronto.net/blog/feed.xml | sfeed | sfeed_plain
  2023-06-16 02:00  UNIX text filters, part 0 of 3: regular expressions                    https://sebastiano.tronto.net/blog/2023-06-16-regex
  2023-05-05 02:00  I had to debug C code on a smartphone                                  https://sebastiano.tronto.net/blog/2023-05-05-debug-smartphone
  2023-04-10 02:00  The big rewrite                                                        https://sebastiano.tronto.net/blog/2023-04-10-the-big-rewrite
  2023-03-30 02:00  The man page reading club: dc(1)                                       https://sebastiano.tronto.net/blog/2023-03-30-dc
  2023-03-06 01:00  Resizing my website's pictures with ImageMagick and find(1)            https://sebastiano.tronto.net/blog/2023-03-06-resize-pictures
...
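
Since the raw output of sfeed is just tab-separated text, it can also be piped straight into standard text tools without going through sfeed_plain. For example, something like the following should print the title and link of the three most recent items (the awk program is just an illustration of mine; if I recall the field order correctly, the first three fields are timestamp, title and link):

$ curl -sL https://sebastiano.tronto.net/blog/feed.xml | sfeed | \
    awk -F '\t' '{ print $2 "\t" $3 }' | head -n 3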

One can also write a configuration file with all the desired feeds and fetch them with sfeed_update, or even use the sfeed_curses UI. But the reason I tried out sfeed in the first place was that I did not want to use a curses UI, so I decided to stick with sfeed_plain.
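
For reference, the configuration file used by sfeed_update (sfeedrc) is itself a small shell script that defines a feeds() function; a minimal sketch, based on my reading of the sfeed documentation, could look like this (the feed name is just an example):

# ~/.sfeed/sfeedrc
feeds() {
    # feed <name> <feedurl> [basesiteurl] [encoding]
    feed "tronto.net" "https://sebastiano.tronto.net/blog/feed.xml"
}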

My wrapper script - old versions

On the project’s homepage the following short script is presented to demonstrate the flexibility of sfeed:

#!/bin/sh
url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
    sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
test -n "${url}" && $BROWSER "${url}"

The first line shows a list of feed items in dmenu to let the user select one, and the second line opens the selected item in a web browser. I was impressed by how simple and clever this example was, and I decided to expand on it to build “my own” feed reader UI.

In the first version I made, my feeds were organized in folders, with one file per feed, and one could select multiple feeds or even entire folders via dmenu, using dmenu-filepicker for file selection. Once the session was over, all shown feeds were marked as “read” by writing the timestamp of the last read item to a cache file, and they were not shown again on subsequent calls.

This system worked fine for me, but at some point I grew tired of feeds being marked as “read” automatically. I also disliked the complexity of my own script. So I rewrote it from scratch, giving up the idea of marking feeds as read. This second version can still be found in the old folder of my scripts repo, but I may remove it in the future. You will still be able to find it in the git history.

I have happily used this second version for more than a year, but I had some minor issues with it. The main one was that, as I added more and more websites to my feed list, fetching them took longer and longer - up to 20-30 seconds; while the feeds were loading I could not start doing other stuff, because dmenu would later grab my keyboard while I was typing. Moreover, having a way to filter out old feed items is kinda useful when you check your feeds relatively often. A few weeks ago I finally had enough and decided to rewrite my wrapper script once again.

My wrapper script - current version

In its current version, my feed script accepts four sub-commands: get to update the feeds, menu to prompt for a dmenu selection, clear to remove old items and show to list all the new items. Since clear is a separate action, I no longer have the problem I had with my first version, i.e. feeds being automatically marked as read even when I did not want them to be.

Let’s walk through my last iteration on this script - you can find it in my scripts repository, but I’ll include it at the end of this section too.

First I define some variables (mostly filenames), so that I can easily adapt the script if one day I want to move stuff around:

dir=$HOME/box/sfeed
feeddir=$dir/urls
destdir=$dir/new
olddir=$dir/old
readdir=$dir/last
menu="dmenu -l 20 -i"
urlopener=open-url

Here open-url is another one of my utility scripts.
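
I will not go through open-url here, but a stripped-down version of the same idea could be as simple as the following (a hypothetical sketch, not the actual script):

#!/bin/sh
# Open every URL passed as an argument in the preferred browser,
# falling back to xdg-open if $BROWSER is not set.
for url in "$@"; do
    "${BROWSER:-xdg-open}" "$url"
done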

To update the feeds, I loop over the files in my feed folder. Each file contains a single line with the feed’s URL, and the name of the file is the name / title of the website. The output of sfeed is piped into sfeed_plain and appended to a file, and the most recent timestamp for each feed is updated.
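
For example, a hypothetical entry for this blog would be a file named after the website and containing only its feed URL:

$ cat ~/box/sfeed/urls/tronto.net
https://sebastiano.tronto.net/blog/feed.xml

The update function then looks like this: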

getnew() {
    for f in "$feeddir"/*; do
        read -r url < "$f"
        name=$(basename "$f")
        d="$destdir/$name"
        r="$readdir/$name"

        [ -f "$r" ] && read -r lr < "$r" || lr=0

        # Get new feed items
        tmp=$(mktemp)
        curl -s "$url" | sfeed | \
        awk -v lr="$lr" '$1 > lr {print $0}' | \
        tee "$tmp" | sfeed_plain >> "$d"

        # Update last time stamp
        awk -v lr="$lr" '$1 > lr {lr=$1} END {print lr}' <"$tmp" >"$r"
    done
}

The next snippet is used to show the new feed items. The for loop could be replaced by a simple cat "$destdir"/*, but I also want to prepend each line with the name of the website.

show() {
    for f in "$destdir"/*; do
        ff=$(basename "$f")
        if [ -s "$f" ]; then
            while read -r line; do
                printf '%20s    %s\n' "$ff" "$line"
            done < "$f"
        fi
    done
}
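
With the hypothetical feed file from before, the output would look roughly like this (a single illustrative line, reusing one of the items shown earlier):

          tronto.net    2023-06-16 02:00  UNIX text filters, part 0 of 3: regular expressions  https://sebastiano.tronto.net/blog/2023-06-16-regex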

Finally, the following one-liner can be used to prompt the user to select and open the desired items in a browser using dmenu:

selectmenu() {
    $menu | awk '{print $NF}' | xargs $urlopener
}

The “clear” action is a straightforward file management routine, and the rest of the script is just shell boilerplate to parse the command line options and sub-commands. Putting it all together, the script looks like this:

#!/bin/sh

# RSS feed manager

# Requires: sfeed, sfeed_plain (get), dmenu, open-url (menu)

# Usage: feed [-m menu] [get|menu|clear|show]

dir=$HOME/box/sfeed
feeddir=$dir/urls
destdir=$dir/new
olddir=$dir/old
readdir=$dir/last
menu="dmenu -l 20 -i"
urlopener=open-url

usage() {
    echo "Usage: feed [get|menu|clear|show]"
}

getnew() {
    for f in "$feeddir"/*; do
        read -r url < "$f"
        name=$(basename "$f")
        d="$destdir/$name"
        r="$readdir/$name"

        [ -f "$r" ] && read -r lr < "$r" || lr=0

        # Get new feed items
        tmp=$(mktemp)
        curl -s "$url" | sfeed | \
        awk -v lr="$lr" '$1 > lr {print $0}' | \
        tee "$tmp" | sfeed_plain >> "$d"

        # Update last time stamp
        awk -v lr="$lr" '$1 > lr {lr=$1} END {print lr}' <"$tmp" >"$r"
    done
}

show() {
    for f in "$destdir"/*; do
        ff=$(basename "$f")
        if [ -s "$f" ]; then
            while read -r line; do
                printf '%20s    %s\n' "$ff" "$line"
            done < "$f"
        fi
    done
}

selectmenu() {
    $menu | awk '{print $NF}' | xargs $urlopener
}

while getopts "m:" opt; do
    case "$opt" in
        m)
            menu="$OPTARG"
            ;;
        *)
            usage
            exit 1
            ;;
    esac
done

shift $((OPTIND - 1))

if [ -z "$1" ]; then
    usage
    exit 1
fi

case "$1" in
    get)
        getnew
        countnew=$(cat "$destdir"/* | wc -l)
        echo "$countnew new feed items"
        ;;
    menu)
        show | selectmenu
        ;;
    clear)
        d="$olddir/$(date +'%Y-%m-%d-%H-%M-%S')"
        mkdir "$d"
        mv "$destdir"/* "$d/"
        ;;
    show)
        show
        ;;
    *)
        usage
        exit 1
        ;;
esac
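
For reference, a typical interactive session with this script looks something like this (the item count is made up):

$ feed get
42 new feed items
$ feed show | less     # skim through the new items
$ feed menu            # open some of them in the browser
$ feed clear           # archive everything fetched so far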

I personally like this approach of taking a simple program that only uses standard input and standard output and wrapping it in a shell script to make it do exactly what I want. The bulk of the work is done by the “black box” program, and the shell script glues it together with the “configuration” files (in this case, my feed folder) and presents the results to me, interactively (e.g. via dmenu) or otherwise.

At this point my feed-consumption workflow is something like this: first I run feed get, then I do other stuff while the feeds load and later, after a couple of minutes or so, I run feed show or feed menu. This is still not ideal, because whenever I want to check my feeds I still have to wait for them to be downloaded. The only way to get around that would be to have feed get run automatically when I am not thinking about it…

Setting up a cron job

My personal laptop is not always connected to the internet, and in general I do not like having too many network-related jobs running in the background. But I do have a machine that is always connected to the internet: the VM instance hosting this website.

Since my new setup saves my feed updates to local files, I can have a cron job fetch the new items and update files in a folder sync’d via syncthing (yes, I do have that one network service constantly running in the background…). This setup is similar to the one I use to fetch my email.

I rarely use cron, and I am always a little intimidated by its syntax. But in the end, to have feed get run every hour, I just needed to add the following two lines via crontab -e:

MAILTO=""
0 * * * * feed get
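
One thing worth keeping in mind is that cron usually runs jobs with a very minimal environment, so if the script does not live in cron’s default PATH its full path has to be spelled out (or PATH has to be set in the crontab). For example, assuming the script lives in ~/bin:

MAILTO=""
0 * * * * $HOME/bin/feed get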

This is my new setup for the foreseeable future, and I like it. It also has the advantage that I only need to install sfeed on my server and not locally, though I still prefer to keep it around.

So far I have found one little caveat: if my feed gets updated after I read it and before I run a feed clear, some items may be cleared away before I ever see them. This is easily worked around by running a quick feed show right before clearing, but it is still worth keeping in mind.

Conclusions

This is a summary of my latest script-crafting adventure. As I was writing this post I realized I could probably use sfeed_update to simplify the script a bit, since I do not separate feeds into folders anymore. I have also found out that sfeed_mbox exists (at least, I think it was not there the last time I checked) and that I could use it to browse my feeds with a mail client - see also this video tutorial for a demo.
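
If I understand its manual correctly, sfeed_mbox takes the TSV feed files as arguments and writes an mbox to standard output, so something along these lines should be enough to try it out (paths are just an example):

$ sfeed_mbox ~/.sfeed/feeds/* > ~/mail/feeds.mbox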

With all of this, did I solve my problem in the best possible way? Definitely not. But does it work for me? Absolutely! Did I learn something new while doing this? Kind of, but mostly I just exercised skills that I already had.

All in all, it was a fun exercise.