How I update my website

When I created my website, I decided that I wanted to understand 100% of what I did. In practice, this means that I did not want to use any framework, not even a simple one. Someone might call this minimalism, someone else might call it being a control freak. I think I like the second one more.

The principles I follow, which are actually an afterthought after a few months of trial-and-error, are roughly these:

In more practical terms, this boils down to writing a couple of CSS and html files, writing a script that adds a header and a footer to the output of lowdown, and using rsync to deploy the files to my server. All of this is available on my git page, but I won’t explain every detail of these build scripts here. In particular, my script also builds a gemini version of my website, which I won’t discuss here.

Prerequisites

My website is hosted on an OpenBSD virtual machine on a remote server. I can access this virtual machine via SSH, which gives me root access to the operating system. I use rsync for uploading my files to this server. As long as you have an http server that can serve static html files and a way to upload your files to this server, you can easily manage your website in a similar way. I am not going to explain how to do all of this here; if you need help I suggest you have a look at Roman Zolotarev’s website.

As for my local machine, the one I am actually using to write these blog posts, I just use a Markdown translator, a text editor, rsync and other basic UNIX tools.

Directory structure

In my working directory there are two main folders with the exact same sub-folder structure: the first one is src, which contains all the Markdown files I write, plus any other file I need for my pages, such as pictures; the other is http, which contains the html pages exactly as they are on my website. The http folder is generated from the src folder when I run the build.sh script - more on this later! (You won’t find the http folder on my git page.)

There is one small caveat here: I like my URLs to be clean and I want them stripped of the .html extension. To do this, I set up my src folder so that every subfolder contains at most one .md (Markdown) file, which is converted to an index.html file in the corresponding subfolder of http. In practice, if there are the following files:

├── src
│   ├── git
│   │   └── git-or-any-other-name.md

The following is generated by build.sh:

├── http
│   ├── git
│   │   └── index.html

This way, when one accesses sebastiano.tronto.net/git, the web server automatically serves the index.html file. Without this trick the correct URL would have been sebastiano.tronto.net/git.html or something, which I don’t like.
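
A minimal sketch of this mapping, assuming the src and http layout described above. The function name dest_for is made up for illustration and does not appear in build.sh:

```shell
#!/bin/sh

# dest_for: hypothetical helper that prints the output path for a source path.
# Markdown files become index.html in the mirrored folder under http;
# any other file keeps its name.
dest_for() {
    destdir=$(dirname "$1" | sed 's|^src|http|')
    case "$1" in
        *.md) echo "$destdir/index.html" ;;
        *)    echo "$destdir/$(basename "$1")" ;;
    esac
}

dest_for src/git/git-or-any-other-name.md   # prints http/git/index.html
dest_for src/git/logo.png                   # prints http/git/logo.png
```

The name of the Markdown file plays no role in the result, which is why it can be anything as long as there is only one per folder.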

The main working directory also contains the top.html and bottom.html files. These files are not uploaded directly to my server - they are not even well-formed html files! - but they are used to build all other html pages.

Building the pages

The basic idea behind build.sh is very simple. If we want to create the html file corresponding to, say, src/page/file.md, we just need to create a file called http/page/index.html and copy there the contents of top.html, followed by the output of lowdown src/page/file.md, followed by the contents of bottom.html. In shell language:

cat top.html > http/page/index.html
lowdown src/page/file.md >> http/page/index.html
cat bottom.html >> http/page/index.html

Of course you don’t have to use lowdown as I do: any Markdown translator works. Indeed, if you want to use a different markup language to write your pages you just need to replace the second line in the code above.

We would also like to make a small change to the header while we build this page, namely we want the title of the page (the one displayed in your browser’s top bar or tab) to match the actual title of the page or blog post. To do this easily I have left a placeholder TITLE in top.html, which we just need to replace with the actual title of the page. To find out what the title is we just need to get the text following the first # followed by a space in the Markdown file - that is, the first “big title” of the page. We can do this thanks to the classic UNIX tools sed, grep and head:

sed "s/TITLE/$(grep '^\# ' < src/page/file.md \
    | head -n 1 | sed 's/^\# //')/" < top.html > http/page/index.html
lowdown src/page/file.md >> http/page/index.html
cat bottom.html >> http/page/index.html

The first two lines might be a bit complicated to work out if you are not familiar with these commands. Let’s break them down!

The main command is sed "s/TITLE/...stuff.../", which replaces the string TITLE with that complicated stuff (more precisely, the first occurrence of TITLE on each line of the input). The end of the second line tells sed to use top.html as input and write the output to http/page/index.html. The complicated stuff that is going to replace TITLE is enclosed in $(), which means that it is the result of a command. This command is itself a chain of commands: first we find all lines that start with # followed by a space with grep '^\# ' on the correct file (< src/page/file.md), then we take the first of these lines (head -n 1) and finally we trim the leading # with sed. As you can see, the UNIX shell is quite a powerful tool!
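
One caveat with this approach: a title containing / or & would confuse the outer sed, because both characters are special in the replacement part of an s command. A possible workaround is to escape them first. This is only a sketch: the helper names get_title and escape_repl are hypothetical and not part of my build script, and the demo works on temporary files rather than on the real src folder.

```shell
#!/bin/sh

# get_title: print the text of the first "# " heading of a Markdown file.
get_title() {
    grep '^# ' < "$1" | head -n 1 | sed 's/^# //'
}

# escape_repl: backslash-escape the characters that are special in the
# replacement part of a sed s/// command (backslash, slash, ampersand).
# The | delimiter avoids having to escape the slash in the pattern.
escape_repl() {
    sed 's|[\\/&]|\\&|g'
}

# Demo on temporary files, standing in for src/page/file.md and top.html
tmp=$(mktemp -d)
printf '# Notes on TCP/IP\n\nSome text.\n' > "$tmp/file.md"
printf '<title>TITLE</title>\n' > "$tmp/top.html"

title=$(get_title "$tmp/file.md" | escape_repl)
sed "s/TITLE/$title/" < "$tmp/top.html"   # prints <title>Notes on TCP/IP</title>

rm -r "$tmp"
```

Without the escaping step, the slash in this title would end the replacement early and sed would stop with an error.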

Now we just need to do all of this recursively on the src folder. The final result looks something like this:

#!/bin/sh

recursivebuild() {
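    # Note: "local" is not POSIX, but most sh implementations support it
    local destdir="$(echo "$1" | sed 's|^src|http|')"
    mkdir -p "$destdir"
    for path in "$1"/*; do
        # Skip the unexpanded pattern when the directory is empty
        [ -e "$path" ] || continue
        file=$(basename "$path")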
        if [ -d "$1/$file" ]; then

            # Recursively build subdirectories
            mkdir -p "$destdir/$file"
            recursivebuild "$1/$file"
        else
            extension=$(echo "$file" | sed 's/.*\.//')
            if [ "$extension" = "md" ]; then

                # Process Markdown files, as above
                sed "s/TITLE/$(grep '^\# ' < "$1/$file" \
                    | head -n 1 \
                    | sed 's/^\# //')/" < top.html \
                    > "$destdir/index.html"
                lowdown "$1/$file" >> "$destdir/index.html"
                cat bottom.html >> "$destdir/index.html"
            else
                
                # Copy all other files as they are
                cp "$1/$file" "$destdir/$file"
            fi
        fi
    done
}

recursivebuild src

Extras: the blog index and RSS feed

The blog index page is also generated by the build script, but the corresponding Markdown file in src is not created by hand. Instead, this file is generated by scanning the src/blog subfolder. For each post, the date is deduced from the name of the folder containing the Markdown file, which always starts with the date itself in the yyyy-mm-dd format.

While we scan the blog directory to create a list of posts, we might as well make an RSS feed file for the blog. This is a file used by feed reader applications to check if there is any new post. The format is quite simple: check out mine.

The code to accomplish this looks something like this:

makeblog() {
    bf=src/blog/blog.md    # Blog index file
    ff=src/blog/feed.xml   # RSS feed file

    printf "# Blog\n\n[RSS Feed](feed.xml)\n\n" > $bf
    cp feed-top.xml $ff

    for i in $(ls src/blog | sort -r); do
        if [ -d src/blog/$i ]; then

            # Get basic data of the post (date, title)
            f="src/blog/$i/*.md"
            d=$(echo $i | grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2}')
            t=$(head -n 1 $f | sed 's/# //')

            # Add blog post to the list
            echo "* $d [$t]($i)" >> $bf

            # Create RSS feed item
            echo "<item>" >> $ff
            echo "<title>$t</title>" >> $ff
            echo "<link>https://sebastiano.tronto.net/blog/$i</link>" >> $ff
            echo "<description>$t</description>" >> $ff
            echo "<pubDate>$d</pubDate>" >> $ff
            echo "</item>" >> $ff
            echo "" >> $ff
        fi
    done

    # Close the RSS feed file
    echo "" >> $ff
    echo "</channel>" >> $ff
    echo "</rss>" >> $ff
}
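
Note that the title is written into the feed verbatim, so a title containing &, < or > would produce invalid XML. A minimal sketch of an escaping step that could be applied to the title before writing it (the name xml_escape is hypothetical and not part of the script above):

```shell
#!/bin/sh

# xml_escape: replace the characters that are not allowed in XML text
# with the corresponding entities. The ampersand must be handled first,
# otherwise the entities produced by the other rules would be escaped again.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

t='Pipes & redirection: using < and >'
echo "<title>$(printf '%s\n' "$t" | xml_escape)</title>"
# prints <title>Pipes &amp; redirection: using &lt; and &gt;</title>
```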

Deploying with make

Updating or adding a page is now very easy: I just need to edit the corresponding Markdown file, run ./build.sh to build the new html pages and run

rsync -rv --delete --rsync-path=openrsync http/ \
    tronto.net:/var/www/htdocs/sebastiano.tronto.net

to sync the http directory with my server. I need to use the --rsync-path option because the rsync binary has a different name on my local system (Linux) than on my server (OpenBSD). But apart from this the command is straightforward.

Of course I don’t want to type this lengthy command every time. It is very convenient in this case to write a short Makefile:

all: clean
	./build.sh

clean:
	rm -rf http
	mkdir -p http

deploy:
	rsync -rv --delete --rsync-path=openrsync http/ \
	    tronto.net:/var/www/htdocs/sebastiano.tronto.net

.PHONY: all clean deploy

This way I just need to run make to build and make deploy to upload the new files. Watch out: if you want to reproduce this on your system, make sure that the user on your server has sufficient permissions to run that rsync command - in particular you need write permission on the /var/www/htdocs folder.

If you are not familiar with the make(1) syntax, this step is completely optional and you can simply type the full commands every time, or make another small script called deploy.sh and run that instead.
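
Such a deploy.sh could look roughly like this. It is only a sketch that wraps the rsync command shown above; the DRY_RUN variable is an addition made up for illustration, so that the command can be printed instead of executed.

```shell
#!/bin/sh

# Sketch of a deploy script wrapping the rsync command shown above.
# DRY_RUN is a hypothetical switch: when set to 1, the command is
# printed instead of being executed.
deploy() {
    set -- rsync -rv --delete --rsync-path=openrsync http/ \
        tronto.net:/var/www/htdocs/sebastiano.tronto.net
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Only print the command here; drop DRY_RUN=1 to actually upload
DRY_RUN=1 deploy
```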

Follow-up?

I am sure my build scripts will keep evolving over time, so at some point I might write a new post about the same topic. I am also probably going to write something about how I generate my git page using stagit, if anything just to document my post-receive hooks. So, if you liked this post, stay tuned for more!