The man page reading club: sh(1) - part 2: commands and builtins
This post is part of a series
This is the second and last part of our exciting sh(1) manual page read. This time we are going to learn about commands and builtins. In case you have missed it, check out the first part where we dealt with the shell’s grammar.
I’ll spare you the fan fiction this time - let’s go straight to the technical part!
As usual, you can follow along at man.openbsd.org
Commands
The Commands section of the manual page starts like this:
The shell first expands any words that are not variable assignments or
redirections, with the first field being the command name and any
successive fields arguments to that command. It sets up redirections, if
any, and then expands variable assignments, if any. It then attempts to
run the command.
The next few paragraphs describe how the name of a command is
interpreted. There are two distinct cases: if the name contains
any slashes, it is considered as a path to a file; if it does not,
the shell tries to interpret it as a special builtin, as a shell
function, as a non-special builtin (the difference between these
two types of builtins will be explained later) or finally as the
name of an executable file (binary or script) to be looked for in $PATH.
The meaning of this variable is explained in the ENVIRONMENT
section:
PATH Pathname to a colon separated list of directories used to search for
the location of executable files. A pathname of `.' represents the
current working directory. The default value of PATH on OpenBSD is:
/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/local/bin
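To see the two cases in practice, here is a minimal sketch built on the standard `command -v` builtin. The function name `describe_lookup` is made up for this illustration, and the output for external commands depends on your own $PATH.

```shell
# Hypothetical helper: report how the shell would interpret a command name.
describe_lookup() {
    case "$1" in
    (*/*)
        # Any slash means the name is treated as a path to a file.
        echo "$1: contains a slash, so it is used as a path"
        ;;
    (*)
        # No slash: builtins, functions and $PATH are consulted.
        if command -v "$1" >/dev/null 2>&1
        then
            echo "$1: resolved as $(command -v "$1")"
        else
            echo "$1: not found"
        fi
        ;;
    esac
}

describe_lookup ./myscript
describe_lookup cd
describe_lookup ls
```

For a builtin such as cd, `command -v` prints just the name; for an executable found in $PATH it prints the full path.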
Grouping commands
The manual page continues by explaining how to group commands together to create more complex commands. There are five ways to create a list of commands, and their syntax is always of the form
command SEP command SEP ...
where SEP is one of the separators described below.
- Sequential lists: One or more commands separated by a semicolon ; are executed in order, one after the other.
- Asynchronous lists: One or more commands separated by an ampersand & are executed in parallel, each in a different subshell.
- Pipelines: Two or more commands separated by a pipe | are executed in order, using the output of each command as input for the next one. Together with I/O redirection, which we saw last time, pipelines are one of the “killer features” of UNIX that make its shell such a powerful language that it is still widely appreciated more than fifty years after its introduction.
- AND lists: Two or more commands separated by a double ampersand && are executed in order, but a command is only run if the exit status of the previous command was zero.
- OR lists: Two or more commands separated by a double pipe || are executed in order, but a command is only run if the exit status of the previous command was different from zero.
The AND and OR lists can be combined by using a mix of && and ||. The two operators have the same precedence, so a mixed list is evaluated from left to right.
The exit status of a list of commands is equal to the exit status
of the last command executed, except for asynchronous lists where
the exit status is always zero. For pipelines, the exit status can
be inverted by putting an exclamation mark ! at the beginning of
the list.
Now that I think about it, I have mentioned the exit status of a command a few times here and in the last episode, but I have never explained what it is. Basically, every command concludes its execution by returning a number (its exit status), which is zero to indicate a successful execution or anything different from zero to indicate a failure. This will become even more relevant soon.
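A few lines are enough to try out AND lists, OR lists and inverted pipelines. This is only a sketch: `status_of` is a helper name invented for this example, and grep(1) is used simply as a command that fails when it finds no match.

```shell
# Hypothetical helper: run a command and print its exit status.
status_of() {
    if "$@"; then echo 0; else echo "$?"; fi
}

status_of true      # prints 0
status_of false     # prints a non-zero status

# AND list: the second command runs only if the first one succeeded.
true && echo "AND: ran because true exited with 0"

# OR list: the second command runs only if the first one failed.
false || echo "OR: ran because false exited with non-zero"

# Same precedence, read left to right: the echo after || is the one that runs.
false && echo "skipped" || echo "fallback"

# The status of a pipeline is that of its last command; ! inverts it.
if ! echo "hello" | grep -q "bye"
then
    echo "no match, so the inverted status is 0"
fi
```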
Finally, a list of commands can be treated as a single command by enclosing it in parentheses or in braces:
Command lists, as described above, can be enclosed within `()' to have
them executed in a subshell, or within `{}' to have them executed in the
current environment:
(command ...)
{ command ...; }
Any redirections specified after the closing bracket apply to all
commands within the brackets. An operator such as `;' or a newline are
needed to terminate a command list within curly braces.
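The difference between the two kinds of grouping is easiest to see with a variable assignment. In this small sketch (the function name `scope_demo` is made up for the example) the assignment inside parentheses is lost when the subshell ends, while the one inside braces sticks.

```shell
scope_demo() {
    x=outer

    # Parentheses: the list runs in a subshell, so x is unchanged afterwards.
    ( x=changed )
    echo "after (): $x"

    # Braces: the list runs in the current environment, so x is changed.
    { x=changed; }
    echo "after {}: $x"
}
scope_demo

# A redirection after the closing brace applies to every command in the group.
{ echo "both of these lines"; echo "go to standard error"; } >&2
```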
Flow control
Much like any imperative programming language, the shell has some constructs that allow controlling the flow of the execution. The for loop is perhaps the most peculiar one. Its format is:
for name [in [word ...]]
do
command
...
done
The commands are executed once for every item in the expansion of
[word ...], and each time the variable name is set to one of these
items (check the last episode for an explanation of text expansion).
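As a quick illustration, here is a loop over an explicit list of words, followed by one without the `in` part, which in POSIX shells iterates over the positional parameters. The function `greet_all` is just a hypothetical wrapper that provides those parameters (functions are covered further down).

```shell
# Explicit word list: quoting keeps "dragon fruit" as a single item.
for fruit in apple banana "dragon fruit"
do
    echo "I like $fruit"
done

# No "in": the loop variable takes the value of each positional parameter.
greet_all() {
    for name
    do
        echo "Hello, $name"
    done
}
greet_all Alice Bob
```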
While loops are perhaps more familiar to regular programmers: a command called condition is run, and if its exit code is zero the body of the while loop is executed; then condition is run again, and so on until it exits with a non-zero status. The format is
while condition
do
command
...
done
There is an opposite construct with until in place of while, which
executes the body as long as condition exits with non-zero status.
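The two loops can be compared side by side. In this sketch the condition is the `[` command, i.e. test(1), which is not covered in this post; the function names `countdown` and `count_up` are made up for the example.

```shell
# while: repeat as long as the condition exits with status zero.
countdown() {
    n=$1
    while [ "$n" -gt 0 ]
    do
        echo "$n"
        n=$((n - 1))
    done
    echo "liftoff"
}

# until: repeat as long as the condition exits with non-zero status.
count_up() {
    n=1
    until [ "$n" -gt "$1" ]
    do
        echo "$n"
        n=$((n + 1))
    done
}

countdown 3
count_up 3
```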
A case conditional can be used to run commands depending on something matching a pattern. The format is
case word in
(pattern [| pattern ...]) command;;
...
esac
where pattern can be expressed using the usual filename globbing
syntax that we briefly covered last time - see glob(7) for more
details.
As an example, this short code snippet tries to determine the type of the file given as first argument from its extension:
case "$1" in
(*.txt) echo "Text file";;
(*.wav | *.mp3 | *.ogg) echo "Music file";;
(*) echo "Something else";;
esac
Note the double quotes around $1: quoting is a habit worth keeping,
so that file names with spaces are never treated as multiple words.
The if conditional is also a classic construct that programmers are very familiar with. Its general format is
if conditional
then
command
...
elif conditional
then
command
...
else
command
...
fi
As with the while construct, conditional is a command that is
run and whose exit status is evaluated. elif is just short for
“else, if…”.
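Since the conditional is just a command, anything that sets an exit status can be used, including a pipeline. The following sketch uses grep(1) and the `[` command (test(1), not covered here) as conditions; the function name `describe` is invented for the example.

```shell
describe() {
    # First condition: a pipeline whose status is that of grep.
    if printf '%s\n' "$1" | grep -q '^[0-9][0-9]*$'
    then
        echo "number"
    # Second condition: test(1) checking for an empty string.
    elif [ -z "$1" ]
    then
        echo "empty"
    else
        echo "text"
    fi
}

describe 42
describe ""
describe "4 apples"
```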
Finally, the shell also has functions, which are basically groups of commands that can be given a name and executed by using that name as a command. Their syntax may be simpler than you expect:
function() command-list
When defining functions it is common to write command-list in the
{ command ; command ; ... ; } format. Replacing the semicolons
with newlines we get the more familiar-looking structure
function() {
command
command
...
}
Builtins
The builtins are listed in alphabetic order in the manual page, which is very convenient when consulting it for reference, but it is not the best choice for a top-to-bottom read. So I’ll shuffle them around and divide them into a few groups. I’ll skip some stuff, but I’ll try to cover what is important for regular use.
But first, as promised at the beginning of the previous section, we need to explain the difference between “special” and regular builtins.
A number of built-ins are special in that a syntax error can cause a
running shell to abort, and, after the built-in completes, variable
assignments remain in the current environment. The following built-ins
are special: ., :, break, continue, eval, exec, exit, export, readonly,
return, set, shift, times, trap, and unset.
More programming features
As we have seen, the shell language includes some classical programming
constructs, like if and while. There are more builtins that can be
helpful with these constructs: for example, true and false are builtins
that do nothing and return a zero and a non-zero value respectively,
thus acting as a sort of “boolean variables”.
The builtins break and continue, used inside a loop of any kind,
behave exactly as in C. The builtin return is used to exit the current
function. An exit code may be specified as a parameter, to indicate
success (0) or failure (any other number).
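Here is a small sketch that puts continue, break and return to work inside functions. The names `first_even` and `sum_until_zero` are made up for the example, and `[` is test(1), which this post does not cover.

```shell
# Print the first even argument; fail if there is none.
first_even() {
    for n
    do
        if [ $((n % 2)) -ne 0 ]
        then
            continue    # odd: skip to the next argument
        fi
        echo "$n"
        return 0        # found: leave the function with success
    done
    return 1            # nothing found: report failure
}

# Add up the arguments, stopping at the first zero.
sum_until_zero() {
    total=0
    for n
    do
        if [ "$n" -eq 0 ]
        then
            break       # leave the loop, but not the function
        fi
        total=$((total + n))
    done
    echo "$total"
}

first_even 3 7 10 12
first_even 1 3 || echo "no even number"
sum_until_zero 1 2 0 5
```

Because return sets the exit status of the function, the function can be used directly in AND and OR lists or as the condition of an if.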
Variables
The builtin read can be used to get input from the user - or
indeed from anywhere else, thanks to redirection:
read [-r] name ...
Read a line from standard input. The line is split into fields, with
each field assigned to a variable, name, in turn (first field
assigned to first variable, and so on). If there are more fields
than variables, the last variable will contain all the remaining
fields. If there are more variables than fields, the remaining
variables are set to empty strings. A backslash in the input line
causes the shell to prompt for further input.
The options to the read command are as follows:
-r Ignore backslash sequences.
As an example of reading from something other than standard input, this short script takes a filename as an argument and prints each line of the file preceded by its line number:
i=0
while read line
do
i=$((i+1))
echo "$i: $line"
done < "$1"
Notice that the redirection < "$1" is placed at the end of the while
command, after the closing done. The double quotes keep the script
working when the file name or a line contains spaces.
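The field splitting described in the manual excerpt above can be tried without typing anything at a prompt, by feeding read from a here-document. This is only a sketch, and `split_line` is a name chosen for the example.

```shell
# Hypothetical helper: split its argument into three variables with read.
split_line() {
    read -r first second rest <<EOF
$1
EOF
    echo "first=$first second=$second rest=$rest"
}

# More fields than variables: the last variable gets everything that is left.
split_line "one two three four five"

# More variables than fields: the extra variables are set to empty strings.
split_line "solo"
```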
The builtins export and readonly deal with permissions: the
first is used to make a variable visible to all subsequently run
commands (by default it is not), while the latter is used to make
a variable unchangeable. The syntax is the same for both:
command [-p] name[=value]
If =value is given, the value is assigned to the variable before
changing the permissions. The option -p is used to list all
the variables that are currently exported or set as read-only.
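A child shell makes the effect of export visible, and a subshell is a safe place to try to overwrite a read-only variable. In this sketch the function and variable names are invented for the example; the failed assignment is wrapped in parentheses because some shells exit when it happens.

```shell
export_demo() {
    private="not exported"
    shared="exported"
    export shared
    # The child shell only receives the exported variable.
    sh -c 'echo "child sees: [$private] [$shared]"'
}

readonly_demo() {
    # The assignment to a read-only variable fails; the error message is hidden.
    ( readonly pi=3; pi=4 ) 2>/dev/null || echo "assignment refused"
}

export_demo
readonly_demo
```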
Running commands
If you want to run the commands contained in file, you can do so
by using . file (the single dot is a builtin). For example, you
can list some commands that you want to run at the beginning of
each shell session (e.g. aliases, see the next section) and run
them with just one command. Many other shells, such as ksh, run
certain files like .profile at startup, but sh does not.
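What makes the dot builtin different from simply running the file with sh is that the commands are executed in the current shell, so variables set by the file are still there afterwards. A minimal sketch, assuming mktemp(1) is available; the function name `source_demo` and the temporary file are stand-ins for a real startup file.

```shell
source_demo() {
    rc=$(mktemp) || return 1
    echo 'greeting="set by the file"' > "$rc"

    greeting="unchanged"
    sh "$rc"        # runs in a child shell: our variable is not touched
    echo "after sh: $greeting"

    . "$rc"         # runs in the current shell: the assignment sticks
    echo "after dot: $greeting"

    rm -f "$rc"
}
source_demo
```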
If the commands you want to run are saved in variables or other
parameters you can use eval. For example, the following script
takes a command and its arguments as parameters, runs them and
prints a different message depending on the exit code:
if eval "$@"
then
echo "The command $* ran happily"
else
echo "Oh no! Something went wrong!"
fi
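To see why eval is needed at all, compare what happens when a variable holding a pipeline is expanded directly and when it is passed to eval. This is a sketch; `run_stored` is a name made up for the example.

```shell
cmd='echo hello | tr a-z A-Z'

# Plain expansion: the pipe is just another word, so echo receives
# everything as arguments and prints: hello | tr a-z A-Z
$cmd

# eval parses the string again as shell input, so the pipeline really runs.
run_stored() {
    eval "$1"
}
run_stored "$cmd"
```

The flip side is that eval runs whatever text it is given, so it should only be used on strings you trust.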
Aliases
Aliases provide a nice shortcut sometimes, for example for shortening a long command name or for adding a certain set of options by default.
Using alias name=value makes it so that every time name is read by
the shell as a command (i.e. not when it is an argument) it is
replaced by value. For example, alias off='shutdown -p now' lets
you call the shutdown command with the common options -p now by
typing just off - check out an older blog entry to learn about
this surprisingly feature-rich command!
Using just alias name tells you the value of the corresponding alias,
if it is set. Using alias with no argument returns a list of all
currently set aliases. Contrary to variables, aliases are visible in
every subshell.
Finally, unalias name can be used to unset the corresponding
alias; unalias -a unsets all currently set aliases.
Moving around directories
Next (a meaningless word, since we are going in our own completely
arbitrary order) we have cd and pwd, which can be used to move around
in the directory tree.
pwd simply prints the current path - it is short for “Print Working
Directory”. The working directory is where files are looked for by
the shell, for example when used as arguments for commands. If a
file is not in the current working directory, its full path has to
be specified in order to refer to it.
The working directory can be changed with cd path/to/new/directory.
If the path is not specified, it defaults to $HOME, the home
directory of the current user. The path can also be a single dash -,
meaning “return to the previous working directory”. Finally,
if the path does not start with a slash and is not found relative
to the current working directory, the variable CDPATH, which
should contain a colon-separated list of directories, is read to
try and find the new directory starting from there.
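The following sketch tries out CDPATH and cd - in a throwaway directory tree. It assumes mktemp(1) with the -d option is available; the function name `cd_demo` and the directory names are made up. The function body is written in parentheses so that it runs in a subshell and leaves your own working directory alone.

```shell
cd_demo() (
    base=$(mktemp -d) || exit 1
    mkdir -p "$base/projects/blog"

    cd "$base"
    # "blog" does not exist in $base, but it is found through CDPATH.
    CDPATH="$base/projects"
    cd blog >/dev/null
    echo "now in: ${PWD##*/}"

    # A single dash returns to the previous working directory.
    cd - >/dev/null
    [ "$PWD" = "$base" ] && echo "back where we started"

    cd /
    rm -rf "$base"
)
cd_demo
```

The output of cd is sent to /dev/null because the shell prints the new directory when it is found through CDPATH or reached with a dash.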
Jobs
The builtins jobs, kill, bg and fg can be used to manage multiple
jobs running in the same shell. For example, you can run a command in
the background with command &, and later kill it with kill [id] or
bring it to the foreground with fg [id] (the id of the command will
be printed by the shell when you run command &).
I wanted to write something more about this, but I found the man page for sh a bit lacking. I had to rely on other resources, such as the manual page of ksh(1). I think I’ll postpone job control to another entry. Stay tuned!
Update: here is the post on job control.
And finally…
exit [n]
Exit the shell with exit status n, or that of the last command executed.
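Both forms can be observed from the outside by starting a child shell and looking at the status it leaves behind. A minimal sketch; `exit_demo` is a name invented for the example.

```shell
exit_demo() {
    # Explicit status: the child shell exits with 3.
    sh -c 'exit 3' || echo "explicit status: $?"

    # No argument: the status is that of the last command executed (false).
    sh -c 'false; exit' || echo "status of the last command: $?"
}
exit_demo
```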
Conclusion
I have skipped a few sections of the man page and many of the builtins, but I am happy with the result and I think we can end it here. After all, if I did not make any selection at all for these “reading club” entries, you could just read the manual page yourself, so what would the point be?
I am not sure what I am going to cover in the next episode. On the one hand I should alternate between shorter pages and longer ones, mainly to avoid burning out by taking on too many huge projects. But on the other hand long pages are often more interesting.
Anyway, I hope you enjoyed this long double-post and that you may have learnt something new. See you next time!
Next in the series: tetris(6)