The man page reading club: dc(1)
This post is part of a series
For this episode I have decided to go back to the basics, in multiple
ways. Indeed dc
, the desk calculator, is:
- A calculator, the most basic functionality for a computer to be called so - “computer” literaly means “calculator”.
- A stack-machine, one of the most basic Turing-complete computational models.
- One of the oldest UNIX utilities, predating even the C language - in fact, it was originally written in B.
But is it also a practical tool to use? Let’s find out!
dc(1)
Follow along at man.openbsd.org
There are a few features marked as non-portable in the manual page, most of them relevant to OpenBSD’s bc implementation. To make the post a bit shorter, I have decided to skip all of them.
The first few lines of the manual page explain that dc
uses
reverse Polish notation:
numbers can be pushed onto a stack, and operations are performed on the top
(or top two) numbers on the stack, their result being pushed back onto the
stack to replace the operands.
dc
allows to set an arbitrary precision (here called scale), as well
as different bases for input and output - for example, you may want to
input your numbers in binary and read the output in hexadecimal. The
output base can be any number greater than 1, but the input base must
be between 2 and 16.
The most basic operation you can perform is simply pushing a number
onto the stack. Letters A to F can be used to input numbers in bases
higher than 10, and negative numbers are written with an underscore _
instead of dash -
.
The commands are listed in alphabetic order in the manual page, but I will instead separate them in more logical sections.
Basic operations
The most basic operations are +
(sum), -
(subtraction), *
(multiplication), /
(division), %
(remainder or modulus) and
^
(exponentiation). There is also v
(square root).
For example the command 4 7-
results in -3
. You can input it
like that, all on one line and without any whitespace between the
7
and the -
. But if you do, you won’t get any output. Why?
Stack manipulation
Operations remove one or more numbers from the stack and push back the result. So the answer to the previous question is: the result was pushed onto the stack, but no instruction was given to print it.
This can be done with the command p
, which prints the top number in
the stack. The command f
prints the whole stack. Both of them leave
the stack unchanged. So for example:
$ echo '4 7-p' | dc
-3
(Notice how we redirected the output of echo
to be read by dc
-
I’ll never get tired of
this)
Commands that manipulate the stack are c
to clear the whole stack
and d
to duplicate the top element. The command z
pushes onto the
stack the number of elements currently on the stack.
Scale and bases
As mentioned at the beginning, some global parameters can be set:
input base, output base and scale. This can be done with the commands
i
, o
, and k
, respectively: each of them pops the top element
of the stack and uses it as value to set the respective global parameter.
The capitalized version of these commands, I
, O
and K
, read
the value of the input base, the output base or scale respectively
and push it onto the stack
Each number on the stack has its own scale, too. This value is derived from the global scale and the scales of the operands used to compute it. More precisely:
For addition and subtraction, the scale of the result is the maximum
of scales of the operands. For division the scale of the result
is defined by the scale set by the k operation. For multiplication,
the scale is defined by the expression min(a+b,max(a,b,scale)),
where a and b are the scales of the operands, and scale is the scale
defined by the k operation. For exponentiation with a non-negative
exponent, the scale of the result is min(a*b,max(scale,a)), where
a is the scale of the base, and b is the value of the exponent. If
the exponent is negative, the scale of the result is the scale
defined by the k operation.
The command X
can be used to replace the top number with its
scale. Similarly, the command Z
replaces the top number with its
length, i.e. its number of digits (not counting eventual decimal point
or negative sign).
Registers and arrays
So far we have seen that dc
can do everything that a rather basic
RPN calculator can do. Things are going to get
much more interesting in the next two sections.
dc
allows the use of 256 registers to store data. Each register
is labelled by a single byte - in practice, an ASCII character.
This character can be anything, even a whitespace or a non-printable
character, so make sure not to put unneeded whitespace before a
register name.
The actual structure of registers was not very clear to me from the manual page. I had to read the relevant section and command descriptions a few times, and in the end I resorted to the ultimate technique: try it out. (I skipped the “read the source code” step, please forgive my impurity.)
It turns out that each register is a stack, each level of which contains both a single number and an unbounded array of numbers. The single number and the array can be manipulated separately. All the values default to 0 if unset.
The command sr
can be used to pop the top element of stack and
save it as the “single” value of register r
. You can replace r
by any other ASCII character to manipulate other registers. To load
the “single” value from register r
onto the main stack, you can
use lr
; this command does not alter the state of the register.
To manipulate a register’s array, you can use ;r
and :r
:
:r Pop two values from the stack. The second value on the stack is
stored into the array r indexed by the top of stack.
;r Pop a value from the stack. The value is used as an index into
register r. The value in this register is pushed onto the stack.
So for example 42 3:r
stores the number 42 in the third position of
the array of register r
, and 3;r
retrieves this value.
So far so good. But I said that each register is actually a stack. What did I mean by that?
The commands Sr
and Lr
(capital S and L) can be used for this: Sr
creates a new stack level on register r
, pops the top value of the
main stack, and saves that value as the “single” value. In doing so,
a new level of the register’s array is also created. Conversely, Lr
pops a level of register r
and pushes its single value onto the main
stack, deleting the whole array saved on the level that was popped.
Let’s work out an example to help us understand this. First, we push
some numbers on register a
:
1sa
100 0:a 101 1:a 102 2:a
Now register a
looks something like this:
Level 1 --- single value: 1 --- array: 100 101 102 0 0 ...
You can confirm this by running the commands la 0;a 1;a 2;a f
,
which should output the numbers 102, 101, 100 and 1, one per line.
Now let’s push another level onto the register with 2Sa
. The
register now looks something like this:
Level 2 --- single value: 2 --- array: 0 0 0 0 0 ...
Level 1 --- single value: 1 --- array: 100 101 102 0 0 ...
Running the same command as before (la 0;a 1;a 2;a f
) should
now yield 0, 0, 0, 2.
Lastly, let’s pop the top level of this register with La
. Now
it should look like this again:
Level 1 --- single value: 1 --- array: 100 101 102 0 0 ...
And you can check this with the usual command. If you do, you’ll
notice that the number 2
has also been pushed on the main stack
by the La
command.
Phew, this was a long one! And we have not reached the most interesting part yet…
Strings and macros
In dc
you can work not only with numbers, but also with strings.
You can input a string by enclosing it in square brackets, like
this: [Hello, World!]
. Square brackets can appear in a string
if they are either balanced or escaped by a backslash.
Strings can be pushed onto the main stack or saved in any register
like numbers. But what can you do with them? One thing you can do
is print them with the P
command:
[Hello, World!
]P
Hello, World!
As you can see, it is very easy to include a newline in a string.
But much more interesting is the fact that you can execute strings
with the x
command. This allows you to create macros. For example,
say you want to evaluate the function p(x)=x^2+2x-1
. Since we are
working in RPN, it is probably easier to rewrite p(x)
as
x(x+2)-1
. If your number x
is on the stack, you can compute p(x)
with the commands d2+*1-
. But what if you want to do this
multiple times? Here macros can help:
[d2+*1-]sp
Now we have saved the macro “evaluate p(x)” on the register p
. We
can execute it any time we want by loading it with lp
and then
executing it with x
:
3 lpx
_2 lpx
1 lpx
f
Should give 2, -1, 14.
Conditionals
Lastly, we can control the flow of macro execution using conditionals:
<x >x =x !<x !>x !=x
The top two elements of the stack are popped and compared.
Register x is executed if they obey the stated relation.
Let’s see a simple example: computing the average of all numbers on the stack.
First we need to save the number of elements somewhere, say in the
register n
. We can do this with zsn
. Then we need to sum the
whole stack. We can do this by calling +
until the stack is only
one element left… this sounds like a loop, but we can use recursion
instead:
[+z1<a]sa
This saves the the macro [+z1<a]
in register a
, achieving recursion:
the macro starts by summing the top two numbers, then pushes the number
of elements left onto the stack with z
, followed by one. It then pops
these two numbers and calls itself if the top one is less then the second.
Putting this all together, we can compute the average of a bunch of numbers, say to two decimal digits, like this:
10 12 11 9 8 10 11 10 10
2k
[+z1<a]sa
zsnlaxln/p
Not the most legible code, but quite short!
Conclusion
In the end I managed to write a rather lengthy post about something as
simple as a desk calculator. And I have even skipped some things, like
recursion levels and the ?
command!
Initially I wanted to write about
bc(1), the other standard UNIX
calculator. It works with the more familiar infix notation and has
for loops, if / else statements and functions. I even wrote a
small library of mathematical functions
to show off! But in the end I thought it would be boring, so I decided
to learn and write about dc
instead. In practice I am likely going to
use bc and my hand-written math library for most purposes - except
maybe computing averages, that was one example where the terseness of
dc
can come in handy.
Fun fact (from the bc manual page):
bc is actually a preprocessor for dc(1), which it invokes automatically,
unless the -c (compile only) option is present. In this case the
generated dc(1) instructions are sent to the standard output, instead
of being interpreted by a running dc(1) process.
I think it would be a fun excercise to try and re-implement dc
, and
then bc as a compiler to dc
code. I could learn a few things about
compilers with this project! But for now I’ll have to put it in the
ever-growing list of “one day, maybe” ideas.