A non-standard positional notation is one where the value of each position isn’t necessarily a straightforward power of the radix. I also include systems where the radix is not a positive integer (such as -2), even though mathematically such a representation is consistent with standard positional notation. By altering the interpretation of one or more of the place values (or the radix) of a binary representation, we are able to represent negative values. In this post I’ll be covering sign-magnitude (the most intuitive method), the radix complement methods (one’s complement and two’s complement), offset binary (also known as excess-k or biased), and base -2 (base negative two).

This post (Negative Binary Numbers) was first published on How To Program Anything by Asher Wolfstein.

This is the fifth article in a series whose intention is to have the reader able to understand binary, octal, and hexadecimal; three radices of great importance to contemporary computer theory. Up to now I have covered reading and converting integer values into binary, octal, and hexadecimal, performing arithmetic, and basic Boolean operations. Now I shall delve into non-standard positional notations. In this article, I will examine systems that allow us to represent negative binary numbers and use those negative values in computations.

In the following sections, the terms *most significant bit (MSB)* and *least significant bit (LSB)* pop up. These refer to two ends of a binary representation. As you might guess, the most significant bit is the bit that, if toggled, has the greatest effect on the resulting interpretation. In our representations, it is the left-most bit, being the highest power of 2.

Likewise, the least significant bit is the bit that, if toggled, has the smallest effect on the resulting interpretation. In our representations this is the right-most bit, carrying a place value of only 1.

In *sign-magnitude* representation, also called *sign-and-magnitude* or *signed magnitude,* one bit in a series of bits is assigned the role of the *sign bit* while the others play the part of *magnitude*. The sign bit is often the most significant bit in the sequence and is usually set to 0 for a positive value, and 1 for a negative value. The magnitude is read as a traditional binary number and then multiplied by a positive 1 (if the sign bit is positive: 0) or a negative 1 (if the sign bit is negative: 1).

In our examples for this post, we’ll be using a byte, defined as eight bits, to illustrate the systems. In this setup, we are able to represent values in the range of -127_{10} to 127_{10}. In sign-magnitude, to change the sign of a value from a negative to a positive, or vice-versa, one simply has to toggle the sign-bit. For example, 27_{10} can be encoded as 00011011_{2}, and -27_{10} encoded as 10011011_{2}. Note how the most significant digit on the left was switched from a 0 to a 1.
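As a rough sketch of the encoding described above (the function names here are my own, and Python is used purely for illustration), sign-magnitude can be expressed like so:

```python
def sm_encode(value, bits=8):
    """Encode an integer as a sign-magnitude bit string (MSB = sign bit)."""
    if not -(2**(bits - 1) - 1) <= value <= 2**(bits - 1) - 1:
        raise ValueError("value out of range for this many bits")
    sign = '1' if value < 0 else '0'
    # the remaining bits hold the magnitude as ordinary binary
    return sign + format(abs(value), f'0{bits - 1}b')

def sm_decode(bit_string):
    """Decode a sign-magnitude bit string back to a signed integer."""
    sign = -1 if bit_string[0] == '1' else 1
    return sign * int(bit_string[1:], 2)
```

Note that negating a value really is just toggling the first character of the string, exactly as described above.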

IBM was an early supporter of sign-magnitude, utilizing it in their 704, 709, and 7090 mainframe models. Though this method is the easiest to reason about and most intuitive, it comes with some drawbacks, particularly in implementation. For example, you might have noticed there are two different representations for the value 0. In our example, 00000000_{2} and 10000000_{2} both represent 0, with one being *-0.* A consequence of two representations for 0 is that in comparing values to 0, you must perform two checks instead of one.

Another downside, in comparison to the next methods of sign representation, is that addition and subtraction require different circuitry logic depending on the sign bit. If a computer were to operate on sign-magnitude representations it would need to be equipped with logic circuitry for both addition and subtraction. The next couple of methods of representation, using radix complements, avoid this issue.

Despite these shortcomings, sign-magnitude does find a place in the modern computing world. Floating-point values, those represented by a *significand* and an *exponent* component, often use a sign-bit in the leading or most significant bit to indicate negative or positive values.

The next couple of methods for encoding negative values in binary operate on what is known as the *method of complements* technique. This technique allows for encoding both negative and positive values in such a way that the calculator only needs to use one algorithm to compute both addition and subtraction. This is ideal in terms of hardware as extra circuitry means greater complexity and higher production costs.

The method of complements technique relies on the concept of *additive inverses.* The additive inverse of a given number *x* is a number that produces zero when added to *x*. This is also known as the *opposite*, *sign change*, or *negation*. In practice, the concept is simple: the additive inverse of 5 is -5, 23 is -23, 1 is -1, and so on.

To implement the method of complements technique we half the possible representations of a set number of digits in such a way that half of them represent positive integers, and the other half represent each positive integer’s additive inverse. Each pair, composed of the positive number, and the additive inverse is called a *complement*. The digit values are arranged in such a way in these representations that subtraction of any number is accomplished by adding its complement.

How is this accomplished; how does one arrange the digits to allow this to happen? To answer that we must delve into the concept of radix complements.

The *radix complement* of any value can be summed up in a simple equation: *b ^{n} – x*. Here *b* is the radix (the base) and *n* is the number of digits in the representation. For example, in decimal (*b* = 10) with three digits (*n* = 3), the radix complement of 255 is 10^{3} – 255 = 745.

The above decimal radix complement is often called *ten’s complement*. It is referred to as ten’s complement because, for a single digit, the complement is the difference between the digit and ten. A variation on the radix complement is the *diminished radix complement*, in which you subtract one from the computed power of the base before subtracting the value: *(b ^{n} – 1) – x*. This can be seen in the diagram below:

The above diminished decimal radix complement is often called the *nine’s complement* due to the fact that each digit of the complement is arrived at by subtracting from 9 (the base – 1). As you might notice from the equation given, you can compute the ten’s complement by simply adding one to the nine’s complement.
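These two complements can be sketched directly from the equations above (function names are my own):

```python
def nines_complement(x, n):
    """Diminished radix complement in base 10: (10**n - 1) - x.
    Equivalently, each digit of x is subtracted from 9."""
    return (10**n - 1) - x

def tens_complement(x, n):
    """Radix complement in base 10: 10**n - x,
    i.e. the nine's complement plus one."""
    return nines_complement(x, n) + 1
```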

Radix complements are useful because you can do subtraction using radix complements by actually performing addition. As noted previously, this is advantageous for hardware engineers as only one algorithm must be encoded into logic circuitry.

*But how does one do this?*

There are two main methods of performing subtraction through addition using radix complements. The first method utilizes diminished radix complements, and the second uses a “plain” radix complement.

The first method relies on the following mathematical method.

Suppose we are subtracting *y* from *x*. The diminished radix complement of *y* is *b ^{n} – 1 – y* as outlined above. If we were to add this to *x* we would get *x + (b ^{n} – 1 – y)*, which is the same amount as *(x – y) + (b ^{n} – 1)*. When *x ≥ y*, this sum spills one digit past our *n* digits; stripping that leading overflow digit (subtracting *b ^{n}*) and then adding 1 leaves exactly the difference *x – y*.

The second method is a little shorter and relies on the radix complement rather than the diminished radix complement. In this method, you add the radix complement of *y* to *x*, resulting in the amount *x + (b ^{n} – y)*. If *x ≥ y*, this sum is at least *b ^{n}*, so stripping the leading overflow digit (subtracting *b ^{n}*) leaves the difference *x – y* directly, with no extra 1 to add.

Another way to do the second method is to use the diminished radix complement in the addition and then add 1 to the sum, since *b ^{n} = (b ^{n} – 1) + 1*. This yields *x + (b ^{n} – 1 – y) + 1 = x + (b ^{n} – y)*, the same amount as before.

There’s one caveat with this method that can trip up new calculators. If the subtrahend *y* has fewer digits than the minuend *x* the method won’t work unless you add additional digits to *y* before finding the radix complement. In the case of decimal, you would take the nine’s complement of 0, resulting in additional 9 digits. You can see this in action in the following figure:
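A minimal sketch of the second method, assuming *x ≥ y* and that both values fit in *n* digits (the padding caveat above is handled by always complementing against the full *n*-digit width):

```python
def subtract_via_complement(x, y, n):
    """Compute x - y (for x >= y) by adding the ten's complement of y.
    Taking the complement against 10**n implicitly pads y to n digits."""
    comp = 10**n - y          # ten's complement of the padded subtrahend
    total = x + comp          # addition stands in for subtraction
    return total - 10**n      # strip the leading overflow digit
```

For example, 721 − 38 becomes 721 + 962 = 1683; dropping the leading 1 gives 683.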

Earlier I wrote that “half of [the representations would] represent positive integers, and the other half represent each positive integer’s additive inverse.” *What did I mean by this?*

Suppose that we were to treat radix complements as negative values rather than simply numbers arrived at by a calculation. To do so, we’d have to designate a set number of digits that we’re going to use (to establish the range). Let’s specify three digits: 000 – 999. We could split the range available to us (1000 representations) into two sections: positive integers (000 – 499), and negative numbers (their additive inverses) (500 – 999). This means that 000 and 999 would represent 0 and -0, 001 and 998 would represent 1 and -1 respectively, and so on. 998 is 001’s additive inverse in this respect because we are only allowed three digits: adding the pair together using normal decimal techniques (not our representations) gives 999, which in this system is -0, in other words zero. If we were to place these representations on a number line it would look like this:

Don’t get tripped up by their decimal values! In this system 749 is a *symbol* and is **actually equal to -250.** Using this strange number line, and confining ourselves to three digits, we can represent negative values without using a negative sign.

If we take a positive representation, and a negative representation and add them together as normal decimal values, the result is the correct placement on the number line. The key is that when we add two *symbols* from this number line we add them as if they are normal numbers (like in the above diagrams) rather than add the values they represent. When we map the sum (by subtracting by 999, see above) back to the number line it gives us the correct value. Let’s visualize this:

Notice how on the left calculation, where the result is a positive number that “overflows” over the maximum number of digits (3), we subtract 999 to achieve the correct result. We could have also simply stripped off the leading one, and added one to accomplish the same thing. On the right-hand side, there was no overflow, meaning we didn’t have to subtract *b ^{n}-1*. The sum on our modified number line is the correct value. This mapping is known as the *nine’s complement* representation of negative decimal numbers.

I have now demonstrated in decimal one of the most common methods of representing negative values in computer processors. By altering the radix of the above process from ten to two, we achieve in binary what is known as *one’s complement*. Because binary has only two digits (0 and 1), one’s complement has at least one interesting property.

In one’s complement, negative binary numbers are simply the inverse (logical NOT) of their positive counterparts. This means, continuing our byte example, that 46_{10} in binary 00101110_{2} has a complement of 11010001_{2}. This looks like 209_{10}, but remember our number line (now in binary). It isn’t so much what its apparent value is, as it is what value it represents in our system:
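The “inverse of the positive counterpart” rule can be sketched like this (the function name is mine; Python’s `~` operator combined with a mask performs the bitwise inversion):

```python
def ones_complement(value, bits=8):
    """Return the one's-complement bit string of a signed integer.
    Negative values are the bitwise inverse of their positive form."""
    if value >= 0:
        return format(value, f'0{bits}b')
    mask = (1 << bits) - 1
    # invert the magnitude's bits, restricted to the given width
    return format(~(-value) & mask, f'0{bits}b')
```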

The division of the representations along the number line is more elegant this time around. The leftmost bit (the most significant bit in our representation) clearly marks whether a number is considered positive or negative.

Many early computers implemented one’s complement in order to perform subtraction by using addition. These included the famous PDP-1, the LINC, the CDC 6600, and UNIVAC 1101.

It is interesting to observe the addition (and subtraction) of one’s complement representations. In observing an addition that “overflows” past our byte restriction, we can take the overflow digit (the *carry*) and add it to the rightmost digit (the least significant bit), much like we did in our decimal method. This is known as the *end-around carry* in logic circuits. You can observe this in action below:
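In code, the end-around carry can be sketched as follows (helper name is mine; inputs are one’s-complement bit strings):

```python
def ones_add(a, b, bits=8):
    """Add two one's-complement bit patterns, folding any overflow
    (the end-around carry) back into the least significant bit."""
    mask = (1 << bits) - 1
    total = int(a, 2) + int(b, 2)
    if total > mask:                    # carry spilled past our width
        total = (total & mask) + 1      # end-around carry
    return format(total, f'0{bits}b')
```

For example, adding 46 (00101110) and -20 (11101011, the inverse of 00010100) overflows, and the wrapped carry produces 00011010, which is 26.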

For background on how to perform binary addition or subtraction, refer to the previous article in the series, *Binary (Base-2) And Its Operations*. If we wanted to perform actual binary subtraction, we must do something similar with the presumed hanging borrow. This is known as the *end-around borrow* and involves subtracting the borrow from the one’s place as such:

One’s complement is fine, but it does have some drawbacks. You can see that we have that dangling 1 that either needs to be added or subtracted depending on the values and operation. On top of that, there are two representations equaling 0 (-0 is essentially equivalent to 0 here) as happened before in sign-magnitude.

In one’s complement, you can avoid generating a -0_{10} (11111111_{2}) by performing a subtraction rather than addition and reversing the sign of the *subtrahend*. This method is known as a *complementing subtractor*.

However, having two versions of zero still requires making two tests when comparing against zero, (which is a common operation). As well, that additional hanging 1 in the end-around carry or end-around borrow complicates implementation. These two factors are less than ideal, but fortunately, there is a remedy available.

One way we can circumvent having two representations is if we perform a sort of shift by one. If we add 1 to the binary inverse (the one’s complement negative) we circumvent the hanging 1 we were encountering before. But what about -0_{10} (11111111_{2})?

If we add 1 to 11111111_{2} (the binary inverse) we end up with 00000000_{2} with a carry of 1. Ignoring the carry bit, we have the representation for (+)0. This eliminates the two versions of 0, for when we try to compute -0 we end up with 00000000_{2} again. But what happens to 11111111_{2}? That value ends up being the start of the negative numbers, being the inverse of +1_{10} (00000001_{2} -> 11111110_{2} + 1_{2} = 11111111_{2}).

Computing negative values as the positive value’s binary inverse *plus one* is called *two’s complement*. One’s complement is to two’s complement as nine’s complement is to ten’s complement, hence the name. This relationship can be shown mathematically:
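The invert-then-add-one rule can be sketched directly (function name is mine; the modulo keeps the result within the bit width, which also makes the negation of zero come out as zero):

```python
def twos_negate(bit_string):
    """Two's-complement negation: invert every bit, then add 1,
    wrapping around at the bit width."""
    n = len(bit_string)
    inverted = ''.join('1' if b == '0' else '0' for b in bit_string)
    return format((int(inverted, 2) + 1) % (1 << n), f'0{n}b')
```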

The sum of a value and its *one’s complement* is all 1 bits (1010 + 0101 = 1111), which is equal to *2 ^{n}-1* (see radix complements above; 2 is the radix in base-2). If we add 1 to the one’s complement we get *(2 ^{n} – 1 – x) + 1 = 2 ^{n} – x*, which is precisely the radix complement of *x* in base-2, hence the name *two’s complement*.

Another way to think about two’s complement is to think about it in terms of place values. As the negative values decrease (moving right down our number line from -1_{10} to -128_{10}), the normal interpretation of the last seven bits goes from 127_{10} to 0_{10}. If we consider the eighth bit to represent -128_{10}, we can add the value represented in the remaining seven bits to it to arrive at the two’s complement value. For example, 11111111_{2} is -128_{10} + 127_{10} = -1_{10}.

We can sum up this process in an equation that is applicable to any number of binary digits. The previous observations, this equation, and our new number line look like the following:

The equation used in this article to compute Two’s Complement was lifted from Wikipedia (https://en.wikipedia.org/wiki/Two%27s_complement) and is used (and explained) here under the terms of the CC3.0 license.

The above equation is a formal way of stating the process outlined above: *w = -a _{n-1} · 2 ^{n-1} + Σ _{i=0} ^{n-2} a _{i} · 2 ^{i}*. Here *n* is the number of digits and *a _{i}* is the digit (0 or 1) in position *i*. We start at the leftmost digit (position n-1) and multiply it by -1 and *2 ^{n-1}*. From there we perform a summation (that’s the Σ) of the place values of the remaining digits, from position n-2 down to position 0.
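That summation translates almost verbatim into code (function name is mine):

```python
def twos_value(bit_string):
    """Interpret a bit string as two's complement:
    w = -a_{n-1} * 2**(n-1) + sum of a_i * 2**i for i in 0..n-2."""
    n = len(bit_string)
    digits = [int(b) for b in bit_string]    # digits[0] is a_{n-1}
    return -digits[0] * 2**(n - 1) + sum(
        d * 2**(n - 2 - i) for i, d in enumerate(digits[1:]))
```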

One algorithmic method for obtaining a value’s two’s complement is to survey its binary representation from the least significant digit to the most significant digit (in our representation from right to left). Copy the zeros up until the first 1, then invert the rest of the bits:
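That copy-then-invert scan can be sketched like so (function name is mine); note it agrees with invert-and-add-one:

```python
def negate_copy_invert(bit_string):
    """Two's-complement negation by scanning from the right:
    copy digits up to and including the first 1, invert the rest."""
    i = bit_string.rfind('1')
    if i == -1:                 # all zeros: the negation of 0 is 0
        return bit_string
    # invert everything to the left of the rightmost 1
    inverted = ''.join('1' if b == '0' else '0' for b in bit_string[:i])
    return inverted + bit_string[i:]
```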

The advantage of all this is that there is only one representation of zero, and addition and subtraction don’t require any special logic to perform, such as the hanging 1 of one’s complement above. All that is necessary is to strip the leading 1 if present. This is usually done by simply ignoring the final carry and keeping the result restricted to a set number of bits.

When restricting the number of processable digits, it’s possible to perform an addition or subtraction that results in a sum or difference larger than can be represented. This is called an *arithmetic overflow*. One interesting property of two’s complement is that you can verify whether an arithmetic overflow has occurred by examining the leftmost two *carry bits*. During the addition, if these two carries are the same the result is valid, but if they differ an overflow has occurred. This is illustrated in the following figure:
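The carry comparison can be sketched as follows (helper name is mine; the carry into the MSB is recovered by adding everything except the MSBs themselves):

```python
def add_detect_overflow(a, b, bits=8):
    """Add two two's-complement bit patterns; report overflow by
    comparing the carry into the MSB with the carry out of the MSB."""
    mask = (1 << bits) - 1
    x, y = int(a, 2), int(b, 2)
    # carry into the MSB: sum of the low bits, excluding the MSBs
    low = (x & (mask >> 1)) + (y & (mask >> 1))
    carry_in = (low >> (bits - 1)) & 1
    total = x + y
    carry_out = (total >> bits) & 1
    return format(total & mask, f'0{bits}b'), carry_in != carry_out
```

Adding 127 + 1 overflows (carries differ), while -1 + -1 = -2 does not, even though it produces a discarded final carry.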

What if we want to expand a two’s complement binary number from a smaller number of digits to a larger number of digits? If we simply add 0s as the most significant digits (leftmost in our representation), as we might do with an ordinary binary representation, we would end up transforming negative values into positive values. Remember that in two’s complement, a leading 1 indicates that the following bits are to be interpreted as negative *(-128 + x)*.

The answer is to apply what is called a *sign extension*. A sign extension duplicates the most-significant digit (the leftmost here) of a binary representation to create a representation with more digits. Adding 1s to a negative value keeps it negative, and adding 0s to a positive value retains its sign.
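Sign extension is a one-liner on bit strings (function name is mine):

```python
def sign_extend(bit_string, new_width):
    """Widen a two's-complement bit string by replicating its MSB."""
    return bit_string[0] * (new_width - len(bit_string)) + bit_string
```

For example, 1011 (-5 in four bits) widens to 11111011, which is still -5 in eight bits.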

In the previous article, we also discussed the *shift operation* of most processors. As above, when shifting a two’s complement representation to the right, you would copy the most significant digit as the digit that is added to the left side. This would only be done when shifting to the right; shifting to the left you would add a 0 to the right.
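Sketched on bit strings (helper names are mine), the arithmetic right shift and the left shift look like this:

```python
def arithmetic_shift_right(bit_string):
    """Shift right by one, copying the sign bit into the vacated MSB."""
    return bit_string[0] + bit_string[:-1]

def shift_left(bit_string):
    """Shift left by one, filling the vacated LSB with 0."""
    return bit_string[1:] + '0'
```

Shifting 11110110 (-10) right arithmetically gives 11111011 (-5), preserving the sign while halving the value.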

Two’s complement may at first seem like a panacea in terms of representing negative numbers (without a sign), but there is at least one tradeoff. One’s complement was plagued with two representations of the number 0. Two’s complement is plagued with a value that defies calculations: the most negative number.

With a restricted set of digits (here we are restricting ourselves to a byte, or eight bits), the minimum representable value in two’s complement fails to materialize an additive inverse. For our byte, the minimum representable value is -128_{10}. -128_{10} is represented as 10000000_{2}. Inverting it gives us 01111111_{2}, which is 127_{10}. Adding 1 to it would give us 10000000_{2} or -128_{10}. Positive 128_{10} can’t be represented with eight bits, and thus, -128_{10} has no complement to complete it.

This means that the two’s complement of -128_{10} in an eight-bit system is -128_{10}. Performing the above sequence produces an arithmetic overflow (see above) as a carry digit goes into the most significant bit, but no carry comes from the most significant bit. This causes the carry’s two most significant bits (leftmost here) to be different (0 and 1) indicating an invalid calculation.

This leads to the following invalid results for calculations:

- negating -128, or multiplying by -1, results in -128, not +128
- the inverse of multiplication, dividing -128 by -1 is undefined
- the absolute value of |-128| is still -128
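The first of these edge cases can be demonstrated with a small sketch (function name is mine; negation is restricted to eight bits as in the text above):

```python
def negate8(bit_string):
    """Two's-complement negation restricted to eight bits:
    invert the bits, add 1, and keep only the low byte."""
    inverted = ''.join('1' if b == '0' else '0' for b in bit_string)
    return format((int(inverted, 2) + 1) & 0xFF, '08b')
```

Negating 10000000 (-128) inverts to 01111111 (127), and adding 1 wraps straight back to 10000000.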

These complications must be kept in mind when performing arithmetic operations on two’s complement representations. In modern systems, this tradeoff, which demands a little extra attention from the programmer, has proven preferable to maintaining two internal representations of 0. The first minicomputer, the PDP-8 in 1965, used two’s complement arithmetic and set the trend. Almost all subsequent home computer processors utilize two’s complement.

The *offset binary* method, also known as *excess-k* or *biased representation*, is another way of encoding negative values. In offset representation, the encoding of a value is the bit representation corresponding to that value plus an *offset* (*biasing value* or *excess*). So, for example, if we use a biasing value of 128 (excess-128) then 0 is encoded as 0 + 128_{10} making 10000000_{2}. A positive number, such as 56, would be encoded as 56_{10} + 128_{10} = 184_{10} making 10111000_{2}. An example of a negative value would be -45_{10}, which is encoded as -45_{10} + 128_{10} = 83_{10} making 01010011_{2}. Computing the encoding like this sets up the following number line:
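Encoding and decoding excess-k is simply adding and subtracting the bias (function names are mine; values must satisfy 0 ≤ value + k < 2^bits):

```python
def excess_k_encode(value, k=128, bits=8):
    """Offset-binary (excess-k) encoding: store value + k."""
    return format(value + k, f'0{bits}b')

def excess_k_decode(bit_string, k=128):
    """Decode by subtracting the bias back out."""
    return int(bit_string, 2) - k
```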

Notice that the number line for biased representations resembles the number line for two’s complement. The difference is in the most significant bit (leftmost here), which is the opposite of its value in the two’s complement representation. Because of this you cannot simply add or subtract biased representations the way you can two’s complement representations.

Excess-k representations are primarily used for the exponent component of floating-point representations. IEEE 754 defines the exponent component of a 32-bit single-precision number as an 8-bit excess-127 representation. The 64-bit double-precision exponent component is defined as an 11-bit excess-1023 representation. You will also see excess-k pop up again when we cover binary-coded decimal in *Non-Standard Positional Notations In Binary And Otherwise*.

Back in *Understanding Radix*, I wrote that the radix of a number does not have to be a positive integer. Technically, the radix can be any value, including the Euler constant *e*. Here I’ll show you how you can use a radix of -2 to construct a representation that can encode positive and negative values.

When employing a negative radix, we must remember that the product of a negative value multiplied by a negative value is a positive value. As the place value increases by powers of -2 you’ll notice the place value is positive and negative in turn:

At first, it may seem there are gaps in the integer values (positive and negative) that this system could represent, but surprisingly there are not. It seems that way because of the asymmetric nature of the number line that a given number of bits can represent. If there are an odd number of bits, this base can represent twice as many negative numbers as positive. And if there are an even number of bits then twice as many positive numbers can be represented.

A consequence of using this base is that the binary representations are quite irregular and difficult to read. Observe counting from negative seven to positive seven using base -2 for instance:
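Conversion to and from base -2 can be sketched with repeated division by -2, forcing each remainder to be 0 or 1 (function names are mine):

```python
def to_negabinary(n):
    """Convert an integer to its base -2 digit string by repeated
    division by -2, adjusting so every remainder is 0 or 1."""
    if n == 0:
        return '0'
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:              # force a non-negative remainder
            n, r = n + 1, r + 2
        digits.append(str(r))
    return ''.join(reversed(digits))

def from_negabinary(s):
    """Evaluate a base -2 digit string by its place values."""
    return sum(int(d) * (-2)**i for i, d in enumerate(reversed(s)))
```

For example, 6 becomes 11010 (16 − 8 − 2), and -7 becomes 1001 (-8 + 1), with no minus sign in sight.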

This concludes our survey of negative value representations in binary. We covered *sign-magnitude*, *one’s* and *two’s complements*, *excess-k*, and *base -2*. All of these are viable systems for representing negative numbers. However, some prove to be more useful in practice than others.

The most arithmetically versatile representation turns out to be two’s complement, despite the edge case of the most negative number. Since the release of mini- and microcomputers, mostly for home consumer use, most modern processors use two’s complement for representing negative numbers on the processor level (for non-floating-point operations).

Now that I’ve laid out the basic numerical uses and operations of binary, we can move on to material that “counts” in binary in a very different fashion. These are non-standard positional notation systems that don’t quite fit into the orderly box of normal numerical representation. In the next article of the series, I’ll be covering *binary coded decimal*, as well as *signed-digit representations* including *balanced ternary* and *redundant binary representations*. I will also provide an introduction to systems popularly known as *Gray codes*.

Image Based On A Photo by Fabian Bächli on Unsplash


This article continues the trend of the previous articles and begins with a history of binary. After that, I briefly reiterate why binary is used in modern electronic devices as covered in the previous article, and go into more depth regarding binary “sizes” (bit, byte, kilobyte, etc.) Then I move on to important elements of binary arithmetic, and the operations of addition, subtraction, multiplication, and division. I cover two operations often found in computing processors, the shift operators, and their mathematical meaning. Finally, I briefly cover Boolean logic operations.

This post (Binary (Base-2) And Its Operations) was first published on How To Program Anything by Asher Wolfstein.

This is the fourth article in a series whose intention is to have the reader able to understand binary, octal, and hexadecimal; three radices of great importance to contemporary computer theory. By the end of this series, you should be able to read and convert integer values into binary, octal, and hexadecimal, perform arithmetic on all three representations, understand basic Boolean operations, and otherwise have a further appreciation of the power of binary.

As previously explained in the first two articles (*Understanding Radix*, and *Converting To Binary, Octal, And Hexadecimal*), the binary numeral system is a positional notation system based on a radix of 2. Because of this it is often also referred to as base-2. Since the radix is two, binary numbers use only two digits: 0, and 1.

The binary numeral system as we are familiar with it today was shaped by **Gottfried Leibniz** and other mathematicians in the 17th and early 18th centuries. However, the idea of binary traces back before that, all the way to ancient Egypt.

Early forms of a binary fractional system appeared in documents dated to the fifth dynasty of Egypt around 2400 BC and were later developed in full hieroglyphic form in the nineteenth dynasty of Egypt around 1200 BC. In this fractional system, one of two that the ancient Egyptian scholars used, one would measure portions of a heqat (a unit of volume) by adding together fractions whose denominator was a power of two (1/2, 1/4, 1/8, 1/16, 1/32, and 1/64). These are known as the *Horus-Eye fractions* to today’s mathematical historians. Some believe the symbols used in this system could be arranged to form the eye of Horus.

Besides the Horus-Eye fractions, which share binary’s powers of two, the ancient Egyptians also used binary for multiplication. The *Rhind Mathematical Papyrus,* dating to 1650 BC, outlines a multiplication method very similar to Horner’s method in the previous article. In this method, you multiply two numbers by repeatedly doubling the first number and then adding together the doublings that correspond to the binary representation of the second number.

In the 9th century BC, China developed its own version of binary, though not necessarily for mathematical purposes. The *I Ching* (or *Yi Jing, 易經*), which translates to the *Book Of Changes,* or *Classic Of Changes,* serves as a text for divination and classical Chinese thought/cosmology. This text is based on the Taoistic duality of yin and yang and uses a form of cleromancy based on eight trigrams (Bagua) and sixty-four hexagrams (Gua). If we look at those two diagram systems as powers of two (and convert them to binary) we can see that, including zero, they equate to three-digit and six-digit binary numerals.

In fact, during the Song Dynasty, scholar **Shao Yong (邵雍)** rearranged the hexagrams in a format that can be read today, though Yong never intended it as such, as a sequence of binary representations. If you read solid lines as zero and broken lines as one, you can read from the bottom right to the top left of Shao Yong’s square a sequence of binary numerals counting from zero to sixty-three. Below is a reproduction of Yong’s square:

China wasn’t the only place to found a divination system related to binary. There is also Ifá, a Yoruba religion that incorporates a system of divination. This system uses a sixteen-principle organization identified by four slots of binary values. This practice has its earliest origins in West Africa and is still observed today by many, including the notable actor **Xolo Maridueña.**

On the Western side, there was the practice of geomancy, from the Late Greek geōmanteía, translating as “foresight by the earth.” This system of divination was based on a randomized generation of sixteen geomantic figures, four rows each containing a binary value of one or two dots. Geomancy itself can be traced back further into Arabic traditions, with the original names of the figures traditionally given in Arabic.

In the 2nd century BC, binary numbers emerged in India. In studying and describing prosody, being the study of meter in poetry and verses, the Indian scholar Pingala encoded short and long syllables into two different values. In his classic *Chandaḥśāstra* (*“science of meters”* in Sanskrit) he describes forming a matrix with these values to give a unique value to each meter. Unlike the binary we’ve been presenting, Pingala’s binary is read from left to right, increasing in value towards the right. His system also begins at one, instead of zero, where 0000_{2} would equal 1_{10}.

This brings us to the second most famous scholar of binary, **Gottfried Leibniz**. There were predecessors to Leibniz, such as **Ramon Llull** (who endeavored after a universal science called ‘ars generalis’), **Francis Bacon** (as a cipher), and **John Napier** (in his non-positional location arithmetic), but after **George Boole**, Leibniz is probably the most famous of Western philosophers to have embraced binary.

Leibniz admired and studied Chinese culture and was fascinated by the relation of the hexagrams (in Fuxi order above) to binary number representations one through sixty-three. His paper *Explication de l’Arithmétique Binaire*, published in 1703 (three years after **Lobkowitz’s** publication of a similar system), outlined the traditional binary positional notation we use today with increasing powers of two extending to the left.

Influenced by Llull’s arrangement of his Ars Generalis with binary combinations, Leibniz positioned his concepts on binary as central to his own ambitious version of Llull’s work, the Characteristica Universalis being a framing of the whole of reality in intellectual terms. Leibniz believed the way Llull had constructed his alphabet was too limiting and proposed another alternative and broader alphabet which used numbers rather than letters. These ideas would go on to influence his intellectual successors **George Boole** and **Gottlob Frege** in their forming of modern symbolic logic.

One hundred and fifty-one years later in 1854 **George Boole**, a mostly autodidactic British mathematician, published *The Laws of Thought*. Within this landmark work were the principles of what is now known as Boolean algebra. Boole’s intention was to systematize Aristotelian logic in such a way that a mathematical foundation could be laid out for later extension.

In Boolean algebra the values of variables are one of two truth values: true or false, often marked as 1 or 0 (as in electronic computers). This is in contrast to elementary algebra, with which most students are more familiar, where variables can be any number. This leads to a constraint on possible meaningful operations. The main operations in Boolean algebra are truth-value based and include conjunction *(and – ∧)*, disjunction *(or – ∨)*, and negation *(not – ¬)*. Because of this constraint Boolean algebra becomes a system that can describe logical operations much like those used by Aristotle and other philosopher-logicians.

Fast forward to 1937 and meet **Claude Shannon**, an American mathematician and engineer best known for being the father of *information theory*. Shannon became acquainted with **George Boole’s** work when he attended a philosophy class at the University of Michigan. Up until that point, Boole’s work seemed more like a pursuit of academic interest rather than something practical. However, Shannon brilliantly recognized that Boole’s work could be modeled by mechanical processes and thus could be brought to life in a practical way. This led to a twenty-one-year-old Shannon focusing the efforts of his master’s thesis on the optimization of electromechanical telephone relays using Boolean algebra.

Almost simultaneously, **Victor Shestakov** at Moscow State University proposed a theory of electric switches based on Boolean logic in 1935 and later presented an academic thesis alongside Soviet logicians and mathematicians **Yanovskaya**, **Gaaze-Rapoport**, **Dobrushin**, **Lupanov**, **Medvedev**, and **Uspensky** in 1938 which was finally published in 1941 in Russian.

In November of 1937, **George Stibitz** completed the Model K (for Kitchen) in his home kitchen. This machine utilized insights gleaned from Shannon and Shestakov about the practicalities of embodying Boolean algebra to perform binary addition. This led his then-employer, Bell Labs, to launch a research program in 1938 with Stibitz in charge, which in turn led to Stibitz demonstrating the Complex Number Computer at the American Mathematical Society conference at Dartmouth College on September 11, 1940. A number of extremely notable persons were in attendance, including computer scientist **John von Neumann** (of the *von Neumann architecture*), **John Mauchly** (who later went on to design ENIAC), and **Norbert Wiener** (the originator of *cybernetics*).

Stibitz and Shannon weren’t entirely alone. Between 1935 and 1938, **Konrad Zuse** built the Z1, a motor-driven mechanical computer, entirely with private funding. It too used binary as the basis for representation and calculation. Though it was the first freely programmable computer to use Boolean logic (programmed with punched celluloid film), it was unreliable. Unfortunately, the Z1 became a casualty of World War II when it was destroyed in a bombardment of Berlin. Its successor the Z3, completed in 1941, became known as the world’s first working programmable, fully automatic digital computer, with 2,600 relays and a clock frequency of 5–10 Hz. It too was destroyed, in December of 1943, in an Allied bombardment of Berlin.

From there onwards, Boolean algebra and binary became the de facto standard foundation of practical digital circuit design, and likewise provided an intellectual framework for future developments of the Information Age.

When observing modern digital computers one might wonder if binary holds some special qualities, or if it’s possible to recreate the same technology using other number bases. The answer is yes, it is possible and has been done.

One of the earliest calculating machines, the Difference Engine designed by **Charles Babbage** (whose later Analytical Engine is famously associated with **Ada Lovelace**), was not based on binary at all. The engine operated on ten discrete digits, making it a decimal machine. Likewise, in devices such as Pascal’s calculator, binary wasn’t a factor.

In an earlier article in this series, *Radix Economy*, I cover explorations into three-valued logic, or ternary, computers. For example, in 1958 Soviet engineers built a ternary-based computer they called Setun. Some fifteen years later, research in the United States came to fruition when **Gideon Frieder** and his colleagues at the State University of New York at Buffalo designed a complete ternary machine they dubbed TERNAC.

In the previous article of the series, *Converting To Binary, Octal, and Hexadecimal*, I also go over why binary might be more apt than other numeral systems for use in electrical computing devices.

To summarize, when looking at embodying numeral systems into a physical medium there exists a tradeoff between mathematical expressiveness and implementation complexity. After base-3, the higher the radix used the more complex the implementation becomes. This is in stark contrast to binary, which only has two states.

Consider an electrical charge. It is much easier and more reliable to test for the presence or absence of a charge than it is to measure varying levels of charge. With varying levels, any interference or noise can easily shift the encoded information from one digit to a neighboring one, whereas corrupting a simple presence/absence encoding requires disturbing a full charge.

As well, with binary, or on and off, there is also elegance in circuit construction. Here one can use Boolean logic to construct various combinational circuits, whereas in a ternary (or larger) computer the outcomes of logical operations can be less clear.

All these factors, combined with historical tradition, conspire to make binary the current de facto standard for electronic computers. As technology progresses into the realms of nanotechnology and quantum mechanics this may change, but for now, it is here to stay.

When storing binary within a computer system (by means of charge, magnetism, or another medium) it is useful to define how much binary information is being stored. The smallest unit of binary, one binary digit, is called a “bit,” a term coined by the American mathematician **John Wilder Tukey** while working with **John von Neumann** on early computer designs. “Bit” is a portmanteau of “binary digit,” and was first used in print by **Claude Shannon** in his landmark 1948 article.

Historically, the next size up from a single bit is known as a nybble (or nibble) and consists of four bits. You might recognize this definition from the previous article as equivalent to the binary expansion of a hexadecimal digit. The nybble is also referred to as a half-byte (see below), tetrade, semi-octet (see below), quadbit, or quartet. Except for semi-octet, this nomenclature is more often found in the early literature of computer science. In some computer architectures four bits are the fundamental unit of processing, and in that context each four-bit group is also known as a character. If you’ve been reading the series in order you’ll know that four bits (being a hexadecimal digit) can portray sixteen different digits (or states), from 0 to F.
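To make the nybble concrete, here is a minimal Python sketch (the helper name `nybbles` is my own) that splits an 8-bit value into its two four-bit halves, each of which corresponds to one hexadecimal digit:

```python
def nybbles(byte):
    """Split an 8-bit value into its (high, low) four-bit halves."""
    high = (byte >> 4) & 0xF   # shift the top nybble down four places
    low = byte & 0xF           # mask off the bottom nybble
    return high, low

# 0xA7 is the byte 1010 0111: its nybbles are 0xA (10) and 0x7 (7).
print(nybbles(0xA7))  # → (10, 7)
```

Each half, being four bits, can portray exactly the sixteen states 0 through F mentioned above.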

Traditionally, the next size up from a nybble of four bits is the byte. I write traditionally because in early computer systems the size of a “byte” was hardware/system dependent and could range from 1 to 48 bits. Computer systems using 6-bit and 9-bit bytes were prevalent in the 1960s. However, the ISO/IEC 2382-1:1993 and IEC 80000-13 standards documents provided the modern standard definition of “byte” as a sequence of eight bits, able to store the values 0–255. The term “byte” is credited either to **Werner Buchholz**, while working on the IBM Stretch, or to **Louis G. Dooley**, while working on SAGE at MIT Lincoln Laboratory, both in 1956. In the former case, it is an intentional misspelling of “bite” in an effort to distinguish it from “bit.” Readers will note, from the previous article, that a byte, being a sequence of two nybbles, can be represented using two hexadecimal digits.

Because the term “byte” still remains somewhat ambiguous due to its use in the early literature of computer science and in the continued use of early designs, a strictly 8-bit sequence is also referred to as an octet. Due to its specificity, the term octet is often found in modern communications protocol standards and international communications.

The unit symbol for the byte is specified in IEC 80000-13 and IEEE 1541 as the upper-case character B, while the lowercase letter o denotes an octet.

From here, it gets trickier than you might at first imagine. The confusion rests on the simple decision of whether to use powers of ten, like the decimal measuring system, or powers of two (as a computer does when specifying memory addresses), when describing ever-larger sizes. In theory, decimal measuring relies on standard SI (International System of Units) prefixes for increasing sizes: kilo-, mega-, giga-, tera-, peta-, etc., abbreviated respectively k, M, G, T, and P. Systems based on the more historical powers of two use binary prefixes for increasing powers of 1,024 (itself a power of two): kibi- (1024^{1}), mebi- (1024^{2}), gibi- (1024^{3}), tebi- (1024^{4}), pebi- (1024^{5}), etc., their byte units abbreviated respectively KiB, MiB, GiB, TiB, PiB.

In the non-ideal real world, however, due to historical usage and the medium, there is a third convention called the customary convention where the term kilobyte (using the SI prefix) actually refers to 1024 bytes, megabyte refers to 1024^{2} bytes, and so on. This is a strange combination of the decimal measuring terms with the actual binary power values.

While kibibyte means 1024 bytes specifically, kilobyte can mean either 1024 bytes or 1000 bytes without proper disambiguation. This ambiguity is nothing to sneeze at and has actually led to lawsuits concerning the capacity of storage peripherals! This has led to courts holding that the legal definition of a gigabyte is decimal, being 10^{9} bytes, rather than the binary definition of 2^{30}. This has also created a divide in popular computer operating systems, where devices running Microsoft Windows refer to their memory capacities using the customary convention, while other popular systems (mainly of the BSD variety) such as Apple’s macOS (Darwin), iOS, Ubuntu, and Debian have adopted the decimal-based SI standard.
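The gap between the two conventions is easy to see in a short sketch; the helper `human` below is my own illustration, not a standard library function:

```python
# Contrasting SI (decimal) and IEC (binary) prefixes for byte counts.
SI = ["B", "kB", "MB", "GB", "TB"]
IEC = ["B", "KiB", "MiB", "GiB", "TiB"]

def human(n, units, step):
    """Scale a byte count n down by the given step (1000 or 1024)."""
    i = 0
    while n >= step and i < len(units) - 1:
        n /= step
        i += 1
    return f"{n:.2f} {units[i]}"

size = 1_000_000_000          # one "gigabyte" as sold on a drive label
print(human(size, SI, 1000))  # → 1.00 GB
print(human(size, IEC, 1024)) # → 953.67 MiB — not even a full GiB
```

The same drive label that reads “1 GB” holds less than a gibibyte, which is exactly the discrepancy those lawsuits turned on.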

As you have read, the binary (base-2) numeral system consists of only two values: 0 and 1. Due to the simplicity of the digit, being one of two states, binary can be encoded using a variety of mediums as long as the interpreter can differentiate between two states. One could use a set of two-state switches, or, on paper, punch out holes; any two mutually exclusive states will do. Please note that, when reading digital circuits, an “on” state is not necessarily equivalent to a value of 1. Thus, when examining any binary system, it’s important to understand the context.

As with any radix, you can represent any finite number using binary, particularly if using a radix point to indicate fractional values (see Radix Economy – Negative Exponents). With the exclusion of floating-point formats (which accommodate very large and very small values using scientific notation) most work in binary is done on integers that have no fractional component. When speaking or writing about binary values it is customary to use the numerals 0 and 1 in line with positional notation using a radix of two, such as 1011001_{2}.

As you’ve probably noticed, this site uses a subscript after the notation to indicate the radix used in that notation. For example, the number 5_{10} (decimal) is represented in binary as 101_{2}. There are many other conventions besides this to be aware of:

- 1011001 in binary
- In this convention, the base is explicitly stated.
- 1011001b or 1011001B
- This convention uses a suffix of b/B to indicate binary. Using a lower-case b is also known as the Intel convention.
- %1011001
- Using a prefix of % to indicate binary is in line with Intel’s competitor Motorola’s convention.
- 0b1011001 or #b1011001
- You’ll often see this notation used in programming languages, the latter, particularly, in Lisp-based languages.
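Several of these conventions appear directly in programming languages. A small Python sketch of the `0b` prefix convention (shared by many modern languages) and the round trips between binary digit strings and numeric values:

```python
n = 0b1011001            # binary literal, value 89
print(n)                 # → 89
print(bin(n))            # → 0b1011001
print(int("1011001", 2)) # → 89, parsing a bare digit string with radix 2
print(f"{n:b}")          # → 1011001, formatting back into binary digits
```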

Because binary is simply a base-2 numeral system and can represent any given finite value you can perform the standard mathematical operations of addition, subtraction, multiplication, and division. At first, it can seem a little odd as we are used to using more than two digits, but with a little practice, you can easily perform at least addition in your head.

The most important thing to remember when performing binary addition is that the value 2_{10} is represented in binary as 10_{2}. You’ll notice that upon reaching the value 2 (being two 1’s) we immediately “carry” to the next place value. This gives rise to the disconcerting phenomenon where two 1s become a 0 and a carry of 1. This is visualized in the following diagram:

Just like in decimal arithmetic, when the result of addition between two digits is larger than can be represented by one digit you “carry” the resulting second digit over to the next place value. This is true of any positional notation system of any radix, and so it also follows in binary:

You’ll notice here that when confronted with three 1s in an addition, we get the value 3_{10}, which is 11_{2} in binary. The result is a 1 below, *and* a carry of 1 as well.

Binary representations often have long strings of 1s in them. You can perform a sort of shorthand operation in terms of the carry digits by recognizing that any carry operation begun at the beginning of a series of 1s will extend until the termination of that sequence. This is known as the *Long Carry Method*. The diagram below puts this observation in a better graphical format:

Using these methods one can perform addition on any two binary representations.
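The carry procedure described above can be sketched directly in Python; `add_binary` is a hypothetical helper that works on digit strings rather than numbers, to mirror the by-hand method:

```python
def add_binary(a, b):
    """Add two binary digit strings, carrying just as described above."""
    a, b = a.zfill(len(b)), b.zfill(len(a))   # pad to equal length
    carry, digits = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry        # 0, 1, 2, or 3
        digits.append(str(total % 2))          # the digit written below
        carry = total // 2                     # the digit carried left
    if carry:
        digits.append("1")
    return "".join(reversed(digits))

print(add_binary("1011", "110"))  # 11 + 6 → "10001" (17)
```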

Subtraction is the inverse of addition. Instead of carrying over an excess value into a new place value, when a subtraction operation of two digits would result in a negative result you borrow from the next largest place value.

The requirement to borrow when a digit is too small to subtract from works just as it does in decimal (base-10). In decimal, you’d reduce the next place value to the left by 1, and borrow a “ten’s place.” The *x* in the above diagram indicates the digit of the next full place value. If the next place value is a 0, then you borrow from the nearest place value to the left equal to 1.

You might be wondering how subtracting 1 from what looks like a 1 comes out as 1 when borrowing. Remember that, even in decimal, when you borrow you are adding the next place value to the upper digit. In decimal, you are adding 10_{10} (the radix) to the digit in question, so it follows that in binary you are adding 2_{10} (10_{2} in binary) to the place value. **2 – 1 = 1**, hence the result.

I believe that the best way to represent binary subtraction is to see it in action, rather than through words. Observe the following binary subtraction:
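In the same spirit, the borrowing procedure can be sketched as a Python function (again a hypothetical helper over digit strings, assuming the top number is at least as large as the bottom one):

```python
def subtract_binary(a, b):
    """Subtract binary digit string b from a (assumes a >= b)."""
    a, b = a.zfill(len(b)), b.zfill(len(a))   # pad to equal length
    borrow, digits = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        diff = int(x) - int(y) - borrow
        if diff < 0:
            diff += 2        # borrowing adds the radix (2) to this digit
            borrow = 1
        else:
            borrow = 0
        digits.append(str(diff))
    return "".join(reversed(digits)).lstrip("0") or "0"

print(subtract_binary("1101", "110"))  # 13 - 6 → "111" (7)
```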

In mathematics, subtracting a positive number is equivalent to adding a negative number of equal absolute value. In the case of computers, you can do something similar by representing negative numbers in binary using a system known as *Two’s Complement*. With this non-standard notational system, you can perform subtractions utilizing the binary addition operation. Negative binary integer representation and Two’s Complement will be covered in the next article in this series.
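As a small preview of that trick (covered properly in the next article), here is a hedged sketch for a fixed 8-bit width: subtracting a value gives the same result as adding its two’s complement and discarding any overflow beyond eight bits.

```python
BITS = 8
MASK = (1 << BITS) - 1           # 0b11111111

def twos_complement(x):
    """Two's complement of x in an 8-bit width: invert the bits, add one."""
    return (~x + 1) & MASK

a, b = 13, 6
via_subtraction = (a - b) & MASK
via_addition = (a + twos_complement(b)) & MASK
print(via_subtraction, via_addition)  # → 7 7
```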

Just as in decimal, multiplication in binary can be calculated by hand using *partial products*. Each partial product is shifted one place value to the left as the multiplication continues. Once all partial products are completed, you add them together to achieve the product.

In decimal, because the multiplication of two digits from 0 to 9 can result in a two-digit number, calculating a product often involves complicated carries of varying amounts. Binary, in contrast, is fortunately much simpler: only two 1s multiplied together will result in a 1. Because of this, each partial product is either a sequence of 0s or a copy of the multiplier, depending on the corresponding digit of the multiplicand. The below diagram visualizes this process:

Observe in the partial products (each addition) that the multiplier 10101_{2} is repeated according to the multiplicand (on the side).
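The partial-product method can be sketched in a few lines of Python; `multiply_binary` is an illustrative name, and I lean on integer addition to total the partial products:

```python
def multiply_binary(a, b):
    """Multiply binary digit strings via partial products.
    Each 1 in the multiplicand b contributes a shifted copy of a."""
    partials = [a + "0" * shift                  # a, shifted left
                for shift, digit in enumerate(reversed(b))
                if digit == "1"]
    total = sum(int(p, 2) for p in partials)     # add the partial products
    return format(total, "b")

print(multiply_binary("10101", "101"))  # 21 × 5 → "1101001" (105)
```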

The division operation is the inverse of multiplication, where the calculator discovers how many units can fit “inside” another value. Division as a mathematical algorithm is the most intricate of the four basic arithmetic operations, and because of that has many different methods available for achieving a quotient. There are enough methods, in fact, that I could write an entire series on them alone.

However, for this section, we’ll be employing long division in binary much as we might do in decimal. This method was introduced by the mathematician **Henry Briggs** around 1600 AD. The long division I’m going to be performing here follows the notation used in the United States (where this is being written), where the divisor is to the left of the dividend, and the quotient above the dividend.

Much like in decimal long division, you perform binary long division through a series of subtractions of the divisor from the dividend. As in multiplication, where the two-value aspect of binary produced either the multiplier or zero, here we always subtract just the divisor rather than a multiple of the divisor. This means that, as we move to the right in our division, we can immediately subtract the divisor as soon as our “partial remainder” is large enough and place a 1 in the quotient. A diagram may be better able to convey this sequence of actions:

Here we divide 1101011010_{2} (858_{10}) by 101_{2} (5_{10}) and end up with a quotient 10101011_{2} (171_{10}) and a remainder of 11_{2} (3_{10}). You can see how, when the partial remainder isn’t large enough to subtract the divisor we place a 0 in the quotient and move to the right. When the partial remainder is large enough, we place a 1 in the quotient and subtract the divisor (since there can only be a multiple of 1).
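The same sequence of actions, one quotient digit per step, can be sketched as a Python function (`divide_binary` is my own illustrative helper):

```python
def divide_binary(dividend, divisor):
    """Binary long division: returns (quotient, remainder) digit strings."""
    d = int(divisor, 2)
    remainder, quotient = 0, []
    for digit in dividend:                   # bring down one digit at a time
        remainder = (remainder << 1) | int(digit)
        if remainder >= d:                   # partial remainder large enough?
            quotient.append("1")
            remainder -= d                   # subtract the divisor once
        else:
            quotient.append("0")
    return "".join(quotient).lstrip("0") or "0", format(remainder, "b")

print(divide_binary("1101011010", "101"))  # 858 ÷ 5 → ("10101011", "11")
```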

Many early and modern processors have a special instruction that allows the programmer to “shift” the contents of a register (a binary representation) to the left or right, adding in 0s where necessary. Depending on the processor the bit being shifted out of the binary representation (the overflow) may end up in the carry flag, or simply be truncated. In our examples here, when shifting to the right any bits shifted past the one’s place are eliminated and ignored.

*Why would a programmer want to do this?*

There are many reasons, but one of the chief reasons is to accomplish a quick multiplication or division by a power of two. Shifting all the bits in a representation one place to the left multiplies the value by two, because each place value to the left is the next higher power of two; shifting one place to the right divides by two, because each place value to the right is the next lower power of two.

When the binary representation being shifted to the right has a 1 in its least-significant digit (the one’s place), the division operation rounds down to the next lowest integer.

These operations are summarized in the following diagram:
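In most programming languages these shift instructions are exposed as the `<<` and `>>` operators; a quick Python illustration:

```python
n = 0b101100          # 44
print(bin(n << 1))    # → 0b1011000 (88): left shift multiplies by two
print(bin(n >> 1))    # → 0b10110 (22): right shift divides by two
print(45 >> 1)        # → 22: with a 1 in the one's place, division rounds down
```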

Earlier I talked about **George Boole** and his Boolean algebra. This algebra only has two truth values: true or false. This constraint enabled Boole to construct strict and specific methods of reasoning over logical propositions and provide a mathematical foundation to the field of logic.

Because values in Boolean algebra are one of two binary values, true or false, the traditional arithmetical operations in elementary algebra don’t really apply. Instead, Boolean algebra’s basic operations closely resemble the operations of *set theory* including *conjunction* (intersection set) and *disjunction* (union set).

There are numerous logical *operators*, also called *logical connectives*, *sentential connectives*, or *sentential operators*. Because logical operators aren’t directly related to counting or arithmetic in binary I will cover only three here: *and* ∧ (conjunction), *or* ∨ (disjunction), and *not* ¬ (negation).

The first two logical connectives take a minimum of two parameters, or values, while the last takes one parameter. A parameter here is a true or false value. Depending on the input values, the connective is evaluated and assigned a new truth value. This evaluation can be summed up in what is known as a truth table. Below are the truth tables for the three above operators:

You can see with the *and* operator, a 1 (true) is produced only if **both** inputs are also true. In the *or* operator’s table, a 1 is produced if **either** (or both) of the inputs is true. The *not* operator simply inverts its single input: 1 becomes 0, and 0 becomes 1.
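The truth tables can be generated mechanically; the sketch below uses Python’s bitwise operators on single-bit values (building *not* from xor-with-1 is my own illustrative choice):

```python
def truth_tables():
    """The and/or rows and the not rows, as lists of tuples of bits."""
    rows = [(a, b, a & b, a | b) for a in (0, 1) for b in (0, 1)]
    negation = [(a, a ^ 1) for a in (0, 1)]   # not, via xor with 1
    return rows, negation

for a, b, conj, disj in truth_tables()[0]:
    print(f"{a} ∧ {b} = {conj}    {a} ∨ {b} = {disj}")
for a, neg in truth_tables()[1]:
    print(f"¬{a} = {neg}")
```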

Boolean logic is the backdrop for what are known as *logic gates* in *digital circuit design.* A logic gate in a digital circuit outputs a charge depending on the inputs it receives, much in the same fashion as its namesake Boolean logic operation. So, for example, an AND gate outputs a true value (presumably a charge) only if both inputs are also true (charged). By clever utilization of these gates, circuits can be constructed that perform arbitrary combinational logic, including adding binary representations together. Circuits like these in great replication compose *computer processors.*
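As a hedged sketch of that composition, here is a one-bit *full adder* expressed with those gate operations (I use xor, `^`, as shorthand for the small and/or/not circuit that computes it):

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from gate operations (bits are 0 or 1)."""
    partial = a ^ b                                # 1 when exactly one input is 1
    s = partial ^ carry_in                         # the sum bit written below
    carry_out = (a & b) | (carry_in & partial)     # the bit carried left
    return s, carry_out

print(full_adder(1, 1, 0))  # → (0, 1): two 1s make 0, carry 1
print(full_adder(1, 1, 1))  # → (1, 1): three 1s make 1, carry 1
```

Chaining one of these per place value, carry to carry, yields exactly the ripple-carry addition performed by hand earlier in this article.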

We’ve covered not only the history of binary, but the most basic operations, both arithmetical and logical, that a programmer or engineer might want to perform on binary representations. I’ve covered a lot of ground, but there’s still more to binary than I’ve laid out so far. I mentioned in the subtraction section that it’s possible to represent negative numbers in binary in such a way that you can add them to positive binary representations and end up with the correct answer. This wizardry relies on a non-standard positional notation known as *Two’s Complement.*

In the next article, I’ll be covering the standard methods of representing negative numbers in binary and what impact those methods have on arithmetical operations, in particular subtraction.

Image Based On A Photo by AX!X on Unsplash

This post (Binary (Base-2) And Its Operations) was first published on How To Program Anything by Asher Wolfstein.

This article begins with a recap of where we are in the series in regards to the concept of counting. I review the definition of positional notation as outlined in the first article and then move on to reveal how we can calculate the number of digits a value will have in a given radix. In doing so I will go over two mathematical concepts relevant to this calculation: exponents and logarithms. I will then use logarithms to show how you can calculate the efficiency of a given radix, also called the radix economy, and answer the question, “What's the most efficient radix?”

This post (Radix Economy) was first published on How To Program Anything by Asher Wolfstein.

This is the third article in a series whose intention is to have the reader able to understand binary, octal, and hexadecimal; three radices of great importance to contemporary computer theory. By the end of this series, you should be able to read and convert integer values into binary, octal, and hexadecimal, perform arithmetic on all three representations, understand basic Boolean operations, and otherwise have a further appreciation of the power of binary.

In a previous article, we defined a glossary of terms associated with positional notation using the following graphic:

In this graphic:

- *Digit* is a numeral in a given place indicating a corresponding value.
- *Number* indicates the total value being represented.
- *Position* is the singular placement of a given digit in relation to others.
- *Index* is an enumeration, starting from zero, of each successive position.

Let’s expand this diagram to a full definition with an example:

Here we show the definition of a positional notation system: a counting method in which the value of a given representation is a summation of each digit (*d_{b}*) multiplied by a factor determined by the digit’s position. In the most straightforward of cases, that factor is the base (the radix) raised to the power of the position’s index.

Here I introduce some symbols commonly associated with set theory to help with my formal definition. A set, as mentioned in the previous article, is a series of elements. I define a set *D* using curly braces {…} to denote the set of **Digits** available in the numbering system. The number of digits in this example, and in common use base systems, is equal to the base or radix being used (including zero). Thus, in a base-5 numbering system, the set *D* holds five digits: *d_{1}* = 0, *d_{2}* = 1, *d_{3}* = 2, *d_{4}* = 3, and *d_{5}* = 4.

Under **Number**, the symbols *a_{n} … a_{4}a_{3}a_{2}a_{1}a_{0}* indicate placeholders for digits (elements of the set *D*), one placeholder for each successive position of the representation.

For more information on how that system works with more examples, please refer to the previous article, Understanding Radix.

We can make some general observations on writing numbers using a straightforward positional notation. I use the term straightforward to refer to positional notation as I have outlined above, and do not include here unusual radices such as negative or imaginary radices.

We can observe that some numbers require more digits to represent them than others. In fact, the larger the number, the more digits required to write it down. However, this isn’t a direct relationship: the number of digits doesn’t increase in direct relation to the size of the number.

It takes one digit in decimal to represent the number five. Along those lines, you might expect it then to take a hundred times more digits to represent a hundred times that value. But it doesn’t; instead, we end up with three digits representing five hundred. The number of digits then, instead of growing linearly with the value of a number, increases in a *logarithmic* way.

Up until now (in this series), I’ve assumed the reader is familiar with the concept of exponents and exponentiation when I’ve discussed powers of the radix. Here we shall actually define *exponents* as I have been using them, and then use that definition to illustrate their inverse: logarithms.

For our use, an exponent is simply a shorthand way of indicating that a given value is multiplied by itself a number of times. It is usually shown as a superscript number to the right. The number it decorates is known as the *base* (not to be confused with radix), and the value of the *exponent* itself is known as the *power* or *index*, like so: base^{power}. The term *power* can also refer to the end result of the multiplication process, so for example 256 is a *power* of 2 (the eighth *power*). Performing exponentiation can also be called *raising* a given *base* to a given *power*.

Any base raised to the power of zero is defined as 1, and any base raised to the power of one is the base itself. You can see in the above diagram that each successive power of 5 is simply the lower power of 5 multiplied by 5 again (here, 5 is the base). To illustrate, 5^{1} = 5×1 = 5, 5^{2} = 5×5 = 25, 5^{3} = 5×5×5 = 125, 5^{4} = 5×5×5×5 = 625, 5^{5} = 5×5×5×5×5 = 3125, and so on.

What happens when the power of an exponent is negative? If we consider that a positive power indicates a series of multiplications, we might surmise intuitively that a negative power would be the inverse of that. We would be correct and identify the inverse of multiplication as division. Thus, we can define a negative power of b as b^{-n} = 1 / b^{n}:

Even though the right part of this definition is a single division (as opposed to multiple multiplications) it measures up to our standard. This is due to the fact that multiple divisions can be represented as a single division. For example, 10^{-1} = 1 / 10 or one-tenth, 10^{-2} is ten times smaller than that, being 1 / 100 or one-hundredth.

This can be illustrated by the multiplication of fractions. A fraction, as shown in the above diagram, represents a division of the *numerator* by the *denominator*. When multiplying by a fraction, the mathematician is actually performing multiplication *and* division by multiplying both the numerators and the denominators. Using this knowledge we can view the above equation as a series of multiplications (and divisions):

The astute reader will notice that place-values progressing to the left in positional notation are increasing powers of the radix. If we were to draw a point, otherwise known as a *radix point* or in base-10 the *decimal point,* and then create place-values to the right we could continue the progression but rather than counting up, we count down into the negatives. This allows us to create a fractional part of a number, in exponentially increasing degrees of accuracy:

A fractional exponent is what is known as a *root*, the most widely recognized form of which is the *square root*. The root of a value is defined as the number that, when raised to the power of the fraction’s denominator, equals the base. For example, an exponent of 1/2, the square root, asks what number multiplied by itself equals the base: 9^{1/2} = 3 because 3^{2} = 9. In this example, 2 is the *reciprocal,* or *inverse,* of 1/2. An exponent of 1/3 would require us to raise the target number to a power of 3. That’s why 8^{1/3} = 2, because 2^{3} = 8. In mathematical terms, roots can be written either as a fractional exponent or using the check-mark-like radical symbol (√) as in the diagram below:

When considering fractional exponents one must remember the properties of multiplication. You might think that 10^{0.1} or ten to the one-tenth power might equal 1, as 1 is 1/10 of 10, however, that would be incorrect. 10^{0.1} actually equals approximately 1.258925411794167, to our surprise. However, it makes sense because multiplying 1 by itself 10 times equals 1, whereas multiplying 1.258925411794167 by itself 10 times equals 10, completing the exponent definition.

Logarithms, on the other hand, are the inverse functions of exponents. Rather than compute the powers of a base like an exponent, a logarithm computes the *power* that a given *base* must be raised to in order to produce a result. You can think of it as computing a root/power where you know the base of the root and the final result beforehand. Exponents and logarithms are simply two sides to the same operation as shown in the diagram:

Logarithms were originally introduced by **John Napier**, a Scottish mathematician, physicist, and astronomer, in 1614 as a means of simplifying the process of multiplication and division. Calculating logarithms themselves is complex, involving finding roots combined with exponentiation (it is the reverse of the calculations featured in *fractional exponents*), but one can write these results into a *logarithm table*. Arduous multi-digit multiplication steps can be replaced by table lookups and addition because of a particular property of logarithms: *the logarithm of a product is the sum of the logarithms of its factors:*

Thus, to multiply three large complicated numbers one would only need to look up each of their respective logarithms in the table, add them together, and then raise the logarithm *base* to the resulting number (most likely via reverse lookup):
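A quick check of this property with Python’s `math` module playing the role of the logarithm table (the two example factors are my own):

```python
import math

# Look up the "table" values, add them, then raise the base to the sum.
a, b = 427, 318
log_sum = math.log10(a) + math.log10(b)
product = round(10 ** log_sum)   # rounding absorbs float error
print(product, a * b)            # both print 135786
```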

Every logarithm calculation relies on a *base*. Remember that the base in logarithms is the value that is being raised to a given power, and does not mean radix despite sharing a common synonym. The base is very important and cannot be ignored. However, in many texts, you will find logarithms that have no apparent base. These are often, depending on the context, what are known as *common logarithms*, which have the base 10. Another base often seen in the wild is Euler’s number *e*, which makes up what are called *natural logarithms*. *Common logarithms* in text are often written *log* while *natural logarithms* are written *ln*.

Coming back full circle, earlier we showed how the number of *digits* grows *logarithmically* in relation to the scale of a number. This is easily observed because each place value in a straightforward positional notation increases by a *power* of the *radix*. If we count in base-10, each place value increases by a power of 10, likewise in base-2 each place value increases by a power of 2.

Review the positional notation definitions:

You’ve most likely noticed by now that the *index* of a given *position* is equal to the logarithm of base radix (log_{radix}) of the place value (log_{10} 1000 = 3). This is because the place value is the inverse of this, being the radix to the index power: radix^{index} (10^{3} = 1000).

This curious relationship enables us to calculate the number of digits of a given number in a given base. By calculating the logarithm, base radix, of the value, adding one, and dropping the fractional part, we find the number of digits required. Remember that a logarithm is the power a given base must be raised to; in this instance, the radix is the logarithm base, as illustrated:
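The digit-count calculation is easy to sketch; the loop below counts digits by repeated division, which is equivalent to ⌊log_{radix}(N)⌋ + 1 while sidestepping floating-point edge cases (`digit_count` is my own helper name):

```python
def digit_count(n, radix):
    """Number of digits of n (n >= 1) written in the given radix.
    Equivalent to floor(log_radix(n)) + 1."""
    digits = 0
    while n > 0:
        n //= radix
        digits += 1
    return digits

print(digit_count(1000, 10))  # → 4, matching log_10(1000) = 3, plus one
print(digit_count(255, 2))    # → 8: 255 is 11111111 in binary
print(digit_count(255, 16))   # → 2: 255 is FF in hexadecimal
```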

With the ability to calculate how many digits a given value will have in an arbitrary base we can now calculate the efficiency of a given radix. This measure of efficiency is known as the *radix economy* and tells us the relative costs involved in implementing a given number system.

When implementing machines that can store values, such as computers, or number displays, it can be useful to know what resources are necessary for representing those numbers. Imagine that we are devising a display that will represent the number of likes on this article. We decide each place value will be represented by a wheel with the digits printed on different wheel positions. In base-10 each wheel will then have 10 faces. If we need to count up to 99,999 we will then need to manufacture and have on hand 5 wheels each with 10 faces for a total of 50 faces.

This scenario can be generalized so that for a given value n (here it was 99,999) we can determine how many faces are necessary (50) to represent it in an arbitrary base (10). You’ll notice that for this scenario we multiplied the radix (being the number of possible digits) by the number of places (being the wheels). Since we know how to calculate the number of places needed to represent a value in a given base (above), we can simply substitute that equation in the multiplication:

This equation measures a given radix’s efficiency *E* as a function involving the parameters *r* (the radix) and *N* (the number). So, for example, the number 263 in base-10 would have a radix economy, E, of E(10, 263) = 10 x ⌊(log_{10} 263) + 1⌋ = 10 x 3 = 30. The number 3256 would come out as 10 x ⌊(log_{10} 3256) + 1⌋ = 10 x 4 = 40. Notice that the exact value of *N* doesn’t matter much, as it gets ‘glossed over’ into the number of digits necessary to represent *N*. Through a process of approximation and algebraic re-arrangement we can arrive at an equation that captures this simplification:
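A short Python sketch of the radix economy formula E(r, N) = r × ⌊(log_r N) + 1⌋ (the function name is mine):

```python
import math

def radix_economy(radix, number):
    """E(r, N): the radix (faces per wheel) times the digit count (wheels)."""
    return radix * math.floor(math.log(number, radix) + 1)

assert radix_economy(10, 263) == 30     # 10 faces x 3 wheels
assert radix_economy(10, 3256) == 40    # 10 faces x 4 wheels
assert radix_economy(10, 99_999) == 50  # the like-counter display above
```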

The diagram above uses a calculation we haven’t covered in this tutorial: the *natural logarithm*, that being the logarithm with base *e* or the *Euler constant*. In short, you can easily calculate the logarithm of any given base by calculating the *natural logarithm* of the value and dividing it by the natural logarithm of the base. We use that calculation here to separate out the exponential aspects involving the given number so that we can separate, on the right side, the exponential aspects involving the radix.
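The change-of-base identity described here, log_b(x) = ln(x) / ln(b), is easy to verify in a couple of lines of Python:

```python
import math

# Change of base: the logarithm of any base equals the natural logarithm
# of the value divided by the natural logarithm of the base.
x, base = 263, 10
assert math.isclose(math.log(x) / math.log(base), math.log(x, base))
assert math.isclose(math.log(8) / math.log(2), 3.0)  # log_2(8) = 3
```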

The last equation in the diagram can be read: “The *radix economy* [ *E(r,N)* ] divided by the natural logarithm of *N* is approximately equal to the radix *r* divided by the natural logarithm of *r*.” This means, working backward, we can find the radix economy of *N* by *r* by dividing *r* by log_{e}(*r*) and multiplying it by log_{e}(*N*).

If we’re only interested in arriving at an efficiency value we can compare relative to other efficiency measures (without worrying about the actual value *N*), we can skip the multiplication by *log _{e}(N)* and just work with the parts that are pertinent to the radix:

You can see here an interesting derivation: using this equation as a measure of the relative efficiency of a given radix, a radix of 3 is actually more efficient (smaller factor) than a radix of 2, and of course, the most efficient radix (by this standard) is the Euler constant *e*.
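The radix-only factor r / ln(r) can be tabulated directly (a minimal sketch; the function name is mine):

```python
import math

def relative_economy(radix):
    """The radix-only factor r / ln(r) from the approximation above (lower is better)."""
    return radix / math.log(radix)

# e is the theoretical optimum, and 3 edges out 2:
assert relative_economy(math.e) < relative_economy(3) < relative_economy(2)
# Above 3 the factor keeps climbing (5 gives ~3.10667, 10 gives ~4.34294):
assert relative_economy(2) < relative_economy(5) < relative_economy(10)
```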

The equation used in this article to compute radix economy was lifted from Wikipedia (https://en.wikipedia.org/wiki/Radix_economy) and is used (and explained) here under the terms of the CC3.0 license.

So what does this mean for us? It is well known that computers and other electronic devices almost universally use binary, or base 2, to represent values. However, we can see here that, allegedly, 3 is the most efficient radix (barring the ability to implement *e*). So what gives?

There are a number of reasons devices today continue to use binary rather than ternary (or base 3). One is an already existing investment in electronic materials (integrated circuit designs, transistors, etc.) that most accurately represent a binary system. Another is that binary is less error-prone than larger radices because it is the simplest representation. A value is either on or off, whereas in a ternary system we must also maintain a third state in between, which could fluctuate and give rise to errors.

Despite this con, there are several pros to a ternary system over a binary system. A non-standard positional notation known as *balanced ternary* (which we will explore later) allows integer representations to be added or subtracted more efficiently, and certain operations (such as comparisons) can yield more information at once than in binary. When comparing two integers, the computer could select from *greater than*, *less than*, or *equal*, whereas in binary, one must test for each relationship separately in a true-false manner.

It isn’t as if ternary passed by unnoticed. Early computer scientists and engineers were intrigued by the ability to create a computer that could perform calculations more elegantly than binary allowed. In 1950, a survey conducted by Engineering Research Associates that was published in *High-Speed Computing Devices* concluded that a binary system was preferred, but that a ternary system under the mathematical assumptions given (similar to our above equation) was more economical.

In the early 1950s, the MIT Servomechanisms Laboratory developed the Whirlwind I computer for the U.S. Navy. Proposals for the architecture of this machine, the first of its kind to operate on 16 bits in parallel, included one from **Herb Grosch**, an early computer scientist best known for *Grosch’s Law.* He proposed that the Whirlwind computer operate using ternary, probably in an effort to squeeze out more performance. Whirlwind I morphed into the control system of the military radar network that remained in operation for much of the Cold War, but unfortunately, ternary was ruled out early on in its development.

Funnily enough, it was all the way on the other side of that war that a man named **Nikolai P. Brusentsov** and his colleagues at Moscow State University actually developed the first working ternary computer in 1958. It was named Setun, after a river that flowed near the working campus. In the Setun architecture, values were stored in a series of eighteen ternary digits, or trits. This allowed the machine to store in eighteen slots what a binary computer would have to store in twenty-nine slots. Multiplying the number of “faces” by the number of “wheels,” as discussed earlier, gives Setun a radix economy of 54 (3 x 18) as opposed to a binary radix economy of 58 (2 x 29).
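The Setun arithmetic can be double-checked in a few lines of Python:

```python
import math

# 18 trits store 3**18 distinct values; how many bits hold at least as many?
bits_needed = math.ceil(18 * math.log2(3))
assert bits_needed == 29
assert 2 ** 28 < 3 ** 18 <= 2 ** 29

# "Faces" times "wheels", per the radix-economy analogy above:
assert 3 * 18 == 54  # Setun's ternary word
assert 2 * 29 == 58  # the equivalent binary word
```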

Unfortunately, this efficiency wasn’t actually realized with Setun. The hardware necessary to store a trit could have also easily stored two individual bits. In the same space a trit occupied, storing three states, there could have been a total of four states.

It is with this in mind that some criticize our measure of efficiency. Numerous posts and a few papers show that the overall cost of implementing ternary is more prohibitive than simply implementing binary. This is often shown using technology that is optimized for a binary system, so there is no surprise there. Our generalized equation for efficiency, *r x ⌊(log _{r} N) + 1⌋,* may also not apply if technology is discovered that enables our metaphorical wheel to have more faces at no additional cost. This would mean the incremental cost of increasing the radix is not equivalent to the incremental cost of increasing the number of digits.

Ternary computing wasn’t simply contemplated and then wholly discarded. In the 1960s, experiments and proposals included building ternary memory cells and even ternary logic gates. This culminated two decades after Whirlwind I and Setun when **Gideon Frieder** and his colleagues at the State University of New York at Buffalo designed a complete ternary machine they dubbed *Ternac*.

In 2022 the world is coming to grips with a post-pandemic (the novel coronavirus) economy and an evolving ecology of scarcity that is impacting the computing world. Microchips are becoming harder to manufacture, and computational demands (such as the metaverse and blockchain) are growing exponentially. It is possible that future work in computing technology might include systems that can perform calculations outside of the traditional transistor framework. Will these systems once again take up the ternary flag in pursuit of greater efficiency?

Only time will tell.

Image Based On A Photo by Christian Dubovan on Unsplash

This post (Radix Economy) was first published on How To Program Anything by Asher Wolfstein.

]]>This is the second article in a series whose intention is to have the reader able to understand binary, octal, and hexadecimal; three radices of great importance to contemporary computer theory. This article builds upon the previous article by outlining three important radices (binary, octal, and hexadecimal) that are useful in the field of computer science. I start with arbitrary base conversion using two methods. Then, a bit of background is given for why these bases are important, particularly binary. Finally, we perform radix conversion.

This post (Converting To Binary, Octal, and Hexadecimal) was first published on How To Program Anything by Asher Wolfstein.

]]>This is the second article in a series whose intention is to have the reader able to understand binary, octal, and hexadecimal; three radices of great importance to contemporary computer theory. By the end of this series, you should be able to read and convert integer values into binary, octal, and hexadecimal, perform arithmetic on all three representations, understand basic Boolean operations, and otherwise have a further appreciation of the power of binary.

In the following, we endeavor to convert from one radix representation to a different radix. We delineate these radices as the *source base* and the *target base*. We convert *from* the source base *to* the target base. This article does not cover converting the fractional part of a number in a given radix.

The target radix matters more in conversion than the source radix. The source radix simply informs us as to what value is being represented by the source, giving us the place values by which to multiply our digits.

The most straightforward method of converting from a source to a target base is to enumerate the place values of the target base (in order) that are less than the value being converted. From there, you can perform what is known as a *Euclidean division,* a fancy way of saying you compute a *quotient* and a *remainder.* You divide the source value *(the dividend)* by the highest place value *(the divisor)* and record the quotient as the digit, retaining the remainder. You then move to the next lowest place value and repeat, using the remainder as the new dividend. This method is shown below for the small value 190 decimal into base 3:
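The steps above can be sketched in Python (a naive illustration; the function name is mine):

```python
def convert_by_place_values(value, radix):
    """Divide by each place value of the target base, highest first,
    recording quotients as digits and carrying remainders forward."""
    place = 1
    while place * radix <= value:  # highest place value not exceeding the value
        place *= radix
    digits = []
    while place >= 1:
        quotient, value = divmod(value, place)  # Euclidean division
        digits.append(str(quotient))
        place //= radix
    return "".join(digits)

assert convert_by_place_values(190, 3) == "21001"  # the worked example
```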

This could be considered a naive conversion implementation, however; it leaves the calculator room for improvement. Calculating the divisions of large numbers is complicated and drawn out. Fortunately, there is a better way, though counterintuitive at first. We can “flip” the Euclidean division so that, instead of recording quotients, we record remainders. To do this, instead of dividing by the place value, we divide by the radix itself, storing the remainder as the digit and using the quotient as the dividend of the next step. This method is shown below for the small value 190 decimal into base 3:
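The flipped method is even shorter in Python (again, the function name is mine):

```python
def convert_by_remainders(value, radix):
    """Divide by the radix repeatedly; the remainders are the digits,
    emerging lowest place first."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, radix)
        digits.append(str(remainder))
    return "".join(reversed(digits))

assert convert_by_remainders(190, 3) == "21001"  # matches the naive method
```

Note that the digits come out in reverse order, since the first remainder is the one's place.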

This works because of the nature of the remainder when dividing by the radix. Every place value above the one’s place is a multiple of the radix, since each place value is the radix multiplied by itself index-number of times. By dividing by the radix, we are essentially subsuming, or dividing by, every place value until we can no longer divide, giving us the remainder. Essentially, we reduce each place’s value to the one’s place and discover how many ones remain. This idea can be written mathematically using *Euclid’s Division Lemma*, a = bq + r with 0 ≤ r < b, where *q* is the quotient and *r* is the remainder, and *a* and *b* are the dividend and divisor respectively:

This method may make more sense if you look at the operations in reverse: by multiplying by the radix and adding the remainder. This can be illustrated via a diagram borrowed from the next section. In this diagram, the second row is the previous integer from the third row multiplied by three, and the third row is the sum of the first two:

Dividing by the radix still requires the use of division, although a simpler division. For the calculator, there is still room for improvement. Division operations are more complicated and arduous to carry out than other operations. Addition and multiplication (repeated addition) can be carried out more easily than subtraction and division (repeated subtraction). Is there a way we could convert a value from a source base to a target base using a minimum of addition and multiplication?

The answer lies in an algorithm called *Horner’s Method* or *Horner’s Scheme*. This method is named after mathematician William George Horner but dates further back to Chinese and Persian mathematicians. Horner’s Method is an algorithm for efficiently calculating polynomials by utilizing addition to simplify extended multiplications.

It is based on *Horner’s Rule,* which put succinctly, unwraps a polynomial such as a_{0} + a_{1}x + a_{2}x^{2} + a_{3}x^{3} + a_{4}x^{4} into a recursive equation a_{0} + x(a_{1} + x(a_{2} + x(a_{3} + xa_{4}))). This allows the evaluation of a polynomial of degree *n* with only *n* multiplications and *n* additions:

When considering a value in a given base, we can substitute the radix for *x*, and the coefficients (*a _{n}*) as the digits. So, for 190 expressed using a radix of 3 above, we would have:

We can then work through the process of multiplying and adding the coefficients with the digits using a table inspired by *Synthetic Division* (which is based on Horner’s Method):

We start with the left-most digit, multiply it by the radix, and then add it to the next right digit. We repeat this process until we have the end result. *If we perform the additions and multiplications in our target base,* the end result will be a complete base conversion.
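The table procedure just described reduces to a short loop (a sketch; the function name is mine):

```python
def horner_value(digits, radix):
    """Evaluate digits in a given radix via Horner's Method:
    start at 0, then for each digit multiply by the radix and add."""
    value = 0
    for digit in digits:  # most significant digit first
        value = value * radix + digit
    return value

# 190 in base 3 is 21001: only four multiplications and four additions
assert horner_value([2, 1, 0, 0, 1], 3) == 190
```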

This method, when done by hand or with the assistance of a calculator, is most useful for converting from arbitrary radices to decimal (a radix of ten). The observant reader will notice that this was a very formal (and roundabout) way to find the value of a numeral given in any radix: by summing each digit multiplied by its respective power of the radix.

The standard numerical system, at least in science, is a positional notation using the radix of ten known as the Indo-Arabic numeral system. This system uses the numerals/digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 to form numbers in increasing powers of ten. This system is the most common symbolic representation of numbers in the world. When used to represent integers and non-integers (fractions) alike this system is called *decimal notation*. You will also find *decimal* referring to purely the fractional part of a number being the digits after the *decimal point*. For a review on positional notation using a radix of ten refer to the previous article in the series: Understanding Radix.

You might also see *decimal* referred to as *denary* or *decanary*, though not often. The common characteristic here is the dec- prefix, which traces back through the Latin *decem* to the same root as the Greek for ten: δέκα, pronounced déka.

Binary, on the other hand, is a system of positional notation using a radix of two. This notation is most often used today when discussing numerical values in electronic settings such as computer programming. Counting in binary uses the numerals/digits 0 and 1 exclusively to form numbers in increasing powers of two. The etymology of binary actually traces to the Latin *bini* which translates to *“two-by-two.”* The term binary can refer to anything made of two parts, such as a binary choice, or a binary star, but here we use it to refer to the binary numeral system.

An article later in the series titled *Binary (Base-2) And Its Operations* deals exclusively with the binary number system, its historical roots, modern usage, and arithmetic. However, here we will briefly explore two common dominating factors for using a binary numbering system in computation: electricity and elegance.

Computational mechanisms can be constructed from a variety of materials, even billiard balls (in an idealized fashion), as long as certain conditions are met. Modern computational mechanisms are built using electrical circuits (of increasingly smaller sizes) consisting of *transistors.* Transistors are semiconductor devices that can switch electrical signals depending on an input signal. This switch occurs in two states: *on* and *off.*

By mapping these two outputs, on and off, to the binary digits 0 and 1 (arbitrarily), we can construct devices that appear to operate in accordance with binary enumeration. By carefully aligning and stringing together collections of these switches, we can perform mathematical and logical functions on binary representations to form a miniature calculator. This construction of a miniature calculator is the essence of what is now known as the modern *computer processor.*

In 1937, **Claude Shannon** demonstrated such circuitry sans transistors, using electromechanical relays and switches, in his master’s thesis at the Massachusetts Institute of Technology. Outlined in the paper *A Symbolic Analysis of Relay and Switching Circuits*, this work showed that relay circuits could implement Boolean algebra and binary arithmetic. The thesis went on to become the foundation for practical digital circuit design and enabled the creation of the modern computer.

In the next article in this series, I explore the concept of Radix Economy, being the efficiency of a given radix in expressing numbers. The general idea is to set up a count of materials necessary to express a number, such as faces on a die and the required number of dice. In that article, we arrive at the conclusion that three is the most efficient practical radix according to this measure, but two isn’t far behind.

As the radix climbs higher than three, the efficiency of the radix decreases, with a radix of 5 garnering approximately 3.10667 and a radix of 10 achieving 4.34294 (lower is better). This lines up with reality: an increase in possible digits leads to an increase in implementation complexity.

With binary, we must only track two clear states: on and off. With ternary, or any base higher than two, we would need to track multiple states in exclusivity to each other. For example, with a radix of three (ternary) we would need to track an off signal (minimum), an on signal (maximum), and something in between. Our switches in our processor would need to select not from a simple on and off, but from three states. Building a reliable third state electrically is complex, and increases the margin for error: if the electrical signal happens to fall outside the threshold of the intermediate state, it could be read as one of the others.

As mentioned in Radix Economy, radices larger than two in computing systems aren’t impossible. It’s not a foregone conclusion that binary will always remain the best answer. But as integrated circuit technology and transistors are currently used, binary is the most elegant representation in terms of complexity and margin for error.

Let’s convert the value 197 to binary following the efficient division method from above:

Now, let’s use the Horner Method to convert 11000101_{2} back into decimal notation:
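Both worked examples can be checked with a short Python sketch (the helper names are mine):

```python
def to_binary(value):
    """The efficient division method: divide by 2, collect remainders."""
    digits = ""
    while value > 0:
        value, remainder = divmod(value, 2)
        digits = str(remainder) + digits  # remainders emerge one's place first
    return digits or "0"

def from_binary(bits):
    """Horner's Method with a radix of 2."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)
    return value

assert to_binary(197) == "11000101"
assert from_binary("11000101") == 197
```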

The term octal refers to a radix of eight. The prefix oct- comes from the Greek word for eight: οκτώ, pronounced októ.

But why use a radix of eight? It turns out octal translates cleanly to three bits, much as hexadecimal (covered later) translates to four bits.

What’s a bit? The later article covering binary delves deeper into the definition of binary numbers in relation to computer hardware, but a quick overview here is useful. In computer science, a *“bit”* is the smallest amount of information processable/storable by a conventional electronic computer. One bit is one binary digit, with two states: on and off. By stringing multiple bits together you can represent numbers of varying ranges.

In the case of octal, we can string together three bits to represent a binary integer between the values zero and seven inclusively. This is a total of eight independent values, each representable by one octal digit (0, 1, 2, 3, 4, 5, 6, and 7). We can visualize this in the following table:

Binary is clean, elegant, and simple… for a computer. Unfortunately, to even a trained eye binary representations can quickly become untenable. Consider the binary value 10001010011111010000_{2}, equal to 567,248 in decimal notation. If I were to change a single one of those digits (bits), what value would it then represent? You can see the issue.

Because the octal radix (8) is a clean power of two (2^{3}), we can simplify binary expressions in groups whose size is that exponent: in octal’s case, groups of three bits. We can reference the table above (and the value above) to turn 10001010011111010000_{2} into 2123720_{8}. The following diagram visualizes this process:
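The grouping process can be sketched in Python (the function name is mine):

```python
def binary_to_octal(bits):
    """Substitute each group of three bits (grouped from the right) with an octal digit."""
    padded_length = -(-len(bits) // 3) * 3  # round up to a multiple of 3
    bits = bits.zfill(padded_length)        # pad with zeros on the left
    groups = [bits[i:i + 3] for i in range(0, len(bits), 3)]
    return "".join(str(int(group, 2)) for group in groups)

assert binary_to_octal("10001010011111010000") == "2123720"
```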

While octal is handy for simplifying binary, it’s not used as commonly today as the next notation of hexadecimal. Its most famous current usage is in file permissions in Unix-style operating systems, particularly *Linux.* In those systems, common user permissions are expressed in three sets of three bits. Each bit represents the permissions *read*, *write*, and *execute*, and each set denotes the context of *owner*, *group*, and *public*. In this regard, full permissions in every context for a file could be expressed as 777_{8} (111111111_{2}).
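This permission encoding is easy to confirm in Python; the rwxr-xr-x example below is my own illustration, not from the article:

```python
# Full permissions: nine 1-bits, read as three octal digits
assert int("111111111", 2) == 0o777 == 511

# A hypothetical illustration: rwxr-xr-x is 111 101 101 in binary, 755 in octal
assert int("111101101", 2) == 0o755
```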

With a binary representation as the source base, conversion to octal becomes simply a matter of substituting three bits at a time with the corresponding octal digit. In other cases though, such as converting from base 10, we can use the methods already outlined. Let’s convert the value 197 to octal following the division method from above:

Now, let’s use the Horner Method to convert 305_{8} back into decimal notation:

Hexadecimal is a positional notation with a radix of sixteen. The prefix of *hexadecimal* is composed of two smaller prefixes: hex- and dec-. Previously we said dec- is related to the Greek term for ten. Hex- is a similar prefix and is related to the Greek word for six: έξι, pronounced éxi.

You might think hexadec- would refer to ten multiplied by six and refer to a radix of sixty. In the first article in the series, we encountered the Babylonian era numeral system that used a radix of sixty. However, that system was called *sexagesimal*. Hexadecimal refers to a radix of sixteen, being the fourth power of two.

In order to represent numbers in hexadecimal, we need to have sixteen different numerals. The decimal notation can provide the first ten (0, 1, 2, 3, 4, 5, 6, 7, 8, and 9), but what about the remaining six? In computer science and mathematical literature, the first six letters of the Latin alphabet are substituted for numerals: A, B, C, D, E, and F. This gives us a complete set of digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.

Just as in octal, the hexadecimal radix is a clean power of two (2^{4}), so we can simplify binary expressions in groups of that exponent: in hexadecimal’s case, groups of four. Where one octal digit represents three successive bits, one hexadecimal digit represents four successive bits. We can build a similar reference table as above and use it to turn 10001010011111010000_{2} into 8A7D0_{16}. The following diagram visualizes this process:
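As with octal, the substitution is a few lines of Python (the function name is mine):

```python
def binary_to_hex(bits):
    """Substitute each group of four bits (grouped from the right) with a hex digit."""
    padded_length = -(-len(bits) // 4) * 4  # round up to a multiple of 4
    bits = bits.zfill(padded_length)        # pad with zeros on the left
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)

assert binary_to_hex("10001010011111010000") == "8A7D0"
```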

Because binary values of considerable length can be significantly shortened using hexadecimal it is often used when working “close to the hardware.” Programming in machine code, or one layer of abstraction above that, *Assembly,* often requires dealing with large binary values as they are stored in *registers,* and when addressing computer memory. In this context, hexadecimal becomes a valuable resource.

Another area where users most often encounter hexadecimal is in 24/32-bit color values, particularly on the internet. These color values usually have three color channels, red, green, and blue, with an optional alpha channel for transparency. Each channel has 256 shades, 0 – 255 (exactly one byte, or eight bits of information). In binary, 255 is represented by 11111111_{2}. This is eight ones in sequence, but if you break it up into two groups of four (1111), you can convert it to hexadecimal using the above table: FF_{16}. On the web (such as in the *CSS* standard), the format for specifying a 24-bit “web” color follows the hexadecimal triplet RRGGBB, where R stands for red, G for green, and B for blue. In this scheme, pure red becomes FF0000, pure green 00FF00, and pure blue 0000FF.
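The triplet format described above can be sketched in Python (the function name is mine):

```python
def rgb_to_web(red, green, blue):
    """Format three 0-255 channels as an RRGGBB hexadecimal triplet."""
    for channel in (red, green, blue):
        assert 0 <= channel <= 255, "each channel is exactly one byte"
    return f"{red:02X}{green:02X}{blue:02X}"

assert rgb_to_web(255, 0, 0) == "FF0000"  # pure red
assert rgb_to_web(0, 255, 0) == "00FF00"  # pure green
assert rgb_to_web(0, 0, 255) == "0000FF"  # pure blue
```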

With a binary representation as the source base, conversion to hexadecimal becomes simply a matter of substituting four bits at a time with the corresponding hexadecimal digit. In other cases though, such as converting from base 10, we can use the methods already outlined. Let’s convert the above binary number (567,248 in decimal) to hexadecimal following the division method from above:

Now, let’s use the Horner Method to convert 8A7D0_{16} into its decimal notation:
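A Python check of this final conversion, using Horner's loop with a radix of 16 (the names are mine):

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(hex_string):
    """Horner's Method with a radix of 16: value = value * 16 + digit."""
    value = 0
    for ch in hex_string:
        value = value * 16 + HEX_DIGITS.index(ch)
    return value

assert hex_to_decimal("8A7D0") == 567_248
```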

We’ve been presented with various bases so far in this series: binary, base-3, base-7, octal, and hexadecimal. However, we haven’t discussed which radix is “the best.” Is there a “most efficient” radix? Is binary (base-2) the most efficient radix to store information? This question is answered in the next article. In doing so, we’ll also discuss how to calculate how many digits are in an arbitrary value in any base. This type of calculation is useful in *information theory* for determining how much data is required to represent any given probability. Further in the series, we’ll be exploring binary, its history, and how to perform arithmetic operations in binary. We’ll also briefly touch upon Boolean logic operations, the shift operations found in many processors, and their relation to binary.

Image Based On A Photo by Fikri Rasyid on Unsplash

This post (Converting To Binary, Octal, and Hexadecimal) was first published on How To Program Anything by Asher Wolfstein.

]]>This article puts forth a brief history of counting, which details how we arrived at some of the conventions we have today, including the notion of radix. It then explores the concept of radix in positional numeral systems, and in particular the concept of using radices of arbitrary values. With this foundation, it becomes a simple exercise to use binary, octal, and hexadecimal, each with a radix of two, eight, and sixteen respectively.

This post (Understanding Radix) was first published on How To Program Anything by Asher Wolfstein.

]]>This is the first article in a series whose intention is to have the reader able to understand binary, octal, and hexadecimal; three radices of great importance to contemporary computer theory. By the end of this series, you should be able to read and convert integer values into binary, octal, and hexadecimal, perform arithmetic on all three representations, understand basic Boolean operations, and otherwise have a further appreciation of the power of binary.

We rarely think about counting itself in our daily usage of numerals, having learned from an early age how to do so. Our most common system of counting, the Hindu-Arabic (or Indo-Arabic) numeral system, was developed and adopted over a long period of time. It’s strange to consider, but this decimal numeral system wasn’t always around and certainly wasn’t the first.

To understand the concept of radix, which is an abstraction particular to a positional numeral system such as the Indo-Arabic, we have to understand how such a system came about. To understand how and why this system came about and is used, we have to understand the intellectual foundations of counting itself.

The concept of a number is an abstraction. The idea is that a given identifier, in this case, a number, indicates the quantity of a set of individual elements. For instance, say we have these symbols:

Without any process of abstraction, this collection of symbols presents to us a gathering of unique entities, each separate from the other. Our first abstraction is to note that some of the symbols have similar characteristics. We can see that some of the symbols are a particular color, and likewise, we can note that some of the symbols have the same overall shape.

Abstraction is this conscious process through which we identify particular characteristics of entities while ignoring the specific measures of those characteristics. By identifying the trait of hue, and ignoring all other aspects, we come up with *color*, no matter the shape or size. By identifying a unique look, and ignoring all other aspects, we come up with *shape*, no matter the size or color. Using that process we can say multiple symbols are all green, or all square in the above diagram.

Counting is a similar process. Now we identify a new trait, that of amount. Focusing purely on the characteristic of amount gives rise to the abstraction of *quantity*. Each symbol considered by itself is a single thing. Single here is an *instance* of the abstraction of quantity. If we then consider the entire diagram we can see that there are seven things. Seven here is an instance, arbitrarily named, of the same abstraction of quantity. This abstraction, and its infinite instances, is what we know as *counting*.

In the diagram above I can see there are four green shapes, and three blue shapes. I can also see there are two squares and five triangles. Overall, there are seven shapes in total. This is counting in practice.

In mathematical terms what we are doing in these cases is determining the size of a set. A set is a collection of elements, and the quantity of that set is the number of those elements. Each colored shape is an element and the numbers three or five correspond to the quantity of them in a given set.

It’s useful to be able to acknowledge and assign names to instances of the abstraction of quantity (numbers). Each language has its own way of doing this. In English, the most basic names are one, two, three, and so on up until nine. For instance, with the knowledge that I have four arrows left, I can gauge my ability to hunt, or knowing I have seven pies can tell me how many people I can invite to dinner. However, what do we do when a) we want to record this information and b) the numbers become so large they become difficult to process?

Let’s first tackle the recording of quantity. In pre-literary history, people often used various body parts to assist them in counting. We still do this today when we employ our fingers, for example, to count up to ten. However, once the quantity to count surpasses memorable body parts, accuracy suffers.

People needed something external to themselves that could persist. One way of creating an external measure was to use objects, or marks, in such a way that there were equal marks for equal quantity.

This system is known as a *unary numeral system:* to represent a quantity, a given symbol is repeated the same amount of times. This was useful for recording purposes, as you could indicate a quantity without having to have what you were counting present, but it presented serious shortcomings.

Imagine looking at a disorganized series of one-hundred twenty-seven dots and determining how many dots exist. To do so, you’d have to recount them again to be sure. To make this recount easier, and much simpler, a method of marking known as tally marks was used.

In this system, marks are drawn together in clusters, each typically denoting a quantity of five. The diagrams below show four different systems of tally marks as they exist today:

These types of tally marks (and variations) are used most often in Europe, Australia, New Zealand, North America, and South Africa.

For regions where Chinese logograms (漢字/汉字) are used, mainly China and Japan, the character 正 is drawn repeatedly as it consists of five strokes.

In France, Spain, and Brazil it’s common to use squares completed with a diagonal slash.

In the field of forestry, tally marks are often clustered in quantities of ten by including dots in the square as well as lines.

The key observation to take away from these tally marks is that of the clustering. With tally mark clusters we can suddenly start counting by fives or tens when counting the marks as opposed to one at a time. This idea of clustering together a quantity is useful, and, as we’ll see, is expanded upon.

In ancient times, people called the Sumerians developed methods of primitive accounting, the remains of which were later found in the Zagros region of Iran. Society required records of transactions of flock and grain, and so recording methods were devised utilizing clay tokens and string. The method by which a given quantity was encoded onto strung-together clay is known as a *sign-value system*.

To create a record that represented three bales of hay, as a probably historically inaccurate example, they would bake three clay tokens with a symbol on them to represent hay and then string them together. These were later baked into a bulla (or hollow ball) to preserve their integrity from outside manipulations. In this example, one token represented one bale of hay.

However, this had its limitations. As the number of things to record increased, it became more and more difficult to count with singular tokens, much like in the unary system. To overcome this, different tokens were baked and assigned a larger sum. Using this method to scale up, you’d end up with a token for one bale of hay, a token for five bales of hay, a token for ten bales, and so on.

Each different token was strung along into a sequence. To know what amount a given string of tokens represented, one would add up the numbers assigned to each. One ten bale token, one five bale token, and two one bale tokens would be equivalent to seventeen bales of hay. This process and encoding, where a given number is arrived at through the summation of each element, is known as *sign-value notation*.

You’ve probably noticed that the tally marks discussed before are in many ways much like the tokens: both use different representations for different values, particularly in the forestry example. Tally marks are a unary system with elements of sign-value. However, sign-value notation by itself, as we’ll see in the following example, can be more complex and nuanced than tally marks allow.

You are likely already familiar with sign-value notation! This notation later came to be used in Roman numerals. That notation, with a few added rules of subtraction, is probably the most famous sign-value system. In Roman numerals, an *I* represents a one, *V* a five, *X* a ten, *L* a fifty, and *C* is one hundred. The example clay sequence above would then translate similarly to *XVII* in Roman numerals.
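To make the summation concrete, here is a quick Python sketch (the symbol table and helper function are hypothetical, and the later Roman subtraction rules are deliberately ignored):

```python
# Sign-value notation: the value of a sequence is simply the sum of the
# fixed value of each symbol, regardless of where the symbol appears.
SYMBOL_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}

def sign_value(sequence):
    """Sum the face value of every symbol in the sequence."""
    return sum(SYMBOL_VALUES[symbol] for symbol in sequence)

# One ten, one five, and two ones: seventeen bales of hay.
print(sign_value("XVII"))  # 17
```

Note that reordering the symbols changes nothing: `"IIXV"` sums to the same seventeen, which is exactly what distinguishes sign-value from positional notation.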

The astute reader will notice here the beginning of a very important trend. Once again we see higher and more abstract values (those difficult to count in sequence) getting shortened or clustered together. In this case, the Sumerians in our example grouped five elements into one element, and so on.

As history marched forward, the Sumerians gradually replaced the dull pointer they were using for their accounting with a reed stylus that could press wedge shapes into clay. This was an adoption of the wedge-shaped cuneiform writing they had also developed. These wedge-shaped numerical notations continued to retain the circular shape and sign-value notation they had inherited, but there were problems.

Up until that point, Sumerians had recorded amounts in various states of isolation and each city had come up with its own method of a sign-value notation using different values. The incompatibilities weren’t limited to location, even in one city there could be different systems for different concepts; one for discrete objects, one for ingredients, one for area, one for time, etc. Think about how today some of us continue to struggle to convert tablespoons to cups, and ounces to pounds. These different systems of quantity led to ambiguity when recorded in cuneiform. Without knowing the proper sign-value delineations being used, the marks would be meaningless.

Eventually, these systems began to converge to a single standard around 2100 BC. Two cuneiform marks, the vertical wedge and a V-like shape called a chevron, became dominant. Sign-value delineations also converged, and by 1950 BC, at the beginning of the Old Babylonian period, a new standard arose in the form of a sexagesimal number system.

This new sexagesimal number system debuted a counting innovation that so far has eluded us. In a sexagesimal system, one counts using alternating delineations of ten and six (hence sexagesimal), but in contrast to sign-value, one also pays attention to where a given value is in the sequence. This is known as *positional notation* or *place-value notation*.

In this system, one counts up to nine using vertical marks, but when you get to ten you draw a chevron on the side. You may repeat this four more times, creating four more chevrons and ultimately nine more vertical wedges to reach a value of fifty-nine. Fifty-nine then has five chevron marks and nine vertical marks. Past this point, you would begin again in another position. A vertical mark in that position would indicate a value of sixty, being the next value after fifty-nine.

At first, this method may sound alien, but millions of people every day feel the influence of this system when they count sixty seconds to a minute, sixty minutes to an hour, or three-hundred and sixty degrees to a circle. This numerical ghost of Babylonia continues to haunt us today.

At last, we have uncovered the ultimate and most elegant consequence of value clustering. By paying attention to *where* a given numeral is in relation to others we can increase its value most efficiently to represent tens or even thousands as necessary.

You might notice that, although the Sumerian positional notation revolves around the number sixty, each position itself is divided up by the more familiar value of ten. Mathematically speaking, a common theory holds that sixty was chosen because of its compactness and its prime factorization (2 x 2 x 3 x 5). Thanks to this factorization, sixty is what is known as a *superior highly composite number*: a number with a higher ratio of divisors to some power of itself than any other number. It is evenly divisible by 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, and 30.
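A one-line Python check confirms the divisors of sixty claimed above:

```python
# List every proper divisor of 60 (every d below 60 that divides it evenly).
divisors = [d for d in range(1, 60) if 60 % d == 0]
print(divisors)  # [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30]
```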

In positional notation, as opposed to sign-value notation, the placement of a numeral within a sequence of numerals helps determine that numeral’s value. In sign-value notation the numeral 5 is always equal to five, so 555 would be fifteen. In positional notation, the placement of a numeral, its position, indicates an additional value by which to be multiplied. To continue the example, 555 in decimal positional notation would equal five-hundred and fifty-five (5 x 100 + 5 x 10 + 5 x 1).
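A small Python sketch contrasts the two readings of 555 described above:

```python
digits = [5, 5, 5]

# Sign-value reading: each numeral keeps its face value; just add them up.
as_sign_value = sum(digits)  # 5 + 5 + 5 = 15

# Positional reading: each numeral is also multiplied by its place value.
as_positional = sum(d * 10**i for i, d in enumerate(reversed(digits)))
# 5 x 100 + 5 x 10 + 5 x 1 = 555

print(as_sign_value, as_positional)  # 15 555
```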

Using this notation a given numeral and its place is called a *digit*. 555 above would be three digits, while 12673 would be five digits. The actual place of a digit is called its *position*. The value represented by a series of digits is called a *number*. You can count the placements of digits from the right starting at 0. This count is called the *index*. The diagram below sums up these terms graphically:

But how do we determine the value by which a numeral is multiplied? How do we know that a 5 in index two is multiplied by 100 as opposed to say 256? To answer those questions we must develop a new parallel concept integral to positional notation: *radix* (also called the *base*). Radix is a Latin word for “root.”

We’ve outlined how the position of a numeral in positional notation denotes a value to be multiplied against that numeral for a final value. In the diagram above, for example, 6 is to be multiplied by 100 for a final value of 600. But, what are the values to be multiplied? How did we know to multiply by 100?

The easiest way to illustrate the concept of radix is to count. Let us employ the standard method of counting, decimal, to show what each value placement, or position, means. If we start at zero and count up in decimal, we eventually reach nine. What happens next involves numeral position and defines the concept of radix or base.

9 is the highest numeral in the decimal system, meaning that there are no other numerals to choose from when counting. So how do we count higher? Those familiar with decimal will naturally point out the value of 10.

Notice what was done here.

We replaced the original 9 with a 0 and wrote a new numeral *1* to the left of the 0. In positional notation, we understand that this 1 is in the *“ten’s place”* and is equal to ten. That is, it is equal to *1 x 10*. Of course, we could also derive that from the fact that 10 is equal to 1 + 9.

Likewise, if we count up another 9 values we reach the value 19. What happens when we count 1 past 19? We replace the 9 with a 0, and then increment (add 1) the *1* to a *2.* This indicates, as above, that we are now speaking of 2 tens (that is, twenty) added to a 0. We can continue this process until we have 9 tens and 9 ones: 99.

What happens when we count one more past 99?

Much like with the tens place we replace the two 9s with 0s and write an additional numeral 1 to the left of the previous digits. We end up with 100 which represents one hundred. That is, the 1 is now in the *“hundred’s place”* and is equal to one hundred.

As we count up, we continue this process and in doing so we end up with a *“thousand’s place,”* a *“ten thousand’s place,”* a *“hundred thousand’s place,”* a *“million’s place”* and so on.

You’ll notice that each time we create a new “place” or position, the value indicated by that position increases by a factor (power) of 10. It’s no coincidence that in decimal the highest numeral, 9, is one less than the factor by which we’re multiplying. In our scenario here, the factor of 10 is known as the positional notation’s *radix* or *base.* Decimal then is known as base 10 or having a radix of 10.

Radix is the factor by which our *place values* (the value indicated by a numeral’s position) increase. You might think that, with a radix of 10, this is an obvious concept that need not be heeded. However, things get interesting when we decide to change the radix to something other than 10.

What does that look like?

Let’s use a radix that is lower than 10, for example, a radix of 6. This means that each place value increases by a factor of 6. The first place value (starting at index 0) is 1, giving us *6 ^{0} = 1.* The second place value (one to the left) is *6 ^{1} = 6.*

If the second place value is 6, as opposed to 10, that means that we can only count up to 5 in the *“one’s place”* before replacing it with a zero and incrementing one digit to the left. In this scheme, this radix, that means that *10 = 1 x 6 = 6* as opposed to ten, and likewise, if we counted up by 5 again and then added one more we’d get *20 = 2 x 6 = 12.*
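Python’s built-in `int` accepts an explicit radix, which lets us check this base-6 counting directly:

```python
# int(text, base) parses a digit string in the given radix.
print(int("10", 6))  # 6  -> one six and zero ones
print(int("20", 6))  # 12 -> two sixes
print(int("55", 6))  # 35 -> the largest two-digit base-6 number
```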

You’ll notice here that not only is the radix the factor by which the place values increase, but it is also equal to the number of unique numerals (including zero) that we use to count in that radix.

Mathematically we can generalize the place value properties of an arbitrary radix. Let us define the radix as *b* (we assume here *b > 1*) and a series of digits limited by that radix as *d _{1}d_{2}d_{3}…d_{n}*. The value represented is then *d _{1} x b ^{n-1} + d _{2} x b ^{n-2} + … + d _{n} x b ^{0}*: each digit multiplied by the radix raised to the power of that digit’s index.
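The general place-value rule can be sketched as a small Python function (a hypothetical helper for illustration):

```python
def positional_value(digits, b):
    """Value of a digit sequence (most significant first) in radix b."""
    value = 0
    for d in digits:
        assert 0 <= d < b, "each digit must be less than the radix"
        value = value * b + d  # equivalent to summing d times b to its power
    return value

print(positional_value([5, 5, 5], 10))  # 555
print(positional_value([1, 0], 6))      # 6
```

Rather than computing each power of *b* separately, the loop folds the multiplications together (Horner’s method), which produces the identical sum.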

We can actually define the radix to be something other than a natural number. We can employ an algebraic number such as the golden ratio, a complex base using the imaginary unit *i*, a negative base (which allows us to represent negative numbers without a minus sign), and even the base of the natural logarithm, *e*. The latter of these comes up later when we discuss the economy of a radix.

Let’s see if we can show the concept of radix using a positional notation by example. In these examples, we’ll be representing the value 2067 in three different radices: 3, 7, and 10.

For a radix of 3, we have the numerals 0, 1, and 2 at our disposal. The place-values increase by a factor of 3 (the radix) as we travel left. This gives us the following place values:

We start on the left and subtract place values away from the total value with increasing granularity and precision. On the far left we start with 729 (3^{6}), then 243 (3^{5}), followed by 81 (3^{4}), 27 (3^{3}), 9 (3^{2}), 3 (3^{1}), and finally 1 (3^{0}). You can see how the indices of the positional notation correspond to the powers of the radix. In each of these place values, we can only count up to 2 (the radix minus 1).

In the diagram, you can see I have translated the value 2067 into base 3. Starting with 1458 (729 x 2) we add 486 (243 x 2), then 81 (81 x 1), 27 (27 x 1), 9 (9 x 1), and finally 6 (3 x 2) to get our value 2067. In the next post in the series, we’ll discuss exactly how I was able to do this conversion.
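Although the conversion method proper is the subject of the next post, the subtraction process just described can be previewed as a short Python sketch (the function name is my own):

```python
def to_base(n, b):
    """Digits of n in radix b, found by subtracting place values greedily."""
    if n == 0:
        return [0]
    place = 1
    while place * b <= n:          # find the largest place value needed
        place *= b
    digits = []
    while place >= 1:
        digits.append(n // place)  # how many of this place value fit
        n %= place                 # subtract them away
        place //= b
    return digits

print(to_base(2067, 3))  # [2, 2, 1, 1, 1, 2, 0]
```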

For a radix of 7, we have the numerals 0, 1, 2, 3, 4, 5, and 6 at our disposal. The place-values increase by a factor of 7 (the radix) as we travel left. This gives us the following place values:

On the far left we start with 343 (7^{3}), then 49 (7^{2}), followed by 7 (7^{1}), and finally 1 (7^{0}). In the diagram, you can see I have translated the value 2067 into base 7. We start on the left and subtract place values away from the total value with increasing granularity and precision. Starting with 2058 (6 x 343), we skip 49 (the remainder of our subtraction isn’t large enough) and then add 7 (1 x 7), and finally 2 (2 x 1) for a total of 2067.

Finally, for our familiar radix of 10, we have the full range of traditional numerals at hand: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. The place-values increase by a factor of 10 (the radix) as we travel left. This gives us the following place values:

On the far left we start with 1000 (10^{3}), then 100 (10^{2}), followed by 10 (10^{1}), and finally 1 (10^{0}). In the diagram, you can see I have translated the value 2067 into base 10. We start on the left and subtract place values away from the total value with increasing granularity and precision. Starting with 2000 (2 x 1000), we skip 100 (leaving a 0 in the hundred’s place), add 60 (6 x 10), and finally 7 (7 x 1) to achieve a sum of 2067.
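All three renderings of 2067 can be double-checked with a small remainder-based Python sketch (again just a preview; the next post covers conversion in detail):

```python
def digits_in_base(n, b):
    """Render n as a digit string in radix b via repeated division."""
    out = []
    while n:
        n, r = divmod(n, b)   # peel off the least significant digit
        out.append(str(r))
    return "".join(reversed(out)) or "0"

for base in (3, 7, 10):
    print(base, digits_in_base(2067, base))
# 3 2211120
# 7 6012
# 10 2067
```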

Writing numbers using positional notation gives rise to some interesting properties. Before we go too much further, though, we’ll cover how to convert between decimal and the three major radices used in computer science: binary, octal, and hexadecimal. As the series progresses we’ll cover many of these fascinating facets including how to calculate the economy (efficiency) of a given radix, how to represent negative and fractional values, and even some non-standard positional notation systems such as balanced ternary.

Image Based On A Photo by Amirhossein Azandarian Malayeri on Unsplash

This post (Understanding Radix) was first published on How To Program Anything by Asher Wolfstein.

This series intends to equip the reader to understand binary, octal, and hexadecimal; three radices of great importance to contemporary computer theory. By the end of this series, you should be able to read and convert integer values into binary, octal, and hexadecimal, perform arithmetic operations on all three representations, understand basic Boolean operations, and otherwise have a further appreciation of the power of binary.

This post (Binary, Octal, And Hexadecimal) was first published on How To Program Anything by Asher Wolfstein.

Practitioners in computer science often must convert between three major positional numeral systems: binary, octal, and hexadecimal. Binary denotes base two, consisting only of ones and zeros, and is used in the construction of circuits and components, such as a central processing unit. Octal denotes base eight and can serve as a way to group binary digits together in sets of three. Hexadecimal denotes base sixteen and, like octal, can be used to group binary digits, in sets of four. As binary sequences get longer, the ability to group digits together as shorthand becomes essential.

In this series of posts, the reader will come to understand these three number bases in the context of counting in any number base. We’ll first cover the concept of a radix, or number base, with an eye toward the historical aspects of counting. From there we’ll leverage our understanding of radix to convert decimal values to binary, octal, and hexadecimal and back. Once we’ve explored these specific radices we examine the idea of radix economy, and answer the question, “Is there a radix with optimal efficiency?”

The next couple of articles focus specifically on binary, as this radix is the basis for how all information is represented in a computer. Here we examine binary vernacular, such as the kilobit or megabyte, as well as Boolean logic. Further on we explore negative value representations using binary, including the popular two’s-complement method. To round out our discussion of arbitrary bases we experiment with non-standard positional notations such as balanced ternary, a form of base three, and other obscure counting odds and ends.

This article puts forth a brief history of counting, which details how we arrived at some of the conventions we have today including the notion of radix. It then explores the concept of radix in positional numeral systems, in particular the concept of using radices of arbitrary values.

This article builds upon the previous article by outlining three important radices (binary, octal, and hexadecimal) that are useful in the field of computer science. I start with arbitrary base conversion using two methods. Then, a bit of background is given for why these bases are important, particularly binary. Finally, we put those methods into practice with concrete radix conversions.

This article begins with a recap of where we are in the series in regards to the concept of counting. I review the definition of positional notation as outlined in the first article and then move on to reveal how we can calculate the number of digits a value will have in a given radix. In doing so I will go over two mathematical concepts relevant to this calculation: exponents and logarithms. I will then use logarithms to show how you can calculate the efficiency of a given radix, also called the radix economy, and answer the question, “What’s the most efficient radix?”

This article continues the trend of the previous articles and begins with a history of binary. After that, I briefly reiterate why binary is used in modern electronic devices as covered in the previous article, and go into more depth regarding binary “sizes” (bit, byte, kilobyte, etc.) Then I move on to important elements of binary arithmetic, and the operations of addition, subtraction, multiplication, and division. I cover two operations often found in computing processors, the shift operators, and their mathematical meaning. Finally, I briefly cover Boolean logic operations.

A non-standard positional notation is one where the value of each position isn’t necessarily a straightforward power of the radix. I am also including when the radix is not a positive integer (such as -2), even though mathematically the representation is consistent with standard positional notation. By altering the interpretation of one or more of the place values (or the radix) of a binary representation, we are able to represent negative values. In this post I’ll be covering sign-magnitude, the most intuitive method, the radix complement methods (ones’ complement and two’s complement), offset binary (also known as excess-k or biased), and base -2 (base negative two).

Featured Image Based On A Photo by Carlos Irineu da Costa on Unsplash

