Every piece of information that is stored in computers is encoded in numbers.
The sentence you're reading now is also just a sequence of numbers: each letter of the alphabet has a corresponding numeric code, so the word "hello" is represented with five numbers: 104, 101, 108, 108, 111.
We use a standard encoding scheme called UTF-8, which can represent a wide range of characters, including symbols, characters from all sorts of languages, and emojis. Other encoding systems exist, but UTF-8 is the most commonly used today.
Give it a try and see how each letter turns into a number:
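If you have Python at hand, here is a minimal sketch of the same experiment. It assumes nothing beyond the built-in str.encode method; the variable names are made up for illustration.

```python
# Encode text as UTF-8 and look at the numbers (bytes) behind it.
text = "hello"
codes = list(text.encode("utf-8"))
print(codes)  # [104, 101, 108, 108, 111]

# Characters outside the basic Latin alphabet take more than one byte.
# "\u00e9" is the letter e with an acute accent.
accented = list("\u00e9".encode("utf-8"))
print(accented)  # [195, 169]
```

Each plain English letter becomes a single number, while the accented letter needs two.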
In our everyday life, we use decimal numbers, which are based on the number 10: we count in tens, hundreds, thousands, and so forth. This way, we count physical things: one tree, two apples, ten fingers. But the same quantities can be represented using different notations and with other numerical systems. For instance, sometimes we use Roman numerals: I, II, III, IV, X. These are not particularly practical if we need to deal with larger numbers, though—and likewise, decimal numbers are not always practical when we deal with computers and data.
Computers use the binary number system, which is fundamentally no different from other number systems. To understand how these systems are related, let's take a decimal number: 123. It can be deconstructed as a sum of numbers (100 + 20 + 3), or as a sum of exponents:
123 = 1 × 10² + 2 × 10¹ + 3 × 10⁰
Notice how we exponentiate the number 10: it's our number base, or radix, which can be seen as the number of
digits that we can use for counting, starting with zero. For example, base 10 has 10 digits from 0 to 9,
base 8 has 8 digits from 0 oct to 7 oct, and base 2 has only two digits: 0 bin and 1 bin. An exponent can also be seen as a digit's place.
If we take a number like 500, it has 3 digits: 5, 0, and 0.
5 is the 3rd digit from the right. We can think of that row of digits as a zero-based list or array. Counting from the right, the 5 sits at index 2. So we take 10 to the power of 2 and multiply by the digit 5, i.e.: 10² × 5 = 500.
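The idea of treating digits as a zero-based list counted from the right can be sketched in a few lines of Python. The function name place_values is hypothetical, and the sketch only handles the digits 0 to 9.

```python
def place_values(number_text, base=10):
    """Return (digit, index, digit * base**index) for every digit,
    where index 0 is the rightmost digit."""
    terms = []
    for index, char in enumerate(reversed(number_text)):
        digit = int(char)
        terms.append((digit, index, digit * base ** index))
    return terms

print(place_values("500"))
# [(0, 0, 0), (0, 1, 0), (5, 2, 500)]

# Adding up the last element of each tuple gives the number back.
print(sum(value for _, _, value in place_values("123")))  # 123
```

The last tuple for "500" shows the digit 5 at index 2 contributing 500.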
Something fun happens when we change the number base and follow the same principle of summing exponents.
Let's see how it works in base 8. The octal number 777 oct is a sum of exponents:
7 × 8² + 7 × 8¹ + 7 × 8⁰ = 448 + 56 + 7
If we sum the numbers 448 + 56 + 7, we'll get 511, which is exactly what the octal number 777 oct is if we convert it into a decimal number!
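To double-check the octal sum, here is a small sketch that adds up the exponents by hand and compares the result with Python's built-in int conversion. The helper name to_decimal is an assumption made for this example, and it only accepts the digits 0 to 9.

```python
def to_decimal(digits, base):
    """Sum digit * base**index over all digits, with index 0 at the rightmost digit."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        total += int(char) * base ** index
    return total

print(to_decimal("777", 8))  # 511
print(int("777", 8))         # 511, the built-in conversion agrees
```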
The same thing happens if we use 2 as our number base. For example, let's take the number 1011 bin. It can be deconstructed as a sum of the following exponents:
1 × 2³ + 0 × 2² + 1 × 2¹ + 1 × 2⁰ = 8 + 0 + 2 + 1
If we sum all these exponents, we will get 11, which is the number that is represented as 1011 bin in binary form.
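Many programming languages let you write numbers in several bases directly. As an illustration, this is how Python's literals and built-ins treat the same quantity in different notations; the variable name is arbitrary.

```python
value = 0b1011          # binary literal
print(value)            # 11
print(bin(11))          # 0b1011
print(format(11, "b"))  # 1011 (same digits, without the 0b prefix)
print(int("1011", 2))   # 11

# One quantity, four notations: binary, octal, decimal, hexadecimal.
same = 0b1011 == 0o13 == 11 == 0xb
print(same)             # True
```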
But what about number systems with more than 10 digits, such as hexadecimal (base 16)? When we run out of digits, we can start using letters instead: the hexadecimal digit A hex is equal to decimal 10, B hex is equal to 11, and so on up to F hex, which is equal to 15.
Give it a try with 1f hex:
1 × 16¹ + 15 × 16⁰ = 16 + 15
If we sum the numbers 16 + 15, we'll get the decimal number 31, which corresponds to 1f hex.
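Letters as digits are easy to model: a digit's value is simply its position in the list of allowed digits. The sketch below assumes lowercase or uppercase input without any prefix; HEX_DIGITS and hex_terms are names invented for this example.

```python
HEX_DIGITS = "0123456789abcdef"  # a digit's position in this string is its value

def hex_terms(text):
    """Return the value each hexadecimal digit contributes, leftmost digit first."""
    text = text.lower()
    highest = len(text) - 1
    return [HEX_DIGITS.index(char) * 16 ** (highest - i)
            for i, char in enumerate(text)]

terms = hex_terms("1f")
print(terms)       # [16, 15]
print(sum(terms))  # 31
```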
We use the hexadecimal number system in computing because it has a neat property: we can represent a single byte (a number between 0 and 255, that is, one of 2⁸ = 256 possible values) with just 2 hexadecimal digits. This allows us to build tidy grids, or hex views, to represent data as sequences of hexadecimal numbers.
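A hex view can be sketched in a few lines. The layout here (a four-digit offset followed by 16 bytes per row) is just one common convention chosen for illustration, and hex_view is a hypothetical helper name.

```python
def hex_view(data, width=16):
    """Format bytes as rows of two-digit hexadecimal numbers,
    each row prefixed with the offset of its first byte."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        cells = " ".join(format(byte, "02x") for byte in row)
        lines.append(f"{offset:04x}  {cells}")
    return lines

# The smallest and the largest byte value both fit in two hex digits.
print(format(0, "02x"), format(255, "02x"))  # 00 ff

for line in hex_view(bytes(range(32))):
    print(line)
# 0000  00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
# 0010  10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
```

Because every byte takes exactly two characters, the columns line up without any extra padding.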
Let's get back to that text encoding example, but this time let’s use hexadecimal encoding:
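As a rough sketch in Python, the word "hello" can be turned into hexadecimal codes and back again. Only built-in methods are used; the variable names are illustrative.

```python
text = "hello"
hex_codes = [format(byte, "02x") for byte in text.encode("utf-8")]
print(hex_codes)  # ['68', '65', '6c', '6c', '6f']

# Going the other way: hexadecimal digits back to bytes, then to text.
decoded = bytes.fromhex("".join(hex_codes)).decode("utf-8")
print(decoded)    # hello
```

These are the same five numbers as before (104, 101, 108, 108, 111), only written in base 16.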
By their very nature as digital machines, computers only deal with ones and zeros. Through text encoding and base changes, computers can deal with the letters and numbers that humans use every day. Now that you know how some basic data is encoded “on the wire”, you’ll be ready to learn more about networking and low-level programming.