Assembly Language
Last updated
We now have all the parts needed for our CPU, and we have learned the architecture of our computer. However, the CPU is more complicated than our previous circuits, so we will preview its functionality before actually constructing one. If you are a web developer, you may have heard of WebAssembly, a low-level language that lets programs written in languages such as C, C++, and Rust run on the Web at near-native speed. If you have used C or other low-level languages before, you might know that a C program is compiled to assembly before it can be executed.
You will hear the verb "compile" and its noun "compilation" a lot, both in this course and in the wider programming community. In simple terms, compilation means translating a program written in one language into another, much like translating an article from Chinese to English. Just as human translators translate texts and speeches from one natural language to another, you could translate computer programs by hand from one programming language to another. However, manual translation is tedious and error-prone. Instead, we write another program, called a "compiler", to do the translation for us. We will learn how to build a compiler later in this course.
In fact, every programming language is ultimately translated, directly or indirectly, into some assembly language so that it can run on physical hardware. The biggest difference between assembly languages and other languages you might have used, like Python and JavaScript, is the level of abstraction. Unlike higher-level languages, assembly languages have no concepts like functions, closures, or classes, and they usually support only very simple numeric data types such as integers and floating-point numbers. However, we can realize all these complex abstractions in assembly by building something called a Virtual Machine (VM). We will cover VMs later.
So we know that programs in other languages end up as assembly, but how do assembly programs run on a physical computer? How does the computer understand the letters and digits in the program and know what to do? It turns out that an assembly language actually has two formats: one for humans and one for machines. The human format is a text file that we can edit in a text editor like VS Code or Notepad. The machine format is a binary file containing a long list of binary instructions for the CPU. A separate program called an assembler translates the human format into the machine format. We will build the assembler later. Meanwhile, let's take a look at the assembly language for our computer.
Our assembly language has two types of instructions: the A instruction and the C instruction. Let's first see what an A instruction looks like:

@243
The @ sign means to:

- store the value 243 in the A register
- load the value at RAM address 243 into the M register
This instruction can be a little confusing because it serves two purposes at once:

- Load a number into the A register.
- Load a value from RAM into the M register.
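To make the two effects of @ concrete, here is a minimal Python sketch of a toy CPU that follows the model described above, in which M is a register loaded from RAM. The class `ToyCPU`, its method `execute_a_instruction`, and the 1024-word RAM size are hypothetical choices for illustration, not part of our actual hardware.

```python
class ToyCPU:
    """Hypothetical toy model: A, M, and D are registers; RAM is a Python list."""

    def __init__(self, ram_size: int = 1024) -> None:
        self.ram = [0] * ram_size  # assumed small RAM, for illustration only
        self.A = 0
        self.M = 0
        self.D = 0

    def execute_a_instruction(self, value: int) -> None:
        """Simulate '@value'."""
        self.A = value             # purpose 1: the number itself goes into A
        self.M = self.ram[value]   # purpose 2: RAM[value] goes into M


cpu = ToyCPU()
cpu.ram[243] = 7                   # pretend RAM address 243 holds 7
cpu.execute_a_instruction(243)     # the equivalent of "@243"
print(cpu.A, cpu.M)                # 243 7
```

Both effects happen on every A instruction; it is up to the instructions that follow to decide whether they use A as a number or M as the value in memory.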
To make it clear when a number refers to a memory address, we have shorthand names for the first 16 RAM addresses:
| Shorthand Name | RAM Address |
|---|---|
| R0 | 0 |
| R1 | 1 |
| R2 | 2 |
| ... | ... |
| R15 | 15 |
So these two A instructions are equivalent:

@12
@R12

The extra R in front just reminds us programmers that the number 12 refers to a memory address rather than a plain number. Remember the base addresses of the screen and keyboard memory maps? We also have shorthands for them:
| Shorthand Name | RAM Address |
|---|---|
| SCREEN | 65536 |
| KBD | 84747 |
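An assembler could resolve these shorthand names with a simple lookup table. The sketch below assumes exactly the addresses listed in the two tables above; the names `PREDEFINED_SYMBOLS` and `resolve` are hypothetical.

```python
# Hypothetical lookup table built from the two tables above.
PREDEFINED_SYMBOLS = {f"R{i}": i for i in range(16)}  # R0..R15 -> 0..15
PREDEFINED_SYMBOLS.update({"SCREEN": 65536, "KBD": 84747})


def resolve(operand: str) -> int:
    """Turn the text after '@' into a number."""
    if operand in PREDEFINED_SYMBOLS:
        return PREDEFINED_SYMBOLS[operand]
    return int(operand)  # otherwise, treat it as a plain decimal number


print(resolve("R12"), resolve("12"))  # 12 12
```

Because @R12 and @12 both resolve to 12, the assembler produces the same machine instruction for them; the shorthand exists only for human readers.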
In general, an A instruction is the @ sign followed by either a number or one of the shorthand names above.
Now let's see how the computer stores the A instruction in machine format. The translation rules are straightforward:

- the @ sign translates to a single 0 bit, the leftmost bit of the instruction
- the number after the @ becomes a 31-bit binary number
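These two rules fit in a few lines of Python. This is a sketch, assuming the operand has already been turned into a non-negative integer; `encode_a_instruction` is a hypothetical name.

```python
def encode_a_instruction(value: int) -> str:
    """Translate '@value' into its 32-bit machine format, as a string of bits."""
    if not 0 <= value < 2**31:
        raise ValueError("an A instruction can only hold 0 to 2**31 - 1")
    opcode = "0"                     # the @ sign becomes a single 0 bit
    operand = format(value, "031b")  # the number, zero-padded to 31 bits
    return opcode + operand


machine = encode_a_instruction(3)
print(len(machine), machine[-2:])    # 32 11
```

The range check rejects anything that would not fit in the 31 bits left over after the op code.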
Example 1:

@3

translates to

00000000000000000000000000000011

3 is 11 in binary. We pad extra 0s in front to make it 31 bits. Lastly, we add one extra 0 in front that stands for the @ sign. Now we have a full 32-bit A instruction in machine format!
Example 2:

@42802345

translates to

00000010100011010001110010101001

You probably can't do the decimal-to-binary conversion in your head this time, but 42802345 in decimal is 10100011010001110010101001 in binary, which is only 26 bits long. Again, we pad extra 0s in front until the number takes up the full 31 bits. Lastly, we add one extra 0 in front that stands for the @ sign to make the instruction 32 bits.
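If you want to check a conversion like this one, the classic method is repeated division by 2: each remainder is the next bit, starting from the least significant one. The sketch below implements that method (`to_binary` is a hypothetical helper name) and compares it with Python's built-in `format(n, "b")`.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # the remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(bits))  # reverse so the most significant bit comes first


print(to_binary(42802345))                           # 10100011010001110010101001
print(to_binary(42802345) == format(42802345, "b"))  # True
print(len(to_binary(42802345)))                      # 26
```

Since the result is 26 bits long, the A instruction needs 31 - 26 = 5 padding zeros plus the op code 0, which gives the six leading zeros in the machine format.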
A natural question is: what happens if we want to pass in negative numbers? Say we want to pass in -1 in decimal. Why can't we just write

@-1

and translate it to its 32-bit two's complement representation:

11111111111111111111111111111111
If the A instruction were the only instruction our computer supported, doing so would be perfectly fine. However, the top bit is used to switch between the A and C instructions. It's called the operation code, or op code for short. Since a 0 in that position marks an A instruction, a number whose top bit is 1 would be read as a C instruction instead. Alright, so did we just limit the numbers we can pass to 31 bits instead of the full 32 available bits just to accommodate the op code? Let's put this excellent question aside for a moment and come back to it after covering the C instruction.
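A quick check makes the conflict concrete. The sketch below, assuming 32-bit two's complement for negative numbers, prints the bit pattern of -1 and shows that its top bit, the op code position, is 1. The helper name `twos_complement_32` is hypothetical.

```python
def twos_complement_32(n: int) -> str:
    """32-bit two's complement pattern of n, valid for -2**31 <= n < 2**31."""
    return format(n & 0xFFFFFFFF, "032b")  # masking keeps only the low 32 bits


pattern = twos_complement_32(-1)
print(pattern == "1" * 32)       # True: -1 is all ones
print(pattern[0])                # 1: the op code position is not 0
print(twos_complement_32(3)[0])  # 0: small non-negative numbers leave it at 0
print(2**31 - 1)                 # 2147483647: the largest 31-bit operand
```

Every negative number in two's complement has a top bit of 1, so none of them can be written directly as an A instruction under these rules.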
First, let's see a sample C instruction in human format:
Let's decompose the instruction into three parts:
In the last chapter, we designed our CPU with three internal registers: A, M, and D.
The C instruction lets us store to any combination of these registers: none, one, two, or all three. For each register, we use 1 bit to indicate whether to store to that register:
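| Destinations | d1 | d2 | d3 |
|---|---|---|---|
| none | 0 | 0 | 0 |
| M | 0 | 0 | 1 |
| D | 0 | 1 | 0 |
| MD | 0 | 1 | 1 |
| A | 1 | 0 | 0 |
| AM | 1 | 0 | 1 |
| AD | 1 | 1 | 0 |
| AMD | 1 | 1 | 1 |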
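The table follows a simple pattern: d1 is set when A is a destination, d2 when D is, and d3 when M is. Here is a sketch of that mapping, assuming the destination is written as a string of register letters (an empty string for none). `encode_dest` is a hypothetical name, and this version accepts the letters in any order, which the actual assembler may not.

```python
def encode_dest(dest: str) -> str:
    """Return the d1 d2 d3 bits for a destination such as "", "M", "AD", or "AMD"."""
    for register in dest:
        if register not in ("A", "M", "D"):
            raise ValueError(f"unknown destination register: {register!r}")
    d1 = "1" if "A" in dest else "0"  # store to A?
    d2 = "1" if "D" in dest else "0"  # store to D?
    d3 = "1" if "M" in dest else "0"  # store to M?
    return d1 + d2 + d3


for dest in ["", "M", "D", "MD", "A", "AM", "AD", "AMD"]:
    print(dest or "none", encode_dest(dest))
```

Looping over the eight destinations reproduces the eight rows of the table, because each bit is decided independently of the other two.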
Remember that our ALU takes in 6 inputs
To Be Continued...