Assembly Language

We have all the parts needed for our CPU and have learned the architecture of our computer. However, since the CPU is more complicated than our previous circuits, we will preview its functionality before actually constructing it. If you are a web developer, you may have heard of WebAssembly, a language that enables many other languages to run on the Web and makes web applications run much faster. If you have used C or other low-level languages before, you might know that C compiles to assembly to be executed.

You will hear the verb "compile" and its noun "compilation" a lot, both in this course and out in the programming community. In simple terms, compilation means translating a program written in one language into another, like translating an article written in Chinese into English. Just as there are human translators who translate texts and speeches from one natural language to another, you can translate computer programs by hand from one programming language to another. However, this manual translation is very tedious and error-prone. Instead, we develop another program called the "compiler" to do the translation for us. We will learn how to build a compiler later on in this course.

In fact, all programming languages are compiled, directly or indirectly, to some assembly language in order to run on physical hardware. The biggest difference between assembly languages and other languages you might have used, like Python and JavaScript, is the level of abstraction. Unlike higher-level languages, assembly languages do not have concepts like functions, closures, or classes, and usually have only very simple numerical data types like integers and floating-point numbers. However, we can realize all these complex abstractions in assembly by constructing something called a Virtual Machine (VM). We will cover VMs later.

So we know that most languages are compiled to assembly languages, but how do assembly programs run on a physical computer? How does the computer understand the meaning of the letters and digits in the program and know what to do? It turns out that assembly languages actually have two formats - one for humans and one for machines. The human format is a text file that we can edit in a text editor like VS Code or Notepad. The machine format is a binary file containing a long list of binary instructions for the CPU. A separate program called an assembler compiles the human format into the machine format. We will create the assembler later. Meanwhile, let's take a look at the assembly language for our computer.

A Instruction

There are two types of instructions - the A instruction and the C instruction. Let's first see what an A instruction looks like:

@243

The @ sign instructs the computer to:

  • store the value 243 in the A register

  • load the value at RAM address 243 into the M register

This instruction can be a little confusing because it can serve two purposes:

  • Load a number into the A register

  • Load a value in RAM to the M register
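To make the two purposes concrete, here is a tiny software sketch of what @243 does. This is a simulation in Python, not our real hardware, and the pretend RAM size is arbitrary:

```python
# A pretend memory, much smaller than our real RAM.
RAM = [0] * 512

A = 243      # "@243": load the constant 243 into the A register

# From now on, the M register refers to the value at RAM address A:
RAM[A] = 7   # writing to M means writing to RAM[A]
M = RAM[A]   # reading M means reading RAM[A]

print(A)     # 243
print(M)     # 7
```

The single instruction sets A to 243 and, as a side effect, makes M point at RAM address 243.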

So we developed shorthand names for common register addresses, so these two A instructions are equivalent:

@R12
@12

The extra R in front just reminds us programmers that the number 12 refers to a memory address instead of a plain number. Remember the base addresses of the screen and keyboard memory maps? We also have shorthands for them:
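Going back to the R names: here is a sketch of how an assembler might resolve that shorthand. The assumption that every such name is just R followed by its address is ours for illustration:

```python
# A sketch of resolving register shorthands in A instructions.
# Assumption: a name like R12 simply stands for the address 12.
def resolve(symbol: str) -> int:
    if symbol.startswith("R") and symbol[1:].isdigit():
        return int(symbol[1:])   # strip the R, keep the address
    return int(symbol)           # already a plain number

print(resolve("R12"))  # 12
print(resolve("12"))   # 12 -- the two A instructions are equivalent
```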

In general, an A instruction looks like:

Now let's see how the computer stores and remembers the A instruction. The translation rules are straightforward:

  • the @ sign neatly translates to a 0

  • the number after @ just becomes a 31-bit binary number

Example 1:

@3

translates to

00000000000000000000000000000011

3 is 11 in binary. We pad extra 0s in front to make it 31 bits. Lastly, we add one extra 0 in front that stands for the @ sign. Now we have a full 32-bit A instruction in machine format!

Example 2:

@42802345
00000010100011010001110010101001

You can't do the decimal-to-binary conversion in your head this time, but 42802345 in decimal is 10100011010001110010101001 in binary. Again, we pad extra 0s in front since the number doesn't take up the full 31 bits. Lastly, we add one extra 0 in front that stands for the @ sign to make the instruction 32 bits.
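The translation rule is simple enough to sketch in a few lines of Python, which also lets us double-check both examples:

```python
# A sketch of translating an A instruction from human format to machine
# format: strip the @, render the number as a 31-bit binary string, and
# prepend a 0 for the @ sign.
def assemble_a(instruction: str) -> str:
    value = int(instruction[1:])        # drop the leading @
    return "0" + format(value, "031b")  # op code 0 + 31-bit value

print(assemble_a("@3"))
# 00000000000000000000000000000011
print(assemble_a("@42802345"))
# 00000010100011010001110010101001
```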

A natural question is: what happens if we want to pass in negative numbers? Say we want to pass in -1 in decimal. Why can't we just write:

@-1

And translate it to

11111111111111111111111111111111

If the only instruction our computer supported were the A instruction, doing so would be perfectly fine. However, the top bit is used to switch between the A and C instructions. It's called the operation code, or op code for short. Alright, so did we just limit the numbers we can pass to 31 bits instead of the full available 32 bits just to accommodate the op code? Let's put this excellent question aside for a moment and come back to it after covering the C instruction.
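We can check the collision in Python: the 32-bit two's complement pattern for -1 is all 1s, so its top bit is 1, exactly where the op code lives:

```python
# View a number as its 32-bit two's complement bit pattern.
def to_32_bit(value: int) -> str:
    return format(value & 0xFFFFFFFF, "032b")

print(to_32_bit(-1))     # 11111111111111111111111111111111
print(to_32_bit(-1)[0])  # 1 -- would be mistaken for a C instruction
```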

C Instruction

First, let's see a sample C instruction in human format:

D=M+1;JGE

Let's decompose the instruction into three parts:
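Before examining each part, here is a sketch of how an assembler might do the split. We assume the general shape dest=comp;jump, where the dest and jump parts may be omitted (an assumption for this sketch):

```python
# A sketch of decomposing a C instruction into dest, comp, and jump.
def decompose(instruction: str):
    if "=" in instruction:
        dest, rest = instruction.split("=")
    else:
        dest, rest = "", instruction        # no destination part
    if ";" in rest:
        comp, jump = rest.split(";")
    else:
        comp, jump = rest, ""               # no jump part
    return dest, comp, jump

print(decompose("D=M+1;JGE"))  # ('D', 'M+1', 'JGE')
```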

Destinations

Last chapter we designed our CPU with 3 internal registers A, M, and D:

The C instruction allows us to store to zero, one, two, or all three of these registers. For each register, we use 1 bit to represent whether or not to store to that register:
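Because each register gets its own bit, all 8 combinations of destinations are expressible. Here is a sketch of computing those bits; the ordering of the bits (A, then D, then M) is an assumption of this sketch, not something our hardware dictates yet:

```python
# A sketch of the destination bits: one bit per register.
# Assumed bit order for illustration: A, D, M.
def dest_bits(dest: str) -> str:
    return "".join("1" if reg in dest else "0" for reg in "ADM")

print(dest_bits(""))     # 000 -- store nowhere
print(dest_bits("D"))    # 010
print(dest_bits("AM"))   # 101
print(dest_bits("ADM"))  # 111 -- store to all three
```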

Computation

Remember that our ALU takes in 6 control inputs

zx,nx,zy,ny,f,no

To Be Continued...
