# Assembly Language

We now have all the parts needed for our CPU and have learned the architecture of our computer. However, since the CPU is more complicated than our previous circuits, we will preview its functionality before actually constructing one. If you are a web developer, you may have heard of the new [WebAssembly](https://www.youtube.com/watch?v=HktWin_LPf4) language that lets many languages run on the Web and can make web applications run much faster. If you have used C or other low-level languages before, you might know that they compile to assembly to be executed.

> You will hear the verb "compile" or its noun "compilation" a lot, both in this course and out in the programming community. In simple terms, compilation means translating a program written in one language into another, like translating an article written in Chinese into English. Just as there are human translators who translate texts and speeches from one natural language to another, you can translate computer programs by hand from one programming language to another. However, this manual translation is very tedious and error-prone. Instead, we develop another program called the "compiler" to do the translation for us. We will learn how to build a compiler later on in this course.

![https://unsplash.com/photos/ZzWsHbu2y80](/files/-M7Ft1sSrNQClTNASOvX)

In fact, all programming languages will be compiled directly or indirectly to some assembly language to be able to run on physical hardware. The biggest difference between assembly languages and other languages you might have used, like Python and JavaScript, is the level of abstraction. Unlike higher-level languages, assembly languages do not have concepts like functions, closures, or classes, and usually have only very simple numerical data types like integers and floating-point numbers. However, we can realize all these complex abstractions in assembly by constructing something called a Virtual Machine (VM). We will cover VMs later.

So we know that most languages are compiled to assembly languages, but how do assembly programs run on a physical computer? How does the computer understand the meaning of the letters and digits in the program and know what to do? It turns out that assembly languages actually have two formats: one for humans and one for machines. The human format is a text file that we can edit in a text editor like VS Code or Notepad. The machine format is a binary file containing a long list of binary instructions for the CPU. A separate program called an assembler compiles the human format to the machine format. We will create the assembler later. Meanwhile, let's take a look at the assembly language for our computer.

## A Instruction

There are two types of instructions: the A instruction and the C instruction. Let's first see what an A instruction looks like:

```
@243
```

The `@` sign tells the CPU to:

* store the value `243` in the `A` register
* load the value at RAM address `243` into the `M` register
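
These two effects can be sketched in Python. This is only a toy model: the `Cpu` class and its method names are hypothetical, RAM is assumed to be a dictionary whose unset cells read as `0`, and `M` is assumed to always mirror the RAM cell that `A` points at.

```python
class Cpu:
    """Toy model of the A and M registers (illustrative names, not the real design)."""

    def __init__(self):
        self.a = 0      # the A register
        self.ram = {}   # sparse RAM: address -> value, unset cells read as 0

    @property
    def m(self):
        # Assumption: M is "the RAM cell that A currently points at".
        return self.ram.get(self.a, 0)

    def a_instruction(self, value):
        # `@value` stores the value in A; M then reflects RAM[value].
        self.a = value


cpu = Cpu()
cpu.ram[243] = 7        # pretend RAM address 243 already holds 7
cpu.a_instruction(243)  # @243
print(cpu.a, cpu.m)     # 243 7
```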

This instruction can be a little confusing because it can serve two purposes:

* Load a number into the `A` register
* Load a value in RAM to the `M` register

To make the intent clearer, we developed shorthand names for commonly used RAM addresses:

| Shorthand Name | RAM Address |
| -------------- | ----------- |
| R0             | 0           |
| R1             | 1           |
| R2             | 2           |
| ...            |             |
| R15            | 15          |

So these two A instructions are equivalent:

```
@R12
```

```
@12
```

The extra `R` in front just reminds us programmers that the number `12` refers to a memory address rather than a plain number. Remember the base addresses of the screen and keyboard memory maps? We also have shorthands for them:

| Shorthand Name | RAM Address |
| -------------- | ----------- |
| SCREEN         | 65536       |
| KBD            | 84747       |
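
One way an assembler might resolve these shorthands is a simple lookup table. The sketch below assumes the name `SCREEN` for the screen map and copies the addresses from the tables above; the `SYMBOLS` dictionary and the `resolve` function are hypothetical names used for illustration.

```python
# Predefined shorthand names, with addresses copied from the tables above.
SYMBOLS = {f"R{i}": i for i in range(16)}
SYMBOLS.update({"SCREEN": 65536, "KBD": 84747})


def resolve(operand):
    """Return the number that the text after `@` stands for."""
    if operand in SYMBOLS:
        return SYMBOLS[operand]
    return int(operand)  # plain decimal literal such as "12"


print(resolve("R12"), resolve("12"))  # 12 12
```

Because both spellings resolve to the same number, `@R12` and `@12` produce the same machine instruction.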

In general, an A instruction looks like:

![A Instruction in human format](/files/-M7FvTdLEoyonHxMXBwV)

Now let's see how the computer stores and remembers the A instruction. The translation rules are straightforward:

* the `@` sign translates to a single `0`
* the number after `@` becomes a 31-bit binary number

Example 1:

```
@3
```

translates to

```
00000000000000000000000000000011
```

`3` is `11` in binary. We pad extra `0`s in front to make it `31` bits. Lastly, we add one extra `0` in front that stands for the `@` sign. Now we have a full `32`-bit A instruction in machine format!
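
The same translation can be written as a short Python sketch. The function name `encode_a` is hypothetical, and the sketch only handles numeric operands (no shorthand names):

```python
def encode_a(instruction):
    """Translate a numeric A instruction such as "@3" into 32 bits of text."""
    if not instruction.startswith("@"):
        raise ValueError("an A instruction starts with @")
    value = int(instruction[1:])
    if not 0 <= value < 2**31:
        raise ValueError("value must fit in 31 bits")
    # The leading "0" stands for the @ sign; the value is zero-padded to 31 bits.
    return "0" + format(value, "031b")


print(encode_a("@3"))  # 00000000000000000000000000000011
```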

Example 2:

```
@42802345
```

translates to

```
00000010100011010001110010101001
```

You probably can't do the decimal-to-binary conversion in your head this time, but `42802345` in decimal is `10100011010001110010101001` in binary. Again, we pad extra `0`s in front if the number doesn't take up the full `31` bits. Lastly, we add one extra `0` in front that stands for the `@` sign to make the instruction `32` bits.
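
For larger numbers, the conversion can be done mechanically by repeatedly dividing by 2 and collecting the remainders. A minimal sketch of that method (the helper name `to_binary` is hypothetical):

```python
def to_binary(n):
    """Convert a non-negative integer to binary digits by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next lowest bit
        n //= 2
    return "".join(reversed(digits))  # remainders come out lowest bit first


bits = to_binary(42802345)
print(bits)                       # 10100011010001110010101001
print("0" + bits.rjust(31, "0"))  # 00000010100011010001110010101001
```

The second line printed pads the digits to `31` bits and then prepends the `0` for the `@` sign.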

A natural question is: what happens if we want to pass in negative numbers? Say we want to pass in `-1` in decimal. Why can't we just write:

```
@-1
```

and translate it to

```
11111111111111111111111111111111
```

If the A instruction were the only instruction our computer supported, doing so would be perfectly fine. However, the top bit is used to switch between the A and C instructions. It's called the operation code, or op code for short. Alright, so did we just limit the numbers we can pass to `31` bits instead of the full available `32` bits just to accommodate the op code? Let's put this excellent question aside for a moment and come back to it after covering the C instruction.
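
The clash can be seen in a small sketch. It assumes that a top bit of `0` marks an A instruction (as in the examples above), that any word with a top bit of `1` is therefore read as a C instruction, and that `-1` would be written in 32-bit two's complement. The function name `is_a_instruction` is hypothetical.

```python
def is_a_instruction(word):
    """True when the top (32nd) bit of a 32-bit word is 0."""
    return (word >> 31) & 1 == 0


minus_one = -1 & 0xFFFFFFFF   # 32-bit two's complement pattern of -1: all ones
print(is_a_instruction(3))          # True
print(is_a_instruction(minus_one))  # False
```

Under these assumptions, the all-ones pattern would not be decoded as `@-1` at all, because its op code bit is `1`.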

## C Instruction

First, let's see a sample C instruction in human format:

```
D=M+1;JGE
```

Let's decompose the instruction into three parts:

![Three Parts of the C Instruction](/files/-M7L2v9pi4OzsZ_OMvKr)
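
In text form, the split can be sketched as follows. The part names `dest`, `comp`, and `jump` and the function `split_c` are assumptions made for illustration, as is the idea that the destination and jump parts may be left out:

```python
def split_c(instruction):
    """Split "dest=comp;jump" into its three parts; missing parts become ""."""
    dest, comp, jump = "", instruction, ""
    if "=" in comp:
        dest, comp = comp.split("=", 1)
    if ";" in comp:
        comp, jump = comp.split(";", 1)
    return dest, comp, jump


print(split_c("D=M+1;JGE"))  # ('D', 'M+1', 'JGE')
```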

### Destinations

In the last chapter, we designed our CPU with 3 internal registers: `A`, `M`, and `D`.

![3 Registers in the CPU](/files/-M7L3pS6CqKFIXMFvqUm)

The C instruction allows us to store the result in none, one, two, or all three of these registers. For each register, we use 1 bit to represent whether to store to that register or not:

| Destinations | d1 | d2 | d3 |
| ------------ | -- | -- | -- |
| none         | 0  | 0  | 0  |
| M            | 0  | 0  | 1  |
| D            | 0  | 1  | 0  |
| MD           | 0  | 1  | 1  |
| A            | 1  | 0  | 0  |
| AM           | 1  | 0  | 1  |
| AD           | 1  | 1  | 0  |
| AMD          | 1  | 1  | 1  |
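
Reading the table column by column, `d1` is set when `A` is a destination, `d2` when `D` is, and `d3` when `M` is. A minimal sketch of that mapping (the function name `dest_bits` is hypothetical):

```python
def dest_bits(dest):
    """Return the d1 d2 d3 bits for a destination string such as "MD"."""
    if set(dest) - set("AMD"):
        raise ValueError(f"unknown destination register in {dest!r}")
    # One bit per register, in table order: d1 = A, d2 = D, d3 = M.
    return "".join("1" if reg in dest else "0" for reg in "ADM")


print(dest_bits("MD"))  # 011
print(dest_bits(""))    # 000
```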

### Computation

Remember that our ALU takes in 6 control inputs:

```
zx,nx,zy,ny,f,no
```

To Be Continued...

