Assembly Language Crash Course (Pt. 2), A Deeper Dive
This lecture takes a closer look at the core internals of x86-64 assembly, the 64-bit extension of the 32-bit x86 instruction set.
The following topics are discussed:
- Assembly language representations (AT&T/UNIX versus Intel/Microsoft syntax): http://www.imada.sdu.dk/Courses/DM18/Litteratur/IntelnATT.htm
- Variable length instruction format
- Binary instruction layout (generalized, with some case-specific examples)
- x86-64 register set
- Memory addressing modes (7 variants!)
Additionally, this lecture includes a demonstration using the in-class compiled program from Lecture W05.2. In this example, we examine a number of specific instructions that were generated when our C program was compiled. We discuss how the bytes in the file correspond to the components of an instruction, and then use a hex viewing utility to examine those instructions in their native context.
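To make the byte-to-field mapping concrete, here is a minimal Python sketch that splits one register-to-register `MOV` into its parts. The byte sequence `48 89 c3` is an illustrative example picked for this note, not necessarily one of the instructions from the W05.2 program, and the helper name `decode_mov_rr` is hypothetical. It only handles the `REX.W 89 /r` form with both operands in registers.

```python
# Register numbers as they are encoded in instruction bytes (list index = Rnn number).
REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_rr(code: bytes) -> dict:
    """Decode 'REX.W 89 /r' (MOV r/m64, r64), register-direct form only.

    Hypothetical helper for illustration; not a general disassembler.
    """
    if len(code) != 3:
        raise ValueError("this sketch expects exactly 3 bytes: REX, opcode, ModRM")
    rex, opcode, modrm = code  # iterating over bytes yields ints

    if rex & 0xF0 != 0x40:
        raise ValueError("expected a REX prefix (0x40-0x4F)")
    if opcode != 0x89:
        raise ValueError("this sketch only handles opcode 0x89")

    # REX prefix layout: 0100 W R X B
    rex_w = (rex >> 3) & 1   # 1 = 64-bit operand size
    rex_r = (rex >> 2) & 1   # extends the ModRM 'reg' field to 4 bits
    rex_b = rex & 1          # extends the ModRM 'rm' field to 4 bits
    if rex_w != 1:
        raise ValueError("this sketch only handles 64-bit operands (REX.W = 1)")

    # ModRM layout: mod (2 bits) | reg (3 bits) | rm (3 bits)
    mod = modrm >> 6
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if mod != 0b11:
        raise ValueError("this sketch only handles register-direct operands (mod = 11)")

    src = REG_NAMES[(rex_r << 3) | reg]  # for opcode 0x89, 'reg' is the source
    dst = REG_NAMES[(rex_b << 3) | rm]   # and 'rm' is the destination
    return {
        "rex_w": rex_w, "mod": mod, "reg": reg, "rm": rm,
        "att": f"movq %{src}, %{dst}",
        "intel": f"MOV {dst.upper()}, {src.upper()}",
    }


fields = decode_mov_rr(bytes.fromhex("4889c3"))
print(fields["att"])    # movq %rax, %rbx
print(fields["intel"])  # MOV RBX, RAX
```

Most real instructions also carry displacement or immediate bytes, and memory operands can add a SIB byte, so treat this as a starting point for reading a hex dump rather than a complete decoder.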
Helpful reference tables
Slides: lecture-w06.pdf (PDF)
Video: CS7038: Wk06 - Assembly Language Crash Course
x86-64/32 Registers
64-bit Register Name | Rnn Name | Name (Description) | 32-bit register name | 16-bit Register name | 8-bit high/low |
---|---|---|---|---|---|
RAX | R0 | Accumulator (frequently result storage) | EAX | AX | AH/AL |
RCX | R1 | Counter (frequently used as i in iterations/loops) | ECX | CX | CH/CL |
RDX | R2 | Data (frequently used as additional arg in operations) | EDX | DX | DH/DL |
RBX | R3 | Base (frequently used as base address or counter) | EBX | BX | BH/BL |
RSP | R4 | Stack Pointer (used internally to keep track of top of CPU stack) | ESP | SP | |
RBP | R5 | Base Pointer (frequently used to mark the base of the current stack frame) | EBP | BP | |
RSI | R6 | Source Index (keeps track of indexes in source arrays) | ESI | SI | |
RDI | R7 | Destination Index (keeps track of indexes in destination arrays) | EDI | DI | |
R8-R15 | R8-R15 | Additional general-purpose registers only available in 64-bit mode | R8D-R15D | R8W-R15W | R8B-R15B (low byte only) |
For many of the registers, you can access fragments of them through their alias names. Writing to AH, for example, modifies bits 15..8 of RAX (and therefore of EAX and AX as well). The fragments address the following bit ranges in their respective registers:
- RxX/RxI/RxP: Bits 63..0 (full 64-bit register)
- ExX/ExI/ExP: Bits 31..0 (lower half)
- xX/xI/xP: Bits 15..0 (lower quarter)
- xH: Bits 15..8 (byte index 1)
- xL: Bits 7..0 (byte index 0, lower-most byte)
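The bit ranges above can be modelled with plain masks and shifts. The following Python sketch is only an illustration: the `Reg64` class and the `FRAGMENTS` table are hypothetical names invented for this note. One behavior goes beyond the list above and is included as a labeled assumption about 64-bit mode: a write to a 32-bit fragment (EAX and friends) clears bits 63..32 of the full register, while 8-bit and 16-bit writes leave the remaining bits untouched.

```python
MASK64 = (1 << 64) - 1

# (lowest bit, width in bits) for each fragment of the accumulator register.
FRAGMENTS = {
    "rax": (0, 64),
    "eax": (0, 32),
    "ax":  (0, 16),
    "ah":  (8, 8),
    "al":  (0, 8),
}


class Reg64:
    """Hypothetical model of one 64-bit general-purpose register."""

    def __init__(self, value: int = 0):
        self.value = value & MASK64

    def read(self, name: str) -> int:
        lo_bit, width = FRAGMENTS[name]
        return (self.value >> lo_bit) & ((1 << width) - 1)

    def write(self, name: str, new: int) -> None:
        lo_bit, width = FRAGMENTS[name]
        if width == 32:
            # Assumption (64-bit mode behavior): 32-bit writes zero the upper half.
            self.value = new & 0xFFFFFFFF
            return
        mask = ((1 << width) - 1) << lo_bit
        # Keep every bit outside the fragment, replace the bits inside it.
        self.value = (self.value & ~mask & MASK64) | ((new << lo_bit) & mask)


rax = Reg64(0x1122334455667788)
print(hex(rax.read("ah")))   # 0x77
print(hex(rax.read("eax")))  # 0x55667788

rax.write("ah", 0xFF)
print(hex(rax.value))        # 0x112233445566ff88

rax.write("eax", 0x1)
print(hex(rax.value))        # 0x1
```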
There are also a number of limited-use, system, and context-specific registers that are part of the architecture. I will not go into these at this time.
x86 Addressing Modes
The x86 instruction set also has a number of addressing modes. These are instruction variants that provide different ways to work with constants (hard-coded data), data held in registers, and data held in memory.
Addr. Mode Name | AT&T/UNIX example | Intel/Microsoft Equivalent | Description of example |
---|---|---|---|
Immediate | movq $0x0a, %rbx | MOV RBX, 0Ah | Copy constant 0x0a into register RBX |
Register | movq %rax, %rbx | MOV RBX, RAX | Copy data from RAX into RBX |
Direct Memory | movq 0x1000, %rbx | MOV RBX, QWORD PTR [1000h] | Copy 64-bit data from memory loc. 0x1000 into RBX |
Register Indirect | movq (%rbx), %rbp | MOV RBP, [RBX] | Copy 64-bit data from memory loc. stored in RBX into RBP |
Register Indirect w/ displacement | movq 0x08(%rbx), %rbp | MOV RBP, 08h[RBX] | Copy 64-bit data from memory location RBX+0x08 into RBP |
Scaled Register Indirect w/ displacement | movq 0x08(,%rbx,4), %rbp | MOV RBP, 08h[RBX*04h] | Copy 64-bit data from memory location [(RBX*0x04)+0x08] into RBP |
Scaled Register Indirect w/ displacement + base reg. | movq 0x08(%rsi,%rbx,4), %rbp | MOV RBP, 08h[RSI][RBX*04h] | Copy 64-bit data from memory location [(RBX*0x04)+RSI+0x08] into RBP |
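
All of the memory-operand forms are special cases of one calculation: base + index * scale + displacement, where any unused part is simply zero. The Python sketch below works through that arithmetic; the function name `effective_address` and the register values are made up for illustration, and the result is wrapped to 64 bits on the assumption that address arithmetic is done modulo 2^64.

```python
MASK64 = (1 << 64) - 1


def effective_address(disp: int = 0, base: int = 0, index: int = 0, scale: int = 1) -> int:
    """Compute the address for an AT&T operand of the form disp(base, index, scale).

    Hypothetical helper: unused components default to zero (scale defaults to 1).
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows a scale of 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK64


# Example register contents (arbitrary values chosen for this sketch).
rbx_as_pointer = 0x1000
rbx_as_index = 3
rsi = 0x2000

# Register indirect w/ displacement: 0x08(%rbx)
print(hex(effective_address(disp=0x08, base=rbx_as_pointer)))                    # 0x1008

# Scaled register indirect w/ displacement: 0x08(,%rbx,4)
print(hex(effective_address(disp=0x08, index=rbx_as_index, scale=4)))            # 0x14

# Scaled register indirect w/ displacement + base reg.: 0x08(%rsi,%rbx,4)
print(hex(effective_address(disp=0x08, base=rsi, index=rbx_as_index, scale=4)))  # 0x2014
```

The last form is the one a compiler typically reaches for when indexing an array of 4-byte elements: the base register holds the array's address, the index register holds the element number, and the scale matches the element size.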