Assembly Language Crash Course (Pt. 1)
by Coleman Kane
This lecture introduces the class to mainstream CPU architecture, the compilation and translation stages, and the distinction between native machine language and higher-level, machine-agnostic programming languages such as C.
We delve into how a compiler breaks up a program's source code into multiple blocks to construct a Control Flow Graph (CFG), which describes the possible paths of execution between those blocks. These blocks are then compiled, and subsequently translated, into native machine language for the target platform (such as x86-64 machine code).
The human-readable representation of this machine code is typically referred to as “assembly language”.
Visual static analysis tools, such as IDA, reconstruct this CFG from compiled code and present it to an analyst for review.
Slides: lecture-w05-2.pdf (PDF)
Example sources from lecture:
- asm-prog.c - Original example C program discussed in class, with comments
- asm-prog.s - Compiled code from above, represented in x86-64 assembly using AT&T syntax
- asmprog.dot.pdf - CFG diagram for asm-prog.s from slides in PDF format
- asmprog-snowman.cpp - Decompiled C++ code from the compiled binary asm-prog
Some helpful links to static analysis tools leveraging assembly language:
- Ghidra - https://www.ghidra-sre.com/ (open src)
- IDA - https://www.hex-rays.com/ (closed src)
- Binary Ninja - https://binary.ninja (closed src)
- ROSE - https://www.rose-compiler.org/ (semi-open src)
- radare2 - http://rada.re/r/ (open src)
- snowman - https://github.com/yegord/snowman (open src)
Helpful machine-language and assembly references:
- x86 Instruction reference - simple site - http://ref.x86asm.net/
- AMD64 Programmer’s reference, Vol 3 - https://support.amd.com/TechDocs/24594.pdf (PDF)
- Sandpile - http://sandpile.org
- Navigable parsed version of Intel 64 reference - https://github.com/zneak/x86doc
ARM Reference, for comparison:
- ARM Instruction Set - http://www.peter-cockerell.net/aalp/html/ch-3.html