printf and scanf from the C standard library, but nothing else

In the x86 CPU architecture, there are 8 general-purpose registers:
- AX: accumulator, used in arithmetic operations
- BX: base, pointer to data
- CX: counter, used in loops
- DX: data, used in arithmetic operations and I/O
- SI: source in string operations
- DI: destination in string operations
- SP: stack pointer
- BP: base pointer, used for local variables

Register sizes:
- Under these names you can access 16 bits
- 32 bits are accessible with the prefix E, e.g. EAX
- 64 bits are accessible with the prefix R, e.g. RAX
In 64-bit mode, there are also 8 additional registers: R8, R9, …, R15. For these additional registers, you can access the lower 32 bits with the suffix D (e.g. R8D) and the lower 16 bits with the suffix W (e.g. R8W).
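One way to picture these names is as views onto the low bits of the same 64-bit register. The following C sketch models that with narrowing casts. The function names low32 and low16 are hypothetical and only illustrate which bits EAX and AX expose for a given RAX value.

```c
#include <stdint.h>

/* Model of what EAX shows: the low 32 bits of the 64-bit register. */
uint32_t low32(uint64_t rax) {
    return (uint32_t)rax;
}

/* Model of what AX shows: the low 16 bits of the 64-bit register. */
uint16_t low16(uint64_t rax) {
    return (uint16_t)rax;
}
```

For a RAX value of 0x1122334455667788, low32 gives 0x55667788 and low16 gives 0x7788.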
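Floating point numbers are handled differently, in dedicated vector registers:
- XMM0, …, XMM7 (XMM0, …, XMM15 in 64-bit mode), each 128 bits wide
- YMM0, …, YMM15, each 256 bits wide (XMMn remains an alias for the lower part of YMMn)
- ZMM0, …, ZMM31, each 512 bits wide (as above, YMMn and XMMn are aliases for the lower parts of ZMMn)

The XMM registers are 128 bits wide and can hold either:
- 4x 32-bit single-precision floats
- 2x 64-bit double-precision floats
- 16x 8-bit integers
- 8x 16-bit integers
- 4x 32-bit integers
- 2x 64-bit integers

For YMM registers these counts are doubled and for ZMM they are quadrupled (e.g. a ZMM register can hold 8x 64-bit double-precision floats).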
The above list of types works like a union in C. For example, XMM0 might hold these 16 bytes (written most significant byte first):
3F F0 00 00 00 00 00 00 3F F0 00 00 00 00 00 00
- If you treat it as 2x 64-bit floats, then it is equal to (1.0, 1.0)
- If you treat it as 8x 16-bit integers, then it is equal to (16368, 0, 0, 0, 16368, 0, 0, 0)

Here “treating” means “using dedicated instructions”: there isn’t one instruction for adding XMM registers, but rather a dedicated instruction for adding 2x 64-bit floats and another dedicated instruction for adding 8x 16-bit integers.
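The union analogy can be sketched in C. This is only a model under stated assumptions: the Reg128 type and both helper functions are hypothetical names, the register is represented as two 64-bit halves, and element 0 is taken to be the least significant part (so elements appear in the opposite order to a dump written most significant byte first).

```c
#include <stdint.h>
#include <string.h>

/* A 128-bit register modelled as two 64-bit halves; lanes[0] is the low half. */
typedef struct {
    uint64_t lanes[2];
} Reg128;

/* Interpret the 128 bits as 2x 64-bit floats. */
void as_doubles(const Reg128 *r, double out[2]) {
    memcpy(out, r->lanes, sizeof r->lanes);
}

/* Interpret the same 128 bits as 8x 16-bit integers.
   Element 0 is the lowest 16 bits of the low half. */
void as_u16(const Reg128 *r, uint16_t out[8]) {
    for (int i = 0; i < 8; i++) {
        out[i] = (uint16_t)(r->lanes[i / 4] >> (16 * (i % 4)));
    }
}
```

With both halves set to 0x3FF0000000000000 (the encoding of 1.0), as_doubles yields (1.0, 1.0), while as_u16 yields 16368 in elements 3 and 7 and 0 everywhere else. The bits never change, only the way they are read.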
Recommended reading: https://www.gamedev.net/blog/615/entry-2250281-demystifying-sse-move-instructions/
In 2012 Intel introduced the F16C extension to the instruction set, which converts half-precision (16-bit) floats to single precision and back
- mov destination, source to copy contents
- add destination, operand and sub destination, operand to add/subtract values
- lea destination, [address] to load the address of a memory location
- call function and ret to call a function or return from one
- cmp op1, op2 to compare two values
- jmp label to jump unconditionally
- jCC label to jump conditionally depending on CC:
  - jnz is jump if not zero
  - jz is jump if zero
  - jg is jump if greater
  - jl is jump if less

Register and instruction names are not case-sensitive: you can write RAX or rax, CMP or cmp, etc.

I recommend using NASM.
You can use NASM with the following Makefile: (please download the file
and not copy-paste from here, because Makefile
syntax is
sensitive to tabs vs spaces and copying from HTML page might break
it)
inputs = $(wildcard *.asm)
objects = $(patsubst %.asm,%.o,$(inputs))
outputs = $(patsubst %.asm,%,$(inputs))

all: $(outputs)

clean:
	$(RM) $(objects) $(outputs)

%.o: %.asm
	nasm -f elf64 $^

%: %.o
	cc -o $@ $^
This Makefile will:
- find every file with the extension .asm in the current directory
- run nasm -f elf64 on them to produce object files with the extension .o
- run cc to link the object files with the C standard library to produce the final executable

To use it, open any text editor and write your assembly code in it. Then save the file with the .asm extension in the same directory as the Makefile. Then execute make in a console and run your generated executable (or read the error message and fix the code).
I can recommend edb for debugging:
edb --run ./program
- … main function
- When the program calls scanf, the debugger will “freeze”, because the process is waiting for your input in the console

bits 64
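default rel
global main
extern printf
section .data
    format db 'Hello world!', 0xA, 0
section .bss
section .text
main:
    sub rsp, 8              ; align the stack to 16 bytes
    lea rdi, [format]       ; 1st integer/pointer argument
    mov al, 0               ; no floating-point arguments
    call printf wrt ..plt
    add rsp, 8              ; undo the alignment adjustment
    sub rax, rax            ; return value 0
    ret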
- The bits directive tells the assembler which mode (16, 32 or 64-bit) the code is written for
- default rel instructs the assembler to use relative (RIP-relative) addressing by default
  - … mov reg, address, and the address was a constant 16- or 32-bit number
  - … printf)
  - When you write call printf wrt ..plt, it means that the call will actually jump to the PLT (Procedure Linkage Table)
- global is used to mark a label in code as exported to the outside (here we mark main as global so that the linker knows where the entry point is)
- extern is used to mark a symbol as one that will be resolved from an external library
There are three main sections:
- section .data is where you put initialised data
- section .bss is where you declare uninitialised data
- section .text is where you put code

Initialised data is declared as:
_8bit db 1, 1011b ; char[2] = { 1, 0xB }
_16bit dw 2, 755o ; short[2] = { 2, 0755 }
_32bit dd 3, 7Fh, 0.125 ; int[2] = { 3, 0x7F }, float[1] = { 0.125f }
_64bit dq 4, 0x7F, 0.25 ; long[2] = { 4, 0x7F }, double[1] = { 0.25 }

Uninitialised data is reserved by providing the number of elements required:
_8bit resb 1 ; char[1]
_16bit resw 2 ; short[2]
_32bit resd 3 ; int[3] or float[3]
_64bit resq 4 ; long[4] or double[4]
If the name of a data label is used as it is, it stands for the address of the variable. To refer to the contents stored at that memory address, put the name of the data label in square brackets:
lea rax, [variable] ; load effective address of `variable` into rax register
mov rax, [variable] ; read contents of `variable` and store it in rax register
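In C terms, the difference between the two lines above is roughly the difference between &variable and variable. A minimal sketch, assuming a hypothetical 64-bit variable named variable that holds the value 7 (the function names are also hypothetical):

```c
#include <stdint.h>

static int64_t variable = 7;  /* hypothetical data label with some contents */

/* Like `lea rax, [variable]`: the result is the address of the variable. */
int64_t *address_of_variable(void) {
    return &variable;
}

/* Like `mov rax, [variable]`: the result is the value stored at that address. */
int64_t contents_of_variable(void) {
    return variable;
}
```

Dereferencing the pointer returned by address_of_variable gives the same value as contents_of_variable, which mirrors loading the address first and reading through it afterwards.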
NASM uses Intel’s syntax, which is instruction destination, source, e.g. mov rax, 7 means “store the value 7 in the rax register”.
In 64-bit Linux applications, the calling convention is defined by System V AMD64 ABI:
The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, R9 (…), while XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 are used for the first floating point arguments. (…) Integer return values up to 64 bits in size are stored in RAX while values up to 128 bit are stored in RAX and RDX. Floating-point return values are similarly stored in XMM0 and XMM1.
If the callee is a variadic function, then the number of floating point arguments passed to the function in vector registers must be provided by the caller in the AL register.
In addition, the stack must be aligned to 16 bytes whenever a function is called (the value of the RSP register must be divisible by 16). Because the call instruction that entered main pushed an 8-byte return address, RSP is 8 bytes off that boundary at the beginning of main, so the example starts with sub rsp, 8 to restore the alignment before calling printf.
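The arithmetic behind this can be checked with a small C sketch. The helper names and the starting address used in the note below are hypothetical. The only facts modelled are that call pushes 8 bytes and that sub rsp, 8 subtracts 8 more.

```c
#include <stdint.h>

/* True when a stack pointer value is divisible by 16. */
int is_aligned16(uint64_t rsp) {
    return rsp % 16 == 0;
}

/* `call` pushes an 8-byte return address, so RSP drops by 8. */
uint64_t rsp_after_call(uint64_t rsp) {
    return rsp - 8;
}

/* The `sub rsp, 8` at the start of main drops RSP by another 8. */
uint64_t rsp_after_sub8(uint64_t rsp) {
    return rsp - 8;
}
```

Starting from a 16-byte aligned value such as 0x7ffc0000, rsp_after_call gives a misaligned value, and applying rsp_after_sub8 to that result gives an aligned value again.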
bits 64
default rel
global main
extern scanf
section .data
    format_3x_d db '%d %d %d', 0
section .bss
    array_int resd 3
section .text
main:
    sub rsp, 8
    lea rcx, [array_int + 8] ; 4th integer/pointer argument
    lea rdx, [array_int + 4] ; 3rd integer/pointer argument
    lea rsi, [array_int + 0] ; 2nd integer/pointer argument
    lea rdi, [format_3x_d] ; 1st integer/pointer argument
    mov al, 0 ; no floating-point arguments
    call scanf wrt ..plt
    add rsp, 8
    sub rax, rax
    ret
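For comparison, the call in the listing above corresponds roughly to the following C sketch. It uses sscanf on a string instead of scanf on the terminal, so that it can be checked without user input. That substitution and the function name read_three_ints are assumptions made for illustration.

```c
#include <stdio.h>

/* Parse three decimal integers from `text` into out[0..2].
   Returns the number of values converted, as the scanf family does. */
int read_three_ints(const char *text, int out[3]) {
    return sscanf(text, "%d %d %d", &out[0], &out[1], &out[2]);
}
```

The three pointers &out[0], &out[1] and &out[2] play the role of the addresses array_int + 0, array_int + 4 and array_int + 8 in the assembly version, since each int takes 4 bytes.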
- Enter 10 20 30 on the terminal
- The arguments are passed in the rdi, rsi, rdx and rcx registers
- After the call, array_int holds the values 0xA, 0x14, 0x1E (10, 20 and 30 written in hexadecimal)

bits 64
default rel
global main
extern scanf
section .data
    format_3x_lf db '%lf %lf %lf', 0
section .bss
    array_double resq 3
section .text
main:
    sub rsp, 8
    lea rcx, [array_double + 16] ; 4th integer/pointer argument
    lea rdx, [array_double + 8] ; 3rd integer/pointer argument
    lea rsi, [array_double + 0] ; 2nd integer/pointer argument
    lea rdi, [format_3x_lf] ; 1st integer/pointer argument
    mov al, 0 ; no floating-point arguments
    call scanf wrt ..plt
    add rsp, 8
    sub rax, rax
    ret
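When the doubles stored by a listing like the one above are inspected in a debugger, they show up as raw 64-bit patterns rather than as decimal numbers. The following C sketch computes the pattern for a given value. The helper name double_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 64-bit pattern used to store a double. */
uint64_t double_bits(double value) {
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);  /* copy the bytes, no numeric conversion */
    return bits;
}
```

For example, double_bits(1.0) is 0x3FF0000000000000, double_bits(2.0) is 0x4000000000000000 and double_bits(3.0) is 0x4008000000000000.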
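- Enter 1.0 2.0 3.0 on the terminal
- The arguments are passed in the rdi, rsi, rdx and rcx registers
- The al register indicates that there are no floating-point arguments
- After the call, array_double holds the values 0x3FF0000000000000, 0x4000000000000000, 0x4008000000000000 (1.0, 2.0 and 3.0 encoded as 64-bit floats)

bits 64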
default rel
global main
extern printf
section .data
    format_3x_f db '%f %f %f', 0xA, 0
    align 16
    data dq 1.0, 2.0, 3.0
section .text
main:
    sub rsp, 8
    movlpd xmm2, [data + 16] ; 3rd floating-point argument
    movlpd xmm1, [data + 8] ; 2nd floating-point argument
    movlpd xmm0, [data] ; 1st floating-point argument
    lea rdi, [format_3x_f] ; 1st integer/pointer argument
    mov al, 3 ; three floating-point arguments
    call printf wrt ..plt
    add rsp, 8
    sub rax, rax
    ret
- The arguments for printf are loaded from the data array using the movlpd instruction (it loads one double-precision value and stores it in the lower half of an XMM register)
- The format string is passed in rdi, while the floating-point arguments are passed in xmm0, xmm1 and xmm2
- The al register indicates that three floating-point registers are in use
- In a debugger, the loaded values appear as 0x3FF0000000000000, 0x4000000000000000, 0x4008000000000000

bits 64
default rel
global main
extern scanf
extern printf
section .data
    format_2x_lf db '%lf %lf', 0
    less_than_str db '%lf is less than %lf', 0xA, 0
    greater_or_equal_str db '%lf is greater or equal to %lf', 0xA, 0
section .bss
    array_double resq 2
section .text
main:
    sub rsp, 8
    lea rdx, [array_double + 8] ; 3rd integer/pointer argument
    lea rsi, [array_double + 0] ; 2nd integer/pointer argument
    lea rdi, [format_2x_lf] ; 1st integer/pointer argument
    mov al, 0 ; no floating-point arguments
    call scanf wrt ..plt
    movlpd xmm0, [array_double] ; load first value to lower half of xmm0
    movlpd xmm1, [array_double + 8] ; load second value to lower half of xmm1
    cmpltsd xmm0, xmm1 ; check if lower half of xmm0 is LESS-THAN lower half of xmm1
    movq rax, xmm0 ; copy the comparison result to RAX
    cmp rax, 0
    jz greater_or_equal
less_than:
    lea rdi, [less_than_str]
    jmp print_message
greater_or_equal:
    lea rdi, [greater_or_equal_str]
print_message:
    movlpd xmm1, [array_double + 8]
    movlpd xmm0, [array_double]
    mov al, 2
    call printf wrt ..plt
    add rsp, 8
    sub rax, rax
    ret
Comparing floating point numbers with instructions such as cmpltsd does not change the flags register (so conditional jumps will not work directly).

Instead, the comparison sets the bit pattern of the destination register to either all 1s (if the condition is true) or all 0s (otherwise).

Here, the name of the instruction cmpltsd breaks down as follows:
- ...lt.. means “less than” (other possibilities: ...eq.., ...le.., etc.)
- .....sd means “scalar double”, i.e. take only 1x double in the lower half of the register (other possibility: .....pd for “packed double”)

If the result of cmpltsd is true, then the lower half of the register will be set to 0xFFFFFFFFFFFFFFFF, otherwise it will be set to 0x0. movq then copies this value to rax, which is compared with 0 to learn the actual result of the floating-point comparison.
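The mask-based result can be modelled in C as follows. This is a sketch of the semantics only, not of how the CPU implements it, and cmplt_mask and is_less are hypothetical names.

```c
#include <stdint.h>

/* Model of a scalar "less than" comparison:
   all 1s when a < b, all 0s otherwise. */
uint64_t cmplt_mask(double a, double b) {
    return (a < b) ? UINT64_MAX : 0;
}

/* Model of the follow-up `cmp rax, 0` test: a non-zero mask means "less than". */
int is_less(double a, double b) {
    return cmplt_mask(a, b) != 0;
}
```

For example, cmplt_mask(1.0, 2.0) is 0xFFFFFFFFFFFFFFFF, while cmplt_mask(2.0, 1.0) and cmplt_mask(1.0, 1.0) are both 0.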
Write your own version of the strcpy function. It should be an equivalent of:
#include <stdio.h>

int main() {
    char input[1024];
    char output[1024];
    scanf("%1023s", input);  /* width limit keeps the input inside the buffer */

    char *src = input;
    char *dst = output;
    while ((*dst = *src) != '\0') {
        src++;
        dst++;
    }
    printf("%s\n", output);
    return 0;
}
Implement bubble sort in assembly. It should be an equivalent of:
#include <stdio.h>

int main() {
    int array[100];
    int n = 0;
    /* read at most 100 numbers so the array cannot overflow */
    while (n < 100 && scanf("%d", &array[n]) == 1) {
        n++;
    }
    for (int i = 0; i < n; i++) {
        for (int j = n - 1; j > i; j--) {
            if (array[j] < array[j - 1]) {
                int tmp = array[j];
                array[j] = array[j - 1];
                array[j - 1] = tmp;
            }
        }
    }
    for (int i = 0; i < n; i++) {
        printf("%d ", array[i]);
    }
    return 0;
}
Write a program which will calculate the square root of a sequence of numbers with step 0.125. It should be an equivalent of the following (but without calling the sqrt() function):
#include <math.h>
#include <stdio.h>

int main() {
    double end;
    scanf("%lf", &end);

    for (double d = 0.0; d < end; d += 0.125) {
        printf("sqrt(%f) = %f\n", d, sqrt(d));
    }
    return 0;
}
The Maclaurin series for e^x is:
e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}

Write a program which will approximate e^x by summing the terms of this series up to index k. It should be an equivalent of:
#include <stdio.h>

int main() {
    int k;
    double x;
    scanf("%i %lf", &k, &x);

    double series = 1;
    double numerator = 1;
    double denominator = 1;
    for (int i = 1; i <= k; i++) {
        numerator *= x;
        denominator *= i;
        series += numerator / denominator;
    }
    printf("e^x = %f\n", series);
    return 0;
}