printf and scanf from the C standard library, but nothing else

In the x86 CPU architecture, there are 8 general-purpose registers:
- AX: accumulator, used in arithmetic operations
- BX: base, pointer to data
- CX: counter, used in loops
- DX: data, used in arithmetic operations and I/O
- SI: source in string operations
- DI: destination in string operations
- SP: stack pointer
- BP: base pointer, used for local variables

Under these names you can access 16 bits. 32 bits are accessible with prefix E, e.g. EAX. 64 bits are accessible with prefix R, e.g. RAX.
In 64-bit mode, there are also 8 additional registers: R8, R9, …, R15. For these additional registers, to access 32 bits you can use suffix D, e.g. R8D, and for 16 bits you can use suffix W, e.g. R8W.
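The naming scheme can be mimicked in C with plain masking. The sketch below is only an illustration: a uint64_t stands in for a 64-bit register such as RAX or R8, and the helper names low32 and low16 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: a uint64_t stands in for a full 64-bit register. */

/* Bits visible under the 32-bit name (EAX for RAX, R8D for R8). */
static uint32_t low32(uint64_t reg) {
    return (uint32_t)(reg & 0xFFFFFFFFu);
}

/* Bits visible under the 16-bit name (AX for RAX, R8W for R8). */
static uint16_t low16(uint64_t reg) {
    return (uint16_t)(reg & 0xFFFFu);
}
```

For a register holding 0x1122334455667788, the 32-bit name would show 0x55667788 and the 16-bit name 0x7788.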
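Floating point numbers are handled differently, in dedicated vector registers:
- XMM0, …, XMM7, each 128-bit wide (in 64-bit mode XMM8, …, XMM15 are available as well)
- YMM0, …, YMM15, each 256-bit wide (the XMMn remains an alias for the lower part of YMMn)
- ZMM0, …, ZMM31, each 512-bit wide (as above, YMMn and XMMn are aliases for lower parts of ZMMn)

The XMM registers are 128-bit wide and can hold either:
- 4x 32-bit single-precision floats
- 2x 64-bit double-precision floats
- 16x 8-bit integers
- 8x 16-bit integers
- 4x 32-bit integers
- 2x 64-bit integers

For YMM registers these counts are doubled and for ZMM they are quadrupled (e.g. ZMM can hold 8x 64-bit double-precision floats).

The above list of types works like a union in C.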
For example, XMM0 might hold these 16 bytes (written most significant byte first):
3F F0 00 00 00 00 00 00 3F F0 00 00 00 00 00 00
If you treat it as 2x 64-bit floats, then it is equal to (1.0, 1.0).
If you treat it as 8x 16-bit integers, then it is equal to (16368, 0, 0, 0, 16368, 0, 0, 0).
Here “treating” means “using dedicated instructions”, so there isn’t one instruction for adding XMM registers, but rather a dedicated instruction for adding 2x 64-bit floats (addpd) and another dedicated instruction for adding 8x 16-bit integers (paddw).
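The union analogy can be made concrete in C. The following is a minimal sketch, not a model of the real hardware: the type xmm_view and the helper xmm_from_halves are hypothetical names, and the code assumes a little-endian machine with IEEE 754 doubles (as on x86).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit XMM register. */
typedef union {
    uint8_t  bytes[16];
    uint16_t u16[8];   /* view as 8x 16-bit integers */
    double   f64[2];   /* view as 2x 64-bit floats   */
} xmm_view;

/* Build a view whose low and high 64-bit halves hold the given bit patterns. */
static xmm_view xmm_from_halves(uint64_t low, uint64_t high) {
    xmm_view v;
    memcpy(&v.bytes[0], &low, sizeof low);
    memcpy(&v.bytes[8], &high, sizeof high);
    return v;
}
```

With both halves set to 0x3FF0000000000000, reading f64 gives 1.0 in each element, while reading u16 gives 0x3FF0 (16368) in elements 3 and 7 and zero elsewhere, because the most significant 16 bits of each half are stored last in memory on a little-endian machine.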
Recommended reading: https://www.gamedev.net/blog/615/entry-2250281-demystifying-sse-move-instructions/
In 2012 Intel introduced the F16C extension to the instruction set, which can convert half-precision floats (using 16 bits) to single-precision and back.
- mov destination, source to copy contents
- add destination, operand, sub destination, operand to add/subtract values
- lea destination, [address] to load address of a memory location
- call function, ret to call a function or return from one
- cmp op1, op2 to compare two values
- jmp label to jump unconditionally
- jCC label to jump conditionally depending on CC:
  - jnz is jump if not zero
  - jz is jump if zero
  - jg is jump if greater
  - jl is jump if less

Register and instruction names are case-insensitive: RAX or rax, CMP or cmp, etc.

I recommend using NASM.
You can use NASM with the following Makefile: (please download the file
and not copy-paste from here, because Makefile syntax is
sensitive to tabs vs spaces and copying from HTML page might break
it)
inputs = $(wildcard *.asm)
objects = $(patsubst %.asm,%.o,$(inputs))
outputs = $(patsubst %.asm,%,$(inputs))

all: $(outputs)

clean:
	$(RM) $(objects) $(outputs)

%.o: %.asm
	nasm -f elf64 $^

%: %.o
	cc -o $@ $^

This Makefile will:
- look for files with extension .asm in the current directory
- run nasm -f elf64 on them to produce object files with extension .o
- run cc to link object files with the C standard library to produce the final executable

To use it, please open any text editor and write your assembly code in it. Then save the file with the .asm extension in the same directory as the Makefile. Then execute make in a console and run your generated executable (or read the error message and fix the code).
I can recommend edb for debugging:
edb --run ./program
- (…) main function
- when the program reaches scanf, the debugger will “freeze”, because the process is waiting for your input in the console

bits 64
default rel
global main
extern printf
section .data
format db 'Hello world!', 0xA, 0
section .bss
section .text
main:
sub rsp, 8
lea rdi, [format]
mov al, 0
call printf wrt ..plt
add rsp, 8
sub rax, rax
ret

- the bits directive tells the assembler which mode to generate code for (here: 64-bit)
- default rel instructs to use relative addressing
  - (…) mov reg, address, and the address was a constant 16- or 32-bit number
  - (… printf)
  - when you write call printf wrt ..plt, it means that the call will actually jump to PLT (Procedure Linkage Table)
- global is used to mark a label in code as exported to the outside (here we mark main as global so that the linker knows where the entry point is)
- extern is used to mark a symbol as one that will be resolved from an external library
There are three main sections:
- section .data is where you put initialised data
- section .bss is where you declare uninitialised data
- section .text is where you put code

Initialised data is declared as:
_8bit db 1, 1011b ; char[2] = { 1, 0xB }
_16bit dw 2, 755o ; short[2] = { 2, 0755 }
_32bit dd 3, 7Fh, 0.125 ; int[2] = { 3, 0x7F }, float[1] = { 0.125f }
_64bit dq 4, 0x7F, 0.25 ; long[2] = { 4, 0x7F }, double[1] = { 0.25 }

Uninitialised data is reserved by providing the number of elements required:
_8bit resb 1 ; char[1]
_16bit resw 2 ; short[2]
_32bit resd 3 ; int[3] or float[3]
_64bit resq 4 ; long[4] or double[4]

If the name of a data label is used as it is, then it corresponds to the address of the variable. To refer to the contents at that memory address, you need to put the name of the data label between brackets:
lea rax, [variable] ; load effective address of `variable` into rax register
mov rax, [variable] ; read contents of `variable` and store it in rax register

NASM uses Intel’s syntax, which is instruction destination, source, e.g. mov rax, 7 means “set the rax register to the value 7”.
In 64-bit Linux applications, the calling convention is defined by System V AMD64 ABI:
The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, R9 (…), while XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 are used for the first floating point arguments. (…) Integer return values up to 64 bits in size are stored in RAX while values up to 128 bit are stored in RAX and RDX. Floating-point return values are similarly stored in XMM0 and XMM1.
If the callee is a variadic function, then the number of floating point arguments passed to the function in vector registers must be provided by the caller in the AL register.
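In addition, the stack must be aligned to 16 bytes (the RSP register must be divisible by 16 without remainder) at the point of every call instruction. Because a call instruction pushes an 8-byte return address onto the stack in 64-bit mode, RSP is off by 8 on entry to main, so at the beginning of the example we use sub rsp, 8 to restore the alignment before calling printf (and add rsp, 8 to undo it before ret).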
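The alignment arithmetic can be checked with a few lines of C. This is only a sketch of the reasoning: the helpers is_aligned_16 and rsp_after_call are hypothetical names, and the starting address in the note below is an arbitrary 16-byte aligned value chosen for illustration.

```c
#include <stdint.h>

/* Nonzero when the address is a multiple of 16. */
static int is_aligned_16(uint64_t rsp) {
    return (rsp % 16) == 0;
}

/* call pushes an 8-byte return address, so RSP drops by 8. */
static uint64_t rsp_after_call(uint64_t rsp) {
    return rsp - 8;
}
```

Starting from an aligned value such as 0x7ffc0000e000, rsp_after_call gives 0x7ffc0000dff8, which is not a multiple of 16. Subtracting another 8, as sub rsp, 8 does, gives 0x7ffc0000dff0, which is aligned again.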
bits 64
default rel
global main
extern scanf
section .data
format_3x_d db '%d %d %d', 0
section .bss
array_int resd 3
section .text
main:
sub rsp, 8
lea rcx, [array_int + 8] ; 4th integer/pointer argument
lea rdx, [array_int + 4] ; 3rd integer/pointer argument
lea rsi, [array_int + 0] ; 2nd integer/pointer argument
lea rdi, [format_3x_d] ; 1st integer/pointer argument
mov al, 0 ; no floating-point arguments
call scanf wrt ..plt
add rsp, 8
sub rax, rax
ret

- type 10 20 30 on the terminal
- the arguments are passed in the rdi, rsi, rdx and rcx registers
- after the call, array_int contains the 0xA, 0x14, 0x1E values (the 10, 20 and 30 as hexadecimal values)

bits 64
default rel
global main
extern scanf
section .data
format_3x_lf db '%lf %lf %lf', 0
section .bss
array_double resq 3
section .text
main:
sub rsp, 8
lea rcx, [array_double + 16] ; 4th integer/pointer argument
lea rdx, [array_double + 8] ; 3rd integer/pointer argument
lea rsi, [array_double + 0] ; 2nd integer/pointer argument
lea rdi, [format_3x_lf] ; 1st integer/pointer argument
mov al, 0 ; no floating-point arguments
call scanf wrt ..plt
add rsp, 8
sub rax, rax
ret

- type 1.0 2.0 3.0 on the terminal
- the arguments are passed in the rdi, rsi, rdx and rcx registers
- the al register indicates that there are no floating-point arguments
- after the call, array_double contains the 0x3FF0000000000000, 0x4000000000000000, 0x4008000000000000 values

bits 64
default rel
global main
extern printf
section .data
format_3x_f db '%f %f %f', 0xA, 0
align 16
data dq 1.0, 2.0, 3.0
section .text
main:
sub rsp, 8
movlpd xmm2, [data + 16] ; 3rd floating-point argument
movlpd xmm1, [data + 8] ; 2nd floating-point argument
movlpd xmm0, [data] ; 1st floating-point argument
lea rdi, [format_3x_f] ; 1st integer/pointer argument
mov al, 3 ; three floating-point arguments
call printf wrt ..plt
add rsp, 8
sub rax, rax
ret

- (…) printf
- the values are loaded from the data array using the movlpd instruction (it loads one double-precision value and stores it in the lower half of an XMM register)
- the format string is passed in rdi, while the floating-point arguments are passed in xmm0, xmm1 and xmm2
- the al register indicates that there are three floating-point registers in use
- the lower halves of xmm0, xmm1 and xmm2 hold the 0x3FF0000000000000, 0x4000000000000000, 0x4008000000000000 values

bits 64
default rel
global main
extern scanf
extern printf
section .data
format_2x_lf db '%lf %lf', 0
less_than_str db '%lf is less than %lf', 0xA, 0
greater_or_equal_str db '%lf is greater or equal to %lf', 0xA, 0
section .bss
array_double resq 2
section .text
main:
sub rsp, 8
lea rdx, [array_double + 8] ; 3rd integer/pointer argument
lea rsi, [array_double + 0] ; 2nd integer/pointer argument
lea rdi, [format_2x_lf] ; 1st integer/pointer argument
mov al, 0 ; no floating-point arguments
call scanf wrt ..plt
movlpd xmm0, [array_double] ; load first value to lower half of xmm0
movlpd xmm1, [array_double + 8] ; load second value to lower half of xmm1
cmpltsd xmm0, xmm1 ; check if lower half of xmm0 is LESS-THAN lower half of xmm1
movq rax, xmm0 ; copy the comparison result to RAX
cmp rax, 0
jz greater_or_equal
less_than:
lea rdi, [less_than_str]
jmp print_message
greater_or_equal:
lea rdi, [greater_or_equal_str]
print_message:
movlpd xmm1, [array_double + 8]
movlpd xmm0, [array_double]
mov al, 2
call printf wrt ..plt
add rsp, 8
sub rax, rax
ret

Comparison of floating point numbers with the cmpXXsd family of instructions does not change the flags register (so conditional jumps will not work directly). Other instructions, such as comisd and ucomisd, do set the flags, but they are not used here.
The comparison changes the bit pattern of the destination register to either all 1s (if the condition is true) or all 0s (otherwise).
Here, the name of the instruction cmpltsd can be read as:
- ...lt.. means “less than” (other possibilities: ...eq.., ...le.., etc.)
- .....sd means “scalar double” i.e. take only 1x double in the lower half of the register (other possibility: .....pd for “packed double”)

If the result of cmpltsd is true, then the lower half of the register will be set to 0xFFFFFFFFFFFFFFFF, otherwise it will be set to 0x0.
movq will copy this value to rax, which gets compared with 0 to learn the actual comparison result of the floating-point numbers.
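The mask-then-test pattern can be sketched in C. This is an emulation for illustration only, not the real instruction: the function names cmplt_mask and is_less_than are hypothetical, and only the scalar (lower half) result is modelled.

```c
#include <stdint.h>

/* Emulates the scalar result of cmpltsd: all 1s if a < b, all 0s otherwise. */
static uint64_t cmplt_mask(double a, double b) {
    return (a < b) ? UINT64_MAX : 0;
}

/* Mirrors the movq / cmp rax, 0 / jz sequence: a zero mask means "not less than". */
static int is_less_than(double a, double b) {
    return cmplt_mask(a, b) != 0;
}
```

The point of the sketch is that the comparison itself produces data (a mask) rather than flags, so a second, ordinary integer comparison is needed before a conditional jump can be taken.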
Write your own version of strcpy function
It should be an equivalent of:
#include <stdio.h>

int main() {
    char input[1024];
    char output[1024];
    /* the width limit keeps scanf from writing past the end of input */
    scanf("%1023s", input);
    char *src = input;
    char *dst = output;
    while ((*dst = *src) != '\0') {
        src++;
        dst++;
    }
    printf("%s\n", output);
    return 0;
}
Implement bubble sort in assembly
It should be an equivalent of:
#include <stdio.h>

int main() {
    int array[100];
    int n = 0;
    /* stop reading once the array is full */
    while (n < 100 && scanf("%d", &array[n]) == 1) {
        n++;
    }
    for (int i = 0; i < n; i++) {
        for (int j = n - 1; j > i; j--) {
            if (array[j] < array[j - 1]) {
                int tmp = array[j];
                array[j] = array[j - 1];
                array[j - 1] = tmp;
            }
        }
    }
    for (int i = 0; i < n; i++) {
        printf("%d ", array[i]);
    }
    return 0;
}
Write a program which will calculate square root of a sequence of numbers with step 0.125
It should be an equivalent of: (but without calling
sqrt() function)
#include <math.h>
#include <stdio.h>

int main() {
    double end;
    scanf("%lf", &end);
    for (double d = 0.0; d < end; d += 0.125) {
        printf("sqrt(%f) = %f\n", d, sqrt(d));
    }
    return 0;
}
The Maclaurin series for e^x is:
e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}
Write a program which will approximate e^x with the partial sum of this series that ends at the term x^k / k! for a given k.
It should be an equivalent of:
#include <stdio.h>

int main() {
    int k;
    double x;
    scanf("%i %lf", &k, &x);
    double series = 1;
    double numerator = 1;
    double denominator = 1;
    for (int i = 1; i <= k; i++) {
        numerator *= x;
        denominator *= i;
        series += numerator / denominator;
    }
    printf("e^x = %f\n", series);
    return 0;
}