os_cp:2024:shell_assignment

Each student is supposed to choose one topic of the script from the list below, write the script and send it to me, attached to an e-mail message with the subject beginning with [OSCP], no later than 12.05.2024 (AoE).

If you have your own idea for a script topic, write it up and send it to me by e-mail; I'll consider adding it to the list.

List of topics

Topic 1. Self-extracting shell script that builds source code and runs tests.
The script contains an embedded compressed archive with the source code of a selected program. When run, the script checks whether certain requirements are met: whether the required compilers and libraries are present, whether there is enough disk space, and whether the target directory is non-existent or empty. Then, it unpacks the source into the target directory and compiles it. Once the binaries are ready, the script creates an environment for tests (e.g. data files), runs some tests to check whether the binaries work correctly, and cleans up the test environment.
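
The requirement checks described above could be sketched as below. This is only an illustration: the function names missing_tools and dir_is_free are hypothetical, and the disk-space check and the archive extraction are left out.

```shell
#!/usr/bin/env bash
# Print the name of every listed command that is not found in PATH.
# Returns 0 if all commands are present, 1 otherwise.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Succeed if the path does not exist, or is a directory with no entries.
dir_is_free() {
    [ ! -e "$1" ] || { [ -d "$1" ] && [ -z "$(ls -A "$1")" ]; }
}
```

For example, missing_tools gcc make prints nothing and returns 0 when both tools are installed.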

Topic 2. Script installing and configuring LAMP on a selected Linux distribution.
The script shall be run with root privileges and should be prepared for a specific Linux distribution. The script first installs packages: a selected HTTP server (such as apache, nginx, lighttpd…), a selected relational database server (mysql, mariadb, postgresql, …) and PHP. Then, the script configures those servers — puts a custom website in webroot, creates a database and configures access to it, and enables PHP on the server.
The script must protect the DB access with a randomized password, and put the password in a configuration file of the website.
LAMP is a commonly used nickname for a set of software used to run web applications.
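
One possible way to produce the randomized database password is sketched below. The function name random_password and the default length of 24 are assumptions made for this example; writing the password into the website configuration file is not shown.

```shell
#!/usr/bin/env bash
# Print a random alphanumeric password of the requested length (default 24).
random_password() {
    local length=${1:-24} pw=
    # Keep collecting alphanumeric characters until there are enough of them.
    while [ "${#pw}" -lt "$length" ]; do
        pw+=$(head -c 64 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9')
    done
    printf '%s\n' "${pw:0:length}"
}
```

Restricting the alphabet to letters and digits avoids quoting problems when the password is later pasted into SQL statements or PHP configuration files.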

Topic 3. Recursively sanitizing file names.
The script accepts arguments that define the allowed characters in file names, choose how characters outside the set are replaced, and choose how multiple files mapping to the same name are handled. For directories, the script lets the user choose whether two directories that map to the same name should be merged or whether one of them shall have a suffix appended. Once set up, the script recursively renames files that contain disallowed characters in all provided directories. The script must support Unicode characters in the allowed character set.
Hint: sed 's/[^allowed]/_/gi' replaces every character outside the allowed set with _; the g flag replaces all occurrences in a line, and the i flag (in GNU sed) makes the match case-insensitive.
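
The hint can be wrapped into a small helper, sketched here under the assumption that the allowed set is passed as the body of a bracket expression. The name sanitize_name is hypothetical, and a replacement containing / or & would need escaping, which this sketch does not do.

```shell
#!/usr/bin/env bash
# Replace every character outside the allowed set.
# $1 = file name, $2 = allowed characters (bracket-expression body),
# $3 = replacement string (default: _)
sanitize_name() {
    printf '%s\n' "$1" | sed "s/[^$2]/${3:-_}/g"
}
```

For instance, sanitize_name 'my file(1).txt' 'A-Za-z0-9._-' prints my_file_1_.txt. When renaming recursively, find with the -depth option visits the contents of a directory before the directory itself, so children can be renamed before their parents.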

Topic 4. Live plot of network bandwidth.
The script shall get information on the number of bytes sent and received by a network interface ifname specified by arguments. Based on that information, it should periodically (with the period specified in arguments) calculate the receive and transmit bandwidth as the difference of received/transmitted bytes divided by the difference of timestamps, and shall draw a plot of both bandwidths, with time on the horizontal axis.
The script shall use Unicode characters (e.g., ⢀⢠⢰⢸⡀⣀⣠⣰⣸⡄⣄⣤⣴⣼⡆⣆⣦⣶⣾⡇⣇⣧⣷⣿, ▁▂▃▄▅▆▇█ or ▗▐▖▄▟▌▙█) to draw the plot, and the plots should fill the terminal window.
The number of tx/rx bytes can be read from the /sys/class/net/ifname/statistics/tx_bytes and …/rx_bytes files, or displayed with the ethtool --statistics ifname or ip [--json] --statistics link show dev ifname commands.
To write this script, consider using console codes to move the cursor on the screen.
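
The bandwidth formula and the mapping of a value to a block character might look like the sketch below. The names read_bytes, bandwidth and bar_for are hypothetical, timestamps are assumed to be whole seconds, and only the ▁▂▃▄▅▆▇█ character set is used.

```shell
#!/usr/bin/env bash
# Read a byte counter; $1 = interface name, $2 = rx or tx.
read_bytes() {
    cat "/sys/class/net/$1/statistics/${2}_bytes"
}

# Bandwidth in bytes per second from two samples.
# $1, $2 = earlier and later byte counts; $3, $4 = earlier and later timestamps.
bandwidth() {
    echo $(( ($2 - $1) / ($4 - $3) ))
}

BARS=(▁ ▂ ▃ ▄ ▅ ▆ ▇ █)

# Print the bar character for value $1 on a scale from 0 to $2.
bar_for() {
    local max=$2 idx
    if (( max <= 0 )); then max=1; fi
    idx=$(( $1 * 7 / max ))
    if (( idx > 7 )); then idx=7; fi
    if (( idx < 0 )); then idx=0; fi
    printf '%s\n' "${BARS[idx]}"
}
```

To place a character at a given position, the console code printf '\033[%d;%dH' "$row" "$col" moves the cursor to that row and column.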

Topic 5. Automating running benchmarks and plotting results.
First, select a program that accepts a parameter and produces some result (or metric).
The script must accept several arguments: a list (or range) of the values for the parameter, the number of iterations, and the prefix of the output file name.
The script shall run the program with specified arguments, and repeat the run with each value of the parameter as many times as there are iterations. The script shall place the results in a text file.
Then, the script shall generate a scatter plot of the results, with X axis being the parameter value, and Y axis being the result. The plot shall contain the average result with error bars and be generated as an image (e.g., svg/png).
Consider using gnuplot or R to generate the plot.
If you don't have a clue what to benchmark, you might use openssl prime -generate -bits N, and measure the time it takes to generate a prime of size N as a result.
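
Computing the average and the error bar for each parameter value could be done with awk, as sketched below. The sketch assumes that the results file holds one "parameter result" pair per line and that the error bar is the sample standard deviation; summarize_results is a hypothetical name.

```shell
#!/usr/bin/env bash
# Read "parameter result" pairs from stdin and print one line per parameter:
# parameter, mean, sample standard deviation, number of runs.
summarize_results() {
    awk '
        { n[$1]++; sum[$1] += $2; sumsq[$1] += $2 * $2 }
        END {
            for (p in n) {
                mean = sum[p] / n[p]
                var = (n[p] > 1) ? (sumsq[p] - n[p] * mean * mean) / (n[p] - 1) : 0
                if (var < 0) var = 0    # guard against rounding error
                printf "%s %.3f %.3f %d\n", p, mean, sqrt(var), n[p]
            }
        }' | sort -n
}
```

The three leading columns have the layout gnuplot expects for plot 'file' using 1:2:3 with yerrorbars.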

Topic 6. Script testing a program with parameter combinations.
A script for testing a program accepting two parameters. The script should run the selected program with all possible combinations for the two parameters, taking the values for the parameters from corresponding lists.
The standard output of each run should be placed in a directory named after the tested program and the date when the script was run, in a file whose name uniquely identifies the parameter combination. The files, once ready, should be compressed.
The script does not accept arguments; the parameter lists and the command to run must be configurable by editing the first few lines of the script.
The script shall print to the standard output, before running each test, information about the combination being currently tested and the current time.
It's up to you to choose the program to be tested. An example program that can be tested is the fio disk performance benchmark; an example execution in Bash is:
fio <(echo -e "[foo] \n time_based \n runtime=1s \n filename=/tmp/testFile \n filesize=10M \n ioengine=posixaio \n readwrite=randrw \n rwmixread=10 \n blocksize=4k")
Parameters to control for FIO can be the block size and write percentage.
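
The loop over all parameter combinations could be organized as in the sketch below. To keep the example runnable anywhere, echo stands in for the tested program; the variable names and the file name pattern are assumptions, and compressing the results is not shown.

```shell
#!/usr/bin/env bash
# Configuration (edit these lines); echo is a stand-in for the real program.
COMMAND=(echo)
BLOCK_SIZES=(4k 64k 1M)
WRITE_PERCENTAGES=(10 90)

# Run COMMAND for every combination and store each output in directory $1.
run_all() {
    local outdir=$1 bs wp
    mkdir -p "$outdir"
    for bs in "${BLOCK_SIZES[@]}"; do
        for wp in "${WRITE_PERCENTAGES[@]}"; do
            printf '%s testing bs=%s write=%s%%\n' "$(date +%T)" "$bs" "$wp"
            "${COMMAND[@]}" "$bs" "$wp" >"$outdir/bs-${bs}_write-${wp}.out"
        done
    done
}
```

A call such as run_all "${COMMAND[0]}_$(date +%F)" would name the output directory after the program and the current date.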

Topic 7. Preparing for run and running a program on a remote machine.
The script shall create a temporary directory in which it copies files from a list (that is either read from a file, or stored in a variable at the beginning of the script). Then, the script shall prepare a configuration file and fill it with the values provided as the script arguments. Next, the files shall be copied to a remote machine (the address of the machine should be stored in a variable at the beginning of the script), and the specified program is run on the remote machine (the command to run is also set as a variable at the beginning of the script). The standard output and the standard error of the program shall be redirected to files, and copied back to the local machine once the program terminates. The files should be renamed so that they contain the program name and the current date.
It's up to you to choose the program to be tested. You can use fio, mentioned in the previous topic, if you cannot come up with any program.

Topic 8. Script reminding of upcoming birthdays.
The script shall read and store information on cyclic events in a human-readable text file. (Any format of the data in file is fine, e.g., lines such as 12.31 John Doe's birthday are sufficient.)
The script run without any arguments shall list all events within the next two weeks. If there are fewer than three such events, the script lists the three soonest events instead.
With the right arguments, the script shall add new events, remove existing events and list all events within a year in order.
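
Deciding which events fall within two weeks requires the number of days until the next occurrence of a date. A minimal sketch follows, assuming the MM.DD format from the example line above and GNU date (for the -d option); days_until is a hypothetical name, and an event on 02.29 is not handled in non-leap years.

```shell
#!/usr/bin/env bash
# Days from a reference date to the next occurrence of a yearly event.
# $1 = event date as MM.DD, $2 = reference date as YYYY-MM-DD (default: today).
days_until() {
    local month=${1%%.*} day=${1#*.} today=${2:-$(date +%F)}
    local year=${today%%-*} now next
    now=$(date -u -d "$today" +%s)
    next=$(date -u -d "$year-$month-$day" +%s)
    if (( next < now )); then
        # The event has already passed this year; use next year's date.
        next=$(date -u -d "$((year + 1))-$month-$day" +%s)
    fi
    echo $(( (next - now) / 86400 ))
}
```

Sorting the events by this value gives both the two-week list and the order for the yearly listing.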

Topic 9. Presentation/slideshow script.
The script shall display contents of the files in a directory, one by one, as a simple form of presentation. The script takes as an argument a name of the directory with files. The user shall be able to navigate the slides freely and be able to show a table of contents by pressing predefined keys. The script shall print a status bar with the slide number, total count of slides and name of the current file.
If any of the files contains source code, it should be colorized.
In bash, read -sn1 reads a single character from input without echoing it back. Notice that pressing an arrow key generates three characters.
For colorizing, use an external tool (for instance pygmentize).
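
Handling the three-character arrow key sequences could look like the sketch below. The function name read_key and the 0.1 s timeout are assumptions; fractional timeouts require bash 4 or newer.

```shell
#!/usr/bin/env bash
# Read one key press from stdin and print a symbolic name for it.
read_key() {
    local key rest
    IFS= read -rsn1 key || return 1
    if [[ $key == $'\e' ]]; then
        # Arrow keys send ESC [ A/B/C/D; fetch the two remaining characters.
        IFS= read -rsn2 -t 0.1 rest || true
        key+=$rest
    fi
    case $key in
        $'\e[A') echo up ;;
        $'\e[B') echo down ;;
        $'\e[C') echo right ;;
        $'\e[D') echo left ;;
        *) printf '%s\n' "$key" ;;
    esac
}
```

The short timeout lets a lone Escape key press be told apart from the start of an arrow key sequence.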

Topic 10. Thumbnail generator.
The script accepts as arguments a list of directories with images. For every image file in the directories, it generates a JPEG thumbnail and places it in the tn subdirectory of the directory that contains the image. The thumbnails shall have change, modification and access times identical to those of the original image.
Additionally, for each directory the script creates an image cover.jpg with first 25 thumbnails in a 5x5 grid, and places it in the tn directory as well.
The script shall print the name of each processed image.
Consider using file to detect whether the file is an image, and using the ImageMagick toolkit for image manipulation (convert/magick and montage).
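
Two small pieces of this task, building the thumbnail path and copying timestamps, are sketched below. The names thumb_path and copy_times are hypothetical, and the ImageMagick calls themselves are not shown. Note that touch -r copies only the access and modification times; the change time is maintained by the kernel and is not set this way.

```shell
#!/usr/bin/env bash
# Print the thumbnail path for image $1: <dir>/tn/<name without extension>.jpg
thumb_path() {
    local dir base
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    printf '%s/tn/%s.jpg\n' "$dir" "${base%.*}"
}

# Copy the access and modification times from file $1 to file $2.
copy_times() {
    touch -r "$1" "$2"
}
```

For example, thumb_path /photos/trip/beach.png prints /photos/trip/tn/beach.jpg.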

Topic 11. Extracting GPX tracks from geotagged photos.
The script shall recursively go through all .jpg files in the directories specified as the arguments. The script shall extract the GPS Position and Create Date EXIF tags from the files, and shall create a GPX file with routes (position + date) and tracks (position + URL to the photo). The photos should be organized into tracks/routes so that a new track/route is started if the time elapsed between taking the photos is larger than a defined period (e.g., an hour). Photos that do not have GPS tags should be omitted. The script outputs a summary: the number of photos with all required tags, the number of all photos, and the number of generated tracks/routes.
You need to take sample pictures with a camera/smartphone with GPS as test data.
exiftool extracts data from files. A number of tools, for instance gpxsee, will let you see if the resulting GPX file is correct.

Topic 12. Organizing photo collection by dates.
The script accepts the destination directory as the first argument and the list of source directories as the remaining arguments.
The script shall move each file from the source directories (recursively) to the year/month/day subdirectory in the target directory and rename the files to hour_minute_second__old_name. The time and date shall be read from EXIF metadata (e.g., CreateDate, DateTimeOriginal), and if no valid metadata is embedded, then the last modification date shall be used. If two files (e.g., from separate source directories) map to the same name, the script must add consecutive numbers at the end of the file name (but before the extension) so that no file is lost.
The script shall output information on each move operation as a source file --> target file line.
Consider using exiftool to extract the dates.
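
Choosing a name that does not collide with an existing file could be sketched as follows. The function name unique_target and the _N suffix format are assumptions; the check is not atomic, so two scripts running at the same time could still pick the same name.

```shell
#!/usr/bin/env bash
# Print a path in directory $1 for file name $2 that does not exist yet,
# inserting _1, _2, ... before the extension when needed.
unique_target() {
    local dir=$1 name=$2 stem ext candidate n=0
    if [[ $name == ?*.* ]]; then
        stem=${name%.*}
        ext=.${name##*.}
    else
        stem=$name
        ext=
    fi
    candidate=$dir/$name
    while [ -e "$candidate" ]; do
        n=$((n + 1))
        candidate=$dir/${stem}_$n$ext
    done
    printf '%s\n' "$candidate"
}
```

The result can be passed directly to mv as the destination of the move.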

Topic 13. Fetching images from a web page.
The script takes as arguments a list of URLs to web pages. The script fetches the pages and extracts the addresses of all images present on the pages (an image is understood as an <img ...> tag with a src attribute). Then, the script downloads all images into the current directory, unless a file with the same name as the image already exists. The script shall download the images in parallel, but with a specified upper bound on the number of concurrent downloads.
The script outputs each image URL found on the page, outputs whether it exists in the current directory or whether it will be downloaded, and outputs a completion message for each image.
Consider using wget or curl to download files from the web.
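
Extracting the src values can be approximated with grep and sed, as in the sketch below. It assumes GNU grep and sed, quoted attribute values, and tags that do not span several lines; regular expressions are a fragile way to read HTML, so treat it as a starting point only. The name extract_img_srcs is hypothetical.

```shell
#!/usr/bin/env bash
# Read HTML from stdin and print the src value of every <img> tag, one per line.
extract_img_srcs() {
    local q="\"'"    # both kinds of quote characters
    grep -oiE '<img[^>]+>' |
        sed -nE "s/.*[[:space:]]src[[:space:]]*=[[:space:]]*[$q]([^$q]*)[$q].*/\1/Ip"
}
```

The resulting list can be fed to xargs with the -P option, which limits the number of download processes that run at the same time.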

Topic 14. Looking up duplicate files.
The script accepts a list of directories as the arguments.
The script shall search the directories recursively, and first it should group the files by size. Then, in each group with multiple files, it should calculate the MD5 sum of each file. Lastly, if two or more files have an identical sum, the files should be compared against each other with the cmp command.
The script shall output groups of identical files (each file name in a separate line) separated by a blank line to the standard output.
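
The first two stages (grouping by size, then by MD5 sum) could be sketched as one pipeline. This assumes GNU find, xargs and uniq, and file names without newlines; candidate_duplicates is a hypothetical name, and the final cmp comparison is left out.

```shell
#!/usr/bin/env bash
# Print md5sum lines for files that share both their size and their MD5 sum;
# groups are separated by a blank line. Arguments: directories to search.
candidate_duplicates() {
    find "$@" -type f -printf '%s\t%p\n' |
        awk '
            {
                size = $1
                path = substr($0, index($0, "\t") + 1)
                count[size]++
                files[size] = files[size] path "\n"
            }
            END { for (s in count) if (count[s] > 1) printf "%s", files[s] }' |
        xargs -r -d '\n' md5sum |
        sort |
        uniq -w32 --all-repeated=separate
}
```

Each output line starts with the 32-character sum, so the file names begin at column 35.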

Topic 15. Recursive text search.
The script accepts a search key as the first argument, followed by a list of directories.
The script shall search in all files in the specified directories (recursively) for the specified search key. The script shall detect the type of each file, and for formats that contain text not stored as plain text, the script shall convert the file to plain text (without creating any files on disk). For instance, the text from PDF files shall be extracted (pdftotext), compressed files (.gz/.bz2/.xz/…) should be uncompressed (zgrep/bzgrep/xzgrep/…), and ZIP-based files such as .zip archives, most e-book file formats (.epub/.mobi/…), MS Office documents (.docx/.pptx/…), and Open Document Format files (.odt/.odp/…) should first be unzipped (unzip -c).
The script shall output the names of the files in which the search key is found. The script shall not output errors when the files or subdirectories are not accessible.

Topic 16. rsync-based backup script.
The script shall create an incremental backup, reading the list of directories to back up from one file, and a list of file name patterns that should be excluded from the backups from another file.
The script shall put the backup to a specified (as a variable at the beginning of the script) directory on a remote machine, so that once the script finishes, the target directory contains exact copies of the local directories being backed up, apart from the files from the exclude list (that should be absent in the backup). Each time the script is run, it shall create a directory with the current date, where all files deleted or modified since the last run shall be placed. (NB: rsync does all that by itself.)
Once the backup is ready, the script shall create the following lists of files: one with the names of files deleted since the previous backup, another with the names of modified files, and yet another with the names of new files. The lists should have names derived from the directory with deleted and modified files, with the .del, .mod and .new suffixes, respectively.
While running, the script shall output what it is doing. Once all work is done, the script shall output its execution time and the size of the directory with deleted and modified files.
Consider using --backup, --backup-dir and --exclude-from switches for rsync, as well as the find and comm utilities for generating file lists.
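
One possible way to derive the three lists with comm is sketched below. It assumes three sorted listings of relative paths (for instance produced with find and sort): the files rsync moved to the backup directory, the target contents before the run, and the target contents after the run. The function name write_lists and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# $1 = sorted list of files in the dated backup directory
# $2 = sorted list of files in the target before the run
# $3 = sorted list of files in the target after the run
# $4 = prefix for the output lists
write_lists() {
    comm -23 "$1" "$3" >"$4.del"   # moved aside and gone from the target
    comm -12 "$1" "$3" >"$4.mod"   # moved aside but still in the target
    comm -13 "$2" "$3" >"$4.new"   # in the target now, absent before
}
```

Since comm expects its inputs sorted in the current collation order, producing every listing with the same LC_ALL setting avoids surprises.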

Topic 17. Benchmarking file compression methods.
The script accepts one argument – a file name to be compressed.
The script shall compress the file using at least the gzip, bzip2, xz and zstd programs, with at least three presets (profiles) for each of the programs (meaningful across the programs, such as 'best supported compression ratio', 'default compression ratio', or 'using multiple threads').
The script shall record the duration of the compression and decompression and the resulting file size, and display the results as two tables: one with absolute values, the other with time values relative to the shortest time and size values relative to the original file size. Each table should place programs as columns and profiles as rows (or, if you choose so, the other way round).
The script must compress the files to temporary files that are deleted once the measurements are done, and it must not overwrite or erase existing files (consider using mktemp).
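
A single measurement could be taken as in the sketch below, which assumes GNU date (for %N) and a compressor that reads stdin and writes stdout. The function name measure and its output format are assumptions; decompression timing and the tables are not shown.

```shell
#!/usr/bin/env bash
# Compress file $1 with the command given in the remaining arguments and
# print "<compressed size in bytes> <elapsed milliseconds>".
measure() {
    local input=$1 tmp start end
    shift
    tmp=$(mktemp) || return 1
    start=$(date +%s%N)
    "$@" <"$input" >"$tmp"
    end=$(date +%s%N)
    printf '%s %s\n' "$(wc -c <"$tmp" | tr -d ' ')" "$(( (end - start) / 1000000 ))"
    rm -f -- "$tmp"    # the temporary file is removed after measuring
}
```

For example, measure data.bin gzip -9 -c and measure data.bin xz -9 -c each produce one row of raw numbers without touching any existing file.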

Topic 18. Verifying output of a program.
The script accepts a path to an executable as an argument.
The script shall read from a file (whose name is set as a variable at the beginning of the script) pairs composed of a path to a file with expected output and an argument list. Then, the script shall run the specified executable with each argument list, and verify whether the actual output matches the expected one. The script shall run the tests concurrently so that it uses all available CPU threads.
The script shall output, preserving the order from the input file, one line per test: pass and the name of the file with expected output if the executable yielded correct results, and otherwise FAIL in bold red followed by the name of the file with expected output and the first differing line. If the executable printed any characters to the standard error, the script shall add a line indicating the fact and shall output the characters to the standard output, indenting each line with a tab.
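
Reporting a single test result could be sketched as follows. The function name report is hypothetical; the first differing line is taken from diff output, so it carries a "< " prefix when it comes from the expected file and "> " when it comes from the actual output. Running tests concurrently and handling the standard error are not shown.

```shell
#!/usr/bin/env bash
# Compare expected output file $1 with actual output file $2 and print
# either "pass <file>" or a bold red FAIL with the first differing line.
report() {
    if cmp -s -- "$1" "$2"; then
        printf 'pass %s\n' "$1"
    else
        printf '\033[1;31mFAIL\033[0m %s: %s\n' "$1" \
            "$(diff -- "$1" "$2" | grep -m1 '^[<>]')"
    fi
}
```

The console code \033[1;31m switches to bold red and \033[0m resets the attributes.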

os_cp/2024/shell_assignment.txt · Last modified: 2024/04/22 20:40 by jkonczak