Application for Google Summer of Code 2007: Krzysztof Lichota "Automatic boot and application start file prefetching"

Idea

Disk access is one of the main causes of slow application startup. Ubuntu's main competitor (Windows XP) has long provided a feature that analyzes application and system startup and prefetches the necessary files into memory when an application is started again [1]. It also reorganizes files on disk for faster access during system boot and application startup. Although several attempts have been made, there is currently no such end-to-end, automatic solution for Linux systems, and I want to implement one.

Current state

There have been some attempts to provide boot and application startup prefetching, but all of them have problems and none works as expected.

Ubuntu boot readahead

Ubuntu currently (checked on Ubuntu Dapper) includes boot scripts which can analyze and prefetch files during boot. It works quite well in general, but has the following problems:
Other important features:

Preload

preload [2], developed as part of Google Summer of Code 2005, aimed to preload files based on statistical analysis of the correlation between applications (possibly multiple) and the files they use.
The idea is unsuitable for speeding up application startup for the following reasons:
Other important features:

Bootcache/filecache

Bootcache [3] was developed as part of Google Summer of Code 2006 [4]. It concentrates on the kernel side of prefetching by providing facilities for faster readahead and for analysis of the page cache.
It contains some interesting features:
However, it also has some problems:

Conclusions

The currently available solutions are only partial and do not provide complete, automatic prefetching. In particular:

Project

Objective

I would like to concentrate on delivering a prefetching solution for everyday use by casual users, leveraging prior solutions where appropriate and providing the missing parts of complete and automatic prefetching:
Implementation will concentrate on the most important parts (subject to analysis of benefit and implementation complexity), with the main goal of delivering a working automatic solution by the end of the project and leaving less obvious benefits as secondary goals. Filesystem-specific parts will be done for ext3, as it is the default filesystem in Ubuntu and the one most often used on desktops.

Implementation sketch

Hooking into application startup

If possible, I will use an existing mechanism such as binfmt to run the appropriate hooks.
If that is not possible, I will patch the kernel sources as needed.
Hooks will run in kernel or user space, depending on an analysis of the efficiency and security of both approaches.
Existing prefetching tools (from bootcache, or direct kernel facilities) will be reused for the prefetching part.
Tracing will be done using the lightweight tracing facility (described below) or, if it proves better (or time is short), the existing bootcache tracing facility.
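
To illustrate what a user-space hook could do with a recorded trace, here is a minimal sketch. It assumes the trace has already been reduced to a list of file paths; the function name prefetch_files is hypothetical, and this is not code from bootcache or from the planned implementation. It uses the standard posix_fadvise hint where available and falls back to reading the file.

```python
import os


def _warm(fd, size):
    """Bring one open file into the page cache."""
    if hasattr(os, "posix_fadvise"):
        try:
            # Hint only: the kernel schedules readahead in the background.
            os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
            return
        except OSError:
            pass  # fall through to a plain read
    # Fallback: read the file through in 1 MiB chunks.
    while os.read(fd, 1 << 20):
        pass


def prefetch_files(paths):
    """Warm every file in `paths`; return the paths that could be opened.

    `paths` is an assumed trace format: an ordered list of file names.
    """
    warmed = []
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            # The file may have been removed since the trace was recorded.
            continue
        try:
            _warm(fd, os.fstat(fd).st_size)
            warmed.append(path)
        finally:
            os.close(fd)
    return warmed
```

A kernel-side hook would avoid the cost of opening each file from user space, which is one of the trade-offs the analysis mentioned above would have to weigh.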

Lightweight tracing solution

It should be possible to provide read tracing with minimal overhead, similarly to the blktrace facility already present in the kernel. According to my preliminary tests, blktrace does not incur significant overhead during boot, even though it logs several records for each read and write, so logging only reads and metadata accesses should not have a high impact.

Tracing of data reads and metadata reads will be implemented as a patch for the ext3 module and, if necessary, the kernel. Generic parts which can be used for other filesystems or other purposes will be moved into a common module or into the kernel.
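
As an illustration of how logged reads could be post-processed before prefetching, the sketch below merges individual read records into per-file ranges. The record format (path, offset, length) and the function name coalesce_reads are assumptions made for this example only; the actual trace format is still to be designed.

```python
from collections import defaultdict


def coalesce_reads(records):
    """Merge read records into sorted, non-overlapping ranges per file.

    `records` is an iterable of (path, offset, length) tuples (assumed
    format). The result maps each path to a list of (start, end) byte
    ranges, with `end` exclusive. Overlapping and adjacent reads are
    merged into one range.
    """
    by_file = defaultdict(list)
    for path, offset, length in records:
        if length > 0:
            by_file[path].append((offset, offset + length))

    merged = {}
    for path, spans in by_file.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlaps or touches the previous range: extend it.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[path] = out
    return merged
```

Merging ranges keeps the stored trace small and lets the prefetcher issue a few large requests per file instead of many small ones.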

Tool to change layout of files

I have investigated tools for changing disk layout on Linux systems and could not find any suitable solution, possibly because changing the layout of files on disk is risky. e2defrag (part of the ext2 utilities) has not been developed for years; it is currently unusable and even dangerous (it might destroy the filesystem if run on an ext3 filesystem).

I decided to start from scratch and have implemented a prototype of a tool that moves file blocks on an ext3 filesystem. Currently it is able to locate a free area of appropriate size on disk and move the data blocks and indirect blocks of selected files into it, in a given order. The code is here [5]. It uses the e2fslib library, which is also used by the current ext2/3 tools (like e2fsck). It lacks inode relocation; I will investigate whether that is necessary and add it if so.
Finally, I will improve it to the point where it can be used safely on desktop computers with the ext3 options commonly used in Ubuntu, add extensive tests and seek review by ext3 developers. If possible, I will try to submit it to the ext2 tools distribution.
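
The step of locating a free area of a given size can be pictured as a first-fit scan over a block bitmap. The sketch below is only an illustration under simplifying assumptions: the bitmap is a plain list of booleans and the function name find_free_extent is hypothetical. It is not the prototype's code and does not use the filesystem library's bitmap interface.

```python
def find_free_extent(bitmap, needed):
    """Return the index of the first run of `needed` free blocks, or None.

    `bitmap` is a sequence of booleans where True means "block in use".
    This is a first-fit scan; a real tool would also have to respect
    block group boundaries and reserved blocks.
    """
    if needed <= 0:
        raise ValueError("needed must be positive")
    run_start = None
    run_len = 0
    for i, used in enumerate(bitmap):
        if used:
            run_len = 0  # the current free run is broken
            continue
        if run_len == 0:
            run_start = i  # a new free run begins here
        run_len += 1
        if run_len == needed:
            return run_start
    return None
```

For example, with blocks 0, 3 and 7 in use out of eight, a request for three blocks yields block 4, while a request for four blocks yields None.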

This tool will be hooked into the shutdown scripts so that the layout of files is changed automatically during shutdown. If possible, I will reuse the scripts already used by bootcache for this purpose.

The layout of files on disk will be determined by a simple policy (group files needed by only one application in one area, group files common to several applications in another area), based on boot and application startup traces.
If time permits, some more advanced policies can be tested. The tool will be designed in such a way that various policies can be tested, for further research.
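
The simple policy can be sketched as follows. The input format (a mapping from application name to the ordered list of files read at startup), the function name plan_layout and the example paths are assumptions for illustration, not part of the existing prototype.

```python
def plan_layout(traces):
    """Split traced files into per-application groups and a shared group.

    `traces` maps an application name to the ordered list of file paths
    it read during startup (assumed format). Returns (private, shared):
    `private` maps each application to the files only it uses, in
    first-access order; `shared` lists files used by more than one
    application, in order of first appearance.
    """
    # Record which applications touch each file.
    users = {}
    for app, files in traces.items():
        for path in files:
            users.setdefault(path, set()).add(app)

    private = {}
    shared = []
    seen_shared = set()
    for app, files in traces.items():
        own = []
        seen = set()
        for path in files:
            if path in seen:
                continue  # keep only the first access within a trace
            seen.add(path)
            if len(users[path]) == 1:
                own.append(path)
            elif path not in seen_shared:
                seen_shared.add(path)
                shared.append(path)
        private[app] = own
    return private, shared


if __name__ == "__main__":
    example = {
        "boot": ["/sbin/init", "/lib/libc.so"],
        "firefox": ["/usr/bin/firefox", "/lib/libc.so", "/usr/lib/libxul.so"],
    }
    print(plan_layout(example))
```

Each private group would then be placed in its own contiguous area and the shared group in a separate one. Keeping the policy as a pure function of the traces makes it easy to swap in alternative policies later.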

Prefetching of filesystem metadata

If time permits and preliminary analysis shows it is feasible, I will add a simple caching facility to the ext3 filesystem module to prefetch metadata blocks, and instrument the code to satisfy metadata reads from this cache.

Deliverables

Deliverables (in order of importance):
If time permits:

Roadmap

About me

I am a first-year PhD student in Computer Science at Warsaw University. My main interests are operating systems and distributed systems.

Why am I the right person for this task?

Contact

Krzysztof Lichota <lichota@mimuw.edu.pl>