Monday 21 June 2010


As a sample filesystem plug-in, a processor for FAT32 was chosen. There were a number of reasons behind this decision:
  • Well-documented, long established format;
  • A very simple structure that allowed more time to be dedicated to the general VeRa ecosystem; and,
  • Volume of potential sample data, due to it being the normal format for devices such as digital cameras and USB memory sticks.
A FAT drive can be split into three main parts:
  1. Header, containing information relating to the layout of the drive and how individual elements are sized.
  2. The File Allocation Tables themselves (two is the usual number, to allow recovery in the event of drive damage).
  3. The data itself.
The first area of interest is the boot sector. It should be noted that the structure of this varies between different versions of FAT, but the data available includes:
  • Bytes per sector, and sectors per cluster.
  • The number of File Allocation Tables.
  • The start addresses of the File Allocation Tables and the data that each relates to.
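As an illustration only (a Python sketch, not VeRa's own code), the fields listed above can be pulled out of a FAT32 boot sector as follows. The offsets are those of the standard FAT32 BIOS parameter block; the function name and the dictionary keys are made up for this example.

```python
import struct


def parse_fat32_boot_sector(sector: bytes) -> dict:
    """Extract the layout fields from the first sector of a FAT32 volume."""
    # Offsets 0x0B-0x10: bytes per sector, sectors per cluster,
    # reserved sector count and number of FATs (all little-endian).
    bytes_per_sector, sectors_per_cluster, reserved_sectors, fat_count = (
        struct.unpack_from("<HBHB", sector, 0x0B)
    )
    # FAT32 only: sectors per FAT at 0x24, root folder's first cluster at 0x2C.
    (sectors_per_fat,) = struct.unpack_from("<I", sector, 0x24)
    (root_cluster,) = struct.unpack_from("<I", sector, 0x2C)

    fat_size = sectors_per_fat * bytes_per_sector
    first_fat = reserved_sectors * bytes_per_sector
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "fat_count": fat_count,
        "root_cluster": root_cluster,
        # Byte offset of each copy of the FAT, relative to the partition start.
        "fat_offsets": [first_fat + i * fat_size for i in range(fat_count)],
        # Byte offset of the data area, which begins straight after the FATs.
        "data_offset": first_fat + fat_count * fat_size,
    }
```

All offsets returned are relative to the start of the partition, so the partition's own starting position has to be added before seeking within a whole drive image.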
It is easiest to describe the FAT itself alongside the data, as one leads directly to the other. This is easier with a diagram, but in essence it is a 'map' of the main data area, where each entry in the FAT (four bytes wide in FAT32) maps to an equivalent cluster in the main data area. Thus, entry n in the FAT relates to cluster n of the data; the first two entries are reserved, so the first data cluster is cluster 2.

The actual values within the FAT are of more interest. In FAT32 each entry is four bytes wide; the maximums for FAT12 and FAT16 are significantly lower, as their entries are only 12 and 16 bits wide. If the value is between 0x00000002 and 0x0FFFFFEF, then it is a pointer to the next item in the FAT; a value of 0x0FFFFFF8 or above indicates that the data has finished. (A value of zero marks a free cluster, and 0x0FFFFFF7 a bad one.) For example, if the FAT is arranged as follows:
Entry 3: 0x00000004
Entry 4: 0x00000006
Entry 5: 0x0000000A
Entry 6: 0x0FFFFFFF
then we can say that the data starting at cluster 3 goes across clusters 3 and 4 before ending in 6.
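Following a chain like this takes only a few lines. The Python sketch below is illustrative rather than VeRa's implementation, and its function names are invented for the example. It treats any value outside the pointer range as the end of the data, masks off the top four bits of each entry (which FAT32 reserves), and guards against a damaged FAT that loops back on itself.

```python
import struct


def read_fat32(fat_bytes: bytes) -> list:
    """Decode a raw FAT32 table into a list of 28-bit entry values."""
    count = len(fat_bytes) // 4
    values = struct.unpack_from(f"<{count}I", fat_bytes)
    return [value & 0x0FFFFFFF for value in values]


def cluster_chain(fat: list, first_cluster: int) -> list:
    """Return the cluster numbers occupied by the data starting at first_cluster."""
    chain = []
    seen = set()
    cluster = first_cluster
    # Keep following pointers while the current value is a usable cluster number.
    while 2 <= cluster <= 0x0FFFFFEF and cluster < len(fat) and cluster not in seen:
        chain.append(cluster)
        seen.add(cluster)
        cluster = fat[cluster]
    return chain
```

Multiplying each cluster number (minus two) by the cluster size and adding the start of the data area then gives the byte offsets to read.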

In FAT32, the root folder of the drive is itself stored as a data file: its starting cluster is recorded in the boot sector, and is normally the first data cluster. This file can be split into individual structures, each of length 0x20 (32) bytes, that represent files or folders (or parts of longer file and folder names that don't fit into the standard 8.3 DOS format). These then link back to the FAT, to indicate the locations of the files and folders that the root contains.
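To make the 32-byte structure concrete, here is a hedged Python sketch of a folder parser. The field offsets (attribute byte at 11, first cluster split across bytes 20-21 and 26-27, size at 28-31) are the standard FAT directory entry layout; the DirEntry type and the function name are invented for the example, and long-filename blocks are simply skipped rather than reassembled.

```python
import struct
from typing import Iterator, NamedTuple


class DirEntry(NamedTuple):
    name: str
    is_folder: bool
    first_cluster: int
    size: int


def parse_folder(data: bytes) -> Iterator[DirEntry]:
    """Yield the short-name (8.3) entries found in a folder's raw data."""
    for offset in range(0, len(data) - 31, 32):
        block = data[offset:offset + 32]
        if block[0] == 0x00:        # no further entries in this folder
            break
        if block[0] == 0xE5:        # deleted entry
            continue
        attributes = block[11]
        if (attributes & 0x3F) == 0x0F:   # part of a long filename
            continue
        if attributes & 0x08:             # volume label, not a file
            continue
        base = block[0:8].decode("ascii", "replace").rstrip()
        ext = block[8:11].decode("ascii", "replace").rstrip()
        name = f"{base}.{ext}" if ext else base
        (high,) = struct.unpack_from("<H", block, 20)
        low, size = struct.unpack_from("<HI", block, 26)
        yield DirEntry(name, bool(attributes & 0x10), (high << 16) | low, size)
```

The first_cluster value is the link back into the FAT: it is the starting point for following the chain of clusters that hold the file or subfolder.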

The result of this is that the entire filesystem can be processed in a small amount of code:
  1. Call the procedure to build the folders, with an initial offset of zero. This will allow it to read the root folder.
  2. Parse the folder in blocks of length 0x20 bytes.
  3. Where the block indicates a folder, recursively call the current procedure, but this time with an offset that allows this folder's structure to be read instead.
Our output is an XML structure that represents the current filesystem. By storing pointers in this structure that relate to the original cluster numbers, it is then possible to create methods within the parser that can retrieve entire files simply by passing in XML elements representing the files themselves.
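The recursion and the XML output can be sketched together. This Python example is an assumption-laden illustration, not VeRa's code: the element and attribute names ("folder", "file", "cluster") are invented, and read_folder is a hypothetical callable that returns the already-parsed entries of the folder starting at a given cluster as (name, is_folder, first_cluster, size) tuples.

```python
import xml.etree.ElementTree as ET


def build_folder(read_folder, cluster, parent, seen=None):
    """Add the contents of the folder at `cluster` to the XML element `parent`."""
    seen = set() if seen is None else seen
    if cluster in seen:          # damaged filesystem: folder already visited
        return
    seen.add(cluster)
    for name, is_folder, first_cluster, size in read_folder(cluster):
        if name in (".", ".."):  # self and parent links would recurse forever
            continue
        if is_folder:
            element = ET.SubElement(
                parent, "folder", name=name, cluster=str(first_cluster)
            )
            # Same procedure again, this time reading the subfolder's data.
            build_folder(read_folder, first_cluster, element, seen)
        else:
            # Storing the starting cluster lets the file be retrieved later
            # from nothing more than this XML element.
            ET.SubElement(
                parent, "file",
                name=name, cluster=str(first_cluster), size=str(size),
            )
```

A caller would create a root element, call build_folder with the root folder's cluster, and then serialise the tree with ET.tostring.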

These methods form part of the overall VeRa object model, a model that covers file systems, files themselves and visualisation tools. This model will be examined separately, as it is key to the way that VeRa handles its plugins.

Partition Tables

Parsing a partition record is surprisingly easy. The first step is to read the master boot record (MBR) for an image, and specifically the bytes from 0x01BE onwards: this is the partition table itself. It contains up to four 16-byte records that indicate the starting points of either the individual partitions, or further ('extended') partition tables.

A typical partition table could be:
  • Address of partition 1
  • Address of partition 2
  • Address of partition 3
  • Address of next partition table
Although this demonstrates why the fourth partition onward is always described as being in an extended partition, it also shows why it is not possible to ascertain at this point whether the pointer is to a partition, or an extended partition table. This can only be worked out when reading the partition record itself; its structure, with the first byte being numbered zero, is as follows:
  • Bytes 8 to 11 are the starting sector of the partition (or next partition table);
  • Bytes 12 to 15 are the size of the partition, in sectors; and,
  • Where byte 4 equals 0x05 or 0x0F, this indicates that this is an extended partition.
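A short Python sketch of this record layout follows. It is illustrative only; the PartitionRecord type and function name are invented for the example. Both multi-byte fields are stored little-endian.

```python
import struct
from typing import List, NamedTuple


class PartitionRecord(NamedTuple):
    type_code: int      # byte 4 of the record
    start_sector: int   # bytes 8 to 11
    sector_count: int   # bytes 12 to 15

    @property
    def is_extended(self) -> bool:
        return self.type_code in (0x05, 0x0F)


def parse_partition_table(sector: bytes) -> List[PartitionRecord]:
    """Read the four 16-byte records that start at offset 0x01BE."""
    records = []
    for index in range(4):
        offset = 0x01BE + 16 * index
        type_code = sector[offset + 4]
        start, count = struct.unpack_from("<II", sector, offset + 8)
        records.append(PartitionRecord(type_code, start, count))
    return records
```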
In the case of an extended partition, the partition table will then generally be laid out as:
  • Address of partition 4
  • Address of next partition table
  • Blank record (pointing to a partition of size zero)
  • Blank record (pointing to a partition of size zero)
It is through this list, going from one partition table to the next, that we can gather information regarding every single partition on a drive image. Note that although convention dictates that the final record of the table (fourth in the MBR, second in extended partitions) is the pointer, there is nothing that specifies that this has to be the case; therefore, VeRa is written to handle any combination of partition record and extended partition.
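The walk from table to table might look like the Python sketch below. This is not VeRa's code, and it relies on two details not spelled out above that are standard for MBR-style extended partitions: inside an extended partition table, a partition's starting sector is relative to that table's own sector, while the pointer to the next table is relative to the start of the outermost extended partition. A 512-byte sector is assumed, and every record is examined regardless of its position in the table.

```python
import struct

SECTOR = 512                     # assumed sector size
EXTENDED_TYPES = (0x05, 0x0F)


def read_records(image: bytes, table_sector: int) -> list:
    """Return (type, start, size) for the four records in one partition table."""
    base = table_sector * SECTOR + 0x01BE
    if base + 64 > len(image):   # table lies outside the image
        return []
    records = []
    for index in range(4):
        offset = base + 16 * index
        start, size = struct.unpack_from("<II", image, offset + 8)
        records.append((image[offset + 4], start, size))
    return records


def list_partitions(image: bytes) -> list:
    """Collect (type, absolute start sector, size) for every partition."""
    found = []
    pending = [(0, 0)]           # (table sector, start of outermost extended)
    seen = set()
    while pending:
        table_sector, extended_base = pending.pop(0)
        if table_sector in seen:  # guard against tables that point in a loop
            continue
        seen.add(table_sector)
        for type_code, start, size in read_records(image, table_sector):
            if size == 0:         # blank record
                continue
            if type_code in EXTENDED_TYPES:
                if table_sector == 0:
                    pending.append((start, start))
                else:
                    pending.append((extended_base + start, extended_base))
            else:
                found.append((type_code, table_sector + start, size))
    return found
```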

Once the partition records have all been collected, it is then possible to read individual partitions and work out which, if any, filesystem is installed. This, again, is a simple process of deduction:
  • Read bytes 0x26 and 0x42 from the partition.
  • If byte 0x42 is 0x28 or 0x29, then the filesystem type is in the eight bytes starting at address 0x52.
  • If byte 0x26 is 0x28 or 0x29, then the filesystem type is in the eight bytes starting at address 0x36.
  • If byte 0x26 is 0x80, then the filesystem type is NTFS.
(Note that this is a very simplified version that doesn't go into details of why these specific values are checked. Jonathan de Boyne Pollard covers this in far more detail, including the reasons why it works.)
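The checks above translate almost directly into code. The following Python sketch applies them in the order listed; the function name is invented, and it returns None when none of the tests match rather than guessing.

```python
from typing import Optional


def detect_filesystem(boot_sector: bytes) -> Optional[str]:
    """Apply the simplified signature checks to a partition's first sector."""
    if len(boot_sector) < 0x5A:
        return None
    if boot_sector[0x42] in (0x28, 0x29):
        # Filesystem type is the eight bytes starting at 0x52.
        return boot_sector[0x52:0x5A].decode("ascii", "replace").rstrip()
    if boot_sector[0x26] in (0x28, 0x29):
        # Filesystem type is the eight bytes starting at 0x36.
        return boot_sector[0x36:0x3E].decode("ascii", "replace").rstrip()
    if boot_sector[0x26] == 0x80:
        return "NTFS"
    return None
```

The string returned in the first two cases is whatever the formatter wrote into the boot sector, so it is best treated as a hint for choosing a plugin rather than as proof of the filesystem type.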

Friday 18 June 2010

VeRa - Software Complete (almost)

The 17th of June marked a minor milestone, as that is the date on which full-time development on VeRa ceased. A few small features still need to be tweaked, and the software given a makeover, but in terms of functionality it is now complete.

VeRa works in a wizard-style interface, where the user is taken through the following steps:
  • Selection of the source of the data.
    • This can be one of a captured raw drive image, a folder (and, optionally, subfolders), or a previously exported VeRa file.
  • In the case of a raw drive image, the user is given, for each filesystem found, the option of which plugin to use for processing.
    • Although in its initial development only a single FAT processor has been developed, this could easily include processors for NTFS, ext2, ext3, etc, and other options such as the ability to include deleted files (which the current processor filters out).
  • For drive images and folders, the user can then select which plugins the data should be passed through for further processing.
    • In the sample development, a single JPEG processor is included. This reads details of every JPEG it finds, retrieves (via a callback to the file system processor) its header information, and processes its EXIF header data to retrieve information including camera make and model, and GPS coordinates of where the photograph was taken.
    • This facility isn't available for previously exported VeRa files, as there is no way to guarantee availability of the original source file(s).
  • The user can then select a visualisation tool to view the data in. This is implemented as a visual .NET control rather than a standard class, as the control can include the facility to narrow down the data into more manageable chunks.
  • Once the user is happy with the data, the file can then be exported. The export format is a best-of-breed combination of the file formats already investigated, cherry-picking the best features from each.
For the visualisation, one very important resource has been the Ordnance Survey's OpenData initiative. This initiative, which didn't exist when development started but became available at a crucial time, means the sample visualisation tool can include detailed Ordnance Survey maps within its interface.

The project write-up has now begun, and over time I hope to update this blog with further information and sample code (as I document it) on the parsing of partition tables, FAT filesystems and EXIF data.