Monday 21 June 2010

FAT32

As a sample filesystem plug-in, a processor for FAT32 was chosen. There were a number of reasons behind this decision:
  • Well-documented, long established format;
  • A very simple structure that allowed more time to be dedicated to the general VeRa ecosystem; and,
  • Volume of potential sample data, due to it being the normal format for devices such as digital cameras and USB memory sticks.
A FAT drive can be split into three main parts:
  1. Header, containing information relating to the layout of the drive and how individual elements are sized.
  2. The File Allocation Tables themselves (two is a common number, to allow recovery in event of drive damage).
  3. The data itself.
 The first area of interest is the boot sector. It should be noted that the structure of this varies between differing versions of FAT, but the data available includes:
  • Bytes per sector, and sectors per cluster.
  • The number of File Allocation Tables.
  • The start addresses of the File Allocation Tables and the data that each relates to.
It is easiest to describe the FAT itself alongside the data, as one leads directly to the other. This is easier with a diagram, but in essence it is a 'map' of the main data area where an individual byte in the FAT maps to an equivalent cluster in the main data area. Thus, the first FAT item relates to the first data cluster, etc.

The actual values within the FAT are of more interest. If the values are between 0x00000001 and 0x0FFFFFEF (for FAT32 - the maximums for FAT12 and FAT16 are significantly lower due to the value only being stored across two bytes), then it is a pointer to the next item in the FAT; otherwise, it indicates that the data has finished. For example, if the FAT is arranged as follows:
Byte 3: 0x00000004
Byte 4: 0x00000006
Byte 5: 0x0000000A
Byte 6: 0x0FFFFFF0
then we can say that the data goes across clusters 3, 4 before ending in 6.

The first cluster will always point to a data file that represents the root folder of the drive itself. This can be split into individual structures each of length 0x20 (32) bytes, that can represent filenames or folders (or, parts of longer filenames and folder names that don't fit into the standard 8.3 DOS format). These then link back to the FAT, to indicate the locations of the files and folders that the root contains.

The result of this is that the entire filesystem can be processed in a small amount of code:
  1. Call the procedure to build the folders, with an inital offset of zero. This will allow it to read the root folder.
  2. Parse the folder  in blocks of length 0x20 bytes.
  3. Where the block indicates a folder, then recursively call the current procedure, but this time with an offset that allows this folder's structure to be read instead.
Our output is an XML structure that represents the current filesystem. By storing pointers in this structure that relate to the original cluster numbers, it is then possible to create methods within the parser that can retrieve entire files simply by passing in XML elements representing the files themselves.

These methods form part of the overall VeRa object model, a model that covers file systems, files themselves and visualisation tools. This model will be examined separately, as it is key to the way that VeRa handles its plugins.

No comments:

Post a Comment