Friday 30 October 2009

Detecting the Filesystem

Before any analysis of files can be performed, the drive image (or images) must be analysed so the system knows how the drive was formatted; for example, a Windows PC may use FAT32 or NTFS format, an Apple PC (yes, they are PCs) may be HFS or HFS+, whilst a Linux PC might use ext2, ext3, reiserfs or any other format that happened to interest that particular user when the computer was being set up.

Detection of the filesystem type is - in theory - relatively simple, as each type differs from the other in a number of different ways; combined, these ways make up that filesystem's 'signature'. The issue then becomes finding enough of these signatures to ensure that the filesystem is accurately guessed; due to the high level of documentation of the formats that exists on the Internet this information is simple to find.

Sleuthkit Informer describes a number of ways of narrowing down the information based on the master boot record and certain 'magic values':

"TestDisk runs some basic checks on the boot sector/superblock of each filesystem. As EXT2, EXT3, REISERFS, and JFS share the same partition type, number 0x83, TestDisk has to do additional checks for some filesystems ... Examples of sanity checks include checking for magic values or signatures.  For example, FAT and NTFS have 0xAA55 at 0x1FE of the boot sector."

whilst de Boyne Pollard, in his unfortunately incomplete 'How to determine the filesystem type of a volume' describes an algorithm for breaking down various parts of a drive image to narrow down the filesystem as accurately as possible. Note that, although he doesn't expand it, BPB refers to the boot sector's BIOS parameter block.

Ideally, the method I would prefer to use is that used by the GNU 'mount' command; unfortunately, not being a Linux kernel export I am unable to locate the source code and see how that reads these signatures for myself.

At this stage, I have one or more images, and I now how the individual drives within those images are formatted. The next step is then being able to retrieve lists of files from the drive; as the storage of this information varies significantly between file systems and is therefore potentially the most time-consuming element of the analysis step, this function is one I will only implement for one or two filesystem types initially as a 'proof-of-concept'. Again, sites such as NTFS.com will prove invaluable here.

No comments:

Post a Comment