Friday 30 October 2009

Detecting the Filesystem

Before any analysis of files can be performed, the drive image (or images) must be analysed so the system knows how the drive was formatted; for example, a Windows PC may use FAT32 or NTFS format, an Apple PC (yes, they are PCs) may be HFS or HFS+, whilst a Linux PC might use ext2, ext3, reiserfs or any other format that happened to interest that particular user when the computer was being set up.

Detection of the filesystem type is - in theory - relatively simple, as each type differs from the others in a number of ways; combined, these differences make up that filesystem's 'signature'. The issue then becomes finding enough of these signatures to ensure that the filesystem is accurately guessed; because the formats are so thoroughly documented on the Internet, this information is simple to find.

Sleuthkit Informer describes a number of ways of narrowing down the information based on the master boot record and certain 'magic values':

"TestDisk runs some basic checks on the boot sector/superblock of each filesystem. As EXT2, EXT3, REISERFS, and JFS share the same partition type, number 0x83, TestDisk has to do additional checks for some filesystems ... Examples of sanity checks include checking for magic values or signatures.  For example, FAT and NTFS have 0xAA55 at 0x1FE of the boot sector."

whilst de Boyne Pollard, in his unfortunately incomplete 'How to determine the filesystem type of a volume', describes an algorithm for breaking down various parts of a drive image to narrow down the filesystem as accurately as possible. Note that, although he doesn't expand it, BPB refers to the boot sector's BIOS parameter block.
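As a rough illustration of the 'magic value' checks described above, here is a minimal Python sketch. It is my own simplification, not TestDisk's or de Boyne Pollard's algorithm. The function name detect_filesystem is hypothetical, the offsets are the commonly documented ones, and the FAT type strings are informational fields that a real tool should not rely on alone.

```python
import struct


def detect_filesystem(image: bytes, offset: int = 0) -> str:
    """Guess the filesystem of the volume starting at `offset` in `image`.

    Hypothetical helper: it checks a handful of well-known signatures
    and returns "unknown" if none of them match.
    """

    def at(pos: int, length: int) -> bytes:
        return image[offset + pos:offset + pos + length]

    # ext2/ext3: the superblock starts 1024 bytes into the volume and
    # holds the little-endian magic 0xEF53 at byte 56.
    if at(1024 + 56, 2) == b"\x53\xef":
        compat = at(1024 + 92, 4)  # s_feature_compat flags
        # Bit 0x4 is the 'has journal' flag. Later ext versions set it
        # too, so "ext3" here really means "ext3 or later".
        if len(compat) == 4 and struct.unpack("<I", compat)[0] & 0x4:
            return "ext3"
        return "ext2"

    # HFS+: the volume header starts 1024 bytes in with the text "H+".
    if at(1024, 2) == b"H+":
        return "HFS+"

    # FAT and NTFS: the value 0xAA55 at 0x1FE is stored little-endian,
    # i.e. as the bytes 0x55 0xAA.
    if at(0x1FE, 2) == b"\x55\xaa":
        if at(3, 8) == b"NTFS    ":      # OEM ID field
            return "NTFS"
        if at(82, 8) == b"FAT32   ":     # FAT32 type string
            return "FAT32"
        if at(54, 5) in (b"FAT12", b"FAT16"):
            return at(54, 5).decode("ascii")

    return "unknown"
```

A real implementation would need more sanity checks than this (for example, validating the BPB fields rather than trusting a text label), but it shows how a few fixed offsets can narrow the candidates down quickly.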

Ideally, I would use the same method as the Linux 'mount' command; unfortunately, not being a Linux kernel expert, I have been unable to locate the source code to see for myself how it reads these signatures.

At this stage, I have one or more images, and I know how the individual drives within those images are formatted. The next step is then being able to retrieve lists of files from the drive; as the storage of this information varies significantly between file systems and is therefore potentially the most time-consuming element of the analysis step, this function is one I will only implement for one or two filesystem types initially as a 'proof-of-concept'. Again, sites such as NTFS.com will prove invaluable here.

Thursday 29 October 2009

Drive Letters

A useful write-up on drive letter assignments can be found, unsurprisingly, in Wikipedia. Thus, if the user of the forensic application is able to let the application know the order of physical disks that the images relate to, it should be possible to logically work out the order in which drive letters are initially assigned.

Windows does, however, give users the option of altering drive letters; NTFS 3 also allows drives to be given 'mount points' within the filesystem. And of course the various Unix variants don't use drive letters at all. Therefore, for Windows drives we also need access to the registry in order to work out these additional issues; for Linux, the main configuration files (probably /etc/fstab).
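For the Linux side, the mapping from devices to mount points can be read from the text of /etc/fstab once that file has been recovered from the image. The following is a minimal Python sketch under that assumption; the function name parse_fstab and the dictionary keys are my own choices, and it only handles the standard whitespace-separated layout (device, mount point, type, options).

```python
from typing import Dict, List


def parse_fstab(text: str) -> List[Dict[str, object]]:
    """Parse the contents of an fstab file into a list of entries.

    Hypothetical helper: blank lines, comments and malformed lines
    (fewer than four fields) are skipped.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        entries.append({
            "device": fields[0],        # e.g. /dev/sda1 or UUID=...
            "mount_point": fields[1],   # e.g. / or /home
            "fs_type": fields[2],       # e.g. ext3
            "options": fields[3].split(","),
        })
    return entries
```

The fs_type column is also a useful cross-check against whatever the signature detection reports for the same partition.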

File Systems

In order to be able to visualise a timeline of a PC's usage, it must first be possible to get the information from the PC. The most common method of doing this is by capturing an 'image' of the hard disks (and other rewriteable media) within the physical hardware; these images then exist as files on the analyser's own systems, with each byte in the file corresponding to the byte at the same position on the drive itself.

Note that as little work as possible is done on the original hardware; the Association of Chief Police Officers has released guidelines on how evidence should be gathered, and one fundamental principle (unique to digital evidence) is that the data should, wherever possible, always be analysed without altering the original. (PDF here)

There are already any number of free tools that can create these images, from the Unix dd command to AccessData's FTK Imager; therefore, replicating this functionality is pointless (not to mention time-consuming in the development process). We can therefore assume that at the time that the visualisation tool starts the image has already been captured.

The first step of the analysis process is then to work out how the drive was structured. Some manual input may be required here (which is allowable, as an actual person with a real brain would have had to create the images in the first place) in the event that an analysis covers multiple images (i.e. multiple hard drives), but in general what would need to be worked out at this stage is:
  • Partition information, i.e. how the physical drive was divided into individual drive letters; and,
  • The file system of each partition.
Partition information can vary between different operating systems (for example, I believe FAT originally allowed one primary partition, with a secondary partition then holding multiple individual partitions), and in some cases it may be essential to work out the drive letter that the original PC assigned to each partition.
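To make the partition step more concrete, here is a minimal Python sketch that reads the four primary entries from a classic MBR partition table in the first sector of an image. It is an illustration under stated assumptions: 512-byte sectors, an MBR rather than any other partitioning scheme, and no attempt to follow the logical drives inside an extended partition. The names Partition and read_mbr_partitions are hypothetical.

```python
import struct
from typing import List, NamedTuple

EXTENDED_TYPES = {0x05, 0x0F}  # type codes marking an extended partition


class Partition(NamedTuple):
    index: int          # slot in the table, 0 to 3
    bootable: bool      # status byte is 0x80
    type_code: int      # e.g. 0x07 (NTFS), 0x0B/0x0C (FAT32), 0x83 (Linux)
    start_lba: int      # first sector of the partition
    sector_count: int   # length of the partition in sectors


def read_mbr_partitions(sector0: bytes) -> List[Partition]:
    """Return the non-empty primary partition entries from an MBR."""
    if len(sector0) < 512 or sector0[0x1FE:0x200] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for i in range(4):
        # The table starts at 0x1BE; each entry is 16 bytes long.
        entry = sector0[0x1BE + 16 * i:0x1BE + 16 * (i + 1)]
        # Layout: status, 3 CHS bytes (skipped), type, 3 CHS bytes
        # (skipped), start LBA, sector count (both little-endian).
        status, type_code, start_lba, count = struct.unpack("<B3xB3xII", entry)
        if type_code == 0:
            continue  # unused slot
        partitions.append(Partition(i, status == 0x80, type_code, start_lba, count))
    return partitions
```

Multiplying start_lba by the sector size gives the byte offset of each partition within the image, which is where the filesystem detection would then begin looking.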

The next step is to then work out the file systems held in each partition; these could be FAT16, FAT32, NTFS, ext3, or any combination of the above. How to work this out is the subject of a later post, however.

Wednesday 28 October 2009

Forensic timelines

Although retrieving data is the core aim of a forensic investigation, and existing tools have more power than the average investigator is likely to use, they are still lacking in some areas. As a simple example, if we wished to try and trace the actions on a PC during a certain time period, the majority of the tools that currently exist are simply not geared up to give this level of detail.

The main issue is that the concept of a 'date' exists in multiple places within a single PC:
  • File creation / modification / last access
  • Visit to a website
  • The last time a particular registry key was accessed
  • When a particular USB stick was last used on the PC
  • When a photograph was taken
In the case of the last of these, a fundamental point is that at that time the photograph was not even on the PC. To view this information requires more than access to the file system itself; it needs an application that can understand filetypes and 'look inside' the files.

In writing a system capable of looking inside the files, and in doing so mapping out the dates associated with any particular object, it should then be possible to create a 'forensic timeline' of the usage of that computer. This timeline will never be complete and, at times, may be inaccurate, but as long as these limitations are known and handled it will still be a useful tool in the investigator's arsenal.
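The core idea can be sketched in a few lines of Python: every source of dates (file system, browser history, registry, photograph metadata) is reduced to a common event record, and the records are then merged and sorted. This is only a sketch of the concept, not a design for the eventual tool; the names TimelineEvent and build_timeline, and the sample events, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class TimelineEvent:
    when: datetime   # use timezone-aware values so events compare safely
    source: str      # where the date came from, e.g. "filesystem"
    kind: str        # what happened, e.g. "modified" or "photo taken"
    subject: str     # the object the date relates to


def build_timeline(*sources: Iterable[TimelineEvent]) -> List[TimelineEvent]:
    """Merge events from any number of sources into one ordered list."""
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda event: event.when)


# Hypothetical sample data, purely for illustration.
file_events = [
    TimelineEvent(datetime(2009, 10, 28, 9, 30, tzinfo=timezone.utc),
                  "filesystem", "created", "holiday.jpg"),
]
photo_events = [
    TimelineEvent(datetime(2009, 8, 14, 16, 5, tzinfo=timezone.utc),
                  "photo metadata", "photo taken", "holiday.jpg"),
]

for event in build_timeline(file_events, photo_events):
    print(event.when.isoformat(), event.source, event.kind, event.subject)
```

Keeping the source alongside each date matters because the same object can legitimately carry several dates of differing reliability, as with a photograph taken months before the file was created on the PC.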

Others have also realised this; Olsson and Boldt have documented the development process behind CyberForensics TimeLab in Digital Investigation. However, their software is still very much a prototype with a basic user interface and a lack of output options; these elements alone are ripe for improvement.

A Short(ish) Introduction

Welcome to my MSc Project Blog!

I’m currently a mature (by age if not attitude) MSc student in my final year at the University of Glamorgan, studying Computer Forensics (or, as it will say on my certificate, the slightly more wordy "Information Security and Computer Crime"). The final year is concerned with my thesis, which in my case will be a major project centered around the subject of digital forensics with relevance to modules I’ve studied over previous years.

The project I will be working on is still going through the approval process, but will almost certainly be a software product covering one or both of the subject areas of Forensic Timelining and Digital Evidence Bags. Both are relatively immature technologies compared with the subject of digital forensics as a whole (and that’s saying something!), and therefore ripe for investigation and further development.

The aim of this blog is therefore two-fold:
  • To document my progress through the year in the completion of my project, and thus to act as the project diary which then becomes part of the eventual thesis; and,
  • To publicise the work I’m doing, and attract feedback and/or suggestions of where the project could go after completion.
All comments are of course welcome, although be aware that I may use anything posted within the eventual thesis if it is appropriate; by commenting you are therefore granting me a non-exclusive unlimited licence to use (either in original, edited or redacted form) any content you post within this blog. You will, however, remain the copyright owner of your comments.