Thursday, 10 December 2009

Creating blank partitioned drives (2)

In my previous post, I described a method of creating test filesystems for VeRa such that:
  • The drive contained multiple partitions;
  • Each partition could be a different filesystem; and,
  • The drives existed in a raw image format (such as would be extracted by dd or equivalent software).
Last week, VeriSign very kindly sent me a 2GB USB drive; although it is significantly smaller than the drives that VeRa is targeted at, 2GB is still sufficient for multiple partitions, filesystems and files within these. Unfortunately, Windows XP and Vista see USB drives as removable rather than fixed storage and therefore do not allow more than a single partition.

Two solutions exist to this; one involves a minor alteration to the data on the USB drive, whilst the other alters the way in which Windows treats the drive itself. Note that these require using third party software and may produce unreliable results that, at worst, may lead to serious data loss.

Removable Bit

An online article describes the tool 'Lexar Bootit', which allows the 'removable' bit in the USB drive's descriptor to be set to false; in theory, this then makes the USB drive appear as fixed storage to any PC it is plugged into. In practice, it only appears to work with a selection of drives, and specifically not my otherwise-unbranded VeriSign one.

Replacement Driver

When a USB drive is first used in Windows, a standard driver is used that reports back to the operating system whether the drive is removable or not. Some years ago, IBM released their Microdrive, a tiny hard disk with the form factor of a removable CF media card. Because sizes for these went above the 4GB limit of the FAT filesystem that portable devices often used, the Windows driver for these Microdrives (now sold under the Hitachi branding) allowed them to be presented to the operating system as fixed drives suitable for containing multiple partitions.

This page, as well as mentioning the Lexar application above, also shows a method in which this Hitachi driver can be altered to allow it to be applied to any USB memory device; as a result, Windows can be persuaded to treat any USB memory device as a fixed (partitionable) drive.

Through this second method, I have been able to create test images with multiple filesystems without the additional requirement of a virtual machine. Note that other PCs will not be able to view anything other than the first partition; this could be used to hide data, but if that is not the intention I would recommend keeping a copy of the altered driver on the first visible partition.

Friday, 27 November 2009

Creating blank partitioned drives

For a system such as VeRa, testing its filesystem and partition table detection routines on a range of partition combinations is a necessity; however, creating these can be time-consuming. A simple solution to this involves a Linux live ISO and a copy of VirtualBox, an open-source virtualisation tool from Sun Microsystems.

The process I am following is as follows:
  1. Create a new Virtual Machine, with a drive sized as required.
  2. Boot this Virtual Machine from the Linux live ISO.
  3. Using a partition manager (such as fdisk), create partitions on the virtual hard disk as required.
  4. Format these partitions to match the requirements of the particular test.
  5. Exit VirtualBox.
  6. Using 'VBoxManage.exe' from the VirtualBox install folder, convert the VirtualBox image to a raw image format: VBoxManage clonehd --format RAW (image).vdi (image).img
(with thanks to
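
For very simple test cases the partition table itself can also be written by hand. The following is a minimal sketch, assuming a classic MBR layout with 512-byte sectors; the function names are my own for illustration, and it only writes the partition table, so each partition would still need formatting with a separate tool.

```python
import struct

SECTOR = 512  # assumed sector size


def mbr_entry(type_code, start_lba, sector_count, bootable=False):
    """Pack one 16-byte MBR partition entry.

    The two 3-byte CHS fields are filled with 0xFE 0xFF 0xFF, the usual
    filler when only the LBA fields are meaningful.
    """
    chs_filler = b"\xfe\xff\xff"
    return struct.pack(
        "<B3sB3sII",
        0x80 if bootable else 0x00,  # status byte
        chs_filler,                  # CHS address of first sector
        type_code,                   # e.g. 0x0B FAT32, 0x07 NTFS, 0x83 Linux
        chs_filler,                  # CHS address of last sector
        start_lba,                   # first sector of the partition
        sector_count,                # length of the partition in sectors
    )


def build_mbr(partitions):
    """Build a 512-byte MBR from up to four (type, start_lba, sectors) tuples."""
    if len(partitions) > 4:
        raise ValueError("an MBR holds at most four primary partitions")
    table = b"".join(mbr_entry(*p) for p in partitions).ljust(64, b"\x00")
    # 446 bytes of (empty) boot code, the 64-byte table, then the signature.
    return b"\x00" * 446 + table + b"\x55\xaa"


def write_blank_image(path, total_sectors, partitions):
    """Write the MBR, then extend the file with zeros to the full drive size."""
    with open(path, "wb") as image:
        image.write(build_mbr(partitions))
        image.truncate(total_sectors * SECTOR)
```

For example, write_blank_image("test.img", 43008, [(0x0B, 2048, 20480), (0x83, 22528, 20480)]) would produce a 21MB image containing an unformatted FAT32-typed partition followed by an unformatted Linux-typed one.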

Tuesday, 17 November 2009

XML in Forensics: DEX

In their 2009 paper 'DEX: Digital evidence provenance supporting reproducibility and comparison', Levine and Liberatore refer to Alink, Garfinkel and Turner's past works (as also documented here, here and here) in attempting to develop a file format suitable for presenting as evidence in forensic cases.

They initially criticise the fact that the tools that provide the greatest amount of support (and by inference are commercial and closed-source) are the same tools that require the end user to trust the tools blindly, with no easy way of verifying that the output is valid. Therefore, they propose (as both Garfinkel and Turner before them already did) a format that is independent of the forensic tool itself: DEX, or Digital Evidence Exchange.
"A DEX description of evidence is sufficient for a third party using an independently developed tool to quickly extract the same evidence and verify that the reproduction is correct according to a known specification."
(p. S49)

They also note the importance of the work that preceded their own:
"The description by Garfinkel et al. of AFF ... provides a good survey of many other image formats, including EnCase, FTK, ILook, and more."
(p. S50)

An advantage of the file format is that the data stored within it need not come from a single source (unlike a situation where the data is an image file and additional metadata, as is the case with many other tools). A key result of this is that it is therefore possible to append further data to an existing DEX file.

Unlike the DEB format, DEX does not require a log of alterations to the file and/or examinations of the evidence; it also keeps no information on the individual who processed the evidence. Although this might be seen by some as a disadvantage, the DEX file itself contains enough information to allow its content to be verified against the original evidence anyway. Any logging can therefore be done via access to the source data - the physical drives, for example - which can be handled through normal law enforcement procedures.

The format is also explicitly designed to allow files to be processed through multiple disparate tools to provide analysis that one tool alone may not be capable of. It is this that makes it ideal as a native format for VeRa's data:
  • For input, allow the user to specify a DEX file alongside images and/or filesystems.
  • The initial analysis tools can parse the data in the images and add them to the existing DEX data (or create a new DEX file).
  • The visualisation tool can then display the data from the DEX file, allow it to be narrowed down to the relevant data (e.g. only files handled between specific dates), and then output to a new DEX file.
It also means that, at least in theory, any tools that can manipulate or visualise DEX data can be integrated directly into VeRa, as long as they can be adapted to run within the .NET runtime.

See: Levine, B; Liberatore, M (2009). DEX: Digital evidence provenance supporting reproducibility and comparison. Digital Investigation, Volume 6 (2009), p.S48-S56

Thursday, 12 November 2009

XML in Forensics: fiwalk and AFF

In 2006, Simson Garfinkel attempted to solve the issue of proprietary disk image formats (as used by most forensic analysis tools), and at the same time the problem of storing massive retrieved datasets; his solution was AFF, the Advanced Forensic Format.

One tool that he has developed, to work alongside AFF and its base SleuthKit, is fiwalk, which covers the following features:
  • Finds all partitions & automatically processes each.
  • Handles file systems on raw device (partition-less).
  • Creates a single output file with forensic data from all.

Unfortunately, although designed to extract forensic data into an XML format (similar to that of other tools, and indeed VeRa), there are issues with its implementation:
  • The format itself has no single point of documentation, making it difficult for other investigators to extend its functionality. The PDF above has a brief overview, whilst Forensics Wiki has a little more (although with comments such as 'not sure what this means', there is an indication that their documentation may have been by way of reverse-engineering).
  • No mention of the XML format is made in the fiwalk download itself either.
  • The system is designed on top of Sleuthkit, a set of Linux-based tools.
Although the final point is a minor one, AllBusiness recently noted that:
"The computer forensic software market has long been a duopolistic market with the two significant players being Guidance Software (GUID) and AccessData."

Displacing either market leader would be a major task; however, producing a product that complements their differing functionalities would be viable. As both EnCase and FTK are Windows-based, it seems logical that Windows should be the target platform for new software in order to maximise acceptance.

Wednesday, 11 November 2009

XML in Forensics: XIRAF

In 2005, Turner published his paper 'Unification of digital evidence from disparate sources (Digital Evidence Bags)'. He described an XML data format that could be used in a similar form to normal evidence bags, and concludes by stating:
"The digital forensic community is in need of a new approach to the way in which the information from digital devices is gathered and processed."
Turner, 2005

XML is also the subject of a 2006 paper by Alink et al, "XIRAF - XML-based indexing and querying for digital forensics". Based upon Alink's own MSc thesis, an architecture is described that links between forensic analysis elements by a shared XML format. One example given is that of a forensically captured image:
[case id="test-case"]
    [image id="1" name="A" start="0" end="15000000"]
        [volume type="FAT32" start="0" end="10000000"/]
        [volume type="NTFS" start="10000000" end="15000000"/]
    [/image]
    [image id="2" name="B" start="15000000" end="35000000"/]
    [image id="3" name="C" start="35000000" end="40000000"]
        [volume type="EXT2" start="35000000" end="40000000"/]
    [/image]
[/case]
(the blog software appears to have problems with angle brackets, hence the odd XML above.)
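
To show how little code is needed to consume a structure like this, here is a minimal sketch using Python's standard ElementTree parser. The XML is the example above rewritten with conventional angle brackets and closing tags; the function name is my own, and the sizes are simply end minus start as given in the attributes.

```python
import xml.etree.ElementTree as ET

# The example case, written as well-formed XML.
CASE_XML = """
<case id="test-case">
    <image id="1" name="A" start="0" end="15000000">
        <volume type="FAT32" start="0" end="10000000"/>
        <volume type="NTFS" start="10000000" end="15000000"/>
    </image>
    <image id="2" name="B" start="15000000" end="35000000"/>
    <image id="3" name="C" start="35000000" end="40000000">
        <volume type="EXT2" start="35000000" end="40000000"/>
    </image>
</case>
"""


def list_volumes(xml_text):
    """Return (image name, volume type, size) for every volume in the case."""
    root = ET.fromstring(xml_text)
    rows = []
    for image in root.findall("image"):
        for volume in image.findall("volume"):
            start = int(volume.get("start"))
            end = int(volume.get("end"))
            rows.append((image.get("name"), volume.get("type"), end - start))
    return rows


for name, fs_type, size in list_volumes(CASE_XML):
    print(f"image {name}: {fs_type} volume, {size} units")
```

Image B contributes no rows because it contains no volume elements.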

Although the formats may not be of direct use, Alink et al go on to describe how individual tools can register themselves as being for specific sets of files through XPath queries:
Description: Lists recently deleted files by looking at the recycle bin log files (usually named "INFO2")
Input selection: Selects all files named INFO2
Input query:  //file[@name[ends-with(.,"/INFO2")]]
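
The same input selection can be sketched in a few lines of Python. Note that ends-with() is an XPath 2.0 function and is not supported by the limited XPath subset in the standard ElementTree module, so the suffix test below is done in Python instead. The XML layout, the TOOLS table and the function names are hypothetical and only illustrate the idea of tools registering for a set of files.

```python
import xml.etree.ElementTree as ET

# Hypothetical file listing; the real XIRAF layout may differ.
FILES_XML = """
<volume>
    <file name="/RECYCLER/S-1-5-21/INFO2"/>
    <file name="/WINDOWS/system32/notepad.exe"/>
    <file name="/RECYCLED/INFO2"/>
</volume>
"""

# Each tool registers the path suffix it is interested in (an assumption;
# XIRAF itself registers a full XPath query).
TOOLS = {"recycle-bin-log": "/INFO2"}


def select_files(root, suffix):
    """Return the names of all file elements whose name ends with suffix."""
    return [
        f.get("name")
        for f in root.iter("file")
        if f.get("name", "").endswith(suffix)
    ]


def inputs_for_tools(root):
    """Map each registered tool to the files it should be given."""
    return {tool: select_files(root, suffix) for tool, suffix in TOOLS.items()}
```

A library with fuller XPath support could run the original query directly; the point here is only that the selection logic stays declarative and separate from the tool that consumes the files.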

They have also used the query language to generate timeline data incorporating file times, EXIF data and other sources of date-related data; the authors go on to state:
"[The investigator] could see, for example, that movie files are created in the file system at approximately the same time that suspects are discussing a transfer of those files using a chat program."
and note that the timeline software could allow the user to drill down from the displayed data back to the source of that data. However, like Turner's DEB, XIRAF's format appears to have gained little support in the past 4 years, with no obvious tools having appeared.

It is possible that a newer format, DEX, may take off where both DEB and XIRAF appear to have failed; it is this that will be investigated next as part of the VeRa project.

Tuesday, 10 November 2009


The project plan has been submitted, with the software that will come out of it being given the name 'Virtualisation Environment for Resource Analysis', or VeRa; as an individual's name this can also have the meaning 'truth'.

The research areas of the project are:
  • XML data formats for import and export, covering DEB, XIRAF and DEX;
  • File system analysis:
    • Being able to analyse an image file and determine the overall partition structure as well as the filesystem types of the individual partitions;
    • Ability to extract at will the file structure and even individual files from an image;
  • Visualisation of data
The eventual application, which will form a core part of the eventual thesis, will be capable of:
  • Utilising plug-ins for the majority of its analysis functionality;
  • Reading directory structures in from a variety of resources:
    • Disk images (via ‘Data Capture’ implementations);
    • DEX-format evidence bags; or,
    • Locally attached storage.
  • Passing data through ‘Data Analysis’ tools to the registered ‘Data Visualisation’ tool.
  • Outputting data to DEX-format evidence bags.
The analysis and visualisation sections will be designed specifically to encourage others to develop plug-ins; it is possible that those could act as 'mini-projects' for individuals on other (BSc or MSc) courses. Sample plug-ins will be developed in order to show the application's functionality, but will be fully replaceable.
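
The plug-in flow described above can be sketched roughly as follows. The eventual application is planned in C#, so this Python version is purely illustrative; the registry, the decorator and the record layout are all assumptions of mine rather than VeRa's actual design.

```python
from typing import Callable, Dict, Iterable, List

# A file record is just a dictionary here, e.g. {"path": "photo.jpg"}.
FileRecord = Dict[str, object]
Analyser = Callable[[Iterable[FileRecord]], List[FileRecord]]

# Registry of 'Data Analysis' plug-ins, keyed by name.
ANALYSERS: Dict[str, Analyser] = {}


def analyser(name):
    """Decorator that registers a function as an analysis plug-in."""
    def register(func):
        ANALYSERS[name] = func
        return func
    return register


@analyser("images")
def image_files(records):
    """Sample plug-in: keep only records that look like image files."""
    extensions = (".jpg", ".gif", ".png")
    return [r for r in records if str(r["path"]).lower().endswith(extensions)]


def run_pipeline(records, visualise):
    """Pass every record to every analyser, then hand the results on."""
    records = list(records)  # each plug-in sees the full set of records
    results = {name: func(records) for name, func in ANALYSERS.items()}
    return visualise(results)
```

Replacing a sample plug-in is then a matter of registering a different function under the same name, which is the property the design is aiming for.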

The application will be developed in C# to run under Microsoft Windows; as a result, any plug-ins will also need to be developed in a language compatible with the .NET Framework, although it is possible that the application as a whole may run under Novell's Mono.

Development will be in Microsoft Visual Studio 2008, provided through their DreamSpark programme. I will also use Buzan Online's iMindMap to scope out the project itself, and Microsoft Expression for software and interface design and prototyping.

Thursday, 5 November 2009

Visual Elements

One of the key features of any software project is usability. And for a project where visualisation is key, being able to convey the information in the best manner possible is essential.

The initial build of my final project will have a visualisation component, but the aim is to modularise as much as possible. With that in mind, the following elements will be individual, and indeed replaceable, components:
  • File system processors
    • The software will come with processors for certain filesystems built in, but it should be possible to add others in. The end user will be able to select one component to handle NTFS processing, and another for ext3 processing. In the event of a file system existing that the software cannot handle, a component for that filesystem can also be written and added in.
  • File processors
    • The ability to handle processing of every single filetype is a major task. Therefore this will also rely upon plug-ins to add new functionality; all filesystem data will be passed to each processor and it will be up to that processor to gather relevant data. For example, an image processor may collect data relating to all JPG, GIF and PNG files, whilst an Internet Explorer processor can collect browsing history, cookies and registry settings.
  • Visual Display
    • Although a standard timeline will be included, there should be no reason why another display component could not be added at a later stage; it would simply have to accept retrieved data from the file processors, use visual elements and UI components to narrow down the data to what happens to be relevant, and then output the file data again.
In theory, therefore, the system could be used for more than just timelines; it could show breakdowns of images by file or displayed size, and allow the user to export those of given dimensions (or even taken on a certain camera). I would recommend browsing through books such as Edward Tufte's 'The Visual Display of Quantitative Information' for inspiration.

Wednesday, 4 November 2009

Digital Evidence Bags

There are currently fundamental differences in the treatment of digital and non-digital (physical) forensic evidence in our legal system. All examination of non-digital evidence is logged through the use of evidence bags, whilst for digital evidence the only requirements are generally to provide evidence that the data being worked upon and presented:
  • has been collected in a manner consistent with the ACPO guidelines on digital evidence collection; and,
  • matches, through verification against the output of an agreed hashing algorithm, the data collected from the original source.
Another key issue affecting digital evidence gathering is the sheer size of the datasets. With terabyte drives available for less than £60, the amount of data that needs to be processed, investigated, stored and presented is on the verge of being unmanageable.
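
The hash verification step, at least, scales reasonably well, because an image never needs to be held in memory in one piece. A minimal sketch, assuming SHA-256 as the agreed algorithm (the function names and chunk size are my own choices):

```python
import hashlib


def hash_image(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a (potentially very large) image file one chunk at a time."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as image:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_hex, algorithm="sha256"):
    """Compare the image's hash with the value recorded at collection time."""
    return hash_image(path, algorithm) == recorded_hex.strip().lower()
```

The time taken still grows linearly with the size of the image, so this addresses memory use rather than the underlying volume problem.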

Others also acknowledge this:
“Traditional computer forensics is on the edge of a precipice … The reason for this imminent doomsday is the sheer volume of data that has to be processed during the course of a digital forensic investigation.”
Turner, p. 223

Turner proposes an alternative: a set of file formats that mimic the structure of non-digital evidence bags in a digital environment. A base implementation has been developed; however, there is no evidence to suggest that any further development has occurred either by Turner or by third parties. It is feasible that this initial implementation only exists to facilitate Turner's related patent application.

There is little argument against the concept - as opposed to Turner's implementation - of Digital Evidence Bags; after all, if there is a bloodstain on a wall nobody would ever suggest taking the entire building as evidence. However, it should be noted that Turner's solution is not the only one that exists. We also have to examine the Digital Evidence Exchange (DEX) format.

See: Turner, P. (2005). Unification of digital evidence from disparate sources (Digital Evidence Bags). Digital Investigation, Volume 2, Issue 3, September 2005, p.223-228

Tuesday, 3 November 2009

Which Language?

Through their DreamSpark programme, Microsoft have made their Professional suite of development tools available to full- and part-time students for free. As my primary job rôle involves software development on the Microsoft .NET platform, it then seems natural for me to use a tool such as Visual Studio 2008 for the development of this project.

The one question that remains, however, is which language - VB.NET or C#? Coming from a hobbyist BBC BASIC programmer background, I've followed through with ASP, VBScript, VB6 and all flavours of VB.NET since, and never seen the need to learn C# (although being aware of most of the syntax I'm still able to follow it with little effort); in addition, since both VB and C# compile into bytecode for use with the .NET CLR, there should be no performance difference between the two.

For long-term maintenance of the software - in particular, if it were to be passed on to other developers - would C# be a better choice? Just to add to the temptation to learn the language, a few months ago a copy of O'Reilly's 'Learning C# 3.0' landed on my desk; a result, it transpired, of my having won a competition in VSJ some time back.

Monday, 2 November 2009

Outlook PST file format

One of the core aims of the project is to be able to take any bit of information, without having access to the application that created it, and present it on a timeline. Therefore, I'm happy to see that Microsoft are planning on opening up the format of Outlook .PST files. Being able to chart the sending of individual emails without resorting to third-party estimations of the file formats will increase the usefulness of any timeline significantly. do however warn that "Microsoft has merely stated its intent to open the PST format, and not provided a timeframe in which it will do so."

The Windows Registry

When analysing an image, one important element to be able to see is the registry; this stores settings for all manner of applications within Windows (as well as settings related to Windows itself), and combined with the dates that are stored alongside each registry entry can provide invaluable information for creating an accurate timeline.

The following extract is from an unpublished paper I previously worked on:
"The Windows registry is stored across a number of different files, referred to as ‘hives’. Within Windows 95, and later 98 and ME, these hives were named ‘system.dat’ and ‘user.dat’, and stored within the Windows installation folder. Windows NT and its successors were more centred on multiple users, and therefore store their files under both the Windows installation folder and the user’s own folder (Casey, 2004, p. 276)."
'Casey' in this case refers to the book 'Digital Evidence and Computer Crime'. Other papers (most notably, 'Mee, V., Tryfonas, T., & Sutherland, I. (2006). The Windows Registry as a forensic artefact: Illustrating evidence collection for Internet usage. Digital Investigation, 166-173.') go into significant detail about the structure of the registry, but miss one important element: how do we view the registry without a registry editor?

Surprisingly enough, documentation on this is scarce; even the copy of Microsoft Press's 'Windows Registry Guide 2nd Edition' next to me doesn't touch on it, assuming at all times that the registry can be loaded without need for further intervention. A number of tools are available (including the 'regedit' tool built into Windows), but unless their source is available these are unfortunately of little use.

Therefore, a new required component for this project is one capable of programmatically loading a registry file, and providing an interface into the data itself. The format of the registry differs between versions of Windows, but the following two papers give some indication of a method of parsing the registry file:
It should be noted that the registry replaced the .INI files that existed on older versions of Windows; Unix still uses a similar concept with '.' configuration files, and due to being stored within the filesystem both these are significantly easier to parse.
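
As a first step towards such a component, the start of a hive file can be read directly. The sketch below is an assumption-laden starting point rather than a full parser: it relies on the commonly documented (reverse-engineered) layout of NT-family hives, in which the file begins with the signature 'regf', followed by two sequence numbers, a last-written timestamp in Windows FILETIME form and the format version. It does not apply to the Windows 95/98/ME 'system.dat' and 'user.dat' files, and the function names are my own.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1 January 1601 (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a Windows FILETIME value to a timezone-aware datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def read_hive_header(data):
    """Read the first fields of an NT registry hive from its raw bytes."""
    if data[:4] != b"regf":
        raise ValueError("not an NT registry hive (missing 'regf' signature)")
    # Assumed layout after the signature: two 32-bit sequence numbers,
    # a 64-bit FILETIME, then 32-bit major and minor version numbers.
    seq1, seq2, last_written, major, minor = struct.unpack_from("<IIQII", data, 4)
    return {
        "sequence_numbers": (seq1, seq2),
        "last_written": filetime_to_datetime(last_written),
        "version": (major, minor),
    }
```

Only the first 28 bytes of the file are needed, so a call such as read_hive_header(open(path, "rb").read(28)) is enough to give a first date for the timeline before any keys have been parsed.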

Friday, 30 October 2009

Detecting the Filesystem

Before any analysis of files can be performed, the drive image (or images) must be analysed so the system knows how the drive was formatted; for example, a Windows PC may use FAT32 or NTFS format, an Apple PC (yes, they are PCs) may be HFS or HFS+, whilst a Linux PC might use ext2, ext3, reiserfs or any other format that happened to interest that particular user when the computer was being set up.

Detection of the filesystem type is - in theory - relatively simple, as each type differs from the other in a number of different ways; combined, these ways make up that filesystem's 'signature'. The issue then becomes finding enough of these signatures to ensure that the filesystem is accurately guessed; due to the high level of documentation of the formats that exists on the Internet this information is simple to find.

Sleuthkit Informer describes a number of ways of narrowing down the information based on the master boot record and certain 'magic values':

"TestDisk runs some basic checks on the boot sector/superblock of each filesystem. As EXT2, EXT3, REISERFS, and JFS share the same partition type, number 0x83, TestDisk has to do additional checks for some filesystems ... Examples of sanity checks include checking for magic values or signatures.  For example, FAT and NTFS have 0xAA55 at 0x1FE of the boot sector."

whilst de Boyne Pollard, in his unfortunately incomplete 'How to determine the filesystem type of a volume' describes an algorithm for breaking down various parts of a drive image to narrow down the filesystem as accurately as possible. Note that, although he doesn't expand it, BPB refers to the boot sector's BIOS parameter block.
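
A few of these signature checks can be sketched as below. This is a simplified illustration, not the TestDisk or de Boyne Pollard algorithm: it assumes the well-known offsets for the boot sector signature, the NTFS OEM ID, the FAT type strings (which are informational only and not guaranteed to be accurate) and the ext2/ext3 superblock magic value. The function name and return strings are my own.

```python
import struct


def detect_filesystem(volume):
    """Guess the filesystem from the first few KiB of a partition (bytes)."""
    # ext2/ext3: the superblock starts at byte 1024 and holds the magic
    # value 0xEF53 at offset 56 within it. Checked first, because an ext
    # partition may also carry a boot sector ending in 0xAA55.
    if len(volume) >= 1082:
        if struct.unpack_from("<H", volume, 1024 + 56)[0] == 0xEF53:
            return "ext2/ext3"

    # FAT and NTFS: 0xAA55 at 0x1FE of the boot sector.
    if len(volume) >= 512:
        if struct.unpack_from("<H", volume, 0x1FE)[0] == 0xAA55:
            if volume[3:11] == b"NTFS    ":      # OEM ID field
                return "NTFS"
            if volume[0x52:0x5A] == b"FAT32   ":  # FAT32 type string
                return "FAT32"
            if volume[0x36:0x3B] == b"FAT16":     # FAT12/16 type string
                return "FAT16"
            if volume[0x36:0x3B] == b"FAT12":
                return "FAT12"
            return "unknown (boot signature present)"

    return "unknown"
```

Telling ext2 from ext3 needs a further look at the superblock's feature flags, and filesystems such as ReiserFS keep their magic values at other offsets, so a real detector would need a longer list of checks.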

Ideally, the method I would prefer to use is that used by the Linux 'mount' command; unfortunately, not being a Linux kernel expert I am unable to locate the source code and see how that reads these signatures for myself.

At this stage, I have one or more images, and I know how the individual drives within those images are formatted. The next step is then being able to retrieve lists of files from the drive; as the storage of this information varies significantly between file systems and is therefore potentially the most time-consuming element of the analysis step, this function is one I will only implement for one or two filesystem types initially as a 'proof-of-concept'. Again, sites such as will prove invaluable here.

Thursday, 29 October 2009

Drive Letters

A useful write-up on drive letter assignments can be found, unsurprisingly, in Wikipedia. Thus, if the user of the forensic application is able to let the application know the order of physical disks that the images relate to, it should be possible to logically work out the order in which drive letters are initially assigned.

Windows does, however, give users the option of altering drive letters; NTFS 3 also allows drives to be given 'mount points' within the filesystem. And of course the various Unix variants don't use drive letters at all. Therefore, for Windows drives we also need access to the registry in order to work out these additional issues; for Linux, the main configuration files (probably /etc/fstab).
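
For the Linux side, reading the mount layout from a recovered /etc/fstab is straightforward, since the file is plain text with whitespace-separated fields. A minimal sketch (the function name and the dictionary keys are my own):

```python
def parse_fstab(text):
    """Parse the contents of an fstab file into a list of mount entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line: needs device, mount point and type
        entries.append({
            "device": fields[0],
            "mount_point": fields[1],
            "fs_type": fields[2],
            "options": fields[3].split(",") if len(fields) > 3 else [],
        })
    return entries


SAMPLE = """
# <device>  <mount point>  <type>  <options>        <dump> <pass>
/dev/sda1   /              ext3    defaults,noatime 0      1
/dev/sda2   none           swap    sw               0      0
"""

for entry in parse_fstab(SAMPLE):
    print(entry["device"], "->", entry["mount_point"], f"({entry['fs_type']})")
```

The Windows equivalent is harder, as the drive letter and mount point assignments have to be recovered from the registry instead.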

File Systems

In order to be able to visualise a timeline of a PC's usage, it first of all needs to be possible to get the information from the PC. The most common method of doing this is by capturing an 'image' of the hard disks (and other rewriteable media) within the physical hardware; these images then exist as files on the analyser's own systems, with each byte in the file corresponding to the byte at the same offset on the drive itself.

Note that as little work as possible is done on the original hardware; the Association of Chief Police Officers has released guidelines on how evidence should be gathered, and one fundamental principle (unique to digital evidence) is that the data should, wherever possible, always be analysed without altering the original. (PDF here)

There are already any number of free tools that can create these images, from the Unix dd command to AccessData's FTK Imager; therefore, replicating this functionality is pointless (not to mention time-consuming in the development process). We can therefore assume that at the time that the visualisation tool starts the image has already been captured.

The first step of the analysis process is then to work out how the drive was structured. Some manual input may be required here (which is allowable, as an actual person with a real brain would have had to create the images in the first place) in the event that an analysis covers multiple images (i.e. multiple hard drives), but in general what would need to be worked out at this stage is:
  • Partition information, i.e. how the physical drive was divided into individual drive letters; and,
  • The file system of each partition.
Partition information can vary between different operating systems (for example, I believe MS-DOS originally allowed one primary partition, with an extended partition then holding multiple logical drives), and in some cases it may be essential to work out the drive letter that the original PC assigned to each partition.
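
For drives using the classic MBR scheme, the primary partition table can be read from the first sector of the image. The following is a minimal sketch under that assumption; it does not follow extended partitions to find logical drives, the type table lists only a handful of common codes, and the function name is my own.

```python
import struct

# A few common MBR partition type codes (far from a complete list).
PARTITION_TYPES = {
    0x05: "Extended (CHS)",
    0x06: "FAT16",
    0x07: "NTFS/HPFS",
    0x0B: "FAT32 (CHS)",
    0x0C: "FAT32 (LBA)",
    0x0F: "Extended (LBA)",
    0x82: "Linux swap",
    0x83: "Linux",
}


def read_partition_table(sector0):
    """Return the non-empty primary partition entries from an MBR sector."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for index in range(4):
        offset = 446 + 16 * index  # table starts at byte 446, 16 bytes each
        # status, (3 CHS bytes skipped), type, (3 CHS bytes skipped),
        # starting sector (LBA), number of sectors
        status, type_code, start_lba, sectors = struct.unpack_from(
            "<B3xB3xII", sector0, offset
        )
        if type_code == 0x00:
            continue  # unused slot
        partitions.append({
            "index": index,
            "bootable": status == 0x80,
            "type_code": type_code,
            "type_name": PARTITION_TYPES.get(type_code, "unknown"),
            "start_lba": start_lba,
            "sectors": sectors,
        })
    return partitions
```

Multiplying start_lba by the sector size (usually 512 bytes) gives the offset within the image at which each partition, and therefore each filesystem, begins.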

The next step is to then work out the file systems held in each partition; these could be FAT16, FAT32, NTFS, ext3, or any combination of the above. How to work this out is the subject of a later post, however.

Wednesday, 28 October 2009

Forensic timelines

Although retrieving data is the core aim of a forensic investigation, and existing tools have more power than the average investigator is likely to use, they are still lacking in some areas. As a simple example, if we wished to try and trace the actions on a PC during a certain time period, the majority of the tools that currently exist are simply not geared up to give this level of detail.

The main issue is that the concept of a 'date' exists in multiple places within a single PC:
  • File creation / modification / last access
  • Visit to a website
  • The last time a particular registry key was accessed
  • When a particular USB stick was last used on the PC
  • When a photograph was taken
In the case of the last of these, a fundamental point is that at that time the photograph was not even on the PC. To view this information requires more than access to the file system itself; it needs an application that can understand filetypes and 'look inside' the files.

In writing a system capable of looking inside the files, and in doing so mapping out the dates associated with any particular object, it should then be possible to create a 'forensic timeline' of the usage of that computer. This timeline will never be complete and, at times, may be inaccurate, but as long as these limitations are known and handled it will still be a useful tool in the investigator's arsenal.
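
At its simplest, such a timeline is a merge of dated events from several sources into one ordered list. The sketch below illustrates the idea; the (timestamp, source, description) tuple is an assumed event shape, and file_events reads timestamps from a live filesystem via os.stat purely for demonstration, whereas a forensic tool would take them from the captured image.

```python
import os
from datetime import datetime, timezone


def file_events(path):
    """Yield timeline events for one file's filesystem timestamps."""
    info = os.stat(path)
    stamps = (
        ("modified", info.st_mtime),
        ("accessed", info.st_atime),
        # st_ctime is creation time on Windows but metadata-change time
        # on Unix, one example of the inaccuracies mentioned above.
        ("created/changed", info.st_ctime),
    )
    for label, seconds in stamps:
        when = datetime.fromtimestamp(seconds, tz=timezone.utc)
        yield (when, "filesystem", f"{path} {label}")


def build_timeline(*sources):
    """Merge any number of event sources into one chronological list."""
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda event: event[0])
```

Browser history, registry key timestamps or EXIF dates would each be just another source yielding tuples of the same shape, which is what allows a photograph's capture date to appear on the timeline before the file ever reached the PC.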

Others have also realised this; Olsson and Boldt have documented the development process behind CyberForensics TimeLab in Digital Investigation. However, their software is still very much a prototype with a basic user interface and a lack of output options; these elements alone are ripe for improvement.

A Short(ish) Introduction

Welcome to my MSc Project Blog!

I’m currently a mature (by age if not attitude) MSc student in my final year at the University of Glamorgan, studying Computer Forensics (or, as it will say on my certificate, the slightly more wordy "Information Security and Computer Crime"). The final year is concerned with my thesis, which in my case will be a major project centered around the subject of digital forensics with relevance to modules I’ve studied over previous years.

The project I will be working on is still going through the approval process, but will almost certainly be a software product covering one or both of the subject areas of Forensic Timelining and Digital Evidence Bags. Both are relatively immature technologies compared with the subject of digital forensics as a whole (and that’s saying something!), and therefore ripe for investigation and further development.

The aim of this blog is therefore two-fold:
  • To document my progress through the year in the completion of my project, and thus to act as the project diary which then becomes part of the eventual thesis; and,
  • To publicise the work I’m doing, and attract feedback and/or suggestions of where the project could go after completion.
All comments are of course welcome, although be aware that I may use anything posted within the eventual thesis if it is appropriate; by commenting you are therefore granting me a non-exclusive unlimited licence to use (either in original, edited or redacted form) any content you post within this blog. Your comments will still retain you as the copyright owner, however.