Friday 27 November 2009

Creating blank partitioned drives

For a system such as VeRa, testing the filesystem and partition table detection routines against a range of partition combinations is a necessity; however, creating these combinations by hand is time-consuming. A simple solution involves a Linux live ISO and a copy of VirtualBox, an open-source virtualisation tool from Sun Microsystems.

The process I am following is as follows:
  1. Create a new Virtual Machine, with a drive sized as required.
  2. Boot this Virtual Machine from the Linux live ISO.
  3. Using a partition manager (such as fdisk), create partitions on the virtual hard disk as required.
  4. Format these partitions to match the requirements of the particular test.
  5. Exit VirtualBox.
  6. Using 'VBoxManage.exe' from the VirtualBox install folder, convert the VirtualBox image to a raw image format: VBoxManage clonehd --format RAW (image).vdi (image).img
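The conversion step can also be scripted; the sketch below (in Python, with placeholder image names) simply builds and runs the same VBoxManage command line, assuming VirtualBox is installed:

```python
import shutil
import subprocess

def build_clonehd_command(vdi_path, raw_path):
    # The exact command from step 6: convert a VDI image to raw format.
    return ["VBoxManage", "clonehd", "--format", "RAW", vdi_path, raw_path]

def convert_to_raw(vdi_path, raw_path):
    # Requires VirtualBox to be installed and VBoxManage on the PATH.
    if shutil.which("VBoxManage") is None:
        raise RuntimeError("VBoxManage not found on PATH")
    subprocess.run(build_clonehd_command(vdi_path, raw_path), check=True)
```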

Tuesday 17 November 2009

XML in Forensics: DEX

In their 2009 paper 'DEX: Digital evidence provenance supporting reproducibility and comparison', Levine and Liberatore refer to Alink, Garfinkel and Turner's past works (as also documented here, here and here) in attempting to develop a file format suitable for presenting as evidence in forensic cases.

They initially criticise the fact that the tools providing the greatest amount of support (and which, by inference, are commercial and closed-source) are the same tools that require the end user to trust them blindly, with no easy way of verifying that their output is valid. Therefore, they propose (as both Garfinkel and Turner did before them) a format that is independent of the forensic tool itself: DEX, or Digital Evidence Exchange.
"A DEX description of evidence is sufficient for a third party using an independently developed tool to quickly extract the same evidence and verify that the reproduction is correct according to a known specification."
(p. S49)

They also note the importance of the work that preceded their own:
"The description by Garfinkel et al. of AFF ... provides a good survey of many other image formats, including EnCase, FTK, ILook, and more."
(p. S50)

An advantage of the file format is that the data stored within it need not come from a single source (unlike formats where the data is an image file plus additional metadata, as is the case with many other tools). A key result of this is that it is therefore possible to append further data to an existing DEX file.

Unlike the DEB format, DEX does not require a log of alterations to the file and/or examinations of the evidence; it also keeps no information on the individual who processed the evidence. Although this might be seen by some as a disadvantage, the DEX file itself contains enough information to allow its content to be verified against the original evidence anyway. Any logging can therefore be done via access to the source data - the physical drives, for example - which can be handled through normal law enforcement procedures.

The format is also explicitly designed to allow files to be processed through multiple disparate tools, providing analysis that one tool alone may not be capable of. It is this that makes it ideal as a native format for VeRa's data:
  • For input, allow the user to specify a DEX file alongside images and/or filesystems.
  • The initial analysis tools can parse the data in the images and add them to the existing DEX data (or create a new DEX file).
  • The visualisation tool can then display the data from the DEX file, allow it to be narrowed down to the relevant data (e.g. only files handled between specific dates), and then output to a new DEX file.
It also means that, at least in theory, any tools that can manipulate or visualise DEX data can be integrated directly into VeRa, as long as they can be adapted to run within the .NET runtime.
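As a rough illustration of that pipeline (the element and attribute names below are placeholders of my own, not the actual DEX schema), appending file entries and then narrowing them to a date range might look like:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def add_file_entry(root, name, mtime):
    # Append a file record to an existing tree (DEX allows appending data).
    ET.SubElement(root, "file", {"name": name, "mtime": mtime.isoformat()})

def narrow_by_date(root, start, end):
    # Keep only the files handled between the two dates.
    narrowed = ET.Element("dex")
    for entry in root.findall("file"):
        mtime = datetime.fromisoformat(entry.get("mtime"))
        if start <= mtime <= end:
            narrowed.append(entry)
    return narrowed

root = ET.Element("dex")
add_file_entry(root, "report.doc", datetime(2009, 11, 1))
add_file_entry(root, "old.doc", datetime(2008, 1, 1))
subset = narrow_by_date(root, datetime(2009, 1, 1), datetime(2009, 12, 31))
```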

See: Levine, B. & Liberatore, M. (2009). DEX: Digital evidence provenance supporting reproducibility and comparison. Digital Investigation, Volume 6, p. S48-S56

Thursday 12 November 2009

XML in Forensics: fiwalk and AFF

In 2006, Simson Garfinkel attempted to solve the issue of proprietary disk image formats (as used by most forensic analysis tools) and, at the same time, the problem of storing massive retrieved datasets; his solution was AFF, the Advanced Forensic Format.

One tool that he has developed to work alongside AFF and its base, SleuthKit, is fiwalk, which offers the following features:
  • Finds all partitions & automatically processes each.
  • Handles file systems on raw device (partition-less).
  • Creates a single output file with forensic data from all of them.
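Consuming that single output file might look something like the following sketch; since the format lacks a single point of documentation, the <fileobject> layout here is my own guess rather than the real structure:

```python
import xml.etree.ElementTree as ET

# Guessed layout only - the actual fiwalk output is not formally documented.
SAMPLE = """
<fiwalk>
  <fileobject>
    <filename>WINDOWS/system32/config/system</filename>
    <filesize>1024</filesize>
  </fileobject>
  <fileobject>
    <filename>notes.txt</filename>
    <filesize>42</filesize>
  </fileobject>
</fiwalk>
"""

def list_files(xml_text):
    # Return (name, size) pairs for every file object in the output.
    root = ET.fromstring(xml_text)
    return [(fo.findtext("filename"), int(fo.findtext("filesize")))
            for fo in root.findall("fileobject")]
```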

Unfortunately, although designed to extract forensic data into an XML format (similar to that of other tools, and indeed VeRa), there are issues with its implementation:
  • The format itself has no single point of documentation, making it difficult for other investigators to extend its functionality. The PDF above has a brief overview, whilst Forensics Wiki has a little more (although with comments such as 'not sure what this means', there is an indication that their documentation may have been by way of reverse-engineering).
  • No mention of the XML format is made in the fiwalk download itself either.
  • The system is designed on top of SleuthKit, a set of Linux-based tools.
Although the final point is a minor one, AllBusiness recently noted that:
"The computer forensic software market has long been a duopolistic market with the two significant players being Guidance Software (GUID) and AccessData."

Displacing either market leader would be a major task; however, producing a product that complements their differing functionalities would be viable. As both EnCase and FTK are Windows-based, it seems logical that Windows should be the target platform for new software in order to maximise acceptance.

Wednesday 11 November 2009

XML in Forensics: XIRAF

In 2005, Turner published his paper 'Unification of digital evidence from disparate sources (Digital Evidence Bags)'. He describes an XML data format that could be used in a similar fashion to physical evidence bags, and concludes by stating:
"The digital forensic community is in need of a new approach to the way in which the information from digital devices is gathered and processed."
Turner, 2005

XML is also the subject of a 2006 paper by Alink et al, "XIRAF - XML-based indexing and querying for digital forensics". The paper, based upon Alink's own MSc thesis, describes an architecture that links forensic analysis elements via a shared XML format. One example given is that of a forensically captured image:
<case id="test-case">
    <image id="1" name="A" start="0" end="15000000">
        <volume type="FAT32" start="0" end="10000000"/>
        <volume type="NTFS" start="10000000" end="15000000"/>
    </image>
    <image id="2" name="B" start="15000000" end="35000000"/>
    <image id="3" name="C" start="35000000" end="40000000">
        <volume type="EXT2" start="35000000" end="40000000"/>
    </image>
</case>
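Reading such a case description back out is straightforward with any XML library; a quick sketch using the example data above:

```python
import xml.etree.ElementTree as ET

CASE = """
<case id="test-case">
  <image id="1" name="A" start="0" end="15000000">
    <volume type="FAT32" start="0" end="10000000"/>
    <volume type="NTFS" start="10000000" end="15000000"/>
  </image>
  <image id="2" name="B" start="15000000" end="35000000"/>
  <image id="3" name="C" start="35000000" end="40000000">
    <volume type="EXT2" start="35000000" end="40000000"/>
  </image>
</case>
"""

def volumes_by_image(xml_text):
    # Map each image name to the filesystem types of its volumes.
    root = ET.fromstring(xml_text)
    return {img.get("name"): [v.get("type") for v in img.findall("volume")]
            for img in root.findall("image")}
```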

Although the formats may not be of direct use, Alink et al go on to describe how individual tools can register themselves as being for specific sets of files through XPath queries:
Description: Lists recently deleted files by looking at the recycle bin log files (usually named "INFO2")
Input selection: Selects all files named INFO2
Input query:  //file[@name[ends-with(.,"/INFO2")]]
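Python's built-in ElementTree supports only a subset of XPath (no ends-with()), so a faithful sketch of that selection has to apply the predicate in code:

```python
import xml.etree.ElementTree as ET

DOC = """
<root>
  <file name="C:/RECYCLER/S-1-5-21/INFO2"/>
  <file name="C:/Windows/notepad.exe"/>
</root>
"""

def select_info2(xml_text):
    # Equivalent of //file[@name[ends-with(., "/INFO2")]].
    root = ET.fromstring(xml_text)
    return [f.get("name") for f in root.iter("file")
            if f.get("name", "").endswith("/INFO2")]
```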

They have also used the query language to generate timeline data incorporating file times, EXIF data and other sources of date-related information; the authors go on to state:
"[The investigator] could see, for example, that movie files are created in the file system at approximately the same time that suspects are discussing a transfer of those files using a chat program."
and note that the timeline software could allow the user to drill down from the displayed data back to its source. However, like Turner's DEB, XIRAF's format appears to have gained little support in the past 4 years, with no obvious tools having appeared.
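Merging date-related data from such disparate sources into a single ordered timeline is conceptually simple; a minimal sketch:

```python
from datetime import datetime

def build_timeline(*sources):
    # Merge (timestamp, source, description) events from several lists
    # into one chronologically ordered timeline.
    return sorted((e for src in sources for e in src), key=lambda e: e[0])

file_events = [(datetime(2009, 3, 1, 14, 0), "filesystem", "movie.avi created")]
chat_events = [(datetime(2009, 3, 1, 13, 58), "chat log", "file transfer discussed")]
timeline = build_timeline(file_events, chat_events)
```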

It is possible that a newer format, DEX, may take off where both DEB and XIRAF appear to have failed; it is this that will be investigated next as part of the VeRa project.

Tuesday 10 November 2009


The project plan has been submitted, with the software that will come out of it being given the name 'Virtualisation Environment for Resource Analysis', or VeRa; as an individual's name this can also have the meaning 'truth'.

The research areas of the project are:
  • XML data formats for import and export, covering DEB, XIRAF and DEX;
  • File system analysis:
    • Being able to analyse an image file and determine the overall partition structure, as well as the filesystem types of the individual partitions.
    • Ability to extract at will the file structure and even individual files from an image;
  • Visualisation of data
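For the partition-structure analysis above, a first cut at reading an MBR-style partition table from a raw image is small enough to sketch here (primary partitions only; extended partitions and GPT would need more work):

```python
import struct

def parse_mbr(sector0):
    # Parse the four primary partition entries from the first 512 bytes.
    if sector0[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR signature")
    parts = []
    for i in range(4):
        entry = sector0[446 + i * 16 : 446 + (i + 1) * 16]
        ptype = entry[4]  # partition type byte (e.g. 0x0B for FAT32)
        start_lba, sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:
            parts.append({"type": ptype, "start": start_lba, "sectors": sectors})
    return parts

# A minimal fabricated boot sector: one FAT32 (0x0B) partition at LBA 63.
sector = bytearray(512)
sector[510:512] = b"\x55\xaa"
sector[446:462] = bytes([0, 0, 0, 0, 0x0B, 0, 0, 0]) + struct.pack("<II", 63, 1000)
parts = parse_mbr(sector)
```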
The eventual application, which will form a core part of the thesis, will be capable of:
  • Utilising plug-ins for the majority of its analysis functionality;
  • Reading directory structures in from a variety of resources:
    • Disk images (via ‘Data Capture’ implementations);
    • DEX-format evidence bags; or,
    • Locally attached storage.
  • Passing data through ‘Data Analysis’ tools to the registered ‘Data Visualisation’ tool.
  • Outputting data to DEX-format evidence bags.
The analysis and visualisation sections will be designed specifically to encourage others to develop plug-ins; it is possible that those could act as 'mini-projects' for individuals on other (BSc or MSc) courses. Sample plug-ins will be developed in order to show the application's functionality, but will be fully replaceable.
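The plug-in contracts might look like the following (sketched in Python for brevity; the real interfaces would be written in C# against the .NET runtime, and every name here is a placeholder):

```python
from abc import ABC, abstractmethod

class DataCapture(ABC):
    @abstractmethod
    def read_directory_structure(self, source):
        """Return directory entries from an image, DEX file or local disk."""

class DataAnalysis(ABC):
    @abstractmethod
    def analyse(self, entries):
        """Transform or annotate entries before visualisation."""

class DataVisualisation(ABC):
    @abstractmethod
    def display(self, entries):
        """Present entries to the user and return the narrowed-down set."""

class DiskImageCapture(DataCapture):
    # Toy implementation; a real one would parse the image's filesystem.
    def read_directory_structure(self, source):
        return [{"name": source, "kind": "image"}]
```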

The application will be developed in C# to run under Microsoft Windows; as a result, any plug-ins will also need to be developed in a language compatible with the .NET Framework, although it is possible that the application as a whole may run under Novell's Mono.

Development will be in Microsoft Visual Studio 2008, provided through their DreamSpark programme. I will also use Buzan Online's iMindMap to scope out the project itself, and Microsoft Expression for software and interface design and prototyping.

Thursday 5 November 2009

Visual Elements

One of the key features of any software project is usability; for a project where visualisation is key, being able to convey the information in the best manner possible is essential.

The initial build of my final project will have a visualisation component, but the aim is to modularise as much as possible. With that in mind, the following elements will be individual, and indeed replaceable, components:
  • File system processors
    • The software will come with processors for certain filesystems built in, but it should be possible to add others: the end user will be able to select one component to handle NTFS processing and another for ext3 processing. In the event of a filesystem existing that the software cannot handle, a component for that filesystem can be written and added.
  • File processors
    • The ability to handle processing of every single filetype is a major task. Therefore this will also rely upon plug-ins to add new functionality; all filesystem data will be passed to each processor and it will be up to that processor to gather relevant data. For example, an image processor may collect data relating to all JPG, GIF and PNG files, whilst an Internet Explorer processor can collect browsing history, cookies and registry settings.
  • Visual Display
    • Although a standard timeline will be included, there should be no reason why another display component could not be added at a later stage; it would simply have to accept retrieved data from the file processors, use visual elements and UI components to narrow down the data to what happens to be relevant, and then output the file data again.
In theory, therefore, the system could be used for more than just timelines; it could show breakdowns of images by file or displayed size, and allow the user to export those of given dimensions (or even those taken on a certain camera). I would recommend browsing through books such as Edward Tufte's 'The Visual Display of Quantitative Information' for inspiration.
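As a concrete (if toy) example of the file-processor idea, here is a sketch of an image processor that claims JPG, GIF and PNG files from the listing handed to every registered processor; the class and data shapes are purely illustrative:

```python
IMAGE_EXTENSIONS = {".jpg", ".gif", ".png"}

class ImageProcessor:
    # Each processor sees the full listing and keeps only what it understands.
    def __init__(self):
        self.collected = []

    def process(self, files):
        for name, size in files:
            if any(name.lower().endswith(ext) for ext in IMAGE_EXTENSIONS):
                self.collected.append((name, size))

listing = [("holiday.JPG", 120000), ("report.doc", 4096), ("logo.png", 900)]
processor = ImageProcessor()
processor.process(listing)
```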

Wednesday 4 November 2009

Digital Evidence Bags

There are currently fundamental differences in the treatment of digital and non-digital (physical) forensic evidence in our legal system. All examination of non-digital evidence is logged through the use of evidence bags, whilst for digital evidence the only requirements are generally to provide evidence that the data being worked upon and presented:
  • has been collected in a manner consistent with the ACPO guidelines on digital evidence collection; and,
  • matches, through verification against the output of an agreed hashing algorithm, the data collected from the original source.
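The second requirement - hash verification - is at least straightforward to implement; a sketch that hashes an image in chunks (so that very large files never have to fit in memory):

```python
import hashlib
import os
import tempfile

def image_digest(path, algorithm="sha256", chunk_size=1 << 20):
    # Hash a potentially huge image file one chunk at a time.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, recorded_digest, algorithm="sha256"):
    # Compare against the digest recorded at collection time.
    return image_digest(path, algorithm) == recorded_digest

# Demonstration with a throwaway file standing in for an acquired image.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"evidence")
os.close(fd)
ok = verify(demo_path, hashlib.sha256(b"evidence").hexdigest())
os.remove(demo_path)
```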
Another key issue affecting digital evidence gathering is the sheer size of the datasets. With terabyte drives available for less than £60, the amount of data that needs to be processed, investigated, stored and presented is on the verge of being unmanageable.

Others also acknowledge this:
“Traditional computer forensics is on the edge of a precipice … The reason for this imminent doomsday is the sheer volume of data that has to be processed during the course of a digital forensic investigation.”
Turner, p. 223

Turner proposes an alternative: a set of file formats that mimic the structure of non-digital evidence bags in a digital environment. A base implementation has been developed; however, there is no evidence to suggest that any further development has occurred, either by Turner or by third parties; it is feasible that this initial implementation exists only to facilitate Turner's related patent application.

There is little argument against the concept - as opposed to Turner's implementation - of Digital Evidence Bags; after all, if there is a bloodstain on a wall nobody would ever suggest taking the entire building as evidence. However, it should be noted that Turner's solution is not the only one that exists. We also have to examine the Digital Evidence Exchange (DEX) format.

See: Turner, P. (2005). Unification of digital evidence from disparate sources (Digital Evidence Bags). Digital Investigation, Volume 2, Issue 3, September 2005, p.223-228

Tuesday 3 November 2009

Which Language?

Through their DreamSpark programme, Microsoft have made their Professional suite of development tools available to full- and part-time students for free. As my primary job rôle involves software development on the Microsoft .NET platform, it seems natural for me to use a tool such as Visual Studio 2008 for the development of this project.

The one question that remains, however, is which language - VB.NET or C#? Coming from a hobbyist BBC BASIC programmer background, I've followed through with ASP, VBScript, VB6 and all flavours of VB.NET since, and never seen the need to learn C# (although, being aware of most of the syntax, I'm still able to follow it with little effort); in addition, since both VB and C# compile into bytecode for the .NET CLR, there should be no performance difference between the two.

For long-term maintenance of the software - in particular, if it were to be passed on to other developers - would C# be a better choice? Just to add to the temptation to learn the language, a few months ago a copy of O'Reilly's 'Learning C# 3.0' landed on my desk; a result, it transpired, of my having won a competition in VSJ some time back.

Monday 2 November 2009

Outlook PST file format

One of the core aims of the project is to be able to take any piece of information, without having access to the application that created it, and present it on a timeline. Therefore, I'm happy to see that Microsoft are planning on opening up the format of Outlook .PST files. Being able to chart the sending of individual emails without resorting to third-party estimations of the file format will increase the usefulness of any timeline significantly. Commentators do, however, warn that "Microsoft has merely stated its intent to open the PST format, and not provided a timeframe in which it will do so."

The Windows Registry

When analysing an image, one important element to be able to see is the registry; this stores settings for all manner of applications within Windows (as well as settings related to Windows itself), and combined with the dates that are stored alongside each registry entry can provide invaluable information for creating an accurate timeline.

The following extract is from an unpublished paper I previously worked on:
"The Windows registry is stored across a number of different files, referred to as ‘hives’. Within Windows 95, and later 98 and ME, these hives were named ‘system.dat’ and ‘user.dat’, and stored within the Windows installation folder. Windows NT and its successors were more centred on multiple users, and therefore store their files under both the Windows installation folder and the user’s own folder (Casey, 2004, p. 276)."
'Casey' in this case refers to the book 'Digital Evidence and Computer Crime'. Other papers (most notably, 'Mee, V., Tryfonas, T., & Sutherland, I. (2006). The Windows Registry as a forensic artefact: Illustrating evidence collection for Internet usage. Digital Investigation , 166-173.') go into significant detail about the structure of the registry, but miss one important element: how do we view the registry without a registry editor?

Surprisingly enough, documentation on this is scarce; even the copy of Microsoft Press's 'Windows Registry Guide, 2nd Edition' next to me doesn't touch on it, assuming at all times that the registry can be loaded without need for further intervention. A number of tools are available (including the 'regedit' tool built into Windows), but unless source code is available these are unfortunately of little use.

Therefore, a new required component for this project is one capable of programmatically loading a registry file, and providing an interface into the data itself. The format of the registry differs between versions of Windows, but the following two papers give some indication of a method of parsing the registry file:
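As a very first step towards such a component, the sketch below sanity-checks an NT-family hive header; the field layout is taken from public reverse-engineering notes rather than any official specification, so treat it as an assumption:

```python
import struct

def check_hive_header(data):
    # NT-family hive files begin with the ASCII signature 'regf'.
    if data[:4] != b"regf":
        raise ValueError("not an NT registry hive")
    # Two 32-bit sequence numbers follow; they match if the hive was
    # written out cleanly (layout per reverse-engineering notes).
    seq1, seq2 = struct.unpack_from("<II", data, 4)
    return {"clean": seq1 == seq2}

# Fabricated header bytes for demonstration.
header = b"regf" + struct.pack("<II", 7, 7) + b"\x00" * 500
```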
It should be noted that the registry replaced the .INI files that existed in older versions of Windows; Unix still uses a similar concept with '.' configuration files, and as both are stored within the filesystem, they are significantly easier to parse.