Category Archives: eDiscovery

Analysis of e-mail and appointment falsification on Microsoft Outlook/Exchange

Filed under Computer Forensics, Email Investigations, Evidence Analysis, eDiscovery

Author: Joachim Metz <forensics@hoffmannbv.nl>

Summary

In digital forensic analysis it is sometimes necessary to determine whether or not an e-mail has
been falsified. This paper reviews certain parts of the Outlook Messaging Application Programming
Interface (MAPI) that can help in identifying falsified e-mails or altered appointments in a
Microsoft Outlook/Exchange environment.

About the libpff project

In 2008 Joachim Metz, a forensic investigator at Hoffmann Investigations, started the libpff project.
At that time the best source about the Personal Folder File (PFF) format in the public domain was
the libpst project. The libpst project dated back to 2002 and had been contributed to and maintained
by David Smith, Joe Nahmias, Brad Hards and Carl Byington.

However, libpst was not a library at that time and had no support for recovering deleted items
in PST and OST files. The initial goal of the libpff project was to create a shared library for PST
and OST files that had support for recovering deleted items. Recovering deleted items requires
detailed knowledge of the inner structures of the PFF format. This was the beginning of an
interesting journey, in which additional information about the inner structures has been discovered
even recently, like the 6c and 8c tables and the use of indirection in large tables.

In March 2009 PFF forensics was first discussed as part of Microsoft Office forensics in the
Hoffmann Advanced Forensic Sessions (HAFS). A paper titled ‘Personal Folder File (PFF)
forensics’ was published as part of the HAFS. This paper explains the basics of the PFF format,
which can be quite a challenge to understand. One of the main conclusions of both the paper
and the seminar was that different forensic tools provide different results when recovering deleted
items in PST and OST files.

In the meantime the libpff project has evolved. Due to continued analysis of the PFF format and
several contributions, new aspects of the file format have been discovered. Among these are the
PFF items that contain information about the recipients, sub folders, sub messages and sub
associated items.

Also, a lot of information about the MAPI has been made available. The OpenChange project
provides libmapi, which contains an Open Source implementation of the MAPI, and the
MFCMAPI project has provided a lot of MAPI information that is now available on MSDN.

Within Hoffmann Investigations libpff has been put to work for two purposes: first as a tool to
cross-reference findings in other forensic tools, and second as a tool that can provide more
information about PST and OST files than those forensic tools. In the upcoming Hoffmann
Advanced Forensic Sessions in November 2009, PFF forensics will therefore once more be the
subject of discussion. In the meantime several of the interesting findings are provided in this
paper.

1. Introduction

Wouldn’t it be nice to have your forensic analysis software filter out falsified e-mails and
appointments for you? However, most of the current forensic tools provide little information about
the authenticity of e-mail messages and appointments. Therefore, certain analyses have to be done
manually. This paper will give you an understanding of parts of the Outlook Messaging Application
Programming Interface (MAPI) to help identify falsified e-mails in Microsoft Outlook/Exchange
environments.

1.1. Background

If you are a forensic investigator working in corporate environments you are probably dealing
with Microsoft Outlook and Exchange most of the time. What you might not know is that both
make heavy use of the MAPI. The MAPI is not only a programming interface but also a useful
source of information regarding properties of e-mail attributes. For those of you not familiar
with analyzing the Personal Folder File format used by Microsoft Outlook for PST and OST files,
I advise reading [METZ09] before reading this paper.

2. Falsified e-mail message

In a recent investigation we had to investigate if a user had sent an e-mail at a certain date and
time. We started by determining the existence of the e-mail in the mailbox of both the sender and
the recipients. But there were other characteristics that were highly interesting from a forensic
point of view.

A certain e-mail dated March 10, 2009 was forwarded on March 17, 2009. The original e-mail
could not be found in any of the mailboxes. The first indication of falsification was a discoloration
of the day of the month in a print-out of the forwarded e-mail: the 0 in March 10 was gray while
the surrounding text was clearly black.

2.1. The e-mail body

In Outlook/Exchange an e-mail message can contain RTF and/or HTML body text. Both RTF and
HTML formats use formatting codes. Using these formatting codes we did a low-level analysis of
the body text. Most of the available forensic tools do not provide access to these formatting codes,
but luckily for us there is libpff and its tools.

After compiling libpff with verbose and debug output and having pffexport export the PST file
with the verbose option (-v), we had a detailed debug log file. In this log file we looked up the
e-mail and its RTF body. In the RTF body the following information was found:

{\*\htmltag84 <b>}\htmlrtf {\b \htmlrtf0 Sent:
{\*\htmltag92 </b>}\htmlrtf }\htmlrtf0 Tuesday March 1
{\*\htmltag84 <span style='color:#1F497D'>}\htmlrtf {\htmlrtf0 0
{\*\htmltag92 </span>}\htmlrtf }\htmlrtf0 , 2009 13:48
{\*\htmltag116 <br>}\htmlrtf \line
\htmlrtf0
{\*\htmltag4 \par }

Using other forwarded e-mails as a reference, we established that the formatting code that colors
the 0 (the span element on the third line of the excerpt) should not be there.
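
A check like this can be partly automated once the RTF body has been exported. The following is a minimal Python sketch, not part of libpff: it lists the HTML fragments stored in the \htmltag groups and flags any span that sets an inline color, the kind of formatting that would explain the gray 0 in the print-out. The function names are hypothetical, and the regular expression assumes that a fragment contains no braces, which holds for the excerpt above but not for RTF-encapsulated HTML in general.

```python
import re

# Matches one encapsulated HTML fragment such as {\*\htmltag84 <b>}.
# Assumption: the fragment itself contains no braces.
HTMLTAG = re.compile(r"\{\\\*\\htmltag(\d+) ([^{}]*)\}")
COLOR_SPAN = re.compile(r"<span[^>]*\bcolor\s*:", re.IGNORECASE)


def html_fragments(rtf_text):
    """Return the HTML fragments stored in htmltag groups, in order."""
    return [match.group(2) for match in HTMLTAG.finditer(rtf_text)]


def colored_spans(rtf_text):
    """Return the fragments that open a span with an inline color."""
    return [f for f in html_fragments(rtf_text) if COLOR_SPAN.search(f)]


# The RTF excerpt shown above, copied verbatim.
SAMPLE = r"""{\*\htmltag84 <b>}\htmlrtf {\b \htmlrtf0 Sent:
{\*\htmltag92 </b>}\htmlrtf }\htmlrtf0 Tuesday March 1
{\*\htmltag84 <span style='color:#1F497D'>}\htmlrtf {\htmlrtf0 0
{\*\htmltag92 </span>}\htmlrtf }\htmlrtf0 , 2009 13:48
{\*\htmltag116 <br>}\htmlrtf \line
\htmlrtf0
{\*\htmltag4 \par }"""

if __name__ == "__main__":
    for fragment in colored_spans(SAMPLE):
        print(fragment)
```

A flagged span is only a lead. Whether it is out of place still has to be judged against other forwarded e-mails from the same mailbox, as was done in this investigation.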

2.2. Conversation index

Looking at existing e-mail messages we hypothesized that the original e-mail was not created on
March 10, 2009 but was in fact an e-mail created on March 17, 2009 that had been altered. We
wanted proof besides the lack of the original e-mail message in the mailboxes of the sender and
the recipients.

An MSDN article titled 'Tracking conversations' provided us with a fairly reliable answer.
[MSDN] states that:

PR_CONVERSATION_INDEX (PidTagConversationIndex) indicates the position of the
message within a particular conversation. It is a client's responsibility to
set PR_CONVERSATION_INDEX for each outgoing message, whether it is a new
message, a forwarded message, or a reply. Clients can set this property
manually or call ScCreateConversationIndex, a utility function provided by
MAPI.
ScCreateConversationIndex generates the value of a conversation index for any
outgoing message. ScCreateConversationIndex implements the index as a header
block that is 22 bytes in length, followed by zero or more child blocks each 5
bytes in length.
The header block is composed of 22 bytes, divided into three parts:
 * One reserved byte. Its value is 1.
 * Five bytes for the current system time converted to the FILETIME structure
 format.
 * Sixteen bytes holding a GUID, or globally unique identifier.
Each child block is composed of 5 bytes, divided as follows:
 * One bit containing a code representing the difference between the current
 time and the time stored in the header block. This bit will be 0 if the
 difference is less than .02 second and greater than two years and 1 if the
 difference is less than one second and greater than 56 years.
 * Thirty one bits containing the difference between the current time and the
 time in the header block expressed in FILETIME units. This part of the child
 block is produced using one of two strategies, depending on the value of
 the first bit. If this bit is zero, ScCreateConversationIndex discards the
 high 15 bits and the low 18 bits. If this bit is one, the function discards
 the high 10 bits and the low 23 bits.
 * Four bits containing a random number generated by calling the Win32
 function GetTickCount.
 * Four bits containing a sequence count that is taken from part of the random
 number.

Reverse-engineering this description for the PFF format I found that the part of the header block
containing the ‘One reserved byte’ with a value of 1 is actually the first byte of the filetime. So
there are not 5 bytes of the filetime but 6. The date and time in the header block of the
conversation index matches the creation date and time of e-mail messages.

The child block contains the difference between the current and the previous time, and not the
difference with the time stored in the header block as stated in the MSDN specification. This was
validated using the creation date and time of multiple e-mails.
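
To make the layout concrete, here is a minimal Python sketch of a conversation index parser that follows the findings above rather than the literal MSDN text: the first 6 bytes are read as the most significant bytes of a big-endian FILETIME, and each child block adds its difference to the previous time. It is not the libpff implementation. The function names, the dictionary keys and the byte order used to print the GUID are assumptions made for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
HEADER_SIZE = 22
CHILD_SIZE = 5


def filetime_to_datetime(filetime):
    """Convert 100-nanosecond ticks since 1601-01-01 UTC to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_conversation_index(data):
    """Split a conversation index into a header and its child blocks."""
    if len(data) < HEADER_SIZE or (len(data) - HEADER_SIZE) % CHILD_SIZE:
        raise ValueError("unexpected conversation index size")
    # Bytes 0-5: the six most significant bytes of a big-endian FILETIME.
    filetime = int.from_bytes(data[0:6], "big") << 16
    header = {
        "time": filetime_to_datetime(filetime),
        # Assumption: the GUID is printed in plain byte order.
        "guid": str(uuid.UUID(bytes=bytes(data[6:22]))),
    }
    children = []
    for offset in range(HEADER_SIZE, len(data), CHILD_SIZE):
        word = int.from_bytes(data[offset:offset + 4], "big")
        # The first bit selects the precision: 18 or 23 low bits were dropped.
        shift = 23 if word >> 31 else 18
        # The remaining 31 bits are the difference with the previous time.
        filetime += (word & 0x7FFFFFFF) << shift
        last = data[offset + 4]
        children.append({
            "time": filetime_to_datetime(filetime),
            "random": last >> 4,
            "sequence": last & 0x0F,
        })
    return header, children
```

Because the low-order bits of each difference are discarded, the reconstructed child times are approximations and the error accumulates from one child block to the next.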

The conversation index for the specific e-mail translates to:

0x0071 (PidTagConversationIndex : Conversation index)
0x0102 (PT_BINARY : Binary data)
Header block:
 Filetime        : Mar 17, 2009 10:13:04 UTC
 GUID            : 11111111-2222-3333-4444-555555555555
Child block: 1
 Filetime        : Mar 17, 2009 10:18:03 UTC
 Random number   : 2
 Sequence count : 0
Child block: 2
 Filetime        : Mar 17, 2009 10:24:01 UTC
 Random number   : 9
 Sequence count : 0
Child block: 3
 Filetime        : Mar 17, 2009 10:42:39 UTC
 Random number   : 9
 Sequence count : 0
Child block: 4
 Filetime        : Mar 17, 2009 10:45:36 UTC
 Random number   : 14
 Sequence count : 0
Child block: 5
 Filetime        : Apr 17, 2009 07:19:08 UTC
 Random number   : 8
 Sequence count : 0

Note that the precision of the date and time difference in the child block varies and does not match
the creation date and time. The actual reason for this variation is as yet unknown.

0x3007 (PidTagCreationTime : Creation time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime        : Apr 17, 2009 08:41:20 UTC

However, the date March 10, 2009 does not appear in the conversation index. Judging by the
conversation indexes of other forwarded and replied-to e-mail messages, we would expect that date
to be present.

Note that the GUID ‘11111111-2222-3333-4444-555555555555’ in this example was altered.

Using the GUID we found corresponding e-mails with the same GUID in the conversation index.
Most of these e-mails had different content. This finding supported our hypothesis. All of the
corresponding e-mails also had a creation date of March 17, 2009. Therefore, it was plausible that
the e-mail with the discolored zero in ‘March 10’ was falsified using another e-mail created on
March 17, 2009.

When confronted with the findings in an interview, the sender of the e-mail admitted that he had
altered the e-mail.
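
Finding the corresponding e-mails comes down to grouping messages on bytes 6 to 21 of the conversation index, the GUID in the header block. A minimal Python sketch, assuming the conversation index of every message has already been exported as raw bytes; the function name and the message labels are hypothetical:

```python
from collections import defaultdict


def group_by_conversation_guid(messages):
    """Group (label, conversation_index_bytes) pairs by the header GUID."""
    groups = defaultdict(list)
    for label, index in messages:
        # Skip values that are too short to hold a 22-byte header block.
        if len(index) >= 22:
            groups[index[6:22].hex()].append(label)
    return dict(groups)
```

Messages that end up in the same group belong to the same conversation, so their header dates and contents can be compared side by side.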

3. The modified appointment

In another investigation we found an appointment whose conversation topic contained one of the
keywords we were looking for. However, the appointment had an entirely different subject, and the
last modification date and time already indicated that the appointment had been modified at a
later date.

We needed to be certain that this behavior was caused by modifying an appointment. Using
Outlook we created a PST file with an appointment. libpff provided us with the following
information about the subject and the conversation topic:

0x0037 (PidTagSubject : Subject)
0x001f (PT_UNICODE : UTF-16 Unicode string)
Unicode string  : ^A^ATest1
0x0070 (PidTagConversationTopic : Conversation topic)
0x001f (PT_UNICODE : UTF-16 Unicode string)
Unicode string  : Test1

And about the date and time values:

0x0039 (PidTagClientSubmitTime : Client submit time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:47 UTC
0x0071 (PidTagConversationIndex : Conversation index)
0x0102 (PT_BINARY : Binary data)
Header block:
 Filetime         : Jul 23, 2009 14:07:47 UTC
 GUID             : 11111111-2222-3333-4444-555555555555
0x0e06 (PidTagOriginalDeliveryTime : Message delivery time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:47 UTC
0x3007 (PidTagCreationTime : Creation time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:04:28 UTC
0x3008 (PidTagLastModificationTime : Last modification time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:50 UTC

The ^A characters in the subject are control characters and can be ignored.

Note that the creation and last modification date and time are not equal.

Next we modified the appointment and had libpff provide us with information about the subject
and the conversation topic:

0x0037 (PidTagSubject : Subject)
0x001f (PT_UNICODE : UTF-16 Unicode string)
Unicode string  : ^A^AModified1
0x0070 (PidTagConversationTopic : Conversation topic)
0x001f (PT_UNICODE : UTF-16 Unicode string)
Unicode string  : Test1

And about the date and time values:

0x0039 (PidTagClientSubmitTime : Client submit time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:47 UTC
0x0071 (PidTagConversationIndex : Conversation index)
0x0102 (PT_BINARY : Binary data)
Header block:
 Filetime         : Jul 23, 2009 14:07:47 UTC
 GUID             : 11111111-2222-3333-4444-555555555555
0x0e06 (PidTagOriginalDeliveryTime : Message delivery time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:47 UTC
0x3007 (PidTagCreationTime : Creation time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:04:28 UTC
0x3008 (PidTagLastModificationTime : Last modification time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:08:37 UTC

As you can see, the conversation topic and index do not change when an appointment is modified.
The last modification date and time in the example is not much of an indication that the
appointment was modified, mainly because we did the modification right after the creation of the
appointment.
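
The comparison between subject and conversation topic is easy to script once both values have been exported. A minimal Python sketch with hypothetical function names; it drops the leading control characters (the ^A^A shown above) before comparing:

```python
# All ASCII control characters, used to strip a prefix such as ^A^A.
CONTROL_CHARACTERS = "".join(chr(code) for code in range(32))


def strip_control_prefix(subject):
    """Drop leading control characters from a subject string."""
    return subject.lstrip(CONTROL_CHARACTERS)


def subject_differs_from_topic(subject, conversation_topic):
    """Return True if the subject no longer matches the conversation topic."""
    return strip_control_prefix(subject).strip() != conversation_topic.strip()
```

For the two exports above, the call with "\x01\x01Test1" and "Test1" returns False, and the call with "\x01\x01Modified1" and "Test1" returns True. A mismatch is a reason to look closer, not proof of tampering, since a subject can also differ from the topic for ordinary reasons such as a reply prefix.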

4. Conclusion

E-mails and appointments in Outlook/Exchange provide us with certain properties that can be
useful for digital forensic analysis of e-mails, like the conversation index and multiple formatted
body texts. Others may be the conversation topic and original creation and/or modification dates
and times.

Appendix A. References

[METZ09]
Title:     Personal Folder File (PFF) forensics
Subtitle:  Analyzing the horrible reference file format
Author(s): Joachim Metz
URL:       http://kent.dl.sourceforge.net/sourceforge/libpff/PFF_forensics.pdf

[MSDN]
Title:     Tracking conversations
URL:       http://msdn.microsoft.com/en-us/library/cc765583.aspx

Perl Fu: Email Discovery

Filed under Email Investigations, eDiscovery

Hal Pomeranz, Deer Run Associates

I hope Mike Worman doesn’t hate on me for stealing his “Perl Fu” idea, but I recently have been dealing with a task that is perfect for Perl.  One of my customers is having to do a laborious discovery process through a huge email archive that is in “Unix mailbox format”, meaning large text files with the email messages all concatenated together.  They need to find any one of a list of relevant keywords in messages stored in these hundreds of gigabytes of large text files and output the entire text of the matching email messages.

Unix mailbox format is a file format that I’ve dealt with a lot, and I’ve written many scripts to parse these kinds of files.  So it probably took me less time to write the script to do this than it’s going to take me to write this blog post.  But I figured this is a task that other readers of the blog might encounter from time to time, so here’s the code:

#!/usr/bin/perl
# mgrep -- match patterns and output messages from Unix mailbox files
# Usage: mgrep [-i] [-f file] [pattern] file1 ...

use strict;
use warnings;
use Getopt::Std;

my %opts = ();
getopts('if:', \%opts);

# Build the pattern from a file of patterns (-f) or from the command line.
my $pattern = undef;
if (defined($opts{'f'}) && length($opts{'f'})) {
    open(my $fh, '<', $opts{'f'}) ||
        die "Can't open pattern file $opts{'f'}: $!\n";
    my @lines = <$fh>;
    close($fh);
    chomp(@lines);
    $pattern = '(' . join('|', @lines) . ')';
}
else {
    $pattern = shift(@ARGV);
}
die "Usage: mgrep [-i] [-f file] [pattern] file1 ...\n"
    unless defined($pattern);
$pattern = "(?i)$pattern" if ($opts{'i'});

# Collect lines into $message; a "From " line starts the next message.
my $message = '';
while (<>) {
    if (/^From\s/) {
        print $message if ($message =~ /$pattern/s);
        $message = '';
    }
    $message .= $_;
}
print $message if ($message =~ /$pattern/s);

The actual meat of the program is the “while (<>) …” loop down in the bottom third of the code.  We spend more code processing arguments and setting up the pattern match than on actually processing the input files.  But here are some notes to help you make sense of what’s happening in the program:

  1. First we “use strict” to have Perl help us enforce good programming practice in our script, like pre-declaring variables with “my” to help prevent typos and other errors.
  2. Then we incorporate the standard Perl command line argument processing library (“use Getopt::Std”) and call getopts() to process the command line arguments.  Here we’re specifying that our program accepts both “-i” (case insensitive matching) and “-f” to specify a file name containing a list of patterns to match against.  The “:” after the “f” in the getopts() string means that “-f” expects an argument, namely the file name.  Any options that getopts() finds will be stored in the “%opts” hash.
  3. Next our “if” block checks to see if the “-f” option was set.  If so, then we attempt to open the specified file name and read in its contents (“die” causes the program to abort if the file can’t be opened).  We use chomp() to remove the newlines from the lines we read in and then we concatenate all of the patterns together to form a pattern string like “(pattern1|pattern2|…)” (“pattern1 or pattern2 or …”).  Note that if “-f” was not set, then we just read the pattern in from the command line like the normal Unix grep program (that’s the “else { … }” block).
  4. Next we check to see if the “-i” (case-insensitive match) option is set.  If so, then we add “(?i)” at the front of our pattern.  In a Perl pattern match, this is one way to express case-insensitive matching.
  5. Now we’re finally ready to start processing our input files.  The “while (<>) { … }” construct is a useful bit of Perl shorthand that emulates the standard Unix command-line processing.  Specifically it means that if there are any remaining command-line arguments, they should be treated as file names and opened sequentially and all lines processed one at a time from each file.  If there are no unused arguments on the command line after our argument processing, then the program should look for its input from the standard input.
  6. Within the body of the loop, we’re processing our input one line at a time.  At the end of the loop we’re simply concatenating the lines we read into the “$message” variable that holds our message text.  “$_” is the magic Perl variable that represents the text of the line we’re currently processing, and “$message .= $_” means “append $_ to the text already in $message”.
  7. Now for the uninitiated, Unix mailbox format is nothing but a large text file with messages concatenated one after the other.  You can recognize the start of each new mail message when you find a line that begins “From<whitespace>”.  Our “if { … }” block at the top of the loop matches this pattern as an indication that we’ve reached the end of one message and are starting in on another.  If the message we’ve collected so far matches the pattern specified by the user then we print the entire contents of the mail message.  Then we empty our “$message” variable so we can start collecting the next mail message.
  8. After we’ve processed all of our input files, we still need to determine whether or not we should output the last message from the last file we processed.  That’s why there’s one more print statement after the end of the loop.
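
For readers who don’t speak Perl, here is a rough Python sketch of the same message-splitting idea.  It is an illustration, not a port of mgrep: the function name and its arguments are made up for this example, and it takes any iterable of lines rather than handling command-line options.

```python
import re

# A line starting with "From" plus whitespace marks the start of a message.
FROM_LINE = re.compile(r"From\s")


def matching_messages(lines, pattern, ignore_case=False):
    """Yield each complete message whose text matches pattern.

    lines is any iterable of text lines, such as an open mailbox file.
    """
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    regex = re.compile(pattern, flags)
    message = []
    for line in lines:
        if FROM_LINE.match(line) and message:
            text = "".join(message)
            if regex.search(text):
                yield text
            message = []
        message.append(line)
    # The last message has no following "From " line to trigger the check.
    text = "".join(message)
    if message and regex.search(text):
        yield text
```

Calling it with an open file, as in “for text in matching_messages(open(path), 'keyword1|keyword2', ignore_case=True): print(text, end='')”, behaves much like “mgrep -i”.  Because it is a generator, only one message is held in memory at a time, which matters when the mailbox files run to gigabytes.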

Whew!  That’s a lot of words for a simple script, but I hope it helps you wrap your head around some of the more obscure bits of Perl syntax and gives you some ideas for writing your own scripts.  By the way, because I chose to use Perl for this task, one of the happy accidents is that we can actually use the Perl regular expression syntax for the patterns we give as input to the program (whether we put them in a file or specify them on the command line).  This is good news because Perl’s pattern matching syntax is much more flexible and expressive than the one used by the regular Unix grep command.

Happy email hacking!

Hal Pomeranz is an independent IT/Computer Security Consultant and a SANS Faculty Fellow.  He is available as a strolling Perl programmer for weddings and bar mitzvahs.

Forensics and Data Access Auditing

Filed under Computer Forensics, Evidence Analysis, Incident Response, eDiscovery

by Craig Wright

Data access auditing is a surveillance control that intersects with forensics and incident handling. In all events, the same level of care needs to be taken as any event can lead to a forensic engagement. By monitoring access to all sensitive information contained within the database, suspicious activity can be brought to the examiner’s awareness. Databases commonly structure data as tables containing columns (think of a spreadsheet, only more complex). Data access examinations should address six questions:

  1. Who accessed the data?
  2. When was the data accessed?
  3. How was the data accessed? (That is, what computer program or client software was used?)
  4. Where was the data accessed from? (That is, the location on the network or Internet.)
  5. Which SQL query was used to access the data?
  6. Was the attempt to access the data successful? (And if yes, how much data was retrieved?)

The evidence available to the forensic examiner can be found:

  • Within the client system (this may be infeasible – such as in web based commerce systems or remote clients),
  • Within the database (including the logs produced by the database that are sent to a remote system), or
  • Between the client and the database (such as firewall logs, IDS/IPS devices and host based events and logs).

An analysis within the client entails using the evidence available on the client itself and forms a standard forensic engagement. Client systems can hold a wealth of database access tools and the logs that these create. These logs may contain lists of end-user activity that a user has performed on the database. In respect of web based systems, the web server itself may be treated as a client of sorts.

To obtain an adequate audit trail from client systems alone, all data access must have occurred using client tools under the control of the organization conducting the review. If data access can occur by other means, it is rare that sufficient evidence will be available. Taken by itself this is the worst option available to the examiner, but it can provide additional evidence in support of the other methods. It is chiefly used in the event of a forensic investigation.

An examination within the database is often problematic due to:

  • A limited audit functionality of many database management systems (DBMS),
  • Inconsistent DBMS configurations and types being deployed throughout an organization, and
  • Performance losses due to enabling the forensic tools and mechanisms

An examination within the database is without doubt better than auditing within the client. However, the best approach is a combination of auditing the client, the network and the database.

Capturing data between the client and the database entails monitoring the communication between the two, which means capturing and interpreting the traffic that passes between them. Software is available for this and it may be used to provide data access auditing. The biggest issues with this type of data access examination are:

  • Encryption between the client and the database server,
  • Privacy considerations and rights to view data, and
  • Correlating large volumes of data that also need to be parsed and processed to be useful.

The issue with network capture is that this requires planning. Most organisations do not have forensic captures of all transactions.

Craig Wright is a Director with Information Defense in Australia. He holds both the GSE-Malware and GSE-Compliance certifications from GIAC. He is a perpetual student with numerous post graduate degrees including an LLM specializing in international commercial law and ecommerce law as well as working on his 4th IT focused Masters degree (Masters in System Development) from Charles Sturt University where he is helping to launch a Masters degree in digital forensics. He is starting his second doctorate, a PhD on the quantification of information system risk at CSU in April this year.

SQL, Databases and Forensics

Filed under Computer Forensics, Reverse Engineering, eDiscovery

by Craig Wright

For the most part, databases have become an integral part of any organization. More importantly, they have become mission critical. On top of this, many enterprise level databases are far larger than any disk you are likely to encounter. As an example, I was required to image a database that belonged to an insurance company. This database was 68TB in total size and it was business critical. The consequence is that you need to start thinking of other ways to do forensic work on databases.

As with all live system forensics, begin with gathering the evidence required starting from the most volatile and working toward that which is unlikely to change. When doing this, remember to:

  • Protect the Audit Trail – Protect the audit trail so that audit information cannot be added, changed, or deleted.
  • Access only pertinent data and limit your actions – In order to avoid cluttering the meaningful information and changing the evidence, plan and target all database activities before you start.
  • ERDs (entity relationship diagrams) are your friend.

Triggers and T-SQL code for analysis are rarely added into databases, but you should check. There may be something that could provide additional levels of logging and recording already on the system. The transaction logs can recreate an entire database – these are essential.

Think of other areas to look…

One form of volatile data that is usually overlooked is the “Plan Cache”. When a SQL statement is submitted to be parsed by the query processor, the query processor will identify the lowest computational cost strategy to retrieve the requested data – this is an execution plan. An execution plan can be recovered from the Plan Cache and used to reconstruct the activity (such as that of an attacker). This is a source of overlooked but highly volatile evidence that can be used to recreate the execution history from stored procedures, function execution and even command line SQL queries.

ERDs

XCase and DbVisualiser are a couple of great tools for database work. They create a visual map of the database and its tables. These are also known as CASE tools.

CASE tools can be a great aid to incident response and forensic work involving database systems. CASE or Computer Assisted Software Engineering tools not only help in the development of software and database structures but can be used to reverse engineer existing databases and check them against a predefined schema. There are a variety of both open source and commercial CASE tools.

With more and more commercial databases running over the terabyte size, standard command line SQL coding is unlikely to find all of the intricate relationships between these tables, stored procedures and other database functions. A CASE tool on the other hand can reverse engineer existing databases to produce diagrams that represent the database. These can be compared with existing schema diagrams to ensure that the database matches the architecture that it is originally built from and to quickly zoom in on selected areas. This can be done either from a live SQL system or a disk image.

Visual objects, colors and better diagrams may all be introduced to further enhance the capacity to analyze the structure. Reverse engineering a database will enable the determination of various structures that have been created within the database. Some of these include:

  • The indexes,
  • Fields,
  • Relationships,
  • Sub-categories,
  • Views,
  • Connections,
  • Primary keys and alternate keys,
  • Triggers,
  • Constraints,
  • Procedures and functions,
  • Rules,
  • Table space and storage details associated with the database,
  • Sequences used and finally the entities within the database.

Each of the tables will also display detailed information concerning the structure of each of the fields that may be viewed at a single glance. In large databases a graphical view is probably the only method that will adequately determine if relationships between different tables and functions within a database actually meet the requirements. It may be possible in smaller databases to determine the referential integrity constraints between different fields, but in a larger database containing thousands of tables there is no way to do this in a simple manner using manual techniques.

Fig. 1 Display database schema.

It is not just security functions such as cross site scripting and SQL injection that need to be considered. Relationships between various entities and the rights and privileges that are associated with various tables and roles also need to be considered. The CASE tools allow us to visualize the most important security features associated with a database. These are:

  1. Schemas restrict the views of the database for users,
  2. Domains, assertions, checks and other integrity controls defined as database objects which may be enforced using the DBMS in the process of database queries and updates,
  3. Authorization rules. These are rules which identify the users and roles associated with the database and may be used to restrict the actions that a user can take against any of the database features such as tables or individual fields,
  4. Authentication schemes. These are schemes which can be used to identify users attempting to gain access to the database or individual features within the database.
  5. User defined procedures which may define constraints or limitations on the use of the database,
  6. Encryption processes. Many compliance regimes call for the encryption of selected data on the database. Most modern databases include encryption processes that can be used to ensure that the data is protected.
  7. Other features such as backup, check point capabilities and journaling help to ensure recovery processes for the database. These controls aid in database availability and integrity, two of the three legs of security.

CASE tools also contain other functions that are useful when conducting a forensic analysis of a database. One function that is extremely useful is model comparison.

Fig. 2 Reverse Engineer existing databases into presentation quality diagrams in minutes.

CASE tools allow the forensic analyst to:

  • Present clear data models at various levels of detail using visual objects, colors and embedded diagrams to organize database schemas,
  • Synchronize models with the database,
  • Compare a baseline model to the actual database (or to another model).

CASE tools can generate code automatically and also store this for review and baselining. This includes:

  • DDL Code to build and change the database structure
  • Triggers and Stored Procedures to safeguard data integrity
  • Views and Queries to extract data

Model comparison involves comparing the model of the database with the actual database on the system. This can be used to ensure change control or to ensure that no unauthorized changes have been made and that the data integrity has been maintained. To do this, a baseline of the database structure will be taken at some point in time. At a later time the database could be reverse engineered to create another model and these two models could be compared. Any differences, variations or discrepancies between these would represent a change. Any changes should be authorized changes and if not, should be investigated. Many of the tools also have functions that provide detailed reports of all discrepancies.
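
The idea behind model comparison can be shown without a CASE tool. The following is a minimal Python sketch that uses the sqlite3 module from the standard library as a stand-in for an enterprise DBMS; this is an assumption made to keep the example runnable. It reads the schema from SQLite's sqlite_master catalog table, and the function names are hypothetical. On another DBMS the snapshot would have to be taken from that system's own catalog.

```python
import sqlite3


def schema_snapshot(connection):
    """Map (type, name) of each schema object to its defining SQL text."""
    rows = connection.execute(
        "SELECT type, name, sql FROM sqlite_master ORDER BY type, name"
    )
    return {(kind, name): sql for kind, name, sql in rows}


def compare_snapshots(baseline, current):
    """Return objects that were added, removed or changed since baseline."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(
            key for key in set(baseline) & set(current)
            if baseline[key] != current[key]
        ),
    }
```

A baseline taken with schema_snapshot() would be stored away from the system under review. Comparing it with a later snapshot lists, for example, a trigger that was added or a table whose definition was altered, and each entry can then be checked against the change control records.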

Many modern databases run into the terabytes and contain tens of thousands of tables. A baseline and automated report of any differences, variations or discrepancies makes the job of finding a change on these databases much simpler. Triggers and stored procedures can be stored within the CASE tool itself. These can be used to safeguard data integrity. Ideally, selected areas within the database, such as honeytoken-styled fields or views, will have been set up so that they can be checked against a hash at different times to ensure that no one has altered any of these areas of the database. Further, the data in these database tables should not change. Tables of hashes may be maintained and validated using the offline model that has stored these hashes already. Any variation would be reported in the discrepancy report.
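The hash check on a honeytoken area can be sketched as follows. This is an illustration under assumptions, not a feature of any specific CASE tool: the honeytokens table, the table_digest helper and the sample card numbers are all made up, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
import sqlite3


def table_digest(conn, query):
    """Hash every row returned by a fixed, ordered query."""
    digest = hashlib.sha256()
    for row in conn.execute(query):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


# Hypothetical honeytoken table that no legitimate process should modify.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE honeytokens (id INTEGER PRIMARY KEY, card_number TEXT)")
conn.executemany(
    "INSERT INTO honeytokens VALUES (?, ?)",
    [(1, "4000-0000-0000-0001"), (2, "4000-0000-0000-0002")],
)

# The ORDER BY keeps the row order stable so the digest is repeatable.
CHECK_QUERY = "SELECT id, card_number FROM honeytokens ORDER BY id"
stored_digest = table_digest(conn, CHECK_QUERY)  # kept offline with the model

# Later: someone alters a honeytoken row.
conn.execute("UPDATE honeytokens SET card_number = '4000-0000-0000-9999' WHERE id = 2")
altered = table_digest(conn, CHECK_QUERY) != stored_digest
print(altered)  # True
```

The stored digest only has value if it is kept somewhere the database users cannot reach, which is why the offline model is a sensible home for it.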

Next, the capability to create a complex ERD or Entity Relationship Diagram in itself adds value to the engagement. Many organizations do not have a detailed map of the database structure, which has grown organically over time, with many of the original designers having left the organization. In this event it is not uncommon for the organization to have no idea about the various tables that they have on their own database.

Another benefit of CASE tools is their ability to migrate data. CASE tools have the ability to create detailed SQL statements and to replicate through reverse engineering the data structures. They can then migrate these data structures to a separate database that can be used for analysis offline. This is useful as the data can be copied to another system. That system may be used to interrogate tables without fear of damaging the data. In particular the data that has migrated to the tables does not need to be the actual data, meaning that the examiner does not have access to sensitive information but will know the defenses and protections associated with the database and can extract selected information without accessing all of the data.

This is useful as the examiner can then perform complex interrogations of the database that may result in damage to the database if it was running on the live system. This provides a capability for the examiner to validate the data in the database against the business rules and constraints that have been defined by the models and generate detailed integrity reports. This capability gives an organization advanced tools that will help them locate faulty data subsets and other sources of evidence through the use of automatically generated SQL statements.

Craig Wright is a Director with Information Defense in Australia. He holds both the GSE-Malware and GSE-Compliance certifications from GIAC. He is a perpetual student with numerous postgraduate degrees including an LLM specializing in international commercial law and ecommerce law as well as working on his 4th IT focused Masters degree (Masters in System Development) from Charles Sturt University where he is helping to launch a Masters degree in digital forensics. He is starting his second doctorate, a PhD on the quantification of information system risk at CSU in April this year.

Law Is Not A Science: Admissibility of Computer Evidence and MD5 Hashes

Filed under Computer Forensics, Digital Forensic Law, eDiscovery

Another day… another hashing discussion:

On the SANS GIAC Alumni list the other day, the question popped up from one of the individuals on the list:

“I’m assuming that this group has had the pleasure to consume the latest research focused on MD5 hash collisions.  Discussions about hash collisions seems to carry the same energy as religion and politics.   My question is regarding digital evidence and the use of MD5 hashes to establish digital evidence integrity.  The use of hashes to ensure digital evidence integrity has legal precedence. However, as more research companies introduce concerns related to MD5 hashes, the courts will at some point, no longer consider this as a valid technology to ensure integrity.

Has anyone heard of a successful attempt to dismiss evidence due to concerns that MD5 is no longer considered tamper proof?”

This topic pops up from time to time in our Computer Forensics classes at SANS (er… pretty much every time…)

The answer:

First off, as of today, using the MD5 algorithm as a form of hashing for digital forensic work is completely acceptable.

You can use additional means of hashing, but honestly, choose which algorithm you feel is best.  As long as you are accomplishing hashing of evidence you are fine and your evidence will usually see its day in court.

Why?

Admissibility guidelines do not differentiate between physical and electronic evidence.  The Federal Rules of Evidence (FRE rules 901 and 902) guide authentication of evidence for admissibility (http://federalevidence.com/advisory-committee-notes).  Nowhere do they state that electronic evidence will be treated differently than physical evidence for authentication purposes.

  • Could you get electronic evidence admitted without hashing? Yep.
  • Will hashing help admissibility of my evidence? Certainly, but it is not legally required.
  • What if someone brings up collisions in court? Again, usually an attempt to confuse the jury.  But you can turn this on them by stating that it is more likely that before showing up for jury duty, all the jurors randomly put the same 7 numbers into the Powerball Lottery and won. That has a much greater chance of happening than a naturally occurring collision. (Thanks to Scott Moulton for that great analogy).  With folks being prosecuted on partial fingerprint matches or eye witness testimony from a guy driving by in a car at 30 MPH, do we really think this is a show stopper for courts?
  • Interesting Rob, but anyone with some legal credentials to back up what you are telling us? Yes, our very own author/senior instructor Richard Salgado for Computer Forensics at SANS wrote a wonderful paper on the topic several years ago for Harvard Law Review (http://www.harvardlawreview.org/forum/issues/119/dec05/salgado.pdf) that states “…there is more than reasonable assurance that two different inputs will not have the same hash value.” (see footnotes 7 & 8)
  • If hashing is not legally required to prove authenticity, why do we use hashing, chain of custody, and proper storage of evidence in case of pending litigation?  Two point five reasons:

1.  Expert Witness:

Best practices are tested if you are deposed as an expert. Hashing (any form) is considered a best practice for digital forensic practitioners.  If you take yourself seriously in this line of work and you do not perform any type of hashing then you open yourself up for a cross examination as an expert that would not be fun to sit through.  “The court is called upon to reject testimony that is based upon premises lacking any significant support and acceptance within the scientific community,” (http://federalevidence.com/advisory-committee-notes#Rule702). If you would like your testimony to hold greater weight, HASH. ’nuff said.

2.  Tampering.

Tampering can only be brought up if the opposing counsel has a strong argument that the evidence has been deliberately modified.  Tampering cannot just be brought up because it is digital evidence and easily modified… the opposing side has to prove it happened.  The burden is on the side claiming that tampering happened, not the side entering the evidence (see http://www.usdoj.gov/criminal/cybercrime/s&smanual2002.htm and do a search for “Authenticity and the Alteration of Computer Records”). With hashing (even using an algorithm such as MD5), you can reduce the threat that someone will claim the evidence has been tampered with if you can prove over time it has not changed.  In that case, collisions are really not a big deal at all as long as you get the same hash every time you calculate it against the evidence.

Why is MD5 still ok?  From the cited website:  “The existence of an air-tight security system [to prevent tampering] is not, however, a prerequisite to the admissibility of computer printouts. If such a prerequisite did exist, it would become virtually impossible to admit computer-generated records; the party opposing admission would have to show only that a better security system was feasible.”

One last thought from Eoghan Casey on this topic: “On May 24, 2006, the DFRWS posted a challenge asking for anyone to produce actual files (or evidence) that have produced a collision and nobody has succeeded yet!”
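For anyone who wants to see what “the same hash every time” looks like in practice, here is a minimal sketch. It assumes nothing about your imaging tool: hash_evidence is a hypothetical helper, and the temporary file merely stands in for a disk image.

```python
import hashlib
import os
import tempfile


def hash_evidence(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large images never have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in for an evidence image (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as image:
    image.write(b"stand-in for a disk image")
    image_path = image.name

acquisition_hash = hash_evidence(image_path)  # recorded at acquisition time
later_hash = hash_evidence(image_path)        # recalculated before trial
print(acquisition_hash == later_hash)         # True: the evidence is unchanged

# A single appended byte is enough to change the hash.
with open(image_path, "ab") as handle:
    handle.write(b"!")
tampered_hash = hash_evidence(image_path)
print(tampered_hash == acquisition_hash)      # False

os.remove(image_path)
```

Passing algorithm="sha1" or algorithm="sha256" works the same way if you prefer to record more than one hash.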

2.5. Law Is Not A Science:

I tell students this regularly…  We (you and I) are technical.  We grew up loving math.  We feel that if we add 1+1 we will always get 2.  This is why it is a science.  1+1=2 Repeatable. 1+1=2 Satisfying.  Feels good doesn’t it? 1+1=2

Well, let’s take that same formula from our nice scientific world and put it in the legal world.

Court 1: 1+1=2

Court 2: 1+1=2

Court 3: 1+1=3

See what happened there?  We ended up with some bizarre result.  This drives us crazy. Well, in reality, this is not exactly what happens.  What does happen?  What if you take the SAME evidence, the SAME analysis, the SAME conclusions…  you drop that into TEN separate courts, you will probably end up with the same verdict 9 times out of 10.

HOWEVER, (comma, space, pause for additional dramatic effect) there is always at least one jury/judge that will think differently and rule the other way given the SAME evidence, arguments, and testimony.  We need to realize that we cannot force our mindset onto a system that is not a science, but rather, is an art. As a result, like the core question asks about MD5 hashing, we think we need to “fix” the courts or come up with a system that is FAIL proof.

In the instances where we might find that MD5 is attacked in court and subsequently not used for authentication in a courtroom, we can point to a variety of reasons.  In the several cases my peers and I have reviewed, it appeared that the prosecution failed to produce an expert to discuss hashing.  Generally all the expert would need to accomplish is to discuss the true likelihood of a collision… which is far less likely than even a collision with DNA evidence.  It isn’t whether the hashing standard has a fault, but whether it is GOOD enough… 1+1=3.  DNA analysis, fingerprinting, and eye witness testimony all have their faults… but are they good enough to convict?  YEP.  Have criminals been let off due to the fact that the prosecution could not produce a DNA expert to discuss the likelihood of a false positive?  Even worse, the judge/jury listens to the explanation and still rejects it.  You don’t have to dig far to find cases where individuals are not convicted despite compelling scientific evidence pointing the other way. 1+1=3

And here is the kicker… even though one or two courts rule against the scientific facts such as DNA evidence (or countless others), it does not set precedent and invalidate DNA evidence from here to the end of time.

So…  what do the lawyers think?

The best way to see why law and science do not mix well is to view it from a lawyer’s perspective. This is an excerpt from one of my favorite legal blogs on the subject written by Ralph Losey who has a wonderful book called e-Discovery Current Trends and Cases (worth a read if you deal with litigation and you work in IT).  It is a rather long blog entry, but read it if you have the time.  It doesn’t directly discuss MD5 hashing, but you will see why such a discussion about MD5 hashing being admissible or not due to collisions probably drives the lawyers crazy… just like it drives us crazy when we end up with 1+1=3 in their world.

From the blog: (http://ralphlosey.wordpress.com/2008/08/24/tech-v-law-a-plea-for-mutual-respect/)

…the practice of law is an art, not a science, and the human element can never be replaced by technology.

Unlike computer code, the rules of law are malleable and there are always exceptions. This in turn is one of the key reasons the two cultures of Law and IT have such a hard time understanding one another. It is also the reason a few inexperienced engineer types are delusionary and arrogant enough to think that e-discovery can be “fixed” with the right software algorithms. It cannot because law is not a science, it is far too complex and chaotic for that. Or if it is a science, it is more like Quantum Physics, where electrons are unpredictable and can be in two places at once, not the orderly world of Newtonian Science that most engineers live in.

Yes, there are many computer programs that can be used as effective tools in the pursuit of justice. We lawyers need to wake up to that fact. But so too do the technologists who think the right software alone will fix everything. The human element is key in Law which is one reason that training is so important.

Rob Lee – (rlee@sans.org)

Rob Lee is a Principal Consultant for MANDIANT, a leading provider of information security consulting services and software to Fortune 500 organizations and the U.S. Government. Rob has over 12 years of experience in computer forensics, vulnerability discovery, intrusion detection and incident response. Rob is the lead course author and faculty fellow for the computer forensic courses at the SANS Institute.

Digital Forensics Professionals: Texas PI Legislation Interpreted

Filed under Certification and License, Computer Forensics, eDiscovery

Automated Traffic Enforcement Opinion:  Relevant to Electronic Discovery Work?

A Texas state government agency has published a formal opinion interpreting controversial new legislation on the licensing of computer forensics experts as private investigators. The Texas Private Security Bureau says it “generally” feels the private administrators of traffic enforcement cameras need not be licensed as PIs. The ruling may help us construe this new law in other contexts, such as e-discovery performed by computer forensics professionals.

The agency’s reasoning is that the companies running traffic cameras are engaged in only “ministerial” activities at the direction of public servants (i.e. city employees). But the Bureau says its opinion applies only “generally” to traffic camera operators because some operators might be doing more than mere “ministerial” activities.

To say it a different way: It is the opinion of this arm of Texas government that private red-light camera administrators are not performing the kinds of investigations that require a license. Instead, says the agency, the cities (like College Station, Texas) that hire the administrators are the entities performing the investigations. And the legislation excludes city governments from licensure.

Benjamin Wright is a Senior Instructor for the SANS Institute, teaching data security law courses remotely, on-site and at national conferences.

Destruction of adverse documents

Filed under Email Investigations, Evidence Acquisition, eDiscovery

In most western (at least the common law) jurisdictions, it is an offence to destroy any document that is or may be used as evidence in an ongoing or potential judicial proceeding. An organization must not destroy documents on the grounds that the evidence is unfavorable. Destroying documents that are suspected to be subject to litigation may result in a charge of obstruction of justice. This makes identifying material that was destroyed after a litigation hold took effect a key goal of the forensic investigator.

Adverse inferences are often upheld in litigation if a party cannot produce the required documents. There is also the hazard of reputation damage. In British American Tobacco Australia Services Limited v Roxanne Joy Cowell for the estate of Rolah Ann McCabe [2002] VSCA 197 the judge at first instance severely criticized BAT for the methodical destruction of a large number of records. Documents that may hold evidentiary value need to be retained. Ironically, implementing a record retention policy without taking proper precautions will generally draw an adverse inference from the court if there is any departure from the policy.

Consequently, a retention policy also requires ongoing education about the policy and the procedures used to enforce it, along with regular re-examination of its content. Where a document has been deliberately destroyed, the court is likely to come to a negative determination.

The litigation process of discovery
Discovery is the progression of events that follows the initiation of legal proceedings. A matter will proceed to Court only after all parties have delivered up relevant documents or have presented testimony that they cannot provide these documents. The process of e-discovery involves electronic records such as emails.

Rigidly enforced time limits make it vital for the parties to be able to retrieve documents and emails promptly. The forensic investigator has a duty to uncover breaches of litigation hold. Documents destroyed in the period after a party learns of a lawsuit, for instance, come under this category.

Expectation of Privacy
Privacy in the workplace is a contentious subject. The definitions of privacy, and its means of protection, vary by jurisdiction. Employee email is commonplace and is used for both work and private purposes. Organizations have stringent legal requirements in the European Union, Australia, the United States, and other jurisdictions to guard information on private individuals from unauthorized disclosure.

The expectation of privacy does not provide the right to destroy evidence. It is a matter for the court to determine if a file is relevant to a particular case or if it may be excluded.

How strong can the law be?
To answer this, I put forth an example of a fairly recent Australian law. The Victorian Crimes (Document Destruction) Act 2006 (the Document Destruction Act) was passed into law in Victoria (an Australian State) in 2006. Together with the Evidence (Document Unavailability) Act 2006 (the Document Unavailability Act), these pieces of legislation amend the Victorian Crimes Act 1958 and Evidence Act 1958, respectively. They were issued in response to concerns raised by the Report on Document Destruction and Civil Litigation in Victoria, by Professor Peter Sallmann. These documents add weight to the need for all companies to comprehend their responsibility in respect of how they store or destroy any documents. This incorporates email and other electronic files.

The Document Destruction Act establishes additional criminal penalties and the Document Unavailability Act sets up new civil consequences. The Document Destruction Act affects acts carried out in Victoria such as those by companies resident (or engaging in business) within Victoria. The Document Unavailability Act pertains to civil proceedings initiated within Victoria.

These particular acts are focused on proceedings that have been started within a single state in Australia. Individual laws may vary (and at times be unclear), but it is nearly universal that the destruction of a document that could be used as evidence in a court is a crime. Where this really comes into effect is that evidence of the destruction of a document can in fact be worse than the material that the destroyed document may have contained.

Craig Wright, GCFA Gold #0265, is an author, auditor and forensic analyst. He has nearly 30 GIAC certifications, several post-graduate degrees and is one of a very small number of people who have successfully completed the GSE exam.

PTK: Evidence adding and Indexing

Filed under Computer Forensics, Email Investigations, Evidence Acquisition, Evidence Analysis, Incident Response, Memory Analysis, Mobile Device Forensics, Write Blockers, eDiscovery

At the moment, three output formats are mainly used in computer forensics for media duplication:

●    dd (RAW image) – the best and most utilized format
●    EnCase format (EWF) – closed format now widely supported by the CF products
●    AFF Lib Format – very complete but still expanding

PTK can recognize the formats listed above. Usually, a media copy is stored as a single file or as split files. PTK is able to recognize a split image and, given the first chunk, automatically import the additional files. No log files or other types of data are allowed inside the evidence directory (i.e. file.e01, file.e02, file.log is not permitted). Through TSK, PTK automatically recognizes every partition in the image, including support for the following file system types: NTFS, FAT, UFS 1, UFS 2, EXT2FS, EXT3FS, and ISO 9660. One may also define, if necessary, the original time zone. Remember that for the FAT file system, time information is saved according to the local system time. With PTK, during the import of a FAT image, the timestamps are converted from the original system’s local time into GMT/UTC time. For the NTFS file system, the timestamps are already saved in GMT/UTC format and thus the time zone setting is only a display parameter that can be changed at any time. For every piece of evidence added, you can also calculate the hash (MD5, SHA1) and check it against a known value.
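The FAT conversion described above can be sketched in a few lines. This is not PTK’s code. The function name fat_local_to_utc and the sample timestamp are hypothetical, and the sketch assumes a fixed UTC offset for the original system, so it deliberately ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def fat_local_to_utc(local_timestamp, utc_offset_hours):
    """Interpret a naive FAT timestamp in the source system's zone, return UTC."""
    source_zone = timezone(timedelta(hours=utc_offset_hours))
    return local_timestamp.replace(tzinfo=source_zone).astimezone(timezone.utc)


# Hypothetical FAT timestamp recorded on a machine running at UTC+1.
recorded = datetime(2009, 3, 14, 9, 30, 0)
print(fat_local_to_utc(recorded, 1).isoformat())  # 2009-03-14T08:30:00+00:00
```

If the wrong original time zone is chosen at import, every FAT timestamp in the timeline shifts by the same error, which is why the setting matters far more for FAT than for NTFS.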

File system detection

ram-dump unknown file system

If PTK is not able to identify the file system, the user can choose to import the image as a RAM dump and make use of the RAM dump analysis, or import it as a RAW image and analyse the disk through the Data Unit or run the Live Keyword Search on it. During the evidence import process it is possible to decide whether to create a symbolic link to the image or to copy the entire evidence, split or not, into the PTK images directory (%www path%/ptk/images).

Even though PTK does not change the evidence file in any way, it is advisable to always use a write blocker. If the write blocker is FireWire rather than ATA, it is recommended that you copy the entire evidence to a disk in order to improve data access speed and, consequently, performance. The indexing process requires considerable CPU and disk I/O resources. Once the evidence is imported it is possible to start working directly on it through various analysis modules (File Analysis, Live Keyword Search, Data Unit, etc.) or to start the indexing process. PTK’s indexing engine, discussed in previous articles, allows one to perform different automated tasks and produce results that all investigators assigned to the case can consult. The indexing process supplies all investigators with its analysis results, but it is launched only once by the Master Investigator. The diagram below shows the indexing process operated by PTK using TSK tools. The performance of the indexing engine has improved compared to the first beta versions.

PTK indexing form

PTK indexing engine

The next article will deal with PTK’s multi-user system, the option to prevent more than one investigator from accessing specific cases, and the bookmarking features available for every investigator.

Michele Zambelli, GCFA Silver #1856, is a member of PTK Team and a Security Consultant at DFLabs Italy.

More command line forensics fu

Filed under Computer Forensics, Evidence Analysis, eDiscovery

Recently, I was asked if I could recover all images from a hard disk drive that could be linked to a specific digital camera. In this case, the EXIF data contained the make, model and serial number of the camera in question. Using some simple command fu, I was able to quickly recover all of the images. I could have used GUI tools, but I believe in keeping my command line skills polished so I try to use them as much as I can.

Here’s how I did it. For the sake of demonstration, I’m using the ipcase_ntfs.img from SANS Security 508: Computer Forensics, Investigation and Response, but the concepts are the same for any hard drive image.

To begin with, extract the strings from the image as follows:

strings --radix=d image_file > image_strings.txt

Using the --radix=d causes the strings command to include the byte offset in decimal where the given string occurs in the image_file.

Next I grepped out all the lines matching the camera’s serial number. For this demonstration, I’ll pull out each line of the strings file that contains a reference to exif as follows:

grep -i exif image_strings.txt > hits_exif.txt

Here is a screen shot of the resulting hits_exif.txt file:

Sample contents of hits_exif.jpg

From here we can craft a single compound command line statement that will recover each file containing EXIF data and verify that the files recovered are image files. Here it is:

for k in $(for i in $(awk '{print $1}' hits_exif.txt); do j=$((i / 4096)); ifind ipcase_ntfs.img -d "$j"; done | sort | uniq); do icat ipcase_ntfs.img "$k" > "$k"; file "$k"; done

The standard out for this command is:

Standard out for our compound command

Let’s break this command down, working from the inside out. The inner for loop takes the decimal offset value for each hit in the hits_exif.txt file and divides it by 4096, which is the cluster size for our file system image. We found this out earlier in our investigation by running fsstat against the ipcase_ntfs.img file.

The quotient from this calculation corresponds to the cluster offset in the file system where the hit occurred. We feed this offset to the ifind command using the -d option, which gives us the MFT entry that points to that particular cluster. Next we pipe the MFT entries to the sort and uniq commands. The resulting unique MFT entries are passed as arguments to the icat command, which recovers the data at the given MFT entry by writing it to a file of the same name. Finally, the file command is run against each newly created file and the results are printed to standard out. According to file, all but one of the newly created files are JPEG images.
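If the arithmetic in the inner loop is hard to picture, here it is on its own as a small Python sketch. It only reproduces the offset-to-cluster step and does not call ifind or icat. The function name clusters_from_hits and the sample offsets are made up for illustration. Note one small variation: it removes duplicates at the cluster level, whereas the command above removes duplicate MFT entries after the ifind lookup.

```python
def clusters_from_hits(hit_lines, cluster_size=4096):
    """Turn 'offset string' lines from strings --radix=d into unique cluster numbers."""
    clusters = set()
    for line in hit_lines:
        fields = line.split()
        # Skip anything that does not start with a decimal byte offset.
        if not fields or not fields[0].isdigit():
            continue
        # Integer division maps a byte offset to the cluster that contains it.
        clusters.add(int(fields[0]) // cluster_size)
    return sorted(clusters)


# Hypothetical lines in the format of hits_exif.txt.
sample_hits = [
    "   8197 Exif",
    "   8300 Exif",
    "  12300 Exif",
    "  40960 JFIF Exif",
]
print(clusters_from_hits(sample_hits))  # [2, 3, 10]
```

The first two hits land in the same cluster, so only three cluster numbers would need to be looked up.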

That’s it. With very little practice you will be stringing together command line statements that will optimize the processing of forensic images. Give it a try, you’ll be surprised at just how effective and efficient you can be with a little command line fu.

Dave Hull, GCFA Silver #3368, is an aspiring maker and technologist specializing in information security. He is the principal consultant and founder of Trusted Signal.