Category Archives: Evidence Analysis

Facebook Memory Forensics

Filed under Browser Forensics, Computer Forensics, Email Investigations, Evidence Analysis, Memory Analysis, Reverse Engineering

OK, like everyone I joined facebook just to get updates on my high school reunion. (Who knew you could also use it as a possible alibi.)

But then, after writing pdgmail and pdymail and seeing all the neat personal information in facebook…tada pdfbook! Memory parsing to grab facebook info.

Like its predecessors pdgmail and pdymail, I’m following the simple premise that memory strings are easy to get to and yield a treasure of information given today’s web 2.0 world of javascript, dhtml, json, etc. Facebook, it turns out, doesn’t seem to cough up xml like yahoo or json like gmail, but rather unique class ID strings in its html.

What does this mean to forensics? Well, with a memory dump from any of the popular memory dumping tools, strings -el, and pdfbook you can get:

  • status updates
  • facebook emails
  • lists of friends
  • likely owners of the memory image

Friends come with their unique facebook ID’s like:

Story from friend: id:6815841748: Name:Barack Obama

Facebook emails are raw html with authors, dates, etc like so :

FacebookEmailDetail author: Storm Large url: http://www.facebook.com/stormlarge
FacebookEmailDetail Date: October 29 at 9:41am
FacebookEmailDetail Body: Nov 19.2009 - 8:30PM
Molly Malones - Los Angeles, California
More info:

Facebook recent activity is like so:

RecentActivity:Jeff became a fan of Fishbone.

Status updates show up like so:

StoryMessage:Jeff Bryner 2 gamble @the airport or not, that is the question.

If you’re really lucky the memory image will contain enough html to produce what pdfbook recognizes as a ‘delete’ button which is only passed out to the owner of the html content. In other words, you are allowed to delete your posts on facebook, pdfbook recognizes this and your facebook userid, correlates it and deduces that the likely owner of the memory image is:


Likely Owner of fbook memory artifacts: FacebookUserID:1421688057 Name:Jeff Bryner

A sample usage:

On a windows or linux box, use pd from www.trapkit.de, like so:
pd -p 2345 > 2345.dump

where 2345 is the process ID of running instance of IE/firefox/browser of your choice.

You can also use any memory imaging software like mdd, win32dd, etc. to grab the whole memory on the box rather than just one process. You can also use common memory repositories like pagefile.sys, hiberfil.sys, etc.

I’ll refer the reader to the memory imaging tool reference at the forensic wiki

Transfer the dumped memory to linux and do:

strings -el 2345.dump > memorystrings.txt
pdfbook -f memorystrings.txt

It’ll find what it can out of the memory image and spit out its findings to standard out. Grep your way to facebook happiness or redirect the output to a file for later viewing.
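If you don’t have GNU strings handy, the `strings -el` step can be approximated in Python. This is only a minimal sketch: the function name `utf16le_strings` and the minimum length of 4 are my own choices for illustration, and it only looks for printable ASCII stored as 16-bit little-endian characters, so its output will not match strings byte for byte.

```python
import re

def utf16le_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII characters stored as 16-bit little-endian."""
    # One printable ASCII byte followed by a zero byte, repeated min_len or more times.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("utf-16-le")

# Hypothetical usage on the process dump from above:
#   with open("2345.dump", "rb") as dump:
#       for text in utf16le_strings(dump.read()):
#           print(text)
```

Reading the whole dump into memory is fine for a single process image; for a full memory image you would want to read and scan it in chunks instead.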

As this is mostly html parsing, it’s very brittle, meaning that a change in the classID of one of the facebook UI components breaks this program. As a matter of fact, it has already broken once, with the UI rework of 10/2009. So it will work for a while until they redesign and I’m out of sync.  Maybe I’ll post it to sourceforge or github so you all can update as you see fit.

Along those lines, look for the diary of pdfbook’s creation, with an explanation of its regex goodness, at digitalforensicsmagazine.com, newly created this month! Dissect and contribute your own regex hacks for finding stuff you recognize in your own facebook memory images.

Jeff Bryner , GCFA Gold #137, also holds the CISSP and GCIH certifications, occasionally teaches for SANS, performs forensics, intrusion analysis, and security architecture work on a daily basis and runs p0wnlabs.com just for fun.

Helix 3 Pro: First Impressions

Filed under Computer Forensics, Evidence Acquisition, Evidence Analysis, Incident Response, Linux IR, Memory Analysis, Registry Analysis, Windows IR

I have used several versions of Helix over the recent years.  I enjoy the tool set and recommend it to forensics colleagues, sysadmins, and even family members.

Quite a substantial ruckus was raised this year when e-fense announced that Helix 3 would no longer be free to download.  Instead, would-be users must pay to register as a forum user to get access to Helix 3 Pro updates for a year.

I took the plunge and purchased my forum membership.  Here are the first things I noticed:

  • Some of the highlights…
    • The forum allows access to the Helix 3 software once the member applies a registration token.
    • After adding the token, I was able to download not only Helix 3 Pro, but also Helix 3, and contributed tools.
    • Helix 3 Pro is really nothing like the 1.8 and 1.9 versions that came before it.  Although it still provides a bootable live CD as well as executables that can be run in Windows and Linux, the interfaces for all the modes of use have been made more consistent and seamless.  Also, a set of Mac OS X tools has been added.
    • The Helix 3 Pro CD also provides a set of cell phone forensics tools (that I will cover in a follow-on posting).
    • One of e-fense’s goals with the Helix 3 release was to provide a forensics tool that did not touch the host computer in any way.  I have not tried to verify this yet, although I intend to do so soon.
  • And the lowlights…
    • On my Dell D630 laptop (and a few other systems), the boot process generated a number of errors and, in some cases, would not detect a graphical interface mode correctly, leaving me with an unusable Helix environment.
    • The majority of the tools that made previous versions of Helix useful are just completely gone.  This was apparently done so that the Helix 3 Pro image can be trusted.  I spoke to a sales representative at e-fense who told me that several customers were using Helix 3 Pro in environments where open source software of questionable origins is, well, frowned upon.
    • Static binaries formerly found on the Helix 1.x CDs are now separate downloads.  They are still available through the Helix forums.

This is the first in a series of blog postings I plan to publish on Helix 3 Pro.  Please post comments if there are specific tools or features of the LiveCD you would like me to cover.

John Jarocki, GCFA Silver #2161, is an Information Security Analyst specializing in intrusion detection, forensics, and malware analysis. He also holds GCIA, GCIH, GCFW and GSEC certifications and is the Treasurer of NM InfraGard.  John recently co-authored a controversial paper on using LiveCDs to mitigate online banking risks.

Mounting Images Using Alternate Superblocks (Follow-Up)

Filed under Computer Forensics, Evidence Acquisition, Evidence Analysis

Hal Pomeranz, Deer Run Associates

Several months ago, I blogged about using alternate superblocks to fake out the ext3 drivers so you could mount file system images read-only, even if they needed journal recovery.  However, due to recent changes in the ext file system driver, the method I described in that posting is no longer sufficient.  Happily, there’s a quick work-around.

Let’s try the solution from the end of my previous posting under a more recent Linux kernel:

# mount -o loop,ro,sb=131072 dev_sda2.dd /mnt
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
 missing codepage or helper program, or other error
 In some cases useful info is found in syslog - try
 dmesg | tail  or so

This looks like the original error output we got when using the primary superblock.  Looking at the relevant dmesg output we see something different, however:

[163135.527484] JBD: Ignoring recovery information on journal
[163135.795917] Buffer I/O error on device loop0, logical block 0
[163135.795931] lost page write due to I/O error on loop0
[163135.795944] Buffer I/O error on device loop0, logical block 1
[163135.795949] lost page write due to I/O error on loop0
[163135.795958] Buffer I/O error on device loop0, logical block 2
[163135.795963] lost page write due to I/O error on loop0
[163135.795973] Buffer I/O error on device loop0, logical block 3
[163135.795977] lost page write due to I/O error on loop0
[163135.795986] Buffer I/O error on device loop0, logical block 18
[163135.795991] lost page write due to I/O error on loop0
[163135.795999] Buffer I/O error on device loop0, logical block 32
[163135.796034] lost page write due to I/O error on loop0
[163135.796232] Buffer I/O error on device loop0, logical block 73
[163135.796238] lost page write due to I/O error on loop0
[163135.796248] Buffer I/O error on device loop0, logical block 74
[163135.796253] lost page write due to I/O error on loop0
[163135.796261] Buffer I/O error on device loop0, logical block 94
[163135.796267] lost page write due to I/O error on loop0
[163135.796275] Buffer I/O error on device loop0, logical block 96
[163135.796280] lost page write due to I/O error on loop0
[163135.796516] JBD: recovery failed
[163135.796520] EXT3-fs: error loading journal.

It would appear that even though we’re using an alternate superblock that’s marked as not requiring journal recovery, the ext file system driver is still finding the uncompleted journal entries and trying to apply them.  This is arguably “more correct” behavior than the old driver used, but it doesn’t help us very much.

The simple work-around is to tell the ext file system driver to ignore the journal by forcing the file system to be mounted as ext2:

# mount -t ext2 -o loop,ro,sb=131072 dev_sda2.dd /mnt
# ls /mnt
bin   dev  home    lib         mnt  proc  sbin  usr
boot  etc  initrd  lost+found  opt  root  tmp   var

Excellent!  With this small modification our trick is working again.  Hurrah!

You might well wonder what happens if you just try to mount our image as ext2 without using the alternate superblock.  Unfortunately, simply mounting as ext2 is not sufficient because the primary superblock is still marked as needing journal recovery.  Though I wonder why this flag should be relevant to an ext2 file system, it’s enough to prevent the mount from happening.  So the result is that you need to both mount using an alternate superblock and (at least on modern Linux kernels) mount the file system as ext2 to stop the file system driver from looking at the journal.
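To see the flag in question for yourself, you can read it straight out of the image. The sketch below is an illustration only, not part of the original procedure: the function names are hypothetical, and the offsets (magic at byte 56, incompatible-feature flags at byte 96, recovery bit 0x0004) come from the published ext2/ext3 superblock layout. The `sb_1k_units` argument uses the same 1 KiB units as the mount `sb=` option, so 1 is the primary superblock and 131072 is the alternate used above.

```python
import struct

EXT_MAGIC = 0xEF53         # s_magic value shared by ext2 and ext3
INCOMPAT_RECOVER = 0x0004  # "needs journal recovery" bit in s_feature_incompat

def parse_superblock(sb: bytes) -> dict:
    """Decode the two fields of interest from a raw 1024-byte superblock."""
    magic, = struct.unpack_from("<H", sb, 56)      # s_magic
    incompat, = struct.unpack_from("<I", sb, 96)   # s_feature_incompat
    return {
        "valid_magic": magic == EXT_MAGIC,
        "needs_recovery": bool(incompat & INCOMPAT_RECOVER),
    }

def read_superblock(path: str, sb_1k_units: int = 1) -> dict:
    """Read a superblock from an image file; offset is given in 1 KiB units."""
    with open(path, "rb") as image:
        image.seek(sb_1k_units * 1024)
        return parse_superblock(image.read(1024))

# Hypothetical usage against the image from this post:
#   print(read_superblock("dev_sda2.dd"))          # primary superblock
#   print(read_superblock("dev_sda2.dd", 131072))  # alternate superblock
```

If the primary copy reports needs_recovery and the alternate does not, that matches the behavior described above.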

Hal Pomeranz is an independent IT/Computer Security consultant and a SANS Faculty Fellow.  He actually discovered this problem when attempting to give a live demo in the middle of a class.  Unfortunately, the solution only occurred to him after class was concluded.  This is one of the reasons why being a SANS Instructor can be so… invigorating.

Is Your index.dat File LEAKing?

Filed under Browser Forensics, Computer Forensics, Evidence Analysis

One of the projects that I’ve been working on has required me to become intimately familiar with index.dat files.  These files are usually associated with Internet Explorer’s browser history.  If you’ve ever worked with index.dat files before, you’ve probably encountered the mysterious “LEAK” record.  After some analysis, I think I’ve finally figured out what LEAK records are used for.

Essentially, a LEAK record is created when a cached URL entry is deleted (by calling DeleteUrlCacheEntry) and the cached file associated with the entry (a.k.a. “temporary internet file” or TIF) cannot be deleted.

You can easily test this on your own system:

  1. Open Internet Explorer and surf to a web page.  Ideally a page with a unique and easily identifiable name (e.g. thisisnotthefileyouarelookingfor.txt).
  2. At a command prompt, navigate to your internet cache directory.  On Windows XP/2003 systems this will be under Documents and Settings\<username>\Local Settings\Temporary Internet Files\Content.IE5.  There are four subdirectories (with random looking names).  These directories contain the locally cached copies of various web pages.
  3. Find the locally cached copy of the page you visited in step 1.  The cached page will be under one of the four subdirectories, and is usually named <page name>[number].<ext>.  (This is why a unique and easily identifiable page name is useful)
  4. Using notepad open the cached copy of the page.
  5. With notepad still open, go back to Internet Explorer and clear your browsing history.
  6. You should now see a LEAK record in the index.dat file under the Content.IE5 directory.  (It helps to use a hex editor).

It’s that simple. A detailed explanation and sample code to create LEAK records can be found here.
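If you would rather not eyeball the hex editor in step 6, a few lines of Python can point you at candidate records. This is a rough sketch under assumptions that are not stated in this post: that index.dat records are allocated in 0x80-byte blocks, and that each record starts with a 4-byte signature followed by a 4-byte little-endian block count. The function name `find_records` is my own.

```python
import struct

BLOCK_SIZE = 0x80  # assumed allocation unit for index.dat records

def find_records(data: bytes, signature: bytes = b"LEAK"):
    """Return (offset, size_in_bytes) for each block-aligned record with the signature."""
    hits = []
    # Stop early enough that the 8-byte record header always fits.
    for offset in range(0, len(data) - 7, BLOCK_SIZE):
        if data[offset:offset + 4] == signature:
            blocks, = struct.unpack_from("<I", data, offset + 4)
            hits.append((offset, blocks * BLOCK_SIZE))
    return hits

# Hypothetical usage:
#   with open("index.dat", "rb") as f:
#       for offset, size in find_records(f.read()):
#           print(hex(offset), size)
```

Only block-aligned hits are reported, so the word LEAK appearing inside a URL or other data is ignored. Treat each hit as a lead to confirm in the hex editor.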

Mike Murr is a forensics analyst with Code-X Technologies, a SANS Instructor, author of www.forensicblog.org and LibForensics, an open source framework for digital forensics written in Python.

Windows Scheduler (at job) Forensics

Filed under Computer Forensics, Evidence Analysis, Incident Response, Windows IR

This information may be useful to people responding to compromise incidents involving Windows. These days, when a job is scheduled for later execution, possibly every day, week, or month, it’s typically done via a GUI tool or ‘schtasks’. However, you can still use the original command line ‘at’ tool. This utility also allows such jobs to be scheduled over the network if admin credentials are possessed, which makes it quite useful to an attacker for post-exploitation activities. When cleaning up after something like this, it’s useful to know a bit about what it does under the hood, including the format of the associated .job file and the structure and location of associated log entries.

Figure 1: A scheduled job created by the At command

When the job is scheduled using the ‘at’ command, a file is created under the Windows\Tasks folder. This file has a .job extension and is named At#.job (jobs not scheduled by the ‘at’ command will have arbitrary names); its format is described on msdn. If the job is not set up for repeat execution (weekly, monthly, etc.), the file is deleted after the job is run.

In some cases I’ve been able to extract the data portion from unallocated space by searching for a characteristic comment string that the ‘at’ command, at least in Windows XP, places in every .job file it creates: “Created by NetScheduleJobAdd.”. (I haven’t been able to create a search keyword that will find generic .job files that weren’t created by ‘at’.) Note that this string is in Unicode and is null-terminated.

The 16 bits immediately following the two nulls at the end of this Unicode string contain the length (little-endian) of the User Data section in bytes, and they are followed by that section (if the length is non-zero). After this section comes the reserved data size (another 16-bit little-endian value, in bytes), followed by the reserved data section. Then there’s a 16-bit trigger count, which precedes the array of triggers. According to the MS documentation, this value is supposed to be the number of bytes in the array, but as it’s one in all the examples I’ve personally seen, I think it’s really the number of triggers defined.

Each trigger starts with a 2-byte size, which should be 0x30 (little-endian, of course, so it appears on disk as the bytes 30 00). Then there’s a 2-byte reserved field, followed by three more 2-byte little-endian values specifying the year, month, and day when the job is first scheduled to execute. Next come three more 2-byte values, which specify the end year, month, and day for repeating jobs, and finally two more 2-byte values for the start hour and start minute. There are other fields in the structure, but these are the important ones. (While there’s a field specifying the job’s author, in all examples I’ve seen from the At command, it’s set to “System”.)
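The field walk described above translates fairly directly into code. The following is a minimal sketch that follows the layout exactly as described in this post and nothing more. The function name and the returned dictionary keys are hypothetical, and it only decodes the first trigger after the first occurrence of the comment string.

```python
import struct

# The comment string as stored on disk: UTF-16LE plus its two-byte null terminator.
MARKER = "Created by NetScheduleJobAdd.".encode("utf-16-le") + b"\x00\x00"

def parse_at_job_trigger(data: bytes):
    """Decode the first trigger that follows the 'at' comment string, or return None."""
    start = data.find(MARKER)
    if start < 0:
        return None
    pos = start + len(MARKER)
    try:
        user_len, = struct.unpack_from("<H", data, pos)      # User Data length
        pos += 2 + user_len                                  # skip User Data
        reserved_len, = struct.unpack_from("<H", data, pos)  # reserved data size
        pos += 2 + reserved_len                              # skip reserved data
        trigger_count, = struct.unpack_from("<H", data, pos)
        pos += 2
        # size, reserved, begin y/m/d, end y/m/d, start hour, start minute
        fields = struct.unpack_from("<10H", data, pos)
    except struct.error:
        return None  # fragment is truncated, as is common in unallocated space
    size, _reserved, b_year, b_month, b_day, e_year, e_month, e_day, hour, minute = fields
    return {
        "trigger_count": trigger_count,
        "trigger_size": size,  # expected to be 0x30
        "first_run": (b_year, b_month, b_day, hour, minute),
        "end_date": (e_year, e_month, e_day),
    }
```

A trigger_size other than 0x30, or a date that makes no sense, is a good hint that the fragment was partially overwritten and the decoded values should not be trusted.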

Figure 2. Hex dump of the At1.job shown above

When a scheduled job is actually executed (not when it’s scheduled), a log entry is written to the scheduler’s log file. The name of this file is defined by the registry key HKLM\Software\Microsoft\SchedulingAgent, but it defaults to SchedLgU.txt. On WinXP, this file is located in the Windows folder, but on Vista, they’ve moved it to Windows\Tasks. The log entry format looks something like the following (in unicode):

“At#.job” (filename.exe)

Started #/##/2009 #:##:## PM

“At#.job” (filename.exe)

Finished #/##/2009 #:##:## PM

Note that the numbers aren’t padded with leading zeros, and that the above lines are two separate 2-line log entries. It’s possible that other entries could fall in between the Start and Finish for a given job. Also, these entries are written for any scheduled job, regardless of whether it was created by ‘at’ or one of the other scheduler utilities. Of course if it was created by something other than ‘at’, the .job file will probably have a different name.
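Since entries for different jobs can interleave, it can help to pull them out programmatically and then pair Started with Finished by job name. Here is a small sketch, assuming the two-line layout shown above with plain double quotes around the job name and a US-style date. The regular expression and function name are my own, and timestamps are left as strings because the date format depends on the regional settings of the box.

```python
import re

# Line 1: "jobname.job" (program)   Line 2: Started|Finished <date> <time> AM|PM
ENTRY = re.compile(
    r'"(?P<job>[^"\r\n]+)"\s+\((?P<program>[^)\r\n]*)\)\s+'
    r'(?P<event>Started|Finished)\s+'
    r'(?P<when>\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+[AP]M)'
)

def parse_sched_log(text: str):
    """Return (event, job, program, timestamp) tuples in the order they were logged."""
    return [(m["event"], m["job"], m["program"], m["when"]) for m in ENTRY.finditer(text)]

# Hypothetical usage; the log is Unicode, so decode it as UTF-16 first:
#   with open("SchedLgU.txt", encoding="utf-16") as log:
#       for entry in parse_sched_log(log.read()):
#           print(entry)
```

Filtering the result for job names that start with At is a quick way to narrow things down to jobs created by the ‘at’ command.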

As always, please feel free to leave commentary if you liked this article or want to call me on the carpet for some inaccuracy.

John McCash, GCFA Silver #2816, is currently a Forensic Investigator employed by a fortune 500 telecommunications equipment provider.

Decrypting a PointSec Encrypted Drive Using Live View, VMWare, and Helix

Filed under Computer Forensics, Drive Encryption, Evidence Acquisition, Evidence Analysis

Doing it the HARD way!

Perhaps you remember my previous blog on EnCase and PointSec, which included my plea for Guidance Software and CheckPoint to work together to create a seamless way to decrypt drives without having to go through 20 or 30 steps to get there.  I even wrote, out of desperation, A Case for Decryption of the Original, because it would save time consuming steps and not change the data relevant to an investigation.

Time for an update.  As noted in my last blog on decrypting the original, VMWare no longer recognizes a raw disk as a valid disk image.  Images have to be converted before VMWare will recognize them.

 Here is a new and “improved” method that will result in a COMPLETE decrypted image without changing the original.  It is more painful because more steps are involved, but it works.  (Today).  That being said, I STILL want PointSec, now called “End Point Security,” to work with Guidance to create a driver that could be used to directly access the disk image and decrypt it in EnCase.  This can’t be rocket science, right?  Let me add an encrypted image to the case, key in a password, and access the data.

In the meantime, gather your tools.  You will need dcfldd for Windows, the Live View application, VMWare Server, and Helix for imaging.  (Twice).

  1. Use Helix or your other favorite method to acquire a raw image of the drive to be decrypted.  (There is an open source version of Helix you can download for free, or you can purchase Helix Pro in order to have support, if you prefer.)  [Watch for my upcoming blog on using dcfldd to acquire a raw image.]
  2. Use Live View to convert the raw image to a VMDK file.  (You will have to have the correct versions of VMWare to read the VMDK.  Live View will inform you what version of VMWare you should be running.)
  3. Acquire the PointSec recovery file from the administrator.  (This whole process assumes that you have the administrator ID and password for an administrative install of PointSec.  If you don’t have that, you are reduced to a manual brute force attack.  Good luck!)
  4. Using the PointSec recovery file, create Recovery Media.  (Believe it or not, you need a real floppy disk to do this.  Can’t just create a raw floppy image.  Go figure.)
  5. Create a raw image of the floppy disk in a file on the Windows hard drive using the following command:
    dcfldd if=\\.\A: of=filename.img
    (This requires dcfldd, which is available from sourceforge.net.
    If you use linux, refer to the floppy drive device (if=/dev/fd0 or as appropriate for your system) as the input file instead of the above syntax.)
  6. Copy the resulting floppy image to your VMWare server where you intend to decrypt the image.
  7. Open VMWare
  8. Select the VM created by Live View, but do NOT start the machine.  (Note that you will not have to create a new virtual machine.  Live View handles all that.  But also note that Live View creates a snapshot and other files as well, which cannot be read directly into EnCase Forensic.  That is why we must do the final acquisition with Helix in this process.)
  9. Add a floppy drive to the VM configuration and select the image created above as the floppy virtual drive.  Make sure it will “Connect on Power On” so that the machine will boot to the floppy

10.  Edit the CD Rom settings and set it to use an ISO image.  Point to a copy of the Helix ISO image.  (This is for acquiring the decrypted drive later, but will not be used for the decryption step.)

11.  Start the Virtual Machine – it will boot to the floppy image.

12.  Enter the requested PointSec administrator credentials and start the decrypt process.  The VMDK image will be decrypted.

13.  Once you have entered the credentials, the program begins decrypting the hard drive image, posting a % complete message as it goes.

14.  Once decrypted, reboot the VM

15.  Hit escape ONE TIME during boot to get Boot Menu.  (If you hit escape too many times, VMWare will blow by the boot menu, but not to worry, because we have left the floppy image set up as the boot drive.  That way the decrypted image will not boot and will, therefore, remain unchanged for maintaining Chain of Custody.)

16.  Select CD-Rom from the boot menu to boot to the Helix CD-Rom.

17.  Run Helix from the CD.

18.  Insert a USB drive with enough spare space to receive the image from the “target” machine.  You will mount it later.  Helix is able to mount NTFS in read/write mode, so your portable drive can be formatted using NTFS.

19.  Once Helix has booted up, use the VMWare toolbar option:  VM/Removable Devices/USB Devices to select the USB drive for writing the acquired decrypted image.

20.  Open a Terminal Session by clicking on the terminal icon in the Helix tool bar.

21.  Execute the following command in order to get root prompt:  sudo su -

22.  Execute the following command in order to determine drive designations:  fdisk -l   [note that is dash lower case L, not I or 1]

23.  Once the USB drive has been added to the VM, if it is formatted using NTFS, use the following command to mount the drive:
mount -t ntfs-3g /dev/sdx1 /media/sdx1 -o force
(substitute the correct letter for x based on the results of your fdisk -l listing)

24.  Create a directory on the USB drive to receive the image.

25.  Change to the directory you just created.

26.  Execute the following command in order to record disk parameters for the case:  fdisk -l > fdisk.txt

27.  Use the following command to acquire the image:
dcfldd if=/dev/sdx of=filename.img conv=noerror,sync hash=md5 hashlog=filename.img.md5

28.  Once completed, for the record, save the history of commands into a file:
history > history.txt
Then save the mount configuration, in case anyone asks about that:
mount > mount.txt

29.  Now you have a raw, decrypted image that can be read into EnCase and properly acquired for analysis.  Using this method, the original disk is untouched, and the only change to the disk image is that it was decrypted.  This preserves proper Chain of Custody and avoids contamination of the evidence.
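Before loading the image into EnCase on another machine, it is worth confirming that the copy still matches the MD5 that dcfldd recorded in step 27. The sketch below simply recomputes the MD5 of the image file so you can compare it by eye against the contents of filename.img.md5. The function name is my own, and I have deliberately not tried to parse the hash log, since its exact layout is not described here.

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file, reading it in chunks so large images fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as image:
        # iter() with a sentinel keeps calling read() until it returns b"" at end of file.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage, with the file name from step 27:
#   print(md5_of_file("filename.img"))
```

A mismatch means the copy changed after acquisition and should not be used for analysis until the cause is understood.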

 

Whew, that was way too painful.  In my next blog, I will share a method of “slaving” the target drive so that it can be acquired directly into EnCase with the hard disk left in its original state.  Still not as easy as it ought to be, but much easier than the VMWare method.  The only caveat is that the “Slave” method will allow us to image the decrypted partition(s), but will not allow decryption of the entire hard drive.  So at some point, it may be necessary to use the method in this post, not the “Slave” method.

 J. Michael Butler, GCFA Gold #00056, is an Information Security Consultant employed by a fortune 500 application service provider who processes approximately half of the $5 trillion of residential mortgage debt in the US. He is a certified computer forensics specialist. In addition, he authored the enterprise wide security incident management plan and information security policies for his corporation. He can be reached at jmbutler_1 at hotmail dot com.

Flash Cookie Forensics

Filed under Computer Forensics, Evidence Analysis
Flash cookies have been a hot topic lately with the release of an excellent research paper titled Flash Cookies and Privacy.  Flash Cookies, or local Shared Objects in Macromedia parlance, are a great example of a forensic artifact that has existed for a long time but was virtually ignored until someone decided to shine some light on it.  Whenever I see new research about problematic privacy controls, I immediately get out my notepad, because I know that I am going to find some great artifacts that can help in my forensic investigations.
First some basics:
  • Macromedia Flash has become ubiquitous on the web, providing features such as streaming video and a “rich client” experience.  Many of the most popular sites on the web are dependent on Flash, and thus a high percentage of Internet users have installed the Flash plug-in.
  • The Flash standard incorporates local Shared Objects (LSOs), which allow data (such as preferences) to be stored in the local Flash instance on a user’s machine.
  • LSOs are stored as individual files with a .SOL file extension.  By default they are less than 100 kB in size and have no expiration (unlike traditional HTTP cookies).
  • I have found .SOL files in two locations on the local system: %user profile%\Application Data\Macromedia\Flash Player  and %user profile%\Application Data\Macromedia\Flash Player\#SharedObjects\<random profile id>\         (%user profile% is shorthand for where the user folders reside – typically C:\Documents and Settings\<account name>\ on an XP system).  For Vista analysis, you will need to look in the Roaming folder within %user profile%.
  • LSOs are not browser based, so there is currently no easy way for the average user to remove them (simply deleting the files does the job, but a user would need to know where they are located).  This makes LSOs very persistent on the local system.

Analysis

For our purposes, the term Flash Cookies is an apt descriptor for LSOs since they give very similar information to what we find in traditional HTTP cookies. Those of you that have taken the SANS SEC 408 Computer Forensic Essentials course will recall that HTTP cookies can give us the following information:

Websites that were visited

Macromedia Flash requires that LSOs be stored hierarchically by domain.  This is one way it is able to enforce the rule that each domain may only store up to 100k on the local system.  From our perspective, this gives us a very handy means for quickly reviewing the sites visited.

Figure 1: Directory listing displaying LSO domains

One thing to note is that Flash based advertisements also have the ability to save LSOs.  This is important because in some cases we can’t necessarily conclude that it was the user’s intent to access the domain.  The origin of the LSO is often obvious (see Figure 2), but further testing or additional artifacts may be necessary to make any definitive conclusions.

Figure 2: LSO saved from a Flash advertisement

Local user account that visited the site

Recall that the .SOL files are located within the %user profile% folder, indicating the account that was logged in when the LSO was saved.

When the site was first and last visited

Since the .SOL files are saved individually, we have a nice set of file system timestamps to utilize.  On Windows XP (which has Access time stamping on by default) we can use the Access Time to tell us when the LSO was last read.  This can potentially tell us when the site was last visited, but we have to be careful since I am not aware of any standard that requires an issuing site to read the LSO.  It is certainly in their best interests and in my testing all appear to be doing so, but if the site does not read the LSO for some reason, the Access time will not be updated.

The .SOL file Creation Time can potentially tell us when the site was first visited.  Again, we are not assured that the LSO was created on the first visit to the site, so it is difficult to be conclusive.  A better way of looking at this would be the “first known visit to the site”.  Other artifacts on the system may be able to corroborate this time or indicate an even earlier visit time.

So looking again at Figure 1, we can see that the first known visit to mg3.mail.yahoo.com was 11/27/2008 at 1:38am and the last known visit was 8/17/2009 at 5:27pm (local machine time).
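A listing like the one in Figure 1 can also be produced with a short script, which is handy when there are many domains to review. This is a sketch under a few assumptions of mine: the function name is hypothetical, the root passed in is the <random profile id> folder so that the first folder level is the domain, and the script is run on Windows, where Python reports the file creation time in st_ctime (on other systems that field means something else). Run it against a read-only mounted image or a working copy rather than live evidence, since touching files can update access times.

```python
import os
from datetime import datetime

def list_sol_files(shared_objects_root: str):
    """Collect domain, file name, and timestamps for every .SOL file under the root."""
    rows = []
    for folder, _dirs, files in os.walk(shared_objects_root):
        for name in files:
            if not name.lower().endswith(".sol"):
                continue
            info = os.stat(os.path.join(folder, name))
            relative = os.path.relpath(folder, shared_objects_root)
            rows.append({
                "domain": relative.split(os.sep)[0],  # first folder level under the root
                "file": name,
                "created": datetime.fromtimestamp(info.st_ctime),   # first known visit
                "accessed": datetime.fromtimestamp(info.st_atime),  # last known visit
            })
    return rows
```

Sorting the rows by the created value gives a quick timeline of first known visits, which can then be checked against other artifacts.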

Data stored by the website

Flash specifically attempts to obfuscate data within each LSO by controlling the format and forcing a binary serialization of any stored data.  That being said, if you find a relevant file, don’t overlook this data area.  I have found interesting tidbits such as text-based location information stored by a weather website.

Tools

While not recommended as a forensic tool (primarily because it requires installation / execution on a live system), the Better Privacy Firefox extension is a great tool for identifying (and removing) LSOs on your local system.  One of the best ways to learn about forensic artifacts is by reviewing them on a system with known behavior (i.e. your own system). The Better Privacy plug-in allows you to easily review (and manage) LSOs on a live system.

Figure 3: Better Privacy Firefox Plug-in Screenshot

This is just a first look at Flash Cookies — I encourage our readers to post any links or information they have discovered in the blog comments.

Chad Tilbury, GCFA, has spent over ten years conducting computer crime investigations ranging from hacking to espionage to multi-million dollar fraud cases.   He currently teaches SEC408 Computer Forensic Essentials and SEC508 Computer Forensics, Investigations, and Response for the SANS Institute.

Analysis of e-mail and appointment falsification on Microsoft Outlook/Exchange

Filed under Computer Forensics, Email Investigations, Evidence Analysis, eDiscovery

Author: Joachim Metz <forensics@hoffmannbv.nl>

Summary

In digital forensic analysis it is sometimes necessary to determine whether an e-mail has or has
not been falsified. This paper reviews certain parts of the Outlook Message Application
Programming Interface (MAPI) that can help in identifying falsified e-mails or altered
appointments in a Microsoft Outlook/Exchange environment.

About the libpff project

In 2008 Joachim Metz, a forensic investigator at Hoffmann Investigations, started the libpff project.
At that time the best source about the Personal Folder File (PFF) format in the public domain was
the libpst project. The libpst project dated back to 2002 and had been contributed and maintained
by David Smith, Joe Nahmias, Brad Hards and Carl Byington.

However, libpst at that time wasn’t a library and had no support for recovering deleted items
in PST and OST files. The initial goal of the libpff project was to create a shared library for PST
and OST files that had support for recovering deleted items. Recovering deleted items requires
detailed knowledge of the inner structures of the PFF format. This was the beginning of an
interesting journey, in which, even recently, additional information about the inner structures
has been discovered, like the 6c and 8c tables and the use of indirection in large tables.

In March 2009 PFF forensics was first discussed as part of Microsoft Office forensics in the
Hoffmann Advanced Forensic Sessions (HAFS). A paper titled ‘Personal Folder File (PFF)
forensics’ was published as part of the HAFS. This paper explains the basics of the PFF format,
which can be quite a challenge to understand. One of the main conclusions of both the paper
and the seminar was that different forensic tools provide different results when recovering deleted
items in PST and OST files.

In the meantime the libpff project has evolved. Continued analysis of the PFF format and
several contributions have revealed new aspects of the file format, among them the PFF items
that contain information about recipients, sub folders, sub messages and sub associated items.

A lot of information about the MAPI has also been made available. The OpenChange project
provides libmapi, which contains an Open Source implementation of the MAPI, and the
MFCMAPI project has provided a lot of MAPI information that is now available on MSDN.

Within Hoffmann Investigations libpff has been put to work for two purposes: first as a tool to
cross-reference findings from other forensic tools, and second as a tool that can provide more
information about PST and OST files than those forensic tools. PFF forensics will therefore once
more be the subject of discussion in the upcoming Hoffmann Advanced Forensic Sessions in
November 2009. In the meantime several of the interesting findings are provided in this paper.

1. Introduction

Wouldn’t it be nice to have your forensic analysis software filter out falsified e-mails and
appointments for you? However, most of the current forensic tools provide little information about
the authenticity of e-mail messages and appointments. Therefore, certain analyses have to be done
manually. This paper will give you an understanding of parts of the Outlook Messaging Application
Programming Interface (MAPI) to help identify falsified e-mails in Microsoft Outlook/Exchange
environments.

1.1. Background

If you are a forensic investigator working in corporate environments you are probably dealing
with Microsoft Outlook and Exchange most of the time. What you might not know is that both
make heavy use of the MAPI. The MAPI is not only a programming interface but also a useful
source of information about the properties of e-mail messages. For those of you not familiar
with analyzing the Personal Folder File format used by Microsoft Outlook for PST and OST files,
I advise reading [METZ09] before reading this paper.

2. Falsified e-mail message

In a recent investigation we had to investigate if a user had sent an e-mail at a certain date and
time. We started by determining the existence of the e-mail in the mailbox of both the sender and
the recipients. But there were other characteristics that were highly interesting from a forensic
point of view.

A certain e-mail dated March 10, 2009 was forwarded on March 17, 2009. The original e-mail
could not be found in any of the mailboxes. The first indication of falsification was a discoloring
of the day of the month in a print-out of the forwarded e-mail. The 0 in ‘March 10’ was gray while
the surrounding text was clearly black.

2.1. The e-mail body

In Outlook/Exchange an e-mail message can contain RTF and/or HTML body text. Both RTF and
HTML formats use formatting codes. Using these formatting codes we did a low-level analysis of
the body text. Most of the available forensic tools do not provide access to these formatting codes,
but luckily for us there is libpff and its tools.

After compiling libpff with verbose and debug output and having pffexport export the PST file with the verbose option (-v), we had a detailed debug log file. In this log file we looked up the e-mail and its RTF body. In the RTF body the following information was found:

{\*\htmltag84 <b>}\htmlrtf {\b \htmlrtf0 Sent:
{\*\htmltag92 </b>}\htmlrtf }\htmlrtf0 Tuesday March 1
{\*\htmltag84 <span style='color:#1F497D'>}\htmlrtf {\htmlrtf0 0
{\*\htmltag92 </span>}\htmlrtf }\htmlrtf0 , 2009 13:48
{\*\htmltag116 <br>}\htmlrtf \line
\htmlrtf0
{\*\htmltag4 \par }

Using other forwarded e-mails as a reference, we established that the formatting code around the 0 (the span element that sets the text color) should not be there.
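The check described above can be approximated with a few lines of Python. This is only a minimal sketch, not part of libpff: the function name `find_style_overrides` and the regular expression are assumptions made for illustration, and the sample text is the RTF fragment quoted above. It lists every encapsulated HTML fragment that carries an inline style, which is where a recolored character would show up.

```python
import re

# Matches one encapsulated-HTML group such as {\*\htmltag84 <b>}.
HTMLTAG_GROUP = re.compile(r"\{\\\*\\htmltag(\d+) ?([^}]*)\}")


def find_style_overrides(rtf_body):
    """Return the HTML fragments in an RTF body that set an inline style."""
    hits = []
    for match in HTMLTAG_GROUP.finditer(rtf_body):
        fragment = match.group(2)
        if re.search(r"style\s*=", fragment, re.IGNORECASE):
            hits.append(fragment)
    return hits


# The RTF fragment from the debug log, reproduced as a raw string.
sample = r"""{\*\htmltag84 <b>}\htmlrtf {\b \htmlrtf0 Sent:
{\*\htmltag92 </b>}\htmlrtf }\htmlrtf0 Tuesday March 1
{\*\htmltag84 <span style='color:#1F497D'>}\htmlrtf {\htmlrtf0 0
{\*\htmltag92 </span>}\htmlrtf }\htmlrtf0 , 2009 13:48
{\*\htmltag116 <br>}\htmlrtf \line
\htmlrtf0
{\*\htmltag4 \par }"""

print(find_style_overrides(sample))
```

A hit inside a header line such as "Sent:" is only a lead; it still has to be compared against the formatting of other forwarded e-mails from the same client.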

2.2. Conversation index

Looking at existing e-mail messages we hypothesized that the original e-mail was not created on
March 10, 2009 but was in fact an e-mail created on March 17 2009 that had been altered. We
wanted proof besides the lack of the original e-mail message in the mailboxes of the sender and
the recipients.

A MSDN article titled 'Tracking conversations' provided us with a fairly reliable answer.
[MSDN] states that:

PR_CONVERSATION_INDEX (PidTagConversationIndex) indicates the position of the
message within a particular conversation. It is a client's reponsibility to
set PR_CONVERSATION_INDEX for each outgoing message, whether it is a new
message, a forwarded message, or a reply. Clients can set this property
manually or call ScCreateConversationIndex, a utility function provided by
MAPI.
ScCreateConversationIndex generates the value of a conversation index for any
outgoing message. ScCreateConversationIndex implements the index as a header
block that is 22 bytes in length, followed by zero or more child blocks each 5
bytes in length.
The header block is composed of 22 bytes, divided into three parts:
 * One reserved byte. Its value is 1.
 * Five bytes for the current system time converted to the FILETIME structure
 format.
 * Sixteen bytes holding a GUID, or globally unique identifier.
Each child block is composed of 5 bytes, divided as follows:
 * One bit containing a code representing the difference between the current
 time and the time stored in the header block. This bit will be 0 if the
 difference is less than .02 second and greater than two years and 1 if the
 difference is less than one second and greater than 56 years.
 * Thirty one bits containing the difference between the current time and the
 time in the header block expressed in FILETIME units.This part of the child
 block is produced using one of two strategies, depending on the value of
 the first bit. If this bit is zero, ScCreateConversationIndex discards the
 high 15 bits and the low 18 bits. If this bit is one, the function discards
 the high 10 bits and the low 23 bits.
 * Four bits containing a random number generated by calling the Win32
 function GetTickCount.
 * Four bits containing a sequence count that is taken from part of the random
 number.

Reverse-engineering this description for the PFF format, I found that the part of the header block
containing the ‘One reserved byte’ with a value of 1 is actually the first byte of the filetime. So
the header holds not 5 bytes of the filetime but 6. The date and time in the header block of the
conversation index matches the creation date and time of e-mail messages.

The child block contains the difference between the current and the previous time, not the
difference from the time stored in the header block as the MSDN specification states. This was
validated using the creation date and time of multiple e-mails.
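To make the layout concrete, here is a minimal Python sketch of a conversation index parser that follows the findings above: six header bytes are read as the upper 48 bits of the FILETIME, and each child delta is added to the previous time. It is not the libpff implementation. The function names are hypothetical, the GUID is shown in stored byte order (an assumption), and placing the random number in the high nibble of the last child byte follows the order of the MSDN list rather than a verified fact.

```python
import uuid
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a 64-bit FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_conversation_index(data):
    """Split a PidTagConversationIndex value into header and child blocks."""
    if len(data) < 22 or (len(data) - 22) % 5 != 0:
        raise ValueError("expected 22 header bytes plus 5 bytes per child block")
    # Bytes 0-5: upper 48 bits of the FILETIME ('reserved' 0x01 byte included).
    filetime = int.from_bytes(data[0:6], "big") << 16
    # Assumption: the GUID is displayed in the byte order in which it is stored.
    guid = uuid.UUID(bytes=bytes(data[6:22]))
    header = {"time": filetime_to_datetime(filetime), "guid": guid}
    children = []
    for offset in range(22, len(data), 5):
        word = int.from_bytes(data[offset:offset + 4], "big")
        # The first bit selects the precision; the other 31 bits hold the delta.
        shift = 23 if word >> 31 else 18
        # Per the finding above, the delta is relative to the previous time.
        filetime += (word & 0x7FFFFFFF) << shift
        children.append({
            "time": filetime_to_datetime(filetime),
            "random": data[offset + 4] >> 4,
            "sequence": data[offset + 4] & 0x0F,
        })
    return header, children
```

Calling `parse_conversation_index(raw_bytes)` on the binary property value returns the header time and GUID plus one dictionary per child block, which can then be compared against the creation times stored in the other properties.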

The conversation index for the specific e-mail translates to:

0x0071 (PidTagConversationIndex : Conversation index)
0x0102 (PT_BINARY : Binary data)
Header block:
 Filetime        : Mar 17, 2009 10:13:04 UTC
 GUID            : 11111111-2222-3333-4444-555555555555
Child block: 1
 Filetime        : Mar 17, 2009 10:18:03 UTC
 Random number   : 2
 Sequence count : 0
Child block: 2
 Filetime        : Mar 17, 2009 10:24:01 UTC
 Random number   : 9
 Sequence count : 0
Child block: 3
 Filetime        : Mar 17, 2009 10:42:39 UTC
 Random number   : 9
 Sequence count : 0
Child block: 4
 Filetime        : Mar 17, 2009 10:45:36 UTC
 Random number   : 14
 Sequence count : 0
Child block: 5
 Filetime        : Apr 17, 2009 07:19:08 UTC
 Random number   : 8
 Sequence count : 0

Note that the precision of the date and time difference in the child block varies and does not match
the creation date and time. The actual reason for this variation is still unknown.

0x3007 (PidTagCreationTime : Creation time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime        : Apr 17, 2009 08:41:20 UTC

However, the date March 10, 2009 does not appear anywhere in the conversation index. Judging by
the conversation indexes of other forwarded and replied-to e-mail messages, we would expect the
date of the original e-mail to appear there.

Note that the GUID ‘11111111-2222-3333-4444-555555555555’ in this example was altered.

Using the GUID we found corresponding e-mails with the same GUID in the conversation index.
Most of these e-mails had different content. This finding supported our hypothesis. All of the
corresponding e-mails also had a creation date of March 17, 2009. Therefore, it was plausible that
the e-mail with the discolored zero in ‘March 10’ was falsified using another e-mail created on
March 17, 2009.

When confronted with the findings in an interview, the sender of the e-mail admitted that he had
altered the e-mail.

3. The modified appointment

In another investigation we found an appointment whose conversation topic contained one of the
keywords we were looking for. However, the appointment had an entirely different subject, and the
last modification date and time already indicated that the appointment had been modified at a later
date.

We needed to be certain that this behavior was caused by modifying an appointment. Using
Outlook we created a PST file with an appointment. Libpff provided us with the following
information about the subject and the conversation topic:

0x0037 (PidTagSubject : Subject)
0x001f (PT_UNICODE : UTF-16 Unicode string)
Unicode string  : ^A^ATest1
0x0070 (PidTagConversationTopic : Conversation topic)
0x001f (PT_UNICODE : UTF-16 Unicode string)
Unicode string  : Test1

And about the date and time values:

0x0039 (PidTagClientSubmitTime : Client submit time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:47 UTC
0x0071 (PidTagConversationIndex : Conversation index)
0x0102 (PT_BINARY : Binary data)
Header block:
 Filetime         : Jul 23, 2009 14:07:47 UTC
 GUID             : 11111111-2222-3333-4444-555555555555
0x0e06 (PidTagOriginalDeliveryTime : Message delivery time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:47 UTC
0x3007 (PidTagCreationTime : Creation time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:04:28 UTC
0x3008 (PidTagLastModificationTime : Last modification time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:50 UTC

The ^A characters in the subject are control characters and can be ignored.

Note that the creation and last modification date and time are not equal.

Next we modified the appointment and had libpff provide us with information about the subject
and the conversation topic:

0x0037 (PidTagSubject : Subject)
0x001f (PT_UNICODE : UTF-16 Unicode string)
Unicode string  : ^A^AModified1
0x0070 (PidTagConversationTopic : Conversation topic)
0x001f (PT_UNICODE : UTF-16 Unicode string)
Unicode string  : Test1

And about the date and time values:

0x0039 (PidTagClientSubmitTime : Client submit time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:47 UTC
0x0071 (PidTagConversationIndex : Conversation index)
0x0102 (PT_BINARY : Binary data)
Header block:
 Filetime         : Jul 23, 2009 14:07:47 UTC
 GUID             : 11111111-2222-3333-4444-555555555555
0x0e06 (PidTagOriginalDeliveryTime : Message delivery time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:07:47 UTC
0x3007 (PidTagCreationTime : Creation time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:04:28 UTC
0x3008 (PidTagLastModificationTime : Last modification time)
0x0040 (PT_SYSTEM : Windows Filetime (64-bit))
Filetime          : Jul 23, 2009 14:08:37 UTC

As you can see, the conversation topic and index do not change when an appointment is modified.

The last modification date and time in the example is not much of an indication that the
appointment was modified, mainly because we did the modification right after the creation of the
appointment.
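The comparison made by hand in this section can be sketched in Python. This is an illustration only: the function names are hypothetical and the inputs are the subject and conversation topic strings as exported above, with the ^A control characters written as `\x01`.

```python
def normalize_subject(subject):
    """Drop control characters such as the leading ^A (0x01) characters."""
    return "".join(ch for ch in subject if ch >= " ")


def topic_mismatch(subject, conversation_topic):
    """True when the visible subject no longer matches the conversation topic."""
    return normalize_subject(subject).strip() != conversation_topic.strip()


print(topic_mismatch("\x01\x01Test1", "Test1"))      # appointment as created
print(topic_mismatch("\x01\x01Modified1", "Test1"))  # appointment after the edit
```

A mismatch is a lead rather than proof. Replies and forwards, for example, normally carry a prefixed subject while keeping the original topic, so such items would need to be excluded or handled separately.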

4. Conclusion

E-mails and appointments in Outlook/Exchange provide us with certain properties that can be
useful for digital forensic analysis, like the conversation index and the multiple formatted
body texts. Others may be the conversation topic and the original creation and/or modification
dates and times.

Appendix A. References

[METZ09]
Title:     Personal Folder File (PFF) forensics
Subtitle:  Analyzing the horrible reference file format
Author(s): Joachim Metz
URL:       http://kent.dl.sourceforge.net/sourceforge/libpff/PFF_forensics.pdf
[MSDN]
Title:     Tracking conversations
URL:       http://msdn.microsoft.com/en-us/library/cc765583.aspx

Making Reviewing Files From Data Carving Easier: Images

Comments Off
Filed under Computer Forensics, Evidence Analysis

Background

I usually do a lot of data carving. With 500 GB drives becoming the norm, the number of files recovered by data carving is huge. There is nothing like having to review 10,000+ JPEGs one by one. I had a lot of trouble finding a tool that could handle reviewing that many images. After trying many programs, and some hacks to break the images up into smaller subsets, I decided to write my own set of tools for processing the files recovered from data carving.

Data Carver Processors

The Data Carver Processors are a combination of Perl scripts and other programs designed to break up the recovered files into manageable chunks. As a script runs over the files, it creates a series of web pages with thumbnails and a second web page for each file that contains plug-in output such as metadata and hashes. The scripts, for the most part, will not process damaged files. If a file is damaged, there will be no image for it on the web page, just some text about the file.

Currently I have processors for images, video, PDFs, and documents. The first one to be released is the Image processor. I will release the others as I finish them and write their documentation, which should happen over the course of August and September.

Image-processor

Point this script at a directory full of images and it:

  • Creates a series of web pages containing thumbnails of all readable images
  • Gathers details about the files, such as Exif data
  • Sorts images based upon nudity; creates a CSV file with the results in addition to the output in the web pages; currently only detects “white” skin
  • Reviews images using StegDetect (JPEGs only); creates a CSV file with the results in addition to the output in the web pages
  • Lets you create your own plug-ins


Requirements

Perl modules: Getopt::Long, Pod::Usage, Image::ExifTool, Image::Magick, Imager::SkinDetector, File::Basename, Config::IniFiles
Libraries and packages installed: Imagemagick and Stegdetect
Stegdetect Notes: In order to get Stegdetect to compile using gcc 4.x, download the patch at Stegdetect 0.6.

Example command line:
1) cd into the directory where you want the output to go
2) /appl/data_carver_processors/image-processor.pl --inputdir /foremost/output/jpeg --output index --plugindir /appl/data_carver_processors/image-plugins --ini /appl/data_carver_processors/data_processor.ini --nudity --stego

Sample Main Page

Sample Secondary Page
(Where you clicked on the image)

INI File

The INI file (data_processor.ini) contains the user configurable options for each one of the data processor scripts.

Each line has a comment before the parameter. See the INI file for more details.

Plug-ins

The script will only accept plug-ins written in Perl. Just place your plug-in into the appropriate directory (e.g. image-plugins). Each plug-in has to accept one command line argument, which is -i {file name}. The Image Processor script will execute the plug-in and pass it the -i option and the file name. The output will be captured and placed into the web page for that file.

An example Perl plug-in

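#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Std;
use Digest::SHA;

# Print usage information and exit.
sub usage {
    print <<EOM;
usage: $0 -i file
options:
 -h         Show this help
 -i file    Input file

EOM
    exit;
}

# The processor calls each plug-in with a single -i {file name} argument.
my %opts;
usage() unless getopts("hi:", \%opts);
usage() if $opts{h} or not defined $opts{i};
my $file = $opts{i};

# Request SHA-512 explicitly; Digest::SHA->new without an argument computes SHA-1.
my $sha = Digest::SHA->new(512);
$sha->addfile($file);
my $digest = uc $sha->hexdigest;
print "SHA512: $digest\n";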

An example Perl plug-in with shell commands

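#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Std;

# Print usage information and exit.
sub usage {
    print <<EOM;
usage: $0 -i file
options:
 -h         Show this help
 -i file    Input file

EOM
    exit;
}

# The processor calls each plug-in with a single -i {file name} argument.
my %opts;
usage() unless getopts("hi:", \%opts);
usage() if $opts{h} or not defined $opts{i};
my $file = $opts{i};

# Run the shell command; the quotes keep file names with spaces intact.
my $digest = `md5sum "$file"`;
$digest =~ tr/a-z/A-Z/;
print "MD5: $digest\n";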

For those of you who are not into writing Perl scripts, take a look at the first line that has a $digest in it. The shell command is between the ` `. Feel free to replace the md5sum with anything else you want to run. You should remove the 2nd $digest line, unless you want the output in upper case. Lastly, alter the print statement by changing the MD5 to whatever you want to call your plug-in. Save the file and place it in the plug-in directory.

Testing The Plugin

If you end up writing your own plug-ins, you can always test your plug-in by doing:

# ./{plug-in file name}.pl -i {test file name}

If the output is what you expect, then you’re ready to run it with the processor.

Please send me any plug-ins you write. I will be happy to include them with the download.

CSV Files

The CSV files generated by the --nudity and --stego options can be found under the directory tn.

nudity_review.csv

If you give the Image processor the --nudity option, it will create a file called nudity_review.csv. It contains the file name and the score as reported by Imager::SkinDetector. An example file looks like this:

/appl/scripts/data_carve_processors/test/data/Battlestar.jpg,9.43998543211
/appl/scripts/data_carve_processors/test/data/Nmap_Matrix_Screen_Huge.jpg,0.235281046114735
/appl/scripts/data_carve_processors/test/data/annie-360x184.png,23.9994074147233
/appl/scripts/data_carve_processors/test/data/annie-720x368.png,23.6893463389346
/appl/scripts/data_carve_processors/test/data/axm-v3-10-p25.jpg,14.7481563652296
/appl/scripts/data_carve_processors/test/data/nmap-matrix2log-cropped.gif,6.15041880852475
/appl/scripts/data_carve_processors/test/data/nmap-matrixhax0r3c.gif,0
/appl/scripts/data_carve_processors/test/data/nmap-matrixhax0r3c.png,0
/appl/scripts/data_carve_processors/test/data/nmap_matrix5.png,0
/appl/scripts/data_carve_processors/test/data/nmap_matrix6.png,0
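With thousands of carved images, it helps to sort this file so the highest scores are reviewed first. The following Python sketch is not part of the Data Carver Processors; the function name and the cut-off value of 10.0 are assumptions chosen for illustration. It splits each line on the last comma so that commas in a path do not break the parse.

```python
def rank_by_skin_score(lines, threshold=10.0):
    """Return (path, score) pairs at or above the threshold, highest first."""
    scored = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last comma only; the path itself may contain commas.
        path, score = line.rsplit(",", 1)
        scored.append((path, float(score)))
    flagged = [item for item in scored if item[1] >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


sample = [
    "/appl/scripts/data_carve_processors/test/data/Battlestar.jpg,9.43998543211",
    "/appl/scripts/data_carve_processors/test/data/annie-360x184.png,23.9994074147233",
    "/appl/scripts/data_carve_processors/test/data/axm-v3-10-p25.jpg,14.7481563652296",
]

for path, score in rank_by_skin_score(sample):
    print(f"{score:8.2f}  {path}")
```

To run it against a real file, pass an open file handle instead of the sample list, for example `rank_by_skin_score(open("tn/nudity_review.csv"))`.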

stego.csv

The stego.csv file is created when the --stego option is given. Again, it contains the file name and the output from StegDetect. Here is an example:

/appl/scripts/data_carve_processors/test/data/Battlestar.jpg : skipped (false positive likely)
/appl/scripts/data_carve_processors/test/data/Nmap_Matrix_Screen_Huge.jpg : negative
/appl/scripts/data_carve_processors/test/data/axm-v3-10-p25.jpg : negative

Other Notes

Feedback: Please send me an email with any features/plug-ins you would like to see. If you find any errors in the scripts, let me know. I am also interested in any plug-ins you want to share.

Errors: As the script runs over the files you may see some errors in the output. The errors come from the programs run against the recovered files; not all of the files that the data carvers recover are good files.

License: GPL 2.0

Download at: data_carver_processors.tar.gz

Contact: cs[at]citadelsystems.net

Keven Murphy, GCFA Gold #24, is the Senior Forensics/Incident Handler to General Dynamics Land Systems.

FAT File Sizes

9
Filed under Computer Forensics, Evidence Analysis

If you’re just checking this blog for the first time, you should know that this post is one in a series of posts dealing with a FAT file system that has been tweaked in various ways to make recovery of the data more difficult, if only for the casual observer. Forensics folks like yourselves would have no issue recovering the data, but the point of this series is to learn about the FAT file system and how it works.

In last week’s FAT Tuesday post we looked at a file in our usb key image (get it here) called “Scheduled Visits.exe”. We looked at the metadata for the file using istat and saw that it was 1000 bytes in length and occupied two clusters on the disk.

When we attempted to copy the file out of the mount point, it worked, unlike the previous two files we’d worked with in our image. We ran the file command against it and found that it was a zip file. However, when we ran the unzip command against the file we got a nasty error message saying “End-of-central-directory signature not found…” That is not terribly helpful. In the interest of saving time and getting to the point of the post, I gave you an idea of how we could determine whether something was wrong with our zip file, and then quickly reminded readers that we were working with a modified file system.

That led us to the challenge question, actually, a multi-part question: what’s wrong, how do we fix it and in which file system data structure?

It took Joel very little time (once I lifted the registered account restriction) to leave a comment giving all the technical details of the problem. I asked Joel how we could fix it and he posted a follow up comment with an explanation. Let’s look at Joel’s fix. If you want to play along download a copy of the usbkey.img that we’ve been using and grab your hex editor.

Joel starts out by telling us that, “Parsing the FAT directory entry for ‘Scheduled Visits.exe’ indicates that the file size is 1,000 bytes and is contained in sectors 487…” Let’s look at the FAT Directory Entry for Scheduled Visits.exe:
[Figure: Scheduled Visits FAT Directory Entry]
I’ve “highlighted” two elements of the data structure. The two bytes in yellow are the starting cluster value for this file in little endian. It’s a 16-bit value because this is a FAT 16 image. Take 49 00, reverse it to get the big endian form 00 49, then convert to decimal:

(4 x 16^1) + (9 x 16^0) = 73

This is where istat gets the starting cluster value. Remember that there’s a difference of 414 between the cluster number and the sector number in our image, so cluster 73 is sector 487. I explained this in a previous post and will touch on it again in a few paragraphs.

In the ugly light-green color are the four bytes that make up our file size, again stored in little endian, E8 03 00 00. Convert this to big endian, 00 00 03 E8 and convert from hexadecimal to decimal:

(3 x 16^2) + (E x 16^1) + (8 x 16^0) = (3 x 256) + (14 x 16) + 8 = 1000

This is where istat gets the file size.
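The two conversions above can be reproduced with Python’s struct module. This is a minimal sketch rather than what istat does internally. The function name is hypothetical; the offsets 26 and 28 are the standard positions of the starting cluster and file size in a 32-byte FAT short directory entry, and the test entry below is blank apart from the two fields discussed.

```python
import struct


def parse_short_entry(entry):
    """Return (starting cluster, file size) from a 32-byte FAT16 directory entry."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    # "<H" = 16-bit little endian, "<I" = 32-bit little endian.
    (start_cluster,) = struct.unpack_from("<H", entry, 26)
    (file_size,) = struct.unpack_from("<I", entry, 28)
    return start_cluster, file_size


# Blank entry carrying only the two values from the post.
entry = bytearray(32)
entry[26:28] = bytes.fromhex("4900")      # starting cluster, little endian
entry[28:32] = bytes.fromhex("e8030000")  # file size, little endian

print(parse_short_entry(bytes(entry)))  # (73, 1000)
```

Letting struct handle the byte order avoids the manual reverse-and-convert step, which is easy to get wrong on four-byte values.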

Some of you may want to know how Joel knew where the directory entry for “Scheduled Visits.exe” was located in the image. There are a few ways you could have found it in your hex editor, but all of them require knowledge of the data structure for a directory entry so you can figure out where each record begins and ends. Here’s one way to find the directory entry. First run fsstat against the disk image to get some details about where the different data structures are located:
[Figure: Final fsstat]

As you can see, fsstat shows us that the root directory in our file system begins in sector 384. Our sector size is 512, so we multiply that by 384 and jump to the product (byte offset 196608) in our hex editor. This puts us at the beginning of the root directory. A file’s record in the directory is not a fixed size: each directory entry is 32 bytes, but a long file name takes additional entries. I’m not going to cover the entire data structure here; again I recommend Brian Carrier’s File System Forensic Analysis for complete treatment of that and so much more. If you try this in your own hex editor, you should see the data structure shown above.

So, we’ve established that our starting cluster is 73 and the size of our file is 1000 bytes. After cluster 73, how do we know where the file continues on the disk? We look in the File Allocation Table. But where is the FAT located on the disk? Again, refer back to the output from fsstat above. It shows that FAT 0 begins in sector 4. Our sector size is 512, so multiply that by 4 and jump to byte offset 2048 in your hex editor and you’ll see something like the image below (assuming you’ve made all the corrections from the previous entries in our series; you won’t have the highlighting):
[Figure: Scheduled Visits FAT Chain]

So the FAT begins at byte offset 2048. In an update to A Big FAT Lie Part 2, I posted this chart:

Offset Sector Cluster
----------------------
2048     414     ---
2050     415     ---
2052     416     002
...

Why don’t the two byte values in byte offsets 2048 – 2049 and 2050 – 2051 have a cluster associated with them? Because Microsoft’s specification for FAT file systems says the first data cluster on the disk is cluster 2. Those offsets would represent clusters 0 and 1 which don’t exist. So our first cluster on the disk is represented in bytes 2052 – 2053 in the disk image. FAT 16 file systems use 16 bits for cluster addresses. What is the 16 bit value in bytes 2052 and 2053? 03 00, which when converted to decimal is 3. This 3 tells us that the file that started in cluster 2, continues in cluster 3.

Jump ahead to the “highlighted” area. This represents the cluster chain for the “Scheduled Visits.exe” file. Remember that the FAT Directory Entry above told us the file began in cluster 73. Looking at cluster 73’s entry in the FAT, we see the 16-bit value 4A 00. Convert this using the same method as for the previous values and you get 74. So the file that began in cluster 73 continues in cluster 74. Consult cluster 74’s entry in the FAT and we see 4B 00, or 75 in decimal. But wait, according to istat’s output “Scheduled Visits.exe” should only occupy clusters 73 and 74, so cluster 74’s entry should contain an End-of-Chain marker.

In fact, as Joel noted in his answer, this file continues for five clusters. Indeed, look at the bottom of the output from the fsstat command and you’ll see the chain for this file, 487 – 491 (reported in sectors, remember sector 416 = cluster 2 so sector 487 = cluster 73).
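Following the chain by hand works for five clusters, but the same walk is easy to script. The Python sketch below is an illustration under stated assumptions: the FAT bytes are synthetic and only reproduce the chain described above, the function name is hypothetical, and the cluster-to-sector difference of 414 applies to this particular image. Against the real image, the `fat` buffer would be read starting at byte offset 2048, so cluster 73’s entry sits at 2048 + 73 × 2 = 2194.

```python
import struct

FAT16_BAD_CLUSTER = 0xFFF7  # this value and anything above it ends the walk


def follow_chain(fat, start_cluster):
    """Follow a FAT16 cluster chain; `fat` holds the raw bytes of one FAT."""
    chain = []
    cluster = start_cluster
    while 2 <= cluster < FAT16_BAD_CLUSTER:
        if cluster in chain or cluster * 2 + 2 > len(fat):
            raise ValueError("corrupt cluster chain")
        chain.append(cluster)
        # Each FAT16 entry is a 2-byte little endian value at cluster * 2.
        (cluster,) = struct.unpack_from("<H", fat, cluster * 2)
    return chain


# Synthetic FAT reproducing the chain described in the post: 73 -> ... -> 77.
fat = bytearray(2 * 80)
for current, following in [(73, 74), (74, 75), (75, 76), (76, 77), (77, 0xFFFF)]:
    struct.pack_into("<H", fat, current * 2, following)

chain = follow_chain(fat, 73)
print(chain)                                 # clusters
print([cluster + 414 for cluster in chain])  # sector numbers in this image
```

Comparing the length of the chain with the number of clusters implied by the directory entry’s file size is a quick way to spot the kind of mismatch discussed here.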

Let’s recap the facts so far. We copied “Scheduled Visits.exe” out of the mounted file system and found it to be a zip archive, but when we went to unzip it, we got an error message about an unexpected end of file. We now know, based on the values in the FAT, that our file occupies more than the two clusters istat calculated from the file size given in the FAT Directory Entry for the file.

There are a couple of ways to fix the FAT Directory Entry so that we can successfully copy the file and unzip it. One way would be to simply change the file size to 2560 (five clusters at 512 bytes each), but what if the zip file doesn’t actually occupy all of the fifth cluster? Then our file size will be incorrect. How do we get the correct file size? One way is to use the blkcat command to carve out the data in clusters 73 through 77 (sectors 487 – 491). This will be the entire zip archive file. Once carved out, unzip the file, then zip it again and check its size. When I did this on my system, I found the zip file was 2428 bytes. Using a hex editor, I was able to reduce the size even further to 2420 bytes by removing 8 bytes of nulls from the end of the file. Decimal 2420 converted to hexadecimal is 974, or 00 00 09 74 as a four-byte big endian value. Converting from big endian to little endian, we get 74 09 00 00. Plug that into the location for the file size in the directory entry data structure and save the image. Your FAT Directory Entry should then look like this:
[Figure: Fixed Directory Entry for Scheduled Visits.exe]
Note the corrected file size in blue. Let’s see what happens now when we mount the image and copy the file:
[Figure: Scheduled Visits.exe success]
And with that, we’ve successfully restored the image to the state it was in before our suspect made her modifications to the FAT data structures.
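As a cross-check on the file size arithmetic used for the fix, the conversion can be done in a few lines of Python. This is just a sketch of the calculation; the function name is hypothetical, and 2420 and the 512-byte cluster size are the values from this image.

```python
import struct


def encode_file_size(size):
    """Encode a file size as the 4-byte little endian field of a directory entry."""
    return struct.pack("<I", size)


size = 2420
print(format(size, "X"))                # 974
print(encode_file_size(size).hex(" "))  # 74 09 00 00
print(-(-size // 512))                  # clusters needed at 512 bytes each: 5
```

The last line confirms that a 2420-byte file needs five 512-byte clusters, which agrees with the five-cluster chain found in the FAT.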

Or have we? In the first post in this series, I said these “entries will detail most of the steps required to repair the USB key image.” For the Syngress book giveaway this week, be the first to leave a comment saying what I’ve left out. The winner will receive a copy of Chris Pogue’s Unix and Linux Forensic Analysis DVD Toolkit. As with previous weeks, I’ll post hints, if needed, one per day and if no one answers correctly within a week, I’ll give the book away another time.

Update 20090819 12:20 UTC:
Hint: No one really cares if you can backup, they only care if you can restore.

Update 20090820 12:20 UTC:
Hint: Engineers have been implementing these in systems for years.

Dave Hull, GCFA, is founder of Trusted Signal, a provider of info sec consulting focused on incident response, digital investigations and web application security. He’ll be teaching SANS Sec. 508: Computer Forensics, Investigation and Response in Colorado Springs, Nov. 30 – Dec. 5th.