Projects - Last Resort File Recovery

Note: advice is (of course) followed at your own risk! YMMV.

Scenario: you accidentally installed Linux over the top of your existing operating system, rather than installing it alongside.

All is not lost! It's likely that most of your data is still there, not yet overwritten. The problem is that the file allocation table will have been lost, so you no longer know where each file starts and ends, or whether it is split up into several fragments.

Let's have a look at how installing Linux changed the hard disk's contents.

[Figure: three disk layout diagrams, titled "Disk usage before accident", "Disk usage after accident" and "What it looks like in Linux". Key: file table, file 1, file 2, file 3, file 4, file 5, linux files, linux file table, unused space.]

When Linux was installed, it wrote a big block of data at the start of the disk, but the rest of the disk was untouched. The reason for leaving the rest untouched is performance: it takes a long time to write to a whole disk, but just marking it as unused is quick and easy.

You can see that parts of file 1 and file 2 were lost. More importantly, the file allocation table was overwritten.

A single file may be split into several fragments at different positions on the disk (e.g. file 3). The file allocation table contains the locations of these fragments, along with the file name and things like creation date and access permissions. Without the original file table, we don't know where each file is placed - all we see is a big array of bytes.
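To make the idea concrete, here is a minimal C++ sketch of what a file table entry might hold and how it is used to reassemble a fragmented file. Fragment, FileEntry and read_file are names made up for this illustration; real filesystems each have their own, more involved on-disk layouts.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration only; no real filesystem stores
// its table in exactly this form.
struct Fragment {
    std::size_t offset;  // where this run of bytes starts on the disk
    std::size_t length;  // how many bytes the run contains
};

struct FileEntry {
    std::string name;                 // the file name
    std::vector<Fragment> fragments;  // listed in file order
};

// Rebuild a file by concatenating its fragments in order.
std::vector<std::uint8_t> read_file(const std::vector<std::uint8_t>& disk,
                                    const FileEntry& entry) {
    std::vector<std::uint8_t> contents;
    for (const Fragment& f : entry.fragments) {
        // Ignore any fragment that points outside the disk image.
        if (f.offset > disk.size() || f.length > disk.size() - f.offset) continue;
        const auto first = disk.begin() + static_cast<std::ptrdiff_t>(f.offset);
        contents.insert(contents.end(), first,
                        first + static_cast<std::ptrdiff_t>(f.length));
    }
    return contents;
}
```

With the entry in hand, reading a fragmented file is trivial. Without it, nothing on the disk says that those two runs of bytes belong together, or in which order.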

For example, all of file 3 is still present, but we don't know where its fragments are. I'm going to focus on recovering the unfragmented files - 4 and 5.

How can we recover files 4 and 5?

We need to pick out the files from the bytes on disk, without knowing where they start or end. Unless we know something about the files, this is impossible! But if we do know something about the type of files we're looking for, we're in with a chance.

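For example, JPEGs in the common JFIF format begin with the bytes FF D8 FF E0 ?? ?? 4A 46 49 46 00 01 - where ?? can be any byte - and end with FF D9. (JPEGs straight from a digital camera are often in Exif format instead, and start with FF D8 FF E1.)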
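Checking for a header like this comes down to a byte-by-byte comparison that skips the ?? positions. The sketch below is one possible way to write it; kJfifHeader and matches_at are illustrative names, and using -1 to stand for a wildcard is simply a convention chosen here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The header pattern from the text; -1 stands for a "??" wildcard byte.
const std::vector<int> kJfifHeader = {0xFF, 0xD8, 0xFF, 0xE0, -1,   -1,
                                      0x4A, 0x46, 0x49, 0x46, 0x00, 0x01};

// True if `pattern` matches the bytes of `data` starting at index `pos`.
bool matches_at(const std::vector<std::uint8_t>& data, std::size_t pos,
                const std::vector<int>& pattern) {
    // Not enough bytes left for a full match.
    if (pos > data.size() || pattern.size() > data.size() - pos) return false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        // Wildcards (negative values) match any byte.
        if (pattern[i] >= 0 && data[pos + i] != pattern[i]) return false;
    }
    return true;
}
```

The same function works for the two-byte footer by passing {0xFF, 0xD9} as the pattern.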

I've written a JPEG file recovery program in C++. Note: it requires a 64-bit install of Linux. (If you have a 64-bit Intel or AMD processor, the Ubuntu x86_64 live CD is fine; just run sudo apt-get install g++ first.)
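The program itself isn't reproduced here. As a rough illustration of the approach (not the actual program), the sketch below scans a buffer of raw bytes for the JFIF header and then for the next FF D9 after it. The name carve_jpegs is made up, and the whole disk is assumed to fit in memory as one buffer, which a real tool would avoid by reading in chunks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Range = std::pair<std::size_t, std::size_t>;  // [start, end) offsets

// Find candidate JPEGs: each JFIF header paired with the next FF D9.
std::vector<Range> carve_jpegs(const Bytes& disk) {
    // -1 stands for a "??" wildcard byte.
    static const std::vector<int> header = {0xFF, 0xD8, 0xFF, 0xE0, -1,   -1,
                                            0x4A, 0x46, 0x49, 0x46, 0x00, 0x01};
    static const std::vector<int> footer = {0xFF, 0xD9};
    auto same = [](std::uint8_t byte, int pat) { return pat < 0 || byte == pat; };

    std::vector<Range> found;
    auto pos = disk.begin();
    while (true) {
        auto start = std::search(pos, disk.end(), header.begin(), header.end(), same);
        if (start == disk.end()) break;  // no more headers
        auto stop = std::search(start + header.size(), disk.end(),
                                footer.begin(), footer.end(), same);
        if (stop == disk.end()) break;   // header with no footer after it
        stop += footer.size();           // include the footer bytes
        found.emplace_back(static_cast<std::size_t>(start - disk.begin()),
                           static_cast<std::size_t>(stop - disk.begin()));
        pos = stop;                      // carry on after this file
    }
    return found;
}
```

Each returned range can then be written out as its own .jpg file. This only works for unfragmented files such as files 4 and 5. It will also cut a file short if FF D9 happens to appear earlier in the data, for instance at the end of an embedded thumbnail.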

Several other formats have distinctive headers and footers - the interested reader is invited to extend the technique to other file types.