FAT File System

File Allocation Table (FAT) is a file system architecture used for data storage. Originally designed in 1977 for use on floppy disks, it was extended for use on hard disk drives with DOS and then Microsoft Windows. FAT is simple and robust, offering good performance in lightweight implementations. It cannot, however, deliver the same performance, reliability and scalability as most modern file systems.

FAT is supported by most PC operating systems, mobile devices, digital cameras and embedded systems, and is therefore still commonly used for data exchange between computer systems and devices. FAT has long since been superseded by NTFS as the default file system for Windows systems. Today FAT is suited to removable media (excepting CD, DVD and Blu-ray) and is most commonly found on USB sticks, flash cards and other solid-state memory devices. Although seen as a legacy file system, FAT data recovery is still common, mostly from removable media.

FAT File System Features

The original FAT12 file system used 12 bits to store the logical cluster numbers for file allocation. In 1984 FAT16, which uses 16-bit logical cluster numbers, was introduced for use on hard disk drives. With the introduction of Windows 95, support for Unicode names of up to 255 characters was added. The following year, 1996, Windows 95 OSR2 saw the introduction of FAT32, which uses 32-bit logical cluster addressing.

When hard drive space was limited and expensive, compression software was available for FAT12 and FAT16 file systems. Fortunately this is now almost never seen, as each compression utility adds complications to data recovery, depending upon the damage to the hard drive.

FAT File System Internal Structure

As the name FAT suggests, the file allocation is stored in a table: a linked list of clusters in which each entry records the next cluster in the chain, with a specific value denoting the end of the allocation. Two copies of the file allocation table are held, except in some rare instances on removable media, which provides a certain level of redundancy.
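As a sketch of how this chain works, the following follows a file's clusters through a hypothetical FAT16 table, where any value of 0xFFF8 or above marks the end of the chain (the table contents and cluster numbers here are illustrative, not taken from a real volume):

```python
END_OF_CHAIN = 0xFFF8  # FAT16: values >= 0xFFF8 mark the end of a chain

def cluster_chain(fat, first_cluster):
    """Follow a file's cluster chain from its first cluster, returning
    every cluster allocated to the file in order."""
    chain = []
    cluster = first_cluster
    while cluster < END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]  # each FAT entry points to the next cluster
    return chain

# Hypothetical table: the file occupies clusters 2 -> 3 -> 5
fat = {2: 3, 3: 5, 5: 0xFFFF}
print(cluster_chain(fat, 2))  # [2, 3, 5]
```

This is also why damage to the FAT breaks the chain beyond the first cluster: the directory entry records only where the chain starts.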

All file metadata is stored in directories, which record the first cluster of each allocation; the FAT is then used to access all further clusters. The default naming convention is an uppercase name in 8.3 format: 8 characters for the filename and 3 for the extension. To provide full Unicode support, the extended version, often referred to as VFAT, stores long names across additional directory entries while keeping a short 8.3 alias for backward compatibility with legacy software.
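The classic 32-byte directory entry can be decoded with simple fixed offsets. The sketch below assumes the standard layout (8-byte name, 3-byte extension, 16-bit first cluster at offset 26, 32-bit size at offset 28); the sample entry is fabricated for illustration:

```python
import struct

def parse_dir_entry(entry: bytes):
    """Decode an 8.3 directory entry: name, first cluster and size."""
    name = entry[0:8].decode("ascii").rstrip()
    ext = entry[8:11].decode("ascii").rstrip()
    # little-endian: 16-bit first cluster, then 32-bit file size
    first_cluster, size = struct.unpack_from("<HI", entry, 26)
    return (f"{name}.{ext}" if ext else name), first_cluster, size

# Fabricated entry for "README.TXT", first cluster 2, 1024 bytes
raw = b"README  TXT" + bytes(15) + struct.pack("<HI", 2, 1024)
print(parse_dir_entry(raw))  # ('README.TXT', 2, 1024)
```

On FAT32 the high 16 bits of the first cluster are stored separately (at offset 20), so this sketch covers the FAT12/FAT16 case only.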

FAT Data Recovery

FAT is a simple and well-defined file system, so the results of data recovery depend entirely upon the level of damage and which areas of the disk have been affected. Damage to the FAT is recoverable as long as both copies of the same information have not been lost; where they have, the allocation of some files and directories may be lost, although the first cluster of each is always known from its directory entry. Damage to a directory will result in lost files, but the allocated data chains can still be recovered, albeit without their metadata.

Reformatting a FAT volume clears all the allocation data in the FATs and the root directory. Data recovery is possible by performing an un-format, which searches for directory entries. The quality of this un-format depends upon the fragmentation level of the files; only the first cluster of any file found is guaranteed to be correct.

XFS Data Volume

XFS development was started in 1993 by Silicon Graphics Inc. and released the following year on their IRIX operating system. It is a high-performance 64-bit journaling file system, excelling at parallel input/output operations. The maximum supported data volume size is 16EB, with a maximum file size of 8EB.

In 2000 XFS was released under the GNU General Public License (GPL) and ported to the Linux operating system, with the port released the following year. In 2014 Red Hat Enterprise Linux adopted XFS as its default file system, including support for use as the boot partition. The specifications have been published, and their details can be utilised for the purposes of XFS data recovery, to overcome a variety of file system problems.

Features of XFS Data Volumes

As is now common, XFS uses journaling to enable the recovery of all system metadata following a crash or power failure. XFS introduced 64-bit timestamps, extending the Unix date range beyond 2038, along with nanosecond resolution rather than the traditional 1-second resolution of UFS.

A unique feature of XFS is the pre-allocation of I/O bandwidth at a pre-determined rate, which is suitable for many real-time applications. This feature is, however, only supported on IRIX using specialised hardware.

XFS Data Volume Internals

XFS splits the file system into Allocation Groups (AGs) with no pre-defined inodes allocated. All inodes are dynamically allocated and referenced according to their AG and position within it. The only pre-allocated system structures are the superblocks (a copy in each AG) and the data block usage bitmaps. This avoids wasting space on a file system containing only a small number of files, but conversely does not place an arbitrary limit on the number of files and directories allowed.
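Because an inode number encodes its AG and position, locating an inode is bit arithmetic rather than a table lookup. The shift widths (blocks per AG and inodes per block, as powers of two) come from the superblock on a real volume; the values below are purely illustrative:

```python
AGBLKLOG = 16   # log2(blocks per AG) -- illustrative, read from the superblock
INOPBLOG = 4    # log2(inodes per block) -- illustrative

def decode_inode(ino):
    """Split an absolute XFS inode number into (AG, block, slot)."""
    ag_bits = AGBLKLOG + INOPBLOG
    ag = ino >> ag_bits                     # allocation group number
    agino = ino & ((1 << ag_bits) - 1)      # inode number within the AG
    block = agino >> INOPBLOG               # block within the AG
    slot = agino & ((1 << INOPBLOG) - 1)    # inode slot within that block
    return ag, block, slot

# An inode in AG 3, block 5, slot 2
print(decode_inode((3 << 20) | (5 << 4) | 2))  # (3, 5, 2)
```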

Directories are stored using a B-tree, which allows fast and efficient searching and sorting. File allocation uses extents, also stored in a B-tree, which allows fast access to any data block; very useful for real-time systems.
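A sketch of why extent-based allocation gives fast access to any data block: with extents kept sorted by starting file offset (as a B-tree keeps them), finding the extent covering a given offset is a binary search. The tuple layout here is a simplification for illustration, not the on-disk format:

```python
import bisect

def block_for_offset(extents, file_block):
    """Map a logical file block to a disk block.
    extents: list of (file_offset, length, disk_start), sorted by offset."""
    starts = [e[0] for e in extents]
    i = bisect.bisect_right(starts, file_block) - 1
    offset, length, disk_start = extents[i]
    if not offset <= file_block < offset + length:
        raise ValueError("block not allocated")
    return disk_start + (file_block - offset)

# Three extents covering file blocks 0-27 (illustrative numbers)
extents = [(0, 8, 1000), (8, 4, 2000), (12, 16, 5000)]
print(block_for_offset(extents, 10))  # 2002
```

Contrast this with block-based allocation, where reaching block N of a large file may mean walking indirect block lists.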

XFS Data Recovery

While XFS is fairly complex, the specifications are readily available, making the development of data recovery tools relatively simple. The most common reasons for XFS data recovery are hardware failure, deletion of the partition, or reformatting of the volume.

The metadata structures allow the file system to be readily scanned during data recovery for lost files and directories, following the loss of important sections of the file system.

When an XFS volume is formatted, only the initial inode allocation (usually 128 entries) including the root directory is overwritten, along with the data block usage bitmaps. This allows data recovery to be performed upon a data volume which has been reformatted, with only minimal data loss, depending upon the amount of data written after the format procedure.

Linux Extended File System

The extended file system (ext) was created in April 1992 as the first file system designed specifically for the Linux kernel. Its metadata structures are very similar to those of the traditional Unix File System (UFS), allowing a data volume of up to 2GB.

In January 1993 this was replaced by the second incarnation of the file system, ext2. Depending upon the implementation in the Linux kernel, the maximum volume size ranges from 2TB up to 32TB, with the largest file size correspondingly between 16GB and 2TB. Despite the introduction of ext3 and ext4, ext2 is still in use, mostly on SD cards and USB flash drives. Information about all versions of the file system is readily available via the open source community, and data recovery from most situations is possible.

Features of Linux Extended File System

Ext3 introduced journaling in 2001, allowing the automatic recovery of file and directory metadata after a crash or power failure. Like UFS, the allocation is block based, with indirect allocation blocks used for large files. A driver allowing data compression is available, but in practice rarely used.

Ext4, developed in 2008, extended the maximum volume size to 1EB and the maximum file size to 16TB. The representable date range was also enhanced considerably, with nanosecond rather than 1-second resolution.

Linux Extended File System Internals

The internal structure is very similar to UFS, with inodes and usage bitmaps allocated in fixed positions within each block group. Ext3 saw HTree indexing added for large directories, improving the performance of the file system when a large number of files or directories are stored together.

Extents were added with Ext4, allowing a set of contiguous blocks to be defined efficiently, in chunks of up to 128MB. Up to four extents can be stored in the inode itself, with an extent tree used to hold any further allocation. The number of subdirectories allowed was also increased from 32,000 to an unlimited number.
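The 128MB figure can be checked from the on-disk extent record: its length field is 15 bits wide, so a single extent spans at most 2^15 blocks, and with the common 4KB block size:

```python
block_size = 4096        # bytes; the usual ext4 block size
max_blocks = 2 ** 15     # 15-bit extent length field
max_extent = max_blocks * block_size
print(max_extent)        # 134217728 bytes, i.e. 128MB
```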

Linux Extended File System Data Recovery

As with UFS, data recovery from ext2, ext3 and ext4 file systems is a relatively well-known process, with no surprises. These file systems also suffer the same problem: the loss of directory information loses the names of the files and directories associated with the inodes. A reformat will also see all inodes overwritten, resulting in the loss of all metadata; a data trawl is then the only data recovery option left.

If an extended file system cannot be automatically repaired and mounted by the operating system, it is important not to run third-party tools in an attempt to fix the issue, as they may destroy important data structures. Once the operating system is unable to mount the file system, severe damage has occurred, which requires a professional data recovery service to examine the data, as a full understanding of the underlying data structures is necessary.

Unix File System (UFS)

Unix was first developed in the 1970s at the Bell Labs research centre, to provide a multitasking, multiuser computer operating system. Originally intended for internal use, Unix was soon licenced by AT&T for academic and commercial use to parties such as the University of California, Berkeley (BSD), Microsoft (Xenix), IBM (AIX) and Sun Microsystems (SunOS/Solaris).

The Unix philosophy is characterised by a modular design, which presents a set of simple tools for well-defined functions, with a unified file system acting as the main means of communication. There have been many clones of Unix since, with Linux now proving more popular on server platforms than the “true” Unix operating systems. Each “true” Unix file system is based on the same architecture, making Unix data recovery fairly straightforward, even for a new variant.

Features of Unix File Systems

Unix file systems are defined by a superblock, which specifies all the parameters the operating system requires to mount them. The file system itself is split into cylinder groups, each comprising a backup copy of the superblock and a cylinder group header, which holds statistics and free block lists. Each cylinder group contains a set of inodes, predefined at format time, holding the file and directory attributes. These are followed by the data blocks used for file storage.
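Because the inodes are predefined at format time in fixed positions, finding an inode on disk is simple arithmetic over the cylinder group layout. The group size below is illustrative; the real figure is recorded in the superblock:

```python
INODES_PER_GROUP = 2048   # fixed at format time -- illustrative value

def locate_inode(ino):
    """Return (cylinder group, slot within the group's inode table)."""
    return ino // INODES_PER_GROUP, ino % INODES_PER_GROUP

print(locate_inode(5000))  # (2, 904)
```

This fixed placement is also why a reformat, which rewrites those positions, is so destructive to recovery prospects.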

The original Unix file system contained only a single cylinder group, but as disks became larger this caused problems, with the disk performing a large number of seeks, known as thrashing. Multiple cylinder groups have helped to lessen this, thereby speeding up the file system. A later addition was journaling, allowing easy recovery of the file system by the operating system after a crash or power failure.

Unix File System Internal Features

The inodes contain the metadata for individual files, with the allocation data for larger files requiring the use of indirect allocation blocks. Each block is individually listed, which for larger files is inefficient, even when the data blocks are sequential.
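The reach of this scheme can be worked out from the block size and pointer width. Assuming the classic layout of 12 direct pointers plus single, double and triple indirect blocks, 4KB blocks and 4-byte block addresses (illustrative parameters; real UFS variants differ):

```python
block_size = 4096
ptrs_per_block = block_size // 4   # 1024 four-byte addresses per indirect block
direct = 12

single = ptrs_per_block            # one extra block read per data block
double = ptrs_per_block ** 2       # two extra reads
triple = ptrs_per_block ** 3       # three extra reads

max_file = (direct + single + double + triple) * block_size
print(max_file // 2 ** 40)         # about 4 (TB) with these parameters
```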

Directory files allocate space in which the file names, with their associated inode numbers, are stored. These are the only locations where file names are listed, so the loss of a directory inode or the directory information can have drastic consequences for the file system.

Unix File System Data Recovery

As explained, Unix file systems follow a well-defined architecture, so data recovery from a new file system is relatively simple: it is a matter of encoding the new data structures in order to produce a new data recovery solution. Unfortunately, the nature of the file system means that the loss of a directory file means the loss of the names of the files located in that directory. As long as the inodes and the associated allocation can be found, these files are recoverable, but their original names are lost.

Reformatting a Unix file system results in all the inodes, which are in fixed positions, being cleared. This makes a data trawl the only viable option in such cases, processing the space unused by the system areas of the file system.

HFS+ Data Volume

HFS Plus, or HFS+, was developed by Apple Inc. and serves as the primary file system of OS X. HFS+ was introduced with the January 19, 1998 release of Mac OS 8.1 as a replacement for the Hierarchical File System (HFS), the primary file system previously used in Macintosh computers (and other systems running Mac OS).

Features of HFS+ Data Volumes

HFS Plus can address 2^32 allocation blocks, allowing a volume size of 8EB and also a file size of 8EB. HFS+ uses forks, analogous to the Alternate Data Streams (ADS) used in NTFS volumes, although until 2006 only the resource and data forks were used.

As of Mac OS X v10.3, all HFS Plus volumes have journaling enabled by default, allowing the file system to be recovered from a crash or power failure when the operating system is booted. With Mac OS X Snow Leopard 10.6, HFS+ compression was added, also referred to as AppleFSCompression. Compressed data can be stored in either an extended attribute or the resource fork.

Internal HFS+ Data Structures

The HFS Plus Volume Header is located in sector 2 of the file system, with an alternate copy located in the second-to-last sector as a backup. The Volume Header contains all the information about the file system, including the locations of all the system areas and other useful metadata.

The system areas comprise the Allocation File, Catalog File, Extents Overflow File and Attributes File. Apart from the Allocation File, these are all stored as B-trees, which keep the information sorted, allowing fast searching, sequential access, insertions and deletions. The data blocks follow the initial Catalog File allocation, which may contain extents for each of the system files.

Recoverability of an HFS Plus Volume

The loss of, or corruption to, only a few particular sectors, or a reformat, can render the data on the volume almost completely unrecoverable, leaving a data trawl as the only viable option to recover any usable data. The loss of the header of the Catalog File is a situation most data recovery solutions are unable to handle. The DiskEng software is, however, able to deal with this problem, with data loss only caused by further damage to the file system.

NTFS Volume

NTFS (New Technology File System) was developed by Microsoft as a replacement for the FAT (File Allocation Table) file system. By the 1990s FAT was proving to have many limitations, some of which were addressed to improve its usability, but a newer, more versatile, multi-user-ready and reliable file system was required. The first version was released in 1993 with Windows NT 3.1, with the last major update coming with the release of Windows XP.

Features of NTFS Data Volumes

The maximum cluster size supported by an NTFS partition is 64KB, with the maximum volume size (2^32−1 clusters) being 256TB. The theoretical maximum file size is 16EB, although as of Windows 8 it has been restricted to 256TB. The file system can be set up to allow data compression, which uses the LZNT1 algorithm, to improve disk space usage and, in some instances, data throughput when reading. Alternate Data Streams (ADS) are available, initially provided as a means of implementing Services for Macintosh (SFM). Sparse file allocation is also available, which allows large blank files to be created almost instantaneously, without having to reserve the file allocation on disk first.
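The 256TB limit follows directly from those two numbers: a cluster count capped at 2^32−1 combined with the 64KB maximum cluster size:

```python
cluster_size = 64 * 1024        # maximum NTFS cluster size in bytes
max_clusters = 2 ** 32 - 1      # cluster-count limit of the implementation
max_volume = max_clusters * cluster_size
print(max_volume)               # one cluster short of 256TB
```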

NTFS volumes also provide journaling, which stores copies of data about to be moved or modified, so that in the event of a system crash or power failure, uncommitted changes to critical data structures can be rolled back when the volume is remounted. Data encryption is another option, available in the Professional, Ultimate and Server editions of Windows. In line with most mainframe and Unix systems, user quotas can be implemented on an NTFS volume.

NTFS Internal Data Structures

All file and directory metadata, such as file name, file dates, access control information and size, is stored in the Master File Table (MFT), itself a file, which is opened when the file system is mounted. The first few MFT entries (usually the first four, of 1kB each) are mirrored in a second location, allowing possible corruption of the first copy to be overcome; the locations of both copies are held in the NTFS volume boot sector. A copy of the boot sector is also held in the last sector of the partition.
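The boot sector fields mentioned above sit at fixed offsets: bytes per sector at 0x0B, sectors per cluster at 0x0D, and the starting clusters of the MFT and its mirror at 0x30 and 0x38. The sketch below reads them; the sample boot sector is fabricated for illustration:

```python
import struct

def mft_locations(boot: bytes):
    """Return the byte offsets of the MFT and its mirror on the volume."""
    bytes_per_sector, = struct.unpack_from("<H", boot, 0x0B)
    sectors_per_cluster = boot[0x0D]
    mft_cluster, mirror_cluster = struct.unpack_from("<QQ", boot, 0x30)
    cluster_size = bytes_per_sector * sectors_per_cluster
    return mft_cluster * cluster_size, mirror_cluster * cluster_size

# Fabricated boot sector: 512-byte sectors, 8 sectors per cluster,
# MFT at cluster 786432, mirror at cluster 2
boot = bytearray(512)
struct.pack_into("<H", boot, 0x0B, 512)
boot[0x0D] = 8
struct.pack_into("<QQ", boot, 0x30, 786432, 2)
print(mft_locations(bytes(boot)))  # (3221225472, 8192)
```

Having two independently located copies is what lets recovery tools fall back to the mirror when the primary MFT start is corrupted.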

Recoverability of NTFS

The robust nature of NTFS makes it a highly recoverable file system, which can be rebuilt successfully even after the loss of large sections of the system data structures. The release of NTFS with Windows XP saw the introduction of numbered records within the MFT, which can be used to resequence the entries in the event of the loss of its allocation information. These can also be used after a reformat of the file system, to rebuild the directory structure and locate all recoverable files.

During normal data recovery analysis, it is possible to scan for lost files and directories, the results of which are usually very good. Scanning for deleted files and directories can also yield good results, but this depends upon whether data was written to the file system after the deletion of the items in question. It is only in the rarest and most severe cases of data corruption or unreadable disk sectors that data recovery from NTFS may be unsuccessful.