Tar Backup Format

Tar is a software utility for collecting multiple files into a single archive file, sometimes referred to as a tarball. The name is derived from "tape archive", as the format was originally designed for writing data sequentially to a tape drive. Early tape drives wrote variable-length data blocks, which was slower and less efficient than writing fixed-length blocks. Tar was designed to write fixed-length data blocks, which gives file headers a fixed alignment and allows backups to be written to tape, disk or ordinary files.
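The fixed alignment can be observed with a short Python sketch. It assumes the standard library tarfile module and the usual 512-byte tar block size (a figure not stated in this article); the helper names build_archive and member_offsets are illustrative only.

```python
import io
import tarfile


def build_archive(files):
    """Return the bytes of an uncompressed ustar archive holding `files` (name -> bytes)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_offsets(archive):
    """List (name, header offset, data offset) for every member of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        return [(m.name, m.offset, m.offset_data) for m in tar.getmembers()]


archive = build_archive({"a.txt": b"hello", "b.txt": b"x" * 600})
offsets = member_offsets(archive)

# Every header and every data area starts on a block boundary,
# because file data is padded with zero bytes up to the next block.
for name, header_at, data_at in offsets:
    print(name, header_at, data_at)
```

Because each header sits on a block boundary, a reader can step through an archive block by block without needing a separate index, which suits sequential media such as tape.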

Tar was originally developed within the Unix environment as a means of creating archives for data backup and for distributing data. As with the backup formats that followed it, each file stored in the archive is preceded by a header containing the metadata for that file. Tar has a long history and has gained many extensions, but it remains a very well-known format, which makes data recovery a relatively easy task.

Tar Development

Headers within the original tar archive format stored all numbers as printable octal strings, which for historical reasons limited the maximum size of any individual file stored within the archive to 8GB. In 2001 the ability to store numbers in native binary format was added, which potentially allows files of unlimited size, larger even than the 64-bit limit present in most file systems, to be archived.
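The 8GB figure follows from the width of the size field. The sketch below assumes the classic 12-byte size field holding 11 octal digits plus a terminating NUL, and assumes the binary form follows the base-256 convention used by GNU tar (a leading 0x80 byte followed by a big-endian integer). The function names are illustrative, and only non-negative values are handled.

```python
FIELD_WIDTH = 12                     # assumed width of the size field in bytes
OCTAL_DIGITS = FIELD_WIDTH - 1       # one byte is reserved for the terminator
MAX_OCTAL_SIZE = 8 ** OCTAL_DIGITS - 1


def encode_octal(value):
    """Encode a size as zero-padded printable octal followed by NUL."""
    if not 0 <= value <= MAX_OCTAL_SIZE:
        raise ValueError("value does not fit in an octal size field")
    return format(value, "o").zfill(OCTAL_DIGITS).encode("ascii") + b"\0"


def encode_base256(value):
    """Encode a non-negative size in binary: marker byte 0x80, then big-endian."""
    return b"\x80" + value.to_bytes(FIELD_WIDTH - 1, "big")


def decode_number(field):
    """Decode either representation, using the high bit of the first byte as the marker."""
    if field[0] & 0x80:
        return int.from_bytes(field[1:], "big")
    text = field.rstrip(b"\0 ").decode("ascii")
    return int(text, 8) if text else 0


print(MAX_OCTAL_SIZE)                # largest size the octal field can hold
print(encode_octal(5))
print(decode_number(encode_base256(8 ** 11)))
```

Eleven octal digits carry 33 bits, so the largest representable size is one byte short of 8 × 1024³ bytes. The binary form uses the same field for an 88-bit integer in this sketch, which is why it exceeds a 64-bit file system limit.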

Through a process of proposals and ratification as a POSIX standard, other limitations have been overcome. Dates have been extended from 32-bit to 64-bit representations with nanosecond resolution, which is now common in most operating systems and data volumes. Security information was originally limited to the standard Unix permissions, user ID and group ID; this has been extended to allow extended attributes and access control lists (ACLs) to be stored.
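As a rough illustration of how such extensions travel with a file, the sketch below uses the pax format support in Python's standard tarfile module to store a sub-second timestamp and one extra keyword alongside a member. The keyword EXAMPLE.comment is a made-up placeholder, not a keyword used by any real tar implementation, and Python floats cannot carry full nanosecond precision.

```python
import io
import tarfile

data = b"payload"
info = tarfile.TarInfo(name="report.txt")
info.size = len(data)
info.mtime = 1700000000.25           # fractional seconds need an extended header
info.pax_headers = {"EXAMPLE.comment": "nightly backup"}   # hypothetical keyword

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    tar.addfile(info, io.BytesIO(data))

# Read the archive back and inspect what was preserved.
with tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r:") as tar:
    member = tar.getmember("report.txt")
    restored_mtime = member.mtime
    restored_headers = dict(member.pax_headers)

print(restored_mtime)
print(restored_headers)
```

The extra records are written in an additional header placed before the ordinary one, so older readers that do not understand them can still see the file with its basic metadata.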

Tar Archive Usage

Although tar archives are now rarely used for backing up data to tape, they are very commonly used for distributing software and updates. It is also common to combine the tar archive with one of the many compression formats available on Unix systems, so that the archive stream is compressed as it is created.
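A minimal sketch of compressing an archive as it is written, assuming Python's standard tarfile module with gzip as the compression format; the helper name make_tar_gz is illustrative.

```python
import io
import tarfile


def make_tar_gz(files):
    """Write `files` (name -> bytes) to a gzip-compressed tar archive in memory."""
    buf = io.BytesIO()
    # Mode "w:gz" passes the tar stream through gzip while it is being written.
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


blob = make_tar_gz({"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"})

# Reading reverses the process: decompress first, then parse the tar stream.
with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
    names = tar.getnames()
    readme = tar.extractfile("README").read()

print(names)
print(readme)
```

Because compression is applied to the whole stream rather than to each file, an individual file cannot be located without decompressing everything that precedes it.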

The tar utility available on Unix systems is a command-line tool with no graphical user interface and no ability to create a catalog of the files written to tape. It is therefore not ideal as part of a backup strategy within a disaster recovery plan. However, tar has formed the basis for several backup packages that have sought to overcome these limitations, providing a graphical user interface to allow easy selection of the files to be backed up or restored. One such package is Veritas NetBackup, which provides cross-platform backup functionality, allowing multiple computers to send data to a single server.

Tar Data Recovery

Although tar is now rarely seen, especially on modern tape drives, it can still be found on some older and now obsolete tape formats, such as Exabyte tapes, early DDS DAT cartridges and other tape formats commonly used with Unix systems.

Tar is an important data format: it is in common use for software distribution, particularly in the open source sector, and it forms the core of many other backup formats still in use. Familiarity with the format is also useful when recovering or converting data stored as tar archives on legacy tape cartridges.
