CPIO Data Archive

CPIO is a general-purpose file archive utility which has been installed on almost all Unix-based operating systems since its first introduction. The utility was originally designed to enable data to be stored on magnetic computer tape. CPIO derives its name from the phrase "copy in and out", which refers to the standard input and output operations it uses.

At one time CPIO was in common use on archive tapes, particularly for distributing or transferring data between computer systems, as it allowed data compatibility across different platforms. CPIO is however still used in several applications, including the Linux RPM Package Manager, the initramfs in the Linux kernel and the Apple pax installer. With the introduction of more versatile backup formats without the same limitations, CPIO is now only rarely seen as a backup format, usually confined to old and often obsolete media, which usually require data recovery or data conversion.

CPIO Development

In common with other archive formats, CPIO was developed to store multiple files and directories within a single stream of bytes, be that to a disk file or directly to magnetic tape. CPIO is unusual in that the archive format does not align the file and directory headers on a block boundary. This is a throwback to when it was extremely important to save as much space as possible, particularly as the majority of tape drives in the 1970s did not ship with hardware compression as standard. The original format stored its header values as binary data and was capable of storing an individual file up to 4 gigabytes in size, a very uncommon feature within an archive format of the time. Although the headers are not block aligned, the format does include some padding bytes, which are used to delimit some fields while aligning the data on word boundaries. A disadvantage of storing the values as binary numbers is that it limits the compatibility of the format between machines with different byte orders, and causes data alignment issues.
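As an illustration of the byte order problem, the following Python sketch decodes the header of the original binary format. The layout used here (thirteen 16-bit words, with the 32-bit values split into a high and a low word) is taken from the publicly documented cpio file format rather than from this article, and the function name parse_old_binary_header is purely hypothetical.

```python
import struct

OLD_BINARY_MAGIC = 0o070707  # magic number of the binary format (0x71C7)
HEADER_SIZE = 26             # thirteen 16-bit words


def parse_old_binary_header(buf):
    """Decode a 26-byte binary cpio header (layout assumed from cpio(5))."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("need at least 26 bytes")
    # The byte order depends on the machine that wrote the archive,
    # so try both and keep the one that yields the expected magic.
    for order in ("<", ">"):
        fields = struct.unpack(order + "13H", buf[:HEADER_SIZE])
        if fields[0] == OLD_BINARY_MAGIC:
            break
    else:
        raise ValueError("not a binary cpio header")
    (_, dev, ino, mode, uid, gid, nlink, rdev,
     mtime_hi, mtime_lo, namesize, size_hi, size_lo) = fields
    filesize = (size_hi << 16) | size_lo  # 32-bit size, high word first
    return {
        "byte_order": "little" if order == "<" else "big",
        "mode": mode,
        "mtime": (mtime_hi << 16) | mtime_lo,
        "namesize": namesize,
        "filesize": filesize,
        # File name and file data are each padded to an even length.
        "name_padding": namesize % 2,
        "data_padding": filesize % 2,
    }
```

Trying both byte orders against the magic number is one simple way of coping with archives written on a machine of the opposite byte order; a real recovery tool would need further checks on damaged media.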

Superseding this was the Portable ASCII Format, also referred to as the Old ASCII Format, which was developed to overcome the compatibility issues of the original format. This was done by storing the numerical values as printable octal numbers instead. The header preceding the file name and its data has a fixed length of 76 characters, and the file name and data follow it directly without any padding. The file size is held in 11 octal digits, allowing files of up to 8 gigabytes to be stored.
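A minimal Python sketch of reading a Portable ASCII header is given below. The field names and widths are assumptions taken from the publicly documented format (a 76 character header beginning with the magic string 070707), and parse_odc_header is a hypothetical helper name, not part of any existing tool.

```python
# (field name, width in octal digits); the widths add up to 76 characters
ODC_FIELDS = [
    ("magic", 6), ("dev", 6), ("ino", 6), ("mode", 6),
    ("uid", 6), ("gid", 6), ("nlink", 6), ("rdev", 6),
    ("mtime", 11), ("namesize", 6), ("filesize", 11),
]
ODC_HEADER_SIZE = sum(width for _, width in ODC_FIELDS)

# 11 octal digits hold 33 bits, hence the 8 gigabyte limit
MAX_ODC_FILE_SIZE = 0o77777777777


def parse_odc_header(buf):
    """Decode a Portable ASCII header into a dict of integers."""
    if len(buf) < ODC_HEADER_SIZE:
        raise ValueError("need at least 76 bytes")
    if buf[:6] != b"070707":
        raise ValueError("not a Portable ASCII cpio header")
    header = {}
    offset = 0
    for name, width in ODC_FIELDS:
        # Every field is a run of printable octal digits.
        header[name] = int(buf[offset:offset + width], 8)
        offset += width
    return header
```

Because every value is plain text, the header reads identically regardless of the byte order of the machine which wrote it.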

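This was followed by the New ASCII Format, in which all header values are stored as 8 character printable hexadecimal numbers, with the file name and the file data each padded to end on a 32-bit boundary. In previous versions, any hard link encountered would result in a duplicate of the data being written to the archive; this version handles hard links correctly, writing the data once and setting the size to zero for the other linked entries. The maximum permissible file size was however reduced to only 4 gigabytes. Data verification was also added to the header in a variant of this format. It is referred to as a CRC check, despite being a simple sum of the data bytes rather than a true cyclic redundancy check.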
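The hexadecimal header and the so-called CRC can be sketched in Python as follows. The field order, the 110 character header length and the magic strings 070701 and 070702 are assumptions taken from the publicly documented format, and the names parse_newc_header and cpio_checksum are hypothetical.

```python
NEWC_FIELDS = (
    "ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
    "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize", "check",
)
NEWC_HEADER_SIZE = 6 + 8 * len(NEWC_FIELDS)  # 110 characters


def parse_newc_header(buf):
    """Decode a New ASCII header; every field is 8 hexadecimal digits."""
    if len(buf) < NEWC_HEADER_SIZE:
        raise ValueError("need at least 110 bytes")
    magic = buf[:6]
    if magic not in (b"070701", b"070702"):
        raise ValueError("not a New ASCII cpio header")
    header = {}
    for index, name in enumerate(NEWC_FIELDS):
        start = 6 + 8 * index
        header[name] = int(buf[start:start + 8], 16)
    # Only the 070702 variant carries a meaningful checksum value.
    header["has_checksum"] = magic == b"070702"
    return header


def cpio_checksum(data):
    """Sum of all data bytes, truncated to 32 bits (not a real CRC)."""
    return sum(data) & 0xFFFFFFFF
```

A plain byte sum of this kind will not notice bytes which have been reordered, which is one reason the check offers weaker protection than a true cyclic redundancy check.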

Data Recovery and Conversion From CPIO

Despite the CPIO archive format being designed to allow the interchange of data between different platforms, and being thoroughly documented, there are minor differences between implementations which sometimes require our data recovery and conversion utilities to be modified to handle them. There is no compression available within the CPIO format itself, although the archive can be passed through a tool such as gzip before being written to a disk file or tape.
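As a rough sketch of a first step when examining an unknown archive, the Python below removes an optional gzip wrapper and then identifies the CPIO variant from its magic number. The magic values are assumptions taken from the publicly documented formats, the function names are hypothetical, and the sketch does not represent the utilities used in our laboratory.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def unwrap_gzip(raw):
    """Return the decompressed bytes if raw is gzip data, else raw itself."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def identify_cpio(raw):
    """Guess the cpio variant from the first bytes of the archive."""
    data = unwrap_gzip(raw)
    if data[:6] == b"070707":
        return "portable ASCII (octal)"
    if data[:6] == b"070701":
        return "new ASCII (hexadecimal)"
    if data[:6] == b"070702":
        return "new ASCII with checksum"
    # Binary magic 0o070707 is 0x71C7, stored in either byte order.
    if data[:2] in (b"\xc7\x71", b"\x71\xc7"):
        return "original binary"
    return "unknown"
```

The ASCII magic strings are tested before the binary magic, since the binary check looks at only two bytes and is therefore the weakest of the four.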

CPIO is rarely seen, with most examples being found on obsolete types of media which have usually been sent to us for data conversion, although the occasional tape arrives for data recovery after media damage has caused a restoration failure. We have considerable experience at DiskEng in extracting the data files from all versions of CPIO archives arriving at our laboratory for data recovery or data conversion.
