Copyright 2004, 2005 by Marc J. Rochkind. All rights reserved. Portions marked "Open Source" may be copied under license.

 

Copying Files With Links

Exercise 3.11 involves copying a tree of directories and files while preserving hard and symbolic links. This article discusses what "preserving" might mean and sketches how to do the copying. Multiple links to directories are not discussed.

Hard Links

How to Preserve Multiple Links

Every file has at least one hard link, and some have more than one. There's no distinction between them; that is, there's no concept of "main" and "secondary" links. It's obvious how to handle files with only one hard link, so this section is concerned only with files that have more than one.

In a general-purpose copying utility, which is what Exercise 3.11 is about, there are three ways to interpret what "preserving" the hard links might mean:

  1. Don't copy the file at all. Just re-create all the links to it in the target tree. This is really unworkable in general, because the target might be on another device, and hard links can't span devices.
  2. Make one copy of the file and re-create any other links to it that were in the source tree as links to the copy in the target tree.
  3. Don't pay any attention to the link count, and just make multiple copies of the file in the target tree. This is in effect what happens in typical solutions to Exercises 3.9 and 3.10.

As mentioned, #1 is unworkable, and #3 wastes space and severs the linking arrangement, so #2 is the best choice.

Copying Algorithm

Two members of the stat structure, st_dev and st_ino, uniquely identify an i-node on a mounted device. So, given several links, these two members can be used to tell whether they are linking to the same source i-node.
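As a minimal sketch of that comparison, the function below reports whether two path names refer to the same i-node. The name same_inode is made up for this illustration. It uses lstat rather than stat so that a symbolic link is compared as itself instead of as the file it references.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/* Returns true if both paths name the same i-node. Returns false if they
   differ or if either lstat call fails. (same_inode is a hypothetical
   helper, not part of any library.) */
bool same_inode(const char *path1, const char *path2)
{
    struct stat sb1, sb2;

    if (lstat(path1, &sb1) == -1 || lstat(path2, &sb2) == -1)
        return false;
    /* Both members must match: i-node numbers repeat across devices. */
    return sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino;
}
```

Comparing st_ino alone isn't enough, because the same i-node number can be in use on two different devices.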

While traversing the source tree, it's necessary to keep track (via a linear list, hash table, etc.) of each source i-node (st_dev/st_ino pair) that corresponds to a file with multiple links and to associate that i-node with the path name in the target tree. That path name is the path that results from the first encounter with the source i-node, which is when the file is actually copied. When another link to a source i-node that has already been copied is encountered, a link is created in the target tree to the first copy of the file in the target tree. This way, there will be exactly one copy in the target tree of every file in the source tree.
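The bookkeeping just described might be sketched as follows, using the simplest of the suggested structures, a linear list. All of the names (inode_entry, inode_lookup, inode_remember, copy_or_link) are invented for this sketch, and copy_file stands for whatever routine the caller already has for copying one regular file. For simplicity the sketch tracks every file passed to it, not just those with st_nlink greater than one.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* One entry per source i-node that has already been copied. */
struct inode_entry {
    dev_t dev;                  /* st_dev of the source file */
    ino_t ino;                  /* st_ino of the source file */
    char *target_path;          /* path of the first copy in the target tree */
    struct inode_entry *next;
};

static struct inode_entry *inode_list = NULL;

/* Returns the target path recorded for this i-node, or NULL if none. */
const char *inode_lookup(dev_t dev, ino_t ino)
{
    struct inode_entry *e;

    for (e = inode_list; e != NULL; e = e->next)
        if (e->dev == dev && e->ino == ino)
            return e->target_path;
    return NULL;
}

/* Records that this i-node was copied to target_path. Returns 0 or -1. */
int inode_remember(dev_t dev, ino_t ino, const char *target_path)
{
    struct inode_entry *e = malloc(sizeof *e);

    if (e == NULL)
        return -1;
    e->target_path = malloc(strlen(target_path) + 1);
    if (e->target_path == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->target_path, target_path);
    e->dev = dev;
    e->ino = ino;
    e->next = inode_list;
    inode_list = e;
    return 0;
}

/* Copies src to dst on the first encounter of its i-node; on later
   encounters makes dst a hard link to the earlier copy. copy_file is a
   hypothetical caller-supplied routine returning 0 on success, -1 on error. */
int copy_or_link(const char *src, const char *dst,
                 int (*copy_file)(const char *, const char *))
{
    struct stat sb;
    const char *first_copy;

    if (lstat(src, &sb) == -1)
        return -1;
    first_copy = inode_lookup(sb.st_dev, sb.st_ino);
    if (first_copy != NULL)
        return link(first_copy, dst);   /* already copied: just link */
    if (copy_file(src, dst) == -1)
        return -1;
    return inode_remember(sb.st_dev, sb.st_ino, dst);
}
```

A tree walker would call copy_or_link once for each regular file it visits. A linear list makes every lookup proportional to the number of files seen so far, so a hash table keyed on the st_dev/st_ino pair would be the better choice for large trees.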

Although only files with more than one link need to be tracked to handle hard links, also tracking files with one link will help with symbolic links, which are discussed next.

Symbolic Links

Two Copying Situations

In copying a source tree, there are two kinds of symbolic links that might be encountered:

  1. Internal: A symbolic link that references something within the source tree. Whatever that something is, it will be treated separately, so the symbolic link should be re-created in the target tree with its contents (a path in the source tree) appropriately transformed to a path in the target tree.
  2. External: A symbolic link that references something outside the source tree. The link should simply be re-created in the target tree with the same contents.

(Of course, this suggested treatment of symbolic links isn't the only reasonable one. Perhaps the files externally linked to should be copied, especially if the purpose of the tree-copy is for backup.)

Copying Algorithm

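If files are tracked as explained in the section "Hard Links," it's easy to tell whether a symbolic link points to an internal file: Use stat, which follows the link, to get the st_dev/st_ino pair of the file it references, and see if that pair is in the table. If so, the link is internal, and the new path, also in the table, is what you pass to symlink. The table has to be complete when this test is made, because a link may reference a file that the traversal hasn't reached yet, so it's simplest to re-create the symbolic links in a second pass, after all the files have been copied.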
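Here is one way the internal test could look. To keep the example self-contained it uses a small fixed-size array as the table; the names track and internal_target and the limit MAX_TRACKED are assumptions made for this sketch only. Note that a dangling link, for which stat fails, is reported the same way as an external one.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_TRACKED 1024        /* arbitrary limit for this sketch */

struct tracked {
    dev_t dev;                  /* st_dev of the source file */
    ino_t ino;                  /* st_ino of the source file */
    const char *target_path;    /* not copied; must outlive the table */
};

static struct tracked table[MAX_TRACKED];
static size_t table_len = 0;

/* Enters the i-node of src_path in the table, associated with
   target_path. Returns 0 on success, -1 on error or if the table is full. */
int track(const char *src_path, const char *target_path)
{
    struct stat sb;

    if (table_len == MAX_TRACKED || lstat(src_path, &sb) == -1)
        return -1;
    table[table_len].dev = sb.st_dev;
    table[table_len].ino = sb.st_ino;
    table[table_len].target_path = target_path;
    table_len++;
    return 0;
}

/* If the symbolic link at link_path references a tracked file, returns
   that file's path in the target tree. Returns NULL if the link is
   external, dangling, or references something not in the table. */
const char *internal_target(const char *link_path)
{
    struct stat sb;
    size_t i;

    if (stat(link_path, &sb) == -1)     /* stat follows the link */
        return NULL;
    for (i = 0; i < table_len; i++)
        if (table[i].dev == sb.st_dev && table[i].ino == sb.st_ino)
            return table[i].target_path;
    return NULL;
}
```

When internal_target returns a path, that path is the first argument to symlink, and the second is the name of the new link in the target tree.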

For external links, the old path can be read with readlink and simply used directly in a call to symlink.
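A sketch of the external case follows. The function name copy_symlink and the buffer size LINK_BUF_SIZE are choices made for this example. The one detail that's easy to miss is that readlink doesn't null-terminate what it returns, so the terminator has to be added before the buffer is passed to symlink.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

#define LINK_BUF_SIZE 4096      /* assumed large enough for this sketch */

/* Creates dst_link as a symbolic link with the same contents as
   src_link. Returns 0 on success, -1 on error with errno set. */
int copy_symlink(const char *src_link, const char *dst_link)
{
    char contents[LINK_BUF_SIZE];
    ssize_t n = readlink(src_link, contents, sizeof contents);

    if (n == -1)
        return -1;
    if ((size_t)n >= sizeof contents) {     /* contents may be truncated */
        errno = ENAMETOOLONG;
        return -1;
    }
    contents[n] = '\0';         /* readlink does not add the terminator */
    return symlink(contents, dst_link);
}
```

Because readlink fills the buffer without complaint when the contents are too long, a return value equal to the buffer size is treated here as a possible truncation and reported as an error.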

Acknowledgements

comp.unix.programmer Thread

Updated 03/26/2005 11:53:01 AM