The newsletter | 02/11/2012

Symlinks and junctions

Today I'd like to run a quick poll on the subject of symbolic links and junctions and their handling in backup.

I am in the process of adding support for these to the program and it turned out to be a relatively complicated matter. I think I have it figured out, but I still need a second opinion.

Please have a look. The poll is at the end of the page.

Quick primer

Modern operating systems include a number of ways to make one file system object, a file or a directory, refer to another. Windows is no exception.

The basic form is a shortcut. It is a small file with the .lnk extension that contains the name of a target file or directory. When one clicks on such a file, Windows Explorer recognizes the .lnk extension, reads the target from the file and then navigates accordingly. Shortcuts can exist on any file system, but they are effectively a feature of Windows Explorer. Just as Photoshop is needed to open a .psd file, Windows Explorer is required to make sense of a .lnk file.

Junctions and symbolic links follow the same idea, except they are built into the file system. One doesn't need a special program to make sense of them. Instead, one can just "cd" into them or "open" them as if they were a real directory or file.

Both are NTFS features. Junctions are the older of the two, dating back to Windows 2000, and they are directory-only shortcuts. Symlinks were introduced in Vista and they can be used for both directories and files. Also, unlike junctions, which are limited to local volumes, they can point at network locations, which is very handy.
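To make the distinction concrete, here is a minimal sketch of how a program might tell these objects apart. It is written in Python purely for illustration and is not how Bvckup does it. The function name describe_entry is made up for this example, and junction detection relies on os.path.isjunction, which only exists in Python 3.12 and later.

```python
import os

def describe_entry(path):
    """Return a short description of what `path` is (hypothetical helper)."""
    if not os.path.lexists(path):
        return "missing"
    if os.path.islink(path):
        # A symlink: the file system itself stores the target.
        return "symlink -> " + os.readlink(path)
    # os.path.isjunction was added in Python 3.12. On older versions this
    # sketch cannot tell a junction from a plain directory.
    is_junction = getattr(os.path, "isjunction", None)
    if is_junction is not None and is_junction(path):
        return "junction -> " + os.readlink(path)
    if os.path.isdir(path):
        return "directory"
    if path.lower().endswith(".lnk"):
        # A shortcut is an ordinary file as far as the file system goes;
        # only a program that understands .lnk can find its target.
        return "shortcut file (target stored in the file's contents)"
    return "regular file"
```

Note how the shortcut case is decided by the file name alone, while symlinks and junctions are recognized by the file system calls themselves.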

Here's an example of junctions found on Windows 7:




Hard links are another referral mechanism, but they are irrelevant to the matter at hand, so I'll leave it at that.

Lastly, it's worth mentioning that the use of symbolic and hard links in the Unix world dates back to 1978.

Skip, copy, traverse

So the question at hand is what to do when a backup encounters a symlink or a junction.

Obviously, there are three options: skip, copy or traverse.

Skipping is the simplest: just pretend it's not there.

Copying means re-creating the exact symlink at the destination. This preserves the symlink information exactly, but the copy may end up pointing to an invalid location.

Traversing means that symlinks are entered and backed up as if they were real files or directories.
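The three options can be sketched as a small recursive copy routine. This is an illustration in Python under simplifying assumptions, not Bvckup's code: the name backup_tree and the link_mode parameter are made up, the destination is assumed to be empty, and only symlinks are handled (junctions would need separate detection). The "traverse" branch has no loop protection at all, which is exactly the problem discussed next.

```python
import os
import shutil

def backup_tree(src, dst, link_mode="skip"):
    """Copy src into dst; link_mode is 'skip', 'copy' or 'traverse'."""
    if link_mode not in ("skip", "copy", "traverse"):
        raise ValueError("unknown link_mode: %r" % (link_mode,))
    os.makedirs(dst, exist_ok=True)
    with os.scandir(src) as entries:
        for entry in entries:
            out = os.path.join(dst, entry.name)
            if entry.is_symlink():
                if link_mode == "skip":
                    continue  # pretend it's not there
                if link_mode == "copy":
                    # Re-create the link verbatim; it may dangle in the copy.
                    os.symlink(os.readlink(entry.path), out,
                               target_is_directory=entry.is_dir())
                    continue
                # "traverse": fall through and treat the link as its target.
            if entry.is_dir():       # follows symlinks by default
                backup_tree(entry.path, out, link_mode)
            elif entry.is_file():    # follows symlinks by default
                shutil.copy2(entry.path, out)
```

On Windows, creating symlinks with os.symlink normally requires elevated rights or Developer Mode, so the "copy" branch may fail there for an ordinary user.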

It's complicated

Symlinks exist for a reason, so ignoring them altogether is probably not a very good idea. Between copying and traversing, the problem is that sometimes we may want to copy and sometimes to traverse.

Take, for example, a backup of C:\ProgramData:
Desktop  →  C:\Users\Public\Desktop
This one points outside of our tree, so we may either copy the link or traverse it. On the other hand:
Templates  →  C:\ProgramData\Microsoft\Windows\Templates
This one points inside of our tree, so we can't traverse it or else we'll end up with duplicate data. There's also this marvel:
Application Data  →  C:\ProgramData
It points at its parent directory. This we certainly can't traverse, because otherwise we'll be stuck in a loop.
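The three cases above come down to comparing the link's target with the link's own location and with the backup's source directory. A rough sketch in Python, using pathlib.PureWindowsPath so that comparisons ignore case the way Windows paths do. The function name classify_target and the labels it returns are made up for this example.

```python
from pathlib import PureWindowsPath

def classify_target(source_root, link, target):
    """Classify a link as 'loop', 'internal' or 'external'.

    All three arguments are absolute Windows paths given as strings.
    """
    root = PureWindowsPath(source_root)
    link_path = PureWindowsPath(link)
    tgt = PureWindowsPath(target)
    # The target is a directory the link itself lives under:
    # traversing it would never terminate.
    if tgt in link_path.parents:
        return "loop"
    # The target is the source directory or something beneath it:
    # traversing it would back up the same data twice.
    if tgt == root or root in tgt.parents:
        return "internal"
    return "external"
```

For the C:\ProgramData examples, Desktop comes out as "external", Templates as "internal" and Application Data as "loop".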

There's also this: Bvckup doesn't include a restore function, and Windows doesn't provide an easy way to copy a symlink. Instead, all common commands traverse symlinks.

This means that even if Bvckup were to replicate symlinks, it wouldn't be possible to copy them back to the source from a backup copy. So it makes little practical sense to replicate symlinks unless there's a tangible benefit.

Backing up

So here's what I think Bvckup should be doing by default.

1. For the symlinks and junctions that point somewhere in the source tree, the backup will convert them to a relative format and copy them. For example, the Templates link will become:
Templates  →  Microsoft\Windows\Templates

2. For the symlinks and junctions that point to a (grand) parent of the source directory and thus create an external loop, the backup will skip them.

3. This leaves symlinks that point outside of the source tree. These will be traversed or skipped depending on the user's preferences.

In other words, the main difference from a conventional backup is that symlinks are copied only if they point at a location inside of the source tree. Otherwise they are either skipped or, if so configured, traversed.
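As a sanity check, the proposed defaults can be written down as a small decision function. This is only a sketch in Python, with two assumptions that the text above does not settle: a link that points at the source directory itself is treated like one that points at a (grand) parent, and the relative form is computed against the directory that contains the link. The name plan_link and its return format are made up.

```python
import ntpath
from pathlib import PureWindowsPath

def plan_link(source_root, link, target, external="skip"):
    """Return (action, relative_target) for one symlink or junction.

    action is 'copy', 'skip' or 'traverse'; relative_target is set only
    for 'copy'. `external` is the user's preference for rule 3.
    """
    if external not in ("skip", "traverse"):
        raise ValueError("external must be 'skip' or 'traverse'")
    root = PureWindowsPath(source_root)
    tgt = PureWindowsPath(target)
    # Rule 2: the target is a (grand) parent of the source directory.
    # Assumption: the source directory itself is handled the same way.
    if tgt == root or tgt in root.parents:
        return ("skip", None)
    # Rule 1: the target is inside the source tree, so copy the link
    # in relative form. Assumption: relative to the link's own directory.
    if root in tgt.parents:
        rel = ntpath.relpath(target, start=ntpath.dirname(link))
        return ("copy", rel)
    # Rule 3: the target is outside the source tree.
    return (external, None)
```

With C:\ProgramData as the source, the Templates link yields ("copy", "Microsoft\Windows\Templates"), Application Data yields ("skip", None), and Desktop follows whatever the external setting says.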

Questions

Just pick an option and it will be submitted on click.

1. Would this approach work for your backup needs?

Yes, seems OK
No  —  please elaborate

2. Should internal symlinks be converted to a relative format?

Yes, that's fine
No, skip them
No, copy them

3. What should be the default for externally pointing symlinks?

Traverse
Skip
Copy

If you've got any thoughts or comments on this, I want to hear them. Being able to look at the matter from other people's perspective is really helpful.

Either ping me privately or post here.



Alex Pankratov