Preview 7 of R81 adds support for working with live files.

This is a bit of a rabbit hole, so some background is in order.

If we open an existing file, append some data to it and keep it open, then the dir command will keep returning the file's original size until we close the file.

The same will also happen to the last-modified timestamp. Even though we "touch" the file, everyone else will be seeing its old timestamp until we release the file reference.
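
The effect can be observed with a short script. What follows is a minimal sketch in Python, not anything taken from the program itself. It assumes Windows with NTFS, where `os.scandir` reports the data held in the directory entry and `os.stat` queries the file directly. The helper name `compare_sizes` and the file name `live.log` are made up for illustration. Whether the listed size actually lags depends on the Windows version, and on other platforms the two numbers will normally match.

```python
import os
import tempfile


def compare_sizes(path, extra=b"x" * 4096):
    """Append `extra` to `path`, keep it open, return (listed_size, direct_size)."""
    folder, name = os.path.split(os.path.abspath(path))
    with open(path, "ab") as f:
        f.write(extra)
        f.flush()  # hand Python's buffer to the OS; the handle stays open

        # Size as reported by the directory listing (may be stale on NTFS).
        with os.scandir(folder) as entries:
            listed = next(e.stat().st_size for e in entries if e.name == name)

        # Size queried from the file itself. This is done second, because
        # the direct query may itself refresh the directory entry.
        direct = os.stat(path).st_size
    return listed, direct


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = os.path.join(tmp, "live.log")  # hypothetical file name
        with open(p, "wb") as f:
            f.write(b"a" * 100)
        print(compare_sizes(p))
```

The first number is what a directory listing sees while the file is still open, the second is what the file reports about itself.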

This looks odd, doesn't it?

The explanation for this behavior can be found here and it has to do with the fact that modern file systems such as NTFS and ReFS store file metadata alongside the file itself rather than with its folder entry.

This allows attaching arbitrary metadata to files (i.e. not just attributes and timestamps), but it also means that a simple directory listing requires combing through each and every file to collect the respective bits of meta.

Obviously, this doesn't scale well.

This issue is resolved by caching parts of a file's meta in its folder's entry. This speeds up directory listings, but it comes at the cost of the meta being replicated lazily.

For example, if we add data to a file, then its new size will not get propagated to its folder record immediately. Instead, it will be done at some point later on.

Depending on the Windows version, this may happen when we flush file buffers, query file size via a handle or close its handle.

All this means that there's no reliable way of detecting actively modified files based on their directory listing alone.

So if we want to identify and back up such live files, we need a workaround.

One workaround is to just forcibly update them on every run, regardless of whether they have actually changed or not. It's crude and wasteful, but in a pinch it works.

A better option, supported by the new preview release, is to double-check directory listing data by re-querying file size and timestamps directly from the files themselves.

The queries aren't cheap, so they are not issued for all files.

Instead, this option works similarly to the forced updates - we specify one or more files of interest and the program rechecks their meta if it runs into them during a scan.
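
The general idea can be sketched as follows. This is an illustration in Python under stated assumptions, not the program's actual implementation or option syntax. The function name `scan` and the `recheck_patterns` parameter are hypothetical. Files whose names match one of the patterns get a second, direct query; everything else is taken from the directory listing as-is.

```python
import fnmatch
import os


def scan(folder, recheck_patterns=()):
    """Yield (name, size, mtime_ns) for regular files in `folder`.

    Entries matching one of `recheck_patterns` are re-queried from the
    file itself; all others use the (cheap) directory listing data.
    """
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            st = entry.stat(follow_symlinks=False)  # listing data
            if any(fnmatch.fnmatch(entry.name, p) for p in recheck_patterns):
                try:
                    st = os.stat(entry.path)  # direct query, more expensive
                except OSError:
                    pass  # file vanished or is inaccessible; keep listing data
            yield entry.name, st.st_size, st.st_mtime_ns
```

For example, `list(scan(r"C:\Logs", ["*.log"]))` would recheck only the `.log` files in a hypothetical `C:\Logs` folder, so the cost of the extra queries stays proportional to the number of files of interest rather than to the size of the folder.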

None of this is too complicated, but it takes time to understand why things work the way they do and how to work around that.

*  Kudos to James Kindon for raising the issue and providing original traces to identify it.
Made by IO Bureau in Switzerland
