The new and much improved delta copier is in.
It is substantially faster than the original version, thanks to multi-core parallel hashing, asynchronous I/O and more aggressive caching of the state data.
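The multi-core hashing part can be sketched roughly as follows. This is a minimal illustration, not the copier's actual code; the 4 MB block size, SHA-256 and the thread count are all assumptions.

```python
# Sketch of multi-core block hashing. Block size and hash algorithm are
# hypothetical; the post doesn't specify what the copier actually uses.
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MB delta blocks

def read_blocks(path):
    # Yield the file as a sequence of fixed-size blocks.
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            yield block

def hash_blocks_parallel(path, workers=4):
    # Hash blocks on a thread pool. CPython's hashlib releases the GIL
    # for large buffers, so threads give genuine parallelism here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: hashlib.sha256(b).hexdigest(),
                             read_blocks(path)))
```

The same idea extends naturally to overlapping hashing with asynchronous reads, which is where the bulk of the speed-up comes from.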
Strengthened the verification algorithm through the addition of a whole-file hash. When all individual block hashes come back unchanged, the algorithm now uses the file hash to confirm that the file has indeed remained unchanged.
Combined with the original use of two separate hashes per data block, this eliminates the need for a precautionary full-file sync every N copies.
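The confirmation step described above might look like this. A simplified sketch: it uses one hash per block rather than the two the copier actually keeps, and the MD5/SHA-256 pairing is an assumption.

```python
# Sketch of the "unchanged" confirmation: only when every block hash
# matches the stored state is the independent whole-file hash consulted.
# Hash choices and the state layout are hypothetical.
import hashlib

def file_unchanged(path, state, block_size=4 * 1024 * 1024):
    """state = {"blocks": [per-block digests], "whole": whole-file digest}"""
    whole = hashlib.sha256()
    block_digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            block_digests.append(hashlib.md5(block).hexdigest())
            whole.update(block)
    if block_digests != state["blocks"]:
        return False  # some block changed -> run a delta copy
    # All blocks match; cross-check with the whole-file hash to rule out
    # a stale or corrupted block-level state.
    return whole.hexdigest() == state["whole"]
```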
Improved handling of cancelled and aborted delta copies by introducing support for resumable delta state.
The algorithm now remembers how far along the file it managed to get and stashes this information for the next run. When copying is cancelled partway through, the file is marked as out-of-sync, but the delta state is updated to capture what has already been done. This comes in very handy and speeds things up when updating very large files.
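The stash-and-resume idea can be sketched like so. The JSON file layout and field names are purely illustrative, not the product's actual state format.

```python
# Sketch of resumable progress: on cancellation the copier records how
# far it got; the next run picks up from that offset instead of zero.
import json
import os

def save_progress(state_path, copied_upto):
    # Write to a temp file and swap it in, so a crash mid-write
    # never leaves a truncated state file behind.
    tmp = state_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"copied_upto": copied_upto, "in_sync": False}, f)
    os.replace(tmp, state_path)

def load_resume_offset(state_path):
    try:
        with open(state_path) as f:
            return json.load(f)["copied_upto"]
    except (OSError, KeyError, ValueError):
        return 0  # no usable state -> start from the beginning
```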
Improved handling of large file counts.
Version 1 maintained an in-memory index of all delta-copied files and, surprisingly, this didn't scale well. It also kept all delta state information in a single folder, which likewise led to some pain and suffering when the file count grew large. The new version eliminates both of these problems.
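The post doesn't detail the fix, but one common remedy for the single-folder problem is to fan the state files out into subfolders keyed by a hash prefix. A sketch of that idea, with an entirely hypothetical layout:

```python
# Hypothetical sharded layout for delta state files: each file's state
# lives under a two-level subfolder derived from a hash of its identity,
# so no single directory ever accumulates a huge number of entries.
import hashlib
import os

def state_file_path(state_root, file_id):
    digest = hashlib.sha1(file_id.encode("utf-8")).hexdigest()
    # Two-level fan-out: up to 256 x 256 subfolders keeps each one small.
    subdir = os.path.join(state_root, digest[:2], digest[2:4])
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, digest + ".delta")
```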
Lastly, all copier parameters - block sizes, hash algorithms, cache buffer sizes, threading parameters, etc. - are now configurable on a per-job basis.
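In code, a per-job parameter set amounts to something like the following. Names and defaults are illustrative only; the actual knobs and their values aren't spelled out in the post.

```python
# Illustrative per-job tunables for the delta copier; every name and
# default here is an assumption, not the product's real configuration.
from dataclasses import dataclass

@dataclass
class DeltaCopyConfig:
    block_size: int = 4 * 1024 * 1024     # bytes per delta block
    hash_algo: str = "sha256"             # per-block hash
    cache_buffer: int = 64 * 1024 * 1024  # state-cache budget, bytes
    hash_workers: int = 4                 # parallel hashing threads

# A job targeting fast NVMe storage might trade bigger blocks for
# fewer hash calls and use more hashing threads:
fast_nvme_job = DeltaCopyConfig(block_size=16 * 1024 * 1024, hash_workers=8)
```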
All in all - THIS IS IT.
This is how simple and efficient delta copying should be done. Can't wait to release it to the real world.