Feb 23, 2016
Delta copying is an optimized way of copying a file when an older version of the same file already exists at the destination.
When no copy of a file exists at the destination, we have no option but to read every byte of the source file and then write them all into the backup copy.
If we are then to make a small change to the source file and repeat the copying, the vast majority of data we'll be writing will be exactly the same as what's already on the disk.
So it only makes sense to try and eliminate these redundant writes, and that's exactly what delta copying is about.
Naturally, there is more than one way to approach this matter.
The rsync way
The widely used rsync tool uses two cooperating processes - one at the source and another at the destination - which both read their own copy of a file, block by block, and talk to each other to compare block checksums. When checksums don't match, the source process forwards the respective block to the destination process, which merges it into the destination file.
That's a much simplified version of rsync. There's quite a bit more to its algorithm, because after all it was Tridge's PhD thesis :)
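To make the simplified description concrete, here is a rough sketch of that exchange with both "processes" collapsed into plain functions. It is not rsync's actual algorithm: real rsync uses a rolling checksum to match blocks at arbitrary offsets, while this compares blocks at fixed offsets only. The function names, the tiny block size and the use of MD5 are all assumptions made for illustration.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow (assumption)


def checksums(data, block=BLOCK):
    """Checksum of every fixed-offset block in a copy of the file."""
    return [hashlib.md5(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def blocks_to_send(source, dest_sums, block=BLOCK):
    """Source side: pick the blocks whose checksums differ at the destination."""
    patches = {}
    for index, digest in enumerate(checksums(source, block)):
        if index >= len(dest_sums) or dest_sums[index] != digest:
            patches[index] = source[index * block:(index + 1) * block]
    return patches


def merge(dest, patches, new_len, block=BLOCK):
    """Destination side: merge the received blocks into the old copy."""
    buf = bytearray(dest)
    for index in sorted(patches):
        data = patches[index]
        buf[index * block:index * block + len(data)] = data
    del buf[new_len:]  # drop the tail if the source got shorter
    return bytes(buf)


source = b"aaaabbbbccccdd"
dest = b"aaaaXbbbcccc"
patches = blocks_to_send(source, checksums(dest))
updated = merge(dest, patches, len(source))
```

Only the second block and the new tail travel between the two sides here; the other two blocks are left alone.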
The biggest plus of rsync is that all you need is two copies of a file and it can make one look like the other, expeditiously.
The biggest minus of rsync is that you need to have a copy of rsync running on the receiving end. Meaning, if your NAS doesn't support rsync, then that's it. No rsync for you.
Bvckup 2 and its older brother Bvckup take a different approach.
When a file is first copied, the app splits it into equally-sized blocks, computes a hash for each block and then stores these hashes locally.
On the next copy, as the app goes through the source file block by block, it re-computes the hashes and compares them to the saved versions. If they match (* see below), then a block is assumed to be unchanged and it is skipped over. Otherwise, it is written out and the saved hash is updated to its new value.
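The two steps above can be sketched roughly as follows. This is a toy model rather than the app's actual code: the source is a bytes object, the destination is a bytearray updated in place, and the saved hashes live in a plain list. The hash pair is MD5 plus plain zlib CRC32, standing in for whatever CRC32 variation the app really uses, and all names are made up for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 32 * 1024


def block_hashes(block):
    """Two independent checksums per block (plain CRC32 is a stand-in)."""
    return (hashlib.md5(block).digest(), zlib.crc32(block))


def delta_copy(source, dest, saved):
    """Update dest (bytearray) from source (bytes); returns blocks written.

    saved is the list of per-block hashes from the previous run, empty on
    the first copy. It is updated in place.
    """
    written = 0
    block_count = (len(source) + BLOCK_SIZE - 1) // BLOCK_SIZE
    for index in range(block_count):
        offset = index * BLOCK_SIZE
        block = source[offset:offset + BLOCK_SIZE]
        hashes = block_hashes(block)
        if index < len(saved) and saved[index] == hashes:
            continue  # assumed unchanged, skip the write
        dest[offset:offset + len(block)] = block
        if index < len(saved):
            saved[index] = hashes
        else:
            saved.append(hashes)
        written += 1
    # The source may have shrunk since the last run.
    del dest[len(source):]
    del saved[block_count:]
    return written


original = bytes(range(256)) * 400          # 102400 bytes, 4 blocks
backup, hashes = bytearray(), []
first_run = delta_copy(original, backup, hashes)

modified = bytearray(original)
modified[40000] ^= 0xFF                     # touch one byte in block 1
second_run = delta_copy(bytes(modified), backup, hashes)
```

The first run writes every block, the second run writes just the one that contains the changed byte.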
Easy-peasy. But what of them caveats, you wonder? Indeed.
1. The last thing we want is to skip a modified block only because it happened to have the exact same hash as its previous version. The risk of this event is mitigated by using two separate checksums for each block, both of which are stored in a hash file.
Additionally, Bvckup 2 computes a full-file checksum using a third digest algorithm. When no block-level changes are detected in a file, this hash is verified to match its version from the previous run. If there's ever a mismatch, the file is re-copied in full.
2. Delta copying assumes that the destination file remains unmodified between the runs. If it doesn't, then all our precious locally saved block hashes will simply be of no use.
Luckily, since we are in a backup software context, this holds true in a vast majority of cases. However as they say - trust, but verify.
To catch changes to destination files, the app saves their size and created/last-modified timestamps alongside the block hashes. If these aren't an exact match to the reality on the next run, then the destination file is deemed to be modified and the file is re-copied in full.
3. Delta copying is an in-place update algorithm. It works with a live copy of the destination file, meaning that if we are to cancel/abort the copying mid-way through, we may end up with a partially updated file.
There's not much we can do about this other than detect this regrettable development on the next run and deal with it appropriately.
For orderly cancellations the app remembers how far along the file it was, stashes this information with the hashes and then resumes from this point on the next run.
For abortive cancellations the size-timestamp caching provision from #2 above will ensure that the file is re-copied in full on the next run.
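Put together, the three caveats boil down to a small decision made before and after each run. The sketch below shows one way to express it. The SavedState record, its field names and both functions are hypothetical, and it assumes that an orderly cancellation also refreshes the saved size and timestamps, which the text above doesn't spell out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SavedState:
    """What is kept next to the block hashes (hypothetical layout)."""
    dest_size: int
    dest_created: float
    dest_modified: float
    resume_offset: int = 0  # non-zero after an orderly cancellation


def plan_copy(state: Optional[SavedState], dest_size: int,
              dest_created: float, dest_modified: float) -> Tuple[str, int]:
    """Decide how to copy: ("full", 0) or ("delta", start_offset)."""
    if state is None:
        return ("full", 0)  # first copy, nothing to compare against
    saved = (state.dest_size, state.dest_created, state.dest_modified)
    if saved != (dest_size, dest_created, dest_modified):
        # Destination was modified, or the previous run was aborted.
        return ("full", 0)
    return ("delta", state.resume_offset)


def needs_full_recopy(blocks_written: int, file_hash: bytes,
                      saved_file_hash: bytes) -> bool:
    """Caveat 1: no block changed, yet the full-file hash disagrees."""
    return blocks_written == 0 and file_hash != saved_file_hash


state = SavedState(dest_size=100, dest_created=1.0, dest_modified=2.0)
untouched = plan_copy(state, 100, 1.0, 2.0)
tampered = plan_copy(state, 100, 1.0, 3.0)
```

An exact-match comparison is used on purpose: any difference at all in size or timestamps sends the file down the full-copy path.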
Delta copying is used only for larger files.
Files smaller than 2MB and files under 32MB that weren't modified within last 30 days are always copied in full. In older releases (including the last beta) the criteria were simpler, see  for full details.
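Read literally, the rule above is a two-step check. Here's a minimal sketch of it, assuming that 1MB means 1024*1024 bytes and that both size limits are exclusive; the function name and the exact handling of the boundaries are guesses, not something the app documents here.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def uses_delta_copying(size_bytes, days_since_modified):
    """True if a file of this size and age would be delta-copied."""
    if size_bytes < 2 * MB:
        return False  # small files are always copied in full
    if size_bytes < 32 * MB and days_since_modified > 30:
        return False  # mid-sized files that haven't changed in a while
    return True


small = uses_delta_copying(1 * MB, 0)
mid_recent = uses_delta_copying(10 * MB, 5)
mid_stale = uses_delta_copying(10 * MB, 90)
large_stale = uses_delta_copying(100 * MB, 90)
```

Note that this only picks the copying method for a file that is about to be copied; it says nothing about whether the file needs copying in the first place.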
Default block size is 32KB.
Per-block hashes are MD5 and a variation of CRC32.
Per-file hash is SHA1.
This means that we store 20 bytes of hashes (16 for MD5, 4 for CRC32) per 32KB of raw data, plus a fixed per-file overhead of 40-something bytes. This works out to about 0.06% of data size, which is not that bad.
Internally, the delta copying routine is organized into a reading-hashing-writing pipeline, operating fully asynchronously on a pool of I/O buffers.
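The arithmetic is easy to double-check. The snippet below redoes it, taking the per-file overhead as a flat 40 bytes (the text only says "40 something") and using a 10GB file purely as an example.

```python
BLOCK_SIZE = 32 * 1024
MD5_BYTES = 16
CRC32_BYTES = 4
PER_FILE_OVERHEAD = 40  # assumption: "40 something" rounded down


def hash_file_size(data_size):
    """Approximate size of the saved hashes for a file of data_size bytes."""
    blocks = -(-data_size // BLOCK_SIZE)  # ceiling division
    return blocks * (MD5_BYTES + CRC32_BYTES) + PER_FILE_OVERHEAD


ratio = (MD5_BYTES + CRC32_BYTES) / BLOCK_SIZE
print(f"{ratio:.4%}")                    # 0.0610%
print(hash_file_size(10 * 1024 ** 3))    # 6553640 bytes for a 10GB file
```

So a 10GB VM image carries roughly 6.25MB of hashes.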
The copying starts with the app issuing multiple read requests in parallel.
Once a request is completed, the I/O buffer is forwarded to the hashing module, which maintains a standby pool of hashing threads. Once the buffer is hashed and if it appears to be modified, a write request is issued for it. Then, once the write request completes, the buffer is again used to read the next block in sequence and the cycle repeats.
The delta copying module comes with a lot of settings - from hashing thread count to buffer counts to read/write chunk sizes - all tweakable. However, the app does a good job picking the defaults based on the exact disposition of source/destination - whether they are the same drive, whether they are on the network, whether they are over older or newer SMB protocol, etc. - so generally there's no need for messing with them.
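For a feel of how such a pipeline hangs together, here is a much reduced sketch built on a thread pool. It is only an approximation: reads and writes are plain sequential calls rather than parallel asynchronous requests, the "buffer pool" is just a cap on how many blocks are in flight, and only MD5 is computed. The function and parameter names are invented for the example.

```python
import hashlib
import io
from collections import deque
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 32 * 1024


def md5_digest(block):
    return hashlib.md5(block).digest()


def pipelined_delta_copy(src, dst, saved, hash_threads=4, buffers=8):
    """Read from src, hash on a thread pool, write changed blocks to dst.

    src and dst are seekable binary file objects, saved is the list of
    block digests from the previous run. Returns the blocks written.
    """
    written = 0
    in_flight = deque()  # (index, block, future), oldest first

    def finish_oldest():
        nonlocal written
        index, block, future = in_flight.popleft()
        digest = future.result()  # wait for the hashing thread
        if index < len(saved) and saved[index] == digest:
            return  # unchanged, the buffer is free again
        dst.seek(index * BLOCK_SIZE)
        dst.write(block)
        if index < len(saved):
            saved[index] = digest
        else:
            saved.append(digest)
        written += 1

    with ThreadPoolExecutor(max_workers=hash_threads) as pool:
        index = 0
        while True:
            block = src.read(BLOCK_SIZE)
            if not block:
                break
            in_flight.append((index, block, pool.submit(md5_digest, block)))
            index += 1
            if len(in_flight) >= buffers:  # out of buffers, retire one
                finish_oldest()
        while in_flight:
            finish_oldest()
    return written


data = bytes(range(256)) * 400  # 102400 bytes, 4 blocks
backup, digests = io.BytesIO(), []
first_run = pipelined_delta_copy(io.BytesIO(data), backup, digests)
```

Capping the number of blocks in flight is what keeps memory use flat no matter how large the file is.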
So there you have it - the delta copying - a new best friend of your VM images and TC containers :-)
Feb 23, 2016
Pushing vs pulling backups
In short - push-style backups maximize the efficiency of delta copying.
When a backup is going over the network, there's often a question of where it's better to run the app.
If the app runs on the source machine, it's a "local-to-remote" or "push" backup. And when the app runs on the backup machine, it's a "remote-to-local" or "pull" backup.
Delta copying gets its speed benefits from being selective with writes. With push backups all reads are local (fast) and writes go over the network (slow). With pull backups all reads are over the network (slow) and writes are local (fast).
So if we are reducing the amount of writes, then with faster reads and slower writes the effect will be far more pronounced => push backups are better. In other words, running the app on a source machine will generally result in faster backups.
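A back-of-the-envelope model shows how lopsided this can get. The sketch below assumes that every byte of the source is read, only the changed fraction is written, and reads and writes don't overlap. The file size, the 1% change rate and both throughput figures are made-up numbers for illustration, not measurements.

```python
def transfer_seconds(file_mb, changed_fraction, read_mb_s, write_mb_s):
    """Time to delta-copy a file: read it all, write only what changed."""
    return file_mb / read_mb_s + file_mb * changed_fraction / write_mb_s


FILE_MB = 10_000   # a 10GB file (assumption)
CHANGED = 0.01     # 1% of blocks modified (assumption)
LOCAL = 500        # MB/s, local disk (assumption)
NETWORK = 50       # MB/s, network share (assumption)

push = transfer_seconds(FILE_MB, CHANGED, read_mb_s=LOCAL, write_mb_s=NETWORK)
pull = transfer_seconds(FILE_MB, CHANGED, read_mb_s=NETWORK, write_mb_s=LOCAL)
print(push, pull)
```

With these numbers the push backup takes about 22 seconds and the pull backup about 200, because pulling still drags the whole file across the network just to hash it.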
Dec 20, 2016
"Files smaller than 2MB and files under 32MB that weren't modified within last 30 days are always copied in full."
Let me see if I have this straight: In my source I have a folder tree of 11.2GB in 365,000 files, most under 2mb. Every one of those files will be copied on every bvckup2 run, even when date/time/size are unchanged?
Jan 29, 2017
Hi, the above was a good explanation of such a useful feature of the program, but I'm wondering something about it:
Is there a way one can force the "delta copying" mode, for specific folders/files, which one foresees they should be copied that way, and not in the regular, full mode?
I mean, let's say I have two folders, source and backup. There is an "example_123.x" file in the first one, which I regularly update, but its filename will not always stay the same; it varies slightly with each update, say to "example_124.x". Currently, if I want Bvckup to have a chance of using the "delta copying" feature for the updated file (because I'm positive it is just an updated version of the same file, and not a totally new one), I first have to rename it so that it matches the one in the source folder, because otherwise the app is going to handle it as a completely new file and transfer it fully. Besides, if it does copy it in full, I will have to delete the previous "example_123.x" file, which means an additional "deleting" task.
Alex Pankratov:
Jan 29, 2017
It all depends on how exactly example_123.x turns into example_124.x.
If this happens because 123 gets _renamed_ into 124, then bvckup2 can understand that - it will both rename 123's backup copy and preserve its delta copying state, so it will keep on delta-updating the file. This however requires "Detecting Changes" setting in backup settings to be set to "Use snapshot" (and not to "Re-scan on every run") and "Rename detection" enabled for files.
However if 124 is created from scratch and filled with 123's data, then the app won't be able to link these two files and it will indeed re-copy 124 from scratch.
What you are suggesting is effectively a kind of "hint" system to tell bvckup2 that THIS file and THAT file are two versions of the same source file even though it's not obvious. This is not a bad idea, but I strongly suspect that it will get very hairy when it comes to the implementation.
Jan 29, 2017
Yes, I think the only way of avoiding the whole copying process for source files and their updated equivalents would be to rename the source files to a generic name, and then use that generic name for all the newer files as well, so they match each time.
That's certainly not the most straightforward way of handling the updates in such cases, but it will still be more efficient than copying full files each time, especially with larger ones.
What you mention about the "hint" system is kind of what I first thought of: Bvckup2 being smart enough to detect that some files should be delta copied even if their filenames differ a little, whether through some kind of special index of those files or some forced option specifically for them.
But of course, I know there should be miles away between just thinking about a kind of workaround like that, and having it implemented. ;)