The support forum

How to change the thread count and block size?

dro :

Mar 19, 2021

Hi All,

I'm trying to figure out which items in the settings.ini configure the number of threads being used and the block size; I'm just having trouble figuring out where they are and what they're called. Google results aren't returning anything helpful.

I would be interested in read and write information, as well as any other potentially related settings.

Thank you!

Alex Pankratov :

Mar 19, 2021

Generally, "conf.scanning" settings are for the scanning phase, "conf.exec" settings are for the main backup phase and "conf.copying.ultra" settings control how files are copied. Each part has its own thread count control.

The thread count and buffer sizes will be set dynamically based on the CPU core count, storage device type and its connection. E.g. an HDD over USB2 will have a different buffer count and sizes than, say, an SSD on a SATA connection, which in turn will be different from what's used for a remote share over an SMB3 link.

What are you after exactly?

I like (and approve of) the question. It would help to understand the specifics to save me a bit of time writing up the details. These settings are not documented because they tend to change from release to release.

dro :

Mar 22, 2021

I have a pair of servers that back up data to remote SMB shares. I'm using a tool (https://ccsiobench.com/) that benchmarks read/write performance to those SMB shares by block size and thread count.

Based on the block size and thread count indicated in the Bvckup logs, and comparing that to the results from the benchmarking tool, I'm seeing a potential for a 10-15% increase in performance if I'm able to manually configure both the read and write block size and thread count. In our situation that would be a marked improvement.

The files to be copied are generally 6-10GB in size, with a minimum of 500MB and a max of 30GB. About 500 files in total. So I'm not particularly concerned with "light" mode, if I'm understanding things correctly.

Any idea which settings might be relevant to my situation?

Alex Pankratov :

Mar 22, 2021

1. Pick a large file from the backup log and look at its "Completed in" line. It will look something like this:

    Completed in 10 min 33 sec, copied 5.18 GB out of 256 GB
        413.93 MBps | 414.03 reading, 1770.54 hashing, 1194.02 writing

The last three numbers are performance estimates of three parts of the delta copying process - blocks are read, then hashed and checked for changes, and then they are written out. If you are copying files in full, you will see only two numbers there - for reading and writing.

These are not absolute figures (i.e. not MBps values). Instead, they are meant for relative comparison between themselves. The lowest figure is the bottleneck.

In my example, the bottleneck is in reading even though the backup goes from a local SSD to a NAS over an ordinary 1 Gbps link. This happens because delta copying eliminates most of the writes, so the copying _is_ in fact read-bound.

So the first thing to do is to look at your numbers.
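As a rough illustration of reading these figures, here is a small Python sketch that pulls the per-stage numbers out of the stats line and reports the lowest one. The regular expression is an assumption derived from the single sample above, not something taken from Bvckup2, and `find_bottleneck` is a hypothetical helper name.

```python
import re

# Matches pairs such as "414.03 reading" in the stats line.
# Assumption: the pattern is inferred from the one sample line in this thread.
STAGE_RE = re.compile(r"(\d+(?:\.\d+)?)\s+(reading|hashing|writing)")

def find_bottleneck(stats_line):
    """Return (stage, figure) for the lowest per-stage figure, or None if none found."""
    stages = {name: float(value) for value, name in STAGE_RE.findall(stats_line)}
    if not stages:
        return None
    slowest = min(stages, key=stages.get)
    return slowest, stages[slowest]

line = "413.93 MBps | 414.03 reading, 1770.54 hashing, 1194.02 writing"
print(find_bottleneck(line))  # ('reading', 414.03)
```

The same function works on a full-copy line that has only the reading and writing figures, since it simply takes the minimum of whatever stages it finds.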

2. You can indeed vary buffer sizes and buffer counts. There is no separate control over read vs write sizes. It reads in full buffers and it writes in full buffers (except for the last file block and cases when a part of the block is detected as unchanged by the delta copying logic).

Buffer size/count for non-delta copying is defined by

    conf.copying.ultra.buf.count
    conf.copying.ultra.buf.size

Size is in bytes. The default for copying to/from a remote share is 8 x 1MB. You may want to try bumping 8 to 16 or splitting total buffer space of 8MB differently, e.g. 32 x 256KB or similar.
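To get a list of candidate splits to try, something like the following Python sketch may help. It only enumerates count/size pairs that add up to the same total buffer space; the power-of-two restriction and the 64KB to 4MB bounds are arbitrary assumptions for illustration, and `buffer_splits` is a hypothetical helper.

```python
KB = 1024
MB = 1024 * KB

def buffer_splits(total_bytes, min_size=64 * KB, max_size=4 * MB):
    """Yield (count, size) pairs with power-of-two sizes where count * size == total_bytes."""
    size = min_size
    while size <= max_size:
        if total_bytes % size == 0:
            yield total_bytes // size, size
        size *= 2

# Candidate values for buf.count and buf.size (size in bytes) for 8MB of total space.
for count, size in buffer_splits(8 * MB):
    print(f"count = {count:3d}   size = {size} bytes")
```

For 8MB this includes both the 8 x 1MB default and the 32 x 256KB split mentioned above.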

CcsioBench may in fact provide a good reference here. It is after all derived from an internal profiling tool used to settle on Bvckup2 IO defaults.

If a file is to be copied in delta copying mode, then the buffer count is increased by

    conf.copying.ultra.delta.extra_bufs

This is done because in the delta copying mode every block also goes through the hashing phase, so the whole thing can juggle more blocks in-flight than the straight copying. By default, extra_bufs are set so that the total number of buffers is at least 4. That's it.
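Read literally, the description above amounts to the arithmetic sketched below in Python. This is only one interpretation of the text: the real default may be computed differently, and both function names are hypothetical.

```python
def default_extra_bufs(buf_count, min_total=4):
    """Assumed default: just enough extra buffers to reach min_total in total."""
    return max(0, min_total - buf_count)

def delta_buf_count(buf_count, extra_bufs=None):
    """Buffer count used for a delta copy: the base count plus the extra buffers."""
    if extra_bufs is None:
        extra_bufs = default_extra_bufs(buf_count)
    return buf_count + extra_bufs

print(delta_buf_count(2))                 # 4, topped up to the minimum
print(delta_buf_count(8, extra_bufs=4))   # 12, with an explicit override
```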

3. There is a separate IO config for files that are copied in a so-called "light" mode. It sounds like you know what it is. It kicks in when the file is smaller than 32 full IO buffers, so that'd be 256MB in case of remote copying. I can clarify this part, just ask.

4. Threading - there are two separate thread counts. One controls the number of hashing threads in the delta copying mode and another the number of copying threads when parallel exec is enabled.

The idea here is that if the parallel exec runs into a file that needs delta copying, it won't start any other copies until this - "heavy" - copy operation completes. This means that even though both thread counts can be set to ~ # of CPU cores, in practice both thread sets - the hashing and the execution - will never be active together. At least not for long.

Also, with parallel exec the main speed benefit comes from parallelizing non-IO operations - file creation, meta info copying, etc. So its benefits when copying larger files are really quite negligible.

In other words, I'd keep threading setup at its defaults. It's basically smart enough to not interfere with the process in a bad way. But if you feel like experimenting, relevant entries are

    conf.copying.ultra.delta.threads
    conf.exec.threads

HTH. If you have time, I'd be interested to see what you find out.

dro :

Mar 22, 2021

Thank you, that is all great information. I may be missing it in the config, but is there a way to set the IO mode manually?

Alex Pankratov :

Mar 22, 2021

The light vs heavy?

dro :

Mar 23, 2021

My apologies, I should've been clearer. The CCSIO tool provides feedback on buffered vs unbuffered reads/writes, sequential vs random, etc. Are there any configuration options for those in Bvckup?

Alex Pankratov :

Mar 23, 2021

This is controlled via the following two settings -

    conf.copying.ultra.mode.src.large
    conf.copying.ultra.mode.dst.large

These are bitfields (bit combinations) of the following flags -

    0x08000000 - FILE_FLAG_SEQUENTIAL_SCAN
    0x10000000 - FILE_FLAG_RANDOM_ACCESS
    0x20000000 - FILE_FLAG_NO_BUFFERING
    0x80000000 - FILE_FLAG_WRITE_THROUGH

with the latter permitted only in the "dst" value.

To specify a combo, add the respective flag values, e.g. no-buffering and write-through will be A0000000 (because 2 + 8 in hex is A).

The defaults are 28000000 for "src" and 08000000 for "dst", or "no buffering + sequential" and "sequential" respectively.
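For anyone who prefers not to do the hex arithmetic by hand, here is a minimal Python sketch that combines the flags and prints the 8-digit value. The flag constants are the ones listed above; `mode_value` is a hypothetical helper, and the src/dst check only encodes the one restriction stated in this post.

```python
FILE_FLAG_SEQUENTIAL_SCAN = 0x08000000
FILE_FLAG_RANDOM_ACCESS   = 0x10000000
FILE_FLAG_NO_BUFFERING    = 0x20000000
FILE_FLAG_WRITE_THROUGH   = 0x80000000

def mode_value(*flags, side):
    """Combine flags into an 8-digit hex string; side must be "src" or "dst"."""
    if side not in ("src", "dst"):
        raise ValueError("side must be 'src' or 'dst'")
    value = 0
    for flag in flags:
        value |= flag  # each flag is a distinct bit, so OR is the same as adding
    if side == "src" and value & FILE_FLAG_WRITE_THROUGH:
        raise ValueError("write-through is permitted only in the dst value")
    return f"{value:08X}"

print(mode_value(FILE_FLAG_NO_BUFFERING, FILE_FLAG_SEQUENTIAL_SCAN, side="src"))  # 28000000
print(mode_value(FILE_FLAG_SEQUENTIAL_SCAN, side="dst"))                          # 08000000
print(mode_value(FILE_FLAG_NO_BUFFERING, FILE_FLAG_WRITE_THROUGH, side="dst"))    # A0000000
```

The first two calls reproduce the defaults quoted above.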

For background info on flags see -
1. https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea
2. https://docs.microsoft.com/en-us/windows/win32/fileio/file-buffering
3. https://devblogs.microsoft.com/oldnewthing/20120120-00/?p=8493

In particular, keep in mind the sector-multiple requirement of the "no-buffering" mode. This is not an issue when reading, but when writing it may cause an error if the file size is NOT a sector-size multiple. Bvckup2 doesn't have any special processing of the last file block when the file is opened in this mode (though this is _now_ on the todo list).
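A quick way to see which files would be affected is to check the size against the sector size, as in the Python sketch below. The 512-byte default is an assumption; the actual sector size depends on the destination volume and should be looked up there. `unaligned_tail` is a hypothetical helper.

```python
def unaligned_tail(file_size, sector_size=512):
    """Number of bytes past the last full sector; 0 means the size is sector-aligned."""
    return file_size % sector_size

# A 10,000-byte file on a volume with 4096-byte sectors leaves a partial last sector.
print(unaligned_tail(10_000, sector_size=4096))  # 1808
print(unaligned_tail(8192, sector_size=4096))    # 0
```

A non-zero result means the last write would not be a whole number of sectors, which is the case described above as potentially failing when "no buffering" is set in the "dst" value.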

dro :

Apr 01, 2021

Hi Alex, thank you again for all of the great information. A more general inquiry though: I've made a number of changes to the values of all the settings you've mentioned in this thread (as well as a few others) and I haven't seen any measurable impact at all. I don't just mean the performance of the backup jobs, but everything else as well: disk I/O, CPU usage, etc. have remained relatively stable even when increasing the thread count to 48+, enabling parallelization of all files, etc.

Do you have any thoughts on that? I would at least think there'd be some signs of change, even if not strictly performance related.

All of the values I've changed include:
    conf.copying.ultra.buf.count
    conf.copying.ultra.buf.size
    conf.copying.ultra.delta.extra_bufs
    conf.copying.ultra.delta.threads
    conf.exec.threads
    conf.copying.ultra.mode.src.large
    conf.copying.ultra.mode.dst.large
    conf.exec.bulk.max_count

Alex Pankratov :

Apr 05, 2021

> Do you have any thoughts on that?

Have you confirmed, from the backup log, that your overrides were actually picked up and used by the program? I suspect that you did, but it won't hurt to ask.

Based on my own testing I can say that the copying thread count (exec threads) has the biggest impact when copying lots of smaller files and when creating or deleting folders in bulk. Conversely, IO buffer sizes and counts matter when copying very large files. But in both cases performance tends to plateau very quickly and throwing more of the same at it (more threads, bigger buffers, etc.) doesn't make much difference.

Also keep in mind that the numbers you get from ccsiobench are based on relatively short IO bursts. For a more accurate measure of _sustained_ IO performance you may want to run a customized test with every test run executed for *at least* 20-30 seconds. You can reduce the number of runs, but each needs to be longer. This is just to make sure all intermediate caches, buffers and such get filled to the brim with every test run.

Under these conditions, the resulting numbers will likely be less than what you get from the default test and closer to what you might be seeing in bvckup2.
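To illustrate the idea of a time-bounded run (this is not how ccsiobench works internally, just a sketch), the Python function below keeps re-reading a file sequentially until a minimum duration has passed and then reports the average rate. It is single-threaded, and re-reading the same file can be served from caches, so the test file should be much larger than any cache in the path. The function name and the example path are hypothetical.

```python
import time

def sustained_read_mbps(path, block_size=1024 * 1024, min_seconds=20.0):
    """Re-read `path` in block_size chunks until min_seconds have elapsed; return MB/s."""
    total = 0
    start = time.perf_counter()
    while True:
        # buffering=0 turns off Python's own buffering; OS-level caching still applies.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break  # end of file, start another pass
                total += len(chunk)
                elapsed = time.perf_counter() - start
                if elapsed >= min_seconds:
                    return total / elapsed / (1024 * 1024)
        if total == 0:
            raise ValueError("file is empty, nothing to measure")

# Example call (hypothetical path):
# print(sustained_read_mbps(r"\\server\share\testfile.bin", min_seconds=30.0))
```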

Made by IO Bureau in Switzerland