Having thought about it a bit more, I can even remove that slight upward slope from the hump of the blue graph. The usage still climbs just a little bit during the run, but for practical intents and purposes it's
Wrapping up changes to the logging system. Here's a quick graph to demonstrate the effects of the rework. For an initial full run of a 100,000 file backup memory usage is down from about 200 to 50 Megs. For a 1,000,000 file backup the effect is similar - peak memory usage dropped
from 2.9GB to 0.58GB
Previously, the UI would keep logs of last few backup runs in the memory and for larger backups this would eat up a good chunk of memory. This was especially true for the initial backups where all files needed copying and thus for 100K files it would generate close to half a million log entries.
For subsequent backups it wasn't a big deal as they would handle just the changes and generate only a handful of steps and a bit of log data So in the end, the memory usage was capped, but the cap was higher than it could've been.
Beta 60 addresses that.
Only a very small part of the log is now kept in memory, just enough to fill the log viewer panel in the UI. The rest is kept on the disk and read from there on as-needed basis.
However this is easier said than done.
Bvckup logs are hierarchical, meaning that they are
with nodes that may have an arbitrary large number of children. Running a 1 million file backup would yield a "Processing..." entry with 1 million children, each with up to a dozen of descendants.
Displaying this tree would've not been a big deal if all log entries were always visible at all times. When the UI would've wanted to show entries 700,503 through 700,523 entries, it would've simply gone to a respective location in the log index file and plucked the data out. But...
The nodes can be collapsed and expanded. This hides and shows arbitrary chunks of the log tree and then the UI has this issue of finding 700,503-th
item. And this is hard.
This implies quick look-ups in some sort of super-imposed binary search tree of visible items. A tree that changes every time a node is expanded or collapsed and that sits in a disk file, which restricts how you can walk and modify it without getting an I/O penalty.
This is fun. This is what cost me 3 weeks of shuffling through CS books and several complete rewrites of the tree indexing code. Believe it or not, but this is basically an original research and it seems that no one had to deal with this issue before. At least not publicly.
The solution is to use slightly customized version of an
to store the list of node's children (that's using tree within a tree), use it for the visible item look-ups and then optimize the hell out of the index I/O through aggressive write caching.
When all i's are dotted and t's crossed, this yields a logging system that can push up to 1 mil entries per second on a resonably equipped box. This might not be knock-your-socks-off impressive, but it's plenty enough for the task at hand.