
change the builtin default for --chunker-params, create 2MiB chunks, fixes #343

One of the biggest issues with borg < 1.0 was its default target chunk size of
64kiB: it created a lot of chunks and thus a huge chunk management overhead
(high RAM and disk usage).
Thomas Waldmann 9 years ago
commit 169634f2ca
4 changed files with 21 additions and 19 deletions
  1. borg/archive.py (+2 -2)
  2. docs/internals.rst (+5 -5)
  3. docs/misc/create_chunker-params.txt (+2 -2)
  4. docs/usage.rst (+12 -10)

+ 2 - 2
borg/archive.py

@@ -26,10 +26,10 @@ import msgpack
 
 ITEMS_BUFFER = 1024 * 1024
 
-CHUNK_MIN_EXP = 10  # 2**10 == 1kiB
+CHUNK_MIN_EXP = 19  # 2**19 == 512kiB
 CHUNK_MAX_EXP = 23  # 2**23 == 8MiB
 HASH_WINDOW_SIZE = 0xfff  # 4095B
-HASH_MASK_BITS = 16  # results in ~64kiB chunks statistically
+HASH_MASK_BITS = 21  # results in ~2MiB chunks statistically
 
 # defaults, use --chunker-params to override
 CHUNKER_PARAMS = (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE)
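
As a quick sanity check of the new defaults, the exponents in the tuple above can be translated into byte sizes. This is only a minimal sketch: `describe_chunker_params` is a hypothetical helper written for illustration and is not part of borg.

```python
# Illustrative only: describe_chunker_params is a hypothetical helper, not borg API.
CHUNK_MIN_EXP = 19  # 2**19 == 512kiB
CHUNK_MAX_EXP = 23  # 2**23 == 8MiB
HASH_WINDOW_SIZE = 0xfff  # 4095B
HASH_MASK_BITS = 21  # results in ~2MiB chunks statistically

CHUNKER_PARAMS = (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE)


def describe_chunker_params(params):
    """Translate the exponent-based parameter tuple into sizes in bytes."""
    min_exp, max_exp, mask_bits, window_size = params
    return {
        "min_chunk_size": 1 << min_exp,
        "max_chunk_size": 1 << max_exp,
        "target_chunk_size": 1 << mask_bits,  # statistical target, not a hard limit
        "window_size": window_size,
    }


sizes = describe_chunker_params(CHUNKER_PARAMS)
for name, value in sizes.items():
    print(f"{name}: {value} B")
```

With the values from this commit, the minimum, target and maximum sizes come out as 512 kiB, 2 MiB and 8 MiB.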

+ 5 - 5
docs/internals.rst

@@ -210,9 +210,9 @@ producing chunks of 2^HASH_MASK_BITS Bytes on average.
 ``borg create --chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE``
 can be used to tune the chunker parameters, the default is:
 
-- CHUNK_MIN_EXP = 10 (minimum chunk size = 2^10 B = 1 kiB)
+- CHUNK_MIN_EXP = 19 (minimum chunk size = 2^19 B = 512 kiB)
 - CHUNK_MAX_EXP = 23 (maximum chunk size = 2^23 B = 8 MiB)
-- HASH_MASK_BITS = 16 (statistical medium chunk size ~= 2^16 B = 64 kiB)
+- HASH_MASK_BITS = 21 (statistical medium chunk size ~= 2^21 B = 2 MiB)
 - HASH_WINDOW_SIZE = 4095 [B] (`0xFFF`)
 
 The buzhash table is altered by XORing it with a seed randomly generated once
@@ -313,13 +313,13 @@ If a remote repository is used the repo index will be allocated on the remote si
 
 E.g. backing up a total count of 1 Mi (IEC binary prefix e.g. 2^20) files with a total size of 1TiB.
 
-a) with create ``--chunker-params 10,23,16,4095`` (default):
+a) with ``create --chunker-params 10,23,16,4095`` (custom, like borg < 1.0 or attic):
 
   mem_usage  =  2.8GiB
 
-b) with create ``--chunker-params 10,23,20,4095`` (custom):
+b) with ``create --chunker-params 19,23,21,4095`` (default):
 
-  mem_usage  =  0.4GiB
+  mem_usage  =  0.31GiB
 
 .. note:: There is also the ``--no-files-cache`` option to switch off the files cache.
    You'll save some memory, but it will need to read / chunk all the files as
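
The `mem_usage` figures in the hunk above can be reproduced with a short calculation. The sketch below assumes roughly 164 bytes of index/cache data per chunk and 240 bytes per file. Those constants are not shown in this diff; they are an assumption about the formula in the surrounding internals docs, chosen because they reproduce the 2.8GiB, 0.4GiB and 0.31GiB values. `estimate_mem_usage` is a hypothetical name.

```python
# Assumed per-entry costs (NOT part of this diff): ~164 bytes per chunk across
# repo index, chunks cache and files cache, plus ~240 bytes per file.
BYTES_PER_CHUNK = 164
BYTES_PER_FILE = 240
GiB = 2 ** 30


def estimate_mem_usage(total_file_size, total_file_count, hash_mask_bits):
    """Rough RAM estimate in bytes for the indexes and caches."""
    # Statistically, chunks are about 2^HASH_MASK_BITS bytes each.
    chunk_count = total_file_size // 2 ** hash_mask_bits
    return chunk_count * BYTES_PER_CHUNK + total_file_count * BYTES_PER_FILE


total_size = 2 ** 40   # 1 TiB of data
total_files = 2 ** 20  # 1 Mi files

for label, mask_bits in (("10,23,16,4095", 16), ("19,23,21,4095", 21)):
    usage = estimate_mem_usage(total_size, total_files, mask_bits)
    print(f"--chunker-params {label}: {usage / GiB:.2f} GiB")
# prints 2.80 GiB for the old parameters and 0.31 GiB for the new default
```

Under these assumptions the per-file term (about 0.23 GiB here) is the same in both cases, so the saving comes entirely from the 32-fold reduction in chunk count.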

+ 2 - 2
docs/misc/create_chunker-params.txt

@@ -6,7 +6,7 @@ About borg create --chunker-params
 CHUNK_MIN_EXP and CHUNK_MAX_EXP give the exponent N of the 2^N minimum and
 maximum chunk size. Required: CHUNK_MIN_EXP < CHUNK_MAX_EXP.
 
-Defaults: 10 (2^10 == 1KiB) minimum, 23 (2^23 == 8MiB) maximum.
+Defaults: 19 (2^19 == 512KiB) minimum, 23 (2^23 == 8MiB) maximum.
 
 HASH_MASK_BITS is the number of least-significant bits of the rolling hash
 that need to be zero to trigger a chunk cut.
@@ -14,7 +14,7 @@ Recommended: CHUNK_MIN_EXP + X <= HASH_MASK_BITS <= CHUNK_MAX_EXP - X, X >= 2
 (this allows the rolling hash some freedom to make its cut at a place
 determined by the windows contents rather than the min/max. chunk size).
 
-Default: 16 (statistically, chunks will be about 2^16 == 64kiB in size)
+Default: 21 (statistically, chunks will be about 2^21 == 2MiB in size)
 
 HASH_WINDOW_SIZE: the size of the window used for the rolling hash computation.
 Default: 4095B
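
The constraints described in this file can be checked mechanically. The following is a minimal sketch, not borg code: `check_chunker_params` and `is_cut_point` are hypothetical helpers, and the cut test deliberately ignores the min/max chunk size limits and the rolling hash computation itself.

```python
def check_chunker_params(min_exp, max_exp, mask_bits, margin=2):
    """Return (valid, recommended) for a parameter set.

    valid:       the required CHUNK_MIN_EXP < CHUNK_MAX_EXP holds
    recommended: HASH_MASK_BITS keeps a distance of at least `margin`
                 (the X >= 2 from the text) from both exponents
    """
    valid = min_exp < max_exp
    recommended = valid and (min_exp + margin <= mask_bits <= max_exp - margin)
    return valid, recommended


def is_cut_point(rolling_hash, mask_bits):
    """True if the `mask_bits` least-significant bits of the hash are all zero."""
    return (rolling_hash & ((1 << mask_bits) - 1)) == 0


print(check_chunker_params(19, 23, 21))  # new default
print(check_chunker_params(10, 23, 16))  # borg < 1.0 / attic parameters
```

Note that the new default sits exactly on the recommended boundary with X = 2: 19 + 2 <= 21 <= 23 - 2.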

+ 12 - 10
docs/usage.rst

@@ -249,8 +249,10 @@ Examples
     NAME="root-`date +%Y-%m-%d`"
     $ borg create -C zlib,6 /mnt/backup::$NAME / --do-not-cross-mountpoints
 
-    # Backup huge files with little chunk management overhead
-    $ borg create --chunker-params 19,23,21,4095 /mnt/backup::VMs /srv/VMs
+    # Make a big effort in fine granular deduplication (big chunk management
+    # overhead, needs a lot of RAM and disk space, see formula in internals
+    # docs - same parameters as borg < 1.0 or attic):
+    $ borg create --chunker-params 10,23,16,4095 /mnt/backup::small /smallstuff
 
     # Backup a raw device (must not be active/in use/mounted at that time)
     $ dd if=/dev/sda bs=10M | borg create /mnt/backup::my-sda -
@@ -506,15 +508,15 @@ resource usage (RAM and disk space) as the amount of resources needed is
 (also) determined by the total amount of chunks in the repository (see
 `Indexes / Caches memory usage` for details).
 
-``--chunker-params=10,23,16,4095 (default)`` results in a fine-grained deduplication
-and creates a big amount of chunks and thus uses a lot of resources to manage them.
-This is good for relatively small data volumes and if the machine has a good
-amount of free RAM and disk space.
+``--chunker-params=10,23,16,4095`` results in a fine-grained deduplication
+and creates a big amount of chunks and thus uses a lot of resources to manage
+them. This is good for relatively small data volumes and if the machine has a
+good amount of free RAM and disk space.
 
-``--chunker-params=19,23,21,4095`` results in a coarse-grained deduplication and
-creates a much smaller amount of chunks and thus uses less resources.
-This is good for relatively big data volumes and if the machine has a relatively
-low amount of free RAM and disk space.
+``--chunker-params=19,23,21,4095`` (default) results in a coarse-grained
+deduplication and creates a much smaller amount of chunks and thus uses less
+resources. This is good for relatively big data volumes and if the machine has
+a relatively low amount of free RAM and disk space.
 
 If you already have made some archives in a repository and you then change
 chunker params, this of course impacts deduplication as the chunks will be