|
@@ -168,13 +168,27 @@ A chunk is stored as an object as well, of course.
|
|
|
Chunks
|
|
|
------
|
|
|
|
|
|
-|project_name| uses a rolling hash computed by the Buzhash_ algorithm, with a
|
|
|
-window size of 4095 bytes (`0xFFF`), with a minimum chunk size of 1024 bytes.
|
|
|
-It triggers (chunks) when the last 16 bits of the hash are zero, producing
|
|
|
-chunks of 64kiB on average.
|
|
|
+The |project_name| chunker uses a rolling hash computed by the Buzhash_ algorithm.
|
|
|
+It triggers (chunks) when the last HASH_MASK_BITS bits of the hash are zero,
|
|
|
+producing chunks of 2^HASH_MASK_BITS Bytes on average.
|
|
|
+
|
|
|
+create --chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE
|
|
|
+can be used to tune the chunker parameters, the default is:
|
|
|
+
|
|
|
+- CHUNK_MIN_EXP = 10 (minimum chunk size = 2^10 B = 1 kiB)
|
|
|
+- CHUNK_MAX_EXP = 23 (maximum chunk size = 2^23 B = 8 MiB)
|
|
|
+- HASH_MASK_BITS = 16 (statistical medium chunk size ~= 2^16 B = 64 kiB)
|
|
|
+- HASH_WINDOW_SIZE = 4095 [B] (`0xFFF`)
|
|
|
+
|
|
|
+The default parameters are OK for relatively small backup data volumes and
|
|
|
+repository sizes and a lot of available memory (RAM) and disk space for the
|
|
|
+chunk index. If that does not apply, you are advised to tune these parameters
|
|
|
+to keep the chunk count lower than with the defaults.
|
|
|
|
|
|
The buzhash table is altered by XORing it with a seed randomly generated once
|
|
|
-for the archive, and stored encrypted in the keyfile.
|
|
|
+for the archive, and stored encrypted in the keyfile. This is to prevent chunk
|
|
|
+size based fingerprinting attacks on your encrypted repo contents (to guess
|
|
|
+what files you have based on a specific set of chunk sizes).
|
|
|
|
|
|
|
|
|
Indexes / Caches
|
|
@@ -243,7 +257,7 @@ Indexes / Caches memory usage
|
|
|
|
|
|
Here is the estimated memory usage of |project_name|:
|
|
|
|
|
|
- chunk_count ~= total_file_size / 65536
|
|
|
+ chunk_count ~= total_file_size / 2 ^ HASH_MASK_BITS
|
|
|
|
|
|
repo_index_usage = chunk_count * 40
|
|
|
|
|
@@ -252,20 +266,32 @@ Here is the estimated memory usage of |project_name|:
|
|
|
files_cache_usage = total_file_count * 240 + chunk_count * 80
|
|
|
|
|
|
mem_usage ~= repo_index_usage + chunks_cache_usage + files_cache_usage
|
|
|
- = total_file_count * 240 + total_file_size / 400
|
|
|
+ = chunk_count * 164 + total_file_count * 240
|
|
|
|
|
|
All units are Bytes.
|
|
|
|
|
|
-It is assuming every chunk is referenced exactly once and that typical chunk size is 64kiB.
|
|
|
+It is assuming every chunk is referenced exactly once (if you have a lot of
|
|
|
+duplicate chunks, you will have less chunks than estimated above).
|
|
|
+
|
|
|
+It is also assuming that typical chunk size is 2^HASH_MASK_BITS (if you have
|
|
|
+a lot of files smaller than this statistical medium chunk size, you will have
|
|
|
+more chunks than estimated above, because 1 file is at least 1 chunk).
|
|
|
|
|
|
If a remote repository is used the repo index will be allocated on the remote side.
|
|
|
|
|
|
-E.g. backing up a total count of 1Mi files with a total size of 1TiB:
|
|
|
+E.g. backing up a total count of 1Mi files with a total size of 1TiB.
|
|
|
+
|
|
|
+a) with create --chunker-params 10,23,16,4095 (default):
|
|
|
|
|
|
- mem_usage = 1 * 2**20 * 240 + 1 * 2**40 / 400 = 2.8GiB
|
|
|
+ mem_usage = 2.8GiB
|
|
|
|
|
|
-Note: there is a commandline option to switch off the files cache. You'll save
|
|
|
-some memory, but it will need to read / chunk all the files then.
|
|
|
+b) with create --chunker-params 10,23,20,4095 (custom):
|
|
|
+
|
|
|
+ mem_usage = 0.4GiB
|
|
|
+
|
|
|
+Note: there is also the --no-files-cache option to switch off the files cache.
|
|
|
+You'll save some memory, but it will need to read / chunk all the files then as
|
|
|
+it can not skip unmodified files then.
|
|
|
|
|
|
|
|
|
Encryption
|
|
@@ -291,6 +317,7 @@ Encryption keys are either derived from a passphrase or kept in a key file.
|
|
|
The passphrase is passed through the ``BORG_PASSPHRASE`` environment variable
|
|
|
or prompted for interactive usage.
|
|
|
|
|
|
+
|
|
|
Key files
|
|
|
---------
|
|
|
|
|
@@ -355,4 +382,10 @@ representation of the repository id.
|
|
|
Compression
|
|
|
-----------
|
|
|
|
|
|
-Currently, compression is disabled by default. Zlib compression can be enabled by passing ``--compression level`` on the command line. Level can be anything from 0 (no compression, fast) to 9 (high compression, slow).
|
|
|
+|project_name| currently always pipes all data through a zlib compressor which
|
|
|
+supports compression levels 0 (no compression, fast) to 9 (high compression, slow).
|
|
|
+
|
|
|
+See ``borg create --help`` about how to specify the compression level and its default.
|
|
|
+
|
|
|
+Note: zlib level 0 creates a little bit more output data than it gets as input,
|
|
|
+due to zlib protocol overhead.
|