About borg create --chunker-params
==================================

--chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE

CHUNK_MIN_EXP and CHUNK_MAX_EXP give the exponent N of the 2^N minimum and
maximum chunk size. Required: CHUNK_MIN_EXP < CHUNK_MAX_EXP.

Defaults: 19 (2^19 == 512KiB) minimum, 23 (2^23 == 8MiB) maximum.
A maximum larger than 23 is currently not supported.

HASH_MASK_BITS is the number of least-significant bits of the rolling hash
that need to be zero to trigger a chunk cut.
Recommended: CHUNK_MIN_EXP + X <= HASH_MASK_BITS <= CHUNK_MAX_EXP - X, X >= 2
(this gives the rolling hash some freedom to make its cut at a place
determined by the window's contents rather than by the minimum/maximum
chunk size).

Default: 21 (statistically, chunks will be about 2^21 == 2MiB in size)

HASH_WINDOW_SIZE: the size of the window used for the rolling hash computation.
Must be an odd number. Default: 4095B
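
The rules above can be checked mechanically before starting a backup. The
following is only a minimal sketch, not borg's own argument parsing:
check_chunker_params is a hypothetical helper name, and it enforces nothing
beyond the constraints stated in this text (with X fixed at 2 for the
recommendation).

```python
def check_chunker_params(spec):
    """Parse 'MIN_EXP,MAX_EXP,MASK_BITS,WINDOW_SIZE' and check the rules above.

    Hypothetical helper for illustration only; it is not part of borg.
    Returns (params, warnings) and raises ValueError on a hard violation.
    """
    min_exp, max_exp, mask_bits, window = (int(p) for p in spec.split(","))
    if not min_exp < max_exp:
        raise ValueError("CHUNK_MIN_EXP must be smaller than CHUNK_MAX_EXP")
    if max_exp > 23:
        raise ValueError("CHUNK_MAX_EXP above 23 is not supported")
    if window % 2 == 0:
        raise ValueError("HASH_WINDOW_SIZE must be an odd number")
    warnings = []
    # Recommendation only: keep X = 2 bits of distance to both exponents.
    if not (min_exp + 2 <= mask_bits <= max_exp - 2):
        warnings.append("HASH_MASK_BITS is outside the recommended range")
    params = {
        "min_size": 2 ** min_exp,       # bytes
        "max_size": 2 ** max_exp,       # bytes
        "target_size": 2 ** mask_bits,  # statistical chunk size in bytes
        "window_size": window,          # bytes
    }
    return params, warnings


params, warnings = check_chunker_params("19,23,21,4095")
print(params["min_size"], params["target_size"], params["max_size"], warnings)
# → 524288 2097152 8388608 []
```

With the defaults, 21 is the only mask value that satisfies the
recommendation for X = 2, which is why no warning is produced.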

Trying it out
=============

I backed up a VM directory to demonstrate how different chunker parameters
influence repo size, index size / chunk count, compression and deduplication.

repo-sm: ~64KiB chunks (16 bits chunk mask), min chunk size 1KiB (2^10B)
         (these are attic / borg 0.23 internal defaults)

repo-lg: ~1MiB chunks (20 bits chunk mask), min chunk size 64KiB (2^16B)

repo-xl: 8MiB chunks (2^23B max chunk size), min chunk size 64KiB (2^16B).
         The chunk mask was set to 31 bits, so it (almost) never triggers.
         This degrades the rolling-hash-based dedup to a fixed-offset dedup,
         as the cutting point is now (almost) always the end of the buffer
         (at 2^23B == 8MiB).
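
The effect described for repo-xl can be reproduced at toy scale. The sketch
below is not borg's chunker: it swaps in a simple polynomial rolling hash,
a 15-byte window and much smaller exponents so that it runs quickly in pure
Python. The function name chunk_sizes and all constants in it are
assumptions made for illustration.

```python
import random


def chunk_sizes(data, min_exp, max_exp, mask_bits, window=15):
    """Cut data into content-defined chunks and return the chunk lengths.

    Toy model: a cut happens when the lowest mask_bits bits of a polynomial
    rolling hash are zero (once the minimum size is reached), or when the
    chunk reaches the maximum size.
    """
    min_size, max_size = 2 ** min_exp, 2 ** max_exp
    mask = (1 << mask_bits) - 1
    mult, mod = 1_000_003, 1 << 32      # arbitrary odd multiplier, 32-bit hash
    drop = pow(mult, window - 1, mod)   # weight of the byte leaving the window
    sizes, start, h = [], 0, 0
    for i, byte in enumerate(data):
        length = i - start + 1
        if length > window:
            # Remove the oldest byte so the hash covers only the window.
            h = (h - data[i - window] * drop) % mod
        h = (h * mult + byte) % mod
        if (length >= min_size and (h & mask) == 0) or length == max_size:
            sizes.append(length)
            start, h = i + 1, 0         # start a new chunk with a fresh hash
    if start < len(data):
        sizes.append(len(data) - start) # trailing partial chunk
    return sizes


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(2 ** 16))

variable = chunk_sizes(data, min_exp=6, max_exp=12, mask_bits=9)
fixed = chunk_sizes(data, min_exp=6, max_exp=12, mask_bits=31)

print("9 mask bits: ", len(variable), "chunks of varying size")
print("31 mask bits:", len(fixed), "chunks, sizes:", sorted(set(fixed)))
```

With 31 mask bits the hash condition practically never holds, so every cut
is forced by the maximum size and the chunk boundaries sit at fixed offsets,
independent of the content.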

The repo index size is an indicator for the RAM needs of Borg.
In this special case, the total RAM needs are about 2.1x the repo index size.
The index of repo-sm is 16x larger than that of repo-lg, which corresponds
to the ratio of the different target chunk sizes.

Note: RAM needs were not a problem in this specific case (37GB data size).
But imagine you have 37TB of such data and much less than 42GB RAM:
then you should use the "lg" chunker params, so that you only need
2.6GB RAM. Or use even bigger chunks than shown for "lg" (see "xl").
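
The RAM figures follow from simple arithmetic on the index file sizes shown
in the ls -l listings of this document. The sketch below only restates that
arithmetic; the 2.1x factor is the value observed in this one test, and the
helper name ram_mib is made up for illustration. Scaling the 37GB results
by 1000 to get the 37TB case assumes that the index grows linearly with the
amount of data.

```python
RAM_FACTOR = 2.1  # total RAM relative to repo index size, observed in this test

# Index file sizes in bytes, taken from the ls -l output for each repo.
index_bytes = {"sm": 20971538, "lg": 1310738, "xl": 327698}


def ram_mib(index_size_bytes):
    """Estimated RAM need in MiB for a repo index of the given size."""
    return index_size_bytes * RAM_FACTOR / 2 ** 20


for name, size in index_bytes.items():
    # 1000x the data means roughly 1000x the MiB figure, i.e. about that many GB.
    print(f"repo-{name}: {ram_mib(size):6.2f} MiB RAM for 37GB of data")

ratio = index_bytes["sm"] / index_bytes["lg"]
print(f"index size ratio sm/lg: {ratio:.1f}")
```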

Compression also works better for larger chunks, as expected.
Deduplication works worse for larger chunks, also as expected.
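
Both trends can be put into numbers from the borg info output of the three
repos. This is only a small arithmetic sketch using the sizes (in GB) that
borg info reports in the sections of this document; the variable names are
made up for illustration.

```python
# (original, compressed, deduplicated) sizes in GB, as reported by borg info.
sizes_gb = {
    "sm": (37.12, 14.81, 12.18),
    "lg": (37.10, 14.60, 13.38),
    "xl": (37.10, 14.59, 14.59),
}

ratios = {}
for name, (original, compressed, deduplicated) in sizes_gb.items():
    compression = compressed / original      # smaller is better
    dedup = deduplicated / compressed        # smaller is better, 1.0 == no dedup
    ratios[name] = (compression, dedup)
    print(f"repo-{name}: compressed/original = {compression:.3f}, "
          f"deduplicated/compressed = {dedup:.3f}")
```

For repo-xl the deduplicated size equals the compressed size, so no
deduplication took place within this single archive.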

small chunks
============

$ borg info /extra/repo-sm::1

Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 10,23,16,4095 /extra/repo-sm::1 /home/tw/win
Number of files: 3

                       Original size      Compressed size    Deduplicated size
This archive:               37.12 GB             14.81 GB             12.18 GB
All archives:               37.12 GB             14.81 GB             12.18 GB

                       Unique chunks         Total chunks
Chunk index:                  378374               487316

$ ls -l /extra/repo-sm/index*

-rw-rw-r-- 1 tw tw 20971538 Jun 20 23:39 index.2308

$ du -sk /extra/repo-sm
11930840    /extra/repo-sm

large chunks
============

$ borg info /extra/repo-lg::1

Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 16,23,20,4095 /extra/repo-lg::1 /home/tw/win
Number of files: 3

                       Original size      Compressed size    Deduplicated size
This archive:               37.10 GB             14.60 GB             13.38 GB
All archives:               37.10 GB             14.60 GB             13.38 GB

                       Unique chunks         Total chunks
Chunk index:                   25889                29349

$ ls -l /extra/repo-lg/index*

-rw-rw-r-- 1 tw tw 1310738 Jun 20 23:10 index.2264

$ du -sk /extra/repo-lg
13073928    /extra/repo-lg

xl chunks
=========

(borg-env)tw@tux:~/w/borg$ borg info /extra/repo-xl::1

Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 16,23,31,4095 /extra/repo-xl::1 /home/tw/win
Number of files: 3

                       Original size      Compressed size    Deduplicated size
This archive:               37.10 GB             14.59 GB             14.59 GB
All archives:               37.10 GB             14.59 GB             14.59 GB

                       Unique chunks         Total chunks
Chunk index:                    4319                 4434

$ ls -l /extra/repo-xl/index*

-rw-rw-r-- 1 tw tw 327698 Jun 21 00:52 index.2011

$ du -sk /extra/repo-xl/
14253464    /extra/repo-xl/