docs: provide more details on object layout

While writing my own out-of-band decoder, I had a hard time figuring out
how to unpack the manifest. From the description, I could only tell that
the manifest is msgpack'd; I could not work out that it also goes through
the same encryption+compression logic as all other objects.

This should make it a little clearer and provide the necessary
information to understand how the compression works.
Jonas Schäfer 2 years ago
parent
commit
bd89fb717e
1 changed file with 21 additions and 6 deletions

docs/internals/data-structures.rst (+21 -6)

@@ -121,6 +121,14 @@ partial/uncommitted transaction.
 The size of individual segments is limited to 4 GiB, since the offset of entries
 within segments is stored in a 32-bit unsigned integer in the repository index.
 
+Objects
+~~~~~~~
+
+All objects (the manifest, archives, archive item stream chunks and file data
+chunks) are encrypted and/or compressed. See :ref:`data-encryption` for a
+graphic outlining the anatomy of an object in Borg. The ``type`` for compression
+is explained in :ref:`data-compression`.
+
 Index, hints and integrity
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -860,6 +868,8 @@ HashIndex is implemented in C and wrapped with Cython in a class-based interface
 The Cython wrapper checks every passed value against these reserved values and
 raises an AssertionError if they are used.
 
+.. _data-encryption:
+
 Encryption
 ----------
 
@@ -961,18 +971,23 @@ key file, wrapped using the standard ``textwrap`` module with a header.
 The header is a single line with a MAGIC string, a space and a hexadecimal
 representation of the repository id.
 
+.. _data-compression:
+
 Compression
 -----------
 
-Borg supports the following compression methods:
+Borg supports the following compression methods, each identified by two bytes:
 
-- none (no compression, pass through data 1:1)
-- lz4 (low compression, but super fast)
+- none (no compression, pass through data 1:1), identified by ``\x00\x00``
+- lz4 (low compression, but super fast), identified by ``\x01\x00``
 - zstd (level 1-22 offering a wide range: level 1 is lower compression and high
-  speed, level 22 is higher compression and lower speed) - since borg 1.1.4
+  speed, level 22 is higher compression and lower speed) - since borg 1.1.4,
+  identified by ``\x03\x00``
 - zlib (level 0-9, level 0 is no compression [but still adding zlib overhead],
-  level 1 is low, level 9 is high compression)
-- lzma (level 0-9, level 0 is low, level 9 is high compression).
+  level 1 is low, level 9 is high compression), identified by a zlib header
+  (``\x.8\x..``)
+- lzma (level 0-9, level 0 is low, level 9 is high compression), identified
+  by ``\x02\x00``.
 
 Speed:  none > lz4 > zlib > lzma, lz4 > zstd
 Compression: lzma > zlib > lz4 > none, zstd > lz4
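
For anyone writing a similar out-of-band decoder, the two-byte identifiers added in the hunk above can be checked roughly as follows. This is only a sketch, not Borg's own code: ``guess_compression`` and ``FIXED_IDS`` are hypothetical names, the input is assumed to be an already decrypted object payload, and the zlib check reads ``\x.8\x..`` as "low nibble of the first byte is 8" plus the header checksum rule from the zlib format (RFC 1950).

```python
import zlib

# Fixed two-byte identifiers, copied from the list in the diff above.
FIXED_IDS = {
    b"\x00\x00": "none",
    b"\x01\x00": "lz4",
    b"\x02\x00": "lzma",
    b"\x03\x00": "zstd",
}


def guess_compression(data: bytes) -> str:
    """Hypothetical helper: name the compression method of a decrypted payload."""
    if len(data) < 2:
        raise ValueError("need at least two bytes to identify the compression")
    header = bytes(data[:2])
    if header in FIXED_IDS:
        return FIXED_IDS[header]
    # zlib has no dedicated identifier; its own stream header is used instead.
    # RFC 1950: low nibble of CMF is 8 (deflate) and (CMF * 256 + FLG) is a
    # multiple of 31.
    cmf, flg = header
    if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
        return "zlib"
    raise ValueError(f"unknown compression header: {header!r}")


if __name__ == "__main__":
    print(guess_compression(b"\x01\x00payload"))
    print(guess_compression(zlib.compress(b"hello world")))
```

Checking the fixed identifiers first keeps the zlib test as a fallback, which matches the fact that zlib is the only method in the list without a dedicated two-byte identifier.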