Browse Source

update docs

Thomas Waldmann 2 years ago
parent
commit
c8830cde44

+ 46 - 17
docs/internals/data-structures.rst

@@ -77,7 +77,7 @@ don't have a particular meaning (except for the Manifest_).
 
 
 Normally the keys are computed like this::
 Normally the keys are computed like this::
 
 
-  key = id = id_hash(unencrypted_data)
+  key = id = id_hash(plaintext_data)  # plain = not encrypted, not compressed, not obfuscated
 
 
 The id_hash function depends on the :ref:`encryption mode <borg_rcreate>`.
 The id_hash function depends on the :ref:`encryption mode <borg_rcreate>`.
 
 
@@ -98,15 +98,15 @@ followed by a number of log entries. Each log entry consists of (in this order):
 
 
 * crc32 checksum (uint32):
 * crc32 checksum (uint32):
   - for PUT2: CRC32(size + tag + key + digest)
   - for PUT2: CRC32(size + tag + key + digest)
-  - for PUT: CRC32(size + tag + key + data)
+  - for PUT: CRC32(size + tag + key + payload)
   - for DELETE: CRC32(size + tag + key)
   - for DELETE: CRC32(size + tag + key)
   - for COMMIT: CRC32(size + tag)
   - for COMMIT: CRC32(size + tag)
 * size (uint32) of the entry (including the whole header)
 * size (uint32) of the entry (including the whole header)
 * tag (uint8): PUT(0), DELETE(1), COMMIT(2) or PUT2(3)
 * tag (uint8): PUT(0), DELETE(1), COMMIT(2) or PUT2(3)
 * key (256 bit) - only for PUT/PUT2/DELETE
 * key (256 bit) - only for PUT/PUT2/DELETE
-* data (size - 41 bytes) - only for PUT
-* xxh64 digest (64 bit) = XXH64(size + tag + key + data) - only for PUT2
-* data (size - 41 - 8 bytes) - only for PUT2
+* payload (size - 41 bytes) - only for PUT
+* xxh64 digest (64 bit) = XXH64(size + tag + key + payload) - only for PUT2
+* payload (size - 41 - 8 bytes) - only for PUT2
 
 
 PUT2 is new since repository version 2. For new log entries PUT2 is used.
 PUT2 is new since repository version 2. For new log entries PUT2 is used.
 PUT is still supported to read version 1 repositories, but not generated any more.
 PUT is still supported to read version 1 repositories, but not generated any more.
@@ -116,7 +116,7 @@ version 2+.
 Those files are strictly append-only and modified only once.
 Those files are strictly append-only and modified only once.
 
 
 When an object is written to the repository a ``PUT`` entry is written
 When an object is written to the repository a ``PUT`` entry is written
-to the file containing the object id and data. If an object is deleted
+to the file containing the object id and payload. If an object is deleted
 a ``DELETE`` entry is appended with the object id.
 a ``DELETE`` entry is appended with the object id.
 
 
 A ``COMMIT`` tag is written when a repository transaction is
 A ``COMMIT`` tag is written when a repository transaction is
@@ -130,13 +130,42 @@ partial/uncommitted transaction.
 The size of individual segments is limited to 4 GiB, since the offset of entries
 The size of individual segments is limited to 4 GiB, since the offset of entries
 within segments is stored in a 32-bit unsigned integer in the repository index.
 within segments is stored in a 32-bit unsigned integer in the repository index.
 
 
-Objects
-~~~~~~~
+Objects / Payload structure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All data (the manifest, archives, archive item stream chunks and file data
+chunks) is compressed, optionally obfuscated and encrypted. This produces some
+additional metadata (size and compression information), which is separately
+serialized and also encrypted.
+
+See :ref:`data-encryption` for a graphic outlining the anatomy of the encryption in Borg.
+What you see at the bottom there is done twice: once for the data and once for the metadata.
+
+An object (the payload part of a segment file log entry) must be like:
+
+- length of encrypted metadata (16bit unsigned int)
+- encrypted metadata (incl. encryption header), when decrypted:
+
+  - msgpacked dict with:
+
+    - ctype (compression type 0..255)
+    - clevel (compression level 0..255)
+    - csize (overall compressed (and maybe obfuscated) data size)
+    - psize (only when obfuscated: payload size without the obfuscation trailer)
+    - size (uncompressed size of the data)
+- encrypted data (incl. encryption header), when decrypted:
+
+  - compressed data (with an optional all-zero-bytes obfuscation trailer)
+
+This new, more complex repo v2 object format was implemented to be able to efficiently
+query the metadata without having to read, transfer and decrypt the (usually much bigger)
+data part.
+
+The metadata is encrypted to not disclose potentially sensitive information that could be
+used for e.g. fingerprinting attacks.
+
+The compression `ctype` and `clevel` is explained in :ref:`data-compression`.
 
 
-All objects (the manifest, archives, archive item streams chunks and file data
-chunks) are encrypted and/or compressed. See :ref:`data-encryption` for a
-graphic outlining the anatomy of an object in Borg. The `type` for compression
-is explained in :ref:`data-compression`.
 
 
 Index, hints and integrity
 Index, hints and integrity
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -855,7 +884,7 @@ For each borg invocation, a new sessionkey is derived from the borg key material
 and the 48bit IV starts from 0 again (both ciphers internally add a 32bit counter
 and the 48bit IV starts from 0 again (both ciphers internally add a 32bit counter
 to our IV, so we'll just count up by 1 per chunk).
 to our IV, so we'll just count up by 1 per chunk).
 
 
-The chunk layout is best seen at the bottom of this diagram:
+The encryption layout is best seen at the bottom of this diagram:
 
 
 .. figure:: encryption-aead.png
 .. figure:: encryption-aead.png
     :figwidth: 100%
     :figwidth: 100%
@@ -954,14 +983,14 @@ representation of the repository id.
 Compression
 Compression
 -----------
 -----------
 
 
-Borg supports the following compression methods, each identified by a type
-byte:
+Borg supports the following compression methods, each identified by a ctype value
+in the range between 0 and 255 (and augmented by a clevel 0..255 value for the
+compression level):
 
 
 - none (no compression, pass through data 1:1), identified by 0x00
 - none (no compression, pass through data 1:1), identified by 0x00
 - lz4 (low compression, but super fast), identified by 0x01
 - lz4 (low compression, but super fast), identified by 0x01
 - zstd (level 1-22 offering a wide range: level 1 is lower compression and high
 - zstd (level 1-22 offering a wide range: level 1 is lower compression and high
-  speed, level 22 is higher compression and lower speed) - since borg 1.1.4,
-  identified by 0x03
+  speed, level 22 is higher compression and lower speed) - identified by 0x03
 - zlib (level 0-9, level 0 is no compression [but still adding zlib overhead],
 - zlib (level 0-9, level 0 is no compression [but still adding zlib overhead],
   level 1 is low, level 9 is high compression), identified by 0x05
   level 1 is low, level 9 is high compression), identified by 0x05
 - lzma (level 0-9, level 0 is low, level 9 is high compression), identified
 - lzma (level 0-9, level 0 is low, level 9 is high compression), identified

BIN
docs/internals/encryption-aead.odg


BIN
docs/internals/encryption-aead.png