3 years ago · c8830cde44
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@@ -77,7 +77,7 @@ don't have a particular meaning (except for the Manifest_).
 
				 
			
 
 Normally the keys are computed like this::
 
-  key = id = id_hash(unencrypted_data)
+  key = id = id_hash(plaintext_data)  # plain = not encrypted, not compressed, not obfuscated
 
 The id_hash function depends on the :ref:`encryption mode <borg_rcreate>`.
 
@@ -98,15 +98,15 @@ followed by a number of log entries. Each log entry consists of (in this order):
 
				 
			
 
 * crc32 checksum (uint32):
   - for PUT2: CRC32(size + tag + key + digest)
-  - for PUT: CRC32(size + tag + key + data)
+  - for PUT: CRC32(size + tag + key + payload)
   - for DELETE: CRC32(size + tag + key)
   - for COMMIT: CRC32(size + tag)
 * size (uint32) of the entry (including the whole header)
 * tag (uint8): PUT(0), DELETE(1), COMMIT(2) or PUT2(3)
 * key (256 bit) - only for PUT/PUT2/DELETE
-* data (size - 41 bytes) - only for PUT
-* xxh64 digest (64 bit) = XXH64(size + tag + key + data) - only for PUT2
-* data (size - 41 - 8 bytes) - only for PUT2
+* payload (size - 41 bytes) - only for PUT
+* xxh64 digest (64 bit) = XXH64(size + tag + key + payload) - only for PUT2
+* payload (size - 41 - 8 bytes) - only for PUT2
 
 PUT2 is new since repository version 2. For new log entries PUT2 is used.
 
				 PUT is still supported to read version 1 repositories, but not generated any more.
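
The PUT2 entry layout listed in this hunk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Borg's actual code: it assumes little-endian fields packed without padding, and it treats the xxh64 digest as an opaque 8-byte value supplied by the caller (computing XXH64 needs a third-party library, so it is not verified here). The names ``build_put2`` and ``parse_put2`` are hypothetical.

```python
import struct
import zlib

# Assumption: little-endian, no padding. crc32 + size + tag + key = 41 bytes,
# which matches the "size - 41" arithmetic in the field list.
HEADER = struct.Struct("<IIB32s")
DIGEST_LEN = 8  # xxh64 digest (64 bit), only present for PUT2
TAG_PUT, TAG_DELETE, TAG_COMMIT, TAG_PUT2 = 0, 1, 2, 3


def build_put2(key: bytes, payload: bytes, digest: bytes) -> bytes:
    """Serialize a PUT2 entry; the 8-byte digest is supplied by the caller."""
    if len(key) != 32 or len(digest) != DIGEST_LEN:
        raise ValueError("key must be 32 bytes and digest 8 bytes")
    size = HEADER.size + DIGEST_LEN + len(payload)  # includes the whole header
    covered = struct.pack("<IB32s", size, TAG_PUT2, key) + digest
    crc = zlib.crc32(covered) & 0xFFFFFFFF  # CRC32(size + tag + key + digest)
    return struct.pack("<I", crc) + covered + payload


def parse_put2(entry: bytes):
    """Check the CRC32 of a PUT2 entry and return (key, digest, payload)."""
    crc, size, tag, key = HEADER.unpack_from(entry)
    if tag != TAG_PUT2:
        raise ValueError("not a PUT2 entry")
    digest_end = HEADER.size + DIGEST_LEN
    # The CRC covers everything after the crc32 field up to the digest's end.
    if zlib.crc32(entry[4:digest_end]) & 0xFFFFFFFF != crc:
        raise ValueError("crc32 mismatch")
    payload = entry[digest_end:size]  # size - 41 - 8 bytes
    return key, entry[HEADER.size:digest_end], payload
```

Because the PUT2 CRC32 covers only the header fields and the digest, a reader in this sketch can validate the header without touching the payload; checking the payload itself would be the job of the xxh64 digest.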
			
@@ -116,7 +116,7 @@ version 2+.
 
 Those files are strictly append-only and modified only once.
 
 When an object is written to the repository a ``PUT`` entry is written
-to the file containing the object id and data. If an object is deleted
+to the file containing the object id and payload. If an object is deleted
 a ``DELETE`` entry is appended with the object id.
 
 A ``COMMIT`` tag is written when a repository transaction is
@@ -130,13 +130,42 @@ partial/uncommitted transaction.
 
 The size of individual segments is limited to 4 GiB, since the offset of entries
 within segments is stored in a 32-bit unsigned integer in the repository index.
 
-Objects
-~~~~~~~
+Objects / Payload structure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All data (the manifest, archives, archive item stream chunks and file data
+chunks) is compressed, optionally obfuscated and encrypted. This produces some
+additional metadata (size and compression information), which is separately
+serialized and also encrypted.
+
+See :ref:`data-encryption` for a graphic outlining the anatomy of the encryption in Borg.
+What you see at the bottom there is done twice: once for the data and once for the metadata.
+
+An object (the payload part of a segment file log entry) must be laid out like this:
+
+- length of encrypted metadata (16-bit unsigned int)
+- encrypted metadata (incl. encryption header), when decrypted:
+
+  - msgpacked dict with:
+
+    - ctype (compression type 0..255)
+    - clevel (compression level 0..255)
+    - csize (overall compressed (and maybe obfuscated) data size)
+    - psize (only when obfuscated: payload size without the obfuscation trailer)
+    - size (uncompressed size of the data)
+- encrypted data (incl. encryption header), when decrypted:
+
+  - compressed data (with an optional all-zero-bytes obfuscation trailer)
+
+This new, more complex repo v2 object format was implemented to be able to efficiently
+query the metadata without having to read, transfer and decrypt the (usually much bigger)
+data part.
+
+The metadata is encrypted to not disclose potentially sensitive information that could be
+used for e.g. fingerprinting attacks.
+
+The compression `ctype` and `clevel` are explained in :ref:`data-compression`.
 
-All objects (the manifest, archives, archive item streams chunks and file data
-chunks) are encrypted and/or compressed. See :ref:`data-encryption` for a
-graphic outlining the anatomy of an object in Borg. The `type` for compression
 
				-is explained in :ref:`data-compression`.
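
The length-prefixed object layout added in this hunk can be illustrated with a short Python sketch. It only shows the framing (a 16-bit length, then the encrypted metadata, then the encrypted data) and leaves out encryption, msgpack and compression entirely. The little-endian byte order and the helper names ``join_object`` and ``split_object`` are assumptions made for illustration; the diff does not state them.

```python
import struct

# Assumption: the 16-bit metadata length is stored little-endian.
META_LEN = struct.Struct("<H")


def join_object(meta_encrypted: bytes, data_encrypted: bytes) -> bytes:
    """Frame already-encrypted metadata and data into one object."""
    if len(meta_encrypted) > 0xFFFF:
        raise ValueError("encrypted metadata does not fit a 16-bit length")
    return META_LEN.pack(len(meta_encrypted)) + meta_encrypted + data_encrypted


def split_object(obj: bytes):
    """Return (encrypted metadata, encrypted data) without decrypting either."""
    (meta_len,) = META_LEN.unpack_from(obj)
    meta_end = META_LEN.size + meta_len
    if meta_end > len(obj):
        raise ValueError("object is shorter than its metadata length claims")
    return obj[META_LEN.size:meta_end], obj[meta_end:]
```

In this sketch a reader that only needs the metadata can stop after the first ``2 + meta_len`` bytes, which is the efficiency argument the new text gives for the repo v2 format.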
			
 
				 
			
 
				 Index, hints and integrity
			
 
				 ~~~~~~~~~~~~~~~~~~~~~~~~~~
			
@@ -855,7 +884,7 @@ For each borg invocation, a new sessionkey is derived from the borg key material
 
 and the 48bit IV starts from 0 again (both ciphers internally add a 32bit counter
 to our IV, so we'll just count up by 1 per chunk).
 
-The chunk layout is best seen at the bottom of this diagram:
+The encryption layout is best seen at the bottom of this diagram:
 
 .. figure:: encryption-aead.png
     :figwidth: 100%
@@ -954,14 +983,14 @@ representation of the repository id.
 
 Compression
 -----------
 
-Borg supports the following compression methods, each identified by a type
-byte:
+Borg supports the following compression methods, each identified by a ctype value
+in the range between 0 and 255 (and augmented by a clevel 0..255 value for the
+compression level):
 
 - none (no compression, pass through data 1:1), identified by 0x00
 - lz4 (low compression, but super fast), identified by 0x01
 - zstd (level 1-22 offering a wide range: level 1 is lower compression and high
-  speed, level 22 is higher compression and lower speed) - since borg 1.1.4,
-  identified by 0x03
+  speed, level 22 is higher compression and lower speed) - identified by 0x03
 - zlib (level 0-9, level 0 is no compression [but still adding zlib overhead],
   level 1 is low, level 9 is high compression), identified by 0x05
 - lzma (level 0-9, level 0 is low, level 9 is high compression), identified
--- a/docs/internals/encryption-aead.odg
+++ b/docs/internals/encryption-aead.odg
--- a/docs/internals/encryption-aead.png
+++ b/docs/internals/encryption-aead.png