10 years ago · 9e77251d8c
--- a/docs/global.rst.inc
+++ b/docs/global.rst.inc
@@ -7,12 +7,15 @@
 
				 .. _deduplication: https://en.wikipedia.org/wiki/Data_deduplication
			
 
				 .. _AES: https://en.wikipedia.org/wiki/Advanced_Encryption_Standard
			
 
				 .. _HMAC-SHA256: http://en.wikipedia.org/wiki/HMAC
			
 
				+.. _SHA256: https://en.wikipedia.org/wiki/SHA-256
			
 
				 .. _PBKDF2: https://en.wikipedia.org/wiki/PBKDF2
			
 
				 .. _ACL: https://en.wikipedia.org/wiki/Access_control_list
			
 
				 .. _libacl: http://savannah.nongnu.org/projects/acl/
			
 
				 .. _github: https://github.com/jborg/attic
			
 
				 .. _OpenSSL: https://www.openssl.org/
			
 
				 .. _Python: http://www.python.org/
			
 
				+.. _Buzhash: https://en.wikipedia.org/wiki/Buzhash
			
 
				+.. _msgpack: http://msgpack.org/
			
 
				 .. _`msgpack-python`: https://pypi.python.org/pypi/msgpack-python/
			
 
				 .. _llfuse: https://pypi.python.org/pypi/llfuse/
			
 
				 .. _homebrew: http://mxcl.github.io/homebrew/
			
@@ -24,3 +27,4 @@
 
				 .. _Arch Linux: https://aur.archlinux.org/packages/attic/
			
 
				 .. _Slackware: http://slackbuilds.org/result/?search=Attic
			
 
				 .. _Cython: http://cython.org/
			
 
				+.. _mailing list discussion about internals: http://librelist.com/browser/attic/2014/5/6/questions-and-suggestions-about-inner-working-of-attic
			
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -50,6 +50,7 @@ User's Guide
 
				    quickstart
			
 
				    usage
			
 
				    faq
			
 
				+   internals
			
 
				 
			
 
				 Getting help
			
 
				 ============
			
--- a/docs/internals.rst
+++ b/docs/internals.rst
@@ -0,0 +1,317 @@
 
				+.. include:: global.rst.inc
			
 
				+.. _internals:
			
 
				+
			
 
				+Internals
			
 
				+=========
			
 
				+
			
 
				+This page documents the internal data structures and storage
			
 
				+mechanisms of |project_name|. It is partly based on `mailing list
			
 
				+discussion about internals`_ and also on static code analysis. It may
			
 
				+not be exactly up to date with the current source code.
			
 
				+
			
 
				+|project_name| stores its data in a `Repository`. Each repository can
			
 
				+hold multiple `Archives`, which represent individual backups that
			
 
				+contain a full archive of the files specified when the backup was
			
 
				+performed. Deduplication is performed across multiple backups, both on
			
 
				+data and metadata, using `Segments` chunked with the Buzhash_
			
 
				+algorithm. Each repository has the following file structure:
			
 
				+
			
 
				+README
			
 
				+  simple text file describing the repository
			
 
				+
			
 
				+config
			
 
				+  description of the repository, including its unique identifier. Also
			
 
				+  acts as a lock file
			
 
				+
			
 
				+data/
			
 
				+  directory where the actual data (`segments`) is stored
			
 
				+
			
 
				+hints.%d
			
 
				+  undocumented
			
 
				+
			
 
				+index.%d
			
 
				+  cache of the file indexes. Those files can be regenerated with
			
 
				+  ``check --repair``
			
 
				+
			
 
				+Config file
			
 
				+-----------
			
 
				+
			
 
				+Each repository has a ``config`` file, which is an ``INI``
			
 
				+formatted file that looks like this::
			
 
				+
			
 
				+    [repository]
			
 
				+    version = 1
			
 
				+    segments_per_dir = 10000
			
 
				+    max_segment_size = 5242880
			
 
				+    id = 57d6c1d52ce76a836b532b0e42e677dec6af9fca3673db511279358828a21ed6
			
 
				+
			
 
				+This is where the ``repository.id`` is stored. It is a unique
			
 
				+identifier for repositories. It will not change if you move the
			
 
				+repository around so you can make a local transfer then decide to move
			
 
				+the repository to another (even remote) location at a later time.
			
 
				+
			
 
				+|project_name| will do a POSIX read lock on that file when operating
			
 
				+on the repository.
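As an illustration of the layout above, the ``config`` file can be read with Python's standard ``configparser`` module. This is only a minimal sketch: the helper name ``read_repository_config`` and the returned dictionary are assumptions made for this example, not the way |project_name| itself loads the file.

```python
import configparser

# Sample content copied from the example above.
SAMPLE_CONFIG = """\
[repository]
version = 1
segments_per_dir = 10000
max_segment_size = 5242880
id = 57d6c1d52ce76a836b532b0e42e677dec6af9fca3673db511279358828a21ed6
"""


def read_repository_config(text):
    """Parse the [repository] section (hypothetical helper for illustration)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["repository"]
    return {
        "version": section.getint("version"),
        "segments_per_dir": section.getint("segments_per_dir"),
        "max_segment_size": section.getint("max_segment_size"),
        # The id is a hexadecimal string; decode it to 32 raw bytes.
        "id": bytes.fromhex(section["id"]),
    }


config = read_repository_config(SAMPLE_CONFIG)
```

Decoding the ``id`` to bytes makes it easy to check that it is a 256-bit value.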
			
 
				+
			
 
				+Segments and archives
			
 
				+---------------------
			
 
				+
			
 
				+|project_name| is a "filesystem based transactional key value
			
 
				+store". It makes extensive use of msgpack_ to store data and, unless
			
 
				+otherwise noted, data is stored in msgpack_ encoded files.
			
 
				+
			
 
				+Objects referenced by a key (256-bit id/hash) are stored inline in
			
 
				+files (`segments`) of approximately 5MB in ``repo/data``. They contain:
			
 
				+
			
 
				+* header size
			
 
				+* crc
			
 
				+* size
			
 
				+* tag
			
 
				+* key
			
 
				+* data
			
 
				+
			
 
				+Segments are built locally, and then uploaded. Those files are
			
 
				+strictly append-only and modified only once.
			
 
				+
			
 
				+Tag is either ``PUT``, ``DELETE``, or ``COMMIT``. A segment file is
			
 
				+basically a transaction log where each repository operation is
			
 
				+appended to the file. So if an object is written to the repository a
			
 
				+``PUT`` tag is written to the file followed by the object id and
			
 
				+data. And if an object is deleted a ``DELETE`` tag is appended
			
 
				+followed by the object id. A ``COMMIT`` tag is written when a
			
 
				+repository transaction is committed.  When a repository is opened any
			
 
				+``PUT`` or ``DELETE`` operations not followed by a ``COMMIT`` tag are
			
 
				+discarded since they are part of a partial/uncommitted transaction.
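The commit rule described above can be sketched as a replay of the log. The example below is a simplified in-memory model under stated assumptions: entries are plain Python tuples rather than the real on-disk records (no header, crc or size fields), and the function name ``replay`` is made up for this illustration.

```python
def replay(log):
    """Return the committed key -> data mapping for a list of log entries.

    Each entry is a tuple: ("PUT", key, data), ("DELETE", key) or ("COMMIT",).
    Operations that are not followed by a COMMIT are discarded.
    """
    committed = {}
    pending = []  # operations seen since the last COMMIT
    for entry in log:
        tag = entry[0]
        if tag == "COMMIT":
            # Apply everything written since the previous COMMIT.
            for op in pending:
                if op[0] == "PUT":
                    committed[op[1]] = op[2]
                else:  # DELETE
                    committed.pop(op[1], None)
            pending = []
        elif tag in ("PUT", "DELETE"):
            pending.append(entry)
        else:
            raise ValueError("unknown tag: %r" % (tag,))
    # Anything left in `pending` belongs to an uncommitted transaction.
    return committed


log = [
    ("PUT", b"a", b"one"),
    ("PUT", b"b", b"two"),
    ("COMMIT",),
    ("DELETE", b"a"),        # never committed, so it is ignored
    ("PUT", b"c", b"three"),  # never committed, so it is ignored
]
state = replay(log)
```

In this example only the first two ``PUT`` operations survive, because the trailing ``DELETE`` and ``PUT`` are not followed by a ``COMMIT``.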
			
 
				+
			
 
				+The manifest is an object with an id of only zeros (32 bytes), that
			
 
				+references all the archives. It contains:
			
 
				+
			
 
				+* version
			
 
				+* list of archives
			
 
				+* timestamp
			
 
				+* config
			
 
				+
			
 
				+Each archive contains:
			
 
				+
			
 
				+* name
			
 
				+* id
			
 
				+* time
			
 
				+
			
 
				+The manifest is the last object stored, in the last segment, and is replaced
			
 
				+each time.
			
 
				+
			
 
				+The archive metadata does not contain the file items directly, only
			
 
				+references to other objects that contain that data. An archive is an
			
 
				+object that contains metadata:
			
 
				+
			
 
				+* version
			
 
				+* name
			
 
				+* items list
			
 
				+* cmdline
			
 
				+* hostname
			
 
				+* username
			
 
				+* time
			
 
				+
			
 
				+Each item represents a file, directory or
			
 
				+symlink and is stored as an ``item`` dictionary that contains:
			
 
				+
			
 
				+* path
			
 
				+* list of chunks
			
 
				+* user
			
 
				+* group
			
 
				+* uid
			
 
				+* gid
			
 
				+* mode (item type + permissions)
			
 
				+* source (for links)
			
 
				+* rdev (for devices)
			
 
				+* mtime
			
 
				+* xattrs
			
 
				+* acl
			
 
				+* bsdflags
			
 
				+
			
 
				+``ctime`` (change time) is not stored because there is no API to set
			
 
				+it and it is reset every time an inode's metadata is changed.
			
 
				+
			
 
				+All items are serialized using msgpack and the resulting byte stream
			
 
				+is fed into the same chunker used for regular file data and turned
			
 
				+into deduplicated chunks. The reference to these chunks is then added
			
 
				+to the archive metadata. This allows the archive to store many files,
			
 
				+beyond the ``MAX_OBJECT_SIZE`` barrier of 20MB.
			
 
				+
			
 
				+A chunk is an object as well, of course. The chunk id is either 
			
 
				+HMAC-SHA256_, when encryption is used, or a SHA256_ hash otherwise.
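The two ways of computing a chunk id can be sketched with the standard ``hashlib`` and ``hmac`` modules. The function name ``chunk_id`` and its ``id_key`` parameter are assumptions chosen for this example; only the choice between a keyed HMAC-SHA256 and a plain SHA256 comes from the text above.

```python
import hashlib
import hmac


def chunk_id(data, id_key=None):
    """Return a 32-byte chunk id (illustrative helper, not the project API).

    With a key (encrypted repository) the id is HMAC-SHA256(id_key, data);
    without one it is the plain SHA256 digest of the data.
    """
    if id_key is not None:
        return hmac.new(id_key, data, hashlib.sha256).digest()
    return hashlib.sha256(data).digest()


plain_id = chunk_id(b"some chunk data")
keyed_id = chunk_id(b"some chunk data", id_key=b"\x01" * 32)
```

Either way the id is 256 bits long, which matches the key size used by the segment records.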
			
 
				+
			
 
				+Hints are stored in a file (``repo/hints``) and contain:
			
 
				+
			
 
				+* version
			
 
				+* list of segments
			
 
				+* compact
			
 
				+
			
 
				+Chunks
			
 
				+------
			
 
				+
			
 
				+|project_name| uses a rolling checksum based on the Buzhash_ algorithm, with a
			
 
				+window size of 4095 bytes (``0xFFF``), a minimum chunk size of 1024 bytes, and cuts a chunk when
			
 
				+the last 16 bits of the checksum are zero, producing chunks of 64kB on
			
 
				+average. All these parameters are fixed. The buzhash table is altered
			
 
				+by XORing it with a seed randomly generated once for the archive, and
			
 
				+stored encrypted in the keyfile.
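The chunking rule can be sketched in pure Python as follows. This is a simplified model and not the project's chunker: the base table is generated from a fixed pseudo-random generator (an assumption, the real table is a constant in the source), the hash is restarted after every cut, and the names ``make_table`` and ``chunk_boundaries`` are made up for this example.

```python
import random

WINDOW = 4095    # rolling window size in bytes (0xFFF)
MIN_SIZE = 1024  # minimum chunk size in bytes
MASK_BITS = 16   # cut when the low 16 bits of the hash are zero


def rotl32(value, count):
    """Rotate a 32-bit integer left by `count` bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def make_table(seed):
    """Build a 256-entry table and XOR every entry with the seed."""
    rng = random.Random(0)  # assumption: stand-in for the fixed base table
    return [rng.getrandbits(32) ^ (seed & 0xFFFFFFFF) for _ in range(256)]


def chunk_boundaries(data, seed=0):
    """Return the end offsets of the chunks found in `data`."""
    table = make_table(seed)
    mask = (1 << MASK_BITS) - 1
    cuts = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Add the incoming byte to the rolling hash.
        h = rotl32(h, 1) ^ table[byte]
        # Once the window is full, remove the byte that just left it.
        if i - start >= WINDOW:
            h ^= rotl32(table[data[i - WINDOW]], WINDOW)
        size = i - start + 1
        if size >= MIN_SIZE and (h & mask) == 0:
            cuts.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        cuts.append(len(data))  # the last chunk ends with the data
    return cuts
```

Because a cut requires 16 specific bits to be zero, a boundary is expected roughly once every 2**16 bytes, which is where the 64kB average comes from. Changing the seed changes the table and therefore the cut points.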
			
 
				+
			
 
				+Indexes
			
 
				+-------
			
 
				+
			
 
				+There are two main indexes: the chunk lookup index and the repository
			
 
				+index. There is also the file chunk cache.
			
 
				+
			
 
				+The chunk lookup index is stored in ``cache/chunk`` and is indexed on
			
 
				+the ``chunk hash``. It contains:
			
 
				+
			
 
				+* reference count
			
 
				+* size
			
 
				+* ciphered size
			
 
				+
			
 
				+The repository index is stored in ``repo/index.%d`` and is also
			
 
				+indexed on ``chunk hash`` and contains:
			
 
				+
			
 
				+* segment
			
 
				+* offset
			
 
				+
			
 
				+The repository index files are random access but those files can be
			
 
				+recreated if damaged or lost using ``check --repair``.
			
 
				+
			
 
				+Both indexes are stored as hash tables, directly mapped in memory from
			
 
				+the file content, with only one slot per bucket, but that spreads the
			
 
				+collisions to the following buckets. As a consequence the hash is just
			
 
				+a start position for a linear search, and if the element is not in the
			
 
				+table the index is linearly crossed until an empty bucket is
			
 
				+found. When the table is 90% full its size is doubled; when it is
			
 
				+only 25% full its size is halved. So operations on it have a variable
			
 
				+complexity between constant and linear with low factor, and memory
			
 
				+overhead varies between 10% and 300%.
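The lookup strategy described above (one slot per bucket, linear search from the hashed position, growth at 90% load) can be sketched as an in-memory Python class. This is an illustration under stated assumptions: the real indexes are memory-mapped files, the class name ``LinearProbingIndex`` is made up, and shrinking at 25% load is left out because deleting from a linear-probing table needs extra bookkeeping.

```python
class LinearProbingIndex:
    """Open-addressing hash table with one slot per bucket (sketch only)."""

    MAX_LOAD = 0.90  # grow when the table would become more than 90% full

    def __init__(self, capacity=8):
        self._slots = [None] * capacity
        self._used = 0

    def _probe(self, key):
        """Return the index of the slot holding `key`, or of the first
        empty slot found by scanning linearly from the hashed position."""
        n = len(self._slots)
        i = hash(key) % n
        while self._slots[i] is not None and self._slots[i][0] != key:
            i = (i + 1) % n
        return i

    def _resize(self, capacity):
        """Rebuild the table with a new capacity and reinsert all entries."""
        old = [slot for slot in self._slots if slot is not None]
        self._slots = [None] * capacity
        self._used = 0
        for key, value in old:
            self[key] = value

    def __setitem__(self, key, value):
        if self._used + 1 > self.MAX_LOAD * len(self._slots):
            self._resize(len(self._slots) * 2)
        i = self._probe(key)
        if self._slots[i] is None:
            self._used += 1
        self._slots[i] = (key, value)

    def get(self, key, default=None):
        slot = self._slots[self._probe(key)]
        return default if slot is None else slot[1]

    def load(self):
        """Fraction of slots currently in use."""
        return self._used / len(self._slots)
```

A repository index entry could then be stored as ``index[chunk_hash] = (segment, offset)``. Keeping the load below 90% guarantees that every probe eventually reaches an empty slot.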
			
 
				+
			
 
				+The file chunk cache is stored in ``cache/files`` and is indexed on
			
 
				+the ``file path hash`` and contains:
			
 
				+
			
 
				+* age
			
 
				+* inode number
			
 
				+* size
			
 
				+* mtime_ns
			
 
				+* chunks hashes
			
 
				+
			
 
				+The inode number is stored to make sure we distinguish between
			
 
				+different files, as a single path may not be unique across different
			
 
				+archives in different setups.
			
 
				+
			
 
				+The file chunk cache is stored as a python associative array storing
			
 
				+python objects, which generates a lot of overhead. This takes around
			
 
				+240 bytes per file without the chunk list, to be compared to at most
			
 
				+64 bytes of real data (depending on data alignment), and around 80
			
 
				+bytes per chunk hash (vs 32), with a minimum of ~250 bytes even if
			
 
				+only one chunk hash is stored.
			
 
				+
			
 
				+Indexes memory usage
			
 
				+--------------------
			
 
				+
			
 
				+Here is the estimated memory usage of |project_name| when using those
			
 
				+indexes.
			
 
				+
			
 
				+Repository index
			
 
				+  40 bytes x N ~ 200MB (If a remote repository is
			
 
				+  used this will be allocated on the remote side)
			
 
				+
			
 
				+Chunk lookup index
			
 
				+  44 bytes x N ~ 220MB
			
 
				+
			
 
				+File chunk cache
			
 
				+  probably 80-100 bytes x N ~ 400MB
			
 
				+
			
 
				+In the above we assume 350GB of data that we divide on an average 64KB
			
 
				+chunk size, so N is around 5.3 million.
			
 
				+
			
 
				+Encryption
			
 
				+----------
			
 
				+
			
 
				+AES_ is used in CTR mode of operation (so no padding is needed). A 64
			
 
				+bit initialization vector is used, an `HMAC-SHA256`_ is computed
			
 
				+on the encrypted chunk with a random 64 bit nonce and both are stored
			
 
				+in the chunk. The header of each chunk is: ``TYPE(1)`` +
			
 
				+``HMAC(32)`` + ``NONCE(8)`` + ``CIPHERTEXT``. Encryption and HMAC use
			
 
				+two different keys.
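The chunk layout ``TYPE(1)`` + ``HMAC(32)`` + ``NONCE(8)`` + ``CIPHERTEXT`` can be sketched with the standard library. The AES step is left out on purpose: the ciphertext here is just an opaque byte string. The helper names ``pack_chunk`` and ``unpack_chunk`` are made up, and computing the HMAC over the nonce followed by the ciphertext is an assumption of this sketch.

```python
import hashlib
import hmac

TYPE_SIZE = 1
HMAC_SIZE = 32
NONCE_SIZE = 8


def _mac(enc_hmac_key, nonce, ciphertext):
    # Assumption: the HMAC covers the stored nonce and the ciphertext.
    return hmac.new(enc_hmac_key, nonce + ciphertext, hashlib.sha256).digest()


def pack_chunk(type_byte, nonce, ciphertext, enc_hmac_key):
    """Assemble TYPE(1) + HMAC(32) + NONCE(8) + CIPHERTEXT."""
    if len(type_byte) != TYPE_SIZE or len(nonce) != NONCE_SIZE:
        raise ValueError("bad type or nonce length")
    return type_byte + _mac(enc_hmac_key, nonce, ciphertext) + nonce + ciphertext


def unpack_chunk(blob, enc_hmac_key):
    """Split a chunk into its parts and verify the HMAC before returning."""
    type_byte = blob[:TYPE_SIZE]
    mac = blob[TYPE_SIZE:TYPE_SIZE + HMAC_SIZE]
    nonce_end = TYPE_SIZE + HMAC_SIZE + NONCE_SIZE
    nonce = blob[TYPE_SIZE + HMAC_SIZE:nonce_end]
    ciphertext = blob[nonce_end:]
    # compare_digest avoids leaking where the first mismatch occurs.
    if not hmac.compare_digest(mac, _mac(enc_hmac_key, nonce, ciphertext)):
        raise ValueError("HMAC mismatch: chunk is corrupt or was tampered with")
    return type_byte, nonce, ciphertext
```

Verifying the HMAC before touching the ciphertext means a corrupted or tampered chunk is rejected without ever being decrypted.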
			
 
				+
			
 
				+In AES CTR mode you can think of the IV as the start value for the
			
 
				+counter. The counter itself is incremented by one after each 16 byte
			
 
				+block. The IV/counter is not required to be random but it must NEVER be
			
 
				+reused. So to accomplish this |project_name| initializes the encryption counter
			
 
				+to be higher than any previously used counter value before encrypting
			
 
				+new data.
			
 
				+
			
 
				+To reduce payload size, only 8 bytes of the 16-byte nonce are saved in
			
 
				+the payload; the first 8 bytes are always zeroes. This does not affect
			
 
				+security but limits the maximum repository capacity to only 295
			
 
				+exabytes (2**64 * 16 bytes).
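The capacity figure and the counter bookkeeping can be checked with a short calculation. The helper ``next_counter`` is hypothetical and only illustrates the idea that each 16-byte block consumes one counter value, so the next chunk must start above every value already used.

```python
BLOCK_SIZE = 16     # AES block size in bytes
COUNTER_BITS = 64   # only 8 bytes of the counter are stored

max_bytes = 2**COUNTER_BITS * BLOCK_SIZE
max_exabytes = max_bytes / 10**18


def next_counter(counter, n_bytes):
    """Return the first unused counter value after encrypting `n_bytes`
    starting at `counter` (hypothetical helper for illustration)."""
    blocks = (n_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE  # round up to whole blocks
    return counter + blocks


counter = 0
for chunk_size in (65536, 1000, 16):
    counter = next_counter(counter, chunk_size)
```

After these three chunks the counter has advanced by 4096 + 63 + 1 blocks, and no counter value has been used twice.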
			
 
				+
			
 
				+Encryption keys are either a passphrase, passed through the
			
 
				+``ATTIC_PASSPHRASE`` environment variable or prompted on the commandline, or
			
 
				+stored in automatically generated key files.
			
 
				+
			
 
				+Key files
			
 
				+---------
			
 
				+
			
 
				+When initialized with the ``init -e keyfile`` command, |project_name|
			
 
				+needs an associated file in ``$HOME/.attic/keys`` to read and write
			
 
				+the repository. The format is based on msgpack_, base64 encoding and
			
 
				+PBKDF2_ SHA256 hashing, which is then encoded again in a msgpack_.
			
 
				+
			
 
				+The internal data structure is as follows:
			
 
				+
			
 
				+version
			
 
				+  currently always an integer, 1
			
 
				+
			
 
				+repository_id
			
 
				+  the ``id`` field in the ``config`` ``INI`` file of the repository.
			
 
				+
			
 
				+enc_key
			
 
				+  the key used to encrypt data with AES (256 bits)
			
 
				+  
			
 
				+enc_hmac_key
			
 
				+  the key used to HMAC the resulting AES-encrypted data (256 bits)
			
 
				+
			
 
				+id_key
			
 
				+  the key used to HMAC the above chunks, the resulting hash is
			
 
				+  stored out of band (256 bits)
			
 
				+
			
 
				+chunk_seed
			
 
				+  the seed for the buzhash chunking table (signed 32 bit integer)
			
 
				+
			
 
				+Those fields are processed using msgpack_. The utf-8 encoded passphrase
			
 
				+is processed with PBKDF2_ and SHA256_ using 100000 iterations and a
			
 
				+random 256 bits salt to give us a derived key. The derived key is 256
			
 
				+bits long.  A `HMAC-SHA256`_ checksum of the above fields is generated
			
 
				+with the derived key, then the derived key is also used to encrypt the
			
 
				+above pack of fields. Then the result is stored in another msgpack_
			
 
				+formatted as follows:
			
 
				+
			
 
				+version
			
 
				+  currently always an integer, 1
			
 
				+
			
 
				+salt
			
 
				+  random 256 bits salt used to process the passphrase
			
 
				+
			
 
				+iterations
			
 
				+  number of iterations used to process the passphrase (currently 100000)
			
 
				+
			
 
				+algorithm
			
 
				+  the hashing algorithm used to process the passphrase and do the HMAC
			
 
				+  checksum (currently the string ``sha256``)
			
 
				+
			
 
				+hash
			
 
				+  the HMAC of the packed fields above, computed with the derived key
			
 
				+
			
 
				+data
			
 
				+  the packed fields above, encrypted with AES using the PBKDF2_ SHA256 key
			
 
				+  described above
			
 
				+
			
 
				+The resulting msgpack_ is then encoded using base64 and written to the
			
 
				+key file, wrapped using the standard ``textwrap`` module with a
			
 
				+header. The header is a single line with the string ``ATTIC_KEY``, a
			
 
				+space and a hexadecimal representation of the repository id.
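The passphrase derivation and the outer text format of a key file can be sketched with the standard library alone. This is not the project's code: the msgpack packing and the AES encryption are omitted, ``payload`` stands in for the final msgpack bytes, and the function names are assumptions of this example.

```python
import base64
import hashlib
import textwrap


def derive_key(passphrase, salt, iterations=100000):
    """Derive a 256-bit key from a passphrase with PBKDF2 and SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, 32
    )


def format_key_file(repository_id, payload):
    """Return the key file text: a header line followed by wrapped base64.

    `payload` is a placeholder for the msgpack bytes described above.
    """
    header = "ATTIC_KEY " + repository_id.hex()
    encoded = base64.b64encode(payload).decode("ascii")
    body = "\n".join(textwrap.wrap(encoded))
    return header + "\n" + body + "\n"


def parse_key_file(text):
    """Split a key file back into (repository_id, payload)."""
    header, _, body = text.partition("\n")
    magic, hex_id = header.split(" ")
    if magic != "ATTIC_KEY":
        raise ValueError("not a key file")
    # Remove the line breaks added by textwrap before decoding.
    return bytes.fromhex(hex_id), base64.b64decode("".join(body.split()))
```

The salt would normally come from a secure random source such as ``os.urandom(32)`` and be stored in the ``salt`` field, so that the same derived key can be recomputed when the key file is opened.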