@@ -56,37 +56,81 @@ on the repository.
Repository structure
--------------------

-|project_name| is a "filesystem based transactional key value store".
-
-Objects referenced by a key (256bits id/hash) are stored in line in
-files (segments) of size approx 5MB in ``repo/data``. They contain :
-header size, crc, size, tag, key, data. Tag is either ``PUT``,
-``DELETE``, or ``COMMIT``. Segments are built locally, and then
-uploaded. Those files are strictly append-only and modified only once.
-
-A segment file is basically a transaction log where each repository
-operation is appended to the file. So if an object is written to the
-repository a ``PUT`` tag is written to the file followed by the object
-id and data. And if an object is deleted a ``DELETE`` tag is appended
+|project_name| is a "filesystem based transactional key value
+store". It makes extensive use of msgpack_ to store data and, unless
+otherwise noted, data is stored in msgpack_ encoded files.
+
+Objects referenced by a key (256-bit id/hash) are stored inline in
+files (`segments`) of about 5MB in ``repo/data``. Each entry contains:
+
+* header size
+* crc
+* size
+* tag
+* key
+* data
+
+Segments are built locally, and then uploaded. Those files are
+strictly append-only and modified only once.
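A minimal Python sketch of how one such entry could be laid out and checked. The byte layout is an assumption made for illustration, not taken from the source: a little-endian CRC32 (4 bytes), a size field (4 bytes) counting the whole entry, a one-byte tag, then the 32-byte key and the data, with the CRC computed over every byte after the CRC field. The "header size" from the list above is treated here as the fixed 9-byte length of that header, and the numeric tag values are hypothetical.

```python
import struct
import zlib

# Hypothetical tag values; the text only names the tags.
TAG_PUT, TAG_DELETE, TAG_COMMIT = 0, 1, 2

# Assumed header layout: crc32 (uint32), size (uint32), tag (uint8), little-endian.
HEADER = struct.Struct("<IIB")
KEY_SIZE = 32  # 256-bit object id


def pack_entry(tag: int, key: bytes = b"", data: bytes = b"") -> bytes:
    """Serialize one log entry; `size` covers header + key + data."""
    size = HEADER.size + len(key) + len(data)
    # The CRC protects everything that follows the crc field itself.
    body = struct.pack("<IB", size, tag) + key + data
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack("<I", crc) + body


def unpack_entry(buf: bytes, offset: int = 0):
    """Parse the entry at `offset`; return (tag, key, data, next_offset)."""
    crc, size, tag = HEADER.unpack_from(buf, offset)
    end = offset + size
    if size < HEADER.size or end > len(buf):
        raise ValueError("truncated or malformed entry")
    if zlib.crc32(buf[offset + 4:end]) & 0xFFFFFFFF != crc:
        raise ValueError("crc mismatch")
    payload = buf[offset + HEADER.size:end]
    # PUT and DELETE carry a key; COMMIT carries neither key nor data.
    key_len = KEY_SIZE if tag in (TAG_PUT, TAG_DELETE) else 0
    return tag, payload[:key_len], payload[key_len:], end
```

Reading a whole segment is then a loop that calls `unpack_entry` with the returned offset until the end of the buffer; a CRC mismatch flags a damaged or partially written entry.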
+
+Tag is either ``PUT``, ``DELETE``, or ``COMMIT``. A segment file is
+basically a transaction log where each repository operation is
+appended to the file. So if an object is written to the repository, a
+``PUT`` tag is written to the file followed by the object id and
+data. If an object is deleted, a ``DELETE`` tag is appended
followed by the object id. A ``COMMIT`` tag is written when a
repository transaction is committed. When a repository is opened any
``PUT`` or ``DELETE`` operations not followed by a ``COMMIT`` tag are
discarded since they are part of a partial/uncommitted transaction.
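The commit rule can be modelled as a replay loop that buffers operations until it sees a ``COMMIT`` and drops whatever is still buffered when the log ends. This is a hedged, in-memory sketch over hypothetical `(tag, key, location)` records, where location stands for a (segment, offset) pair; it is not the repository's actual recovery code.

```python
from typing import Dict, Iterable, List, Optional, Tuple

PUT, DELETE, COMMIT = "PUT", "DELETE", "COMMIT"

# Hypothetical log record: (tag, key, location), location = (segment, offset).
Record = Tuple[str, Optional[bytes], Optional[Tuple[int, int]]]


def replay(log: Iterable[Record]) -> Dict[bytes, Tuple[int, int]]:
    """Rebuild key -> location, applying only operations followed by a COMMIT."""
    index: Dict[bytes, Tuple[int, int]] = {}
    pending: List[Record] = []
    for tag, key, location in log:
        if tag == COMMIT:
            # The buffered transaction is complete: apply it as a whole.
            for op, k, loc in pending:
                if op == PUT:
                    index[k] = loc
                else:
                    index.pop(k, None)
            pending.clear()
        else:
            pending.append((tag, key, location))
    # Anything left in `pending` belongs to an uncommitted transaction
    # and is simply discarded.
    return index
```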

The manifest is an object with an id of only zeros (32 bytes), that
-references all the archives. It contains : version, list of archives,
-timestamp, config. Each archive contains: name, id, time. It is the last
-object stored, in the last segment, and is replaced each time.
+references all the archives. It contains:
+
+* version
+* list of archives
+* timestamp
+* config
+
+Each archive contains:
+
+* name
+* id
+* time
+
+It is the last object stored, in the last segment, and is replaced
+each time.
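To make the manifest's shape concrete, here is a sketch that keeps it as a plain Python dictionary inside a dict-backed object store. The field names mirror the lists above; the version number, the timestamp format and the helper `add_archive` are hypothetical, and msgpack encoding is left out. Writing under the same all-zero id every time shows why the manifest is replaced rather than accumulated.

```python
import time
from typing import Dict

MANIFEST_ID = bytes(32)  # an id of only zeros, 32 bytes long


def add_archive(store: Dict[bytes, dict], name: str, archive_id: bytes) -> dict:
    """Append an archive entry and store a fresh manifest under MANIFEST_ID."""
    # Hypothetical defaults for a repository without a manifest yet.
    old = store.get(MANIFEST_ID, {"version": 1, "archives": [], "config": {}})
    manifest = {
        "version": old["version"],
        "archives": old["archives"] + [
            {"name": name, "id": archive_id, "time": time.time()}
        ],
        "timestamp": time.time(),  # format is an assumption
        "config": old["config"],
    }
    store[MANIFEST_ID] = manifest  # overwrites the previous manifest
    return manifest
```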

The archive metadata does not contain the file items directly. Only
references to other objects that contain that data. An archive is an
-object that contain metadata : version, name, items list, cmdline,
-hostname, username, time. Each item represents a file or directory or
-symlink is stored as a ``item`` dictionnary that contains: path, list
-of chunks, user, group, uid, gid, mode (item type + permissions),
-source (for links), rdev (for devices), mtime, xattrs, acl,
-bsdfiles. ``ctime`` (change time) is not stored because there is no
-API to set it and it is reset every time an inode's metadata is changed.
+object that contains metadata:
+
+* version
+* name
+* items list
+* cmdline
+* hostname
+* username
+* time
+
+Each item represents a file, directory or symlink and is stored as
+an ``item`` dictionary that contains:
+
+* path
+* list of chunks
+* user
+* group
+* uid
+* gid
+* mode (item type + permissions)
+* source (for links)
+* rdev (for devices)
+* mtime
+* xattrs
+* acl
+* bsdfiles
+
+``ctime`` (change time) is not stored because there is no API to set
+it and it is reset every time an inode's metadata is changed.
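The item fields map closely onto what ``lstat()`` returns. The hypothetical `make_item` helper below (Unix-only, since it uses the `pwd` and `grp` modules) collects path, mode, ownership, mtime, symlink source and device number, and deliberately skips ``ctime``. Chunks, xattrs, ACLs and BSD flags are left out for brevity, and the dictionary keys are assumptions rather than the real key names.

```python
import grp
import os
import pwd
import stat


def make_item(path: str) -> dict:
    """Collect item metadata for `path` without following symlinks."""
    st = os.lstat(path)
    item = {
        "path": path,
        "mode": st.st_mode,  # file type bits + permission bits
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mtime": st.st_mtime_ns,  # nanoseconds (assumed unit)
    }
    try:
        item["user"] = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        item["user"] = None
    try:
        item["group"] = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid without a group entry
        item["group"] = None
    if stat.S_ISLNK(st.st_mode):
        item["source"] = os.readlink(path)
    elif stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode):
        item["rdev"] = st.st_rdev
    # ctime is intentionally not recorded: there is no API to restore it.
    return item
```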

All items are serialized using msgpack and the resulting byte stream
is fed into the same chunker used for regular file data and turned
@@ -97,8 +141,11 @@ beyond the ``MAX_OBJECT_SIZE`` barrier of 20MB.
A chunk is an object as well, of course, and its id is the hash of its
(unencrypted and uncompressed) content.
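Because the id depends only on the plaintext, identical chunks get identical ids and end up stored once. This sketch assumes plain SHA-256 from `hashlib` as the hash (the text does not say which function is used, and an encrypted repository may use a keyed hash instead). The `ChunkStore` class is hypothetical, and compression stands in for the full compress-and-encrypt step, which happens after the id is computed.

```python
import hashlib
import zlib
from typing import Dict


class ChunkStore:
    """Toy content-addressed store: id = SHA-256 of the raw chunk."""

    def __init__(self) -> None:
        self.objects: Dict[bytes, bytes] = {}

    def add(self, chunk: bytes) -> bytes:
        # The id is computed on the unencrypted, uncompressed content...
        chunk_id = hashlib.sha256(chunk).digest()
        if chunk_id not in self.objects:
            # ...while the stored bytes are transformed afterwards
            # (compression only here; encryption is left out).
            self.objects[chunk_id] = zlib.compress(chunk)
        return chunk_id

    def get(self, chunk_id: bytes) -> bytes:
        return zlib.decompress(self.objects[chunk_id])
```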

-Hints are stored in a file (``repo/hints``) and contain: version, list of
-segments, compact.
+Hints are stored in a file (``repo/hints``) and contain:
+
+* version
+* list of segments
+* compact

Chunks
------
@@ -113,31 +160,55 @@ stored encrypted in the keyfile.
Indexes
-------

-The chunk lookup index (chunk hash -> reference count, size, ciphered
-size ; in file cache/chunk) and the repository index (chunk hash ->
-segment, offset ; in file ``repo/index.%d``) are stored in a sort of hash
-table, directly mapped in memory from the file content, with only one
-slot per bucket, but that spreads the collisions to the following
-buckets. As a consequence the hash is just a start position for a linear
-search, and if the element is not in the table the index is linearly
-crossed until an empty bucket is found. When the table is full at 90%
-its size is doubled, when it's empty at 25% its size is halfed. So
-operations on it have a variable complexity between constant and linear
-with low factor, and memory overhead varies between 10% and 300%.
-
-The file chunk cache (file path hash -> age, inode number, size,
-mtime_ns, chunks hashes ; in file cache/files) is stored as a python
-associative array storing python objects, which generate a lot of
-overhead. This takes around 240 bytes per file without the chunk
-list, to be compared to at most 64 bytes of real data (depending on data
-alignment), and around 80 bytes per chunk hash (vs 32), with a minimum
-of ~250 bytes even if only one chunck hash. The inode number is stored
-to make sure we distinguish between different files, as a single path
-may not be unique accross different archives in different setups.
-
-The ``index.%d`` files are random access but those files can be
+There are two main indexes: the chunk lookup index and the repository
+index. There is also the file chunk cache.
+
+The chunk lookup index is stored in ``cache/chunk`` and is indexed on
+the ``chunk hash``. It contains:
+
+* reference count
+* size
+* ciphered size
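A minimal sketch of the chunk lookup index as a Python mapping from chunk hash to (reference count, size, ciphered size). The update rules are assumptions, since the text only lists the stored fields: a reference is added each time an archive uses a chunk, removed when that use goes away, and the entry is dropped when the count reaches zero. The helper names `incref` and `decref` are hypothetical.

```python
from typing import Dict, NamedTuple


class ChunkEntry(NamedTuple):
    refcount: int
    size: int   # plaintext size
    csize: int  # ciphered (stored) size


def incref(index: Dict[bytes, ChunkEntry], chunk_id: bytes,
           size: int, csize: int) -> bool:
    """Add one reference; return True if the chunk was not known before."""
    entry = index.get(chunk_id)
    if entry is None:
        index[chunk_id] = ChunkEntry(1, size, csize)
        return True
    index[chunk_id] = entry._replace(refcount=entry.refcount + 1)
    return False


def decref(index: Dict[bytes, ChunkEntry], chunk_id: bytes) -> bool:
    """Drop one reference; return True if the chunk is no longer used."""
    entry = index[chunk_id]
    if entry.refcount == 1:
        del index[chunk_id]
        return True
    index[chunk_id] = entry._replace(refcount=entry.refcount - 1)
    return False
```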
+
+The repository index is stored in ``repo/index.%d`` and is also
+indexed on the ``chunk hash`` and contains:
+
+* segment
+* offset
+
+The repository index files are accessed randomly, but they can be
recreated if damaged or lost using ``check --repair``.

+Both indexes are stored as hash tables, directly mapped in memory from
+the file content, with only one slot per bucket; collisions spill over
+into the following buckets (linear probing). As a consequence the hash
+is just a start position for a linear search, and if the element is
+not in the table, the table is scanned linearly until an empty bucket
+is found. When the table is 90% full its size is doubled, and when it
+is only 25% full its size is halved. Operations therefore take between
+constant and linear time (with a low constant factor), and memory
+overhead varies between 10% and 300%.
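The probing scheme described above is open addressing with linear probing. Below is a minimal in-memory Python sketch of it, not the actual memory-mapped implementation. Assumptions of this sketch: keys are fixed-size hashes, so their first 8 bytes serve as the bucket position; deleted entries leave a tombstone (the text does not say how deletion is handled) that is only cleared when the table is resized; and the table never shrinks below 8 buckets. The `memory_overhead` helper reproduces the 10% to 300% range: at 90% load the unused space is 1/0.9 - 1, about 11% of the stored data, and at 25% load it is 1/0.25 - 1 = 300%.

```python
from typing import Any, List, Optional, Tuple

_EMPTY = None
_DELETED = object()  # tombstone so probe chains survive deletions


class LinearProbingTable:
    """One slot per bucket; collisions move on to the next bucket."""

    MIN_BUCKETS = 8
    GROW_AT = 0.90    # double the table above 90% load
    SHRINK_AT = 0.25  # halve the table below 25% load

    def __init__(self, buckets: int = MIN_BUCKETS) -> None:
        self.buckets: List[Any] = [_EMPTY] * buckets
        self.used = 0  # live entries; tombstones are not counted

    def _start(self, key: bytes) -> int:
        # Keys are already hashes, so their leading bytes pick the bucket.
        return int.from_bytes(key[:8], "little") % len(self.buckets)

    def _find(self, key: bytes) -> Tuple[int, Optional[int]]:
        """Return (bucket holding key or -1, first reusable bucket seen)."""
        i = self._start(key)
        free: Optional[int] = None
        for _ in range(len(self.buckets)):
            slot = self.buckets[i]
            if slot is _EMPTY:
                # An empty bucket ends the linear search: key is absent.
                return -1, (i if free is None else free)
            if slot is _DELETED:
                if free is None:
                    free = i
            elif slot[0] == key:
                return i, free
            i = (i + 1) % len(self.buckets)
        return -1, free

    def get(self, key: bytes) -> Any:
        i, _ = self._find(key)
        if i < 0:
            raise KeyError(key)
        return self.buckets[i][1]

    def set(self, key: bytes, value: Any) -> None:
        i, free = self._find(key)
        if i >= 0:
            self.buckets[i] = (key, value)
            return
        # The grow rule always leaves a non-live bucket, so free is set.
        assert free is not None
        self.buckets[free] = (key, value)
        self.used += 1
        if self.used > self.GROW_AT * len(self.buckets):
            self._resize(len(self.buckets) * 2)

    def delete(self, key: bytes) -> None:
        i, _ = self._find(key)
        if i < 0:
            raise KeyError(key)
        self.buckets[i] = _DELETED
        self.used -= 1
        if (len(self.buckets) > self.MIN_BUCKETS
                and self.used < self.SHRINK_AT * len(self.buckets)):
            self._resize(len(self.buckets) // 2)

    def _resize(self, n: int) -> None:
        # Rehashing into a fresh table also discards tombstones.
        live = [s for s in self.buckets if s is not _EMPTY and s is not _DELETED]
        self.buckets = [_EMPTY] * max(n, self.MIN_BUCKETS)
        self.used = 0
        for k, v in live:
            self.set(k, v)


def memory_overhead(load: float) -> float:
    """Unused space relative to stored entries at a given load factor."""
    return 1 / load - 1
```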
+
+The file chunk cache is stored in ``cache/files`` and is indexed on
+the ``file path hash`` and contains:
+
+* age
+* inode number
+* size
+* mtime_ns
+* chunk hashes
+
+The inode number is stored to make sure we distinguish between
+different files, as a single path may not be unique across different
+archives in different setups.
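One plausible way the file chunk cache lets a backup skip unchanged files, written as a hedged sketch. Assumptions not stated in the text: the key is a SHA-256 of the encoded path, an entry is reused only when inode, size and mtime_ns all match the current ``stat()`` result, and age is reset to zero whenever an entry is used. The function names are hypothetical.

```python
import hashlib
import os
from typing import Dict, List, NamedTuple, Optional


class FileEntry(NamedTuple):
    age: int
    inode: int
    size: int
    mtime_ns: int
    chunk_ids: List[bytes]


def path_key(path: str) -> bytes:
    return hashlib.sha256(os.fsencode(path)).digest()


def cached_chunks(cache: Dict[bytes, FileEntry], path: str,
                  st: os.stat_result) -> Optional[List[bytes]]:
    """Return the cached chunk ids if the file looks unchanged, else None."""
    key = path_key(path)
    entry = cache.get(key)
    if entry is None:
        return None
    # Comparing the inode guards against a different file reusing the path.
    current = (st.st_ino, st.st_size, st.st_mtime_ns)
    if (entry.inode, entry.size, entry.mtime_ns) != current:
        return None
    cache[key] = entry._replace(age=0)  # seen again: reset age (assumption)
    return entry.chunk_ids


def remember(cache: Dict[bytes, FileEntry], path: str,
             st: os.stat_result, chunk_ids: List[bytes]) -> None:
    cache[path_key(path)] = FileEntry(0, st.st_ino, st.st_size,
                                      st.st_mtime_ns, chunk_ids)
```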
+
+The file chunk cache is stored as a Python associative array storing
+Python objects, which generates a lot of overhead. This takes around
+240 bytes per file without the chunk list, to be compared to at most
+64 bytes of real data (depending on data alignment), and around 80
+bytes per chunk hash (vs 32), with a minimum of ~250 bytes even if
+there is only one chunk hash.
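The gap between packed data and Python objects can be checked with the `struct` and `sys` modules. The packed layout below is an assumption chosen to match the 64-byte figure (a 32-byte path hash, three 8-byte integers and a one-byte age padded to 8 bytes). The Python-side number from `sys.getsizeof` is shallow, ignores dictionary slots and varies between CPython versions, so it only illustrates the order of magnitude.

```python
import struct
import sys

# Assumed compact layout: path hash (32 bytes), inode, size, mtime_ns
# (8 bytes each) and a one-byte age padded to an 8-byte boundary.
PACKED = struct.Struct("<32sQQqB7x")


def pack_file_entry(path_hash: bytes, inode: int, size: int,
                    mtime_ns: int, age: int) -> bytes:
    return PACKED.pack(path_hash, inode, size, mtime_ns, age)


def python_footprint(path_hash: bytes, inode: int, size: int,
                     mtime_ns: int, age: int) -> int:
    """Shallow size of the same data held as separate Python objects.

    Dict slot overhead is ignored, so the real cost is higher still.
    """
    entry = (age, inode, size, mtime_ns, [])  # empty chunk list
    return sum(sys.getsizeof(obj) for obj in (path_hash, entry, *entry))
```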
+
Indexes memory usage
--------------------

@@ -158,9 +229,9 @@ Encryption
----------

AES_ is used with CTR mode of operation (so no need for padding). A 64
-bits initialization vector is used, a SHA256_ based HMAC_ is computed
+bits initialization vector is used, an `HMAC-SHA256`_ is computed
on the encrypted chunk with a random 64 bits nonce and both are stored
-in the chunk. The header of each chunk is : ``TYPE(1)` +
+in the chunk. The header of each chunk is: ``TYPE(1)`` +
``HMAC(32)`` + ``NONCE(8)`` + ``CIPHERTEXT``. Encryption and HMAC use
two different keys.
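The chunk header can be split and authenticated with the standard library. This sketch assumes the HMAC covers everything after the HMAC field (nonce plus ciphertext), which the text suggests but does not spell out. AES-CTR itself is left out because Python's standard library has no AES, so the ciphertext is treated as opaque bytes. The type byte value and the helper names are hypothetical.

```python
import hashlib
import hmac

HMAC_LEN, NONCE_LEN = 32, 8


def seal(type_byte: int, hmac_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Build TYPE(1) + HMAC(32) + NONCE(8) + CIPHERTEXT."""
    body = nonce + ciphertext
    # hmac_key is separate from the (not shown) encryption key.
    mac = hmac.new(hmac_key, body, hashlib.sha256).digest()
    return bytes([type_byte]) + mac + body


def open_chunk(blob: bytes, hmac_key: bytes):
    """Split a chunk and verify its HMAC before any decryption happens."""
    if len(blob) < 1 + HMAC_LEN + NONCE_LEN:
        raise ValueError("chunk too short")
    type_byte = blob[0]
    mac = blob[1:1 + HMAC_LEN]
    body = blob[1 + HMAC_LEN:]
    expected = hmac.new(hmac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):  # constant-time comparison
        raise ValueError("HMAC mismatch")
    return type_byte, body[:NONCE_LEN], body[NONCE_LEN:]
```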

@@ -185,10 +256,8 @@ Key files

When initialized with the ``init -e keyfile`` command, |project_name|
needs an associated file in ``$HOME/.attic/keys`` to read and write
-the repository. As with most crypto code in |project_name|, the format
-of those files is defined in `attic/key.py`_. The format is based on
-msgpack_, base64 encoding and PBKDF2_ SHA256 encryption, which is
-then encoded again in a msgpack_.
+the repository. The format is based on msgpack_, base64 encoding and
+PBKDF2_ SHA256 encryption, which is then encoded again in a msgpack_.
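The PBKDF2 step can be tried with `hashlib.pbkdf2_hmac`. Following the parameters given later in this section (SHA-256, 100000 iterations, a random 256-bit salt, a 256-bit derived key), the sketch derives a key from the UTF-8 passphrase and computes an HMAC-SHA256 over an already-serialized payload. Encryption of the payload and the msgpack framing are omitted, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100000
SALT_LEN = 32  # 256-bit salt
KEY_LEN = 32   # 256-bit derived key


def derive_key(passphrase: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt,
                               iterations, dklen=KEY_LEN)


def protect(passphrase: str, payload: bytes, salt: Optional[bytes] = None,
            iterations: int = ITERATIONS) -> Tuple[bytes, bytes]:
    """Return (salt, hmac) for `payload`; encryption is not shown."""
    salt = os.urandom(SALT_LEN) if salt is None else salt
    key = derive_key(passphrase, salt, iterations)
    return salt, hmac.new(key, payload, hashlib.sha256).digest()


def check(passphrase: str, payload: bytes, salt: bytes, mac: bytes,
          iterations: int = ITERATIONS) -> bool:
    """Re-derive the key and compare HMACs in constant time."""
    key = derive_key(passphrase, salt, iterations)
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(mac, expected)
```

A wrong passphrase yields a different derived key and therefore a failed HMAC check, which is how a bad passphrase can be detected before any decryption is attempted.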

The internal data structure is as follows:

@@ -212,9 +281,9 @@ chunk_seed
the seed for the buzhash chunking table (signed 32 bit integer)

-Those fields are encoded using msgpack_. The utf-8-encoded phassphrase
-is encrypted with a PBKDF2_ and SHA256_ using 100000 iterations and a
+Those fields are encoded using msgpack_. The utf-8-encoded passphrase
+is fed to PBKDF2_ with SHA256_ using 100000 iterations and a
random 256 bits salt to give us a derived key. The derived key is 256
-bits long. A HMAC_ SHA256_ checksum of the above fields is generated
+bits long. An `HMAC-SHA256`_ checksum of the above fields is generated
with the derived key, then the derived key is also used to encrypt the
-above pack of fields. Then the result is stored in a another msgpack_
+above pack of fields. Then the result is stored in another msgpack_
formatted as follows: