.. include:: global.rst.inc
.. _internals:

Internals
=========

This page documents the internal data structures and storage
mechanisms of |project_name|. It is partly based on `mailing list
discussion about internals`_ and also on static code analysis. It may
not be exactly up to date with the current source code.

|project_name| stores its data in a `Repository`. Each repository can
hold multiple `Archives`, which represent individual backups that
contain a full archive of the files specified when the backup was
performed. Deduplication is performed across multiple backups, both on
data and metadata, using `Segments` chunked with the Buzhash_
algorithm. Each repository has the following file structure:

README
  simple text file describing the repository

config
  description of the repository, includes the unique identifier. also
  acts as a lock file

data/
  directory where the actual data (`segments`) is stored

hints.%d
  undocumented

index.%d
  cache of the file indexes. those files can be regenerated with
  ``check --repair``

Repository config file
----------------------

Each repository has a ``config`` file, which is an ``INI``-formatted
file that looks like this::

    [repository]
    version = 1
    segments_per_dir = 10000
    max_segment_size = 5242880
    id = 57d6c1d52ce76a836b532b0e42e677dec6af9fca3673db511279358828a21ed6

This is where the ``repository.id`` is stored. It is a unique
identifier for repositories. It will not change if you move the
repository around, so you can make a local transfer and then decide to
move the repository to another (even remote) location at a later time.

|project_name| will do a POSIX read lock on that file when operating
on the repository.
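
As an illustration of the format only, the sample ``config`` shown above
can be read with Python's standard ``configparser`` module. The helper
name ``read_repository_config`` is hypothetical and is not taken from the
|project_name| source code.

```python
import configparser

# The sample config from this page, reproduced as a string.
SAMPLE_CONFIG = """\
[repository]
version = 1
segments_per_dir = 10000
max_segment_size = 5242880
id = 57d6c1d52ce76a836b532b0e42e677dec6af9fca3673db511279358828a21ed6
"""


def read_repository_config(text):
    """Parse the INI text and return the [repository] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["repository"]
    return {
        "version": section.getint("version"),
        "segments_per_dir": section.getint("segments_per_dir"),
        "max_segment_size": section.getint("max_segment_size"),
        # The id is written as hexadecimal text; decode it to raw bytes.
        "id": bytes.fromhex(section["id"]),
    }


config = read_repository_config(SAMPLE_CONFIG)
print(config["max_segment_size"] // (1024 * 1024), "MiB per segment")
print(len(config["id"]) * 8, "bit repository id")
```

With the sample values, ``max_segment_size`` works out to 5 MiB and the
``id`` decodes to 32 bytes (256 bits).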

Repository structure
--------------------

|project_name| is a "filesystem based transactional key value
store". It makes extensive use of msgpack_ to store data and, unless
otherwise noted, data is stored in msgpack_ encoded files.

Objects referenced by a key (a 256-bit id/hash) are stored inline in
files (`segments`) of approximately 5MB in ``repo/data``. They contain:

* header size
* crc
* size
* tag
* key
* data

Segments are built locally, and then uploaded. Those files are
strictly append-only and modified only once.

Tag is either ``PUT``, ``DELETE``, or ``COMMIT``. A segment file is
basically a transaction log where each repository operation is
appended to the file. So if an object is written to the repository, a
``PUT`` tag is written to the file followed by the object id and
data. If an object is deleted, a ``DELETE`` tag is appended
followed by the object id. A ``COMMIT`` tag is written when a
repository transaction is committed. When a repository is opened, any
``PUT`` or ``DELETE`` operations not followed by a ``COMMIT`` tag are
discarded since they are part of a partial/uncommitted transaction.
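
The replay rule described above can be sketched as follows. This is a
minimal in-memory model, not the on-disk segment format: log entries are
assumed to be plain ``(tag, key, data)`` tuples, and the function name
``replay`` is hypothetical.

```python
PUT, DELETE, COMMIT = "PUT", "DELETE", "COMMIT"


def replay(entries):
    """Replay (tag, key, data) entries and return the committed objects.

    Changes are collected in `pending` and only become visible when a
    COMMIT entry is reached, so a trailing uncommitted transaction is
    dropped.
    """
    committed = {}
    pending = {}
    for tag, key, data in entries:
        if tag == PUT:
            pending[key] = data
        elif tag == DELETE:
            pending.pop(key, None)
        elif tag == COMMIT:
            committed = dict(pending)
        else:
            raise ValueError("unknown tag: %r" % (tag,))
    return committed


log = [
    (PUT, "a", b"1"),
    (PUT, "b", b"2"),
    (COMMIT, None, None),
    (DELETE, "a", None),  # never committed, so "a" survives
    (PUT, "c", b"3"),     # never committed, so "c" is discarded
]
state = replay(log)
print(sorted(state))
```

Only the objects written before the last ``COMMIT`` remain after replay.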

The manifest is an object with an id of only zeros (32 bytes) that
references all the archives. It contains:

* version
* list of archives
* timestamp
* config

Each archive entry in that list contains:

* name
* id
* time

The manifest is the last object stored, in the last segment, and is
replaced each time.

The archive metadata does not contain the file items directly, only
references to other objects that contain that data. An archive is an
object that contains metadata:

* version
* name
* items list
* cmdline
* hostname
* username
* time

Each item represents a file, directory, or symlink and is stored as an
``item`` dictionary that contains:

* path
* list of chunks
* user
* group
* uid
* gid
* mode (item type + permissions)
* source (for links)
* rdev (for devices)
* mtime
* xattrs
* acl
* bsdfiles

``ctime`` (change time) is not stored because there is no API to set
it and it is reset every time an inode's metadata is changed.

All items are serialized using msgpack and the resulting byte stream
is fed into the same chunker used for regular file data and turned
into deduplicated chunks. The references to these chunks are then added
to the archive metadata. This allows the archive to store many files,
beyond the ``MAX_OBJECT_SIZE`` barrier of 20MB.

A chunk is an object as well, of course, and its id is the hash of its
(unencrypted and uncompressed) content.

Hints are stored in a file (``repo/hints``) and contain:

* version
* list of segments
* compact

Chunks
------

|project_name| uses a rolling checksum based on the Buzhash_ algorithm,
with a window size of 4095 bytes and a minimum chunk size of 1024
bytes. A chunk boundary is triggered when the last 16 bits of the
checksum are zero, producing chunks of 64kB on average. All these
parameters are fixed. The buzhash table is altered by XORing it with a
seed randomly generated once for the archive, and stored encrypted in
the keyfile.
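
To make the chunking rule concrete, here is a simplified pure-Python
sketch of a buzhash-style content-defined chunker using the parameters
quoted above. It is not the |project_name| chunker: the base table is
generated from a fixed pseudo-random stream (an assumption made for this
example), the names ``make_table`` and ``chunkify`` are hypothetical, and
the handling of the first window is a simplification.

```python
import random

WINDOW_SIZE = 4095  # bytes covered by the rolling hash
MIN_SIZE = 1024     # minimum chunk size in bytes
MASK = 0xFFFF       # cut when the low 16 bits of the hash are zero


def rotl32(value, count):
    """Rotate a 32-bit value left by `count` bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def make_table(seed):
    """Build a 256-entry table and XOR every entry with the seed.

    The base table is a stand-in generated for this example only.
    """
    rng = random.Random(0)
    return [rng.getrandbits(32) ^ (seed & 0xFFFFFFFF) for _ in range(256)]


def chunkify(data, seed=0):
    """Split `data` into chunks at content-defined boundaries."""
    table = make_table(seed)
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Roll the new byte in ...
        h = rotl32(h, 1) ^ table[byte]
        # ... and the byte that just left the window out.
        if i >= WINDOW_SIZE:
            h ^= rotl32(table[data[i - WINDOW_SIZE]], WINDOW_SIZE)
        window_full = i + 1 >= WINDOW_SIZE
        long_enough = i + 1 - start >= MIN_SIZE
        if window_full and long_enough and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


rng = random.Random(1)
sample = bytes(rng.getrandbits(8) for _ in range(150000))
print([len(chunk) for chunk in chunkify(sample)])
```

Because a boundary depends only on the bytes inside the window, inserting
data near the start of a file shifts later boundaries with the content
instead of invalidating every following chunk. A 16-bit mask fires on
average once every 65536 positions, which is where the 64kB average
comes from.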

Indexes
-------

There are two main indexes: the chunk lookup index and the repository
index. There is also the file chunk cache.

The chunk lookup index is stored in ``cache/chunk`` and is indexed on
the ``chunk hash``. It contains:

* reference count
* size
* ciphered size

The repository index is stored in ``repo/index.%d`` and is also
indexed on ``chunk hash`` and contains:

* segment
* offset

The repository index files are random access, but those files can be
recreated if damaged or lost using ``check --repair``.

Both indexes are stored as hash tables, directly mapped in memory from
the file content, with only one slot per bucket, so collisions spill
over into the following buckets. As a consequence the hash is just
a start position for a linear search, and if the element is not in the
table the index is scanned linearly until an empty bucket is
found. When the table is 90% full its size is doubled; when it is only
25% full its size is halved. So operations on it have a variable
complexity between constant and linear with a low factor, and memory
overhead varies between 10% and 300%.
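
The behaviour described above (one slot per bucket, linear search from
the hash position, growth at 90% and shrinking at 25%) can be modelled
with a small open-addressing table. This is a sketch in plain Python
lists, not the memory-mapped structure used by |project_name|; the class
name ``LinearProbingIndex`` and its methods are hypothetical.

```python
class LinearProbingIndex:
    """Open-addressing hash table with one slot per bucket."""

    GROW_AT = 0.90
    SHRINK_AT = 0.25
    MIN_CAPACITY = 8

    def __init__(self):
        self.slots = [None] * self.MIN_CAPACITY
        self.used = 0

    def _find(self, key):
        """Return the slot holding `key`, or the empty slot ending the search."""
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def _store(self, key, value):
        i = self._find(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)

    def _resize(self, capacity):
        entries = [slot for slot in self.slots if slot is not None]
        self.slots = [None] * capacity
        self.used = 0
        for key, value in entries:
            self._store(key, value)

    def put(self, key, value):
        self._store(key, value)
        if self.used > self.GROW_AT * len(self.slots):
            self._resize(2 * len(self.slots))

    def get(self, key, default=None):
        slot = self.slots[self._find(key)]
        return default if slot is None else slot[1]

    def delete(self, key):
        i = self._find(key)
        if self.slots[i] is None:
            raise KeyError(key)
        self.slots[i] = None
        self.used -= 1
        # Re-insert the rest of the cluster so later lookups still succeed.
        j = (i + 1) % len(self.slots)
        while self.slots[j] is not None:
            moved_key, moved_value = self.slots[j]
            self.slots[j] = None
            self.used -= 1
            self._store(moved_key, moved_value)
            j = (j + 1) % len(self.slots)
        if (len(self.slots) > self.MIN_CAPACITY
                and self.used < self.SHRINK_AT * len(self.slots)):
            self._resize(len(self.slots) // 2)


index = LinearProbingIndex()
for n in range(100):
    index.put(n.to_bytes(4, "big"), n)
print(index.used, "entries in", len(index.slots), "slots")
```

Deleting an entry re-inserts the entries that follow it in the same
cluster. This is one simple way to keep linear searches correct without
tombstones, chosen here for brevity.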

The file chunk cache is stored in ``cache/files`` and is indexed on
the ``file path hash`` and contains:

* age
* inode number
* size
* mtime_ns
* chunks hashes

The inode number is stored to make sure we distinguish between
different files, as a single path may not be unique across different
archives in different setups.

The file chunk cache is stored as a Python associative array storing
Python objects, which generates a lot of overhead. This takes around
240 bytes per file without the chunk list, to be compared to at most
64 bytes of real data (depending on data alignment), and around 80
bytes per chunk hash (vs 32), with a minimum of ~250 bytes even if
there is only one chunk hash.

Indexes memory usage
--------------------

Here is the estimated memory usage of |project_name| when using those
indexes, where N is the number of entries in the index. The totals
shown work out to N of about 5 million:

Repository index
  40 bytes x N ~ 200MB (If a remote repository is
  used this will be allocated on the remote side)

Chunk lookup index
  44 bytes x N ~ 220MB

File chunk cache
  probably 80-100 bytes x N ~ 400MB
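
The arithmetic behind these estimates is a per-entry size multiplied by
an entry count. The short sketch below reproduces the totals for five
million entries, which is the value of N implied by the figures above
(an inference, not a number stated in the original text). The function
name ``estimate_memory`` is hypothetical.

```python
# Per-entry sizes in bytes, taken from the list above.
BYTES_PER_ENTRY = {
    "repository index": 40,
    "chunk lookup index": 44,
    "file chunk cache": 80,  # lower bound of the 80-100 byte estimate
}


def estimate_memory(entries):
    """Return the estimated size in bytes of each index for `entries` items."""
    return {name: size * entries for name, size in BYTES_PER_ENTRY.items()}


for name, total in estimate_memory(5000000).items():
    print("%s: %d MB" % (name, total // 10**6))
```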

Encryption
----------

AES_ is used in CTR mode (so there is no need for padding). A 64-bit
initialization vector is used, an `HMAC-SHA256`_ is computed
on the encrypted chunk with a random 64-bit nonce, and both are stored
in the chunk. The header of each chunk is: ``TYPE(1)`` +
``HMAC(32)`` + ``NONCE(8)`` + ``CIPHERTEXT``. Encryption and HMAC use
two different keys.
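
The header layout can be illustrated with the standard ``struct`` and
``hmac`` modules. This sketch treats the ciphertext as opaque bytes
(AES is not part of the Python standard library) and assumes the HMAC
covers the stored nonce followed by the ciphertext; the text above does
not spell out the exact HMAC input. The function names are
hypothetical.

```python
import hashlib
import hmac
import struct

# TYPE(1) + HMAC(32) + NONCE(8); the ciphertext follows the header.
HEADER = struct.Struct(">B32s8s")


def pack_chunk(chunk_type, hmac_key, nonce, ciphertext):
    """Prefix `ciphertext` with the type byte, HMAC and 8-byte nonce."""
    if len(nonce) != 8:
        raise ValueError("nonce must be 8 bytes")
    # Assumption: the MAC is computed over nonce + ciphertext.
    mac = hmac.new(hmac_key, nonce + ciphertext, hashlib.sha256).digest()
    return HEADER.pack(chunk_type, mac, nonce) + ciphertext


def unpack_chunk(hmac_key, blob):
    """Split a chunk into its parts, rejecting it if the HMAC is wrong."""
    chunk_type, mac, nonce = HEADER.unpack_from(blob)
    ciphertext = blob[HEADER.size:]
    expected = hmac.new(hmac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("HMAC mismatch")
    return chunk_type, nonce, ciphertext


demo_key = b"k" * 32
demo_nonce = (1).to_bytes(8, "big")
blob = pack_chunk(0, demo_key, demo_nonce, b"opaque ciphertext")
print(HEADER.size, "header bytes,", len(blob), "bytes in total")
```

Verifying the HMAC before decrypting means corrupted or tampered chunks
are rejected without touching the encryption key.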

In AES CTR mode you can think of the IV as the start value for the
counter. The counter itself is incremented by one after each 16-byte
block. The IV/counter is not required to be random but it must NEVER be
reused. To accomplish this, Attic initializes the encryption counter
to be higher than any previously used counter value before encrypting
new data.

To reduce payload size, only 8 bytes of the 16-byte nonce are saved in
the payload; the first 8 bytes are always zeros. This does not affect
security but limits the maximum repository capacity to only 295
exabytes (2**64 * 16 bytes).
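
The counter arithmetic above can be checked directly. The helper names
below are hypothetical; they only illustrate how the stored 8 bytes
relate to the full 16-byte counter block and where the 295 exabyte
figure comes from.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def full_counter(stored_nonce):
    """Expand the 8 stored bytes into the 16-byte counter block."""
    if len(stored_nonce) != 8:
        raise ValueError("stored nonce must be 8 bytes")
    return b"\x00" * 8 + stored_nonce


def next_counter(counter, plaintext_length):
    """Return the first counter value that is safe to use afterwards."""
    blocks = (plaintext_length + BLOCK_SIZE - 1) // BLOCK_SIZE
    return counter + blocks


# Each of the 2**64 counter values covers one 16-byte block.
MAX_BYTES = 2**64 * BLOCK_SIZE
print(MAX_BYTES // 10**18, "exabytes")
```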

Encryption keys are either a passphrase, passed through the
``ATTIC_PASSPHRASE`` environment variable or prompted for on the
command line, or stored in automatically generated key files.

Key files
---------

When initialized with the ``init -e keyfile`` command, |project_name|
needs an associated file in ``$HOME/.attic/keys`` to read and write
the repository. The format is based on msgpack_, base64 encoding and
PBKDF2_ SHA256 hashing; the result is then encoded again with msgpack_.

The internal data structure is as follows:

version
  currently always an integer, 1

repository_id
  the ``id`` field in the ``config`` ``INI`` file of the repository.

enc_key
  the key used to encrypt data with AES (256 bits)

enc_hmac_key
  the key used to HMAC the resulting AES-encrypted data (256 bits)

id_key
  the key used to HMAC the above chunks, the resulting hash is
  stored out of band (256 bits)

chunk_seed
  the seed for the buzhash chunking table (signed 32 bit integer)

Those fields are encoded using msgpack_. The UTF-8 encoded passphrase
is processed with PBKDF2_ and SHA256_ using 100000 iterations and a
random 256-bit salt to give us a derived key. The derived key is 256
bits long. An `HMAC-SHA256`_ checksum of the above fields is generated
with the derived key, then the derived key is also used to encrypt the
above pack of fields. The result is then stored in another msgpack_
structure formatted as follows:

version
  currently always an integer, 1

salt
  random 256-bit salt used to derive the key from the passphrase

iterations
  number of PBKDF2_ iterations applied to the passphrase (currently
  100000)

algorithm
  the hashing algorithm used for the key derivation and the HMAC
  checksum (currently the string ``sha256``)

hash
  the HMAC checksum of the pack of fields described above, computed
  with the derived key

data
  the pack of fields described above, encrypted with AES using the
  key derived through PBKDF2_ SHA256
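
The key derivation and checksum steps can be sketched with the standard
library alone. ``hashlib.pbkdf2_hmac`` and ``hmac.new`` are real standard
library calls; the helper name ``derive_key`` and the placeholder
``packed_fields`` are assumptions made for this example. The AES
encryption step is left out because the standard library has no AES
implementation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100000


def derive_key(passphrase, salt, iterations=ITERATIONS):
    """Derive a 256-bit key from a passphrase with PBKDF2 and SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(32)  # random 256-bit salt
derived = derive_key("example passphrase", salt)

# Stand-in for the msgpack-encoded key fields (hypothetical content).
packed_fields = b"packed key fields"
checksum = hmac.new(derived, packed_fields, hashlib.sha256).digest()

print(len(derived) * 8, "bit derived key,", len(checksum), "byte checksum")
```

The salt and iteration count have to be stored next to the encrypted
data, as in the structure above, because the same derived key must be
recomputed from the passphrase when the key file is opened.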

The resulting msgpack_ is then encoded using base64 and written to the
key file, wrapped using the textwrap_ module with a header. The header
is a single line with the string ``ATTIC_KEY``, a space and a
hexadecimal representation of the repository id.
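
The file layout described above can be sketched as follows. The binary
payload is treated as opaque bytes and the function names
``format_key_file`` and ``parse_key_file`` are hypothetical. The wrap
width is the ``textwrap`` default, which is an assumption and may differ
from what |project_name| uses.

```python
import base64
import binascii
import textwrap

FILE_ID = "ATTIC_KEY"


def format_key_file(repository_id, payload):
    """Return the key file text: a header line, then wrapped base64."""
    header = "%s %s" % (FILE_ID, binascii.hexlify(repository_id).decode("ascii"))
    body = textwrap.fill(base64.b64encode(payload).decode("ascii"))
    return header + "\n" + body + "\n"


def parse_key_file(text):
    """Return (repository_id, payload) from key file text."""
    header, _, body = text.partition("\n")
    file_id, hex_id = header.split(" ")
    if file_id != FILE_ID:
        raise ValueError("not a key file")
    # Line breaks added by textwrap are removed before decoding.
    payload = base64.b64decode("".join(body.split()))
    return bytes.fromhex(hex_id), payload


example_id = bytes(range(32))
example_payload = bytes(range(256))
key_text = format_key_file(example_id, example_payload)
print(key_text.splitlines()[0])
```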