.. somewhat surprisingly the "bash" highlighter gives nice results with
   the pseudo-code notation used in the "Encryption" section.

.. highlight:: bash

========
Security
========

.. _borgcrypto:

Cryptography in Borg
====================

Attack model
------------

The attack model of Borg is that the environment of the client process
(e.g. ``borg create``) is trusted and the repository (server) is not. The
attacker has any and all access to the repository, including interactive
manipulation (man-in-the-middle) for remote repositories.

Furthermore the client environment is assumed to be persistent across
attacks (practically this means that the security database cannot be
deleted between attacks).

Under these circumstances Borg guarantees that the attacker cannot

1. modify the data of any archive without the client detecting the change
2. rename, remove or add an archive without the client detecting the change
3. recover plain-text data
4. recover definite (heuristics based on access patterns are possible)
   structural information such as the object graph (which archives
   refer to what chunks)

The attacker can always impose a denial of service per definition (he could
forbid connections to the repository, or delete it entirely).

When the above attack model is extended to include multiple clients
independently updating the same repository, then Borg fails to provide
confidentiality (i.e. guarantees 3) and 4) do not apply any more).

.. _security_structural_auth:

Structural Authentication
-------------------------

Borg is fundamentally based on an object graph structure (see :ref:`internals`),
where the root object is called the manifest.

Borg follows the `Horton principle`_, which states that
not only the message must be authenticated, but also its meaning (often
expressed through context), because every object used is referenced by a
parent object through its object ID up to the manifest. The object ID in
Borg is a MAC of the object's plaintext, therefore this ensures that
an attacker cannot change the context of an object without forging the MAC.

In other words, the object ID itself only authenticates the plaintext of the
object and not its context or meaning. The latter is established by a different
object referring to an object ID, thereby assigning a particular meaning to
an object. For example, an archive item contains a list of object IDs that
represent packed file metadata. On their own, it is not clear what these
objects represent; the archive item assigns this meaning by referring to them
in a particular part of its own data structure.

This results in a directed acyclic graph of authentication from the manifest
to the data chunks of individual files.

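
The idea that an object ID authenticates the plaintext, while the referring
parent supplies the meaning, can be illustrated with a minimal sketch. It
assumes HMAC-SHA-256 as the MAC; the names ``object_id``, ``fetch_verified``
and the ``store`` dictionary are hypothetical and are not Borg's actual API or
object format.

.. code-block:: python

    import hashlib
    import hmac

    def object_id(id_key: bytes, plaintext: bytes) -> bytes:
        """Object ID = MAC of the object's plaintext (HMAC-SHA-256 assumed)."""
        return hmac.new(id_key, plaintext, hashlib.sha256).digest()

    def fetch_verified(store: dict, id_key: bytes, wanted_id: bytes) -> bytes:
        """Fetch an object and check it against the ID its parent refers to."""
        plaintext = store[wanted_id]
        if not hmac.compare_digest(object_id(id_key, plaintext), wanted_id):
            raise ValueError("object does not match the ID its parent refers to")
        return plaintext

    id_key = bytes(32)  # placeholder key, for this example only
    chunk = b"file contents"
    chunk_id = object_id(id_key, chunk)
    # The parent (think: an archive item) gives the chunk its meaning by
    # listing the chunk's ID in a specific place of its own data.
    parent = b"chunks:" + chunk_id
    parent_id = object_id(id_key, parent)
    store = {chunk_id: chunk, parent_id: parent}

Following references downwards from a trusted parent and checking every
fetched object in this way is what makes the graph authenticated as a whole.
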
.. _tam_description:

.. rubric:: Authenticating the manifest

Since the manifest has a fixed ID (000...000) the aforementioned authentication
does not apply to it, indeed, cannot apply to it; it is impossible to authenticate
the root node of a DAG through its edges, since the root node has no incoming edges.

With the scheme as described so far an attacker could easily replace the manifest,
therefore Borg includes a tertiary authentication mechanism (TAM) that is applied
to the manifest since version 1.0.9 (see :ref:`tam_vuln`).

TAM works by deriving a separate key through HKDF_ from the other encryption and
authentication keys and calculating the HMAC of the metadata to authenticate [#]_::

    # RANDOM(n) returns n random bytes
    salt = RANDOM(64)

    ikm = id_key || enc_key || enc_hmac_key

    # *context* depends on the operation, for manifest authentication it is
    # the ASCII string "borg-metadata-authentication-manifest".
    tam_key = HKDF-SHA-512(ikm, salt, context)

    # *data* is a dict-like structure
    data[hmac] = zeroes
    packed = pack(data)
    data[hmac] = HMAC(tam_key, packed)
    packed_authenticated = pack(data)

Since an attacker cannot gain access to this key and also cannot make the
client authenticate arbitrary data using this mechanism, the attacker is unable
to forge the authentication.

This effectively 'anchors' the manifest to the key, which is controlled by the
client, thereby anchoring the entire DAG, making it impossible for an attacker
to add, remove or modify any part of the DAG without Borg being able to detect
the tampering.

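
The TAM pseudo-code can be turned into a small runnable sketch using only the
Python standard library. Several details are assumptions made for illustration
and are not taken from Borg: ``pack`` is a JSON stand-in for the real
serialization, the salt is stored inside ``data`` so that a verifier can
re-derive the key, the placeholder tag is 64 zero bytes, and HMAC-SHA-512 is
used for the tag. The HKDF function follows RFC 5869.

.. code-block:: python

    import hashlib
    import hmac
    import json
    import os

    CONTEXT = b"borg-metadata-authentication-manifest"
    ZEROES = bytes(64).hex()  # placeholder tag; its length is an assumption

    def hkdf_sha512(ikm: bytes, salt: bytes, info: bytes, length: int = 64) -> bytes:
        """HKDF (RFC 5869) with SHA-512: extract, then expand."""
        prk = hmac.new(salt, ikm, hashlib.sha512).digest()
        okm, block, counter = b"", b"", 1
        while len(okm) < length:
            block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
            okm += block
            counter += 1
        return okm[:length]

    def pack(data: dict) -> bytes:
        """Stand-in serializer (deterministic JSON), not Borg's real format."""
        return json.dumps(data, sort_keys=True).encode()

    def tam_key_for(keys, salt: bytes) -> bytes:
        id_key, enc_key, enc_hmac_key = keys
        return hkdf_sha512(id_key + enc_key + enc_hmac_key, salt, CONTEXT)

    def tam_authenticate(data: dict, keys) -> bytes:
        salt = os.urandom(64)
        data = dict(data, salt=salt.hex(), hmac=ZEROES)
        tag = hmac.new(tam_key_for(keys, salt), pack(data), hashlib.sha512).digest()
        data["hmac"] = tag.hex()
        return pack(data)

    def tam_verify(packed_authenticated: bytes, keys) -> bool:
        data = json.loads(packed_authenticated)
        tag = bytes.fromhex(data["hmac"])
        data["hmac"] = ZEROES  # restore the state that was authenticated
        key = tam_key_for(keys, bytes.fromhex(data["salt"]))
        expected = hmac.new(key, pack(data), hashlib.sha512).digest()
        return hmac.compare_digest(tag, expected)

Verification zeroes the tag field again before recomputing the HMAC, which
mirrors the ``data[hmac] = zeroes`` step of the pseudo-code.
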
Note that when using BORG_PASSPHRASE the attacker cannot swap the *entire*
repository against a new repository with e.g. repokey mode and no passphrase,
because Borg will abort access when BORG_PASSPHRASE is incorrect.

However, interactively a user might not notice this kind of attack
immediately, if she assumes that the reason for the absent passphrase
prompt is a set BORG_PASSPHRASE. See issue :issue:`2169` for details.

.. [#] The reason why the authentication tag is stored in the packed
       data itself is that older Borg versions can still read the
       manifest this way, while a changed layout would have broken
       compatibility.

Encryption
----------

Encryption is currently based on the Encrypt-then-MAC construction,
which is generally seen as the most robust way to create an authenticated
encryption scheme from encryption and message authentication primitives.

Every operation (encryption, MAC / authentication, chunk ID derivation)
uses independent, random keys generated by `os.urandom`_ [#]_.

Borg does not support unauthenticated encryption -- only authenticated encryption
schemes are supported. No unauthenticated encryption schemes will be added
in the future.

Depending on the chosen mode (see :ref:`borg_init`) different primitives are used:

- The actual encryption is currently always AES-256 in CTR mode. The
  counter is added in plaintext, since it is needed for decryption,
  and is also tracked locally on the client to avoid counter reuse.

- The authentication primitive is either HMAC-SHA-256 or BLAKE2b-256
  in a keyed mode. HMAC-SHA-256 uses 256 bit keys, while BLAKE2b-256
  uses 512 bit keys.

  The latter is secure not only because BLAKE2b itself is not
  susceptible to `length extension`_, but also since it truncates the
  hash output from 512 bits to 256 bits, which would make the
  construction safe even if BLAKE2b were broken regarding length
  extension or similar attacks.

- The primitive used for authentication is always the same primitive
  that is used for deriving the chunk ID, but they are always
  used with independent keys.

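
The two authentication primitives can be sketched with Python's ``hashlib``
and ``hmac`` modules. This is an illustration under assumptions, not Borg's
code: the BLAKE2b variant below uses the keyed mode built into
``hashlib.blake2b``, and Borg's own keying of BLAKE2b may differ. The function
names are hypothetical.

.. code-block:: python

    import hashlib
    import hmac
    import os

    def authenticator_hmac_sha256(key: bytes, data: bytes) -> bytes:
        """HMAC-SHA-256 with a 256 bit key, 256 bit output."""
        assert len(key) == 32
        return hmac.new(key, data, hashlib.sha256).digest()

    def authenticator_blake2b_256(key: bytes, data: bytes) -> bytes:
        """Keyed BLAKE2b with a 512 bit key, output truncated to 256 bits."""
        assert len(key) == 64
        return hashlib.blake2b(data, key=key, digest_size=32).digest()

    # The same primitive serves both purposes, but with independent keys:
    id_key, enc_hmac_key = os.urandom(32), os.urandom(32)
    data = b"some chunk"
    chunk_id = authenticator_hmac_sha256(id_key, data)
    mac = authenticator_hmac_sha256(enc_hmac_key, data)

Because the keys are independent, the chunk ID of a piece of data reveals
nothing about the MAC that would be computed over the same bytes.
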
Encryption::

    id = AUTHENTICATOR(id_key, data)
    compressed = compress(data)

    iv = reserve_iv()
    encrypted = AES-256-CTR(enc_key, 8-null-bytes || iv, compressed)
    authenticated = type-byte || AUTHENTICATOR(enc_hmac_key, encrypted) || iv || encrypted

Decryption::

    # Given: input *authenticated* data, possibly a *chunk-id* to assert
    type-byte, mac, iv, encrypted = SPLIT(authenticated)

    ASSERT(type-byte is correct)
    ASSERT( CONSTANT-TIME-COMPARISON( mac, AUTHENTICATOR(enc_hmac_key, encrypted) ) )

    decrypted = AES-256-CTR(enc_key, 8-null-bytes || iv, encrypted)
    decompressed = decompress(decrypted)

    ASSERT( CONSTANT-TIME-COMPARISON( chunk-id, AUTHENTICATOR(id_key, decompressed) ) )

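
The framing and the "verify before decrypting" order of the pseudo-code can be
sketched as follows. AES-256-CTR is not part of the Python standard library,
so the ciphertext is treated as opaque bytes here. Assumptions: HMAC-SHA-256
is the authenticator, the IV is 8 bytes (it is combined with 8 null bytes to
fill a 16 byte AES block), and ``TYPE_BYTE`` is a hypothetical value. The
names ``seal`` and ``open_verified`` are not Borg functions.

.. code-block:: python

    import hashlib
    import hmac

    TYPE_BYTE = b"\x02"  # hypothetical type byte, for illustration only
    MAC_LEN, IV_LEN = 32, 8

    def seal(enc_hmac_key: bytes, iv: bytes, encrypted: bytes) -> bytes:
        """authenticated = type-byte || MAC(encrypted) || iv || encrypted"""
        assert len(iv) == IV_LEN
        mac = hmac.new(enc_hmac_key, encrypted, hashlib.sha256).digest()
        return TYPE_BYTE + mac + iv + encrypted

    def open_verified(enc_hmac_key: bytes, authenticated: bytes):
        """Split the blob and check type byte and MAC before any decryption."""
        type_byte = authenticated[:1]
        mac = authenticated[1:1 + MAC_LEN]
        iv = authenticated[1 + MAC_LEN:1 + MAC_LEN + IV_LEN]
        encrypted = authenticated[1 + MAC_LEN + IV_LEN:]
        if type_byte != TYPE_BYTE:
            raise ValueError("unexpected type byte")
        expected = hmac.new(enc_hmac_key, encrypted, hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            raise ValueError("MAC mismatch, refusing to decrypt")
        return iv, encrypted  # only now would AES-256-CTR be applied

Checking the MAC first means that manipulated ciphertext is rejected before
the decryption and decompression code ever processes it.
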
The client needs to track which counter values have been used, since
encrypting a chunk requires a starting counter value and no two chunks
may have overlapping counter ranges (otherwise the bitwise XOR of the
overlapping plaintexts is revealed).

The client does not directly track the counter value, because it
changes often (with each encrypted chunk), instead it commits a
"reservation" to the security database and the repository by taking
the current counter value and adding 4 GiB / 16 bytes (the block size)
to the counter. Thus the client only needs to commit a new reservation
every few gigabytes of encrypted data.

This mechanism also avoids reusing counter values in case the client
crashes or the connection to the repository is severed, since any
reservation would have been committed to both the security database
and the repository before any data is encrypted. Borg uses its
standard mechanism (SaveFile) to ensure that reservations are durable
(on most hardware / storage systems), therefore a crash of the
client's host would not impact tracking of reservations.

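
The reservation arithmetic can be made concrete with a short sketch. One
reservation spans 4 GiB / 16 bytes = 2\ :sup:`28` counter values. The class
and its ``commit`` callback are hypothetical stand-ins for writing to the
security database and the repository; they are not Borg's implementation.

.. code-block:: python

    BLOCK_SIZE = 16                                    # AES block size in bytes
    RESERVATION_BLOCKS = (4 * 1024**3) // BLOCK_SIZE   # 2**28 counter values

    class CounterReservation:
        def __init__(self, committed_end=0, commit=None):
            # After a restart, continue at the end of the last committed
            # reservation: unused counter values are skipped, never reused.
            self.next_counter = committed_end
            self.reserved_end = committed_end
            self.commit = commit or (lambda end: None)

        def reserve_iv(self, nbytes: int) -> int:
            """Return the starting counter for encrypting nbytes of data."""
            blocks = (nbytes + BLOCK_SIZE - 1) // BLOCK_SIZE
            if self.next_counter + blocks > self.reserved_end:
                self.reserved_end = self.next_counter + max(blocks, RESERVATION_BLOCKS)
                self.commit(self.reserved_end)  # must be durable before encrypting
            iv = self.next_counter
            self.next_counter += blocks
            return iv

In this sketch only the first call triggers a commit; later chunks consume
counter values from the existing reservation until it is exhausted.
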
However, this design is not infallible, and requires synchronization
between clients, which is handled through the repository. Therefore in
a multiple-client scenario a repository can trick a client into
reusing counter values by ignoring counter reservations and replaying
the manifest (which will fail if the client has seen a more recent
manifest or has a more recent nonce reservation). If the repository is
untrusted, but a trusted synchronization channel exists between
clients, the security database could be synchronized between them over
said trusted channel. This is not part of Borg's functionality.

.. [#] Using the :ref:`borg key migrate-to-repokey <borg_key_migrate-to-repokey>`
       command a user can convert repositories created using Attic in "passphrase"
       mode to "repokey" mode. In this case the keys were directly derived from
       the user's passphrase at some point using PBKDF2.

       Borg does not support "passphrase" mode otherwise any more.

.. _key_encryption:

Offline key security
--------------------

Borg cannot secure the key material while it is running, because the keys
are needed in plain to decrypt/encrypt repository objects.

For offline storage of the encryption keys they are encrypted with a
user-chosen passphrase.

A 256 bit key encryption key (KEK) is derived from the passphrase
using PBKDF2-HMAC-SHA256 with a random 256 bit salt which is then used
to Encrypt-*and*-MAC (unlike the Encrypt-*then*-MAC approach used
otherwise) a packed representation of the keys with AES-256-CTR with a
constant initialization vector of 0. An HMAC-SHA256 of the plaintext is
generated using the same KEK and is stored alongside the ciphertext,
which is converted to base64 in its entirety.

This base64 blob (commonly referred to as *keyblob*) is then stored in
the key file or in the repository config (keyfile and repokey modes
respectively).

This scheme, and specifically the use of a constant IV with the CTR
mode, is secure because an identical passphrase will result in a
different derived KEK for every key encryption due to the salt.

The use of Encrypt-and-MAC instead of Encrypt-then-MAC is seen as
uncritical (but not ideal) here, since it is combined with AES-CTR mode,
which is not vulnerable to padding attacks.

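
The key derivation and the Encrypt-and-MAC tag can be sketched with the
standard library. The iteration count below is an assumption chosen for the
example (this section does not state one), and the AES-256-CTR step is left
out because it is not available in the standard library. The function names
are hypothetical.

.. code-block:: python

    import hashlib
    import hmac
    import os

    ITERATIONS = 100_000  # assumed value, for illustration only

    def derive_kek(passphrase: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
        """Derive a 256 bit KEK from the passphrase with PBKDF2-HMAC-SHA256."""
        return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                                   salt, iterations, dklen=32)

    def plaintext_tag(kek: bytes, packed_keys: bytes) -> bytes:
        """Encrypt-and-MAC: the HMAC is computed over the *plaintext* keys."""
        return hmac.new(kek, packed_keys, hashlib.sha256).digest()

    salt = os.urandom(32)             # random 256 bit salt, fresh per encryption
    kek = derive_kek("correct horse", salt)
    tag = plaintext_tag(kek, b"packed key material")

Because a fresh salt yields a different KEK even for an identical passphrase,
the constant CTR initialization vector is never combined with the same key
twice.
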
.. seealso::

   Refer to the :ref:`key_files` section for details on the format.

   Refer to issue :issue:`747` for suggested improvements of the encryption
   scheme and password-based key derivation.

Implementations used
--------------------

We do not implement cryptographic primitives ourselves, but rely
on widely used libraries providing them:

- AES-CTR and HMAC-SHA-256 from OpenSSL 1.0 / 1.1 are used,
  which is also linked into the static binaries we provide.
  We think this is not an additional risk, since we don't ever
  use OpenSSL's networking, TLS or X.509 code, but only their
  primitives implemented in libcrypto.
- SHA-256 and SHA-512 from Python's hashlib_ standard library module are used.
  Borg requires a Python built with OpenSSL support (due to PBKDF2), therefore
  these functions are delegated to OpenSSL by Python.
- HMAC, PBKDF2 and a constant-time comparison from Python's hmac_ standard
  library module are used. While the HMAC implementation is written in Python,
  the PBKDF2 implementation is provided by OpenSSL. The constant-time comparison
  (``compare_digest``) is written in C and part of Python.
- BLAKE2b is either provided by the system's libb2, an official implementation,
  or a bundled copy of the BLAKE2 reference implementation (written in C).

Implemented cryptographic constructions are:

- Encrypt-then-MAC based on AES-256-CTR and either HMAC-SHA-256
  or keyed BLAKE2b256 as described above under Encryption_.
- Encrypt-and-MAC based on AES-256-CTR and HMAC-SHA-256
  as described above under `Offline key security`_.
- HKDF_-SHA-512

.. _Horton principle: https://en.wikipedia.org/wiki/Horton_Principle
.. _HKDF: https://tools.ietf.org/html/rfc5869
.. _length extension: https://en.wikipedia.org/wiki/Length_extension_attack
.. _hashlib: https://docs.python.org/3/library/hashlib.html
.. _hmac: https://docs.python.org/3/library/hmac.html
.. _os.urandom: https://docs.python.org/3/library/os.html#os.urandom

Remote RPC protocol security
============================

.. note:: This section could be further expanded / detailed.

The RPC protocol is fundamentally based on msgpack'd messages exchanged
over an encrypted SSH channel (the system's SSH client is used for this
by piping data from/to it).

This means that the authorization and transport security properties
are inherited from SSH and the configuration of the SSH client and the
SSH server -- Borg RPC does not contain *any* networking
code. Networking is done by the SSH client running in a separate
process, Borg only communicates over the standard pipes (stdout,
stderr and stdin) with this process. This also means that Borg doesn't
have to directly use a SSH client (or SSH at all). For example,
``sudo`` or ``qrexec`` could be used as an intermediary.

By using the system's SSH client and not implementing a
(cryptographic) network protocol Borg sidesteps many security issues
that would normally impact distributing statically linked / standalone
binaries.

The remainder of this section will focus on the security of the RPC
protocol within Borg.

The assumed worst case a server can inflict on a client is a
denial of repository service.

The situation where a server can create a general DoS on the client
should be avoided, but might be possible by e.g. forcing the client to
allocate large amounts of memory to decode large messages (or messages
that merely indicate a large amount of data follows). The RPC protocol
code uses a limited msgpack Unpacker to prohibit this.

We believe that other kinds of attacks, especially critical vulnerabilities
like remote code execution are inhibited by the design of the protocol:

1. The server cannot send requests to the client on its own accord,
   it only can send responses. This avoids "unexpected inversion of control"
   issues.
2. msgpack serialization does not allow embedding or referencing code that
   is automatically executed. Incoming messages are unpacked by the msgpack
   unpacker into native Python data structures (like tuples and dictionaries),
   which are then passed to the rest of the program.

   Additional verification of the correct form of the responses could be implemented.
3. Remote errors are presented in two forms:

   1. A simple plain-text *stderr* channel. A prefix string indicates the kind of message
      (e.g. WARNING, INFO, ERROR), which is used to suppress it according to the
      log level selected in the client.

      A server can send arbitrary log messages, which may confuse a user. However,
      log messages are only processed when server requests are in progress, therefore
      the server cannot interfere with or confuse security-critical dialogue like
      the password prompt.
   2. Server-side exceptions passed over the main data channel. These follow the
      general pattern of server-sent responses and are sent instead of response data
      for a request.

The msgpack implementation used (msgpack-python) has a good security track record,
a large test suite and no issues found by fuzzing. It is based on the msgpack-c implementation,
sharing the unpacking engine and some support code. msgpack-c has a good track record as well.
Some issues [#]_ in the past were located in code not included in msgpack-python.
Borg does not use msgpack-c.

.. [#] - `MessagePack fuzzing <https://blog.gypsyengineer.com/fun/msgpack-fuzzing.html>`_
       - `Fixed integer overflow and EXT size problem <https://github.com/msgpack/msgpack-c/pull/547>`_
       - `Fixed array and map size overflow <https://github.com/msgpack/msgpack-c/pull/550>`_

Using OpenSSL
=============

Borg uses the OpenSSL library for most cryptography (see `Implementations used`_ above).
OpenSSL is bundled with static releases, thus the bundled copy is not updated with system
updates.

OpenSSL is a large and complex piece of software and has had its share of vulnerabilities,
however, it is important to note that Borg links against ``libcrypto`` **not** ``libssl``.
libcrypto is the low-level cryptography part of OpenSSL,
while libssl implements TLS and related protocols.

The latter is not used by Borg (cf. `Remote RPC protocol security`_, Borg itself does not implement
any network access) and historically contained most vulnerabilities, especially critical ones.
The static binaries released by the project contain neither libssl nor the Python ssl/_ssl modules.

Compression and Encryption
==========================

Combining encryption with compression can be insecure in some contexts (e.g. online protocols).

There was some discussion about this in `github issue #1040`_ and for Borg some developers
concluded this is no problem at all, some concluded this is hard and extremely slow to exploit
and thus no problem in practice.

No matter what, there is always the option not to use compression if you are worried about this.

.. _github issue #1040: https://github.com/borgbackup/borg/issues/1040

Fingerprinting
==============

Stored chunk sizes
------------------

A borg repository does not hide the size of the chunks it stores (size
information is needed to operate the repository).

The chunks stored in the repo are the (compressed, encrypted and authenticated)
output of the chunker. The sizes of these stored chunks are influenced by the
compression, encryption and authentication.

buzhash chunker
+++++++++++++++

The buzhash chunker chunks according to the input data, the chunker's
parameters and the secret chunker seed (which all influence the chunk boundary
positions).

Small files below some specific threshold (default: 512kiB) result in only one
chunk (identical content / size as the original file), bigger files result in
multiple chunks.

fixed chunker
+++++++++++++

This chunker yields fixed sized chunks, with optional support of a differently
sized header chunk. The last chunk is not required to have the full block size
and is determined by the input file size.

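
The fixed chunker's behaviour as described here implies that the plaintext
chunk sizes depend only on the file size and the chunker parameters, not on
the file contents. A minimal sketch, in which the function name and parameter
names are hypothetical and not Borg's API:

.. code-block:: python

    def fixed_chunk_sizes(file_size: int, block_size: int, header_size: int = 0):
        """Return the plaintext chunk sizes a fixed chunker would produce."""
        sizes = []
        remaining = file_size
        if header_size and remaining > 0:
            head = min(header_size, remaining)  # optional header chunk
            sizes.append(head)
            remaining -= head
        full, last = divmod(remaining, block_size)
        sizes.extend([block_size] * full)
        if last:
            sizes.append(last)                  # short final chunk
        return sizes

For example, ``fixed_chunk_sizes(10, 4)`` gives ``[4, 4, 2]``. The sizes
actually stored in the repository differ from these, because they are
influenced by compression, encryption and authentication.
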
Within our attack model, an attacker possessing a specific set of files which
he assumes that the victim also possesses (and backs up into the repository)
could try a brute force fingerprinting attack based on the chunk sizes in the
repository to prove his assumption.

Stored chunk proximity
----------------------

Borg does not try to obfuscate order / proximity of files it discovers by
recursing through the filesystem. For performance reasons, we sort directory
contents in file inode order (not in file name alphabetical order), so order
fingerprinting is not useful for an attacker.

But, when new files are close to each other (when looking at recursion /
scanning order), the resulting chunks will be also stored close to each other
in the resulting repository segment file(s).

This might leak additional information for the chunk size fingerprinting
attack (see above).