.. include:: ../global.rst.inc

.. somewhat surprisingly the "bash" highlighter gives nice results with
   the pseudo-code notation used in the "Encryption" section.

.. highlight:: bash

========
Security
========

.. _borgcrypto:

Cryptography in Borg
====================

.. _attack_model:

Attack model
------------

The attack model of Borg is that the environment of the client process
(e.g. ``borg create``) is trusted and the repository (server) is not. The
attacker has any and all access to the repository, including interactive
manipulation (man-in-the-middle) for remote repositories.

Furthermore, the client environment is assumed to be persistent across
attacks (practically this means that the security database cannot be
deleted between attacks).

Under these circumstances Borg guarantees that the attacker cannot

1. modify the data of any archive without the client detecting the change
2. rename, remove or add an archive without the client detecting the change
3. recover plain-text data
4. recover definite (heuristics based on access patterns are possible)
   structural information such as the object graph (which archives
   refer to what chunks)

The attacker can always impose a denial of service by definition (they could
forbid connections to the repository, or delete it entirely).

.. _security_structural_auth:

Structural Authentication
-------------------------

Borg is fundamentally based on an object graph structure (see :ref:`internals`),
where the root object is called the manifest.

Borg follows the `Horton principle`_, which states that
not only the message must be authenticated, but also its meaning (often
expressed through context), because every object used is referenced by a
parent object through its object ID up to the manifest. The object ID in
Borg is a MAC of the object's plaintext, therefore this ensures that
an attacker cannot change the context of an object without forging the MAC.

In other words, the object ID itself only authenticates the plaintext of the
object and not its context or meaning. The latter is established by a different
object referring to an object ID, thereby assigning a particular meaning to
an object. For example, an archive item contains a list of object IDs that
represent packed file metadata. On their own, it is not clear that these objects
represent what they do. The archive item assigns this meaning by referring to
them in a particular part of its own data structure.

This results in a directed acyclic graph of authentication from the manifest
to the data chunks of individual files.

The above was the complete scheme in borg 1.x and was the reason why it needed
the tertiary authentication mechanism (TAM) for manifest and archives.

borg 2 now stores the ro_type ("meaning") of a repo object's data in that
object's metadata (e.g. manifest vs. archive vs. user file content data).
When loading data from the repo, borg verifies that the type of object it got
matches the type it wanted. borg 2 does not use TAMs any more.

As both the object's metadata and data are AEAD encrypted and also bound to
the object ID (by passing the ID as AAD), there is no way an attacker (without
access to the borg key) could change the type of the object or move content
to a different object ID.

This effectively 'anchors' the manifest (and also other metadata, like archives)
to the key, which is controlled by the client, thereby anchoring the entire DAG,
making it impossible for an attacker to add, remove or modify any part of the
DAG without Borg being able to detect the tampering.

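
The type check on load can be pictured with a minimal sketch. Everything in it
is hypothetical and only serves as illustration: ``load_object``, the
``IntegrityError`` class, the ``"type"`` metadata field and the type names are
not borg's actual API, and the plain dict stands in for repo objects that have
already been decrypted and authenticated.

.. code-block:: python

    class IntegrityError(Exception):
        """Raised when a loaded object is not of the type the caller asked for."""


    def load_object(objects, obj_id, wanted_type):
        # objects: hypothetical mapping of object ID -> (metadata, data).
        meta, data = objects[obj_id]
        # Refuse to hand out data whose recorded meaning differs from the
        # meaning the caller expects.
        if meta.get("type") != wanted_type:
            raise IntegrityError(
                f"expected {wanted_type!r}, got {meta.get('type')!r}"
            )
        return data


    objects = {
        b"id-manifest": ({"type": "manifest"}, b"manifest bytes"),
        b"id-chunk": ({"type": "file content"}, b"file content bytes"),
    }

    manifest = load_object(objects, b"id-manifest", "manifest")

Asking for ``b"id-chunk"`` as a ``"manifest"`` raises ``IntegrityError`` in this
sketch, so a file content object cannot be passed off as a manifest.
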
Passphrase notes
----------------

Note that when using BORG_PASSPHRASE the attacker cannot swap the *entire*
repository for a new repository with e.g. repokey mode and no passphrase,
because Borg will abort access when BORG_PASSPHRASE is incorrect.

However, in interactive use a user might not notice this kind of attack
immediately if they assume that the passphrase prompt is absent because
BORG_PASSPHRASE is set. See issue :issue:`2169` for details.

.. _security_encryption:

Encryption
----------

AEAD modes
~~~~~~~~~~

Modes: --encryption (repokey|keyfile)-[blake2-](aes-ocb|chacha20-poly1305)

Supported: borg 2.0+

Encryption with these modes is based on AEAD ciphers (authenticated encryption
with associated data) and session keys.

Depending on the chosen mode (see :ref:`borg_rcreate`) different AEAD ciphers are used:

- AES-256-OCB - super fast, single-pass algorithm IF you have hw accelerated AES.
- chacha20-poly1305 - very fast, purely software based AEAD cipher.

The chunk ID is derived via a MAC over the plaintext (mac key taken from borg key):

- HMAC-SHA256 - super fast IF you have hw accelerated SHA256 (see section "Encryption" below).
- Blake2b - very fast, purely software based algorithm.

For each borg invocation, a new session id is generated by `os.urandom`_.

From that session id, the initial key material (ikm, taken from the borg key)
and an application and cipher specific salt, borg derives a session key via HKDF.

For each session key, IVs (nonces) are generated by a counter which increments for
each encrypted message.

Session::

    sessionid = os.urandom(24)
    ikm = crypt_key
    salt = "borg-session-key-CIPHERNAME"
    session_key = HKDF(ikm, sessionid, salt)
    message_iv = 0

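
The session setup can be sketched in Python using only the standard library.
This is an illustration under stated assumptions, not borg's code: the helper
name ``hkdf_sha512``, the 32 byte output length, and the choice to pass the
session id as the HKDF salt and the ``borg-session-key-CIPHERNAME`` string as
the HKDF info are assumptions made for this example.

.. code-block:: python

    import hmac
    import os


    def hkdf_sha512(ikm, salt, info, length=32):
        """HKDF as in RFC 5869 with HMAC-SHA512: extract, then expand."""
        # Extract: condense the input key material into a pseudorandom key.
        prk = hmac.digest(salt, ikm, "sha512")
        # Expand: chain HMAC blocks until enough output key material exists.
        okm, block, counter = b"", b"", 1
        while len(okm) < length:
            block = hmac.digest(prk, block + info + bytes([counter]), "sha512")
            okm += block
            counter += 1
        return okm[:length]


    crypt_key = os.urandom(32)   # stand-in for the key material in the borg key
    sessionid = os.urandom(24)   # fresh for every invocation
    label = b"borg-session-key-CIPHERNAME"

    session_key = hkdf_sha512(crypt_key, salt=sessionid, info=label)
    message_iv = 0               # counter, incremented once per encrypted message

Because the session id is random, two invocations with the same borg key end up
with different session keys, so both can start their IV counter at 0.
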
Encryption::

    id = MAC(id_key, data)
    compressed = compress(data)

    header = type-byte || 00h || message_iv || sessionid
    aad = id || header
    # encrypt with the same IV that was written into the header, then advance it
    encrypted, auth_tag = AEAD_encrypt(session_key, message_iv, compressed, aad)
    message_iv++
    authenticated = header || auth_tag || encrypted

Decryption::

    # Given: input *authenticated* data and a *chunk-id* to assert
    header, auth_tag, encrypted = SPLIT(authenticated)
    type-byte, past_message_iv, past_sessionid = SPLIT(header)

    ASSERT(type-byte is correct)

    # the asserted chunk-id is part of the associated data
    aad = chunk-id || header
    past_key = HKDF(ikm, past_sessionid, salt)
    decrypted = AEAD_decrypt(past_key, past_message_iv, encrypted, auth_tag, aad)

    decompressed = decompress(decrypted)

Notable:

- More modern and often faster AEAD ciphers are used instead of self-assembled
  constructions.
- Due to the usage of session keys, IVs (nonces) do not need special care here as
  they did for the legacy encryption modes.
- The id is now also input into the authentication tag computation.
  This strongly associates the id with the written data (== associates the key with
  the value). When later reading the data for some id, authentication will only
  succeed if what we get was really written by us for that id.

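
The effect of feeding the id into the authentication tag can be demonstrated
with a small sketch. Note the substitution: the standard library has no AEAD
cipher, so a plain HMAC-SHA256 tag over ``id || payload`` stands in for the
AEAD tag here, and nothing is encrypted. The names ``seal`` and ``open_sealed``
are hypothetical.

.. code-block:: python

    import hmac
    import os

    key = os.urandom(32)


    def seal(obj_id, payload):
        # The tag covers the fixed-length ID (the "associated data") and the payload.
        tag = hmac.digest(key, obj_id + payload, "sha256")
        return tag + payload


    def open_sealed(obj_id, stored):
        tag, payload = stored[:32], stored[32:]
        expected = hmac.digest(key, obj_id + payload, "sha256")
        # Constant-time comparison, as provided by the hmac module.
        if not hmac.compare_digest(tag, expected):
            raise ValueError("authentication failed")
        return payload


    id_a, id_b = os.urandom(32), os.urandom(32)
    repo = {id_a: seal(id_a, b"content A")}

    # An attacker with repository access files the stored value under another ID.
    repo[id_b] = repo[id_a]

Reading ``id_a`` still succeeds, while ``open_sealed(id_b, repo[id_b])`` raises
``ValueError``, because the tag was computed for ``id_a``.
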
Legacy modes
~~~~~~~~~~~~

Modes: --encryption (repokey|keyfile)-[blake2]

Supported: borg < 2.0

These were the AES-CTR based modes in previous borg versions.

borg 2.0 does not support creating new repos using these modes,
but ``borg transfer`` can still read such existing repos.

.. _key_encryption:

Offline key security
--------------------

Borg cannot secure the key material while it is running, because the keys
are needed in plaintext to decrypt/encrypt repository objects.

For offline storage of the encryption keys they are encrypted with a
user-chosen passphrase.

A 256 bit key encryption key (KEK) is derived from the passphrase
using argon2_ with a random 256 bit salt. The KEK is then used
to Encrypt-*then*-MAC a packed representation of the keys using the
chacha20-poly1305 AEAD cipher and a constant IV == 0.
The ciphertext is then converted to base64.

This base64 blob (commonly referred to as *keyblob*) is then stored in
the key file or in the repository config (keyfile and repokey modes
respectively).

The use of a constant IV is secure because an identical passphrase will
result in a different derived KEK for every key encryption due to the salt.

.. seealso::

   Refer to the :ref:`key_files` section for details on the format.

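
The argument about the constant IV rests on the salt, which a short sketch can
make concrete. Note the substitution: argon2 is not part of the Python standard
library, so PBKDF2-HMAC-SHA256 from ``hashlib`` stands in for it here. The
function name ``derive_kek``, the passphrase and the iteration count are
made up for this example.

.. code-block:: python

    import hashlib
    import os


    def derive_kek(passphrase, salt):
        # Stand-in KDF (PBKDF2), NOT argon2; yields a 256 bit key.
        return hashlib.pbkdf2_hmac(
            "sha256", passphrase.encode(), salt, 100_000, dklen=32
        )


    # Two key encryptions with the same passphrase, each with a fresh 256 bit salt.
    salt_a, salt_b = os.urandom(32), os.urandom(32)
    kek_a = derive_kek("correct horse", salt_a)
    kek_b = derive_kek("correct horse", salt_b)

``kek_a`` and ``kek_b`` differ although the passphrase is identical, so a given
(KEK, IV) pair is not reused across key encryptions even with IV == 0.
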
Implementations used
--------------------

We do not implement cryptographic primitives ourselves, but rely
on widely used libraries providing them:

- AES-OCB and CHACHA20-POLY1305 from OpenSSL 1.1 are used,
  which is also linked into the static binaries we provide.
  We think this is not an additional risk, since we don't ever
  use OpenSSL's networking, TLS or X.509 code, but only their
  primitives implemented in libcrypto.
- SHA-256, SHA-512 and BLAKE2b from Python's hashlib_ standard library module are used.
- HMAC and a constant-time comparison from Python's hmac_ standard library module are used.
- argon2 is used via argon2-cffi.

Implemented cryptographic constructions are:

- HKDF_-SHA-512 (using ``hmac.digest`` from Python's hmac_ standard library module)

.. _Horton principle: https://en.wikipedia.org/wiki/Horton_Principle
.. _HKDF: https://tools.ietf.org/html/rfc5869
.. _length extension: https://en.wikipedia.org/wiki/Length_extension_attack
.. _hashlib: https://docs.python.org/3/library/hashlib.html
.. _hmac: https://docs.python.org/3/library/hmac.html
.. _os.urandom: https://docs.python.org/3/library/os.html#os.urandom

Remote RPC protocol security
============================

.. note:: This section could be further expanded / detailed.

The RPC protocol is fundamentally based on msgpack'd messages exchanged
over an encrypted SSH channel (the system's SSH client is used for this
by piping data from/to it).

This means that the authorization and transport security properties
are inherited from SSH and the configuration of the SSH client and the
SSH server. Borg RPC does not contain *any* networking
code. Networking is done by the SSH client running in a separate
process; Borg only communicates with this process over the standard pipes
(stdout, stderr and stdin). This also means that Borg doesn't
have to use an SSH client directly (or SSH at all). For example,
``sudo`` or ``qrexec`` could be used as an intermediary.

By using the system's SSH client and not implementing a
(cryptographic) network protocol Borg sidesteps many security issues
that would normally impact distributing statically linked / standalone
binaries.

The remainder of this section will focus on the security of the RPC
protocol within Borg.

The assumed worst case a server can inflict on a client is a
denial of repository service.

The situation where a server can create a general DoS on the client
should be avoided, but might be possible by e.g. forcing the client to
allocate large amounts of memory to decode large messages (or messages
that merely indicate a large amount of data follows). The RPC protocol
code uses a limited msgpack Unpacker to prohibit this.

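
The idea behind a size-limited decoder can be illustrated without msgpack.
The sketch below is not borg's wire format and not the msgpack Unpacker: it
uses a made-up framing (4 byte big-endian length prefix), a hypothetical
``read_frame`` helper and an arbitrary limit, purely to show that a declared
size is checked *before* any memory is allocated for the body.

.. code-block:: python

    import io
    import struct

    MAX_MESSAGE_SIZE = 1024 * 1024  # hypothetical limit for this example


    def read_frame(stream, max_size=MAX_MESSAGE_SIZE):
        header = stream.read(4)
        if len(header) != 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        # Reject the message based on its announced size alone; nothing has
        # been buffered for the body at this point.
        if length > max_size:
            raise ValueError(f"announced size {length} exceeds limit {max_size}")
        body = stream.read(length)
        if len(body) != length:
            raise ValueError("truncated message body")
        return body


    good = io.BytesIO(struct.pack(">I", 5) + b"hello")
    message = read_frame(good)

A peer that merely announces 2 GiB of data is rejected after reading four
bytes, instead of making the client allocate a buffer of that size.
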
We believe that other kinds of attacks, especially critical vulnerabilities
like remote code execution, are inhibited by the design of the protocol:

1. The server cannot send requests to the client of its own accord,
   it can only send responses. This avoids "unexpected inversion of control"
   issues.
2. msgpack serialization does not allow embedding or referencing code that
   is automatically executed. Incoming messages are unpacked by the msgpack
   unpacker into native Python data structures (like tuples and dictionaries),
   which are then passed to the rest of the program.

   Additional verification of the correct form of the responses could be implemented.
3. Remote errors are presented in two forms:

   1. A simple plain-text *stderr* channel. A prefix string indicates the kind of message
      (e.g. WARNING, INFO, ERROR), which is used to suppress it according to the
      log level selected in the client.

      A server can send arbitrary log messages, which may confuse a user. However,
      log messages are only processed when server requests are in progress, therefore
      the server cannot interfere with or confuse security critical dialogue like
      the password prompt.
   2. Server-side exceptions passed over the main data channel. These follow the
      general pattern of server-sent responses and are sent instead of response data
      for a request.

The msgpack implementation used (msgpack-python) has a good security track record,
a large test suite and no issues found by fuzzing. It is based on the msgpack-c implementation,
sharing the unpacking engine and some support code. msgpack-c has a good track record as well.
Some issues [#]_ in the past were located in code not included in msgpack-python.
Borg does not use msgpack-c.

.. [#] - `MessagePack fuzzing <https://blog.gypsyengineer.com/fun/msgpack-fuzzing.html>`_
       - `Fixed integer overflow and EXT size problem <https://github.com/msgpack/msgpack-c/pull/547>`_
       - `Fixed array and map size overflow <https://github.com/msgpack/msgpack-c/pull/550>`_

Using OpenSSL
=============

Borg uses the OpenSSL library for most cryptography (see `Implementations used`_ above).
OpenSSL is bundled with static releases, thus the bundled copy is not updated with system
updates.

OpenSSL is a large and complex piece of software and has had its share of vulnerabilities.
However, it is important to note that Borg links against ``libcrypto``, **not** ``libssl``.
libcrypto is the low-level cryptography part of OpenSSL,
while libssl implements TLS and related protocols.

The latter is not used by Borg (cf. `Remote RPC protocol security`_, Borg itself does not implement
any network access) and historically contained most vulnerabilities, especially critical ones.
The static binaries released by the project contain neither libssl nor the Python ssl/_ssl modules.

Compression and Encryption
==========================

Combining encryption with compression can be insecure in some contexts (e.g. online protocols).

There was some discussion about this in :issue:`1040`. For Borg, some developers
concluded that this is no problem at all, while others concluded that it is hard and
extremely slow to exploit and thus no problem in practice.

No matter what, there is always the option not to use compression if you are worried about this.

Fingerprinting
==============

Stored chunk sizes
------------------

A borg repository does not hide the size of the chunks it stores (size
information is needed to operate the repository).

The chunks stored in the repo are the (compressed, encrypted and authenticated)
output of the chunker. The sizes of these stored chunks are influenced by the
compression, encryption and authentication.

buzhash chunker
~~~~~~~~~~~~~~~

The buzhash chunker chunks according to the input data, the chunker's
parameters and the secret chunker seed (which all influence the chunk boundary
positions).

Small files below some specific threshold (default: 512 KiB) result in only one
chunk (identical content / size as the original file), bigger files result in
multiple chunks.

fixed chunker
~~~~~~~~~~~~~

This chunker yields fixed-size chunks, with optional support for a differently
sized header chunk. The last chunk is not required to have the full block size
and is determined by the input file size.

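
The resulting chunk sizes are easy to predict, which a minimal sketch shows.
The function ``fixed_chunks`` and its parameter names are hypothetical and
operate on an in-memory bytes object. They only illustrate the behaviour
described above and are not borg's chunker implementation.

.. code-block:: python

    def fixed_chunks(data, block_size, header_size=0):
        """Yield an optional header chunk, then block_size chunks; the last may be short."""
        pos = 0
        if header_size and data:
            yield data[:header_size]
            pos = header_size
        while pos < len(data):
            yield data[pos:pos + block_size]
            pos += block_size


    data = bytes(10_000)
    sizes = [len(c) for c in fixed_chunks(data, block_size=4096, header_size=512)]
    # sizes == [512, 4096, 4096, 1296]

Apart from the header chunk, only the size of the last chunk depends on the
input: it is what remains of the file size after the full blocks.
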
Within our attack model, an attacker possessing a specific set of files which
they assume the victim also possesses (and backs up into the repository)
could try a brute force fingerprinting attack based on the chunk sizes in the
repository to prove that assumption.

To make this more difficult, borg has an ``obfuscate`` pseudo compressor that
takes the output of the normal compression step and tries to obfuscate
the size of that output. Of course, it can only **add** to the size, not reduce
it. Thus, the optional usage of this mechanism comes at a cost: it will make
your repository larger (ranging from a few percent larger [cheap] to ridiculously
larger [expensive], depending on the algorithm/params you wisely choose).

The output of the compressed-size obfuscation step will then be encrypted and
authenticated, as usual. Of course, using that obfuscation would not make any
sense without encryption. Thus, the additional data added by the obfuscator
are just 0x00 bytes, which is good enough because after encryption it will
look random anyway.

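
The principle of size obfuscation by padding can be sketched as follows. This
is an illustration only: the layout (4 byte length prefix, payload, 0x00
padding), the uniform random padding amount and the function names are
assumptions of this example and do not describe the algorithms or on-disk
format of borg's ``obfuscate`` pseudo compressor.

.. code-block:: python

    import random
    import struct
    import zlib

    _rng = random.SystemRandom()


    def obfuscate(compressed, max_padding=1024):
        # Remember the real length so the padding can be stripped again.
        padding = _rng.randrange(max_padding + 1)
        return struct.pack(">I", len(compressed)) + compressed + bytes(padding)


    def deobfuscate(blob):
        (length,) = struct.unpack(">I", blob[:4])
        return blob[4:4 + length]


    payload = zlib.compress(b"example data " * 50)
    blob = obfuscate(payload)   # this is what would be encrypted afterwards
    restored = deobfuscate(blob)

The stored size now varies between runs for identical input, and it is never
smaller than the size of the compressed payload.
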
To summarize, the following make size-based fingerprinting difficult:

- user-selectable chunker algorithm (and parametrization)
- for the buzhash chunker: secret, random per-repo chunker seed
- user-selectable compression algorithm (and level)
- optional ``obfuscate`` pseudo compressor with different choices
  of algorithm and parameters

Secret key usage against fingerprinting
---------------------------------------

Borg uses the borg key also for chunking and chunk ID generation to protect against fingerprinting.

As usual for borg's attack model, the attacker is assumed to have access to a borg repository.

The borg key includes a secret random chunk_seed which (together with the chunking algorithm)
determines the cutting places and thereby the length of the chunks cut. Because the attacker trying
a chunk length fingerprinting attack would use a different chunker secret than the borg setup being
attacked, they would not be able to determine the set of chunk lengths for a known set of files.

The borg key also includes a secret random id_key. The chunk ID generation does not just use a simple
cryptographic hash like sha256 (because that would be insecure as an attacker could see the hashes of
small files that result in only 1 chunk in the repository). Instead, borg uses a keyed hash (a MAC,
e.g. HMAC-SHA256) to compute the chunk ID from the content and the secret id_key. Thus, an attacker
can't compute the same chunk IDs for a known set of small files to determine whether these are stored
in the attacked repository.

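
The difference between an unkeyed and a keyed chunk ID can be shown with the
standard library. The variable names and the sample content are made up for
this sketch, and using BLAKE2b's built-in ``key`` parameter is an assumption
of the example, not a statement about how borg keys BLAKE2b.

.. code-block:: python

    import hashlib
    import hmac
    import os

    id_key = os.urandom(32)   # secret, known only to the client
    known_file = b"contents of a small, publicly known file"

    # Unkeyed hash: anyone holding the file can compute this value.
    unkeyed_id = hashlib.sha256(known_file).digest()

    # Keyed hashes (MACs): computing these requires the secret id_key.
    hmac_id = hmac.digest(id_key, known_file, "sha256")
    blake2_id = hashlib.blake2b(known_file, key=id_key, digest_size=32).digest()

    # An attacker without id_key can only try some other key.
    attacker_guess = hmac.digest(os.urandom(32), known_file, "sha256")

With the unkeyed variant an attacker could look for ``unkeyed_id`` among the
stored IDs. With the keyed variants the attacker's value does not match what
is stored, so the presence of the file cannot be confirmed this way.
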
Stored chunk proximity
----------------------

Borg does not try to obfuscate order / proximity of files it discovers by
recursing through the filesystem. For performance reasons, we sort directory
contents in file inode order (not in file name alphabetical order), so order
fingerprinting is not useful for an attacker.

However, when new files are close to each other (in recursion /
scanning order), the resulting chunks will also be stored close to each other
in the resulting repository segment file(s).

This might leak additional information for the chunk size fingerprinting
attack (see above).

  307. in the resulting repository segment file(s).
  308. This might leak additional information for the chunk size fingerprinting
  309. attack (see above).