Additional Notes
----------------

Here are miscellaneous notes about topics that may not be covered in enough
detail in the usage section.

.. _chunker-params:

``--chunker-params``
~~~~~~~~~~~~~~~~~~~~

The chunker params influence how input files are cut into pieces (chunks),
which are then considered for deduplication. They also have a big impact on
resource usage (RAM and disk space), as the amount of resources needed is
(also) determined by the total number of chunks in the repository (see
:ref:`cache-memory-usage` for details).

``--chunker-params=buzhash,10,23,16,4095`` results in fine-grained deduplication
and creates a large number of chunks, and thus uses a lot of resources to manage
them. This is good for relatively small data volumes and if the machine has a
good amount of free RAM and disk space.

``--chunker-params=buzhash,19,23,21,4095`` (default) results in coarse-grained
deduplication and creates a much smaller number of chunks, and thus uses fewer
resources. This is good for relatively big data volumes and if the machine has
a relatively low amount of free RAM and disk space.

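
To get a rough feel for the difference between the two settings, the sketch
below estimates chunk counts per TiB of input. It assumes the four buzhash
numbers are the minimum chunk size exponent, the maximum chunk size exponent,
the hash mask bits and the hash window size, so that the statistically expected
chunk size is about 2^HASH_MASK_BITS bytes. ``estimate_chunks`` is a made-up
helper for this illustration, not part of borg, and real counts depend on the
data.

```shell
#!/bin/sh
# Rough estimate only: number of chunks ~= total bytes / 2^HASH_MASK_BITS.
# estimate_chunks is a hypothetical helper, not a borg command.
estimate_chunks() {
    total_bytes=$1
    mask_bits=$2
    echo $(( total_bytes / (1 << mask_bits) ))
}

one_tib=$(( 1 << 40 ))
echo "buzhash,10,23,16,4095: ~$(estimate_chunks "$one_tib" 16) chunks per TiB"
echo "buzhash,19,23,21,4095: ~$(estimate_chunks "$one_tib" 21) chunks per TiB"
```

Under these assumptions the fine-grained setting produces about 32 times as many
chunks as the default for the same amount of data.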
``--chunker-params=fixed,4194304`` results in fixed 4 MiB sized block
deduplication and is more efficient than the previous example when used
for block devices (like disks, partitions, LVM LVs) or raw disk image files.

``--chunker-params=fixed,4096,512`` results in fixed 4 KiB sized blocks,
but the first header block will only be 512 B long. This might be useful to
dedup files with 1 header + N fixed size data blocks. Be careful not to
produce too many chunks (for example, by using a small block size for huge
files).

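
The warning about small block sizes can be checked with simple arithmetic. The
following sketch counts the chunks a fixed chunker would produce for one file,
assuming one optional header chunk followed by equally sized blocks, with a
trailing partial block counted as one chunk. ``fixed_chunk_count`` is a
hypothetical helper, not part of borg.

```shell
#!/bin/sh
# fixed_chunk_count SIZE BLOCK [HEADER]
# Hypothetical helper: chunk count for a file of SIZE bytes when cut into
# BLOCK-sized pieces, optionally preceded by one HEADER-sized first chunk.
fixed_chunk_count() {
    size=$1
    block=$2
    header=${3:-0}
    if [ "$size" -le 0 ]; then
        echo 0
        return
    fi
    if [ "$header" -gt 0 ]; then
        if [ "$size" -le "$header" ]; then
            echo 1
            return
        fi
        # one header chunk plus the remaining data, rounded up to whole blocks
        echo $(( 1 + (size - header + block - 1) / block ))
    else
        echo $(( (size + block - 1) / block ))
    fi
}

echo "1 GiB image, fixed,4194304: $(fixed_chunk_count $(( 1 << 30 )) 4194304) chunks"
echo "1 TiB image, fixed,4096:    $(fixed_chunk_count $(( 1 << 40 )) 4096) chunks"
echo "512 B header + 10 blocks, fixed,4096,512: $(fixed_chunk_count 41472 4096 512) chunks"
```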
If you have already made some archives in a repository and then change the
chunker params, this of course impacts deduplication, as the chunks will be
cut differently.

In the worst case (all files are big and were touched in between backups), this
will store all content into the repository again.

Usually, it is not that bad though:

- usually most files are not touched, so it will just re-use the old chunks
  it already has in the repo
- files smaller than the (both old and new) minimum chunksize result in only
  one chunk anyway, so the resulting chunks are the same and deduplication will apply

If you switch chunker params to save resources for an existing repo that
already has some backup archives, you will see an increasing effect over time,
when more and more files have been touched and stored again using the bigger
chunksize **and** all references to the smaller older chunks have been removed
(by deleting / pruning archives).

If you want to see an immediate big effect on resource usage, you should start
a new repository when changing chunker params.

For more details, see :ref:`chunker_details`.

``--noatime / --noctime``
~~~~~~~~~~~~~~~~~~~~~~~~~

You can use these ``borg create`` options to avoid storing the respective
timestamp in the archive, in case you do not really need it.

Besides saving a little space by not archiving the timestamp, this might also
affect metadata stream deduplication: if only this timestamp changes between
backups and is stored into the metadata stream, the metadata stream chunks
won't deduplicate just because of that.

``--nobsdflags / --noflags``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use this option to skip querying and storing flags (or extracting and
setting them), in case you don't need them or if they are broken somehow for
your filesystem.

On Linux, dealing with the flags needs some additional syscalls. Especially when
dealing with lots of small files, this causes a noticeable overhead, so you can
also use this option to speed up operations.

``--umask``
~~~~~~~~~~~

borg uses a safe default umask of 077 (meaning the files borg creates have
permissions only for the owner, and none for group and others), so there
should rarely be a need to change the default behaviour.

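
What a given umask means for newly created files and directories can be seen
without borg. The sketch below is a generic illustration of umask semantics, not
of borg internals; ``show_umask_effect`` is a hypothetical helper and it assumes
GNU ``stat`` (``stat -c``).

```shell
#!/bin/sh
# Print the permission bits that a umask yields for a new file and directory.
# Assumes GNU stat; on BSD/macOS the format option differs.
show_umask_effect() {
    (
        # the subshell keeps the umask and directory change local
        umask "$1"
        dir=$(mktemp -d)
        cd "$dir" || exit 1
        touch file
        mkdir subdir
        stat -c '%a %n' file subdir
        cd / && rm -rf "$dir"
    )
}

show_umask_effect 077   # owner-only permissions
show_umask_effect 027   # group may read, others get nothing
```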
This option only affects the process to which it is given. Thus, when you run
borg in client/server mode and you want to change the behaviour on the server
side, you need to use ``borg serve --umask=XXX ...`` as an SSH forced command
in ``authorized_keys``. The ``--umask`` value given on the client side is
**not** transferred to the server side.

Also, if you choose to use the ``--umask`` option, always be consistent and use
the same umask value so you do not create a mixup of permissions in a borg
repository or with other files borg creates.

``--read-special``
~~~~~~~~~~~~~~~~~~

The ``--read-special`` option is special: you do not want to use it for normal
full-filesystem backups, but rather after carefully picking some targets for it.

The option ``--read-special`` triggers special treatment for block and char
device files as well as FIFOs. Instead of storing them as such a device (or
FIFO), they will get opened, their content will be read, and in the backup
archive they will show up like a regular file.

Symlinks will also get special treatment if (and only if) they point to such
a special file: instead of storing them as a symlink, the target special file
will get processed as described above.

One intended use case of this is backing up the contents of one or multiple
block devices, e.g. LVM snapshots, inactive LVs or disk partitions.

You need to be careful about what you include when using ``--read-special``,
e.g. if you include ``/dev/zero``, your backup will never terminate.

Restoring such files' content is currently only supported one at a time via the
``--stdout`` option (and you have to redirect stdout to wherever it shall go,
maybe directly into an existing device file of your choice or indirectly via
``dd``).

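
The indirect route via ``dd`` could look like the following sketch.
``restore_stream`` is a hypothetical wrapper and the 4 MiB block size is an
arbitrary choice; the archive and device names in the comment are placeholders.

```shell
#!/bin/sh
# Write whatever arrives on stdin to the target given as $1.
# Hypothetical helper; dd reports its transfer statistics on stderr.
restore_stream() {
    dd of="$1" bs=4194304
}

# Intended use (not run here, because it overwrites the target device):
#   borg extract --stdout arch dev/vg0/root-snapshot | restore_stream /dev/vg0/root
```

Double-check the target path before running such a pipeline, since ``dd``
overwrites it without asking.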
To some extent, mounting a backup archive with the backups of special files
via ``borg mount`` and then loop-mounting the image files from inside the mount
point will work. If you plan to access a lot of data in there, it likely will
scale and perform better if you do not work via the FUSE mount.

Example
+++++++

Imagine you have made some snapshots of logical volumes (LVs) you want to back up.

.. note::

   For some scenarios, this is a good method to get "crash-like" consistency
   (I call it crash-like because it is the same as you would get if you just
   hit the reset button or your machine would abruptly and completely crash).
   This is better than no consistency at all and a good method for some use
   cases, but likely not good enough if you have databases running.

Then you create a backup archive of all these snapshots. The backup process will
see a "frozen" state of the logical volumes, while the processes working in the
original volumes continue changing the data stored there.

You also add the output of ``lvdisplay`` to your backup, so you can see the LV
sizes in case you ever need to recreate and restore them.

After the backup has completed, you remove the snapshots again.

::

    $ # create snapshots here
    $ lvdisplay > lvdisplay.txt
    $ borg create --read-special arch lvdisplay.txt /dev/vg0/*-snapshot
    $ # remove snapshots here

Now, let's see how to restore some LVs from such a backup.

::

    $ borg extract arch lvdisplay.txt
    $ # create empty LVs with correct sizes here (look into lvdisplay.txt).
    $ # we assume that you created an empty root and home LV and overwrite it now:
    $ borg extract --stdout arch dev/vg0/root-snapshot > /dev/vg0/root
    $ borg extract --stdout arch dev/vg0/home-snapshot > /dev/vg0/home

.. _separate_compaction:

Separate compaction
~~~~~~~~~~~~~~~~~~~

Since borg 1.2.0, Borg no longer auto-compacts the segment files in the
repository at commit time (at the end of each repository-writing command).

Most of the time, this makes the repository behave as if it were in append-only
mode (see below), until ``borg compact`` is invoked or an old client triggers
auto-compaction.

This has some notable consequences:

- repository space is not freed immediately when deleting / pruning archives
- commands finish quicker
- repository is more robust and might be easier to recover after damages (as
  it contains data in a more sequential manner, historic manifests, multiple
  commits - until you run ``borg compact``)
- user can choose when to run compaction (it should be done regularly, but not
  necessarily after each single borg command)
- user can choose from where to invoke ``borg compact`` to do the compaction
  (from client or from server, it does not need a key)
- less repo sync data traffic in case you create a copy of your repository by
  using a sync tool (like rsync, rclone, ...)

You can manually run compaction by invoking the ``borg compact`` command.

.. _append_only_mode:

Append-only mode (forbid compaction)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A repository can be made "append-only", which means that Borg will never
overwrite or delete committed data (append-only refers to the segment files,
but borg will also refuse to delete the repository completely).

If the ``borg compact`` command is used on a repo in append-only mode, there
will be no warning or error, but no compaction will happen.

Append-only mode is useful for scenarios where a backup client machine backs up
remotely to a backup server using ``borg serve``, since a hacked client machine
cannot delete backups on the server permanently.

To activate append-only mode, set ``append_only`` to 1 in the repository config:

::

    borg config append_only 1

Note that you can go back-and-forth between normal and append-only operation with
``borg config``; it's not a "one way trip."

In append-only mode Borg will create a transaction log in the ``transactions`` file,
where each line is a transaction and a UTC timestamp.

In addition, ``borg serve`` can act as if a repository is in append-only mode with
its option ``--append-only``. This can be very useful for fine-tuning access control
in ``.ssh/authorized_keys``:

::

    command="borg serve --append-only ..." ssh-rsa <key used for not-always-trustable backup clients>
    command="borg serve ..." ssh-rsa <key used for backup management>

Running ``borg repo-create`` via a ``borg serve --append-only`` server will *not* create
an append-only repository. Running ``borg repo-create --append-only`` creates an append-only
repository regardless of server settings.

Example
+++++++

Suppose an attacker remotely deleted all backups, but your repository was in append-only
mode. A transaction log in this situation might look like this:

::

    transaction 1, UTC time 2016-03-31T15:53:27.383532
    transaction 5, UTC time 2016-03-31T15:53:52.588922
    transaction 11, UTC time 2016-03-31T15:54:23.887256
    transaction 12, UTC time 2016-03-31T15:55:54.022540
    transaction 13, UTC time 2016-03-31T15:55:55.472564

From your security logs you conclude the attacker gained access at 15:54:00 and all
the backups were deleted or replaced by compromised backups. From the log you know
that transactions 11 and later are compromised. Note that the transaction ID is the
name of the *last* file in the transaction. For example, transaction 11 spans files 6
to 11.

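
Picking the rollback point from the log can be scripted. This is a minimal
sketch that assumes the ``transactions`` file has exactly the line format shown
above and that the intrusion time is given as an ISO 8601 UTC timestamp;
``last_good_transaction`` is a hypothetical helper, not part of borg.

```shell
#!/bin/sh
# last_good_transaction FILE CUTOFF
# Print the ID of the last transaction logged strictly before CUTOFF.
# ISO 8601 timestamps in the same zone sort correctly as plain strings,
# so a string comparison is enough here.
last_good_transaction() {
    awk -v cutoff="$2" '
        $1 == "transaction" && ($5 "") < (cutoff "") {
            id = $2
            sub(/,$/, "", id)   # strip the trailing comma after the ID
            last = id
        }
        END { if (last != "") print last }
    ' "$1"
}

# Usage with a hypothetical repository path:
#   last_good_transaction /path/to/repo/transactions 2016-03-31T15:54:00
```

For the log above and a cutoff of ``2016-03-31T15:54:00`` this prints ``5``.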
In a real attack, you will likely want to keep a copy of the compromised repository
intact to analyze what the attacker tried to achieve. Such a copy is also a good idea
in case something goes wrong during the recovery. Since recovery is done by
deleting some files, a hard link copy (``cp -al``) is sufficient.

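
A hard link copy could be made as in the sketch below. It assumes GNU ``cp``
and that source and destination are on the same filesystem; ``snapshot_repo``
and the paths in the comment are placeholders.

```shell
#!/bin/sh
# snapshot_repo SRC DST
# Recreate the directory tree of SRC at DST, hard-linking every file instead
# of copying its data. Deleting a file in SRC later leaves DST's link intact.
snapshot_repo() {
    cp -al "$1" "$2"
}

# Usage with hypothetical paths:
#   snapshot_repo /srv/borg/repo /srv/borg/repo.compromised
```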
The first step to reset the repository to transaction 5, the last uncompromised transaction,
is to remove the ``hints.N``, ``index.N`` and ``integrity.N`` files in the repository (these
files are always expendable). In this example N is 13.

Then remove or move all segment files from the segment directories in ``data/`` starting
with file 6::

    rm data/**/{6..13}

That is all that needs to be done in the repository.

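
The two removal steps can also be written without relying on shell brace and
glob expansion. The sketch below is a hypothetical helper, not part of borg; it
assumes segment files are the purely numeric file names below ``data/`` and it
removes every ``hints.*``, ``index.*`` and ``integrity.*`` file it finds. Try it
on a copy of the repository first.

```shell
#!/bin/sh
# rollback_repo REPO FIRST_BAD
# Remove the expendable hints/index/integrity files and every segment file
# whose number is FIRST_BAD or higher.
rollback_repo() {
    repo=$1
    first_bad=$2
    rm -f "$repo"/hints.* "$repo"/index.* "$repo"/integrity.*
    find "$repo/data" -type f | while IFS= read -r path; do
        name=${path##*/}
        case $name in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
        esac
        if [ "$name" -ge "$first_bad" ]; then
            rm -- "$path"
        fi
    done
}

# For the example above (hypothetical path):
#   rollback_repo /path/to/repo 6
```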
If you want to access this rolled back repository from a client that already has
a cache for this repository, the cache will reflect a newer repository state
than what you actually have in the repository now, after the rollback.

Thus, you need to clear the cache::

    borg repo-delete --cache-only

The cache will get rebuilt automatically. Depending on repo size and archive
count, it may take a while.

You will also need to remove ``~/.config/borg/security/REPOID/manifest-timestamp``.

Drawbacks
+++++++++

As data is only appended, and nothing removed, commands like ``prune`` or ``delete``
won't free disk space; they merely tag data as deleted in a new transaction.

Be aware that as soon as you write to the repo in non-append-only mode (e.g. prune,
delete or create archives from an admin machine), it will remove the deleted objects
permanently (including the ones that were already marked as deleted, but not removed,
in append-only mode). Automated edits to the repository (such as a cron job running
``borg prune``) will render append-only mode moot if data is deleted.

Even if an archive appears to be available, it is possible an attacker could delete
just a few chunks from an archive and silently corrupt its data. While in append-only
mode, this is reversible, but ``borg check`` should be run before a writing/pruning
operation on an append-only repository to catch accidental or malicious corruption::

    # run without append-only mode
    borg check --verify-data && borg compact

Aside from checking repository & archive integrity you may also want to check
backups manually to ensure their content seems correct.

Further considerations
++++++++++++++++++++++

Append-only mode is not respected by tools other than Borg. ``rm`` still works on the
repository. Make sure that backup client machines only get to access the repository via
``borg serve``.

Ensure that no remote access is possible if the repository is temporarily set to normal mode
for e.g. regular pruning.

Further protections can be implemented, but are outside of Borg's scope. For example,
file system snapshots or wrapping ``borg serve`` to set special permissions or ACLs on
new data files.

SSH batch mode
~~~~~~~~~~~~~~

When running Borg using an automated script, ``ssh`` might still ask for a password,
even if there is an SSH key for the target server. Use this to make scripts more robust::

    export BORG_RSH='ssh -oBatchMode=yes'
  235. When running Borg using an automated script, ``ssh`` might still ask for a password,
  236. even if there is an SSH key for the target server. Use this to make scripts more robust::
  237. export BORG_RSH='ssh -oBatchMode=yes'