Additional Notes
----------------

Here are miscellaneous notes about topics that may not be covered in enough detail in the usage section.

.. _chunker-params:

--chunker-params
~~~~~~~~~~~~~~~~

The chunker params influence how input files are cut into pieces (chunks)
which are then considered for deduplication. They also have a big impact on
resource usage (RAM and disk space), as the amount of resources needed is
(also) determined by the total number of chunks in the repository (see
`Indexes / Caches memory usage` for details).

``--chunker-params=10,23,16,4095`` results in fine-grained deduplication
and creates a large number of chunks, and thus uses a lot of resources to manage
them. This is good for relatively small data volumes and if the machine has a
good amount of free RAM and disk space.

``--chunker-params=19,23,21,4095`` (default) results in coarse-grained
deduplication and creates a much smaller number of chunks, and thus uses fewer
resources. This is good for relatively big data volumes and if the machine has
a relatively low amount of free RAM and disk space.

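To put rough numbers on this, here is a small Python sketch. It assumes the Borg 1.x meaning of the four values (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE): chunks are at least 2^CHUNK_MIN_EXP and at most 2^CHUNK_MAX_EXP bytes, and roughly 2^HASH_MASK_BITS bytes on average. The class and function names are made up for illustration, and the count ignores deduplication and real content, so treat it as an order-of-magnitude estimate only.

```python
# Rough estimate of chunk sizes and chunk counts for Borg-style chunker params.
# Assumption: order is CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE
# and the average chunk size is about 2**HASH_MASK_BITS bytes.
from typing import NamedTuple


class ChunkerParams(NamedTuple):
    min_exp: int
    max_exp: int
    mask_bits: int
    window_size: int

    @classmethod
    def parse(cls, text: str) -> "ChunkerParams":
        # "19,23,21,4095" -> ChunkerParams(19, 23, 21, 4095)
        return cls(*(int(part) for part in text.split(",")))

    @property
    def min_size(self) -> int:
        return 2 ** self.min_exp

    @property
    def max_size(self) -> int:
        return 2 ** self.max_exp

    @property
    def approx_avg_size(self) -> int:
        return 2 ** self.mask_bits


def estimate_chunk_count(data_bytes: int, params: ChunkerParams) -> int:
    """Approximate chunk count before deduplication (rounded up)."""
    return -(-data_bytes // params.approx_avg_size)


if __name__ == "__main__":
    TIB = 2 ** 40
    for spec in ("10,23,16,4095", "19,23,21,4095"):
        p = ChunkerParams.parse(spec)
        print(f"{spec}: min {p.min_size} B, max {p.max_size} B, "
              f"avg ~{p.approx_avg_size} B, ~{estimate_chunk_count(TIB, p):,} chunks/TiB")
```

Under these assumptions the default params give chunks of about 2 MiB versus about 64 KiB for ``10,23,16,4095``, i.e. roughly 32 times fewer chunks (and index entries) for the same data.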
If you have already made some archives in a repository and you then change
chunker params, this of course impacts deduplication, as the chunks will be
cut differently.

In the worst case (all files are big and were touched between backups), this
will store all content in the repository again.

Usually, it is not that bad, though:

- usually most files are not touched, so it will just re-use the old chunks
  it already has in the repo
- files smaller than the (both old and new) minimum chunk size result in only
  one chunk anyway, so the resulting chunks are the same and deduplication will apply

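The second point boils down to a simple size check. This illustrative Python sketch uses a hypothetical function name and covers only the guaranteed case (a file below both minimum chunk sizes); larger files may still happen to produce identical chunks, but that is not guaranteed.

```python
# A file smaller than the minimum chunk size of BOTH the old and the new
# chunker params is stored as exactly one chunk either way, so it produces
# the same chunk again and deduplicates. Function name is illustrative.

def stays_single_chunk(file_size: int, old_min_exp: int, new_min_exp: int) -> bool:
    return file_size < min(2 ** old_min_exp, 2 ** new_min_exp)


if __name__ == "__main__":
    # Switching from 10,23,16,4095 to 19,23,21,4095: only files below
    # 2**10 = 1024 bytes are guaranteed to keep identical chunks.
    for size in (500, 4096, 300_000):
        print(size, stays_single_chunk(size, old_min_exp=10, new_min_exp=19))
```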
If you switch chunker params to save resources for an existing repo that
already has some backup archives, you will see an increasing effect over time,
as more and more files are touched and stored again using the bigger
chunk size **and** all references to the smaller, older chunks have been removed
(by deleting / pruning archives).

If you want to see an immediate big effect on resource usage, it is better to
start a new repository when changing chunker params.

For more details, see :ref:`chunker_details`.

--umask
~~~~~~~

If you use ``--umask``, make sure that all repository-modifying borg commands
(create, delete, prune) that access the repository in question use the same
``--umask`` value.

If multiple machines access the same repository, this should hold true for all
of them.

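A umask clears permission bits on every file and directory a command creates, so commands running with different values leave repository files with different permissions; one plausible consequence is that files written by one machine or user are not readable or writable for another. This small Python sketch (not Borg code) only shows the arithmetic; the umask values are examples, and 0077 is assumed to be Borg's default.

```python
# How a umask turns the mode requested for a new file or directory into the
# actual permission bits: requested & ~umask. Umask values are examples;
# 0o077 is assumed to be Borg's default.
import stat


def effective_mode(requested: int, umask: int) -> int:
    return requested & ~umask & 0o777


def describe_file(mode: int) -> str:
    # stat.filemode() expects a file-type bit too; S_IFREG = regular file
    return stat.filemode(stat.S_IFREG | mode)


if __name__ == "__main__":
    for umask in (0o077, 0o027):
        file_mode = effective_mode(0o666, umask)  # common request for new files
        dir_mode = effective_mode(0o777, umask)   # common request for new dirs
        print(f"umask {umask:04o}: files {describe_file(file_mode)}, dirs {dir_mode:04o}")
```

With 0077, new files come out owner-only (``-rw-------``); with 0027 they are also group-readable (``-rw-r-----``), so mixing the two leaves a repository with inconsistent access rights.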
--read-special
~~~~~~~~~~~~~~

The ``--read-special`` option is special: you do not want to use it for normal
full-filesystem backups, but rather after carefully picking some targets for it.

The option ``--read-special`` triggers special treatment for block and char
device files as well as FIFOs. Instead of storing them as such a device (or
FIFO), they will get opened, their content will be read and in the backup
archive they will show up like a regular file.

Symlinks will also get special treatment if (and only if) they point to such
a special file: instead of storing them as a symlink, the target special file
will get processed as described above.

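The rules above amount to a decision on the file type, which Python's ``stat`` module exposes directly. This sketch only mirrors the documented behaviour; it is not Borg's implementation, and the function name and returned labels are hypothetical.

```python
# Classify a path the way the --read-special rules above describe.
# Illustration only: not Borg's implementation; names/labels are hypothetical.
import os
import stat


def _is_special(mode: int) -> bool:
    """Block device, character device or FIFO."""
    return stat.S_ISBLK(mode) or stat.S_ISCHR(mode) or stat.S_ISFIFO(mode)


def read_special_treatment(path: str) -> str:
    st = os.lstat(path)  # lstat: look at the path itself, not a symlink target
    if stat.S_ISLNK(st.st_mode):
        try:
            target_mode = os.stat(path).st_mode  # stat follows the symlink
        except OSError:  # dangling link or symlink loop
            return "store as symlink"
        if _is_special(target_mode):
            return "read target content as regular file"
        return "store as symlink"
    if _is_special(st.st_mode):
        return "read content as regular file"
    return "normal handling"
```

Only calling ``stat`` never blocks on a FIFO or device; it is the subsequent reading of the content that makes targets such as ``/dev/zero`` dangerous.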
One intended use case of this is backing up the contents of one or multiple
block devices, e.g. LVM snapshots, inactive LVs or disk partitions.

You need to be careful about what you include when using ``--read-special``;
e.g. if you include ``/dev/zero``, your backup will never terminate.

Restoring such files' content is currently only supported one at a time via the
``--stdout`` option (and you have to redirect stdout to wherever it shall go,
maybe directly into an existing device file of your choice or indirectly via
``dd``).

To some extent, mounting a backup archive with the backups of special files
via ``borg mount`` and then loop-mounting the image files from inside the mount
point will work. If you plan to access a lot of data in there, it will likely
scale and perform better if you do not work via the FUSE mount.

Example
+++++++

Imagine you have made some snapshots of logical volumes (LVs) you want to back up.

.. note::

   For some scenarios, this is a good method to get "crash-like" consistency
   (I call it crash-like because it is the same as you would get if you just
   hit the reset button or if your machine abruptly and completely crashed).
   This is better than no consistency at all and a good method for some use
   cases, but likely not good enough if you have databases running.

Then you create a backup archive of all these snapshots. The backup process will
see a "frozen" state of the logical volumes, while the processes working in the
original volumes continue changing the data stored there.

You also add the output of ``lvdisplay`` to your backup, so you can see the LV
sizes in case you ever need to recreate and restore them.

After the backup has completed, you remove the snapshots again. ::

    $ # create snapshots here
    $ lvdisplay > lvdisplay.txt
    $ borg create --read-special /path/to/repo::arch lvdisplay.txt /dev/vg0/*-snapshot
    $ # remove snapshots here

Now, let's see how to restore some LVs from such a backup. ::

    $ borg extract /path/to/repo::arch lvdisplay.txt
    $ # create empty LVs with correct sizes here (look into lvdisplay.txt).
    $ # we assume that you created an empty root and home LV and overwrite them now:
    $ borg extract --stdout /path/to/repo::arch dev/vg0/root-snapshot > /dev/vg0/root
    $ borg extract --stdout /path/to/repo::arch dev/vg0/home-snapshot > /dev/vg0/home

.. _append_only_mode:

Append-only mode
~~~~~~~~~~~~~~~~

A repository can be made "append-only", which means that Borg will never overwrite or
delete committed data (append-only refers to the segment files, but borg will also
refuse to delete the repository completely). This is useful for scenarios where a
backup client machine backs up remotely to a backup server using ``borg serve``, since
a hacked client machine cannot delete backups on the server permanently.

To activate append-only mode, edit the repository ``config`` file and add a line
``append_only=1`` to the ``[repository]`` section (or edit the line if it exists).

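If you prefer to script this step, the ``[repository]`` section header suggests an INI-style file, which Python's ``configparser`` can read and write. This sketch assumes that format and uses a hypothetical helper name; ``configparser.write`` normalizes spacing (``append_only = 1``) and drops comments, so keep a copy of the original ``config`` before replacing it.

```python
# Set append_only = 1 in the [repository] section of an INI-style repository
# config (assumed format). Operates on text so you can review the result first.
import configparser
import io


def enable_append_only(config_text: str) -> str:
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Adds the key if missing, otherwise overwrites its value.
    parser["repository"]["append_only"] = "1"
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[repository]\nversion = 1\nappend_only = 0\n"  # hypothetical excerpt
    print(enable_append_only(sample))
```

With ``path`` a ``pathlib.Path`` pointing at the repository's ``config`` file, ``path.write_text(enable_append_only(path.read_text()))`` would apply the change.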
In append-only mode Borg will create a transaction log in the ``transactions`` file,
where each line is a transaction and a UTC timestamp.

In addition, ``borg serve`` can act as if a repository is in append-only mode with
its option ``--append-only``. This can be very useful for fine-tuning access control
in ``.ssh/authorized_keys`` ::

    command="borg serve --append-only ..." ssh-rsa <key used for not-always-trustable backup clients>
    command="borg serve ..." ssh-rsa <key used for backup management>

Running ``borg init`` via a ``borg serve --append-only`` server will *not* create
an append-only repository. Running ``borg init --append-only`` creates an append-only
repository regardless of server settings.

Example
+++++++

Suppose an attacker remotely deleted all backups, but your repository was in append-only
mode. A transaction log in this situation might look like this: ::

    transaction 1, UTC time 2016-03-31T15:53:27.383532
    transaction 5, UTC time 2016-03-31T15:53:52.588922
    transaction 11, UTC time 2016-03-31T15:54:23.887256
    transaction 12, UTC time 2016-03-31T15:55:54.022540
    transaction 13, UTC time 2016-03-31T15:55:55.472564

From your security logs you conclude the attacker gained access at 15:54:00 and all
the backups were deleted or replaced by compromised backups. From the log you know
that transactions 11 and later are compromised. Note that the transaction ID is the
name of the *last* file in the transaction. For example, transaction 11 spans files 6
to 11.

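The reasoning above (find the last transaction committed before the break-in, then start removing at the next file number) can be sketched in Python. The helper names are hypothetical; the sketch assumes each timestamp is the commit time of its transaction, that a transaction's files start right after the previous transaction ID, and that the compromise time is given as a naive datetime in UTC.

```python
# Find the last trusted transaction and the first segment file to remove,
# from a ``transactions`` log in the format shown above. Assumptions: each
# timestamp is when that transaction was committed, and a transaction's
# segment files start right after the previous transaction ID.
import re
from datetime import datetime

LINE_RE = re.compile(r"transaction (\d+), UTC time (\S+)")

LOG = """\
transaction 1, UTC time 2016-03-31T15:53:27.383532
transaction 5, UTC time 2016-03-31T15:53:52.588922
transaction 11, UTC time 2016-03-31T15:54:23.887256
transaction 12, UTC time 2016-03-31T15:55:54.022540
transaction 13, UTC time 2016-03-31T15:55:55.472564
"""


def parse_log(text):
    """Return a list of (transaction_id, utc_timestamp) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((int(match.group(1)), datetime.fromisoformat(match.group(2))))
    return entries


def recovery_plan(text, compromised_at):
    """Return (transaction to reset to, first segment number to remove)."""
    trusted = [tid for tid, ts in parse_log(text) if ts < compromised_at]
    if not trusted:
        raise ValueError("no transaction committed before the compromise")
    last_good = max(trusted)
    return last_good, last_good + 1


if __name__ == "__main__":
    print(recovery_plan(LOG, datetime(2016, 3, 31, 15, 54, 0)))  # (5, 6)
```

For the example log this yields transaction 5 and file 6, matching the manual analysis.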
In a real attack you'll likely want to keep a copy of the compromised repository
intact to analyze what the attacker tried to achieve. It's also a good idea to make this
copy just in case something goes wrong during the recovery. Since recovery is done by
deleting some files, a hard link copy (``cp -al``) is sufficient.

The first step to reset the repository to transaction 5, the last uncompromised transaction,
is to remove the ``hints.N`` and ``index.N`` files in the repository (these two files are
always expendable). In this example N is 13.

Then remove or move all segment files from the segment directories in ``data/`` starting
with file 6::

    rm data/**/{6..13}

That's all there is to it.

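Before deleting anything, a dry-run helper can list exactly which files the two steps above touch. This Python sketch is hypothetical (the function name and parameters are made up) and assumes the layout described in the text: ``hints.N`` and ``index.N`` at the top level of the repository and numbered segment files in subdirectories of ``data/``.

```python
# Dry run: list the files the recovery steps above would remove. Assumes
# hints.N / index.N at the repository top level and numbered segment files
# in subdirectories of data/. Nothing is deleted here.
from pathlib import Path


def files_to_remove(repo_path, last_good, last_txn):
    repo = Path(repo_path)
    doomed = [repo / f"hints.{last_txn}", repo / f"index.{last_txn}"]
    for segment_dir in sorted((repo / "data").iterdir()):
        if not segment_dir.is_dir():
            continue
        for segment in sorted(segment_dir.iterdir()):
            # Segment files are named by number; keep only last_good < n <= last_txn.
            if segment.name.isdigit() and last_good < int(segment.name) <= last_txn:
                doomed.append(segment)
    return [path for path in doomed if path.exists()]
```

For the example, ``files_to_remove("/path/to/repo", last_good=5, last_txn=13)`` would list ``hints.13``, ``index.13`` and segment files 6 to 13; review the list, then delete or move them (e.g. with ``Path.unlink()``) after making the hard link copy.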
Drawbacks
+++++++++

As data is only appended, and nothing removed, commands like ``prune`` or ``delete``
won't free disk space; they merely tag data as deleted in a new transaction.

Be aware that as soon as you write to the repo in non-append-only mode (e.g. prune,
delete or create archives from an admin machine), it will remove the deleted objects
permanently (including the ones that were already marked as deleted, but not removed,
in append-only mode).

Note that you can go back and forth between normal and append-only operation by editing
the configuration file; it's not a "one way trip".

Further considerations
++++++++++++++++++++++

Append-only mode is not respected by tools other than Borg. ``rm`` still works on the
repository. Make sure that backup client machines only get to access the repository via
``borg serve``.

Ensure that no remote access is possible if the repository is temporarily set to normal mode
for e.g. regular pruning.

Further protections can be implemented, but are outside of Borg's scope. Examples include
file system snapshots or wrapping ``borg serve`` to set special permissions or ACLs on
new data files.

SSH batch mode
~~~~~~~~~~~~~~

When running Borg using an automated script, ``ssh`` might still ask for a password,
even if there is an SSH key for the target server. Use this to make scripts more robust::

    export BORG_RSH='ssh -oBatchMode=yes'

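For backup scripts written in Python rather than shell, the same variable can be passed to the child process only. This sketch uses hypothetical helper names; with ``BatchMode=yes`` a missing or rejected key makes ``ssh`` fail instead of prompting, and ``check=True`` turns that failure into an exception.

```python
# Run borg with BORG_RSH set only in the child environment, so ssh fails
# instead of prompting for a password. Helper names are hypothetical.
import os
import subprocess


def borg_env(base=None):
    env = dict(os.environ if base is None else base)
    env["BORG_RSH"] = "ssh -oBatchMode=yes"
    return env


def run_borg(args):
    # check=True raises CalledProcessError instead of silently continuing
    return subprocess.run(["borg", *args], env=borg_env(), check=True)
```

A call such as ``run_borg(["list", "user@host:/path/to/repo"])`` (placeholder repository) would then either succeed non-interactively or raise.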