.. include:: ../global.rst.inc
.. highlight:: none

Backing up entire disk images
=============================

Backing up disk images can still be efficient with Borg because its `deduplication`_
technique ensures that only the modified parts of the file are stored. Borg also has
optional, simple sparse file support for ``borg extract``.

It is of the utmost importance to identify the exact disk you want to back up.
Kernel device names such as ``/dev/nvme1n1`` can change between boots, so identify
the disk by its serial number instead:

.. code-block:: bash

    # You can find the short disk serial by:
    # udevadm info --query=property --name=nvme1n1 | grep ID_SERIAL_SHORT | cut -d '=' -f 2
    DISK_SERIAL="7VS0224F"
    DISK_ID=$(readlink -f /dev/disk/by-id/*"${DISK_SERIAL}") # Returns /dev/nvme1n1

    # Collect the partitions of the disk, one array element per partition
    mapfile -t PARTITIONS < <(lsblk -o NAME,TYPE -p -n -l "$DISK_ID" | awk '$2 == "part" {print $1}')
    echo "Partitions of $DISK_ID:"
    printf '%s\n' "${PARTITIONS[@]}"   # one partition per line
    echo "Disk Identifier: $DISK_ID"

    # Use the following line to perform a borg backup for the full disk:
    # borg create --read-special /path/to/repo::{now} "$DISK_ID"

    # Use the following to perform a borg backup for all partitions of the disk
    # borg create --read-special /path/to/repo::{now} "${PARTITIONS[@]}"

    # Example output:
    # Partitions of /dev/nvme1n1:
    # /dev/nvme1n1p1
    # /dev/nvme1n1p2
    # /dev/nvme1n1p3
    # Disk Identifier: /dev/nvme1n1
    # borg create --read-special /path/to/repo::{now} /dev/nvme1n1
    # borg create --read-special /path/to/repo::{now} /dev/nvme1n1p1 /dev/nvme1n1p2 /dev/nvme1n1p3

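
Because ``readlink -f`` prints one path per matching link, a glob on the serial can
yield several devices, or none, without any warning. The following is a minimal sketch
of a pre-flight check and is not part of Borg. The function name
``resolve_disk_by_serial`` and its optional second argument (a directory to search
instead of ``/dev/disk/by-id``) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve a disk serial to exactly one device path.
# $1 = serial, $2 = directory holding the by-id symlinks (assumed default below).
resolve_disk_by_serial() {
    local serial="$1"
    local by_id_dir="${2:-/dev/disk/by-id}"
    local link targets count

    # Resolve every link whose name ends in the serial; several links may
    # point at the same device, so duplicates are collapsed with sort -u.
    targets=$(
        for link in "$by_id_dir"/*"$serial"; do
            [ -e "$link" ] || continue   # unmatched glob or dangling link
            readlink -f "$link"
        done | sort -u
    )

    # Count the distinct devices that were found.
    count=$(printf '%s\n' "$targets" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one device for serial '$serial', found $count" >&2
        return 1
    fi
    printf '%s\n' "$targets"
}
```

It could then stand in for the plain ``readlink`` call, for example
``DISK_ID=$(resolve_disk_by_serial "$DISK_SERIAL") || exit 1``, so that the backup
stops instead of reading the wrong disk.
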
Decreasing the size of image backups
------------------------------------

Disk images are as large as the full disk when uncompressed and might not get much
smaller post-deduplication after heavy use because virtually all file systems don't
actually delete file data on disk but instead delete the filesystem entries referencing
the data. Therefore, if a disk nears capacity and files are deleted again, the change
will barely decrease the space it takes up when compressed and deduplicated. Depending
on the filesystem, there are several ways to decrease the size of a disk image:

Using ntfsclone (NTFS, i.e. Windows VMs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``ntfsclone`` can only operate on filesystems with the journal cleared (i.e. turned-off
machines), which somewhat limits its utility in the case of VM snapshots. However, when
it can be used, its special image format is even more efficient than just zeroing and
deduplicating. For backup, save the disk header and the contents of each partition::

    # Start sector of the first partition, i.e. the number of sectors before it
    HEADER_SIZE=$(sfdisk -lo Start "$DISK" | grep -A1 -P 'Start$' | tail -n1 | xargs echo)
    # One line per partition: device path followed by its type
    PARTITIONS=$(sfdisk -lo Device,Type "$DISK" | sed -e '1,/Device\s*Type/d')
    # Save the disk header (partition table etc.)
    dd if="$DISK" count="$HEADER_SIZE" | borg create repo::hostname-partinfo -
    echo "$PARTITIONS" | grep NTFS | cut -d' ' -f1 | while read -r x; do
        PARTNUM=$(echo "$x" | grep -Eo "[0-9]+$")
        ntfsclone -so - "$x" | borg create "repo::hostname-part$PARTNUM" -
    done
    # to back up non-NTFS partitions as well:
    echo "$PARTITIONS" | grep -v NTFS | cut -d' ' -f1 | while read -r x; do
        PARTNUM=$(echo "$x" | grep -Eo "[0-9]+$")
        borg create --read-special "repo::hostname-part$PARTNUM" "$x"
    done

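
The archive names in these loops are derived from the trailing digits of each
partition path. The sketch below isolates that step so its behavior can be checked on a
few sample names. The helper name ``partnum`` is hypothetical, the device paths are
examples only, and nothing is read from disk.

```shell
# Print the trailing partition number of a device path, using the same
# grep expression as the backup loops.
partnum() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+$'
}

for dev in /dev/sda2 /dev/nvme1n1p3 /dev/loop0p1; do
    echo "$dev -> hostname-part$(partnum "$dev")"
done
# /dev/sda2 -> hostname-part2
# /dev/nvme1n1p3 -> hostname-part3
# /dev/loop0p1 -> hostname-part1
```

Note that a whole-disk name such as ``/dev/nvme1n1`` also ends in a digit, so the
expression should only be fed partition paths, as the ``sfdisk`` listing does here.
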
Restoration is a similar process::

    # Restore the disk header first, then let the kernel re-read the partition table
    borg extract --stdout repo::hostname-partinfo | dd of="$DISK" && partprobe
    PARTITIONS=$(sfdisk -lo Device,Type "$DISK" | sed -e '1,/Device\s*Type/d')
    borg list --format '{archive}{NL}' repo | grep 'part[0-9]*$' | while read -r x; do
        PARTNUM=$(echo "$x" | grep -Eo "[0-9]+$")
        # Find the target partition, e.g. /dev/sda1 or /dev/nvme0n1p1
        PARTITION=$(echo "$PARTITIONS" | grep -E "${DISK}p?${PARTNUM}" | head -n1)
        if echo "$PARTITION" | cut -d' ' -f2- | grep -q NTFS; then
            borg extract --stdout "repo::$x" | ntfsclone -rO "$(echo "$PARTITION" | cut -d' ' -f1)" -
        else
            borg extract --stdout "repo::$x" | dd of="$(echo "$PARTITION" | cut -d' ' -f1)"
        fi
    done

.. note::

   When backing up a disk image (as opposed to a real block device), mount it as
   a loopback image to use the above snippets::

       DISK=$(losetup -Pf --show /path/to/disk/image)
       # do backup as shown above
       losetup -d $DISK

Using zerofree (ext2, ext3, ext4)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``zerofree`` works similarly to ntfsclone in that it zeros out unused chunks of the FS,
except it works in place, zeroing the original partition. This makes the backup process
a bit simpler::

    sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d' | grep Linux | cut -d' ' -f1 | xargs -n1 zerofree
    borg create --read-special repo::hostname-disk $DISK

Because the partitions were zeroed in place, restoration is only one command::

    borg extract --stdout repo::hostname-disk | dd of=$DISK

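
To get a feel for why zeroing free space pays off, the following sketch compares how
well 1 MiB of random data (standing in for stale, deleted file contents) and 1 MiB of
zeros compress. It is only an illustration under stated assumptions: ``gzip`` is used
as a stand-in for Borg's own compression, and the helper name ``compressed_size`` is
hypothetical. Exact byte counts vary between systems, so none are shown.

```shell
# Compressed size in bytes of the first 1 MiB read from the given source.
compressed_size() {
    head -c 1048576 "$1" | gzip -c | wc -c | tr -d ' '
}

random_size=$(compressed_size /dev/urandom)   # stale data: barely compressible
zero_size=$(compressed_size /dev/zero)        # zeroed space: compresses to almost nothing

echo "1 MiB of random data compresses to $random_size bytes"
echo "1 MiB of zeros compresses to $zero_size bytes"
```

In a real repository, deduplication helps further, since identical all-zero chunks only
need to be stored once.
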
.. note:: The "traditional" way to zero out space on a partition, especially one already
          mounted, is simply to ``dd`` from ``/dev/zero`` to a temporary file and delete
          it. This is ill-advised for the reasons mentioned in the ``zerofree`` man page:

          - it is slow
          - it makes the disk image (temporarily) grow to its maximal extent
          - it (temporarily) uses all free space on the disk, so other concurrent write actions may fail.

Virtual machines
----------------

If you use non-snapshotting backup tools like Borg to back up virtual machines, then
the VMs should be turned off for the duration of the backup. Backing up live VMs can
(and will) result in corrupted or inconsistent backup contents: a VM image is just a
regular file to Borg with the same issues as regular files when it comes to concurrent
reading and writing from the same file.

To back up live VMs, use file system snapshots on the VM host, which establish
crash consistency for the VM images. This means that with most file systems (those
that are journaling) the FS will always be fine in the backup (but may need a journal
replay to become accessible).

Usually this does not mean that file *contents* on the VM are consistent, since file
contents are normally not journaled. Notable exceptions are ext4 in data=journal mode,
ZFS and btrfs (unless nodatacow is used).

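
As a rough sketch of the snapshot approach, the script below takes a snapshot, backs
it up, and removes it again. It rests on several assumptions that are not from this
document: the VM disk is an LVM logical volume rather than an image file, LVM is the
snapshot mechanism, and the names ``vg0``, ``vm-disk``, ``vm-disk-snap`` and the 5G
snapshot size are placeholders. By default it only prints the commands (``DRY_RUN=1``),
so it can be reviewed before anything is executed.

```shell
#!/usr/bin/env bash
# Sketch: back up a VM disk from an LVM snapshot. All names are hypothetical.
VG="${VG:-vg0}"
LV="${LV:-vm-disk}"
SNAP="${SNAP:-vm-disk-snap}"
REPO="${REPO:-/path/to/repo}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

backup_vm_snapshot() {
    local status
    # 1. Create a crash-consistent, point-in-time view of the volume.
    run lvcreate --snapshot --size 5G --name "$SNAP" "/dev/$VG/$LV" || return 1
    # 2. Read the snapshot as a block device, like any other disk.
    run borg create --read-special "$REPO::vm-{now}" "/dev/$VG/$SNAP"
    status=$?
    # 3. Remove the snapshot even if the backup failed.
    run lvremove -y "/dev/$VG/$SNAP"
    return "$status"
}

backup_vm_snapshot
```

Snapshots on btrfs or ZFS would follow the same create, back up, remove pattern with
their own tools.
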
Applications designed with crash consistency in mind (most relational databases like
PostgreSQL, SQLite etc., but also, for example, Borg repositories) should always be able
to recover to a consistent state from a backup created with crash-consistent snapshots
(even on ext4 with data=writeback or XFS). Other applications may require a lot of work
to reach application consistency; it's a broad and complex issue that cannot be explained
in its entirety here.

Hypervisor snapshots capturing most of the VM's state can also be used for backups and
can be a better alternative to pure file system based snapshots of the VM's disk, since
no state is lost. Depending on the application this can be the easiest and most reliable
way to create application-consistent backups.

Borg doesn't intend to address these issues due to their huge complexity and
platform/software dependency. Combining Borg with the mechanisms provided by the platform
(snapshots, hypervisor features) will be the best approach to start tackling them.