123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154 |
- .. include:: ../global.rst.inc
- .. highlight:: none
- Backing up entire disk images
- =============================
- Backing up disk images can still be efficient with Borg because its `deduplication`_
- technique makes sure only the modified parts of the file are stored. Borg also has
- optional simple sparse file support for extract.
- It is of utmost importancy to pin down the disk you want to backup.
- You need to use the SERIAL for that.
- Use:
- .. code-block:: bash
- # You can find the short disk serial by:
- # udevadm info --query=property --name=nvme1n1 | grep ID_SERIAL_SHORT | cut -d '=' -f 2
- export BORG_REPO=/path/to/repo
- DISK_SERIAL="7VS0224F"
- DISK_ID=$(readlink -f /dev/disk/by-id/*"${DISK_SERIAL}") # Returns /dev/nvme1n1
- mapfile -t PARTITIONS < <(lsblk -o NAME,TYPE -p -n -l "$DISK_ID" | awk '$2 == "part" {print $1}')
- echo "Partitions of $DISK_ID:"
- echo "${PARTITIONS[@]}"
- echo "Disk Identifier: $DISK_ID"
- # Use the following line to perform a borg backup for the full disk:
- # borg create --read-special {now} "$DISK_ID"
- # Use the following to perform a borg backup for all partitions of the disk
- # borg create --read-special {now} "${PARTITIONS[@]}"
- # Example output:
- # Partitions of /dev/nvme1n1:
- # /dev/nvme1n1p1
- # /dev/nvme1n1p2
- # /dev/nvme1n1p3
- # Disk Identifier: /dev/nvme1n1
- # borg create --read-special {now} /dev/nvme1n1
- # borg create --read-special {now} /dev/nvme1n1p1 /dev/nvme1n1p2 /dev/nvme1n1p3
- Decreasing the size of image backups
- ------------------------------------
- Disk images are as large as the full disk when uncompressed and might not get much
- smaller post-deduplication after heavy use because virtually all file systems don't
- actually delete file data on disk but instead delete the filesystem entries referencing
- the data. Therefore, if a disk nears capacity and files are deleted again, the change
- will barely decrease the space it takes up when compressed and deduplicated. Depending
- on the filesystem, there are several ways to decrease the size of a disk image:
- Using ntfsclone (NTFS, i.e. Windows VMs)
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- ``ntfsclone`` can only operate on filesystems with the journal cleared (i.e. turned-off
- machines), which somewhat limits its utility in the case of VM snapshots. However, when
- it can be used, its special image format is even more efficient than just zeroing and
- deduplicating. For backup, save the disk header and the contents of each partition::
- HEADER_SIZE=$(sfdisk -lo Start $DISK | grep -A1 -P 'Start$' | tail -n1 | xargs echo)
- PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d')
- dd if=$DISK count=$HEADER_SIZE | borg create repo::hostname-partinfo -
- echo "$PARTITIONS" | grep NTFS | cut -d' ' -f1 | while read x; do
- PARTNUM=$(echo $x | grep -Eo "[0-9]+$")
- ntfsclone -so - $x | borg create repo::hostname-part$PARTNUM -
- done
- # to back up non-NTFS partitions as well:
- echo "$PARTITIONS" | grep -v NTFS | cut -d' ' -f1 | while read x; do
- PARTNUM=$(echo $x | grep -Eo "[0-9]+$")
- borg create --read-special repo::hostname-part$PARTNUM $x
- done
- Restoration is a similar process::
- borg extract --stdout repo::hostname-partinfo | dd of=$DISK && partprobe
- PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d')
- borg list --format {archive}{NL} repo | grep 'part[0-9]*$' | while read x; do
- PARTNUM=$(echo $x | grep -Eo "[0-9]+$")
- PARTITION=$(echo "$PARTITIONS" | grep -E "$DISKp?$PARTNUM" | head -n1)
- if echo "$PARTITION" | cut -d' ' -f2- | grep -q NTFS; then
- borg extract --stdout repo::$x | ntfsclone -rO $(echo "$PARTITION" | cut -d' ' -f1) -
- else
- borg extract --stdout repo::$x | dd of=$(echo "$PARTITION" | cut -d' ' -f1)
- fi
- done
- .. note::
- When backing up a disk image (as opposed to a real block device), mount it as
- a loopback image to use the above snippets::
- DISK=$(losetup -Pf --show /path/to/disk/image)
- # do backup as shown above
- losetup -d $DISK
- Using zerofree (ext2, ext3, ext4)
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- ``zerofree`` works similarly to ntfsclone in that it zeros out unused chunks of the FS,
- except it works in place, zeroing the original partition. This makes the backup process
- a bit simpler::
- sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d' | grep Linux | cut -d' ' -f1 | xargs -n1 zerofree
- borg create --read-special repo::hostname-disk $DISK
- Because the partitions were zeroed in place, restoration is only one command::
- borg extract --stdout repo::hostname-disk | dd of=$DISK
- .. note:: The "traditional" way to zero out space on a partition, especially one already
- mounted, is simply to ``dd`` from ``/dev/zero`` to a temporary file and delete
- it. This is ill-advised for the reasons mentioned in the ``zerofree`` man page:
- - it is slow
- - it makes the disk image (temporarily) grow to its maximal extent
- - it (temporarily) uses all free space on the disk, so other concurrent write actions may fail.
- Virtual machines
- ----------------
- If you use non-snapshotting backup tools like Borg to back up virtual machines, then
- the VMs should be turned off for the duration of the backup. Backing up live VMs can
- (and will) result in corrupted or inconsistent backup contents: a VM image is just a
- regular file to Borg with the same issues as regular files when it comes to concurrent
- reading and writing from the same file.
- For backing up live VMs use filesystem snapshots on the VM host, which establishes
- crash-consistency for the VM images. This means that with most file systems (that
- are journaling) the FS will always be fine in the backup (but may need a journal
- replay to become accessible).
- Usually this does not mean that file *contents* on the VM are consistent, since file
- contents are normally not journaled. Notable exceptions are ext4 in data=journal mode,
- ZFS and btrfs (unless nodatacow is used).
- Applications designed with crash-consistency in mind (most relational databases like
- PostgreSQL, SQLite etc. but also for example Borg repositories) should always be able
- to recover to a consistent state from a backup created with crash-consistent snapshots
- (even on ext4 with data=writeback or XFS). Other applications may require a lot of work
- to reach application-consistency; it's a broad and complex issue that cannot be explained
- in entirety here.
- Hypervisor snapshots capturing most of the VM's state can also be used for backups and
- can be a better alternative to pure file system based snapshots of the VM's disk, since
- no state is lost. Depending on the application this can be the easiest and most reliable
- way to create application-consistent backups.
- Borg doesn't intend to address these issues due to their huge complexity and
- platform/software dependency. Combining Borg with the mechanisms provided by the platform
- (snapshots, hypervisor features) will be the best approach to start tackling them.
|