Merge pull request #5097 from fantasya-pbem/docs/2295_Orphaned-chunks

[DOC] Document one cause of orphaned chunks in check command, #2295
TW, 5 years ago
parent
commit
cccee36a60
1 changed file with 36 additions and 35 deletions
src/borg/archiver.py (+36 -35)

@@ -2812,58 +2812,59 @@ class Archiver:
 
         First, the underlying repository data files are checked:
 
-        - For all segments the segment magic (header) is checked
-        - For all objects stored in the segments, all metadata (e.g. crc and size) and
+        - For all segments, the segment magic header is checked.
+        - For all objects stored in the segments, all metadata (e.g. CRC and size) and
           all data is read. The read data is checked by size and CRC. Bit rot and other
           types of accidental damage can be detected this way.
-        - If we are in repair mode and a integrity error is detected for a segment,
-          we try to recover as many objects from the segment as possible.
-        - In repair mode, it makes sure that the index is consistent with the data
-          stored in the segments.
-        - If you use a remote repo server via ssh:, the repo check is executed on the
-          repo server without causing significant network traffic.
+        - In repair mode, if an integrity error is detected in a segment, try to recover
+          as many objects from the segment as possible.
+        - In repair mode, make sure that the index is consistent with the data stored in
+          the segments.
+        - If checking a remote repo via ``ssh:``, the repo check is executed on the server
+          without causing significant network traffic.
         - The repository check can be skipped using the ``--archives-only`` option.
-        - A repository check can be time consuming. Partial checks are possible with the ``--max-duration`` option.
+        - A repository check can be time consuming. Partial checks are possible with the
+          ``--max-duration`` option.
 
         Second, the consistency and correctness of the archive metadata is verified:
 
         - Is the repo manifest present? If not, it is rebuilt from archive metadata
           chunks (this requires reading and decrypting of all metadata and data).
-        - Check if archive metadata chunk is present. if not, remove archive from
-          manifest.
+        - Check if archive metadata chunk is present; if not, remove archive from manifest.
         - For all files (items) in the archive, for all chunks referenced by these
-          files, check if chunk is present.
-          If a chunk is not present and we are in repair mode, replace it with a same-size
-          replacement chunk of zeros.
-          If a previously lost chunk reappears (e.g. via a later backup) and we are in
-          repair mode, the all-zero replacement chunk will be replaced by the correct chunk.
-          This requires reading of archive and file metadata, but not data.
-        - If we are in repair mode and we checked all the archives: delete orphaned
-          chunks from the repo.
-        - if you use a remote repo server via ssh:, the archive check is executed on
-          the client machine (because if encryption is enabled, the checks will require
-          decryption and this is always done client-side, because key access will be
-          required).
-        - The archive checks can be time consuming, they can be skipped using the
+          files, check if chunk is present. In repair mode, if a chunk is not present,
+          replace it with a same-size replacement chunk of zeroes. If a previously lost
+          chunk reappears (e.g. via a later backup), in repair mode the all-zero replacement
+          chunk will be replaced by the correct chunk. This requires reading of archive and
+          file metadata, but not data.
+        - In repair mode, when all the archives were checked, orphaned chunks are deleted
+          from the repo. One cause of orphaned chunks is input file related errors (like
+          read errors) in the archive creation process.
+        - If checking a remote repo via ``ssh:``, the archive check is executed on the
+          client machine because it requires decryption, and this is always done client-side
+          as key access is needed.
+        - The archive checks can be time consuming; they can be skipped using the
           ``--repository-only`` option.
 
-        The ``--max-duration`` option can be used to split a long-running repository check into multiple partial checks.
-        After the given number of seconds the check is interrupted. The next partial check will continue where the
-        previous one stopped, until the complete repository has been checked. Example: Assuming a full check took 7
-        hours, then running a daily check with --max-duration=3600 (1 hour) would result in one full check per week.
+        The ``--max-duration`` option can be used to split a long-running repository check
+        into multiple partial checks. After the given number of seconds the check is
+        interrupted. The next partial check will continue where the previous one stopped,
+        until the complete repository has been checked. Example: Assuming a full check takes
+        7 hours, running a daily check with ``--max-duration=3600`` (1 hour) would result in
+        one full check per week.
 
-        Attention: Partial checks can only do way less checks than a full check (only the CRC32 checks on segment file
-        entries are done) and cannot be combined with ``--repair``. Partial checks may therefore be useful only with very
-        large repositories where a full check would take too long. Doing a full repository check aborts a partial check;
-        the next partial check will start from the beginning.
+        Attention: Partial checks do far less checking than a full check (only the
+        CRC32 checks on segment file entries are done), and cannot be combined with the
+        ``--repair`` option. Partial checks may therefore be useful only with very large
+        repositories where a full check would take too long. Doing a full repository check
+        aborts a partial check; the next partial check will restart from the beginning.
 
         The ``--verify-data`` option will perform a full integrity verification (as opposed to
         checking the CRC32 of the segment) of data, which means reading the data from the
         repository, decrypting and decompressing it. This is a cryptographic verification,
         which will detect (accidental) corruption. For encrypted repositories it is
-        tamper-resistant as well, unless the attacker has access to the keys.
-
-        It is also very slow.
+        tamper-resistant as well, unless the attacker has access to the keys. It is also very
+        slow.
         """)
         subparser = subparsers.add_parser('check', parents=[common_parser], add_help=False,
                                           description=self.do_check.__doc__,
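
Note on the ``--max-duration`` example in the docstring above: the "one full check per week" figure is plain ceiling division. The sketch below reproduces that arithmetic. The function name `partial_checks_needed` is hypothetical and not part of borg, and the sketch assumes that partial checks progress through the repository at the same rate as the full check did, which is the same simplification the docstring example makes.

```python
def partial_checks_needed(full_check_seconds: int, max_duration_seconds: int) -> int:
    """Return how many partial runs are needed to cover the repository once.

    Hypothetical helper for illustration only; borg does not expose this.
    Assumes each partial run covers `max_duration_seconds` worth of the
    work that a full check would have done.
    """
    if full_check_seconds < 0 or max_duration_seconds <= 0:
        raise ValueError("durations must be non-negative / positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-full_check_seconds // max_duration_seconds)


# The docstring's example: a 7-hour full check, run daily with --max-duration=3600.
runs = partial_checks_needed(7 * 3600, 3600)
print(runs)  # 7 daily runs, i.e. one complete pass per week
```

With one run per day, the returned count is also the number of days per complete pass. In practice the figure may differ, since partial checks only perform the CRC32 checks on segment file entries.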